Introduction
This dataset contains water quality data collected by Back Bay National Wildlife Refuge from 1989 to 2019. Samples were collected once every two weeks from Back Bay and surrounding bodies of water (sites Bay, A, B, C, and D) and measured for turbidity (as Secchi depth), pH, dissolved oxygen, salinity, and temperature. The purpose of the monitoring was to ensure that state water quality standards were being met.
Citation:
Back Bay National Wildlife Refuge. 2020. Water Quality Data. Virginia Beach, Virginia.
Virginia Department of Environmental Quality. Laws & Regulations. Virginia Department of Environmental Quality. https://www.deq.virginia.gov/laws-regulations
Preprocessing
To prepare the data for analysis, I dropped columns that were not needed: the air temperature columns, water depth, sampling time, the field technician and verification fields, and the unit ID. I renamed two columns for readability. I also removed a row that appears to have been entered by mistake with a sampling year of 1899, and dropped every row containing a NaN value.
import pandas as pd
water = pd.read_csv("BKB_WaterQualityData_2020084.csv")
water = water[water["Year"] > 1988]
water = water.drop(["Unit_Id", "Water Depth (m)", "Air Temp-Celsius", "Air Temp (?F)", "Time (24:00)", "Field_Tech", "DateVerified", "WhoVerified", "AirTemp (C)"], axis = 1)
water = water.rename(columns = {"Water Temp (?C)" : "Water Temp (C)", "pH (standard units)" : "pH"})
water = water.dropna()
water
Index | Site_Id | Read_Date | Salinity (ppt) | Dissolved Oxygen (mg/L) | pH | Secchi Depth (m) | Water Temp (C) | Year |
---|---|---|---|---|---|---|---|---|
0 | Bay | 1/3/1994 | 1.3 | 11.7 | 7.3 | 0.40 | 5.9 | 1994 |
1 | Bay | 1/31/1994 | 1.5 | 12.0 | 7.4 | 0.20 | 3.0 | 1994 |
2 | Bay | 2/7/1994 | 1.0 | 10.5 | 7.2 | 0.25 | 5.9 | 1994 |
3 | Bay | 2/23/1994 | 1.0 | 10.1 | 7.4 | 0.35 | 10.0 | 1994 |
4 | Bay | 2/28/1994 | 1.0 | 12.6 | 7.2 | 0.20 | 1.6 | 1994 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
2361 | D | 10/11/2018 | 0.0 | 6.0 | 6.5 | 0.70 | 26.0 | 2018 |
2364 | D | 11/7/2018 | 0.0 | 6.9 | 6.5 | 0.90 | 20.0 | 2018 |
2366 | Bay | 10/11/2018 | 1.9 | 5.0 | 7.0 | 4.00 | 25.0 | 2018 |
2367 | Bay | 10/24/2018 | 0.0 | 9.0 | 7.0 | 0.30 | 18.0 | 2018 |
2368 | Bay | 10/28/2018 | 0.9 | 2.9 | 7.0 | 0.40 | 13.0 | 2018 |
1320 rows × 8 columns
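To see how many rows each cleaning step removes, the filtering can be broken into stages and counted. This is a minimal sketch on an invented toy frame (the values and the reduced column set are assumptions for illustration, not the real file), mirroring the year filter and `dropna()` used above.

```python
import pandas as pd

# Toy stand-in for the raw file; the values are invented for illustration.
raw = pd.DataFrame({
    "Year": [1899, 1994, 1994, 2018],
    "Salinity (ppt)": [1.0, 1.3, None, 0.0],
    "pH": [7.0, 7.3, 7.4, 6.5],
})

# Step 1: keep only rows from the study period (drops the stray 1899 row).
in_range = raw[raw["Year"] > 1988]

# Step 2: drop rows with any missing measurement.
complete = in_range.dropna()

print(f"raw rows:            {len(raw)}")
print(f"after year filter:   {len(in_range)}")
print(f"after dropping NaNs: {len(complete)}")
```

Running the same three `len()` calls on the real `water` frame before and after each step would show how much of the original data survives preprocessing.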
To account for seasonal variation, the next step extracts the month from each sample date so that the data can be grouped by month.
def month_convert(date):
    # Read_Date is formatted as month/day/year, so the month is the first field.
    month_number = int(date.split('/')[0])
    if month_number == 1:
        month = "January"
    elif month_number == 2:
        month = "February"
    elif month_number == 3:
        month = "March"
    elif month_number == 4:
        month = "April"
    elif month_number == 5:
        month = "May"
    elif month_number == 6:
        month = "June"
    elif month_number == 7:
        month = "July"
    elif month_number == 8:
        month = "August"
    elif month_number == 9:
        month = "September"
    elif month_number == 10:
        month = "October"
    elif month_number == 11:
        month = "November"
    elif month_number == 12:
        month = "December"
    return month
water["Month"] = water["Read_Date"].apply(month_convert)
water.drop("Read_Date", axis = 1, inplace = True)
water
Index | Site_Id | Salinity (ppt) | Dissolved Oxygen (mg/L) | pH | Secchi Depth (m) | Water Temp (C) | Year | Month |
---|---|---|---|---|---|---|---|---|
0 | Bay | 1.3 | 11.7 | 7.3 | 0.40 | 5.9 | 1994 | January |
1 | Bay | 1.5 | 12.0 | 7.4 | 0.20 | 3.0 | 1994 | January |
2 | Bay | 1.0 | 10.5 | 7.2 | 0.25 | 5.9 | 1994 | February |
3 | Bay | 1.0 | 10.1 | 7.4 | 0.35 | 10.0 | 1994 | February |
4 | Bay | 1.0 | 12.6 | 7.2 | 0.20 | 1.6 | 1994 | February |
... | ... | ... | ... | ... | ... | ... | ... | ... |
2361 | D | 0.0 | 6.0 | 6.5 | 0.70 | 26.0 | 2018 | October |
2364 | D | 0.0 | 6.9 | 6.5 | 0.90 | 20.0 | 2018 | November |
2366 | Bay | 1.9 | 5.0 | 7.0 | 4.00 | 25.0 | 2018 | October |
2367 | Bay | 0.0 | 9.0 | 7.0 | 0.30 | 18.0 | 2018 | October |
2368 | Bay | 0.9 | 2.9 | 7.0 | 0.40 | 13.0 | 2018 | October |
1320 rows × 8 columns
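The month conversion above can also be sketched with pandas' datetime tools instead of a hand-written lookup. This assumes every `Read_Date` value follows the month/day/year pattern seen in the displayed rows; the three sample dates below are copied from the table output.

```python
import pandas as pd

dates = pd.Series(["1/3/1994", "2/23/1994", "10/11/2018"], name="Read_Date")

# Parse month/day/year strings, then ask pandas for the English month name.
parsed = pd.to_datetime(dates, format="%m/%d/%Y")
months = parsed.dt.month_name()

print(months.tolist())
```

A side benefit of parsing is that a malformed date raises an error instead of silently producing a wrong month.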
Outlier Test - Salinity (ppt)
IQR_sal = water["Salinity (ppt)"].quantile(0.75) - water["Salinity (ppt)"].quantile(0.25)
outlier_sal = water[(water["Salinity (ppt)"] < water["Salinity (ppt)"].quantile(0.25) - 1.5*IQR_sal) | (water["Salinity (ppt)"] > water["Salinity (ppt)"].quantile(0.75) + 1.5*IQR_sal)]
count = outlier_sal.groupby("Month").size()
months_order = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
count.index = pd.Categorical(count.index, categories=months_order, ordered=True)
count = count.sort_index()
count
January       1
February      1
March         1
April         3
May           4
June         15
July         11
August       12
September     7
October      11
November      8
December      4
dtype: int64
outlier_sal
Index | Site_Id | Salinity (ppt) | Dissolved Oxygen (mg/L) | pH | Secchi Depth (m) | Water Temp (C) | Year | Month |
---|---|---|---|---|---|---|---|---|
61 | Bay | 4.2 | 11.6 | 7.5 | 0.15 | 5.0 | 1990 | February |
68 | Bay | 4.1 | 10.8 | 7.0 | 0.25 | 11.0 | 1990 | April |
72 | Bay | 4.5 | 8.9 | 7.0 | 0.25 | 17.0 | 1990 | May |
79 | Bay | 3.8 | 7.9 | 7.0 | 0.35 | 26.0 | 1990 | June |
81 | Bay | 4.2 | 7.5 | 7.5 | 0.30 | 26.0 | 1990 | July |
... | ... | ... | ... | ... | ... | ... | ... | ... |
813 | Bay | 4.0 | 3.7 | 7.0 | 0.50 | 27.0 | 2003 | September |
867 | B | 4.0 | 5.7 | 7.5 | 0.20 | 4.0 | 2004 | December |
1159 | Bay | 8.0 | 0.1 | 7.0 | 0.40 | 18.0 | 2006 | October |
1383 | Bay | 4.0 | 6.2 | 7.0 | 0.50 | 24.5 | 2008 | June |
2309 | Bay | 9.0 | 4.2 | 8.0 | 0.40 | 24.0 | 2019 | June |
78 rows × 8 columns
Outlier Test - Dissolved Oxygen (mg/L)
IQR_DO = water["Dissolved Oxygen (mg/L)"].quantile(0.75) - water["Dissolved Oxygen (mg/L)"].quantile(0.25)
outlier_DO = water[(water["Dissolved Oxygen (mg/L)"] < water["Dissolved Oxygen (mg/L)"].quantile(0.25) - 1.5*IQR_DO) | (water["Dissolved Oxygen (mg/L)"] > water["Dissolved Oxygen (mg/L)"].quantile(0.75) + 1.5*IQR_DO)]
outlier_DO
Index | Site_Id | Salinity (ppt) | Dissolved Oxygen (mg/L) | pH | Secchi Depth (m) | Water Temp (C) | Year | Month |
---|---|---|---|---|---|---|---|---|
133 | Bay | 1.4 | 15.1 | 8.0 | 0.2 | 2.0 | 1991 | December |
Outlier Test - pH
IQR_pH = water["pH"].quantile(0.75) - water["pH"].quantile(0.25)
outlier_pH = water[(water["pH"] < water["pH"].quantile(0.25) - 1.5*IQR_pH) | (water["pH"] > water["pH"].quantile(0.75) + 1.5*IQR_pH)]
count = outlier_pH.groupby("Month").size()
months_order = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
count.index = pd.Categorical(count.index, categories=months_order, ordered=True)
count = count.sort_index()
count
January      2
February     4
March        2
April        3
May          3
June         5
July         2
August       7
September    2
November     3
December     2
dtype: int64
outlier_pH
Index | Site_Id | Salinity (ppt) | Dissolved Oxygen (mg/L) | pH | Secchi Depth (m) | Water Temp (C) | Year | Month |
---|---|---|---|---|---|---|---|---|
15 | Bay | 1.0 | 10.80 | 9.1 | 0.05 | 21.7 | 1994 | May |
22 | Bay | 2.0 | 11.00 | 9.4 | 0.50 | 30.5 | 1994 | June |
23 | Bay | 2.0 | 10.70 | 9.2 | 0.45 | 29.0 | 1994 | August |
24 | Bay | 2.0 | 10.20 | 9.2 | 0.40 | 22.0 | 1994 | August |
26 | Bay | 2.0 | 9.10 | 9.1 | 0.45 | 27.0 | 1994 | August |
27 | Bay | 2.3 | 9.80 | 9.3 | 0.45 | 27.0 | 1994 | August |
250 | Bay | 4.5 | 10.40 | 9.2 | 0.30 | 24.5 | 1995 | September |
381 | Bay | 1.8 | 1.80 | 9.5 | 0.35 | 27.0 | 1999 | August |
382 | Bay | 3.0 | 7.00 | 9.2 | 0.40 | 28.0 | 1999 | August |
388 | Bay | 1.0 | 8.70 | 9.1 | 0.50 | 11.0 | 1999 | December |
417 | B | 1.5 | 7.60 | 9.7 | 4.00 | 30.0 | 2000 | May |
441 | C | 1.0 | 8.20 | 9.5 | 0.40 | 28.0 | 2000 | June |
475 | D | 0.0 | 10.00 | 9.3 | 1.10 | 7.0 | 2000 | December |
481 | Bay | 1.5 | 7.60 | 9.1 | 0.30 | 18.0 | 2000 | April |
495 | Bay | 1.0 | 4.80 | 9.2 | 0.30 | 16.0 | 2000 | November |
496 | Bay | 1.0 | 7.80 | 9.9 | 0.40 | 13.0 | 2000 | November |
507 | A | 1.0 | 2.50 | 9.1 | 0.20 | 22.0 | 2001 | May |
512 | A | 1.5 | 4.90 | 9.2 | 0.50 | 26.0 | 2001 | July |
515 | A | 3.0 | 3.00 | 0.3 | 0.35 | 26.0 | 2001 | September |
575 | D | 0.0 | 4.10 | 9.3 | 0.80 | 19.5 | 2001 | April |
591 | Bay | 1.0 | 8.10 | 9.2 | 0.36 | 7.0 | 2001 | January |
592 | Bay | 1.0 | 9.00 | 9.3 | 0.10 | 8.0 | 2001 | January |
594 | Bay | 1.0 | 9.20 | 9.1 | 0.20 | 11.0 | 2001 | February |
595 | Bay | 1.2 | 9.25 | 9.2 | 0.35 | 14.0 | 2001 | March |
596 | Bay | 1.0 | 9.90 | 9.4 | 0.25 | 10.0 | 2001 | March |
598 | Bay | 2.0 | 6.40 | 9.3 | 0.30 | 20.0 | 2001 | April |
601 | Bay | 2.0 | 4.35 | 9.2 | 0.40 | 23.0 | 2001 | June |
604 | Bay | 3.0 | 7.20 | 9.8 | 0.30 | 23.0 | 2001 | July |
605 | Bay | 3.0 | 4.10 | 9.1 | 0.40 | 30.0 | 2001 | August |
611 | Bay | 3.0 | 5.50 | 9.5 | 0.25 | 13.0 | 2001 | November |
680 | D | 0.0 | 8.30 | 9.6 | 1.30 | 11.0 | 2002 | February |
702 | Bay | 3.0 | 5.70 | 9.9 | 0.10 | 12.0 | 2002 | February |
703 | Bay | 3.0 | 7.60 | 9.2 | 0.30 | 11.0 | 2002 | February |
749 | B | 0.0 | 8.50 | 9.5 | 0.10 | 25.1 | 2003 | June |
2217 | A | 0.1 | 4.70 | 4.8 | 0.50 | 27.0 | 2018 | June |
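The pH counts above have no October entry because no pH outliers fall in that month, so the month simply disappears from the grouped result. One possible way to keep all twelve months visible is to reindex the counts against the full month list and fill the gaps with zero. The toy `outliers` frame below is invented for illustration.

```python
import pandas as pd

months_order = ["January", "February", "March", "April", "May", "June",
                "July", "August", "September", "October", "November", "December"]

# Invented example: outliers flagged in three months only.
outliers = pd.DataFrame({"Month": ["June", "June", "January", "August"]})

# Count per month, then reindex so that missing months appear with a count of 0
# and the rows come out in calendar order.
counts = (
    outliers.groupby("Month").size()
    .reindex(months_order, fill_value=0)
)
print(counts)
```

This also replaces the categorical-index and `sort_index()` steps, since `reindex` already imposes calendar order.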
Outlier Test - Secchi Depth (m)
IQR_secchi = water["Secchi Depth (m)"].quantile(0.75) - water["Secchi Depth (m)"].quantile(0.25)
outlier_secchi = water[(water["Secchi Depth (m)"] < water["Secchi Depth (m)"].quantile(0.25) - 1.5*IQR_secchi) | (water["Secchi Depth (m)"] > water["Secchi Depth (m)"].quantile(0.75) + 1.5*IQR_secchi)]
count = outlier_secchi.groupby("Month").size()
months_order = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
count.index = pd.Categorical(count.index, categories=months_order, ordered=True)
count = count.sort_index()
count
January      11
February     13
March         9
April         3
May           4
June          1
July          3
August        2
September     2
October       7
November      7
December     12
dtype: int64
outlier_secchi
Index | Site_Id | Salinity (ppt) | Dissolved Oxygen (mg/L) | pH | Secchi Depth (m) | Water Temp (C) | Year | Month |
---|---|---|---|---|---|---|---|---|
301 | Bay | 1.5 | 10.90 | 6.6 | 2.50 | 10.1 | 1997 | February |
362 | C | 1.0 | 2.40 | 8.0 | 5.80 | 34.0 | 1999 | September |
371 | D | 0.0 | 1.50 | 7.4 | 1.75 | 34.5 | 1999 | September |
375 | D | 0.0 | 6.50 | 8.0 | 1.05 | 15.0 | 1999 | November |
378 | D | 0.0 | 8.25 | 8.0 | 1.10 | 5.0 | 1999 | December |
... | ... | ... | ... | ... | ... | ... | ... | ... |
2205 | D | 0.0 | 9.90 | 6.5 | 1.30 | 7.0 | 2017 | December |
2215 | A | 0.1 | 4.00 | 6.5 | 5.50 | 26.0 | 2018 | May |
2241 | D | 0.0 | 10.00 | 6.5 | 1.60 | 4.0 | 2018 | January |
2242 | D | 0.0 | 9.50 | 6.5 | 1.30 | 6.0 | 2018 | February |
2366 | Bay | 1.9 | 5.00 | 7.0 | 4.00 | 25.0 | 2018 | October |
74 rows × 8 columns
Outlier Test - Water Temp (C)
IQR_temp = water["Water Temp (C)"].quantile(0.75) - water["Water Temp (C)"].quantile(0.25)
outlier_temp = water[(water["Water Temp (C)"] < water["Water Temp (C)"].quantile(0.25) - 1.5*IQR_temp) | (water["Water Temp (C)"] > water["Water Temp (C)"].quantile(0.75) + 1.5*IQR_temp)]
outlier_temp
Index | Site_Id | Salinity (ppt) | Dissolved Oxygen (mg/L) | pH | Secchi Depth (m) | Water Temp (C) | Year | Month |
---|---|---|---|---|---|---|---|---|
701 | Bay | 3.0 | 9.0 | 9.0 | 0.3 | 59.0 | 2002 | January |
822 | A | 1.0 | 7.0 | 7.0 | 0.8 | 60.0 | 2004 | February |
1038 | Bay | 0.0 | 5.0 | 6.5 | 0.1 | 74.0 | 2005 | September |
1232 | D | 0.0 | 7.6 | 5.0 | 1.2 | 54.0 | 2007 | January |
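The five outlier tests above all apply the same rule: a value is flagged when it lies more than 1.5 times the interquartile range below the first quartile or above the third. A small helper can express that rule once. The function name `iqr_outliers` and the toy temperatures are assumptions for illustration, not part of the original notebook.

```python
import pandas as pd

def iqr_outliers(df, column, k=1.5):
    """Return the rows of df whose value in `column` lies outside the k*IQR fences."""
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    mask = (df[column] < q1 - k * iqr) | (df[column] > q3 + k * iqr)
    return df[mask]

# Invented readings: five plausible temperatures and one suspicious value.
toy = pd.DataFrame({"Water Temp (C)": [5.0, 10.0, 15.0, 20.0, 25.0, 74.0]})
flagged = iqr_outliers(toy, "Water Temp (C)")
print(flagged)
```

With a helper like this, each test reduces to one call such as `iqr_outliers(water, "pH")`, and the quartiles are computed once per column rather than twice.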
Summary Data Analysis
The bar chart below shows the number of samples taken at each site.
import matplotlib.pyplot as plt
# Count samples per site and list the sites in a fixed order.
counts = water["Site_Id"].value_counts()
counts = counts.loc[['Bay', 'A', 'B', 'C', 'D']]
counts.plot(kind = "bar")
plt.show()
Below are the averages of each quantitative column for each site over all the years in which data was collected.
# Named site_data rather than filter, which would shadow Python's built-in filter().
site_data = water.drop(["Month", "Year"], axis = 1)
averages = site_data.groupby("Site_Id").mean()
averages = averages.loc[['Bay', 'A', 'B', 'C', 'D']]
averages
Site_Id | Salinity (ppt) | Dissolved Oxygen (mg/L) | pH | Secchi Depth (m) | Water Temp (C) |
---|---|---|---|---|---|
Bay | 1.494714 | 7.677107 | 7.382452 | 0.405044 | 17.422482 |
A | 0.345875 | 5.372813 | 6.936875 | 0.608437 | 19.500000 |
B | 0.440588 | 5.383294 | 7.307647 | 0.304882 | 18.501176 |
C | 0.499159 | 5.591589 | 7.318692 | 0.480935 | 18.336449 |
D | 0.088317 | 6.386139 | 6.726733 | 0.893218 | 18.272772 |
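A mean is easier to interpret next to the number of samples behind it, particularly if the sites were sampled unequally often. The sketch below, on invented pH values, reports both in one table using `agg`.

```python
import pandas as pd

# Invented pH readings for three of the sites.
toy = pd.DataFrame({
    "Site_Id": ["Bay", "Bay", "Bay", "A", "D"],
    "pH": [7.3, 7.4, 7.2, 6.9, 6.5],
})

# One row per site with the mean and the number of samples that produced it.
summary = toy.groupby("Site_Id")["pH"].agg(["mean", "count"])
summary = summary.loc[["Bay", "A", "D"]]
print(summary)
```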
Here are the averages of each quantitative value for the site Bay grouped by month for the years 1989-2019.
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
# Select the Bay samples and average every numeric column by month.
# The Month column already holds month names, so no number-to-name mapping is needed.
site_Bay = water[water["Site_Id"] == "Bay"]
numeric_columns = site_Bay.select_dtypes(include = [np.number]).columns.tolist()
averages = site_Bay.groupby("Month")[numeric_columns].mean()
averages = averages.drop("Year", axis = 1)
# Plot one line per measurement and rotate the month labels so they stay legible.
graph = sns.relplot(data = averages, kind = "line")
graph.set_xticklabels(rotation = 45)
graph
averages
Month | Salinity (ppt) | Dissolved Oxygen (mg/L) | pH | Secchi Depth (m) | Water Temp (C) |
---|---|---|---|---|---|
April | 1.442687 | 8.267910 | 7.114179 | 0.298060 | 15.613433 |
August | 1.722203 | 6.455085 | 8.030508 | 0.495085 | 25.600000 |
December | 1.115217 | 9.014130 | 7.095652 | 0.399783 | 8.693696 |
February | 1.166071 | 9.002679 | 7.116071 | 0.360357 | 8.085714 |
January | 1.088000 | 9.283000 | 7.136000 | 0.274600 | 8.728000 |
July | 1.547273 | 6.228182 | 7.555455 | 0.473273 | 26.909091 |
June | 1.930769 | 6.813077 | 7.587692 | 0.539923 | 24.486154 |
March | 1.301786 | 8.899107 | 7.083929 | 0.254018 | 11.051786 |
May | 1.380000 | 7.603770 | 7.344262 | 0.344344 | 20.303279 |
November | 1.508929 | 8.223750 | 7.302679 | 0.507321 | 13.626786 |
October | 1.921429 | 7.099107 | 7.419643 | 0.482143 | 18.030357 |
September | 1.645370 | 5.642593 | 7.729630 | 0.418333 | 24.394444 |
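The monthly table above is sorted alphabetically (April, August, December, ...) because `groupby` sorts string keys. One possible way to get calendar order is to make `Month` an ordered categorical before grouping. The toy values below are invented, and `observed=True` keeps only the months that actually occur.

```python
import pandas as pd

months_order = ["January", "February", "March", "April", "May", "June",
                "July", "August", "September", "October", "November", "December"]

toy = pd.DataFrame({
    "Month": ["April", "January", "January", "December"],
    "Water Temp (C)": [15.0, 8.0, 10.0, 9.0],
})

# An ordered categorical makes groupby sort by calendar position, not alphabetically.
toy["Month"] = pd.Categorical(toy["Month"], categories=months_order, ordered=True)
monthly = toy.groupby("Month", observed=True)["Water Temp (C)"].mean()
print(monthly)
```

Plotting a frame ordered this way puts January on the left of the x-axis and December on the right, which makes the seasonal cycle easier to read.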
Here are the averages of each quantitative value grouped by month for site A from the years 2000-2019.
import numpy as np
import matplotlib.pyplot as plt
# Same monthly averaging as for Bay, applied to site A.
site_A = water[water["Site_Id"] == "A"]
numeric_columns = site_A.select_dtypes(include=[np.number]).columns.tolist()
averages = site_A.groupby("Month")[numeric_columns].mean()
averages = averages.drop('Year', axis=1)
graph = sns.relplot(data = averages, kind = "line")
graph.set_xticklabels(rotation = 45)
graph
averages
Month | Salinity (ppt) | Dissolved Oxygen (mg/L) | pH | Secchi Depth (m) | Water Temp (C) |
---|---|---|---|---|---|
April | 0.100000 | 4.490000 | 6.700000 | 0.430000 | 18.500000 |
August | 0.400000 | 3.866667 | 7.046667 | 0.466667 | 27.400000 |
December | 0.750000 | 7.550000 | 7.012500 | 0.718750 | 8.500000 |
February | 0.366667 | 7.040000 | 7.006667 | 0.733333 | 12.666667 |
January | 0.416667 | 7.316667 | 6.891667 | 0.754167 | 7.916667 |
July | 0.426923 | 3.976923 | 7.153846 | 0.607692 | 27.923077 |
June | 0.318750 | 4.512500 | 7.062500 | 0.578125 | 25.468750 |
March | 0.208333 | 6.583333 | 6.750000 | 0.637500 | 11.708333 |
May | 0.162500 | 4.162500 | 7.025000 | 0.875000 | 23.062500 |
November | 0.285714 | 6.517857 | 6.928571 | 0.528571 | 14.785714 |
October | 0.323077 | 5.665385 | 6.961538 | 0.480769 | 21.000000 |
September | 0.493125 | 4.284375 | 6.650000 | 0.487500 | 25.687500 |
Here are the averages of each quantitative value grouped by month for site B from the years 2000-2019.
import numpy as np
import matplotlib.pyplot as plt
# Same monthly averaging as for Bay, applied to site B.
site_B = water[water["Site_Id"] == "B"]
numeric_columns = site_B.select_dtypes(include=[np.number]).columns.tolist()
averages = site_B.groupby("Month")[numeric_columns].mean()
averages = averages.drop('Year', axis=1)
graph = sns.relplot(data = averages, kind = "line")
graph.set_xticklabels(rotation = 45)
graph
averages
Month | Salinity (ppt) | Dissolved Oxygen (mg/L) | pH | Secchi Depth (m) | Water Temp (C) |
---|---|---|---|---|---|
April | 0.500000 | 5.975000 | 7.583333 | 0.329167 | 16.833333 |
August | 0.423077 | 3.623077 | 7.476923 | 0.258462 | 26.769231 |
December | 0.609091 | 6.136364 | 7.027273 | 0.311818 | 9.318182 |
February | 0.500000 | 7.561538 | 7.076923 | 0.346154 | 9.884615 |
January | 0.423077 | 6.969231 | 6.992308 | 0.334615 | 6.615385 |
July | 0.166667 | 3.416667 | 7.083333 | 0.433333 | 27.875000 |
June | 0.452941 | 4.505882 | 7.664706 | 0.220588 | 26.182353 |
March | 0.250000 | 7.414286 | 7.092857 | 0.321429 | 11.464286 |
May | 0.340000 | 5.024000 | 7.993333 | 0.544000 | 23.306667 |
November | 0.593750 | 6.050000 | 7.237500 | 0.228125 | 14.375000 |
October | 0.452941 | 5.329412 | 7.041176 | 0.232353 | 19.588235 |
September | 0.541176 | 3.282353 | 7.288235 | 0.178235 | 25.029412 |
Here are the averages of each quantitative value grouped by month for site C from the years 2000-2019.
import numpy as np
import matplotlib.pyplot as plt
# Same monthly averaging as for Bay, applied to site C.
site_C = water[water["Site_Id"] == "C"]
numeric_columns = site_C.select_dtypes(include=[np.number]).columns.tolist()
averages = site_C.groupby("Month")[numeric_columns].mean()
averages = averages.drop('Year', axis=1)
graph = sns.relplot(data = averages, kind = "line")
plt.xticks(rotation=45)
graph
averages
Month | Salinity (ppt) | Dissolved Oxygen (mg/L) | pH | Secchi Depth (m) | Water Temp (C) |
---|---|---|---|---|---|
April | 0.200000 | 6.050000 | 7.260000 | 0.377000 | 17.300000 |
August | 0.611111 | 4.577778 | 7.122222 | 0.366667 | 28.222222 |
December | 0.510000 | 6.360000 | 6.950000 | 0.547000 | 7.500000 |
February | 0.666667 | 7.277778 | 7.333333 | 0.538889 | 8.333333 |
January | 1.000000 | 5.600000 | 7.160000 | 0.490000 | 8.500000 |
July | 0.126250 | 4.462500 | 7.337500 | 0.412500 | 28.875000 |
June | 0.636364 | 4.890909 | 7.945455 | 0.422727 | 26.090909 |
March | 0.500000 | 7.350000 | 7.100000 | 0.600000 | 8.500000 |
May | 1.000000 | 5.285714 | 7.585714 | 0.317143 | 20.857143 |
November | 0.220000 | 6.320000 | 7.020000 | 0.375000 | 15.800000 |
October | 0.509091 | 5.100000 | 7.545455 | 0.327273 | 20.954545 |
September | 0.333333 | 3.877778 | 7.322222 | 1.033333 | 24.666667 |
Here are the averages of each quantitative value grouped by month for site D from the years 2000-2019.
import numpy as np
import matplotlib.pyplot as plt
# Same monthly averaging as for Bay, applied to site D.
site_D = water[water["Site_Id"] == "D"]
numeric_columns = site_D.select_dtypes(include=[np.number]).columns.tolist()
averages = site_D.groupby("Month")[numeric_columns].mean()
averages = averages.drop('Year', axis=1)
graph = sns.relplot(data = averages, kind = "line")
graph.set_xticklabels(rotation = 45)
graph
averages
Month | Salinity (ppt) | Dissolved Oxygen (mg/L) | pH | Secchi Depth (m) | Water Temp (C) |
---|---|---|---|---|---|
April | 0.055556 | 6.677778 | 6.688889 | 0.866667 | 16.805556 |
August | 0.011765 | 4.411765 | 6.682353 | 0.701765 | 27.235294 |
December | 0.238095 | 8.292857 | 6.652381 | 1.061905 | 9.261905 |
February | 0.055556 | 8.905556 | 6.777778 | 1.138889 | 7.444444 |
January | 0.076923 | 8.553846 | 6.584615 | 1.238462 | 11.076923 |
July | 0.058824 | 4.305882 | 6.741176 | 0.711765 | 28.176471 |
June | 0.000000 | 4.866667 | 6.713333 | 0.666667 | 25.066667 |
March | 0.083333 | 9.025000 | 6.666667 | 1.166667 | 10.000000 |
May | 0.015000 | 5.668750 | 6.706250 | 0.712500 | 21.150000 |
November | 0.133333 | 6.886667 | 6.733333 | 0.960000 | 15.066667 |
October | 0.161905 | 5.483333 | 6.819048 | 0.885714 | 20.904762 |
September | 0.105263 | 4.500000 | 6.878947 | 0.710526 | 24.984211 |
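The per-site cells above differ only in the site name, so the shared steps can be sketched as one function and called in a loop. The name `monthly_site_averages` and the toy data are assumptions for illustration. The column names follow the `water` frame used above.

```python
import pandas as pd

def monthly_site_averages(df, site):
    """Mean of every numeric column except Year, per month, for one site."""
    site_rows = df[df["Site_Id"] == site]
    numeric = site_rows.select_dtypes(include="number").drop(columns="Year")
    # Group the numeric columns by the matching Month values of the same rows.
    return numeric.groupby(site_rows["Month"]).mean()

# Invented rows for two sites.
toy = pd.DataFrame({
    "Site_Id": ["A", "A", "A", "B"],
    "Month": ["May", "May", "June", "May"],
    "Year": [2001, 2002, 2001, 2001],
    "pH": [7.0, 7.2, 6.8, 7.9],
})

for site in ["A", "B"]:
    print(site)
    print(monthly_site_averages(toy, site))
```

On the real data the loop would run over `['Bay', 'A', 'B', 'C', 'D']`, which also avoids reusing one variable name for different sites.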
The first graph below is a line graph of average salinity (ppt) per year from 1989-2019, color coded by site. The second graph compares the mean salinities (ppt) of the sites. Because salinity stays relatively constant throughout a given year, the data is not grouped by month.
import seaborn as sns
average_salinity = water.groupby(['Year', 'Site_Id'])['Salinity (ppt)'].mean().reset_index()
sns.relplot(data = average_salinity, x = "Year", y = "Salinity (ppt)", kind = "line", hue = "Site_Id")
sns.catplot(data = average_salinity, x = "Site_Id", y = "Salinity (ppt)", kind = "box")
The first set of graphs shows trends in dissolved oxygen (mg/L) from 1999-2019, color coded by site. Because dissolved oxygen varies seasonally, there is one panel per month. The second graph compares the mean dissolved oxygen levels (mg/L) of all five sites. The third graph shows dissolved oxygen levels for site Bay from 1989-2019. It is plotted separately because from 1989-1998 the Bay readings were consistently and unusually high, and in 1999 they dropped and then stayed in a lower range.
import seaborn as sns
average_DO = water.groupby(["Year", "Month", "Site_Id"])["Dissolved Oxygen (mg/L)"].mean().reset_index()
bay = average_DO[average_DO["Site_Id"] == "Bay"]
average_DO = average_DO[average_DO["Year"] > 1998]
sns.relplot(data = average_DO, x = "Year", y = "Dissolved Oxygen (mg/L)", kind = "line", hue = "Site_Id", col = "Month", col_wrap = 4, height = 4)
sns.catplot(data = average_DO, x = "Site_Id", y = "Dissolved Oxygen (mg/L)", kind = "box")
sns.relplot(data = bay, x = "Year", y = "Dissolved Oxygen (mg/L)", kind = "line")
data_subset = grouped_data.get_group(pd_key) /Users/driscoll/mambaforge/envs/219/lib/python3.11/site-packages/seaborn/_base.py:949: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning. data_subset = grouped_data.get_group(pd_key) /Users/driscoll/mambaforge/envs/219/lib/python3.11/site-packages/seaborn/_base.py:949: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning. data_subset = grouped_data.get_group(pd_key) /Users/driscoll/mambaforge/envs/219/lib/python3.11/site-packages/seaborn/_base.py:949: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning. data_subset = grouped_data.get_group(pd_key) /Users/driscoll/mambaforge/envs/219/lib/python3.11/site-packages/seaborn/_base.py:949: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning. data_subset = grouped_data.get_group(pd_key) /Users/driscoll/mambaforge/envs/219/lib/python3.11/site-packages/seaborn/_base.py:949: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning. data_subset = grouped_data.get_group(pd_key) /Users/driscoll/mambaforge/envs/219/lib/python3.11/site-packages/seaborn/_base.py:949: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning. 
data_subset = grouped_data.get_group(pd_key) /Users/driscoll/mambaforge/envs/219/lib/python3.11/site-packages/seaborn/_base.py:949: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning. data_subset = grouped_data.get_group(pd_key) /Users/driscoll/mambaforge/envs/219/lib/python3.11/site-packages/seaborn/_base.py:949: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning. data_subset = grouped_data.get_group(pd_key) /Users/driscoll/mambaforge/envs/219/lib/python3.11/site-packages/seaborn/_base.py:949: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning. data_subset = grouped_data.get_group(pd_key) /Users/driscoll/mambaforge/envs/219/lib/python3.11/site-packages/seaborn/_base.py:949: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning. data_subset = grouped_data.get_group(pd_key) /Users/driscoll/mambaforge/envs/219/lib/python3.11/site-packages/seaborn/_base.py:949: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning. data_subset = grouped_data.get_group(pd_key) /Users/driscoll/mambaforge/envs/219/lib/python3.11/site-packages/seaborn/_base.py:949: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning. 
data_subset = grouped_data.get_group(pd_key) /Users/driscoll/mambaforge/envs/219/lib/python3.11/site-packages/seaborn/_base.py:949: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning. data_subset = grouped_data.get_group(pd_key) /Users/driscoll/mambaforge/envs/219/lib/python3.11/site-packages/seaborn/_base.py:949: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning. data_subset = grouped_data.get_group(pd_key) /Users/driscoll/mambaforge/envs/219/lib/python3.11/site-packages/seaborn/_base.py:949: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning. data_subset = grouped_data.get_group(pd_key) /Users/driscoll/mambaforge/envs/219/lib/python3.11/site-packages/seaborn/_base.py:949: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning. data_subset = grouped_data.get_group(pd_key) /Users/driscoll/mambaforge/envs/219/lib/python3.11/site-packages/seaborn/_base.py:949: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning. data_subset = grouped_data.get_group(pd_key) /Users/driscoll/mambaforge/envs/219/lib/python3.11/site-packages/seaborn/_base.py:949: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning. 
data_subset = grouped_data.get_group(pd_key) /Users/driscoll/mambaforge/envs/219/lib/python3.11/site-packages/seaborn/_base.py:949: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning. data_subset = grouped_data.get_group(pd_key) /Users/driscoll/mambaforge/envs/219/lib/python3.11/site-packages/seaborn/_base.py:949: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning. data_subset = grouped_data.get_group(pd_key) /Users/driscoll/mambaforge/envs/219/lib/python3.11/site-packages/seaborn/_base.py:949: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning. data_subset = grouped_data.get_group(pd_key) /Users/driscoll/mambaforge/envs/219/lib/python3.11/site-packages/seaborn/_base.py:949: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning. data_subset = grouped_data.get_group(pd_key) /Users/driscoll/mambaforge/envs/219/lib/python3.11/site-packages/seaborn/_base.py:949: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning. data_subset = grouped_data.get_group(pd_key) /Users/driscoll/mambaforge/envs/219/lib/python3.11/site-packages/seaborn/_base.py:949: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning. 
data_subset = grouped_data.get_group(pd_key) /Users/driscoll/mambaforge/envs/219/lib/python3.11/site-packages/seaborn/categorical.py:640: FutureWarning: SeriesGroupBy.grouper is deprecated and will be removed in a future version of pandas. positions = grouped.grouper.result_index.to_numpy(dtype=float)
<seaborn.axisgrid.FacetGrid at 0x17898d710>
The first graph shows the average pH for each year from 1989 to 2019, color-coded by site. The second graph compares the distribution of those yearly mean pH values across sites. Because pH stays relatively constant throughout a given year, the data is not grouped by month.
import seaborn as sns
average_pH = water.groupby(["Year", "Site_Id"])["pH"].mean().reset_index()
sns.relplot(data = average_pH, x = "Year", y = "pH", kind = "line", hue = "Site_Id")
sns.catplot(data = average_pH, x = "Site_Id", y = "pH", kind = "box")
The first graph shows the average Secchi depth in meters for each year from 1989 to 2019, color-coded by site, and the second compares those yearly averages across sites. Secchi depth is a measure of water transparency: it is the depth at which a Secchi disk lowered into the water is no longer visible from the surface. A shallow Secchi depth indicates cloudy water. Cloudiness can come from suspended sediment or from an abundance of phytoplankton; a productive phytoplankton community supports the food web, although very shallow Secchi depths can also point to excess nutrients. Because Secchi depth stays relatively constant throughout a given year, the data is not grouped by month.
import seaborn as sns
average_secchi = water.groupby(["Year", "Site_Id"])["Secchi Depth (m)"].mean().reset_index()
sns.relplot(data = average_secchi, x = "Year", y = "Secchi Depth (m)", kind = "line", hue = "Site_Id")
sns.catplot(data = average_secchi, x = "Site_Id", y = "Secchi Depth (m)", kind = "box")
The first graph shows the range of water temperatures in degrees Celsius from 1989 to 2019, with color indicating the month and marker size indicating the site. The yearly average temperature is not shown because temperature fluctuates widely between winter and summer. The second graph compares the distribution of temperatures at each site. The temperature range stays fairly consistent across sites, at about 14-24 degrees Celsius.
import matplotlib.pyplot as plt
import seaborn as sns
# Color encodes the month and marker size encodes the site
sns.scatterplot(data = water, x = "Year", y = "Water Temp (C)", hue = "Month", size = "Site_Id")
# Place the legend outside the axes so it does not cover the points
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
sns.catplot(data = water, x = "Site_Id", y = "Water Temp (C)", kind = "box")
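Because a yearly mean hides the swing between winter and summer, one alternative is to summarize each year by its minimum and maximum temperature. The following is a minimal sketch, assuming the cleaned `water` DataFrame with the `Year` and `Water Temp (C)` columns used above. The helper name `yearly_range` is hypothetical, and the `demo` values are made up so that the snippet runs on its own.

```python
import pandas as pd

def yearly_range(df, value_col="Water Temp (C)", year_col="Year"):
    """Return the min, max, and spread (max - min) of value_col for each year."""
    summary = df.groupby(year_col)[value_col].agg(["min", "max"])
    summary["spread"] = summary["max"] - summary["min"]
    return summary

# Made-up illustrative values, NOT taken from the Back Bay dataset.
demo = pd.DataFrame({
    "Year": [1990, 1990, 1990, 1991, 1991, 1991],
    "Water Temp (C)": [4.0, 18.0, 27.0, 6.0, 20.0, 29.0],
})

temp_range = yearly_range(demo)
print(temp_range)

# With the real data, this would be: yearly_range(water)
```

A table like this makes it easier to check whether the coldest and warmest readings drift over the decades, which is hard to judge from a dense scatter plot.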
Here is a graph comparing dissolved oxygen levels to the water's temperature. There appears to be a slight negative correlation: as temperature decreases, dissolved oxygen levels increase. This makes sense because colder water can hold more dissolved oxygen.
import seaborn as sns
sns.scatterplot(data = water, x = "Water Temp (C)", y = "Dissolved Oxygen (mg/L)")
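The strength of this relationship can be quantified rather than judged by eye. The following is a minimal sketch, assuming the cleaned `water` DataFrame from the preprocessing step. The helper name `correlation` is hypothetical, and the small `demo` frame holds made-up values purely so the snippet is self-contained.

```python
import pandas as pd

def correlation(df, x, y):
    """Return the Pearson correlation coefficient between two columns of df."""
    return df[x].corr(df[y])

# Made-up illustrative values, NOT taken from the Back Bay dataset.
demo = pd.DataFrame({
    "Water Temp (C)": [5.0, 10.0, 15.0, 20.0, 25.0, 30.0],
    "Dissolved Oxygen (mg/L)": [11.0, 10.5, 8.0, 8.5, 6.0, 6.5],
})

r = correlation(demo, "Water Temp (C)", "Dissolved Oxygen (mg/L)")
# A negative r means oxygen tends to fall as temperature rises.
print(f"Pearson r = {r:.2f}")

# With the real data, this would be:
# r = correlation(water, "Water Temp (C)", "Dissolved Oxygen (mg/L)")
```

Values of r near -1 or 1 indicate a strong linear relationship, and values near 0 indicate a weak one. The same helper applies to any other pair of numeric columns, such as Secchi depth against dissolved oxygen.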
Here is a graph comparing dissolved oxygen levels to Secchi depth. There is not much correlation. I had predicted that dissolved oxygen levels would increase with Secchi depth, because sunlight penetrating deeper into the water would allow more organisms to photosynthesize.
import seaborn as sns
sns.scatterplot(data = water, x = "Secchi Depth (m)", y = "Dissolved Oxygen (mg/L)")
Here is a graph comparing salinity in ppt to the water's temperature in degrees Celsius. There is not really any correlation. The outliers do show higher salinity at lower temperatures, which is the trend I expected. However, temperature itself is an unlikely cause, since common salts do not become more soluble as water cools, so this pattern would need another explanation.
import seaborn as sns
sns.scatterplot(data = water, x = "Water Temp (C)", y = "Salinity (ppt)")
Discussion¶
Can the pH, salinity, Secchi depth, and dissolved oxygen levels of a sample predict what body of water the sample came from in any given year?
Do salinity, pH, water depth, and water temperature predict dissolved oxygen levels?
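One way to begin exploring the second question is an ordinary least squares fit. The following is a rough sketch under stated assumptions, not a finished analysis. It assumes the column names used in the plots above, omits water depth because that column was dropped during preprocessing, and uses only NumPy and pandas. The helper name `fit_linear_model` is hypothetical, and the `demo` frame is synthetic data generated only so that the snippet runs on its own.

```python
import numpy as np
import pandas as pd

def fit_linear_model(df, predictors, target):
    """Fit target ~ intercept + predictors by ordinary least squares.

    Returns (coefficients, r_squared); coefficients[0] is the intercept.
    """
    X = df[predictors].to_numpy(dtype=float)
    X = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    y = df[target].to_numpy(dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    return coef, 1.0 - ss_res / ss_tot

# Synthetic values generated only to make the snippet runnable;
# they are NOT measurements from the Back Bay dataset.
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    "Salinity (ppt)": rng.uniform(0, 5, n),
    "pH": rng.uniform(6.5, 8.5, n),
    "Water Temp (C)": rng.uniform(2, 30, n),
})
demo["Dissolved Oxygen (mg/L)"] = (
    12.0 - 0.2 * demo["Water Temp (C)"] + rng.normal(0, 0.5, n)
)

predictors = ["Salinity (ppt)", "pH", "Water Temp (C)"]
coef, r2 = fit_linear_model(demo, predictors, "Dissolved Oxygen (mg/L)")
print(dict(zip(["intercept"] + predictors, coef.round(2))), round(r2, 2))

# With the real data, this would be:
# coef, r2 = fit_linear_model(water, predictors, "Dissolved Oxygen (mg/L)")
```

The R² reported here is computed on the same rows used for fitting, so it is optimistic; holding out part of the data for evaluation would give a fairer answer to whether these variables predict dissolved oxygen.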