Introduction

This dataset contains water quality data collected by Back Bay National Wildlife Refuge from 1989 to 2019. Samples were collected once every two weeks from Back Bay and surrounding bodies of water (sites Bay, A, B, C, and D) and measured for turbidity (recorded as Secchi depth), pH, dissolved oxygen, salinity, and water temperature. The purpose of the study was to ensure that state water quality standards were being met.

Citations:
Back Bay National Wildlife Refuge. 2020. Water Quality Data. Virginia Beach, Virginia.
Virginia Department of Environmental Quality. Laws & Regulations. Virginia Department of Environmental Quality. https://www.deq.virginia.gov/laws-regulations

Preprocessing

To prepare the data for analysis, I dropped columns that were not needed, such as the extra air temperature columns, who verified the data, and the sample number. I also renamed some columns to make them easier to read. I removed one row that appears to have been entered by mistake with a sample year of 1899, and I dropped rows that contained NaN values or otherwise lacked the needed data.

In [ ]:
import pandas as pd

water = pd.read_csv("BKB_WaterQualityData_2020084.csv")
# Keep only years within the study period; this drops the row dated 1899.
water = water[water["Year"] > 1988]
# Drop columns that are not used in the analysis.
water = water.drop(["Unit_Id", "Water Depth (m)", "Air Temp-Celsius", "Air Temp (?F)", "Time (24:00)", "Field_Tech", "DateVerified", "WhoVerified", "AirTemp (C)"], axis = 1)
# Rename columns for readability.
water = water.rename(columns = {"Water Temp (?C)" : "Water Temp (C)", "pH (standard units)" : "pH"})
# Drop rows with missing values.
water = water.dropna()
water
Out[ ]:
Site_Id Read_Date Salinity (ppt) Dissolved Oxygen (mg/L) pH Secchi Depth (m) Water Temp (C) Year
0 Bay 1/3/1994 1.3 11.7 7.3 0.40 5.9 1994
1 Bay 1/31/1994 1.5 12.0 7.4 0.20 3.0 1994
2 Bay 2/7/1994 1.0 10.5 7.2 0.25 5.9 1994
3 Bay 2/23/1994 1.0 10.1 7.4 0.35 10.0 1994
4 Bay 2/28/1994 1.0 12.6 7.2 0.20 1.6 1994
... ... ... ... ... ... ... ... ...
2361 D 10/11/2018 0.0 6.0 6.5 0.70 26.0 2018
2364 D 11/7/2018 0.0 6.9 6.5 0.90 20.0 2018
2366 Bay 10/11/2018 1.9 5.0 7.0 4.00 25.0 2018
2367 Bay 10/24/2018 0.0 9.0 7.0 0.30 18.0 2018
2368 Bay 10/28/2018 0.9 2.9 7.0 0.40 13.0 2018

1320 rows × 8 columns

To account for seasonal variation, the next step extracts the month from each read date so that the data can be grouped by month.

In [ ]:
MONTH_NAMES = ["January", "February", "March", "April", "May", "June",
               "July", "August", "September", "October", "November", "December"]

def month_convert(date):
    # Read_Date is formatted as M/D/YYYY, so the month is the first field.
    month_number = int(date.split('/')[0])
    if not 1 <= month_number <= 12:
        raise ValueError(f"Unexpected month in date: {date}")
    return MONTH_NAMES[month_number - 1]
water["Month"] = water["Read_Date"].apply(month_convert)
water.drop("Read_Date", axis = 1, inplace = True)
water
Out[ ]:
Site_Id Salinity (ppt) Dissolved Oxygen (mg/L) pH Secchi Depth (m) Water Temp (C) Year Month
0 Bay 1.3 11.7 7.3 0.40 5.9 1994 January
1 Bay 1.5 12.0 7.4 0.20 3.0 1994 January
2 Bay 1.0 10.5 7.2 0.25 5.9 1994 February
3 Bay 1.0 10.1 7.4 0.35 10.0 1994 February
4 Bay 1.0 12.6 7.2 0.20 1.6 1994 February
... ... ... ... ... ... ... ... ...
2361 D 0.0 6.0 6.5 0.70 26.0 2018 October
2364 D 0.0 6.9 6.5 0.90 20.0 2018 November
2366 Bay 1.9 5.0 7.0 4.00 25.0 2018 October
2367 Bay 0.0 9.0 7.0 0.30 18.0 2018 October
2368 Bay 0.9 2.9 7.0 0.40 13.0 2018 October

1320 rows × 8 columns
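
As an alternative to splitting the date string by hand, the month name can also be derived by parsing the column as dates. The following is a minimal sketch, not the method used above: the small `sample` frame is hypothetical stand-in data, and it assumes every `Read_Date` value follows the M/D/YYYY pattern seen in the output.

```python
import pandas as pd

# Hypothetical stand-in for the real data, using the M/D/YYYY pattern
# seen in the Read_Date column.
sample = pd.DataFrame({"Read_Date": ["1/3/1994", "2/23/1994", "10/11/2018"]})

# Parse the strings as dates, then ask pandas for the English month name.
parsed = pd.to_datetime(sample["Read_Date"], format="%m/%d/%Y")
sample["Month"] = parsed.dt.month_name()

# An ordered categorical keeps months in calendar order when sorting or grouping.
months_order = ["January", "February", "March", "April", "May", "June",
                "July", "August", "September", "October", "November", "December"]
sample["Month"] = pd.Categorical(sample["Month"], categories=months_order, ordered=True)

print(sample)
```

Storing the month as an ordered categorical is a design choice: it avoids re-sorting the index by hand each time months are grouped.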

Outlier Test - Salinity (ppt)

In [ ]:
IQR_sal = water["Salinity (ppt)"].quantile(0.75) - water["Salinity (ppt)"].quantile(0.25)
outlier_sal = water[(water["Salinity (ppt)"] < water["Salinity (ppt)"].quantile(0.25) - 1.5*IQR_sal) | (water["Salinity (ppt)"] > water["Salinity (ppt)"].quantile(0.75) + 1.5*IQR_sal)]
count = outlier_sal.groupby("Month").size()
months_order = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
count.index = pd.Categorical(count.index, categories=months_order, ordered=True)
count = count.sort_index()
count
Out[ ]:
January       1
February      1
March         1
April         3
May           4
June         15
July         11
August       12
September     7
October      11
November      8
December      4
dtype: int64
In [ ]:
outlier_sal
Out[ ]:
Site_Id Salinity (ppt) Dissolved Oxygen (mg/L) pH Secchi Depth (m) Water Temp (C) Year Month
61 Bay 4.2 11.6 7.5 0.15 5.0 1990 February
68 Bay 4.1 10.8 7.0 0.25 11.0 1990 April
72 Bay 4.5 8.9 7.0 0.25 17.0 1990 May
79 Bay 3.8 7.9 7.0 0.35 26.0 1990 June
81 Bay 4.2 7.5 7.5 0.30 26.0 1990 July
... ... ... ... ... ... ... ... ...
813 Bay 4.0 3.7 7.0 0.50 27.0 2003 September
867 B 4.0 5.7 7.5 0.20 4.0 2004 December
1159 Bay 8.0 0.1 7.0 0.40 18.0 2006 October
1383 Bay 4.0 6.2 7.0 0.50 24.5 2008 June
2309 Bay 9.0 4.2 8.0 0.40 24.0 2019 June

78 rows × 8 columns
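
The same 1.5 × IQR rule is applied to every measurement in the sections that follow, so it could be wrapped in a small helper. This is a sketch under assumptions: the function name `iqr_outliers`, its parameter `k`, and the `toy` frame are illustrative and do not appear in the original notebook.

```python
import pandas as pd

def iqr_outliers(df, column, k=1.5):
    """Return the rows of df whose value in `column` falls outside the IQR fences."""
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    lower = q1 - k * iqr   # values below this fence are flagged
    upper = q3 + k * iqr   # values above this fence are flagged
    return df[(df[column] < lower) | (df[column] > upper)]

# Hypothetical toy data: five typical readings and one extreme value.
toy = pd.DataFrame({"Salinity (ppt)": [1.0, 1.2, 0.9, 1.1, 1.0, 9.0]})
flagged = iqr_outliers(toy, "Salinity (ppt)")
print(flagged)
```

With the real data, a call such as `iqr_outliers(water, "Salinity (ppt)")` would be expected to select the same rows as the expression used for `outlier_sal`.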

Outlier Test - Dissolved Oxygen (mg/L)

In [ ]:
IQR_DO = water["Dissolved Oxygen (mg/L)"].quantile(0.75) - water["Dissolved Oxygen (mg/L)"].quantile(0.25)
outlier_DO = water[(water["Dissolved Oxygen (mg/L)"] < water["Dissolved Oxygen (mg/L)"].quantile(0.25) - 1.5*IQR_DO) | (water["Dissolved Oxygen (mg/L)"] > water["Dissolved Oxygen (mg/L)"].quantile(0.75) + 1.5*IQR_DO)]
outlier_DO
Out[ ]:
Site_Id Salinity (ppt) Dissolved Oxygen (mg/L) pH Secchi Depth (m) Water Temp (C) Year Month
133 Bay 1.4 15.1 8.0 0.2 2.0 1991 December

Outlier Test - pH

In [ ]:
IQR_pH = water["pH"].quantile(0.75) - water["pH"].quantile(0.25)
outlier_pH = water[(water["pH"] < water["pH"].quantile(0.25) - 1.5*IQR_pH) | (water["pH"] > water["pH"].quantile(0.75) + 1.5*IQR_pH)]
count = outlier_pH.groupby("Month").size()
months_order = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
count.index = pd.Categorical(count.index, categories=months_order, ordered=True)
count = count.sort_index()
count
Out[ ]:
January      2
February     4
March        2
April        3
May          3
June         5
July         2
August       7
September    2
November     3
December     2
dtype: int64
In [ ]:
outlier_pH
Out[ ]:
Site_Id Salinity (ppt) Dissolved Oxygen (mg/L) pH Secchi Depth (m) Water Temp (C) Year Month
15 Bay 1.0 10.80 9.1 0.05 21.7 1994 May
22 Bay 2.0 11.00 9.4 0.50 30.5 1994 June
23 Bay 2.0 10.70 9.2 0.45 29.0 1994 August
24 Bay 2.0 10.20 9.2 0.40 22.0 1994 August
26 Bay 2.0 9.10 9.1 0.45 27.0 1994 August
27 Bay 2.3 9.80 9.3 0.45 27.0 1994 August
250 Bay 4.5 10.40 9.2 0.30 24.5 1995 September
381 Bay 1.8 1.80 9.5 0.35 27.0 1999 August
382 Bay 3.0 7.00 9.2 0.40 28.0 1999 August
388 Bay 1.0 8.70 9.1 0.50 11.0 1999 December
417 B 1.5 7.60 9.7 4.00 30.0 2000 May
441 C 1.0 8.20 9.5 0.40 28.0 2000 June
475 D 0.0 10.00 9.3 1.10 7.0 2000 December
481 Bay 1.5 7.60 9.1 0.30 18.0 2000 April
495 Bay 1.0 4.80 9.2 0.30 16.0 2000 November
496 Bay 1.0 7.80 9.9 0.40 13.0 2000 November
507 A 1.0 2.50 9.1 0.20 22.0 2001 May
512 A 1.5 4.90 9.2 0.50 26.0 2001 July
515 A 3.0 3.00 0.3 0.35 26.0 2001 September
575 D 0.0 4.10 9.3 0.80 19.5 2001 April
591 Bay 1.0 8.10 9.2 0.36 7.0 2001 January
592 Bay 1.0 9.00 9.3 0.10 8.0 2001 January
594 Bay 1.0 9.20 9.1 0.20 11.0 2001 February
595 Bay 1.2 9.25 9.2 0.35 14.0 2001 March
596 Bay 1.0 9.90 9.4 0.25 10.0 2001 March
598 Bay 2.0 6.40 9.3 0.30 20.0 2001 April
601 Bay 2.0 4.35 9.2 0.40 23.0 2001 June
604 Bay 3.0 7.20 9.8 0.30 23.0 2001 July
605 Bay 3.0 4.10 9.1 0.40 30.0 2001 August
611 Bay 3.0 5.50 9.5 0.25 13.0 2001 November
680 D 0.0 8.30 9.6 1.30 11.0 2002 February
702 Bay 3.0 5.70 9.9 0.10 12.0 2002 February
703 Bay 3.0 7.60 9.2 0.30 11.0 2002 February
749 B 0.0 8.50 9.5 0.10 25.1 2003 June
2217 A 0.1 4.70 4.8 0.50 27.0 2018 June

Outlier Test - Secchi Depth (m)

In [ ]:
IQR_secchi = water["Secchi Depth (m)"].quantile(0.75) - water["Secchi Depth (m)"].quantile(0.25)
outlier_secchi = water[(water["Secchi Depth (m)"] < water["Secchi Depth (m)"].quantile(0.25) - 1.5*IQR_secchi) | (water["Secchi Depth (m)"] > water["Secchi Depth (m)"].quantile(0.75) + 1.5*IQR_secchi)]
count = outlier_secchi.groupby("Month").size()
months_order = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
count.index = pd.Categorical(count.index, categories=months_order, ordered=True)
count = count.sort_index()
count
Out[ ]:
January      11
February     13
March         9
April         3
May           4
June          1
July          3
August        2
September     2
October       7
November      7
December     12
dtype: int64
In [ ]:
outlier_secchi
Out[ ]:
Site_Id Salinity (ppt) Dissolved Oxygen (mg/L) pH Secchi Depth (m) Water Temp (C) Year Month
301 Bay 1.5 10.90 6.6 2.50 10.1 1997 February
362 C 1.0 2.40 8.0 5.80 34.0 1999 September
371 D 0.0 1.50 7.4 1.75 34.5 1999 September
375 D 0.0 6.50 8.0 1.05 15.0 1999 November
378 D 0.0 8.25 8.0 1.10 5.0 1999 December
... ... ... ... ... ... ... ... ...
2205 D 0.0 9.90 6.5 1.30 7.0 2017 December
2215 A 0.1 4.00 6.5 5.50 26.0 2018 May
2241 D 0.0 10.00 6.5 1.60 4.0 2018 January
2242 D 0.0 9.50 6.5 1.30 6.0 2018 February
2366 Bay 1.9 5.00 7.0 4.00 25.0 2018 October

74 rows × 8 columns

Outlier Test - Water Temp (C)

In [ ]:
IQR_temp = water["Water Temp (C)"].quantile(0.75) - water["Water Temp (C)"].quantile(0.25)
outlier_temp = water[(water["Water Temp (C)"] < water["Water Temp (C)"].quantile(0.25) - 1.5*IQR_temp) | (water["Water Temp (C)"] > water["Water Temp (C)"].quantile(0.75) + 1.5*IQR_temp)]
outlier_temp
Out[ ]:
Site_Id Salinity (ppt) Dissolved Oxygen (mg/L) pH Secchi Depth (m) Water Temp (C) Year Month
701 Bay 3.0 9.0 9.0 0.3 59.0 2002 January
822 A 1.0 7.0 7.0 0.8 60.0 2004 February
1038 Bay 0.0 5.0 6.5 0.1 74.0 2005 September
1232 D 0.0 7.6 5.0 1.2 54.0 2007 January
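
To see which thresholds each outlier test is using, the fences for several columns can be computed in one pass. The sketch below uses hypothetical toy values (not the refuge data) and illustrative names (`fences`, `outside`, `summary`); it applies the same quartile-based rule as the cells above.

```python
import pandas as pd

# Hypothetical toy readings; each column contains one extreme value.
toy = pd.DataFrame({
    "pH": [7.0, 7.2, 7.1, 7.3, 7.0, 0.3],
    "Water Temp (C)": [5.0, 12.0, 18.0, 25.0, 27.0, 74.0],
})

# Quartiles for every column at once: rows are 0.25 and 0.75.
q = toy.quantile([0.25, 0.75])
iqr = q.loc[0.75] - q.loc[0.25]

# One row per measurement with its lower and upper fence.
fences = pd.DataFrame({
    "lower": q.loc[0.25] - 1.5 * iqr,
    "upper": q.loc[0.75] + 1.5 * iqr,
})

# Boolean frame marking values outside the fences, then a count per column.
outside = toy.lt(fences["lower"], axis="columns") | toy.gt(fences["upper"], axis="columns")
summary = fences.assign(n_outliers=outside.sum())
print(summary)
```

Listing the fences next to the counts makes it easier to judge whether a flagged value is a plausible extreme or a likely data entry problem.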

Summary Data Analysis

Below is a graph of the distribution of the number of samples taken from each site.

In [ ]:
import matplotlib.pyplot as plt
# Count samples per site and show the sites in a fixed order.
counts = water["Site_Id"].value_counts()
counts = counts.loc[['Bay', 'A', 'B', 'C', 'D']]
counts.plot(kind = "bar")
plt.show()
[Figure: bar chart of the number of samples taken at each site]

Below are the averages of each quantitative column for each site over all the years in which data was collected.

In [ ]:
# Avoid the name "filter", which would shadow Python's built-in function.
measurements = water.drop(["Month", "Year"], axis = 1)
averages = measurements.groupby("Site_Id").mean()
averages = averages.loc[['Bay', 'A', 'B', 'C', 'D']]
averages
Out[ ]:
Salinity (ppt) Dissolved Oxygen (mg/L) pH Secchi Depth (m) Water Temp (C)
Site_Id
Bay 1.494714 7.677107 7.382452 0.405044 17.422482
A 0.345875 5.372813 6.936875 0.608437 19.500000
B 0.440588 5.383294 7.307647 0.304882 18.501176
C 0.499159 5.591589 7.318692 0.480935 18.336449
D 0.088317 6.386139 6.726733 0.893218 18.272772

Here are the averages of each quantitative value for the site Bay grouped by month for the years 1989-2019.

In [ ]:
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
site_Bay = water[water["Site_Id"] == "Bay"]
numeric_columns = site_Bay.select_dtypes(include = [np.number]).columns.tolist()
# "Month" already holds month names, so no number-to-name mapping is needed here.
averages = site_Bay.groupby("Month")[numeric_columns].mean()
averages = averages.drop("Year", axis = 1)
graph = sns.relplot(data = averages, kind = "line")
graph.set_xticklabels(rotation = 45)
graph
Out[ ]:
<seaborn.axisgrid.FacetGrid at 0x149dd66d0>
[Figure: line plot of the monthly averages of each measurement for site Bay]
In [ ]:
averages
Out[ ]:
Salinity (ppt) Dissolved Oxygen (mg/L) pH Secchi Depth (m) Water Temp (C)
Month
April 1.442687 8.267910 7.114179 0.298060 15.613433
August 1.722203 6.455085 8.030508 0.495085 25.600000
December 1.115217 9.014130 7.095652 0.399783 8.693696
February 1.166071 9.002679 7.116071 0.360357 8.085714
January 1.088000 9.283000 7.136000 0.274600 8.728000
July 1.547273 6.228182 7.555455 0.473273 26.909091
June 1.930769 6.813077 7.587692 0.539923 24.486154
March 1.301786 8.899107 7.083929 0.254018 11.051786
May 1.380000 7.603770 7.344262 0.344344 20.303279
November 1.508929 8.223750 7.302679 0.507321 13.626786
October 1.921429 7.099107 7.419643 0.482143 18.030357
September 1.645370 5.642593 7.729630 0.418333 24.394444
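
Because `groupby` sorts string keys, the months in the table above appear alphabetically rather than in calendar order. One way to restore calendar order is to reindex the result, sketched below with a hypothetical four-month excerpt (the values are rounded from the table above; the variable `ordered` is illustrative).

```python
import pandas as pd

months_order = ["January", "February", "March", "April", "May", "June",
                "July", "August", "September", "October", "November", "December"]

# Hypothetical excerpt of a monthly-average table in alphabetical order.
averages = pd.DataFrame(
    {"Water Temp (C)": [15.6, 25.6, 8.7, 8.1]},
    index=pd.Index(["April", "August", "December", "February"], name="Month"),
)

# Reindex with the calendar order, keeping only months that are present.
ordered = averages.reindex([m for m in months_order if m in averages.index])
print(ordered)
```

Plotting `ordered` instead of the alphabetical table would place the months left to right in calendar order on the x-axis.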

Here are the averages of each quantitative value grouped by month for site A from the years 2000-2019.

In [ ]:
import numpy as np
import matplotlib.pyplot as plt
# Select site A only (variable renamed so it no longer refers to Bay).
site_A = water[water["Site_Id"] == "A"]
numeric_columns = site_A.select_dtypes(include=[np.number]).columns.tolist()
averages = site_A.groupby("Month")[numeric_columns].mean()
averages = averages.drop('Year', axis=1)
graph = sns.relplot(data = averages, kind = "line")
graph.set_xticklabels(rotation = 45)
graph
Out[ ]:
<seaborn.axisgrid.FacetGrid at 0x14f994c10>
[Figure: line plot of the monthly averages of each measurement for site A]
In [ ]:
averages
Out[ ]:
Salinity (ppt) Dissolved Oxygen (mg/L) pH Secchi Depth (m) Water Temp (C)
Month
April 0.100000 4.490000 6.700000 0.430000 18.500000
August 0.400000 3.866667 7.046667 0.466667 27.400000
December 0.750000 7.550000 7.012500 0.718750 8.500000
February 0.366667 7.040000 7.006667 0.733333 12.666667
January 0.416667 7.316667 6.891667 0.754167 7.916667
July 0.426923 3.976923 7.153846 0.607692 27.923077
June 0.318750 4.512500 7.062500 0.578125 25.468750
March 0.208333 6.583333 6.750000 0.637500 11.708333
May 0.162500 4.162500 7.025000 0.875000 23.062500
November 0.285714 6.517857 6.928571 0.528571 14.785714
October 0.323077 5.665385 6.961538 0.480769 21.000000
September 0.493125 4.284375 6.650000 0.487500 25.687500

Here are the averages of each quantitative value grouped by month for site B from the years 2000-2019.

In [ ]:
import numpy as np
import matplotlib.pyplot as plt
# Select site B only (variable renamed so it no longer refers to Bay).
site_B = water[water["Site_Id"] == "B"]
numeric_columns = site_B.select_dtypes(include=[np.number]).columns.tolist()
averages = site_B.groupby("Month")[numeric_columns].mean()
averages = averages.drop('Year', axis=1)
graph = sns.relplot(data = averages, kind = "line")
graph.set_xticklabels(rotation = 45)
graph
Out[ ]:
<seaborn.axisgrid.FacetGrid at 0x139f406d0>
[Figure: line plot of the monthly averages of each measurement for site B]
In [ ]:
averages
Out[ ]:
Salinity (ppt) Dissolved Oxygen (mg/L) pH Secchi Depth (m) Water Temp (C)
Month
April 0.500000 5.975000 7.583333 0.329167 16.833333
August 0.423077 3.623077 7.476923 0.258462 26.769231
December 0.609091 6.136364 7.027273 0.311818 9.318182
February 0.500000 7.561538 7.076923 0.346154 9.884615
January 0.423077 6.969231 6.992308 0.334615 6.615385
July 0.166667 3.416667 7.083333 0.433333 27.875000
June 0.452941 4.505882 7.664706 0.220588 26.182353
March 0.250000 7.414286 7.092857 0.321429 11.464286
May 0.340000 5.024000 7.993333 0.544000 23.306667
November 0.593750 6.050000 7.237500 0.228125 14.375000
October 0.452941 5.329412 7.041176 0.232353 19.588235
September 0.541176 3.282353 7.288235 0.178235 25.029412

Here are the averages of each quantitative value grouped by month for site C from the years 2000-2019.

In [ ]:
import numpy as np
import matplotlib.pyplot as plt
# Select site C only (variable renamed so it no longer refers to Bay).
site_C = water[water["Site_Id"] == "C"]
numeric_columns = site_C.select_dtypes(include=[np.number]).columns.tolist()
averages = site_C.groupby("Month")[numeric_columns].mean()
averages = averages.drop('Year', axis=1)
graph = sns.relplot(data = averages, kind = "line")
plt.xticks(rotation=45)
graph
Out[ ]:
<seaborn.axisgrid.FacetGrid at 0x14fb7e910>
[Figure: line plot of the monthly averages of each measurement for site C]
In [ ]:
averages
Out[ ]:
Salinity (ppt) Dissolved Oxygen (mg/L) pH Secchi Depth (m) Water Temp (C)
Month
April 0.200000 6.050000 7.260000 0.377000 17.300000
August 0.611111 4.577778 7.122222 0.366667 28.222222
December 0.510000 6.360000 6.950000 0.547000 7.500000
February 0.666667 7.277778 7.333333 0.538889 8.333333
January 1.000000 5.600000 7.160000 0.490000 8.500000
July 0.126250 4.462500 7.337500 0.412500 28.875000
June 0.636364 4.890909 7.945455 0.422727 26.090909
March 0.500000 7.350000 7.100000 0.600000 8.500000
May 1.000000 5.285714 7.585714 0.317143 20.857143
November 0.220000 6.320000 7.020000 0.375000 15.800000
October 0.509091 5.100000 7.545455 0.327273 20.954545
September 0.333333 3.877778 7.322222 1.033333 24.666667

Here are the averages of each quantitative value grouped by month for site D from the years 2000-2019.

In [ ]:
import numpy as np
import matplotlib.pyplot as plt
# Select site D only (variable renamed so it no longer refers to Bay).
site_D = water[water["Site_Id"] == "D"]
numeric_columns = site_D.select_dtypes(include=[np.number]).columns.tolist()
averages = site_D.groupby("Month")[numeric_columns].mean()
averages = averages.drop('Year', axis=1)
graph = sns.relplot(data = averages, kind = "line")
graph.set_xticklabels(rotation = 45)
graph
Out[ ]:
<seaborn.axisgrid.FacetGrid at 0x149dd6550>
[Figure: line plot of the monthly averages of each measurement for site D]
In [ ]:
averages
Out[ ]:
Salinity (ppt) Dissolved Oxygen (mg/L) pH Secchi Depth (m) Water Temp (C)
Month
April 0.055556 6.677778 6.688889 0.866667 16.805556
August 0.011765 4.411765 6.682353 0.701765 27.235294
December 0.238095 8.292857 6.652381 1.061905 9.261905
February 0.055556 8.905556 6.777778 1.138889 7.444444
January 0.076923 8.553846 6.584615 1.238462 11.076923
July 0.058824 4.305882 6.741176 0.711765 28.176471
June 0.000000 4.866667 6.713333 0.666667 25.066667
March 0.083333 9.025000 6.666667 1.166667 10.000000
May 0.015000 5.668750 6.706250 0.712500 21.150000
November 0.133333 6.886667 6.733333 0.960000 15.066667
October 0.161905 5.483333 6.819048 0.885714 20.904762
September 0.105263 4.500000 6.878947 0.710526 24.984211
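
The five per-site cells above repeat the same steps with a different site label each time. A possible consolidation is sketched below; the helper name `monthly_site_averages`, the `tables` dictionary, and the toy data are assumptions for illustration, and the plotting step is left out to keep the example short.

```python
import pandas as pd

def monthly_site_averages(df, site, value_columns):
    """Mean of each value column per month, restricted to one site."""
    site_rows = df[df["Site_Id"] == site]
    return site_rows.groupby("Month")[value_columns].mean()

# Hypothetical toy data with the same column names as the cleaned frame.
toy = pd.DataFrame({
    "Site_Id": ["A", "A", "A", "B", "B"],
    "Month": ["January", "January", "June", "January", "June"],
    "pH": [6.8, 7.0, 7.2, 7.4, 7.6],
    "Water Temp (C)": [8.0, 10.0, 25.0, 7.0, 26.0],
})

value_columns = ["pH", "Water Temp (C)"]

# One table of monthly averages per site, keyed by site label.
tables = {site: monthly_site_averages(toy, site, value_columns) for site in ["A", "B"]}
print(tables["A"])
```

Passing the value columns explicitly also removes the need to compute averages for `Year` and then drop that column afterwards.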

Here is a line graph showing the average salinity (ppt) per year from 1989-2019, color coded by site. The second graph is a box plot comparing the yearly mean salinity (ppt) across sites. Because salinity stays relatively constant throughout a given year, the data is not grouped by month.

In [ ]:
import seaborn as sns
average_salinity = water.groupby(['Year', 'Site_Id'])['Salinity (ppt)'].mean().reset_index()
sns.relplot(data = average_salinity, x = "Year", y = "Salinity (ppt)", kind = "line", hue = "Site_Id")
sns.catplot(data = average_salinity, x = "Site_Id", y = "Salinity (ppt)", kind = "box")
Out[ ]:
<seaborn.axisgrid.FacetGrid at 0x14fcd6590>
[Figure: line plot of average yearly salinity (ppt), color coded by site]
[Figure: box plot of yearly mean salinity (ppt) for each site]

Here is a graph showing trends in dissolved oxygen (mg/L) from 1999-2019, color coded by site. Because dissolved oxygen varies seasonally, the graph is split into one panel per month. The second graph is a box plot comparing the monthly mean dissolved oxygen (mg/L) across all five sites. The third graph shows dissolved oxygen levels for site Bay from 1989-2019. It is plotted separately because from 1989-1998 the dissolved oxygen levels were abnormally, but consistently, high; in 1999 they dropped and then stayed within a lower range.

In [ ]:
import seaborn as sns
average_DO = water.groupby(["Year", "Month", "Site_Id"])["Dissolved Oxygen (mg/L)"].mean().reset_index()
bay = average_DO[average_DO["Site_Id"] == "Bay"]
average_DO = average_DO[average_DO["Year"] > 1998]
sns.relplot(data = average_DO, x = "Year", y = "Dissolved Oxygen (mg/L)", kind = "line", hue = "Site_Id", col = "Month", col_wrap = 4, height = 4)
sns.catplot(data = average_DO, x = "Site_Id", y = "Dissolved Oxygen (mg/L)", kind = "box")
sns.relplot(data = bay, x = "Year", y = "Dissolved Oxygen (mg/L)", kind = "line")
/Users/driscoll/mambaforge/envs/219/lib/python3.11/site-packages/seaborn/_base.py:949: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning.
  data_subset = grouped_data.get_group(pd_key)
/Users/driscoll/mambaforge/envs/219/lib/python3.11/site-packages/seaborn/_base.py:949: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning.
  data_subset = grouped_data.get_group(pd_key)
/Users/driscoll/mambaforge/envs/219/lib/python3.11/site-packages/seaborn/_base.py:949: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning.
  data_subset = grouped_data.get_group(pd_key)
/Users/driscoll/mambaforge/envs/219/lib/python3.11/site-packages/seaborn/_base.py:949: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning.
  data_subset = grouped_data.get_group(pd_key)
/Users/driscoll/mambaforge/envs/219/lib/python3.11/site-packages/seaborn/_base.py:949: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning.
  data_subset = grouped_data.get_group(pd_key)
/Users/driscoll/mambaforge/envs/219/lib/python3.11/site-packages/seaborn/_base.py:949: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning.
  data_subset = grouped_data.get_group(pd_key)
/Users/driscoll/mambaforge/envs/219/lib/python3.11/site-packages/seaborn/_base.py:949: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning.
  data_subset = grouped_data.get_group(pd_key)
/Users/driscoll/mambaforge/envs/219/lib/python3.11/site-packages/seaborn/_base.py:949: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning.
  data_subset = grouped_data.get_group(pd_key)
/Users/driscoll/mambaforge/envs/219/lib/python3.11/site-packages/seaborn/_base.py:949: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning.
  data_subset = grouped_data.get_group(pd_key)
/Users/driscoll/mambaforge/envs/219/lib/python3.11/site-packages/seaborn/_base.py:949: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning.
  data_subset = grouped_data.get_group(pd_key)
/Users/driscoll/mambaforge/envs/219/lib/python3.11/site-packages/seaborn/_base.py:949: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning.
  data_subset = grouped_data.get_group(pd_key)
/Users/driscoll/mambaforge/envs/219/lib/python3.11/site-packages/seaborn/_base.py:949: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning.
  data_subset = grouped_data.get_group(pd_key)
/Users/driscoll/mambaforge/envs/219/lib/python3.11/site-packages/seaborn/_base.py:949: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning.
  data_subset = grouped_data.get_group(pd_key)
/Users/driscoll/mambaforge/envs/219/lib/python3.11/site-packages/seaborn/_base.py:949: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning.
  data_subset = grouped_data.get_group(pd_key)
/Users/driscoll/mambaforge/envs/219/lib/python3.11/site-packages/seaborn/_base.py:949: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning.
  data_subset = grouped_data.get_group(pd_key)
/Users/driscoll/mambaforge/envs/219/lib/python3.11/site-packages/seaborn/_base.py:949: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning.
  data_subset = grouped_data.get_group(pd_key)
/Users/driscoll/mambaforge/envs/219/lib/python3.11/site-packages/seaborn/_base.py:949: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning.
  data_subset = grouped_data.get_group(pd_key)
/Users/driscoll/mambaforge/envs/219/lib/python3.11/site-packages/seaborn/_base.py:949: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning.
  data_subset = grouped_data.get_group(pd_key)
/Users/driscoll/mambaforge/envs/219/lib/python3.11/site-packages/seaborn/_base.py:949: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning.
  data_subset = grouped_data.get_group(pd_key)
/Users/driscoll/mambaforge/envs/219/lib/python3.11/site-packages/seaborn/_base.py:949: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning.
  data_subset = grouped_data.get_group(pd_key)
/Users/driscoll/mambaforge/envs/219/lib/python3.11/site-packages/seaborn/_base.py:949: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning.
  data_subset = grouped_data.get_group(pd_key)
/Users/driscoll/mambaforge/envs/219/lib/python3.11/site-packages/seaborn/_base.py:949: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning.
  data_subset = grouped_data.get_group(pd_key)
/Users/driscoll/mambaforge/envs/219/lib/python3.11/site-packages/seaborn/_base.py:949: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning.
  data_subset = grouped_data.get_group(pd_key)
/Users/driscoll/mambaforge/envs/219/lib/python3.11/site-packages/seaborn/_base.py:949: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning.
  data_subset = grouped_data.get_group(pd_key)
/Users/driscoll/mambaforge/envs/219/lib/python3.11/site-packages/seaborn/_base.py:949: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning.
  data_subset = grouped_data.get_group(pd_key)
/Users/driscoll/mambaforge/envs/219/lib/python3.11/site-packages/seaborn/categorical.py:640: FutureWarning: SeriesGroupBy.grouper is deprecated and will be removed in a future version of pandas.
  positions = grouped.grouper.result_index.to_numpy(dtype=float)
Out[ ]:
<seaborn.axisgrid.FacetGrid at 0x17898d710>
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image

The first graph shows the average pH for each year from 1989 to 2019, color-coded by site. The second graph is a box plot of those yearly averages for each site. Because pH stays relatively constant throughout a given year, the data is not grouped by month.

In [ ]:
import seaborn as sns
average_pH = water.groupby(["Year", "Site_Id"])["pH"].mean().reset_index()
sns.relplot(data = average_pH, x = "Year", y = "pH", kind = "line", hue = "Site_Id")
sns.catplot(data = average_pH, x = "Site_Id", y = "pH", kind = "box")
Out[ ]:
<seaborn.axisgrid.FacetGrid at 0x178c5d990>
No description has been provided for this image
No description has been provided for this image
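
One way to check the assumption that pH varies little within a year is to compare the monthly means inside each year. The sketch below is only illustrative: `within_year_spread` is a hypothetical helper, the small `demo` frame is made-up data standing in for `water`, and it assumes the `Year` and `Month` columns used elsewhere in this notebook.

```python
import pandas as pd

def within_year_spread(df: pd.DataFrame, value: str) -> pd.Series:
    """Range (max - min) of the monthly means of `value` within each year."""
    monthly = df.groupby(["Year", "Month"])[value].mean()
    by_year = monthly.groupby(level="Year")
    return by_year.max() - by_year.min()

# Made-up values for illustration only; replace `demo` with `water`.
demo = pd.DataFrame({
    "Year":  [2000, 2000, 2000, 2001, 2001, 2001],
    "Month": [1, 1, 7, 1, 7, 7],
    "pH":    [7.0, 7.2, 7.5, 6.8, 7.0, 7.4],
})
spread = within_year_spread(demo, "pH")
print(spread)
```

A small spread relative to the year-to-year differences would support plotting yearly averages; a large one would suggest grouping by month as well. The same helper could be applied to other columns, such as "Secchi Depth (m)".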

The first graph shows the average Secchi depth in meters for each year from 1989 to 2019, color-coded by site. Secchi depth is a measure of water transparency: the depth at which a Secchi disk lowered into the water is no longer visible from the surface. A shallow Secchi depth indicates cloudy water, which can be caused by suspended sediment or by an abundance of phytoplankton. Phytoplankton supports the food web, but a shallow Secchi depth alone does not establish that the environment is healthy. Because Secchi depth stays relatively constant throughout a given year, the data is not grouped by month.

In [ ]:
import seaborn as sns
average_secchi = water.groupby(["Year", "Site_Id"])["Secchi Depth (m)"].mean().reset_index()
sns.relplot(data = average_secchi, x = "Year", y = "Secchi Depth (m)", kind = "line", hue = "Site_Id")
sns.catplot(data = average_secchi, x = "Site_Id", y = "Secchi Depth (m)", kind = "box")
Out[ ]:
<seaborn.axisgrid.FacetGrid at 0x178c67e90>
No description has been provided for this image
No description has been provided for this image

The first graph shows the water temperatures in degrees Celsius recorded from 1989 to 2019, with color indicating month and marker size indicating site. Yearly averages are not shown because temperature fluctuates strongly between winter and summer. The second graph is a box plot comparing the distribution of temperatures at each site. Across sites, the bulk of the temperatures falls in a similar range of about 14-24 degrees Celsius.

In [ ]:
import matplotlib.pyplot as plt
import seaborn as sns

# Scatter of every sample: color encodes month, marker size encodes site
sns.scatterplot(data = water, x = "Year", y = "Water Temp (C)", hue = "Month", size = "Site_Id")
# Move the legend outside the axes so it does not cover the points
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
# Box plot of the temperature distribution at each site
sns.catplot(data = water, x = "Site_Id", y = "Water Temp (C)", kind = "box")
No description has been provided for this image
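
The range read off a box plot can also be computed directly. The sketch below tabulates the 25th and 75th percentiles of temperature per site, which correspond to the edges of each box. `iqr_by_site` is a hypothetical helper and `demo` is made-up data standing in for `water`.

```python
import pandas as pd

def iqr_by_site(df: pd.DataFrame, value: str) -> pd.DataFrame:
    """25th and 75th percentiles of `value` for each Site_Id."""
    q = df.groupby("Site_Id")[value].quantile([0.25, 0.75]).unstack()
    q.columns = ["q25", "q75"]
    return q

# Made-up values for illustration only; replace `demo` with `water`.
demo = pd.DataFrame({
    "Site_Id": ["A"] * 5 + ["B"] * 5,
    "Water Temp (C)": [10.0, 14.0, 18.0, 22.0, 26.0,
                       12.0, 16.0, 20.0, 24.0, 28.0],
})
ranges = iqr_by_site(demo, "Water Temp (C)")
print(ranges)
```

Comparing the `q25` and `q75` columns across sites gives a numeric version of what the box plot shows visually.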

This graph compares dissolved oxygen levels to water temperature. There appears to be a slight negative correlation: as temperature decreases, dissolved oxygen levels increase. This makes sense because colder water can hold more dissolved oxygen.

In [ ]:
import seaborn as sns
sns.scatterplot(data = water, x = "Water Temp (C)", y = "Dissolved Oxygen (mg/L)")
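
The strength of a relationship like this can be quantified with the Pearson correlation coefficient, which pandas exposes as `Series.corr`. The sketch below is illustrative: `pearson_r` is a hypothetical helper and `demo` is made-up data standing in for `water`, so the printed value says nothing about the real dataset.

```python
import pandas as pd

def pearson_r(df: pd.DataFrame, x: str, y: str) -> float:
    """Pearson correlation between two columns (missing pairs are ignored)."""
    return df[x].corr(df[y])

# Made-up values for illustration only; replace `demo` with `water`.
demo = pd.DataFrame({
    "Water Temp (C)":          [5.0, 10.0, 15.0, 20.0, 25.0],
    "Dissolved Oxygen (mg/L)": [12.0, 11.0, 9.5, 9.0, 7.5],
})
r = pearson_r(demo, "Water Temp (C)", "Dissolved Oxygen (mg/L)")
print(f"r = {r:.2f}")
```

A value near -1 indicates a strong negative linear relationship, a value near 0 indicates little linear relationship, and the same call works for any other pair of numeric columns.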

This graph compares dissolved oxygen levels to Secchi depth. There is little correlation. I had predicted that dissolved oxygen would increase with Secchi depth, because sunlight penetrating deeper into the water would allow more organisms to photosynthesize.

In [ ]:
import seaborn as sns
sns.scatterplot(data = water, x = "Secchi Depth (m)", y = "Dissolved Oxygen (mg/L)")

This graph compares salinity in ppt to water temperature in degrees Celsius. There is little overall correlation. The outliers, however, show higher salinity at lower temperatures. This is probably not a solubility effect, since salt solubility does not increase as water cools, so any seasonal cause would need to be checked separately.

In [ ]:
import seaborn as sns
sns.scatterplot(data = water, x = "Water Temp (C)", y = "Salinity (ppt)")  

Discussion¶

Can the pH, salinity, Secchi depth, and dissolved oxygen levels of a sample predict what body of water the sample came from in any given year?

Do salinity, pH, water depth, and water temperature predict dissolved oxygen levels?
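
As a possible starting point for the second question, the sketch below fits an ordinary least squares model with NumPy. It is an assumption-laden illustration, not an analysis of the real data: `fit_linear` is a hypothetical helper, `demo` is made-up data constructed so that dissolved oxygen depends only on temperature, and the reported R² is in-sample. Note that the preprocessing step dropped "Water Depth (m)", so that column would have to be kept to use it as a predictor.

```python
import numpy as np
import pandas as pd

def fit_linear(df: pd.DataFrame, predictors: list, target: str):
    """Least squares fit of `target` on `predictors` plus an intercept.

    Returns (coefficients as a Series, in-sample R^2).
    """
    X = df[predictors].to_numpy(dtype=float)
    X = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    y = df[target].to_numpy(dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    coefs = pd.Series(beta, index=["intercept", *predictors])
    return coefs, 1.0 - ss_res / ss_tot

# Made-up values for illustration only; replace `demo` with `water`.
demo = pd.DataFrame({
    "Water Temp (C)": [5.0, 10.0, 15.0, 20.0, 25.0, 30.0],
    "Salinity (ppt)": [1.0, 0.5, 2.0, 1.5, 0.0, 1.0],
})
demo["Dissolved Oxygen (mg/L)"] = 14.0 - 0.2 * demo["Water Temp (C)"]

coefs, r2 = fit_linear(
    demo, ["Water Temp (C)", "Salinity (ppt)"], "Dissolved Oxygen (mg/L)"
)
print(coefs.round(3))
print(round(r2, 3))
```

To judge whether the variables actually predict dissolved oxygen, the fit would need to be evaluated on samples held out from the fitting step rather than on the same rows.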