Water quality

Get the lesson R script: wq.R

Get the lesson data: download zip

Goals and motivation

This module will provide you with background into the structure of the peptools R package and how to use functions in this package to read, analyze, and plot water quality indicators for the Peconic Estuary Program. You should be able to answer these questions at the end of this session:

  • What is the peptools package and how are the functions setup?
  • How can I use the peptools package to import water quality data?
  • What functions are available for summarizing water quality data?
  • What functions are available for plotting water quality data?

peptools overview

The peptools package was developed as a set of functions to read, analyze, and plot water quality data relevant for the PEP. the package source code lives on GitHub. A website that includes function references and full documentation is also provided. The following shows all functions provided in peptools, organized by prefix.

Read

  • read_pepdo(): Import dissolved oxygen data

  • read_pepwq(): Load local water quality file

  • read_pepent(): Load Enterococcus beach data file

Analyze

  • anlz_attainpep(): Get attainment categories

  • anlz_dodlypep(): Analyze daily DO values relative to threshold

  • anlz_domopep(): Analyze monthly DO values relative to threshold

  • anlz_entpep(): Count beach exceedances for enterococcus

  • anlz_medpep(): Estimate annual medians

Show

  • show_boxpep(): Plot monthly chlorophyll or secchi depth values for a bay segment

  • show_entmatrix(): Create a colorized table for beach closure reporting

  • show_matrixpep(): Create a colorized table for indicator reporting

  • show_reactablepep(): Create a reactable table for reporting matrices, used internally only

  • show_sitemappep(): Map water quality data for a selected year

  • show_thrpep(): Plot annual water quality values and thresholds for a bay segment

  • show_plotlypep(): Plot chlorophyll and secchi data together with matrix outcomes

  • show_wqmatrixpep(): Create a colorized table for chlorophyll or secchi exceedances

  • show_segmatrixpep(): Create a colorized table for water quality outcomes and exceedances by segment

Basic use

Before we use the package, we’ll load it into our current R session as follows. We’ll also load the mapview package. You should have already installed these packages in the previous lesson.

library(peptools)
library(mapview)

The package includes a pepstations data object that includes metadata for each station, including lat/lon and bay segment.

pepstations
## # A tibble: 46 x 5
##    BayStation StationName         bay_segment Longitude Latitude
##  * <chr>      <chr>               <fct>           <dbl>    <dbl>
##  1 60280      Peconic River       Western         -72.7     40.9
##  2 60275      Peconic River       Western         -72.7     40.9
##  3 60270      Peconic River       Western         -72.6     40.9
##  4 60265      Peconic River       Western         -72.6     40.9
##  5 60266      Peconic River       Western         -72.6     40.9
##  6 60260      Peconic River       Western         -72.6     40.9
##  7 60250      Sawmill Creek       Western         -72.6     40.9
##  8 60240      Peconic River Mouth Western         -72.6     40.9
##  9 60230      Terrys Creek        Western         -72.6     40.9
## 10 60210      Reeves Bay          Western         -72.6     40.9
## # ... with 36 more rows

The pepseg data object also included with the package shows the polygons for the bay segments. These data are included as a geographic data object, so they can be viewed with the mapview package.

mapview(pepseg)

The water quality data can be imported using the read_pepwq() function. The data are included in this lesson and it was obtained as a raw data file from here. To original data file is named “Peconics SCDHS WQ data - up to 2019 so far.xlsx”, or something similar depending on when the data were downloaded. The location of this file on your computer is passed to the import function. Below, a local file on your computer renamed as “currentdata.xlsx” that contains the water quality data is imported.

rawdat <- read_pepwq('data/currentdata.xlsx')
head(rawdat)
## # A tibble: 6 x 11
##   Date       BayStation name  value status    yr    mo StationName   bay_segment
##   <date>     <chr>      <chr> <dbl> <chr>  <dbl> <dbl> <chr>         <fct>      
## 1 1990-01-31 60113      chla    4.8 ""      1990     1 Little Pecon~ Central    
## 2 1990-01-31 60113      sd      8   ""      1990     1 Little Pecon~ Central    
## 3 1990-01-31 60114      chla    2.8 ""      1990     1 Paradise Poi~ Central    
## 4 1990-01-31 60114      sd     12   ""      1990     1 Paradise Poi~ Central    
## 5 1990-01-31 60115      chla    8.5 ""      1990     1 Orient Harbor Eastern    
## 6 1990-01-31 60115      sd      7   ""      1990     1 Orient Harbor Eastern    
## # ... with 2 more variables: Longitude <dbl>, Latitude <dbl>

The raw data includes multiple fields, but only the chlorophyll and secchi data are retained for reporting. The data are in long format with the name column showing which observation (chlorophyll or secchi) the row value shows and the status showing if the observation was above or below detection (indicated as > or <). Each station is also grouped by major bay segment, defined as Western, Central, Eastern.

The function anlz_medpep() summarizes the station data by bay segment. The function returns annual means for chlorophyll and secchi depth and monthly means by year for chlorophyll and secchi depth. These summaries are then used to determine if bay segment targets for water quality are met using the anlz_attainpep() function.

Below shows how to use anlz_medpep() to summarize the data by bay segment to estimate annual and monthly means for chlorophyll and secchi depth. The output is a two-element list for the annual (ann) and monthly (mos) means by segment.

medpep <- anlz_medpep(rawdat)
medpep
## $ann
## # A tibble: 540 x 5
##    bay_segment    yr   val est    var  
##    <fct>       <dbl> <dbl> <chr>  <chr>
##  1 Western      2019  3.03 lwr.ci chla 
##  2 Western      2019  4.22 medv   chla 
##  3 Western      2019  6.26 upr.ci chla 
##  4 Western      2018  3.89 lwr.ci chla 
##  5 Western      2018  4.82 medv   chla 
##  6 Western      2018  6    upr.ci chla 
##  7 Western      2017  3.58 lwr.ci chla 
##  8 Western      2017  5.29 medv   chla 
##  9 Western      2017  7.04 upr.ci chla 
## 10 Western      2016  3.14 lwr.ci chla 
## # ... with 530 more rows
## 
## $mos
## # A tibble: 2,160 x 6
##    bay_segment    yr    mo   val est   var  
##    <fct>       <dbl> <dbl> <dbl> <chr> <chr>
##  1 Western      2019    12 NA    medv  chla 
##  2 Western      2019    11 NA    medv  chla 
##  3 Western      2019    10 NA    medv  chla 
##  4 Western      2019     9 NA    medv  chla 
##  5 Western      2019     8 NA    medv  chla 
##  6 Western      2019     7  2.06 medv  chla 
##  7 Western      2019     6  4.64 medv  chla 
##  8 Western      2019     5  2.45 medv  chla 
##  9 Western      2019     4 NA    medv  chla 
## 10 Western      2019     3  3.49 medv  chla 
## # ... with 2,150 more rows

This output can then be further analyzed with anlz_attainpep() to determine if the bay segment outcomes are met in each year. The results are used by the plotting functions described below. In short, the chl_sd column indicates the categorical outcome for chlorophyll and light attenuation for each segment. The outcomes are integer values from zero to three. The relative exceedances of water quality thresholds for each segment, both in duration and magnitude, are indicated by higher integer values. This is explained in detail in the next section.

anlz_attainpep(medpep)
## # A tibble: 90 x 4
##    bay_segment    yr chla_sd outcome
##    <fct>       <dbl> <chr>   <chr>  
##  1 Western      2019 2_3     red    
##  2 Western      2018 2_3     red    
##  3 Western      2017 2_3     red    
##  4 Western      2016 0_3     yellow 
##  5 Western      2015 0_2     yellow 
##  6 Western      2014 0_0     green  
##  7 Western      2013 1_1     yellow 
##  8 Western      2012 1_2     yellow 
##  9 Western      2011 1_2     yellow 
## 10 Western      2010 3_3     red    
## # ... with 80 more rows

Plotting

The plotting functions are used to view station data, long-term trends for each bay segment, and annual results for the overall water quality assessment.

The show_sitemappep() function produces an interactive map of annual medians of water quality conditions at each station. Medians can be shown for chlorophyll or secchi depth and for all stations or only stations from selected bay segments. Each point on the map shows the annual median for the parameter, with the size and color of the point in proportion to the other median values shown on the map. The color scale for the median shows higher concentrations of chlorophyll or shallower secchi depths in red and lower concentrations of chlorophyll or deeper secchi depths in blue. Hovering the mouse pointer over a site location will indicate the site name and the median value. Clicking on a station point will reveal the underlying plot data.

Here, the 2019 chlorophyll medians are shown for stations in all bay segments.

show_sitemappep(rawdat, yrsel = 2019)

A different year, parameter, and bay segment can also be chosen. Note that the size and color ramps are reversed for secchi depth, such that smaller and bluer points indicate larger secchi values.

show_sitemappep(rawdat, yrsel = 2010, param = 'sd', bay_segment = c('Central', 'Eastern'))

By default, the color and size scaling in show_sitemappep() is relative to only the points on the map. You can view scaling relative to all values in the dataset (across time and space) to get a sense of how the values for the selected year compare to the rest of the record. This can be done by changing the relative argument to TRUE.

show_sitemappep(rawdat, yrsel = 2019, relative = T)

The scaling is also sensitive to outliers. The default is to use the maximum scaling as the 99th percentile value observed in the entire dataset for the chosen parameter. Otherwise, non-sensical results will be returned if the absolute maximum value is used to set the scale. If, however, you want to see scaling relative to a smaller quantile, you can choose it accordingly with the maxrel argument. The size and color ramps will be scaled to the defined upper quantile value. The actual observed value at a point will always be visible on mouseover.

show_sitemappep(rawdat, yrsel = 2019, relative = T, maxrel = 0.8)

The show_thrpep() function provides a more descriptive assessment of annual trends for a chosen bay segment relative to thresholds. In this plot we show the annual medians and non-parametric confidence internals (95%) across stations for a segment. The red line shows annual trends and the horizontal blue line indicates the threshold for chlorophyll-a.

show_thrpep(rawdat, bay_segment = "Western", param = "chla")

We can show the same plot but for secchi depth by changing the param = "chla" to param = "sd". Note the change in the horizontal reference lines for the secchi depth target. Secchi trends must also be interpreted inversely to chlorophyll, such that lower values generally indicate less desirable water quality.

show_thrpep(rawdat, bay_segment = "Western", param = "sd")

The year range to plot can also be specified using the yrrng argument, where the default is yrrng = c(1990, 2019).

show_thrpep(rawdat, bay_segment = "Western", param = "chla", yrrng = c(2000, 2010))

Similarly, the show_boxpep() function provides an assessment of seasonal changes in chlorophyll or secchi depth values by bay segment. The most recent year is highlighted in red by default. This allows a simple evaluation of how the most recent year compared to historical averages. The threshold value is shown in blue text and as the dotted line. This is the same dotted line shown in show_thrpep().

show_boxpep(rawdat, param = 'chla', bay_segment = "Western")

show_boxpep(rawdat, param = 'sd', bay_segment = "Eastern")

A different subset of years and selected year of interest can also be viewed by changing the yrrng and yrsel arguments. Here we show 1990 compared to monthly averages for the last ten years.

show_boxpep(rawdat, param = 'chla', bay_segment = "Western", yrrng = c(2008, 2018), yrsel = 1990)

The show_thrpep() function is useful to understand annual variation in chlorophyll and secchi depth relative to thresholds for each bay segment. The information from these plots can provide an understanding of how the annual reporting outcomes are determined. An outcome integer from zero to three is assigned to each bay segment for each annual estimate of chlorophyll and secchi depth. These outcomes are based on both the exceedance of the annual estimate above the threshold (blue lines in show_thrpep()) and duration of the exceedance for the years prior. The following graphic describes this logic (Janicki, Wade, and Pribble, 2000).

Outcomes for annual estimates of water quality are assigned an integer value from zero to three depending on both magnitude and duration of the exceedence.

Outcomes for annual estimates of water quality are assigned an integer value from zero to three depending on both magnitude and duration of the exceedence.

The outcomes above are assigned for both chlorophyll and secchi depth. The duration criteria are determined based on whether the exceedance was observed for years prior to the current year. The exceedance criteria for chlorophyll and light-attenuation are currently the same for each segment. The peptools package contains a peptargets data file that is a reference for determining annual outcomes. This file is loaded automatically with the package and can be viewed from the command line.

peptargets
## # A tibble: 3 x 4
##   bay_segment name    sd_thresh chla_thresh
##   <fct>       <chr>       <dbl>       <dbl>
## 1 Western     Western       6.5         5.5
## 2 Central     Central       6.5         5.5
## 3 Eastern     Eastern       6.5         5.5

The final plotting function is show_matrixpep(), which creates an annual reporting matrix that reflects the combined outcomes for chlorophyll and secchi depth. Tracking the attainment outcomes provides the framework from which bay management actions can be developed and initiated. For each year and segment, a color-coded management action is assigned:

Stay the Course: Continue planned projects. Report data via annual progress reports and Baywide Environmental Monitoring Report.

Caution: Review monitoring data and nitrogen loading estimates. Begin/continue TAC and Management Board development of specific management recommendations.

On Alert: Finalize development and implement appropriate management actions to get back on track.

The management category or action is based on the combination of outcomes for chlorophyll and secchi depth (Janicki, Wade, and Pribble, 2000).

Management action categories assigned to each bay segment and year based on chlorophyll and Secchi depth outcomes.

Management action categories assigned to each bay segment and year based on chlorophyll and Secchi depth outcomes.

The results can be viewed with show_matrixpep().

show_matrixpep(rawdat)

If preferred, the matrix can also be returned in an HTML table that can be sorted and scrolled.

show_matrixpep(rawdat, asreact = TRUE)

Bay segment exceedances can also be viewed in a matrix using show_wqmatrixpep(). The not met/met outcome categories indicate if the median was above/below the threshold. However, the small/large exceedances used for the overall report card depend on degree of overlap of the confidence intervals with the threshold. The matrix outcome below is a simplification that shows a binary outcome (not met/met) for location of the median relative to the threshold.

show_wqmatrixpep(rawdat)

By default, the show_wqmatrixpep() function returns chlorophyll exceedances by segment. Secchi depth exceedances can be viewed by changing the param argument. Note that exceedances are reversed, i.e., lower values are considered less desirable water quality conditions for Secchi.

show_wqmatrixpep(rawdat, param = 'sd')

The results from show_matrixpep() for both secchi and chlorophyll can be combined for an individual segment using the show_segmatrixpep() function. Only one segment can be plotted for each function call.

show_segmatrixpep(rawdat, bay_segment = 'Western')

Finally, all segment plots can be shown together using the show_plotlypep() function that combines chlorophyll and secchi data for a given segment. This function combines outputs from show_thrpep() and show_segmatrixpep(). The final plot is interactive and can be zoomed by dragging the mouse pointer over a section of the plot. Information about each cell or value can be seen by hovering over a location in the plot.

show_plotlypep(rawdat)