This vignette provides an overview of the functions in tbeptools that can be used to work with water quality data in Tampa Bay. View the other vignettes for topical introductions to other reporting products (e.g., seagrasses, tidal creeks, etc.).
The environmental recovery of Tampa Bay is an exceptional success story for coastal water quality management. Nitrogen loads in the mid-1970s have been estimated at 8.2 million kg/yr, with approximately 5.5 million kg/yr entering the upper Bay alone. Reduced water clarity associated with phytoplankton biomass contributed to a dramatic reduction in the areal coverage of seagrass and the development of hypoxic events, causing a decline in benthic faunal production. Extensive efforts to reduce nutrient loads to the Bay occurred by the late 1970s, with the most notable being improvements in infrastructure for wastewater treatment in 1979. Improvements in water clarity and decreases in chlorophyll concentrations were observed Bay-wide in the 1980s, with conditions generally remaining constant to the present day.
Tracking changes in environmental condition from the past to present day would not have been possible without a long-term monitoring dataset. Data have been collected monthly by the Environmental Protection Commission of Hillsborough County since 1974 [6,7]. Samples are taken at forty-five stations by water collection or monitoring sonde at bottom, mid-, or surface depths, depending on the parameter. The locations of monitoring stations are fixed and cover the entire Bay from the uppermost mesohaline sections to the lowermost euhaline portions that have direct interaction with the Gulf of Mexico. Up to 515 observations are available for different parameters at each station, e.g., nitrogen, chlorophyll-a, and Secchi depth.
Data collected from the monitoring program are processed and maintained in a spreadsheet titled
RWMDataSpreadsheet_ThroughCurrentReportMonth.xlsx at ftp://ftp.epchc.org/EPC_ERM_FTP/WQM_Reports/. These data include observations at all stations and for all parameters throughout the period of record. To date, there have been no systematic tools for importing, analyzing, and reporting information from these data. The tbeptools package was developed to address this need.
The main function for importing water quality data is
read_importwq(). This function downloads the latest file if one is not already available at the location specified by the
xlsx input argument.
First, create a character path for the location of the file. If one does not exist, specify a desired location and name for the downloaded file. Here, we want to put the file in the vignettes folder and name it 2018_Results_updated.xls. Note that this file path is relative to the root working directory for the current R session. You can view the working directory with getwd().
xlsx <- 'vignettes/2018_Results_updated.xls'
Now we pass this
xlsx object to the read_importwq() function.
epcdata <- read_importwq(xlsx)
#> Error in read_importwq("empty") : file.exists(xlsx) is not TRUE
We get an error message from the function indicating that the file is not found. This makes sense because the file doesn’t exist yet, so we need to tell the function to download the latest file. This is done by changing the
download_latest argument to
TRUE (the default is FALSE).
epcdata <- read_importwq(xlsx, download_latest = TRUE)
#> File vignettes/2018_Results_updated.xls does not exist, replacing with downloaded file...
#> trying URL 'ftp://ftp.epchc.org/EPC_ERM_FTP/WQM_Reports/RWMDataSpreadsheet_ThroughCurrentReportMonth.xlsx'
#> length 24562051 bytes (23.4 MB)
Now we get the same message, but with an indication that the file on the server is being downloaded. We’ll have the data downloaded and saved to the
epcdata object after it finishes downloading.
If we try to run the function again after downloading the data from the server, we get the following message. This check is done to make sure that the data are not unnecessarily downloaded if the current file matches the file on the server.
epcdata <- read_importwq(xlsx, download_latest = TRUE)
#> File is current...
Every time that tbeptools is used to work with the monitoring data,
read_importwq() should be used to import the data. You will always receive the message
File is current... if your local file matches the one on the server. However, new data are regularly collected and posted on the server. If
download_latest = TRUE and your local file is out of date, you will receive the following message:
#> Replacing local file with current...
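The three messages above correspond to three situations: no local file, a local file that matches the server copy, and a local file that is out of date. The following is a minimal base R sketch of that decision logic, not the package's implementation. The helper name check_local() is hypothetical, the "server" is simulated with a temporary file, and comparing files by MD5 checksum is an assumption made only for illustration.

```r
# Hypothetical helper illustrating the three cases described above.
# Comparing by MD5 checksum is an assumption, not the method used by tbeptools.
check_local <- function(local_file, server_file) {
  if (!file.exists(local_file)) {
    file.copy(server_file, local_file)
    return(sprintf("File %s does not exist, replacing with downloaded file...", local_file))
  }
  same <- unname(tools::md5sum(local_file)) == unname(tools::md5sum(server_file))
  if (same) {
    return("File is current...")
  }
  file.copy(server_file, local_file, overwrite = TRUE)
  "Replacing local file with current..."
}

# Simulate a server copy and a local path that does not exist yet
server_file <- tempfile(fileext = ".txt")
local_file <- tempfile(fileext = ".txt")
writeLines("version 1", server_file)

m1 <- check_local(local_file, server_file)  # no local copy yet
m2 <- check_local(local_file, server_file)  # local copy matches
writeLines("version 2", server_file)        # new data posted on the "server"
m3 <- check_local(local_file, server_file)  # local copy is out of date
```

Calling the helper three times in a row walks through each case in the same order as the messages shown in this section.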
The final argument
na indicates which fields in the downloaded spreadsheet are treated as blank values and assigned to
NA. Any number of strings can be added to this argument to replace fields with NA.
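As an analogy for how a set of strings can be mapped to missing values on import, the sketch below uses the na.strings argument of base R's read.csv(). It does not call read_importwq(); the toy data and the "-999" sentinel are made up for illustration.

```r
# Toy comma-separated data; "-999" is a hypothetical sentinel for a blank value
raw <- "station,chla\n6,12.1\n7,\n8,NA\n9,-999"

# Every string listed in na.strings is read as NA
dat <- read.csv(text = raw, na.strings = c("", "NA", "-999"))

is.na(dat$chla)
```

Only the first row keeps a numeric chlorophyll value; the other three fields match one of the supplied strings and are assigned NA.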
After the data are successfully imported, you can view them from the assigned object:
epcdata
#> # A tibble: 26,476 x 22
#>    bay_segment epchc_station SampleTime             yr    mo Latitude Longitude
#>    <chr>               <dbl> <dttm>              <dbl> <dbl>    <dbl>     <dbl>
#>  1 HB                      6 2021-03-16 10:24:00  2021     3     27.9     -82.5
#>  2 HB                      7 2021-03-16 10:43:00  2021     3     27.9     -82.5
#>  3 HB                      8 2021-03-16 13:52:00  2021     3     27.9     -82.4
#>  4 MTB                     9 2021-03-16 13:09:00  2021     3     27.8     -82.4
#>  5 MTB                    11 2021-03-16 10:59:00  2021     3     27.8     -82.5
#>  6 MTB                    13 2021-03-16 11:12:00  2021     3     27.8     -82.5
#>  7 MTB                    14 2021-03-16 12:46:00  2021     3     27.8     -82.5
#>  8 MTB                    16 2021-03-23 10:43:00  2021     3     27.7     -82.5
#>  9 MTB                    19 2021-03-23 10:58:00  2021     3     27.7     -82.6
#> 10 LTB                    23 2021-03-23 13:57:00  2021     3     27.7     -82.6
#> # … with 26,466 more rows, and 15 more variables: Total_Depth_m <dbl>,
#> #   Sample_Depth_m <dbl>, tn <dbl>, tn_q <chr>, sd_m <dbl>, sd_raw_m <dbl>,
#> #   sd_q <chr>, chla <dbl>, chla_q <chr>, Sal_Top_ppth <dbl>,
#> #   Sal_Mid_ppth <dbl>, Sal_Bottom_ppth <dbl>, Temp_Water_Top_degC <dbl>,
#> #   Temp_Water_Mid_degC <dbl>, Temp_Water_Bottom_degC <dbl>
These data include the bay segment name, station number, sample time, year, month, latitude, longitude, station depth, sample depth, secchi depth, and chlorophyll. Note that the monitoring data include additional parameters. Chlorophyll and secchi depth are currently the only parameters returned by
read_importwq() given the reporting indicators used below.
An import function is also available to download and format phytoplankton cell count data. The
read_importphyto() function works similarly to the import function for the water quality data. Start by specifying a path where the data should be downloaded and set download_latest to
TRUE. This function will download and summarize data from the file
PlanktonDataList_ThroughCurrentReportMonth.xlsx on the EPC website.
xlsx <- 'vignettes/phyto_data.xlsx'
phytodata <- read_importphyto(xlsx, download_latest = TRUE)
#> File vignettes/phyto_data.xlsx does not exist, replacing with downloaded file...
#> trying URL 'ftp://ftp.epchc.org/EPC_ERM_FTP/WQM_Reports/PlanktonDataList_ThroughCurrentReportMonth.xlsx'
#> length 12319508 bytes (11.7 MB)
After the phytoplankton data are successfully imported, you can view them from the assigned object:
phytodata
#> # A tibble: 22,815 x 8
#>    epchc_station Date       name           units  count yrqrt         yr mo
#>    <chr>         <date>     <chr>          <chr>  <dbl> <date>     <dbl> <ord>
#>  1 11            1975-07-23 Cyanobacteria  /0.1mL     0 1975-07-01  1975 Jul
#>  2 11            1976-01-07 Cyanobacteria  /0.1mL     1 1976-01-01  1976 Jan
#>  3 11            1977-01-05 other          /0.1mL     1 1977-01-01  1977 Jan
#>  4 11            1977-04-20 other          /0.1mL     1 1977-04-01  1977 Apr
#>  5 11            1977-04-20 Tripos hircus  /0.1mL     1 1977-04-01  1977 Apr
#>  6 11            1977-07-13 other          /0.1mL    12 1977-07-01  1977 Jul
#>  7 11            1978-01-11 other          /0.1mL    16 1978-01-01  1978 Jan
#>  8 11            1979-02-08 other          /0.1mL     1 1979-01-01  1979 Feb
#>  9 11            1979-05-02 Karenia brevis /0.1mL     1 1979-04-01  1979 May
#> 10 11            1979-05-30 other          /0.1mL     1 1979-04-01  1979 May
#> # … with 22,805 more rows
These data are highly summarized from the raw data file available online. Cell counts (as number of cells per 0.1mL) for selected taxa are summed for each station by quarters (i.e., Jan/Feb/Mar, Apr/May/Jun, etc.). The quarter is indicated in the
yrqrt column specified by the starting date of each quarter (e.g.,
1975-07-01 is the quarter Jul/Aug/Sep for 1975). These data are primarily used to support analyses in the water quality dashboard: https://shiny.tbep.org/wq-dash/
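The quarter labelling can be illustrated with a short base R sketch that floors a sample date to the first day of its quarter. This is only one way to derive such a label and is not taken from the package source. The three dates are copied from the printed rows of phytodata, and the variable names are chosen for illustration.

```r
# Sample dates copied from the printed phytodata rows
dates <- as.Date(c("1975-07-23", "1979-02-08", "1979-05-30"))

# Month number (1-12), then the first month of the quarter it falls in (1, 4, 7, or 10)
mo <- as.integer(format(dates, "%m"))
qstart <- as.integer(3 * ((mo - 1) %/% 3) + 1)

# Rebuild a date on the first day of that quarter
yrqrt <- as.Date(sprintf("%s-%02d-01", format(dates, "%Y"), qstart))

format(yrqrt)
```

The resulting dates agree with the yrqrt values printed for the same rows: a July sample falls in the quarter starting 1975-07-01, a February sample in 1979-01-01, and a late May sample in 1979-04-01.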
The functions anlz_avedat() and anlz_avedatsite() summarize the station data by bay segments or by sites, respectively. Both functions return annual means for chlorophyll and light attenuation (based on Secchi depth measurements) and monthly means by year for chlorophyll and light attenuation. These summaries are then used to determine if bay segment targets for water quality are met using the anlz_attain() function.
Here we use
anlz_avedat() to summarize the data by bay segment to estimate annual and monthly means for chlorophyll and light attenuation. The output is a two-element list for the annual (
ann) and monthly (
mos) means by segment.
avedat <- anlz_avedat(epcdata)
avedat
#> $ann
#> # A tibble: 584 x 4
#>       yr bay_segment var         val
#>    <dbl> <chr>       <chr>     <dbl>
#>  1  1974 HB          mean_chla 22.4
#>  2  1974 LTB         mean_chla  4.24
#>  3  1974 MTB         mean_chla  9.66
#>  4  1974 OTB         mean_chla 10.2
#>  5  1975 HB          mean_chla 27.9
#>  6  1975 LTB         mean_chla  4.93
#>  7  1975 MTB         mean_chla 11.4
#>  8  1975 OTB         mean_chla 13.2
#>  9  1976 HB          mean_chla 29.5
#> 10  1976 LTB         mean_chla  5.08
#> # … with 574 more rows
#>
#> $mos
#> # A tibble: 4,460 x 5
#>    bay_segment    yr    mo var         val
#>    <chr>       <dbl> <dbl> <chr>     <dbl>
#>  1 HB           1974     1 mean_chla 36.2
#>  2 LTB          1974     1 mean_chla  1.75
#>  3 MTB          1974     1 mean_chla 11.5
#>  4 OTB          1974     1 mean_chla  4.4
#>  5 HB           1974     2 mean_chla 42.4
#>  6 LTB          1974     2 mean_chla  5.5
#>  7 MTB          1974     2 mean_chla  9.35
#>  8 OTB          1974     2 mean_chla  4.07
#>  9 HB           1974     3 mean_chla 14.9
#> 10 LTB          1974     3 mean_chla  5.88
#> # … with 4,450 more rows
This output can then be further analyzed with
anlz_attain() to determine if the bay segment outcomes are met in each year. The results are used by the plotting functions described below. In short, the
chl_la column indicates the categorical outcome for chlorophyll and light attenuation for each segment. The outcomes are integer values from zero to three. The relative exceedances of water quality thresholds for each segment, both in duration and magnitude, are indicated by higher integer values.
anlz_attain(avedat)
#> # A tibble: 192 x 4
#>    bay_segment    yr chl_la outcome
#>    <chr>       <dbl> <chr>  <chr>
#>  1 HB           1974 3_0    yellow
#>  2 HB           1975 3_2    red
#>  3 HB           1976 3_2    red
#>  4 HB           1977 3_2    red
#>  5 HB           1978 3_3    red
#>  6 HB           1979 3_3    red
#>  7 HB           1980 3_3    red
#>  8 HB           1981 3_3    red
#>  9 HB           1982 3_3    red
#> 10 HB           1983 3_0    yellow
#> # … with 182 more rows
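If the two integer outcomes are needed separately, the chl_la string can be split on the underscore. The sketch below uses base R on a few values copied from the output above. It assumes, based only on the column name, that the first integer refers to chlorophyll and the second to light attenuation.

```r
# A few chl_la values copied from the printed anlz_attain() output
chl_la <- c("3_0", "3_2", "3_3")

# Split each string on the underscore and bind the pieces into a two-column matrix
parts <- do.call(rbind, strsplit(chl_la, "_", fixed = TRUE))

# Assumed order: chlorophyll first, light attenuation second
outcomes <- data.frame(
  chl_la = chl_la,
  chl = as.integer(parts[, 1]),
  la = as.integer(parts[, 2])
)

outcomes
```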
Similar information can be obtained for individual sites using
anlz_attainsite(). The main difference is that a yes/no column
met is added that indicates only if the target was above or below the segment threshold for each site.
anlz_avedatsite(epcdata) %>% anlz_attainsite
#> # A tibble: 2,160 x 9
#>       yr bay_segment epchc_station var     val target smallex thresh met
#>    <dbl> <chr>               <dbl> <chr> <dbl>  <dbl>   <dbl>  <dbl> <chr>
#>  1  1974 HB                      6 chla   25.6   13.2    14.1     15 no
#>  2  1974 HB                      7 chla   21.6   13.2    14.1     15 no
#>  3  1974 HB                      8 chla   22.6   13.2    14.1     15 no
#>  4  1974 HB                     44 chla   23.4   13.2    14.1     15 no
#>  5  1974 HB                     52 chla   23.5   13.2    14.1     15 no
#>  6  1974 HB                     55 chla   20.2   13.2    14.1     15 no
#>  7  1974 HB                     70 chla   33.1   13.2    14.1     15 no
#>  8  1974 HB                     71 chla   25.8   13.2    14.1     15 no
#>  9  1974 HB                     73 chla   17.6   13.2    14.1     15 no
#> 10  1974 HB                     80 chla   10.5   13.2    14.1     15 yes
#> # … with 2,150 more rows
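The yes/no flag can be reproduced on a toy data frame with a single comparison. This sketch is not the package code: it assumes the site average is compared to the segment target with a strict less-than test. The two rows are copied from the output above, where either the target or the threshold would give the same answer.

```r
# Two rows copied from the printed output (Hillsborough Bay, 1974)
site <- data.frame(
  epchc_station = c(73, 80),
  val = c(17.6, 10.5),
  target = c(13.2, 13.2),
  thresh = c(15, 15)
)

# Assumption: a site "meets" the target when its annual average is below it
site$met <- ifelse(site$val < site$target, "yes", "no")

site
```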
External package libraries in R can be used to plot the time series data. Here’s an example using the popular ggplot2 package. Some data wrangling with dplyr is done first to filter the data we want to plot.
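# load packages for data wrangling and plotting
library(dplyr)
library(ggplot2)

# keep observations for a single station
toplo <- epcdata %>%
  filter(epchc_station == '52')

# chlorophyll time series on a log10 y-axis
ggplot(toplo, aes(x = SampleTime, y = chla)) +
  geom_line() +
  geom_point() +
  scale_y_log10() +
  labs(
    y = 'Chlorophyll-a concentration (ug/L)',
    x = NULL,
    title = 'Chlorophyll trends',
    subtitle = 'Hillsborough Bay station 52, all dates'
  ) +
  theme_bw()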
The show_thrplot() function provides a more descriptive assessment of annual trends for a chosen bay segment relative to defined targets or thresholds. In this plot we show the annual averages across stations in Old Tampa Bay (
bay_segment = "OTB") for chlorophyll (
thr = "chla"). The red line shows annual trends and the horizontal blue lines indicate the thresholds and targets for chlorophyll-a that are specific to Old Tampa Bay. The dashed and dotted blue lines indicate +1 and +2 standard errors for the management target shown by the solid line. The target and standard errors are considered when identifying the annual segment outcome for chlorophyll.
show_thrplot(epcdata, bay_segment = "OTB", thr = "chla")
We can show the same plot but for light attenuation by changing
thr = "chla" to
thr = "la". Note the change in the horizontal reference lines for the light attenuation target.
show_thrplot(epcdata, bay_segment = "OTB", thr = "la")
The year range to plot can also be specified using the
yrrng argument, where the default is
yrrng = c(1975, 2018).
epcdata %>%
  anlz_avedat %>%
  .[['ann']] %>%
  filter(bay_segment == 'OTB') %>%
  filter(var == 'mean_la') %>%
  filter(yr >= 2000 & yr <= 2018)
#> # A tibble: 19 x 4
#>       yr bay_segment var       val
#>    <dbl> <chr>       <chr>   <dbl>
#>  1  2000 OTB         mean_la 0.733
#>  2  2001 OTB         mean_la 0.951
#>  3  2002 OTB         mean_la 0.927
#>  4  2003 OTB         mean_la 1.04
#>  5  2004 OTB         mean_la 0.878
#>  6  2005 OTB         mean_la 0.769
#>  7  2006 OTB         mean_la 0.620
#>  8  2007 OTB         mean_la 0.677
#>  9  2008 OTB         mean_la 0.696
#> 10  2009 OTB         mean_la 0.808
#> 11  2010 OTB         mean_la 0.842
#> 12  2011 OTB         mean_la 0.912
#> 13  2012 OTB         mean_la 0.687
#> 14  2013 OTB         mean_la 0.567
#> 15  2014 OTB         mean_la 0.606
#> 16  2015 OTB         mean_la 0.560
#> 17  2016 OTB         mean_la 0.575
#> 18  2017 OTB         mean_la 0.682
#> 19  2018 OTB         mean_la 0.678
The show_boxplot() function provides an assessment of seasonal changes in chlorophyll or light attenuation values by bay segment. The most recent year is highlighted in red by default. This allows a simple evaluation of how the most recent year compared to historical averages. The large exceedance value is shown in blue text and as the dotted line. This corresponds to a “large” magnitude change of +2 standard errors above the bay segment threshold and is the same dotted line shown in show_thrplot().
show_boxplot(epcdata, param = 'chla', bay_segment = "OTB")
show_boxplot(epcdata, param = 'la', bay_segment = "HB")
A different subset of years and selected year of interest can also be viewed by changing the
yrsel arguments. Here we show 1980 compared to monthly averages for the last ten years.
The show_thrplot() function is useful to understand annual variation in chlorophyll and light attenuation relative to management targets for each bay segment. The information from these plots can provide an understanding of how the annual reporting outcomes are determined. As noted above, an outcome integer from zero to three is assigned to each bay segment for each annual estimate of chlorophyll and light attenuation. These outcomes are based on both the exceedance of the annual estimate above the threshold or target (blue lines in
show_thrplot()) and duration of the exceedance for the years prior. The following graphic describes this logic.
These outcomes are assigned for both chlorophyll and light attenuation. The duration criteria are determined based on whether the exceedance was observed for years prior to the current year. The exceedance criteria for chlorophyll and light-attenuation are specific to each segment. The tbeptools package contains a
targets data file that is a reference for determining annual outcomes. This file is loaded automatically with the package and can be viewed from the command line.
targets
#>   bay_segment             name chla_target chla_smallex chla_thresh la_target
#> 1         OTB    Old Tampa Bay         8.5          8.9         9.3      0.83
#> 2          HB Hillsborough Bay        13.2         14.1        15.0      1.58
#> 3         MTB Middle Tampa Bay         7.4          7.9         8.5      0.83
#> 4         LTB  Lower Tampa Bay         4.6          4.8         5.1      0.63
#>   la_smallex la_thresh
#> 1       0.86      0.88
#> 2       1.63      1.67
#> 3       0.87      0.91
#> 4       0.66      0.68
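To make the magnitude part of the exceedance criteria concrete, the sketch below places an annual chlorophyll average relative to the small exceedance and threshold values for each segment. It is an illustration under stated assumptions, not the package's algorithm. The data frame is retyped from the targets table, the 2018 averages are taken from the anlz_yrattain() output shown later in this vignette, and the labels "none", "small", and "large" are chosen here for readability. The duration criteria are not reproduced, so these values are not the final zero-to-three outcomes.

```r
# Chlorophyll reference values retyped from the targets table
targets_chla <- data.frame(
  bay_segment = c("OTB", "HB", "MTB", "LTB"),
  chla_smallex = c(8.9, 14.1, 7.9, 4.8),
  chla_thresh = c(9.3, 15.0, 8.5, 5.1)
)

# 2018 segment averages for chlorophyll, copied from the anlz_yrattain() output
chla_2018 <- c(OTB = 9.22, HB = 13.9, MTB = 7.05, LTB = 4.65)

# findInterval() returns 0 below smallex, 1 between smallex and thresh, 2 at or above thresh
magnitude <- mapply(
  function(val, smallex, thresh) findInterval(val, c(smallex, thresh)),
  chla_2018[targets_chla$bay_segment],
  targets_chla$chla_smallex,
  targets_chla$chla_thresh
)

# Illustrative labels for the three magnitude classes
magnitude_label <- c("none", "small", "large")[magnitude + 1]

data.frame(bay_segment = targets_chla$bay_segment, magnitude, magnitude_label)
```

Under these assumptions, only Old Tampa Bay falls between its small exceedance value and its threshold in 2018, which is consistent with it being the only segment not reported as green for that year.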
The final plotting function is
show_matrix(), which creates an annual reporting matrix that reflects the combined outcomes for chlorophyll and light attenuation. Tracking the attainment of bay segment specific targets for these indicators provides the framework from which bay management actions are developed and initiated. For each year and segment, a color-coded management action is assigned:
Stay the Course: Continue planned projects. Report data via annual progress reports and Baywide Environmental Monitoring Report.
Caution: Review monitoring data and nitrogen loading estimates. Begin/continue TAC and Management Board development of specific management recommendations.
On Alert: Finalize development and implement appropriate management actions to get back on track.
The management category or action is based on the combination of outcomes for chlorophyll and light attenuation.
The results can be viewed with show_matrix().
The matrix is also a
ggplot object and its layout can be changed using
ggplot elements. Note the use of
txtsz = NULL to remove the color labels.
show_matrix(epcdata, txtsz = NULL) +
  scale_y_continuous(expand = c(0,0), breaks = c(1975:2018)) +
  coord_flip() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 7))
If preferred, the matrix can also be returned in an HTML table that can be sorted and scrolled. Only the first ten rows are shown by default. The default number of rows (10) can be changed; use a very large number to show all rows.
show_matrix(epcdata, asreact = TRUE)
A plotly (interactive, dynamic plot) can be returned by setting the
plotly argument to TRUE.
show_matrix(epcdata, plotly = TRUE)
Results can also be obtained for a selected year. Outcomes can be returned in tabular format with
anlz_yrattain(). This table also shows segment averages for chlorophyll and light attenuation, including the associated targets.
anlz_yrattain(epcdata, yrsel = 2018)
#> # A tibble: 4 x 6
#>   bay_segment chla_val chla_target la_val la_target outcome
#>   <fct>          <dbl>       <dbl>  <dbl>     <dbl> <chr>
#> 1 OTB             9.22         8.5  0.678      0.83 yellow
#> 2 HB             13.9         13.2  1.09       1.58 green
#> 3 MTB             7.05         7.4  0.570      0.83 green
#> 4 LTB             4.65         4.6  0.593      0.63 green
A map showing if individual sites achieved chlorophyll targets can be obtained with
show_sitemap(). The station averages for chlorophyll for the selected year are shown next to each point. Stations in red failed to meet the segment target.
show_sitemap(epcdata, yrsel = 2018)
The show_sitemap() function also includes an argument to specify a particular monthly range for the selected year. If this option is chosen, averages are shown as continuous values at each station.
Bay segment exceedances can also be viewed in a matrix using
show_wqmatrix(). The thresholds for these values correspond to the Florida DEP criteria (or a large exceedance defined as +2 standard errors above the segment target).
By default, the
show_wqmatrix() function returns chlorophyll exceedances by segment. Light attenuation exceedances can be viewed by changing the param argument.
show_wqmatrix(epcdata, param = 'la')
The results from
show_wqmatrix() can be combined for an individual segment using the
show_segmatrix() function. This is useful to understand which water quality parameter is driving the management outcome for a given year. The plot shows the light attenuation and chlorophyll outcomes from
show_wqmatrix() next to the segment management outcomes from
show_matrix(). Only one segment can be plotted for each function call.
show_segmatrix(epcdata, bay_segment = 'OTB')
Finally, all segment plots can be shown together using the
show_segplotly() function that combines chlorophyll and secchi data for a given segment. This function combines outputs from
show_segmatrix(). The final plot is interactive and can be zoomed by dragging the mouse pointer over a section of the plot. Information about each cell or value can be seen by hovering over a location in the plot. Please note that the scaling here is horrible, but this can be changed when creating the plot on your own.
From these plots, we can quickly view a summary of the environmental history of water quality in Tampa Bay. Degraded conditions were common early in the period of record, particularly for Old Tampa Bay and Hillsborough Bay. Conditions began to improve by the late 1980s and early 1990s, with good conditions persisting to present day. However, recent trends in Old Tampa Bay have shown conditions changing from “stay the course” to “caution”.
[1] A. Poe, K. Hackett, S. Janicki, R. Pribble, A. Janicki, Estimates of total nitrogen, total phosphorus, total suspended solids, and biochemical oxygen demand loadings to Tampa Bay, Florida: 1999-2003, Tampa Bay Estuary Program, St. Petersburg, Florida, USA, 2005. https://drive.google.com/file/d/1GNSb5i_x_WSxe8VKz9FtqZ7fjWcnVqHO/view?usp=drivesdk.
[2] H. Greening, A. Janicki, Toward reversal of eutrophic conditions in a subtropical estuary: Water quality and seagrass response to nitrogen loading reductions in Tampa Bay, Florida, USA, Environmental Management. 38 (2006) 163–178.
[3] D.A. Tomasko, C.A. Corbett, H.S. Greening, G.E. Raulerson, Spatial and temporal variation in seagrass coverage in Southwest Florida: Assessing the relative effects of anthropogenic nutrient load reductions and rainfall in four contiguous estuaries, Marine Pollution Bulletin. 50 (2005) 797–805.
[4] S.L. Santos, J.L. Simon, Marine soft-bottom community establishment following annual defaunation: Larval or adult recruitment, Marine Ecology - Progress Series. 2 (1980) 235–241.
[5] M.W. Beck, J.D. Hagy III, Adaptation of a weighted regression approach to evaluate water quality trends in an estuary, Environmental Modelling and Assessment. 20 (2015) 637–655. https://doi.org/10.1007/s10666-015-9452-8.
[6] E.T. Sherwood, H.S. Greening, A.J. Janicki, D.J. Karlen, Tampa Bay estuary: Monitoring long-term recovery through regional partnerships, Regional Studies in Marine Science. 4 (2016) 1–11. https://doi.org/10.1016/j.rsma.2015.05.005.
[7] TBEP (Tampa Bay Estuary Program), Tampa Bay Water Atlas, (2017).
[8] A. Janicki, D. Wade, J.R. Pribble, Developing and Establishing a Process to Track the Status of Chlorophyll-a Concentrations and Light Attenuation to Support Seagrass Restoration Goals in Tampa Bay, Tampa Bay Estuary Program, St. Petersburg, Florida, 2000. https://drive.google.com/file/d/1XMULU8w4syWcSv_ciOUOhnC_G4xt6GIF/view?usp=drivesdk.