This vignette provides an overview of the functions in tbeptools that can be used to work with water quality data in Tampa Bay. View the other vignettes for topical introductions to other reporting products (e.g., seagrasses, tidal creeks, etc.).

The environmental recovery of Tampa Bay is an exceptional success story for coastal water quality management. Nitrogen loads in the mid 1970s have been estimated at 8.2 million kg/yr, with approximately 5.5 million kg/yr entering the upper Bay alone [1]. Reduced water clarity associated with phytoplankton biomass contributed to a dramatic reduction in the areal coverage of seagrass [3] and development of hypoxic events, causing a decline in benthic faunal production [4]. Extensive efforts to reduce nutrient loads to the Bay occurred by the late 1970s, with the most notable being improvements in infrastructure for wastewater treatment in 1979. Improvements in water clarity and decreases in chlorophyll concentrations were observed Bay-wide in the 1980s, with conditions generally remaining constant to present day [5].

Tracking changes in environmental condition from the past to present day would not have been possible without a long-term monitoring dataset. Data have been collected monthly by the Environmental Protection Commission of Hillsborough County since 1974 [6,7]. Samples are taken at forty-five stations by water collection or monitoring sonde at bottom, mid-, or surface depths, depending on the parameter. The locations of monitoring stations are fixed and cover the entire Bay from the uppermost mesohaline sections to the lowermost euhaline portions that have direct interaction with the Gulf of Mexico. Up to 515 observations are available for different parameters at each station, e.g., nitrogen, chlorophyll-a, and secchi depth.

Data collected from the monitoring program are processed and maintained in a spreadsheet titled RWMDataSpreadsheet_ThroughCurrentReportMonth.xlsx. These data include observations at all stations and for all parameters throughout the period of record. To date, there have been no systematic tools for importing, analyzing, and reporting information from these data. The tbeptools package was developed to address this need.

Locations of long-term monitoring stations in Tampa Bay. The Bay is separated into four segments defined by chemical, physical, and geopolitical boundaries.


The main function for importing water quality data is read_importwq(). This function downloads the latest file if one is not already available at the location specified by the xlsx input argument.

First, create a character path for the location of the file. If one does not exist, specify a desired location and name for the downloaded file. Here, we want to put the file in the vignettes folder and name it 2018_Results_updated.xls. Note that this file path is relative to the root working directory for the current R session. You can view the working directory with getwd().

xlsx <- 'vignettes/2018_Results_updated.xls'

Now we pass this xlsx object to the read_importwq() function.

epcdata <- read_importwq(xlsx)
#> Error in read_importwq("empty") : file.exists(xlsx) is not TRUE

We get an error message from the function indicating that the file is not found. This makes sense because the file doesn’t exist yet, so we need to tell the function to download the latest file. This is done by changing the download_latest argument to TRUE (the default is FALSE).

epcdata <- read_importwq(xlsx, download_latest = TRUE)
#> File vignettes/2018_Results_updated.xls does not exist, replacing with downloaded file...

#> trying URL ''
 length 24562051 bytes (23.4 MB)

Now we get the same message, but with an indication that the file on the server is being downloaded. The data will be saved to the epcdata object after the download finishes.

If we try to run the function again after downloading the data from the server, we get the following message. This check ensures that the data are not downloaded unnecessarily when the local file already matches the file on the server.

epcdata <- read_importwq(xlsx, download_latest = TRUE)
#> File is current...

Every time that tbeptools is used to work with the monitoring data, read_importwq() should be used to import the data. You will always receive the message File is current... if your local file matches the one on the server. However, new data are regularly collected and posted on the server. If download_latest = TRUE and your local file is out of date, you will receive the following message:

#> Replacing local file with current...
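
One way to picture this check is to compare checksums of the local file and a freshly downloaded copy. The sketch below is an assumption for illustration, not necessarily how read_importwq() does it: the helper name file_is_current() is hypothetical, and two temporary files stand in for the local file and the server copy.

```r
# Hypothetical helper (not part of tbeptools): TRUE if two files have identical contents
file_is_current <- function(local, remote_copy) {
  # tools::md5sum() returns a named character vector of MD5 checksums
  hashes <- unname(tools::md5sum(c(local, remote_copy)))
  identical(hashes[1], hashes[2])
}

# Temporary files stand in for the local file and the downloaded server copy
local_file <- tempfile(fileext = '.txt')
server_copy <- tempfile(fileext = '.txt')
writeLines('chla,sd_m', local_file)
writeLines('chla,sd_m', server_copy)

# Contents match, so no replacement would be needed
file_is_current(local_file, server_copy)

# Simulate new data posted on the server; the files now differ
writeLines(c('chla,sd_m', '9.2,1.4'), server_copy)
file_is_current(local_file, server_copy)
```

A checksum comparison avoids relying on file names or modification times, which can differ even when the contents are the same.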

The final argument na indicates which fields in the downloaded spreadsheet are treated as blank values and assigned to NA. Any number of strings can be passed to this argument to replace matching fields with NA values.
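
The effect of replacing strings with NA can be sketched in base R. The example is illustrative only: the placeholder strings and the small data frame are assumptions, not the values found in the spreadsheet or the defaults of read_importwq().

```r
# Illustrative raw values as they might appear in a spreadsheet column (made up)
raw <- data.frame(
  station = c(6, 7, 8),
  chla = c('12.1', 'NA', ''),
  stringsAsFactors = FALSE
)

# Strings to treat as missing (assumed for this example)
na_strings <- c('', 'NA')

# Replace matching strings with NA, then convert the column to numeric
raw$chla[raw$chla %in% na_strings] <- NA
raw$chla <- as.numeric(raw$chla)
raw
```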

After the data are successfully imported, you can view them from the assigned object:

epcdata
#> # A tibble: 26,476 x 22
#>    bay_segment epchc_station SampleTime             yr    mo Latitude Longitude
#>    <chr>               <dbl> <dttm>              <dbl> <dbl>    <dbl>     <dbl>
#>  1 HB                      6 2021-03-16 10:24:00  2021     3     27.9     -82.5
#>  2 HB                      7 2021-03-16 10:43:00  2021     3     27.9     -82.5
#>  3 HB                      8 2021-03-16 13:52:00  2021     3     27.9     -82.4
#>  4 MTB                     9 2021-03-16 13:09:00  2021     3     27.8     -82.4
#>  5 MTB                    11 2021-03-16 10:59:00  2021     3     27.8     -82.5
#>  6 MTB                    13 2021-03-16 11:12:00  2021     3     27.8     -82.5
#>  7 MTB                    14 2021-03-16 12:46:00  2021     3     27.8     -82.5
#>  8 MTB                    16 2021-03-23 10:43:00  2021     3     27.7     -82.5
#>  9 MTB                    19 2021-03-23 10:58:00  2021     3     27.7     -82.6
#> 10 LTB                    23 2021-03-23 13:57:00  2021     3     27.7     -82.6
#> # … with 26,466 more rows, and 15 more variables: Total_Depth_m <dbl>,
#> #   Sample_Depth_m <dbl>, tn <dbl>, tn_q <chr>, sd_m <dbl>, sd_raw_m <dbl>,
#> #   sd_q <chr>, chla <dbl>, chla_q <chr>, Sal_Top_ppth <dbl>,
#> #   Sal_Mid_ppth <dbl>, Sal_Bottom_ppth <dbl>, Temp_Water_Top_degC <dbl>,
#> #   Temp_Water_Mid_degC <dbl>, Temp_Water_Bottom_degC <dbl>

These data include the bay segment name, station number, sample time, year, month, latitude, longitude, station depth, sample depth, secchi depth, and chlorophyll. Note that the monitoring data include additional parameters. Chlorophyll and secchi depth are currently the only parameters returned by read_importwq() given the reporting indicators used below.

An import function is also available to download and format phytoplankton cell count data. The read_importphyto() function works similarly to the import function for the water quality data. Start by specifying a path where the data should be downloaded and set download_latest to TRUE. This function will download and summarize data from the file PlanktonDataList_ThroughCurrentReportMonth.xlsx on the EPC website.

xlsx <- 'vignettes/phyto_data.xlsx'
phytodata <- read_importphyto(xlsx, download_latest = TRUE)
#> File vignettes/phyto_data.xlsx does not exist, replacing with downloaded file...

#> trying URL ''
 length 12319508 bytes (11.7 MB)

After the phytoplankton data are successfully imported, you can view them from the assigned object:

phytodata
#> # A tibble: 22,815 x 8
#>    epchc_station Date       name           units  count yrqrt         yr mo   
#>    <chr>         <date>     <chr>          <chr>  <dbl> <date>     <dbl> <ord>
#>  1 11            1975-07-23 Cyanobacteria  /0.1mL     0 1975-07-01  1975 Jul  
#>  2 11            1976-01-07 Cyanobacteria  /0.1mL     1 1976-01-01  1976 Jan  
#>  3 11            1977-01-05 other          /0.1mL     1 1977-01-01  1977 Jan  
#>  4 11            1977-04-20 other          /0.1mL     1 1977-04-01  1977 Apr  
#>  5 11            1977-04-20 Tripos hircus  /0.1mL     1 1977-04-01  1977 Apr  
#>  6 11            1977-07-13 other          /0.1mL    12 1977-07-01  1977 Jul  
#>  7 11            1978-01-11 other          /0.1mL    16 1978-01-01  1978 Jan  
#>  8 11            1979-02-08 other          /0.1mL     1 1979-01-01  1979 Feb  
#>  9 11            1979-05-02 Karenia brevis /0.1mL     1 1979-04-01  1979 May  
#> 10 11            1979-05-30 other          /0.1mL     1 1979-04-01  1979 May  
#> # … with 22,805 more rows

These data are highly summarized from the raw data file available online. Cell counts (as number of cells per 0.1mL) for selected taxa are summed for each station by quarters (i.e., Jan/Feb/Mar, Apr/May/Jun, etc.). The quarter is indicated in the yrqrt column specified by the starting date of each quarter (e.g., 1975-07-01 is the quarter Jul/Aug/Sep for 1975). These data are primarily used to support analyses in the water quality dashboard.
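
The quarterly summary described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the code used by read_importphyto(): the counts are made up, and only the yrqrt convention (the starting date of each quarter) is taken from the text.

```r
# Illustrative cell counts (values are made up for this sketch)
phyto <- data.frame(
  epchc_station = c('11', '11', '11', '11'),
  Date = as.Date(c('1975-07-23', '1975-08-14', '1975-09-02', '1975-10-08')),
  name = c('other', 'other', 'other', 'other'),
  count = c(2, 5, 1, 4)
)

# Month number, then the first month of the quarter it belongs to (1, 4, 7, or 10)
mo <- as.integer(format(phyto$Date, '%m'))
qrt_start <- 3L * ((mo - 1L) %/% 3L) + 1L

# Quarter label as the starting date of the quarter, e.g., 1975-07-01 for Jul/Aug/Sep
phyto$yrqrt <- as.Date(sprintf('%s-%02d-01', format(phyto$Date, '%Y'), qrt_start))

# Sum counts for each station, taxon, and quarter
qsum <- aggregate(count ~ epchc_station + name + yrqrt, data = phyto, FUN = sum)
qsum
```

The three samples from July through September collapse into one row for the quarter starting 1975-07-01, and the October sample falls in the quarter starting 1975-10-01.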


The functions anlz_avedat() and anlz_avedatsite() summarize the station data by bay segments or by sites, respectively. Both functions return annual means and monthly means by year for chlorophyll and light attenuation (based on Secchi depth measurements). These summaries are then used to determine if bay segment targets for water quality are met using the anlz_attain() and anlz_attainsite() functions.

Here we use anlz_avedat() to summarize the data by bay segment to estimate annual and monthly means for chlorophyll and light attenuation. The output is a two-element list for the annual (ann) and monthly (mos) means by segment.

avedat <- anlz_avedat(epcdata)
#> $ann
#> # A tibble: 584 x 4
#>       yr bay_segment var         val
#>    <dbl> <chr>       <chr>     <dbl>
#>  1  1974 HB          mean_chla 22.4 
#>  2  1974 LTB         mean_chla  4.24
#>  3  1974 MTB         mean_chla  9.66
#>  4  1974 OTB         mean_chla 10.2 
#>  5  1975 HB          mean_chla 27.9 
#>  6  1975 LTB         mean_chla  4.93
#>  7  1975 MTB         mean_chla 11.4 
#>  8  1975 OTB         mean_chla 13.2 
#>  9  1976 HB          mean_chla 29.5 
#> 10  1976 LTB         mean_chla  5.08
#> # … with 574 more rows
#> $mos
#> # A tibble: 4,460 x 5
#>    bay_segment    yr    mo var         val
#>    <chr>       <dbl> <dbl> <chr>     <dbl>
#>  1 HB           1974     1 mean_chla 36.2 
#>  2 LTB          1974     1 mean_chla  1.75
#>  3 MTB          1974     1 mean_chla 11.5 
#>  4 OTB          1974     1 mean_chla  4.4 
#>  5 HB           1974     2 mean_chla 42.4 
#>  6 LTB          1974     2 mean_chla  5.5 
#>  7 MTB          1974     2 mean_chla  9.35
#>  8 OTB          1974     2 mean_chla  4.07
#>  9 HB           1974     3 mean_chla 14.9 
#> 10 LTB          1974     3 mean_chla  5.88
#> # … with 4,450 more rows
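
The structure of this output can be mimicked with a short base R sketch. The observations are made up, and taking the annual mean as the average of the monthly means is an assumption of this sketch rather than a documented detail of anlz_avedat().

```r
# Illustrative station observations (made-up values)
wq <- data.frame(
  bay_segment = c('HB', 'HB', 'HB', 'OTB', 'OTB', 'OTB'),
  yr = c(1974, 1974, 1974, 1974, 1974, 1974),
  mo = c(1, 1, 2, 1, 2, 2),
  chla = c(30, 40, 20, 4, 6, 8)
)

# Monthly means by segment and year
mos <- aggregate(chla ~ bay_segment + yr + mo, data = wq, FUN = mean)

# Annual means taken as the average of the monthly means (an assumption of this sketch)
ann <- aggregate(chla ~ bay_segment + yr, data = mos, FUN = mean)

# Two-element list with the same names as the output above
list(ann = ann, mos = mos)
```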

This output can then be further analyzed with anlz_attain() to determine if the bay segment outcomes are met in each year. The results are used by the plotting functions described below. In short, the chl_la column gives the categorical outcomes for chlorophyll and light attenuation for each segment, with each outcome an integer from zero to three. Higher values indicate greater exceedance of the segment water quality thresholds, in both duration and magnitude.

anlz_attain(avedat)
#> # A tibble: 192 x 4
#>    bay_segment    yr chl_la outcome
#>    <chr>       <dbl> <chr>  <chr>  
#>  1 HB           1974 3_0    yellow 
#>  2 HB           1975 3_2    red    
#>  3 HB           1976 3_2    red    
#>  4 HB           1977 3_2    red    
#>  5 HB           1978 3_3    red    
#>  6 HB           1979 3_3    red    
#>  7 HB           1980 3_3    red    
#>  8 HB           1981 3_3    red    
#>  9 HB           1982 3_3    red    
#> 10 HB           1983 3_0    yellow 
#> # … with 182 more rows

Similar information can be obtained for individual sites using anlz_avedatsite() and anlz_attainsite(). The main difference is that a yes/no column met is added that indicates only whether the value at each site was above or below the segment threshold.

anlz_avedatsite(epcdata) %>% anlz_attainsite
#> # A tibble: 2,160 x 9
#>       yr bay_segment epchc_station var     val target smallex thresh met  
#>    <dbl> <chr>               <dbl> <chr> <dbl>  <dbl>   <dbl>  <dbl> <chr>
#>  1  1974 HB                      6 chla   25.6   13.2    14.1     15 no   
#>  2  1974 HB                      7 chla   21.6   13.2    14.1     15 no   
#>  3  1974 HB                      8 chla   22.6   13.2    14.1     15 no   
#>  4  1974 HB                     44 chla   23.4   13.2    14.1     15 no   
#>  5  1974 HB                     52 chla   23.5   13.2    14.1     15 no   
#>  6  1974 HB                     55 chla   20.2   13.2    14.1     15 no   
#>  7  1974 HB                     70 chla   33.1   13.2    14.1     15 no   
#>  8  1974 HB                     71 chla   25.8   13.2    14.1     15 no   
#>  9  1974 HB                     73 chla   17.6   13.2    14.1     15 no   
#> 10  1974 HB                     80 chla   10.5   13.2    14.1     15 yes  
#> # … with 2,150 more rows
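
The yes/no logic can be illustrated with two rows copied from the output above. This is a sketch only: comparing val against the target column is an assumption made here for illustration, and the same comparison could be made against thresh instead.

```r
# Values copied from the first and last rows of the output above
sites <- data.frame(
  epchc_station = c(6, 80),
  val = c(25.6, 10.5),
  target = c(13.2, 13.2)
)

# 'yes' if the site average is below the reference value, otherwise 'no'
sites$met <- ifelse(sites$val < sites$target, 'yes', 'no')
sites
```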


External package libraries in R can be used to plot the time series data. Here’s an example using the popular ggplot2 package. Some data wrangling with the dplyr package is done first to filter the data we want to plot.

library(dplyr)
library(ggplot2)

# Subset the data to a single station
toplo <- epcdata %>% 
  filter(epchc_station == '52')

# Plot the chlorophyll time series on a log scale
ggplot(toplo, aes(x = SampleTime, y = chla)) + 
  geom_line() + 
  geom_point() + 
  scale_y_log10() + 
  labs(
    y = 'Chlorophyll-a concentration (ug/L)', 
    x = NULL, 
    title = 'Chlorophyll trends',
    subtitle = 'Hillsborough Bay station 52, all dates'
    )

The show_thrplot() function provides a more descriptive assessment of annual trends for a chosen bay segment relative to defined targets or thresholds. In this plot we show the annual averages across stations in Old Tampa Bay (bay_segment = "OTB") for chlorophyll (thr = "chla"). The red line shows annual trends and the horizontal blue lines indicate the thresholds and targets for chlorophyll-a that are specific to Old Tampa Bay. The dashed and dotted blue lines indicate +1 and +2 standard errors for the management target shown by the solid line. The target and standard errors are considered when identifying the annual segment outcome for chlorophyll.

show_thrplot(epcdata, bay_segment = "OTB", thr = "chla")

We can show the same plot but for light attenuation by changing the thr = "chla" to thr = "la". Note the change in the horizontal reference lines for the light attenuation target.

show_thrplot(epcdata, bay_segment = "OTB", thr = "la")

The year range to plot can also be specified using the yrrng argument, where the default is yrrng = c(1975, 2018).

show_thrplot(epcdata, bay_segment = "OTB", thr = "la", yrrng = c(2000, 2018))

The show_thrplot() function uses results from the anlz_avedat() function. For example, you can retrieve the values from the above plot as follows:

epcdata %>% 
  anlz_avedat %>% 
  .[['ann']] %>% 
  filter(bay_segment == 'OTB') %>% 
  filter(var == 'mean_la') %>% 
  filter(yr >= 2000 & yr <= 2018)
#> # A tibble: 19 x 4
#>       yr bay_segment var       val
#>    <dbl> <chr>       <chr>   <dbl>
#>  1  2000 OTB         mean_la 0.733
#>  2  2001 OTB         mean_la 0.951
#>  3  2002 OTB         mean_la 0.927
#>  4  2003 OTB         mean_la 1.04 
#>  5  2004 OTB         mean_la 0.878
#>  6  2005 OTB         mean_la 0.769
#>  7  2006 OTB         mean_la 0.620
#>  8  2007 OTB         mean_la 0.677
#>  9  2008 OTB         mean_la 0.696
#> 10  2009 OTB         mean_la 0.808
#> 11  2010 OTB         mean_la 0.842
#> 12  2011 OTB         mean_la 0.912
#> 13  2012 OTB         mean_la 0.687
#> 14  2013 OTB         mean_la 0.567
#> 15  2014 OTB         mean_la 0.606
#> 16  2015 OTB         mean_la 0.560
#> 17  2016 OTB         mean_la 0.575
#> 18  2017 OTB         mean_la 0.682
#> 19  2018 OTB         mean_la 0.678

Similarly, the show_boxplot() function provides an assessment of seasonal changes in chlorophyll or light attenuation values by bay segment. The most recent year is highlighted in red by default. This allows a simple evaluation of how the most recent year compares to historical averages. The large exceedance value is shown in blue text and as the dotted line. This corresponds to a “large” magnitude change of +2 standard errors above the bay segment threshold and is the same dotted line shown in show_thrplot().

show_boxplot(epcdata, param = 'chla', bay_segment = "OTB")

show_boxplot(epcdata, param = 'la', bay_segment = "HB")

A different subset of years and selected year of interest can also be viewed by changing the yrrng and yrsel arguments. Here we show 1980 compared to monthly averages for 2008 to 2018.

show_boxplot(epcdata, param = 'chla', bay_segment = "OTB", yrrng = c(2008, 2018), yrsel = 1980)

The show_thrplot() function is useful to understand annual variation in chlorophyll and light attenuation relative to management targets for each bay segment. The information from these plots can provide an understanding of how the annual reporting outcomes are determined. As noted above, an outcome integer from zero to three is assigned to each bay segment for each annual estimate of chlorophyll and light attenuation. These outcomes are based on both the exceedance of the annual estimate above the threshold or target (blue lines in show_thrplot()) and duration of the exceedance for the years prior. The following graphic describes this logic [8].

Outcomes for annual estimates of water quality are assigned an integer value from zero to three depending on both magnitude and duration of the exceedance.

These outcomes are assigned for both chlorophyll and light attenuation. The duration criteria are determined based on whether the exceedance was observed for years prior to the current year. The exceedance criteria for chlorophyll and light-attenuation are specific to each segment. The tbeptools package contains a targets data file that is a reference for determining annual outcomes. This file is loaded automatically with the package and can be viewed from the command line.

targets
#>   bay_segment             name chla_target chla_smallex chla_thresh la_target
#> 1         OTB    Old Tampa Bay         8.5          8.9         9.3      0.83
#> 2          HB Hillsborough Bay        13.2         14.1        15.0      1.58
#> 3         MTB Middle Tampa Bay         7.4          7.9         8.5      0.83
#> 4         LTB  Lower Tampa Bay         4.6          4.8         5.1      0.63
#>   la_smallex la_thresh
#> 1       0.86      0.88
#> 2       1.63      1.67
#> 3       0.87      0.91
#> 4       0.66      0.68
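
As a rough illustration of how these reference values could be used, the sketch below classifies the magnitude of an annual chlorophyll mean for Old Tampa Bay. The reference values are copied from the targets table above, but the annual means are hypothetical and the three-level coding is an assumption of this sketch, not the package's documented logic. The duration component of the outcome is not covered.

```r
# Chlorophyll reference values for Old Tampa Bay, copied from the targets table above
chla_smallex <- 8.9
chla_thresh <- 9.3

# Hypothetical annual chlorophyll means (ug/L)
annual_chla <- c(8.0, 9.0, 10.1)

# 0 = below the small exceedance value, 1 = small exceedance, 2 = large exceedance
# (coding assumed for illustration)
magnitude <- findInterval(annual_chla, c(chla_smallex, chla_thresh))
magnitude
```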

The final plotting function is show_matrix(), which creates an annual reporting matrix that reflects the combined outcomes for chlorophyll and light attenuation. Tracking the attainment of bay segment specific targets for these indicators provides the framework from which bay management actions are developed and initiated. For each year and segment, a color-coded management action is assigned:

Stay the Course: Continue planned projects. Report data via annual progress reports and Baywide Environmental Monitoring Report.

Caution: Review monitoring data and nitrogen loading estimates. Begin/continue TAC and Management Board development of specific management recommendations.

On Alert: Finalize development and implement appropriate management actions to get back on track.

The management category or action is based on the combination of outcomes for chlorophyll and light attenuation [8].

Management action categories assigned to each bay segment and year based on chlorophyll and light attenuation outcomes.

The results can be viewed with show_matrix().

show_matrix(epcdata)


The matrix is also a ggplot object and its layout can be changed using ggplot elements. Note the use of txtsz = NULL to remove the color labels.

show_matrix(epcdata, txtsz = NULL) +
  scale_y_continuous(expand = c(0,0), breaks = c(1975:2018)) + 
  coord_flip() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 7))

If preferred, the matrix can also be returned as an HTML table that can be sorted and scrolled. Only the first ten rows are shown by default. The default number of rows (10) can be changed with the corresponding argument of show_matrix(). Use a very large number to show all rows.

show_matrix(epcdata, asreact = TRUE)

A plotly object (an interactive, dynamic plot) can be returned by setting the plotly argument to TRUE.

show_matrix(epcdata, plotly = TRUE)

Results can also be obtained for a selected year. Outcomes can be returned in tabular format with anlz_yrattain(). This table also shows segment averages for chlorophyll and light attenuation, including the associated targets.

anlz_yrattain(epcdata, yrsel = 2018)
#> # A tibble: 4 x 6
#>   bay_segment chla_val chla_target la_val la_target outcome
#>   <fct>          <dbl>       <dbl>  <dbl>     <dbl> <chr>  
#> 1 OTB             9.22         8.5  0.678      0.83 yellow 
#> 2 HB             13.9         13.2  1.09       1.58 green  
#> 3 MTB             7.05         7.4  0.570      0.83 green  
#> 4 LTB             4.65         4.6  0.593      0.63 green

A map showing if individual sites achieved chlorophyll targets can be obtained with show_sitemap(). The station averages for chlorophyll for the selected year are shown next to each point. Stations in red failed to meet the segment target.

show_sitemap(epcdata, yrsel = 2018)

The show_sitemap() function also includes an argument to specify a particular monthly range for the selected year. If this option is chosen, averages are shown as continuous values at each station.

show_sitemap(epcdata, yrsel = 2018, mosel = c(7, 9))

Bay segment exceedances can also be viewed in a matrix using show_wqmatrix(). The thresholds for these values correspond to the Florida DEP criteria (or a large exceedance defined as +2 standard errors above the segment target).

show_wqmatrix(epcdata)


By default, the show_wqmatrix() function returns chlorophyll exceedances by segment. Light attenuation exceedances can be viewed by changing the param argument.

show_wqmatrix(epcdata, param = 'la')

The results from show_matrix() and show_wqmatrix() can be combined for an individual segment using the show_segmatrix() function. This is useful to understand which water quality parameter is driving the management outcome for a given year. The plot shows the light attenuation and chlorophyll outcomes from show_wqmatrix() next to the segment management outcomes from show_matrix(). Only one segment can be plotted for each function call.

show_segmatrix(epcdata, bay_segment = 'OTB')

Finally, all segment plots can be shown together using the show_segplotly() function that combines chlorophyll and secchi data for a given segment. This function combines outputs from show_thrplot() and show_segmatrix(). The final plot is interactive and can be zoomed by dragging the mouse pointer over a section of the plot. Information about each cell or value can be seen by hovering over a location in the plot. Please note that the scaling here is horrible, but this can be changed when creating the plot on your own.

From these plots, we can quickly view a summary of the environmental history of water quality in Tampa Bay. Degraded conditions were common early in the period of record, particularly for Old Tampa Bay and Hillsborough Bay. Conditions began to improve by the late 1980s and early 1990s, with good conditions persisting to present day. However, recent trends in Old Tampa Bay have shown conditions changing from “stay the course” to “caution”.


[1] A. Poe, K. Hackett, S. Janicki, R. Pribble, A. Janicki, Estimates of total nitrogen, total phosphorus, total suspended solids, and biochemical oxygen demand loadings to Tampa Bay, Florida: 1999-2003, Tampa Bay Estuary Program, St. Petersburg, Florida, USA, 2005.

[2] H. Greening, A. Janicki, Toward reversal of eutrophic conditions in a subtropical estuary: Water quality and seagrass response to nitrogen loading reductions in Tampa Bay, Florida, USA, Environmental Management. 38 (2006) 163–178.

[3] D.A. Tomasko, C.A. Corbett, H.S. Greening, G.E. Raulerson, Spatial and temporal variation in seagrass coverage in Southwest Florida: Assessing the relative effects of anthropogenic nutrient load reductions and rainfall in four contiguous estuaries, Marine Pollution Bulletin. 50 (2005) 797–805.

[4] S.L. Santos, J.L. Simon, Marine soft-bottom community establishment following annual defaunation: Larval or adult recruitment, Marine Ecology - Progress Series. 2 (1980) 235–241.

[5] M.W. Beck, J.D. Hagy III, Adaptation of a weighted regression approach to evaluate water quality trends in an estuary, Environmental Modelling and Assessment. 20 (2015) 637–655.

[6] E.T. Sherwood, H.S. Greening, A.J. Janicki, D.J. Karlen, Tampa Bay estuary: Monitoring long-term recovery through regional partnerships, Regional Studies in Marine Science. 4 (2016) 1–11.

[7] TBEP (Tampa Bay Estuary Program), Tampa Bay Water Atlas, (2017).

[8] A. Janicki, D. Wade, J.R. Pribble, Developing and Establishing a Process to Track the Status of Chlorophyll-a Concentrations and Light Attenuation to Support Seagrass Restoration Goals in Tampa Bay, Tampa Bay Estuary Program, St. Petersburg, Florida, 2000.