Analyze Fecal Indicator Bacteria categories over time by station or bay segment
Source: R/anlz_fibmatrix.R
Usage
anlz_fibmatrix(
  fibdata,
  yrrng = NULL,
  stas = NULL,
  bay_segment = NULL,
  indic,
  threshold = NULL,
  lagyr = 3,
  subset_wetdry = c("all", "wet", "dry"),
  precipdata = NULL,
  temporal_window = NULL,
  wet_threshold = NULL,
  warn = TRUE
)
Arguments
- fibdata
input data frame as returned by read_importfib, read_importentero, or read_importwqp, see details
- yrrng
numeric vector indicating min, max years to include, defaults to range of years in data, see details
- stas
optional vector of stations to include, see details
- bay_segment
optional vector of bay segment names to include, supersedes stas if provided, see details
- indic
character for choice of fecal indicator. Allowable options are fcolif for fecal coliform, or entero for Enterococcus. A numeric column in the data frame must have this name.
- threshold
optional numeric for threshold against which to calculate exceedances for the indicator bacteria of choice. If not provided, defaults to 400 for fcolif and 130 for entero.
- lagyr
numeric for year lag to calculate categories, see details
- subset_wetdry
character indicating whether to subset the data frame to only wet or dry samples as defined by wet_threshold and temporal_window. Defaults to "all", which will not subset. If "wet" or "dry" is specified, anlz_fibwetdry is called using the further specified parameters, and the data frame is subset accordingly.
- precipdata
input data frame as returned by read_importrain. Columns should be: station, date (yyyy-mm-dd), rain (in inches). The object catchprecip has this data from 1995-2023 for select Enterococcus stations. If NULL, defaults to catchprecip.
- temporal_window
numeric; required if subset_wetdry is not "all". Number of days precipitation should be summed over (1 = day of sample only; 2 = day of sample + day before; etc.)
- wet_threshold
numeric; required if subset_wetdry is not "all". Inches accumulated through the defined temporal window, above which a sample should be defined as being from a 'wet' time period
- warn
logical to print warnings about stations with insufficient data, default TRUE
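The way temporal_window and wet_threshold interact can be sketched in base R. This is a simplified illustration under stated assumptions, not the code used by anlz_fibwetdry: the helper name classify_wet and the rainfall values are hypothetical, rainfall is assumed to be recorded once per day for a single station, and a sample is treated as wet only when the summed rainfall is strictly above the threshold, following the wording of the wet_threshold description.

```r
# Hypothetical helper (not part of the package): flag samples as wet (TRUE)
# or dry (FALSE) from daily rainfall totals.
# sample_dates: Date vector of sampling dates
# rain: data frame with columns date (Date) and rain (inches), one row per day
classify_wet <- function(sample_dates, rain, temporal_window, wet_threshold) {
  vapply(seq_along(sample_dates), function(i) {
    d <- sample_dates[i]
    # window covers the sample day and the (temporal_window - 1) days before it
    in_window <- rain$date <= d & rain$date > d - temporal_window
    sum(rain$rain[in_window]) > wet_threshold
  }, logical(1))
}

# made-up rainfall record and sampling dates
rain <- data.frame(
  date = as.Date("2020-06-01") + 0:4,
  rain = c(0, 0.4, 0.2, 0, 0.1)
)
samples <- as.Date(c("2020-06-03", "2020-06-05"))

# two-day window: day of sample plus the day before
classify_wet(samples, rain, temporal_window = 2, wet_threshold = 0.5)
```

With a two-day window the first sample accumulates 0.6 inches and is flagged wet, while the second accumulates 0.1 inches and is flagged dry.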
Value
A tibble object with FIB summaries by year and station including columns for the estimated geometric mean of fecal coliform or Enterococcus concentrations (gmean), the proportion of samples exceeding 400 CFU / 100 mL (fecal coliform) or 130 CFU / 100 mL (Enterococcus) (exced), the count of samples (cnt), and a category indicating a letter outcome based on the proportion of exceedances (cat). Results can be summarized by bay segment if bay_segment is not NULL and the input data is from read_importentero.
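As a rough illustration of the summary columns, the quantities behind gmean, exced, and cnt can be computed in base R for a single station and year. The concentrations below are made up, and the sketch assumes strictly positive values and a strict "greater than" comparison against the threshold; the function itself may handle censored values or ties differently.

```r
# Made-up concentrations (CFU / 100 mL) for one station and year
conc <- c(20, 150, 800, 35, 410, 60)
threshold <- 400  # default for fcolif per the threshold argument

gmean <- exp(mean(log(conc)))    # geometric mean; assumes all values > 0
exced <- mean(conc > threshold)  # proportion of samples above the threshold
cnt <- length(conc)              # number of samples

c(gmean = gmean, exced = exced, cnt = cnt)
```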
Details
This function is used to create output for plotting a matrix stoplight graphic for FIB categories by station. The output can also be summarized by bay segment if bay_segment is not NULL and the input data is from read_importentero. In the latter case, the stas argument is ignored and all stations within each subsegment watershed are used to evaluate the FIB categories. Each station (or bay segment) and year combination is categorized based on the likelihood of fecal indicator bacteria concentrations exceeding some threshold in a given year. For fecal coliform, the default threshold is 400 CFU / 100 mL (using the fcolif column in fibdata). For Enterococcus, the default threshold is 130 CFU / 100 mL. The proportions are categorized as A, B, C, D, or E (Microbial Water Quality Assessment or MWQA categories) with corresponding colors, where the breakpoints for each category are <10%, 10-30%, 30-50%, 50-75%, and >75% (right-closed). By default, the results for each year are based on a right-centered window that uses the previous two years and the current year to calculate probabilities using the monthly samples (lagyr = 3). See show_fibmatrix for additional details.
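The windowing and breakpoints described above can be sketched in base R. This is a simplified illustration only: the data are made up, the variable names are hypothetical, and it works directly with the pooled proportion of exceedances. It does not reproduce how the function itself estimates likelihoods or treats stations and years with insufficient data.

```r
# Made-up samples for a single station
dat <- data.frame(
  yr = c(2001, 2001, 2002, 2002, 2003, 2003, 2004, 2004),
  conc = c(50, 500, 30, 20, 600, 700, 10, 15)
)
threshold <- 400
lagyr <- 3

yrs <- sort(unique(dat$yr))
exced <- vapply(yrs, function(y) {
  # pool samples from the current year and the (lagyr - 1) years before it
  win <- dat$conc[dat$yr <= y & dat$yr > y - lagyr]
  mean(win > threshold)
}, numeric(1))

# right-closed intervals:
# (-Inf, 0.1], (0.1, 0.3], (0.3, 0.5], (0.5, 0.75], (0.75, Inf)
mwqa_cat <- cut(
  exced,
  breaks = c(-Inf, 0.1, 0.3, 0.5, 0.75, Inf),
  labels = c("A", "B", "C", "D", "E"),
  right = TRUE
)

data.frame(yr = yrs, exced = exced, cat = mwqa_cat)
```

Because the intervals are right-closed, a proportion of exactly 0.5 falls in category C rather than D.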
yrrng can be specified several ways. If yrrng = NULL, the year range of the data for the selected stations is used. User-defined values for the minimum and maximum years can also be used, or only a minimum or maximum can be specified, e.g., yrrng = c(2000, 2010) or yrrng = c(2000, NA). In the latter case, the maximum year will be defined by the data.
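The rules for yrrng can be expressed as a small helper. The function resolve_yrrng below is hypothetical and is not part of the package; it only illustrates how NULL or NA entries might be filled from the years present in the data.

```r
# Hypothetical helper: replace NULL or NA entries in yrrng with the data range
resolve_yrrng <- function(yrrng, yrs) {
  if (is.null(yrrng)) yrrng <- c(NA, NA)
  if (is.na(yrrng[1])) yrrng[1] <- min(yrs, na.rm = TRUE)
  if (is.na(yrrng[2])) yrrng[2] <- max(yrs, na.rm = TRUE)
  yrrng
}

yrs <- c(1998, 2003, 2015)  # made-up years present in the data

resolve_yrrng(NULL, yrs)           # both limits taken from the data
resolve_yrrng(c(2000, NA), yrs)    # maximum taken from the data
resolve_yrrng(c(2000, 2010), yrs)  # both limits user-defined
```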
The default stations for fecal coliform data are those used in TBEP report #05-13 (https://drive.google.com/file/d/1MZnK3cMzV7LRg6dTbCKX8AOZU0GNurJJ/view) for the Hillsborough River Basin Management Action Plan (BMAP) subbasins if bay_segment is NULL and the input data are from read_importfib. These include Blackwater Creek (WBID 1482, EPC stations 143, 108), Baker Creek (WBID 1522C, EPC station 107), Lake Thonotosassa (WBID 1522B, EPC stations 135, 118), Flint Creek (WBID 1522A, EPC station 148), and the Lower Hillsborough River (WBID 1443E, EPC stations 105, 152, 137). Other stations can be plotted using the stas argument.
Input from read_importwqp for Manatee County (21FLMANA_WQX) FIB data can also be used. The function has not been tested for other organizations.
Examples
anlz_fibmatrix(fibdata, indic = 'fcolif')
#> # A tibble: 459 × 6
#> yr grp gmean Latitude Longitude cat
#> <dbl> <fct> <dbl> <dbl> <dbl> <chr>
#> 1 1974 143 NA NA NA NA
#> 2 1974 108 NA NA NA NA
#> 3 1974 107 NA NA NA NA
#> 4 1974 135 NA NA NA NA
#> 5 1974 118 NA NA NA NA
#> 6 1974 148 NA NA NA NA
#> 7 1974 105 NA NA NA NA
#> 8 1974 152 NA NA NA NA
#> 9 1974 137 NA NA NA NA
#> 10 1975 143 NA NA NA NA
#> # ℹ 449 more rows
# use different indicator
anlz_fibmatrix(fibdata, indic = 'entero')
#> # A tibble: 216 × 6
#> yr grp gmean Latitude Longitude cat
#> <dbl> <fct> <dbl> <dbl> <dbl> <chr>
#> 1 2001 143 NA NA NA NA
#> 2 2001 108 NA NA NA NA
#> 3 2001 107 NA NA NA NA
#> 4 2001 135 NA NA NA NA
#> 5 2001 118 NA NA NA NA
#> 6 2001 148 NA NA NA NA
#> 7 2001 105 NA NA NA NA
#> 8 2001 152 NA NA NA NA
#> 9 2001 137 NA NA NA NA
#> 10 2002 143 NA NA NA NA
#> # ℹ 206 more rows
# use different dataset
anlz_fibmatrix(enterodata, indic = 'entero', lagyr = 1)
#> Warning: Stations with insufficient data for lagyr: 21FLPDEM_WQX-05-06
#> # A tibble: 1,224 × 6
#> yr grp gmean Latitude Longitude cat
#> <dbl> <fct> <dbl> <dbl> <dbl> <chr>
#> 1 2000 21FLCOSP_WQX-32-03 NA NA NA NA
#> 2 2000 21FLCOSP_WQX-44-02 NA NA NA NA
#> 3 2000 21FLCOSP_WQX-48-03 NA NA NA NA
#> 4 2000 21FLCOSP_WQX-CENTRAL CANAL NA NA NA NA
#> 5 2000 21FLCOSP_WQX-COSP580 NA NA NA NA
#> 6 2000 21FLCOSP_WQX-NORTH CANAL NA NA NA NA
#> 7 2000 21FLCOSP_WQX-SC-01 NA NA NA NA
#> 8 2000 21FLCOSP_WQX-SOUTH CANAL NA NA NA NA
#> 9 2000 21FLDOH_WQX-MANATEE152 10.7 27.5 -82.7 A
#> 10 2000 21FLHILL_WQX-101 NA NA NA NA
#> # ℹ 1,214 more rows
# same entero data; lower threshold - changes 'cat' scores
anlz_fibmatrix(enterodata, indic = 'entero', lagyr = 1, threshold = 30)
#> Warning: Stations with insufficient data for lagyr: 21FLPDEM_WQX-05-06
#> # A tibble: 1,224 × 6
#> yr grp gmean Latitude Longitude cat
#> <dbl> <fct> <dbl> <dbl> <dbl> <chr>
#> 1 2000 21FLCOSP_WQX-32-03 NA NA NA NA
#> 2 2000 21FLCOSP_WQX-44-02 NA NA NA NA
#> 3 2000 21FLCOSP_WQX-48-03 NA NA NA NA
#> 4 2000 21FLCOSP_WQX-CENTRAL CANAL NA NA NA NA
#> 5 2000 21FLCOSP_WQX-COSP580 NA NA NA NA
#> 6 2000 21FLCOSP_WQX-NORTH CANAL NA NA NA NA
#> 7 2000 21FLCOSP_WQX-SC-01 NA NA NA NA
#> 8 2000 21FLCOSP_WQX-SOUTH CANAL NA NA NA NA
#> 9 2000 21FLDOH_WQX-MANATEE152 10.7 27.5 -82.7 A
#> 10 2000 21FLHILL_WQX-101 NA NA NA NA
#> # ℹ 1,214 more rows
# subset to only wet samples
anlz_fibmatrix(enterodata, indic = 'entero', lagyr = 1, subset_wetdry = "wet",
  temporal_window = 2, wet_threshold = 0.5)
#> # A tibble: 1,150 × 6
#> yr grp gmean Latitude Longitude cat
#> <dbl> <fct> <dbl> <dbl> <dbl> <chr>
#> 1 2001 21FLCOSP_WQX-32-03 NA NA NA NA
#> 2 2001 21FLCOSP_WQX-44-02 NA NA NA NA
#> 3 2001 21FLCOSP_WQX-48-03 NA NA NA NA
#> 4 2001 21FLCOSP_WQX-CENTRAL CANAL NA NA NA NA
#> 5 2001 21FLCOSP_WQX-COSP580 NA NA NA NA
#> 6 2001 21FLCOSP_WQX-NORTH CANAL NA NA NA NA
#> 7 2001 21FLCOSP_WQX-SC-01 NA NA NA NA
#> 8 2001 21FLCOSP_WQX-SOUTH CANAL NA NA NA NA
#> 9 2001 21FLDOH_WQX-MANATEE152 31.6 27.5 -82.7 A
#> 10 2001 21FLHILL_WQX-101 2252. 28.0 -82.6 C
#> # ℹ 1,140 more rows
# Manatee County data
anlz_fibmatrix(mancofibdata, indic = 'fcolif', lagyr = 1)
#> # A tibble: 1,350 × 6
#> yr grp gmean Latitude Longitude cat
#> <dbl> <fct> <dbl> <dbl> <dbl> <chr>
#> 1 1995 396 NA NA NA NA
#> 2 1995 BC1 NA NA NA NA
#> 3 1995 BC2 NA NA NA NA
#> 4 1995 BC41 NA NA NA NA
#> 5 1995 BL01 NA NA NA NA
#> 6 1995 BL201 NA NA NA NA
#> 7 1995 BR1 14.8 27.4 -82.5 A
#> 8 1995 BR2 41.1 27.4 -82.5 A
#> 9 1995 BR3 62.4 27.4 -82.5 A
#> 10 1995 BU01A NA NA NA NA
#> # ℹ 1,340 more rows