Generalized Additive Models (GAMs)
The current management paradigm for Old Tampa Bay (OTB) holds that (1) reducing external nitrogen loads will reduce chlorophyll a concentrations in the OTB water column, (2) reducing chlorophyll a concentrations will improve water clarity, and (3) improved water clarity will stimulate seagrass growth/recovery. The management paradigm is based on earlier studies that reported significant correlations between each subsequent pair of covariates, considering data from across Tampa Bay as a whole and/or from Tampa Bay segments including OTB (e.g., Janicki & Wade 1996; Morrison et al. 1996; Greening & Janicki 2016).
This document explores statistical relationships between TN loads and chlorophyll-a concentrations and between chlorophyll-a concentrations and Secchi depth at various OTB sub-segments using monthly averaged data between approximately 2000 and 2023 (periods of record vary by data source and variable). The analyses were implemented in R version 4.3.1 (R Core Team 2023).
The results of linear models indicate correlations at each OTB sub-segment:
The results of generalized additive models (GAMs) indicated that the TN-chlorophyll and chlorophyll-Secchi relationships varied substantially over time at some OTB sub-segments. We conclude that correlation analyses and simple regression models do not reliably capture the potentially synergistic effects of multiple stressors influencing water quality and seagrass abundance over time.
Water quality and external loading data were assembled by the
../R/data_proc.R
script. This section processes these data
to support the subsequent analyses, including a few basic QA/QC
checks.
We begin by loading packages (see References).
::opts_chunk$set(echo=TRUE, message=FALSE, warning=FALSE)
knitrlibrary(lubridate)
library(plyr)
library(dplyr)
library(tidyr)
library(mgcv)
This section loads the EPC water quality data from file
(../data/epcwq.RData
), pivots the dataframe from wide to
long format, performs QC checks, aggregates the data to a monthly
timeframe (monthly means), and creates a new epcwq2
dataframe for analysis. As indicated in output below, a substantial
number of Secchi records indicate that reported Secchi depth equals the
total depth (i.e., the Secchi depth was visible at bottom); in these
cases, the Secchi depth is not necessarily a reliable proxy for water
clarity (e.g., in shallow areas).
# load data
load( here::here("data/epcwq.RData" ))
# dates
<- epcwq |> as.data.frame()
epcwq1 $date <- epcwq1$SampleTime |> as.Date("%Y-%m-%d")
epcwq1# subset columns (data and 'Q' columns must be listed in consecutive pairs!)
<- select( epcwq1, date, StationNumber, SampleDepth,
epcwq1
Total_Nitrogen, Total_NitrogenQ,
Chlorophyll_a, Chlorophyll_aQ,
SecchiDepth, Secchi_Q,
Total_Suspended_Solids, Total_Suspended_SolidsQ,
Turbidity, TurbidityQ )# subset OTB stations
<- epcwq1[ which( epcwq1$StationNumber %in% c(36, 38, 40, 41, 42, 46, 47,
epcwq1 50, 51, 60, 61, 62, 63, 64,
65, 66, 67, 68) ), ]
# subset by date
<- epcwq1[ which( year(epcwq1$date) >= 2000 ), ]
epcwq1 # loop through data and QC columns to pivot from wide to long format
<- seq( 4, 12, 2 )
datacols <- data.frame( matrix(ncol=7,nrow=0) )
epcwq2 for( i in datacols ){
<- epcwq1[ ,1:3 ]
temp $Analyte <- colnames(epcwq1)[i]
temp$value <- epcwq1[,i] |> as.numeric()
temp$unit <- NA
temp$QA <- epcwq1[,i+1]
temp<- temp[ complete.cases( temp$value ), ]
temp <- rbind( epcwq2, temp )
epcwq2
}# standardize parameter names
<- epcwq2$Analyte |> unique() |> sort()
param.names.old <- c( "Chla", "Secchi", "TN", "TSS", "Turbidity" )
param.names.new $param <- mapvalues( epcwq2$Analyte, param.names.old, param.names.new )
epcwq2# units
<- c( "ug/l", "m", "mg/l", "mg/l", "NTU" ) # listed in same order as param.names.new, above
units $unit <- mapvalues( epcwq2$param, param.names.new, units )
epcwq2# qualifier codes
# replace "NULL" with NA
$QA[ which(epcwq2$QA=="NULL") ] <- NA
epcwq2# specify codes
<- c("*","?","A","B","G","H","J","K",
fatal.codes "L","N","O","Q","T","V","Y","Z")
# define function to locate fatal records
<- function( QUALIFIER, FATAL=fatal.codes ){ # QUALIFER arg is a string
find.fatal <- QUALIFIER |> strsplit(split='') |>
input.code unlist() |> toupper() # parse string into single characters
<- input.code %in% FATAL |> any() # check if any characters match FATAL
fatal return( fatal ) # return TRUE or FALSE
# // end find.fatal()
} # apply function to locate fatal records
<- apply( matrix(epcwq2$QA,ncol=1), 1, find.fatal ) |> which()
rm.fatal.idx <- epcwq2[ -rm.fatal.idx, ]
epcwq2 # acknowledge Secchi records visible on bottom
which( epcwq2$param=="Secchi" & epcwq2$QA==">"), ] |> nrow() # number of VOB Secchi records epcwq2[
## [1] 1333
which( epcwq2$param=="Secchi"), ] |> nrow() # total number of Secchi records epcwq2[
## [1] 5092
# coerce depth to numeric
$SampleDepth <- epcwq2$SampleDepth |> as.numeric()
epcwq2# check for duplicates (confirm FALSE)
|> duplicated() |> any() epcwq2
## [1] FALSE
# subset and rename columns
<- epcwq2[, c("date","param","value","unit","StationNumber") ]
epcwq2 colnames( epcwq2 ) <- c("date","param", "value", "unit", "site" )
# aggregate data to monthly timeframe
<- epcwq2 |> group_by(date,param,site,unit) |> summarise(value=mean(value)) |> as.data.frame()
epcwq2 $date <- epcwq2$date |> floor_date('month') epcwq2
The dataset contains 19,602 rows.
save(epcwq2, file = here::here('data/epcwq2.RData'))
head( epcwq2 )
## date param site unit value
## 1 2000-01-01 Chla 36 ug/l 3.563
## 2 2000-01-01 Chla 38 ug/l 3.161
## 3 2000-01-01 Chla 40 ug/l 2.221
## 4 2000-01-01 Chla 41 ug/l 2.053
## 5 2000-01-01 Chla 46 ug/l 3.255
## 6 2000-01-01 Chla 47 ug/l 2.911
Pinellas County collects water quality data at five strata along the western boundary of OTB:
The code below loads the raw data from file
(../data-raw/DataDownload_2999342_row.csv
), performs QC
checks, standardizes column names, aggregates the data to a monthly
timeframe (monthly means), and creates a new pcwq2
dataframe for analysis.
# load data
<- read.csv( here::here("data-raw/DataDownload_2999342_row.csv" ))[,1:21]
pcwq1 # dates
$Date <- pcwq1$SampleDate |> as.Date("%m/%d/%Y")
pcwq1# qualifier codes
# specify codes
<- c("*","?","A","B","G","H","J","K",
fatal.codes "L","N","O","Q","T","V","Y","Z")
# define function to locate fatal records
<- function( QUALIFIER, FATAL=fatal.codes ){ # QUALIFER arg is a string
find.fatal <- QUALIFIER |> strsplit(split='') |>
input.code unlist() |> toupper() # parse string into single characters
<- input.code %in% FATAL |> any() # check if any characters match FATAL
fatal return( fatal ) # return TRUE or FALSE
# // end find.fatal()
} # apply function to locate fatal records
<- apply( matrix(pcwq1$QACode,ncol=1), 1, find.fatal ) |> which()
rm.fatal.idx <- pcwq1[ -rm.fatal.idx, ]
pcwq1 # parameter names
<- pcwq1$Parameter |> unique() |> sort()
param.names.old <- c( "BOD", "Chla", "ChlaC", "Chlb", "Chlc", "DO", "DO_Sat",
param.names.new "NOx", "Salinity", "TempW", "TKN", "TN", "TSS", "Turbidity" )
$Analyte <- mapvalues( pcwq1$Parameter, param.names.old, param.names.new )
pcwq1# result units
$Unit <- lapply( pcwq1$Parameter, function(x) strsplit(x,"_")[[1]][2] ) |> unlist()
pcwq1# check for duplicates (confirm FALSE)
|> duplicated() |> any() pcwq1
## [1] FALSE
# subset and rename columns
<- pcwq1[, c("Date","Analyte","Result_Value","Unit","StationID") ]
pcwq2 colnames( pcwq2 ) <- c("date","param", "value", "unit", "site" )
# aggregate data to monthly timeframe
<- pcwq2 |> group_by(date,param,site,unit) |> summarise(value=mean(value)) |> as.data.frame()
pcwq2 $date <- pcwq2$date |> floor_date('month') pcwq2
The dataset contains 11,695 rows.
head( pcwq2 )
## date param site unit value
## 1 2000-01-01 BOD 62-01 mgl 1.00
## 2 2000-01-01 Chla 62-01 ugl 2.60
## 3 2000-01-01 Chlb 62-01 ugl 0.50
## 4 2000-01-01 Chlc 62-01 ugl 0.70
## 5 2000-01-01 DO 62-01 mgl 6.38
## 6 2000-01-01 NOx 62-01 ugl 20.00
This section loads the Tampa Bay Estuary Program’s monthly external
loading estimates for Tampa Bay
(../data-raw/TB_loads_monthly.csv
), computes total monthly
TN loads by summing over TN load sources to OTB (e.g., AD, NPS, DPS,
etc.), standardizes column names, and creates a new loads
dataframe for analysis.
# load data
<- read.csv( here::here("data-raw/TB_loads_monthly.csv" ))
loads # subset TN data for OTB segment
<- loads[ which( loads$bay_segment=="Old Tampa Bay"), c('year','month','tn_load') ]
loads # date column
$date <- as.Date( paste0(loads$year,"-",loads$month,"-",01), format="%Y-%m-%d" )
loads# aggregate total TN load across sources
<- loads |> group_by(date) |> summarise(tn_load=sum(tn_load)) |> as.data.frame()
loads # create columns
$param <- "TN load"
loads$site <- "OTB watershed"
loads$unit <- 'tons'
loads# rename columns
colnames( loads ) <- c('date','value','param','site','unit')
# re-order columns
<- select( loads, date, param, value, unit, site ) loads
The dataset contains 324 rows.
head( loads )
## date param value unit site
## 1 1995-01-01 TN load 20.78062 tons OTB watershed
## 2 1995-02-01 TN load 19.80657 tons OTB watershed
## 3 1995-03-01 TN load 18.34947 tons OTB watershed
## 4 1995-04-01 TN load 19.93694 tons OTB watershed
## 5 1995-05-01 TN load 58.07647 tons OTB watershed
## 6 1995-06-01 TN load 54.70274 tons OTB watershed
Janicki & Wade (1996) developed a linear regression model relating monthly average chlorophyll-a concentrations to cumulative TN loads over a three-month period (the concurrent month and two previous months). The model used data from across Tampa Bay during 1985–1991 and stratified the data by the month of the year and by bay segment, and it produced an R2 value of 0.73. Subsequently, Greening et al. (2014) developed a similar regression model for the 1985–2011 period and observed an R2 value of 0.66.
We apply a similar approach as Janciki & Wade (1996) to explore relationships between monthly average chlorophyll-a concentrations and TN loads at three OTB sub-segments during the 2000-2021 period. For simplicity, the models below do not stratify by the month of the year, in contrast to the model of Janicki & Wade (1996).
We begin by assembling the TN load and chlorophyll-a data from the
three OTB sub-segments compartmentalized by the Courtney Campbell,
Howard Frankland, and Gandy Blvd causeways/bridges, using water quality
data collected by EPC at fixed monitoring locations between 2000 and
2021. The three resulting datasets – epc1
,
epc2
, and epc3
– are arranged in wide format
and include the columns date
, chla
,
TN_load
(monthly load values), and TN_load_3mo
(cumulative 3-month loads).
# Subset and join load and chl-a data for OTB sub-segments
# Sub-segment north of Courtney Campbell
<- epcwq2[ which( epcwq2$site %in% c(46,47,60,61,62,64) ), ] # subset EPC data by station
epc1 <- epc1[which(epc1$param=="Chla"),] |> group_by(date) |> # aggregate chl data to monthly means
epc1 summarise(value=mean(value)) |> as.data.frame()
<- left_join( epc1, loads[,c('date','value')], by='date' ) # value.x is chl; value.y is load
epc1 colnames(epc1) <- c("date","chla","TN_load")
# Sub-segment between Courtney Campbell and Frankland
<- epcwq2[ which( epcwq2$site %in% c(40,41,42,63,65,66) ), ]
epc2 <- epc2[which(epc2$param=="Chla"),] |> group_by(date) |>
epc2 summarise(value=mean(value)) |> as.data.frame()
<- left_join( epc2, loads[,c('date','value')], by='date' )
epc2 colnames(epc2) <- c("date","chla","TN_load")
# Sub-segment between Frankland and Gandy
<- epcwq2[ which( epcwq2$site %in% c(38,50,51,67) ), ]
epc3 <- epc3[which(epc3$param=="Chla"),] |> group_by(date) |>
epc3 summarise(value=mean(value)) |> as.data.frame()
<- left_join( epc3, loads[,c('date','value')], by='date' )
epc3 colnames(epc3) <- c("date","chla","TN_load")
# Compute 3-month TN loads
# Define function to sum load values over a specified window (`win`)
# The function assumes there are no NAs
<- function( x, win = 3 ){
sum3mo <- rep(NA,length=(win-1))
out for( i in win:length(x) ){
<- sum( x[i:(i-win+1)] )
out[i]
}return( out )
}# Add a 3-month TN load column to each dataset
$TN_load_3mo <- sum3mo( epc1$TN_load )
epc1$TN_load_3mo <- sum3mo( epc2$TN_load )
epc2$TN_load_3mo <- sum3mo( epc3$TN_load ) epc3
The monthly sub-segment chlorophyll-a data and segment-wide TN load data are visualized below.
par( mfrow=c(2,2) )
plot( chla ~ date, data = epc1, pch = 16, col = rgb(0,0.4,0.8,0.7),
ylim = c(0,80), ylab = "Chlorophyll-a (ug/l)", xlab = "", las = 1,
main = "Chlorophyll-a, OTB north of Courtney Campbell" )
lines( chla ~ date, data = epc1, col = rgb(0,0.4,0.8,0.4) )
abline( v = seq.Date(as.Date("2000-01-01"),as.Date("2024-01-01"),'year'),
col = rgb(0,0,0,0.2) )
abline( h = axTicks(2), col = rgb(0,0,0,0.2) )
plot( chla ~ date, data = epc2, pch = 16, col = rgb(0,0.4,0.8,0.7),
ylim = c(0,80), ylab = "Chlorophyll-a (ug/l)", xlab = "", las = 1,
main = "Chlorophyll-a, OTB between Frankland & Courtney Campbell" )
lines( chla ~ date, data = epc2, col = rgb(0,0.4,0.8,0.4) )
abline( v = seq.Date(as.Date("2000-01-01"),as.Date("2024-01-01"),'year'),
col = rgb(0,0,0,0.2) )
abline( h = axTicks(2), col = rgb(0,0,0,0.2) )
plot( chla ~ date, data = epc3, pch = 16, col = rgb(0,0.4,0.8,0.7),
ylim = c(0,80), ylab = "Chlorophyll-a (ug/l)", xlab = "", las = 1,
main = "Chlorophyll-a, OTB between Gandy & Frankland" )
lines( chla ~ date, data = epc3, col = rgb(0,0.4,0.8,0.4) )
abline( v = seq.Date(as.Date("2000-01-01"),as.Date("2024-01-01"),'year'),
col = rgb(0,0,0,0.2) )
abline( h = axTicks(2), col = rgb(0,0,0,0.2) )
plot( TN_load ~ date, data = epc3, pch = 16, col = rgb(0.2,0.2,0.2,0.7),
ylab = "TN load (tons)", xlab = "", las = 1,
main = "Monthly TN load, Old Tampa Bay" )
lines( TN_load ~ date, data = epc3, col = rgb(0.2,0.2,0.2,0.4) )
abline( v = seq.Date(as.Date("2000-01-01"),as.Date("2024-01-01"),'year'),
col = rgb(0,0,0,0.2) )
abline( h = axTicks(2), col = rgb(0,0,0,0.2) )
Below, we fit linear models for chlorophyll as a function of
cumulative 3-month TN loads at each sub-segment and visualize the
results. The TN_chl_plot()
function takes the data
(dat
) and model (mod
) as input and produces a
scatterplot of the data with the regression line. The models show
significant relationships (P<0.001) and R2 values of 0.32, 0.37, and
0.46. However, these R2 values are not directly comparable to the
R2=0.73 of the Janicki & Wade (1996) model, because the models
differ in spatial scale (OTB sub-segments vs. Tampa Bay as a whole).
Further, the residuals of each model do not appear to be homoskedastic
nor Gaussian (diagnostic visualizations follow); experimentation with
alternative model specifications that include a month-of-year term did
not improve the model diagnostics.
# Define plotting function for data and model
<- function( dat, mod, lab.main ){
TN_chl_plot # 95% CI
<- seq( min( dat$TN_load_3mo, na.rm=TRUE ),
pred.seq max( dat$TN_load_3mo, na.rm=TRUE ), length.out=100 )
<- predict.gam( mod,
pred newdata = data.frame( TN_load_3mo = pred.seq ),
se.fit=TRUE )
# P value
<- summary(mod)$p.table[2,4]
pval <- rgb(1,0,0,0.4)
pcol1 <- rgb(1,0,0,0.15)
pcol2 if( pval < 0.001 ){
<- "P < 0.001"
pstring else if( pval < 0.01 ){
} <- "P < 0.01"
pstring else if( pval < 0.05 ){
} <- "P < 0.05"
pstring else {
} <- "P > 0.05"
pstring <- rgb(0.2,0.3,0.9,0.4)
pcol1 <- rgb(0,0.1,0.6,0.15)
pcol2
}
# scatterplot data
plot( chla ~ TN_load_3mo, data = dat,
col = rgb(0,0,0,0.6), cex = 1.4,
main = paste( lab.main,"\n",
min(dat$date), " - ", max(dat$date) ),
cex.main = 1.6, cex.lab = 1.4,
xlim = range( c(0,dat$TN_load_3mo,pred.seq), na.rm=TRUE ),
ylim = range( c(0,dat$chla,pred$fit-pred$se.fit*1.96,
$fit+pred$se.fit*1.96)*1.2 ),
predxlab = "3-month TN load (tons)", cex.axis = 1.4,
ylab = "Chlorophyll-a (ug/l)", las = 1
)abline( v = axTicks(1), col = rgb(0,0,0,0.2) )
abline( h = axTicks(2), col = rgb(0,0,0,0.2) )
abline( v = 0, col = rgb(0,0,0,0.5) )
abline( h = 0, col = rgb(0,0,0,0.5) )
# plot model fit
lines( pred$fit ~ pred.seq, col = pcol1, lwd = 2, lty = 2 )
polygon( x = c( pred.seq, rev(pred.seq) ),
y = c( (pred$fit+pred$se.fit*1.96), rev((pred$fit-pred$se.fit*1.96)) ),
col = pcol2, border = rgb(0,0,0,0) )
# write model R2 and pval
legend( 'top', horiz = TRUE, bty = 'n',
border = rgb(0,0,0,0), cex = 1.4,
legend = paste0( "N = ", nrow(dat), " ",
"R2 = ", round(summary(mod)$r.sq,3),
" ", pstring )
)# // end TN_chl_plot() function
}
# remove incomplete cases in the data
<- epc1[ complete.cases(epc1), ]
epc1 <- epc2[ complete.cases(epc2), ]
epc2 <- epc3[ complete.cases(epc3), ]
epc3
# models
<- gam( chla ~ TN_load_3mo, data = epc1 )
ncmod1 <- gam( chla ~ TN_load_3mo, data = epc2 )
ncmod2 <- gam( chla ~ TN_load_3mo, data = epc3 )
ncmod3
# plots
par( mfrow=c(1,3) )
TN_chl_plot( epc1, ncmod1, "Chla north of CC vs. 3mo TN load (OTB)" )
TN_chl_plot( epc2, ncmod2, "Chla bw HF-CC vs. 3mo TN load (OTB)" )
TN_chl_plot( epc3, ncmod3, "Chla bw GB-HF vs. 3mo TN load (OTB)" )
The diagnostics for each of the above OTB models indicate that the assumption of homoskedastic, Gaussian residuals is not met.
par(mfrow=c(2,2))
# North of CC
gam.check(ncmod1,rep=1000)
##
## Method: GCV Optimizer: magic
## Model required no smoothing parameter selectionModel rank = 2 / 2
# Between CC and HF
gam.check(ncmod2,rep=1000)
##
## Method: GCV Optimizer: magic
## Model required no smoothing parameter selectionModel rank = 2 / 2
# Between HF and GB
gam.check(ncmod3,rep=1000)
##
## Method: GCV Optimizer: magic
## Model required no smoothing parameter selectionModel rank = 2 / 2
This section further explores relationships between chlorophyll-a and TN loads by considering monthly TN loading values at delays of zero through two months across various OTB sub-segments, including the EPC sub-segments and the Pinellas County strata.
The code chunk below defines the OTB_corr()
function,
which computes and visualizes a linear fit between a specified
explanatory variable x and response variable y, after
joining the datasets by date; the input data may come from two different
dataframes. The function’s arguments are described in comments in the
first few lines of the code chunk.
The delay
argument allows the user to specify a
non-negative integer time delay between the explanatory and response
variables. By default, the delay=0
setting pairs response
values with concurrent explanatory values (e.g., TN loads and
chlorophyll concentrations during the same month); setting
delay
to a positive values lags the explanatory variable by
the specified number of time periods. The log
argument
(logical) allows the user to specify whether to log10-transform
x and y prior to the regression. The function outputs
an x-y scatterplot that displays the linear fit with a 95% confidence
band, as well as R2 and P-value estimates. The color of the regression
line indicates whether the estimated slope is statistically significant
(red) or not significant (gray) at alpha = 0.05. The function returns
the input data and the model object.
# define regression/plot function
# x, y dataframes for explanatory (x) and response (y) variables,
# each containing columns date, param, value, unit,
# param.x, water quality parameters in x$param, y$param (strings)
# param.y
# site.x, location labels in x$site, y$site (string)
# site.y
# log logical, if TRUE log10-transform the x and y data. FALSE by default
# delay number of time steps to delay param.x (non-negative integer)
<- function( x, y,
OTB_corr
param.x, param.y,site.y = site.x,
site.x, log = FALSE,
delay = 0 ){
# units
<- x$unit[ which( x$param==param.x ) ] |> unique()
unit.x <- y$unit[ which( y$param==param.y ) ] |> unique()
unit.y
# assemble data for x and y in wide format
<- x[ which( x$param==param.x & x$site==site.x ), ]
dat.x <- dat.x |> group_by(date) |> summarise(value=mean(value)) |> as.data.frame()
dat.x <- y[ which( y$param==param.y & y$site==site.y ), ]
dat.y <- dat.y |> group_by(date) |> summarise(value=mean(value)) |> as.data.frame()
dat.y <- dplyr::inner_join( dat.x, dat.y, by = 'date' )
dat
# log10-transform
if( log == TRUE ){
$value.x[ which( dat$value.x != 0) ] <- dat$value.x |> log10()
dat$value.y[ which( dat$value.y != 0) ] <- dat$value.y |> log10()
dat<- paste( "log10", unit.x )
unit.x <- paste( "log10", unit.y )
unit.y
}
# delay
if( delay > 0 ){
<- data.frame( date = seq.Date( min(dat$date), max(dat$date), 'month' ) )
dateseq <- dplyr::left_join( dateseq, dat, by = 'date' )
dat $value.x <- dplyr::lag( dat$value.x, n = delay )
dat<- na.omit( dat )
dat else if ( delay < 0 ){
} stop("Negative delay specified.")
}
# fit linear model
# fit model
<- gam( value.y ~ value.x, data = dat )
mod1 # plot fit with 95% CI
<- seq( min( dat$value.x ), max( dat$value.x ), length.out=100 )
pred.seq <- predict.gam( mod1,
pred newdata = data.frame( value.x = pred.seq ),
se.fit=TRUE )
# P value
<- summary(mod1)$p.table[2,4]
pval <- rgb(1,0,0,0.4)
pcol1 <- rgb(1,0,0,0.15)
pcol2 if( pval < 0.001 ){
<- "P < 0.001"
pstring else if( pval < 0.01 ){
} <- "P < 0.01"
pstring else if( pval < 0.05 ){
} <- "P < 0.05"
pstring else {
} <- "P > 0.05"
pstring <- rgb(0.2,0.3,0.9,0.4)
pcol1 <- rgb(0,0.1,0.6,0.15)
pcol2
}
# scatterplot data
plot( value.y ~ value.x, data = dat,
col = rgb(0,0,0,0.6), cex = 1.4,
main = paste( param.y, "at", site.y, " vs. ", param.x, "at", site.x,"\n",
min(dat$date), " - ", max(dat$date) ),
cex.main = 1.6, cex.lab = 1.4,
xlim = range( c(0,dat$value.x,pred.seq) ),
ylim = range( c(0,dat$value.y,pred$fit-pred$se.fit*1.96,
$fit+pred$se.fit*1.96)*1.2 ),
predxlab = paste0( param.x, " (", unit.x, ")" ), cex.axis = 1.4,
ylab = paste0( param.y, " (", unit.y, ")" ), las = 1
)abline( v = axTicks(1), col = rgb(0,0,0,0.2) )
abline( h = axTicks(2), col = rgb(0,0,0,0.2) )
abline( v = 0, col = rgb(0,0,0,0.5) )
abline( h = 0, col = rgb(0,0,0,0.5) )
# plot model fit
lines( pred$fit ~ pred.seq, col = pcol1, lwd = 2, lty = 2 )
polygon( x = c( pred.seq, rev(pred.seq) ),
y = c( (pred$fit+pred$se.fit*1.96), rev((pred$fit-pred$se.fit*1.96)) ),
col = pcol2, border = rgb(0,0,0,0) )
# write model R2 and pval
legend( 'top', horiz = TRUE, bty = 'n',
border = rgb(0,0,0,0), cex = 1.4,
legend = paste0( "N = ", nrow(dat), " ",
"R2 = ", round(summary(mod1)$r.sq,3),
" ", pstring, " delay = ", delay )
)
return( list( data = dat, model = mod1 ) )
# // end OTB_corr() }
The plots below show relationships between log10-transformed external TN loads and chlorophyll-a concentrations within the three OTB sub-segments compartmentalized by the Courtney Campbell, Howard Frankland, and Gandy Blvd causeways/bridges, using water quality data collected by EPC at fixed monitoring locations between 2000 and 2021. The three OTB compartments span the three rows, and the three columns pertain to delays of zero, one, and two months. All three OTB compartments show significant positive correlations at each delay, with a maximum R2 value of 0.36. In each case the data exhibit substantial scatter (unexplained variance) around the regression line. Although not visualized below, the diagnostics for each model indicate that the residuals are homoskedastic and approximately Gaussian.
par(mfrow=c(3,3))
# assemble and re-label data for locations north of Courtney Campbell
<- epcwq2[ which( epcwq2$site %in% c(46,47,60,61,62,64) ), ]
epc1 $site <- "North CC"
epc1# models
<- OTB_corr( x = loads, y = epc1, 'TN load', 'Chla',
epc1_load_Chla_0 'OTB watershed', 'North CC', log = TRUE, delay = 0 )
<- OTB_corr( x = loads, y = epc1, 'TN load', 'Chla',
epc1_load_Chla_1 'OTB watershed', 'North CC', log = TRUE, delay = 1 )
<- OTB_corr( x = loads, y = epc1, 'TN load', 'Chla',
epc1_load_Chla_2 'OTB watershed', 'North CC', log = TRUE, delay = 2 )
# assemble and re-label data for locations between Frankland & Campbell
<- epcwq2[ which( epcwq2$site %in% c(40,41,42,63,65,66) ), ]
epc2 $site <- "HF-CC"
epc2# models
<- OTB_corr( x = loads, y = epc2, 'TN load', 'Chla',
epc2_load_Chla_0 'OTB watershed', 'HF-CC', log = TRUE, delay = 0 )
<- OTB_corr( x = loads, y = epc2, 'TN load', 'Chla',
epc2_load_Chla_1 'OTB watershed', 'HF-CC', log = TRUE, delay = 1 )
<- OTB_corr( x = loads, y = epc2, 'TN load', 'Chla',
epc2_load_Chla_2 'OTB watershed', 'HF-CC', log = TRUE, delay = 2 )
# assemble and re-label data for locations between Gandy & Frankland
<- epcwq2[ which( epcwq2$site %in% c(38,50,51,67) ), ]
epc3 $site <- "GB-HF"
epc3# models
<- OTB_corr( x = loads, y = epc3, 'TN load', 'Chla',
epc3_load_Chla_0 'OTB watershed', 'GB-HF', log = TRUE, delay = 0 )
<- OTB_corr( x = loads, y = epc3, 'TN load', 'Chla',
epc3_load_Chla_1 'OTB watershed', 'GB-HF', log = TRUE, delay = 1 )
<- OTB_corr( x = loads, y = epc3, 'TN load', 'Chla',
epc3_load_Chla_2 'OTB watershed', 'GB-HF', log = TRUE, delay = 2 )
The plots below show relationships between log10-transformed TN loads and chlorophyll-a concentrations within Pinellas County strata E1–E5 (rows) at delays of 0–2 months (columns). During the same month (first column), all strata show significant positive correlations, with a maximum R2 value of 0.27. The data typically exhibit substantial scatter (unexplained variance) around the regression lines. Across all strata, with the exception of E1, significant relationships are also observed at delays of one month (second column) and two months (third column), although the R2 values are substantially lower. Although not visualized below, the diagnostics for each model indicate that the residuals are homoskedastic and approximately Gaussian.
par(mfrow=c(5,3))
<- OTB_corr( x = loads, y = pcwq2, 'TN load', 'Chla', 'OTB watershed', 'E1',
E1_load_Chla_0 log = TRUE, delay = 0 )
<- OTB_corr( x = loads, y = pcwq2, 'TN load', 'Chla', 'OTB watershed', 'E1',
E1_load_Chla_1 log = TRUE, delay = 1 )
<- OTB_corr( x = loads, y = pcwq2, 'TN load', 'Chla', 'OTB watershed', 'E1',
E1_load_Chla_2 log = TRUE, delay = 2 )
<- OTB_corr( x = loads, y = pcwq2, 'TN load', 'Chla', 'OTB watershed', 'E2',
E2_load_Chla_0 log = TRUE, delay = 0 )
<- OTB_corr( x = loads, y = pcwq2, 'TN load', 'Chla', 'OTB watershed', 'E2',
E2_load_Chla_1 log = TRUE, delay = 1 )
<- OTB_corr( x = loads, y = pcwq2, 'TN load', 'Chla', 'OTB watershed', 'E2',
E2_load_Chla_2 log = TRUE, delay = 2 )
<- OTB_corr( x = loads, y = pcwq2, 'TN load', 'Chla', 'OTB watershed', 'E3',
E3_load_Chla_0 log = TRUE, delay = 0 )
<- OTB_corr( x = loads, y = pcwq2, 'TN load', 'Chla', 'OTB watershed', 'E3',
E3_load_Chla_1 log = TRUE, delay = 1 )
<- OTB_corr( x = loads, y = pcwq2, 'TN load', 'Chla', 'OTB watershed', 'E3',
E3_load_Chla_2 log = TRUE, delay = 2 )
<- OTB_corr( x = loads, y = pcwq2, 'TN load', 'Chla', 'OTB watershed', 'E4',
E4_load_Chla_0 log = TRUE, delay = 0 )
<- OTB_corr( x = loads, y = pcwq2, 'TN load', 'Chla', 'OTB watershed', 'E4',
E4_load_Chla_1 log = TRUE, delay = 1 )
<- OTB_corr( x = loads, y = pcwq2, 'TN load', 'Chla', 'OTB watershed', 'E4',
E4_load_Chla_2 log = TRUE, delay = 2 )
<- OTB_corr( x = loads, y = pcwq2, 'TN load', 'Chla', 'OTB watershed', 'E5',
E5_load_Chla_0 log = TRUE, delay = 0 )
<- OTB_corr( x = loads, y = pcwq2, 'TN load', 'Chla', 'OTB watershed', 'E5',
E5_load_Chla_1 log = TRUE, delay = 1 )
<- OTB_corr( x = loads, y = pcwq2, 'TN load', 'Chla', 'OTB watershed', 'E5',
E5_load_Chla_2 log = TRUE, delay = 2 )
This section explores relationships between log10-transformed chlorophyll-a concentrations and Secchi depths within the three OTB sub-segments compartmentalized by the Courtney Campbell, Howard Frankland, and Gandy Blvd causeways/bridges, using data collected by EPC at fixed monitoring locations between 2000 and 2023. Following Greening & Janicki (2016), the variables are log-transformed prior to analysis. When interpreting these results, it is important to consider that the Secchi data contain a substantial number of records in which the Secchi depth equals the total depth (i.e., the Secchi disk was visible at bottom). In these cases, the reported Secchi depth is not necessarily a reliable proxy of water clarity (e.g., in shallow areas), and the presence of these records may bias the regression results.
All three OTB compartments show significant negative correlations at
delays of zero and one month (first and second columns), with a maximum
R2 value of 0.38. The magnitudes of the slopes and the R2 values both
decrease between delay=0
and delay=1
. And in
each case, the data exhibit substantial scatter around the regression
line, indicating that any given chlorophyll concentration is typically
associated with a large range of Secchi values. At a delay of two months
(third column), none of the OTB compartments shows a significant
correlation. Although not visualized below, the diagnostics for each
model indicate that the residuals are homoskedastic and approximately
Gaussian.
par(mfrow=c(3,3))
# models
<- OTB_corr( x = epc1, y = epc1, 'Chla', 'Secchi',
epc1_Chla_Secchi_0 'North CC', log = TRUE, delay = 0 )
<- OTB_corr( x = epc1, y = epc1, 'Chla', 'Secchi',
epc1_Chla_Secchi_1 'North CC', log = TRUE, delay = 1 )
<- OTB_corr( x = epc1, y = epc1, 'Chla', 'Secchi',
epc1_Chla_Secchi_2 'North CC', log = TRUE, delay = 2 )
# models
<- OTB_corr( x = epc2, y = epc2, 'Chla', 'Secchi',
epc2_Chla_Secchi_0 'HF-CC', log = TRUE, delay = 0 )
<- OTB_corr( x = epc2, y = epc2, 'Chla', 'Secchi',
epc2_Chla_Secchi_1 'HF-CC', log = TRUE, delay = 1 )
<- OTB_corr( x = epc2, y = epc2, 'Chla', 'Secchi',
epc2_Chla_Secchi_2 'HF-CC', log = TRUE, delay = 2 )
# models
<- OTB_corr( x = epc3, y = epc3, 'Chla', 'Secchi',
epc3_Chla_Secchi_0 'GB-HF', log = TRUE, delay = 0 )
<- OTB_corr( x = epc3, y = epc3, 'Chla', 'Secchi',
epc3_Chla_Secchi_1 'GB-HF', log = TRUE, delay = 1 )
<- OTB_corr( x = epc3, y = epc3, 'Chla', 'Secchi',
epc3_Chla_Secchi_2 'GB-HF', log = TRUE, delay = 2 )
Next, we explore whether nitrogen-chlorophyll and chlorophyll-Secchi relationships have changed over time across OTB sub-segments.
This section explores whether relationships between monthly averaged chlorophyll concentrations and concurrent TN loads have changed over time within the three sub-segments divided by bridges/causeways (using EPC data) and within the five Pinellas County strata (E1–E5).
We assemble the EPC data and specify a GAM that estimates the
chlorophyll-a concentration using smooth terms for the decimal date
(decdate
), TN load (value.x
), and their tensor
product interaction. Model diagnostics indicate that the residuals are
approximately Gaussian (QQ plot) and that the model had sufficient basis
dimensions (k).
# join TN load data to chlorophyll data
<- epcwq2[ which( epcwq2$site %in% c(46,47,60,61,62,64) ), ] # subset EPC data by station
epc1 <- epc1[which(epc1$param=="Chla"),] |> group_by(date) |> # aggregate chl data to monthly means
epc1 summarise(value=mean(value)) |> as.data.frame()
<- inner_join( x = epc1, y = loads, by = 'date' )
gamdat_epc1 nrow(gamdat_epc1)
## [1] 261
# decimal date
$decdate <- gamdat_epc1$date |> decimal_date()
gamdat_epc1# fit gam
<- gam( log10(value.y) ~ te( decdate, value.x, k=11 ), data = gamdat_epc1 )
gam_epc1 # model diagnostics
par(mfrow=c(2,2))
gam.check( gam_epc1, rep = 1000 )
##
## Method: GCV Optimizer: magic
## Smoothing parameter selection converged after 12 iterations.
## The RMS GCV score gradient at convergence was 1.781363e-06 .
## The Hessian was positive definite.
## Model rank = 121 / 121
##
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
##
## k' edf k-index p-value
## te(decdate,value.x) 120 13 0.98 0.35
summary( gam_epc1 )
##
## Family: gaussian
## Link function: identity
##
## Formula:
## log10(value.y) ~ te(decdate, value.x, k = 11)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.50621 0.01844 81.69 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## te(decdate,value.x) 13.03 17.5 8.752 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.368 Deviance explained = 40%
## GCV = 0.093781 Scale est. = 0.08874 n = 261
The plots below visualize the estimated te()
coefficient
surface and suggest that the association between chlorophyll
concentrations and TN load (value.x
) has increased over
time (decdate
) within the sub-segment. In the heatmap
image, colors range from red (relatively low magnitude) to yellow
(higher magnitude). White patches indicate gaps in the input space.
# plot smooth
par(mfrow=c(1,2))
plot.gam( gam_epc1, scheme = 1 )
plot.gam( gam_epc1, scheme = 2 )
The model shows good diagnostics.
# join TN load data to chlorophyll data
<- epcwq2[ which( epcwq2$site %in% c(40,41,42,63,65,66) ), ]
epc2 <- epc2[which(epc2$param=="Chla"),] |> group_by(date) |>
epc2 summarise(value=mean(value)) |> as.data.frame()
<- inner_join( x = epc2, y = loads, by = 'date' )
gamdat_epc2 nrow(gamdat_epc2)
## [1] 261
# decimal date
$decdate <- gamdat_epc2$date |> decimal_date()
gamdat_epc2# fit gam
<- gam( log10(value.y) ~ te( decdate, value.x, k=11 ), data = gamdat_epc2 )
gam_epc2 # model diagnostics
par(mfrow=c(2,2))
gam.check( gam_epc2, rep = 1000 )
##
## Method: GCV Optimizer: magic
## Smoothing parameter selection converged after 19 iterations.
## The RMS GCV score gradient at convergence was 8.974192e-08 .
## The Hessian was positive definite.
## Model rank = 121 / 121
##
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
##
## k' edf k-index p-value
## te(decdate,value.x) 120.00 5.79 0.94 0.04 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary( gam_epc2 )
##
## Family: gaussian
## Link function: identity
##
## Formula:
## log10(value.y) ~ te(decdate, value.x, k = 11)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.50621 0.01877 80.24 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## te(decdate,value.x) 5.789 6.76 20.55 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.345 Deviance explained = 36%
## GCV = 0.094419 Scale est. = 0.091963 n = 261
The plots below suggest that the association between chlorophyll
concentrations and TN load (value.x
) has increased over
time (decdate
) within the sub-segment. In the heatmap
image, colors range from red (relatively low magnitude) to yellow
(higher magnitude). Comparisons across sub-segments require
consideration of the te()
coefficient surfaces as well as
each model’s estimated intercept.
# plot smooth
par(mfrow=c(1,2))
plot.gam( gam_epc2, scheme = 1 )
plot.gam( gam_epc2, scheme = 2 )
The model shows good diagnostics.
# join TN load data to chlorophyll data
<- epcwq2[ which( epcwq2$site %in% c(38,50,51,67) ), ]
epc3 <- epc3[which(epc3$param=="Chla"),] |> group_by(date) |>
epc3 summarise(value=mean(value)) |> as.data.frame()
<- inner_join( x = epc3, y = loads, by = 'date' )
gamdat_epc3 nrow(gamdat_epc3)
## [1] 261
# decimal date
$decdate <- gamdat_epc3$date |> decimal_date()
gamdat_epc3# fit gam
<- gam( log10(value.y) ~ te( decdate, value.x, k=11 ), data = gamdat_epc3 )
gam_epc3 # model diagnostics
par(mfrow=c(2,2))
gam.check( gam_epc3, rep = 1000 )
##
## Method: GCV Optimizer: magic
## Smoothing parameter selection converged after 15 iterations.
## The RMS GCV score gradient at convergence was 2.640248e-06 .
## The Hessian was positive definite.
## Model rank = 121 / 121
##
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
##
## k' edf k-index p-value
## te(decdate,value.x) 120 18 1.03 0.66
summary( gam_epc3 )
##
## Family: gaussian
## Link function: identity
##
## Formula:
## log10(value.y) ~ te(decdate, value.x, k = 11)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.50621 0.01905 79.08 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## te(decdate,value.x) 17.96 21.99 6.071 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.326 Deviance explained = 37.3%
## GCV = 0.1021 Scale est. = 0.094686 n = 261
The plots below suggest that the association between chlorophyll
concentrations and TN load (value.x
) has generally
increased over time (decdate
) within the sub-segment, but
the dynamics here are a bit more complex than in the two previous
sub-segments. In the heatmap image, colors range from red (relatively
low magnitude) to yellow (higher magnitude). Comparisons across
sub-segments require consideration of the te()
coefficient
surfaces as well as each model’s estimated intercept.
# plot smooth
par(mfrow=c(1,2))
plot.gam( gam_epc3, scheme = 1 )
plot.gam( gam_epc3, scheme = 2 )
The model shows good diagnostics.
# join TN load data to chlorophyll data
<- inner_join( x = pcwq2[ which( pcwq2$param=='Chla' & pcwq2$site=="E1" ), ],
gamdat_E1 y = loads,
by = 'date'
)nrow(gamdat_E1)
## [1] 127
# decimal date
$decdate <- gamdat_E1$date |> decimal_date()
gamdat_E1# fit gam
<- gam( log10(value.y) ~ te( decdate, value.x, k=11 ), data = gamdat_E1 )
gam1 # model diagnostics
par(mfrow=c(2,2))
gam.check( gam1, rep = 1000 )
##
## Method: GCV Optimizer: magic
## Smoothing parameter selection converged after 22 iterations.
## The RMS GCV score gradient at convergence was 6.398965e-08 .
## The Hessian was positive definite.
## Model rank = 121 / 121
##
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
##
## k' edf k-index p-value
## te(decdate,value.x) 120 3 1.07 0.8
summary( gam1 )
##
## Family: gaussian
## Link function: identity
##
## Formula:
## log10(value.y) ~ te(decdate, value.x, k = 11)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.53432 0.02782 55.15 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## te(decdate,value.x) 3 3 15.54 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.257 Deviance explained = 27.5%
## GCV = 0.10149 Scale est. = 0.098294 n = 127
The plots below visualize the estimated te()
coefficient
surface and suggest that the association between chlorophyll
concentrations and TN load (value.x
) has not substantially
changed over time (decdate
). In the heatmap image, colors
range from red (relatively low magnitude) to yellow (higher magnitude).
White patches indicate gaps in the input space. Comparisons across
sub-segments require consideration of the te()
coefficient
surfaces as well as each model’s estimated intercept.
# plot smooth
par(mfrow=c(1,2))
plot.gam( gam1, scheme = 1)
plot.gam( gam1, scheme = 2)
The model shows good diagnostics.
# join TN load data to chlorophyll data
<- inner_join( x = pcwq2[ which( pcwq2$param=='Chla' & pcwq2$site=="E2" ), ],
gamdat_E2 y = loads,
by = 'date'
)nrow(gamdat_E2)
## [1] 132
# decimal date
$decdate <- gamdat_E2$date |> decimal_date()
gamdat_E2# fit gam
<- gam( log10(value.y) ~ te( decdate, value.x, k=11 ), data = gamdat_E2 )
gam2 # model diagnostics
par(mfrow=c(2,2))
gam.check( gam2, rep = 1000 )
##
## Method: GCV Optimizer: magic
## Smoothing parameter selection converged after 12 iterations.
## The RMS GCV score gradient at convergence was 1.556571e-06 .
## The Hessian was positive definite.
## Model rank = 121 / 121
##
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
##
## k' edf k-index p-value
## te(decdate,value.x) 120.00 7.94 1.01 0.46
summary( gam2 )
##
## Family: gaussian
## Link function: identity
##
## Formula:
## log10(value.y) ~ te(decdate, value.x, k = 11)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.5445 0.0288 53.62 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## te(decdate,value.x) 7.938 10.37 3.734 0.000182 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.217 Deviance explained = 26.5%
## GCV = 0.11745 Scale est. = 0.1095 n = 132
The plots below suggest that the association between chlorophyll
concentrations and TN load (value.x
) has changed
substantially over time (decdate
) at Largo Inlet, with TN
loads (value.x
) most strongly associated with chlorophyll
concentrations between about 2010 and 2015. Comparisons across
sub-segments require consideration of the te()
coefficient
surfaces as well as each model’s estimated intercept.
# plot smooth
par(mfrow=c(1,2))
plot.gam( gam2, scheme = 1)
plot.gam( gam2, scheme = 2)
The model shows good diagnostics.
# join TN load data to chlorophyll data
<- inner_join( x = pcwq2[ which( pcwq2$param=='Chla' & pcwq2$site=="E3" ), ],
gamdat_E3 y = loads,
by = 'date'
)nrow(gamdat_E3)
## [1] 132
# decimal date
$decdate <- gamdat_E3$date |> decimal_date()
gamdat_E3# fit gam
<- gam( log10(value.y) ~ te( decdate, value.x, k=11 ), data = gamdat_E3 )
gam3 # model diagnostics
par(mfrow=c(2,2))
gam.check( gam3, rep = 1000 )
##
## Method: GCV Optimizer: magic
## Smoothing parameter selection converged after 22 iterations.
## The RMS GCV score gradient at convergence was 9.880194e-08 .
## The Hessian was positive definite.
## Model rank = 121 / 121
##
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
##
## k' edf k-index p-value
## te(decdate,value.x) 120.00 4.52 0.89 0.06 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary( gam3 )
##
## Family: gaussian
## Link function: identity
##
## Formula:
## log10(value.y) ~ te(decdate, value.x, k = 11)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.52336 0.02802 54.36 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## te(decdate,value.x) 4.518 5.194 6.702 1.18e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.197 Deviance explained = 22.5%
## GCV = 0.10818 Scale est. = 0.10366 n = 132
The plots below suggest that the association between chlorophyll
concentrations and TN load (value.x
) has not changed
substantially over time (decdate
) at Feather Sound.
Comparisons across sub-segments require consideration of the
te()
coefficient surfaces as well as each model’s estimated
intercept.
# plot smooth
par(mfrow=c(1,2))
plot.gam( gam3, scheme = 1)
plot.gam( gam3, scheme = 2)
The model shows good diagnostics.
# join TN load data to chlorophyll data
<- inner_join( x = pcwq2[ which( pcwq2$param=='Chla' & pcwq2$site=="E4" ), ],
gamdat_E4 y = loads,
by = 'date'
)nrow(gamdat_E4)
## [1] 126
# decimal date
$decdate <- gamdat_E4$date |> decimal_date()
gamdat_E4# fit gam
<- gam( log10(value.y) ~ te( decdate, value.x, k=11 ), data = gamdat_E4 )
gam4 # model diagnostics
par(mfrow=c(2,2))
gam.check( gam4, rep = 1000 )
##
## Method: GCV Optimizer: magic
## Smoothing parameter selection converged after 40 iterations.
## The RMS GCV score gradient at convergence was 6.314619e-06 .
## The Hessian was positive definite.
## Model rank = 121 / 121
##
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
##
## k' edf k-index p-value
## te(decdate,value.x) 120.00 5.71 0.97 0.31
summary( gam4 )
##
## Family: gaussian
## Link function: identity
##
## Formula:
## log10(value.y) ~ te(decdate, value.x, k = 11)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.52974 0.02616 58.47 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## te(decdate,value.x) 5.712 7.127 8.513 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.322 Deviance explained = 35.3%
## GCV = 0.091092 Scale est. = 0.08624 n = 126
The plots below suggest that the association between chlorophyll
concentrations and TN load (value.x
) has not changed
substantially over time (decdate
) at this sub-segment.
Comparisons across sub-segments require consideration of the
te()
coefficient surfaces as well as each model’s estimated
intercept.
# plot smooth
par(mfrow=c(1,2))
plot.gam( gam4, scheme = 1)
plot.gam( gam4, scheme = 2)
The model shows good diagnostics.
# join TN load data to chlorophyll data
<- inner_join( x = pcwq2[ which( pcwq2$param=='Chla' & pcwq2$site=="E5" ), ],
gamdat_E5 y = loads,
by = 'date'
)nrow(gamdat_E5)
## [1] 128
# decimal date
$decdate <- gamdat_E5$date |> decimal_date()
gamdat_E5# fit gam
<- gam( log10(value.y) ~ te( decdate, value.x, k=11 ), data = gamdat_E5 )
gam5 # model diagnostics
par(mfrow=c(2,2))
gam.check( gam5, rep = 1000 )
##
## Method: GCV Optimizer: magic
## Smoothing parameter selection converged after 16 iterations.
## The RMS GCV score gradient at convergence was 1.229579e-07 .
## The Hessian was positive definite.
## Model rank = 121 / 121
##
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
##
## k' edf k-index p-value
## te(decdate,value.x) 120.0 12.9 1.21 0.99
summary( gam5 )
##
## Family: gaussian
## Link function: identity
##
## Formula:
## log10(value.y) ~ te(decdate, value.x, k = 11)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.51629 0.02511 60.39 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## te(decdate,value.x) 12.94 16.78 5.789 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.421 Deviance explained = 48%
## GCV = 0.090549 Scale est. = 0.080691 n = 128
The plots below suggests a possible bifurcation in the association
between chlorophyll concentrations and TN load (value.x
) at
Weedon Island over time (decdate
), for moderate to high
levels of TN load. Comparisons across sub-segments require consideration
of the te()
coefficient surfaces as well as each model’s
estimated intercept.
# plot smooth
par(mfrow=c(1,2))
plot.gam( gam5, scheme = 1)
plot.gam( gam5, scheme = 2)
This section explores whether relationships between chlorophyll concentrations and Secchi depth have changed over time across the three OTB sub-segments divided by bridges/causeways.
We assemble the EPC data and fit a GAM to model the chlorophyll-Secchi relationship over time. The model shows good diagnostics.
# join TN load data to chlorophyll data
<- epcwq2[ which( epcwq2$site %in% c(46,47,60,61,62,64) ), ] # subset EPC data by station
epc1 <- epc1[ which( epc1$param=="Chla" ), ]
epc1.chl <- epc1.chl |> group_by(date) |> summarize(value=mean(value)) |> as.data.frame()
epc1.chl <- epc1[ which( epc1$param=="Secchi" ), ]
epc1.sec <- epc1.sec |> group_by(date) |> summarize(value=mean(value)) |> as.data.frame()
epc1.sec <- inner_join( x = epc1.chl, y = epc1.sec, by = 'date' )
gamdat_epc1 nrow(gamdat_E2)
## [1] 132
# decimal date
$decdate <- gamdat_epc1$date |> decimal_date()
gamdat_epc1# fit gam
<- gam( log10(value.y) ~ te( decdate, value.x, k=11 ), data = gamdat_epc1 )
gam6 # model diagnostics
par(mfrow=c(2,2))
gam.check( gam6, rep = 1000 )
##
## Method: GCV Optimizer: magic
## Smoothing parameter selection converged after 10 iterations.
## The RMS GCV score gradient at convergence was 8.176381e-08 .
## The Hessian was positive definite.
## Model rank = 121 / 121
##
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
##
## k' edf k-index p-value
## te(decdate,value.x) 120.0 34.2 1 0.46
summary( gam6 )
##
## Family: gaussian
## Link function: identity
##
## Formula:
## log10(value.y) ~ te(decdate, value.x, k = 11)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.137672 0.004366 31.54 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## te(decdate,value.x) 34.19 43.53 4.091 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.371 Deviance explained = 44.7%
## GCV = 0.0061972 Scale est. = 0.005432 n = 285
The plots below visualize the estimated te()
coefficient
surface and suggest that the association between Secchi depth and
chlorophyll concentration (value.x
) has changed
substantially over time (decdate
). Redder areas (between
approximately 2000–2005 and 2010–2015) in the heatmap indicate time
periods when increased chlorophyll concentrations were more strongly
associated with reduced water clarity.
# plot smooth
par(mfrow=c(1,2))
plot.gam( gam6, scheme = 1)
plot.gam( gam6, scheme = 2)
The model shows good diagnostics.
# join TN load data to chlorophyll data
<- epcwq2[ which( epcwq2$site %in% c(40,41,42,63,65,66) ), ]
epc2 <- epc2[ which( epc2$param=="Chla" ), ]
epc2.chl <- epc2.chl |> group_by(date) |> summarize(value=mean(value)) |> as.data.frame()
epc2.chl <- epc2[ which( epc2$param=="Secchi" ), ]
epc2.sec <- epc2.sec |> group_by(date) |> summarize(value=mean(value)) |> as.data.frame()
epc2.sec <- inner_join( x = epc2.chl, y = epc2.sec, by = 'date' )
gamdat_epc2 nrow(gamdat_E2)
## [1] 132
# decimal date
$decdate <- gamdat_epc2$date |> decimal_date()
gamdat_epc2# fit gam
<- gam( log10(value.y) ~ te( decdate, value.x, k=11 ), data = gamdat_epc2 )
gam7 # model diagnostics
par(mfrow=c(2,2))
gam.check( gam7, rep = 1000 )
##
## Method: GCV Optimizer: magic
## Smoothing parameter selection converged after 6 iterations.
## The RMS GCV score gradient at convergence was 3.698484e-06 .
## The Hessian was positive definite.
## Model rank = 121 / 121
##
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
##
## k' edf k-index p-value
## te(decdate,value.x) 120.0 30.1 1.03 0.65
summary( gam7 )
##
## Family: gaussian
## Link function: identity
##
## Formula:
## log10(value.y) ~ te(decdate, value.x, k = 11)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.246518 0.005706 43.2 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## te(decdate,value.x) 30.14 37.86 9.754 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.563 Deviance explained = 60.9%
## GCV = 0.010418 Scale est. = 0.0092794 n = 285
The plots below suggest that the association between Secchi depth and
chlorophyll concentration (value.x
) has changed
substantially over time (decdate
) at this sub-segment.
Redder areas in the heatmap indicate time periods when increased
chlorophyll concentrations were more strongly associated with reduced
water clarity. Although the heatmap indicates gaps in some
high-chlorophyll areas of the input space, the temporal pattern in these
results noticeably resembles those of the previous model
(gam6
). Nonetheless, quantitative comparisons across
sub-segments require consideration of the te()
coefficient
surfaces as well as each model’s estimated intercept.
# plot smooth
par(mfrow=c(1,2))
plot.gam( gam7, scheme = 1)
plot.gam( gam7, scheme = 2)
The model shows good diagnostics.
# join TN load data to chlorophyll data
<- epcwq2[ which( epcwq2$site %in% c(38,50,51,67) ), ]
epc3 <- epc3[ which( epc3$param=="Chla" ), ]
epc3.chl <- epc3.chl |> group_by(date) |> summarize(value=mean(value)) |> as.data.frame()
epc3.chl <- epc3[ which( epc3$param=="Secchi" ), ]
epc3.sec <- epc3.sec |> group_by(date) |> summarize(value=mean(value)) |> as.data.frame()
epc3.sec <- inner_join( x = epc3.chl, y = epc3.sec, by = 'date' )
gamdat_epc3 nrow(gamdat_E2)
## [1] 132
# decimal date
$decdate <- gamdat_epc3$date |> decimal_date()
gamdat_epc3# fit gam
<- gam( log10(value.y) ~ te( decdate, value.x, k=11 ), data = gamdat_epc3 )
gam8 # model diagnostics
par(mfrow=c(2,2))
gam.check( gam8, rep = 1000 )
##
## Method: GCV Optimizer: magic
## Smoothing parameter selection converged after 8 iterations.
## The RMS GCV score gradient at convergence was 1.457851e-07 .
## The Hessian was positive definite.
## Model rank = 121 / 121
##
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
##
## k' edf k-index p-value
## te(decdate,value.x) 120.0 22.6 1.05 0.86
summary( gam8 )
##
## Family: gaussian
## Link function: identity
##
## Formula:
## log10(value.y) ~ te(decdate, value.x, k = 11)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.313710 0.005222 60.07 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## te(decdate,value.x) 22.55 28.58 10.38 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.508 Deviance explained = 54.7%
## GCV = 0.0084728 Scale est. = 0.0077726 n = 285
The plots below suggest that the association between Secchi depth and
chlorophyll concentration has changed substantially over time at this
sub-segment. Comparisons across sub-segments require consideration of
the te()
coefficient surfaces as well as each model’s
estimated intercept.
# plot smooth
par(mfrow=c(1,2))
plot.gam( gam8, scheme = 1)
plot.gam( gam8, scheme = 2)
Consulting reports and peer-reviewed studies:
Greening H & Janicki A (2006). Toward reversal of eutrophic conditions in a subtropical Estuary: Water quality and seagrass response to nitrogen loading reductions in Tampa Bay, Florida, USA. Environmental Management 38: 163-178. http://doi.org/10.1007/s00267-005-0079-4.
Greening H, Janicki A, Sherwood ET, Pribble R & Johansson, JOR (2014). Ecosystem responses to long-term nutrient management in an urban estuary: Tampa Bay, Florida, USA. Estuarine, Coastal and Shelf Science http://dx.doi.org/10.1016/j.ecss.2014.10.003
Janicki A & Wade D (1996). Estimating critical external nitrogen loads for the Tampa Bay Estuary: An empirically based approach to setting management targets, Final Report.
Morrison G, Janicki A, Wade D, Martin J & Vargo G (1996). Estimated nitrogen fluxes and nitrogen-chlorophyll relationships in Tampa Bay, 1985-1994. Tampa Bay Area Study Group Project Reports: 128. https://digitalcommons.usf.edu/basgp_report/128.
Software:
Grolemund G, Wickham H (2011). Dates and Times Made Easy with lubridate. Journal of Statistical Software 40(3), 1-25. https://www.jstatsoft.org/v40/i03/.
R Core Team (2023). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/.
Wickham H (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software 40(1): 1-29. URL https://www.jstatsoft.org/v40/i01/.
Wickham H, Francois R, Henry L, Muller K, Vaughan D (2023). dplyr: A Grammar of Data Manipulation. R package version 1.1.2, https://CRAN.R-project.org/package=dplyr)
Wickham H, Vaughan D, Girlich M (2023). tidyr: Tidy Messy Data. R package version 1.3.0, https://CRAN.R-project.org/package=tidyr.
Wood, SN (2017) Generalized Additive Models: An Introduction with R (2nd edition). Chapman and Hall/CRC.