Generalized Additive Models (GAMs)
The current management paradigm for Old Tampa Bay (OTB) holds that (1) reducing external nitrogen loads will reduce chlorophyll a concentrations in the OTB water column, (2) reducing chlorophyll a concentrations will improve water clarity, and (3) improved water clarity will stimulate seagrass growth/recovery. The management paradigm is based on earlier studies that reported significant correlations between each subsequent pair of covariates, considering data from across Tampa Bay as a whole and/or from Tampa Bay segments including OTB (e.g., Janicki & Wade 1996; Morrison et al. 1996; Greening & Janicki 2016).
This document explores statistical relationships between TN loads and chlorophyll-a concentrations and between chlorophyll-a concentrations and Secchi depth at various OTB sub-segments using monthly averaged data between approximately 2000 and 2023 (periods of record vary by data source and variable). The analyses were implemented in R version 4.3.1 (R Core Team 2023).
The results of linear models indicate correlations at each OTB sub-segment:
The results of generalized additive models (GAMs) indicated that the TN-chlorophyll and chlorophyll-Secchi relationships varied substantially over time at some OTB sub-segments. We conclude that correlation analyses and simple regression models do not reliably capture the potentially synergistic effects of multiple stressors influencing water quality and seagrass abundance over time.
Water quality and external loading data were assembled by the
../R/data_proc.R script. This section processes these data
to support the subsequent analyses, including a few basic QA/QC
checks.
We begin by loading packages (see References).
knitr::opts_chunk$set(echo=TRUE, message=FALSE, warning=FALSE)
library(lubridate)
library(plyr)
library(dplyr)
library(tidyr)
library(mgcv)This section loads the EPC water quality data from file
(../data/epcwq.RData), pivots the dataframe from wide to
long format, performs QC checks, aggregates the data to a monthly
timeframe (monthly means), and creates a new epcwq2
dataframe for analysis. As indicated in output below, a substantial
number of Secchi records indicate that reported Secchi depth equals the
total depth (i.e., the Secchi depth was visible at bottom); in these
cases, the Secchi depth is not necessarily a reliable proxy for water
clarity (e.g., in shallow areas).
# load data
load( here::here("data/epcwq.RData" ))
# dates
epcwq1 <- epcwq |> as.data.frame()
epcwq1$date <- epcwq1$SampleTime |> as.Date("%Y-%m-%d")
# subset columns (data and 'Q' columns must be listed in consecutive pairs!)
epcwq1 <- select( epcwq1, date, StationNumber, SampleDepth,
Total_Nitrogen, Total_NitrogenQ,
Chlorophyll_a, Chlorophyll_aQ,
SecchiDepth, Secchi_Q,
Total_Suspended_Solids, Total_Suspended_SolidsQ,
Turbidity, TurbidityQ )
# subset OTB stations
epcwq1 <- epcwq1[ which( epcwq1$StationNumber %in% c(36, 38, 40, 41, 42, 46, 47,
50, 51, 60, 61, 62, 63, 64,
65, 66, 67, 68) ), ]
# subset by date
epcwq1 <- epcwq1[ which( year(epcwq1$date) >= 2000 ), ]
# loop through data and QC columns to pivot from wide to long format
datacols <- seq( 4, 12, 2 )
epcwq2 <- data.frame( matrix(ncol=7,nrow=0) )
for( i in datacols ){
temp <- epcwq1[ ,1:3 ]
temp$Analyte <- colnames(epcwq1)[i]
temp$value <- epcwq1[,i] |> as.numeric()
temp$unit <- NA
temp$QA <- epcwq1[,i+1]
temp <- temp[ complete.cases( temp$value ), ]
epcwq2 <- rbind( epcwq2, temp )
}
# standardize parameter names
param.names.old <- epcwq2$Analyte |> unique() |> sort()
param.names.new <- c( "Chla", "Secchi", "TN", "TSS", "Turbidity" )
epcwq2$param <- mapvalues( epcwq2$Analyte, param.names.old, param.names.new )
# units
units <- c( "ug/l", "m", "mg/l", "mg/l", "NTU" ) # listed in same order as param.names.new, above
epcwq2$unit <- mapvalues( epcwq2$param, param.names.new, units )
# qualifier codes
# replace "NULL" with NA
epcwq2$QA[ which(epcwq2$QA=="NULL") ] <- NA
# specify codes
fatal.codes <- c("*","?","A","B","G","H","J","K",
"L","N","O","Q","T","V","Y","Z")
# define function to locate fatal records
find.fatal <- function( QUALIFIER, FATAL=fatal.codes ){ # QUALIFER arg is a string
input.code <- QUALIFIER |> strsplit(split='') |>
unlist() |> toupper() # parse string into single characters
fatal <- input.code %in% FATAL |> any() # check if any characters match FATAL
return( fatal ) # return TRUE or FALSE
} # // end find.fatal()
# apply function to locate fatal records
rm.fatal.idx <- apply( matrix(epcwq2$QA,ncol=1), 1, find.fatal ) |> which()
epcwq2 <- epcwq2[ -rm.fatal.idx, ]
# acknowledge Secchi records visible on bottom
epcwq2[ which( epcwq2$param=="Secchi" & epcwq2$QA==">"), ] |> nrow() # number of VOB Secchi records## [1] 1333
epcwq2[ which( epcwq2$param=="Secchi"), ] |> nrow() # total number of Secchi records## [1] 5092
# coerce depth to numeric
epcwq2$SampleDepth <- epcwq2$SampleDepth |> as.numeric()
# check for duplicates (confirm FALSE)
epcwq2 |> duplicated() |> any()## [1] FALSE
# subset and rename columns
epcwq2 <- epcwq2[, c("date","param","value","unit","StationNumber") ]
colnames( epcwq2 ) <- c("date","param", "value", "unit", "site" )
# aggregate data to monthly timeframe
epcwq2 <- epcwq2 |> group_by(date,param,site,unit) |> summarise(value=mean(value)) |> as.data.frame()
epcwq2$date <- epcwq2$date |> floor_date('month')The dataset contains 19,602 rows.
save(epcwq2, file = here::here('data/epcwq2.RData'))
head( epcwq2 )## date param site unit value
## 1 2000-01-01 Chla 36 ug/l 3.563
## 2 2000-01-01 Chla 38 ug/l 3.161
## 3 2000-01-01 Chla 40 ug/l 2.221
## 4 2000-01-01 Chla 41 ug/l 2.053
## 5 2000-01-01 Chla 46 ug/l 3.255
## 6 2000-01-01 Chla 47 ug/l 2.911
Pinellas County collects water quality data at five strata along the western boundary of OTB:
The code below loads the raw data from file
(../data-raw/DataDownload_2999342_row.csv), performs QC
checks, standardizes column names, aggregates the data to a monthly
timeframe (monthly means), and creates a new pcwq2
dataframe for analysis.
# load data
pcwq1 <- read.csv( here::here("data-raw/DataDownload_2999342_row.csv" ))[,1:21]
# dates
pcwq1$Date <- pcwq1$SampleDate |> as.Date("%m/%d/%Y")
# qualifier codes
# specify codes
fatal.codes <- c("*","?","A","B","G","H","J","K",
"L","N","O","Q","T","V","Y","Z")
# define function to locate fatal records
find.fatal <- function( QUALIFIER, FATAL=fatal.codes ){ # QUALIFER arg is a string
input.code <- QUALIFIER |> strsplit(split='') |>
unlist() |> toupper() # parse string into single characters
fatal <- input.code %in% FATAL |> any() # check if any characters match FATAL
return( fatal ) # return TRUE or FALSE
} # // end find.fatal()
# apply function to locate fatal records
rm.fatal.idx <- apply( matrix(pcwq1$QACode,ncol=1), 1, find.fatal ) |> which()
pcwq1 <- pcwq1[ -rm.fatal.idx, ]
# parameter names
param.names.old <- pcwq1$Parameter |> unique() |> sort()
param.names.new <- c( "BOD", "Chla", "ChlaC", "Chlb", "Chlc", "DO", "DO_Sat",
"NOx", "Salinity", "TempW", "TKN", "TN", "TSS", "Turbidity" )
pcwq1$Analyte <- mapvalues( pcwq1$Parameter, param.names.old, param.names.new )
# result units
pcwq1$Unit <- lapply( pcwq1$Parameter, function(x) strsplit(x,"_")[[1]][2] ) |> unlist()
# check for duplicates (confirm FALSE)
pcwq1 |> duplicated() |> any()## [1] FALSE
# subset and rename columns
pcwq2 <- pcwq1[, c("Date","Analyte","Result_Value","Unit","StationID") ]
colnames( pcwq2 ) <- c("date","param", "value", "unit", "site" )
# aggregate data to monthly timeframe
pcwq2 <- pcwq2 |> group_by(date,param,site,unit) |> summarise(value=mean(value)) |> as.data.frame()
pcwq2$date <- pcwq2$date |> floor_date('month')The dataset contains 11,695 rows.
head( pcwq2 )## date param site unit value
## 1 2000-01-01 BOD 62-01 mgl 1.00
## 2 2000-01-01 Chla 62-01 ugl 2.60
## 3 2000-01-01 Chlb 62-01 ugl 0.50
## 4 2000-01-01 Chlc 62-01 ugl 0.70
## 5 2000-01-01 DO 62-01 mgl 6.38
## 6 2000-01-01 NOx 62-01 ugl 20.00
This section loads the Tampa Bay Estuary Program’s monthly external
loading estimates for Tampa Bay
(../data-raw/TB_loads_monthly.csv), computes total monthly
TN loads by summing over TN load sources to OTB (e.g., AD, NPS, DPS,
etc.), standardizes column names, and creates a new loads
dataframe for analysis.
# load data
loads <- read.csv( here::here("data-raw/TB_loads_monthly.csv" ))
# subset TN data for OTB segment
loads <- loads[ which( loads$bay_segment=="Old Tampa Bay"), c('year','month','tn_load') ]
# date column
loads$date <- as.Date( paste0(loads$year,"-",loads$month,"-",01), format="%Y-%m-%d" )
# aggregate total TN load across sources
loads <- loads |> group_by(date) |> summarise(tn_load=sum(tn_load)) |> as.data.frame()
# create columns
loads$param <- "TN load"
loads$site <- "OTB watershed"
loads$unit <- 'tons'
# rename columns
colnames( loads ) <- c('date','value','param','site','unit')
# re-order columns
loads <- select( loads, date, param, value, unit, site )The dataset contains 324 rows.
head( loads )## date param value unit site
## 1 1995-01-01 TN load 20.78062 tons OTB watershed
## 2 1995-02-01 TN load 19.80657 tons OTB watershed
## 3 1995-03-01 TN load 18.34947 tons OTB watershed
## 4 1995-04-01 TN load 19.93694 tons OTB watershed
## 5 1995-05-01 TN load 58.07647 tons OTB watershed
## 6 1995-06-01 TN load 54.70274 tons OTB watershed
Janicki & Wade (1996) developed a linear regression model relating monthly average chlorophyll-a concentrations to cumulative TN loads over a three-month period (the concurrent month and two previous months). The model used data from across Tampa Bay during 1985–1991 and stratified the data by the month of the year and by bay segment, and it produced an R2 value of 0.73. Subsequently, Greening et al. (2014) developed a similar regression model for the 1985–2011 period and observed an R2 value of 0.66.
We apply a similar approach as Janciki & Wade (1996) to explore relationships between monthly average chlorophyll-a concentrations and TN loads at three OTB sub-segments during the 2000-2021 period. For simplicity, the models below do not stratify by the month of the year, in contrast to the model of Janicki & Wade (1996).
We begin by assembling the TN load and chlorophyll-a data from the
three OTB sub-segments compartmentalized by the Courtney Campbell,
Howard Frankland, and Gandy Blvd causeways/bridges, using water quality
data collected by EPC at fixed monitoring locations between 2000 and
2021. The three resulting datasets – epc1,
epc2, and epc3 – are arranged in wide format
and include the columns date, chla,
TN_load (monthly load values), and TN_load_3mo
(cumulative 3-month loads).
# Subset and join load and chl-a data for OTB sub-segments
# Sub-segment north of Courtney Campbell
epc1 <- epcwq2[ which( epcwq2$site %in% c(46,47,60,61,62,64) ), ] # subset EPC data by station
epc1 <- epc1[which(epc1$param=="Chla"),] |> group_by(date) |> # aggregate chl data to monthly means
summarise(value=mean(value)) |> as.data.frame()
epc1 <- left_join( epc1, loads[,c('date','value')], by='date' ) # value.x is chl; value.y is load
colnames(epc1) <- c("date","chla","TN_load")
# Sub-segment between Courtney Campbell and Frankland
epc2 <- epcwq2[ which( epcwq2$site %in% c(40,41,42,63,65,66) ), ]
epc2 <- epc2[which(epc2$param=="Chla"),] |> group_by(date) |>
summarise(value=mean(value)) |> as.data.frame()
epc2 <- left_join( epc2, loads[,c('date','value')], by='date' )
colnames(epc2) <- c("date","chla","TN_load")
# Sub-segment between Frankland and Gandy
epc3 <- epcwq2[ which( epcwq2$site %in% c(38,50,51,67) ), ]
epc3 <- epc3[which(epc3$param=="Chla"),] |> group_by(date) |>
summarise(value=mean(value)) |> as.data.frame()
epc3 <- left_join( epc3, loads[,c('date','value')], by='date' )
colnames(epc3) <- c("date","chla","TN_load")
# Compute 3-month TN loads
# Define function to sum load values over a specified window (`win`)
# The function assumes there are no NAs
sum3mo <- function( x, win = 3 ){
out <- rep(NA,length=(win-1))
for( i in win:length(x) ){
out[i] <- sum( x[i:(i-win+1)] )
}
return( out )
}
# Add a 3-month TN load column to each dataset
epc1$TN_load_3mo <- sum3mo( epc1$TN_load )
epc2$TN_load_3mo <- sum3mo( epc2$TN_load )
epc3$TN_load_3mo <- sum3mo( epc3$TN_load )The monthly sub-segment chlorophyll-a data and segment-wide TN load data are visualized below.
par( mfrow=c(2,2) )
plot( chla ~ date, data = epc1, pch = 16, col = rgb(0,0.4,0.8,0.7),
ylim = c(0,80), ylab = "Chlorophyll-a (ug/l)", xlab = "", las = 1,
main = "Chlorophyll-a, OTB north of Courtney Campbell" )
lines( chla ~ date, data = epc1, col = rgb(0,0.4,0.8,0.4) )
abline( v = seq.Date(as.Date("2000-01-01"),as.Date("2024-01-01"),'year'),
col = rgb(0,0,0,0.2) )
abline( h = axTicks(2), col = rgb(0,0,0,0.2) )
plot( chla ~ date, data = epc2, pch = 16, col = rgb(0,0.4,0.8,0.7),
ylim = c(0,80), ylab = "Chlorophyll-a (ug/l)", xlab = "", las = 1,
main = "Chlorophyll-a, OTB between Frankland & Courtney Campbell" )
lines( chla ~ date, data = epc2, col = rgb(0,0.4,0.8,0.4) )
abline( v = seq.Date(as.Date("2000-01-01"),as.Date("2024-01-01"),'year'),
col = rgb(0,0,0,0.2) )
abline( h = axTicks(2), col = rgb(0,0,0,0.2) )
plot( chla ~ date, data = epc3, pch = 16, col = rgb(0,0.4,0.8,0.7),
ylim = c(0,80), ylab = "Chlorophyll-a (ug/l)", xlab = "", las = 1,
main = "Chlorophyll-a, OTB between Gandy & Frankland" )
lines( chla ~ date, data = epc3, col = rgb(0,0.4,0.8,0.4) )
abline( v = seq.Date(as.Date("2000-01-01"),as.Date("2024-01-01"),'year'),
col = rgb(0,0,0,0.2) )
abline( h = axTicks(2), col = rgb(0,0,0,0.2) )
plot( TN_load ~ date, data = epc3, pch = 16, col = rgb(0.2,0.2,0.2,0.7),
ylab = "TN load (tons)", xlab = "", las = 1,
main = "Monthly TN load, Old Tampa Bay" )
lines( TN_load ~ date, data = epc3, col = rgb(0.2,0.2,0.2,0.4) )
abline( v = seq.Date(as.Date("2000-01-01"),as.Date("2024-01-01"),'year'),
col = rgb(0,0,0,0.2) )
abline( h = axTicks(2), col = rgb(0,0,0,0.2) )Below, we fit linear models for chlorophyll as a function of
cumulative 3-month TN loads at each sub-segment and visualize the
results. The TN_chl_plot() function takes the data
(dat) and model (mod) as input and produces a
scatterplot of the data with the regression line. The models show
significant relationships (P<0.001) and R2 values of 0.32, 0.37, and
0.46. However, these R2 values are not directly comparable to the
R2=0.73 of the Janicki & Wade (1996) model, because the models
differ in spatial scale (OTB sub-segments vs. Tampa Bay as a whole).
Further, the residuals of each model do not appear to be homoskedastic
nor Gaussian (diagnostic visualizations follow); experimentation with
alternative model specifications that include a month-of-year term did
not improve the model diagnostics.
# Define plotting function for data and model
TN_chl_plot <- function( dat, mod, lab.main ){
# 95% CI
pred.seq <- seq( min( dat$TN_load_3mo, na.rm=TRUE ),
max( dat$TN_load_3mo, na.rm=TRUE ), length.out=100 )
pred <- predict.gam( mod,
newdata = data.frame( TN_load_3mo = pred.seq ),
se.fit=TRUE )
# P value
pval <- summary(mod)$p.table[2,4]
pcol1 <- rgb(1,0,0,0.4)
pcol2 <- rgb(1,0,0,0.15)
if( pval < 0.001 ){
pstring <- "P < 0.001"
} else if( pval < 0.01 ){
pstring <- "P < 0.01"
} else if( pval < 0.05 ){
pstring <- "P < 0.05"
} else {
pstring <- "P > 0.05"
pcol1 <- rgb(0.2,0.3,0.9,0.4)
pcol2 <- rgb(0,0.1,0.6,0.15)
}
# scatterplot data
plot( chla ~ TN_load_3mo, data = dat,
col = rgb(0,0,0,0.6), cex = 1.4,
main = paste( lab.main,"\n",
min(dat$date), " - ", max(dat$date) ),
cex.main = 1.6, cex.lab = 1.4,
xlim = range( c(0,dat$TN_load_3mo,pred.seq), na.rm=TRUE ),
ylim = range( c(0,dat$chla,pred$fit-pred$se.fit*1.96,
pred$fit+pred$se.fit*1.96)*1.2 ),
xlab = "3-month TN load (tons)", cex.axis = 1.4,
ylab = "Chlorophyll-a (ug/l)", las = 1
)
abline( v = axTicks(1), col = rgb(0,0,0,0.2) )
abline( h = axTicks(2), col = rgb(0,0,0,0.2) )
abline( v = 0, col = rgb(0,0,0,0.5) )
abline( h = 0, col = rgb(0,0,0,0.5) )
# plot model fit
lines( pred$fit ~ pred.seq, col = pcol1, lwd = 2, lty = 2 )
polygon( x = c( pred.seq, rev(pred.seq) ),
y = c( (pred$fit+pred$se.fit*1.96), rev((pred$fit-pred$se.fit*1.96)) ),
col = pcol2, border = rgb(0,0,0,0) )
# write model R2 and pval
legend( 'top', horiz = TRUE, bty = 'n',
border = rgb(0,0,0,0), cex = 1.4,
legend = paste0( "N = ", nrow(dat), " ",
"R2 = ", round(summary(mod)$r.sq,3),
" ", pstring )
)
} # // end TN_chl_plot() function
# remove incomplete cases in the data
epc1 <- epc1[ complete.cases(epc1), ]
epc2 <- epc2[ complete.cases(epc2), ]
epc3 <- epc3[ complete.cases(epc3), ]
# models
ncmod1 <- gam( chla ~ TN_load_3mo, data = epc1 )
ncmod2 <- gam( chla ~ TN_load_3mo, data = epc2 )
ncmod3 <- gam( chla ~ TN_load_3mo, data = epc3 )
# plots
par( mfrow=c(1,3) )
TN_chl_plot( epc1, ncmod1, "Chla north of CC vs. 3mo TN load (OTB)" )
TN_chl_plot( epc2, ncmod2, "Chla bw HF-CC vs. 3mo TN load (OTB)" )
TN_chl_plot( epc3, ncmod3, "Chla bw GB-HF vs. 3mo TN load (OTB)" )The diagnostics for each of the above OTB models indicate that the assumption of homoskedastic, Gaussian residuals is not met.
par(mfrow=c(2,2))
# North of CC
gam.check(ncmod1,rep=1000)##
## Method: GCV Optimizer: magic
## Model required no smoothing parameter selectionModel rank = 2 / 2
# Between CC and HF
gam.check(ncmod2,rep=1000)##
## Method: GCV Optimizer: magic
## Model required no smoothing parameter selectionModel rank = 2 / 2
# Between HF and GB
gam.check(ncmod3,rep=1000)##
## Method: GCV Optimizer: magic
## Model required no smoothing parameter selectionModel rank = 2 / 2
This section further explores relationships between chlorophyll-a and TN loads by considering monthly TN loading values at delays of zero through two months across various OTB sub-segments, including the EPC sub-segments and the Pinellas County strata.
The code chunk below defines the OTB_corr() function,
which computes and visualizes a linear fit between a specified
explanatory variable x and response variable y, after
joining the datasets by date; the input data may come from two different
dataframes. The function’s arguments are described in comments in the
first few lines of the code chunk.
The delay argument allows the user to specify a
non-negative integer time delay between the explanatory and response
variables. By default, the delay=0 setting pairs response
values with concurrent explanatory values (e.g., TN loads and
chlorophyll concentrations during the same month); setting
delay to a positive values lags the explanatory variable by
the specified number of time periods. The log argument
(logical) allows the user to specify whether to log10-transform
x and y prior to the regression. The function outputs
an x-y scatterplot that displays the linear fit with a 95% confidence
band, as well as R2 and P-value estimates. The color of the regression
line indicates whether the estimated slope is statistically significant
(red) or not significant (gray) at alpha = 0.05. The function returns
the input data and the model object.
# define regression/plot function
# x, y dataframes for explanatory (x) and response (y) variables,
# each containing columns date, param, value, unit,
# param.x, water quality parameters in x$param, y$param (strings)
# param.y
# site.x, location labels in x$site, y$site (string)
# site.y
# log logical, if TRUE log10-transform the x and y data. FALSE by default
# delay number of time steps to delay param.x (non-negative integer)
OTB_corr <- function( x, y,
param.x, param.y,
site.x, site.y = site.x,
log = FALSE,
delay = 0 ){
# units
unit.x <- x$unit[ which( x$param==param.x ) ] |> unique()
unit.y <- y$unit[ which( y$param==param.y ) ] |> unique()
# assemble data for x and y in wide format
dat.x <- x[ which( x$param==param.x & x$site==site.x ), ]
dat.x <- dat.x |> group_by(date) |> summarise(value=mean(value)) |> as.data.frame()
dat.y <- y[ which( y$param==param.y & y$site==site.y ), ]
dat.y <- dat.y |> group_by(date) |> summarise(value=mean(value)) |> as.data.frame()
dat <- dplyr::inner_join( dat.x, dat.y, by = 'date' )
# log10-transform
if( log == TRUE ){
dat$value.x[ which( dat$value.x != 0) ] <- dat$value.x |> log10()
dat$value.y[ which( dat$value.y != 0) ] <- dat$value.y |> log10()
unit.x <- paste( "log10", unit.x )
unit.y <- paste( "log10", unit.y )
}
# delay
if( delay > 0 ){
dateseq <- data.frame( date = seq.Date( min(dat$date), max(dat$date), 'month' ) )
dat <- dplyr::left_join( dateseq, dat, by = 'date' )
dat$value.x <- dplyr::lag( dat$value.x, n = delay )
dat <- na.omit( dat )
} else if ( delay < 0 ){
stop("Negative delay specified.")
}
# fit linear model
# fit model
mod1 <- gam( value.y ~ value.x, data = dat )
# plot fit with 95% CI
pred.seq <- seq( min( dat$value.x ), max( dat$value.x ), length.out=100 )
pred <- predict.gam( mod1,
newdata = data.frame( value.x = pred.seq ),
se.fit=TRUE )
# P value
pval <- summary(mod1)$p.table[2,4]
pcol1 <- rgb(1,0,0,0.4)
pcol2 <- rgb(1,0,0,0.15)
if( pval < 0.001 ){
pstring <- "P < 0.001"
} else if( pval < 0.01 ){
pstring <- "P < 0.01"
} else if( pval < 0.05 ){
pstring <- "P < 0.05"
} else {
pstring <- "P > 0.05"
pcol1 <- rgb(0.2,0.3,0.9,0.4)
pcol2 <- rgb(0,0.1,0.6,0.15)
}
# scatterplot data
plot( value.y ~ value.x, data = dat,
col = rgb(0,0,0,0.6), cex = 1.4,
main = paste( param.y, "at", site.y, " vs. ", param.x, "at", site.x,"\n",
min(dat$date), " - ", max(dat$date) ),
cex.main = 1.6, cex.lab = 1.4,
xlim = range( c(0,dat$value.x,pred.seq) ),
ylim = range( c(0,dat$value.y,pred$fit-pred$se.fit*1.96,
pred$fit+pred$se.fit*1.96)*1.2 ),
xlab = paste0( param.x, " (", unit.x, ")" ), cex.axis = 1.4,
ylab = paste0( param.y, " (", unit.y, ")" ), las = 1
)
abline( v = axTicks(1), col = rgb(0,0,0,0.2) )
abline( h = axTicks(2), col = rgb(0,0,0,0.2) )
abline( v = 0, col = rgb(0,0,0,0.5) )
abline( h = 0, col = rgb(0,0,0,0.5) )
# plot model fit
lines( pred$fit ~ pred.seq, col = pcol1, lwd = 2, lty = 2 )
polygon( x = c( pred.seq, rev(pred.seq) ),
y = c( (pred$fit+pred$se.fit*1.96), rev((pred$fit-pred$se.fit*1.96)) ),
col = pcol2, border = rgb(0,0,0,0) )
# write model R2 and pval
legend( 'top', horiz = TRUE, bty = 'n',
border = rgb(0,0,0,0), cex = 1.4,
legend = paste0( "N = ", nrow(dat), " ",
"R2 = ", round(summary(mod1)$r.sq,3),
" ", pstring, " delay = ", delay )
)
return( list( data = dat, model = mod1 ) )
} # // end OTB_corr()The plots below show relationships between log10-transformed external TN loads and chlorophyll-a concentrations within the three OTB sub-segments compartmentalized by the Courtney Campbell, Howard Frankland, and Gandy Blvd causeways/bridges, using water quality data collected by EPC at fixed monitoring locations between 2000 and 2021. The three OTB compartments span the three rows, and the three columns pertain to delays of zero, one, and two months. All three OTB compartments show significant positive correlations at each delay, with a maximum R2 value of 0.36. In each case the data exhibit substantial scatter (unexplained variance) around the regression line. Although not visualized below, the diagnostics for each model indicate that the residuals are homoskedastic and approximately Gaussian.
par(mfrow=c(3,3))
# assemble and re-label data for locations north of Courtney Campbell
epc1 <- epcwq2[ which( epcwq2$site %in% c(46,47,60,61,62,64) ), ]
epc1$site <- "North CC"
# models
epc1_load_Chla_0 <- OTB_corr( x = loads, y = epc1, 'TN load', 'Chla',
'OTB watershed', 'North CC', log = TRUE, delay = 0 )
epc1_load_Chla_1 <- OTB_corr( x = loads, y = epc1, 'TN load', 'Chla',
'OTB watershed', 'North CC', log = TRUE, delay = 1 )
epc1_load_Chla_2 <- OTB_corr( x = loads, y = epc1, 'TN load', 'Chla',
'OTB watershed', 'North CC', log = TRUE, delay = 2 )
# assemble and re-label data for locations between Frankland & Campbell
epc2 <- epcwq2[ which( epcwq2$site %in% c(40,41,42,63,65,66) ), ]
epc2$site <- "HF-CC"
# models
epc2_load_Chla_0 <- OTB_corr( x = loads, y = epc2, 'TN load', 'Chla',
'OTB watershed', 'HF-CC', log = TRUE, delay = 0 )
epc2_load_Chla_1 <- OTB_corr( x = loads, y = epc2, 'TN load', 'Chla',
'OTB watershed', 'HF-CC', log = TRUE, delay = 1 )
epc2_load_Chla_2 <- OTB_corr( x = loads, y = epc2, 'TN load', 'Chla',
'OTB watershed', 'HF-CC', log = TRUE, delay = 2 )
# assemble and re-label data for locations between Gandy & Frankland
epc3 <- epcwq2[ which( epcwq2$site %in% c(38,50,51,67) ), ]
epc3$site <- "GB-HF"
# models
epc3_load_Chla_0 <- OTB_corr( x = loads, y = epc3, 'TN load', 'Chla',
'OTB watershed', 'GB-HF', log = TRUE, delay = 0 )
epc3_load_Chla_1 <- OTB_corr( x = loads, y = epc3, 'TN load', 'Chla',
'OTB watershed', 'GB-HF', log = TRUE, delay = 1 )
epc3_load_Chla_2 <- OTB_corr( x = loads, y = epc3, 'TN load', 'Chla',
'OTB watershed', 'GB-HF', log = TRUE, delay = 2 )The plots below show relationships between log10-transformed TN loads and chlorophyll-a concentrations within Pinellas County strata E1–E5 (rows) at delays of 0–2 months (columns). During the same month (first column), all strata show significant positive correlations, with a maximum R2 value of 0.27. The data typically exhibit substantial scatter (unexplained variance) around the regression lines. Across all strata, with the exception of E1, significant relationships are also observed at delays of one month (second column) and two months (third column), although the R2 values are substantially lower. Although not visualized below, the diagnostics for each model indicate that the residuals are homoskedastic and approximately Gaussian.
par(mfrow=c(5,3))
E1_load_Chla_0 <- OTB_corr( x = loads, y = pcwq2, 'TN load', 'Chla', 'OTB watershed', 'E1',
log = TRUE, delay = 0 )
E1_load_Chla_1 <- OTB_corr( x = loads, y = pcwq2, 'TN load', 'Chla', 'OTB watershed', 'E1',
log = TRUE, delay = 1 )
E1_load_Chla_2 <- OTB_corr( x = loads, y = pcwq2, 'TN load', 'Chla', 'OTB watershed', 'E1',
log = TRUE, delay = 2 )
E2_load_Chla_0 <- OTB_corr( x = loads, y = pcwq2, 'TN load', 'Chla', 'OTB watershed', 'E2',
log = TRUE, delay = 0 )
E2_load_Chla_1 <- OTB_corr( x = loads, y = pcwq2, 'TN load', 'Chla', 'OTB watershed', 'E2',
log = TRUE, delay = 1 )
E2_load_Chla_2 <- OTB_corr( x = loads, y = pcwq2, 'TN load', 'Chla', 'OTB watershed', 'E2',
log = TRUE, delay = 2 )
E3_load_Chla_0 <- OTB_corr( x = loads, y = pcwq2, 'TN load', 'Chla', 'OTB watershed', 'E3',
log = TRUE, delay = 0 )
E3_load_Chla_1 <- OTB_corr( x = loads, y = pcwq2, 'TN load', 'Chla', 'OTB watershed', 'E3',
log = TRUE, delay = 1 )
E3_load_Chla_2 <- OTB_corr( x = loads, y = pcwq2, 'TN load', 'Chla', 'OTB watershed', 'E3',
log = TRUE, delay = 2 )
E4_load_Chla_0 <- OTB_corr( x = loads, y = pcwq2, 'TN load', 'Chla', 'OTB watershed', 'E4',
log = TRUE, delay = 0 )
E4_load_Chla_1 <- OTB_corr( x = loads, y = pcwq2, 'TN load', 'Chla', 'OTB watershed', 'E4',
log = TRUE, delay = 1 )
E4_load_Chla_2 <- OTB_corr( x = loads, y = pcwq2, 'TN load', 'Chla', 'OTB watershed', 'E4',
log = TRUE, delay = 2 )
E5_load_Chla_0 <- OTB_corr( x = loads, y = pcwq2, 'TN load', 'Chla', 'OTB watershed', 'E5',
log = TRUE, delay = 0 )
E5_load_Chla_1 <- OTB_corr( x = loads, y = pcwq2, 'TN load', 'Chla', 'OTB watershed', 'E5',
log = TRUE, delay = 1 )
E5_load_Chla_2 <- OTB_corr( x = loads, y = pcwq2, 'TN load', 'Chla', 'OTB watershed', 'E5',
log = TRUE, delay = 2 )
This section explores relationships between log10-transformed chlorophyll-a concentrations and Secchi depths within the three OTB sub-segments compartmentalized by the Courtney Campbell, Howard Frankland, and Gandy Blvd causeways/bridges, using data collected by EPC at fixed monitoring locations between 2000 and 2023. Following Greening & Janicki (2016), the variables are log-transformed prior to analysis. When interpreting these results, it is important to consider that the Secchi data contain a substantial number of records in which the Secchi depth equals the total depth (i.e., the Secchi disk was visible at bottom). In these cases, the reported Secchi depth is not necessarily a reliable proxy of water clarity (e.g., in shallow areas), and the presence of these records may bias the regression results.
All three OTB compartments show significant negative correlations at
delays of zero and one month (first and second columns), with a maximum
R2 value of 0.38. The magnitudes of the slopes and the R2 values both
decrease between delay=0 and delay=1. And in
each case, the data exhibit substantial scatter around the regression
line, indicating that any given chlorophyll concentration is typically
associated with a large range of Secchi values. At a delay of two months
(third column), none of the OTB compartments shows a significant
correlation. Although not visualized below, the diagnostics for each
model indicate that the residuals are homoskedastic and approximately
Gaussian.
par(mfrow=c(3,3))
# models
epc1_Chla_Secchi_0 <- OTB_corr( x = epc1, y = epc1, 'Chla', 'Secchi',
'North CC', log = TRUE, delay = 0 )
epc1_Chla_Secchi_1 <- OTB_corr( x = epc1, y = epc1, 'Chla', 'Secchi',
'North CC', log = TRUE, delay = 1 )
epc1_Chla_Secchi_2 <- OTB_corr( x = epc1, y = epc1, 'Chla', 'Secchi',
'North CC', log = TRUE, delay = 2 )
# models
epc2_Chla_Secchi_0 <- OTB_corr( x = epc2, y = epc2, 'Chla', 'Secchi',
'HF-CC', log = TRUE, delay = 0 )
epc2_Chla_Secchi_1 <- OTB_corr( x = epc2, y = epc2, 'Chla', 'Secchi',
'HF-CC', log = TRUE, delay = 1 )
epc2_Chla_Secchi_2 <- OTB_corr( x = epc2, y = epc2, 'Chla', 'Secchi',
'HF-CC', log = TRUE, delay = 2 )
# models
epc3_Chla_Secchi_0 <- OTB_corr( x = epc3, y = epc3, 'Chla', 'Secchi',
'GB-HF', log = TRUE, delay = 0 )
epc3_Chla_Secchi_1 <- OTB_corr( x = epc3, y = epc3, 'Chla', 'Secchi',
'GB-HF', log = TRUE, delay = 1 )
epc3_Chla_Secchi_2 <- OTB_corr( x = epc3, y = epc3, 'Chla', 'Secchi',
'GB-HF', log = TRUE, delay = 2 )
Next, we explore whether nitrogen-chlorophyll and chlorophyll-Secchi relationships have changed over time across OTB sub-segments.
This section explores whether relationships between monthly averaged chlorophyll concentrations and concurrent TN loads have changed over time within the three sub-segments divided by bridges/causeways (using EPC data) and within the five Pinellas County strata (E1–E5).
We assemble the EPC data and specify a GAM that estimates the
chlorophyll-a concentration using smooth terms for the decimal date
(decdate), TN load (value.x), and their tensor
product interaction. Model diagnostics indicate that the residuals are
approximately Gaussian (QQ plot) and that the model had sufficient basis
dimensions (k).
# join TN load data to chlorophyll data
epc1 <- epcwq2[ which( epcwq2$site %in% c(46,47,60,61,62,64) ), ] # subset EPC data by station
epc1 <- epc1[which(epc1$param=="Chla"),] |> group_by(date) |> # aggregate chl data to monthly means
summarise(value=mean(value)) |> as.data.frame()
gamdat_epc1 <- inner_join( x = epc1, y = loads, by = 'date' )
nrow(gamdat_epc1)## [1] 261
# decimal date
gamdat_epc1$decdate <- gamdat_epc1$date |> decimal_date()
# fit gam
gam_epc1 <- gam( log10(value.y) ~ te( decdate, value.x, k=11 ), data = gamdat_epc1 )
# model diagnostics
par(mfrow=c(2,2))
gam.check( gam_epc1, rep = 1000 )##
## Method: GCV Optimizer: magic
## Smoothing parameter selection converged after 12 iterations.
## The RMS GCV score gradient at convergence was 1.781363e-06 .
## The Hessian was positive definite.
## Model rank = 121 / 121
##
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
##
## k' edf k-index p-value
## te(decdate,value.x) 120 13 0.98 0.35
summary( gam_epc1 )##
## Family: gaussian
## Link function: identity
##
## Formula:
## log10(value.y) ~ te(decdate, value.x, k = 11)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.50621 0.01844 81.69 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## te(decdate,value.x) 13.03 17.5 8.752 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.368 Deviance explained = 40%
## GCV = 0.093781 Scale est. = 0.08874 n = 261
The plots below visualize the estimated te() coefficient
surface and suggest that the association between chlorophyll
concentrations and TN load (value.x) has increased over
time (decdate) within the sub-segment. In the heatmap
image, colors range from red (relatively low magnitude) to yellow
(higher magnitude). White patches indicate gaps in the input space.
# plot smooth
par(mfrow=c(1,2))
plot.gam( gam_epc1, scheme = 1 )
plot.gam( gam_epc1, scheme = 2 )The model shows good diagnostics.
# join TN load data to chlorophyll data
epc2 <- epcwq2[ which( epcwq2$site %in% c(40,41,42,63,65,66) ), ]
epc2 <- epc2[which(epc2$param=="Chla"),] |> group_by(date) |>
summarise(value=mean(value)) |> as.data.frame()
gamdat_epc2 <- inner_join( x = epc2, y = loads, by = 'date' )
nrow(gamdat_epc2)## [1] 261
# decimal date
gamdat_epc2$decdate <- gamdat_epc2$date |> decimal_date()
# fit gam
gam_epc2 <- gam( log10(value.y) ~ te( decdate, value.x, k=11 ), data = gamdat_epc2 )
# model diagnostics
par(mfrow=c(2,2))
gam.check( gam_epc2, rep = 1000 )##
## Method: GCV Optimizer: magic
## Smoothing parameter selection converged after 19 iterations.
## The RMS GCV score gradient at convergence was 8.974192e-08 .
## The Hessian was positive definite.
## Model rank = 121 / 121
##
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
##
## k' edf k-index p-value
## te(decdate,value.x) 120.00 5.79 0.94 0.04 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary( gam_epc2 )##
## Family: gaussian
## Link function: identity
##
## Formula:
## log10(value.y) ~ te(decdate, value.x, k = 11)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.50621 0.01877 80.24 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## te(decdate,value.x) 5.789 6.76 20.55 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.345 Deviance explained = 36%
## GCV = 0.094419 Scale est. = 0.091963 n = 261
The plots below suggest that the association between chlorophyll
concentrations and TN load (value.x) has increased over
time (decdate) within the sub-segment. In the heatmap
image, colors range from red (relatively low magnitude) to yellow
(higher magnitude). Comparisons across sub-segments require
consideration of the te() coefficient surfaces as well as
each model’s estimated intercept.
# plot smooth
par(mfrow=c(1,2))
plot.gam( gam_epc2, scheme = 1 )
plot.gam( gam_epc2, scheme = 2 )The model shows good diagnostics.
# join TN load data to chlorophyll data
epc3 <- epcwq2[ which( epcwq2$site %in% c(38,50,51,67) ), ]
epc3 <- epc3[which(epc3$param=="Chla"),] |> group_by(date) |>
summarise(value=mean(value)) |> as.data.frame()
gamdat_epc3 <- inner_join( x = epc3, y = loads, by = 'date' )
nrow(gamdat_epc3)## [1] 261
# decimal date
gamdat_epc3$decdate <- gamdat_epc3$date |> decimal_date()
# fit gam
gam_epc3 <- gam( log10(value.y) ~ te( decdate, value.x, k=11 ), data = gamdat_epc3 )
# model diagnostics
par(mfrow=c(2,2))
gam.check( gam_epc3, rep = 1000 )##
## Method: GCV Optimizer: magic
## Smoothing parameter selection converged after 15 iterations.
## The RMS GCV score gradient at convergence was 2.640248e-06 .
## The Hessian was positive definite.
## Model rank = 121 / 121
##
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
##
## k' edf k-index p-value
## te(decdate,value.x) 120 18 1.03 0.66
summary( gam_epc3 )##
## Family: gaussian
## Link function: identity
##
## Formula:
## log10(value.y) ~ te(decdate, value.x, k = 11)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.50621 0.01905 79.08 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## te(decdate,value.x) 17.96 21.99 6.071 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.326 Deviance explained = 37.3%
## GCV = 0.1021 Scale est. = 0.094686 n = 261
The plots below suggest that the association between chlorophyll
concentrations and TN load (value.x) has generally
increased over time (decdate) within the sub-segment, but
the dynamics here are a bit more complex than in the two previous
sub-segments. In the heatmap image, colors range from red (relatively
low magnitude) to yellow (higher magnitude). Comparisons across
sub-segments require consideration of the te() coefficient
surfaces as well as each model’s estimated intercept.
# plot smooth
par(mfrow=c(1,2))
plot.gam( gam_epc3, scheme = 1 )
plot.gam( gam_epc3, scheme = 2 )The model shows good diagnostics.
# join TN load data to chlorophyll data
gamdat_E1 <- inner_join( x = pcwq2[ which( pcwq2$param=='Chla' & pcwq2$site=="E1" ), ],
y = loads,
by = 'date'
)
nrow(gamdat_E1)## [1] 127
# decimal date
gamdat_E1$decdate <- gamdat_E1$date |> decimal_date()
# fit gam
gam1 <- gam( log10(value.y) ~ te( decdate, value.x, k=11 ), data = gamdat_E1 )
# model diagnostics
par(mfrow=c(2,2))
gam.check( gam1, rep = 1000 )##
## Method: GCV Optimizer: magic
## Smoothing parameter selection converged after 22 iterations.
## The RMS GCV score gradient at convergence was 6.398965e-08 .
## The Hessian was positive definite.
## Model rank = 121 / 121
##
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
##
## k' edf k-index p-value
## te(decdate,value.x) 120 3 1.07 0.8
summary( gam1 )##
## Family: gaussian
## Link function: identity
##
## Formula:
## log10(value.y) ~ te(decdate, value.x, k = 11)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.53432 0.02782 55.15 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## te(decdate,value.x) 3 3 15.54 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.257 Deviance explained = 27.5%
## GCV = 0.10149 Scale est. = 0.098294 n = 127
The plots below visualize the estimated te() coefficient
surface and suggest that the association between chlorophyll
concentrations and TN load (value.x) has not substantially
changed over time (decdate). In the heatmap image, colors
range from red (relatively low magnitude) to yellow (higher magnitude).
White patches indicate gaps in the input space. Comparisons across
sub-segments require consideration of the te() coefficient
surfaces as well as each model’s estimated intercept.
# plot smooth
par(mfrow=c(1,2))
plot.gam( gam1, scheme = 1)
plot.gam( gam1, scheme = 2)The model shows good diagnostics.
# join TN load data to chlorophyll data
gamdat_E2 <- inner_join( x = pcwq2[ which( pcwq2$param=='Chla' & pcwq2$site=="E2" ), ],
y = loads,
by = 'date'
)
nrow(gamdat_E2)## [1] 132
# decimal date
gamdat_E2$decdate <- gamdat_E2$date |> decimal_date()
# fit gam
gam2 <- gam( log10(value.y) ~ te( decdate, value.x, k=11 ), data = gamdat_E2 )
# model diagnostics
par(mfrow=c(2,2))
gam.check( gam2, rep = 1000 )##
## Method: GCV Optimizer: magic
## Smoothing parameter selection converged after 12 iterations.
## The RMS GCV score gradient at convergence was 1.556571e-06 .
## The Hessian was positive definite.
## Model rank = 121 / 121
##
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
##
## k' edf k-index p-value
## te(decdate,value.x) 120.00 7.94 1.01 0.46
summary( gam2 )##
## Family: gaussian
## Link function: identity
##
## Formula:
## log10(value.y) ~ te(decdate, value.x, k = 11)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.5445 0.0288 53.62 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## te(decdate,value.x) 7.938 10.37 3.734 0.000182 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.217 Deviance explained = 26.5%
## GCV = 0.11745 Scale est. = 0.1095 n = 132
The plots below suggest that the association between chlorophyll
concentrations and TN load (value.x) has changed
substantially over time (decdate) at Largo Inlet, with TN
loads (value.x) most strongly associated with chlorophyll
concentrations between about 2010 and 2015. Comparisons across
sub-segments require consideration of the te() coefficient
surfaces as well as each model’s estimated intercept.
# plot smooth
par(mfrow=c(1,2))
plot.gam( gam2, scheme = 1)
plot.gam( gam2, scheme = 2)The model shows good diagnostics.
# join TN load data to chlorophyll data
gamdat_E3 <- inner_join( x = pcwq2[ which( pcwq2$param=='Chla' & pcwq2$site=="E3" ), ],
y = loads,
by = 'date'
)
nrow(gamdat_E3)## [1] 132
# decimal date
gamdat_E3$decdate <- gamdat_E3$date |> decimal_date()
# fit gam
gam3 <- gam( log10(value.y) ~ te( decdate, value.x, k=11 ), data = gamdat_E3 )
# model diagnostics
par(mfrow=c(2,2))
gam.check( gam3, rep = 1000 )##
## Method: GCV Optimizer: magic
## Smoothing parameter selection converged after 22 iterations.
## The RMS GCV score gradient at convergence was 9.880194e-08 .
## The Hessian was positive definite.
## Model rank = 121 / 121
##
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
##
## k' edf k-index p-value
## te(decdate,value.x) 120.00 4.52 0.89 0.06 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary( gam3 )##
## Family: gaussian
## Link function: identity
##
## Formula:
## log10(value.y) ~ te(decdate, value.x, k = 11)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.52336 0.02802 54.36 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## te(decdate,value.x) 4.518 5.194 6.702 1.18e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.197 Deviance explained = 22.5%
## GCV = 0.10818 Scale est. = 0.10366 n = 132
The plots below suggest that the association between chlorophyll
concentrations and TN load (value.x) has not changed
substantially over time (decdate) at Feather Sound.
Comparisons across sub-segments require consideration of the
te() coefficient surfaces as well as each model’s estimated
intercept.
# plot smooth
par(mfrow=c(1,2))
plot.gam( gam3, scheme = 1)
plot.gam( gam3, scheme = 2)The model shows good diagnostics.
# join TN load data to chlorophyll data
gamdat_E4 <- inner_join( x = pcwq2[ which( pcwq2$param=='Chla' & pcwq2$site=="E4" ), ],
y = loads,
by = 'date'
)
nrow(gamdat_E4)## [1] 126
# decimal date
gamdat_E4$decdate <- gamdat_E4$date |> decimal_date()
# fit gam
gam4 <- gam( log10(value.y) ~ te( decdate, value.x, k=11 ), data = gamdat_E4 )
# model diagnostics
par(mfrow=c(2,2))
gam.check( gam4, rep = 1000 )##
## Method: GCV Optimizer: magic
## Smoothing parameter selection converged after 40 iterations.
## The RMS GCV score gradient at convergence was 6.314619e-06 .
## The Hessian was positive definite.
## Model rank = 121 / 121
##
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
##
## k' edf k-index p-value
## te(decdate,value.x) 120.00 5.71 0.97 0.31
summary( gam4 )##
## Family: gaussian
## Link function: identity
##
## Formula:
## log10(value.y) ~ te(decdate, value.x, k = 11)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.52974 0.02616 58.47 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## te(decdate,value.x) 5.712 7.127 8.513 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.322 Deviance explained = 35.3%
## GCV = 0.091092 Scale est. = 0.08624 n = 126
The plots below suggest that the association between chlorophyll
concentrations and TN load (value.x) has not changed
substantially over time (decdate) at this sub-segment.
Comparisons across sub-segments require consideration of the
te() coefficient surfaces as well as each model’s estimated
intercept.
# plot smooth
par(mfrow=c(1,2))
plot.gam( gam4, scheme = 1)
plot.gam( gam4, scheme = 2)The model shows good diagnostics.
# join TN load data to chlorophyll data
gamdat_E5 <- inner_join( x = pcwq2[ which( pcwq2$param=='Chla' & pcwq2$site=="E5" ), ],
y = loads,
by = 'date'
)
nrow(gamdat_E5)## [1] 128
# decimal date
gamdat_E5$decdate <- gamdat_E5$date |> decimal_date()
# fit gam
gam5 <- gam( log10(value.y) ~ te( decdate, value.x, k=11 ), data = gamdat_E5 )
# model diagnostics
par(mfrow=c(2,2))
gam.check( gam5, rep = 1000 )##
## Method: GCV Optimizer: magic
## Smoothing parameter selection converged after 16 iterations.
## The RMS GCV score gradient at convergence was 1.229579e-07 .
## The Hessian was positive definite.
## Model rank = 121 / 121
##
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
##
## k' edf k-index p-value
## te(decdate,value.x) 120.0 12.9 1.21 0.99
summary( gam5 )##
## Family: gaussian
## Link function: identity
##
## Formula:
## log10(value.y) ~ te(decdate, value.x, k = 11)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.51629 0.02511 60.39 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## te(decdate,value.x) 12.94 16.78 5.789 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.421 Deviance explained = 48%
## GCV = 0.090549 Scale est. = 0.080691 n = 128
The plots below suggests a possible bifurcation in the association
between chlorophyll concentrations and TN load (value.x) at
Weedon Island over time (decdate), for moderate to high
levels of TN load. Comparisons across sub-segments require consideration
of the te() coefficient surfaces as well as each model’s
estimated intercept.
# plot smooth
par(mfrow=c(1,2))
plot.gam( gam5, scheme = 1)
plot.gam( gam5, scheme = 2)
This section explores whether relationships between chlorophyll concentrations and Secchi depth have changed over time across the three OTB sub-segments divided by bridges/causeways.
We assemble the EPC data and fit a GAM to model the chlorophyll-Secchi relationship over time. The model shows good diagnostics.
# join TN load data to chlorophyll data
epc1 <- epcwq2[ which( epcwq2$site %in% c(46,47,60,61,62,64) ), ] # subset EPC data by station
epc1.chl <- epc1[ which( epc1$param=="Chla" ), ]
epc1.chl <- epc1.chl |> group_by(date) |> summarize(value=mean(value)) |> as.data.frame()
epc1.sec <- epc1[ which( epc1$param=="Secchi" ), ]
epc1.sec <- epc1.sec |> group_by(date) |> summarize(value=mean(value)) |> as.data.frame()
gamdat_epc1 <- inner_join( x = epc1.chl, y = epc1.sec, by = 'date' )
nrow(gamdat_E2)## [1] 132
# decimal date
gamdat_epc1$decdate <- gamdat_epc1$date |> decimal_date()
# fit gam
gam6 <- gam( log10(value.y) ~ te( decdate, value.x, k=11 ), data = gamdat_epc1 )
# model diagnostics
par(mfrow=c(2,2))
gam.check( gam6, rep = 1000 )##
## Method: GCV Optimizer: magic
## Smoothing parameter selection converged after 10 iterations.
## The RMS GCV score gradient at convergence was 8.176381e-08 .
## The Hessian was positive definite.
## Model rank = 121 / 121
##
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
##
## k' edf k-index p-value
## te(decdate,value.x) 120.0 34.2 1 0.46
summary( gam6 )##
## Family: gaussian
## Link function: identity
##
## Formula:
## log10(value.y) ~ te(decdate, value.x, k = 11)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.137672 0.004366 31.54 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## te(decdate,value.x) 34.19 43.53 4.091 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.371 Deviance explained = 44.7%
## GCV = 0.0061972 Scale est. = 0.005432 n = 285
The plots below visualize the estimated te() coefficient
surface and suggest that the association between Secchi depth and
chlorophyll concentration (value.x) has changed
substantially over time (decdate). Redder areas (between
approximately 2000–2005 and 2010–2015) in the heatmap indicate time
periods when increased chlorophyll concentrations were more strongly
associated with reduced water clarity.
# plot smooth
par(mfrow=c(1,2))
plot.gam( gam6, scheme = 1)
plot.gam( gam6, scheme = 2)The model shows good diagnostics.
# join TN load data to chlorophyll data
epc2 <- epcwq2[ which( epcwq2$site %in% c(40,41,42,63,65,66) ), ]
epc2.chl <- epc2[ which( epc2$param=="Chla" ), ]
epc2.chl <- epc2.chl |> group_by(date) |> summarize(value=mean(value)) |> as.data.frame()
epc2.sec <- epc2[ which( epc2$param=="Secchi" ), ]
epc2.sec <- epc2.sec |> group_by(date) |> summarize(value=mean(value)) |> as.data.frame()
gamdat_epc2 <- inner_join( x = epc2.chl, y = epc2.sec, by = 'date' )
nrow(gamdat_E2)## [1] 132
# decimal date
gamdat_epc2$decdate <- gamdat_epc2$date |> decimal_date()
# fit gam
gam7 <- gam( log10(value.y) ~ te( decdate, value.x, k=11 ), data = gamdat_epc2 )
# model diagnostics
par(mfrow=c(2,2))
gam.check( gam7, rep = 1000 )##
## Method: GCV Optimizer: magic
## Smoothing parameter selection converged after 6 iterations.
## The RMS GCV score gradient at convergence was 3.698484e-06 .
## The Hessian was positive definite.
## Model rank = 121 / 121
##
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
##
## k' edf k-index p-value
## te(decdate,value.x) 120.0 30.1 1.03 0.65
summary( gam7 )##
## Family: gaussian
## Link function: identity
##
## Formula:
## log10(value.y) ~ te(decdate, value.x, k = 11)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.246518 0.005706 43.2 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## te(decdate,value.x) 30.14 37.86 9.754 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.563 Deviance explained = 60.9%
## GCV = 0.010418 Scale est. = 0.0092794 n = 285
The plots below suggest that the association between Secchi depth and
chlorophyll concentration (value.x) has changed
substantially over time (decdate) at this sub-segment.
Redder areas in the heatmap indicate time periods when increased
chlorophyll concentrations were more strongly associated with reduced
water clarity. Although the heatmap indicates gaps in some
high-chlorophyll areas of the input space, the temporal pattern in these
results noticeably resembles those of the previous model
(gam6). Nonetheless, quantitative comparisons across
sub-segments require consideration of the te() coefficient
surfaces as well as each model’s estimated intercept.
# plot smooth
par(mfrow=c(1,2))
plot.gam( gam7, scheme = 1)
plot.gam( gam7, scheme = 2)The model shows good diagnostics.
# join TN load data to chlorophyll data
epc3 <- epcwq2[ which( epcwq2$site %in% c(38,50,51,67) ), ]
epc3.chl <- epc3[ which( epc3$param=="Chla" ), ]
epc3.chl <- epc3.chl |> group_by(date) |> summarize(value=mean(value)) |> as.data.frame()
epc3.sec <- epc3[ which( epc3$param=="Secchi" ), ]
epc3.sec <- epc3.sec |> group_by(date) |> summarize(value=mean(value)) |> as.data.frame()
gamdat_epc3 <- inner_join( x = epc3.chl, y = epc3.sec, by = 'date' )
nrow(gamdat_E2)## [1] 132
# decimal date
gamdat_epc3$decdate <- gamdat_epc3$date |> decimal_date()
# fit gam
gam8 <- gam( log10(value.y) ~ te( decdate, value.x, k=11 ), data = gamdat_epc3 )
# model diagnostics
par(mfrow=c(2,2))
gam.check( gam8, rep = 1000 )##
## Method: GCV Optimizer: magic
## Smoothing parameter selection converged after 8 iterations.
## The RMS GCV score gradient at convergence was 1.457851e-07 .
## The Hessian was positive definite.
## Model rank = 121 / 121
##
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
##
## k' edf k-index p-value
## te(decdate,value.x) 120.0 22.6 1.05 0.86
summary( gam8 )##
## Family: gaussian
## Link function: identity
##
## Formula:
## log10(value.y) ~ te(decdate, value.x, k = 11)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.313710 0.005222 60.07 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## te(decdate,value.x) 22.55 28.58 10.38 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.508 Deviance explained = 54.7%
## GCV = 0.0084728 Scale est. = 0.0077726 n = 285
The plots below suggest that the association between Secchi depth and
chlorophyll concentration has changed substantially over time at this
sub-segment. Comparisons across sub-segments require consideration of
the te() coefficient surfaces as well as each model’s
estimated intercept.
# plot smooth
par(mfrow=c(1,2))
plot.gam( gam8, scheme = 1)
plot.gam( gam8, scheme = 2)
Consulting reports and peer-reviewed studies:
Greening H & Janicki A (2006). Toward reversal of eutrophic conditions in a subtropical Estuary: Water quality and seagrass response to nitrogen loading reductions in Tampa Bay, Florida, USA. Environmental Management 38: 163-178. http://doi.org/10.1007/s00267-005-0079-4.
Greening H, Janicki A, Sherwood ET, Pribble R & Johansson, JOR (2014). Ecosystem responses to long-term nutrient management in an urban estuary: Tampa Bay, Florida, USA. Estuarine, Coastal and Shelf Science http://dx.doi.org/10.1016/j.ecss.2014.10.003
Janicki A & Wade D (1996). Estimating critical external nitrogen loads for the Tampa Bay Estuary: An empirically based approach to setting management targets, Final Report.
Morrison G, Janicki A, Wade D, Martin J & Vargo G (1996). Estimated nitrogen fluxes and nitrogen-chlorophyll relationships in Tampa Bay, 1985-1994. Tampa Bay Area Study Group Project Reports: 128. https://digitalcommons.usf.edu/basgp_report/128.
Software:
Grolemund G, Wickham H (2011). Dates and Times Made Easy with lubridate. Journal of Statistical Software 40(3), 1-25. https://www.jstatsoft.org/v40/i03/.
R Core Team (2023). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/.
Wickham H (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software 40(1): 1-29. URL https://www.jstatsoft.org/v40/i01/.
Wickham H, Francois R, Henry L, Muller K, Vaughan D (2023). dplyr: A Grammar of Data Manipulation. R package version 1.1.2, https://CRAN.R-project.org/package=dplyr)
Wickham H, Vaughan D, Girlich M (2023). tidyr: Tidy Messy Data. R package version 1.3.0, https://CRAN.R-project.org/package=tidyr.
Wood, SN (2017) Generalized Additive Models: An Introduction with R (2nd edition). Chapman and Hall/CRC.