First, install the wqtrends package from the ssc branch on GitHub:

install.packages('devtools')
library(devtools)
install_github('tbep-tech/wqtrends', ref = 'ssc')

Load the package (and dplyr):

library(wqtrends)
library(dplyr)

The rawdat object now includes ssc (mg/L) from Dumbarton Bridge as daily-averaged values.

head(rawdat)
##         date station param    value doy cont_year   yr  mo      ssc
## 1 1993-01-05      30   chl 1.400000   5  1993.011 1993 Jan 53.84946
## 2 1993-01-05      24   chl 1.400000   5  1993.011 1993 Jan 53.84946
## 3 1993-01-05      27   chl 1.433333   5  1993.011 1993 Jan 53.84946
## 4 1993-01-05      21   chl 1.366667   5  1993.011 1993 Jan 53.84946
## 5 1993-01-05      32   chl 1.400000   5  1993.011 1993 Jan 53.84946
## 6 1993-01-05      22   chl 1.333333   5  1993.011 1993 Jan 53.84946

The sscdat object includes the full record of ssc data. The dataset is used to create prediction matrices for the downstream analysis and plotting functions. This allows for an assessment of results from a fitted model at points in the time series when chlorophyll-a was not monitored, but ssc is available. The ssc data are available for most but not all days.

head(sscdat)
## # A tibble: 6 x 2
##   date         ssc
##   <date>     <dbl>
## 1 1992-10-21  36.3
## 2 1992-10-22  44.7
## 3 1992-10-23  45.7
## 4 1992-10-24  48.9
## 5 1992-10-25  59.4
## 6 1992-10-26  62.2

The anlz_gam function includes an additional smoother for ssc. By default, the number of knots for the continuous year variable (cont_year) is set to “high” as 12 times the number of years in the input data. The default number of knots for ssc is 24.

tomod <- rawdat %>% 
  filter(station %in% 34) %>% 
  filter(param %in% 'chl')
mod1 <- anlz_gam(tomod, trans = 'log10')
mod1
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## value ~ s(cont_year, k = 324) + s(ssc, k = 24)
## 
## Estimated degrees of freedom:
## 160.20   2.16  total = 163.36 
## 
## GCV score: 0.08157124

Changing the knots for cont_year and ssc can be done with the kts and ssckts arguments.

mod2 <- anlz_gam(tomod, trans = 'log10', kts = 10, ssckts = 200)
mod2
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## value ~ s(cont_year, k = 10) + s(ssc, k = 200)
## 
## Estimated degrees of freedom:
## 4.08 2.01  total = 7.09 
## 
## GCV score: 0.1169829

The original GAM formula without ssc can be used by setting the usessc argument to FALSE.

mod3 <- anlz_gam(tomod, trans = 'log10', usessc = FALSE)
mod3
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## value ~ s(cont_year, k = 324)
## 
## Estimated degrees of freedom:
## 169  total = 170.21 
## 
## GCV score: 0.09209277

Fits for each model can be assessed as before using the anlz_fit function.

anlz_fit(mod1)
##             AIC        GCV        R2
## GCV.Cp 15.80171 0.08157124 0.6891917
anlz_fit(mod2)
##             AIC       GCV        R2
## GCV.Cp 253.8026 0.1169829 0.2071176
anlz_fit(mod3)
##             AIC        GCV        R2
## GCV.Cp 48.37136 0.09209277 0.6610841

The plotting functions can also be used as before.

show_prdseries(mod1, ylab = 'Chlorophyll-a (ug/L)')

show_prdseries(mod2, ylab = 'Chlorophyll-a (ug/L)')

show_prdseries(mod3, ylab = 'Chlorophyll-a (ug/L)')

show_prddoy(mod1, ylab = 'Chlorophyll-a (ug/L)')

show_prddoy(mod2, ylab = 'Chlorophyll-a (ug/L)')

show_prddoy(mod3, ylab = 'Chlorophyll-a (ug/L)')