In this document, we illustrate the R package aceDLNM
(https://github.com/tianyi-pan/aceDLNM) for Adaptive Cumulative Exposure Distributed Lag Non-Linear Models (ACE-DLNM).
# install.packages('devtools')
# devtools::install_github('tianyi-pan/aceDLNM')
library(aceDLNM)
library(mgcv)
Let \(Y_t\) be the outcome count at discrete time point \(t \in \left\{1, 2, \cdots, N\right\}\). Let \(X\) be a continuous exposure process, with collected value \(x_t\) at time point \(t\). Denote \(\boldsymbol{z}_t \in \mathbb{R}^p\) as the vector of covariates. We assume \(Y_t | X, \boldsymbol{z}_t\) independently follow a negative binomial distribution \(\mathrm{NB}(\mu_t, \theta)\) with mean \(\mu_t\), where \(\theta\) represents the dispersion parameter. The probability mass function (pmf) is given by \[ p(y_t;\mu_t,\theta) = \frac{\Gamma(y_t + \theta)}{y_t! \Gamma(\theta)} \left(\frac{\mu_t}{\theta + \mu_t}\right)^y \left(\frac{\theta}{\theta + \mu_t}\right)^\theta. \] Then, the variance has the form of \(\mathrm{var}\left(Y_t | X, \boldsymbol{z}_t\right) = \mu_t+ \mu_t^2/\theta\).
An ACE-DLNM is specified as \[ \begin{aligned} Y_t | X, \boldsymbol{z}_t \sim & \mathrm{NB}\left(\mu_t, \theta\right),\\ \log(\mu_t) =& s(t; X) + \boldsymbol{h}(\boldsymbol{z}_{t}), \end{aligned} \] where \(\boldsymbol{h}(\boldsymbol{z}_{t}) = \sum_{j=1}^p h_j (\boldsymbol{z}_{tj})\) and \(h_j(\boldsymbol{z}_{tj})\) are \(p\) unknown smooth functions of covariate \(\boldsymbol{z}_{tj}\), \(j = 1, \cdots, p\).
We formulate \(s(t; X)\), the effect of the exposure process \(X\) at time \(t\), as follows, \[ s(t;X) = f \left\{\int_{0}^{L} w(l) X(t-l) dl\right\}. \] Define \(E(t) \equiv \int_{0}^{L} w(l) X(t-l) dl\) the ACE at (continuous) time \(t\), and \(f\) the ACERF.
We use the function aceDLNM::GenerateData
to simulate a dataset, to illustrate the main features of aceDLNM
package.
The exposure \(x_t\) is the PM2.5 concentration in Waterloo. Some weight functions and ACERFs are pre-specified in GenerateData()
. The maximum lag \(L\), dispersion parameter \(\theta\) and the number of observations are specified by argument maxL
, theta
, and Nt
. A long-term trend function \(h(t)\) is considered. That is, the mean model is, \[ \log(\mu_t) = s(t; X) + h(t) \] We generate the dataset using the following code.
set.seed(12345)
maxL <- 14 # max lag: two week
theta <- 8 # dispersion parameter; var = mu + mu^2/8
Nt <- 1000 # sample size: 1000
wltype <- "type1" # weight function
fEtype <- "cubic" # ACERF
ht <- function(x) sin(x/150) # h(t)
ttmp <- 1:(maxL+Nt)
ttmp <- ttmp[-(1:maxL)]
meanht <- mean(ht(ttmp))
otherterm <- data.frame(trend = ht(ttmp) - meanht)
dat.gen <- GenerateData(fEtype = fEtype, wltype = wltype,
Nt = Nt,
theta = theta,
maxL = maxL,
other = otherterm # for other covariates z
)
dat <- data.frame(x = dat.gen$x,
t = dat.gen$t,
y = dat.gen$y)
To fit the ACE-DLNM, we use the function aceDLNM()
. The distributed lag term and response is in argument formula
, using the build-in function sX()
. The long-term trend function \(h(t)\) is fitted by a B-spline function of \(t\), specified by mgcv::s
. kw
and kE
are the numbers of spline coefficients for \(w\) and \(f\).
mod.fit <- aceDLNM(formula = y~sX(t, x),
smooth = ~s(t, bs = "bs", k = 20),
dat = dat,
kw = 20,
kE = 20,
maxL = 14,
verbose = FALSE)
If verbose = TRUE
, the progress and diagnostic information will be print, such as the gradients and function values in inner and middle optimizations. For example,
# parameters: 2.054733 3.484608 8.157696 5.458378
# * Start optimize profile likelihood
# -- AlphaF Gradient Max: 5.02176e-12
# -- AlphaF Gradient Max: 9.19265e-13
# -- AlphaF Gradient Max: 9.19265e-13
# -- AlphaF Gradient Max: 7.91697e-09
# -- AlphaF Gradient Max: 7.91697e-09
# -- AlphaF Gradient Max: 4.49796e-12
# -- AlphaF Gradient Max: 4.49796e-12
# -- AlphaF Gradient Max: 1.26299e-12
# -- AlphaF Gradient Max: 1.26299e-12
# * Finish middle opt. Profile Likelihood Gradient Max: 3.28311e-07
# LAML: 1884.214
#
# parameters: 2.054733 3.484608 8.157696 5.458378
# Gradient: -0.001849584 -0.001983532 -0.0002105075 0.002368254
#
# parameters: 2.054751 3.485166 8.157844 5.457864
parameters: 2.054733 3.484608 8.157696 5.458378
are the current values of \(\theta\) and smoothing parameters in outer optimization. Under the parameters, middle optimization (for \(w\)) is implemented where AlphaF, the spline coefficients for \(f\), are found by inner optimization. -- AlphaF Gradient Max:
reports the final gradient of inner opt and * Finish middle opt. Profile Likelihood Gradient Max:
report the final gradient of middle optimization. Then, the LAML LAML: 1884.214
and gradient Gradient: -0.001849584 -0.001983532 -0.0002105075 0.002368254
are evaluated, which lead to the next BFGS iteration for outer optimization.
The time of model fitting (in seconds) can be found in mod.fit$opttime
,
mod.fit$opttime
#> [1] 16.10967
Let’s now summarize the fitted ace-DLNM. The object from aceDLNM()
is a S3 class.
class(mod.fit)
#> [1] "aceDLNM_fit"
We introduce the S3 functions summary
and residuals
.
summary()
The function summary
is to report the estimated function. The estimates are reported in a dataframe and can also be visualized. We provide the true function in data generating.
true.function <- dat.gen$true.f
true.function$smooth <- function(x,var){
if(var == "t") ht(x) - meanht
}
summ <- summary(mod.fit, plot = TRUE, true.function = true.function)
The figures for ACERF \(f\), weight function \(w\) and long-term trend \(h\) are reported. The blue curves are the true functions. The point estimates and CIs are demonstrated by solid lines and dashed lines.
The AIC of the fitted model is calculated and save in summ$AIC
for model comparison,
## l: log likelihood.
## edf0: effective degree of freedom, for conditional AIC,
# equivalent to the edf in:
# Hastie, T., and Tibshirani, R. (1990),
# Generalized Additive Models, London: Chapman & Hall.
## edf: effective degree of freedom, for heuristic alternative,
# considering the uncertainty in smoothing parameters.
## (coditional) AIC = -2*summ$AIC$l + 2*summ$AIC$edf0
## AIC = -2*summ$AIC$l + 2*summ$AIC$edf
##
## see details:
# Wood S N, Pya N, Säfken B. Smoothing parameter and model selection
# for general smooth models[J]. JASA, 2016, 111(516): 1548-1563.
summ$AIC
#> $AIC
#> [1] 5615.447
#>
#> $l
#> [1] -2778.793
#>
#> $edf
#> [1] 28.92995
#>
#> $edf0
#> [1] 24.50617
residuals()
We provide a function residuals
for randomized quantile residuals diagnostic.
res <- residuals(mod.fit, plot = TRUE)