
For data that are normally distributed:
For count or binary data:
Fieberg, J., Rieger, R.H., Zicus, M. C., Schildcrout, J. S. 2009. Regression modelling of correlated data in ecology: subject specific and population averaged response patterns. Journal of Applied Ecology 46:1018-1025.
Research Questions: which types of cylinders are best (single or double)?
Where should they be placed?
Zicus, M.C., J. Fieberg and D. P. Rave. 2003. Does mallard clutch size vary with landscape composition: a different view. Wilson Bulletin 114:409-413.
Zicus, M. C., D. P. Rave, and J. Fieberg. 2006. Cost effectiveness of single- vs. double-cylinder over-water nest structures. Wildlife Society Bulletin 34:647-655.
Zicus, M. C., Rave, D. P., Das, A., Riggs, M. R., and Buitenwerf, M. L. (2006). Influence of land use on mallard nest‐structure occupancy. The Journal of wildlife management, 70(5), 1325-1333.
\(Y_{ij}\) = clutch size for the \(i^{th}\) structure during year \(j\)
\(Init.Date_{ij}\) = nest initiation date (Julian day) for the \(i^{th}\) structure during year \(j\)
\(I(deply=2)_{i}\) = 0 if \(i^{th}\) structure is a single cylinder, 1 if double cylinder
Model: \[Y_{ij} \mid b_{0i} \sim N(\mu_{ij}, \sigma^2_{\epsilon})\] \[\mu_{ij} = (\beta_0 + b_{0i}) + \beta_1Init.Date_{ij} + \beta_2I(deply=2)_i\] \[b_{0i} \sim N(0, \sigma^2_{b_0})\]
Similar to including “structure” as a series of dummy variables with the added assumption that the parameters are drawn from a normal distribution
For normally distributed response data, we can fit correlated data models without having to resort to random effects:
\[Clutch size_{ij} = \beta_0 + \beta_1Init.Date_{ij}+ \beta_2I(deply=2)_i +\epsilon_{ij}\] \[\epsilon \sim N(0, \Omega)\]
\(\Omega\) = Var/Cov matrix for \(\epsilon\). We no longer assume the errors, \(\epsilon_{ij}\), are independent!
\[Clutch size_{ij} = \beta_0 + \beta_1Init.Date_{ij}+ \beta_2I(deply=2)_i+\epsilon_{ij}\]
\[\epsilon \sim N(0, \Omega)\]
Compound symmetric covariance matrix for within-cluster data:
\[\Omega= \begin{bmatrix} \Sigma_i & 0 & \cdots & 0 \\ 0 & \ddots & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & \Sigma_i \end{bmatrix} \text{ with } \Sigma_i = \begin{bmatrix} \sigma^2 & \rho\sigma^2 & \cdots & \rho \sigma^2 \\ \rho \sigma^2 & \ddots & \ddots & \vdots \\ \vdots & \ddots & \ddots & \rho \sigma^2 \\ \rho \sigma^2 & \cdots & \rho \sigma^2 & \sigma^2 \end{bmatrix}\]
Correlation between:
This model is equivalent to a linear mixed effects model with random intercepts for each structure! Note: \(\rho = \frac{\sigma^2_{b_0}} {\sigma^2_{b_0}+ \sigma^2_{\epsilon}}\) and \(\sigma^2 = \sigma^2_{b_0}+ \sigma^2_{\epsilon}\).
Unlike the multivariate normal distribution, Poisson, Negative Binomial, Bernoulli, Binomial distributions
2 Options:
Systematic component: \(g(\mu_{ij}) = \eta_{ij} = (\beta_0 + b_{0i}) + (\beta_1 + b_{1i})x_{ij}\)
Random components:
Poisson-normal model:
Logistic-normal model:
Frequentist analysis: the \(b\)’s are unobserved random variables…we do not estimate them! Rather, we estimate \(\beta_0, \beta_1, \Sigma\).
Logistic-normal model:
\[Y_{ij} | b_i \sim Bernoulli(p_{ij})\] \[\log[p_{ij}/(1-p_{ij})] = \beta_0 + b_{0i} + \beta_1VOM_{ij} + \beta_2I(deply=2)_i\] \[b_{0i} \sim N(0, \tau^2)\]
Structures have different “propensities” of being occupied:
To estimate parameters using Maximum Likelihood, we need to determine:
Total law of Probability If events \(B_1, B_2, \ldots, B_k\) are mutually exclusive and together make up all possibilities, then:
\(P(A) = \sum_iP(A|B_i)P(B)\)

To determine the distribution of \(Y_{ij}\), we integrate over the random effects:
\[L(Y_{ij} | \beta, \Sigma) = \int f(Y_{ij} | b_i)f(b_i)db_i\]
\[Y_{ij}|b \sim N(\mu_{ij}, \sigma^2)\] \[\mu_{ij} = X_{ij}\beta + Z_ib\] \[b \sim N(0, \Sigma)\]
If we average over (or integrate out) the random effects (\(b\)), we get the marginal distribution of \(Y\):
\[Y \sim MVN(X\beta, \Omega)\] \[\Omega = Z\Sigma Z^\prime + \sigma^2I\]
\[I= \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & \ddots & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & 1 \end{bmatrix}_{(n \times n)}\]
\[Y \sim MVN(X\beta, \Omega)\]
Random Intercept Model:
\[\Omega= \begin{bmatrix} \Sigma_i & 0 & \cdots & 0 \\ 0 & \ddots & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & \Sigma_i \end{bmatrix} \text{ with }\] \[\Sigma_i = \begin{bmatrix} \tau^2 + \sigma^2 & \tau^2 & \cdots & \tau^2 \\ \tau^2 & \ddots & \ddots & \vdots \\ \vdots & \ddots & \ddots & \tau^2 \\ \tau^2 & \cdots & \tau^2 & \tau^2 + \sigma^2 \end{bmatrix}\]
GLS Model with compound-symmetric correlation structure:
\[\Omega= \begin{bmatrix} \Sigma_i & 0 & \cdots & 0 \\ 0 & \ddots & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & \Sigma_i \end{bmatrix} \text{ with }\] \[\Sigma_i = \begin{bmatrix} \sigma^2 & \rho\sigma^2 & \cdots & \rho \sigma^2 \\ \rho \sigma^2 & \ddots & \ddots & \vdots \\ \vdots & \ddots & \ddots & \rho \sigma^2 \\ \rho \sigma^2 & \cdots & \rho \sigma^2 & \sigma^2 \end{bmatrix}\]
\[Y_{ij} |b_i \sim Bernoulli(p_{ij})\] \[logit(p_{ij}) = \beta_0 + b_{0i} + \beta_{1}x_{ij}\] \[b_{0i} \sim N(0, \sigma^2_{b})\]
\[L(Y_{ij} | \beta, \Sigma) = \int f(Y_{ij} | b_i)f(b_i)db_i\]
\[\int \left[ \frac{exp(\beta_0 + b_{0i} + \beta_1x_{ij})}{1+exp(\beta_0 + b_{0i} + \beta_1x_{ij})}\right]^{Y_{ij}}\left[ \frac{1}{1+exp(\beta_0 + b_{0i} + \beta_1x_{ij})}\right]^{1-Y_{ij}} \frac{1}{\sqrt{2\pi}\sigma_{b}}e^{\frac{(b_{0i}-0)^2}{2\sigma^2_{b}}}db_{0i}\]
No closed-form solution!
How do we use maximum likelihood to estimate parameters?
\[L(Y_{ij} | \beta, \Sigma) = \prod_{i=1}^{n} \int f(Y_{ij} | b_i)f(b_i)db_i\]
Options:
nAGQ
See also mixed_model in the GLMMadaptive package.
Generalized linear mixed effects models are more challenging to fit than linear mixed effects models…and
Parameters in generalized linear mixed effects models have a “subject-specific”, but not “population-average” interpretation.
\[Clutch size_{ij} = (\beta_0 + b_{0i}) + \beta_1Init.Date_{ij} + \beta_2I(deply=2)_i+\epsilon_{ij}\] \[\epsilon_{ij} \sim N(0, \sigma^2)\] \[b_{0i} \sim N(0, \sigma^2_{b_0})\]
Assume \(\epsilon_{ij}\) and \(b_{0i}\) are independent.
How does clutch size vary with nest initiation date and structure type for a “typical” structure (i.e., one with \(b_{0i}\) = 0)?
How does clutch size vary across the population of structures as a function of nest initiation date and structure type?
Fixed effects parameters have both population-averaged and subject-specific interpretations!
\[Y_i | b_i \sim Bernoulli(p_i)\] \[\log[p_{ij}/(1-p_{ij})] = \beta_0 + b_{0i} + \beta_1VOM_{ij} + \beta_2I(deply=2)_i\] \[b_{0i} \sim N(0, \sigma^2_{b_0})\]
The fixed effects parameters in logistic regression models only have a subject-specific interpretation when we transform back to scales of interest!
Note: a subject-specific interpretation may not be meaningful for predictors that are constant within a cluster.
\(E[Y_{ij} | X]\) (red curve) is no longer the same as \(E[Y_{ij}|X, b_{0i}=0]\) (blue curve)!
\[Y_{ij} | b_{0i}, b_{1i} \sim f(y_{ij} | b_{0i}, b_{1i})\] \[(b_{0i}, b_{1i}) \sim N(0, \Sigma), \mbox{ with}\]
\(f(y_i | b_{0i}, b_{1i})\) given by Poisson, binomial, negative binomial.
How can we quantify how \(E[Y|X]\) changes with \(X\) (as opposed to \(E[Y |X, b_{0i}, b_{1i}]\) or \(E[Y |X, b_{0i}= b_{1i}=0]\))?
mixed_model + marginal_coefs in GLMMadaptive package to estimate equivalent “marginal coefficients” (based on Hedeker et al. 2018).Hedeker, D., du Toit, S. H., Demirtas, H. and Gibbons, R. D. (2018), A note on marginalization of regression parameters from mixed models of binary outcomes. Biometrics 74, 354-361.
Readings:
Useful Links
For Normally distributed data:
\(L(\mu, \sigma^2; y_1, y_2, \ldots, y_n) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{(y_i-\mu)^2}{2\sigma^2}\right)\)
With linear regression, we assume \(Y_i \sim N(\mu_i, \sigma^2)\), with \(\mu_i = \beta_0 + X_i\beta_1\) so…
\(L(\beta_0, \beta_1, \sigma; x) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{(y_i-\beta_0+X_i\beta_1)^2}{2\sigma^2}\right) =\frac{1}{(\sqrt{2\pi}\sigma)^n}\exp\left(-\sum_{i=1}^n\frac{(y_i-\beta_0+X_i\beta_1)^2}{2\sigma^2}\right)\)
\(\Rightarrow logL =-nlog(\sigma) - \frac{n}{2}log(2\pi) -\sum_{i=1}^n\frac{(y_i-\beta_0+X_i\beta_1)^2}{2\sigma^2}\)
\(\Rightarrow\) maximizing logL \(\Rightarrow\) minimizing \(\sum_{i=1}^n\frac{(y_i-\beta_0+X_i\beta_1)^2}{2\sigma^2}\)
or, equivalently \(\sum_{i=1}^n(y_i-\beta_0-X_i\beta_1)^2\)
\[\begin{gather} \sum_{i=1}^n (Y_i - E[Y_i|X_i])^2 \newline = \sum_{i=1}^n (Y_i - (\beta_0+X_{1i}\beta_1+\ldots))^2 \end{gather}\]
Least squares leads to the following set of equations for estimating parameters (take the derivative and set = 0):
\[2\sum_{i=1}^n \frac{\partial E[Y_i|X_i]}{\partial \beta}(Y_i -E[Y_i|X_i]) = 0\]
Maximum Likelihood estimators are found by solving:
\(\sum_{i=1}^n \frac{\partial E[Y_i|X_i]}{\partial \beta}V_i^{-1}(Y_i -E[Y_i|X_i]) = 0\).
Logistic Regression:
Poisson Regression:
GEE: \(\hat{\beta}\) solves: \(\sum_{i=1}^n \frac{\partial \mu_i}{\partial \beta}V_i(\alpha)^{-1}(Y_i -E[Y_i|X_i]) = 0\).
\(R_i\) = working correlation model that describes within subject correlation.
GEE: \(\hat{\beta}\) solves: \(\sum_{i=1}^n \frac{\partial \mu_i}{\partial \beta}V_i(\alpha)^{-1}(Y_i -E[Y_i|X_i]) = 0\).
geeglm in geepack libraryEstimates will be asymptotically unbiased (think “large number of clusters”)
Uses “robust” (or “sandwich”) standard errors, treating clusters as independent observational units