Model for binary (0/1) data or binomial data (number of 1’s out of \(n\) trials).
\[Y_i | X_i \sim Bernoulli(p_i)\]
\[logit(p_i) = \log\left(\frac{p_i}{1-p_i}\right) = \beta_0 + \beta_1 X_{1,i} + \ldots + \beta_p X_{p,i}\]
Remember, for binary data, \(E[Y_i|X_i] = p_i\), \(Var[Y_i|X_i] = p_i(1-p_i)\)
\(\Rightarrow p_i = \frac{\exp(\beta_0 + \beta_1 X_{1,i} + \ldots + \beta_p X_{p,i})}{1+\exp(\beta_0 + \beta_1 X_{1,i} + \ldots + \beta_p X_{p,i})}\)
(we can use the plogis function in R)
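For example, a quick sketch (the coefficient and predictor values here are made up for illustration):

```r
# inverse-logit by hand vs. plogis()
beta0 <- -1.5; beta1 <- 0.08           # hypothetical coefficients
x <- 20                                # hypothetical predictor value
eta <- beta0 + beta1 * x               # linear predictor (logit scale)
exp(eta) / (1 + exp(eta))              # p by hand: ~0.525
plogis(eta)                            # same result using plogis
```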
\(\frac{p}{1-p}\) is referred to as the odds.
The link function, \(\log \Big(\frac{p}{1-p}\Big)\), is referred to as logit.
Thus, we can describe our model as a linear model for the log odds of success.
If the probability of winning a bet is = 2/3, what are the odds of winning?
odds \(= \frac{p}{1-p} = \frac{2/3}{1/3} = 2\) (or “2 to 1”).
| Scale | | | | | |
|---|---|---|---|---|---|
| log odds \(= \log\Big(\frac{p}{1-p}\Big)\) | -6.907 | -1.386 | 0.0 | 1.386 | 6.907 |
| odds \(= \frac{p}{1-p}\) | 0.001 | 0.250 | 1.0 | 4.000 | 999.000 |
| \(p\) | 0.001 | 0.200 | 0.5 | 0.800 | 0.999 |
Odds range from 0 to \(\infty\), so log(odds) ranges over \(-\infty\) to \(\infty\).
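In R, qlogis and plogis move between these scales; as a sketch, the rows of the table above can be reproduced with:

```r
p <- c(0.001, 0.2, 0.5, 0.8, 0.999)
p / (1 - p)        # odds:     0.001  0.25  1  4  999
qlogis(p)          # log odds: -6.907 -1.386 0 1.386 6.907
plogis(qlogis(p))  # back to p
```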
Consider a regression coefficient for a categorical variable:
\[logit(p_i) = log(\frac{p_i}{1-p_i}) = \beta_0 + \beta_1I(group=B)_i\]
\(I(group=B)_i = 1\) if observation \(i\) is from Group B and 0 if it is from Group A.
Under this model, the odds for Group A are \(\exp(\beta_0)\) and the odds for Group B are \(\exp(\beta_0+\beta_1)\). Consider the ratio of these odds:
\[\frac{\frac{p_B}{1-p_B}}{\frac{p_A}{1-p_A}} = \frac{\exp(\beta_0+\beta_1)}{\exp(\beta_0)} =\frac{e^{\beta_0}e^{\beta_1}}{e^{\beta_0}}= \exp(\beta_1)\]
So, \(\exp(\beta_1)\) gives an odds ratio (or ratio of odds) for Group B relative to Group A.
Consider a continuous predictor, \(X\):
\[logit(p_i) = log(\frac{p_i}{1-p_i}) = \beta_0 + \beta_1X_i\]
\(\beta_1\) gives the change in log odds per unit change in \(X\).
Consider two observations, \(i\) and \(j\), with \(X_i = a\) and \(X_j = a+1\), and take the ratio of their odds:
\[\frac{\frac{p_j}{1-p_j}}{\frac{p_i}{1-p_i}} = \frac{\exp(\beta_0+\beta_1(a+1))}{\exp(\beta_0+\beta_1 a)} = \frac{e^{\beta_0}e^{\beta_1a}e^{\beta_1}}{e^{\beta_0}e^{\beta_1 a}}= \exp(\beta_1)\]
So, \(\exp(\beta_1)\) gives the odds ratio for two observations that differ by 1 unit of \(X\).
For multiple predictor models:
The odds are expected to change by a factor of \(\exp(\beta_i)\) when \(X_i\) increases by 1 unit, holding everything else constant!
\[logit(p_i) = \log\left(\frac{p_i}{1-p_i}\right) = \beta_0 + \beta_1 X_i\]
- The slope coefficient \(\beta_1\) controls how quickly \(p\) transitions from 0 to 1.
- The sign of \(\beta_1\) determines whether \(p\) increases or decreases as \(X\) increases.
- The intercept \(\beta_0\) controls the height of the curve when \(X = 0\).

Why not just use lm?

- lm would eventually predict \(p_i \ge 1\) and \(p_i \le 0\).
- lm assumes constant variance rather than \(Var[Y_i|X_i] = p_i(1-p_i)\).

Instead, we fit a logistic regression:

\[Y_i | X_i \sim Bernoulli(p_i)\]
\[logit(p_i) = \log\left(\frac{p_i}{1-p_i}\right) = \beta_0 + \beta_1 voc_i\]
Assumptions:
\(E[Y_i|X_i] = p_i; Var[Y_i|X_i] = p_i(1-p_i)\) with:
\(p_i = \frac{\exp(\beta_0 + \beta_1 voc_i)}{1+\exp(\beta_0 + \beta_1 voc_i)}\)
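The summary output below is consistent with a call along these lines (a sketch; the object name mod1 follows the naming used later in these notes):

```r
# logistic regression of moose detection (0/1) on visual obstruction
mod1 <- glm(observed ~ voc, data = exp.m, family = binomial())
summary(mod1)
```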
Call:
glm(formula = observed ~ voc, family = binomial(), data = exp.m)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.759933 0.460137 3.825 0.000131 ***
voc -0.034792 0.007753 -4.487 7.21e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 171.61 on 123 degrees of freedom
Residual deviance: 147.38 on 122 degrees of freedom
AIC: 151.38
Number of Fisher Scoring iterations: 4
Regression coefficient for voc (visual obstruction) = -0.035.
Intercept = 1.76 = log(odds) of detection when voc = 0.
We see roughly 85% of moose if there is no visual obstruction, since plogis(1.76) \(\approx\) 0.85.
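As a quick check of that 85% figure:

```r
# Pr(detection) when voc = 0 is the inverse-logit of the intercept
plogis(1.759933)   # ~0.853
```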
# treat year as a categorical predictor
exp.m$year <- as.factor(exp.m$year)
mod2 <- glm(observed ~ voc + year, data = exp.m, family = binomial())
summary(mod2)
Call:
glm(formula = observed ~ voc + year, family = binomial(), data = exp.m)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.453203 0.622248 3.942 8.06e-05 ***
voc -0.037391 0.008199 -4.560 5.11e-06 ***
year2006 -0.453862 0.516567 -0.879 0.3796
year2007 -1.111884 0.508269 -2.188 0.0287 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 171.61 on 123 degrees of freedom
Residual deviance: 142.23 on 120 degrees of freedom
AIC: 150.23
Number of Fisher Scoring iterations: 4
Year 2005: \(\log(p_i/(1-p_i)) = 2.45 - 0.037VOC\)
Year 2006: \(\log(p_i/(1-p_i)) = 2.45 - 0.037VOC - 0.45\)
So, -0.45 gives the difference in log odds between years 2006 and 2005 (if we hold VOC constant).
exp(-0.45) = 0.63 = odds ratio (year 2006 to year 2005)
odds ratio = \(\frac{p_{2006}/(1-p_{2006})}{p_{2005}/(1-p_{2005})} = 0.63\)
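Equivalently, a sketch computing this directly from the fitted model:

```r
exp(coef(mod2)["year2006"])   # ~0.635, the year-2006 vs. year-2005 odds ratio
```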
The estimates of \(\beta\) are maximum likelihood estimates, found by maximizing:
\[L(\beta; y, x) = \prod_{i=1}^{n}p_i^{y_i}(1-p_i)^{1-y_i}, \text{ with}\]
\[p_i = \frac{e^{\beta_0 + \beta_1 x_{i,1} + \ldots + \beta_k x_{i,k}}}{1+e^{\beta_0 + \beta_1 x_{i,1} + \ldots + \beta_k x_{i,k}}}\]
Remember, for large samples, \(\hat{\beta} \sim N(\beta, \Sigma)\).
We can use this theory to conduct tests (the z-statistics and p-values reported by the summary function) and to construct confidence intervals.
Generate confidence intervals for logit(\(p\)), then back-transform to get confidence intervals for \(p\).
If confidence limits for \(\beta\) include 0 or confidence limits for \(exp(\beta)\) include 1, then we do not have enough evidence to say that years differ in their detection probabilities.
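As a sketch, Wald-style intervals (estimate \(\pm\) 1.96 SE) and their odds-ratio counterparts can be obtained with confint.default:

```r
confint.default(mod2)        # Wald 95% CIs on the coefficient (logit) scale
exp(confint.default(mod2))   # the same intervals on the odds-ratio scale
```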
Coefficient estimates:
(Intercept) voc year2006 year2007
2.45320264 -0.03739118 -0.45386154 -1.11188432
Standard errors:
(Intercept) voc year2006 year2007
0.622247867 0.008199483 0.516567443 0.508269279
95% confidence interval for the odds ratio:
[1] 0.3517145 2.8826021
95% Confidence interval for odds ratio = (0.35, 2.88) includes 1 (not statistically significant)
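The intervals below come from the confint function applied to the fitted model:

```r
confint(mod2)   # profile-likelihood 95% intervals (see the note below)
```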
2.5 % 97.5 %
(Intercept) 1.30341777 3.7586692
voc -0.05448153 -0.0221268
year2006 -1.48479529 0.5516852
year2007 -2.14380706 -0.1382692
These are profile-likelihood based confidence intervals based on “inverting” the likelihood ratio test (see Maximum Likelihood notes).
Group observations by deciles of their predicted values, then calculate the expected and observed number of successes and failures in each group (the Hosmer-Lemeshow test):
\[\chi^2 = \sum_{i=1}^{2g}\frac{(O_i-E_i)^2}{E_i} \sim \chi^2_{g-2}\]
where \(g\) = number of groups, and the \(2g\) cells are the successes and failures within each group.
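A hedged sketch using the hoslem.test function from the ResourceSelection package (one of several implementations; not necessarily the one used in the course):

```r
library(ResourceSelection)
# observed 0/1 outcomes, fitted probabilities, g = 10 groups
hoslem.test(exp.m$observed, fitted(mod2), g = 10)
```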
We can adapt our general approach for testing goodness-of-fit using Pearson residuals (\(r_i\)):
\[r_i = \frac{Y_i-E[Y_i|X_i]}{\sqrt{Var[Y_i|X_i]}}\]
See textbook for an implementation of this test…
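As a rough sketch (not necessarily the textbook's implementation), the overall Pearson statistic can be compared to a chi-square distribution:

```r
# Pearson chi-square goodness-of-fit statistic
X2 <- sum(residuals(mod2, type = "pearson")^2)
pchisq(X2, df = df.residual(mod2), lower.tail = FALSE)   # approximate p-value
```

For ungrouped binary data this chi-square approximation can be poor, which is one motivation for the grouped test above.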
We can again use differences in deviances (equivalent to likelihood ratio tests) to compare full and reduced models.
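For example, the output below can be produced with drop1:

```r
drop1(mod2, test = "Chisq")   # likelihood-ratio test for each term
```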
Single term deletions
Model:
observed ~ voc + year
Df Deviance AIC LRT Pr(>Chi)
<none> 142.23 150.23
voc 1 168.20 174.20 25.9720 3.464e-07 ***
year 2 147.38 151.38 5.1558 0.07593 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
voc is an important predictor; the importance of year is less clear.
Or use the Anova function in the car package.
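A sketch:

```r
library(car)
Anova(mod2)   # likelihood-ratio (Type II) tests for each term
```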
We can compare nested or non-nested models using the AIC function.
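For example, with the two models fit above:

```r
AIC(mod1, mod2)   # smaller values indicate better expected predictive ability
```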
We can also summarize models by getting predicted values, \(P(\mbox{detect animal} \mid \mbox{voc})\):
We can use predict(model, newdata=, type="link", se=TRUE) to get predictions on the logit scale.
Then use plogis(p.hat$fit +/- 1.96*p.hat$se.fit) to transform the limits back to the probability scale.
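Putting these steps together (a sketch; the grid of voc values is illustrative):

```r
# predictions on the logit scale, with standard errors
newd <- data.frame(voc = seq(0, 100, by = 1))   # hypothetical voc grid
p.hat <- predict(mod1, newdata = newd, type = "link", se.fit = TRUE)

# back-transform the fit and 95% limits to the probability scale
fit <- plogis(p.hat$fit)
lwr <- plogis(p.hat$fit - 1.96 * p.hat$se.fit)
upr <- plogis(p.hat$fit + 1.96 * p.hat$se.fit)
```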
Model 2: observed \(\sim\) voc + year is additive on the logit scale
See: Section 16.6.3 in the book
We can use the effects or ggeffects packages to do something similar.
Instead of averaging predictions across years, we could set year to a specific value. This leads to adjusted plots.
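A sketch with ggeffects (the chosen year level is arbitrary):

```r
library(ggeffects)
# adjusted predictions across voc with year held at a specific value
plot(ggpredict(mod2, terms = "voc [all]", condition = c(year = "2006")))
```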
We will use a similar structure to the one we used for count models:
Gelman’s recommendations (see arxiv.org/pdf/0901.4011.pdf):
In class exercise: adapt the JAGS code for fitting mod1 (voc only) to allow fitting of mod2 (voc + year).
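A minimal sketch of what the mod1 JAGS code might look like (variable names and priors are illustrative assumptions, using vague normal priors rather than the Cauchy priors Gelman recommends):

```r
library(rjags)

# Bernoulli likelihood with a logit link; voc is the only predictor
mod1.jags <- "model{
  for (i in 1:n) {
    y[i] ~ dbern(p[i])
    logit(p[i]) <- beta0 + beta1 * voc[i]
  }
  beta0 ~ dnorm(0, 0.001)   # vague normal priors (precision parameterization)
  beta1 ~ dnorm(0, 0.001)
}"

jags.dat <- list(y = exp.m$observed, voc = exp.m$voc, n = nrow(exp.m))
jm <- jags.model(textConnection(mod1.jags), data = jags.dat, n.chains = 3)
fit <- coda.samples(jm, variable.names = c("beta0", "beta1"), n.iter = 5000)
```

The exercise then amounts to adding indicator variables (or an indexed intercept) for year inside the linear predictor.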