males <- c(120, 107, 110, 116, 114, 111, 113, 117, 114, 112)
females <- c(110, 111, 107, 108, 110, 105, 107, 106, 111, 111)
jawdat <- data.frame(jaws = c(males, females),
                     sex = c(rep("M", 10), rep("F", 10)))

We covered 2 main ways to code for categorical variables:
Reference coding (the book also calls this effects coding, following Marc Kéry’s book, Introduction to WinBUGS for Ecologists)
Means coding
Several students used a third type of coding, sum-to-zero coding (contr.sum in R), which is sometimes ALSO called effects coding (not to be confused with the reference/effects coding in the book).
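The three codings differ only in the design matrix that R builds from the factor. A minimal sketch, using a hypothetical four-observation toy factor (not the jaw data) and the illustrative names X_ref, X_means and X_sum, shows the columns each coding produces:

```r
# Toy factor with two levels; "F" sorts first, so it is the reference level.
sex <- factor(c("F", "F", "M", "M"))

# Reference coding: intercept column plus a 0/1 indicator for males.
X_ref <- model.matrix(~ sex)

# Means coding: one 0/1 indicator column per group, no intercept.
X_means <- model.matrix(~ sex - 1)

# Sum-to-zero coding: intercept plus a column that is +1 for F and -1 for M.
X_sum <- model.matrix(~ sex, contrasts.arg = list(sex = contr.sum))

X_ref
X_means
X_sum
```

The fitted values are the same under all three codings; only the meaning of the coefficients changes.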
Call:
lm(formula = jaws ~ sex, data = jawdat)

Residuals:
   Min     1Q Median     3Q    Max
  -6.4   -1.8    0.1    2.4    6.6

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 108.6000     0.9741 111.486  < 2e-16 ***
sexM          4.8000     1.3776   3.484  0.00265 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.08 on 18 degrees of freedom
Multiple R-squared: 0.4028, Adjusted R-squared: 0.3696
F-statistic: 12.14 on 1 and 18 DF, p-value: 0.002647
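To see what the reference-coding coefficients mean, it can help to compare them with the raw group means. The sketch below refits the model; the names fit_ref and group_means are illustrative, and sex is stored explicitly as a factor, which is a choice made here rather than something in the original code.

```r
males <- c(120, 107, 110, 116, 114, 111, 113, 117, 114, 112)
females <- c(110, 111, 107, 108, 110, 105, 107, 106, 111, 111)
jawdat <- data.frame(jaws = c(males, females),
                     sex = factor(c(rep("M", 10), rep("F", 10))))

fit_ref <- lm(jaws ~ sex, data = jawdat)
group_means <- tapply(jawdat$jaws, jawdat$sex, mean)

# The intercept is the mean of the reference group (F, first alphabetically).
c(intercept = coef(fit_ref)[["(Intercept)"]],
  mean_F = group_means[["F"]])

# sexM is the male mean minus the female mean.
c(sexM = coef(fit_ref)[["sexM"]],
  difference = group_means[["M"]] - group_means[["F"]])
```

The t-test on sexM therefore tests whether the two group means differ.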
Call:
lm(formula = jaws ~ sex - 1, data = jawdat)

Residuals:
   Min     1Q Median     3Q    Max
  -6.4   -1.8    0.1    2.4    6.6

Coefficients:
     Estimate Std. Error t value Pr(>|t|)
sexF 108.6000     0.9741   111.5   <2e-16 ***
sexM 113.4000     0.9741   116.4   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.08 on 18 degrees of freedom
Multiple R-squared: 0.9993, Adjusted R-squared: 0.9992
F-statistic: 1.299e+04 on 2 and 18 DF, p-value: < 2.2e-16
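With means coding, each coefficient is a group mean, and each t-test asks whether that mean differs from zero. The much larger R-squared does not indicate a better fit: when the intercept is removed, summary() measures R-squared against zero rather than against the overall mean. A short sketch of both points (fit_means, rss and r2_manual are illustrative names):

```r
males <- c(120, 107, 110, 116, 114, 111, 113, 117, 114, 112)
females <- c(110, 111, 107, 108, 110, 105, 107, 106, 111, 111)
jawdat <- data.frame(jaws = c(males, females),
                     sex = factor(c(rep("M", 10), rep("F", 10))))

fit_means <- lm(jaws ~ sex - 1, data = jawdat)

# Each coefficient is the sample mean of one group.
coef(fit_means)
tapply(jawdat$jaws, jawdat$sex, mean)

# Without an intercept, R-squared uses the uncentered total sum of squares.
rss <- sum(resid(fit_means)^2)
r2_manual <- 1 - rss / sum(jawdat$jaws^2)
c(manual = r2_manual, reported = summary(fit_means)$r.squared)
```

The residuals and the residual standard error are identical to those from the reference-coded fit, so the two R-squared values should not be compared.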
Call:
lm(formula = jaws ~ sex, data = jawdat, contrasts = list(sex = contr.sum))

Residuals:
   Min     1Q Median     3Q    Max
  -6.4   -1.8    0.1    2.4    6.6

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 111.0000     0.6888 161.150  < 2e-16 ***
sex1         -2.4000     0.6888  -3.484  0.00265 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.08 on 18 degrees of freedom
Multiple R-squared: 0.4028, Adjusted R-squared: 0.3696
F-statistic: 12.14 on 1 and 18 DF, p-value: 0.002647
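Under sum-to-zero coding the intercept is the average of the two group means, and sex1 is the deviation of the first level (F) from that average. Each group mean is then a linear combination of the coefficients, so its standard error can be obtained from the coefficient covariance matrix. A sketch, with illustrative names fit_sum, b, x and se_M:

```r
males <- c(120, 107, 110, 116, 114, 111, 113, 117, 114, 112)
females <- c(110, 111, 107, 108, 110, 105, 107, 106, 111, 111)
jawdat <- data.frame(jaws = c(males, females),
                     sex = factor(c(rep("M", 10), rep("F", 10))))

fit_sum <- lm(jaws ~ sex, data = jawdat,
              contrasts = list(sex = contr.sum))
b <- coef(fit_sum)

# Group means implied by the coefficients (F is coded +1, M is coded -1).
mean_F <- b[["(Intercept)"]] + b[["sex1"]]
mean_M <- b[["(Intercept)"]] - b[["sex1"]]

# Standard error of the male mean, written as the linear combination x %*% b.
x <- c(1, -1)
se_M <- sqrt(drop(t(x) %*% vcov(fit_sum) %*% x))

c(mean_F = mean_F, mean_M = mean_M, se_M = se_M)
```

These values should reproduce the estimates 108.6 and 113.4 and the standard error 0.9741 reported by the means-coded fit.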
The methods/tools from the last class (getting confidence and prediction intervals for GLS models) are also useful here (and elsewhere)!
Estimate species richness (when NAP = 0) in week 4: \(\hat{\beta}_0 - \hat{\beta}_{weekf1} - \hat{\beta}_{weekf2} - \hat{\beta}_{weekf3}\). Let X = [1 0 -1 -1 -1]. Then:
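A sketch of the calculation that follows from this choice of X. The symbols \(\hat{\theta}\) (the week 4 estimate) and \(\hat{\Sigma}\) (the estimated variance-covariance matrix of \(\hat{\beta}\), as returned by vcov() in R) are notation assumed here, as is the coefficient order (Intercept, NAP, weekf1, weekf2, weekf3) implied by X:

\[\hat{\theta} = X\hat{\beta} = \hat{\beta}_0 - \hat{\beta}_{weekf1} - \hat{\beta}_{weekf2} - \hat{\beta}_{weekf3}\]
\[\widehat{SE}(\hat{\theta}) = \sqrt{X \hat{\Sigma} X^T}\]

An approximate 95% confidence interval is then \(\hat{\theta} \pm 1.96\,\widehat{SE}(\hat{\theta})\), using a large-sample normal approximation.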
In general, when writing down an equation to describe a model it is important to:
\[Y_i = \beta_0 + \beta_1 \text{flipperln}_i + \beta_2 I(\text{sex} = \text{male})_i + \epsilon_i\]
\[\epsilon_i \sim N(0, \sigma^2)\]
or, equivalently:
\[Y_i \sim N(\mu_i, \sigma^2)\]
\[\mu_i = \beta_0 + \beta_1 \text{flipperln}_i + \beta_2 I(\text{sex} = \text{male})_i\]
If I ask for equations to give predictions, that is a little different. In this case, it is reasonable to:
Add hats (^) to the \(\beta\)’s, and drop the \(\epsilon_i\).
\(\widehat{\text{Body mass}}_i = \hat{\beta}_0 + \hat{\beta}_1 \text{flipperln}_i + \hat{\beta}_2\) (if the prediction is for a male)
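By the same logic, and assuming sex takes only the two values male and female with female as the reference level (an assumption, though it is what the indicator \(I(sex = male)\) suggests), the indicator is 0 for a female and the \(\hat{\beta}_2\) term drops out:

\[\widehat{\text{Body mass}}_i = \hat{\beta}_0 + \hat{\beta}_1 \text{flipperln}_i\]

The two prediction lines are therefore parallel, separated vertically by \(\hat{\beta}_2\).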