Gain an appreciation for challenges associated with selecting among competing models and performing multi-model inference.
Understand common approaches used to select a model (e.g., stepwise selection using p-values, AIC, Adjusted \(R^2\)).
Understand the implications of model selection for statistical inference.
Gain exposure to alternatives to traditional model selection, including full model inference (df spending), model averaging, and penalized likelihood/regularization techniques.
Be able to evaluate model performance using cross-validation and model stability using the bootstrap.
Be able to choose an appropriate modeling strategy, depending on the goal of the analysis (describe, predict, or infer).
Linear Models Mid-term
Goals were largely to demonstrate:
Strategy I took:
Figure 1: My mid-term exam from my Linear Models class.
Nested = you can get from one model to the other by setting one or more parameters equal to 0.
We can compare nested models using…
And, nested or non-nested models using…
Problem 1: different (arbitrary) criteria often point to different models as “best.”
Problem 2: the importance of a variable may depend on what else is in the model!
If we have several potential predictors, how can we decide on a best model?
One method: Backward Elimination
The stepAIC function in the MASS library will do this for us (a short sketch follows this list).
Can also do “forward selection” (or fit all possible models: “all subsets” selection).
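A minimal sketch of backward elimination with stepAIC() is below. The data, and the variable names x1–x4, are simulated here purely for illustration; this is not an analysis from the notes.

```r
## Backward elimination with stepAIC() from MASS, on simulated data.
library(MASS)

set.seed(1)
n <- 100
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
dat$y <- 1 + 2 * dat$x1 + 0.5 * dat$x2 + rnorm(n)   # x3 and x4 are pure noise

full <- lm(y ~ x1 + x2 + x3 + x4, data = dat)

## direction = "backward": start from the full model and repeatedly drop the
## term whose removal most reduces AIC, stopping when no removal lowers AIC
backward <- stepAIC(full, direction = "backward", trace = FALSE)
summary(backward)

## Forward selection instead starts from the intercept-only model and adds
## terms from the supplied scope
forward <- stepAIC(lm(y ~ 1, data = dat),
                   scope = ~ x1 + x2 + x3 + x4,
                   direction = "forward", trace = FALSE)
```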
Heinze et al. (2017) suggest using “augmented backwards elimination”
Eliminate variables using significance testing, but only drop a variable if doing so does not lead to large changes in the other regression coefficients.
Available in abe package (I have not used, but am intrigued…)
Problem 4: p-values, confidence intervals, etc. assume the model was pre-specified. Measures of fit computed after model selection will be overly optimistic.
Problem 5: no guarantee that the ‘best-fit’ model is actually the most appropriate model for answering your question.
Harrell 2001, Regression Modeling Strategies:
Copas and Long: “the choice of the variables to be included depends on estimated regression coefficients rather than their true values, and so \(X_j\) is more likely to be included if its regression coefficient is over-estimated than if its regression coefficient is underestimated.”
Stepwise methods often select noise variables rather than ones that are truly important.
Problems get worse as you consider more candidate predictors and as these predictors become more highly correlated.
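A quick simulation (not from the original notes) illustrates the first point: even when the response is unrelated to every candidate predictor, backward elimination by AIC typically retains several of them.

```r
## Simulated illustration: y is pure noise, yet stepAIC() keeps some predictors.
library(MASS)

set.seed(2)
n <- 100; p <- 20
X <- as.data.frame(matrix(rnorm(n * p), n, p))
names(X) <- paste0("x", 1:p)
X$y <- rnorm(n)                      # y is unrelated to every predictor

selected <- stepAIC(lm(y ~ ., data = X), direction = "backward", trace = FALSE)

## Noise variables retained in the "best" model
attr(terms(selected), "term.labels")
```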
Why do variable selection?
Often sensible to determine how many ‘degrees of freedom’ you can spend, spend them, and then don’t look back.
Increases the chance that the model will fit future data nearly as well as the current data set (versus overfitting current data)
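One way to operationalize df spending, sketched below with simulated data and an assumed 10-20 observations-per-parameter rule of thumb: decide up front which terms (including any nonlinearities) the sample size can support, fit that full model once, and base all inference on it.

```r
## "df spending": pre-specify the model, fit it once, and do not revisit it.
set.seed(3)
n <- 150
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- 1 + dat$x1 - 0.5 * dat$x1^2 + 0.8 * dat$x2 + rnorm(n)

## With n = 150, roughly 10-20 observations per parameter supports the
## ~4 regression df spent here: a quadratic in x1 plus linear x2 and x3
fit <- lm(y ~ poly(x1, 2) + x2 + x3, data = dat)

## Because the model was fixed in advance, these intervals keep their
## nominal interpretation (no post-selection optimism)
confint(fit)
```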
How likely are you to end up with the same model if you collect another data set of the same size and apply the same model selection algorithm?
can report bootstrap inclusion frequencies (how often variables are selected in final models)
may want to also report “competitive models” (others that are frequently chosen)
don’t trust a single model unless it is almost always chosen
Can also use the bootstrap to get a more honest measure of fit.
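A sketch of the stability check, again using simulated data: refit the entire selection procedure on bootstrap resamples and tabulate how often each candidate variable appears in the selected model.

```r
## Bootstrap inclusion frequencies for backward elimination by AIC.
library(MASS)

set.seed(4)
n <- 100
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
dat$y <- 1 + 2 * dat$x1 + 0.5 * dat$x2 + rnorm(n)

B <- 200
vars <- c("x1", "x2", "x3", "x4")
keep <- matrix(FALSE, nrow = B, ncol = length(vars),
               dimnames = list(NULL, vars))

for (b in 1:B) {
  boot_dat <- dat[sample(nrow(dat), replace = TRUE), ]
  fit_b <- stepAIC(lm(y ~ x1 + x2 + x3 + x4, data = boot_dat),
                   direction = "backward", trace = FALSE)
  keep[b, ] <- vars %in% attr(terms(fit_b), "term.labels")
}

## Proportion of resamples in which each variable was selected
colMeans(keep)
```

Variables with inclusion frequencies near 1 are the only ones a single selected model can be trusted to contain; the same resamples can also be used to estimate how optimistic the apparent fit is.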
Rather than choose a best model, another approach is to average predictions among “competitive” models or models with roughly equal “support”.
Steps:
\[w_i= \frac{\exp(-\Delta AIC_i/2)}{\sum_{k=1}^K\exp(-\Delta AIC_k/2)}\]
where \(\Delta AIC_i\) = \(AIC_i - \min_k(AIC_k)\) (difference in AIC between model \(i\) and the “best” model)
Use \(AIC\) weights to calculate a weighted average prediction:
\[\hat{\theta}_{avg}=\sum_{k=1}^K w_k\hat{\theta}_k\]
Calculate a standard error that accounts for model uncertainty and sampling uncertainty:
\[\widehat{SE}_{avg} = \sum_{k=1}^K w_k\sqrt{SE^2(\hat{\theta}_k)+ (\hat{\theta}_k-\hat{\theta}_{avg})^2}\]
Typically, 95% CIs are formed using \(\hat{\theta}_{avg} \pm 1.96\widehat{SE}_{avg}\), assuming that \(\hat{\theta}_{avg}\) is normally distributed.
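A minimal base-R sketch of these steps, using three candidate models fit to simulated data and averaging the predicted mean at a new covariate value (dedicated packages such as AICcmodavg and MuMIn implement the same calculations).

```r
## AIC weights and a model-averaged prediction with an unconditional SE.
set.seed(5)
n <- 100
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 1 + 2 * dat$x1 + rnorm(n)

mods <- list(m1 = lm(y ~ x1, data = dat),
             m2 = lm(y ~ x2, data = dat),
             m3 = lm(y ~ x1 + x2, data = dat))

## Step 1: AIC weights
aics  <- sapply(mods, AIC)
delta <- aics - min(aics)
w     <- exp(-delta / 2) / sum(exp(-delta / 2))

## Step 2: per-model predictions (and SEs) at a new covariate value
newd  <- data.frame(x1 = 1, x2 = 0)
preds <- lapply(mods, predict, newdata = newd, se.fit = TRUE)
theta <- sapply(preds, `[[`, "fit")
se    <- sapply(preds, `[[`, "se.fit")

## Step 3: weighted-average prediction and an SE that also reflects
## model uncertainty
theta_avg <- sum(w * theta)
se_avg    <- sum(w * sqrt(se^2 + (theta - theta_avg)^2))

theta_avg + c(-1.96, 1.96) * se_avg   # approximate 95% CI
```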
Burnham, Kenneth P., and David R. Anderson. Model selection and multimodel inference: a practical information-theoretic approach. Springer, 2002.
For a counter-point, see (optional reading on Canvas):
Cade, B. S. (2015). Model averaging and muddled multimodel inferences. Ecology, 96(9), 2370-2382.