
Understand the role of random variables and common statistical distributions in formulating modern statistical regression models.


\[Y_i \sim N(\mu_i, \sigma^2)\] \[\mu_i = \beta_0+\beta_1X_i\]
Instead of errors, think about the normal distribution as a data-generating mechanism:
Replace the normal distribution as the data-generating mechanism with another probability distribution, but which one?
It depends on the characteristics of the data
Before we learn about different distributions, we need to know a bit more about how we measure probabilities!
A random variable is a mapping that takes us from random events to random numbers.
Discrete random variables can take on a finite (or countably infinite) set of possible values.
Continuous random variables take on values within some interval.
A probability mass function, \(p(x)\), assigns a probability to each value that a discrete random variable, \(X\), can take on.
Probability mass function:
| x | p(x) |
|---|------|
| 0 | 1/4  |
| 1 | 1/2  |
| 2 | 1/4  |
Note: for any probability mass function \(\sum p(x) = 1\)
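A small sketch in R (the object names are ours): encode this pmf, check that it sums to 1, and simulate draws from it.

```r
# Values and probabilities from the table above
x   <- c(0, 1, 2)
p_x <- c(1/4, 1/2, 1/4)

sum(p_x)  # a valid pmf must sum to 1

# Simulate 10,000 draws of X and compare empirical frequencies to p(x)
draws <- sample(x, size = 10000, replace = TRUE, prob = p_x)
table(draws) / 10000
```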
Many categorical response variables can take on only one of two values (dead or alive, migrated or not, etc.)
A Bernoulli random variable, \(X\), maps the two possibilities to the numbers {0, 1} with probabilities \(1-p\) and \(p\), respectively.
| x | p(x) |
|---|------|
| 0 | 1-p  |
| 1 | p    |
\[X \sim Bernoulli(p)\]
Probability mass function:
\[p(x) = P(X = x) = p^x(1-p)^{1-x}\]
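As a quick check (with a hypothetical \(p\)), this formula agrees with R's binomial pmf evaluated with a single trial:

```r
p <- 0.3          # hypothetical success probability
x <- c(0, 1)

# Evaluate the formula p^x (1 - p)^(1 - x) directly
p^x * (1 - p)^(1 - x)

# Same values from R's binomial pmf with a single trial (size = 1)
dbinom(x, size = 1, prob = p)
```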
The mean for a discrete random variable with probability function, p(x), is given by:
\[E[X] = \sum_{x} xp(x)\]
\[X \sim Bernoulli(p)\]
| x | p(x) |
|---|------|
| 0 | 1-p  |
| 1 | p    |
The variance for a discrete random variable with probability function, \(p(x)\), and mean \(E[X]\) is given by:
\[Var(X) = E[(X-E[X])^2] = \sum \limits_{x} (x-E[X])^2p(x) = E[X^2]-(E[X])^2\]
The standard deviation is \(\sigma=\sqrt{Var(X)}\)
\[X \sim Bernoulli(p)\]
| x | p(x) |
|---|------|
| 0 | 1-p  |
| 1 | p    |
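Applying these formulas to the Bernoulli distribution (a small sketch with an arbitrary \(p\)) reproduces the familiar results \(E[X] = p\) and \(Var(X) = p(1-p)\):

```r
p   <- 0.3                       # hypothetical success probability
x   <- c(0, 1)
p_x <- c(1 - p, p)

EX   <- sum(x * p_x)             # E[X] = p
VarX <- sum((x - EX)^2 * p_x)    # Var(X) = p(1 - p)
sdX  <- sqrt(VarX)

c(mean = EX, variance = VarX, sd = sdX)
```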
\[X \sim N(\mu, \sigma^2)\]
\[f(x) = \frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)\]
Probabilities are measured in terms of areas under the curve, \(f(x)\):

\[P(-1.96 < X < 1.96) = \int_{-1.96}^{1.96} f(x)dx = 0.95\]
Probability density function, \(f(x)\)
Cumulative distribution function, \(F(x) = P(X \le x)= \int_{-\infty}^{x}f(u)\,du\)
Replace sums with integrals!
Mean: \(E[X] = \mu = \int_{-\infty}^{\infty}xf(x)dx\)
Variance: \(Var(X) = \int_{-\infty}^{\infty}(x-E[X])^2f(x)dx\)
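A quick numerical check (a sketch using integrate and dnorm, with arbitrary \(\mu\) and \(\sigma\)): replacing the sums with integrals recovers the mean and variance of a normal distribution.

```r
mu <- 2; sigma <- 3   # arbitrary example values

# E[X] = integral of x f(x) dx
EX <- integrate(function(x) x * dnorm(x, mean = mu, sd = sigma),
                lower = -Inf, upper = Inf)$value

# Var(X) = integral of (x - E[X])^2 f(x) dx
VarX <- integrate(function(x) (x - EX)^2 * dnorm(x, mean = mu, sd = sigma),
                  lower = -Inf, upper = Inf)$value

c(EX, VarX)   # approximately mu and sigma^2
```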
For each probability distribution in R, there are 4 basic probability functions, whose names start with d, p, q, or r:
d is for “density”; returns the value of \(f(x)\): the probability density function (continuous distributions) or the probability mass function (discrete distributions).
p is for “probability”; returns a value of \(F(x)\), the cumulative distribution function.
q is for “quantile”; returns a value from the inverse of \(F(x)\), also known as the quantile function.
r is for “random”; generates a random value from the given distribution.
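For example, for the standard normal distribution:

```r
dnorm(0)                    # density f(x) at x = 0
pnorm(1.96) - pnorm(-1.96)  # P(-1.96 < X < 1.96), approximately 0.95
qnorm(0.975)                # quantile with 97.5% of the area below it (about 1.96)
rnorm(5, mean = 0, sd = 1)  # 5 random draws from N(0, 1)
```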
Use this graph, and R help functions if necessary, to complete Exercise 9.1 in the companion book.
\[f(x) = \frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)\]
Parameters: \(\mu\) (the mean) and \(\sigma\) (the standard deviation)
Characteristics:
\(X\) can take on any value (i.e., the range goes from \(-\infty\) to \(\infty\)) …
R normal functions: dnorm, pnorm, qnorm, rnorm.
JAGS: dnorm
Other notes:
CLT: if we sum a lot of independent things, we get (approximately) a normal distribution.
If we multiply a lot of independent things, we get a log-normal distribution, since:
\[\log(X_1X_2\cdots X_n) = \log(X_1)+\log(X_2)+\ldots+\log(X_n)\]
Possible examples in biology? Population dynamic models
Explore briefly in R:
Compare to the expressions for the log-normal mean and variance as functions of \((\mu, \sigma)\):
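A brief exploration (a sketch; the closed-form log-normal mean, \(e^{\mu+\sigma^2/2}\), and variance, \((e^{\sigma^2}-1)e^{2\mu+\sigma^2}\), are standard results used here for comparison):

```r
mu <- 0.5; sigma <- 0.8   # arbitrary values on the log scale

# A product of many independent positive multipliers is roughly log-normal
prods <- replicate(1e4, prod(runif(50, min = 0.9, max = 1.2)))
hist(log(prods))          # approximately normal on the log scale

# Sample mean and variance of log-normal draws ...
x <- rlnorm(1e5, meanlog = mu, sdlog = sigma)
c(mean(x), var(x))

# ... compared to the closed-form expressions in terms of (mu, sigma)
c(exp(mu + sigma^2 / 2), (exp(sigma^2) - 1) * exp(2 * mu + sigma^2))
```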
\[X \sim Bernoulli(p)\]
\[f(x) = P(X = x) = p^x(1-p)^{1-x}\]
A binomial random variable counts the number of “successes” (any outcome of interest) in a sequence of trials where the number of trials, \(n\), is fixed in advance, the trials are independent, and each trial has the same probability of success, \(p\).
Formally, a binomial random variable arises from a sum of independent Bernoulli random variables, each with parameter, \(p\):
\[Y = X_1+X_2+\ldots+X_n\]
For a binomial random variable with n trials and probability of success p on each trial, the probability of exactly k successes in the n trials is:
\(P(X = k) ={n \choose k}p^k(1-p)^{n-k}\)
\({n \choose k} = \frac{n!}{k!(n-k)!}\), with \(n! = n(n-1)(n-2) \cdots (2)(1)\)
R binomial functions (dbinom, pbinom, qbinom, rbinom) use size = \(n\) and prob = \(p\).
Example: Raymond Felton’s free throw percentage during the 2004-2005 season at North Carolina was 70%. If we assume successive attempts are independent, what is the probability that he would hit at least 4 out of 6 free throws in the 2005 Championship Game (he hit 5)?

\(P(X \ge 4) = P(X=4) + P(X=5) + P(X=6)\)
\(= {6 \choose 4}0.7^{4}0.3^2 + {6 \choose 5}0.7^{5}0.3^1 + 0.7^{6}\)
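Checking this in R:

```r
# P(X >= 4) for X ~ Binomial(n = 6, p = 0.7), approximately 0.74
sum(dbinom(4:6, size = 6, prob = 0.7))

# Equivalently, via the cumulative distribution function
1 - pbinom(3, size = 6, prob = 0.7)
pbinom(3, size = 6, prob = 0.7, lower.tail = FALSE)
```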
\[X \sim Multinomial(n, p_1, p_2, \ldots, p_k)\]
\(X = (x_1, x_2, \ldots, x_k)\) is a multivariate random variable recording the number of events in each category
If \((n_1, n_2, \ldots, n_k)\) is the observed number of events in each category, then:
\(P((x_1, x_2, \ldots, x_k) = (n_1, n_2, \ldots, n_k)) = \frac{n!}{n_1!n_2! \cdots n_k!}p_1^{n_1}p_2^{n_2}\cdots p_k^{n_k}\)
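R’s dmultinom evaluates this pmf; a small sketch with made-up counts and probabilities:

```r
# Probability of observing counts (2, 3, 5) in n = 10 events across
# three categories with probabilities (0.2, 0.3, 0.5)
dmultinom(x = c(2, 3, 5), prob = c(0.2, 0.3, 0.5))

# Same value from the formula above
factorial(10) / (factorial(2) * factorial(3) * factorial(5)) *
  0.2^2 * 0.3^3 * 0.5^5
```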
Let \(N_t\) = the number of events occurring in a time interval of length \(t\), assuming events occur independently at a constant rate \(\lambda\). What is the probability of observing \(k\) events in this interval?
\[P(N_t = k) = \frac{\exp(-\lambda t)(\lambda t)^k}{k!}\]
For events in 2-D space that occur at a constant rate, the probability of observing \(k\) events in an area of size \(A\) is:
\[P(N_A = k) = \frac{\exp(-\lambda A)(\lambda A)^k}{k!}\]
If \(A\) or \(t\) is constant, we can absorb it into \(\lambda\):
\[P(N = k) = \frac{\exp(-\lambda )(\lambda)^k}{k!}\]
R Poisson functions (dpois, ppois, qpois, rpois) use lambda = \(\lambda\).
Examples:
Suppose a certain region of California experiences about 5 earthquakes a year. Assume occurrences follow a Poisson distribution. What is the probability of 3 earthquakes in a given year?
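In R:

```r
# P(N = 3) when the rate is lambda = 5 earthquakes per year
dpois(3, lambda = 5)

# Same value from the formula
exp(-5) * 5^3 / factorial(3)
```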
A geometric random variable counts the number of failures before you get your first success.
\[f(x) = P(X = x) = (1-p)^xp\]
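This matches R’s geometric functions (dgeom, pgeom, qgeom, rgeom), which also count failures before the first success:

```r
p <- 0.25    # hypothetical success probability
x <- 0:4     # numbers of failures before the first success

# Formula (1 - p)^x * p versus R's built-in geometric pmf
(1 - p)^x * p
dgeom(x, prob = p)
```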
\(X_r\) = number of failures, \(x\), before you get \(r\) successes; \(X_r \sim\) NegBinom(\(r, p\))
\(P(X = x) = {x+r-1 \choose x}p^{r-1}(1-p)^xp\)
or
\(P(X = x) = {x+r-1 \choose x}p^{r}(1-p)^x\)
Express \(p\) in terms of the mean, \(\mu\), and \(r\):
\[\mu = \frac{r(1-p)}{p} \Rightarrow p = \frac{r}{\mu+r} \text{ and }\]
\[1-p = \frac{\mu}{\mu+r}\]
Plugging these values into \(f(x)\) and changing \(r\) to \(\theta\), we get:
\(P(X = x) = {x+\theta-1 \choose x}\left(\frac{\theta}{\mu+\theta}\right)^{\theta}\left(\frac{\mu}{\mu+\theta}\right)^x\)
Then, let \(\theta\) (the dispersion parameter) take on any positive value (not just integers, as in the original parameterization):
\(P(X = x) = {x+\theta-1 \choose x}\left(\frac{\theta}{\mu+\theta}\right)^{\theta}\left(\frac{\mu}{\mu+\theta}\right)^x = \frac{(x+\theta-1)!}{x!(\theta-1)!}\left(\frac{\theta}{\mu+\theta}\right)^{\theta}\left(\frac{\mu}{\mu+\theta}\right)^x\)
R negative binomial functions (dnbinom, pnbinom, qnbinom, rnbinom) use (prob = \(p\), size = \(r\)) or (mu = \(\mu\), size = \(\theta\)).
Overdispersed relative to the Poisson: \(Var(X)/E[X] = 1 + \frac{\mu}{\theta}\), versus 1 for the Poisson.
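A quick numerical check (a sketch with arbitrary \(\mu\) and \(\theta\)) that the pmf above matches R’s built-in (mu, size) parameterization:

```r
mu <- 4; theta <- 1.5   # arbitrary example values
x  <- 0:5

# The pmf above (choose() handles non-integer theta)
choose(x + theta - 1, x) * (theta / (mu + theta))^theta * (mu / (mu + theta))^x

# R's negative binomial pmf, parameterized by mu and size (= theta)
dnbinom(x, mu = mu, size = theta)
```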
Poisson is a limiting case (when \(\theta \rightarrow \infty\))
Its appeal for use as a data-generating mechanism in ecology includes the following:
If: \(X_i \sim\) Poisson(\(\lambda_i\)), with \(\lambda_i \sim\) Gamma(\(\alpha,\beta\)), then \(X_i\) has a negative binomial distribution.
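A brief simulation sketch of this gamma-Poisson mixture (arbitrary \(\alpha\) and \(\beta\)); the resulting counts should match a negative binomial with size = \(\alpha\) and mu = \(\alpha/\beta\):

```r
alpha <- 2; beta <- 0.5   # arbitrary gamma parameters (shape and rate)

# Step 1: rates vary among observations according to a gamma distribution
lambda <- rgamma(1e5, shape = alpha, rate = beta)

# Step 2: counts are Poisson given each rate
x <- rpois(1e5, lambda)

# Compare empirical frequencies to the negative binomial pmf
# with size = alpha and mu = alpha / beta
emp <- as.numeric(table(factor(x, levels = 0:5))) / 1e5
nb  <- dnbinom(0:5, size = alpha, mu = alpha / beta)
round(rbind(simulated = emp, nbinom = nb), 3)
```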
If observations are equally likely within an interval \((a, b)\):
\[f(x) = \frac{1}{b-a}, \quad a \le x \le b\]
Gamma:
\[f(x) = \frac{1}{\Gamma(\alpha)}x^{\alpha-1}\beta^\alpha\exp(-\beta x)\]
Beta:
\[f(x) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}x^{\alpha-1}(1-x)^{\beta-1}\]
Exponential:
\[f(x) = \lambda \exp(-\lambda x)\]
How do we choose an appropriate distribution for our data?
For a diagram showing links between distributions, see:
Diagram of distribution relationships
If you want to visualize different statistical distributions, check out this link.
Note that some can be written in multiple ways:
For example, the gamma distribution can be written with a rate parameter, \(\beta\):
\[f(x) = \frac{1}{\Gamma(\alpha)}x^{\alpha-1}\beta^\alpha \exp(-\beta x)\]
or with a scale parameter, \(\beta = 1/\text{rate}\):
\[f(x) = \frac{1}{\Gamma(\alpha)\beta^\alpha}x^{\alpha-1}\exp(-x/\beta)\]
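In R, dgamma accepts either a rate or a scale argument; a quick check that the two parameterizations agree when scale = 1/rate:

```r
x <- 2; alpha <- 3; beta <- 1.5   # beta used as a rate here

dgamma(x, shape = alpha, rate  = beta)
dgamma(x, shape = alpha, scale = 1 / beta)   # same density value
```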