In this lecture we introduce the main principles of working with count data, and how to model such data using generalized linear models (GLMs). We focus on introducing the concept of generalized linear models and how to interpret their results. We touch briefly upon statistical inference, providing the main results rather than the theory behind them, such that they can be applied to genomics data analysis.
set.seed(11)
y1 <- rpois(n=500, lambda=10)
y2 <- rpois(n=500, lambda=100)
par(mfrow = c(1,2))
hist(y1, main="Poisson(10)", breaks=40)
hist(y2, main="Poisson(100)", breaks=40)
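Since a Poisson variable's variance equals its mean, a quick empirical check on the simulated counts:

c(mean = mean(y1), var = var(y1)) # both close to 10
c(mean = mean(y2), var = var(y2)) # both close to 100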
Take a minute to consider the following question: for two genes whose expression we compare between two groups, gene \(g\) with low counts (means around 10 versus 2) and gene \(k\) with high counts (means around 100 versus 20), which fold-change estimate do you expect to be more precise?

Let’s approximate the uncertainty in \(\beta_g\) and \(\beta_k\) using simulation:
N <- 1e3
beta_g <- beta_k <- vector(length=N)
for(ii in 1:N){
  # counts for gene g in group 1 (lambda=10) and group 0 (lambda=2)
  ygr1 <- rpois(n=3, lambda=10)
  ygr0 <- rpois(n=3, lambda=2)
  # counts for gene k in group 1 (lambda=100) and group 0 (lambda=20)
  ykr1 <- rpois(n=3, lambda=100)
  ykr0 <- rpois(n=3, lambda=20)
  # fold-change estimates for each gene
  beta_g[ii] <- mean(ygr1) / mean(ygr0)
  beta_k[ii] <- mean(ykr1) / mean(ykr0)
}
par(mfrow=c(1,2), mar=c(4,2,3,1))
hist(beta_g, breaks=seq(0,50,by=1), xlim=c(0,50))
hist(beta_k, breaks=seq(0,50,by=1), xlim=c(0,50))
We clearly see that the uncertainty on \(\beta_k\) is much lower than on \(\beta_g\), even though the variance on the counts of gene \(k\) is higher (its mean is higher, and a Poisson variable’s variance equals its mean). How do we explain this?

The explanation lies in the relative uncertainty: for a Poisson variable with mean \(\lambda\), the coefficient of variation is \(\sqrt{\lambda}/\lambda = 1/\sqrt{\lambda}\), which decreases as the mean increases:

1 / sqrt(100) # relative uncertainty at lambda = 100 (gene k)
## [1] 0.1
1 / sqrt(10) # relative uncertainty at lambda = 10 (gene g)
## [1] 0.3162278
Just like we modeled protein abundances in the proteomics module of this course to assess differential protein abundance, we can model gene expression counts to identify genes with differences in average expression between groups of samples.
\[ \left\{ \begin{array}{ccc} Y_i & = & \beta_0 + \beta_1 X_i + \epsilon_i \\ Y_i | X_i & \sim & N(\beta_0 + \beta_1 X_i, \sigma^2). \end{array} \right. \]
For linear models, maximum likelihood estimation yields the same estimator for \(\beta\) as the least squares estimation we derived in our recap lecture. We can define a linear model as
\[ Y_i \sim N(\mu_i, \sigma^2) \\ \mu_i = \mathbf{X}_i \mathbf{\beta} \]
The likelihood function of the data is the product of the likelihoods of each datum. Since we are assuming a Gaussian distribution, we use the Gaussian probability density function:
\[ L(\mathbf{Y}; \beta, \sigma) = \prod_{i=1}^n \frac{1}{\sqrt{2 \pi \sigma^2}} \exp \left\{ - \frac{(Y_i - \mathbf{X}_i \beta)^2}{2 \sigma^2} \right\} \]
Log-likelihood function
\[ \ell(\mathbf{Y}; \beta, \sigma) = \sum_{i=1}^n \left\{ -\frac{1}{2} \log(2\pi \sigma^2) - \frac{1}{2\sigma^2} (Y_i - \mathbf{X}_i \beta)^2 \right\} \]
The score function is the derivative of the log-likelihood with respect to \(\beta\):
\[ S(\mathbf{\beta}) = \frac{\partial \ell(\mathbf{Y}; \beta, \sigma)}{\partial \beta} = \sum_{i=1}^n \frac{1}{\sigma^2} \mathbf{X}_i^T (Y_i - \mathbf{X}_i \beta) \]
Set to zero and solve
\[ \mathbf{X}^T\mathbf{Y} - \mathbf{X}^T \mathbf{X} \mathbf{\beta} = \mathbf{0} \\ \rightarrow \widehat{\mathbf{\beta}} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T\mathbf{Y} \]
which gives us exactly the same estimator as we had derived using least squares!
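As a quick numerical check on simulated data, the closed-form estimator matches what lm returns:

set.seed(7)
n <- 100
X <- cbind(1, rnorm(n))            # design matrix: intercept + one covariate
Y <- as.vector(X %*% c(2, 0.5) + rnorm(n))
solve(t(X) %*% X) %*% t(X) %*% Y   # (X^T X)^{-1} X^T Y
coef(lm(Y ~ X[, 2]))               # least squares via lm: same values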
Now that we know how to use maximum likelihood for parameter estimation, we can also apply it to estimate the parameters of a generalized linear model. Let’s try it for the Poisson GLM we have just introduced:
\[ \left\{ \begin{array}{ccc} Y_i & \sim & Poi(\mu_i) \\ \log \mu_i & = & \eta_i \\ \eta_i & = & \mathbf{X}^T_i \beta \\ \end{array} \right. \]
Likelihood function of the Poisson distribution
\[ L(\mathbf{Y} ; \mathbf{\mu}) = \prod_{i=1}^n \frac{e^{-\mu_i} \mu_i^{Y_i}}{Y_i!} \]
Log-likelihood function
\[ \ell(\mathbf{Y} ; \mathbf{\mu}) = \sum_{i=1}^n \left\{ - \mu_i + Y_i \log(\mu_i) - \log (Y_i!) \right\} \]
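As a sanity check on a small simulated example (taking a common mean \(\mu_i = 4\) for all observations), this formula agrees with R’s built-in Poisson density:

set.seed(3)
y <- rpois(5, lambda = 4)
mu <- 4
sum(dpois(y, lambda = mu, log = TRUE)) # built-in log-likelihood
sum(-mu + y * log(mu) - lfactorial(y)) # formula above; identical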
Note that the score function is the derivative of the log-likelihood with respect to our parameter of interest, \(\beta\). So let’s first rewrite our log-likelihood as a function of our parameter of interest. We know from the model that \(\mu_i = \exp(\mathbf{X}_i \mathbf{\beta})\).
\[ \ell(Y_i ; \beta) = \sum_{i=1}^n - \exp(\mathbf{X}_i \mathbf{\beta}) + Y_i (\mathbf{X}_i \mathbf{\beta}) - \log (Y_i!) \]
The score function then equals
\[ S(\mathbf{\beta}) = \frac{\partial \ell(\mathbf{Y}; \beta)}{\partial \beta} = \sum_{i=1}^n -\mathbf{X}_i^T \exp(\mathbf{X}_i \mathbf{\beta}) + \mathbf{X}_i^TY_i = -\mathbf{X}^T \exp(\mathbf{X} \mathbf{\beta}) + \mathbf{X}^T \mathbf{Y} \]
Set to zero and solve
\[ \mathbf{X}^T \mathbf{Y} = \mathbf{X}^T \exp(\mathbf{X} \mathbf{\beta}) \]
However, since this is a non-linear equation in \(\beta\), we cannot find a closed-form solution! You may see this more clearly when writing it out in non-matrix form, where the equation for the \(p\)-th coefficient reads
\[ \sum_{i=1}^n x_{ip} \exp \left( \sum_{q=1}^P x_{iq} \beta_q \right) = \sum_{i=1}^n x_{ip} Y_i. \]
We therefore resort to iterative optimization, such as the Newton-Raphson algorithm.
Figure: Finding the root of the score function using Newton-Raphson optimization. The figure shows estimation of a single \(\beta\) parameter. The black solid line is the score function evaluated at \(\beta\). The initial estimate of \(\beta\) is 2.25, represented by the dotted line; the value of the score function at this initial value is \(S(\beta^k)\). The first derivative of the score function, evaluated at \(\beta = 2.25\), is represented by the solid blue line and is given by \(\frac{\partial S(\beta)}{\partial \beta}\). The value of \(\beta\) where the solid blue line crosses zero is the new estimate, \(\beta^{k+1}\), which has a value of 1.4. The update step is \(\beta^{k+1} = \beta^k - \left\{ \frac{\partial S(\beta)}{\partial \beta} \right\}^{-1} S(\beta^k)\). This procedure is iterated until the \(\beta\) estimate converges.
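To make this concrete, below is a minimal Newton-Raphson sketch for a Poisson GLM on simulated data (the variable names and convergence tolerance are our own choices); the estimates should match those returned by glm:

set.seed(13)
n <- 200
x <- rnorm(n)
X <- cbind(1, x)                     # design matrix with intercept
y <- rpois(n, lambda = exp(X %*% c(1, 0.5)))
beta <- c(0, 0)                      # initial estimate
for (k in 1:25) {
  mu <- as.vector(exp(X %*% beta))   # current fitted means
  score <- t(X) %*% (y - mu)         # S(beta)
  info <- t(X) %*% (X * mu)          # minus the derivative of S(beta)
  step <- as.vector(solve(info, score))
  beta <- beta + step
  if (max(abs(step)) < 1e-8) break   # stop at convergence
}
beta
coef(glm(y ~ x, family = "poisson")) # should agree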
In order to get familiar with GLMs, we will fit a Poisson GLM in R, using the Bikeshare dataset from the ISLR2 package. This dataset records how many bikes were being used from a bike-sharing service, every hour of the day over a full year (365 days).
Full information on the dataset is provided here. Variables of interest for us are:

bikers: Discrete count variable; the number of bikes being used that hour.

hum: Continuous variable ranging between 0 and 1; normalized humidity.

hr: Categorical variable between 0 and 23; the hour of the day. One could also consider this variable to be numeric and model it as such, but the data exploration will show that’s not appropriate.

weathersit: Categorical variable; the weather condition of that hour, with levels clear, cloudy/misty, light rain/snow, and heavy rain/snow.
# if ISLR2 isn't installed, install it:
if(!"ISLR2" %in% installed.packages()[,1]){
install.packages("ISLR2")
}
library(ISLR2) # load the package providing the Bikeshare dataset
plot(bikers ~ hum, data=Bikeshare, pch=16, cex=1/2,
xlab = "Humidity", ylab = "Bikers")
loHum <- loess(bikers ~ hum, data=Bikeshare)
xGrid <- seq(0, 1, length=50)
yhat <- predict(loHum, data.frame(hum = xGrid))
lines(x=xGrid, y=yhat, col="red", lwd=3)
plot(log1p(bikers) ~ hum, data=Bikeshare, pch=16, cex=1/2,
xlab = "Humidity", ylab = "Log (bikers +1)")
loHum <- loess(log1p(bikers) ~ hum, data=Bikeshare)
xGrid <- seq(0, 1, length=50)
yhat <- predict(loHum, data.frame(hum = xGrid))
lines(x=xGrid, y=yhat, col="red", lwd=3)
# association with hour on count and log scale
barplot(table(Bikeshare$hr), xlab="Hour of day", ylab="Number of observations")
plot(log1p(bikers) ~ hr, data=Bikeshare, pch=16, cex=1/2,
xlab = "Hour of day", ylab = "Log (bikers +1)")
The data exploration shows that the association between the hour of the day and bike usage is strongly non-linear. We will therefore add hr as a categorical variable to the model, estimating one parameter for each hour. Note that alternative strategies are possible that may be more efficient, such as incorporating hr as a numerical variable and modeling the non-linearity using a lower number of parameters.

Further refinements are possible for the Bikeshare dataset. For example, it seems likely that more people commute by bike in good weather, while fewer people will commute by bike in terrible weather. This would motivate an interaction between the variables weathersit and hr.

We fit the model using the glm function. The number of bikers is used as a response variable, which is modeled as a function of weathersit, humc and hr. We first center the humidity variable, creating humc; this means that humc=0 corresponds to the average humidity in the dataset. Setting family = "poisson" specifies the Poisson distribution for the response variable, and by default the canonical link function, which is the log link, will be used.

Bikeshare$humc <- Bikeshare$hum - mean(Bikeshare$hum)
m <- glm(bikers ~ weathersit + humc + I(humc^2) + I(humc^3) + hr,
data = Bikeshare,
family = "poisson")
summary(m)
##
## Call:
## glm(formula = bikers ~ weathersit + humc + I(humc^2) + I(humc^3) +
## hr, family = "poisson", data = Bikeshare)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -23.3408 -4.6201 -0.9922 3.4605 27.4153
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 3.893651 0.008083 481.708 <2e-16 ***
## weathersitcloudy/misty -0.146618 0.002277 -64.401 <2e-16 ***
## weathersitlight rain/snow -0.556153 0.004585 -121.292 <2e-16 ***
## weathersitheavy rain/snow -1.855194 0.166742 -11.126 <2e-16 ***
## humc 0.091751 0.009233 9.938 <2e-16 ***
## I(humc^2) -2.233919 0.029421 -75.929 <2e-16 ***
## I(humc^3) -1.823066 0.091428 -19.940 <2e-16 ***
## hr1 -0.476470 0.012999 -36.654 <2e-16 ***
## hr2 -0.806959 0.014646 -55.099 <2e-16 ***
## hr3 -1.433648 0.018842 -76.090 <2e-16 ***
## hr4 -2.058714 0.024796 -83.027 <2e-16 ***
## hr5 -1.061695 0.016074 -66.051 <2e-16 ***
## hr6 0.315691 0.010607 29.761 <2e-16 ***
## hr7 1.317856 0.009052 145.586 <2e-16 ***
## hr8 1.830026 0.008653 211.480 <2e-16 ***
## hr9 1.352135 0.009022 149.871 <2e-16 ***
## hr10 1.129497 0.009271 121.831 <2e-16 ***
## hr11 1.308554 0.009102 143.766 <2e-16 ***
## hr12 1.522234 0.008947 170.131 <2e-16 ***
## hr13 1.536827 0.008959 171.542 <2e-16 ***
## hr14 1.499506 0.008999 166.633 <2e-16 ***
## hr15 1.535043 0.008974 171.062 <2e-16 ***
## hr16 1.745128 0.008800 198.318 <2e-16 ***
## hr17 2.140488 0.008565 249.925 <2e-16 ***
## hr18 2.037740 0.008588 237.279 <2e-16 ***
## hr19 1.711545 0.008747 195.667 <2e-16 ***
## hr20 1.393901 0.008976 155.284 <2e-16 ***
## hr21 1.132543 0.009216 122.895 <2e-16 ***
## hr22 0.882534 0.009537 92.539 <2e-16 ***
## hr23 0.481354 0.010207 47.157 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 1052921 on 8644 degrees of freedom
## Residual deviance: 375265 on 8615 degrees of freedom
## AIC: 428362
##
## Number of Fisher Scoring iterations: 5
Interpretation of the intercept. The intercept corresponds to the log average number of bikers at the reference level of all covariates: hour zero (hr=0), clear weather (weathersit level 1), and average humidity (humc=0). We will denote the intercept as \(\beta_0\) and its estimate as \(\hat{\beta}_0\). All other coefficients will thus denote a relative change with respect to that reference level.

Interpretation of weathersitcloudy/misty. This coefficient, \(\beta_1\), is the difference in log average number of bikers between weathersit=2 and weathersit=1, all other variables being equal (say, at their reference level). Indeed, define \(\eta_{w2}\) and \(\eta_{w1}\) to denote the linear predictor at weathersit=2 and weathersit=1, respectively. Then, \(\eta_{w2} - \eta_{w1} = (\beta_0 + \beta_1) - \beta_0 = \beta_1\).
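On the response scale this translates into a multiplicative effect, obtained by exponentiating the estimated coefficient:

exp(coef(m)["weathersitcloudy/misty"])
# about 0.864: roughly 14% fewer bikers in cloudy/misty than in clear weather,
# all other variables being equal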
Interpretation of the humidity effect. The humidity effect is a bit more involved to interpret. Due to the quadratic and cubic terms, it is not straightforward to interpret the linear term separately (and the same applies to the quadratic or cubic term); we must interpret the linear, quadratic and cubic terms simultaneously.

Also due to the higher-order terms, the rate of change in the average number of bikers is not constant across the range of humidity. We can therefore not interpret the humidity effect using a single number as we’ve done previously.
We can, however, provide some examples for specific humidity values, along with a visualization of its global effect.
For example, let’s derive the change in average bikes being used at a humidity that is \(0.2\) above average, versus average humidity.
For average humidity \(+0.2\) the linear predictor \(\eta_{0.2} = \beta_0 + \beta_4 x_{hum} + \beta_5 x_{hum}^2 + \beta_6 x_{hum}^3 = \beta_0 + \beta_4 0.2 + \beta_5 0.2^2 + \beta_6 0.2^3\).
For average humidity, the linear predictor \(\eta_{0} = \beta_0 + \beta_4 x_{hum} + \beta_5 x_{hum}^2 + \beta_6 x_{hum}^3 = \beta_0 + \beta_4 0 + \beta_5 0^2 + \beta_6 0^3 = \beta_0\).
We thus have \(\log \frac{\mu_{0.2}}{\mu_0} = \beta_4 0.2 + \beta_5 0.2^2 + \beta_6 0.2^3\). In our case, \(\log \frac{\hat{\mu}_{0.2}}{\hat{\mu}_0} = 0.091751 \times 0.2 - 2.233919 \times 0.2^2 - 1.823066 \times 0.2^3 = -0.0856\) and thus \(\frac{\hat{\mu}_{0.2}}{\hat{\mu}_0} = 0.92\). Therefore, at a humidity \(0.2\) above average, the average number of bikes being used is \(0.92\) times that at average humidity.
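We can reproduce this calculation directly from the fitted coefficients:

b <- coef(m)
exp(b["humc"] * 0.2 + b["I(humc^2)"] * 0.2^2 + b["I(humc^3)"] * 0.2^3) # about 0.92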
Just like with linear models, the predict function is extremely helpful when trying to visualize and understand a fitted GLM. In GLMs, the type argument becomes essential when using the predict function. Indeed, by default, estimates are provided on the linear predictor scale: in our case, on the \(\log\) scale. If we’d like predictions on the scale of the response variable, we need to set type="response". You can find more information in the help file using ?predict.glm.
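A small check of the two scales on the fitted data: response-scale predictions are the exponential of link-scale predictions.

etaHat <- predict(m, type = "link")    # default: linear predictor (log) scale
muHat <- predict(m, type = "response") # scale of the response
all.equal(exp(etaHat), muHat)          # TRUE: mu = exp(eta)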
The visualization shows that the highest number of bikes are being used at around average humidity, with a decreased usage at higher and lower humidities.
humidityGrid <- seq(min(Bikeshare$humc), max(Bikeshare$humc),
length.out = 50)
newDf <- data.frame(weathersit = factor("clear"),
hr = factor(8),
humc = humidityGrid,
"I(humc^2)" = humidityGrid^2,
"I(humc^3)" = humidityGrid^3)
yhat <- predict(m,
newdata = newDf,
type = "response")
plot(x = humidityGrid,
y = yhat,
type = 'l', lwd=2,
xlab = "Centered humidity",
ylab = "Average number of bikers")
Setting up a contrast.

Suppose we want to estimate the fold change in the average number of bikers between (A) a humidity of 0.357 above average, hour 17, in the light rain/snow weather category, versus (B) average humidity, hour 8, in the clear weather category. This requires us to set up a contrast in terms of a linear combination of the model parameters.

We find that at a humidity of 0.357 above average, hour 17, in the light rain/snow weather category, the average number of bikers is 56% of the average number of bikers at average humidity, hour 8, in the clear weather category.

Calculating the contrast in R: we can also use matrix multiplication to derive the estimates. We know from our manual calculations above that the contrast of interest is \((\beta_0 + \beta_2 x_{rainSnow} + \beta_4 x_{hum} + \beta_5 x_{hum}^2 + \beta_6 x_{hum}^3 + \beta_{23} x_{hr17}) - (\beta_0 + \beta_{14} x_{hr8})\). We can store this in a contrast matrix, and then multiply it with the coefficients of our model:

L <- matrix(0,
nrow = length(coef(m)),
ncol = 1)
rownames(L) <- names(coef(m))
L["weathersitlight rain/snow",1] <- 1
L["humc",1] <- 0.357
L["I(humc^2)", 1] <- 0.357^2
L["I(humc^3)", 1] <- 0.357^3
L["hr17", 1] <- 1
L["hr8",1] <- -1
L
## [,1]
## (Intercept) 0.00000000
## weathersitcloudy/misty 0.00000000
## weathersitlight rain/snow 1.00000000
## weathersitheavy rain/snow 0.00000000
## humc 0.35700000
## I(humc^2) 0.12744900
## I(humc^3) 0.04549929
## hr1 0.00000000
## hr2 0.00000000
## hr3 0.00000000
## hr4 0.00000000
## hr5 0.00000000
## hr6 0.00000000
## hr7 0.00000000
## hr8 -1.00000000
## hr9 0.00000000
## hr10 0.00000000
## hr11 0.00000000
## hr12 0.00000000
## hr13 0.00000000
## hr14 0.00000000
## hr15 0.00000000
## hr16 0.00000000
## hr17 1.00000000
## hr18 0.00000000
## hr19 0.00000000
## hr20 0.00000000
## hr21 0.00000000
## hr22 0.00000000
## hr23 0.00000000
exp(t(L) %*% coef(m)) # fold change corresponding to the contrast
## [,1]
## [1,] 0.5595652
Alternatively, we can obtain the same estimates using predict in R:

# set up data frames with relevant predictor variables' values.
dfA <- data.frame(weathersit = factor("light rain/snow"),
hr = factor(17),
humc = 0.357)
dfB <- data.frame(weathersit = factor("clear"),
hr = factor(8),
humc = 0)
# calculate estimated average number of bikers
yhatA <- predict(m,
newdata = dfA,
type = "response")
yhatB <- predict(m,
newdata = dfB,
type = "response")
yhatA / yhatB # also equal to above.
## 1
## 0.5595652
Exercise: try to derive the change in the average number of bikers between (a) a humidity of 0.1 above average, clear weather (weathersit=1), at hour 10, and (b) a humidity of 0.1 below average, cloudy weather (weathersit=2), at hour 20, using all three methods.
Let’s use a Wald test and a likelihood ratio test in R to test whether the average number of bikers differs between working days and non-working days, using a simple GLM with only that variable as a covariate. This amounts to testing
\[H_0: \beta_{workingday} = 0\] \[H_1: \beta_{workingday} \ne 0\]
mSimple <- glm(bikers ~ workingday,
family = "poisson",
data = Bikeshare)
summSimple <- summary(mSimple)
summSimple$coefficients["workingday",]
## Estimate Std. Error z value Pr(>|z|)
## 2.352087e-02 1.937241e-03 1.214143e+01 6.370326e-34
# Wald test manually
W <- summSimple$coefficients["workingday", "Estimate"] / summSimple$coefficients["workingday", "Std. Error"]
pval <- 2*(1 - pnorm(abs(W))) # two-sided p-value
W
## [1] 12.14143
pval
## [1] 0
# Wald test through a contrast
C <- matrix(0, nrow=1, ncol=length(coef(mSimple)))
colnames(C) <- names(coef(mSimple))
C[, "workingday"] <- 1
C
## (Intercept) workingday
## [1,] 0 1
beta <- matrix(coef(mSimple), ncol=1)
Sigma <- vcov(mSimple)
W2 <- C %*% beta %*% solve(C %*% Sigma %*% t(C)) %*% t(beta) %*% t(C)
W2
## [,1]
## [1,] 147.4143
W^2 # the squared Wald statistic equals the contrast-based statistic
## [1] 147.4143
1 - pchisq(W2, df = 1) # p-value
## [,1]
## [1,] 0
summSimple
##
## Call:
## glm(formula = bikers ~ workingday, family = "poisson", data = Bikeshare)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -16.666 -11.361 -3.026 5.214 30.729
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 4.952243 0.001608 3080.12 <2e-16 ***
## workingday 0.023521 0.001937 12.14 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 1052921 on 8644 degrees of freedom
## Residual deviance: 1052773 on 8643 degrees of freedom
## AIC: 1105815
##
## Number of Fisher Scoring iterations: 5
mFull <- glm(bikers ~ workingday,
family = "poisson",
data = Bikeshare)
mReduced <- glm(bikers ~ 1,
family = "poisson",
data = Bikeshare)
# manual LRT
llFull <- logLik(mFull)
llReduced <- logLik(mReduced)
lrt <- as.numeric(2 * (llFull - llReduced))
lrt
## [1] 147.8481
1 - pchisq(lrt, df = 1) # p-value based on the asymptotic chi-squared null distribution
## [1] 0
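The same likelihood ratio test is also available through the anova method for GLMs; a quick cross-check:

anova(mReduced, mFull, test = "LRT") # deviance difference of 147.85 on 1 df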
Below, we show how one may derive independent contrasts from a contrast matrix for statistical inference. We also show that attempting to use linearly dependent contrasts fails, since the variance-covariance matrix of the contrasts is then singular and therefore non-invertible.
C <- matrix(0, nrow=3, ncol=length(coef(mSimple)))
colnames(C) <- names(coef(mSimple))
C[1, "(Intercept)"] <- 1
C[2, "workingday"] <- 1
C[3,c("(Intercept)", "workingday")] <- c(1,1)
C
## (Intercept) workingday
## [1,] 1 0
## [2,] 0 1
## [3,] 1 1
# doesn't work because we have a singular matrix
W2 <- try(t(C %*% beta) %*% solve(C %*% Sigma %*% t(C)) %*% C %*% beta)
## Error in solve.default(C %*% Sigma %*% t(C)) :
## system is computationally singular: reciprocal condition number = 1.88084e-17
## [1] "Error in solve.default(C %*% Sigma %*% t(C)) : \n system is computationally singular: reciprocal condition number = 1.88084e-17\n"
## attr(,"class")
## [1] "try-error"
## attr(,"condition")
## <simpleError in solve.default(C %*% Sigma %*% t(C)): system is computationally singular: reciprocal condition number = 1.88084e-17>
# identify the linearly independent contrasts
Ct <- t(C)
CtIndep <- Ct[,qr(Ct)$pivot[1:qr(Ct)$rank]]
CIndep <- t(CtIndep)
# try again
W2 <- t(CIndep %*% beta) %*% solve(CIndep %*% Sigma %*% t(CIndep)) %*% CIndep %*% beta
W2
## [,1]
## [1,] 30686825
Exercise: check the deviance and Pearson residuals of the model, which can be extracted in R by resid(m, type="deviance") and resid(m, type="pearson").

Below, we use the Pearson residuals to estimate the overdispersion in our Poisson model for the Bikeshare dataset. We will notice that the overdispersion is huge! The p-values and standard errors provided by the model can therefore not be trusted!

ePearson <- resid(m, type="pearson")
n <- nrow(Bikeshare)
p <- length(coef(m))
varPearson <- sum((ePearson^2)) / (n - p)
varPearson # HUGE!
## [1] 42.44815
Below, we fit a negative binomial and a quasi-Poisson model in R.
## negative binomial
library(MASS)
mNB <- glm.nb(bikers ~ weathersit + humc + I(humc^2) + I(humc^3) + hr,
data = Bikeshare)
summary(mNB)
##
## Call:
## glm.nb(formula = bikers ~ weathersit + humc + I(humc^2) + I(humc^3) +
## hr, data = Bikeshare, init.theta = 2.420957542, link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -4.0170 -0.9246 -0.1729 0.5008 5.4746
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 3.86978 0.03560 108.701 < 2e-16 ***
## weathersitcloudy/misty -0.15215 0.01753 -8.681 < 2e-16 ***
## weathersitlight rain/snow -0.62708 0.02956 -21.212 < 2e-16 ***
## weathersitheavy rain/snow -1.95785 0.66526 -2.943 0.00325 **
## humc 0.22567 0.06968 3.239 0.00120 **
## I(humc^2) -1.95867 0.20007 -9.790 < 2e-16 ***
## I(humc^3) -0.75361 0.59317 -1.270 0.20391
## hr1 -0.47370 0.04968 -9.535 < 2e-16 ***
## hr2 -0.79592 0.05041 -15.789 < 2e-16 ***
## hr3 -1.42636 0.05216 -27.346 < 2e-16 ***
## hr4 -2.04660 0.05478 -37.360 < 2e-16 ***
## hr5 -1.05264 0.05085 -20.701 < 2e-16 ***
## hr6 0.32952 0.04909 6.712 1.92e-11 ***
## hr7 1.33266 0.04868 27.375 < 2e-16 ***
## hr8 1.86123 0.04862 38.283 < 2e-16 ***
## hr9 1.38427 0.04877 28.382 < 2e-16 ***
## hr10 1.14603 0.04900 23.390 < 2e-16 ***
## hr11 1.32835 0.04913 27.035 < 2e-16 ***
## hr12 1.54916 0.04934 31.401 < 2e-16 ***
## hr13 1.56716 0.04951 31.655 < 2e-16 ***
## hr14 1.53797 0.04960 31.010 < 2e-16 ***
## hr15 1.57197 0.04961 31.689 < 2e-16 ***
## hr16 1.78550 0.04947 36.092 < 2e-16 ***
## hr17 2.19094 0.04924 44.491 < 2e-16 ***
## hr18 2.08597 0.04911 42.473 < 2e-16 ***
## hr19 1.75147 0.04890 35.816 < 2e-16 ***
## hr20 1.41815 0.04883 29.044 < 2e-16 ***
## hr21 1.14329 0.04875 23.450 < 2e-16 ***
## hr22 0.89344 0.04879 18.313 < 2e-16 ***
## hr23 0.49606 0.04891 10.142 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(2.421) family taken to be 1)
##
## Null deviance: 26632.4 on 8644 degrees of freedom
## Residual deviance: 9314.9 on 8615 degrees of freedom
## AIC: 93224
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 2.4210
## Std. Err.: 0.0374
##
## 2 x log-likelihood: -93161.6150
# the same overdispersion diagnostic for the negative binomial model,
# computed from its Pearson residuals, is now close to 1
ePearsonNB <- resid(mNB, type="pearson")
sum(ePearsonNB^2) / (n - p)
## [1] 0.9833573
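For reference, the fitted negative binomial assumes the quadratic mean-variance relationship \(\mathrm{Var}(Y) = \mu + \mu^2/\theta\). A small illustration at an example mean:

theta <- mNB$theta # estimated dispersion parameter, about 2.42
mu <- 100          # an example mean count
mu + mu^2 / theta  # NB variance: about 4230, versus 100 under the Poisson model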
## quasi-Poisson
mQP <- glm(bikers ~ weathersit + humc + I(humc^2) + I(humc^3) + hr,
data = Bikeshare,
family="quasipoisson")
# note the dispersion parameter being estimated is equal to our overdispersion diagnostic measure.
# Indeed, this is the way the dispersion parameter is estimated for the QP!!
summary(mQP)
##
## Call:
## glm(formula = bikers ~ weathersit + humc + I(humc^2) + I(humc^3) +
## hr, family = "quasipoisson", data = Bikeshare)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -23.3408 -4.6201 -0.9922 3.4605 27.4153
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.89365 0.05266 73.936 < 2e-16 ***
## weathersitcloudy/misty -0.14662 0.01483 -9.885 < 2e-16 ***
## weathersitlight rain/snow -0.55615 0.02987 -18.617 < 2e-16 ***
## weathersitheavy rain/snow -1.85519 1.08636 -1.708 0.08773 .
## humc 0.09175 0.06015 1.525 0.12723
## I(humc^2) -2.23392 0.19169 -11.654 < 2e-16 ***
## I(humc^3) -1.82307 0.59568 -3.060 0.00222 **
## hr1 -0.47647 0.08469 -5.626 1.90e-08 ***
## hr2 -0.80696 0.09542 -8.457 < 2e-16 ***
## hr3 -1.43365 0.12276 -11.679 < 2e-16 ***
## hr4 -2.05871 0.16155 -12.744 < 2e-16 ***
## hr5 -1.06169 0.10473 -10.138 < 2e-16 ***
## hr6 0.31569 0.06911 4.568 4.99e-06 ***
## hr7 1.31786 0.05898 22.346 < 2e-16 ***
## hr8 1.83003 0.05638 32.459 < 2e-16 ***
## hr9 1.35214 0.05878 23.003 < 2e-16 ***
## hr10 1.12950 0.06040 18.699 < 2e-16 ***
## hr11 1.30855 0.05930 22.066 < 2e-16 ***
## hr12 1.52223 0.05829 26.113 < 2e-16 ***
## hr13 1.53683 0.05837 26.329 < 2e-16 ***
## hr14 1.49951 0.05863 25.576 < 2e-16 ***
## hr15 1.53504 0.05846 26.256 < 2e-16 ***
## hr16 1.74513 0.05733 30.439 < 2e-16 ***
## hr17 2.14049 0.05580 38.360 < 2e-16 ***
## hr18 2.03774 0.05595 36.419 < 2e-16 ***
## hr19 1.71154 0.05699 30.032 < 2e-16 ***
## hr20 1.39390 0.05848 23.834 < 2e-16 ***
## hr21 1.13254 0.06004 18.863 < 2e-16 ***
## hr22 0.88253 0.06214 14.203 < 2e-16 ***
## hr23 0.48135 0.06650 7.238 4.94e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasipoisson family taken to be 42.44818)
##
## Null deviance: 1052921 on 8644 degrees of freedom
## Residual deviance: 375265 on 8615 degrees of freedom
## AIC: NA
##
## Number of Fisher Scoring iterations: 5
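As a final consistency check, the quasi-Poisson standard errors are exactly the Poisson standard errors inflated by the square root of the estimated dispersion:

# every entry of this ratio should equal sqrt(42.44818), about 6.52
summary(mQP)$coefficients[, "Std. Error"] / summary(m)$coefficients[, "Std. Error"]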
In this lecture, we have introduced GLMs to model data that can be assumed to follow a distribution belonging to the exponential family. We focused on estimation, interpretation, statistical inference and some model goodness-of-fit diagnostics. We did not consider important topics like model selection, or even whether a GLM is appropriate for your dataset in the first place! These are important topics which, unfortunately, we do not have the bandwidth to include in this course. We therefore finish this long chapter with a final note in the form of an XKCD comic!