1 Exponential Family

\[ f(y_i\vert \theta_i,\phi)=\exp\left\{ \frac{y_i\theta_i- b(\theta_i)}{a(\phi)}+c(y_i,\phi)\right\} \]

likelihood to observe \(y_1, \ldots, y_n\) independent observations \[L(\boldsymbol{\theta},\phi\vert \boldsymbol{y})=\prod\limits_{i=1}^n f(y_i,\theta_i,\phi)\]

log-likelihood \[ l(\boldsymbol{\theta},\phi\vert \boldsymbol{y}) = \sum\limits_{i=1}^n \log f(y_i,\theta_i,\phi) \]

Log-likelihood for one observation: \[ l(\theta_i,\phi\vert y_i)=\left\{ \frac{y_i\theta_i- b(\theta_i)}{a(\phi)}+c(y_i,\phi)\right\} \]

  • \(E[y_i]=\mu_i=b^\prime(\theta_i)\)
  • \(\text{var}[y_i]=b^{\prime\prime}(\theta_i) a(\phi) =V(\mu_i)\)

e.g. for Poisson data: \(\text{var}[y_i] = \mu_i\) and for negative binomial data \(\text{var}[y_i] = \mu_i + \phi \mu_i^2\).

1.1 Parameter Estimation: Maximum Likelihood

\[ S(\boldsymbol{\beta}) = \frac{\partial l(\boldsymbol{\mu} \vert \mathbf{y})}{ \partial{\boldsymbol{\beta}}} = 0 \]

\[ S_i(\boldsymbol{\beta})= \frac{y_i-\mu_i}{\text{var}[y_i]}\frac{\partial\mu}{\partial\eta}\mathbf{x}_i \]

There are only two moments involved in the estimation equations of the mean model parameters: the mean and the variance!

\[ S(\boldsymbol{\beta})=\mathbf{X}^T\mathbf{A}\left(\mathbf{y}-\boldsymbol{\mu}\right) = 0 \]

We can solve this set of score equations iteratively to estimate the model parameters: \[ \mathbf{\beta}^{k+1} = (\mathbf{X}^T\mathbf{W}^k\mathbf{X})^{-1}\mathbf{X}^T\mathbf{W}^kz^{k} \]

with pseudodata

\[ z^k_i = \eta^k_i + \left.\frac{\partial \eta_i}{\partial \mu_i}\right\vert_k (y_i - \mu^k_i) \]

2 Quasi-likelihood

We no longer define the full distribution but only two moments: the mean and the variance of \(y_i\).

  • We typically define these based on an existing distribution.

  • However we make the variance more flexible, i.e. we rescale it with an additional model parameter.

  • Hence, we can model data that are over or under disperse with respect to a known distribution.

\[ \left\{ \begin{array}{rcl} E[Y_i\vert x_i] &=& \mu_i\\ g(\mu_i) &=& \eta_i\\ \eta_i &=& \mathbf{x}_i^T\boldsymbol{\beta}\\ \text{var}[Y_i] &=& \sigma^2 V(\mu_i) \end{array}\right. \]

e.g. quasi poisson

\[ \left\{ \begin{array}{rcl} E[Y_i\vert x_i] &=& \mu_i\\ g(\mu_i) &=& \eta_i\\ \eta_i &=& \mathbf{x}_i^T\boldsymbol{\beta}\\ \text{var}[Y_i] &=& \sigma^2 \mu_i \end{array}\right. \]

e.g. quasi-negative binomial

\[ \left\{ \begin{array}{rcl} E[Y_i\vert x_i] &=& \mu_i\\ g(\mu_i) &=& \eta_i\\ \eta_i &=& \mathbf{x}_i^T\boldsymbol{\beta}\\ \text{var}[Y_i] &=& \sigma^2 (\mu_i + \phi \mu_i^2) \end{array}\right. \]

2.1 Parameter Estimation

Note, that this leads to the following estimation equations:

\[ U_i(\boldsymbol{\beta})= \frac{y_i-\mu_i}{\sigma^2V(\mu_i)}\frac{\partial\mu}{\partial\eta}\mathbf{x}_i \]

\[ \begin{array}{rcccc} U(\boldsymbol{\beta})&=&\left[\mathbf{X}^T\mathbf{A}\left(\mathbf{y}-\boldsymbol{\mu}\right)\right]/\sigma^2 &=& 0\\ &&\Updownarrow\\ &&\mathbf{X}^T\mathbf{A}\left(\mathbf{y}-\boldsymbol{\mu}\right) &=& 0 \end{array} \] Hence, the estimating equations for the mean model parameters remain the same as those for the conventional distribution where the quasi-likelihood is derived from.

The variance covariance matrix of the mean model parameters then becomes: \[ \Sigma_{\hat{\boldsymbol{\beta}}}= (\mathbf{X}^T\mathbf{WX})^{-1}\sigma^2 \]

We can estimate \(\sigma^2\) using a moment estimator on the residual vector \(\mathbf{y}-\boldsymbol{\mu}\), i.e. 

\[ \hat{\sigma}^2 = \frac{1}{n-p} \sum_\limits{i=1}^n \frac{(y_i-\hat\mu_i)^2}{V(\hat\mu_i)} \]

Note, that \[e_i^p=\frac{(y_i-\hat\mu_i)}{\sqrt{V(\hat\mu_i)}}\] are also referred to as the Pearson residuals.

Another type of residuals for GLMs are deviance residuals, i.e. 

\[e_i^d = \text{sign}(y_i-\hat\mu_i)\sqrt{2[l(y_i,y_i)-l(y_i,\hat\mu_i)]} \] So another estimator is

\[\hat{\sigma}^2 = \frac{\sum_\limits{i=1}^n(e_i^d)^2}{n-p} = \frac{D}{n-p} \] with D the deviance statistic: the LRT test between the saturated model (a model that provides a perfect fit) and the current model:

\[ D = 2 \sum\limits_{i=1}^n [l(y_i,y_i,\phi)- l(y_i,\hat{\mu}_i,\phi)] \]

So the standard errors on the model parameters under quasi-likelihood become:

\[ \hat{\text{se}}_{\mathbf{L}^T\hat{\boldsymbol{\beta}}}^{QL} = \hat{\text{se}}_{\mathbf{L}^T\hat{\boldsymbol{\beta}}}^{ML}\times \hat{\sigma}\]

2.2 Hypothesis tests

An approximate t-test statistic can be defined as

\[ t = \frac{\mathbf{L}^T\hat{\boldsymbol{\beta}}}{\hat{\text{se}}_{\mathbf{L}^T\hat{\boldsymbol{\beta}}}^{QL}} \approx t_{n-p}\vert H_0 \]

And an approximate F-test statistic can be defined as:

\[F = \frac{\frac{\text{LRT}^{ML}}{c}}{\hat\sigma^2} \approx F_{c,n-p} \vert H_0,\] with c the degrees of freedom of the LRT statistic.

2.3 Example

## quasipoisson.
counts <- c(18,17,15,20,10,20,25,13,12)
outcome <- gl(3,1,9)
treatment <- gl(3,3)
d.AD <- data.frame(treatment, outcome, counts)
d.AD
glm.pois <- glm(counts ~ outcome + treatment, family = poisson())
summary(glm.pois)
## 
## Call:
## glm(formula = counts ~ outcome + treatment, family = poisson())
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  3.045e+00  1.709e-01  17.815   <2e-16 ***
## outcome2    -4.543e-01  2.022e-01  -2.247   0.0246 *  
## outcome3    -2.930e-01  1.927e-01  -1.520   0.1285    
## treatment2   1.398e-16  2.000e-01   0.000   1.0000    
## treatment3  -2.416e-16  2.000e-01   0.000   1.0000    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 10.5814  on 8  degrees of freedom
## Residual deviance:  5.1291  on 4  degrees of freedom
## AIC: 56.761
## 
## Number of Fisher Scoring iterations: 4
glm.qpois <- glm(counts ~ outcome+treatment, family = quasipoisson())
summary(glm.qpois)
## 
## Call:
## glm(formula = counts ~ outcome + treatment, family = quasipoisson())
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.045e+00  1.944e-01  15.665  9.7e-05 ***
## outcome2    -4.543e-01  2.299e-01  -1.976    0.119    
## outcome3    -2.930e-01  2.192e-01  -1.337    0.252    
## treatment2   1.398e-16  2.274e-01   0.000    1.000    
## treatment3  -2.416e-16  2.274e-01   0.000    1.000    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasipoisson family taken to be 1.2933)
## 
##     Null deviance: 10.5814  on 8  degrees of freedom
## Residual deviance:  5.1291  on 4  degrees of freedom
## AIC: NA
## 
## Number of Fisher Scoring iterations: 4

Estimate dispersion ourselves? We first calculate the Pearson residuals.

ep <- (counts - glm.pois$fitted)/sqrt(glm.pois$fitted)
range(ep - resid(glm.pois,type="pearson"))
## [1] 0 0

Now, we can estimate the quasi-dispersion parameter

n <- length(counts)
p <- length(coef(glm.pois))
sigma2 <- sum(ep^2)/(n-p)
sigma2
## [1] 1.2933
poisRes <- summary(glm.pois)$coef
poisRes
##                  Estimate Std. Error       z value     Pr(>|z|)
## (Intercept)  3.044522e+00  0.1708987  1.781478e+01 5.426767e-71
## outcome2    -4.542553e-01  0.2021708 -2.246889e+00 2.464711e-02
## outcome3    -2.929871e-01  0.1927423 -1.520097e+00 1.284865e-01
## treatment2   1.397701e-16  0.2000000  6.988506e-16 1.000000e+00
## treatment3  -2.415657e-16  0.2000000 -1.207829e-15 1.000000e+00
qlRes <- summary(glm.qpois)$coef
qlRes
##                  Estimate Std. Error       t value     Pr(>|t|)
## (Intercept)  3.044522e+00  0.1943517  1.566502e+01 9.698855e-05
## outcome2    -4.542553e-01  0.2299154 -1.975750e+00 1.193809e-01
## outcome3    -2.929871e-01  0.2191931 -1.336662e+00 2.522944e-01
## treatment2   1.397701e-16  0.2274467  6.145180e-16 1.000000e+00
## treatment3  -2.415657e-16  0.2274467 -1.062076e-15 1.000000e+00
cbind(ql.se=qlRes[,2],pois.se=poisRes[,2],qlself.se=poisRes[,2]*sqrt(sigma2))
##                 ql.se   pois.se qlself.se
## (Intercept) 0.1943517 0.1708987 0.1943517
## outcome2    0.2299154 0.2021708 0.2299154
## outcome3    0.2191931 0.1927423 0.2191931
## treatment2  0.2274467 0.2000000 0.2274467
## treatment3  0.2274467 0.2000000 0.2274467

2.3.1 Approximate F-test

LR -test with Poisson regression:

library(car)
## Warning: package 'car' was built under R version 4.4.1
## Loading required package: carData
Anova(glm.pois)

We check that the approximate F-statistic equals LRT/df/sigma2

Anova(glm.qpois, test="F")
Anova(glm.pois)[1,1]/Anova(glm.pois)[1,2]/sigma2
## [1] 2.107903

2.3.2 Deviance

deviance(glm.pois)
## [1] 5.129141
lsat <- sum(dpois(counts,counts,log = TRUE))
lcur <- sum(dpois(counts,glm.pois$fitted.values,log = TRUE))
2*(lsat-lcur)
## [1] 5.129141

Deviance residuals

lsat.i <-  dpois(counts,counts,log = TRUE)
lcur.i <- dpois(counts,glm.pois$fitted.values,log = TRUE)
e.dev <- sign(counts-glm.pois$fitted) * sqrt(2*(lsat.i-lcur.i))
e.dev
##           1           2           3           4           5           6 
## -0.67124923  0.96272360 -0.16964662 -0.21998507 -0.95552353  1.04938637 
##           7           8           9 
##  0.84715368 -0.09167147 -0.96656372
resid(glm.pois,type="deviance")
##           1           2           3           4           5           6 
## -0.67124923  0.96272360 -0.16964662 -0.21998507 -0.95552353  1.04938637 
##           7           8           9 
##  0.84715368 -0.09167147 -0.96656372
glm.pois.out.null <- glm(counts ~  treatment, family = poisson())
anova(glm.pois.out.null,glm.pois)
lrtfull <- logLik(glm.pois)
lrtout0 <- logLik(glm.pois.out.null)
2*(lrtfull-lrtout0)
## 'log Lik.' 5.452305 (df=5)

2.3.3 Approximate F-test 2

glm.qpois.out.null <- glm(counts ~  treatment, family = quasipoisson())
anova(glm.qpois,glm.qpois.out.null,test = "F")
2*(lrtfull-lrtout0)
## 'log Lik.' 5.452305 (df=5)
2*(lrtfull-lrtout0)/2/sigma2
## 'log Lik.' 2.107903 (df=5)
---
title: 'Quasi-likelihood'
author: "Lieven Clement"
date: "Last edited on `r format(Sys.time(), '%d %B, %Y')`"
output: 
  html_document:
    toc: true
    toc_float: true
  pdf_document:
    toc: true
    number_sections: true
    latex_engine: xelatex
always_allow_html: true
linkcolor: blue
urlcolor: blue 
citecolor: blue
link-citations: yes

---

# Exponential Family

$$
f(y_i\vert \theta_i,\phi)=\exp\left\{ \frac{y_i\theta_i- b(\theta_i)}{a(\phi)}+c(y_i,\phi)\right\}
$$

likelihood to observe $y_1, \ldots, y_n$ independent observations 
$$L(\boldsymbol{\theta},\phi\vert \boldsymbol{y})=\prod\limits_{i=1}^n f(y_i,\theta_i,\phi)$$

log-likelihood
$$
l(\boldsymbol{\theta},\phi\vert \boldsymbol{y}) =  \sum\limits_{i=1}^n \log  f(y_i,\theta_i,\phi)
$$

Log-likelihood for one observation: 
$$
l(\theta_i,\phi\vert y_i)=\left\{ \frac{y_i\theta_i- b(\theta_i)}{a(\phi)}+c(y_i,\phi)\right\}
$$

- $E[y_i]=\mu_i=b^\prime(\theta_i)$
- $\text{var}[y_i]=b^{\prime\prime}(\theta_i) a(\phi) =V(\mu_i)$

e.g. for Poisson data: 
$\text{var}[y_i] = \mu_i$ and for negative binomial data $\text{var}[y_i] = \mu_i + \phi \mu_i^2$.

## Parameter Estimation: Maximum Likelihood

$$
S(\boldsymbol{\beta}) = \frac{\partial  l(\boldsymbol{\mu} \vert \mathbf{y})}{
\partial{\boldsymbol{\beta}}} = 0
$$

$$
S_i(\boldsymbol{\beta})= \frac{y_i-\mu_i}{\text{var}[y_i]}\frac{\partial\mu}{\partial\eta}\mathbf{x}_i
$$


There are only two moments involved in the estimation equations of the mean model parameters: 
the mean and the variance! 

$$
S(\boldsymbol{\beta})=\mathbf{X}^T\mathbf{A}\left(\mathbf{y}-\boldsymbol{\mu}\right) = 0
$$

We can solve this set of score equations iteratively to estimate the model parameters: 
$$
\mathbf{\beta}^{k+1} = (\mathbf{X}^T\mathbf{W}^k\mathbf{X})^{-1}\mathbf{X}^T\mathbf{W}^kz^{k}
$$

with  pseudodata 

$$
z^k_i = \eta^k_i + \left.\frac{\partial \eta_i}{\partial \mu_i}\right\vert_k (y_i - \mu^k_i) 
$$


# Quasi-likelihood 

We no longer define the full distribution but only two moments: the mean and the variance of $y_i$. 

- We typically define these based on an existing distribution.
- However we make the variance more flexible, i.e. we rescale it with an additional model parameter. 

- Hence, we can model data that are over or under disperse with respect to a known distribution. 

$$
\left\{ \begin{array}{rcl}
E[Y_i\vert x_i] &=& \mu_i\\
g(\mu_i) &=& \eta_i\\
\eta_i &=& \mathbf{x}_i^T\boldsymbol{\beta}\\
\text{var}[Y_i] &=& \sigma^2 V(\mu_i)
\end{array}\right.
$$


e.g. quasi poisson

$$
\left\{ \begin{array}{rcl}
E[Y_i\vert x_i] &=& \mu_i\\
g(\mu_i) &=& \eta_i\\
\eta_i &=& \mathbf{x}_i^T\boldsymbol{\beta}\\
\text{var}[Y_i] &=& \sigma^2 \mu_i
\end{array}\right.
$$

e.g. quasi-negative binomial

$$
\left\{ \begin{array}{rcl}
E[Y_i\vert x_i] &=& \mu_i\\
g(\mu_i) &=& \eta_i\\
\eta_i &=& \mathbf{x}_i^T\boldsymbol{\beta}\\
\text{var}[Y_i] &=& \sigma^2 (\mu_i + \phi \mu_i^2)
\end{array}\right.
$$

## Parameter Estimation

Note, that this leads to the following estimation equations:

$$
U_i(\boldsymbol{\beta})= \frac{y_i-\mu_i}{\sigma^2V(\mu_i)}\frac{\partial\mu}{\partial\eta}\mathbf{x}_i
$$
 

$$
\begin{array}{rcccc}
U(\boldsymbol{\beta})&=&\left[\mathbf{X}^T\mathbf{A}\left(\mathbf{y}-\boldsymbol{\mu}\right)\right]/\sigma^2 &=& 0\\
&&\Updownarrow\\
&&\mathbf{X}^T\mathbf{A}\left(\mathbf{y}-\boldsymbol{\mu}\right) &=& 0
\end{array}
$$
Hence, the estimating equations for the mean model parameters remain the same as those for the conventional distribution where the quasi-likelihood is derived from. 

The variance covariance matrix of the mean model parameters then becomes: 
$$
\Sigma_{\hat{\boldsymbol{\beta}}}= (\mathbf{X}^T\mathbf{WX})^{-1}\sigma^2
$$

We can estimate $\sigma^2$ using a moment estimator on the residual vector $\mathbf{y}-\boldsymbol{\mu}$, i.e. 

$$
\hat{\sigma}^2 = \frac{1}{n-p} \sum_\limits{i=1}^n \frac{(y_i-\hat\mu_i)^2}{V(\hat\mu_i)}
$$



Note, that 
$$e_i^p=\frac{(y_i-\hat\mu_i)}{\sqrt{V(\hat\mu_i)}}$$ are also referred to as the Pearson residuals. 

Another type of residuals for GLMs are deviance residuals, i.e. 

$$e_i^d = \text{sign}(y_i-\hat\mu_i)\sqrt{2[l(y_i,y_i)-l(y_i,\hat\mu_i)]}
$$
So another estimator is 

$$\hat{\sigma}^2 = \frac{\sum_\limits{i=1}^n(e_i^d)^2}{n-p} = \frac{D}{n-p}
$$
with D the deviance statistic: the LRT test between the saturated model (a model that provides a perfect fit) and the current model:

$$
D = 2 \sum\limits_{i=1}^n [l(y_i,y_i,\phi)- l(y_i,\hat{\mu}_i,\phi)]
$$

So the standard errors on the model parameters under quasi-likelihood become: 

$$
\hat{\text{se}}_{\mathbf{L}^T\hat{\boldsymbol{\beta}}}^{QL} = \hat{\text{se}}_{\mathbf{L}^T\hat{\boldsymbol{\beta}}}^{ML}\times \hat{\sigma}$$

## Hypothesis tests

An approximate t-test statistic can be defined as

$$
t = \frac{\mathbf{L}^T\hat{\boldsymbol{\beta}}}{\hat{\text{se}}_{\mathbf{L}^T\hat{\boldsymbol{\beta}}}^{QL}} \approx t_{n-p}\vert H_0
$$

And an approximate F-test statistic can be defined as: 

$$F = \frac{\frac{\text{LRT}^{ML}}{c}}{\hat\sigma^2}  \approx F_{c,n-p} \vert H_0,$$
with c the degrees of freedom of the LRT statistic.

## Example 

```{r}
## quasipoisson.
counts <- c(18,17,15,20,10,20,25,13,12)
outcome <- gl(3,1,9)
treatment <- gl(3,3)
d.AD <- data.frame(treatment, outcome, counts)
d.AD
```

```{r}
glm.pois <- glm(counts ~ outcome + treatment, family = poisson())
summary(glm.pois)
```

```{r}
glm.qpois <- glm(counts ~ outcome+treatment, family = quasipoisson())
summary(glm.qpois)
```

Estimate dispersion ourselves? 
We first calculate the Pearson residuals. 

```{r}
ep <- (counts - glm.pois$fitted)/sqrt(glm.pois$fitted)
range(ep - resid(glm.pois,type="pearson"))
```

Now, we can estimate the quasi-dispersion parameter
```{r}
n <- length(counts)
p <- length(coef(glm.pois))
sigma2 <- sum(ep^2)/(n-p)
sigma2
```

```{r}
poisRes <- summary(glm.pois)$coef
poisRes
qlRes <- summary(glm.qpois)$coef
qlRes
cbind(ql.se=qlRes[,2],pois.se=poisRes[,2],qlself.se=poisRes[,2]*sqrt(sigma2))
```

### Approximate F-test

LR -test with Poisson regression: 

```{r}
library(car)
Anova(glm.pois)
```
We check that the approximate F-statistic equals LRT/df/sigma2 

```{r}
Anova(glm.qpois, test="F")
Anova(glm.pois)[1,1]/Anova(glm.pois)[1,2]/sigma2
```


### Deviance

```{r}
deviance(glm.pois)
lsat <- sum(dpois(counts,counts,log = TRUE))
lcur <- sum(dpois(counts,glm.pois$fitted.values,log = TRUE))
2*(lsat-lcur)
```

Deviance residuals

```{r}
lsat.i <-  dpois(counts,counts,log = TRUE)
lcur.i <- dpois(counts,glm.pois$fitted.values,log = TRUE)
e.dev <- sign(counts-glm.pois$fitted) * sqrt(2*(lsat.i-lcur.i))
e.dev
resid(glm.pois,type="deviance")
```

```{r}
glm.pois.out.null <- glm(counts ~  treatment, family = poisson())
anova(glm.pois.out.null,glm.pois)
lrtfull <- logLik(glm.pois)
lrtout0 <- logLik(glm.pois.out.null)
2*(lrtfull-lrtout0)
```

### Approximate F-test 2

```{r}
glm.qpois.out.null <- glm(counts ~  treatment, family = quasipoisson())
anova(glm.qpois,glm.qpois.out.null,test = "F")
2*(lrtfull-lrtout0)
2*(lrtfull-lrtout0)/2/sigma2
```
