Researchers assessed the effect of spinal nerve ligation (SNL) on the transcriptome of rats. Transcriptome profiling was performed two weeks and two months after treatment, in both the SNL group and a control group, with two biological replicates for every treatment-time combination. The researchers are interested in early and late effects and in genes for which the effect changes over time.
library(Biobase)
library(edgeR)
library(pheatmap)
file <- "http://bowtie-bio.sourceforge.net/recount/ExpressionSets/hammer_eset.RData"
load(url(file))  # loads the ExpressionSet hammer.eset
hammer.eset
## ExpressionSet (storageMode: lockedEnvironment)
## assayData: 29516 features, 8 samples
## element names: exprs
## protocolData: none
## phenoData
## sampleNames: SRX020102 SRX020103 ... SRX020098-101 (8 total)
## varLabels: sample.id num.tech.reps ... Time (5 total)
## varMetadata: labelDescription
## featureData
## featureNames: ENSRNOG00000000001 ENSRNOG00000000007 ...
## ENSRNOG00000045521 (29516 total)
## fvarLabels: gene
## fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
## Annotation:
pData(hammer.eset)
## sample.id num.tech.reps protocol strain Time
## SRX020102 SRX020102 1 control Sprague Dawley 2 months
## SRX020103 SRX020103 2 control Sprague Dawley 2 months
## SRX020104 SRX020104 1 L5 SNL Sprague Dawley 2 months
## SRX020105 SRX020105 2 L5 SNL Sprague Dawley 2months
## SRX020091-3 SRX020091-3 1 control Sprague Dawley 2 weeks
## SRX020088-90 SRX020088-90 2 control Sprague Dawley 2 weeks
## SRX020094-7 SRX020094-7 1 L5 SNL Sprague Dawley 2 weeks
## SRX020098-101 SRX020098-101 2 L5 SNL Sprague Dawley 2 weeks
library(tidyverse)
hammer.eset %>% exprs %>% head
## SRX020102 SRX020103 SRX020104 SRX020105 SRX020091-3
## ENSRNOG00000000001 2 4 18 24 7
## ENSRNOG00000000007 4 1 3 1 5
## ENSRNOG00000000008 0 1 4 2 0
## ENSRNOG00000000009 0 0 0 0 0
## ENSRNOG00000000010 19 10 19 13 50
## ENSRNOG00000000012 7 5 1 0 31
## SRX020088-90 SRX020094-7 SRX020098-101
## ENSRNOG00000000001 4 93 77
## ENSRNOG00000000007 4 9 4
## ENSRNOG00000000008 5 2 6
## ENSRNOG00000000009 0 0 0
## ENSRNOG00000000010 57 45 58
## ENSRNOG00000000012 26 12 9
The researchers are interested in the effect of the treatment at the early time point, the effect at the late time point, and the treatment \(\times\) time interaction.
The following model is used at the gene level to model the read count \(y_{ig}\) for gene \(g\) of rat \(i\).
With quasi-likelihood, we do not specify the full distribution, only the first two moments (mean and variance).
\[ \left\{ \begin{array}{lcl} E[y_{ig}\vert \mathbf{x}_{ig}]&=&\mu_{ig}\\ \log(\mu_{ig})&=&\eta_{ig}\\ \eta_{ig}&=&\beta_0 + \beta_\text{snl} x_{\text{snl},i} + \beta_\text{t2m}x_{\text{t2m},i} + \beta_\text{snl,t2m} x_{\text{snl},i}x_{\text{t2m},i} + \log N_i\\ \text{Var}[y_{ig}\vert \mathbf{x}_{ig}]&=&\sigma^2_g\left(\mu_{ig}+\phi\mu_{ig}^2\right) \end{array}\right. \]
with \(x_{\text{snl},i}\) a dummy variable that equals 1 if rat \(i\) underwent spinal nerve ligation and 0 otherwise, \(x_{\text{t2m},i}\) a dummy variable that equals 1 if the rat was sacrificed after 2 months and 0 otherwise, and \(\log N_i\) a normalisation offset that corrects for sequencing depth. Note that \(\beta_\text{snl}\) is the main effect of spinal nerve ligation and corresponds to the average log fold change between treated and control rats after two weeks. The interaction \(\beta_\text{snl,t2m}\) can be interpreted as the difference between the average log fold change (treated vs control) at the late time point and the one at the early time point. The researchers are also interested in assessing a third contrast: the effect of the treatment at the late time point,
\[ \log \text{FC}^\text{2 months}_\text{snl-c}= \beta_\text{snl}+\beta_\text{snl,t2m}\]
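A minimal base-R sketch can make this interpretation concrete. It rebuilds the same design on a toy sample sheet (toy, laid out like pData(hammer.eset)) and multiplies it with made-up coefficients beta; differences between the resulting cell log-means should recover \(\beta_\text{snl}\), \(\beta_\text{snl}+\beta_\text{snl,t2m}\) and \(\beta_\text{snl,t2m}\). The names toy and beta and all numbers are illustrative assumptions, and the offset \(\log N_i\) is omitted.

```r
# Toy sample sheet with the same 2 x 2 layout (2 replicates per cell)
toy <- data.frame(
  time = factor(rep(c("2m", "2w"), each = 4), levels = c("2w", "2m")),
  protocol = factor(rep(rep(c("c", "snl"), each = 2), times = 2),
                    levels = c("c", "snl"))
)
X <- model.matrix(~ time * protocol, toy)

# Hypothetical coefficients on the log scale (not estimates from the data)
beta <- c("(Intercept)" = 3, time2m = 0.2, protocolsnl = 1.5,
          "time2m:protocolsnl" = -0.5)
eta <- drop(X %*% beta[colnames(X)])  # linear predictor without the offset

# Average log-mean per protocol (rows) and time point (columns)
cell <- tapply(eta, list(toy$protocol, toy$time), mean)
logFC_2w <- cell["snl", "2w"] - cell["c", "2w"]
logFC_2m <- cell["snl", "2m"] - cell["c", "2m"]
print(c(logFC_2w = logFC_2w, logFC_2m = logFC_2m,
        change = logFC_2m - logFC_2w))
```

The early log fold change equals the protocolsnl coefficient, the late one equals the sum of protocolsnl and time2m:protocolsnl, and their difference is the interaction.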
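We recode time into a new factor time with 2 weeks as the reference level (note that the original Time column contains the inconsistent label "2months" for one sample), shorten the protocol labels, and construct the design matrix for the linear predictor.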
# Recode time (2 weeks as reference) and shorten the protocol labels
pData(hammer.eset)$time <- factor(rep(c("2m","2w"), each=4), levels = c("2w","2m"))
levels(pData(hammer.eset)$protocol) <- c("c","snl")
dge <- DGEList(counts=exprs(hammer.eset))
dge$samples
## group lib.size norm.factors
## SRX020102 1 5282855 1
## SRX020103 1 4562100 1
## SRX020104 1 4897807 1
## SRX020105 1 5123782 1
## SRX020091-3 1 17705411 1
## SRX020088-90 1 17449646 1
## SRX020094-7 1 23649094 1
## SRX020098-101 1 23537179 1
design <- model.matrix(~time*protocol, pData(hammer.eset))
rownames(design) <- colnames(dge)
design
## (Intercept) time2m protocolsnl time2m:protocolsnl
## SRX020102 1 1 0 0
## SRX020103 1 1 0 0
## SRX020104 1 1 1 1
## SRX020105 1 1 1 1
## SRX020091-3 1 0 0 0
## SRX020088-90 1 0 0 0
## SRX020094-7 1 0 1 0
## SRX020098-101 1 0 1 0
## attr(,"assign")
## [1] 0 1 2 3
## attr(,"contrasts")
## attr(,"contrasts")$time
## [1] "contr.treatment"
##
## attr(,"contrasts")$protocol
## [1] "contr.treatment"
Filtering
# Keep genes with sufficient counts given the design
keep <- filterByExpr(dge, design)
dge <- dge[keep, , keep.lib.sizes=FALSE]
# TMM normalisation factors
dge <- calcNormFactors(dge)
dge$samples
## group lib.size norm.factors
## SRX020102 1 5279636 0.9980777
## SRX020103 1 4559314 0.9860762
## SRX020104 1 4894684 1.0233202
## SRX020105 1 5120633 1.0194303
## SRX020091-3 1 17694917 0.9642809
## SRX020088-90 1 17438982 0.9784500
## SRX020094-7 1 23631984 1.0185845
## SRX020098-101 1 23521582 1.0134838
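To illustrate what filterByExpr does, here is a simplified sketch of its default rule as we understand it from the edgeR documentation: a CPM cutoff of min.count = 10 divided by the median library size (in millions), a minimum number of samples derived from the design (approximated here as the inverse of the largest leverage), and min.total.count = 15. It ignores the min.prop adjustment for large groups and is not the package code. The helper simple_filter and the five-gene count matrix are hypothetical; the library sizes are those printed by dge$samples before filtering.

```r
# Library sizes as printed by dge$samples before filtering
lib_size <- c(5282855, 4562100, 4897807, 5123782,
              17705411, 17449646, 23649094, 23537179)

# Same 2 x 2 layout as the experiment (2 replicates per cell)
toy <- data.frame(
  time = factor(rep(c("2m", "2w"), each = 4), levels = c("2w", "2m")),
  protocol = factor(rep(rep(c("c", "snl"), each = 2), times = 2),
                    levels = c("c", "snl"))
)
toy_design <- model.matrix(~ time * protocol, toy)

# Invented counts for five genes (rows) in the eight samples (columns)
counts <- rbind(
  geneA = c(0, 0, 0, 0, 0, 0, 0, 0),
  geneB = c(0, 0, 0, 0, 0, 0, 30, 30),
  geneC = c(5, 0, 0, 0, 0, 0, 0, 0),
  geneD = c(3, 3, 3, 3, 3, 3, 3, 3),
  geneE = c(10, 10, 10, 10, 40, 40, 40, 40)
)

# Hypothetical helper mimicking the default filtering idea
simple_filter <- function(counts, lib_size, design,
                          min_count = 10, min_total_count = 15) {
  leverage <- rowSums(qr.Q(qr(design))^2)       # hat values of the design
  min_n <- 1 / max(leverage)                    # smallest effective group size
  cpm_cutoff <- min_count / median(lib_size) * 1e6
  cpm <- sweep(counts, 2, lib_size, "/") * 1e6  # counts per million
  keep_cpm <- rowSums(cpm >= cpm_cutoff) >= min_n - 1e-14
  keep_total <- rowSums(counts) >= min_total_count - 1e-14
  keep_cpm & keep_total
}

toy_keep <- simple_filter(counts, lib_size, toy_design)
print(toy_keep)
```

With two replicates per cell every leverage is 0.5, so a gene must pass the CPM cutoff in at least two samples; geneC passes it in only one sample and is dropped.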
An MDS plot shows the leading log fold changes between the 8 samples. There is a large effect of SNL. Are there issues with the design?
plotMDS(dge,labels=paste(hammer.eset$protocol,hammer.eset$time,sep="-"),col=as.double(hammer.eset$protocol))
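The "leading log fold change" between two samples is, by default in limma's plotMDS (which edgeR calls on log-CPM values), the root-mean-square of the 500 largest absolute log fold changes for that pair. The helper leading_logfc_dist below is a hypothetical, simplified re-implementation of that idea, not the package code; it is applied to simulated log-CPM values, and classical MDS (cmdscale) turns the distances into coordinates.

```r
# Hypothetical helper: root-mean-square of the `top` largest |log-FC| per pair
leading_logfc_dist <- function(logcpm, top = 500) {
  n <- ncol(logcpm)
  d <- matrix(0, n, n, dimnames = list(colnames(logcpm), colnames(logcpm)))
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      absdiff <- sort(abs(logcpm[, i] - logcpm[, j]), decreasing = TRUE)
      k <- min(top, length(absdiff))
      d[i, j] <- d[j, i] <- sqrt(mean(absdiff[seq_len(k)]^2))
    }
  }
  as.dist(d)
}

# Simulated log-CPM: 2000 genes, 8 samples; samples 5-8 get a hypothetical
# shift in 200 genes to mimic a strong group effect
set.seed(1)
logcpm <- matrix(rnorm(2000 * 8, mean = 5), nrow = 2000,
                 dimnames = list(NULL, paste0("s", 1:8)))
logcpm[1:200, 5:8] <- logcpm[1:200, 5:8] + 4

d <- leading_logfc_dist(logcpm)
coords <- cmdscale(d, k = 2)  # classical MDS on the distance matrix
print(round(coords, 2))
```

Samples from the shifted group end up far from the others along the first dimension, which is how a large treatment (or batch) effect shows up in the MDS plot.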
By replacing glmFit with glmQLFit, we can perform inference with quasi-likelihood. We also replace glmLRT with glmQLFTest. Note that, when estimating the additional dispersion parameter \(\sigma^2_g\), we can correct for the degrees of freedom used to estimate the mean model parameters. This allows us to account for the uncertainty in \(\sigma^2_g\) in the inference: glmQLFTest uses an F-distribution instead of the asymptotic \(\chi^2\)-distribution.
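The effect of using an F reference instead of \(\chi^2\) can be seen with a single made-up test statistic. The residual degrees of freedom below are those of this design (8 samples minus 4 mean parameters). This is only an illustration: in edgeR, glmQLFTest also adds prior degrees of freedom from the empirical Bayes moderation of \(\sigma^2_g\), so its denominator degrees of freedom are larger than 4.

```r
stat <- 5          # hypothetical statistic for one coefficient (1 numerator df)
df_resid <- 8 - 4  # 8 samples, 4 mean-model parameters

p_chisq <- pchisq(stat, df = 1, lower.tail = FALSE)            # asymptotic reference
p_F     <- pf(stat, df1 = 1, df2 = df_resid, lower.tail = FALSE)  # finite-df reference
p_F_big <- pf(stat, df1 = 1, df2 = 1e6, lower.tail = FALSE)    # many df: close to chi-square

print(c(chisq = p_chisq, F_4df = p_F, F_many_df = p_F_big))
```

With few residual degrees of freedom the F-based p-value is clearly larger, reflecting the uncertainty in the estimated dispersion; as the degrees of freedom grow, both references agree.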
dge <- estimateDisp(dge, design, robust=TRUE)
plotBCV(dge)
fit <- glmQLFit(dge, design)
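estimateDisp estimates the negative binomial dispersion \(\phi\), and plotBCV shows the biological coefficient of variation \(\text{BCV}=\sqrt{\phi}\). A quick simulation can confirm the mean-variance relation \(\mu+\phi\mu^2\) from the model above for \(\sigma^2_g=1\). The values of mu and phi are arbitrary choices; R's rnbinom parametrises the negative binomial with size = 1/phi.

```r
set.seed(2024)
mu  <- 50    # arbitrary mean count
phi <- 0.1   # arbitrary NB dispersion, i.e. BCV = sqrt(0.1)

# rnbinom's 'size' is the inverse of the dispersion phi
y <- rnbinom(1e5, mu = mu, size = 1 / phi)

theoretical_var <- mu + phi * mu^2  # Poisson part + extra-Poisson part
print(c(empirical_mean = mean(y), empirical_var = var(y),
        theoretical_var = theoretical_var, BCV = sqrt(phi)))
```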
Differential expression (DE) at the early time point can be assessed by testing the null hypothesis
\[ H_0: \log \text{FC}^\text{2 weeks}_\text{snl-c}=0 \rightarrow \beta_\text{snl}=0 \]
against the two-sided alternative hypothesis
\[ H_1: \log \text{FC}^\text{2 weeks}_\text{snl-c}\neq 0 \rightarrow \beta_\text{snl}\neq0 \]
early <- glmQLFTest(fit, coef="protocolsnl")
ttEarly <- topTags(early, n = nrow(dge)) # all genes
hist(ttEarly$table$PValue)
summary(dtEarly <- decideTestsDGE(early))
## protocolsnl
## Down 3425
## NotSig 6924
## Up 3677
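By default, decideTestsDGE adjusts the p-values with the Benjamini-Hochberg method and calls a gene up- or down-regulated when the adjusted p-value is below 0.05, using the sign of the log fold change. The base-R sketch below mimics that default rule on five hypothetical genes; the table tt and its numbers are invented for illustration.

```r
# Five hypothetical genes (invented numbers, not from the Hammer data)
tt <- data.frame(
  logFC  = c(2.1, -1.4, 0.7, -0.3, 0.05),
  PValue = c(0.001, 0.008, 0.02, 0.2, 0.6),
  row.names = paste0("gene", 1:5)
)
tt$FDR <- p.adjust(tt$PValue, method = "BH")  # Benjamini-Hochberg adjustment

# -1 = down, 0 = not significant, 1 = up (mimicking the default rule)
tt$call <- ifelse(tt$FDR < 0.05, sign(tt$logFC), 0)
print(tt)
print(table(factor(tt$call, levels = c(-1, 0, 1),
                   labels = c("Down", "NotSig", "Up"))))
```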
volcano <- ggplot(ttEarly$table, aes(x=logFC, y=-log10(PValue), color=FDR<0.05)) +
  geom_point() +
  scale_color_manual(values=c("black","red"))
volcano
plotSmear(early, de.tags=rownames(dge)[as.logical(dtEarly)], ylab="log FC (SNL - control), 2 weeks")
pheatmap(cpm(dge,log=TRUE)[rownames(ttEarly$table)[1:30],],labels_col = paste(hammer.eset$protocol,hammer.eset$time,sep="-"))
The effect of the treatment after two months can be estimated by the log fold change corresponding to the sum of the SNL main effect \(\beta_\text{snl}\) and the interaction \(\beta_\text{snl,t2m}\). This can be assessed by testing the null hypothesis
\[ H_0: \log \text{FC}^\text{2 months}_\text{snl-c}=0 \rightarrow \beta_\text{snl}+\beta_\text{snl,t2m}=0 \]
against the two-sided alternative hypothesis
\[ H_1: \log \text{FC}^\text{2 months}_\text{snl-c}\neq0 \rightarrow \beta_\text{snl}+\beta_\text{snl,t2m}\neq0 \]
L <- array(0, ncol(design))
names(L) <- colnames(design)
L[c(3,4)] <- 1  # protocolsnl + time2m:protocolsnl
L
## (Intercept) time2m protocolsnl time2m:protocolsnl
## 0 0 1 1
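An equivalent route to the late effect, shown here as an alternative rather than the approach used in this analysis, is to reparametrise the mean model as ~ time/protocol (that is, time + time:protocol), so that the SNL effect within each time point becomes a coefficient of its own. Both designs span the same four cell means, so the nested coefficient time2m:protocolsnl should equal the contrast L applied to the interaction design. The check uses an ordinary linear model on an arbitrary simulated response y as a stand-in for the log-linear GLM; toy, y and contrast_late are hypothetical names.

```r
toy <- data.frame(
  time = factor(rep(c("2m", "2w"), each = 4), levels = c("2w", "2m")),
  protocol = factor(rep(rep(c("c", "snl"), each = 2), times = 2),
                    levels = c("c", "snl"))
)
X_int    <- model.matrix(~ time * protocol, toy)  # design used in the text
X_nested <- model.matrix(~ time / protocol, toy)  # SNL effect nested within time
print(colnames(X_nested))

set.seed(3)
y <- rnorm(8)  # arbitrary response; both fits reproduce the same 4 cell means

b_int    <- coef(lm(y ~ X_int - 1))
b_nested <- coef(lm(y ~ X_nested - 1))
names(b_int)    <- colnames(X_int)
names(b_nested) <- colnames(X_nested)

contrast_late <- c(0, 0, 1, 1)  # same weights as L in the text
print(c(contrast_L = sum(contrast_late * b_int),
        nested_coef = unname(b_nested["time2m:protocolsnl"])))
```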
late <- glmQLFTest(fit, contrast=L)
ttLate <- topTags(late, n = nrow(dge)) # all genes
hist(ttLate$table$PValue)
summary(dtLate <- decideTestsDGE(late))
## 1*protocolsnl 1*time2m:protocolsnl
## Down 3113
## NotSig 7465
## Up 3448
volcano <- ggplot(ttLate$table, aes(x=logFC, y=-log10(PValue), color=FDR<0.05)) +
  geom_point() +
  scale_color_manual(values=c("black","red"))
volcano
plotSmear(late, de.tags=rownames(dge)[as.logical(dtLate)], ylab="log FC (SNL - control), 2 months")
pheatmap(cpm(dge,log=TRUE)[rownames(ttLate$table)[1:30],],labels_col = paste(hammer.eset$protocol,hammer.eset$time,sep="-"))
To assess whether the treatment effect changes over time, we test the null hypothesis that the interaction term equals zero
\[ H_0: \log \text{FC}^\text{2 months}_\text{snl-c}-\log \text{FC}^\text{2 weeks}_\text{snl-c}=0 \rightarrow \beta_\text{snl,t2m}=0 \]
against the two-sided alternative hypothesis that it differs from zero
\[ H_1: \log \text{FC}^\text{2 months}_\text{snl-c}-\log \text{FC}^\text{2 weeks}_\text{snl-c}\neq0 \rightarrow \beta_\text{snl,t2m}\neq0 \]
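The interaction is a difference of two log fold changes, so it is estimated with more uncertainty than either simple effect. Under an ordinary least-squares approximation with equal weights (an assumption; edgeR's GLM uses mean-dependent weights), the variance of a contrast estimate \(c^T\hat\beta\) is proportional to \(c^T(X^TX)^{-1}c\). For this balanced design with two replicates per cell, the factor is 1 for the early and late effects and 2 for the interaction, so the interaction's standard error is about \(\sqrt{2}\) times larger.

```r
toy <- data.frame(
  time = factor(rep(c("2m", "2w"), each = 4), levels = c("2w", "2m")),
  protocol = factor(rep(rep(c("c", "snl"), each = 2), times = 2),
                    levels = c("c", "snl"))
)
X <- model.matrix(~ time * protocol, toy)
XtX_inv <- solve(crossprod(X))  # (X^T X)^{-1}

cmat <- rbind(
  early       = c(0, 0, 1, 0),  # beta_snl
  late        = c(0, 0, 1, 1),  # beta_snl + beta_snl,t2m
  interaction = c(0, 0, 0, 1)   # beta_snl,t2m
)
# Variance factor c^T (X^T X)^{-1} c for each contrast (in units of sigma^2)
var_factor <- apply(cmat, 1, function(cc) drop(t(cc) %*% XtX_inv %*% cc))
print(var_factor)
print(sqrt(var_factor["interaction"] / var_factor["early"]))  # ratio of SEs
```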
inter <- glmQLFTest(fit, coef="time2m:protocolsnl")
ttInter <- topTags(inter, n = nrow(dge)) # all genes
hist(ttInter$table$PValue)
summary(dtInter <- decideTestsDGE(inter))
## time2m:protocolsnl
## Down 0
## NotSig 14025
## Up 1
volcano <- ggplot(ttInter$table, aes(x=logFC, y=-log10(PValue), color=FDR<0.05)) +
  geom_point() +
  scale_color_manual(values=c("black","red"))
volcano
plotSmear(inter,de.tags=rownames(dge)[as.logical(dtInter)],ylab="log FC_late - log FC_early")
pheatmap(cpm(dge,log=TRUE)[rownames(ttInter$table)[1:30],],labels_col = paste(hammer.eset$protocol,hammer.eset$time,sep="-"))
There are very many genes that are DE between SNL and control rats at both the early and the late time point.
Are there issues with the design?
Very few interaction effects are significant. Can you explain this?
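# Convert the two-sided p-values to signed z-scores
ttEarly$table <- ttEarly$table %>%
  mutate(z = sign(logFC) * abs(qnorm(PValue/2)))
# Histogram of the z-scores with the standard normal density overlaid
ttEarly$table %>%
  ggplot(aes(x=z)) +
  geom_histogram(aes(y = after_stat(density)), color = "black") +
  stat_function(fun = dnorm, args = list(mean = 0, sd = 1))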
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
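The transformation above maps each two-sided p-value to a signed z-score, \(z=\text{sign}(\log\text{FC})\,\vert\Phi^{-1}(p/2)\vert\), which can be inverted as \(p=2\Phi(-\vert z\vert)\). If all genes were null, the p-values would be uniform and the z-scores would follow the standard normal density drawn over the histogram. The simulation below checks both properties with made-up null p-values and random signs; it does not use the Hammer data.

```r
set.seed(42)
n <- 1e5
p <- runif(n)                                      # null p-values: uniform on (0, 1)
logFC_sign <- sample(c(-1, 1), n, replace = TRUE)  # arbitrary signs

z <- logFC_sign * abs(qnorm(p / 2))  # same transformation as in the text

# Round trip: the two-sided p-value is recovered from |z|
p_back <- 2 * pnorm(-abs(z))

print(c(mean_z = mean(z), sd_z = sd(z),
        max_roundtrip_error = max(abs(p_back - p))))
```

Departures of the observed z-score histogram from this N(0, 1) reference (for example, heavy tails) indicate many non-null genes.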