Researchers assessed the effect of spinal nerve ligation (SNL) on the transcriptome of rats. In this experiment, the transcriptome was profiled at two weeks and at two months after treatment, for both the SNL group and a control group. Two biological replicates were used for every treatment-time combination. The researchers are interested in early and late effects and in genes for which the effect changes over time.
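library(edgeR)    # DGEList, filterByExpr, glmFit, glmLRT, ...
library(Biobase)  # ExpressionSet accessors pData() and exprs()
file <- "http://bowtie-bio.sourceforge.net/recount/ExpressionSets/hammer_eset.RData"
load(url(file))   # loads the object hammer.eset into the workspace
hammer.eset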
## ExpressionSet (storageMode: lockedEnvironment)
## assayData: 29516 features, 8 samples
## element names: exprs
## protocolData: none
## phenoData
## sampleNames: SRX020102 SRX020103 ... SRX020098-101 (8 total)
## varLabels: sample.id num.tech.reps ... Time (5 total)
## varMetadata: labelDescription
## featureData
## featureNames: ENSRNOG00000000001 ENSRNOG00000000007 ...
## ENSRNOG00000045521 (29516 total)
## fvarLabels: gene
## fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
## Annotation:
pData(hammer.eset)
## sample.id num.tech.reps protocol strain Time
## SRX020102 SRX020102 1 control Sprague Dawley 2 months
## SRX020103 SRX020103 2 control Sprague Dawley 2 months
## SRX020104 SRX020104 1 L5 SNL Sprague Dawley 2 months
## SRX020105 SRX020105 2 L5 SNL Sprague Dawley 2months
## SRX020091-3 SRX020091-3 1 control Sprague Dawley 2 weeks
## SRX020088-90 SRX020088-90 2 control Sprague Dawley 2 weeks
## SRX020094-7 SRX020094-7 1 L5 SNL Sprague Dawley 2 weeks
## SRX020098-101 SRX020098-101 2 L5 SNL Sprague Dawley 2 weeks
library(tidyverse)
library(pheatmap)  # used for the heatmaps further down
hammer.eset %>% exprs %>% head
## SRX020102 SRX020103 SRX020104 SRX020105 SRX020091-3
## ENSRNOG00000000001 2 4 18 24 7
## ENSRNOG00000000007 4 1 3 1 5
## ENSRNOG00000000008 0 1 4 2 0
## ENSRNOG00000000009 0 0 0 0 0
## ENSRNOG00000000010 19 10 19 13 50
## ENSRNOG00000000012 7 5 1 0 31
## SRX020088-90 SRX020094-7 SRX020098-101
## ENSRNOG00000000001 4 93 77
## ENSRNOG00000000007 4 9 4
## ENSRNOG00000000008 5 2 6
## ENSRNOG00000000009 0 0 0
## ENSRNOG00000000010 57 45 58
## ENSRNOG00000000012 26 12 9
The researchers are interested in the effect of the treatment at the early time point, its effect at the late time point, and the treatment \(\times\) time interaction.
pData(hammer.eset)$time<-factor(rep(c("2m","2w"),each=4),levels = c("2w","2m"))
levels(pData(hammer.eset)$protocol)<-c("c","snl")
The read count \(y_{ig}\) for gene \(g\) of rat \(i\) is modelled as follows:
\[ \left\{ \begin{array}{lcl} y_{ig} &\sim& NB(\mu_{ig},\phi_g)\\ E[y_{ig}\vert \mathbf{x}_{ig}]&=&\mu_{ig}\\ log(\mu_{ig})&=&\eta_{ig}\\ \eta_{ig}&=&\beta_0 + \beta_{snl} x_{snl,i} + \beta_{t2m}x_{t2m,i} + \beta_\text{snl,t2m} x_{snl,i}x_{t2m,i} + \log N_i\\ \end{array}\right. \]
with \(x_{snl,i}\) a dummy variable that is 1 if rat \(i\) had the spinal nerve ligation and 0 otherwise, \(x_{t2m,i}\) a dummy variable that equals 1 if the rat was sacrificed after 2 months and 0 otherwise, and \(\log N_i\) a normalisation offset that corrects for sequencing depth. Note that \(\beta_{snl}\) is the main effect of spinal nerve ligation and corresponds to the average log fold change between treated and control rats after two weeks. The interaction \(\beta_\text{snl,t2m}\) can be interpreted as the difference between the late and the early time point in the log fold change between treated and control rats. The researchers are also interested in assessing a third contrast: the effect of the treatment at the late time point.
\[ \log \text{FC}^\text{2 months}_\text{snl-c}= \beta_{snl}+\beta_{snl,t2m}\]
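This contrast can be read off by writing out the linear predictor for the four treatment-time groups. The worked sketch below uses a shorthand that is not part of the original notation: \(\eta^*=\eta_{ig}-\log N_i\) denotes the linear predictor without the normalisation offset, and the group labels in the subscripts are introduced for illustration only.

\[ \begin{array}{lcl} \eta^*_\text{c,2w}&=&\beta_0\\ \eta^*_\text{snl,2w}&=&\beta_0+\beta_{snl}\\ \eta^*_\text{c,2m}&=&\beta_0+\beta_{t2m}\\ \eta^*_\text{snl,2m}&=&\beta_0+\beta_{snl}+\beta_{t2m}+\beta_\text{snl,t2m}\\ \end{array} \]

Subtracting the control group from the SNL group within each time point gives

\[ \begin{array}{lcl} \log \text{FC}^\text{2 weeks}_\text{snl-c}&=&\eta^*_\text{snl,2w}-\eta^*_\text{c,2w}=\beta_{snl}\\ \log \text{FC}^\text{2 months}_\text{snl-c}&=&\eta^*_\text{snl,2m}-\eta^*_\text{c,2m}=\beta_{snl}+\beta_\text{snl,t2m}\\ \log \text{FC}^\text{2 months}_\text{snl-c}-\log \text{FC}^\text{2 weeks}_\text{snl-c}&=&\beta_\text{snl,t2m}\\ \end{array} \]

so the intercept and the time main effect cancel in both fold changes, and the interaction is exactly the change in the treatment effect between the two time points.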
The design matrix is constructed for the linear predictor.
dge <- DGEList(counts=exprs(hammer.eset))
dge$samples
## group lib.size norm.factors
## SRX020102 1 5282855 1
## SRX020103 1 4562100 1
## SRX020104 1 4897807 1
## SRX020105 1 5123782 1
## SRX020091-3 1 17705411 1
## SRX020088-90 1 17449646 1
## SRX020094-7 1 23649094 1
## SRX020098-101 1 23537179 1
design <- model.matrix(~time*protocol,pData(hammer.eset))
rownames(design) = colnames(dge)
design
## (Intercept) time2m protocolsnl time2m:protocolsnl
## SRX020102 1 1 0 0
## SRX020103 1 1 0 0
## SRX020104 1 1 1 1
## SRX020105 1 1 1 1
## SRX020091-3 1 0 0 0
## SRX020088-90 1 0 0 0
## SRX020094-7 1 0 1 0
## SRX020098-101 1 0 1 0
## attr(,"assign")
## [1] 0 1 2 3
## attr(,"contrasts")
## attr(,"contrasts")$time
## [1] "contr.treatment"
##
## attr(,"contrasts")$protocol
## [1] "contr.treatment"
Filtering
keep <- filterByExpr(dge, design)
dge <- dge[keep, , keep.lib.sizes=FALSE]
dge <- calcNormFactors(dge)
dge$samples
## group lib.size norm.factors
## SRX020102 1 5279636 0.9980777
## SRX020103 1 4559314 0.9860762
## SRX020104 1 4894684 1.0233202
## SRX020105 1 5120633 1.0194303
## SRX020091-3 1 17694917 0.9642809
## SRX020088-90 1 17438982 0.9784500
## SRX020094-7 1 23631984 1.0185845
## SRX020098-101 1 23521582 1.0134838
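As a minimal sketch of how these two columns are used: edgeR multiplies the library size by the normalisation factor to obtain an effective library size, and the logarithm of that product plays the role of the offset \(\log N_i\) in the model. The numbers below are copied from the table above; the column names `eff.lib.size` and `offset` are illustrative and are not fields of the `DGEList` object.

```r
# Library sizes and TMM normalisation factors copied from dge$samples above
samples <- data.frame(
  lib.size = c(5279636, 4559314, 4894684, 5120633,
               17694917, 17438982, 23631984, 23521582),
  norm.factors = c(0.9980777, 0.9860762, 1.0233202, 1.0194303,
                   0.9642809, 0.9784500, 1.0185845, 1.0134838),
  row.names = c("SRX020102", "SRX020103", "SRX020104", "SRX020105",
                "SRX020091-3", "SRX020088-90", "SRX020094-7", "SRX020098-101")
)

# Effective library size and the corresponding log offset (illustrative names)
samples$eff.lib.size <- samples$lib.size * samples$norm.factors
samples$offset <- log(samples$eff.lib.size)
samples

# The normalisation factors are scaled so that their product is (close to) 1
prod(samples$norm.factors)
```

The normalisation factors are all close to 1, so the offsets are driven mainly by the raw library sizes, which differ considerably between the 2-month and the 2-week samples.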
An MDS plot shows the leading log fold changes between the 8 samples. There is a large effect of the SNL treatment. Are there issues with the design?
plotMDS(dge,labels=paste(hammer.eset$protocol,hammer.eset$time,sep="-"),col=as.double(hammer.eset$protocol))
dge <- estimateDisp(dge, design, robust=TRUE)
plotBCV(dge)
fit <- glmFit(dge,design)
DE at the early timepoint can be assessed by testing the null hypothesis
\[ H_0: \log \text{FC}^\text{2 weeks}_\text{snl-c}=0 \rightarrow \beta_\text{snl}=0 \]
against the two-sided alternative hypothesis
\[ H_1: \log \text{FC}^\text{2 weeks}_\text{snl-c}\neq 0 \rightarrow \beta_\text{snl}\neq0 \]
early <- glmLRT(fit,coef="protocolsnl")
ttEarly <- topTags(early, n = nrow(dge)) # all genes
hist(ttEarly$table$PValue,main="early",xlab="p-values")
summary(dtEarly <- decideTestsDGE(early))
## protocolsnl
## Down 3445
## NotSig 6859
## Up 3722
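With its default arguments, `decideTestsDGE` calls a gene significant when its Benjamini-Hochberg adjusted p-value is below 0.05 and then splits the significant genes by the sign of the log fold change. The sketch below mimics that rule in base R on six hypothetical genes; the p-values and log fold changes are made up for illustration and are not taken from the data.

```r
# Hypothetical p-values and log fold changes for six genes (made-up numbers)
p     <- c(0.0001, 0.004, 0.019, 0.03, 0.2, 0.6)
logFC <- c(2.1, -1.3, 0.8, -0.5, 0.2, -0.1)

# Benjamini-Hochberg adjustment controls the false discovery rate
fdr <- p.adjust(p, method = "BH")

# -1 = down, 0 = not significant, 1 = up, at the 5% FDR level
status <- ifelse(fdr < 0.05, sign(logFC), 0)
table(factor(status, levels = c(-1, 0, 1), labels = c("Down", "NotSig", "Up")))
```

Applied to the real results, the same count is available as `sum(ttEarly$table$FDR < 0.05)`, because `topTags` reports BH-adjusted p-values in its `FDR` column by default.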
volcano <- ggplot(ttEarly$table,aes(x=logFC,y=-log10(PValue),color=FDR<0.05)) + geom_point() + scale_color_manual(values=c("black","red"))
volcano
plotSmear(early,de.tags=rownames(dge)[as.logical(dtEarly)],ylab="log FC snl - c (2 weeks)")
Because there are 7167 significant genes we restrict the heatmap to the top 30 genes.
pheatmap(cpm(dge,log=TRUE)[rownames(ttEarly$table)[1:30],],labels_col = paste(hammer.eset$protocol,hammer.eset$time,sep="-"))
The effect of the treatment after two months can be estimated by the log fold change corresponding to the sum of the main effect and the interaction. This can be assessed by testing the null hypothesis
\[ H_0: \log \text{FC}^\text{2 months}_\text{snl-c}=0 \rightarrow \beta_\text{snl}+\beta_\text{snl,t2m}=0 \]
against the two-sided alternative hypothesis
\[ H_1: \log \text{FC}^\text{2 months}_\text{snl-c}\neq0 \rightarrow \beta_\text{snl}+\beta_\text{snl,t2m}\neq0 \]
L <- array(0,ncol(design))
names(L) <- colnames(design)
L[c(3,4)] <- 1
L
## (Intercept) time2m protocolsnl time2m:protocolsnl
## 0 0 1 1
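The same contrast vector can also be derived instead of typed by hand, by subtracting the design-matrix row of one group from that of another. The sketch below is self-contained base R: `pd` is a hypothetical stand-in for `pData(hammer.eset)` with the same factor coding, and the names `G`, `Learly`, `Llate` and `Linter` are introduced here for illustration only.

```r
# Hypothetical stand-in for pData(hammer.eset), using the same factor coding
pd <- data.frame(
  time     = factor(rep(c("2m", "2w"), each = 4), levels = c("2w", "2m")),
  protocol = factor(rep(c("c", "c", "snl", "snl"), times = 2),
                    levels = c("c", "snl"))
)
X <- model.matrix(~ time * protocol, data = pd)

# Keep one design row per treatment-time group and label it
grp <- paste(pd$protocol, pd$time, sep = ".")
G <- X[!duplicated(grp), ]
rownames(G) <- grp[!duplicated(grp)]

# A contrast is the difference between the design rows of two groups
Learly <- G["snl.2w", ] - G["c.2w", ]   # treatment effect after 2 weeks
Llate  <- G["snl.2m", ] - G["c.2m", ]   # treatment effect after 2 months
Linter <- Llate - Learly                # change in treatment effect over time
rbind(Learly, Llate, Linter)
```

Here `Llate` reproduces the vector `L` shown above, while `Learly` and `Linter` each select a single coefficient, which is why those two tests can be specified with `coef=` rather than `contrast=`.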
late <- glmLRT(fit,contrast=L)
ttLate <- topTags(late, n = nrow(dge)) # all genes
hist(ttLate$table$PValue,main="late",xlab="p-values")
summary(dtLate <- decideTestsDGE(late))
## 1*protocolsnl 1*time2m:protocolsnl
## Down 3103
## NotSig 7446
## Up 3477
volcano <- ggplot(ttLate$table,aes(x=logFC,y=-log10(PValue),color=FDR<0.05)) + geom_point() + scale_color_manual(values=c("black","red"))
volcano
plotSmear(late,de.tags=rownames(dge)[as.logical(dtLate)],ylab="log FC snl - c (2 months)")
Because there are 6580 significant genes we restrict the heatmap to the top 30 genes.
pheatmap(cpm(dge,log=TRUE)[rownames(ttLate$table)[1:30],],labels_col = paste(hammer.eset$protocol,hammer.eset$time,sep="-"))
To assess whether the treatment effect changes over time, we test the null hypothesis that the interaction term equals zero
\[ H_0: \log \text{FC}^\text{2 months}_\text{snl-c}-\log \text{FC}^\text{2 weeks}_\text{snl-c}=0 \rightarrow \beta_\text{snl,t2m}=0 \]
against the two-sided alternative hypothesis
\[ H_1: \log \text{FC}^\text{2 months}_\text{snl-c}-\log \text{FC}^\text{2 weeks}_\text{snl-c}\neq0 \rightarrow \beta_\text{snl,t2m}\neq0 \]
inter <- glmLRT(fit,coef="time2m:protocolsnl")
ttInter <- topTags(inter, n = nrow(dge)) # all genes
hist(ttInter$table$PValue,main="interaction",xlab="p-values")
summary(dtInter <- decideTestsDGE(inter))
## time2m:protocolsnl
## Down 17
## NotSig 13992
## Up 17
volcano <- ggplot(ttInter$table,aes(x=logFC,y=-log10(PValue),color=FDR<0.05)) + geom_point() + scale_color_manual(values=c("black","red"))
volcano
plotSmear(inter,de.tags=rownames(dge)[as.logical(dtInter)],ylab="log FC_late - log FC_early")
Because there are 34 significant genes we plot them all in the heatmap.
pheatmap(cpm(dge,log=TRUE)[as.logical(dtInter),],labels_col = paste(hammer.eset$protocol,hammer.eset$time,sep="-"))
Very many genes are DE between the SNL and control groups, at both the early and the late time point.
Are there issues with the design?
Very few interactions are significant. Can you explain this?
ttEarly$table <- ttEarly$table %>%
  mutate(z = sign(logFC) * abs(qnorm(PValue/2)))
ttEarly$table %>%
  ggplot(aes(x=z)) +
  geom_histogram(aes(y = ..density..), color = "black") +
  stat_function(fun = dnorm,
                args = list(
                  mean = 0,
                  sd=1)
  )
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
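The transformation used above turns a two-sided p-value and the sign of the log fold change into a signed z-score. As a sketch of what the reference curve represents, the simulation below uses purely hypothetical data in which no gene is DE (uniform p-values and random signs); in that situation the z-scores follow the standard normal density that `stat_function` overlays on the histogram.

```r
# Simulated null scenario (hypothetical data, not the hammer results)
set.seed(1)
n <- 10000
p <- runif(n)                                 # null p-values are uniform on (0, 1)
s <- sample(c(-1, 1), n, replace = TRUE)      # random direction of the log fold change

# Same transformation as applied to ttEarly$table above
z <- s * abs(qnorm(p / 2))

# Under the null the z-scores have mean close to 0 and sd close to 1
c(mean = mean(z), sd = sd(z))
hist(z, breaks = 50, freq = FALSE, main = "null z-scores")
curve(dnorm(x), add = TRUE)
```

Comparing the histogram of the observed z-scores with this reference shape shows how far the bulk of the genes departs from what would be expected if the treatment had no effect.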