After fertilization but prior to the onset of zygotic transcription, the C. elegans zygote cleaves asymmetrically to create the anterior AB and posterior P1 blastomeres, each of which goes on to generate distinct cell lineages. To understand how patterns of RNA inheritance and abundance arise after this first asymmetric cell division, we pooled hand-dissected AB and P1 blastomeres and performed RNA-seq. http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE59943

The reads have been aligned to the genome and summarized to gene level in the Rsubread tutorial.

library(edgeR)

## Loading required package: limma

1 Read featurecounts object

We import the featurecounts object that we have stored.

fc <- readRDS("fcElegans.rds")
names(fc)

## [1] "counts"     "annotation" "targets"    "stat"

counts_featurecounts <- fc$counts
head(counts_featurecounts)

##                SRR1532959.bam SRR1532960.bam SRR1532961.bam SRR1532962.bam
## WBGene00197333              0              0              0              0
## WBGene00198386              0              0              0              0
## WBGene00015153              0              0              0              0
## WBGene00002061            152            253            157             95
## WBGene00255704              0              0              0              0
## WBGene00235314              0              0              0              0
##                SRR1532963.bam SRR1532964.bam
## WBGene00197333              0              0
## WBGene00198386              0              0
## WBGene00015153              1              2
## WBGene00002061            144            152
## WBGene00255704              0              0
## WBGene00235314              0              0

dim(counts_featurecounts)

## [1] 46904     6

fc$stat

##                           Status SRR1532959.bam SRR1532960.bam
## 1                       Assigned        1657500        1323934
## 2            Unassigned_Unmapped          98398         257946
## 3           Unassigned_Read_Type              0              0
## 4           Unassigned_Singleton              0              0
## 5      Unassigned_MappingQuality              0              0
## 6             Unassigned_Chimera              0              0
## 7      Unassigned_FragmentLength              0              0
## 8           Unassigned_Duplicate              0              0
## 9        Unassigned_MultiMapping              0              0
## 10          Unassigned_Secondary              0              0
## 11           Unassigned_NonSplit              0              0
## 12         Unassigned_NoFeatures           9430          11104
## 13 Unassigned_Overlapping_Length              0              0
## 14          Unassigned_Ambiguity          18542          17985
##    SRR1532961.bam SRR1532962.bam SRR1532963.bam SRR1532964.bam
## 1         2013110        1304759        2568046        2208253
## 2           67401          82920         108861          92578
## 3               0              0              0              0
## 4               0              0              0              0
## 5               0              0              0              0
## 6               0              0              0              0
## 7               0              0              0              0
## 8               0              0              0              0
## 9               0              0              0              0
## 10              0              0              0              0
## 11              0              0              0              0
## 12          10279           5867          11931           9791
## 13              0              0              0              0
## 14          26523          16467          47238          27401

1.1 Read Meta Data

target<-readRDS("elegansMetaData.rds")

2 Data Analysis

2.1 Setup count object edgeR

dge<-DGEList(fc$counts)
substr(colnames(dge),1,10)==target$Run

## [1] TRUE TRUE TRUE TRUE TRUE TRUE

Order of information in target file and count table is the same

colnames(dge)<-paste0(target$`cell type:ch1`,target$`replicate:ch1`)
cellType<-as.factor(target$`cell type:ch1`)

2.2 Normalisation

dge<-calcNormFactors(dge)

2.3 Data exploration

One way to reduce dimensionality is the use of multidimensional scaling (MDS). For MDS, we first have to calculate all pairwise distances between our objects (samples in this case), and then create a (typically) two-dimensional representation where these pre-calculated distances are represented as accurately as possible. This means that depending on how the pairwise sample distances are defined, the two-dimensional plot can be very different, and it is important to choose a distance that is suitable for the type of data at hand.

edgeR contains a function plotMDS, which operates on a DGEList object and generates a two-dimensional MDS representation of the samples. The default distance between two samples can be interpreted as the “typical” log fold change between the two samples, for the genes that are most different between them (by default, the top 500 genes, but this can be modified). We generate an MDS plot from the DGEList object dge, coloring by the treatment and using different plot symbols for different cell lines.

plotMDS(dge, top = 500,col=as.double(cellType))

2.4 Differential analysis

2.4.1 Filtering

design<-model.matrix(~cellType)
keep <- filterByExpr(dge, design)
dge <- dge[keep, ,keep.lib.sizes=FALSE]

The option keep.lib.sizes=FALSE causes the library sizes to be recomputed after the filtering. This is generally recommended, although the effect on the downstream analysis is usually small.

The normalisation factors have to be recalculated upon filtering.

dge<-calcNormFactors(dge)

2.4.2 Model

We first estimate the overdispersion.

dge <- estimateDisp(dge, design)
plotBCV(dge)

Finally, we fit the generalized linear model and perform the test. In the glmLRT function, we indicate which coefficient (which column in the design matrix) that we would like to test for. It is possible to test more general contrasts as well, and the user guide contains many examples on how to do this. The topTags function extracts the top-ranked genes. You can indicate the adjusted p-value cutoff, and/or the number of genes to keep.

fit <- glmFit(dge, design)
lrt <- glmLRT(fit, coef = ncol(design))
ttAll <-topTags(lrt, n = nrow(dge)) # all genes
hist(ttAll$table$PValue)

tt <- topTags(lrt, n = nrow(dge), p.value = 0.05) # genes with adj.p<0.05
tt

## Coefficient:  cellTypeP1 
##                     logFC    logCPM       LR       PValue          FDR
## WBGene00011409  2.3051280  7.119802 73.05792 1.259006e-17 7.752958e-14
## WBGene00017985 -1.7957089  7.372791 62.01882 3.401907e-15 1.047447e-11
## WBGene00016824 -1.8923338  6.753206 60.58811 7.035757e-15 1.444206e-11
## WBGene00019811  1.8317181  6.588456 48.07969 4.092443e-12 6.300316e-09
## WBGene00000388 -1.5819410  7.523366 45.20634 1.773298e-11 2.183993e-08
## WBGene00020948 -1.4293678  9.866419 44.42801 2.638848e-11 2.399019e-08
## WBGene00000496  2.5765059  6.137689 44.36365 2.727043e-11 2.399019e-08
## WBGene00021200  2.8405675  5.752301 43.29921 4.697733e-11 3.616080e-08
## WBGene00017397  1.8053895  4.954892 39.44648 3.371751e-10 2.307027e-07
## WBGene00010354  3.5189772  3.594563 36.90842 1.238099e-09 7.624211e-07
## WBGene00021009 -2.5009414  5.901792 35.03340 3.240984e-09 1.814362e-06
## WBGene00008823 -1.4590401  6.893432 33.67495 6.513416e-09 3.342468e-06
## WBGene00009035  2.4812880  3.637571 33.33107 7.773078e-09 3.682047e-06
## WBGene00008080 -1.5849757  6.273693 31.88531 1.635490e-08 7.193819e-06
## WBGene00000863 -1.3206086  6.869690 31.18917 2.340683e-08 9.609285e-06
## WBGene00009264  1.3101438  7.737531 29.98508 4.353831e-08 1.675681e-05
## WBGene00019001  1.5310735  5.796877 29.80333 4.781688e-08 1.732096e-05
## WBGene00009793 -1.5448441  6.443390 28.72037 8.361950e-08 2.860716e-05
## WBGene00016823 -1.6283682  6.822229 28.52248 9.261680e-08 3.001759e-05
## WBGene00009606 -1.6836921  7.018377 27.59933 1.492302e-07 4.221610e-05
## WBGene00006562  1.2334479  8.237130 27.59138 1.498448e-07 4.221610e-05
## WBGene00004374  1.3126857  8.205130 27.57883 1.508208e-07 4.221610e-05
## WBGene00018009 -1.1679161  7.288469 27.24427 1.793061e-07 4.800725e-05
## WBGene00009921 -1.4520825  7.509272 27.08123 1.950830e-07 5.005505e-05
## WBGene00005022  3.0989396  3.072409 26.65312 2.434579e-07 5.996854e-05
## WBGene00012031  1.6710488  5.983430 26.46157 2.688333e-07 6.367214e-05
## WBGene00007145  1.3911358  5.955030 26.23073 3.029626e-07 6.909791e-05
## WBGene00000550  1.8567741  6.986777 25.65848 4.075003e-07 8.692874e-05
## WBGene00008464 -1.3691265  6.098891 25.59697 4.206987e-07 8.692874e-05
## WBGene00002276 -1.7151651  7.783958 25.58421 4.234917e-07 8.692874e-05
## WBGene00020721  1.1766318  6.785538 25.31562 4.867513e-07 9.669080e-05
## WBGene00001333 -2.1906933  6.816367 25.20089 5.165825e-07 9.940984e-05
## WBGene00015463  1.1298098  7.473493 23.97879 9.740292e-07 1.817598e-04
## WBGene00004258  1.1255996  7.206061 23.83100 1.051748e-06 1.904901e-04
## WBGene00015515  1.4428044  4.813602 23.46583 1.271522e-06 2.237152e-04
## WBGene00020375 -1.1448080  9.964018 23.25288 1.420377e-06 2.429634e-04
## WBGene00001402  1.3805811  5.396251 22.72596 1.868267e-06 3.105851e-04
## WBGene00003994  1.1635326  7.673470 22.67691 1.916569e-06 3.105851e-04
## WBGene00019792 -1.2852712  6.342227 22.31845 2.309766e-06 3.647061e-04
## WBGene00010678  1.4904174  7.068914 22.21037 2.443501e-06 3.761770e-04
## WBGene00008825 -1.0459065  9.314393 21.83429 2.972407e-06 4.464411e-04
## WBGene00010048  1.2281757  6.560963 21.60210 3.354841e-06 4.918836e-04
## WBGene00013765  1.8422391  5.233003 21.42048 3.688109e-06 5.281715e-04
## WBGene00008789 -1.1018128  8.449436 20.27976 6.690447e-06 9.363585e-04
## WBGene00011543  1.3377977  6.012064 20.19114 7.007682e-06 9.589624e-04
## WBGene00018349  2.2981756  2.694337 19.98302 7.813312e-06 1.045965e-03
## WBGene00000146 -1.0998566  9.314730 19.47284 1.020404e-05 1.336946e-03
## WBGene00016029  2.3799083  2.545932 18.96458 1.331678e-05 1.708432e-03
## WBGene00016955 -1.2883337  7.532638 18.90645 1.372878e-05 1.725344e-03
## WBGene00017037 -1.2944945  5.432695 18.55934 1.646962e-05 2.028398e-03
## WBGene00019890  1.0627183  6.801901 18.49984 1.699182e-05 2.051679e-03
## WBGene00019892  1.2158317  5.784601 18.34878 1.839364e-05 2.178232e-03
## WBGene00003229 -0.9372311 10.526524 18.21850 1.969566e-05 2.288413e-03
## WBGene00000090 -1.3084139  7.318347 18.10653 2.088838e-05 2.382049e-03
## WBGene00020269 -0.9985510  7.976130 17.82257 2.424894e-05 2.715000e-03
## WBGene00011350  2.0098571  4.328635 17.34870 3.111111e-05 3.421111e-03
## WBGene00001979 -0.8765155  8.474812 17.17564 3.407786e-05 3.681605e-03
## WBGene00005019  1.0245921  7.112465 17.07943 3.584846e-05 3.806118e-03
## WBGene00019710  1.4716330  4.586270 16.96655 3.804429e-05 3.970792e-03
## WBGene00004855 -1.1541964  6.333542 16.42440 5.062935e-05 5.145943e-03
## WBGene00017263 -0.9855036  8.581250 16.41151 5.097475e-05 5.145943e-03
## WBGene00012716  1.0387739  7.859506 16.19168 5.724499e-05 5.631971e-03
## WBGene00021440  0.8953822  6.731520 16.15533 5.835424e-05 5.631971e-03
## WBGene00021467 -0.9242074  7.905627 16.14953 5.853299e-05 5.631971e-03
## WBGene00011775  1.1204249  7.361743 15.96554 6.450616e-05 6.111214e-03
## WBGene00005026  1.4272652  5.311851 15.79960 7.041742e-05 6.570158e-03
## WBGene00008775 -1.1781239  7.984441 15.69945 7.424558e-05 6.823944e-03
## WBGene00000585  1.0513345  6.756986 15.66281 7.569812e-05 6.855133e-03
## WBGene00011153 -1.4629380  6.169867 15.49869 8.256232e-05 7.368388e-03
## WBGene00001949 -1.0142771  6.977753 15.43446 8.541655e-05 7.514215e-03
## WBGene00001492 -1.2619828  4.913039 14.75912 1.221550e-04 1.059479e-02
## WBGene00012934  1.8036181  3.441029 14.66929 1.281169e-04 1.095755e-02
## WBGene00010036  1.4099247  5.412889 14.60783 1.323635e-04 1.116568e-02
## WBGene00000872 -0.9535369  6.287212 14.55322 1.362555e-04 1.128024e-02
## WBGene00004076  1.4108462  4.517042 14.53766 1.373852e-04 1.128024e-02
## WBGene00013381  1.5419159  3.944388 14.51101 1.393425e-04 1.129041e-02
## WBGene00020771 -1.2922388  4.633363 14.33126 1.532978e-04 1.225984e-02
## WBGene00020447  1.0656443  5.986732 14.24530 1.604607e-04 1.266817e-02
## WBGene00004268  1.2772874  6.431653 13.99685 1.831173e-04 1.406991e-02
## WBGene00015217 -1.9734551  3.064682 13.99320 1.834732e-04 1.406991e-02
## WBGene00000537  0.9602926  6.110812 13.97690 1.850703e-04 1.406991e-02
## WBGene00001566  1.4373083  5.172078 13.87318 1.955695e-04 1.468679e-02
## WBGene00004094  1.6105546  3.673329 13.74675 2.091829e-04 1.551986e-02
## WBGene00006934  1.0510840  5.928105 13.25434 2.719493e-04 1.980062e-02
## WBGene00004373  1.3076155  6.353003 13.24497 2.733116e-04 1.980062e-02
## WBGene00016486 -1.0003894  5.894267 13.13510 2.898147e-04 2.075208e-02
## WBGene00003898  0.8692242  6.677060 13.01894 3.083558e-04 2.182592e-02
## WBGene00020498  1.0154543  7.016678 12.65675 3.742126e-04 2.618638e-02
## WBGene00009065 -1.2121033  6.659070 12.51676 4.033184e-04 2.755132e-02
## WBGene00021051  1.3519472  4.947833 12.50484 4.058981e-04 2.755132e-02
## WBGene00021352 -0.7553539  7.201147 12.47903 4.115467e-04 2.755132e-02
## WBGene00010846  1.2226988  4.625478 12.47872 4.116144e-04 2.755132e-02
## WBGene00008410  0.7656502  7.069255 12.41191 4.266037e-04 2.824759e-02
## WBGene00008118  1.0221416  5.062304 12.38480 4.328426e-04 2.835579e-02
## WBGene00016501 -0.7720642  9.024532 12.32469 4.470059e-04 2.897539e-02
## WBGene00009559  0.7864199  7.725558 12.13651 4.944433e-04 3.171648e-02
## WBGene00011225  1.8970620  3.966269 12.11100 5.012535e-04 3.182184e-02
## WBGene00003026  1.4137142  4.879505 12.05388 5.168481e-04 3.224261e-02
## WBGene00013529 -0.8337794  6.423924 12.04846 5.183532e-04 3.224261e-02
## WBGene00011272 -0.9044026  6.470998 11.95771 5.442162e-04 3.351283e-02
## WBGene00017951 -0.8492104  8.086304 11.84378 5.785435e-04 3.527397e-02
## WBGene00013339  0.8364854  6.384116 11.80672 5.901719e-04 3.563018e-02
## WBGene00020549  0.8701194  7.354495 11.65816 6.392142e-04 3.821632e-02
## WBGene00002804  1.0513295  5.504499 11.59312 6.619628e-04 3.919583e-02
## WBGene00004272  0.8604911  7.606023 11.57305 6.691470e-04 3.924388e-02
## WBGene00011986  1.1433862  6.271676 11.48880 7.001667e-04 4.060516e-02
## WBGene00018226  1.3684392  3.741883 11.47458 7.055460e-04 4.060516e-02
## WBGene00019341 -0.7131633  6.992116 11.31288 7.697138e-04 4.388794e-02
## WBGene00016632 -0.7425463  6.955333 11.22841 8.055457e-04 4.550964e-02
## WBGene00015102  1.1349527 10.206572 11.12247 8.528824e-04 4.774591e-02
## WBGene00004445  1.0439088  6.327324 11.05683 8.836111e-04 4.902052e-02

2.4.3 Plots

We first make a volcanoplot and an MA plot.

library(ggplot2)
volcano<- ggplot(ttAll$table,aes(x=logFC,y=-log10(PValue),color=FDR<0.05)) + geom_point() + scale_color_manual(values=c("black","red"))
volcano

plotSmear(lrt, de.tags = row.names(tt$table))

Another way of representing the results of a differential expression analysis is to construct a heatmap of the top differentially expressed genes. Here, we would expect the contrasted sample groups to cluster separately. A heatmap is a “color coded expression matrix”, where the rows and columns are clustered using hierarchical clustering. Typically, it should not be applied to counts, but works better with transformed values. Here we show how it can be applied to the variance-stabilized values generated above. We choose the top 30 differentially expressed genes. There are many functions in R that can generate heatmaps, here we show the one from the pheatmap package.

library(pheatmap)
pheatmap(cpm(dge,log=TRUE)[rownames(tt$table)[1:30],])

Elegans: DE analysis

27 October 2019

Contents