The data used in this workflow comes from an RNA-seq experiment where airway smooth muscle cells were treated with dexamethasone, a synthetic glucocorticoid steroid with anti-inflammatory effects (Himes et al. 2014). Glucocorticoids are used, for example, by people with asthma to reduce inflammation of the airways. In the experiment, four human airway smooth muscle cell lines were treated with 1 micromolar dexamethasone for 18 hours. For each of the four cell lines, we have a treated and an untreated sample. For more description of the experiment see the article, PubMed entry 24926665, and for raw data see the GEO entry GSE52778.
Many parts of this tutorial are based on parts of a published RNA-seq workflow available via Love et al. 2015 F1000Research and as a Bioconductor package and on Charlotte Soneson’s material from the bss2019 workshop.
FastQ files with a small subset of the reads can be found on https://github.com/statOmics/SGA2019/tree/data-rnaseq Here, we will not use our count table because it is based on a small subset of the reads. We will use the count table that was provided after mapping all the reads with the read mapper star.
library(edgeR)
library(tidyverse)
We import the featurecounts object that we have stored.
readRDS(
fc <-url("https://github.com/statOmics/SGA2020/blob/data-rnaseq/airway/featureCounts/star_featurecounts.rds?raw=true")
)colnames(fc$counts)
## [1] "SRR1039508" "SRR1039509" "SRR1039512" "SRR1039513" "SRR1039516"
## [6] "SRR1039517" "SRR1039520" "SRR1039521"
readRDS("airwayMetaData.rds") target <-
## Warning in readRDS("airwayMetaData.rds"): input string 'Cells from each donor
## were treated with 1 μM dexamethasone, 1 μM albuterol, both dexamethasone and
## albuterol, or control vehicle for 18 h.' cannot be translated to UTF-8, is it
## valid in 'US-ASCII'?
## Warning in readRDS("airwayMetaData.rds"): input string 'Primary ASM cells were
## isolated from four white aborted lung transplant donors with no chronic illness.
## Passages 4 to 7 ASM cells maintained in Ham's F12 medium supplemented with 10%
## FBS were used in all experiments.' cannot be translated to UTF-8, is it valid in
## 'US-ASCII'?
DGEList(fc$counts)
dge <-colnames(dge)==target$Run
## [1] TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [13] FALSE FALSE FALSE FALSE
$Run%in%colnames(dge) target
## [1] TRUE TRUE FALSE FALSE TRUE TRUE FALSE FALSE TRUE TRUE FALSE FALSE
## [13] TRUE TRUE FALSE FALSE
rownames(target) <- target$Run
target[colnames(dge),]
target <-grep(":ch1",colnames(target))] target[,
## cell line:ch1 cell type:ch1 ercc_mix:ch1
## SRR1039508 N61311 airway smooth muscle cells -
## SRR1039509 N61311 airway smooth muscle cells -
## SRR1039512 N052611 airway smooth muscle cells -
## SRR1039513 N052611 airway smooth muscle cells 1
## SRR1039516 N080611 airway smooth muscle cells -
## SRR1039517 N080611 airway smooth muscle cells -
## SRR1039520 N061011 airway smooth muscle cells 1
## SRR1039521 N061011 airway smooth muscle cells 1
## tissue:ch1 treatment:ch1
## SRR1039508 human airway smooth muscle cells Untreated
## SRR1039509 human airway smooth muscle cells Dexamethasone
## SRR1039512 human airway smooth muscle cells Untreated
## SRR1039513 human airway smooth muscle cells Dexamethasone
## SRR1039516 human airway smooth muscle cells Untreated
## SRR1039517 human airway smooth muscle cells Dexamethasone
## SRR1039520 human airway smooth muscle cells Untreated
## SRR1039521 human airway smooth muscle cells Dexamethasone
$cellLine <- as.factor(target[,grep("cell line:ch1",colnames(target))])
target$treatment <- as.factor(target[,grep("treatment:ch1",colnames(target))])
target$treatment <- relevel(target$treatment,"Untreated") target
colnames(dge) <- paste0(
substr(target$cellLine, 1, 3),
"_",
substr(target$treatment, 1, 3)
)
model.matrix( ~ treatment + cellLine, data = target)
design <- filterByExpr(dge, design)
keep <- dge[keep, , keep.lib.sizes=FALSE]
dge <-calcNormFactors(dge) dge<-
One way to reduce dimensionality is the use of multidimensional scaling (MDS). For MDS, we first have to calculate all pairwise distances between our objects (samples in this case), and then create a (typically) two-dimensional representation where these pre-calculated distances are represented as accurately as possible. This means that depending on how the pairwise sample distances are defined, the two-dimensional plot can be very different, and it is important to choose a distance that is suitable for the type of data at hand.
edgeR contains a function plotMDS, which operates on a DGEList object and generates a two-dimensional MDS representation of the samples. The default distance between two samples can be interpreted as the “typical” log fold change between the two samples, for the genes that are most different between them (by default, the top 500 genes, but this can be modified). We generate an MDS plot from the DGEList object dge, coloring by the treatment and using different plot symbols for different cell lines.
plotMDS(dge, top = 500,col=as.double(target$cellLine))