1 Background

The data used in this workflow comes from an RNA-seq experiment where airway smooth muscle cells were treated with dexamethasone, a synthetic glucocorticoid steroid with anti-inflammatory effects (Himes et al. 2014). Glucocorticoids are used, for example, by people with asthma to reduce inflammation of the airways. In the experiment, four human airway smooth muscle cell lines were treated with 1 micromolar dexamethasone for 18 hours. For each of the four cell lines, we have a treated and an untreated sample. For more description of the experiment see the article, PubMed entry 24926665, and for raw data see the GEO entry GSE52778.

Many parts of this tutorial are based on parts of a published RNA-seq workflow available via Love et al. 2015 F1000Research and as a Bioconductor package and on Charlotte Soneson’s material from the bss2019 workshop.

2 Data

FastQ files with a small subset of the reads can be found on https://github.com/statOmics/SGA2019/tree/data-rnaseq Here, we will not use our count table because it is based on a small subset of the reads. We will use the count table that was provided after mapping all the reads with the read mapper star.

library(edgeR)
library(tidyverse)

3 Read featurecounts object

We import the featurecounts object that we have stored.

fc <-  readRDS(
  url("https://github.com/statOmics/SGA2020/blob/data-rnaseq/airway/featureCounts/star_featurecounts.rds?raw=true")
  )
colnames(fc$counts)
## [1] "SRR1039508" "SRR1039509" "SRR1039512" "SRR1039513" "SRR1039516"
## [6] "SRR1039517" "SRR1039520" "SRR1039521"

3.1 Read Meta Data

target <- readRDS("airwayMetaData.rds")
## Warning in readRDS("airwayMetaData.rds"): input string 'Cells from each donor
## were treated with 1 μM dexamethasone, 1 μM albuterol, both dexamethasone and
## albuterol, or control vehicle for 18 h.' cannot be translated to UTF-8, is it
## valid in 'US-ASCII'?
## Warning in readRDS("airwayMetaData.rds"): input string 'Primary ASM cells were
## isolated from four white aborted lung transplant donors with no chronic illness.
## Passages 4 to 7 ASM cells maintained in Ham's F12 medium supplemented with 10%
## FBS were used in all experiments.' cannot be translated to UTF-8, is it valid in
## 'US-ASCII'?

4 Data Analysis

4.1 Setup count object edgeR

dge <- DGEList(fc$counts)
colnames(dge)==target$Run
##  [1]  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [13] FALSE FALSE FALSE FALSE
target$Run%in%colnames(dge)
##  [1]  TRUE  TRUE FALSE FALSE  TRUE  TRUE FALSE FALSE  TRUE  TRUE FALSE FALSE
## [13]  TRUE  TRUE FALSE FALSE
rownames(target) <- target$Run
target <- target[colnames(dge),]
target[,grep(":ch1",colnames(target))]
##            cell line:ch1              cell type:ch1 ercc_mix:ch1
## SRR1039508        N61311 airway smooth muscle cells            -
## SRR1039509        N61311 airway smooth muscle cells            -
## SRR1039512       N052611 airway smooth muscle cells            -
## SRR1039513       N052611 airway smooth muscle cells            1
## SRR1039516       N080611 airway smooth muscle cells            -
## SRR1039517       N080611 airway smooth muscle cells            -
## SRR1039520       N061011 airway smooth muscle cells            1
## SRR1039521       N061011 airway smooth muscle cells            1
##                                  tissue:ch1 treatment:ch1
## SRR1039508 human airway smooth muscle cells     Untreated
## SRR1039509 human airway smooth muscle cells Dexamethasone
## SRR1039512 human airway smooth muscle cells     Untreated
## SRR1039513 human airway smooth muscle cells Dexamethasone
## SRR1039516 human airway smooth muscle cells     Untreated
## SRR1039517 human airway smooth muscle cells Dexamethasone
## SRR1039520 human airway smooth muscle cells     Untreated
## SRR1039521 human airway smooth muscle cells Dexamethasone
target$cellLine <- as.factor(target[,grep("cell line:ch1",colnames(target))])
target$treatment <- as.factor(target[,grep("treatment:ch1",colnames(target))])
target$treatment <- relevel(target$treatment,"Untreated")
colnames(dge) <- paste0(
  substr(target$cellLine, 1, 3),
  "_",
  substr(target$treatment, 1, 3)
  )

4.2 Filtering and normalisation

design <- model.matrix( ~ treatment + cellLine, data = target)
keep <- filterByExpr(dge, design)
dge <- dge[keep, , keep.lib.sizes=FALSE]
dge<-calcNormFactors(dge)

4.3 Data exploration

One way to reduce dimensionality is the use of multidimensional scaling (MDS). For MDS, we first have to calculate all pairwise distances between our objects (samples in this case), and then create a (typically) two-dimensional representation where these pre-calculated distances are represented as accurately as possible. This means that depending on how the pairwise sample distances are defined, the two-dimensional plot can be very different, and it is important to choose a distance that is suitable for the type of data at hand.

edgeR contains a function plotMDS, which operates on a DGEList object and generates a two-dimensional MDS representation of the samples. The default distance between two samples can be interpreted as the “typical” log fold change between the two samples, for the genes that are most different between them (by default, the top 500 genes, but this can be modified). We generate an MDS plot from the DGEList object dge, coloring by the treatment and using different plot symbols for different cell lines.

plotMDS(dge, top = 500,col=as.double(target$cellLine))