1 Background

The data used in this workflow comes from an RNA-seq experiment where airway smooth muscle cells were treated with dexamethasone, a synthetic glucocorticoid steroid with anti-inflammatory effects (Himes et al. 2014). Glucocorticoids are used, for example, by people with asthma to reduce inflammation of the airways. In the experiment, four human airway smooth muscle cell lines were treated with 1 micromolar dexamethasone for 18 hours. For each of the four cell lines, we have a treated and an untreated sample. For more description of the experiment see the article, PubMed entry 24926665, and for raw data see the GEO entry GSE52778.

In this file we will assess DE by starting from transcript counts that we summarize at the gene level. The DE analysis is performed by DESeq2. The analysis is based on the R/markdown script from Charlotte Soneson given at shortcourse in 2019: https://uclouvain-cbio.github.io/BSS2019/ https://uclouvain-cbio.github.io/BSS2019/rnaseq_gene_summerschool_belgium_2019.html

2 Data

FastQ files with a small subset of the reads can be found on https://github.com/statOmics/SGA2019/tree/data-rnaseq

2.1 Read Meta Data

suppressPackageStartupMessages({
  library(tximeta)
  library(DESeq2)
  library(org.Hs.eg.db)
  library(SummarizedExperiment)
  library(ggplot2)
  library(tidyverse)
  library(pheatmap)
})

target<-readRDS("airwayMetaData.rds")

## Warning in readRDS("airwayMetaData.rds"): input string 'Cells from each donor
## were treated with 1 μM dexamethasone, 1 μM albuterol, both dexamethasone and
## albuterol, or control vehicle for 18 h.' cannot be translated to UTF-8, is it
## valid in 'US-ASCII'?

## Warning in readRDS("airwayMetaData.rds"): input string 'Primary ASM cells were
## isolated from four white aborted lung transplant donors with no chronic illness.
## Passages 4 to 7 ASM cells maintained in Ham's F12 medium supplemented with 10%
## FBS were used in all experiments.' cannot be translated to UTF-8, is it valid in
## 'US-ASCII'?

We do not have the Albuterol samples

rownames(target) <- target$Run
target <- target[-grep("Albuterol",target[,"treatment:ch1"]),]
target[,grep(":ch1",colnames(target))]

##            cell line:ch1              cell type:ch1 ercc_mix:ch1
## SRR1039508        N61311 airway smooth muscle cells            -
## SRR1039509        N61311 airway smooth muscle cells            -
## SRR1039512       N052611 airway smooth muscle cells            -
## SRR1039513       N052611 airway smooth muscle cells            1
## SRR1039516       N080611 airway smooth muscle cells            -
## SRR1039517       N080611 airway smooth muscle cells            -
## SRR1039520       N061011 airway smooth muscle cells            1
## SRR1039521       N061011 airway smooth muscle cells            1
##                                  tissue:ch1 treatment:ch1
## SRR1039508 human airway smooth muscle cells     Untreated
## SRR1039509 human airway smooth muscle cells Dexamethasone
## SRR1039512 human airway smooth muscle cells     Untreated
## SRR1039513 human airway smooth muscle cells Dexamethasone
## SRR1039516 human airway smooth muscle cells     Untreated
## SRR1039517 human airway smooth muscle cells Dexamethasone
## SRR1039520 human airway smooth muscle cells     Untreated
## SRR1039521 human airway smooth muscle cells Dexamethasone

target$cellLine <- target[, grep("cell line:ch1", colnames(target))] %>%
  as.factor
target$treatment<-target[, grep("treatment:ch1", colnames(target))] %>%
  as.factor
target$treatment<-relevel(target$treatment, "Untreated")

#Alignment-free transcript quantification

Software such as kallisto [@Bray2016Near], Salmon [@Patro2017Salmon] and Sailfish [@Patro2014Sailfish], as well as other transcript quantification methods like Cufflinks [@Trapnell2010Cufflinks; @Trapnell2013Cufflinks2] and RSEM [@Li2011RSEM], differ from the counting methods introduced in the previous tutorials in that they provide quantifications (usually both as counts and as TPMs) for each transcript. These can then be summarized on the gene level by adding all values for transcripts from the same gene. A simple way to import results from these packages into R is provided by the tximport and tximeta packages. Here, tximport reads the quantifications into a list of matrices, and tximeta aggregates the information into a SummarizedExperiment object, and also automatically adds additional annotations for the features. Both packages can return quantifications on the transcript level or aggregate them on the gene level. They also calculate average transcript lengths for each gene and each sample, which can be used as offsets to improve the differential expression analysis by accounting for differential isoform usage across samples [@Soneson2015Differential]. aturecounts object

##Salmon

The code below imports the Salmon quantifications into R using the tximeta package. Note how the transcriptome that was used for the quantification is automatically recognized and used to annotate the resulting data object. In order for this to work, tximeta requires that the output folder structure from Salmon is retained, since it reads information from the associated log files in addition to the quantified abundances themselves. With the addIds() function, we can add additional annotation columns.

## List all quant.sf output files from Salmon
salmonfiles <- paste0("~/Downloads/airwayLecture/salmon/", target$Run, "/quant.sf")
names(salmonfiles) <- target$Run
stopifnot(all(file.exists(salmonfiles)))

## Add a column "files" to the metadata table. This table must contain at least
## two columns: "names" and "files"
coldata <- cbind(target, files = salmonfiles, stringsAsFactors = FALSE)
coldata$names <- coldata$Run

## Import quantifications on the transcript level
st <- tximeta::tximeta(coldata)

## importing quantifications

## reading in files with read_tsv

## 1 2 3 4 5 6 7 8 
## found matching transcriptome:
## [ GENCODE - Homo sapiens - release 29 ]
## loading existing TxDb created: 2019-11-25 07:49:13
## loading existing transcript ranges created: 2020-11-20 06:53:12
## fetching genome info for GENCODE

## Summarize quantifications on the gene level
sg <- tximeta::summarizeToGene(st)

## loading existing TxDb created: 2019-11-25 07:49:13
## obtaining transcript-to-gene mapping from database
## loading existing gene ranges created: 2020-11-20 06:53:23
## summarizing abundance
## summarizing counts
## summarizing length

## Add gene symbols
sg <- tximeta::addIds(sg, "SYMBOL", gene = TRUE)

## mapping to new IDs using 'org.Hs.eg.db' data package
## if all matching IDs are desired, and '1:many mappings' are reported,
## set multiVals='list' to obtain all the matching IDs
## 'select()' returned 1:many mapping between keys and columns

sg

## class: RangedSummarizedExperiment 
## dim: 58294 8 
## metadata(6): tximetaInfo quantInfo ... txomeInfo txdbInfo
## assays(3): counts abundance length
## rownames(58294): ENSG00000000003.14 ENSG00000000005.5 ...
##   ENSG00000285993.1 ENSG00000285994.1
## rowData names(2): gene_id SYMBOL
## colnames(8): SRR1039508 SRR1039509 ... SRR1039520 SRR1039521
## colData names(101): SampleName title ... treatment names

Note that Salmon returns estimated counts, which are not necessarily integers. They may need to be rounded before they are passed to count-based statistical methods (e.g. DESeq2). This is not necesary for edgeR.

counts_salmon <- round(assay(sg, "counts"))

3 Data Analysis

3.1 Setup count object in edgeR

suppressPackageStartupMessages({
  library(edgeR)
})
dge <- DGEList(counts = counts_salmon,
               samples = target)

avetxlengths <- assay(sg, "length")
stopifnot(all(rownames(avetxlengths) == rownames(counts_salmon)))
stopifnot(all(colnames(avetxlengths) == colnames(counts_salmon)))
avetxlengths <- avetxlengths/exp(rowMeans(log(avetxlengths)))
offsets <- log(calcNormFactors(counts_salmon/avetxlengths)) +
  log(colSums(counts_salmon/avetxlengths))
dge <- scaleOffset(dge, t(t(log(avetxlengths)) + offsets))
names(dge)

## [1] "counts"  "samples" "offset"

3.2 Filtering and normalisation

design <- model.matrix(~treatment+cellLine,data=target)
keep <- filterByExpr(dge, design)
dge <- dge[keep, ,keep.lib.sizes=FALSE]
dge <- calcNormFactors(dge)

## Warning in calcNormFactors.DGEList(dge): object contains offsets, which take precedence over library
## sizes and norm factors (and which will not be recomputed).

dge$samples

##            group lib.size norm.factors SampleName             title
## SRR1039508     1 21048128    1.0580056 GSM1275862  N61311_untreated
## SRR1039509     1 19242591    1.0199099 GSM1275863        N61311_Dex
## SRR1039512     1 26056184    0.9939708 GSM1275866 N052611_untreated
## SRR1039513     1 15638136    0.9464884 GSM1275867       N052611_Dex
## SRR1039516     1 25179091    1.0340526 GSM1275870 N080611_untreated
## SRR1039517     1 31804785    0.9698506 GSM1275871       N080611_Dex
## SRR1039520     1 19619345    1.0306481 GSM1275874 N061011_untreated
## SRR1039521     1 21761908    0.9530232 GSM1275875       N061011_Dex
##            geo_accession                status submission_date last_update_date
## SRR1039508    GSM1275862 Public on Jan 01 2014     Nov 26 2013      May 15 2019
## SRR1039509    GSM1275863 Public on Jan 01 2014     Nov 26 2013      May 15 2019
## SRR1039512    GSM1275866 Public on Jan 01 2014     Nov 26 2013      May 15 2019
## SRR1039513    GSM1275867 Public on Jan 01 2014     Nov 26 2013      May 15 2019
## SRR1039516    GSM1275870 Public on Jan 01 2014     Nov 26 2013      May 15 2019
## SRR1039517    GSM1275871 Public on Jan 01 2014     Nov 26 2013      May 15 2019
## SRR1039520    GSM1275874 Public on Jan 01 2014     Nov 26 2013      May 15 2019
## SRR1039521    GSM1275875 Public on Jan 01 2014     Nov 26 2013      May 15 2019
##            type channel_count            source_name_ch1 organism_ch1
## SRR1039508  SRA             1 airway smooth muscle cells Homo sapiens
## SRR1039509  SRA             1 airway smooth muscle cells Homo sapiens
## SRR1039512  SRA             1 airway smooth muscle cells Homo sapiens
## SRR1039513  SRA             1 airway smooth muscle cells Homo sapiens
## SRR1039516  SRA             1 airway smooth muscle cells Homo sapiens
## SRR1039517  SRA             1 airway smooth muscle cells Homo sapiens
## SRR1039520  SRA             1 airway smooth muscle cells Homo sapiens
## SRR1039521  SRA             1 airway smooth muscle cells Homo sapiens
##                 characteristics_ch1                    characteristics_ch1.1
## SRR1039508     treatment: Untreated tissue: human airway smooth muscle cells
## SRR1039509 treatment: Dexamethasone tissue: human airway smooth muscle cells
## SRR1039512     treatment: Untreated tissue: human airway smooth muscle cells
## SRR1039513 treatment: Dexamethasone tissue: human airway smooth muscle cells
## SRR1039516     treatment: Untreated tissue: human airway smooth muscle cells
## SRR1039517 treatment: Dexamethasone tissue: human airway smooth muscle cells
## SRR1039520     treatment: Untreated tissue: human airway smooth muscle cells
## SRR1039521 treatment: Dexamethasone tissue: human airway smooth muscle cells
##            characteristics_ch1.2 characteristics_ch1.3
## SRR1039508           ercc_mix: -     cell line: N61311
## SRR1039509           ercc_mix: -     cell line: N61311
## SRR1039512           ercc_mix: -    cell line: N052611
## SRR1039513           ercc_mix: 1    cell line: N052611
## SRR1039516           ercc_mix: -    cell line: N080611
## SRR1039517           ercc_mix: -    cell line: N080611
## SRR1039520           ercc_mix: 1    cell line: N061011
## SRR1039521           ercc_mix: 1    cell line: N061011
##                            characteristics_ch1.4
## SRR1039508 cell type: airway smooth muscle cells
## SRR1039509 cell type: airway smooth muscle cells
## SRR1039512 cell type: airway smooth muscle cells
## SRR1039513 cell type: airway smooth muscle cells
## SRR1039516 cell type: airway smooth muscle cells
## SRR1039517 cell type: airway smooth muscle cells
## SRR1039520 cell type: airway smooth muscle cells
## SRR1039521 cell type: airway smooth muscle cells
##                                                                                                                                 treatment_protocol_ch1
## SRR1039508 Cells from each donor were treated with 1 μM dexamethasone,  1 μM albuterol, both dexamethasone and albuterol, or control vehicle for 18 h.
## SRR1039509 Cells from each donor were treated with 1 μM dexamethasone,  1 μM albuterol, both dexamethasone and albuterol, or control vehicle for 18 h.
## SRR1039512 Cells from each donor were treated with 1 μM dexamethasone,  1 μM albuterol, both dexamethasone and albuterol, or control vehicle for 18 h.
## SRR1039513 Cells from each donor were treated with 1 μM dexamethasone,  1 μM albuterol, both dexamethasone and albuterol, or control vehicle for 18 h.
## SRR1039516 Cells from each donor were treated with 1 μM dexamethasone,  1 μM albuterol, both dexamethasone and albuterol, or control vehicle for 18 h.
## SRR1039517 Cells from each donor were treated with 1 μM dexamethasone,  1 μM albuterol, both dexamethasone and albuterol, or control vehicle for 18 h.
## SRR1039520 Cells from each donor were treated with 1 μM dexamethasone,  1 μM albuterol, both dexamethasone and albuterol, or control vehicle for 18 h.
## SRR1039521 Cells from each donor were treated with 1 μM dexamethasone,  1 μM albuterol, both dexamethasone and albuterol, or control vehicle for 18 h.
##                                                                                                                                                                                                                 growth_protocol_ch1
## SRR1039508 Primary ASM cells were isolated from four white aborted lung transplant donors with no chronic illness. Passages 4 to 7 ASM cells maintained in Ham's F12 medium supplemented with 10% FBS were used in all experiments.
## SRR1039509 Primary ASM cells were isolated from four white aborted lung transplant donors with no chronic illness. Passages 4 to 7 ASM cells maintained in Ham's F12 medium supplemented with 10% FBS were used in all experiments.
## SRR1039512 Primary ASM cells were isolated from four white aborted lung transplant donors with no chronic illness. Passages 4 to 7 ASM cells maintained in Ham's F12 medium supplemented with 10% FBS were used in all experiments.
## SRR1039513 Primary ASM cells were isolated from four white aborted lung transplant donors with no chronic illness. Passages 4 to 7 ASM cells maintained in Ham's F12 medium supplemented with 10% FBS were used in all experiments.
## SRR1039516 Primary ASM cells were isolated from four white aborted lung transplant donors with no chronic illness. Passages 4 to 7 ASM cells maintained in Ham's F12 medium supplemented with 10% FBS were used in all experiments.
## SRR1039517 Primary ASM cells were isolated from four white aborted lung transplant donors with no chronic illness. Passages 4 to 7 ASM cells maintained in Ham's F12 medium supplemented with 10% FBS were used in all experiments.
## SRR1039520 Primary ASM cells were isolated from four white aborted lung transplant donors with no chronic illness. Passages 4 to 7 ASM cells maintained in Ham's F12 medium supplemented with 10% FBS were used in all experiments.
## SRR1039521 Primary ASM cells were isolated from four white aborted lung transplant donors with no chronic illness. Passages 4 to 7 ASM cells maintained in Ham's F12 medium supplemented with 10% FBS were used in all experiments.
##            molecule_ch1
## SRR1039508    total RNA
## SRR1039509    total RNA
## SRR1039512    total RNA
## SRR1039513    total RNA
## SRR1039516    total RNA
## SRR1039517    total RNA
## SRR1039520    total RNA
## SRR1039521    total RNA
##                                                                                     extract_protocol_ch1
## SRR1039508 Total RNA was extracted using the miRNAeasy mini kit (Qiagen Sciences, Inc., Germantown, MD).
## SRR1039509 Total RNA was extracted using the miRNAeasy mini kit (Qiagen Sciences, Inc., Germantown, MD).
## SRR1039512 Total RNA was extracted using the miRNAeasy mini kit (Qiagen Sciences, Inc., Germantown, MD).
## SRR1039513 Total RNA was extracted using the miRNAeasy mini kit (Qiagen Sciences, Inc., Germantown, MD).
## SRR1039516 Total RNA was extracted using the miRNAeasy mini kit (Qiagen Sciences, Inc., Germantown, MD).
## SRR1039517 Total RNA was extracted using the miRNAeasy mini kit (Qiagen Sciences, Inc., Germantown, MD).
## SRR1039520 Total RNA was extracted using the miRNAeasy mini kit (Qiagen Sciences, Inc., Germantown, MD).
## SRR1039521 Total RNA was extracted using the miRNAeasy mini kit (Qiagen Sciences, Inc., Germantown, MD).
##                                                    extract_protocol_ch1.1
## SRR1039508 TruSeq RNA Sample Prep Kit v2 (Illumina, Inc., San Diego, CA).
## SRR1039509 TruSeq RNA Sample Prep Kit v2 (Illumina, Inc., San Diego, CA).
## SRR1039512 TruSeq RNA Sample Prep Kit v2 (Illumina, Inc., San Diego, CA).
## SRR1039513 TruSeq RNA Sample Prep Kit v2 (Illumina, Inc., San Diego, CA).
## SRR1039516 TruSeq RNA Sample Prep Kit v2 (Illumina, Inc., San Diego, CA).
## SRR1039517 TruSeq RNA Sample Prep Kit v2 (Illumina, Inc., San Diego, CA).
## SRR1039520 TruSeq RNA Sample Prep Kit v2 (Illumina, Inc., San Diego, CA).
## SRR1039521 TruSeq RNA Sample Prep Kit v2 (Illumina, Inc., San Diego, CA).
##            taxid_ch1                                   data_processing
## SRR1039508      9606 Illumina Casava1.8 software used for basecalling.
## SRR1039509      9606 Illumina Casava1.8 software used for basecalling.
## SRR1039512      9606 Illumina Casava1.8 software used for basecalling.
## SRR1039513      9606 Illumina Casava1.8 software used for basecalling.
## SRR1039516      9606 Illumina Casava1.8 software used for basecalling.
## SRR1039517      9606 Illumina Casava1.8 software used for basecalling.
## SRR1039520      9606 Illumina Casava1.8 software used for basecalling.
## SRR1039521      9606 Illumina Casava1.8 software used for basecalling.
##                                                                                                                                                              data_processing.1
## SRR1039508 Sequenced reads were trimmed by 12bp then mapped to hg19 whole genome using tophat v.2.0.4 with parameters -G [hg19_ERCC_gtf] --no-novel-juncs --transcriptome-only
## SRR1039509 Sequenced reads were trimmed by 12bp then mapped to hg19 whole genome using tophat v.2.0.4 with parameters -G [hg19_ERCC_gtf] --no-novel-juncs --transcriptome-only
## SRR1039512 Sequenced reads were trimmed by 12bp then mapped to hg19 whole genome using tophat v.2.0.4 with parameters -G [hg19_ERCC_gtf] --no-novel-juncs --transcriptome-only
## SRR1039513 Sequenced reads were trimmed by 12bp then mapped to hg19 whole genome using tophat v.2.0.4 with parameters -G [hg19_ERCC_gtf] --no-novel-juncs --transcriptome-only
## SRR1039516 Sequenced reads were trimmed by 12bp then mapped to hg19 whole genome using tophat v.2.0.4 with parameters -G [hg19_ERCC_gtf] --no-novel-juncs --transcriptome-only
## SRR1039517 Sequenced reads were trimmed by 12bp then mapped to hg19 whole genome using tophat v.2.0.4 with parameters -G [hg19_ERCC_gtf] --no-novel-juncs --transcriptome-only
## SRR1039520 Sequenced reads were trimmed by 12bp then mapped to hg19 whole genome using tophat v.2.0.4 with parameters -G [hg19_ERCC_gtf] --no-novel-juncs --transcriptome-only
## SRR1039521 Sequenced reads were trimmed by 12bp then mapped to hg19 whole genome using tophat v.2.0.4 with parameters -G [hg19_ERCC_gtf] --no-novel-juncs --transcriptome-only
##                                                                                                                                                                                                    data_processing.2
## SRR1039508 For each sample, Cufflinks (v.2.0.2) was used to quantify External RNA Controls Consortium (ERCC) Spike-In and hg19 transcripts based on reads that mapped to the provided hg19 and ERCC reference files.
## SRR1039509 For each sample, Cufflinks (v.2.0.2) was used to quantify External RNA Controls Consortium (ERCC) Spike-In and hg19 transcripts based on reads that mapped to the provided hg19 and ERCC reference files.
## SRR1039512 For each sample, Cufflinks (v.2.0.2) was used to quantify External RNA Controls Consortium (ERCC) Spike-In and hg19 transcripts based on reads that mapped to the provided hg19 and ERCC reference files.
## SRR1039513 For each sample, Cufflinks (v.2.0.2) was used to quantify External RNA Controls Consortium (ERCC) Spike-In and hg19 transcripts based on reads that mapped to the provided hg19 and ERCC reference files.
## SRR1039516 For each sample, Cufflinks (v.2.0.2) was used to quantify External RNA Controls Consortium (ERCC) Spike-In and hg19 transcripts based on reads that mapped to the provided hg19 and ERCC reference files.
## SRR1039517 For each sample, Cufflinks (v.2.0.2) was used to quantify External RNA Controls Consortium (ERCC) Spike-In and hg19 transcripts based on reads that mapped to the provided hg19 and ERCC reference files.
## SRR1039520 For each sample, Cufflinks (v.2.0.2) was used to quantify External RNA Controls Consortium (ERCC) Spike-In and hg19 transcripts based on reads that mapped to the provided hg19 and ERCC reference files.
## SRR1039521 For each sample, Cufflinks (v.2.0.2) was used to quantify External RNA Controls Consortium (ERCC) Spike-In and hg19 transcripts based on reads that mapped to the provided hg19 and ERCC reference files.
##                                                                                                                                                                                                                                                                                                                                                                                                                   data_processing.3
## SRR1039508 Differential expression of genes and transcripts in samples treated with Dex vs. Untreated samples was obtained using Cuffdiff (v.2.0.2) with the quantified transcripts computed by Cufflinks (v.2.0.2), while applying bias correction. Options: -p 12 --min-outlier-p 0.05 -b [hg19_reference_index] -u [hg19_gtf] -L Dex,Control  [List of accepted_hits.bam for each sample sorted by condition (i.e. Dex,Control)]
## SRR1039509 Differential expression of genes and transcripts in samples treated with Dex vs. Untreated samples was obtained using Cuffdiff (v.2.0.2) with the quantified transcripts computed by Cufflinks (v.2.0.2), while applying bias correction. Options: -p 12 --min-outlier-p 0.05 -b [hg19_reference_index] -u [hg19_gtf] -L Dex,Control  [List of accepted_hits.bam for each sample sorted by condition (i.e. Dex,Control)]
## SRR1039512 Differential expression of genes and transcripts in samples treated with Dex vs. Untreated samples was obtained using Cuffdiff (v.2.0.2) with the quantified transcripts computed by Cufflinks (v.2.0.2), while applying bias correction. Options: -p 12 --min-outlier-p 0.05 -b [hg19_reference_index] -u [hg19_gtf] -L Dex,Control  [List of accepted_hits.bam for each sample sorted by condition (i.e. Dex,Control)]
## SRR1039513 Differential expression of genes and transcripts in samples treated with Dex vs. Untreated samples was obtained using Cuffdiff (v.2.0.2) with the quantified transcripts computed by Cufflinks (v.2.0.2), while applying bias correction. Options: -p 12 --min-outlier-p 0.05 -b [hg19_reference_index] -u [hg19_gtf] -L Dex,Control  [List of accepted_hits.bam for each sample sorted by condition (i.e. Dex,Control)]
## SRR1039516 Differential expression of genes and transcripts in samples treated with Dex vs. Untreated samples was obtained using Cuffdiff (v.2.0.2) with the quantified transcripts computed by Cufflinks (v.2.0.2), while applying bias correction. Options: -p 12 --min-outlier-p 0.05 -b [hg19_reference_index] -u [hg19_gtf] -L Dex,Control  [List of accepted_hits.bam for each sample sorted by condition (i.e. Dex,Control)]
## SRR1039517 Differential expression of genes and transcripts in samples treated with Dex vs. Untreated samples was obtained using Cuffdiff (v.2.0.2) with the quantified transcripts computed by Cufflinks (v.2.0.2), while applying bias correction. Options: -p 12 --min-outlier-p 0.05 -b [hg19_reference_index] -u [hg19_gtf] -L Dex,Control  [List of accepted_hits.bam for each sample sorted by condition (i.e. Dex,Control)]
## SRR1039520 Differential expression of genes and transcripts in samples treated with Dex vs. Untreated samples was obtained using Cuffdiff (v.2.0.2) with the quantified transcripts computed by Cufflinks (v.2.0.2), while applying bias correction. Options: -p 12 --min-outlier-p 0.05 -b [hg19_reference_index] -u [hg19_gtf] -L Dex,Control  [List of accepted_hits.bam for each sample sorted by condition (i.e. Dex,Control)]
## SRR1039521 Differential expression of genes and transcripts in samples treated with Dex vs. Untreated samples was obtained using Cuffdiff (v.2.0.2) with the quantified transcripts computed by Cufflinks (v.2.0.2), while applying bias correction. Options: -p 12 --min-outlier-p 0.05 -b [hg19_reference_index] -u [hg19_gtf] -L Dex,Control  [List of accepted_hits.bam for each sample sorted by condition (i.e. Dex,Control)]
##                                                                       data_processing.4
## SRR1039508 CummeRbund R package (v.0.1.3) was used to create FPKM matrix of all samples
## SRR1039509 CummeRbund R package (v.0.1.3) was used to create FPKM matrix of all samples
## SRR1039512 CummeRbund R package (v.0.1.3) was used to create FPKM matrix of all samples
## SRR1039513 CummeRbund R package (v.0.1.3) was used to create FPKM matrix of all samples
## SRR1039516 CummeRbund R package (v.0.1.3) was used to create FPKM matrix of all samples
## SRR1039517 CummeRbund R package (v.0.1.3) was used to create FPKM matrix of all samples
## SRR1039520 CummeRbund R package (v.0.1.3) was used to create FPKM matrix of all samples
## SRR1039521 CummeRbund R package (v.0.1.3) was used to create FPKM matrix of all samples
##             data_processing.5
## SRR1039508 Genome_build: hg19
## SRR1039509 Genome_build: hg19
## SRR1039512 Genome_build: hg19
## SRR1039513 Genome_build: hg19
## SRR1039516 Genome_build: hg19
## SRR1039517 Genome_build: hg19
## SRR1039520 Genome_build: hg19
## SRR1039521 Genome_build: hg19
##                                                                                                                                                                                                                                                             data_processing.6
## SRR1039508 Supplementary_files_format_and_content: tab-delimited text files on Series record include FPKM values for each sample (ALL_Sample_FPKM_Matrix.txt) and gene-based differential expression results for Dex vs. Untreated condition (Dex_vs_Untreated_gene_exp.diff)
## SRR1039509 Supplementary_files_format_and_content: tab-delimited text files on Series record include FPKM values for each sample (ALL_Sample_FPKM_Matrix.txt) and gene-based differential expression results for Dex vs. Untreated condition (Dex_vs_Untreated_gene_exp.diff)
## SRR1039512 Supplementary_files_format_and_content: tab-delimited text files on Series record include FPKM values for each sample (ALL_Sample_FPKM_Matrix.txt) and gene-based differential expression results for Dex vs. Untreated condition (Dex_vs_Untreated_gene_exp.diff)
## SRR1039513 Supplementary_files_format_and_content: tab-delimited text files on Series record include FPKM values for each sample (ALL_Sample_FPKM_Matrix.txt) and gene-based differential expression results for Dex vs. Untreated condition (Dex_vs_Untreated_gene_exp.diff)
## SRR1039516 Supplementary_files_format_and_content: tab-delimited text files on Series record include FPKM values for each sample (ALL_Sample_FPKM_Matrix.txt) and gene-based differential expression results for Dex vs. Untreated condition (Dex_vs_Untreated_gene_exp.diff)
## SRR1039517 Supplementary_files_format_and_content: tab-delimited text files on Series record include FPKM values for each sample (ALL_Sample_FPKM_Matrix.txt) and gene-based differential expression results for Dex vs. Untreated condition (Dex_vs_Untreated_gene_exp.diff)
## SRR1039520 Supplementary_files_format_and_content: tab-delimited text files on Series record include FPKM values for each sample (ALL_Sample_FPKM_Matrix.txt) and gene-based differential expression results for Dex vs. Untreated condition (Dex_vs_Untreated_gene_exp.diff)
## SRR1039521 Supplementary_files_format_and_content: tab-delimited text files on Series record include FPKM values for each sample (ALL_Sample_FPKM_Matrix.txt) and gene-based differential expression results for Dex vs. Untreated condition (Dex_vs_Untreated_gene_exp.diff)
##            platform_id   contact_name                 contact_email
## SRR1039508    GPL11154 Blanca,E,Himes bhimes@pennmedicine.upenn.edu
## SRR1039509    GPL11154 Blanca,E,Himes bhimes@pennmedicine.upenn.edu
## SRR1039512    GPL11154 Blanca,E,Himes bhimes@pennmedicine.upenn.edu
## SRR1039513    GPL11154 Blanca,E,Himes bhimes@pennmedicine.upenn.edu
## SRR1039516    GPL11154 Blanca,E,Himes bhimes@pennmedicine.upenn.edu
## SRR1039517    GPL11154 Blanca,E,Himes bhimes@pennmedicine.upenn.edu
## SRR1039520    GPL11154 Blanca,E,Himes bhimes@pennmedicine.upenn.edu
## SRR1039521    GPL11154 Blanca,E,Himes bhimes@pennmedicine.upenn.edu
##            contact_laboratory
## SRR1039508  402 Blockley Hall
## SRR1039509  402 Blockley Hall
## SRR1039512  402 Blockley Hall
## SRR1039513  402 Blockley Hall
## SRR1039516  402 Blockley Hall
## SRR1039517  402 Blockley Hall
## SRR1039520  402 Blockley Hall
## SRR1039521  402 Blockley Hall
##                                                   contact_department
## SRR1039508 Department of Biostatistics, Epidemiology and Informatics
## SRR1039509 Department of Biostatistics, Epidemiology and Informatics
## SRR1039512 Department of Biostatistics, Epidemiology and Informatics
## SRR1039513 Department of Biostatistics, Epidemiology and Informatics
## SRR1039516 Department of Biostatistics, Epidemiology and Informatics
## SRR1039517 Department of Biostatistics, Epidemiology and Informatics
## SRR1039520 Department of Biostatistics, Epidemiology and Informatics
## SRR1039521 Department of Biostatistics, Epidemiology and Informatics
##                     contact_institute contact_address contact_city
## SRR1039508 University of Pennsylvania 423 Guardian Dr Philadelphia
## SRR1039509 University of Pennsylvania 423 Guardian Dr Philadelphia
## SRR1039512 University of Pennsylvania 423 Guardian Dr Philadelphia
## SRR1039513 University of Pennsylvania 423 Guardian Dr Philadelphia
## SRR1039516 University of Pennsylvania 423 Guardian Dr Philadelphia
## SRR1039517 University of Pennsylvania 423 Guardian Dr Philadelphia
## SRR1039520 University of Pennsylvania 423 Guardian Dr Philadelphia
## SRR1039521 University of Pennsylvania 423 Guardian Dr Philadelphia
##            contact_state contact_zip.postal_code contact_country data_row_count
## SRR1039508            PA                   19104             USA              0
## SRR1039509            PA                   19104             USA              0
## SRR1039512            PA                   19104             USA              0
## SRR1039513            PA                   19104             USA              0
## SRR1039516            PA                   19104             USA              0
## SRR1039517            PA                   19104             USA              0
## SRR1039520            PA                   19104             USA              0
## SRR1039521            PA                   19104             USA              0
##               instrument_model library_selection library_source
## SRR1039508 Illumina HiSeq 2000              cDNA transcriptomic
## SRR1039509 Illumina HiSeq 2000              cDNA transcriptomic
## SRR1039512 Illumina HiSeq 2000              cDNA transcriptomic
## SRR1039513 Illumina HiSeq 2000              cDNA transcriptomic
## SRR1039516 Illumina HiSeq 2000              cDNA transcriptomic
## SRR1039517 Illumina HiSeq 2000              cDNA transcriptomic
## SRR1039520 Illumina HiSeq 2000              cDNA transcriptomic
## SRR1039521 Illumina HiSeq 2000              cDNA transcriptomic
##            library_strategy
## SRR1039508          RNA-Seq
## SRR1039509          RNA-Seq
## SRR1039512          RNA-Seq
## SRR1039513          RNA-Seq
## SRR1039516          RNA-Seq
## SRR1039517          RNA-Seq
## SRR1039520          RNA-Seq
## SRR1039521          RNA-Seq
##                                                                  relation
## SRR1039508 BioSample: https://www.ncbi.nlm.nih.gov/biosample/SAMN02422669
## SRR1039509 BioSample: https://www.ncbi.nlm.nih.gov/biosample/SAMN02422675
## SRR1039512 BioSample: https://www.ncbi.nlm.nih.gov/biosample/SAMN02422678
## SRR1039513 BioSample: https://www.ncbi.nlm.nih.gov/biosample/SAMN02422670
## SRR1039516 BioSample: https://www.ncbi.nlm.nih.gov/biosample/SAMN02422682
## SRR1039517 BioSample: https://www.ncbi.nlm.nih.gov/biosample/SAMN02422673
## SRR1039520 BioSample: https://www.ncbi.nlm.nih.gov/biosample/SAMN02422683
## SRR1039521 BioSample: https://www.ncbi.nlm.nih.gov/biosample/SAMN02422677
##                                                      relation.1
## SRR1039508 SRA: https://www.ncbi.nlm.nih.gov/sra?term=SRX384345
## SRR1039509 SRA: https://www.ncbi.nlm.nih.gov/sra?term=SRX384346
## SRR1039512 SRA: https://www.ncbi.nlm.nih.gov/sra?term=SRX384349
## SRR1039513 SRA: https://www.ncbi.nlm.nih.gov/sra?term=SRX384350
## SRR1039516 SRA: https://www.ncbi.nlm.nih.gov/sra?term=SRX384353
## SRR1039517 SRA: https://www.ncbi.nlm.nih.gov/sra?term=SRX384354
## SRR1039520 SRA: https://www.ncbi.nlm.nih.gov/sra?term=SRX384357
## SRR1039521 SRA: https://www.ncbi.nlm.nih.gov/sra?term=SRX384358
##            supplementary_file_1 cell.line.ch1              cell.type.ch1
## SRR1039508                 NONE        N61311 airway smooth muscle cells
## SRR1039509                 NONE        N61311 airway smooth muscle cells
## SRR1039512                 NONE       N052611 airway smooth muscle cells
## SRR1039513                 NONE       N052611 airway smooth muscle cells
## SRR1039516                 NONE       N080611 airway smooth muscle cells
## SRR1039517                 NONE       N080611 airway smooth muscle cells
## SRR1039520                 NONE       N061011 airway smooth muscle cells
## SRR1039521                 NONE       N061011 airway smooth muscle cells
##            ercc_mix.ch1                       tissue.ch1 treatment.ch1
## SRR1039508            - human airway smooth muscle cells     Untreated
## SRR1039509            - human airway smooth muscle cells Dexamethasone
## SRR1039512            - human airway smooth muscle cells     Untreated
## SRR1039513            1 human airway smooth muscle cells Dexamethasone
## SRR1039516            - human airway smooth muscle cells     Untreated
## SRR1039517            - human airway smooth muscle cells Dexamethasone
## SRR1039520            1 human airway smooth muscle cells     Untreated
## SRR1039521            1 human airway smooth muscle cells Dexamethasone
##                   Run         ReleaseDate            LoadDate    spots
## SRR1039508 SRR1039508 2014-01-02 09:16:11 2013-11-26 16:39:38 22935521
## SRR1039509 SRR1039509 2014-01-02 09:16:11 2013-11-26 16:38:41 21155707
## SRR1039512 SRR1039512 2014-01-02 09:16:11 2013-11-26 16:47:15 28136282
## SRR1039513 SRR1039513 2014-01-02 09:16:11 2013-11-26 17:02:02 43356464
## SRR1039516 SRR1039516 2014-01-02 09:16:11 2013-11-26 17:03:45 30043024
## SRR1039517 SRR1039517 2014-01-02 09:16:11 2013-11-26 16:49:48 34298260
## SRR1039520 SRR1039520 2014-01-02 09:16:11 2013-11-26 17:04:50 34575286
## SRR1039521 SRR1039521 2014-01-02 09:16:11 2013-11-26 17:08:16 41152075
##                 bases spots_with_mates avgLength size_MB AssemblyName
## SRR1039508 2889875646         22935521       126    1588           NA
## SRR1039509 2665619082         21155707       126    1480           NA
## SRR1039512 3545171532         28136282       126    2055           NA
## SRR1039513 3791311776         16823088        87    2271           NA
## SRR1039516 3612545622         27298970       120    2147           NA
## SRR1039517 4321580760         34298260       126    2654           NA
## SRR1039520 3518623962         21275888       101    2109           NA
## SRR1039521 4072315905         23487860        98    2449           NA
##                                                                                     download_path
## SRR1039508 https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos/sra-pub-run-5/SRR1039508/SRR1039508.1
## SRR1039509 https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos/sra-pub-run-5/SRR1039509/SRR1039509.1
## SRR1039512 https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos/sra-pub-run-5/SRR1039512/SRR1039512.1
## SRR1039513 https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos/sra-pub-run-5/SRR1039513/SRR1039513.1
## SRR1039516 https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos/sra-pub-run-5/SRR1039516/SRR1039516.1
## SRR1039517 https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos/sra-pub-run-5/SRR1039517/SRR1039517.1
## SRR1039520 https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos/sra-pub-run-5/SRR1039520/SRR1039520.1
## SRR1039521 https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos/sra-pub-run-5/SRR1039521/SRR1039521.1
##            Experiment LibraryName LibraryStrategy LibrarySelection
## SRR1039508  SRX384345          NA         RNA-Seq             cDNA
## SRR1039509  SRX384346          NA         RNA-Seq             cDNA
## SRR1039512  SRX384349          NA         RNA-Seq             cDNA
## SRR1039513  SRX384350          NA         RNA-Seq             cDNA
## SRR1039516  SRX384353          NA         RNA-Seq             cDNA
## SRR1039517  SRX384354          NA         RNA-Seq             cDNA
## SRR1039520  SRX384357          NA         RNA-Seq             cDNA
## SRR1039521  SRX384358          NA         RNA-Seq             cDNA
##             LibrarySource LibraryLayout InsertSize InsertDev Platform
## SRR1039508 TRANSCRIPTOMIC        PAIRED          0         0 ILLUMINA
## SRR1039509 TRANSCRIPTOMIC        PAIRED          0         0 ILLUMINA
## SRR1039512 TRANSCRIPTOMIC        PAIRED          0         0 ILLUMINA
## SRR1039513 TRANSCRIPTOMIC        PAIRED          0         0 ILLUMINA
## SRR1039516 TRANSCRIPTOMIC        PAIRED          0         0 ILLUMINA
## SRR1039517 TRANSCRIPTOMIC        PAIRED          0         0 ILLUMINA
## SRR1039520 TRANSCRIPTOMIC        PAIRED          0         0 ILLUMINA
## SRR1039521 TRANSCRIPTOMIC        PAIRED          0         0 ILLUMINA
##                          Model  SRAStudy  BioProject Study_Pubmed_id ProjectID
## SRR1039508 Illumina HiSeq 2000 SRP033351 PRJNA229998               2    229998
## SRR1039509 Illumina HiSeq 2000 SRP033351 PRJNA229998               2    229998
## SRR1039512 Illumina HiSeq 2000 SRP033351 PRJNA229998               2    229998
## SRR1039513 Illumina HiSeq 2000 SRP033351 PRJNA229998               2    229998
## SRR1039516 Illumina HiSeq 2000 SRP033351 PRJNA229998               2    229998
## SRR1039517 Illumina HiSeq 2000 SRP033351 PRJNA229998               2    229998
## SRR1039520 Illumina HiSeq 2000 SRP033351 PRJNA229998               2    229998
## SRR1039521 Illumina HiSeq 2000 SRP033351 PRJNA229998               2    229998
##               Sample    BioSample SampleType TaxID ScientificName g1k_pop_code
## SRR1039508 SRS508568 SAMN02422669     simple  9606   Homo sapiens           NA
## SRR1039509 SRS508567 SAMN02422675     simple  9606   Homo sapiens           NA
## SRR1039512 SRS508571 SAMN02422678     simple  9606   Homo sapiens           NA
## SRR1039513 SRS508572 SAMN02422670     simple  9606   Homo sapiens           NA
## SRR1039516 SRS508575 SAMN02422682     simple  9606   Homo sapiens           NA
## SRR1039517 SRS508576 SAMN02422673     simple  9606   Homo sapiens           NA
## SRR1039520 SRS508579 SAMN02422683     simple  9606   Homo sapiens           NA
## SRR1039521 SRS508580 SAMN02422677     simple  9606   Homo sapiens           NA
##            source g1k_analysis_group Subject_ID Sex Disease Tumor
## SRR1039508     NA                 NA         NA  NA      NA    no
## SRR1039509     NA                 NA         NA  NA      NA    no
## SRR1039512     NA                 NA         NA  NA      NA    no
## SRR1039513     NA                 NA         NA  NA      NA    no
## SRR1039516     NA                 NA         NA  NA      NA    no
## SRR1039517     NA                 NA         NA  NA      NA    no
## SRR1039520     NA                 NA         NA  NA      NA    no
## SRR1039521     NA                 NA         NA  NA      NA    no
##            Affection_Status Analyte_Type Histological_Type Body_Site CenterName
## SRR1039508               NA           NA                NA        NA        GEO
## SRR1039509               NA           NA                NA        NA        GEO
## SRR1039512               NA           NA                NA        NA        GEO
## SRR1039513               NA           NA                NA        NA        GEO
## SRR1039516               NA           NA                NA        NA        GEO
## SRR1039517               NA           NA                NA        NA        GEO
## SRR1039520               NA           NA                NA        NA        GEO
## SRR1039521               NA           NA                NA        NA        GEO
##            Submission dbgap_study_accession Consent
## SRR1039508  SRA114259                    NA  public
## SRR1039509  SRA114259                    NA  public
## SRR1039512  SRA114259                    NA  public
## SRR1039513  SRA114259                    NA  public
## SRR1039516  SRA114259                    NA  public
## SRR1039517  SRA114259                    NA  public
## SRR1039520  SRA114259                    NA  public
## SRR1039521  SRA114259                    NA  public
##                                     RunHash                         ReadHash
## SRR1039508 65345B22A4745B616DD5840CF2AD90EE 54B084AA7AD2A02B39D37CBBD5B1C3DC
## SRR1039509 E2F5FBD3278F29EC6EA7B00FC95C28B2 294D13FE5364BA88B5FEAD68A53CB11B
## SRR1039512 828A997474D0BBD6E5E14DEDB0D9E31D 5834A34CA3CE7EF962A396098543C9E6
## SRR1039513 0EB840FF1F4AA22A5017CB754C039225 5A5DD90EDCD5998A26F7F7BC099D4010
## SRR1039516 F88DCB56A77C7D4408B19C8B7E03CE3E 87E7B86FEEC66F5FF14D8FB7D9CB5AE3
## SRR1039517 87EB19F8B326E4DD8F17A8729295CC8D 1A0159FDA5AF8E3B542666812DE41137
## SRR1039520 3478B4523B6FD56D362B5890F0DBFE8D 0B6E49B16B2296F33C5B250BD1CB93D5
## SRR1039521 1ABC620B261A70743C633D564D0FBA62 28D9F5348982A2FEF14824E13AB8397A
##            cellLine     treatment
## SRR1039508   N61311     Untreated
## SRR1039509   N61311 Dexamethasone
## SRR1039512  N052611     Untreated
## SRR1039513  N052611 Dexamethasone
## SRR1039516  N080611     Untreated
## SRR1039517  N080611 Dexamethasone
## SRR1039520  N061011     Untreated
## SRR1039521  N061011 Dexamethasone

3.3 Data exploration

One way to reduce dimensionality is the use of multidimensional scaling (MDS). For MDS, we first have to calculate all pairwise distances between our objects (samples in this case), and then create a (typically) two-dimensional representation where these pre-calculated distances are represented as accurately as possible. This means that depending on how the pairwise sample distances are defined, the two-dimensional plot can be very different, and it is important to choose a distance that is suitable for the type of data at hand.

edgeR contains a function plotMDS, which operates on a DGEList object and generates a two-dimensional MDS representation of the samples. The default distance between two samples can be interpreted as the “typical” log fold change between the two samples, for the genes that are most different between them (by default, the top 500 genes, but this can be modified). We generate an MDS plot from the DGEList object dge, coloring by the treatment and using different plot symbols for different cell lines.

colnames(dge) <- paste0(substr(target$cellLine,1,3),"_",substr(target$treatment,1,3))
plotMDS(dge, top = 500,col=as.double(target$cellLine))

##Main analysis

3.3.1 Model

We first estimate the overdispersion.

dge <- estimateDisp(dge, design)
plotBCV(dge)

fit <- glmFit(dge, design)
lrt <- glmLRT(fit, coef = "treatmentDexamethasone")
ttAll <- topTags(lrt, n = nrow(dge)) # all genes
hist(ttAll$table$PValue)

tt <- topTags(lrt, n = nrow(dge), p.value = 0.05) # genes with adj.p<0.05
head(tt)

## Coefficient:  treatmentDexamethasone 
##                       logFC   logCPM       LR        PValue           FDR
## ENSG00000109906.13 6.453642 4.098238 759.1054 4.203188e-167 7.206366e-163
## ENSG00000189221.9  3.395128 6.726943 502.9298 2.190325e-111 1.877656e-107
## ENSG00000120129.5  2.948968 7.262484 501.2133 5.175974e-111 2.958069e-107
## ENSG00000157214.13 2.016345 7.076735 452.8130 1.761685e-100  7.551020e-97
## ENSG00000196136.17 3.237191 6.919779 435.9441  8.263555e-97  2.833573e-93
## ENSG00000152583.12 4.501127 5.494281 434.3035  1.880370e-96  4.607574e-93

nrow(tt)

## [1] 4581

3.3.2 Plots

We first make a volcanoplot and an MA plot.

library(ggplot2)
volcano <-  ggplot(ttAll$table,aes(x=logFC,y=-log10(PValue),color=FDR<0.05)) + geom_point() + scale_color_manual(values=c("black","red"))
volcano

plotSmear(lrt, de.tags = row.names(tt$table))

library(pheatmap)
pheatmap(cpm(dge,log=TRUE)[rownames(tt$table)[1:30],])

LS0tCnRpdGxlOiAiQWlyd2F5OiBERSBhbmFseXNpcyBiYXNlZCBvbiB0cmFuc2NyaXB0IGNvdW50cyBhbmQgREVTZXEyIgphdXRob3I6ICJMaWV2ZW4gQ2xlbWVudCIKZGF0ZTogInN0YXRPbWljcywgR2hlbnQgVW5pdmVyc2l0eSAoaHR0cHM6Ly9zdGF0b21pY3MuZ2l0aHViLmlvKSIKb3V0cHV0OgogICAgaHRtbF9kb2N1bWVudDoKICAgICAgY29kZV9kb3dubG9hZDogdHJ1ZQogICAgICB0aGVtZTogY29zbW8KICAgICAgdG9jOiB0cnVlCiAgICAgIHRvY19mbG9hdDogdHJ1ZQogICAgICBoaWdobGlnaHQ6IHRhbmdvCiAgICAgIG51bWJlcl9zZWN0aW9uczogdHJ1ZQotLS0KCiMgQmFja2dyb3VuZApUaGUgZGF0YSB1c2VkIGluIHRoaXMgd29ya2Zsb3cgY29tZXMgZnJvbSBhbiBSTkEtc2VxIGV4cGVyaW1lbnQgd2hlcmUgYWlyd2F5IHNtb290aCBtdXNjbGUgY2VsbHMgd2VyZSB0cmVhdGVkIHdpdGggZGV4YW1ldGhhc29uZSwgYSBzeW50aGV0aWMgZ2x1Y29jb3J0aWNvaWQgc3Rlcm9pZCB3aXRoIGFudGktaW5mbGFtbWF0b3J5IGVmZmVjdHMgKEhpbWVzIGV0IGFsLiAyMDE0KS4gR2x1Y29jb3J0aWNvaWRzIGFyZSB1c2VkLCBmb3IgZXhhbXBsZSwgYnkgcGVvcGxlIHdpdGggYXN0aG1hIHRvIHJlZHVjZSBpbmZsYW1tYXRpb24gb2YgdGhlIGFpcndheXMuIEluIHRoZSBleHBlcmltZW50LCBmb3VyIGh1bWFuIGFpcndheSBzbW9vdGggbXVzY2xlIGNlbGwgbGluZXMgd2VyZSB0cmVhdGVkIHdpdGggMSBtaWNyb21vbGFyIGRleGFtZXRoYXNvbmUgZm9yIDE4IGhvdXJzLiBGb3IgZWFjaCBvZiB0aGUgZm91ciBjZWxsIGxpbmVzLCB3ZSBoYXZlIGEgdHJlYXRlZCBhbmQgYW4gdW50cmVhdGVkIHNhbXBsZS4KRm9yIG1vcmUgZGVzY3JpcHRpb24gb2YgdGhlIGV4cGVyaW1lbnQgc2VlIHRoZSBhcnRpY2xlLCBQdWJNZWQgZW50cnkgMjQ5MjY2NjUsIGFuZCBmb3IgcmF3IGRhdGEgc2VlIHRoZSBHRU8gZW50cnkgR1NFNTI3NzguCgpJbiB0aGlzIGZpbGUgd2Ugd2lsbCBhc3Nlc3MgREUgYnkgc3RhcnRpbmcgZnJvbSB0cmFuc2NyaXB0IGNvdW50cyB0aGF0IHdlIHN1bW1hcml6ZSBhdCB0aGUgZ2VuZSBsZXZlbC4gVGhlIERFIGFuYWx5c2lzIGlzIHBlcmZvcm1lZCBieSBERVNlcTIuClRoZSBhbmFseXNpcyBpcyBiYXNlZCBvbiB0aGUgUi9tYXJrZG93biBzY3JpcHQgZnJvbSBDaGFybG90dGUgU29uZXNvbiBnaXZlbiBhdCBzaG9ydGNvdXJzZSBpbiAyMDE5OiBodHRwczovL3VjbG91dmFpbi1jYmlvLmdpdGh1Yi5pby9CU1MyMDE5LwpodHRwczovL3VjbG91dmFpbi1jYmlvLmdpdGh1Yi5pby9CU1MyMDE5L3JuYXNlcV9nZW5lX3N1bW1lcnNjaG9vbF9iZWxnaXVtXzIwMTkuaHRtbAoKIyBEYXRhCkZhc3RRIGZpbGVzIHdpdGggYSBzbWFsbCBzdWJzZXQgb2YgdGhlIHJlYWRzIGNhbiBiZSBmb3VuZCBvbiBodHRwczovL2dpdGh1Yi5jb20vc3RhdE9taWNzL1NHQTIwMTkvdHJlZS9kYXRhLXJuYXNlcQoKCiMjIFJlYWQgTWV0YSBEYXRhCgpgYGB7cn0Kc3VwcHJlc3NQYWNrYWdlU3RhcnR1cE1lc3NhZ2VzKHsKICBsaWJyYXJ5KHR4aW1ldGEpCiAgbGlicmFyeShERVNlcTIpCiAgbGlicmFyeShvcmcuSHMuZWcuZGIpCiAgbGlicmFyeShTdW1tYXJpemVkRXhwZXJpbWVudCkKICBsaWJyYXJ5KGdncGxvdDIpCiAgbGlicmFyeSh0aWR5dmVyc2UpCiAgbGlicmFyeShwaGVhdG1hcCkKfSkKCnRhcmdldDwtcmVhZFJEUygiYWlyd2F5TWV0YURhdGEucmRzIikKYGBgCgpXZSBkbyBub3QgaGF2ZSB0aGUgQWxidXRlcm9sIHNhbXBsZXMKCmBgYHtyfQpyb3duYW1lcyh0YXJnZXQpIDwtIHRhcmdldCRSdW4KdGFyZ2V0IDwtIHRhcmdldFstZ3JlcCgiQWxidXRlcm9sIix0YXJnZXRbLCJ0cmVhdG1lbnQ6Y2gxIl0pLF0KdGFyZ2V0WyxncmVwKCI6Y2gxIixjb2xuYW1lcyh0YXJnZXQpKV0KYGBgCgpgYGB7cn0KdGFyZ2V0JGNlbGxMaW5lIDwtIHRhcmdldFssIGdyZXAoImNlbGwgbGluZTpjaDEiLCBjb2xuYW1lcyh0YXJnZXQpKV0gJT4lCiAgYXMuZmFjdG9yCnRhcmdldCR0cmVhdG1lbnQ8LXRhcmdldFssIGdyZXAoInRyZWF0bWVudDpjaDEiLCBjb2xuYW1lcyh0YXJnZXQpKV0gJT4lCiAgYXMuZmFjdG9yCnRhcmdldCR0cmVhdG1lbnQ8LXJlbGV2ZWwodGFyZ2V0JHRyZWF0bWVudCwgIlVudHJlYXRlZCIpCmBgYAoKCiNBbGlnbm1lbnQtZnJlZSB0cmFuc2NyaXB0IHF1YW50aWZpY2F0aW9uCgpTb2Z0d2FyZSBzdWNoIGFzICprYWxsaXN0byoKW0BCcmF5MjAxNk5lYXJdLCAqU2FsbW9uKiBbQFBhdHJvMjAxN1NhbG1vbl0gYW5kICpTYWlsZmlzaCoKW0BQYXRybzIwMTRTYWlsZmlzaF0sIGFzIHdlbGwgYXMgb3RoZXIgdHJhbnNjcmlwdCBxdWFudGlmaWNhdGlvbiBtZXRob2RzIGxpa2UKKkN1ZmZsaW5rcyogW0BUcmFwbmVsbDIwMTBDdWZmbGlua3M7IEBUcmFwbmVsbDIwMTNDdWZmbGlua3MyXSBhbmQgKlJTRU0qCltATGkyMDExUlNFTV0sIGRpZmZlciBmcm9tIHRoZSBjb3VudGluZyBtZXRob2RzIGludHJvZHVjZWQgaW4gdGhlIHByZXZpb3VzIHR1dG9yaWFscyBpbiB0aGF0IHRoZXkKcHJvdmlkZSBxdWFudGlmaWNhdGlvbnMgKHVzdWFsbHkgYm90aCBhcyBjb3VudHMgYW5kIGFzIFRQTXMpIGZvciBlYWNoCip0cmFuc2NyaXB0Ki4gVGhlc2UgY2FuIHRoZW4gYmUgc3VtbWFyaXplZCBvbiB0aGUgZ2VuZSBsZXZlbCBieSBhZGRpbmcgYWxsCnZhbHVlcyBmb3IgdHJhbnNjcmlwdHMgZnJvbSB0aGUgc2FtZSBnZW5lLiBBIHNpbXBsZSB3YXkgdG8gaW1wb3J0IHJlc3VsdHMgZnJvbQp0aGVzZSBwYWNrYWdlcyBpbnRvIFIgaXMgcHJvdmlkZWQgYnkgdGhlIGB0eGltcG9ydGAgYW5kCmB0eGltZXRhYCBwYWNrYWdlcy4gSGVyZSwgKnR4aW1wb3J0KiByZWFkcyB0aGUgcXVhbnRpZmljYXRpb25zIGludG8gYQpsaXN0IG9mIG1hdHJpY2VzLCBhbmQgKnR4aW1ldGEqIGFnZ3JlZ2F0ZXMgdGhlIGluZm9ybWF0aW9uIGludG8gYQoqU3VtbWFyaXplZEV4cGVyaW1lbnQqIG9iamVjdCwgYW5kIGFsc28gYXV0b21hdGljYWxseSBhZGRzIGFkZGl0aW9uYWwKYW5ub3RhdGlvbnMgZm9yIHRoZSBmZWF0dXJlcy4gQm90aCBwYWNrYWdlcyBjYW4gcmV0dXJuIHF1YW50aWZpY2F0aW9ucyBvbiB0aGUKdHJhbnNjcmlwdCBsZXZlbCBvciBhZ2dyZWdhdGUgdGhlbSBvbiB0aGUgZ2VuZSBsZXZlbC4gVGhleSBhbHNvIGNhbGN1bGF0ZQphdmVyYWdlIHRyYW5zY3JpcHQgbGVuZ3RocyBmb3IgZWFjaCBnZW5lIGFuZCBlYWNoIHNhbXBsZSwgd2hpY2ggY2FuIGJlIHVzZWQgYXMKb2Zmc2V0cyB0byBpbXByb3ZlIHRoZSBkaWZmZXJlbnRpYWwgZXhwcmVzc2lvbiBhbmFseXNpcyBieSBhY2NvdW50aW5nIGZvcgpkaWZmZXJlbnRpYWwgaXNvZm9ybSB1c2FnZSBhY3Jvc3Mgc2FtcGxlcyBbQFNvbmVzb24yMDE1RGlmZmVyZW50aWFsXS4KYXR1cmVjb3VudHMgb2JqZWN0CgojI1NhbG1vbgoKVGhlIGNvZGUgYmVsb3cgaW1wb3J0cyB0aGUgKlNhbG1vbiogcXVhbnRpZmljYXRpb25zIGludG8gUiB1c2luZyB0aGUgKnR4aW1ldGEqCnBhY2thZ2UuIE5vdGUgaG93IHRoZSB0cmFuc2NyaXB0b21lIHRoYXQgd2FzIHVzZWQgZm9yIHRoZSBxdWFudGlmaWNhdGlvbiBpcwphdXRvbWF0aWNhbGx5IHJlY29nbml6ZWQgYW5kIHVzZWQgdG8gYW5ub3RhdGUgdGhlIHJlc3VsdGluZyBkYXRhIG9iamVjdC4gSW4Kb3JkZXIgZm9yIHRoaXMgdG8gd29yaywgKnR4aW1ldGEqIHJlcXVpcmVzIHRoYXQgdGhlIG91dHB1dCBmb2xkZXIgc3RydWN0dXJlIGZyb20KU2FsbW9uIGlzIHJldGFpbmVkLCBzaW5jZSBpdCByZWFkcyBpbmZvcm1hdGlvbiBmcm9tIHRoZSBhc3NvY2lhdGVkIGxvZyBmaWxlcyBpbgphZGRpdGlvbiB0byB0aGUgcXVhbnRpZmllZCBhYnVuZGFuY2VzIHRoZW1zZWx2ZXMuIFdpdGggdGhlIGBhZGRJZHMoKWAgZnVuY3Rpb24sCndlIGNhbiBhZGQgYWRkaXRpb25hbCBhbm5vdGF0aW9uIGNvbHVtbnMuCgpgYGB7cn0KCiMjIExpc3QgYWxsIHF1YW50LnNmIG91dHB1dCBmaWxlcyBmcm9tIFNhbG1vbgpzYWxtb25maWxlcyA8LSBwYXN0ZTAoIn4vRG93bmxvYWRzL2FpcndheUxlY3R1cmUvc2FsbW9uLyIsIHRhcmdldCRSdW4sICIvcXVhbnQuc2YiKQpuYW1lcyhzYWxtb25maWxlcykgPC0gdGFyZ2V0JFJ1bgpzdG9waWZub3QoYWxsKGZpbGUuZXhpc3RzKHNhbG1vbmZpbGVzKSkpCgojIyBBZGQgYSBjb2x1bW4gImZpbGVzIiB0byB0aGUgbWV0YWRhdGEgdGFibGUuIFRoaXMgdGFibGUgbXVzdCBjb250YWluIGF0IGxlYXN0CiMjIHR3byBjb2x1bW5zOiAibmFtZXMiIGFuZCAiZmlsZXMiCmNvbGRhdGEgPC0gY2JpbmQodGFyZ2V0LCBmaWxlcyA9IHNhbG1vbmZpbGVzLCBzdHJpbmdzQXNGYWN0b3JzID0gRkFMU0UpCmNvbGRhdGEkbmFtZXMgPC0gY29sZGF0YSRSdW4KCiMjIEltcG9ydCBxdWFudGlmaWNhdGlvbnMgb24gdGhlIHRyYW5zY3JpcHQgbGV2ZWwKc3QgPC0gdHhpbWV0YTo6dHhpbWV0YShjb2xkYXRhKQoKIyMgU3VtbWFyaXplIHF1YW50aWZpY2F0aW9ucyBvbiB0aGUgZ2VuZSBsZXZlbApzZyA8LSB0eGltZXRhOjpzdW1tYXJpemVUb0dlbmUoc3QpCgojIyBBZGQgZ2VuZSBzeW1ib2xzCnNnIDwtIHR4aW1ldGE6OmFkZElkcyhzZywgIlNZTUJPTCIsIGdlbmUgPSBUUlVFKQpzZwpgYGAKCk5vdGUgdGhhdCAqU2FsbW9uKiByZXR1cm5zICplc3RpbWF0ZWQqIGNvdW50cywgd2hpY2ggYXJlIG5vdCBuZWNlc3NhcmlseQppbnRlZ2Vycy4gVGhleSBtYXkgbmVlZCB0byBiZSByb3VuZGVkIGJlZm9yZSB0aGV5IGFyZSBwYXNzZWQgdG8gY291bnQtYmFzZWQKc3RhdGlzdGljYWwgbWV0aG9kcyAoZS5nLiAqREVTZXEyKikuClRoaXMgaXMgbm90IG5lY2VzYXJ5IGZvciAqZWRnZVIqLgoKYGBge3J9CmNvdW50c19zYWxtb24gPC0gcm91bmQoYXNzYXkoc2csICJjb3VudHMiKSkKYGBgCgoKIyBEYXRhIEFuYWx5c2lzCiMjIFNldHVwIGNvdW50IG9iamVjdCBpbiBlZGdlUgoKYGBge3J9CnN1cHByZXNzUGFja2FnZVN0YXJ0dXBNZXNzYWdlcyh7CiAgbGlicmFyeShlZGdlUikKfSkKZGdlIDwtIERHRUxpc3QoY291bnRzID0gY291bnRzX3NhbG1vbiwKICAgICAgICAgICAgICAgc2FtcGxlcyA9IHRhcmdldCkKYGBgCgpgYGB7cn0KYXZldHhsZW5ndGhzIDwtIGFzc2F5KHNnLCAibGVuZ3RoIikKc3RvcGlmbm90KGFsbChyb3duYW1lcyhhdmV0eGxlbmd0aHMpID09IHJvd25hbWVzKGNvdW50c19zYWxtb24pKSkKc3RvcGlmbm90KGFsbChjb2xuYW1lcyhhdmV0eGxlbmd0aHMpID09IGNvbG5hbWVzKGNvdW50c19zYWxtb24pKSkKYXZldHhsZW5ndGhzIDwtIGF2ZXR4bGVuZ3Rocy9leHAocm93TWVhbnMobG9nKGF2ZXR4bGVuZ3RocykpKQpvZmZzZXRzIDwtIGxvZyhjYWxjTm9ybUZhY3RvcnMoY291bnRzX3NhbG1vbi9hdmV0eGxlbmd0aHMpKSArCiAgbG9nKGNvbFN1bXMoY291bnRzX3NhbG1vbi9hdmV0eGxlbmd0aHMpKQpkZ2UgPC0gc2NhbGVPZmZzZXQoZGdlLCB0KHQobG9nKGF2ZXR4bGVuZ3RocykpICsgb2Zmc2V0cykpCm5hbWVzKGRnZSkKYGBgCgojIyBGaWx0ZXJpbmcgYW5kIG5vcm1hbGlzYXRpb24KCmBgYHtyfQpkZXNpZ24gPC0gbW9kZWwubWF0cml4KH50cmVhdG1lbnQrY2VsbExpbmUsZGF0YT10YXJnZXQpCmtlZXAgPC0gZmlsdGVyQnlFeHByKGRnZSwgZGVzaWduKQpkZ2UgPC0gZGdlW2tlZXAsICxrZWVwLmxpYi5zaXplcz1GQUxTRV0KZGdlIDwtIGNhbGNOb3JtRmFjdG9ycyhkZ2UpCmRnZSRzYW1wbGVzCmBgYAoKIyMgRGF0YSBleHBsb3JhdGlvbgoKT25lIHdheSB0byByZWR1Y2UgZGltZW5zaW9uYWxpdHkgaXMgdGhlIHVzZSBvZiBtdWx0aWRpbWVuc2lvbmFsIHNjYWxpbmcgKE1EUykuIEZvciBNRFMsIHdlIGZpcnN0IGhhdmUgdG8gY2FsY3VsYXRlIGFsbCBwYWlyd2lzZSBkaXN0YW5jZXMgYmV0d2VlbiBvdXIgb2JqZWN0cyAoc2FtcGxlcyBpbiB0aGlzIGNhc2UpLCBhbmQgdGhlbiBjcmVhdGUgYSAodHlwaWNhbGx5KSB0d28tZGltZW5zaW9uYWwgcmVwcmVzZW50YXRpb24gd2hlcmUgdGhlc2UgcHJlLWNhbGN1bGF0ZWQgZGlzdGFuY2VzIGFyZSByZXByZXNlbnRlZCBhcyBhY2N1cmF0ZWx5IGFzIHBvc3NpYmxlLiBUaGlzIG1lYW5zIHRoYXQgZGVwZW5kaW5nIG9uIGhvdyB0aGUgcGFpcndpc2Ugc2FtcGxlIGRpc3RhbmNlcyBhcmUgZGVmaW5lZCwgdGhlIHR3by1kaW1lbnNpb25hbCBwbG90IGNhbiBiZSB2ZXJ5IGRpZmZlcmVudCwgYW5kIGl0IGlzIGltcG9ydGFudCB0byBjaG9vc2UgYSBkaXN0YW5jZSB0aGF0IGlzIHN1aXRhYmxlIGZvciB0aGUgdHlwZSBvZiBkYXRhIGF0IGhhbmQuCgplZGdlUiBjb250YWlucyBhIGZ1bmN0aW9uIHBsb3RNRFMsIHdoaWNoIG9wZXJhdGVzIG9uIGEgREdFTGlzdCBvYmplY3QgYW5kIGdlbmVyYXRlcyBhIHR3by1kaW1lbnNpb25hbCBNRFMgcmVwcmVzZW50YXRpb24gb2YgdGhlIHNhbXBsZXMuIFRoZSBkZWZhdWx0IGRpc3RhbmNlIGJldHdlZW4gdHdvIHNhbXBsZXMgY2FuIGJlIGludGVycHJldGVkIGFzIHRoZSAidHlwaWNhbCIgbG9nIGZvbGQgY2hhbmdlIGJldHdlZW4gdGhlIHR3byBzYW1wbGVzLCBmb3IgdGhlIGdlbmVzIHRoYXQgYXJlIG1vc3QgZGlmZmVyZW50IGJldHdlZW4gdGhlbSAoYnkgZGVmYXVsdCwgdGhlIHRvcCA1MDAgZ2VuZXMsIGJ1dCB0aGlzIGNhbiBiZSBtb2RpZmllZCkuIFdlIGdlbmVyYXRlIGFuIE1EUyBwbG90IGZyb20gdGhlIERHRUxpc3Qgb2JqZWN0IGRnZSwgY29sb3JpbmcgYnkgdGhlIHRyZWF0bWVudCBhbmQgdXNpbmcgZGlmZmVyZW50IHBsb3Qgc3ltYm9scyBmb3IgZGlmZmVyZW50IGNlbGwgbGluZXMuCgpgYGB7cn0KY29sbmFtZXMoZGdlKSA8LSBwYXN0ZTAoc3Vic3RyKHRhcmdldCRjZWxsTGluZSwxLDMpLCJfIixzdWJzdHIodGFyZ2V0JHRyZWF0bWVudCwxLDMpKQpwbG90TURTKGRnZSwgdG9wID0gNTAwLGNvbD1hcy5kb3VibGUodGFyZ2V0JGNlbGxMaW5lKSkKYGBgCgoKIyNNYWluIGFuYWx5c2lzCgojIyMgTW9kZWwKCldlIGZpcnN0IGVzdGltYXRlIHRoZSBvdmVyZGlzcGVyc2lvbi4KCmBgYHtyfQpkZ2UgPC0gZXN0aW1hdGVEaXNwKGRnZSwgZGVzaWduKQpwbG90QkNWKGRnZSkKYGBgCgoKYGBge3J9CmZpdCA8LSBnbG1GaXQoZGdlLCBkZXNpZ24pCmxydCA8LSBnbG1MUlQoZml0LCBjb2VmID0gInRyZWF0bWVudERleGFtZXRoYXNvbmUiKQp0dEFsbCA8LSB0b3BUYWdzKGxydCwgbiA9IG5yb3coZGdlKSkgIyBhbGwgZ2VuZXMKaGlzdCh0dEFsbCR0YWJsZSRQVmFsdWUpCnR0IDwtIHRvcFRhZ3MobHJ0LCBuID0gbnJvdyhkZ2UpLCBwLnZhbHVlID0gMC4wNSkgIyBnZW5lcyB3aXRoIGFkai5wPDAuMDUKaGVhZCh0dCkKbnJvdyh0dCkKYGBgCgoKIyMjIFBsb3RzCgpXZSBmaXJzdCBtYWtlIGEgdm9sY2Fub3Bsb3QgYW5kIGFuIE1BIHBsb3QuCgpgYGB7cn0KbGlicmFyeShnZ3Bsb3QyKQp2b2xjYW5vIDwtICBnZ3Bsb3QodHRBbGwkdGFibGUsYWVzKHg9bG9nRkMseT0tbG9nMTAoUFZhbHVlKSxjb2xvcj1GRFI8MC4wNSkpICsgZ2VvbV9wb2ludCgpICsgc2NhbGVfY29sb3JfbWFudWFsKHZhbHVlcz1jKCJibGFjayIsInJlZCIpKQp2b2xjYW5vCnBsb3RTbWVhcihscnQsIGRlLnRhZ3MgPSByb3cubmFtZXModHQkdGFibGUpKQpsaWJyYXJ5KHBoZWF0bWFwKQpwaGVhdG1hcChjcG0oZGdlLGxvZz1UUlVFKVtyb3duYW1lcyh0dCR0YWJsZSlbMTozMF0sXSkKYGBgCg==

Airway: DE analysis based on transcript counts and DESeq2

Lieven Clement

statOmics, Ghent University (https://statomics.github.io)