The data used in this workflow comes from an RNA-seq experiment where airway smooth muscle cells were treated with dexamethasone, a synthetic glucocorticoid steroid with anti-inflammatory effects (Himes et al. 2014). Glucocorticoids are used, for example, by people with asthma to reduce inflammation of the airways. In the experiment, four human airway smooth muscle cell lines were treated with 1 micromolar dexamethasone for 18 hours. For each of the four cell lines, we have a treated and an untreated sample. For more description of the experiment see the article, PubMed entry 24926665, and for raw data see the GEO entry GSE52778.
In this file we will assess DE by starting from transcript counts that we summarize at the gene level. The DE analysis is performed by DESeq2. The analysis is based on the R/markdown script from Charlotte Soneson given at shortcourse in 2019: https://uclouvain-cbio.github.io/BSS2019/ https://uclouvain-cbio.github.io/BSS2019/rnaseq_gene_summerschool_belgium_2019.html
FastQ files with a small subset of the reads can be found on https://github.com/statOmics/SGA2019/tree/data-rnaseq
suppressPackageStartupMessages({
library(tximeta)
library(DESeq2)
library(org.Hs.eg.db)
library(SummarizedExperiment)
library(ggplot2)
library(tidyverse)
library(pheatmap)
})
readRDS("airwayMetaData.rds") target<-
## Warning in readRDS("airwayMetaData.rds"): input string 'Cells from each donor
## were treated with 1 μM dexamethasone, 1 μM albuterol, both dexamethasone and
## albuterol, or control vehicle for 18 h.' cannot be translated to UTF-8, is it
## valid in 'US-ASCII'?
## Warning in readRDS("airwayMetaData.rds"): input string 'Primary ASM cells were
## isolated from four white aborted lung transplant donors with no chronic illness.
## Passages 4 to 7 ASM cells maintained in Ham's F12 medium supplemented with 10%
## FBS were used in all experiments.' cannot be translated to UTF-8, is it valid in
## 'US-ASCII'?
We do not have the Albuterol samples
rownames(target) <- target$Run
target[-grep("Albuterol",target[,"treatment:ch1"]),]
target <-grep(":ch1",colnames(target))] target[,
## cell line:ch1 cell type:ch1 ercc_mix:ch1
## SRR1039508 N61311 airway smooth muscle cells -
## SRR1039509 N61311 airway smooth muscle cells -
## SRR1039512 N052611 airway smooth muscle cells -
## SRR1039513 N052611 airway smooth muscle cells 1
## SRR1039516 N080611 airway smooth muscle cells -
## SRR1039517 N080611 airway smooth muscle cells -
## SRR1039520 N061011 airway smooth muscle cells 1
## SRR1039521 N061011 airway smooth muscle cells 1
## tissue:ch1 treatment:ch1
## SRR1039508 human airway smooth muscle cells Untreated
## SRR1039509 human airway smooth muscle cells Dexamethasone
## SRR1039512 human airway smooth muscle cells Untreated
## SRR1039513 human airway smooth muscle cells Dexamethasone
## SRR1039516 human airway smooth muscle cells Untreated
## SRR1039517 human airway smooth muscle cells Dexamethasone
## SRR1039520 human airway smooth muscle cells Untreated
## SRR1039521 human airway smooth muscle cells Dexamethasone
$cellLine <- target[, grep("cell line:ch1", colnames(target))] %>%
target as.factor
$treatment<-target[, grep("treatment:ch1", colnames(target))] %>%
target as.factor
$treatment<-relevel(target$treatment, "Untreated") target
#Alignment-free transcript quantification
Software such as kallisto [@Bray2016Near], Salmon [@Patro2017Salmon] and Sailfish [@Patro2014Sailfish], as well as other transcript quantification methods like Cufflinks [@Trapnell2010Cufflinks; @Trapnell2013Cufflinks2] and RSEM [@Li2011RSEM], differ from the counting methods introduced in the previous tutorials in that they provide quantifications (usually both as counts and as TPMs) for each transcript. These can then be summarized on the gene level by adding all values for transcripts from the same gene. A simple way to import results from these packages into R is provided by the tximport
and tximeta
packages. Here, tximport reads the quantifications into a list of matrices, and tximeta aggregates the information into a SummarizedExperiment object, and also automatically adds additional annotations for the features. Both packages can return quantifications on the transcript level or aggregate them on the gene level. They also calculate average transcript lengths for each gene and each sample, which can be used as offsets to improve the differential expression analysis by accounting for differential isoform usage across samples [@Soneson2015Differential]. aturecounts object
##Salmon
The code below imports the Salmon quantifications into R using the tximeta package. Note how the transcriptome that was used for the quantification is automatically recognized and used to annotate the resulting data object. In order for this to work, tximeta requires that the output folder structure from Salmon is retained, since it reads information from the associated log files in addition to the quantified abundances themselves. With the addIds()
function, we can add additional annotation columns.
## List all quant.sf output files from Salmon
paste0("~/Downloads/airwayLecture/salmon/", target$Run, "/quant.sf")
salmonfiles <-names(salmonfiles) <- target$Run
stopifnot(all(file.exists(salmonfiles)))
## Add a column "files" to the metadata table. This table must contain at least
## two columns: "names" and "files"
cbind(target, files = salmonfiles, stringsAsFactors = FALSE)
coldata <-$names <- coldata$Run
coldata
## Import quantifications on the transcript level
tximeta::tximeta(coldata) st <-
## importing quantifications
## reading in files with read_tsv
## 1 2 3 4 5 6 7 8
## found matching transcriptome:
## [ GENCODE - Homo sapiens - release 29 ]
## loading existing TxDb created: 2019-11-25 07:49:13
## loading existing transcript ranges created: 2020-11-20 06:53:12
## fetching genome info for GENCODE
## Summarize quantifications on the gene level
tximeta::summarizeToGene(st) sg <-
## loading existing TxDb created: 2019-11-25 07:49:13
## obtaining transcript-to-gene mapping from database
## loading existing gene ranges created: 2020-11-20 06:53:23
## summarizing abundance
## summarizing counts
## summarizing length
## Add gene symbols
tximeta::addIds(sg, "SYMBOL", gene = TRUE) sg <-
## mapping to new IDs using 'org.Hs.eg.db' data package
## if all matching IDs are desired, and '1:many mappings' are reported,
## set multiVals='list' to obtain all the matching IDs
## 'select()' returned 1:many mapping between keys and columns
sg
## class: RangedSummarizedExperiment
## dim: 58294 8
## metadata(6): tximetaInfo quantInfo ... txomeInfo txdbInfo
## assays(3): counts abundance length
## rownames(58294): ENSG00000000003.14 ENSG00000000005.5 ...
## ENSG00000285993.1 ENSG00000285994.1
## rowData names(2): gene_id SYMBOL
## colnames(8): SRR1039508 SRR1039509 ... SRR1039520 SRR1039521
## colData names(101): SampleName title ... treatment names
Note that Salmon returns estimated counts, which are not necessarily integers. They may need to be rounded before they are passed to count-based statistical methods (e.g. DESeq2). This is not necesary for edgeR.
round(assay(sg, "counts")) counts_salmon <-
suppressPackageStartupMessages({
library(edgeR)
}) DGEList(counts = counts_salmon,
dge <-samples = target)
assay(sg, "length")
avetxlengths <-stopifnot(all(rownames(avetxlengths) == rownames(counts_salmon)))
stopifnot(all(colnames(avetxlengths) == colnames(counts_salmon)))
avetxlengths/exp(rowMeans(log(avetxlengths)))
avetxlengths <- log(calcNormFactors(counts_salmon/avetxlengths)) +
offsets <- log(colSums(counts_salmon/avetxlengths))
scaleOffset(dge, t(t(log(avetxlengths)) + offsets))
dge <-names(dge)
## [1] "counts" "samples" "offset"
model.matrix(~treatment+cellLine,data=target)
design <- filterByExpr(dge, design)
keep <- dge[keep, ,keep.lib.sizes=FALSE]
dge <- calcNormFactors(dge) dge <-
## Warning in calcNormFactors.DGEList(dge): object contains offsets, which take precedence over library
## sizes and norm factors (and which will not be recomputed).
$samples dge
## group lib.size norm.factors SampleName title
## SRR1039508 1 21048128 1.0580056 GSM1275862 N61311_untreated
## SRR1039509 1 19242591 1.0199099 GSM1275863 N61311_Dex
## SRR1039512 1 26056184 0.9939708 GSM1275866 N052611_untreated
## SRR1039513 1 15638136 0.9464884 GSM1275867 N052611_Dex
## SRR1039516 1 25179091 1.0340526 GSM1275870 N080611_untreated
## SRR1039517 1 31804785 0.9698506 GSM1275871 N080611_Dex
## SRR1039520 1 19619345 1.0306481 GSM1275874 N061011_untreated
## SRR1039521 1 21761908 0.9530232 GSM1275875 N061011_Dex
## geo_accession status submission_date last_update_date
## SRR1039508 GSM1275862 Public on Jan 01 2014 Nov 26 2013 May 15 2019
## SRR1039509 GSM1275863 Public on Jan 01 2014 Nov 26 2013 May 15 2019
## SRR1039512 GSM1275866 Public on Jan 01 2014 Nov 26 2013 May 15 2019
## SRR1039513 GSM1275867 Public on Jan 01 2014 Nov 26 2013 May 15 2019
## SRR1039516 GSM1275870 Public on Jan 01 2014 Nov 26 2013 May 15 2019
## SRR1039517 GSM1275871 Public on Jan 01 2014 Nov 26 2013 May 15 2019
## SRR1039520 GSM1275874 Public on Jan 01 2014 Nov 26 2013 May 15 2019
## SRR1039521 GSM1275875 Public on Jan 01 2014 Nov 26 2013 May 15 2019
## type channel_count source_name_ch1 organism_ch1
## SRR1039508 SRA 1 airway smooth muscle cells Homo sapiens
## SRR1039509 SRA 1 airway smooth muscle cells Homo sapiens
## SRR1039512 SRA 1 airway smooth muscle cells Homo sapiens
## SRR1039513 SRA 1 airway smooth muscle cells Homo sapiens
## SRR1039516 SRA 1 airway smooth muscle cells Homo sapiens
## SRR1039517 SRA 1 airway smooth muscle cells Homo sapiens
## SRR1039520 SRA 1 airway smooth muscle cells Homo sapiens
## SRR1039521 SRA 1 airway smooth muscle cells Homo sapiens
## characteristics_ch1 characteristics_ch1.1
## SRR1039508 treatment: Untreated tissue: human airway smooth muscle cells
## SRR1039509 treatment: Dexamethasone tissue: human airway smooth muscle cells
## SRR1039512 treatment: Untreated tissue: human airway smooth muscle cells
## SRR1039513 treatment: Dexamethasone tissue: human airway smooth muscle cells
## SRR1039516 treatment: Untreated tissue: human airway smooth muscle cells
## SRR1039517 treatment: Dexamethasone tissue: human airway smooth muscle cells
## SRR1039520 treatment: Untreated tissue: human airway smooth muscle cells
## SRR1039521 treatment: Dexamethasone tissue: human airway smooth muscle cells
## characteristics_ch1.2 characteristics_ch1.3
## SRR1039508 ercc_mix: - cell line: N61311
## SRR1039509 ercc_mix: - cell line: N61311
## SRR1039512 ercc_mix: - cell line: N052611
## SRR1039513 ercc_mix: 1 cell line: N052611
## SRR1039516 ercc_mix: - cell line: N080611
## SRR1039517 ercc_mix: - cell line: N080611
## SRR1039520 ercc_mix: 1 cell line: N061011
## SRR1039521 ercc_mix: 1 cell line: N061011
## characteristics_ch1.4
## SRR1039508 cell type: airway smooth muscle cells
## SRR1039509 cell type: airway smooth muscle cells
## SRR1039512 cell type: airway smooth muscle cells
## SRR1039513 cell type: airway smooth muscle cells
## SRR1039516 cell type: airway smooth muscle cells
## SRR1039517 cell type: airway smooth muscle cells
## SRR1039520 cell type: airway smooth muscle cells
## SRR1039521 cell type: airway smooth muscle cells
## treatment_protocol_ch1
## SRR1039508 Cells from each donor were treated with 1 μM dexamethasone, 1 μM albuterol, both dexamethasone and albuterol, or control vehicle for 18 h.
## SRR1039509 Cells from each donor were treated with 1 μM dexamethasone, 1 μM albuterol, both dexamethasone and albuterol, or control vehicle for 18 h.
## SRR1039512 Cells from each donor were treated with 1 μM dexamethasone, 1 μM albuterol, both dexamethasone and albuterol, or control vehicle for 18 h.
## SRR1039513 Cells from each donor were treated with 1 μM dexamethasone, 1 μM albuterol, both dexamethasone and albuterol, or control vehicle for 18 h.
## SRR1039516 Cells from each donor were treated with 1 μM dexamethasone, 1 μM albuterol, both dexamethasone and albuterol, or control vehicle for 18 h.
## SRR1039517 Cells from each donor were treated with 1 μM dexamethasone, 1 μM albuterol, both dexamethasone and albuterol, or control vehicle for 18 h.
## SRR1039520 Cells from each donor were treated with 1 μM dexamethasone, 1 μM albuterol, both dexamethasone and albuterol, or control vehicle for 18 h.
## SRR1039521 Cells from each donor were treated with 1 μM dexamethasone, 1 μM albuterol, both dexamethasone and albuterol, or control vehicle for 18 h.
## growth_protocol_ch1
## SRR1039508 Primary ASM cells were isolated from four white aborted lung transplant donors with no chronic illness. Passages 4 to 7 ASM cells maintained in Ham's F12 medium supplemented with 10% FBS were used in all experiments.
## SRR1039509 Primary ASM cells were isolated from four white aborted lung transplant donors with no chronic illness. Passages 4 to 7 ASM cells maintained in Ham's F12 medium supplemented with 10% FBS were used in all experiments.
## SRR1039512 Primary ASM cells were isolated from four white aborted lung transplant donors with no chronic illness. Passages 4 to 7 ASM cells maintained in Ham's F12 medium supplemented with 10% FBS were used in all experiments.
## SRR1039513 Primary ASM cells were isolated from four white aborted lung transplant donors with no chronic illness. Passages 4 to 7 ASM cells maintained in Ham's F12 medium supplemented with 10% FBS were used in all experiments.
## SRR1039516 Primary ASM cells were isolated from four white aborted lung transplant donors with no chronic illness. Passages 4 to 7 ASM cells maintained in Ham's F12 medium supplemented with 10% FBS were used in all experiments.
## SRR1039517 Primary ASM cells were isolated from four white aborted lung transplant donors with no chronic illness. Passages 4 to 7 ASM cells maintained in Ham's F12 medium supplemented with 10% FBS were used in all experiments.
## SRR1039520 Primary ASM cells were isolated from four white aborted lung transplant donors with no chronic illness. Passages 4 to 7 ASM cells maintained in Ham's F12 medium supplemented with 10% FBS were used in all experiments.
## SRR1039521 Primary ASM cells were isolated from four white aborted lung transplant donors with no chronic illness. Passages 4 to 7 ASM cells maintained in Ham's F12 medium supplemented with 10% FBS were used in all experiments.
## molecule_ch1
## SRR1039508 total RNA
## SRR1039509 total RNA
## SRR1039512 total RNA
## SRR1039513 total RNA
## SRR1039516 total RNA
## SRR1039517 total RNA
## SRR1039520 total RNA
## SRR1039521 total RNA
## extract_protocol_ch1
## SRR1039508 Total RNA was extracted using the miRNAeasy mini kit (Qiagen Sciences, Inc., Germantown, MD).
## SRR1039509 Total RNA was extracted using the miRNAeasy mini kit (Qiagen Sciences, Inc., Germantown, MD).
## SRR1039512 Total RNA was extracted using the miRNAeasy mini kit (Qiagen Sciences, Inc., Germantown, MD).
## SRR1039513 Total RNA was extracted using the miRNAeasy mini kit (Qiagen Sciences, Inc., Germantown, MD).
## SRR1039516 Total RNA was extracted using the miRNAeasy mini kit (Qiagen Sciences, Inc., Germantown, MD).
## SRR1039517 Total RNA was extracted using the miRNAeasy mini kit (Qiagen Sciences, Inc., Germantown, MD).
## SRR1039520 Total RNA was extracted using the miRNAeasy mini kit (Qiagen Sciences, Inc., Germantown, MD).
## SRR1039521 Total RNA was extracted using the miRNAeasy mini kit (Qiagen Sciences, Inc., Germantown, MD).
## extract_protocol_ch1.1
## SRR1039508 TruSeq RNA Sample Prep Kit v2 (Illumina, Inc., San Diego, CA).
## SRR1039509 TruSeq RNA Sample Prep Kit v2 (Illumina, Inc., San Diego, CA).
## SRR1039512 TruSeq RNA Sample Prep Kit v2 (Illumina, Inc., San Diego, CA).
## SRR1039513 TruSeq RNA Sample Prep Kit v2 (Illumina, Inc., San Diego, CA).
## SRR1039516 TruSeq RNA Sample Prep Kit v2 (Illumina, Inc., San Diego, CA).
## SRR1039517 TruSeq RNA Sample Prep Kit v2 (Illumina, Inc., San Diego, CA).
## SRR1039520 TruSeq RNA Sample Prep Kit v2 (Illumina, Inc., San Diego, CA).
## SRR1039521 TruSeq RNA Sample Prep Kit v2 (Illumina, Inc., San Diego, CA).
## taxid_ch1 data_processing
## SRR1039508 9606 Illumina Casava1.8 software used for basecalling.
## SRR1039509 9606 Illumina Casava1.8 software used for basecalling.
## SRR1039512 9606 Illumina Casava1.8 software used for basecalling.
## SRR1039513 9606 Illumina Casava1.8 software used for basecalling.
## SRR1039516 9606 Illumina Casava1.8 software used for basecalling.
## SRR1039517 9606 Illumina Casava1.8 software used for basecalling.
## SRR1039520 9606 Illumina Casava1.8 software used for basecalling.
## SRR1039521 9606 Illumina Casava1.8 software used for basecalling.
## data_processing.1
## SRR1039508 Sequenced reads were trimmed by 12bp then mapped to hg19 whole genome using tophat v.2.0.4 with parameters -G [hg19_ERCC_gtf] --no-novel-juncs --transcriptome-only
## SRR1039509 Sequenced reads were trimmed by 12bp then mapped to hg19 whole genome using tophat v.2.0.4 with parameters -G [hg19_ERCC_gtf] --no-novel-juncs --transcriptome-only
## SRR1039512 Sequenced reads were trimmed by 12bp then mapped to hg19 whole genome using tophat v.2.0.4 with parameters -G [hg19_ERCC_gtf] --no-novel-juncs --transcriptome-only
## SRR1039513 Sequenced reads were trimmed by 12bp then mapped to hg19 whole genome using tophat v.2.0.4 with parameters -G [hg19_ERCC_gtf] --no-novel-juncs --transcriptome-only
## SRR1039516 Sequenced reads were trimmed by 12bp then mapped to hg19 whole genome using tophat v.2.0.4 with parameters -G [hg19_ERCC_gtf] --no-novel-juncs --transcriptome-only
## SRR1039517 Sequenced reads were trimmed by 12bp then mapped to hg19 whole genome using tophat v.2.0.4 with parameters -G [hg19_ERCC_gtf] --no-novel-juncs --transcriptome-only
## SRR1039520 Sequenced reads were trimmed by 12bp then mapped to hg19 whole genome using tophat v.2.0.4 with parameters -G [hg19_ERCC_gtf] --no-novel-juncs --transcriptome-only
## SRR1039521 Sequenced reads were trimmed by 12bp then mapped to hg19 whole genome using tophat v.2.0.4 with parameters -G [hg19_ERCC_gtf] --no-novel-juncs --transcriptome-only
## data_processing.2
## SRR1039508 For each sample, Cufflinks (v.2.0.2) was used to quantify External RNA Controls Consortium (ERCC) Spike-In and hg19 transcripts based on reads that mapped to the provided hg19 and ERCC reference files.
## SRR1039509 For each sample, Cufflinks (v.2.0.2) was used to quantify External RNA Controls Consortium (ERCC) Spike-In and hg19 transcripts based on reads that mapped to the provided hg19 and ERCC reference files.
## SRR1039512 For each sample, Cufflinks (v.2.0.2) was used to quantify External RNA Controls Consortium (ERCC) Spike-In and hg19 transcripts based on reads that mapped to the provided hg19 and ERCC reference files.
## SRR1039513 For each sample, Cufflinks (v.2.0.2) was used to quantify External RNA Controls Consortium (ERCC) Spike-In and hg19 transcripts based on reads that mapped to the provided hg19 and ERCC reference files.
## SRR1039516 For each sample, Cufflinks (v.2.0.2) was used to quantify External RNA Controls Consortium (ERCC) Spike-In and hg19 transcripts based on reads that mapped to the provided hg19 and ERCC reference files.
## SRR1039517 For each sample, Cufflinks (v.2.0.2) was used to quantify External RNA Controls Consortium (ERCC) Spike-In and hg19 transcripts based on reads that mapped to the provided hg19 and ERCC reference files.
## SRR1039520 For each sample, Cufflinks (v.2.0.2) was used to quantify External RNA Controls Consortium (ERCC) Spike-In and hg19 transcripts based on reads that mapped to the provided hg19 and ERCC reference files.
## SRR1039521 For each sample, Cufflinks (v.2.0.2) was used to quantify External RNA Controls Consortium (ERCC) Spike-In and hg19 transcripts based on reads that mapped to the provided hg19 and ERCC reference files.
## data_processing.3
## SRR1039508 Differential expression of genes and transcripts in samples treated with Dex vs. Untreated samples was obtained using Cuffdiff (v.2.0.2) with the quantified transcripts computed by Cufflinks (v.2.0.2), while applying bias correction. Options: -p 12 --min-outlier-p 0.05 -b [hg19_reference_index] -u [hg19_gtf] -L Dex,Control [List of accepted_hits.bam for each sample sorted by condition (i.e. Dex,Control)]
## SRR1039509 Differential expression of genes and transcripts in samples treated with Dex vs. Untreated samples was obtained using Cuffdiff (v.2.0.2) with the quantified transcripts computed by Cufflinks (v.2.0.2), while applying bias correction. Options: -p 12 --min-outlier-p 0.05 -b [hg19_reference_index] -u [hg19_gtf] -L Dex,Control [List of accepted_hits.bam for each sample sorted by condition (i.e. Dex,Control)]
## SRR1039512 Differential expression of genes and transcripts in samples treated with Dex vs. Untreated samples was obtained using Cuffdiff (v.2.0.2) with the quantified transcripts computed by Cufflinks (v.2.0.2), while applying bias correction. Options: -p 12 --min-outlier-p 0.05 -b [hg19_reference_index] -u [hg19_gtf] -L Dex,Control [List of accepted_hits.bam for each sample sorted by condition (i.e. Dex,Control)]
## SRR1039513 Differential expression of genes and transcripts in samples treated with Dex vs. Untreated samples was obtained using Cuffdiff (v.2.0.2) with the quantified transcripts computed by Cufflinks (v.2.0.2), while applying bias correction. Options: -p 12 --min-outlier-p 0.05 -b [hg19_reference_index] -u [hg19_gtf] -L Dex,Control [List of accepted_hits.bam for each sample sorted by condition (i.e. Dex,Control)]
## SRR1039516 Differential expression of genes and transcripts in samples treated with Dex vs. Untreated samples was obtained using Cuffdiff (v.2.0.2) with the quantified transcripts computed by Cufflinks (v.2.0.2), while applying bias correction. Options: -p 12 --min-outlier-p 0.05 -b [hg19_reference_index] -u [hg19_gtf] -L Dex,Control [List of accepted_hits.bam for each sample sorted by condition (i.e. Dex,Control)]
## SRR1039517 Differential expression of genes and transcripts in samples treated with Dex vs. Untreated samples was obtained using Cuffdiff (v.2.0.2) with the quantified transcripts computed by Cufflinks (v.2.0.2), while applying bias correction. Options: -p 12 --min-outlier-p 0.05 -b [hg19_reference_index] -u [hg19_gtf] -L Dex,Control [List of accepted_hits.bam for each sample sorted by condition (i.e. Dex,Control)]
## SRR1039520 Differential expression of genes and transcripts in samples treated with Dex vs. Untreated samples was obtained using Cuffdiff (v.2.0.2) with the quantified transcripts computed by Cufflinks (v.2.0.2), while applying bias correction. Options: -p 12 --min-outlier-p 0.05 -b [hg19_reference_index] -u [hg19_gtf] -L Dex,Control [List of accepted_hits.bam for each sample sorted by condition (i.e. Dex,Control)]
## SRR1039521 Differential expression of genes and transcripts in samples treated with Dex vs. Untreated samples was obtained using Cuffdiff (v.2.0.2) with the quantified transcripts computed by Cufflinks (v.2.0.2), while applying bias correction. Options: -p 12 --min-outlier-p 0.05 -b [hg19_reference_index] -u [hg19_gtf] -L Dex,Control [List of accepted_hits.bam for each sample sorted by condition (i.e. Dex,Control)]
## data_processing.4
## SRR1039508 CummeRbund R package (v.0.1.3) was used to create FPKM matrix of all samples
## SRR1039509 CummeRbund R package (v.0.1.3) was used to create FPKM matrix of all samples
## SRR1039512 CummeRbund R package (v.0.1.3) was used to create FPKM matrix of all samples
## SRR1039513 CummeRbund R package (v.0.1.3) was used to create FPKM matrix of all samples
## SRR1039516 CummeRbund R package (v.0.1.3) was used to create FPKM matrix of all samples
## SRR1039517 CummeRbund R package (v.0.1.3) was used to create FPKM matrix of all samples
## SRR1039520 CummeRbund R package (v.0.1.3) was used to create FPKM matrix of all samples
## SRR1039521 CummeRbund R package (v.0.1.3) was used to create FPKM matrix of all samples
## data_processing.5
## SRR1039508 Genome_build: hg19
## SRR1039509 Genome_build: hg19
## SRR1039512 Genome_build: hg19
## SRR1039513 Genome_build: hg19
## SRR1039516 Genome_build: hg19
## SRR1039517 Genome_build: hg19
## SRR1039520 Genome_build: hg19
## SRR1039521 Genome_build: hg19
## data_processing.6
## SRR1039508 Supplementary_files_format_and_content: tab-delimited text files on Series record include FPKM values for each sample (ALL_Sample_FPKM_Matrix.txt) and gene-based differential expression results for Dex vs. Untreated condition (Dex_vs_Untreated_gene_exp.diff)
## SRR1039509 Supplementary_files_format_and_content: tab-delimited text files on Series record include FPKM values for each sample (ALL_Sample_FPKM_Matrix.txt) and gene-based differential expression results for Dex vs. Untreated condition (Dex_vs_Untreated_gene_exp.diff)
## SRR1039512 Supplementary_files_format_and_content: tab-delimited text files on Series record include FPKM values for each sample (ALL_Sample_FPKM_Matrix.txt) and gene-based differential expression results for Dex vs. Untreated condition (Dex_vs_Untreated_gene_exp.diff)
## SRR1039513 Supplementary_files_format_and_content: tab-delimited text files on Series record include FPKM values for each sample (ALL_Sample_FPKM_Matrix.txt) and gene-based differential expression results for Dex vs. Untreated condition (Dex_vs_Untreated_gene_exp.diff)
## SRR1039516 Supplementary_files_format_and_content: tab-delimited text files on Series record include FPKM values for each sample (ALL_Sample_FPKM_Matrix.txt) and gene-based differential expression results for Dex vs. Untreated condition (Dex_vs_Untreated_gene_exp.diff)
## SRR1039517 Supplementary_files_format_and_content: tab-delimited text files on Series record include FPKM values for each sample (ALL_Sample_FPKM_Matrix.txt) and gene-based differential expression results for Dex vs. Untreated condition (Dex_vs_Untreated_gene_exp.diff)
## SRR1039520 Supplementary_files_format_and_content: tab-delimited text files on Series record include FPKM values for each sample (ALL_Sample_FPKM_Matrix.txt) and gene-based differential expression results for Dex vs. Untreated condition (Dex_vs_Untreated_gene_exp.diff)
## SRR1039521 Supplementary_files_format_and_content: tab-delimited text files on Series record include FPKM values for each sample (ALL_Sample_FPKM_Matrix.txt) and gene-based differential expression results for Dex vs. Untreated condition (Dex_vs_Untreated_gene_exp.diff)
## platform_id contact_name contact_email
## SRR1039508 GPL11154 Blanca,E,Himes bhimes@pennmedicine.upenn.edu
## SRR1039509 GPL11154 Blanca,E,Himes bhimes@pennmedicine.upenn.edu
## SRR1039512 GPL11154 Blanca,E,Himes bhimes@pennmedicine.upenn.edu
## SRR1039513 GPL11154 Blanca,E,Himes bhimes@pennmedicine.upenn.edu
## SRR1039516 GPL11154 Blanca,E,Himes bhimes@pennmedicine.upenn.edu
## SRR1039517 GPL11154 Blanca,E,Himes bhimes@pennmedicine.upenn.edu
## SRR1039520 GPL11154 Blanca,E,Himes bhimes@pennmedicine.upenn.edu
## SRR1039521 GPL11154 Blanca,E,Himes bhimes@pennmedicine.upenn.edu
## contact_laboratory
## SRR1039508 402 Blockley Hall
## SRR1039509 402 Blockley Hall
## SRR1039512 402 Blockley Hall
## SRR1039513 402 Blockley Hall
## SRR1039516 402 Blockley Hall
## SRR1039517 402 Blockley Hall
## SRR1039520 402 Blockley Hall
## SRR1039521 402 Blockley Hall
## contact_department
## SRR1039508 Department of Biostatistics, Epidemiology and Informatics
## SRR1039509 Department of Biostatistics, Epidemiology and Informatics
## SRR1039512 Department of Biostatistics, Epidemiology and Informatics
## SRR1039513 Department of Biostatistics, Epidemiology and Informatics
## SRR1039516 Department of Biostatistics, Epidemiology and Informatics
## SRR1039517 Department of Biostatistics, Epidemiology and Informatics
## SRR1039520 Department of Biostatistics, Epidemiology and Informatics
## SRR1039521 Department of Biostatistics, Epidemiology and Informatics
## contact_institute contact_address contact_city
## SRR1039508 University of Pennsylvania 423 Guardian Dr Philadelphia
## SRR1039509 University of Pennsylvania 423 Guardian Dr Philadelphia
## SRR1039512 University of Pennsylvania 423 Guardian Dr Philadelphia
## SRR1039513 University of Pennsylvania 423 Guardian Dr Philadelphia
## SRR1039516 University of Pennsylvania 423 Guardian Dr Philadelphia
## SRR1039517 University of Pennsylvania 423 Guardian Dr Philadelphia
## SRR1039520 University of Pennsylvania 423 Guardian Dr Philadelphia
## SRR1039521 University of Pennsylvania 423 Guardian Dr Philadelphia
## contact_state contact_zip.postal_code contact_country data_row_count
## SRR1039508 PA 19104 USA 0
## SRR1039509 PA 19104 USA 0
## SRR1039512 PA 19104 USA 0
## SRR1039513 PA 19104 USA 0
## SRR1039516 PA 19104 USA 0
## SRR1039517 PA 19104 USA 0
## SRR1039520 PA 19104 USA 0
## SRR1039521 PA 19104 USA 0
## instrument_model library_selection library_source
## SRR1039508 Illumina HiSeq 2000 cDNA transcriptomic
## SRR1039509 Illumina HiSeq 2000 cDNA transcriptomic
## SRR1039512 Illumina HiSeq 2000 cDNA transcriptomic
## SRR1039513 Illumina HiSeq 2000 cDNA transcriptomic
## SRR1039516 Illumina HiSeq 2000 cDNA transcriptomic
## SRR1039517 Illumina HiSeq 2000 cDNA transcriptomic
## SRR1039520 Illumina HiSeq 2000 cDNA transcriptomic
## SRR1039521 Illumina HiSeq 2000 cDNA transcriptomic
## library_strategy
## SRR1039508 RNA-Seq
## SRR1039509 RNA-Seq
## SRR1039512 RNA-Seq
## SRR1039513 RNA-Seq
## SRR1039516 RNA-Seq
## SRR1039517 RNA-Seq
## SRR1039520 RNA-Seq
## SRR1039521 RNA-Seq
## relation
## SRR1039508 BioSample: https://www.ncbi.nlm.nih.gov/biosample/SAMN02422669
## SRR1039509 BioSample: https://www.ncbi.nlm.nih.gov/biosample/SAMN02422675
## SRR1039512 BioSample: https://www.ncbi.nlm.nih.gov/biosample/SAMN02422678
## SRR1039513 BioSample: https://www.ncbi.nlm.nih.gov/biosample/SAMN02422670
## SRR1039516 BioSample: https://www.ncbi.nlm.nih.gov/biosample/SAMN02422682
## SRR1039517 BioSample: https://www.ncbi.nlm.nih.gov/biosample/SAMN02422673
## SRR1039520 BioSample: https://www.ncbi.nlm.nih.gov/biosample/SAMN02422683
## SRR1039521 BioSample: https://www.ncbi.nlm.nih.gov/biosample/SAMN02422677
## relation.1
## SRR1039508 SRA: https://www.ncbi.nlm.nih.gov/sra?term=SRX384345
## SRR1039509 SRA: https://www.ncbi.nlm.nih.gov/sra?term=SRX384346
## SRR1039512 SRA: https://www.ncbi.nlm.nih.gov/sra?term=SRX384349
## SRR1039513 SRA: https://www.ncbi.nlm.nih.gov/sra?term=SRX384350
## SRR1039516 SRA: https://www.ncbi.nlm.nih.gov/sra?term=SRX384353
## SRR1039517 SRA: https://www.ncbi.nlm.nih.gov/sra?term=SRX384354
## SRR1039520 SRA: https://www.ncbi.nlm.nih.gov/sra?term=SRX384357
## SRR1039521 SRA: https://www.ncbi.nlm.nih.gov/sra?term=SRX384358
## supplementary_file_1 cell.line.ch1 cell.type.ch1
## SRR1039508 NONE N61311 airway smooth muscle cells
## SRR1039509 NONE N61311 airway smooth muscle cells
## SRR1039512 NONE N052611 airway smooth muscle cells
## SRR1039513 NONE N052611 airway smooth muscle cells
## SRR1039516 NONE N080611 airway smooth muscle cells
## SRR1039517 NONE N080611 airway smooth muscle cells
## SRR1039520 NONE N061011 airway smooth muscle cells
## SRR1039521 NONE N061011 airway smooth muscle cells
## ercc_mix.ch1 tissue.ch1 treatment.ch1
## SRR1039508 - human airway smooth muscle cells Untreated
## SRR1039509 - human airway smooth muscle cells Dexamethasone
## SRR1039512 - human airway smooth muscle cells Untreated
## SRR1039513 1 human airway smooth muscle cells Dexamethasone
## SRR1039516 - human airway smooth muscle cells Untreated
## SRR1039517 - human airway smooth muscle cells Dexamethasone
## SRR1039520 1 human airway smooth muscle cells Untreated
## SRR1039521 1 human airway smooth muscle cells Dexamethasone
## Run ReleaseDate LoadDate spots
## SRR1039508 SRR1039508 2014-01-02 09:16:11 2013-11-26 16:39:38 22935521
## SRR1039509 SRR1039509 2014-01-02 09:16:11 2013-11-26 16:38:41 21155707
## SRR1039512 SRR1039512 2014-01-02 09:16:11 2013-11-26 16:47:15 28136282
## SRR1039513 SRR1039513 2014-01-02 09:16:11 2013-11-26 17:02:02 43356464
## SRR1039516 SRR1039516 2014-01-02 09:16:11 2013-11-26 17:03:45 30043024
## SRR1039517 SRR1039517 2014-01-02 09:16:11 2013-11-26 16:49:48 34298260
## SRR1039520 SRR1039520 2014-01-02 09:16:11 2013-11-26 17:04:50 34575286
## SRR1039521 SRR1039521 2014-01-02 09:16:11 2013-11-26 17:08:16 41152075
## bases spots_with_mates avgLength size_MB AssemblyName
## SRR1039508 2889875646 22935521 126 1588 NA
## SRR1039509 2665619082 21155707 126 1480 NA
## SRR1039512 3545171532 28136282 126 2055 NA
## SRR1039513 3791311776 16823088 87 2271 NA
## SRR1039516 3612545622 27298970 120 2147 NA
## SRR1039517 4321580760 34298260 126 2654 NA
## SRR1039520 3518623962 21275888 101 2109 NA
## SRR1039521 4072315905 23487860 98 2449 NA
## download_path
## SRR1039508 https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos/sra-pub-run-5/SRR1039508/SRR1039508.1
## SRR1039509 https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos/sra-pub-run-5/SRR1039509/SRR1039509.1
## SRR1039512 https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos/sra-pub-run-5/SRR1039512/SRR1039512.1
## SRR1039513 https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos/sra-pub-run-5/SRR1039513/SRR1039513.1
## SRR1039516 https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos/sra-pub-run-5/SRR1039516/SRR1039516.1
## SRR1039517 https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos/sra-pub-run-5/SRR1039517/SRR1039517.1
## SRR1039520 https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos/sra-pub-run-5/SRR1039520/SRR1039520.1
## SRR1039521 https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos/sra-pub-run-5/SRR1039521/SRR1039521.1
## Experiment LibraryName LibraryStrategy LibrarySelection
## SRR1039508 SRX384345 NA RNA-Seq cDNA
## SRR1039509 SRX384346 NA RNA-Seq cDNA
## SRR1039512 SRX384349 NA RNA-Seq cDNA
## SRR1039513 SRX384350 NA RNA-Seq cDNA
## SRR1039516 SRX384353 NA RNA-Seq cDNA
## SRR1039517 SRX384354 NA RNA-Seq cDNA
## SRR1039520 SRX384357 NA RNA-Seq cDNA
## SRR1039521 SRX384358 NA RNA-Seq cDNA
## LibrarySource LibraryLayout InsertSize InsertDev Platform
## SRR1039508 TRANSCRIPTOMIC PAIRED 0 0 ILLUMINA
## SRR1039509 TRANSCRIPTOMIC PAIRED 0 0 ILLUMINA
## SRR1039512 TRANSCRIPTOMIC PAIRED 0 0 ILLUMINA
## SRR1039513 TRANSCRIPTOMIC PAIRED 0 0 ILLUMINA
## SRR1039516 TRANSCRIPTOMIC PAIRED 0 0 ILLUMINA
## SRR1039517 TRANSCRIPTOMIC PAIRED 0 0 ILLUMINA
## SRR1039520 TRANSCRIPTOMIC PAIRED 0 0 ILLUMINA
## SRR1039521 TRANSCRIPTOMIC PAIRED 0 0 ILLUMINA
## Model SRAStudy BioProject Study_Pubmed_id ProjectID
## SRR1039508 Illumina HiSeq 2000 SRP033351 PRJNA229998 2 229998
## SRR1039509 Illumina HiSeq 2000 SRP033351 PRJNA229998 2 229998
## SRR1039512 Illumina HiSeq 2000 SRP033351 PRJNA229998 2 229998
## SRR1039513 Illumina HiSeq 2000 SRP033351 PRJNA229998 2 229998
## SRR1039516 Illumina HiSeq 2000 SRP033351 PRJNA229998 2 229998
## SRR1039517 Illumina HiSeq 2000 SRP033351 PRJNA229998 2 229998
## SRR1039520 Illumina HiSeq 2000 SRP033351 PRJNA229998 2 229998
## SRR1039521 Illumina HiSeq 2000 SRP033351 PRJNA229998 2 229998
## Sample BioSample SampleType TaxID ScientificName g1k_pop_code
## SRR1039508 SRS508568 SAMN02422669 simple 9606 Homo sapiens NA
## SRR1039509 SRS508567 SAMN02422675 simple 9606 Homo sapiens NA
## SRR1039512 SRS508571 SAMN02422678 simple 9606 Homo sapiens NA
## SRR1039513 SRS508572 SAMN02422670 simple 9606 Homo sapiens NA
## SRR1039516 SRS508575 SAMN02422682 simple 9606 Homo sapiens NA
## SRR1039517 SRS508576 SAMN02422673 simple 9606 Homo sapiens NA
## SRR1039520 SRS508579 SAMN02422683 simple 9606 Homo sapiens NA
## SRR1039521 SRS508580 SAMN02422677 simple 9606 Homo sapiens NA
## source g1k_analysis_group Subject_ID Sex Disease Tumor
## SRR1039508 NA NA NA NA NA no
## SRR1039509 NA NA NA NA NA no
## SRR1039512 NA NA NA NA NA no
## SRR1039513 NA NA NA NA NA no
## SRR1039516 NA NA NA NA NA no
## SRR1039517 NA NA NA NA NA no
## SRR1039520 NA NA NA NA NA no
## SRR1039521 NA NA NA NA NA no
## Affection_Status Analyte_Type Histological_Type Body_Site CenterName
## SRR1039508 NA NA NA NA GEO
## SRR1039509 NA NA NA NA GEO
## SRR1039512 NA NA NA NA GEO
## SRR1039513 NA NA NA NA GEO
## SRR1039516 NA NA NA NA GEO
## SRR1039517 NA NA NA NA GEO
## SRR1039520 NA NA NA NA GEO
## SRR1039521 NA NA NA NA GEO
## Submission dbgap_study_accession Consent
## SRR1039508 SRA114259 NA public
## SRR1039509 SRA114259 NA public
## SRR1039512 SRA114259 NA public
## SRR1039513 SRA114259 NA public
## SRR1039516 SRA114259 NA public
## SRR1039517 SRA114259 NA public
## SRR1039520 SRA114259 NA public
## SRR1039521 SRA114259 NA public
## RunHash ReadHash
## SRR1039508 65345B22A4745B616DD5840CF2AD90EE 54B084AA7AD2A02B39D37CBBD5B1C3DC
## SRR1039509 E2F5FBD3278F29EC6EA7B00FC95C28B2 294D13FE5364BA88B5FEAD68A53CB11B
## SRR1039512 828A997474D0BBD6E5E14DEDB0D9E31D 5834A34CA3CE7EF962A396098543C9E6
## SRR1039513 0EB840FF1F4AA22A5017CB754C039225 5A5DD90EDCD5998A26F7F7BC099D4010
## SRR1039516 F88DCB56A77C7D4408B19C8B7E03CE3E 87E7B86FEEC66F5FF14D8FB7D9CB5AE3
## SRR1039517 87EB19F8B326E4DD8F17A8729295CC8D 1A0159FDA5AF8E3B542666812DE41137
## SRR1039520 3478B4523B6FD56D362B5890F0DBFE8D 0B6E49B16B2296F33C5B250BD1CB93D5
## SRR1039521 1ABC620B261A70743C633D564D0FBA62 28D9F5348982A2FEF14824E13AB8397A
## cellLine treatment
## SRR1039508 N61311 Untreated
## SRR1039509 N61311 Dexamethasone
## SRR1039512 N052611 Untreated
## SRR1039513 N052611 Dexamethasone
## SRR1039516 N080611 Untreated
## SRR1039517 N080611 Dexamethasone
## SRR1039520 N061011 Untreated
## SRR1039521 N061011 Dexamethasone
One way to reduce dimensionality is the use of multidimensional scaling (MDS). For MDS, we first have to calculate all pairwise distances between our objects (samples in this case), and then create a (typically) two-dimensional representation where these pre-calculated distances are represented as accurately as possible. This means that depending on how the pairwise sample distances are defined, the two-dimensional plot can be very different, and it is important to choose a distance that is suitable for the type of data at hand.
edgeR contains a function plotMDS, which operates on a DGEList object and generates a two-dimensional MDS representation of the samples. The default distance between two samples can be interpreted as the “typical” log fold change between the two samples, for the genes that are most different between them (by default, the top 500 genes, but this can be modified). We generate an MDS plot from the DGEList object dge, coloring by the treatment and using different plot symbols for different cell lines.
colnames(dge) <- paste0(substr(target$cellLine,1,3),"_",substr(target$treatment,1,3))
plotMDS(dge, top = 500,col=as.double(target$cellLine))
##Main analysis
We first estimate the overdispersion.
estimateDisp(dge, design)
dge <-plotBCV(dge)
glmFit(dge, design)
fit <- glmLRT(fit, coef = "treatmentDexamethasone")
lrt <- topTags(lrt, n = nrow(dge)) # all genes
ttAll <-hist(ttAll$table$PValue)
topTags(lrt, n = nrow(dge), p.value = 0.05) # genes with adj.p<0.05
tt <-head(tt)
## Coefficient: treatmentDexamethasone
## logFC logCPM LR PValue FDR
## ENSG00000109906.13 6.453642 4.098238 759.1054 4.203188e-167 7.206366e-163
## ENSG00000189221.9 3.395128 6.726943 502.9298 2.190325e-111 1.877656e-107
## ENSG00000120129.5 2.948968 7.262484 501.2133 5.175974e-111 2.958069e-107
## ENSG00000157214.13 2.016345 7.076735 452.8130 1.761685e-100 7.551020e-97
## ENSG00000196136.17 3.237191 6.919779 435.9441 8.263555e-97 2.833573e-93
## ENSG00000152583.12 4.501127 5.494281 434.3035 1.880370e-96 4.607574e-93
nrow(tt)
## [1] 4581
We first make a volcanoplot and an MA plot.
library(ggplot2)
ggplot(ttAll$table,aes(x=logFC,y=-log10(PValue),color=FDR<0.05)) + geom_point() + scale_color_manual(values=c("black","red"))
volcano <- volcano
plotSmear(lrt, de.tags = row.names(tt$table))
library(pheatmap)
pheatmap(cpm(dge,log=TRUE)[rownames(tt$table)[1:30],])