In this first lab session, we will perform quality control, gene-level quantification and data pre-processing on two different scRNA-seq datasets.
Macosko dataset
In this workshop session, we will preprocess the single-cell RNA-seq dataset from the publication by Macosko et al. (Cell 161, 1202–1214, 2015; https://doi.org/10.1016/j.cell.2015.05.002). This is the manuscript that introduced the droplet-based scRNA-seq technology Drop-seq. Six years after the original publication, Drop-seq is still one of the most commonly adopted scRNA-seq protocols, as evidenced by the large number of citations for Macosko et al. (4,303 citations as of November 3, 2021).
In this particular experiment, Macosko et al. sequenced 49,300 cells from the mouse retina, identifying 39 transcriptionally distinct cell populations. The experiment was performed in 7 batches.
The raw data (FASTQ files) of the experiment can be retrieved from the Sequence Read Archive (SRA) under the accession SRR1853178. However, because the full dataset is large, we have provided a smaller, more manageable subset of the data on our lab GitHub page.
Data quantification with QC
We will use the alevin software, which is integrated into the salmon quantification software. Alevin quantifies a subset of reads from the Macosko experiment and also provides some quality control of the sequenced reads.
These steps are implemented in a small shell (.sh) script, which can be obtained via the following link: Macosko_sh.
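Conceptually, droplet-based quantification assigns each read to a cell via its cell barcode. It then collapses reads that share the same UMI, so that each captured molecule is counted once. The toy Python sketch below illustrates only that counting idea. It assumes a hypothetical, simplified input: a list of already-mapped reads given as (cell barcode, UMI, gene) tuples. It is not alevin's algorithm, which additionally performs barcode correction, more sophisticated UMI resolution and handling of multi-mapping reads.

```python
from collections import defaultdict


def count_umis(reads):
    """Count distinct UMIs per (cell barcode, gene) pair.

    `reads` is an iterable of (cell_barcode, umi, gene) tuples for reads that
    were already assigned to a single gene (a hypothetical, simplified input).
    Reads sharing barcode, UMI and gene are treated as copies of one molecule.
    """
    molecules = defaultdict(set)  # (barcode, gene) -> set of distinct UMIs
    for barcode, umi, gene in reads:
        molecules[(barcode, gene)].add(umi)
    # The number of distinct UMIs is the molecule count for that cell and gene
    return {key: len(umis) for key, umis in molecules.items()}


if __name__ == "__main__":
    # Hypothetical mapped reads from two cells; the first two reads are
    # PCR duplicates of the same molecule and should be counted once.
    reads = [
        ("AAAC", "GGTT", "Rho"),
        ("AAAC", "GGTT", "Rho"),
        ("AAAC", "CCAA", "Rho"),
        ("TTTG", "GGTT", "Pde6b"),
    ]
    for (barcode, gene), n in sorted(count_umis(reads).items()):
        print(barcode, gene, n)
```

Running it prints `AAAC Rho 2` and `TTTG Pde6b 1`. The duplicated read in cell AAAC contributes only once to the Rho count.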
Data exploration and preprocessing in R
Here, we provide an example of a script for performing data exploration and preprocessing in R for the Macosko dataset (first sections, until normalization):
lab1_macoskoWorkflow
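The R workflow linked above covers the steps up to normalization. The Python sketch below roughly illustrates the kinds of quantities typically computed in those steps, starting from a genes × cells count matrix:

- per-cell library size, number of detected genes and mitochondrial read fraction;
- removal of cells that fail fixed thresholds;
- library-size normalization followed by a log2(x + 1) transform.

Several details are illustrative assumptions, not the choices made in the lab workflow: the thresholds, the `mt-` gene-name prefix (the convention for mouse mitochondrial gene symbols) and the function names.

```python
import numpy as np


def qc_metrics(counts, gene_names, mito_prefix="mt-"):
    """Per-cell QC metrics for a genes x cells count matrix."""
    counts = np.asarray(counts, dtype=float)
    is_mito = np.array([g.startswith(mito_prefix) for g in gene_names])
    lib_size = counts.sum(axis=0)              # total counts per cell
    n_detected = (counts > 0).sum(axis=0)      # genes with at least one count
    mito_frac = counts[is_mito].sum(axis=0) / np.maximum(lib_size, 1)
    return lib_size, n_detected, mito_frac


def filter_cells(counts, gene_names, min_lib=500, min_genes=200, max_mito=0.1):
    """Keep cells passing fixed QC thresholds (illustrative default values)."""
    lib_size, n_detected, mito_frac = qc_metrics(counts, gene_names)
    keep = (lib_size >= min_lib) & (n_detected >= min_genes) & (mito_frac <= max_mito)
    return np.asarray(counts)[:, keep], keep


def log_normalize(counts):
    """Library-size normalization (size factors centred at 1), then log2(x + 1).

    Cells with zero total counts must be removed first, otherwise their
    size factor is zero and the division is undefined.
    """
    counts = np.asarray(counts, dtype=float)
    lib_size = counts.sum(axis=0)
    size_factors = lib_size / lib_size.mean()
    # Broadcasting divides every column (cell) by its own size factor
    return np.log2(counts / size_factors + 1)


if __name__ == "__main__":
    # Toy matrix: 3 genes x 3 cells (hypothetical values)
    genes = ["Rho", "Pde6b", "mt-Co1"]
    counts = np.array([[300, 5, 400],
                       [200, 0, 300],
                       [50, 1, 100]])
    filtered, keep = filter_cells(counts, genes, min_lib=100, min_genes=2, max_mito=0.2)
    print(keep)
    print(log_normalize(filtered).round(2))
```

Centring the size factors at 1 keeps the normalized values on roughly the same scale as the raw counts. This matches the spirit of library-size normalization as commonly done in R, although the lab script may use different functions.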
Cuomo dataset
Here, we use the dataset from the publication by Anna Cuomo et al. (last author Oliver Stegle), which we will refer to as the Cuomo dataset. The paper that describes this dataset can be found at https://www.nature.com/articles/s41467-020-14457-z.
In the experiment, the authors harvested induced pluripotent stem cells (iPSCs) from 125 healthy human donors. These cells were used to study endoderm differentiation, a process in which iPSCs differentiate into endoderm cells over approximately three days. Accordingly, the authors cultured the iPSC lines and allowed them to differentiate for three days. During the experiment, cells were harvested at four time points: day0 (at the start of differentiation), day1, day2 and day3. Given what is known about endoderm differentiation, these time points are expected to correspond to different cell types: day0 cells are (undifferentiated) iPSCs, day1 cells are mesendoderm cells, day2 cells are “intermediate” cells and day3 cells are fully differentiated endoderm cells.
This dataset was generated using the SMART-Seq2 scRNA-seq protocol.
The final goal of the experiment was to characterize population variation in the process of endoderm differentiation.
Data quantification
We will use the salmon quantification software to quantify a subset of the cells from the Cuomo experiment. Note that we do not use alevin here. Alevin is designed for droplet-based data that carry cell barcodes and UMIs, such as Drop-seq and 10x Genomics data.
The salmon quantification is implemented in a small shell (.sh) script, which can be obtained via the following link: Cuomo_sh.
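For SMART-Seq2 data, salmon is typically run once per cell. Each run produces a transcript-level quant.sf table with the columns Name, Length, EffectiveLength, TPM and NumReads. To obtain gene-level counts, the transcript estimates are summed per gene and then combined across cells. The standard-library Python sketch below illustrates that aggregation step under two assumptions: a transcript-to-gene mapping is available as a dictionary, and the quant.sf contents have already been read into strings. The transcript IDs, gene names and counts are hypothetical. In R, packages such as tximport perform this kind of summarization.

```python
import csv
import io


def gene_counts_from_quant(quant_text, tx2gene):
    """Sum salmon's NumReads column per gene for one cell's quant.sf contents."""
    reader = csv.DictReader(io.StringIO(quant_text), delimiter="\t")
    totals = {}
    for row in reader:
        gene = tx2gene.get(row["Name"])
        if gene is not None:  # transcripts absent from the mapping are skipped
            totals[gene] = totals.get(gene, 0.0) + float(row["NumReads"])
    return totals


def gene_by_cell_matrix(quants, tx2gene):
    """Combine per-cell quant.sf contents into a gene x cell table.

    `quants` maps a cell identifier to the text of that cell's quant.sf file.
    Returns (genes, cells, matrix) with matrix[i][j] = count of genes[i] in cells[j].
    """
    per_cell = {cell: gene_counts_from_quant(text, tx2gene) for cell, text in quants.items()}
    cells = sorted(per_cell)
    genes = sorted(set(tx2gene.values()))
    matrix = [[per_cell[cell].get(gene, 0.0) for cell in cells] for gene in genes]
    return genes, cells, matrix


if __name__ == "__main__":
    # Hypothetical quant.sf contents for two cells and a toy transcript-to-gene map
    header = "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    quants = {
        "cell_1": header + "txA1\t1000\t800.0\t500000.0\t10.0\n"
                           "txA2\t500\t300.0\t300000.0\t2.5\n"
                           "txB1\t800\t600.0\t200000.0\t4.0\n",
        "cell_2": header + "txA1\t1000\t800.0\t0.0\t0.0\n"
                           "txA2\t500\t300.0\t400000.0\t3.0\n"
                           "txB1\t800\t600.0\t600000.0\t12.0\n",
    }
    tx2gene = {"txA1": "geneA", "txA2": "geneA", "txB1": "geneB"}
    genes, cells, matrix = gene_by_cell_matrix(quants, tx2gene)
    print("gene", *cells, sep="\t")
    for gene, row in zip(genes, matrix):
        print(gene, *row, sep="\t")
```

Summing NumReads, rather than TPM, keeps the result on the count scale expected by downstream count-based QC and normalization.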
Data exploration and preprocessing in R
Here, we provide a template script for performing data exploration and preprocessing in R for the Cuomo dataset. It is up to you to fill out this script, based on the concepts discussed in the theory session and the functions explained for the Macosko analysis!
lab1_CuomoTemplate
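Instead of fixed thresholds, a commonly used data-driven QC strategy flags cells whose metric lies more than a chosen number of median absolute deviations (MADs) from the median. In R, scater's isOutlier function implements a similar idea. The Python sketch below is an independent illustration of the principle, not scater's code. It rests on three assumptions:

- the MAD is scaled by 1.4826, as R's mad() does by default;
- library sizes are examined on the log scale so that only unusually small libraries are flagged;
- the cutoff of 3 MADs is an illustrative choice.

```python
import numpy as np


def mad_outliers(values, nmads=3.0, direction="both", log=False):
    """Flag values lying more than `nmads` scaled MADs away from the median.

    direction: "lower" flags small values, "higher" flags large values,
    "both" flags either. With log=True, values must be strictly positive.
    """
    x = np.asarray(values, dtype=float)
    if log:
        x = np.log10(x)
    med = np.median(x)
    mad = 1.4826 * np.median(np.abs(x - med))  # scaled like R's mad()
    lower, upper = med - nmads * mad, med + nmads * mad
    if direction == "lower":
        return x < lower
    if direction == "higher":
        return x > upper
    return (x < lower) | (x > upper)


if __name__ == "__main__":
    # Hypothetical library sizes; the last cell has an unusually small library
    lib_sizes = [5200, 4800, 5100, 4950, 5300, 300]
    print(mad_outliers(lib_sizes, direction="lower", log=True))
```

Because the bounds adapt to the median and spread of each dataset, the same rule can be reused across batches or datasets with different sequencing depths.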
Here, we provide a “solution” file for the analysis of the Cuomo dataset for the first lab session:
lab1_CuomoWorkflow_html