In this second lab session, we will perform normalization, feature selection and dimension reduction on two different scRNA-seq datasets.
Macosko dataset
In the first lab session (24 November 2021), we quantified and pre-processed the droplet single-cell RNA-seq dataset (Drop-seq protocol) from the publication by Macosko et al., Cell 161, 1202–1214 (https://doi.org/10.1016/j.cell.2015.05.002). In this experiment, Macosko et al. sequenced 49,300 cells from the mouse retina and identified 39 transcriptionally distinct cell populations. The experiment was performed in 7 batches.
We now pick up where we left off last week, when we performed the following steps:
- Constructed a SingleCellExperiment object for the Macosko experiment
- Added gene-level information to that object
- Removed low-abundance genes
- Performed cell-level quality control: removed cells with a lower than expected library size or transcriptional complexity, cells with a high percentage of counts going to mitochondrial RNA, empty droplets and doublets
- Normalization to remove technical noise (brief)
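As a reminder of what the cell-level quality control step does, here is a minimal sketch in plain Python (the lab itself works in R). It is only an illustration under stated assumptions: the toy count matrix, the choice of the last column as a mitochondrial gene, the helper name `mad_bounds` and the rule of flagging cells more than 3 median absolute deviations from the median are all assumptions, not the thresholds used in last week's session. Empty-droplet and doublet removal are not covered.

```python
import math
from statistics import median


def mad_bounds(values, nmads=3.0):
    """Return (lower, upper) bounds at nmads scaled MADs around the median."""
    med = median(values)
    # The factor 1.4826 makes the MAD comparable to a standard deviation
    # for normally distributed data.
    mad = 1.4826 * median([abs(v - med) for v in values])
    return med - nmads * mad, med + nmads * mad


# Toy count matrix (made-up numbers): one row per cell, one column per gene.
counts = [
    [50, 30, 20, 5],
    [48, 32, 22, 6],
    [52, 28, 19, 5],
    [49, 31, 21, 4],
    [51, 29, 20, 6],
    [2, 0, 1, 0],      # very small library, few detected genes
    [20, 10, 5, 70],   # most counts go to the mitochondrial gene
]
mito_genes = [3]  # hypothetical: the last column is a mitochondrial gene

# Per-cell QC metrics.
lib_size = [sum(cell) for cell in counts]
log_lib_size = [math.log10(s) for s in lib_size]
n_detected = [sum(1 for c in cell if c > 0) for cell in counts]
pct_mito = [100 * sum(cell[g] for g in mito_genes) / s
            for cell, s in zip(counts, lib_size)]

# Low library size and low complexity are flagged on the lower tail,
# high mitochondrial percentage on the upper tail.
low_lib, _ = mad_bounds(log_lib_size)
low_det, _ = mad_bounds(n_detected)
_, high_mito = mad_bounds(pct_mito)

kept = [i for i in range(len(counts))
        if log_lib_size[i] >= low_lib
        and n_detected[i] >= low_det
        and pct_mito[i] <= high_mito]
print(kept)
```

On this toy matrix the filter drops the last two cells: one for its small library size and low number of detected genes, the other for its high mitochondrial percentage.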
During this session, we will add the following steps to this workflow:
- Normalization to remove technical noise (continued)
- Feature selection (selecting genes for downstream dimension reduction and clustering)
- Various flavors of dimension reduction
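The three steps listed above can be sketched on a toy example. The block below is a plain-Python illustration, not the lab's R workflow: the function names are hypothetical, library-size scaling with a log2 transform stands in for normalization, ranking genes by raw variance stands in for feature selection (dedicated tools usually model the mean-variance trend instead), and a power iteration for the first principal component stands in for dimension reduction.

```python
import math
import random
from statistics import mean, pvariance


def log_normalize(counts):
    """Scale each cell by its library-size factor, then take log2(x + 1)."""
    lib_sizes = [sum(cell) for cell in counts]
    mean_lib = mean(lib_sizes)
    # Size factors are library sizes scaled to average 1 across cells.
    size_factors = [ls / mean_lib for ls in lib_sizes]
    return [[math.log2(c / sf + 1) for c in cell]
            for cell, sf in zip(counts, size_factors)]


def top_variable_genes(logcounts, n_top):
    """Indices of the n_top genes with the largest variance across cells."""
    n_genes = len(logcounts[0])
    variances = [pvariance([cell[g] for cell in logcounts])
                 for g in range(n_genes)]
    ranked = sorted(range(n_genes), key=lambda g: variances[g], reverse=True)
    return sorted(ranked[:n_top])


def first_pc_scores(matrix, n_iter=100, seed=0):
    """Per-cell scores on the first principal component (power iteration)."""
    n_cells, n_feat = len(matrix), len(matrix[0])
    col_means = [mean(row[j] for row in matrix) for j in range(n_feat)]
    centered = [[row[j] - col_means[j] for j in range(n_feat)]
                for row in matrix]
    rng = random.Random(seed)  # seeded random start for reproducibility
    v = [rng.gauss(0, 1) for _ in range(n_feat)]
    for _ in range(n_iter):
        scores = [sum(x * w for x, w in zip(row, v)) for row in centered]
        # Multiply by X^T X, which is proportional to the covariance matrix.
        v = [sum(centered[i][j] * scores[i] for i in range(n_cells))
             for j in range(n_feat)]
        norm = math.sqrt(sum(w * w for w in v))
        if norm == 0:
            raise ValueError("no variance left in the selected features")
        v = [w / norm for w in v]
    return [sum(x * w for x, w in zip(row, v)) for row in centered]


# Toy counts (made-up): two cell groups with different marker genes
# (columns 0 and 1), each sequenced at three different depths.
counts = [
    [10, 0, 5, 5, 5],
    [20, 0, 10, 10, 10],
    [30, 0, 15, 15, 15],
    [0, 10, 5, 5, 5],
    [0, 20, 10, 10, 10],
    [0, 30, 15, 15, 15],
]

logcounts = log_normalize(counts)                 # 1. normalization
hvgs = top_variable_genes(logcounts, n_top=2)     # 2. feature selection
reduced = [[cell[g] for g in hvgs] for cell in logcounts]
pc1 = first_pc_scores(reduced)                    # 3. dimension reduction
print(hvgs, pc1)
```

In this toy setting, normalization removes the depth differences within each group, the two marker genes are the only genes with non-zero variance, and the first principal component separates the two groups by sign.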
To guide you through these next steps, we provide an R Markdown template that you can fill out:
lab2_MacoskoTemplate
Here, we provide you with a solution file for the exercise, covering the sections on normalization, feature selection and dimensionality reduction:
lab2_macoskoWorkflow
Cuomo dataset
As for the Macosko dataset, we will continue our workflow for the Cuomo dataset.
In the experiment, the authors harvested induced pluripotent stem cells (iPSCs) from 125 healthy human donors and used them to study the endoderm differentiation process. The authors cultured the iPSC lines and allowed them to differentiate for three days. Based on what is known about endoderm differentiation, the sampled time points should correspond to different cell types: day0 cells are (undifferentiated) iPSCs, day1 cells are mesendoderm cells, day2 cells are “intermediate” cells and day3 cells are fully differentiated endoderm cells. This dataset was generated using the SMART-Seq2 scRNA-seq protocol.
We will continue our analysis of the Cuomo dataset by adding normalization, feature selection and dimension reduction to last session’s workflow.
Here, we provide you with a “solution” file for the analysis of last week’s session, with placeholders for the analysis of this session:
lab2_CuomoTemplate
Here, we provide you with a solution file for the exercise:
lab2_Macosko_CuomoWorkflow