1 Introduction

Here we use the data published by Anna Cuomo et al. (last author Oliver Stegle), which we will refer to as the iPSC dataset. The paper describing this dataset is available at https://www.nature.com/articles/s41467-020-14457-z.

In the experiment, the authors harvested induced pluripotent stem cells (iPSCs) from 125 healthy human donors. These cells were used to study endoderm differentiation, the process in which iPSCs differentiate into endoderm cells over approximately three days. The authors therefore cultured the iPSC lines and allowed them to differentiate for three days. During the experiment, cells were harvested at four time points: day0 (the start of incubation), day1, day2 and day3. Given what is known about endoderm differentiation, these time points should correspond to different cell types: day0 cells are (undifferentiated) iPSCs, day1 cells are mesendoderm cells, day2 cells are “intermediate” cells and day3 cells are fully differentiated endoderm cells.

This dataset was generated using the SMART-Seq2 scRNA-seq protocol.

The final goal of the experiment was to characterize population variation in the process of endoderm differentiation.

2 Download data

For this lab session, we will work with a subset of the data, i.e., the data for the first 15 donors (in alphabetical order) in the experiment. These can be downloaded through the Belnet FileSender link provided by email, https://filesender.belnet.be/?s=download&token=eb8136df-67d3-4869-b2a9-f65767054e81.

The original data (125 donors) can be downloaded from Zenodo (https://zenodo.org/record/3625024). At the bottom of that web page, the files raw_counts.csv.zip and cell_metadata_cols.tsv can be downloaded and stored locally. We do not recommend doing this during the lab session, to avoid overloading the system.

3 Import data

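First we read in the data, which are stored as a SingleCellExperiment object:

library(SingleCellExperiment) # needed to print and work with the object
sce <- readRDS("/Users/jg/Desktop/sce_15_cuomo.rds") # change to your data path
sce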

4 Explore metadata

Exploring the metadata is essential to get a better idea of what the experiment was about and how it was organized. In contrast with the previous dataset by Macosko et al., here we have a large amount of metadata to work with, which we need to explore.

When we think of the experiment, the key aspects are:

  • The day of the differentiation process at which the cells were sequenced (which should be a proxy for the cell type)

  • The donor from which the cells originate (125 donors in total, 15 in this reduced dataset)

In addition, to reduce technical artefacts and to allow for batch correction, each batch (“experiment” variable) may contain cells from multiple donors and days.

Explore the metadata. The table() function will come in handy for this (see the Macosko analysis).
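As a starting point, the sketch below shows how table() can be used for one-way counts and for cross-tabulations. It runs on a small toy data frame rather than on the real object, and the column names "day", "donor" and "experiment" are assumptions made for illustration; check the actual column names of the cell metadata before reusing it.

```r
# Toy cell-level metadata; the column names "day", "donor" and "experiment"
# are assumptions for illustration, not necessarily those of the real object.
meta <- data.frame(
  day        = c("day0", "day0", "day1", "day1", "day2", "day3"),
  donor      = c("A", "B", "A", "B", "A", "B"),
  experiment = c("expt_01", "expt_01", "expt_01", "expt_02", "expt_02", "expt_02")
)

# One-way table: number of cells per time point
cells_per_day <- table(meta$day)

# Two-way table: which donors were sequenced on which day?
donor_by_day <- table(meta$donor, meta$day)

# Two-way table: does each batch contain cells from several donors?
donor_by_batch <- table(meta$donor, meta$experiment)

print(cells_per_day)
print(donor_by_day)
print(donor_by_batch)
```

On the real data, the same calls could be applied to the columns of colData(sce), for example after converting it with as.data.frame().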

5 Obtaining and including rowData

  • Assess what is currently stored in the rowData of the SingleCellExperiment object.

  • Retrieve relevant information from Ensembl using the biomaRt package. Make sure to select the right values for the dataset and version arguments of the useEnsembl function (these can be retrieved from the Cuomo et al. paper).
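The two steps above could be organized roughly as follows. This is a minimal sketch, not the course solution: fetch_gene_annotation() and align_annotation() are hypothetical helper names, the chosen attributes are only one possible selection, and the Ensembl version is deliberately left as an argument that you have to look up yourself. The default dataset name is the standard human gene dataset in Ensembl, which is an assumption you should verify. The sketch also assumes that you have a vector of plain Ensembl gene IDs for your features.

```r
# Hypothetical helper: queries Ensembl through biomaRt.
# Needs the biomaRt package and an internet connection when it is called.
fetch_gene_annotation <- function(gene_ids, ensembl_version,
                                  dataset = "hsapiens_gene_ensembl") {
  mart <- biomaRt::useEnsembl(
    biomart = "genes",
    dataset = dataset,
    version = ensembl_version  # to be taken from the paper, not guessed
  )
  biomaRt::getBM(
    attributes = c("ensembl_gene_id", "hgnc_symbol", "chromosome_name"),
    filters    = "ensembl_gene_id",
    values     = gene_ids,
    mart       = mart
  )
}

# Hypothetical helper: reorder the annotation so that row i describes
# gene_ids[i]; genes without a match get a row of NA values.
align_annotation <- function(gene_ids, annotation) {
  annotation[match(gene_ids, annotation$ensembl_gene_id), , drop = FALSE]
}

# Offline toy example of the alignment step
toy_annotation <- data.frame(
  ensembl_gene_id = c("g1", "g2"),
  hgnc_symbol     = c("A", "B")
)
aligned <- align_annotation(c("g2", "g3", "g1"), toy_annotation)
print(aligned)
```

The alignment step matters because getBM() does not promise to return rows in the order of the query, and some genes may be missing from the result. Only after aligning is it safe to store the annotation in rowData.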

6 Filtering non-informative genes

Filter the genes using relevant criteria. Compare your results with what we obtained in the Macosko analysis. Can you explain what you observe?
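One common criterion is to keep only genes that are detected in a minimum number of cells. The sketch below illustrates this on a toy count matrix; the threshold min_cells is a hypothetical value chosen for the example, not a recommendation for this dataset.

```r
set.seed(42)

# Toy count matrix: 6 genes (rows) by 10 cells (columns)
counts <- matrix(
  rpois(60, lambda = 0.5), nrow = 6,
  dimnames = list(paste0("gene", 1:6), paste0("cell", 1:10))
)
counts["gene1", ] <- 0  # force one gene that is never detected

# Keep genes with a nonzero count in at least `min_cells` cells
min_cells <- 2  # hypothetical threshold, for illustration only
keep <- rowSums(counts > 0) >= min_cells
filtered <- counts[keep, , drop = FALSE]

print(table(keep))
print(dim(filtered))
```

For a SingleCellExperiment object, the same logical vector could be computed from counts(sce) and used to subset the rows with sce[keep, ].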

7 Quality control

7.1 Calculate QC variables

Use perCellQCMetrics to compute QC metrics.
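A minimal sketch of the call on a toy object is given below, assuming the scuttle and SingleCellExperiment packages are installed. Flagging mitochondrial genes by an "MT-" prefix in the toy gene names is an assumption made for the example; for the real data, the mitochondrial genes have to be identified from the information in rowData.

```r
library(SingleCellExperiment)
library(scuttle)

set.seed(1)

# Toy data: 20 genes by 10 cells, the first 3 genes mimic mitochondrial genes
gene_names <- c(paste0("MT-G", 1:3), paste0("G", 1:17))
counts <- matrix(
  rpois(200, lambda = 5), nrow = 20,
  dimnames = list(gene_names, paste0("cell", 1:10))
)
toy <- SingleCellExperiment(assays = list(counts = counts))

# Logical vector marking the (toy) mitochondrial genes
is_mito <- grepl("^MT-", rownames(toy))

# Per-cell QC metrics; the subset name "Mito" determines the column names
# subsets_Mito_sum, subsets_Mito_detected and subsets_Mito_percent
qc <- perCellQCMetrics(toy, subsets = list(Mito = is_mito))

# Store the metrics alongside the existing cell metadata
colData(toy) <- cbind(colData(toy), qc)
print(colnames(colData(toy)))
```

The "sum" column holds the total count per cell and "detected" the number of genes with a nonzero count, which are the variables used in the plots later in this section.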

7.2 Exploratory data analysis

7.3 QC using adaptive thresholds

Visualize the cells that are going to be removed. Are you happy with the selection criterion, i.e., does it appear that we are only removing technical artefacts or could we be removing biological signal as well?

To do this, try coloring the “detected” versus “subsets_Mito_percent” and “sum” versus “detected” plots by biologically relevant metadata.
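To make the idea of an adaptive threshold concrete, the sketch below flags cells whose log library size lies more than a given number of median absolute deviations (MADs) below the median. It is a base R illustration of the principle on simulated values, with a hypothetical helper flag_low_outliers(); in practice a function such as scuttle::isOutlier() would normally be used instead.

```r
set.seed(7)

# Toy library sizes: 95 typical cells plus 5 cells with very few counts
lib_size <- c(rlnorm(95, meanlog = 10, sdlog = 0.3), rep(50, 5))

# Hypothetical helper: TRUE for values more than `nmads` MADs below the
# median on the log scale. The threshold adapts to the data at hand.
flag_low_outliers <- function(x, nmads = 3) {
  log_x <- log(x)
  threshold <- median(log_x) - nmads * mad(log_x)
  log_x < threshold
}

discard <- flag_low_outliers(lib_size)
print(table(discard))
```

Because the median and MAD are computed over all cells together, a group of cells with systematically different values (for example, one time point) can shift the threshold or be flagged as a whole. This is one reason to inspect the flagged cells against the metadata before removing them.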

7.4 Remove empty droplets

What do you think of this step for the analysis of this dataset?

7.5 Identifying and removing doublets

What do you think of this step for the analysis of this dataset?

8 Normalization
