In this fourth and final lab session, we will perform batch correction, trajectory inference and differential expression testing along the estimated trajectory on the dataset from Cuomo et al.

1 Cuomo dataset

As with the Macosko dataset, we will continue the workflow we started for the Cuomo dataset.

In the experiment, the authors harvested induced pluripotent stem cells (iPSCs) from 125 healthy human donors. These cells were used to study the endoderm differentiation process. The authors cultured the iPSC lines and allowed them to differentiate for three days. Given what is known about endoderm differentiation, the time points should correspond to different cell types: day0 cells are (undifferentiated) iPSCs, day1 cells are mesendoderm cells, day2 cells are “intermediate” cells and day3 cells are fully differentiated endoderm cells. This dataset was generated using the SMART-Seq2 scRNA-seq protocol.

In the first and second sessions, we already performed several steps:

  1. Remove very lowly expressed genes

  2. Remove low quality cells

    2.1. Cells with outlying library size

    2.2. Cells with outlying transcriptome complexity

    2.3. Cells with outlying percentage of mitochondrial reads

  3. Normalization

    3.1. Compute log-normalized counts

    3.2. Compute scaling factor to correct for differences in library size

  4. Feature selection

    4.1. Genes with high variance

    4.2. Genes with high variance with respect to their mean expression

    4.3. Genes with high deviance

    4.4. Genes with high variance after variance-stabilizing transformation (VST)

  5. Dimensionality reduction

    5.1. Based on the two most variable genes from step 4.2.

    5.2. PCA

    5.3. GLM-PCA

    5.4. t-SNE

    5.5. UMAP
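As a reminder of what step 2.1 involves, the following is a minimal sketch in base R. It assumes a simulated count matrix in place of the Cuomo data, and the 3-MAD threshold and all object names (`counts`, `log_lib`, `n_mads`) are illustrative assumptions rather than the settings used in the earlier sessions.

```r
# Simulated counts: 200 genes x 50 cells (placeholder for the real count matrix)
set.seed(1)
counts <- matrix(rpois(200 * 50, lambda = 5), nrow = 200, ncol = 50)

# Turn the last cell into an artificial low-quality cell with very few counts
counts[, 50] <- rpois(200, lambda = 0.2)

# Library size = total counts per cell, compared on the log10 scale
log_lib <- log10(colSums(counts))

# Flag cells whose log library size lies more than n_mads median absolute
# deviations away from the median (the threshold of 3 is an assumption)
n_mads <- 3
deviation <- abs(log_lib - median(log_lib))
low_quality <- deviation > n_mads * mad(log_lib)

# Keep only the cells that were not flagged
counts_filtered <- counts[, !low_quality, drop = FALSE]
```

The same pattern can be applied to transcriptome complexity (number of detected genes per cell) or to the percentage of mitochondrial reads by swapping the per-cell statistic.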

During this session, we will add the following steps to this workflow:

  • Batch correction
  • Clustering (hierarchical clustering)
  • Building a trajectory
  • Testing differential expression along the inferred trajectory
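To give a feel for the clustering step, here is a minimal sketch of hierarchical clustering in base R. It assumes a simulated low-dimensional embedding standing in for the (batch-corrected) reduced dimensions of the Cuomo cells. The group sizes, the Ward linkage and the choice of `k = 2` are illustrative assumptions, not the settings of the lab solution.

```r
set.seed(42)

# Two simulated groups of 30 cells each in a 5-dimensional reduced space
# (a stand-in for e.g. the first principal components of the real data)
group_a <- matrix(rnorm(30 * 5, mean = 0), ncol = 5)
group_b <- matrix(rnorm(30 * 5, mean = 6), ncol = 5)
embedding <- rbind(group_a, group_b)

# Pairwise Euclidean distances between cells
d <- dist(embedding)

# Agglomerative hierarchical clustering with Ward's linkage
tree <- hclust(d, method = "ward.D2")

# Cut the dendrogram into k clusters (k = 2 is an assumption here)
clusters <- cutree(tree, k = 2)

# Number of cells per cluster
table(clusters)
```

Calling `plot(tree)` draws the dendrogram, which can help when choosing where to cut. In the actual workflow, the resulting cluster labels would serve as input for building the trajectory.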

To guide you through these next steps, we provide an R Markdown template that you can fill in:

lab4_CuomoTemplate.html

We also provide the solution to this exercise here (sections clustering, marker gene detection and annotation):

lab4_CuomoWorkflow.html

