Course Description

High-throughput ’omics studies generate ever larger datasets and, as a consequence, complex data interpretation challenges. This course focusses on the statistical concepts involved in preprocessing, quantification and differential analysis of high-throughput ’omics data, with an emphasis on shotgun proteomics and (bulk and single-cell) RNA-sequencing. Experimental design is essential for correct interpretation in all ’omics studies. We will cover how to design a statistically efficient experiment and discuss how the experimental design affects the way we model ’omics data, introducing concepts such as blocking. The course relies exclusively on free, user-friendly open-source tools in R/Bioconductor. We hope that it will provide a solid basis for beginners and also bring new perspectives to those already familiar with standard data analysis workflows for proteomics and next-generation sequencing applications.

Target Audience

This course is oriented towards biologists and bioinformaticians with a particular interest in differential analysis for quantitative ’omics data.

Prerequisites

The prerequisite for the Statistical Genomics Analysis course is the successful completion of a basic statistics course that covers data exploration and descriptive statistics, statistical modeling, and inference: linear models, confidence intervals, t-tests, F-tests, ANOVA, and the chi-squared test. The basic concepts may be revisited in the online courses at https://statomics.github.io/PSLS/ (English) and https://statomics.github.io/sbc/ (Dutch).

In addition, knowledge of programming in R is preferred. A primer to R and Data visualization in R can be found at:

Software

The course's install script can be run from within R:

source("https://raw.githubusercontent.com/statOmics/SGA/master/install.R")

Detailed Program

1. Introduction (Week 1)

1.1. Position of the course: HTML

1.2. Recap Linear Models

Module I: Proteomics Data Analysis (Weeks 2-5)

1. Bioinformatics for proteomics

2. Statistics for Proteomics Data Analysis

2.1. Identification

2.2. Preprocessing & Analysis of Label-Free Quantitative Proteomics Experiments with Simple Designs

2.3. Statistical Inference & Analysis of Experiments with Factorial Designs

Module II: Next-generation Sequencing (Weeks 6-12)

1. Introduction to sequencing technology, QC, read mapping and count tables

2. Introduction to count data and GLMs

3. Technical details on RNA-seq DE analysis

4. DE and usage analysis upon transcript-level quantification

5. Solutions Bulk RNA-seq Data Analysis

  • airway example: Mapping html
  • airway example: html
  • Parathyroid Example (quasi-edgeR): html, pdf
  • airway transcript quantification: html

6. Introduction to Single Cell Transcriptomics (scRNA-seq)