1 Course Description

High-throughput ’omics studies generate ever larger datasets and, as a consequence, complex challenges for data interpretation. This course focusses on the statistical concepts involved in the preprocessing, quantification and differential analysis of high-throughput ’omics data. The main emphasis is on shotgun proteomics and (bulk and single-cell) RNA-sequencing. Experimental design is essential for correct interpretation in all ’omics studies. We will cover how to design a statistically efficient experiment and discuss how the experimental design affects the way we model ’omics data, introducing concepts such as blocking. The course relies exclusively on free and user-friendly open-source tools in R/Bioconductor. We hope that this will provide a solid basis for beginners, and will also bring new perspectives to those already familiar with standard data analysis workflows for proteomics and next-generation sequencing applications.

2 Target Audience

This course is oriented towards biologists and bioinformaticians with a particular interest in differential analysis for quantitative ’omics data.

3 GitHub Repository

All source and data files for this course are available on the accompanying GitHub repository.

4 Prerequisites

The prerequisite for the Statistical Genomics Analysis course is the successful completion of a basic statistics course covering data exploration and descriptive statistics, statistical modeling, and inference: linear models, confidence intervals, t-tests, F-tests, ANOVA and chi-squared tests. These basic concepts can be revisited in the online course at https://gtpb.github.io/PSLS20/ (English) and at https://statomics.github.io/statistiekCursusNotas/ (Dutch).

In addition, knowledge of programming in R is preferred. A primer on R and on data visualization in R can be found at:

5 Software

  • Participants are required to bring their own laptop with R version 4.1.1 or greater.

  • We also recommend installing the latest version of RStudio.

  • Installation script: to install all required packages, copy and paste the following line of code into your R console.

source("https://raw.githubusercontent.com/statOmics/SGA21/master/install.R")

  • Participants who have issues installing R/RStudio can, in the meantime, use a cloud-based RStudio instance on Binder with all packages for the course preinstalled. Note that this instance is not intended for routine use.


6 Detailed Program

  1. Position of the course: PDF

  2. Recap of Linear Models (Week 1)

6.1 Module I: Proteomics Data Analysis (Week 1-5)

  1. Bioinformatics for proteomics

  2. Preprocessing & Analysis of Simple Designs (Week 3)

  3. Statistical Inference & Analysis of Factorial Designs (Week 3-4)

  4. Advanced materials and reading materials

  5. Solutions

6.2 Module II: Bulk RNA-sequencing

  1. Introduction to sequencing technology, raw data and preprocessing.

  2. Working with count data and generalized linear models.

  3. Analysis of RNA-seq data.

  4. Technical topics in bulk RNA-seq differential expression analysis (DEA).

6.3 Module III: Single-cell RNA-sequencing

  1. General concepts and analysis workflow of single-cell RNA-seq data.

