Proteomics Data Analysis for Data-dependent and Data-independent acquisition at EBI 2026 (PDA26EBI)

Author: Lieven Clement
Affiliation: Ghent University

This course provides comprehensive hands-on tutorials on how to apply the msqrob2 software for the statistical analysis of mass spectrometry (MS)-based proteomics data.
The course first introduces general concepts of statistical proteomics data analysis and msqrob2. Further chapters demonstrate the application of msqrob2 for assessing different biological questions, starting from datasets with different experimental designs, acquisition strategies, instruments, and search engines. The book aims to help proteomics researchers and data analysts tailor their statistical analysis workflow to their specific datasets and research questions.

Why msqrob2?

MS-based proteomics experiments often impose a complex correlation structure among observations. Addressing this correlation is key for correct statistical inference and reliable biomarker discovery. This msqrob2 course provides a set of model-based workflows dedicated to differential abundance analysis for label-free as well as labeled MS-based proteomics data. The key features of msqrob2 workflows are:

  1. Modularity: all core functions rely on the QFeatures class, a standardised data structure, meaning that the output of one function can be fed as input to any other function. Hence, different functions can be assembled as modular blocks into complete data analysis workflows that are easily adapted to the peculiarities of any MS-based proteomics data set. Therefore, the approach extends well beyond the use case presented in this chapter.
  2. Flexibility: the msqrob2 modelling approach relies on the lme4::lmer() model specification syntax, meaning that any linear model can be specified. For fixed effects, this includes modelling categorical and numerical variables, as well as their interactions. Moreover, msqrob2 can model both sample-specific and feature-specific (e.g. peptide or protein) covariates, which enables inference for experiments with arbitrarily complex designs and allows explicit correction for feature-specific properties.
  3. Performance: thanks to the inclusion of robust ridge regression, we demonstrated improved performance of msqrob2 workflows over competing software (Goeminne et al. 2016; Sticker et al. 2020; Vandenbulcke et al. 2025).
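
As a small illustration of the formula syntax mentioned under Flexibility, the sketch below uses only base R to show how a fixed-effects formula with a categorical variable, a numerical variable, and their interaction expands into model parameters. The design table, the variable names (`condition`, `dose`, `run`) and the example formulas are hypothetical and are not taken from the course data. The random-effect term `(1 | run)` is shown only as lme4-style notation and is not fitted here.

```r
# Hypothetical design: 2 conditions x 3 doses, measured in 2 runs (illustration only)
design <- expand.grid(
  condition = factor(c("A", "B")),
  dose = c(0, 1, 2),
  run = factor(c("run1", "run2"))
)

# Fixed effects: categorical variable, numerical variable and their interaction
fixed <- ~ condition * dose

# Base R expands the formula into one column per model parameter
X <- model.matrix(fixed, data = design)
print(colnames(X))
# "(Intercept)" "conditionB" "dose" "conditionB:dose"

# lme4-style formula that adds a random intercept per run (notation only, not fitted)
mixed <- ~ condition * dose + (1 | run)
print(all.vars(mixed))
```

Each column of `X` corresponds to one parameter that could be tested, which is why inspecting the expanded design before modelling can help when translating a research question into a contrast.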

Outline

The course is divided into two parts.

1 Data Processing

This part introduces the user to the key concepts for data processing in differential proteomics data analysis and provides an extensive description of the code. While this part is conceptual, the concepts are illustrated using a real spike-in study.

2 Statistical Inference & Design concepts

3 Vignettes

More example scripts for case studies from various technologies, search engines, and experimental designs can be found in our e-book: msqrob2book

4 Targeted audience and assumed background

The course material is targeted at proteomics practitioners and data analysts/bioinformaticians who would like to learn how to analyse proteomics data.

Familiarity with MS or proteomics in general is recommended, as this allows for a better understanding of the modelling assumptions made throughout this course.

The course is available in two different tracks.

  1. Using a graphical user interface. This track focuses on general modeling concepts for proteomics data analysis.
  2. Using R Markdown/Quarto scripts. In this track, learners study the general modeling concepts and how to implement them in R/Bioconductor.

For learners who want to follow the course using scripts:

  • A working knowledge of R (R syntax, commonly used functions, basic data structures such as data frames, vectors, matrices, … and their manipulation) is required.

  • Familiarity with other Bioconductor omics data classes and the tidyverse syntax is useful.

  • We recommend reading the quantitative proteomics chapter of the R for mass spectrometry book.

  • We highly recommend reading our msqrob2book for more in-depth modeling concepts and case studies.

5 Software and data

To install all the necessary packages, please use R 4.6 or higher and execute the code chunk below in the RStudio Console/R console:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(c(
   "arrow",
   "BiocParallel",
   "BiocFileCache",
   "ComplexHeatmap",
   "dplyr",
   "ExploreModelMatrix",
   "ggpattern",
   "ggplot2",
   "ggrepel",
   "impute",
   "MsDataHub",
   "patchwork",
   "scater",
   "tidyr",
   "bookdown", 
   "iq",
   "QFeatures",
   "msqrob2",
   "kableExtra",
   "data.table",
   "ggcorrplot",
   "ggpubr"
))
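
After installation, it can be useful to confirm that everything is in place. The following is a minimal sketch using only base R. The helper name `missing_packages()` is hypothetical, and the package vector shown is just a subset of the list above.

```r
# Hypothetical helper: return the packages from `pkgs` that cannot be loaded
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  unname(pkgs[!installed])
}

# Subset of the packages listed above; extend with the full list as needed
course_pkgs <- c("QFeatures", "msqrob2", "ggplot2", "dplyr")

# Report the running R version and any package that still needs installing
message("R version: ", as.character(getRversion()))
still_missing <- missing_packages(course_pkgs)
if (length(still_missing) > 0) {
  message("Not yet installed: ", paste(still_missing, collapse = ", "))
} else {
  message("All checked packages are installed.")
}
```

Any package reported as not yet installed can be passed again to `BiocManager::install()`.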

Users who would like to use msqrob2gui should also install:

BiocManager::install("remotes")
BiocManager::install("statomics/msqrob2gui")

All data for this course can be accessed directly from the web (msqrob2data) if you use R scripting, or can be downloaded locally: Download data

License

This material is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. You are free to share (copy and redistribute the material in any medium or format) and adapt (remix, transform, and build upon the material) for any purpose, even commercially, as long as you give appropriate credit and distribute your contributions under the same license as the original.

References

Goeminne, Ludger J E, Kris Gevaert, and Lieven Clement. 2016. “Peptide-Level Robust Ridge Regression Improves Estimation, Sensitivity, and Specificity in Data-Dependent Quantitative Label-Free Shotgun Proteomics.” Mol. Cell. Proteomics 15 (2): 657–68.
Sticker, Adriaan, Ludger Goeminne, Lennart Martens, and Lieven Clement. 2020. “Robust Summarization and Inference in Proteome-Wide Label-Free Quantification.” Mol. Cell. Proteomics 19 (7): 1209–19.
Vandenbulcke, Stijn, Christophe Vanderaa, Oliver Crook, Lennart Martens, and Lieven Clement. 2025. “Msqrob2TMT: Robust Linear Mixed Models for Inferring Differential Abundant Proteins in Labeled Experiments with Arbitrarily Complex Design.” Mol. Cell. Proteomics 24 (7): 101002.