2. Preprocessing and statistical analysis wit msqrob2 for experiments with simple designs

2.1 The CPTAC A vs B dataset

The result of a quantitative analysis is a list of peptide and/or protein abundances for every protein in different samples, or abundance ratios between the samples. In this chapter we will describe a generic workflow for differential analysis of quantitative datasets with simple experimental designs.

In order to extract relevant information from these massive datasets, we will use our msqrob2 software tool (Goeminne, Gevaert, and Clement 2016), (Goeminne et al. 2020) and (Sticker et al. 2020).

The tutorial can be done using R/Rmarkdown scripts or using the graphical user interface that is provided by the msqrob2gui Shiny App.

2.2 The CPTAC A vs B dataset

Our first case-study is a subset of the data of the 6th study of the Clinical Proteomic Technology Assessment for Cancer (CPTAC). In this experiment, the authors spiked the Sigma Universal Protein Standard mixture 1 (UPS1) containing 48 different human proteins in a protein background of 60 ng/μL Saccharomyces cerevisiae strain BY4741 (MATa, leu2Δ0, met15Δ0, ura3Δ0, his3Δ1). Two different spike-in concentrations were used: 6A (0.25 fmol UPS1 proteins/μL) and 6B (0.74 fmol UPS1 proteins/μL) [5].

We limited ourselves to the data of LTQ-Orbitrap W at site 56. The data were searched with MaxQuant version 1.5.2.8, and detailed search settings were described in (Goeminne, Gevaert, and Clement 2016). Three replicates are available for each concentration.

The raw data files can be downloaded from https://cptac-data-portal.georgetown.edu/cptac/public?scope=Phase+I (Study 6)
The MaxQuant data can be downloaded zip file with data. The peptides.txt file can be found in data/quantification/cptacAvsB_lab3.
Note, that participants who use R/Rmarkdown scripts do not have to download the data as they can directly import the data from the web in R within their script.

[2.3.a] Participants can perform an analysis using the GUI or an Rmarkdown script

Follow the steps in the GUI or in the script and try to understand each of the analysis steps. We know the real FC for the spike in proteins and the yeast proteins (see description of the data). What do you observe?

[2.3.b] Repeat the analysis for the median summarization method. What do you observe, how does that compare to the robust summarisation and try to explain this?

2.3 Breast cancer example

Eighteen Estrogen Receptor Positive Breast cancer tissues from from patients treated with tamoxifen upon recurrence have been assessed in a proteomics study. Nine patients had a good outcome (or) and the other nine had a poor outcome (pd). The proteomes have been assessed using an LTQ-Orbitrap and the thermo output .RAW files were searched with MaxQuant (version 1.4.1.2) against the human proteome database (FASTA version 2012-09, human canonical proteome).

Three peptides txt files are available:

For a 3 vs 3 comparison
For a 6 vs 6 comparison
For a 9 vs 9 comparison

References

Goeminne, L. J. E., A. Sticker, L. Martens, K. Gevaert, and L. Clement. 2020. “MSqRob Takes the Missing Hurdle: Uniting Intensity- and Count-Based Proteomics.” Anal Chem 92 (9): 6278–87.

Goeminne, L. J., K. Gevaert, and L. Clement. 2016. “Peptide-level Robust Ridge Regression Improves Estimation, Sensitivity, and Specificity in Data-dependent Quantitative Label-free Shotgun Proteomics.” Mol Cell Proteomics 15 (2): 657–68.

Sticker, A., L. Goeminne, L. Martens, and L. Clement. 2020. “Robust Summarization and Inference in Proteome-wide Label-free Quantification.” Mol Cell Proteomics 19 (7): 1209–19.

2. Preprocessing and statistical analysis wit msqrob2 for experiments with simple designs

Lieven Clement

statOmics, Ghent University

2.1 The CPTAC A vs B dataset

2.2 The CPTAC A vs B dataset

2.3 Breast cancer example

References