Duguet et al. 2017 compared the proteomes of mouse regulatory T cells (Treg) and conventional T cells (Tconv) in order to discover differentially regulated proteins between these two cell populations. For each biological repeat the proteomes were extracted for both Treg and Tconv cell pools, which were purified by flow cytometry. The data in data/quantification/mouseTcell on the pdaData repository are a subset of the data PXD004436 on PRIDE.
We will use a subset of the data with a randomized complete block (RCB) design, i.e. the dataset consists of four mice, and for each mouse the proteome of both conventional and regulatory T cells is assessed.
We first import the peptides.txt file. This is the file that contains your peptide-level intensities. For a MaxQuant search [6], this peptides.txt file can be found by default in the “path_to_raw_files/combined/txt/” folder from the MaxQuant output, with “path_to_raw_files” the folder where raw files were saved. In this tutorial, we will use a MaxQuant peptides file that can be found in the data tree of the SGA2020 GitHub repository https://github.com/statOmics/SGA2020/tree/data/quantification/mouseTcell .
To import the data we use the QFeatures package.
We generate the object peptidesFile with the path to the peptidesRCB.txt file. Using the grepEcols function, we find the columns that contain the expression data of the peptides in the peptidesRCB.txt file.
library(tidyverse)
library(limma)
library(QFeatures)
library(msqrob2)
library(plotly)
peptidesFile <- "https://raw.githubusercontent.com/statOmics/SGA2020/data/quantification/mouseTcell/peptidesRCB.txt"
ecols <- MSnbase::grepEcols(
  peptidesFile, "Intensity ",
  split = "\t")
pe <- readQFeatures(
  table = peptidesFile,
  fnames = 1,
  ecol = ecols,
  name = "peptideRaw", sep = "\t")
pe
## An instance of class QFeatures containing 1 assays:
## [1] peptideRaw: SummarizedExperiment with 55814 rows and 8 columns
pe[["peptideRaw"]]
## class: SummarizedExperiment
## dim: 55814 8
## metadata(0):
## assays(1): ''
## rownames(55814): AAAAAAAAAAGAAGGR AAAAAAAAAAGDSDSWDADTFSMEDPVRK ...
## YYYDGDMICK YYYDKNIIHK
## rowData names(74): Sequence N.term.cleavage.window ...
## Oxidation..M..site.IDs MS.MS.Count
## colnames(8): Intensity.Tconv.M12_2 Intensity.Tconv.M12_3 ...
## Intensity.Treg.M5_inj1 Intensity.Treg.M6_inj1
## colData names(0):
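To illustrate how the intensity columns are located, here is a minimal base R sketch of the idea behind grepEcols: read the header line, split it on tabs, and search the column names for a pattern. The toy file and its column names are hypothetical, and this is not the MSnbase implementation itself.

```r
# Toy tab-separated file standing in for a MaxQuant peptides file
# (hypothetical column names and values)
toyFile <- tempfile(fileext = ".txt")
writeLines(c(
  "Sequence\tProteins\tIntensity\tIntensity Tconv M1\tIntensity Treg M1",
  "AAAK\tP1\t30\t10\t20"
), toyFile)

# Read only the header line and split it into column names
header <- strsplit(readLines(toyFile, n = 1), split = "\t")[[1]]

# The trailing space in the pattern skips the summed "Intensity" column
toyEcols <- grep("Intensity ", header)
toyEcols
```

The trailing space in the pattern matters: without it, the overall "Intensity" column would be selected together with the per-sample columns.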
We will make use of data wrangling functionalities from the tidyverse package. The %>% operator allows us to pipe the output of one function to the next function.
colData(pe)$celltype <- substr(
colnames(pe[["peptideRaw"]]),
11,
14) %>%
unlist %>%
as.factor
colData(pe)$mouse <- pe[[1]] %>%
colnames %>%
strsplit(split="[.]") %>%
sapply(function(x) x[3]) %>%
as.factor
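The sample annotation can also be derived entirely by splitting the column names on the dot, which avoids hard-coded character positions. The sketch below works on two hypothetical column names that follow the same pattern as the real ones. Note that it yields the level "Tconv", whereas the fixed-width substr call above yields "Tcon".

```r
# Hypothetical column names following the pattern Intensity.<celltype>.<mouse>
toyNames <- c("Intensity.Tconv.M12_2", "Intensity.Treg.M5_inj1")

# Split on the literal dot and pick the second and third field
parts <- strsplit(toyNames, split = "[.]")
toyCelltype <- factor(sapply(parts, function(x) x[2]))
toyMouse <- factor(sapply(parts, function(x) x[3]))

data.frame(sample = toyNames, celltype = toyCelltype, mouse = toyMouse)
```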
We calculate the number of non-zero intensities per peptide, which will be useful for filtering.
rowData(pe[["peptideRaw"]])$nNonZero <- rowSums(assay(pe[["peptideRaw"]]) > 0)
Peptides with zero intensities are missing peptides and should be represented by an NA value rather than 0.
pe <- zeroIsNA(pe, "peptideRaw") # convert 0 to NA
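The two steps above can be mimicked on a small matrix in base R. This is only a sketch with made-up values; it shows why the non-zero count is computed before the zeros are replaced, since a comparison with NA returns NA.

```r
# Toy intensity matrix: 3 peptides x 2 samples (made-up values)
toy <- matrix(c(0, 5,
                3, 4,
                0, 0),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("pepA", "pepB", "pepC"), c("s1", "s2")))

# Count the non-zero intensities per peptide while zeros are still zeros
toyNonZero <- rowSums(toy > 0)

# Encode missing peptides as NA instead of 0
toyNA <- toy
toyNA[toyNA == 0] <- NA

# Fraction of missing intensities in the toy matrix
fractionMissing <- mean(is.na(toyNA))
```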
We can inspect the missingness in our data with the plotNA() function provided with MSnbase. 38% of all peptide intensities are missing and for some peptides we do not even measure a signal in any sample. The missingness is similar across samples.
MSnbase::plotNA(assay(pe[["peptideRaw"]])) +
  xlab("Peptide index (ordered by data completeness)")
This section performs standard preprocessing for the peptide data. This includes log transformation, filtering and summarisation of the data.
pe <- logTransform(pe, base = 2, i = "peptideRaw", name = "peptideLog")
limma::plotDensities(assay(pe[["peptideLog"]]))
In our approach a peptide can map to multiple proteins, as long as none of these proteins is also present in a smaller protein group.
pe[["peptideLog"]] <-
  pe[["peptideLog"]][rowData(pe[["peptideLog"]])$Proteins
    %in% smallestUniqueGroups(rowData(pe[["peptideLog"]])$Proteins),]
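The selection rule can be made concrete with a small sketch. The helper toySmallestGroups below is hypothetical and only approximates the idea: protein groups are processed from small to large, and a larger group is dropped as soon as one of its proteins already occurs in a smaller retained group. It is not the msqrob2 implementation.

```r
# Hypothetical helper illustrating the "smallest unique groups" idea
toySmallestGroups <- function(groups, split = ";") {
  remaining <- strsplit(unique(groups), split = split, fixed = TRUE)
  kept <- character(0)
  while (length(remaining) > 0) {
    size <- min(lengths(remaining))
    # Retain all groups of the current smallest size
    current <- remaining[lengths(remaining) == size]
    kept <- c(kept, vapply(current, paste, character(1), collapse = split))
    # Drop larger groups that share a protein with a retained group
    claimed <- unlist(current)
    larger <- remaining[lengths(remaining) > size]
    overlaps <- vapply(larger, function(g) any(g %in% claimed), logical(1))
    remaining <- larger[!overlaps]
  }
  kept
}

toyGroups <- c("P1", "P1;P2", "P3;P4", "P3;P4;P5", "P6;P7;P8")
toyKept <- toySmallestGroups(toyGroups)
toyKept
```

In this toy input, "P1;P2" is dropped because P1 already forms its own group, and "P3;P4;P5" is dropped because of "P3;P4". The group "P6;P7;P8" is kept since none of its proteins occurs in a smaller group.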
We now remove the contaminants, peptides that map to decoy sequences, and proteins which were only identified by peptides with modifications.
pe[["peptideLog"]] <- pe[["peptideLog"]][rowData(pe[["peptideLog"]])$Reverse != "+", ]
pe[["peptideLog"]] <- pe[["peptideLog"]][rowData(pe[["peptideLog"]])$
  Potential.contaminant != "+", ]
We skip the removal of proteins that were only identified by modified peptides for the moment, because this step requires the large protein groups file.
We keep peptides that were observed at least twice.
pe[["peptideLog"]] <- pe[["peptideLog"]][rowData(pe[["peptideLog"]])$nNonZero >= 2, ]
nrow(pe[["peptideLog"]])
## [1] 44449
We keep 44449 peptides after filtering.
pe <- normalize(pe, i = "peptideLog", method = "quantiles", name = "peptideNorm")
After quantile normalisation the density curves for all samples coincide.
limma::plotDensities(assay(pe[["peptideNorm"]]))
This is more clearly seen in a boxplot.
boxplot(assay(pe[["peptideNorm"]]), col = palette()[-1],
  main = "Peptide distributions after normalisation", ylab = "intensity")
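For intuition on why the distributions coincide, quantile normalisation can be sketched in a few lines of base R. The helper toyQuantileNorm is hypothetical and assumes a complete matrix without ties; the real data contain missing values, which the normalisation used above has to handle.

```r
# Hypothetical helper: quantile normalisation for a complete matrix
toyQuantileNorm <- function(x) {
  # Rank of every value within its own column (sample)
  ranks <- apply(x, 2, rank, ties.method = "first")
  # Reference distribution: mean of the sorted columns, quantile by quantile
  sortedMeans <- rowMeans(apply(x, 2, sort))
  # Replace each value by the reference quantile that matches its rank
  normed <- apply(ranks, 2, function(r) sortedMeans[r])
  dimnames(normed) <- dimnames(x)
  normed
}

toy <- cbind(s1 = c(5, 2, 3, 4),
             s2 = c(4, 1, 4.5, 2),
             s3 = c(3, 4, 6, 8))
toyNorm <- toyQuantileNorm(toy)

# After normalisation every column contains exactly the same set of values
apply(toyNorm, 2, sort)
```

Each sample keeps the ordering of its own peptides, but all samples now share one common set of values.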
We can visualize our data using a Multi Dimensional Scaling plot, e.g. as provided by the limma package.
limma::plotMDS(assay(pe[["peptideNorm"]]), col = as.numeric(colData(pe)$celltype))
The first axis in the plot is showing the leading log fold changes (differences on the log scale) between the samples.
We use robust summarization in aggregateFeatures. This is the default workflow of aggregateFeatures so you do not have to specify the argument fun. However, because we compare methods we have included the fun argument to show the summarization method explicitly.
pe <- aggregateFeatures(pe,
  i = "peptideNorm",
  fcol = "Proteins",
  na.rm = TRUE,
  name = "proteinRobust",
  fun = MsCoreUtils::robustSummary)
## Your quantitative and row data contain missing values. Please read the
## relevant section(s) in the aggregateFeatures manual page regarding the
## effects of missing values on data aggregation.
plotMDS(assay(pe[["proteinRobust"]]), col = as.numeric(colData(pe)$celltype))
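To give an idea of what robust summarization does for a single protein, the sketch below fits a robust linear model with a sample effect and a peptide effect to simulated peptide intensities. The sample effects then serve as protein-level summaries. It assumes the MASS package is installed and uses made-up data with one deliberate outlier. It illustrates the principle and is not a copy of MsCoreUtils::robustSummary.

```r
# Toy peptide-by-sample matrix for ONE protein (simulated log2 intensities)
set.seed(42)
sampleLevel <- c(S1 = 10, S2 = 10, S3 = 11, S4 = 11)  # true protein-level values
peptideEffect <- c(pepA = 0, pepB = 1, pepC = -1)     # peptide-specific offsets
toy <- outer(peptideEffect, sampleLevel, "+") + rnorm(12, sd = 0.1)
toy["pepA", "S1"] <- toy["pepA", "S1"] + 5            # one outlying intensity

# Long format: one row per peptide intensity (column-major order of the matrix)
long <- data.frame(
  y = as.numeric(toy),
  peptide = factor(rep(rownames(toy), times = ncol(toy))),
  sample = factor(rep(colnames(toy), each = nrow(toy)))
)

# Robust fit of y ~ sample + peptide (Huber M-estimation via MASS::rlm);
# sum-to-zero peptide effects make the sample coefficients protein-level summaries
X <- model.matrix(~ -1 + sample + peptide, data = long,
                  contrasts.arg = list(peptide = "contr.sum"))
fit <- MASS::rlm(X, long$y, maxit = 100)
robustEstimate <- coef(fit)[paste0("sample", colnames(toy))]

# Plain column means for comparison: these are pulled up by the outlier in S1
naiveEstimate <- colMeans(toy)
rbind(robust = robustEstimate, mean = naiveEstimate)
```

The outlying peptide intensity is downweighted in the robust fit, so the summary for sample S1 should stay much closer to its simulated value than the plain mean does.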
We model the protein level expression values using msqrob. By default msqrob2 estimates the model parameters using robust regression.
pe <- msqrob(
  object = pe,
  i = "proteinRobust",
  formula = ~ celltype + mouse)
First, we extract the parameter names of the model.
getCoef(rowData(pe[["proteinRobust"]])$msqrobModels[[1]])
## (Intercept) celltypeTreg mouseM12_3 mouseM5_inj1 mouseM6_inj1
## 20.31055846 0.18119940 0.14049510 -0.05802897 -0.25319752
The conventional T cells (Tconv) are the reference class. So the mean log2 expression for Tconv samples of the reference mouse is '(Intercept)'. The mean log2 expression for Treg samples of the same mouse is '(Intercept)+celltypeTreg'. Hence, the average log2 fold change between Treg and Tconv is modelled using the parameter 'celltypeTreg'. Thus, we assess the contrast 'celltypeTreg=0' with our statistical test.
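To make the role of the celltypeTreg parameter concrete, the following minimal sketch fits the same model formula to simulated data for one protein. It uses ordinary least squares (lm) rather than the robust regression of msqrob, and all values and mouse labels are made up. In a balanced block design, the estimate of celltypeTreg equals the average within-mouse difference between Treg and Tconv.

```r
# Simulated log2 intensities for one protein in a randomized complete block design
set.seed(1)
toy <- data.frame(
  celltype = factor(rep(c("Tconv", "Treg"), each = 4)),
  mouse = factor(rep(paste0("M", 1:4), times = 2))
)
mouseEffect <- rep(c(0, 0.3, -0.2, 0.1), times = 2)   # block (mouse) effects
toy$y <- 20 + 1.5 * (toy$celltype == "Treg") + mouseEffect + rnorm(8, sd = 0.05)

# Same formula as in the msqrob call, fitted with ordinary least squares
toyFit <- lm(y ~ celltype + mouse, data = toy)
coef(toyFit)

# celltypeTreg is the average log2 fold change Treg - Tconv within mouse
pairedDiff <- toy$y[toy$celltype == "Treg"] - toy$y[toy$celltype == "Tconv"]
mean(pairedDiff)
```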
L <- makeContrast("celltypeTreg=0", parameterNames = c("celltypeTreg"))
pe <- hypothesisTest(object = pe, i = "proteinRobust", contrast = L)
volcano <- ggplot(rowData(pe[["proteinRobust"]])$celltypeTreg,
  aes(x = logFC, y = -log10(pval), color = adjPval < 0.05)) +
  geom_point(cex = 2.5) +
  scale_color_manual(values = alpha(c("black", "red"), 0.5)) + theme_minimal()
volcano
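The adjPval column that colours the volcano plot holds p-values corrected for multiple testing. As a minimal sketch, and assuming the Benjamini-Hochberg procedure is used (an assumption on our side; the hypothesisTest documentation lists the available options), the correction can be reproduced with p.adjust on a few made-up p-values.

```r
# Made-up raw p-values for four proteins
toyPval <- c(0.001, 0.01, 0.02, 0.5)

# Benjamini-Hochberg adjustment controls the false discovery rate (FDR)
toyAdjPval <- p.adjust(toyPval, method = "BH")
toyAdjPval

# Proteins declared significant at the 5% FDR level
sum(toyAdjPval < 0.05)
```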
We first select the names of the proteins that were declared significant.
sigNames <- rowData(pe[["proteinRobust"]])$celltypeTreg %>%
  rownames_to_column("proteinRobust") %>%
  filter(adjPval<0.05) %>%
  pull(proteinRobust)
heatmap(assay(pe[["proteinRobust"]])[sigNames, ])
There are 125 proteins significantly differentially expressed at the 5% FDR level.
rowData(pe[["proteinRobust"]])$celltypeTreg %>%
filter(adjPval<0.05)
## logFC se df t pval adjPval
## O08807 1.1308315 0.1833569 5.648113 6.167380 1.046695e-03 0.039921853
## O09131 3.1609273 0.1479779 4.648113 21.360810 8.072302e-06 0.006988596
## O55101 1.1232913 0.1600783 5.648113 7.017136 5.434165e-04 0.033969286
## O70172 -0.6865617 0.1151301 5.648113 -5.963357 1.238085e-03 0.041420953
## O70293 -0.8775674 0.1440128 5.648113 -6.093675 1.111601e-03 0.039921853
## O70370 1.0284668 0.1711190 5.648113 6.010243 1.190755e-03 0.041420953
## O70400 -1.2338128 0.1960765 5.153478 -6.292506 1.333900e-03 0.043578264
## O70404 0.9618058 0.1316568 5.648113 7.305401 4.414947e-04 0.031733905
## O88508 1.1930233 0.1615099 5.648113 7.386689 4.168921e-04 0.031733905
## O88673 -0.8467915 0.1302400 5.648113 -6.501780 8.021487e-04 0.038052617
## P00329 -1.4424899 0.2214192 5.427854 -6.514746 9.284940e-04 0.039388787
## P07091 4.8997200 0.3411874 4.648113 14.360786 4.989865e-05 0.012048266
## P07356 2.2238855 0.1834747 5.648113 12.120936 2.951939e-05 0.012048266
## P07742 1.1501898 0.1183879 5.648113 9.715436 9.794316e-05 0.018843175
## P08207 2.6109023 0.1663444 5.611233 15.695763 7.517503e-06 0.006988596
## P09055 1.5345051 0.1480667 4.648113 10.363608 2.184426e-04 0.024402147
## P10630 -0.9238671 0.1646600 5.648113 -5.610757 1.672613e-03 0.048446312
## P13020 -1.6297739 0.1351352 5.217919 -12.060323 5.218712e-05 0.012048266
## P14094 -0.9957740 0.1601599 5.648113 -6.217374 1.005158e-03 0.039921853
## P15307 1.0980879 0.1481947 5.223787 7.409764 5.820622e-04 0.035279910
## P16045 1.5399243 0.1626141 5.648113 9.469806 1.123633e-04 0.019682654
## P16546 -0.7844787 0.1153365 5.239508 -6.801651 8.673449e-04 0.038515182
## P18654 -0.7746212 0.1328495 5.648113 -5.830816 1.384074e-03 0.044351807
## P19182 -0.9684486 0.1426819 5.577278 -6.787465 6.790086e-04 0.036199170
## P20444 1.1727128 0.1701988 5.453875 6.890253 6.909160e-04 0.036199170
## P21550 2.6238968 0.1348836 5.648113 19.453052 2.158410e-06 0.003737287
## P24452 2.8352880 0.1134888 5.318070 24.982985 1.017039e-06 0.003522007
## P25799 0.6909958 0.1118249 5.418866 6.179268 1.208330e-03 0.041420953
## P28867 0.6730851 0.1203984 5.610016 5.590484 1.740865e-03 0.048617866
## P29391 0.8229362 0.1275198 5.648113 6.453400 8.330661e-04 0.038465439
## P29416 -0.8119735 0.1465558 5.648113 -5.540372 1.779113e-03 0.049288547
## P29452 1.8357577 0.2082249 4.648113 8.816225 4.490215e-04 0.031733905
## P30285 1.0754673 0.1230776 5.648113 8.738121 1.725396e-04 0.023052816
## P37913 1.5659728 0.2067609 5.468696 7.573834 4.250041e-04 0.031733905
## P42230 0.9244640 0.1210240 5.410136 7.638682 4.278378e-04 0.031733905
## P45377 1.2161173 0.1646100 5.648113 7.387871 4.165465e-04 0.031733905
## P47856 0.6697006 0.1046098 5.648113 6.401892 8.675092e-04 0.038515182
## P48758 -0.9319064 0.1269985 5.648113 -7.337935 4.314509e-04 0.031733905
## P49717 0.7157706 0.1277728 5.648113 5.601901 1.685602e-03 0.048446312
## P49718 0.9297146 0.1273789 5.648113 7.298811 4.435622e-04 0.031733905
## P50096 -0.6925377 0.1136634 5.648113 -6.092882 1.112324e-03 0.039921853
## P50431 0.7223893 0.1178126 5.648113 6.131681 1.077569e-03 0.039921853
## P51855 -0.7916787 0.1404790 5.648113 -5.635568 1.636834e-03 0.048036925
## P54071 -1.3464611 0.1471637 5.272675 -9.149410 1.970069e-04 0.023525342
## P54227 1.2560082 0.2157827 5.648113 5.820708 1.396000e-03 0.044351807
## P56395 0.7826218 0.1339976 5.463673 5.840566 1.539899e-03 0.047613124
## P57016 2.6797649 0.2651227 4.648113 10.107642 2.443407e-04 0.025640969
## P61028 0.9333022 0.1518969 5.648113 6.144315 1.066524e-03 0.039921853
## P70236 -0.7927699 0.1129426 5.536746 -7.019230 5.908850e-04 0.035279910
## P70302 -0.9558572 0.1414547 5.648113 -6.757337 6.593006e-04 0.036199170
## P70677 1.5194259 0.1663758 5.648113 9.132491 1.364088e-04 0.019682654
## P83093 1.0705572 0.1638430 5.648113 6.534043 7.822681e-04 0.037624922
## P97310 0.8392918 0.1378016 5.208466 6.090581 1.493599e-03 0.047021221
## P97311 0.7419011 0.1329628 5.648113 5.579763 1.718580e-03 0.048617866
## Q00417 -1.5746650 0.2066734 5.562173 -7.619100 3.812116e-04 0.031733905
## Q03267 -1.0971108 0.1375987 5.648113 -7.973262 2.798261e-04 0.026382473
## Q04447 -1.6512368 0.2477056 5.648113 -6.666126 7.066099e-04 0.036199170
## Q05186 1.6345057 0.2084076 4.648113 7.842833 7.511821e-04 0.036638644
## Q3TBT3 0.7942856 0.1274124 5.648113 6.233974 9.917900e-04 0.039921853
## Q3UDE2 -0.8442968 0.1415854 5.648113 -5.963161 1.238287e-03 0.041420953
## Q3UN02 2.1036857 0.3013064 4.648113 6.981883 1.243944e-03 0.041420953
## Q3UW53 3.0415642 0.3507704 5.648113 8.671096 1.797361e-04 0.023052816
## Q4QQM4 1.2594917 0.1445231 5.648113 8.714810 1.750037e-04 0.023052816
## Q5FWK3 -0.7075541 0.1100855 5.648113 -6.427313 8.503088e-04 0.038515182
## Q60611 -1.7864650 0.1933418 5.625782 -9.239933 1.310373e-04 0.019682654
## Q60710 0.9897791 0.1243105 5.648113 7.962154 2.818803e-04 0.026382473
## Q61107 1.9965973 0.3249294 5.648113 6.144711 1.066179e-03 0.039921853
## Q61205 1.2695902 0.1147037 5.648113 11.068430 4.842651e-05 0.012048266
## Q61503 2.5977413 0.2447742 5.446390 10.612809 7.641391e-05 0.015565962
## Q62261 -0.8240089 0.1251300 5.306091 -6.585221 9.630850e-04 0.039704326
## Q62422 -0.8098628 0.1304322 5.648113 -6.209069 1.011924e-03 0.039921853
## Q64521 -0.9875816 0.1215263 5.648113 -8.126483 2.531975e-04 0.025788907
## Q6P9R4 -0.6850826 0.1214353 5.648113 -5.641543 1.628349e-03 0.048036925
## Q6PD03 -1.0273221 0.1414045 5.459413 -7.265129 5.280957e-04 0.033969286
## Q6Q899 -0.7759038 0.1317118 5.648113 -5.890921 1.315540e-03 0.043387770
## Q7TNG5 0.6824210 0.1192425 5.648113 5.722969 1.517631e-03 0.047347364
## Q7TPR4 -1.5847017 0.1160944 5.648113 -13.650108 1.538344e-05 0.010654568
## Q80SU7 1.4799451 0.1212201 5.318848 12.208742 4.306973e-05 0.012048266
## Q80U28 -0.9077972 0.1378244 5.648113 -6.586620 7.510798e-04 0.036638644
## Q80XN0 -0.7491724 0.1247645 5.648113 -6.004692 1.196248e-03 0.041420953
## Q8BFP9 -1.5359545 0.1326203 5.648113 -11.581594 3.784420e-05 0.012048266
## Q8BGC4 -0.9072711 0.1106540 5.648113 -8.199168 2.416041e-04 0.025640969
## Q8BGW0 -2.0239983 0.1552310 5.648113 -13.038620 1.979095e-05 0.011422679
## Q8BH59 -0.7234814 0.1197305 5.188336 -6.042581 1.570921e-03 0.048036925
## Q8BIG7 0.9260449 0.1433524 5.648113 6.459918 8.288215e-04 0.038465439
## Q8BMD8 0.9943482 0.1493381 5.648113 6.658368 7.108125e-04 0.036199170
## Q8BP56 0.7426015 0.1097015 5.648113 6.769293 6.533756e-04 0.036199170
## Q8BUV3 1.3086410 0.1818295 4.648113 7.197079 1.091299e-03 0.039921853
## Q8C0L6 -1.0750717 0.1337049 4.648113 -8.040629 6.736064e-04 0.036199170
## Q8C142 -1.2240028 0.1878836 5.421661 -6.514686 9.326828e-04 0.039388787
## Q8CFB4 1.5198602 0.2169980 5.646550 7.004029 5.493156e-04 0.033969286
## Q8CFK6 0.9319287 0.1507725 5.648113 6.181024 1.035165e-03 0.039921853
## Q8CG47 1.2204853 0.1526773 5.648113 7.993889 2.760578e-04 0.026382473
## Q8K157 1.3394703 0.1991931 5.648113 6.724480 6.759070e-04 0.036199170
## Q8K296 0.9323228 0.1462689 5.648113 6.374033 8.868235e-04 0.038874302
## Q8R001 0.7556182 0.1332287 5.539673 5.671589 1.692753e-03 0.048446312
## Q8R4N0 -1.5092119 0.2035904 4.648113 -7.412983 9.601784e-04 0.039704326
## Q8R502 0.7555189 0.1332652 5.648113 5.669290 1.589607e-03 0.048036925
## Q8VCT3 0.8385194 0.1141835 5.507000 7.343615 4.813845e-04 0.033287033
## Q8VCW8 -0.9998523 0.1418432 5.648113 -7.048996 5.309010e-04 0.033969286
## Q91VV4 -1.0444559 0.1686952 5.648113 -6.191377 1.026515e-03 0.039921853
## Q922J3 -0.6983472 0.1099451 5.611127 -6.351780 9.256730e-04 0.039388787
## Q99KE1 -0.8656428 0.1155267 5.648113 -7.493008 3.870780e-04 0.031733905
## Q99ME2 0.7177432 0.1289382 5.648113 5.566569 1.738588e-03 0.048617866
## Q99MN9 -0.7488332 0.1118698 5.648113 -6.693795 6.918551e-04 0.036199170
## Q99N69 1.2995065 0.1358688 5.451037 9.564425 1.306784e-04 0.019682654
## Q9CQ62 1.4512114 0.1964667 5.648113 7.386552 4.169322e-04 0.031733905
## Q9CXJ1 -1.4415674 0.2534193 5.584072 -5.688467 1.624550e-03 0.048036925
## Q9CYL5 1.1753986 0.1107101 5.477610 10.616905 7.359420e-05 0.015565962
## Q9D3P8 1.5456895 0.1195600 5.413891 12.928145 2.822527e-05 0.012048266
## Q9DC16 2.2722961 0.2275354 5.351181 9.986560 1.164142e-04 0.019682654
## Q9JIY5 0.8422794 0.1403619 5.648113 6.000769 1.200148e-03 0.041420953
## Q9JJU8 1.5175022 0.1311667 5.648113 11.569269 3.806442e-05 0.012048266
## Q9JMH9 0.8308308 0.1109785 5.367929 7.486414 4.902220e-04 0.033287033
## Q9QUG9 -0.9498899 0.1560668 5.648113 -6.086431 1.118227e-03 0.039921853
## Q9QXG4 -1.2124013 0.1545496 5.648113 -7.844741 3.047003e-04 0.027767818
## Q9QXY6 -1.2767271 0.1134318 5.648113 -11.255459 4.421121e-05 0.012048266
## Q9QYB5 -0.9787455 0.1137419 5.648113 -8.604971 1.871811e-04 0.023150286
## Q9QYC0 -0.8169803 0.1237723 5.648113 -6.600670 7.429903e-04 0.036638644
## Q9R1Q7 1.7129665 0.1859148 5.648113 9.213718 1.301102e-04 0.019682654
## Q9WTK5 0.9882094 0.1409217 5.648113 7.012470 5.452777e-04 0.033969286
## Q9WU84 0.9859723 0.1713411 5.483890 5.754441 1.633143e-03 0.048036925
## Q9WUU8 0.9412210 0.1116031 5.627862 8.433648 2.121077e-04 0.024402147
## Q9Z0S1 -1.0132896 0.1463462 5.424117 -6.923921 6.899718e-04 0.036199170
## Q9Z2L7 -0.7139750 0.1202404 5.534987 -5.937899 1.358572e-03 0.043969484
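The columns of this table are linked: the t statistic is the log fold change divided by its standard error. Assuming a two-sided t-test with the listed degrees of freedom, the p-value follows from the t distribution. The sketch below recomputes these quantities for the first protein in the table (O08807), using the printed values as input.

```r
# Values copied from the first row of the table (protein O08807)
logFC <- 1.1308315
se <- 0.1833569
df <- 5.648113

# t statistic: estimated log2 fold change relative to its standard error
tStat <- logFC / se
tStat

# Two-sided p-value from a t distribution (assumed test, see lead-in)
pTwoSided <- 2 * pt(-abs(tStat), df = df)
pTwoSided
```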
for (protName in sigNames[1:5])
{
  # subset the peptide and protein level assays for this protein
  pePlot <- pe[protName, , c("peptideNorm","proteinRobust")]
  pePlotDf <- data.frame(longFormat(pePlot))
  pePlotDf$assay <- factor(pePlotDf$assay,
    levels = c("peptideNorm", "proteinRobust"))
  pePlotDf$celltype <- as.factor(colData(pePlot)[pePlotDf$colname, "celltype"])

  # plotting
  p1 <- ggplot(data = pePlotDf,
    aes(x = colname, y = value, group = rowname)) +
    geom_line() + geom_point() + theme_minimal() +
    facet_grid(~assay) + ggtitle(protName)
  print(p1)

  # plotting 2
  p2 <- ggplot(pePlotDf, aes(x = colname, y = value, fill = celltype)) +
    geom_boxplot(outlier.shape = NA) + geom_point(position = position_jitter(width = .1),
    aes(shape = rowname)) +
    scale_shape_manual(values = 1:nrow(pePlotDf)) +
    labs(title = protName, x = "sample", y = "peptide intensity (log2)") + theme_minimal() +
    facet_grid(~assay)
  print(p2)
}