R version: R version 4.0.3 (2020-10-10)
Bioconductor version: 3.12
Package version: 1.14.0
The following packages will be used throughout this document. R version
3.3.1 or higher is required to install all the packages using
BiocManager::install.
library("mzR")
library("mzID")
library("MSnID")
library("MSnbase")
library("rpx")
library("MLInterfaces")
library("pRoloc")
library("pRolocdata")
library("MSGFplus")
library("rols")
library("hpar")
The most convenient way to install all the tutorial's requirements (and more related content) is to install RforProteomics with all its dependencies.
library("BiocManager")
BiocManager::install("RforProteomics", dependencies = TRUE)
Other packages of interest, such as rTANDEM or MSGFgui, will be described later in the document but are not required to execute the code in this workflow.
This workflow illustrates R / Bioconductor infrastructure for proteomics. Topics covered focus on support for open community-driven formats for raw data and identification results, packages for peptide-spectrum matching, data processing and analysis.
Links to other packages and references are also documented. In particular, the vignettes included in the RforProteomics package also contain relevant material.
In Bioconductor version 3.12, there are respectively 144 proteomics,
103 mass spectrometry software packages and 23 mass spectrometry
experiment packages. These respective packages can be listed with the
proteomicsPackages(), massSpectrometryPackages() and
massSpectrometryDataPackages() functions and explored interactively.
library("RforProteomics")
pp <- proteomicsPackages()
display(pp)
Most community-driven formats are supported in R, as detailed in the
table below.
MS-based proteomics data is disseminated through the ProteomeXchange infrastructure, which centrally coordinates submission, storage and dissemination through multiple data repositories, such as the PRIDE database at the EBI for MS/MS experiments, PASSEL at the ISB for SRM data and the MassIVE resource. The rpx package is an interface to ProteomeXchange and provides basic access to PX data.
library("rpx")
pxannounced()
## 15 new ProteomeXchange annoucements
##     Data.Set    Publication.Data Message
## 1  PXD014935 2020-12-31 07:14:47     New
## 2  PXD023330 2020-12-31 07:07:20     New
## 3  PXD017967 2020-12-31 06:24:23     New
## 4  PXD021637 2020-12-31 06:22:07     New
## 5  PXD022568 2020-12-31 06:17:42     New
## 6  PXD019658 2020-12-31 06:15:18     New
## 7  PXD021549 2020-12-30 15:00:10     New
## 8  PXD022147 2020-12-30 15:00:06     New
## 9  PXD020168 2020-12-30 03:35:49     New
## 10 PXD021129 2020-12-28 16:35:49     New
## 11 PXD023323 2020-12-28 05:05:08     New
## 12 PXD023322 2020-12-28 04:59:25     New
## 13 PXD023320 2020-12-26 08:14:38     New
## 14 PXD023319 2020-12-26 08:09:54     New
## 15 PXD023318 2020-12-26 08:03:56     New
Using the unique PXD000001 identifier, we can retrieve the relevant
metadata that will be stored in a PXDataset object. The names of the
files available in this data set can be retrieved with the pxfiles
accessor function.
px <- PXDataset("PXD000001")
px
## Object of class "PXDataset"
##  Id: PXD000001 with 12 files
##  [1] 'F063721.dat' ... [12] 'generated'
##  Use 'pxfiles(.)' to see all files.
pxfiles(px)
##  [1] "F063721.dat"
##  [2] "F063721.dat-mztab.txt"                                               
##  [3] "PRIDE_Exp_Complete_Ac_22134.xml.gz"                                  
##  [4] "PRIDE_Exp_mzData_Ac_22134.xml.gz"                                    
##  [5] "PXD000001_mztab.txt"                                                 
##  [6] "README.txt"                                                          
##  [7] "TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzML" 
##  [8] "TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzXML"
##  [9] "TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01.mzXML"         
## [10] "TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01.raw"           
## [11] "erwinia_carotovora.fasta"                                            
## [12] "generated"
Other metadata for the px data set:
pxtax(px)
## [1] "Erwinia carotovora"
pxurl(px)
## [1] "ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2012/03/PXD000001"
pxref(px)
## [1] "Gatto L, Christoforou A. Using R and Bioconductor for proteomics data analysis. Biochim Biophys Acta. 2013 May 18. doi:pii: S1570-9639(13)00186-6. 10.1016/j.bbapap.2013.04.032"
Data files can then be downloaded with the pxget function. Below, we
retrieve the raw data file. The file is downloaded in the working
directory and the name of the file is returned by the function and
stored in the mzf variable for later use.
fn <- "TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzML"
mzf <- pxget(px, fn)
## Downloading 1 file
mzf
## [1] "/tmp/RtmpzMRRJN/Rbuild2eab651b224c/proteomics/vignettes/TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzML"
The mzR package provides an interface to the
proteowizard C/C++ code base
to access various raw data files, such as mzML, mzXML, netCDF,
and mzData. The data is accessed on-disk, i.e. it is not loaded
entirely in memory by default but only when explicitly requested. The
three main functions are openMSfile to create a file handle to a raw
data file, header to extract metadata about the spectra contained in
the file and peaks to extract one or multiple spectra of
interest. Other functions such as instrumentInfo, or runInfo can
be used to gather general information about a run.
Below, we access the raw data file downloaded in the previous section and open a file handle that will allow us to extract data and metadata of interest.
library("mzR")
ms <- openMSfile(mzf)
ms
## Mass Spectrometry file handle.
## Filename:  TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzML 
## Number of scans:  7534
The header function returns the metadata of all available spectra:
hd <- header(ms)
dim(hd)
## [1] 7534   31
names(hd)
##  [1] "seqNum"                     "acquisitionNum"
##  [3] "msLevel"                    "polarity"                  
##  [5] "peaksCount"                 "totIonCurrent"             
##  [7] "retentionTime"              "basePeakMZ"                
##  [9] "basePeakIntensity"          "collisionEnergy"           
## [11] "ionisationEnergy"           "lowMZ"                     
## [13] "highMZ"                     "precursorScanNum"          
## [15] "precursorMZ"                "precursorCharge"           
## [17] "precursorIntensity"         "mergedScan"                
## [19] "mergedResultScanNum"        "mergedResultStartScanNum"  
## [21] "mergedResultEndScanNum"     "injectionTime"             
## [23] "filterString"               "spectrumId"                
## [25] "centroided"                 "ionMobilityDriftTime"      
## [27] "isolationWindowTargetMZ"    "isolationWindowLowerOffset"
## [29] "isolationWindowUpperOffset" "scanWindowLowerLimit"      
## [31] "scanWindowUpperLimit"
We can extract metadata and scan data for scan 1000 as follows:
hd[1000, ]
##      seqNum acquisitionNum msLevel polarity peaksCount totIonCurrent
## 1000   1000           1000       2        1        274       1048554
##      retentionTime basePeakMZ basePeakIntensity collisionEnergy
## 1000      1106.916    136.061            164464              45
##      ionisationEnergy    lowMZ   highMZ precursorScanNum precursorMZ
## 1000                0 104.5467 1370.758              992    683.0817
##      precursorCharge precursorIntensity mergedScan mergedResultScanNum
## 1000               2           689443.7         NA                  NA
##      mergedResultStartScanNum mergedResultEndScanNum injectionTime
## 1000                       NA                     NA      55.21463
##                                                  filterString
## 1000 FTMS + p NSI d Full ms2 683.08@hcd45.00 [100.00-1380.00]
##                                         spectrumId centroided
## 1000 controllerType=0 controllerNumber=1 scan=1000       TRUE
##      ionMobilityDriftTime isolationWindowTargetMZ isolationWindowLowerOffset
## 1000                   NA                  683.08                          1
##      isolationWindowUpperOffset scanWindowLowerLimit scanWindowUpperLimit
## 1000                          1                  100                 1380
head(peaks(ms, 1000))
##          [,1]     [,2]
## [1,] 104.5467 308.9326
## [2,] 104.5684 308.6961
## [3,] 108.8340 346.7183
## [4,] 109.3928 365.1236
## [5,] 110.0345 616.7905
## [6,] 110.0703 429.1975
plot(peaks(ms, 1000), type = "h")
Below we reproduce the example from the MSmap function from the
MSnbase package to plot a specific slice of the raw data using the
mzR functions we have just described.
## a set of spectra of interest: MS1 spectra eluted
## between 30 and 35 minutes retention time
ms1 <- which(hd$msLevel == 1)
rtsel <- hd$retentionTime[ms1] / 60 > 30 &
    hd$retentionTime[ms1] / 60 < 35
## the map
M <- MSmap(ms, ms1[rtsel], 521, 523, .005, hd)
plot(M, aspect = 1, allTicks = FALSE)
plot3D(M)
## With some MS2 spectra
i <- ms1[which(rtsel)][1]
j <- ms1[which(rtsel)][2]
M2 <- MSmap(ms, i:j, 100, 1000, 1, hd)
plot3D(M2)
The RforProteomics package distributes a small identification result
file (see
?TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01.mzid) that we
load and parse using infrastructure from the mzID
package.
library("mzID")
f <- dir(system.file("extdata", package = "RforProteomics"),
         pattern = "mzid", full.names=TRUE)
basename(f)
## [1] "TMT_Erwinia.mzid.gz"
id <- mzID(f)
## reading TMT_Erwinia.mzid.gz... DONE!
id
## An mzID object
## 
## Software used:   MS-GF+ (version: Beta (v10072))
## 
## Rawfile:         /home/lgatto/dev/00_github/RforProteomics/sandbox/TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01.mzXML
## 
## Database:        /home/lgatto/dev/00_github/RforProteomics/sandbox/erwinia_carotovora.fasta
## 
## Number of scans: 5287
## Number of PSM's: 5563
Various data can be extracted from the mzID object, using one of the
accessor functions such as database, scans, peptides, … The
object can also be converted into a data.frame using the flatten
function.
The mzR package also provides support for fast parsing of mzIdentML files
with the openIDfile function. As for raw data, the underlying C/C++
code comes from proteowizard.
library("mzR")
f <- dir(system.file("extdata", package = "RforProteomics"),
         pattern = "mzid", full.names=TRUE)
id1 <- openIDfile(f)
fid1 <- mzR::psms(id1)
head(fid1)
##   spectrumID chargeState rank passThreshold experimentalMassToCharge
## 1  scan=5782           3    1          TRUE                1080.2325
## 2  scan=6037           3    1          TRUE                1002.2089
## 3  scan=5235           3    1          TRUE                1189.2836
## 4  scan=5397           3    1          TRUE                 960.5365
## 5  scan=6075           3    1          TRUE                1264.3409
## 6  scan=5761           2    1          TRUE                1268.6429
##   calculatedMassToCharge                            sequence peptideRef modNum
## 1              1080.2321 PVQIQAGEDSNVIGALGGAVLGGFLGNTIGGGSGR       Pep1      0
## 2              1002.2115        TQVLDGLINANDIEVPVALIDGEIDVLR       Pep2      0
## 3              1189.2800   TKGLNVMQNLLTAHPDVQAVFAQNDEMALGALR       Pep3      0
## 4               960.5365         SQILQQAGTSVLSQANQVPQTVLSLLR       Pep4      0
## 5              1264.3419 PIIGDNPFVVVLPDVVLDESTADQTQENLALLISR       Pep5      0
## 6              1268.6501             WTSQSSLDLGEPLSLITESVFAR       Pep6      0
##   isDecoy post pre start end DatabaseAccess DBseqLength DatabaseSeq
## 1   FALSE    S   R    50  84        ECA1932         155            
## 2   FALSE    R   K   288 315        ECA1147         434            
## 3   FALSE    A   R   192 224        ECA0013         295            
## 4   FALSE    -   R   264 290        ECA1731         290            
## 5   FALSE    F   R   119 153        ECA1443         298            
## 6   FALSE    Y   K   264 286        ECA1444         468            
##                                         DatabaseDescription scan.number.s.
## 1                        ECA1932 outer membrane lipoprotein           5782
## 2                                    ECA1147 trigger factor           6037
## 3                ECA0013 ribose-binding periplasmic protein           5235
## 4                                         ECA1731 flagellin           5397
## 5      ECA1443 UTP--glucose-1-phosphate uridylyltransferase           6075
## 6 ECA1444 6-phosphogluconate dehydrogenase, decarboxylating           5761
##   acquisitionNum
## 1           5782
## 2           6037
## 3           5235
## 4           5397
## 5           6075
## 6           5761
While searches are generally performed using third-party software
independently of R or can be started from R using a system call, the
rTANDEM package allows one to execute such searches
using the X!Tandem engine. The shinyTANDEM package provides an
experimental interactive interface to explore the search results.
library("rTANDEM")
?rtandem
library("shinyTANDEM")
?shinyTANDEM
Similarly, the MSGFplus package makes it possible to perform a search using the MS-GF+ engine, as illustrated below.
We search the /tmp/RtmpzMRRJN/Rbuild2eab651b224c/proteomics/vignettes/TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzML file against the fasta file from PXD000001
using MSGFplus.
We first download the fasta file:
fas <- pxget(px, "erwinia_carotovora.fasta")
## Downloading 1 file
basename(fas)
## [1] "erwinia_carotovora.fasta"
library("MSGFplus")
msgfpar <- msgfPar(database = fas,
                   instrument = 'HighRes',
                   tda = TRUE,
                   enzyme = 'Trypsin',
                   protocol = 'iTRAQ')
idres <- runMSGF(msgfpar, mzf, memory=1000)
## '/usr/bin/java' -Xmx1000M -jar '/home/biocbuild/bbs-3.12-bioc/R/library/MSGFplus/MSGFPlus/MSGFPlus.jar' -s '/tmp/RtmpzMRRJN/Rbuild2eab651b224c/proteomics/vignettes/TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzML' -o '/tmp/RtmpzMRRJN/Rbuild2eab651b224c/proteomics/vignettes/TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzid' -d '/tmp/RtmpzMRRJN/Rbuild2eab651b224c/proteomics/vignettes/erwinia_carotovora.fasta' -tda 1 -inst 1 -e 1 -protocol 2
## 
## reading TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzid... DONE!
idres
## An mzID object
## 
## Software used:   MS-GF+ (version: Beta (v10072))
## 
## Rawfile:         /tmp/RtmpzMRRJN/Rbuild2eab651b224c/proteomics/vignettes/TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzML
## 
## Database:        /tmp/RtmpzMRRJN/Rbuild2eab651b224c/proteomics/vignettes/erwinia_carotovora.fasta
## 
## Number of scans: 5343
## Number of PSM's: 5656
## identification file (needed below)
basename(mzID::files(idres)$id)
## [1] "TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzid"
(Note that in the runMSGF call above, I explicitly reduce the memory
allocated to the Java virtual machine to 1000 MB, as shown by the
-Xmx1000M flag in the command. In general, there is no need to specify
this argument, unless you experience an error regarding the maximum
heap size.)
A graphical interface to perform the search and explore the results is also available:
library("MSGFgui")
MSGFgui()
The MSnID package can be used for post-search filtering
of MS/MS identifications. One starts with the construction of an
MSnID object that is populated with identification results that can
be imported from a data.frame or from mzIdentML files. Here, we
will use the example identification data provided with the package.
mzids <- system.file("extdata", "c_elegans.mzid.gz", package="MSnID")
basename(mzids)
## [1] "c_elegans.mzid.gz"
We start by loading the package, initialising the MSnID object, and
adding the identification result from our mzid file (there could of
course be more than one).
library("MSnID")
msnid <- MSnID(".")
## Note, the anticipated/suggested columns in the
## peptide-to-spectrum matching results are:
## -----------------------------------------------
## accession
## calculatedMassToCharge
## chargeState
## experimentalMassToCharge
## isDecoy
## peptide
## spectrumFile
## spectrumID
msnid <- read_mzIDs(msnid, mzids)
## Reading from mzIdentMLs ...
## reading c_elegans.mzid.gz... DONE!
show(msnid)
## MSnID object
## Working directory: "."
## #Spectrum Files:  1 
## #PSMs: 12263 at 36 % FDR
## #peptides: 9489 at 44 % FDR
## #accessions: 7414 at 76 % FDR
Printing the MSnID object returns some basic information such as the working directory, the number of spectrum files, and the numbers of PSMs, peptides and accessions with their respective FDRs.
The package then enables one to define, optimise and apply filters based on, for example, missed cleavages, identification scores and precursor mass errors, and to assess PSM, peptide and protein FDR levels. To function properly, it expects to have access to the following data:
## [1] "accession"                "calculatedMassToCharge"  
## [3] "chargeState"              "experimentalMassToCharge"
## [5] "isDecoy"                  "peptide"                 
## [7] "spectrumFile"             "spectrumID"
which are indeed present in our data:
names(msnid)
##  [1] "spectrumID"                "scan number(s)"
##  [3] "acquisitionNum"            "passThreshold"            
##  [5] "rank"                      "calculatedMassToCharge"   
##  [7] "experimentalMassToCharge"  "chargeState"              
##  [9] "MS-GF:DeNovoScore"         "MS-GF:EValue"             
## [11] "MS-GF:PepQValue"           "MS-GF:QValue"             
## [13] "MS-GF:RawScore"            "MS-GF:SpecEValue"         
## [15] "AssumedDissociationMethod" "IsotopeError"             
## [17] "isDecoy"                   "post"                     
## [19] "pre"                       "end"                      
## [21] "start"                     "accession"                
## [23] "length"                    "description"              
## [25] "pepSeq"                    "modified"                 
## [27] "modification"              "idFile"                   
## [29] "spectrumFile"              "databaseFile"             
## [31] "peptide"
Here, we summarise a few steps and refer the reader to the package's vignette for more details.
Cleaning irregular cleavages at the termini of the peptides and
missed cleavage sites within the peptide sequences. The following two
function calls create the new numIrregCleavages and numMissCleavages
columns in the MSnID object:
msnid <- assess_termini(msnid, validCleavagePattern="[KR]\\.[^P]")
msnid <- assess_missed_cleavages(msnid, missedCleavagePattern="[KR](?=[^P$])")
Now, we can use the apply_filter function to effectively apply
filters. The strings passed to the function represent expressions that
will be evaluated, thus keeping only PSMs that have 0 irregular
cleavages and 2 or fewer missed cleavages.
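To make the missed-cleavage count more concrete, here is a minimal base R sketch of what the missedCleavagePattern regular expression used above matches. It is an illustration under stated assumptions, not MSnID's internal code: the count_missed_cleavages helper is hypothetical, and the three peptide sequences are simply reused from the MSnbase example data shown in this document, without flanking residues.

```r
## Hypothetical helper (not part of MSnID): count K/R residues that are
## followed by another residue that is not P, i.e. internal tryptic sites
## that were not cleaved.
count_missed_cleavages <- function(peptide,
                                   pattern = "[KR](?=[^P$])") {
  ## perl = TRUE is required for the look-ahead (?=...)
  m <- gregexpr(pattern, peptide, perl = TRUE)
  ## gregexpr returns -1 when there is no match, hence sum(x > 0)
  vapply(m, function(x) sum(x > 0), integer(1))
}

peptides <- c("VESITARHGEVLQLRPK", "IDGQWVTHQWLKK", "LVILLFR")
missed <- count_missed_cleavages(peptides)
data.frame(peptide = peptides, numMissCleavages = missed)
```

A C-terminal K or R is not counted, because the look-ahead requires a following residue, and an R followed by P is skipped as well.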
msnid <- apply_filter(msnid, "numIrregCleavages == 0")
msnid <- apply_filter(msnid, "numMissCleavages <= 2")
show(msnid)
## MSnID object
## Working directory: "."
## #Spectrum Files:  1 
## #PSMs: 7838 at 17 % FDR
## #peptides: 5598 at 23 % FDR
## #accessions: 3759 at 53 % FDR
Using "calculatedMassToCharge" and "experimentalMassToCharge", the
mass_measurement_error function calculates the parent ion mass
measurement error in parts per million.
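As a rough illustration of what a parts-per-million error is, the base R sketch below computes it from an experimental and a calculated m/z. The ppm_error helper is hypothetical and only shows the basic formula; the mass_measurement_error function in MSnID may apply further corrections. The m/z values are taken from the mzR psms output shown earlier.

```r
## Hypothetical helper: relative deviation of the experimental m/z from
## the calculated (theoretical) m/z, expressed in parts per million.
ppm_error <- function(experimental_mz, calculated_mz) {
  (experimental_mz - calculated_mz) / calculated_mz * 1e6
}

experimental <- c(1080.2325, 1002.2089, 960.5365)
calculated   <- c(1080.2321, 1002.2115, 960.5365)
ppm <- ppm_error(experimental, calculated)
round(ppm, 2)
```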
summary(mass_measurement_error(msnid))
##       Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
## -2184.0640    -0.6992     0.0000    17.6146     0.7512  2012.5178
We then filter out any matches that do not fit the +/- 20 ppm tolerance:
msnid <- apply_filter(msnid, "abs(mass_measurement_error(msnid)) < 20")
summary(mass_measurement_error(msnid))
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -19.7797  -0.5866   0.0000  -0.2970   0.5713  19.6758
Filtering of the identification data will rely on two derived quantities: the -log10 transformed MS-GF+ spectrum E-value and the absolute mass measurement error (in ppm) of the parent ion.
msnid$msmsScore <- -log10(msnid$`MS-GF:SpecEValue`)
msnid$absParentMassErrorPPM <- abs(mass_measurement_error(msnid))
MS2 filters are handled by a special MSnIDFilter class object, where
individual filters are set by name (that is present in names(msnid)),
a comparison operator (>, <, =, …) defining whether we should retain
hits with values higher or lower than the threshold, and finally the
threshold value itself.
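Conceptually, each filter entry is a comparison applied to one column, and all entries are combined with a logical AND. The base R sketch below mimics that idea on a mock data.frame; it is an assumption-laden illustration of the concept, not how MSnIDFilter is implemented, and the psms values are made up.

```r
## Mock PSM table (made-up values) with the two columns used for filtering
psms <- data.frame(absParentMassErrorPPM = c(2, 15, 4),
                   msmsScore             = c(12, 20, 8))

## A plain list standing in for a filter object: one entry per column,
## each with a comparison operator and a threshold
filt <- list(
  absParentMassErrorPPM = list(comparison = "<", threshold = 10),
  msmsScore             = list(comparison = ">", threshold = 10)
)

## Evaluate every comparison and combine the results with a logical AND
keep <- Reduce(`&`, lapply(names(filt), function(nm) {
  f <- filt[[nm]]
  match.fun(f$comparison)(psms[[nm]], f$threshold)
}))
psms[keep, ]
```

Only the first mock PSM passes both criteria: the second fails the mass error threshold and the third fails the score threshold.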
filtObj <- MSnIDFilter(msnid)
filtObj$absParentMassErrorPPM <- list(comparison="<", threshold=10.0)
filtObj$msmsScore <- list(comparison=">", threshold=10.0)
show(filtObj)
## MSnIDFilter object
## (absParentMassErrorPPM < 10) & (msmsScore > 10)
We can then evaluate the filter on the identification data object, which returns the false discovery rate and number of retained identifications for the filtering criteria at hand.
evaluate_filter(msnid, filtObj)
##           fdr    n
## PSM         0 3807
## peptide     0 2455
## accession   0 1009
Rather than setting filtering values by hand, as shown above, these can be set automatically to meet a specific false discovery rate.
filtObj.grid <- optimize_filter(filtObj, msnid, fdr.max=0.01,
                                method="Grid", level="peptide",
                                n.iter=500)
show(filtObj.grid)
## MSnIDFilter object
## (absParentMassErrorPPM < 3) & (msmsScore > 7.4)
evaluate_filter(msnid, filtObj.grid)
##                   fdr    n
## PSM       0.004097561 5146
## peptide   0.006447651 3278
## accession 0.021996616 1208
Filters can eventually be applied (rather than just evaluated) using
the apply_filter function.
msnid <- apply_filter(msnid, filtObj.grid)
show(msnid)
## MSnID object
## Working directory: "."
## #Spectrum Files:  1 
## #PSMs: 5146 at 0.41 % FDR
## #peptides: 3278 at 0.64 % FDR
## #accessions: 1208 at 2.2 % FDR
Finally, identifications that matched decoy and contaminant protein sequences are removed:
msnid <- apply_filter(msnid, "isDecoy == FALSE")
msnid <- apply_filter(msnid, "!grepl('Contaminant',accession)")
show(msnid)
## MSnID object
## Working directory: "."
## #Spectrum Files:  1 
## #PSMs: 5117 at 0 % FDR
## #peptides: 3251 at 0 % FDR
## #accessions: 1179 at 0 % FDR
The resulting filtered identification data can be exported to a
data.frame or to a dedicated MSnSet data structure for
quantitative MS data, described below, and further processed and
analysed using appropriate statistical tests.
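The FDR values reported throughout this section rely on target-decoy counting. The sketch below is a minimal base R illustration that assumes the FDR is estimated as the number of decoy hits divided by the number of target hits; the fdr_target_decoy helper is hypothetical. The split of 21 decoys and 5125 targets is not printed anywhere above: it is inferred from the 5146 PSMs and the PSM-level FDR of 0.004097561 reported by evaluate_filter, with which it is consistent.

```r
## Hypothetical helper: target-decoy FDR estimate, assuming
## FDR = (number of decoy hits) / (number of target hits)
fdr_target_decoy <- function(is_decoy) {
  n_decoy  <- sum(is_decoy)
  n_target <- sum(!is_decoy)
  n_decoy / n_target
}

## Inferred (not printed in the output above): 5125 targets + 21 decoys
is_decoy <- c(rep(FALSE, 5125), rep(TRUE, 21))
length(is_decoy)            # total number of PSMs
fdr_target_decoy(is_decoy)  # PSM-level FDR estimate
```

Once decoy hits are removed, as in the last filtering step, this estimate is 0 by construction, which matches the final show(msnid) output.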
The above sections introduced low-level interfaces to raw data and
identification results. The MSnbase package provides
abstractions for raw data through the MSnExp class and containers
for quantification data via the MSnSet class. Both store (1) the
actual assay data (spectra or a quantitation matrix), accessed with
spectra (or the [, [[ operators) or exprs; (2) sample metadata,
accessed as a data.frame with pData; and (3) feature metadata,
accessed as a data.frame with fData.
Another useful slot is processingData, accessed with
processingData(.), that records all the processing that objects have
undergone since their creation (see examples below).
The readMSData function will parse the raw data, extract the MS2 spectra (by
default) and construct an MS experiment object of class MSnExp.
(Note that while readMSData supports MS1 data, this is currently not
convenient as all the data is read into memory.)
library("MSnbase")
rawFile <- dir(system.file(package = "MSnbase", dir = "extdata"),
               full.name = TRUE, pattern = "mzXML$")
basename(rawFile)
## [1] "dummyiTRAQ.mzXML"
msexp <- readMSData(rawFile, verbose = FALSE, centroided = FALSE)
msexp
## MSn experiment data ("MSnExp")
## Object size in memory: 0.18 Mb
## - - - Spectra data - - -
##  MS level(s): 2 
##  Number of spectra: 5 
##  MSn retention times: 25:1 - 25:2 minutes
## - - - Processing information - - -
## Data loaded: Fri Jan  1 09:27:33 2021 
##  MSnbase version: 2.16.0 
## - - - Meta data  - - -
## phenoData
##   rowNames: dummyiTRAQ.mzXML
##   varLabels: sampleNames
##   varMetadata: labelDescription
## Loaded from:
##   dummyiTRAQ.mzXML 
## protocolData: none
## featureData
##   featureNames: F1.S1 F1.S2 ... F1.S5 (5 total)
##   fvarLabels: spectrum
##   fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
MS2 spectra can be extracted as a list of Spectrum2 objects with the
spectra accessor or as a subset of the original MSnExp data with
the [ operator. Individual spectra can be accessed with [[.
length(msexp)
## [1] 5
msexp[1:2]
## MSn experiment data ("MSnExp")
## Object size in memory: 0.07 Mb
## - - - Spectra data - - -
##  MS level(s): 2 
##  Number of spectra: 2 
##  MSn retention times: 25:1 - 25:2 minutes
## - - - Processing information - - -
## Data loaded: Fri Jan  1 09:27:33 2021 
## Data [numerically] subsetted 2 spectra: Fri Jan  1 09:27:33 2021 
##  MSnbase version: 2.16.0 
## - - - Meta data  - - -
## phenoData
##   rowNames: dummyiTRAQ.mzXML
##   varLabels: sampleNames
##   varMetadata: labelDescription
## Loaded from:
##   dummyiTRAQ.mzXML 
## protocolData: none
## featureData
##   featureNames: F1.S1 F1.S2
##   fvarLabels: spectrum
##   fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
msexp[[2]]
## Object of class "Spectrum2"
##  Precursor: 546.9586 
##  Retention time: 25:2 
##  Charge: 3 
##  MSn level: 2 
##  Peaks count: 1012 
##  Total ion count: 56758067
The identification results stemming from the same raw data file can then be used to add PSM matches.
fData(msexp)
##       spectrum
## F1.S1        1
## F1.S2        2
## F1.S3        3
## F1.S4        4
## F1.S5        5
## find path to a mzIdentML file
identFile <- dir(system.file(package = "MSnbase", dir = "extdata"),
                 full.name = TRUE, pattern = "dummyiTRAQ.mzid")
basename(identFile)
## [1] "dummyiTRAQ.mzid"
msexp <- addIdentificationData(msexp, identFile)
fData(msexp)
##       spectrum acquisition.number          sequence chargeState rank
## F1.S1        1                  1 VESITARHGEVLQLRPK           3    1
## F1.S2        2                  2     IDGQWVTHQWLKK           3    1
## F1.S3        3                  3              <NA>          NA   NA
## F1.S4        4                  4              <NA>          NA   NA
## F1.S5        5                  5           LVILLFR           2    1
##       passThreshold experimentalMassToCharge calculatedMassToCharge peptideRef
## F1.S1          TRUE                 645.3741               645.0375       Pep2
## F1.S2          TRUE                 546.9586               546.9633       Pep1
## F1.S3            NA                       NA                     NA       <NA>
## F1.S4            NA                       NA                     NA       <NA>
## F1.S5          TRUE                 437.8040               437.2997       Pep4
##       modNum isDecoy post  pre start end DatabaseAccess DBseqLength DatabaseSeq
## F1.S1      0   FALSE    A    R   170 186        ECA0984         231            
## F1.S2      0   FALSE    A    K    50  62        ECA1028         275            
## F1.S3     NA      NA <NA> <NA>    NA  NA           <NA>          NA        <NA>
## F1.S4     NA      NA <NA> <NA>    NA  NA           <NA>          NA        <NA>
## F1.S5      0   FALSE    L    K    22  28        ECA0510         166            
##                                                              DatabaseDescription
## F1.S1                                        ECA0984 DNA mismatch repair protein
## F1.S2 ECA1028 2,3,4,5-tetrahydropyridine-2,6-dicarboxylate N-succinyltransferase
## F1.S3                                                                       <NA>
## F1.S4                                                                       <NA>
## F1.S5           ECA0510 putative capsular polysacharide biosynthesis transferase
##       scan.number.s.          idFile MS.GF.RawScore MS.GF.DeNovoScore
## F1.S1              1 dummyiTRAQ.mzid            -39                77
## F1.S2              2 dummyiTRAQ.mzid            -30                39
## F1.S3             NA            <NA>             NA                NA
## F1.S4             NA            <NA>             NA                NA
## F1.S5              5 dummyiTRAQ.mzid            -42                 5
##       MS.GF.SpecEValue MS.GF.EValue modPeptideRef modName modMass modLocation
## F1.S1     5.527468e-05     79.36958          <NA>    <NA>      NA          NA
## F1.S2     9.399048e-06     13.46615          <NA>    <NA>      NA          NA
## F1.S3               NA           NA          <NA>    <NA>      NA          NA
## F1.S4               NA           NA          <NA>    <NA>      NA          NA
## F1.S5     2.577830e-04    366.38422          <NA>    <NA>      NA          NA
##       subOriginalResidue subReplacementResidue subLocation nprot npep.prot
## F1.S1               <NA>                  <NA>          NA     1         1
## F1.S2               <NA>                  <NA>          NA     1         1
## F1.S3               <NA>                  <NA>          NA    NA        NA
## F1.S4               <NA>                  <NA>          NA    NA        NA
## F1.S5               <NA>                  <NA>          NA     1         1
##       npsm.prot npsm.pep
## F1.S1         1        1
## F1.S2         1        1
## F1.S3        NA       NA
## F1.S4        NA       NA
## F1.S5         1        1
The readMSData and addIdentificationData functions make use of the mzR and
mzID packages to access the raw and identification data.
Spectra and (parts of) experiments can be extracted and plotted.
msexp[[1]]
## Object of class "Spectrum2"
##  Precursor: 645.3741 
##  Retention time: 25:1 
##  Charge: 3 
##  MSn level: 2 
##  Peaks count: 2921 
##  Total ion count: 668170086
plot(msexp[[1]], full=TRUE)
msexp[1:3]
## MSn experiment data ("MSnExp")
## Object size in memory: 0.11 Mb
## - - - Spectra data - - -
##  MS level(s): 2 
##  Number of spectra: 3 
##  MSn retention times: 25:1 - 25:2 minutes
## - - - Processing information - - -
## Data loaded: Fri Jan  1 09:27:33 2021 
## Data [numerically] subsetted 3 spectra: Fri Jan  1 09:27:34 2021 
##  MSnbase version: 2.16.0 
## - - - Meta data  - - -
## phenoData
##   rowNames: dummyiTRAQ.mzXML
##   varLabels: sampleNames
##   varMetadata: labelDescription
## Loaded from:
##   dummyiTRAQ.mzXML 
## protocolData: none
## featureData
##   featureNames: F1.S1 F1.S2 F1.S3
##   fvarLabels: spectrum acquisition.number ... npsm.pep (36 total)
##   fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
plot(msexp[1:3], full=TRUE)
There is a wide range of proteomics quantitation techniques that can broadly be classified as labelled vs. label-free, depending on whether the features are labelled prior to the MS acquisition, and by the MS level at which quantitation is inferred, namely MS1 or MS2.
| | Label-free | Labelled |
|---|---|---|
| MS1 | XIC | SILAC, 15N |
| MS2 | Counting | iTRAQ, TMT |
In terms of raw data quantitation, most efforts have been devoted to MS2-level quantitation. Label-free XIC quantitation has however been addressed in the frame of metabolomics data processing by the xcms infrastructure.
An MSnExp is converted to an MSnSet by the quantitation
method. Below, we use the iTRAQ 4-plex isobaric tagging strategy
(defined by the iTRAQ4 parameter; other tags are available) and the
trapezoidation method to calculate the area under the isobaric
reporter peaks.
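As a rough illustration of what trapezoidation means, the sketch below applies the trapezoidal rule to a mock reporter peak in base R. The trapz helper and the m/z and intensity values are made up for illustration; the implementation in MSnbase may treat peak boundaries differently.

```r
## Hypothetical helper: area under a curve given by (x, y) points,
## computed with the trapezoidal rule
trapz <- function(x, y) {
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

## Mock profile-mode peak around a reporter ion (made-up values)
mz        <- c(114.09, 114.10, 114.11, 114.12, 114.13)
intensity <- c(0, 50, 100, 50, 0)
area <- trapz(mz, intensity)
area
```

Each pair of neighbouring points contributes the area of one trapezoid, so the result depends on both the peak height and its width along the m/z axis.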
plot(msexp[[1]], full=TRUE, reporters = iTRAQ4)
msset <- quantify(msexp, method = "trap", reporters = iTRAQ4, verbose=FALSE)
exprs(msset)
##       iTRAQ4.114 iTRAQ4.115 iTRAQ4.116 iTRAQ4.117
## F1.S1   4483.320   4873.996   6743.441   4601.378
## F1.S2   1918.082   1418.040   1117.601   1581.954
## F1.S3  15210.979  15296.256  15592.760  16550.502
## F1.S4   4133.103   5069.983   4724.845   4694.801
## F1.S5  11947.881  13061.875  12809.491  12911.479
processingData(msset)
## - - - Processing information - - -
## Data loaded: Fri Jan  1 09:27:33 2021 
## iTRAQ4 quantification by trapezoidation: Fri Jan  1 09:27:38 2021 
##  MSnbase version: 2.16.0
Other MS2 quantitation methods available in quantify include the
(normalised) spectral index SI, the (normalised) spectral abundance
factor SAF, or a simple count method.
exprs(si <- quantify(msexp, method = "SIn"))
##         dummyiTRAQ.mzXML
## ECA0510     0.0006553518
## ECA0984     0.0035384487
## ECA1028     0.0002684726
exprs(saf <- quantify(msexp, method = "NSAF"))
##         dummyiTRAQ.mzXML
## ECA0510        0.4306167
## ECA0984        0.3094475
## ECA1028        0.2599359
Note that spectra that have not been assigned any peptide (NA) or
that match non-unique peptides (npsm > 1) are discarded in the
counting process.
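To make the counting logic concrete, here is a minimal base R sketch of a normalised spectral abundance factor, assuming the usual definition (spectral count divided by protein length, then scaled so that all proteins sum to 1). The `psms` table, its column names and the protein lengths are hypothetical; only the filtering rule mirrors the note above.

```r
## Hypothetical peptide-spectrum matches: one row per spectrum.
## 'accession' is NA for unassigned spectra; 'npsm' > 1 marks
## spectra matching non-unique peptides.
psms <- data.frame(
    accession = c("ProtA", "ProtA", "ProtB", NA, "ProtB", "ProtA", "ProtC"),
    npsm      = c(1, 1, 1, NA, 2, 1, 1),
    stringsAsFactors = FALSE
)
## Hypothetical protein lengths (number of residues).
len <- c(ProtA = 300, ProtB = 50, ProtC = 100)

## Discard unassigned and non-unique spectra before counting.
keep <- !is.na(psms$accession) & psms$npsm == 1
spc  <- table(psms$accession[keep])

## Spectral abundance factor: count scaled by protein length.
saf  <- as.numeric(spc) / len[names(spc)]
## Normalised SAF: each SAF divided by the sum of all SAFs.
nsaf <- saf / sum(saf)
nsaf
```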
The PSI mzTab file format is aimed at providing a simpler (than XML
formats) and more accessible file format to the wider community. It is
composed of a key-value metadata section and peptide/protein/small
molecule tabular sections.
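The layout described above can be sketched with a few lines of base R. The fragment below is invented and heavily simplified (a real mzTab file has many more columns and sections); it only assumes the line-type prefixes of the format, where `MTD` marks metadata lines, `PEH` the peptide header and `PEP` the peptide rows.

```r
## A tiny made-up, tab-separated mzTab-like fragment.
lines <- c(
    "MTD\tmzTab-version\t1.0.0",
    "MTD\tdescription\ttoy example",
    "PEH\tsequence\taccession\tcharge",
    "PEP\tGGVFDTIQK\tProtA\t2",
    "PEP\tLVNELTEFAK\tProtB\t3"
)

## Split every line into fields and read the line-type prefix.
fields <- strsplit(lines, "\t", fixed = TRUE)
prefix <- vapply(fields, `[`, character(1), 1)

## Metadata section: key-value pairs.
mtd <- fields[prefix == "MTD"]
metadata <- setNames(vapply(mtd, `[`, character(1), 3),
                     vapply(mtd, `[`, character(1), 2))

## Peptide section: one header line followed by data rows.
header <- fields[prefix == "PEH"][[1]][-1]
rows   <- lapply(fields[prefix == "PEP"], `[`, -1)
pep <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
names(pep) <- header
pep$charge <- as.integer(pep$charge)

metadata
pep
```

In practice the dedicated importer should be preferred, since it also handles the remaining sections and builds the MSnSet object.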
Note that below, we specify version 0.9 (which generates the warning)
to match the file. For recent files, the version argument can be
omitted so that the recent importer is used.
mztf <- pxget(px, "F063721.dat-mztab.txt")
## Downloading 1 file
(mzt <- readMzTabData(mztf, what = "PEP", version = "0.9"))
## Warning: Version 0.9 is deprecated. Please see '?readMzTabData' and '?MzTab' for
## details.
## MSnSet (storageMode: lockedEnvironment)
## assayData: 1528 features, 6 samples 
##   element names: exprs 
## protocolData: none
## phenoData
##   sampleNames: sub[1] sub[2] ... sub[6] (6 total)
##   varLabels: abundance
##   varMetadata: labelDescription
## featureData
##   featureNames: 1 2 ... 1528 (1528 total)
##   fvarLabels: sequence accession ... uri (14 total)
##   fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
## Annotation:  
## - - - Processing information - - -
## mzTab read: Fri Jan  1 09:27:43 2021 
##  MSnbase version: 2.16.0
It is also possible to import arbitrary spreadsheets as MSnSet
objects into R with the readMSnSet2 function. The two main arguments
of the function are (1) a text-based spreadsheet and (2) the column names
or indices that identify the quantitation data. The latter can be
queried with the getEcols function.
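The role of the expression columns can be illustrated in base R: the selected columns become the quantitation matrix and every other column becomes feature metadata. The spreadsheet content below is made up, and this sketch only mimics the split; it does not build an MSnSet.

```r
## Made-up spreadsheet content standing in for a real CSV file.
txt <- "Protein ID,Score,area 114,area 115
P1,10.5,0.4,0.6
P2,20.1,0.7,0.3"
tab <- read.csv(text = txt, stringsAsFactors = FALSE)

## Column indices holding the quantitation data (hypothetical choice).
ecols <- 3:4

## Expression matrix from the selected columns, feature metadata from the rest.
e  <- as.matrix(tab[, ecols])
fd <- tab[, -ecols, drop = FALSE]
rownames(e)  <- tab$Protein.ID
rownames(fd) <- tab$Protein.ID

e
fd
```

Note that `read.csv` sanitises the column names (spaces become dots), which is consistent with names such as `area.114` in the output shown in this section.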
csv <- dir(system.file("extdata", package = "pRolocdata"),
           full.names = TRUE, pattern = "pr800866n_si_004-rep1.csv")
getEcols(csv, split = ",")
##  [1] "\"Protein ID\""              "\"FBgn\""
##  [3] "\"Flybase Symbol\""          "\"No. peptide IDs\""        
##  [5] "\"Mascot score\""            "\"No. peptides quantified\""
##  [7] "\"area 114\""                "\"area 115\""               
##  [9] "\"area 116\""                "\"area 117\""               
## [11] "\"PLS-DA classification\""   "\"Peptide sequence\""       
## [13] "\"Precursor ion mass\""      "\"Precursor ion charge\""   
## [15] "\"pd.2013\""                 "\"pd.markers\""
ecols <- 7:10
res <- readMSnSet2(csv, ecols)
head(exprs(res))
##   area.114 area.115 area.116 area.117
## 1 0.379000 0.281000 0.225000 0.114000
## 2 0.420000 0.209667 0.206111 0.163889
## 3 0.187333 0.167333 0.169667 0.476000
## 4 0.247500 0.253000 0.320000 0.179000
## 5 0.216000 0.183000 0.342000 0.259000
## 6 0.072000 0.212333 0.573000 0.142667
head(fData(res))
##   Protein.ID        FBgn Flybase.Symbol No..peptide.IDs Mascot.score
## 1    CG10060 FBgn0001104    G-ialpha65A               3       179.86
## 2    CG10067 FBgn0000044         Act57B               5       222.40
## 3    CG10077 FBgn0035720        CG10077               5       219.65
## 4    CG10079 FBgn0003731           Egfr               2        86.39
## 5    CG10106 FBgn0029506        Tsp42Ee               1        52.10
## 6    CG10130 FBgn0010638      Sec61beta               2        79.90
##   No..peptides.quantified PLS.DA.classification Peptide.sequence
## 1                       1                    PM                 
## 2                       9                    PM                 
## 3                       3                                       
## 4                       2                    PM                 
## 5                       1                              GGVFDTIQK
## 6                       3              ER/Golgi                 
##   Precursor.ion.mass Precursor.ion.charge     pd.2013 pd.markers
## 1                                                  PM    unknown
## 2                                                  PM    unknown
## 3                                             unknown    unknown
## 4                                                  PM    unknown
## 5            626.887                    2 Phenotype 1    unknown
## 6                                            ER/Golgi         ER
For raw data processing, see MSnbase’s clean, smooth, pickPeaks,
removePeaks and trimMz methods for MSnExp and spectra objects.
The MALDIquant and xcms packages also feature a wide range of raw data processing methods on their own ad hoc data types.
Each type of quantitative data requires its own
pre-processing and normalisation steps. Both isobar and MSnbase
allow users to correct for isobaric tag impurities and to normalise the
quantitative data.
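Impurity correction can be pictured as solving a small linear system. The base R sketch below assumes that each observed reporter intensity is a linear mixture of the true tag signals, with row i of the impurity matrix giving the fraction of tag i seen in each channel. The 2-plex matrix and intensities are invented, and this illustrates the idea rather than the code behind purityCorrect.

```r
## Hypothetical 2-plex impurity matrix (rows: tags, columns: channels).
imp <- matrix(c(0.95, 0.05,
                0.10, 0.90),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("tagA", "tagB"), c("chA", "chB")))

## Invented true signals and the intensities they would produce:
## chA = 1000 * 0.95 + 200 * 0.10, chB = 1000 * 0.05 + 200 * 0.90.
true_signal <- c(tagA = 1000, tagB = 200)
observed    <- as.vector(true_signal %*% imp)

## Recover the true signals by solving t(imp) %*% x = observed.
corrected <- solve(t(imp), observed)
corrected
```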
data(itraqdata)
qnt <- quantify(itraqdata, method = "trap",
                reporters = iTRAQ4, verbose = FALSE)
impurities <- matrix(c(0.929,0.059,0.002,0.000,
                       0.020,0.923,0.056,0.001,
                       0.000,0.030,0.924,0.045,
                       0.000,0.001,0.040,0.923),
                     nrow=4, byrow = TRUE)
## or, using makeImpuritiesMatrix()
## impurities <- makeImpuritiesMatrix(4)
qnt.crct <- purityCorrect(qnt, impurities)
processingData(qnt.crct)
## - - - Processing information - - -
## Data loaded: Wed May 11 18:54:39 2011 
## Updated from version 0.3.0 to 0.3.1 [Fri Jul  8 20:23:25 2016] 
## iTRAQ4 quantification by trapezoidation: Fri Jan  1 09:27:44 2021 
## Purity corrected: Fri Jan  1 09:27:44 2021 
##  MSnbase version: 1.1.22
Various normalisation methods can be applied to MSnSet instances
using the normalise method: variance stabilisation (vsn), quantile
(quantiles), median or mean centring (center.median or
center.mean), …
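Quantile normalisation, one of the methods listed above, forces every sample to share the same distribution of values. The base R sketch below is a minimal version that assumes a complete numeric matrix without missing values; the function name `quantile_normalise` and the data are made up, and dedicated implementations treat ties and missing values more carefully.

```r
quantile_normalise <- function(x) {
    ## Rank of each value within its column (sample).
    rnk <- apply(x, 2, rank, ties.method = "first")
    ## Reference distribution: mean of the sorted columns, rank by rank.
    ref <- rowMeans(apply(x, 2, sort))
    ## Replace each value by the reference value of the same rank.
    out <- apply(rnk, 2, function(r) ref[r])
    dimnames(out) <- dimnames(x)
    out
}

## Invented 3 features x 3 samples matrix (filled column by column).
x <- matrix(c(5, 2, 3,
              4, 1, 6,
              3, 4, 8),
            nrow = 3,
            dimnames = list(paste0("f", 1:3), paste0("s", 1:3)))
xn <- quantile_normalise(x)
xn
```

After normalisation, sorting any column of `xn` gives the same reference values, while the within-sample ordering of the features is preserved.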
qnt.crct.nrm <- normalise(qnt.crct, "quantiles")
The combineFeatures method combines spectra/peptides quantitation
values into protein data. The grouping is defined by the groupBy
parameter, which is generally taken from the feature metadata (protein
accessions, for example).
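The aggregation described above amounts to summarising the rows of the quantitation matrix group by group. A minimal base R sketch, with an invented peptide matrix and grouping factor, and assuming no missing values:

```r
## Invented peptide-level matrix: 4 peptides x 2 samples.
x <- matrix(c(1, 2, 3, 4,
              10, 20, 30, 40),
            ncol = 2,
            dimnames = list(paste0("pep", 1:4), c("s1", "s2")))

## Hypothetical grouping: which protein each peptide belongs to.
g <- factor(c("P1", "P1", "P2", "P2"))

## Protein-level values by summing peptides within each group.
prot_sum <- rowsum(x, group = g)

## Protein-level means: group sums divided by the group sizes.
prot_mean <- prot_sum / as.vector(table(g))

prot_sum
prot_mean
```

The choice of summary function matters: sums grow with the number of peptides per protein, whereas means and medians do not.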
## arbitrary grouping
g <- factor(c(rep(1, 25), rep(2, 15), rep(3, 15)))
g##  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2
## [39] 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
## Levels: 1 2 3
prt <- combineFeatures(qnt.crct.nrm, groupBy = g, fun = "sum")
## Warning: Parameter 'fun' is deprecated. Please use 'method' instead
## Your data contains missing values. Please read the relevant section in
## the combineFeatures manual page for details on the effects of missing
## values on data aggregation.
processingData(prt)
## - - - Processing information - - -
## Data loaded: Wed May 11 18:54:39 2011 
## Updated from version 0.3.0 to 0.3.1 [Fri Jul  8 20:23:25 2016] 
## iTRAQ4 quantification by trapezoidation: Fri Jan  1 09:27:44 2021 
## Purity corrected: Fri Jan  1 09:27:44 2021 
## Normalised (quantiles): Fri Jan  1 09:27:44 2021 
## Combined 55 features into 3 using mean: Fri Jan  1 09:27:44 2021 
##  MSnbase version: 2.16.0
Finally, proteomics data analysis is generally hampered by missing values. Missing data imputation is a sensitive operation whose success will be guided by many factors, such as the degree and (non-)random nature of the missingness.
Below, missing values are randomly assigned to our test data and visualised on a heatmap.
set.seed(1)
qnt0 <- qnt
exprs(qnt0)[sample(prod(dim(qnt0)), 10)] <- NA
table(is.na(qnt0))
##
## FALSE  TRUE 
##   209    11
image(qnt0)
Missing values in MSnSet instances can be filtered out or imputed
using the filterNA and impute functions, respectively.
## remove features with missing values
qnt00 <- filterNA(qnt0)
dim(qnt00)
## [1] 44  4
any(is.na(qnt00))
## [1] FALSE
## impute missing values using knn imputation
qnt.imp <- impute(qnt0, method = "knn")
dim(qnt.imp)
## [1] 55  4
any(is.na(qnt.imp))
## [1] FALSE
There are various methods to perform data imputation, as described in
?impute.
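The two strategies shown above, filtering and imputing, can be contrasted on a plain matrix in base R. The data are invented, and replacing missing values by the per-sample minimum is only one simple choice, used here as an assumption for illustration; it is not the knn method applied above.

```r
## Invented 4 features x 2 samples matrix with two missing values.
x <- matrix(c(1, NA, 3, 4,
              2, 5, NA, 8),
            ncol = 2,
            dimnames = list(paste0("f", 1:4), c("s1", "s2")))

## Option 1: drop every feature (row) with at least one missing value.
x_filtered <- x[complete.cases(x), , drop = FALSE]

## Option 2: replace each missing value by the minimum observed
## value of its sample (column).
x_imp <- x
for (j in seq_len(ncol(x_imp))) {
    miss <- is.na(x_imp[, j])
    x_imp[miss, j] <- min(x_imp[, j], na.rm = TRUE)
}

dim(x_filtered)
x_imp
```

Filtering loses features, while imputation keeps them at the cost of introducing values that were never measured, which is why the nature of the missingness should guide the choice.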
R in general and Bioconductor in particular are well suited for the statistical analysis of data. Several packages provide dedicated resources for proteomics data:
MSstats: A set of tools for statistical relative
protein significance analysis in DDA, SRM and DIA experiments. Data
stored in data.frame or MSnSet objects can be used as input.
msmsTests: Statistical tests for label-free LC-MS/MS
data by spectral counts, to discover differentially expressed
proteins between two biological conditions. Three tests are
available: Poisson GLM regression, quasi-likelihood GLM regression,
and the negative binomial of the edgeR
package. All can be readily applied on MSnSet instances produced,
for example by MSnID.
isobar also provides dedicated infrastructure for the statistical analysis of isobaric data.
The MLInterfaces package provides a unified interface
to a wide range of machine learning algorithms. It was initially
developed for microarray data and ExpressionSet instances; the
pRoloc package enables application of these algorithms
to MSnSet data.
The example below uses knn with the 5 closest neighbours as an
illustration to classify proteins of unknown sub-cellular localisation
to one of 9 possible organelles.
library("MLInterfaces")
library("pRoloc")
library("pRolocdata")
data(dunkley2006)
traininds <- which(fData(dunkley2006)$markers != "unknown")
ans <- MLearn(markers ~ ., data = t(dunkley2006), knnI(k = 5), traininds)
ans
## MLInterfaces classification output container
## The call was:
## MLearn(formula = markers ~ ., data = t(dunkley2006), .method = knnI(k = 5), 
##     trainInd = traininds)
## Predicted outcome distribution for test set:
## 
##      ER lumen   ER membrane         Golgi Mitochondrion            PM 
##             5           140            67            51            89 
##       Plastid      Ribosome           TGN       vacuole 
##            29            31             6            10 
## Summary of scores on test set (use testScores() method for details):
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.4000  1.0000  1.0000  0.9332  1.0000  1.0000
kcl <- MLearn( ~ ., data = dunkley2006, kmeansI, centers = 12)
kcl
## clusteringOutput: partition table
## 
##   1   2   3   4   5   6   7   8   9  10  11  12 
##  30  48 101  44  50  35  24  80  34  87  52 104 
## The call that created this object was:
## MLearn(formula = ~., data = dunkley2006, .method = kmeansI, centers = 12)
plot(kcl, exprs(dunkley2006))
A wide range of classification and clustering algorithms is also
available, as described in the ?MLearn documentation page. The
pRoloc package also uses MSnSet instances as input and, while
conceived with the analysis of spatial/organelle proteomics data
in mind, is applicable to many use cases.
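The k-nearest-neighbour classification used earlier in this section can be sketched from scratch in base R: each unlabelled profile receives the majority label of its k closest labelled profiles. The helper `knn_predict`, the two-dimensional profiles and the class labels are all made up for this illustration; it is not how MLInterfaces or pRoloc implement the classifier.

```r
## Majority vote among the k nearest training points (Euclidean distance).
knn_predict <- function(train, labels, test, k = 3) {
    apply(test, 1, function(p) {
        ## Distance from the query point p to every training point.
        d  <- sqrt(rowSums(t(t(train) - p)^2))
        nn <- labels[order(d)[seq_len(k)]]
        ## Most frequent label among the neighbours (first one on ties).
        names(which.max(table(nn)))
    })
}

## Invented marker profiles for two classes.
train  <- rbind(c(0, 0), c(0, 1), c(1, 0),
                c(5, 5), c(5, 6), c(6, 5))
labels <- c("A", "A", "A", "B", "B", "B")

## Invented profiles of unknown class.
test <- rbind(u1 = c(0.5, 0.5), u2 = c(5.5, 5.5))

pred <- knn_predict(train, labels, test, k = 3)
pred
```

The proportion of neighbours that agree with the winning label gives a natural classification score, comparable in spirit to the scores summarised in the output above.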
All the Bioconductor annotation infrastructure, such as biomaRt, GO.db and organism-specific annotations, is directly relevant to the analysis of proteomics data. A total of 257 ontologies, including some proteomics-centred annotations such as the PSI Mass Spectrometry Ontology, Molecular Interaction (PSI MI 2.5) or Protein Modifications, are available through the rols package.
library("rols")
res <- OlsSearch(q = "ESI", ontology = "MS", exact = TRUE)
res
## Object of class 'OlsSearch':
##   ontolgy: MS 
##   query: ESI 
##   requested: 20 (out of 1)
##   response(s): 0
There is a single exact match (the default is to retrieve 20 results),
which can be retrieved and coerced to a Terms or data.frame object
with
res <- olsSearch(res)
as(res, "Terms")
## Object of class 'Terms' with 1 entries
##  From the MS ontology
## MS:1000073
as(res, "data.frame")
##                                                   id
## 1 ms:class:http://purl.obolibrary.org/obo/MS_1000073
##                                         iri short_form     obo_id
## 1 http://purl.obolibrary.org/obo/MS_1000073 MS_1000073 MS:1000073
##                     label
## 1 electrospray ionization
##                                                                                                                                                                                                                                                                                                                                                                                                                                                  description
## 1 A process in which ionized species in the gas phase are produced from an analyte-containing solution via highly charged fine droplets, by means of spraying the solution from a narrow-bore needle tip at atmospheric pressure in the presence of a high electric field. When a pressurized gas is used to aid in the formation of a stable spray, the term pneumatically assisted electrospray ionization is used. The term ion spray is not recommended.
##   ontology_name ontology_prefix  type is_defining_ontology
## 1            ms              MS class                 TRUE
Data from the Human Protein Atlas is available via the hpar package.
Additional relevant packages are described in the RforProteomics vignettes.
## R version 4.0.3 (2020-10-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.5 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.12-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.12-bioc/R/lib/libRlapack.so
## 
## attached base packages:
## [1] stats4    parallel  stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] nloptr_1.2.2.2        RforProteomics_1.28.0 BiocManager_1.30.10  
##  [4] knitr_1.30            proteomics_1.14.0     hpar_1.32.1          
##  [7] rols_2.18.0           MSGFplus_1.24.0       pRolocdata_1.28.0    
## [10] pRoloc_1.30.0         BiocParallel_1.24.1   MLInterfaces_1.70.0  
## [13] cluster_2.1.0         annotate_1.68.0       XML_3.99-0.5         
## [16] AnnotationDbi_1.52.0  IRanges_2.24.1        rpx_1.26.0           
## [19] MSnbase_2.16.0        ProtGenerics_1.22.0   S4Vectors_0.28.1     
## [22] Biobase_2.50.0        BiocGenerics_0.36.0   MSnID_1.24.0         
## [25] mzID_1.28.0           mzR_2.24.1            Rcpp_1.0.5           
## [28] BiocStyle_2.18.1     
## 
## loaded via a namespace (and not attached):
##   [1] AnnotationHub_2.22.0          BiocFileCache_1.14.0         
##   [3] plyr_1.8.6                    splines_4.0.3                
##   [5] ggplot2_3.3.3                 digest_0.6.27                
##   [7] foreach_1.5.1                 htmltools_0.5.0              
##   [9] magick_2.5.2                  viridis_0.5.1                
##  [11] gdata_2.18.0                  magrittr_2.0.1               
##  [13] memoise_1.1.0                 doParallel_1.0.16            
##  [15] mixtools_1.2.0                limma_3.46.0                 
##  [17] recipes_0.1.15                Biostrings_2.58.0            
##  [19] gower_0.2.2                   R.utils_2.10.1               
##  [21] askpass_1.1                   lpSolve_5.6.15               
##  [23] prettyunits_1.1.1             colorspace_2.0-0             
##  [25] blob_1.2.1                    rappdirs_0.3.1               
##  [27] xfun_0.19                     dplyr_1.0.2                  
##  [29] jsonlite_1.7.2                crayon_1.3.4                 
##  [31] RCurl_1.98-1.2                hexbin_1.28.1                
##  [33] graph_1.68.0                  impute_1.64.0                
##  [35] survival_3.2-7                iterators_1.0.13             
##  [37] glue_1.4.2                    gtable_0.3.0                 
##  [39] ipred_0.9-9                   zlibbioc_1.36.0              
##  [41] XVector_0.30.0                R.cache_0.14.0               
##  [43] kernlab_0.9-29                scales_1.1.1                 
##  [45] vsn_3.58.0                    mvtnorm_1.1-1                
##  [47] DBI_1.1.0                     viridisLite_0.3.0            
##  [49] xtable_1.8-4                  progress_1.2.2               
##  [51] bit_4.0.4                     proxy_0.4-24                 
##  [53] mclust_5.4.7                  preprocessCore_1.52.0        
##  [55] lava_1.6.8.1                  prodlim_2019.11.13           
##  [57] sampling_2.8                  httr_1.4.2                   
##  [59] FNN_1.1.3                     RColorBrewer_1.1-2           
##  [61] ellipsis_0.3.1                farver_2.0.3                 
##  [63] pkgconfig_2.0.3               R.methodsS3_1.8.1            
##  [65] nnet_7.3-14                   dbplyr_2.0.0                 
##  [67] caret_6.0-86                  labeling_0.4.2               
##  [69] tidyselect_1.1.0              rlang_0.4.10                 
##  [71] reshape2_1.4.4                later_1.1.0.1                
##  [73] biocViews_1.58.1              munsell_0.5.0                
##  [75] BiocVersion_3.12.0            tools_4.0.3                  
##  [77] LaplacesDemon_16.1.4          generics_0.1.0               
##  [79] RSQLite_2.2.1                 evaluate_0.14                
##  [81] stringr_1.4.0                 fastmap_1.0.1                
##  [83] yaml_2.2.1                    ModelMetrics_1.2.2.2         
##  [85] bit64_4.0.5                   randomForest_4.6-14          
##  [87] purrr_0.3.4                   dendextend_1.14.0            
##  [89] ncdf4_1.17                    RBGL_1.66.0                  
##  [91] nlme_3.1-151                  mime_0.9                     
##  [93] R.oo_1.24.0                   xml2_1.3.2                   
##  [95] biomaRt_2.46.0                compiler_4.0.3               
##  [97] curl_4.3                      interactiveDisplayBase_1.28.0
##  [99] e1071_1.7-4                   affyio_1.60.0                
## [101] tibble_3.0.4                  stringi_1.5.3                
## [103] highr_0.8                     lattice_0.20-41              
## [105] Matrix_1.3-0                  vctrs_0.3.6                  
## [107] pillar_1.4.7                  lifecycle_0.2.0              
## [109] RUnit_0.4.32                  MALDIquant_1.19.3            
## [111] data.table_1.13.6             bitops_1.0-6                 
## [113] httpuv_1.5.4                  R6_2.5.0                     
## [115] pcaMethods_1.82.0             affy_1.68.0                  
## [117] bookdown_0.21                 promises_1.1.1               
## [119] gridExtra_2.3                 codetools_0.2-18             
## [121] gtools_3.8.2                  MASS_7.3-53                  
## [123] assertthat_0.2.1              openssl_1.4.3                
## [125] withr_2.3.0                   hms_0.5.3                    
## [127] grid_4.0.3                    rpart_4.1-15                 
## [129] timeDate_3043.102             coda_0.19-4                  
## [131] class_7.3-17                  rmarkdown_2.6                
## [133] segmented_1.3-1               pROC_1.16.2                  
## [135] shiny_1.5.0                   lubridate_1.7.9.2