xcms 3.4.4
Package: xcms
Authors: Johannes Rainer
Modified: 2018-11-15 16:48:41
Compiled: Wed Feb 20 18:40:18 2019
This documents describes data import, exploration, preprocessing and analysis of
LCMS experiments with xcms version >= 3. The examples and basic workflow was
adapted from the original LC/MS Preprocessing and Analysis with xcms vignette
from Colin A. Smith.
The new user interface and methods use the XCMSnExp object (instead of the
old xcmsSet object) as a container for the pre-processing results. To
support packages and pipelines relying on the xcmsSet object, it is however
possible to convert an XCMSnExp into a xcmsSet object using the as method
(i.e.xset <- as(x, "xcmsSet"), with x being an XCMSnExp object.
xcms supports analysis of LC/MS data from files in (AIA/ANDI) NetCDF,
mzML/mzXML and mzData format. For the actual data import Bioconductor’s
mzR is used. For demonstration purpose we will analyze a
subset of the data from [1] in which the metabolic consequences of
knocking out the fatty acid amide hydrolase (FAAH) gene in mice was
investigated. The raw data files (in NetCDF format) are provided with the
faahKO data package. The data set consists of samples from the spinal cords of
6 knock-out and 6 wild-type mice. Each file contains data in centroid mode
acquired in positive ion mode form 200-600 m/z and 2500-4500 seconds.
Below we load all required packages, locate the raw CDF files within the faahKO
package and build a phenodata data frame describing the experimental setup.
library(xcms)
library(faahKO)
library(RColorBrewer)
library(pander)
library(magrittr)
## Get the full path to the CDF files
cdfs <- dir(system.file("cdf", package = "faahKO"), full.names = TRUE,
            recursive = TRUE)
## Create a phenodata data.frame
pd <- data.frame(sample_name = sub(basename(cdfs), pattern = ".CDF",
                                   replacement = "", fixed = TRUE),
                 sample_group = c(rep("KO", 6), rep("WT", 6)),
                 stringsAsFactors = FALSE)Subsequently we load the raw data as an OnDiskMSnExp object using the
readMSData method from the MSnbase package. While the MSnbase package was
originally developed for proteomics data processing, many of its functionality,
including raw data import and data representation, can be shared and reused in
metabolomics data analysis. Also, MSnbase can be used to centroid
profile-mode MS data (see the corresponding vignette in the MSnbase package).
raw_data <- readMSData(files = cdfs, pdata = new("NAnnotatedDataFrame", pd),
                       mode = "onDisk")The OnDiskMSnExp object contains general information about the number of
spectra, retention times, the measured total ion current etc, but does not
contain the full raw data (i.e. the m/z and intensity values from each measured
spectrum). Its memory footprint is thus rather small making it an ideal object
to represent large metabolomics experiments while still allowing to perform
simple quality controls, data inspection and exploration as well as data
sub-setting operations. The m/z and intensity values are imported from the raw
data files on demand, hence the location of the raw data files should not be
changed after initial data import.
The OnDiskMSnExp organizes the MS data by spectrum and provides the methods
intensity, mz and rtime to access the raw data from the files (the measured
intensity values, the corresponding m/z and retention time values). In addition,
the spectra method could be used to return all data encapsulated in Spectrum
classes. Below we extract the retention time values from the object.
head(rtime(raw_data))## F01.S0001 F01.S0002 F01.S0003 F01.S0004 F01.S0005 F01.S0006 
##  2501.378  2502.943  2504.508  2506.073  2507.638  2509.203All data is returned as one-dimensional vectors (a numeric vector for rtime
and a list of numeric vectors for mz and intensity, each containing the
values from one spectrum), even if the experiment consists of multiple
files/samples. The fromFile function returns a numeric vector that provides
the mapping of the values to the originating file. Below we use the fromFile
indices to organize the mz values by file.
mzs <- mz(raw_data)
## Split the list by file
mzs_by_file <- split(mzs, f = fromFile(raw_data))
length(mzs_by_file)## [1] 12As a first evaluation of the data we plot below the base peak chromatogram (BPC)
for each file in our experiment. We use the chromatogram method and set the
aggregationFun to "max" to return for each spectrum the maximal intensity
and hence create the BPC from the raw data. To create a total ion chromatogram
we could set aggregationFun to sum.
## Get the base peak chromatograms. This reads data from the files.
bpis <- chromatogram(raw_data, aggregationFun = "max")
## Define colors for the two groups
group_colors <- paste0(brewer.pal(3, "Set1")[1:2], "60")
names(group_colors) <- c("KO", "WT")
## Plot all chromatograms.
plot(bpis, col = group_colors[raw_data$sample_group])The chromatogram method returned a Chromatograms object that organizes
individual Chromatogram objects (which in fact contain the chromatographic
data) in a two-dimensional array: columns represent samples and rows
(optionally) m/z and/or retention time ranges. Below we extract the chromatogram
of the first sample and access its retention time and intensity values.
bpi_1 <- bpis[1, 1]
head(rtime(bpi_1))## F01.S0001 F01.S0002 F01.S0003 F01.S0004 F01.S0005 F01.S0006 
##  2501.378  2502.943  2504.508  2506.073  2507.638  2509.203head(intensity(bpi_1))## F01.S0001 F01.S0002 F01.S0003 F01.S0004 F01.S0005 F01.S0006 
##     43888     43960     43392     42632     42200     42288The chromatogram method supports also extraction of chromatographic data from
a m/z-rt slice of the MS data. In the next section we will use this method to
create an extracted ion chromatogram (EIC) for a selected peak.
Note that chromatogram reads the raw data from each file to calculate the
chromatogram. The bpi and tic methods on the other hand do not read any data
from the raw files but use the respective information that was provided in the
header definition of the input files (which might be different from the actual
data).
Below we create boxplots representing the distribution of total ion currents per file. Such plots can be very useful to spot problematic or failing MS runs.
## Get the total ion current by file
tc <- split(tic(raw_data), f = fromFile(raw_data))
boxplot(tc, col = group_colors[raw_data$sample_group],
        ylab = "intensity", main = "Total ion current")
Figure 1: Distribution of total ion currents per file
Next we perform the chromatographic peak detection using the centWave
algorithm [2]. Before running the peak detection it is however
strongly suggested to visually inspect e.g. the extracted ion chromatogram of
internal standards or known compounds to evaluate and adapt the peak detection
settings since the default settings will not be appropriate for most LCMS
experiments. The two most critical parameters for centWave are the peakwidth
(expected range of chromatographic peak widths) and ppm (maximum expected
deviation of m/z values of centroids corresponding to one chromatographic peak;
this is usually much larger than the ppm specified by the manufacturer)
parameters. To evaluate the typical chromatographic peak width we plot the EIC
for one peak.
## Define the rt and m/z range of the peak area
rtr <- c(2700, 2900)
mzr <- c(334.9, 335.1)
## extract the chromatogram
chr_raw <- chromatogram(raw_data, mz = mzr, rt = rtr)
plot(chr_raw, col = group_colors[chr_raw$sample_group])
Figure 2: Extracted ion chromatogram for one peak
Note that Chromatogram objects extracted by the chromatogram method contain
an NA value if in a certain scan (i.e. for a specific retention time) no
signal was measured in the respective mz range. This is reflected by the lines
not being drawn as continuous lines in the plot above.
The peak above has a width of about 50 seconds. The peakwidth parameter should
be set to accommodate the expected widths of peak in the data set. We set it to
20,80 for the present example data set.
For the ppm parameter we extract the full MS data (intensity, retention time
and m/z values) corresponding to the above peak. To this end we first filter the
raw object by retention time, then by m/z and finally plot the object withtype = "XIC" to produce the plot below. We use the pipe (%>%) command better
illustrate the corresponding workflow.
raw_data %>%
    filterRt(rt = rtr) %>%
    filterMz(mz = mzr) %>%
    plot(type = "XIC")
Figure 3: Visualization of the raw MS data for one peak
 For each plot: upper panel: chromatogram plotting the intensity values against the retention time, lower panel m/z against retention time plot. The individual data points are colored according to the intensity.
In the present data there is actually no variation in the m/z values. Usually
one would see the m/z values (lower panel) scatter around the real m/z value
of the compound. The first step of the centWave algorithm defines so called
regions of interest (ROI) based on the difference of m/z values from consecutive
scans. In detail, m/z values from consecutive scans are included into a ROI if
the difference between the m/z and the mean m/z of the ROI is smaller than the
user defined ppm parameter. A reasonable choice for the ppm could thus be
the maximal m/z difference of data points from neighboring scans/spectra that
are part of the chromatographic peak. It is suggested to inspect the ranges of
m/z values for many compounds (either internal standards or compounds known to
be present in the sample) and define the ppm parameter for centWave
according to these.
Below we perform the chromatographic peak detection using the findChromPeaks
method. The submitted parameter object defines which algorithm will be used
and allows to define the settings for this algorithm. Note that we set the
argument noise to 5000 to reduce the run time of this vignette. With this
setting we consider only signals with a value larger than 5000 in the peak
detection step.
cwp <- CentWaveParam(peakwidth = c(20, 80), noise = 5000)
xdata <- findChromPeaks(raw_data, param = cwp)The results are returned as an XCMSnExp object which extends the
OnDiskMSnExp object by storing also LC/GC-MS preprocessing results. This means
also that all methods to sub-set and filter the data or to access the (raw) data
are inherited from the OnDiskMSnExp object. The results from the
chromatographic peak detection can be accessed with the chromPeaks method.
head(chromPeaks(xdata))##           mz mzmin mzmax       rt    rtmin    rtmax      into      intb
## CP0001 453.2 453.2 453.2 2509.203 2501.378 2527.982 1007409.0 1007380.8
## CP0002 236.1 236.1 236.1 2518.593 2501.378 2537.372  253501.0  226896.3
## CP0003 594.0 594.0 594.0 2601.535 2581.191 2637.529  161042.2  149297.3
## CP0004 577.0 577.0 577.0 2604.665 2581.191 2626.574  136105.2  129195.5
## CP0005 369.2 369.2 369.2 2587.451 2556.151 2631.269  483852.3  483777.1
## CP0006 369.2 369.2 369.2 2568.671 2557.716 2578.061  144624.8  144602.9
##         maxo    sn sample is_filled
## CP0001 38152 38151      1         0
## CP0002 12957    11      1         0
## CP0003  7850    13      1         0
## CP0004  6215    13      1         0
## CP0005  7215  7214      1         0
## CP0006  7033  7032      1         0The returned matrix provides the m/z and retention time range for each
identified chromatographic peak as well as the integrated signal intensity
(“into”) and the maximal peak intensitity (“maxo”). Columns “sample” contains
the index of the sample in the object/experiment in which the peak was
identified.
Below we use the data from this table to calculate some per-file summaries.
summary_fun <- function(z) {
    c(peak_count = nrow(z), rt = quantile(z[, "rtmax"] - z[, "rtmin"]))
}
T <- lapply(split.data.frame(chromPeaks(xdata),
                             f = chromPeaks(xdata)[, "sample"]),
            FUN = summary_fun)
T <- do.call(rbind, T)
rownames(T) <- basename(fileNames(xdata))
pandoc.table(T,
             caption = paste0("Summary statistics on identified chromatographic",
                              " peaks. Shown are number of identified peaks per",
                              " sample and widths/duration of chromatographic ",
                              "peaks."))| peak_count | rt.0% | rt.25% | rt.50% | rt.75% | rt.100% | |
|---|---|---|---|---|---|---|
| ko15.CDF | 561 | 7.824 | 28.17 | 39.12 | 50.08 | 139.3 | 
| ko16.CDF | 741 | 4.695 | 23.47 | 39.12 | 50.08 | 148.7 | 
| ko18.CDF | 448 | 4.694 | 28.17 | 42.25 | 53.21 | 173.7 | 
| ko19.CDF | 333 | 7.824 | 31.3 | 46.95 | 59.47 | 181.5 | 
| ko21.CDF | 276 | 1.565 | 35.99 | 48.51 | 61.03 | 164.3 | 
| ko22.CDF | 325 | 7.824 | 26.61 | 43.82 | 56.34 | 126.8 | 
| wt15.CDF | 564 | 4.695 | 26.6 | 39.12 | 50.08 | 161.2 | 
| wt16.CDF | 507 | 7.824 | 28.17 | 40.69 | 53.21 | 151.8 | 
| wt18.CDF | 443 | 3.13 | 26.61 | 42.25 | 57.12 | 192.5 | 
| wt19.CDF | 350 | 7.824 | 31.3 | 45.38 | 57.9 | 184.7 | 
| wt21.CDF | 317 | 3.13 | 28.17 | 45.38 | 61.03 | 172.1 | 
| wt22.CDF | 437 | 1.565 | 26.61 | 40.69 | 53.21 | 161.2 | 
We can also plot the location of the identified chromatographic peaks in the
m/z - retention time space for one file using the plotChromPeaks
function. Below we plot the chromatographic peaks for the 3rd sample.
plotChromPeaks(xdata, file = 3)
Figure 4: Identified chromatographic peaks in the m/z by retention time space for one sample
To get a global overview of the peak detection we can plot the frequency of identified peaks per file along the retention time axis. This allows to identify time periods along the MS run in which a higher number of peaks was identified and evaluate whether this is consistent across files.
plotChromPeakImage(xdata)
Figure 5: Frequency of identified chromatographic peaks along the retention time axis
 The frequency is color coded with higher frequency being represented by yellow-white. Each line shows the peak frequency for one file.
Next we highlight the identified chromatographic peaks for the example peak from before. Evaluating such plots on a list of peaks corresponding to known peaks or internal standards helps to ensure that peak detection settings were appropriate and correctly identified the expected peaks.
plot(chr_raw, col = group_colors[chr_raw$sample_group], lwd = 2)
highlightChromPeaks(xdata, border = group_colors[chr_raw$sample_group],
                    lty = 3, rt = rtr, mz = mzr, type = "rect")
Figure 6: Signal for an example peak
 Red and blue colors represent KO and wild type samples, respectively. The rectangles indicate the identified chromatographic peaks per sample.
In the plot above the peak definitions are indicated with rectangles containing
the full peak. In addition it is possible to represent the apex position of each
peak with a single point (passing argumenttype = "point" to the function), or
draw the actually identified peak by specifying type = "polygon". Below we use
this option to fill the peak area for each identified chromatographic peak in
each sample. Whether individual peaks can be still identified in such a plot
depends however on the number of samples from which peaks are drawn.
plot(chr_raw, col = group_colors[chr_raw$sample_group], lwd = 2)
highlightChromPeaks(xdata, col = group_colors[chr_raw$sample_group],
                    lty = 3, rt = rtr, mz = mzr, border = NA,
                    type = "polygon")
Figure 7: Signal for an example peak
 Red and blue colors represent KO and wild type samples, respectively. The signal area of identified chromatographic peaks are filled with a color.
Note that we can also specifically extract identified chromatographic peaks for
a selected region by providing the respective m/z and retention time ranges with
the mz and rt arguments in the chromPeaks method.
pander(chromPeaks(xdata, mz = mzr, rt = rtr),
       caption = paste("Identified chromatographic peaks in a selected ",
                       "m/z and retention time range."))| mz | mzmin | mzmax | rt | rtmin | rtmax | into | intb | |
|---|---|---|---|---|---|---|---|---|
| CP0057 | 335 | 335 | 335 | 2782 | 2761 | 2810 | 412134 | 383167 | 
| CP0628 | 335 | 335 | 335 | 2786 | 2764 | 2813 | 1496244 | 1461187 | 
| CP1349 | 335 | 335 | 335 | 2786 | 2764 | 2813 | 211297 | 194283 | 
| CP1795 | 335 | 335 | 335 | 2783 | 2763 | 2813 | 285228 | 264249 | 
| CP2120 | 335 | 335 | 335 | 2797 | 2775 | 2816 | 159059 | 149230 | 
| CP3308 | 335 | 335 | 335 | 2786 | 2764 | 2813 | 932645 | 915334 | 
| CP3838 | 335 | 335 | 335 | 2788 | 2766 | 2821 | 1636669 | 1603951 | 
| CP4254 | 335 | 335 | 335 | 2785 | 2760 | 2810 | 643673 | 631119 | 
| CP4601 | 335 | 335 | 335 | 2792 | 2769 | 2824 | 876586 | 848569 | 
| CP4906 | 335 | 335 | 335 | 2789 | 2774 | 2807 | 89583 | 83927 | 
| maxo | sn | sample | is_filled | |
|---|---|---|---|---|
| CP0057 | 16856 | 23 | 1 | 0 | 
| CP0628 | 58736 | 72 | 2 | 0 | 
| CP1349 | 8158 | 15 | 3 | 0 | 
| CP1795 | 9154 | 18 | 4 | 0 | 
| CP2120 | 6295 | 13 | 5 | 0 | 
| CP3308 | 35856 | 66 | 8 | 0 | 
| CP3838 | 54640 | 78 | 9 | 0 | 
| CP4254 | 20672 | 54 | 10 | 0 | 
| CP4601 | 27200 | 36 | 11 | 0 | 
| CP4906 | 5427 | 12 | 12 | 0 | 
Finally we plot also the distribution of peak intensity per sample. This allows to investigate whether systematic differences in peak signals between samples are present.
## Extract a list of per-sample peak intensities (in log2 scale)
ints <- split(log2(chromPeaks(xdata)[, "into"]),
              f = chromPeaks(xdata)[, "sample"])
boxplot(ints, varwidth = TRUE, col = group_colors[xdata$sample_group],
        ylab = expression(log[2]~intensity), main = "Peak intensities")
grid(nx = NA, ny = NULL)
Figure 8: Peak intensity distribution per sample
The time at which analytes elute in the chromatography can vary between samples (and even compounds). Such a difference was already observable in the extracted ion chromatogram plot shown as an example in the previous section. The alignment step, also referred to as retention time correction, aims at adjusting this by shifting signals along the retention time axis to align the signals between different samples within an experiment.
A plethora of alignment algorithms exist (see [3]), with some of
them being implemented also in xcms. The method to perform the
alignment/retention time correction in xcms is adjustRtime which uses
different alignment algorithms depending on the provided parameter class. In the
example below we use the obiwarp method [4] to align the
samples. We use abinSize = 0.6 which creates warping functions in mz bins of 0.6. Also here
it is advisable to modify the settings for each experiment and evaluate if
retention time correction did align internal controls or known compounds
properly.
xdata <- adjustRtime(xdata, param = ObiwarpParam(binSize = 0.6))adjustRtime, besides calculating adjusted retention times for each spectrum,
does also adjust the reported retention times of the identified chromatographic
peaks.
To extract the adjusted retention times we can use the adjustedRtime method,
or simply the rtime method that, if present, returns by default adjusted
retention times from an XCMSnExp object.
## Extract adjusted retention times
head(adjustedRtime(xdata))## F01.S0001 F01.S0002 F01.S0003 F01.S0004 F01.S0005 F01.S0006 
##  2501.378  2502.958  2504.538  2506.118  2507.699  2509.280## Or simply use the rtime method
head(rtime(xdata))## F01.S0001 F01.S0002 F01.S0003 F01.S0004 F01.S0005 F01.S0006 
##  2501.378  2502.958  2504.538  2506.118  2507.699  2509.280Raw retention times can be extracted from an XCMSnExp containing
aligned data with rtime(xdata, adjusted = FALSE).
To evaluate the impact of the alignment we plot the BPC on the adjusted data. In
addition we plot the differences of the adjusted- to the raw retention times per
sample using the plotAdjustedRtime function.
## Get the base peak chromatograms.
bpis_adj <- chromatogram(xdata, aggregationFun = "max")
par(mfrow = c(2, 1), mar = c(4.5, 4.2, 1, 0.5))
plot(bpis_adj, col = group_colors[bpis_adj$sample_group])
## Plot also the difference of adjusted to raw retention time.
plotAdjustedRtime(xdata, col = group_colors[xdata$sample_group])
Figure 9: Obiwarp aligned data
 Base peak chromatogram after alignment (top) and difference between adjusted and raw retention times along the retention time axis (bottom).
Too large differences between adjusted and raw retention times could indicate poorly performing samples or alignment.
Alternatively we could use the peak groups alignment method that adjusts the
retention time by aligning previously identified hook peaks (chromatographic
peaks present in most/all samples). Ideally, these hook peaks should span most
part of the retention time range. Below we first restore the raw retention times
(also of the identified peaks) using the dropAdjustedRtime methods. Note that
a drop* method is available for each preprocessing step allowing to remove the
respective results from the XCMSnExp object.
## Does the object have adjusted retention times?
hasAdjustedRtime(xdata)## [1] TRUE## Drop the alignment results.
xdata <- dropAdjustedRtime(xdata)
## Does the object have adjusted retention times?
hasAdjustedRtime(xdata)## [1] FALSENote: XCMSnExp objects hold the raw along with the adjusted retention
times and subsetting will in most cases drop the adjusted retention
times. Sometimes it might thus be useful to replace the raw retention times
with the adjusted retention times. This can be done with the
applyAdjustedRtime.
As noted above the peak groups method requires peak groups (features) present
in most samples to perform the alignment. We thus have to perform a first
correspondence run to identify such peaks (details about the algorithm used are
presented in the next section). We use here again default settings, but it is
strongly advised to adapt the parameters for each data set. The definition of
the sample groups (i.e. assignment of individual samples to the sample groups in
the experiment) is mandatory for the PeakDensityParam. If there are no sample
groups in the experiment sampleGroups should be set to a single value for each
file (e.g. rep(1, length(fileNames(xdata))).
## Correspondence: group peaks across samples.
pdp <- PeakDensityParam(sampleGroups = xdata$sample_group,
                        minFraction = 0.8)
xdata <- groupChromPeaks(xdata, param = pdp)
## Now the retention time correction.
pgp <- PeakGroupsParam(minFraction = 0.85)
## Get the peak groups that would be used for alignment.
xdata <- adjustRtime(xdata, param = pgp)## Warning: Adjusted retention times had to be re-adjusted for some files
## to ensure them being in the same order than the raw retention times. A
## call to 'dropAdjustedRtime' might thus fail to restore retention times
## of chromatographic peaks to their original values. Eventually consider to
## increase the value of the 'span' parameter.Note also that we could use the adjustedRtimePeakGroups method on the object
before alignment to evaluate on which features (peak groups) the alignment would
be performed. This can be useful to test different settings for the peak groups
algorithm. Also, it is possible to manually select or define certain peak groups
(i.e. their retention times per sample) and provide this matrix to the
PeakGroupsParam class with the peakGroupsMatrix argument.
Below plot the difference between raw and adjusted retention times using the
plotAdjustedRtime function, which, if the peak groups method is used for
alignment, also highlights the peak groups used in the adjustment.
## Plot the difference of adjusted to raw retention time.
plotAdjustedRtime(xdata, col = group_colors[xdata$sample_group],
                  peakGroupsCol = "grey", peakGroupsPch = 1)
Figure 10: Peak groups aligned data
At last we evaluate the impact of the alignment on the test peak.
par(mfrow = c(2, 1))
## Plot the raw data
plot(chr_raw, col = group_colors[chr_raw$sample_group])
## Extract the chromatogram from the adjusted object
chr_adj <- chromatogram(xdata, rt = rtr, mz = mzr)
plot(chr_adj, col = group_colors[chr_raw$sample_group])
Figure 11: Example extracted ion chromatogram before (top) and after alignment (bottom)
The final step in the metabolomics preprocessing is the correspondence that
matches detected chromatographic peaks between samples (and depending on the
settings, also within samples if they are adjacent). The method to perform the
correspondence in xcms is groupChromPeaks. We will use the peak density
method [5] to group chromatographic peaks. The algorithm combines
chromatographic peaks depending on the density of peaks along the retention time
axis within small slices along the mz dimension. To illustrate this we plot
below the chromatogram for an mz slice with multiple chromatographic peaks
within each sample. We use below a value of 0.4 for the minFraction parameter
hence only chromatographic peaks present in at least 40% of the samples per
sample group are grouped into a feature. The sample group assignment is
specified with the sampleGroups argument.
## Define the mz slice.
mzr <- c(305.05, 305.15)
## Extract and plot the chromatograms
chr_mzr <- chromatogram(xdata, mz = mzr, rt = c(2500, 4000))
par(mfrow = c(3, 1), mar = c(1, 4, 1, 0.5))
cols <- group_colors[chr_mzr$sample_group]
plot(chr_mzr, col = cols, xaxt = "n", xlab = "")
## Highlight the detected peaks in that region.
highlightChromPeaks(xdata, mz = mzr, col = cols, type = "point", pch = 16)
## Define the parameters for the peak density method
pdp <- PeakDensityParam(sampleGroups = xdata$sample_group,
                        minFraction = 0.4, bw = 30)
par(mar = c(4, 4, 1, 0.5))
plotChromPeakDensity(xdata, mz = mzr, col = cols, param = pdp,
                     pch = 16, xlim = c(2500, 4000))
## Use a different bw
pdp <- PeakDensityParam(sampleGroups = xdata$sample_group,
                        minFraction = 0.4, bw = 20)
plotChromPeakDensity(xdata, mz = mzr, col = cols, param = pdp,
                     pch = 16, xlim = c(2500, 4000))
Figure 12: Example for peak density correspondence
 Upper panel: chromatogram for an mz slice with multiple chromatographic peaks. Middle and lower panel: identified chromatographic peaks at their retention time (x-axis) and index within samples of the experiments (y-axis) for different values of the bw parameter. The black line represents the peak density estimate. Grouping of peaks (based on the provided settings) is indicated by grey rectangles.
The upper panel in the plot above shows the extracted ion chromatogram for each
sample with the detected peaks highlighted. The middle and lower plot shows the
retention time for each detected peak within the different samples. The black
solid line represents the density distribution of detected peaks along the
retention times. Peaks combined into features (peak groups) are indicated with
grey rectangles. Different values for the bw parameter of the
PeakDensityParam were used:bw = 30 in the middle and bw = 20 in the lower
panel. With the default value for the parameter bw the two neighboring
chromatographic peaks would be grouped into the same feature, while with a bw
of 20 they would be grouped into separate features. This grouping depends on
the parameters for the density function and other parameters passed to the
algorithm with the PeakDensityParam.
## Perform the correspondence
pdp <- PeakDensityParam(sampleGroups = xdata$sample_group,
                        minFraction = 0.4, bw = 20)
xdata <- groupChromPeaks(xdata, param = pdp)The results from the correspondence can be extracted using the
featureDefinitions method, that returns a DataFrame with the definition of
the features (i.e. the mz and rt ranges and, in column peakidx, the index of
the chromatographic peaks in the chromPeaks matrix for each feature).
## Extract the feature definitions
featureDefinitions(xdata)## DataFrame with 351 rows and 10 columns
##                  mzmed            mzmin            mzmax            rtmed
##              <numeric>        <numeric>        <numeric>        <numeric>
## FT001              205              205              205 2789.03721644709
## FT002              206              206              206 2788.30543697305
## FT003 207.100006103516 207.100006103516 207.100006103516 2718.79157726896
## FT004              233              233              233 3024.32548193573
## FT005 233.100006103516 233.100006103516 233.100006103516  3024.3863124223
## ...                ...              ...              ...              ...
## FT347 594.200012207031 594.200012207031 594.200012207031 3403.03259249868
## FT348 594.299987792969 594.299987792969 594.400024414062 3614.08509687872
## FT349 595.200012207031 595.200012207031 595.299987792969 3004.70699121539
## FT350 596.299987792969 596.299987792969 596.400024414062 3817.51017727276
## FT351 597.400024414062 597.299987792969 597.400024414062  3818.2445730933
##                  rtmin            rtmax    npeaks        KO        WT
##              <numeric>        <numeric> <numeric> <numeric> <numeric>
## FT001 2787.98587855926 2791.11395773677        12         6         6
## FT002 2786.42663550763 2791.18821663553        12         5         6
## FT003 2712.66710628365 2721.92752961577        13         5         4
## FT004 3014.17378362294 3058.09080263218        11         4         5
## FT005 3015.83729934739  3030.9739137208         5         4         1
## ...                ...              ...       ...       ...       ...
## FT347 3378.34778786262 3409.33220659131         6         1         5
## FT348  3608.8592486129 3662.82600472661        14         5         3
## FT349 2996.10745854437 3018.29416062423         8         2         4
## FT350  3803.8495800095 3825.79574827039        14         6         4
## FT351 3812.49998761581 3826.28011505943         6         2         3
##                                                                                    peakidx
##                                                                                     <list>
## FT001               c(68, 629, 1362, 1799, 2129, 2409, 2762, 3311, 3839, 4257, 4602, 4922)
## FT002               c(48, 614, 1342, 1350, 1790, 2397, 2743, 3299, 3820, 4242, 4588, 4909)
## FT003           c(30, 36, 599, 1327, 1335, 2110, 2382, 2724, 3282, 3796, 3797, 3803, 4891)
## FT004                   c(107, 1398, 2140, 2425, 2816, 2828, 3329, 3884, 4633, 4642, 4960)
## FT005                                                       c(101, 1394, 2143, 2429, 4639)
## ...                                                                                    ...
## FT347                                                c(2215, 2961, 3974, 4375, 4716, 5088)
## FT348  c(364, 373, 1025, 1037, 1589, 2300, 2301, 2589, 2604, 4060, 4771, 4785, 5177, 5210)
## FT349                                         c(94, 99, 751, 2819, 3882, 4630, 4953, 4964)
## FT350 c(481, 1197, 1673, 1677, 2044, 2331, 2649, 3653, 3656, 3660, 4142, 4522, 5265, 5269)
## FT351                                                c(1192, 1676, 4143, 4840, 4842, 5268)The featureValues method returns a matrix with rows being features and
columns samples. The content of this matrix can be defined using the value
argument. The default isvalue = "index" which simply returns the index of the
peak (in the chromPeaks matrix) assigned to a feature in a sample. Setting
value = "into" returns a matrix with the integrated signal of the peaks
corresponding to a feature in a sample. Any column name of the chromPeaks
matrix can be passed to the argument value. Below we extract the integrated
peak intensity per feature/sample.
## Extract the into column for each feature.
head(featureValues(xdata, value = "into"))##        ko15.CDF  ko16.CDF  ko18.CDF   ko19.CDF  ko21.CDF  ko22.CDF  wt15.CDF
## FT001 1924712.0 1757151.0 1714581.8 1220358.09 1383416.7 1180288.2 2129885.1
## FT002  213659.3  289500.7  194603.7   92590.27        NA  178285.7  253825.6
## FT003  349011.5  451863.7  337473.3         NA  343897.8  208002.8  364609.8
## FT004  286221.4        NA  364299.5         NA  137261.0  149097.6  255697.7
## FT005  162905.3        NA  209999.6         NA  164009.0  111157.9        NA
## FT006 1160580.5        NA 1345515.1  608016.51  380970.3  300716.1 1286883.0
##        wt16.CDF   wt18.CDF  wt19.CDF  wt21.CDF  wt22.CDF
## FT001 1634342.0 1810112.28 1507943.1 1623589.2 1354004.9
## FT002  241844.4  228501.09  216393.1  240606.0  185399.5
## FT003  360908.9   54566.86        NA        NA  221937.5
## FT004  311296.8  272267.90        NA  181640.2  271128.0
## FT005        NA         NA        NA  366441.5        NA
## FT006 1739516.6  515676.86  621437.9  639755.3  508546.4This feature matrix contains NA for samples in which no chromatographic peak
was detected in the feature’s m/z-rt region. While in many cases there might
indeed be no peak signal in the respective region, it might also be that there
is signal, but the peak detection algorithm failed to detect a chromatographic
peak. xcms provides the fillChromPeaks method to fill in intensity data
for such missing values from the original files. The filled in peaks are added
to the chromPeaks matrix and are flagged with an 1 in the "is_filled"
column. Below we perform such a filling-in of missing peaks.
## Filling missing peaks using default settings. Alternatively we could
## pass a FillChromPeaksParam object to the method.
xdata <- fillChromPeaks(xdata)
head(featureValues(xdata))##       ko15.CDF ko16.CDF ko18.CDF ko19.CDF ko21.CDF ko22.CDF wt15.CDF
## FT001       68      629     1362     1799     2129     2409     2762
## FT002       48      614     1342     1790     5610     2397     2743
## FT003       30      599     1327     5499     2110     2382     2724
## FT004      107     5368     1398     5500     2140     2425     2816
## FT005      101     5369     1394     5501     2143     2429     5846
## FT006      413     5370     1644     2001     2295     2622     3107
##       wt16.CDF wt18.CDF wt19.CDF wt21.CDF wt22.CDF
## FT001     3311     3839     4257     4602     4922
## FT002     3299     3820     4242     4588     4909
## FT003     3282     3796     6112     6227     4891
## FT004     3329     3884     6113     4633     4960
## FT005     5939     6026     6114     4639     6350
## FT006     3587     4091     4497     4809     5221For features without detected peaks in a sample, the method extracts all
intensities in the mz-rt region of the feature, integrates the signal and adds a
filled-in peak to the chromPeaks matrix. No peak is added if no signal is
measured/available for the mz-rt region of the feature. For these, even after
filling in missing peak data, a NA is reported in the featureValues matrix.
It should be mentioned that fillChromPeaks uses columns "rtmin", "rtmax",
"mzmin" and "mzmax" from the featureDefinitions table to define the region
from which the signal should be integrated to generate the filled-in peak
signal. These values correspond however to the positions of the peak apex not
the peak boundaries of all chromatographic peaks assigned to a feature. It might
be advisable to increase this area in retention time dimension by a constant
value appropriate to the average peak width in the experiment. Such a value can
be specified with fixedRt of the FillChromPeaksParam. If the average peak
width in the experiment is 20 seconds, specifyingfixedRt = 10 ensures that the area
from which peaks are integrated is at least 20 seconds wide.
Below we compare the number of missing values before and after filling in
missing values. We can use the parameter filled of the featureValues method
to define whether or not filled-in peak values should be returned too.
## Missing values before filling in peaks
apply(featureValues(xdata, filled = FALSE), MARGIN = 2,
      FUN = function(z) sum(is.na(z)))## ko15.CDF ko16.CDF ko18.CDF ko19.CDF ko21.CDF ko22.CDF wt15.CDF wt16.CDF 
##       72       74       69      119      142      114      101       96 
## wt18.CDF wt19.CDF wt21.CDF wt22.CDF 
##       95      128      136      118## Missing values after filling in peaks
apply(featureValues(xdata), MARGIN = 2,
      FUN = function(z) sum(is.na(z)))## ko15.CDF ko16.CDF ko18.CDF ko19.CDF ko21.CDF ko22.CDF wt15.CDF wt16.CDF 
##        7        8        4        8       12        8        8        9 
## wt18.CDF wt19.CDF wt21.CDF wt22.CDF 
##        9       13       13        7Next we use the featureSummary function to get a general per-feature summary
that includes the number of samples in which a peak was found or the number of
samples in which more than one peak was assigned to the feature. Specifying also
sample groups breaks down these summary statistics for each individual sample
group.
head(featureSummary(xdata, group = xdata$sample_group))##       count      perc multi_count multi_perc       rsd KO_count   KO_perc
## FT001    12 100.00000           0   0.000000 0.1789898        6 100.00000
## FT002    11  91.66667           1   9.090909 0.2409441        5  83.33333
## FT003     9  75.00000           3  33.333333 0.2844117        5  83.33333
## FT004     9  75.00000           2  22.222222 0.3082528        4  66.66667
## FT005     5  41.66667           0   0.000000 0.4824173        4  66.66667
## FT006    11  91.66667           7  63.636364 0.5229604        5  83.33333
##       KO_multi_count KO_multi_perc    KO_rsd WT_count   WT_perc
## FT001              0             0 0.2027357        6 100.00000
## FT002              1            20 0.3653441        6 100.00000
## FT003              2            40 0.2562703        4  66.66667
## FT004              0             0 0.4694612        5  83.33333
## FT005              0             0 0.2492851        1  16.66667
## FT006              4            80 0.5059513        6 100.00000
##       WT_multi_count WT_multi_perc    WT_rsd
## FT001              0             0 0.1601279
## FT002              0             0 0.1069526
## FT003              1            25 0.3335925
## FT004              2            40 0.1840914
## FT005              0             0        NA
## FT006              3            50 0.5758382The performance of peak detection, alignment and correspondence should always be
evaluated by inspecting extracted ion chromatograms e.g. of known compounds,
internal standards or identified features in general. The featureChromatograms
function allows to extract chromatograms for each feature present in
featureDefinitions. The returned Chromatograms object contains an ion
chromatogram for each feature (each row containing the data for one feature) and
sample (each column representing containing data for one sample). Below we
extract the chromatograms for the first 4 features.
feature_chroms <- featureChromatograms(xdata, features = 1:4)
feature_chroms## Chromatograms with 4 rows and 12 columns
##                   1              2              3              4
##      <Chromatogram> <Chromatogram> <Chromatogram> <Chromatogram>
## [1,]     length: 38     length: 38     length: 39     length: 38
## [2,]     length: 42     length: 41     length: 43     length: 41
## [3,]     length: 36     length: 36     length: 35     length: 35
## [4,]     length: 67     length: 69     length: 70     length: 70
##                   5              6              7              8
##      <Chromatogram> <Chromatogram> <Chromatogram> <Chromatogram>
## [1,]     length: 37     length: 37     length: 39     length: 38
## [2,]     length: 41     length: 41     length: 42     length: 42
## [3,]     length: 36     length: 36     length: 36     length: 36
## [4,]     length: 73     length: 73     length: 66     length: 69
##                   9             10             11             12
##      <Chromatogram> <Chromatogram> <Chromatogram> <Chromatogram>
## [1,]     length: 38     length: 40     length: 38     length: 36
## [2,]     length: 41     length: 44     length: 42     length: 40
## [3,]     length: 37     length: 34     length: 34     length: 36
## [4,]     length: 71     length: 71     length: 75     length: 73
## phenoData with 2 variables
## featureData with 5 variablesAnd plot the extracted ion chromatograms for feature number 3 and 4.
par(mfrow = c(1, 2))
plot(feature_chroms[3, ], col = group_colors[feature_chroms$sample_group])
plot(feature_chroms[4, ], col = group_colors[feature_chroms$sample_group])
Figure 13: Extracted ion chromatograms for feature 3 and 4
At last we perform a principal component analysis to evaluate the grouping of the samples in this experiment. Note that we did not perform any data normalization hence the grouping might (and will) also be influenced by technical biases.
## Extract the features and log2 transform them
ft_ints <- log2(featureValues(xdata, value = "into"))
## Perform the PCA omitting all features with an NA in any of the
## samples. Also, the intensities are mean centered.
pc <- prcomp(t(na.omit(ft_ints)), center = TRUE)
## Plot the PCA
cols <- group_colors[xdata$sample_group]
pcSummary <- summary(pc)
plot(pc$x[, 1], pc$x[,2], pch = 21, main = "",
     xlab = paste0("PC1: ", format(pcSummary$importance[2, 1] * 100,
                                   digits = 3), " % variance"),
     ylab = paste0("PC2: ", format(pcSummary$importance[2, 2] * 100,
                                   digits = 3), " % variance"),
     col = "darkgrey", bg = cols, cex = 2)
grid()
text(pc$x[, 1], pc$x[,2], labels = xdata$sample_name, col = "darkgrey",
     pos = 3, cex = 2)
Figure 14: PCA for the faahKO data set, un-normalized intensities
We can see the expected separation between the KO and WT samples on PC2. On PC1 samples separate based on their ID, samples with an ID <= 18 from samples with an ID > 18. This separation might be caused by a technical bias (e.g. measurements performed on different days/weeks) or due to biological properties of the mice analyzed (sex, age, litter mates etc).
Normalizing features’ signal intensities is required, but at present not (yet)
supported in xcms (some methods might be added in near future). Also, for the
identification of e.g. features with significant different
intensities/abundances it is suggested to use functionality provided in other R
packages, such as Bioconductor’s excellent limma package. To enable support
also for other packages that rely on the old xcmsSet result object, it is
possible to coerce the new XCMSnExp object to an xcmsSet object using
xset <- as(x, "xcmsSet"), with x being an XCMSnExp object.
For a detailed description of the new data objects and changes/improvements compared to the original user interface see the new_functionality vignette.
XCMSnExp objects allow to capture all performed pre-processing steps along
with the used parameter class within the @processHistory slot. Storing also
the parameter class ensures the highest possible degree of analysis
documentation and in future might enable to replay analyses or parts of it.
The list of all performed preprocessings can be extracted using the
processHistory method.
processHistory(xdata)## [[1]]
## Object of class "XProcessHistory"
##  type: Peak detection 
##  date: Wed Feb 20 18:40:56 2019 
##  info:  
##  fileIndex: 1,2,3,4,5,6,7,8,9,10,11,12 
##  Parameter class: CentWaveParam 
##  MS level(s) 1 
## 
## [[2]]
## Object of class "XProcessHistory"
##  type: Peak grouping 
##  date: Wed Feb 20 18:42:23 2019 
##  info:  
##  fileIndex: 1,2,3,4,5,6,7,8,9,10,11,12 
##  Parameter class: PeakDensityParam 
## 
## [[3]]
## Object of class "XProcessHistory"
##  type: Retention time correction 
##  date: Wed Feb 20 18:42:26 2019 
##  info:  
##  fileIndex: 1,2,3,4,5,6,7,8,9,10,11,12 
##  Parameter class: PeakGroupsParam 
##  MS level(s) 1 
## 
## [[4]]
## Object of class "XProcessHistory"
##  type: Peak grouping 
##  date: Wed Feb 20 18:42:44 2019 
##  info:  
##  fileIndex: 1,2,3,4,5,6,7,8,9,10,11,12 
##  Parameter class: PeakDensityParam 
## 
## [[5]]
## Object of class "XProcessHistory"
##  type: Missing peak filling 
##  date: Wed Feb 20 18:42:46 2019 
##  info:  
##  fileIndex: 1,2,3,4,5,6,7,8,9,10,11,12 
##  Parameter class: FillChromPeaksParamIt is also possible to extract specific processing steps by specifying its
type. Available types can be listed with the processHistoryTypes
function. Below we extract the parameter class for the alignment/retention time
adjustment step.
ph <- processHistory(xdata, type = "Retention time correction")
ph## [[1]]
## Object of class "XProcessHistory"
##  type: Retention time correction 
##  date: Wed Feb 20 18:42:26 2019 
##  info:  
##  fileIndex: 1,2,3,4,5,6,7,8,9,10,11,12 
##  Parameter class: PeakGroupsParam 
##  MS level(s) 1And we can also extract the parameter class used in this preprocessing step.
## Access the parameter
processParam(ph[[1]])## Object of class:  PeakGroupsParam 
## Parameters:
##  minFraction: 0.85 
##  extraPeaks: 1 
##  smooth: loess 
##  span: 0.2 
##  family: gaussian 
##  number of peak groups: 39XCMSnEx objects can be subsetted/filtered using the [ method, or one of the
many filter* methods. All these methods aim to ensure that the data in the
returned object is consistent. This means for example that if the object is
subsetted by selecting specific spectra (by using the [ method) all identified
chromatographic peaks are removed. Correspondence results (i.e. identified
features) are removed if the object is subsetted to contain only data from
selected files (using the filterFile method). This is because the
correspondence results depend on the files on which the analysis was performed -
running a correspondence on a subset of the files would lead to different
results.
As an exception, it is possible to force keeping adjusted retention times in the
subsetted object setting the keepAdjustedRtime argument to TRUE in any of
the subsetting methods.
Below we subset our results object the data for the files 2 and 4.
subs <- filterFile(xdata, file = c(2, 4))
## Do we have identified chromatographic peaks?
hasChromPeaks(subs)## [1] TRUEPeak detection is performed separately on each file, thus the subsetted object contains all identified chromatographic peaks from the two files. However, we used a retention time adjustment (alignment) that was based on available features. All features have however been removed and also the adjusted retention times (since the alignment based on features that were identified on chromatographic peaks on all files).
## Do we still have features?
hasFeatures(subs)## [1] FALSE## Do we still have adjusted retention times?
hasAdjustedRtime(subs)## [1] FALSEWe can however use the keepAdjustedRtime argument to force keeping the
adjusted retention times.
subs <- filterFile(xdata, keepAdjustedRtime = TRUE)
hasAdjustedRtime(subs)## [1] TRUEThe filterRt method can be used to subset the object to spectra within a
certain retention time range.
subs <- filterRt(xdata, rt = c(3000, 3500))
range(rtime(subs))## [1] 3000.027 3499.994Filtering by retention time does not change/affect adjusted retention times (also, if adjusted retention times are present, the filtering is performed on the adjusted retention times).
hasAdjustedRtime(subs)## [1] TRUEAlso, we have all identified chromatographic peaks within the specified retention time range:
hasChromPeaks(subs)## [1] TRUErange(chromPeaks(subs)[, "rt"])## [1] 3001.260 3499.994The most natural way to subset any object in R is with [. Using [ on an
XCMSnExp object subsets it keeping only the selected spectra. The index i
used in [ has thus to be an integer between 1 and the total number of spectra
(across all files). Below we subset xdata using both [ and filterFile to
keep all spectra from one file.
## Extract all data from the 3rd file.
one_file <- filterFile(xdata, file = 3)
one_file_2 <- xdata[fromFile(xdata) == 3]
## Is the content the same?
all.equal(spectra(one_file), spectra(one_file_2))## [1] TRUEWhile the spectra-content is the same in both objects, one_file contains also
the identified chromatographic peaks while one_file_2 does not. Thus, in most
situations subsetting using one of the filter functions is preferred over the
use of [.
## Subsetting with filterFile preserves chromatographic peaks
head(chromPeaks(one_file))##         mz mzmin mzmax       rt    rtmin    rtmax      into      intb   maxo
## CP1303 316   316   316 2517.029 2501.379 2535.808  741708.6  727319.6  27816
## CP1304 333   333   333 2515.464 2501.379 2540.503  960445.7  942106.5  33992
## CP1305 338   338   338 2517.029 2501.379 2540.503  788766.4  758962.1  29288
## CP1306 332   332   332 2517.029 2501.379 2545.198 4870388.0 4768520.5 169280
## CP1307 315   315   315 2515.464 2501.379 2540.503 3714288.6 3634444.9 131136
## CP1308 337   337   337 2517.029 2501.379 2549.893 4356732.5 4216751.1 142720
##        sn sample is_filled
## CP1303 47      1         0
## CP1304 42      1         0
## CP1305 35      1         0
## CP1306 92      1         0
## CP1307 95      1         0
## CP1308 83      1         0## Subsetting with [ not
head(chromPeaks(one_file_2))## Warning in .local(object, ...): No chromatographic peaks available.## NULLNote however that also [ does support the keepAdjustedRtime argument. Below
we subset the object to spectra 20:30.
subs <- xdata[20:30, keepAdjustedRtime = TRUE]## Warning in xdata[20:30, keepAdjustedRtime = TRUE]: Removed preprocessing
## resultshasAdjustedRtime(subs)## [1] TRUE## Access adjusted retention times:
rtime(subs)## F01.S0020 F01.S0021 F01.S0022 F01.S0023 F01.S0024 F01.S0025 F01.S0026 
##  2535.977  2537.542  2539.107  2540.672  2542.237  2543.802  2545.367 
## F01.S0027 F01.S0028 F01.S0029 F01.S0030 
##  2546.932  2548.497  2550.062  2551.627## Access raw retention times:
rtime(subs, adjusted = FALSE)## F01.S0020 F01.S0021 F01.S0022 F01.S0023 F01.S0024 F01.S0025 F01.S0026 
##  2531.112  2532.677  2534.242  2535.807  2537.372  2538.937  2540.502 
## F01.S0027 F01.S0028 F01.S0029 F01.S0030 
##  2542.067  2543.632  2545.197  2546.762As with MSnExp and OnDiskMSnExp objects, [[ can be used to extract a
single spectrum object from an XCMSnExp object. The retention time of the
spectrum corresponds to the adjusted retention time if present.
## Extract a single spectrum
xdata[[14]]## Object of class "Spectrum1"
##  Retention time: 42:7 
##  MSn level: 1 
##  Total ion count: 445 
##  Polarity: NAAt last we can also use the split method that allows to split an XCMSnExp
based on a provided factor f. Below we split xdata per file. Using
keepAdjustedRtime = TRUE ensures that adjusted retention times are not
removed.
x_list <- split(xdata, f = fromFile(xdata), keepAdjustedRtime = TRUE)
lengths(x_list)##    1    2    3    4    5    6    7    8    9   10   11   12 
## 1278 1278 1278 1278 1278 1278 1278 1278 1278 1278 1278 1278lapply(x_list, hasAdjustedRtime)## $`1`
## [1] TRUE
## 
## $`2`
## [1] TRUE
## 
## $`3`
## [1] TRUE
## 
## $`4`
## [1] TRUE
## 
## $`5`
## [1] TRUE
## 
## $`6`
## [1] TRUE
## 
## $`7`
## [1] TRUE
## 
## $`8`
## [1] TRUE
## 
## $`9`
## [1] TRUE
## 
## $`10`
## [1] TRUE
## 
## $`11`
## [1] TRUE
## 
## $`12`
## [1] TRUENote however that there is also a dedicated splitByFile method instead for
that operation, that internally uses filterFile and hence does e.g. not remove
identified chromatographic peaks. The method does not yet support the
keepAdjustedRtime parameter and thus removes by default adjusted retention
times.
xdata_by_file <- splitByFile(xdata, f = factor(1:length(fileNames(xdata))))
lapply(xdata_by_file, hasChromPeaks)## $`1`
## [1] TRUE
## 
## $`2`
## [1] TRUE
## 
## $`3`
## [1] TRUE
## 
## $`4`
## [1] TRUE
## 
## $`5`
## [1] TRUE
## 
## $`6`
## [1] TRUE
## 
## $`7`
## [1] TRUE
## 
## $`8`
## [1] TRUE
## 
## $`9`
## [1] TRUE
## 
## $`10`
## [1] TRUE
## 
## $`11`
## [1] TRUE
## 
## $`12`
## [1] TRUEMost methods in xcms support parallel processing. Parallel processing is
handled and configured by the BiocParallel Bioconductor package and can be
globally defined for an R session.
Unix-based systems (Linux, macOS) support multicore-based parallel
processing. To configure it globally we register the parameter class. Note also
that bpstart is used below to initialize the parallel processes.
register(bpstart(MulticoreParam(2)))Windows supports only socket-based parallel processing:
register(bpstart(SnowParam(2)))Note that multicore-based parallel processing might be buggy or failing on
macOS. If so, the DoparParam could be used instead (requiring the foreach
package).
For other options and details see the vignettes from the BiocParallel package.
1. Saghatelian A, Trauger SA, Want EJ, Hawkins EG, Siuzdak G, Cravatt BF: Assignment of endogenous substrates to enzymes by global metabolite profiling. Biochemistry 2004, 43:14332–9.
2. Tautenhahn R, Böttcher C, Neumann S: Highly sensitive feature detection for high resolution LC/MS. BMC Bioinformatics 2008, 9:504.
3. Smith R, Ventura D, Prince JT: LC-MS alignment in theory and practice: a comprehensive algorithmic review. Briefings in bioinformatics 2013, 16:bbt080–117.
4. Prince JT, Marcotte EM: Chromatographic alignment of ESI-LC-MS proteomics data sets by ordered bijective interpolated warping. Analytical chemistry 2006, 78:6140–6152.
5. Smith CA, Want EJ, O’Maille G, Abagyan R, Siuzdak G: XCMS: processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching, and identification. Analytical chemistry 2006, 78:779–787.