Data analysis of GC-MS in Environmental Science

Miao Yu

2016-11-30

Analysis is at the core of environmental science. When we want to know the amounts of certain pollutants, we employ analytical methods, and GC-MS is probably the most popular instrument for developing them. Earlier packages such as xcms were developed to process GC-MS data. However, such packages offer limited functions for environmental analysis. For example, almost all GC-MS data analysis software ignores the isotope information of the compounds. In this package, I added functions for that kind of data analysis. Such features can not only reveal certain problems but also help the user find unknown patterns in a GC-MS dataset.

Data structure of GC-MS full scan mode

GC is used for separation and MS is used for detection in a GC-MS system. The signals are the intensities of certain masses. When we run a sample on a column in full scan mode, the counts of the different masses are collected in each cycle. Each cycle might last only 500 ms or less, and then the next scan begins. We can use a matrix to represent such data: each column stands for a mass and each row stands for the retention time of a scan. Such a matrix can be treated as time series data. In this package, full-scan data are stored as a matrix.
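The layout described above can be sketched with a small toy matrix in base R. The retention times, masses, and counts below are made up, and the orientation follows the text (scans in rows, masses in columns); the object returned by getmd may be oriented differently, so it is worth checking dim() and dimnames() on real data.

```r
# Toy full-scan data: 5 scans (rows) by 4 nominal masses (columns).
# All values are invented for illustration only.
rt <- c(3.10, 3.11, 3.12, 3.13, 3.14)   # retention time of each scan, in minutes
mz <- c(100, 101, 102, 103)             # nominal masses

set.seed(1)
scans <- matrix(rpois(length(rt) * length(mz), lambda = 50),
                nrow = length(rt), ncol = length(mz),
                dimnames = list(rt = as.character(rt), mz = as.character(mz)))

dim(scans)        # number of scans, number of masses
scans["3.12", ]   # the mass spectrum recorded at 3.12 min
scans[, "101"]    # the intensity trace of m/z 101 over time
```

Reading a row gives one mass spectrum, and reading a column gives one time series, which is why the same matrix serves both views.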

For high-resolution MS, building such a matrix is tricky. We need to convert the raw data into a low-resolution version so that every scan has identical columns.
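One way to picture this conversion is to bin the accurate masses of each scan onto a fixed grid. The sketch below rounds to nominal mass, which is an assumption chosen for illustration and not necessarily the bin width or method the package uses; the data frame and its values are hypothetical.

```r
# One hypothetical high-resolution scan: accurate masses and their intensities.
scan <- data.frame(mz = c(100.02, 100.04, 101.03, 102.98, 103.01),
                   intensity = c(10, 5, 20, 7, 3))

# Round each accurate mass to its nominal mass and sum intensities per bin.
scan$bin <- round(scan$mz)
binned <- tapply(scan$intensity, scan$bin, sum)

# Place the result on a fixed mass grid so every scan yields the same columns;
# masses that were not observed in this scan stay at zero.
grid <- 100:104
scan_row <- setNames(numeric(length(grid)), grid)
scan_row[names(binned)] <- binned
scan_row
```

Repeating this for every scan and stacking the rows with rbind gives a matrix with identical columns, at the cost of losing the accurate-mass information.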

Data structure of GC-MS SIM mode

In selected ion monitoring (SIM) mode, only a few masses are collected, and each mass has counts and retention times that form a time series. In this package, such data are stored as a data.frame.
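A minimal sketch of what such a data.frame could look like is shown below. The column names (rt, mz, counts) and all values are assumptions made for illustration; the columns the package actually uses may differ.

```r
# Hypothetical SIM data: two monitored ions, three time points each.
sim <- data.frame(
  rt     = rep(c(3.10, 3.11, 3.12), times = 2),  # retention time, minutes
  mz     = rep(c(326, 328), each = 3),           # monitored masses
  counts = c(120, 480, 150, 115, 470, 140)       # detector counts
)

# Time series of a single monitored ion.
sim[sim$mz == 326, c("rt", "counts")]

# Wide view: one row per retention time, one column per ion.
xtabs(counts ~ rt + mz, data = sim)
```

The long format keeps each ion's own time axis, which is convenient when the monitored ions are not sampled at exactly the same times.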

Data input

You can use getmd to import mass spectrum data in the formats supported by xcms, for example:

data <- getmd('data/data1.CDF')

You can subset the data by mass range (m/z 100-1000) and retention time range (3.1-25 min) with the getsubmd function:

data <- getsubmd(data,rt=c(3.1,25),mz=c(100,1000))
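The idea behind this kind of subsetting can be sketched on a plain matrix in base R. This is not the implementation of getsubmd; the toy matrix, the row and column layout, and the inclusive bounds are all assumptions.

```r
# Toy matrix with scans in rows and masses in columns (invented values).
rt <- c(2.9, 3.1, 10.0, 25.0, 25.2)     # minutes
mz <- c(90, 100, 500, 1000, 1010)
scans <- matrix(seq_len(25), nrow = 5,
                dimnames = list(as.character(rt), as.character(mz)))

# Keep scans within 3.1-25 min and masses within m/z 100-1000 (bounds inclusive).
keep_rt <- rt >= 3.1 & rt <= 25
keep_mz <- mz >= 100 & mz <= 1000
sub <- scans[keep_rt, keep_mz, drop = FALSE]

dim(sub)
dimnames(sub)
```

Using drop = FALSE keeps the result a matrix even when only one scan or one mass survives the filter.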

You can combine full-scan data that cover the same retention time range with combinemd:

data <- combinemd(data1,data2,data3)

Basic visualization of mass spectrum data

You can plot the Total Ion Chromatogram (TIC) for a certain RT and mass range:

plottic(data,rt=c(3.1,25),ms=c(100,1000))

You can also plot the mass spectrum for a certain RT range. The returned MSP file can be used for a NIST search:

plotrtms(data,rt=c(3.1,25),ms=c(100,1000))

The Extracted Ion Chromatogram (EIC) is also supported by enviGCMS, and the returned data can be analyzed for molecular isotopes:

plotmsrt(data,ms=500,rt=c(3.1,25))
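The TIC, the mass spectrum over an RT range, and the EIC are three summaries of the same scan matrix. The base R sketch below shows the arithmetic on a toy matrix; it assumes scans in rows and masses in columns and does not reproduce what the plotting functions do internally.

```r
# Toy full-scan matrix (invented values): rows are scans, columns are masses.
rt <- c(3.1, 3.2, 3.3, 3.4, 3.5)        # minutes
mz <- 100:103
scans <- matrix(c( 5,  1, 0, 2,
                   9,  3, 1, 2,
                  30, 12, 4, 3,
                   8,  2, 1, 2,
                   4,  1, 0, 1),
                nrow = 5, byrow = TRUE,
                dimnames = list(as.character(rt), as.character(mz)))

# TIC: sum over all masses for each scan.
tic <- rowSums(scans)

# Mass spectrum for an RT window: average intensity of each mass over the window.
in_window <- rt >= 3.2 & rt <= 3.4
spectrum <- colMeans(scans[in_window, , drop = FALSE])

# EIC: the intensity trace of a single mass.
eic <- scans[, "100"]
rt[which.max(eic)]   # retention time at the apex of that trace
```

Plotting tic or eic against rt with type = "l" gives the usual chromatogram view, and plotting spectrum against mz with type = "h" gives a stick spectrum.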

You can use plotms to show a heatmap of the GC-MS data, which is very useful for exploratory data analysis:

plotms(data)

You can convert the retention time into temperature if the oven temperature rises at a constant rate, but you need to supply the temperature range:

plott(data,temp = c(100,320))
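With a constant heating rate, the conversion is a linear interpolation between the two ends of the temperature range. The sketch below assumes that the ramp starts at the first scan and ends at the last one; how plott maps the axis internally is not confirmed here, and the retention times are invented.

```r
# Hypothetical retention times (minutes) and the oven temperature range.
rt <- c(3.1, 8.575, 14.05, 19.525, 25)
temp_range <- c(100, 320)

# Linear mapping: the first scan gets the start temperature,
# the last scan gets the end temperature.
frac <- (rt - min(rt)) / (max(rt) - min(rt))
temp <- temp_range[1] + frac * diff(temp_range)
temp
```

If the method includes an isothermal hold before or after the ramp, this mapping only holds for the ramp segment, so the RT range should be trimmed to it first.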

Data analysis for influences from GC-MS

enviGCMS supplies many functions for reducing noise during the analysis. findline finds the line of the boundary regression model for the noise. comparems makes a point-to-point subtraction of two full-scan mass spectrum datasets. plotgroup converts the data matrix into a 0-1 heatmap according to a threshold. plotsub shows the self-background subtraction of full-scan data. plotsms shows the relative standard deviation (RSD) of the intensities of full-scan data. plothist shows the distribution of the intensities of full-scan data as a histogram.
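Three of these operations are simple matrix arithmetic, which the base R sketch below illustrates on toy data. It is not the code of comparems, plotgroup, or plotsms; the two matrices, the clamping of negative differences to zero, and the threshold value are assumptions.

```r
# Two hypothetical full-scan matrices of the same shape (scans x masses).
sample <- matrix(c(12, 3,
                   40, 5,
                   11, 2), nrow = 3, byrow = TRUE)
blank  <- matrix(c(10, 4,
                   10, 3,
                   10, 3), nrow = 3, byrow = TRUE)

# Point-to-point subtraction; negative differences are set to zero (assumption).
cleaned <- pmax(sample - blank, 0)

# 0-1 matrix: 1 where the cleaned intensity exceeds a chosen threshold.
threshold <- 1
binary <- (cleaned > threshold) * 1

# RSD (%) of the intensity of each mass across scans.
rsd <- apply(sample, 2, sd) / colMeans(sample) * 100

cleaned
binary
rsd
```

A mass with a high RSD across scans carries a real chromatographic signal, while a mass with a low RSD behaves more like a constant background.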

Data analysis for molecular isotope ratio

Some functions can be used to calculate the molecular isotope ratio. EIC data can be passed to GetIntegration, which returns information about the peaks found. Getisotopologues calculates the molecular isotope ratio of a certain molecule. Shortcut functions such as batch and qbatch calculate the molecular isotope ratio for multiple molecules or a single molecule in EIC data.
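The underlying calculation can be sketched as the ratio of two integrated EIC peaks. The example below uses a hypothetical helper, trapz, and invented traces for a light and a heavy isotopologue; it skips peak detection and baseline correction, which a real integration routine would need, and it does not reproduce the package functions.

```r
# Hypothetical EICs of two isotopologues over the same peak window.
rt        <- seq(10.0, 10.5, by = 0.1)        # minutes
eic_light <- c(0, 100, 400, 400, 100, 0)      # lighter isotopologue
eic_heavy <- c(0,  50, 200, 200,  50, 0)      # heavier isotopologue

# Trapezoidal integration of y over x (helper defined here for illustration).
trapz <- function(x, y) {
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

area_light <- trapz(rt, eic_light)
area_heavy <- trapz(rt, eic_heavy)

# Molecular isotope ratio expressed as heavy over light peak area.
ratio <- area_heavy / area_light
ratio
```

Using peak areas instead of apex heights makes the ratio less sensitive to a single noisy scan, provided both traces are integrated over the same retention time window.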

Summary

In general, enviGCMS can be used to explore GC-MS data and extract certain patterns from them.