---
title: "Real Time Annotation"
date: "2020-10-11"
package: peakPantheR
output:
    BiocStyle::html_document:
        toc_float: true
vignette: >
    %\VignetteIndexEntry{Real Time Annotation}
    %\VignetteEngine{knitr::rmarkdown}
    %\VignetteEncoding{UTF-8}
    %\VignetteDepends{peakPantheR,faahKO,pander,BiocStyle}
    %\VignettePackage{peakPantheR}
    %\VignetteKeywords{mass spectrometry, metabolomics}
---

```{r biocstyle, echo = FALSE, results = "asis" }
BiocStyle::markdown()
```
```{r, echo = FALSE}
knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>"
)
```

**Package**: `r Biocpkg("peakPantheR")`<br />
**Authors**: Arnaud Wolfer<br />

```{r init, message = FALSE, echo = FALSE, results = "hide" }
## Silently loading all packages
library(BiocStyle)
library(peakPantheR)
library(faahKO)
library(pander)
```

# Introduction

The `peakPantheR` package is designed for the detection, integration and 
reporting of pre-defined features in MS files (_e.g. compounds, fragments, 
adducts, ..._).

The **Real Time Annotation** is set to detect and integrate **multiple** 
compounds in **one** file at a time. 
It therefore can be deployed on a LC-MS instrument to integrate a set of 
pre-defined features (_e.g. spiked standards_) as soon as the acquisition of a 
sample is completed.

Using the `r Biocpkg("faahKO")` raw MS dataset as an example, this vignette 
will:

* Detail the **Real Time Annotation** concept
* Apply the **Real Time Annotation** to a subset of pre-defined features in the 
`r Biocpkg("faahKO")` dataset

## Abbreviations
- **ROI**: _Regions Of Interest_ 
    * reference _RT_ / _m/z_ windows in which to search for a feature
- **uROI**: _updated Regions Of Interest_
    * modifed ROI adapted to the current dataset which override the reference 
    ROI
- **FIR**: _Fallback Integration Regions_ 
    * _RT_ / _m/z_ window to integrate if no peak is found
- **TIC**: _Total Ion Chromatogram_ 
    * the intensities summed across all masses for each scan
- **EIC**: _Extracted Ion Chromatogram_ 
    * the intensities summed over a mass range, for each scan


# Real Time Annotation Concept

Real time compound integration is set to process **multiple** compounds in 
**one** file at a time.

To achieve this, `peakPantheR` will:

* load a list of expected _RT_ / _m/z_ regions of interest (**ROI**)
* detect features in each ROI and keep the highest intensity one
* determine peak statistics for each feature
* return:
    + TIC
    + a table with all detected compounds for that file (_row: compound, col: 
    statistic_)
    + EIC for each ROI
    + sample acquisition date-time from the mzML metadata (_if available_)
    + save EIC plots to disk


# Real Time Annotation Example

In the following example we will target two pre-defined features in a single raw
MS spectra file from the `r Biocpkg("faahKO")` package. For more details on the 
installation and input data employed, please consult the 
[Getting Started with peakPantheR](getting-started.html) vignette.


## Input Data

The path to a MS file from the `r Biocpkg("faahKO")` is located and used as 
input spectra:
```{r}
library(faahKO)
## file paths
input_spectraPath  <- c(system.file('cdf/KO/ko15.CDF', package = "faahKO"))
input_spectraPath
```


Two targeted features (_e.g. compounds, fragments, adducts, ..._) are defined 
and stored in a table with as columns:

* `cpdID` (character)
* `cpdName` (character)
* `rtMin` (sec)
* `rtMax` (sec)
* `rt` (sec, optional / `NA`)
* `mzMin` (m/z)
* `mzMax` (m/z)
* `mz` (m/z, optional / `NA`)

```{r, eval = FALSE}
# targetFeatTable
input_targetFeatTable <- data.frame(matrix(vector(), 2, 8, dimnames=list(c(), 
                        c("cpdID", "cpdName", "rtMin", "rt", "rtMax", "mzMin", 
                        "mz", "mzMax"))), stringsAsFactors=FALSE)
input_targetFeatTable[1,] <- c("ID-1", "Cpd 1", 3310., 3344.888, 3390., 
                                522.194778, 522.2, 522.205222)
input_targetFeatTable[2,] <- c("ID-2", "Cpd 2", 3280., 3385.577, 3440., 
                                496.195038, 496.2, 496.204962)
input_targetFeatTable[,c(3:8)] <- sapply(input_targetFeatTable[,c(3:8)], 
                                            as.numeric)
```

```{r, results = "asis", echo = FALSE}
# use pandoc for improved readability
input_targetFeatTable <- data.frame(matrix(vector(), 2, 8, dimnames=list(c(), 
                        c("cpdID", "cpdName", "rtMin", "rt", "rtMax", "mzMin", 
                        "mz", "mzMax"))), stringsAsFactors=FALSE)
input_targetFeatTable[1,] <- c("ID-1", "Cpd 1", 3310., 3344.888, 3390., 
                                522.194778, 522.2, 522.205222)
input_targetFeatTable[2,] <- c("ID-2", "Cpd 2", 3280., 3385.577, 3440., 
                                496.195038, 496.2, 496.204962)
input_targetFeatTable[,c(3:8)] <- sapply(input_targetFeatTable[,c(3:8)], 
                                            as.numeric)
rownames(input_targetFeatTable) <- NULL
pander::pandoc.table(input_targetFeatTable, digits = 9)
```


## Run Single File Annotation

`peakPantheR_singleFileSearch()` takes as input a `singleSpectraDataPath` 
pointing to the file to process and `targetFeatTable` defining the features to 
integrate. The resulting annotation contains all the fitting and integration 
properties:
```{r}
library(peakPantheR)
annotation <- peakPantheR_singleFileSearch(
                                    singleSpectraDataPath = input_spectraPath,
                                    targetFeatTable = input_targetFeatTable,
                                    peakStatistic = TRUE,
                                    curveModel = 'skewedGaussian',
                                    verbose = TRUE)
```
```{r}
annotation$TIC
```
```{r}
## acquisition time cannot be extracted from NetCDF files
annotation$acquTime
```
```{r, eval = FALSE}
annotation$peakTable
```
```{r, results = "asis", echo = FALSE}
# use pandoc for improved readability
pander::pandoc.table(annotation$peakTable, digits = 7)
```
```{r}
annotation$curveFit
```
```{r}
annotation$ROIsDataPoint
```


`peakPantheR_singleFileSearch()` takes multiple parameters that can alter the 
file annotation:

* `peakStatistic` if `TRUE` calculates additional peak statistics: 
_'ppm_error'_, _'rt_dev_sec'_, _'tailing factor'_ and _'asymmetry factor'_
* `plotEICsPath` if not `NA` will save a `.png` of all ROI EICs at the path 
provided (expects `'filepath/filename.png'` for example). If `NA` no plot is 
saved
* `getAcquTime` if `TRUE` the sample acquisition date-time is extracted from the
`mzML` metadata. Acquisition time cannot be extracted from other file formats. 
The additional file access will impact run time
* `FIR` if not `NULL`, defines the Fallback Integration Regions (**FIR**) to 
integrate when a feature is not found.
* `curveModel`, defines the peak-shape model to fit to each EIC. By default,
a _'skewedGaussian'_ model is used. The other alternative is the exponentially
modified gaussian _'emgGaussian'_ model.
* `verbose` if `TRUE` messages calculation progress, time taken and number of 
features found (_total and matched to targets_)
* `...` passes arguments to `findTargetFeatures` to alter peak-picking 
parameters (e.g. the curveModel, the sampling or fitting parameters)



The summary plot generated by `plotEICsPath`, corresponding to the EICs of each 
integrated regions of interest is as follow:
```{r, out.width = "700px", echo = FALSE}
knitr::include_graphics("../man/figures/singleFileSearch_EICsPlot.png")
```

> EICs plot: Each panel correspond to a targeted feature, with the EIC extracted
on the `mzMin`, `mzMax` range found. The red dot marks the RT peak apex, and the
red line highlights the RT peakwidth range found (`rtMin`, `rtMax`)



# See Also

* [Getting Started with peakPantheR](getting-started.html)
* [Parallel Annotation](parallel-annotation.html)
* [Graphical user interface use](peakPantheR-GUI.html)


# Session Information
```{r, echo = FALSE}
devtools::session_info()
```