---
title: "Finding local maxima with MassSpecWavelet"
author:
- name: Sergio Oller
  affiliation:
  - &id Institute for Bioengineering of Catalonia, Barcelona, Spain
package: MassSpecWavelet
output:
  BiocStyle::html_document
vignette: >
  %\VignetteIndexEntry{Finding local maxima}
  %\VignetteKeywords{Peak Detection}
  %\VignetteKeywords{Wavelet}
  %\VignetteKeywords{MassSpecWavelet}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

# Introduction

As described in the introductory vignette, one of the steps of the peak detection
process in the MassSpecWavelet package relies on the detection of local maxima
on the wavelet coefficients.

On some benchmarks (not shown), this local maxima detection step accounted for the largest
computational cost of the MassSpecWavelet peak detection algorithm.

MassSpecWavelet version 1.63.3 provides three algorithms for detecting local
maxima, named "classic", "faster" and "new".

- "classic" is the algorithm found in previous versions. As a reference implementation.
- "faster" is a new and faster implementation of the "classic" algorithm. It 
  gives identical results to "classic" in less time. It's the default algorithm.
- "new" is a new algorithm for local maxima detection, that has different behaviour
  and therefore gives different results. It is experimental and not used by default.

You can switch between the algorithms by setting the `"MassSpecWavelet.localMaximum.algorithm"`
option as follows:

```
options("MassSpecWavelet.localMaximum.algorithm" = "classic")
```

In a future version, we may remove "classic", replacing it with "faster" since
they both use the exact same criteria. The algorithm option is experimental, and in
future versions we may change the API or the algorithm names. While the
MassSpecWavelet package has a very stable API, this stability does not apply
to this option. Feedback is very much welcome on https://github.com/zeehio/MassSpecWavelet/issues/.


The `"new"` algorithm addresses some issues found in the "classic" algorithm, as
well as in its "faster" implementation. In this vignette:

- We show the issues of the classic `localMaximum()` detection algorithm
- We describe the new algorithm
- We compare the required CPU time of the three implementations

We are still exploring the impact of this new `localMaxima()` algorithm on the
peak detection pipeline provided by MassSpecWavelet. For now, the peak
detection pipeline uses the `"faster"` algorithm.

If you want to try using the `"new"` local maxima algorithm within the
peak detection pipeline, you can use
`options("MassSpecWavelet.localMaximum.algorithm" = "new")`. Consider that
experimental though.


```{r}
library(MassSpecWavelet)
```


# Problems with the classic algorithm

The `"classic"` `localMaximum` algorithm implements the local maxima detection
using a two partially overlapping non-sliding windows: One that starts at the beginning of
the signal, and another one starting at half the window size.

The documentation reflected this behavior in the Details section:

> Instead of find the local maximum by a slide window, which slide all possible
> positions, we find local maximum by transform the vector as matrix, then get
> the the maximum of each column. This operation is performed twice with 
> vecctor shifted half of the winSize. The main purpose of this is
> to increase the efficiency of the algorithm.

While it's true that this makes the algorithm faster, it does so at the expense of
missing some local maxima under some conditions. See for instance the following artificial signal:


```{r}
onetwothree <- c(1,2,3)
x <- c(0, onetwothree, onetwothree, 0, 0, 0, onetwothree, 1, onetwothree, onetwothree, 0)
plot(x, type = "o", main = "Five peaks are expected")
```

If we use a sliding window of four points we can detect the five peaks (if you
actually look at each maximum you will be able to draw some window that contains
the maximum and it has at least four points inside). So when we
use the new algorithm:

```{r}
options("MassSpecWavelet.localMaximum.algorithm" = "new")
local_max <- which(localMaximum(x, winSize = 4) > 0)
plot(x, type = "o", main = "With the new algorithm, 5/5 peaks are found")
points(local_max, x[local_max], col = "red", pch = 20)

```
However with the classic algorithm, there are two peaks we are missing:

```{r}
options("MassSpecWavelet.localMaximum.algorithm" = "classic")
local_max <- which(localMaximum(x, winSize = 4) > 0)
plot(x, type = "o", main = "With the classic algorithm, 3/5 peaks are found")
points(local_max, x[local_max], col = "red", pch = 20)
```

While this is less likely to happen with longer window sizes, and the default window
size in the peak detection is `2*scale+1`, this efficiency shortcut may have 
some actual impact on the peak detection of MassSpecWavelet.

I haven't measured how often this happens in real mass spectrometry samples. Feel
free to compare the results and get back to me, I guess the impact will be rather
small, otherwise this issue would have been noticed earlier.

Benchmarking the performance of the MassSpecWavelet peak detection algorithm, this
local maximum detection was also the most computationally intensive part in my
tests, so finding a more efficient (and more correct) algorithm seemed a worth
pursuing goal to improve the package.

# The new local maximum algorithm

We want an algorithm to detect local maximum using a sliding window. Since we
are putting effort in correctness, we want to define it properly.

A point in the signal will be reported as a local maximum if one of these
conditions are met:

- We can draw some window of size `winSize` that (a) contains our point (b) our
  point is not in the border of the window and (c) is the largest value of the
  window.

- If a point is part of a plateau of constant values, the center of the plateau
  will be considered as the maximum. If the plateau has an even size, it will be
  the first of the two points that could be the center of the plateau.

To give an example of these plateaus (very unlikely to happen with floating point
values) see this synthetic example and how the two algorithms behave:

```{r}
x <- c(0, 1, 2, 3, 3, 3, 3, 2, 1, 0, 3, 0, 3, 3, 3, 3, 3, 0)
x <- c(x, 0, 1, 2, 3, 3, 3, 2, 1, 0, 3, 0, 0, 0, 3, 3, 3, 0, 0)
options("MassSpecWavelet.localMaximum.algorithm" = "classic")
local_max_classic <- which(localMaximum(x, winSize = 5) > 0)
options("MassSpecWavelet.localMaximum.algorithm" = "new")
local_max_new <- which(localMaximum(x, winSize = 5) > 0)
par(mfrow = c(2, 1))
plot(x, type = "o", main = "With the classic algorithm, 2/6 peaks are found")
points(local_max_classic, x[local_max_classic], col = "red", pch = 20)
plot(x, type = "o", main = "With the new algorithm, 6/6 peaks are found")
points(local_max_new, x[local_max_new], col = "blue", pch = 20)

```

While these plateaus are corner cases in real scenarios very unlikely to happen,
it is worth having a well-defined and well-behaved implementation.

# Computational cost

Figure \@ref(fig:benchmark) shows the computational time to find the local maxima
using the `"new"` algorithm and the `"classic"` algorithm (in both the `"classic"` and
the `"faster"` implementations).

```{r benchmark, fig.cap="CPU time vs length, for several algorithms and windows"}
# Run this interactively
set.seed(5413L)
winSizes <- c(5, 31, 301)
xlengths <- c(20, 200, 2000, 20000, 200000)
out <- vector("list", length(winSizes) * length(xlengths))
i <- 0L
for (winSize in winSizes) {
    for (xlength in xlengths) {
        i <- i + 1L
        x <- round(10*runif(xlength), 1)*10
        bm1 <- as.data.frame(
                bench::mark(
                classic = {
                    options(MassSpecWavelet.localMaximum.algorithm = "classic")
                    localMaximum(x, winSize = winSize)
                    
                },
                faster = {
                    options(MassSpecWavelet.localMaximum.algorithm = "faster")
                    localMaximum(x, winSize = winSize)
                    
                },
                check = TRUE,
                filter_gc = FALSE,
                time_unit = "ms"
            )
        )
        bm2 <- as.data.frame(
                bench::mark(
                new = {
                    options(MassSpecWavelet.localMaximum.algorithm = "new")
                    localMaximum(x, winSize = winSize)
                    
                },
                check = FALSE,
                filter_gc = FALSE,
                time_unit = "ms"
            )
        )
        out[[i]] <- data.frame(
            algorithm = c(as.character(bm1$expression), as.character(bm2$expression)),
            min_cpu_time_ms = c(as.numeric(bm1$min), as.numeric(bm2$min)),
            xlength = xlength,
            winSize = winSize
        )
    }
}
out2 <- do.call(rbind, out)
plot(
    x = out2$xlength,
    y = out2$min_cpu_time_ms,
    col = ifelse(out2$algorithm == "new", "red", ifelse(out2$algorithm == "faster", "darkgreen", "blue")),
    pch = ifelse(out2$winSize == 5, 1, ifelse(out2$winSize == 31, 2, 3)),
    log = "xy",
    xlab = "Signal length",
    ylab = "Min CPU time (ms)"
)
legend(
    "topleft",
    legend = c("New algorithm", "Classic algorithm", "Faster algorithm",
               "winSize = 5", "winSize = 31", "winSize = 301"),
    lty = c(1,1,1, 0, 0, 0),
    pch = c(NA, NA, NA,1, 2, 3),
    col = c("red", "blue", "darkgreen", "black", "black", "black")
)
```

# Comparison on a real signal

Load the data, get the wavelet coefficients:

```{r}
data(exampleMS)
scales <- seq(1, 64, 1)
wCoefs <- cwt(exampleMS, scales = scales, wavelet = "mexh")
```

Let us focus on a small range of our signal, with one peak on the left and two
peaks on the right, closer to each other:

```{r}
plotRange <- c(8000, 9000)
```

```{r}
plot(exampleMS, xlim = plotRange, type = "l")
```

When plotting the wavelet coefficients we see that small scales see both peaks
separated and larger scales merge them together. This is standard behaviour.

```{r}
matplot(
    wCoefs[,ncol(wCoefs):1], 
    type = "l",
    col = rev(rainbow(64, start = 0.7, end = 0.1, alpha = 0.5)[scales]),
    lty = 1,
    xlim = c(8000, 9000),
    xlab = "m/z index",
    ylab = "CWT coefficients"
)
legend(
    x = "topright",
    legend = sprintf("scales = %d", scales[seq(1, length(scales), length.out = 4)]),
    lty = 1,
    col = rainbow(64, start = 0.7, end = 0.1)[scales[seq(1, length(scales), length.out = 4)]]
)

```

Find local maxima on the wavelet coefficients and plot them:

```{r }
options(MassSpecWavelet.localMaximum.algorithm = "new")
localMax_new <- getLocalMaximumCWT(wCoefs)
options(MassSpecWavelet.localMaximum.algorithm = "classic")
localMax_classic <- getLocalMaximumCWT(wCoefs)
```


```{r localMax, width=10, height=8,fig.align='center',fig.cap="Identified local maxima of CWT coefficients at each scale"}
par(mfrow = c(2,1))
plotLocalMax(localMax_classic, NULL, range=c(plotRange[1], plotRange[2]), main = "classic")
plotLocalMax(localMax_new, NULL, range=c(plotRange[1], plotRange[2]), main = "new")
```

The new algorithm picks more maxima than the classic one, but for this signal
there is not much difference in performance for our relevant peaks.

To be explored is how this new algorithm affects the whole pipeline in a larger
range of mass spectrometry signals, where we may find a larger dynamic range of
peaks. Once this full exploration is finished, we may discuss whether to change
the `localMaximum()` algorithm or not.


# Session Information

```{r}
sessionInfo()
```