---
title: "_Cardinal 3_: Statistical methods for mass spectrometry imaging"
author: "Kylie Ariel Bemis"
date: "Revised: April 24, 2024"
output:
  BiocStyle::html_document:
  toc: true
vignette: >
  %\VignetteIndexEntry{2. Cardinal 3: Statistical methods for mass spectrometry imaging}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r style, echo=FALSE, results='asis'}
BiocStyle::markdown()
```

```{r setup, echo=FALSE, message=FALSE}
library(Cardinal)
setCardinalVerbose(FALSE)
```

# Introduction 

*Cardinal 3* provides statistical methods for both supervised and unsupervised analysis of mass spectrometry (MS) imaging experiments. Class comparison can also be performed, provided an appropriate experimental design and sample size.

Before statistical analysis, it is important to identify the statistical goal of the experiment:

- __Unsupervised analysis__. The data has no class labels or conditions, and we are interested in exploratory analysis to *discover* regions of interest in the data.

- __Supervised analysis__. The data has class labels and we want to train a statistical or machine learning model to *predict* the class labels of new data.

- __Class comparison__. The data has class labels or conditions, and we want to *test* whether the abundance of the mass features is different between conditions.

*CardinalWorkflows* provides real experimental data and more detailed discussion of the statistical methods than will be covered in this brief overview.


# Exploratory analysis

Suppose we are exploring an unlabeled dataset, and wish to understand the structure of the data.

```{r eda-data, fig.height=4, fig.width=5, fig.align='center'}
set.seed(2020)
mse <- simulateImage(preset=2, dim=c(32,32), sdnoise=0.5,
	peakheight=c(2,4), representation="centroid")

mse$design <- makeFactor(circle=mse$circle,
	square=mse$square, bg=!(mse$circle | mse$square))

image(mse, "design")
```

```{r eda-image, fig.height=4, fig.width=9}
image(mse, i=c(5, 13, 21), layout=c(1,3))
```

## Principal components analysis (PCA)

Principal components analysis is an unsupervised dimension reduction technique. It reduces the data to some number of "principal components" that are a linear combination of the original mass features, where each component is orthogonal to the last, and explains as much of the variance in the data as possible.

Use `PCA()` to perform PCA on a `MSImagingExperiment`.

```{r pca}
pca <- PCA(mse, ncomp=3)
pca
```

We can see that the first 2 principal components explain most of the variation in the data.

```{r pca-image, fig.height=4, fig.width=9}
image(pca, type="x", superpose=FALSE, layout=c(1,3), scale=TRUE)
```

The loadings of the components show how each feature contributes to each component.

```{r pca-loadings, fig.height=3, fig.width=9}
plot(pca, type="rotation", superpose=FALSE, layout=c(1,3), linewidth=2)
```

Plotting the principal component scores against each other is a useful way of visualization the separation between data classes.

```{r pca-scores, fig.height=4, fig.width=9}
plot(pca, type="x", groups=mse$design, linewidth=2)
```

## Non-negative matrix factorization (NMF)

Non-negative matrix factorization is a popular alternative to PCA when the data is naturally non-negative. The main difference between PCA and NMF is that, for NMF, all of the loadings are required to be non-negative.

Use `NMF()` to perform NMF on a `MSImagingExperiment`.

```{r nmf}
nmf <- NMF(mse, ncomp=3)
nmf
```

We can see that NMF can pick up the variation somewhat better when the data is non-negative, as is the case for mass spectra. As before, we still only need 2 components.

```{r nmf-image, fig.height=4, fig.width=9}
image(nmf, type="x", superpose=FALSE, layout=c(1,3), scale=TRUE)
```

As with PCA, the loadings of the NMF components show how each feature contributes to each component. The NMF components can be easier to interpret as they must be non-negative.

```{r nmf-loadings, fig.height=3, fig.width=9}
plot(nmf, type="activation", superpose=FALSE, layout=c(1,3), linewidth=2)
```

Plotting the principal component scores against each other is a useful way of visualization the separation between data classes.

```{r nmf-scores, fig.height=4, fig.width=9}
plot(nmf, type="x", groups=mse$design, linewidth=2)
```


## Feature colocalization

Finding other mass features colocalized with a particular image is a common task in analysis of MS imaging experiments.

Use `colocalize()` to find mass features that are colocalized with another image.

```{r colocalized}
coloc <- colocalized(mse, mz=1116)
coloc
```

By default, Pearson correlation is used to rank the colocalized features. Manders overlap coefficient (MOC), colocalization coefficients (M1 and M2), and Dice scores are also provided.

```{r colocalized-images, fig.height=4, fig.width=9}
image(mse, mz=coloc$mz[1:3], layout=c(1,3))
```


# Image segmentation

Segmentation (clustering) a dataset is a useful way to summarize an MS imaging experiment and discover regions of interest within the sample.

## Spatial shrunken centroids clustering

Spatially-aware nearest shrunken centroids clustering allows simultaneous image segmentation and feature selection.

A smoothing radius `r`, initial number of clusters `k`, and sparsity parameters `s` must be provided.

The larger the sparsity parameter `s`, the fewer mass features will contribute to the segmentation.

Spatial shrunken centroids may result in fewer clusters than the initial number of clusters `k`, so it is recommended to use a value for `k` that is larger than the expected number of clusters, and allow the method to automatically choose the number of clusters.

```{r ssc-clustering}
set.seed(2020)
ssc <- spatialShrunkenCentroids(mse, r=1, k=3, s=c(0,6,12,18))
ssc
```

Plotting the predicted cluster probabilities shows a clear segmentation into the ground truth image.

```{r ssc-image, fig.height=4, fig.width=9}
image(ssc, i=2:4, type="probability", layout=c(1,3))
```

Spatial shrunken centroids calculates t-statistics for each segment and each mass feature. These t-statistics a measure of the difference between the cluster center and the global mean.


```{r ssc-statistic, fig.height=3, fig.width=9}
plot(ssc, i=2:4, type="statistic", layout=c(1,3),
	linewidth=2, annPeaks="circle")
```

Mass features with t-statistics of zero do not contribute to the segmentation. The sign of the t-statistic indicates whether the mass feature is over- or under-expressed in the given cluster relative to the global mean.

Use `topFeatures()` to rank mass features by t-statistic.


```{r ssc-top, fig.height=4, fig.width=9}
ssc_top <- topFeatures(ssc[[4L]])
ssc_top

ssc_top_cl3 <- subset(ssc_top, class==3)
image(mse, mz=ssc_top_cl3$mz[1:3], layout=c(1,3))
```


## Spatial Dirichlet Gaussian mixture modeling

Spatially-aware Dirichlet Gaussian mixture models (spatial-DGMM) is a method of image segmentation applied to each mass feature individually, rather than the dataset as a whole.

This is useful for summarizing molecular ion images, and for discovering structures that clustering using all mass features together may miss.

```{r dgmm}
set.seed(2020)
dgmm <- spatialDGMM(mse, r=1, k=3, weights="adaptive")
dgmm
```

A different segmentation is fit for each mass feature.

```{r dgmm-image, fig.height=4, fig.width=9}
image(dgmm, i=c(5, 13, 21), layout=c(1,3))
```

Each image is modeled as a mixture of Gaussian distributions.

```{r dgmm-plot, fig.height=3, fig.width=9}
plot(dgmm, i=c(5, 13, 21), layout=c(1,3), linewidth=2)
```

Spatial-DGMM segmentations can be especially useful for finding mass features colocalized with a region-of-interest.

When applied to a `SpatialDGMM` object, `colocalize()` is able to use match scores that can have a higher specificity than using Pearson correlation on the raw ion images.

```{r dgmm-colocalized, fig.height=4, fig.width=9}
coloc2 <- colocalized(dgmm, mse$square)
coloc2

image(mse, mz=coloc2$mz[1:3], layout=c(1,3))
```


# Classification and cross-validation

Classification of pixels into different known classes (e.g., cancer vs normal) based on the mass spectra is a common application for MS imaging.

```{r classification-data, fig.height=4, fig.width=9}
set.seed(2020)
mse2 <- simulateImage(preset=7, dim=c(32,32), sdnoise=0.3,
	nrun=3, peakdiff=2, representation="centroid")

mse2$class <- makeFactor(A=mse2$circleA, B=mse2$circleB)

image(mse2, "class", layout=c(1,3))
```


```{r classification-images, fig.height=4, fig.width=9}
image(mse2, i=1, layout=c(1,3))
```

When performing classification, it is important to use cross-validation so that reported accuracies are not overly optimistic.

We strongly recomend making sure that all spectra from the same experiment run belong to the same fold, to reduce predictive bias due to run effects.


## Projection to latent structures (PLS)

Projection to latent structures (PLS), also called partial least squares, is a supervised dimension reduction technique. It can be thought of as being similar to PCA, but for classification or regression.

```{r pls-cv}
cv_pls <- crossValidate(PLS, x=mse2, y=mse2$class, ncomp=1:15, folds=run(mse2))
cv_pls
```

We can see that the accuracy maxes out after 11 PLS components.

```{r pls}
pls <- PLS(mse2, y=mse2$class, ncomp=11)
pls
```

We can plot the fitted response values to visualize the prediction.

```{r pls-image, fig.height=4, fig.width=9}
image(pls, type="response", layout=c(1,3), scale=TRUE)
```

The PLS regression coefficients can be used to find influential features.

```{r pls-coefficients, fig.height=3, fig.width=9}
plot(pls, type="coefficients", linewidth=2, annPeaks="circle")
```

Like PCA or NMF, it can be useful to plot the PLS scores against each other to visualize the separation between classes.

```{r pls-scores, fig.height=4, fig.width=9}
plot(pls, type="scores", groups=mse2$class, linewidth=2)
```

Note that orthgonal PLS (O-PLS) is also available via `method="opls"` or by using the separate `OPLS()` method. Typically, both methods perform similarly, although O-PLS can sometimes produce more easily interpretable regression coefficients.


## Spatial shrunken centroids classification

Spatially-aware nearest shrunken centroids classification is an extension of nearest shrunken centroids that incorporates spatial information into the model.

Like in the clustering case of spatial shrunken centroids, a smoothing radius `r` must be provided along with sparsity parameters `s`.

```{r ssc-cv}
cv_ssc <- crossValidate(spatialShrunkenCentroids, x=mse2, y=mse2$class,
	r=2, s=c(0,3,6,9,12,15,18), folds=run(mse2))
cv_ssc
```

We can see that in this case, the model with `s=9` has the best cross-validated accuracy for the data. However, it does not perform as well as the PLS model.

```{r ssc-classification}
ssc2 <- spatialShrunkenCentroids(mse2, y=mse2$class, r=2, s=9)
ssc2
```

Again, we can plot the predicted class probabilities to visualize the prediction.

```{r ssc-image-2, fig.height=4, fig.width=9}
image(ssc2, type="probability", layout=c(1,3),
	subset=mse2$circleA | mse2$circleB)
```

Plotting t-statistics shows the peaks with lower *m/z* values have a higher abundance in class "B".

```{r ssc-statistic-2, fig.height=4, fig.width=9}
plot(ssc2, type="statistic", linewidth=2, annPeaks="circle")
```

```{r ssc-top-2}
ssc2_top <- topFeatures(ssc2)

subset(ssc2_top, class == "B")
```


# Class comparison

Statistical hypothesis testing is used to determine whether the abundance of a feature is different between two or more conditions.

In order to account for additional factors like the effect of experimental runs, subject-to-subject variability, etc., this is often done most appropriately using linear models.

This example uses a simple experiment with two conditions "A" and "B", with three replicates in each condition.


```{r test-data, fig.height=7, fig.width=9}
set.seed(2020)
mse3 <- simulateImage(preset=4, npeaks=10, dim=c(32,32), sdnoise=0.3,
	nrun=3, peakdiff=1, representation="centroid")

mse3$trt <- makeFactor(A=mse3$circleA, B=mse3$circleB)

image(mse3, "trt", layout=c(2,3))
```

```{r test-image, fig.height=7, fig.width=9}
image(mse3, i=1, layout=c(2,3))
```

We know from the design of the simulation that the first 5 (of 10) *m/z* values differ between the conditions.

```{r test-diff}
featureData(mse3)
```

## Sample-based means testing

Use `meansTest()` to fit linear models with the most basic summarization. The samples must be specified with `samples`. Each sample is summarized by its mean, and then a linear model is fit to the summaries. In this case, each sample is a separate run.

Here, we specify `condition` as the sole fixed effect. Internally, the model will call either `lm()` or `lme()` depending on whether any random effects are provided.

```{r test-mean-test}
mtest <- meansTest(mse3, ~ condition, samples=run(mse3))
mtest
```

By default, the models are summarized by performing likelihood ratio tests against the null model (with no fixed effects, retaining any random effects).

Box-and-whisker plots can be used to visualize the differences (if any) between the conditions.

```{r test-mean-plot, fig.height=5, fig.width=9}
plot(mtest, i=1:10, layout=c(2,5), ylab="Intensity", fill=TRUE)
```

Use `topFeatures()` to rank the results.

```{r test-mean-top}
mtest_top <- topFeatures(mtest)

subset(mtest_top, fdr < 0.05)
```

We find 4 significant features.


## Segment-based means testing

Testing of `SpatialDGMM` objects is also supported by `meansTest()`. The key idea here is that spatial-DGMM segmentation captures within-sample heterogeneity, so testing between spatial-DGMM segments is more sensitive that simply summarizing a whole sample by its mean.

First, we must segment the data with `spatialDGMM()`, while making sure that each sample is segmented independently (by specifying the samples as `groups`).

```{r test-segment-dgmm}
dgmm2 <- spatialDGMM(mse3, r=2, k=3, groups=run(mse3))
```

Now use `segmentationTest()` to fit the models.


```{r test-segment-test}
stest <- meansTest(dgmm2, ~ condition)

stest
```

As with `meansTest()`, the models are summarized by performing likelihood ratio tests against the null model (with no fixed effects, retaining any random effects).

Box-and-whisker plots can be used to visually compare the conditions.

```{r test-segment-plot, fig.height=5, fig.width=9}
plot(stest, i=1:10, layout=c(2,5), ylab="Intensity", fill=TRUE)
```

Use `topFeatures()` to rank the results.

```{r test-segment-top}
stest_top <- topFeatures(stest)

subset(stest_top, fdr < 0.05)
```

This example is simple enough so we still find the same 4 significant features.


# Session information

```{r session-info}
sessionInfo()
```