---
title: "Identifying cellular neighborhood with SPIAT"
author: "Yuzhou Feng"
date: "`r Sys.Date()`"
output:
  BiocStyle::html_document:
    self_contained: yes
    toc_float: true
    toc_depth: 4
package: "`r pkg_ver('SPIAT')`"
bibliography: "`r file.path(system.file(package='SPIAT', 'vignettes'), 'introduction.bib')`"    
vignette: >
  %\VignetteIndexEntry{Identifying cellular neighborhood with SPIAT}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options: 
  markdown: 
    wrap: 72
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

```{r message=FALSE}
library(SPIAT)
```

# Cellular neighborhood
The aggregation of cells can result in 'cellular neighbourhoods'. A
neighbourhood is defined as a group of cells that cluster together.
These can be homotypic, containing cells of a single class (e.g. immune
cells), or heterotypic (e.g. a mixture of tumour and immune cells).

Function `identify_neighborhoods()` identifies cellular neighbourhoods. 
Users can select a subset of cell types of interest if desired. SPIAT includes 
three algorithms for the detection of neighbourhoods.

-   *Hierarchical Clustering algorithm*: Euclidean distances between
    cells are calculated, and pairs of cells with a distance less than a
    specified radius are considered to be 'interacting', with the rest
    being 'non-interacting'. Hierarchical clustering is then used to
    separate the clusters. Larger radii will result in the merging of
    individual clusters.
-   [*dbscan*](https://cran.r-project.org/web/packages/dbscan/index.html)
-   [*phenograph*](https://github.com/JinmiaoChenLab/Rphenograph)

For *Hierarchical Clustering algorithm* and *dbscan*, users need to
specify a radius that defines the distance for an interaction. We
suggest users to test different radii and select the one that generates
intuitive clusters upon visualisation. Cells not assigned to clusters
are assigned as `Cluster_NA` in the output table. The argument
`min_neighborhood_size` specifies the threshold of a neighborhood size
to be considered as a neighborhood. Smaller neighbourhoods will be
outputted, but will not be assigned a number.

*Rphenograph* uses the number of nearest neighbours to detect clusters.
This number should be specified by `min_neighborhood_size` argument. We
also encourage users to test different values.

For this part of the tutorial, we will use the image `image_no_markers`
simulated with the `spaSim` package. This image contains "Tumour",
"Immune", "Immune1" and "Immune2" cells without marker intensities.

```{r, fig.height = 2.5, out.width = "75%"}
data("image_no_markers")

plot_cell_categories(
  image_no_markers, c("Tumour", "Immune","Immune1","Immune2","Others"),
  c("red","blue","darkgreen", "brown","lightgray"), "Cell.Type")
```

Users are recommended to test out different radii and then visualise the
clustering results. To aid in this process, users can use the
`average_minimum_distance()` function, which calculates the average
minimum distance between all cells in an image, and can be used as a
starting point.

```{r}
average_minimum_distance(image_no_markers)
```

We then identify the cellular neighbourhoods using our hierarchical
algorithm with a radius of 50, and with a minimum neighbourhood size of
100. Cells assigned to neighbourhoods smaller than 100 will be assigned
to the "Cluster_NA" neighbourhood.

```{r}
clusters <- identify_neighborhoods(
  image_no_markers, method = "hierarchical", min_neighborhood_size = 100,
  cell_types_of_interest = c("Immune", "Immune1", "Immune2"), radius = 50, 
  feature_colname = "Cell.Type")
```

This plot shows clusters of "Immune", "Immune1" and "Immune2" cells. Each
number and colour corresponds to a distinct cluster. Black cells
correspond to 'free', un-clustered cells.

We can visualise the cell composition of neighborhoods. To do this, we
can use `composition_of_neighborhoods()` to obtain the percentages of
cells with a specific marker within each neighborhood and the number of
cells in the neighborhood.

In this example we select cellular neighbourhoods with at least 5 cells.

```{r}
neighorhoods_vis <- 
  composition_of_neighborhoods(clusters, feature_colname = "Cell.Type")
neighorhoods_vis <- 
  neighorhoods_vis[neighorhoods_vis$Total_number_of_cells >=5,]
```

Finally, we can use `plot_composition_heatmap()` to produce a heatmap
showing the marker percentages within each cluster, which can be used to
classify the derived neighbourhoods.

```{r, fig.width = 3, fig.height = 3, out.width = "70%"}
plot_composition_heatmap(neighorhoods_vis, feature_colname="Cell.Type")
```

This plot shows that Cluster_1 and Cluster_2 contain all three types of
immune cells. Cluster_3 does not have Immune1 cells. Cluster_1 and
Cluster_2 are more similar to the free cells (cells not assigned to
clusters) in their composition than Cluster_3.

# Average Nearest Neighbour Index (ANNI)

We can test for the presence of neighbourhoods using ANNI. We can
calculate the ANNI with the function `average_nearest_neighbor_index()`,
which takes one cell type of interest (e.g. `Cluster_1` under
`Neighborhood` column of `clusters` object) or a combinations of cell
types (e.g. `Immune1` and `Immune2` cells under `Cell.Type` column of
`image_no_markers` object) and outputs whether there is a clear
neighbourhood (clustered) or unclear (dispersed/random), along with a P
value for the estimate.

Here show the examples for both one cell type and multiple cell types.

```{r}
average_nearest_neighbor_index(clusters, reference_celltypes=c("Cluster_1"), 
                               feature_colname="Neighborhood", p_val = 0.05)
```

```{r}
average_nearest_neighbor_index(
  image_no_markers, reference_celltypes=c("Immune", "Immune1" , "Immune2"), 
  feature_colname="Cell.Type", p_val = 0.05)
```

`p_val` is the cutoff to determine if a pattern is significant or not.
If the p value of ANNI is larger than the threshold, the pattern will be
"Random". Although we give a default p value cutoff of 5e-6, we suggest
the users to define their own cutoff based on the images and how they
define the patterns "Clustered" and "Dispersed".

# You can access the vignettes for other modules of SPIAT here:

 - [Overview of SPIAT](SPIAT.html)
 - [Data reading and formatting](data_reading-formatting.html)
 - [Quality control and visualisation](quality-control_visualisation.html)
 - [Basic analysis](basic_analysis.html)
 - [Cell colocalisation](cell-colocalisation.html)
 - [Spatial heterogeneity](spatial-heterogeneity.html)
 - [Tissue structure](tissue-structure.html)

# Reproducibility

```{r}
sessionInfo()
```

# Author Contributions

AT, YF, TY, ML, JZ, VO, MD are authors of the package code. MD and YF
wrote the vignette. AT, YF and TY designed the package.