--- title: "Smoothclust Tutorial" author: - name: Lukas M. Weber affiliation: "Boston University" package: smoothclust output: BiocStyle::html_document: toc_float: true vignette: > %\VignetteIndexEntry{Smoothclust Tutorial} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- # Introduction `smoothclust` is a method for segmentation of spatial domains and spatially-aware clustering in spatial transcriptomics data. The method generates spatial domains with smooth boundaries by smoothing gene expression profiles across neighboring spatial locations, followed by unsupervised clustering. Spatial domains consisting of consistent mixtures of cell types may then be further investigated by applying cell type compositional analyses or differential analyses. # Installation The `smoothclust` package can be installed from Bioconductor as follows (using R version 4.4 onwards). This is the recommended installation for most users. Additional details are shown on the [Bioconductor](https://bioconductor.org/packages/smoothclust) package landing page. ```{r, eval=FALSE} install.packages("BiocManager") BiocManager::install("smoothclust") ``` The latest development version of the package can also be installed from the [devel](https://contributions.bioconductor.org/use-devel.html) version of Bioconductor or from [GitHub](https://github.com/lmweber/smoothclust). # Input data format Input data can be provided either as a [SpatialExperiment](https://bioconductor.org/packages/SpatialExperiment) object within the Bioconductor framework, or as numeric matrices of expression values and spatial coordinates. See help file (`?smoothclust`) for details. In the example workflow below, we assume the input is in `SpatialExperiment` format. # Tutorial The example workflow in this section demonstrates how to run `smoothclust` to generate spatial domains with smooth boundaries for a dataset from the 10x Genomics Visium platform. ```{r, message=FALSE} library(smoothclust) library(STexampleData) library(scuttle) library(scran) library(scater) library(ggspavis) ``` Load dataset from `STexampleData` package: ```{r} # load data spe <- Visium_humanDLPFC() # keep spots over tissue spe <- spe[, colData(spe)$in_tissue == 1] dim(spe) assayNames(spe) ``` Run `smoothclust` using default parameter settings, which have been selected to be appropriate for Visium data from human tissue. The method for smoothing can be specified by providing the `method` argument, with available options `uniform`, `kernel`, and `knn`. Additional arguments can be used to set parameter values, including `bandwidth`, `k,` and `truncate`, depending on the choice of method. For more details, see the function documentation (`?smoothclust`) or the paper (in preparation). ```{r, results="hide"} # run smoothclust spe <- smoothclust(spe) ``` The smoothed expression counts are stored in a new assay named `counts_smooth`: ```{r} # check output object assayNames(spe) ``` Calculate log-transformed normalized counts (logcounts) on the smoothed expression counts. Here, the argument `assay.type = "counts_smooth"` specifies that we want to calculate logcounts using the smoothed counts from `smoothclust`. ```{r} # calculate logcounts spe <- logNormCounts(spe, assay.type = "counts_smooth") assayNames(spe) ``` Run clustering. We use a standard clustering workflow from single-cell data, consisting of k-means clustering on the top principal components (PCs) calculated on the set of top highly variable genes (HVGs) with logcounts as the input. We use a relatively small number of clusters for demonstration purposes in this example: ```{r} # preprocessing steps for clustering # remove mitochondrial genes is_mito <- grepl("(^mt-)", rowData(spe)$gene_name, ignore.case = TRUE) table(is_mito) spe <- spe[!is_mito, ] dim(spe) # select top highly variable genes (HVGs) dec <- modelGeneVar(spe) top_hvgs <- getTopHVGs(dec, prop = 0.1) length(top_hvgs) spe <- spe[top_hvgs, ] dim(spe) ``` ```{r} # dimensionality reduction # compute PCA on top HVGs set.seed(123) spe <- runPCA(spe) ``` ```{r} # run k-means clustering set.seed(123) k <- 5 clust <- kmeans(reducedDim(spe, "PCA"), centers = k)$cluster table(clust) colLabels(spe) <- factor(clust) ``` Plot clusters / spatial domains generated by the smoothclust workflow: ```{r} # color palettes pal8 <- "libd_layer_colors" pal36 <- unname(palette.colors(36, "Polychrome 36")) # plot clusters / spatial domains plotSpots(spe, annotate = "label", pal = pal8) ``` Plot manually annotated reference labels, which can be used to evaluate the performance of the clustering in this dataset: ```{r} # plot reference labels plotSpots(spe, annotate = "ground_truth", pal = pal8) ``` # Session information ```{r} sessionInfo() ```