--- title: "Introduction to densvis" author: - name: Alan O'Callaghan email: alan.ocallaghan@outlook.com package: densvis output: BiocStyle::html_document: toc_float: yes fig_width: 10 fig_height: 8 bibliography: library.bib vignette: > %\VignetteIndexEntry{Introduction to densvis} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( error = FALSE, warning=FALSE, message=FALSE, collapse = TRUE, comment = "#>" ) library("BiocStyle") ``` # Introduction Non-linear dimensionality reduction techniques such as t-SNE [@Maaten2008] and UMAP [@McInnes2020] produce a low-dimensional embedding that summarises the global structure of high-dimensional data. These techniques can be particularly useful when visualising high-dimensional data in a biological setting. However, these embeddings may not accurately represent the local density of data in the original space, resulting in misleading visualisations where the space given to clusters of data does not represent the fraction of the high dimensional space that they occupy. `densvis` implements the density-preserving objective function described by [@Narayan2020] which aims to address this deficiency by including a density-preserving term in the t-SNE and UMAP optimisation procedures. This can enable the creation of visualisations that accurately capture differing degrees of transcriptional heterogeneity within different cell subpopulations in scRNAseq experiments, for example. # Setting up the data We will illustrate the use of densvis using simulated data. We will first load the `densvis` and `Rtsne` libraries and set a random seed to ensure the t-SNE visualisation is reproducible (note: it is good practice to ensure that a t-SNE embedding is robust by running the algorithm multiple times). ```{r setup} library("densvis") library("Rtsne") library("uwot") library("ggplot2") theme_set(theme_bw()) set.seed(14) ``` ```{r data} data <- data.frame( x = c(rnorm(1000, 5), rnorm(1000, 0, 0.2)), y = c(rnorm(1000, 5), rnorm(1000, 0, 0.2)), class = c(rep("Class 1", 1000), rep("Class 2", 1000)) ) ggplot() + aes(data[, 1], data[, 2], colour = data$class) + geom_point(pch = 19) + scale_colour_discrete(name = "Cluster") + ggtitle("Original co-ordinates") ``` # Running t-SNE Density-preserving t-SNE can be generated using the `densne` function. This function returns a matrix of t-SNE co-ordinates. We set `dens_frac` (the fraction of optimisation steps that consider the density preservation) and `dens_lambda` (the weight given to density preservation relative to the standard t-SNE objective) each to 0.5. ```{r run-densne} fit1 <- densne(data[, 1:2], dens_frac = 0.5, dens_lambda = 0.5) ggplot() + aes(fit1[, 1], fit1[, 2], colour = data$class) + geom_point(pch = 19) + scale_colour_discrete(name = "Class") + ggtitle("Density-preserving t-SNE") + labs(x = "t-SNE 1", y = "t-SNE 2") ``` If we run t-SNE on the same data, we can see that the density-preserving objective better represents the density of the data, ```{r run-tsne} fit2 <- Rtsne(data[, 1:2]) ggplot() + aes(fit2$Y[, 1], fit2$Y[, 2], colour = data$class) + geom_point(pch = 19) + scale_colour_discrete(name = "Class") + ggtitle("Standard t-SNE") + labs(x = "t-SNE 1", y = "t-SNE 2") ``` # Running UMAP A density-preserving UMAP embedding can be generated using the `densmap` function. This function returns a matrix of UMAP co-ordinates. As with t-SNE, we set `dens_frac` (the fraction of optimisation steps that consider the density preservation) and `dens_lambda` (the weight given to density preservation relative to the standard t-SNE objective) each to 0.5. ```{r run-densmap} fit1 <- densmap(data[, 1:2], dens_frac = 0.5, dens_lambda = 0.5) ggplot() + aes(fit1[, 1], fit1[, 2], colour = data$class) + geom_point(pch = 19) + scale_colour_discrete(name = "Class") + ggtitle("Density-preserving t-SNE") + labs(x = "t-SNE 1", y = "t-SNE 2") ``` If we run UMAP on the same data, we can see that the density-preserving objective better represents the density of the data, ```{r run-umap} fit2 <- umap(data[, 1:2]) ggplot() + aes(fit2[, 1], fit2[, 2], colour = data$class) + geom_point(pch = 19) + scale_colour_discrete(name = "Class") + ggtitle("Standard t-SNE") + labs(x = "t-SNE 1", y = "t-SNE 2") ``` # Session information {.unnumbered} ```{r} sessionInfo() ```