--- title: "Visualizing Dendrogram using ggtree" author: - name: Guangchuang Yu email: guangchuangyu@gmail.com affiliation: Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University date: "`r Sys.Date()`" output: prettydoc::html_pretty: theme: cayman highlight: github pdf_document: toc: true vignette: > %\VignetteIndexEntry{Visualizing Dendrogram using ggtree} %\VignetteEngine{knitr::rmarkdown} %\usepackage[utf8]{inputenc} %\VignetteEncoding{UTF-8} --- ```{r style, echo=FALSE, results="asis", message=FALSE} knitr::opts_chunk$set(tidy = FALSE, warning = FALSE, message = FALSE) library(yulab.utils) ``` # Introduction ```{r} library(ggtree) library(ggtreeDendro) library(aplot) ``` Clustering is very importance method to classify items into different categories and to infer functions since similar objects tend to behavior similarly. There are more than 200 packages in Bioconductor implement clustering algorithms or employ clustering methods for omic-data analysis. Albeit the methods are important for data analysis, the visualization is quite limited. Most the the packages only have the ability to visualize the hierarchical tree structure using `stats:::plot.hclust()`. This package is design to visualize hierarchical tree structure with associated data (e.g., clinical information collected with the samples) using the powerful in-house developed `r Biocpkg("ggtree")` package. This package implements a set of `autoplot()` methods to display tree structure. We will implement more `autoplot()` methods to support more objects. The output of these `autoplot()` methods is a `ggtree` object, which can be further annotated by adding layers using `r CRANpkg("ggplot2")` syntax. Integrating associated data to annotate the tree is also supported by `r Biocpkg("ggtreeExtra")` package. Here are some demonstrations of using `autoplot()` methods to visualize common hierarchical clustering tree objects. ## `hclust` and `dendrogram` objects These two classes are defined in the `r CRANpkg("stats")` package. ```{r fig.width=16, fig.height=6} d <- dist(USArrests) hc <- hclust(d, "ave") den <- as.dendrogram(hc) p1 <- autoplot(hc) + geom_tiplab() p2 <- autoplot(den) plot_list(p1, p2, ncol=2) ``` ## `linkage` object The class `linkage` is defined in the `r CRANpkg("mdendro")` package. ```{r fig.width=8, fig.height=8} library("mdendro") lnk <- linkage(d, digits = 1, method = "complete") autoplot(lnk, layout = 'circular') + geom_tiplab() + scale_color_subtree(4) + theme_tree() ``` ## `agnes`, `diana` and `twins` objects These classes are defined in the `r CRANpkg("cluster")` package. ```{r fig.width=16, height=6} library(cluster) x1 <- agnes(mtcars) x2 <- diana(mtcars) p1 <- autoplot(x1) + geom_tiplab() p2 <- autoplot(x2) + geom_tiplab() plot_list(p1, p2, ncol=2) ``` ## `pvclust` object The `pvclust` class is defined in the `r CRANpkg("pvclust")` package. ```{r fig.width=8, height=6} library(pvclust) data(Boston, package = "MASS") set.seed(123) result <- pvclust(Boston, method.dist="cor", method.hclust="average", nboot=1000, parallel=TRUE) autoplot(result, label_edge=TRUE, pvrect = TRUE) + geom_tiplab() ``` The `pvclust` object contains two types of p-values: AU (Approximately Unbiased) p-value and BP (Boostrap Probability) value. These values will be automatically labelled on the tree. # Session information Here is the output of sessionInfo() on the system on which this document was compiled: ```{r, echo=FALSE} sessionInfo() ```