--- title: "Data visualization with epiregulon.extra" author: "Xiaosai Yao, Tomasz Włodarczyk" email: "yao.xiaosai@gene.com" output: BiocStyle::html_document: toc: true number_section: true self_contained: true titlecaps: true package: "epiregulon.extra" vignette: > %\VignetteIndexEntry{Data visualization with epiregulon.extra} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- # Introduction `r BiocStyle::Biocpkg("epiregulon.extra")` is a companion package to `r BiocStyle::Biocpkg("epiregulon")` and provides functions to visualize transcription factor activity and network. It also provides statistical tests to identify transcription factors with differential activity and network topology. This tutorial continues from the reprogram-seq example included in epiregulon. We will use the gene regulatory networks constructed by epiregulon and continue with visualization and network analysis. For full documentation of the `epiregulon` package, please refer to the [epiregulon book](https://xiaosaiyao.github.io/epiregulon.book/). # Installation ```{r, results='hide', message=FALSE, eval=FALSE} if (!require("BiocManager", quietly = TRUE)) install.packages("BiocManager") BiocManager::install("epiregulon.extra") ``` # Data preparation To continue with the visualization functions, we will first need the gene expression matrix. We can download the data from `r BiocStyle::Biocpkg("scMultiome")`.
```{r} # load the MAE object library(scMultiome) mae <- scMultiome::reprogramSeq() # expression matrix GeneExpressionMatrix <- mae[["GeneExpressionMatrix"]] rownames(GeneExpressionMatrix) <- rowData(GeneExpressionMatrix)$name reducedDim(GeneExpressionMatrix, "UMAP_Combined") <- reducedDim(mae[["TileMatrix500"]], "UMAP_Combined") # arrange hash_assignment GeneExpressionMatrix$hash_assignment <- factor(as.character( GeneExpressionMatrix$hash_assignment), levels = c("HTO10_GATA6_UTR", "HTO2_GATA6_v2", "HTO8_NKX2.1_UTR", "HTO3_NKX2.1_v2", "HTO1_FOXA2_v2", "HTO4_mFOXA1_v2", "HTO6_hFOXA1_UTR", "HTO5_NeonG_v2" ) ) ``` # Calculate TF activity Next we load the regulon object previously calculated by epiregulon. Here we just use the pruned version of regulon object in which only relevant columns are kept. Using epiregulon, we can calculate activity of transcription factors included in the regulon object. ```{r calculateActivity, results = "hide"} library(epiregulon) library(epiregulon.extra) data(regulon) score.combine <- calculateActivity( expMatrix = GeneExpressionMatrix, regulon = regulon, mode = "weight", method = "weightedMean", exp_assay = "normalizedCounts", normalize = FALSE ) ``` # Perform differential activity We perform a differential analysis of transcription factor activity across groups of cells. This function is a wrapper around `findMarkers` from `r BiocStyle::Biocpkg("scran")`. ```{r differential} library(epiregulon.extra) markers <- findDifferentialActivity( activity_matrix = score.combine, clusters = GeneExpressionMatrix$hash_assignment, pval.type = "some", direction = "up", test.type = "t" ) ``` We can specify the top transcription factors of interest ```{r} markers.sig <- getSigGenes(markers, topgenes = 5) ``` # Visualize the results We visualize the known differential transcription factors by bubble plot ```{r visualization} plotBubble( activity_matrix = score.combine, tf = c("NKX2-1", "GATA6", "FOXA1", "FOXA2"), clusters = GeneExpressionMatrix$hash_assignment ) ``` Then visualize the most differential transcription factors by clusters ```{r} plotBubble( activity_matrix = score.combine, tf = markers.sig$tf, clusters = GeneExpressionMatrix$hash_assignment ) ``` Visualize the known differential transcription factors by violin plot. ```{r} plotActivityViolin( activity_matrix = score.combine, tf = c("NKX2-1", "GATA6", "FOXA1", "FOXA2", "AR"), clusters = GeneExpressionMatrix$hash_assignment ) ``` Visualize the known differential transcription factors by UMAP ```{r, fig.height = 6, fig.width = 9} options(ggrastr.default.dpi=100) ActivityPlot <- plotActivityDim( sce = GeneExpressionMatrix, activity_matrix = score.combine, tf = c("NKX2-1", "GATA6", "FOXA1", "FOXA2", "AR"), dimtype = "UMAP_Combined", label = "Clusters", point_size = 0.3, ncol = 3, rasterise = TRUE ) ActivityPlot ``` In contrast, the gene expression of the transcription factors is very sparse ```{r, fig.height = 6, fig.width = 9} options(ggrastr.default.dpi=100) ActivityPlot <- plotActivityDim( sce = GeneExpressionMatrix, activity_matrix = counts(GeneExpressionMatrix), tf = c("NKX2-1", "GATA6", "FOXA1", "FOXA2", "AR"), dimtype = "UMAP_Combined", label = "Clusters", point_size = 0.3, ncol = 3, limit = c(0, 2), colors = c("grey", "blue"), legend.label = "GEX", rasterise = TRUE ) ActivityPlot ``` Visualize the activity of the selected transcription factors by heat map. Due to the maximum size limit for this vignette, the output is not shown here. ```{r eval=FALSE} plotHeatmapRegulon( sce = GeneExpressionMatrix, tfs = c("GATA6", "NKX2-1"), regulon = regulon, regulon_cutoff = 0, downsample = 1000, cell_attributes = "hash_assignment", col_gap = "hash_assignment", exprs_values = "normalizedCounts", name = "regulon heatmap", column_title_rot = 45, raster_quality=4 ) plotHeatmapActivity( activity = score.combine, sce = GeneExpressionMatrix, tfs = c("GATA6", "NKX2-1", "FOXA1", "FOXA2"), downsample = 5000, cell_attributes = "hash_assignment", col_gap = "hash_assignment", name = "Activity", column_title_rot = 45, raster_quality=3 ) ``` # Geneset enrichment Sometimes we are interested to know what pathways are enriched in the regulon of a particular TF. We can perform geneset enrichment using the enricher function from `r BiocStyle::Biocpkg("clusterProfiler")`. ```{r enrichment, fig.height = 8, fig.width = 14} # retrieve genesets H <- EnrichmentBrowser::getGenesets( org = "hsa", db = "msigdb", cat = "H", gene.id.type = "SYMBOL" ) C2 <- EnrichmentBrowser::getGenesets( org = "hsa", db = "msigdb", cat = "C2", gene.id.type = "SYMBOL" ) C6 <- EnrichmentBrowser::getGenesets( org = "hsa", db = "msigdb", cat = "C6", gene.id.type = "SYMBOL" ) # combine genesets and convert genesets to be compatible with enricher gs <- c(H, C2, C6) gs.list <- do.call(rbind, lapply(names(gs), function(x) { data.frame(gs = x, genes = gs[[x]]) })) enrichresults <- regulonEnrich( TF = c("GATA6", "NKX2-1"), regulon = regulon, weight = "weight", weight_cutoff = 0.1, genesets = gs.list ) # plot results enrichPlot(results = enrichresults) ``` # Network analysis We can visualize the genesets as a network ```{r plot gsea network} plotGseaNetwork( tf = names(enrichresults), enrichresults = enrichresults, p.adj_cutoff = 0.1, ntop_pathways = 10 ) ``` # Differential networks We are interested in understanding the differential networks between two conditions and determining which transcription factors account for the differences in the topology of networks. The pruned regulons with cluster-specific test statistics computed by `pruneRegulon` can be used to generate cluster-specific networks based on user-defined cutoffs and to visualize differential networks for transcription factors of interest. In this dataset, the GATA6 gene was only expressed in cluster 1 (C1) and NKX2-1 was only expressed in cluster 3 (C3). If we visualize the target genes of GATA6, we can see that C1 has many more target genes of GATA6 compared to C5, a cluster that does not express GATA6. Similarly, NKX2-1 target genes are confined to C3 which is the only cluster that exogenously expresses NKX2-1. ```{r differential networks} plotDiffNetwork(regulon, cutoff = 0.1, tf = c("GATA6"), weight = "weight", clusters = c("C1", "C5"), layout = "stress" ) plotDiffNetwork(regulon, cutoff = 0.1, tf = c("NKX2-1"), weight = "weight", clusters = c("C3", "C5"), layout = "stress" ) ``` We can also visualize how transcription factors relate to other transcription factors in C1 cluster. ```{r} selected <- which(regulon$weight[, "C1"] > 0 & regulon$tf %in% c("GATA6", "FOXA1", "AR")) C1_network <- buildGraph(regulon[selected, ], weights = "weight", cluster = "C1" ) plotEpiregulonNetwork(C1_network, layout = "sugiyama", tfs_to_highlight = c("GATA6", "FOXA1", "AR")) + ggplot2::ggtitle("C1") ``` To systematically examine the differential network topology between two clusters, we perform an edge subtraction between two graphs, using weights computed by `pruneRegulon`. We then calculate the degree centrality of the weighted differential graphs and if desired, normalize the differential centrality against the total number of edges. The default normalization function is `sqrt` as it preserves both the difference in the number of edges (but scaled by sqrt) and the differences in the weights. If the user only wants to examine the differences in the averaged weights, the `FUN` argument can be changed to `identity`. Finally, we rank the transcription factors by (normalized) differential centrality. For demonstration purpose, the regulon list is truncated but the full list would contain all the transcription factors. ```{r} # rank by differential centrality C1_network <- buildGraph(regulon, weights = "weight", cluster = "C1") C5_network <- buildGraph(regulon, weights = "weight", cluster = "C5") diff_graph <- buildDiffGraph(C1_network, C5_network, abs_diff = FALSE) diff_graph <- addCentrality(diff_graph) diff_graph <- normalizeCentrality(diff_graph) rank_table <- rankTfs(diff_graph) library(ggplot2) ggplot(rank_table, aes(x = rank, y = centrality)) + geom_point() + ggrepel::geom_text_repel(data = rank_table, aes(label = tf)) + theme_classic() ``` # Session Info ```{r} sessionInfo() ```