--- title: "Introduction to `epivizr`: interactive visualization for genomic data" author: "Héctor Corrada Bravo" date: "`r Sys.Date()`" output: BiocStyle::html_document vignette: > %\VignetteEngine{knitr::rmarkdown} %\VignetteIndexEntry{Introduction to epivizr} %\VignetteEncoding{UTF-8} --- [`Epiviz`](http://epiviz.cbcb.umd.edu) is an interactive visualization tool for functional genomics data. It supports genome navigation like other genome browsers, but allows multiple visualizations of data within genomic regions using scatterplots, heatmaps and other user-supplied visualizations. It also includes data from the [Gene Expression Barcode project](http://barcode.luhs.org/) for transcriptome visualization. It has a flexible plugin framework so users can add [d3](http://d3js.org/) visualizations. You can find more information about Epiviz at [http://epiviz.cbcb.umd.edu/help](http://epiviz.cbcb.umd.edu/help) and see a video tour [here](http://youtu.be/099c4wUxozA). The `r Biocpkg("epivizr")` package implements two-way communication between the `R/Bioconductor` computational genomics environment and `Epiviz`. Objects in an `R` session can be displayed as tracks or plots on Epiviz. Epivizr uses Websockets for communication between the browser Javascript client and the R environment, the same technology underlying the popular [Shiny](http://www.rstudio.com/shiny/) system for authoring interactive web-based reports in R. # Preliminaries: the data In this vignette we will look at colon cancer methylation data from the TCGA project and expression data from the gene expression barcode project. The `r Biocpkg("epivizrData")` package contains human chromosome 11 methylation data from the Illumina 450kHumanMethylation beadarray processed with the `r Biocpkg("minfi")` package. We use expression data from the `r Biocpkg("antiProfilesData")` bioconductor package. ```{r, eval=TRUE, echo=TRUE, results='hide', warning=FALSE, error=FALSE, message=FALSE} library(epivizr) library(antiProfilesData) library(SummarizedExperiment) ``` ```{r} data(tcga_colon_blocks) data(tcga_colon_curves) data(apColonData) ``` The `tcga_colon_blocks` object is a `GRanges` object containing chromosome 11 regions of hypo or hyper methylation in colon cancer identified using the `blockFinder` function in the `r Biocpkg("minfi")` package. ```{r} show(tcga_colon_blocks) ``` The columns `value` and `p.value` can be used to determine which of these regions, or blocks, are interesting by looking at a volcano plot for instance. ```{r, fig.width=4, fig.height=4, fig.align='center'} plot(tcga_colon_blocks$value, -log10(tcga_colon_blocks$p.value), main="Volcano plot", xlab="Avg. methylation difference", ylab="-log10 p-value",xlim=c(-.5,.5)) ``` The `tcga_colon_curves` object is another `GRanges` object which contains the basepair resolution methylation data used to define these regions. ```{r} show(tcga_colon_curves) ``` This basepair resolution data includes mean methylation levels for normal and cancer and a smoothed estimate of methylation difference. This smoothed difference estimate is used to define regions in the `tcga_colon_blocks` object. Finally, the `apColonData` object is an `ExpressionSet` containing gene expression data for colon normal and tumor samples for genes within regions of methylation loss identified [this paper](http://www.nature.com/ng/journal/v43/n8/full/ng.865.html). Our goal in this vignette is to visualize this data as we browse the genome. # Using epivizr ## The epivizr app The connection to `Epiviz` is managed through a session manager object of class `EpivizApp`. We can create this object and open `Epiviz` using the `startEpiviz` function. ```{r, eval=FALSE, echo=TRUE} app <- startEpiviz(workspace="qyOTB6vVnff", gists="2caf8e891201130c7daa") ``` ```{r, eval=TRUE, echo=FALSE} app <- startEpiviz(host="http://localhost", http_port=8989, debug=TRUE, open_browser=FALSE, non_interactive=TRUE, try_ports=TRUE) # register BlockTrack js_chart_settings <- list(list(id = "title", type = "string", defaultValue = "", label = "Title", possibleValues = NULL), list(id = "marginTop", type = "number", defaultValue = 25, label = "Top margin", possibleValues = NULL), list(id = "marginBottom", type = "number", defaultValue = 23, label = "Bottom margin", possibleValues = NULL), list(id = "marginLeft", type = "number", defaultValue = 20, label = "Left margin", possibleValues = NULL), list(id = "marginRight", type = "number", defaultValue = 10, label = "Right margin", possibleValues = NULL), list(id = "minBlockDistance", type = "number", defaultValue = 5, label = "Minimum block distance", possibleValues = NULL)) js_chart_colors = c("#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd", "#8c564b", "#e377c2", "#7f7f7f", "#bcbd22", "#17becf") app$chart_mgr$register_chart_type("BlocksTrack", "epiviz.plugins.charts.BlocksTrack", js_chart_settings=js_chart_settings, js_chart_colors=js_chart_colors) # register LineTrack js_chart_settings <- list(list(id = "title", type = "string", defaultValue = "", label = "Title", possibleValues = NULL), list(id = "marginTop", type = "number", defaultValue = 25, label = "Top margin", possibleValues = NULL), list(id = "marginBottom", type = "number", defaultValue = 23, label = "Bottom margin", possibleValues = NULL), list(id = "marginLeft", type = "number", defaultValue = 20, label = "Left margin", possibleValues = NULL), list(id = "marginRight", type = "number", defaultValue = 10, label = "Right margin", possibleValues = NULL), list(id = "measurementGroupsAggregator", type = "categorical", defaultValue = "mean-stdev", label = "Aggregator for measurement groups", possibleValues = c("mean-stdev", "quartiles", "count", "min", "max", "sum")), list(id = "step", type = "number", defaultValue = 50, label = "Step", possibleValues = NULL), list(id = "showPoints", type = "boolean", defaultValue = FALSE, label = "Show points", possibleValues = NULL), list(id = "showLines", type = "boolean", defaultValue = TRUE, label = "Show lines", possibleValues = NULL), list(id = "showErrorBars", type = "boolean", defaultValue = TRUE, label = "Show error bars", possibleValues = NULL), list(id = "pointRadius", type = "number", defaultValue = 1, label = "Point radius", possibleValues = NULL), list(id = "lineThickness", type = "number", defaultValue = 1, label = "Line thickness", possibleValues = NULL), list(id = "yMin", type = "number", defaultValue = "default", label = "Min Y", possibleValues = NULL), list(id = "yMax", type = "number", defaultValue = "default", label = "Max Y", possibleValues = NULL), list(id = "interpolation", type = "categorical", defaultValue = "linear", label = "Interpolation", possibleValues = c("linear", "step-before", "step-after", "basis", "basis-open", "basis-closed", "bundle", "cardinal", "cardinal-open", "monotone"))) js_chart_colors <- c("#1859a9", "#ed2d2e", "#008c47", "#010101", "#f37d22", "#662c91", "#a11d20", "#b33893") app$chart_mgr$register_chart_type("LineTrack", "epiviz.plugins.charts.LineTrack", js_chart_settings=js_chart_settings, js_chart_colors=js_chart_colors) # register ScatterPlot js_chart_settings <- list(list(id = "title", type = "string", defaultValue = "", label = "Title", possibleValues = NULL), list(id = "marginTop", type = "number", defaultValue = 15, label = "Top margin", possibleValues = NULL), list(id = "marginBottom", type = "number", defaultValue = 50, label = "Bottom margin", possibleValues = NULL), list(id = "marginLeft", type = "number", defaultValue = 50, label = "Left margin", possibleValues = NULL), list(id = "marginRight", type = "number", defaultValue = 15, label = "Right margin", possibleValues = NULL), list(id = "measurementGroupsAggregator", type = "categorical", defaultValue = "mean-stdev", label = "Aggregator for measurement groups", possibleValues = c("mean-stdev", "quartiles", "count", "min", "max", "sum")), list(id = "circleRadiusRatio", type = "number", defaultValue = 0.015, label = "Circle radius ratio", possibleValues = NULL), list(id = "xMin", type = "number", defaultValue = "default", label = "Min X", possibleValues = NULL), list(id = "xMax", type = "number", defaultValue = "default", label = "Max X", possibleValues = NULL), list(id = "yMin", type = "number", defaultValue = "default", label = "Min Y", possibleValues = NULL), list(id = "yMax", type = "number", defaultValue = "default", label = "Max Y", possibleValues = NULL)) js_chart_colors <- c("#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd", "#8c564b", "#e377c2", "#7f7f7f", "#bcbd22", "#17becf") app$chart_mgr$register_chart_type("ScatterPlot", "epiviz.plugins.charts.ScatterPlot", js_chart_settings=js_chart_settings, js_chart_colors=js_chart_colors) app$server$start_server() ``` This opens a websocket connection between the interactive `R` session and the browser client. This will allow us to visualize data stored in the `Epiviz` server along with data in the interactive `R` session. ---- *Windows users:* In Windows platforms we need to use the `service` function to let the interactive `R` session connect to the `epiviz` web app and serve data requests. We then escape (using `ctl-c` or `esc` depending on your environment) to continue with the interactive `R` session. This is required anytime you want `epivizr` to serve data to the web app, for example, when interacting with the UI. (We are actively developing support for non-blocking sessions in Windows platforms). ```{r, eval=FALSE} app$server$service() ``` ---- ## Listing available chart types Once the browser is open and is connected to an active epivizr session, the epivizr session registers available chart types supported in the epiviz JS app. To list available chart types we use method `list_chart_types`. In this vignette, we only show three chart types available for use in the epiviz JS app. The production epiviz JS app has a larger number of chart types available. Chart types can also be added dynamically as described in the epiviz JS documentation: [http://epiviz.cbcb.umd.edu/help](http://epiviz.cbcb.umd.edu/help). Chart types that are added dynamically are listed and usable within an `epivizr` session. ```{r} app$chart_mgr$list_chart_types() ``` Chart type settings, e.g., line widths, colors, margins etc. can be modified dynamically. Settings available for each chart type are briefly listed in this call. See below for further detail on customizing charts through settings and colors. # Adding charts ## Adding block region tracks Once the browser is open we can visualize the `tcga_colon_blocks` object containing blocks of methylation modifications in colon cancer. We use the `plot` method to do so. ```{r,eval=TRUE} blocks_chart <- app$plot(tcga_colon_blocks, datasource_name="450k colon_blocks") ``` ---- *Windows users:* We need the `service` function to let the interactive `R` session serve data requests from the browser client as you interact with the UI. Escape (using `ctl-c` or `esc` depending on your environment) to continue with the interactive `R` session. ```{r,eval=TRUE} app$server$service() ``` ---- You should now see that a new track is added to the `Epiviz` web app. You can think of this track as an interactive display device in `R`. As you navigate on the browser, data is requested from the `R` session through the websocket connection. Remember to escape to continue with your interactive `R` session if you are not running "non-blocking" mode. The `blocks_chart` object inherits from class `EpivizChart`, which we can use to get information about the data being displayed and the chart used to display it. Note that the "brushing" effect we implement in `Epiviz` works for `epivizr` tracks as well. Now that we have interactive data connections to `Epiviz` we may want to iteratively compute and visualize our data. For example, we may want to only display methylation blocks inferred at a certain statistical significance level. In this case, we will filter by block size. ```{r, eval=TRUE} # subset to those with length > 250Kbp keep <- width(tcga_colon_blocks) > 250000 # get the data object for chart ms_obj <- app$get_ms_object(blocks_chart) app$update_measurements(ms_obj, tcga_colon_blocks[keep,]) ``` Now, only this subset of blocks will be displayed in the already existing track. ## Modifying chart settings and colors To modify default chart colors or settings, we need to know what settings can be applied to the chart. Again, as these are defined dynamically in the epiviz JS app we use a method, `print_chart_type_info` on the `EpivizChartMgr` class to list settings that can be applied to specific chart type. For example, to list chart settings available for a `BlocksTrack` chart we use the following: ```{r} app$chart_mgr$print_chart_type_info("BlocksTrack") ``` There are two ways to set settings and colors to a chart: when the chart is initially created using the `plot` method, or after the chart is created using the `set` method in class `EpivizChart`. We will illustrate both ways of doing this: If using the `plot` method, use the `list_chart_settings` method as above to list settings that can be applied to a chart type. For example, for a `BlocksTrack` chart, we can set `minBlockDistance` as a setting for the plot function which controls how close genomic regions can be before they are merged into a single rectangle in the chart. We can also change colors used in the chart this way. ```{r, eval=TRUE} settings <- list(minBlockDistance=50) colors <- c("#d15014", "#5e97eb", "#e81ccd") blocks_chart <- app$plot(tcga_colon_blocks, datasource_name="450k colon_blocks", settings=settings, colors=colors) ``` This will create a second blocks track using a different color map. On the other hand, if the `BlockChart` is already added to the epiviz JS application session, use the `set_chart_settings` method to update settings and colors. ```{r, eval=TRUE} # create a list of settings and colors to update the plot settings <- list(minBlockDistance=100) colors <- c("#5e97eb", "#d15014", "#e81ccd") app$chart_mgr$set_chart_settings(blocks_chart, settings=settings, colors=colors) ``` This changes the color map for the second blocks track added. Use the `print_chart_info` method to list settings and colors currently used in the chart. ```{r} # to list applied chart settings app$chart_mgr$print_chart_info(blocks_chart) ``` Methods are also available directly on the `EpivizChart` objects. E.g., calls `blocks_chart$set(settings=settings, colors=colors)` and `blocks_chart$print_info()` yield the same result. ## Printing charts Charts can be printed as a pdf or png file through the epivizr session. To do so, use the `print_chart` method: ```{r, eval=TRUE} app$chart_mgr$print_chart(blocks_chart, file_name="blocks_chart", file_type="pdf") ``` This will create a file named `blocks_chart.pdf` which will be downloaded through your web browser. It will be found in the default file download location for your web browser. ## Adding line plots along the genome There are a number of different data types available to use through `epivizr`. You can see a list of R/BioC classes that are supported using `?register`. In the previous section, we used the `block` data type to register a `GenomicRanges` object. To visualize methylation data at base-pair resolution from the `tcga_colon_curves` `GenomicRanges` object, we will use the `bp` type. ```{r, eval=TRUE} # add low-filter smoothed methylation estimates means_track <- app$plot(tcga_colon_curves, datasource_name="450kMeth",type="bp",columns=c("cancerMean","normalMean")) ``` NOTE: You can adjust track settings to change how this new track looks like. For instance, to show all points in the window set the `step` parameter to 1, and to see a smooth interpolation of the data set the `interpolation` parameter to `basis`: ```{r, eval=TRUE} means_track$set(settings=list(step=1, interpolation="basis")) ``` Notice that we added two lines in this plot, one for mean methylation in cancer and another for mean methylation in normal. The `columns` argument specifies which columns in `mcols(colon_curves)` will be displayed. We can also add a track containing the smooth methylation difference estimate used to define methylation blocks. ```{r, eval=TRUE} diff_chart <- app$plot(tcga_colon_curves, datasource_name="450kMethDiff",type="bp",columns=c("smooth"),ylim=matrix(c(-.5,.5),nc=1)) ``` We pass limits for the y axis in this case. To see other arguments supported, you can use the help framework in R `?"EpivizApp"`. As before, we can specify settings for this new track. # Managing the app We can use the app connection object to list charts we have added so far, or to remove charts. ```{r, eval=TRUE} app$chart_mgr$list_charts() app$chart_mgr$rm_chart(means_track) app$chart_mgr$list_charts() ``` # Charts that are not aligned to genomic location ## Adding a scatterplot Now we want to visualize the colon expression data in `apColonData` object as an MA plot in `Epiviz`. First, we add an `"MA"` assay to the `ExpressionSet`: ```{r} keep <- pData(apColonData)$SubType!="adenoma" apColonData <- apColonData[,keep] status <- pData(apColonData)$Status Indexes <- split(seq(along=status),status) exprMat <- exprs(apColonData) mns <- sapply(Indexes, function(ind) rowMeans(exprMat[,ind])) mat <- cbind(colonM=mns[,"1"]-mns[,"0"], colonA=0.5*(mns[,"1"]+mns[,"0"])) pd <- data.frame(stat=c("M","A")) rownames(pd) <- colnames(mat) maEset <- ExpressionSet( assayData=mat, phenoData=AnnotatedDataFrame(pd), featureData=featureData(apColonData), annotation=annotation(apColonData) ) show(maEset) ``` `epivizr` will use the `annotation(maEset)` annotation to determine genomic locations using the `AnnotationDbi` package so that only probesets inside the current browser window are displayed. ```{r, eval=TRUE} eset_chart <- app$plot(maEset, datasource_name="MAPlot", columns=c("colonA","colonM")) ``` In this case, we specify which data is displayed in each axis of the scatter plot using the `columns` argument. The `assay` arguments indicates where data is obtained. ## The RangedSummarizedExperiment Object `Epiviz` is also able to display plots of data in the form of a `RangedSummarizedExperiment` object. After loading the `tcga_colon_expression` dataset in the `epivizrData` package, we can see that this object contains information on 239322 exons in 40 samples. ```{r} data(tcga_colon_expression) show(tcga_colon_expression) ``` The `assay` slot holds a matrix of raw sequencing counts, so before we can plot a scatterplot showing expression, we must first normalize the count data. We use the geometric mean of each row as a reference sample to divide each column (sample) by, then use the median of each column as a scaling factor to divide each row (exon) by. ```{r, eval=TRUE} ref_sample <- 2 ^ rowMeans(log2(assay(tcga_colon_expression) + 1)) scaled <- (assay(tcga_colon_expression) + 1) / ref_sample scaleFactor <- Biobase::rowMedians(t(scaled)) assay_normalized <- sweep(assay(tcga_colon_expression), 2, scaleFactor, "/") assay(tcga_colon_expression) <- assay_normalized ``` Now, using the expression data in the `assay` slot and the sample data in the `colData` slot, we can compute mean exon expression by sample type. ```{r, eval=TRUE} status <- colData(tcga_colon_expression)$sample_type index <- split(seq(along = status), status) logCounts <- log2(assay(tcga_colon_expression) + 1) means <- sapply(index, function(ind) rowMeans(logCounts[, ind])) mat <- cbind(cancer = means[, "Primary Tumor"], normal = means[, "Solid Tissue Normal"]) ``` Now, create a new `RangedSummarizedExperiment` object with the two column matrix, and all the information about the features of interest, in this case exons, are stored in the `rowRanges` slot to be queried by `Epiviz`. ```{r, eval=TRUE} sumexp <- SummarizedExperiment(mat, rowRanges=rowRanges(tcga_colon_expression)) se_chart <- app$plot(sumexp, datasource_name="Mean by Sample Type", columns=c("normal", "cancer")) ``` Again, the `columns` argument specifies what data will be displayed along which axis. # Visualizing data available from epiviz webserver ## Load remote measurements Epiviz web server (http://epiviz.cbcb.umd.edu) currently hosts data sets from several sources including Gene Expression Barcode project. We provide a way to load these remotely hosted datasets and integrate/analyze with your current workflow and local data. For this, Lets first get the list of measurements available from the webserver ```{r, eval=FALSE, echo=TRUE} app$load_remote_measurements() remote_measurements <- app$data_mgr$get_remote_measurements() ``` ## Query measurements and add charts `remote_measurements` is a list of `EpivizMeasurement` objects. We can query this list to choose the datasets we would like to load from the webserver. For the purpose of this vignette, lets find measurements to visualize data from gene expression barcode project for tumor and normal samples from lung, colon and breast tissues. ```{r, eval=FALSE, echo=TRUE} measurementList <- lapply(remote_measurements, function(m) { if(m@name %in% c("colon normal", "lung normal", "breast normal", "colon tumor", "lung tumor", "breast tumor") && m@datasourceId == "gene_expression_barcode_subtype") { m } }) measurements <- Filter(Negate(is.null), measurementList) ``` Now lets add a heatmap using the measurements we just selected. For this we use the `visualize` function from the `EpivizChartMgr` class. ```{r, eval=FALSE, echo=TRUE} app$chart_mgr$visualize("HeatmapPlot", measurements = measurements) ``` Similarly other chart types registered with the current epivizr session can be used with the datasets from the webserver. # More application interactions ## Slideshow We can navigate to a location on the genome using the `navigate` method of the app object: ```{r, eval=TRUE} app$navigate("chr11", 110000000, 120000000) ``` There is a convenience function to quickly navigate to a series of locations in succession. We can use that to browse the genome along a ranked list of regions. Let's navigate to the 5 most up-regulated exons in the colon exon expression data. ```{r, eval=TRUE, warning=FALSE} foldChange <- mat[,"cancer"]-mat[,"normal"] ind <- order(foldChange,decreasing=TRUE) # bounding 1Mb around each exon slideshowRegions <- trim(rowRanges(sumexp)[ind] + 1000000L) app$slideshow(slideshowRegions, n=5) ``` ## Printing the epivizr workspace To print the current epiviz workspace as a pdf or png, we use the `print_workspace` method: ```{r, eval=TRUE} app$print_workspace(file_name="workspace", file_type="pdf") ``` This will create a file named `workspace.pdf` and will be downloaded through your web browser. It will be saved to the default file download location for your web browser. ## Saving the epivizr workspace To save the current state of the epiviz app, and UI workspace, into an `rda` file, we use the `save` method: ```{r, eval=TRUE} app$save(file="app.rda", include_data=TRUE) ``` This will create a file named `app.rda` that can be restarted for later use and analysis. For a smaller file size, you may choose whether to include or exclude the data when saving the workspace. In this case, expressions used to add data to the app, through `add_measurements` or `plot` as shown above, will be stored and re-evaluated when restarting the app. ## Restarting the epivizr workspace After a workspace is saved using the `save` method shown above, we can replot it using the following: ```{r, eval=TRUE} app <- restartEpiviz(file="app.rda", open_browser=TRUE) ``` This will recreate the workspace in your web browser and reconnect it with the R session. If the workspace you are restarting did not include its data when saving the file, you will need to load the data in your global environment before restarting epiviz. This will allow to reload the data used in the app. ## Closing the session To close the connection to `Epiviz` and remove all tracks added during the interactive session, we use the `stop_app` function. ```{r} app$stop_app() ``` # Standalone version and browsing arbitrary genomes The `epivizrStandalone` all files required to run the web app UI locally. This feature can be used to browse any genome of interest. See that packages vignette for more information. # SessionInfo ```{r session-info, cache=FALSE} sessionInfo() ```