mouseData MRExperiment objectMetaviz is tool for interactive visualization and exploration of metagenomic sequencing data. Metaviz a provides main navigation tool for exploring hierarchical feature data that is coupled with multiple data visualizations including heatmaps, stacked bar charts, and scatter plots. Metaviz supports a flexible plugin framework so users can add new d3 visualizations. You can find more information about Metaviz at http://metaviz.cbcb.umd.edu/help.
The metavizr package implements two-way communication between the R/Bioconductor computational genomics environment and Metaviz. Objects in an R/Bioconductor session can be visualized and explored using the Metaviz navigation tool and plots. Metavizr uses Websockets to communicate between the browser Javascript client and the R/Bioconductor session. Websockets are the protocols underlying the popular Shiny system for authoring interactive web-based reports in R.
In this vignette we will look at two data sets one from a case/control study and another with a time series. We load the first data set from the msd16s Bioconductor package. This data is from the Moderate to Severe Diaherrial disease study in children from four countries: Banglash, Kenya, Mali, and the Gambia. Case and control stool samples were gathered from each country across several age ranges, 0-6 months, 6-12 months, 12-18 months, 18-24 months, and 24-60 months. An analysis of this data is described in Pop et al. [1].
require(metavizr)
require(metagenomeSeq)
require(msd16s)The connection to Metaviz is managed through a session manager object of class EpivizApp. We can create this object and open Metaviz using the startMetaviz function.
app <- startMetaviz()NOTE: The following commands are not visible when building the vignette. These commands are for registering each chart type so that the vignette will build when starting a new Metaviz session using localhost. These commands are not needed if using startMetaviz with default which starts a Metaviz session using http://metaviz.cbcb.umd.edu.
This opens a websocket connection between the interactive R session and the browser client. This will allow us to visualize data stored in the Metaviz server along with data in the interactive R session.
Windows users: In Windows platforms we need to use the service function to let the interactive R session connect to the epiviz web app and serve data requests. We then escape (using ctl-c or esc depending on your environment) to continue with the interactive R session. This is required anytime you want metavizr to serve data to the web app, for example, when interacting with the UI. (We are actively developing support for non-blocking sessions in Windows platforms).
app$server$service()For vignette purposes, we will subset the 992 msd16s samples to those 301 from Bangladesh. Also, we will aggregate the count matrix to the species level. We have found that
matrix sizes with 100 samples and 4000 features perform well for interactive visualization with an R session using WebSockets. For larger abundance matrices, we recommend using the graph database backend available at https://github.com/epiviz/metaviz-data-provider. Once having subset the data, we set the aggregation level to “class”, normalize, find features differentially abundant at the “class” level, propagate those selections to an MRexperiment that is aggreagted to “species” level, and then explore the hiearchy.
feature_order <- c("superkingdom", "phylum", "class", "order", "family", "genus", "species", "OTU")
aggregated_feature_order <- feature_order[1:7]
msd16s_species <- msd16s
fData(msd16s) <- fData(msd16s)[feature_order]
fData(msd16s_species) <- fData(msd16s_species)[aggregated_feature_order]
  
bangladesh <- msd16s[, which(pData(msd16s)$Country == "Bangladesh")]
bangladesh_species <- msd16s_species[, which(pData(msd16s_species)$Country == "Bangladesh")]
aggregated_species <-  cumNorm(aggregateByTaxonomy(bangladesh_species, lvl="species"), p = 0.75)
aggregation_level <- "class"
aggregated_bangladesh <- aggregateByTaxonomy(bangladesh, lvl=aggregation_level)
normed_bangladesh <-  cumNorm(aggregated_bangladesh, p = 0.75)
bangladesh_sample_data <-  pData(normed_bangladesh)
mod <-  model.matrix(~1+Dysentery, data = bangladesh_sample_data)
results_bangladesh <-  fitFeatureModel(normed_bangladesh, mod)## Warning: glm.fit: algorithm did not converge## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurredlogFC_bangladesh <- MRcoefs(results_bangladesh, number = nrow(normed_bangladesh))
features <- rownames(logFC_bangladesh)
featuresToKeep_names <- features[which(logFC_bangladesh[which(abs(logFC_bangladesh$logFC) > 1),]$adjPvalues < .1)]
featuresToKeep <- rep(2, length(featuresToKeep_names))
names(featuresToKeep) <- featuresToKeep_names
featuresToRemove_names <- features[!(features %in% featuresToKeep_names)]
featuresToRemove <- rep(0, length(featuresToRemove_names))
names(featuresToRemove) <- featuresToRemove_namesA metavizControl allows users to specify settings for operating over the MRexperiment object including data analysis such as normalization and log transform. We create a metavizControl with the featureSelection as the nodes that be found to be differentially abundant and those will be visible when creating Metaviz plots. The rest of the settings in the metavizControl will use the default parameters.
control <- metavizr::metavizControl(featureSelection = c(featuresToKeep, featuresToRemove))mouseData MRExperiment objectNow we can view the data as a heatmap calling revisualize:
heatmap <- app$chart_mgr$revisualize(chart_type = "HeatmapPlot", chart = icicle_plot)Using the same data, we can also revisualize it in a stacked plot to see the abundance of various features across samples. Since the measurements are added from creating the icicle_plot, we only need to add a stacked line plot.
stackedPlot <- app$chart_mgr$revisualize(chart_type ="StackedLinePlot", chart = icicle_plot)Finally, we can update the threshold cutoff we had for fold change, pass those modifications the icicle plot, and see the updates propogate to the heat map and stacked plot. This shows the use case of statistically-guided interactive visualizations.
feature_names_update <- rownames(logFC_bangladesh[which(logFC_bangladesh[which(abs(logFC_bangladesh$logFC) > .5),]$adjPvalues < .05),])
fSelection_update <- rep(2, length(feature_names_update))
names(fSelection_update) <- feature_names_update
agg_level = which(feature_order==aggregation_level)
select_value = 2
names(select_value) = agg_level
app$get_ms_object(icicle_plot)$propagateHierarchyChanges(fSelection_update, selectedLevels = select_value, request_with_labels = TRUE)Another feature of metavizr is to visualize data using a line plot. We detail the steps to perform this analysis and create a line plot using Metaviz such as those used in Paulson et al. when analyzing this time series data with a smoothing-spline [2] and [3].
First, import the etec16s dataset, select sample data from the first 9 days, and choose the feature annotations of interest.
library(etec16s)
data(etec16s)
etec16s <- etec16s[,-which(pData(etec16s)$Day>9)]Next, use metagenomeSeq to fit a smoothing-spline to the time series data.
featureData(etec16s)$Kingdom <- "Bacteria"
feature_order <- c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species", "OTU.ID")
featureData(etec16s) <- featureData(etec16s)[,feature_order]For plotting the data using Metaviz, we set the fit values as y-coordinates and timepoints as x-coordinates. We need to call ts2MRexperiment with arguments for the sample and feature data, in this case timepoints and annotations, respectively. This is done in the constructor for EpivizMetagenomicsDataTimeSeries-class. Finally, we add the MRexperiment as a measurement using the add_measurements method and then visualize those measurements as a LinePlot.
ic_plot <- app$plot(object = etec16s, type = "TimeSeries", datasource_name = "etec_splines", control = metavizControl(norm=FALSE, aggregateAtDepth = 4), formula = abundance~id + time*class + AntiGiven, class="AnyDayDiarrhea", id="SubjectID", time="Day", lvl="Family", feature_order = feature_order, B=1, seed = 1234, runFitTimeSeries = TRUE, fitThreshold = 2)## Loading required namespace: gss## MRExperiment Object validated... PASS## creating hierarchy_tree## creating node_ids_table## creating nodes_table## creating leaf_of_table## creating leaf_sample_count_tablesplineChart <- app$chart_mgr$revisualize("LinePlot", ic_plot)We can update the colors and settings on the spline chart. For example, lets limit the y axis to be between -10 and 10. To do so we use the set_chart_settings method. We can list existing settings for a chart using the list_chart_settings function.
##                           type                              js_class
## 1                  HeatmapPlot     epiviz.plugins.charts.HeatmapPlot
## 2                     LinePlot        epiviz.plugins.charts.LinePlot
## 3              StackedLinePlot epiviz.plugins.charts.StackedLinePlot
## 4 epiviz.ui.charts.tree.Icicle          epiviz.ui.charts.tree.Icicle
##   num_settings
## 1           17
## 2           16
## 3           14
## 4            5
##                                                                              settings
## 1 title,marginTop,marginBottom,marginLeft,marginRight,measurementGroupsAggregator,...
## 2 title,marginTop,marginBottom,marginLeft,marginRight,measurementGroupsAggregator,...
## 3 title,marginTop,marginBottom,marginLeft,marginRight,measurementGroupsAggregator,...
## 4                                 title,marginTop,marginBottom,marginLeft,marginRight
##   num_colors
## 1          8
## 2         20
## 3         20
## 4         10##                             id                             label default_value
## 1                        title                             Title              
## 2                    marginTop                        Top margin            30
## 3                 marginBottom                     Bottom margin            50
## 4                   marginLeft                       Left margin            30
## 5                  marginRight                      Right margin            15
## 6  measurementGroupsAggregator Aggregator for measurement groups    mean-stdev
## 7                     colLabel                    Columns labels      colLabel
## 8                     rowLabel                        Row labels          name
## 9                   showPoints                       Show points         FALSE
## 10                   showLines                        Show lines          TRUE
## 11               showErrorBars                   Show error bars          TRUE
## 12                 pointRadius                      Point radius             4
## 13               lineThickness                    Line thickness             3
## 14                        yMin                             Min Y       default
## 15                        yMax                             Max Y       default
## 16               interpolation                     Interpolation         basis
##                                                                                       possible_values
## 1                                                                                                    
## 2                                                                                                    
## 3                                                                                                    
## 4                                                                                                    
## 5                                                                                                    
## 6                                                              mean-stdev,quartiles,count,min,max,sum
## 7                                                                                                    
## 8                                                                                                    
## 9                                                                                                    
## 10                                                                                                   
## 11                                                                                                   
## 12                                                                                                   
## 13                                                                                                   
## 14                                                                                                   
## 15                                                                                                   
## 16 linear,step-before,step-after,basis,basis-open,basis-closed,bundle,cardinal,cardinal-open,monotone
##                      type
## 1                  string
## 2                  number
## 3                  number
## 4                  number
## 5                  number
## 6             categorical
## 7    measurementsMetadata
## 8  measurementsAnnotation
## 9                 boolean
## 10                boolean
## 11                boolean
## 12                 number
## 13                 number
## 14                 number
## 15                 number
## 16            categoricalSpline metavizr
Beyond marker-gene survey sequencing results, Metaviz can visualize taxonomic community profiles from whole metagenome shotgun sequencing.
In order to show this utility, we will use select portions of the vignette for a Bioconductor 2017 workshop found here: https://github.com/waldronlab/MicrobiomeWorkshop.
We first retrieve a dataset named Zeller_2014 from curatedMetagenomicData and convert the ExpressionSet containing abundance measurements into an MRExperiment which we will then use with Metaviz.
## Loading required package: curatedMetagenomicData## Loading required package: AnnotationHub## Loading required package: BiocFileCache## Loading required package: dbplyr## 
## Attaching package: 'AnnotationHub'## The following object is masked from 'package:Biobase':
## 
##     cache## Loading required package: dplyr## 
## Attaching package: 'dplyr'## The following objects are masked from 'package:dbplyr':
## 
##     ident, sql## The following objects are masked from 'package:data.table':
## 
##     between, first, last## The following object is masked from 'package:Biobase':
## 
##     combine## The following objects are masked from 'package:BiocGenerics':
## 
##     combine, intersect, setdiff, union## The following objects are masked from 'package:stats':
## 
##     filter, lag## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union## Loading required package: ExperimentHub##                       _           _
##   ___ _   _ _ __ __ _| |_ ___  __| |
##  / __| | | | '__/ _` | __/ _ \/ _` |
## | (__| |_| | | | (_| | ||  __/ (_| |
##  \___|\__,_|_|  \__,_|\__\___|\__,_|
##  __  __      _                                         _
## |  \/  | ___| |_ __ _  __ _  ___ _ __   ___  _ __ ___ (_) ___
## | |\/| |/ _ \ __/ _` |/ _` |/ _ \ '_ \ / _ \| '_ ` _ \| |/ __|
## | |  | |  __/ || (_| | (_| |  __/ | | | (_) | | | | | | | (__
## |_|  |_|\___|\__\__,_|\__, |\___|_| |_|\___/|_| |_| |_|_|\___|
##  ____        _        |___/
## |  _ \  __ _| |_ __ _
## | | | |/ _` | __/ _` |
## | |_| | (_| | || (_| |
## |____/ \__,_|\__\__,_|## Working on ZellerG_2014.metaphlan_bugs_list.stool## snapshotDate(): 2020-04-27## see ?curatedMetagenomicData and browseVignettes('curatedMetagenomicData') for documentation## loading from cache## Warning: `data_frame()` is deprecated as of tibble 1.1.0.
## Please use `tibble()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.We add the MRExperiment to the Metaviz app with a call to plot with the type = “innerNodeCounts” specified.
## MRExperiment Object validated... PASS## creating hierarchy_tree## creating node_ids_table## creating nodes_table## creating leaf_sample_count_tableA FacetZoom utility is now visible for the taxonomic hierarchy of the Zeller_2014 dataset. Now we can add a heatmap for abundance measurments for that data.
The FacetZoom control can be used to select features of interest to show in the linked heatmap. A user can explore at lower levels of the taxonomic hierarchy and remove nodes as desired from the data visualizations. We complete, all charts can be removed with a call to the Metaviz app chart manager.
To close the connection with metaviz and the R session, we use the stop_app function.
app$stop_app()sessionInfo()## R version 4.0.0 (2020-04-24)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.4 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.11-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.11-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] curatedMetagenomicData_1.17.1 ExperimentHub_1.14.0         
##  [3] dplyr_0.8.5                   AnnotationHub_2.20.0         
##  [5] BiocFileCache_1.12.0          dbplyr_1.4.3                 
##  [7] etec16s_1.15.0                msd16s_1.7.0                 
##  [9] metavizr_1.12.0               digest_0.6.25                
## [11] data.table_1.12.8             metagenomeSeq_1.30.0         
## [13] RColorBrewer_1.1-2            glmnet_3.0-2                 
## [15] Matrix_1.2-18                 limma_3.44.0                 
## [17] Biobase_2.48.0                BiocGenerics_0.34.0          
## [19] BiocStyle_2.16.0             
## 
## loaded via a namespace (and not attached):
##   [1] plyr_1.8.6                    igraph_1.2.5                 
##   [3] lazyeval_0.2.2                splines_4.0.0                
##   [5] BiocParallel_1.22.0           GenomeInfoDb_1.24.0          
##   [7] ggplot2_3.3.0                 foreach_1.5.0                
##   [9] ensembldb_2.12.0              htmltools_0.4.0              
##  [11] gdata_2.18.0                  magrittr_1.5                 
##  [13] memoise_1.1.0                 cluster_2.1.0                
##  [15] Biostrings_2.56.0             matrixStats_0.56.0           
##  [17] askpass_1.1                   prettyunits_1.1.1            
##  [19] epivizrServer_1.16.0          colorspace_1.4-1             
##  [21] blob_1.2.1                    epivizrStandalone_1.16.0     
##  [23] rappdirs_0.3.1                xfun_0.13                    
##  [25] crayon_1.3.4                  RCurl_1.98-1.2               
##  [27] jsonlite_1.6.1                graph_1.66.0                 
##  [29] survival_3.1-12               iterators_1.0.12             
##  [31] ape_5.3                       glue_1.4.0                   
##  [33] gtable_0.3.0                  zlibbioc_1.34.0              
##  [35] XVector_0.28.0                DelayedArray_0.14.0          
##  [37] phyloseq_1.32.0               Rhdf5lib_1.10.0              
##  [39] shape_1.4.4                   scales_1.1.0                 
##  [41] DBI_1.1.0                     Rcpp_1.0.4.6                 
##  [43] xtable_1.8-4                  progress_1.2.2               
##  [45] bit_1.1-15.2                  OrganismDbi_1.30.0           
##  [47] stats4_4.0.0                  httr_1.4.1                   
##  [49] gplots_3.0.3                  ellipsis_0.3.0               
##  [51] epivizr_2.18.0                pkgconfig_2.0.3              
##  [53] XML_3.99-0.3                  locfit_1.5-9.4               
##  [55] tidyselect_1.0.0              rlang_0.4.5                  
##  [57] reshape2_1.4.4                later_1.0.0                  
##  [59] AnnotationDbi_1.50.0          munsell_0.5.0                
##  [61] BiocVersion_3.11.1            tools_4.0.0                  
##  [63] RSQLite_2.2.0                 ade4_1.7-15                  
##  [65] evaluate_0.14                 biomformat_1.16.0            
##  [67] stringr_1.4.0                 fastmap_1.0.1                
##  [69] yaml_2.2.1                    knitr_1.28                   
##  [71] bit64_0.9-7                   caTools_1.18.0               
##  [73] purrr_0.3.4                   AnnotationFilter_1.12.0      
##  [75] RBGL_1.64.0                   nlme_3.1-147                 
##  [77] mime_0.9                      biomaRt_2.44.0               
##  [79] compiler_4.0.0                curl_4.3                     
##  [81] interactiveDisplayBase_1.26.0 tibble_3.0.1                 
##  [83] stringi_1.4.6                 GenomicFeatures_1.40.0       
##  [85] lattice_0.20-41               ProtGenerics_1.20.0          
##  [87] vegan_2.5-6                   permute_0.9-5                
##  [89] multtest_2.44.0               vctrs_0.2.4                  
##  [91] gss_2.1-12                    pillar_1.4.3                 
##  [93] lifecycle_0.2.0               BiocManager_1.30.10          
##  [95] bitops_1.0-6                  httpuv_1.5.2                 
##  [97] rtracklayer_1.48.0            GenomicRanges_1.40.0         
##  [99] R6_2.4.1                      epivizrData_1.16.0           
## [101] bookdown_0.18                 promises_1.1.0               
## [103] KernSmooth_2.23-17            IRanges_2.22.0               
## [105] codetools_0.2-16              MASS_7.3-51.6                
## [107] gtools_3.8.2                  assertthat_0.2.1             
## [109] Wrench_1.6.0                  rhdf5_2.32.0                 
## [111] SummarizedExperiment_1.18.0   openssl_1.4.1                
## [113] rjson_0.2.20                  GenomicAlignments_1.24.0     
## [115] Rsamtools_2.4.0               S4Vectors_0.26.0             
## [117] GenomeInfoDbData_1.2.3        mgcv_1.8-31                  
## [119] hms_0.5.3                     grid_4.0.0                   
## [121] tidyr_1.0.2                   rmarkdown_2.1                
## [123] git2r_0.26.1                  shiny_1.4.0.2References:
[1] Pop, M., Walker, A.W., Paulson, J., Lindsay, B., Antonio, M., Hossain, M.A., Oundo, J., Tamboura, B., Mai, V., Astrovskaya, I. and Bravo, H.C., 2014. Diarrhea in young children from low-income countries leads to large-scale alterations in intestinal microbiota composition. Genome biology, 15(6), p.1.
[2] Pop, M., Paulson, J.N., Chakraborty, S., Astrovskaya, I., Lindsay, B.R., Li, S., Bravo, H.C., Harro, C., Parkhill, J., Walker, A.W. and Walker, R.I., 2016. Individual-specific changes in the human gut microbiota after challenge with enterotoxigenic Escherichia coli and subsequent ciprofloxacin treatment. BMC genomics, 17(1), p.1.
[3] Paulson J.N., Talukder H., and Bravo H.C, Longitudinal differential abundance analysis of microbial marker-gene surveys using smoothing splines. In Submission.