---
title: "MicrobiomeBenchmarkData"
author:
  - name: "Samuel Gamboa"
    email: "Samuel.Gamboa.Tuz@gmail.com"
  - name: "Levi Waldron"
output:
  BiocStyle::html_document:
    toc: true
vignette: >
  %\VignetteIndexEntry{MicrobiomeBenchmarkData}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>"
)
```

# Introduction

The `MicrobiomeBenchmarkData` package provides access to a collection of
datasets with biological ground truth for benchmarking differential
abundance methods. The datasets are deposited on Zenodo:
https://doi.org/10.5281/zenodo.6911026

# Installation

```{r installation, eval=FALSE}
## Install BiocManager if not installed
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

## Release version (not yet in Bioc, so it doesn't work yet)
BiocManager::install("MicrobiomeBenchmarkData")

## Development version
BiocManager::install("waldronlab/MicrobiomeBenchmarkData")
```

```{r, message=FALSE}
library(MicrobiomeBenchmarkData)
library(purrr)
```

# Sample metadata

All sample metadata is merged into a single data frame and provided as a data
object:

```{r}
data('sampleMetadata', package = 'MicrobiomeBenchmarkData')

## Keep only the columns without missing values (present in all samples)
## and show the first rows
sample_metadata <- sampleMetadata |>
    discard(~any(is.na(.x))) |>
    head()
knitr::kable(sample_metadata)
```

# Accessing datasets

Currently, there are
`r nrow(MicrobiomeBenchmarkData::getBenchmarkData())` datasets available
through the MicrobiomeBenchmarkData package. These datasets are accessed
through the `getBenchmarkData` function.

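Because the listing is returned as a data frame, dataset names can be chosen programmatically before anything is downloaded. The chunk below is a minimal sketch, assuming the `Dataset` column (the one used elsewhere in this vignette) holds the dataset names. The `HMP_2012` prefix and the variable names `available` and `hmp_names` are illustrative choices, not part of the package API.

```{r}
library(MicrobiomeBenchmarkData)

## Default call: returns the description table without importing any dataset
available <- getBenchmarkData()

## Assumption: dataset names are stored in the `Dataset` column
## Keep only the names that start with the illustrative prefix "HMP_2012"
hmp_names <- grep("^HMP_2012", available$Dataset, value = TRUE)
hmp_names
```

A character vector such as `hmp_names` can then be supplied as the first argument of `getBenchmarkData`, together with `dryrun = FALSE`, to import only the selected datasets.
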
## Print available datasets

If no arguments are provided, the list of available datasets is printed on
screen and a data.frame with the description of the datasets is returned:

```{r}
dats <- getBenchmarkData()
```

```{r}
dats
```

## Access a single dataset

To import a dataset, call the `getBenchmarkData` function with the name of the
dataset as the first argument (`x`) and the `dryrun` argument set to `FALSE`.
The output is a list containing the imported dataset as a
TreeSummarizedExperiment object.

```{r}
tse <- getBenchmarkData('HMP_2012_16S_gingival_V35_subset', dryrun = FALSE)[[1]]
tse
```

## Access a few datasets

Several datasets can be imported simultaneously by giving the names of the
different datasets in a character vector:

```{r}
list_tse <- getBenchmarkData(dats$Dataset[2:4], dryrun = FALSE)
str(list_tse, max.level = 1)
```

## Access all of the datasets

To import all of the datasets, provide the `dryrun = FALSE` argument alone:

```{r}
mbd <- getBenchmarkData(dryrun = FALSE)
str(mbd, max.level = 1)
```

# Annotations for each taxon are included in rowData

The biological annotations of each taxon are provided as a column in the
`rowData` slot of the TreeSummarizedExperiment.

```{r}
## In this case, the column is named taxon_annotation
tse <- mbd$HMP_2012_16S_gingival_V35_subset
rowData(tse)
```

# Cache

The datasets are cached so they're only downloaded once. The cache and all of
the files contained in it can be removed with the `removeCache` function.

```{r, eval=FALSE}
removeCache()
```

# Session information

```{r}
sessionInfo()
```