---
title: "An introduction to biodbHmdb"
author: "Pierrick Roger"
date: "`r BiocStyle::doc_date()`"
package: "`r BiocStyle::pkg_ver('biodbHmdb')`"
vignette: |
  %\VignetteIndexEntry{Introduction to the biodbHmdb package.}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
output:
  BiocStyle::html_document:
    toc: yes
    toc_depth: 4
    toc_float:
      collapsed: false
  BiocStyle::pdf_document: default
bibliography: references.bib
---

# Introduction

biodbHmdb is a *biodb* extension package that implements a connector to the HMDB Metabolites database.

This vignette presents the different ways to search for *HMDB* [@wishart2013_HMDB] entries with this package.

Note that *biodb* downloads the whole *HMDB* database and stores it on disk, since this is the only way to access *HMDB* programmatically. Consequently, every search on *HMDB* currently runs on the local machine.

# Installation

Install using Bioconductor:

```{r, eval=FALSE}
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install('biodbHmdb')
```

# Initialization

The first step in using *biodbHmdb* is to create an instance of the biodb class `BiodbMain` from the main *biodb* package. This is done by calling `biodb::newInst()`:

```{r, results='hide'}
mybiodb <- biodb::newInst()
```

During this step, the configuration is set up, the cache system is initialized, and extension packages are loaded.

As shown at the end of this vignette, the *biodb* instance must be terminated with a call to its `terminate()` method.

# Creating a connector to HMDB Metabolites

In *biodb*, the connection to a database is handled by a connector instance, which you obtain from the factory. biodbHmdb implements a connector to a remote database.
Here is the code to instantiate a connector:

```{r}
conn <- mybiodb$getFactory()$createConn('hmdb.metabolites')
```

For this vignette, we avoid downloading the full HMDB Metabolites database and instead use an extract containing a few entries:

```{r}
dbExtract <- system.file("extdata", 'generated', "hmdb_extract.zip",
    package="biodbHmdb")
conn$setPropValSlot('urls', 'db.zip.url', dbExtract)
```

# Accessing entries

To get the number of entries stored in the database, run:

```{r}
conn$getNbEntries()
```

To get a few entry IDs (accession numbers) from the database, here at most two, run:

```{r}
ids <- conn$getEntryIds(2)
ids
```

To retrieve entries, use:

```{r}
entries <- conn$getEntry(ids)
entries
```

To convert a list of entries into a data frame, run:

```{r}
x <- mybiodb$entriesToDataframe(entries, compute=FALSE)
x
```

# Searching by name

Here we use the generic *biodb* method `searchForEntries()` to search for entries by name:

```{r}
id <- conn$searchForEntries(list(name='1-Methylhistidine'), max.results=1)
id
```

We limit the search result to one entry with the `max.results` parameter. The first parameter is the filtering criterion, expressed as a list whose single key (in our case) is the *biodb* field on which we want to filter. The value is the text to search for. For details, see the documentation of `searchForEntries()` in `?biodb::BiodbConn`.

You can also search for several strings at once, in which case an entry matches only if its field value contains all of the specified strings:

```{r}
conn$searchForEntries(list(name=c('propanoic', 'acid')), max.results=1)
```

To inspect the values of the entry, convert it to a data frame:

```{r}
entryDf <- conn$getEntry(id)$getFieldsAsDataframe(fields=c('accession', 'name'))
```

See table \@ref(tab:entryByNameTable) for the content of this data frame.
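The multi-string query in this section matches an entry only if its field value contains every search string. The chunk below is a hypothetical, base-R illustration of that "all strings must match" rule, not *biodb*'s implementation: the helper `containsAll()` is not part of *biodb*, and its plain case-sensitive substring test is an assumption (*biodb*'s own matching, for instance its handling of case, may differ).

```{r}
# Hypothetical helper, for illustration only: it is NOT part of biodb.
# Returns TRUE when `value` contains every string in `strings`.
# Assumption: plain, case-sensitive substring matching.
containsAll <- function(value, strings) {
    # One logical per search string: is this string found in `value`?
    found <- vapply(strings, function(s) grepl(s, value, fixed=TRUE),
        logical(1))
    # Match only if all search strings were found.
    all(found)
}

containsAll('propanoic acid', c('propanoic', 'acid'))  # TRUE
containsAll('propanol', c('propanoic', 'acid'))        # FALSE
```

Using `vapply()` with `logical(1)` guarantees exactly one logical value per search string, so `all()` returns `TRUE` only when every string is found.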
```{r entryByNameTable, echo=FALSE, results='asis'}
knitr::kable(entryDf, "pipe", caption="The entry returned when searching by name.")
```

# Searching inside the "description" field

Searching inside the `description` field works in the same way as for the `name` field. Here is a search with multiple strings to match:

```{r}
id <- conn$searchForEntries(list(description=c('Parkinson', 'sclerosis')),
    max.results=1)
id
```

Again, you can inspect the values of the entry through a data frame:

```{r}
entryDf <- conn$getEntry(id)$getFieldsAsDataframe(
    fields=c('accession', 'name', 'description'))
```

See table \@ref(tab:entryByDescTable) for the content of this data frame.

```{r entryByDescTable, echo=FALSE, results='asis'}
knitr::kable(entryDf, "pipe", caption="The entry returned when searching by description.")
```

# Closing the biodb instance

When you are done with your *biodb* instance, you must terminate it to release its resources (file handles, database connections, etc.):

```{r}
mybiodb$terminate()
```

# Session information

```{r}
sessionInfo()
```

# References