--- title: "Using a BioMart other than Ensembl" author: "Steffen Durinck, Wolfgang Huber, Mike Smith" package: "`r BiocStyle::pkg_ver('biomaRt')`" output: BiocStyle::html_document: md_extensions: "-autolink_bare_uris" css: style.css vignette: > %\VignetteIndexEntry{Using a BioMart other than Ensembl} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} editor_options: chunk_output_type: console --- ```{r setup, echo = FALSE} knitr::opts_chunk$set(error = TRUE, cache = FALSE, eval = TRUE) options(width=100) ``` # Introduction In recent years a wealth of biological data has become available in public data repositories. Easy access to these valuable data resources and firm integration with data analysis is needed for comprehensive bioinformatics data analysis. The `r BiocStyle::Biocpkg("biomaRt")` package, provides an interface to a growing collection of databases implementing the [BioMart software suite](http://www.biomart.org). The package enables retrieval of large amounts of data in a uniform way without the need to know the underlying database schemas or write complex SQL queries. Examples of BioMart databases are Ensembl, Uniprot and HapMap. These major databases give `r BiocStyle::Biocpkg("biomaRt")` users direct access to a diverse set of data and enable a wide range of powerful online queries from R. # Using a BioMart other than Ensembl There are a small number of non-Ensembl databases that offer a BioMart interface to their data. The `r BiocStyle::Biocpkg("biomaRt")` package can be used to access these in a very similar fashion to Ensembl. The majority of `r BiocStyle::Biocpkg("biomaRt")` functions will work in the same manner, but the construction of the initial Mart object requires slightly more setup. In this section we demonstrate the setting requires to query [Wormbase ParaSite](https://parasite.wormbase.org/index.html) and [Phytozome](https://phytozome.jgi.doe.gov/pz/portal.html). First we need to load `r BiocStyle::Biocpkg("biomaRt")`. ```{r} library(biomaRt) ``` ```{r test-wormbase-ssl, echo = FALSE, eval = FALSE} if(grepl(try(httr::GET('https://parasite.wormbase.org'), silent = TRUE)[1], pattern = "sslv3 alert handshake")) { httr::set_config(httr::config(ssl_cipher_list = "DEFAULT@SECLEVEL=1")) } ``` ## Wormbase To demonstrate the use of the `r BiocStyle::Biocpkg("biomaRt")` package with non-Ensembl databases the next query is performed using the Wormbase ParaSite BioMart. In this example, we use the `listMarts()` function to find the name of the available marts, given the URL of Wormbase. We use this to connect to Wormbase BioMart using the `useMart()` function.^[Note that we use the `https` address and must provide the port as `443`. Queries to WormBase will fail without these options.] ```{r wormbase, echo=TRUE, eval=TRUE} listMarts(host = "parasite.wormbase.org") wormbase <- useMart(biomart = "parasite_mart", host = "https://parasite.wormbase.org", port = 443) ``` We can then use functions described earlier in this vignette to find and select the gene dataset, and print the first 6 available attributes and filters. Then we use a list of gene names as filter and retrieve associated transcript IDs and the transcript biotype. ```{r wormbase-2, echo=TRUE, eval=TRUE} listDatasets(wormbase) wormbase <- useDataset(mart = wormbase, dataset = "wbps_gene") head(listFilters(wormbase)) head(listAttributes(wormbase)) getBM(attributes = c("external_gene_id", "wbps_transcript_id", "transcript_biotype"), filters = "gene_name", values = c("unc-26","his-33"), mart = wormbase) ``` ## Phytozome ### Version 12 The Phytozome 12 BioMart was [retired](https://jgi.doe.gov/more-intuitive-phytozome-interface/) in August 2021 and can not longer be accessed. ### Version 13 Version 13 of Phyotozome can be found at https://phytozome-next.jgi.doe.gov/ and if you wish to query that version the URL used to create the Mart object must reflect that. ```{r, phytozome-13, echo = TRUE, eval = TRUE} phytozome_v13 <- useMart(biomart = "phytozome_mart", dataset = "phytozome", host = "https://phytozome-next.jgi.doe.gov") ``` Once this is set up the usual `r BiocStyle::Biocpkg("biomaRt")` functions can be used to interrogate the database options and run queries. ```{r, pytozome-2} getBM(attributes = c("organism_name", "gene_name1"), filters = "gene_name_filter", values = "82092", mart = phytozome_v13) ``` # Session Info ```{r sessionInfo} sessionInfo() warnings() ```