---
title: "brendaDb"
author:
- name: Yi Zhou
  affiliation: Institute of Bioinformatics, University of Georgia
  email: Yi.Zhou@uga.edu
date: "`r Sys.Date()`"
output:
  BiocStyle::html_document:
    toc_float: true
vignette: >
  %\VignetteIndexEntry{brendaDb}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```

# Overview
`r Biocpkg("brendaDb")` aims to make importing and analyzing data from the [BRENDA database](https://www.brenda-enzymes.org) easier. The main functions include:

- Read [text file downloaded from BRENDA](https://www.brenda-enzymes.org/download_brenda_without_registration.php) into an R `tibble`
- Retrieve information for specific enzymes
- Query enzymes using their synonyms, gene symbols, etc.
- Query enzyme information for specific [BioCyc](https://biocyc.org) pathways

For bug reports or feature requests, please go to the [GitHub repository](https://github.com/y1zhou/brendaDb/issues).

# Installation
`r Biocpkg("brendaDb")` is a *Bioconductor* package and can be installed through `BiocManager::install()`.
```{r, eval=FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("brendaDb", dependencies=TRUE)
```

Alternatively, install the development version from GitHub.
```{r, setup, message=FALSE}
if(!requireNamespace("brendaDb")) {
  devtools::install_github("y1zhou/brendaDb")
}
```

After the package is installed, it can be loaded into the *R* workspace by
```{r}
library(brendaDb)
```

# Getting Started
## Downloading the BRENDA Text File
Download the BRENDA database as [a text file](https://www.brenda-enzymes.org/download_brenda_without_registration.php) here. Alternatively, download the file in R (file updated 2019-04-24):
```{r, eval=FALSE}
brenda.filepath <- DownloadBrenda()
#> Please read the license agreement in the link below.
#>
#> https://www.brenda-enzymes.org/download_brenda_without_registration.php
#>
#> Found zip file in cache.
#> Extracting zip file...
```

The function downloads the file to a local cache directory. Now the text file can be loaded into R as a `tibble`:
```{r, eval=FALSE}
df <- ReadBrenda(brenda.filepath)
#> Reading BRENDA text file...
#> Converting text into a list. This might take a while...
#> Converting list to tibble and removing duplicated entries...
#> If you're going to use this data again, consider saving this table using data.table::fwrite().
```

As suggested in the function output, you may save the `df` object to a text file using `data.table::fwrite()` or to an R object using `save(df)`, and load the table using `data.table::fread()` or `load()`^[This requires the R package `r CRANpkg("data.table")` to be installed.]. Both methods should be much faster than reading the raw text file again using `ReadBrenda()`.

# Making Queries
Since BRENDA is a database for enzymes, all final queries are based on EC numbers.

## Query for Multiple Enzymes
If you already have a list of EC numbers in mind, you may call `QueryBrenda` directly:
```{r}
brenda_txt <- system.file("extdata", "brenda_download_test.txt",
                          package = "brendaDb")
df <- ReadBrenda(brenda_txt)
res <- QueryBrenda(df, EC = c("1.1.1.1", "6.3.5.8"), n.core = 2)

res

res[["1.1.1.1"]]
```

## Query Specific Fields
You can also query for certain fields to reduce the size of the returned object.
```{r}
ShowFields(df)

res <- QueryBrenda(df, EC = "1.1.1.1", fields = c("PROTEIN", "SUBSTRATE_PRODUCT"))
res[["1.1.1.1"]][["interactions"]][["substrate.product"]]
```

It should be noted that most fields contain a `fieldInfo` column and a `commentary` column. The `fieldInfo` column is what's extracted by BRENDA from the literature, and the `commentary` column is usually some context from the original paper. `#` symbols in the commentary correspond to the `proteinID`s, and `<>` enclose the corresponding `refID`s. For further information, please see [the README file from BRENDA](https://www.brenda-enzymes.org/download_brenda_without_registration.php).

## Query Specific Organisms
Note the difference in row numbers in the following example and in the one where we queried for [all organisms](#query-for-multiple-enzymes).

```{r}
res <- QueryBrenda(df, EC = "1.1.1.1", organisms = "Homo sapiens")
res$`1.1.1.1`
```

## Extract Information in Query Results
To transform the `brenda.entries` structure into a table, use the helper function `ExtractField()`.
```{r}
res <- QueryBrenda(df, EC = c("1.1.1.1", "6.3.5.8"), n.core = 2)
ExtractField(res, field = "parameters$ph.optimum")
```

As shown above, the returned table consists of three parts: the EC number, organism-related information (organism, protein ID, uniprot ID, and commentary on the organism), and extracted field information (description, commentary, etc.).

# Foreign ID Retrieval
## Querying Synonyms
A lot of the times we have a list of gene symbols or enzyme names instead of EC numbers. In this case, a helper function can be used to find the corresponding EC numbers:

```{r}
ID2Enzyme(brenda = df, ids = c("ADH4", "CD38", "pyruvate dehydrogenase"))
```

The `EC` column can be then handpicked and used in `QueryBrenda()`.

## BioCyc Pathways
Often we are interested in the enzymes involved in a specific [BioCyc](https://biocyc.org) pathway. As BioCyc now requires login credentials
for using their web service, users are recommended to use the [metabolike](https://github.com/y1zhou/metabolike) package for more advanced queries.

# Additional Information {.unnumbered}
By default `QueryBrenda` uses all available cores, but often limiting `n.core` could give better performance as it reduces the overhead. The following are results produced on a machine with 40 cores (2 Intel Xeon CPU E5-2640 v4 @ 3.4GHz), and 256G of RAM:
```{r, eval=FALSE}
EC.numbers <- head(unique(df$ID), 100)
system.time(QueryBrenda(df, EC = EC.numbers, n.core = 0))  # default
#  user  system elapsed
# 4.528   7.856  34.567
system.time(QueryBrenda(df, EC = EC.numbers, n.core = 1))
#  user  system elapsed
# 22.080   0.360  22.438
system.time(QueryBrenda(df, EC = EC.numbers, n.core = 2))
#  user  system elapsed
# 0.552   0.400  13.597
system.time(QueryBrenda(df, EC = EC.numbers, n.core = 4))
#  user  system elapsed
# 0.688   0.832   9.517
system.time(QueryBrenda(df, EC = EC.numbers, n.core = 8))
#  user  system elapsed
# 1.112   1.476  10.000
```

```{r}
sessionInfo()
```