<!--
%\VignetteEngine{knitr::knitr}
%\VignetteIndexEntry{Accessing the KEGG REST API}
-->


```{r setup, echo=FALSE}
library(knitr)
options(width=80)
```
```{r wrap-hook, echo=FALSE}
hook_output = knit_hooks$get('output')
knit_hooks$set(output = function(x, options) {
  # this hook is used only when the linewidth option is not NULL
  if (!is.null(n <- options$linewidth)) {
    x = knitr:::split_lines(x)
    # any lines wider than n should be wrapped
    if (any(nchar(x) > n)) x = strwrap(x, width = n)
    x = paste(x, collapse = '\n')
  }
  hook_output(x, options)
})
```

# KEGGREST

[KEGG](https://www.kegg.jp/kegg/)
is a database resource for understanding high-level functions
and utilities of the biological system, such as the cell, the organism
and the ecosystem, from molecular-level information, especially
large-scale molecular datasets generated by genome sequencing and
other high-throughput experimental technologies.

`KEGGREST` allows access to the
[KEGG REST API](https://www.kegg.jp/kegg/docs/keggapi.html). Since
KEGG disabled the KEGG SOAP server
on December 31, 2012 (which means the `KEGGSOAP` package will no
longer work), `KEGGREST` serves as a replacement.

The interface to `KEGGREST` is simpler and in some ways more
powerful than `KEGGSOAP`; however, not all the functionality
that was available through the SOAP API has been exposed
in the REST API. If and when more functionality is exposed
on the server side, this package will be updated to take
advantage of it.

## Overview

The KEGG REST API is built on some simple operations:
`info`, `list`, `find`, `get`, `conv`, and `link`.
The corresponding `R` functions in `KEGGREST` are:
`keggInfo()`, `keggList()`, `keggFind()`, `keggGet()`,
`keggConv`, and `keggLink()`.


## Exploring KEGG Resources with `keggList()`

KEGG exposes a number of databases. To get an idea of
what is available, run `listDatabases()`:

```{r listDatabases}
library(KEGGREST)
listDatabases()
```
You can use these databases in further queries. Note that in many
cases you can also use a three-letter KEGG organism code or a 
"T number" (genome identifier) in the same place you would use 
one of these database names.

You can obtain the list of organisms available in KEGG with
the `keggList()` function:

```{r get_organisms}
org <- keggList("organism")
head(org)
```

From `KEGGREST`'s point of view, you've just asked KEGG
to show you the name of every entry in the "organism" database.

Therefore, the complete list of entities that can be
queried with `KEGGREST` can be obtained as follows:

```{r list_queryables}
queryables <- c(listDatabases(), org[,1], org[,2])
```

You could also ask for every entry in the "hsa" (_Homo sapiens_)
database as follows:

```{r query_hsa, eval=FALSE}
keggList("hsa")
```

## Get specific entries with `keggGet()`

Once you have a list of specific KEGG identifiers, use
`keggGet()` to get more information about them. Here we look up
a human gene and an E. coli O157 gene:

```{r keggGet}
query <- keggGet(c("hsa:10458", "ece:Z5100"))
```

As expected, this returns two items:

```{r querylength}
length(query)
```

Behind the scenes, `KEGGREST` downloaded and parsed a KEGG
[flat file](https://www.kegg.jp/kegg/rest/dbentry.html), which you
can now explore:

```{r explore}
names(query[[1]])
query[[1]]$ENTRY
query[[1]]$DBLINKS
```

`keggGet()` can also return amino acid sequences as `AAStringSet` objects
(from the `Biostrings` package):

```{r aaseq}
keggGet(c("hsa:10458", "ece:Z5100"), "aaseq") ## retrieves amino acid sequences
```

...or `DNAStringSet` objects if `option` is `ntseq`:

```{r ntseq}
keggGet(c("hsa:10458", "ece:Z5100"), "ntseq") ## retrieves nucleotide sequences
```


`keggGet()` can also return images:
```{r png}
png <- keggGet("hsa05130", "image") 
t <- tempfile()
library(png)
writePNG(png, t)
if (interactive()) browseURL(t)
```

__NOTE__: `keggGet()` can only return 10 result sets at once (this limitation
is on the server side). If you supply more than 10 inputs to `keggGet()`, 
`KEGGREST` will warn that only the first 10 results will be returned.

## Search by keywords with `keggFind()`

You can search for two separate keywords ("shiga" and "toxin" in this case):

```{r separate_keywords, linewidth=80}
head(keggFind("genes", c("shiga", "toxin")))
```

Or search for the two words together:

```{r keyphrase, linewidth=80}
head(keggFind("genes", "shiga toxin"))
```

Search for a chemical formula:
```{r formula}
head(keggFind("compound", "C7H10O5", "formula"))
```
Search for a chemical formula containing "O5" and "C7":
```{r formula2}
head(keggFind("compound", "O5C7", "formula"))
```

You can search for compounds with a particular exact mass:

```{r exact_mass}
keggFind("compound", 174.05, "exact_mass")
```

Because we've supplied a number with two decimal digits of precision,
KEGG will find all compounds with exact mass between 174.045 and 174.055.

Integer ranges can be used to find compounds by molecular weight:

```{r mol_weight}
head(keggFind("compound", 300:310, "mol_weight"))
```

## Convert identifiers with `keggConv()`

Convert between KEGG identifiers and outside identifiers.

You can either specify fully qualified identifiers:

```{r conv_with_ids}
keggConv("ncbi-proteinid", c("hsa:10458", "ece:Z5100"))
```

...or get the mapping for an entire species:

```{r conv_species_kegg_to_geneid}
head(keggConv("eco", "ncbi-geneid"))
```

Reversing the arguments does the opposite mapping:

```{r conv_species_geneid_to_kegg}
head(keggConv("ncbi-geneid", "eco"))

```

## Link across databases with `keggLink()`

Most of the `KEGGSOAP` functions whose names started with
"get", for example `get.pathways.by.genes()`, can be replaced
with the `keggLink()` function. Here we query all pathways
for human:

```{r keggLink}
head(keggLink("pathway", "hsa"))
```

...but you can also specify one or more genes (from multiple species):
```{r keggLink2}
keggLink("pathway", c("hsa:10458", "ece:Z5100"))
```