---
title: "Working with the Gene Ontology"
author: 
  - name: Kevin Rue-Albrecht
    affiliation:
    - University of Oxford
    email: kevin.rue-albrecht@imm.ox.ac.uk
output: 
  BiocStyle::html_document:
    self_contained: yes
    toc: true
    toc_float: true
    toc_depth: 2
    code_folding: show
date: "`r doc_date()`"
package: "`r pkg_ver('iSEEpathways')`"
vignette: >
  %\VignetteIndexEntry{Working with the Gene Ontology}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}  
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>",
    crop = NULL ## Related to https://stat.ethz.ch/pipermail/bioc-devel/2020-April/016656.html
)
```

```{r, eval=!exists("SCREENSHOT"), include=FALSE}
SCREENSHOT <- function(x, ...) knitr::include_graphics(x)
```

```{r vignetteSetup, echo=FALSE, message=FALSE, warning = FALSE}
## Track time spent on making the vignette
startTime <- Sys.time()

## Bib setup
library("RefManageR")

## Write bibliography information
bib <- c(
    R = citation(),
    BiocStyle = citation("BiocStyle")[1],
    knitr = citation("knitr")[1],
    RefManageR = citation("RefManageR")[1],
    rmarkdown = citation("rmarkdown")[1],
    sessioninfo = citation("sessioninfo")[1],
    testthat = citation("testthat")[1],
    iSEEpathways = citation("iSEEpathways")[1]
)
```

# Scenario

In this vignette, we demonstrate how to use the `r BiocStyle::Biocpkg("GO.db")` package to dynamically display additional information about the selected pathway in the interactive user interface.

# Demonstration

## Example data

First, we generate pathway analysis results for simulated data using `r BiocStyle::Biocpkg("fgsea")`.

In particular, we use the package `r BiocStyle::Biocpkg("org.Hs.eg.db")` to fetch real gene sets.
To reduce memory footprint, we retain only the gene sets associated with 15 to 500 genes.

Then, we simulate a score for each gene present in any of the remaining gene sets.
In practice, that score could be the log~2~ fold-change of the gene in a differential expression analysis (among other possibilities).

Finally, we run a fast gene set enrichment analysis (FGSEA) on the simulated scores.

```{r "start", message=FALSE, warning=FALSE}
library("org.Hs.eg.db")
library("fgsea")

# Example data ----

## Pathways
pathways <- select(org.Hs.eg.db, keys(org.Hs.eg.db, "SYMBOL"), c("GOALL"), keytype = "SYMBOL")
pathways <- subset(pathways, ONTOLOGYALL == "BP")
pathways <- unique(pathways[, c("SYMBOL", "GOALL")])
pathways <- split(pathways$SYMBOL, pathways$GOALL)
len_pathways <- lengths(pathways)
# keep gene sets with 15 to 500 genes (inclusive), matching minSize and maxSize below
pathways <- pathways[len_pathways >= 15 & len_pathways <= 500]

## Features
set.seed(1)
# simulate a score for all genes found across all pathways
feature_stats <- rnorm(length(unique(unlist(pathways))))
names(feature_stats) <- unique(unlist(pathways))
# arbitrarily select a pathway to simulate enrichment
pathway_id <- "GO:0046324"
pathway_genes <- pathways[[pathway_id]]
# increase score of genes in the selected pathway to simulate enrichment
feature_stats[pathway_genes] <- feature_stats[pathway_genes] + 1

# fgsea ----

set.seed(42)
fgseaRes <- fgsea(pathways = pathways, 
                  stats    = feature_stats,
                  minSize  = 15,
                  maxSize  = 500)
head(fgseaRes[order(pval), ])
```
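
To see what the simulation step does without loading the full annotation, the following sketch repeats the size filter and the score shift on made-up gene sets.
It uses base R only; the names `toy_sets`, `toy_stats`, and the `gene1` to `gene40` labels are hypothetical and exist only for this illustration.

```{r}
# Toy gene sets (hypothetical), standing in for the GO gene sets
toy_sets <- list(
    setA = paste0("gene", 1:20),
    setB = paste0("gene", 11:40),
    setC = paste0("gene", 1:5)   # too small, removed by the size filter
)
toy_sizes <- lengths(toy_sets)
toy_sets <- toy_sets[toy_sizes >= 15 & toy_sizes <= 500]

# Simulate one score per gene found in the remaining sets
set.seed(1)
toy_genes <- unique(unlist(toy_sets))
toy_stats <- setNames(rnorm(length(toy_genes)), toy_genes)
baseline <- toy_stats

# Shift the scores of the genes in one set to simulate enrichment
toy_stats[toy_sets$setA] <- toy_stats[toy_sets$setA] + 1

# Average shift received by each remaining set
shift <- toy_stats - baseline
mean_shift <- sapply(toy_sets, function(genes) mean(shift[genes]))
mean_shift
```

Sets that share genes with the shifted set inherit part of the shift (here `setB`, through ten shared genes).
The same reasoning applies to the real data: `GOALL` annotations include ancestor terms, so parent terms of the selected pathway contain its genes and may also rank highly.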

Then, we embed the `r BiocStyle::Biocpkg("fgsea")` results in a `r BiocStyle::Biocpkg("SummarizedExperiment")` object.

In this case, we create an empty `?SummarizedExperiment-class` object, without any count data or metadata, as we will not use either in this example.

Before embedding the pathway analysis results in the newly created `?SummarizedExperiment-class` object, we reorder them by increasing p-value.
Although not essential, this implicitly defines the default ordering of the table in the live app.

```{r, message=FALSE, warning=FALSE}
library("SummarizedExperiment")
library("iSEEpathways")
se <- SummarizedExperiment()
fgseaRes <- fgseaRes[order(pval), ]
se <- embedPathwaysResults(fgseaRes, se, name = "fgsea", class = "fgsea", pathwayType = "GO")
```

## Pathway information

In this example, we configure the app option `PathwaysTable.select.details` to define a function that,
given the identifier of the GO term currently selected in a panel,
displays information about that GO term.

Although not essential, this is a user-friendly and immediate way to 'translate' machine-friendly database identifiers into human-friendly descriptions.

```{r, message=FALSE, warning=FALSE}
library("iSEE")
library("GO.db")
library("shiny")
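go_details <- function(x) {
    # look up the term, ontology and definition of the selected GO identifier
    info <- select(GO.db, x, c("TERM", "ONTOLOGY", "DEFINITION"), "GOID")
    # headline: identifier, term name, and ontology in parentheses
    html <- list(p(strong(info$GOID), ":", info$TERM, paste0("(", info$ONTOLOGY, ")")))
    # append the definition only when one is available
    if (!is.na(info$DEFINITION)) {
        html <- append(html, list(p(info$DEFINITION)))
    }
    tagList(html)
}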
se <- registerAppOptions(se, PathwaysTable.select.details = go_details)
```
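
The function above relies on the selected identifier being present in `r BiocStyle::Biocpkg("GO.db")`.
As a minimal sketch of a more defensive callback, the example below swaps the database for a small made-up lookup table so that it runs on its own, and returns a short notice for identifiers it does not know.
The table `toy_terms`, the function `toy_details()`, and the identifiers and descriptions they contain are hypothetical.

```{r}
library("shiny")

# Hypothetical stand-in for GO.db: a lookup table with the same columns
toy_terms <- data.frame(
    GOID = c("GO:0000001", "GO:0000002"),
    TERM = c("toy term with a definition", "toy term without a definition"),
    ONTOLOGY = c("BP", "BP"),
    DEFINITION = c("A made-up definition used only in this sketch.", NA),
    stringsAsFactors = FALSE
)

toy_details <- function(x) {
    info <- toy_terms[toy_terms$GOID %in% x, , drop = FALSE]
    # unknown identifier: return a short notice instead of raising an error
    if (nrow(info) == 0L) {
        return(tagList(p("No information available for", strong(x))))
    }
    html <- list(p(strong(info$GOID), ":", info$TERM, paste0("(", info$ONTOLOGY, ")")))
    # append the definition only when one is available
    if (!is.na(info$DEFINITION)) {
        html <- append(html, list(p(info$DEFINITION)))
    }
    tagList(html)
}

# Render the returned tags as HTML text to inspect them outside the app
html_known <- as.character(toy_details("GO:0000001"))
html_no_definition <- as.character(toy_details("GO:0000002"))
html_unknown <- as.character(toy_details("GO:9999999"))
cat(html_known, html_no_definition, html_unknown, sep = "\n\n")
```

Rendering the returned tags with `as.character()` is a quick way to check a callback before registering it with `registerAppOptions()`.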

## Live app

Finally, we configure the app initial state and launch the live app.

```{r, message=FALSE}
app <- iSEE(se, initial = list(
  PathwaysTable(ResultName="fgsea", Selected = "GO:0046324", PanelWidth = 12L)
))

if (interactive()) {
  shiny::runApp(app)
}
```

```{r, echo=FALSE, out.width="100%"}
SCREENSHOT("screenshots/gene_ontology.png", delay=20)
```


# Reproducibility

The `r Biocpkg("iSEEpathways")` package `r Citep(bib[["iSEEpathways"]])` was made possible thanks to:

* R `r Citep(bib[["R"]])`
* `r Biocpkg("BiocStyle")` `r Citep(bib[["BiocStyle"]])`
* `r CRANpkg("knitr")` `r Citep(bib[["knitr"]])`
* `r CRANpkg("RefManageR")` `r Citep(bib[["RefManageR"]])`
* `r CRANpkg("rmarkdown")` `r Citep(bib[["rmarkdown"]])`
* `r CRANpkg("sessioninfo")` `r Citep(bib[["sessioninfo"]])`
* `r CRANpkg("testthat")` `r Citep(bib[["testthat"]])`

This package was developed using `r BiocStyle::Biocpkg("biocthis")`.


Code for creating the vignette.

```{r createVignette, eval=FALSE}
## Create the vignette
library("rmarkdown")
system.time(render("gene-ontology.Rmd", "BiocStyle::html_document"))

## Extract the R code
library("knitr")
knit("gene-ontology.Rmd", tangle = TRUE)
```

Date the vignette was generated.

```{r reproduce1, echo=FALSE}
## Date the vignette was generated
Sys.time()
```

Wall-clock time spent generating the vignette.

```{r reproduce2, echo=FALSE}
## Processing time in seconds
totalTime <- diff(c(startTime, Sys.time()))
round(totalTime, digits = 3)
```

`R` session information.

```{r reproduce3, echo=FALSE}
## Session info
library("sessioninfo")
options(width = 120)
session_info()
```



# Bibliography

This vignette was generated using `r Biocpkg("BiocStyle")` `r Citep(bib[["BiocStyle"]])`
with `r CRANpkg("knitr")` `r Citep(bib[["knitr"]])` and `r CRANpkg("rmarkdown")` `r Citep(bib[["rmarkdown"]])` running behind the scenes.

Citations made with `r CRANpkg("RefManageR")` `r Citep(bib[["RefManageR"]])`.

```{r vignetteBiblio, results = "asis", echo = FALSE, warning = FALSE, message = FALSE}
## Print bibliography
PrintBibliography(bib, .opts = list(hyperlink = "to.doc", style = "html"))
```