---
title: "Enrichment"
author:
- name: Y-h. Taguchi
  affiliation:  Department of Physics, Chuo University, Tokyo 112-8551, Japan
  email: tag@granular.com
output:
  BiocStyle::html_document:
    toc: true
bibliography: references.bib
vignette: >
  %\VignetteIndexEntry{Enrichment}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r style, echo = FALSE, results = 'asis'}
BiocStyle::markdown()
```

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  crop = NULL,
  comment = "#>"
)
```



```{r setup}
library(TDbasedUFE)
library(TDbasedUFEadv)
library(DOSE)
library(enrichplot)
library(RTCGA.rnaseq)
library(RTCGA.clinical)
library(enrichR)
library(STRINGdb)
```

# Introduction

It can be helpful to demonstrate how to evaluate selected genes by enrichment analysis. Here, we show some useful tools applied to the output from TDbasedUFEadv.
To do this, we reproduce one example from "How to use TDbasedUFEadv" as follows.

``` {r}
Multi <- list(
  BLCA.rnaseq[seq_len(100), 1 + seq_len(1000)],
  BRCA.rnaseq[seq_len(100), 1 + seq_len(1000)],
  CESC.rnaseq[seq_len(100), 1 + seq_len(1000)],
  COAD.rnaseq[seq_len(100), 1 + seq_len(1000)]
)
Z <- prepareTensorfromList(Multi, 10L)
Z <- aperm(Z, c(2, 1, 3))
Clinical <- list(BLCA.clinical, BRCA.clinical, CESC.clinical, COAD.clinical)
Multi_sample <- list(
  BLCA.rnaseq[seq_len(100), 1, drop = FALSE],
  BRCA.rnaseq[seq_len(100), 1, drop = FALSE],
  CESC.rnaseq[seq_len(100), 1, drop = FALSE],
  COAD.rnaseq[seq_len(100), 1, drop = FALSE]
)
# patient.stage_event.tnm_categories.pathologic_categories.pathologic_m
ID_column_of_Multi_sample <- c(770, 1482, 773, 791)
# patient.bcr_patient_barcode
ID_column_of_Clinical <- c(20, 20, 12, 14)
Z <- PrepareSummarizedExperimentTensor(
  feature = colnames(ACC.rnaseq)[1 + seq_len(1000)],
  sample = array("", 1), value = Z,
  sampleData = prepareCondTCGA(
    Multi_sample, Clinical,
    ID_column_of_Multi_sample, ID_column_of_Clinical
  )
)
HOSVD <- computeHosvd(Z)
cond <- attr(Z, "sampleData")
index <- selectFeatureProj(HOSVD, Multi, cond, de = 1e-3, input_all = 3) # Batch mode
head(tableFeatures(Z, index))
# Feature names have the form "SYMBOL|ENTREZ"; split them into two vectors
feature_ids <- strsplit(tableFeatures(Z, index)[, 1], "|", fixed = TRUE)
genes <- unlist(lapply(feature_ids, "[", 1)) # gene symbols
entrez <- unlist(lapply(feature_ids, "[", 2)) # Entrez gene IDs
```
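
The final lines of the chunk above rely on the feature names having the form `SYMBOL|ENTREZ`, which is how the RTCGA.rnaseq columns used here appear to be labelled. A minimal sketch of that split on a few example feature names (the `toy_*` objects are introduced only for illustration and are not part of the analysis):

```{r}
# Illustrative feature names in the "SYMBOL|ENTREZ" form assumed above
toy_features <- c("A1BG|1", "A2M|2", "NAT1|9")
# fixed = TRUE treats "|" as a literal character, not as regex alternation
toy_parts <- strsplit(toy_features, "|", fixed = TRUE)
toy_symbols <- vapply(toy_parts, `[`, character(1), 1)
toy_entrez <- vapply(toy_parts, `[`, character(1), 2)
toy_symbols
toy_entrez
```

Without `fixed = TRUE`, `"|"` would be interpreted as a regular expression and each string would be split into single characters.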

# Enrichr 

Enrichr [@Enrichr] is one of the tools that often provide significant results
for genes selected by TDbasedUFE and TDbasedUFEadv.

``` {r}
setEnrichrSite("Enrichr")
websiteLive <- TRUE
dbs <- c(
  "GO_Molecular_Function_2021", "GO_Cellular_Component_2021",
  "GO_Biological_Process_2021"
)
# Query Enrichr and plot only when the website is reachable
if (websiteLive) {
  enriched <- enrichr(genes, dbs)
  plotEnrich(enriched$GO_Biological_Process_2021,
    showTerms = 20, numChar = 40, y = "Count",
    orderBy = "P.value"
  )
}
```

Enrichr provides a large number of enrichment analyses,
many of which, in our experience, are well suited to the genes selected by
TDbasedUFE and TDbasedUFEadv.
Please check Enrichr's website to see which kinds of enrichment
analyses are available.
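
The available gene-set libraries can also be listed from R. A minimal sketch, assuming the Enrichr server is reachable and that `listEnrichrDbs()` returns a data frame with a `libraryName` column, as it does in the enrichR versions we are aware of (`available_dbs` and `go_libs` are names introduced only for this illustration):

```{r}
library(enrichR)
# The call may return NULL or raise an error when the server cannot be
# reached, so both cases are handled before the result is used
available_dbs <- tryCatch(listEnrichrDbs(), error = function(e) NULL)
if (is.data.frame(available_dbs)) {
  # Keep only the Gene Ontology libraries, such as those used in `dbs` above
  go_libs <- grep("^GO_", available_dbs$libraryName, value = TRUE)
  head(go_libs)
}
```

Any library name listed this way should be usable as an entry of the `dbs` vector passed to `enrichr()`.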

# STRING

STRING [@STRING] performs enrichment analysis based on protein-protein interactions
and is known to often provide significant results for genes selected by
TDbasedUFE and TDbasedUFEadv.

```{r}
options(timeout = 200)
string_db <- STRINGdb$new(
  version = "11.5",
  species = 9606, score_threshold = 200,
  network_type = "full", input_directory = ""
)
example1_mapped <- string_db$map(data.frame(genes = genes),
  "genes",
  removeUnmappedRows = TRUE
)
hits <- example1_mapped$STRING_id
string_db$plot_network(hits)
```

# enrichplot

Although the tools above provide ample information for evaluating
the genes selected by TDbasedUFE and TDbasedUFEadv, one might want
an all-in-one package that does not require deciding in advance which
categories to evaluate in the enrichment analysis.

In this case, we would recommend Metascape [@Metascape], which unfortunately
cannot be accessed from R. Thus, we recommend enrichplot, used here together
with DOSE, as an alternative. The example below runs `enrichDGN` from DOSE and
uses `dotplot` from enrichplot to list the most significant terms.


```{r}
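# Over-representation analysis (ORA) of disease-gene associations with DOSE
edo <- enrichDGN(entrez)
# ggplot2 is not attached in the setup chunk, so ggtitle() is namespace-qualified
dotplot(edo, showCategory = 30) + ggplot2::ggtitle("dotplot for ORA")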
```
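
The "ORA" in the plot title stands for over-representation analysis. As a rough sketch of the idea, and not of DOSE's exact implementation, the p-value for a single term can be computed from the hypergeometric distribution in base R. All counts below are hypothetical placeholders chosen only for illustration:

```{r}
# Hypothetical counts (not taken from the analysis above)
toy_universe <- 20000 # annotated background genes
toy_term_size <- 150 # background genes annotated to the term
toy_selected <- 100 # genes selected by the analysis
toy_overlap <- 8 # selected genes annotated to the term
# P(overlap >= toy_overlap) under random sampling without replacement
toy_p <- phyper(toy_overlap - 1, toy_term_size,
  toy_universe - toy_term_size, toy_selected,
  lower.tail = FALSE
)
# The same value from a one-sided Fisher's exact test on the 2 x 2 table
# (rows: selected / not selected, columns: in term / not in term)
toy_table <- matrix(c(
  toy_overlap, toy_term_size - toy_overlap,
  toy_selected - toy_overlap,
  toy_universe - toy_term_size - toy_selected + toy_overlap
), nrow = 2)
toy_fisher <- fisher.test(toy_table, alternative = "greater")$p.value
all.equal(toy_p, toy_fisher)
```

In practice such p-values are computed for many terms at once, so they need to be corrected for multiple testing, for example with `p.adjust(..., method = "BH")`.
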

```{r}
sessionInfo()
```