---
title: "GenomicInteractionNodes Guide"
author: "Jianhong Ou, Yu Xiang, Xiaolin Wei, and Yarui Diao"
date: "`r doc_date()`"
package: "`r pkg_ver('GenomicInteractionNodes')`"
vignette: >
  %\VignetteIndexEntry{GenomicInteractionNodes Vignette}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
output:
  BiocStyle::html_document
---

```{r, echo=FALSE, results="hide", warning=FALSE, message=FALSE}
library(GenomicInteractionNodes)
library(rtracklayer)
library(TxDb.Hsapiens.UCSC.hg19.knownGene) 
library(org.Hs.eg.db)
knitr::opts_chunk$set(warning=FALSE, message=FALSE)
```

# Introduction

The profiles of genome interactions can be categorized into three levels:
loops, domains and compartments.
Cis-regulatory elements (cREs) regulate the distal target gene expression
through three-dimensional chromatin looping.
Most of these loops occur within the boundaries of topologically
associating domains (TADs).
The TAD boundaries are enriched for insulator binding proteins.
HiC analysis revealed the TADs can be segregated into an active (A)
and inactive (B) compartment. 
TADs are megabase-sized chromosomal regions where the loop sizes may 
very from thousand bases to mega bases.
We know that not all the loops within one TADs are involved in single
binding platform.
There are hundreds or even thousands binding platforms even in a single TAD.
Batut et al. report the 'tethering element' help to establish
long-range enhancer-promoter interactions.
Tethering elements are the one kind of the binding platform
which bring together enhancers and promoters for rapid gene activation.
For each binding platform, multiple loops may work together to work
as a single function, such as repression or promotion one or more 
distal target gene expression.
That means multiple promoters and enhancers may bind together to perform
as super enhancer or regulatory elements.

We defined such kind of binding platform as genomic interaction node or
interaction hot spot.
The interaction node can be a tethering element working as node,
or a super-enhancer (or super regulatory element region).
It is a level of genome organization higher than loops but smaller than TADs.
It is a kind of tethering element with multiple interaction loops.

The interaction node has two attributes:
1. it must contain multiple interaction loops,
2. it regulates one or more target genes.

To help users to define the interaction nodes, we developed the 
`Bioconductor` package: `GenomicInteractionNodes`.
The `GenomicInteractionNodes` package will define the interaction node
by testing the involved loops within one connected component
by Poisson distribution. The annotated loops will be used for 
enrichment analysis.

# Installation
You can install the package via `devtools::install_github` from `github`.
```{r devtoolsInstallation, eval=FALSE}
library(devtools)
install_github("jianhong/GenomicInteractionNodes")
```

You can also try to install it via `BiocManager::install` when it is ready in Bioconductor.
```{r BiocManagerInstallation, eval=FALSE}
library(BiocManager)
install("GenomicInteractionNodes")
```

# Quick start
Here is an example to use `GenomicInteractionNodes` to define interaction nodes
and do downstream enrichment analysis.

There are three steps,

1. `detectNodes`, define the nodes.

2. `annotateNodes`, annotate the nodes for downstream analysis.

3. `enrichmentAnalysis`, Gene Ontology enrichment analysis.

```{r quickStart}
## load library
library(GenomicInteractionNodes)
library(rtracklayer)
library(TxDb.Hsapiens.UCSC.hg19.knownGene) ## for human hg19
library(org.Hs.eg.db) ## used to convert gene_id to gene_symbol
library(GO.db) ## used for enrichment analysis

## import the interactions from bedpe file
p <- system.file("extdata", "WT.2.bedpe",
                 package = "GenomicInteractionNodes")
#### please try to replace the connection to your file path
interactions <- import(con=p, format="bedpe")

## define the nodes
nodes <- detectNodes(interactions)
names(nodes)
lapply(nodes, head, n=2)

## annotate the nodes by gene promoters
node_regions <- nodes$node_regions
node_regions <- annotateNodes(node_regions,
                        txdb=TxDb.Hsapiens.UCSC.hg19.knownGene,
                        orgDb=org.Hs.eg.db,
                        upstream=2000, downstream=500)
head(node_regions, n=2)

## Gene Ontology enrichment analysis
enrich <- enrichmentAnalysis(node_regions, orgDb=org.Hs.eg.db)
names(enrich$enriched)
names(enrich$enriched_in_compound)
res <- enrich$enriched$BP
res <- res[order(res$fdr), ]
head(res, n=2)
```

# Session Info
```{r sessionInfo}
sessionInfo()
```