---
title: "Introduction to HiCParser"
author:
  - name: Elise Maigné
    affiliation:
    - INRAE, MIAT
    email: elise.maigne@inrae.fr
  - name: Matthias Zytnicki
    affiliation:
    - INRAE, MIAT
    email: matthias.zytnicki@inrae.fr
output:
  BiocStyle::html_document:
    self_contained: yes
    toc: true
    toc_float: true
    toc_depth: 2
    code_folding: show
date: "`r BiocStyle::doc_date()`"
package: "`r BiocStyle::pkg_ver('HiCParser')`"
vignette: >
  %\VignetteIndexEntry{Introduction to HiCParser}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
crop = NULL,
warning = FALSE
)
```
# Basics
## Required knowledge
`r BiocStyle::Biocpkg("HiCParser")` builds on other packages, in particular
those that implement the infrastructure needed to handle Hi-C data with
several replicates and conditions. It provides parsers for several standard
Hi-C data formats and imports the data into R as an
`r BiocStyle::Biocpkg("InteractionSet")` object.
## Citing `HiCParser`
We hope that `r BiocStyle::Biocpkg("HiCParser")` will be useful for your
research.
Please use the following information to cite the package and the overall
approach. Thank you!
```{r "citation"}
## Citation info
citation("HiCParser")
```
# Start using `HiCParser`
```{r "start", message=FALSE}
library("HiCParser")
```
`HiCParser` can import Hi-C data sets in several formats:
- Cooler `.cool` or `.mcool` files.
- Juicer `.hic` files.
- HiC-Pro `.matrix` and `.bed` files.
- Tabular (`.tsv`, `.csv`, ...) files.
## Cooler files
### `.cool` files
To load `.cool` files generated by [Cooler][cooler-documentation]
[@cooler]:
```{r coolFormat}
# Path to each file
paths <- c(
"path/to/condition-1.replicate-1.cool",
"path/to/condition-1.replicate-2.cool",
"path/to/condition-1.replicate-3.cool",
"path/to/condition-2.replicate-1.cool",
"path/to/condition-2.replicate-2.cool",
"path/to/condition-2.replicate-3.cool"
)
# For the sake of the example, we will use the same file, several times
paths <- rep(
system.file("extdata",
"hicsample_21.cool",
package = "HiCParser"
),
6
)
# Condition and replicate of each file. Can be names instead of numbers.
conditions <- c(1, 1, 1, 2, 2, 2)
replicates <- c(1, 2, 3, 1, 2, 3)
# Instantiation of data set
hic.experiment <- parseCool(
paths,
conditions = conditions,
replicates = replicates
)
```
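When file names encode the experimental design, as in the placeholder paths above, the `conditions` and `replicates` vectors can be derived from the names instead of being typed by hand. The following is a minimal sketch in base R, assuming a hypothetical `condition-X.replicate-Y` naming scheme; the variable names are illustrative and are not part of `HiCParser`.

```{r designFromNames}
# Hypothetical file names following a "condition-X.replicate-Y" scheme
examplePaths <- c(
    "path/to/condition-1.replicate-1.cool",
    "path/to/condition-1.replicate-2.cool",
    "path/to/condition-2.replicate-1.cool",
    "path/to/condition-2.replicate-2.cool"
)
exampleNames <- basename(examplePaths)
# Capture the number that follows each keyword
exampleConditions <- as.integer(
    sub("^condition-([0-9]+)\\..*$", "\\1", exampleNames)
)
exampleReplicates <- as.integer(
    sub("^.*replicate-([0-9]+)\\..*$", "\\1", exampleNames)
)
# Fail early if a file name does not follow the assumed scheme
stopifnot(!anyNA(exampleConditions), !anyNA(exampleReplicates))
data.frame(
    file = exampleNames,
    condition = exampleConditions,
    replicate = exampleReplicates
)
```

The resulting vectors could then be passed as `conditions` and `replicates` to the parser, in the same order as the paths.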
### `.mcool` files
To load `.mcool` files generated by [Cooler][cooler-documentation]
[@cooler]:
```{r mcoolFormat}
# Path to each file
paths <- c(
"path/to/condition-1.replicate-1.mcool",
"path/to/condition-1.replicate-2.mcool",
"path/to/condition-1.replicate-3.mcool",
"path/to/condition-2.replicate-1.mcool",
"path/to/condition-2.replicate-2.mcool",
"path/to/condition-2.replicate-3.mcool"
)
# For the sake of the example, we will use the same file, several times
paths <- rep(
system.file("extdata",
"hicsample_21.mcool",
package = "HiCParser"
),
6
)
# Condition and replicate of each file. Can be names instead of numbers.
conditions <- c(1, 1, 1, 2, 2, 2)
replicates <- c(1, 2, 3, 1, 2, 3)
# mcool files can store several resolutions.
# We will mention the one we need.
binSize <- 5000000
# Instantiation of data set
# The same function "parseCool" is used for cool and mcool files
hic.experiment <- parseCool(
paths,
conditions = conditions,
replicates = replicates,
binSize = binSize # Specified for .mcool files.
)
```
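Since an `.mcool` file can hold several resolutions, it can help to check which bin sizes are available before choosing `binSize`. The sketch below is one possible way to do so. It assumes that the `rhdf5` package is installed and that the file follows the usual multi-resolution Cooler layout, in which each resolution is an HDF5 group under `/resolutions`; neither assumption is stated by `HiCParser` itself.

```{r mcoolResolutions}
mcoolPath <- system.file("extdata", "hicsample_21.mcool", package = "HiCParser")
if (requireNamespace("rhdf5", quietly = TRUE) && nzchar(mcoolPath)) {
    # List the first two levels of the HDF5 hierarchy
    mcoolContent <- rhdf5::h5ls(mcoolPath, recursive = 2)
    # Assumption: each group under "/resolutions" is named after its bin size
    availableBinSizes <- sort(as.numeric(
        mcoolContent$name[mcoolContent$group == "/resolutions"]
    ))
    availableBinSizes
}
```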
## Juicer files
To load `.hic` files generated by [Juicer][juicer-documentation] [@juicer]:
```{r hicFormat}
# Path to each file
paths <- c(
"path/to/condition-1.replicate-1.hic",
"path/to/condition-1.replicate-2.hic",
"path/to/condition-1.replicate-3.hic",
"path/to/condition-2.replicate-1.hic",
"path/to/condition-2.replicate-2.hic",
"path/to/condition-2.replicate-3.hic"
)
# For the sake of the example, we will use the same file, several times
paths <- rep(
system.file("extdata",
"hicsample_21.hic",
package = "HiCParser"
),
6
)
# Condition and replicate of each file. Can be names instead of numbers.
conditions <- c(1, 1, 1, 2, 2, 2)
replicates <- c(1, 2, 3, 1, 2, 3)
# hic files can store several resolutions.
# We will mention the one we need.
binSize <- 5000000
# Instantiation of data set
hic.experiment <- parseHiC(
paths,
conditions = conditions,
replicates = replicates,
binSize = binSize
)
```
Currently, `HiCParser` supports the `.hic` format up to version 9.
## HiC-Pro files
To load `.matrix` and `.bed` files generated by [HiC-Pro][hicpro-documentation]
[@hicpro]:
```{r hicproFormat}
# Path to each matrix file
matrixPaths <- c(
"path/to/condition-1.replicate-1.matrix",
"path/to/condition-1.replicate-2.matrix",
"path/to/condition-1.replicate-3.matrix",
"path/to/condition-2.replicate-1.matrix",
"path/to/condition-2.replicate-2.matrix",
"path/to/condition-2.replicate-3.matrix"
)
# For the sake of the example, we will use the same file, several times
matrixPaths <- rep(
system.file("extdata",
"hicsample_21.matrix",
package = "HiCParser"
),
6
)
# Path to each bed file
bedPaths <- c(
"path/to/condition-1.replicate-1.bed",
"path/to/condition-1.replicate-2.bed",
"path/to/condition-1.replicate-3.bed",
"path/to/condition-2.replicate-1.bed",
"path/to/condition-2.replicate-2.bed",
"path/to/condition-2.replicate-3.bed"
)
# Alternatively, if the same bed file is used, we can provide it only once
bedPaths <- system.file("extdata",
"hicsample_21.bed",
package = "HiCParser"
)
# Condition and replicate of each file. Can be names instead of numbers.
conditions <- c(1, 1, 1, 2, 2, 2)
replicates <- c(1, 2, 3, 1, 2, 3)
# Instantiation of data set
hic.experiment <- parseHiCPro(
matrixPaths = matrixPaths,
bedPaths = bedPaths,
conditions = conditions,
replicates = replicates
)
```
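With many samples, the matrix paths and the matching `conditions` and `replicates` vectors can be generated together from a design table, which keeps them in the same order. This is a minimal sketch in base R; the directory and the `condition-X.replicate-Y.matrix` naming scheme are hypothetical placeholders.

```{r hicproDesign}
# Hypothetical design: 2 conditions with 3 replicates each.
# expand.grid varies its first argument fastest, so replicates cycle
# within each condition.
exampleDesign <- expand.grid(replicate = 1:3, condition = 1:2)
# Hypothetical directory and file naming scheme
exampleMatrixPaths <- file.path(
    "path/to",
    sprintf(
        "condition-%d.replicate-%d.matrix",
        exampleDesign$condition,
        exampleDesign$replicate
    )
)
exampleMatrixPaths
```

Here `exampleDesign$condition` and `exampleDesign$replicate` line up with `exampleMatrixPaths` element by element.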
## Tabular files
A tabular file is a tab-separated multi-replicate sparse matrix with a header:
```
chromosome position 1 position 2 C1.R1 C1.R2 C1.R3 ...
Y 1500000 7500000 145 184 72 ...
```
The number of interactions between `position 1` and `position 2` of
`chromosome` is reported in each `condition.replicate` column. There is no
limit to the number of conditions and replicates.
To load Hi-C data in this format:
```{r tabFormat}
hic.experiment <- parseTabular(
system.file("extdata",
"hicsample_21.tsv",
package = "HiCParser"
),
sep = "\t"
)
```
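To make the layout described above concrete, the sketch below writes a small table of that shape to a temporary file using base R only, then reads it back. The counts are invented for illustration, and the header names simply copy the example shown earlier; the bundled `hicsample_21.tsv` remains the reference for the exact header expected by `parseTabular`.

```{r tabularSketch}
# Invented counts: two interactions, two conditions, two replicates each
toyInteractions <- data.frame(
    "chromosome" = c("Y", "Y"),
    # Integers avoid scientific notation (e.g. 9e+06) in the written file
    "position 1" = c(1500000L, 1500000L),
    "position 2" = c(7500000L, 9000000L),
    "C1.R1" = c(145L, 12L),
    "C1.R2" = c(184L, 9L),
    "C2.R1" = c(72L, 30L),
    "C2.R2" = c(80L, 25L),
    check.names = FALSE
)
toyPath <- tempfile(fileext = ".tsv")
write.table(
    toyInteractions,
    toyPath,
    sep = "\t",
    quote = FALSE,
    row.names = FALSE
)
# Read the file back to check the header and the number of rows
toyReadBack <- read.delim(toyPath, check.names = FALSE)
colnames(toyReadBack)
nrow(toyReadBack)
```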
# Output: InteractionSet format
The output is an `r BiocStyle::Biocpkg("InteractionSet")` object, which can
store one or several samples.
Please read the documentation of the
`r BiocStyle::Biocpkg("InteractionSet")` package to learn more about this
format.
```{r}
library("HiCParser")
hicFilePath <- system.file("extdata", "hicsample_21.hic", package = "HiCParser")
hic.experiment <- parseHiC(
paths = rep(hicFilePath, 6),
binSize = 5000000,
conditions = rep(seq(2), each = 3),
replicates = rep(seq(3), 2)
)
hic.experiment
```
The conditions and replicates are reported in the `colData` slot:
```{r}
SummarizedExperiment::colData(hic.experiment)
```
They correspond to the columns of the `assays` matrix (which contains the
interaction values):
```{r}
head(SummarizedExperiment::assay(hic.experiment))
```
The positions of interactions are in the `interactions` slot of the object:
```{r}
InteractionSet::interactions(hic.experiment)
```
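The accessors shown above can be complemented with a few others from `r BiocStyle::Biocpkg("InteractionSet")`. The sketch below rebuilds a small single-sample object from the bundled `.cool` file so that it runs on its own; `exampleObject` and `coolPath` are illustrative names.

```{r inspectObject}
library("HiCParser")
coolPath <- system.file("extdata", "hicsample_21.cool", package = "HiCParser")
exampleObject <- parseCool(coolPath, conditions = 1, replicates = 1)
# Number of interactions (rows) and of samples (columns)
dim(exampleObject)
# There is one row of colData per column of the assay matrix
stopifnot(
    nrow(SummarizedExperiment::colData(exampleObject)) ==
        ncol(SummarizedExperiment::assay(exampleObject))
)
# Genomic bins that the interactions refer to
InteractionSet::regions(exampleObject)
# First anchor of each interaction, as genomic ranges
InteractionSet::anchors(exampleObject, type = "first")
```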
## Additional utility functions
The `mergeInteractionSet` function merges two `InteractionSet` objects
from the same experiment (for different replicates or conditions).
It combines the interaction bins of both objects and fills the assays
matrix accordingly, so that the returned `InteractionSet` has an assays
matrix with one column per sample.
Here is a fictitious example:
```{r}
path <- system.file("extdata", "hicsample_21.cool", package = "HiCParser")
object1 <- parseCool(path, conditions = 1, replicates = 1)
# Creating an object with a different condition
object2 <- parseCool(path, conditions = 2, replicates = 1)
```
The merged object:
```{r}
objectMerged <- mergeInteractionSet(object1, object2)
SummarizedExperiment::colData(objectMerged)
head(SummarizedExperiment::assay(objectMerged))
```
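Because `mergeInteractionSet` takes two objects and returns an `InteractionSet`, more than two objects can presumably be combined by applying it repeatedly. The sketch below does this with base R's `Reduce`, reusing the bundled `.cool` file under three made-up conditions; it assumes that a merged object is itself a valid input for a further merge.

```{r mergeSeveral}
library("HiCParser")
coolPath <- system.file("extdata", "hicsample_21.cool", package = "HiCParser")
# Three single-sample objects, one per (fictitious) condition
exampleObjects <- lapply(
    seq(3),
    function(condition) {
        parseCool(coolPath, conditions = condition, replicates = 1)
    }
)
# Merge pairwise, from left to right
exampleMerged <- Reduce(mergeInteractionSet, exampleObjects)
SummarizedExperiment::colData(exampleMerged)
```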
# Reproducibility
This package was developed using `r BiocStyle::Biocpkg("biocthis")`.
`R` session information.
```{r reproduce3, echo=FALSE}
## Session info
library("sessioninfo")
options(width = 120)
session_info()
```
# Bibliography
Lun ATL, Perry M and Ing-Simmons E (2016).
Infrastructure for genomic interactions: Bioconductor classes for Hi-C,
ChIA-PET and related experiments. F1000Res. 5, 950