---
title: "VplotR"
author: "Jacques Serizay"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{VplotR}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, eval = TRUE, echo=FALSE, results="hide", warning=FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
suppressPackageStartupMessages({
library(GenomicRanges)
library(ggplot2)
library(VplotR)
})
```
- [Introduction](#introduction)
- [Overview](#overview)
- [Installation](#installation)
- [Importing sequencing datasets](#importing-sequencing-datasets)
- [Using `importPEBamFiles()` function](#using-importpebamfiles-function)
- [Provided datasets](#provided-datasets)
- [Fragment size distribution](#fragment-size-distribution)
- [Vplot(s)](#vplots)
- [Single Vplot](#single-vplot)
- [Multiple Vplots](#multiple-vplots)
- [Vplots normalization](#vplots-normalization)
- [Footprints](#footprints)
- [Local fragment distribution](#local-fragment-distribution)
- [Session Info](#session-info)
## Introduction
### Overview
VplotR is an R package streamlining the process of generating V-plots,
i.e. two-dimensional paired-end fragment density plots.
It contains functions to import paired-end sequencing bam files from any
type of DNA accessibility experiments (e.g. ATAC-seq, DNA-seq, MNase-seq)
and can produce V-plots and one-dimensional footprint profiles over single
or aggregated genomic loci of interest. The R package is well integrated
within the Bioconductor environment and easily fits in standard genomic
analysis workflows. Integrating V-plots into existing analytical frameworks
has already brought additional insights in chromatin organization
(Serizay et al., 2020).
The main user-level functions of VplotR are `getFragmentsDistribution()`,
`plotVmat()`, `plotFootprint()` and `plotProfile()`.
* `getFragmentsDistribution()` computes the distribution of fragment sizes
over sets of genomic ranges;
* `plotVmat()` is used to compute fragment density and generate V-plots;
* `plotFootprint()` generates the MNase-seq or ATAC-seq footprint at a
set of genomic ranges.
* `plotProfile()` is used to plot the distribution of paired-end fragments
at a single locus of interest.
### Installation
VplotR can be installed from Bioconductor:
```{r eval = FALSE}
if(!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("VplotR")
library("VplotR")
```
## Importing sequencing datasets
### Using `importPEBamFiles()` function
Paired-end .bam files are imported using the `importPEBamFiles()`
function as follows:
```{r eval = TRUE}
library(VplotR)
bamfile <- system.file("extdata", "ex1.bam", package = "Rsamtools")
fragments <- importPEBamFiles(
bamfile,
shift_ATAC_fragments = TRUE
)
fragments
```
### Provided datasets
Several datasets are available for this package:
* Sets of tissue-specific ATAC-seq experiments in young adult C. elegans
(Serizay et al., 2020):
```{r, eval = TRUE}
data(ce11_proms)
ce11_proms
data(ATAC_ce11_Serizay2020)
ATAC_ce11_Serizay2020
```
* MNase-seq experiment in yeast (Henikoff et al., 2011)
and ABF1 binding sites:
```{r, eval = TRUE}
data(ABF1_sacCer3)
ABF1_sacCer3
data(MNase_sacCer3_Henikoff2011)
MNase_sacCer3_Henikoff2011
```
## Fragment size distribution
A preliminary control to check the distribution of fragment
sizes (regardless of their location relative to genomic loci) can be
performed using the `getFragmentsDistribution()` function.
```{r, eval = TRUE}
df <- getFragmentsDistribution(
MNase_sacCer3_Henikoff2011,
ABF1_sacCer3
)
p <- ggplot(df, aes(x = x, y = y)) + geom_line() + theme_ggplot2()
p
```
## Vplot(s)
### Single Vplot
Once data is imported, a V-plot of paired-end fragments over loci of
interest is generated using the `plotVmat()` function:
```{r, eval = TRUE}
p <- plotVmat(x = MNase_sacCer3_Henikoff2011, granges = ABF1_sacCer3)
p
```
### Multiple Vplots
The generation of multiple V-plots can be parallelized using a list of
parameters:
```{r, eval = TRUE}
list_params <- list(
"MNase\n@ ABF1" = list(MNase_sacCer3_Henikoff2011, ABF1_sacCer3),
"MNase\n@ random loci" = list(
MNase_sacCer3_Henikoff2011, sampleGRanges(ABF1_sacCer3)
)
)
p <- plotVmat(
list_params,
cores = 1
)
p
```
For instance, ATAC-seq fragment density can be visualized at different classes
of ubiquitous and tissue-specific promoters in *C. elegans*.
```{r, eval = TRUE}
list_params <- list(
"Germline ATACseq\n@ Ubiq. proms" = list(
ATAC_ce11_Serizay2020[['Germline']],
ce11_proms[ce11_proms$which.tissues == 'Ubiq.']
),
"Germline ATACseq\n@ Germline proms" = list(
ATAC_ce11_Serizay2020[['Germline']],
ce11_proms[ce11_proms$which.tissues == 'Germline']
),
"Neuron ATACseq\n@ Ubiq. proms" = list(
ATAC_ce11_Serizay2020[['Neurons']],
ce11_proms[ce11_proms$which.tissues == 'Ubiq.']
),
"Neuron ATACseq\n@ Neuron proms" = list(
ATAC_ce11_Serizay2020[['Neurons']],
ce11_proms[ce11_proms$which.tissues == 'Neurons']
)
)
p <- plotVmat(
list_params,
cores = 1,
nrow = 2, ncol = 5
)
p
```
### Vplots normalization
Different normalization approaches are available using the `normFun` argument.
* Un-normalized raw counts can be plotted by specifying `normFun = 'none'`.
```{r, eval = TRUE}
# No normalization
p <- plotVmat(
list_params,
cores = 1,
nrow = 2, ncol = 5,
verbose = FALSE,
normFun = 'none'
)
p
```
* By default, plots are normalized by the library depth of the sequencing run
and by the number of loci used to compute fragment density.
```{r, eval = TRUE}
# Library depth + number of loci of interest (default)
p <- plotVmat(
list_params,
cores = 1,
nrow = 2, ncol = 5,
verbose = FALSE,
normFun = 'libdepth+nloci'
)
p
```
* Alternatively, heatmaps can be internally z-scored or scaled to a specific
quantile.
```{r, eval = TRUE}
# Zscore
p <- plotVmat(
list_params,
cores = 1,
nrow = 2, ncol = 5,
verbose = FALSE,
normFun = 'zscore'
)
p
# Quantile
p <- plotVmat(
list_params,
cores = 1,
nrow = 2, ncol = 5,
verbose = FALSE,
normFun = 'quantile',
s = 0.99
)
p
```
## Footprints
VplotR also implements a function to profile the footprint from MNase or
ATAC-seq over sets of genomic loci. For instance, CTCF is known for its
~40-bp large footprint at its binding loci.
```{r}
p <- plotFootprint(
MNase_sacCer3_Henikoff2011,
ABF1_sacCer3
)
p
```
## Local fragment distribution
VplotR provides a function to plot the distribution of paired-end
fragments over an individual genomic window.
```{r}
data(MNase_sacCer3_Henikoff2011_subset)
genes_sacCer3 <- GenomicFeatures::genes(TxDb.Scerevisiae.UCSC.sacCer3.sgdGene::
TxDb.Scerevisiae.UCSC.sacCer3.sgdGene
)
p <- plotProfile(
fragments = MNase_sacCer3_Henikoff2011_subset,
window = "chrXV:186,400-187,400",
loci = ABF1_sacCer3,
annots = genes_sacCer3,
min = 20, max = 200, alpha = 0.1, size = 1.5
)
p
```
## Session Info
```{r echo = TRUE, collapse = TRUE, eval = TRUE}
sessionInfo()
```