--- title: "Identify reproducible genomic peaks from replicate ChIP-seq experiments" author: "Konstantin Krismer" date: "`r Sys.Date()`" output: rmarkdown::html_vignette bibliography: bibliography.bib vignette: > %\VignetteIndexEntry{Identify reproducible genomic peaks from replicate ChIP-seq experiments} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- IDR2D is an extension of the original method IDR [@li2011], which was intended for ChIP-seq peaks (or one-dimensional genomic data). This package applies the method to two-dimensional genomic data, such as interactions between two genomic loci (also called anchors). Genomic interaction data is generated by genome-wide methods such as Hi-C [@pmid20461051], ChIA-PET [@pmid19247990], and HiChIP [@pmid25128017]. ```{r setup, include=FALSE} knitr::opts_chunk$set(fig.width = 7, fig.height = 7, echo = FALSE, warning = FALSE, message = FALSE, dev = 'png', out.extra = 'style="border-width: 0;"') ``` # Input data Load example data: ```{r, echo = TRUE} rep1_df <- idr2d:::chipseq$rep1_df rep2_df <- idr2d:::chipseq$rep2_df ``` ## Example data - replicate 1 ```{r} library(DT) header <- htmltools::withTags(table( class = 'display', thead( tr( th("chromosome"), th("start coordinate"), th("end coordinate"), th("score") ) ) )) datatable(rep1_df[seq_len(min(nrow(rep1_df), 1000)), ], container = header, rownames = FALSE, options = list(searching = FALSE)) %>% formatRound("value", 3) ``` Only the first 1000 peaks are shown. ## Example data - replicate 2 ```{r} datatable(rep2_df[seq_len(min(nrow(rep2_df), 1000)), ], container = header, rownames = FALSE, options = list(searching = FALSE)) %>% formatRound("value", 3) ``` Only the first 1000 peaks are shown. # Analysis Load the package: ```{r, echo = TRUE} library(idr2d) ``` Estimate IDR: ```{r, echo = TRUE} idr_results <- estimate_idr1d(rep1_df, rep2_df, value_transformation = "log") rep1_idr_df <- idr_results$rep1_df ``` Important to note here is that the appropriate value transformation depends on the semantics of the *value* column (always the seventh column) in `rep1_df` and `rep2_df`. This column is used to establish a ranking between interactions, with highly significant interactions on top of the list and least significant interactions (i.e., most likely noise) at the bottom of the list. The ranking is established by the *value* column, sorted in descending order. Since our *value* column contains FDRs (the lower, the more significant), we need to transform the values to comply with the assumption that high values indicate high significance. For p-values and p-value derived measures (like Q values), the `log_additive_inverse` transformation (`-log(x)`) is recommended. ## Results ```{r} chr <- start <- end <- rank <- rep_rank <- value <- rep_value <- idr <- NULL header <- htmltools::withTags(table( class = 'display', thead( tr( th("chr."), th("start coordinate"), th("end coordinate"), th("rank in R1"), th("rank in R2"), th("transformed value in R1"), th("transformed value in R2"), th("IDR") ) ) )) df <- dplyr::select(rep1_idr_df, chr, start, end, rank, rep_rank, value, rep_value, idr) datatable(df[seq_len(min(nrow(df), 1000)), ], rownames = FALSE, options = list(searching = FALSE), container = header) %>% formatRound(c("value", "rep_value", "idr"), 3) ``` Only the first 1000 observations are shown. ### Summary ```{r, echo = TRUE} summary(idr_results) ``` ### Distribution of IDRs ```{r, echo = TRUE} draw_idr_distribution_histogram(rep1_idr_df) ``` ### Rank - IDR dependence ```{r, echo = TRUE} draw_rank_idr_scatterplot(rep1_idr_df) ``` ### Value - IDR dependence ```{r, echo = TRUE} draw_value_idr_scatterplot(rep1_idr_df) ``` # Additional information Most of the functionality of the IDR2D package is also offered through the website at https://idr2d.mit.edu. For a more detailed discussion on IDR2D, please have a look at the IDR2D paper: **IDR2D identifies reproducible genomic interactions** Konstantin Krismer, Yuchun Guo, and David K. Gifford Nucleic Acids Research, Volume 48, Issue 6, 06 April 2020, Page e31; DOI: https://doi.org/10.1093/nar/gkaa030 # References