The data have been created for this publication and is described there:

Sebastian Steinhauser, Nils Kurzawa, Roland Eils, Carl Herrmann, A comprehensive comparison of tools for differential ChIP-seq analysis, Briefings in Bioinformatics, Volume 17, Issue 6, November 2016, Pages 953–966, https://doi.org/10.1093/bib/bbv110

two biological replicates of H3K36me3 in MCF7 cells were downloaded and pooled, which was used as source for the simulation of a broad signal data set (GEO: GSE31755).
To simulate a realistic ‘gold standard' data set for differential ChIP enrichment, we started from real ChIP data sets: one data set targeting FoxA1, as an example of a data set with sharp signal, and a H3K36me3 data set as an example of a broad enrichment. In each case (FoxA1 and H3K36me3), we simulated two ‘treatment' data sets (T1 and T2) representing the reference condition and the comparison condition, as well as ‘control' data sets (C1 and C2), corresponding to input. The treatment data sets (T) were obtained by merging regions corresponding to real signal (S) with background noise (B) (hence, T = S + B) The signal data set was obtained as follows: we performed peak calling using MACS2, using the provided input data set as a control. We selected the top 20,000 peaks identified by the peak caller, as they are most likely to represent truly enriched regions. In a first step, only the reads located in these regions were considered, and represent the reference signal data set (S1). To simulate the second condition, these peak regions were split into two groups: 10,000 peaks were kept as is (S20), and represent nondifferential peaks. The remaining 10,000 peaks were divided into 10 groups of 1000 peaks each, which were downsampled by random read sampling by 10%, 20%, etc. (S210,S220,…). These represent sets of differential peaks in which the level of differential enrichment is variable. Next, we simulated realistic background signal B: several public input data sets for the MCF7 cell line were downloaded, aligned and compared. The input data set showing the highest correlation with the original input was used to sample reads, until the library size of the real ChIP-seq experiment was reached. If not enough reads were available in one input, it was merged with a second similar one. This is done twice, to simulate a background for the reference and the second condition (B1 and B2). In summary, our two data sets are composed of T1 = S1 + B1 and T2 = S2 + B2, where S2 = S20 + Σi = 10 → 100S2i. Comparing these two data sets should ideally yield 10,000 differential peaks, which represents our gold standard.

The bam files present in this package are the result of random sampling of each condition five times and then it has been cut to the exemple region with samtools.

