Fit-Hi-C is a tool for assigning statistical confidence estimates to intra-chromosomal contact maps produced by genome-wide genome architecture assays such as Hi-C. Compared to the original Python implementation, the Fit-Hi-C R port has the following advantages:
To install this package, start R and enter:
## try http:// if https:// URLs are not supported
source("https://bioconductor.org/biocLite.R")
biocLite("FitHiC")
There are two ways to install a development version:
## try http:// if https:// URLs are not supported
source("https://bioconductor.org/biocLite.R")
biocLite("BiocInstaller")
useDevel()
biocLite("FitHiC")
To install a specific version x.y.z from source, open a terminal and enter:
wget http://bioconductor.org/packages/devel/bioc/src/contrib/FitHiC_x.y.z.tar.gz
R CMD INSTALL FitHiC_x.y.z.tar.gz
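Before running Fit-Hi-C, prepare two input files: FRAGSFILE, the fragment list (first table below), and INTERSFILE, the contact counts between fragment pairs (second table below).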
| Chromosome.Name | Column.2 | Mid.Point | Hit.Count | Column.5 |
|---|---|---|---|---|
| 1 | 0 | 1305 | 0 | 0 |
| 1 | 0 | 2635 | 233 | 1 |
| 1 | 0 | 4756 | 876 | 1 |
| 1 | 0 | 8568 | 1076 | 1 |
| 1 | 0 | 10384 | 1210 | 1 |
| 1 | 0 | 12246 | 639 | 1 |
| Chromosome1.Name | Mid.Point.1 | Chromosome2.Name | Mid.Point.2 | Hit.Count |
|---|---|---|---|---|
| 10 | 100894 | 10 | 150593 | 2 |
| 10 | 100894 | 10 | 162267 | 1 |
| 10 | 100894 | 10 | 169783 | 2 |
| 10 | 100894 | 10 | 179515 | 3 |
| 10 | 100894 | 10 | 182528 | 1 |
| 10 | 100894 | 10 | 185071 | 1 |
In addition, OUTDIR, the path where the output files will be stored, must be specified.
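Before running the tool, it can help to confirm that an input file matches the layout shown in the tables above. The following is a minimal sketch in base R, not part of the package. It assumes the file is gzip-compressed, whitespace-separated, and has no header row. The column names are taken from the table headings above, and the rows are copied from the fragment-list table into a temporary file so that the example is self-contained.

```r
# Rows copied from the fragment-list table above (assumed: no header line,
# whitespace-separated fields).
frag_lines <- c(
  "1 0 1305 0 0",
  "1 0 2635 233 1",
  "1 0 4756 876 1"
)

# Write them to a gzip-compressed temporary file to mimic a FRAGSFILE.
frags_path <- tempfile(fileext = ".gz")
con <- gzfile(frags_path, "w")
writeLines(frag_lines, con)
close(con)

# Read the file back; read.table() opens and closes the gzfile connection.
frags <- read.table(
  gzfile(frags_path),
  header = FALSE,
  col.names = c("Chromosome.Name", "Column.2", "Mid.Point",
                "Hit.Count", "Column.5")
)

# Basic sanity checks on the layout: five columns, non-negative counts.
stopifnot(ncol(frags) == 5)
stopifnot(all(frags$Hit.Count >= 0))

print(frags)
```

For a real input, replace `frags_path` with the path to your own fragment list. The same pattern applies to INTERSFILE, using the five column names from the second table.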
Once the input data is prepared, run Fit-Hi-C in R as follows:
library("FitHiC")
FitHiC(FRAGSFILE, INTERSFILE, OUTDIR, ...)
If you want to output images as well, explicitly set visual to TRUE:
library("FitHiC")
FitHiC(FRAGSFILE, INTERSFILE, OUTDIR, ..., visual=TRUE)
The pre-processed Hi-C data used in this example is from Yeast - EcoRI (Duan et al. 2010). FRAGSFILE and INTERSFILE are located at system.file("extdata", "fragmentLists/Duan_yeast_EcoRI.gz", package = "FitHiC") and system.file("extdata", "contactCounts/Duan_yeast_EcoRI.gz", package = "FitHiC"), respectively. Run the example as follows:
library("FitHiC")
fragsfile <- system.file("extdata", "fragmentLists/Duan_yeast_EcoRI.gz",
    package = "FitHiC")
intersfile <- system.file("extdata", "contactCounts/Duan_yeast_EcoRI.gz",
    package = "FitHiC")
FitHiC(fragsfile, intersfile, getwd(), libname="Duan_yeast_EcoRI",
    distUpThres=250000, distLowThres=10000)
Internally, Fit-Hi-C successively calls the generate_FragPairs, read_ICE_biases, read_All_Interactions, calculating_Probabilities, and fit_Spline methods. Execution has completed successfully when the following log appears:
## Fit-Hi-C is processing ...
## Running generate_FragPairs method ...
## Complete generate_FragPairs method [OK]
## Running read_All_Interactions method ...
## Complete read_All_Interactions method [OK]
## Running calculating_Probabilities method ...
## Writing Duan_yeast_EcoRI.fithic_pass1.txt
## Complete calculating_Probabilities method [OK]
## Running fit_Spline method ...
## Writing p-values to file Duan_yeast_EcoRI.spline_pass1.significances.txt.gz
## Complete fit_Spline method [OK]
## Running calculating_Probabilities method ...
## Writing Duan_yeast_EcoRI.fithic_pass2.txt
## Complete calculating_Probabilities method [OK]
## Running fit_Spline method ...
## Writing p-values to file Duan_yeast_EcoRI.spline_pass2.significances.txt.gz
## Complete fit_Spline method [OK]
## Execution of Fit-Hi-C completed successfully. [DONE]
## .Primitive("return")
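The example call above passes distLowThres=10000 and distUpThres=250000, which restrict the analysis to locus pairs within a genomic distance range. The sketch below illustrates the idea on rows copied from the contact-count table above, computing the distance between fragment midpoints and checking it against the two thresholds. Whether each bound is inclusive or exclusive inside the package is an assumption here (exclusive lower, inclusive upper), and the variable names are illustrative only.

```r
# Rows copied from the contact-count table above.
inters <- data.frame(
  chr1 = c(10, 10, 10),
  mid1 = c(100894, 100894, 100894),
  chr2 = c(10, 10, 10),
  mid2 = c(150593, 162267, 185071),
  hits = c(2, 1, 1)
)

# Thresholds taken from the example call above.
dist_low <- 10000
dist_up  <- 250000

# Genomic distance between the two fragment midpoints.
inters$distance <- abs(inters$mid2 - inters$mid1)

# Assumed convention: intra-chromosomal pairs with
# dist_low < distance <= dist_up are kept.
in_range <- inters$chr1 == inters$chr2 &
  inters$distance > dist_low &
  inters$distance <= dist_up

print(cbind(inters, in_range))
```

All three pairs lie between roughly 50 kb and 84 kb apart, so they fall inside the range under this convention.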
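The output files come from two internal methods called by Fit-Hi-C. As the log above shows, calculating_Probabilities writes the .fithic_pass1.txt and .fithic_pass2.txt files (first two tables below), and fit_Spline writes the .spline_pass1.significances.txt.gz and .spline_pass2.significances.txt.gz files (last two tables below).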
| avgGenomicDist | contactProbability | standardError | noOfLocusPairs | totalOfContactCounts |
|---|---|---|---|---|
| 10105 | 3.12e-05 | 2.7e-06 | 322 | 22212 |
| 10315 | 3.05e-05 | 2.5e-06 | 330 | 22251 |
| 10545 | 2.87e-05 | 2.1e-06 | 350 | 22191 |
| 10779 | 2.97e-05 | 3.0e-06 | 344 | 22583 |
| 10982 | 3.16e-05 | 2.7e-06 | 323 | 22532 |
| 11196 | 3.32e-05 | 2.7e-06 | 302 | 22185 |
| avgGenomicDist | contactProbability | standardError | noOfLocusPairs | totalOfContactCounts |
|---|---|---|---|---|
| 10107 | 1.15e-05 | 8e-07 | 252 | 6428 |
| 10317 | 1.31e-05 | 9e-07 | 266 | 7709 |
| 10546 | 1.43e-05 | 8e-07 | 281 | 8887 |
| 10779 | 1.27e-05 | 8e-07 | 285 | 7974 |
| 10982 | 1.32e-05 | 8e-07 | 255 | 7426 |
| 11196 | 1.40e-05 | 8e-07 | 238 | 7356 |
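To get a feel for how the columns of the binned tables relate to each other, the sketch below works through the rows of the first binned table above. It computes the mean contact count per locus pair in each bin and compares it with contactProbability. In this excerpt the ratio between the two is nearly constant across bins, which is consistent with contactProbability being a per-pair mean scaled by a single normalizing constant. This is an observation about the excerpt, not a description of how the package computes the column.

```r
# Rows copied from the first binned table above.
bins <- data.frame(
  avgGenomicDist       = c(10105, 10315, 10545, 10779, 10982, 11196),
  contactProbability   = c(3.12e-05, 3.05e-05, 2.87e-05,
                           2.97e-05, 3.16e-05, 3.32e-05),
  noOfLocusPairs       = c(322, 330, 350, 344, 323, 302),
  totalOfContactCounts = c(22212, 22251, 22191, 22583, 22532, 22185)
)

# Mean number of contacts per locus pair in each distance bin.
bins$meanCountPerPair <- bins$totalOfContactCounts / bins$noOfLocusPairs

# Ratio of the reported probability to the per-pair mean.
bins$ratio <- bins$contactProbability / bins$meanCountPerPair

# Relative spread of the ratio across bins (close to 1 means nearly constant).
spread <- max(bins$ratio) / min(bins$ratio)

print(bins[, c("avgGenomicDist", "meanCountPerPair", "ratio")])
print(spread)
```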
| chr1 | fragmentMid1 | chr2 | fragmentMid2 | contactCount | p_value | q_value |
|---|---|---|---|---|---|---|
| 10 | 100894 | 10 | 150593 | 2 | 0.9988785 | 1 |
| 10 | 100894 | 10 | 162267 | 1 | 0.9985433 | 1 |
| 10 | 100894 | 10 | 169783 | 2 | 0.9708609 | 1 |
| 10 | 100894 | 10 | 179515 | 3 | 0.8072602 | 1 |
| 10 | 100894 | 10 | 182528 | 1 | 0.9831568 | 1 |
| 10 | 100894 | 10 | 185071 | 1 | 0.9795001 | 1 |
| chr1 | fragmentMid1 | chr2 | fragmentMid2 | contactCount | p_value | q_value |
|---|---|---|---|---|---|---|
| 10 | 100894 | 10 | 150593 | 2 | 0.9813195 | 1 |
| 10 | 100894 | 10 | 162267 | 1 | 0.9902851 | 1 |
| 10 | 100894 | 10 | 169783 | 2 | 0.8983241 | 1 |
| 10 | 100894 | 10 | 179515 | 3 | 0.6547083 | 1 |
| 10 | 100894 | 10 | 182528 | 1 | 0.9571117 | 1 |
| 10 | 100894 | 10 | 185071 | 1 | 0.9501637 | 1 |
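A common next step is to select locus pairs below a chosen q-value cutoff. The sketch below does this on rows copied from the last table above. The cutoff of 0.05 is a hypothetical choice for illustration, not a value recommended by the package, and the data frame is built inline rather than read from the output file.

```r
# Rows copied from the last significance table above.
sig <- data.frame(
  chr1         = rep(10, 6),
  fragmentMid1 = rep(100894, 6),
  chr2         = rep(10, 6),
  fragmentMid2 = c(150593, 162267, 169783, 179515, 182528, 185071),
  contactCount = c(2, 1, 2, 3, 1, 1),
  p_value      = c(0.9813195, 0.9902851, 0.8983241,
                   0.6547083, 0.9571117, 0.9501637),
  q_value      = rep(1, 6)
)

# Hypothetical cutoff, chosen only for illustration.
q_cutoff <- 0.05

# Keep the pairs at or below the cutoff.
significant <- sig[sig$q_value <= q_cutoff, ]

# Rank all pairs by p-value, smallest first, to see the strongest candidates.
ranked <- sig[order(sig$p_value), ]

print(nrow(significant))
print(head(ranked, 3))
```

None of the six pairs in this excerpt passes the cutoff, since every q_value is 1. The pair with the smallest p-value is the one with the highest contact count (fragmentMid2 = 179515).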
If visual is set to TRUE, the corresponding images are also output:
For questions about the use of the Fit-Hi-C method, to request pre-processed Hi-C data or additional features and scripts, or to report bugs and provide feedback, please e-mail Ferhat Ay.
Ferhat Ay <ferhatay at uw period edu>
Duan Z, et al. 2010. A three-dimensional model of the yeast genome. Nature 465: 363–367.