--- title: "erma: epigenomics road map adventures" author: "Vincent J. Carey, stvjc at channing.harvard.edu" date: "August 2015" output: BiocStyle::html_document: highlight: pygments number_sections: yes toc: yes --- ```{r style, echo = FALSE, results = 'asis'} BiocStyle::markdown() ``` ```{r dosetup,echo=FALSE,results="hide"} suppressPackageStartupMessages({ library(rtracklayer) library(BiocStyle) library(erma) library(GenomicFiles) library(ggplot2) library(grid) library(png) library(DT) }) ``` # Introduction The [epigenomics road map](http://www.roadmapepigenomics.org/) describes locations of epigenetic marks in DNA from a variety of cell types. Of interest are locations of histone modifications, sites of DNA methylation, and regions of accessible chromatin. This package presents a selection of elements of the road map including metadata and outputs of the ChromImpute procedure applied to ENCODE cell lines by [Ernst and Kellis](http://dx.doi.org/10.1038/nbt.3157). # Metadata on the ChromImpute archive ## Sample metadata I have retrieved a [Google Docs spreadsheet](https://docs.google.com/spreadsheet/ccc?key=0Am6FxqAtrFDwdHU1UC13ZUxKYy1XVEJPUzV6MEtQOXc&usp=sharing#gid=15) with comprehensive information. The mapmeta() function provides access to a local DataFrame image of the file as retrieved in mid April 2015. We provide a dynamic view of a selection of columns. Use the search box to filter records shown, for example \code{LUNG}. ```{r lkm} library(DT) library(erma) meta = mapmeta() kpc = c("Comments", "Epigenome.ID..EID.", "Epigenome.Mnemonic", "Quality.Rating", "Standardized.Epigenome.name", "ANATOMY", "TYPE") datatable(as.data.frame(meta[,kpc])) ``` ## Metadata on the inferred states The chromatin states and standard colorings used are enumerated in `states_25`: ```{r lkst} data(states_25) datatable(states_25) ``` The emission parameters of the 25 state model are depicted in the supplementary Figure 33 of Ernst and Kellis: ```{r dofig,fig=TRUE} library(png) im = readPNG(system.file("pngs/emparms.png", package="erma")) grid.raster(im) ``` # Managing access to imputed chromatin states for a set of cell types I have retrieved a modest number of roadmap bed files with ChromImpute mnemonic labeling of chromatin by states. These can be managed with an ErmaSet instance, a trivial extension of `r Biocpkg("GenomicFiles")` class. The `cellTypes` method yields a character vector. The `colData` component has full metadata on the cell lines available. ```{r getgf} ermaset = makeErmaSet() ermaset cellTypes(ermaset)[1:5] datatable(as.data.frame(colData(ermaset)[,kpc])) ``` # Enumerating states in the vicinity of a gene, across cell types We form a GRanges representing 50kb upstream of IL33. ```{r getgm, cache=TRUE} uil33 = flank(resize(range(genemodel("IL33")), 1), width=50000) uil33 ``` Bind this to the ErmaSet instance. ```{r bind} rowRanges(ermaset) = uil33 ermaset ``` Now query the files for cell-specific states in this interval. ```{r getcss} library(BiocParallel) register(MulticoreParam(workers=2)) # reduce will be done according to registered bpparam; lapply just extracts suppressWarnings({ csstates = lapply(reduceByFile(ermaset, MAP=function(range, file) { imp = import(file, which=range, genome=genome(range)[1]) seqlevels(imp) = seqlevels(range) imp$rgb = erma:::rgbByState(imp$name) imp }), "[[", 1) }) tys = cellTypes(ermaset) # need to label with cell types csstates = lapply(1:length(csstates), function(x) { csstates[[x]]$celltype = tys[x] csstates[[x]] }) csstates[1:2] ``` This sort of code underlies the `csProfile` utility to visualize variation in state assignments in promoter regions for various genes. ```{r doviz, fig=TRUE} csProfile(ermaset[,1:5], symbol="CD28", useShiny=FALSE) ``` Set `useShiny` to `TRUE` to permit interactive selection of region to visualize.