derfinderPlot
If you wish, you can view this vignette online here.
derfinderPlot
(Collado-Torres, Jaffe, and Leek, 2015)
is an addon package for derfinder
(Collado-Torres, Frazee, Love, Irizarry, et al., 2015)
with functions that allow you to visualize the results.
While the functions in derfinderPlot
assume you generated the data with derfinder
, they can be used with other GRanges
objects properly formatted.
The functions in derfinderPlot
are:
plotCluster()
is a tailored ggbio
(Yin, Cook, and Lawrence, 2012)
plot that shows all the regions in a cluster (defined by distance). It shows the base-level coverage for each sample as well as the mean for each group. If these regions overlap any known gene, the gene and the transcript annotation is displayed.plotOverview()
is another tailored ggbio
(Yin, Cook, and Lawrence, 2012)
plot showing an overview of the whole genome. This plot can be useful to observe if the regions are clustered in a subset of a chromosome. It can also be used to check whether the regions match predominantly one part of the gene structure (for example, 3' overlaps).plotRegionCoverage()
is a fast plotting function using R
base graphics that shows the base level coverage for each sample inside a specific region of the genome. If the region overlaps any known gene or intron, the information is displayed. Optionally, it can display the known transcripts. This function is most likely the easiest to use with GRanges
objects from other packages.As an example, we will analyze a small subset of the samples from the BrainSpan Atlas of the Human Brain (BrainSpan, 2011)
publicly available data.
We first load the required packages.
## Load libraries suppressPackageStartupMessages(library('derfinder')) library('derfinderData') library('derfinderPlot')
For this example, we created a small table with the relevant phenotype data for 12 samples: 6 from fetal samples and 6 from adult samples. We chose at random a brain region, in this case the primary auditory cortex (core) and for the example we will only look at data from chromosome 21. Other variables include the age in years and the gender. The data is shown below.
library('knitr') ## Get pheno table pheno <- subset(brainspanPheno, structure_acronym == 'A1C') ## Display the main information p <- pheno[, -which(colnames(pheno) %in% c('structure_acronym', 'structure_name', 'file'))] rownames(p) <- NULL kable(p, format = 'html', row.names = TRUE)
gender | lab | Age | group | |
---|---|---|---|---|
1 | M | HSB114.A1C | -0.6428571 | fetal |
2 | M | HSB103.A1C | -0.6428571 | fetal |
3 | M | HSB178.A1C | -0.5714286 | fetal |
4 | M | HSB154.A1C | -0.5714286 | fetal |
5 | F | HSB150.A1C | -0.6666667 | fetal |
6 | F | HSB149.A1C | -0.6428571 | fetal |
7 | F | HSB130.A1C | 21.0000000 | adult |
8 | M | HSB136.A1C | 23.0000000 | adult |
9 | F | HSB126.A1C | 30.0000000 | adult |
10 | M | HSB145.A1C | 36.0000000 | adult |
11 | M | HSB123.A1C | 37.0000000 | adult |
12 | F | HSB135.A1C | 40.0000000 | adult |
We can load the data from derfinderData
(Collado-Torres, Jaffe, and Leek, 2014)
by first identifying the paths to the BigWig files with derfinder::rawFiles()
and then loading the data with derfinder::fullCoverage()
.
## Determine the files to use and fix the names files <- rawFiles(system.file('extdata', 'A1C', package = 'derfinderData'), samplepatt = 'bw', fileterm = NULL) names(files) <- gsub('.bw', '', names(files)) ## Load the data from disk system.time(fullCov <- fullCoverage(files = files, chrs = 'chr21'))
## user system elapsed ## 1.896 0.016 2.031
Alternatively, since the BigWig files are publicly available from BrainSpan (see here), we can extract the relevant coverage data using derfinder::fullCoverage()
. Note that as of rtracklayer
1.25.16 BigWig files are not supported on Windows: you can find the fullCov
object inside derfinderData
to follow the examples.
## Determine the files to use and fix the names files <- pheno$file names(files) <- gsub('.A1C', '', pheno$lab) ## Load the data from the web system.time(fullCov <- fullCoverage(files = files, chrs = 'chr21'))
Once we have the base-level coverage data for all 12 samples, we can construct the models. In this case, we want to find differences between fetal and adult samples while adjusting for gender and a proxy of the library size.
## Get some idea of the library sizes sampleDepths <- sampleDepth(collapseFullCoverage(fullCov), 1)
## Define models models <- makeModels(sampleDepths, testvars = pheno$group, adjustvars = pheno[, c('gender')])
Next, we can find candidate differentially expressed regions (DERs) using as input the segments of the genome where at least one sample has coverage greater than 3. In this particular example, we chose a low theoretical F-statistic cutoff and used 20 permutations.
## Filter coverage filteredCov <- lapply(fullCov, filterData, cutoff = 3)
## Perform differential expression analysis suppressPackageStartupMessages(library('bumphunter')) system.time(results <- analyzeChr(chr = 'chr21', filteredCov$chr21, models, groupInfo = pheno$group, writeOutput = FALSE, cutoffFstat = 5e-02, nPermute = 20, seeds = 20140923 + seq_len(20)))
## Warning: Calling species() on a TxDb object is *deprecated*. ## Please use organism() instead.
## ...
## user system elapsed ## 67.591 0.247 67.988
## Quick access to the results regions <- results$regions$regions ## Annotation database to use suppressPackageStartupMessages(library('TxDb.Hsapiens.UCSC.hg19.knownGene')) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
plotOverview()
Now that we have obtained the main results using derfinder
, we can proceed to visualizing the results using derfinderPlot
. The easiest to use of all the functions is plotOverview()
which takes a set of regions and annotation information produced by bumphunter::annotateNearest()
.
The plot below shows the candidate DERs colored by whether their q-value was less than 0.10 or not.
## Q-values overview plotOverview(regions = regions, annotation = results$annotation, type = 'qval')
The next plot shows the candidate DERs colored by the type of gene feature they are nearest too.
## Annotation overview plotOverview(regions = regions, annotation = results$annotation, type = 'annotation')
In this particular example, because we are only using data from one chromosome the above plot is not as informative as in a real case scenario. However, with this plot we can quickly observe that nearly all of the candidate DERs are inside an exon.
plotRegionCoverage()
The complete opposite of visualizing the candidate DERs at the genome level is to visualize them one region at a time. plotRegionCoverage()
allows us to do this quickly for a large number of regions.
Before using this function, we need to process more detailed information using two derfinder
functions: annotateRegions()
and getRegionCoverage()
as shown below.
## Get required information for the plots annoRegs <- annotateRegions(regions, genomicState$fullGenome)
regionCov <- getRegionCoverage(fullCov, regions)
Once we have the relevant information we can proceed to plotting the first 10 regions. In this case, we will supply plotRegionCoverage()
with the information it needs to plot transcripts overlapping these 10 regions.
## Plot top 10 regions plotRegionCoverage(regions = regions, regionCoverage = regionCov, groupInfo = pheno$group, nearestAnnotation = results$annotation, annotatedRegions = annoRegs, whichRegions=1:10, txdb = txdb, scalefac = 1, ask = FALSE)
The base level coverage is shown in a log2 scale with any overlapping exons shown in dark blue and known introns in light blue.
plotCluster()
In this example, we noticed with the plotRegionCoverage()
plots that most of the candidate DERs are contained in known exons. Sometimes, the signal might be low or we might have used very stringent cutoffs in the derfinder
analysis. One way we can observe this is by plotting clusters of regions where a cluster is defined as regions within 300 bp (default option) of each other.
To visualize the clusters, we can use plotCluster()
which takes similar input to plotOverview()
with the notable addition of the coverage information as well as the idx
argument. This argument specifies which region to focus on: it will be plotted with a red bar and will determine the cluster to display.
In the plot below we observe one large candidate DER with other nearby ones that do not have a q-value less than 0.10. In a real analysis, we would probably discard this region as the coverage is fairly low.
## First cluster plotCluster(idx = 1, regions = regions, annotation = results$annotation, coverageInfo = fullCov$chr21, txdb = txdb, groupInfo = pheno$group, titleUse = 'pval')
The second cluster shows a larger number of potential DERs (again without q-values less than 0.10) in a segment of the genome where the coverage data is highly variable. This is a common occurrence with RNA-seq data.
## Second cluster plotCluster(idx = 2, regions = regions, annotation = results$annotation, coverageInfo = fullCov$chr21, txdb = txdb, groupInfo = pheno$group, titleUse = 'pval')
These plots show an ideogram which helps quickly identify which region of the genome we are focusing on. Then, the coverage level information for each sample is displayed in log2. Next, the coverage group means are shown in the log2 scale. The plot is completed with the potential and candidate DERs as well as any known transcripts.
If you are interested in using the advanced arguments, use derfinder::advancedArg()
as shown below:
## URLs to advanced arguemtns sapply(c('plotCluster', 'plotOverview'), derfinder::advancedArg, package = 'derfinderPlot', browse = FALSE)
## plotCluster ## "https://github.com/search?utf8=%E2%9C%93&q=advanced_argument+filename%3Acluster+repo%3Alcolladotor%2FderfinderPlot+path%3A%2FR&type=Code" ## plotOverview ## "https://github.com/search?utf8=%E2%9C%93&q=advanced_argument+filename%3Aoverview+repo%3Alcolladotor%2FderfinderPlot+path%3A%2FR&type=Code"
## Set browse = TRUE if you want to open them in your browser
This package was made possible thanks to:
(R Core Team, 2015)
(Arora, Morgan, Carlson, and Pages, 2015)
(Lawrence, Huber, Pagès, Aboyoun, et al., 2013)
(Yin, Cook, and Lawrence, 2012)
(Wickham, 2009)
(Lawrence, Huber, Pagès, Aboyoun, et al., 2013)
(Wickham, 2011)
(Neuwirth, 2014)
(Wickham, 2007)
(Wickham, 2014)
(Yin, Lawrence, and Cook, 2015)
and (Jaffe, E, Murakami, Peter, et al., 2012)
(Collado-Torres, Frazee, Love, Irizarry, et al., 2015)
(Collado-Torres, Jaffe, and Leek, 2014)
(Wickham and Chang, 2015)
(Boettiger, 2014)
(Xie, 2014)
(Hester, 2013)
(Allaire, Cheng, Xie, McPherson, et al., 2015)
(Wickham, 2011)
(Carlson, 2015)
Code for creating the vignette
## Create the vignette library('knitrBootstrap') knitrBootstrapFlag <- packageVersion('knitrBootstrap') < '1.0.0' if(knitrBootstrapFlag) { ## CRAN version system.time(knit_bootstrap('derfinderPlot.Rmd', chooser=c('boot', 'code'), show_code = TRUE)) unlink('derfinderPlot.md') } else { ## GitHub version library('rmarkdown') system.time(render('derfinderPlot.Rmd', 'knitrBootstrap::bootstrap_document')) } ## Note: if you prefer the knitr version use: # library('rmarkdown') # system.time(render('derfinderPlot.Rmd', 'html_document')) unlink('chr21', recursive = TRUE) ## Extract the R code library('knitr') knit('derfinderPlot.Rmd', tangle = TRUE) ## Clean up file.remove('derfinderPlotRef.bib')
Date the vignette was generated.
## [1] "2015-04-16 22:15:16 PDT"
Wallclock time spent generating the vignette.
## Time difference of 1.886 mins
R
session information.
## setting value ## version R version 3.2.0 (2015-04-16) ## system x86_64, linux-gnu ## ui X11 ## language en_US: ## collate C ## tz <NA>
## package * version date source ## AnnotationDbi 1.30.0 2015-04-17 Bioconductor ## BSgenome * 1.36.0 2015-04-17 Bioconductor ## Biobase 2.28.0 2015-04-17 Bioconductor ## BiocGenerics 0.14.0 2015-04-17 Bioconductor ## BiocParallel * 1.2.0 2015-04-17 Bioconductor ## Biostrings * 2.36.0 2015-04-17 Bioconductor ## DBI 0.3.1 2014-09-24 CRAN (R 3.2.0) ## Formula * 1.2-1 2015-04-07 CRAN (R 3.2.0) ## GGally * 0.5.0 2014-12-02 CRAN (R 3.2.0) ## GenomeInfoDb 1.4.0 2015-04-17 Bioconductor ## GenomicAlignments * 1.4.0 2015-04-17 Bioconductor ## GenomicFeatures 1.20.0 2015-04-17 Bioconductor ## GenomicFiles * 1.4.0 2015-04-17 Bioconductor ## GenomicRanges 1.20.0 2015-04-17 Bioconductor ## Hmisc * 3.15-0 2015-02-16 CRAN (R 3.2.0) ## IRanges 2.2.0 2015-04-17 Bioconductor ## MASS * 7.3-40 2015-03-21 CRAN (R 3.2.0) ## Matrix * 1.2-0 2015-04-04 CRAN (R 3.2.0) ## OrganismDbi * 1.10.0 2015-04-17 Bioconductor ## RBGL * 1.44.0 2015-04-17 Bioconductor ## RColorBrewer * 1.1-2 2014-12-07 CRAN (R 3.2.0) ## RCurl * 1.95-4.5 2014-12-28 CRAN (R 3.2.0) ## RJSONIO * 1.3-0 2014-07-28 CRAN (R 3.2.0) ## RSQLite 1.0.0 2014-10-25 CRAN (R 3.2.0) ## Rcpp * 0.11.5 2015-03-06 CRAN (R 3.2.0) ## RefManageR * 0.8.45 2015-01-09 CRAN (R 3.2.0) ## Rsamtools * 1.20.0 2015-04-17 Bioconductor ## S4Vectors 0.6.0 2015-04-17 Bioconductor ## TxDb.Hsapiens.UCSC.hg19.knownGene 3.1.2 2015-04-17 Bioconductor ## VariantAnnotation * 1.14.0 2015-04-17 Bioconductor ## XML * 3.98-1.1 2013-06-20 CRAN (R 3.2.0) ## XVector 0.8.0 2015-04-17 Bioconductor ## acepack * 1.3-3.3 2014-11-24 CRAN (R 3.2.0) ## bibtex * 0.4.0 2014-12-31 CRAN (R 3.2.0) ## biomaRt * 2.24.0 2015-04-17 Bioconductor ## biovizBase * 1.16.0 2015-04-17 Bioconductor ## bitops * 1.0-6 2013-08-17 CRAN (R 3.2.0) ## bumphunter 1.8.0 2015-04-17 Bioconductor ## cluster * 2.0.1 2015-01-31 CRAN (R 3.2.0) ## codetools * 0.2-11 2015-03-10 CRAN (R 3.2.0) ## colorspace * 1.2-6 2015-03-11 CRAN (R 3.2.0) ## derfinder 1.2.0 2015-04-17 Bioconductor ## derfinderData 0.101.1 2015-04-17 Bioconductor ## derfinderHelper * 1.2.0 2015-04-17 Bioconductor ## derfinderPlot 1.2.0 2015-04-17 Bioconductor ## devtools 1.7.0 2015-01-17 CRAN (R 3.2.0) ## dichromat * 2.0-0 2013-01-24 CRAN (R 3.2.0) ## digest * 0.6.8 2014-12-31 CRAN (R 3.2.0) ## doRNG * 1.6 2014-03-07 CRAN (R 3.2.0) ## evaluate * 0.6 2015-04-13 CRAN (R 3.2.0) ## foreach 1.4.2 2014-04-11 CRAN (R 3.2.0) ## foreign * 0.8-63 2015-02-20 CRAN (R 3.2.0) ## formatR * 1.1 2015-03-29 CRAN (R 3.2.0) ## futile.logger * 1.4 2015-03-21 CRAN (R 3.2.0) ## futile.options * 1.0.0 2010-04-06 CRAN (R 3.2.0) ## ggbio * 1.16.0 2015-04-17 Bioconductor ## ggplot2 * 1.0.1 2015-03-17 CRAN (R 3.2.0) ## graph * 1.46.0 2015-04-17 Bioconductor ## gridExtra * 0.9.1 2012-08-09 CRAN (R 3.2.0) ## gtable * 0.1.2 2012-12-05 CRAN (R 3.2.0) ## highr * 0.4.1 2015-03-29 CRAN (R 3.2.0) ## httr * 0.6.1 2015-01-01 CRAN (R 3.2.0) ## iterators 1.0.7 2014-04-11 CRAN (R 3.2.0) ## knitcitations 1.0.5 2014-11-26 CRAN (R 3.2.0) ## knitr 1.9 2015-01-20 CRAN (R 3.2.0) ## knitrBootstrap 0.9.0 2013-10-17 CRAN (R 3.2.0) ## labeling * 0.3 2014-08-23 CRAN (R 3.2.0) ## lambda.r * 1.1.7 2015-03-20 CRAN (R 3.2.0) ## lattice * 0.20-31 2015-03-30 CRAN (R 3.2.0) ## latticeExtra * 0.6-26 2013-08-15 CRAN (R 3.2.0) ## locfit 1.5-9.1 2013-04-20 CRAN (R 3.2.0) ## lubridate * 1.3.3 2013-12-31 CRAN (R 3.2.0) ## markdown * 0.7.4 2014-08-24 CRAN (R 3.2.0) ## matrixStats * 0.14.0 2015-02-14 CRAN (R 3.2.0) ## memoise * 0.2.1 2014-04-22 CRAN (R 3.2.0) ## munsell * 0.4.2 2013-07-11 CRAN (R 3.2.0) ## nnet * 7.3-9 2015-02-11 CRAN (R 3.2.0) ## org.Hs.eg.db 3.1.2 2015-04-17 Bioconductor ## pkgmaker * 0.22 2014-05-14 CRAN (R 3.2.0) ## plyr * 1.8.1 2014-02-26 CRAN (R 3.2.0) ## proto * 0.3-10 2012-12-22 CRAN (R 3.2.0) ## qvalue * 2.0.0 2015-04-17 Bioconductor ## registry * 0.2 2012-01-24 CRAN (R 3.2.0) ## reshape * 0.8.5 2014-04-23 CRAN (R 3.2.0) ## reshape2 * 1.4.1 2014-12-06 CRAN (R 3.2.0) ## rngtools * 1.2.4 2014-03-06 CRAN (R 3.2.0) ## rpart * 4.1-9 2015-02-24 CRAN (R 3.2.0) ## rstudioapi * 0.3.1 2015-04-07 CRAN (R 3.2.0) ## rtracklayer * 1.28.0 2015-04-17 Bioconductor ## scales * 0.2.4 2014-04-22 CRAN (R 3.2.0) ## stringr * 0.6.2 2012-12-06 CRAN (R 3.2.0) ## survival * 2.38-1 2015-02-24 CRAN (R 3.2.0) ## xtable * 1.7-4 2014-09-12 CRAN (R 3.2.0) ## zlibbioc * 1.14.0 2015-04-17 Bioconductor
This vignette was generated using knitrBootstrap
(Hester, 2013)
with knitr
(Xie, 2014)
and rmarkdown
(Allaire, Cheng, Xie, McPherson, et al., 2015)
running behind the scenes.
Citations made with knitcitations
(Boettiger, 2014)
.
[1] J. Allaire, J. Cheng, Y. Xie, J. McPherson, et al. rmarkdown: Dynamic Documents for R. R package version 0.5.1. 2015. URL: http://CRAN.R-project.org/package=rmarkdown.
[2] S. Arora, M. Morgan, M. Carlson and H. Pages. GenomeInfoDb: Utilities for manipulating chromosome and other 'seqname' identifiers. R package version 1.4.0. 2015.
[3] C. Boettiger. knitcitations: Citations for knitr markdown files. R package version 1.0.5. 2014. URL: http://CRAN.R-project.org/package=knitcitations.
[4] BrainSpan. “Atlas of the Developing Human Brain [Internet]. Funded by ARRA Awards 1RC2MH089921-01, 1RC2MH090047-01, and 1RC2MH089929-01.” 2011. URL: http://developinghumanbrain.org.
[5] M. Carlson. TxDb.Hsapiens.UCSC.hg19.knownGene: Annotation package for TxDb object(s). R package version 3.1.2. 2015.
[6] L. Collado-Torres, A. C. Frazee, M. I. Love, R. A. Irizarry, et al. “derfinder: Software for annotation-agnostic RNA-seq differential expression analysis”. In: bioRxiv (2015). DOI: 10.1101/015370. URL: http://www.biorxiv.org/content/early/2015/02/19/015370.abstract.
[7] L. Collado-Torres, A. E. Jaffe and J. T. Leek. derfinderPlot: Plotting functions for derfinder. https://github.com/lcolladotor/derfinderPlot - R package version 1.2.0. 2015. URL: http://www.bioconductor.org/packages/release/bioc/html/derfinderPlot.html.
[8] L. Collado-Torres, A. Jaffe and J. Leek. derfinderData: Processed BigWigs from BrainSpan for examples. R package version 0.101.1. 2014. URL: https://github.com/lcolladotor/derfinderData.
[9] J. Hester. knitrBootstrap: Knitr Bootstrap framework. R package version 0.9.0. 2013. URL: http://CRAN.R-project.org/package=knitrBootstrap.
[10] Jaffe, A. E, Murakami, Peter, et al. “Bump hunting to identify differentially methylated regions in epigenetic epidemiology studies”. In: International Journal of Epidemiology (2012).
[11] M. Lawrence, W. Huber, H. Pagès, P. Aboyoun, et al. “Software for Computing and Annotating Genomic Ranges”. In: PLoS Computational Biology 9 (8 2013). DOI: 10.1371/journal.pcbi.1003118. URL: http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1003118}.
[12] E. Neuwirth. RColorBrewer: ColorBrewer Palettes. R package version 1.1-2. 2014. URL: http://CRAN.R-project.org/package=RColorBrewer.
[13] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria, 2015. URL: http://www.R-project.org/.
[14] H. Wickham. “Reshaping Data with the reshape Package”. In: Journal of Statistical Software 21.12 (2007), pp. 1–20. URL: http://www.jstatsoft.org/v21/i12/.
[15] H. Wickham. “The Split-Apply-Combine Strategy for Data Analysis”. In: Journal of Statistical Software 40.1 (2011), pp. 1–29. URL: http://www.jstatsoft.org/v40/i01/.
[16] H. Wickham. ggplot2: elegant graphics for data analysis. Springer New York, 2009. ISBN: 978-0-387-98140-6. URL: http://had.co.nz/ggplot2/book.
[17] H. Wickham. scales: Scale functions for graphics. R package version 0.2.4. 2014. URL: http://CRAN.R-project.org/package=scales.
[18] H. Wickham. “testthat: Get Started with Testing”. In: The R Journal 3 (2011), pp. 5–10. URL: http://journal.r-project.org/archive/2011-1/RJournal_2011-1_Wickham.pdf.
[19] H. Wickham and W. Chang. devtools: Tools to Make Developing R Packages Easier. R package version 1.7.0. 2015. URL: http://CRAN.R-project.org/package=devtools.
[20] Y. Xie. “knitr: A Comprehensive Tool for Reproducible Research in R”. In: Implementing Reproducible Computational Research. Ed. by V. Stodden, F. Leisch and R. D. Peng. ISBN 978-1466561595. Chapman and Hall/CRC, 2014. URL: http://www.crcpress.com/product/isbn/9781466561595.
[21] T. Yin, D. Cook and M. Lawrence. “ggbio: an R package for extending the grammar of graphics for genomic data”. In: Genome Biology 13.8 (2012), p. R77.
[22] T. Yin, M. Lawrence and D. Cook. biovizBase: Basic graphic utilities for visualization of genomic data. R package version 1.16.0. 2015.