Genomic ranges describe…
Packages
Biostrings and other packages
library(GenomicRanges)
library(GenomicAlignments)
sessionInfo()
## R version 3.2.0 alpha (2015-03-25 r68090)
## Platform: x86_64-unknown-linux-gnu (64-bit)
## Running under: Ubuntu 14.04.2 LTS
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 parallel stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] TxDb.Hsapiens.UCSC.hg19.knownGene_3.1.2
## [2] GenomicFeatures_1.19.36
## [3] AnnotationDbi_1.29.21
## [4] Biobase_2.27.3
## [5] BSgenome.Hsapiens.UCSC.hg19_1.4.0
## [6] BSgenome_1.35.20
## [7] rtracklayer_1.27.11
## [8] GenomicAlignments_1.3.33
## [9] Rsamtools_1.19.49
## [10] Biostrings_2.35.12
## [11] XVector_0.7.4
## [12] GenomicRanges_1.19.52
## [13] GenomeInfoDb_1.3.16
## [14] IRanges_2.1.43
## [15] S4Vectors_0.5.22
## [16] BiocGenerics_0.13.11
## [17] BiocStyle_1.5.3
##
## loaded via a namespace (and not attached):
## [1] knitr_1.9 zlibbioc_1.13.3 BiocParallel_1.1.21
## [4] stringr_0.6.2 tools_3.2.0 DBI_0.3.1
## [7] lambda.r_1.1.7 futile.logger_1.4 htmltools_0.2.6
## [10] yaml_2.1.13 digest_0.6.8 formatR_1.1
## [13] futile.options_1.0.0 bitops_1.0-6 biomaRt_2.23.5
## [16] RCurl_1.95-4.5 RSQLite_1.0.0 evaluate_0.5.5
## [19] rmarkdown_0.5.1 XML_3.98-1.1
GRanges: simple genomic ranges
- seqnames(), e.g., chromosome, but could be, e.g., contig, …
- start(), end(): 1-based, closed intervals
- strand(): +, -, or * (does not matter)
- mcols() of associated data, e.g., ‘score’, ‘id’, …
GRangesList: nested genomic ranges
- *List objects: lists, but all elements of the same type. E.g., start() returns an IntegerList().
- unlist()
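The accessors listed above can be tried on a small hand-made object. This is a minimal sketch: the coordinates, the group names `tx1` and `tx2`, and the `score` values are invented for illustration and do not come from any annotation.

```r
library(GenomicRanges)

# Toy ranges; every value here is made up
gr <- GRanges(
    seqnames = c("chr1", "chr1", "chr2"),
    ranges   = IRanges(start = c(10, 40, 5), end = c(30, 60, 25)),
    strand   = c("+", "-", "*"),
    score    = c(1.5, 2.0, 0.5)
)

seqnames(gr)   # sequence names, stored as an Rle
start(gr)      # 1-based starts
end(gr)        # closed ends, so width is end - start + 1
strand(gr)
mcols(gr)      # DataFrame holding the 'score' column

# A GRangesList groups ranges, e.g., exons grouped by transcript
grl <- split(gr, c("tx1", "tx1", "tx2"))
start(grl)     # an IntegerList with one element per group
unlist(grl)    # back to a single GRanges
```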
Intra-range operations: flank()
Inter-range operations: range(), reduce(), disjoin()
Between-object: psetdiff(), findOverlaps(), countOverlaps()
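A short sketch of one function from each family may help separate the three kinds of operation. The ranges and the sequence name `chr1` are made up; the expected coordinates in the comments follow from these toy values only.

```r
library(GenomicRanges)

# Made-up ranges on a single hypothetical sequence
gr <- GRanges("chr1",
              IRanges(start = c(11, 15, 30), end = c(20, 25, 40)),
              strand = "+")

# Intra-range: each range is transformed on its own
flank(gr, width = 3)     # the 3 bases upstream of each range

# Inter-range: ranges within one object are compared with each other
range(gr)                # 11-40
reduce(gr)               # overlapping ranges merged: 11-25, 30-40
disjoin(gr)              # non-overlapping pieces: 11-14, 15-20, 21-25, 30-40

# Between-object: two objects are compared
query <- GRanges("chr1",
                 IRanges(start = c(18, 50), end = c(32, 60)),
                 strand = "+")
hits <- findOverlaps(query, gr)   # Hits object pairing query and subject indices
countOverlaps(query, gr)          # number of ranges in gr hit by each query range
psetdiff(gr[1], gr[2])            # element-wise difference: 11-14
```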
PLoS Comput Biol 9(8): e1003118
What can I do with my GRanges instance?
methods(class="GRanges")
## [1] != $ $<-
## [4] %in% < <=
## [7] == > >=
## [10] BamViews NROW Ops
## [13] ROWNAMES ScanBamParam ScanBcfParam
## [16] [ [<- aggregate
## [19] anyNA append as.character
## [22] as.complex as.data.frame as.env
## [25] as.integer as.list as.logical
## [28] as.numeric as.raw bamWhich<-
## [31] blocks browseGenome c
## [34] chrom chrom<- coerce
## [37] coerce<- compare countOverlaps
## [40] coverage disjoin disjointBins
## [43] distance distanceToNearest duplicated
## [46] elementMetadata elementMetadata<- end
## [49] end<- eval export
## [52] extractROWS extractUpstreamSeqs findOverlaps
## [55] fixedColumnNames flank follow
## [58] gaps getPromoterSeq granges
## [61] head high2low intersect
## [64] isDisjoint length liftOver
## [67] mapCoords mapFromAlignments mapFromTranscripts
## [70] mapToAlignments mapToTranscripts match
## [73] mcols mcols<- metadata
## [76] metadata<- mstack names
## [79] names<- narrow nearest
## [82] order overlapsAny parallelSlotNames
## [85] pgap pintersect pmapCoords
## [88] pmapFromAlignments pmapFromTranscripts pmapToAlignments
## [91] pmapToTranscripts precede promoters
## [94] psetdiff punion range
## [97] ranges ranges<- rank
## [100] reduce relist relistToClass
## [103] rename rep rep.int
## [106] replaceROWS resize restrict
## [109] rev rowRanges<- scanFa
## [112] scanTabix score score<-
## [115] seqinfo seqinfo<- seqlevelsInUse
## [118] seqnames seqnames<- setdiff
## [121] shift shiftApply show
## [124] showAsCell sort split
## [127] split<- start start<-
## [130] strand strand<- subset
## [133] subsetByOverlaps summarizeOverlaps table
## [136] tail tapply tile
## [139] trim union unique
## [142] update updateObject values
## [145] values<- width width<-
## [148] window window<- with
## [151] xtfrm
## see '?methods' for accessing help and source code
What type of object(s) can I use findOverlaps() on (what methods exist for the findOverlaps() generic)?
methods(findOverlaps)
## [1] findOverlaps,GAlignmentPairs,GAlignmentPairs-method
## [2] findOverlaps,GAlignmentPairs,Vector-method
## [3] findOverlaps,GAlignments,GAlignments-method
## [4] findOverlaps,GAlignments,Vector-method
## [5] findOverlaps,GAlignmentsList,GAlignmentsList-method
## [6] findOverlaps,GAlignmentsList,Vector-method
## [7] findOverlaps,GNCList,GenomicRanges-method
## [8] findOverlaps,GRangesList,GRangesList-method
## [9] findOverlaps,GRangesList,GenomicRanges-method
## [10] findOverlaps,GRangesList,RangedData-method
## [11] findOverlaps,GRangesList,RangesList-method
## [12] findOverlaps,GenomicRanges,GIntervalTree-method
## [13] findOverlaps,GenomicRanges,GNCList-method
## [14] findOverlaps,GenomicRanges,GRangesList-method
## [15] findOverlaps,GenomicRanges,GenomicRanges-method
## [16] findOverlaps,GenomicRanges,RangedData-method
## [17] findOverlaps,GenomicRanges,RangesList-method
## [18] findOverlaps,NCList,Ranges-method
## [19] findOverlaps,RangedData,GRangesList-method
## [20] findOverlaps,RangedData,GenomicRanges-method
## [21] findOverlaps,RangedData,RangedData-method
## [22] findOverlaps,RangedData,RangesList-method
## [23] findOverlaps,Ranges,IntervalTree-method
## [24] findOverlaps,Ranges,NCList-method
## [25] findOverlaps,Ranges,Ranges-method
## [26] findOverlaps,RangesList,GRangesList-method
## [27] findOverlaps,RangesList,GenomicRanges-method
## [28] findOverlaps,RangesList,IntervalForest-method
## [29] findOverlaps,RangesList,RangedData-method
## [30] findOverlaps,RangesList,RangesList-method
## [31] findOverlaps,SummarizedExperiment,SummarizedExperiment-method
## [32] findOverlaps,SummarizedExperiment,Vector-method
## [33] findOverlaps,Vector,GAlignmentPairs-method
## [34] findOverlaps,Vector,GAlignments-method
## [35] findOverlaps,Vector,GAlignmentsList-method
## [36] findOverlaps,Vector,SummarizedExperiment-method
## [37] findOverlaps,Vector,Views-method
## [38] findOverlaps,Vector,ViewsList-method
## [39] findOverlaps,Vector,missing-method
## [40] findOverlaps,Views,Vector-method
## [41] findOverlaps,Views,Views-method
## [42] findOverlaps,ViewsList,Vector-method
## [43] findOverlaps,ViewsList,ViewsList-method
## [44] findOverlaps,integer,Ranges-method
## see '?methods' for accessing help and source code
How can I get help on functions, generics, and methods?
?"findOverlaps" ## generic
?"findOverlaps,<tab>" ## specific method
Other help?
browseVignettes("GenomicRanges")
GAlignments and friends (GenomicAlignments)
- GAlignments: Single-end aligned reads, e.g., from BAM files
- GAlignmentPairs, GAlignmentsList: paired-end aligned reads. *Pairs is more restrictive on what pairs can be represented
DNAString and DNAStringSet (Biostrings)
SummarizedExperiment (GenomicRanges)
- assays of rows (regions of interest; genomic ranges) x columns (samples, including integrated phenotypic information)
TxDb (AnnotationDb)
- transcripts() interface
- select() interface
VCF (VariantAnnotation)
Lower-level classes
R works efficiently on vectors
- GRanges as a collection of vectors, not as a collection of records
getClass("GRanges")
## Class "GRanges" [package "GenomicRanges"]
##
## Slots:
##
## Name: seqnames ranges strand elementMetadata
## Class: Rle IRanges Rle DataFrame
##
## Name: seqinfo metadata
## Class: Seqinfo list
##
## Extends:
## Class "GenomicRanges", directly
## Class "Vector", by class "GenomicRanges", distance 2
## Class "GenomicRangesORmissing", by class "GenomicRanges", distance 2
## Class "GenomicRangesORGRangesList", by class "GenomicRanges", distance 2
## Class "GenomicRangesORGenomicRangesList", by class "GenomicRanges", distance 2
## Class "RangedDataORGenomicRanges", by class "GenomicRanges", distance 2
## Class "Annotated", by class "GenomicRanges", distance 3
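The slot listing above is what "a collection of vectors" means in practice: each slot holds one vector covering all ranges. The sketch below uses made-up ranges to show that accessors return whole vectors and that subsetting is done with a single vectorized comparison rather than a loop over records.

```r
library(GenomicRanges)

# Five made-up ranges, three on 'chr1' and two on 'chr2'
gr <- GRanges(rep(c("chr1", "chr2"), c(3, 2)),
              IRanges(start = c(1, 11, 21, 1, 11), width = 10))

# Each accessor returns one vector spanning all ranges
class(seqnames(gr))       # "Rle": runs of repeated values are stored once
runLength(seqnames(gr))   # lengths of those runs
class(ranges(gr))         # "IRanges"

# Vectorized selection: one comparison across every range
onChr1 <- gr[seqnames(gr) == "chr1"]
sum(width(onChr1))
```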
Vector and Annotated
- [, length(), names(), etc.
- mcols()
List-like
- [[
- elementLengths()
- Implementation: Vector plus partitioning
- unlist() and relist() are very inexpensive
Ingredients
- exons(), and exonsBy() functions
- width(), elementLengths() accessors
- hist()
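The ingredients above can be rehearsed without a TxDb package. In this sketch a hand-made GRangesList stands in for the result of exonsBy(); the coordinates and the transcript names `txA` and `txB` are invented. It uses elementNROWS(), which is the name current S4Vectors releases use for elementLengths(); with the package versions shown in the session information, call elementLengths() instead.

```r
library(GenomicRanges)

# Toy stand-in for exons grouped by transcript; all values are made up
ex <- GRanges("chr1",
              IRanges(start = c(1, 101, 201, 501, 601),
                      width = c(50, 80, 30, 120, 40)))
exByTx <- split(ex, c("txA", "txA", "txA", "txB", "txB"))

elementNROWS(exByTx)    # exons per transcript
width(exByTx)           # IntegerList of exon widths
sum(width(exByTx))      # total exonic width per transcript

# unlist() / relist() switch between the flat and the nested form
flat <- unlist(exByTx, use.names = FALSE)
flat$w <- width(flat)             # vectorized work on the flat form
nested <- relist(flat, exByTx)    # restore the original grouping

hist(width(flat), main = "Exon widths", xlab = "width (bp)")
```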
Goals
Ingredients
- BSgenome.Hsapiens.UCSC.hg19 BSgenome package
- TxDb.Hsapiens.UCSC.hg19.knownGene TxDb package
- ?"getSeq,BSgenome-method", letterFrequency()
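letterFrequency() can be tried on a few short sequences before running it on sequences retrieved with getSeq(). This is only a sketch: the three sequences and their names are made up and stand in for whatever getSeq() would return from the BSgenome package.

```r
library(Biostrings)

# Made-up sequences standing in for getSeq() output
dna <- DNAStringSet(c(seq1 = "ACGTACGT", seq2 = "GGGCCCAT", seq3 = "ATATATAT"))

# letters = "GC" tallies G and C together in a single column;
# as.prob = TRUE returns fractions instead of counts
gc <- letterFrequency(dna, letters = "GC", as.prob = TRUE)
gc[, 1]   # GC fraction of each sequence
```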
Goals
Ingredients
- DNAStringSet() to construct CG island sequence
- matchPDict() to find CG islands on BSgenome chromosomes
- coverage(), tileGenome(), Views(), following Hints
- tileGenome(), findOverlaps(), splitAsList(), mean()
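The last line of ingredients suggests a tile, overlap, split, and average pattern. The sketch below shows that pattern on toy data only: the 100 bp sequence `chr1`, the features, and their `score` values are all made up, and nothing here is specific to CG islands.

```r
library(GenomicRanges)

# Hypothetical genome with one 100 bp sequence, cut into 25 bp tiles
tiles <- tileGenome(c(chr1 = 100), tilewidth = 25,
                    cut.last.tile.in.chrom = TRUE)

# Made-up scored features
feat <- GRanges("chr1", IRanges(start = c(5, 20, 60), width = 10),
                score = c(2, 4, 6))

hits <- findOverlaps(tiles, feat)

# Group feature scores by tile; explicit factor levels keep tiles with no hit
byTile <- splitAsList(feat$score[subjectHits(hits)],
                      factor(queryHits(hits), levels = seq_along(tiles)))

# One mean per tile; tiles without any overlapping feature give NaN
tiles$meanScore <- unname(mean(byTile))
tiles
```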
Goal
Ingredients
- RNAseqData.HNRNPC.bam.chr14_BAMFILES
- readGAlignments(), readGAlignmentPairs(), and readGAlignmentsList()
- BamFile()
Goals
- ScanBamParam() which and what to selectively input data
- countOverlaps() between reads and known genes
- BamFile() yieldSize argument to iterate through file
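The three goals can be sketched on a small file first. Two things here are assumptions made for illustration: the example BAM file `ex1.bam` shipped with Rsamtools is used in place of the RNAseqData.HNRNPC.bam.chr14_BAMFILES files, and the regions of interest are arbitrary windows on its sequence `seq1`, not real gene models.

```r
library(GenomicAlignments)
library(Rsamtools)

# Assumption: small example BAM (with index) from Rsamtools, not the HNRNPC data
fl <- system.file("extdata", "ex1.bam", package = "Rsamtools")

# 'which' restricts input to a region; 'what' adds selected BAM fields
param <- ScanBamParam(which = GRanges("seq1", IRanges(1, 500)),
                      what = "mapq")
aln <- readGAlignments(fl, param = param)
head(mcols(aln)$mapq)

# Count reads overlapping made-up regions of interest
roi <- GRanges("seq1", IRanges(start = c(1, 251), width = 250))
countOverlaps(roi, aln)

# yieldSize: iterate through the file in chunks of at most 1000 records
bf <- BamFile(fl, yieldSize = 1000)
open(bf)
n <- 0L
while (length(chunk <- readGAlignments(bf)) > 0L) {
    n <- n + length(chunk)   # replace with per-chunk work, e.g., counting
}
close(bf)
n   # total number of alignments read across all chunks
```

Chunked reading keeps memory use bounded by the chunk size, which matters once the input is a full-size BAM file rather than this small example.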