--- title: "A quick overview of the S4 class system" author: "Hervé Pagès" date: "June 2016" package: S4Vectors vignette: > %\VignetteIndexEntry{A quick overview of the S4 class system} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} %\VignetteDepends{methods,Matrix,IRanges,ShortRead,graph} output: slidy_presentation: font_adjustment: -1 css: styles.css includes: in_header: header.html --- ```{r setup, echo=FALSE, results='hide'} options(width=60) suppressPackageStartupMessages({ library(Matrix) library(IRanges) library(ShortRead) library(graph) }) ``` # What is S4? ## The S4 class system * The *S4 class system* is a set of facilities provided in R for OO programming. * Implemented in the `methods` package. * On a fresh R session: ```r > sessionInfo() ... attached base packages: [1] stats graphics grDevices utils datasets [6] methods base ``` * R also supports an older class system: the *S3 class system*. ## A different world **The syntax** ```r foo(x, ...) ``` not: ```r x.foo(...) ``` like in other OO programming languages. **The central concepts** * The core components: *classes*, *generic functions* and *methods* * The glue: *method dispatch* (supports *simple* and *multiple* dispatch) ## **The result** ``` > ls('package:methods') [1] "addNextMethod" "allGenerics" [3] "allNames" "Arith" [5] "as" "as<-" [7] "asMethodDefinition" "assignClassDef" ... [211] "testVirtual" "traceOff" [213] "traceOn" "tryNew" [215] "unRematchDefinition" "validObject" [217] "validSlotNames" ``` * Rich, complex, can be intimidating * The classes and methods we implement in our packages can be hard to document, especially when the class hierarchy is complicated and multiple dispatch is used ## S4 in Bioconductor * Heavily used. In BioC 3.3: 3158 classes and 22511 methods defined in 609 packages! (out of 1211 software packages) * Top 10: 128 classes in `ChemmineOB`, 98 in `flowCore`, 79 in `IRanges`, 68 in `rsbml`, 61 in `ShortRead`, 58 in `Biostrings`, 51 in `rtracklayer`, 50 in `oligoClasses`, 45 in `flowUtils`, and 40 in `BaseSpaceR`. * For the end user: it's mostly transparent. But when something goes wrong, error messages issued by the S4 class system can be hard to understand. Also it can be hard to find the documentation for a specific method. * Most Bioconductor packages use only a small subset of the S4 capabilities (covers 99.99% of our needs) # S4 from an end-user point of view ## Where do S4 objects come from? **From a dataset** ```{r S4_object_in_dataset} library(graph) data(apopGraph) apopGraph ``` **From using an object constructor function** ```{r S4_object_from_constructor} library(IRanges) IRanges(start=c(101, 25), end=c(110, 80)) ``` ## **From a coercion** ```{r S4_object_from_ceorcion} library(Matrix) m <- matrix(3:-4, nrow=2) as(m, "Matrix") ``` **From using a specialized high-level constructor** ```{r eval=FALSE} library(GenomicFeatures) makeTxDbFromUCSC("sacCer2", tablename="ensGene") ``` ``` TxDb object: # Db type: TxDb # Supporting package: GenomicFeatures # Data source: UCSC # Genome: sacCer2 # Organism: Saccharomyces cerevisiae # Taxonomy ID: 4932 # UCSC Table: ensGene # UCSC Track: Ensembl Genes ... ``` ## **From using a high-level I/O function** ```{r S4_object_from_high_level_IO_function} library(ShortRead) path_to_my_data <- system.file( package="ShortRead", "extdata", "Data", "C1-36Firecrest", "Bustard", "GERALD") lane1 <- readFastq(path_to_my_data, pattern="s_1_sequence.txt") lane1 ``` **Inside another object** ```{r S4_object_inside_another_object} sread(lane1) ``` ## How to manipulate S4 objects? **Low-level: getters and setters** ```{r getters_and_setters} ir <- IRanges(start=c(101, 25), end=c(110, 80)) width(ir) width(ir) <- width(ir) - 5 ir ``` **High-level: plenty of specialized methods** ```{r specialized_methods} qa1 <- qa(lane1, lane="lane1") class(qa1) ``` ## How to find the right man page? * `class?graphNEL` or equivalently `?`graphNEL-class`` for accessing the man page of a class * `?qa` for accessing the man page of a generic function * The man page for a generic might also document some or all of the methods for this generic. The *See Also:* section might give a clue. Also using `showMethods()` can be useful: ```{r showMethods} showMethods("qa") ``` * `?`qa,ShortReadQ-method`` to access the man page for a particular method (might be the same man page as for the generic) * In doubt: `??qa` will search the man pages of all the installed packages and return the list of man pages that contain the string `qa` ## Inspecting objects and discovering methods * `class()` and `showClass()` ```{r showClass, R.options=list(width=60)} class(lane1) showClass("ShortReadQ") ``` * `str()` for compact display of the content of an object * `showMethods()` to discover methods * `selectMethod()` to see the code # Implementing an S4 class (in 4 slides) ## Class definition and constructor **Class definition** ```{r setClass} setClass("SNPLocations", slots=c( genome="character", # a single string snpid="character", # a character vector of length N chrom="character", # a character vector of length N pos="integer" # an integer vector of length N ) ) ``` **Constructor** ```{r SNPLocations} SNPLocations <- function(genome, snpid, chrom, pos) new("SNPLocations", genome=genome, snpid=snpid, chrom=chrom, pos=pos) ``` ```{r test_SNPLocations} snplocs <- SNPLocations("hg19", c("rs0001", "rs0002"), c("chr1", "chrX"), c(224033L, 1266886L)) ``` ## Getters **Defining the `length` method** ```{r length, results='hide'} setMethod("length", "SNPLocations", function(x) length(x@snpid)) ``` ```{r test_length} length(snplocs) # just testing ``` **Defining the slot getters** ```{r genome, results='hide'} setGeneric("genome", function(x) standardGeneric("genome")) setMethod("genome", "SNPLocations", function(x) x@genome) ``` ```{r snpid, results='hide'} setGeneric("snpid", function(x) standardGeneric("snpid")) setMethod("snpid", "SNPLocations", function(x) x@snpid) ``` ```{r chrom, results='hide'} setGeneric("chrom", function(x) standardGeneric("chrom")) setMethod("chrom", "SNPLocations", function(x) x@chrom) ``` ```{r pos, results='hide'} setGeneric("pos", function(x) standardGeneric("pos")) setMethod("pos", "SNPLocations", function(x) x@pos) ``` ```{r test_slot_getters} genome(snplocs) # just testing snpid(snplocs) # just testing ``` ## **Defining the `show` method** ```{r show, results='hide'} setMethod("show", "SNPLocations", function(object) cat(class(object), "instance with", length(object), "SNPs on genome", genome(object), "\n") ) ``` ```{r} snplocs # just testing ``` **Defining the *validity method*** ```{r validity, results='hide'} setValidity("SNPLocations", function(object) { if (!is.character(genome(object)) || length(genome(object)) != 1 || is.na(genome(object))) return("'genome' slot must be a single string") slot_lengths <- c(length(snpid(object)), length(chrom(object)), length(pos(object))) if (length(unique(slot_lengths)) != 1) return("lengths of slots 'snpid', 'chrom' and 'pos' differ") TRUE } ) ``` ```{r error=TRUE} snplocs@chrom <- LETTERS[1:3] # a very bad idea! validObject(snplocs) ``` ## **Defining slot setters** ```{r set_chrom, results='hide'} setGeneric("chrom<-", function(x, value) standardGeneric("chrom<-")) setReplaceMethod("chrom", "SNPLocations", function(x, value) {x@chrom <- value; validObject(x); x}) ``` ```{r test_slot_setters} chrom(snplocs) <- LETTERS[1:2] # repair currently broken object ``` ```{r error=TRUE} chrom(snplocs) <- LETTERS[1:3] # try to break it again ``` **Defining a coercion method** ```{r setAs, results='hide'} setAs("SNPLocations", "data.frame", function(from) data.frame(snpid=snpid(from), chrom=chrom(from), pos=pos(from)) ) ``` ```{r test_coercion} as(snplocs, "data.frame") # testing ``` # Extending an existing class ## Slot inheritance * Most of the time (but not always), the child class will have additional slots: ```{r AnnotatedSNPs} setClass("AnnotatedSNPs", contains="SNPLocations", slots=c( geneid="character" # a character vector of length N ) ) ``` * The slots from the parent class are inherited: ```{r slot_inheritance} showClass("AnnotatedSNPs") ``` * Constructor: ```{r AnnotatedSNPs_constructor} AnnotatedSNPs <- function(genome, snpid, chrom, pos, geneid) { new("AnnotatedSNPs", SNPLocations(genome, snpid, chrom, pos), geneid=geneid) } ``` ## Method inheritance * Let's create an AnnotatedSNPs object: ```{r method_inheritance} snps <- AnnotatedSNPs("hg19", c("rs0001", "rs0002"), c("chr1", "chrX"), c(224033L, 1266886L), c("AAU1", "SXW-23")) ``` * All the methods defined for SNPLocations objects work out-of-the-box: ```{r method_inheritance_2} snps ``` * But sometimes they don't do the right thing: ```{r as_data_frame_is_not_right} as(snps, "data.frame") # the 'geneid' slot is ignored ``` ## * Being a SNPLocations *object* vs being a SNPLocations *instance*: ```{r} is(snps, "AnnotatedSNPs") # 'snps' is an AnnotatedSNPs object is(snps, "SNPLocations") # and is also a SNPLocations object class(snps) # but is *not* a SNPLocations *instance* ``` * Method overriding: for example we could define a `show` method for AnnotatedSNPs objects. `callNextMethod` can be used in that context to call the method defined for the parent class from within the method for the child class. * Automatic coercion method: ```{r automatic_coercion_method} as(snps, "SNPLocations") ``` ## Incremental validity method * The *validity method* for AnnotatedSNPs objects only needs to validate what's not already validated by the *validity method* for SNPLocations objects: ```{r incremental_validity_method, results='hide'} setValidity("AnnotatedSNPs", function(object) { if (length(object@geneid) != length(object)) return("'geneid' slot must have the length of the object") TRUE } ) ``` * In other words: before an AnnotatedSNPs object can be considered valid, it must first be a valid SNPLocations object. # What else? ## **Other important S4 features** * *Virtual* classes: equivalent to *abstract* classes in Java * Class unions (see `?setClassUnion`) * Multiple inheritance: a powerful feature that should be used with caution. If used inappropriately, can lead to a class hierarchy that is very hard to maintain **Resources** * Man pages in the `methods` package: `?setClass`, `?showMethods`, `?selectMethod`, `?getMethod`, `?is`, `?setValidity`, `?as` * The *Extending RangedSummarizedExperiment* section of the *SummarizedExperiment* vignette in the `SummarizedExperiment` package. * Note: S4 is *not* covered in the *An Introduction to R* or *The R language definition* manuals * The *Writing R Extensions* manual for details about integrating S4 classes to a package * The *R Programming for Bioinformatics* book by Robert Gentleman