---
title: "A quick overview of the S4 class system"
author: "Hervé Pagès"
date: "June 2016"
package: S4Vectors
vignette: >
%\VignetteIndexEntry{A quick overview of the S4 class system}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
%\VignetteDepends{methods,Matrix,IRanges,ShortRead,graph}
output:
slidy_presentation:
font_adjustment: -1
css: styles.css
includes:
in_header: header.html
---
```{r setup, echo=FALSE, results='hide'}
options(width=60)
suppressPackageStartupMessages({
library(Matrix)
library(IRanges)
library(ShortRead)
library(graph)
})
```
# What is S4?
## The S4 class system
* The *S4 class system* is a set of facilities provided in R for OO programming.
* Implemented in the `methods` package.
* On a fresh R session:
```r
> sessionInfo()
...
attached base packages:
[1] stats graphics grDevices utils datasets
[6] methods base
```
* R also supports an older class system: the *S3 class system*.
## A different world
**The syntax**
```r
foo(x, ...)
```
not:
```r
x.foo(...)
```
like in other OO programming languages.
**The central concepts**
* The core components: *classes*, *generic functions* and *methods*
* The glue: *method dispatch* (supports *simple* and *multiple* dispatch)
##
**The result**
```
> ls('package:methods')
[1] "addNextMethod" "allGenerics"
[3] "allNames" "Arith"
[5] "as" "as<-"
[7] "asMethodDefinition" "assignClassDef"
...
[211] "testVirtual" "traceOff"
[213] "traceOn" "tryNew"
[215] "unRematchDefinition" "validObject"
[217] "validSlotNames"
```
* Rich, complex, can be intimidating
* The classes and methods we implement in our packages can be hard to document, especially when the class hierarchy is complicated and multiple dispatch is used
## S4 in Bioconductor
* Heavily used. In BioC 3.3: 3158 classes and 22511 methods defined in 609
packages! (out of 1211 software packages)
* Top 10: 128 classes in `ChemmineOB`, 98 in `flowCore`, 79 in `IRanges`, 68 in
`rsbml`, 61 in `ShortRead`, 58 in `Biostrings`, 51 in `rtracklayer`, 50 in
`oligoClasses`, 45 in `flowUtils`, and 40 in `BaseSpaceR`.
* For the end user: it's mostly transparent. But when something goes wrong,
error messages issued by the S4 class system can be hard to understand. Also it
can be hard to find the documentation for a specific method.
* Most Bioconductor packages use only a small subset of the S4 capabilities
(covers 99.99% of our needs)
# S4 from an end-user point of view
## Where do S4 objects come from?
**From a dataset**
```{r S4_object_in_dataset}
library(graph)
data(apopGraph)
apopGraph
```
**From using an object constructor function**
```{r S4_object_from_constructor}
library(IRanges)
IRanges(start=c(101, 25), end=c(110, 80))
```
##
**From a coercion**
```{r S4_object_from_ceorcion}
library(Matrix)
m <- matrix(3:-4, nrow=2)
as(m, "Matrix")
```
**From using a specialized high-level constructor**
```{r eval=FALSE}
library(GenomicFeatures)
makeTxDbFromUCSC("sacCer2", tablename="ensGene")
```
```
TxDb object:
# Db type: TxDb
# Supporting package: GenomicFeatures
# Data source: UCSC
# Genome: sacCer2
# Organism: Saccharomyces cerevisiae
# Taxonomy ID: 4932
# UCSC Table: ensGene
# UCSC Track: Ensembl Genes
...
```
##
**From using a high-level I/O function**
```{r S4_object_from_high_level_IO_function}
library(ShortRead)
path_to_my_data <- system.file(
package="ShortRead",
"extdata", "Data", "C1-36Firecrest", "Bustard", "GERALD")
lane1 <- readFastq(path_to_my_data, pattern="s_1_sequence.txt")
lane1
```
**Inside another object**
```{r S4_object_inside_another_object}
sread(lane1)
```
## How to manipulate S4 objects?
**Low-level: getters and setters**
```{r getters_and_setters}
ir <- IRanges(start=c(101, 25), end=c(110, 80))
width(ir)
width(ir) <- width(ir) - 5
ir
```
**High-level: plenty of specialized methods**
```{r specialized_methods}
qa1 <- qa(lane1, lane="lane1")
class(qa1)
```
## How to find the right man page?
* `class?graphNEL` or equivalently `?`graphNEL-class`` for accessing the man
page of a class
* `?qa` for accessing the man page of a generic function
* The man page for a generic might also document some or all of the methods for
this generic. The *See Also:* section might give a clue. Also using
`showMethods()` can be useful:
```{r showMethods}
showMethods("qa")
```
* `?`qa,ShortReadQ-method`` to access the man page for a particular method
(might be the same man page as for the generic)
* In doubt: `??qa` will search the man pages of all the installed packages and
return the list of man pages that contain the string `qa`
## Inspecting objects and discovering methods
* `class()` and `showClass()`
```{r showClass, R.options=list(width=60)}
class(lane1)
showClass("ShortReadQ")
```
* `str()` for compact display of the content of an object
* `showMethods()` to discover methods
* `selectMethod()` to see the code
# Implementing an S4 class (in 4 slides)
## Class definition and constructor
**Class definition**
```{r setClass}
setClass("SNPLocations",
slots=c(
genome="character", # a single string
snpid="character", # a character vector of length N
chrom="character", # a character vector of length N
pos="integer" # an integer vector of length N
)
)
```
**Constructor**
```{r SNPLocations}
SNPLocations <- function(genome, snpid, chrom, pos)
new("SNPLocations", genome=genome, snpid=snpid, chrom=chrom, pos=pos)
```
```{r test_SNPLocations}
snplocs <- SNPLocations("hg19",
c("rs0001", "rs0002"),
c("chr1", "chrX"),
c(224033L, 1266886L))
```
## Getters
**Defining the `length` method**
```{r length, results='hide'}
setMethod("length", "SNPLocations", function(x) length(x@snpid))
```
```{r test_length}
length(snplocs) # just testing
```
**Defining the slot getters**
```{r genome, results='hide'}
setGeneric("genome", function(x) standardGeneric("genome"))
setMethod("genome", "SNPLocations", function(x) x@genome)
```
```{r snpid, results='hide'}
setGeneric("snpid", function(x) standardGeneric("snpid"))
setMethod("snpid", "SNPLocations", function(x) x@snpid)
```
```{r chrom, results='hide'}
setGeneric("chrom", function(x) standardGeneric("chrom"))
setMethod("chrom", "SNPLocations", function(x) x@chrom)
```
```{r pos, results='hide'}
setGeneric("pos", function(x) standardGeneric("pos"))
setMethod("pos", "SNPLocations", function(x) x@pos)
```
```{r test_slot_getters}
genome(snplocs) # just testing
snpid(snplocs) # just testing
```
##
**Defining the `show` method**
```{r show, results='hide'}
setMethod("show", "SNPLocations",
function(object)
cat(class(object), "instance with", length(object),
"SNPs on genome", genome(object), "\n")
)
```
```{r}
snplocs # just testing
```
**Defining the *validity method***
```{r validity, results='hide'}
setValidity("SNPLocations",
function(object) {
if (!is.character(genome(object)) ||
length(genome(object)) != 1 || is.na(genome(object)))
return("'genome' slot must be a single string")
slot_lengths <- c(length(snpid(object)),
length(chrom(object)),
length(pos(object)))
if (length(unique(slot_lengths)) != 1)
return("lengths of slots 'snpid', 'chrom' and 'pos' differ")
TRUE
}
)
```
```{r error=TRUE}
snplocs@chrom <- LETTERS[1:3] # a very bad idea!
validObject(snplocs)
```
##
**Defining slot setters**
```{r set_chrom, results='hide'}
setGeneric("chrom<-", function(x, value) standardGeneric("chrom<-"))
setReplaceMethod("chrom", "SNPLocations",
function(x, value) {x@chrom <- value; validObject(x); x})
```
```{r test_slot_setters}
chrom(snplocs) <- LETTERS[1:2] # repair currently broken object
```
```{r error=TRUE}
chrom(snplocs) <- LETTERS[1:3] # try to break it again
```
**Defining a coercion method**
```{r setAs, results='hide'}
setAs("SNPLocations", "data.frame",
function(from)
data.frame(snpid=snpid(from), chrom=chrom(from), pos=pos(from))
)
```
```{r test_coercion}
as(snplocs, "data.frame") # testing
```
# Extending an existing class
## Slot inheritance
* Most of the time (but not always), the child class will have additional slots:
```{r AnnotatedSNPs}
setClass("AnnotatedSNPs",
contains="SNPLocations",
slots=c(
geneid="character" # a character vector of length N
)
)
```
* The slots from the parent class are inherited:
```{r slot_inheritance}
showClass("AnnotatedSNPs")
```
* Constructor:
```{r AnnotatedSNPs_constructor}
AnnotatedSNPs <- function(genome, snpid, chrom, pos, geneid)
{
new("AnnotatedSNPs",
SNPLocations(genome, snpid, chrom, pos),
geneid=geneid)
}
```
## Method inheritance
* Let's create an AnnotatedSNPs object:
```{r method_inheritance}
snps <- AnnotatedSNPs("hg19",
c("rs0001", "rs0002"),
c("chr1", "chrX"),
c(224033L, 1266886L),
c("AAU1", "SXW-23"))
```
* All the methods defined for SNPLocations objects work out-of-the-box:
```{r method_inheritance_2}
snps
```
* But sometimes they don't do the right thing:
```{r as_data_frame_is_not_right}
as(snps, "data.frame") # the 'geneid' slot is ignored
```
##
* Being a SNPLocations *object* vs being a SNPLocations *instance*:
```{r}
is(snps, "AnnotatedSNPs") # 'snps' is an AnnotatedSNPs object
is(snps, "SNPLocations") # and is also a SNPLocations object
class(snps) # but is *not* a SNPLocations *instance*
```
* Method overriding: for example we could define a `show` method for
AnnotatedSNPs objects. `callNextMethod` can be used in that context to call the
method defined for the parent class from within the method for the child class.
* Automatic coercion method:
```{r automatic_coercion_method}
as(snps, "SNPLocations")
```
## Incremental validity method
* The *validity method* for AnnotatedSNPs objects only needs to validate what's
not already validated by the *validity method* for SNPLocations objects:
```{r incremental_validity_method, results='hide'}
setValidity("AnnotatedSNPs",
function(object) {
if (length(object@geneid) != length(object))
return("'geneid' slot must have the length of the object")
TRUE
}
)
```
* In other words: before an AnnotatedSNPs object can be considered valid, it
must first be a valid SNPLocations object.
# What else?
##
**Other important S4 features**
* *Virtual* classes: equivalent to *abstract* classes in Java
* Class unions (see `?setClassUnion`)
* Multiple inheritance: a powerful feature that should be used with caution. If
used inappropriately, can lead to a class hierarchy that is very hard to
maintain
**Resources**
* Man pages in the `methods` package: `?setClass`, `?showMethods`,
`?selectMethod`, `?getMethod`, `?is`, `?setValidity`, `?as`
* The *Extending RangedSummarizedExperiment* section of the
*SummarizedExperiment* vignette in the `SummarizedExperiment` package.
* Note: S4 is *not* covered in the *An Introduction to R* or *The R language
definition*
manuals
* The *Writing R Extensions* manual for details about integrating S4 classes to
a package
* The *R Programming for Bioinformatics* book by Robert
Gentleman