---
title: "Supported differential expression methods"
author:
- name: Kevin Rue-Albrecht
affiliation:
- University of Oxford
email: kevin.rue-albrecht@imm.ox.ac.uk
output:
BiocStyle::html_document:
self_contained: yes
toc: true
toc_float: true
toc_depth: 2
code_folding: show
date: "`r doc_date()`"
package: "`r pkg_ver('iSEEde')`"
vignette: >
%\VignetteIndexEntry{Supported differential expression methods}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
crop = NULL ## Related to https://stat.ethz.ch/pipermail/bioc-devel/2020-April/016656.html
)
options(width = 100)
```
```{r, eval=!exists("SCREENSHOT"), include=FALSE}
SCREENSHOT <- function(x, ...) knitr::include_graphics(x)
```
```{r vignetteSetup, echo=FALSE, message=FALSE, warning = FALSE}
## Track time spent on making the vignette
startTime <- Sys.time()
## Bib setup
library("RefManageR")
## Write bibliography information
bib <- c(
R = citation(),
BiocStyle = citation("BiocStyle")[1],
knitr = citation("knitr")[1],
RefManageR = citation("RefManageR")[1],
rmarkdown = citation("rmarkdown")[1],
sessioninfo = citation("sessioninfo")[1],
testthat = citation("testthat")[1],
iSEEde = citation("iSEEde")[1]
)
```
# Implementation
## User-facing storage and access
Differential expression results are generally reported as tables of statistics,
including (log~2~) fold-change, p-value, average expression, etc.
Those statistics being reported for individual features (e.g., genes), the
`rowData()` component of `SummarizedExperiment()` objects provides a natural
home for that information.
Specifically, `r BiocStyle::Biocpkg("iSEEde")` stores and retrieves differential expression results in `rowData(se)[["iSEEde"]]`.
However, the `rowData()` function should not be used to access or edit that information.
Instead, the functions `embedContrastResults()` and `contrastResults()`, should be used to store and retrieve contrast results, respectively, as those functions ensure that feature names are kept synchronised with the enclosing `SummarizedExperiment` object.
Moreover, the function `contrastResultsNames()` can be used to retrieve the names of contrast available in a given `SummarizedExperiment` object.
## Additional considerations
The first challenge arises when differential expression statistics are computed only for a subset of features.
In that case, `embedContrastResults()` populates missing information with `NA` values.
The second challenge arises from the different names of columns used by individual differential expression methods to store differential expression common statistics.
To address this, `r BiocStyle::Biocpkg("iSEEde")` provides S4 classes creating a common interface to supported differential expression methods.
Each class of differential expression results implements the following methods:
- `pValue()` returns the vector of raw p-values.
- `log2FoldChange()` returns the vector of log2-fold-change values.
- `averageLog2()` returns the vector of average log2-expression values.
# Example data
In this example, we use the `?airway` data set.
We briefly adjust the reference level of the treatment factor to the untreated condition.
```{r, message=FALSE, warning=FALSE}
library("airway")
data("airway")
airway$dex <- relevel(airway$dex, "untrt")
```
# Supported methods
## Limma
We first use `edgeR::filterByExpr()` to remove genes whose counts are too low to
support a rigorous differential expression analysis. Then we run a standard
Limma-Voom analysis using `edgeR::voomLmFit()`, `limma::makeContrasts()`, and
`limma::eBayes()`. (Alternatively, we could have used `limma::treat()` instead
of `limma::eBayes()`.)
The linear model includes the `dex` and `cell` covariates, indicating the
treatment conditions and cell types, respectively. Here, we are interested in
differences between treatments, adjusted by cell type, and define this
comparison as the `dextrt - dexuntrt` contrast.
The final differential expression results are fetched using `limma::topTable()`.
```{r, message=FALSE, warning=FALSE}
library("edgeR")
design <- model.matrix(~ 0 + dex + cell, data = colData(airway))
keep <- filterByExpr(airway, design)
fit <- voomLmFit(airway[keep, ], design, plot = FALSE)
contr <- makeContrasts("dextrt - dexuntrt", levels = design)
fit <- contrasts.fit(fit, contr)
fit <- eBayes(fit)
res_limma <- topTable(fit, sort.by = "P", n = Inf)
head(res_limma)
```
Then, we embed this set of differential expression results in the `airway`
object using the `embedContrastResults()` method.
Because the output of `limma::topTable()` is a standard `data.frame`, the
`class=` argument must be used to manually identify the method that produced
those results.
Supported classes are listed in the object `iSEEde::embedContrastResultsMethods`, i.e.
```{r, message=FALSE}
library(iSEEde)
embedContrastResultsMethods
```
This manual method is preferable to any automated heuristic (e.g. using the
column names of the `data.frame` for identifying it as a `limma::topTable()`
output).
The results embedded in the `airway` object can be accessed using the `contrastResults()` function.
```{r}
airway <- embedContrastResults(res_limma, airway, name = "Limma-Voom", class = "limma")
contrastResults(airway)
contrastResults(airway, "Limma-Voom")
```
## DESeq2
First, we use `DESeqDataSet()` to construct a `DESeqDataSet` object
for the analysis. Then we run a standard `r BiocStyle::Biocpkg("DESeq2")`
analysis using `DESeq()`.
The differential expression results are fetched using `results()`.
```{r, message=FALSE, warning=FALSE}
library("DESeq2")
dds <- DESeqDataSet(airway, ~ 0 + dex + cell)
dds <- DESeq(dds)
res_deseq2 <- results(dds, contrast = list("dextrt", "dexuntrt"))
head(res_deseq2)
```
Then, we embed this set of differential expression results in the `airway`
object using the `embedContrastResults()` method.
In this instance, the `r BiocStyle::Biocpkg("DESeq2")` results are stored in a
recognisable `?DESeqResults` object, which can be given *as is* directly to the
`embedContrastResults()` method.
The function will automatically pass the results object to the
`iSEEDESeq2Results()` constructor, to identify it as such.
The results embedded in the airway object can be accessed using the `contrastResults()` function.
```{r}
airway <- embedContrastResults(res_deseq2, airway, name = "DESeq2")
contrastResults(airway)
contrastResults(airway, "DESeq2")
```
## edgeR
We run a standard `r BiocStyle::Biocpkg("edgeR")` analysis using `glmFit()` and
`glmLRT()`.
The differential expression results are fetched using `topTags()`.
```{r, message=FALSE, warning=FALSE}
library("edgeR")
design <- model.matrix(~ 0 + dex + cell, data = colData(airway))
fit <- glmFit(airway, design, dispersion = 0.1)
lrt <- glmLRT(fit, contrast = c(-1, 1, 0, 0, 0))
res_edger <- topTags(lrt, n = Inf)
head(res_edger)
```
Then, we embed this set of differential expression results in the `airway`
object using the `embedContrastResults()` method.
In this instance, the `r BiocStyle::Biocpkg("edgeR")` results are stored in a
recognisable `?TopTags` object. As such, the results object can be given *as is*
directly to the `embedContrastResults()` method. The function will automatically pass
the results object to the `iSEEedgeRResults()` constructor, to identify it as
such.
The results embedded in the airway object can be accessed using the `contrastResults()` function.
```{r}
airway <- embedContrastResults(res_edger, airway, name = "edgeR")
contrastResults(airway)
contrastResults(airway, "edgeR")
```
# Live app
In this example, we used the `r BiocStyle::Biocpkg("iSEEde")` functions `DETable()`, `VolcanoPlot()`, and `MAPlot()` to add panels that facilitate the visualisation of differential expression results in an `r BiocStyle::Biocpkg("iSEE")` app.
Specifically, we add one set of panels for each differential expression method used in this vignette (i.e., Limma-Voom, DESeq2, edgeR).
```{r, message=FALSE}
library(iSEE)
app <- iSEE(airway, initial = list(
DETable(ContrastName="Limma-Voom", HiddenColumns = c("AveExpr",
"t", "B"), PanelWidth = 4L),
VolcanoPlot(ContrastName = "Limma-Voom", PanelWidth = 4L),
MAPlot(ContrastName = "Limma-Voom", PanelWidth = 4L),
DETable(ContrastName="DESeq2", HiddenColumns = c("baseMean",
"lfcSE", "stat"), PanelWidth = 4L),
VolcanoPlot(ContrastName = "DESeq2", PanelWidth = 4L),
MAPlot(ContrastName = "DESeq2", PanelWidth = 4L),
DETable(ContrastName="edgeR", HiddenColumns = c("logCPM",
"LR"), PanelWidth = 4L),
VolcanoPlot(ContrastName = "edgeR", PanelWidth = 4L),
MAPlot(ContrastName = "edgeR", PanelWidth = 4L)
))
if (interactive()) {
shiny::runApp(app)
}
```
```{r, echo=FALSE, out.width="100%"}
SCREENSHOT("screenshots/methods_side_by_side.png", delay = 20)
```
# Comparing two contrasts
The `?LogFCLogFCPlot` class allows users to compare the log~2~ fold-change value of features between two differential expression contrasts.
In this example, we add one `?LogFCLogFCPlot` panel comparing the same contrast using the Limma-Voom and DESeq2 methods, alongside one `?VolcanoPlot` panel for each of those two contrasts.
Moreover, we pre-select an area of the `?LogFCLogFCPlot` and highlight the selected features in the two `?VolcanoPlot` panels.
```{r, message=FALSE, warning=FALSE}
library(iSEE)
app <- iSEE(airway, initial = list(
VolcanoPlot(ContrastName="Limma-Voom",
RowSelectionSource = "LogFCLogFCPlot1", ColorBy = "Row selection",
PanelWidth = 4L),
LogFCLogFCPlot(ContrastNameX="Limma-Voom", ContrastNameY="DESeq2",
BrushData = list(
xmin = 3.6, xmax = 8.2, ymin = 3.8, ymax = 9.8,
mapping = list(x = "X", y = "Y"),
direction = "xy", brushId = "LogFCLogFCPlot1_Brush",
outputId = "LogFCLogFCPlot1"),
PanelWidth = 4L),
VolcanoPlot(ContrastName="DESeq2",
RowSelectionSource = "LogFCLogFCPlot1", ColorBy = "Row selection",
PanelWidth = 4L)
))
if (interactive()) {
shiny::runApp(app)
}
```
```{r, echo=FALSE, out.width="100%"}
SCREENSHOT("screenshots/logfc_logfc_plot.png", delay = 20)
```
# Reproducibility
The `r Biocpkg("iSEEde")` package `r Citep(bib[["iSEEde"]])` was made possible
thanks to:
* R `r Citep(bib[["R"]])`
* `r Biocpkg("BiocStyle")` `r Citep(bib[["BiocStyle"]])`
* `r CRANpkg("knitr")` `r Citep(bib[["knitr"]])`
* `r CRANpkg("RefManageR")` `r Citep(bib[["RefManageR"]])`
* `r CRANpkg("rmarkdown")` `r Citep(bib[["rmarkdown"]])`
* `r CRANpkg("sessioninfo")` `r Citep(bib[["sessioninfo"]])`
* `r CRANpkg("testthat")` `r Citep(bib[["testthat"]])`
This package was developed using `r BiocStyle::Biocpkg("biocthis")`.
Code for creating the vignette
```{r createVignette, eval=FALSE}
## Create the vignette
library("rmarkdown")
system.time(render("methods.Rmd", "BiocStyle::html_document"))
## Extract the R code
library("knitr")
knit("methods.Rmd", tangle = TRUE)
```
Date the vignette was generated.
```{r reproduce1, echo=FALSE}
## Date the vignette was generated
Sys.time()
```
Wallclock time spent generating the vignette.
```{r reproduce2, echo=FALSE}
## Processing time in seconds
totalTime <- diff(c(startTime, Sys.time()))
round(totalTime, digits = 3)
```
`R` session information.
```{r reproduce3, echo=FALSE}
## Session info
library("sessioninfo")
options(width = 120)
session_info()
```
# Bibliography
This vignette was generated using `r Biocpkg("BiocStyle")` `r
Citep(bib[["BiocStyle"]])` with `r CRANpkg("knitr")` `r Citep(bib[["knitr"]])`
and `r CRANpkg("rmarkdown")` `r Citep(bib[["rmarkdown"]])` running behind the
scenes.
Citations made with `r CRANpkg("RefManageR")` `r Citep(bib[["RefManageR"]])`.
```{r vignetteBiblio, results = "asis", echo = FALSE, warning = FALSE, message = FALSE}
## Print bibliography
PrintBibliography(bib, .opts = list(hyperlink = "to.doc", style = "html"))
```
<!-- Links -->
[scheme-wikipedia]: https://en.wikipedia.org/wiki/Uniform_Resource_Identifier#Syntax
[iana-uri]: https://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml