---
title: 'MetaVolcanoR: Differential expression meta-analysis tool'
date: June 5, 2019
output:
  html_document:
    toc: yes
  pdf_document:
    toc: yes
  prettydoc::html_pretty:
    highlight: github
    theme: lumen
    toc: yes
vignette: > 
  %\VignetteIndexEntry{MetaVolcanoR: Differential expression meta-analysis tool}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

```{r style, echo=FALSE, results="asis", message=FALSE}
knitr::opts_chunk$set(tidy = FALSE,
                      warning = FALSE,
                      message = FALSE,
                      cache=TRUE)
```

# Introduction

The measurement of how gene expression changes under different biological 
conditions is necessary to reveal the gene regulatory programs that determine
the cellular phenotype. 

Comparing the expresion of genes under a given condition against a reference 
biological state is usually applied to identify sets of differentially 
expressed genes (DEG). These DEG point out the genomic regions functionally 
relevant under the biological condition of interest. 

Athough individual genome-wide expression studies have small signal/noise ratio,
today's genomic data availability usually allows to combine differential gene 
expression results from independent studies to overcome this limitation. 

Databases such as GEO (https://www.ncbi.nlm.nih.gov/geo/), SRA 
(https://www.ncbi.nlm.nih.gov/sra), ArrayExpress, 
(https://www.ebi.ac.uk/arrayexpress/), and ENA (https://www.ebi.ac.uk/ena) offer
systematic access to vast amounts of transcriptome data. There exists more than
one gene expression study for many biological conditions. This redundancy could
be exploit by meta-analysis approaches to reveal genes that are consistently
and differentially expressed under given conditions.

MetaVolcanoR was designed to identify the genes whose expression is consistently 
perturbed across several studies. 

# Usage

## Overview 
The MetaVolcanoR R package combines differential gene expression results. It 
implements three strategies to summarize gene expression activities from 
different studies. i) Random Effects Model (REM) approach. ii) a vote-counting
approach, and iii) a p-value combining-approach. MetaVolcano exploits the 
Volcano plot reasoning to visualize the gene expression meta-analysis results.

## Installation

```{r}

BiocManager::install("MetaVolcanoR", eval = FALSE)
```

## Load library

```{r}
library(MetaVolcanoR)
```

## Input Data

Users should provide a named list of \code{data.table/data.frame} objects 
containing differential gene expression results. Each object of the list  
must contain a *gene name*, a *fold change*, and a *p-value* variable. It 
is highly recommended to also include the *variance* or the confidence 
intervals of the *fold change* variable. 

Take a look at the demo data. It includes differential gene expression results
from five studies. 

```{r}
data(diffexplist)
class(diffexplist)
head(diffexplist[[1]])
length(diffexplist)
```

# Implemented meta-analysis approaches

## Random Effect Model MetaVolcano

The *REM* MetaVolcano summarizes the gene fold change of several studies taking
into account the variance. The *REM* estimates a *summary p-value* which stands 
for the probability of the *summary fold-change* is not different than zero. 
Users can set the *metathr* parameter to highlight the top percentage of the 
most consistently perturbed genes. This perturbation ranking is defined 
following the *topconfects* approach.


```{r}
meta_degs_rem <- rem_mv(diffexp=diffexplist,
			pcriteria="pvalue",
			foldchangecol='Log2FC', 
			genenamecol='Symbol',
			geneidcol=NULL,
			collaps=FALSE,
			llcol='CI.L',
			rlcol='CI.R',
			vcol=NULL, 
			cvar=TRUE,
			metathr=0.01,
			jobname="MetaVolcano",
			outputfolder=".", 
			draw='HTML',
			ncores=1)

head(meta_degs_rem@metaresult, 3)

meta_degs_rem@MetaVolcano

draw_forest(remres=meta_degs_rem,
	    gene="MMP9",
	    genecol="Symbol", 
	    foldchangecol="Log2FC",
	    llcol="CI.L", 
	    rlcol="CI.R",
	    jobname="MetaVolcano",
	    outputfolder=".",
	    draw="HTML")

draw_forest(remres=meta_degs_rem,
	    gene="COL6A6",
	    genecol="Symbol", 
	    foldchangecol="Log2FC",
	    llcol="CI.L", 
	    rlcol="CI.R",
	    jobname="MetaVolcano",
	    outputfolder=".",
	    draw="HTML")

```

&nbsp;
The *REM* MetaVolcano also allows users to explore the forest plot of a given 
gene based on the REM results.

## Vote-counting approach

The *vote-counting* MetaVolcano identifies differential expressed genes (DEG) 
for each study based on the user-defined *p-value* and *fold change* thresholds.
It displays the number of differentially expressed and unperturbed genes per 
study. In addition, it plots the inverse cumulative distribution of the 
consistently DEG, so the user can identify the number of genes whose expression
is perturbed in at least 1 or n studies.

```{r}
meta_degs_vote <- votecount_mv(diffexp=diffexplist,
			       pcriteria='pvalue',
			       foldchangecol='Log2FC',
			       genenamecol='Symbol',
			       geneidcol=NULL,
			       pvalue=0.05,
			       foldchange=0, 
			       metathr=0.01,
			       collaps=FALSE,
			       jobname="MetaVolcano", 
			       outputfolder=".",
			       draw='HTML')

head(meta_degs_vote@metaresult, 3)
meta_degs_vote@degfreq
```

The *vote-counting* MetaVolcano visualizes genes based on the number of studies
where genes were identified as differentially expressed and the gene fold change 
sign consistency. It means that a gene that was differentially expressed in five 
studies, from which three of them it was downregulated, will get a sign 
consistency score of 2 + (-3) = -1. Based on user preference, MetaVolcano can 
highlight the top *metathr* percentage of consistently perturbed genes.


```{r}
meta_degs_vote@MetaVolcano
```

## Combining-approach 

The *combinig* MetaVolcano summarizes the fold change of a gene in different 
studies by the *mean* or *median* depending on the user preference. In addition,
the *combinig* MetaVolcano summarizes the gene differential expression 
*p-values* using the Fisher method. The *combining* MetaVolcano can highlight 
the top *metathr* percentage of consistently perturbed genes.


```{r}
meta_degs_comb <- combining_mv(diffexp=diffexplist,
			       pcriteria='pvalue', 
			       foldchangecol='Log2FC',
			       genenamecol='Symbol',
			       geneidcol=NULL,
			       metafc='Mean',
			       metathr=0.01, 
			       collaps=TRUE,
			       jobname="MetaVolcano",
			       outputfolder=".",
			       draw='HTML')

head(meta_degs_comb@metaresult, 3)
meta_degs_comb@MetaVolcano
```