%\VignetteIndexEntry{FDR adjustments of Microarray Experiments (FDR-AME)}
%\VignetteKeywords{Expression Analysis}
%\VignetteDepends{fdrame}
%\VignettePackage{fdrame}

\documentclass{article}
\usepackage{latexsym}

\begin{document}

\title{FDR adjustments of Microarray Experiments (FDR-AME)}
\author{Yoav Benjamini, Effi Kenigsberg, Anat Reiner, Daniel Yekutieli }
\maketitle

\begin{center}
Department of Statistics and O.R., Tel Aviv University
\end{center}

\paragraph{Purpose }
This R package adjusts p-values generated in multiple hypothesis testing of 
gene expression data obtained by a microarray experiment. The software 
applies multiple testing procedures that control the False Discovery Rate 
(FDR) criterion introduced by Benjamini and Hochberg (1995). It applies both 
theoretical-distribution-based and resampling-based multiple testing 
procedures, and presents as output adjusted p-values and p-value plots, as 
described in Reiner et al (2003). It goes beyond Reiner et al (2003) in offering 
adjustments according to the adaptive two-stage FDR controlling procedures 
of Benjamini et al (2001, submitted), and in addressing differences in 
expression among more than two classes using one-way ANOVA.

\paragraph{The False Discovery Rate (FDR) Criterion}
The FDR is the expected proportion of erroneously rejected null hypotheses 
among the rejected ones. Consider a family of $m$ simultaneously tested null 
hypotheses of which $m_{0}$ are true. For each hypothesis $H_{i}$ a test 
statistic is calculated along with the corresponding p-value $P_{i}$. Let $R$ 
denote the number of hypotheses rejected by a procedure, $V$ the number of true 
null hypotheses erroneously rejected, and $S$ the number of false hypotheses 
rejected. Now let $Q$ denote $V/R$ when $R>0$ and $0$ otherwise. Then the FDR is defined as 

\[
\mathrm{FDR}=E(Q).
\]
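
To make the definition concrete, a small R simulation can estimate $E(Q)$ by Monte Carlo; the settings below ($m=1000$ hypotheses, $m_0=900$ true nulls, and the naive rule that rejects whenever $p\le 0.05$) are illustrative assumptions, not package defaults.

\begin{verbatim}
## Illustrative Monte Carlo estimate of the FDR of the naive rule
## "reject whenever p <= 0.05" (settings chosen for illustration only).
set.seed(1)
m <- 1000; m0 <- 900; nrep <- 2000
Q <- numeric(nrep)
for (b in seq_len(nrep)) {
  z <- c(rnorm(m0), rnorm(m - m0, mean = 3))  # m0 true nulls, m - m0 shifted
  p <- 2 * pnorm(-abs(z))                     # two-sided p-values
  reject <- p <= 0.05
  R <- sum(reject)
  V <- sum(reject[seq_len(m0)])               # erroneously rejected true nulls
  Q[b] <- if (R > 0) V / R else 0
}
mean(Q)                                       # Monte Carlo estimate of FDR = E(Q)
\end{verbatim}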

\textbf{The Linear Step-Up Procedure (BH)}

This procedure makes use of the ordered p-values $P_{(1)}\le \ldots \le P_{(m)}$. Denote the 
corresponding null hypotheses $H_{(1)},\ldots,H_{(m)}$. For a desired FDR level $q$, 
the ordered p-value $P_{(i)}$ is compared to the critical value $q\cdot i/m$. Let 
$k=\max\{i:P_{(i)}\le q\cdot i/m\}$. Then reject $H_{(1)},\ldots,H_{(k)}$, if such a $k$ exists.
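
As a minimal R sketch (not the package implementation), the step-up rule can be written directly from the definition; base R's \texttt{p.adjust} with \texttt{method="BH"} returns the equivalent adjusted p-values.

\begin{verbatim}
## Linear step-up (BH) at level q, written directly from the definition.
bh_reject <- function(p, q = 0.05) {
  m <- length(p)
  o <- order(p)                        # order statistics P_(1) <= ... <= P_(m)
  below <- which(p[o] <= q * seq_len(m) / m)
  k <- if (length(below) > 0) max(below) else 0
  rejected <- logical(m)
  if (k > 0) rejected[o[seq_len(k)]] <- TRUE
  rejected
}
## Equivalent via adjusted p-values:
##   all(bh_reject(p, 0.05) == (p.adjust(p, method = "BH") <= 0.05))
\end{verbatim}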

Benjamini and Hochberg (1995) show that when the test statistics are 
independent, this procedure controls the FDR at the level $q$. Actually, the 
FDR is controlled at the level $q\cdot m_{0}/m\le q$.

Benjamini and Yekutieli (2001) further show that $\mathrm{FDR}\le q\cdot m_{0}/m$ for positively 
dependent test statistics as well. The technical condition under which the 
control holds is positive regression dependency on each test 
statistic corresponding to a true null hypothesis. Reiner et al (2003) and 
Reiner (unpublished thesis) show that $\mathrm{FDR}\le q$ for two-sided tests under positive and 
negative correlations.

\textbf{The Adaptive Procedures}

Since the BH procedure controls the FDR at a level lower than the nominal $q$ by a factor of 
$m_{0}/m$, it is natural to try to estimate $m_{0}$ and use $q^\ast =q\cdot 
\frac{m}{m_0}$ instead of $q$ to gain more power. Benjamini et al (2001) 
suggest a simple two-stage procedure: use BH once, at level $q/(1+q)$, to reject $r_1$ hypotheses; 
then use BH at the second stage at level $q^\ast =q\cdot \frac{m}{(m-r_1)\cdot (1+q)}$. 
This two-stage procedure has proven FDR controlling properties under independence, 
and simulation results support its FDR control under positive dependence.
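
A short R sketch of this two-stage procedure, reusing the \texttt{bh\_reject} function from the previous sketch; the treatment of the boundary cases $r_1=0$ and $r_1=m$ is an assumption made here for completeness.

\begin{verbatim}
## Two-stage adaptive BH (sketch): stage one at level q/(1+q), then BH at
## q* = q * m / ((m - r1) * (1 + q)), where r1 is the stage-one rejection count.
two_stage_bh <- function(p, q = 0.05) {
  m      <- length(p)
  stage1 <- bh_reject(p, q / (1 + q))
  r1     <- sum(stage1)
  if (r1 == 0 || r1 == m) return(stage1)  # no gain from a second stage
  bh_reject(p, q * m / ((m - r1) * (1 + q)))
}
\end{verbatim}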

\textbf{Resampling FDR Adjustments}

For data containing high inter-correlations, multiple testing procedures 
designed for general dependency structures may be over-conservative for the 
specific dependency structure at hand. 
Resampling-based multiple testing procedures utilize the empirical 
dependency structure of the data to construct more powerful FDR controlling 
procedures. 

In p-value resampling, the data is repeatedly resampled under the complete 
null hypotheses, and a vector of resample-based p-values is computed. The 
underlying assumption is that the joint distribution of p-values 
corresponding to the true null hypotheses, which is generated through the 
p-value resampling scheme, represents the real joint distribution under the 
null hypothesis. Thus, for each value of $p$, the number of resampling-based 
p-values less than $p$, denoted by $V^\ast (p)$, is an estimated upper bound 
to the expected number of p-values corresponding to true null hypotheses 
less than $p$.
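
The R sketch below illustrates p-value resampling for a two-class comparison by permuting the class labels (an illustrative scheme with assumed inputs, not the package implementation): \texttt{x} is a genes-by-arrays matrix, \texttt{labels} the class factor, and the function estimates $E[V^\ast(p)]$ over \texttt{B} permutations.

\begin{verbatim}
## Estimate E[V*(p)] by permuting class labels (mimicking the complete null)
## and counting how many per-gene p-values fall below p in each permutation.
resample_V <- function(x, labels, p, B = 200) {
  gene_p <- function(lab) apply(x, 1, function(g) t.test(g ~ lab)$p.value)
  counts <- replicate(B, {
    p_star <- gene_p(sample(labels))   # p-values under a permuted labelling
    sum(p_star <= p)
  })
  mean(counts)
}
\end{verbatim}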

Yekutieli and Benjamini (1999) introduce resampling-based FDR control, while 
taking into account that the FDR is also a function of the number of false 
null hypotheses rejected. Therefore, for each value of $p$, they first 
conservatively estimate the number of false null hypotheses with p-values less than $p$, 
denoted by $\hat {s}(p)$, and then estimate the FDR adjustment by
\[
\mathrm{FDR}^{est}(p)=E_{V^\ast(p)}\left[\frac{V^\ast(p)}{V^\ast(p)+\hat{s}(p)}\right]
\]
Two estimation methods are suggested, differing in their strictness: the 
FDR local estimator is conservative on the mean, while the FDR upper limit 
bounds the FDR with probability 95\%.
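
A rough R sketch of the local estimator is shown below; the conservative choice $\hat{s}(p)=\max\{\#\{P_i\le p\}-m\cdot p,\,0\}$ used here is an assumption made for illustration, and the package computes its own estimates.

\begin{verbatim}
## FDR local estimator (sketch). p_obs: observed p-values; p_cut: the value p;
## V_star: vector of V*(p_cut) counts, one per resample.
fdr_local_est <- function(p_obs, p_cut, V_star) {
  m     <- length(p_obs)
  s_hat <- max(sum(p_obs <= p_cut) - m * p_cut, 0)  # assumed conservative s(p)
  ratio <- ifelse(V_star + s_hat > 0, V_star / (V_star + s_hat), 0)
  mean(ratio)                                       # Monte Carlo E[V*/(V* + s_hat)]
}
\end{verbatim}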

A third alternative uses the BH procedure to control the FDR, but rather 
than using the raw p-values, it estimates the p-values by resampling from 
the marginal distribution and collapsing over all hypotheses, assuming 
exchangeability of the marginal distributions: for the $k$-th gene, with an 
observed test statistic $t_{k}$, the estimated p-value is
\[
P^{est}_k=\frac{1}{I}\sum_{j=1}^{I}\left[\frac{1}{N}\,\#\left\{i:\left|t_i^{\ast j}\right|\geq \left|t_k\right|\right\}\right],
\]
where $N$ denotes the number of genes and $I$ the number of resamples.
We next use the estimated p-values in the BH procedure to obtain the 
BH point estimate for the $k$-th gene:
\[
P^{BH}_{(k)}=\min_{i\geq k}\frac{P^{est}_{(i)}\cdot m}{i}.
\]
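
A compact R sketch of the pooled estimate and the resulting BH point estimates; the argument names are illustrative: \texttt{t\_obs} is the vector of observed statistics and \texttt{t\_star} an $N\times I$ matrix of resampled statistics.

\begin{verbatim}
## Pooled p-value estimates under exchangeable marginals, followed by BH.
pooled_bh <- function(t_obs, t_star) {
  ## Fraction of all resampled |t*| (pooled over genes and resamples) that are
  ## at least |t_k|; this equals the double average in the formula above.
  p_est <- sapply(abs(t_obs), function(tk) mean(abs(t_star) >= tk))
  ## p.adjust() implements min over i >= k of m * P_(i) / i (capped at 1).
  p_bh <- p.adjust(p_est, method = "BH")
  data.frame(p_est = p_est, p_bh = p_bh)
}
\end{verbatim}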
\paragraph{Plots of p-values}
\label{para:plots}
In addition to writing the significant genes to a file, the program offers 
plots of p-values. The plot of p-values versus rank for all genes is a 
diagnostic plot that allows researchers to examine the adequacy of the 
preprocessing stage as well as of the assumptions on which the distribution 
of the test statistics is based. The plot of the adjusted p-values versus 
rank (or versus estimated difference) allows researchers to pick their 
desired FDR level by simply comparing the adjusted p-values to the 
desired level, and then to view the consequence in terms of the pool of genes 
thereby identified as significant. Each FDR controlling method results in 
its corresponding set of adjusted p-values.
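
The sketch below shows how such diagnostic plots can be produced with base R graphics; the inputs \texttt{p} and \texttt{p\_adj}, a vector of raw p-values and its FDR-adjusted counterpart, are assumed.

\begin{verbatim}
## Raw p-values vs. rank (with a uniform reference line) and adjusted
## p-values vs. rank (with the desired FDR level q marked).
plot_pvalues <- function(p, p_adj, q = 0.05) {
  r  <- rank(p, ties.method = "first")
  op <- par(mfrow = c(1, 2)); on.exit(par(op))
  plot(r, p, xlab = "rank", ylab = "raw p-value",
       main = "p-values vs. rank")
  abline(0, 1 / length(p), lty = 2)   # expected line under the complete null
  plot(r, p_adj, xlab = "rank", ylab = "adjusted p-value",
       main = "adjusted p-values vs. rank")
  abline(h = q, lty = 2)              # chosen FDR level
}
\end{verbatim}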

\paragraph{References}
\begin{enumerate}
\item Benjamini,Y. and Hochberg,Y. (1995) Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. J. Roy. Stat. Soc. B., \textbf{57}, 289-300.
\item Benjamini,Y., Krieger,A. and Yekutieli,D. (2001) Two-Staged Linear Step-Up FDR Controlling Procedure, Department of Statistics and Operations Research, Tel-Aviv University, and Department of Statistics, Wharton School, University of Pennsylvania, Technical Report. (Submitted)
\item Benjamini,Y. and Yekutieli,D. (2001) The Control of the False Discovery Rate Under Dependency. Ann Stat, \textbf{29}, 1165-1188.
\item Reiner,A., Yekutieli,D. and Benjamini,Y. (2003) Identifying Differentially Expressed Genes Using False Discovery Rate Controlling Procedures. Bioinformatics, 19(3), 368-375.
\item Yekutieli,D. and Benjamini,Y. (1999) Resampling-Based False Discovery Rate Controlling Multiple Test Procedures for Correlated Test Statistics. J Stat Plan Infer, \textbf{82}, 171-196.
\end{enumerate}

\end{document}