%\VignetteIndexEntry{Supplement: multi-channel assays}
%\VignetteKeywords{Cell based assays}
%\VignettePackage{cellHTS}

\documentclass[11pt]{article}
\usepackage{amsmath}
\usepackage{color}
\definecolor{darkblue}{rgb}{0.0,0.0,0.75}
\usepackage[%
baseurl={http://www.bioconductor.org},%
pdftitle={Analysis of multi-channel cell-based screens},%
pdfauthor={Wolfgang Huber},%
pdfsubject={cellHTS},%
pdfkeywords={Bioconductor},%
pagebackref,bookmarks,colorlinks,linkcolor=darkblue,citecolor=darkblue,%
pagecolor=darkblue,raiselinks,plainpages,pdftex]{hyperref}

\newcommand{\Robject}[1]{\texttt{#1}}
\newcommand{\Rpackage}[1]{\textit{#1}}
\newcommand{\Rfunction}[1]{\textit{#1}}
\newcommand{\Rclass}[1]{\textit{#1}}

\newcommand{\myincfig}[3]{%
  \begin{figure}[tp]
    \begin{center}
      \includegraphics[width=#2]{#1}
      \caption{\label{#1}#3}
    \end{center}
  \end{figure}
}

\begin{document}

%------------------------------------------------------------
\title{Analysis of multi-channel cell-based screens}
%------------------------------------------------------------
\author{L\'igia Br\'as, Michael Boutros and Wolfgang Huber}
\maketitle
\tableofcontents

\section{Introduction}
This techical report is a supplement of the main vignette 
\textit{End-to-end analysis of cell-based screens: from raw intensity readings
to the annotated hit list} that is given as part of the \Rpackage{cellHTS} 
package. It accompanies the paper \textit{Analysis
of cell-based RNAi screens} by Michael Boutros, L\'igia Br\'as and
Wolfgang Huber~\cite{Boutros2006}. 

The report demonstrates how the 
\Rpackage{cellHTS} package can be applied to the documentation and analysis of 
multi-channel cell-based high-throughput screens (HTS), more specifically, 
dual-channel experiments. 
Such experiments are used, for example, to measure the phenotype of a
pathway-specific reporter gene against a constitutive signal that can be
used for normalization purposes. Typical examples for dual-channel experimental setups
are dual-luciferase assays, whereby both a firefly and renilla luciferase are
measured in the same well. In principle, multiplex assays can consist of many
more than two channels, such as in the case of flow-cytometry
readout or other microscopy-based high-content approaches.

We note that in this report we present a simple approach to analyse data from dual-channel
 experiments, which can be expanded to experiments with more than two reporters, taking the 
in-built normalization functions of \textit{cellHTS} as a template, and employing the 
extensive statistical modeling capabilities of the R
programming language. Moreover, such analyses should be adapted to the biological system and
to the question of interest.

This text has been produced as a reproducible
document~\cite{Gentleman2004RepRes}, containing the actual computer
instructions, given in the language R, to produce all results, including the figures and 
tables that are shown here. To reproduce the computations shown here, 
you will need an installation
of R (version 2.3 or greater) together with a recent version of the package
\Rpackage{cellHTS} and of some other add-on packages. Then, you can simply take the file
\textit{twoChannels.Rnw} in the \textit{doc} directory of the package,
open it in a text editor, run it using the R command
\Rfunction{Sweave}, and modify it according to your needs.

We start by loading the package.
%
<<setup1, results=hide>>=
library("cellHTS")
@ 
%
<<setup2, echo=FALSE, results=hide>>=
## for debugging:
options(error=recover)
@ 
%
%------------------------------------------------------------
\section{Assembling the data}
\label{sec:assemble}
%------------------------------------------------------------

Here, we consider a sample data of a dual-channel experiment performed 
with \textit{D. melanogaster} cells. The screen was conducted  
in microtiter plate format using a library of double-stranded RNAs (dsRNAs), in duplicates.
The example data set corresponds to three 384-well plates. 
The purpose of the screen is to find signaling components of a given pathway.
In the screen, one reporter (assigned to channel 1, and denoted here by $R_1$) 
monitors cell growth and viability, while the other reporter (assigned to channel 2 and 
denoted here by $R_2$) is indicative of pathway activity.


%------------------------------------------------------------
\subsection{Reading the raw intensity files}
%------------------------------------------------------------

The set of available result files and the information about them (which plate,
which replicate, which channel) is given in the \emph{plate list file}. 
The first few lines of the plate list file for this data set are shown in 
Table~\ref{tab:platelist}.
\input{twoChannels-platelist}

Using the function \Rfunction{readPlateData}, we can read the plate list file and 
all the intensity files, thereby assembling the data into a single R
object that can be used for subsequent analyses. First, we define the path for those
files:
%
<<dataPath>>=
experimentName <- "DualChannelScreen"
dataPath=system.file(experimentName, package="cellHTS") 
@ 
%
The input files are in the
\Robject{\Sexpr{experimentName}} directory of the \Rpackage{cellHTS}
package. 
%
<<readPlateData, results=hide>>=
x <- readPlateData("Platelist.txt", name=experimentName, path=dataPath)
@ 
%
<<showX>>=
x
@ 
%

%% Create the tables with the first lines of the plate list file:
%% it would have been nice to use the "xtable" package but it insists on 
%%   adding row numbers, which we don't like.
<<plateFileTable, results=hide, echo=FALSE>>=
cellHTS:::tableOutput(file.path(dataPath, "Platelist.txt"), "plate list", preName="twoChannels")
@ 

%------------------------------------------------------------
\subsection{Annotating the plate results}
%------------------------------------------------------------
\input{twoChannels-plateconfiguration} 
\input{twoChannels-screenlog} 

Next, we annotate the measured data with information on the controls, 
and flag invalid measurements using the information given in the \emph{plate
configuration file} and in the \emph{screen log file}, respectively. 
Selected lines of these files are shown in Table~\ref{tab:plateconfiguration} and
Table~\ref{tab:screenlog}. 
Morevoer, we also add the information contained in the 
\emph{screen description file}, which gives a general description of the screen.
%
<<configure the data>>=
x <- configure(x, "Plateconf.txt", "Screenlog.txt",
  "Description.txt", path=dataPath) 
@ 
%

%%
%% Create the table for plateConf and screenLog
<<plateConfscreenLogTable, results=hide, echo=FALSE>>=
cellHTS:::tableOutput(file.path(dataPath, "Plateconf.txt"), 
    "plate configuration", selRows=1:4, preName="twoChannels")
cellHTS:::tableOutput(file.path(dataPath, "Screenlog.txt"), 
    "screen log", selRows=1:2, preName="twoChannels")
@ 

In this data set, instead of using the default names \emph{pos} 
and \emph{neg} for 
positive and negative controls, respectively, 
we use the name of the gene targeted by 
the probes in the control wells: 
\emph{geneA}, \emph{geneB}, \emph{geneC} and \emph{geneD}. 
This is a more straighforward approach, 
since not all of these four controls behave as controls 
for both reporters $R_1$ and $R_2$. Moreover, the two positive 
controls have different strengths: \emph{geneC} is expected to generate 
a weaker effect than \emph{geneD}. 
Thus, it is useful to define these controls separately at the configuration 
step, in order to calculate the quality measures 
(dynamic range and $Z'$-factors) specific for each of them in the HTML 
quality reports.

%Note that this allows to have the same configuration file for both reporters.
Below, we look at the frequency of each well annotation in the example data:
%
<<>>=
table(x$plateConf$Content)
@ 


%------------------------------------------------------------
\section{Data preprocessing and summarization of replicates}
%------------------------------------------------------------ 

We can take a first look at the data by constructing the HTML quality reports 
using the \Rfunction{writeReport} function. 

As mentioned above, the controls used in the screen are reporter-specific. 
When calling \Rfunction{writeReport}, we need to specify to the function's arguments
\Robject{posControls} and \Robject{negControls} which are the positive and
negative controls for each channel: 
%
<<define controls>>=
## Define the controls for the different channels:
negControls <- vector("character", length=dim(x$xraw)[4])

# channel 1 - gene A
negControls[1] <- "(?i)^geneA$" # case-insensitive and match the empty string at the beginning and end of a line (to distinguish between "geneA" and "geneAB", for example. Although it is not a problem for the present well annotation)
 
# channel 2 - gene A and geneB
negControls[2] <- "(?i)^geneA$|^geneB$" 

posControls <- vector("character", length=dim(x$xraw)[4])
# channel 1 - no controls
# channel 2 - geneC and geneD
posControls[2] <- "(?i)^geneC$|^geneD$"
@ 
%
In the constitutive channel $R_1$, there is one negative control, 
named \emph{geneA}, and no positive controls. In the pathway-specific 
reporter $R_2$ there are two different negative controls (\emph{geneA} 
and \emph{geneB}), and two diffferent positive controls (\emph{geneC} 
and \emph{geneD}).
Each of the arguments \Robject{posControls} and \Robject{negControls} 
should be defined as a vector of regular expressions with the same 
length as the number of channels in \Robject{x\$xraw}.
These arguments will be passed to the \Rfunction{regexpr} 
function for pattern matching within 
the well annotation given in \Robject{x\$wellAnno}.

Finally, we construct the quality report pages for the raw 
data in a directory called 
\Robject{raw}, in the working directory:
%
<<writeReport1Show, eval=FALSE>>=
out <- writeReport(x, outdir="raw",posControls=posControls, negControls=negControls)
@ 
<<writeReport1Do, echo=FALSE, results=hide>>=
out <- writeReport(x, force=TRUE, outdir="raw", posControls=posControls, negControls=negControls)
@ 
%
After this function has finished, we can view the index page of
the report:
%
<<browseReport1, eval=FALSE>>=
browseURL(out)
@ 
%
In this experiment, reporter 1 ($R_1$) monitors cell viability. Thus,
wells with low intensities in $R_1$ should be masked: these cells are not
responding to a specific perturbation of the studied signaling pathway, 
but show a more unspecific cell viability phenotype.
There is no obvious choice for a threshold for the minimum intensity
$R_1$ that we consider still viable; here, we have chosen to set this
cut-off as a low quantile ($5\%$) of the overall distribution of
corrected intensity values in the $R_1$ channel of each replicate.
However, to be able to use the overall distribution of intensities in
the three plates, first we need to remove the plate-to-plate
variations.  This will be performed by applying plate median scaling
to each replicate and channel using the function
\Rfunction{normalizePlates}:
%
<<plateMedianChannels>>=
x <- normalizePlates(x, normalizationMethod="median")
@ 
%
<<set cut-off for R1, echo=TRUE, results=hide>>=
ctoff <- apply(x$xnorm[,,,1], 3, quantile, probs=0.05, na.rm=TRUE)
@ 
%

%
<<FvsRcorrected, fig=TRUE, results=hide, echo=FALSE, include=FALSE, width=6, height=3>>=
R <- getMatrix(x$xnorm, channel=1, na.rm=FALSE)
F <- getMatrix(x$xnorm, channel=2, na.rm=FALSE)

# Use the controls of R2 channel:
posC <- which(regexpr(posControls[2], as.character(x$wellAnno), perl=TRUE)>0)
negC <- which(regexpr(negControls[2], as.character(x$wellAnno), perl=TRUE)>0)

ylim <- range(F, na.rm=TRUE)
xlim <- range(R, na.rm=TRUE)

par(mfrow=c(1,ncol(F)), mai=c(1.15,1.15, 0.3,0.3))
for (r in 1:ncol(F)) {
ind <- apply(cbind(R[,r],F[,r]), 1, function(z) any(is.na(z)))
plot(R[!ind,r],F[!ind,r], col= densCols(cbind(R[!ind,r], F[!ind,r])), pch=16,cex=0.7,
     xlab="R1 (log scale)", ylab="R2 (log scale)", log="xy", 
     ylim=ylim, xlim=xlim)
abline(v=ctoff[r], col="grey", lty=2, lwd=2)
points(R[posC,r],F[posC,r], col="red", pch=20, cex=0.9)
points(R[negC,r],F[negC,r], col="green", pch=20, cex=0.9)
ind <- which(x$xnorm[,,r,1] <= ctoff[r])
points(R[ind,r],F[ind,r], col="grey", pch=20, cex=0.9)
#legend("topleft", col=c("grey", "red", "green"), legend=c("masked", "positive controls","negative controls"),  bty="n", pch=20, cex=0.9)
 }
@ 
%
\myincfig{twoChannels-FvsRcorrected}{\textwidth}{Scatterplot of the plate median 
corrected intensity values in the signal-dependent channel ($R_2$) against 
the plate median corrected intensity values in the constitutive channel 
($R_1$) for replicate 1 (left) and replicate 2 (right). Masked values are shown in grey, while positive and negative controls are shown in red and green, respectively.}
%

Figure~\ref{twoChannels-FvsRcorrected} shows the plate median corrected intensities in $R_2$ versus $R_1$ channels, together with the calculated threshold and the positive and negative controls of the pathway-inducible reporter $R_2$. The wells with intensity values below the 
calculated threshold are shown in grey and will be set to "NA" in the slot \Robject{xraw}.
Since we further want to apply an alternative normalization procedure to the data (see below), 
we will create a new \Rpackage{cellHTS} object called \Robject{y} by copying 
the contents of \Robject{x}, and then flag the low intensity $R_1$ values in \Robject{y\$xraw}. 
%The subsequent preprocessing will be applied on this array.
%
<<masking, results=hide>>=
y <- x
for(r in 1:dim(y$xnorm)[3]) {
ind <- y$xnorm[,,r,1]<=ctoff[r]
y$xraw[,,r,][ind] <- NA
}
@ 
%

In order to distinguish between changes 
in the readout caused by depletion of
specific pathway components versus changes in the overall cell number, 
the next step consists in normalizing the pathway-inducible readout ($R_2$) 
against the constitutive reporter ($R_1$).
This can be done using the \Rfunction{normalizeChannels} function provided in the 
\textit{cellHTS} package. This function can also adjust for
plate effects on the $R_2/R_1$ ratios (or log ratios) using the method specified in the 
parameter \Robject{adjustPlates}. 
For example, by setting it to \Robject{"median"}, the \Rfunction{normalizeChannels} function 
applies plate median normalization by dividing or subtracting (if data has been log 
transformed) each value by the median of values in that plate.

Below, we apply \Rfunction{normalizeChannels} to take the $\log_2$ ratio $R_2/R_1$ using
the data values in \Robject{y\$xraw}, and then use plate median normalization 
to remove plate-to-plate variations:
%
<<Ratio R2/R1>>=
y <- normalizeChannels(y, adjustPlates="median")
@ 
%
The normalized intensities are stored in the slot
\Robject{y\$xnorm}. This is an array of the same size as
\Robject{y\$xraw}, except in the last dimension (number of channels).  

As noted above, the \Rfunction{normalizeChannels} is deprecated, and currently we 
advice the use of the function \Rfunction{summarizeChannels}, 
which was designed to take the raw measurements in \Robject{xraw} and correct them 
for plate effects \emph{before} summarizing the two channels. 
Note that the step of plate correction is optional, 
because the argument \Robject{adjustPlates} can also be omitted when calling this function.
In our case, we first adjust each channel separately by performing plate median scaling 
(\Robject{adjustPlates = median}). Then, we apply the function given 
in the \Robject{fun} argument of
\Rfunction{summarizeChannels}. In this function, we define a low intensity 
threshold for the measurements in the constitutive channel $R_1$, 
which will be set to the $5\%$ quantile of the overall plate median corrected intensities 
in this channel. For the positions where the plate median corrected intentities in $R_1$ channel 
are above this threshold, we relate the two channels plate median corrected measurements 
by taking the $\log_2$ ratio of $R_2/R_1$, while the other positions are set to ``NA''.
We call this function below on \Robject{x}, which contains the original raw data. 
%
<<alternativeNormalization>>=
x = summarizeChannels(x, 
         fun = function(r1, r2, thresh=quantile(r1, probs=0.05, na.rm=TRUE)) 
           ifelse(r1>thresh, log2(r2/r1), as.numeric(NA)),
      adjustPlates="median") 
@ 
%
Considering this last preprocessing procedure, 
we proceed with the analysis using the data stored in \Robject{x}.
Below, we call the \Rfunction{summarizeReplicates} function to determine the 
$z$-score values for each replicate, and then summarize 
the replicated $z$-score values by taking the average.

%
<<summarizeReplicates>>=
x <- summarizeReplicates(x, zscore="-", summary="mean") 
@ 
%
The resulting single $z$-score value per probe is stored in the
slot \Robject{x\$score}. The left side of Figure~\ref{twoChannels-scores} shows the boxplots of 
the $z$-scores for the different types of probes, while the right side of the figure shows 
the $z$-scores for the whole screen as an image plot.
%
<<scores, fig=TRUE, include=FALSE, width=7, height=6>>=
par(mfrow=c(1,2))
ylim <- quantile(x$score, c(0.001, 0.999), na.rm=TRUE)
boxplot(x$score ~ x$wellAnno, col="lightblue", outline=FALSE, ylim=ylim)
imageScreen(x, zrange=c(-2,4))
@ 
%
\myincfig{twoChannels-scores}{0.7\textwidth}{$z$-scores for the screen. 
Left Panel: Boxplots of $z$-scores for the different types of probes. 
Right Panel: Screen-wide image plot.}
%
Now that the data have been preprocessed and scored, we call again 
\Rfunction{writeReport} and use a web browser to view the resulting report.
But first, we have to redefine the positive and negative controls for the normalized data 
\Robject{x\$xnorm}, because it now corresponds to a single channel. 
The controls for the normalized data values are the same as those of the raw data channel $R_2$.
%
<<RedefineControls>>=
## Define the controls for the normalized intensities (only one channel):
# For the single channel, the negative controls are geneA and geneB 
negControls <- "(?i)^geneA$|^geneB$" 
posControls <- "(?i)^geneC$|^geneD$"
@ 
<<report2Show, eval=FALSE>>=
out <- writeReport(x, outdir="logRatio", 
imageScreenArgs=list(zrange=c(-4,4)),
plotPlateArgs=list(xrange=c(-3,3)), 
posControls=posControls, negControls=negControls)
@
<<report2Do, results=hide, echo=FALSE>>=
out <- writeReport(x, force=TRUE, outdir="logRatio", 
imageScreenArgs=list(zrange=c(-4,4)),
plotPlateArgs=list(xrange=c(-3,3)), 
posControls=posControls, negControls=negControls)
@
%
The quality reports have been created in the folder \Robject{logRatio} 
in the working directory.

<<browse2, eval=FALSE>>=
browseURL(out)
@ 
% 
The quality reports have been created in the folder \Robject{logRatio} 
in the working directory.
%
Finally, we will save the data set to a file. 
%
<<savex>>=
save(x, file=paste(experimentName, ".rda", sep=""))
@ 
% 


%---------------------------------------------------------
\section{Session info}
%---------------------------------------------------------

This document was produced using:

<<sessionInfo, results=tex, print=TRUE>>=
toLatex(sessionInfo())
@ 

%------------------------------------------------------------
%Bibliography
%------------------------------------------------------------
\bibliography{cellhts}
\bibliographystyle{plain}

\end{document}