% -*- mode: noweb; noweb-default-code-mode: R-mode; -*-
%\VignetteIndexEntry{gcrma1.2}
%\VignetteKeywords{Preprocessing, Affymetrix}
%\VignetteDepends{affy,Biostrings,tools,splines}
%\VignettePackage{gcrma}
%documentclass[12pt, a4paper]{article}
\documentclass[12pt]{article}

\usepackage{amsmath}
\usepackage{hyperref}
\usepackage[authoryear,round]{natbib}

\textwidth=6.2in
\textheight=8.5in
%\parskip=.3cm
\oddsidemargin=.1in
\evensidemargin=.1in
\headheight=-.3in

\newcommand{\scscst}{\scriptscriptstyle}
\newcommand{\scst}{\scriptstyle}
\newcommand{\Rfunction}[1]{{\texttt{#1}}}
\newcommand{\Robject}[1]{{\texttt{#1}}}
\newcommand{\Rpackage}[1]{{\textit{#1}}}
\newcommand{\Rfunarg}[1]{{\textit{#1}}}


\author{Zhijin(Jean) Wu, Rafael Irizarry}
\begin{document}
\title{Description of gcrma package}

\maketitle
\tableofcontents
\section{Introduction}
The \Rpackage{gcrma} package is part of the Bioconductor\footnote{\url{http://www.bioconductor.org/}} project. 

\Rpackage{gcrma} adjusts for background intensities in Affymetrix
array data which include optical noise and non-specific binding
(NSB). The main function \Rfunction{gcrma} converts background
adjusted probe intensities to expression measures using the same
normalization  and summarization methods as \Rfunction{rma}  (Robust
Multiarray Average).

\Rpackage{gcrma} uses probe sequence information to estimate probe
affinity to non-specific binding (NSB). The sequence information is
summarized in a more complex way than the simple GC content. Instead,
the base types (A,T,G or C) at each position (1-25) along the probe 
determine the {\it affinity} of each probe. The parameters of the 
position-specific base contributions to the probe affinity is estimated 
in an NSB experiment in which only
NSB but no gene-specific bidning is expected. In version 2.0.0 we give
options to the users to obtain these parameters from their choice.

With the probe affinities available, we estimate the relationship
between the amount of NSB and the probe sequences. Specifically, we 
estimate the function $$NSB=h(affinity)$$ by fitting a loess curve
through $$\mbox{MM probe intensities} \sim \mbox{MM probe affinities}.$$
In version 2.0.0 we also allow the use of any list of
 negative control(NC) probes instead of MM.

The background adjusted intensity is computed as the posterior mean of
specific binding given the observed intensities and the probe sequences.
This is done in function \Rfunction{bg.adjust.gcrma}. The background
adjusted data is then converted to expression measures with function
\Rfunction{rma} with the option \Rfunarg{background=FALSE} to avoid
another round of background correction.

The following terms are used throughout this document:
\begin{description}
\item[probe] oligonucleotides of 25 base pair length used to
probe RNA targets.
\item[perfect match] probes intended to match perfectly the
target sequence.
\item[$PM$] intensity value read from the perfect matches.
\item[mismatch] the probes having one base mismatch
  with the target sequence intended to account for non-specific binding.
\item[$MM$] intensity value read from the mis-matches.
\item[probe pair] a unit composed of a  perfect
match and its mismatch.
\item[affyID] an identification for a probe set (which can be a gene
or a fraction of a gene) represented on the array.
\item[probe pair set] $PM$s and $MM$s related to a common {\it affyID}.
\item[{\it CEL} files] contain measured intensities and locations for
an array that has been hybridized.
\item[{\it CDF} file] contain the information relating probe pair sets
to locations on the array.
\item[{\it probe} file] contain the information relating probe sequences
to locations on the array.

\end{description}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{What's new in version 2.0.0}
We added a function \Rfunction{bg.adjust.gcrma} for the convenience  of performing
gcrma background adjustment only, without summarizing  into gene level
data. 


In earlier versions, we compute the sequence-determined probe
affinities using parameters estimated from data obtained in our
non-specific binding (NSB)
experiment. In version 2.0.0 we give various options for the users to
choose their own sources of such data. Users can choose  to 
\begin{enumerate}
\item compute probe affinities based on their own non-specific biniding
experiment (typically all probes in  such experiments will contain
non-specific bnidng, but the  users can choose their list of probes)
\item compute probe affinties using  each experimental array, with the
user-defined negative control (NC) probes (MMs will be used if NC not specified).
\end{enumerate}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Options for \Rfunction{gcrma}}
Here we explain the major options in gcrma. Examples will be given in
more details in the next section.

\begin{enumerate}
\item \Robject{affinity.info}

This is in general an AffyBatch object but you can leave it as {\it NULL}
and let \Rfunction{gcrma} compute it automatically. Instead of storing probe
intensities,  it contains probe affinities.

When affinity.info is set to {\it NULL} (default), it will be computed 
within the function gcrma.
To compute the affinity of each probe, we first obtain base-postion
profiles (the contribution of each base type at each postion along the probe)
from non-specific binding data. The user can choose to use the developers reference 
data or use each experimetnal array with indexes of negative
control (NC) probes. If the NC probe index is not provided, MM probes
will be used as NC probes.

Some users express the concern that their array type may behave
differently from the human hgu95 array, which was used in the
developers' non-specific binding experiment. The user can choose to
run an independent non-specific binding experiment for her/his own 
research. The affinity.info can be obtained separately by 
\begin{verbatim}
 my.affinity.info <- compute.affinities.local(myNsbData)
\end{verbatim}

and this should be passed to function gcrma
 \begin{verbatim}
 est<-gcrma(myExprData,affinity.info=my.affinity.info)
\end{verbatim}

\item \Rfunarg{type}

  The options for \Rfunarg{type} are
\begin{itemize}
\item {\it fullmodel}: uses both probe sequence information and
  observed MM probe intensities
\item {\it affinities}: uses probe sequence information and ignores MMs
\item {\it MM}: uses MM probe intensities and ignores sequence information
\end{itemize}

\item \Rfunarg{fast}

  When \Rfunarg{fast} is set to {\it TRUE}, an ad hoc procedure is used to speed up
  the non-specific binding correction. The default in previous version
  has been \Rfunarg{fast=TRUE}, but is changed to \Rfunarg{fast=FALSE} in version 2.0.0.

\end{enumerate}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Getting started: the simplest example}
\label{sec:get.started}
You will need the following libraries:
\Rpackage{affy}, \Rpackage{MASS}, cdf and probe packages for your chip
type(such as \Rpackage{hgu95av2cdf} and \Rpackage{hgu95av2probe}).

The first thing you need to do is {\bf load the package}.
\begin{Sinput}
R> library(gcrma) ##load the gcrma package
\end{Sinput}
%%<<echo=F,results=hide>>=
%%library(gcrma)
%%@
If all you want is to go from probe level data ({\it Cel} files) to
expression measures here are some quick ways.

The quickest way of reading in data and getting expression measures
is the following:
\begin{enumerate}
\item Create a directory, move all the relevant
{\it CEL} files to that directory
\item Start R in that directory.
\item If using the Rgui for 
Microsoft Windows make sure your working directory contains the {\it
Cel} files (use ``File -> Change Dir'' menu item).
\item Load the library.
\begin{Sinput}
R> library(gcrma) ##load the gcrma package
\end{Sinput}
\item Read in the data and create an expression, using RMA for example.
\begin{verbatim}
R> Data <-  ReadAffy() ##read data in working directory
R> eset <- gcrma(Data)
\end{verbatim}
\end{enumerate}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{More Examples with options}

  For illustration we will use the Dilution data.

\begin{Sinput}
R> require(affydata)
R> data(Dilution)
\end{Sinput}
%%<<echo=F,results=hide>>=
%%require(affydata)
%%data(Dilution)
%%@

\begin{enumerate}
\item Obtain expression values
  For example, 
\begin{Sinput}
  R> Dil.expr1<-gcrma(Dilution)
\end{Sinput}
%%<<echo=F,results=hide>>=
%%Dil.expr1<-gcrma(Dilution)
%%@ 

 To use the faster ad hoc procedure one can call
\begin{Sinput}
  R> Dil.expr2<-gcrma(Dilution,fast=TRUE)
\end{Sinput}
%%<<echo=F,results=hide>>=
%%Dil.expr2<-gcrma(Dilution,fast=TRUE)
%%@ 

Suppose the user has his/her own NSB experiment and wants to compute
affinity.info with that experiment.
\begin{Sinput}
R> myNsbData <- ReadAffy(``mynsb.cel'')
R> my.affinity.info <- compute.affinities.local(myNsbData)
R> Dil.expr3 <- gcrma(myExprData,affinity.info=my.affinity.info)
\end{Sinput}
%% <<echo=F,results=hide>>=
%% myNsbData<-ReadAffy("mynsb.CEL")
%% my.affinity.info <- compute.affinities.local(myNsbData)
%% Dil.expr3<-gcrma(Dilution,affinity.info=my.affinity.info)
%% @
 
Suppose the user would like to use NC probes in the experimental data to estimate
probe affinities. Here we use MM probes as example for NC probes.
\begin{Sinput}
R> mmIndex <- unlist(indexProbes(Dilution,"mm"))
R> Dil.expr4 <- gcrma(Dilution,affinity.source="local",NCprobe=mmIndex)
\end{Sinput}
%%<<echo=F,results=hide>>=
%%mmIndex <- unlist(indexProbes(Dilution,"mm"))
%%Dil.expr4 <- gcrma(Dilution,affinity.source="local",NCprobe=mmIndex)
%%@
 
Since the MM probes are default setting when NCprobe is not provided,
the  above gives identical result as
\begin{Sinput}
R> Dil.expr5 <- gcrma(Dilution,affinity.source="local")
\end{Sinput}
%%<<echo=F,results=hide>>=
%%Dil.expr5 <- gcrma(Dilution,affinity.source="local")
%%@ 

\item Background adjustment only
The function \Rfunction{bg.adjust.gcrma} allows one to perform background
adjustment only. 
\begin{Sinput}
R> Dil.bgadj <- bg.adjust.gcrma(Dilution)
R> Dil.expr6 <- rma(Dil.bgadj,background=FALSE)
\end{Sinput}
%%<<echo=F,results=hide>>=
%%Dil.bgadj <- bg.adjust.gcrma(Dilution)
%%Dil.expr6 <- rma(Dil.bgadj,background=FALSE)
%%@
 
\Rfunction{gcrma} also tries to adjust for specific binding using probe
sequence. The user can turn off this feature by specifying \Rfunarg{GSB.adjust=FALSE}.
\begin{Sinput}
R> Dil.bgadj <- bg.adjust.gcrma(Dilution,GSB.adjust=FALSE)
\end{Sinput}
%%<<echo=F,results=hide>>=
%%Dil.bgadj <- bg.adjust.gcrma(Dilution,GSB.adjust=FALSE)
%%@
 \end{enumerate}

\section{Efficient use of gcrma}

Most users deal with one or a few types of GeneChip arrays for
repeatedly. To use gcrma efficiently, one can compute the
\Robject{affinity.info} and save it, thus save the time to compute 
\Robject{affinity.info} every time \Rfunction{gcrma} (or 
\Rfunction{bg.adjust.gcrma} is called.

For example, the Dilution data is from hgu95av2 chips.
We can compute affinity.info of chip type "hgu95av2"
using the NSB data provided in \Rpackage{gcrma} and save it in a file.

\begin{Sinput}
R> affinity.info.hgu95av2 <- compute.affinities("hgu95av2")
R> save(affinity.info.hgu95av2,file = "affinity.hgu95av2.RData")
\end{Sinput}
or
\begin{Sinput}
R> affinity.info.hgu95av2 <- compute.affinities(cdfName(Dilution))
R> save(affinity.info.hgu95av2,file = "affinity.hgu95av2.RData")
\end{Sinput}

Now when you need to call \Rfunction{gcrma} for the same type of array, there is
no need to compute \Robject{affinity.info} again:
\begin{Sinput}
R> library(gcrma)
R> data(Dilution)
R> load("affinity.hgu95av2.RData")
R> Dil.expr7 <- gcrma(Dilution,affinity.info=affinity.info.hgu95av2)
\end{Sinput}


\newpage



%\bibliographystyle{plainnat}
%\bibliography{affy}

\end{document}