%\VignetteIndexEntry{CGHcall}
%\VignetteDepends{}
%\VignetteKeywords{Calling aberrations for array CGH tumor profiles.}
%\VignettePackage{CGHcall}

\documentclass[11pt]{article}

\usepackage{amsmath}
\usepackage[authoryear,round]{natbib}
\usepackage{hyperref}
\SweaveOpts{echo=FALSE}

\begin{document}

\setkeys{Gin}{width=0.99\textwidth}

\title{\bf CGHcall: Calling aberrations for array CGH tumor profiles.}

\author{Sjoerd Vosse and Mark van de Wiel}

\maketitle

\begin{center}
Department of Epidemiology \& Biostatistics\\
VU University Medical Center
\end{center}

\begin{center}

{\tt mark.vdwiel@vumc.nl}
\end{center}


\tableofcontents

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Overview}

CGHcall allows users to make an objective and effective classification of their aCGH data into copy number 
states (loss, normal, gain or amplification). This document provides an overview on the usage of the CGHcall package. 
For more detailed information on the algorithm and assumptions we refer to the article \citep{CGHcall} and its 
supplementary material. As example data we attached the first five samples of the Wilting dataset \citep{Wilting}. 
After filtering and selecting only the autosomal 4709 datapoints remained.

\section{Example}

In this section we will use CGHcall to call and visualize the aberrations in the dataset described above. 
First, we load the package and the data:

<<echo=TRUE,print=FALSE>>=
library(CGHcall)
data(Wilting)
Wilting <- make_cghRaw(Wilting)
@

\noindent
Next, we apply the {\tt preprocess} function which:
\begin{itemize}
\item removes data with unknown or invalid position information.
\item shrinks the data to {\tt nchrom} chromosomes.
\item removes data with more than {\tt maxmiss} \% missing values.
\item imputes missing values using {\tt impute.knn} from the package {\tt impute} \citep{Impute}.
\end{itemize}

<<echo=TRUE,print=FALSE>>=
cghdata <- preprocess(Wilting, maxmiss=30, nchrom=22)
@

To be able to compare profiles they need to be normalized. In this package we first provide very basic global median or 
mode normalization. This function also contains smoothing 
of outliers as implemented in the DNAcopy package \citep{DNAcopy}. Furthermore, when the proportion of tumor cells is 
not 100\% the ratios can be corrected. See the article and the supplementary material for more information on 
cellularity correction \citep{CGHcall}.

<<echo=TRUE,print=FALSE>>=

norm.cghdata <- normalize(cghdata, method="median", smoothOutliers=TRUE)
@

The next step is segmentation of the data. This package only provides a wrapper 
function that applies the {\tt DNAcopy} algorithm \citep{DNAcopy}. It provides extra functionality
by allowing to undo splits differently for long and short segments, respectively. In the example below
short segments are smaller than clen=10 probes, and for such segments 
undo.splits is effective when segments are less than undo.SD=3 (sd) apart. For long segments a less stringent criterion holds:
undo when less than undo.SD/relSDlong = 3/5 (sd) apart. If, for two consecutive segements, one is short and one is long, 
splits are undone in the same way as for two consecutive short segments.
To save time we will limit our analysis to the first two samples from here on.

<<echo=TRUE,print=FALSE>>=
norm.cghdata <- norm.cghdata[,1:2]
seg.cghdata <- segmentData(norm.cghdata, method="DNAcopy",undo.splits="sdundo",undo.SD=3,
clen=10, relSDlong=5)
@

Post-segmentation normalization allows to better set the zero level after segmentation. 

<<echo=TRUE,print=FALSE>>=
postseg.cghdata <- postsegnormalize(seg.cghdata)
@

Now that the data have been normalized and segments have been defined, we need to determine which 
segments should be classified as double losses, losses, normal, gains or amplifications. 
Cellularity correction is now provided WITHIN the calling step (as opposed to some earlier of CGHcall) 


<<echo=TRUE,print=FALSE>>=
tumor.prop <- c(0.75, 0.9)
result <- CGHcall(postseg.cghdata,nclass=5,cellularity=tumor.prop)
@

The result of CGHcall needs to be converted to a call object. This can be a large object for large arrays. 

<<echo=TRUE,print=FALSE>>=
result <- ExpandCGHcall(result,postseg.cghdata)
@

\pagebreak
\noindent
To visualize the results per profile we use the {\tt plotProfile} function:

\begin{center}
<<fig=TRUE,echo=TRUE>>=
plot(result[,1])
@
\end{center}

\pagebreak
\begin{center}
<<fig=TRUE,echo=TRUE>>=
plot(result[,2])
@
\end{center}

\pagebreak
\noindent
Alternatively, we can create a summary plot of all the samples:

\begin{center}
<<fig=TRUE,echo=TRUE>>=
summaryPlot(result)
@
\end{center}

\pagebreak
\noindent
Or a frequency plot::

\begin{center}
<<fig=TRUE,echo=TRUE>>=
frequencyPlotCalls(result)
@
\end{center}

\pagebreak
%\newpage
\bibliographystyle{apalike}
\bibliography{CGHcall}

\end{document}