%\VignetteIndexEntry{flowClean}
%\VignetteDepends{flowCore}
%\VignetteDepends{flowClean}
%\VignetteDepends{bit}
%\VignetteDepends{changepoint}
%\VignetteDepends{sfsmisc}
%\VignettePackage{flowClean}
\documentclass[12pt]{article}

<>=
options(width=70)
@

\SweaveOpts{eps=FALSE,echo=TRUE,png=TRUE,pdf=FALSE,figs.only=TRUE,keep.source=TRUE}

\usepackage{fullpage}
\usepackage{times}
\usepackage[colorlinks=TRUE,urlcolor=blue,citecolor=blue]{hyperref}

\newcommand{\Rfunction}[1]{{\texttt{#1}}}
\newcommand{\Robject}[1]{{\texttt{#1}}}
\newcommand{\Rpackage}[1]{{\textit{#1}}}
\newcommand{\Rmethod}[1]{{\texttt{#1}}}
\newcommand{\Rfunarg}[1]{{\texttt{#1}}}
\newcommand{\Rclass}[1]{{\textit{#1}}}
\newcommand{\Rcode}[1]{{\texttt{#1}}}

\newcommand{\software}[1]{\textsf{#1}}
\newcommand{\R}{\software{R}}

\title{flowClean}
\author{Christopher Fletez-Brant, Pratip Chattopadhyay}
\date{Modified: April 1, 2014. Compiled: \today}

\begin{document}
\setlength{\parskip}{0.2\baselineskip}
\setlength{\parindent}{0pt}
\setkeys{Gin}{width=\textwidth}

\maketitle

\section*{Introduction}

This package contains the flowClean method for performing quality control on
flow cytometry datasets. This method is described in \cite{flowCleanpaper}.

\begin{small}
<>=
library(flowClean)
library(flowViz)
library(grid)
library(gridExtra)
@
\end{small}

\section*{Data}

The example data is a real FCS file in which we intentionally perturbed the
fluorescent intensity (FI) of a subset of cells along the V705 channel (``).

\begin{small}
<>=
data(synPerturbed)
synPerturbed
@
\end{small}

\section*{Quality Control}

The full details are available in \cite{flowCleanpaper}. The motivating idea
for this methodology is that populations in a flow experiment should be
collected nearly uniformly with respect to time of collection. The primary
function in flowClean is \Rfunction{clean}, which tests for deviations from
uniformity of collection.
Specifically, the collection time is discretized into $l$ periods, each of
which can be considered an $N$-part composition
\begin{displaymath}
D_j = \left[P_1, P_2, \dots, P_N\right], \qquad j = 1, \dots, l,
\end{displaymath}
with each $P_i$ the frequency of a population defined as +/- with respect to
some threshold; the default threshold is the median FI of a flow parameter.
By default $l = 100$. Each $D_j$ then undergoes the centered log ratio (CLR)
transformation \cite{comppaper}:
\begin{displaymath}
CLR(D_j) = \left[\ln\frac{P_1}{g(D_j)}; \ldots ; \ln\frac{P_N}{g(D_j)}\right]
\end{displaymath}
where
\begin{displaymath}
g(D_j) = \sqrt[N]{P_1 P_2 \cdots P_N}
\end{displaymath}
is the geometric mean of the parts of $D_j$.

To avoid \Robject{-Inf} values, substitution of zeroes is performed using the
`modified Aitchison' method of \cite{zerosub}.

The $L_p$ norm of the subset $CLR(D_j) > 0$, denoted
$L_p = \|CLR(D_j)\|^+$, where $p = |CLR(D_j) > 0|$ is the number of positive
components, is then calculated for each $D_j$, and changepoint analysis is
performed on the set of all $\|CLR(D_j)\|^+$. If no changepoints are detected,
the FCS file is assumed to contain no errors. Otherwise, the means of the
periods between changepoints are compared with the mean of the longest such
period and thresholded according to a parameter $k$; empirically, $k = 1.3$
works well.

Calling \Rfunction{clean} requires only a flowFrame, the markers to be
analyzed (generally excluding the `scatter' parameters), the name to be given
to the output (directory structure can be included), and the file extension:

\begin{small}
<>=
synPerturbed.c <- clean(synPerturbed, vectMarkers=c(5:16),
                        filePrefixWithDir="sample_out", ext="fcs",
                        diagnostic=TRUE)
synPerturbed.c
@
\end{small}

The result is an FCS file identical to the input file with a new parameter,
`GoodVsBad', in which all `Good' cells are given $FI < 10000$ and all `Bad'
cells are given $FI \geq 10000$, which allows for easy programmatic gating out
of `Bad' cells from multiple FCS files.
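
As a minimal sketch of such programmatic gating, assuming a list of flowFrames
returned by \Rfunction{clean} and the `GoodVsBad' column name described above,
the `Bad' events can be dropped by subsetting each frame on its expression
matrix. The helper \Rcode{keepGood} and the list \Robject{cleaned} are
hypothetical names introduced here for illustration only; they are not part of
flowClean.

\begin{small}
\begin{verbatim}
library(flowCore)

# Hypothetical helper (not part of flowClean): keep only events whose
# GoodVsBad value is below 10000, i.e. the 'Good' events.
keepGood <- function(ff) {
  good <- exprs(ff)[, "GoodVsBad"] < 10000
  ff[good, ]
}

# 'cleaned' is an assumed list of flowFrames returned by clean();
# here it holds only the example frame from this vignette.
cleaned <- list(synPerturbed.c)
good.frames <- lapply(cleaned, keepGood)
\end{verbatim}
\end{small}

Subsetting on the raw expression matrix avoids building a gate object for each
file; a \Rfunction{rectangleGate} on `GoodVsBad' would be an equivalent
alternative.
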
The `GoodVsBad' parameter can also be used in plots like any other flow
parameter.

\begin{small}
<>=
lgcl <- estimateLogicle(synPerturbed.c,
                        unname(parameters(synPerturbed.c)$name[5:16]))
synPerturbed.cl <- transform(synPerturbed.c, lgcl)
p1 <- xyplot(`` ~ `Time`, data=synPerturbed.cl,
             abs=TRUE, smooth=FALSE, alpha=0.5, xlim=c(0, 100))
p2 <- xyplot(`GoodVsBad` ~ `Time`, data=synPerturbed.cl,
             abs=TRUE, smooth=FALSE, alpha=0.5, xlim=c(0, 100),
             ylim=c(0, 20000))
rg <- rectangleGate(filterId="gvb", list("GoodVsBad"=c(0, 9999)))
idx <- filter(synPerturbed.cl, rg)
synPerturbed.clean <- Subset(synPerturbed.cl, idx)
p3 <- xyplot(`` ~ `Time`, data=synPerturbed.clean,
             abs=TRUE, smooth=FALSE, alpha=0.5, xlim=c(0, 100))
grid.arrange(p1, p2, p3, ncol=3)
@
\end{small}

\begin{figure}
<>=
<>
@
\caption{(Left) FCS before flowClean. (Center) New `GoodVsBad' parameter.
(Right) FCS after flowClean and filtering.}
\label{fig:one}
\end{figure}

\section*{SessionInfo}

<>=
toLatex(sessionInfo())
@

\begin{thebibliography}{1}

\bibitem{flowCleanpaper}
Fletez-Brant C, Spidlen J, Brinkman R, Roederer M, Chattopadhyay P.
Quality Control of flow cytometry data through compositional data analysis.
In preparation.

\bibitem{comppaper}
Aitchison J.
A concise guide to compositional data analysis.
Compositional Data Analysis Workshop; Girona, Spain.

\bibitem{zerosub}
Fry J, Fry T, McLaren K.
Compositional data analysis and zeros in micro data.
CoPS/IMPACT Working Paper Number G-120.

\end{thebibliography}

\end{document}

% Local Variables:
% LocalWords: LocalWords flow cytometry, compositional data analysis
% LocalWords: clean, flowClean
% End: