%
% NOTE -- ONLY EDIT THE .Rnw FILE!!!  The .tex file is
% likely to be overwritten.
%
\documentclass[12pt]{article}

\usepackage{amsmath,pstricks}
\usepackage[authoryear,round]{natbib}
\usepackage{hyperref}


\textwidth=6.2in
\textheight=8.5in
%\parskip=.3cm
\oddsidemargin=.1in
\evensidemargin=.1in
\headheight=-.3in

\newcommand{\scscst}{\scriptscriptstyle}
\newcommand{\scst}{\scriptstyle}


\newcommand{\Rfunction}[1]{{\texttt{#1}}}
\newcommand{\Robject}[1]{{\texttt{#1}}}
\newcommand{\Rpackage}[1]{{\textit{#1}}}
\newcommand{\Rmethod}[1]{{\texttt{#1}}}
\newcommand{\Rfunarg}[1]{{\texttt{#1}}}
\newcommand{\Rclass}[1]{{\textit{#1}}}

\textwidth=6.2in

\bibliographystyle{plainnat} 
 
\begin{document}
%\setkeys{Gin}{width=0.55\textwidth}

\title{Reproducibility of Dressman 2007}
\author{VJ Carey}
\maketitle

The dispute between Baggerly and Coombes (BC) and Dressman, Potti and Nevins (DPN) in
J Clin Oncology, 2008; 26(7):1186-1187, is of broad interest.  Because general 
reproducibility of genome-scale data analysis has been questioned at high levels (cite
Ioannidis and others), specific debates on difficulties of reproduction of analytic
findings should be examined carefully.  Study of these debates can
reveal patterns of analysis and interpretation
that should be avoided to reduce risks of conflicts over reproducibility,
or promoted to increase ease with which new findings can be prioritized for application and
extension.

The response of DPN to BC is harsh and dismissive.  DPN write that
``[t]o `reproduce' means to repeat, following the methods
outlined in an original report'', and then argue that BC's claims
are erroneous and inconsistent.  The debate is difficult to follow
in detail, in part because much of the work of BC is devoted to
correcting clerical errors in the archive for the original paper,
in part because the primary documentation of BC's findings is spread
over 7 substantial supplementary documents and hundreds of supporting
data files, and in part because BC's
investigations of non-reproducibility are interlarded with investigations of
confounding.  Confounding is a common problem for statistical analysis
of observational data, and epidemiologists frequently hedge their conclusions
with admissions that unmeasured and uncorrected confounding could be
responsible for some component of observed associations.
BC's primary claims are a) good faith attempts to reproduce findings
such as Dressman et al.'s Figures 2B and 2C do not succeed on the basis
of the published data, even after admitted clerical errors are corrected,
and b) any findings associating pathway activation with survival that
can be teased out of the public archive become nonsignificant when confounding
adjustments are made.

The response by DPN has three main components.  First, they admit various
clerical errors but assert that these errors are connected only with the
creation of the public archive and no not affect the paper's inferences.
Second, they argue that post-RMA processing using sparse factor regression
would eliminate any artifacts due to batch and thus eliminate confounding.
Third, they argue that BC did not use the same methods of analysis as
the original paper and therefore cannot produce any judgment on the
reproducibility of Dressman's analyses.  

The second and third components
of response are of serious concern.  The claim that sparse factor regression
can \textit{completely} eliminate batch-related artifacts seems unsupportable,
although DPN would be completely justified to argue that they did something
reasonable -- possibly something very effective -- 
to reduce effects of array batch in their analysis.  Residual confounding
could still be present and should be acknowledged.  Additional analyses
``adjusting for batch'' would be called for.  This makes the third
component of DPN's response extremely disconcerting.  The fact is that
BC did use the ``same methods of analysis'' as Dressman et al. -- BC implicitly used the
sparse factor regression adjustments to RMA when using ``Dressman's quantifications''
in their supplementary analyses.  Using these quantifications, BC are able to
show that considerable numbers of genes have distributions that vary
significantly across array batches, despite the corrections introduced by
sparse factor regression.  There are passages in BC's analyses, for example
p 16 of ovca6.pdf, that suggest that sparse factor regression does tend
to diminish batch effects (XLS patterns show less clumping by batch than
CEL patterns), supporting DPN's hopes for reduced confounding.  However,
the confounding persists.  The only analysts who tested for impact of
batch on the pathway activation:survival association using Dressman's data
were BC.  

In the course of preparing an invited chapter on reproducible research discipline
in a forthcoming volume on cancer bioinformatics, I decided to try to get to the bottom
of the BC-DPN conflict in as independent and concise a manner as possible.  The key
targets of my investigation were Figures 2B and 2C of the Dressman paper.  The necessary
ingredients are 1) transcript profiles and survival times for patients analyzed
in those figures and 2) coefficients for the tumor scoring procedure that determines
activity of Src and E2F3 for each sample.

The archive at \url{http://data.cgt.duke.edu/platinum.php} includes the 'corrected
RMA' transcript profiles, clinical data including survival time and authors' determination
of clinical responsiveness or nonresponsiveness to platinum therapy, and a large collection
of raw CEL files related to the study.  I assume that `corrected RMA' refers to application
of 'sparse factor regression' to remove artifacts such as batch effects from the RMA
transformations.  Coefficients for the tumor scoring procedure are not available,
to the best of my knowledge, but the cell-line transcript profiles of Bild et al. can
be obtained from GSE3151 and the singular value decomposition can be used to obtain
coefficients facilitating sample scoring.

Figure 1(a) shows the association between Src dysregulation and survival
among non-responders using Dressman et al.'s 'corrected RMA' with the
given sample identifiers.  Figure 1(b) shows the same association
after sample identifiers are redefined using the method of BC to
map Dressman's quantifications to the original CEL files which
are assumed to be accurately labeled to correspond to
records in the clinical data.  Figure 1(b) is a closer approximation
to Dressman's original Figure 2B, and shows that BC are correct when they
say that the
labels in the 'corrected RMA' archive need to be revised.
The mild discrepancies between Figure 1(b) and Dressman's original 2B might
be explained through differences in tumor scoring coefficients, or
through differences in exact numbers of patients/events available  -- only
116 of 119 transcript profiles could be reliably mapped to CEL files
for relabeling.  In any case, Figure 1(b) suggests that we can use
the public archive, with some adjustments, to technically
reproduce original findings to a reasonable approximation.
 
Figures 1(c) and 1(d) are more troubling.  Figure 1(c) reproduces
the methods for Figure 1(b) in application to E2F3 activation.  Among
patients labeled as nonresponders by Dressman in the 
public archive, there is no association between pathway activation and
survival.  Figure 1(d) shows that there is an association, but among
the responders.  These findings are congruent with those of BC, who
explored more variations on the data sources and could not recover
the finding of the original Figure 2C.

We now have two basic problems.  First, ignoring concerns about confounding,
and using only (and all) quantifications provided by Dressman, and thereby
'following their methods', we cannot reproduce Figure 2C.  Second, the 
significance of the association
indicated in my Figure 1(b) is lost when a simple allowance for batch
effects is made in the test for different survival distributions by
pathway activation.

<<do1,fig=TRUE, echo=FALSE>>=
psurv = function (x, digits = max(options()$digits - 4, 3), ...) 
{
    saveopt <- options(digits = digits)
    on.exit(options(saveopt))
    if (!inherits(x, "survdiff")) 
        stop("Object is not the result of survdiff")
    if (!is.null(cl <- x$call)) {
    }
    omit <- x$na.action
    if (length(omit)) {
    }
    if (length(x$n) == 1) {
        z <- sign(x$exp - x$obs) * sqrt(x$chisq)
        temp <- c(x$obs, x$exp, z, signif(1 - pchisq(x$chisq, 
            1), digits))
        names(temp) <- c("Observed", "Expected", "Z", "p")
        print(temp)
    }
    else {
        if (is.matrix(x$obs)) {
            otmp <- apply(x$obs, 1, sum)
            etmp <- apply(x$exp, 1, sum)
        }
        else {
            otmp <- x$obs
            etmp <- x$exp
        }
        df <- (sum(1 * (etmp > 0))) - 1
        temp <- cbind(x$n, otmp, etmp, ((otmp - etmp)^2)/etmp, 
            ((otmp - etmp)^2)/diag(x$var))
        dimnames(temp) <- list(names(x$n), c("N", "Observed", 
            "Expected", "(O-E)^2/E", "(O-E)^2/V"))
        uu <- 1 - pchisq(x$chisq, df)
    }
    uu
}
library(dressCheck)
library(chron)
data(DrAsGiven)
data(srcWts)
giv = DrAsGiven[intersect(featureNames(DrAsGiven), names(srcWts)),]
srcWtsL = srcWts[featureNames(giv)]
sco = t(exprs(giv))%*%srcWtsL
sdys = 1*(sco>median(sco))
par(mfrow=c(2,2))
library(survival)
with(pData(DrAsGiven), plot(survfit(Surv(Survival, X0...alive...1...dead)~sdys,
 subset=response.0.NR..1.CR==0),col=c("blue", "yellow"), lwd=3,
 xlab="Months", ylab="survival (%)", main="(a)"))
with(pData(DrAsGiven), d1 <<- survdiff(Surv(Survival, X0...alive...1...dead)~sdys,
 subset=response.0.NR..1.CR==0))
text(37,.05, paste("logrank p=", round(psurv(d1),3)))
data(corrp116)
corr = corrp116[intersect(featureNames(corrp116), names(srcWts)),]
srcWtsL = srcWts[featureNames(corr)]
sco = t(exprs(corr))%*%srcWtsL
sdys = 1*(sco>median(sco))
with(pData(corrp116), plot(survfit(Surv(Survival, dead)~sdys,
 subset=CR==0),col=c("blue", "yellow"), lwd=3,
 xlab="Months", ylab="survival (%)", main="(b)"))
with(pData(corrp116), d2 <<- survdiff(Surv(Survival, dead)~sdys,
 subset=CR==0))
text(37,.05, paste("logrank p=", round(psurv(d2),3)))
data(e2f3Wts)
corr = corrp116[intersect(featureNames(corrp116), names(e2f3Wts)),]
eWtsL = e2f3Wts[featureNames(corr)]
sco = t(exprs(corr))%*%eWtsL
edys = 1*(sco>median(sco))
with(pData(corrp116), d1 <<- survdiff(Surv(Survival, dead)~edys,
 subset=CR==0))
with(pData(corrp116), plot(survfit(Surv(Survival, dead)~edys,
 subset=CR==0),col=c("blue", "yellow"), lwd=3,
 xlab="Months", ylab="survival (%)", main="(c)"))
text(37,.05, paste("logrank p=", round(psurv(d1),3)))
with(pData(corrp116), d2 <<- survdiff(Surv(Survival, dead)~edys,
 subset=CR==1))

with(pData(corrp116), plot(survfit(Surv(Survival, dead)~edys,
 subset=CR==1),col=c("blue", "yellow"), lwd=3,
 xlab="Months", ylab="survival (%)", main="(d)"))
text(57,.05, paste("logrank p=", round(psurv(d2),3)))

with(pData(corrp116), m2 <<- survreg(Surv(Survival, dead)~sdys+poly(chron(rundate),2),
 subset=CR==0))
summary(m2)

@

\end{document}