% % NOTE -- ONLY EDIT THE .Rnw FILE!!! The .tex file is % likely to be overwritten. % \documentclass[12pt]{article} \usepackage{amsmath,pstricks} \usepackage[authoryear,round]{natbib} \usepackage{hyperref} \textwidth=6.2in \textheight=8.5in %\parskip=.3cm \oddsidemargin=.1in \evensidemargin=.1in \headheight=-.3in \newcommand{\scscst}{\scriptscriptstyle} \newcommand{\scst}{\scriptstyle} \newcommand{\Rfunction}[1]{{\texttt{#1}}} \newcommand{\Robject}[1]{{\texttt{#1}}} \newcommand{\Rpackage}[1]{{\textit{#1}}} \newcommand{\Rmethod}[1]{{\texttt{#1}}} \newcommand{\Rfunarg}[1]{{\texttt{#1}}} \newcommand{\Rclass}[1]{{\textit{#1}}} \textwidth=6.2in \bibliographystyle{plainnat} \begin{document} %\setkeys{Gin}{width=0.55\textwidth} \title{Reproducibility of Dressman 2007} \author{VJ Carey} \maketitle The dispute between Baggerly and Coombes (BC) and Dressman, Potti and Nevins (DPN) in J Clin Oncology, 2008; 26(7):1186-1187, is of broad interest. Because general reproducibility of genome-scale data analysis has been questioned at high levels (cite Ioannidis and others), specific debates on difficulties of reproduction of analytic findings should be examined carefully. Study of these debates can reveal patterns of analysis and interpretation that should be avoided to reduce risks of conflicts over reproducibility, or promoted to increase ease with which new findings can be prioritized for application and extension. The response of DPN to BC is harsh and dismissive. DPN write that ``[t]o `reproduce' means to repeat, following the methods outlined in an original report'', and then argue that BC's claims are erroneous and inconsistent. The debate is difficult to follow in detail, in part because much of the work of BC is devoted to correcting clerical errors in the archive for the original paper, in part because the primary documentation of BC's findings is spread over 7 substantial supplementary documents and hundreds of supporting data files, and in part because BC's investigations of non-reproducibility are interlarded with investigations of confounding. Confounding is a common problem for statistical analysis of observational data, and epidemiologists frequently hedge their conclusions with admissions that unmeasured and uncorrected confounding could be responsible for some component of observed associations. BC's primary claims are a) good faith attempts to reproduce findings such as Dressman et al.'s Figures 2B and 2C do not succeed on the basis of the published data, even after admitted clerical errors are corrected, and b) any findings associating pathway activation with survival that can be teased out of the public archive become nonsignificant when confounding adjustments are made. The response by DPN has three main components. First, they admit various clerical errors but assert that these errors are connected only with the creation of the public archive and no not affect the paper's inferences. Second, they argue that post-RMA processing using sparse factor regression would eliminate any artifacts due to batch and thus eliminate confounding. Third, they argue that BC did not use the same methods of analysis as the original paper and therefore cannot produce any judgment on the reproducibility of Dressman's analyses. The second and third components of response are of serious concern. The claim that sparse factor regression can \textit{completely} eliminate batch-related artifacts seems unsupportable, although DPN would be completely justified to argue that they did something reasonable -- possibly something very effective -- to reduce effects of array batch in their analysis. Residual confounding could still be present and should be acknowledged. Additional analyses ``adjusting for batch'' would be called for. This makes the third component of DPN's response extremely disconcerting. The fact is that BC did use the ``same methods of analysis'' as Dressman et al. -- BC implicitly used the sparse factor regression adjustments to RMA when using ``Dressman's quantifications'' in their supplementary analyses. Using these quantifications, BC are able to show that considerable numbers of genes have distributions that vary significantly across array batches, despite the corrections introduced by sparse factor regression. There are passages in BC's analyses, for example p 16 of ovca6.pdf, that suggest that sparse factor regression does tend to diminish batch effects (XLS patterns show less clumping by batch than CEL patterns), supporting DPN's hopes for reduced confounding. However, the confounding persists. The only analysts who tested for impact of batch on the pathway activation:survival association using Dressman's data were BC. In the course of preparing an invited chapter on reproducible research discipline in a forthcoming volume on cancer bioinformatics, I decided to try to get to the bottom of the BC-DPN conflict in as independent and concise a manner as possible. The key targets of my investigation were Figures 2B and 2C of the Dressman paper. The necessary ingredients are 1) transcript profiles and survival times for patients analyzed in those figures and 2) coefficients for the tumor scoring procedure that determines activity of Src and E2F3 for each sample. The archive at \url{http://data.cgt.duke.edu/platinum.php} includes the 'corrected RMA' transcript profiles, clinical data including survival time and authors' determination of clinical responsiveness or nonresponsiveness to platinum therapy, and a large collection of raw CEL files related to the study. I assume that `corrected RMA' refers to application of 'sparse factor regression' to remove artifacts such as batch effects from the RMA transformations. Coefficients for the tumor scoring procedure are not available, to the best of my knowledge, but the cell-line transcript profiles of Bild et al. can be obtained from GSE3151 and the singular value decomposition can be used to obtain coefficients facilitating sample scoring. Figure 1(a) shows the association between Src dysregulation and survival among non-responders using Dressman et al.'s 'corrected RMA' with the given sample identifiers. Figure 1(b) shows the same association after sample identifiers are redefined using the method of BC to map Dressman's quantifications to the original CEL files which are assumed to be accurately labeled to correspond to records in the clinical data. Figure 1(b) is a closer approximation to Dressman's original Figure 2B, and shows that BC are correct when they say that the labels in the 'corrected RMA' archive need to be revised. The mild discrepancies between Figure 1(b) and Dressman's original 2B might be explained through differences in tumor scoring coefficients, or through differences in exact numbers of patients/events available -- only 116 of 119 transcript profiles could be reliably mapped to CEL files for relabeling. In any case, Figure 1(b) suggests that we can use the public archive, with some adjustments, to technically reproduce original findings to a reasonable approximation. Figures 1(c) and 1(d) are more troubling. Figure 1(c) reproduces the methods for Figure 1(b) in application to E2F3 activation. Among patients labeled as nonresponders by Dressman in the public archive, there is no association between pathway activation and survival. Figure 1(d) shows that there is an association, but among the responders. These findings are congruent with those of BC, who explored more variations on the data sources and could not recover the finding of the original Figure 2C. We now have two basic problems. First, ignoring concerns about confounding, and using only (and all) quantifications provided by Dressman, and thereby 'following their methods', we cannot reproduce Figure 2C. Second, the significance of the association indicated in my Figure 1(b) is lost when a simple allowance for batch effects is made in the test for different survival distributions by pathway activation. <<do1,fig=TRUE, echo=FALSE>>= psurv = function (x, digits = max(options()$digits - 4, 3), ...) { saveopt <- options(digits = digits) on.exit(options(saveopt)) if (!inherits(x, "survdiff")) stop("Object is not the result of survdiff") if (!is.null(cl <- x$call)) { } omit <- x$na.action if (length(omit)) { } if (length(x$n) == 1) { z <- sign(x$exp - x$obs) * sqrt(x$chisq) temp <- c(x$obs, x$exp, z, signif(1 - pchisq(x$chisq, 1), digits)) names(temp) <- c("Observed", "Expected", "Z", "p") print(temp) } else { if (is.matrix(x$obs)) { otmp <- apply(x$obs, 1, sum) etmp <- apply(x$exp, 1, sum) } else { otmp <- x$obs etmp <- x$exp } df <- (sum(1 * (etmp > 0))) - 1 temp <- cbind(x$n, otmp, etmp, ((otmp - etmp)^2)/etmp, ((otmp - etmp)^2)/diag(x$var)) dimnames(temp) <- list(names(x$n), c("N", "Observed", "Expected", "(O-E)^2/E", "(O-E)^2/V")) uu <- 1 - pchisq(x$chisq, df) } uu } library(dressCheck) library(chron) data(DrAsGiven) data(srcWts) giv = DrAsGiven[intersect(featureNames(DrAsGiven), names(srcWts)),] srcWtsL = srcWts[featureNames(giv)] sco = t(exprs(giv))%*%srcWtsL sdys = 1*(sco>median(sco)) par(mfrow=c(2,2)) library(survival) with(pData(DrAsGiven), plot(survfit(Surv(Survival, X0...alive...1...dead)~sdys, subset=response.0.NR..1.CR==0),col=c("blue", "yellow"), lwd=3, xlab="Months", ylab="survival (%)", main="(a)")) with(pData(DrAsGiven), d1 <<- survdiff(Surv(Survival, X0...alive...1...dead)~sdys, subset=response.0.NR..1.CR==0)) text(37,.05, paste("logrank p=", round(psurv(d1),3))) data(corrp116) corr = corrp116[intersect(featureNames(corrp116), names(srcWts)),] srcWtsL = srcWts[featureNames(corr)] sco = t(exprs(corr))%*%srcWtsL sdys = 1*(sco>median(sco)) with(pData(corrp116), plot(survfit(Surv(Survival, dead)~sdys, subset=CR==0),col=c("blue", "yellow"), lwd=3, xlab="Months", ylab="survival (%)", main="(b)")) with(pData(corrp116), d2 <<- survdiff(Surv(Survival, dead)~sdys, subset=CR==0)) text(37,.05, paste("logrank p=", round(psurv(d2),3))) data(e2f3Wts) corr = corrp116[intersect(featureNames(corrp116), names(e2f3Wts)),] eWtsL = e2f3Wts[featureNames(corr)] sco = t(exprs(corr))%*%eWtsL edys = 1*(sco>median(sco)) with(pData(corrp116), d1 <<- survdiff(Surv(Survival, dead)~edys, subset=CR==0)) with(pData(corrp116), plot(survfit(Surv(Survival, dead)~edys, subset=CR==0),col=c("blue", "yellow"), lwd=3, xlab="Months", ylab="survival (%)", main="(c)")) text(37,.05, paste("logrank p=", round(psurv(d1),3))) with(pData(corrp116), d2 <<- survdiff(Surv(Survival, dead)~edys, subset=CR==1)) with(pData(corrp116), plot(survfit(Surv(Survival, dead)~edys, subset=CR==1),col=c("blue", "yellow"), lwd=3, xlab="Months", ylab="survival (%)", main="(d)")) text(57,.05, paste("logrank p=", round(psurv(d2),3))) with(pData(corrp116), m2 <<- survreg(Surv(Survival, dead)~sdys+poly(chron(rundate),2), subset=CR==0)) summary(m2) @ \end{document}