%\VignetteIndexEntry{quick views of eSet instances}
%\VignetteDepends{Biobase, ALL}
%\VignetteKeywords{Data representation, Analysis}
%\VignettePackage{Biobase}


%
% NOTE -- ONLY EDIT THE .Rnw FILE!!!  The .tex file is
% likely to be overwritten.
%
\documentclass[12pt]{article}
\usepackage{amsmath,fullpage}
\usepackage[authoryear,round]{natbib}
\usepackage{hyperref}


\textwidth=6.2in
\textheight=8.5in
%\parskip=.3cm
\oddsidemargin=.1in
\evensidemargin=.1in
\headheight=-.3in

\newcommand{\scscst}{\scriptscriptstyle}
\newcommand{\scst}{\scriptstyle}


\newcommand{\Rfunction}[1]{{\texttt{#1}}}
\newcommand{\Robject}[1]{{\texttt{#1}}}
\newcommand{\Rpackage}[1]{{\textit{#1}}}
\newcommand{\Rmethod}[1]{{\texttt{#1}}}
\newcommand{\Rfunarg}[1]{{\texttt{#1}}}
\newcommand{\Rclass}[1]{{\textit{#1}}}

\textwidth=6.2in

\bibliographystyle{plainnat} 
 
\begin{document}
%\setkeys{Gin}{width=0.55\textwidth}

\title{quick view tools for eSets}
\author{VJ Carey}

\maketitle
\tableofcontents

\section{Introduction}

In teaching a course where a large number of datasets are
introduced over a short period of time, the relationship
between data content and software infrastructure can
be hard to master.  This document introduces a number of
experimental approaches to getting rapid access to key
elements of eSet derivatives.



We will work with the ALL data for demonstration.
<<doa,echo=FALSE,results=hide>>=
if (!("Biobase" %in% search())) library(Biobase)
if (!("ALL" %in% search())) library(ALL)
if (!("ALL" %in% objects())) data(ALL)
<<getaEF,eval=FALSE>>=
library(Biobase)
library(ALL)
data(ALL)
ALL
@

\section{An alternative to the current show method}

It could be nice to tell the package from which the dataset
was loaded.

<<mkprov>>=
dataSource = function(dsn) {
 if (!is(dsn, "character")) dsn = try(deparse(substitute(dsn)))
 if (inherits(dsn, "try-error")) stop("can't parse dsn arg")
 dd = data()$results
 if (is.na(match(dsn, dd[,"Item"]))) return(NULL)
 paste("package:", dd[ dd[,"Item"] == dsn, "Package" ], sep="")
}
 

<<newmeth,echo=FALSE,results=hide>>=
setGeneric("peek", function(x,maxattr=10)standardGeneric("peek"))
setMethod("peek", c("eSet", "numeric"), function(x,maxattr=10) {
 ds = dataSource(deparse(substitute(x)))
 if (!is.null(ds)) ds = paste(" [from ", ds, "]", sep="")
  else ds = ""
 cat(deparse(substitute(x)), ds, ":\n", sep="")
 cat("Platform annotation: ", annotation(x),"\n")
 cat("primary assay results are:\n")
 print(dim(x))
 cat("sample attributes are:\n")
   vn = rownames(varMetadata(x))
   ld = substr(varMetadata(x)$labelDescription,1,50)
   dd = data.frame("labelDescription[truncated]"=ld)
   rownames(dd) = vn
 if ((ndd <- nrow(dd)) <= maxattr) show(dd)
 else {
    cat("first", maxattr, "of", ndd, "attributes:\n")
    show(dd[1:maxattr,,drop=FALSE])
    }
 cat("----------\n")
 cat("use varTable to see values/freqs of all sample attributes\n")
 cat("----------\n")
})
setMethod("peek", c("eSet", "missing"), function(x,maxattr=10) {
 ds = dataSource(deparse(substitute(x)))
 if (!is.null(ds)) ds = paste(" [from ", ds, "]", sep="")
  else ds = ""
 cat(deparse(substitute(x)), ds, ":\n", sep="")
 cat("Platform annotation: ", annotation(x),"\n")
 cat("primary assay results are:\n")
 print(dim(x))
 cat("sample attributes are:\n")
   vn = rownames(varMetadata(x))
   ld = substr(varMetadata(x)$labelDescription,1,50)
   dd = data.frame("labelDescription[truncated]"=ld)
   rownames(dd) = vn
 if ((ndd <- nrow(dd)) <= maxattr) show(dd)
 else {
    cat("first", maxattr, "of", ndd, "attributes:\n")
    show(dd[1:maxattr,,drop=FALSE])
    }
 cat("----------\n")
 cat("use varTable to see values/freqs of all sample attributes\n")
 cat("----------\n")
})
setGeneric("varTable", function(x, full=FALSE, max=Inf) standardGeneric("varTable"))
setMethod("varTable", c("eSet", "missing", "ANY"), function(x, full=FALSE, max=Inf) varTable(x, FALSE, max))
setMethod("varTable", c("eSet", "logical", "ANY"), function(x, full=FALSE, max=Inf) {
   ans = lapply( names(pData(x)), function(z)table(x[[z]]) )
   tans = lapply(ans, names)
   kp = 1:min(max,length(tans))
   if (!full) ans = sapply(tans, selectSome, 3)[kp]
   else ans = tans[kp]
   names(ans) = names(pData(x))[kp]
   ans
})
setGeneric("varNames", function(x) standardGeneric("varNames"))
setMethod("varNames", "eSet", function(x) names(pData(x)))
@

We use \texttt{peek} to get a concise view:
<<lka>>=
peek(ALL)
@


\section{Sample characterization}

Getting a handle on sample characterization requires survey
of variable names.
<<lkv>>=
varNames(ALL)
@

In addition, we need to know values taken.  This can be very
cumbersome.  We have a few parameters on how much detail
is provided.
<<lkvn>>=
varTable(ALL, max=4)
@
In the above, we are only showing 4 attributes.  By default
all attributes would be shown.  Note
that the report on range of values is truncated and is character
mode.  We can show the full range of values using the
\texttt{full} parameter.
<<lkvn>>=
varTable(ALL, full=TRUE, max=4)
@

\end{document}