%\VignetteIndexEntry{dsQTL, data excerpt from Degner et al. 2012 Nature letter}
%\VignetteDepends{dsQTL, Biobase, SummarizedExperiment, GGBase, GGtools, rtracklayer}
%\VignetteKeywords{genetics, Expression Analysis}
%\VignettePackage{dsQTL}


%
% NOTE -- ONLY EDIT THE .Rnw FILE!!!  The .tex file is
% likely to be overwritten.
%
\documentclass[12pt]{article}

\usepackage{amsmath,pstricks}
\usepackage{auto-pst-pdf}
\usepackage[authoryear,round]{natbib}
\usepackage{hyperref}


\textwidth=6.2in
\textheight=8.5in
%\parskip=.3cm
\oddsidemargin=.1in
\evensidemargin=.1in
\headheight=-.3in

\newcommand{\scscst}{\scriptscriptstyle}
\newcommand{\scst}{\scriptstyle}


\newcommand{\Rfunction}[1]{{\texttt{#1}}}
\newcommand{\Robject}[1]{{\texttt{#1}}}
\newcommand{\Rpackage}[1]{{\textit{#1}}}
\newcommand{\Rmethod}[1]{{\texttt{#1}}}
\newcommand{\Rfunarg}[1]{{\texttt{#1}}}
\newcommand{\Rclass}[1]{{\textit{#1}}}

\textwidth=6.2in

\bibliographystyle{plainnat} 
 
\begin{document}
%\setkeys{Gin}{width=0.55\textwidth}

\title{dsQTL: exploring DNA-variants associated with DNaseI hypersensitivity}
\author{VJ Carey}
\maketitle

\section{Introduction}

Degner et al. (2012) publish information on associations
between DNA variants (SNP, SNV, and indels) and DNaseI hypersensitivity
measures acquired via DNase-Seq.

This package includes information from the Chicago group on
normalized DNase-seq data for chromosomes 2 and 17, and genotype data
from chromosome 2 only.

\section{The basic data structure}



<<lkd>>=
library(dsQTL)
data(DSQ_17)
metadata(DSQ_17)
metadata(DSQ_17)[[1]]
@

We use summarized experiment structure for the assay data, but the imputed
genotype data are kept separate, in the package, in the inst/parts folder.

The data structure on chr2, which will be used to reproduce some findings,
is more mature
<<lkd2>>=
data(DSQ_2)
names(assays(DSQ_2))
assays(DSQ_2)[[1]][1:5,1:5]
rowRanges(DSQ_2)
@

To implement the GGBase protocol for on-the-fly generation of smlSet instances
from getSS queries, we have an ExpressionSet instance with specific names.
<<lkd3>>=
data(eset, package="dsQTL")
ex
@ 


The genotype data supplied by Degner et al are imputed to 1000 genomes
haplotypes, and are reals
in [0,2].   For simplicity the current image of the data uses the
rounding of the fractional genotypes x with round(x,0).

The feature data refer to the retained 100bp segments that
were summarized for DNaseI hypersensitivity and found to lie in the
uppermost 5\% of the distribution.
<<lkf>>=
library(Biobase)
fData(ex)[1:5,,drop=FALSE]
@


We can get the integrated container as
<<domo>>=
library(GGBase)
ds2 = getSS("dsQTL", "roundGT_2")
@
the name indicates that we simply rounded the imputed fractional genotypes
to nearest integer.

A very restricted search is:
<<dose,cache=TRUE>>=
# need to get rid of SNPlocs package getSNPlocs
getSNPlocs = dsQTL::getSNPlocs  # force
library(GGtools)
#library(parallel)
#options(mc.cores=12)
n1 = best.cis.eQTLs(smpack="dsQTL", radius=2000, geneannopk="dsQTL",
  snpannopk="dsQTL", chrnames="2", smchrpref="roundGT_",
  smFilter = function(x) GTFfilter(x, lower=0.05)[23810:23830,], 
#  geneApply=mclapply)
  geneApply=lapply)
<<lkpr>>=
n1
@

<<lkhi,fig=TRUE>>=
plot_EvG(probeId("dhs_2_45370802"), rsid("chr2.45370846"), getSS("dsQTL", "roundGT_2",
  wrapperEndo=function(x){annotation(x)="dsQTL"; x}))
@

\section{Provenance}

\subsection{Normed DNaseI hypersensitivity scores}
The dsQTL package data structures \verb+DSQ_2+ and \verb+DSQ_17+
are generated from GEO GSE31388, from which a collection of 70 compressed
BED format files were acquired Aug 9 2011.  These are imported
using the rtracklayer package to obtain location and score information
for all the recorded DNaseI hypersensitivity assay results.  For
example, after import for NA19257, we have

\begin{verbatim}
Browse[1]> x
[1] "NA19257"
Browse[1]> tmp
RangedData with 1465907 rows and 3 value columns across 22 spaces
           space               ranges   |        name       score   strand
        <factor>            <IRanges>   | <character>   <numeric> <factor>
1           chr1       [  402,   501]   |         NOT -0.67088720        +
2           chr1       [  502,   601]   |         NOT -1.69969288        +
3           chr1       [  602,   701]   |         NOT  0.13520754        +
...          ...                  ... ...         ...         ...      ...
1465905    chr22 [49571602, 49571701]   |         NOT  0.62742318        +
1465906    chr22 [49575102, 49575201]   |         NOT -0.09417379        +
1465907    chr22 [49581602, 49581701]   |         NOT -0.29496269        +
\end{verbatim}

The scores for locations on chromosome 2 were collected using
<<showp1>>=
proc1 = function(x) {
 library(rtracklayer)
 tmp = import(paste(x, ".qnorm.bed.gz", sep=""))
 stt = split(tmp, space(tmp))
 obn = paste(x, "_dsq_chr2", sep="")
 assign(obn, stt[["chr2"]])
 save(list=obn, file=paste(obn, ".rda", sep=""))
 NULL
}
@
The regions and scores reported are described in the GEO metadata as
\begin{quotation}
We also provide BED file format data for each individual for the top 5\% of the genome in terms of total sensitivity. This data was mapped to hg18 using a custom read-mapping algorithm which we describe in detail in the associated publication. Measures of DNase sensitivity were quantile normalized within each individual to a standard normal distribution. Each individual was corrected for GC bias and the top 4 principle (sic) components were removed from the data (See manuscript).
\end{quotation}

Score data were structured as a matrix with columns corresponding to
Yoruba HapMap subject, and rows corresponding to reported
hypersensitivity regions.  

The \texttt{RangedSummarizedExperiment} container is used to unite
range and score data in the assays component, and allied metadata
are available in metadata and colData components.

\subsection{Genotype data}

Textual representation of the allelic doses is provided at
\url{http://eqtl.uchicago.edu/dsQTL_data/GENOTYPES/}.
As of Oct 2012, these were rounded to allele counts to
allow use of snpMatrix representation for chromosome 2 genotypes; 
propagation of
dosage fractions will be undertaken in late 2012.



\section{Session information}
<<lks>>=
sessionInfo()
@


\end{document}