%
% NOTE -- ONLY EDIT Biobase.Rnw!!!
% Biobase.tex file will get overwritten.
%
%\VignetteIndexEntry{RSNPper Primer}
%\VignetteDepends{XML}
%\VignetteKeywords{SNP} 
%\VignettePackage{RSNPper}


%
% NOTE -- ONLY EDIT THE .Rnw FILE!!!  The .tex file is
% likely to be overwritten.
%
\documentclass[12pt]{article}

%%\usepackage{amsmath}
\usepackage[authoryear,round]{natbib}
%%\usepackage{hyperref}


\textwidth=6.2in
\textheight=8.5in
%\parskip=.3cm
\oddsidemargin=.1in
\evensidemargin=.1in
\headheight=-.3in

\newcommand{\scscst}{\scriptscriptstyle}
\newcommand{\scst}{\scriptstyle}

\bibliographystyle{plainnat} 

%\setkeys{Gin}{width=0.55\textwidth}
 
\begin{document}

\title{{\it RSNPper}: utilities for SNP data}
\author{VJ Carey {\tt stvjc@channing.harvard.edu}}
\maketitle

\tableofcontents

\section{Introduction}
This document describes {\tt RSNPper} version 1.0,
added to Bioconductor in October of 2003.  This
first version focuses on SNP metadata, with functions
that
retrieve SNP-related data from the Boston Children's Hospital
Informatics Program SNPper web service \cite{Riva:2002}.

Earlier non-released versions of this package included
considerable code for working with prettybase format
and for conducting other tasks in SNP discovery projects.
That material has been moved to {\tt inst/OLD} and
may be re-introduced later.  Users seeking legacy
support should contact the author.

\section{How it works}

<<echo=FALSE>>=
library(RSNPper)
@
The core of this package is the XML-RPC service at
CHIP accessible through the following URL stub:
<<baseURL>>=
print(.SNPperBaseURL)
@
The {\tt useSNPper} function allows you to 
work directly with the XML-RPC server by packing
up appropriate command and argument strings.
<<primitiveInterface>>=
dput(useSNPper)
print(useSNPper("geneinfo","&name=CRP")[1:7])
@
The main functions of {\it RSNPper} attend to simplifying
specification of parameters and parsing and packaging the
XML results.

{\bf Note on auditability.}  All functions return textual
information coupled with auditing information as a 'toolInfo'
attribute, detailing the SNPper supplied information on
the human genome sequence build, the dbSNP version,
and the SNPper version from which the results are obtained.
At present, there is one exception: when {\tt itemsInRange}
is invoked with {\tt item='countsnps}, no toolInfo data
is obtained.  This will be corrected once the {\tt countsnps}
command at SNPper returns valid XML element tags.

\section{Overview of the functions}
The current set of functions intended for investigative
use is:
\begin{itemize}
\item {\tt geneInfo} -- general information about location and nomenclature
\item {\tt geneLayout} -- information about exon locations
\item {\tt geneSNPs} -- all SNPs associated with a given gene
\item {\tt SNPinfo} -- detailed information on a SNP
\item {\tt itemsInRange} -- supports chromosome scanning for genes,
SNPs, or counts of SNPs
\end{itemize}
An omission: for SNP information, I have not collected information
on submitter.
\section{Demonstrations}
\subsection{Obtaining information on genes}
The {\tt geneInfo} function will collect some basic
information on a gene.  The gene may be specified
by HUGO name, mRNA accession number, or SNPper id.
<<demoGeneInfo>>=
print(geneInfo("CRP"))
@
The {\tt geneLayout} function provides information on exon locations.
<<demoGeneLayout>>=
print(geneLayout("546"))
@
Information on all the genes catalogued in a certain
chromosomal region can be obtained using {\tt itemsInRange}.
<<demoGenesInRange>>=
print(itemsInRange("genes", "chr1", "156400000", "156500000"))

@
\subsection{Obtaining information on SNPs}
Suppose you want information on the SNP with dbSNP id rs25.
<<SNPinfoDemo>>=
print(SNPinfo("25"))
@
Suppose instead you want information on all the SNPs
cataloged in a certain chromosomal region.
<<SNPsInRangeDemo>>=
ird <- itemsInRange("snps", "chr1", "156400000", "156500000")
print(length(ird))
print(ird[1:3])
@
Note that the start and end locations are supplied as strings.
This is to avoid coercion to textual scientific notation.

Additional detail on the count of SNPs can be obtained
more briefly:
<<countSNPsInRangeDemo>>=
print(itemsInRange("countsnps", "chr1", "156400000", "156500000"))
@
To see all the SNPs associated with a given gene,
use the {\tt geneSNPs} function.  This requires knowledge of the
SNPper gene id, which can be obtained using {\tt geneInfo}.
<<geneSNPsDemo>>=
gs <- geneSNPs("546")
print(length(gs))
print(gs[1:3])
@
\section{Application: SNP density on chr 1}
Human chromosome 1 is approximately 300Mb, and 142,629
SNPs have been recorded as of dbSNP build 106, according to
NCBI SNP/maplists/maplist-newmap.html on 13 Sep 03.
Let's see if these facilities can recover this sort of 
information.  Counting the number of SNPs on a long chromosomal
region seems to take a long time for SNPper, so we will
break up the task.
<<snpDensity>>=
print(itemsInRange("countsnps", "chr1", "1", "100000"))
system("sleep 2")
print(itemsInRange("countsnps", "chr1", "100001", "200000"))
system("sleep 2")
print(itemsInRange("countsnps", "chr1", "200001", "300000"))
system("sleep 2")
@
These runs complete in a reasonable amount of time.
Here we will just look at the first 2Mb in intervals of .1Mb.
<<fullRun>>=
starts <- as.character(as.integer(seq(1,2000001,100000)))
ends <- as.character(as.integer(as.integer(starts)+99999))
out <- matrix(NA,nr=20,nc=3)
for (i in 1:20)
  {
  cat(i)
  out[i,] <- itemsInRange("countsnps", "chr1", starts[i], ends[i])
  system("sleep 2")
  }
print(out)
@
\end{document}