%\VignetteIndexEntry{AnnotationHub: A client package for retrieving data from the AnnotationHub web service}
%\VignetteDepends{AnnotationHub}

\documentclass[11pt]{article}

\usepackage{Sweave}
\usepackage[usenames,dvipsnames]{color}
\usepackage{graphics}
\usepackage{latexsym, amsmath, amssymb}
\usepackage{authblk}
\usepackage[colorlinks=true, linkcolor=Blue, urlcolor=Black,
  citecolor=Blue]{hyperref}

%% Simple macros

\newcommand{\code}[1]{{\texttt{#1}}}
\newcommand{\file}[1]{{\texttt{#1}}}

\newcommand{\software}[1]{\textsl{#1}}
\newcommand\R{\textsl{R}}
\newcommand\Bioconductor{\textsl{Bioconductor}}
\newcommand\Rpackage[1]{{\textsl{#1}\index{#1 (package)}}}
\newcommand\Biocpkg[1]{%
  {\href{http://bioconductor.org/packages/devel/bioc/html/#1.html}%
    {\textsl{#1}}}%
  \index{#1 (package)}}
\newcommand\Rpkg[1]{%
  {\href{http://cran.fhcrc.org/web/devel/#1/index.html}%
    {\textsl{#1}}}%
  \index{#1 (package)}}
\newcommand\Biocdatapkg[1]{%
  {\href{http://bioconductor.org/packages/devel/data/experiment/html/#1.html}%
    {\textsl{#1}}}%
  \index{#1 (package)}}
\newcommand\Robject[1]{{\small\texttt{#1}}}
\newcommand\Rclass[1]{{\textit{#1}\index{#1 (class)}}}
\newcommand\Rfunction[1]{{{\small\texttt{#1}}\index{#1 (function)}}}
\newcommand\Rmethod[1]{{\texttt{#1}}}
\newcommand\Rfunarg[1]{{\small\texttt{#1}}}
\newcommand\Rcode[1]{{\small\texttt{#1}}}

%% Question, Exercise, Solution
\usepackage{theorem}
\theoremstyle{break}
\newtheorem{Ext}{Exercise}
\newtheorem{Question}{Question}


\newenvironment{Exercise}{
  \renewcommand{\labelenumi}{\alph{enumi}.}\begin{Ext}%
}{\end{Ext}}
\newenvironment{Solution}{%
  \noindent\textbf{Solution:}\renewcommand{\labelenumi}{\alph{enumi}.}%
}{\bigskip}




\title{AnnotationHub: A client package for retrieving data from the
  AnnotationHub web service}
\author{Marc Carlson}

\SweaveOpts{keep.source=TRUE}
\begin{document}
\SweaveOpts{concordance=TRUE}

\maketitle

\section{\Robject{AnnotationHub} Objects}

The \Rpackage{AnnotationHub} package provides a client interface to
resources stored at the AnnotationHub web service.

<<loadPkg>>=
library(AnnotationHub)
@ 

The \Rpackage{AnnotationHub} package is straightforward to use.  The
1st thing you need to do to make use of it is to create an
\Robject{AnnotationHub} object like this:

<<makeObject>>=
ah = AnnotationHub()
@ 

Now at this point you have already done everything you need in order
to get annotations. If you know exactly what the resource you want is
called (and where it can be found), you could get it right now by just
tab completing to it using the \Rmethod{\$} operator.  

<<download1, echo=FALSE>>=
path <- "ah$goldenpath.hg19.encodeDCC.wgEncodeUwTfbs.wgEncodeUwTfbsMcf7CtcfStdPkRep1.narrowPeak_0.0.1.RData"
@ 

Lets suppose that you knowd the following is the path to your data:

\url{\Sexpr{path}}

Simply tab completing to the above path (followed by hitting enter),
as demonstrated below will actually retrieve an object and then assign
its contents to a local variable called res.

<<download2>>=
res <- ah$goldenpath.hg19.encodeDCC.wgEncodeUwTfbs.wgEncodeUwTfbsMcf7CtcfStdPkRep1.narrowPeak_0.0.1.RData
@ 


As you can see it's pretty easy to get data out using
\Robject{AnnotationHub} objects.  The rest of this vignette is mostly
about helping you to make sure you are accessing the version of
\Robject{AnnotationHub} objects that you intend to use and also about
making sure that you can filter down the huge number of objects to the
few that you are really interested in.


\section{Configuring  \Robject{AnnotationHub} objects}

When you create the \Robject{AnnotationHub} object, it will set up the
object for you with some default settings.  If you look at the object
you will see some helpful information about it.

<<show>>=
ah
@ 

By default, you can see that the \Robject{AnnotationHub} object is set
to the latest snapshotData and a snapshot version that matches the
version of Bioconductor that you are using.  You can also learn about
these data with the appropriate methods.

<<snapshot>>=
snapshotVersion(ah)
snapshotDate(ah)
@ 

If you are interested in using an older version of a snapshot, you can
list previous versions with the \Rmethod{possibleDates} like this:

<<possibleDates>>=
pd <- possibleDates(ah)
pd
@ 

And then you can set the dates like this:

<<setdate, eval=FALSE>>=
snapshotDate(ah) <- pd[1]
@ 


\section{Exploring and setting filters for  \Robject{AnnotationHub}}

If you are interested in how many annotation resources are currently
available for your \Robject{AnnotationHub} object, you can just take
the length like this:

<<length>>=
length(ah)
@ 

Similarly, there are also methods to show the resource names, or even
the full set of resource URLs for available resources.

<<listResources>>=
names <- head(names(ah),n=1)
@ 
<<namesShow1,eval=FALSE>>=
names
@ 
\url{\Sexpr{names}}

<<listResources2>>=
urls <- head(snapshotUrls(ah),n=1)
@ 
<<urlsShow1,eval=FALSE>>=
urls
@ 
\url{\Sexpr{urls}}



For humans, the number of resources available is going to be
overwhelming.  How should we cut this data set down to size?  For this
task, we introduce filters.  Every \Robject{AnnotationHub} object
contains a list of filters that can be configured to control which
resources it can return. By default this list is empty, which means
you get everything.

<<baseFilters>>=
filters(ah)
@ 

How can we learn which things are available for filtering?  For this
we have defined \Rmethod{columns} and \Rmethod{keytypes} methods, which
will list all the kinds of data that can be filtered on.

<<columnsAndKeytypes>>=
columns(ah)
keytypes(ah)
@ 

Once we know which things can be used to filter on, we can extract
values that these things can be required to match.  For this task, we
have defined a \Rmethod{key} method.

<<keys>>=
head(keys(ah, keytype="Species"))
@ 

Now we are able to construct and assign a filter to our \Robject{AnnotationHub}
object.  Lets set it up to only find resources from humans.

<<filtersSet>>=
filters(ah) <- list(Species="Homo sapiens")
@ 

And now if we look we will see that our \Robject{AnnotationHub} object
is only exposing resources from Homo sapiens.

<<listResources3>>=
length(ah)
names <- head(names(ah),n=1)
@ 
<<namesShow2,eval=FALSE>>=
names
@ 

\url{\Sexpr{names}}

<<listResources4>>=
urls <- head(snapshotUrls(ah),n=1)
@ 
<<urlsShow2,eval=FALSE>>=
urls
@ 

\url{\Sexpr{urls}}

We can also look at the \Robject{AnnotationHub} object in a broswer
using the display function. We can then  filter the \Robject{AnnotationHub} 
object for `Homo sapiens' by either using the Global search field 
on the top right corner of the page or the in-column search field for 
`Species'.

<<displayfunc,eval=FALSE>>=
d <- display(ah)
@

\begin{figure}[ht]
\centering
\includegraphics[width=\textwidth]{display.pdf}
\caption{Displaying and filtering the Annnotation Hub object in a browser}
\label{fig:dbtypes}
\end{figure}

By default 1000 entries are displayed per page, we can change this using
the filter on the top of the page or navigate through different pages
using the page scrolling feature at the bottom of the page.

We can also select the rows of interest to us and send them back to 
the R session using 'Send Rows' button ; 
this sets a filter internally which filters the 
\Robject{AnnotationHub} object.

\section{Using \Robject{AnnotationHub} to retrieve data}

So now that we have our \Robject{AnnotationHub} object configured to
expose only the data for humans how would we go about getting that
data downloaded?  As mentioned above, we can use the \Rmethod{\$}
operator and tab completion to pull down a data source of interest
like this.


<<download3,echo=FALSE>>=
path <- "ah$goldenpath.hg19.encodeDCC.wgEncodeUwTfbs.wgEncodeUwTfbsMcf7CtcfStdPkRep1.narrowPeak_0.0.1.RData"
@ 

\url{\Sexpr{path}}


Just by using tab completion like this:

<<download4>>=
res <- ah$goldenpath.hg19.encodeDCC.wgEncodeUwTfbs.wgEncodeUwTfbsMcf7CtcfStdPkRep1.narrowPeak_0.0.1.RData
@ 

And once you have done this, you can look at the object stored in res
and use it etc..  Any dependencies that you need to use this kind of
object should automatically try to load at this time.

<<download3>>=
res
@ 

Also, since you have previously downloaded this object at the start of
this vignette, the 2nd time it should pull this object from a local
cache that \Rpackage{AnnotationHub} will have created for you.  This
is a feature of \Rpackage{AnnotationHub} that is meant to provide
better performance by removing the need to pull a large amount of data
from a distant server every time.  However, this does not mean that
once you have used \Robject{AnnotationHub} to retrieve data that you
no longer need to have internet access.  This is because whenever you
create a \Robject{AnnotationHub} object, it needs to talk to the
metadata server to learn about things like the latest available
version etc.  So if you intend to access your objects on the plane you
will need to either save them to a convenient location or else take
note of where your local cache is located so that you can load them up
manually later.



\section{Session Information}

<<SessionInfo, echo=FALSE>>=
sessionInfo()
@

\end{document}