%\VignetteIndexEntry{ensemblVEP}
%\VignetteDepends{GenomicRanges, VariantAnnotation, Biostrings}
%\VignetteKeywords{annotation, variants}
%\VignettePackage{ensemblVEP}
\documentclass[10pt]{article}

\usepackage{times}
\usepackage{hyperref}

\usepackage[margin=0.65in]{geometry}

\newcommand{\Rfunction}[1]{{\texttt{#1}}}
\newcommand{\Robject}[1]{{\texttt{#1}}}
\newcommand{\Rpackage}[1]{{\textsf{#1}}}
\newcommand{\Rmethod}[1]{{\texttt{#1}}}
\newcommand{\Rfunarg}[1]{{\texttt{#1}}}
\newcommand{\Rclass}[1]{{\textit{#1}}}
\newcommand{\Rcode}[1]{{\texttt{#1}}}

\newcommand{\software}[1]{\textsf{#1}}
\newcommand{\R}{\software{R}}
\newcommand{\Bioconductor}{\software{Bioconductor}}

\SweaveOpts{keep.source=TRUE}

\title{Overview of \Rpackage{ensemblVEP}}
\author{Lori Shepherd and Valerie Obenchain}

\begin{document}

\maketitle
\tableofcontents

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Introduction}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Ensembl provides the facility to predict functional consequences
of known and unknown variants using the Variant Effect Predictor
(VEP). The \Rpackage{ensemblVEP} package wraps Ensembl VEP and
returns the results as \R objects or a file on disk. To use this
package the Ensembl VEP perl script must be installed in your path.
See the package README for details.

NOTE: As of Ensembl version 88 the VEP script has been renamed from
variant\_effect\_predictor.pl to vep. The ensemblVEP package code
and documentation have been updated to reflect this change.

\noindent Downloads:
\url{http://uswest.ensembl.org/info/docs/tools/vep/index.html}
\\
\noindent Complete documentation for runtime options:
\url{http://uswest.ensembl.org/info/docs/tools/vep/script/vep_options.html}
\\
\noindent To test that Ensembl VEP is properly installed, enter the
name of the script from the command line:

{\it vep}
\\


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Results as \R{} objects}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

<<setup>>=
library(ensemblVEP)
@

The \Rfunction{ensemblVEP} function can return variant consequences from
Ensembl VEP as \R objects (\Robject{GRanges} or \Robject{VCF}) or write
them to a file. The default behavior returns a \Robject{GRanges}. Runtime
options are stored in a \Robject{VEPFlags} object and allow a great deal
of control over the content and format of the results. See the man pages
for more details.
<<man_page, eval=FALSE>>=
?ensemblVEP
?VEPFlags
@

The default runtime options can be inspected by creating a
\Robject{VEPFlags}.
<<default_VEPFlags>>=
param <- VEPFlags()
param
flags(param)
@

Of note is the 'host'. The default 'host' for queries is character() and
therefore is set to the vep default of 'ensembldb.ensembl.org'. Users in
the US may find connection and transfer speeds quicker using our East
coast mirror, 'useastdb.ensembl.org'.

<<hostChange, eval=TRUE>>=
param <- VEPFlags(flags=list(host="useastdb.ensembl.org"))
@

Using a vcf file from \Rpackage{VariantAnnotation} as input, we query
Ensembl VEP with the default runtime parameters. Consequence data are
parsed into the metadata columns of the \Robject{GRanges}. To control
the type and amount of data returned see the output options at
\url{http://uswest.ensembl.org/info/docs/tools/vep/script/vep_options.html}.

<<rtn_GRanges, results=verbatim>>=
fl <- system.file("extdata", "gl_chr1.vcf", package="VariantAnnotation")
gr <- ensemblVEP(fl, param)
head(gr, 3)
@

Next we request that a \Robject{VCF} object be returned by setting
the {\it vcf} option in the {\it flags} slot to TRUE.
<<set_vcf>>=
param <- VEPFlags(flags=list(vcf=TRUE, host="useastdb.ensembl.org"))
vep <- ensemblVEP(fl, param)
@

Success! When a \Robject{VCF} is returned, consequence data are
included as an unparsed INFO column labeled {\it CSQ}.
<<rtn_VCF>>=
info(vep)$CSQ
@

The \Rfunction{parseCSQToGRanges} function parses these data
into a \Robject{GRanges}. When the rownames of the original
VCF are provided as \Rcode{VCFRowID} a metadata column of the
same name is included in the output.
<<parseCSQToGRanges>>=
vcf <- readVcf(fl, "hg19")
csq <- parseCSQToGRanges(vep, VCFRowID=rownames(vcf))
head(csq, 3)
@

The \Rcode{VCFRowID} columns maps the expanded {\it CSQ} data back
to the rows in the \Rclass{VCF} object. This index can be used to
subset the original VCF.
<<map_rownames>>=
vcf[csq$"VCFRowID"]
@

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Write results to a file}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

In the previous section we saw Ensembl VEP results returned as
\R{} objects in the workspace. Alternatively, these results can
be written directly to a file. The flag that controls how the
data are returned is the {\it output\_file} flag.

When {\it output\_file} is NULL (default), the results
are returned as either a \Rclass{GRanges} or \Rclass{VCF} object.
<<output_file_default>>=
flags(param)$output_file
@

To write results directly to a file, specify a file name for the
{\it output\_file} flag.
<<output_file_filename>>=
flags(param)$output_file <- "/mypath/myfile"
@

The file can be written as a {\it vcf} or {\it gvf} by setting the
options of the slot to TRUE. If neither of {\it vcf}
or {\it gvf} are TRUE the file is written out as tab delimited.
<<ouput_slot>>=
## Write a vcf file to myfile.vcf:
myparam <- VEPFlags(flags=list(vcf=TRUE,
                        output_file="/path/myfile.vcf"))
## Write a gvf file to myfile.gvf:
myparam <- VEPFlags(flags=list(gvf=TRUE,
                        output_file="/path/myfile.gvf"))
## Write a tab delimited file to myfile.txt:
myparam <- VEPFlags(flags=list(output_file="/path/myfile.txt"))
@


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Configuring runtime options}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

The Ensembl VEP web page has complete descriptions of all runtime
options.
\url{http://uswest.ensembl.org/info/docs/tools/vep/script/vep_options.html}
Below are examples of how to configure the runtime options
in the \Rclass{VEPFlags} for specific situations. Investigate the
differences in results using a sample file from \Rpackage{VariantAnnotation}.
<<samplefile>>=
fl <- system.file("extdata", "ex2.vcf", package="VariantAnnotation")
@

\begin{itemize}
  \item Add regulatory region consequences:
<<runtime1, eval=FALSE>>=
param <- VEPFlags(flags=list(regulatory=TRUE, host="useastdb.ensembl.org"))
gr <- ensemblVEP(fl, param)
@

  \item Specify input file format as VCF, add HGNC gene
        identifiers, output SO consequence terms:
<<runtime2, eval=FALSE>>=
param <- VEPFlags(flag=list(format="vcf",
                      terms="SO",
                      symbol=TRUE, host="useastdb.ensembl.org"))
gr <- ensemblVEP(fl, param)
@

  \item Check for co-located variants, output only coding
        sequence consequences, output HGVS names:
<<runtime3, eval=FALSE>>=
param <- VEPFlags(flags=list(coding_only=TRUE,
                      check_existing=TRUE,
                      symbol=TRUE, host="useastdb.ensembl.org"))
gr <- ensemblVEP(fl, param)
@

  \item Add SIFT score and prediction, PolyPhen prediction only,
        output results as \Rcode{VCF}:
\begin{verbatim}
fl <- system.file("extdata", "chr22.vcf.gz", package="VariantAnnotation")
param <- VEPFlags(flags=list(sift="b", polyphen="p",
                  vcf=TRUE, host="useastdb.ensembl.org"))
vcf <- ensemblVEP(fl, param)
csq <- parseCSQToGRanges(vcf)

> head(levels(mcols(csq)$SIFT))
[1] "deleterious(0.01)" "deleterious(0.02)" "deleterious(0.03)"
[4] "deleterious(0.04)" "deleterious(0.05)" "deleterious(0)"

> levels(mcols(csq)$PolyPhen)
[1] "benign"            "possibly_damaging" "probably_damaging"
[4] "unknown"
\end{verbatim}
\end{itemize}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{\Rcode{sessionInfo()}}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

<<sessionInfo>>=
sessionInfo()
@

\end{document}