%\VignetteIndexEntry{An overview of yeastRNASeq}
%\VignetteDepends{yeastRNASeq,ShortRead}
%\VignettePackage{yeastRNASeq}

\documentclass[12pt]{article}
\usepackage{times}
\usepackage{hyperref}

\newcommand{\RPackage}[1]{\textsf{#1}}
\newcommand{\RCode}[1]{\texttt{#1}}
\newcommand{\RObject}[1]{\texttt{#1}}
\newcommand{\RFunction}[1]{\textsf{#1}}
\newcommand{\RClass}[1]{{\textsf{#1}}}
\SweaveOpts{eps=FALSE, echo=TRUE}
\title{An overview of yeastRNASeq}
\author{James Bullard and Kasper D.\ Hansen}
\date{Modified: 15 January 2010. Compiled: \today}

\begin{document}
\maketitle

<<echo=FALSE>>=
options(width = 70)
@ 

<<>>=
require(yeastRNASeq)
@ 

This package contains data from
<<>>=
x <- citation(package = "yeastRNASeq")[[1]]
print(x, bibtex = FALSE)
@ 
which describes some experiments in \emph{S.\ cerevisiae} comparing
various mutant strains to a wild-type strain.  A full BibTex entry can
be obtained by
<<results=hide>>=
citation("yeastRNASeq")
@ 

The subset of the data which this package contains is more
specifically data from a wild-type and a single mutant yeast.  For
each condition (mutant, wild-type) there is two lanes worth of data,
each lane containing a sample of 500,000 raw (unaligned) reads from
each of 2 lanes each.  Each of the four lanes have been aligned
against the yeast genome using Bowtie.

The raw reads are contained in 4 FASTQ files and the Bowtie alignment
are contained in 4 Bowtie output files.  There are 500,000 reads in
each of the FASTQ files and fewer reads in each of the Bowtie files.
The filenames are
<<>>=
list.files(file.path(system.file(package = "yeastRNASeq"), "reads"))
@ 

The reads were aligned to the yeast genome obtained from
\url{ftp://ftpmips.gsf.de/yeast/sequences} (which was the basis for
the Bowtie index available at the Bowtie website at the time of
alignment).

These files are ready to be parsed by the tools in the \RPackage{ShortRead}
package.  As an example we read the alignment files by
<<>>=
require(ShortRead)
files <- list.files(file.path(system.file(package = "yeastRNASeq"), "reads"),
                    pattern = "bowtie", full.names = TRUE)
names(files) <- gsub("\\.bowtie.*", "", basename(files))
names(files)
aligned <- lapply(files, readAligned, type = "Bowtie")
@ 
The constructed object \RObject{aligned} is a list with 4 elements.
Each element correspond to a lane and is an object of class
\RClass{AlignedRead}.

The output from this operation has already been stored as an R object
and is accessible by
<<>>=
data(yeastAligned)
yeastAligned[["mut_1_f"]]
@ 
The percent of aligned reads is
<<>>=
round(sapply(aligned, length) / 500000, 2)
@ 

There are two additional objects available in the package, purely for
illustrative purposes (do not use them for analysis).  The object
\RObject{yeastAnno} is annotation obtained from Ensembl using biomaRt
and is a \RClass{data.frame} of annotation:
<<>>=
data(yeastAnno)
dim(yeastAnno)
head(yeastAnno, n = 2)
table(yeastAnno$gene_biotype)
@ 
The other object is called \RObject{geneLevelData} and is a
\RClass{matrix} of counts per gene.
<<>>=
data(geneLevelData)
head(geneLevelData, n = 2)
@ 
Such a matrix may be constructed from \RObject{yeastAligned} and
\RObject{yeastAnno} using either the functionality in
the \RPackage{IRanges} and \RPackage{ShortRead} packages or by using
the functionality of the \RPackage{Genominator} package (which also
contains a vignette describing a simplified differential analysis of
this dataset).

Note that the data does not contain any biological replicates.  In the
original publication this was addressed by also analyzing a set of
tiling microarrays.
\end{document}