--- title: "Aligning reads with Rhisat2" author: "Charlotte Soneson" date: "`r Sys.Date()`" package: Rhisat2 output: BiocStyle::html_document vignette: > %\VignetteIndexEntry{Rhisat2} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} editor_options: chunk_output_type: console bibliography: rhisat2.bib --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) options(width=100) ``` The `r Biocpkg("Rhisat2")` package provides an interface to the [HISAT2](https://ccb.jhu.edu/software/hisat2/index.shtml) short-read aligner software [@Kim2015-mu]. # Installation Use the `BiocManager` package to download and install the package from Bioconductor as follows: ```{r getPackage, eval=FALSE} if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager") BiocManager::install("Rhisat2") ``` If required, the latest development version of the package can also be installed from GitHub. ```{r, eval = FALSE} BiocManager::install("fmicompbio/Rhisat2") ``` Once the package is installed, load it into your R session: ```{r} library(Rhisat2) ``` # Building a genome index To build an index for the alignment, use the `hisat2_build` function. You need to provide one or more fasta file with reference sequences, as well as an output directory where the index will be stored, and a "prefix" (that will determine the name of the index files in the output directory). Any additional arguments to the `hisat2-build` binary can also be supplied to the function (see `hisat2_build_usage()` or [https://ccb.jhu.edu/software/hisat2/manual.shtml#the-hisat2-build-indexer](https://ccb.jhu.edu/software/hisat2/manual.shtml#the-hisat2-build-indexer) for an overview of the available options). The package contains three example reference sequences, corresponding to short pieces of three different chromosomes. We show how to create an index (named `myindex`) based on these sequences. ```{r} list.files(system.file("extdata/refs", package="Rhisat2"), pattern="\\.fa$") refs <- list.files(system.file("extdata/refs", package="Rhisat2"), full.names=TRUE, pattern="\\.fa$") td <- tempdir() hisat2_build(references=refs, outdir=td, prefix="myindex", force=TRUE, strict=TRUE, execute=TRUE) ``` By instead setting `execute=FALSE` in the command above, `hisat2_build()` will return the index building shell command for inspection, without executing it. ```{r} print(hisat2_build(references=refs, outdir=td, prefix="myindex", force=TRUE, strict=TRUE, execute=FALSE)) ``` # Aligning reads to the genome index After creating the index, reads can be aligned using the `hisat2` wrapper function. Most commonly, the reads will be provided in `fastq` files (one file for single-end reads, two files for paired-end reads). The names of these files can be provided directly to the `hisat2` function, either as a vector (for single-end reads) or as a list of length 2 (for paired-end reads, each list element corresponds to one mate). You also need to provide the path to the index generated by `hisat2_build` (the output directory combined with the prefix), and the output file name where the alignments should be written. ```{r} list.files(system.file("extdata/reads", package="Rhisat2"), pattern="\\.fastq$") reads <- list.files(system.file("extdata/reads", package="Rhisat2"), pattern="\\.fastq$", full.names=TRUE) hisat2(sequences=as.list(reads), index=file.path(td, "myindex"), type="paired", outfile=file.path(td, "out.sam"), force=TRUE, strict=TRUE, execute=TRUE) ``` As for `hisat2_build()`, any additional arguments to the `hisat2` binary can be provided to the `hisat2()` function (see `hisat2_usage()` or [https://ccb.jhu.edu/software/hisat2/manual.shtml#running-hisat2](https://ccb.jhu.edu/software/hisat2/manual.shtml#running-hisat2) for an overview of the available options). With HISAT2, you can provide a file with known splice sites at the alignment step, which can help in finding the correct alignments across known splice junctions. A text file in the required format can be generated using the `extract_splice_sites()` function, starting from an annotation file in gtf or gff3 format, or from a `GRanges` or `TxDb` object. ```{r} spsfile <- tempfile() gtf <- system.file("extdata/refs/genes.gtf", package="Rhisat2") extract_splice_sites(features=gtf, outfile=spsfile) hisat2(sequences=as.list(reads), index=file.path(td, "myindex"), type="paired", outfile=file.path(td, "out_sps.sam"), `known-splicesite-infile`=spsfile, force=TRUE, strict=TRUE, execute=TRUE) ``` # Miscellaneous helper functions To get the version of `HISAT2`: ```{r} hisat2_version() ``` To see the usage of `hisat2_build()`: ```{r} hisat2_build_usage() ``` And similarly to see the usage of `hisat2()`: ```{r} hisat2_usage() ``` # Session info ```{r} sessionInfo() ``` # References