---
title: "FScanR: detect programmed ribosomal frameshifting events from various genomes"
author: "Xiao Chen\\

        Columbia University Medical Center"
date: "`r Sys.Date()`"
output:
  prettydoc::html_pretty:
    toc: true
    theme: cayman
    highlight: github
  pdf_document:
    toc: true
vignette: >
  %\VignetteIndexEntry{FScanR}
  %\VignetteEngine{knitr::rmarkdown}
  %\usepackage[utf8]{inputenc}
---

```{r style, echo=FALSE, results="asis", message=FALSE}
knitr::opts_chunk$set(tidy = FALSE,
		   message = FALSE)
```


```{r echo=FALSE}
CRANpkg <- function (pkg) {
    cran <- "https://CRAN.R-project.org/package"
    fmt <- "[%s](%s=%s)"
    sprintf(fmt, pkg, cran, pkg)
}

Biocpkg <- function (pkg) {
    sprintf("[%s](http://bioconductor.org/packages/%s)", pkg, pkg)
}

```

```{r echo=FALSE, results='hide', message=FALSE}
library(FScanR)
```

# Abstract

'FScanR' identifies Programmed Ribosomal Frameshifting (PRF) events from BLASTX homolog sequence alignment 
between targeted genomic/cDNA/mRNA sequences against the peptide library of the same species or a close relative. 
The output by BLASTX or diamond BLASTX will be used as input of 'FScanR' and should be in a tabular format with 14 columns. 

For BLASTX, the output parameter should be: 

	-outfmt '6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore qframe sframe'

For diamond BLASTX, the output parameter should be: 

	-outfmt 6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore qframe qframe

For details, please visit <https://doi.org/10.1111/1755-0998.13023>.


# Introduction

Ribosomal frameshifting, also known as translational frameshifting or translational recoding, is a biological phenomenon 
that occurs during translation that results in the production of multiple, unique proteins from a single mRNA. 
The process can be programmed by the nucleotide sequence of the mRNA and is sometimes affected by the secondary, 3-dimensional mRNA structure.
It has been described mainly in viruses (especially retroviruses), retrotransposons and bacterial insertion elements, and also in some cellular genes.

For details, please visit [Ribosomal frameshift](https://en.wikipedia.org/wiki/Ribosomal_frameshift).


## Install FScanR
```{r eval = FALSE}
##  Install FScanR in R (>= 3.5.0)
if(!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("FScanR")
```

## Load FScanR and test data

The dataset _test_ in this vignettes was homolog sequence alignment between _Euplotes vannus_ mRNA and protein sequences, 
output by BLASTX, from [_Chen et al., 2019_](https://doi.org/10.1111/1755-0998.13023).

```{r}
## loading package
library(FScanR)
## loading test data
test_data <- read.table(system.file("extdata", "test.tab", package = "FScanR"), header=TRUE, sep="\t")
```

## PRF events detection

The default cutoffs for _E-value_ and _frameDist_ are _1e-05_ and _10_ (nt), respectively.

Low _E-value_ cutoff ensures the fidelity of sequence alignment, but a too strict cutoff 
may also leads to false-negative detection.

Small _frameDist_ cutoff avoids the false-positive PRF events introduced by introns, 
especially when using genomic sequences as query sequence. _frameDist_ cutoff should be 
at least 4 nt.

Detected high PRF events will be output in tabular format with 7 columns. 
The column _FS_type_ contains the type information (-2, -1, +1, +2) of PRF events.

```{r}
## loading packages
prf <- FScanR(test_data, evalue_cutoff = 1e-05, frameDist_cutoff = 10)
table(prf$FS_type)
```

## PRF event types plot

In this vignettes, the number of detected events of four PRF types are presented in a pie chart.

```{r}
## plot the 4-type PRF events detected
mytable <- table(prf$FS_type)
lbls <- paste(names(mytable), " : ", mytable, sep="")
pie(mytable, labels = NA, main=paste0("PRF events"), cex=0.5, col=cm.colors(length(mytable)))
legend("right",legend=lbls[!is.na(lbls)], bty="n", cex=1,  fill=cm.colors(length(mytable))[!is.na(lbls)])
```

# Citation

If you use [FScanR](https://github.com/seanchen607/FScanR) in published research, 
please cite the most appropriate paper(s) from this list:

1.  **X Chen**, Y Jiang, F Gao<sup>\*</sup>, W Zheng, TJ Krock, NA Stover, C Lu, LA Katz & W Song (2019). 
    Genome analyses of the new model protist Euplotes vannus focusing on genome rearrangement and resistance 
    to environmental stressors. ***Molecular Ecology Resources***, 19(5):1292-1308. doi:
    [10.1111/1755-0998.13023](https://doi.org/10.1111/1755-0998.13023).

# Session Information

Here is the output of `sessionInfo()` on the system on which this document was compiled:

```{r echo=FALSE}
sessionInfo()
```