---
title: "Analyzing tRNA sequences and structures"
author: "Felix G.M. Ernst"
date: "`r Sys.Date()`"
package: tRNA
abstract: >
  Example of importing tRNAdb output as GRanges
output:
  BiocStyle::html_document:
    toc: true
    toc_float: true
    df_print: paged
vignette: >
  %\VignetteIndexEntry{tRNA}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
bibliography: references.bib
---

```{r style, echo = FALSE, results = 'asis'}
BiocStyle::markdown(css.files = c('custom.css'))
```

# Introduction

The `tRNA` package allows tRNA sequences and structures to be accessed and used
for subsetting. In addition, it provides visualization tools to compare feature
parameters of multiple tRNA sets and correlate them to additional data. The
package expects a `GRanges` object with certain columns as input. The following
columns are a requirement: `tRNA_length`, `tRNA_type`, `tRNA_anticodon`,
`tRNA_seq`, `tRNA_str`, `tRNA_CCA.end`.

# Loading tRNA information

To work with the tRNA packages, tRNA information can be retrieved in a 
number of ways:

  1. A `GRanges` object can be constructed by hand containing the
required colums (see above)
  2. a tRNAscan result file can be loaded using 
  `import.tRNAscanAsGRanges()` from `tRNAscanImport` package
  3. selected tRNA information can be retrieved using
  `import.tRNAdb()` from `tRNAdbImport` package

For this vignette a predefined `GRanges` object is loaded.

```{r, echo=FALSE}
suppressPackageStartupMessages(
  library(tRNA)
)
data("gr", package = "tRNA", envir = environment())
```

```{r, eval=FALSE}
library(tRNA)
data("gr", package = "tRNA", envir = environment())
```

# tRNA sequences and structures

To retrieve the sequences for individual tRNA structure elements 
`gettRNAstructureGRanges` or `gettRNAstructureSeqs` can be used. 

```{r}
# just get the coordinates of the anticodonloop
gettRNAstructureGRanges(gr,
                        structure = "anticodonLoop")
gettRNAstructureSeqs(gr,
                     joinFeatures = TRUE,
                     structure = "anticodonLoop")
```

In addition the sequences can be returned already joined to get a fully blank
padded set of sequences. The boundaries of the individual structures is returned
as metadata of the `RNAStringSet` object.

```{r}
seqs <- gettRNAstructureSeqs(gr[1:10],
                             joinCompletely = TRUE)
seqs
# getting the tRNA structure boundaries
metadata(seqs)[["tRNA_structures"]]
```

Be aware, that `gettRNAstructureGRanges` and `gettRNAstructureSeqs` might not be
working as expected, if the tRNA sequences in questions are armless or deviate a
lot from the canonical tRNA model. It was thouroughly tested using human
mitochondrial tRNA and other tRNAs missing certain features. However, for fringe
cases this might not be the case. Please report these cases, if you find them,
with an example.

# Subsetting tRNA sequences

The structure information can be queried for subsetting using the function
`hasAccpeptorStem()` as an example. Please have a look at`?hasAccpeptorStem` for
all available subsetting functions.

```{r echo=TRUE, results="hide"}
gr[hasAcceptorStem(gr, unpaired = TRUE)]
# mismatches and bulged are subsets of unpaired
gr[hasAcceptorStem(gr, mismatches = TRUE)]
gr[hasAcceptorStem(gr, bulged = TRUE)]
# combination of different structure parameters
gr[hasAcceptorStem(gr, mismatches = TRUE) & 
     hasDloop(gr, length = 8)]
```

# Visualization

To get an overview of tRNA feature and compare different datasets, 
`gettRNAFeaturePlots` can be used. It accepts a named `GRangesList` as input.
Internally it will calculate a list of features values based on the functions
mentioned above and the data contained in the mcols of the `GRanges` objects. 
The results can be retrieved without generating plots using `gettRNASummary`.

```{r}
# load tRNA data for E. coli and H. sapiens
data("gr_eco", package = "tRNA", envir = environment())
data("gr_human", package = "tRNA", envir = environment())

# get summary plots
grl <- GRangesList(Sce = gr,
                   Hsa = gr_human,
                   Eco = gr_eco)
plots <- gettRNAFeaturePlots(grl)
```
```{r plot1, fig.cap = "tRNA length."}
plots$length
```
```{r plot2, fig.cap = "tRNAscan-SE scores."}
plots$tRNAscan_score
```
```{r plot3, fig.cap = "tRNA GC content."}
plots$gc
```
```{r plot4, fig.cap = "tRNAs with introns."}
plots$tRNAscan_intron
```
```{r plot5, fig.cap = "Length of the variable loop."}
plots$variableLoop_length
```

To check whether features correlate with additional scores, optional arguments
can be added to `gettRNAFeaturePlots` or used from the `score` column of the
`GRanges` objects. For the first option provided a list of scores with the same
dimensions as the `GRangesList` object as argument `scores` and for the latter 
set `plotScore = TRUE`.

```{r, eval=FALSE}
# score column will be used
plots <- gettRNAFeaturePlots(grl,
                             plotScores = TRUE)
```
```{r}
plots <- gettRNAFeaturePlots(grl,
                             scores = list(runif(length(grl[[1]]),0,100),
                                           runif(length(grl[[2]]),0,100),
                                           runif(length(grl[[3]]),0,100)))
```
```{r plot6, fig.cap = "tRNA length and score correlation."}
plots$length
```
```{r plot7, fig.cap = "variable loop length and score correlation."}
plots$variableLoop_length
```

The plots can be of course modified manually and changed to suit your needs.

```{r plot8, fig.cap = "Customized plot switching out the point and violin plot into a boxplot."}
plots$length$layers <- plots$length$layers[c(-1,-2)]
plots$length + ggplot2::geom_boxplot()
```

# Options

The colours of the plots can be customized directly on creation with the 
following options.

```{r}
options("tRNA_colour_palette")
options("tRNA_colour_yes")
options("tRNA_colour_no")
```

# Dot bracket annotation

To retrieve detailed information on the base pairing `gettRNABasePairing()` or
`getBasePairing()` can be used depending on the input type. The DotBracket
annotation is expected with pairs of `<>{}[]()` (Please note the orientation.
For `<>` the orientation is variable, since the tRNAscan files use the `><`
annotation by default).

```{r}
head(gettRNABasePairing(gr)[[1]])
head(getBasePairing(gr[1]$tRNA_str)[[1]])
```

The loop ids for the structure elements can be retrieved with `gettRNALoopIDs()`
or `getLoopIDs()`. Please not that pseudoloops are currently not correctly
reflected in the output of these functions.

```{r}
gettRNALoopIDs(gr)[[1]]
getLoopIDs(gr[1]$tRNA_str)
```

# Session info

```{r}
sessionInfo()
```