---
title: "Exporting trees with data"
author: "Guangchuang Yu\\

        School of Public Health, The University of Hong Kong"
date: "`r Sys.Date()`"
bibliography: treeio.bib
biblio-style: apalike
output:
  prettydoc::html_pretty:
    toc: true
    theme: cayman
    highlight: github
  pdf_document:
    toc: true
vignette: >
  %\VignetteIndexEntry{02 Exporting trees with data}
  %\VignetteEngine{knitr::rmarkdown}
  %\usepackage[utf8]{inputenc}
---

```{r style, echo=FALSE, results="asis", message=FALSE}
knitr::opts_chunk$set(tidy = FALSE,
		   message = FALSE)
```


```{r echo=FALSE, results="hide", message=FALSE}
library('jsonlite')
library("treeio")
```

# Introduction

The [treeio](https://bioconductor.org/packages/treeio/) package supports parsing
various phylogenetic tree file formats, including software outputs that contain
evolutionary evidence. Some of these formats are just log files
(*e.g.* [PAML](http://abacus.gene.ucl.ac.uk/software/paml.html)
and [r8s](http://ginger.ucdavis.edu/r8s) outputs), while others are
non-standard formats (*e.g.* [BEAST](http://beast2.org/)
and [MrBayes](http://nbisweden.github.io/MrBayes/) outputs, which use square
brackets, reserved for comments in the standard Nexus format, to store
inferences). With [treeio](https://bioconductor.org/packages/treeio/), we are
now able to parse these files to extract the phylogenetic tree and map the
associated data onto the tree structure. Exporting the tree structure is easy:
users can use the `as.phylo` method defined in
[treeio](https://bioconductor.org/packages/treeio/) to convert a `treedata`
object to a `phylo` object, and then use `write.tree` or `write.nexus`,
implemented in the [ape](https://cran.r-project.org/web/packages/ape/index.html)
package [@paradis_ape_2004], to export the tree structure as Newick text or a
Nexus file. This is quite useful for converting non-standard formats to a
standard format and for extracting trees from software outputs, such as log files.
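As a minimal sketch of this structure-only route, the PHYLDOG NHX file shipped with treeio can be reduced to a `phylo` object and written out with ape's `write.tree` and `write.nexus`. Note that the NHX annotations are dropped in the process; only the topology, branch lengths and labels survive. The temporary output path is a placeholder.

```r
library(treeio)
library(ape)

## parse an NHX file shipped with treeio (tree + NHX tags)
nhx <- read.nhx(system.file("extdata/NHX", "phyldog.nhx", package = "treeio"))

## keep only the tree structure; the NHX annotations are discarded
phylo <- as.phylo(nhx)

## Newick text, returned as a character string when no file is given
newick <- write.tree(phylo)

## Nexus file written to a placeholder temporary location
nexus_file <- tempfile(fileext = ".nex")
write.nexus(phylo, file = nexus_file)
```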


However, exporting a tree with its associated data is still challenging. These
associated data can be parsed from analysis programs or obtained from external
sources (*e.g.* phenotypic, experimental and clinical data). The major
obstacle here is that there is no standard format designed for storing
trees with data. [NeXML](http://www.nexml.org/) [@vos_nexml:_2012] may be the most
flexible format; however, it is currently not widely supported. Most of the
analysis programs in this field rely extensively on Newick strings and the Nexus
format. In my opinion, although the BEAST Nexus
format^[http://beast.community/nexus_metacomments] may not be the best solution,
it is currently a good approach for storing heterogeneous associated data. The
beauty of the format is that all the annotation elements are stored within square
brackets, which are reserved for comments. The file can therefore be parsed as
standard Nexus by ignoring the annotation elements, and existing programs should
be able to read it.
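To illustrate why existing programs can still read such files, here is a minimal base-R sketch using a small, hand-written annotated Newick string (hypothetical, in the `[&key=value]` style of BEAST meta-comments). Removing everything inside square brackets leaves a plain Newick tree, and the same regular expression can pull the annotations out. This only illustrates the idea; it is not how treeio itself parses the format.

```r
## hypothetical tree with BEAST-style meta-comments on each node
annotated <- "((A:1[&rate=0.1],B:2[&rate=0.2]):1[&rate=0.5],C:3);"

## a bracketed comment: "[" followed by any non-"]" characters, then "]"
comment_regex <- "\\[[^\\]]*\\]"

## what a standard Newick parser effectively sees once comments are ignored
plain <- gsub(comment_regex, "", annotated, perl = TRUE)
print(plain)  # "((A:1,B:2):1,C:3);"

## the annotations themselves, in the order they appear in the string
annotations <- regmatches(annotated,
                          gregexpr(comment_regex, annotated, perl = TRUE))[[1]]
print(annotations)
```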

# Exporting tree data to BEAST Nexus format


## Exporting/converting software output

The [treeio](https://bioconductor.org/packages/treeio/) package provides
`write.beast` to export a `treedata` object as a BEAST Nexus file [@bouckaert_beast_2014].
With [treeio](https://bioconductor.org/packages/treeio/), it is easy to convert
software output to BEAST format as long as the output can be parsed
by [treeio](https://bioconductor.org/packages/treeio/). For example, we can
convert an NHX file to a BEAST file and use the NHX tags to color the tree in
FigTree^[http://beast.community/figtree], or convert CODEML output and use
*d~N~/d~S~*, *d~N~* or *d~S~* to color the tree in FigTree.

```{r comment=NA}
nhxfile <- system.file("extdata/NHX", "phyldog.nhx", package="treeio")
nhx <- read.nhx(nhxfile)
# write.beast(nhx, file = "phyldog.tree")
write.beast(nhx)
```

![](figures/phyldog.png)


```{r comment=NA}
mlcfile <- system.file("extdata/PAML_Codeml", "mlc", package="treeio")
ml <- read.codeml_mlc(mlcfile)
# write.beast(ml, file = "codeml.tree")
write.beast(ml)
```

![](figures/codeml.png)
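The commented-out `file =` arguments above are what produce files that FigTree can open. A minimal sketch that writes both conversions to disk; the temporary paths are placeholders, so point them wherever FigTree should find the files.

```r
library(treeio)

## the two example outputs shipped with treeio
nhx <- read.nhx(system.file("extdata/NHX", "phyldog.nhx", package = "treeio"))
ml <- read.codeml_mlc(system.file("extdata/PAML_Codeml", "mlc", package = "treeio"))

## placeholder output paths
nhx_out <- tempfile(fileext = ".tree")
ml_out <- tempfile(fileext = ".tree")

## write BEAST Nexus files; the parsed NHX tags and the CODEML
## dN/dS, dN and dS values are stored as [&...] annotations
write.beast(nhx, file = nhx_out)
write.beast(ml, file = ml_out)

## peek at the start of one of the files
head(readLines(nhx_out), 3)
```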


## Combining tree with external data

Using the utilities provided
by [treeio](https://bioconductor.org/packages/treeio/), it is easy to link
external data to the corresponding phylogeny. The `write.beast` function enables users
to combine the tree and external data in a single tree file.



```{r comment=NA}
phylo <- as.phylo(nhx)
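## print the Newick text
write.tree(phylo)

## fake data for every node (tips and internal nodes); seed fixed for reproducibility
set.seed(2017)
N <- Nnode2(phylo)
fake_data <- tibble::tibble(node = 1:N, fake_trait = rnorm(N), another_trait = runif(N))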
fake_tree <- treedata(phylo = phylo, data = fake_data)
write.beast(fake_tree)
```
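The example above generates values for every node. Real external data are more often keyed by taxon name. Since `treedata` links data by node number, one way to connect them (a sketch, not the only approach) is to translate tip labels into node numbers with `match()` against `tip.label`, because tip *i* of a `phylo` object is node *i*. The tree and the `mass` trait below are made up for illustration.

```r
library(treeio)
library(ape)

## small hypothetical tree; tips A, B, C become nodes 1, 2, 3
phylo <- read.tree(text = "((A:1,B:2):1,C:3);")

## hypothetical external data keyed by tip label, in arbitrary order
trait <- data.frame(label = c("C", "A", "B"), mass = c(3.2, 1.5, 2.1))

## tip i of a phylo object is node i, so match() yields the node number
trait$node <- match(trait$label, phylo$tip.label)

## internal nodes carry no value in this sketch
tip_tree <- treedata(phylo = phylo,
                     data = tibble::tibble(node = trait$node, mass = trait$mass))
write.beast(tip_tree)
```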


## Merging tree data from different sources

Associated data can be combined not only with Newick tree text but also with
tree data obtained from software output, and different tree objects can be
merged together. For details, please refer to the [Importer](Importer.html)
vignette.


```{r}
## combine tree object with data
tree_with_data <- full_join(nhx, fake_data, by = "node")
tree_with_data

## merge two tree objects
tree2 <- merge_tree(nhx, fake_tree)
tree2

identical(tree_with_data, tree2)
```

After merging data from different sources, the tree with the associated data can
be exported into a single file.

```{r comment=NA}
write.beast(tree2)
```

The output BEAST Nexus file can be imported into R using the `read.beast`
function, and all the associated data can then be used to annotate the tree
using [ggtree](https://bioconductor.org/packages/ggtree/) [@yu_ggtree:_2017].


```{r}
outfile <- tempfile(fileext = ".tree")
write.beast(tree2, file = outfile)
read.beast(outfile)
```

# Exporting tree data to *jtree* format {#jtree}

The [treeio](https://bioconductor.org/packages/treeio/) package provides
`write.beast` to export `treedata` objects to BEAST Nexus files. This is quite
useful for converting file formats, combining trees with data and merging tree
data from different sources, as demonstrated in the
[Exporting tree data to BEAST Nexus format](#exporting-tree-data-to-beast-nexus-format) section.
The [treeio](https://bioconductor.org/packages/treeio/) package also supplies
the `read.beast` function to parse the output file of `write.beast`. Although
[treeio](https://bioconductor.org/packages/treeio/) gives the R community the
ability to manipulate the BEAST Nexus format and process tree data, libraries
for parsing BEAST files in other programming languages are still lacking.

JSON (JavaScript Object Notation)^[https://www.json.org/] is a lightweight
data-interchange format that is widely supported in almost all modern
programming languages. To make it easy to import trees with data in other
programming languages, [treeio](https://bioconductor.org/packages/treeio/)
supports exporting trees with data in [jtree format](Importer.html#jtree), which
is JSON-based and easy to parse in any language that supports JSON.

```{r comment=NA}
write.jtree(tree2)
```

Because the *jtree* format is based on JSON, it can be parsed with any JSON parser.

```{r comment=NA}
jtree_file <- tempfile(fileext = '.jtree')
write.jtree(tree2, file = jtree_file)
jsonlite::fromJSON(jtree_file)
```
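Because the parsed result is an ordinary R list, it can be explored like any other JSON document. A minimal, self-contained sketch that rebuilds a *jtree* file from the NHX example and lists its top-level components without assuming what they are called:

```r
library(treeio)
library(jsonlite)

nhx <- read.nhx(system.file("extdata/NHX", "phyldog.nhx", package = "treeio"))

## write jtree to a placeholder temporary file, then parse it with a generic JSON reader
jtree_tmp <- tempfile(fileext = ".jtree")
write.jtree(nhx, file = jtree_tmp)
parsed <- jsonlite::fromJSON(jtree_tmp)

## top-level components and their types, one line each
str(parsed, max.level = 1)
```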


The *jtree* file can be directly imported as a `treedata` object using
`read.jtree`, which is also provided by
the [treeio](https://bioconductor.org/packages/treeio/) package.

```{r}
read.jtree(jtree_file)
```
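A quick way to check that the tree structure survives the round trip is to compare it before export and after re-import. A self-contained sketch, using the NHX example only because it ships with treeio; it checks tip labels and node counts, not the annotation values.

```r
library(treeio)

nhx <- read.nhx(system.file("extdata/NHX", "phyldog.nhx", package = "treeio"))

## export to jtree (placeholder temporary path) and read it straight back
jtree_tmp <- tempfile(fileext = ".jtree")
write.jtree(nhx, file = jtree_tmp)
back <- read.jtree(jtree_tmp)

## compare tip labels and total node counts of the original and re-imported trees
same_tips <- setequal(as.phylo(nhx)$tip.label, as.phylo(back)$tip.label)
same_size <- Nnode2(as.phylo(nhx)) == Nnode2(as.phylo(back))
c(same_tips = same_tips, same_size = same_size)
```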

# References