---
title: "Combine TreeSummarizedExperiment objects"
author: 
- name: Ruizhu HUANG
  affiliation: 
  - Institute of Molecular Life Sciences, University of Zurich.
  - SIB Swiss Institute of Bioinformatics.
- name: Charlotte Soneson
  affiliation: 
  - Institute of Molecular Life Sciences, University of Zurich.
  - SIB Swiss Institute of Bioinformatics.
- name: Mark Robinson
  affiliation: 
  - Institute of Molecular Life Sciences, University of Zurich.
  - SIB Swiss Institute of Bioinformatics.
package: TreeSummarizedExperiment
output: 
  BiocStyle::html_document
vignette: >
  %\VignetteIndexEntry{2. Combine TSEs}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options: 
  chunk_output_type: console
bibliography: TreeSE_vignette.bib
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = TRUE)
```


# Combine multiple `TreeSummarizedExperiment` objects 

Multiple `TreeSummarizedExperiemnt` objects (**TSE**) can be combined by using
`rbind` or `cbind`. Here, we create a toy `TreeSummarizedExperiment` object
using `makeTSE()` (see `?makeTSE()`). As the tree in the row/column tree slot is
generated randomly using `ape::rtree()`, `set.seed()` is used to create
reproducible results.

```{r}
library(TreeSummarizedExperiment)

set.seed(1)
# TSE: without the column tree
(tse_a <- makeTSE(include.colTree = FALSE))

# combine two TSEs by row
(tse_aa <- rbind(tse_a, tse_a))
```
The generated `tse_aa` has 20 rows, which is two times of that in `tse_a`. The row tree in `tse_aa` is the same as that in `tse_a`.

```{r}
identical(rowTree(tse_aa), rowTree(tse_a))
```

If we `rbind` two TSEs (e.g., `tse_a` and `tse_b`) that have different row trees, the obtained TSE (e.g., `tse_ab`) will have two row trees. 
```{r}
set.seed(2)
tse_b <- makeTSE(include.colTree = FALSE)

# different row trees
identical(rowTree(tse_a), rowTree(tse_b))

# 2 phylo tree(s) in rowTree
(tse_ab <- rbind(tse_a, tse_b))
```

In the row link data, the `whichTree` column gives information about which tree the row is mapped to.
For `tse_aa`, there is only one tree named as `phylo`. However, for `tse_ab`, there are two trees (`phylo` and `phylo.1`).
```{r}
rowLinks(tse_aa)
rowLinks(tse_ab)
```

The name of trees can be accessed using `rowTreeNames`. If the input **TSE**s use the same name for trees, `rbind` will automatically create valid and unique names for trees by using `make.names`. `tse_a` and `tse_b` both use `phylo` as the name of their row trees. In `tse_ab`, the row tree that originates from `tse_b` is named as `phylo.1` instead.
```{r}
rowTreeNames(tse_aa)
rowTreeNames(tse_ab)

# The original tree names in the input TSEs
rowTreeNames(tse_a)
rowTreeNames(tse_b)
```

Once the name of trees is changed, the column `whichTree` in the `rowLinks()` is updated accordingly.
```{r}
rowTreeNames(tse_ab) <- paste0("tree", 1:2)
rowLinks(tse_ab)
```

To run `cbind`, **TSE**s should agree in the row dimension. If **TSE**s only differ in the row tree, the row tree and the row link data are dropped.
```{r}
cbind(tse_a, tse_a)
cbind(tse_a, tse_b)
```


# Subset a **TSE** object

We obtain a subset of `tse_ab` by extracting the data on rows `11:15`. These rows are mapped to the same tree named as `phylo.1`. So, the `rowTree` slot of `sse` has only one tree.
```{r}
(sse <- tse_ab[11:15, ])
rowLinks(sse)
```

`[` works not only as a getter but also a setter to replace a subset of `sse`.
```{r}
set.seed(3)
tse_c <- makeTSE(include.colTree = FALSE)
rowTreeNames(tse_c) <- "new_tree"

# the first two rows are from tse_c, and are mapped to 'new_tree'
sse[1:2, ] <- tse_c[5:6, ]
rowLinks(sse)
```

The **TSE** object can be subset also by nodes or/and trees using `subsetByNodes`
```{r}
# by tree
sse_a <- subsetByNode(x = sse, whichRowTree = "new_tree")
rowLinks(sse_a)

# by node
sse_b <- subsetByNode(x = sse, rowNode = 5)
rowLinks(sse_b)

# by tree and node
sse_c <- subsetByNode(x = sse, rowNode = 5, whichRowTree = "tree2")
rowLinks(sse_c)
```

# Change specific trees of **TSE**

By using `colTree`, we can add a column tree to `sse` that has no column tree before.
```{r}
colTree(sse)

library(ape)
set.seed(1)
col_tree <- rtree(ncol(sse))

# To use 'colTree` as a setter, the input tree should have node labels matching
# with column names of the TSE.
col_tree$tip.label <- colnames(sse)

colTree(sse) <- col_tree
colTree(sse)
```

`sse` has two row trees. We can replace one of them with a new tree by
specifying `whichTree` of the `rowTree`.
```{r}
# the original row links
rowLinks(sse)

# the new row tree
set.seed(1)
row_tree <- rtree(4)
row_tree$tip.label <- paste0("entity", 5:7)

# replace the tree named as the 'new_tree'
nse <- sse
rowTree(nse, whichTree = "new_tree") <- row_tree
rowLinks(nse)
```


In the row links, the first two rows now have new values in `nodeNum` and
`nodeLab_alias`. The name in `whichTree` is not changed but the tree is actually
updated.

```{r}
# FALSE is expected
identical(rowTree(sse, whichTree = "new_tree"),
          rowTree(nse, whichTree = "new_tree"))

# TRUE is expected
identical(rowTree(nse, whichTree = "new_tree"),
          row_tree)
```

If nodes of the input tree and rows of the **TSE** are named differently, users
can match rows with nodes via `changeTree` with `rowNodeLab` provided.


# Session Info

```{r}
sessionInfo()
```

# Reference