---
title: "CITE-seq data with MultiAssayExperiment and MuData"
author:
- name: "Danila Bredikhin"
  affiliation: "European Molecular Biology Laboratory, Heidelberg, Germany"
  email: "danila.bredikhin@embl.de"
- name: "Ilia Kats"
  affiliation: "German Cancer Research Center, Heidelberg, Germany"
  email: "i.kats@dkfz-heidelberg.de"
date: "`r Sys.Date()`"
output:
  BiocStyle::html_document:
    toc_float: true
vignette: >
  %\VignetteIndexEntry{Cord Blood CITE-seq with MuData}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```

## Introduction

CITE-seq data provide RNA and surface protein counts for the same cells. 
This tutorial shows how MuData can be integrated into with Bioconductor 
workflows to analyse CITE-seq data.

## Installation

The most recent dev build can be installed from GitHub:

```{r, eval = FALSE}
library(remotes)
remotes::install_github("ilia-kats/MuData")
```

Stable version of `MuData` will be available in future bioconductor versions.

## Loading libraries

```{r setup, message = FALSE}
library(MuData)
library(SingleCellExperiment)
library(MultiAssayExperiment)
library(SingleCellMultiModal)
library(scater)

library(rhdf5)
```

## Loading data

We will use CITE-seq data accessible with the 
[`SingleCellMultiModal` Bioconductor package](https://bioconductor.org/packages/release/data/experiment/vignettes/SingleCellMultiModal/inst/doc/CITEseq.html), 
which was originally described in 
[Stoeckius et al., 2017](https://www.nature.com/articles/nmeth.4380).

```{r}
mae <- CITEseq(
    DataType="cord_blood", modes="*", dry.run=FALSE, version="1.0.0"
)

mae
```

We see two modalities in the object — `scRNAseq` and `scADT`, the latter 
providing counts for antibody-derived tags. Notably, each experiment is a matrix.

## Processing ADT data

While CITE-seq analysis workflows such as 
[CiteFuse](http://www.bioconductor.org/packages/release/bioc/html/CiteFuse.html) 
should be consulted for more details, below we exemplify simple data 
transformation in order to demonstrate how their output can be saved 
to an H5MU file later on.

For ADT counts, we will apply CLR transformation following 
[Hao et al., 2020](https://doi.org/10.1016/j.cell.2021.04.048):

```{r}
# Define CLR transformation as in the Seurat workflow
clr <- function(data) t(
  apply(data, 1, function(x) log1p(
    x / (exp(sum(log1p(x[x > 0]), na.rm = TRUE) / length(x)))
  ))
)
```

We will make the ADT modality a `SingleCellExperiment` object and add an assay 
with CLR-transformed counts:

```{r}
adt_counts <- mae[["scADT"]]

mae[["scADT"]] <- SingleCellExperiment(adt_counts)
assay(mae[["scADT"]], "clr") <- clr(adt_counts)
```

We will also generate reduced dimensions taking advantage of the functionality 
in the [`scater` package](https://bioconductor.org/packages/release/bioc/vignettes/scater/inst/doc/overview.html):

```{r}
mae[["scADT"]] <- runPCA(
  mae[["scADT"]], exprs_values = "clr", ncomponents = 20
)
```

```{r}
plotReducedDim(mae[["scADT"]], dimred = "PCA",
               by_exprs_values = "clr", colour_by = "CD3")
plotReducedDim(mae[["scADT"]], dimred = "PCA",
               by_exprs_values = "clr", colour_by = "CD14")
```

## Writing H5MU files

We can write the contents of the MultiAssayExperiment object into an H5MU file:

```{r}
writeH5MU(mae, "cord_blood_citeseq.h5mu")
```

We can check that both modalities were written to the file, whether it was a 
`matrix` for RNA or `SingleCellExperiment` for ADT:

```{r}
h5 <- rhdf5::H5Fopen("cord_blood_citeseq.h5mu")
h5ls(H5Gopen(h5, "mod"), recursive = FALSE)
```

... both assays for ADT — raw counts are stored in `X` and CLR-transformed 
counts are in the corresponding layer:

```{r}
h5ls(H5Gopen(h5, "mod/scADT"), recursive = FALSE)
h5ls(H5Gopen(h5, "mod/scADT/layers"), recursive = FALSE)
```

... as well as reduced dimensions (PCA):

```{r}
h5ls(H5Gopen(h5, "mod/scADT/obsm"), recursive = FALSE)
# There is an alternative way to access groups:
# h5&'mod'&'scADT'&'obsm'
rhdf5::H5close()
```

## References

- [Muon: multimodal omics analysis framework](https://www.biorxiv.org/content/10.1101/2021.06.01.445670) preprint

- [mudata](https://mudata.readthedocs.io/) (Python) documentation

- muon [documentation](https://muon.readthedocs.io/) and [web page](https://gtca.github.io/muon/)

- Stoeckius, M., Hafemeister, C., Stephenson, W., Houck-Loomis, B., Chattopadhyay, P.K., Swerdlow, H., Satija, R. and Smibert, P., 2017. Simultaneous epitope and transcriptome measurement in single cells. _Nature methods_, 14(9), pp.865-868.

- Hao, Y., Hao, S., Andersen-Nissen, E., Mauck III, W.M., Zheng, S., Butler, A., Lee, M.J., Wilk, A.J., Darby, C., Zager, M. and Hoffman, P., 2021. Integrated analysis of multimodal single-cell data. _Cell_.


## Session Info

```{r}
sessionInfo()
```