---
title: "The `waddR` package"
output: rmarkdown::html_vignette
vignette: >
    %\VignetteIndexEntry{waddR}
    %\VignetteEngine{knitr::rmarkdown}
    %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>"
)
```


## Introduction

`waddR` is an R package that provides a statistical test based on the
2-Wasserstein distance for detecting and describing differential distributions
in one-dimensional data.
It provides functions for calculating the Wasserstein distance, for testing
for differential distributions, and a specialized test for differential
expression in scRNA-seq data.

The package `waddR` provides three sets of utilities to cover distinct use
cases, each described in a separate vignette:

* Fast and accurate 
[calculation of the 2-Wasserstein distance](wasserstein_metric.html)

* [Two-sample test](wasserstein_test.html) to check for differences between two
distributions

* Detection of
[differential gene expression distributions](wasserstein_singlecell.html)
in scRNA-seq data

These are bundled into the same package because they build on one another:
the procedure for detecting differential distributions in single-cell data is a
refinement of the general two-sample test, which itself uses the 2-Wasserstein
distance to compare two distributions.


### Wasserstein Distance functions

The 2-Wasserstein distance is a metric that describes the distance between two
distributions, representing two different conditions A and B. This package
specifically considers the squared 2-Wasserstein distance d := W^2, which
offers a decomposition into location, size, and shape terms.
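
To make the decomposition more concrete, the following base R sketch
approximates the squared 2-Wasserstein distance on a grid of quantiles and
splits it into three terms: the squared difference of the means (location), the
squared difference of the standard deviations (size), and a remainder driven by
the correlation between the two sets of quantiles (shape). The helper
`sq_wass_decomp_sketch` and its argument `n_q` are hypothetical names used only
for illustration. Whether `waddR` computes the terms in exactly this way is an
assumption, so the numbers need not match the package's output.

```{r decomp-sketch, eval=FALSE, echo=TRUE}
# Hypothetical helper for illustration only; not part of waddR.
sq_wass_decomp_sketch <- function(a, b, n_q = 1000) {
    # Evaluate both empirical quantile functions on a common grid
    probs <- (seq_len(n_q) - 0.5) / n_q
    qa <- quantile(a, probs, names = FALSE)
    qb <- quantile(b, probs, names = FALSE)

    # Means and standard deviations (dividing by n_q, not n_q - 1) of the
    # quantile values, so that the three terms add up to the distance
    mu_a <- mean(qa)
    mu_b <- mean(qb)
    sd_a <- sqrt(mean((qa - mu_a)^2))
    sd_b <- sqrt(mean((qb - mu_b)^2))
    rho <- cor(qa, qb)

    list(
        distance = mean((qa - qb)^2),
        location = (mu_a - mu_b)^2,
        size = (sd_a - sd_b)^2,
        shape = 2 * sd_a * sd_b * (1 - rho)
    )
}

set.seed(42)
a <- rnorm(200)
b <- a + 2    # pure location shift
res <- sq_wass_decomp_sketch(a, b)
```

For a pure location shift such as this one, the location term accounts for
essentially the whole distance, while the size and shape terms are close to
zero.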

The package `waddR` offers three functions to calculate the 2-Wasserstein
distance, all of which are implemented in C++ and exported to R with Rcpp for
better performance.
The function `wasserstein_metric` is a C++ reimplementation of the
function `wasserstein1d` from the package `transport` and offers the most exact
results.
The functions `squared_wass_approx` and `squared_wass_decomp` compute
approximations of the squared 2-Wasserstein distance, with `squared_wass_decomp`
also returning the decomposition terms for location, size, and shape.
See `?wasserstein_metric`, `?squared_wass_approx`, and `?squared_wass_decomp`.

### Two-Sample Testing

This package provides two testing procedures that use the 2-Wasserstein
distance to test whether two distributions F_A and F_B, given in the form of
samples, are different. Specifically, they test the null hypothesis
H0: F_A = F_B against the alternative hypothesis H1: F_A != F_B.

The first, semi-parametric (SP) procedure uses a permutation-based test
combined with a generalized Pareto distribution approximation to estimate small
p-values accurately.

The second procedure (ASY) uses a test based on asymptotic theory, which is
valid only if the samples can be assumed to come from continuous
distributions.

See `?wasserstein.test` for more details.
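
The permutation idea behind the semi-parametric procedure can be sketched in
base R as follows. This is a plain permutation test only: it leaves out the
generalized Pareto approximation for small p-values and is not the package's
implementation. The helpers `sq_wass_quantile` and `perm_test_sketch`, as well
as the arguments `n_q` and `n_perm`, are hypothetical names introduced for this
example.

```{r perm-test-sketch, eval=FALSE, echo=TRUE}
# Hypothetical helpers for illustration only; not part of waddR.

# Quantile-based approximation of the squared 2-Wasserstein distance
sq_wass_quantile <- function(a, b, n_q = 1000) {
    probs <- (seq_len(n_q) - 0.5) / n_q
    qa <- quantile(a, probs, names = FALSE)
    qb <- quantile(b, probs, names = FALSE)
    mean((qa - qb)^2)
}

# Plain permutation test of H0: F_A = F_B
perm_test_sketch <- function(a, b, n_perm = 1000) {
    observed <- sq_wass_quantile(a, b)
    pooled <- c(a, b)
    n_a <- length(a)

    # Reassign the pooled values to the two groups at random and recompute
    # the statistic for every permutation
    permuted <- replicate(n_perm, {
        idx <- sample(length(pooled), n_a)
        sq_wass_quantile(pooled[idx], pooled[-idx])
    })

    # The add-one correction keeps the estimated p-value above zero
    p_value <- (sum(permuted >= observed) + 1) / (n_perm + 1)
    list(statistic = observed, p_value = p_value)
}

set.seed(1)
a <- rnorm(100)
b <- rnorm(100, mean = 1)
perm_result <- perm_test_sketch(a, b)
```

With a plain permutation test, the smallest attainable p-value is
`1 / (n_perm + 1)`, which is why an approximation of the tail of the
permutation distribution helps when small p-values need to be resolved.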

### Single Cell Test

The `waddR` package provides an adaptation of the semi-parametric testing
procedure based on the 2-Wasserstein distance that is specifically tailored to
identifying differential distributions in single-cell RNA-sequencing
(scRNA-seq) data. In particular, a two-stage (TS) approach has been implemented
that takes account of the specific nature of scRNA-seq data by separately
testing for differential proportions of zero gene expression (using a logistic
regression model) and for differences in non-zero gene expression (using the
semi-parametric 2-Wasserstein distance-based test) between two conditions.

See the documentation of the single-cell procedure (`?wasserstein.sc`) and of
the test for zero expression levels (`?testZeroes`) for more details.
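
The split into two stages can be illustrated for a single gene with base R.
The sketch below tests for a difference in the proportion of zeros with a
logistic regression model and then separates out the non-zero values for the
second stage. The helper `zero_prop_test_sketch` is a hypothetical name, and
the use of a likelihood-ratio test without further covariates is an assumption
made for this example, not a description of `testZeroes`.

```{r two-stage-sketch, eval=FALSE, echo=TRUE}
# Hypothetical helper for illustration only; not part of waddR.
# Likelihood-ratio test for a difference in the proportion of zeros
# between two conditions, based on a logistic regression model.
zero_prop_test_sketch <- function(a, b) {
    is_zero <- as.integer(c(a, b) == 0)
    condition <- factor(rep(c("A", "B"), c(length(a), length(b))))
    full <- glm(is_zero ~ condition, family = binomial())
    null <- glm(is_zero ~ 1, family = binomial())
    anova(null, full, test = "Chisq")[2, "Pr(>Chi)"]
}

set.seed(1)
# Toy expression values for one gene: zeros mixed with positive values
a <- c(rep(0, 20), rexp(80))
b <- c(rep(0, 60), rexp(40))

# Stage 1: differential proportions of zero expression
p_zero <- zero_prop_test_sketch(a, b)

# Stage 2: only the non-zero values enter the distribution comparison
nonzero_a <- a[a > 0]
nonzero_b <- b[b > 0]
```

In the two-stage approach, the non-zero values would then be compared with the
semi-parametric 2-Wasserstein test described in the previous section.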


## Installation

To install `waddR` from Bioconductor, use `BiocManager` with the following commands:

```{r install, eval=FALSE, echo=TRUE}
# Install BiocManager first if it is not yet available
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("waddR")
```

Using `BiocManager`, the package can also be installed directly from GitHub:

```{r install-github, eval=FALSE, echo=TRUE}
BiocManager::install("goncalves-lab/waddR")
```

The package `waddR` can then be used in R:

```{r load-package}
library("waddR")
```

## Session Info

```{r session-info}
sessionInfo()
```