---
title: "SparseArray objects"
author:
- name: Hervé Pagès
  affiliation: Fred Hutchinson Cancer Research Center, Seattle, WA
date: "Compiled `r BiocStyle::doc_date()`; Modified 8 April 2024"
package: SparseArray
vignette: |
  %\VignetteIndexEntry{SparseArray objects}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
output:
  BiocStyle::html_document
---


```{r setup, include=FALSE}
library(BiocStyle)
```

# Introduction

`r Biocpkg("SparseArray")` is an infrastructure package that provides
an array-like container for efficient in-memory representation of
multidimensional sparse data in R.

The package defines the SparseArray virtual class and two concrete subclasses:
COO\_SparseArray and SVT\_SparseArray. Each subclass uses its own internal
representation of the nonzero multidimensional data, the "COO layout"
and the "SVT layout", respectively.
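
To make the idea of a coordinate ("COO") layout more concrete, the base-R sketch below stores a small matrix as a matrix of array coordinates plus a parallel vector of nonzero values, then rebuilds the dense matrix from them. The names `demo_m`, `coo_coords` and `coo_values` are illustrative only. This conveys the general idea and is not necessarily how COO\_SparseArray objects store their data internally.

```{r}
## Illustrative base-R sketch of a coordinate ("COO") layout.
demo_m <- matrix(0L, nrow=4, ncol=3)
demo_m[c(2, 5, 12)] <- c(7L, 8L, 9L)

## One row per nonzero element, holding its (row, col) coordinates ...
coo_coords <- which(demo_m != 0L, arr.ind=TRUE)
## ... and a parallel vector holding the nonzero values.
coo_values <- demo_m[coo_coords]
coo_coords
coo_values

## Rebuild the dense matrix from the coordinates and the values.
rebuilt <- matrix(0L, nrow=nrow(demo_m), ncol=ncol(demo_m))
rebuilt[coo_coords] <- coo_values
identical(rebuilt, demo_m)  # TRUE
```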

Note that the SparseArray virtual class could easily be extended by other
S4 classes that intend to implement alternative internal representations
of the nonzero multidimensional data.

This vignette focuses on the SVT\_SparseArray container.

# Install and load the package

```{r, eval=FALSE}
if (!require("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("SparseArray")
```

```{r, message=FALSE}
library(SparseArray)
```


# SVT\_SparseArray objects

The SVT\_SparseArray container provides an efficient representation of the
nonzero multidimensional data via a novel layout called the "SVT layout".

Note that SVT\_SparseArray objects mimic as much as possible the behavior of
ordinary matrix and array objects in base R. In particular, they support
most of the "standard matrix and array API" defined in base R and in the
`r CRANpkg("matrixStats")` package from CRAN.

## Construction

SVT\_SparseArray objects can be constructed in many ways. A common way
is to coerce an ordinary matrix or array to SVT\_SparseArray:
```{r}
m <- matrix(0L, nrow=6, ncol=4)
m[c(1:2, 8, 10, 15:17, 24)] <- (1:8)*10L
svt1 <- as(m, "SVT_SparseArray")
svt1

a <- array(0L, 5:3)
a[c(1:2, 8, 10, 15:17, 20, 24, 40, 56:60)] <- (1:15)*10L
svt2 <- as(a, "SVT_SparseArray")
svt2
```
Alternatively, the ordinary matrix or array can be passed to the
`SVT_SparseArray` constructor function:
```{r}
svt1 <- SVT_SparseArray(m)
svt2 <- SVT_SparseArray(a)
```

Note that coercing an ordinary matrix or array to SparseArray is also
supported and will produce the same results:
```{r}
svt1 <- as(m, "SparseArray")
svt2 <- as(a, "SparseArray")
```
This is because, for most use cases, the SVT\_SparseArray representation
is more efficient than the COO\_SparseArray representation, so the former
is usually preferred over the latter.

For the same reason, the `SparseArray` constructor function also gives
preference to the SVT\_SparseArray representation:
```{r}
svt1 <- SparseArray(m)
svt2 <- SparseArray(a)
```
This is actually the most convenient way to turn an ordinary matrix or
array into an SVT\_SparseArray object.

Coercion back to an ordinary matrix or array is supported:
```{r}
as.array(svt1)  # same as as.matrix(svt1)

as.array(svt2)
```

## Accessors

The standard array accessors are supported:
```{r}
dim(svt2)

length(svt2)

dimnames(svt2) <- list(NULL, letters[1:4], LETTERS[1:3])
svt2
```

Some additional accessors defined in the `r Biocpkg("S4Arrays")` /
`r Biocpkg("SparseArray")` framework:
```{r}
type(svt1)

type(svt1) <- "double"
svt1

is_sparse(svt1)
```
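
For an ordinary matrix, the base-R counterpart of `type(x) <- "double"` is changing the storage mode of the matrix. The base-R snippet below (with the illustrative name `demo_int`) shows this analogous operation, which is the kind of behavior that SVT\_SparseArray objects aim to mimic.

```{r}
## Base-R analogue, on an ordinary matrix, of 'type(x) <- "double"'.
demo_int <- matrix(c(0L, 3L, 0L, 5L), nrow=2)
typeof(demo_int)                    # "integer"
storage.mode(demo_int) <- "double"  # dim is preserved, values are converted
typeof(demo_int)                    # "double"
demo_int
```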

Other accessors/extractors specific to sparse arrays:
```{r}
## Get the number of nonzero array elements in 'svt1':
nzcount(svt1)

## Extract the "linear indices" of the nonzero array elements in 'svt1':
nzwhich(svt1)

## Extract the "array indices" (a.k.a. "array coordinates") of the
## nonzero array elements in 'svt1':
nzwhich(svt1, arr.ind=TRUE)

## Extract the values of the nonzero array elements in 'svt1' and return
## them in a vector "parallel" to 'nzwhich(svt1)':
#nzvals(svt1)  # NOT READY YET!

## Get the proportion of zero-valued array elements in 'svt1':
sparsity(svt1)
```
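
To clarify what these functions return, the base-R snippet below computes the corresponding quantities directly on an ordinary matrix with the same nonzero positions and values as `m` (under the illustrative name `demo_dense`). It assumes that `sparsity()` is the proportion of zero-valued elements; see `?SparseArray` for the authoritative definitions.

```{r}
demo_dense <- matrix(0L, nrow=6, ncol=4)
demo_dense[c(1:2, 8, 10, 15:17, 24)] <- (1:8) * 10L

is_nz <- demo_dense != 0L
sum(is_nz)                            # number of nonzero elements (cf. nzcount())
which(is_nz)                          # linear indices (cf. nzwhich())
which(is_nz, arr.ind=TRUE)            # array indices (cf. nzwhich(arr.ind=TRUE))
demo_dense[is_nz]                     # nonzero values, parallel to which(is_nz)
1 - sum(is_nz) / length(demo_dense)   # proportion of zeros (cf. sparsity())
```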

See `?SparseArray` for more information and additional examples.

## Subsetting and subassignment

```{r}
svt2[5:3, , "C"]
```

As with ordinary arrays in base R, assigning values of type `"double"` to
an SVT\_SparseArray object of type `"integer"` automatically changes the
type of the object to `"double"`:
```{r}
type(svt2)
svt2[5, 1, 3] <- NaN
type(svt2)
```
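
For reference, this is the base-R behavior being mimicked: assigning a double value (here `NaN`) into an integer array silently promotes the whole array to type `"double"`. The name `demo_arr` is illustrative.

```{r}
## Base-R type promotion on an ordinary 3-dimensional array.
demo_arr <- array(0L, dim=c(2, 2, 2))
typeof(demo_arr)            # "integer"
demo_arr[2, 1, 2] <- NaN
typeof(demo_arr)            # "double"
is.nan(demo_arr[2, 1, 2])   # TRUE
```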

See `?SparseArray_subsetting` for more information and additional examples.

## Summarization methods (whole array)

The following summarization methods are provided at the moment: `anyNA()`,
`any()`, `all()`, `min()`, `max()`, `range()`, `sum()`, `prod()`, `mean()`,
`var()`, and `sd()`.

```{r}
anyNA(svt2)

range(svt2, na.rm=TRUE)

mean(svt2, na.rm=TRUE)

var(svt2, na.rm=TRUE)
```
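
Whole-array summaries such as `mean()` and `var()` take every element into account, including the zeros. The base-R sketch below shows that they can nevertheless be obtained from the nonzero values and the total number of elements alone, because zeros contribute nothing to the sum or to the sum of squares. The helper `sparse_mean_var()` is hypothetical, ignores `NA` handling, and is not a description of how the package computes these summaries.

```{r}
## Hypothetical helper: mean and variance of an array with 'n' elements in
## total, whose nonzero values are 'nzvals' (all other elements are zero).
sparse_mean_var <- function(nzvals, n) {
    s1 <- sum(nzvals)      # zeros add nothing to the sum ...
    s2 <- sum(nzvals^2)    # ... or to the sum of squares
    mu <- s1 / n
    c(mean=mu, var=(s2 - n * mu^2) / (n - 1))
}

demo_a <- array(0, dim=c(5, 4, 3))
demo_a[c(1:2, 8, 10, 15:17, 20, 24, 40, 56:60)] <- (1:15) * 10
demo_nz <- demo_a[demo_a != 0]

sparse_mean_var(demo_nz, length(demo_a))            # uses nonzeros only
c(mean=mean(demo_a), var=var(as.vector(demo_a)))    # dense computation
```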

See `?SparseArray_summarization` for more information and additional examples.

## Operations from the 'Ops', 'Math', 'Math2', and 'Complex' groups

SVT\_SparseArray objects support operations from the 'Ops', 'Math', 'Math2',
and 'Complex' groups, with some restrictions.
See `?S4groupGeneric` in the *methods* package for more
information about these group generics.

```{r}
signif((svt1^1.5 + svt1) %% 100 - 0.6 * svt1, digits=2)
```

See `?SparseArray_Ops`, `?SparseArray_Math`, and `?SparseArray_Complex`
for more information and additional examples.
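
An elementwise function fits naturally with sparse storage when it maps zero to zero, because the result can then have no more nonzero elements than the input. Every step of the `signif(...)` example above has this property (`0^1.5`, `0 %% 100`, `0.6 * 0`, and `signif(0)` are all zero). The base-R check below tests this property for a few 'Math' functions; it only illustrates the idea, and which operations are actually supported on SVT\_SparseArray objects is documented in the man pages listed above.

```{r}
## For each function f, does f(0) == 0 hold? If so, applying f elementwise
## to a sparse array cannot turn a zero into a nonzero value.
math_funs <- list(sqrt=sqrt, abs=abs, sin=sin, expm1=expm1, floor=floor,
                  exp=exp, cos=cos, log=log)
maps_zero_to_zero <- vapply(math_funs, function(f) isTRUE(f(0) == 0),
                            logical(1))
maps_zero_to_zero
```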

## Other operations on SVT\_SparseArray objects

More operations will be added in the future, e.g. `is.na()`, `is.infinite()`,
and `is.nan()`.

## Generate a random SVT\_SparseArray object

Two convenience functions, `randomSparseArray()` and `poissonSparseArray()`,
are provided for this:
```{r}
randomSparseArray(c(5, 6, 2), density=0.5)

poissonSparseArray(c(5, 6, 2), density=0.5)
```
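
The `density` argument refers to the proportion of nonzero elements. For Poisson-distributed values, a target density `d` can be matched through the rate: if `X` follows a Poisson distribution with rate `lambda`, then `P(X != 0) = 1 - exp(-lambda)`, so `lambda = -log(1 - d)` gives an expected density of `d`. The base-R check below illustrates this relationship on a long dense vector (names prefixed with `demo_` are illustrative); it is not a description of how `poissonSparseArray()` is implemented.

```{r}
set.seed(123)
demo_d <- 0.5                      # target proportion of nonzero values
demo_lambda <- -log(1 - demo_d)    # rate such that 1 - exp(-lambda) == d
demo_draws <- rpois(1e6, demo_lambda)
mean(demo_draws != 0)              # close to 0.5
```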

See `?randomSparseArray` for more information and additional examples.


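# SVT\_SparseMatrix objects

An SVT\_SparseMatrix object is a two-dimensional SVT\_SparseArray object,
like `svt1` above. This section covers operations that are specific to, or
most often used with, sparse matrices, although some of them also accept
multidimensional objects.
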
## Transposition

```{r}
t(svt1)
```

Note that multidimensional transposition is supported via `aperm()`:
```{r}
aperm(svt2)
```

See `?SparseArray_aperm` for more information and additional examples.
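
When called without a `perm` argument, base R's `aperm()` reverses the order of the dimensions of an ordinary array, and `aperm(svt2)` above is expected to follow the same convention (5 x 4 x 3 becomes 3 x 4 x 5). A quick base-R check, using the illustrative name `demo_a3`:

```{r}
demo_a3 <- array(seq_len(60), dim=5:3)
dim(aperm(demo_a3))      # 3 4 5: dimensions reversed
## Element [i, j, k] of the input ends up at position [k, j, i].
identical(aperm(demo_a3)[3, 2, 1], demo_a3[1, 2, 3])  # TRUE
```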

## Combine multidimensional objects along a given dimension

Like ordinary matrices in base R, SVT\_SparseMatrix objects can be
combined by rows or columns, with `rbind()` or `cbind()`:
```{r}
svt3 <- poissonSparseMatrix(6, 2, density=0.5)

cbind(svt1, svt3)
```

Note that multidimensional objects can be combined along any dimension
with `abind()`:
```{r}
svt4a <- poissonSparseArray(c(5, 6, 2), density=0.4)
svt4b <- poissonSparseArray(c(5, 6, 5), density=0.2)
svt4c <- poissonSparseArray(c(5, 6, 4), density=0.2)
abind(svt4a, svt4b, svt4c)

svt5a <- aperm(svt4a, c(1, 3:2))
svt5b <- aperm(svt4b, c(1, 3:2))
svt5c <- aperm(svt4c, c(1, 3:2))
abind(svt5a, svt5b, svt5c, along=2)
```
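
Binding along the last dimension is the simplest case: since R stores arrays in column-major order, the data of the result is just the concatenation of the data of the inputs. The base-R sketch below uses a hypothetical helper, `bind_last()`, on ordinary arrays with the same shapes as `svt4a`, `svt4b`, and `svt4c`, then binds along dimension 2 by moving that dimension last with `aperm()`, binding, and moving it back, mirroring the `svt5*` example. It is only an illustration, not the implementation of `abind()`.

```{r}
## Hypothetical helper: bind ordinary arrays along their last dimension.
## All inputs must agree on every dimension except the last one.
bind_last <- function(...) {
    xs <- list(...)
    d <- dim(xs[[1L]])
    n <- length(d)
    stopifnot(all(vapply(xs, function(x) identical(dim(x)[-n], d[-n]),
                         logical(1))))
    last <- sum(vapply(xs, function(x) dim(x)[n], integer(1)))
    array(unlist(xs, use.names=FALSE), dim=c(d[-n], last))
}

demo_x1 <- array(1, dim=c(5, 6, 2))
demo_x2 <- array(2, dim=c(5, 6, 5))
demo_x3 <- array(3, dim=c(5, 6, 4))
dim(bind_last(demo_x1, demo_x2, demo_x3))   # 5 6 11

## Same shapes as svt5a, svt5b, svt5c (5 x 2 x 6, 5 x 5 x 6, 5 x 4 x 6).
p <- c(1, 3, 2)   # swaps dimensions 2 and 3 (its own inverse)
demo_z1 <- aperm(demo_x1, p)
demo_z2 <- aperm(demo_x2, p)
demo_z3 <- aperm(demo_x3, p)
## Bind along dimension 2: move it last, bind, then move it back.
demo_y <- aperm(bind_last(aperm(demo_z1, p), aperm(demo_z2, p),
                          aperm(demo_z3, p)), p)
dim(demo_y)                                 # 5 11 6
```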

See `?SparseArray_abind` for more information and additional examples.

## Matrix multiplication and cross-product

Like ordinary matrices in base R, SVT\_SparseMatrix objects can be
multiplied with the `%*%` operator:
```{r}
m6 <- matrix(0L, nrow=5, ncol=6, dimnames=list(letters[1:5], LETTERS[1:6]))
m6[c(2, 6, 12:17, 22:30)] <- 101:117
svt6 <- SparseArray(m6)

svt6 %*% svt3
```

They also support `crossprod()` and `tcrossprod()`:
```{r}
crossprod(svt3)
```
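
For an ordinary matrix `x`, `crossprod(x)` computes `t(x) %*% x` and `tcrossprod(x)` computes `x %*% t(x)`; the SVT\_SparseMatrix methods are expected to return the same quantities. A base-R check on a small matrix (`demo_x` is an illustrative name):

```{r}
demo_x <- matrix(c(0, 2, 0, 1, 0, 3), nrow=3)
all.equal(crossprod(demo_x), t(demo_x) %*% demo_x)    # TRUE (2 x 2 result)
all.equal(tcrossprod(demo_x), demo_x %*% t(demo_x))   # TRUE (3 x 3 result)
```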

See `?SparseMatrix_mult` for more information and additional examples.

## matrixStats methods

The `r Biocpkg("SparseArray")` package provides memory-efficient col/row
summarization methods for SVT\_SparseMatrix objects:
```{r}
colVars(svt6)
```

Note that multidimensional objects are supported:
```{r}
colVars(svt2)
colVars(svt2, dims=2)
colAnyNAs(svt2)
colAnyNAs(svt2, dims=2)
```
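
The `dims` argument above appears to follow the convention of base R's `colSums()`: with `dims=k`, the summary is taken over the first `k` dimensions and the result has the remaining dimensions (see `?matrixStats_methods` to confirm). For an ordinary 5 x 4 x 3 array, named `demo_b` here for illustration, the base-R equivalents are:

```{r}
demo_b <- array(as.numeric(1:60), dim=5:3)

## dims=1: summarize over dimension 1; the result is a 4 x 3 matrix.
all.equal(colSums(demo_b), apply(demo_b, c(2, 3), sum))   # TRUE
apply(demo_b, c(2, 3), var)                  # cf. colVars(svt2)

## dims=2: summarize over dimensions 1 and 2; the result has length 3.
all.equal(colSums(demo_b, dims=2), apply(demo_b, 3, sum))  # TRUE
apply(demo_b, 3, function(slice) var(as.vector(slice)))    # cf. colVars(svt2, dims=2)
```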

See `?matrixStats_methods` for more information and additional examples.

## `rowsum()` method

A `rowsum()` method is provided:
```{r}
rowsum(svt6, group=c(1:3, 2:1))
```
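
This method is meant to mirror base R's `rowsum()` for ordinary matrices: rows that share the same `group` value are summed, and the result has one row per distinct group, in sorted order. With `group=c(1:3, 2:1)`, rows 1 and 5 go to group 1, rows 2 and 4 to group 2, and row 3 to group 3, as this base-R example (with the illustrative name `demo_r`) shows:

```{r}
demo_r <- matrix(1:10, nrow=5)   # columns are 1:5 and 6:10
## group 1 = rows 1 + 5, group 2 = rows 2 + 4, group 3 = row 3
rowsum(demo_r, group=c(1:3, 2:1))
```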

See `?rowsum_methods` for more information and additional examples.

## Read/write a sparse matrix from/to a CSV file

Use `writeSparseCSV()` to write a sparse matrix to a CSV file:
```{r}
csv_file <- tempfile()
writeSparseCSV(m6, csv_file)
```

Use `readSparseCSV()` to read the file. This will import the data as
an SVT\_SparseMatrix object:
```{r}
readSparseCSV(csv_file)
```

See `?readSparseCSV` for more information and additional examples.


# Comparison with dgCMatrix objects

The nonzero data of an SVT\_SparseArray object is stored in a _Sparse
Vector Tree_. This internal data representation is referred to as
the "SVT layout". It is similar to the "CSC layout" (Compressed Sparse
Column format) used by CsparseMatrix derivatives from
the `r CRANpkg("Matrix")` package, like dgCMatrix or lgCMatrix objects,
but with the following improvements:

- The "SVT layout" supports sparse arrays of arbitrary dimensions.

- With the "SVT layout", the sparse data can be of any type, whereas
  CsparseMatrix derivatives only support sparse data of type `"double"`
  or `"logical"` at the moment.

- The "SVT layout" imposes no limit on the number of nonzero array
  elements that can be stored. With dgCMatrix/lgCMatrix objects, this
  number must be < 2^31.

- Overall, the "SVT layout" allows more efficient operations on
  SVT\_SparseArray objects.
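
As a rough illustration of the tree idea in the two-dimensional case, the base-R sketch below stores a matrix as a list with one element per column: `NULL` for an all-zero column, otherwise a pair of parallel vectors holding the row offsets and the values of the nonzero elements. The helpers `to_svt_sketch()` and `from_svt_sketch()` are hypothetical; the package's actual SVT layout is implemented in C and differs in its details.

```{r}
## Hypothetical 2D sketch of a "sparse vector tree": one list element per
## column, either NULL (no nonzero values) or a (row offsets, values) pair.
to_svt_sketch <- function(x) {
    lapply(seq_len(ncol(x)), function(j) {
        offs <- which(x[, j] != 0)
        if (length(offs) == 0L) NULL else list(offs=offs, vals=x[offs, j])
    })
}

## Rebuild a dense double matrix from the sketch.
from_svt_sketch <- function(svt, nrow) {
    x <- matrix(0, nrow=nrow, ncol=length(svt))
    for (j in seq_along(svt)) {
        leaf <- svt[[j]]
        if (!is.null(leaf))
            x[leaf$offs, j] <- leaf$vals
    }
    x
}

demo_s <- matrix(0, nrow=5, ncol=4)
demo_s[c(2, 4, 16, 18)] <- c(1.5, -2, 3, 4)
demo_svt <- to_svt_sketch(demo_s)
str(demo_svt)   # columns 2 and 3 are NULL: empty columns cost almost nothing
identical(from_svt_sketch(demo_svt, nrow(demo_s)), demo_s)  # TRUE
```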

See `?SVT_SparseArray` for more information.


# Learn more

Please consult the individual man pages in the `r Biocpkg("SparseArray")`
package to learn more about SVT\_SparseArray objects and about the
package. A good starting point is the man page for SparseArray
objects: `?SparseArray`.


# Session information

```{r}
sessionInfo()
```