CHANGES IN VERSION 0.32.0
-------------------------

NEW FEATURES

    o Add nzwhich() method for DelayedArray objects (block-processed).

    o Support %*%, crossprod(), and tcrossprod() between DelayedMatrix and
      COO_SparseMatrix objects.

SIGNIFICANT USER-VISIBLE CHANGES

    o The default realize() method now returns an SVT_SparseArray object
      instead of a SparseArraySeed object when the array-like object to
      realize is sparse and 'BACKEND' is NULL.

    o Coercing a DelayedArray object or derivative to SparseArray should be
      much more efficient (thanks to various tweaks that happened in the
      SparseArray and HDF5Array packages).

DEPRECATED AND DEFUNCT

    o Deprecate SparseArraySeed objects.

    o Deprecate OLD_extract_sparse_array() and read_sparse_block() generics
      and methods.

BUG FIXES

    o Fix bug in coercion from DelayedArray to SparseArray when the object
      to coerce has NAs. See
      https://github.com/Bioconductor/HDF5Array/issues/61

CHANGES IN VERSION 0.30.0
-------------------------

NEW FEATURES

    o rowsum()/colsum() now acknowledge the current "automatic realization
      backend" and "automatic BiocParallel BPPARAM". See
      '?DelayedArray::rowsum' for more information.

SIGNIFICANT USER-VISIBLE CHANGES

    o Rename supportedRealizationBackends -> registeredRealizationBackends.

    o Slightly modify the behavior of 'realize(x, BACKEND=NULL)'. See
      https://github.com/hansenlab/minfi/issues/256

    o Two important changes to matrix multiplication of DelayedMatrix
      objects.
      1. Now it returns an ordinary matrix by default (before this change
         an ordinary matrix was returned wrapped in a DelayedMatrix
         object). The user can change the default behavior by setting an
         "automatic realization backend". See ?DelayedArray::`%*%` for
         more information.
      2. Better block processing strategy when only one of the two operands
         is a DelayedMatrix object (or derivative). The new strategy
         acknowledges the geometry of the physical chunks of the data in
         the object. This can make a huge difference in some cases.
         For example, using a subset of the "1.3 Million Brain Cell
         Dataset" from 10x Genomics:

             library(HDF5Array)
             library(ExperimentHub)
             hub <- ExperimentHub()
             tenx <- TENxMatrix(hub[["EH1039"]], group="mm10")
             M <- tenx[ , 1:25000]
             m <- cbind(runif(ncol(M)), runif(ncol(M)))
             M %*% m

         Doing 'M %*% m' now takes 7.6s and uses 1.1Gb of memory, compared
         to 110s / 3.1Gb before this improvement. Furthermore, the new
         strategy operates in linear time and at constant memory:

                       with DelayedArray   with DelayedArray
             ncol(M)         0.29.2             < 0.29.2
             -------   -----------------   -----------------
               12500      4.3s / 1.1Gb        32s / 2.1Gb
               25000      7.6s / 1.1Gb       110s / 3.1Gb
               50000     13.4s / 1.1Gb       495s / 5.6Gb
              100000     24.0s / 1.2Gb      2409s / 9.1Gb

         Note that the new strategy is implemented in internal helpers
         DelayedArray:::BLOCK_mult_Lgrid() and
         DelayedArray:::BLOCK_mult_Rgrid(). When the two operands are
         DelayedMatrix objects, the old strategy (which is implemented in
         DelayedArray:::.super_BLOCK_mult()) is still used.

CHANGES IN VERSION 0.28.0
-------------------------

NEW FEATURES

    o Add coercion from DelayedArray to SparseArray.

    o Add efficient rowVars/colVars methods for DelayedMatrix objects.
      These methods, like all other row/col summarization methods
      implemented in the DelayedArray package, use block processing and can
      handle blocks of **arbitrary** geometry, that is, they can handle a
      grid of class ArbitraryArrayGrid (the most general type of grid).

    o Add 'useNames' arg to row/colMins, row/colMaxs, row/colRanges, and
      row/colVars methods for DelayedMatrix objects.

    o Add 'current_viewport' argument to set_grid_context().

SIGNIFICANT USER-VISIBLE CHANGES

    o DelayedArray now depends on S4Arrays and SparseArray.

    o Some improvements to the rowMeans/colMeans methods for DelayedMatrix
      objects.

CHANGES IN VERSION 0.26.0
-------------------------

- No changes in this version.

CHANGES IN VERSION 0.24.0
-------------------------

SIGNIFICANT USER-VISIBLE CHANGES

    o Move the aperm() S4 generic to BiocGenerics.
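
The 0.30.0 change to the return type of %*% (described above) can be checked
on a small in-memory example. This is only a minimal sketch: it assumes
DelayedArray >= 0.30.0 and explicitly unsets the automatic realization
backend. The objects 'M', 'm', and 'ref' are illustrative and unrelated to
the 10x Genomics benchmark.

```r
library(DelayedArray)

set.seed(1)
M <- DelayedArray(matrix(runif(20), nrow=4))  # 4 x 5 DelayedMatrix
m <- matrix(runif(10), ncol=2)                # ordinary 5 x 2 matrix

# Assumption: no automatic realization backend is set, so the product
# should come back as an ordinary matrix (behavior since 0.30.0).
setAutoRealizationBackend(NULL)

res <- M %*% m
class(res)

# Reference result computed entirely in memory with base R
ref <- as.matrix(M) %*% m
max(abs(res - ref))
```

Setting a backend with setAutoRealizationBackend() before the call is the
documented way to get the result realized to that backend instead.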

CHANGES IN VERSION 0.22.0
-------------------------

DEPRECATED AND DEFUNCT

    o The following functions are now defunct after being deprecated in
      previous versions of the package:
      - blockGrid(): replaced with defaultAutoGrid()
      - rowGrid(): replaced with rowAutoGrid()
      - colGrid(): replaced with colAutoGrid()
      - multGrids(): replaced with defaultMultAutoGrids()
      - linearInd(): replaced with Mindex2Lindex()
      - viewportApply(): replaced with gridApply()
      - viewportReduce(): replaced with gridReduce()
      - getRealizationBackend(): replaced with getAutoRealizationBackend()
      - setRealizationBackend(): replaced with setAutoRealizationBackend()
      - RealizationSink(): replaced with AutoRealizationSink()

BUG FIXES

    o Small tweak to updateObject() method for DelayedArray objects (see
      commit abcd154).

CHANGES IN VERSION 0.20.0
-------------------------

BUG FIXES

    o Fix long-standing bugs in dense2sparse():
      - mishandling of NAs/NaNs in input
      - 1D case didn't work

CHANGES IN VERSION 0.18.0
-------------------------

NEW FEATURES

    o Implement ConstantArray objects. The ConstantArray class is a
      DelayedArray subclass to efficiently mimic an array containing a
      constant value, without actually creating said array in memory.

    o Add scale() method for DelayedMatrix objects.

    o Add sinkApply(), a convenience function for walking on a
      RealizationSink derivative and filling it with blocks of data.

    o Proper support for dgRMatrix and lgRMatrix objects as DelayedArray
      object seeds:
      - is_sparse() now returns TRUE on dgRMatrix and lgRMatrix objects.
      - Support coercion back and forth between SparseArraySeed objects
        and dgRMatrix/lgRMatrix objects.
      - Add extract_sparse_array() methods for dgRMatrix and lgRMatrix
        objects.
      These changes bring the treatment of dgRMatrix and lgRMatrix objects
      to the same level as dgCMatrix and lgCMatrix objects.
      For example, wrapping a dgRMatrix or lgRMatrix object in a
      DelayedArray object will trigger the same sparse-optimized mechanisms
      during block processing as when wrapping a dgCMatrix or lgCMatrix
      object.

    o rbind() and cbind() on sparse DelayedArray objects are now fully
      supported.

    o Delayed operations of type DelayedUnaryIsoOpWithArgs now preserve
      sparsity when appropriate.

    o Implement DummyArrayGrid and DummyArrayViewport objects.

SIGNIFICANT USER-VISIBLE CHANGES

    o Rename viewportApply()/viewportReduce() -> gridApply()/gridReduce().

BUG FIXES

    o Subsetting of a DelayedArray object now propagates the
      names/dimnames, even when drop=TRUE and the result has only 1
      dimension (issue #78).

    o log() on a DelayedArray object now handles the 'base' argument.

    o Fix issue in is_sparse() methods for DelayedUnaryIsoOpStack and
      DelayedNaryIsoOp objects.

    o cbind()/rbind() no longer coerce supplied objects to type of 1st
      object (commit f1279e07).

    o Fix small issue in dim() setter (commit c9488537).

CHANGES IN VERSION 0.16.0
-------------------------

NEW FEATURES

    o Added 'as.sparse' argument to read_block() (see ?read_block) and to
      AutoRealizationSink() (see ?AutoRealizationSink).

    o SparseArraySeed objects now can hold dimnames. As a consequence
      read_block() now also propagates the dimnames to sparse blocks, not
      just to dense blocks.

    o Matrix multiplication is now sparse-aware via sparseMatrices.

    o Added is_sparse<- generic (with methods for HDF5Array/HDF5ArraySeed
      objects only, see ?HDF5Array in the HDF5Array package).

    o Added viewportApply() and viewportReduce() to the blockApply()
      family.

    o Added set_grid_context() for testing/debugging callback functions
      passed to blockApply() and family.
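
As a quick illustration of the ConstantArray objects introduced in 0.18.0
(see above), the sketch below builds one and summarizes it with a
block-processed operation. It assumes the constructor signature
ConstantArray(dim, value); the dimensions and value are arbitrary choices
made for the example.

```r
library(DelayedArray)

# A 1000 x 500 array that mimics a matrix filled with 2.5,
# without allocating the 500000 values in memory.
CA <- ConstantArray(c(1000L, 500L), value=2.5)

is(CA, "DelayedArray")
dim(CA)

# sum() on a DelayedArray object is block-processed
total <- sum(CA)
total
```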

SIGNIFICANT USER-VISIBLE CHANGES

    o Renamed first write_block() argument 'x' -> 'sink'

    o Renamed:
        RealizationSink() -> AutoRealizationSink()
        get/setRealizationBackend() -> get/setAutoRealizationBackend()
        blockGrid() -> defaultAutoGrid()
        row/colGrid() -> row/colAutoGrid()

    o Improved support of sparse data:
      - Slightly more efficient coercion from SparseArraySeed to
        dgCMatrix/lgCMatrix (small speedup and memory footprint reduction).
        This provides a minor speedup to the sparse-aware block-processed
        row/col summarization methods for DelayedMatrix objects when the
        object is sparse. (These methods are: row/colSums(),
        row/colMeans(), row/colMins(), row/colMaxs(), and row/colRanges().
        The methods defined in DelayedMatrixStats are not sparse-aware yet
        so are not affected.)
      - Made the following block-processed operations on DelayedArray
        objects sparse-aware: anyNA(), which(), max(), min(), range(),
        sum(), prod(), any(), all(), and mean(), with a typical 50%-60%
        speedup when the DelayedArray object is sparse.
      - Implemented a bunch of methods to operate natively on
        SparseArraySeed objects. Their main purpose is to support the
        above, i.e. to support block-processed methods for DelayedArray
        objects like sum(), mean(), which(), etc... when the object is
        sparse. Note that more are needed to also support the sparse-aware
        block-processed row/col summarization methods for DelayedMatrix
        objects so we can finally ditch the costly coercion from
        SparseArraySeed to dgCMatrix/lgCMatrix that they currently rely on.

    o The utility functions for retrieving grid context for the current
      block/viewport should now be called with no argument (previously one
      needed to pass the current block to them). These functions are
      effectiveGrid(), currentBlockId(), and currentViewport().

    o DelayedArray now depends on the MatrixGenerics package.

BUG FIXES

    o Various fixes and improvements to block processing of sparse logical
      DelayedMatrix objects (e.g.
      DelayedMatrix object with a lgCMatrix seed from the Matrix package).

    o Fix extract_sparse_array() inefficiency on dgCMatrix and lgCMatrix
      objects.

    o Switch matrix multiplication to bplapply2() from bpiterate() to fix
      error handling.

CHANGES IN VERSION 0.14.0
-------------------------

NEW FEATURES

    o Support 'type(x) <- new_type' to change the type of a DelayedArray
      object.

    o 1D-style single bracket subsetting of DelayedArray objects now
      supports subsetting by a numeric matrix with one column per
      dimension.

SIGNIFICANT USER-VISIBLE CHANGES

    o No more parallel evaluation by default, that is, getAutoBPPARAM() now
      returns NULL on a fresh session instead of one of the parallelization
      backends defined in BiocParallel. It is now the responsibility of the
      user to set the parallelization backend (with setAutoBPPARAM()) if
      they want operations like matrix multiplication, rowsum() or
      rowSums() to use parallel evaluation again.
      Also BiocParallel has been moved from Depends to Suggests.

    o Replace arrayInd2() and linearInd() with Lindex2Mindex() and
      Mindex2Lindex(). The new functions are implemented in C for better
      performance and they properly handle L-index values greater than
      INT_MAX (2^31 - 1) in the input and output.

    o 2x speedup to coercion from DelayedArray to SparseArraySeed or
      dgCMatrix.

DEPRECATED AND DEFUNCT

    o arrayInd2() and linearInd() are now deprecated in favor of
      Lindex2Mindex() and Mindex2Lindex().

BUG FIXES

    o Fix handling of linear indices >= 2^31 in 1D-style single bracket
      subsetting of DelayedArray objects.

    o rowsum() & colsum() methods for DelayedArray objects now respect
      factor level ordering (issue #59).

    o Coercion from DelayedMatrix to dgCMatrix now propagates the dimnames.

    o No more quotes around the NA values of a DelayedArray of type
      "character".

    o Better error message when Ops methods for DelayedArray objects reject
      their operands.
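
The relationship between Lindex2Mindex() and Mindex2Lindex(), and the
matrix-style subsetting added in 0.14.0, can be sketched as follows. This is
a small illustration only: the array dimensions and positions are arbitrary,
and the variable names ('Lindex', 'Mindex', 'A') are made up for the example.

```r
library(DelayedArray)

dim <- c(4L, 3L, 2L)
Lindex <- c(1, 5, 24)   # linear positions in a 4 x 3 x 2 array

# L-index -> M-index: one row per position, one column per dimension
Mindex <- Lindex2Mindex(Lindex, dim)
Mindex

# M-index -> L-index: the reverse transformation
Lindex2 <- Mindex2Lindex(Mindex, dim)
Lindex2

# 1D-style subsetting of a DelayedArray object by a numeric matrix
# with one column per dimension
A <- DelayedArray(array(1:24, dim=dim))
vals <- A[Mindex]
vals
```

For arrays small enough to be indexed with ordinary integers, the M-index
returned here should agree with base::arrayInd(Lindex, dim).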

CHANGES IN VERSION 0.12.0
-------------------------

NEW FEATURES

    o Add isPristine()

    o Delayed subassignment now accepts a right value with dimensions that
      are not strictly the same as the dimensions of the selection as long
      as the "effective dimensions" are the same

    o Small improvement to delayed dimnames setter: atomic vectors or
      factors in the supplied 'dimnames' list are now accepted and passed
      through as.character()

SIGNIFICANT USER-VISIBLE CHANGES

    o Improve show() method for DelayedArray objects (see commit 54540856)

BUG FIXES

    o Setting and getting the dimnames of a DelayedArray object or
      derivative now preserves the names on the dimnames

    o Some fixes related to DelayedArray objects with list array seeds (see
      commit 6c94eac7)

CHANGES IN VERSION 0.10.0
-------------------------

NEW FEATURES

    o Many improvements to matrix multiplication (%*%) of DelayedMatrix
      objects by Aaron Lun. Also add limited support for (t)crossprod
      methods.

    o Add rowsum() and colsum() methods for DelayedMatrix objects. These
      methods are block-processed operations.

    o Many improvements to the RleArray() constructor (see messages for
      commits 582234a7 and 0a36ee01 for more info).

    o Add seedApply()

    o Add multGrids() utility (still a work-in-progress, not documented
      yet)

CHANGES IN VERSION 0.8.0
------------------------

NEW FEATURES

    o Add get/setAutoBlockSize(), getAutoBlockLength(),
      get/setAutoBlockShape() and get/setAutoGridMaker().

    o Add rowGrid() and colGrid(), in addition to blockGrid().

    o Add get/setAutoBPPARAM() to control the automatic 'BPPARAM' used by
      blockApply().

    o Reduce memory usage when realizing a sparse DelayedArray to disk

      On-disk realization of a DelayedArray object that is reported to be
      sparse (by is_sparse()) to a "sparsity-optimized" backend (i.e. to a
      backend with a memory efficient write_sparse_block() like the
      TENxMatrix backend implemented in the HDF5Array package) now
      preserves sparse representation of the data all the way.
      More precisely, each block of data is now kept in a sparse form
      during the 3 steps that it goes through: read from seed, realize in
      memory, and write to disk.

    o showtree() now displays whether a tree node or leaf is considered
      sparse or not.

    o Enhance "aperm" method and dim() setter for DelayedArray objects. In
      addition to allowing dropping "ineffective dimensions" (i.e.
      dimensions equal to 1) from a DelayedArray object, aperm() and the
      dim() setter now allow adding "ineffective dimensions" to it.

    o Enhance subassignment to a DelayedArray object.
      So far subassignment to a DelayedArray object only supported the
      **linear form** (i.e. x[i] <- value) with strong restrictions (the
      subscript 'i' must be a logical DelayedArray of the same dimensions
      as 'x', and 'value' must be an ordinary vector of length 1).
      In addition to this linear form, subassignment to a DelayedArray
      object now supports the **multi-dimensional form** (e.g.
      x[3:1, , 6] <- 0). In this form, one subscript per dimension is
      supplied, and each subscript can be missing or be anything that
      multi-dimensional subassignment to an ordinary array supports. The
      replacement value (a.k.a. the right value) can be an array-like
      object (e.g. ordinary array, dgCMatrix object, DelayedArray object,
      etc...) or an ordinary vector of length 1. Like the linear form, the
      multi-dimensional form is also implemented as a delayed operation.

    o Re-implement internal helper simple_abind() in C and support long
      arrays. simple_abind() is the workhorse behind realization of
      arbind() and acbind() operations on DelayedArray objects.

    o Add "table" and (restricted) "unique" methods for DelayedArray
      objects, both block-processed.

    o range() (block-processed) now supports the 'finite' argument on a
      DelayedArray object.

    o %*% (block-processed) now works between a DelayedMatrix object and an
      ordinary vector.

    o Improve support for DelayedArray of type "list".

    o Add TENxMatrix to list of supported realization backends.

    o Add backend-agnostic RealizationSink() constructor.

    o Add linearInd() utility for turning array indices into linear
      indices. Note that linearInd() performs the reverse transformation of
      base::arrayInd().

    o Add low-level utilities mapToGrid() and mapToRef() for mapping
      reference array positions to grid positions and vice-versa.

    o Add downsample() for reducing the "resolution" of an ArrayGrid
      object.

    o Add maxlength() generic and methods for ArrayGrid objects.

SIGNIFICANT USER-VISIBLE CHANGES

    o Multi-dimensional subsetting is no longer delayed when drop=TRUE and
      the result has only one dimension. In this case the result now is
      returned as an **ordinary** vector (atomic or list). This is the only
      case of multi-dimensional single bracket subsetting that is not
      delayed.

    o Rename defaultGrid() -> blockGrid(). The 'max.block.length' argument
      is replaced with the 'block.length' argument. 2 new arguments are
      added: 'chunk.grid' and 'block.shape'.

    o Major improvements to the block processing mechanism.
      All block-processed operations (except realization by block) now
      support blocks of **arbitrary** geometry instead of column-oriented
      blocks only. 'blockGrid(x)', which is called by the block-processed
      operations to get the grid of blocks to use on 'x', has the following
      new features:
      1) It's "chunk aware". This means that, when the chunk grid is known
         (i.e. when 'chunkGrid(x)' is not NULL), 'blockGrid(x)' defines
         blocks that are "compatible" with the chunks, i.e. any chunk is
         fully contained in a block. In other words, blocks are chosen so
         that chunks don't cross their boundaries.
      2) When the chunk grid is unknown (i.e. when 'chunkGrid(x)' is NULL),
         blocks are "isotropic", that is, they're as close as possible to a
         hypercube instead of being "column-oriented" (column-oriented
         blocks, also known as "linear blocks", are elongated along the 1st
         dimension, then along the 2nd dimension, etc...)
      3) The returned grid has the lowest "resolution" compatible with
         'getAutoBlockSize()', that is, the blocks are made as big as
         possible as long as their size in memory doesn't exceed
         'getAutoBlockSize()'. Note that this is not a new feature. What is
         new though is that an exception now is made when the chunk grid is
         known and some chunks are >= 'getAutoBlockSize()', in which case
         'blockGrid(x)' returns a grid that is the same as the chunk grid.
      These new features are supposed to make the returned grid "optimal"
      for block processing. (Some benchmarks still need to be done to
      confirm/quantify this.)

    o The automatic block size now is set to 100 Mb (instead of 4.5 Mb
      previously) at package startup. Use setAutoBlockSize() to change the
      automatic block size.

    o No more 'BPREDO' argument to blockApply().

    o Replace block_APPLY_and_COMBINE() with blockReduce().

BUG FIXES

    o No-op operations on a DelayedArray derivative really act like no-ops.
      Operating on a DelayedArray derivative (e.g. RleArray, HDF5Array or
      GDSArray) will now return an object of the original class if the
      result is "pristine" (i.e. if it doesn't carry delayed operations)
      instead of degrading the object to a DelayedArray instance. This
      applies for example to 't(t(x))' or 'dimnames(x) <- dimnames(x)'
      etc...
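
The link between the automatic block size and the grid used for block
processing, described in the 0.8.0 entries above, can be sketched as below.
The sketch uses defaultAutoGrid(), which the 0.22.0 entry names as the
replacement for blockGrid(); the 1 Mb block size and the 1000 x 1000 matrix
are arbitrary choices made for the illustration.

```r
library(DelayedArray)

# 1000 x 1000 matrix of type "double" (8 bytes per element), in-memory
# seed, so the chunk grid is unknown
x <- DelayedArray(matrix(runif(1e6), nrow=1000))

old_size <- getAutoBlockSize()
setAutoBlockSize(1e6)            # use 1 Mb blocks for this illustration

grid <- defaultAutoGrid(x)       # grid of blocks used for block processing
dim(grid)                        # number of blocks along each dimension
nblock <- length(grid)           # total number of blocks
maxlen <- maxlength(grid)        # number of elements in the biggest block
limit <- getAutoBlockLength(type(x))  # max elements allowed per block

setAutoBlockSize(old_size)       # restore the previous setting
```

With these settings no block should hold more than 'limit' elements, so the
matrix has to be split into several blocks.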