TileDBArray 1.19.1
TileDB implements a framework for local and remote storage of dense and sparse arrays.
We can use this as a DelayedArray backend to provide an array-level abstraction,
thus allowing the data to be used in many places where an ordinary array or matrix might be used.
The TileDBArray package implements the necessary wrappers around TileDB-R
to support read/write operations on TileDB arrays within the DelayedArray framework.
Creating a TileDBArray is as easy as:
X <- matrix(rnorm(1000), ncol=10)
library(TileDBArray)
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 1.38083142 0.11234525 -2.29940851 . -0.2591990 0.8143135
## [2,] 0.02795202 -0.96746620 -1.84212152 . -1.0932518 0.1605919
## [3,] 0.72748657 1.65021429 -0.29649533 . -1.4741443 0.1014634
## [4,] -0.48274634 -1.38172755 -1.37907769 . -0.6874571 1.5974623
## [5,] -2.65772590 0.34393228 0.52446006 . 1.5964242 -1.3738002
## ... . . . . . .
## [96,] -0.84042850 -0.04264024 -0.83820161 . 0.8124257 -0.6096601
## [97,] 1.54777081 0.62924525 -1.15350556 . -0.1966158 0.6287125
## [98,] 0.59914658 0.48742728 -0.19632296 . 1.0127335 1.6184330
## [99,] -1.03566335 -0.17852972 -0.11883407 . -1.3192679 -0.8106710
## [100,] 0.96039118 0.08230523 0.58901001 . -2.3635629 0.2016447
Alternatively, we can use coercion methods:
as(X, "TileDBArray")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 1.38083142 0.11234525 -2.29940851 . -0.2591990 0.8143135
## [2,] 0.02795202 -0.96746620 -1.84212152 . -1.0932518 0.1605919
## [3,] 0.72748657 1.65021429 -0.29649533 . -1.4741443 0.1014634
## [4,] -0.48274634 -1.38172755 -1.37907769 . -0.6874571 1.5974623
## [5,] -2.65772590 0.34393228 0.52446006 . 1.5964242 -1.3738002
## ... . . . . . .
## [96,] -0.84042850 -0.04264024 -0.83820161 . 0.8124257 -0.6096601
## [97,] 1.54777081 0.62924525 -1.15350556 . -0.1966158 0.6287125
## [98,] 0.59914658 0.48742728 -0.19632296 . 1.0127335 1.6184330
## [99,] -1.03566335 -0.17852972 -0.11883407 . -1.3192679 -0.8106710
## [100,] 0.96039118 0.08230523 0.58901001 . -2.3635629 0.2016447
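As a quick sanity check, the sketch below coerces a small matrix and pulls the values back into memory for comparison. The object names (`mat`, `tdb`, `roundtrip`) and the seed are illustrative choices, not part of the package.

```r
library(TileDBArray)

# Small in-memory matrix (illustrative data only).
set.seed(42)
mat <- matrix(rnorm(50), ncol=5)

# Coerce to a TileDB-backed matrix; the backend is written to a temporary location.
tdb <- as(mat, "TileDBArray")

# The result is expected to be a DelayedArray subclass.
is_delayed <- is(tdb, "DelayedArray")

# Read everything back into an ordinary matrix and compare with the input.
roundtrip <- as.matrix(tdb)
same_values <- isTRUE(all.equal(roundtrip, mat))
```

Reading the full array back with as.matrix() is only sensible for small examples like this one; for large arrays the point of the backend is to avoid doing so.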
This process also works for sparse matrices:
Y <- Matrix::rsparsematrix(1000, 1000, density=0.01)
writeTileDBArray(Y)
## <1000 x 1000> sparse TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] 0 0 0 . 0 0
## [2,] 0 0 0 . 0 0
## [3,] 0 0 0 . 0 0
## [4,] 0 0 0 . 0 0
## [5,] 0 0 0 . 0 0
## ... . . . . . .
## [996,] 0.00 -0.51 0.00 . -0.51 0.00
## [997,] 0.00 0.00 0.00 . -1.50 0.00
## [998,] 0.00 0.00 0.00 . 0.00 0.00
## [999,] 0.00 0.00 1.30 . 0.00 0.00
## [1000,] 0.00 0.00 0.00 . 0.00 0.00
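To confirm that sparsity is carried through to the backend, one option is to query is_sparse() from the DelayedArray framework. This is a minimal sketch with made-up dimensions; the names `sp` and `tdb_sp` are illustrative.

```r
library(TileDBArray)

# Illustrative sparse matrix, smaller than the one above.
set.seed(1)
sp <- Matrix::rsparsematrix(200, 50, density=0.05)

# Write it to a temporary TileDB backend.
tdb_sp <- writeTileDBArray(sp)

# Ask the DelayedArray framework whether the object is treated as sparse.
sparse_flag <- is_sparse(tdb_sp)

# Compare the stored values against the original, ignoring attributes such as dimnames.
same_values <- isTRUE(all.equal(as.matrix(tdb_sp), as.matrix(sp), check.attributes=FALSE))
```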
Logical and integer matrices are supported:
writeTileDBArray(Y > 0)
## <1000 x 1000> sparse TileDBMatrix object of type "logical":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] FALSE FALSE FALSE . FALSE FALSE
## [2,] FALSE FALSE FALSE . FALSE FALSE
## [3,] FALSE FALSE FALSE . FALSE FALSE
## [4,] FALSE FALSE FALSE . FALSE FALSE
## [5,] FALSE FALSE FALSE . FALSE FALSE
## ... . . . . . .
## [996,] FALSE FALSE FALSE . FALSE FALSE
## [997,] FALSE FALSE FALSE . FALSE FALSE
## [998,] FALSE FALSE FALSE . FALSE FALSE
## [999,] FALSE FALSE TRUE . FALSE FALSE
## [1000,] FALSE FALSE FALSE . FALSE FALSE
As are matrices with dimension names:
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 1.38083142 0.11234525 -2.29940851 . -0.2591990 0.8143135
## GENE_2 0.02795202 -0.96746620 -1.84212152 . -1.0932518 0.1605919
## GENE_3 0.72748657 1.65021429 -0.29649533 . -1.4741443 0.1014634
## GENE_4 -0.48274634 -1.38172755 -1.37907769 . -0.6874571 1.5974623
## GENE_5 -2.65772590 0.34393228 0.52446006 . 1.5964242 -1.3738002
## ... . . . . . .
## GENE_96 -0.84042850 -0.04264024 -0.83820161 . 0.8124257 -0.6096601
## GENE_97 1.54777081 0.62924525 -1.15350556 . -0.1966158 0.6287125
## GENE_98 0.59914658 0.48742728 -0.19632296 . 1.0127335 1.6184330
## GENE_99 -1.03566335 -0.17852972 -0.11883407 . -1.3192679 -0.8106710
## GENE_100 0.96039118 0.08230523 0.58901001 . -2.3635629 0.2016447
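The logical case is shown above; the integer case can be sketched in the same way, here combined with dimension names. The matrix contents and the names `int_mat` and `tdb_int` are illustrative assumptions.

```r
library(TileDBArray)

# Small integer matrix with dimension names (illustrative data only).
int_mat <- matrix(1:20, ncol=4)
rownames(int_mat) <- sprintf("GENE_%i", seq_len(nrow(int_mat)))
colnames(int_mat) <- sprintf("SAMP_%i", seq_len(ncol(int_mat)))

# Write to a temporary TileDB backend.
tdb_int <- writeTileDBArray(int_mat)

# Inspect the reported element type and the dimension names that come back.
stored_type <- type(tdb_int)
stored_names <- dimnames(tdb_int)
```

Checking type() before downstream arithmetic can be useful, as integer and double arrays may behave differently under operations such as division.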
TileDBArrays are simply DelayedArray objects and can be manipulated as such.
The usual conventions for extracting data from matrix-like objects work as expected:
out <- as(X, "TileDBArray")
dim(out)
## [1] 100 10
head(rownames(out))
## [1] "GENE_1" "GENE_2" "GENE_3" "GENE_4" "GENE_5" "GENE_6"
head(out[,1])
## GENE_1 GENE_2 GENE_3 GENE_4 GENE_5 GENE_6
## 1.38083142 0.02795202 0.72748657 -0.48274634 -2.65772590 0.38678187
We can also perform manipulations like subsetting and arithmetic.
Note that these operations do not affect the data in the TileDB backend;
rather, they are delayed until the values are explicitly required,
hence the creation of the DelayedMatrix object.
out[1:5,1:5]
## <5 x 5> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5
## GENE_1 1.38083142 0.11234525 -2.29940851 0.37860961 -0.73310416
## GENE_2 0.02795202 -0.96746620 -1.84212152 0.83947867 -0.84077295
## GENE_3 0.72748657 1.65021429 -0.29649533 -0.98122026 -0.96431210
## GENE_4 -0.48274634 -1.38172755 -1.37907769 0.53896401 0.61715377
## GENE_5 -2.65772590 0.34393228 0.52446006 -1.17084123 -0.45635875
out * 2
## <100 x 10> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 2.76166283 0.22469051 -4.59881702 . -0.5183980 1.6286270
## GENE_2 0.05590404 -1.93493239 -3.68424304 . -2.1865036 0.3211838
## GENE_3 1.45497313 3.30042858 -0.59299066 . -2.9482886 0.2029268
## GENE_4 -0.96549268 -2.76345510 -2.75815538 . -1.3749143 3.1949247
## GENE_5 -5.31545181 0.68786455 1.04892012 . 3.1928483 -2.7476005
## ... . . . . . .
## GENE_96 -1.68085700 -0.08528048 -1.67640321 . 1.6248515 -1.2193201
## GENE_97 3.09554162 1.25849051 -2.30701111 . -0.3932315 1.2574250
## GENE_98 1.19829317 0.97485456 -0.39264592 . 2.0254671 3.2368659
## GENE_99 -2.07132671 -0.35705944 -0.23766814 . -2.6385358 -1.6213419
## GENE_100 1.92078235 0.16461046 1.17802003 . -4.7271257 0.4032894
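The sketch below illustrates the delayed behaviour on a small example: the subset-and-scale step yields a DelayedMatrix, values are only computed when as.matrix() is called, and the array on disk is left unchanged. The names `mat`, `tdb` and `doubled` are illustrative.

```r
library(TileDBArray)

# Illustrative data.
set.seed(7)
mat <- matrix(rnorm(200), ncol=10)
tdb <- as(mat, "TileDBArray")

# Subsetting and arithmetic are recorded as delayed operations.
doubled <- tdb[1:5, 1:5] * 2
is_delayed <- is(doubled, "DelayedMatrix")
is_tiledb <- is(doubled, "TileDBMatrix")  # no longer a pristine TileDBMatrix

# Values are computed only when explicitly requested.
in_memory <- as.matrix(doubled)
matches <- isTRUE(all.equal(in_memory, mat[1:5, 1:5] * 2))

# The stored data should still equal the original input.
backend_unchanged <- isTRUE(all.equal(as.matrix(tdb), mat))
```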
We can also do more complex matrix operations that are supported by DelayedArray:
colSums(out)
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5 SAMP_6 SAMP_7
## -15.634623 8.144075 -2.446777 -11.943200 -6.820110 -7.465234 -7.687889
## SAMP_8 SAMP_9 SAMP_10
## 1.457812 6.137428 17.114101
out %*% runif(ncol(out))
## [,1]
## GENE_1 0.977325786
## GENE_2 -0.352330171
## GENE_3 -1.058180084
## GENE_4 0.781628825
## GENE_5 1.879167046
## GENE_6 -1.097205868
## GENE_7 0.822702764
## GENE_8 -0.961042704
## GENE_9 -0.275206294
## GENE_10 -2.647718781
## GENE_11 2.435870495
## GENE_12 0.051938514
## GENE_13 -1.675637750
## GENE_14 2.120397959
## GENE_15 0.784310595
## GENE_16 0.738457159
## GENE_17 0.460796087
## GENE_18 -2.709968689
## GENE_19 -1.063648766
## GENE_20 -1.484448508
## GENE_21 -1.044001719
## GENE_22 -3.152664251
## GENE_23 0.838100026
## GENE_24 -1.550221409
## GENE_25 1.506130499
## GENE_26 0.257064720
## GENE_27 0.808565984
## GENE_28 1.147945433
## GENE_29 -1.075728429
## GENE_30 3.295278912
## GENE_31 2.103646631
## GENE_32 -0.481560026
## GENE_33 0.247948280
## GENE_34 -0.059023622
## GENE_35 2.774138223
## GENE_36 1.109725774
## GENE_37 0.981068307
## GENE_38 2.439823558
## GENE_39 -2.416270810
## GENE_40 3.280850287
## GENE_41 0.862182250
## GENE_42 1.383038645
## GENE_43 0.432670047
## GENE_44 0.335253085
## GENE_45 0.105310474
## GENE_46 1.030144627
## GENE_47 -1.079346467
## GENE_48 0.794620779
## GENE_49 -3.416657198
## GENE_50 -2.100890225
## GENE_51 0.829693666
## GENE_52 1.381881767
## GENE_53 -1.736469348
## GENE_54 -0.298212605
## GENE_55 0.495456988
## GENE_56 1.959833365
## GENE_57 -1.416548142
## GENE_58 -0.061356925
## GENE_59 -1.076574782
## GENE_60 -1.487970560
## GENE_61 -1.375909217
## GENE_62 1.162870910
## GENE_63 -2.252161740
## GENE_64 -0.311089835
## GENE_65 -2.613508706
## GENE_66 -1.905385111
## GENE_67 0.885706181
## GENE_68 1.597509874
## GENE_69 -1.006478769
## GENE_70 -2.255598250
## GENE_71 -1.903994425
## GENE_72 -0.962795841
## GENE_73 -0.364713763
## GENE_74 -0.041223195
## GENE_75 -2.871133330
## GENE_76 0.135781399
## GENE_77 1.161474064
## GENE_78 1.861421205
## GENE_79 0.370503314
## GENE_80 -2.913784342
## GENE_81 -0.870437542
## GENE_82 -0.358303159
## GENE_83 -0.066876806
## GENE_84 1.233836247
## GENE_85 -0.167944323
## GENE_86 1.922775445
## GENE_87 -0.729505395
## GENE_88 0.001137461
## GENE_89 0.441731608
## GENE_90 -0.304902582
## GENE_91 -0.055433200
## GENE_92 -0.171721783
## GENE_93 -0.307413755
## GENE_94 -2.927017075
## GENE_95 1.182634618
## GENE_96 -1.859619004
## GENE_97 -1.494893239
## GENE_98 0.766134047
## GENE_99 0.540590633
## GENE_100 -2.410936659
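One way to gain confidence in these operations is to compare them against the same computation on an ordinary matrix. The following is a minimal sketch on illustrative data; `mat`, `tdb` and `v` are made-up names.

```r
library(TileDBArray)

# Illustrative data held both in memory and in a TileDB backend.
set.seed(3)
mat <- matrix(rnorm(300), ncol=10)
tdb <- as(mat, "TileDBArray")

# Column sums computed from the TileDB-backed object and from the plain matrix.
cs_ok <- isTRUE(all.equal(colSums(tdb), colSums(mat)))

# Matrix-vector product, using the same random vector for both versions.
v <- runif(ncol(tdb))
prod_tdb <- as.matrix(tdb %*% v)
prod_ok <- isTRUE(all.equal(prod_tdb, mat %*% v, check.attributes=FALSE))
```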
We can adjust some parameters for creating the backend with appropriate arguments to writeTileDBArray().
For example, the code below controls the path to the backend
as well as the name of the attribute containing the data.
X <- matrix(rnorm(1000), ncol=10)
path <- tempfile()
writeTileDBArray(X, path=path, attr="WHEE")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 0.7007080 -0.2894618 -1.1555896 . 0.5858825 0.7694018
## [2,] 0.2805795 1.1564087 2.0067822 . -0.1386979 -0.8471824
## [3,] -2.4064770 -0.3508253 -0.7598218 . 2.6037669 1.3442125
## [4,] -1.0829533 0.5431087 -0.7205174 . 0.2765132 0.3419529
## [5,] -0.3333336 -0.8726008 1.4880949 . 0.5809294 -0.8857297
## ... . . . . . .
## [96,] -0.07450996 -0.73200729 -0.68242107 . -0.4687153 0.9380191
## [97,] 0.07299338 -1.20085845 1.07757277 . 1.1202139 -0.2574838
## [98,] 0.43846001 0.59021094 -0.16114681 . 1.3143597 -1.0806247
## [99,] 1.21969490 2.16030637 -0.03407063 . -0.4977162 -0.3597264
## [100,] 1.06547897 -0.24561029 -0.17459897 . 0.5275360 0.1874912
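Knowing the path and attribute name also makes it possible to reopen the same backend later. The sketch below assumes that the TileDBArray() constructor accepts a path together with an `attr` argument, as described in the package's reference manual; the variable names are illustrative.

```r
library(TileDBArray)

# Write an illustrative matrix to a known location under a custom attribute name.
mat <- matrix(rnorm(100), ncol=10)
path <- tempfile()
written <- writeTileDBArray(mat, path=path, attr="WHEE")

# Reopen the existing backend by path, naming the attribute to read
# (assumption: 'attr' is forwarded to the underlying seed constructor).
reopened <- TileDBArray(path, attr="WHEE")

# The reopened array should expose the same values as the input.
same_values <- isTRUE(all.equal(as.matrix(reopened), mat))
```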
As these arguments cannot be passed during coercion, we instead provide global variables that can be set or unset to affect the outcome.
path2 <- tempfile()
setTileDBPath(path2)
as(X, "TileDBArray") # uses path2 to store the backend.
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 0.7007080 -0.2894618 -1.1555896 . 0.5858825 0.7694018
## [2,] 0.2805795 1.1564087 2.0067822 . -0.1386979 -0.8471824
## [3,] -2.4064770 -0.3508253 -0.7598218 . 2.6037669 1.3442125
## [4,] -1.0829533 0.5431087 -0.7205174 . 0.2765132 0.3419529
## [5,] -0.3333336 -0.8726008 1.4880949 . 0.5809294 -0.8857297
## ... . . . . . .
## [96,] -0.07450996 -0.73200729 -0.68242107 . -0.4687153 0.9380191
## [97,] 0.07299338 -1.20085845 1.07757277 . 1.1202139 -0.2574838
## [98,] 0.43846001 0.59021094 -0.16114681 . 1.3143597 -1.0806247
## [99,] 1.21969490 2.16030637 -0.03407063 . -0.4977162 -0.3597264
## [100,] 1.06547897 -0.24561029 -0.17459897 . 0.5275360 0.1874912
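The global setting persists for the rest of the session, so it is usually worth unsetting it once it is no longer needed. The sketch below assumes that getTileDBPath() reports the value that was set and that calling setTileDBPath(NULL) restores the default behaviour; `path3` is an illustrative name.

```r
library(TileDBArray)

mat <- matrix(rnorm(100), ncol=10)

# Set the global path, then check what the getter reports.
path3 <- tempfile()
setTileDBPath(path3)
current <- getTileDBPath()

# Coercion picks up the global path for its backend.
tdb <- as(mat, "TileDBArray")
backend_exists <- file.exists(path3)

# Unset the global path (assumption: NULL restores the default)
# so that later writes do not target the same location.
setTileDBPath(NULL)
```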
sessionInfo()
## R version 4.5.1 Patched (2025-08-23 r88802)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.3 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.22-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] RcppSpdlog_0.0.22 TileDBArray_1.19.1 DelayedArray_0.35.3
## [4] SparseArray_1.9.1 S4Arrays_1.9.1 IRanges_2.43.5
## [7] abind_1.4-8 S4Vectors_0.47.4 MatrixGenerics_1.21.0
## [10] matrixStats_1.5.0 BiocGenerics_0.55.1 generics_0.1.4
## [13] Matrix_1.7-4 BiocStyle_2.37.1
##
## loaded via a namespace (and not attached):
## [1] bit_4.6.0 jsonlite_2.0.0 compiler_4.5.1
## [4] BiocManager_1.30.26 crayon_1.5.3 Rcpp_1.1.0
## [7] nanoarrow_0.7.0-1 jquerylib_0.1.4 yaml_2.3.10
## [10] fastmap_1.2.0 lattice_0.22-7 R6_2.6.1
## [13] RcppCCTZ_0.2.13 XVector_0.49.1 tiledb_0.33.0
## [16] knitr_1.50 bookdown_0.45 bslib_0.9.0
## [19] rlang_1.1.6 cachem_1.1.0 xfun_0.53
## [22] sass_0.4.10 bit64_4.6.0-1 cli_3.6.5
## [25] spdl_0.0.5 digest_0.6.37 grid_4.5.1
## [28] lifecycle_1.0.4 data.table_1.17.8 evaluate_1.0.5
## [31] nanotime_0.3.12 zoo_1.8-14 rmarkdown_2.30
## [34] tools_4.5.1 htmltools_0.5.8.1