1 scaeData

scaeData is a complementary package to the Bioconductor package SingleCellAlleleExperiment. It provides three datasets for testing and demonstrating the functions in SingleCellAlleleExperiment:

  • 5k PBMCs of a healthy donor, 3’ v3 chemistry
  • 10k PBMCs of a healthy donor, 3’ v3 chemistry
  • 20k PBMCs of a healthy donor, 3’ v3 chemistry

The raw FASTQ files for all three datasets were obtained from public datasets provided by 10x Genomics.

After the raw data were downloaded, the scIGD Snakemake workflow was used to perform HLA allele typing and to generate allele-specific quantification from the scRNA-seq data using donor-specific references.

2 Quick Start

2.1 Installation

From Bioconductor:

if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")

BiocManager::install("scaeData")

Alternatively, a development version is available on GitHub and can be installed via:

if (!requireNamespace("devtools", quietly = TRUE))
    install.packages("devtools")

devtools::install_github("AGImkeller/scaeData", build_vignettes = TRUE)

3 Usage

The datasets within scaeData are accessible using the scaeDataGet() function:

library("scaeData")
pbmc_5k <- scaeDataGet("pbmc_5k")
## Retrieving barcode identifiers for **pbmc 5k** dataset...DONE
## Retrieving feature identifiers for **pbmc 5k** dataset...DONE
## Retrieving quantification matrix for **pbmc 5k** dataset...DONE
pbmc_10k <- scaeDataGet("pbmc_10k")
## Retrieving barcode identifiers for **pbmc 10k** dataset...DONE
## Retrieving feature identifiers for **pbmc 10k** dataset...DONE
## Retrieving quantification matrix for **pbmc 10k** dataset...DONE
pbmc_20k <- scaeDataGet("pbmc_20k")
## Retrieving barcode identifiers for **pbmc 20k** dataset...DONE
## Retrieving feature identifiers for **pbmc 20k** dataset...DONE
## Retrieving quantification matrix for **pbmc 20k** dataset...DONE
pbmc_20k
## $dir
## [1] "/home/pkgbuild/.cache/R/ExperimentHub/"
## 
## $barcodes
## [1] "ee29d1c053a26_9525"
## 
## $features
## [1] "ee29d4188244b_9526"
## 
## $matrix
## [1] "ee29d4ef56da2_9527"
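The list returned by scaeDataGet() holds the cache directory (`dir`) together with the names of the barcode, feature and matrix files stored in it. As a minimal sketch, the hypothetical helper `scae_file_paths()` below (not part of scaeData) combines these entries into full paths and stops early if a file is missing, for example because the ExperimentHub cache was cleared.

```r
# Hypothetical helper, NOT part of scaeData: build full file paths from the
# list returned by scaeDataGet() and check that every file exists.
scae_file_paths <- function(x) {
  # 'x' is assumed to contain the elements dir, barcodes, features and matrix
  keys <- c("barcodes", "features", "matrix")
  stopifnot(is.list(x), all(c("dir", keys) %in% names(x)))
  # vapply() names the result after 'keys'
  paths <- vapply(keys, function(key) file.path(x$dir, x[[key]]), character(1))
  missing <- paths[!file.exists(paths)]
  if (length(missing) > 0) {
    stop("File(s) not found: ", paste(missing, collapse = ", "))
  }
  paths
}
```

With scaeData loaded, `scae_file_paths(pbmc_20k)` would return the same three paths that are assembled by hand with file.path() in the example below.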

For example, the files of pbmc_20k can be read into R and inspected:

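# Full paths to the barcode, feature and matrix files in the ExperimentHub cache
cells.file <- file.path(pbmc_20k$dir, pbmc_20k$barcodes)
features.file <- file.path(pbmc_20k$dir, pbmc_20k$features)
mat.file <- file.path(pbmc_20k$dir, pbmc_20k$matrix)

# Barcodes and features are plain-text tables; the counts are in Matrix Market format
cells <- utils::read.csv(cells.file, sep = "", header = FALSE)
features <- utils::read.delim(features.file, header = FALSE)
mat <- Matrix::readMM(mat.file)

# The matrix is stored with cells as rows and features as columns
rownames(mat) <- cells$V1
colnames(mat) <- features$V1
head(mat)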
## 6 x 62760 sparse Matrix of class "dgTMatrix"
##   [[ suppressing 34 column names 'ENSG00000279928.2', 'ENSG00000228037.1', 'ENSG00000142611.17' ... ]]
##                                                                               
## AAACCCAAGAAACACT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## AAACCCAAGAAACTCA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## AAACCCAAGAAACTGT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## AAACCCAAGAAATTGC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## AAACCCAAGAACAAGG . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## AAACCCAAGAACAGGA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
##                              
## AAACCCAAGAAACACT . . . ......
## AAACCCAAGAAACTCA . . . ......
## AAACCCAAGAAACTGT . . . ......
## AAACCCAAGAAATTGC . . . ......
## AAACCCAAGAACAAGG . . . ......
## AAACCCAAGAACAGGA . . . ......
## 
##  .....suppressing 62726 columns in show(); maybe adjust options(max.print=, width=)
##  ..............................
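The reading steps above can be wrapped in a small function so that the same code works for pbmc_5k, pbmc_10k and pbmc_20k. The sketch below uses a hypothetical helper, `read_scae_matrix()`, which is not an API of scaeData or SingleCellAlleleExperiment. It assumes, as the head(mat) output suggests, that the files store cells as rows, and by default it transposes the result to the features-by-cells orientation used by Bioconductor containers such as SingleCellExperiment.

```r
library(Matrix)

# Hypothetical helper, NOT part of scaeData or SingleCellAlleleExperiment:
# read one dataset (a list with dir, barcodes, features, matrix) into a
# named sparse matrix.
read_scae_matrix <- function(x, features_as_rows = TRUE) {
  # Read identifiers as character so barcodes and feature IDs are never coerced
  barcodes <- utils::read.table(file.path(x$dir, x$barcodes),
                                header = FALSE, colClasses = "character")$V1
  features <- utils::read.delim(file.path(x$dir, x$features),
                                header = FALSE, colClasses = "character")$V1
  mat <- Matrix::readMM(file.path(x$dir, x$matrix))

  # Assumption: cells are rows and features are columns in the stored file
  stopifnot(nrow(mat) == length(barcodes), ncol(mat) == length(features))
  dimnames(mat) <- list(barcodes, features)

  # Optionally switch to the features x cells orientation
  if (features_as_rows) mat <- t(mat)

  # Convert the triplet form returned by readMM() to compressed-column form
  methods::as(mat, "CsparseMatrix")
}
```

With scaeData installed, `read_scae_matrix(scaeDataGet("pbmc_5k"))` would return a features-by-cells sparse matrix; passing `features_as_rows = FALSE` keeps the cells-by-features layout shown above. The conversion to compressed-column storage is a design choice for efficient per-cell (column) operations, not something the package requires.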

A SingleCellAlleleExperiment object (scae for short) can be generated with the read_allele_counts() function from the SingleCellAlleleExperiment package.

A lookup table for each dataset, needed to create the additional data layers during object generation, can be downloaded from the GitHub repository (https://github.com/AGImkeller/scaeData/tree/devel/inst/extdata):

library("SingleCellAlleleExperiment")

# "pbmc_20k_lookup_table.csv" is the lookup table downloaded from the GitHub repository
scae_20k <- read_allele_counts(pbmc_20k$dir,
                               sample_names = "example_data",
                               filter = "yes",
                               exp_type = "WTA",
                               lookup_file = "pbmc_20k_lookup_table.csv",
                               barcode_file = pbmc_20k$barcodes,
                               gene_file = pbmc_20k$features,
                               matrix_file = pbmc_20k$matrix,
                               verbose = TRUE)

scae_20k

Session info

sessionInfo()
## R Under development (unstable) (2024-03-18 r86148)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.4 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.19-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] scaeData_0.99.0  BiocStyle_2.31.0
## 
## loaded via a namespace (and not attached):
##  [1] KEGGREST_1.43.0         xfun_0.42               bslib_0.6.1            
##  [4] lattice_0.22-6          Biobase_2.63.0          vctrs_0.6.5            
##  [7] tools_4.4.0             generics_0.1.3          stats4_4.4.0           
## [10] curl_5.2.1              tibble_3.2.1            fansi_1.0.6            
## [13] AnnotationDbi_1.65.2    RSQLite_2.3.5           blob_1.2.4             
## [16] pkgconfig_2.0.3         Matrix_1.6-5            dbplyr_2.5.0           
## [19] S4Vectors_0.41.5        lifecycle_1.0.4         GenomeInfoDbData_1.2.11
## [22] compiler_4.4.0          Biostrings_2.71.4       GenomeInfoDb_1.39.9    
## [25] htmltools_0.5.7         sass_0.4.9              yaml_2.3.8             
## [28] pillar_1.9.0            crayon_1.5.2            jquerylib_0.1.4        
## [31] cachem_1.0.8            mime_0.12               ExperimentHub_2.11.1   
## [34] AnnotationHub_3.11.3    tidyselect_1.2.1        digest_0.6.35          
## [37] dplyr_1.1.4             purrr_1.0.2             bookdown_0.38          
## [40] BiocVersion_3.19.1      grid_4.4.0              fastmap_1.1.1          
## [43] cli_3.6.2               magrittr_2.0.3          utf8_1.2.4             
## [46] withr_3.0.0             filelock_1.0.3          rappdirs_0.3.3         
## [49] bit64_4.0.5             rmarkdown_2.26          XVector_0.43.1         
## [52] httr_1.4.7              bit_4.0.5               png_0.1-8              
## [55] memoise_2.0.1           evaluate_0.23           knitr_1.45             
## [58] IRanges_2.37.1          BiocFileCache_2.11.1    rlang_1.1.3            
## [61] glue_1.7.0              DBI_1.2.2               BiocManager_1.30.22    
## [64] BiocGenerics_0.49.1     jsonlite_1.8.8          R6_2.5.1               
## [67] zlibbioc_1.49.3