The database manager is an API within OmnipathR that loads various datasets, keeps track of their usage and removes them after an expiry period. Currently it supports a few datasets, mostly from Gene Ontology and UniProt, but it can easily be extended to cover all datasets in the package.
OmnipathR 3.8.2
1 Institute for Computational Biomedicine, Heidelberg University
To see the full list of datasets, call the omnipath_show_db function:
library(OmnipathR)
omnipath_show_db()
## # A tibble: 20 × 10
##    name                         last_used lifetime package loader loader_param latest_param loaded db    key  
##    <chr>                        <lgl>        <dbl> <chr>   <chr>  <list>       <lgl>        <lgl>  <lgl> <chr>
##  1 Gene Ontology (basic)        NA             300 Omnipa… go_on… <named list> NA           FALSE  NA    go_b…
##  2 Gene Ontology (full)         NA             300 Omnipa… go_on… <named list> NA           FALSE  NA    go_f…
##  3 Gene Ontology (AGR)          NA             300 Omnipa… go_on… <named list> NA           FALSE  NA    go_a…
##  4 Gene Ontology (Aspergillus)  NA             300 Omnipa… go_on… <named list> NA           FALSE  NA    go_a…
##  5 Gene Ontology (generic slim) NA             300 Omnipa… go_on… <named list> NA           FALSE  NA    go_s…
##  6 Gene Ontology (Candida)      NA             300 Omnipa… go_on… <named list> NA           FALSE  NA    go_c…
##  7 Gene Ontology (Drosphila)    NA             300 Omnipa… go_on… <named list> NA           FALSE  NA    go_d…
##  8 Gene Ontology (ChEMBL)       NA             300 Omnipa… go_on… <named list> NA           FALSE  NA    go_c…
##  9 Gene Ontology (metagenomic)  NA             300 Omnipa… go_on… <named list> NA           FALSE  NA    go_m…
## 10 Gene Ontology (plant)        NA             300 Omnipa… go_on… <named list> NA           FALSE  NA    go_p…
## 11 Gene Ontology (mouse)        NA             300 Omnipa… go_on… <named list> NA           FALSE  NA    go_m…
## 12 Gene Ontology (PIR)          NA             300 Omnipa… go_on… <named list> NA           FALSE  NA    go_p…
## 13 Gene Ontology (Pombe)        NA             300 Omnipa… go_on… <named list> NA           FALSE  NA    go_p…
## 14 Gene Ontology (yeast)        NA             300 Omnipa… go_on… <named list> NA           FALSE  NA    go_y…
## 15 GO annotations (human)       NA             300 Omnipa… go_an… <named list> NA           FALSE  NA    goa_…
## 16 UniProt-GeneSymbol table     NA             300 Omnipa… unipr… <named list> NA           FALSE  NA    up_gs
## 17 Ensembl organism names       NA           10800 Omnipa… taxon… <NULL>       NA           FALSE  NA    orga…
## 18 All SwissProt ACs            NA           10800 Omnipa… all_u… <named list> NA           FALSE  NA    swis…
## 19 All TrEMBL ACs               NA           10800 Omnipa… all_u… <named list> NA           FALSE  NA    trem…
## 20 OmniPath search index        NA             300 Omnipa… build… <named list> NA           FALSE  NA    sear…
It returns a tibble in which each dataset has a human-readable name and a key that can be used to refer to it. The table also shows whether each dataset is currently loaded, when it was last used, and its loader function together with the loader's arguments.
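Because each row pairs a human-readable name with a key, finding the key of a dataset is a simple lookup. The base R sketch below mimics that lookup on a small hand-built data frame; it is an illustration, not the OmnipathR implementation. Only the up_gs row uses values from the listing (its loaded = TRUE reflects the state after it has been loaded with get_db); the example_db row and the helpers key_for_name() and loaded_keys() are hypothetical.

```r
# Hypothetical miniature registry with a subset of the columns shown by
# omnipath_show_db(). Only the 'up_gs' row comes from the listing;
# 'example_db' is made up for illustration.
registry <- data.frame(
  name     = c("UniProt-GeneSymbol table", "Example dataset"),
  key      = c("up_gs", "example_db"),
  lifetime = c(300, 10800),
  loaded   = c(TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Hypothetical helper: look up the key of a dataset from its name.
key_for_name <- function(registry, name) {
  hit <- registry$key[registry$name == name]
  if (length(hit) != 1L) {
    stop("No unique dataset named '", name, "'")
  }
  hit
}

# Hypothetical helper: keys of the datasets that are currently loaded.
loaded_keys <- function(registry) {
  registry$key[registry$loaded]
}

key_for_name(registry, "UniProt-GeneSymbol table")  # "up_gs"
loaded_keys(registry)                               # "up_gs"
```

With the real registry, the same queries could be written with dplyr on the tibble returned by omnipath_show_db(), whose key column is character and loaded column is logical, for example omnipath_show_db() %>% dplyr::filter(loaded) %>% dplyr::pull(key).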
Datasets are accessed with the get_db function. Ideally, call this function
every time you use a dataset: on the first call the dataset is loaded, and on
subsequent calls the already loaded copy is returned. This way each access is
registered and extends the expiry time. Let's load the human
UniProt-GeneSymbol table; above we see its key is up_gs.
up_gs <- get_db('up_gs')
up_gs
## NULL
This dataset is a two-column data frame of SwissProt IDs and gene symbols.
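The access pattern described above (load on first use, return the cached copy afterwards, record every access) can be sketched in base R. This is an illustration under assumptions, not OmnipathR's internal code: toy_env, toy_loader and toy_get_db are hypothetical names, the loader returns placeholder rows instead of downloading anything, and only the loader arguments (to = "genesymbol", organism = 9606) are taken from the up_gs entry shown in this vignette.

```r
# Hypothetical environment holding the registry and a loader-call counter.
toy_env <- new.env()
toy_env$loader_calls <- 0L

# Hypothetical loader standing in for a real download; returns placeholder rows.
toy_loader <- function(to, organism) {
  toy_env$loader_calls <- toy_env$loader_calls + 1L
  data.frame(uniprot = c("ACC1", "ACC2"), genesymbol = c("GENE1", "GENE2"))
}

# Registry entry: loader stored by name, with its argument list.
toy_env$db <- list(
  up_gs = list(
    loader = "toy_loader",
    loader_param = list(to = "genesymbol", organism = 9606),
    loaded = FALSE,
    last_used = NULL,
    data = NULL
  )
)

# Hypothetical accessor: loads on first call, then serves the cached copy.
toy_get_db <- function(key) {
  entry <- toy_env$db[[key]]
  if (is.null(entry)) stop("Unknown dataset key: ", key)
  if (!entry$loaded) {
    # First access: call the loader with its stored arguments.
    entry$data <- do.call(entry$loader, entry$loader_param)
    entry$loaded <- TRUE
  }
  # Every access refreshes the timestamp, which postpones expiry.
  entry$last_used <- Sys.time()
  toy_env$db[[key]] <- entry
  entry$data
}

a <- toy_get_db("up_gs")
first_use <- toy_env$db$up_gs$last_used
b <- toy_get_db("up_gs")
identical(a, b)        # TRUE: the cached copy is returned
toy_env$loader_calls   # 1: the loader ran only once
```

Storing the loader by name together with its argument list, and calling it with do.call, mirrors the loader and loader_param columns of the registry; an environment is used so that cached data and timestamps can be updated in place.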
Looking at the datasets again, we see that this dataset is now loaded and
its last_used timestamp is set to the time we called get_db:
omnipath_show_db()
## # A tibble: 20 × 10
##    name               last_used           lifetime package loader loader_param latest_param loaded db    key  
##    <chr>              <dttm>                 <dbl> <chr>   <chr>  <list>       <list>       <lgl>  <lgl> <chr>
##  1 Gene Ontology (ba… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  NA    go_b…
##  2 Gene Ontology (fu… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  NA    go_f…
##  3 Gene Ontology (AG… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  NA    go_a…
##  4 Gene Ontology (As… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  NA    go_a…
##  5 Gene Ontology (ge… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  NA    go_s…
##  6 Gene Ontology (Ca… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  NA    go_c…
##  7 Gene Ontology (Dr… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  NA    go_d…
##  8 Gene Ontology (Ch… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  NA    go_c…
##  9 Gene Ontology (me… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  NA    go_m…
## 10 Gene Ontology (pl… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  NA    go_p…
## 11 Gene Ontology (mo… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  NA    go_m…
## 12 Gene Ontology (PI… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  NA    go_p…
## 13 Gene Ontology (Po… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  NA    go_p…
## 14 Gene Ontology (ye… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  NA    go_y…
## 15 GO annotations (h… NA                       300 Omnipa… go_an… <named list> <lgl [1]>    FALSE  NA    goa_…
## 16 UniProt-GeneSymbo… 2023-09-15 17:42:42      300 Omnipa… unipr… <named list> <named list> TRUE   NA    up_gs
## 17 Ensembl organism … NA                     10800 Omnipa… taxon… <NULL>       <lgl [1]>    FALSE  NA    orga…
## 18 All SwissProt ACs  NA                     10800 Omnipa… all_u… <named list> <lgl [1]>    FALSE  NA    swis…
## 19 All TrEMBL ACs     NA                     10800 Omnipa… all_u… <named list> <lgl [1]>    FALSE  NA    trem…
## 20 OmniPath search i… NA                       300 Omnipa… build… <named list> <lgl [1]>    FALSE  NA    sear…
The table also contains a reference to the dataset and the arguments passed to the loader function:
d <- omnipath_show_db()
d %>% dplyr::pull(db) %>% magrittr::extract2(16)
## [1] NA
d %>% dplyr::pull(latest_param) %>% magrittr::extract2(16)
## $to
## [1] "genesymbol"
## 
## $organism
## [1] 9606
If we call get_db again, the timestamp is updated, resetting the expiry
counter:
up_gs <- get_db('up_gs')
omnipath_show_db()
## # A tibble: 20 × 10
##    name               last_used           lifetime package loader loader_param latest_param loaded db    key  
##    <chr>              <dttm>                 <dbl> <chr>   <chr>  <list>       <list>       <lgl>  <lgl> <chr>
##  1 Gene Ontology (ba… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  NA    go_b…
##  2 Gene Ontology (fu… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  NA    go_f…
##  3 Gene Ontology (AG… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  NA    go_a…
##  4 Gene Ontology (As… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  NA    go_a…
##  5 Gene Ontology (ge… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  NA    go_s…
##  6 Gene Ontology (Ca… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  NA    go_c…
##  7 Gene Ontology (Dr… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  NA    go_d…
##  8 Gene Ontology (Ch… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  NA    go_c…
##  9 Gene Ontology (me… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  NA    go_m…
## 10 Gene Ontology (pl… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  NA    go_p…
## 11 Gene Ontology (mo… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  NA    go_m…
## 12 Gene Ontology (PI… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  NA    go_p…
## 13 Gene Ontology (Po… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  NA    go_p…
## 14 Gene Ontology (ye… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  NA    go_y…
## 15 GO annotations (h… NA                       300 Omnipa… go_an… <named list> <lgl [1]>    FALSE  NA    goa_…
## 16 UniProt-GeneSymbo… 2023-09-15 17:42:52      300 Omnipa… unipr… <named list> <named list> TRUE   NA    up_gs
## 17 Ensembl organism … NA                     10800 Omnipa… taxon… <NULL>       <lgl [1]>    FALSE  NA    orga…
## 18 All SwissProt ACs  NA                     10800 Omnipa… all_u… <named list> <lgl [1]>    FALSE  NA    swis…
## 19 All TrEMBL ACs     NA                     10800 Omnipa… all_u… <named list> <lgl [1]>    FALSE  NA    trem…
## 20 OmniPath search i… NA                       300 Omnipa… build… <named list> <lgl [1]>    FALSE  NA    sear…
The loaded datasets live in an environment that belongs to the OmnipathR
package. Normally, users don't need to access this environment directly. As
we see below, omnipath_show_db presents all the information that is available
by looking directly at the environment:
OmnipathR:::omnipath.env$db$up_gs
## $name
## [1] "UniProt-GeneSymbol table"
## 
## $last_used
## [1] "2023-09-15 17:42:52 EDT"
## 
## $lifetime
## [1] 300
## 
## $package
## [1] "OmnipathR"
## 
## $loader
## [1] "uniprot_full_id_mapping_table"
## 
## $loader_param
## $loader_param$to
## [1] "genesymbol"
## 
## $loader_param$organism
## [1] 9606
## 
## 
## $latest_param
## $latest_param$to
## [1] "genesymbol"
## 
## $latest_param$organism
## [1] 9606
## 
## 
## $loaded
## [1] TRUE
The default expiry time of datasets is set by the omnipath.db_lifetime option.
Calling omnipath_save_config saves this option to the default config file,
making it valid in all subsequent sessions; otherwise it is valid only in the
current session.
options(omnipath.db_lifetime = 600)
omnipath_save_config()
The built-in dataset definitions are in a JSON file shipped with the package. The easiest way to view it is through the git web interface.
Currently there is no API for adding custom datasets, but it would be easy to implement: it would be a matter of providing a JSON file similar to the built-in one, or calling a function. Please open an issue if you are interested in this feature.
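To make the expiry mechanism described earlier concrete: a dataset becomes eligible for removal once the time since its last use exceeds its lifetime, so each get_db call pushes the expiry point forward. The base R sketch below assumes the lifetime is measured in seconds (an assumption; the unit is not stated in this vignette) and reuses the two last_used timestamps of up_gs recorded above (17:42:42 and 17:42:52). The helper is_expired() is hypothetical, not an OmnipathR function.

```r
# Assumption: lifetime is expressed in seconds (300 is the value shown for up_gs).
lifetime <- 300

# The two last_used timestamps of up_gs from the listings above.
# Time zone fixed to UTC for reproducibility; differences do not depend on it.
first_use  <- as.POSIXct("2023-09-15 17:42:42", tz = "UTC")
second_use <- as.POSIXct("2023-09-15 17:42:52", tz = "UTC")

# Expiry time = last use + lifetime; the second access moves it forward.
format(first_use + lifetime, "%H:%M:%S")   # "17:47:42"
format(second_use + lifetime, "%H:%M:%S")  # "17:47:52"

# Hypothetical helper: has a dataset outlived its lifetime at time 'now'?
is_expired <- function(last_used, lifetime, now = Sys.time()) {
  as.numeric(difftime(now, last_used, units = "secs")) > lifetime
}

check_time <- as.POSIXct("2023-09-15 17:47:45", tz = "UTC")
is_expired(first_use, lifetime, now = check_time)   # TRUE: 303 s since first use
is_expired(second_use, lifetime, now = check_time)  # FALSE: 293 s since second use
```

Had the second get_db call not happened, up_gs would already count as expired at 17:47:45; the refreshed timestamp keeps it alive until 17:47:52.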
sessionInfo()
## R version 4.3.1 (2023-06-16)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.3 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.17-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB             
##  [4] LC_COLLATE=C               LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
## [10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] OmnipathR_3.8.2  BiocStyle_2.28.1
## 
## loaded via a namespace (and not attached):
##  [1] rappdirs_0.3.3      sass_0.4.7          utf8_1.2.3          generics_0.1.3      tidyr_1.3.0        
##  [6] xml2_1.3.5          stringi_1.7.12      hms_1.1.3           digest_0.6.33       magrittr_2.0.3     
## [11] evaluate_0.21       bookdown_0.35       fastmap_1.1.1       cellranger_1.1.0    jsonlite_1.8.7     
## [16] progress_1.2.2      backports_1.4.1     BiocManager_1.30.22 httr_1.4.7          rvest_1.0.3        
## [21] purrr_1.0.2         fansi_1.0.4         jquerylib_0.1.4     cli_3.6.1           rlang_1.1.1        
## [26] crayon_1.5.2        bit64_4.0.5         withr_2.5.0         cachem_1.0.8        yaml_2.3.7         
## [31] parallel_4.3.1      tools_4.3.1         tzdb_0.4.0          checkmate_2.2.0     dplyr_1.1.3        
## [36] curl_5.0.2          vctrs_0.6.3         logger_0.2.2        R6_2.5.1            lifecycle_1.0.3    
## [41] stringr_1.5.0       bit_4.0.5           vroom_1.6.3         pkgconfig_2.0.3     pillar_1.9.0       
## [46] bslib_0.5.1         later_1.3.1         glue_1.6.2          Rcpp_1.0.11         xfun_0.40          
## [51] tibble_3.2.1        tidyselect_1.2.0    knitr_1.44          htmltools_0.5.6     igraph_1.5.1       
## [56] rmarkdown_2.24      readr_2.1.4         compiler_4.3.1      prettyunits_1.1.1   readxl_1.4.3