healthyControlsPresenceChecker allows users to verify whether a specific GEO dataset contains data of healthy controls alongside data of patients.
healthyControlsPresenceChecker 1.4.0
library(healthyControlsPresenceChecker)
#> Setting options('download.file.method.GEOquery'='auto')
#> Setting options('GEOquery.inmemory.gpl'=FALSE)
Bioinformatics projects that analyze data of patients with cancer or other diseases often require comparing the results obtained on patients’ data with those obtained on healthy controls’ data. This step, although crucial, cannot be performed when the dataset contains no healthy control data. Searching for datasets that contain both kinds of data can be tedious, and checking a specific dataset by hand can be time-consuming too. Here we propose a software package that immediately tells the user whether data of healthy controls are present in a specific dataset.
The package can be installed from Bioconductor with the following commands.
Start R (version “4.1”) and enter:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("healthyControlsPresenceChecker")
The package can then be loaded with the following command:
library("healthyControlsPresenceChecker")
Using healthyControlsPresenceChecker is straightforward. The main function, healthyControlsCheck(), takes two input arguments: the GEO accession code of the dataset that the user wants to check for healthy controls, and a verbose flag.
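Because the function takes just an accession code and a verbose flag, it can also be used to screen several datasets in one go. The sketch below defines screenDatasets(), a hypothetical helper that is not part of the package. It assumes only that the checker function accepts an accession code and a verbose flag, in that order, and (as an assumption) that it returns a single logical value. Errors raised while downloading or parsing a dataset are recorded as NA instead of stopping the loop.

```r
# Hypothetical helper, not part of healthyControlsPresenceChecker.
# Assumes checker(code, verbose) returns a single TRUE/FALSE value.
screenDatasets <- function(geoCodes, checker) {
    # vapply() keeps the accession codes as names of the result
    vapply(geoCodes, function(code) {
        tryCatch(
            # verbose = FALSE keeps the console quiet during batch runs
            isTRUE(checker(code, FALSE)),
            # record NA when a download or parsing step fails
            error = function(e) NA
        )
    }, logical(1))
}
```

With the package loaded, screenDatasets(c("GSE47407"), healthyControlsCheck) would return a named logical vector with one entry per accession code. Passing the checker as an argument also makes it possible to try the helper offline with a stub function.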
For example, to find out whether the GSE47407 dataset contains data of healthy controls, the user can type the following in an R session:
outcomeGSE47407 <- healthyControlsCheck("GSE47407", TRUE)
#> Processed URL: https://ftp.ncbi.nlm.nih.gov/geo/series/GSE47nnn/GSE47407
#> Found 1 file(s)
#> GSE47407_series_matrix.txt.gz
#> === === === === === GSE47407 === === === === ===
#> :: The keyword "healthy" was NOT found among the annotations of this dataset (GSE47407)
#> :: The keyword "control" was NOT found among the annotations of this dataset (GSE47407)
#> === === === === === === === === === === === ===
#> 
#> healthyControlsCheck() call output: were healthy controls found in the GSE47407 dataset? FALSE
With the verbose flag set to TRUE, the function prints all the intermediate messages shown above. The returned value, stored here in outcomeGSE47407, is TRUE if healthy controls were found and FALSE otherwise; for GSE47407 neither keyword was found, so the result is FALSE.
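The messages printed for GSE47407 indicate that the check searches the dataset annotations for the keywords "healthy" and "control". The following base R sketch illustrates that idea under stated assumptions: the annotations are taken to be already available as a character vector (the package itself retrieves the dataset from GEO), and the function name keywordsFound() is hypothetical. How the package combines the two matches into its final TRUE/FALSE answer is not described in this vignette, so the sketch simply reports each keyword separately.

```r
# Hypothetical helper, not part of healthyControlsPresenceChecker:
# report, for each keyword, whether it appears in any annotation string.
keywordsFound <- function(annotations, keywords = c("healthy", "control")) {
    vapply(keywords, function(keyword) {
        # case-insensitive substring match across all annotation strings
        any(grepl(keyword, annotations, ignore.case = TRUE))
    }, logical(1))
}

# Example with made-up annotation strings
exampleAnnotations <- c("tissue: tumor", "disease state: Healthy donor")
keywordsFound(exampleAnnotations)
# returns c(healthy = TRUE, control = FALSE)
```

A plain substring match like this would also flag words such as "uncontrolled", so a stricter pattern may be needed in practice.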
This software was developed by Davide Chicco, who can be contacted via email at davidechicco(AT)davidechicco.it
Session Info
sessionInfo()
#> R version 4.3.0 RC (2023-04-13 r84269)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 22.04.2 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.17-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] healthyControlsPresenceChecker_1.4.0 BiocStyle_2.28.0                    
#> 
#> loaded via a namespace (and not attached):
#>  [1] limma_3.56.0              jsonlite_1.8.4           
#>  [3] dplyr_1.1.2               compiler_4.3.0           
#>  [5] BiocManager_1.30.20       tidyselect_1.2.0         
#>  [7] Biobase_2.60.0            xml2_1.3.3               
#>  [9] tidyr_1.3.0               jquerylib_0.1.4          
#> [11] geneExpressionFromGEO_0.9 yaml_2.3.7               
#> [13] fastmap_1.1.1             readr_2.1.4              
#> [15] R6_2.5.1                  generics_0.1.3           
#> [17] curl_5.0.0                GEOquery_2.68.0          
#> [19] knitr_1.42                BiocGenerics_0.46.0      
#> [21] tibble_3.2.1              bookdown_0.33            
#> [23] bslib_0.4.2               pillar_1.9.0             
#> [25] tzdb_0.3.0                R.utils_2.12.2           
#> [27] rlang_1.1.0               utf8_1.2.3               
#> [29] cachem_1.0.7              xfun_0.39                
#> [31] sass_0.4.5                cli_3.6.1                
#> [33] formatR_1.14              withr_2.5.0              
#> [35] magrittr_2.0.3            digest_0.6.31            
#> [37] hms_1.1.3                 lifecycle_1.0.3          
#> [39] R.oo_1.25.0               R.methodsS3_1.8.2        
#> [41] vctrs_0.6.2               evaluate_0.20            
#> [43] glue_1.6.2                data.table_1.14.8        
#> [45] fansi_1.0.4               rmarkdown_2.21           
#> [47] purrr_1.0.1               tools_4.3.0              
#> [49] pkgconfig_2.0.3           htmltools_0.5.5