GEOfastq can be installed from Bioconductor as follows:
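A minimal sketch of the standard Bioconductor route, which goes through the BiocManager package from CRAN:

```r
# Install BiocManager from CRAN if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

# Install GEOfastq from Bioconductor
BiocManager::install("GEOfastq")
```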
The NCBI Gene Expression Omnibus (GEO) offers a convenient interface to explore high-throughput experimental data such as RNA-seq. GEO deposits RNA-seq data as sra files in the Sequence Read Archive (SRA), and these can be converted to fastq files using fastq-dump. This conversion can be quite slow, so it is usually more convenient to download the fastq files that the European Nucleotide Archive (ENA) generates for a GEO accession. GEOfastq crawls GEO to retrieve metadata and ENA fastq URLs, and then downloads the fastq files.
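To make the ENA fastq URLs mentioned above more concrete, the sketch below builds the ENA FTP directory for a run accession following ENA's documented layout: the first six characters of the accession, then (only for accessions with more than six digits) a zero-padded three-character subdirectory made from the digits beyond the sixth, then the accession itself. The helper `ena_fastq_dir()` is hypothetical and written for illustration only; it is not part of GEOfastq and is not claimed to match how the package derives its paths.

```r
# Hypothetical helper (not part of GEOfastq): build the ENA FTP directory
# that holds the fastq files for a run accession such as "SRR014242".
ena_fastq_dir <- function(run) {
  # Accept SRR/ERR/DRR accessions with six to nine digits
  stopifnot(grepl("^[SED]RR[0-9]{6,9}$", run))

  # First-level directory: first six characters of the accession
  dir1 <- substr(run, 1, 6)

  # Second-level directory exists only when there are more than six digits;
  # it is the remaining digits, left-padded with zeros to width three.
  n_digits <- nchar(run) - 3
  dir2 <- if (n_digits > 6) {
    extra <- substr(run, 10, nchar(run))
    formatC(as.integer(extra), width = 3, flag = "0")
  } else {
    NULL
  }

  paste(c("ftp://ftp.sra.ebi.ac.uk/vol1/fastq", dir1, dir2, run),
        collapse = "/")
}

# Six-digit accession: no second-level directory
ena_fastq_dir("SRR014242")

# Seven-digit accession (made-up example): second-level directory "007"
ena_fastq_dir("SRR1234567")
```

For a single-end run, the file itself would then be `paste0(ena_fastq_dir(run), "/", run, ".fastq.gz")`; paired-end runs typically carry `_1` and `_2` suffixes instead.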
To get fastq data for a GEO series, we first retrieve the metadata for a GEO accession:
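A minimal sketch of this step, assuming the package's `crawl_gse()` function is used to fetch the series page. The accession shown is an illustrative assumption, so substitute the series of interest; the name `gse_text` is chosen to match the code that follows.

```r
library(GEOfastq)

# Assumption: illustrative GEO series accession; replace with your own
gse_name <- "GSE133758"

# Fetch the GEO series page; the result is passed to extract_gsms() next
gse_text <- crawl_gse(gse_name)
```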
Next, we extract the sample accessions for this study and retrieve the GEO metadata and ENA fastq URL for one example sample:
gsm_names <- extract_gsms(gse_text)
gsm_name <- gsm_names[182]
srp_meta <- crawl_gsms(gsm_name)
#> 1 GSMs to process
Now that we have retrieved the necessary metadata, we are ready to download the fastq files for this sample:
data_dir <- tempdir()
# example using a run with a smaller fastq file
srp_meta <- data.frame(
        run  = 'SRR014242',
        row.names = 'SRR014242',
        gsm_name = 'GSM315559',
        ebi_dir = get_dldir('SRR014242'), stringsAsFactors = FALSE)
res <- get_fastqs(srp_meta, data_dir)
#> Warning in utils::download.file(files[i], destfile): URL
#> 'ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR014/SRR014242/SRR014242.fastq.gz':
#> status was 'Failure when receiving data from the peer'
#> Warning: cannot open URL
#> 'ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR014/SRR014242/SRR014242.fastq.gz'
The following packages and versions were used in the production of this vignette.
#> R version 4.5.1 Patched (2025-08-23 r88802)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.3 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.22-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] GEOfastq_1.18.0
#> 
#> loaded via a namespace (and not attached):
#>  [1] digest_0.6.37     R6_2.6.1          codetools_0.2-20  fastmap_1.2.0    
#>  [5] doParallel_1.0.17 xfun_0.53         iterators_1.0.14  cachem_1.1.0     
#>  [9] parallel_4.5.1    knitr_1.50        RCurl_1.98-1.17   htmltools_0.5.8.1
#> [13] rmarkdown_2.30    lifecycle_1.0.4   bitops_1.0-9      cli_3.6.5        
#> [17] foreach_1.5.2     sass_0.4.10       jquerylib_0.1.4   compiler_4.5.1   
#> [21] plyr_1.8.9        tools_4.5.1       evaluate_1.0.5    bslib_0.9.0      
#> [25] Rcpp_1.1.0        yaml_2.3.10       rlang_1.1.6       jsonlite_2.0.0