---
title: "Loading and re-analysing public data through ReactomeGSA"
author: "Johannes Griss"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Loading and re-analysing public data through ReactomeGSA}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Introduction

Since October 2023, ReactomeGSA was extended to simplify the reuse of public data. As key features, ReactomeGSA can now directly load data from **EBI's ExpressionAtlas** and **NCBI's GREIN**. Both of these resources reprocess available public datasets using consistent pipelines.

Additionally, a search function was integrated into ReactomeGSA that can search for datasets simultaneously in all of these supported resources.

The ReactomeGSA R package now also has all required functions to directly access this web-based service. It is thereby possible to search for public datasets directly and download them as **ExpressionSet** objects.

## Installation

The `ReactomeGSA` package can be directly installed from Bioconductor:

```{r, eval=FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

if (!require(ReactomeGSA))
    BiocManager::install("ReactomeGSA")
```

For more information, see https://bioconductor.org/install/.

## Searching for Public Datasets

The `find_public_datasets` function uses ReactomeGSA's web service to search for public datasets in all supported resources.

By default, the datasets are limited to human studies. This can be changed by setting the `species` parameter. The complete list of available species is returned by the `get_public_species` function.

```{r}
library(ReactomeGSA)

# get all available species found in the datasets
all_species <- get_public_species()

head(all_species)
```

The `search_term` parameter takes a single string as an argument.
Words separated by a space are logically combined using an **AND**.

```{r}
# search for datasets on BRAF and melanoma
datasets <- find_public_datasets("melanoma BRAF")

# the function returns the found datasets as a data.frame
datasets[1:4, c("id", "title")]
```

## Load a public dataset

Datasets found through the `find_public_datasets` function can subsequently be loaded using the `load_public_dataset` function.

```{r}
# find the correct entry in the search result
# this must be the complete row of the data.frame returned
# by the find_public_datasets function
dataset_search_entry <- datasets[datasets$id == "E-MTAB-7453", ]

str(dataset_search_entry)
```

The selected dataset can now be loaded through the `load_public_dataset` function.

```{r}
# the dataset is specified as a single row from the
# data.frame returned by the find_public_datasets function
mel_cells_braf <- load_public_dataset(dataset_search_entry, verbose = TRUE)
```

The returned object is an `ExpressionSet` object that already contains all available metadata.

```{r}
# use the Biobase functions to access the metadata
library(Biobase)

# basic metadata
pData(mel_cells_braf)
```

Detailed descriptions of the loaded study are also stored in the `experimentData` slot.

```{r}
# access the stored metadata using the experimentData function
experimentData(mel_cells_braf)

# for some datasets, longer descriptions are available. These
# can be accessed using the abstract function
abstract(mel_cells_braf)
```

Additionally, you can use the `table` function to quickly get the number of available samples for a specific metadata field.

```{r}
table(mel_cells_braf$compound)
```

## Perform the pathway analysis using ReactomeGSA

This object is now directly compatible with ReactomeGSA's pathway analysis functions. For a detailed explanation of how to perform this analysis, please have a look at the respective vignette.

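
Before building the request, it can be useful to confirm that both comparison groups actually occur in the chosen metadata field. The following is a minimal sketch of such a check. The helper `check_comparison_groups` is hypothetical and not part of ReactomeGSA, and the toy vector only stands in for a metadata field such as `mel_cells_braf$compound`.

```{r}
# Hypothetical helper (not part of ReactomeGSA): counts the samples in each
# comparison group and stops if a group does not occur in the metadata field.
check_comparison_groups <- function(values, group_1, group_2) {
  values <- as.character(values)

  counts <- c(sum(values == group_1, na.rm = TRUE),
              sum(values == group_2, na.rm = TRUE))
  names(counts) <- c(group_1, group_2)

  missing_groups <- names(counts)[counts == 0]
  if (length(missing_groups) > 0) {
    stop("No samples found for group(s): ",
         paste(missing_groups, collapse = ", "))
  }

  counts
}

# toy metadata vector (assumed values, for illustration only)
toy_compound <- c("PLX4720", "PLX4720", "none", "none", "none")

check_comparison_groups(toy_compound, "PLX4720", "none")
```

For the dataset loaded above, the equivalent call would be `check_comparison_groups(mel_cells_braf$compound, "PLX4720", "none")`. The request itself is then created as follows.
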
```{r}
# create the analysis request
my_request <- ReactomeAnalysisRequest(method = "Camera")

# do not create a visualization for this example
my_request <- set_parameters(request = my_request, create_reactome_visualization = FALSE)

# add the dataset using the loaded object
my_request <- add_dataset(request = my_request,
                          expression_values = mel_cells_braf,
                          name = "E-MTAB-7453",
                          type = "rnaseq_counts",
                          comparison_factor = "compound",
                          comparison_group_1 = "PLX4720",
                          comparison_group_2 = "none")

my_request
```

The analysis can now be started using the standard workflow:

```{r}
# perform the analysis using ReactomeGSA
res <- perform_reactome_analysis(my_request)

# basic overview of the result
print(res)

# key pathways
res_pathways <- pathways(res)

head(res_pathways)
```

## Session Info

```{r}
sessionInfo()
```