--- title: "Work with other organisms" author: "Zuguang Gu ( z.gu@dkfz.de )" date: '`r Sys.Date()`' output: html_document: toc: true toc_depth: 3 toc_collapsed: false toc_float: true vignette: > %\VignetteIndexEntry{3. Work with other organisms} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, echo = FALSE, message = FALSE} library(knitr) knitr::opts_chunk$set( error = FALSE, tidy = FALSE, message = FALSE, warning = FALSE, fig.align = "center") ``` ## Use TxDb packages In the Bioconductor annotation ecosystem, there are **TxDb.\*** packages which provide data for Gene Ontology gene sets. The **TxDb.\*** packages supported in **rGREAT** are: ```{r} library(rGREAT) rGREAT:::BIOC_ANNO_PKGS$txdb ``` To perform GREAT anlaysis with GO gene sets for other organisms, you can either specify the genome version: ```{r, eval = FALSE} great(gr, "GO:BP", "galGal6") ``` or with the full name of the corresponding TxDb package: ```{r, eval = FALSE} great(gr, "GO:BP", "TxDb.Ggallus.UCSC.galGal6.refGene") ``` These two are internally the same. ## Use BioMart GO gene sets You can specify a BioMart dataset (which corresponds to a specific organism), e.g.: ```{r, eval = FALSE} # Giant panda great(gr, "GO:BP", biomart_dataset = "amelanoleuca_gene_ensembl") ``` A full list of supported BioMart datasets (organisms) can be found with the function `BioMartGOGeneSets::supportedOrganisms()`. ## Use MSigDB gene sets MSigDB contains gene sets only for human, but it can be extended to other organisms by mapping to the homologues genes. The package [**msigdbr**](https://cran.r-project.org/web/packages/msigdbr/index.html) has already mapped genes to many other organisms. A full list of supported organisms in **msigdbr** can be obtained by: ```{r} library(msigdbr) msigdbr_species() ``` To obtain gene sets for non-human organisms, e.g.: ```{r} h_gene_sets = msigdbr(species = "chimpanzee", category = "H") head(h_gene_sets) ``` If the organism you selected has a corresponding TxDb package available which provides TSS information, you need to make sure the gene sets use Entrez gene ID as the identifier (Most TxDb packages use Entrez ID as primary ID, you can check the variable `rGREAT:::BIOC_ANNO_PKGS`). ```{r} # convert to a list of gene sets h_gene_sets = split(h_gene_sets$entrez_gene, h_gene_sets$gs_name) h_gene_sets = lapply(h_gene_sets, as.character) # just to make sure gene IDs are all in character. h_gene_sets[1:2] ``` Now we can perform the local GREAT analysis. ```{r, eval = FALSE} great(gr, h_gene_sets, "panTro6") ``` ## Self-define TSS and gene sets Since `great()` allows both self-defined TSS and gene sets, this means `great()` can be independent to organisms. Please refer to the vignette ["Analyze with local GREAT"](local-GREAT.html#manually-set-gene-sets-and-transcriptome-annotations) to find out how to manuallly set both TSS and gene sets.