MeSH (Medical Subject Headings) is the NLM (U.S. National Library of Medicine) controlled vocabulary used to manually index articles for MEDLINE/PubMed. MeSH is comprehensive life science vocabulary. MeSH has 19 categories and MeSH.db contains 16 of them. That is:
| Abbreviation | Category |
|---|---|
| A | Anatomy |
| B | Organisms |
| C | Diseases |
| D | Chemicals and Drugs |
| E | Analytical, Diagnostic and Therapeutic Techniques and Equipment |
| F | Psychiatry and Psychology |
| G | Phenomena and Processes |
| H | Disciplines and Occupations |
| I | Anthropology, Education, Sociology and Social Phenomena |
| J | Technology and Food and Beverages |
| K | Humanities |
| L | Information Science |
| M | Persons |
| N | Health Care |
| V | Publication Type |
| Z | Geographical Locations |
MeSH terms were associated with Entrez Gene ID by three methods, gendoo, gene2pubmed and RBBH (Reciprocal Blast Best Hit).
| Method | Way of corresponding Entrez Gene IDs and MeSH IDs |
|---|---|
| Gendoo | Text-mining |
| gene2pubmed | Manual curation by NCBI teams |
| RBBH | sequence homology with BLASTP search (E-value<10-50) |
meshes supports enrichment analysis (over-representation analysis and gene set enrichment analysis) of gene list or whole expression profile using MeSH annotation. Data source from gendoo, gene2pubmed and RBBH are all supported. User can selecte interesting category to test. All 16 categories are supported. The analysis supports >70 species listed in MeSHDb BiocView.
For algorithm details, please refer to the vignettes of DOSE1 package.
library(meshes)
data(geneList, package="DOSE")
de <- names(geneList)[1:100]
x <- enrichMeSH(de, MeSHDb = "MeSH.Hsa.eg.db", database='gendoo', category = 'C')
head(x)
## ID Description GeneRatio BgRatio pvalue
## D043171 D043171 Chromosomal Instability 16/96 198/16528 2.794765e-14
## D000782 D000782 Aneuploidy 17/96 320/16528 3.866830e-12
## D042822 D042822 Genomic Instability 16/96 312/16528 3.007419e-11
## D012595 D012595 Scleroderma, Systemic 11/96 279/16528 6.449334e-07
## D009303 D009303 Nasopharyngeal Neoplasms 11/96 314/16528 2.049315e-06
## D019698 D019698 Hepatitis C, Chronic 11/96 317/16528 2.246856e-06
## p.adjust qvalue
## D043171 2.434241e-11 1.794534e-11
## D000782 1.684004e-09 1.241456e-09
## D042822 8.731539e-09 6.436931e-09
## D012595 1.404343e-04 1.035288e-04
## D009303 3.261686e-04 2.404530e-04
## D019698 3.261686e-04 2.404530e-04
## geneID
## D043171 4312/991/2305/1062/4605/10403/7153/55355/4751/4085/81620/332/7272/9212/1111/6790
## D000782 4312/55143/991/1062/7153/4751/79019/55839/890/983/4085/332/7272/9212/8208/1111/6790
## D042822 55143/991/1062/4605/7153/1381/9787/4751/10635/890/4085/81620/332/9212/1111/6790
## D012595 4312/6280/1062/4605/7153/3627/4283/6362/7850/3002/4321
## D009303 4312/7153/3627/6241/983/4085/5918/332/3002/4321/6790
## D019698 4312/3627/10563/6373/4283/983/6362/7850/332/3002/3620
## Count
## D043171 16
## D000782 17
## D042822 16
## D012595 11
## D009303 11
## D019698 11
In the over-representation analysis, we use data source from gendoo and C (Diseases) category.
In the following example, we use data source from gene2pubmed and test category G (Phenomena and Processes) using GSEA.
y <- gseMeSH(geneList, MeSHDb = "MeSH.Hsa.eg.db", database = 'gene2pubmed', category = "G")
## [1] "preparing geneSet collections..."
## [1] "GSEA analysis..."
## [1] "leading edge analysis..."
## [1] "done..."
head(y)
## ID Description setSize enrichmentScore
## D059647 D059647 Gene-Environment Interaction 456 -0.3543814
## D009929 D009929 Organ Size 451 -0.3449653
## D009043 D009043 Motor Activity 398 -0.3391521
## D050156 D050156 Adipogenesis 371 -0.3585031
## D004041 D004041 Dietary Fats 314 -0.3425194
## D049629 D049629 Waist-Hip Ratio 302 -0.3419178
## NES pvalue p.adjust qvalues rank
## D059647 -1.582649 0.001222494 0.03881504 0.02848429 2237
## D009929 -1.538603 0.001225490 0.03881504 0.02848429 2309
## D009043 -1.496853 0.001253133 0.03881504 0.02848429 1757
## D050156 -1.574589 0.001264223 0.03881504 0.02848429 2207
## D004041 -1.482771 0.001331558 0.03881504 0.02848429 1684
## D049629 -1.477004 0.001338688 0.03881504 0.02848429 2176
## leading_edge
## D059647 tags=26%, list=18%, signal=22%
## D009929 tags=26%, list=18%, signal=22%
## D009043 tags=21%, list=14%, signal=18%
## D050156 tags=26%, list=18%, signal=22%
## D004041 tags=21%, list=13%, signal=19%
## D049629 tags=26%, list=17%, signal=22%
## core_enrichment
## D059647 9497/118/8859/6532/23405/7424/2295/7157/8631/627/2774/22891/2908/4088/51151/11132/1387/860/268/7366/2104/4153/29119/3791/1543/3643/22841/1129/5624/3240/3174/3350/5590/55304/55213/1548/2169/196/8204/8863/5021/23284/9162/11005/4256/3426/84159/5334/629/1793/4208/4322/7048/6817/553/56172/3953/22795/2638/210/5243/5468/1393/1012/27136/51314/4023/5172/4319/4214/3952/5577/126/7832/79068/4313/2944/9369/3075/6720/7494/2099/857/57161/9223/4306/79750/4035/4915/10443/5744/5654/100126791/3551/2487/1746/185/2952/6935/4128/4059/4582/27324/9358/64084/7166/6505/9370/3708/3117/80129/125/5105/2018/2167/652/4137/1524/5241
## D009929 154/9846/3315/6716/9732/5139/7337/5530/4086/6532/1499/7157/627/2252/22891/2908/8654/4088/22846/4057/860/268/2735/2104/23522/5480/51131/3082/10253/831/604/1028/182/7173/5624/8743/23047/596/9905/1548/2272/22829/948/27303/4314/196/6019/595/5021/7248/4212/2488/54820/5334/6403/2246/4803/866/5919/79789/1907/7048/1831/4060/2247/5468/8076/5793/3485/1733/3952/126/3778/79068/79633/6653/5244/4313/3625/10468/9201/1501/6720/2273/2099/3480/5764/6387/1471/1462/4016/2690/8817/8821/5125/1191/5350/2162/5744/23541/185/367/4982/25802/4128/150/3479/10451/9370/125/4857/1308/2167/652/57502/4137/8614/5241
## D009043 23621/3082/1291/2915/1543/7466/3240/3350/55304/181/2169/27306/80169/9627/196/8678/8863/23284/81627/4692/5799/2259/3087/1278/1277/3953/4747/2247/6414/210/4744/5468/89795/4023/8522/3485/3952/79068/8864/4313/2944/2273/2099/3480/8528/4908/56892/3339/57161/4741/4306/6571/79750/4915/5744/2487/58503/347/6863/2952/5327/367/4982/4128/4059/3572/150/7060/9358/7166/3479/9254/5348/4129/9370/3708/1311/5105/4137/1408/5241
## D050156 5595/8609/9563/27332/1499/79738/4837/7157/79960/5729/408/2908/4088/6500/8038/4057/6649/5564/860/8648/10365/10253/54884/4602/7474/6776/79875/596/25956/8644/80781/79923/1490/50486/7840/84162/6041/4692/2246/4208/11075/63924/5919/284119/2308/9411/54795/5950/79365/2247/5468/50507/6469/8553/4023/594/7350/81029/3952/79068/5733/4313/10468/10628/6720/11213/55893/290/6678/63895/4035/633/23414/8639/2162/165/3551/10788/185/3357/367/4982/3667/1634/4128/23024/3479/6424/9370/2167/652/8839/54829/2625/79689/10974
## D004041 3554/4925/22841/7466/2181/3350/201134/181/2169/948/55911/324/4018/3426/3087/6785/2308/1581/56172/3953/1384/5950/2166/60481/5468/5166/50507/1012/27136/4023/7056/4214/9365/7350/3952/3778/79068/8864/2944/6720/5159/3991/2203/2819/9223/4035/32/213/165/347/2152/185/3487/5327/3667/54898/150/64084/3479/9370/5105/5174/2018/5346/7021/79689
## D049629 8609/9563/23405/10206/23314/4776/25970/627/2908/490/4057/268/3567/23429/283450/1543/3240/3174/81490/23047/55304/5099/54808/4179/2169/8082/4018/54465/4256/3087/5919/253461/26470/10903/1581/56172/3953/5950/2638/5468/1012/8835/4023/594/4214/7350/3952/79068/51232/2202/6444/9369/2099/3991/4016/57161/79750/4915/5125/5167/8639/11188/2487/2697/6935/3487/367/3667/4059/150/9358/3479/6424/9370/4629/652/5346/7021/4239
User can use visualization methods implemented in DOSE (i.e.barplot, dotplot, cnetplot, enrichMap, upsetplot and gseaplot) to visualize these enrichment results. With these visualization methods, it’s much easier to interpret enriched results.
dotplot(x)
gseaplot(y, y[1,1], title=y[1,2])
meshes implemented four IC-based methods (i.e. Resnik2, Jiang3, Lin4 and Schlicker5) and one graph-structure based method (i.e. Wang6). For algorithm details, please refer to the vignette of GOSemSim package7
meshSim function is designed to measure semantic similarity between two MeSH term vectors.
library(meshes)
## hsamd <- meshdata("MeSH.Hsa.eg.db", category='A', computeIC=T, database="gendoo")
data(hsamd)
meshSim("D000009", "D009130", semData=hsamd, measure="Resnik")
## [1] 0.2910261
meshSim("D000009", "D009130", semData=hsamd, measure="Rel")
## [1] 0.521396
meshSim("D000009", "D009130", semData=hsamd, measure="Jiang")
## [1] 0.4914785
meshSim("D000009", "D009130", semData=hsamd, measure="Wang")
## [1] 0.5557103
meshSim(c("D001369", "D002462"), c("D017629", "D002890", "D008928"), semData=hsamd, measure="Wang")
## D017629 D002890 D008928
## D001369 0.2886598 0.1923711 0.2193326
## D002462 0.6521739 0.2381925 0.2809552
geneSim function is designed to measure semantic similarity among two gene vectors.
geneSim("241", "251", semData=hsamd, measure="Wang", combine="BMA")
## [1] 0.487
geneSim(c("241", "251"), c("835", "5261","241", "994"), semData=hsamd, measure="Wang", combine="BMA")
## 835 5261 241 994
## 241 0.732 0.337 1.000 0.438
## 251 0.526 0.588 0.487 0.597
Here is the output of sessionInfo() on the system on which this document was compiled:
## R version 3.3.1 (2016-06-21)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.1 LTS
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] meshes_1.0.0 DOSE_3.0.0 MeSH.db_1.7.0
## [4] MeSH.Hsa.eg.db_1.7.0 MeSHDbi_1.10.0 BiocGenerics_0.20.0
## [7] BiocStyle_2.2.0
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.7 formatR_1.4 plyr_1.8.4
## [4] tools_3.3.1 digest_0.6.10 RSQLite_1.0.0
## [7] evaluate_0.10 tibble_1.2 gtable_0.2.0
## [10] igraph_1.0.1 fastmatch_1.0-4 DBI_0.5-1
## [13] yaml_2.1.13 fgsea_1.0.0 gridExtra_2.2.1
## [16] stringr_1.1.0 knitr_1.14 S4Vectors_0.12.0
## [19] IRanges_2.8.0 stats4_3.3.1 grid_3.3.1
## [22] qvalue_2.6.0 Biobase_2.34.0 data.table_1.9.6
## [25] AnnotationDbi_1.36.0 BiocParallel_1.8.0 GOSemSim_2.0.0
## [28] rmarkdown_1.1 reshape2_1.4.1 GO.db_3.4.0
## [31] DO.db_2.9 ggplot2_2.1.0 magrittr_1.5
## [34] splines_3.3.1 scales_0.4.0 htmltools_0.3.5
## [37] assertthat_0.1 colorspace_1.2-7 labeling_0.3
## [40] stringi_1.1.2 munsell_0.4.3 chron_2.3-47
1. Yu, G., Wang, L.-G., Yan, G.-R. & He, Q.-Y. DOSE: An r/Bioconductor package for disease ontology semantic and enrichment analysis. Bioinformatics 31, 608–609 (2015).
2. Philip, R. Semantic similarity in a taxonomy: An Information-Based measure and its application to problems of ambiguity in natural language. Journal of Artificial Intelligence Research 11, 95–130 (1999).
3. Jiang, J. J. & Conrath, D. W. Semantic similarity based on corpus statistics and lexical taxonomy. Proceedings of 10th International Conference on Research In Computational Linguistics (1997). at <http://www.citebase.org/abstract?id=oai:arXiv.org:cmp-lg/9709008>
4. Lin, D. An Information-Theoretic definition of similarity. In Proceedings of the 15th International Conference on Machine Learning 296—304 (1998). doi:10.1.1.55.1832
5. Schlicker, A., Domingues, F. S., Rahnenfuhrer, J. & Lengauer, T. A new measure for functional similarity of gene products based on gene ontology. BMC Bioinformatics 7, 302 (2006).
6. Wang, J. Z., Du, Z., Payattakool, R., Yu, P. S. & Chen, C.-F. A new method to measure the semantic similarity of gO terms. Bioinformatics (Oxford, England) 23, 1274–81 (2007).
7. Yu, G. et al. GOSemSim: An r package for measuring semantic similarity among gO terms and gene products. Bioinformatics 26, 976–978 (2010).
8. Yu, G., Wang, L.-G., Han, Y. & He, Q.-Y. clusterProfiler: an r package for comparing biological themes among gene clusters. OMICS: A Journal of Integrative Biology 16, 284–287 (2012).
9. Yu, G. & He, Q.-Y. ReactomePA: An r/Bioconductor package for reactome pathway analysis and visualization. Mol. BioSyst. 12, 477–479 (2016).