The Human Metabolome Database (HMDB, http://www.hmdb.ca) includes XML documents describing 114,000 metabolites. We will show how to manipulate this metabolite metadata flexibly.
The hmdbQuery package includes a function for querying HMDB directly over HTTP:
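A minimal sketch of such a query, assuming hmdbQuery is installed and the HMDB site is reachable. The HmdbEntry constructor and its prefix and id arguments are not shown in the text above, so treat them as an assumption and check ?HmdbEntry for the installed version. The name lk1 matches the object used later in this document.

```r
# Assumes hmdbQuery (Bioconductor) is installed and www.hmdb.ca is reachable.
library(hmdbQuery)

# Fetch and parse the XML record for one metabolite, identified by accession.
# The argument names are an assumption; see ?HmdbEntry.
lk1 = HmdbEntry(prefix = "http://www.hmdb.ca/metabolites/", id = "HMDB0000001")

# Printing the object gives a short summary of the entry.
lk1
```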
The result is parsed and encapsulated in an S4 object:
## HMDB metabolite metadata for 1-Methylhistidine:
## There are 10 diseases annotated.
## Direct association reported for 5 biospecimens and 2 tissues.
## Use diseases(), biospecimens(), tissues() for more information.

The size of the complete import for a single metabolite suggests that holding comprehensive information about all HMDB constituents in memory would be inconvenient. The most effective approach to managing the metadata will depend on use cases to be developed over the long run.
Note, however, that this package provides snapshots of certain direct associations, derived from all information available as of Sept. 23, 2017. Direct associations reported in the database are available in the tables hmdb_disease, hmdb_gene, hmdb_protein, and hmdb_omim. For example:
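The snapshot tables can be inspected without a network connection. This is a sketch that assumes hmdbQuery is installed and that the tables ship as package datasets loadable with data(). Printing hmdb_disease gives output of the following form.

```r
# Assumes hmdbQuery (Bioconductor) is installed.
library(hmdbQuery)

# Load the disease-association snapshot
# (assumption: it is exposed as a dataset named hmdb_disease).
data(hmdb_disease)

# Print the table; long tables are abbreviated to the first and last rows.
hmdb_disease
```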
## DataFrame with 75360 rows and 3 columns
##         accession                        name
##       <character>                 <character>
## 1     HMDB0000001           1-Methylhistidine
## 2     HMDB0000001           1-Methylhistidine
## 3     HMDB0000001           1-Methylhistidine
## 4     HMDB0000001           1-Methylhistidine
## 5     HMDB0000002          1,3-Diaminopropane
## ...           ...                         ...
## 75356 HMDB0094706                 Serylvaline
## 75357 HMDB0094708        Tetraethylene glycol
## 75358 HMDB0094712                Serylleucine
## 75359 HMDB0100002      TG(i-14:0/17:0/i-13:0)
## 75360 HMDB0101657 TG(15:0/i-14:0/a-21:0)[rac]
##                                                    disease
##                                                <character>
## 1                                      Alzheimer's disease
## 2                                 Diabetes mellitus type 2
## 3                                           Kidney disease
## 4                                                  Obesity
## 5     Perillyl alcohol administration for cancer treatment
## ...                                                    ...
## 75356                                                   NA
## 75357                                                   NA
## 75358                                                   NA
## 75359                                                   NA
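## 75360                                                   NA

Some HMDB metabolites have been mapped to diseases. For a single entry, diseases() reports the associations together with supporting PubMed identifiers: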
## DataFrame with 10 rows and 4 columns
##           metabolite                   disease
##          <character>               <character>
## 1  1-Methylhistidine            Kidney disease
## 2  1-Methylhistidine        Early preeclampsia
## 3  1-Methylhistidine                 Pregnancy
## 4  1-Methylhistidine   Late-onset preeclampsia
## 5  1-Methylhistidine       Alzheimer's disease
## 6  1-Methylhistidine                   Obesity
## 7  1-Methylhistidine  Diabetes mellitus type 2
## 8  1-Methylhistidine        Propionic acidemia
## 9  1-Methylhistidine Maple syrup urine disease
## 10 1-Methylhistidine  Eosinophilic esophagitis
##                             pmids   accession
##                            <List> <character>
## 1  11380830,11418788,12865413,... HMDB0000001
## 2                        22494326 HMDB0000001
## 3     3252730,663967,12833386,... HMDB0000001
## 4                        23159745 HMDB0000001
## 5   17031479,11959400,8595727,... HMDB0000001
## 6   15899597,16253646,2401584,... HMDB0000001
## 7  15899597,11887176,16731998,... HMDB0000001
## 8   19809936,19551947,2226555,... HMDB0000001
## 9  12101068,10508118,10472531,... HMDB0000001
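## 10                                HMDB0000001

The PubMed identifiers in the pmids column can be used to retrieve the supporting literature, for example with the annotate package:

library(annotate)  # provides pubmed() and buildPubMedAbst()
pmids = unlist(diseases(lk1)[1,]$pmids)  # PMIDs for the first disease row (Kidney disease)
pm = pubmed(pmids[1])  # fetch the PubMed XML record for the first PMID
ab = buildPubMedAbst(xmlRoot(pm)[[1]])  # xmlRoot() is from the XML package
ab

## An object of class 'pubMedAbst':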
## Title: Dimethylglycine accumulates in uremia and predicts elevated
##      plasma homocysteine concentrations.
## PMID: 11380830
## Authors: DO McGregor, WJ Dellow, M Lever, PM George, RA Robson, ST
##      Chambers
## Journal: Kidney Int
## Date: Jun 2001

Note that before HMDB v4.0, biospecimens were called biofluids.
Each HMDB entry can carry arbitrarily many biospecimen and tissue associations. The direct accessors biospecimens() and tissues() return them, and by default all metadata for the entry is captured and made available through the store method.
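The accessors and the store method can be used as sketched here, assuming hmdbQuery is installed and HMDB is reachable. The entry is fetched again so that the block runs on its own. The HmdbEntry arguments, and the assumption that store() returns a named list, should be checked against the package documentation. The sketch covers only the accessor and store() calls, not every output shown below.

```r
# Assumes hmdbQuery (Bioconductor) is installed and www.hmdb.ca is reachable.
library(hmdbQuery)

# Fetch the entry (assumed constructor arguments; see ?HmdbEntry).
lk1 = HmdbEntry(prefix = "http://www.hmdb.ca/metabolites/", id = "HMDB0000001")

biospecimens(lk1)   # biospecimen names associated with the entry
tissues(lk1)        # tissue names associated with the entry

# The full parsed record is kept by default and returned by store().
st = store(lk1)
head(names(st))     # first few top-level metadata fields
```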
## [1] "Blood"                     "Cerebrospinal Fluid (CSF)"
## [3] "Feces"                     "Saliva"                   
## [5] "Urine"

## [1] "Muscle"          "Skeletal Muscle"

## [1] "version"              "creation_date"        "update_date"         
## [4] "accession"            "status"               "secondary_accessions"

## [1] 44

##                                  protein 
##               "Beta-Ala-His dipeptidase" 
##                                  protein 
## "Protein arginine N-methyltransferase 3"

## protein protein 
## "CNDP1" "PRMT3"