xDAGsim | R Documentation |
xDAGsim
calculates pair-wise semantic similarity
between input terms based on a directed acyclic graph (DAG) with
annotated data. It returns an object of class "igraph", a network
representation of input terms. Parallel computing is also supported for
Linux or Mac operating systems.
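The conditions under which parallel computing applies can be sketched as follows. This is a minimal illustration, not the package's own code: the helper names `can_use_multicore` and `pick_cores` are hypothetical, and rounding half of the cores down (with a minimum of one) is an assumption.

```r
# Hypothetical helper: TRUE only on a Unix-like system (Linux or Mac OS)
# with both "foreach" and "doMC" installed.
can_use_multicore <- function() {
  on_unix <- .Platform$OS.type == "unix"
  has_pkgs <- requireNamespace("foreach", quietly = TRUE) &&
    requireNamespace("doMC", quietly = TRUE)
  on_unix && has_pkgs
}

# Hypothetical helper: use the requested number of cores, or half of the
# detected cores when none is requested (rounding down is an assumption).
pick_cores <- function(multicores = NULL) {
  if (is.null(multicores)) {
    n <- parallel::detectCores()
    if (is.na(n)) n <- 1L
    multicores <- max(1L, floor(n / 2))
  }
  as.integer(multicores)
}

if (can_use_multicore()) {
  # Register the multicore backend for 'foreach'
  doMC::registerDoMC(cores = pick_cores())
} else {
  message("Multicore backend unavailable; computation would run serially.")
}
```

If the check fails, calling the function with parallel=FALSE avoids relying on a backend that is not present.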
xDAGsim(g, terms = NULL, method.term = c("Resnik", "Lin", "Schlicker", "Jiang", "Pesquita"), fast = T, parallel = TRUE, multicores = NULL, verbose = T)
g |
an object of class "igraph". It must contain a vertex attribute called 'anno' for storing annotation data (see example for howto) |
terms |
the terms/nodes between which pair-wise semantic similarity is calculated. If NULL, all terms in the input DAG will be used for calculation, which is prohibitively expensive! |
method.term |
the method used to measure semantic similarity between input terms. It can be "Resnik" for the information content (IC) of the most informative common ancestor (MICA) (see http://dl.acm.org/citation.cfm?id=1625914), "Lin" for 2*IC at MICA divided by the sum of IC at the pair of terms, "Schlicker" for a version of 'Lin' weighted by 1-prob(MICA) (see http://www.ncbi.nlm.nih.gov/pubmed/16776819), "Jiang" for 1 - the difference between the sum of IC at the pair of terms and 2*IC at MICA (see http://arxiv.org/pdf/cmp-lg/9709008.pdf), "Pesquita" for graph information content similarity related to the Tanimoto-Jaccard index (i.e. summed information content of common ancestors divided by summed information content of all ancestors of term1 and term2; see http://www.ncbi.nlm.nih.gov/pubmed/18460186). By default, it uses the "Schlicker" method |
fast |
logical to indicate whether a vectorised fast computation is used. By default, it is set to true. The vectorised fast computation is always advisable; the conventional computation is provided only to help understand the scripts |
parallel |
logical to indicate whether parallel computation with
multicores is used. By default, it is set to true, but parallel
computation is not necessarily used. This is partly because the
parallel backends available are system-specific (currently only Linux
or Mac OS), and partly because it depends on whether the two packages
"foreach" and "doMC" have been installed. They can be installed via:
|
multicores |
an integer to specify how many cores will be registered as the multicore parallel backend to the 'foreach' package. If NULL, it will use half of the cores available on the user's computer. This option only works when parallel computation is enabled |
verbose |
logical to indicate whether messages will be displayed on the screen. By default, it is set to true for display |
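The five options for method.term can be illustrated on a toy DAG. This is a minimal sketch under stated assumptions, not the package's implementation: the DAG, the annotation counts and the helper `term_sim` are hypothetical, IC is assumed to be -log10 of the annotation frequency, each term is counted among its own ancestors, and undefined ratios are returned as 0.

```r
# Toy DAG (hypothetical): root -> A, root -> B, A -> C, B -> C.
# Each ancestor set includes the term itself (an assumption of this sketch).
ancestors <- list(
  root = c("root"),
  A    = c("A", "root"),
  B    = c("B", "root"),
  C    = c("C", "A", "B", "root")
)

# Hypothetical annotation counts, already propagated up to ancestors
n_anno <- c(root = 100, A = 40, B = 20, C = 5)
prob <- n_anno / n_anno[["root"]]   # annotation frequency per term
ic <- -log10(prob)                  # assumed definition of information content

term_sim <- function(t1, t2,
                     method = c("Resnik", "Lin", "Schlicker", "Jiang", "Pesquita")) {
  method <- match.arg(method)
  common <- intersect(ancestors[[t1]], ancestors[[t2]])
  mica <- common[which.max(ic[common])]   # most informative common ancestor
  ic_mica <- ic[[mica]]
  ic_sum <- ic[[t1]] + ic[[t2]]
  switch(method,
    Resnik    = ic_mica,
    Lin       = if (ic_sum > 0) 2 * ic_mica / ic_sum else 0,
    Schlicker = if (ic_sum > 0) (2 * ic_mica / ic_sum) * (1 - prob[[mica]]) else 0,
    Jiang     = 1 - (ic_sum - 2 * ic_mica),
    Pesquita  = {
      all_anc <- union(ancestors[[t1]], ancestors[[t2]])
      denom <- sum(ic[all_anc])
      if (denom > 0) sum(ic[common]) / denom else 0
    }
  )
}

# Compare all five measures for the pair (A, C)
methods <- c("Resnik", "Lin", "Schlicker", "Jiang", "Pesquita")
print(sapply(methods, function(m) term_sim("A", "C", m)))
```

In this toy graph the MICA of A and C is A itself, whereas A and B share only the root, so their Resnik similarity is 0.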
It returns an object of class "igraph", with nodes for input terms and edges for pair-wise semantic similarity between terms.
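How pair-wise similarities map onto edges can be sketched with a small symmetric matrix. This is only an illustration: the term IDs and similarity values are hypothetical, and keeping only non-zero pairs as edges is an assumption. The last lines also show how quickly the number of term pairs grows, which is why restricting `terms` is advisable.

```r
terms <- c("T1", "T2", "T3", "T4")   # hypothetical term IDs

# Hypothetical symmetric similarity matrix between the terms
sim_mat <- matrix(0, 4, 4, dimnames = list(terms, terms))
sim_mat["T1", "T2"] <- 0.8
sim_mat["T1", "T3"] <- 0.3
sim_mat["T2", "T4"] <- 0.5
sim_mat <- sim_mat + t(sim_mat)

# One edge per unordered pair: take the upper triangle and drop zero entries
idx <- which(upper.tri(sim_mat) & sim_mat != 0, arr.ind = TRUE)
edges <- data.frame(
  from   = rownames(sim_mat)[idx[, "row"]],
  to     = colnames(sim_mat)[idx[, "col"]],
  weight = sim_mat[idx],
  stringsAsFactors = FALSE
)
print(edges)

# Number of unordered pairs to evaluate for n terms: n * (n - 1) / 2
n_pairs <- choose(length(terms), 2)
n_pairs_large <- choose(10000, 2)   # for a hypothetical DAG with 10,000 terms
```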
none
xDAGanno, xConverter
## Not run: 
# 1) SNP-based ontology
# 1a) ig.EF (an object of class "igraph" stored as a directed graph)
g <- xRDataLoader('ig.EF')
g
# 1b) load GWAS SNPs annotated by EF (an object of class "dgCMatrix" storing a sparse matrix)
anno <- xRDataLoader(RData='GWAS2EF')
# 1c) prepare for ontology and its annotation information
dag <- xDAGanno(g=g, annotation=anno, path.mode="all_paths", true.path.rule=TRUE, verbose=TRUE)
# 1d) calculate pair-wise semantic similarity between 5 randomly chosen terms
terms <- sample(V(dag)$name, 5)
sim <- xDAGsim(g=dag, terms=terms, method.term="Schlicker", parallel=FALSE)
sim

###########################################################
# 2) Gene-based ontology
# 2a) ig.MP (an object of class "igraph" stored as a directed graph)
g <- xRDataLoader('ig.MP')
# 2b) load human genes annotated by MP (an object of class "GS" containing the 'gs' component)
GS <- xRDataLoader(RData='org.Hs.egMP')
anno <- GS$gs # note: this is a list
# 2c) prepare for annotation data
dag <- xDAGanno(g=g, annotation=anno, path.mode="all_paths", true.path.rule=TRUE, verbose=TRUE)
# 2d) calculate pair-wise semantic similarity between 5 randomly chosen terms
terms <- sample(V(dag)$name, 5)
sim <- xDAGsim(g=dag, terms=terms, method.term="Schlicker", parallel=FALSE)
sim

## End(Not run)