%\VignetteIndexEntry{How to Use pkgDepTools}
%\VignetteDepends{Biobase, Rgraphviz}
%\VignetteSuggests{RCurl}
%\VignetteKeywords{graphs, dependency, DAG, package}
%\VignettePackage{pkgDepTools}
\documentclass[12pt]{article}

\newcommand{\file}[1]{{\texttt{#1}}}
\newcommand{\code}[1]{{\texttt{#1}}}
\newcommand{\Rfunction}[1]{{\texttt{#1}}}
\newcommand{\Robject}[1]{{\texttt{#1}}}
\newcommand{\Rpackage}[1]{{\textsf{#1}}}
\newcommand{\Rclass}[1]{{\textit{#1}}}
\newcommand{\acronym}[1]{{\textsf{#1}}}

\title{How to Use pkgDepTools}
\author{Seth Falcon}

\begin{document}

\maketitle

\SweaveOpts{keep.source=TRUE}

\section{Introduction}

The \Rpackage{pkgDepTools} package provides tools for computing and
analyzing dependency relationships among R packages. With it, you can
build a graph-based representation of the dependencies among all
packages in a list of \acronym{CRAN}-style package repositories. There
are utilities for computing the installation order of a given package
and, if the \Rpackage{RCurl} package is available, for estimating the
download size required to install a given package and its
dependencies.

This vignette demonstrates the basic features of the package.

\section{Graph Basics}

A graph consists of a set of nodes and a set of edges representing
relationships between pairs of nodes. The relationships among the
nodes of a graph are binary; either there is an edge between a pair of
nodes or there is not. To model package dependencies using a graph,
let the set of packages be the nodes of the graph, with directed edges
originating from a given package and pointing to each of its
dependencies. Figure~\ref{fig:Category} shows the part of the
Bioconductor dependency graph for the \Rpackage{Category} package.
Since circular dependencies are not allowed, the resulting dependency
graph will be a directed acyclic graph (\acronym{DAG}).
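
As a small illustration of this model, the following sketch builds a
toy dependency graph by hand using the \Rpackage{graph} and
\Rpackage{RBGL} packages. The package names \code{pkgA}, \code{pkgB},
and \code{pkgC} are hypothetical and are not taken from any
repository. Because the graph is a \acronym{DAG}, a topological sort
exists, and reversing it gives one order in which every dependency
comes before the packages that need it. This is only a sketch of the
idea and is not necessarily how \Rpackage{pkgDepTools} computes
installation order.

<<>>=
library("graph")
library("RBGL")
## Toy graph: pkgA depends on pkgB and pkgC; pkgB depends on pkgC.
## All names here are hypothetical.
toyEdges <- list(pkgA=c("pkgB", "pkgC"), pkgB="pkgC",
                 pkgC=character(0))
toyGraph <- graphNEL(nodes=names(toyEdges), edgeL=toyEdges,
                     edgemode="directed")
## The out-edges of a node are its direct dependencies
edges(toyGraph)["pkgA"]
## All nodes reachable from pkgA (direct and indirect dependencies)
names(acc(toyGraph, "pkgA")[[1]])
## tsort() lists a node before the nodes it points to, so the
## reversed order puts dependencies first
toyOrder <- rev(tsort(toyGraph))
toyOrder
@

An edge from \code{pkgA} to \code{pkgB} reads ``\code{pkgA} depends on
\code{pkgB}'', which is why the topological order has to be reversed
before it can serve as an installation order.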

\section{Building a Dependency Graph}

<<>>=
options(width=72)
@

<<>>=
library("pkgDepTools")
library("Biobase")
library("Rgraphviz")
@

The \Rfunction{makeDepGraph} function retrieves the metadata for all
packages of a specified type (source, win.binary, or mac.binary) from
each repository in a list of repository URLs and builds a
\Rclass{graphNEL}\footnote{See \Robject{help("graphNEL-class")}}
instance representing the packages and their dependency relationships.
Its main arguments are: 1) \Robject{repList}, a character vector of
\acronym{CRAN}-style package repository URLs; 2)
\Robject{suggests.only}, a logical value indicating whether the
resulting graph should represent relations from the \code{Depends}
field (\code{FALSE}, the default) or the \code{Suggests} field
(\code{TRUE}); 3) \Robject{type}, a string indicating the type of
packages to search for (the default is \code{getOption("pkgType")});
4) \Robject{keep.builtin}, a logical value indicating whether packages
that come with a standard R install are kept in the dependency graph
(the default is \Robject{FALSE}).

Here we use \Rfunction{makeDepGraph} to build dependency graphs of the
Bioconductor and \acronym{CRAN} packages. Each dependency graph is a
\Rclass{graphNEL} instance. The out-edges of a given node list its
direct dependencies (as shown for package \Rpackage{annotate}). The
node attribute ``size'' gives the size of the package in megabytes
when the \Robject{dosize} argument is \Robject{TRUE} (this is the
default). Obtaining the size of packages requires the \Rpackage{RCurl}
package and can be time-consuming for large repositories, since a
separate HTTP request must be made for each package. In the examples
below, we set \Robject{dosize=FALSE} to speed up the computations.
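
Before querying real repositories, the node-attribute mechanism can be
tried without network access on a hand-built graph. In the sketch
below, the package names and sizes are made-up values chosen for
illustration (they are not repository data), and the summation at the
end is only one way such an attribute might be used, not necessarily
how \Rpackage{pkgDepTools} estimates download sizes.

<<>>=
library("graph")
## Hypothetical packages: pkgA depends on pkgB and pkgC
sizeGraph <- graphNEL(nodes=c("pkgA", "pkgB", "pkgC"),
                      edgeL=list(pkgA=c("pkgB", "pkgC"),
                                 pkgB=character(0),
                                 pkgC=character(0)),
                      edgemode="directed")
## Declare a "size" node attribute with a default of NA,
## then assign made-up sizes in MB
nodeDataDefaults(sizeGraph, attr="size") <- NA_real_
toySizes <- c(pkgA=0.5, pkgB=1.2, pkgC=0.3)
for (p in names(toySizes))
    nodeData(sizeGraph, n=p, attr="size") <- toySizes[[p]]
## Read the attribute back for one node
nodeData(sizeGraph, n="pkgA", attr="size")
## Total size of pkgA plus everything reachable from it
needed <- c("pkgA", names(acc(sizeGraph, "pkgA")[[1]]))
totalSize <- sum(unlist(nodeData(sizeGraph, n=needed, attr="size")))
totalSize
@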

<<>>=
library(BiocManager)
biocUrl <- repositories()["BioCsoft"]
biocDeps <- makeDepGraph(biocUrl, type="source", dosize=FALSE)
@
%
<<>>=
biocDeps
edges(biocDeps)["annotate"]
## if dosize=TRUE, size in MB is stored
## as a node attribute:
## nodeData(biocDeps, n="annotate", attr="size")
@

\section{Using the Dependency Graph}

The dependencies of a given package can be visualized using the graph
generated by \Rfunction{makeDepGraph} and the \Rpackage{Rgraphviz}
package. The graph shown in Figure~\ref{fig:Category} was produced
using the code shown below. The \Rfunction{acc} method from the
\Rpackage{graph} package returns a vector of all nodes that are
accessible from the given node. Here, it has been used to obtain the
complete list of \Rpackage{Category}'s dependencies.

<<>>=
categoryNodes <- c("Category",
                   names(acc(biocDeps, "Category")[[1]]))
categoryGraph <- subGraph(categoryNodes, biocDeps)
nn <- makeNodeAttrs(categoryGraph, shape="ellipse")
plot(categoryGraph, nodeAttrs=nn)
@

\begin{figure}[hbt]
\begin{center}
\setkeys{Gin}{width=0.95\textwidth}
\includegraphics{CategoryPlot}
\end{center}
\caption{The dependency graph for the \Rpackage{Category} package.}
\label{fig:Category}
\end{figure}

In R, there is no easy way to preview a given package's dependencies
and estimate the amount of data that needs to be downloaded, even
though the \Rfunction{install.packages} function will search for and
install package dependencies if you ask it to by specifying
\code{dependencies=TRUE}. The \Rfunction{getInstallOrder} function
provides such a ``preview''.

For computing installation order, it is useful to have a single graph
representing the relationships among all packages in all available
repositories. Below, we create such a graph combining all
\acronym{CRAN} and Bioconductor packages.

<<>>=
allDeps <- makeDepGraph(repositories(), type="source",
                        keep.builtin=TRUE, dosize=FALSE)
@

Calling \Rfunction{getInstallOrder} for package \Rpackage{GOstats}, we
see a listing of only those packages that need to be installed. Your
results will differ depending on which packages you already have
installed.

<<>>=
getInstallOrder("GOstats", allDeps)
@

When \code{needed.only=FALSE}, the complete dependency list is
returned regardless of which packages are currently installed.

<<>>=
getInstallOrder("GOstats", allDeps, needed.only=FALSE)
@

The edge directions of the dependency graph can be reversed and the
resulting graph used to determine the set of packages that depend on a
given package, directly or indirectly. For example, one might like to
know which packages make use of the \Rpackage{methods} package. Here
is one way to do that:

<<>>=
allDepsOnMe <- reverseEdgeDirections(allDeps)
usesMethods <- dijkstra.sp(allDepsOnMe, start="methods")$distance
usesMethods <- usesMethods[is.finite(usesMethods)]
length(usesMethods) - 1 ## don't count methods itself
table(usesMethods)
@

<<>>=
toLatex(sessionInfo())
@

\end{document}