The R packages under analysis were retrieved from CRAN/Bioconductor on 2021-10-28. There are <%=sum(!grepl('bioconductor', df$repository))%> packages from CRAN and <%=length(grep('bioconductor', df$repository))%> packages from Bioconductor (Bioconductor version 3.14).
In the DESCRIPTION file of a package denoted as P, its direct dependency packages are listed in the Depends, Imports, LinkingTo, Suggests and Enhances fields. We define the following dependency categories for package P:
Packages listed in the Depends, Imports and LinkingTo fields (package category B in the following diagram, the same as the packages in the red box).
Packages listed in the Suggests and Enhances fields are also included (package categories A, B, C and D). This simulates the number of strong dependencies that would result if all packages were put into Depends/Imports.
Next we define various measures for heaviness:
The heaviness of a parent package A on P is n1 - n2, where n1 is the number of parent packages for P and n2 is the number of parent packages for P if A is moved to Suggests. In other words, the heaviness measures the number of additional required packages that A brings to P.
For a package P with K child packages, denote n_1k as the number of parent packages for child package A_k and n_2k as the number of parent packages for A_k if P is moved to its Suggests. The heaviness of P on its child packages is then calculated as sum(n_1k - n_2k)/K. So here the heaviness measures the average number of additional packages P brings to its child packages.
When plotting the heaviness on child packages versus the number of child
packages (see the "Dependency plot" tab), note that the heaviness here is an average, so it easily takes a large value when the number of child packages is small. Thus, when ordering the dependency table, the packages at the top with the highest heaviness values are most likely those with few child packages (you can try ordering the dependency table below by the column "Heaviness on child packages"). These packages, despite their high heaviness, have very few child packages, so their effect on other packages is very small. What matters more for this analysis is to pick out the packages that affect many other packages. Therefore, we adjusted the original definition of "heaviness on children" to sum(n_1k - n_2k)/(10 + K), where 10 is an empirical value that greatly decreases the heaviness for packages with a small number of children. The same adjustment is applied to the heaviness on downstream packages.
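The heaviness definitions can be illustrated with a small sketch in R. It is not the code used for this analysis. The toy graph, the package names (base1, base2, C1, C2) and the helper functions required(), heaviness_of_parent(), children_of() and heaviness_on_children() are all hypothetical. The sketch assumes that "number of parent packages" counts every package required directly or indirectly, and it models "moving to Suggests" by dropping the dependency edge.

```r
# Hypothetical toy graph: each entry lists the strong direct dependencies
# (Depends/Imports/LinkingTo) of one package.
deps <- list(
  base1 = character(0),
  base2 = character(0),
  P     = c("base1", "base2"),
  C1    = "P",
  C2    = c("P", "base1")
)

# All packages required by `pkg`, directly or indirectly (assumption: this is
# what the text counts as n1 and n2).
required <- function(pkg, deps) {
  seen <- character(0)
  todo <- deps[[pkg]]
  while (length(todo) > 0) {
    cur <- todo[1]
    todo <- todo[-1]
    if (!(cur %in% seen)) {
      seen <- c(seen, cur)
      todo <- c(todo, deps[[cur]])
    }
  }
  seen
}

# n1 - n2: packages that `parent` adds to `child`. Moving `parent` to
# Suggests is modelled by removing the edge child -> parent.
heaviness_of_parent <- function(child, parent, deps) {
  n1 <- length(required(child, deps))
  moved <- deps
  moved[[child]] <- setdiff(moved[[child]], parent)
  n2 <- length(required(child, moved))
  n1 - n2
}

# Packages that list `pkg` as a strong direct dependency.
children_of <- function(pkg, deps) {
  names(deps)[vapply(deps, function(d) pkg %in% d, logical(1))]
}

# sum(n_1k - n_2k) / (adjust + K); adjust = 0 gives the unadjusted average,
# adjust = 10 gives the adjusted version described above.
heaviness_on_children <- function(pkg, deps, adjust = 0) {
  kids <- children_of(pkg, deps)
  K <- length(kids)
  if (K == 0) return(0)
  diffs <- vapply(kids, heaviness_of_parent, numeric(1),
                  parent = pkg, deps = deps)
  sum(diffs) / (adjust + K)
}

heaviness_of_parent("C2", "P", deps)          # 2
heaviness_on_children("P", deps)              # 2.5
heaviness_on_children("P", deps, adjust = 10) # 5/12, about 0.417
```

In this toy graph P has only two children, so the unadjusted average of 2.5 shrinks to about 0.417 once 10 is added to the denominator, which is the damping effect the adjustment is meant to have for packages with few children.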
Other measures are: