An introduction to eulerr

Johan Larsson

2016-12-10

eulerr generates area-proportional Euler diagrams that display set relationships (intersections, unions, and disjoints) with circles. Euler diagrams are Venn diagrams without the requirement that every set intersection be drawn, whether it is empty or not. Depending on the input, eulerr will therefore sometimes produce a Venn diagram and sometimes not.

Background

R features a number of packages that produce Euler and/or Venn diagrams, several of them prominent on CRAN.

One of these, venneuler, serves as the primary inspiration for this package, along with the refinements that Ben Frederickson has presented on his blog and made available in his JavaScript library venn.js.

venneuler, however, is written in Java, which prevents R users from browsing the source code or contributing unless they are also literate in Java. Furthermore, venneuler is known to produce imperfect output for set configurations that have perfect solutions. Consider, for instance, the following example, in which the intersection between A and B is both needless and unwanted.

library(venneuler, quietly = TRUE)
venn_fit <- venneuler(c(A = 75, B = 50, "A&B" = 0))
par(mar = c(0, 0, 0, 0))
plot(venn_fit)
venneuler plot with unwanted overlap.


Enter eulerr

eulerr is based on the improvements to venneuler that Ben Frederickson introduced with venn.js, but it has been coded from scratch, uses different optimizers, and returns the statistics featured in venneuler and eulerAPE.

Input

At the time of writing, it is possible to provide input to eulerr as either a named numeric vector or a matrix of logicals.

library(eulerr)

# Input in the form of a named numeric vector
fit1 <- eulerr(c("A" = 25, "B" = 5, "C" = 5,
                 "A&B" = 5, "A&C" = 5, "B&C" = 3,
                 "A&B&C" = 3))

# Input as a matrix of logicals
set.seed(1)
mat <- cbind(
  A = sample(c(TRUE, TRUE, FALSE), size = 50, replace = TRUE),
  B = sample(c(TRUE, FALSE), size = 50, replace = TRUE),
  C = sample(c(TRUE, FALSE, FALSE, FALSE), size = 50, replace = TRUE)
)
fit2 <- eulerr(mat)
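
To see what a logical matrix encodes, the set sizes and intersection sizes can be tabulated by hand in base R. This is only a sketch for inspecting the input, not a description of how eulerr processes it internally; the variable names are chosen for illustration. The `original` column printed for fit2 in the next section appears to report counts of this kind. The exact numbers depend on the R version, because the default algorithm behind `sample()` changed in R 3.6.0.

```r
# Rebuild the logical matrix from the example above
set.seed(1)
mat <- cbind(
  A = sample(c(TRUE, TRUE, FALSE), size = 50, replace = TRUE),
  B = sample(c(TRUE, FALSE), size = 50, replace = TRUE),
  C = sample(c(TRUE, FALSE, FALSE, FALSE), size = 50, replace = TRUE)
)

# Size of each set: number of rows where the column is TRUE
set_sizes <- colSums(mat)

# Size of each pairwise intersection: rows where both columns are TRUE
n_ab <- sum(mat[, "A"] & mat[, "B"])
n_ac <- sum(mat[, "A"] & mat[, "C"])
n_bc <- sum(mat[, "B"] & mat[, "C"])

# Size of the three-way intersection: rows where all columns are TRUE
n_abc <- sum(rowSums(mat) == ncol(mat))

c(set_sizes, "A&B" = n_ab, "A&C" = n_ac, "B&C" = n_bc, "A&B&C" = n_abc)
```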

Fit

We inspect our results by printing the eulerr object

fit2
##       original fitted residuals region_error
## A           31 31.032    -0.032        0.000
## B           29 29.046    -0.046        0.000
## C           13 13.025    -0.025        0.000
## A&B         20 19.920     0.080        0.001
## A&C          6  5.894     0.106        0.001
## B&C          7  6.913     0.087        0.001
## A&B&C        5  5.175    -0.175        0.002
## 
## diagError:  0.002 
## stress:     0

or directly access and plot the residuals.

# Cleveland dot plot of the residuals
graphics::dotchart(resid(fit2))
abline(v = 0, lty = 3)
Residuals for the eulerr fit.


This shows us that the A&B&C intersection is somewhat overrepresented in fit2. Given that these residuals are on the scale of the original values, however, they are arguably of little concern.

As an alternative, we could plot the circles in another program by retrieving their coordinates and radii

coef(fit2)
##           x           y         r
## A 0.6791992 0.447737822 0.7027710
## B 0.3341535 0.580278497 0.6799139
## C 0.1561640 0.003480332 0.4552978
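
As a minimal sketch of what can be done with these numbers outside of eulerr, the circle areas and a pairwise overlap can be recomputed from the coordinates and radii alone. The values below are copied from the output above. `overlap_area()` is a hypothetical helper based on the standard formula for the lens-shaped intersection of two circles; it is not an eulerr function and not necessarily how eulerr computes overlaps. The areas come out in squared coordinate units rather than on the scale of the input, so the sketch compares ratios instead of absolute values.

```r
# Coordinates and radii copied from the coef(fit2) output above
circles <- rbind(
  A = c(x = 0.6791992, y = 0.447737822, r = 0.7027710),
  B = c(x = 0.3341535, y = 0.580278497, r = 0.6799139),
  C = c(x = 0.1561640, y = 0.003480332, r = 0.4552978)
)

# Area of each circle, in squared coordinate units
circle_area <- pi * circles[, "r"]^2

# Overlap (lens) area of two circles; hypothetical helper, not part of eulerr
overlap_area <- function(c1, c2) {
  d  <- sqrt((c1[["x"]] - c2[["x"]])^2 + (c1[["y"]] - c2[["y"]])^2)
  r1 <- c1[["r"]]
  r2 <- c2[["r"]]
  if (d >= r1 + r2) return(0)                        # disjoint circles
  if (d <= abs(r1 - r2)) return(pi * min(r1, r2)^2)  # one circle inside the other
  r1^2 * acos((d^2 + r1^2 - r2^2) / (2 * d * r1)) +
    r2^2 * acos((d^2 + r2^2 - r1^2) / (2 * d * r2)) -
    0.5 * sqrt((-d + r1 + r2) * (d + r1 - r2) * (d - r1 + r2) * (d + r1 + r2))
}

ab <- overlap_area(circles["A", ], circles["B", ])

# Share of A covered by B: from the geometry, then from the fitted column of fit2
ab / circle_area[["A"]]
19.920 / 31.032

# Draw the circles with base graphics, true to scale
symbols(circles[, "x"], circles[, "y"], circles = circles[, "r"],
        inches = FALSE, asp = 1)
```

The two ratios should agree closely, which is a quick way to check that the circles have been transferred correctly before drawing them elsewhere.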

Starting configuration

A starting configuration is obtained via a constrained version of multidimensional scaling that has been explained thoroughly elsewhere (http://www.benfrederickson.com/better-venn-diagrams/).

Optimization

The starting configuration is based solely on the two-way relationships of the sets, so it has to be optimized for most set relationships. We try to optimize the coordinates and radii of the solution with the objective of producing a diagram that is as accurate as possible. In this context, however, accuracy is an ambiguous objective that has produced a slew of proposals. In eulerr, the user can choose between the cost function of eulerAPE (the default; Micallef and Rodgers 2014) and the stress statistic of venneuler (Wilkinson 2012).

eulerAPE’s cost function is computed by

\[\frac{1}{n} \sum_{i=1}^{n} \frac{(y_i - \hat{y}_i) ^ 2}{\hat{y}_i}\]

where \(\hat{y}_i\) is an estimate of \(y_i\) that is explored during optimization and \(n\) is the number of set relationships. venneuler’s stress function is defined as

\[\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i) ^ 2}{\sum_{i=1}^{n} y_i ^ 2}\]

where \(\hat{y}_i\) is an OLS estimate from the regression of the fitted areas on the original areas that is being explored during optimization.
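
The two cost functions can be sketched in a few lines of R, here evaluated on the values from the printed fit2 table. The function names are hypothetical and this is not eulerr's internal code. For the stress, the sketch assumes a regression through the origin with the diagram areas as the response, which is one reading of the definition above; other readings would give slightly different numbers.

```r
# Values copied from the printed fit2 table
original <- c(A = 31, B = 29, C = 13,
              "A&B" = 20, "A&C" = 6, "B&C" = 7, "A&B&C" = 5)
fitted_areas <- c(A = 31.032, B = 29.046, C = 13.025,
                  "A&B" = 19.920, "A&C" = 5.894, "B&C" = 6.913,
                  "A&B&C" = 5.175)

# eulerAPE-style cost: mean squared deviation, scaled by the estimate
eulerape_cost <- function(y, y_hat) {
  mean((y - y_hat)^2 / y_hat)
}

# venneuler-style stress (assumption: regression through the origin of the
# diagram areas on the original values, residual SS over total SS)
venneuler_stress <- function(areas, original) {
  beta  <- sum(areas * original) / sum(original^2)  # OLS slope, no intercept
  y_hat <- beta * original                          # OLS estimate of the areas
  sum((areas - y_hat)^2) / sum(areas^2)
}

cost   <- eulerape_cost(original, fitted_areas)
stress <- venneuler_stress(fitted_areas, original)
cost
stress
```

Both values are small for this fit. Note that the eulerAPE-style cost divides by the estimate, so deviations in small regions weigh more heavily than the same deviations in large ones.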

Goodness-of-fit

For goodness-of-fit measures, we use the same stress statistic from venneuler that was used during optimization (Wilkinson 2012) and the diagError statistic from eulerAPE (Micallef and Rodgers 2014), which is

\[ \max_{i = 1, 2, \dots, n} \left| \frac{y_i}{\sum y_i} - \frac{\hat{y}_i}{\sum \hat{y}_i} \right|\]

Our diagError is 0.0016 and our stress is 0, suggesting that the fit is accurate.
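
As a worked check, the diagError formula can be applied directly to the `original` and `fitted` columns of the printed fit2 table. This is a sketch that recomputes the statistic from rounded printed values, so it should only be expected to land close to the reported figure; the variable names are chosen for illustration.

```r
# Values copied from the printed fit2 table
original <- c(A = 31, B = 29, C = 13,
              "A&B" = 20, "A&C" = 6, "B&C" = 7, "A&B&C" = 5)
fitted_areas <- c(A = 31.032, B = 29.046, C = 13.025,
                  "A&B" = 19.920, "A&C" = 5.894, "B&C" = 6.913,
                  "A&B&C" = 5.175)

# Per-region error: difference between each region's share of the total
# in the input and in the fitted diagram
region_error <- abs(original / sum(original) -
                      fitted_areas / sum(fitted_areas))

# diagError is the largest of the per-region errors
diag_error <- max(region_error)

round(region_error, 3)
round(diag_error, 4)
```

The largest per-region error belongs to A&B&C, the same intersection that stood out in the residual plot.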

We can now be confident that eulerr provides a reasonable representation of our input. Were it otherwise, we would do best to stop here and look for another way to visualize our data. (I suggest the excellent UpSetR package.)

Plotting

Now we get to the fun part: plotting our diagram. This is easy, as well as highly customizable, with eulerr.

plot(fit2)

# Change fill colors, border type (remove) and fontface.
plot(fit2,
     polygon_args = list(col = c("dodgerblue4", "darkgoldenrod1", "cornsilk4"),
                         border = "transparent"),
     text_args = list(font = 8))

eulerr plots can be modified in every possible way.

eulerr’s default color palette is taken from qualpalr – another package that I have developed – which uses color difference algorithms to generate distinct qualitative color palettes.

Acknowledgements

eulerr would not be possible without Ben Frederickson’s work on venn.js or Leland Wilkinson’s venneuler.

References

Micallef, Luana, and Peter Rodgers. 2014. “EulerAPE: Drawing Area-Proportional 3-Venn Diagrams Using Ellipses.” PLOS ONE 9 (7): e101717. doi:10.1371/journal.pone.0101717.

Wilkinson, L. 2012. “Exact and Approximate Area-Proportional Circular Venn and Euler Diagrams.” IEEE Transactions on Visualization and Computer Graphics 18 (2): 321–31. doi:10.1109/TVCG.2011.56.