The Poisson binomial distribution is becoming increasingly important, especially in the areas of statistics, finance, insurance mathematics and quality management. Still, only a few R
packages exist for its calculation, namely poibin
and poisbinom
. Both are based on Hong (2013). The first one implements the exact DFT-CF algorithm along with the exact recursive method, which is very memory-demanding, and Normal and Poisson approximations. The latter package only provides a more efficient DFT-CF implementation. Unfortunately, it sometimes yields negative probabilities, especially for large distributions. This numerical issue has not been addressed to date. Biscarri, Zhao & Brunner (2018) developed two more efficient procedures, but at the time of this writing, no package exists that implements them, because the authors published a reference implementation for R
, but refrained from releasing it as a package. In addition to the disadvantages regarding computational speed, especially for very large distributions, poibin
and poisbinom
do not provide headers to their internal C/C++ functions, so that they cannot be imported directly by C or C++ code of other packages that use for example Rcpp
. In our own project, we often have to deal with Poisson binomial distributions that include Bernoulli trials with \(p_i \in \{0, 1\}\). Computation can be further optimized by handling these trials before the actual computations. None of the aforementioned packages do that. That is why we decided to develop PoissonBinomial
. We needed a package that
While implementing the procedures of Biscarri, Zhao & Brunner (2018), it was decided to also include all methods that are described in Hong (2013), together with three additional binomial approximations. In total, the PoissonBinomial
package includes 10 different algorithms of computing the Poisson binomial distribution, including optimized versions of the Normal, Refined Normal and Poisson Approaches. The exact recursive procedure has been improved so it is hugely less memory-demanding: the poibin
implementation needs the memory equivalent of roughly \((n + 1)^2\) values of type double
, while ours only needs \(3 \cdot (n + 1)\).
In this package, the following exact algorithms for computing the Poisson Binomial distribution with Bernoulli probabilities \(p_1, ..., p_n\) are implemented:
Examples and performance comparisons of these procedures are provided in a separate vignette.
In addition, the following approximation methods are provided:
Again, examples and performance comparisons are for these approaches are presented in a separate vignette as well.
Unfortunately, some approximations do not work at all for Bernoulli trials with \(p_i = 1\). This is why handling these values before the actual computation of the distribution is not only a performance tweak, but sometimes even a necessity. It is achieved by some simple preliminary considerations:
These cases are illustrated in the following example:
library(PoissonBinomial)
# Case 1
dpbinom(NULL, rep(0.3, 7))
#> [1] 0.0823543 0.2470629 0.3176523 0.2268945 0.0972405 0.0250047 0.0035721
#> [8] 0.0002187
dbinom(0:7, 7, 0.3)
#> [1] 0.0823543 0.2470629 0.3176523 0.2268945 0.0972405 0.0250047 0.0035721
#> [8] 0.0002187
# equal results
dpbinom(NULL, c(0, 0, 0, 0, 0, 0, 0))
#> [1] 1 0 0 0 0 0 0 0
dpbinom(NULL, c(1, 1, 1, 1, 1, 1, 1))
#> [1] 0 0 0 0 0 0 0 1
# Case 2
dpbinom(NULL, c(0, 0, 0, 0, 1, 1, 1))
#> [1] 0 0 0 1 0 0 0 0
# Case 3
dpbinom(NULL, c(0, 0, 0.1, 0.2, 0.4, 0.8, 1))
#> [1] 0.0000 0.0864 0.4344 0.3784 0.0944 0.0064 0.0000 0.0000
dpbinom(NULL, c(0, 0, 0.4, 1))
#> [1] 0.0 0.6 0.4 0.0 0.0
How to import and use the internal C++ functions in Rcpp
based packages is described in a separate vignette.