Univariate, Bivariate and Trivariate Entropies

The univariate entropy for a discrete variable \(X\) with \(r\) outcomes is defined by \[H(X) = \sum_x p(x) \log_2\frac{1}{p(x)} \] with which we can check for redundancy and uniformity. A discrete random variable with the minimum entropy of zero has no uncertainty and always takes the same single outcome. It is thus a constant that contributes nothing to further analysis and can be omitted. The maximum entropy is \(\log_2 r\), which corresponds to a uniform probability distribution over the \(r\) outcomes.
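As a minimal sketch of this definition, the following base R function estimates the univariate entropy from observed relative frequencies. The helper name `entropy_uni` is hypothetical and is not part of netropy; it only illustrates the two extreme cases described above.

```r
# Hypothetical helper (not part of netropy): empirical entropy in bits.
entropy_uni <- function(x) {
  p <- prop.table(table(x))   # relative frequencies of the observed outcomes
  p <- p[p > 0]               # keep only outcomes that actually occur
  sum(p * log2(1 / p))
}

h_const   <- entropy_uni(rep("a", 10))  # constant variable: entropy 0
h_uniform <- entropy_uni(1:8)           # uniform over r = 8 outcomes: log2(8) = 3
c(constant = h_const, uniform = h_uniform)
```

A variable whose estimated entropy is zero in this sketch is a candidate for omission, while a value equal to \(\log_2 r\) indicates that all \(r\) outcomes are equally frequent in the sample.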

The bivariate entropy for discrete variables \(X\) and \(Y\) is defined by \[H(X,Y) = \sum_x \sum_y p(x,y) \log_2\frac{1}{p(x,y)}\] with which we can check for redundancy, functional relationships and stochastic independence between pairs of variables. It is bounded according to \[H(X) \leq H(X,Y) \leq H(X)+H(Y)\] where equality on the left holds if and only if \(Y\) is a function of \(X\), and equality on the right holds if and only if \(X\) and \(Y\) are stochastically independent.

Note that when the bivariate entropy of two variables is equal to the univariate entropy of one of them, the other variable is redundant: it provides no additional information and should be omitted.
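The two extremes of the bivariate bounds can be illustrated on toy data. This is a sketch under stated assumptions: `entropy_joint` is a hypothetical base R helper (not a netropy function), and the vectors `x`, `y` and `z` are made up for the illustration.

```r
# Hypothetical helper (not part of netropy): joint entropy in bits of one
# or more discrete variables, estimated from relative frequencies.
entropy_joint <- function(...) {
  p <- prop.table(table(...))  # joint relative frequencies
  p <- p[p > 0]                # keep only cells that actually occur
  sum(p * log2(1 / p))
}

x <- c(1, 1, 2, 2, 3, 3, 4, 4)
y <- x %% 2                      # y is a function of x
z <- c(0, 1, 0, 1, 0, 1, 0, 1)   # z is independent of x in this toy sample

h_x  <- entropy_joint(x)      # H(X) = 2
h_z  <- entropy_joint(z)      # H(Z) = 1
h_xy <- entropy_joint(x, y)   # equals H(X): y adds no information
h_xz <- entropy_joint(x, z)   # equals H(X) + H(Z): independence
c(H_X = h_x, H_XY = h_xy, H_XZ = h_xz)
```

Here `y` attains the lower bound and would be flagged as redundant given `x`, while `z` attains the upper bound.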

These results on bivariate entropies are directly linked to joint entropies and association graphs.

Similarly, trivariate entropies (and higher order entropies) allow us to check for functional relationships and stochastic independence between three (or more) variables. The trivariate entropy of three variables \(X\), \(Y\) and \(Z\) is defined by \[H(X,Y,Z) = \sum_x \sum_y \sum_z p(x,y,z) \log_2\frac{1}{p(x,y,z)}\]

and bounded by \[ H(X,Y) \leq H(X,Y,Z) \leq H(X,Z) + H(Y,Z) - H(Z). \]
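The trivariate bounds can be checked numerically in the same way. The sketch below uses made-up binary vectors and a hypothetical base R helper `entropy_joint` (not a netropy function). Because `z` is constructed as a function of `x` and `y`, the trivariate entropy should coincide with the lower bound \(H(X,Y)\).

```r
# Hypothetical helper (not part of netropy): joint entropy in bits.
entropy_joint <- function(...) {
  p <- prop.table(table(...))  # joint relative frequencies
  p <- p[p > 0]                # keep only cells that actually occur
  sum(p * log2(1 / p))
}

x <- c(0, 0, 1, 1, 0, 0, 1, 1)
y <- c(0, 1, 0, 1, 0, 1, 0, 1)
z <- as.integer(xor(x, y))     # z is a function of (x, y)

h_xy  <- entropy_joint(x, y)       # lower bound H(X,Y)
h_xyz <- entropy_joint(x, y, z)    # trivariate entropy H(X,Y,Z)
upper <- entropy_joint(x, z) + entropy_joint(y, z) - entropy_joint(z)

c(lower = h_xy, trivariate = h_xyz, upper = upper)
```

In this toy sample the lower bound is attained and the upper bound is not, since `x` and `y` are not conditionally independent given `z`.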

The results on bivariate and trivariate entropies are directly linked to prediction power and expected conditional entropies.

Examples of computing univariate, bivariate and trivariate entropies are given in the following.


Example: univariate and bivariate entropies

library(netropy)

We create a dataframe dyad.var consisting of dyad variables as described and created in variable domains and data editing. Similar analyses can be performed on observed and/or transformed dataframes with vertex or triad variables.

head(dyad.var)
##   status gender office years age practice lawschool cowork advice friend
## 1      3      3      0     8   8        1         0      0      3      2
## 2      3      3      3     5   8        3         0      0      0      0
## 3      3      3      3     5   8        2         0      0      1      0
## 4      3      3      0     8   8        1         6      0      1      2
## 5      3      3      0     8   8        0         6      0      1      1
## 6      3      3      1     7   8        1         6      0      1      1

The function entropy_bivar() computes the bivariate entropies of all pairs of variables in the dataframe. The output is given as an upper triangular matrix with cells giving the bivariate entropies of row and column variables. The diagonal thus gives the univariate entropies for each variable in the dataframe:

entropy_bivar(dyad.var)
##           status gender office years   age practice lawschool cowork advice
## status     1.493  2.868  3.640 3.370 3.912    3.453     4.363  2.092  2.687
## gender        NA  1.547  3.758 3.939 4.274    3.506     4.439  2.158  2.785
## office        NA     NA  2.239 4.828 4.901    4.154     5.058  2.792  3.388
## years         NA     NA     NA 2.671 4.857    4.582     5.422  3.268  3.868
## age           NA     NA     NA    NA 2.801    4.743     5.347  3.411  4.028
## practice      NA     NA     NA    NA    NA    1.962     4.880  2.530  3.127
## lawschool     NA     NA     NA    NA    NA       NA     2.953  3.567  4.186
## cowork        NA     NA     NA    NA    NA       NA        NA  0.615  1.687
## advice        NA     NA     NA    NA    NA       NA        NA     NA  1.248
## friend        NA     NA     NA    NA    NA       NA        NA     NA     NA
##           friend
## status     2.324
## gender     2.415
## office     3.044
## years      3.483
## age        3.637
## practice   2.831
## lawschool  3.812
## cowork     1.456
## advice     1.953
## friend     0.881
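To connect this output to the bivariate bounds, the values for one pair can be checked by hand. The sketch below copies three numbers from the matrix above (status and gender) and compares them with the bounds; the variable names are only illustrative.

```r
# Values copied from the entropy_bivar(dyad.var) output above.
h_status        <- 1.493   # H(status), diagonal
h_gender        <- 1.547   # H(gender), diagonal
h_status_gender <- 2.868   # H(status, gender), off-diagonal

# Bounds: max(H(X), H(Y)) <= H(X,Y) <= H(X) + H(Y)
lower <- max(h_status, h_gender)
upper <- h_status + h_gender        # 3.040
gap   <- upper - h_status_gender    # 0.172, zero would indicate independence

c(lower = lower, joint = h_status_gender, upper = upper, gap = gap)
```

The bivariate entropy lies strictly between the two bounds, so neither variable is a function of the other, and the small gap to the upper bound suggests only weak dependence between the two variables in this dataframe.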

Example: redundant variables

Bivariate entropies can be used to detect redundant variables that should be omitted from the dataframe for further analysis. When calculating bivariate entropies, one can check whether the diagonal values are equal to any of the other values in the rows and columns. As seen above, the dataframe dyad.var has no redundant variables. This can also be checked using the function redundancy(), which yields a binary matrix as output indicating which row and column variables hold the same information:

redundancy(dyad.var)
## no redundant variables
## NULL

To illustrate an example with redundancy, we use the dataframe att.var with node attributes as described and created in variable domains and data editing. Note however that we now keep the variable senior in this dataframe:

head(att.var)
##   senior status gender office years age practice lawschool
## 1      1      0      1      0     2   2        1         0
## 2      2      0      1      0     2   2        0         0
## 3      3      0      1      1     1   2        1         0
## 4      4      0      1      0     2   2        0         2
## 5      5      0      1      1     2   2        1         1
## 6      6      0      1      1     2   2        1         0

Checking redundancy on this dataframe yields the following output:

redundancy(att.var)
##           senior status gender office years age practice lawschool
## senior         0      1      1      1     1   1        1         1
## status         0      0      0      0     0   0        0         0
## gender         0      0      0      0     0   0        0         0
## office         0      0      0      0     0   0        0         0
## years          0      0      0      0     0   0        0         0
## age            0      0      0      0     0   0        0         0
## practice       0      0      0      0     0   0        0         0
## lawschool      0      0      0      0     0   0        0         0

As seen, senior has been flagged as a redundant variable, which is not surprising since it consists only of unique values. This redundancy can also be seen by computing the bivariate entropies and noting that the univariate entropy for this variable is equal to the bivariate entropies of all pairs that include it:

entropy_bivar(att.var)
##           senior status gender office years   age practice lawschool
## senior      6.15   6.15  6.150  6.150 6.150 6.150    6.150     6.150
## status        NA   1.00  1.695  2.084 2.007 2.276    1.981     2.459
## gender        NA     NA  0.817  1.927 2.226 2.383    1.799     2.323
## office        NA     NA     NA  1.125 2.693 2.668    2.088     2.607
## years         NA     NA     NA     NA 1.585 2.750    2.555     3.012
## age           NA     NA     NA     NA    NA 1.585    2.558     2.876
## practice      NA     NA     NA     NA    NA    NA    0.983     2.513
## lawschool     NA     NA     NA     NA    NA    NA       NA     1.533
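The same pattern can be reproduced on a small made-up dataframe. In this sketch, `toy` and the helper `entropy_joint` are hypothetical (neither comes from netropy or from att.var); the column `id` plays the role of senior, with one unique value per row.

```r
# Hypothetical helper (not part of netropy): joint entropy in bits.
entropy_joint <- function(...) {
  p <- prop.table(table(...))  # joint relative frequencies
  p <- p[p > 0]                # keep only cells that actually occur
  sum(p * log2(1 / p))
}

# Toy data (made up for illustration)
toy <- data.frame(
  id    = 1:8,                        # unique per row, like senior
  group = c(0, 0, 0, 1, 1, 1, 1, 1)
)

h_id       <- entropy_joint(toy$id)             # log2(8) = 3
h_id_group <- entropy_joint(toy$id, toy$group)  # same value as H(id)
isTRUE(all.equal(h_id, h_id_group))             # the pair adds nothing to id
```

A variable with \(n\) distinct, equally frequent values has entropy \(\log_2 n\), and pairing it with any other variable cannot increase this value. The value 6.15 in the senior row above is what \(\log_2 n\) gives for \(n = 71\).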

Example: trivariate entropies

Trivariate entropies can be computed using the function entropy_trivar(), which returns a dataframe with the first three columns representing possible triples of variables V1, V2 and V3 from the dataframe in question, and their entropies H(V1,V2,V3) as the fourth column. We illustrate this on the dataframe dyad.var:

entropy_trivar(dyad.var)
##            V1        V2        V3 H(V1,V2,V3)
## 1      status    gender    office       4.938
## 2      status    gender     years       4.609
## 3      status    gender       age       5.129
## 4      status    gender  practice       4.810
## 5      status    gender lawschool       5.664
## 6      status    gender    cowork       3.464
## 7      status    gender    advice       4.048
## 8      status    gender    friend       3.685
## 9      status    office     years       5.321
## 10     status    office       age       5.721
## 11     status    office  practice       5.528
## 12     status    office lawschool       6.303
## 13     status    office    cowork       4.165
## 14     status    office    advice       4.713
## 15     status    office    friend       4.378
## 16     status     years       age       5.430
## 17     status     years  practice       5.264
## 18     status     years lawschool       5.976
## 19     status     years    cowork       3.959
## 20     status     years    advice       4.535
## 21     status     years    friend       4.167
## 22     status       age  practice       5.832
## 23     status       age lawschool       6.305
## 24     status       age    cowork       4.498
## 25     status       age    advice       5.080
## 26     status       age    friend       4.695
## 27     status  practice lawschool       6.268
## 28     status  practice    cowork       3.989
## 29     status  practice    advice       4.537
## 30     status  practice    friend       4.258
## 31     status lawschool    cowork       4.957
## 32     status lawschool    advice       5.523
## 33     status lawschool    friend       5.162
## 34     status    cowork    advice       3.087
## 35     status    cowork    friend       2.867
## 36     status    advice    friend       3.360
## 37     gender    office     years       5.984
## 38     gender    office       age       6.277
## 39     gender    office  practice       5.641
## 40     gender    office lawschool       6.418
## 41     gender    office    cowork       4.301
## 42     gender    office    advice       4.873
## 43     gender    office    friend       4.539
## 44     gender     years       age       5.973
## 45     gender     years  practice       5.837
## 46     gender     years lawschool       6.558
## 47     gender     years    cowork       4.532
## 48     gender     years    advice       5.120
## 49     gender     years    friend       4.731
## 50     gender       age  practice       6.130
## 51     gender       age lawschool       6.654
## 52     gender       age    cowork       4.872
## 53     gender       age    advice       5.459
## 54     gender       age    friend       5.072
## 55     gender  practice lawschool       6.301
## 56     gender  practice    cowork       4.062
## 57     gender  practice    advice       4.638
## 58     gender  practice    friend       4.349
## 59     gender lawschool    cowork       5.044
## 60     gender lawschool    advice       5.632
## 61     gender lawschool    friend       5.266
## 62     gender    cowork    advice       3.217
## 63     gender    cowork    friend       2.983
## 64     gender    advice    friend       3.469
## 65     office     years       age       6.786
## 66     office     years  practice       6.552
## 67     office     years lawschool       7.259
## 68     office     years    cowork       5.344
## 69     office     years    advice       5.861
## 70     office     years    friend       5.528
## 71     office       age  practice       6.737
## 72     office       age lawschool       7.272
## 73     office       age    cowork       5.428
## 74     office       age    advice       5.988
## 75     office       age    friend       5.622
## 76     office  practice lawschool       6.876
## 77     office  practice    cowork       4.645
## 78     office  practice    advice       5.185
## 79     office  practice    friend       4.934
## 80     office lawschool    cowork       5.595
## 81     office lawschool    advice       6.149
## 82     office lawschool    friend       5.811
## 83     office    cowork    advice       3.798
## 84     office    cowork    friend       3.569
## 85     office    advice    friend       4.045
## 86      years       age  practice       6.624
## 87      years       age lawschool       7.187
## 88      years       age    cowork       5.442
## 89      years       age    advice       6.005
## 90      years       age    friend       5.618
## 91      years  practice lawschool       7.181
## 92      years  practice    cowork       5.117
## 93      years  practice    advice       5.665
## 94      years  practice    friend       5.360
## 95      years lawschool    cowork       5.999
## 96      years lawschool    advice       6.557
## 97      years lawschool    friend       6.174
## 98      years    cowork    advice       4.274
## 99      years    cowork    friend       4.020
## 100     years    advice    friend       4.505
## 101       age  practice lawschool       7.140
## 102       age  practice    cowork       5.290
## 103       age  practice    advice       5.849
## 104       age  practice    friend       5.538
## 105       age lawschool    cowork       5.940
## 106       age lawschool    advice       6.501
## 107       age lawschool    friend       6.108
## 108       age    cowork    advice       4.453
## 109       age    cowork    friend       4.191
## 110       age    advice    friend       4.672
## 111  practice lawschool    cowork       5.436
## 112  practice lawschool    advice       6.000
## 113  practice lawschool    friend       5.706
## 114  practice    cowork    advice       3.544
## 115  practice    cowork    friend       3.358
## 116  practice    advice    friend       3.810
## 117 lawschool    cowork    advice       4.613
## 118 lawschool    cowork    friend       4.381
## 119 lawschool    advice    friend       4.836
## 120    cowork    advice    friend       2.389
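As a hedged illustration of how this output relates to the trivariate bounds, the sketch below checks them for the triple cowork, advice and friend, with \(X\) = cowork, \(Y\) = advice and \(Z\) = friend. All numbers are copied from the entropy_bivar(dyad.var) and entropy_trivar(dyad.var) outputs shown earlier; the variable names are only illustrative.

```r
# Values copied from the outputs above.
h_friend        <- 0.881   # H(Z)
h_cowork_advice <- 1.687   # H(X,Y)
h_cowork_friend <- 1.456   # H(X,Z)
h_advice_friend <- 1.953   # H(Y,Z)
h_triple        <- 2.389   # H(X,Y,Z), row 120

# Bounds: H(X,Y) <= H(X,Y,Z) <= H(X,Z) + H(Y,Z) - H(Z)
upper     <- h_cowork_friend + h_advice_friend - h_friend   # 2.528
within    <- h_cowork_advice <= h_triple && h_triple <= upper
gap_upper <- upper - h_triple                               # 0.139

c(lower = h_cowork_advice, trivariate = h_triple, upper = upper)
```

The trivariate entropy lies strictly inside both bounds, so friend is not a function of cowork and advice, and cowork and advice are not conditionally independent given friend in this dataframe.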

References

Frank, O., & Shafie, T. (2016). Multivariate entropy analysis of network data. Bulletin of Sociological Methodology/Bulletin de Méthodologie Sociologique, 129(1), 45-63.

Nowicki, K., Shafie, T., & Frank, O. (Forthcoming 2022). Statistical Entropy Analysis of Network Data.