Contrasts

In a previous vignette, we introduced the “marginal effect” as a partial derivative. Since derivatives are only properly defined for continuous variables, we cannot use them to interpret the effects of changes in categorical variables. For this, we turn to contrasts between Adjusted predictions. In the context of this package, a “Contrast” is defined as:

The difference between two adjusted predictions, calculated for meaningfully different regressor values (e.g., College graduates vs. Others).

The marginaleffects() function automatically calculates contrasts instead of derivatives for factor, logical, or character variables. The comparisons() function gives users more powerful features to compute different contrasts.

Contrasts: logical and factor variables

Consider a simple model with a logical and a factor variable:

library(marginaleffects)
library(magrittr)

tmp <- mtcars
tmp$am <- as.logical(tmp$am)
mod <- lm(mpg ~ am + factor(cyl), tmp)

The marginaleffects function automatically computes contrasts for each level of the categorical variables, relative to the baseline category (FALSE for logicals, and the reference level for factors), while holding all other values at their mode or mean:

mfx <- marginaleffects(mod)
summary(mfx)
#> Average marginal effects 
#>   Term     Contrast  Effect Std. Error z value   Pr(>|z|)     2.5 % 97.5 %
#> 1   am TRUE - FALSE   2.560      1.298   1.973    0.04851   0.01675  5.103
#> 2  cyl        6 - 4  -6.156      1.536  -4.009 6.1077e-05  -9.16608 -3.146
#> 3  cyl        8 - 4 -10.068      1.452  -6.933 4.1147e-12 -12.91359 -7.222
#> 
#> Model type:  lm 
#> Prediction type:  response

The summary printed above says that moving from the reference category 4 to the level 6 on the cyl factor variable is associated with a change of -6.156 in the adjusted prediction. Similarly, the contrast from FALSE to TRUE on the am variable is equal to 2.560.

We can obtain different contrasts by using the comparisons() function. For example:

comparisons(mod, contrast_factor = "sequential") %>% tidy()
#>   term     contrast  estimate std.error statistic      p.value    conf.low
#> 1   am TRUE - FALSE  2.559954  1.297579  1.972869 4.851044e-02  0.01674586
#> 2  cyl        6 - 4 -6.156118  1.535723 -4.008612 6.107658e-05 -9.16607927
#> 3  cyl        8 - 6 -3.911442  1.470254 -2.660385 7.805144e-03 -6.79308707
#>   conf.high
#> 1  5.103162
#> 2 -3.146156
#> 3 -1.029797
comparisons(mod, contrast_factor = "pairwise") %>% tidy()
#>   term     contrast   estimate std.error statistic      p.value     conf.low
#> 1   am TRUE - FALSE   2.559954  1.297579  1.972869 4.851044e-02   0.01674586
#> 2  cyl        6 - 4  -6.156118  1.535723 -4.008612 6.107658e-05  -9.16607927
#> 3  cyl        8 - 4 -10.067560  1.452082 -6.933187 4.114709e-12 -12.91358877
#> 4  cyl        8 - 6  -3.911442  1.470254 -2.660385 7.805144e-03  -6.79308707
#>   conf.high
#> 1  5.103162
#> 2 -3.146156
#> 3 -7.221530
#> 4 -1.029797
comparisons(mod, contrast_factor = "reference") %>% tidy()
#>   term     contrast   estimate std.error statistic      p.value     conf.low
#> 1   am TRUE - FALSE   2.559954  1.297579  1.972869 4.851044e-02   0.01674586
#> 2  cyl        6 - 4  -6.156118  1.535723 -4.008612 6.107658e-05  -9.16607927
#> 3  cyl        8 - 4 -10.067560  1.452082 -6.933187 4.114709e-12 -12.91358877
#>   conf.high
#> 1  5.103162
#> 2 -3.146156
#> 3 -7.221530

For comparison, this code produces the same results using the emmeans package:

library(emmeans)
emm <- emmeans(mod, specs = "cyl")
contrast(emm, method = "revpairwise")
#>  contrast estimate   SE df t.ratio p.value
#>  6 - 4       -6.16 1.54 28  -4.009  0.0012
#>  8 - 4      -10.07 1.45 28  -6.933  <.0001
#>  8 - 6       -3.91 1.47 28  -2.660  0.0331
#> 
#> Results are averaged over the levels of: am 
#> P value adjustment: tukey method for comparing a family of 3 estimates

emm <- emmeans(mod, specs = "am")
contrast(emm, method = "revpairwise")
#>  contrast     estimate  SE df t.ratio p.value
#>  TRUE - FALSE     2.56 1.3 28   1.973  0.0585
#> 
#> Results are averaged over the levels of: cyl

Note that these commands also work on for other types of models, such as GLMs, on different scales:

mod_logit <- glm(am ~ factor(gear), data = mtcars, family = binomial)

comparisons(mod_logit) %>% tidy()
#>   term contrast  estimate    std.error    statistic      p.value  conf.low
#> 1 gear    4 - 3 0.6666667 1.360805e-01     4.899061 9.629569e-07 0.3999538
#> 2 gear    5 - 3 1.0000000 1.071403e-05 93335.529606 0.000000e+00 0.9999790
#>   conf.high
#> 1 0.9333795
#> 2 1.0000210

comparisons(mod_logit, type = "link") %>% tidy()
#>   term contrast estimate std.error   statistic   p.value   conf.low conf.high
#> 1 gear    4 - 3 21.25922  4577.962 0.004643817 0.9962948  -8951.381   8993.90
#> 2 gear    5 - 3 41.13214  9155.924 0.004492407 0.9964156 -17904.148  17986.41

Contrasts: numeric variables

We can also compute contrasts for differences in numeric variables. For example, we can see what happens to the adjusted predictions when we increment the hp variable by 1 unit (default) or by 5 units:

mod <- lm(mpg ~ hp, data = mtcars)

comparisons(mod) %>% tidy()
#>   term contrast    estimate std.error statistic      p.value    conf.low
#> 1   hp       +1 -0.06822828 0.0101193 -6.742389 1.558043e-11 -0.08806175
#>     conf.high
#> 1 -0.04839481

comparisons(mod, contrast_numeric = 5) %>% tidy()
#>   term contrast   estimate  std.error statistic      p.value   conf.low
#> 1   hp       +5 -0.3411414 0.05059652 -6.742389 1.558043e-11 -0.4403087
#>   conf.high
#> 1 -0.241974

Compare adjusted predictions for a change in the regressor between two arbitrary values:

comparisons(mod, contrast_numeric = c(90, 110)) %>% tidy()
#>   term contrast  estimate std.error statistic      p.value  conf.low  conf.high
#> 1   hp 110 - 90 -1.364566 0.2023861 -6.742389 1.558043e-11 -1.761235 -0.9678961

Compare adjusted predictions when the regressor changes across the interquartile range, across one or two standard deviations about its mean, or from across its full range:

comparisons(mod, contrast_numeric = "iqr") %>% tidy()
#>   term contrast  estimate std.error statistic      p.value  conf.low conf.high
#> 1   hp      IQR -5.697061 0.8449619 -6.742389 1.558043e-11 -7.353156 -4.040966

comparisons(mod, contrast_numeric = "sd") %>% tidy()
#>   term contrast  estimate std.error statistic      p.value  conf.low conf.high
#> 1   hp       sd -4.677926 0.6938085 -6.742389 1.558043e-11 -6.037766 -3.318087

comparisons(mod, contrast_numeric = "2sd") %>% tidy()
#>   term contrast  estimate std.error statistic      p.value  conf.low conf.high
#> 1   hp      2sd -9.355853  1.387617 -6.742389 1.558043e-11 -12.07553 -6.636174

comparisons(mod, contrast_numeric = "minmax") %>% tidy()
#>   term  contrast estimate std.error statistic      p.value  conf.low conf.high
#> 1   hp Max - Min -19.3086  2.863763 -6.742389 1.558043e-11 -24.92147 -13.69573

Contrasts with interactions

In models with multiplicative interactions, the contrasts of a categorical variable will depend on the values of the interacted variable:

mod_int <- lm(mpg ~ am * factor(cyl), tmp)

We can now use the newdata argument of the marginaleffects function to compute contrasts for different values of the other regressors. As in the marginal effects vignette, the datagrid function can be handy. Since we only care about the logical am contrast, we use the variables to indicate the subset of results to report:

comparisons(mod_int, newdata = datagrid(cyl = tmp$cyl), variables = "am")
#>    rowid term     contrast comparison std.error    am cyl
#> 1:     1   am TRUE - FALSE   1.441667  2.315925 FALSE   6
#> 2:     2   am TRUE - FALSE   5.175000  2.052848 FALSE   4
#> 3:     3   am TRUE - FALSE   0.350000  2.315925 FALSE   8

Once again, we obtain the same results with emmeans:

emm <- emmeans(mod_int, specs = "am", by = "cyl")
contrast(emm, method = "revpairwise")
#> cyl = 4:
#>  contrast     estimate   SE df t.ratio p.value
#>  TRUE - FALSE     5.17 2.05 26   2.521  0.0182
#> 
#> cyl = 6:
#>  contrast     estimate   SE df t.ratio p.value
#>  TRUE - FALSE     1.44 2.32 26   0.623  0.5390
#> 
#> cyl = 8:
#>  contrast     estimate   SE df t.ratio p.value
#>  TRUE - FALSE     0.35 2.32 26   0.151  0.8810

Complex queries

As described above, the marginaleffects package includes limited support to compute contrasts. Users who require more powerful features are encouraged to consider alternative packages such as emmeans, modelbased, or ggeffects. These packages offer useful features such as automatic back-transforms, p value correction for multiple comparisons, and more.