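In the context of this package, “marginal means” refers to the values obtained by this three-step process:
1. Build a grid of predictor values containing every combination of the categorical predictors, with numeric predictors held at their means.
2. Compute adjusted predictions for each cell of that grid.
3. Average the adjusted predictions across one dimension of the grid to obtain the marginal means.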
For example, consider a model with a numeric, a factor, and a logical predictor:
library(marginaleffects)
dat <- mtcars
dat$cyl <- as.factor(dat$cyl)
dat$am <- as.logical(dat$am)
mod <- lm(mpg ~ hp + cyl + am, data = dat)
Using the `predictions` function, we set the `hp` variable at its mean and compute predictions for all combinations of `am` and `cyl`:
predictions(mod, variables = c("am", "cyl"))
#> type predicted std.error conf.low conf.high hp am cyl
#> 1 expectation 21.03914 1.213043 18.55019 23.52810 146.6875 TRUE 6
#> 2 expectation 16.88129 1.272938 14.26944 19.49314 146.6875 FALSE 6
#> 3 expectation 24.96372 1.176830 22.54907 27.37838 146.6875 TRUE 4
#> 4 expectation 20.80587 1.756564 17.20169 24.41004 146.6875 FALSE 4
#> 5 expectation 21.43031 1.826126 17.68341 25.17721 146.6875 TRUE 8
#> 6 expectation 17.27245 1.116885 14.98079 19.56411 146.6875 FALSE 8
For illustration purposes, it is useful to reshape the above results:
cyl | am = TRUE | am = FALSE | Marginal mean of cyl |
---|---|---|---|
6 | 21.0 | 16.9 | 19.0 |
4 | 25.0 | 20.8 | 22.9 |
8 | 21.4 | 17.3 | 19.4 |
Marginal means of am | 22.5 | 18.3 | |
The marginal means of `am` and `cyl` are obtained by taking the mean of the adjusted predictions across cells. The `marginalmeans` function gives us the same results easily:
marginalmeans(mod)
#> term value predicted std.error
#> 1 cyl 4 22.88479 1.3566479
#> 2 cyl 6 18.96022 1.0729360
#> 3 cyl 8 19.35138 1.3770817
#> 4 am FALSE 18.31987 0.7853925
#> 5 am TRUE 22.47772 0.8343346
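To make the averaging concrete, the sketch below reproduces the point estimates by hand using only base R. It is an illustration of the procedure described in this section, not the package's internal implementation; the object names `grid`, `cells`, `mm_cyl`, and `mm_am` are chosen here for the example, and standard errors are not computed.

```r
# Refit the example model so that this block is self-contained
dat <- mtcars
dat$cyl <- as.factor(dat$cyl)
dat$am <- as.logical(dat$am)
mod <- lm(mpg ~ hp + cyl + am, data = dat)

# Grid: every combination of cyl and am, with hp held at its mean
grid <- expand.grid(
  hp = mean(dat$hp),
  cyl = levels(dat$cyl),
  am = c(FALSE, TRUE)
)

# Adjusted prediction for each cell of the grid
grid$predicted <- predict(mod, newdata = grid)

# Arrange the six predictions as a cyl-by-am matrix (one value per cell)
cells <- tapply(grid$predicted, list(cyl = grid$cyl, am = grid$am), mean)

# Average across one dimension of the grid at a time
mm_cyl <- rowMeans(cells)  # marginal means of cyl, averaged over am
mm_am <- colMeans(cells)   # marginal means of am, averaged over cyl

print(cells)
print(mm_cyl)
print(mm_am)
```

Each cell receives equal weight in these averages, regardless of how many observations fall in it, so the values should agree with the `predicted` column printed above.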
The same results can be obtained using the very powerful `emmeans` package:
library(emmeans)
emmeans(mod, specs = "cyl")
#> cyl emmean SE df lower.CL upper.CL
#> 4 22.9 1.36 27 20.1 25.7
#> 6 19.0 1.07 27 16.8 21.2
#> 8 19.4 1.38 27 16.5 22.2
#>
#> Results are averaged over the levels of: am
#> Confidence level used: 0.95
emmeans(mod, specs = "am")
#> am emmean SE df lower.CL upper.CL
#> FALSE 18.3 0.785 27 16.7 19.9
#> TRUE 22.5 0.834 27 20.8 24.2
#>
#> Results are averaged over the levels of: cyl
#> Confidence level used: 0.95
The `summary`, `tidy`, and `glance` functions are also available to summarize and manipulate the results:
me <- marginalmeans(mod)
tidy(me)
#> term group estimate std.error statistic p.value conf.low conf.high
#> 1 cyl 4 22.88479 1.3566479 16.86863 0 20.22581 25.54378
#> 2 cyl 6 18.96022 1.0729360 17.67134 0 16.85730 21.06313
#> 3 cyl 8 19.35138 1.3770817 14.05246 0 16.65235 22.05041
#> 4 am FALSE 18.31987 0.7853925 23.32575 0 16.78053 19.85921
#> 5 am TRUE 22.47772 0.8343346 26.94090 0 20.84246 24.11299
glance(me)
#> r.squared adj.r.squared sigma statistic p.value df logLik AIC
#> 1 0.824875 0.7989306 2.70253 31.7939 7.400614e-10 4 -74.50167 161.0033
#> BIC deviance df.residual nobs F
#> 1 169.7978 197.199 27 32 31.7939
summary(me)
#> Estimated marginal means
#> Term Group Mean Std. Error z value Pr(>|z|) 2.5 % 97.5 %
#> 1 cyl 4 22.88 1.3566 16.87 < 2.22e-16 20.23 25.54
#> 2 cyl 6 18.96 1.0729 17.67 < 2.22e-16 16.86 21.06
#> 3 cyl 8 19.35 1.3771 14.05 < 2.22e-16 16.65 22.05
#> 4 am FALSE 18.32 0.7854 23.33 < 2.22e-16 16.78 19.86
#> 5 am TRUE 22.48 0.8343 26.94 < 2.22e-16 20.84 24.11
#>
#> Model type: lm
#> Prediction type: expectation
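The point estimates and standard errors above match those from `emmeans`, but the confidence intervals are slightly narrower. The arithmetic sketch below suggests why: the printed `tidy` intervals are consistent with a normal (z) critical value, while the printed `emmeans` intervals are consistent with a t critical value on 27 residual degrees of freedom. This is inferred from the numbers shown in this section rather than from either package's documentation, and the vectors `est` and `se` are simply copied from the output above.

```r
# Estimates and standard errors copied from the tidy() output
est <- c(22.88479, 18.96022, 19.35138, 18.31987, 22.47772)
se <- c(1.3566479, 1.0729360, 1.3770817, 0.7853925, 0.8343346)

# Test statistic: estimate divided by its standard error
z <- est / se

# Interval based on the normal distribution
crit_z <- qnorm(0.975)
ci_z <- cbind(conf.low = est - crit_z * se, conf.high = est + crit_z * se)

# Interval based on the t distribution with 27 degrees of freedom
crit_t <- qt(0.975, df = 27)
ci_t <- cbind(lower.CL = est - crit_t * se, upper.CL = est + crit_t * se)

print(round(cbind(statistic = z, ci_z), 5))
print(round(ci_t, 1))
```

With only 27 residual degrees of freedom the two critical values differ noticeably, which accounts for the gap between the two sets of intervals.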
Thanks to those tidiers, we can also present the results in the style of a regression table using the `modelsummary` package:
library("modelsummary")
modelsummary(me,
    title = "Estimated Marginal Means",
    estimate = "{estimate} ({std.error}){stars}",
    statistic = NULL,
    group = term + group ~ model)
| | | Model 1 |
|---|---|---|
| cyl | 4 | 22.885 (1.357)*** |
| | 6 | 18.960 (1.073)*** |
| | 8 | 19.351 (1.377)*** |
| am | FALSE | 18.320 (0.785)*** |
| | TRUE | 22.478 (0.834)*** |
| Num.Obs. | | 32 |
| R2 | | 0.825 |
| R2 Adj. | | 0.799 |
| AIC | | 161.0 |
| BIC | | 169.8 |
| Log.Lik. | | −74.502 |
| F | | 31.794 |