In the context of this package, an “Adjusted Prediction” is defined as:
The response predicted by a model for some combination of the regressors’ values, such as their means or factor levels (a.k.a. “reference grid”).
An adjusted prediction is thus the regression-adjusted response variable (or link, or other fitted value), for a given combination (or grid) of predictors. This grid may or may not correspond to the actual observations in a dataset.
By default, predictions calculates the regression-adjusted predicted values for a single hypothetical unit of observation with all regressors set at their means or modes:
library(marginaleffects)
mod <- lm(mpg ~ hp + factor(cyl), data = mtcars)
predictions(mod)
#> type predicted std.error conf.low conf.high hp cyl
#> 1 expectation 16.60307 1.278754 13.98366 19.22248 146.6875 8
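For intuition, the single row above can be approximated with base R alone. This is only a sketch, not the package's implementation. It assumes that "means or modes" refers to the sample mean of hp and the most frequent value of cyl, and that the interval is a t-based interval on the residual degrees of freedom.

```r
# Base R sketch (assumption: not how marginaleffects computes this internally)
mod <- lm(mpg ~ hp + factor(cyl), data = mtcars)

# One hypothetical unit: hp at its mean, cyl at its most frequent value
grid <- data.frame(
  hp  = mean(mtcars$hp),
  cyl = as.numeric(names(which.max(table(mtcars$cyl))))
)

# Fitted value and its standard error
p <- predict(mod, newdata = grid, se.fit = TRUE)

# 95% interval from the t distribution with the model's residual df
crit <- qt(0.975, df = p$df)
out <- data.frame(
  predicted = unname(p$fit),
  std.error = unname(p$se.fit),
  conf.low  = unname(p$fit - crit * p$se.fit),
  conf.high = unname(p$fit + crit * p$se.fit),
  grid
)
out
```

Under these assumptions, the values in out should line up with the predicted, std.error, conf.low, and conf.high columns printed above.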
In many cases, this is too limiting, and researchers will want to specify a grid of “typical” values over which to compute adjusted predictions.
There are two main ways to select the reference grid over which we want to compute adjusted predictions. The first is using the variables argument. The second is with the newdata argument and the typical() function that we already introduced in the marginal effects vignette.
variables: Levels and Tukey’s 5 numbers
The variables argument is a handy shortcut to create grids of predictors. Each of the levels of factor/logical/character variables listed in the variables argument will be displayed. For numeric variables, predictions will compute adjusted predictions at Tukey’s 5 summary numbers. All other variables will be set at their means or modes.
predictions(mod, variables = c("cyl", "hp"))
#> type predicted std.error conf.low conf.high cyl hp
#> 1 expectation 21.43244 1.6083883 18.137810 24.72708 6 52
#> 2 expectation 27.40010 1.0595843 25.229639 29.57056 4 52
#> 3 expectation 18.87925 2.5641372 13.626851 24.13165 8 52
#> 4 expectation 20.37474 1.2562453 17.801433 22.94804 6 96
#> 5 expectation 26.34239 0.9707174 24.353966 28.33081 4 96
#> 6 expectation 17.82154 1.9364843 13.854831 21.78825 8 96
#> 7 expectation 19.72569 1.1892191 17.289682 22.16169 6 123
#> 8 expectation 25.69334 1.1343184 23.369796 28.01689 4 123
#> 9 expectation 17.17249 1.5721502 13.952087 20.39289 8 123
#> 10 expectation 18.35547 1.4848895 15.313815 21.39713 6 180
#> 11 expectation 24.32313 1.7749372 20.687334 27.95892 4 180
#> 12 expectation 15.80228 0.9537705 13.848567 17.75599 8 180
#> 13 expectation 14.62945 3.4865450 7.487590 21.77132 6 335
#> 14 expectation 20.59711 4.0024363 12.398490 28.79573 4 335
#> 15 expectation 12.07626 2.1126445 7.748702 16.40381 8 335
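The grid behind this output can be sketched in base R, as a rough illustration rather than the package's own code. The sketch assumes that "Tukey's 5 summary numbers" corresponds to stats::fivenum() and that the levels of cyl are taken in order of first appearance in the data.

```r
# Base R sketch of the reference grid (assumption: illustrative only)
mod <- lm(mpg ~ hp + factor(cyl), data = mtcars)

# Tukey's five-number summary of hp: min, lower hinge, median, upper hinge, max
hp_values <- fivenum(mtcars$hp)

# Cross every observed value of cyl with the five hp values (3 x 5 = 15 rows)
grid <- expand.grid(cyl = unique(mtcars$cyl), hp = hp_values)

# Fitted values on the grid
grid$predicted <- predict(mod, newdata = grid)
head(grid, 3)
```

Because both cyl and hp are listed in variables and the model has no other regressors, nothing is left to hold at a mean or mode in this particular example.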
The data.frame produced by predictions is “tidy”, which makes it easy to manipulate with other R packages and functions:
library(kableExtra)
library(tidyverse)
predictions(mod, variables = c("cyl", "hp")) %>%
select(hp, cyl, predicted) %>%
pivot_wider(values_from = predicted, names_from = cyl) %>%
kbl(caption = "A table of Adjusted Predictions") %>%
kable_styling() %>%
add_header_above(header = c(" " = 1, "cyl" = 3))
hp | 6 | 4 | 8 |
---|---|---|---|
52 | 21.43244 | 27.40010 | 18.87925 |
96 | 20.37474 | 26.34239 | 17.82154 |
123 | 19.72569 | 25.69334 | 17.17249 |
180 | 18.35547 | 24.32313 | 15.80228 |
335 | 14.62945 | 20.59711 | 12.07626 |
newdata and typical
A second strategy to construct grids of predictors for adjusted predictions is to combine the newdata argument and the typical function. Recall that this function creates a “typical” dataset with all variables at their means or modes, except those we explicitly define:
typical(cyl = c(4, 6, 8), model = mod)
#> hp cyl
#> 1 146.6875 4
#> 2 146.6875 6
#> 3 146.6875 8
We can also use this typical function in a predictions call (omitting the model argument):
predictions(mod, newdata = typical(cyl = c(4, 6, 8)))
#> type predicted std.error conf.low conf.high hp cyl
#> 1 expectation 25.12392 1.368888 22.31988 27.92796 146.6875 4
#> 2 expectation 19.15627 1.247190 16.60151 21.71102 146.6875 6
#> 3 expectation 16.60307 1.278754 13.98366 19.22248 146.6875 8
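To make the "means or modes, except those we explicitly define" rule concrete, here is a minimal base R sketch. The helper typical_grid() is hypothetical and is not part of marginaleffects; it only mimics the behavior described above for a plain data frame of predictors.

```r
# Hypothetical helper (not a marginaleffects function): hold every column that
# the user does not specify at its mean (numeric) or mode (anything else),
# then cross those fixed values with the user-supplied ones.
typical_grid <- function(data, ...) {
  user <- list(...)
  rest <- setdiff(names(data), names(user))
  fixed <- lapply(data[rest], function(x) {
    if (is.numeric(x)) mean(x) else names(which.max(table(x)))
  })
  do.call(expand.grid, c(fixed, user, list(stringsAsFactors = FALSE)))
}

mod <- lm(mpg ~ hp + factor(cyl), data = mtcars)

# Only the predictors used by the model
predictors <- mtcars[, c("hp", "cyl")]

grid <- typical_grid(predictors, cyl = c(4, 6, 8))
grid$predicted <- predict(mod, newdata = grid)
grid
```

The hp column is constant at its sample mean while cyl varies, which is the same shape as the typical() output shown earlier.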
First, we download the ggplot2movies dataset from the Rdatasets archive. Then, we create a variable called certified_fresh for movies with a rating of at least 8. Finally, we discard some outliers and fit a logistic regression model:
library(tidyverse)
dat <- read.csv("https://vincentarelbundock.github.io/Rdatasets/csv/ggplot2movies/movies.csv") %>%
    # collapse the genre indicator columns into a single factor
    mutate(style = case_when(Action == 1 ~ "Action",
                             Comedy == 1 ~ "Comedy",
                             Drama == 1 ~ "Drama",
                             TRUE ~ "Other"),
           style = factor(style),
           certified_fresh = rating >= 8) %>%
    # discard very long movies (outliers)
    filter(length < 240)
mod <- glm(certified_fresh ~ length * style, data = dat, family = binomial)
We can plot adjusted predictions, conditional on the length variable, using the plot_cap function:
mod <- glm(certified_fresh ~ length, data = dat, family = binomial)
plot_cap(mod, condition = "length")
We can also introduce a second condition, which displays a categorical variable like style in different colors. This can be useful in models with interactions:
mod <- glm(certified_fresh ~ length * style, data = dat, family = binomial)
plot_cap(mod, condition = c("length", "style"))
Of course, you can also design your own plots or tables by working with the predictions output directly:
predictions(mod,
type = c("response", "link"),
newdata = typical(length = 90:120,
style = c("Action", "Comedy"))) %>%
ggplot(aes(length, predicted, color = style)) +
geom_line() +
facet_wrap(~type, scales = "free_y")
The predictions function computes model-adjusted means on the scale of the output of the predict(model) function. By default, predict produces predictions on the "response" scale, so the adjusted predictions should be interpreted on that scale. However, users can pass a string or a vector of strings to the type argument, and predictions will consider different outcomes.
Typical values include "response" and "link", but users should refer to the documentation of the predict method of the package they used to fit the model to know which values are allowed.
mod <- glm(am ~ mpg, family = binomial, data = mtcars)
predictions(mod, type = c("response", "link"))
#> type predicted std.error conf.low conf.high mpg
#> 1 expectation 0.3929 0.1083666 0.2098929 0.6118973 20.09062
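The relationship between the two scales can be checked with base R's predict() on the same model. This is a sketch of the general idea for a binomial GLM with the default logit link, not a description of what predictions does internally: the "response" prediction is the inverse logit of the "link" prediction.

```r
# Base R sketch: link vs. response scale for a logistic regression
mod <- glm(am ~ mpg, family = binomial, data = mtcars)

# One hypothetical unit with mpg at its sample mean
grid <- data.frame(mpg = mean(mtcars$mpg))

# Linear predictor (log-odds) and predicted probability
link     <- unname(predict(mod, newdata = grid, type = "link"))
response <- unname(predict(mod, newdata = grid, type = "response"))

# With a logit link, plogis() maps the linear predictor to a probability
same_scale <- isTRUE(all.equal(plogis(link), response))
c(link = link, response = response)
```

For models fitted with other packages, the available type values and the back-transformation depend on that package's predict method.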
Users who need more control over the type of adjusted predictions to compute, including a host of options for back-transformation, may want to consider the emmeans package.