SHAP (SHapley Additive exPlanations, see Lundberg and Lee (2017)) is an ingenious way to study black box models. SHAP values decompose predictions, as fairly as possible, into additive feature contributions. Crunching SHAP values requires clever algorithms by clever people. Analyzing them, however, is super easy with the right visualizations. The package shapviz offers the latter:
- sv_dependence(): Dependence plots to study feature effects (optionally colored by the heuristically strongest interacting feature).
- sv_importance(): Importance plots (bar and/or beeswarm plots) to study variable importance.
- sv_waterfall(): Waterfall plots to study single predictions.
- sv_force(): Force plots as an alternative to waterfall plots.

These plots require a shapviz object, which is built from two things only:

- S: Matrix of SHAP values
- X: Dataset with corresponding feature values

Furthermore, a baseline can be passed to represent an average prediction on the scale of the SHAP values.
A key feature of the shapviz package is that X is used for visualization only. Thus it is perfectly fine to use factor variables, even if the underlying model would not accept them.
To further simplify the use of shapviz, we added direct connectors to packages that produce SHAP values, such as XGBoost, LightGBM, fastshap, and treeshap (see the examples below). The package can be installed as follows:
# From CRAN
install.packages("shapviz")
# Or the newest version from GitHub:
# install.packages("devtools")
devtools::install_github("ModelOriented/shapviz")
We start by fitting an XGBoost model to predict diamond prices based on the four “C” features.
library(shapviz)
library(ggplot2)
library(xgboost)
set.seed(3653)
X <- diamonds[c("carat", "cut", "color", "clarity")]
dtrain <- xgb.DMatrix(data.matrix(X), label = diamonds$price)

fit <- xgb.train(
  params = list(learning_rate = 0.1, objective = "reg:squarederror"),
  data = dtrain,
  nrounds = 65L
)
One line of code creates a shapviz object. It contains SHAP values and feature values for the set of observations we are interested in. Note again that X is solely used as the explanation dataset, not for calculating SHAP values.

In this example we construct the shapviz object directly from the fitted XGBoost model. Thus we also need to pass a corresponding prediction dataset X_pred, which XGBoost uses to calculate the SHAP values.
X_small <- X[sample(nrow(X), 2000L), ]

# X is the "explanation" dataset using the original factors
shp <- shapviz(fit, X_pred = data.matrix(X_small), X = X_small)
The main idea behind SHAP values is to decompose, in a fair way, a prediction into additive contributions of each feature. Typical visualizations include waterfall plots and force plots:
sv_waterfall(shp, row_id = 1L) +
theme(axis.text = element_text(size = 11))
Works pretty sweet, and factor input is respected!
Alternatively, we can study a force plot:
sv_force(shp, row_id = 1L)
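Both plots show the same additive decomposition: the baseline plus the sum of the feature contributions equals the prediction. If you want to double-check this numerically, a small sketch like the following should do; I am assuming here that shapviz exposes the extractors get_shap_values() and get_baseline() (adjust the names if your version differs):

# Prediction of the first explained row
pred_1 <- predict(fit, data.matrix(X_small[1L, , drop = FALSE]))

# Baseline plus sum of SHAP values of that row
decomp_1 <- get_baseline(shp) + sum(get_shap_values(shp)[1L, ])

# Should agree up to numerical precision
all.equal(as.numeric(pred_1), decomp_1, tolerance = 1e-4)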
Studying SHAP decompositions of many observations allows us to get an impression of variable importance. As a simple descriptive measure, the mean absolute SHAP value of each feature is considered. These values can be plotted as a simple bar plot, or, to add information on the sign of the feature effects, as a beeswarm plot sorted by mean absolute SHAP values.
# A beeswarm plot
sv_importance(shp)
# Or much simpler: a bar plot of mean absolute SHAP values
sv_importance(shp, kind = "bar")
# Or both!
sv_importance(shp, kind = "both", alpha = 0.2, width = 0.2)
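If you prefer the numbers behind the bar plot, the mean absolute SHAP values are easy to compute by hand; a one-liner sketch (again assuming the get_shap_values() extractor mentioned above):

# Mean absolute SHAP value per feature, sorted decreasingly
sort(colMeans(abs(get_shap_values(shp))), decreasing = TRUE)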
A SHAP beeswarm importance plot gives first hints on whether high feature values tend to lead to high or low predictions. This impression can be substantiated by studying simple scatterplots of the SHAP values of a feature against its feature values. A second feature can be added as color information to see whether the feature effect depends on the feature on the color scale or not. The stronger the vertical scatter for similar values on the x axis, the stronger the interaction.
sv_dependence(shp, v = "color", color_var = "auto")
sv_dependence(shp, v = "carat", color_var = "auto", alpha = 0.2, size = 1) +
guides(colour = guide_legend(override.aes = list(alpha = 1, size = 2)))
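Since there are only four features, it is convenient to loop over all of them; a minimal sketch, using patchwork (an extra dependency not mentioned above) purely to arrange the plots:

# One dependence plot per feature, arranged in a 2x2 grid
library(patchwork)
plots <- lapply(colnames(X_small), function(v) sv_dependence(shp, v = v, color_var = "auto"))
wrap_plots(plots, ncol = 2)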
The above example uses XGBoost to calculate SHAP values. In the following sections, we show (without running the code) how other packages work together with shapviz.
library(lightgbm)
dtrain <- lgb.Dataset(data.matrix(X), label = diamonds$price)

fit <- lgb.train(
  params = list(learning_rate = 0.1, objective = "regression"),
  data = dtrain,
  nrounds = 65L
)

shp <- shapviz(fit, X_pred = data.matrix(X_small), X = X_small)
library(fastshap)
fit <- lm(price ~ carat + clarity + cut + color, data = diamonds)
explainer <- explain(fit, newdata = X_small, exact = TRUE)
shp <- shapviz(explainer, X = X_small)
sv_dependence(shp, "carat")
library(treeshap)
library(catboost)
f <- function(X) data.frame(data.matrix(X))

X_cat <- catboost.load_pool(data = f(X), label = diamonds$price)

fit <- catboost.train(
  X_cat,
  params = list(
    loss_function = "RMSE",
    iterations = 165,
    logging_level = "Silent",
    allow_writing_files = FALSE
  )
)

unified_catboost <- catboost.unify(fit, f(X))
shaps <- treeshap(unified_catboost, f(X_small))
shp <- shapviz(shaps, X = X_small)
sv_dependence(shp, "clarity", color_var = "auto", alpha = 0.2, size = 1)
The most general interface is to provide a matrix of SHAP values and corresponding feature values:
S <- matrix(c(1, -1, -1, 1), ncol = 2, dimnames = list(NULL, c("x", "y")))
X <- data.frame(x = c("a", "b"), y = c(100, 10))
shp <- shapviz(S, X, baseline = 4)
The plot functions work with one-dimensional model predictions only. However, the wrappers for XGBoost and LightGBM allow you to select the category of interest and work with its predicted (logit) probabilities, simply by passing which_class in the constructor.
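To illustrate, a rough sketch for a multiclass XGBoost model could look like this; the iris data, the hyperparameters, and the assumption that which_class takes a 1-based class index are mine, only the which_class argument itself comes from the text above:

# Multiclass XGBoost model on iris; explain the third class ("virginica")
X_iris <- iris[, 1:4]
dtrain_iris <- xgb.DMatrix(data.matrix(X_iris), label = as.integer(iris$Species) - 1)

fit_iris <- xgb.train(
  params = list(objective = "multi:softprob", num_class = 3),
  data = dtrain_iris,
  nrounds = 50L
)

# which_class selects the class whose SHAP values (on the logit scale) to study
shp_iris <- shapviz(fit_iris, X_pred = data.matrix(X_iris), X = X_iris, which_class = 3L)
sv_importance(shp_iris, kind = "bar")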