modelStudio - perks and features

Hubert Baniecki

2020-04-12

modelStudio::modelStudio computes various (instance and dataset level) model explanations and produces an interactive, customizable dashboard made with D3.js. It consists of multiple panels for plots with their short descriptions. Easily save and share the dashboard with others. Tools for model exploration unite with tools for EDA (Exploratory Data Analysis) to give a broad overview of the model behavior.

Let’s use the HR dataset to explore the modelStudio parameters:

train <- DALEX::HR
train$fired <- ifelse(train$status == "fired", 1, 0)
train$status <- NULL

head(train)
DALEX::HR dataset
gender  age    hours  evaluation  salary  fired
male    32.58  41.89  3           1       1
female  41.21  36.34  2           5       1
male    37.71  36.82  3           0       1
female  30.06  38.96  3           2       1
male    21.10  62.15  5           3       0
male    40.12  69.54  2           0       1

Prepare HR_test data and a randomForest model for the explainer:

# fit a randomForest model
library("randomForest")
model <- randomForest(fired ~., data = train)

# prepare validation dataset
test <- DALEX::HR_test[1:1000,]
test$fired <- ifelse(test$status == "fired", 1, 0)
test$status <- NULL

# create an explainer for the model
explainer <- DALEX::explain(model,
                            data = test,
                            y = test$fired)

# load modelStudio
library("modelStudio")

modelStudio parameters

instance explanations

Pass data points to the new_observation parameter for instance explanations such as Break Down, Shapley Values and Ceteris Paribus Profiles. Use new_observation_y to show their true labels.

new_observation <- test[1:3,]
rownames(new_observation) <- c("John Snow", "Arya Stark", "Samwell Tarly")
true_labels <- test[1:3,]$fired

modelStudio(explainer,
            new_observation = new_observation,
            new_observation_y  = true_labels)

grid size

Make the modelStudio grid larger or smaller with the facet_dim parameter.

# small dashboard with 2 panels
modelStudio(explainer,
            facet_dim = c(1,2))

# large dashboard with 9 panels
modelStudio(explainer,
            facet_dim = c(3,3))

animations

Use the time parameter to set the animation length in milliseconds. A value of 0 turns the animations off.

# slow down animations
modelStudio(explainer,
            time = 1000)

# turn off animations
modelStudio(explainer,
            time = 0)

more calculations means more time

N is the number of observations used to calculate Partial Dependence and Accumulated Dependence Profiles. B is the number of permutation rounds used to calculate Shapley Values and Feature Importance. Decrease N and B to shorten the computation time, or increase them to get more accurate empirical results.

# faster, less precise
modelStudio(explainer,
            N = 200, B = 5)

# slower, more precise
modelStudio(explainer,
            N = 800, B = 25)
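The effect of B on precision can be seen without modelStudio at all. The following is a minimal base-R sketch, not modelStudio's own implementation: it estimates a permutation-based importance for one feature of a linear model on the built-in mtcars data, averaging over B shuffles, and then checks how much the estimate fluctuates when the whole procedure is repeated. The model, the dataset and the helper names (rmse, perm_importance) are illustrative assumptions.

```r
# Illustrative base-R sketch, not modelStudio's implementation:
# permutation importance of one feature, averaged over B shuffles.
set.seed(1)
fit <- lm(mpg ~ wt + hp, data = mtcars)

# root mean squared error of the model on a data frame
rmse <- function(model, data) {
  sqrt(mean((data$mpg - predict(model, newdata = data))^2))
}

# average increase in RMSE after shuffling `feature`, over B rounds
perm_importance <- function(B, feature = "wt") {
  baseline <- rmse(fit, mtcars)
  increases <- vapply(seq_len(B), function(i) {
    shuffled <- mtcars
    shuffled[[feature]] <- sample(shuffled[[feature]])
    rmse(fit, shuffled) - baseline
  }, numeric(1))
  mean(increases)
}

# repeat each estimate 100 times and compare how much it fluctuates
spread_B5  <- sd(replicate(100, perm_importance(B = 5)))
spread_B25 <- sd(replicate(100, perm_importance(B = 25)))
c(B5 = spread_B5, B25 = spread_B25)
```

The spread for B = 25 should come out smaller than for B = 5, at the cost of five times as many shuffles. The same trade-off applies when choosing B for modelStudio.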

no EDA mode

Skip the EDA plots when they are not needed by setting the eda parameter to FALSE.

modelStudio(explainer,
            eda = FALSE)

progress bar

Hide the progress bar and computation messages by setting the show_info parameter to FALSE.

modelStudio(explainer,
            show_info = FALSE)

viewer or browser?

Change the viewer parameter to set where modelStudio is displayed. The accepted values are best described in the r2d3 documentation of the viewer argument.

modelStudio(explainer,
            viewer = "browser")
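Besides choosing where the dashboard is displayed, the introduction mentions saving and sharing it. Below is a minimal sketch of one way to do that, assuming the object returned by modelStudio() can be passed to r2d3::save_d3_html() (modelStudio builds its dashboard with r2d3). The glm stand-in model, the variable names and the output path are choices made only for this sketch, and a self-contained export may additionally require pandoc to be installed.

```r
library("DALEX")
library("modelStudio")

# quick stand-in model so the sketch runs on its own
# (a glm, not the randomForest used in the rest of this document)
hr <- DALEX::HR
hr$fired <- ifelse(hr$status == "fired", 1, 0)
hr$status <- NULL
glm_model <- glm(fired ~ ., data = hr, family = "binomial")
glm_explainer <- DALEX::explain(glm_model,
                                data = hr,
                                y = hr$fired,
                                verbose = FALSE)

# assigning the result builds the dashboard without displaying it
ms <- modelStudio(glm_explainer, show_info = FALSE)

# write the dashboard to an HTML file (path is an arbitrary choice)
out_file <- file.path(tempdir(), "modelStudio.html")
r2d3::save_d3_html(ms, file = out_file)
```

The resulting HTML file can then be opened in any browser or sent to others.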

parallel computation

Speed up modelStudio computation by setting the parallel parameter to TRUE. It uses the parallelMap package to calculate local explanations faster. This is especially useful when working with complicated models, vast datasets, or many observations.

All options can be set outside of the function call, as shown below. More on that in the parallelMap documentation.

# set up the cluster
options(
  parallelMap.default.mode        = "socket",
  parallelMap.default.cpus        = 4,
  parallelMap.default.show.info   = FALSE
)

# calculations of local explanations will be distributed across 4 cores
modelStudio(explainer,
            new_observation = test[1:16,],
            parallel = TRUE)
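The cluster setup above hard-codes 4 CPUs. A small sketch of choosing that number from the machine instead, using parallel::detectCores() from the parallel package that ships with R. Leaving one core free is a common convention assumed here, not a modelStudio requirement, and the options only take effect once parallelMap is actually used.

```r
# number of logical cores; detectCores() can return NA on some platforms
n_cores <- parallel::detectCores()
if (is.na(n_cores)) n_cores <- 1L

# leave one core free for the rest of the system, but use at least one
n_cores <- max(1L, n_cores - 1L)

# same options as above, with the detected number of cores
options(
  parallelMap.default.mode        = "socket",
  parallelMap.default.cpus        = n_cores,
  parallelMap.default.show.info   = FALSE
)

getOption("parallelMap.default.cpus")
```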

plot options

Customize the look of modelStudio by overriding the default options returned by the modelStudioOptions() function. The full list of options is given in the function's documentation.

# set additional graphical parameters
new_options <- modelStudioOptions(
  show_subtitle = TRUE,
  bd_subtitle = "Hello World",
  line_size = 5,
  point_size = 9,
  line_color = "pink",
  point_color = "purple",
  bd_positive_color = "yellow",
  bd_negative_color = "orange"
)

modelStudio(explainer,
            options = new_options)
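To see which options can be overridden without leaving the R session, the defaults can be inspected directly. This is a small sketch that assumes modelStudioOptions() returns a named list when called with no arguments; the variable name defaults is arbitrary.

```r
library("modelStudio")

# default options, assumed here to be returned as a named list
defaults <- modelStudioOptions()

# names of the options that can be passed to modelStudioOptions()
names(defaults)
```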

DALEXtra

Use the explain_*() functions from the DALEXtra package to explain various models. Below is a basic example of making a modelStudio for an mlr model using explain_mlr().

library(DALEXtra)
library(mlr)

# fit a model
task <- makeRegrTask(id = "task", data = train, target = "fired")

learner <- makeLearner("regr.randomForest", par.vals = list(ntree = 300), predict.type = "response")

# call mlr::train() explicitly, since `train` is also the name of the data frame above
model <- mlr::train(learner, task)

# create an explainer for the model
explainer_mlr <- explain_mlr(model,
                             data = test,
                             y = test$fired,
                             label = "mlr")

# make a studio for the model
modelStudio(explainer_mlr,
            B = 10)

References