visTree: Visualization of Subgroups for Decision Trees

Ashwini Venkatasubramaniam and Julian Wolfson

2018-10-31

visTree provides a visualization to characterize subgroups generated by a decision tree. Each individual terminal node identified by a decision tree corresponds to a subplot in the visualization.

Installation

The GitHub version:

# install.packages("devtools")
# devtools::install_github("AshwiniKV/visTree")
library(visTree)

Load the blsdata dataset used in the examples, along with the packages used for growing trees. visTree works with trees produced by the partykit and rpart packages.

data("blsdata")
library(partykit)
#> Loading required package: grid
#> Loading required package: libcoin
#> Loading required package: mvtnorm
library(rpart)
library(colorspace)

This document introduces the set of tools provided by the visTree package and gives examples of the different scenarios that the package has been developed to accommodate.

Example dataset

The example scenarios are illustrated by applications to the box lunch study dataset. This dataset is available within the package.

data("blsdata")

Outcome type

The visTree package accommodates both continuous and categorical outcomes. Setting the option interval = TRUE within the visTree function displays the graphical output for a categorical outcome; omitting it, as in the continuous example, displays the output for a continuous outcome.

Continuous outcome

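# Conditional inference tree for the continuous outcome kcal24h0.
# newblsdata is assumed to be a prepared version of blsdata; its construction is not shown here.
potentialtree <- ctree(kcal24h0 ~ ., data = newblsdata, control = ctree_control(mincriterion = 0.95))
visTree(potentialtree, color.type = 1, alpha = 0.5)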

Categorical outcome

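# Conditional inference tree for the categorical outcome bin.
# blsdataedit is assumed to be a version of blsdata that contains bin; its construction is not shown here.
potentialtree <- ctree(bin ~ hunger + rrvfood + resteating + liking + wanting + disinhibition, data = blsdataedit, control = ctree_control(mincriterion = 0.85))
visTree(potentialtree, interval = TRUE)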

Repeated Splits

This series of plots describes the splits leading to each subgroup. The splits need not be made on different variables: splits at multiple levels of the tree can be performed on the same variable, and these are summarized so that the resulting intervals are readily interpretable. The horizontal bars display these splits and the relevant split criteria.

potentialtree <- ctree(kcal24h0 ~ skcal + hunger + rrvfood + resteating + liking + wanting + age, data = blsdata, control = ctree_control(mincriterion = 0.95))
visTree(potentialtree)
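As an illustration of how repeated splits on one variable can be condensed into a single interval, here is a minimal base R sketch. It is not visTree's internal code: the helper collapse_splits and the example path (variables, directions and thresholds) are hypothetical and chosen only to show the idea.

```r
# Hypothetical helper (not part of visTree): condense the splits along one
# root-to-leaf path into a single interval per variable.
collapse_splits <- function(path) {
  vars <- unique(path$variable)
  out <- data.frame(variable = vars, lower = -Inf, upper = Inf,
                    stringsAsFactors = FALSE)
  for (i in seq_len(nrow(path))) {
    j <- match(path$variable[i], vars)
    if (path$direction[i] == "<=") {
      # a "<=" split can only tighten the upper limit
      out$upper[j] <- min(out$upper[j], path$threshold[i])
    } else {
      # a ">" split can only tighten the lower limit
      out$lower[j] <- max(out$lower[j], path$threshold[i])
    }
  }
  out
}

# Made-up path: two splits on hunger and one on age
path <- data.frame(
  variable  = c("hunger", "age", "hunger"),
  direction = c("<=", ">", ">"),
  threshold = c(60, 30, 25),
  stringsAsFactors = FALSE
)
intervals <- collapse_splits(path)
print(intervals)
```

In this made-up path the two hunger splits reduce to one interval with both a lower and an upper limit, while age keeps an open upper limit.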

Other Trees (e.g., rpart)

The examples in this document have so far focused on conditional inference trees, which the partykit package implements as objects of class party. However, the visTree package also accommodates other types of decision tree structures such as CART, which the rpart package generates as an object of class rpart.

potentialtree <- rpart(kcal24h0 ~ ., data = newblsdata, control = rpart.control(cp = 0.015))
visTree(potentialtree)
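The class distinction described above can be checked directly on a fitted tree. The sketch below is only an illustration: it uses the built-in mtcars data as a stand-in for the box lunch study data, and the variable names tree_type and n_leaves are introduced here for the example.

```r
library(rpart)

# mtcars ships with R and is used here only as a stand-in dataset
fit <- rpart(mpg ~ ., data = mtcars, control = rpart.control(cp = 0.015))

# An rpart fit carries the class "rpart"; a ctree fit would carry "party"
tree_type <- if (inherits(fit, "rpart")) {
  "rpart"
} else if (inherits(fit, "party")) {
  "party"
} else {
  "unsupported"
}
print(tree_type)

# Rows of fit$frame with var == "<leaf>" are the terminal nodes,
# i.e. the subgroups that would each correspond to one subplot
n_leaves <- sum(fit$frame$var == "<leaf>")
print(n_leaves)
```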

Display controls

Text

The controls within the visTree function can be used to specify different text sizes for the title of the subplots (text.title), the title of the histogram (text.main), the axis labels placed at the tick marks (text.axis), the title labels for the axes (text.label) and the splits placed on the horizontal bars (text.bar).

potentialtree <- ctree(kcal24h0 ~ skcal + hunger + rrvfood + resteating + liking + wanting + age, data = blsdata, control = ctree_control(mincriterion = 0.95))
visTree(potentialtree, text.label = 1.5, text.title = 1.5, text.bar = 1.5, text.axis = 1.5, text.main = 1.5)

Axis

Each subplot within the visualization has two axes: one placed above the horizontal colored bars, for the percentiles of the relevant covariates, and one placed below the histogram/bar chart, for the outcome values. Either axis can be removed or kept as necessary using the options add.h.axis (associated with the histogram/bar chart of the outcome) and add.p.axis (associated with the percentile bars) within the visTree function.

potentialtree <- ctree(kcal24h0 ~ skcal + hunger + rrvfood + resteating + liking + wanting + age, data = blsdata, control = ctree_control(mincriterion = 0.95))
# Remove one axis at a time
visTree(potentialtree, add.h.axis = FALSE)
visTree(potentialtree, add.p.axis = FALSE)

Rounding the displayed split criterion

In addition to changing the size of the text placed on the bars, the number of decimal places can also be specified for the split criteria displayed on the horizontal bars. This is implemented using the option text.round in the visTree function.

potentialtree <- ctree(kcal24h0 ~ skcal + hunger + rrvfood + resteating + liking + wanting + age, data = blsdata, control = ctree_control(mincriterion = 0.95))
visTree(potentialtree, text.round = 3, text.bar = 1.1)
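To see what a change in the number of decimal places does to a split label, the base R sketch below rounds a made-up split value to one and to three decimal places. It only illustrates the effect of rounding and does not claim to reproduce how visTree formats its labels; the threshold value and label text are hypothetical.

```r
# Made-up split value used only for illustration
threshold <- 2314.56789

# Build a bar label at two different numbers of decimal places
label_1 <- paste("skcal <=", round(threshold, 1))
label_3 <- paste("skcal <=", round(threshold, 3))
print(label_1)
print(label_3)
```

Fewer decimal places keep the labels short enough to fit on narrow bars, at the cost of precision in the displayed split value.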

Transparency

The transparency of the horizontal bars in each of the subplots can also be modified by specifying a value between 0 and 1 for alpha in the visTree function. The closer the value is to 1, the more opaque the horizontal colored bars become.

potentialtree <- ctree(kcal24h0 ~ skcal + hunger + rrvfood + resteating + liking + wanting + age, data = blsdata, control = ctree_control(mincriterion = 0.95))
# Nearly opaque bars, then mostly transparent bars
visTree(potentialtree, alpha = 0.8)
visTree(potentialtree, alpha = 0.3)
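The meaning of an alpha value between 0 and 1 can be seen outside visTree with base R's adjustcolor, which scales a color's opacity. This is a small sketch of the general concept rather than of visTree's own coloring; the base color "steelblue" is an arbitrary choice.

```r
# Apply two opacity levels to the same base color
opaque_bar <- adjustcolor("steelblue", alpha.f = 0.8)
faint_bar  <- adjustcolor("steelblue", alpha.f = 0.3)

# Extract the alpha channel (0 = fully transparent, 255 = fully opaque)
alpha_opaque <- col2rgb(opaque_bar, alpha = TRUE)["alpha", 1]
alpha_faint  <- col2rgb(faint_bar, alpha = TRUE)["alpha", 1]
print(c(opaque = alpha_opaque, faint = alpha_faint))
```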

Density curve

The visualization tool accommodates continuous and categorical data. For continuous data, a density curve can be drawn over the histogram in the lower part of each subplot, or removed by setting density.line = FALSE.

potentialtree <- ctree(kcal24h0 ~ skcal + hunger + rrvfood + resteating + liking + wanting + age, data = blsdata, control = ctree_control(mincriterion = 0.95))
visTree(potentialtree, density.line = FALSE)
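For readers unfamiliar with overlaying a density curve on a histogram, the following base R sketch shows the general technique on simulated data. It is not visTree's plotting code: the function draw_outcome and the simulated outcome values are hypothetical, and its density.line argument merely mirrors the name of the visTree option.

```r
set.seed(1)
# Simulated outcome values standing in for one subgroup
outcome <- rnorm(200, mean = 2000, sd = 400)

# Hypothetical helper: histogram on the density scale, with an optional curve
draw_outcome <- function(x, density.line = TRUE) {
  # freq = FALSE rescales the bars so that their total area is 1,
  # which puts them on the same scale as the kernel density estimate
  hist(x, freq = FALSE, main = "Subgroup outcome", xlab = "Outcome")
  if (density.line) {
    lines(density(x), lwd = 2)
  }
  invisible(NULL)
}

# Check the scaling without drawing: the bar areas sum to 1
h <- hist(outcome, plot = FALSE)
bar_area <- sum(h$density * diff(h$breaks))
print(bar_area)
```

Calling draw_outcome(outcome) draws the histogram with the curve, and draw_outcome(outcome, density.line = FALSE) draws the histogram alone.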