The mushrooms dataset contains data about mushrooms. The goal of our model is to predict which mushrooms are poisonous.
First, we’ll split the data into training and test datasets:
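set.seed(100)  # make the random split reproducible
# Randomly flag each row as training (TRUE) or test (FALSE)
train.cases <- sample(c(TRUE, FALSE), size = nrow(mushrooms), replace = TRUE)
mushrooms.train <- mushrooms[train.cases, ]
mushrooms.test <- mushrooms[!train.cases, ]
# Fit the trees on the training data and evaluate them on the test data
mushrooms.fft <- fft(formula = poisonous ~ .,
                     data = mushrooms.train,
                     data.test = mushrooms.test)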
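The split above draws one TRUE/FALSE flag per row, so the training and test sets are disjoint and of roughly, not exactly, equal size. Here is a minimal sketch of the same idea on a small made-up data frame; the `toy` data and its columns are hypothetical and serve only as an illustration:

```r
# Hypothetical toy data standing in for the mushrooms data frame
toy <- data.frame(id = 1:20, x = seq(0.5, 10, by = 0.5))

set.seed(100)
# One random TRUE/FALSE flag per row: TRUE = training, FALSE = test
train.cases <- sample(c(TRUE, FALSE), size = nrow(toy), replace = TRUE)

toy.train <- toy[train.cases, ]
toy.test  <- toy[!train.cases, ]

# Every row lands in exactly one of the two sets
nrow(toy.train) + nrow(toy.test) == nrow(toy)
length(intersect(toy.train$id, toy.test$id)) == 0
```

Because each flag is drawn independently, the two set sizes depend on the seed. For the mushrooms data, the n.train and n.test columns of the summary report 4093 training cases and 4031 test cases.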
Here is a summary of the final 8 trees:
summary(mushrooms.fft)
##   tree.num                   level.name                 level.class
## 1        1 odor;sporepc;gcolor;ringtype factor;factor;factor;factor
## 3        2                 odor;sporepc               factor;factor
## 5        3          odor;sporepc;gcolor        factor;factor;factor
## 7        4                 odor;sporepc               factor;factor
## 2        5                 odor;sporepc               factor;factor
## 6        6                 odor;sporepc               factor;factor
## 4        7          odor;sporepc;gcolor        factor;factor;factor
## 8        8 odor;sporepc;gcolor;ringtype factor;factor;factor;factor
##   level.exit                         level.threshold level.sigdirection
## 1  0;0;0;0.5 n,l,a;n,k,o,u,y,b;n,w,u,k,p,e,y,o;l,e,n         !=;!=;!=;=
## 3      0;0.5                       n,l,a;n,k,o,u,y,b              !=;!=
## 5    0;0;0.5       n,l,a;n,k,o,u,y,b;n,w,u,k,p,e,y,o           !=;!=;!=
## 7      0;0.5                       n,l,a;n,k,o,u,y,b              !=;!=
## 2      1;0.5                       n,l,a;n,k,o,u,y,b              !=;!=
## 6      1;0.5                       n,l,a;n,k,o,u,y,b              !=;!=
## 4    1;1;0.5       n,l,a;n,k,o,u,y,b;n,w,u,k,p,e,y,o           !=;!=;!=
## 8  1;1;1;0.5 n,l,a;n,k,o,u,y,b;n,w,u,k,p,e,y,o;l,e,n         !=;!=;!=;=
##   n.train hi.train mi.train fa.train cr.train  hr.train far.train
## 1    4093     1297      661        0     2135 0.6624106 0.0000000
## 3    4093     1670      288        0     2135 0.8529111 0.0000000
## 5    4093     1339      619        0     2135 0.6838611 0.0000000
## 7    4093     1670      288        0     2135 0.8529111 0.0000000
## 2    4093     1958        0      344     1791 1.0000000 0.1611241
## 6    4093     1958        0      344     1791 1.0000000 0.1611241
## 4    4093     1958        0      514     1621 1.0000000 0.2407494
## 8    4093     1958        0      790     1345 1.0000000 0.3700234
##     v.train dprime.train n.test hi.test mi.test fa.test cr.test   hr.test
## 1 0.6622521     2.048976   4031    1295     663       0    2073 0.6613892
## 3 0.8527040     2.363812   4031    1678     280       0    2073 0.8569969
## 5 0.6836971     2.078703   4031    1349     609       0    2073 0.6889683
## 7 0.8527040     2.363812   4031    1678     280       0    2073 0.8569969
## 2 0.8386689     2.323205   4031    1958       0     280    1793 1.0000000
## 6 0.8386689     2.323205   4031    1958       0     280    1793 1.0000000
## 4 0.7590622     2.180293   4031    1958       0     454    1619 1.0000000
## 8 0.6298185     1.994299   4031    1958       0     754    1319 1.0000000
##    far.test    v.test dprime.test
## 1 0.0000000 0.6612274    2.043819
## 3 0.0000000 0.8567852    2.369006
## 5 0.0000000 0.6887995    2.082142
## 7 0.0000000 0.8567852    2.369006
## 2 0.1350699 0.8647144    2.379611
## 6 0.1350699 0.8647144    2.379611
## 4 0.2190063 0.7807983    2.216105
## 8 0.3637241 0.6361154    2.002660
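The hr and far columns in the summary can be recomputed from the four count columns, reading hi, mi, fa and cr as hits, misses, false alarms and correct rejections. The sketch below does this for the training counts in the rows labelled 1 and 2. The helper names `hit.rate` and `false.alarm.rate` are illustrative and not part of any package. The v and dprime columns are left out because this section does not state how they are computed.

```r
# Illustrative helpers (not package functions)
hit.rate <- function(hi, mi) hi / (hi + mi)          # hits / all poisonous cases
false.alarm.rate <- function(fa, cr) fa / (fa + cr)  # false alarms / all non-poisonous cases

# Training counts copied from the summary rows labelled 1 and 2
counts <- data.frame(
  row = c(1, 2),
  hi  = c(1297, 1958),
  mi  = c(661, 0),
  fa  = c(0, 344),
  cr  = c(2135, 1791)
)

counts$hr  <- hit.rate(counts$hi, counts$mi)
counts$far <- false.alarm.rate(counts$fa, counts$cr)

# Compare with the hr.train and far.train columns of the summary
round(counts[, c("row", "hr", "far")], 7)
```

The results agree with the summary: the row labelled 1 misses 661 poisonous mushrooms but raises no false alarms, while the row labelled 2 catches every poisonous mushroom at the cost of 344 false alarms.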
Here is the best training tree applied to the test data:
plot(mushrooms.fft, data = "test")
The iris dataset contains data about 150 flowers. Our goal is to predict which flowers belong to the class Virginica.
head(iris)
##   sep.len sep.wid pet.len pet.wid virginica
## 1     5.1     3.5     1.4     0.2     FALSE
## 2     4.9     3.0     1.4     0.2     FALSE
## 3     4.7     3.2     1.3     0.2     FALSE
## 4     4.6     3.1     1.5     0.2     FALSE
## 5     5.0     3.6     1.4     0.2     FALSE
## 6     5.4     3.9     1.7     0.4     FALSE
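The column names above differ from those of the iris data that ships with base R (Sepal.Length, Species, and so on). Here is a minimal sketch of how a data frame with this layout could be built, assuming it was derived from `datasets::iris` by renaming the measurements and turning Species into a logical criterion. That derivation is an assumption rather than something stated here, and `iris.v` is a hypothetical name:

```r
# Assumption: the data shown above come from base R's iris data
iris.v <- data.frame(
  sep.len   = datasets::iris$Sepal.Length,
  sep.wid   = datasets::iris$Sepal.Width,
  pet.len   = datasets::iris$Petal.Length,
  pet.wid   = datasets::iris$Petal.Width,
  virginica = datasets::iris$Species == "virginica"  # TRUE for class Virginica
)

head(iris.v)             # first six rows, same layout as above
table(iris.v$virginica)  # counts of FALSE and TRUE cases
```

A logical criterion column like this one is what the formula virginica ~ . predicts from the remaining columns.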
We’ll create a new fft object called iris.fft using the fft() function. In this case, we won’t create separate training and test datasets; we’ll just fit the tree to the entire dataset:
iris.fft <- fft(
  formula = virginica ~ .,
  data = iris
)
Here is the final best training tree (tree 6):
plot(iris.fft)
Here is tree number 2, which is a bit more conservative than tree 6.
plot(iris.fft, which.tree = 2)