OddsPlotty

OddsPlottyPackage

This package has been created to create odds plot for the results of a logistic regression.

The package uses caret to train the model and the final model parameter is used to generate the application.

Loading OddsPlotty

To use the odds_plot function you can invoke it by using:

library(OddsPlotty)

Training a GLM to use with odds plot

First we load the required packages. The example dataset we are going to use to work with OddsPlotty is the breast cancer data:

#install.packages("mlbench")
#install.packages("caret")
library(mlbench)
library(caret)
#> Loading required package: lattice
#> Loading required package: ggplot2
library(tibble)
library(ggplot2)
library(OddsPlotty)
library(e1071)
library(ggthemes)

Then we are going to load the data we need for the breast cancer data:

data("BreastCancer", package = "mlbench")
#Use complete cases of breast cancer
breast <- BreastCancer[complete.cases(BreastCancer), ] #Create a copy
breast <- breast[, -1]
head(breast, 10)
#>    Cl.thickness Cell.size Cell.shape Marg.adhesion Epith.c.size Bare.nuclei
#> 1             5         1          1             1            2           1
#> 2             5         4          4             5            7          10
#> 3             3         1          1             1            2           2
#> 4             6         8          8             1            3           4
#> 5             4         1          1             3            2           1
#> 6             8        10         10             8            7          10
#> 7             1         1          1             1            2          10
#> 8             2         1          2             1            2           1
#> 9             2         1          1             1            2           1
#> 10            4         2          1             1            2           1
#>    Bl.cromatin Normal.nucleoli Mitoses     Class
#> 1            3               1       1    benign
#> 2            3               2       1    benign
#> 3            3               1       1    benign
#> 4            3               7       1    benign
#> 5            3               1       1    benign
#> 6            9               7       1 malignant
#> 7            3               1       1    benign
#> 8            3               1       1    benign
#> 9            1               1       5    benign
#> 10           2               1       1    benign
#Convert the class to a factor - Beningn (0) and Malignant (1)
breast$Class <- factor(breast$Class)
str(breast)
#> 'data.frame':    683 obs. of  10 variables:
#>  $ Cl.thickness   : Ord.factor w/ 10 levels "1"<"2"<"3"<"4"<..: 5 5 3 6 4 8 1 2 2 4 ...
#>  $ Cell.size      : Ord.factor w/ 10 levels "1"<"2"<"3"<"4"<..: 1 4 1 8 1 10 1 1 1 2 ...
#>  $ Cell.shape     : Ord.factor w/ 10 levels "1"<"2"<"3"<"4"<..: 1 4 1 8 1 10 1 2 1 1 ...
#>  $ Marg.adhesion  : Ord.factor w/ 10 levels "1"<"2"<"3"<"4"<..: 1 5 1 1 3 8 1 1 1 1 ...
#>  $ Epith.c.size   : Ord.factor w/ 10 levels "1"<"2"<"3"<"4"<..: 2 7 2 3 2 7 2 2 2 2 ...
#>  $ Bare.nuclei    : Factor w/ 10 levels "1","2","3","4",..: 1 10 2 4 1 10 10 1 1 1 ...
#>  $ Bl.cromatin    : Factor w/ 10 levels "1","2","3","4",..: 3 3 3 3 3 9 3 3 1 2 ...
#>  $ Normal.nucleoli: Factor w/ 10 levels "1","2","3","4",..: 1 2 1 7 1 7 1 1 1 1 ...
#>  $ Mitoses        : Factor w/ 9 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 5 1 ...
#>  $ Class          : Factor w/ 2 levels "benign","malignant": 1 1 1 1 1 2 1 1 1 1 ...

This takes care of the class encoding but now we need to code the factors to numeric

for(i in 1:9) {
  breast[, i] <- as.numeric(as.character(breast[, i]))
}
#Loops through the first columns - 1 to 9 and changes them from factors to a numerical representation
str(breast)
#> 'data.frame':    683 obs. of  10 variables:
#>  $ Cl.thickness   : num  5 5 3 6 4 8 1 2 2 4 ...
#>  $ Cell.size      : num  1 4 1 8 1 10 1 1 1 2 ...
#>  $ Cell.shape     : num  1 4 1 8 1 10 1 2 1 1 ...
#>  $ Marg.adhesion  : num  1 5 1 1 3 8 1 1 1 1 ...
#>  $ Epith.c.size   : num  2 7 2 3 2 7 2 2 2 2 ...
#>  $ Bare.nuclei    : num  1 10 2 4 1 10 10 1 1 1 ...
#>  $ Bl.cromatin    : num  3 3 3 3 3 9 3 3 1 2 ...
#>  $ Normal.nucleoli: num  1 2 1 7 1 7 1 1 1 1 ...
#>  $ Mitoses        : num  1 1 1 1 1 1 1 1 5 1 ...
#>  $ Class          : Factor w/ 2 levels "benign","malignant": 1 1 1 1 1 2 1 1 1 1 ...

This has now changed the data into a numerical value and this can now be used in the GLM model.

Training the GLM using Caret

I will use Caret to train the Generalised Linear Model (GLM) aka Logistic Regression, as this is the package that best supports the odds plot statistics. Please note: I am training on the full dataset and not undertaking a data partitioning method, as perhaps seen in logistic regression.

library(caret)
glm_model <- train(Class ~ .,
                   data = breast,
                   method = "glm",
                   family = "binomial")
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

Once the model is trained we can inspect the results with OddsPlotty:

Visualise with OddsPlotty

The below shows how to visualise and expose the plot from the saved list in OddsPlotty.

#> Waiting for profiling to be done...

Exposing the data frame

Each odds plot has an associated tibble under the hood for querying. To access the tibble use:

#> # A tibble: 9 x 4
#>      OR lower upper vars           
#>   <dbl> <dbl> <dbl> <chr>          
#> 1 1.71  1.32   2.31 Cl.thickness   
#> 2 0.994 0.674  1.55 Cell.size      
#> 3 1.38  0.862  2.16 Cell.shape     
#> 4 1.39  1.10   1.80 Marg.adhesion  
#> 5 1.10  0.805  1.50 Epith.c.size   
#> 6 1.47  1.23   1.78 Bare.nuclei    
#> 7 1.56  1.13   2.23 Bl.cromatin    
#> 8 1.24  0.998  1.56 Normal.nucleoli
#> 9 1.71  0.993  3.02 Mitoses

Using different themes with additional parameters

Additional parameters for the plot can be fed in:

#> Waiting for profiling to be done...

Another example of how to use a different theme:

library(OddsPlotty)
library(ggthemes)
plotty <- OddsPlotty::odds_plot(glm_model$finalModel, 
                      title = "Odds Plot with ggthemes Tufte Theme",
                      subtitle = "Showing odds of cancer based on various factors",
                      point_col = "#00f2ff",
                      error_bar_colour = "black",
                      point_size = .5,
                      error_bar_width = .8,
                      h_line_color = "red")
#> Waiting for profiling to be done...

plotty$odds_plot + ggthemes::theme_tufte()

This package was created by Gary Hutson https://twitter.com/StatsGary and the package is part of his work.