Introduction To The auctestr Package

Josh Gardner

2017-11-12

Introduction

This is an introduction to the auctestr package, a package for statistical testing of the AUC (Area Under the Receiver Operating Characteristic Curve, also known as A’) statistic. The AUC has some useful statistical properties that make it especially simple to apply statistical tests to. In addition, auctestr implements basic procedures for applying these tests even when you have several observations of the AUC of a given model, including observations over different datasets and observations with some form of within-dataset dependency (such as observations over time, or across multiple randomized resamples or cross-validation folds).

auctestr is useful if you:

- Are evaluating predictive models.
- Need to conduct pairwise comparisons of the performance of those models.
- Are using AUC (or A’) to evaluate the performance of those models (note that there are multiclass versions of AUC that can also be used for this).

For the remainder of this document, we refer to the statistic of interest simply as AUC. Note that the unique statistical properties used in this package apply only to the AUC statistic, and cannot be used to evaluate other model performance metrics (e.g., accuracy, F1 score, etc.).
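The key property at work here is that the standard error of an observed AUC can be approximated from the AUC value and the counts of positive and negative cases alone (Hanley and McNeil, 1982); this is why the sample data shown below carries n_p and n_n columns. A minimal base-R sketch of that approximation (se_auc_hm is an illustrative name here, not necessarily the package’s own function):

# Approximate standard error of an observed AUC (Hanley & McNeil, 1982).
# auc: observed AUC; n_p: count of positive cases; n_n: count of negatives.
se_auc_hm <- function(auc, n_p, n_n) {
  q1 <- auc / (2 - auc)
  q2 <- (2 * auc^2) / (1 + auc)
  num <- auc * (1 - auc) + (n_p - 1) * (q1 - auc^2) + (n_n - 1) * (q2 - auc^2)
  sqrt(num / (n_p * n_n))
}
se_auc_hm(0.80, n_p = 350, n_n = 1407)  # approximately 0.0149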

Functions

auctestr currently contains only four simple functions, which together provide complete statistical testing of the AUC. An example dataset consists of one or more observations of the performance of at least two different predictive models:

data("sample_experiment_data", package="auctestr")
head(sample_experiment_data, 15)
##          auc precision  accuracy    n n_p  n_n  dataset time model_id
## 1  0.7957640 0.5354970 0.8207171 1757 350 1407 dataset1    0   ModelA
## 2  0.7957640 0.5354970 0.8207171 1757 350 1407 dataset1    0   ModelC
## 3  0.7957640 0.5354970 0.8207171 1757 350 1407 dataset1    0   ModelB
## 4  0.8459516 0.4772727 0.8471926 1407 199 1208 dataset1    1   ModelA
## 5  0.7473793 0.6300578 0.8905473 1407 199 1208 dataset1    1   ModelC
## 6  0.7440098 0.6407186 0.8919687 1407 199 1208 dataset1    1   ModelB
## 7  0.8434291 0.6080000 0.8841060 1208 194 1014 dataset1    2   ModelA
## 8  0.8097918 0.7371429 0.9081126 1208 194 1014 dataset1    2   ModelC
## 9  0.8009618 0.7440476 0.9072848 1208 194 1014 dataset1    2   ModelB
## 10 0.8455385 0.6265823 0.8471400 1014 235  779 dataset1    3   ModelA
## 11 0.8339251 0.6654676 0.8589744 1014 235  779 dataset1    3   ModelC
## 12 0.7319750 0.7393939 0.8461538 1014 235  779 dataset1    3   ModelB
## 13 0.4970371 0.3500000 0.5866496  779 316  463 dataset1    4   ModelA
## 14 0.7426457 0.7167235 0.7573813  779 316  463 dataset1    4   ModelC
## 15 0.7586701 0.7046154 0.7650834  779 316  463 dataset1    4   ModelB
##    model_variant
## 1       VariantA
## 2       VariantA
## 3       VariantA
## 4       VariantA
## 5       VariantA
## 6       VariantA
## 7       VariantA
## 8       VariantA
## 9       VariantA
## 10      VariantA
## 11      VariantA
## 12      VariantA
## 13      VariantA
## 14      VariantA
## 15      VariantA

Statistical comparisons of models, including comparisons over time, can be conducted in a single call to auc_compare():

# compare model A and model B, only evaluating VariantC of both models
z_score = auc_compare(sample_experiment_data,
                      compare_values = c("ModelA", "ModelB"),
                      filter_value = c("VariantC"),
                      time_col = "time",
                      outcome_col = "auc",
                      compare_col = "model_id",
                      over_col = "dataset",
                      filter_col = "model_variant")
## fetching comparison results for models ModelA, ModelB in dataset dataset1 with filter value VariantC
## fetching comparison results for models ModelA, ModelB in dataset dataset2 with filter value VariantC
## fetching comparison results for models ModelA, ModelB in dataset dataset3 with filter value VariantC
z_score
## [1] 3.604343
# fetch p-value of this comparison
pnorm(-abs(z_score))
## [1] 0.0001564715
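Note that pnorm(-abs(z_score)) gives a one-tailed p-value; if a two-tailed test is appropriate for your comparison, simply double it:

# two-tailed p-value for the same comparison
2 * pnorm(-abs(z_score))  # approximately 3.13e-04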

auctestr also allows for flexible adjustment of which pairwise comparisons are conducted, and of which elements are held fixed (the fixed values are set using the filter_value and filter_col parameters):

z_score = auc_compare(sample_experiment_data,
                      compare_values = c("VariantA", "VariantB"),
                      filter_value = c("ModelC"),
                      time_col = "time",
                      outcome_col = "auc",
                      compare_col = "model_variant",
                      over_col = "dataset",
                      filter_col = "model_id")
## fetching comparison results for models VariantA, VariantB in dataset dataset1 with filter value ModelC
## fetching comparison results for models VariantA, VariantB in dataset dataset2 with filter value ModelC
## fetching comparison results for models VariantA, VariantB in dataset dataset3 with filter value ModelC
z_score
## [1] 1.655143
pnorm(-abs(z_score))
## [1] 0.04894775

The model comparisons are conducted using a method described in detail in: Fogarty, James, Ryan S. Baker, and Scott E. Hudson. “Case studies in the use of ROC curve analysis for sensor-based estimates in human computer interaction.” Proceedings of Graphics Interface 2005. Canadian Human-Computer Communications Society, 2005.
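In outline, that method computes a z-statistic from the difference of two observed AUC values and their estimated standard errors. A sketch of the idea, reusing the se_auc_hm helper defined above (this illustrates the approach only and is not necessarily identical to the package internals; it assumes both models were evaluated on the same test set, so they share n_p and n_n):

# z-statistic for the difference of two AUCs observed on the same test set
fbh_z <- function(auc1, auc2, n_p, n_n) {
  se1 <- se_auc_hm(auc1, n_p, n_n)
  se2 <- se_auc_hm(auc2, n_p, n_n)
  (auc1 - auc2) / sqrt(se1^2 + se2^2)
}
# e.g., ModelA vs. ModelB at time 1 of dataset1 above
fbh_z(0.8459516, 0.7440098, n_p = 199, n_n = 1208)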

Note that these comparisons assume there is a dataset-dependent column that needs to be statistically averaged over, and they use Stouffer’s method to combine the per-dataset Z-scores:

\(Z = \frac{\sum_{i=1}^{k} Z_i}{\sqrt{k}}\)

This is a conservative adjustment; more powerful, less conservative adjustments may be added in future versions. For more information, see Stouffer, S.A., Suchman, E.A., DeVinney, L.C., Star, S.A., and Williams, R.M. Jr. (1949). The American Soldier, Vol. 1: Adjustment During Army Life. Princeton University Press, Princeton, or [Wikipedia](https://en.wikipedia.org/wiki/Fisher%27s_method#Relation_to_Stouffer.27s_Z-score_method).
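A minimal base-R sketch of this combination (stouffer_combine is an illustrative name, not the package’s own function):

# Stouffer's method: combine k independent Z-scores into one
stouffer_combine <- function(z) sum(z) / sqrt(length(z))
stouffer_combine(c(2.1, 1.8, 2.4))  # combined Z across, e.g., three datasets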

We hope to implement more features in future versions, but auctestr already provides what is needed for principled statistical model selection based on the unique statistical properties of the AUC metric; we hope it improves your research and modeling.