Observe and check your data in R

2017-01-29

Create observations from your data with ‘observe_if’

The observer package checks that a given dataset passes user-specified rules. The main functions are observe_if and inspect.

For instance, according to the documentation of the diamonds dataset in package ggplot2, the column depth is equal to 100*2*z/(x+y). Let us make an observation of this:

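library(dplyr)     # provides %>% and mutate()
library(observer)  # provides observe_if(), obs() and inspect()

## Recompute depth from x, y, z, then record one observation per rule
df <- ggplot2::diamonds %>%
  mutate(depth2 = 100*2*z/(x+y)) %>%
  observe_if(x > 0,
             y > 0,
             z > 0,
             abs(depth-depth2) < 1)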

obs(df)
#> # A tibble: 4 × 8
#>      Id               Predicate Passed Failed Missing      Rows Status
#> * <int>                   <chr>  <int>  <int>   <int>    <list>  <chr>
#> 1     1                   x > 0  53932      8       0 <S3: bit> failed
#> 2     2                   y > 0  53933      7       0 <S3: bit> failed
#> 3     3                   z > 0  53920     20       0 <S3: bit> failed
#> 4     4 abs(depth - depth2) < 1  53840     93       7 <S3: bit> failed
#> # ... with 1 more variables: Number_of_trials <int>
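
To make the Passed, Failed and Missing columns concrete, here is a minimal base R sketch of how such counts could be computed for a single predicate. It is not observer's implementation: the helper count_outcomes and the toy data frame are hypothetical, and it assumes that a row counts as missing when the predicate evaluates to NA.

```r
# Toy data (hypothetical), mimicking a few columns of diamonds.
toy <- data.frame(
  x     = c(3.95, 0,    6.55, 4.00),
  y     = c(3.98, 0,    6.48, 4.00),
  z     = c(2.43, 0,    0,    2.50),
  depth = c(61.5, 62.0, 59.1, 70.0)
)
toy$depth2 <- 100 * 2 * toy$z / (toy$x + toy$y)  # row 2 is 0 / 0, i.e. NaN

# Hypothetical helper: count TRUE, FALSE and NA outcomes of a logical vector.
count_outcomes <- function(ok) {
  c(passed  = sum(ok %in% TRUE),
    failed  = sum(ok %in% FALSE),
    missing = sum(is.na(ok)))
}

count_outcomes(toy$z > 0)
count_outcomes(abs(toy$depth - toy$depth2) < 1)
```

In this toy example the row with x, y and z all equal to zero cannot be evaluated by the depth rule, which is one plausible way for a Missing count to arise.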

We observe that 93 rows fail to satisfy this rule, and that 7 more could not be evaluated (Missing). To go further we need to look at the rows themselves; inspect selects the rows at stake for a given observation Id:

inspect(df, 4)
#> # A tibble: 100 × 11
#>    carat       cut color clarity depth table price     x     y     z
#>    <dbl>     <ord> <ord>   <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
#> 1   1.00   Premium     G     SI2  59.1    59  3142  6.55  6.48  0.00
#> 2   1.22   Premium     J     SI2  62.6    59  3156  6.79  4.24  3.76
#> 3   1.01   Premium     H      I1  58.1    59  3167  6.66  6.60  0.00
#> 4   0.70     Ideal     G     VS2  62.7    54  3172  5.65  5.70  3.65
#> 5   1.00 Very Good     J     SI2  62.8    63  3293  6.26  6.19  3.19
#> 6   0.70   Premium     E      IF  62.9    59  3403  5.66  5.59  3.40
#> 7   1.01      Fair     F     SI2  64.6    59  3540  6.19  6.25  4.20
#> 8   1.00      Fair     G     SI1  43.0    59  3634  6.32  6.27  3.97
#> 9   0.81   Premium     E     VS2  61.5    58  3674  5.99  5.94  3.97
#> 10  1.10   Premium     G     SI2  63.0    59  3696  6.50  6.47  0.00
#> # ... with 90 more rows, and 1 more variables: depth2 <dbl>
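
A rough base R analogue of this selection, on a hypothetical toy data frame, is sketched below. It assumes that rows where the predicate is NA are returned together with those where it is FALSE; this is a guess consistent with the 100 rows shown above (93 failed plus 7 missing), not a documented guarantee.

```r
# Toy data (hypothetical); the second row yields NaN for depth2 (0 / 0).
toy <- data.frame(
  x     = c(3.95, 0,    6.55),
  y     = c(3.98, 0,    6.48),
  z     = c(2.43, 0,    0),
  depth = c(61.5, 62.0, 59.1)
)
toy$depth2 <- 100 * 2 * toy$z / (toy$x + toy$y)

# Evaluate the rule row by row: TRUE, NA, FALSE for the three rows.
ok <- abs(toy$depth - toy$depth2) < 1

# Keep rows where the predicate is FALSE or NA, i.e. not known to be TRUE.
flagged <- toy[!(ok %in% TRUE), ]
flagged
```

Using `ok %in% TRUE` rather than `!ok` keeps the NA row in the result instead of producing a row of NAs.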

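The same predicates can be written with standard evaluation, as formulas passed to observe_if_. Note that df already carries the four observations made above, so the new ones are appended and obs() lists eight rows: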

## Write your predicates first
p <- c(~ x > 0, ~ y > 0, ~ z > 0, 
       ~ abs(depth-depth2) < 1)

## Make observations
df %>% 
  observe_if_(.dots = p) %>% 
  obs()
#> # A tibble: 8 × 8
#>      Id               Predicate Passed Failed Missing      Rows Status
#> * <int>                   <chr>  <int>  <int>   <int>    <list>  <chr>
#> 1     1                   x > 0  53932      8       0 <S3: bit> failed
#> 2     2                   y > 0  53933      7       0 <S3: bit> failed
#> 3     3                   z > 0  53920     20       0 <S3: bit> failed
#> 4     4 abs(depth - depth2) < 1  53840     93       7 <S3: bit> failed
#> 5     5                   x > 0  53932      8       0 <S3: bit> failed
#> 6     6                   y > 0  53933      7       0 <S3: bit> failed
#> 7     7                   z > 0  53920     20       0 <S3: bit> failed
#> 8     8 abs(depth - depth2) < 1  53840     93       7 <S3: bit> failed
#> # ... with 1 more variables: Number_of_trials <int>
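
Formulas suit standard evaluation because a one-sided formula stores its right-hand side as an unevaluated expression, which can be evaluated later against a data frame. The sketch below illustrates that idea in base R only; check_formulas is a hypothetical helper, not part of observer, and it makes no claim about how observe_if_ is implemented.

```r
# Hypothetical helper: evaluate each one-sided formula against `data`
# and count the rows that pass (TRUE), fail (FALSE) or are missing (NA).
check_formulas <- function(data, predicates) {
  rows <- lapply(predicates, function(f) {
    expr <- f[[2]]                            # right-hand side of `~ expr`
    ok   <- eval(expr, data, environment(f))  # columns of `data` are found first
    data.frame(Predicate = paste(deparse(expr), collapse = " "),
               Passed    = sum(ok %in% TRUE),
               Failed    = sum(ok %in% FALSE),
               Missing   = sum(is.na(ok)))
  })
  do.call(rbind, rows)
}

toy <- data.frame(x = c(1, 0, 2), z = c(0.5, NA, 0))  # hypothetical data
p   <- c(~ x > 0, ~ z > 0)                            # a list of two formulas
res <- check_formulas(toy, p)
res
```

Because the predicates live in an ordinary list, they can be built programmatically, stored, and reused across datasets.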