tidydice

Roland Krasser

2019-11-26

Introduction

A basic understanding of probability and statistics is crucial for data understanding and discovery of meaningful patterns. A great way to teach probability and statistics is to start with an experiment, like rolling a dice or flipping a coin.

This package simulates rolling a dice and flipping a coin. Each experiment generates a tibble. Dice rolls and coin flips are simulated using sample(). The properties of the dice can be changed, like the number of sides. A Coin flip is simulated using a two sided dice. Experiments can be combined with the pipe-operator.

tidydice package on Github: https://github.com/rolkra/tidydice

Setup

As the tidydice-functions fits well into the tidyverse, we load the dplyr-package. For quick visualisations we use the explore-package. To create more flexible graphics, use ggplot2.

library(tidydice)
library(dplyr)
library(explore)

Roll a dice

Example

6 dice are rolled 3 times using roll_dice(). The result of the dice-experiment is visualised using plot_dice().

set.seed(123)
roll_dice(times = 6, rounds = 3) %>% 
  plot_dice(fill_success = "lightgrey")

Roll a dice once

The output of roll_dice() is a tibble. Each row represents a dice roll. Without parameters, a dice is rolled once. You can use plot_dice() to visualise the result.

set.seed(123)
roll_dice()
#> # A tibble: 1 x 5
#>   experiment round    nr result success
#>        <int> <int> <int>  <int> <lgl>  
#> 1          1     1     1      2 FALSE
set.seed(123)
roll_dice() %>% plot_dice()

Success is defined as result = 6 (as default), while result = 1..5 is not a success. In this case the result is 2, so it is no success.

Define success

If we would define result = 2 and result = 6 as success, it would be treated as success.

set.seed(123)
roll_dice(success = c(2,6))
#> # A tibble: 1 x 5
#>   experiment round    nr result success
#>        <int> <int> <int>  <int> <lgl>  
#> 1          1     1     1      2 TRUE

Unfair dice

As default, the dice is fair. So every result (0..6) has the same probability. If you want, you can change this.

set.seed(123)
roll_dice(prob = c(0,0,0,0,0,1))
#> # A tibble: 1 x 5
#>   experiment round    nr result success
#>        <int> <int> <int>  <int> <lgl>  
#> 1          1     1     1      6 TRUE

In this case we created a dice that always gets result = 6 (with 100% probability)

Untypically dice

As default the dice has 6 sides. If you want you can change this. Here we use a dice with 12 sides. result now can have a value between 1 and 12. But result = 6 is still the default success.

set.seed(123)
roll_dice(sides = 12)
#> # A tibble: 1 x 5
#>   experiment round    nr result success
#>        <int> <int> <int>  <int> <lgl>  
#> 1          1     1     1      4 FALSE

Roll a dice 4 times

set.seed(123)
roll_dice(times = 4)
#> # A tibble: 4 x 5
#>   experiment round    nr result success
#>        <int> <int> <int>  <int> <lgl>  
#> 1          1     1     1      2 FALSE  
#> 2          1     1     2      5 FALSE  
#> 3          1     1     3      3 FALSE  
#> 4          1     1     4      6 TRUE
set.seed(123)
roll_dice(times = 4) %>% plot_dice()

We get 1 success

Define rounds

set.seed(123)
roll_dice(times = 4, rounds = 2)
#> # A tibble: 8 x 5
#>   experiment round    nr result success
#>        <int> <int> <int>  <int> <lgl>  
#> 1          1     1     1      2 FALSE  
#> 2          1     1     2      5 FALSE  
#> 3          1     1     3      3 FALSE  
#> 4          1     1     4      6 TRUE   
#> 5          1     2     1      6 TRUE   
#> 6          1     2     2      1 FALSE  
#> 7          1     2     3      4 FALSE  
#> 8          1     2     4      6 TRUE
set.seed(123)
roll_dice(times = 4, rounds = 2) %>% plot_dice()

Rolling the dice 4 times is repeated. In the first round we got 1 success, in the secound round 2 success.

Use agg

A convenient way to aggregate the result, is to use the agg parameter. Now we get one line per round.

set.seed(123)
roll_dice(times = 4, rounds = 2, agg = TRUE)
#> # A tibble: 2 x 4
#>   experiment round times success
#>        <int> <int> <int>   <int>
#> 1          1     1     4       1
#> 2          1     2     4       2

You can aggregate by hand too using dplyr.

set.seed(123)
roll_dice(times = 4, rounds = 2) %>% 
  group_by(experiment, round) %>% 
  summarise(times = n(),
            success = sum(success))
#> # A tibble: 2 x 4
#> # Groups:   experiment [1]
#>   experiment round times success
#>        <int> <int> <int>   <int>
#> 1          1     1     4       1
#> 2          1     2     4       2

Visualise result

You can use any package/tool you like to visualise the result. In this example we use the explore-package.

set.seed(123)
roll_dice(times = 100) %>% 
  explore(result, title = "Rolling a dice 100x")

In 15% of the cases we got a six. This is close to the expected value of 100/6 = 16.67%

If we increase the times parameter to 10000, the results are more balanced.

set.seed(123)
roll_dice(times = 10000) %>% 
  explore(result, title = "Rolling a dice 10000x")

Visualise success

If we repeat the experiment rolling a dice 100x with rounds = 100, we get the distribution with a peak at about 17 (16.67 is the expected value)

set.seed(123)
roll_dice(times = 100, rounds = 100, agg = TRUE) %>% 
  explore(success, 
          title = "Rolling a dice 100x",
          auto_scale = FALSE)

If we increase rounds from 100 to 10000 we get a more symmetric shape. We see that success below 5 and success above 30 are very unlikely.

set.seed(123)
roll_dice(times = 100, rounds = 10000, agg = TRUE) %>% 
  explore(success, 
          title = "Rolling a dice 10000x",
          auto_scale = FALSE)

set.seed(123)
roll_dice(times = 100, rounds = 10000, agg = TRUE) %>% 
  mutate(check = ifelse(success < 5 | success > 30, 1, 0)) %>% 
  count(check)
#> # A tibble: 2 x 2
#>   check     n
#>   <dbl> <int>
#> 1     0  9996
#> 2     1     4

In only 4 of 10000 cases success is below 5 or above 30. So the probability to get this result is very low.

Combine experiments

Let’s add an experiment, where you have 10 extra dice. The shape of the distribution changes.

set.seed(123)
roll_dice(times = 100, rounds = 10000, agg = TRUE) %>% 
  roll_dice(times = 110, rounds = 10000, agg = TRUE) %>% 
  explore(success, 
          target = experiment,
          title = "Rolling a dice 100/110x",
          auto_scale = FALSE)

You can add as many experiments as you like (as long they generate the same data structure)

Adding an experiment with times = 150 will generate a smaller but wider shape.

set.seed(123)
roll_dice(times = 100, rounds = 10000, agg = TRUE) %>% 
  roll_dice(times = 110, rounds = 10000, agg = TRUE) %>% 
  roll_dice(times = 150, rounds = 10000, agg = TRUE) %>% 
  explore(success, 
          target = experiment,
          title = "Rolling a dice 100/110/150x",
          auto_scale = FALSE)

Flip a coin

Internally the package handles coins as dice with only two sides. Success is defined as result = 2 (as default), while result = 1 is not a success.

Flip a coin 10x

set.seed(123)
flip_coin(times = 10)
#> # A tibble: 10 x 5
#>    experiment round    nr result success
#>         <int> <int> <int>  <int> <lgl>  
#>  1          1     1     1      1 FALSE  
#>  2          1     1     2      1 FALSE  
#>  3          1     1     3      2 TRUE   
#>  4          1     1     4      1 FALSE  
#>  5          1     1     5      2 TRUE   
#>  6          1     1     6      1 FALSE  
#>  7          1     1     7      2 TRUE   
#>  8          1     1     8      1 FALSE  
#>  9          1     1     9      2 TRUE   
#> 10          1     1    10      2 TRUE

In this case the result are 6x 2 and 4x 1. We can use the describe() function of the explore-package to get a good overview.

set.seed(123)
flip_coin(times = 10) %>% 
  describe(success)
#> variable = success
#> type     = logical
#> na       = 0 of 10 (0%)
#> unique   = 2
#>    FALSE = 5 (50%)
#>     TRUE = 5 (50%)

Or just use the agg-parameter

set.seed(123)
flip_coin(times = 10, agg = TRUE)
#> # A tibble: 1 x 4
#>   experiment round times success
#>        <int> <int> <int>   <int>
#> 1          1     1    10       7

Define rounds

The parameter rounds can be used like in roll_dice().

set.seed(123)
flip_coin(times = 10, rounds = 4, agg = TRUE)
#> # A tibble: 4 x 4
#>   experiment round times success
#>        <int> <int> <int>   <int>
#> 1          1     1    10       7
#> 2          1     2    10       3
#> 3          1     3    10       5
#> 4          1     4    10       5

Combine experiments

set.seed(123)
flip_coin(times = 10, agg = TRUE) %>% 
  flip_coin(times = 15, agg = TRUE)
#> # A tibble: 2 x 4
#>   experiment round times success
#>        <int> <int> <int>   <int>
#> 1          1     1    10       6
#> 2          2     1    15       7

Dice design

Default settings

set.seed(123)
roll_dice(times = 6) %>% 
  plot_dice()

Change color

set.seed(123)
roll_dice(times = 6) %>% 
  plot_dice(fill = "black", line_color = "white", point_color = "white")

set.seed(123)
roll_dice(times = 6) %>% 
  plot_dice(fill = "lightblue", fill_success = "darkblue")

set.seed(123)
roll_dice(times = 6) %>% 
  plot_dice(fill = "darkgrey", 
            fill_success = "darkblue",
            line_color = "white",
            point_color = "white")

Dice type

set.seed(123)
roll_dice(times = 6) %>% 
  plot_dice(detailed = TRUE)

set.seed(123)
roll_dice(times = 6) %>% 
  plot_dice(detailed = FALSE)

Limits

plot_dice() is limited to 1 experiment with max. 10 times x 10 rounds.

set.seed(123)
roll_dice(times = 10, rounds = 10) %>% 
  plot_dice(detailed = FALSE, fill_success = "gold")

Force

You can force a result using force_dice() and force_coin().

force_dice(1:6) %>% 
  plot_dice()

force_dice(rep(6, times = 6)) %>% 
  plot_dice()

We can combine two foreced dice rolling using the pipe operator and the parameter round.

force_dice(rep(5, times = 3), round = 1) %>% 
  force_dice(rep(6, times = 3), round = 2)
#> # A tibble: 6 x 5
#>   experiment round    nr result success
#>        <int> <int> <int>  <int> <lgl>  
#> 1          1     1     1      5 FALSE  
#> 2          1     1     2      5 FALSE  
#> 3          1     1     3      5 FALSE  
#> 4          1     2     1      6 TRUE   
#> 5          1     2     2      6 TRUE   
#> 6          1     2     3      6 TRUE
set.seed(123)
force_dice(rep(6, times = 3)) %>% 
  roll_dice(times = 3)
#> # A tibble: 6 x 5
#>   experiment round    nr result success
#>        <int> <int> <int>  <int> <lgl>  
#> 1          1     1     1      6 TRUE   
#> 2          1     1     2      6 TRUE   
#> 3          1     1     3      6 TRUE   
#> 4          2     1     1      2 FALSE  
#> 5          2     1     2      5 FALSE  
#> 6          2     1     3      3 FALSE

In the first experiment we get 3 times a 6 (forced), but in the second experiment none.