--- title: "Introduction to `{whatifbandit}`" output: rmarkdown::html_vignette bibliography: "REFERENCES.bib" vignette: > %\VignetteIndexEntry{Introduction to `{whatifbandit}`} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) library(data.table) library(dplyr) library(future) library(ggplot2) library(whatifbandit) ``` `{whatifbandit}` allows researchers to re-simulate their randomized experiments under adaptive experimental designs. Replicating experiments can be costly and time-consuming, so this simulation tool allows researchers to explore what their results could have looked like under an adaptive experimental design. These re-analyses can inform the planning for future studies, allowing a researcher to justify an adaptive design for a future project by understanding how such designs help meet research and implementation goals. Adaptive experimental designs randomly assigning treatments, but do so while taking into account the relative performance of each treatment. Usually this means that better performing treatments will be assigned more participants at each assignment period, allowing both identification of and assignment to the best treatment arm(s). These designs can shine in situations such as: - Finding and assigning the one best treatment takes precedence over gauging the effects of all treatments with equal precision. - A researcher wants to test a many treatments, such as 10, 20, or 100. - Simple random assignment produces sample sizes too small to detect the subset of treatment effects the researcher most cares about. - An experiment occurs over multiple time periods. This vignette introduces the key functions of the `{whatifbandit}` package and how to use them effectively. All that's required is data from a previous randomized experiment containing information about which treatment each observation was originally assigned to, and the original outcome. For a more detailed view of the exact adaptive procedures and implementation, please refer to the function documentation, and if you encounter any bugs please report them on the package's [GitHub](https://github.com/Noch05/whatifbandit/issues) repository. # Core Functionality `{whatifbandit}` provides 2 main functions for conducting simulations, `single_mab_simulation()` and `multiple_mab_simulation()`. Like their names suggest `single_mab_simulation()` runs only 1 simulation while `multiple_mab_simulation()` runs a user-specified number of simulations. Internally both functions are similar, and share the same arguments except for the few additional ones `multiple_mab_simulation()` needs for repeated trials. ## Multi-Arm-Bandits: Adaptive Assignment Adaptive experiments are a form of multi-arm bandit (MAB) problem, where each treatment arm has an unknown probability of success, and a balance must be achieved between exploration and exploitation. In an applied setting, we want a design can adequately explore the space of possible treatments, and find the best treatment to exploit. If the decision algorithm is quick to exploit a single treatment, then there is a high chance that the true best treatment is missed, but if the decision algorithm explores too much, then we miss out on the rewards from exploiting the current treatment we believe is the best. Additionally, these unknown probabilities can change overtime in what is called a non-stationary bandit problem. 
Solutions to these bandit problems, in the form of decision algorithms, provide the basis for how `{whatifbandit}` simulates an adaptive trial. `{whatifbandit}` supports 2 assignment algorithms: Thompson sampling and upper confidence bound assignment ("UCB1"). Thompson sampling calculates the posterior probability of each treatment being the best treatment arm, and then uses those probabilities to assign the treatment arms in the next wave. In the Bernoulli case implemented in `{whatifbandit}`, the posterior probabilities can be calculated from a beta distribution. UCB1, on the other hand, calculates an upper confidence bound on the expected reward of each arm and selects the maximum value [@kuleshov2014; @slivkins2024; @offer-westort2021].

In a multi-arm experiment with $T$ time periods, let the set of potential treatment arms be denoted $\mathcal{K}$. At each time period $t \in \{1, 2, ..., T\}$, a treatment arm $W_t \in \mathcal{K}$ needs to be selected. In Thompson sampling, this involves finding the arm $k \in \mathcal{K}$ with the highest probability of being the best treatment arm. When outcomes are binary, the probability of a reward from any arm $k \in \mathcal{K}$ after wave $t-1$ of the adaptive procedure, denoted $\theta^k_t$, follows a beta distribution according to the number of observed successes and failures the treatment has received up to that point. These cumulative successes and failures for each arm are denoted $\text{success}_{kt}$ and $\text{failures}_{kt}$, respectively. Instead of choosing the arm $k$ with the highest probability of being the best as the one and only $W_t$, Thompson sampling uses these probabilities to assign new treatments, so the probability that each arm $k$ is the best becomes the probability that it will be assigned from the set of $\mathcal{K}$ arms at time $t$. For a given treatment arm $w \in \mathcal{K}$, this probability is represented below:

$$
P(W_t = w) = P\left(\theta_t^w > \theta_t^k, \ \forall k \in \mathcal{K} \setminus \{w\} \mid \ \theta^k_t \sim \text{Beta}(1+\text{success}_{kt}, 1 + \text{failures}_{kt}) \ \forall k \in \mathcal{K}\right)
$$

For UCB1, only 1 treatment arm is selected at any time $t$. $W_t$, the selected treatment at time $t$, is the treatment arm $k \in \mathcal{K}$ which maximizes the sum of the estimated expected reward and the uncertainty bound at time $t$, where $n_{kt}$ is the number of times arm $k$ has been selected up to time $t$.

$$
W_t = \arg\max_k \left(\widehat{\mathbb{E}[\text{Reward}_{kt}]} + \sqrt{\frac{2 \ln t}{n_{kt}}} \right)
$$
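To make these two rules concrete, here is a minimal sketch (not the package's internal implementation) that approximates the Thompson sampling assignment probabilities by Monte Carlo draws from each arm's beta posterior and computes the UCB1 index, using made-up success and failure counts for three hypothetical arms:

```{r, banditsketch, eval=FALSE}
# Hypothetical cumulative results for three arms after some wave t - 1.
successes <- c(A = 40, B = 55, C = 32)
failures  <- c(A = 160, B = 145, C = 168)

# Thompson sampling: draw from each arm's Beta(1 + successes, 1 + failures)
# posterior and estimate P(arm is best) as the share of draws in which it wins.
set.seed(1)
draws <- sapply(seq_along(successes), function(k) {
  rbeta(10000, 1 + successes[k], 1 + failures[k])
})
best_draw <- max.col(draws)
assign_probs <- setNames(tabulate(best_draw, nbins = ncol(draws)) / nrow(draws),
                         names(successes))
assign_probs

# UCB1: estimated mean reward plus an uncertainty bonus, here treating the total
# number of observations so far as t; the single arm with the largest index
# would be selected for the next period.
n_k <- successes + failures
t_now <- sum(n_k)
ucb1 <- successes / n_k + sqrt(2 * log(t_now) / n_k)
ucb1
names(which.max(ucb1))
```

In `{whatifbandit}` these calculations are handled internally; the sketch is only meant to show how the assignment probabilities and UCB1 values in the formulas above could arise.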
## Adaptive Inference

Statistical inference under adaptive designs requires careful consideration. The sample mean is biased for the true reward probability, because observations are no longer independent and identically distributed (i.i.d.). Thus, we utilize an augmented inverse probability weighted (AIPW) estimator with a constant allocation of the variance across periods, as formulated by @hadad2021a, whose formulas we implement in the package. These estimators are unbiased and asymptotically normal under adaptive trials, so a normal distribution can be used for parametric inference. Additionally, `{whatifbandit}` also allows for multiple simulations to easily estimate the variance of the procedure and build an empirical distribution of the unbiased AIPW estimates, instead of just relying on the normal approximation.

The estimator is calculated by weighting the outcome against the probability of being assigned the treatment and a conditional expectation/regression estimate based on the previous periods of assignment, then aggregating based on an adaptive weight. In the package, the conditional expectation is estimated via a grouped mean, and the adaptive weight uses the constant allocation rate described in @hadad2021a. The allocation rate is a function which describes the proportion of the remaining variance that is allocated to the next observation. This allocation rate determines the adaptive weights, which are necessary to ensure the variance of the estimator converges properly. Additional details about adaptive weights and allocation functions can be found in @hadad2021a, along with the other specifications they tested.

The formulas below are adapted from @hadad2021a and showcase the AIPW calculations. The first equation gives the individual AIPW estimate for treatment $w$ at period $t$, where $\mathbb{I}$ is an indicator function, $e_t(w)$ is the probability of the unit in period $t$ being assigned treatment $w$, and $\hat{m}_t(w)$ is the conditional expectation/regression estimate based on the previous periods of assignment. The regression estimate at period $t$ is the average of the outcomes under treatment $w$ from periods $1$ through $t-1$, where $n_{t-1}(w)$ is the number of times treatment $w$ was assigned by period $t-1$. The second equation gives the final AIPW estimate for treatment $w$: the weighted sum of each individual estimate $\widehat{\Gamma}^{AIPW}_t$ and the adaptive weights $h_t(w)$ for treatment $w$ at time $t$, divided by the sum of the weights. These adaptive weights are calculated in the third equation by dividing the current probability of being assigned the treatment, $e_t(w)$, by the total number of periods, $T$, and taking the square root.

$$
\widehat{\Gamma}^{AIPW}_t = \frac{\mathbb{I}\{W_t = w\}}{e_t(w)} Y_t + \left(1 - \frac{\mathbb{I}\{W_t = w\}}{e_t(w)} \right)\hat{m}_t(w) \tag{Individual}
$$

$$
\widehat{Q}^h_T(w) = \frac{\sum_{t=1}^T h_t(w)\widehat{\Gamma}^{AIPW}_t}{\sum_{t=1}^T h_t(w)} \tag{Average}
$$

$$
h_t(w) = \sqrt{\frac{e_t(w)}{T}} \tag{Adaptive Weight}
$$

$$
\hat{m}_t(w) = \frac{1}{n_{t-1}(w)}\sum_{i=1}^{n_{t-1}(w)} y_i \tag{Grouped Mean}
$$
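As a small worked example of these formulas (a hand-rolled sketch under the constant allocation weights above, not the package's internal code), suppose we follow one treatment arm $w$ across eight sequentially assigned units, treating each unit as its own assignment period and recording its outcome, its assignment probability $e_t(w)$, and whether it actually received $w$. The values below are invented purely for illustration:

```{r, aipwsketch, eval=FALSE}
# Toy data for one treatment arm w, one row per unit in assignment order:
# y = observed outcome (1 = success), e = P(assigned w) in that period,
# d = indicator that the unit actually received w.
y <- c(1, 0, 1, 1, 0, 1, 1, 0)
e <- c(0.33, 0.33, 0.40, 0.40, 0.52, 0.52, 0.61, 0.61)
d <- c(1, 0, 1, 1, 0, 1, 1, 1)
T_periods <- length(y)

# Grouped-mean regression estimate m_hat: average outcome among units that
# received w in earlier periods (an arbitrary 0.5 is used before any are seen).
m_hat <- vapply(seq_len(T_periods), function(t) {
  prev <- which(d[seq_len(t - 1)] == 1)
  if (length(prev) == 0) 0.5 else mean(y[prev])
}, numeric(1))

# Individual AIPW scores and constant-allocation adaptive weights h_t(w).
gamma_aipw <- (d / e) * y + (1 - d / e) * m_hat
h <- sqrt(e / T_periods)

# Final AIPW estimate for arm w: weighted average of the individual scores.
sum(h * gamma_aipw) / sum(h)
```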
## Data

To explore the functionality of `{whatifbandit}`, we will use the included `tanf` dataset. These data detail the results of a randomized experiment which tested the impact of different letters on re-certification for the Temporary Assistance for Needy Families (TANF) program in Washington D.C. [@moore2022]. The original field experiment tested 3 different conditions over the course of 5 months.

```{r, data}
data(tanf)
glimpse(tanf)
```

# A Single MAB Simulation

`single_mab_simulation()` runs an adaptive-design simulation, where each of its arguments tunes the settings of the simulation. A simulation requires a data frame containing a unique ID for every row, that row's original treatment, and its original outcome. These column names are passed to the function as strings.

```{r, singlesimnoeval, eval=FALSE}
sim <- single_mab_simulation(
  data = tanf,
  assignment_method = "Batch",
  period_length = 1000,
  algorithm = "UCB1",
  whole_experiment = TRUE,
  perfect_assignment = TRUE,
  prior_periods = "All",
  blocking = FALSE,
  data_cols = c(
    id_col = "ic_case_id",
    success_col = "success",
    condition_col = "condition"
  )
)
```

The function contains several options which control the specification of the simulation. The setup above performs UCB1 sampling without treatment blocking, with batch sizes of 1000 people, where the whole experiment is used for imputing outcomes. This differs significantly from the original TANF field experiment.

The original TANF experiment was blocked by D.C. Community and Human Services service centers (and by official appointment date), ensuring each center had a comparable number of participants in each group. To replicate this we set `blocking = TRUE` and add the `service_center` column to `block_cols` [@moore2022].

The original experiment took place over 5 distinct months. To mirror that, we set `assignment_method = "Date"`, `time_unit = "Month"`, and `period_length = 1`, which creates 1-month-long periods. With a date assignment, the name of `date_col` needs to be provided in `data_cols`.

UCB1 sampling also only chooses 1 treatment per period, but because we want time-based periods, and each period is a large portion of the dataset, we use Thompson sampling instead. This ensures that we can see the treatment assignments evolve more over time. We set `algorithm = "Thompson"` for this.

The time component of the experiment can be further taken into account. There is no guarantee that, by the time the next month's letters are sent out, all the people who would recertify have done so. Additionally, when `whole_experiment = TRUE`, the full set of original results is used for outcome imputation, while setting it to FALSE only uses the data up to the assignment period, which more closely reflects the information available had the original experiment been a MAB. To reflect these conditions, we set `perfect_assignment = FALSE` and `whole_experiment = FALSE`, along with including `success_date_col` and `assignment_date_col` in `data_cols`. With this configuration, if a success occurs after the date treatments are assigned, it is masked and treated as a failure. This simulates the researcher's limited information, as those late outcomes would not have been observed yet.

```{r, singlesimeval}
sim <- single_mab_simulation(
  data = tanf,
  assignment_method = "Date",
  time_unit = "Month",
  period_length = 1,
  algorithm = "Thompson",
  whole_experiment = FALSE,
  perfect_assignment = FALSE,
  prior_periods = "All",
  blocking = TRUE,
  block_cols = c("service_center"),
  data_cols = c(
    id_col = "ic_case_id",
    date_col = "appt_date",
    success_col = "success",
    condition_col = "condition",
    month_col = "recert_month",
    success_date_col = "date_of_recert",
    assignment_date_col = "letter_sent_date"
  )
)
```

You don't always have to replicate the original setting perfectly; reanalyzing the experiment as an adaptive trial also involves testing how the results change across different configurations, such as the size and length of periods, or the usage of `perfect_assignment` or `whole_experiment`. The only guidelines are to use UCB1 only when the period size is a relatively small component of the dataset, and to only set `perfect_assignment = FALSE` when you have dates that line up with the periods you have set. If this is not the case, then all the successes will simply be masked.
For example, we can experiment with using individual assignment by setting `assignment_method = "Individual"`. This does not follow the original experiment, and we don't have data in that form, so `perfect_assignment = TRUE` should be set for the best results. Additionally, since the date still represents the order, we sort the data before running the simulation to ensure it's in the observed order.

It may also be a strong assumption that the true probabilities of success for each treatment are stationary over time. To account for this, we can set `prior_periods = 500` to only consider the last 500 periods (in the `"Individual"` case, the last 500 observations) when performing adaptive assignment. This ensures that the oldest periods, which might not be representative of the true current probabilities, are not considered when assigning new treatments.

Additionally, because the goal of this experiment is also to estimate a treatment effect for the letters, we might not want to leave 100% of the assignment up to the bandit algorithms. For this we have two methods: control augmentation or a hybrid design. Control augmentation ensures that the marked control group meets a specified minimum assignment probability. For example, if we set `control_augment = 0.25`, the algorithm reserves at least 25% of each period for the control group, so if the bandit algorithm would give the control group only 15% in some period, the probabilities are adjusted accordingly. The hybrid design sets aside a percentage of the dataset to be assigned uniformly at random. If we set `random_assign_prop = 0.2`, then the algorithm allocates 20% of each period to be assigned to treatments randomly, while 80% are assigned via the Thompson sampling or UCB1 algorithm.

We do not recommend using both control augmentation and a hybrid design. If there are 3 conditions, like in the TANF experiment, and we set `control_augment = 0.1` and `random_assign_prop = 0.21`, then the minimum assignment probability for the control group is not 10% but 14.9% instead (0.1 * 0.79 + 0.21/3 = 0.149), because the 10% floor only applies to the 79% of each period assigned by the bandit, while the randomly assigned 21% splits evenly across the 3 conditions.

Below, we use `control_augment = 0.2` and identify the control group using the `control_condition` argument. If we use `random_assign_prop` instead, we do not need to specify this.

```{r, singlesimcontrolaug, eval=FALSE}
tanf <- arrange(tanf, appt_date)
conditions <- setNames(levels(tanf$condition), c("Control", "T1", "T2"))

sim <- single_mab_simulation(
  data = tanf,
  assignment_method = "Individual",
  algorithm = "Thompson",
  whole_experiment = FALSE,
  perfect_assignment = TRUE,
  prior_periods = 500,
  blocking = TRUE,
  block_cols = c("service_center"),
  data_cols = c(
    id_col = "ic_case_id",
    success_col = "success",
    condition_col = "condition"
  ),
  control_augment = 0.2,
  control_condition = "no_letter"
)
```

## Analysis

`single_mab_simulation()` returns a list with 5 components:

* `final_data` is the transformed input containing the treatment assignments and outcomes from the simulation, along with other new columns such as the probability of being assigned each treatment at each period, and the regression adjustment $\hat{m}(W)$ for the AIPW estimates.
* `bandits` contains either the Thompson sampling probabilities or UCB1 values from each period in the trial. The chart should be interpreted as the bandit calculation after that period occurred.
* `assignment_probs` contains the probability of assignment from each period in the trial. The chart should be interpreted as the probability for that specific period.
* `estimates` contains the AIPW and sample estimates with variances for each treatment arm.
* `settings` contains the configuration for the simulation.

The returned object also has a custom `mab` class that comes with several generic methods. The `print()` method displays the settings of the trial, and the `summary()` method aggregates the results.

```{r}
class(sim)
sim
```

The summary method presents the final Thompson/UCB1 calculation after the trial concluded, the AIPW estimate for the probability of success, its standard error, a 95% two-tailed confidence interval based on a normal distribution, and the number of observations assigned to each treatment group over the course of the trial.

```{r}
sim_summary <- summary(sim)
print(sim_summary, width = Inf)
```

The width of the confidence intervals can be changed via the `level` argument of the summary method, but intervals can also be calculated by hand using the standard error. Below, we specify 80% confidence intervals, or an $\alpha = 0.2$.

```{r}
# Inside Summary Call
summary(sim, level = 0.8) |>
  select(estimated_probability_of_success, SE, lower_bound, upper_bound, level) |>
  print(width = Inf)

# By hand
quantile <- qnorm(0.2 / 2, lower.tail = FALSE)
sim_summary |>
  mutate(
    lower_bound = estimated_probability_of_success - SE * quantile,
    upper_bound = estimated_probability_of_success + SE * quantile
  ) |>
  select(estimated_probability_of_success, SE, lower_bound, upper_bound)
```

## Plotting

Plot methods display the results. These methods use `{ggplot2}` to create visualizations. The `mab` class supports 3 plot types, `"arm"`, `"assign"`, and `"estimate"`, each providing different information about the results. These plots can be built on with `+` like any `{ggplot2}` object, and arguments to the main `geom_*` can be specified through the `...` argument. The geoms used are `geom_line()` and `geom_errorbarh()`.

`type = "arm"` provides a view of each arm over time: its relative probability of being the best, or its UCB1 value.

```{r}
#| fig.width: 5
plot(sim, type = "arm")
```

`type = "assign"` showcases the cumulative proportion of the data assigned to each arm over time.

```{r}
#| fig.width: 5
plot(sim, type = "assign")
```

`type = "estimate"` shows error bars around the AIPW estimates using the estimated variances. Just like with `summary()`, the confidence level can be specified.

```{r}
#| fig.width: 5
plot(sim, type = "estimate", level = 0.9, height = 0.4) +
  scale_x_continuous(breaks = seq(0, 1, .1), limits = range(0, 1))
```

All of the data used to create these plots and summary statistics is present in the original output, so you can easily create your own more detailed versions. Additionally, the information provided should also let you calculate other adaptive estimators, if you do not want to use the AIPW from @hadad2021a or want to specify another one of their proposed adaptive variance allocation methods.

# Multiple Simulations

`multiple_mab_simulation()` is a wrapper function for executing multiple trials under the same settings. It is used to estimate the variance in the outcomes that arises from the simulation process. Depending on the number of trials specified and the size of your dataset, running multiple simulations can be memory intensive and have a long run time, so we recommend running a single simulation first to estimate how long multiple simulations will take.

`multiple_mab_simulation()` has 3 new arguments but is otherwise the same as `single_mab_simulation()`. These arguments are:

* `times`: The number of times to run the simulation.
* `seeds`: A numeric vector of random seeds, where `length(seeds) == times`. Required to make results reproducible, and we encourage setting them beforehand using `sample.int()`.
* `keep_data`: A logical value indicating whether or not to keep the `final_data` object from each simulation. The default is FALSE, to reduce the memory load of execution.

Below, we re-simulate the TANF experiment as 100 MAB trials:

```{r}
set.seed(532454)
seeds <- sample.int(1000000, 100, replace = FALSE)

time <- system.time(
  multiple_sims <- multiple_mab_simulation(
    data = tanf,
    assignment_method = "Date",
    time_unit = "Month",
    period_length = 1,
    algorithm = "Thompson",
    whole_experiment = FALSE,
    perfect_assignment = TRUE,
    prior_periods = "All",
    blocking = TRUE,
    block_cols = c("service_center"),
    data_cols = c(
      id_col = "ic_case_id",
      date_col = "appt_date",
      success_col = "success",
      condition_col = "condition",
      month_col = "recert_month",
      success_date_col = "date_of_recert",
      assignment_date_col = "letter_sent_date"
    ),
    keep_data = TRUE,
    times = 100,
    seeds = seeds
  )
)
```

```{r}
#| echo: FALSE
get_size <- function(x, unit) {
  string_format <- object.size(x) |> format(units = unit)
  num <- regmatches(string_format, regexpr("[0-9]+", string_format)) |> as.numeric()
  return(num)
}
full_size <- get_size(multiple_sims, unit = "MiB")
full_size_kib <- get_size(multiple_sims, unit = "KiB")
multiple_sims$final_data_nest <- NULL
reduced_size <- get_size(multiple_sims, unit = "KiB")
```

Here we've set `keep_data = TRUE` to showcase the memory impact of saving the data from each iteration. The full object size with all the data is `r full_size` mebibytes (MiB). However, when we remove the data from the object, the size becomes just `r reduced_size` kibibytes (KiB), making it `r round(full_size_kib / reduced_size, 0)` times smaller than the original.

Executing those 100 simulations took `r as.numeric(time["elapsed"])` seconds. To speed up this process, `multiple_mab_simulation()` supports parallel processing via the [furrr](https://furrr.futureverse.org/) and [future](https://future.futureverse.org/) packages. To run the function in parallel, call `future::plan()` before running it and specify the number of cores you want to use.

```{r}
#| echo: FALSE
load("parallel.RData")
```

```{r}
#| eval: FALSE
set.seed(532454)
seeds <- sample.int(1000000, 100, replace = FALSE)

plan("multisession", workers = 4)
parallel_time <- system.time(
  multiple_sims <- multiple_mab_simulation(
    data = tanf,
    assignment_method = "Date",
    time_unit = "Month",
    period_length = 1,
    algorithm = "Thompson",
    whole_experiment = FALSE,
    perfect_assignment = TRUE,
    prior_periods = "All",
    blocking = TRUE,
    block_cols = c("service_center"),
    data_cols = c(
      id_col = "ic_case_id",
      date_col = "appt_date",
      success_col = "success",
      condition_col = "condition",
      month_col = "recert_month",
      success_date_col = "date_of_recert",
      assignment_date_col = "letter_sent_date"
    ),
    keep_data = FALSE,
    times = 100,
    seeds = seeds
  )
)
plan("sequential")
```

Using 4 cores, it only takes `r as.numeric(parallel_time["elapsed"])` seconds to run 100 simulations. How many cores you can use is determined by your system; to see how many cores are available, call `future::availableCores()`. Using `{future}` comes with some new complexities. Generally, Windows users should use `plan("multisession")`, while Linux and MacOS users should use `plan("multisession")` or `plan("multicore")`.
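As a small illustration of choosing a backend defensively (a sketch only; adjust the worker count to your own machine and workload), you can query the available cores before setting a plan:

```{r, plansketch, eval=FALSE}
library(future)

# See how many cores the current machine exposes to R.
n_cores <- availableCores()

# Use up to 4 workers when possible; otherwise fall back to sequential execution.
if (n_cores > 1) {
  plan("multisession", workers = min(4, n_cores))
} else {
  plan("sequential")
}

# ... run multiple_mab_simulation() here ...

# Return to sequential processing when finished.
plan("sequential")
```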
If you are using a High Performance Computing cluster, use the `future.batchtools` backend with the proper plan for your HPC scheduler. For details on how futures work, and on specific future backends, please consult the [future](https://future.futureverse.org/) documentation.

## Analyzing Repeated Simulations

`multiple_mab_simulation()` returns mostly the same elements as `single_mab_simulation()`, except for `final_data`. When `keep_data = TRUE`, the `final_data` element is a nested data frame of all the `final_data` from each trial, and when `keep_data = FALSE`, the function returns only the objects `bandits`, `assignment_probs`, `estimates`, `assignment_quantities`, and `settings`. `assignment_quantities` is a new object, which has the same structure as `bandits` and `assignment_probs` but contains the number of units assigned to each treatment for each trial.

The object also has a custom `multiple.mab` class with generic methods similar to those of the `mab` class, including `print()` and `summary()`.

```{r}
class(multiple_sims)
print(multiple_sims)
```

For `multiple.mab` objects, `summary()` returns the average of the AIPW estimates and 2 estimates of their standard error. One estimate is the average of the standard errors across the trials, with the variances averaged first $\left(\sqrt{\frac{1}{n}\sum_{i=1}^n\mathbb{V}[\text{Estimate}_i]}\right)$. The other estimate is the sample standard deviation of all of the AIPW estimates across the trials (`SE_empirical`). 95% confidence intervals are provided using both the empirical cumulative distribution function (eCDF) and the normal CDF with the averaged standard errors. Additionally, the number of times each treatment arm was selected as the best is displayed, along with the mean and standard deviation of the number of units assigned to each treatment across the trials. The best arm for a given simulation is the treatment arm with the highest Thompson sampling probability or UCB1 value at the end of the trial.

```{r}
summary(multiple_sims) |> print(width = Inf)
```

Just like with `mab` objects, the confidence levels can be changed in the `summary()` call.

```{r}
summary(multiple_sims, level = 0.75) |> print(width = Inf)
```

## Plotting Repeated Simulations

`multiple.mab` objects also have their own plot method with 3 distinct plots: `"summary"`, `"hist"`, and `"estimate"`. Just like with `mab` objects, these plots are created using `{ggplot2}`, so they can be added to with `+`, and arguments in `...` are passed to the `geom_*` functions inside. The geoms used are `geom_bar()`, `geom_histogram()`, and `geom_errorbarh()`.

`type = "summary"` creates a bar graph showing how many times each treatment arm was selected as the best:

```{r}
#| fig.width: 5
plot(multiple_sims, type = "summary")
```

`type = "hist"` creates histograms showcasing either the distribution of the AIPW estimates for each treatment arm over the repeated trials, or the distribution of the number of units assigned to each treatment arm. Because this plot also takes advantage of `facet_grid()`, additional arguments need to be specified in a list: `geom` for arguments passed to `geom_histogram()` and `facet` for arguments passed to `facet_grid()`.

```{r}
#| fig.width: 5
plot(multiple_sims, type = "hist", quantity = "estimate", geom = list(bins = 50))
plot(multiple_sims, type = "hist", quantity = "assignment", facet = list(switch = "x"))
```

`type = "estimate"` is the same as for `mab`, but now the spread can be shown using either the empirical or normal CDF.
```{r}
#| fig.width: 5
plot(multiple_sims,
  type = "estimate",
  cdf = "empirical",
  level = 0.99,
  height = 0.4
) +
  scale_x_continuous(breaks = seq(0, 1, .1), limits = range(0, 1))
```

All the data used to construct these plots is available in the output, so users can perform custom analyses or make custom plots.

# Additional Topics

## Large Datasets

Both `single_mab_simulation()` and `multiple_mab_simulation()` support large data objects through `{data.table}`. When working with large datasets, data.tables are faster for common operations and more memory efficient than data.frames or tibbles, so using them can result in faster simulations. To turn a data.frame into a data.table, simply call `data.table::setDT()` on your data.frame. Below, solely in order to gauge execution times, we `rbind()` the `tanf` dataset to itself several times to create a large object.

```{r}
#| echo: FALSE
load("datatable.RData")
```

```{r}
#| eval: FALSE
# Prepare the dataset:
tanf_large <- tanf
setDT(tanf_large)
for (i in 1:9) {
  tanf_large <- rbindlist(list(tanf_large, tanf_large))
}
setorder(tanf_large, appt_date)

# Set id to be the row number for uniqueness:
tanf_large[, id := .I]
```

Now `tanf_large` has `r nrow(tanf_large)` rows to simulate over instead of `r nrow(tanf)`. For this example, the batch size will be 3000 people, resulting in `r ceiling(nrow(tanf_large) / 3000)` periods.

When using large datasets in conjunction with Thompson sampling, the direct calculation can fail. In the event of a failure, an attempt will be made to estimate the Thompson sampling probabilities by simulating draws from the posterior distribution. The number of these draws is set by the `ndraws` argument, which defaults to 5000 and which you can change based on how accurate or fast you want the calculation to be. This simulation still has the potential to fail, and if it does, an error will be thrown. If this occurs, we recommend you change `prior_periods` from "All" to a lower number, which limits the size of the data used for Thompson sampling in each period, or use UCB1 as the decision algorithm instead.

```{r}
#| eval: FALSE
set.seed(523432453)
dataframe_time <- system.time(single_mab_simulation(
  data = as.data.frame(tanf_large),
  assignment_method = "Batch",
  period_length = 3000,
  algorithm = "Thompson",
  whole_experiment = FALSE,
  perfect_assignment = TRUE,
  prior_periods = "All",
  blocking = FALSE,
  data_cols = c(
    id_col = "id",
    success_col = "success",
    condition_col = "condition"
  ),
  ndraws = 5000
))

datatable_time <- system.time(single_mab_simulation(
  data = tanf_large,
  assignment_method = "Batch",
  period_length = 3000,
  algorithm = "Thompson",
  whole_experiment = FALSE,
  perfect_assignment = TRUE,
  prior_periods = "All",
  blocking = FALSE,
  data_cols = c(
    id_col = "id",
    success_col = "success",
    condition_col = "condition"
  ),
  ndraws = 5000
))
```

The data.table version completes in `r datatable_time` seconds, while the data.frame version takes `r dataframe_time` seconds, making the data.table version `r round(((dataframe_time/datatable_time) - 1)*100, 0)`% faster in this case. The data.table version may not be faster in all cases. Performance depends on the size of the data, the number of observations in each period, and the total number of periods. For simulations where each period contains fewer than 500 observations or the total size is less than 1,000,000 rows, the data.table version will likely be slower.

# References