Overview

This app illustrates the method of bootstrapping data, one way to obtain confidence intervals for parameters when fitting mechanistic models to data.

The Model

The data and the underlying simulation model are the same as in the ‘Basic model fitting’ app. If you haven’t worked your way through that app, do that first.

Fitting Approach

This app allows you to fit the parameters in either linear or log space. Fitting in log space can be useful if the parameter values you want to explore span a wide range. The underlying problem is the same no matter what scale you fit your parameters on, but sometimes one version or the other leads to better performance of your solver.

Note that while the choice to fit parameters on a linear or log scale is made purely to improve the performance of the computer code, the decision to fit your data on a linear or log scale corresponds to a different biological problem, with different underlying assumptions about the processes that generated the data and its uncertainty/noise. For this app, the data are fit on a log scale.
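
To make this distinction concrete, here is a minimal R sketch (not the app's actual code) using hypothetical stand-ins run_model() and mydata: the optimizer works on log-transformed parameters, while the objective function compares model and data on a log scale.

    # Minimal sketch, assuming a hypothetical simulator run_model(pars) that
    # returns model predictions at the data time points, and a hypothetical
    # data frame mydata with observed values in mydata$outcome.
    # (Assumes predictions and data are positive so logs are defined.)

    # Objective: sum of squared residuals on the log scale,
    # i.e. the data are fit on a log scale (a modeling assumption).
    ssr_logdata <- function(logpars, data) {
      pars <- exp(logpars)              # fitting in log space: back-transform
      pred <- run_model(pars)
      sum((log(pred) - log(data$outcome))^2)
    }

    # Fitting parameters in log space keeps them positive and can help when
    # plausible values span several orders of magnitude.
    startpars <- c(b = 1e-5, g = 1)     # example starting values
    fit <- optim(par = log(startpars), fn = ssr_logdata, data = mydata)
    bestpars <- exp(fit$par)            # report on the natural (linear) scale

Fitting in linear space would simply drop the log()/exp() transforms on the parameters; the choice of data scale inside the objective function would stay the same.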

Bootstrapping

The main new feature of this app is the inclusion of bootstrapping, a sampling process that allows one to obtain confidence intervals for the estimated parameters. The basic approach goes like this:

  • Resample your data (with replacement) to get a new dataset as large as the original one. For instance, if you had 20 data points in the original dataset, you’ll again have 20 data points, but now some of them might occur more than once and others might be missing. This mimics the idea that if you had repeated the experiment, you might have gotten slightly different data.
  • Fit your model to this ‘new’ dataset you obtained through sampling. Record the best-fit estimates for your parameters.
  • Repeat this ‘sample, then fit’ procedure many times (generally 100s or more) to obtain distributions for your parameter estimates.
  • From these distributions, extract e.g. 95% confidence intervals (see the code sketch after this list).
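
The steps above translate almost directly into code. Here is a minimal R sketch, with mydata and fit_model() as hypothetical stand-ins for the app's internals; fit_model(data) is assumed to return a named vector of best-fit parameter estimates (e.g. from an optim() call like the one sketched earlier).

    # Minimal sketch of the 'sample, then fit' bootstrap loop.
    set.seed(123)                  # sampling is random, so set a seed
    nboot <- 100                   # bootstrap samples; use 100s-1000s in practice

    boot_estimates <- matrix(NA, nrow = nboot, ncol = 2,
                             dimnames = list(NULL, c("b", "g")))

    for (i in 1:nboot) {
      # resample rows with replacement; new dataset is as large as the original
      rows <- sample(nrow(mydata), size = nrow(mydata), replace = TRUE)
      bootdata <- mydata[rows, ]
      # fit the model to the resampled data and record the best-fit estimates
      boot_estimates[i, ] <- fit_model(bootdata)
    }

    # extract 95% confidence intervals from the bootstrap distributions
    apply(boot_estimates, 2, quantile, probs = c(0.025, 0.975))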

Bootstrapping is conceptually easy to understand and also fairly easy to implement. The big drawback is run time: instead of fitting your model (until it converges) once, you now have to do it many times. That can take time and usually requires some mix of fast computers, parallelization, good/quick optimizers, and simple models. For our teaching purposes, we’ll use a simple model, only a few bootstrap samples, and a limited number of iterations per fit.
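
Since the individual bootstrap fits are independent of each other, they parallelize naturally. As one illustration (not something this app does), the loop above could be spread across cores with R's parallel package:

    # Parallel variant of the bootstrap loop above, reusing the hypothetical
    # mydata, fit_model() and nboot from the previous sketch.
    library(parallel)

    boot_fit_one <- function(i) {
      rows <- sample(nrow(mydata), size = nrow(mydata), replace = TRUE)
      fit_model(mydata[rows, ])
    }

    # forks across cores on Unix-alikes (use parLapply on Windows);
    # for reproducibility, call RNGkind("L'Ecuyer-CMRG") and set a seed first
    boot_list <- mclapply(seq_len(nboot), boot_fit_one, mc.cores = 4)
    boot_estimates <- do.call(rbind, boot_list)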

What to do

The fit is run multiple times. First, the model is fit to the actual data. Then, the model is fit once per bootstrap sample to the dataset created by the resampling process. Note that the defaults for both the number of fitting iterations and the number of bootstrap samples are very low. Usually we would want to take as many steps as needed until we find the best fit (which could be a lot), and we generally also want 100s or 1000s of bootstrap samples. Unfortunately, that would take a very long time, so for the purposes of this teaching app we’ll keep both the iterations and the bootstrap samples at values lower than you should use in a ‘real world’ situation.

Task 1

  • Since bootstrapping involves sampling, different random number seeds will lead to different results. Try running the simulation with 2 different seeds. Note that the best-fit value does not change. This could be either because we let the solver run long enough to find the best fit (not the case here) or because the underlying solver is deterministic, which is the case here. Some optimizers have a random component; for those, setting a seed also matters.

Task 2

  • If the speed of your computer allows, increase the number of bootstrap samples to 10, 50, or more. Re-run with different random number seeds. As you take more samples, the impact of the random sampling should diminish, and eventually the results should be robust enough that you get essentially the same confidence intervals independent of the random number seed.

Task 3

  • Repeat the tasks above and explore how fitting the parameters in log space vs. linear space changes things. In theory, you should get the same results, since the underlying problem is the same. However, changing the scale can change the performance of the underlying solver. If we waited until the solver converged (found the optimal solution), the result should be the same, but the time to get there might differ. Here, since we only take a few iteration steps, the actual result might change.

Task 4

  • Revisit the above tasks and see how the results change if you increase the number of iterations to whatever runs reasonably fast on your computer.

Note that the run time scales with the product of the number of iterations and the number of bootstrap samples, so if you set both high, things will get really slow.

Further Information

The bootstrap approach is described in detail in Efron and Tibshirani (1993), listed below.

References

Efron, Bradley, and Robert Tibshirani. 1993. An Introduction to the Bootstrap. New York: Chapman & Hall.