This app lets you explore the concepts of uncertainty and sensitivity analysis. For this purpose, we use the basic bacteria infection model introduced in the app of that name. If you haven’t done so yet, work through that app first to familiarize yourself with the model.
The model is the continuous time model in the ‘basic bacteria model’ app. See the documentation there for the model description. For convenience, here is a quick summary and the equations again.
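We model 2 compartments: the bacteria, B, and the immune response, I.
We specify the following processes/flows: bacteria grow at rate g, with growth saturating as they approach the carrying capacity Bmax; bacteria die at rate dB; bacteria are killed by the immune response at rate k; the immune response grows in proportion to both the bacteria and itself at rate r; and the immune response decays at rate dI.
The equations are given by: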
\[\dot B = g B (1-\frac{B}{B_{max}}) - d_B B - k BI\] \[\dot I = r B I - d_I I\]
Often, for a given system we want to model, we only have rough estimates for the model parameters and starting values. Instead of specifying fixed values (which results in a single time-series), we can specify parameter ranges, choose sets of parameter values from these ranges, and run the model for multiple sets of parameters.
The simplest way of specifying parameter ranges is to set an upper and lower bound (based on what we know about the biology of the system) and randomly choose any value within those bounds. We can almost always set bounds even if we know very little about a system. Assume we want to model the death rate of some cell type (e.g. NK cells) in humans. We might not know anything, but we can still be fairly confident that their lifespan is at least 1 second and less than 100 years. That’s of course a wide range and we should and usually can narrow ranges further, based on biological knowledge of a given system.
If we are fairly certain that values are close to some quantity, instead of specifying a uniform distribution, we can choose one that is more peaked around the most likely value. Normal distributions are not ideal since they allow negative values, which doesn’t make sense for our parameters. The gamma distribution is a better idea, since it leads to only positive values.
To run the model for this app, we need to specify values for the 2 initial conditions, B0 and I0, and the 6 model parameters g, Bmax, dB, k, r, dI. All initial conditions and parameters are sampled uniformly between the specified lower and upper bounds, apart from the bacteria growth rate, which is drawn from a gamma distribution with user-specified mean and variance. For this teaching app, there is no biological reason for treating bacterial growth differently; I just picked one parameter and made it non-uniformly distributed to show you different ways one can implement distributions from which to draw parameter samples.
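To make the "mean and variance" specification concrete: a gamma distribution with shape a and scale s has mean a·s and variance a·s², so shape = mean²/variance and scale = variance/mean. The following is a minimal Python sketch of that conversion using only the standard library. It is not the app's own R code, and the function names, the seed, and the example values (mean 2, variance 0.5) are assumptions made for illustration.

```python
import random


def gamma_shape_scale(mean, variance):
    """Convert a mean and variance into gamma shape and scale parameters."""
    if mean <= 0 or variance <= 0:
        raise ValueError("mean and variance must be positive")
    shape = mean ** 2 / variance
    scale = variance / mean
    return shape, scale


def sample_growth_rate(mean, variance, n, seed=100):
    """Draw n gamma-distributed values for a parameter such as g."""
    rng = random.Random(seed)  # local generator so the draw is reproducible
    shape, scale = gamma_shape_scale(mean, variance)
    # random.gammavariate(alpha, beta) uses alpha = shape and beta = scale
    return [rng.gammavariate(shape, scale) for _ in range(n)]


# Hypothetical values, chosen only for illustration
g_samples = sample_growth_rate(mean=2.0, variance=0.5, n=5000)
print(min(g_samples) > 0)               # gamma draws are always positive
print(sum(g_samples) / len(g_samples))  # should be close to the requested mean
```

A smaller variance for the same mean gives a larger shape parameter and therefore a distribution that is more tightly peaked around the mean.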
The way the samples are drawn could be done completely randomly, but that would lead to inefficient sampling. A smarter method exists, known as Latin Hypercube sampling (LHS). It essentially ensures that we sample the full range of possible parameter combinations in an efficient manner. For more technical details, see e.g. (Saltelli et al. 2004). For this app, we use LHS.
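The idea behind LHS can be sketched in a few lines: for each parameter, split its range into as many equal-width strata as there are samples, draw one value inside each stratum, and then shuffle the order so that strata are paired randomly across parameters. The Python sketch below illustrates this for uniform ranges only. It is a simplified illustration, not the routine the app uses, and the function name, parameter bounds, and seed are hypothetical.

```python
import random


def latin_hypercube(n_samples, bounds, seed=100):
    """Return n_samples parameter sets; bounds maps name -> (lower, upper)."""
    rng = random.Random(seed)
    columns = {}
    for name, (lower, upper) in bounds.items():
        # One random point inside each of n_samples equal-width strata of [0, 1)
        unit = [(i + rng.random()) / n_samples for i in range(n_samples)]
        # Shuffle so strata are combined randomly across parameters
        rng.shuffle(unit)
        columns[name] = [lower + u * (upper - lower) for u in unit]
    return [{name: columns[name][i] for name in bounds} for i in range(n_samples)]


# Hypothetical bounds for two of the model parameters
bounds = {"dB": (0.0, 1.0), "k": (1e-7, 1e-5)}
for sample in latin_hypercube(5, bounds, seed=1):
    print(sample)
```

With purely random sampling, several draws can land close together and leave parts of a range empty; here every stratum of every parameter is visited exactly once.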
Once we specify the ranges for each parameter, the sampling method, and the number of samples, the simulation draws that many samples, runs the model for each sample, and records outcomes of interest. While the underlying simulation returns a time-series for each sample, we are usually not interested in the full time-series. Instead, we are interested in some summary quantity. For instance in this model, we might be interested in the maximum/peak level of bacteria during the infection, the level of bacteria at the end (the steady state) of the infection, and the level of the immune response at steady state. This app records and reports those 3 quantities as Bpeak, Bsteady and Isteady.
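The loop described above (run the model once per sample, keep only summary quantities) can be sketched as follows. This Python sketch integrates the two model equations with a simple fixed-step Runge-Kutta scheme, which is an assumption made to keep the example self-contained; the app itself relies on an R differential equation solver. All parameter values are hypothetical, and taking the final time point as the "steady" value is only valid if the simulation has run long enough.

```python
def simulate_bacteria(B0, I0, g, Bmax, dB, k, r, dI, tmax=100.0, dt=0.01):
    """Integrate the two model equations with a fixed-step RK4 scheme."""

    def rhs(B, I):
        dBdt = g * B * (1 - B / Bmax) - dB * B - k * B * I
        dIdt = r * B * I - dI * I
        return dBdt, dIdt

    B, I = float(B0), float(I0)
    B_series, I_series = [B], [I]
    for _ in range(int(round(tmax / dt))):
        a1, b1 = rhs(B, I)
        a2, b2 = rhs(B + 0.5 * dt * a1, I + 0.5 * dt * b1)
        a3, b3 = rhs(B + 0.5 * dt * a2, I + 0.5 * dt * b2)
        a4, b4 = rhs(B + dt * a3, I + dt * b3)
        B += dt * (a1 + 2 * a2 + 2 * a3 + a4) / 6
        I += dt * (b1 + 2 * b2 + 2 * b3 + b4) / 6
        B_series.append(B)
        I_series.append(I)
    return B_series, I_series


def summarize_run(B_series, I_series):
    """Reduce a full time-series to the three outcomes of interest."""
    return {
        "Bpeak": max(B_series),      # highest bacteria level during the run
        "Bsteady": B_series[-1],     # final value, taken as the steady state
        "Isteady": I_series[-1],
    }


# Hypothetical parameter samples; in practice these come from the sampling step
parameter_samples = [
    dict(B0=100, I0=10, g=1.0, Bmax=1e5, dB=0.1, k=1e-4, r=4e-5, dI=2.0),
    dict(B0=100, I0=10, g=1.2, Bmax=1e5, dB=0.2, k=1e-4, r=5e-5, dI=2.0),
]
outcomes = [summarize_run(*simulate_bacteria(**p)) for p in parameter_samples]
for row in outcomes:
    print(row)
```

Each row of `outcomes` pairs with one parameter sample, which is the table that both the uncertainty and the sensitivity analysis start from.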
Results from such simulations for multiple samples can be analyzed in different ways. The most basic one, called uncertainty analysis, only asks what level of uncertainty we have in our outcomes of interest, given the amount of uncertainty in our model parameter values. This can be represented graphically with a boxplot, which is one of the plot options for this app.
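A boxplot is essentially a picture of a five-number summary (minimum, lower quartile, median, upper quartile, maximum) of an outcome across all samples. As a rough illustration, the Python sketch below computes that summary for one outcome. The input values are made up for the example, and note that different tools use slightly different quartile conventions, so the numbers may not match a given plotting package exactly.

```python
import statistics


def five_number_summary(values):
    """Summarize the spread of one outcome across parameter samples."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


# Hypothetical Bpeak values from nine parameter samples
bpeak_values = [8.1e4, 9.0e4, 7.4e4, 8.8e4, 6.9e4, 9.3e4, 8.5e4, 7.9e4, 8.2e4]
print(five_number_summary(bpeak_values))
```

A wide gap between the quartiles means the parameter uncertainty translates into large uncertainty in that outcome.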
In a next step, we can ask ‘how sensitive are the outcomes of interest to variation in specific parameters?’ That part is the sensitivity analysis. When you run the simulations, you essentially do both uncertainty and sensitivity analysis at the same time; the difference lies in how you further process the results. We can graphically inspect the relation between an outcome and a parameter with scatterplots. If the relation looks monotone, that is, the outcome consistently increases or consistently decreases as the parameter increases, we can also summarize it with a correlation coefficient; a coefficient near zero indicates no monotone trend. For this type of analysis, the Spearman rank correlation coefficient is useful, and that is what the app reports below the figures.
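The Spearman coefficient is the ordinary (Pearson) correlation computed on the ranks of the values instead of the values themselves, which is why it picks up any monotone relation, not just a linear one. The Python sketch below implements it from scratch with the standard library, purely for illustration. The app computes it in R, and the example data are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # average 1-based rank of the tie group
        for j in range(pos, end + 1):
            ranks[order[j]] = shared
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical parameter values and a nonlinear but monotone outcome
param = [0.1, 0.4, 0.5, 0.9, 1.3, 2.0]
outcome = [p ** 3 for p in param]
print(spearman(param, outcome))  # monotone increasing relation
print(pearson(param, outcome))   # linear correlation, for comparison
```

Values near +1 or -1 indicate a strong monotone increase or decrease of the outcome with the parameter, and values near 0 indicate no monotone trend.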
This simulation (as well as some of the others) involves sampling. This leads to some level of randomness. In science, we want to be as reproducible as possible. Fortunately, random numbers on a computer are not completely random, but can be reproduced. In practice, this is done by specifying a random number seed, in essence a starting position for the algorithm to produce pseudo-random numbers. As long as the seed is the same, the code should produce the same pseudo-random numbers each time, thus ensuring reproducibility.
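The effect of a seed is easy to demonstrate. In the Python sketch below, two draws with the same seed give identical numbers, while a different seed gives a different set. The function name, bounds, and seed values are arbitrary choices for the illustration; in R the equivalent mechanism is `set.seed()`.

```python
import random


def draw_uniform(lower, upper, n, seed):
    """Draw n uniform values using a generator started from the given seed."""
    rng = random.Random(seed)
    return [rng.uniform(lower, upper) for _ in range(n)]


first = draw_uniform(0.1, 1.0, 5, seed=100)
second = draw_uniform(0.1, 1.0, 5, seed=100)  # same seed as above
other = draw_uniform(0.1, 1.0, 5, seed=101)   # different seed

print(first == second)  # compares the two same-seed draws
print(first == other)   # compares draws from different seeds
```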
First, familiarize yourself with the setup of the app, which looks different from most others. Parameters are not set to specific values. Instead, most parameters have a lower and upper bound. For each simulation that is run, random values for the parameter are chosen uniformly between those bounds. The parameter g has a gamma distribution instead of a uniform one; you specify its mean and variance to determine the distribution from which values are sampled.
The default outcome plots are boxplots, which show the distribution of the 3 outcomes of interest for the different parameter samples. You can set the number of samples you want to run. Samples are constructed using the Latin hypercube method to efficiently span the space of possible parameter values. In general, more samples are better, but of course they take longer to run.
Since the creation of parameter samples involves an element of randomness, we need to make use of random numbers. We still want results to be reproducible. That’s where the random number seed comes in. As long as the seed is the same, the code should produce the same pseudo-random numbers each time, thus ensuring reproducibility. Let’s explore this.
Note that each sample means one simulation of the underlying dynamical model, so as sample numbers increase, things slow down. Also note the ‘system might not have reached steady state’ message. If for too many of the samples steady state has not been reached, the results for Bsteady and Isteady are not correct. In that case you need to increase the simulation time to allow the system to settle into steady state. For some parameter combinations, that can take very long.
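One simple way to judge whether a run has settled is to check that the outcome barely changes over the last part of the time-series. The Python sketch below shows such a check. It is a hypothetical criterion written for illustration, not the test the app applies, and the window size and tolerance are arbitrary assumptions.

```python
def reached_steady_state(series, window=0.1, rel_tol=1e-3):
    """Return True if the last `window` fraction of the series stays within
    a relative tolerance of the final value."""
    n_tail = max(2, int(len(series) * window))
    tail = series[-n_tail:]
    final = tail[-1]
    scale = max(abs(final), 1e-12)  # avoid dividing by zero
    largest_change = max(abs(v - final) for v in tail)
    return largest_change / scale < rel_tol


# Hypothetical series that relaxes toward 1: a long run and a short run
long_run = [1 + 0.9 ** t for t in range(200)]
short_run = [1 + 0.9 ** t for t in range(20)]
print(reached_steady_state(long_run))
print(reached_steady_state(short_run))
```

If the check fails for many samples, the remedy is the one described above: increase the simulation time and run again.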
For some parameter samples, you might see warning or error messages in the R console. That generally means that the parameters for a given simulation are such that the differential equation solver can’t properly run the model. That usually corresponds to biologically unrealistic parameter settings. We’ll ignore them here, but in a research project you would have to figure out why such warnings or errors occur, and only once you fully understand the cause might it be ok to ignore them.
The above approach of exploring the impact of a parameter on results by varying bounds is tedious. Also, we often have bounds that are specified by biology and are not ours to change. It would still be useful to know how a given parameter impacts the results. This is where sensitivity analysis comes in. We run the same simulations, but now, instead of plotting outcomes as a boxplot, we produce scatterplots of outcomes as a function of each varied parameter.
Since our model is rather simple, we can actually determine relations between parameters and some of the outcomes analytically. Specifically, it is possible to compute the steady state values for B and I, Bsteady and Isteady. If you don’t know what steady states are and how to compute them, go through the “Bacterium Model Exploration” app, where this is explained.
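As a brief reminder of how that calculation goes for this model, assume the steady state of interest is the one where both bacteria and immune response are present (B and I both non-zero). Setting the right-hand side of the equation for I to zero and dividing by I gives the bacteria level, and substituting that into the equation for B (after dividing by B) gives the immune response level:

\[B_{steady} = \frac{d_I}{r}\] \[I_{steady} = \frac{1}{k}\left(g\left(1-\frac{d_I}{r B_{max}}\right) - d_B\right)\]

These expressions only describe a biologically meaningful state when the value for Isteady is positive. They also suggest what to look for in the scatterplots: according to the first expression, Bsteady depends only on dI and r, so once the simulations have reached steady state it should show no trend with the other parameters.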
The important take-home message from this task is that the influence of a parameter on some outcome can differ over different ranges. For instance, in the range A-B the parameter might have a major influence, but once the parameter value goes above B, it no longer influences the result. If you have large uncertainty in your parameters, it might be worth considering both the full range and smaller sub-ranges to see how the parameter’s influence changes.
The underlying function that runs the simulation for this app is simulate_usanalysis. That function repeatedly calls simulate_basicbacteria_ode. Use the help() command for more information on how to use the functions directly. If you go that route, you need to use the results returned from this function and produce useful output (such as a plot) yourself. To read the package vignette, type vignette('DSAIRM') into the R console.