---
title: "Choosing a Sobol Estimator in Sobol4R"
shorttitle: "Choosing a Sobol Estimator"
author:
- name: "Frédéric Bertrand"
  affiliation:
  - Cedric, Cnam, Paris
  email: frederic.bertrand@lecnam.net
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    toc: true
vignette: >
  %\VignetteIndexEntry{Choosing a Sobol Estimator in Sobol4R}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "figures/sobol-stochastic-",
  fig.width = 7,
  fig.height = 5,
  dpi = 150,
  message = FALSE,
  warning = FALSE,
  eval = FALSE
)
LOCAL <- identical(Sys.getenv("LOCAL"), "TRUE")
library(Sobol4R)
library(sensitivity)
set.seed(4669)
```

## Introduction

Sobol4R provides a unified interface for global sensitivity analysis of deterministic and stochastic simulators. Several Monte Carlo estimators are available. They correspond to the classical helpers in the \CRANpkg{sensitivity} package, but are accessed through the single function `sobol_indices()`.

This vignette explains:

* which Sobol estimators are implemented in Sobol4R,
* how they differ in terms of numerical properties,
* why Jansen is used as the default in `sobol_indices()`,
* how to work with both deterministic and stochastic models.

The goal is to make the choice of estimator explicit and reproducible, while remaining compatible with existing workflows based on \CRANpkg{sensitivity}.

## Supported estimators

Sobol4R mirrors the main Monte Carlo estimators from the \CRANpkg{sensitivity} package:

* `sobol` and `sobol2007` for classical Saltelli-type estimators,
* `soboljansen` for the Jansen variance-of-differences estimator,
* `sobolEff` for efficient radial sampling,
* `sobolmartinez` for Martinez correlation-based estimators.

All of these are exposed through the `estimator` argument of `sobol_indices()`.
For example:

```{r}
sobol_indices(
  model = ishigami_model,
  design = sobol_design(n = 512, d = 3, quasi = TRUE),
  estimator = "jansen", # default
  replicates = 1L
)
```

In addition, you can still construct designs and call the estimators directly from \CRANpkg{sensitivity}. Sobol4R is designed to work well with both approaches.

## Two complementary analysis paths

Sobol4R exposes two ways to compute global sensitivity indices, depending on your workflow and the level of control you require.

* **Reuse the estimators from the sensitivity package.** You can generate designs with Sobol4R (or your own routines) and pass the matrices directly to `sensitivity::sobol()`, `sensitivity::sobol2007()`, `sensitivity::soboljansen()`, `sensitivity::sobolEff()`, or `sensitivity::sobolmartinez()`. Sobol4R provides `autoplot()` methods that visualise these objects without altering your existing code.
* **Use the in-package estimators with built-in Jansen support.** The streamlined `sobol_design()` and `sobol_indices()` helpers generate the Saltelli-type matrices, evaluate the model (including replicated runs for stochastic simulators), and return a unified `sobol_result` object. Sobol4R implements several estimators internally, including Jansen, Martinez and Saltelli. The **default estimator is Jansen**, chosen for its numerical robustness and stable behaviour with non-centred or noisy outputs. Results can be summarised or plotted directly and may include bootstrap-like quantiles when analysing stochastic simulators.

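
The first path can be sketched as follows. This is a minimal illustration, not a prescribed workflow: it assumes two independent input samples drawn with plain `runif()` on $[-\pi, \pi]$ (rather than a Sobol4R design helper, whose return structure is not described here), and it uses the Ishigami test function `ishigami.fun()` shipped with the sensitivity package. The sample size `n` is arbitrary.

```{r}
library(sensitivity)

set.seed(4669)
n <- 2000  # arbitrary sample size chosen for illustration

# Two independent input samples; the Ishigami function takes three inputs,
# conventionally uniform on [-pi, pi]
X1 <- data.frame(matrix(runif(3 * n, -pi, pi), nrow = n))
X2 <- data.frame(matrix(runif(3 * n, -pi, pi), nrow = n))

# Jansen estimator called directly from the sensitivity package,
# without bootstrap replicates
fit <- soboljansen(model = ishigami.fun, X1 = X1, X2 = X2, nboot = 0)

fit$S  # first-order indices
fit$T  # total-order indices
```

According to the description above, an object such as `fit` can then be passed to the `autoplot()` methods provided by Sobol4R; that call is left out here because its arguments are not described in this vignette.
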

## Comparison of estimators

| Estimator | Implemented in | First-order formula type | Total-order formula type | Numerical stability | Sensitivity to non-centred outputs | Works with stochastic simulators | Pros | Cons | Recommended usage |
|----------|----------------|--------------------------|--------------------------|---------------------|------------------------------------|----------------------------------|------|------|-------------------|
| **Jansen** (`soboljansen`) | sensitivity, Sobol4R (default) | Variance of differences | Variance of differences | **High** | **Robust** | Yes (with replicates) | Very stable, low bias, well suited to noisy or non-centred outputs, simple interpretation | Slightly higher variance than Martinez in some settings | **Default choice** for deterministic and stochastic models |
| **Saltelli 2002** (`sobol`) | sensitivity, Sobol4R | Covariance based | Variance of differences | Low to medium | Requires centring (otherwise biased) | Not ideal unless centring is enforced | Classical, widely cited, matches early literature | Strongly biased without centring, unstable when Y has a large mean | Legacy compatibility with early Saltelli-style analyses |
| **Saltelli 2007** (`sobol2007`) | sensitivity, Sobol4R | Improved covariance based | Variance of differences | Medium | Requires centring (built-in centring possible) | Yes, but care is needed | More stable than the 2002 version, matches `sensitivity::sobol2007` | Still less stable than Jansen or Martinez | When strict compatibility with Saltelli 2007 is needed |
| **Martinez** (`sobolmartinez`) | sensitivity, Sobol4R | Correlation based | Correlation based | **Very high** | **Very robust** | Yes | Low variance, stable, handles nonlinearities and interactions well | Slightly more complex to explain | Excellent alternative to Jansen, suited to strongly nonlinear models |
| **SobolEff** (`sobolEff`) | sensitivity, Sobol4R | Efficient radial sampling | Same as first order | **High** | Robust | Yes | Fewer model evaluations for a given precision, good for expensive models | Requires structured design, less standard in introductory texts | Efficient Sobol analysis when model evaluations are costly |

## Recommended default

Sobol4R uses the **Jansen estimator** as the default because:

* it is numerically stable across a wide range of models,
* it behaves well on non-centred outputs,
* it integrates naturally with stochastic simulators through the replication mechanism,
* its variance-of-differences formulation avoids the conditioning issues that affect classical Saltelli estimators.

Alternative estimators such as **Martinez** or **SobolEff** remain available for users who require advanced properties. The `sobol` and `sobol2007` variants ensure compatibility with historical analyses and existing code bases.

## Practical guidance

In practice, the following rules of thumb are useful.

* Start with **Jansen** for both deterministic and stochastic models.
* Use **Martinez** as an alternative when you want very stable indices and are comfortable with correlation-based formulas.
* Use **SobolEff** if each model evaluation is expensive and you want to reduce the number of simulator runs.
* Only use `sobol` or `sobol2007` when you must reproduce legacy analyses or benchmark against published Saltelli-style results.
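
The robustness of a variance-of-differences formulation to non-centred outputs can be checked with a small base R sketch. It is a hand-written illustration under stated assumptions, not the code used inside Sobol4R or sensitivity. The helpers `ishigami()`, `jansen_indices()` and `naive_first()` are hypothetical names introduced here. `naive_first()` is a deliberately simple product-based estimator without centring, written only for contrast; it is not claimed to be the formula behind the `sobol` or `sobol2007` options. The offset of 1000 and the sample size are arbitrary.

```{r}
# Ishigami test function with an optional constant offset added to the output
ishigami <- function(X, offset = 0) {
  offset + sin(X[, 1]) + 7 * sin(X[, 2])^2 + 0.1 * X[, 3]^4 * sin(X[, 1])
}

# Jansen-type estimators built from squared differences of model outputs
jansen_indices <- function(f, A, B, ...) {
  yA <- f(A, ...)
  yB <- f(B, ...)
  vY <- var(c(yA, yB))
  d <- ncol(A)
  first <- total <- numeric(d)
  for (i in seq_len(d)) {
    ABi <- A
    ABi[, i] <- B[, i]  # A with column i taken from B
    yABi <- f(ABi, ...)
    first[i] <- 1 - mean((yB - yABi)^2) / (2 * vY)
    total[i] <- mean((yA - yABi)^2) / (2 * vY)
  }
  list(first = first, total = total)
}

# Product-based first-order estimator without centring, for contrast only
naive_first <- function(f, A, B, ...) {
  yA <- f(A, ...)
  yB <- f(B, ...)
  vY <- var(c(yA, yB))
  sapply(seq_len(ncol(A)), function(i) {
    ABi <- A
    ABi[, i] <- B[, i]
    (mean(yB * f(ABi, ...)) - mean(yA)^2) / vY
  })
}

set.seed(4669)
n <- 4000
A <- matrix(runif(3 * n, -pi, pi), nrow = n)
B <- matrix(runif(3 * n, -pi, pi), nrow = n)

jansen_centred <- jansen_indices(ishigami, A, B)
jansen_shifted <- jansen_indices(ishigami, A, B, offset = 1000)
naive_shifted  <- naive_first(ishigami, A, B, offset = 1000)

round(rbind(
  jansen_centred = jansen_centred$first,
  jansen_shifted = jansen_shifted$first,
  naive_shifted  = naive_shifted
), 3)
```

Because the Jansen formulas only involve differences of outputs and the sample variance, adding a constant to the model leaves the estimates unchanged up to floating-point rounding. The product-based estimate subtracts two quantities of the order of the squared offset, so with a large mean it is expected to be dominated by sampling noise.
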