Parameterization of Response Distributions in brms

The purpose of this vignette is to discuss the parameterizations of the families (i.e., response distributions) used in brms. For a more general overview of the package see vignette("brms_overview").

Notation

Throughout this vignette, we denote values of the response variable as \(y\), a density function as \(f\), and use \(\mu\) to refer to the main model parameter, which is usually the mean of the response distribution or some closely related quantity. In a regression framework, \(\mu\) is not estimated directly but computed as \(\mu = g(\eta)\), where \(\eta\) is a predictor term (see help(brmsformula) for details) and \(g\) is the response function (i.e., inverse of the link function).

Location shift models

The density of the gaussian family is given by \[ f(y) = \frac{1}{\sqrt{2\pi}\sigma} \exp\left(-\frac{1}{2}\left(\frac{y - \mu}{\sigma}\right)^2\right) \]

where \(\sigma\) is the residual standard deviation. The density of the student family is given by \[ f(y) = \frac{\Gamma((\nu + 1)/2)}{\Gamma(\nu/2)} \frac{1}{\sqrt{\nu\pi}\sigma}\left(1 + \frac{1}{\nu} \left(\frac{y - \mu}{\sigma}\right)^2\right)^{-(\nu+1)/2} \]

\(\Gamma\) denotes the gamma function and \(\nu > 1\) are the degrees of freedom. As \(\nu \rightarrow \infty\), the student distribution becomes the gaussian distribution. The density of the skew_normal family is given by \[ f(y) = \frac{1}{\sqrt{2\pi}\omega} \exp\left(-\frac{1}{2} \left(\frac{y - \xi}{\omega}\right)^2 \right) \left(1 + \text{erf} \left( \alpha \left(\frac{y - \xi}{\omega \sqrt{2}} \right) \right) \right) \]

where \(\xi\) is the location parameter, \(\omega\) is the positive scale parameter, \(\alpha\) the skewness parameter, and \(\text{erf}\) denotes the error function of the gaussian distribution. To parameterize the skew-normal distribution in terms of the mean \(\mu\) and standard deviation \(\sigma\), \(\omega\) and \(\xi\) are computed as \[ \omega = \frac{\sigma}{\sqrt{1 - \frac{2}{\pi} \frac{\alpha^2}{1 + \alpha^2}}} \]

\[ \xi = \mu - \omega \frac{\alpha}{\sqrt{1 + \alpha^2}} \sqrt{\frac{2}{\pi}} \]

If \(\alpha = 0\), the skew-normal distribution becomes the gaussian distribution. For location shift models, \(y\) can be any real value.

Binary and count data models

The density of the binomial family is given by \[ f(y) = {N \choose y} \mu^{y} (1-\mu)^{N - y} \] where \(N\) is the number of trials and \(y \in \{0, ... , N\}\). When all \(N\) are \(1\) (i.e., \(y \in \{0,1\}\)), the bernoulli distribution for binary data arises.

For \(y \in \mathbb{N}_0\), the density of the poisson family is given by \[ f(y) = \frac{\mu^{y}}{y!} \exp(-\mu) \] The density of the negbinomial (negative binomial) family is \[ f(y) = {y + \phi - 1 \choose y} \left(\frac{\mu}{\mu + \phi}\right)^{y} \left(\frac{\phi}{\mu + \phi}\right)^\phi \] where \(\phi\) is a positive precision parameter. For \(\phi \rightarrow \infty\), the negative binomial distribution becomes the poisson distribution. The density of the geometric family arises if \(\phi\) is set to \(1\).

Time-to-event models

With time-to-event models we mean all models that are defined on the positive reals only, that is \(y \in \mathbb{R}^+\). The density of the lognormal family is given by \[ f(y) = \frac{1}{\sqrt{2\pi}\sigma y} \exp\left(-\frac{1}{2}\left(\frac{\log(y) - \mu}{\sigma}\right)^2\right) \] where \(\sigma\) is the residual standard deviation on the log-scale. The density of the Gamma family is given by \[ f(y) = \frac{(\alpha / \mu)^\alpha}{\Gamma(\alpha)} y^{\alpha-1} \exp\left(-\frac{\alpha y}{\mu}\right) \] where \(\alpha\) is a positive shape parameter. The density of the weibull family is given by \[ f(y) = \frac{\alpha}{s} \left(\frac{y}{s}\right)^{\alpha-1} \exp\left(-\left(\frac{y}{s}\right)^\alpha\right) \] where \(\alpha\) is again a positive shape parameter and \(s = \mu / \Gamma(1 + 1 / \alpha)\) is the scale parameter to that \(\mu\) is the mean of the distribution. The exponential family arises if \(\alpha\) is set to \(1\) for either the gamma or Weibull distribution. The density of the inverse.gaussian family is given by \[ f(y) = \left(\frac{\alpha}{2 \pi y^3}\right)^{1/2} \exp \left(\frac{-\alpha (y - \mu)^2}{2 \mu^2 y} \right) \] where \(\alpha\) is a positive shape parameter. The cox family implements Cox proportional hazards model which assumes a hazard function of the form \(h(y) = h_0(y) \mu\) with baseline hazard \(h_0(y)\) expressed via M-splines (which integrate to I-splines) in order to ensure monotonicity. The density of the cox model is then given by \[ f(y) = h(y) S(y) \] where \(S(y)\) is the survival function implied by \(h(y)\).

Extreme value models

Modeling extremes requires special distributions. One may use the weibull distribution (see above) or the frechet distribution with density \[ f(y) = \frac{\nu}{s} \left(\frac{y}{s}\right)^{-1-\nu} \exp\left(-\left(\frac{y}{s}\right)^{-\nu}\right) \] where \(s = \mu / \Gamma(1 - 1 / \nu)\) is a positive scale parameter and \(\nu > 1\) is a shape parameter so that \(\mu\) predicts the mean of the Frechet distribution. A generalization of both distributions is the generalized extreme value distribution (family gen_extreme_value) with density \[ f(y) = \frac{1}{\sigma} t(y)^{\xi + 1} \exp(-t(y)) \] where \[ t(y) = \left(1 + \xi \left(\frac{y - \mu}{\sigma} \right)\right)^{-1 / \xi} \] with positive scale parameter \(\sigma\) and shape parameter \(\xi\).

Response time models

One family that is especially suited to model reaction times is the exgaussian (‘exponentially modified Gaussian’) family. Its density is given by

\[ f(y) = \frac{1}{2 \beta} \exp\left(\frac{1}{2 \beta} \left(2\xi + \sigma^2 / \beta - 2 y \right) \right) \text{erfc}\left(\frac{\xi + \sigma^2 / \beta - y}{\sqrt{2} \sigma} \right) \] where \(\beta\) is the scale (inverse rate) of the exponential component, \(\xi\) is the mean of the Gaussian component, \(\sigma\) is the standard deviation of the Gaussian component, and \(\text{erfc}\) is the complementary error function. We parameterize \(\mu = \xi + \beta\) so that the main predictor term equals the mean of the distribution.

Another family well suited for modeling response times is the shifted_lognormal distribution. It’s density equals that of the lognormal distribution except that the whole distribution is shifted to the right by a positive parameter called ndt (for consistency with the wiener diffusion model explained below).

A family concerned with the combined modeling of reaction times and corresponding binary responses is the wiener diffusion model. It has four model parameters each with a natural interpretation. The parameter \(\alpha > 0\) describes the separation between two boundaries of the diffusion process, \(\tau > 0\) describes the non-decision time (e.g., due to image or motor processing), \(\beta \in [0, 1]\) describes the initial bias in favor of the upper alternative, and \(\delta \in \mathbb{R}\) describes the drift rate to the boundaries (a positive value indicates a drift towards to upper boundary). The density for the reaction time at the upper boundary is given by

\[ f(y) = \frac{\alpha}{(y-\tau)^3/2} \exp \! \left(- \delta \alpha \beta - \frac{\delta^2(y-\tau)}{2}\right) \sum_{k = - \infty}^{\infty} (2k + \beta) \phi \! \left(\frac{2k + \alpha \beta}{\sqrt{y - \tau}}\right) \]

where \(\phi(x)\) denotes the standard normal density function. The density at the lower boundary can be obtained by substituting \(1 - \beta\) for \(\beta\) and \(-\delta\) for \(\delta\) in the above equation. In brms the parameters \(\alpha\), \(\tau\), and \(\beta\) are modeled as auxiliary parameters named bs (‘boundary separation’), ndt (‘non-decision time’), and bias respectively, whereas the drift rate \(\delta\) is modeled via the ordinary model formula that is as \(\delta = \mu\).

Quantile regression

Quantile regression is implemented via family asym_laplace (asymmetric Laplace distribution) with density

\[ f(y) = \frac{p (1 - p)}{\sigma} \exp\left(-\rho_p\left(\frac{y - \mu}{\sigma}\right)\right) \] where \(\rho_p\) is given by \(\rho_p(x) = x (p - I_{x < 0})\) and \(I_A\) is the indicator function of set \(A\). The parameter \(\sigma\) is a positive scale parameter and \(p\) is the quantile parameter taking on values in \((0, 1)\). For this distribution, we have \(P(Y < g(\eta)) = p\). Thus, quantile regression can be performed by fixing \(p\) to the quantile to interest.

Probability models

The density of the Beta family for \(y \in (0,1)\) is given by \[ f(y) = \frac{y^{\mu \phi - 1} (1-y)^{(1-\mu) \phi-1}}{B(\mu \phi, (1-\mu) \phi)} \] where \(B\) is the beta function and \(\phi\) is a positive precision parameter. A multivariate generalization of the Beta family is the dirichlet family with density \[ f(y) = \frac{1}{B((\mu_{1}, \ldots, \mu_{K}) \phi)} \prod_{k=1}^K y_{k}^{\mu_{k} \phi - 1}. \] The dirichlet family is implemented with the multivariate logit link function so that \[ \mu_{j} = \frac{\exp(\eta_{j})}{\sum_{k = 1}^{K} \exp(\eta_{k})} \] For reasons of identifiability, \(\eta_{\rm ref}\) is set to \(0\), where \({\rm ref}\) is one of the response categories chosen as reference.

An alternative to the dirichlet family is the logistic_normal family with density \[ f(y) = \frac{1}{\prod_{k=1}^K y_k} \times \text{multivariate_normal}(\tilde{y} \, | \, \mu, \sigma, \Omega) \] where \(\tilde{y}\) is the multivariate logit transformed response \[ \tilde{y} = (\log(y_1 / y_{\rm ref}), \ldots, \log(y_{\rm ref-1} / y_{\rm ref}), \log(y_{\rm ref+1} / y_{\rm ref}), \ldots, \log(y_K / y_{\rm ref})) \] of dimension \(K-1\) (excluding the reference category), which is modeled as multivariate normally distributed with latent mean and standard deviation vectors \(\mu\) and \(\sigma\), as well as correlation matrix \(\Omega\).

Circular models

The density of the von_mises family for \(y \in (-\pi,\pi)\) is given by \[ f(y) = \frac{\exp(\kappa \cos(y - \mu))}{2\pi I_0(\kappa)} \] where \(I_0\) is the modified Bessel function of order 0 and \(\kappa\) is a positive precision parameter.

Ordinal and categorical models

For ordinal and categorical models, \(y\) is one of the categories \(1, ..., K\). The intercepts of ordinal models are called thresholds and are denoted as \(\tau_k\), with \(k \in \{1, ..., K-1\}\), whereas \(\eta\) does not contain a fixed effects intercept. Note that the applied link functions \(h\) are technically distribution functions \(\mathbb{R} \rightarrow [0,1]\). The density of the cumulative family (implementing the most basic ordinal model) is given by \[ f(y) = g(\tau_{y + 1} - \eta) - g(\tau_{y} - \eta) \]

The densities of the sratio (stopping ratio) and cratio (continuation ratio) families are given by \[ f(y) = g(\tau_{y + 1} - \eta) \prod_{k = 1}^{y} (1 - g(\tau_{k} - \eta)) \] and \[ f(y) = (1 - g(\eta - \tau_{y + 1})) \prod_{k = 1}^{y} g(\eta - \tau_{k}) \]

respectively. Note that both families are equivalent for symmetric link functions such as logit or probit. The density of the acat (adjacent category) family is given by \[ f(y) = \frac{\prod_{k=1}^{y} g(\eta - \tau_{k}) \prod_{k=y+1}^K(1-g(\eta - \tau_{k}))}{\sum_{k=0}^K\prod_{j=1}^k g(\eta-\tau_{j}) \prod_{j=k+1}^K(1-g(\eta - \tau_{j}))} \] For the logit link, this can be simplified to \[ f(y) = \frac{\exp \left(\sum_{k=1}^{y} (\eta - \tau_{k}) \right)} {\sum_{k=0}^K \exp\left(\sum_{j=1}^k (\eta - \tau_{j}) \right)} \] The linear predictor \(\eta\) can be generalized to also depend on the category \(k\) for a subset of predictors. This leads to category specific effects (for details on how to specify them see help(brm)). Note that cumulative and sratio models use \(\tau - \eta\), whereas cratio and acat use \(\eta - \tau\). This is done to ensure that larger values of \(\eta\) increase the probability of higher response categories.

The categorical family is currently only implemented with the multivariate logit link function and has density \[ f(y) = \mu_{y} = \frac{\exp(\eta_{y})}{\sum_{k = 1}^{K} \exp(\eta_{k})} \] Note that \(\eta\) does also depend on the category \(k\). For reasons of identifiability, \(\eta_{1}\) is set to \(0\). A generalization of the categorical family to more than one trial is the multinomial family with density \[ f(y) = {N \choose y_{1}, y_{2}, \ldots, y_{K}} \prod_{k=1}^K \mu_{k}^{y_{k}} \] where, for each category, \(\mu_{k}\) is estimated via the multivariate logit link function shown above.

Zero-inflated and hurdle models

Zero-inflated and hurdle families extend existing families by adding special processes for responses that are zero. The density of a zero-inflated family is given by \[ f_z(y) = z + (1 - z) f(0) \quad \text{if } y = 0 \\ f_z(y) = (1 - z) f(y) \quad \text{if } y > 0 \] where \(z\) denotes the zero-inflation probability. Currently implemented families are zero_inflated_poisson, zero_inflated_binomial, zero_inflated_negbinomial, and zero_inflated_beta.

The density of a hurdle family is given by \[ f_z(y) = z \quad \text{if } y = 0 \\ f_z(y) = (1 - z) f(y) / (1 - f(0)) \quad \text{if } y > 0 \] Currently implemented families are hurdle_poisson, hurdle_negbinomial, hurdle_gamma, and hurdle_lognormal.

The density of a zero-one-inflated family is given by \[ f_{\alpha, \gamma}(y) = \alpha (1 - \gamma) \quad \text{if } y = 0 \\ f_{\alpha, \gamma}(y) = \alpha \gamma \quad \text{if } y = 1 \\ f_{\alpha, \gamma}(y) = (1 - \alpha) f(y) \quad \text{if } y \notin \{0, 1\} \] where \(\alpha\) is the zero-one-inflation probability (i.e. the probability that zero or one occurs) and \(\gamma\) is the conditional one-inflation probability (i.e. the probability that one occurs rather than zero). Currently implemented families are zero_one_inflated_beta.