Comparing posterior predictions

Overview

With bayesnec we have included a function that allows bootstrapped comparisons of posterior predictions. The main focus here is to showcase how the user can fit several different bnec model fits and can compare differences in the posterior predictions across these fits for individual endpoint estimates (e.g. nec, nsec or ecx) or across a range of predictor (x) values. Below we demonstrate usage of compare_posterior for objects of class bayesnecfit and bayesmanecfit. In this example we compare different types of models and model sets using a single dataset. However, the intent of this function is to allow comparison across different datasets that might represent, for example, different levels of a fixed factor covariate. At this time bnec does not allow inclusion of an interaction with a fixed factor. Including an interaction term within each of the non-linear models implemented in bayesnec is relatively straightforward, and may be introduced in future releases. However, in many cases the functional form of the response may change with different levels of a given factor. The substantial complexity of defining all possible non-linear model combinations at each factor level means it unlikely this could be feasibly implemented in bayesnec in the short term. In the meantime the greatest flexibility in the functional form of individual model fits can be readily obtained using models fitted independently to data within each factor level.

To run this vignette, we will also need some additional packages

library(ggplot2)

Comparing posterior endpoint values

library(brms)
library(bayesnec)
data(nec_data)

# Fit a set of models
exmp <- bnec(data = nec_data, x_var = "x", y_var = "y", model = "all")

# bayesmanecfit
class(exmp)

This call fits all models that are suitable for modelling beta response data using a logit link. We can pull out the nec models and the ecx models separately, to create two more alternative model fits of this data.

exmp_nec <- pull_out(exmp, model = "nec")
exmp_ecx <- pull_out(exmp, model = "ecx")

Now we have three different averaged model fits, all of class bayemanec in this case (because they all contain multiple fits). We can compare their posterior estimates of the ex10 values using compare_posterior.

post_comp <- compare_posterior(list("all" = exmp, "ecx" = exmp_ecx,
                                    "nec" = exmp_nec),
                               comparison = "ecx", ecx_val = 10)
names(post_comp)
# "posterior_list" 
# "posterior_data" 
# "diff_list"      
# "diff_data"      
# "prob_diff"

The compare_posterior function outputs several elements in a named list. This includes the posterior_data for each model in the comparison as a data.frame which we can use to plot a geom_density plot of the posterior estimates, so they can be compared visually.

ggplot(data = post_comp$posterior_data, mapping = aes(x = value)) + 
  geom_density(mapping = aes(group = model, colour = model, fill = model),
               alpha = 0.3) +
  theme_classic()

From this you can see that the ec10 estimates are very similar for the nec and all. This is because the nec model types dominate the model weights in this all fit, see wi in exmp$mod_stats. The ec10 estimate is slightly lower (more conservative) for the ecx based models.

The data.frame “diff_data” can be used to make a similar plot, but specifically for the differences among models.

ggplot(data = post_comp$diff_data, mapping = aes(x = diff)) +
  geom_density(mapping = aes(group = comparison, colour = comparison,
                             fill = comparison), alpha = 0.3) +
  theme_classic() 

This shows that the differences among the three estimates. Overall they are all quite similar, but with there no difference in the nec and all estimates (the probability density overlaps zero), and a tendency for the ecx to be lower than all and nec.

Finally, the probability that the endpoint estimate for one model set is greater than the other is obtained in “prob_diff”. Here you can see there is ~57% chance that all is great than ecx, a 46% chance that all is greater than nec and a 36% chance that ecx is greater than nec.

post_comp$prob_diff
#  all.ecx all.nec ecx.nec
#1 0.57375    0.46   0.361

Comparing posterior fitted values

The user can also compare posterior fitted values across the full range of x values, using comparison = "fitted".

post_comp_fitted <- compare_posterior(list("all" = exmp, "ecx" = exmp_ecx,
                                           "nec" = exmp_nec),
                                      comparison = "fitted")

In the case of comparison = "fitted" most of the elements returned by compare_posterior are class data.frame, with summary values for the posteriors, difference values and probabilities returned for each value of x, for each model, or model comparison.

head(post_comp_fitted$posterior_data)
#   model          x  Estimate      Q2.5     Q97.5
# 1   all 0.03234801 0.8865920 0.8609242 0.8987086
# 2   all 0.09741274 0.8868260 0.8676755 0.8987086
# 3   all 0.16247747 0.8870972 0.8712948 0.8987086
# 4   all 0.22754220 0.8873789 0.8733017 0.8987346
# 5   all 0.29260692 0.8877217 0.8746777 0.8988511
# 6   all 0.35767165 0.8880612 0.8760193 0.8988571

head(post_comp_fitted$diff_data)
#   comparison diff.Estimate   diff.Q2.5 diff.Q97.5          x
# 1    all-ecx  -0.005726618 -0.03212779 0.01120134 0.03234801
# 2    all-ecx  -0.005424647 -0.02608475 0.01123945 0.09741274
# 3    all-ecx  -0.005025740 -0.02334368 0.01109118 0.16247747
# 4    all-ecx  -0.004719021 -0.02135375 0.01119652 0.22754220
# 5    all-ecx  -0.004280943 -0.02037682 0.01119120 0.29260692
# 6    all-ecx  -0.003893997 -0.01951922 0.01128307 0.35767165

Using the collated posterior_data we can plot the predicted curves with confidence bounds for each of the input models. This shows clearly that the ecx model set begins to decline earlier than the nec or all sets, which are flat prior to the nec step point, and then decline more rapidly.

ggplot(data = post_comp_fitted$posterior_data) +
  geom_line(mapping = aes(x = x, y = Estimate, color = model), size = 0.5) +
  geom_ribbon(mapping = aes(x = x, ymin = Q2.5, ymax = Q97.5, fill = model),
              alpha = 0.3)

We can plot the differences between pairs of models in the list by plotting “diff.Estimate” from “diff_data” and using colours for the different comparisons. This plot highlights where the differences among these model sets are the greatest. As we have seen for ec10, the nec and all model sets are relatively similar across the entire range of concentrations (x) (green band overlaps zero). The red band is the difference between ecx and all and shows that the ecx set has slightly higher estimates than all at low to moderate x values, although really these curves are very similar as we saw from such small differences in the ec10 estimates above.

ggplot(data = post_comp_fitted$diff_data) +
  geom_line(mapping = aes(x = x, y = diff.Estimate, color = comparison),
            size = 0.5) +
  geom_ribbon(mapping = aes(x = x, ymin  =diff.Q2.5, ymax = diff.Q97.5,
              fill = comparison), alpha = 0.3)

And finally we can plot the probability that one model is greater than the other by plotting “prob” from “diff_data”. The pattern of this plot is identical to the plot of differences, but the y axis now shows the probability of these differences. The green line hovers around 0.5 clearly indicating the lack of significant difference in the nec and all model sets at any point of the x-curve. The red and blue curves pass through 0.5 at several points, meaning there are parts of the curve where there is no significant difference between the ecx and the nec or all predictions. The greatest probability of difference among these curves is between values of ~1 and ~1.5 of x, where the probability of difference does tend towards high values, at the point where the *ecx set deviates from the nec** set.

ggplot(data = post_comp_fitted$prob_diff) +
  geom_line(mapping = aes(x = x, y = prob, color = comparison), size = 0.5)