The ‘gfilmm’ package

What does it do?

The ‘gfilmm’ package generates simulations from the generalized fiducial distribution of the parameters of a Gaussian linear mixed model with categorical random effects (numeric random effects are not supported) and interval data. It also provides helper functions for computing summary statistics and confidence intervals.

The algorithm implemented in ‘gfilmm’ is the one described in the paper "Generalized fiducial inference for normal linear mixed models" by Jessi Cisewski and Jan Hannig. It is written in C++, and the code is based on the original Matlab code by Jessi Cisewski.

Fiducial inference resembles Bayesian inference: the uncertainty about the parameters is represented by a distribution, the fiducial distribution, and inference on the parameters is based on it in the same way that Bayesian inference is based on the posterior distribution. The main difference is that there is no prior distribution, so fiducial inference is closest to objective Bayesian inference. Fiducial inference yields results close to those of frequentist inference.

First example: a non-mixed linear model

The data must be given as a data frame. Here we simulate data from a simple linear regression model:

set.seed(666L)
n <- 30L
x <- 1L:n
y <- rnorm(n, mean = x, sd = 2)
y_rounded <- round(y, digits = 1L)
dat <- data.frame(
  ylwr = y_rounded - 0.05,
  yupr = y_rounded + 0.05,
  x = x
)
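
The half-width of 0.05 used above comes from rounding to one decimal place: a value reported as 3.4 could lie anywhere between 3.35 and 3.45. Here is a minimal sketch of that conversion for an arbitrary number of digits. The helper `rounding_interval` is a hypothetical function written for this illustration; it is not part of ‘gfilmm’.

```r
# Hypothetical helper (not part of 'gfilmm'): turn values rounded to `digits`
# decimal places into the intervals that contain the unrounded values.
rounding_interval <- function(y, digits = 1L) {
  half_width <- 0.5 * 10^(-digits)
  y_rounded <- round(y, digits = digits)
  data.frame(ylwr = y_rounded - half_width, yupr = y_rounded + half_width)
}

set.seed(1L)
y_example <- rnorm(5L, mean = 10, sd = 2)
intervals <- rounding_interval(y_example, digits = 1L)

# Each unrounded value lies inside its interval (up to floating-point error).
all(y_example >= intervals$ylwr - 1e-12 & y_example <= intervals$yupr + 1e-12)
```

The resulting `ylwr` and `yupr` columns have the same form as the ones passed to the sampler through `cbind(ylwr, yupr)`.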

Now we run the fiducial sampler:

library(gfilmm)
fidSims <- gfilmm(
  y = ~ cbind(ylwr, yupr), # interval data
  fixed = ~ x,             # fixed effects
  random = NULL,           # random effects
  data = dat,              # data
  N = 10000L               # number of simulations
)

A summary of the fiducial simulations (the Pr(=0) column will be explained later):

gfiSummary(fidSims)
#>                  mean    median        lwr      upr Pr(=0)
#> (Intercept) 0.8696454 0.8742805 -0.9653809 2.720132     NA
#> x           0.9109923 0.9107001  0.8080415 1.015237     NA
#> sigma_error 2.4486778 2.4154403  1.8862475 3.211994      0
#> attr(,"confidence level")
#> [1] 0.95

The fiducial confidence intervals are close to the frequentist ones:

lmfit <- lm(y ~ x)
confint(lmfit)
#>                  2.5 %  97.5 %
#> (Intercept) -0.9589356 2.70406
#> x            0.8076886 1.01402

The fiducial cumulative distribution function of the slope:

Fslope <- gfiCDF(~ x, fidSims)
plot(Fslope, main = "Slope", ylab = expression("Pr("<="x)"))

To get a fiducial density, I recommend the ‘kde1d’ package:

library(kde1d)
kfit <- kde1d(fidSims$VERTEX[["x"]], weights = fidSims$WEIGHT, mult = 4)
curve(dkde1d(x, kfit), from = 0.7, to = 1.1)
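
The call above passes `fidSims$WEIGHT` as weights because the fiducial simulations come with weights. As a rough illustration of what a weighted empirical CDF and weighted quantiles look like, here is a base-R sketch on made-up draws and weights. It is not the code used by gfiCDF, and `weighted_ecdf` and `weighted_quantile` are hypothetical names introduced for this example.

```r
# Stand-in for fiducial output: made-up draws and normalized weights.
set.seed(42L)
draws <- rnorm(2000L, mean = 0.9, sd = 0.05)
w <- runif(2000L)
w <- w / sum(w)

# Weighted empirical CDF: total weight of the draws that are <= q.
weighted_ecdf <- function(q, x, w) {
  vapply(q, function(qk) sum(w[x <= qk]), numeric(1L))
}

# Weighted quantile: smallest draw whose cumulative weight reaches p.
weighted_quantile <- function(p, x, w) {
  o <- order(x)
  cw <- cumsum(w[o])
  vapply(p, function(pk) x[o][which(cw >= pk - 1e-12)[1L]], numeric(1L))
}

ci <- weighted_quantile(c(0.025, 0.975), draws, w)  # equal-tailed 95% interval
weighted_ecdf(ci, draws, w)                         # close to 0.025 and 0.975
```

With real output, the draws and weights would be taken from `fidSims$VERTEX[["x"]]` and `fidSims$WEIGHT`, assuming the weights are normalized to sum to one.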

Fiducial predictive inference. The gfilmmPredictive function samples the generalized fiducial predictive distribution. All the functions seen above can be applied to the output.

fpd <- gfilmmPredictive(fidSims, newdata = data.frame(x = c(1, 30)))
gfiSummary(fpd)
#>         mean    median      lwr       upr
#> y1  1.801507  1.816828 -3.21099  6.922675
#> y2 28.155631 28.157729 22.85169 33.325925
#> attr(,"confidence level")
#> [1] 0.95

Compare with the frequentist approach:

predict(lmfit, newdata = data.frame(x = c(1, 30)), interval = "prediction")
#>         fit       lwr       upr
#> 1  1.783417 -3.408471  6.975305
#> 2 28.198198 23.006310 33.390086

A mixed model

Now let us simulate some data from a one-way ANOVA model with a random factor:

mu           <- 10000 # grand mean
sigmaBetween <- 2
sigmaWithin  <- 3
n            <- 8L # sample size per group

set.seed(666L)
groupmeans <- rnorm(2L, mu, sigmaBetween)
y1         <- rnorm(n, groupmeans[1L], sigmaWithin) 
y2         <- rnorm(n, groupmeans[2L], sigmaWithin) 
y          <- c(y1, y2)
y_rounded  <- round(c(y1, y2), digits = 1L)
dat        <- data.frame(
                ylwr = y_rounded - 0.05,
                yupr = y_rounded + 0.05,
                group = gl(2L, n)
              )

We run the fiducial sampler:

fidSims <- gfilmm(~ cbind(ylwr, yupr), ~ 1, ~ group, data = dat, N = 10000L)

Observe that the between standard deviation sigma_group has a positive value in the Pr(=0) column:

gfiSummary(fidSims)
#>                     mean       median        lwr          upr     Pr(=0)
#> (Intercept) 10000.698525 10001.828832 9979.48611 10026.836248         NA
#> sigma_group    18.264835     3.399129    0.00000    83.980735 0.08605762
#> sigma_error     4.132655     4.008439    2.78631     6.267243 0.00000000
#> attr(,"confidence level")
#> [1] 0.95

What does it mean? The fiducial distribution of a variance component can have a point mass at zero, and the Pr(=0) column gives the probability of that mass; here it is the fiducial probability that the between standard deviation equals zero. So be careful if you are interested in the fiducial density of a standard deviation: if Pr(=0) is nonzero for that standard deviation, its fiducial distribution does not have a density. Instead, it has a point mass at zero and a density on the strictly positive real numbers.
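
To make the combination of a mass at zero and a density on the positive numbers concrete, the sketch below builds made-up weighted draws of a standard deviation in which some draws are exactly zero. It computes the mass at zero as the total weight of those draws, then renormalizes the remaining weights to estimate a density for the positive part only. The data and variable names are invented for illustration and do not come from ‘gfilmm’.

```r
set.seed(123L)
n_draws <- 5000L

# Made-up draws: about 10% are exactly zero, the rest are strictly positive.
is_zero <- runif(n_draws) < 0.1
sigma_draws <- ifelse(is_zero, 0, rgamma(n_draws, shape = 2, rate = 0.5))
w <- rep(1 / n_draws, n_draws)  # equal weights, for simplicity

# Mass at zero: total weight carried by the draws equal to zero.
pr_zero <- sum(w[sigma_draws == 0])

# Positive part: keep the nonzero draws and renormalize their weights.
positive <- sigma_draws > 0
w_positive <- w[positive] / sum(w[positive])

# Density estimate of the positive part only. It is conditional on the
# standard deviation being positive; scale it by (1 - pr_zero) to get the
# unconditional density on the strictly positive numbers.
bw <- bw.nrd0(sigma_draws[positive])
dens <- density(sigma_draws[positive], weights = w_positive, bw = bw, from = 0)
pr_zero
```

Estimating a density from all the draws, zeros included, would smear the point mass over a neighborhood of zero and misrepresent the distribution.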

Compare the fiducial confidence interval of the grand mean to its Kenward-Roger confidence interval:

library(lmerTest)
library(emmeans)
fit <- lmer(y ~ (1|group), data = dat) # y (unrounded) is found in the workspace, not in dat
emmeans(fit, ~ 1)
#>  1       emmean     SE df lower.CL upper.CL
#>  overall  10002 2.1074  1   9975.4    10029
#> 
#> Degrees-of-freedom method: kenward-roger 
#> Confidence level used: 0.95

With gfiConfInt we can get a fiducial confidence interval for any parameter of interest, for example the coefficient of total variation:

gfiConfInt(~ sqrt(sigma_group^2 + sigma_error^2)/`(Intercept)`, fidSims)
#>         2.5%        97.5% 
#> 0.0003318767 0.0084440371
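
Since the data were simulated, the true coefficient of total variation is known and can be computed from the parameters set at the start of this section. The short check below does that arithmetic (the true value is about 0.00036) and tests whether it falls inside the interval reported above. The interval bounds are copied from the output, and the names `cv_true` and `ci_reported` are introduced here for illustration.

```r
mu           <- 10000 # grand mean
sigmaBetween <- 2
sigmaWithin  <- 3

# True coefficient of total variation: sqrt(2^2 + 3^2) / 10000
cv_true <- sqrt(sigmaBetween^2 + sigmaWithin^2) / mu

# Interval bounds copied from the gfiConfInt output above
ci_reported <- c(0.0003318767, 0.0084440371)

# Is the true value covered by the fiducial interval?
cv_true >= ci_reported[1L] && cv_true <= ci_reported[2L]
```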