--- title: "Prospective Power Estimation" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Prospective Power Estimation} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r setup, message = FALSE} library(postcard) withr::local_seed(1395878) withr::local_options(list(postcard.verbose = 0)) ``` # Estimating the power for marginal effects The method proposed in [Powering RCTs for marginal effects with GLMs using prognostic score adjustment](https://arxiv.org/abs/2503.22284) by HĂžjbjerre-Frandsen et. al (2025), which can be used to estimate the power when estimating any marginal effect, is implemented in the function `power_marginaleffect()`. An introductory example is available in `vignette("postcard")`, but here we describe how to specify arguments to change the default behavior of the function to align with assumptions for the power estimation. ### Simulating some data We generate count data to showcase the flexibility of `power_marginaleffect()` that does not assume a linear model. ```{r} n_train <- 2000 n_test <- 200 b1 <- 0.9 b2 <- 0.2 b3 <- -0.4 b4 <- -0.6 train_pois <- glm_data( Y ~ b1*log(X1)+b2*X2+b3*X3+b4*X2*X3, X1 = runif(n_train, min = 1, max = 10), X2 = rnorm(n_train), X3 = rgamma(n_train, shape = 1), family = poisson() ) test_pois <- glm_data( Y ~ b1*log(X1)+b2*X2+b3*X3+b4*X2*X3, X1 = runif(n_test, min = 1, max = 10), X2 = rnorm(n_test), X3 = rgamma(n_test, shape = 1), family = poisson() ) ``` ## Controlling assumptions As a default, the variance in group 0 is estimated from the sample variance of the response, and the variance in group 1 is assumed to be the same as the estimated variance in group 0. Use argument `var1` to change the variance estimate in group 1, either with a function that modifies the estimate obtained for group 0 or as a `numeric`. The same is true for the MSE, where `kappa1_squared` as a default is taken to be the same as the MSE in group 0 unless a `function` or `numeric` is specified. Below is an example showcasing the different ways to specify `var1` and `kappa1_squared` used to adjust the assumptions according to prior beliefs. In practice, if wanting to specify either of these arguments as just a numeric, you will likely do a calculation on some historical data, whereas here we just put some number to showcase. > Note that we fit a prognostic model using `fit_best_learner()` and use this as our prediction in accordance with the guidance in the main reference to obtain a conservative power estimate of estimating a marginal effect in a GLM that adjusts for the prognostic score as a covariate alongside other covariates. ```{r} lrnr <- fit_best_learner(list(mod = Y ~ X1 + X2 + X3), data = train_pois) preds <- dplyr::pull(predict(lrnr, new_data = test_pois)) power_marginaleffect( response = test_pois$Y, predictions = preds, var1 = function(var0) 1.1 * var0, kappa1_squared = 2, estimand_fun = "rate_ratio", target_effect = 1.4, exposure_prob = 1/2, margin = 0.8 ) ``` The function `repeat_power_marginaleffect()` allows passing arguments along to `power_marginaleffect()`, so we are able to quickly create power curves for the above case. We define our models in a named list and define a function of a single parameter - the sample size `n` - that simulates the test data with the same structure as above but for different sample sizes. ```{r} ancova_prog_list <- list( ANCOVA = glm(Y ~ X1 + X2 + X3, data = train_pois, family = poisson), # "Null model" = glm(Y ~ 1, data = train_pois, family = poisson), "ANCOVA with prognostic score" = fit_best_learner(list(mod = Y ~ X1 + X2 + X3), data = train_pois) ) test_pois_fun <- function(n) { glm_data( Y ~ b1*log(X1)+b2*X2+b3*X3+b4*X2*X3, X1 = runif(n, min = 1, max = 10), X2 = rnorm(n), X3 = rgamma(n, shape = 1), family = poisson() ) } ``` We then run the power estimation and plot the results: ```{r} rpm <- repeat_power_marginaleffect( model_list = ancova_prog_list, test_data_fun = test_pois_fun, ns = seq(20, 500, by = 10), n_iter = 20, var1 = function(var0) 1.1 * var0, kappa1_squared = function(kap0) 1.1 * kap0, estimand_fun = "rate_ratio", target_effect = 1.4, exposure_prob = 1/2, margin = 0.8 ) plot(rpm) ```