Getting Started with SteppedPower

Philipp Mildenberger

Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI, Mainz)

Federico Marini

Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI, Mainz)

2021-01-28

1 About SteppedPower

SteppedPower offers tools for power and sample size calculation as well as design diagnostics for longitudinal mixed model settings, with a focus on stepped wedge designs. Other implemented study design types are parallel, parallel with baseline period(s) and crossover designs. Further design types can be easily defined by the user.

Currently, normal outcomes and binomial outcomes with logit link are implemented. The following random effects can be specified: random cluster intercept, random treatment effect, random subject specific intercept and random time effect. The covariance structure can be compound symmetry or autoregressive.

The code is modularised in order to be flexible and easy to use (and hopefully easy to maintain as well). At the same time, the use of sparse matrix classes from the Matrix package makes computation of large designs feasible.

2 Methods and Notation

A common approach to modelling the correlation in longitudinal studies is to use random effects (Hussey and Hughes 2007; Li et al. 2020). Such a model has the form

\[y_{ijk}= \mu_0 + \alpha_i + T_{ij} \theta + b_j + e_{ijk}\]

with

  • \(y_{ijk}\) the response in cluster \(j\) at time \(i\) for individual \(k\)
  • \(\mu_0\) an overall mean (under control)
  • \(\alpha_i\) the time trend at time \(i\)
  • \(T_{ij}\) indicates the treatment status (0 = control, 1 = interventional treatment)
  • \(\theta\) the treatment effect
  • \(b_j\) a random cluster effect for cluster \(j\) with \(b_j \sim N(0,\tau^2)\)
  • \(e_{ijk}\) a normal random error term with \(e_{ijk}\sim N(0,\sigma^2)\)

For power calculation, the standard deviation of the random effects is assumed to be known. Let us define \(\beta:=(\mu_0,\alpha',\theta)'\) and \(\omega_{ijk}:=b_j+e_{ijk}\). This leads to a compact and more general notation of the above equation:

\[\begin{align} y_{ijk}&= X_{ij}\beta + \omega_{ijk}\\ \text{or, in matrix notation:} \qquad \\ y&=X\beta + \omega \end{align}\]

where \(X\) is the corresponding design matrix and \(\omega\sim N(0,\Omega)\), with \(\Omega\) a compound-symmetry (syn. exchangeable) covariance matrix defined by \(\tau\) and \(\sigma\). We are thus in a weighted least squares setting, so the variance of \(\hat\beta\) is

\[ \text{Var}(\hat\beta) = {(X'\Omega^{-1}X)^{-1}}\]

We can then calculate the power of a z-test

\[ \text{power} = \Phi\left(\frac{\theta_A-\theta_0}{\sqrt{\text{Var}(\hat \theta)}}- Z_{1-\frac{\alpha}{2}}\right) \]

where \(\text{Var}(\hat \theta)\) is the diagonal element of \(\text{Var}(\hat\beta)\) that corresponds to \(\hat\theta\).
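
The two formulas above can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the package's implementation: the helper name wls_power and its arguments are made up for this example, the calculation works on cluster-period means (so the residual variance per cell is taken to be sigma^2/N), and time is adjusted for as a factor.

```r
# Sketch: power of the z-test for the treatment effect via weighted least squares.
# trt: clusters x timepoints matrix of treatment indicators (0/1).
# wls_power() is a hypothetical helper, not a SteppedPower function.
wls_power <- function(trt, theta, sigma, tau, N = 1, alpha = 0.05) {
  n_cl <- nrow(trt)
  n_tp <- ncol(trt)
  # Design matrix, rows ordered cluster by cluster: time as a factor plus treatment.
  time <- factor(rep(seq_len(n_tp), times = n_cl))
  X <- cbind(model.matrix(~ time), trt = as.vector(t(trt)))
  # Compound-symmetry block for one cluster (on cluster-period means).
  block <- matrix(tau^2, n_tp, n_tp) + diag(sigma^2 / N, n_tp)
  Omega <- kronecker(diag(n_cl), block)
  # Var(beta_hat) = (X' Omega^{-1} X)^{-1}
  var_beta <- solve(t(X) %*% solve(Omega, X))
  var_theta <- var_beta["trt", "trt"]
  # One-term power formula as displayed above, with theta = theta_A - theta_0.
  power <- pnorm(abs(theta) / sqrt(var_theta) - qnorm(1 - alpha / 2))
  list(var_theta = var_theta, power = power)
}

# Stepped wedge design with three clusters and four timepoints.
trt <- rbind(c(0, 1, 1, 1),
             c(0, 0, 1, 1),
             c(0, 0, 0, 1))
wls_power(trt, theta = 1, sigma = 1, tau = 0.5)
```

Setting tau = 0 in this sketch reduces the problem to ordinary least squares with period effects, which is a convenient sanity check.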

Extensions to the above formula implemented in this package are

  • random treatment effect
  • autoregressive cluster effect
  • binomial outcomes with logit-link

which leads to the following extended model formula:

\[y_{ijk}= g\big( \mu_0 + \alpha_i + T_{ij} (\theta + c_j) + b_j + e_{ijk}\big)\] with

  • \(g(\cdot)\) a link function
  • \(c_j\) a random treatment effect
  • \(b_j\) and \(c_j\) jointly distributed with covariance matrix \(\left(\begin{smallmatrix} \tau^2 & \rho\tau\eta \\ \rho\tau\eta & \eta^2\end{smallmatrix}\right)\)
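
For the normal-outcome case (identity link), the effect of a random treatment effect on the covariance within one cluster can be sketched as follows. The function name cluster_cov and its arguments are assumptions made for this illustration, and the calculation is again on cluster-period means.

```r
# Sketch: covariance of the cluster-period means of one cluster when a random
# treatment effect c_j is added to the random intercept b_j.
# cluster_cov() is a hypothetical helper, not a SteppedPower function.
cluster_cov <- function(trt_row, sigma, tau, eta, rho, N = 1) {
  # Z maps (b_j, c_j) onto the timepoints: intercept column and treatment status.
  Z <- cbind(1, trt_row)
  # Covariance matrix of (b_j, c_j) as given above.
  G <- matrix(c(tau^2,           rho * tau * eta,
                rho * tau * eta, eta^2), nrow = 2)
  Z %*% G %*% t(Z) + diag(sigma^2 / N, length(trt_row))
}

# A cluster that is under control for two periods and treated for two periods.
cluster_cov(trt_row = c(0, 0, 1, 1), sigma = 1, tau = 0.5, eta = 0.3, rho = 0.2)
```

Under these assumptions, periods under treatment have a larger variance (tau^2 + 2 rho tau eta + eta^2 + sigma^2/N) than periods under control (tau^2 + sigma^2/N).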

3 A quick tour

For most users, the most important function is probably wlsPower. It calls several auxiliary functions, which are briefly discussed here. This section is not essential for using SteppedPower, but it might be helpful when designing non-standard, user-defined settings.

wlsPower is essentially just a flexible wrapper for the function compute_wlsPower, which does the actual computation.
compute_wlsPower then calls construct_DesMat and construct_CovMat.

construct_DesMat builds the design matrix, which consists of the treatment status, usually built by construct_trtMat, and the time adjustment, usually built by construct_timeadjust. There is also the option to pass a user-defined treatment status to construct_DesMat. If not specified, the number of timepoints is guessed as the smallest number of periods (timepoints) possible with the given design, i.e. two for cross-over designs or the number of waves plus one for stepped wedge designs.

construct_CovMat builds the covariance matrix (explicitly). It uses construct_CovBlk to construct the blocks for each cluster which are then combined to a block diagonal matrix.

4 Features

4.1 Plot Method

In the weighted least squares setting, the estimator \(\hat \beta\) is a linear function of the data \(y\)

\[\hat\beta = \underbrace{(X'\Omega^{-1}X)^{-1}(X'\Omega^{-1})}_{=:\text{M}}\cdot y\]

with \(X\) the design matrix and \(\Omega\) the covariance matrix as above. The matrix \(M\) gives an impression of the importance of clusters and time periods with regard to the estimated coefficients \(\hat\beta\). The first row of \(M\) corresponds to the coefficient of the treatment status, i.e. the treatment effect.
The plot.wlsPower method visualises this first row of \(M\) as a matrix where rows and columns correspond to clusters and time periods, respectively.

Furthermore, to give a rough comparison of importance between clusters (or between time periods), the sum of absolute weights per row (or per column) is also shown.

Caution: These are out-of-bag estimates for the influence of observations on \(\hat\theta\), but not for \(\text{Var}(\hat\theta)\)!

wlsPwr <- wlsPower(Cl=c(3,2,3), mu0=0, mu1=1, sigma=1, tau=.5, verbose=2)
plot(wlsPwr)
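
To make the weights concrete, the matrix \(M\) can be computed by hand in base R for a small design. This is a sketch under assumptions (three clusters, four timepoints, one observation per cluster and period, time adjusted as a factor, treatment as the first column of the design matrix); it does not reproduce the internals of plot.wlsPower.

```r
# Sketch: weights of the cluster-period cells in the treatment effect estimate.
trt <- rbind(c(0, 1, 1, 1),
             c(0, 0, 1, 1),
             c(0, 0, 0, 1))
n_cl <- nrow(trt)
n_tp <- ncol(trt)
sigma <- 1
tau <- 0.5

# Design matrix with the treatment column first, then time as a factor.
time <- factor(rep(seq_len(n_tp), times = n_cl))
X <- cbind(trt = as.vector(t(trt)), model.matrix(~ time))
Omega <- kronecker(diag(n_cl), matrix(tau^2, n_tp, n_tp) + diag(sigma^2, n_tp))

# M = (X' Omega^{-1} X)^{-1} X' Omega^{-1}
XtOinv <- t(X) %*% solve(Omega)
M <- solve(XtOinv %*% X) %*% XtOinv

# First row of M, arranged as clusters (rows) by timepoints (columns).
W <- matrix(M[1, ], nrow = n_cl, byrow = TRUE)
W
rowSums(abs(W))  # summed absolute weight per cluster
colSums(abs(W))  # summed absolute weight per timepoint
```

Because \(MX\) is the identity matrix, the signed weights within each timepoint sum to zero; only the absolute sums are informative about relative importance.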

4.2 Find Sample Size for given Power

When the argument Power is passed to wlsPower, the sample size needed is calculated, under the assumption of equally sized clusters and periods.

wlsPower(Cl=c(3,2,3), mu0=0, mu1=.2, sigma=1, tau=0, Power=.8)
#> Power                                = 0.805
#> Significance level (two sided)       = 0.05
#> Needed N per cluster per period      = 53
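
The search for the required cluster size can be illustrated with a simple linear search in base R. The helper power_for_N is hypothetical and the approach (increase N until the target power is reached) is an assumption for illustration, not necessarily how wlsPower finds the solution.

```r
# Sketch: smallest N per cluster per period that reaches a target power of 0.8.
trt_waves <- rbind(c(0, 1, 1, 1),
                   c(0, 0, 1, 1),
                   c(0, 0, 0, 1))
trt <- trt_waves[rep(1:3, times = c(3, 2, 3)), ]   # corresponds to Cl = c(3, 2, 3)
n_cl <- nrow(trt)
n_tp <- ncol(trt)
time <- factor(rep(seq_len(n_tp), times = n_cl))
X <- cbind(trt = as.vector(t(trt)), model.matrix(~ time))

# power_for_N() is a hypothetical helper working on cluster-period means.
power_for_N <- function(N, theta = 0.2, sigma = 1, tau = 0, alpha = 0.05) {
  block <- matrix(tau^2, n_tp, n_tp) + diag(sigma^2 / N, n_tp)
  Omega <- kronecker(diag(n_cl), block)
  var_theta <- solve(t(X) %*% solve(Omega, X))[1, 1]
  pnorm(theta / sqrt(var_theta) - qnorm(1 - alpha / 2))
}

N <- 1
while (power_for_N(N) < 0.8) N <- N + 1
N                # 53
power_for_N(N)
```

For larger problems a root finder or bisection would be more efficient than stepping through every N.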

5 Use cases and examples

5.1 Comparison of two groups – Z-Test

This might be a proof of concept rather than an example with practical relevance, but let’s compare the means of two groups. For two groups of 10 observations each, the power of a Z-test can be calculated as follows:

wlsPower(Cl=c(10,10), mu0=0,mu1=.6,sigma=1, tau=0, N=1, 
              dsntype="parallel", timepoints=1)
#> Power                                = 0.2687
#> Significance level (two sided)       = 0.05

## the same:
wlsPower(Cl=c(1,1), mu0=0,mu1=.6, sigma=1, tau=0, N=10,
              dsntype="parallel", timepoints=1)
#> Power                                = 0.2687
#> Significance level (two sided)       = 0.05
pwr::pwr.norm.test(.3,n=20)$power
#> [1] 0.2686618
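
The same number can be reproduced by hand from the weighted least squares formulas in base R. In this sketch the two-sided power counts rejections in both tails, which is what pwr::pwr.norm.test does; that wlsPower does the same is an assumption inferred from the matching output.

```r
# Sketch: two groups of 10 observations as a weighted least squares problem.
n <- 10
X <- cbind(intercept = 1, trt = rep(c(0, 1), each = n))
Omega <- diag(1, 2 * n)                 # sigma = 1, no cluster effect (tau = 0)

# Variance of the treatment effect estimate, equal to 1/n + 1/n here.
var_theta <- solve(t(X) %*% solve(Omega, X))["trt", "trt"]

z <- 0.6 / sqrt(var_theta)
crit <- qnorm(0.975)
# Two-sided power, counting rejections in both tails.
power <- pnorm(z - crit) + pnorm(-z - crit)
round(power, 4)  # 0.2687
```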

A quick note on t-tests: It is much more challenging to use SteppedPower to reproduce settings in which the variance is assumed to be unknown, most prominently the well-known t-test. The package implements some (experimental) heuristics for guessing the denominator degrees of freedom, but they yield scaled Wald tests rather than t-tests. The main difference is that the distribution under the alternative is assumed to be symmetric, whereas the t-test assumes a non-central (hence skewed) t-distribution.

5.2 Longitudinal study – parallel groups

wlsPower(Cl=c(10,10),timepoints=5,mu0=0,mu1=.25,
         sigma=.5,dsntype="parallel")
#> Power                                = 0.7054
#> Significance level (two sided)       = 0.05

wlsPower(Cl=c(10,10),timepoints=5,mu0=0,mu1=.25,
         sigma=.5,tau=.2,dsntype="parallel")
#> Power                                = 0.4616
#> Significance level (two sided)       = 0.05

5.3 Stepped Wedge designs with empty sequences (i.e. waves)

Periods in which no cluster switches to the intervention are specified by inserting zeros into the Cl argument, e.g. Cl=c(4,4,4,0).

mod1 <- wlsPower(Cl=c(1,1,1,0), mu0=0, mu1=1, 
                 sigma=0.4, tau=0, verbose=2)

knitr::kable(mod1$DesignMatrix$trtMat)
0 1 1 1 1
0 0 1 1 1
0 0 0 1 1
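
The treatment matrix shown above can be rebuilt in base R to see how Cl maps to rows and columns. The function sw_trt_matrix is a hypothetical stand-in written for this illustration; it assumes the default of one more timepoint than there are waves and is not the package's construct_trtMat.

```r
# Sketch: stepped wedge treatment matrix from the number of clusters per wave.
# sw_trt_matrix() is a hypothetical helper, not a SteppedPower function.
sw_trt_matrix <- function(Cl) {
  n_waves <- length(Cl)
  timepoints <- n_waves + 1
  # One row per wave: wave w is under control for the first w timepoints.
  wave_rows <- t(sapply(seq_len(n_waves), function(w) {
    as.numeric(seq_len(timepoints) > w)
  }))
  # Repeat each wave row once per cluster; waves with zero clusters drop out.
  wave_rows[rep(seq_len(n_waves), times = Cl), , drop = FALSE]
}

sw_trt_matrix(c(1, 1, 1, 0))
```

The empty fourth wave contributes no row but still adds a fifth timepoint in which all clusters are under intervention.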

5.4 Autocorrelated cluster effects

In longitudinal studies, it can be sensible to assume that correlation within clusters decreases with increasing time lag. The argument tauAR enables the user to specify an AR(1) correlation. tauAR must be a value between 0 and 1. A value of 0 corresponds to i.i.d. observations, a value of 1 to the usual compound symmetry covariance type.

An example of a stepped wedge design with 8 clusters in 4 waves, once with medium autocorrelation (tauAR=0.6) and once with high autocorrelation (tauAR=0.95):

mod2 <- wlsPower(Cl=c(2,2,2,2), mu0=0, mu1=1, 
              sigma=1, N=100, tau=1, tauAR=.6, verbose=2)

mod3 <- wlsPower(Cl=c(2,2,2,2), mu0=0, mu1=1, 
              sigma=1, N=100, tau=1, tauAR=.95, verbose=2)

For tauAR=0.6, the covariance matrix within one cluster then looks like this:

1.0100 0.600 0.36 0.216 0.1296
0.6000 1.010 0.60 0.360 0.2160
0.3600 0.600 1.01 0.600 0.3600
0.2160 0.360 0.60 1.010 0.6000
0.1296 0.216 0.36 0.600 1.0100

For tauAR=0.95 it takes the following shape

1.0100000 0.950000 0.9025 0.857375 0.8145062
0.9500000 1.010000 0.9500 0.902500 0.8573750
0.9025000 0.950000 1.0100 0.950000 0.9025000
0.8573750 0.902500 0.9500 1.010000 0.9500000
0.8145062 0.857375 0.9025 0.950000 1.0100000
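
Both matrices can be reproduced in base R. The sketch below assumes that the block equals tau^2 times tauAR raised to the time lag, plus sigma^2/N on the diagonal; this assumption matches the displayed values, and the function name ar1_block is made up for the example.

```r
# Sketch: within-cluster covariance of cluster-period means with an
# autoregressive cluster effect. ar1_block() is a hypothetical helper.
ar1_block <- function(timepoints, sigma, tau, tauAR, N = 1) {
  # Absolute time lag between every pair of timepoints.
  lag <- abs(outer(seq_len(timepoints), seq_len(timepoints), "-"))
  tau^2 * tauAR^lag + diag(sigma^2 / N, timepoints)
}

ar1_block(5, sigma = 1, tau = 1, tauAR = 0.6, N = 100)
ar1_block(5, sigma = 1, tau = 1, tauAR = 0.95, N = 100)
```

With tauAR = 1 all off-diagonal entries equal tau^2, which is the compound symmetry structure; with tauAR = 0 only the diagonal remains.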

5.5 Unequal cluster sizes

The argument N defines the cluster size. N can be

  • a scalar, if all clusters have the same assumed size, which is also constant over time
  • a vector, if the size differs between clusters but is assumed to be constant over time
  • a matrix where each row corresponds to either a cluster or a wave of clusters and each column corresponds to a timepoint

mod4 <- wlsPower(Cl=c(1,1,1), mu0=0, mu1=1, N=c(1,3,10), tau=.5, verbose=2)
plot(mod4)

5.6 Incomplete Stepped Wedge Designs

Suppose you do not plan to observe all clusters over the whole study period. Rather, clusters that switch early to the intervention are not observed until the end. Analogously, observation starts later in clusters that switch towards the end of the study. This is sometimes called an ‘incomplete SWD’ (Hemming et al. 2015).

There are two ways to achieve this in SteppedPower, both using the incomplete argument. The first is to pass a scalar, which then defines the number of observed periods before and after the switch from control to intervention in each cluster.

If, for example, the study consists of eight clusters in four sequences (i.e. five timepoints), and we observe two timepoints before and after the switch, then we obtain

incompletePwr <- wlsPower(Cl=rep(2,4), sigma=2, tau=.6, mu0=0,mu1=.5, N=80, 
                             incomplete=2, verbose=2)
incompletePwr
#> Power                                = 0.8221
#> Significance level (two sided)       = 0.05

A slightly more tedious, but more flexible way is to define a matrix where each row corresponds to either a cluster or a wave of clusters and each column corresponds to a timepoint. If a cluster is not observed at a specific timepoint, set the value in the corresponding cell to 0. For the example above, such a matrix would look like this:

TM  <- toeplitz(c(1,1,0,0))
incompleteMat1 <- cbind(TM[,1:2],rep(1,4),TM[,3:4])
incompleteMat2 <- incompleteMat1[rep(1:4,each=2),]

A matrix where each row represents a wave of clusters

1 1 1 0 0
1 1 1 1 0
0 1 1 1 1
0 0 1 1 1

or each row represents a cluster

1 1 1 0 0
1 1 1 0 0
1 1 1 1 0
1 1 1 1 0
0 1 1 1 1
0 1 1 1 1
0 0 1 1 1
0 0 1 1 1

Now all that’s left to do is to plug that into the main function:

incompletePwr1 <- wlsPower(Cl=rep(2,4), sigma=2, tau=.6, mu0=0, mu1=.5, N=80, 
                        incomplete=incompleteMat1, verbose=2)
incompletePwr2 <- wlsPower(Cl=rep(2,4), sigma=2, tau=.6, mu0=0, mu1=.5, N=80, 
                        incomplete=incompleteMat2, verbose=2)

all.equal(incompletePwr,incompletePwr1)
#> [1] TRUE
all.equal(incompletePwr,incompletePwr2)
#> [1] TRUE
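
The correspondence between the scalar and the matrix input can be made explicit with a small base-R sketch that expands a scalar into the observation matrix per wave. The function incomplete_mask is hypothetical and only illustrates the rule described above (k observed timepoints on each side of the switch); it is not taken from the package.

```r
# Sketch: observation pattern per wave for k observed timepoints before and
# after the switch. incomplete_mask() is a hypothetical helper.
incomplete_mask <- function(n_waves, k) {
  timepoints <- n_waves + 1
  # Wave w switches between timepoint w and w + 1; keep k timepoints on each side.
  t(sapply(seq_len(n_waves), function(w) {
    tp <- seq_len(timepoints)
    as.numeric(tp > w - k & tp <= w + k)
  }))
}

incomplete_mask(n_waves = 4, k = 2)
```

For four waves and k = 2 this gives the same wave-level matrix as the toeplitz construction used for incompleteMat1.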

We can also have a quick look at the projection matrix, where we see that the clusters have a weight of exactly zero at the timepoints where they are not observed:

plot(incompletePwr)

The argument incomplete with matrix input also works for other design types, but (supposedly) makes most sense in the context of stepped wedge designs.

5.8 Closed cohort SWD

In a closed cohort the patients are observed over the whole study period. The same correlation structure arises in cross sectional stepped wedge designs if subclusters exist (such as wards within clinics). The argument psi denotes the standard deviation of a random subject (or subcluster) specific intercept.

The power is calculated on aggregated cluster means:

Closed1 <- wlsPower(mu0=0, mu1=5, Cl=rep(3,3), sigma=5, tau=1, psi=2, gamma=1,
                      N=3, verbose=2)
a <- plot(Closed1$DesignMatrix)

With INDIV_LVL=TRUE, the calculation is done on the individual level. This yields the same results but is far more computationally expensive and is mainly intended for diagnostic purposes.

Closed2 <- wlsPower(mu0=0, mu1=5, Cl=rep(3,3), sigma=5, tau=1, psi=2, gamma=1,
                      N=3, verbose=2, INDIV_LVL = TRUE)
plot(Closed2)

Session Info

sessionInfo()
#> R version 4.0.3 (2020-10-10)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19041)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=C                    LC_CTYPE=German_Germany.1252   
#> [3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C                   
#> [5] LC_TIME=German_Germany.1252    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] Matrix_1.2-18      SteppedPower_0.1.0 knitr_1.30        
#> 
#> loaded via a namespace (and not attached):
#>  [1] highr_0.8         pillar_1.4.7      compiler_4.0.3    tools_4.0.3      
#>  [5] digest_0.6.27     viridisLite_0.3.0 jsonlite_1.7.2    evaluate_0.14    
#>  [9] lifecycle_0.2.0   tibble_3.0.4      gtable_0.3.0      lattice_0.20-41  
#> [13] pkgconfig_2.0.3   rlang_0.4.10      crosstalk_1.1.1   yaml_2.2.1       
#> [17] xfun_0.20         pwr_1.3-0         stringr_1.4.0     dplyr_1.0.2      
#> [21] httr_1.4.2        generics_0.1.0    vctrs_0.3.6       htmlwidgets_1.5.3
#> [25] grid_4.0.3        tidyselect_1.1.0  glue_1.4.2        data.table_1.13.6
#> [29] R6_2.5.0          plotly_4.9.3      rmarkdown_2.6     farver_2.0.3     
#> [33] tidyr_1.1.2       ggplot2_3.3.3     purrr_0.3.4       magrittr_2.0.1   
#> [37] scales_1.1.1      ellipsis_0.3.1    htmltools_0.5.1.1 colorspace_2.0-0 
#> [41] stringi_1.5.3     lazyeval_0.2.2    munsell_0.5.0     crayon_1.3.4

References

Hemming, Karla, Terry P Haines, Peter J Chilton, Alan J Girling, and Richard J Lilford. 2015. “The Stepped Wedge Cluster Randomised Trial: Rationale, Design, Analysis, and Reporting.” BMJ 350: h391.

Hemming, Karla, and Monica Taljaard. 2020. “Reflection on Modern Methods: When Is a Stepped-Wedge Cluster Randomized Trial a Good Study Design Choice?” International Journal of Epidemiology.

Hussey, Michael A, and James P Hughes. 2007. “Design and Analysis of Stepped Wedge Cluster Randomized Trials.” Contemporary Clinical Trials 28 (2): 182–91.

Li, Fan, James P Hughes, Karla Hemming, Monica Taljaard, Edward R Melnick, and Patrick J Heagerty. 2020. “Mixed-Effects Models for the Design and Analysis of Stepped Wedge Cluster Randomized Trials: An Overview.” Statistical Methods in Medical Research, 0962280220932962.