--- title: "An Introduction to t_TOST" subtitle: "A new function for TOST with t-tests" author: "Aaron R. Caldwell" date: "`r Sys.Date()`" output: rmarkdown::html_vignette: toc: TRUE vignette: > %\VignetteIndexEntry{An Introduction to t_TOST} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} editor_options: chunk_output_type: console bibliography: references.bib --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) library(TOSTER) library(ggplot2) library(ggdist) ``` # Introduction to TOST TOST (Two One-Sided Tests) is a statistical approach used primarily for equivalence testing. Unlike traditional null hypothesis significance testing which aims to detect differences, equivalence testing aims to statistically support a claim of "no meaningful difference" between groups or conditions. This is particularly useful in bioequivalence studies, non-inferiority trials, or any research where establishing similarity (rather than difference) is the primary goal. In an effort to make `TOSTER` more informative and easier to use, I created the functions `t_TOST` and `simple_htest`. These functions operate very similarly to base R's `t.test` function with a few key enhancements: 1. `t_TOST` performs 3 t-tests simultaneously: - One traditional two-tailed test - Two one-sided tests to assess equivalence 2. `simple_htest` allows you to run equivalence testing or minimal effects testing using either t-tests or Wilcoxon-Mann-Whitney tests, with output in the familiar `htest` class format (like base R's statistical test functions). Both functions have versatile methods where you can supply: - Two vectors of data - A formula interface (e.g., `y ~ group`) - Various test configurations (two-sample, one-sample, or paired samples) The functions also provide enhanced summary information and visualizations that make interpreting results more intuitive and user-friendly. ## Beyond Equivalence: Minimal Effects Testing These functions are not limited to equivalence tests. Minimal effects testing (MET) is also supported, which is useful for situations where the hypothesis concerns a minimal effect and the null hypothesis *is* equivalence. This reverses the typical TOST paradigm and can be valuable in contexts where you want to demonstrate that an effect exceeds some minimal threshold. ```{r tostplots,echo=FALSE, message = FALSE, warning = FALSE, fig.show='hold'} ggplot() + geom_vline(aes(xintercept = -.5), linetype = "dashed") + geom_vline(aes(xintercept = .5), linetype = "dashed") + geom_text(aes( y = 1, x = -0.5, vjust = -.9, hjust = "middle" ), angle = 90, label = 'Lower Bound') + geom_text(aes( y = 1, x = 0.5, vjust = 1.5, hjust = "middle" ), angle = 90, label = 'Upper Bound') + geom_text(aes( y = 1, x = 0, vjust = 1.5, hjust = "middle" ), #alignment = "center", label = "H0" ) + geom_text(aes( y = 1, x = 1.5, vjust = 1.5, hjust = "middle" ), #alignment = "center", label = "H1" ) + geom_text(aes( y = 1, x = -1.5, vjust = 1.5, hjust = "middle" ), #alignment = "center", label = "H1" ) + theme_tidybayes() + scale_y_continuous(limits = c(0,1.75)) + scale_x_continuous(limits = c(-2,2)) + labs(x = "", y = "", title="Minimal Effect Test", caption = "H1 = Alternative Hypothesis \n H0 = Null Hypothesis") + theme( strip.text = element_text(face = "bold", size = 10), axis.text.y = element_blank(), axis.ticks.y = element_blank() ) ggplot() + geom_vline(aes(xintercept = -.5), linetype = "dashed") + geom_vline(aes(xintercept = .5), linetype = "dashed") + geom_text(aes( y = 1, x = -0.5, vjust = -.9, hjust = "middle" ), angle = 90, label = 'Lower Bound') + geom_text(aes( y = 1, x = 0.5, vjust = 1.5, hjust = "middle" ), angle = 90, label = 'Upper Bound') + geom_text(aes( y = 1, x = 0, vjust = 1.5, hjust = "middle" ), #alignment = "center", label = "H1" ) + geom_text(aes( y = 1, x = 1.5, vjust = 1.5, hjust = "middle" ), #alignment = "center", label = "H0" ) + geom_text(aes( y = 1, x = -1.5, vjust = 1.5, hjust = "middle" ), #alignment = "center", label = "H0" ) + theme_tidybayes() + scale_y_continuous(limits = c(0,1.75)) + scale_x_continuous(limits = c(-2,2)) + labs(x = "", y = "", title="Equivalence Test", caption = "H1 = Alternative Hypothesis \n H0 = Null Hypothesis") + theme( strip.text = element_text(face = "bold", size = 10), axis.text.y = element_blank(), axis.ticks.y = element_blank() ) ``` The plots above illustrate the conceptual difference between Minimal Effect Tests (left) and Equivalence Tests (right). Notice how the null (H0) and alternative (H1) hypotheses are reversed between these approaches. # Getting Started with Sample Data In this vignette, we'll use the built-in `iris` and `sleep` datasets to demonstrate various applications of TOST. ```{r} data('sleep') data('iris') ``` Let's take a quick look at the sleep data: ```{r} head(sleep) ``` The `sleep` dataset contains observations on the effect of two soporific drugs (`group` 1 and 2) on 10 patients, with the increase in hours of sleep (`extra`) as the response variable. # Independent Groups Analysis ## Basic Equivalence Testing For our first example, we'll test whether the effect of the two drugs in the `sleep` data is equivalent within bounds of ±0.5 hours. We'll use both the comprehensive `t_TOST` function and the simpler `simple_htest` approach. ### Using t_TOST ```{r} res1 = t_TOST(formula = extra ~ group, data = sleep, eqb = .5, # equivalence bounds of ±0.5 hours smd_ci = "t") # t-distribution for SMD confidence intervals # Alternative syntax with separate vectors res1a = t_TOST(x = subset(sleep, group==1)$extra, y = subset(sleep, group==2)$extra, eqb = .5) ``` The `eqb` parameter specifies our equivalence bounds (±0.5 hours of sleep). The `smd_ci = "t"` argument indicates that we want to use the t-distribution and the standard error of the standardized mean difference (SMD) to create SMD confidence intervals, which is computationally faster for demonstration purposes. ### Using simple_htest We can achieve similar results with the more concise `simple_htest` function: ```{r} # Simple htest approach res1b = simple_htest(formula = extra ~ group, data = sleep, mu = .5, # equivalence bound alternative = "e") # "e" for equivalence ``` The `alternative = "e"` specifies an equivalence test, and `mu = .5` sets our equivalence bound. ## Viewing Results Let's examine the output from both approaches: ```{r} # Comprehensive t_TOST output print(res1) # Concise htest output print(res1b) ``` Notice how `t_TOST` provides detailed information including both raw and standardized effect sizes, whereas `simple_htest` gives a more concise summary in the familiar format of base R's statistical tests. ## Visualizing Results One of the advantages of `t_TOST` is its built-in visualization capabilities. There are four types of plots available: ### 1. Simple Dot-and-Whisker Plot This is the default plot type, showing the point estimate and confidence intervals relative to the equivalence bounds: ```{r fig.width=6, fig.height=6} plot(res1, type = "simple") ``` This plot clearly shows where our observed difference (with confidence intervals) falls in relation to our equivalence bounds (dashed vertical lines). ### 2. Consonance Density Plot This plot shows the distribution of possible effect sizes, with shading for different confidence levels: ```{r fig.width=6, fig.height=6, eval=TRUE} # Shade the 90% and 95% CI areas plot(res1, type = "cd", ci_shades = c(.9, .95)) ``` The darker shaded region represents the 90% confidence interval, while the lighter region represents the 95% confidence interval. This visualization helps illustrate the precision of our estimate. ### 3. Consonance Plot This plot shows multiple confidence intervals simultaneously: ```{r fig.width=6, fig.height=6} plot(res1, type = "c", ci_lines = c(.9, .95)) ``` Here, we can see both the 90% and 95% confidence intervals represented as different lines. ### 4. Null Distribution Plot This plot visualizes the null distribution: ```{r fig.width=6, fig.height=6} plot(res1, type = "tnull") ``` This visualization is particularly useful for understanding the theoretical distribution under the null hypothesis. ## Describing Results in Plain Language Both `t_TOST` and `simple_htest` objects can be used with the `describe` or `describe_htest` functions to generate plain language interpretations of the results: ```{r, eval = FALSE} describe(res1) describe_htest(res1b) ``` > `r describe(res1)` > `r describe_htest(res1b)` These descriptions provide accessible interpretations of the statistical results, making it easier to understand and communicate findings. # Paired Samples Analysis Many study designs involve paired measurements (e.g., before-after designs or matched samples). Let's examine how to perform TOST with paired data. ## Using the Sleep Data with Paired Analysis We can use the same sleep data, but now treat the measurements as paired: ```{r} res2 = t_TOST(formula = extra ~ group, data = sleep, paired = TRUE, # specify paired analysis eqb = .5) res2 res2b = simple_htest( formula = extra ~ group, data = sleep, paired = TRUE, mu = .5, alternative = "e") res2b ``` Setting `paired = TRUE` changes the analysis to account for within-subject correlations. ## Using Separate Vectors for Paired Data Alternatively, we can provide two separate vectors for paired data, as demonstrated with the iris dataset: ```{r} res3 = t_TOST(x = iris$Sepal.Length, y = iris$Sepal.Width, paired = TRUE, eqb = 1) res3 res3a = simple_htest( x = iris$Sepal.Length, y = iris$Sepal.Width, paired = TRUE, mu = 1, alternative = "e" ) res3a ``` Here we're testing whether the difference between Sepal.Length and Sepal.Width is equivalent within ±1 unit. ## Minimal Effect Testing (MET) As mentioned earlier, sometimes we want to test for a minimal effect rather than equivalence. We can do this by changing the `hypothesis` argument to "MET": ```{r} res_met = t_TOST(x = iris$Sepal.Length, y = iris$Sepal.Width, paired = TRUE, hypothesis = "MET", # Change to minimal effect test eqb = 1, smd_ci = "t") res_met res_metb = simple_htest(x = iris$Sepal.Length, y = iris$Sepal.Width, paired = TRUE, mu = 1, alternative = "minimal.effect") res_metb ``` For `simple_htest`, we use `alternative = "minimal.effect"` to specify MET. The `smd_ci = "t"` in `t_TOST` specifies using the t-distribution method for calculating standardized mean difference confidence intervals (this is faster than the default "nct" method). ### Interpreting MET Results ```{r, eval = FALSE} describe(res_met) describe_htest(res_metb) ``` > `r describe(res_met)` > `r describe_htest(res_metb)` ### Warning: careful using narrow equivalence bounds Now, using MET isn't without some potential danger. If a user utilizes a very small equivalence bound (e.g., an SMD of 0.05) then there is the possibility that the p-value for the MET test will be smaller than the a typical nil-hypothesis two-tailed test. This raises the concern that someone could "hack" these functions in order to produce a "significant" result that alluded them when using a two-tailed test. A warning message has now been added to the TOST functions in this package to at least warn users about the hazards of using very narrow equivalence bounds. ```{r, error=TRUE} set.seed(221) dat1 = rnorm(30) dat2 = rnorm(30) test = t_TOST( x = dat1, y = dat2, eqbound_type = "SMD", eqb = .2, hypothesis = "MET" ) test ``` # One Sample t-test In some cases, we may want to compare a single sample against known bounds. For this, we can use a one-sample test: ```{r} res4 = t_TOST(x = iris$Sepal.Length, hypothesis = "EQU", eqb = c(5.5, 8.5), # lower and upper bounds smd_ci = "t") res4 ``` Here, we're testing whether the mean of Sepal.Length is equivalent within bounds of 5.5 and 8.5 units. Notice how we specify the bounds as a vector with two distinct values rather than a single value that gets applied symmetrically. # Working with Summary Statistics Only Researchers often only have access to summary statistics rather than raw data, especially when working with published literature. The `tsum_TOST` function allows you to perform TOST analyses using just summary statistics: ```{r} res_tsum = tsum_TOST( m1 = mean(iris$Sepal.Length, na.rm=TRUE), # sample mean sd1 = sd(iris$Sepal.Length, na.rm=TRUE), # sample standard deviation n1 = length(na.omit(iris$Sepal.Length)), # sample size hypothesis = "EQU", eqb = c(5.5, 8.5) ) res_tsum ``` Required arguments vary depending on the test type: * `n1 & n2`: the sample sizes (only n1 needed for one-sample tests) * `m1 & m2`: the sample means * `sd1 & sd2`: the sample standard deviations * `r12`: the correlation between paired samples (only needed if `paired = TRUE`) The same visualization and description methods work with `tsum_TOST`: ```{r fig.width=6, fig.height=6} plot(res_tsum) ``` ```{r} describe(res_tsum) ``` # Power Analysis for TOST Planning studies requires determining appropriate sample sizes. The `power_t_TOST` function allows for power calculations specifically designed for TOST analyses. This function uses more accurate methods than previous TOSTER functions and matches results from commercial software like PASS. The calculations are based on Owen's Q-function or direct integration of the bivariate non-central t-distribution^[Inspired by @PowerTOST in the `PowerTOST` R package. Please see this package for more options]. Approximate power is implemented via the non-central t-distribution or the 'shifted' central t-distribution [@phillips1990, @diletti1991]. The interface mimics base R's `power.t.test` function. You specify equivalence bounds and leave one parameter blank (`alpha`, `power`, or `n`) to solve for it: ```{r} power_t_TOST(n = NULL, delta = 1, # assumed true difference sd = 2.5, # assumed standard deviation eqb = 2.5, # equivalence bounds alpha = .025, # significance level power = .95, # desired power type = "two.sample") # test type ``` In this example, we're planning a two-sample equivalence study where: - We assume the true difference is at least 1 unit - The standard deviation is estimated to be 2.5 - We set equivalence bounds to ±2.5 units - We want 95% power with alpha of 0.025 The analysis indicates we need 74 participants per group (148 total) to achieve our desired power. For more advanced power analysis options, the `PowerTOST` R package provides additional functionality. # Conclusion The `t_TOST` and `simple_htest` functions in the TOSTER package provide flexible and user-friendly tools for performing equivalence testing and minimal effect testing. With support for various study designs, robust visualization capabilities, and plain language interpretations, these functions make it easier to apply and communicate these important statistical approaches. Whether you're analyzing raw data or working with summary statistics, TOSTER provides the tools needed to conduct and interpret these tests appropriately. # References