---
title: "Advanced usage of onlineFDR"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 2
vignette: >
  %\VignetteIndexEntry{Advanced usage of onlineFDR}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>"
)

library(onlineFDR)

sample.df <- data.frame(
    id = c('A15432', 'B90969', 'C18705', 'B49731', 'E99902',
        'C38292', 'A30619', 'D46627', 'E29198', 'A41418',
        'D51456', 'C88669', 'E03673', 'A63155', 'B66033'),
    date = as.Date(c(rep("2014-12-01",3),
                    rep("2015-09-21",5),
                    rep("2016-05-19",2),
                    "2016-11-12",
                    rep("2017-03-27",4))),
    pval = c(2.90e-14, 0.06743, 0.01514, 0.08174, 0.00171,
            3.61e-05, 0.79149, 0.27201, 0.28295, 7.59e-08,
            0.69274, 0.30443, 0.000487, 0.72342, 0.54757))

set.seed(1)
```

## Brief Background of the `onlineFDR` algorithms

Javanmard and Montanari proposed two procedures, LOND and LORD, to control the
FDR in an online manner (Javanmard and Montanari (2015, 2018)), with the latter
extended by Ramdas *et al.* (2017). The LOND procedure sets the adjusted
significance thresholds based on the number of discoveries made so far, while
LORD sets them according to the time of the most recent discovery. Ramdas
*et al.* (2018) then proposed the SAFFRON procedure, which provides an adaptive
method of online FDR control. They also proposed a variant of the
Alpha-investing algorithm of Foster and Stine (2008) that guarantees FDR
control, using SAFFRON's update rule.

Subsequently, Zrnic *et al.* (2021) proposed procedures to control the modified
FDR (mFDR) in the context of *asynchronous* testing, i.e. where each hypothesis
test can itself be a sequential process and the tests can overlap in time. They
presented asynchronous versions of the LOND, LORD and SAFFRON procedures for a
variety of trial settings. For both synchronous and asynchronous testing, Tian
& Ramdas (2019) proposed the ADDIS algorithms, which compensate for the loss
in power in the presence of conservative nulls by adaptively 'discarding' the
corresponding p-values.

Finally, Tian & Ramdas (2021) proposed procedures that provide online control
of the FWER. One procedure, online fallback, gives a uniform improvement to the
naive Alpha-spending procedure (see below). The ADDIS-spending procedure
compensates for the power loss of these procedures by adapting to both the
fraction of null hypotheses and the conservativeness of nulls.

## Variations to the default options

In this section, we consider the arguments that a typical user might want to
amend for their analysis.

### Common arguments

As a default, the `alpha` argument is set to 0.05, where `alpha` sets the
overall significance level of the FDR or FWER controlling procedure. By
convention, the standard significance level used is 5%. However, there
are applications where an alternate threshold could be considered. For example,
a more stringent threshold might be appropriate when there are limited resources
to follow up significant findings. A less stringent threshold might be
appropriate when the downstream analysis is a global analysis which can tolerate
a higher proportion of false positives.
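
As an illustration, the sketch below reruns LOND on `sample.df` (the example
data frame created in the setup chunk of this vignette) with a more stringent
`alpha` and compares the adjusted significance thresholds with those obtained
at the default level. The value 0.01 is an arbitrary choice for demonstration
only, not a recommendation.

```{r}
library(onlineFDR)

# sample.df is the example data frame defined in the setup chunk
set.seed(1); results.default <- LOND(sample.df)               # alpha = 0.05
set.seed(1); results.strict <- LOND(sample.df, alpha = 0.01)  # more stringent

# compare adjusted significance thresholds
cbind(alpha.05 = results.default$alphai, alpha.01 = results.strict$alphai)
```

Because the same seed is used for both calls, the p-values are processed in the
same order, so the two columns differ only through the choice of `alpha`.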

To ensure that the dates provided are interpreted correctly, there is a
`date.format` argument. As a default, dates are expected in year-month-day
order, i.e. `"%Y-%m-%d"` (for example `2014-12-01`). The following website
provides clear guidance on the symbols used to specify a date format:
<https://www.statmethods.net/input/dates.html>
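
The base R sketch below shows how such format symbols are read, using made-up
dates recorded as day/month/year. It assumes that `date.format` takes the same
symbols as `as.Date()`, in which case these dates would be described by
`date.format = "%d/%m/%Y"`.

```{r}
# hypothetical dates recorded as day/month/year
raw.dates <- c("01/12/2014", "21/09/2015", "19/05/2016")

# %d = day of the month, %m = month as a number, %Y = four-digit year
parsed <- as.Date(raw.dates, format = "%d/%m/%Y")
parsed

# dates such as "2014-12-01" are read with "%Y-%m-%d"
as.Date("2014-12-01", format = "%Y-%m-%d")
```

Checking the format string with `as.Date()` first is a cheap way to spot a
swapped day and month before running an analysis.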

As a default, the `random` argument is set to `TRUE`. In this situation, the
order of the p-values within each batch (i.e. those with the same date) is
randomised. This avoids the risk of p-values being ordered post-hoc, which can
lead to an inflation of the FDR. As the dataset grows, the data are
reprocessed. To ensure the consistency of the output (with the randomisation
within the previous batches remaining the same), it is necessary to set the
same `seed` for all analyses.
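
The base R sketch below illustrates why a fixed seed keeps the ordering of
earlier batches stable when new batches arrive. It is a conceptual illustration
only and not the package's internal code: the helper `shuffle_within_batches()`
and the batch labels are hypothetical.

```{r}
# shuffle positions within each batch, keeping the batches themselves in order
shuffle_within_batches <- function(batch, seed) {
    set.seed(seed)
    order(batch, runif(length(batch)))
}

batch <- c(1, 1, 1, 2, 2)                        # two batches so far
first.run <- shuffle_within_batches(batch, seed = 1)

# a third batch arrives and everything is reprocessed with the same seed
later.run <- shuffle_within_batches(c(batch, 3, 3, 3), seed = 1)

# the ordering of the first two batches is unchanged
identical(first.run, later.run[seq_along(first.run)])
```

With a different seed in the second call, the first five positions would in
general no longer agree, and earlier decisions could change.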

The user also has the option to turn off the randomisation step, by setting the 
`random` argument to `FALSE`. This approach would be appropriate if the user 
has both a date *and* a time stamp for the p-values, in which case the data
should be ordered by date and time beforehand and then passed to a wrapper
function. Another scenario would be when p-values within the batches are
ordered using *independent* side information, so that hypotheses most likely to
be rejected come first, which would potentially increase the power of the
procedure (see Javanmard and Montanari (2018) and Li and Barber (2017)). 
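
As a minimal sketch of the first scenario, the base R code below orders a
made-up data frame by a combined date and time stamp. The column name
`timestamp` and all values are hypothetical. The ordered data frame could then
be passed to a wrapper function with `random = FALSE`.

```{r}
# hypothetical p-values with a full date and time stamp
stamped <- data.frame(
    id = c("A", "B", "C", "D"),
    timestamp = as.POSIXct(c("2015-09-21 14:30:00", "2014-12-01 09:00:00",
                            "2015-09-21 08:15:00", "2014-12-01 16:45:00"),
                            tz = "UTC"),
    pval = c(0.012, 0.300, 0.001, 0.048))

# order by date and time, then derive the date column
ordered <- stamped[order(stamped$timestamp), ]
ordered$date <- as.Date(ordered$timestamp)
ordered[, c("id", "date", "pval")]
```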


### LOND

As a default, the `dep` argument is set to `FALSE`. Setting it to `TRUE`
instead implements the version of LOND that guarantees FDR control for
arbitrarily dependent p-values. This version will in general be more
conservative.

```{r}
set.seed(1); results.indep <- LOND(sample.df)    # for independent p-values
set.seed(1); results.dep <- LOND(sample.df, dep=TRUE)   # for dependent p-values

# compare adjusted significance thresholds
cbind(independent = results.indep$alphai, dependent = results.dep$alphai)

```

The vector `betai` is supplied by default, but can optionally be specified by the
user (as described above, see the formula for $\beta_j$ [here](#LOND_beta)).
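
A hedged sketch of supplying a custom `betai` follows. The sequence used here
(proportional to $1/j^2$) is purely illustrative. It assumes that `betai`
should have one non-negative entry per p-value and sum to no more than `alpha`.

```{r}
library(onlineFDR)

alpha <- 0.05
j <- seq_len(nrow(sample.df))

# illustrative sequence: non-negative, and sums to less than alpha
betai.custom <- alpha * 6 / (pi^2 * j^2)
sum(betai.custom) <= alpha

set.seed(1); results.custom <- LOND(sample.df, betai = betai.custom)
results.custom$alphai
```

A sequence that decays this quickly spends most of the error budget on the
earliest hypotheses, which suits a short, front-loaded testing stream.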


### LORD

The default version of LORD used is version '++', but the user can optionally
specify versions 3, 'discard' and 'dep' using the `version` argument (see
[here](#LORD) for further details about the different versions).

```{r}
set.seed(1); results.LORD.plus <- LORD(sample.df)
set.seed(1); results.LORD3 <- LORD(sample.df, version=3)
set.seed(1); results.LORD.discard <- LORD(sample.df, version='discard')
set.seed(1); results.LORD.dep <- LORD(sample.df, version='dep') 

# compare adjusted significance thresholds
cbind(LORD.plus = results.LORD.plus$alphai,
    LORD3 = results.LORD3$alphai,
    LORD.discard  = results.LORD.discard$alphai,
    LORD.dep = results.LORD.dep$alphai)

```

By default $w_0 = \alpha/10$ and (for LORD 3 and LORD dep) $b_0 = \alpha - w_0$,
but these parameters can optionally be specified by the user subject to the
requirements that $0 \leq w_0 \leq \alpha$, $b_0 > 0$ and $w_0+b_0 \leq \alpha$.
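
For example, the sketch below checks a pair of illustrative values against
these requirements and then passes them to LORD 3. The values 0.01 and 0.03
are arbitrary, and the argument names `w0` and `b0` are assumed to correspond
to $w_0$ and $b_0$.

```{r}
library(onlineFDR)

alpha <- 0.05
w0 <- 0.01   # illustrative initial wealth
b0 <- 0.03   # illustrative payout for each discovery

# check the requirements stated above before running the procedure
stopifnot(w0 >= 0, w0 <= alpha, b0 > 0, w0 + b0 <= alpha)

set.seed(1); results.LORD3.default <- LORD(sample.df, version = 3)
set.seed(1); results.LORD3.custom <- LORD(sample.df, version = 3,
                                        w0 = w0, b0 = b0)

# compare adjusted significance thresholds
cbind(default = results.LORD3.default$alphai,
    custom = results.LORD3.custom$alphai)
```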

The value of `gammai` is also supplied by default, but can optionally be
specified by the user (as described above, see the formula for $\gamma_j$
[here](#LORDdep_xi) for `version='dep'` and [here](#LORD_gamma) for all other
versions of LORD).


### SAFFRON

By default $w_0 = \alpha/2$ and $\lambda = 0.5$, but these parameters can
optionally be specified by the user subject to the requirements that
$0 \leq w_0 \leq \alpha$ and $0 < \lambda < 1$. The values of `gammai` are also
supplied by default, but can optionally be specified by the user (as described
above, see the formula for $\gamma_j$ [here](#SAFFRON_gamma)).
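
The sketch below compares the default settings with one illustrative
alternative. The values `w0 = 0.01` and `lambda = 0.3` are arbitrary choices
that satisfy the requirements above, and the argument names are assumed to
match the symbols $w_0$ and $\lambda$.

```{r}
library(onlineFDR)

set.seed(1); results.SAFFRON <- SAFFRON(sample.df)
set.seed(1); results.SAFFRON.custom <- SAFFRON(sample.df, w0 = 0.01,
                                            lambda = 0.3)

# compare adjusted significance thresholds
cbind(default = results.SAFFRON$alphai,
    custom = results.SAFFRON.custom$alphai)
```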


### ADDIS

By default $w_0 = \alpha/2$, $\tau = 0.5$ and $\lambda = 0.25$, but these
parameters can optionally be specified by the user subject to the requirements
that $0 \leq w_0 < \alpha$, $0 < \tau < 1$ and $0 < \lambda < \tau$.
The values of `gammai` are also supplied by default, but can optionally be
specified by the user.
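
A brief sketch with illustrative values of $\lambda$ and $\tau$ is given below.
It assumes that `ADDIS()` accepts a plain vector of p-values, which are then
tested in the order given, and that the arguments are named `lambda` and `tau`.

```{r}
library(onlineFDR)

# p-values passed as a vector, tested in the order listed
results.ADDIS <- ADDIS(sample.df$pval)
results.ADDIS.custom <- ADDIS(sample.df$pval, lambda = 0.2, tau = 0.6)

# compare adjusted significance thresholds
cbind(default = results.ADDIS$alphai,
    custom = results.ADDIS.custom$alphai)
```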


### Alpha-spending and online fallback

The values of `gammai` are supplied by default, but can optionally
be specified by the user.
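
To show the role that `gammai` plays, the base R sketch below applies the
Alpha-spending rule by hand: hypothesis $i$ is tested at level
$\alpha \gamma_i$, where the $\gamma_i$ are non-negative and sum to at most
one. The $\gamma_i$ used here are an illustrative choice rather than the
package default, and the p-values are the first five of `sample.df` in the
order listed.

```{r}
alpha <- 0.05
pval <- c(2.90e-14, 0.06743, 0.01514, 0.08174, 0.00171)

# illustrative sequence: non-negative, and sums to less than one
j <- seq_along(pval)
gammai.custom <- 6 / (pi^2 * j^2)

# each hypothesis is tested at its own share of alpha
alphai <- alpha * gammai.custom
R <- as.numeric(pval <= alphai)
data.frame(pval, alphai, R)
```

Only the first hypothesis is rejected here. The fifth p-value (0.00171) is
small, but by then the threshold has shrunk to roughly 0.0012, which shows how
the choice of `gammai` affects power later in the sequence.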


### ADDIS-spending

By default $\lambda = 0.25$ and $\tau = 0.5$, but these
parameters can optionally be specified by the user subject to the requirements
that $\lambda < \tau$, $0 < \lambda < 1$ and $0 < \tau < 1$.
The values of `gammai` are also supplied by default, but can optionally be
specified by the user.
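
A short sketch with illustrative parameter values follows. As with the ADDIS
example, it assumes that `ADDIS_spending()` accepts a plain vector of p-values
and that the arguments are named `lambda` and `tau`.

```{r}
library(onlineFDR)

lambda <- 0.2   # illustrative values satisfying the requirements above
tau <- 0.6
stopifnot(lambda < tau, lambda > 0, lambda < 1, tau > 0, tau < 1)

results.spending <- ADDIS_spending(sample.df$pval, lambda = lambda, tau = tau)
results.spending$alphai
```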

### Asynchronous testing

As outlined in the background section, Zrnic *et al.* (2021) presented
asynchronous versions of the LOND, LORD and SAFFRON procedures that control the
modified FDR (mFDR). These cover a variety of trial settings, including the
following:

1: **Asynchronous online mFDR control**: 
This is for an asynchronous testing process, consisting of tests that start and
finish at (potentially) random times. The discretised finish times of the tests
correspond to the decision times.

2: **Online mFDR control under local dependence**: 
For any $t>0$ we allow the p-value $p_t$ to have arbitrary dependence on the
previous $L_t$ p-values. The fixed sequence $L_t$ is referred to as 'lags'.

3: **mFDR control in asynchronous mini-batch testing**: 
A mini-batch represents a grouping of tests run asynchronously which
result in dependent p-values. Once a mini-batch of tests is fully completed,
a new one can start, testing hypotheses independent of the previous batch.
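
The base R sketch below shows how the three settings can be described as data.
All values are made up. It assumes that test $t$ starts at step $t$, and it
treats an earlier test as still in progress at step $t$ when its decision time
is at or after $t$.

```{r}
# 1: decision times; test t starts at step t and finishes at decision.times[t]
decision.times <- c(2, 5, 4, 5, 7, 7)
in.progress <- sapply(seq_along(decision.times), function(t) {
    sum(decision.times[seq_len(t - 1)] >= t)
})
in.progress   # number of earlier tests still running when test t starts

# 2: lags; p-value t may depend on the previous lags[t] p-values
lags <- c(0, 1, 2, 1, 0, 1)
dependent.on <- lapply(seq_along(lags), function(t) {
    seq_len(lags[t]) + t - lags[t] - 1
})
dependent.on[[3]]   # the third p-value may depend on tests 1 and 2

# 3: mini-batches; batch sizes determine which tests belong together
batch.sizes <- c(2, 3, 1)
batch.id <- rep(seq_along(batch.sizes), times = batch.sizes)
batch.id
```

In each case the extra information identifies which earlier tests cannot yet
be used, or cannot be treated as independent, when the next significance
threshold is set.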