Calculating a static, delay-adjusted estimate of disease severity

Understanding disease severity, and especially the case fatality risk (CFR), is key to outbreak response. During an outbreak there is often a delay between cases being reported, and the outcomes (for CFR, deaths) of those cases being known, and accounting for these leads to better estimates of CFR. cfr_static() can be used to calculate a static estimate of the severity of an outbreak using methods from Nishiura et al. (2009) while accounting for the distribution of reporting delays.

New to calculating disease severity using cfr? You might want to see the “Get started” vignette first.

Use case

We want a static estimate of the severity of an outbreak in the form of the case fatality risk (CFR) while correcting for the delay in reporting the outcomes of cases.

What we have

First load cfr and packages to access and plot data.

Code
library(cfr)

# packages to wrangle and plot data
library(dplyr)

Severity of the 1976 Ebola Outbreak

This example data comes from the 1976 Ebola virus disease (EVD, or Ebola) outbreak in the Democratic Republic of the Congo (Camacho et al. 2014).

We focus on the roughly the first half of this dataset, by subsetting the data so that we only include days before 30th September, 1976.

Code
data("ebola1976")

# view the first few rows
head(ebola1976)
#>         date cases deaths
#> 1 1976-08-25     1      0
#> 2 1976-08-26     0      0
#> 3 1976-08-27     0      0
#> 4 1976-08-28     0      0
#> 5 1976-08-29     0      0
#> 6 1976-08-30     0      0

df_ebola_subset <- filter(ebola1976, date <= "1976-09-30")

Onset-to-death delay distribution

We retrieve the parameters of the distribution of durations (also called delays) between the onset of EVD symptoms and death from the literature (Barry et al. 2018). This is a Gamma distribution with shape \(k\) = 2.40 and scale \(\theta\) = 3.33.

Note that while we shall use a continuous distribution here as we do not expect this difference to bias the estimates excessively, it is more appropriate to use a discretised distribution instead as we are working with daily data. See the vignette on delay distributions for more on using discretised distributions with cfr.

Note also that we use the central estimates for each distribution parameter, and by ignoring uncertainty in these parameters the uncertainty in the resulting CFR is likely to be underestimated.

Intermediate step: Estimating expected deaths

The function estimate_outcomes() from the cfr package can estimate the number of deaths expected given a time-series of case onset and reported deaths and a delay density function.

The resulting data frame contains two new columns, “estimated_outcomes”, for the number of deaths expected for each day, and “u_t” for the ratio of cumulative number of estimated deaths and the cumulative number of cases reported until each date specified in data.

Code
# calculate known death outcomes
df_estimated_outcomes_ebola <- estimate_outcomes(
  data = df_ebola_subset,
  delay_density = function(x) dgamma(x, shape = 2.40, scale = 3.33)
)

# print head of data frame
head(df_estimated_outcomes_ebola)
#>         date cases deaths estimated_outcomes        u_t
#> 1 1976-08-25     1      0         0.00000000 0.00000000
#> 2 1976-08-26     0      0         0.03323030 0.03323030
#> 3 1976-08-27     0      0         0.06494676 0.09817706
#> 4 1976-08-28     0      0         0.08485286 0.18302992
#> 5 1976-08-29     0      0         0.09400738 0.27703731
#> 6 1976-08-30     0      0         0.09515185 0.37218916

# print tail of data frame
tail(df_estimated_outcomes_ebola)
#>          date cases deaths estimated_outcomes       u_t
#> 32 1976-09-25    11      8           7.787856 0.5283224
#> 33 1976-09-26     7     11           8.301252 0.5563894
#> 34 1976-09-27     7      7           8.625233 0.5840532
#> 35 1976-09-28     4     13           8.761617 0.6207698
#> 36 1976-09-29     4     12           8.659704 0.6552761
#> 37 1976-09-30     3      9           8.371787 0.6904737

The estimated deaths are greater than the true deaths at the beginning of the outbreak, as infected individuals who have not yet died are expected to die in the coming days.

Similarly, the estimated deaths later in the outbreak are similar to or less than the reported deaths, as the deaths of individuals who survived for longer are now reported.

Note that estimate_outcomes() is exported but is primarily intended for internal use.

Estimating the naive and corrected CFR

The function cfr_static() wraps the internal functions estimate_outcomes() and estimate_severity() to provide a static CFR by automatically correcting for reporting delays if a delay density function is provided.

Code
# calculating the naive CFR
cfr_static(
  data = df_ebola_subset
)
#>   severity_mean severity_low severity_high
#> 1     0.7197802    0.6485503     0.7836984

# calculating the corrected CFR
cfr_static(
  df_ebola_subset,
  delay_density = function(x) dgamma(x, shape = 2.40, scale = 3.33)
)
#>   severity_mean severity_low severity_high
#> 1             1        0.868             1

Severity of the COVID-19 pandemic in the U.K.

This example shows static severity estimation using cfr and data from the Covid-19 pandemic in the United Kingdom.

We load example Covid-19 daily case and death data provided with the cfr package as covid_data, and subset for the first year of U.K. data.

Code
# get Covid data loaded with the package
data("covid_data")

# filter for the U.K
df_covid_uk <- filter(
  covid_data,
  country == "United Kingdom", date <= "2020-12-31"
)

# View the first few rows and recall necessary columns: date, cases, deaths
head(df_covid_uk)
#>         date        country cases deaths
#> 1 2020-01-03 United Kingdom     0      0
#> 2 2020-01-04 United Kingdom     0      0
#> 3 2020-01-05 United Kingdom     0      0
#> 4 2020-01-06 United Kingdom     0      0
#> 5 2020-01-07 United Kingdom     0      0
#> 6 2020-01-08 United Kingdom     0      0

Onset-to-death distribution for Covid-19

We retrieve the appropriate distribution for Covid-19 from Linton et al. (2020); this is a lognormal distribution with \(\mu\) = 2.577 and \(\sigma\) = 0.440.

Note that Linton et al. (2020) fitted a discretised lognormal distribution and we use a continuous distribution, and that we are ignoring uncertainty in the distribution parameters and likely under-estimating uncertainty in the CFR.

Estimating the naive and corrected CFR

Finally, we calculate the naive and corrected CFRs for the Covid-19 pandemic in the U.K.

Code
# calculating the naive CFR
cfr_static(
  df_covid_uk
)
#>   severity_mean severity_low severity_high
#> 1    0.03640132   0.03617238     0.0366313

# calculating the corrected CFR
cfr_static(
  df_covid_uk,
  delay_density = function(x) dlnorm(x, meanlog = 2.577, sdlog = 0.440)
)
#>   severity_mean severity_low severity_high
#> 1         0.046        0.046         0.047

Details: Adjusting for delays between two time series

cfr_static() follows Nishiura et al. (2009) to calculate a quantity \(u_t\) for each day within the input data which represents the proportion of cases with a known adverse outcome (usually death) on day \(t\).

\[ u_t = \dfrac{\sum_{i = 0}^t \sum_{j = 0}^\infty c_i f_{j - i}}{\sum_{i = 0} c_i}, \]

where \(f_t\) is the value of the probability mass function at time \(t\), and \(c_t\), \(d_t\) are the number of new cases and new deaths at time \(t\) (respectively). We then use \(u_t\) in the following likelihood function to estimate severity.

\[ {\sf L}(\theta | y) = \log{\dbinom{u_tC}{D}} + D \log{\theta} + (u_tC - D)\log{(1.0 - \theta)}, \]

\(C\) and \(D\) are the cumulative number of cases and deaths (respectively) until time \(t\).

Lastly \(\theta\) (severity) is estimated \(\theta\) using simple maximum-likelihood methods, allowing the functions within this package to be quick and easy tools to use.

The precise severity measure — CFR, IFR, HFR, etc — that \(\theta\) represents depends upon the input data given by the user.


References

Barry, Ahmadou, Steve Ahuka-Mundeke, Yahaya Ali Ahmed, Yokouide Allarangar, Julienne Anoko, Brett Nicholas Archer, Aaron Aruna Abedi, et al. 2018. “Outbreak of Ebola virus disease in the Democratic Republic of the Congo, April–May, 2018: an epidemiological study.” The Lancet 392 (10143): 213–21. https://doi.org/10.1016/S0140-6736(18)31387-4.
Camacho, A., A. J. Kucharski, S. Funk, J. Breman, P. Piot, and W. J. Edmunds. 2014. “Potential for Large Outbreaks of Ebola Virus Disease.” Epidemics 9 (December): 70–78. https://doi.org/10.1016/j.epidem.2014.09.003.
Linton, Natalie M., Tetsuro Kobayashi, Yichi Yang, Katsuma Hayashi, Andrei R. Akhmetzhanov, Sung-mok Jung, Baoyin Yuan, Ryo Kinoshita, and Hiroshi Nishiura. 2020. “Incubation Period and Other Epidemiological Characteristics of 2019 Novel Coronavirus Infections with Right Truncation: A Statistical Analysis of Publicly Available Case Data.” Journal of Clinical Medicine 9 (2): 538. https://doi.org/10.3390/jcm9020538.
Nishiura, Hiroshi, Don Klinkenberg, Mick Roberts, and Johan A. P. Heesterbeek. 2009. “Early Epidemiological Assessment of the Virulence of Emerging Infectious Diseases: A Case Study of an Influenza Pandemic.” PLOS ONE 4 (8): e6852. https://doi.org/10.1371/journal.pone.0006852.