Using the daily incidence curve and a collection of festive or anomalous days, EpiInvert estimates a time-varying reproduction number and a restored incidence curve by inverting the renewal equation:
\((1) \ \ \ \ i_t=\sum_k i_{t-k}R_{t-k}\Phi_k\)
through a variational model as described in PNAS, 2021 and Biology, 2022. See Rt comparison for a comparison with other methods that compute the reproduction number.
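As an illustration of the forward direction of equation (1), the following minimal sketch computes the right-hand side of the renewal equation from a given incidence, reproduction number and serial interval. The function `renewal_expected` and its argument `k_min` are hypothetical names introduced here, not part of the EpiInvert API, and the numbers are made up.

```r
# Right-hand side of the renewal equation: sum_k i_{t-k} R_{t-k} Phi_k.
# Assumption: phi[1] holds Phi_{k_min}; k_min may be negative when the
# serial interval is shifted, in which case future days are also needed.
renewal_expected <- function(i, R, phi, k_min = 1) {
  n <- length(i)
  stopifnot(length(R) == n)
  expected <- rep(NA_real_, n)
  ks <- k_min + seq_along(phi) - 1          # lags k covered by phi
  for (t in seq_len(n)) {
    idx <- t - ks                           # days t - k entering the sum
    if (all(idx >= 1 & idx <= n)) {         # only when all days are available
      expected[t] <- sum(i[idx] * R[idx] * phi)
    }
  }
  expected
}

# Toy data (made up for illustration)
phi <- c(0.2, 0.5, 0.3)                     # Phi_1, Phi_2, Phi_3
i   <- c(10, 12, 15, 18, 20)
R   <- rep(1.1, 5)
expected_i <- renewal_expected(i, R, phi)
expected_i
```

The first three entries are `NA` because the sum needs three past days. EpiInvert solves the much harder inverse problem of recovering R from the incidence.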
A festive or anomalous day is any day on which we know a priori that the registered number of cases is biased. Typically, on those days one observes a sharp decrease in the number of registered cases, which is compensated by increased numbers in the next few days. This bias is corrected by redistributing the cases among the festive day and the next 2 days.
On top of the festive day bias, there is a strong administrative weekly bias introduced by the way countries register new cases on each day of the week. This weekly bias is corrected using 7-day quasi-periodic multiplicative correction factors. We use the following notation:
\((2) \ \ \ \ i^f_t \ \ \text{is the festive day bias free incidence}\)
\((3) \ \ \ \ q_t \ \ \text{ is the 7-day quasi-periodic multiplicative correction factor}\)
\((4) \ \ \ \ i^b_t=i^f_tq_t \ \ \text{is the festive + weekly biases free incidence}\)
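To make the role of the multiplicative factor in equation (4) concrete, here is a toy sketch. It is not the EpiInvert estimator: EpiInvert estimates q_t within the variational model and allows it to be quasi-periodic, whereas this example assumes a strictly periodic reporting pattern and uses a crude weekday average. All values are synthetic.

```r
# Synthetic festive-bias-free incidence with an artificial weekly pattern
true_level <- rep(1000, 28)                          # flat underlying level
weekly     <- c(1.2, 1.1, 1.0, 1.0, 0.9, 0.5, 0.3)   # made-up reporting bias
i_f        <- true_level * rep(weekly, 4)

# Crude correction factor: inverse of the relative mean level per weekday
weekday <- rep(1:7, length.out = length(i_f))
rel     <- tapply(i_f, weekday, mean) / mean(i_f)
q       <- as.numeric(1 / rel[weekday])

# Equation (4): incidence free of the weekly bias
i_b <- i_f * q
range(i_b)
```

In this idealized case the corrected curve is constant and equal to the mean of the biased curve, so the weekly oscillation disappears while the total number of cases is preserved.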
Once the festive day and weekly biases are corrected in the incidence curve, the difference between the incidence curve and its expected value using the renewal equation is modeled by
\((5) \ \ \ \ i^b_{t}=i^r_t+\varepsilon_{t}(i_{t}^r)^a\)
where
\((6) \ \ \ \ i^r_t=\sum_k i^b_{t-k}R_{t-k}\Phi_k.\)
In a nutshell, the proposed variational model is based on estimating all the variables involved in order to minimize the difference between the incidence and its expected value using the renewal equation.
The power a in equation (5) is computed experimentally by linear regression (in t) applied to
\((7) \ \ \ \ (\log(|i^b_{t}-i^r_t|),\log(i^r_t)).\)
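The regression in (7) can be sketched on synthetic data as follows. This is only an illustration under stated assumptions: the restored incidence `i_r`, the true exponent `a_true` and the noise (drawn here from a standard normal purely for convenience) are all made up, and the code is not taken from EpiInvert.

```r
set.seed(1)

# Synthetic restored incidence spanning several orders of magnitude
i_r    <- exp(seq(log(100), log(50000), length.out = 300))
a_true <- 0.8                                # exponent assumed for the simulation
eps    <- rnorm(length(i_r))                 # illustrative noise only
i_b    <- i_r + eps * i_r^a_true             # model of equation (5)

# Regress log|i_b - i_r| on log(i_r); the slope estimates the power a
fit   <- lm(log(abs(i_b - i_r)) ~ log(i_r))
a_hat <- unname(coef(fit)[2])
a_hat
```

The slope recovers a value close to `a_true` because taking logarithms of the absolute residual in (5) gives log|eps| + a log(i_r), which is linear in log(i_r).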
The normalized error of the model is given by
\((8) \ \ \ \ \varepsilon_t=\frac{i^b_t-i^r_t}{(i^r_t)^a}.\)
In Biology, 2022, it is shown experimentally that this normalized error is well approximated by an exponentially distributed white noise.
To model the serial interval, EpiInvert allows a shifted log-normal parametric formulation. The shift can be negative, reflecting the fact that secondary cases may present symptoms earlier than the primary case. The user can also provide a non-parametric serial interval given by a numeric vector.
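A shifted log-normal serial interval can be discretized along the following lines. This is a sketch under assumptions that the text does not confirm: that the mean and standard deviation refer to the log-normal before shifting, that the density is sampled at integer days, and that 40 days of support are enough. The helper `shifted_lognormal_si` is a hypothetical name; EpiInvert's internal discretization may differ.

```r
# Hypothetical helper: discretized shifted log-normal serial interval
shifted_lognormal_si <- function(mean_si, sd_si, shift_si, max_len = 40) {
  # Log-normal parameters matching the requested mean and standard deviation
  sdlog   <- sqrt(log(1 + (sd_si / mean_si)^2))
  meanlog <- log(mean_si) - sdlog^2 / 2
  # Support starts at the (possibly negative) shift
  k    <- shift_si + seq_len(max_len) - 1
  dens <- dlnorm(k - shift_si, meanlog, sdlog)
  data.frame(k = k, phi = dens / sum(dens))  # normalize to sum to 1
}

si <- shifted_lognormal_si(mean_si = 11, sd_si = 6, shift_si = -1)
head(si)
```

With a negative shift the first entries of `k` are negative, which is how the vector can represent secondary cases that show symptoms before the primary case.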
You can install the development version of EpiInvert from GitHub with:
install.packages("devtools")
devtools::install_github("lalvarezmat/EpiInvert")
We attach some required packages:
library(EpiInvert)
library(ggplot2)
library(dplyr)
library(grid)
Loading stored data on COVID-19 daily incidence up to 2022-05-05 for France, Germany, the USA and the UK:
data(incidence)
tail(incidence)
#> date FRA DEU USA UK
#> 828 2022-04-30 49482 11718 23349 0
#> 829 2022-05-01 36726 4032 16153 0
#> 830 2022-05-02 8737 113522 81644 32
#> 831 2022-05-03 67017 106631 61743 35518
#> 832 2022-05-04 47925 96167 114308 16924
#> 833 2022-05-05 44225 85073 72158 12460
Loading some festive days for the same countries:
data(festives)
head(festives)
#> USA DEU FRA UK
#> 1 2020-01-01 2020-01-01 2020-01-01 2020-01-01
#> 2 2020-01-20 2020-04-10 2020-04-10 2020-04-10
#> 3 2020-02-17 2020-04-13 2020-04-13 2020-04-13
#> 4 2020-05-25 2020-05-01 2020-05-01 2020-05-08
#> 5 2020-06-21 2020-05-21 2020-05-08 2020-05-25
#> 6 2020-07-03 2020-06-01 2020-05-21 2020-06-21
We show the execution of EpiInvert using the data for Germany. The first parameter is a numeric vector with the daily incidence, the second parameter is the date of the last incidence value, and the third parameter is a character vector with the festive days (this parameter is optional):
res <- EpiInvert(incidence$DEU,"2022-05-05",festives$DEU)
Plotting the results:
EpiInvert_plot(res)
EpiInvert returns a list with the following elements:
i_original : the original daily incidence curve. Notice that EpiInvert does not allow missing values. On days when a country does not report data, a zero must be registered as the value associated with the incidence of that day.
i_festive : the festive days bias free incidence (see equation (2)).
i_bias_free : the festive days and weekly biases free incidence (see equation (4)).
i_restored : the restored incidence (see equation (6)).
Rt : time varying reproduction number.
Rt_CI95 : to estimate Rt on each day t, EpiInvert uses the past days (t'<=t) and the future days (t'>t) when available. Therefore, the EpiInvert estimate of Rt changes as more days become available. Rt_CI95 represents the radius of an empirical 95% confidence interval of the expected variation of Rt as a function of the number of days available after t (see the plot of Rt above).
seasonality : the 7-day quasi-periodic multiplicative correction factors (see equations (3)-(4)).
dates : the date associated with each incidence value.
festive : a Boolean vector to indicate the days considered as festive.
epsilon : the normalized error given by equation (8).
power_a : the power a in equation (8). Note that this value strongly depends on the size of the incidence curve used by the EpiInvert estimation. As we shall see later, this size is an EpiInvert parameter. The estimated value of a is only meaningful for a long incidence sequence.
si_distr : numeric vector with the serial interval distribution used. It can be computed from the shifted parametric log-normal or provided by the user.
shift_si_distr : shift of the serial interval used.
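As a small usage sketch, the elements above can be combined into a table of Rt with its empirical interval. The helper `rt_table` is a hypothetical name, the code assumes that `dates` can be converted with `as.Date`, and the `mock` list is a made-up stand-in with the same element names so that the example runs without calling EpiInvert.

```r
# Hypothetical helper: Rt with the interval Rt +/- Rt_CI95 (Rt_CI95 is a radius)
rt_table <- function(res) {
  data.frame(
    date  = as.Date(res$dates),
    Rt    = res$Rt,
    lower = res$Rt - res$Rt_CI95,
    upper = res$Rt + res$Rt_CI95
  )
}

# Made-up stand-in for an EpiInvert result (not real estimates)
mock <- list(
  dates   = c("2022-05-03", "2022-05-04", "2022-05-05"),
  Rt      = c(0.95, 0.97, 1.01),
  Rt_CI95 = c(0.02, 0.05, 0.10)
)
rt_tb <- rt_table(mock)
rt_tb
```

With a real result one would call `rt_table(res)`. In the made-up values the radius grows towards the last day, mirroring the description of Rt_CI95 above, since fewer future days are available there.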
EpiInvert execution for France using only the last 365 days (parameter max_time_interval). If you are not constrained by the computational cost of the algorithm, you can choose a large value for this parameter (for instance 9999) to ensure that EpiInvert uses the whole available sequence in the estimation.
res <- EpiInvert(incidence$FRA,"2022-05-05",festives$FRA,
select_params(list(max_time_interval = 365)))
Plot of the incidence between “2021-12-15” and “2022-01-15”. Observe that the festive days bias correction only modifies the original incidence in the festive days and the following 2 days.
EpiInvert_plot(res,"incid","2021-12-15","2022-01-15")
EpiInvert execution for the UK using a non-parametric serial interval shifted by -2 days.
Load the data of a serial interval:
data(si_distr_data)
head(si_distr_data)
#> [1] 3.285609e-06 3.401902e-04 3.904441e-03 1.543537e-02 3.466818e-02
#> [6] 5.608451e-02
res <- EpiInvert(incidence$UK,"2022-05-05",festives$UK,
select_params(list(si_distr = si_distr_data,
shift_si_distr=-2)))
Plot of the serial interval used (including the shift)
EpiInvert_plot(res,"SI")
EpiInvert execution for the USA changing the default values of the parametric serial interval (using a shifted log-normal)
res <- EpiInvert(incidence$USA,"2022-05-05",festives$USA,
select_params(list(mean_si = 11,sd_si=6,shift_si=-1)))
Plot of the reproduction number Rt, including an empirical 95% confidence interval of the variation of the EpiInvert Rt estimate as a function of the number of future days available.
EpiInvert_plot(res,"R")
To load an updated version of the incidence file we use in these examples you can execute:
incidence <- read.csv(url("https://www.ctim.es/covid19/incidence.csv"))
tail(incidence)
#> date FRA DEU USA UK
#> 842 2022-05-12 36047 68999 112112 14800
#> 843 2022-05-13 32773 61859 81683 6587
#> 844 2022-05-14 30459 6151 16092 0
#> 845 2022-05-15 22844 2305 30890 0
#> 846 2022-05-16 5936 86252 145288 23820
#> 847 2022-05-17 43727 72051 112487 8596
You can pass the last date of the incidence to the EpiInvert call using the dates included in the incidence file:
res <- EpiInvert(incidence$USA,incidence$date[length(incidence$date)],festives$USA)
EpiInvert_plot(res)
#> Warning: Removed 2 row(s) containing missing values (geom_path).
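Since EpiInvert does not allow missing values, it may be worth checking a freshly downloaded column before passing it to EpiInvert. The helper below is a hypothetical sketch, not part of the package: it replaces missing entries by zero, following the convention described for i_original, and reports how many were replaced.

```r
# Hypothetical helper: register a zero on days without reported data
prepare_incidence <- function(x) {
  x <- as.numeric(x)
  n_missing <- sum(is.na(x))
  x[is.na(x)] <- 0
  if (n_missing > 0) {
    message(n_missing, " missing value(s) replaced by 0")
  }
  x
}

# Made-up example vector
cleaned <- prepare_incidence(c(120, NA, 95, NA, 130))
cleaned
```

The cleaned vector could then be used in place of the raw column, for example `EpiInvert(prepare_incidence(incidence$USA), tail(incidence$date, 1), festives$USA)`.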