library(PandemicLP)
In this vignette, we present an advanced use of the PandemicLP package: fitting pandemic models for multiple regions. The package uses the theory presented in http://est.ufmg.br/covidlp/home/en/. The main goal of this vignette is to show how to make predictions for two or more regions, where the future values are calculated as the sum of the individual forecasts.
If the user wishes to use any other epidemic or pandemic data, it is their responsibility to prepare the data set for each of the regions considered. To see how to prepare the data, check ?covid19BH.
Assuming the goal is to run the model for Covid-19 data, the first step is to define the vector that contains all the desired regions and the last date to be considered in the available data. In addition, case_type can be ‘confirmed’ or ‘deaths’, and it determines which type of Covid-19 data will be used.
As an example, this vignette predicts Covid-19 deaths for the South Region of Brazil using data up to October 1st, 2020.
regions <- c("PR", "SC", "RS")
last_date <- "2020-10-01"
case_type <- "deaths"
To run this example for other regions, the user only needs to change the three variables above (regions, last_date and case_type); the rest of the code can then be run unchanged.
With the regions defined, a loop downloads the Covid-19 data from online repositories using the function load_covid(), then fits the model and makes predictions using the functions pandemic_model() and posterior_predict(), respectively. Each of these functions is documented in the package help.
Inside the loop, an if-else statement checks whether each specified region is a Brazilian state. The check compares the region with the available states, listed by the function state_list(). Note also that pandemic_model() can fit models with different configurations (check ?pandemic_model), so the user can adapt the call to their needs. In this vignette, however, the configuration is the same as the one used in the app (http://www.est.ufmg.br/covidlp/), which is selected through the argument ‘covidLPconfig = TRUE’ of the pandemic_model() function.
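The state check described above can be tried without downloading anything. The following is a minimal sketch, assuming a small hand-made stand-in for the table returned by state_list(): only the column name state_abb is taken from the vignette, while the rows and the names states_toy, regions_toy and is_state are illustrative.

```r
# Toy stand-in for state_list(); only the column name state_abb comes from the vignette.
states_toy <- data.frame(state_abb = c("PR", "SC", "RS", "SP"),
                         stringsAsFactors = FALSE)

# Hypothetical mix of Brazilian state abbreviations and a country name.
regions_toy <- c("PR", "Argentina", "RS")

# TRUE when the entry is a state abbreviation, FALSE otherwise.
# This is equivalent to !is.na(match(regions_toy, states_toy$state_abb)).
is_state <- regions_toy %in% states_toy$state_abb

print(data.frame(region = regions_toy, is_state = is_state))
```

Entries flagged FALSE would be treated as country names, which matches the branch that calls load_covid() with country_name only.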
The data, outputs, and predictions for each region are stored in lists.
data <- list()
outputs <- list()
preds <- list()
states <- state_list()

# Load precomputed MCMC
download.file("http://github.com/CovidLP/PandemicLP/raw/master/temp/PandemicLP_SumRegions.rda","./PandemicLP_SumRegions.rda")
load("./PandemicLP_SumRegions.rda")

for(i in 1:length(regions)) {
  if (is.na(match(regions[i],states$state_abb))){
    # not a Brazilian state: treat the entry as a country name
    data[[i]] <- load_covid(country_name=regions[i],last_date=last_date)
  } else {
    data[[i]] <- load_covid(country_name="Brazil", state_name = regions[i],last_date=last_date)
  }
  # Commenting to reduce vignette runtime
  # outputs[[i]] <- pandemic_model(data[[i]],case_type = case_type, covidLPconfig = TRUE)
  preds[[i]] <- posterior_predict(outputs[[i]])
}
After doing all the sums, we want the final object (data_base) to be of class pandemicPredicted and to contain the predictions, the data, the names of the regions considered, the type of case used, and the past and future mu’s. To make sure that data_base has the required format, we initialize it with the result of the posterior_predict function for the first region, and then add the information for the remaining regions in a loop that starts from the second region.
# consider the file for the first region (to save the right format)
data_base <- preds[[1]]
bind_regions <- regions[1]
for (i in 2:length(regions)) {
  bind_regions <- paste(bind_regions, "and", regions[i])
}
data_base$location <- bind_regions
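As a side note on the label construction, the same string can be built in a single call. The sketch below uses only base R; the names regions_toy, label, single and idx are illustrative. It also shows why a loop written as 2:length(x) needs at least two regions, which is the setting this vignette assumes.

```r
regions_toy <- c("PR", "SC", "RS")

# Same result as the loop above, in one call.
label <- paste(regions_toy, collapse = " and ")
print(label)

# With a single region, 2:length(x) evaluates to c(2, 1) and the loop body
# would still run. seq_len(length(x))[-1] is empty in that case, so it is a
# safer way to iterate "from the second element on".
single <- c("PR")
idx <- seq_len(length(single))[-1]
print(length(idx))
```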
To make predictions for the sum of all regions, some data manipulation is required. The predictions (long and short term) and the future mu’s can be summed directly, since the end date is the same for all regions and the forecasts therefore start at the same moment. For the observed data, however, the dates need attention: the pandemic starts at a different time in each region, so cases must be summed only within the same date. All of this is done in a loop that runs from the second region to the last one, since the information for the first region is already contained in the object (as explained previously).
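The direct sum of predictions can be illustrated with toy numbers. This is a minimal sketch under one assumption: the predictive draws are stored as a matrix with one row per MCMC draw and one column per future date. The names pred_a, pred_b and pred_total are illustrative, and the draws are simulated rather than produced by the package.

```r
set.seed(1)
n_draws <- 1000
horizon <- 3

# Toy predictive draws for two regions: rows = MCMC draws, columns = future dates.
pred_a <- matrix(rpois(n_draws * horizon, lambda = 20), nrow = n_draws)
pred_b <- matrix(rpois(n_draws * horizon, lambda = 5), nrow = n_draws)

# Both matrices must cover the same horizon before they can be added.
stopifnot(identical(dim(pred_a), dim(pred_b)))

# Element-wise sum: draw k of the total is draw k of A plus draw k of B.
pred_total <- pred_a + pred_b

# Summaries of the total are computed from the summed draws, column by column.
summary_total <- apply(pred_total, 2, quantile, probs = c(0.025, 0.5, 0.975))
print(summary_total)
```

Summing the draws first and summarising afterwards keeps the interval for the total coherent; adding the per-region quantiles instead would generally give a different interval.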
In the case of the past mu, the data frames containing the mu values are transposed in a way that each line represents a date and the columns are the values of each mu. After adding the column with the date, the data frame for each region is combined in only one and, at the end of the loop, it is aggregated by date. The final data frame must have the date column deleted and needs to be transposed again to stay in the original format.
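The transpose, stack, aggregate and transpose-back sequence can be followed on a small example. This is a sketch with made-up values, assuming the past mu’s are stored with one row per MCMC draw and one column per date; the names mu_a, mu_b, stacked, summed and mu_total are illustrative.

```r
# Toy past-mu samples: rows = MCMC draws, columns = dates (values are made up).
mu_a <- matrix(c(1, 2, 3,
                 4, 5, 6), nrow = 2, byrow = TRUE)   # region A: 3 dates
mu_b <- matrix(c(10, 20,
                 30, 40), nrow = 2, byrow = TRUE)    # region B: starts one day later
dates_a <- as.Date("2020-03-01") + 0:2
dates_b <- as.Date("2020-03-02") + 0:1

# Transpose so each row is a date, then attach the date column.
long_a <- data.frame(date = dates_a, t(mu_a))
long_b <- data.frame(date = dates_b, t(mu_b))

# Stack the regions and sum draw by draw within each date.
stacked <- rbind(long_a, long_b)
summed <- aggregate(. ~ date, data = stacked, FUN = sum)

# Drop the date column and transpose back to draws x dates.
mu_total <- t(as.matrix(summed[, -1]))
colnames(mu_total) <- as.character(summed$date)
print(mu_total)
```

On the first date only region A contributes, so the total equals region A's values; on later dates the two regions are added draw by draw.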
To sum the covid data for each region, each dataset is merged by date in a way that all the information is considered for both objects in the merge. When there is data on a given date for one region and not for another, the column for the region that does not have the information is shown as “NA”, so it is changed to 0 to be added. Then, each column from one dataset is added to the corresponding column of the second one, respecting the dates. In the end, the duplicated column is deleted so that the loop can start again for another region without any problem.
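The merge step can be sketched on two small data frames. The values are made up and only two count columns are used, to keep the example short; the names reg_a, reg_b, both and total are illustrative.

```r
# Toy cumulative counts; region B starts reporting one day later (values are made up).
reg_a <- data.frame(date = as.Date("2020-03-01") + 0:2,
                    cases = c(1, 3, 6), deaths = c(0, 1, 1))
reg_b <- data.frame(date = as.Date("2020-03-02") + 0:1,
                    cases = c(2, 5), deaths = c(0, 2))

# Full outer join on date: shared column names get the suffixes .x and .y.
both <- merge(reg_a, reg_b, by = "date", all = TRUE)

# Dates missing in one region become NA; treat them as zero counts.
both[is.na(both)] <- 0

# Add matching columns and keep a single set of names.
total <- data.frame(date = both$date,
                    cases = both$cases.x + both$cases.y,
                    deaths = both$deaths.x + both$deaths.y)
print(total)
```

Replacing NA with 0 is reasonable here because the gaps occur before a region's first record; since all regions end on the same date, no gaps are expected at the end of the series.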
Finally, to ensure that the pandemic_stats function has a prediction horizon long enough to calculate the pandemic summary information, the posterior_predict function creates some hidden objects. These objects are built with a number of prediction steps equal to the maximum of 1000 and the horizon specified by the user. To make predictions for the sum of regions, these hidden objects must be summed as well. Once again, they can be summed directly because all the data end on the same date, so the predictions start at the same moment.
# get the mean sample and set it to be dates x mcmc sample
mu_t <- t(data_base$pastMu)

# include the dates in the data frame
mu_final <- data.frame(data = data_base$data$date,mu_t)
names_mu <- names(data_base$pastMu)

# get hidden objects (necessary for the pandemic_stats function)
hidden_short_total <- methods::slot(data_base$fit,"sim")$fullPred$thousandShortPred
hidden_long_total <- methods::slot(data_base$fit,"sim")$fullPred$thousandLongPred
hidden_mu_total <- methods::slot(data_base$fit,"sim")$fullPred$thousandMus

# loop for each region - starting with the second one
for (u in 2:length(regions)) {
  # preds for the selected region
  data_region <- preds[[u]]

  # sum the variables predictive_Long, predictive_Short and futMu
  data_base$predictive_Long <- data_base$predictive_Long +
    data_region$predictive_Long
  data_base$predictive_Short <- data_base$predictive_Short +
    data_region$predictive_Short
  data_base$futMu <- data_base$futMu + data_region$futMu

  # create a large data frame by concatenating samples for current state in the mean data frame
  mu_t <- t(data_region$pastMu)
  mu_2 <- data.frame(data = data_region$data$date,mu_t)
  names_mu <- c(names_mu,names(data_region$pastMu))
  mu_final <- rbind(mu_final,mu_2)

  # merge datasets by date since they can differ on start
  data_base$data <- merge(data_base$data,data_region$data,
                          by = "date", all = TRUE)
  data_base$data[is.na(data_base$data)] = 0
  data_base$data$cases.x = data_base$data$cases.x +
    data_base$data$cases.y
  data_base$data$deaths.x = data_base$data$deaths.x +
    data_base$data$deaths.y
  data_base$data$new_cases.x = data_base$data$new_cases.x +
    data_base$data$new_cases.y
  data_base$data$new_deaths.x = data_base$data$new_deaths.x +
    data_base$data$new_deaths.y
  data_base$data <- data_base$data[,-c(6:9)]
  names(data_base$data) <- c("date","cases","deaths",
                             "new_cases","new_deaths")

  # sum hidden objects (necessary for the pandemic_stats function)
  hidden_short_region <- methods::slot(data_region$fit,"sim")$fullPred$thousandShortPred
  hidden_short_total <- hidden_short_total + hidden_short_region
  hidden_long_region <- methods::slot(data_region$fit,"sim")$fullPred$thousandLongPred
  hidden_long_total <- hidden_long_total + hidden_long_region
  hidden_mu_region <- methods::slot(data_region$fit,"sim")$fullPred$thousandMus
  hidden_mu_total <- hidden_mu_total + hidden_mu_region
}

# create hidden object (necessary for the pandemic_stats function)
methods::slot(data_base$fit,"sim")$fullPred$thousandShortPred <- hidden_short_total
methods::slot(data_base$fit,"sim")$fullPred$thousandLongPred <- hidden_long_total
methods::slot(data_base$fit,"sim")$fullPred$thousandMus <- hidden_mu_total

# aggregate the mean samples
mu_final <- aggregate(. ~ data, data=mu_final, FUN=sum)
mu_final <- mu_final[,-1]
mu_final <- t(mu_final)
names_mu <- unique(names_mu)
colnames(mu_final) <- names_mu
data_base$pastMu <- mu_final
After all the data manipulation is finished, data_base is the final object, which can be used to create plots or for any other analysis the user may want. If the user does not want to see the summary information about the pandemic, the argument “summary = FALSE” can be passed to the plot function to remove the notes.
plots <- plot(data_base,term = "both")
#> Plotting deaths cases
#> Short term plot created successfully
#> Long term plot created successfully
#> The generated plot(s) can be stored in a variable.
#> Long term plot: variable$long
#> Short term plot: variable$short
plots$long
plots$short