The purpose of forecastML is to provide a series of functions and visualizations that simplify the process of multi-step-ahead direct forecasting with standard machine learning algorithms. It's a wrapper package aimed at providing maximum flexibility in model-building (choose any machine learning algorithm from any R package) while helping the user quickly assess the (a) accuracy, (b) stability, and (c) generalizability of grouped (i.e., multiple related time-series) and ungrouped single-outcome forecasts produced from potentially high-dimensional modeling datasets.
This package is inspired by Bergmeir, Hyndman, and Koo's 2018 paper, A note on the validity of cross-validation for evaluating autoregressive time series prediction. In particular, forecastML makes use of standard cross-validation, along with the future R package for parallel model training, to build and evaluate high-dimensional forecast models without having to use methods that are time-series specific.
The following quote from Bergmeir et al.’s article nicely sums up the aim of this package:
“When purely (non-linear, nonparametric) autoregressive methods are applied to forecasting problems, as is often the case (e.g., when using Machine Learning methods), the aforementioned problems of CV are largely irrelevant, and CV can and should be used without modification, as in the independent case.”
In contrast to the recursive or iterated method for producing multi-step-ahead forecasts used in traditional forecasting methods like ARIMA, direct forecasting involves creating a series of distinct horizon-specific models. Though several hybrid methods exist for producing multi-step forecasts, the simple direct forecasting method with lagged features used in forecastML lets us avoid the exponentially more difficult problem of having to "predict the predictors" for forecast horizons beyond 1 step ahead.
The animation below shows how historical data is used to create a 1-to-12-step-ahead forecast for a 12-step-horizon forecast model using lagged predictors or features. Though feature lags greater than 12 steps can be used to incorporate additional historical predictive information, a 12-step-horizon direct forecast model requires feature lags >= 12. This animation is roughly equivalent to how a 12-period seasonal ARIMA(0, 0, 0)(1, 0, 0) model uses historical data to produce forecasts.
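To make the lag requirement concrete, here is a small hand-rolled sketch in plain R (not forecastML code) of a 12-row feature lag; the vector y stands in for an outcome series:

y <- 1:24
y_lag_12 <- c(rep(NA, 12), head(y, -12))  # shift the series down by 12 rows
head(data.frame(y, y_lag_12), 15)
# When forecasting row t from 12 steps back, y_lag_12[t] is the most recent value
# of y known at forecast time; shorter lags of y would not yet be observed.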
The main functions support the following workflow:

* Transform datasets for modeling by creating various patterns of lagged features for user-specified forecast horizons with forecastML::create_lagged_df().
* Create datasets for evaluating forecast models using nested cross-validation with forecastML::create_windows().
* Train and evaluate machine learning models for forecasting with forecastML::train_model().
* Assess forecast accuracy at different forecast horizons with forecastML::return_error().
* Assess hyperparameter stability with forecastML::return_hyper().
* Create datasets of lagged features for direct forecasting with forecastML::create_lagged_df(..., type = "forecast").
In this walkthrough of forecastML, we'll compare the forecast performance of two machine learning methods, LASSO and Random Forest, across forecast horizons using the Seatbelts dataset from the datasets package.
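The code below assumes a data.frame named data with the outcome, DriversKilled, in the first column; a minimal setup along these lines (the column selection is an assumption based on the features listed next) would be:

data("Seatbelts", package = "datasets")
data <- data.frame(Seatbelts)[, c("DriversKilled", "kms", "PetrolPrice", "law")]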
Here’s a summary of the problem at hand:
* Outcome: DriversKilled - car drivers killed per month in the UK.
* Features:
    * DriversKilled - car drivers killed per month in the UK.
    * kms - a measure of distance driven.
    * PetrolPrice - the price of gas.
    * law - a binary indicator of the presence of a seatbelt law.

We'll train our models on data_train and evaluate their out-of-sample performance on data_test.

ts_frequency <- 12  # monthly time-series
data_train <- data[1:(nrow(data) - ts_frequency), ]
data_test <- data[(nrow(data) - ts_frequency + 1):nrow(data), ]
# Plot the outcome series and mark the end of the training data in red.
p <- ggplot(data, aes(x = 1:nrow(data), y = DriversKilled))
p <- p + geom_line()
p <- p + geom_vline(xintercept = nrow(data_train), color = "red", size = 1.1)
p <- p + theme_bw() + xlab("Index")
p
We'll create a list of datasets, one for each forecast horizon, with lagged values for each feature. The lookback argument in forecastML::create_lagged_df() specifies the feature lags in dataset rows.
horizons <- c(1, 3, 6, 9, 12)
lookback <- 1:15
data_list <- forecastML::create_lagged_df(data_train, type = "train",
outcome_col = 1, lookback = lookback,
horizons = horizons)
Let's view the modeling dataset for a forecast horizon of 6. Notice that "lag" and the lag length are appended to the lagged feature names.
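The horizon-specific datasets are stored as named list elements; using the horizon_6 naming that appears later in this vignette, the 6-step-ahead training dataset can be viewed with:

DT::datatable(head(data_list$horizon_6), options = list(scrollX = TRUE))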
The plot below illustrates, for a given feature, the number and position (in dataset rows) of lagged features created for each forecast horizon/model. The lookback argument to forecastML::create_lagged_df() was set to create lagged features from a minimum of 1 lag to a maximum of 15 lags; however, feature lags that don't support direct forecasting at a given forecast horizon are silently removed from the modeling dataset.
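This plot is presumably produced by the plot method for the lagged dataset object; treat the call below as an assumption about that method rather than documented usage:

plot(data_list)  # assumed S3 plot method for the list of lagged datasets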
forecastML::create_windows() creates indices for partitioning the training dataset in the outer loop of a nested cross-validation setup. The validation datasets are created in contiguous blocks of window_length rows, as opposed to randomly selected rows, to mimic forecasting over multi-step-ahead forecast horizons. The skip, window_start, and window_stop arguments take dataset indices (or dates, if a vector of dates is supplied to forecastML::create_lagged_df()) that allow the user to adjust the number and placement of the outer loop validation datasets.
windows <- forecastML::create_windows(lagged_df = data_list, window_length = 24, skip = 0,
window_start = NULL, window_stop = NULL,
include_partial_window = TRUE)
windows
## start stop window_length
## 1 16 39 24
## 2 40 63 24
## 3 64 87 24
## 4 88 111 24
## 5 112 135 24
## 6 136 159 24
## 7 160 180 24
Below is a plot of the nested cross-validation outer loop datasets or windows. In our example, a window_length of 24 (months) resulted in 7 validation windows.
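The windows plot can be reproduced with the same plot method call used near the end of this vignette:

plot(windows, data_list, show_labels = TRUE)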
In this nested cross-validation setup, a model is trained with data from 6 windows and forecast accuracy is assessed on the left-out window. This means that, for each forecast horizon, we'll need to train 14 models (7 per algorithm), each selecting different optimal hyperparameters and model coefficients, if available, from the inner validation loop.
We'll compare the forecasting performance of two models: (a) a cross-validated LASSO and (b) a non-tuned Random Forest. The following user-defined model training function is needed for each model: a function, passed to train_model(), that takes as its input a horizon-specific data.frame from create_lagged_df(..., type = "train") (e.g., my_lagged_df$horizon_h) and returns a model object that works with the user-defined predict() function.

Any data transformations, hyperparameter tuning, or inner loop cross-validation procedures should take place within this function, with the limitation that it ultimately needs to return() a model suitable for the user-defined predict() function; a list can be returned to capture meta-data such as hyperparameter results.
# Example 1 - LASSO
# Alternatively, we could define an outcome column identifier argument, say, 'outcome_col = 1' in
# this function or just 'outcome_col' and then set the argument as 'outcome_col = 1' in train_model().
model_function <- function(data) {
x <- data[, -(1), drop = FALSE]
y <- data[, 1, drop = FALSE]
x <- as.matrix(x, ncol = ncol(x))
y <- as.matrix(y, ncol = ncol(y))
model <- glmnet::cv.glmnet(x, y)
return(model)
}
# Example 2 - Random Forest
# Alternatively, we could define an outcome column identifier argument, say, 'outcome_col = 1' in
# this function or just 'outcome_col' and then set the argument as 'outcome_col = 1' in train_model().
model_function_2 <- function(data) {
outcome_names <- names(data)[1]
model_formula <- formula(paste0(outcome_names, "~ ."))
model <- randomForest::randomForest(formula = model_formula, data = data, ntree = 200)
return(model)
}
For each modeling approach, LASSO and Random Forest, a total of (number of forecast horizons) * (number of validation windows) models is trained. In this example, that means training 5 * 7 = 35 models for each algorithm.
These models could be trained in parallel on any OS with the very flexible future package by uncommenting the code below and setting use_future = TRUE. To avoid nested parallelization, models are trained in parallel across either forecast horizons or validation windows, whichever is greater in number (when equal, the default is to parallelize across forecast horizons). In this example we have 5 horizon-specific models and 7 validation datasets, so the 35 models would be trained in parallel across validation windows within each forecast horizon.
#future::plan(future::multiprocess)
model_results <- forecastML::train_model(data_list, windows, model_name = "LASSO",
model_function, use_future = FALSE)
model_results_2 <- forecastML::train_model(data_list, windows, model_name = "RF",
model_function_2, use_future = FALSE)
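If parallel training is desired, a sketch along these lines should work (multisession is one of several available future backends; treat the exact plan choice as an assumption):

future::plan(future::multisession)
model_results <- forecastML::train_model(data_list, windows, model_name = "LASSO",
model_function, use_future = TRUE)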
The following user-defined prediction function is needed for each model:

* It takes a trained model and a data.frame of the model features from forecastML::create_lagged_df(..., type = "train") as its two arguments.
* It returns a data.frame of predictions with 1 or 3 columns. A 1-column data.frame will produce point forecasts, and a 3-column data.frame can be used to return point, lower, and upper forecasts (column names and order do not matter).

# Example 1 - LASSO
prediction_function <- function(model, data_features) {
x <- as.matrix(data_features, ncol = ncol(data_features))
data_pred <- data.frame("y_pred" = predict(model, x, s = "lambda.min"))
return(data_pred)
}
# Example 2 - Random Forest
prediction_function_2 <- function(model, data_features) {
data_pred <- data.frame("y_pred" = predict(model, data_features))
return(data_pred)
}
The predict.forecast_model() S3 method takes any number of trained models from train_model() and a list of user-defined prediction functions. The list of prediction functions should appear in the same order as the models.
Outer loop nested cross-validation forecasts are returned for each user-defined model, forecast horizon, and validation window.
data_results <- predict(model_results, model_results_2,
prediction_function = list(prediction_function, prediction_function_2), data = data_list)
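The returned predictions can be inspected with the same DT::datatable() pattern used later in this vignette:

DT::datatable(head(data_results, 10), options = list(scrollX = TRUE))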
Let's view the models' predictions. The data.frame with S3 class training_results contains the following columns:

* The user-supplied model name from train_model().
* The validation dataset row indices, taken from attributes(create_lagged_df())$row_indices.
* The date indices, if dates were supplied, taken from attributes(create_lagged_df())$date_indices.
* The outcome and its prediction, with the outcome column coming from create_lagged_df().

Below is a plot of the forecasts for each validation window at select forecast horizons.
Below is a plot of the forecast error for select validation windows at select forecast horizons.
The plots below are diagnostic plots to check how forecasts for a target point in time have changed through time by looking at a history of forecasts. In this example we have 5 direct forecast horizons (1, 3, 6, 9, and 12), so each colored point represents the origin of a forecast for the target point shown in black. In most cases it would be reasonable to expect shorter-horizon forecasts to be more accurate than longer-horizon forecasts.
The forecast_variability plot below is a summary of the forecast_stability plot. It's a plot of the variability of forecasts for a target point in time, collapsed across forecast horizons. A forecast model that produces greater variability across forecast horizons could be the better model, provided the forecasts are increasingly accurate at shorter and shorter forecast horizons.
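A sketch of how these two plots might be produced; the type values are assumptions inferred from the plot names above rather than confirmed arguments:

plot(data_results, type = "forecast_stability")
plot(data_results, type = "forecast_variability")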
Let's calculate several common forecast error metrics. The forecast errors for nested cross-validation are returned at 3 levels of granularity:

* Error metrics for each validation window at each forecast horizon.
* Error metrics for each forecast horizon, collapsed across validation windows.
* Global error metrics, collapsed across validation windows and forecast horizons.
data_error <- forecastML::return_error(data_results, metrics = c("mae", "mape", "smape"),
models = NULL)
DT::datatable(data_error$error_global, options = list(scrollX = TRUE))
Below is a plot of error metrics across time for select validation windows and forecast horizons.
Below is a plot of forecast error metrics by forecast model horizon collapsed across validation windows.
Below is a plot of error metrics collapsed across validation windows and forecast horizons.
While it may be reasonable to have distinct models for each forecast horizon or even forecasting model ensembles across horizons, at this point we still have slightly different LASSO and Random Forest models from the outer loop of the nested cross-validation within each horizon-specific model. Here, we’ll take a look at the stability of the hyperparameters for the LASSO model to better understand if we can train one model across forecast horizons or if we need additional predictors or modeling strategies to forecast well under various conditions or time-series dynamics.
The following user-defined hyperparameter function is needed for each model: it takes a trained model as its only argument and returns the model's hyperparameters in a data.frame.

hyper_function <- function(model) {
lambda_min <- model$lambda.min
lambda_1se <- model$lambda.1se
data_hyper <- data.frame("lambda_min" = lambda_min, "lambda_1se" = lambda_1se)
return(data_hyper)
}
Below are two plots which show (a) univariate hyperparameter variability across the training data and (b) the relationship between each error metric and hyperparameter values.
data_hyper <- forecastML::return_hyper(model_results, hyper_function)
plot(data_hyper, data_results, data_error, type = "stability", horizons = c(1, 6, 12))
To forecast with the direct forecasting method, we need to create another dataset of lagged features. We can do this by running create_lagged_df() and setting type = "forecast".

For non-grouped time-series, this function takes the last rows of data_train and creates lagged features that support forecasting from 1 step ahead to h steps ahead for each horizon-specific model. Below is the forecast dataset for a 6-step-ahead forecast.

The forecast dataset has the following columns:

* An index identifying the future time periods to be forecast (up to 1:max(horizons) for the longest-horizon model).
* The lagged features, in the same format as in the training, type = "train", dataset.

data_forecast_list <- forecastML::create_lagged_df(data_train, type = "forecast",
lookback = lookback, horizon = horizons)
DT::datatable(head(data_forecast_list$horizon_6), options = list(scrollX = TRUE))
Running the predict method, predict.forecast_model(), on the lagged predictor dataset created above with type = "forecast" (passed to the data argument of predict.forecast_model() below) returns a data.frame of forecasts whose columns include:

* The user-supplied model name from train_model().
* The forecasted outcome, with the outcome name taken from create_lagged_df().

An S3 object of class forecast_results is returned. This object will have different plotting and error methods than the training_results class from earlier.
data_forecast <- predict(model_results, model_results_2,
prediction_function = list(prediction_function, prediction_function_2),
data = data_forecast_list)
DT::datatable(head(data_forecast, 10), options = list(scrollX = TRUE))
Below is a plot of the forecasts vs. the actuals for each model at select forecast horizons. Setting the data_actual = ... and actual_indices = ... arguments plots a background dataset (grey line in the plots below).
It’s clear from the plots that our Random Forest model is producing less accurate forecasts and is more sensitive to the data on which it was trained–producing a handful of erratic forecasts depending on the nested cross-validation data.
plot(data_forecast, data_actual = data_train[-(1:150), ],
actual_indices = as.numeric(row.names(data_train[-(1:150), ])),
horizons = c(1, 6, 12), facet_plot = c("model", "model_forecast_horizon"))
plot(data_forecast, data_actual = data_test,
actual_indices = as.numeric(row.names(data_test)),
facet_plot = "model", horizons = c(1, 6, 12))
Finally, we'll look at our out-of-sample forecast error by forecast horizon for our two models by setting data_test = data_test.

If the first argument of forecastML::return_error() is an object of class forecast_results and the data_test argument is a data.frame like the data_test from our beginning train-test split, a data.frame of forecast error metrics is returned, with columns that include the user-supplied model name from train_model().

data_error <- forecastML::return_error(data_forecast, data_test = data_test,
test_indices = as.numeric(row.names(data_test)),
metrics = c("mae", "mape", "smape", "mdape"))
DT::datatable(head(data_error$error_by_horizon, 10), options = list(scrollX = TRUE))
Because our LASSO model is both more stable and more accurate, we'll re-train it across the entire training dataset to get our final 5 models (1 for each forecast horizon). Note that for a real-world forecasting problem this is when we would do additional model tuning to improve forecast accuracy across validation windows, as well as narrow the hyperparameter search in the user-specified modeling functions.
data_list <- forecastML::create_lagged_df(data_train, type = "train",
lookback = lookback,
horizon = horizons)
To create a dataset without nested cross-validation, set window_length = 0 in forecastML::create_windows().
windows <- forecastML::create_windows(data_list, window_length = 0)
plot(windows, data_list, show_labels = TRUE)
Without nested cross-validation and holdout windows, the prediction plot is essentially a plot of model fit.
model_results <- forecastML::train_model(data_list, windows, model_name = "LASSO", model_function)
data_results <- predict(model_results, prediction_function = list(prediction_function), data = data_list)
DT::datatable(head(data_results, 10), options = list(scrollX = TRUE))
data_error <- forecastML::return_error(data_results, metrics = c("mae", "mape", "mdape", "smape"),
models = NULL)
DT::datatable(head(data_error$error_global), options = list(scrollX = TRUE))
data_hyper <- forecastML::return_hyper(model_results, hyper_function)
plot(data_hyper, data_results, data_error, type = "stability", horizons = c(1, 6, 12))
data_forecast_list <- forecastML::create_lagged_df(data_train, type = "forecast",
lookback = lookback, horizon = horizons)
data_forecast <- predict(model_results, prediction_function = list(prediction_function),
data = data_forecast_list)
plot(data_forecast, data_actual = data[-(1:150), ],
actual_indices = as.numeric(row.names(data[-(1:150), ])),
horizons = c(1, 6, 12),
facet_plot = c("model", "model_forecast_horizon")) + ggplot2::theme(legend.position = "none")
plot(data_forecast, data_actual = data_test, actual_indices = as.numeric(row.names(data_test)),
facet_plot = NULL, horizons = c(1, 6, 12))