Vector Autoregressive Moving Average (VARMA) Models

Ramin Mojab

2023-07-04

library(ldt) 
library(tdata)

seed <- 123
set.seed(seed)

Introduction

In this vignette, I will introduce you to the main features of the ldt package for dealing with Vector Autoregressive Moving Average (VARMA) models. I will demonstrate how to perform common tasks such as estimating a VARMA model and making predictions with it. I will also discuss model uncertainty and how to define a VARMA model set and automatically search for the best models within this set. Additionally, we will explore the use of Principal Component Analysis as an alternative approach when dealing with a large number of potential endogenous or exogenous variables.

One of the main ideas behind ldt is to minimize user discretion. An analysis in ldt is generally based on a dataset and a set of rules that convert this dataset into a list of potential regressors and/or predictors. This rule-based approach to selecting data not only avoids discretion but is also what makes the automatic search mentioned in the previous paragraph possible.

In this example, I will create an artificial dataset with relevant and irrelevant endogenous variables and some exogenous variables. The data is a sample from a known VARMA model. While we could discuss how well the estimation process recovers the true parameters, that is not the main goal here. Instead, I will focus on explaining how to estimate, search, predict, and report results.

Let’s get started!

A simple example

Let’s assume that we know the structure of the system and simulate data from a known VARMA model. The following command generates a sample from such a system of equations:

numObs <- 100
numHorizon <- 10
startDate <- f.yearly(1900)

numEndo <- 2L
numExo <- 3L
numAr <- 2L
numMa <- 1L 
d <- 1

sample <- sim.varma(numEndo, numAr, numMa, 
                    numExo, numObs, 10, TRUE, d, 
                    startFrequency = startDate)

The parameters of the system are included in the output of the sim.varma function. This system has 2 endogenous variables determined by an intercept, 3 exogenous variables, and the dynamics of the system. The sample size is 100. All coefficients of the model are generated randomly and are listed in the output, sample. The MA coefficient matrices are diagonal; this restriction is related to identification issues in VARMA models. The parameter d = 1 indicates that the data is integrated of order one. The numHorizon value determines the prediction horizon. We use the f.yearly function from the tdata package to declare that our data is yearly.
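To see exactly what the simulation returned, we can inspect the output directly. This is just an exploratory sketch; apart from y and x, the element names may vary across ldt versions, so listing them is safer than guessing:

# List the top-level elements of the simulation output (coefficients,
# data, and the LaTeX representations referenced below).
str(sample, max.level = 1)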

The LaTeX code for the equations of the system is in the eqsLatex element:

\[\begin{gather} \Delta Y_{1t} = -0.47 - 0.06\Delta Y_{1t-1} + 0.16\Delta Y_{2t-1} + 0.01\Delta Y_{1t-2} + 0.05\Delta Y_{2t-2} + 0.11 X_1 + 1.79 X_2 - 1.97 X_3\\ - 0.07 E_{1t-1} + 0 E_{2t-1} + E_{1t},\quad \sigma_1^2 = 1.70\\\Delta Y_{2t} = -1.07 - 0.02\Delta Y_{1t-1} + 0.01\Delta Y_{2t-1} + 0.17\Delta Y_{1t-2} - 0.13\Delta Y_{2t-2} - 0.56 X_1 + 0.50 X_2 + 0.70 X_3\\ + 0 E_{1t-1} - 0.07 E_{2t-1} + E_{2t},\quad \sigma_2^2 = 0.29 \end{gather}\]

The matrix representation is in the eqsLatexSys element:

\[\begin{gather} \begin{bmatrix}\Delta Y_{1t}\\\Delta Y_{2t}\end{bmatrix} = \begin{bmatrix}-0.47\\-1.07\end{bmatrix} + \begin{bmatrix}-0.06 & 0.16\\-0.02 & 0.01\end{bmatrix} \begin{bmatrix}\Delta Y_{1t-1}\\\Delta Y_{2t-1}\end{bmatrix} + \begin{bmatrix}0.01 & 0.05\\0.17 & -0.13\end{bmatrix} \begin{bmatrix}\Delta Y_{1t-2}\\\Delta Y_{2t-2}\end{bmatrix} + \\\begin{bmatrix}0.11 & 1.79 & -1.97\\-0.56 & 0.50 & 0.70\end{bmatrix}\begin{bmatrix}X_1\\X_2\\X_3\end{bmatrix}+ \\\begin{bmatrix}-0.07 & 0.00\\0.00 & -0.07\end{bmatrix}\begin{bmatrix}E_{1t-1}\\E_{2t-1}\end{bmatrix} + \begin{bmatrix}E_{1t}\\E_{2t}\end{bmatrix},\\ \Sigma = \begin{bmatrix} 1.70 & 0.33 \\ 0.33 & 0.29 \end{bmatrix} \end{gather}\]

We can use the ldt package to estimate the parameters of this system:

# Exclude the last numHorizon observations; they are kept for prediction.
y <- structure(sample$y[1:(numObs - numHorizon), , drop = FALSE],
               ldtf = attr(sample$y, "ldtf"))
x <- sample$x[1:(numObs - numHorizon), , drop = FALSE]

fit <- estim.varma(y = y, x = x,
                   params = c(numAr, d, numMa, 0, 0, 0))

# Convert the estimated coefficient matrix into AR, MA, and exogenous parts,
# then rebuild the system (and its LaTeX representation) via sim.varma.
params <- get.varma.params(fit$estimations$coefs, numAr, numMa, numExo, TRUE)
s0 <- sim.varma(fit$estimations$sigma, params$arList, params$maList,
                params$exoCoef, 10, 0, params$intercept)

In the first two lines, we exclude numHorizon observations from the end of the sample in the estimation process; we will use the excluded part for prediction in the next subsection. The params argument determines the lag structure of the model; as the calls in this vignette show, its elements are the AR, differencing, and MA orders, followed by their seasonal counterparts. The second part of the code is for presentation: it converts the estimated coefficient matrix into AR, MA, and exogenous coefficient matrices and generates a LaTeX formula. Here is the result in matrix form:

\[\begin{gather} \begin{bmatrix} Y_{1t}\\ Y_{2t}\end{bmatrix} = \begin{bmatrix}0.68\\-2.11\end{bmatrix} + \begin{bmatrix}-0.05 & 0.20\\-0.01 & 0.03\end{bmatrix} \begin{bmatrix} Y_{1t-1}\\ Y_{2t-1}\end{bmatrix} + \begin{bmatrix}0.11 & -0.02\\0.20 & -0.16\end{bmatrix} \begin{bmatrix} Y_{1t-2}\\ Y_{2t-2}\end{bmatrix} + \\\begin{bmatrix}0.19 & 1.51 & -1.87\\-0.59 & 0.56 & 0.79\end{bmatrix}\begin{bmatrix}X_1\\X_2\\X_3\end{bmatrix}+ \\\begin{bmatrix}-0.22 & 0.00\\0.00 & -0.22\end{bmatrix}\begin{bmatrix}E_{1t-1}\\E_{2t-1}\end{bmatrix} + \begin{bmatrix}E_{1t}\\E_{2t}\end{bmatrix},\\ \Sigma = \begin{bmatrix} 1.17 & 0.24 \\ 0.24 & 0.23 \end{bmatrix} \end{gather}\]

We can compare the estimated parameters to the actual ones. Keep in mind that we can get more satisfactory results by increasing the sample size (numObs) or decreasing the variance of the disturbances.
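For example, the following sketch compares the first AR lag matrix of the estimated system with the true one. It assumes that the simulation output stores the true coefficients under the same names that get.varma.params uses for the estimates (e.g., arList):

# Difference between the true and estimated first AR coefficient matrices;
# assumes sim.varma stores the true AR matrices in sample$arList.
round(sample$arList[[1]] - params$arList[[1]], 2)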

Prediction

One of the main goals in estimating and using a VARMA model is prediction. In the following code, I estimate the model and set the maxHorizon and newX arguments to tell the estim.varma function to return the predictions:

fit <- estim.varma(y = y, x = x,
                   params = c(numAr, d, numMa, 0, 0, 0),
                   newX = sample$x[(numObs - numHorizon + 1):numObs, , drop = FALSE],
                   maxHorizon = numHorizon)

y_actual <- sample$y[(numObs-numHorizon+1):numObs, , drop = FALSE]

The variable y_actual contains the actual observations over the prediction horizon. There are several ways to compare the predictions with the actual values. In this section, we plot them against each other using the ldt::fan.plot function. Note that the predictions are stored in the rows of the means and vars matrices of the fit$prediction element. The following plot shows the predictions and the actual data:

Prediction bounds (95 percent) for the two endogenous variables of the system.
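If you want a quick numerical check in addition to the fan plot, the following base-R sketch compares the point predictions with the actual values for the first variable. It assumes, as noted above, that fit$prediction$means stores one row per endogenous variable (its columns may also cover in-sample periods, so we keep only the last numHorizon points):

# Compare point predictions with actual values for the first variable (Y1).
pred <- t(fit$prediction$means)  # one column per endogenous variable
pred <- pred[(nrow(pred) - numHorizon + 1):nrow(pred), , drop = FALSE]
matplot(cbind(y_actual[, 1], pred[, 1]), type = "l", lty = 1:2, col = 1:2,
        xlab = "horizon", ylab = "Y1")
legend("topleft", legend = c("actual", "predicted"), lty = 1:2, col = 1:2)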

Seasonality

We are going to repeat the previous example’s procedure, but this time with seasonal data. The following code generates the required data:

numObs_s <- 400
numHorizon_s <- 40
startDate_s <- f.quarterly(1900, 1)

numAr_s <- 1L
numMa_s <- 1L

d_s <- 1
D_s <- 1

sample_s <- sim.varma(numEndo, numAr, numMa,
                      numExo, numObs_s, 10, TRUE, d_s,
                      startFrequency = startDate_s,
                      seasonalCoefs = c(numAr_s, D_s, numMa_s, 4))

The parameter D_s indicates that the model is seasonally integrated, and the two parameters numAr_s and numMa_s determine the seasonal dynamics of the system; the last element of seasonalCoefs sets the number of seasons (4, since the data is quarterly). The matrix representation is in the eqsLatexSys element:

\[\begin{gather} \begin{bmatrix}\Delta\Delta_4 Y_{1t}\\\Delta\Delta_4 Y_{2t}\end{bmatrix} = \begin{bmatrix}-1.94\\0.11\end{bmatrix} + \begin{bmatrix}0.02 & -0.14\\-0.11 & 0.21\end{bmatrix} \begin{bmatrix}\Delta\Delta_4 Y_{1t-1}\\\Delta\Delta_4 Y_{2t-1}\end{bmatrix} + \begin{bmatrix}-0.07 & 0.05\\-0.19 & 0.03\end{bmatrix} \begin{bmatrix}\Delta\Delta_4 Y_{1t-2}\\\Delta\Delta_4 Y_{2t-2}\end{bmatrix} + \begin{bmatrix}-0.14 & -0.01\\-0.19 & 0.11\end{bmatrix} \begin{bmatrix}\Delta\Delta_4 Y_{1t-4}\\\Delta\Delta_4 Y_{2t-4}\end{bmatrix} + \\\begin{bmatrix}-0.40 & -0.83 & 0.74\\0.90 & -0.33 & 0.99\end{bmatrix}\begin{bmatrix}X_1\\X_2\\X_3\end{bmatrix}+ \\\begin{bmatrix}0.06 & 0.00\\0.00 & 0.06\end{bmatrix}\begin{bmatrix}E_{1t-1}\\E_{2t-1}\end{bmatrix} + \begin{bmatrix}-0.05 & 0.00\\0.00 & -0.05\end{bmatrix}\begin{bmatrix}E_{1t-4}\\E_{2t-4}\end{bmatrix} + \begin{bmatrix}E_{1t}\\E_{2t}\end{bmatrix},\\ \Sigma = \begin{bmatrix} 0.77 & 0.04 \\ 0.04 & 0.42 \end{bmatrix} \end{gather}\]

The following code estimates these parameters using the estim.varma function:

y <- structure(sample_s$y[1:(numObs_s - numHorizon_s), , drop = FALSE],
               ldtf = attr(sample_s$y, "ldtf"))
x <- sample_s$x[1:(numObs_s - numHorizon_s), , drop = FALSE]

fit <- estim.varma(y = y, x = x,
                   params = c(numAr_s, d_s, numMa_s, numAr_s, D_s, numMa_s),
                   newX = sample_s$x[(numObs_s - numHorizon_s + 1):numObs_s, , drop = FALSE],
                   maxHorizon = numHorizon_s,
                   seasonsCount = 4)

params <- get.varma.params(fit$estimations$coefs, numAr_s, numMa_s, numExo,
                           TRUE, numAr_s, numMa_s, 4)
s0 <- sim.varma(fit$estimations$sigma, params$arList, params$maList,
                params$exoCoef, d = d_s, nObs = 10, intercept = params$intercept,
                seasonalCoefs = c(numAr_s, D_s, numMa_s, 4))

The LaTeX code for the estimated system is in the s0$eqsLatexSys element; it is not rendered here.

Prediction with this model is similar to the previous subsection, so I report only the result and not the code.

Prediction bounds (95 percent) for the two endogenous variables of the seasonal system.

Model uncertainty

Let’s consider a more realistic situation where model uncertainty exists; that is where ldt can specifically help. In the previous subsections, we knew all the relevant endogenous variables. Here, we continue the non-seasonal example and consider a situation where there are some irrelevant endogenous variables, too. To keep the level of uncertainty and the computational burden manageable, we restrict the number of these variables. The following code reflects our assumptions:

# Append 50 irrelevant endogenous variables named w1, ..., w50.
sample$y <- cbind(sample$y, matrix(rnorm(numObs * 50), ncol = 50,
                                   dimnames = list(NULL, paste0("w", 1:50))))

There are 50 irrelevant and 2 relevant endogenous variables. The number of irrelevant variables is relatively large, and their names start with the character w.

The following code uses the search.varma function to find the true model:

search_res <- search.varma(sample$y, sample$x, numTargets = 1,
                           ySizes = c(1:3),
                           maxParams = c(2, 1, 2, 0, 0, 0),
                           metricOptions = get.options.metric(typesIn = c("sic")),
                           searchOptions = get.options.search(printMsg = TRUE,
                                                              parallel = TRUE))

The ySizes = c(1:3) part encodes our assumption that the number of relevant endogenous variables is at most 3. The metricOptions argument shows that we use the SIC metric to evaluate and compare models. Also, numTargets = 1 shows that we are focusing on the first variable, Y1; finding the best model means finding Y2 and the correct lag structure automatically. The value of maxParams determines our guess about the maximum lag structure: up to 2 AR lags, first differencing, up to 2 MA lags, and no seasonal terms.

This code is very time-consuming and is not evaluated here; on my system, the elapsed time was 27 minutes, with 21232 models searched. You can compare this with similar experiments for binary regression or SUR models and see that it is much more time-consuming. Apart from other factors, VARMA models are relatively large (in the sense of the number of parameters). Also, in the current implementation, ldt uses numerical first and second derivatives in the L-BFGS optimization algorithm. Therefore, we really need to reduce the number of potential models.

One might reduce the number of potential explanatory variables using theory or statistical testing. Since ldt avoids user discretion, it provides a more systematic, stepwise approach. The idea behind it is simple: estimate smaller models first, select the better-performing variables, and then estimate larger models with this reduced set of potential endogenous variables. Here is the code:

y_size_steps <- list(c(1, 2), c(3))
count_steps <- c(NA, 10)

search_step_res <-
  search.varma.stepwise(y = sample$y, x = sample$x, numTargets = 1,
                        maxParams = c(2,1,2,0,0,0),
                        ySizeSteps = y_size_steps, countSteps = count_steps,
                        metricOptions = get.options.metric(typesIn = c("aic","sic")),
                        searchItems = get.items.search(bestK = 10),
                        searchOptions = get.options.search(printMsg = FALSE, parallel = TRUE))
search_step_res
  > method: varma 
  > expected: 832, searched: 832 (100%), failed: 0 (0%)
  > elapsed time: 0.1513225 minutes 
  > --------
  > 1. aic:
  >  Y1 (best=309.752)
  > 2. sic:
  >  Y1 (best=321.315)

The first two lines define the steps. We use all variables (NA in count_steps means all) to estimate models with the sizes given by the first element of y_size_steps. Then we select a number of variables (here, 10) based on the information provided by the best models and estimate models with the sizes given by the second element of y_size_steps, and so on; a longer schedule follows the same pattern, as in the sketch below.
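For illustration, a hypothetical three-step schedule might look like this (the sizes and counts are arbitrary, and the new variable names merely avoid overwriting the schedule defined above):

# A hypothetical three-step schedule: search all variables at sizes 1 and 2,
# keep the 20 best-performing variables for size-3 models, then keep 10 of
# them for models of sizes 4 and 5.
y_size_steps_3 <- list(c(1, 2), c(3), c(4, 5))
count_steps_3 <- c(NA, 20, 10)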

The size of the model subset and the running time are greatly reduced. If the result is still not satisfactory, note that the time-consuming part of the search is the moving-average part of the VARMA model. Therefore, one can find a smaller subset of potential variables by first estimating VAR models and then using the results to estimate VARMA models.
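Here is a sketch of such a VAR screening pass, under the assumption that setting the maximum MA orders in maxParams to zero restricts the search to pure VAR models; the other arguments simply repeat the stepwise call above:

# Screening with pure VAR models: the third element of maxParams (the
# maximum MA order) is set to zero, which should make each estimation
# much cheaper than in the VARMA case.
var_screen <-
  search.varma.stepwise(y = sample$y, x = sample$x, numTargets = 1,
                        maxParams = c(2, 1, 0, 0, 0, 0),
                        ySizeSteps = y_size_steps, countSteps = count_steps,
                        metricOptions = get.options.metric(typesIn = c("aic", "sic")),
                        searchItems = get.items.search(bestK = 10),
                        searchOptions = get.options.search(printMsg = FALSE, parallel = TRUE))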

To study or report the results, we should use the summary function. The output of a search project in ldt does not contain estimation results, but only the minimum level of information needed to replicate them; the summary function re-estimates the models. Here is the code:

ssum <- summary(search_step_res, 
                y = sample$y, x = sample$x, test = TRUE)

Usually, there is more than one model in the summary output, because the output is organized first by target variable and second by evaluation metric. As before, we can use the estimated best models for prediction.
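The exact layout of the summary object is not documented here and may change across ldt versions; exploring its structure is a safe way to find where the re-estimated models live:

# Explore the structure of the summary output before extracting a model.
str(ssum, max.level = 2)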