In this vignette, I will introduce you to the main features of the ldt package for dealing with Vector Autoregressive Moving Average (VARMA) models. I will demonstrate how to perform common tasks such as estimating a VARMA model and making predictions with it. I will also discuss model uncertainty and how to define a VARMA model set and automatically search for the best models within this set. Additionally, we will explore the use of Principal Component Analysis as an alternative approach when dealing with a large number of potential endogenous or exogenous variables.
One of the main ideas behind ldt is to minimize user discretion. An analysis in ldt is generally based on a dataset and a set of rules that convert this dataset into a list of potential regressors and/or predictors. This rule-based approach to selecting data not only avoids discretion but is also implied by the word “automatically” used in the previous paragraph.
In this example, I will create an artificial dataset with relevant and irrelevant endogenous variables and some exogenous variables. The data is a sample from a known VARMA model. While we can discuss how well the estimation process can find the true parameters, this is not the main goal here. Instead, I will focus on explaining how to estimate, search, predict, and report results.
Let’s get started!
Let’s assume that we know the structure of the system and simulate data from a known VARMA model. The following command generates a sample from such a system of equations:
numObs <- 100       # sample size
numHorizon <- 10    # number of observations kept for out-of-sample prediction
startDate <- f.yearly(1900)   # yearly data starting in 1900
numEndo <- 2L       # number of endogenous variables
numExo <- 3L        # number of exogenous variables
numAr <- 2L         # AR order
numMa <- 1L         # MA order
d <- 1              # degree of integration

sample <- sim.varma(numEndo, numAr, numMa,
                    numExo, numObs, 10, TRUE, d,
                    startFrequency = startDate)
The parameters of the system are included in the output of the sim.varma function. This system has numEndo endogenous variables determined by an intercept, 3 exogenous variables, and the dynamics of the system. The sample size is 100. All coefficients of the model are generated randomly and are listed in the output sample. The MA coefficients are diagonal; this is related to identification issues in VARMA models. The parameter d indicates that the data is integrated. The numHorizon value determines the prediction horizon. We use the f.yearly function from the tdata package and assume that our data is yearly.
The LaTeX code for the equations of the system is in the eqsLatex element:
\[\begin{gather} \Delta Y_{1t} = -0.47 - 0.06\Delta Y_{1t-1} + 0.16\Delta Y_{2t-1} + 0.01\Delta Y_{1t-2} + 0.05\Delta Y_{2t-2} + 0.11 X_1 + 1.79 X_2 - 1.97 X_3\\ - 0.07 E_{1t-1} + 0 E_{2t-1} + E_{1t},\quad \sigma_1^2 = 1.70\\\Delta Y_{2t} = -1.07 - 0.02\Delta Y_{1t-1} + 0.01\Delta Y_{2t-1} + 0.17\Delta Y_{1t-2} - 0.13\Delta Y_{2t-2} - 0.56 X_1 + 0.50 X_2 + 0.70 X_3\\ + 0 E_{1t-1} - 0.07 E_{2t-1} + E_{2t},\quad \sigma_2^2 = 0.29 \end{gather}\]
The matrix representation is in the eqsLatexSys element:
\[\begin{gather} \begin{bmatrix}\Delta Y_{1t}\\\Delta Y_{2t}\end{bmatrix} = \begin{bmatrix}-0.47\\-1.07\end{bmatrix} + \begin{bmatrix}-0.06 & 0.16\\-0.02 & 0.01\end{bmatrix} \begin{bmatrix}\Delta Y_{1t-1}\\\Delta Y_{2t-1}\end{bmatrix} + \begin{bmatrix}0.01 & 0.05\\0.17 & -0.13\end{bmatrix} \begin{bmatrix}\Delta Y_{1t-2}\\\Delta Y_{2t-2}\end{bmatrix} + \\\begin{bmatrix}0.11 & 1.79 & -1.97\\-0.56 & 0.50 & 0.70\end{bmatrix}\begin{bmatrix}X_1\\X_2\\X_3\end{bmatrix}+ \\\begin{bmatrix}-0.07 & 0.00\\0.00 & -0.07\end{bmatrix}\begin{bmatrix}E_{1t-1}\\E_{2t-1}\end{bmatrix} + \begin{bmatrix}E_{1t}\\E_{2t}\end{bmatrix},\\ \Sigma = \begin{bmatrix} 1.70 & 0.33 \\ 0.33 & 0.29 \end{bmatrix} \end{gather}\]
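It can be helpful to inspect what sim.varma returns before estimating. A minimal sketch, using the elements referenced in this vignette (see the function’s documentation for the full list):
names(sample)          # elements returned by sim.varma
dim(sample$y)          # 100 x 2 matrix of endogenous data
dim(sample$x)          # 100 x 3 matrix of exogenous data
cat(sample$eqsLatex)   # the LaTeX code shown above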
We can use the ldt package to estimate these parameters:
# Exclude the last numHorizon observations (kept for prediction) and
# preserve the frequency attribute of the data:
y <- structure(sample$y[1:(numObs - numHorizon), , drop = FALSE],
               ldtf = attr(sample$y, "ldtf"))
x <- sample$x[1:(numObs - numHorizon), , drop = FALSE]

fit <- estim.varma(y = y, x = x,
                   params = c(numAr, d, numMa, 0, 0, 0))

# For presentation: split the coefficient matrix into AR, MA, exogenous,
# and intercept parts and rebuild the LaTeX formula via sim.varma:
params <- get.varma.params(fit$estimations$coefs, numAr, numMa, numExo, TRUE)
s0 <- sim.varma(fit$estimations$sigma, params$arList, params$maList,
                params$exoCoef, 10, 0, params$integer)
In the first part of the code, we exclude numHorizon observations from the end of the sample before estimation. We will use the excluded part for prediction in the next subsection. The params argument determines the lags of the model. The second part of the code is for presentation: it converts the coefficient matrix into AR, MA, and other coefficient matrices and generates a LaTeX formula. Here is the result in matrix form:
\[\begin{gather} \begin{bmatrix} Y_{1t}\\ Y_{2t}\end{bmatrix} = \begin{bmatrix}0.68\\-2.11\end{bmatrix} + \begin{bmatrix}-0.05 & 0.20\\-0.01 & 0.03\end{bmatrix} \begin{bmatrix} Y_{1t-1}\\ Y_{2t-1}\end{bmatrix} + \begin{bmatrix}0.11 & -0.02\\0.20 & -0.16\end{bmatrix} \begin{bmatrix} Y_{1t-2}\\ Y_{2t-2}\end{bmatrix} + \\\begin{bmatrix}0.19 & 1.51 & -1.87\\-0.59 & 0.56 & 0.79\end{bmatrix}\begin{bmatrix}X_1\\X_2\\X_3\end{bmatrix}+ \\\begin{bmatrix}-0.22 & 0.00\\0.00 & -0.22\end{bmatrix}\begin{bmatrix}E_{1t-1}\\E_{2t-1}\end{bmatrix} + \begin{bmatrix}E_{1t}\\E_{2t}\end{bmatrix},\\ \Sigma = \begin{bmatrix} 1.17 & 0.24 \\ 0.24 & 0.23 \end{bmatrix} \end{gather}\]
We can compare the estimated parameters to the actual ones. Keep in mind that we can get more satisfactory results by increasing the sample size (numObs) or decreasing the variance of the disturbances.
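For instance, the following sketch reuses the calls above to re-simulate with a larger sample and re-estimate:
# Re-simulate with 1000 observations instead of 100 and re-estimate;
# the estimates should generally be closer to that sample's true
# (randomly generated) coefficients.
sample_big <- sim.varma(numEndo, numAr, numMa,
                        numExo, 1000, 10, TRUE, d,
                        startFrequency = startDate)
fit_big <- estim.varma(y = sample_big$y, x = sample_big$x,
                       params = c(numAr, d, numMa, 0, 0, 0))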
One of the main goals in estimating and using a VARMA model is prediction. In the following code, I estimate the model and set the maxHorizon and newX arguments to tell the estim.varma function to return the predictions:
fit <- estim.varma(y = y, x = x,
                   params = c(numAr, 1, numMa, 0, 0, 0),
                   newX = sample$x[(numObs - numHorizon + 1):numObs, , drop = FALSE],
                   maxHorizon = numHorizon)
y_actual <- sample$y[(numObs - numHorizon + 1):numObs, , drop = FALSE]
The variable y_actual contains the actual observations over the prediction horizon. There are several ways to compare the predictions with the actual values. In this section, we plot them against each other using the ldt::fan.plot function. Note that the predictions are in the rows of the means and vars matrices of the fit$prediction element. The resulting plot shows the predictions alongside the actual data.
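As a plain base-R alternative to fan.plot, one can overlay the point predictions on the actual values. This is a minimal sketch, assuming (as noted above) that each row of fit$prediction$means holds the predicted means of one endogenous variable:
# Overlay point predictions on actual values for the first variable.
# We take the last numHorizon values in case the means matrix also
# stores pre-sample values.
pred1 <- tail(as.numeric(fit$prediction$means[1, ]), numHorizon)
plot(1:numHorizon, y_actual[, 1], type = "l", lwd = 2,
     xlab = "horizon", ylab = "Y1")
lines(1:numHorizon, pred1, col = "blue", lty = 2, lwd = 2)
legend("topleft", legend = c("actual", "predicted"),
       col = c("black", "blue"), lty = c(1, 2), bty = "n")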
We are going to repeat the previous example’s procedure, but this time with seasonal data. The following code generates the required data:
numObs_s <- 400       # sample size
numHorizon_s <- 40    # prediction horizon
startDate_s <- f.quarterly(1900, 1)   # quarterly data starting in 1900Q1
numAr_s <- 1L         # seasonal AR order
numMa_s <- 1L         # seasonal MA order
d_s <- 1              # degree of integration
D_s <- 1              # degree of seasonal integration

sample_s <- sim.varma(numEndo, numAr, numMa,
                      numExo, numObs_s, 10, TRUE, d_s,
                      startFrequency = startDate_s,
                      seasonalCoefs = c(numAr_s, D_s, numMa_s, 4))
The parameter D_s indicates that the model is seasonally integrated. The two parameters numAr_s and numMa_s determine the seasonal dynamics of the system. The matrix representation is in the eqsLatexSys element:
\[\begin{gather} \begin{bmatrix}\Delta\Delta_4 Y_{1t}\\\Delta\Delta_4 Y_{2t}\end{bmatrix} = \begin{bmatrix}-1.94\\0.11\end{bmatrix} + \begin{bmatrix}0.02 & -0.14\\-0.11 & 0.21\end{bmatrix} \begin{bmatrix}\Delta\Delta_4 Y_{1t-1}\\\Delta\Delta_4 Y_{2t-1}\end{bmatrix} + \begin{bmatrix}-0.07 & 0.05\\-0.19 & 0.03\end{bmatrix} \begin{bmatrix}\Delta\Delta_4 Y_{1t-2}\\\Delta\Delta_4 Y_{2t-2}\end{bmatrix} + \begin{bmatrix}-0.14 & -0.01\\-0.19 & 0.11\end{bmatrix} \begin{bmatrix}\Delta\Delta_4 Y_{1t-4}\\\Delta\Delta_4 Y_{2t-4}\end{bmatrix} + \\\begin{bmatrix}-0.40 & -0.83 & 0.74\\0.90 & -0.33 & 0.99\end{bmatrix}\begin{bmatrix}X_1\\X_2\\X_3\end{bmatrix}+ \\\begin{bmatrix}0.06 & 0.00\\0.00 & 0.06\end{bmatrix}\begin{bmatrix}E_{1t-1}\\E_{2t-1}\end{bmatrix} + \begin{bmatrix}-0.05 & 0.00\\0.00 & -0.05\end{bmatrix}\begin{bmatrix}E_{1t-4}\\E_{2t-4}\end{bmatrix} + \begin{bmatrix}E_{1t}\\E_{2t}\end{bmatrix},\\ \Sigma = \begin{bmatrix} 0.77 & 0.04 \\ 0.04 & 0.42 \end{bmatrix} \end{gather}\]
The following code estimates these parameters using the estim.varma function:
y <- structure(sample_s$y[1:(numObs_s - numHorizon_s), , drop = FALSE],
               ldtf = attr(sample_s$y, "ldtf"))
x <- sample_s$x[1:(numObs_s - numHorizon_s), , drop = FALSE]

# params = c(p, d, q, P, D, Q); seasonsCount gives the number of seasons.
fit <- estim.varma(y = y, x = x,
                   params = c(numAr_s, d_s, numMa_s, numAr_s, D_s, numMa_s),
                   newX = sample_s$x[(numObs_s - numHorizon_s + 1):numObs_s, , drop = FALSE],
                   maxHorizon = numHorizon_s,
                   seasonsCount = 4)

# As before, split the coefficients and rebuild the LaTeX formula:
params <- get.varma.params(fit$estimations$coefs, numAr_s, numMa_s,
                           numExo, TRUE, numAr_s, numMa_s, 4)
s0 <- sim.varma(fit$estimations$sigma, params$arList, params$maList,
                params$exoCoef, d = d_s, nObs = 10, intercept = params$integer,
                seasonalCoefs = c(numAr_s, D_s, numMa_s, 4))
The matrix representation of the estimated system is in the s0$eqsLatexSys element.
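Its LaTeX code can be printed directly; for example:
# Print the LaTeX code of the estimated seasonal system:
cat(s0$eqsLatexSys)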
Predicting with this model is similar to the previous subsection, so I do not report the code here, just the result.
Let’s consider a more realistic situation where model uncertainty exists. That’s where ldt can specifically help. In the previous subsections, we knew all the relevant endogenous variables. Here, we continue the non-seasonal example and consider a situation where there are some irrelevant endogenous variables too. We limit the level of uncertainty and other practical issues by restricting the number of these variables. The following code reflects our assumptions:
sample$y <- cbind(sample$y,
                  matrix(rnorm(numObs * 50), ncol = 50,
                         dimnames = list(NULL, paste0("w", 1:50))))
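A quick sanity check on the augmented data:
# After cbind, the endogenous block has 2 + 50 = 52 columns:
dim(sample$y)
# The names of the appended noise columns start with "w":
tail(colnames(sample$y), 3)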
There are 50 irrelevant and 2 relevant endogenous variables. The number of irrelevant variables is relatively large, and their names start with the letter w.
The following code uses the search.varma function to find the true model:
search_res <- search.varma(sample$y, sample$x, numTargets = 1,
                           ySizes = c(1:3),
                           maxParams = c(2, 1, 2, 0, 0, 0),
                           metricOptions = get.options.metric(typesIn = c("sic")),
                           searchOptions = get.options.search(printMsg = TRUE,
                                                              parallel = TRUE))
The ySizes = c(1:3) part assumes that we know the number of relevant dependent variables is at most 3. The metricOptions part shows that we use the SIC metric to evaluate and compare models. Also, numTargets = 1 shows that we are focusing on the first variable, Y1. Finding the best model means finding Y2 and the correct lag structure automatically. Here, the value of maxParams determines our guess about the maximum lag structure, i.e., the maximum values of the c(p, d, q, P, D, Q) parameters.
This code is very time-consuming and is not evaluated here. However, on my system, the elapsed time is 27 minutes (the number of searched models is 21232). You can compare it with similar experiments with binary regression or SUR models and see that it is considerably more time-consuming. Apart from other factors, VARMA models are relatively large (in the sense of the number of parameters). Also, in the current implementation, ldt uses numerical first and second derivatives in the L-BFGS optimization algorithm. Therefore, we really need to reduce the number of potential models.
One might reduce the number of potential explanatory variables using theory or statistical testing. Since ldt avoids user discretion, it provides a more systematic approach. The idea behind it is simple: estimate smaller models, select the best-performing variables, and then estimate larger models with this reduced set of potential endogenous variables. Here is the code:
y_size_steps <- list(c(1, 2), c(3))
count_steps <- c(NA, 10)

search_step_res <-
  search.varma.stepwise(y = sample$y, x = sample$x, numTargets = 1,
                        maxParams = c(2, 1, 2, 0, 0, 0),
                        ySizeSteps = y_size_steps, countSteps = count_steps,
                        metricOptions = get.options.metric(typesIn = c("aic", "sic")),
                        searchItems = get.items.search(bestK = 10),
                        searchOptions = get.options.search(printMsg = FALSE,
                                                           parallel = TRUE))
search_step_res
> method: varma
> expected: 832, searched: 832 (100%), failed: 0 (0%)
> elapsed time: 0.1513225 minutes
> --------
> 1. aic:
> Y1 (best=309.752)
> 2. sic:
> Y1 (best=321.315)
The first two lines define the steps. In the first step, we use all variables (NA in count_steps means all) to estimate models with the sizes given by the first element of y_size_steps. Then we select a number of variables based on the information provided by the best models and estimate models with the sizes given by the second element of y_size_steps, and so on.
The size of the model subset and the running time are greatly reduced. We could discuss performance here; however, if the result is not satisfactory, note that the time-consuming part of the search is the “moving average” part of the VARMA model. Therefore, one can find a smaller subset of potential variables by estimating VAR models first and then use the results to estimate VARMA models.
To study or report the results, we should use the summary function. The output of a search project in ldt does not contain estimation results, but only the minimum level of information needed to replicate them. The summary function re-estimates the models. Here is the code:
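A minimal sketch of this step (a hypothetical call; whether the original data must be passed again depends on the summary method of the search output, so check the package documentation):
# Re-estimate the best models found by the stepwise search
# (argument names are assumptions; see the documentation of the
# summary method for search results):
search_step_sum <- summary(search_step_res, y = sample$y, x = sample$x)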
Usually, there is more than one model in the summary output. This is because the output is first “target-variable-specific” and second “evaluation-specific”. As before, we can use the estimated best models for prediction.