---
title: "**Methodological Details of the Error Correction Model with MARS**"
author: "José Mauricio Gómez Julián"
date: "`r format(Sys.Date(), '%B %Y')`"
output:
  rmarkdown::html_vignette:
    toc: true
    number_sections: false
vignette: >
  %\VignetteIndexEntry{Methodological Details of the Error Correction Model with MARS}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include=FALSE}
is_cran <- !identical(Sys.getenv("NOT_CRAN"), "true")
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE,
  warning = FALSE,
  fig.path = "figures/"
)
# If computation is heavy, set the chunks to eval = !is_cran
```

# **Econometric Framework for Cointegration and Error Correction Analysis**

This document details the methodology implemented to evaluate long-run equilibrium relationships and short-run adjustment dynamics between production and circulation variables through a hybrid approach that combines classical cointegration econometrics with modern non-parametric statistical learning. The methodology integrates the rigor of cointegration analysis with the flexibility of Multivariate Adaptive Regression Splines (MARS) to capture non-linearities in the error correction mechanism.

## **Determination of Integration Order I(1)**

### **Theoretical Foundation**

Cointegration analysis requires that the time series involved be integrated of order one, denoted $I(1)$. An $I(1)$ series is non-stationary in levels but becomes stationary after first differencing. This property is fundamental because only $I(1)$ processes can maintain long-run equilibrium relationships without diverging indefinitely.

### **Verification Protocol**

Let $z_t \in \{Y_t, X_t\}$ be a time series. We determine that $z_t \sim I(1)$ through the following two-stage protocol.

**Stage 1: Non-stationarity in levels.** We apply the Augmented Dickey-Fuller (ADF) test with deterministic components:

$$\Delta z_t = \mu + \beta t + \rho z_{t-1} + \sum_{i=1}^{p} \psi_i \Delta z_{t-i} + \varepsilon_t$$

where we test $H_0: \rho = 0$ (unit root). We require failure to reject $H_0$ at the 10% level under both the drift and the trend specifications, indicating non-stationarity that is robust across deterministic specifications.

**Stage 2: Stationarity in first differences.** We test the first difference without deterministic components:

$$\Delta^2 z_t = \rho' \Delta z_{t-1} + \sum_{i=1}^{p} \psi'_i \Delta^2 z_{t-i} + \eta_t$$

We require rejection of $H_0: \rho' = 0$ at 10%, confirming stationarity after differencing.

### **Justification of the Significance Level**

We use 10% in the $I(1)$ tests as a conservative pre-filter to reduce the False Negative Rate (FNR)[^1]. This more permissive threshold at the initial stage is compensated by stricter filters in subsequent stages (cointegration at 5%, unidirectional ECM at 5%, out-of-sample validation).
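This protocol is straightforward to implement. The following is a minimal sketch using `urca::ur.df()`; the wrapper `is_I1()` and the fixed maximum lag of 4 are illustrative choices rather than the production configuration:

```{r, eval=FALSE}
library(urca)

is_I1 <- function(z, max_lags = 4) {
  # Stage 1: fail to reject the unit root in levels at 10% under BOTH
  # the drift and the trend specifications. The ADF test rejects for
  # large negative statistics, so a statistic above the critical value
  # means failure to reject.
  nonstationary_levels <- all(vapply(c("drift", "trend"), function(spec) {
    adf <- ur.df(z, type = spec, lags = max_lags, selectlags = "BIC")
    tau <- if (spec == "drift") "tau2" else "tau3"
    adf@teststat[1, tau] > adf@cval[tau, "10pct"]
  }, logical(1)))

  # Stage 2: reject the unit root in first differences at 10%,
  # with no deterministic components.
  adf_d <- ur.df(diff(z), type = "none", lags = max_lags, selectlags = "BIC")
  stationary_diffs <- adf_d@teststat[1, "tau1"] < adf_d@cval["tau1", "10pct"]

  nonstationary_levels && stationary_diffs
}
```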
## **Cointegration Analysis**

### **Dual Approach: Engle-Granger and Johansen**

We implement two complementary cointegration methodologies, recognizing that each has specific strengths.

#### **Engle-Granger Procedure with Phillips-Ouliaris**

**Step 1: Cointegration regression.** We estimate the long-run relationship:

$$Y_t = \alpha + \beta X_t + u_t$$

where $\alpha$ and $\beta$ are the parameters of the cointegrating vector $(1, -\alpha, -\beta)'$.

**Step 2: Residual stationarity test.** We apply two tests to $\hat{u}_t$:

- **ADF without deterministics**: tests for a unit root in the residuals at level $p < 0.05$
- **Phillips-Ouliaris (PO)**: a test robust to endogeneity and serial correlation

**Step 3: "Either" decision rule.** We accept cointegration if **either** test (ADF or PO) validates at 5%. This rule reduces the FNR in the presence of structural breaks and is compensated by stricter subsequent validation.

#### **Johansen Trace Test**

We specify a VAR of order $K$ (selected by the BIC criterion via `VARselect`) and transform it to its VECM representation:

$$\Delta Z_t = \Pi Z_{t-1} + \sum_{i=1}^{K-1} \Gamma_i \Delta Z_{t-i} + \Psi D_t + \varepsilon_t$$

where:

- $Z_t = (Y_t, X_t)'$ is the variable vector
- $\Pi = \alpha\beta'$ is the long-run impact matrix
- $D_t$ contains the deterministics (constant and/or trend)

We test $H_0: r = 0$ (no cointegrating vectors) using the trace statistic. We run the test with the specifications $\{\text{const}, \text{trend}\}$ and accept cointegration if the statistic exceeds the 5% critical value for any specification.

### **Justification of the "Either" Rule**

The "either" rule (EG **or** Johansen) instead of "both" (EG **and** Johansen) is justified by:

1. **Robustness to regime changes**: different tests may be sensitive to different types of structural breaks
2. **Compensation with subsequent filters**: the ECM with $\lambda < 0$ and out-of-sample validation filter out false positives
3. **Empirical evidence**: Monte Carlo experiments show a lower FNR without substantial FPR inflation when combined with predictive validation
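The dual approach can be sketched as follows. This is an illustrative implementation, not the exact production code: the Engle-Granger step compares the residual ADF statistic against standard (rather than Engle-Granger-specific) critical values, and `test_cointegration()` is a hypothetical wrapper:

```{r, eval=FALSE}
library(urca)
library(vars)

test_cointegration <- function(Y, X) {
  ## Engle-Granger: static long-run regression, then residual-based tests
  eg     <- lm(Y ~ X)
  adf    <- ur.df(residuals(eg), type = "none", lags = 4, selectlags = "BIC")
  adf_ok <- adf@teststat[1, "tau1"] < adf@cval["tau1", "5pct"]

  ## Phillips-Ouliaris on the system (robust to endogeneity and serial
  ## correlation); large statistics reject H0 of no cointegration
  Z     <- cbind(Y, X)
  po    <- ca.po(Z, demean = "constant")
  po_ok <- as.numeric(po@teststat) > po@cval[1, "5pct"]

  ## Johansen trace test, VAR order K selected by BIC ("SC" in vars)
  K     <- max(2, VARselect(Z, lag.max = 8, type = "const")$selection[["SC(n)"]])
  jo_ok <- any(vapply(c("const", "trend"), function(ec) {
    jo <- ca.jo(Z, type = "trace", ecdet = ec, K = K)
    # the last row of the Johansen tables corresponds to H0: r = 0
    tail(jo@teststat, 1) > jo@cval[nrow(jo@cval), "5pct"]
  }, logical(1)))

  list(engle_granger = adf_ok || po_ok,
       johansen      = jo_ok,
       cointegrated  = adf_ok || po_ok || jo_ok)   # "either" rule
}
```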
## **Error Correction Model (ECM)**

### **Linear ECM Specification**

Given the cointegrating vector $(\alpha, \beta)$ from Engle-Granger, we construct:

$$\text{ECM1}_t = Y_{t-1} - \alpha - \beta X_{t-1}$$

This term captures the deviation from long-run equilibrium at $t-1$. The complete ECM model is:

$$\Delta Y_t = \lambda \cdot \text{ECM1}_t + \sum_{i=1}^{L} \phi_i \Delta Y_{t-i} + \sum_{i=1}^{L} \gamma_i \Delta X_{t-i} + \varepsilon_t$$

where:

- $\lambda < 0$ is the speed of adjustment toward equilibrium
- $L$ is the number of lags in differences
- $\phi_i, \gamma_i$ capture the short-run dynamics

### **Optimal Lag Selection**

We implement an automatic selection procedure:

1. **Search over $L \in \{1, 2, \ldots, L_{\max}\}$** with $L_{\max} = 4$ by default
2. **Primary criterion**: BIC, for parsimony
3. **White-noise constraint**: if the model with the lowest BIC fails the Ljung-Box test ($p \leq 0.05$ with 12 lags), we select the model with the lowest BIC that passes it
4. **Optional guidance**: $L \approx \max(1, K-1)$, where $K$ is the VAR order, though this is not binding

### **Unidirectional Test with HAC Errors**

#### **Hypothesis and Justification**

We test:

- $H_0: \lambda \geq 0$ (no correction, or divergence)
- $H_1: \lambda < 0$ (correction toward equilibrium exists)

The one-sided test is fundamental because $\lambda > 0$ would imply divergence from equilibrium, which is economically incoherent and would violate the system's stability condition.

#### **Robust Inference**

We employ HAC (Heteroskedasticity and Autocorrelation Consistent) standard errors using the Newey-West estimator:

$$\hat{V}_{NW} = \hat{\Omega}_0 + \sum_{j=1}^{m} w_j (\hat{\Omega}_j + \hat{\Omega}'_j)$$

where $w_j = 1 - j/(m+1)$ are Bartlett weights and $m$ is selected automatically. The robust $t$ statistic is:

$$t = \frac{\hat{\lambda}}{\sqrt{\hat{V}_{NW,\lambda\lambda}}}$$

with one-sided p-value $p = P(T \leq t \mid H_0)$. We reject if $p < 0.05$ **and** $\hat{\lambda} < 0$.
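A compact sketch of the ECM estimation and the one-sided HAC test follows, with the lag order fixed at $L = 1$ for brevity (the BIC search over $L$ and the Ljung-Box filter described above are omitted); `ecm_test()` is an illustrative wrapper:

```{r, eval=FALSE}
library(sandwich)

ecm_test <- function(Y, X) {
  ## Step 1 coefficients give the equilibrium deviation at t-1
  eg   <- lm(Y ~ X)
  n    <- length(Y)
  ecm1 <- Y[-n] - coef(eg)[1] - coef(eg)[2] * X[-n]

  dY <- diff(Y); dX <- diff(X)
  k  <- length(dY)                        # = n - 1, aligned with ecm1
  dat <- data.frame(dY   = dY[2:k],        ecm1 = ecm1[2:k],
                    dY_l = dY[1:(k - 1)],  dX_l = dX[1:(k - 1)])
  fit <- lm(dY ~ ecm1 + dY_l + dX_l, data = dat)

  ## One-sided HAC test of H1: lambda < 0 (Newey-West, Bartlett kernel,
  ## automatic bandwidth)
  V     <- NeweyWest(fit, prewhite = FALSE)
  t_hac <- coef(fit)["ecm1"] / sqrt(V["ecm1", "ecm1"])
  p_one <- pt(t_hac, df = df.residual(fit))   # left-tail p-value

  list(lambda = unname(coef(fit)["ecm1"]),
       t = unname(t_hac), p = unname(p_one),
       correction = unname(p_one < 0.05 && coef(fit)["ecm1"] < 0))
}
```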
## **Non-Linear Extension with MARS**

### **Economic Motivation**

Economic relationships frequently exhibit non-linearities:

- **Threshold effects**: different responses depending on variable levels
- **Asymmetries**: different adjustments for positive vs. negative deviations
- **Regime changes**: parameters that vary with the economic context

MARS captures these features through adaptive basis functions without requiring an *a priori* specification of the functional form.

### **MARS-ECM Model Specification**

The non-linear model is specified as:

$$\Delta Y_t = f(\text{ECM1}_t, \Delta X_t, \Delta Y_{t-1}, \Delta X_{t-1}, \Delta Y_{t-2}) + \eta_t$$

where $f(\cdot)$ is approximated by MARS as:

$$f(\mathbf{x}) = \beta_0 + \sum_{m=1}^{M} \beta_m \prod_{k=1}^{K_m} h_{km}(x_{v(k,m)})$$

with:

- $h_{km}$ are hinge functions: $\max(0, x - c)$ or $\max(0, c - x)$
- $M$ is the number of basis functions (controlled by `nk`)
- $K_m$ is the interaction degree (controlled by `degree`)

### **Hyperparameter Configuration**

The search grid specifies:

- **degree** $\in \{1, 2\}$: controls interactions (1 = additive, 2 = allows pairwise interactions)
- **nk** $\in \{15, 25, 35, 50, 65\}$: maximum number of terms before pruning

This grid balances flexibility against overfitting risk and can be expanded as more historical data become available.

## **Temporal Cross-Validation**

### **Rolling-Origin with Sliding Window**

We implement temporal cross-validation that respects causality through rolling-origin evaluation with a sliding window.

#### **Configuration Parameters**

- **Initial size**: $\max(40, 0.80 \times n)$ observations
- **Test horizon**: 12 months (annual forecast)
- **Step between origins**: 12 months (avoids overlap)
- **Window type**: sliding (constant size) vs. expanding (cumulative)

#### **Sliding Window Justification**

The sliding window maintains a "comparable memory" across folds and is more sensitive to regime changes than the expanding window. This is crucial for economic series whose parameters cannot be assumed constant. Empirically, the sliding window:

1. Preserves global stability (the same number of robust models)
2. Increases local sensitivity (detects more regime-dependent relationships)
3. Improves adaptation to recent dynamics

### **Nested Cross-Validation**

For hyperparameter selection without contaminating the evaluation:

1. **Outer level**: rolling-origin for performance evaluation
2. **Inner level**: within each outer training set, an additional rolling-origin scheme with:
   - Initial size: 60% of the outer training set
   - Inner test: 6 months
   - Inner step: 3 months

This structure avoids *data snooping*[^2] and provides unbiased estimates of the generalization error. (A compact sketch of the outer sliding-window loop appears at the end of this vignette.)

## **Evaluation Metrics**

### **Scale-Dependent Metrics**

- **RMSE**: $\sqrt{\frac{1}{n}\sum_{t=1}^n (Y_t - \hat{Y}_t)^2}$
- **MAE**: $\frac{1}{n}\sum_{t=1}^n |Y_t - \hat{Y}_t|$

### **Relative Metrics**

- **MAPE**: $\frac{100}{n}\sum_{t=1}^n \left|\frac{Y_t - \hat{Y}_t}{Y_t}\right|$ (protected, computed only for $|Y_t| > \epsilon$)
- **sMAPE**: $\frac{100}{n}\sum_{t=1}^n \frac{2|Y_t - \hat{Y}_t|}{|Y_t| + |\hat{Y}_t|}$
- **Theil's U**: $\frac{\sqrt{\overline{(Y_t - \hat{Y}_t)^2}}}{\sqrt{\overline{Y_t^2}}}$

### **Explanatory Metric**

- **Protected $R^2$**:

$$R^2 = \begin{cases} 1 - \frac{\sum(Y_t - \hat{Y}_t)^2}{\sum(Y_t - \bar{Y})^2} & \text{if } SST > \epsilon \\ \text{NA} & \text{if } SST \leq \epsilon \end{cases}$$

### **Theil Decomposition**

The MSE decomposes into interpretable components:

$$\text{MSE} = \underbrace{(\bar{\hat{Y}} - \bar{Y})^2}_{\text{Bias}^2} + \underbrace{(\sigma_{\hat{Y}} - \sigma_Y)^2}_{\text{Variance differential}} + \underbrace{2\sigma_{\hat{Y}}\sigma_Y(1 - \rho_{\hat{Y},Y})}_{\text{Imperfect covariance}}$$

The proportions (`bias_prop`, `var_prop`, `cov_prop`) diagnose the primary source of predictive error.

## **Temporal Stability Criteria**

### **Support Metric**

We define support as:

$$\text{support} = \frac{\text{folds\_proceed}}{\text{folds}}$$

where `folds_proceed` counts the folds that pass all econometric filters and show acceptable predictive performance.

### **Validation Thresholds**

- **Strict threshold**: support $\geq 0.75$ **and** folds_proceed $\geq 5$
- **Moderate threshold**: support $\geq 0.60$ **and** folds_proceed $\geq 3$

The absolute minimum requirement prevents "false robustness" arising from small denominators.

### **Stability-Adjusted Metrics**

- **Stable $R^2$**: $R^2_{\text{stab}} = R^2 \times \text{support}$
- **Stable U**: $U_{\text{stab}} = \frac{U}{\text{support}}$ (penalizes instability)

These metrics integrate predictive performance with temporal consistency, favoring "good and constant" models over "sometimes excellent" ones.

## **Computational Implementation**

### **Multi-Level Parallelization**

The parallelization architecture operates at two levels:

1. **Pair level**: each combination $(X \to Y)$ is processed by an independent worker
2. **BLAS control**: `blas_set_num_threads(1)` is set to avoid CPU over-subscription

Workers are independent R processes (not threads) coordinated by `future::multisession`, each with its own memory. Setting `future.seed = TRUE` ensures reproducible random number streams across workers.

### **Progress Management**

The `progressr` package provides real-time feedback without interfering with the parallelization, which is crucial for long runs (84 pairs × multiple folds × nested validation).

### **Computational Complexity**

The total complexity is:

$$\mathcal{O}(N_{\text{pairs}} \times F_{\text{outer}} \times F_{\text{inner}} \times G \times C_{\text{model}})$$

where:

- $N_{\text{pairs}} = 84$ (6 circulation × 7 production × 2 directions)
- $F_{\text{outer}}$ = number of outer folds (typically 8-15)
- $F_{\text{inner}}$ = inner folds per outer fold (typically 3-5)
- $G$ = grid size (10 combinations: 2 values of `degree` × 5 values of `nk`)
- $C_{\text{model}}$ = cost of fitting a single model

[^1]: FNR (False Negative Rate): the proportion of genuine relationships that the filter incorrectly discards.

[^2]: Data snooping: re-using the same observations for both model selection and final evaluation, which biases performance estimates optimistically.
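To close, three short sketches illustrate how these pieces fit together in code; all names are illustrative rather than the package's exact API. First, the evaluation metrics and the Theil decomposition as a single helper, with `eps` guarding the protected variants:

```{r, eval=FALSE}
metrics <- function(y, yhat, eps = 1e-8) {
  n   <- length(y)
  e   <- y - yhat
  mse <- mean(e^2)
  ok  <- abs(y) > eps                      # protection for MAPE terms
  sst <- sum((y - mean(y))^2)

  ## Theil decomposition uses population standard deviations (1/n)
  s_y <- sd(y)    * sqrt((n - 1) / n)
  s_f <- sd(yhat) * sqrt((n - 1) / n)
  rho <- cor(yhat, y)

  c(RMSE   = sqrt(mse),
    MAE    = mean(abs(e)),
    MAPE   = if (any(ok)) 100 * mean(abs(e[ok] / y[ok])) else NA_real_,
    sMAPE  = 100 * mean(2 * abs(e) / (abs(y) + abs(yhat))),
    TheilU = sqrt(mse) / sqrt(mean(y^2)),
    R2     = if (sst > eps) 1 - sum(e^2) / sst else NA_real_,
    bias_prop = (mean(yhat) - mean(y))^2 / mse,
    var_prop  = (s_f - s_y)^2 / mse,
    cov_prop  = 2 * s_f * s_y * (1 - rho) / mse)
}
```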
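Next, the outer sliding-window loop around the MARS-ECM fit. This sketch assumes a pre-built data frame `df` whose column `dY` holds $\Delta Y_t$ and whose remaining columns hold the regressors ($\text{ECM1}_t$, $\Delta X_t$, and the lagged differences); the hyperparameters are fixed here, whereas the pipeline tunes them in the nested inner loop:

```{r, eval=FALSE}
library(earth)

rolling_mars <- function(df, horizon = 12, step = 12, degree = 1, nk = 15) {
  init    <- max(40, floor(0.80 * nrow(df)))      # initial window size
  origins <- seq(init, nrow(df) - horizon, by = step)
  folds <- lapply(origins, function(o) {
    train <- df[(o - init + 1):o, ]               # sliding: constant size
    test  <- df[(o + 1):(o + horizon), ]
    fit   <- earth(dY ~ ., data = train, degree = degree, nk = nk)
    metrics(test$dY, as.numeric(predict(fit, newdata = test)))
  })
  do.call(rbind, folds)                           # one row of metrics per fold
}
```

An expanding window is obtained by replacing the lower bound `(o - init + 1)` with `1`, at the cost of the regime sensitivity discussed above.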
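Finally, the parallel driver over pairs. Here `run_pair()` stands in for the full per-pair pipeline (the $I(1)$ checks, cointegration tests, ECM, and the loop above); it and `pairs` are placeholders:

```{r, eval=FALSE}
library(future)
library(future.apply)
library(progressr)

plan(multisession)                        # independent R worker processes

run_all_pairs <- function(pairs) {
  with_progress({
    p <- progressor(along = pairs)
    future_lapply(pairs, function(pair) {
      ## one BLAS thread per worker avoids CPU over-subscription
      RhpcBLASctl::blas_set_num_threads(1)
      res <- run_pair(pair)               # placeholder: pipeline for one (X -> Y)
      p()                                 # advance the progress bar
      res
    }, future.seed = TRUE)                # reproducible RNG across workers
  })
}
```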