---
title: "Kernel and Streaming PLS Methods in bigPLSR"
shorttitle: "Kernel and Streaming PLS Methods"
author:
  - name: "Frédéric Bertrand"
    affiliation:
      - Cedric, Cnam, Paris
    email: frederic.bertrand@lecnam.net
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    toc: true
vignette: >
  %\VignetteIndexEntry{Kernel and Streaming PLS Methods in bigPLSR}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup_ops, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "figures/kpls_review-",
  fig.width = 7,
  fig.height = 5,
  dpi = 150,
  message = FALSE,
  warning = FALSE
)
LOCAL <- identical(Sys.getenv("LOCAL"), "TRUE")
set.seed(2025)
```

## Notation

Let \(X \in \mathbb{R}^{n\times p}\) and \(Y \in \mathbb{R}^{n\times m}\). We assume column-centered data unless stated otherwise. PLS extracts latent scores \(T=[t_1,\dots,t_A]\) with loadings and weights chosen so that the covariance between \(X\) and \(Y\) along each \(t_a\) is maximized, subject to orthogonality constraints across components.

For kernel methods, let \(\phi(\cdot)\) be an implicit feature map and define the Gram matrix \(K_X = \Phi_X \Phi_X^\top\) with \((K_X)_{ij} = k(x_i, x_j)\). The centering operator \(H = I_n - \frac{1}{n}\mathbf{1}\mathbf{1}^\top\) yields the centered Gram matrix \(\tilde K_X = H K_X H\).

## Pseudo-code for bigPLSR algorithms

The package implements several complementary extraction schemes. The following pseudo-code summarises the core loops.

### SIMPLS (dense/bigmem)

1. Compute centered cross-products \(C_{xx} = X^\top X\) and \(C_{xy} = X^\top Y\).
2. Initialise an empty orthonormal basis \(V = []\).
3. For each component \(a = 1, \dots, A\):
   - Deflate \(C_{xy}\) against the subspace spanned by \(V\).
   - Extract \(q_a\) as the dominant eigenvector of \(C_{xy}^\top C_{xy}\).
   - Compute \(w_a = C_{xy} q_a\) and normalise it under the \(C_{xx}\)-metric.
   - Obtain loadings \(p_a = C_{xx} w_a\) and regression weights \(c_a = C_{xy}^\top w_a\).
   - Expand \(V \leftarrow [V, p_a]\), orthonormalised against the previous columns.
4. Form \(W = [w_a]\), \(P = [p_a]\), \(Q = [c_a]\) and compute the regression coefficients \(B = W (P^\top W)^{-1} Q^\top\).

### NIPALS (dense/streamed)

1. Initialise \(t_a\) from \(Y\) (or \(X\)).
2. Iterate until convergence:
   - \(w_a = X^\top t_a / (t_a^\top t_a)\), then normalise \(w_a\).
   - \(t_a = X w_a\).
   - \(c_a = Y^\top t_a / (t_a^\top t_a)\).
   - \(u_a = Y c_a\) (for multi-response data).
3. Deflate \(X \leftarrow X - t_a p_a^\top\) with \(p_a = X^\top t_a / (t_a^\top t_a)\), deflate \(Y \leftarrow Y - t_a c_a^\top\), and repeat for the next component.

### Kernel PLS / RKHS (dense & streamed)

1. Form (or stream) the centered Gram matrix \(\tilde K_X\).
2. At each iteration, extract a dual weight vector \(\alpha_a\) maximising covariance with \(Y\).
3. Obtain the score \(t_a = \tilde K_X \alpha_a\), regress \(Y\) on \(t_a\) to get \(q_a\), and deflate in the \(\tilde K_X\) metric.
4. Accumulate \(\alpha_a\), \(q_a\) and the orthonormal basis needed for subsequent deflation steps.

### Double RKHS (`algorithm = "rkhs_xy"`)

1. Build (or approximate) Gram matrices for \(X\) and \(Y\).
2. Extract dual directions \(\alpha_a\) and \(\beta_a\) so that the score pair \((t_a, u_a)\) maximises covariance under both kernels.
3. Use ridge-regularised projections to obtain the regression weights.
4. Store the kernel centering statistics needed for prediction.

### Kalman-filter PLS (`algorithm = "kf_pls"`)

1. Maintain exponentially weighted means \(\mu_x, \mu_y\).
2. Update the cross-products \(C_{xx}, C_{xy}\) with a forgetting factor \(\lambda\) and optional process noise.
3. Periodically call SIMPLS on the smoothed moments to recover regression coefficients consistent with the streamed state; a minimal sketch of this idea follows the list.
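To make the streaming recipe concrete, here is a minimal base-R sketch. The helpers `update_moments()` and `simpls_from_moments()` are illustrative only and are not part of the `bigPLSR` API: they accumulate exponentially weighted means and cross-products batch by batch, then run a SIMPLS step directly on the smoothed moments, so the raw data never have to be held in memory at once.

```{r kf_pls_moments_sketch, eval = LOCAL}
## Illustrative sketch only: these helpers are not exported by bigPLSR.
## Exponentially weighted update of the means and cross-products used by a
## Kalman-filter-style streaming PLS (forgetting factor `lambda`).
update_moments <- function(state, Xb, Yb, lambda = 0.99) {
  Xb <- as.matrix(Xb); Yb <- as.matrix(Yb)
  for (i in seq_len(nrow(Xb))) {
    x <- Xb[i, ]; y <- Yb[i, ]
    state$mu_x <- lambda * state$mu_x + (1 - lambda) * x
    state$mu_y <- lambda * state$mu_y + (1 - lambda) * y
    xc <- x - state$mu_x; yc <- y - state$mu_y
    state$Cxx <- lambda * state$Cxx + (1 - lambda) * tcrossprod(xc)
    state$Cxy <- lambda * state$Cxy + (1 - lambda) * tcrossprod(xc, yc)
  }
  state
}

## SIMPLS run on the smoothed moments: only C_xx and C_xy are needed, never X.
simpls_from_moments <- function(Cxx, Cxy, ncomp) {
  p <- nrow(Cxx); m <- ncol(Cxy)
  R <- matrix(0, p, ncomp); Q <- matrix(0, m, ncomp); V <- matrix(0, p, ncomp)
  S <- Cxy
  for (a in seq_len(ncomp)) {
    q <- eigen(crossprod(S), symmetric = TRUE)$vectors[, 1]  # dominant eigenvector
    r <- S %*% q
    r <- r / sqrt(drop(crossprod(r, Cxx %*% r)))  # unit score norm in the Cxx metric
    p_a <- Cxx %*% r                              # X loadings
    q_a <- crossprod(Cxy, r)                      # Y loadings
    v <- p_a
    if (a > 1) v <- v - V[, 1:(a - 1), drop = FALSE] %*%
                        crossprod(V[, 1:(a - 1), drop = FALSE], p_a)
    v <- v / sqrt(drop(crossprod(v)))
    S <- S - v %*% crossprod(v, S)                # deflate the cross-product
    R[, a] <- r; Q[, a] <- q_a; V[, a] <- v
  }
  R %*% t(Q)  # scores T = X R are orthonormal, so B = R Q'
}

## Simulated stream of 20 batches of 50 observations
p <- 10
beta <- runif(p, -1, 1)
state <- list(mu_x = numeric(p), mu_y = 0,
              Cxx = matrix(0, p, p), Cxy = matrix(0, p, 1))
for (b in 1:20) {
  Xb <- matrix(rnorm(50 * p), 50, p)
  Yb <- Xb %*% beta + rnorm(50, sd = 0.1)
  state <- update_moments(state, Xb, Yb)
}
## Compare the streamed 3-component fit with the generating coefficients
B_hat <- simpls_from_moments(state$Cxx, state$Cxy, ncomp = 3)
round(cbind(truth = beta, streamed = drop(B_hat)), 2)
```

Because the SIMPLS step only consumes \(C_{xx}\) and \(C_{xy}\), the coefficient estimate can be refreshed at any point in the stream without revisiting earlier batches.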
Common kernels:

\[
\begin{aligned}
\text{Linear:}\quad& k(x,z) = x^\top z \\
\text{RBF:}\quad& k(x,z) = \exp(-\gamma \|x-z\|^2) \\
\text{Polynomial:}\quad& k(x,z) = (\gamma\,x^\top z + c_0)^{d} \\
\text{Sigmoid:}\quad& k(x,z) = \tanh(\gamma\,x^\top z + c_0).
\end{aligned}
\]

### Centering the Gram matrix

Given \(K\in\mathbb{R}^{n\times n}\), the centered version is:

\[
\tilde K = H K H, \quad H = I_n - \tfrac{1}{n}\mathbf{1}\mathbf{1}^\top.
\]

---

## KLPLS / Kernel PLS (Dayal & MacGregor)

We operate in the dual. Consider \(K_X\) and \(K_{XY} = K_X Y\). At step \(a\), we extract a dual direction \(\alpha_a\) so that the score \(t_a = \tilde K_X \alpha_a\) maximizes covariance with \(Y\), subject to orthogonality in the RKHS metric:

\[
\max_{\alpha} \ \mathrm{cov}(t, Y) \quad \text{s.t.}\quad t = \tilde K_X \alpha, \quad t^\top t = 1, \quad t^\top t_b = 0 \ \text{for } b < a.
\]

A small numerical sketch of this dual extraction is given after the references.

## References

- Rosipal, R., & Trejo, L. J. (2001). Kernel partial least squares regression in reproducing kernel Hilbert space. *Journal of Machine Learning Research*, **2**, 97–123.
- Tenenhaus, A., et al. Kernel logistic PLS.
- Sparse kernel partial least squares regression. In *Lecture Notes in Computer Science* proceedings.
- Kernel PLS regression II (double RKHS). *IEEE Transactions on Neural Networks and Learning Systems*.
- KF-PLS (2024).
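## Numerical sketch of the dual extraction

The following base-R sketch illustrates, on a toy nonlinear data set, the dual NIPALS loop described in the KLPLS section: it centers an RBF Gram matrix with \(H K H\), extracts dual directions component by component, deflates in the centered-Gram metric, and recovers fitted values from the accumulated dual coefficients. It is a didactic illustration in the spirit of Rosipal & Trejo (2001), not the routine used internally by `bigPLSR`; the helper names `rbf_kernel()` and `kernel_pls_dual()` are purely illustrative.

```{r kernel_pls_dual_sketch, eval = LOCAL}
## Illustrative helpers only; not part of the bigPLSR API.
rbf_kernel <- function(A, B = A, gamma = 1 / ncol(A)) {
  d2 <- outer(rowSums(A^2), rowSums(B^2), "+") - 2 * tcrossprod(A, B)
  exp(-gamma * pmax(d2, 0))
}

## Dual (kernel) NIPALS loop: scores live in the span of the centered Gram
## matrix, which is deflated with the projector (I - t t') after each component.
kernel_pls_dual <- function(K, Y, ncomp = 2, tol = 1e-8, maxit = 200) {
  n  <- nrow(K)
  H  <- diag(n) - matrix(1 / n, n, n)   # centering operator
  Kc <- H %*% K %*% H                   # centered Gram matrix
  Yc <- scale(Y, scale = FALSE)         # centered response(s)
  Kd <- Kc; Yd <- Yc
  Tmat <- matrix(0, n, ncomp); Umat <- matrix(0, n, ncomp)
  for (a in seq_len(ncomp)) {
    u_a <- Yd[, 1]
    for (it in seq_len(maxit)) {
      t_a <- Kd %*% u_a
      t_a <- t_a / sqrt(drop(crossprod(t_a)))
      c_a <- crossprod(Yd, t_a)
      u_new <- Yd %*% c_a
      u_new <- u_new / sqrt(drop(crossprod(u_new)))
      if (sqrt(sum((u_new - u_a)^2)) < tol) { u_a <- u_new; break }
      u_a <- u_new
    }
    Tmat[, a] <- t_a; Umat[, a] <- u_a
    D  <- diag(n) - tcrossprod(t_a)     # deflation projector I - t t'
    Kd <- D %*% Kd %*% D
    Yd <- D %*% Yd
  }
  ## dual coefficients: fitted values are Kc %*% alpha
  alpha <- Umat %*% solve(crossprod(Tmat, Kc %*% Umat), crossprod(Tmat, Yc))
  list(scores = Tmat, alpha = alpha, fitted = Kc %*% alpha)
}

## Toy nonlinear example recovered with a few kernel components
n <- 120
X <- matrix(runif(n * 2, -1, 1), n, 2)
y <- sin(pi * X[, 1]) + 0.5 * X[, 2]^2 + rnorm(n, sd = 0.1)
fit <- kernel_pls_dual(rbf_kernel(X, gamma = 2), matrix(y), ncomp = 4)
cor(drop(fit$fitted), y)
```

Predicting at new points would additionally require centering the test-versus-training kernel matrix with the stored training statistics, which is why the pseudo-code above keeps the kernel centering information alongside the dual weights.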