# A Quick Introduction to the psqn Package

$\renewcommand\vec{\boldsymbol} \def\bigO#1{\mathcal{O}(#1)} \def\Cond#1#2{\left(#1\,\middle|\, #2\right)} \def\mat#1{\boldsymbol{#1}} \def\der{{\mathop{}\!\mathrm{d}}} \def\argmax{\text{arg}\,\text{max}} \def\Prob{\text{P}} \def\diag{\text{diag}} \def\argmin{\text{arg}\,\text{min}} \def\Expe{\text{E}}$

This is a quick introduction to the psqn package. A more detailed description can be found in the psqn vignette (call vignette("psqn", package = "psqn")). The main function in the package is the psqn function. The psqn function minimizes functions which can be written like:

$f(\vec x) = \sum_{i = 1}^n f_i(\vec x_{\mathcal I_i})$

where $$\vec x\in \mathbb R^l$$,

$\vec x_{\mathcal I_i} = (\vec e_{j_{i1}}^\top, \dots ,\vec e_{j_{im_i}}^\top)\vec x, \qquad \mathcal I_i = (j_{i1}, \dots, \mathcal j_{im_i}) \subseteq \{1, \dots, l\},$

and $$\vec e_k$$ is the $$k$$’th column of the $$l$$ dimensional identity matrix. We call the $$f_i$$s element functions and assume that each of them only depend on a small number of variables. Furthermore, we assume that each index set $$\mathcal I_i$$ is of the form:

\begin{align*} \mathcal I_i &= \{1,\dots, p\} \cup \mathcal J_i \\ \mathcal J_i \cap \mathcal J_j &= \emptyset \qquad j\neq i \\ \mathcal J_i \cap \{1,\dots, p\} &= \emptyset \qquad \forall i = 1,\dots, n \end{align*}.

That is, each index set contains $$p$$ global parameters and $$q_i = \lvert\mathcal J_i\rvert$$ private parameters which are particular for each element function, $$f_i$$. For implementation reason, we let:

\begin{align*} \overleftarrow q_i &= \begin{cases} p & i = 0 \\ p + \sum_{k = 1}^i q_k & i > 0 \end{cases} \\ \mathcal J_i &= \{1 + \overleftarrow q_{i - 1}, \dots , q_i + \overleftarrow q_{i - 1}\} \end{align*}

such that the element functions’ private parameters lies in consecutive parts of $$\vec x$$. There is also a less restricted optimizer called optimizer_generic were the parameters can overlap in an arbitrary way. The R interface for this function is implemented in the psqn_generic function. See vignette("psqn", package = "psqn") for further details on both the psqn and the psqn_generic function.

## The R Interface

As a simple example, we consider the element functions:

$f_i((\vec\beta^\top, \vec u^\top_i)^\top) = -\vec y_i(\mat X_i\vec\beta + \mat Z_i\vec u_i) + \sum_{k = 1}^{t_i} \log(1 + \exp(\vec x_{ik}^\top\vec\beta + \vec z_{ik}^\top\vec u_i)) + \frac 12 \vec u^\top_i\mat\Sigma^{-1} \vec u_i.$

$$\vec\beta$$ is the $$p$$ dimensional global parameter and $$\vec u_i$$ is the $$q_i = q$$ dimensional private parameters for the $$i$$th element function. $$\vec y_i\in \{0, 1\}^{t_i}$$, $$\mat X_i\in\mathbb R^{t_i\times p}$$, and $$\mat Z_i\in\mathbb R^{t_i\times q}$$ are particular to each element function. We simulate some data below to use:

# assign global parameters, number of private parameters, etc.
q <- 4 # number of private parameters per cluster
p <- 5 # number of global parameters
beta <- sqrt((1:p) / sum(1:p))
Sigma <- diag(q)

# simulate a data set
n_ele_func <- 1000L # number of element functions
set.seed(80919915)

sim_dat <- replicate(n_ele_func, {
t_i <- sample.int(40L, 1L) + 2L
X <- matrix(runif(p * t_i, -sqrt(6 / 2), sqrt(6 / 2)),
p)
u <- drop(rnorm(q) %*% chol(Sigma))
Z <- matrix(runif(q * t_i, -sqrt(6 / 2 / q), sqrt(6 / 2 / q)),
q)
eta <- drop(beta %*% X + u %*% Z)
y <- as.numeric((1 + exp(-eta))^(-1) > runif(t_i))

list(X = X, Z = Z, y = y, u = u, Sigma_inv = solve(Sigma))
}, simplify = FALSE)

# data for the first element function
sim_dat[[1L]]
#> $X #> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] #> [1,] 0.147 -1.246 0.840 -1.083 0.701 -1.6532 0.549 1.522 -0.449 0.254 #> [2,] -0.241 1.391 -0.873 -0.877 1.005 1.5401 -0.207 -0.999 1.249 1.379 #> [3,] -1.713 0.698 1.550 -1.335 0.687 0.0999 0.688 0.493 -0.992 0.780 #> [4,] 1.116 1.687 0.557 -1.380 0.294 1.2391 -1.331 -0.459 0.262 -1.351 #> [5,] -0.391 1.310 -1.477 -0.836 -1.542 1.3278 -0.788 -0.675 -1.184 0.208 #> [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] #> [1,] 0.3633 -0.578 1.5378 1.488 0.653 1.707 -1.4558 -1.396 0.58917 1.473 #> [2,] -0.4122 -0.616 -1.2711 0.256 -1.494 0.615 -0.4410 0.114 -0.56704 -0.261 #> [3,] 0.0818 -0.272 -1.4706 1.060 -0.959 -1.141 0.0916 -0.928 1.68352 -0.155 #> [4,] -1.4245 1.716 -0.9433 0.428 1.670 -0.254 -0.1064 -0.245 0.00692 0.161 #> [5,] 1.6009 1.628 0.0971 -0.818 0.402 -0.497 -1.3034 0.636 0.72653 -0.425 #> [,21] [,22] [,23] [,24] [,25] [,26] [,27] [,28] [,29] [,30] #> [1,] -1.086 -0.8711 1.2213 0.698 0.721 0.9319 -0.326 -0.00238 -1.164 0.203 #> [2,] 0.397 1.1903 -0.3113 -0.837 1.501 -0.0304 1.509 -0.17466 0.547 -0.667 #> [3,] 0.440 0.0235 -0.7929 0.305 -0.809 0.0949 -0.946 -0.44998 -0.761 -0.724 #> [4,] 0.222 1.2529 -0.0905 -0.879 -0.274 1.0152 0.492 -1.48076 -0.213 1.332 #> [5,] 0.872 -1.2783 1.0110 -1.225 0.904 1.0819 -1.243 0.34144 0.919 0.404 #> [,31] [,32] [,33] [,34] [,35] [,36] [,37] [,38] #> [1,] -0.333 -0.842 -0.3760 1.529 0.439 -1.227 -0.235 -0.562 #> [2,] 0.649 1.103 -1.1518 -0.277 -1.369 -0.951 1.702 1.685 #> [3,] 1.370 -1.343 1.5827 0.355 0.457 -1.509 -1.427 0.779 #> [4,] 0.179 1.544 -0.0281 -0.199 -0.923 -0.524 0.406 0.515 #> [5,] 0.138 -0.470 1.4224 0.271 -0.424 1.090 0.290 1.585 #> #>$Z
#>        [,1]   [,2]    [,3]   [,4]    [,5]   [,6]    [,7]     [,8]    [,9]
#> [1,] -0.178  0.838 -0.6811 -0.752 -0.0552 -0.076 -0.0308 -0.19838  0.7404
#> [2,]  0.843  0.372 -0.0235 -0.652 -0.1140 -0.531  0.7292  0.74028  0.0757
#> [3,]  0.432  0.231  0.8555  0.624  0.2864  0.583  0.5691  0.00112 -0.6749
#> [4,] -0.288 -0.297  0.6028  0.536 -0.8658  0.142 -0.0209  0.53635  0.8298
#>       [,10]  [,11]  [,12]  [,13]   [,14]   [,15]  [,16]   [,17]  [,18]  [,19]
#> [1,] 0.0646 -0.657 -0.566  0.806 -0.1562  0.0875 -0.039 -0.7200  0.835  0.220
#> [2,] 0.6481 -0.232  0.215 -0.425  0.0812 -0.3133  0.378 -0.0546 -0.553 -0.365
#> [3,] 0.1781 -0.331  0.691  0.683 -0.8539 -0.8270 -0.257 -0.4874 -0.753  0.747
#> [4,] 0.7046  0.643  0.141 -0.308 -0.7163  0.7274 -0.711 -0.0990 -0.400  0.642
#>       [,20]  [,21]   [,22]   [,23]   [,24]  [,25]  [,26]  [,27]  [,28]   [,29]
#> [1,] -0.315 -0.395 -0.4401 -0.1950  0.0695 -0.402 -0.297 -0.862 -0.448  0.8003
#> [2,] -0.053 -0.498 -0.6587  0.0463  0.4262 -0.440 -0.551  0.852  0.764  0.6065
#> [3,] -0.323  0.307  0.3877  0.5228 -0.5408 -0.532 -0.758  0.346 -0.847 -0.6973
#> [4,] -0.779 -0.779  0.0879  0.8470 -0.1618 -0.320 -0.858  0.527 -0.784  0.0282
#>        [,30]    [,31]  [,32]  [,33]  [,34]   [,35]  [,36]  [,37]  [,38]
#> [1,] -0.4176 -0.60760  0.502 -0.816 -0.568 -0.3325 -0.085 -0.656 -0.828
#> [2,] -0.6609  0.00662  0.296  0.480 -0.538 -0.3001  0.750  0.375 -0.358
#> [3,]  0.1750  0.25447  0.124 -0.367  0.129 -0.0108  0.711  0.751 -0.159
#> [4,]  0.0145  0.08370 -0.802  0.500  0.263 -0.0894 -0.637  0.416 -0.677
#>
#> $y #> [1] 1 1 0 0 1 0 0 0 0 1 0 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 0 0 0 1 1 1 0 0 0 1 1 1 #> #>$u
#> [1] -0.170  0.427  0.918 -2.080
#>
#> $Sigma_inv #> [,1] [,2] [,3] [,4] #> [1,] 1 0 0 0 #> [2,] 0 1 0 0 #> [3,] 0 0 1 0 #> [4,] 0 0 0 1 We work with $$\mat X_i^\top$$ and $$\mat Z_i^\top$$ for computational reasons. The function we need to pass to psqn needs to take three arguments: • An index of the element function. • A vector with $$\vec x_{\mathcal I_i}$$. It will have length zero if the backend requests an integer vector with $$p$$ and $$q_i$$. • A logical variable which is TRUE if the function should return an attribute with the gradient with respect to $$\vec x_{\mathcal I_i}$$. The function should return the element function value (potentially with the gradient as an attribute) or $$p$$ and $$q_i$$. Thus, an example in our case will be: r_func <- function(i, par, comp_grad){ dat <- sim_dat[[i]] X <- dat$X
Z <- dat$Z if(length(par) < 1) # requested the dimension of the parameter return(c(global_dim = NROW(dat$X), private_dim = NROW(dat$Z))) y <- dat$y
Sigma_inv <- dat$Sigma_inv beta <- par[1:p] u_i <- par[1:q + p] eta <- drop(beta %*% X + u_i %*% Z) exp_eta <- exp(eta) # compute the element function out <- -sum(y * eta) + sum(log(1 + exp_eta)) + sum(u_i * (Sigma_inv %*% u_i)) / 2 if(comp_grad){ # we also need to compute the gradient d_eta <- -y + exp_eta / (1 + exp_eta) grad <- c(X %*% d_eta, Z %*% d_eta + dat$Sigma_inv %*% u_i)
}

out
}

Then we can optimize the function as follows:

library(psqn)
start_val <- numeric(p + n_ele_func * q) # the starting value
opt_res <- psqn(par = start_val, fn = r_func, n_ele_func = n_ele_func)

# check the minimum
opt_res$value #> [1] 11969 # check the estimated global parameters head(opt_res$par, length(beta))
#> [1] 0.254 0.356 0.410 0.520 0.564

# should be close to
beta
#> [1] 0.258 0.365 0.447 0.516 0.577

## The R Interface for optimizer_generic

We can also use the psqn_generic function although it will be slower because of some additional computational overhead because the function is more general. The function we need to pass to psqn_generic needs to take three arguments:

• An index of the element function.
• A vector with $$\vec x_{\mathcal I_i}$$. This time, we make no assumptions about the index sets, the $$\mathcal I_i$$s. Thus, the argument will have length zero if the backend requests an integer vector with $$\mathcal I_i$$.
• A logical variable which is TRUE if the function should return an attribute with the gradient with respect to $$\vec x_{\mathcal I_i}$$.

We assign the function we need to pass to psqn_generic for the example in this vignette:

dat <- sim_dat[[i]]
X <- dat$X Z <- dat$Z

if(length(par) < 1)
# return the index set. This is one-based like in R
return(c(1:NROW(dat$X), seq_len(NROW(dat$Z)) + NROW(dat$X) + (i - 1L) * NROW(dat$Z)))

y <- dat$y Sigma_inv <- dat$Sigma_inv

beta <- par[1:p]
u_i <- par[1:q + p]
eta <- drop(beta %*% X + u_i %*% Z)
exp_eta <- exp(eta)

# compute the element function
out <- -sum(y * eta) + sum(log(1 + exp_eta)) +
sum(u_i * (Sigma_inv %*% u_i)) / 2

# we also need to compute the gradient
d_eta <- -y + exp_eta / (1 + exp_eta)
Z %*% d_eta + dat$Sigma_inv %*% u_i) attr(out, "grad") <- grad } out } Then we can optimize the function as follows: opt_res_generic <- psqn_generic( par = start_val, fn = r_func_generic, n_ele_func = n_ele_func) # we get the same all.equal(opt_res_generic$value, opt_res$value) #> [1] TRUE all.equal(opt_res_generic$par  , opt_res\$par)
#> [1] TRUE

# the generic version is slower
bench::mark(
psqn = psqn(par = start_val, fn = r_func, n_ele_func = n_ele_func),
psqn_generic = psqn_generic(
par = start_val, fn = r_func_generic, n_ele_func = n_ele_func),
min_iterations = 5)
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 2 × 6
#>   expression        min   median itr/sec mem_alloc gc/sec
#>   <bch:expr>   <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 psqn            290ms    292ms      3.41    30.9MB     13.7
#> 2 psqn_generic    300ms    302ms      2.97    30.9MB     11.9

## Ending Remarks

The package can also be used as a header-only library in C++. This can yield a very large reduction in computation time and be easy to implement with the Rcpp package. Two examples are shown in the psqn vignette (see vignette("psqn", package = "psqn")).

There is also a BFGS implementation in the package. This can be used in R with the psqn_bfgs function. The BFGS implementation can also be used in C++ using the psqn-bfgs.h file.