DSSP: Direct Sampling Spatial Prior

2022-06-02

Introduction

The Direct Sampling Spatial Prior (DSSP) is based on the thin-plate splines solution to the smoothing problem of minimising the penalised sum of squares \[ S_{\eta}(f) = \frac{1}{n}\sum^{n}_{i}W_i(y_i - f(\mathbf{x}_i))^2 +\eta J_m(f) \] which can be written as \[ \min_{\mathbf{\nu}}\:(\mathbf{y}-\mathbf{\nu})'\mathbf{W}(\mathbf{y}-\mathbf{\nu})+ \eta\mathbf{\nu}'\mathbf{M}\mathbf{\nu}. \] The solution for this problem is \[ \hat{\mathbf{\nu}}= (\mathbf{W}+\eta\mathbf{M})^{-1}\mathbf{y}. \] If we assume that the observed data are from a Gaussian distribution \[ \mathbf{y}\sim N(\mathbf{\nu},\delta\mathbf{W}^{-1}) \] the if we specify the prior for \(\mathbf{\nu}\) \[ \left[\mathbf{\nu}\mid\eta,\delta\right]\propto\ \frac{\eta}{\delta}^{-{r}/2} \exp\left(-\frac{\eta}{2\delta}\mathbf{\nu}'\mathbf{M}\mathbf{\nu}\right), \] the resulting posterior of \(\nu\) is proportional to \[ -\frac{1}{2\delta}\left( (\mathbf{y}-\mathbf{\nu})'\mathbf{W}({\mathbf{y}}-\mathbf{\nu}) -\eta\mathbf{\nu}'\mathbf{M}\mathbf{\nu}\right) \] which yields the posterior mean with the same solution as the penalised least squares.

The complete model is specified with a Gaussian likelihood, the improper prior for \(\nu\), an inverse gaussian prior for \(\delta\), and a prior for \(\eta\). With this specification the joint posterior is written \[ \pi\left(\mathbf{\nu},\delta_0,\eta|\mathbf{y}\right)\propto f(\mathbf{y}|\mathbf{\nu},\delta)\pi\left(\mathbf{\nu}|\eta,\delta\right)\pi\left(\delta\right) \pi\left(\eta\right). \] Given this it is possible to derive the set of posterior distributions \[ \pi\left(\mathbf{\nu}|\delta,\eta,\mathbf{y}\right)\\ \pi\left(\delta|\eta,\mathbf{y}\right)\\ \pi\left(\eta|\mathbf{y}\right) \] which can be sampled directly in sequence to create a draw from the joint posterior \[ \pi\left(\mathbf{\nu},\delta_0,\eta|\mathbf{y}\right). \] This is the heart of what the function DSSP() does1.

Example

This short example demonstrates the use of the functions DSSP() and DSSP.predict() to analyse spatial data.

The example here uses the Meuse River data set from the package ‘sp’. First start by loading the library and data set ‘meuse.all’.

library(sp)
library(gstat)
library(DSSP)
library(ggplot2)

data("meuse.all")
data("meuse")

Next, set the coordinates for the data and extract the \(X\) matrix of locations and the \(Y\) the log of observed zinc concentrations,

meuse.train <- meuse.all[1:155, ]
meuse.valid <- meuse.all[156:164, ]
coordinates(meuse.train) <- ~ x + y
coordinates(meuse.valid) <- ~ x + y

note that data are scaled, this is typically good practice.

Next, we select the number of samples to draw and invoke the function DSSP().

N <- 10000 ## number of samples to draw from the DSSP model
meuse.fit <- DSSP(
  formula = log(zinc) ~ 1, data = meuse.train, N = N, 
  pars = c(0.001, 0.001), log_prior = function(x) -2 * log(1 + x)
)

ETA <- meuse.fit$eta
DELTA <- meuse.fit$delta
NU <- meuse.fit$nu

The function DSSP() takes as its argument \(N\) the number of samples desired, \(X\) the matrix of coordinate pairs indicating locations of the observed data \(Y\), as well as a function specifying the prior density of \(\eta\), the shape and rate parameters of the inverse-gamma prior for \(\delta\), and \(ncores\) the number of cores to use. The argument \(ncores\) is an optional argument that is only effective if your version of R is configured to use OpenMP.

Yhat <- rowMeans(exp(meuse.fit$y_fitted))

meuse$Yhat <- Yhat ## Model estimates of E(zinc concentration (ppm))
meuse$Y.true <- meuse.all$zinc[1:155]

##  Compare the smoothed values and the observed values

smooth.data <- data.frame(Yhat = meuse$Yhat, Y.true = meuse$Y.true)

smooth.scatterplot <- ggplot(smooth.data, aes(x = Yhat, y = Y.true)) +
  geom_point(size = 3) +
  geom_abline(aes(intercept = 0, slope = 1)) +
  labs(x = "Smoothed Values", y = "Observed Values", title = "Smoothed vs. Observed Values") +
  xlim(min(smooth.data), max(smooth.data)) +
  ylim(min(smooth.data), max(smooth.data)) +
  theme(plot.title = element_text(hjust = 0.5))

smooth.scatterplot

##  Now plot Parameter Estimates and ACF Plots

eta.densityplot <- ggplot(data.frame(x = ETA)) +
  geom_density(aes(x = x)) +
  labs(x = expression(eta), y = "posterior density", title = expression("Posterior Density of " * eta)) +
  theme(plot.title = element_text(hjust = 0.5))

delta.densityplot <- ggplot(data.frame(x = DELTA)) +
  geom_density(aes(x = x)) +
  labs(x = expression(delta), y = "posterior density", title = expression("Posterior Density of " * delta)) +
  theme(plot.title = element_text(hjust = 0.5))

eta.densityplot
delta.densityplot

eta.acfplot
delta.acfplot

eta.cumsumplot <- ggplot(data.frame(x = 1:length(ETA), y = cumsum(ETA) / (1:length(ETA)))) +
  geom_line(aes(x = x, y = y)) +
  labs(x = "sample", y = expression(eta), title = bquote(atop("Cumuulative Mean of Samples", "from Posterior of" ~ eta))) +
  theme(plot.title = element_text(hjust = 0.5))



delta.cumsumplot <- ggplot(data.frame(x = 1:length(DELTA), y = cumsum(DELTA) / (1:length(DELTA)))) +
  geom_line(aes(x = x, y = y)) +
  labs(x = "sample", y = expression(eta), title = bquote(atop("Cumuulative Mean of Samples", "from Posterior of" ~ delta))) +
  theme(plot.title = element_text(hjust = 0.5))
eta.cumsumplot
delta.cumsumplot

Y.pred <- predict(meuse.fit, meuse.valid)
Y.pred <- exp(Y.pred)

pred.data <- data.frame(Yhat.pred = rowMeans(Y.pred), Y.true = meuse.all$zinc[156:164])

pred.scatterplot <- ggplot(pred.data, aes(x = Yhat.pred, y = Y.true)) +
  geom_point(size = 3) +
  geom_abline(aes(intercept = 0, slope = 1)) +
  labs(x = "Predicted Values", y = "True Values", title = "Predicted vs. True Values") +
  xlim(min(pred.data), max(pred.data)) +
  ylim(min(pred.data), max(pred.data)) +
  theme(plot.title = element_text(hjust = 0.5))

pred.boxplot <- ggplot(stack(as.data.frame(t(Y.pred)))) +
  geom_boxplot(aes(x = ind, y = values)) +
  geom_point(data = data.frame(Y.true = meuse.all$zinc[156:164]), aes(x = 1:9, y = Y.true), shape = 4, size = 3) +
  labs(x = "", y = "Y", title = bquote(atop("Boxplot of Predicted Values of", "Y and True Values (X)"))) +
  theme(plot.title = element_text(hjust = 0.5))
pred.scatterplot
pred.boxplot


  1. see G. White, D. Sun, P. Speckman (2019) <arXiv:1906.05575> for details↩︎