The Direct Sampling Spatial Prior (DSSP) is based on the thin-plate splines solution to the smoothing problem of minimising the penalised sum of squares \[ S_{\eta}(f) = \frac{1}{n}\sum^{n}_{i}W_i(y_i - f(\mathbf{x}_i))^2 +\eta J_m(f) \] which can be written as \[ \min_{\mathbf{\nu}}\:(\mathbf{y}-\mathbf{\nu})'\mathbf{W}(\mathbf{y}-\mathbf{\nu})+ \eta\mathbf{\nu}'\mathbf{M}\mathbf{\nu}. \] The solution for this problem is \[ \hat{\mathbf{\nu}}= (\mathbf{W}+\eta\mathbf{M})^{-1}\mathbf{y}. \] If we assume that the observed data are from a Gaussian distribution \[ \mathbf{y}\sim N(\mathbf{\nu},\delta\mathbf{W}^{-1}) \] the if we specify the prior for \(\mathbf{\nu}\) \[ \left[\mathbf{\nu}\mid\eta,\delta\right]\propto\ \frac{\eta}{\delta}^{-{r}/2} \exp\left(-\frac{\eta}{2\delta}\mathbf{\nu}'\mathbf{M}\mathbf{\nu}\right), \] the resulting posterior of \(\nu\) is proportional to \[ -\frac{1}{2\delta}\left( (\mathbf{y}-\mathbf{\nu})'\mathbf{W}({\mathbf{y}}-\mathbf{\nu}) -\eta\mathbf{\nu}'\mathbf{M}\mathbf{\nu}\right) \] which yields the posterior mean with the same solution as the penalised least squares.
The complete model is specified with a Gaussian likelihood, the improper prior for \(\nu\), an inverse gaussian prior for \(\delta\), and a prior for \(\eta\). With this specification the joint posterior is written \[ \pi\left(\mathbf{\nu},\delta_0,\eta|\mathbf{y}\right)\propto f(\mathbf{y}|\mathbf{\nu},\delta)\pi\left(\mathbf{\nu}|\eta,\delta\right)\pi\left(\delta\right) \pi\left(\eta\right). \] Given this it is possible to derive the set of posterior distributions \[ \pi\left(\mathbf{\nu}|\delta,\eta,\mathbf{y}\right)\\ \pi\left(\delta|\eta,\mathbf{y}\right)\\ \pi\left(\eta|\mathbf{y}\right) \] which can be sampled directly in sequence to create a draw from the joint posterior \[ \pi\left(\mathbf{\nu},\delta_0,\eta|\mathbf{y}\right). \] This is the heart of what the function DSSP() does1.
This short example demonstrates the use of the functions DSSP() and DSSP.predict() to analyse spatial data.
The example here uses the Meuse River data set from the package ‘sp’. First start by loading the library and data set ‘meuse.all’.
library(sp)
library(gstat)
library(DSSP)
library(ggplot2)
data("meuse.all")
data("meuse")
Next, set the coordinates for the data and extract the \(X\) matrix of locations and the \(Y\) the log of observed zinc concentrations,
<- meuse.all[1:155, ]
meuse.train <- meuse.all[156:164, ]
meuse.valid coordinates(meuse.train) <- ~ x + y
coordinates(meuse.valid) <- ~ x + y
note that data are scaled, this is typically good practice.
Next, we select the number of samples to draw and invoke the function DSSP().
<- 10000 ## number of samples to draw from the DSSP model
N <- DSSP(
meuse.fit formula = log(zinc) ~ 1, data = meuse.train, N = N,
pars = c(0.001, 0.001), log_prior = function(x) -2 * log(1 + x)
)
<- meuse.fit$eta
ETA <- meuse.fit$delta
DELTA <- meuse.fit$nu NU
The function DSSP() takes as its argument \(N\) the number of samples desired, \(X\) the matrix of coordinate pairs indicating locations of the observed data \(Y\), as well as a function specifying the prior density of \(\eta\), the shape and rate parameters of the inverse-gamma prior for \(\delta\), and \(ncores\) the number of cores to use. The argument \(ncores\) is an optional argument that is only effective if your version of R is configured to use OpenMP.
<- rowMeans(exp(meuse.fit$y_fitted))
Yhat
$Yhat <- Yhat ## Model estimates of E(zinc concentration (ppm))
meuse$Y.true <- meuse.all$zinc[1:155]
meuse
## Compare the smoothed values and the observed values
<- data.frame(Yhat = meuse$Yhat, Y.true = meuse$Y.true)
smooth.data
<- ggplot(smooth.data, aes(x = Yhat, y = Y.true)) +
smooth.scatterplot geom_point(size = 3) +
geom_abline(aes(intercept = 0, slope = 1)) +
labs(x = "Smoothed Values", y = "Observed Values", title = "Smoothed vs. Observed Values") +
xlim(min(smooth.data), max(smooth.data)) +
ylim(min(smooth.data), max(smooth.data)) +
theme(plot.title = element_text(hjust = 0.5))
smooth.scatterplot
## Now plot Parameter Estimates and ACF Plots
<- ggplot(data.frame(x = ETA)) +
eta.densityplot geom_density(aes(x = x)) +
labs(x = expression(eta), y = "posterior density", title = expression("Posterior Density of " * eta)) +
theme(plot.title = element_text(hjust = 0.5))
<- ggplot(data.frame(x = DELTA)) +
delta.densityplot geom_density(aes(x = x)) +
labs(x = expression(delta), y = "posterior density", title = expression("Posterior Density of " * delta)) +
theme(plot.title = element_text(hjust = 0.5))
eta.densityplot delta.densityplot
eta.acfplot delta.acfplot
<- ggplot(data.frame(x = 1:length(ETA), y = cumsum(ETA) / (1:length(ETA)))) +
eta.cumsumplot geom_line(aes(x = x, y = y)) +
labs(x = "sample", y = expression(eta), title = bquote(atop("Cumuulative Mean of Samples", "from Posterior of" ~ eta))) +
theme(plot.title = element_text(hjust = 0.5))
<- ggplot(data.frame(x = 1:length(DELTA), y = cumsum(DELTA) / (1:length(DELTA)))) +
delta.cumsumplot geom_line(aes(x = x, y = y)) +
labs(x = "sample", y = expression(eta), title = bquote(atop("Cumuulative Mean of Samples", "from Posterior of" ~ delta))) +
theme(plot.title = element_text(hjust = 0.5))
eta.cumsumplot delta.cumsumplot
<- predict(meuse.fit, meuse.valid)
Y.pred <- exp(Y.pred)
Y.pred
<- data.frame(Yhat.pred = rowMeans(Y.pred), Y.true = meuse.all$zinc[156:164])
pred.data
<- ggplot(pred.data, aes(x = Yhat.pred, y = Y.true)) +
pred.scatterplot geom_point(size = 3) +
geom_abline(aes(intercept = 0, slope = 1)) +
labs(x = "Predicted Values", y = "True Values", title = "Predicted vs. True Values") +
xlim(min(pred.data), max(pred.data)) +
ylim(min(pred.data), max(pred.data)) +
theme(plot.title = element_text(hjust = 0.5))
<- ggplot(stack(as.data.frame(t(Y.pred)))) +
pred.boxplot geom_boxplot(aes(x = ind, y = values)) +
geom_point(data = data.frame(Y.true = meuse.all$zinc[156:164]), aes(x = 1:9, y = Y.true), shape = 4, size = 3) +
labs(x = "", y = "Y", title = bquote(atop("Boxplot of Predicted Values of", "Y and True Values (X)"))) +
theme(plot.title = element_text(hjust = 0.5))
pred.scatterplot pred.boxplot
see G. White, D. Sun, P. Speckman (2019) <arXiv:1906.05575> for details↩︎