This vignette formed pp.Â 239â€“251 of the first edition of Bivand, R. S., Pebesma, E. and GÃ³mez-Rubio V. (2008) *Applied Spatial Data Analysis with R*, Springer-Verlag, New York. It was retired from the second edition (2013) to accommodate material on other topics, and is made available in this form with the understanding of the publishers.

Creating spatial weights is a necessary step in using areal data, perhaps just to check that there is no remaining spatial patterning in residuals. The first step is to define which relationships between observations are to be given a non-zero weight, that is to choose the neighbour criterion to be used; the second is to assign weights to the identified neighbour links.

The 281 census tract data set for eight central New York State counties featured prominently in (Waller and Gotway 2004) will be used in many of the examples, supplemented with tract boundaries derived from TIGER 1992 and distributed by SEDAC/CIESIN. The boundaries have been projected from geographical coordinates to UTM zone~18. This file is not identical with the boundaries used in the original source, but is very close and may be re-distributed, unlike the version used in the book. Starting from the census tract queen contiguities, where all touching polygons are neighbours, used in (Waller and Gotway 2004) and provided as a DBF file on their website, a GAL format file has been created and read into R.

```
if (require(rgdal, quietly=TRUE)) {
NY8 <- readOGR(system.file("shapes/NY8_utm18.shp", package="spData"))
} else {
require(maptools, quietly=TRUE)
NY8 <- readShapeSpatial(system.file("shapes/NY8_utm18.shp", package="spData"))
}
```

```
## Warning in OGRSpatialRef(dsn, layer, morphFromESRI = morphFromESRI, dumpSRS = dumpSRS, :
## Discarded datum D_unknown in Proj4 definition: +proj=utm +zone=18 +ellps=WGS84 +units=m
## +no_defs
```

```
## OGR data source with driver: ESRI Shapefile
## Source: "/home/rsb/lib/r_libs/spData/shapes/NY8_utm18.shp", layer: "NY8_utm18"
## with 281 features
## It has 17 fields
```

```
library(spdep)
NY_nb <- read.gal(system.file("weights/NY_nb.gal", package="spData"), region.id=row.names(NY8))
```

For the sake of simplicity in showing how to create neighbour objects, we work on a subset of the map consisting of the census tracts within Syracuse, although the same principles apply to the full data set. We retrieve the part of the neighbour list in Syracuse using the `subset`

method.

```
Syracuse <- NY8[NY8$AREANAME == "Syracuse city",]
Sy0_nb <- subset(NY_nb, NY8$AREANAME == "Syracuse city")
summary(Sy0_nb)
```

```
## Neighbour list object:
## Number of regions: 63
## Number of nonzero links: 346
## Percentage nonzero weights: 8.717561
## Average number of links: 5.492063
## Link number distribution:
##
## 1 2 3 4 5 6 7 8 9
## 1 1 5 9 14 17 9 6 1
## 1 least connected region:
## 164 with 1 link
## 1 most connected region:
## 136 with 9 links
```

We can create a copy of the same neighbours object for polygon contiguities using the `poly2nb()`

function in **spdep**. It takes an object extending the `SpatialPolygons`

class as its first argument, and using heuristics identifies polygons sharing boundary points as neighbours. It also has a `snap=`

argument, to allow the shared boundary points to be a short distance from one another.

`class(Syracuse)`

```
## [1] "SpatialPolygonsDataFrame"
## attr(,"package")
## [1] "sp"
```

```
Sy1_nb <- poly2nb(Syracuse)
isTRUE(all.equal(Sy0_nb, Sy1_nb, check.attributes=FALSE))
```

`## [1] TRUE`

As we can see, creating the contiguity neighbours from the `Syracuse`

object reproduces the neighbours from (Waller and Gotway 2004). Careful examination of the next figure shows, however, that the graph of neighbours is not planar, since some neighbour links cross each other. By default, the contiguity condition is met when at least one point on the boundary of one polygon is within the snap distance of at least one point of its neighbour. This relationship is given by the argument `queen=TRUE`

by analogy with movements on a chessboard. So when three or more polygons meet at a single point, they all meet the contiguity condition, giving rise to crossed links. If `queen=FALSE`

, at least two boundary points must be within the snap distance of each other, with the conventional name of a `rookâ€™ relationship. The figure shows the crossed line differences that arise when polygons touch only at a single point, compared to the stricter rook criterion.

```
Sy2_nb <- poly2nb(Syracuse, queen=FALSE)
isTRUE(all.equal(Sy0_nb, Sy2_nb, check.attributes=FALSE))
```

`## [1] FALSE`

```
oopar <- par(mfrow=c(1,2), mar=c(3,3,1,1)+0.1)
plot(Syracuse, border="grey60")
plot(Sy0_nb, coordinates(Syracuse), add=TRUE, pch=19, cex=0.6)
text(bbox(Syracuse)[1,1], bbox(Syracuse)[2,2], labels="a)", cex=0.8)
plot(Syracuse, border="grey60")
plot(Sy0_nb, coordinates(Syracuse), add=TRUE, pch=19, cex=0.6)
plot(diffnb(Sy0_nb, Sy2_nb, verbose=FALSE), coordinates(Syracuse),
add=TRUE, pch=".", cex=0.6, lwd=2)
text(bbox(Syracuse)[1,1], bbox(Syracuse)[2,2], labels="b)", cex=0.8)
```

`par(oopar)`

a: Queen-style census tract contiguities, Syracuse; b: Rook-style contiguity differences shown as thicker lines

If we have access to a GIS such as GRASS or ArcGIS, we can export the `SpatialPolygonsDataFrame`

object and use the topology engine in the GIS to find contiguities in the graph of polygon edges - a shared edge will yield the same output as the rook relationship.

This procedure does, however, depend on the topology of the set of polygons being clean, which holds for this subset, but not for the full eight-county data set. Not infrequently, there are small artefacts, such as slivers where boundary lines intersect or diverge by distances that cannot be seen on plots, but which require intervention to keep the geometries and data correctly associated. When these geometrical artefacts are present, the topology is not clean, because unambiguous shared polygon boundaries cannot be found in all cases; artefacts typically arise when data collected for one purpose are combined with other data or used for another purpose. Topologies are usually cleaned in a GIS by `snapping' vertices closer than a threshold distance together, removing artefacts -- for example, snapping across a river channel where the correct boundary is the median line but the input polygons stop at the channel banks on each side. The`

poly2nb()`function does have a`

snap` argument, which may also be used when input data possess geometrical artefacts.

```
library(spgrass6)
writeVECT6(Syracuse, "SY0")
contig <- vect2neigh("SY0")
Sy3_nb <- sn2listw(contig)$neighbours
isTRUE(all.equal(Sy3_nb, Sy2_nb, check.attributes=FALSE))
```

Similar approaches may also be used to read ArcGIS coverage data by tallying the left neighbour and right neighbour arc indices with the polygons in the data set, using either **RArcInfo** or **rgdal**.

In our Syracuse case, there are no exclaves or `islands' belonging to the data set, but not sharing boundary points within the snap distance. If the number of polygons is moderate, the missing neighbour links may be added interactively using the`

edit()`method for`

nb` objects, and displaying the polygon background. The same method may be used for removing links which, although contiguity exists, may be considered void, such as across a mountain range.

Continuing with irregularly located areal entities, it is possible to choose a point to represent the polygon-support entities. This is often the polygon centroid, which is not the average of the coordinates in each dimension, but takes proper care to weight the component triangles of the polygon by area. It is also possible to use other points, or if data are available, construct, for example population-weighted centroids. Once representative points are available, the criteria for neighbourhood can be extended from just contiguity to include graph measures, distance thresholds, and \(k\)-nearest neighbours.

The most direct graph representation of neighbours is to make a Delaunay triangulation of the points, shown in the first panel in the figure. The neighbour relationships are defined by the triangulation, which extends outwards to the convex hull of the points and which is planar. Note that graph-based representations construct the interpoint relationships based on Euclidean distance, with no option to use Great Circle distances for geographical coordinates. Because it joins distant points around the convex hull, it may be worthwhile to thin the triangulation as a Sphere of Influence (SOI) graph, removing links that are relatively long. Points are SOI neighbours if circles centred on the points, of radius equal to the pointsâ€™ nearest neighbour distances, intersect in two places (Avis and Horton 1985). Functions for graph-based neighbours were kindly contributed by Nicholas Lewin-Koh.

```
coords <- coordinates(Syracuse)
IDs <- row.names(as(Syracuse, "data.frame"))
#FIXME library(tripack)
Sy4_nb <- tri2nb(coords, row.names=IDs)
if (require(rgeos, quietly=TRUE) && require(dbscan, quietly=TRUE)) {
Sy5_nb <- graph2nb(soi.graph(Sy4_nb, coords), row.names=IDs)
} else Sy5_nb <- NULL
```

```
## rgeos version: 0.5-7, (SVN revision 670M)
## GEOS runtime version: 3.10.0dev-CAPI-1.15.0
## GEOS using OverlayNG
## Linking to sp version: 1.4-6
## Polygon checking: TRUE
```

```
Sy6_nb <- graph2nb(gabrielneigh(coords), row.names=IDs)
Sy7_nb <- graph2nb(relativeneigh(coords), row.names=IDs)
```

```
oopar <- par(mfrow=c(2,2), mar=c(1,1,1,1)+0.1)
plot(Syracuse, border="grey60")
plot(Sy4_nb, coords, add=TRUE, pch=".")
text(bbox(Syracuse)[1,1], bbox(Syracuse)[2,2], labels="a)", cex=0.8)
plot(Syracuse, border="grey60")
if (!is.null(Sy5_nb)) {
plot(Sy5_nb, coords, add=TRUE, pch=".")
text(bbox(Syracuse)[1,1], bbox(Syracuse)[2,2], labels="b)", cex=0.8)
}
plot(Syracuse, border="grey60")
plot(Sy6_nb, coords, add=TRUE, pch=".")
text(bbox(Syracuse)[1,1], bbox(Syracuse)[2,2], labels="c)", cex=0.8)
plot(Syracuse, border="grey60")
plot(Sy7_nb, coords, add=TRUE, pch=".")
text(bbox(Syracuse)[1,1], bbox(Syracuse)[2,2], labels="d)", cex=0.8)
```