Introducing stplanr

Robin Lovelace

2016-05-30

Introduction

Common computational tasks in transportation research and planning include:

the automation of such tasks (and there are many others) can assist researchers and practitioners create evidence for design and decision making, as part of the analyse, plan, design workflow.

stplanr facilitates each of these tasks with an integrated series of tools and example datasets. The ‘sustainable’ in the package name (“sustainable transport planning with R”) refers to both the focus on active travel and the aim for longevity and accessibility in the software. Transport planning is notoriously reliant on ‘black boxes’ and the same applies to scientific research into transport systems [@Waddell2002].

stplanr seeks to address these issues. After the package has been installed (see the package’s README for details), it can be loaded in with library():

library(stplanr)

Accessing and converting data

Transport data is often provided in origin-destination (‘OD’) form, either as a matrix or (more commonly) a long table of OD pairs. An example of this type of raw data is provided below (see ?flow to see how this dataset was created).

data("flow", package = "stplanr")
head(flow[c(1:3, 12)])
##        Area.of.residence Area.of.workplace All Bicycle
## 920573         E02002361         E02002361 109       2
## 920575         E02002361         E02002363  38       0
## 920578         E02002361         E02002367  10       0
## 920582         E02002361         E02002371  44       3
## 920587         E02002361         E02002377  34       0
## 920591         E02002361         E02002382   7       0

Although the flow data displayed above describes movement over geographical space, it contains no explicitly geographical information. Instead, the coordinates of the origins and destinations are linked to a separate geographical dataset which also must be loaded to analyse the flows. This is a common problem solved by the function od2line. The geographical data is a set of points representing centroids of the origin and destinations, saved as a SpatialPointsDataFrame. Geographical data in R is best represented as such Spatial* objects, which use the S4 object engine. This explains the close integration of stplanr with R’s spatial packages, especially sp, which defines the S4 spatial object system.

data("cents", package = "stplanr")
as.data.frame(cents[1:3,-c(3,4)])
##       geo_code  MSOA11NM coords.x1 coords.x2
## 1708 E02002384 Leeds 055 -1.546463  53.80952
## 1712 E02002382 Leeds 053 -1.511861  53.81161
## 1805 E02002393 Leeds 064 -1.524205  53.80410

We use od2line to combine flow and cents, to join the former to the latter. We will visualise the l object created below in the next section.

l <- od2line(flow = flow, zones = cents)

# remove lines with no length
l <- l[!l$Area.of.residence == l$Area.of.workplace,]

The data is now in a form that is much easier to analyse. We can plot the data with the command plot(l), which was not possible before. Because the SpatialLinesDataFrame object also contains data per line, it also helps with visualisation of the flows, as illustrated below.

plot(l, lwd = l$All / 10)

Allocating flows to the transport network

A common problem faced by transport researchers is network allocation: converting the ‘as the crow flies’ lines illustrated in the figure above into routes. These are the complex, winding paths that people and animals make to avoid obstacles such as buildings and to make the journey faster and more efficient (e.g. by following the road network).

This is difficult (and was until recently near impossible using free software) because of the size and complexity of transport networks, the complexity of realistic routing algorithms and need for context-specificity in the routing engine. Inexperienced cyclists, for example, would take a very different route than a heavy goods vehicle.

stplanr tackles this issue by using 3rd party APIs to provide route-allocation.

CycleStreets.net

The line2route function allocates straight line routes to the transport network using the CycleStreets.net API (you must request an API key for the function to work):

# test the connection
x = try(download.file("http://www.google.com", "test.html"), silent = T) 
offline <- grepl("Error", x)
if(!offline) file.remove("test.html")
## [1] TRUE
data("routes_fast")
data("routes_slow")

plot(l)
lines(routes_fast, col = "red")

The above code sends requests to CycleStreets.net and saves the results. By changing the plan argument of line2route, we can download routes that are more suitable for people prioritising speed, quietness or a balance between speed and quietness. The plots of l and routes_fast illustrate how the spatial classes that the flow data are converted to are easy to visualise. Using such data objects, generated by stplanr, in combination with additional packages for creating maps in R [e.g. see @Cheshire2015; @Kahle2013], will further enhance the utility of the package’s functions. Route-allocated flows are also useful for analysis.

Graphhopper

route_graphhopper() extracts geographical and other data from the graphhopper routing engine to allocate flows to the travel network.

# ggmap dependency may be down
if(!offline & !Sys.getenv("GRAPHHOPPER") == ""){
  ny2oaxaca1 <- route_graphhopper("New York", "Oaxaca", vehicle = "bike")
  ny2oaxaca2 <- route_graphhopper("New York", "Oaxaca", vehicle = "car")
  plot(ny2oaxaca1)
  plot(ny2oaxaca2, add = T, col = "red")
  
  ny2oaxaca1@data
  ny2oaxaca2@data
}

Note that the function also saves time, distance and (for bike trips) vertical distance climbed for the trips.

Analysis of origin-destination flow data

Route-allocated lines allow estimation of route distance and cirquity (route distance divided by Euclidean distance). These variables can help model the rate of flow between origins and destination, as illustrated in the next section. The code below demonstrates how objects generated by stplanr can be used to estimate route distance.

lgb <- spTransform(l, CRSobj = CRS("+init=epsg:27700"))
l$d_euclidean <- rgeos::gLength(lgb, byid = T)
l$d_fastroute <- routes_fast@data$length
plot(l$d_euclidean, l$d_fastroute,
  xlab = "Euclidean distance", ylab = "Route distance")
abline(a = 0, b = 1)
abline(a = 0, b = 1.2, col = "green")
abline(a = 0, b = 1.5, col = "red")

The plot illustrates the expected strong correlation between Euclidean and fastest route distance. However, some OD pairs have a proportionally higher route distance than others. This is illustrated by distance from the black line in the above plot, which represents circuity.

To estimate the amount of capacity needed at each segment on the transport network, the overline function, written by Barry Rowlingson can be used. This divides up line geometries into unique segments then aggregates the overlapping values. The results, plotted on the map, can be used to estimate where there is most need to improve the transport network, for example informing the decision of where to build new bicycle paths. (The leaflet package was used to generate the basemap).

Developing models of travel behaviour

There are many ways of estimating flows between origins and destinations, including spatial interaction models, the four-stage transport model and ‘distance decay’. stplanr aims eventually to facilitate creation of many types of flow model.

At present (May 2015) the only functions to assist with this are the dd_* functions, simple commands to create distance decay curves. Distance decay is an especially important concept for sustainable transport planning due to physical limitations on the ability of people to walk and cycle large distances [@Iacono2010].

Let’s explore the relationship between distance and the proportion of trips made by walking, using the same object l generated by stplanr.

l$pwalk <- l$On.foot / l$All
plot(l$d_euclidean, l$pwalk, cex = l$All / 50,
  xlab = "Euclidean distance (m)", ylab = "Proportion of trips by foot")

Based on the figure there is a clear negative relationship between distance of trips and the proportion of those trips made by walking. This is unsurprising: beyond a certain distance (around 1.5km according the the data presented in the figure above) walking may a rather slow and and many people would concider travelling by bike instead! According to the academic literature, this ‘distance decay’ is non-linear and there have been a number of functions proposed to fit to distance decay curves [@Martinez2013]. From the range of options we test below just two forms. We will compare the ability of linear and log-square-root functions to fit the data contained in l for walking.

lm1 <- lm(pwalk ~ d_euclidean, data = l@data, weights = All)
lm2 <- lm(pwalk ~ d_fastroute, data = l@data, weights = All)
lm3 <- glm(pwalk ~ d_fastroute + I(d_fastroute^0.5), data = l@data, weights = All, family = quasipoisson(link = "log"))

The results of these regression models can be seen using summary(). The results for such a small dataset are not particularly interesting and, surprisingly, Euclidean distance seems to be a better predictor of walking that route distance. The results are displayed in the graphic below.

plot(l$d_euclidean, l$pwalk, cex = l$All / 50,
  xlab = "Euclidean distance (m)", ylab = "Proportion of trips by foot")
l2 <- data.frame(d_euclidean = 1:5000, d_fastroute = 1:5000)
lm1p <- predict(lm1, l2)
lm2p <- predict(lm2, l2)
lm3p <- predict(lm3, l2)
lines(l2$d_euclidean, lm1p)
lines(l2$d_euclidean, exp(lm2p), col = "green")
lines(l2$d_euclidean, exp(lm3p), col = "red")

Future plans

The final thing to say about this package is that it is work-in-progress and we hope to add more functionality, including tools for creating spatial interaction models, more functions to access routing APIs and improved functionality for accessing OpenStreetMap data.

If you would like to contribute or request new features, see https://github.com/Robinlovelace/stplanr

References

Cheshire, James, and Robin Lovelace. 2015. “Spatial data visualisation with R.” In Geocomputation, edited by Chris Brunsdon and Alex Singleton, 1–14. SAGE Publications.

Iacono, Michael, Kevin J. Krizek, and Ahmed El-Geneidy. 2010. “Measuring non-motorized accessibility: issues, alternatives, and execution.” Journal of Transport Geography 18 (1). Elsevier Ltd: 133–40.

Kahle, D, and Hadley Wickham. 2013. “ggmap: Spatial Visualization with ggplot2.” The R Journal 5: 144–61.

Martínez, L. Miguel, and José Manuel Viegas. 2013. “A new approach to modelling distance-decay functions for accessibility assessment in transport studies.” Journal of Transport Geography 26: 87–96.

Waddell, Paul. 2002. “UrbanSim: Modeling urban development for land use, transportation, and environmental planning.” Journal of the American Planning Association 68 (3). Taylor & Francis: 297–314.