This package supports data management activities associated with fixed locations in space. The motivating fields include both air and water quality monitoring, where fixed sensors report at regular time intervals.
When working with environmental monitoring time series, one of the first tasks
is creating unique identifiers for each individual time series. In an ideal
world, each environmental time series would have both a `locationID` and a
`sensorID` that uniquely identify the spatial location and the specific
instrument making measurements. A unique `timeseriesID` could then be produced
as `locationID_sensorID`. Location metadata associated with each time series
would contain the basic information needed for downstream analysis, including
at least:

`timeseriesID, locationID, sensorID, longitude, latitude, ...`
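This ideal can be sketched in a few lines of base R, using made-up `locationID` and `sensorID` values purely for illustration:

```r
# Hypothetical IDs for two sensors at two monitoring sites
locationID <- c("a4fc3a2250ad", "c7e8b9f61d02")
sensorID <- c("sn-0417", "sn-0523")

# A unique time series identifier combines the two
timeseriesID <- paste0(locationID, "_", sensorID)
timeseriesID
## [1] "a4fc3a2250ad_sn-0417" "c7e8b9f61d02_sn-0523"
```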
Unfortunately, we are rarely supplied with a truly unique and truly spatial
`locationID`. Instead, we often use `sensorID` or an associated non-spatial
identifier as a stand-in for `locationID`, an approach that invites a variety
of complications.
A solution to all these problems is possible if we store spatial metadata in
simple tables in a standard directory. These tables will be referred to as
_collections_. Location lookups can be performed with geodesic distance
calculations, where an incoming location is assigned to a pre-existing known
location if it lies within `radius` meters. These lookups are extremely fast.
If no previously known location is found, the relatively slow (seconds)
creation of a new known location metadata record can be performed and the new
record added to the growing collection.
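The core of such a lookup can be sketched in base R. This toy version uses a haversine distance rather than a full geodesic calculation, and the function names here are hypothetical illustrations, not part of the package:

```r
# Haversine great-circle distance in meters (a reasonable stand-in for a
# full geodesic calculation at monitoring-network scales)
distanceMeters <- function(lon1, lat1, lon2, lat2) {
  toRad <- pi / 180
  dLon <- (lon2 - lon1) * toRad
  dLat <- (lat2 - lat1) * toRad
  a <- sin(dLat / 2)^2 + cos(lat1 * toRad) * cos(lat2 * toRad) * sin(dLon / 2)^2
  6371000 * 2 * asin(sqrt(a))
}

# Assign an incoming point to the index of the nearest known location
# within `radius` meters, or NA if none qualifies
nearestKnownLocation <- function(lon, lat, knownLon, knownLat, radius) {
  d <- distanceMeters(lon, lat, knownLon, knownLat)
  if (min(d) <= radius) which.min(d) else NA
}

# Two known locations ~111 km apart along a meridian
knownLon <- c(-120.0, -120.0)
knownLat <- c(47.0, 48.0)

# A point ~30 meters north of the first known location matches it
nearestKnownLocation(-120.0, 47.00027, knownLon, knownLat, radius = 50)
## [1] 1
```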
For collections of stationary environmental monitors that only number in the
thousands, the entire collection (i.e. "database") can be stored as a `.csv`
file and will be under a megabyte in size, making it fast to load. This small
size also makes it possible to store multiple known location files, each
created with different locations and different radii, to address the needs of
different scientific studies.
The package comes with example known location tables to demonstrate functionality. Let's take some metadata we have for air quality monitors in Washington state and create a known location table for them.
```r
wa <- get(data("wa_airfire_meta", package = "MazamaLocationUtils"))
names(wa)
```
```
## "monitorID"             "longitude"             "latitude"
## "elevation"             "timezone"              "countryCode"
## "stateCode"             "siteName"              "agencyName"
## "countyName"            "msaName"               "monitorType"
## "siteID"                "instrumentID"          "aqsID"
## "pwfslID"               "pwfslDataIngestSource" "telemetryAggregator"
## "telemetryUnitID"
```
We can create a known location table for them with a minimum separation of 500 meters between distinct locations:
```r
library(MazamaLocationUtils)

# Initialize with standard directories
mazama_initialize()
setLocationDataDir("./data")

wa_monitors_500 <- table_initialize() %>%
  table_addLocation(wa$longitude, wa$latitude, radius = 500)
```
Right now, our known location table contains only automatically generated spatial metadata:
```
## "locationID"  "locationName" "longitude" "latitude" "elevation"
## "countryCode" "stateCode"    "county"    "timezone" "houseNumber"
## "street"      "city"         "zip"
```
Perhaps we would like to import some of the original metadata into our new table. This is a very common use case where non-spatial metadata like site name or agency responsible for a monitor can be added.
Just to make it interesting, let's assume that our known location table is already large and we are only providing additional metadata for a subset of the records.
```r
# Use a subset of the wa metadata
wa_indices <- seq(5, 65, 5)
wa_sub <- wa[wa_indices,]

# Use a generic name for the location table
locationTbl <- wa_monitors_500

# Find the location IDs associated with our subset
locationID <- table_getLocationID(
  locationTbl,
  longitude = wa_sub$longitude,
  latitude = wa_sub$latitude,
  radius = 500
)

# Now add the "siteName" column for our subset of locations
locationData <- wa_sub$siteName

locationTbl <- table_updateColumn(
  locationTbl,
  columnName = "siteName",
  locationID = locationID,
  locationData = locationData
)

# Let's see how we did
locationTbl_indices <- table_getRecordIndex(locationTbl, locationID)
locationTbl[locationTbl_indices, c("city", "siteName")]
```
```
## # A tibble: 13 x 2
##    city               siteName
##    <chr>              <chr>
##  1 Chelan             "Chelan-Woodin Ave"
##  2 La Crosse          "Lacrosse-Hill St"
##  3 Tri-Cities         "Kennewick-Metaline"
##  4 Sunnyside          "Sunnyside-S 16th"
##  5 Inchelium          "Inchelium"
##  6 Wellpinit          "Wellpinit-Spokane Tribe"
##  7 Lake Forest Park   "Lake Forest Park-Town Center"
##  8 Okanogan County    "Twisp-Glover St"
##  9 Limestone Junction "Maple Falls-Azure Way"
## 10 Okanogan County    "Omak-Colville Tribe"
## 11 Ritzville          "Ritzville-Alder St "
## 12 Darrington         "Darrington-Fir St"
## 13 Tukwila            "Tukwila_Allentown"
```
The whole point of a known location table is to speed up access to spatial and other metadata. Here's how we can use it with a set of longitudes and latitudes that are not currently in our table.
```r
# Create new locations near our known locations
lons <- jitter(wa_sub$longitude)
lats <- jitter(wa_sub$latitude)

# Any known locations within 50 meters?
table_getNearestLocation(
  wa_monitors_500,
  longitude = lons,
  latitude = lats,
  radius = 50
) %>%
  dplyr::pull(city)
```

```
## NA NA NA NA NA NA NA NA NA NA NA NA NA
```
```r
# Any known locations within 500 meters?
table_getNearestLocation(
  wa_monitors_500,
  longitude = lons,
  latitude = lats,
  radius = 500
) %>%
  dplyr::pull(city)
```

```
## NA        NA                NA "Sunnyside"
## NA        "Wellpinit"       NA "Okanogan County"
## NA        "Okanogan County" NA "Darrington"
## "Tukwila"
```
```r
# How about 5000 meters?
table_getNearestLocation(
  wa_monitors_500,
  longitude = lons,
  latitude = lats,
  radius = 5000
) %>%
  dplyr::pull(city)
```

```
## "Chelan"           "La Crosse"       "Tri-Cities"
## "Sunnyside"        "Inchelium"       "Wellpinit"
## "Lake Forest Park" "Okanogan County" "Limestone Junction"
## "Okanogan County"  "Ritzville"       "Darrington"
## "Tukwila"
```
Before using MazamaLocationUtils you must first install MazamaSpatialUtils and then install core spatial data with:
```r
library(MazamaSpatialUtils)
setSpatialDataDir("~/Data/Spatial")
installSpatialData()
```
Once the required datasets have been installed, the easiest way to set things up each session is with:
```r
library(MazamaLocationUtils)
mazama_initialize()
setLocationDataDir("~/Data/KnownLocations")
```
`mazama_initialize()` assumes spatial data are installed in the standard location and is just a wrapper for:

```r
MazamaSpatialUtils::setSpatialDataDir("~/Data/Spatial")
MazamaSpatialUtils::loadSpatialData("EEZCountries")
MazamaSpatialUtils::loadSpatialData("OSMTimezones")
MazamaSpatialUtils::loadSpatialData("NaturalEarthAdm1")
MazamaSpatialUtils::loadSpatialData("USCensusCounties")
```
Every time you `table_save()` your location table, a backup will be created so you can experiment without losing your work. File sizes are pretty tiny, so you don't have to worry about filling up your disk.
Best wishes for well organized spatial metadata!