Get started with bcdata

The bcdata R package contains functions for searching & retrieving data from the B.C. Data Catalogue.

The B.C. Data Catalogue is the place to find British Columbia Government data, applications and web services. Much of the data is released under the Open Government Licence - British Columbia, and the remainder under a number of other licences.

You can install bcdata directly from GitHub using the remotes package:

install.packages("remotes")

remotes::install_github("bcgov/bcdata")
library(bcdata)

bcdc_browse()

bcdata::bcdc_browse() lets you access the B.C. Data Catalogue web interface directly from R, opening the home page in your default browser:

## Take me to the B.C. Data Catalogue home page
bcdc_browse()

If you know a catalogue record's "human-readable" name or permanent ID, you can open the record's web page directly:

## Take me to the B.C. Winery Locations catalogue record using the record name
bcdc_browse("bc-winery-locations") 

## Take me to the B.C. Winery Locations catalogue record using the record ID
bcdc_browse("1d21922b-ec4f-42e5-8f6b-bf320a286157")

You can also use bcdc_browse() to open the catalogue search results for a term:

## Take me to the catalogue search results for 'wineries'
bcdc_browse("wineries")

bcdc_get_data()

Once you have located the B.C. Data Catalogue record with the data you want, you can use bcdata::bcdc_get_data() to download and read the data from the record. You can use the record name, ID or the result from bcdc_get_record(). Let's look at the B.C. Highway Web Cameras data:

## Get the data resource for the `bc-highway-cams` catalogue record
bcdc_get_data("bc-highway-cams") 
#> # A tibble: 874 x 19
#>    links_bchighway… links_imageDisp… links_imageThum… links_replayThe…
#>    <chr>            <chr>            <chr>            <chr>           
#>  1 http://images.d… http://images.d… http://images.d… http://images.d…
#>  2 http://images.d… http://images.d… http://images.d… http://images.d…
#>  3 http://images.d… http://images.d… http://images.d… http://images.d…
#>  4 http://images.d… http://images.d… http://images.d… http://images.d…
#>  5 http://images.d… http://images.d… http://images.d… http://images.d…
#>  6 http://images.d… http://images.d… http://images.d… http://images.d…
#>  7 http://images.d… http://images.d… http://images.d… http://images.d…
#>  8 http://images.d… http://images.d… http://images.d… http://images.d…
#>  9 http://images.d… http://images.d… http://images.d… http://images.d…
#> 10 http://images.d… http://images.d… http://images.d… http://images.d…
#> # … with 864 more rows, and 15 more variables: id <dbl>,
#> #   highway_number <chr>, highway_locationDescription <chr>,
#> #   camName <chr>, caption <chr>, credit <chr>, orientation <chr>,
#> #   latitude <dbl>, longitude <dbl>, imageStats_updatePeriodMean <chr>,
#> #   imageStats_updatePeriodStdDev <dbl>, markedDelayed <dbl>,
#> #   updatePeriodMean <dbl>, updatePeriodStdDev <dbl>, fetchMean <dbl>

## OR use the permanent ID, which is better for scripts or non-interactive use 
bcdc_get_data("6b39a910-6c77-476f-ac96-7b4f18849b1c")
#> # A tibble: 874 x 19
#>    links_bchighway… links_imageDisp… links_imageThum… links_replayThe…
#>    <chr>            <chr>            <chr>            <chr>           
#>  1 http://images.d… http://images.d… http://images.d… http://images.d…
#>  2 http://images.d… http://images.d… http://images.d… http://images.d…
#>  3 http://images.d… http://images.d… http://images.d… http://images.d…
#>  4 http://images.d… http://images.d… http://images.d… http://images.d…
#>  5 http://images.d… http://images.d… http://images.d… http://images.d…
#>  6 http://images.d… http://images.d… http://images.d… http://images.d…
#>  7 http://images.d… http://images.d… http://images.d… http://images.d…
#>  8 http://images.d… http://images.d… http://images.d… http://images.d…
#>  9 http://images.d… http://images.d… http://images.d… http://images.d…
#> 10 http://images.d… http://images.d… http://images.d… http://images.d…
#> # … with 864 more rows, and 15 more variables: id <dbl>,
#> #   highway_number <chr>, highway_locationDescription <chr>,
#> #   camName <chr>, caption <chr>, credit <chr>, orientation <chr>,
#> #   latitude <dbl>, longitude <dbl>, imageStats_updatePeriodMean <chr>,
#> #   imageStats_updatePeriodStdDev <dbl>, markedDelayed <dbl>,
#> #   updatePeriodMean <dbl>, updatePeriodStdDev <dbl>, fetchMean <dbl>

## OR use the result from bcdc_get_record()
my_data <- bcdc_get_record("6b39a910-6c77-476f-ac96-7b4f18849b1c")
bcdc_get_data(my_data)
#> # A tibble: 874 x 19
#>    links_bchighway… links_imageDisp… links_imageThum… links_replayThe…
#>    <chr>            <chr>            <chr>            <chr>           
#>  1 http://images.d… http://images.d… http://images.d… http://images.d…
#>  2 http://images.d… http://images.d… http://images.d… http://images.d…
#>  3 http://images.d… http://images.d… http://images.d… http://images.d…
#>  4 http://images.d… http://images.d… http://images.d… http://images.d…
#>  5 http://images.d… http://images.d… http://images.d… http://images.d…
#>  6 http://images.d… http://images.d… http://images.d… http://images.d…
#>  7 http://images.d… http://images.d… http://images.d… http://images.d…
#>  8 http://images.d… http://images.d… http://images.d… http://images.d…
#>  9 http://images.d… http://images.d… http://images.d… http://images.d…
#> 10 http://images.d… http://images.d… http://images.d… http://images.d…
#> # … with 864 more rows, and 15 more variables: id <dbl>,
#> #   highway_number <chr>, highway_locationDescription <chr>,
#> #   camName <chr>, caption <chr>, credit <chr>, orientation <chr>,
#> #   latitude <dbl>, longitude <dbl>, imageStats_updatePeriodMean <chr>,
#> #   imageStats_updatePeriodStdDev <dbl>, markedDelayed <dbl>,
#> #   updatePeriodMean <dbl>, updatePeriodStdDev <dbl>, fetchMean <dbl>
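Since permanent IDs are the better choice for scripts, it can help to check that a hard-coded value really is an ID and not a record name. The following is a minimal sketch in base R: looks_like_record_id() is a hypothetical helper, not part of bcdata, and it assumes permanent IDs always have the 8-4-4-4-12 hexadecimal shape of the IDs shown in this vignette.

```r
## Hypothetical helper (not part of bcdata): TRUE when `x` has the
## 8-4-4-4-12 hexadecimal shape of the permanent IDs shown in this vignette.
looks_like_record_id <- function(x) {
  pattern <- "^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"
  grepl(pattern, tolower(x))
}

looks_like_record_id("6b39a910-6c77-476f-ac96-7b4f18849b1c")
#> [1] TRUE
looks_like_record_id("bc-highway-cams")
#> [1] FALSE
```

A script could call this check before bcdc_get_data() and stop early if a record name slipped in where an ID was intended.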

A catalogue record can have one or multiple data files—or “resources”. If there is only one resource, bcdc_get_data() will return that resource by default, as in the above bc-highway-cams example. If there are multiple data resources you will need to specify which resource you want. Let’s look at a catalogue record that contains multiple data resources—BC Schools - Programs Offered in Schools:

## Get the record ID for the `bc-schools-programs-offered-in-schools` catalogue record
bcdc_search("school programs", n = 1)
#> List of B.C. Data Catalogue Records
#> 
#> Number of records: 1
#> Titles:
#> 1: BC Schools - Programs Offered in Schools (txt, xlsx)
#>  ID: b1f27d1c-244a-410e-a361-931fac62a524
#>  Name: bc-schools-programs-offered-in-schools 
#> 
#> Access a single record by calling bcdc_get_record(ID)
#>       with the ID from the desired record.

## Get the metadata for the `bc-schools-programs-offered-in-schools` catalogue record
bcdc_get_record("b1f27d1c-244a-410e-a361-931fac62a524")
#> B.C. Data Catalogue Record:
#>     BC Schools - Programs Offered in Schools 
#> 
#> Name: bc-schools-programs-offered-in-schools (ID: b1f27d1c-244a-410e-a361-931fac62a524 )
#> Permalink: https://catalogue.data.gov.bc.ca/dataset/b1f27d1c-244a-410e-a361-931fac62a524
#> Sector: Education
#> Licence: Open Government Licence - British Columbia
#> Type: Dataset
#> Last Updated: 2017-02-02 
#> 
#> Description:
#>     BC Schools English Language Learners, French Immersion, Francophone, Career
#>     Preparation, Aboriginal Support Services, Aboriginal Language and Culture,
#>     Continuing Education and Career Technical Programs offered in BC schools up to
#>     2013/2014. 
#> 
#> Resources: ( 2 )
#> 1) ProgramsOfferedinSchools.txt
#>      format: txt 
#>      url: http://www.bced.gov.bc.ca/reporting/odefiles/ProgramsOfferedinSchools.txt 
#>      resource: a393f8cf-51ec-42c6-8449-4cea4c75385c 
#>      available in R via bcdata:  TRUE 
#>      code: bcdc_get_data(record = 'b1f27d1c-244a-410e-a361-931fac62a524', resource = 'a393f8cf-51ec-42c6-8449-4cea4c75385c')
#> 
#> 2) ProgramsOfferedinSchools.xlsx
#>      format: xlsx 
#>      url: http://www.bced.gov.bc.ca/reporting/odefiles/ProgramsOfferedinSchools.xlsx 
#>      resource: 1e34098e-70d3-454d-a0fb-e4f8f477d0c0 
#>      available in R via bcdata:  TRUE 
#>      code: bcdc_get_data(record = 'b1f27d1c-244a-410e-a361-931fac62a524', resource = '1e34098e-70d3-454d-a0fb-e4f8f477d0c0')

We see there are two data files, or resources, available in this record, so we need to tell bcdc_get_data() which one we want. When used interactively, bcdc_get_data() will prompt you with the list of resources available through bcdata and ask you to select the one you want. The full code, with the resource ID for each data set, is also available in the metadata record printed above:

## Get the txt data resource from the `bc-schools-programs-offered-in-schools`
## catalogue record
bcdc_get_data("b1f27d1c-244a-410e-a361-931fac62a524", resource = 'a393f8cf-51ec-42c6-8449-4cea4c75385c')
#> # A tibble: 16,152 x 24
#>    `Data Level` `School Year` `Facility Type` `Public Or Inde…
#>    <chr>        <chr>         <chr>           <chr>           
#>  1 SCHOOL LEVEL 2005/2006     STANDARD        BC Public School
#>  2 SCHOOL LEVEL 2006/2007     STANDARD        BC Public School
#>  3 SCHOOL LEVEL 2007/2008     STANDARD        BC Public School
#>  4 SCHOOL LEVEL 2005/2006     STANDARD        BC Public School
#>  5 SCHOOL LEVEL 2006/2007     STANDARD        BC Public School
#>  6 SCHOOL LEVEL 2007/2008     STANDARD        BC Public School
#>  7 SCHOOL LEVEL 2008/2009     STANDARD        BC Public School
#>  8 SCHOOL LEVEL 2009/2010     STANDARD        BC Public School
#>  9 SCHOOL LEVEL 2010/2011     STANDARD        BC Public School
#> 10 SCHOOL LEVEL 2011/2012     STANDARD        BC Public School
#> # … with 16,142 more rows, and 20 more variables: `District Number` <chr>,
#> #   `District Name` <chr>, `School Number` <chr>, `School Name` <chr>,
#> #   `Has Eng Lang Learner Prog` <lgl>, `Has Core French` <lgl>, `Has Early
#> #   French Immersion` <lgl>, `Has Late French Immersion` <lgl>, `Has Prog
#> #   Francophone` <lgl>, `Has Any French Immersion Prog` <lgl>, `Has Any
#> #   French Prog` <lgl>, `Has Aborig Supp Services` <lgl>, `Has Other Appr
#> #   Aborig Prog` <lgl>, `Has Aborig Lang And Cult` <lgl>, `Has Continuing
#> #   Ed Prog` <lgl>, `Has Distributed Learn Prog` <lgl>, `Has Career Prep
#> #   Prog` <lgl>, `Has Coop Prog` <lgl>, `Has Apprenticeship Prog` <lgl>,
#> #   `Has Career Technical Prog` <lgl>
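In a non-interactive script there is no prompt, so the resource ID has to be chosen in code. The sketch below picks a resource by file format from a small table of resources. The table is transcribed by hand from the metadata record printed earlier, and resource_id_for_format() is a hypothetical helper, not a bcdata function.

```r
## Resource table transcribed by hand from the metadata record printed above.
resources <- data.frame(
  id = c("a393f8cf-51ec-42c6-8449-4cea4c75385c",
         "1e34098e-70d3-454d-a0fb-e4f8f477d0c0"),
  format = c("txt", "xlsx"),
  stringsAsFactors = FALSE
)

## Hypothetical helper (not part of bcdata): return the ID of the single
## resource with the requested format, or stop if there is not exactly one.
resource_id_for_format <- function(resources, format) {
  matches <- resources$id[resources$format == format]
  if (length(matches) != 1) {
    stop("Expected exactly one '", format, "' resource, found ", length(matches))
  }
  matches
}

txt_id <- resource_id_for_format(resources, "txt")
txt_id
#> [1] "a393f8cf-51ec-42c6-8449-4cea4c75385c"

## With bcdata loaded and network access, the ID can then be passed on:
## bcdc_get_data("b1f27d1c-244a-410e-a361-931fac62a524", resource = txt_id)
```

Stopping when the match is ambiguous keeps a script from silently reading the wrong file if a record gains a second resource of the same format.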

bcdc_get_data() also detects whether the data resource is a geospatial file and, if so, automatically reads and returns it as an sf object in your R session. Let's get the air zones for British Columbia:

## Find the B.C. Air Zones catalogue record
bcdc_search("air zones", res_format = "geojson")
#> List of B.C. Data Catalogue Records
#> 
#> Number of records: 1
#> Titles:
#> 1: British Columbia Air Zones (shp, kml, geojson)
#>  ID: e8eeefc4-2826-47bc-8430-85703d328516
#>  Name: british-columbia-air-zones 
#> 
#> Access a single record by calling bcdc_get_record(ID)
#>       with the ID from the desired record.

## Get the metadata for the B.C. Air Zones catalogue record
bc_az_metadata <- bcdc_get_record("e8eeefc4-2826-47bc-8430-85703d328516")

## Get the B.C. Air Zone geospatial data
bc_az <- bcdc_get_data(bc_az_metadata, resource = "c495d082-b586-4df0-9e06-bd6b66a8acd9")

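## Plot the B.C. Air Zone geospatial data with ggplot()
## (assumes the ggplot2 and dplyr packages are installed; dplyr provides %>%)
library(ggplot2)
library(dplyr)
bc_az %>% 
  ggplot() +
  geom_sf()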

Note: The bcdata package supports downloading most file types, including zip archives. It will do its best to identify and read data from zip files; however, if the zip contains multiple data files, or data files that bcdata doesn't know how to import, it will fail.
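When looping over several records in a script, one failed import need not stop the whole run. The following is a minimal sketch of a fallback wrapper in base R. It assumes that a failed download or import is signalled as an ordinary R error; get_data_or_null() and failing_getter() are hypothetical names used only for illustration, and "some-record" is a placeholder.

```r
## Hypothetical wrapper (not part of bcdata): call `getter(...)` and return
## NULL with a message instead of stopping when the call raises an error.
get_data_or_null <- function(getter, ...) {
  tryCatch(
    getter(...),
    error = function(e) {
      message("Could not read resource: ", conditionMessage(e))
      NULL
    }
  )
}

## Stand-in getter that always fails, used here only to show the fallback path.
failing_getter <- function(record) stop("multiple data files in zip")
result <- get_data_or_null(failing_getter, "some-record")
is.null(result)
#> [1] TRUE

## With bcdata loaded, the real call would look like:
## get_data_or_null(bcdc_get_data, "bc-highway-cams")
```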

Using B.C. Geographic Warehouse (BCGW) layer names

If you are familiar with the B.C. Geographic Warehouse (BCGW), you may already know the name of a layer that you want from the BCGW. bcdc_get_data() supports supplying that name directly. For example, the record for the B.C. airports layer shows that the object name is WHSE_IMAGERY_AND_BASE_MAPS.GSR_AIRPORTS_SVW, and we can use that in bcdc_get_data():

bcdc_get_data("WHSE_IMAGERY_AND_BASE_MAPS.GSR_AIRPORTS_SVW")
#> Simple feature collection with 455 features and 41 fields
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: 406543.7 ymin: 367957.6 xmax: 1796645 ymax: 1689146
#> epsg (SRID):    3005
#> proj4string:    +proj=aea +lat_1=50 +lat_2=58.5 +lat_0=45 +lon_0=-126 +x_0=1000000 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
#> # A tibble: 455 x 42
#>    id    CUSTODIAN_ORG_D… BUSINESS_CATEGO… BUSINESS_CATEGO…
#>  * <chr> <chr>            <chr>            <chr>           
#>  1 WHSE… "Ministry of Fo… airTransportati… Air Transportat…
#>  2 WHSE… "Ministry of Fo… airTransportati… Air Transportat…
#>  3 WHSE… "Ministry of Fo… airTransportati… Air Transportat…
#>  4 WHSE… "Ministry of Fo… airTransportati… Air Transportat…
#>  5 WHSE… "Ministry of Fo… airTransportati… Air Transportat…
#>  6 WHSE… "Ministry of Fo… airTransportati… Air Transportat…
#>  7 WHSE… "Ministry of Fo… airTransportati… Air Transportat…
#>  8 WHSE… "Ministry of Fo… airTransportati… Air Transportat…
#>  9 WHSE… "Ministry of Fo… airTransportati… Air Transportat…
#> 10 WHSE… "Ministry of Fo… airTransportati… Air Transportat…
#> # … with 445 more rows, and 38 more variables:
#> #   OCCUPANT_TYPE_DESCRIPTION <chr>, SOURCE_DATA_ID <chr>,
#> #   SUPPLIED_SOURCE_ID_IND <chr>, AIRPORT_NAME <chr>, DESCRIPTION <chr>,
#> #   PHYSICAL_ADDRESS <chr>, ALIAS_ADDRESS <chr>, STREET_ADDRESS <chr>,
#> #   POSTAL_CODE <chr>, LOCALITY <chr>, CONTACT_PHONE <chr>,
#> #   CONTACT_EMAIL <chr>, CONTACT_FAX <chr>, WEBSITE_URL <chr>,
#> #   IMAGE_URL <chr>, LATITUDE <dbl>, LONGITUDE <dbl>, KEYWORDS <chr>,
#> #   DATE_UPDATED <date>, SITE_GEOCODED_IND <chr>, AERODROME_STATUS <chr>,
#> #   AIRCRAFT_ACCESS_IND <chr>, DATA_SOURCE <chr>, DATA_SOURCE_YEAR <chr>,
#> #   ELEVATION <dbl>, FUEL_AVAILABILITY_IND <chr>,
#> #   HELICOPTER_ACCESS_IND <chr>, IATA_CODE <chr>, ICAO_CODE <chr>,
#> #   MAX_RUNWAY_LENGTH <dbl>, NUMBER_OF_RUNWAYS <int>,
#> #   OIL_AVAILABILITY_IND <chr>, RUNWAY_SURFACE <chr>,
#> #   SEAPLANE_ACCESS_IND <chr>, TC_LID_CODE <chr>, SEQUENCE_ID <int>,
#> #   SE_ANNO_CAD_DATA <chr>, geometry <POINT [m]>

bcdc_query_geodata()

Many geospatial datasets in the B.C. Data Catalogue are available through a Web Service. While bcdc_get_data() will retrieve the geospatial data for you, sometimes the geospatial file is very large and slow to download, or you may only want some of the data. bcdc_query_geodata() lets you query catalogue geospatial data available as a Web Service using select and filter functions (just like in dplyr). The bcdata::collect() function returns the bcdc_query_geodata() query results as an sf object in your R session.

Let's get the Capital Regional District boundary from the B.C. Regional Districts geospatial data. The whole file takes 30-60 seconds to download and we only need the one polygon, so why not save some time:

## Find the B.C. Regional Districts catalogue record
bcdc_search("regional districts administrative areas", res_format = "wms", n = 1)
#> List of B.C. Data Catalogue Records
#> 
#> Number of records: 1
#> Titles:
#> 1: Regional Districts - Legally Defined Administrative Areas of BC (other, xlsx, wms, kml)
#>  ID: d1aff64e-dbfe-45a6-af97-582b7f6418b9
#>  Name: regional-districts-legally-defined-administrative-areas-of-bc 
#> 
#> Access a single record by calling bcdc_get_record(ID)
#>       with the ID from the desired record.

## Get the metadata for the B.C. Regional Districts catalogue record
bc_regional_districts_metadata <- bcdc_get_record("d1aff64e-dbfe-45a6-af97-582b7f6418b9")

## Have a quick look at the geospatial columns to help with filter or select
bcdc_describe_feature(bc_regional_districts_metadata)
#> # A tibble: 18 x 4
#>    col_name                selectable remote_col_type        local_col_type
#>    <chr>                   <lgl>      <chr>                  <chr>         
#>  1 id                      FALSE      xsd:string             character     
#>  2 LGL_ADMIN_AREA_ID       FALSE      xsd:decimal            numeric       
#>  3 ADMIN_AREA_NAME         TRUE       xsd:string             character     
#>  4 ADMIN_AREA_ABBREVIATION TRUE       xsd:string             character     
#>  5 ADMIN_AREA_BOUNDARY_TY… TRUE       xsd:string             character     
#>  6 ADMIN_AREA_TYPE         TRUE       xsd:string             character     
#>  7 ADMIN_AREA_GROUP_NAME   TRUE       xsd:string             character     
#>  8 CHANGE_REQUESTED_ORG    TRUE       xsd:string             character     
#>  9 UPDATE_TYPE             TRUE       xsd:string             character     
#> 10 WHEN_UPDATED            TRUE       xsd:date               date          
#> 11 OIC_NUMBER              TRUE       xsd:string             character     
#> 12 OIC_YEAR                TRUE       xsd:string             character     
#> 13 AFFECTED_ADMIN_AREA_AB… TRUE       xsd:string             character     
#> 14 FEATURE_AREA_SQM        TRUE       xsd:decimal            numeric       
#> 15 FEATURE_LENGTH_M        TRUE       xsd:decimal            numeric       
#> 16 SHAPE                   TRUE       gml:GeometryPropertyT… sfc geometry  
#> 17 OBJECTID                FALSE      xsd:decimal            numeric       
#> 18 SE_ANNO_CAD_DATA        TRUE       xsd:hexBinary          numeric

## Get the Capital Regional District polygon from the B.C. Regional 
## Districts geospatial data
my_regional_district <- bcdc_query_geodata(bc_regional_districts_metadata) %>% 
  filter(ADMIN_AREA_NAME == "Capital Regional District") %>% 
  collect()

## Plot the Capital Regional District polygon with ggplot()
my_regional_district  %>% 
  ggplot() +
  geom_sf()
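The query above uses only filter(); the text also mentions select. As a sketch under stated assumptions, the function below adds a select() step so that only two attribute columns are requested. get_crd_names() is a hypothetical wrapper, the column names come from the bcdc_describe_feature() output above, and which columns the service actually returns is not verified here. The request is attempted only in an interactive session with bcdata and dplyr installed.

```r
## Sketch only: needs network access plus the bcdata and dplyr packages.
## Column names are taken from the bcdc_describe_feature() output above.
get_crd_names <- function(record_id = "d1aff64e-dbfe-45a6-af97-582b7f6418b9") {
  record <- bcdata::bcdc_get_record(record_id)
  query <- bcdata::bcdc_query_geodata(record)
  query <- dplyr::filter(query, ADMIN_AREA_NAME == "Capital Regional District")
  query <- dplyr::select(query, ADMIN_AREA_NAME, ADMIN_AREA_ABBREVIATION)
  dplyr::collect(query)  # sends the request and returns the result
}

## Only attempt the request interactively and when both packages are present.
if (interactive() &&
    requireNamespace("bcdata", quietly = TRUE) &&
    requireNamespace("dplyr", quietly = TRUE)) {
  crd <- get_crd_names()
  print(names(crd))
}
```

Printing names(crd) is a quick way to see which columns came back before plotting or joining the result.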

The vignette Querying spatial data with bcdata provides a full demonstration of how to use bcdata::bcdc_query_geodata() to fine-tune a Web Service request for geospatial data from the B.C. Data Catalogue.