statCanR

Thierry Warin

library(statcanR)

Overview

statcanR provides the R user with a consistent process to collect data from Statistics Canada’s data portal. It provides access to all Statistics Canada’ open economic data (formerly known as CANSIM tables) now identified by product IDs (PID) by the new Statistics Canada’s Web Data Service: https://www150.statcan.gc.ca/n1/en/type/data.

Quick start

First, install statcanR:

devtools::install_github("warint/statcanR")

Next, call statcanR to make sure everything is installed correctly.

library(statcanR)

Practical usage

This section presents an example of how to use the statcanR R package and its function statcan_data().

The following example is provided to illustrate how to use the function. It consists in collecting some descriptive statistics about the Canadian labour force at the federal, provincial and industrial levels, on a monthly basis.

With a simple web search ‘statistics canada wages by industry metropolitan area monthly’, the table number can easily be found on Statisitcs Canada’s webpage. Here is below a figure that illustrates this example, such as ‘27-10-0014-01’ for the Federal expenditures on science and technology, by socio-economic objectives.

Once the table number is identified, the statcan_data() function is easy to use in order to collect the data, as following:

# Get data with statcan_data function
mydata <- statcan_data("27-10-0014-01", "eng")
# sqs_statcandata will be depricated 
mydata <- sqs_statcan_data("27-10-0014-01","eng")

Should you want the same data in French, just replace the argument at the end of the function with “fra”.

Code architecture

This section describes the code architecture of the statcanR package.

In details, the statcan_data() function has 2 arguments:

  1. tableNumber
  2. lang

The ‘tableNumber’ argument simply refers to the table number of the Statistics Canada data table a user wants to collect, such as ‘27-10-0014-01’ for the Federal expenditures on science and technology, by socio-economic objectives, as an example.

The second argument, ‘lang’, refers to the language. As Canada is a bilingual country, Statistics Canada displays all the economic data in both languages. Therefore, users can choose to collect satistics data tables in French or English by setting the lang argument with c(“fra”, “eng”).

The code architecture of the statcan_data() function is as follows. The first step if to clean the table number in order to align it with the official typology of Statistics Canada’s Web Data Service. The second step is to create a temporary folder where all the data wrangling operations are performed. The third step is to check and select the correct language. The fourth step is to define the right URL where the actual data table is stored and then to download the .Zip container. The fifth step is to unzip the previously downloaded .Zip file to extract the data and metadata from the .csv files. The final step is to load the statistics data into a data frame called ‘sqs_data’ and to add the official table indicator name in the new ‘INDICATOR’ column.

To be more precise about the statcan_data() function, here is below some further code description: