Using odns


odns provides a base for exploring and obtaining data available through the Scottish Health and Social Care Open Data platform. The package wraps the underlying CKAN API and simplifies access to the data from R, allowing users to quickly explore what is available and start using it without having to write complex queries.


Language of CKAN

CKAN, and by extension this package, refers to packages and resources.

The term package refers to a dataset, which is a collection of resources. A resource contains the data itself.

Example structure:

.
├── package_1
│   ├── resource_1
│   ├── resource_2
│   └── resource_3
│
└── package_2
    ├── resource_1
    └── resource_2
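
The same hierarchy can be pictured in R as a named list of packages, each holding its resources. This is only an illustrative sketch: the names are the placeholders from the tree above, the "data" strings stand in for real tables, and nothing here reflects how odns or CKAN stores data internally.

```r
# Placeholder catalogue mirroring the tree above (illustrative only).
catalogue <- list(
  package_1 = list(resource_1 = "data", resource_2 = "data", resource_3 = "data"),
  package_2 = list(resource_1 = "data", resource_2 = "data")
)

# Package names are the top-level names of the list.
package_names <- names(catalogue)
package_names

# Number of resources held by each package.
resource_counts <- lengths(catalogue)
resource_counts
```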


Exploring the available data

To view all the packages available we can use all_packages().

all_packages()
  
#>  [1] "18-weeks-referral-to-treatment"                                                
#>  [2] "27-30-month-review-statistics"                                                
#>  [3] "alcohol-related-hospital-statistics-scotland"
#>  [4] "allied-health-professionals-musculoskeletal-waiting-times"
#>  [5] "allied-health-professional-vacancies"
#>  ...


If we wish to search for packages by name, or by part of a name, we can use the contains argument of all_packages(). For example, to find packages relating to populations:

all_packages(contains = "population")

#>  [1] "gp-practice-populations" "population-estimates"   
#>  [3] "population-projections"  "standard-populations"


If we prefer to see more detail, including the names of resources within packages, we can use all_resources() with the package_contains argument.

all_resources(package_contains = "population")

#>                                   name            package_name
#> 1   GP Practice Populations April 2022 gp-practice-populations
#> 2 GP Practice Populations January 2022 gp-practice-populations
#>                                     id
#> 1 2c701f90-c26d-4963-8062-95b8611e5fd1
#> 2 d07debcf-7832-4dc4-afb2-41101d5cc7ff
#>                             package_id
#> 1 e3300e98-cdd2-4f4e-a24e-06ee14fcc66c
#> 2 e3300e98-cdd2-4f4e-a24e-06ee14fcc66c
#>                last_modified
#> 1 2022-05-10T09:43:24.390241
#> 2 2022-02-07T11:13:52.195764


The resource_contains argument of all_resources() can also be used to further narrow the results of a query.

all_resources(resource_contains = "european")

#>                                   name
#>  1        European Standard Population
#>  2 European Standard Population by Sex
#>            package_name
#>  1 standard-populations
#>  2 standard-populations
#>                                      id
#>  1 edee9731-daf7-4e0d-b525-e4c1469b8f69
#>  2 29ce4cda-a831-40f4-af24-636196e05c1a
#>                              package_id
#>  1 4dd86111-7326-48c4-8763-8cc4aa190c3e
#>  2 4dd86111-7326-48c4-8763-8cc4aa190c3e
#>                 last_modified
#>  1 2018-04-05T14:42:35.785110
#>  2 2018-04-05T14:45:36.996054


Search strings are case insensitive. The example above, all_resources(resource_contains = "european"), would return resources whose names contain ‘EUROPEAN’, ‘European’, or ‘european’.
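
The effect of case-insensitive matching can be illustrated with base R alone. This is a sketch of the behaviour, not the package's own implementation, which may differ; the first two names come from the output above and the third is included as a name that should not match.

```r
# Resource names from the outputs above; the last one should not match.
resource_names <- c(
  "European Standard Population",
  "European Standard Population by Sex",
  "GP Practice Populations April 2022"
)

# With ignore.case = TRUE, all three spellings of the pattern are equivalent.
matches <- vapply(
  c("european", "European", "EUROPEAN"),
  function(pattern) sum(grepl(pattern, resource_names, ignore.case = TRUE)),
  integer(1)
)
matches
```

Each spelling matches the same two names.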

Further information and metadata

Let's say that we are interested in the ‘European Standard Population’ resource within the ‘standard-populations’ package. We can view the metadata for both the package and the resource.

package_metadata(package = "standard-populations")

resource_metadata(resource = "edee9731-daf7-4e0d-b525-e4c1469b8f69")


The package metadata contains useful information such as the time it was last modified, the publisher, a description, and notes.

The resource metadata provides details of the available fields and their types. This information is particularly useful when putting together a query where we want to return only a subset of data.

Importing data to R

We can import all of the resources within a package using get_resource().

# get full data-sets
get_resource(package = "4dd86111-7326-48c4-8763-8cc4aa190c3e")

# get the first 10 rows of each data-set
get_resource(package = "4dd86111-7326-48c4-8763-8cc4aa190c3e", limit = 10L)


We can also use the resource argument of get_resource() to import a specific resource within a package. This is often the simplest way to get a single resource in its entirety.

get_resource(
   package = "4dd86111-7326-48c4-8763-8cc4aa190c3e",
   resource = "edee9731-daf7-4e0d-b525-e4c1469b8f69",
   limit = 5L
   )


get_resource() always returns a list, even when only one resource is being imported. Where multiple resources have been imported, each resource is its own list element.
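
Because the result is always a list, a single resource has to be extracted before it can be used as a table. The sketch below uses hand-made stand-ins rather than a real call so that it runs offline; it assumes each list element is a data frame, and the column name x and its values are placeholders rather than real data.

```r
# Stand-in for a get_resource() result holding one resource (placeholder data).
single <- list(data.frame(x = 1:3))

# Extract the only element with [[ ]] to work with it directly.
df <- single[[1]]
is.data.frame(df)

# Stand-in for a result holding several resources (placeholder data).
several <- list(data.frame(x = 1:2), data.frame(x = 1L))

# Inspect each element in turn, here by counting its rows.
row_counts <- vapply(several, nrow, integer(1))
row_counts
```

Positional indexing is used because this document does not say whether the list elements are named.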

Where more granular control is desired over the data imported, the get_data() function can be used. get_data() allows us to import selected fields and to filter the data. If we want to import the fields ‘AgeGroup’ and ‘EuropeanStandardPopulation’ from the ‘European Standard Population’ resource we can achieve this with get_data().

get_data(
   resource = "edee9731-daf7-4e0d-b525-e4c1469b8f69",
   fields = c("AgeGroup", "EuropeanStandardPopulation")
 )


The where argument of get_data() can be used to exert further control over the data we import. If we only want to retrieve rows from ‘European Standard Population’ where the age group is 45-49 years, we can write a SQL-style where clause to achieve this.

get_data(
   resource = "edee9731-daf7-4e0d-b525-e4c1469b8f69",
   fields = c("AgeGroup", "EuropeanStandardPopulation"),
   where = "\"AgeGroup\" = \'45-49 years\'"
 )


The where argument of get_data() requires specific formatting to allow for compatibility with the CKAN API. Field names must be double quoted ("), non-numeric values must be single quoted ('), and both single and double quotes must be escaped with a backslash, for example: where = "\"AgeGroup\" = \'45-49 years\'".
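
Writing the escapes by hand is error prone, so a small helper can assemble the string instead. build_where() below is a hypothetical function written for this example and is not part of odns; it is a minimal sketch that handles a single equality condition on a non-numeric value and does not cope with values that themselves contain a single quote.

```r
# Hypothetical helper (not part of odns): builds one equality condition with
# the field name double quoted and the value single quoted.
build_where <- function(field, value) {
  sprintf("\"%s\" = '%s'", field, value)
}

where_clause <- build_where("AgeGroup", "45-49 years")

# writeLines() shows the string itself, without R's escape characters.
writeLines(where_clause)
#> "AgeGroup" = '45-49 years'
```

Inside a double-quoted R string, \' and ' denote the same character, so the result is identical to the hand-written example above and could be supplied as where = where_clause.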