odns
provides a base for exploring and obtaining data
available through the Scottish Health and Social Care Open Data
platform. The package provides a wrapper for the underlying CKAN) API
and simplifies the process of accessing the available data with R,
allowing users to quickly explore the available data and start using it
without having to write complex queries.
CKAN and by extension this package refers to packages and resources.
The term package refers to a dataset, a collection of resources. A resource, contains the data itself.
Example of structure;
.
├── package_1
│ ├── resource_1
│ ├── resource_2
│ └── resource_3
|
└── package_2
├── resource_1
└── resource_2
To view all the packages available we can use
all_packages()
.
all_packages()
#> [1] "18-weeks-referral-to-treatment"
#> [2] "27-30-month-review-statistics"
#> [3] "alcohol-related-hospital-statistics-scotland"
#> [4] "allied-health-professionals-musculoskeletal-waiting-times"
#> [5] "allied-health-professional-vacancies"
#> ...
If we wish to search for packages by name, or a part of a name, we
can use the contains
argument of
all_packages()
. For example if we want to find packages
relating to populations;
all_packages(contains = "population")
#> [1] "gp-practice-populations" "population-estimates"
#> [3] "population-projections" "standard-populations"
If we prefer to see more detail, including the names of resources
within packages, we can use all_resources()
with the
package_contains
argument.
all_resources(package_contains = "population")
#> name package_name
#> 1 GP Practice Populations April 2022 gp-practice-populations
#> 2 GP Practice Populations January 2022 gp-practice-populations
#> id
#> 1 2c701f90-c26d-4963-8062-95b8611e5fd1
#> 2 d07debcf-7832-4dc4-afb2-41101d5cc7ff
#> package_id
#> 1 e3300e98-cdd2-4f4e-a24e-06ee14fcc66c
#> 2 e3300e98-cdd2-4f4e-a24e-06ee14fcc66c
#> last_modified
#> 1 2022-05-10T09:43:24.390241
#> 2 2022-02-07T11:13:52.195764
The resource_contains
argument of
all_resources()
can also be used to further narrow the
results of a query.
all_resources(resource_contains = "european")
#> name
#> 1 European Standard Population
#> 2 European Standard Population by Sex
#> package_name
#> 1 standard-populations
#> 2 standard-populations
#> id
#> 1 edee9731-daf7-4e0d-b525-e4c1469b8f69
#> 2 29ce4cda-a831-40f4-af24-636196e05c1a
#> package_id
#> 1 4dd86111-7326-48c4-8763-8cc4aa190c3e
#> 2 4dd86111-7326-48c4-8763-8cc4aa190c3e
#> last_modified
#> 1 2018-04-05T14:42:35.785110
#> 2 2018-04-05T14:45:36.996054
When passing search strings they are case insensitive. The example
above of all_resources(resource_contains = "european")
would return resources contained ‘EUROPEAN’, ‘European’, or european.
Lets say that we are interested in ‘standard-populations’ resource
‘European Standard Population’. We can view the metadata for the package
and the resource.
package_metadata(package = "standard-populations")
resource_metadata(resource="edee9731-daf7-4e0d-b525-e4c1469b8f69")
The package metadata contains useful information such as the time it was last modified, the publisher, a description, and notes.
The resource metadata provides details of the available fields and
their types. This information is particularly useful when putting
together a query where we want to return only a subset of data.
We can import all of the resources within a package using
get_resource()
.
# get full data-sets
get_resource(package = "4dd86111-7326-48c4-8763-8cc4aa190c3e")
# get the first 10 rows of each data-set
get_resource(package = "4dd86111-7326-48c4-8763-8cc4aa190c3e", limit = 10L)
We can also use the resource
argument of
get_resource()
to import a specific resource within a
package. This is often the simplest way to get a single resource in its
entirety.
get_dataset(
package = "4dd86111-7326-48c4-8763-8cc4aa190c3e",
resource = "edee9731-daf7-4e0d-b525-e4c1469b8f69",
limit = 5L
)
get_resource()
always returns a list, even when only one
resource is being imported. Where multiple resources have been imported,
each resource is its own list element.
Where more granular control is desired over the data imported, the
get_data()
function can be used. get_data()
allows us to import selected fields and to filter the data. If we want
to import the fields ‘AgeGroup’ and ‘EuropeanStandardPopulation’ from
the ‘European Standard Population’ resource we can achieve this with
get_data()
.
get_data(
resource = "edee9731-daf7-4e0d-b525-e4c1469b8f69",
fields = c("AgeGroup", "EuropeanStandardPopulation")
)
The where
argument of get_data()
can be
used to exact further control over the data we import. If we only want
to retrieve rows from ‘European Standard Population’ where the age group
is 45-49 years, we can write a SQL style where query to achieve this.
get_data(
resource = "edee9731-daf7-4e0d-b525-e4c1469b8f69",
fields = c("AgeGroup", "EuropeanStandardPopulation"),
where = "\"AgeGroup\" = \'45-49 years\'"
)
The where
argument of get_data()
requires
specific formatting to allow for compatibility with the CKAN API. Field
names must be double quoted "
, non-numeric values must be
single quoted '
, and both single and double quotes must be
delimited, for example;
where = "\"AgeGroup\" = \'45-49 years\\'"
.