Statistical classifications and correspondence tables are published as Linked Open Data (LOD) by several organisations, notably the Publications Office of the European Union (OP, via CELLAR) and the Food and Agriculture Organization (FAO).
While these resources can be accessed directly using SPARQL, this requires specific technical expertise. The correspondenceTables package provides high‑level R functions that allow users to retrieve these data as standard R data frames, without writing SPARQL queries themselves.
Two core data retrieval functions are provided:
- retrieveClassificationTable(): retrieves the structure of a statistical classification (codes, labels, hierarchy).
- retrieveCorrespondenceTable(): retrieves a correspondence (mapping) table between two classifications.

Optionally, both functions can return the SPARQL query used for the retrieval, making the process transparent, inspectable, and reproducible.
In addition, the dataStructure() utility allows users to
inspect the hierarchical structure of a classification (e.g. available
levels and code depth) before retrieving the data. This step is optional
but recommended when working with hierarchical classifications,
particularly when the desired level is not known in advance. It is
covered in this vignette, with illustrative examples for both the
CELLAR and FAO endpoints.
Before using the core retrieval functions
retrieveClassificationTable() and
retrieveCorrespondenceTable(), it is necessary to know
which data can be retrieved and how they are identified.
In practice, this means answering the following questions:
- Which endpoint hosts the data (CELLAR or FAO)?
- Which identifiers (prefix, conceptScheme, ID_table) should be used?

The package provides lightweight discovery utilities to support this step before data retrieval.
To retrieve a statistical classification, users first need to know
which classifications are available at a given endpoint and how they are
identified. The classificationList() utility provides this
information.
The example below illustrates the typical output structure using a
static snapshot of the CELLAR classification list bundled
with the package.
To retrieve up-to-date information about available classifications,
call classificationList() directly for the endpoint of interest
(e.g. classificationList("CELLAR")).
list_data <- read.csv(
system.file("extdata/test", "classificationList_CELLAR.csv",
package = "correspondenceTables"),
stringsAsFactors = FALSE
)
knitr::kable(
head(list_data, 3),
caption = "Example output of classificationList() (retrieved from CELLAR)"
)

| Prefix | ConceptScheme | URI | Title | Languages |
|---|---|---|---|---|
| ACL 2018 | acl2018 | https://data.europa.eu/b8y/timeuse2018/acl2018 | Activity coding list for harmonised European time use surveys, 2018 (ACL 2018) | en |
| CBF | cbf | https://data.europa.eu/38u/cbf1.0/cbf | Classification of Business Functions (CBF) | en |
| CEP | cep | https://data.europa.eu/4k9/cep/cep | Classification of Environmental Purposes (CEP) | en |
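Because classificationList() returns an ordinary data frame with the columns shown above, the identifiers of a particular classification can be looked up with base R. The sketch below rebuilds the three snapshot rows from the table and filters them by title. find_classification() is a hypothetical helper, not part of correspondenceTables; in a live session, cl would instead be the result of classificationList("CELLAR").

```r
# The three snapshot rows shown in the table above
cl <- data.frame(
  Prefix = c("ACL 2018", "CBF", "CEP"),
  ConceptScheme = c("acl2018", "cbf", "cep"),
  URI = c("https://data.europa.eu/b8y/timeuse2018/acl2018",
          "https://data.europa.eu/38u/cbf1.0/cbf",
          "https://data.europa.eu/4k9/cep/cep"),
  Title = c("Activity coding list for harmonised European time use surveys, 2018 (ACL 2018)",
            "Classification of Business Functions (CBF)",
            "Classification of Environmental Purposes (CEP)"),
  Languages = c("en", "en", "en"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: case-insensitive search on the Title column,
# returning only the identifier columns needed for retrieval
find_classification <- function(cl, pattern) {
  hits <- grepl(pattern, cl$Title, ignore.case = TRUE)
  cl[hits, c("Prefix", "ConceptScheme", "URI"), drop = FALSE]
}

find_classification(cl, "environmental")
```

The ConceptScheme value of the matching row is what is passed to the retrieval functions.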
For each classification, three identifiers are required to retrieve the data:
- endpoint: "CELLAR" or "FAO"
- prefix: namespace prefix used in the SPARQL endpoint
- conceptScheme: unique identifier of the classification

For example, the NACE Rev. 2 classification is identified by:
- endpoint: "CELLAR"
- prefix: "nace2"
- conceptScheme: "nace2"

Many statistical classifications are hierarchical. If only a specific
level is required (e.g. divisions or classes), it is recommended to
inspect the classification structure first. The
dataStructure() function provides this information.
This example illustrates how to inspect the structural
characteristics of a classification stored in the CELLAR
repository. The dataStructure() function can be used to
retrieve either a summary view, a detailed
view, or both.
To keep the vignette reproducible and independent of live SPARQL endpoints, the function calls below are shown for documentation purposes only.
The summary output provides an overview of the
hierarchical organisation of the classification. For each hierarchical
level, it reports summary information such as the number of items
defined at that level.
This view is useful for quickly understanding the overall structure of a classification and identifying which hierarchical levels are available.
ds_cn <- dataStructure(
endpoint = "CELLAR",
prefix = "cn2022",
conceptScheme = "cn2022",
language = "en",
return = "summary"
)
knitr::kable(head(ds_cn, 20), caption = "CN 2022 — dataStructure(summary)")

The Combined Nomenclature (CN 2022) is a hierarchical product classification. The summary output shows that it consists of five hierarchical levels.
The Count column indicates the number of classification
items defined at each hierarchical depth.
The details output returns one row per classification
item, with item-level metadata for each code.
This view is intended for detailed inspection of classification content, for example when analysing parent-child relationships or validating code hierarchies.
When return = "both", the function returns a list
containing both summary and detailed outputs. This option can be
convenient when both a structural overview and item‑level information
are required within a single workflow.
ds_cn_both <- dataStructure(
endpoint = "CELLAR",
prefix = "cn2022",
conceptScheme = "cn2022",
language = "en",
return = "both"
)
knitr::kable(head(ds_cn_both$summary, 20), caption = "CN 2022 — summary (from both)")
knitr::kable(head(ds_cn_both$details, 20), caption = "CN 2022 — details (from both)")

As noted above, this inspection step can be skipped if the required
classification level is already known.
The same approach can be applied to classifications hosted in the FAO repository. This example illustrates how to inspect the structure of the Central Product Classification (CPC), version 2.1.
As with CELLAR, the dataStructure() function can return
a summary view, a detailed view, or
both representations of the classification structure.
In practice, the choice depends on whether a high-level overview or
item-level information is required.
To keep the vignette reproducible and independent of live SPARQL endpoints, the function call below is provided for documentation purposes only.
The summary output provides a compact overview of the hierarchical organisation of CPC 2.1, with summary information for each level.
This view is useful for understanding the overall structure of the classification before retrieving detailed content.
endpoint <- "FAO"
prefix <- "CPC21"
conceptScheme <- "CPC21"
ds_cpc <- dataStructure(
endpoint = endpoint,
prefix = prefix,
conceptScheme = conceptScheme,
language = "en",
showQuery = FALSE,
return = "summary"
)
knitr::kable(
head(ds_cpc, 20),
caption = "CPC 2.1 — dataStructure(summary, FAO)"
)

As in the CELLAR example, return = "details" retrieves
item-level information, while return = "both" returns both
summary and detailed outputs in a single call.
Once the classification identifiers and (optionally) the desired
level are known, the retrieveClassificationTable() function
can be used to retrieve the data.
The function returns a flat data frame that can be processed with standard R tools.
Main arguments
- endpoint: "CELLAR" or "FAO".
- prefix: Character. Classification prefix used for matching and URI resolution (e.g. "cn2022", "cpc21", "isic4").
- conceptScheme: Character. Local identifier of the scheme (often identical to prefix). The function automatically resolves it to the canonical ConceptScheme URI published at the endpoint.
- language: Character. Preferred label language as a BCP 47 code. Defaults to "en" (English); other examples are "fr" and "de".
- level: Character. Either "ALL" (default), which returns all levels in the hierarchy, or a single hierarchical level given as a character string (e.g. "4" for NACE Rev. 2 classes).
- showQuery: Logical. If FALSE (default), returns only the classification table; if TRUE, returns a list containing the SPARQL query, the resolved scheme URI, and the table itself.
- knownSchemes: Optional. A data.frame supplying authoritative mappings with the columns Prefix, ConceptScheme, and URI, as returned by classificationList(endpoint). When provided, it overrides automatic discovery.
- preferMappingOnly: Logical. If TRUE, the function never attempts SPARQL discovery and relies only on the information in knownSchemes or classificationList(endpoint). Default: FALSE.

The following example demonstrates how to retrieve level-4 ("class") data for the German, French, and Bulgarian versions of NACE Rev. 2. The code is not executed during vignette rendering, as data availability and response times may vary.
endpoint <- "CELLAR"
prefix <- "nace2"
conceptScheme <- "nace2"
level <- "4"
languages <- c("de", "fr", "bg")
results <- lapply(languages, function(lang) {
retrieveClassificationTable(
endpoint = endpoint,
prefix = prefix,
conceptScheme = conceptScheme,
language = lang,
level = level,
showQuery = FALSE
)
})

The resulting object is a list of data frames, one per language, each containing the class-level codes and labels for NACE Rev. 2 in the selected language.
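Because availability and response times vary, an error for a single language would abort the whole lapply() call above. The sketch below wraps each request in tryCatch() and names the resulting list by language code. retrieve_by_language() is a hypothetical helper, not part of correspondenceTables; fetch stands for any function of one language code, such as a call to retrieveClassificationTable() as in the commented usage.

```r
# Hypothetical helper (not part of correspondenceTables): apply `fetch` to each
# language, name the results by language code, and keep going if one fails.
retrieve_by_language <- function(languages, fetch) {
  out <- lapply(languages, function(lang) {
    tryCatch(
      fetch(lang),
      error = function(e) {
        # Report the failure but return NULL so the other languages survive
        warning(sprintf("Retrieval failed for language '%s': %s",
                        lang, conditionMessage(e)), call. = FALSE)
        NULL
      }
    )
  })
  names(out) <- languages
  out
}

# Example usage (requires the correspondenceTables package and network access):
# results <- retrieve_by_language(c("de", "fr", "bg"), function(lang) {
#   retrieveClassificationTable(endpoint = "CELLAR", prefix = "nace2",
#                               conceptScheme = "nace2", language = lang,
#                               level = "4", showQuery = FALSE)
# })
# results[["fr"]]
```

Failed languages appear as NULL entries, so they can be retried later without repeating the successful requests.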
The FAO endpoint provides access to a limited subset of international classifications. Availability depends on the endpoint configuration.
The following example illustrates how an FAO classification would be retrieved. The code is not executed during vignette rendering.
This call queries the FAO repository and returns metadata describing all published classification schemes (prefix, concept scheme, title, etc.).
cl_fao <- classificationList("FAO")
knitr::kable(
head(cl_fao),
caption = "Classifications available from the FAO endpoint (preview)"
)

Inspect available prefix identifiers
The Prefix field identifies the catalogue or namespace
under which each FAO classification is published.
Inspect available concept schemes
The ConceptScheme field identifies the underlying
classification schemes that can be queried using
retrieveClassificationTable().
Retrieving a classification table from the FAO endpoint
The following example illustrates how to retrieve a classification
from the FAO repository using
retrieveClassificationTable(). Because FAO
data availability and response times may vary, this example is shown for
documentation purposes and is not executed in the vignette.
endpoint <- "FAO"
prefix <- "cpc21"
conceptScheme <- "core"
out <- retrieveClassificationTable(
endpoint = endpoint,
prefix = prefix,
conceptScheme = conceptScheme,
language = "en",
level = "2",
showQuery = TRUE
)

The FAO endpoint provides access to selected
international and domain‑specific classifications maintained by
FAO. Not all CELLAR classifications are
available via FAO, and vice versa.
Each time it is executed, the
retrieveClassificationTable() function attempts to retrieve
the list of all classifications available at the selected endpoint,
so that it always uses the most up-to-date URI for a given
prefix/concept scheme pair. Because this step can be time-consuming, it can be
skipped entirely by passing a previously retrieved (and stored)
classification list, obtained with classificationList(),
through the knownSchemes argument.
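A minimal sketch of this pattern, assuming the classification list is cached on disk with saveRDS() so that later sessions can reuse it. cached_scheme_list() and the file name in the usage comment are hypothetical; only classificationList(), retrieveClassificationTable(), and knownSchemes come from the package as described above.

```r
# Hypothetical helper (not part of correspondenceTables): return the cached
# classification list stored at `path`, or call `fetch()` once and cache it.
cached_scheme_list <- function(path, fetch) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  schemes <- fetch()
  saveRDS(schemes, path)
  schemes
}

# Example usage (requires the correspondenceTables package and network access):
# library(correspondenceTables)
# cellar_schemes <- cached_scheme_list(
#   "classificationList_CELLAR.rds",
#   function() classificationList("CELLAR")
# )
# nace_de <- retrieveClassificationTable(
#   endpoint = "CELLAR", prefix = "nace2", conceptScheme = "nace2",
#   language = "de", level = "4", knownSchemes = cellar_schemes
# )
```

Deleting the cached file forces a fresh call to classificationList(), which is advisable from time to time because scheme URIs can change.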
The retrieveCorrespondenceTable() function retrieves a
correspondence (mapping) table between two statistical classifications
from a SPARQL endpoint. Its interface is similar to
retrieveClassificationTable(), with the main difference
that correspondence tables are identified using ID_table
(instead of conceptScheme). Correspondence tables are
usually provided at the most granular level of the classifications
involved.
Main arguments
endpoint: Character. The online service to query.
Case-insensitive. Supported values are those returned by the internal
endpoint registry (e.g., "CELLAR",
"FAO").
prefix: Character. Catalogue prefix where the
correspondence is published (e.g., “nace2”, “cpa21”, “cn2022”). Use
correspondenceTableList() to discover valid
values.
ID_table: Character. Identifier of the
correspondence, typically of the form “A_B” such as “NACE2_CPA21” or
“CN2022_NACE2”. Discover identifiers via
correspondenceTableList().
language: Character. Preferred label language as a
BCP47 code. Defaults to “en” (English). Examples: “fr”, “de”.
showQuery: Logical. If TRUE, returns a
list with the SPARQL query and the result data frame; otherwise
(default) returns just the data frame.
Before retrieving a correspondence table, users need to identify
which correspondences are available and how they are referenced at a
given SPARQL endpoint. The correspondenceTableList()
utility serves this purpose. It is analogous to
classificationList(), but lists correspondence tables
instead of classifications.
The following example illustrates how to list correspondence tables
available from the CELLAR and FAO
repositories. It is shown for documentation purposes and not executed
during vignette rendering to avoid reliance on live external SPARQL
endpoints.
corr_list <- correspondenceTableList("ALL")
names(corr_list)
# Correspondence tables available from CELLAR
knitr::kable(
head(corr_list$CELLAR, 10),
caption = "Available correspondence tables from the CELLAR endpoint (preview)"
)
# Correspondence tables available from FAO
knitr::kable(
head(corr_list$FAO, 10),
caption = "Available correspondence tables from the FAO endpoint (preview)"
)

When executed interactively, this call returns a list whose elements
correspond to the selected endpoints (e.g. CELLAR,
FAO). Each element is a data frame describing the available
correspondence tables, including their identifiers, associated prefixes,
and human-readable labels.
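To locate a specific correspondence in these data frames without relying on their exact column names, a row can be kept whenever any text column matches a pattern. filter_rows() is a hypothetical helper, not part of correspondenceTables; it only assumes that identifiers and labels are stored as character (or factor) columns.

```r
# Hypothetical helper: keep rows where any character/factor column matches
# `pattern` (case-insensitive). No particular column names are assumed.
filter_rows <- function(df, pattern) {
  text_cols <- vapply(df, function(x) is.character(x) || is.factor(x), logical(1))
  if (!any(text_cols)) return(df[0, , drop = FALSE])
  # One logical vector per text column, combined with element-wise OR
  hit <- Reduce(`|`, lapply(df[text_cols], grepl, pattern = pattern,
                            ignore.case = TRUE))
  df[hit, , drop = FALSE]
}

# Example usage (requires the correspondenceTables package and network access):
# corr_list <- correspondenceTableList("ALL")
# filter_rows(corr_list$CELLAR, "NACE2")
# filter_rows(corr_list$FAO, "ISIC")
```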
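Each correspondence table is identified by:
- endpoint: "CELLAR" or "FAO"
- prefix: catalogue prefix where the correspondence is published
- ID_table: identifier of the correspondence

The following examples illustrate how to retrieve correspondence
tables from the CELLAR endpoint.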
The following example illustrates the retrieval of a correspondence
table published by the Publications Office of the European Union via the
CELLAR endpoint. Users should note that the availability of
correspondence data depends on what is currently exposed by the
underlying SPARQL endpoint. Although a correspondence table may be
listed by correspondenceTableList(), it can legitimately
return an empty result when queried. For some CELLAR
correspondences (including several PRODCOM‑related mappings),
retrieveCorrespondenceTable() may therefore return a valid
but empty data frame, which does not indicate a failure of the retrieval
process.
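Because an empty data frame is a valid outcome rather than an error, scripts that depend on the mapping may want to test for it explicitly before continuing. check_correspondence() below is a hypothetical helper, not part of correspondenceTables, that reports an empty result and returns FALSE.

```r
# Hypothetical helper: return TRUE if a retrieved correspondence has rows,
# otherwise emit a message and return FALSE (an empty result is not an error).
check_correspondence <- function(res, id) {
  if (!is.data.frame(res)) {
    stop("Expected a data frame for correspondence '", id, "'")
  }
  if (nrow(res) == 0L) {
    message(sprintf(
      "Correspondence '%s' returned no rows; the endpoint may not expose data for it.",
      id))
    return(FALSE)
  }
  TRUE
}

# Example usage, after the retrieval shown below:
# if (check_correspondence(res, "PRODCOM2023_CPA21")) head(res)
```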
res <- retrieveCorrespondenceTable(
endpoint = "CELLAR",
prefix = "prodcom2023",
ID_table = "PRODCOM2023_CPA21",
language = "en",
showQuery = FALSE
)
knitr::kable(
head(res, 10),
caption = "PRODCOM2023_CPA21 CorrespondenceTable from the CELLAR endpoint "
)

The following example uses a correspondence that is more likely to return data when queried.
res2 <- retrieveCorrespondenceTable(
endpoint = "CELLAR",
prefix = "nace2",
ID_table = "NACE2_CPA21",
language = "en"
)
knitr::kable(
head(res2, 10),
caption = "NACE2_CPA21 CorrespondenceTable from the CELLAR endpoint "
)

For transparency and reproducibility, the SPARQL query used for
retrieval can also be inspected by setting
showQuery = TRUE.
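To keep the query alongside the results, the list returned with showQuery = TRUE can be written to a text file. The element names of that list are not documented in this vignette, so the sketch below does not assume them: it writes every character element (such as the SPARQL query) under a heading made from its name. save_query_components() and the output file name are hypothetical.

```r
# Hypothetical helper: write the character components of a showQuery = TRUE
# result (e.g. the SPARQL query) to a text file, one block per element.
save_query_components <- function(out, file) {
  if (!is.list(out) || is.data.frame(out)) {
    stop("Expected the list returned when showQuery = TRUE")
  }
  nms <- names(out)
  if (is.null(nms)) nms <- paste0("element_", seq_along(out))
  keep <- vapply(out, is.character, logical(1))
  # Each block: a "# name" heading, the text itself, and a blank line
  blocks <- Map(function(nm, x) c(paste0("# ", nm), x, ""), nms[keep], out[keep])
  writeLines(as.character(unlist(blocks, use.names = FALSE)), file)
  invisible(file)
}

# Example usage (requires the correspondenceTables package and network access):
# res_q <- retrieveCorrespondenceTable(endpoint = "CELLAR", prefix = "nace2",
#                                      ID_table = "NACE2_CPA21", language = "en",
#                                      showQuery = TRUE)
# save_query_components(res_q, "NACE2_CPA21_query.txt")
```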
The following examples illustrate how to retrieve correspondence
tables from the FAO endpoint.
The following example illustrates the retrieval of a correspondence
table published by the Food and Agriculture Organization of the United
Nations (FAO) via the FAO endpoint.
Users should note that the availability of correspondence data
depends on what is currently exposed by the underlying SPARQL endpoint.
Although a correspondence table may be listed by
correspondenceTableList(), it can legitimately return an
empty result when queried. In practice, however, correspondence tables
exposed by the FAO endpoint tend to be more consistently
populated than some of those available from CELLAR.
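The English-language version of the CPC 2.1 to ISIC Rev. 4 correspondence table can be retrieved in the same way with retrieveCorrespondenceTable(endpoint = "FAO", ...), using the prefix and ID_table reported for this correspondence by correspondenceTableList("FAO"). Calls to the FAO endpoint are not executed during vignette rendering.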
For transparency and reproducibility, the SPARQL query used for
retrieval can also be inspected by setting
showQuery = TRUE.
The correspondenceTables package simplifies access to
statistical classifications and correspondence tables published as
Linked Open Data (LOD), including those provided by major repositories
such as the EU Publications Office (CELLAR) and FAO.
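It offers a high-level R interface to discover, inspect, and retrieve these resources as standard R data frames, optionally exposing the underlying SPARQL queries.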
This approach lowers the technical barrier to working with official classification systems, enabling analysts to integrate them seamlessly into their workflows while preserving transparency and reproducibility.