This vignette provides information about the
classificationQC() function included in the
correspondenceTables package, which is used to perform
quality control on classifications.
The main function classificationQC() performs structural
and logical quality control on hierarchical classifications. It returns
a list of data frames, including an enriched version of the
classification (QC_output) and additional tables flagging potential
issues such as orphan codes, duplicate labels, or sequencing
problems.
The quality‑control checks identify several types of potential structural or logical issues commonly observed in official classifications:
Missing hierarchy levels: codes for which no hierarchical level can be inferred from the provided structure (for example because their code length does not match any declared level). Such codes cannot be positioned reliably in the hierarchy.
Orphan codes: codes that do not have a valid parent code at the immediately higher hierarchical level. This indicates a break in the hierarchical structure and usually reflects missing or inconsistent higher‑level codes.
Childless codes: internal codes that do not have any descendants at the immediately lower hierarchical level. While expected at the lowest level, this may signal incomplete structures or unintended dead ends at higher levels.
Duplicate labels: identical labels occurring more than once within the same hierarchical level. This does not invalidate the hierarchy, but may reduce interpretability and cause ambiguity when classifications are used in tabular or statistical contexts.
Label‑hierarchy inconsistencies: situations in which the label of a child code does not reflect the label of its parent where such inheritance is expected (for example in single‑child situations), suggesting potential inconsistencies in naming conventions.
Sequencing anomalies: gaps or breaks in expected code sequences among sibling codes under the same parent. In structured coding systems where code values are meaningful (for example numeric ranges), such gaps may indicate missing or omitted codes.
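As an illustration of the duplicate-label check described above, the following base R sketch flags labels that occur more than once within the same level. The `toy` data frame and its column names are hypothetical and are not taken from the package; the sketch only mirrors the idea and is not the implementation used by classificationQC().

```r
# Toy classification (hypothetical data, not shipped with the package)
toy <- data.frame(
  code  = c("A", "B", "A1", "A2", "B1"),
  label = c("Crops", "Livestock", "Cereals", "Other", "Other"),
  level = c(1, 1, 2, 2, 2),
  stringsAsFactors = FALSE
)

# Combine level and label into one key, then flag every row whose key
# occurs more than once (duplicates are only counted within a level)
key <- paste(toy$level, toy$label, sep = "|")
toy$duplicateLabel <- as.integer(key %in% key[duplicated(key)])

# Rows that would be reported as duplicate labels
dup_rows <- toy[toy$duplicateLabel == 1, ]
print(dup_rows)
```

Here the label "Other" appears twice at level 2, so both rows carrying it are flagged, while identical labels at different levels would not be.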
Main arguments
The main arguments of the function are:
classification: A data frame containing the classification codes and labels.
lengths: A data frame with one row per
hierarchical level giving the initial and final positions of the segment
of the code referring to that level. The number of rows implicitly
defines the number of hierarchical levels (\(k\)). The column names should be
charb and chare; if they are not, the function renames them
automatically and issues a warning.
fullHierarchy: Logical. If FALSE,
the function checks that all positions at levels greater than 1 have a
parent at the level immediately above (no orphans). If
TRUE, it additionally checks that positions at levels
strictly lower than \(k\) have children
in the next level (no childless nodes). More specifically, the function
checks the completeness of the hierarchical structure by applying two
rules:
orphan: takes the value 1 for positions at hierarchical
level \(j > 1\) that lack a parent
at the immediately higher level (\(j - 1\)), and 0 otherwise.
childless: takes the value 1 for positions at
hierarchical level \(j < k\) that
lack a child at the immediately lower level (\(j + 1\)), and 0 otherwise.
labelUniqueness: Logical. When
TRUE, the function checks whether labels are unique at
each hierarchical level. Duplicates are listed in the
QC_duplicatesLabel table.
labelHierarchy: Logical. When TRUE,
the function checks that single children share the same label as their
parent and that a parent sharing a label with one of its children
has no other children. The possible values are:
singleChildCode: an optional data frame defining admissible coding rules for single‑child and multiple‑child situations, with columns level, singleCode, and multipleCode. If these headers are missing or incorrect, they are automatically corrected with a warning.
sequencing: an optional data frame defining admissible code‑range rules for multiple‑child situations, used to identify potential gaps in structured code sequences. The expected columns are level and multipleCode.
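The orphan and childless rules described under fullHierarchy can be sketched in base R on a toy classification. This is a minimal sketch under stated assumptions: the `toy` data are hypothetical, and the parent of a code is assumed to be the code with its last character removed, which need not match how classificationQC() derives parents from the lengths table.

```r
# Toy classification (hypothetical data)
toy <- data.frame(
  code  = c("A", "B", "A1", "A2", "C1"),
  level = c(1, 1, 2, 2, 2),
  stringsAsFactors = FALSE
)
k <- max(toy$level)  # number of hierarchical levels

# Assumption: the parent code is the code without its last character
parent <- substr(toy$code, 1, nchar(toy$code) - 1)

# orphan = 1 for level j > 1 when no code at level j - 1 equals the parent
has_parent <- mapply(
  function(p, j) p %in% toy$code[toy$level == j - 1],
  parent, toy$level, USE.NAMES = FALSE
)
toy$orphan <- as.integer(toy$level > 1 & !has_parent)

# childless = 1 for level j < k when no code at level j + 1 points to this code
has_child <- mapply(
  function(cd, j) cd %in% parent[toy$level == j + 1],
  toy$code, toy$level, USE.NAMES = FALSE
)
toy$childless <- as.integer(toy$level < k & !has_child)

print(toy)
```

In this toy example, C1 is an orphan because no level-1 code C exists, and B is childless because no level-2 code starts with B.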
It is important to note that not all detected issues necessarily indicate errors. The quality‑control checks are diagnostic signals intended to support expert review of classification quality and consistency, and they do not impose constraints on the hierarchical structure itself. In particular:
The validation procedures rely on a small set of auxiliary tables that define structural constraints, such as expected code lengths, single‑child rules, and sequencing between levels.
We load three auxiliary tables used for classification validation.
lengths argument
The lengths table specifies the character positions at
which each hierarchical level of a classification code starts and ends.
Specifically, column charb indicates the starting position
of the segment (character beginning), while column chare
indicates the ending position (character end).
For example, the following definition indicates that the first level occupies characters 1 to 2 of the code, the second level characters 3 to 4, and the third level characters 5 to 7:
lengths_example <- data.frame(
charb = c(1, 3, 5),
chare = c(2, 4, 7)
)
knitr::kable(
lengths_example,
caption = "Example of expected code lengths by hierarchical level",
align = "c"
)

| charb | chare |
|---|---|
| 1 | 2 |
| 3 | 4 |
| 5 | 7 |
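To make the role of charb and chare concrete, the sketch below splits a code into its level segments and infers a level from the code length. The example code string and the helper `infer_level()` are hypothetical, and inferring the level by matching the code length against chare is an assumption made for illustration, not a description of the package internals.

```r
lengths_example <- data.frame(
  charb = c(1, 3, 5),
  chare = c(2, 4, 7)
)

# Hypothetical 7-character code covering all three levels
code <- "0112345"

# One segment per level: characters charb..chare of the code
segments <- substring(code, lengths_example$charb, lengths_example$chare)
print(segments)

# Assumption: a code whose length equals chare[j] sits at level j;
# lengths that match no declared level yield NA
infer_level <- function(x) match(nchar(x), lengths_example$chare)
levels_found <- infer_level(c("01", "0112", "0112345", "011"))
print(levels_found)
```

The last code, "011", has three characters and matches no declared end position, so no level can be inferred for it.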
In some classifications, specific coding conventions are used to distinguish between situations where a parent code has a single child and situations where it has multiple children. These conventions do not restrict the hierarchical structure itself and do not limit the number of children per node.
Instead, they verify whether observed codes comply with predefined coding patterns when a single‑child or multiple‑child situation occurs.
The singleChildCode table defines these admissible
patterns and contains the columns level, singleCode, and multipleCode.
These checks do not modify the classification and do not enforce a specific hierarchical shape. They merely flag cases where observed coding does not match the declared conventions, which may indicate inconsistencies in code design.
singleChildCode <- read.csv(
system.file("extdata/test", "SingleChild.csv",
package = "correspondenceTables")
)
knitr::kable(
singleChildCode,
caption = "Single-child code rules",
align = "c"
)

| level | singleCode | multipleCode |
|---|---|---|
| 2 | 0 | 10 |
| 3 | 0 | 1 |
| 4 | 0 | 1 |
Sequencing checks are not intended to impose an ordering on hierarchical trees. In a pure tree structure, only parent‑child relationships matter.
However, in many official classifications, code values themselves convey implicit structure (for example numeric or alphanumeric sequences). In such systems, sibling codes are often expected to follow predefined ranges or patterns.
The purpose of sequencing checks is therefore diagnostic, not normative: they aim to detect gaps or breaks in otherwise structured code spaces, which may indicate missing, omitted, or inconsistently defined codes.
Sequencing rules are defined through a table with the columns level and multipleCode.
Sequencing anomalies do not invalidate the hierarchy, but they may point to classification maintenance issues or incomplete implementations of official coding schemes.
sequencing <- read.csv(
system.file("extdata/test", "Sequencing.csv",
package = "correspondenceTables")
)
knitr::kable(
sequencing,
caption = "Example of sequencing rules by hierarchical level",
align = "c"
)

| level | multipleCode |
|---|---|
| 2 | 1.020304e+196 |
| 3 | 1.234568e+08 |
| 4 | 1.234568e+08 |
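The multipleCode values above are printed in scientific notation, which suggests that read.csv() parsed the column as numeric. If the rules are meant to be strings of admissible characters (an assumption; the raw content of Sequencing.csv is not shown here), reading the column as character preserves them exactly. The sketch below uses hypothetical inline data rather than the package file.

```r
# Hypothetical CSV content standing in for a sequencing rules file
csv_text <- "level,multipleCode\n3,123456789\n4,0123456789"

# Default parsing: the digit strings become numbers and leading zeros are lost
as_numeric <- read.csv(text = csv_text)
print(class(as_numeric$multipleCode))

# Reading multipleCode as character keeps every digit, including leading zeros
as_text <- read.csv(
  text = csv_text,
  colClasses = c(level = "integer", multipleCode = "character")
)
print(as_text)
```

The same colClasses argument can be passed when reading a file path instead of inline text.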
The following example applies classificationQC() to the
NACE Rev.2 classification using additional parameters.
In this example, the user provides the classification and defines its
hierarchical structure through the lengths argument. The example
demonstrates how different parameters of
classificationQC() are used to perform structural and
logical quality checks.
classification <- read.csv(
system.file("extdata/test", "Nace2_long.csv", package = "correspondenceTables")
)
lengths <- data.frame(
charb = c(1, 2, 3, 5),
chare = c(1, 2, 4, 5)
)

We now apply the classificationQC() function using the
previously defined classification and hierarchy structure. The function
performs structural and logical quality checks on the NACE Rev.2
classification. For illustration purposes, the output is summarised by
reporting the number of detected issues for selected quality checks.
output <- classificationQC(
classification = classification,
lengths = lengths,
fullHierarchy = TRUE,
labelUniqueness = TRUE,
labelHierarchy = TRUE,
singleChildCode = NULL,
sequencing = NULL
)
qc_summary <- data.frame(
Check = c("No levels", "Orphan codes", "Childless codes"),
Number_of_issues = c(
nrow(output$QC_noLevels),
nrow(output$QC_orphan),
nrow(output$QC_childless)
)
)
knitr::kable(
qc_summary,
caption = "Summary of quality control checks",
align = "c"
)

| Check | Number_of_issues |
|---|---|
| No levels | 0 |
| Orphan codes | 88 |
| Childless codes | 21 |
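The same kind of summary can be built without naming each table by hand. The sketch below counts the rows of every element whose name starts with QC_, excluding QC_output. It runs on `mock_output`, a hypothetical stand-in for the list returned by classificationQC(); the element names follow those used in this vignette, and the contents are invented for illustration.

```r
# Hypothetical stand-in for the list returned by classificationQC()
mock_output <- list(
  QC_output    = data.frame(code = c("A", "A1", "C1")),
  QC_noLevels  = data.frame(code = character(0)),
  QC_orphan    = data.frame(code = "C1"),
  QC_childless = data.frame(code = character(0))
)

# Keep the issue tables only (everything named QC_* except the enriched output)
issue_tables <- setdiff(
  grep("^QC_", names(mock_output), value = TRUE),
  "QC_output"
)

# One row per check with the number of flagged codes
qc_counts <- data.frame(
  Check = issue_tables,
  Number_of_issues = unname(vapply(mock_output[issue_tables], nrow, integer(1)))
)
print(qc_counts)
```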
QC_noLevels
In this example, all classification codes have a properly defined
hierarchy level. As a result, the quality check QC_noLevels
does not produce any output.
QC_noLevels
QC_orphan
Orphan codes are codes that have no valid parent code at the immediately higher hierarchical level. This usually indicates breaks in the hierarchical structure.
QC_orphan
| | Code | Label | Level | Parent | Include | Include_Also | Exclude | level |
|---|---|---|---|---|---|---|---|---|
| 1 | 01 | Crop and animal production, hunting and related service activities | 2 | 0 | NA | This division also includes service activities incidental to agriculture, as well as hunting, trapping and related activities. | Agricultural activities exclude any subsequent processing of the agricultural products (classified under divisions 10 and 11 (Manufacture of food products and beverages) and division 12 (Manufacture of tobacco products)), beyond that needed to prepare them for the primary markets. The preparation of products for the primary markets is included here. The division excludes field construction (e.g. agricultural land terracing, drainage, preparing rice paddies etc.) classified in section F (Construction) and buyers and cooperative associations engaged in the marketing of farm products classified in section G. Also excluded is the landscape care and maintenance, which is classified in class 81.30. | 2 |
| 40 | 02 | Forestry and logging | 2 | 0 | NA | NA | Excluded is further processing of wood beginning with sawmilling and planing of wood, see division 16. | 2 |
| 49 | 03 | Fishing and aquaculture | 2 | 0 | NA | Also included are activities that are normally integrated in the process of production for own account (e.g. seeding oysters for pearl production). Service activities incidental to marine or freshwater fishery or aquaculture are included in the related fishing or aquaculture activities. | This division does not include building and repairing of ships and boats (30.1, 33.15) and sport or recreational fishing activities (93.19). Processing of fish, crustaceans or molluscs is excluded, whether at land-based plants or on factory ships (10.20). | 2 |
| 56 | 05 | Mining of coal and lignite | 2 | 0 | NA | NA | This division does not include coking (see 19.10), services incidental to coal or lignite mining (see 09.90) or the manufacture of briquettes (see 19.20). | 2 |
| 61 | 06 | Extraction of crude petroleum and natural gas | 2 | 0 | NA | NA | This division excludes: - oil and gas field services, performed on a fee or contract basis, see 09.10 - oil and gas well exploration, see 09.10 - test drilling and boring, see 09.10 - refining of petroleum products, see 19.20 - geophysical, geologic and seismic surveying, see 71.12 | 2 |
QC_childless
Childless codes are codes above the lowest level that have no descendants at the immediately lower hierarchical level. Codes at the lowest level of a classification are expected to have no children, but childless codes at higher levels may indicate structural issues.
QC_childless
| | Code | Label | Level | Parent | Include | Include_Also | Exclude | level |
|---|---|---|---|---|---|---|---|---|
| 976 | A | AGRICULTURE, FORESTRY AND FISHING | 1 | NA | NA | NA | NA | 1 |
| 977 | B | MINING AND QUARRYING | 1 | NA | NA | NA | This section excludes: - processing of the extracted materials, see section C (Manufacturing) - usage of the extracted materials without a further transformation for construction purposes, see section F (Construction) - bottling of natural spring and mineral waters at springs and wells, see 11.07 - crushing, grinding or otherwise treating certain earths, rocks and minerals not carried on in conjunction with mining and quarrying, see 23.9 | 1 |
| 978 | C | MANUFACTURING | 1 | NA | NA | NA | NA | 1 |
| 979 | D | ELECTRICITY, GAS, STEAM AND AIR CONDITIONING SUPPLY | 1 | NA | NA | Also included is the provision of steam and air-conditioning supply. | This section excludes the operation of water and sewerage utilities, see 36, 37. This section also excludes the (typically long-distance) transport of gas through pipelines. | 1 |
| 980 | E | WATER SUPPLY; SEWERAGE, WASTE MANAGEMENT AND REMEDIATION ACTIVITIES | 1 | NA | NA | Activities of water supply are also grouped in this section, since they are often carried out in connection with, or by units also engaged in, the treatment of sewage. | NA | 1 |
The following example illustrates the quality control of the NACE
Rev.2 classification from CELLAR using additional parameters, including
the singleChildCode argument.
singleChildCode <- read.csv(
system.file("extdata/test", "SingleChild.csv", package = "correspondenceTables")
)
knitr::kable(
singleChildCode,
caption = "singleChildCode argument",
align = "c"
)

| level | singleCode | multipleCode |
|---|---|---|
| 2 | 0 | 10 |
| 3 | 0 | 1 |
| 4 | 0 | 1 |
output2 <- classificationQC(
classification = classification,
lengths = lengths,
fullHierarchy = TRUE,
labelUniqueness = TRUE,
labelHierarchy = TRUE,
singleChildCode = singleChildCode,
sequencing = NULL
)

This table lists orphan codes, i.e. codes that do not have a valid parent at the immediately higher hierarchical level.
QC_orphan
| | Code | Label | Level | Parent | Include | Include_Also | Exclude | level |
|---|---|---|---|---|---|---|---|---|
| 1 | 01 | Crop and animal production, hunting and related service activities | 2 | 0 | NA | This division also includes service activities incidental to agriculture, as well as hunting, trapping and related activities. | Agricultural activities exclude any subsequent processing of the agricultural products (classified under divisions 10 and 11 (Manufacture of food products and beverages) and division 12 (Manufacture of tobacco products)), beyond that needed to prepare them for the primary markets. The preparation of products for the primary markets is included here. The division excludes field construction (e.g. agricultural land terracing, drainage, preparing rice paddies etc.) classified in section F (Construction) and buyers and cooperative associations engaged in the marketing of farm products classified in section G. Also excluded is the landscape care and maintenance, which is classified in class 81.30. | 2 |
| 40 | 02 | Forestry and logging | 2 | 0 | NA | NA | Excluded is further processing of wood beginning with sawmilling and planing of wood, see division 16. | 2 |
| 49 | 03 | Fishing and aquaculture | 2 | 0 | NA | Also included are activities that are normally integrated in the process of production for own account (e.g. seeding oysters for pearl production). Service activities incidental to marine or freshwater fishery or aquaculture are included in the related fishing or aquaculture activities. | This division does not include building and repairing of ships and boats (30.1, 33.15) and sport or recreational fishing activities (93.19). Processing of fish, crustaceans or molluscs is excluded, whether at land-based plants or on factory ships (10.20). | 2 |
| 56 | 05 | Mining of coal and lignite | 2 | 0 | NA | NA | This division does not include coking (see 19.10), services incidental to coal or lignite mining (see 09.90) or the manufacture of briquettes (see 19.20). | 2 |
| 61 | 06 | Extraction of crude petroleum and natural gas | 2 | 0 | NA | NA | This division excludes: - oil and gas field services, performed on a fee or contract basis, see 09.10 - oil and gas well exploration, see 09.10 - test drilling and boring, see 09.10 - refining of petroleum products, see 19.20 - geophysical, geologic and seismic surveying, see 71.12 | 2 |
This table lists childless codes, i.e. codes that have no descendants at the immediately lower hierarchical level.
QC_childless
| | Code | Label | Level | Parent | Include | Include_Also | Exclude | level |
|---|---|---|---|---|---|---|---|---|
| 976 | A | AGRICULTURE, FORESTRY AND FISHING | 1 | NA | NA | NA | NA | 1 |
| 977 | B | MINING AND QUARRYING | 1 | NA | NA | NA | This section excludes: - processing of the extracted materials, see section C (Manufacturing) - usage of the extracted materials without a further transformation for construction purposes, see section F (Construction) - bottling of natural spring and mineral waters at springs and wells, see 11.07 - crushing, grinding or otherwise treating certain earths, rocks and minerals not carried on in conjunction with mining and quarrying, see 23.9 | 1 |
| 978 | C | MANUFACTURING | 1 | NA | NA | NA | NA | 1 |
| 979 | D | ELECTRICITY, GAS, STEAM AND AIR CONDITIONING SUPPLY | 1 | NA | NA | Also included is the provision of steam and air-conditioning supply. | This section excludes the operation of water and sewerage utilities, see 36, 37. This section also excludes the (typically long-distance) transport of gas through pipelines. | 1 |
| 980 | E | WATER SUPPLY; SEWERAGE, WASTE MANAGEMENT AND REMEDIATION ACTIVITIES | 1 | NA | NA | Activities of water supply are also grouped in this section, since they are often carried out in connection with, or by units also engaged in, the treatment of sewage. | NA | 1 |
| 981 | F | CONSTRUCTION | 1 | NA | NA | This section also includes the development of building projects for buildings or civil engineering works by bringing together financial, technical and physical means to realise the construction projects for later sale. | If these activities are carried out not for later sale of the construction projects, but for their operation (e.g. rental of space in these buildings, manufacturing activities in these plants), the unit would not be classified here, but according to its operational activity, i.e. real estate, manufacturing etc. | 1 |
| 982 | G | WHOLESALE AND RETAIL TRADE; REPAIR OF MOTOR VEHICLES AND MOTORCYCLES | 1 | NA | NA | NA | NA | 1 |
| 983 | H | TRANSPORTATION AND STORAGE | 1 | NA | NA | NA | This section excludes: - major repair or alteration of transport equipment, except motor vehicles, see group 33.1 - construction, maintenance and repair of roads, railways, harbours, airfields, see division 42 - maintenance and repair of motor vehicles, see 45.20 - rental of transport equipment without driver or operator, see 77.1, 77.3 | 1 |
| 984 | I | ACCOMMODATION AND FOOD SERVICE ACTIVITIES | 1 | NA | NA | NA | This section excludes the provision of long-term accommodation as primary residences, which is classified in real estate activities (section L). Also excluded is the preparation of food or drinks that are either not fit for immediate consumption or that are sold through independent distribution channels, i.e. through wholesale or retail trade activities. The preparation of these foods is classified in manufacturing (section C). | 1 |
| 985 | J | INFORMATION AND COMMUNICATION | 1 | NA | NA | NA | NA | 1 |
In this final example, the sequencing parameter is used
to detect potential gaps in structured sequences of sibling codes within
the hierarchy.
Sequencing rules are applied at hierarchical levels 3 and 4, as
specified in the sequencing input table. At these levels,
the function identifies missing or inconsistent code values within
predefined numeric or alphanumeric ranges, which may indicate incomplete
or faulty classification structures.
singleChildCode <- read.csv(
system.file("extdata/test", "SingleChild2.csv", package = "correspondenceTables")
)
sequencing <- read.csv(
system.file("extdata/test", "Sequencing.csv",
package = "correspondenceTables")
)
output3 <- classificationQC(
classification = classification,
lengths = lengths,
fullHierarchy = TRUE,
labelUniqueness = TRUE,
labelHierarchy = TRUE,
singleChildCode = singleChildCode,
sequencing = sequencing
)

The QC_gapBefore table identifies gaps in expected
code sequences among sibling codes under the same parent.
QC_gapBefore
| | Code | Label | Level | Parent | Include | Include_Also | Exclude | level |
|---|---|---|---|---|---|---|---|---|
| 2 | 01.1 | Growing of non-perennial crops | 3 | 01 | NA | NA | NA | 3 |
| 9 | 01.19 | Growing of other non-perennial crops | 4 | 01.1 | NA | NA | This class excludes: - growing of non-perennial spices, aromatic, drug and pharmaceutical crops, see 01.28 | 4 |
| 10 | 01.2 | Growing of perennial crops | 3 | 01 | NA | NA | NA | 3 |
| 20 | 01.3 | Plant propagation | 3 | 01 | NA | NA | NA | 3 |
| 22 | 01.4 | Animal production | 3 | 01 | NA | NA | This group excludes: - farm animal boarding and care, see 01.62 - production of hides and skins from slaughterhouses, see 10.11 | 3 |
| 30 | 01.49 | Raising of other animals | 4 | 01.4 | NA | NA | This class excludes: - production of hides and skins originating from hunting and trapping, see 01.70 - operation of frog farms, crocodile farms, marine worm farms, see 03.21, 03.22 - operation of fish farms, see 03.21, 03.22 - boarding and training of pet animals, see 96.09 - raising and breeding of poultry, see 01.47 | 4 |
| 31 | 01.5 | Mixed farming | 3 | 01 | NA | NA | NA | 3 |
| 33 | 01.6 | Support activities to agriculture and post-harvest crop activities | 3 | 01 | NA | Also included are post-harvest crop activities, aimed at preparing agricultural products for the primary market. | NA | 3 |
| 38 | 01.7 | Hunting, trapping and related service activities | 3 | 01 | NA | NA | NA | 3 |
| 41 | 02.1 | Silviculture and other forestry activities | 3 | 02 | NA | NA | NA | 3 |
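The idea of a gap before a sibling can be sketched with the children of 01.1 listed in this vignette (01.11 to 01.16, then 01.19). The sketch assumes that the admissible final digits are 1 to 9 and that a code is flagged when its immediate predecessor in that sequence is absent; both are assumptions made for illustration and may differ from the rule applied by classificationQC().

```r
# Final digits of the children of 01.1 shown in this vignette
last_digit <- c(1, 2, 3, 4, 5, 6, 9)
codes <- paste0("01.1", last_digit)

# Assumption: admissible final digits for multiple children are 1..9
expected <- 1:9

# Position of each observed digit in the expected sequence
pos <- match(last_digit, expected)

# Flag a sibling when the value just before it in the expected sequence
# is not among the observed siblings (the first position is never flagged)
gap_before <- pos > 1 & !(expected[pmax(pos - 1, 1)] %in% last_digit)

print(codes[gap_before])
```

Under these assumptions only 01.19 is flagged, because 01.18 is missing, which agrees with the level-4 row for 01.19 in the table above.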
This table lists the last sibling codes within each group of children, used to assess sequence completeness.
QC_lastSibling
| | Code | Label | Level | Parent | Include | Include_Also | Exclude | level |
|---|---|---|---|---|---|---|---|---|
| 9 | 01.19 | Growing of other non-perennial crops | 4 | 01.1 | NA | NA | This class excludes: - growing of non-perennial spices, aromatic, drug and pharmaceutical crops, see 01.28 | 4 |
| 19 | 01.29 | Growing of other perennial crops | 4 | 01.2 | NA | NA | This class excludes: - growing of flowers, production of cut flower buds and growing of flower seeds, see 01.19 - gathering of tree sap or rubber-like gums in the wild, see 02.30 | 4 |
| 30 | 01.49 | Raising of other animals | 4 | 01.4 | NA | NA | This class excludes: - production of hides and skins originating from hunting and trapping, see 01.70 - operation of frog farms, crocodile farms, marine worm farms, see 03.21, 03.22 - operation of fish farms, see 03.21, 03.22 - boarding and training of pet animals, see 96.09 - raising and breeding of poultry, see 01.47 | 4 |
| 37 | 01.64 | Seed processing for propagation | 4 | 01.6 | NA | NA | This class excludes: - growing of seeds, see groups 01.1 and 01.2 - processing of seeds to obtain oil, see 10.41 - research to develop or modify new forms of seeds, see 72.11 | 4 |
| 38 | 01.7 | Hunting, trapping and related service activities | 3 | 01 | NA | NA | NA | 3 |
| 47 | 02.4 | Support services to forestry | 3 | 02 | NA | NA | NA | 3 |
| 52 | 03.12 | Freshwater fishing | 4 | 03.1 | NA | This class also includes: - gathering of freshwater materials | This class excludes: - processing of fish, crustaceans and molluscs, see 10.20 - fishing inspection, protection and patrol services, see 84.24 - fishing practiced for sport or recreation and related services, see 93.19 - operation of sport fishing preserves, see 93.19 | 4 |
| 53 | 03.2 | Aquaculture | 3 | 03 | NA | In addition, “aquaculture” also encompasses individual, corporate or state ownership of the individual organisms throughout the rearing or culture stage, up to and including harvesting. | NA | 3 |
| 55 | 03.22 | Freshwater aquaculture | 4 | 03.2 | NA | NA | This class excludes: - aquaculture activities in salt water filled tanks and reservoirs, see 03.21 - operation of sport fishing preserves, see 93.19 | 4 |
| 59 | 05.2 | Mining of lignite | 3 | 05 | NA | NA | NA | 3 |
This table contains the full classification enriched with all quality‑control flags produced by the checks.
QC_output
| nace2 | Label | Level | Parent | Include | Include_Also | Exclude | level |
|---|---|---|---|---|---|---|---|
| 01 | Crop and animal production, hunting and related service activities | 2 | 0 | NA | This division also includes service activities incidental to agriculture, as well as hunting, trapping and related activities. | Agricultural activities exclude any subsequent processing of the agricultural products (classified under divisions 10 and 11 (Manufacture of food products and beverages) and division 12 (Manufacture of tobacco products)), beyond that needed to prepare them for the primary markets. The preparation of products for the primary markets is included here. The division excludes field construction (e.g. agricultural land terracing, drainage, preparing rice paddies etc.) classified in section F (Construction) and buyers and cooperative associations engaged in the marketing of farm products classified in section G. Also excluded is the landscape care and maintenance, which is classified in class 81.30. | 2 |
| 01.1 | Growing of non-perennial crops | 3 | 01 | NA | NA | NA | 3 |
| 01.11 | Growing of cereals (except rice), leguminous crops and oil seeds | 4 | 01.1 | NA | NA | This class excludes: - growing of rice, see 01.12 - growing of sweet corn, see 01.13 - growing of maize for fodder, see 01.19 - growing of oleaginous fruits, see 01.26 | 4 |
| 01.12 | Growing of rice | 4 | 01.1 | NA | NA | NA | 4 |
| 01.13 | Growing of vegetables and melons, roots and tubers | 4 | 01.1 | NA | NA | This class excludes: - growing of chillies, peppers (capsicum sop.) and other spices and aromatic crops, see 01.28 - growing of mushroom spawn, see 01.30 | 4 |
| 01.14 | Growing of sugar cane | 4 | 01.1 | NA | NA | This class excludes: - growing of sugar beet, see 01.13 | 4 |
| 01.15 | Growing of tobacco | 4 | 01.1 | NA | NA | This class excludes: - manufacture of tobacco products, see 12.00 | 4 |
| 01.16 | Growing of fibre crops | 4 | 01.1 | NA | NA | NA | 4 |
| 01.19 | Growing of other non-perennial crops | 4 | 01.1 | NA | NA | This class excludes: - growing of non-perennial spices, aromatic, drug and pharmaceutical crops, see 01.28 | 4 |
| 01.2 | Growing of perennial crops | 3 | 01 | NA | NA | NA | 3 |