Aggregate Correspondence Table

Overview

This vignette illustrates how to aggregate numeric values across classification systems using the aggregateCorrespondenceTable() function from the correspondenceTables package. The function aggregates numeric values expressed in a source classification (A) into a target classification (B), using a correspondence table that links A to B (denoted A → B). When correspondence weights are available, values are redistributed proportionally according to these weights. If no weights are provided, values are distributed equally across all corresponding target codes.
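
The allocation rule described above can be written compactly. The notation is introduced here for illustration only and is not taken from the package documentation: \(v_a\) is the value reported for source code \(a\), \(T(a)\) is the set of target codes linked to \(a\) in the correspondence table, \(w_{ab}\) is the weight of the link from \(a\) to \(b\), and \(V_b\) is the aggregated value for target code \(b\).

```latex
\[
  V_b = \sum_{a \,:\, b \in T(a)} w_{ab}\, v_a ,
  \qquad
  w_{ab} = \frac{1}{|T(a)|} \quad \text{when no weights are supplied.}
\]
```

Under this reading, whenever the weights of each source code sum to one, the total over all target codes equals the total over all source codes.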

This type of aggregation is commonly used to convert statistics between classification systems, for example:

NACE → CPA

CPA → CN

PRODCOM → CPA

CPC → HS

The aggregateCorrespondenceTable() function expects the following inputs:

AB: A data frame representing the correspondence table between classification A and classification B.
It must contain:
- a source code column (from_code)
- a target code column (to_code)
- optionally, a weight column
A: A data frame containing values expressed in the source classification A. It typically includes:
- a source classification code
- one or more numeric variables to be aggregated
By default, the function expects the source code column in A to be named code. If your data uses a different name, rename the column before calling the function, unless the function supports custom column names.
B (optional): A data frame defining the domain of the target classification B.
If provided, all B codes are preserved in the output, and target codes with no matching contributions receive a value of zero.
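
Before calling the function, it can help to confirm that the inputs carry the columns listed above. The following is a minimal base R sketch; check_columns() is a hypothetical helper written for this illustration and is not part of the correspondenceTables package, and the two data frames are toy inputs.

```r
# Hypothetical helper (not part of correspondenceTables): stops with a clear
# message if any required column is absent from a data frame.
check_columns <- function(df, required, label) {
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop(label, " is missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

# Toy inputs, for illustration only.
AB <- data.frame(from_code = "A1", to_code = "B1", stringsAsFactors = FALSE)
A  <- data.frame(code = "A1", value = 1, stringsAsFactors = FALSE)

check_columns(AB, c("from_code", "to_code"), "AB")
check_columns(A, "code", "A")
```

Both calls return silently here because the required columns are present.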

Application of `aggregateCorrespondenceTable()`

Example 1: Basic aggregation using a correspondence table

In this example, all inputs are read from sample datasets included in the package.

AB_path <- system.file("extdata/test", "ab_data.csv", package = "correspondenceTables")
A_path  <- system.file("extdata/test", "a_data.csv",  package = "correspondenceTables")
B_path  <- system.file("extdata/test", "b_data.csv",  package = "correspondenceTables")

stopifnot(nzchar(AB_path), nzchar(A_path), nzchar(B_path))

AB <- utils::read.csv(AB_path, stringsAsFactors = FALSE)
A  <- utils::read.csv(A_path,  stringsAsFactors = FALSE)
B  <- utils::read.csv(B_path,  stringsAsFactors = FALSE)

# Rename the correspondence table columns to the identifiers the function expects

names(AB)[names(AB) == "NACE.Rev..2.Code"]   <- "from_code"
names(AB)[names(AB) == "NACE.Rev..2.1.Code"] <- "to_code"


res <- aggregateCorrespondenceTable(AB = AB, A = A, B = B)


knitr::kable(
  head(res$result),
  caption = "Aggregation using a correspondence table",
  align = "c"
)

Aggregation using a correspondence table
code_B	Level	Superior	value
12	2	C	2.0
31	2	C	16.0
36	2	E	2.0
37	2	E	2.0
39	2	E	2.0
41	2	F	1.5

The function returns a list. The aggregated values are stored in the result element, which is a data frame structured according to the target classification B.

Interpretation of the output

In this example:

Dataset A contains numeric values expressed in the source classification.
The correspondence table AB specifies how each source code is linked to one or more target codes.
No weights are supplied in the correspondence table.

For each source code in A:

If it maps to a single target code, its full value is assigned to that target code.
If it maps to multiple target codes, its value is split equally among them.

All allocated contributions are then summed for each target code. The column containing numeric values in the output therefore represents the total value aggregated to each target classification code in B.
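
The equal-split rule described above can be sketched in base R. This is an illustration of the logic on hypothetical toy data, not the package's actual implementation; the names share, links and equal_split are introduced here.

```r
# Hypothetical toy data; not taken from the package's sample files.
AB <- data.frame(
  from_code = c("A1", "A1", "A2"),
  to_code   = c("B1", "B2", "B2"),
  stringsAsFactors = FALSE
)
A <- data.frame(
  code  = c("A1", "A2"),
  value = c(10, 4),
  stringsAsFactors = FALSE
)

# Equal share per link: 1 / (number of target codes linked to the source code).
AB$share <- 1 / ave(rep(1, nrow(AB)), AB$from_code, FUN = sum)

# Attach the source values to each link and compute the contribution per link.
links <- merge(AB, A, by.x = "from_code", by.y = "code")
links$contribution <- links$value * links$share

# Sum the contributions for each target code.
equal_split <- aggregate(contribution ~ to_code, data = links, FUN = sum)
equal_split
```

Here A1 maps to two target codes, so its value of 10 is split into 5 and 5, while A2 passes its full value of 4 to B2. The totals are therefore 5 for B1 and 9 for B2, which together equal the source total of 14.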

Notes

The aggregation performed by aggregateCorrespondenceTable() is additive: values are redistributed and summed, not averaged or otherwise summarized.
Supplying the B argument ensures that the output covers the full target classification domain; target codes with no matching contributions receive a value of zero.
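
The effect of supplying a target domain can be sketched in base R as a left join followed by zero-filling. The data below are hypothetical, and the sketch illustrates the behaviour described in the notes rather than the package's internal code.

```r
# Hypothetical aggregated contributions: only B1 and B2 received values.
agg <- data.frame(
  code  = c("B1", "B2"),
  value = c(5, 9),
  stringsAsFactors = FALSE
)

# Hypothetical target domain with an extra code, B3, that has no contributions.
B <- data.frame(code = c("B1", "B2", "B3"), stringsAsFactors = FALSE)

# A left join onto the domain keeps every B code; unmatched codes become NA.
full <- merge(B, agg, by = "code", all.x = TRUE)

# Replace the missing values with zero.
full$value[is.na(full$value)] <- 0
full
```

B3 appears in the result with a value of zero, and the overall total is unchanged because zero-filled codes add nothing to the sum.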

Example 2: Weighted correspondence (proportional allocation)

This example illustrates aggregation when the correspondence table includes explicit weights.

Here:

Source code A1 is linked to two target codes:
- 70% of its value goes to B1
- 30% goes to B2
Source code A2 is linked entirely to B2

The function multiplies each source value by the corresponding weight for each correspondence link and then sums all weighted contributions per target code.

# Correspondence table with weights  
AB <- data.frame(
  from_code = c("A1", "A1", "A2"),
  to_code   = c("B1", "B2", "B2"),
  weight    = c(0.7, 0.3, 1.0)
)

# Source classification with values  
A <- data.frame(
  code  = c("A1", "A2"), 
  value = c(100, 50)
)

# Target classification domain
B <- data.frame(
  code = c("B1", "B2")
)

res2 <- aggregateCorrespondenceTable(AB = AB, A = A, B = B)

knitr::kable(
  head(res2$result),
  caption = "Weighted correspondence (proportional allocation)",
  align = "c"
)

Weighted correspondence (proportional allocation)
code_B	value
B1	70
B2	80

Interpretation of the output

The values shown in the output represent the total weighted sums per target code.

For example:

Target code B1 receives 70% of the value associated with A1
Target code B2 receives:
- 30% of A1
- 100% of A2

All contributions are summed to produce the final totals.

Tiny numeric illustration

If A1 has a value of 100:

70 is allocated to B1
30 is allocated to B2

If A2 has a value of 50 and maps fully to B2, the final value for B2 is:

\(30 + 50 = 80\)
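
The same arithmetic can be checked by hand in base R, using the data from this example. This is a sketch of the multiply-then-sum logic and makes no claim about how aggregateCorrespondenceTable() is implemented internally; the names links, weighted and weight_totals are introduced here.

```r
# Same inputs as in Example 2.
AB <- data.frame(
  from_code = c("A1", "A1", "A2"),
  to_code   = c("B1", "B2", "B2"),
  weight    = c(0.7, 0.3, 1.0),
  stringsAsFactors = FALSE
)
A <- data.frame(
  code  = c("A1", "A2"),
  value = c(100, 50),
  stringsAsFactors = FALSE
)

# Optional sanity check: total weight per source code (expected to be 1 here).
weight_totals <- tapply(AB$weight, AB$from_code, sum)

# Attach source values to each link, then weight them.
links <- merge(AB, A, by.x = "from_code", by.y = "code")
links$contribution <- links$value * links$weight

# Sum the weighted contributions for each target code.
weighted <- aggregate(contribution ~ to_code, data = links, FUN = sum)
weighted
```

The result should match the totals reported in the table for this example: 70 for B1 and 80 for B2.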
