Recoding & Relabelling

Eurostat offers so-called correspondence tables that track boundary changes, recoding and relabelling for all NUTS changes since the formalization of the NUTS typology. Unfortunately, these Excel tables do not conform to the requirements of tidy data, and their vocabulary is not standardized either. For example, recoding changes are variously labelled as recoding, recoding and renaming, code change, Code change, etc.

The data-raw directory contains these Excel tables and very long data wrangling code that unifies the relevant vocabulary of these Excel files and brings the tables into a single, tidy format, starting with the NUTS1999 definition. The resulting data file nuts_changes is included in the regions package. It already contains the changes that will come into force in 2021.
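To illustrate what unifying such a vocabulary can look like, here is a minimal sketch in base R. It is not the package's actual data-raw code: the lookup table `label_map` and the helper `unify_change_label()` are hypothetical names introduced only for this illustration, and the unified labels are merely modelled on the descriptions found in nuts_changes.

```r
# Raw labels as they might appear across different correspondence tables
# (the examples quoted above, one with stray whitespace added on purpose)
raw_labels <- c("recoding", "recoding and renaming", "code change", "Code change ")

# Hypothetical lookup table: cleaned raw label -> unified label
label_map <- c(
  "recoding"              = "recoded",
  "code change"           = "recoded",
  "recoding and renaming" = "recoded and relabelled"
)

# Hypothetical helper: lower-case and trim the label, then look it up
unify_change_label <- function(x) {
  key <- tolower(trimws(x))
  unified <- unname(label_map[key])
  # Fall back to the cleaned label when no mapping is known
  ifelse(is.na(unified), key, unified)
}

unified_labels <- unify_change_label(raw_labels)
print(unified_labels)
```

A lookup table keeps every decision about the vocabulary in one visible place, which is easier to audit than a chain of regular expressions.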

Let’s review a few changes.

# Packages used in the examples below
library(regions)
library(dplyr)
library(tidyr)

data(nuts_changes)

nuts_changes %>%
  mutate ( geo_16 = .data$code_2016, 
           geo_13 = .data$code_2013 ) %>%
  filter ( code_2016 %in% c("FRB", "HU11") | 
             code_2013 %in% c("FR7", "HU10", "FR24")) %>%
  select ( all_of(c("typology", "geo_16", "geo_13", "start_year",
           "code_2013", "change_2013",
           "code_2016", "change_2016")) 
           ) %>%
  pivot_longer ( cols = starts_with("code"), 
                 names_to = 'definition', 
                 values_to = 'code') %>%
  pivot_longer ( cols = starts_with("change"), 
                 names_to = 'change', 
                 values_to = 'description')  %>%
  filter (!is.na(.data$description), 
          !is.na(.data$code)) %>%
  select ( -all_of("change") ) %>%
  knitr::kable ()
|typology     |geo_16 |geo_13 |start_year |definition |code |description                                           |
|:------------|:------|:------|:----------|:----------|:----|:-----------------------------------------------------|
|nuts_level_1 |FRB    |NA     |2016       |code_2016  |FRB  |new nuts 1 region, identical to ex-nuts 2 region fr24 |
|nuts_level_1 |FRK    |FR7    |NA         |code_2013  |FR7  |relabelled and recoded                                |
|nuts_level_1 |FRK    |FR7    |NA         |code_2016  |FRK  |relabelled and recoded                                |
|nuts_level_1 |NA     |FR7    |NA         |code_2013  |FR7  |discontinued                                          |
|nuts_level_2 |FRB0   |FR24   |NA         |code_2013  |FR24 |recoded and relabelled                                |
|nuts_level_2 |FRB0   |FR24   |NA         |code_2016  |FRB0 |recoded and relabelled                                |
|nuts_level_2 |HU11   |NA     |2016       |code_2016  |HU11 |new region, equals ex-nuts 3 region hu101             |
|nuts_level_2 |NA     |HU10   |NA         |code_2013  |HU10 |discontinued; split into new hu11 and hu12            |

You will not find the geo identifier FRB in any statistical data released before France changed its administrative boundaries and the NUTS2016 boundary definition came into force. However, as the description says, you may find historical data elsewhere, in a historical NUTS2-level product, for the FRB CENTRE — VAL DE LOIRE NUTS1 region, because it is identical to the earlier NUTS2-level region FR24, i.e. Central France, which was known as Centre for many years before the transition to NUTS2016. The size and importance of this territorial unit are more similar to those of NUTS1 units than of NUTS2 units.

Because FRB contains only one NUTS2 region, FRB0 (the earlier FR24), the same territory is technically identified as a NUTS2-level region, too, and you will find the same data in the NUTS2 typology. With statistical products on NUTS2 level, you can simply recode historical FR24 data to FRB0, since neither the aggregation level nor the boundaries have changed. Furthermore, you can project this data to any NUTS1-level panel either under the earlier FR2 NUTS1 label, if you use the old definition, or under the new FRB label, if you use the current NUTS2016 typology.
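The simple recoding described above can be sketched in base R as follows. The data frame `historical_df`, its values, and the lookup vector `recode_map` are hypothetical and serve only as an illustration. The single mapping FR24 -> FRB0 is taken from the table above. HU10 is deliberately left out of the lookup because it was split rather than recoded.

```r
# Hypothetical NUTS2-level historical data that uses NUTS2013 codes
historical_df <- data.frame(
  geo    = c("FR24", "HU10"),
  year   = c(2012, 2012),
  values = c(10, 20),
  stringsAsFactors = FALSE
)

# One-to-one recodings only: old code -> new code
recode_map <- c("FR24" = "FRB0")

# Look up each old code; codes without a one-to-one successor become NA
recoded_df <- historical_df
match_idx <- match(recoded_df$geo, names(recode_map))
recoded_df$geo_2016 <- unname(recode_map[match_idx])

print(recoded_df)
```

Keeping the old code next to the new one makes it easy to spot rows, such as HU10 here, that need re-aggregation rather than a simple relabelling.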

Let’s see a hypothetical data frame with random values. (A real data frame usually does not have so many issues, but a more detailed example can be constructed this way.)

example_df <- data.frame ( 
  geo  =  c("FR", "DEE32", "UKI3" ,
            "HU12", "DED", 
            "FRK"), 
  values = runif(6, 0, 100 ),
  stringsAsFactors = FALSE )

recode_nuts(dat = example_df, 
            nuts_year = 2013) %>%
  select ( geo, values, code_2013) %>%
  knitr::kable()
|geo   |values   |code_2013 |
|:-----|:--------|:---------|
|FR    |84.66323 |FR        |
|UKI3  |12.99526 |UKI3      |
|DED   |30.99934 |DED       |
|FRK   |64.72867 |FR7       |
|HU12  |97.05639 |NA        |
|DEE32 |97.01076 |NA        |

In this hypothetical example we are creating backward compatibility with the NUTS2013 definition. There are three types of observations:

recode_nuts(example_df, nuts_year = 2013) %>%
  select ( all_of(c("geo", "values", "typology_change", "code_2013")) ) %>%
  knitr::kable()
|geo   |values   |typology_change                           |code_2013 |
|:-----|:--------|:-----------------------------------------|:---------|
|FR    |84.66323 |unchanged                                 |FR        |
|UKI3  |12.99526 |unchanged                                 |UKI3      |
|DED   |30.99934 |unchanged                                 |DED       |
|FRK   |64.72867 |Recoded from FRK [used in NUTS 2016-2021] |FR7       |
|HU12  |97.05639 |Used in NUTS 2016-2021                    |NA        |
|DEE32 |97.01076 |Used in NUTS 1999-2003                    |NA        |

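The first three observations are comparable with a NUTS2013 dataset as they are. The fourth observation is comparable, too, but before joining with a NUTS2013 dataset or map, FRK will most likely need to be recoded to FR7. The last two observations, HU12 and DEE32, have no code in the NUTS2013 definition (their code_2013 is NA), so they cannot be joined directly.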
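The three types of observations can also be flagged programmatically. The sketch below does not call recode_nuts(); instead it works on a hand-typed copy of the relevant columns of the output shown above, so that it runs without the package. The `status` column and its labels are hypothetical names chosen for this illustration.

```r
# Hand-typed copy of the recode_nuts() output columns shown above
recoded <- data.frame(
  geo = c("FR", "UKI3", "DED", "FRK", "HU12", "DEE32"),
  typology_change = c("unchanged", "unchanged", "unchanged",
                      "Recoded from FRK [used in NUTS 2016-2021]",
                      "Used in NUTS 2016-2021",
                      "Used in NUTS 1999-2003"),
  code_2013 = c("FR", "UKI3", "DED", "FR7", NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical status flag:
#  - no NUTS2013 code at all        -> cannot be joined directly
#  - code identical in both years   -> can be joined as is
#  - code differs                   -> must be recoded before joining
recoded$status <- ifelse(
  is.na(recoded$code_2013),
  "no NUTS2013 equivalent",
  ifelse(recoded$geo == recoded$code_2013, "unchanged", "recoded")
)

print(recoded[, c("geo", "code_2013", "status")])
```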

The following data can be joined with a NUTS2013 dataset or map:

recode_nuts(example_df, nuts_year = 2013) %>%
  select ( all_of(c("code_2013", "values", "typology_change")) ) %>%
  rename ( geo = code_2013 ) %>% 
  filter ( !is.na(.data$geo) ) %>%
  knitr::kable()
|geo  |values   |typology_change                           |
|:----|:--------|:-----------------------------------------|
|FR   |84.66323 |unchanged                                 |
|UKI3 |12.99526 |unchanged                                 |
|DED  |30.99934 |unchanged                                 |
|FR7  |64.72867 |Recoded from FRK [used in NUTS 2016-2021] |

And reassuringly, these data will be compatible with the next NUTS typology, too!

recode_nuts(example_df, nuts_year = 2021) %>%
  select ( all_of(c("code_2021", "values", "typology_change")) ) %>%
  rename ( geo = code_2021 ) %>% 
  filter ( !is.na(.data$geo) ) %>%
  knitr::kable()
|geo  |values   |typology_change |
|:----|:--------|:---------------|
|FR   |84.66323 |unchanged       |
|UKI3 |12.99526 |unchanged       |
|HU12 |97.05639 |unchanged       |
|DED  |30.99934 |unchanged       |
|FRK  |64.72867 |unchanged       |

What about HU12?

data(nuts_changes) 
nuts_changes %>% 
  select( all_of(c("code_2016", "geo_name_2016", "change_2016")) ) %>%
  filter( code_2016 == "HU12") %>%
  filter( complete.cases(.) ) %>%
  knitr::kable()
|code_2016 |geo_name_2016 |change_2016                                |
|:---------|:-------------|:------------------------------------------|
|HU12      |Pest          |new region, equals ex-nuts 3 region hu102  |

The description in the correspondence tables clarifies that historical data may in fact be assembled for HU12 (Pest county).

That will be the topic of a later vignette on aggregation and re-aggregation.