Automatic Variable Labeling

Creating a Data Dictionary

A dictionary is a data frame with two columns: variable (the exact variable names in your data) and description (the labels you want displayed). The dictionary's own column names are case-insensitive, so Variable and Description work too.

dictionary <- tibble::tribble(
  ~variable,    ~description,
  "trt",        "Chemotherapy Treatment",
  "age",        "Age at Enrollment (years)",
  "marker",     "Marker Level (ng/mL)",
  "stage",      "T Stage",
  "grade",      "Tumor Grade",
  "response",   "Tumor Response",
  "death",      "Patient Died"
)

dictionary
#> # A tibble: 7 × 2
#>   variable description              
#>   <chr>    <chr>                    
#> 1 trt      Chemotherapy Treatment   
#> 2 age      Age at Enrollment (years)
#> 3 marker   Marker Level (ng/mL)     
#> 4 stage    T Stage                  
#> 5 grade    Tumor Grade              
#> 6 response Tumor Response           
#> 7 death    Patient Died

In practice, you could load this from a CSV or define it once at the top of your analysis script.
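A minimal sketch of the CSV route, using base R only. The temporary file exists just to keep the example self-contained. In a real project you would skip the write step and point read.csv() at your own dictionary file. dictionary_csv is an illustrative name, not something sumExtras expects.

```r
# Write a small dictionary to a temporary CSV file, then read it back.
# In a real project, only the read.csv() step is needed.
dict_path <- tempfile(fileext = ".csv")

write.csv(
  data.frame(
    variable    = c("age", "marker"),
    description = c("Age at Enrollment (years)", "Marker Level (ng/mL)")
  ),
  dict_path,
  row.names = FALSE
)

# stringsAsFactors = FALSE keeps both columns as character on R < 4.0
dictionary_csv <- read.csv(dict_path, stringsAsFactors = FALSE)
str(dictionary_csv)
```

A dictionary loaded this way can be passed explicitly with add_auto_labels(dictionary = dictionary_csv).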

Labeling gtsummary Tables

Pass the Dictionary Explicitly

trial |>
  tbl_summary(by = trt, include = c(age, grade, marker)) |>
  extras() |> 
  add_auto_labels(dictionary = dictionary)

	Overall N = 200¹	Drug A N = 98¹	Drug B N = 102¹	p-value²
Age	47 (38, 57)	46 (37, 60)	48 (39, 56)	0.718
Unknown	11	7	4
Grade				0.871
I	68 (34%)	35 (36%)	33 (32%)
II	68 (34%)	32 (33%)	36 (35%)
III	64 (32%)	31 (32%)	33 (32%)
Marker Level (ng/mL)	0.64 (0.22, 1.41)	0.84 (0.23, 1.60)	0.52 (0.18, 1.21)	0.085
Unknown	10	6	4
¹ Median (Q1, Q3); n (%)
² Wilcoxon rank sum test; Pearson’s Chi-squared test
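Because the dictionary's column names are case-insensitive, a dictionary with capitalized headers (as spreadsheets often produce) should work as-is. The sketch below first removes age's built-in label attribute, since gtsummary's trial data ships with one and attribute labels win by default, so the dictionary becomes the only label source for age. trial_plain and dictionary_caps are illustrative names.

```r
library(gtsummary)
library(sumExtras)

# Copy of trial with age's built-in "label" attribute removed
trial_plain <- trial
attr(trial_plain$age, "label") <- NULL

# Capitalized column names; sumExtras treats dictionary column
# names case-insensitively
dictionary_caps <- data.frame(
  Variable    = "age",
  Description = "Age at Enrollment (years)"
)

tbl_caps <- trial_plain |>
  tbl_summary(by = trt, include = age) |>
  add_auto_labels(dictionary = dictionary_caps)

tbl_caps
```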

Automatic Discovery

If an object named dictionary exists in your environment, add_auto_labels() finds and uses it, so you don't have to pass it:

# dictionary already exists from above
trial |>
  tbl_summary(by = trt, include = c(age, stage, response)) |>
  extras() |> 
  add_auto_labels()

	Overall N = 200¹	Drug A N = 98¹	Drug B N = 102¹	p-value²
Age	47 (38, 57)	46 (37, 60)	48 (39, 56)	0.718
Unknown	11	7	4
T Stage				0.866
T1	53 (27%)	28 (29%)	25 (25%)
T2	54 (27%)	25 (26%)	29 (28%)
T3	43 (22%)	22 (22%)	21 (21%)
T4	50 (25%)	23 (23%)	27 (26%)
Tumor Response	61 (32%)	28 (29%)	33 (34%)	0.530
Unknown	7	3	4
¹ Median (Q1, Q3); n (%)
² Wilcoxon rank sum test; Pearson’s Chi-squared test
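How sumExtras performs this lookup internally isn't covered here, so the following is only a conceptual base R sketch of finding an object by name. find_dictionary() is a hypothetical helper, and both the name it searches for and the shape check it applies are assumptions.

```r
# Hypothetical helper: look for an object named `name`, starting in `env`
# and walking up through its enclosing environments.
find_dictionary <- function(env = parent.frame(), name = "dictionary") {
  if (exists(name, envir = env, inherits = TRUE)) {
    candidate <- get(name, envir = env, inherits = TRUE)
    # Accept it only if it looks like a dictionary: a data frame with
    # variable and description columns (compared case-insensitively)
    if (is.data.frame(candidate) &&
        all(c("variable", "description") %in% tolower(names(candidate)))) {
      return(candidate)
    }
  }
  NULL  # nothing suitable found; other label sources would be used instead
}

# Usage: the dictionary defined inside demo() is found from the caller's frame
demo <- function() {
  dictionary <- data.frame(variable = "age", description = "Age (years)")
  find_dictionary()
}
demo()
```

Defaulting env to parent.frame() means the search starts wherever the helper is called from, which is what lets a dictionary defined in a script or function be picked up without being passed.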

Pre-Labeled Data

If your data already has label attributes (e.g., from haven::read_sas() or manual assignment), add_auto_labels() reads those directly:

labeled_trial <- trial
attr(labeled_trial$age, "label") <- "Patient Age at Baseline"
attr(labeled_trial$marker, "label") <- "Biomarker Concentration (ng/mL)"

labeled_trial |>
  tbl_summary(by = trt, include = c(age, marker)) |>
  extras() |> 
  add_auto_labels()

	Overall N = 200¹	Drug A N = 98¹	Drug B N = 102¹	p-value²
Patient Age at Baseline	47 (38, 57)	46 (37, 60)	48 (39, 56)	0.718
Unknown	11	7	4
Biomarker Concentration (ng/mL)	0.64 (0.22, 1.41)	0.84 (0.23, 1.60)	0.52 (0.18, 1.21)	0.085
Unknown	10	6	4
¹ Median (Q1, Q3)
² Wilcoxon rank sum test
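Before relying on attribute labels, it can help to see which columns actually carry one. A base R sketch; get_labels() is a hypothetical helper, not part of sumExtras, and df is a toy data frame.

```r
# Hypothetical helper: return each column's "label" attribute,
# or NA when a column has none
get_labels <- function(data) {
  vapply(
    data,
    function(col) {
      lbl <- attr(col, "label", exact = TRUE)
      if (is.null(lbl)) NA_character_ else as.character(lbl)
    },
    character(1)
  )
}

df <- data.frame(age = c(40, 52), marker = c(0.6, 1.2))
attr(df$age, "label") <- "Patient Age at Baseline"

labs <- get_labels(df)
labs
```

Columns that come back as NA have no attribute label, so a dictionary entry (or a manual label) is the only way to label them.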

Manual Overrides Always Win

Labels set via label = list(...) in tbl_summary() always take priority over dictionary or attribute labels:

trial |>
  tbl_summary(
    by = trt,
    include = c(age, grade, marker),
    label = list(age ~ "Age (from tbl_summary function)")
  ) |>
  extras() |> 
  add_auto_labels(dictionary = dictionary)

	Overall N = 200¹	Drug A N = 98¹	Drug B N = 102¹	p-value²
Age (from tbl_summary function)	47 (38, 57)	46 (37, 60)	48 (39, 56)	0.718
Unknown	11	7	4
Grade				0.871
I	68 (34%)	35 (36%)	33 (32%)
II	68 (34%)	32 (33%)	36 (35%)
III	64 (32%)	31 (32%)	33 (32%)
Marker Level (ng/mL)	0.64 (0.22, 1.41)	0.84 (0.23, 1.60)	0.52 (0.18, 1.21)	0.085
Unknown	10	6	4
¹ Median (Q1, Q3); n (%)
² Wilcoxon rank sum test; Pearson’s Chi-squared test

Regression Tables

add_auto_labels() works the same way with tbl_regression(). Here the dictionary is again found automatically:

lm(marker ~ age + grade + stage, data = trial) |>
  tbl_regression() |>
  add_auto_labels()

Characteristic	Beta	95% CI	p-value
Age at Enrollment (years)	0.00	-0.01, 0.01	>0.9
Tumor Grade
I	—	—
II	-0.35	-0.67, -0.04	0.027
III	-0.12	-0.43, 0.19	0.4
T Stage
T1	—	—
T2	0.33	-0.01, 0.67	0.057
T3	0.21	-0.17, 0.58	0.3
T4	0.14	-0.22, 0.50	0.4
Abbreviation: CI = Confidence Interval
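The same call should apply to other models that tbl_regression() supports. A sketch with a logistic regression, assuming gtsummary and sumExtras are installed; exponentiate = TRUE is a standard tbl_regression() argument that reports odds ratios, and dict_model is an illustrative name for a small inline dictionary.

```r
library(gtsummary)
library(sumExtras)

# Small inline dictionary so the example is self-contained
dict_model <- data.frame(
  variable    = c("age", "stage"),
  description = c("Age at Enrollment (years)", "T Stage")
)

# Logistic regression of tumor response on age and T stage;
# glm() drops rows with missing values by default
tbl_logit <- glm(response ~ age + stage, data = trial, family = binomial) |>
  tbl_regression(exponentiate = TRUE) |>
  add_auto_labels(dictionary = dict_model)

tbl_logit
```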

Label Priority

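When a variable has both a dictionary label and an attribute label, the attribute label takes priority by default. That is why the earlier tbl_summary() examples show "Age" and "Grade" rather than the dictionary labels: gtsummary's trial dataset already carries those label attributes. The full order of precedence is: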

1. Manual labels (from label = list(...) in tbl_summary()) always win.
2. Attribute labels (from attr(data$var, "label")) take priority over dictionary labels.
3. Dictionary labels are used as a fallback.
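The precedence rules can be summarized as a small function. This is a conceptual sketch of the rules as described in this article, not sumExtras's actual implementation: resolve_label() and its arguments are hypothetical, and prefer_dictionary mirrors the documented effect of the sumExtras.prefer_dictionary option.

```r
# Hypothetical sketch of the label precedence rules.
# Each label argument is a single string, or NULL when that source has none.
resolve_label <- function(variable, manual = NULL, attribute = NULL,
                          dictionary = NULL, prefer_dictionary = FALSE) {
  # Manual labels always come first; the option only swaps tiers 2 and 3
  candidates <- if (prefer_dictionary) {
    list(manual, dictionary, attribute)
  } else {
    list(manual, attribute, dictionary)
  }
  for (lbl in candidates) {
    if (!is.null(lbl)) return(lbl)
  }
  variable  # no label anywhere: fall back to the variable name
}

resolve_label("age", attribute = "Age from Attribute",
              dictionary = "Age from Dictionary")
#> [1] "Age from Attribute"
```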

We recommend setting options(sumExtras.prefer_dictionary = TRUE) so dictionary labels take priority over attribute labels. This is especially useful when your imported data has generic attribute labels but your dictionary has the labels you actually want in publication tables. See vignette("options") for details.

trial_both <- trial
attr(trial_both$age, "label") <- "Age from Attribute"

dictionary_conflict <- tibble::tribble(
  ~variable, ~description,
  "age", "Age from Dictionary"
)

# Attribute wins over dictionary
trial_both |>
  tbl_summary(by = trt, include = age) |>
  add_auto_labels(dictionary = dictionary_conflict) |>
  extras()

	Overall N = 200¹	Drug A N = 98¹	Drug B N = 102¹	p-value²
Age from Attribute	47 (38, 57)	46 (37, 60)	48 (39, 56)	0.718
Unknown	11	7	4
¹ Median (Q1, Q3)
² Wilcoxon rank sum test
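To see the recommended option in action, the sketch below rebuilds the same conflict and turns on sumExtras.prefer_dictionary only while the table is built. Per the option's description, the row should then read "Age from Dictionary". Saving the value returned by options() and passing it back restores the previous setting; this assumes the option is read when add_auto_labels() runs.

```r
library(gtsummary)
library(sumExtras)

trial_both <- trial
attr(trial_both$age, "label") <- "Age from Attribute"

dictionary_conflict <- data.frame(
  variable    = "age",
  description = "Age from Dictionary"
)

# Turn the option on, keeping the previous value so it can be restored
old <- options(sumExtras.prefer_dictionary = TRUE)

tbl_pref <- trial_both |>
  tbl_summary(by = trt, include = age) |>
  add_auto_labels(dictionary = dictionary_conflict) |>
  extras()

# Put the option back the way it was
options(old)

tbl_pref
```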

Automatic Labeling via Options

If you always keep a dictionary in your environment, you can skip calling add_auto_labels() entirely. Set this once per session (or put it in your .Rprofile):

options(sumExtras.auto_labels = TRUE)

Now every extras() call picks up the dictionary automatically:

dictionary <- tibble::tribble(
  ~variable,    ~description,
  "age",        "Age at Enrollment (years)",
  "marker",     "Marker Level (ng/mL)",
  "grade",      "Tumor Grade"
)

# No add_auto_labels() needed
trial |>
  tbl_summary(by = trt) |>
  extras()

If no dictionary is found and the data has no label attributes, extras() continues normally. If automatic labeling fails for any other reason, extras() issues a warning instead of stopping. You can still call add_auto_labels() explicitly whenever you need per-table control.
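Because options() returns the previous values, you can also switch automatic labeling on for a single piece of code and then put things back. A base R sketch; with_auto_labels() is a hypothetical wrapper, not a sumExtras function.

```r
# Hypothetical wrapper: evaluate `code` with sumExtras.auto_labels = TRUE,
# restoring the previous setting afterwards (even if `code` errors)
with_auto_labels <- function(code) {
  old <- options(sumExtras.auto_labels = TRUE)
  on.exit(options(old), add = TRUE)
  code  # lazy evaluation: `code` is evaluated here, while the option is set
}

with_auto_labels(getOption("sumExtras.auto_labels"))
#> [1] TRUE
```

In practice the argument would be a pipeline such as trial |> tbl_summary(by = trt) |> extras(). If you already use withr, withr::with_options() provides the same pattern.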

See vignette("options") for more on .Rprofile setup.
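For reference, a minimal .Rprofile fragment that sets both options mentioned in this article at startup. Whether you want prefer_dictionary on depends on how reliable the attribute labels in your imported data are.

```r
# ~/.Rprofile (or a project-level .Rprofile)
options(
  sumExtras.auto_labels       = TRUE,  # extras() labels tables automatically
  sumExtras.prefer_dictionary = TRUE   # dictionary labels beat attribute labels
)
```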

More Vignettes

vignette("sumExtras-intro") – getting started with extras()
vignette("styling") – group headers and advanced formatting
vignette("themes") – JAMA compact themes for {gtsummary} and {gt}