--- title: "Enhanced Workflow with Labelled Data" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Enhanced Workflow with Labelled Data} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} editor_options: markdown: wrap: sentence --- ```{r} #| label: setup #| include: false knitr::opts_chunk$set( collapse = TRUE, comment = "#>", eval = TRUE, error = FALSE, fig.path = "static" ) vcr::setup_knitr(prefix = "lab-") nettskjemar:::mock_if_no_auth() ``` ## Introduction to using Labelled data Labelled data is a type of dataset where variables or their values contain additional metadata. These datasets, common in tools like SPSS or Stata, allow for better understanding and documentation of variables. The `{haven}` package in R is specifically designed for working with such datasets. It enhances metadata handling by allowing you to embed variable and value labels directly into your data. In this vignette, we’ll explore how labelled data works, leveraging key `{haven}` package functionalities to extract, inspect, and manipulate labels. ## Getting labelled data from nettskjemar ```{r} #| label: data #| cassette: true library(nettskjemar) # Replace this with your form ID formid <- 123823 data <- ns_get_data(formid) data ``` Here we have our example data, displayed as a standard data.frame. There is nothing particularly special about it. To continue, we also need to download the form codebook, as we need this to add the labels. ```{r} #| label: cb #| cassette: true cb <- ns_get_codebook(formid) cb ``` To add the labels, we use the `ns_add_labels` function, using both the unlabelled data and the codebook. ```{r} #| label: labels lab_data <- data |> ns_add_labels(cb) lab_data ``` You will notice that this does on the surface look completely normal, with no added extras. Inspecting the data with the `str` function will however expose what lies beneath the surface. ```{r} #| label: labels-str str(data) str(lab_data) ``` You can see there are lots of label attributes attached to `lab_data` that are not there in the `data` object. These labels are attached from the codebook, and provide important context to what the data source actually is. These hidden features of the data are unlocked when working with functions from the [{haven}](https://haven.tidyverse.org/) package. Notice how the metadata (variable and value labels) are now embedded in the dataset. ## Exploring Labelled Data with `{haven}` Once your data has been labelled, the `{haven}` package provides functionalities to inspect and manipulate labels with ease. ### Inspecting Labels Use `var_label()` to extract variable-level labels and `val_labels()` to extract value-level labels: ```{r} #| label: load-labelled library(labelled) ``` ```{r} #| label: var-lab # Variable labels var_label(lab_data$freetext) ``` ```{r} #| label: var-lab-2 # Value labels for 'radio' val_labels(lab_data$radio) ``` ### Modifying Labels If you need to modify labels, we suggest you do this directly in the Nettskjema codebook setup. However, if you are working on a form that is no longer available in Nettskjema, and you have downloaded and saved both the data and the codebook (or the labelled data), labels can be modified using `{haven}`: ```{r} #| label: var-lab-mod lab_data$freetex # Update variable-level label for 'freetext' var_label(lab_data$freetext) <- "Important freetext comment" lab_data$radio # Update value labels for 'radio' val_labels(lab_data$radio) <- c(Unhappy = -1, Happy = 1) # Check updated labels var_label(lab_data$freetext) val_labels(lab_data$radio) ``` ## Benefits of Using Labelled Data Some key benefits include the following: 1. **Enhanced Documentation**: Embedding metadata directly in the dataset improves clarity. 2. **Consistency**: Reduces ambiguity when working across different teams or systems. 3. **Compatibility**: Facilitates interoperability with SPSS, Stata, and other statistical software. For more information, check out the [labelled package documentation](https://larmarange.github.io/labelled/).