---
title: "Batch Importing ASC Files"
author: "Austin Hurst"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Batch Importing ASC Files}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r results="hide", message=FALSE}
# Import libraries required for the vignette
require(eyelinker)
require(dplyr)
require(tibble)
require(purrr)
```

Eye tracking studies almost always involve data from more than one participant, so your analysis scripts will usually need to batch import and merge a whole list of `.asc` files. There are a few different ways to do this. Which method you use will depend on what kind of information you want to extract from the files as well as the file sizes of the recordings.

First, you'll need to get a vector with paths to the files you want to import. For actual projects you can do this with R's built-in `list.files` function, but for the sake of this vignette we'll load some file paths from the package example data:

```{r results="hide", message=FALSE}
# Get full paths for all compressed .asc files in _Data/asc folder
# (note that `pattern` is a regular expression, not a shell glob)
ascs <- list.files(
  "./_Data/asc", pattern = "\\.asc\\.gz$",
  full.names = TRUE, recursive = TRUE
)

# For this vignette, use the paths of the package example files instead
ascs <- c(
  system.file("extdata/mono250.asc.gz", package = "eyelinker"),
  system.file("extdata/mono500.asc.gz", package = "eyelinker"),
  system.file("extdata/mono1000.asc.gz", package = "eyelinker")
)
```

## Single Event Type

If you're only interested in importing a single event type (and that event type isn't raw samples), batch importing data can be done easily using `map_df` from the `purrr` package:

```{r}
# Batch import and merge saccade data for all files
sacc_dat <- map_df(ascs, function(f) {
  # Extract saccade data frame from file
  df <- read_asc(f, samples = FALSE)$sacc
  # Extract ID from file name and append to data as first column
  id <-
    gsub(".asc.gz", "", basename(f), fixed = TRUE)
  df <- add_column(df, asc_id = id, .before = 1)
  # Return data frame
  df
})

# Batch import file metadata
asc_info <- map_df(ascs, function(f) {
  # Extract metadata data frame from file
  df <- read_asc(f, samples = FALSE)$info
  # Extract ID from file name and append to data as first column
  id <- gsub(".asc.gz", "", basename(f), fixed = TRUE)
  df <- add_column(df, asc_id = id, .before = 1)
  # Return data frame
  df
})
```

Now let's take a look at the saccade data we batch-imported. As you can see, the saccades from all three data files have been merged into a single data frame, with the first column identifying the source file:

```{r}
sacc_dat
```

The batch-imported metadata is structured the same way, with a single row for each participant. Reading in metadata this way makes it easy to identify any differences in eye tracker settings across participants (e.g. sample rate, eye tracked):

```{r}
asc_info %>%
  select(c(asc_id, model, sample.rate, left, right, cr, screen.x, screen.y))
```

All the `map_df` function does is take a list of inputs (in this case, our vector of `.asc` file paths), run the same wrangling code on each input separately, and then stack the outputs into a single data frame. This will work as long as the data frames returned in the wrangling stage all have identical column names and column types. Note that you need to extract the file ID or participant ID and append it to the data in this stage, otherwise you won't be able to tell which rows belong to which file!

### Raw Samples

If you're interested in batch-importing raw samples from multiple files, you can use a similar approach but will need to keep RAM usage in mind. Remember that a single `.asc` file can contain millions of samples (especially at high sample rates), so anything you can do to cut down the amount of data from each file will help speed things up!
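
As a rough illustration of cutting the data down, the sketch below keeps only every nth row of a samples table. It runs on made-up data so that it stands alone; `thin_samples` and `every_n` are hypothetical names, and the `time` and `ps` columns are only assumed to resemble what `read_asc` returns for raw samples. This is plain decimation with no filtering, so it may not be appropriate for every analysis:

```{r}
# Keep every nth row of a samples table (plain decimation, no filtering).
# `thin_samples` and `every_n` are hypothetical names used for illustration.
thin_samples <- function(samples, every_n = 4) {
  # Rows 1, 1 + every_n, 1 + 2 * every_n, ... are kept
  keep <- (seq_len(nrow(samples)) - 1) %% every_n == 0
  samples[keep, , drop = FALSE]
}

# Toy stand-in for raw samples: 1000 timestamps with a fake pupil size
toy <- data.frame(time = 1:1000, ps = rnorm(1000, mean = 5000, sd = 50))
thinned <- thin_samples(toy, every_n = 4)
nrow(thinned)
```

In a real import, a helper like this could be applied to the samples table from each file (assumed here to be the `raw` element of the `read_asc` output) before the ID column is added and the results are stacked.
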

A good approach for batch-importing raw sample data is to write a function that performs your desired preprocessing steps on the output from `read_asc` and then call that preprocessing function in `map_df`. For example, for a pupillometry study this function might window the pupil data for each trial to the region of interest using message timestamps (`asc$msg`), identify and interpolate blinks using the blink events identified by the tracker (`asc$blinks`), and then filter and downsample the pupil data before returning the data frame.

## Multiple Event Types

For some use cases, the above approach will work perfectly fine. However, if your project involves analyzing *multiple* eye data types, it can be needlessly slow to parse each `.asc` file multiple times to extract all the data you need. As an alternative, you can use R's built-in `lapply` function to import all data into a list and then process the contents of that list separately:

```{r}
# Batch import full eye data (excluding raw samples) for all files
eyedat <- lapply(ascs, function(f) {
  # Since importing can be slow, print out progress message for each file
  cat(paste0("Importing ", basename(f), "...\n"))
  # Actually import the data
  read_asc(f, samples = FALSE)
})

# Extract names of files (excluding suffix) and use them as participant IDs
asc_ids <- gsub(".asc.gz", "", basename(ascs), fixed = TRUE)
names(eyedat) <- asc_ids

# Parse fixation data from list
fix_dat <- map_df(asc_ids, function(id) {
  # Grab fixation data from each file in the list & append ID
  eyedat[[id]]$fix %>%
    add_column(asc_id = id, .before = 1)
})

# Parse saccade data from list
sacc_dat <- map_df(asc_ids, function(id) {
  # Grab saccade data from each file in the list & append ID
  eyedat[[id]]$sacc %>%
    add_column(asc_id = id, .before = 1)
})
```

## Caching Imported Data

Because importing a full dataset of high-resolution eye tracking recordings can be quite slow, it's often useful to cache your eye data after importing so you don't have to wait for it all to
import again next time you run the script. To do this, you can save your eye data into an `.Rds` file that can be quickly loaded back in:

```{r}
cache_path <- "./eyedata_cache.Rds"

if (file.exists(cache_path)) {
  # If cached eye data already exists, load that to save time
  eyedat <- readRDS(cache_path)
} else {
  # Otherwise, import all raw .asc files and cache them
  # [Insert import code that generates eyedat here]
  # Save the imported data for next run
  saveRDS(eyedat, file = cache_path)
}
```

Note that if you make any changes to your import code, you will need to manually delete the cache file and re-run your import script for any changes to take effect!
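
To make that refresh step harder to forget, one option is to wrap the caching logic in a small helper with an explicit switch for forcing a re-import. The following is a minimal sketch rather than anything provided by `eyelinker`: `load_or_import`, `import_fun`, and `refresh` are hypothetical names, and the demonstration uses a stand-in import function with a temporary cache file:

```{r}
# Load cached data if present, otherwise run the import and cache the result.
# Setting refresh = TRUE ignores any existing cache and re-imports.
load_or_import <- function(cache_path, import_fun, refresh = FALSE) {
  if (!refresh && file.exists(cache_path)) {
    return(readRDS(cache_path))
  }
  dat <- import_fun()
  saveRDS(dat, file = cache_path)
  dat
}

# Demonstration with a stand-in import function and a temporary cache file
demo_cache <- tempfile(fileext = ".Rds")
n_imports <- 0
fake_import <- function() {
  # Count how many times the (pretend) slow import actually runs
  n_imports <<- n_imports + 1
  list(example = "imported data")
}

first <- load_or_import(demo_cache, fake_import)   # imports and caches
second <- load_or_import(demo_cache, fake_import)  # loads from the cache
third <- load_or_import(demo_cache, fake_import, refresh = TRUE)  # re-imports
n_imports
```

In a real script, `import_fun` could be something like `function() lapply(ascs, read_asc, samples = FALSE)`, and `refresh = TRUE` would be set for one run after the import code changes.
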