---
title: "Handling metadata and annotations"
author: "AlpsNMR authors"
package: AlpsNMR
abstract: >
  This vignette shows some examples of how to explore sample metadata and
  add sample annotations coming from one or more CSV or Excel files.
date: "`r format(Sys.Date(), '%F')`"
output:
  BiocStyle::pdf_document:
    latex_engine: lualatex
vignette: >
  %\VignetteIndexEntry{Vignette 02: Handling metadata and annotations}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

# Getting started

We start by loading `AlpsNMR` and some convenience libraries:

```{r load-libraries, message=FALSE, warning=FALSE}
library(dplyr)
library(readxl)
library(AlpsNMR)
```

We also load the demo samples; see the introduction vignette for further details:

```{r load-samples}
MeOH_plasma_extraction_dir <- system.file("dataset-demo", package = "AlpsNMR")
zip_files <- list.files(MeOH_plasma_extraction_dir, pattern = glob2rx("*.zip"), full.names = TRUE)
dataset <- nmr_read_samples(sample_names = zip_files)
dataset <- nmr_interpolate_1D(dataset, axis = NULL)
dataset
```

```{r}
plot(dataset, chemshift_range = c(3.4, 3.6))
```

# Exploring the sample metadata

Besides the actual NMR spectra, most NMR formats include a lot of additional
information describing the acquisition properties, instrument settings, and
spectral processing. `AlpsNMR` parses all that information whenever possible
and stores it in the `nmr_dataset` object, so the user can inspect it.

Since there may be a lot of information, the data is stored in several data
frames. The available data frames are:

```{r}
nmr_meta_groups(dataset)
```

We can further explore each of those groups.
For instance, for the `acqus` group we find
`r ncol(nmr_meta_get(dataset, groups = "acqus"))` columns:

```{r}
acqus_metadata <- nmr_meta_get(dataset, groups = "acqus")
acqus_metadata
```

Here follows a long list of all the columns available:

```{r}
colnames(acqus_metadata)
```

We can check, for instance, that the nucleus observed in all samples is 1H:

```{r}
acqus_metadata[, c("NMRExperiment", "acqus_NUC1")]
```

Similarly, we can obtain the processing settings:

```{r}
procs_metadata <- nmr_meta_get(dataset, groups = "procs")
procs_metadata
```

# Sample annotations

Besides the sample metadata, most studies have design variables or annotations
that describe the biological samples. These annotations do not come from the
instrument itself, but are usually defined in an *external* CSV or Excel file.

`AlpsNMR` supports adding *external* annotations from data frames.

Let's load a table from an Excel file that contains some annotations for our
demo dataset:

```{r}
excel_file <- file.path(MeOH_plasma_extraction_dir, "dummy_metadata.xlsx")
subject_timepoint <- read_excel(excel_file, sheet = 1)
subject_timepoint
```

Note how this table includes a first column named `NMRExperiment`. This column
allows us to match the rows in the table with our samples.

We can embed these external annotations in our dataset:

```{r}
dataset <- nmr_meta_add(dataset, metadata = subject_timepoint, by = "NMRExperiment")
```

We can retrieve these *external* columns from the dataset:

```{r}
nmr_meta_get(dataset, groups = "external")
```

After adding the annotations to the dataset, we can use them in plots:

```{r}
plot(dataset, color = "TimePoint", linetype = "SubjectID", chemshift_range = c(3.4, 3.6))
```

# Further annotations

Sometimes, due to the study design, we have more than one table that we want to
match with our data.
For instance, a collaborator just sent us this table:

```{r}
additional_annotations <- data.frame(
  NMRExperiment = c("10", "20", "30"),
  SampleCollectionDay = c(1, 91, 3)
)
additional_annotations
```

Since the table has the `NMRExperiment` column, it is easy to include:

```{r}
dataset <- nmr_meta_add(dataset, additional_annotations)
```

And the column has been added:

```{r}
nmr_meta_get(dataset, groups = "external")
```

We received further information, but this time it is related to the
`SubjectID` that we added before:

```{r}
subject_related_information <- data.frame(
  SubjectID = c("Ana", "Elia"),
  Age = c(33, 3),
  Sex = c("female", "female")
)
subject_related_information
```

Note how in this case we only have two rows, one per subject, and we don't have
the `NMRExperiment` column anymore. We can specify the `by` argument in
`nmr_meta_add()` to use another column for merging:

```{r}
dataset <- nmr_meta_add(dataset, subject_related_information, by = "SubjectID")
```

And the `Sex` and `Age` columns have been added:

```{r}
nmr_meta_get(dataset, groups = "external")
```

We can also use them in a plot:

```{r}
plot(dataset, color = "SubjectID", linetype = "as.factor(Age)", chemshift_range = c(7.7, 7.8)) +
  ggplot2::labs(linetype = "Age")
```

# Summary

In this vignette we have seen how to explore the sample metadata, including
acquisition and processing settings, and how to embed external annotations and
use them in plots.

`AlpsNMR` is able to merge external annotations as long as the table shares a
column with the annotations already in the dataset that can be used as the
merging key.

To import external data, you may want to use the following functions:

| File type | Suggested function     |
| --------- | ---------------------- |
| CSV       | `readr::read_csv()`    |
| TSV       | `readr::read_tsv()`    |
| SPSS      | `haven::read_spss()`   |
| xls/xlsx  | `readxl::read_excel()` |

# Session Information

```{r}
sessionInfo()
```