---
title: "gDRutils"
author: "gDR team"
output: BiocStyle::html_document
vignette: >
  %\VignetteIndexEntry{gDRutils}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(gDRutils)
suppressPackageStartupMessages(library(MultiAssayExperiment))
```

# Overview

`gDRutils` is part of the `gDR` suite. This package provides tools for, among others:

* data manipulation, especially of the output of the `gDRcore` package (`MultiAssayExperiment` and `SummarizedExperiment` objects),
* data extraction,
* managing identifiers used for creating `gDR` experiments,
* data validation.

# Use cases

## Data manipulation

The basic output of the `gDRcore` package is a `MultiAssayExperiment` object. The `MAEpply` function allows manipulation of this object and can be used in a similar way to the base function `lapply`.

```{r}
mae <- get_synthetic_data("finalMAE_combo_matrix_small")
MAEpply(mae, dim)
```

```{r}
MAEpply(mae, rowData)
```

This function also allows extraction of unified data across all the `SummarizedExperiment`s inside the `MultiAssayExperiment`, e.g.

```{r}
MAEpply(mae, rowData, unify = TRUE)
```

## Data extraction

All the metrics data are stored inside the `assays` of a `SummarizedExperiment`. For downstream analyses we provide tools for extracting the data into a user-friendly `data.table` format.
There is a function that works on the `MultiAssayExperiment` object as well as a set of functions that work on the `SummarizedExperiment` object:

* convert_mae_assay_to_dt
* convert_se_assay_to_dt
* convert_se_assay_to_custom_dt
* convert_combo_data_to_dt

```{r}
mdt <- convert_mae_assay_to_dt(mae, "Metrics")
head(mdt, 3)
```

or alternatively for a `SummarizedExperiment` object:

```{r}
se <- mae[[1]]
sdt <- convert_se_assay_to_dt(se, "Metrics")
head(sdt, 3)
```

## Managing gDR identifiers

### Overview

`gDR` requires standard identifiers to be present in the input data, e.g. `Gnumber`, `CLID`, `Concentration`. However, users can define their own custom identifiers.

To display the default gDR identifiers, use the `get_env_identifiers` function:

```{r}
get_env_identifiers()
```

To change any of these identifiers, use `set_env_identifier`, e.g.

```{r}
set_env_identifier("concentration", "Dose")
```

and confirm the change by displaying it:

```{r}
get_env_identifiers("concentration")
```

To restore the default identifiers, use `reset_env_identifiers`.

```{r}
reset_env_identifiers()
```

```{r}
get_env_identifiers("concentration")
```

### Validating identifiers

The `validate_identifiers` function checks whether the specified identifier values exist in the data and, if needed, tries to modify them so that they pass validation.

```{r}
# Example data.table
dt <- data.table::data.table(
  Barcode = c("A1", "A2", "A3"),
  Duration = c(24, 48, 72),
  Template = c("T1", "T2", "T3"),
  clid = c("C1", "C2", "C3")
)

# Validate identifiers
validated_identifiers <- validate_identifiers(
  dt,
  req_ids = c("barcode", "duration", "template", "cellline")
)

print(validated_identifiers)
```

In detail, `validate_identifiers` wraps the following steps:

* modify identifier values to reflect the data, handling many-to-one mappings via the `.modify_polymapped_identifiers` function
* ensure that all required identifiers are present in the data via the `.check_required_identifiers` function
* check for polymapped identifiers in the data via the `.check_polymapped_identifiers` function

### Prettifying identifiers

Prettifying identifiers means making them more user-friendly and human-readable. It is handled by the `prettify_flat_metrics` function. Please see [the relevant section](#prettifying) for more details.

```{r}
# Example of prettifying identifiers
x <- c("CellLineName", "Tissue", "Concentration_2")
prettified_names <- prettify_flat_metrics(x, human_readable = TRUE)
print(prettified_names)
```

## Data validation

Custom changes to the gDR output can disrupt the operation of internal functions. Such changes can be validated using `validate_MAE`

```{r}
validate_MAE(mae)
```

or `validate_SE`.

```{r}
validate_SE(se)
```

```{r, error=TRUE, purl = FALSE}
assay(se, "Normalized") <- NULL
validate_SE(se)
```

There is also a group of functions for validating data used in the gDR application:

* is_combo_data
* has_single_codrug_data
* has_valid_codrug_data
* get_additional_variables

## Prettifying

Prettifying involves transforming data into a more descriptive and human-readable version. This is particularly useful for front-end applications where user-friendly names are preferred over technical or abbreviated terms.

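
The general idea of prettifying can be illustrated with a minimal base R sketch. It is not how `prettify_flat_metrics` is implemented: the helper `prettify_sketch`, the `pretty_lookup` table, and all names in it are hypothetical and serve only to show the mapping of technical names to display names.

```{r}
# Hypothetical lookup from technical names to display names.
# These pairs are made up for illustration and are not taken from gDRutils.
pretty_lookup <- c(
  drug_conc = "Drug Concentration",
  cell_id = "Cell Line ID"
)

# Replace every name found in the lookup; leave unknown names untouched.
prettify_sketch <- function(x, lookup) {
  known <- x %in% names(lookup)
  x[known] <- unname(lookup[x[known]])
  x
}

pretty_names <- prettify_sketch(
  c("drug_conc", "cell_id", "other_col"),
  pretty_lookup
)
print(pretty_names)
```

Leaving unknown names unchanged keeps such a helper safe to apply to arbitrary column vectors. The gDRutils functions described in this section apply their own, more elaborate rules.
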
In gdrplatform there are two entities that can be prettified:

* colnames of data.tables
* assay names

### Colnames of data.table(s)

The columns of the data.table(s) can be prettified with a single function, `prettify_flat_metrics`.

```
dt <- get_testdata()[["raw_data"]]
colnames(dt)
prettify_flat_metrics(colnames(dt), human_readable = TRUE)
```

The `prettify_flat_metrics` function is in fact a wrapper for the following actions:

* conversion of the normalization-specific metric names via the `.convert_norm_specific_metrics` function
* moving the GDS source info to the end of the column name via the `.prettify_GDS_columns` function
* prettifying the metadata columns via the `.prettify_metadata_columns` function
* prettifying the metric columns via the `.prettify_metric_columns` function
* prettifying the co-treatment column names via the `.prettify_cotreatment_columns` function
* minor corrections (removal of "gDR" and "_" prefixes, removal of spaces at the end/beginning, other)

For data.table(s) with combo excess and score assays, some of the columns are prettified with dedicated helper functions instead of `prettify_flat_metrics`:

* get_combo_excess_field_names()
* get_combo_score_field_names()

These helpers depend on `DATA_COMBO_INFO_TBL`, an internal data.table of gDRutils.

### Assay names

The function `get_assay_names` is the primary way to obtain prettified versions of the assay names. It wraps the `get_env_assay_names` function, which depends on `ASSAY_INFO_TBL`, an internal data.table of gDRutils.

There are some functions that wrap the `get_assay_names` function for combo data:

* get_combo_assay_names
* get_combo_score_assay_names
* get_combo_base_assay_names

# SessionInfo {-}

```{r sessionInfo}
sessionInfo()
```