tcpl v3.0
Data Retrieval

Center for Computational Toxicology and Exposure, US EPA

R Packages

# Primary Packages #
library(tcpl)
library(tcplfit2)
# Data Formatting Packages #
library(data.table)
library(dplyr)
library(magrittr)
# Plotting Packages #
library(ggplot2)
library(RColorBrewer)
library(colorspace)
library(viridis)
# Table Packages #
library(htmlTable)
library(kableExtra)

Introduction

This vignette describes how the user can retrieve data from the ToxCast database, known as invitrodb, using tcpl. The MySQL version of the ToxCast database containing all the publicly available ToxCast data is available for download at: https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data.

NOTE:

Users must be connected to the ToxCast database (invitrodb), or a replicate of the database, to utilize many of these functions and execute the examples in this vignette. Please see the introductory vignette in this package for more details.

Overview of Key Functions

To support different data retrieval needs within tcpl, there are a number of functions which query the database and return information to the local R session.

Overview of Data Nomenclature

Throughout this vignette we will use abbreviated designations for data retrieved from the database or to refer to processing steps within tcpl. For data from single concentration assays we use ‘SC.’ ‘MC’ is used for assay data with multiple concentrations. A particular data or processing level is indicated by appending the level id/number to the end of the ‘SC’ or ‘MC’ designation. For example, if we are discussing single concentration data from level 2 processing, then we will use the abbreviation ‘SC2.’

Assay Elements

The tcplLoadAsid, tcplLoadAid, tcplLoadAcid, and tcplLoadAeid functions load relevant assay ids and names for the respective assay elements based on the user specified parameters.

# List all assay source IDs
tcplLoadAsid() 
# Create a table of the assay endpoint ids (aeids) for assay source 14
aeids <- tcplLoadAeid(fld="asid", # field to query on
                      val=14, # value for each field
                              # values should match their corresponding 'fld'
                      add.fld = c("aid", "anm", "acid", "acnm")) # additional fields to return

Data

The tcplQuery function allows a user to provide an SQL query to load data from the MySQL database into the R session. In the following chunk we provide an example, but any valid SQL query can replace the one provided in our example.

# Load sample table using a MySQL query.
samples <- tcplQuery("SELECT * FROM sample;")
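When a query should be restricted to a set of ids, it can be convenient to build the SQL string in R before passing it to tcplQuery. The sketch below uses a hypothetical helper, `sql_in()`, which is not part of tcpl. It quotes character values and joins them into an `IN (...)` clause. Filtering the `sample` table on its `spid` column is an assumption for illustration. The helper only escapes embedded single quotes, so it does not replace proper parameterized queries on untrusted input.

```r
# Hypothetical helper (not part of tcpl): build an SQL "IN (...)" clause.
# Character values are wrapped in single quotes (embedded quotes doubled);
# numeric values are inserted as-is.
sql_in <- function(field, values) {
  if (is.character(values)) {
    values <- paste0("'", gsub("'", "''", values, fixed = TRUE), "'")
  }
  paste0(field, " IN (", paste(values, collapse = ", "), ")")
}

# Example: restrict the sample table to two sample ids.
spids <- c("01504209", "TP0000073D03")
qry <- paste0("SELECT * FROM sample WHERE ", sql_in("spid", spids), ";")
qry
# With a database connection, the string could then be passed to tcplQuery(qry).
```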

The tcplLoadData function can be used to load the data from the MySQL database into the R session. Further, the tcplPrepOtpt function can be used in combination with tcplLoadData to add useful chemical and assay annotation information, mapped to the retrieved data.

# Load multi concentration data from level 2,
# and map only the chemical annotation information.
mc2_fmtd <- tcplPrepOtpt(
  tcplLoadData(
    lvl = 2, # data level
    fld = 'acid', # field to query on
    val = 49, # value for each field
             # values should match their corresponding 'fld'
    type = 'mc' # data type
  ),
  ids = 'spid' # additional annotation fields to add - just chemical info
               # - (Default): map assay and chemical annotation
               # - 'acid' OR 'aeid': map only assay annotation
               # - 'spid': map only chemical annotation
)
# Print the first 6 rows of 'mc2_fmtd'
head(mc2_fmtd)

When loading data, the user must indicate the applicable fields and ids for the data level of interest. When loading level 0 (SC0 and MC0), MC1, and MC2 data, the assay component id (\(\mathit{acid}\)) is always used. As described in Table 1 of the tcpl Data Processing vignette, the SC1 and MC3 processing levels perform data normalization, during which assay component ids (\(\mathit{acid}\)) are converted to assay endpoint ids (\(\mathit{aeid}\)). Thus, the SC1 and MC3 data tables contain both \(\mathit{acid}\) and \(\mathit{aeid}\) ids, and data can be loaded using either id as long as the field is specified correctly. When loading SC2, MC4, and MC5 data, always use the assay endpoint id (\(\mathit{aeid}\)). The id to use is determined by the primary key of each data table. Examples of loading data are detailed in later sections.
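The id-selection rules in the paragraph above can be summarized as a small lookup. The sketch below is a hypothetical helper, `id_fields()`, that is not part of tcpl. It also shows how the SC/MC level designations from the nomenclature section are formed. It covers only the levels named in that paragraph and returns `NA` for any other level.

```r
# Hypothetical helper (not part of tcpl): which id field(s) can be used to
# load a given data type and level, following the rules described above.
id_fields <- function(type, lvl) {
  label <- paste0(toupper(type), lvl)  # e.g. "SC2", "MC5"
  rules <- list(
    SC0 = "acid",
    SC1 = c("acid", "aeid"),
    SC2 = "aeid",
    MC0 = "acid",
    MC1 = "acid",
    MC2 = "acid",
    MC3 = c("acid", "aeid"),
    MC4 = "aeid",
    MC5 = "aeid"
  )
  # levels not covered by the rules above return NA
  if (label %in% names(rules)) rules[[label]] else NA_character_
}

id_fields("mc", 2)  # "acid"
id_fields("sc", 1)  # "acid" "aeid"
id_fields("mc", 5)  # "aeid"
```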

Assay Annotations

Assay source, assay, assay component, and assay endpoint are registered via tcpl scripting into a collection of tables. The database structure takes the annotations and organizes them as attributes of the assay conductors, the assays (i.e., experiments), the assay components (i.e., raw readouts), or the assay endpoints (i.e., normalized component data) enabling aggregation and differentiation of the data generated through ToxCast and Tox21. The annotations capture four types of information:

  1. Identification information,
  2. Design information, such as the technology, format, and objective aspects that describe the assay’s innovations,
  3. Target information, such as the target of technological measurement and the biologically intended target, and
  4. Analysis information about how the data were processed and analyzed.
# Load libraries and open a direct database connection.
library(RMySQL)
con <- dbConnect(drv = RMySQL::MySQL(), username = "user", password = "pass",
                 dbname = "invitrodb", host = "host")
# Use tcpl to identify which ids are needed in the subsequent queries.
tcplLoadAsid()
source <- tcplLoadAeid(fld = "asid", val = 1, add.fld = c("aid", "anm", "acid", "acnm"))
# Query the annotation tables with RMySQL and subset them by id or name.
assay <- dbGetQuery(con, "SELECT * FROM invitrodb.assay WHERE aid = 1;")
component <- dbGetQuery(con, "SELECT * FROM invitrodb.assay_component;")
component <- subset(component, acid %in% source$acid)
endpoint <- dbGetQuery(con, "SELECT * FROM invitrodb.assay_component_endpoint;")
endpoint <- endpoint[grepl("ATG", endpoint$assay_component_endpoint_name), ]
# Close the connection when finished.
dbDisconnect(con)

Chemical Information

The tcplLoadChem function returns chemical information for user specified parameters, e.g. the chemical name (chnm) and chemical id (chid). The tcplLoadClib function provides more information about the ToxCast chemical library used for sample generation.

Methods

The tcplMthdList function returns methods available for processing at a specified level (i.e. step in the tcpl pipeline). The user defined function in the following code chunk utilizes the tcplMthdList function to retrieve and output all available methods for both the SC and MC data levels.

# Create a function that lists all available methods (SC & MC).
method_list <- function() {
  # Single Concentration
  ## Level 1
  sc1 <- tcplMthdList(1, 'sc')
  sc1[, lvl := "sc1"]
  setnames(sc1, c("sc1_mthd", "sc1_mthd_id"), c("mthd", "mthd_id"))
  ## Level 2
  sc2 <- tcplMthdList(2, 'sc')
  sc2[, lvl := "sc2"]
  setnames(sc2, c("sc2_mthd", "sc2_mthd_id"), c("mthd", "mthd_id"))
  
  # Multiple Concentration
  ## Level 2
  mc2 <- tcplMthdList(2, 'mc')
  mc2[, lvl := "mc2"]
  setnames(mc2, c("mc2_mthd", "mc2_mthd_id"), c("mthd", "mthd_id"))
  ## Level 3
  mc3 <- tcplMthdList(3, 'mc')
  mc3[, lvl := "mc3"]
  setnames(mc3, c("mc3_mthd", "mc3_mthd_id"), c("mthd", "mthd_id"))
  ## Level 4
  mc4 <- tcplMthdList(4, 'mc')
  mc4[, lvl := "mc4"]
  setnames(mc4, c("mc4_mthd", "mc4_mthd_id"), c("mthd", "mthd_id"))
  ## Level 5
  mc5 <- tcplMthdList(5, 'mc')
  mc5[, lvl := "mc5"]
  setnames(mc5, c("mc5_mthd", "mc5_mthd_id"), c("mthd", "mthd_id"))
  # Compile the Output
  mthd.list <- rbind(sc1, sc2, mc2, mc3, mc4, mc5)
  mthd.list <- mthd.list[, c("lvl", "mthd_id", "mthd", "desc")]
  # Return the Results
  return(mthd.list)
}

# Run the 'method_list' function and store the output.
amthds <- method_list()
# Print the available methods list.
amthds
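The six blocks in `method_list()` repeat the same rename-and-tag step. A more compact variant could move that step into a hypothetical helper, `stack_methods()`, which is not part of tcpl. It takes a named list of method tables (names such as "sc1" or "mc3") and renames each `<level>_mthd` and `<level>_mthd_id` column to `mthd` and `mthd_id`. The column names, including `desc`, are taken from the renaming done in `method_list()` above. The sketch is written in base R and demonstrated on small mock tables with invented values.

```r
# Hypothetical helper (not part of tcpl): standardize and stack per-level
# method tables, mirroring the renaming done in method_list().
stack_methods <- function(tbls) {
  out <- lapply(names(tbls), function(lvl) {
    d <- as.data.frame(tbls[[lvl]])
    # rename e.g. "mc3_mthd" -> "mthd" and "mc3_mthd_id" -> "mthd_id"
    names(d)[names(d) == paste0(lvl, "_mthd")] <- "mthd"
    names(d)[names(d) == paste0(lvl, "_mthd_id")] <- "mthd_id"
    d$lvl <- lvl
    d[, c("lvl", "mthd_id", "mthd", "desc")]
  })
  do.call(rbind, out)
}

# Mock tables standing in for tcplMthdList() output (values invented).
mock <- list(
  sc1 = data.frame(sc1_mthd = c("a", "b"), sc1_mthd_id = 1:2,
                   desc = c("first", "second"), stringsAsFactors = FALSE),
  mc3 = data.frame(mc3_mthd = "c", mc3_mthd_id = 7L,
                   desc = "third", stringsAsFactors = FALSE)
)
res <- stack_methods(mock)
res
```

With a database connection, the same helper could be applied to real output. For example, `stack_methods(list(sc1 = tcplMthdList(1, "sc"), mc2 = tcplMthdList(2, "mc")))` should work, assuming each table has the columns that `method_list()` relies on.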

The tcplMthdLoad function returns the method assignments for specified id(s). Later sections provide more detailed examples of using the tcplMthdLoad function for individual ids.

Retrieving Level 0 Data

Prior to the pipeline processing provided in this package, all data must go through pre-processing, i.e., conversion of the raw data into database level 0 data. Pre-processing transforms data from heterogeneous assays into a uniform format and is executed using dataset-specific R scripts. After pre-processing is complete and the formatted data match the level 0 format, the data can be loaded into the database using tcplWriteLvl0, as described in the tcpl Data Processing vignette. The standard level 0 format is identical for both testing paradigms, SC and MC. Users can inspect the level 0 data and calculate assay quality metrics prior to running the processing pipeline.
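As a quick pre-processing check, a few level 0 columns can be tabulated in base R. The sample outputs in this section include the well type (`wllt`), well quality (`wllq`), response value (`rval`), and assay plate id (`apid`). The small data frame below is invented for illustration. Well types t (test), n (neutral), and p (positive control) follow the comments in the assay quality function later in this vignette. With a database connection, the same calls should apply to the output of `tcplLoadData(lvl = 0, ...)`.

```r
# Mock level 0 data (invented values) with a few of the level 0 columns.
lvl0 <- data.frame(
  apid = rep(c("plate1", "plate2"), each = 4),
  wllt = c("t", "t", "n", "n", "t", "n", "n", "p"),
  wllq = c(1, 1, 1, 0, 1, 1, 1, 1),
  rval = c(1.02, 0.95, 1.00, NA, 1.30, 0.98, 1.01, 2.40),
  stringsAsFactors = FALSE
)

# Count wells by well type and well quality (wllq = 1 marks usable wells).
tab <- table(wllt = lvl0$wllt, wllq = lvl0$wllq)
tab

# Fraction of usable wells per plate.
good_frac <- tapply(lvl0$wllq == 1, lvl0$apid, mean)
good_frac

# Number of missing response values among usable wells.
sum(is.na(lvl0$rval[lvl0$wllq == 1]))
```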

Load SC0 Data

# Load Level 0 single concentration data for a single acid to R.
sc0 <- tcplLoadData(lvl=0, # data level
                    fld="acid", # field to query on
                    val=1, # value for each field
                           # values should match their corresponding 'fld'
                    type = "sc") # data type - single concentration

# Alternatively, load data in and format with tcplPrepOtpt.
sc0 <- tcplPrepOtpt(tcplLoadData(lvl=0, fld="acid", val=1, type = "sc"))

Since we are not able to connect to the database directly in this vignette, we have provided a sample dataset in the package to illustrate what the results should look like.

# Load the example data from the package.
data(sc_vignette,package = 'tcpl')
# Save the single concentration level 0 data in the 'sc0' object.
sc0 <- sc_vignette[["sc0"]]
# Print the first 6 rows of the data.
head(sc0) %>%
  # format output into a table
  kbl() %>%
  # reduce the size of the table text to improve readability
  kable_styling(font_size = 10) %>%
  # wrap the table to allow horizontal scrolling (scroll_box goes last)
  scroll_box(width = "100%")
| spid | chid | casn | chnm | dsstox_substance_id | code | acid | acnm | s0id | apid | rowi | coli | wllt | wllq | conc | rval | srcf | conc_unit |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TP0000073D03 | 34212 | 118134-30-8 | Spiroxamine | DTXSID1034212 | C118134308 | 111 | ATG_RXRb_TRANS | 9940119 | NA | NA | NA | t | 1 | 45.8 | 1.0599632 | CLIN11 PlateTP0000049.xls | uM |
| TP0000073G09 | 20122 | 86-50-0 | Azinphos-methyl | DTXSID3020122 | C86500 | 111 | ATG_RXRb_TRANS | 9970455 | NA | NA | NA | t | 1 | 50.0 | 0.9581900 | CLIN11 PlateTP0000049.xls | uM |
| TP0000075H04 | 21166 | 51-03-6 | Piperonyl butoxide | DTXSID1021166 | C51036 | 111 | ATG_RXRb_TRANS | 10045157 | NA | NA | NA | t | 1 | 50.0 | 0.9531718 | CLIN11 PlateTP0000050.xls | uM |
| TP0000077B04 | 24102 | 22224-92-6 | Fenamiphos | DTXSID3024102 | C22224926 | 111 | ATG_RXRb_TRANS | 10062416 | NA | NA | NA | t | 1 | 50.0 | 1.1310499 | CLIN11 PlateTP0000051.xls | uM |
| TP0000077B09 | 24195 | 94-74-6 | 2-(4-Chloro-2-methylphenoxy)acetic acid | DTXSID4024195 | C94746 | 111 | ATG_RXRb_TRANS | 10066027 | NA | NA | NA | t | 1 | 50.0 | 0.8759538 | CLIN11 PlateTP0000051.xls | uM |
| TP0000077B10 | 32398 | 131341-86-1 | Fludioxonil | DTXSID2032398 | C131341861 | 111 | ATG_RXRb_TRANS | 10066756 | NA | NA | NA | t | 1 | 19.3 | 12.7624120 | CLIN11 PlateTP0000051.xls | uM |

Load MC0 Data

# Load Level 0 multiple concentration data.
mc0 <- tcplPrepOtpt(
  tcplLoadData(lvl=0, # data level
               fld="acid", # field to query on
               val=1, # value for each field
                      # values should match their corresponding 'fld'
               type = "mc") # data type - multiple concentrations
)

We can again use one of the datasets provided in this package to demonstrate what the above results should look like.

# Load the example data from the package.
data(mc_vignette,package = 'tcpl')
# Save the multiple concentration level 0 data in the 'mc0' object.
mc0 <- mc_vignette[["mc0"]]
# Print the first 6 rows of the data.
head(mc0) %>%
  # format output into a table
  kbl() %>%
  # reduce the size of the table text to improve readability
  kable_styling(font_size = 10) %>%
  # wrap the table to allow horizontal scrolling (scroll_box goes last)
  scroll_box(width = "100%")
| spid | chid | casn | chnm | dsstox_substance_id | code | acid | acnm | m0id | apid | rowi | coli | wllt | wllq | conc | rval | srcf | conc_unit |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 01504209 | 379721 | 2264-01-9 | 1H,1H,6H,6H-Perfluorohexane-1,6-diol diacrylate | DTXSID80379721 | C2264019 | 49 | ATG_GLI_CIS | 626391856 | TO-17-1CD | NA | NA | t | 1 | 0.412 | 1.0183150 | EPA-TO17-Part2-ATTAGENE-CIS-FACTORIAL-DATA-May-10-2019.xlsx | uM |
| 01504209 | 379721 | 2264-01-9 | 1H,1H,6H,6H-Perfluorohexane-1,6-diol diacrylate | DTXSID80379721 | C2264019 | 49 | ATG_GLI_CIS | 626391857 | TO-17-1CD | NA | NA | t | 1 | 11.100 | 0.9848485 | EPA-TO17-Part2-ATTAGENE-CIS-FACTORIAL-DATA-May-10-2019.xlsx | uM |
| 01504209 | 379721 | 2264-01-9 | 1H,1H,6H,6H-Perfluorohexane-1,6-diol diacrylate | DTXSID80379721 | C2264019 | 49 | ATG_GLI_CIS | 626391850 | TO-17-1CD | NA | NA | t | 1 | 33.300 | 1.0134680 | EPA-TO17-Part2-ATTAGENE-CIS-FACTORIAL-DATA-May-10-2019.xlsx | uM |
| 01504209 | 379721 | 2264-01-9 | 1H,1H,6H,6H-Perfluorohexane-1,6-diol diacrylate | DTXSID80379721 | C2264019 | 49 | ATG_GLI_CIS | 626391858 | TO-17-1CD | NA | NA | t | 1 | 1.230 | 0.9882155 | EPA-TO17-Part2-ATTAGENE-CIS-FACTORIAL-DATA-May-10-2019.xlsx | uM |
| 01504209 | 379721 | 2264-01-9 | 1H,1H,6H,6H-Perfluorohexane-1,6-diol diacrylate | DTXSID80379721 | C2264019 | 49 | ATG_GLI_CIS | 626391851 | TO-17-1CD | NA | NA | t | 1 | 0.412 | 1.0860806 | EPA-TO17-Part2-ATTAGENE-CIS-FACTORIAL-DATA-May-10-2019.xlsx | uM |
| 01504209 | 379721 | 2264-01-9 | 1H,1H,6H,6H-Perfluorohexane-1,6-diol diacrylate | DTXSID80379721 | C2264019 | 49 | ATG_GLI_CIS | 626391859 | TO-17-1CD | NA | NA | t | 1 | 11.100 | 1.0858586 | EPA-TO17-Part2-ATTAGENE-CIS-FACTORIAL-DATA-May-10-2019.xlsx | uM |

Review MC assay quality

The goal of this section is to provide example quantitative metrics, such as z-prime and the coefficient of variation, to evaluate assay performance relative to controls.
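For reference, the per-plate metrics computed in the aq() function below can be written as follows. Here \(\tilde{x}_w\) is the median and \(\mathrm{MAD}_w\) is the median absolute deviation of the response values (rval) for wells of type \(w\): p (positive control), n (neutral), or m. The MAD is as returned by R's mad(), which by default scales the raw MAD by 1.4826. The m-based metrics (zprm.m, ssmd.m, sn.m, sb.m) follow by replacing p with m. These expressions are transcribed from the code, not from a separate specification.

\[
\begin{aligned}
Z'_p &= 1 - \frac{3\,(\mathrm{MAD}_p + \mathrm{MAD}_n)}{\lvert \tilde{x}_p - \tilde{x}_n \rvert}\\
\mathrm{SSMD}_p &= \frac{\tilde{x}_p - \tilde{x}_n}{\sqrt{\mathrm{MAD}_p^{2} + \mathrm{MAD}_n^{2}}}\\
\mathrm{CV} &= \frac{\mathrm{MAD}_n}{\tilde{x}_n}\\
\mathrm{S/N}_p &= \frac{\tilde{x}_p - \tilde{x}_n}{\mathrm{MAD}_n}\\
\mathrm{S/B}_p &= \frac{\tilde{x}_p}{\tilde{x}_n}
\end{aligned}
\]

In aq(), negative \(Z'\) values are set to 0 per plate. Each metric is then summarized as its median across plates for each assay component.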

# Create a function to review assay quality metrics using Level 1 MC data.
aq <- function(acids){
  # obtain level 1 multiple concentration data for the specified acids
  dat <- tcplPrepOtpt(tcplLoadData(1L, "acid", acids, type="mc"))
  
  # keep only observations with good well quality (wllq = 1)
  dat <- dat[wllq==1]
  
  # obtain summary values for data and remove missing data (i.e. NA's)
  agg <- dat[ ,
              list(
                # median response values (rval) of neutral wells (wllt = n)
                nmed = median(rval[wllt=="n"], na.rm=TRUE), 
                # median absolute deviation (mad) of neutral wells (wllt = n)
                nmad = mad(rval[wllt=="n"], na.rm=TRUE), 
                # median response values of positive control wells (wllt = p)
                pmed = median(rval[wllt=="p"], na.rm=TRUE),
                # median absolute deviation of positive control wells (wllt = p)
                pmad = mad(rval[wllt=="p"], na.rm=TRUE),
                # median response values of negative control wells (wllt = m)
                mmed = median(rval[wllt=="m"], na.rm=TRUE),
                # median absolute deviation of negative control wells (wllt = m)
                mmad = mad(rval[wllt=="m"], na.rm=TRUE)
                ),
              # aggregate on assay component id, assay component name,
              # and assay plate id
              by = list(acid, acnm, apid)]
  
  # Z prime factor: separation between positive and negative controls,
  # indicative of likelihood of false positives or negatives. 
  # - Between 0.5 - 1 are excellent,
  # - Between 0 and 0.5 may be acceptable,
  # - Less than 0 not good
  # obtain the z-prime factor for positive controls and neutral
  agg[ , zprm.p := 1 - ((3 * (pmad + nmad)) / abs(pmed - nmed))]  
  # obtain the z-prime factor for negative controls and neutral
  agg[ , zprm.m := 1 - ((3 * (mmad + nmad)) / abs(mmed - nmed))]
  
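  # Strictly standardized mean difference (ssmd) between controls and neutral
  agg[ , ssmd.p := (pmed - nmed) / sqrt(pmad^2 + nmad^2 )]
  agg[ , ssmd.m := (mmed - nmed) / sqrt(mmad^2 + nmad^2 )]
  
  # Coefficient of Variation (cv) of neutral control
  # - Ideally should be under 25%
  agg[ , cv     := nmad / nmed] 
  
  # Signal-to-noise (sn) and signal-to-background (sb) ratios
  agg[ , sn.p :=  (pmed - nmed) / nmad]
  agg[ , sn.m :=  (mmed - nmed) / nmad]
  agg[ , sb.p :=  pmed / nmed]
  agg[ , sb.m :=  mmed / nmed]
  
  # floor negative z-prime values at 0 before summarizing
  agg[zprm.p<0, zprm.p := 0]
  agg[zprm.m<0, zprm.m := 0]
  
  # summarize each metric as its median across plates, per assay component
  acqu <- agg[ , list( nmed   = signif(median(nmed, na.rm = TRUE)),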
                       nmad   = signif(median(nmad, na.rm = TRUE)),
                       pmed   = signif(median(pmed, na.rm = TRUE)),
                       pmad   = signif(median(pmad, na.rm = TRUE)),
                       mmed   = signif(median(mmed, na.rm = TRUE)),
                       mmad   = signif(median(mmad, na.rm = TRUE)),
                       zprm.p = round(median(zprm.p, na.rm=TRUE),2),
                       zprm.m = round(median(zprm.m, na.rm=TRUE),2),
                       ssmd.p = round(median(ssmd.p, na.rm=TRUE),0),
                       ssmd.m = round(median(ssmd.m, na.rm=TRUE),0),
                       cv = round(median(cv, na.rm=TRUE),2),
                       sn.p = round(median(sn.p, na.rm=TRUE),2),
                       sn.m = round(median(sn.m, na.rm=TRUE),2),
                       sb.p = round(median(sb.p, na.rm=TRUE),2),
                       sb.m = round(median(sb.m, na.rm=TRUE),2)
  ), by = list(acid, acnm)]
  # Return the Results.
  return(acqu)
} #per acid 

# Run the 'aq' function for the acids in 'aeids' & store the output.
assayq <- aq(unique(aeids$acid))
# Print the first 6 rows of the assay quality results.
head(assayq)
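To see how the formulas behave, the sketch below computes a subset of the metrics for a single mock plate in base R. The well responses are invented. The calls mirror those in aq(): median(), and mad() with its default scaling. The z-prime is floored at 0 as aq() does.

```r
# Minimal base-R sketch of aq()'s per-plate metrics on invented
# control-well responses for a single plate.
plate <- data.frame(
  wllt = c(rep("n", 4), rep("p", 4)),
  rval = c(1.0, 1.1, 0.9, 1.0, 5.0, 5.2, 4.8, 5.0),
  stringsAsFactors = FALSE
)

nmed <- median(plate$rval[plate$wllt == "n"])  # neutral median
nmad <- mad(plate$rval[plate$wllt == "n"])     # neutral MAD (scaled by 1.4826)
pmed <- median(plate$rval[plate$wllt == "p"])  # positive control median
pmad <- mad(plate$rval[plate$wllt == "p"])     # positive control MAD

zprm.p <- max(0, 1 - (3 * (pmad + nmad)) / abs(pmed - nmed))  # floored at 0
ssmd.p <- (pmed - nmed) / sqrt(pmad^2 + nmad^2)
cv     <- nmad / nmed
sb.p   <- pmed / nmed

round(c(zprm.p = zprm.p, ssmd.p = ssmd.p, cv = cv, sb.p = sb.p), 2)
```

Here the tight control distributions give a z-prime of about 0.83, which falls in the "excellent" range described in the aq() comments.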

Retrieving Processed Single-Concentration (SC) Data and Methods

The goal of SC processing is to identify potentially active compounds from a large screen at a single concentration. After processing, users can inspect SC activity hit calls and the applied methods.

Load SC2 Data

# Load Level 2 single concentration data for a single aeid.
sc2 <- tcplPrepOtpt(
  tcplLoadData(lvl=2, # data level
               fld="aeid", # id field to query on
               val=3, # value for the id field
               type = "sc") # data type - single concentration
)
# Alternatively, data for a set of aeids can be loaded with a vector of ids.
sc2 <- tcplPrepOtpt(
  tcplLoadData(lvl=2, fld="aeid", val=aeids$aeid, type = "sc")
)
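Once SC2 data are loaded, the hit calls can be filtered directly. The sketch below uses an invented mock data frame. The column names `hitc` and `max_med`, and the coding of hitc = 1 for actives, are assumptions about the SC2 output, so check `names(sc2)` against your own results before reusing it.

```r
# Mock SC2-style summary (invented values); this sketch assumes hitc = 1
# marks an active call.
sc2_mock <- data.frame(
  aeid = c(3, 3, 3, 5),
  spid = c("S1", "S2", "S3", "S1"),
  max_med = c(45.2, 3.1, 28.7, 60.0),
  hitc = c(1, 0, 1, 1),
  stringsAsFactors = FALSE
)

# Samples flagged active, with their endpoint and maximum median response.
actives <- sc2_mock[sc2_mock$hitc == 1, c("aeid", "spid", "max_med")]
actives

# Number of active samples per aeid.
cnt <- table(actives$aeid)
cnt
```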

Load SC Methods

# Create a function to load methods for single concentration data processing
# steps for given aeids.
sc_methods <- function(aeids) {
  # load the level 1 methods assigned to the single concentration aeids
  sc1_mthds <- tcplMthdLoad(lvl=1, type ="sc", id=aeids$aeid)
  # aggregate the method ids by aeid
  sc1_mthds<- aggregate(mthd_id ~ aeid, sc1_mthds, toString)
  # rename the method id column of the sc1_mthds object
  setnames(sc1_mthds, "mthd_id", "sc1_mthd_id")
  
  # load the level 2 methods assigned to the single concentration aeids
  sc2_mthds <- tcplMthdLoad(lvl=2, type ="sc", id=aeids$aeid)
  # aggregate the method ids by aeid
  sc2_mthds<- aggregate(mthd_id ~ aeid, sc2_mthds, toString)
  # rename the method id column of the sc2_mthds object
  setnames(sc2_mthds, "mthd_id", "sc2_mthd_id")
  
  # Compile the Output 
  methods <- merge( merge(aeids, sc1_mthds,  by = "aeid", all = TRUE), 
                  sc2_mthds, by = "aeid", all = TRUE )
  # Return the Results
  return(methods)
}

# Run the 'sc_methods' function and store the output.
smthds <- sc_methods(aeids)

# Print the assigned sc methods.
smthds
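The `aggregate(mthd_id ~ aeid, ..., toString)` step used in `sc_methods()` (and again in `mc_methods()`) collapses several method rows per endpoint into one row. The mock data below are invented, but the call is the same base-R idiom.

```r
# Mock method assignments (invented): one row per aeid/method pair.
mthds <- data.frame(aeid = c(3, 3, 5, 5, 5),
                    mthd_id = c(1, 11, 1, 2, 13))

# Collapse to one row per aeid, listing the method ids as a
# comma-separated string.
collapsed <- aggregate(mthd_id ~ aeid, data = mthds, FUN = toString)
collapsed
```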

Retrieving Processed Multi-Concentration (MC) Data and Methods

The goal of MC processing is to estimate the hitcall, potency, efficacy, and other curve-fitting parameters for sample-assay endpoint pairs. After processing, users can inspect the activity hitcalls, model parameters, concentration-response plots, and the applied methods for the multiple concentration data.

Load MC5 Data

# Load Level 5 MC data summary values for a set of aeids.
# (NOTE: As before, the user can obtain data for individual aeids.)
mc5 <- tcplPrepOtpt(
  tcplLoadData(lvl=5, # data level
               fld="aeid", # fields to query on
               val=aeids$aeid, # value for each field
                               # values should match their corresponding 'fld'
               type = "mc") # data type - MC
)

# In tcpl v3.0.0 and later releases, set 'add.fld = TRUE' to return the
# mc5_param information along with the default mc5 results.
# (NOTE: Default for add.fld is FALSE, unless otherwise specified.)
mc5 <- tcplPrepOtpt(
  tcplLoadData(lvl=5, # data level
               fld="aeid", # fields to query on
               val=aeids$aeid, # value for each field
                               # values should match their corresponding 'fld'
               type = "mc", # data type - multiple concentration
               add.fld=TRUE) # return additional parameters from mc5_param 
  )
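After loading MC5 results, a common next step is to keep only active samples and rank them by potency. The sketch below uses invented values and makes several assumptions that are not confirmed by the text above. It treats `hitc` as a continuous hit call between 0 and 1 and `ac50` as a potency column returned with `add.fld = TRUE`. It also uses an illustrative cutoff of 0.9; the appropriate threshold depends on the analysis.

```r
# Mock MC5-style results (invented values). Column names and the 0.9
# cutoff are illustrative assumptions.
mc5_mock <- data.frame(
  aeid = c(80, 80, 80, 81),
  spid = c("S1", "S2", "S3", "S1"),
  hitc = c(0.98, 0.40, 0.93, 0.10),
  ac50 = c(2.5, NA, 12.0, NA),
  stringsAsFactors = FALSE
)

hit_cutoff <- 0.9
actives <- subset(mc5_mock, hitc >= hit_cutoff)

# Most potent actives first (smallest ac50).
ranked <- actives[order(actives$ac50), c("aeid", "spid", "hitc", "ac50")]
ranked
```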

Load MC Methods

# Create a function to load methods for MC data processing
# for select aeids.
mc_methods <- function(aeids) {
  # acid
  ## load the methods assigned to level 2 for given acids
  mc2_mthds <- tcplMthdLoad(2,aeids$acid)
  ## aggregate the assigned methods by acid
  mc2_mthds<- aggregate(mthd_id ~ acid, mc2_mthds, toString)
  ## rename the columns for the 'mc2_mthds' object
  setnames(mc2_mthds, "mthd_id", "mc2_mthd_id")
  
  # aeid
  ## load the methods assigned to level 3 for given aeids
  mc3_mthds <- tcplMthdLoad(3,aeids$aeid)
  ## aggregate the assigned methods by aeid
  mc3_mthds<- aggregate(mthd_id ~ aeid, mc3_mthds, toString)
  ## rename the columns for the 'mc3_mthds' object
  setnames(mc3_mthds, "mthd_id", "mc3_mthd_id")
  ## load the methods assigned to level 4 for given aeids
  mc4_mthds <- tcplMthdLoad(4,aeids$aeid)
  ## aggregate the assigned methods by aeid
  mc4_mthds<- aggregate(mthd_id ~ aeid, mc4_mthds, toString) 
  ## rename the columns for 'mc4_mthds' object
  setnames(mc4_mthds, "mthd_id", "mc4_mthd_id")
  ## load the methods assigned to level 5 for given aeids
  mc5_mthds <- tcplMthdLoad(5,aeids$aeid)
  ## aggregate the assigned methods by aeid
  mc5_mthds<- aggregate(mthd_id ~ aeid, mc5_mthds, toString)
  ## rename the columns for 'mc5_mthds' object
  setnames(mc5_mthds, "mthd_id", "mc5_mthd_id")

  # Compile the Results.
  ## merge the aeid information with the level 2 methods by acid
  acid.methods <- merge(aeids, mc2_mthds,by.x = "acid", by.y = "acid")
  ## merge the level 3, 4, and 5 methods by aeid
  mthd35 <- merge(
    merge(mc3_mthds, mc4_mthds, by = "aeid", all = TRUE),
    mc5_mthds, by = "aeid", all = TRUE
    )
  ## merge all methods information by aeid
  methods <- merge(acid.methods, mthd35,by.x = "aeid", by.y = "aeid")
  # Return the Results.
  return(methods)
}

# Run the 'mc_methods' function and store the output.
mmthds <- mc_methods(aeids)

# Print the assigned mc methods.
mmthds
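Note a difference between the two method functions. `sc_methods()` merges with `all = TRUE`. In `mc_methods()`, the acid-level merge and the final aeid-level merge use the default inner join (`all = FALSE`). As a result, an endpoint with no methods assigned at one of those steps is dropped from `mmthds` rather than shown with `NA`. The base-R sketch below shows the difference on invented tables.

```r
# Invented tables: three endpoints, but only two have level 5 methods.
aeid_tbl <- data.frame(aeid = c(3, 5, 7), aenm = c("A", "B", "C"),
                       stringsAsFactors = FALSE)
mc5_tbl  <- data.frame(aeid = c(3, 5), mc5_mthd_id = c("1, 2", "3"),
                       stringsAsFactors = FALSE)

inner <- merge(aeid_tbl, mc5_tbl, by = "aeid")              # drops aeid 7
outer <- merge(aeid_tbl, mc5_tbl, by = "aeid", all = TRUE)  # keeps aeid 7 with NA

nrow(inner)  # 2
nrow(outer)  # 3
```

If every endpoint should appear in the method summary, `all = TRUE` (or `all.x = TRUE` to keep only the rows of the first table) is the safer choice.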

Plotting

tcplPlot is tcpl’s single, flexible plotting function, providing interactive yet consistent visualization of concentration-response curves through customizable parameters. It is built with the R library plotly to display the additional curve-fitting models, and it uses the R library plumber to provide representational state transfer application programming interface (REST API) functionality. The tcplPlot function requires the selection of a level (lvl), field (fld), and value (val) to load the necessary data and display the associated plots. Level 4 (lvl = 4) plots the concentration-response series fit by all models. Level 5 (lvl = 5) extends Level 4 plotting by highlighting the winning model and presenting the activity hit call. Level 6 multi-concentration plotting, including lists of flags, is not currently supported by tcplPlot. Moreover, only multi-concentration plotting is currently supported.

Customization of output is possible by specifying parameters, including output, verbose, multi, by, fileprefix, nrow, ncol, and dpi.

  • The output parameter indicates how the plots will be presented. In addition to outputs viewable within the R console, tcplPlot supports a variety of publication-quality file types, including raster graphics (PNG, JPG, and TIFF), which retain color quality for photographic printing, and vector graphics (SVG and PDF), which retain image resolution when scaled to large formats.

  • The verbose parameter results in a plot that includes a table containing potency and model performance metrics; verbose = FALSE is the default and the only option for console output. When verbose = TRUE, the model AIC values are listed and the winning model is generally listed first.

  • The multi parameter allows for single or multiple plots per page. multi = TRUE is the default option for PDF outputs, whereas multi = FALSE is the only option for other outputs. If using the parameter option multi = TRUE, the default number of plots per page is set by the verbose parameter. The default number of plots per page is either 6 plots per page (verbose = FALSE) or 4 plots per page (verbose = TRUE).

  • The by parameter indicates how files should be divided, typically by \(aeid\) or \(spid\).

  • The fileprefix parameter allows the user to set a custom filename prefix. The standard filename is tcplPlot_sysDate().output (example: tcplPlot_2023_08_02.jpg) or, if the by parameter is set, tcplPlot_sysDate()_by.output (example: tcplPlot_2023_08_02_aeid_80.pdf). When a fileprefix is assigned, the default tcplPlot prefix is replaced with the custom prefix (example: myplot_2023_08_02_aeid_80.pdf or myplot_2023_08_02.jpg).

  • The nrow parameter specifies the number of rows for the multiple plots per page; this is 2 by default. The ncol parameter specifies the number of columns for the multiple plots per page; this is 3 by default, or 2 when verbose = TRUE. Together, nrow and ncol set the number of plots included per page. Both nrow and ncol must be greater than 0. While there is no hard-coded upper limit to the number of rows and columns, the underlying technology has a dimension limitation of nrow = 9 and ncol = 7.

  • The dpi parameter specifies image print resolution for image file output types (PNG, JPG, TIFF, SVG); this is 600 by default.
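The naming and layout rules above can be reproduced in a few lines of base R, which may help predict what files a call will produce. Both functions below, `plot_filename()` and `plots_per_page()`, are hypothetical helpers that mirror the documented examples; they are not tcplPlot's internal code. The per-page defaults follow the description of the multi parameter: 6 plots per page when verbose = FALSE and 4 when verbose = TRUE, with nrow = 2.

```r
# Hypothetical helper: <prefix>_<YYYY_MM_DD>[_<by>_<value>].<output>,
# mirroring the documented filename examples.
plot_filename <- function(output, prefix = "tcplPlot", date = Sys.Date(),
                          by = NULL, value = NULL) {
  name <- paste0(prefix, "_", format(date, "%Y_%m_%d"))
  if (!is.null(by)) name <- paste0(name, "_", by, "_", value)
  paste0(name, ".", output)
}

# Hypothetical helper: plots per page on a multi-plot PDF is nrow * ncol.
plots_per_page <- function(verbose, nrow = 2, ncol = if (verbose) 2 else 3) {
  nrow * ncol
}

d <- as.Date("2023-08-02")
plot_filename("jpg", date = d)                           # "tcplPlot_2023_08_02.jpg"
plot_filename("pdf", date = d, by = "aeid", value = 80)  # "tcplPlot_2023_08_02_aeid_80.pdf"
plot_filename("pdf", prefix = "myplot", date = d, by = "aeid", value = 80)
plots_per_page(verbose = FALSE)  # 6
plots_per_page(verbose = TRUE)   # 4
```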

The following examples demonstrate tcplPlot functionality through the variety of available customization options:

Output PDF of Verbose, Multiple Plots per Page, by AEID and/or SPID

The following two examples produce plots of Level 5 MC data for the selected \(aeids\). A new pdf is generated for each endpoint. Filtering can be applied if only plots for a subset of samples (\(spids\)) are desired.

# Plot Level 5 MC data for aeids 3157-3159 and output the plots as separate PDFs by aeid.
tcplPlot(lvl = 5, # data level
         fld = "aeid", # field to query on
         val = 3157:3159, # values must be listed for each corresponding 'fld'
         by = "aeid", # parameter to divide files
         multi = TRUE, # multiple plots per page - output 4 per page
         verbose = TRUE, # output all details if TRUE
         output = "pdf") # output as pdf

# Loading required mc_vignette data for example below
data(mc_vignette, package = 'tcpl')
mc5 <- mc_vignette[["mc5"]]

# Plot Level 5 MC data from the mc_vignette R data object for a single aeid 80 and
# spids "TP0001652B01", "01504209", "TP0001652D01", "TP0001652A01", and "1210314466"
tcplPlot(lvl = 5, # data level
         fld = c("aeid", "spid"), # field to query on
         val = list(mc5$aeid, mc5$spid), # values must be listed for each corresponding 'fld'
         by = "aeid", # parameter to divide files
         multi = TRUE, # multiple plots per page - output 4 per page
         verbose = TRUE, # output all details
         output = "pdf", # output as pdf
         fileprefix = "output_pdf") # prefix of the filename

Plots with parameters: output = “pdf”, multi = TRUE, and verbose = TRUE for aeid 80 and spids “TP0001652B01”, “01504209”, “TP0001652D01”, “TP0001652A01”, and “1210314466”

Output Image File (JPG) of Single Verbose Plot, by AEID and SPID

This example illustrates a Level 5 verbose plot for a single endpoint and single sample of output type “jpg”.

# Plot a verbose plot of Level 5 MC data for single aeid 80 and spid 01504209 and 
# output as jpg.
tcplPlot(lvl = 5, # data level
         fld = c('aeid','spid'), # field to query on
         val = list(80,'01504209'), # values must be listed for each corresponding 'fld'
         # values should match their corresponding 'fld'
         multi = FALSE, # single plot per page
         verbose = TRUE, # output all details
         output = "jpg", # output as jpg
         fileprefix = "output_jpg") # prefix of the filename

Plot generated with parameters: output = “jpg” and verbose = TRUE for aeid 80 and spid 01504209

Output to Console, by M4ID or AEID and SPID

Due to the dynamic nature of m#id values (such as \(m4id\)), the first example code chunk does not include a corresponding plot. Here, the \(m4id\) value (482273) corresponds to the mc_vignette R data object. To run this code, a valid \(m4id\) value must be supplied.

The second example includes a Level 5 plot for one endpoint and one sample with output type “console”. Only one concentration series can be output to the console at a time.

# Create Level 4 plot for a single m4id.
tcplPlot(lvl = 4,  # data level
         fld = "m4id", # field to query on 
         val = 482273, # values must be listed for each corresponding 'fld'
         multi = FALSE, # single plot
         verbose = FALSE, # do not output all details
         output = "console") # output in R console

# Plot of Level 5 MC data for single aeid (80) and spid (01504209)
# and output to console.
tcplPlot(lvl = 5, # data level
         fld = c('aeid','spid'), # field to query on
         val = list(80, '01504209'), # values must be listed for each corresponding 'fld'
         multi = FALSE, # single plot
         verbose = FALSE, # do not output all details
         output = "console") # output in R console

Plot generated with parameters: output = “console” for aeid 80 and spid 01504209

Additional Examples

Below are a few case examples for retrieving various bits of information from the database.

Load Data for a Specific Chemical

In this example, we illustrate the necessary steps for extracting information about the compound Bisphenol A found within the database. The user will define the chemical of interest, isolate all associated sample ids (\(\mathit{spids}\)), and then load all data for the given chemical.

# Provide the chemical name and assign to 'chnm'.
chnm <- 'Bisphenol A'
# Load the chemical data from the database.
chem <- tcplLoadChem(field = 'chnm',val = chnm)
# Load mc5 data from the database for the specified chemical.
BPA.mc5 <- tcplLoadData(lvl = 5, # data level 
                        fld = 'spid', # field to query on
                        val = chem[,spid], # value for each field (fld)
                        type = 'mc') # data type - MC

Plot Sample Subset

In this example, we illustrate how to plot by endpoint for a sample subset, as opposed to plotting all samples tested within an endpoint. The user will load data for the select endpoints, isolate the samples of interest, and then plot by endpoint for the sample subset.

# Load Level 5 multiple concentration data summary values for select aeids.
mc5 <- tcplPrepOtpt(
  tcplLoadData(lvl=5, # data level
               fld='aeid', # id field to query on
               val=tcplLoadAeid(fld="asid",val = 25)$aeid, # value for each field
               type='mc', # data type - MC
               add.fld=TRUE) # return additional parameters from mc5_param
  )

# Identify sample subset.
spid.mc5 <- mc5[spid %in% c("EPAPLT0018N08", "EPAPLT0023A16", "EPAPLT0020C11",  
                            "EPAPLT0018B13","EPAPLT0018B14","EPAPLT0018B15"),]

# Plot by endpoint for sample subset.
tcplPlot(lvl = 5, # data level
         fld = c("spid","aeid"), # fields to query on
         val = list( # value for each field, must be same order as 'fld'
           spid.mc5$spid, # sample ids
           spid.mc5$aeid  # assay endpoint ids
           ),
         by = "aeid", # parameter to divide files
         multi = TRUE, # multiple plots per page - output 4 per page with verbose = TRUE
         verbose = TRUE, # output all details if TRUE
         output = "pdf", # output as pdf
         fileprefix = "output/upitt") # prefix of the filename (includes an 'output/' path)