The package MORSE is devoted to the analysis of data from standard toxicity bioassays. It provides a simple workflow to explore/visualize a dataset, and compute estimations of risk assessment indicators. This document illustrates a typical use of MORSE on survival and reproduction data, which can be followed to analyze new datasets.
The following example shows all the stages to perform survival analysis on standard bioassay data and produce estimate values of the \(LC_x\). We will use a dataset of the library named cadmium2
, which contains both survival and reproduction data from a chronic laboratory bioassay. In this experiment, snails were exposed to six concentrations of a metal contaminant (cadmium) during 56 days.
The data from a survival assay should be gathered in a data.frame
with a specific layout. This is documented in the paragraph on survData
in the reference manual, and you can also inspect one of the datasets provided in the package (like cadmium2
). First, we load the dataset and use the function survDataCheck
to verify that it has the expected layout:
data(cadmium2)
survDataCheck(cadmium2, diagnosis.plot = TRUE)
## No error detected.
survData
objectThe class survData
represents survival data and is the basic representation used for the subsequent operations. Note that if the call to survDataCheck
reports no error, it is guaranteed that survData
will not fail.
dat <- survData(cadmium2)
head(dat)
## replicate conc time Nsurv Nrepro ID Ninit Nindtime
## 1 A 0 0 5 0 A_0_0 5 0
## 2 A 0 3 5 4 A_0_3 5 15
## 3 A 0 7 5 5 A_0_7 5 35
## 4 A 0 10 5 8 A_0_10 5 50
## 5 A 0 14 5 4 A_0_14 5 70
## 6 A 0 17 5 9 A_0_17 5 85
The function plot
can be used to plot the number of surviving individuals as a function of time for all concentrations and replicates.
plot(dat, style = "ggplot", pool.replicate = FALSE)
Two graphical styles are available, "generic"
for standar R
plots or "ggplot"
to call the package ggplot2
. If the argument pool.replicate
is TRUE
the datapoints for a given time and concentration are pooled and only the mean number of survivors is plotted. To observe the full dataset, we set this option to FALSE
.
By fixing the concentration to a (tested) value, we can visualize one subplot in particular:
plot(dat, concentration = 124, addlegend = TRUE,
pool.replicate = FALSE, style = "ggplot")
We can also plot the number of surviving individuals at a given time point as a function of the concentration, by fixing target.time
instead of concentration
:
plot(dat, target.time = 21, pool.replicate = FALSE, style = "ggplot",
addlegend = TRUE)
Finally, when fixing both concentration
and target.time
, we can compare one condition to the control:
plot(dat, concentration = 232, target.time = 21)
The function summary
provides some descriptive statistics on the experimental design.
summary(dat)
## Number of datapoints per concentration:
##
## 0 53 78 124 232 284
## 102 102 102 102 102 102
##
## Number of datapoints per time:
##
## 0 3 7 10 14 17 21 24 28 31 35 38 42 45 49 52 56
## 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36
##
## Number of replicate per time and concentration:
## time
## conc 0 3 7 10 14 17 21 24 28 31 35 38 42 45 49 52 56
## 0 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
## 53 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
## 78 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
## 124 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
## 232 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
## 284 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
##
## Number of survival (sum of replicate) per time and concentration:
## 0 3 7 10 14 17 21 24 28 31 35 38 42 45 49 52 56
## 0 30 30 30 30 29 29 29 29 29 28 28 28 28 28 28 28 28
## 53 30 30 29 29 29 29 29 29 29 29 28 28 28 28 28 28 28
## 78 30 30 30 30 30 30 29 29 29 29 29 29 29 29 29 27 27
## 124 30 30 30 30 30 29 28 28 27 26 25 23 21 18 11 11 9
## 232 30 30 30 22 18 18 17 14 13 12 8 4 3 1 0 0 0
## 284 30 30 15 7 4 4 4 2 2 1 1 1 1 1 1 0 0
Now we are ready to fit a probabilistic model for the survival data, which aims at finding the relation between concentration of pollutant and mean survival rate at the target time. Our model assumes the latter is a log-logistic function of the former, and the work here is to estimate the parameters of this log-logistic function. Once we have estimated the parameters, we can then calculate the \(LC_x\) for any \(x\). All this work is performed by the survFitTT
function, which requires a survData
object and the levels of \(LC_x\) we need:
fit <- survFitTT(dat,
lcx = c(10, 20, 30, 40, 50))
The return value is an object of class survFitTT
and provides the estimated parameters as a probability distribution, in order to model our uncertainty on their true value. For the parameters of the models, as well as the \(LC_x\) values, we report the median (as the estimated value) and the 2.5 % and 97.5 % quantiles of the distribution (as a measure of uncertainty, a.k.a. credible intervals). These can seen using the summary
method:
summary(fit)
## Summary:
##
## The loglogisticbinom_3 model with a binomial stochastic part was used !
##
## Quantiles of priors on parameters:
##
## 50% 2.5% 97.5%
## b 1.000 0.013 79.433
## d 0.500 0.025 0.975
## e 122.687 53.898 279.268
##
## Quantiles of posteriors on parameters:
##
## 50% 2.5% 97.5%
## b 11.739 5.393 72.776
## d 0.922 0.853 0.971
## e 117.770 104.238 124.653
##
## Quantiles of the estimated LCx:
##
## 50% 2.5% 97.5%
## LC10 97.123 72.530 119.243
## LC20 104.193 83.323 120.559
## LC30 109.269 91.322 121.575
## LC40 113.589 97.945 122.620
## LC50 117.770 104.238 124.653
or in a plot:
plot(fit, log.scale = TRUE, ci = TRUE, style = "ggplot",
addlegend = TRUE)
Note that survFitTT
will warn you if the estimated \(LC_{50}\) lies outside the range of tested concentrations:
data("cadmium1")
fit <- survFitTT(survData(cadmium1),
lcx = c(10, 20, 30, 40, 50))
## Warning in survFitTT(survData(cadmium1), lcx = c(10, 20, 30, 40, 50)): The
## LC50 estimation lies outsides the range of tested concentration and may be
## unreliable !
plot(fit, log.scale = TRUE, ci = TRUE, style = "ggplot",
addlegend = TRUE)
In this example, the experimental design did not include sufficiently high concentrations, and we are missing measurements that would have a major influence on the final estimation. For this reason this result should be considered unreliable.
The steps in reproduction data analysis are absolutely analogous to what we described for survival data. This time the goal is to estimate the relation between pollutant concentration and reproduction rate per individual-day.
Here is a typical session:
# (1) load dataset
data(cadmium2)
# (2) check structure and integrity of the dataset
reproDataCheck(cadmium2, diagnosis.plot = TRUE)
## No error detected.
# (3) create a `reproData` object
dat <- reproData(cadmium2)
# (4) represent the cumulated number of offspring as a function time
plot(dat, style = "ggplot", pool.replicate = FALSE)
plot(dat, target.time = 21, addlegend = TRUE, style = "ggplot",
pool.replicate = FALSE)
plot(dat, concentration = 124, addlegend = TRUE, style = "ggplot",
pool.replicate = FALSE)
plot(dat, concentration = 124, target.time = 21, style = "ggplot")
# (5) check information on the experimental design
summary(dat)
## Number of datapoints per concentration:
##
## 0 53 78 124 232 284
## 102 102 102 102 102 102
##
## Number of datapoints per time:
##
## 0 3 7 10 14 17 21 24 28 31 35 38 42 45 49 52 56
## 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36
##
## Number of replicate per time and concentration:
## time
## conc 0 3 7 10 14 17 21 24 28 31 35 38 42 45 49 52 56
## 0 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
## 53 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
## 78 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
## 124 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
## 232 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
## 284 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
##
## Number of survival (sum of replicate) per time and concentration:
## 0 3 7 10 14 17 21 24 28 31 35 38 42 45 49 52 56
## 0 30 30 30 30 29 29 29 29 29 28 28 28 28 28 28 28 28
## 53 30 30 29 29 29 29 29 29 29 29 28 28 28 28 28 28 28
## 78 30 30 30 30 30 30 29 29 29 29 29 29 29 29 29 27 27
## 124 30 30 30 30 30 29 28 28 27 26 25 23 21 18 11 11 9
## 232 30 30 30 22 18 18 17 14 13 12 8 4 3 1 0 0 0
## 284 30 30 15 7 4 4 4 2 2 1 1 1 1 1 1 0 0
##
## Number of offspring (sum of replicate) per time and concentration:
## 0 3 7 10 14 17 21 24 28 31 35 38 42 45 49 52 56
## 0 0 29 25 33 24 32 29 29 22 38 29 39 23 30 18 33 34
## 53 0 20 26 27 27 25 23 27 22 28 20 28 12 20 26 21 20
## 78 0 17 31 28 24 23 24 15 21 22 23 27 19 24 36 32 26
## 124 0 14 32 24 9 6 9 10 12 12 16 16 12 7 14 7 8
## 232 0 5 20 6 2 0 5 1 5 4 2 3 0 0 0 0 0
## 284 0 2 6 1 1 0 0 0 0 0 0 0 0 0 0 0 0
# (6) fit an exposure-response model at target-time
fit <- reproFitTT(dat, stoc.part = "bestfit",
ecx = c(10, 20, 30, 40, 50),
quiet = TRUE)
plot(fit, log.scale = TRUE, ci = TRUE,
style = "ggplot", addlegend = TRUE)
As in survival analysis, we assume that the reproduction rate per individual-day is a log-logistic function of the concentration. More details and parameter signification can be found in the Modeling vignette.
For reproduction analysis, we consider one model which neglects inter-individual variability (named “Poisson”) and another one which takes it into account (named “gamma Poisson”). You can choose either one using the option stoc.part
, but by setting it to “bestfit”, you let reproFitTT
decide which models fits the data best. The choice can be seen by calling the summary
function:
summary(fit)
## Summary:
##
## The loglogistic model with a Poisson stochastic part was used !
##
## Quantiles of priors on parameters:
##
## 50% 2.5% 97.5%
## b 1.000 0.013 79.433
## d 0.290 0.262 0.319
## e 148.835 79.015 280.350
##
## Quantiles of posteriors on parameters:
##
## 50% 2.5% 97.5%
## b 2.282 1.791 2.942
## d 0.281 0.256 0.307
## e 140.929 122.296 161.612
##
## Quantiles of the estimated ECx:
##
## 50% 2.5% 97.5%
## EC10 53.725 37.550 73.696
## EC20 76.673 58.591 97.947
## EC30 97.149 78.395 118.504
## EC40 117.990 99.150 138.968
## EC50 140.929 122.296 161.612
When the gamma Poisson model is selected, the summary shows an additional parameter called omega
, which quantifies the inter-individual variability (the higher omega
the higher variability).
In MORSE, reproduction datasets are a special case of survival datasets: a reproduction dataset includes the same information than a survival dataset plus the information on reproduction outputs. For that reason the S3 class reproData
inherits from the class survData
, which means that any operation on a survData
object is legal on a reproData
object. In particular, to use the plot function related to survival analysis on reproData
object, we can use the as.survData
conversion function:
dat <- reproData(cadmium2)
plot(as.survData(dat))