The Stochastic Process Model (SPM) was developed several decades ago (Woodbury and Manton 1977; A. I. Yashin, Arbeev, Akushevich, et al. 2007) and has been applied to the analysis of clinical, demographic and epidemiologic longitudinal data, as well as in many other studies that relate the stochastic dynamics of repeated measures to the probability of end-points (outcomes). The SPM links the dynamics of stochastic variables to a hazard rate that is a quadratic function of the state variables (A. I. Yashin, Arbeev, Akushevich, et al. 2007). The R package “stpm” is a set of utilities for estimating the parameters of stochastic processes and for modeling survival trajectories and time-to-event outcomes observed in longitudinal studies. It is a general framework for studying and modeling survival (censored) traits that depend on random trajectories (stochastic paths) of variables.
# Install the released version from CRAN
install.packages("stpm")
# Or install from GitHub (requires the devtools package)
require(devtools)
devtools::install_github("izhbannikov/stpm")
The data are typical longitudinal data in the form of two datasets: a longitudinal dataset (follow-up studies), in which one record represents a single observation, and vital (survival) statistics, in which one record holds all information about a subject. The longitudinal dataset can contain a subject ID (identification number), status (event (1) / censored (0)), time, and measurements of the variables.
Below is an example of clinical data that can be used in stpm; the fields are discussed later.
Longitudinal table:
## ID IndicatorDeath Age DBP BMI
## 1 1 0 30 80.00000 25.00000
## 2 1 0 32 80.51659 26.61245
## 3 1 0 34 77.78412 29.16790
## 4 1 0 36 77.86665 32.40359
## 5 1 0 38 96.55673 31.92014
## 6 1 0 40 94.48616 32.89139
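The package accepts longitudinal data in two formats: “short” and “long”. The first listing below is in the short format (one row per observation, with a single time column t); the second is in the long format (one row per pair of consecutive observations, with columns t1, t2, y1 and y1.next).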
## id xi t y1
## 1 1 0 30 79.80497
## 2 1 0 31 90.15730
## 3 1 0 32 88.54718
## 4 1 0 33 92.39700
## 5 1 0 34 88.81260
## 6 1 0 35 90.23328
## id xi t1 t2 y1 y1.next
## 1 1 0 30 31 76.59921 77.82783
## 2 1 0 31 32 77.82783 86.88414
## 3 1 0 32 33 86.88414 87.98383
## 4 1 0 33 34 87.98383 87.09486
## 5 1 0 34 35 87.09486 91.61453
## 6 1 0 35 36 91.61453 94.52103
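The relation between the two formats can be sketched in base R. The helper `short_to_long` below is hypothetical (it is not part of stpm), and the choice to carry the status `xi` of the later observation into each row is an assumption made for illustration only.

```r
# Hypothetical helper (not part of stpm): pair each observation with the
# next one recorded for the same subject.
short_to_long <- function(d) {
  pieces <- lapply(split(d, d$id), function(s) {
    s <- s[order(s$t), ]
    n <- nrow(s)
    if (n < 2) return(NULL)            # a single observation yields no pair
    data.frame(id      = s$id[-n],
               xi      = s$xi[-1],     # assumption: status at the end of the interval
               t1      = s$t[-n],
               t2      = s$t[-1],
               y1      = s$y1[-n],
               y1.next = s$y1[-1])
  })
  out <- do.call(rbind, Filter(Negate(is.null), pieces))
  rownames(out) <- NULL
  out
}

# Toy short-format data: two subjects, one covariate
short <- data.frame(id = c(1, 1, 1, 2, 2),
                    xi = c(0, 0, 1, 0, 0),
                    t  = c(30, 31, 32, 40, 41),
                    y1 = c(79.8, 90.2, 88.5, 70.1, 71.3))
long <- short_to_long(short)
print(long)
```

Each subject with n observations contributes n - 1 rows, so the long table is slightly shorter than the short one.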
There are two main SPM types in the package: a discrete-time model (Akushevich, Kulminski, and Manton 2005) and a continuous-time model (A. I. Yashin, Arbeev, Akushevich, et al. 2007). The discrete model assumes equal intervals between follow-up observations. An example of a discrete dataset is given below.
library(stpm)
data <- simdata_discr(N=10) # simulate data for 10 individuals, "long" format (default)
head(data)
## id xi t1 t2 y1 y1.next
## 1 1 0 30 31 67.62051 63.64435
## 2 1 0 31 32 63.64435 63.53720
## 3 1 0 32 33 63.53720 66.78659
## 4 1 0 33 34 66.78659 69.87201
## 5 1 0 34 35 69.87201 71.33571
## 6 1 0 35 36 71.33571 68.71782
In this case there are equal intervals between \(t_1\) and \(t_2\).
In the continuous-time SPM, the intervals between observations need not be equal (they can be arbitrary or random). An example of such a dataset is shown below:
library(stpm)
data <- simdata_cont(N=5, format="short") # simulate data for 5 individuals, "short" format
head(data)
## id xi t y1
## 1 0 0 33.04076 79.44746
## 2 0 0 34.81608 72.25014
## 3 0 0 36.62824 55.72658
## 4 0 0 38.36642 57.63615
## 5 0 0 40.06906 53.31191
## 6 0 0 41.70729 59.47700
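A quick way to tell which of the two situations applies to a long-format table is to compare the interval lengths. The helper `has_equal_intervals` is hypothetical (not an stpm function), and the two toy tables only mimic the time columns shown above.

```r
# Hypothetical helper (not part of stpm): TRUE when every interval t2 - t1
# in a long-format table has the same length, up to a numerical tolerance.
has_equal_intervals <- function(d, tol = 1e-8) {
  steps <- d$t2 - d$t1
  all(abs(steps - steps[1]) < tol)
}

discr <- data.frame(t1 = c(30, 31, 32), t2 = c(31, 32, 33))
cont  <- data.frame(t1 = c(33.04, 34.82, 36.63), t2 = c(34.82, 36.63, 38.37))

has_equal_intervals(discr)   # equal steps: suitable for the discrete model
has_equal_intervals(cont)    # unequal steps: continuous-time model needed
```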
The discrete model assumes fixed time intervals between consecutive observations. In this model, \(\mathbf{Y}(t)\) (a \(k \times 1\) matrix of the values of covariates, where \(k\) is the number of considered covariates) and \(\mu(t, \mathbf{Y}(t))\) (the hazard rate) have the following form:
\(\mathbf{Y}(t+1) = \mathbf{u} + \mathbf{R} \mathbf{Y}(t) + \mathbf{\epsilon}\)
\(\mu (t, \mathbf{Y}(t)) = [\mu_0 + \mathbf{b} \mathbf{Y}(t) + \mathbf{Y}(t)^* \mathbf{Q} \mathbf{Y}(t)] e^{\theta t}\)
Coefficients \(\mathbf{u}\) (a \(k \times 1\) matrix, where \(k\) is the number of covariates), \(\mathbf{R}\) (a \(k \times k\) matrix), \(\mu_0\), \(\mathbf{b}\) (a \(1 \times k\) matrix) and \(\mathbf{Q}\) (a \(k \times k\) matrix) are assumed to be constant in the implementation of this model in the R package stpm. \(\mathbf{\epsilon}\) is a \(k \times 1\) matrix of normally distributed random residuals. The symbol ’*’ denotes the transpose operation. \(\theta\) is a parameter to be estimated along with the other parameters (\(\mathbf{u}\), \(\mathbf{R}\), \(\mu_0\), \(\mathbf{b}\), \(\mathbf{Q}\)).
library(stpm)
#Data simulation (100 individuals)
data <- simdata_discr(N=100)
#Estimation of parameters
pars <- spm_discrete(data)
pars
## $dmodel
## $dmodel$theta
## [1] 0.069
##
## $dmodel$mu0
## [1] 0.0001770801246
##
## $dmodel$b
## [1] -3.574715964e-06
##
## $dmodel$Q
## [,1]
## [1,] 2.121578072e-08
##
## $dmodel$u
## [1] 3.675386047
##
## $dmodel$u.std.err
## (Intercept)
## 0.3246913866
##
## $dmodel$R
## [,1]
## [1,] 0.9552291843
##
## $dmodel$R.std.err
## y1_1
## [1,] 0.003919128066
##
## $dmodel$Sigma
## [1] 4.956116391
##
##
## $cmodel
## $cmodel$a
## [,1]
## [1,] -0.04477081567
##
## $cmodel$f1
## [,1]
## [1,] 82.09334569
##
## $cmodel$Q
## [,1]
## [1,] 2.121578072e-08
##
## $cmodel$f
## [,1]
## [1,] 84.24662781
##
## $cmodel$b
## [,1]
## [1,] 4.956116391
##
## $cmodel$mu0
## [,1]
## [1,] 2.650124189e-05
##
## $cmodel$theta
## [1] 0.069
##
##
## attr(,"class")
## [1] "spm.discrete"
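To make the two discrete-time equations concrete, the following base R sketch simulates one trajectory for a single covariate and evaluates the hazard along it. All numeric values are illustrative and are not package defaults. The final `lm()` call is only an assumption about how \(\mathbf{u}\) and \(\mathbf{R}\) could be recovered (the `(Intercept)` label in the output above hints at a linear regression); it is not presented as the package's estimation routine.

```r
set.seed(1)

# Illustrative values for k = 1 (not package defaults)
u <- 4; R <- 0.95; sigma <- 5
mu0 <- 2e-4; b <- -3e-6; Q <- 2e-8; theta <- 0.08

ages <- 30:104
y <- numeric(length(ages))
y[1] <- 80

# Y(t+1) = u + R * Y(t) + epsilon
for (i in seq_along(ages)[-1]) {
  y[i] <- u + R * y[i - 1] + rnorm(1, mean = 0, sd = sigma)
}

# mu(t, Y(t)) = [mu0 + b * Y(t) + Y(t) * Q * Y(t)] * exp(theta * t)
hazard <- (mu0 + b * y + Q * y^2) * exp(theta * ages)

# Regress Y(t+1) on Y(t): intercept ~ u, slope ~ R (assumption, see lead-in)
pairs <- data.frame(y_prev = y[-length(y)], y_next = y[-1])
fit <- lm(y_next ~ y_prev, data = pairs)
print(coef(fit))
```

With these values the bracketed quadratic stays positive for any covariate value, because \(\mu_0 > b^2 / (4Q)\).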
In the specification of the SPM described in the 2007 paper by Yashin and colleagues (A. I. Yashin, Arbeev, Akushevich, et al. 2007), the stochastic differential equation describing the age dynamics of a covariate is:
\(d\mathbf{Y}(t)= \mathbf{a}(t)(\mathbf{Y}(t) -\mathbf{f}_1(t))dt + \mathbf{b}(t)d\mathbf{W}(t), \mathbf{Y}(t=t_0)\)
In this equation, \(\mathbf{Y}(t)\) (a \(k \times 1\) matrix) holds the values of the covariates at time (age) \(t\). \(\mathbf{f}_1(t)\) (a \(k \times 1\) matrix) corresponds to the long-term mean value of the stochastic process \(\mathbf{Y}(t)\), which describes the trajectory of an individual covariate influenced by different factors represented by a random Wiener process \(\mathbf{W}(t)\). The coefficient \(\mathbf{a}(t)\) (a \(k \times k\) matrix) is a negative feedback coefficient, which characterizes the rate at which the process reverts to its mean. In research on aging, \(\mathbf{f}_1(t)\) represents the mean allostatic trajectory and \(\mathbf{a}(t)\) represents the adaptive capacity of the organism. The coefficient \(\mathbf{b}(t)\) (a \(k \times 1\) matrix) characterizes the strength of the random disturbances from the Wiener process \(\mathbf{W}(t)\).
The following function \(\mu(t, \mathbf{Y}(t))\) represents a hazard rate:
\(\mu(t, \mathbf{Y}(t)) = \mu_0(t) + (\mathbf{Y}(t) - \mathbf{f}(t))^* \mathbf{Q}(t) (\mathbf{Y}(t) - \mathbf{f}(t))\)
Here \(\mu_0(t)\) is the baseline hazard, which represents the risk when \(\mathbf{Y}(t)\) follows its optimal trajectory; \(\mathbf{f}(t)\) (a \(k \times 1\) matrix) represents the optimal trajectory that minimizes the risk; and \(\mathbf{Q}(t)\) (a \(k \times k\) matrix) represents the sensitivity of the risk function to deviations from the norm.
library(stpm)
#Simulate some data for 50 individuals
data <- simdata_cont(N=50)
head(data)
## id xi t1 t2 y1 y1.next
## 1 0 0 36.11835785 37.36292266 80.75909693 83.91585041
## 2 0 0 37.36292266 39.08122103 83.91585041 84.43405609
## 3 0 0 39.08122103 40.35419687 84.43405609 78.13721865
## 4 0 0 40.35419687 41.99380399 78.13721865 69.75343735
## 5 0 0 41.99380399 43.83044550 69.75343735 68.74444605
## 6 0 0 43.83044550 45.26412573 68.74444605 65.40429121
#Estimate parameters
# a=-0.05, f1=80, Q=2e-8, f=80, b=5, mu0=2e-5, theta=0.08 are starting values for estimation procedure
pars <- spm_continuous(dat=data,a=-0.05, f1=80, Q=2e-8, f=80, b=5, mu0=2e-5, theta=0.08)
pars
## $a
## [,1]
## [1,] -0.04875243195
##
## $f1
## [,1]
## [1,] 77.69279947
##
## $Q
## [,1]
## [1,] 2.187677427e-08
##
## $f
## [,1]
## [1,] 87.95794529
##
## $b
## [,1]
## [1,] 4.837217019
##
## $mu0
## [1] 2.19635437e-05
##
## $theta
## [1] 0.08085888184
##
## $status
## [1] 5
##
## $LogLik
## [1] -6834.570729
##
## $objective
## [1] 6834.548118
##
## $message
## [1] "NLOPT_MAXEVAL_REACHED: Optimization stopped because maxeval (above) was reached."
##
## $limit
## [1] FALSE
##
## attr(,"class")
## [1] "spm.continuous"
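The stochastic differential equation and the quadratic hazard of the continuous-time model can be explored without the package. The sketch below uses a plain Euler-Maruyama scheme for one covariate with constant coefficients. Two things are assumptions made for illustration: the numeric values, and the time dependence \(\mu_0(t) = \mu_0 e^{\theta t}\), \(Q(t) = Q e^{\theta t}\), which is one possible choice and is not claimed to be what stpm does internally.

```r
set.seed(42)

# Illustrative constants for k = 1
a <- -0.05; f1 <- 80; b <- 5                  # dynamics
mu0 <- 2e-5; Q <- 2e-8; f <- 80; theta <- 0.08  # hazard

dt <- 0.1
t  <- seq(30, 105, by = dt)
y  <- numeric(length(t))
y[1] <- 80

# dY = a * (Y - f1) * dt + b * dW, with dW ~ N(0, dt)
for (i in seq_along(t)[-1]) {
  dW   <- rnorm(1, mean = 0, sd = sqrt(dt))
  y[i] <- y[i - 1] + a * (y[i - 1] - f1) * dt + b * dW
}

# Hazard along the path (assumed exp(theta * t) scaling, see lead-in)
mu <- (mu0 + Q * (y - f)^2) * exp(theta * t)

# Survival probability implied by the cumulative hazard (Riemann sum)
surv <- exp(-cumsum(mu * dt))
range(surv)
```

Because `a` is negative, the drift pulls the trajectory back toward `f1`, while the hazard grows with the squared distance of the trajectory from `f`.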
The coefficients of the continuous- and discrete-time models are related as follows (‘c’ and ‘d’ denote the continuous- and discrete-time models, respectively). Note that these equations can be used only if the intervals between consecutive observations are equal in both models; it is also required that the matrices \(\mathbf{a}_c\) and \(\mathbf{Q}_{c,d}\) be full-rank:
\(\mathbf{Q}_c = \mathbf{Q}_d\)
\(\mathbf{a}_c = \mathbf{R}_d - I(k)\)
\(\mathbf{b}_c = \mathbf{\Sigma}\)
\({\mathbf{f}_1}_c = -\mathbf{a}_c^{-1} \times \mathbf{u}_d\)
\(\mathbf{f}_c = -0.5 \mathbf{b}_d \times \mathbf{Q}^{-1}_d\)
\({\mu_0}_c = {\mu _0}_d - \mathbf{f}_c \times \mathbf{Q_c} \times \mathbf{f}_c^*\)
\(\theta_c = \theta_d\)
where \(k\) is the number of covariates, which is equal to the model’s dimension, and ’*’ denotes the transpose operation; \(\mathbf{\Sigma}\) is a \(k \times 1\) matrix that contains the standard deviations (s.d.) of the corresponding residuals (the residuals of the linear regression \(\mathbf{Y}(t+1) = \mathbf{u} + \mathbf{R}\mathbf{Y}(t) + \mathbf{\epsilon}\)); \(I(k)\) is the \(k \times k\) identity matrix.
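As a worked check, the conversion can be applied to the `$dmodel` estimates printed in the discrete-model example earlier. In one dimension the expressions for \(\mathbf{f}_c\) and \({\mu_0}_c\) follow from completing the square: \(\mu_0 + bY + QY^2 = (\mu_0 - Qf^2) + Q(Y - f)^2\) with \(f = -b/(2Q)\). The variable names below are chosen for this sketch only.

```r
k <- 1

# Discrete-time estimates copied from the $dmodel output shown earlier
u_d     <- matrix(3.675386047, k, 1)
R_d     <- matrix(0.9552291843, k, k)
Sigma   <- matrix(4.956116391, k, 1)
b_d     <- matrix(-3.574715964e-06, 1, k)
Q_d     <- matrix(2.121578072e-08, k, k)
mu0_d   <- 0.0001770801246
theta_d <- 0.069

# Conversion to continuous-time coefficients
Q_c     <- Q_d
a_c     <- R_d - diag(k)
b_c     <- Sigma
f1_c    <- -solve(a_c) %*% u_d
f_c     <- -0.5 * b_d %*% solve(Q_d)
mu0_c   <- mu0_d - f_c %*% Q_c %*% t(f_c)
theta_c <- theta_d

print(c(a = a_c, f1 = f1_c, f = f_c, mu0 = mu0_c))
```

Up to rounding of the printed inputs, these values should agree with the `$cmodel` block of the same output (`a`, `f1`, `f` and `mu0`).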
In the previous models, the time dependence of the coefficients was limited to a fixed form: they were multiplied by \(e^{\theta t}\). In general, this may not be the case (A. I. Yashin, Arbeev, Kulminski, et al. 2007). The time-dependent model extends this to the general case, in which a coefficient can be a function of time; for example (in the one-dimensional case):
\(a(t) = \mathrm{par}_1 t + \mathrm{par}_2\), a linear function.
The corresponding equations are the same as in the one-dimensional continuous case described above.
library(stpm)
#Data preparation:
n <- 10
data <- simdata_time_dep(N=n)
# Estimation:
opt.par <- spm_time_dep(data,
start = list(a = -0.05, f1 = 80, Q = 2e-08, f = 80, b = 5, mu0 = 0.001),
frm = list(at = "a", f1t = "f1", Qt = "Q", ft = "f", bt = "b", mu0t= "mu0"))
opt.par
## $a
## [1] -0.05
##
## $f1
## [1] 80
##
## $Q
## [1] 2e-08
##
## $f
## [1] 80
##
## $b
## [1] 5
##
## $mu0
## [1] 0.001
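The linear form \(a(t) = \mathrm{par}_1 t + \mathrm{par}_2\) mentioned above can be tried out in a small base R simulation. The function `a_t` and its parameter values are hypothetical and chosen only so that the feedback coefficient stays negative over the whole age range; nothing here calls stpm.

```r
set.seed(7)

# Hypothetical time-dependent feedback coefficient a(t) = par1 * t + par2
a_t <- function(t, par1 = 0.001, par2 = -0.12) par1 * t + par2

f1 <- 80; b <- 5; dt <- 1
t  <- seq(30, 105, by = dt)
y  <- numeric(length(t))
y[1] <- 80

# Euler-Maruyama step with a(t) evaluated at the start of each interval
for (i in seq_along(t)[-1]) {
  y[i] <- y[i - 1] + a_t(t[i - 1]) * (y[i - 1] - f1) * dt + b * sqrt(dt) * rnorm(1)
}

range(a_t(t))   # feedback weakens with age but remains negative
```

With these values the pull toward `f1` weakens as age increases, a simple way to mimic declining adaptive capacity.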
Lower and upper boundaries can be set with the parameters \(lb\) and \(ub\), which are plain numeric vectors. Note: the lengths of \(lb\) and \(ub\) must equal the total number of parameters. Lower and upper boundaries can be set for the continuous-time and time-dependent models only.
Below is an example of setting \(lb\) and \(ub\) for a single covariate:
library(stpm)
data <- simdata_cont(N=10, ystart = 80, a = -0.1, Q = 1e-06, mu0 = 1e-5, theta = 0.08, f1 = 80, f=80, b=1, dt=1, sd0=5)
ans <- spm_continuous(dat=data,
a = -0.1,
f1 = 82,
Q = 1.4e-6,
f = 77,
b = 1,
mu0 = 1.6e-5,
theta = 0.1,
stopifbound = FALSE,
lb=c(-0.2, 60, 0.1e-6, 60, 0.1, 0.1e-5, 0.01),
ub=c(0, 140, 5e-06, 140, 3, 5e-5, 0.20))
ans
## $a
## [,1]
## [1,] -0.1099994386
##
## $f1
## [,1]
## [1,] 80.05727523
##
## $Q
## [,1]
## [1,] 4.658323922e-06
##
## $f
## [,1]
## [1,] 106.4963117
##
## $b
## [,1]
## [1,] 1.043646423
##
## $mu0
## [1] 4.303398937e-05
##
## $theta
## [1] 0.1548749843
##
## $status
## [1] 5
##
## $LogLik
## [1] -715.5976063
##
## $objective
## [1] 715.5023754
##
## $message
## [1] "NLOPT_MAXEVAL_REACHED: Optimization stopped because maxeval (above) was reached."
##
## $limit
## [1] FALSE
##
## attr(,"class")
## [1] "spm.continuous"
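Before calling the optimizer it can help to line up the bounds with the starting values. The check below reuses the numbers from the call above; the parameter order a, f1, Q, f, b, mu0, theta is inferred from those values and should be treated as an assumption.

```r
# Starting values and bounds copied from the spm_continuous() call above
start <- c(a = -0.1, f1 = 82, Q = 1.4e-6, f = 77, b = 1, mu0 = 1.6e-5, theta = 0.1)
lb <- c(-0.2, 60, 0.1e-6, 60, 0.1, 0.1e-5, 0.01)
ub <- c(0, 140, 5e-06, 140, 3, 5e-5, 0.20)

# Assumed ordering: same as the names of 'start'
names(lb) <- names(ub) <- names(start)

inside <- lb <= start & start <= ub
print(cbind(lb, start, ub, inside))

# Fail early if the lengths differ or a starting value lies outside its bounds
stopifnot(length(lb) == length(start),
          length(ub) == length(start),
          all(inside))
```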
This is an example for two physiological variables (covariates).
library(stpm)
data <- simdata_cont(N=10,
                     a=matrix(c(-0.1, 0.001, 0.001, -0.1), nrow = 2, ncol = 2, byrow = TRUE),
                     f1=t(matrix(c(100, 200), nrow = 2, ncol = 1, byrow = FALSE)),
                     Q=matrix(c(1e-06, 1e-7, 1e-7, 1e-06), nrow = 2, ncol = 2, byrow = TRUE),
                     f=t(matrix(c(100, 200), nrow = 2, ncol = 1, byrow = FALSE)),
                     b=matrix(c(1, 2), nrow = 2, ncol = 1, byrow = FALSE),
                     mu0=1e-4,
                     theta=0.08,
                     ystart = c(100,200), sd0=c(5, 10), dt=1)
# Starting values for the estimation
a.d <- matrix(c(-0.15, 0.002, 0.002, -0.15), nrow = 2, ncol = 2, byrow = TRUE)
f1.d <- t(matrix(c(95, 195), nrow = 2, ncol = 1, byrow = FALSE))
Q.d <- matrix(c(1.2e-06, 1.2e-7, 1.2e-7, 1.2e-06), nrow = 2, ncol = 2, byrow = TRUE)
f.d <- t(matrix(c(105, 205), nrow = 2, ncol = 1, byrow = FALSE))
b.d <- matrix(c(1, 2), nrow = 2, ncol = 1, byrow = FALSE)
mu0.d <- 1.1e-4
theta.d <- 0.07
ans <- spm_continuous(dat=data,
a = a.d,
f1 = f1.d,
Q = Q.d,
f = f.d,
b = b.d,
mu0 = mu0.d,
theta = theta.d,
lb=c(-0.5, ifelse(a.d[2,1] > 0, a.d[2,1]-0.5*a.d[2,1], a.d[2,1]+0.5*a.d[2,1]), ifelse(a.d[1,2] > 0, a.d[1,2]-0.5*a.d[1,2], a.d[1,2]+0.5*a.d[1,2]), -0.5,
80, 100,
Q.d[1,1]-0.5*Q.d[1,1], ifelse(Q.d[2,1] > 0, Q.d[2,1]-0.5*Q.d[2,1], Q.d[2,1]+0.5*Q.d[2,1]), ifelse(Q.d[1,2] > 0, Q.d[1,2]-0.5*Q.d[1,2], Q.d[1,2]+0.5*Q.d[1,2]), Q.d[2,2]-0.5*Q.d[2,2],
80, 100,
0.1, 0.5,
0.1e-4,
0.01),
ub=c(-0.08, 0.002, 0.002, -0.08,
110, 220,
Q.d[1,1]+0.1*Q.d[1,1], ifelse(Q.d[2,1] > 0, Q.d[2,1]+0.1*Q.d[2,1], Q.d[2,1]-0.1*Q.d[2,1]), ifelse(Q.d[1,2] > 0, Q.d[1,2]+0.1*Q.d[1,2], Q.d[1,2]-0.1*Q.d[1,2]), Q.d[2,2]+0.1*Q.d[2,2],
110, 220,
1.5, 2.5,
1.2e-4,
0.10))
ans
## $a
## [,1] [,2]
## [1,] -0.150267163523 0.001692834917
## [2,] 0.001979205983 -0.148012155129
##
## $f1
## [,1]
## [1,] 105.2777868
## [2,] 195.2059583
##
## $Q
## [,1] [,2]
## [1,] 1.302225923e-06 1.305236854e-07
## [2,] 1.299338548e-07 1.285194449e-06
##
## $f
## [,1]
## [1,] 107.4474372
## [2,] 210.4295117
##
## $b
## [,1]
## [1,] 1.119337586
## [2,] 1.958657445
##
## $mu0
## [1] 0.0001124533236
##
## $theta
## [1] 0.073700401
##
## $status
## [1] 5
##
## $LogLik
## [1] 1513.260586
##
## $objective
## [1] -1943.966319
##
## $message
## [1] "NLOPT_MAXEVAL_REACHED: Optimization stopped because maxeval (above) was reached."
##
## $limit
## [1] FALSE
##
## attr(,"class")
## [1] "spm.continuous"
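The repeated `ifelse()` expressions in the call above all compute "starting value minus or plus a fraction of its absolute value". A small helper can express that once. `bounds_around` is hypothetical (not part of stpm). The call above lists `a.d[2,1]` before `a.d[1,2]`, which matches R's column-by-column flattening of a matrix, so `as.vector()` is used here.

```r
# Hypothetical helper (not part of stpm): bounds at x -/+ a fraction of |x|
bounds_around <- function(x, frac_lb, frac_ub) {
  x <- as.vector(x)   # matrices are flattened column by column
  list(lb = x - frac_lb * abs(x),
       ub = x + frac_ub * abs(x))
}

Q.d <- matrix(c(1.2e-06, 1.2e-7, 1.2e-7, 1.2e-06), nrow = 2, ncol = 2, byrow = TRUE)
bq  <- bounds_around(Q.d, frac_lb = 0.5, frac_ub = 0.1)

# Same value as the ifelse() expression used for Q.d[2,1] in the lb vector above
manual_lb <- ifelse(Q.d[2,1] > 0, Q.d[2,1] - 0.5 * Q.d[2,1], Q.d[2,1] + 0.5 * Q.d[2,1])
isTRUE(all.equal(bq$lb[2], manual_lb))
```

The pieces `bq$lb` and `bq$ub` could then be concatenated with the bounds of the other parameters in the order used in the call above.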
The time-dependent model uses only one covariate, so setting up the boundaries is straightforward:
n <- 10
data <- simdata_time_dep(N=n)
# Estimation:
opt.par <- spm_time_dep(data, start=list(a=-0.05, f1=80, Q=2e-08, f=80, b=5, mu0=0.001),
lb=c(-1, 30, 1e-8, 30, 1, 1e-6), ub=c(0, 120, 5e-8, 130, 10, 1e-2))
opt.par
## $a
## [1] -0.05
##
## $f1
## [1] 80
##
## $Q
## [1] 2e-08
##
## $f
## [1] 80
##
## $b
## [1] 5
##
## $mu0
## [1] 0.001
Suppose you want one of the parameter functions to be equal to zero, for example \(f=0\). Let’s emulate this case:
library(stpm)
n <- 10
data <- simdata_time_dep(N=n)
# Estimation:
opt.par <- spm_time_dep(data, frm = list(at="a", f1t="f1", Qt="Q", ft="0", bt="b", mu0t="mu0"))
opt.par
## $a
## [1] -0.05
##
## $f1
## [1] 80
##
## $Q
## [1] 2e-08
##
## $b
## [1] 80
##
## $mu0
## [1] 5
##
## $<NA>
## <NA>
## NA
As you can see, there is no parameter \(f\) in \(opt.par\). This is because we set \(f=0\) in \(frm\)!
Then, if you want to set constraints, you must not specify a starting value (parameter \(start\)) or \(lb\)/\(ub\) entries for the parameter \(f\) (otherwise, the function raises an error):
n <- 10
data <- simdata_time_dep(N=n)
# Estimation (no starting value or bounds for f):
opt.par <- spm_time_dep(data, frm = list(at="a", f1t="f1", Qt="Q", ft="0", bt="b", mu0t="mu0"),
start=list(a=-0.05, f1=80, Q=2e-08, b=5, mu0=0.001),
lb=c(-1, 30, 1e-8, 1, 1e-6), ub=c(0, 120, 5e-8, 10, 1e-2))
opt.par
## $a
## [1] -0.05
##
## $f1
## [1] 80
##
## $Q
## [1] 2e-08
##
## $b
## [1] 5
##
## $mu0
## [1] 0.001
You can proceed in the same manner if you want two or more parameters to be equal to zero.
The function spm_con_1d(...) allows very fast parameter estimation for the one-dimensional model. It implements an analytical solution for estimating the parameters of the continuous SPM under the assumption that all parameters are constant. An example is given below.
library(stpm)
dat <- simdata_cont(N=500)
colnames(dat) <- c("id", "xi", "t1", "t2", "y", "y.next")
res <- spm_con_1d(as.data.frame(dat), a=-0.05, b=2, q=1e-8, f=80, f1=90, mu0=1e-3, theta=0.08)
## [1] "Initial values:"
## [1] -5e-02 2e+00 1e-08 8e+01 9e+01 1e-03 8e-02
## [1] "Lower bounds:"
## [1] -2.5e-01 2.0e-01 1.0e-10 4.0e+01 4.5e+01 1.0e-04 8.0e-02
## [1] "Upper bounds:"
## [1] -2.5e-03 1.0e+01 1.0e-07 1.6e+02 1.8e+02 1.0e-02 8.0e-02
res
## $est
## Coeff. Std. Err. z p value
## a -0.0493420758 NaN NaN NaN
## b 4.9794029870 NaN NaN NaN
## q 0.0000000001 NaN NaN NaN
## f 80.0000016471 NaN NaN NaN
## f1 80.6272218816 NaN NaN NaN
## mu0 0.0001000000 NaN NaN NaN
## theta 0.0800000000 NaN NaN NaN
##
## $hessian
## a b q f
## a 2.793748738e+05 6.195718914e+03 NaN -7.483745923e-07
## b 6.195718914e+03 1.702639077e+03 NaN 3.185468928e-11
## q NaN NaN NaN NaN
## f -7.483745923e-07 3.185468928e-11 NaN 5.658734792e-06
## f1 -7.076387136e+00 -1.578011636e-08 NaN 4.760882619e-08
## mu0 7.150983495e-01 2.440492388e-02 NaN 3.825948102e-04
## theta 1.113777122e-02 3.291746676e-04 NaN 6.860465881e-07
## f1 mu0 theta
## a -7.076387136e+00 7.150983495e-01 1.113777122e-02
## b -1.578011636e-08 2.440492388e-02 3.291746676e-04
## q NaN NaN NaN
## f 4.760882619e-08 3.825948102e-04 6.860465881e-07
## f1 3.107301573e+00 -1.748834381e-05 1.442931365e-07
## mu0 -1.748834381e-05 -2.232323244e+08 1.789815141e+09
## theta 1.442931365e-07 1.789815141e+09 1.646030562e+07
##
## $lik
## [1] 50154.32939
##
## $con
## [1] 4
##
## $message
## [1] "NLOPT_XTOL_REACHED: Optimization stopped because xtol_rel or xtol_abs (above) was reached."
One- and multi-dimensional simulation functions are provided to generate test data for hypothesis testing. The simulated data can be discrete (equal intervals between observations) or continuous (arbitrary intervals).
The function for discrete-time simulation is shown below (k is the number of variables (covariates), equal to the model’s dimension):
simdata_discr(N=100, a=-0.05, f1=80, Q=2e-8, f=80, b=5, mu0=1e-5, theta=0.08, ystart=80, tstart=30, tend=105, dt=1)
Here:
N - The number of individuals.
a - A k by k matrix that characterizes the rate of the adaptive response.
f1 - A particular state, which is a deviation from the normal (or optimal) state. This is a vector of length k.
Q - A k by k matrix, which is a non-negative-definite symmetric matrix.
f - A vector function (of length k) of the normal (or optimal) state.
b - A diffusion coefficient, a k by 1 matrix.
mu0 - Mortality at the start of the period (baseline hazard).
theta - A displacement coefficient of the Gompertz function.
ystart - A vector whose length equals the number of dimensions; defines the starting values of the covariates.
tstart - The start time (30 by default). Can be a single number or a vector of two numbers, c(a, b); in the latter case the starting time is drawn from a uniform(a, b) distribution.
tend - A number that defines the final time (105 by default).
dt - The time interval between observations.
This function returns a table of simulated data, as shown in the example below:
library(stpm)
data <- simdata_discr(N=10)
head(data)
## id xi t1 t2 y1 y1.next
## 1 1 0 30 31 77.38590799 75.04251562
## 2 1 0 31 32 75.04251562 74.07235572
## 3 1 0 32 33 74.07235572 68.24435253
## 4 1 0 33 34 68.24435253 77.27571868
## 5 1 0 34 35 77.27571868 69.71147161
## 6 1 0 35 36 69.71147161 71.48394836
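To show how a table of this shape can arise, here is a base R sketch that simulates one individual in discrete time, including the event indicator. It is not the code of `simdata_discr`: the function `simulate_one`, the death rule (probability \(1 - e^{-\mu \, dt}\) per interval), the hazard form and the `NA` placed in `y1.next` at death are all assumptions made for illustration.

```r
# Hypothetical simulator for one individual (not the stpm implementation)
simulate_one <- function(tstart = 30, tend = 105, dt = 1, ystart = 80,
                         u = 4, R = 0.95, sigma = 5,
                         mu0 = 1e-5, Q = 2e-8, f = 80, theta = 0.08) {
  t <- tstart
  y <- ystart
  rows <- list()
  while (t < tend) {
    # Quadratic hazard with Gompertz-type growth (assumed form)
    mu   <- (mu0 + Q * (y - f)^2) * exp(theta * t)
    died <- runif(1) < 1 - exp(-mu * dt)
    y.next <- if (died) NA_real_ else u + R * y + rnorm(1, sd = sigma)
    rows[[length(rows) + 1]] <- data.frame(xi = as.integer(died),
                                           t1 = t, t2 = t + dt,
                                           y1 = y, y1.next = y.next)
    if (died) break
    t <- t + dt
    y <- y.next
  }
  do.call(rbind, rows)
}

set.seed(3)
d <- simulate_one()
head(d)
```

Follow-up stops either at the event (the only row with `xi` equal to 1) or when `tend` is reached.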
The function for continuous-time simulation is shown below (k is the number of variables (covariates), equal to the model’s dimension):
simdata_cont(N=100, a=-0.05, f1=80, Q=2e-07, f=80, b=5, mu0=2e-05, theta=0.08, ystart=80, tstart=c(30,50), tend=105)
Here:
N - The number of individuals.
a - A k by k matrix that characterizes the rate of the adaptive response.
f1 - A particular state, which is a deviation from the normal (or optimal) state. This is a vector of length k.
Q - A k by k matrix, which is a non-negative-definite symmetric matrix.
f - A vector function (of length k) of the normal (or optimal) state.
b - A diffusion coefficient, a k by 1 matrix.
mu0 - Mortality at the start of the period (baseline hazard).
theta - A displacement coefficient of the Gompertz function.
ystart - A vector whose length equals the number of dimensions; defines the starting values of the covariates.
tstart - The start time. Can be a single number or a vector of two numbers, c(a, b), as in the signature above; in the latter case the starting time is drawn from a uniform(a, b) distribution.
tend - A number that defines the final time (105 by default).
This function returns a table of simulated data, as shown in the example below:
library(stpm)
data <- simdata_cont(N=10)
head(data)
## id xi t1 t2 y1 y1.next
## 1 0 0 30.05514743 31.75589530 81.21442985 67.81582863
## 2 0 0 31.75589530 33.64579207 67.81582863 63.71588584
## 3 0 0 33.64579207 34.90695192 63.71588584 63.14442946
## 4 0 0 34.90695192 36.63225290 63.14442946 57.15930218
## 5 0 0 36.63225290 38.00665299 57.15930218 65.27227064
## 6 0 0 38.00665299 39.29479353 65.27227064 63.79163312
The Stochastic Process Model has many applications in the analysis of longitudinal biodemographic data. Such data contain various physiological variables (known as covariates). The data can also contain genetic information, available for all or some of the participants. Taking advantage of both genetic and non-genetic information can provide further insights into a broad range of processes describing aging-related changes in the organism.
In this package, the SPM with partially observed covariates is implemented in the form of GenSPM (Genetic SPM), presented in (Arbeev et al. 2009) and further advanced in (Arbeev et al. 2014). GenSPM elaborates the basic stochastic process model by introducing a categorical variable, \(Z\), which may be a specific value of a genetic marker or, in general, any categorical variable. Currently, \(Z\) has two levels, 0 and 1, with \(P(Z=1) = p\), \(p \in [0, 1]\), where \(p\) is the proportion of carriers of an allele in the population. An example of longitudinal data with the genetic component \(Z\) is provided below.
library(stpm)
data <- sim_pobs(N=10)
head(data)
## id xi t1 t2 Z y1 y1.next
## 1 0 0 31.47900527 32.43165163 0 79.76114023 73.46059821
## 2 0 0 32.43165163 33.52554149 0 73.46059821 72.68718241
## 3 0 0 33.52554149 34.45189144 0 72.68718241 81.69941982
## 4 0 0 34.45189144 35.46995416 0 81.69941982 86.03949276
## 5 0 0 35.46995416 36.50914835 0 86.03949276 93.75475285
## 6 0 0 36.50914835 37.50860283 0 93.75475285 93.90007857
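The role of \(Z\) and \(p\) can be illustrated with a few lines of base R: each individual receives \(Z = 1\) with probability \(p\), and the model parameters are then picked according to \(Z\). The value of `p` and the two parameter sets are hypothetical, and only `a` and `f1` are shown to keep the sketch short.

```r
set.seed(11)

N <- 1000
p <- 0.25                              # hypothetical carrier proportion
Z <- rbinom(N, size = 1, prob = p)     # Z = 1 for carriers, 0 otherwise

# Hypothetical genotype-specific parameters (L: Z = 0, H: Z = 1)
pars <- list(L = c(a = -0.05, f1 = 80),
             H = c(a = -0.08, f1 = 75))

# Per-individual parameters selected by genotype
a_i  <- ifelse(Z == 1, pars$H["a"],  pars$L["a"])
f1_i <- ifelse(Z == 1, pars$H["f1"], pars$L["f1"])

mean(Z)                                # empirical carrier proportion, close to p
table(Z)
```

When \(Z\) is not observed for some individuals, their data form a mixture of the two groups with weight \(p\), which is why \(p\) appears among the estimated parameters.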
In the specification of the SPM described in 2007 paper by Yashin and colleagues (A. I. Yashin, Arbeev, Akushevich, et al. 2007) the stochastic differential equation describing the age dynamics of a physiological variable (a dynamic component of the model) is:
\(dY(t) = a(Z, t)(Y(t) - f_1(Z, t))dt + b(Z, t)dW(t), Y(t = t_0)\)
In this equation, \(Y(t)\) is a \(k \times 1\) matrix (where \(k\) is the number of covariates, i.e. the model dimension) describing the values of the physiological variables at time (e.g. age) \(t\). \(f_1(Z,t)\) is a \(k \times 1\) matrix that corresponds to the long-term average value of the stochastic process \(Y(t)\), which describes the trajectory of an individual variable influenced by different factors represented by a random Wiener process \(W(t)\). The negative feedback coefficient \(a(Z,t)\) (a \(k \times k\) matrix) characterizes the rate at which the stochastic process returns to its mean. In research on aging and well-being, \(f_1(Z,t)\) represents the average allostatic trajectory and \(a(Z,t)\) represents the adaptive capacity of the organism. The coefficient \(b(Z,t)\) (a \(k \times 1\) matrix) characterizes the strength of the random disturbances from the Wiener process \(W(t)\). All of these parameters depend on \(Z\) (a genetic marker with values 1 or 0). The following function \(\mu(t,Y(t))\) represents the hazard rate:
\(\mu(t,Y(t)) = \mu_0(t) + (Y(t) - f(Z, t))^*Q(Z, t)(Y(t) - f(Z, t))\)
In this equation, \(\mu_0(t)\) is the baseline hazard, which represents the risk when \(Y(t)\) follows its optimal trajectory; \(f(Z, t)\) (a \(k \times 1\) matrix) represents the optimal trajectory that minimizes the risk; and \(Q(Z, t)\) (a \(k \times k\) matrix) represents the sensitivity of the risk function to deviations from the norm. In general, the model coefficients \(a(Z, t)\), \(f_1(Z, t)\), \(Q(Z, t)\), \(f(Z, t)\), \(b(Z, t)\) and \(\mu_0(t)\) are time (age) dependent. Once the data are available, we can run the analysis, i.e. estimate the coefficients (here they are assumed to be time-independent, and the data are simulated):
library(stpm)
#Generating data:
data <- sim_pobs(N=10)
head(data)
## id xi t1 t2 Z y1 y1.next
## 1 0 0 42.96537413 43.93424901 0 81.09420838 87.32580403
## 2 0 0 43.93424901 44.95051132 0 87.32580403 91.10494024
## 3 0 0 44.95051132 46.04589845 0 91.10494024 89.11539922
## 4 0 0 46.04589845 47.09571553 0 89.11539922 80.72382896
## 5 0 0 47.09571553 48.16995502 0 80.72382896 86.41089536
## 6 0 0 48.16995502 49.09516177 0 86.41089536 89.64723957
#Parameters estimation:
pars <- spm_pobs(x=data)
pars
## $aH
## [,1]
## [1,] -0.05455152254
##
## $aL
## [,1]
## [1,] -0.01082694219
##
## $f1H
## [,1]
## [1,] 54.00349188
##
## $f1L
## [,1]
## [1,] 87.95079099
##
## $QH
## [,1]
## [1,] 2.099845839e-08
##
## $QL
## [,1]
## [1,] 2.679727583e-08
##
## $fH
## [,1]
## [1,] 55.74221877
##
## $fL
## [,1]
## [1,] 78.08969139
##
## $bH
## [,1]
## [1,] 3.953390685
##
## $bL
## [,1]
## [1,] 4.811547921
##
## $mu0H
## [1] 8.592411404e-06
##
## $mu0L
## [1] 9.000194352e-06
##
## $thetaH
## [1] 0.07219411681
##
## $thetaL
## [1] 0.09002514384
##
## $p
## [1] 0.2743863643
##
## $limit
## [1] FALSE
##
## attr(,"class")
## [1] "pobs.spm"
Here the suffixes H and L denote the parameters for \(Z = 1\) (H) and \(Z = 0\) (L).
library(stpm)
data.genetic <- sim_pobs(N=5, mode='observed')
head(data.genetic)
## id xi t1 t2 Z y1 y1.next
## 1 0 0 52.92633782 53.83660991 0 80.72067931 85.20371375
## 2 0 0 53.83660991 54.79346598 0 85.20371375 83.74273818
## 3 0 0 54.79346598 55.77058561 0 83.74273818 80.86897902
## 4 0 0 55.77058561 56.70478555 0 80.86897902 75.64129154
## 5 0 0 56.70478555 57.72601473 0 75.64129154 71.82635171
## 6 0 0 57.72601473 58.77020023 0 71.82635171 71.39442583
data.nongenetic <- sim_pobs(N=10, mode='unobserved')
head(data.nongenetic)
## id xi t1 t2 y1 y1.next
## 1 0 0 84.29181177 85.26539264 79.88872595 84.12203915
## 2 0 0 85.26539264 86.32976741 84.12203915 83.44782763
## 3 0 0 86.32976741 87.29412396 83.44782763 83.27045530
## 4 0 0 87.29412396 88.27682348 83.27045530 85.05927777
## 5 0 0 88.27682348 89.21539052 85.05927777 89.75485956
## 6 0 0 89.21539052 90.30465647 89.75485956 91.90341543
#Parameters estimation:
pars <- spm_pobs(x=data.genetic, y = data.nongenetic, mode='combined')
## Parameter thetaH achieved lower/upper bound.
## 0.072
pars
## $aH
## [,1]
## [1,] -0.05109289625
##
## $aL
## [,1]
## [1,] -0.002363482725
##
## $f1H
## [,1]
## [1,] 56.1002391
##
## $f1L
## [,1]
## [1,] 79.7742775
##
## $QH
## [,1]
## [1,] 1.286046738e-08
##
## $QL
## [,1]
## [1,] 2.707157887e-08
##
## $fH
## [,1]
## [1,] 58.37145763
##
## $fL
## [,1]
## [1,] 87.68090394
##
## $bH
## [,1]
## [1,] 3.997357435
##
## $bL
## [,1]
## [1,] 4.858458942
##
## $mu0H
## [1] 7.345627583e-06
##
## $mu0L
## [1] 9.009038329e-06
##
## $thetaH
## [1] 0.072
##
## $thetaL
## [1] 0.0900036672
##
## $p
## [1] 0.2298334868
##
## $limit
## [1] TRUE
##
## attr(,"class")
## [1] "pobs.spm"
Here the mode ‘observed’ is used to simulate data with the genetic component \(Z\), and ‘unobserved’ to simulate data without it.
This type of SPM also uses a genetic component, by analogy with the previous chapters, but relies on an explicit gradient function, which speeds up computations significantly. See (He et al. 2017) for details. Examples of usage are given below:
library(stpm)
data(ex_spmcon1dg)
head(ex_data$spm_data)
## id xi t1 t2 y y.next
## 1 1 0 30 31 2.000000000 2.024328135
## 2 1 0 31 32 2.024328135 1.927486318
## 3 1 0 32 33 1.927486318 1.899083801
## 4 1 0 33 34 1.899083801 2.061574385
## 5 1 0 34 35 2.061574385 2.034558435
## 6 1 0 35 36 2.034558435 2.114382051
head(ex_data$gene_data)
## id geno
## 1 1 1
## 2 2 1
## 3 3 0
## 4 4 0
## 5 5 1
## 6 6 0
res <- spm_con_1d_g(spm_data=ex_data$spm_data,
gene_data=ex_data$gene_data,
a = -0.02, b=0.2, q=0.01, f=3, f1=3, mu0=0.01, theta=1e-05,
upper=c(-0.01,3,0.1,10,10,0.1,1e-05), lower=c(-1,0.01,0.00001,1,1,0.001,1e-07),
effect=c('q'), method = "tnewton")
## [1] "Initial values:"
## [1] -2e-02 -2e-02 2e-01 2e-01 1e-02 1e-02 3e+00 3e+00 3e+00 1e-02
## [11] 1e-02 1e-05
## [1] "Lower bounds:"
## [1] -1e+00 -1e+00 1e-02 1e-02 1e-05 1e-05 1e+00 1e+00 1e+00 1e-03
## [11] 1e-03 1e-07
## [1] "Upper bounds:"
## [1] -1e-02 -1e-02 3e+00 3e+00 1e-01 1e-01 1e+01 1e+01 1e+01 1e-01
## [11] 1e-01 1e-05
res
## $est
## Coeff. Std. Err. z p value
## a -0.030542287582 0.0009054040948 -33.733321680772 0.000000000e+00
## b 0.101197292964 0.0002776220664 364.514587343492 0.000000000e+00
## q_0 0.004484150670 0.0011948015801 3.753050502424 1.746956431e-04
## q_2 0.004879437094 0.0016644345209 2.931588496421 3.372332686e-03
## f 2.045171421288 0.1237085096922 16.532180578178 0.000000000e+00
## f1 3.010584770658 0.0177570676278 169.542901664270 0.000000000e+00
## mu0 0.001420354287 0.0002449270474 5.799091207131 6.667526309e-09
## theta 0.000010000000 0.0040375911295 0.002476724284 9.980238620e-01
##
## $lik
## [1] -121717.8478
##
## $con
## [1] 1
##
## $message
## [1] "NLOPT_SUCCESS: Generic success return value."
##
## $hessian
## a b q_0 q_2
## a 2418684.84487125 6.662849960e+05 1986.7584649 1121.9247711
## b 666284.99599781 1.332736169e+07 -723.6350360 -551.1722169
## q_0 1986.75846489 -7.236350360e+02 2667427.7128133 855374.9329144
## q_2 1121.92477105 -5.511722169e+02 855374.9329144 855374.9329141
## f -82.61078143 -3.786614884e+01 -32507.4309394 -16187.8862791
## f1 -85019.08361268 -9.944973009e+01 -247.8412004 -131.6918599
## mu0 -11704.11953438 1.410671879e+04 4620297.5328282 2397280.2928824
## theta -895.86213391 1.023340202e+03 468707.8036552 232112.4932970
## f f1 mu0 theta
## a -82.610781428 -85019.083612678 -11704.119534 -895.8621339
## b -37.866148838 -99.449730092 14106.718790 1023.3402017
## q_0 -32507.430939421 -247.841200363 4620297.532828 468707.8036552
## q_2 -16187.886279052 -131.691859939 2397280.292882 232112.4932970
## f 551.678038414 1.016137209 -90004.674280 -8475.9147685
## f1 1.016137209 6201.557308083 1347.685898 126.8541456
## mu0 -90004.674280040 1347.685898473 38250576.325565 2134987.4556726
## theta -8475.914768517 126.854145628 2134987.455673 215962.9464849
##
## $beta
## Coeff. Std. Err. Chi. Sq p value
## beta_a NA NA NA NA
## beta_b NA NA NA NA
## beta_q 0.000197643212 0.0009203050627 4.244553338e-05 0.9948018
## beta_f NA NA NA NA
## beta_mu0 NA NA NA NA
Here, the arguments are:
spm_data - A dataset for the SPM model. See the stpm package for more details about the format.
gene_data - A two-column dataset containing the genotypes of the individuals in spm_data. The first column, id, is the ID of the individual in spm_data, and the second column, geno, is the genotype.
a - The initial value for the parameter a. The initial value will be predicted if not specified.
b - The initial value for the parameter b. The initial value will be predicted if not specified.
q - The initial value for the parameter q. The initial value will be predicted if not specified.
f - The initial value for the parameter f. The initial value will be predicted if not specified.
f1 - The initial value for the parameter f1. The initial value will be predicted if not specified.
mu0 - The initial value for the parameter mu0 in the baseline hazard. The initial value will be predicted if not specified.
theta - The initial value for the parameter theta in the baseline hazard. The initial value will be predicted if not specified.
lower - A vector of lower bounds for the parameters.
upper - A vector of upper bounds for the parameters.
effect - A character vector of the parameters that are linked to genotypes. The vector can contain any combination of a, b, q, f, mu0.
control - A list of control parameters for the optimization.
global - A logical variable indicating whether the MLSL (TRUE) or the L-BFGS (FALSE) algorithm is used for the optimization.
verbose - A logical variable indicating whether initial information is printed.
ahessian - A logical variable indicating whether the approximate (FALSE) or analytical (TRUE) Hessian is returned.
The function returns a list with the following components:
est - The estimates of the parameters.
hessian - The Hessian matrix of the estimates.
lik - The minus log-likelihood.
con - A number indicating the convergence. See the ‘nloptr’ package for more details.
message - Extra message about the convergence. See the ‘nloptr’ package for more details.
beta - The coefficients of the genetic effect on the parameters to be linked to genotypes.
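The input layout and the beta table described above can be illustrated with a small base-R sketch. It does not call the package. The subject IDs, the 0/1 genotype coding and the choice of effect are hypothetical, and the single degree of freedom used for the p value is an assumption rather than something the output states.

```r
# Hypothetical stand-in for the id column of spm_data (one entry per observation)
spm_ids <- c(1, 1, 1, 2, 2, 3, 3, 3)

# Two-column genotype table: first column id, second column geno
# (the 0/1 coding is an assumption made for illustration)
gene_data <- data.frame(id = unique(spm_ids), geno = c(0, 1, 0))

# Every subject in the longitudinal data should have exactly one genotype
stopifnot(ncol(gene_data) == 2,
          !anyDuplicated(gene_data$id),
          all(spm_ids %in% gene_data$id))

# Parameters to be linked to genotype; must be a subset of a, b, q, f, mu0
effect <- c("q", "f")
stopifnot(all(effect %in% c("a", "b", "q", "f", "mu0")))

# p value for beta_q recomputed from the printed Chi. Sq value,
# assuming a chi-square reference distribution with 1 degree of freedom
chi_sq  <- 4.244553338e-05
p_value <- pchisq(chi_sq, df = 1, lower.tail = FALSE)
p_value
```

Under that assumption the recomputed value is close to the p value printed for beta_q, which suggests the table reports a one-degree-of-freedom chi-square test for each coefficient.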
The SPM offers longitudinal data imputation that preserves the data structure, i.e. the relation between Y(t) and mu(Y(t),t), and can therefore give better results than imputation tools that ignore this relation. Below are two examples of multiple imputation with the function spm.impute(…).
library(stpm)
#######################################################################
############## One dimensional case (one covariate) ###################
#######################################################################
## Data preparation (short format)#
data <- simdata_discr(N=1000, dt = 2, format="short")
miss.id <- sample(x=dim(data)[1], size=round(dim(data)[1]/4)) # ~25% missing data
incomplete.data <- data
incomplete.data[miss.id,4] <- NA
# End of data preparation #
##### Multiple imputation with SPM #####
imp.data <- spm.impute(x=incomplete.data, id=1, case="xi", t1=3, covariates="y1", minp=1, theta_range=seq(0.075, 0.09, by=0.001))$imputed
##### Look at the incomplete data with missings #####
head(incomplete.data)
## id xi t y1
## 1 1 0 30 82.65461314
## 2 1 0 32 NA
## 3 1 0 34 NA
## 4 1 0 36 NA
## 5 1 0 38 88.32390570
## 6 1 0 40 88.30620433
##### Look at the imputed data #####
head(imp.data)
## id xi t y1
## 1 1 0 30 82.65461314
## 2 1 0 32 82.56857198
## 3 1 0 34 82.48706776
## 4 1 0 36 82.40986126
## 5 1 0 38 88.32390570
## 6 1 0 40 88.30620433
#########################################################
################ Two-dimensional case ###################
#########################################################
## Parameters for data simulation #
a <- matrix(c(-0.05, 0.01, 0.01, -0.05), nrow=2)
f1 <- matrix(c(90, 30), nrow=1, byrow=FALSE)
Q <- matrix(c(1e-7, 1e-8, 1e-8, 1e-7), nrow=2)
f0 <- matrix(c(80, 25), nrow=1, byrow=FALSE)
b <- matrix(c(5, 3), nrow=2, byrow=TRUE)
mu0 <- 1e-04
theta <- 0.07
ystart <- matrix(c(80, 25), nrow=2, byrow=TRUE)
## Data preparation #
data <- simdata_discr(N=1000, a=a, f1=f1, Q=Q, f=f0, b=b, ystart=ystart, mu0 = mu0, theta=theta, dt=2, format="short")
## Delete some observations in order to have approx. 25% missing data in each covariate
incomplete.data <- data
miss.id <- sample(x=dim(data)[1], size=round(dim(data)[1]/4))
incomplete.data[miss.id,4] <- NA # column 4 is y1
miss.id <- sample(x=dim(data)[1], size=round(dim(data)[1]/4))
incomplete.data[miss.id,5] <- NA # column 5 is y2
## End of data preparation #
###### Multiple imputation with SPM #####
imp.data <- spm.impute(x=incomplete.data, id=1, case="xi", t1=3, covariates=c("y1", "y2"), minp=1, theta_range=seq(0.060, 0.07, by=0.001))$imputed
###### Look at the incomplete data with missings #####
head(incomplete.data)
## id xi t y1 y2
## 1 1 0 30 80.45017457 22.10556634
## 2 1 0 32 86.82854274 NA
## 3 1 0 34 95.80696331 17.19225908
## 4 1 0 36 98.24349138 24.06168982
## 5 1 0 38 NA 26.64258565
## 6 1 0 40 93.64763208 23.85298210
###### Look at the imputed data #####
head(imp.data)
## id xi t y1 y2
## 1 1 0 30 80.45017457 22.10556634
## 2 1 0 32 86.82854274 22.20764576
## 3 1 0 34 95.80696331 17.19225908
## 4 1 0 36 98.24349138 24.06168982
## 5 1 0 38 97.71211436 26.64258565
## 6 1 0 40 93.64763208 23.85298210
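One way to judge an imputation is to mask values that are actually known, impute them, and compare against the truth. The sketch below shows that check in base R on hypothetical toy data in the "short" layout. The imputation step is a plain per-subject linear interpolation, used only as a stand-in and not the SPM method; in practice the imputed table returned by spm.impute would be compared in the same way.

```r
set.seed(1)

# Hypothetical toy data in the "short" layout (id, xi, t, y1)
toy <- data.frame(id = rep(1:3, each = 5),
                  xi = 0,
                  t  = rep(seq(30, 38, by = 2), times = 3),
                  y1 = 80 + cumsum(rnorm(15)))

# Mask about 25% of y1, as in the examples above
miss.id <- sample(x = nrow(toy), size = round(nrow(toy) / 4))
incomplete <- toy
incomplete[miss.id, "y1"] <- NA

# Stand-in imputation (NOT the SPM method): linear interpolation within a subject
impute_linear <- function(d) {
  ok <- !is.na(d$y1)
  if (sum(ok) >= 2) {
    # rule = 2 carries the nearest observed value beyond the observed range
    d$y1 <- approx(x = d$t[ok], y = d$y1[ok], xout = d$t, rule = 2)$y
  } else if (sum(ok) == 1) {
    d$y1 <- rep(d$y1[ok], nrow(d))
  }
  d
}
imputed <- do.call(rbind, lapply(split(incomplete, incomplete$id), impute_linear))
rownames(imputed) <- NULL

# Root-mean-square error on the masked cells only
rmse <- sqrt(mean((imputed$y1[miss.id] - toy$y1[miss.id])^2))
rmse
```

Running the same comparison for several imputation methods on the same masked cells gives a like-for-like error measure.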
We provide a simple function to predict the next value of Y(t). Refer to the example below:
#library(stpm)
#data <- simdata_discr(N=100, format="long")
#res <- spm_discrete(data)
#splitted <- split(data, data$id)
#df <- data.frame()
#lapply(1:100, function(i) {df<<-rbind(df,splitted[[i]][dim(splitted[[i]])[1],c("id", "xi", "t1", "y1")])})
#names(df) <- c("id", "xi", "t", "y")
#predicted <- predict(object=res, data=df, dt=3)
#head(predicted)
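The data-preparation step in the example above takes the last observed row of each subject and renames the columns before calling predict. A minimal base-R sketch of that step alone, on hypothetical toy data in the "long" layout, could look as follows; it does not call the package, and the numbers are invented for illustration.

```r
# Hypothetical toy data in the "long" layout (id, xi, t1, t2, y1, y1.next)
long <- data.frame(id      = c(1, 1, 1, 2, 2),
                   xi      = 0,
                   t1      = c(30, 31, 32, 30, 31),
                   t2      = c(31, 32, 33, 31, 32),
                   y1      = c(76.6, 77.8, 86.9, 80.1, 81.3),
                   y1.next = c(77.8, 86.9, 88.0, 81.3, 82.0))

# Keep the last row of each subject without growing a data frame in a loop
last_rows <- do.call(rbind, lapply(split(long, long$id),
                                   function(d) d[nrow(d), ]))

# Same column selection and renaming as in the example above
df <- last_rows[, c("id", "xi", "t1", "y1")]
names(df) <- c("id", "xi", "t", "y")
rownames(df) <- NULL
df
```

Collecting the per-subject rows with split and a single rbind avoids the repeated rbind with `<<-` and gives the same table.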
The package offers the following five hypotheses to test with the function spm_time_dep (Arbeev et al. 2016):
H01: \(Q(t) = 0\) (i.e., \(a_Q = 0\) and \(b_Q = 0\), so that there is no quadratic term in the hazard rate and mortality is described by the baseline Gompertz rate \(\mu_0(t)\)).
H02: \(Q(t) = a_Q\) (i.e., \(b_Q = 0\)).
H03: \(f_1(t) = 0\) (i.e., \(a_{f_1} = 0\) and \(b_{f_1} = 0\)).
H04: \(f_1(t) = a_{f_1}\) (i.e., \(b_{f_1} = 0\)).
H05: \(a(t) = a_Y\) (i.e., \(b_Y = 0\)).
To perform hypothesis testing, set the argument lrtest to TRUE (which tests "H01" by default) or to any of the following: "H01", "H02", "H03", "H04", "H05".
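The argument name lrtest and the returned lr.test.pval suggest a likelihood-ratio test. As a generic sketch of that technique, not the package's internal code, the statistic is twice the difference between the maximized log-likelihoods of the alternative and the null model, compared with a chi-square distribution. The helper lr_test, the log-likelihood values and the degrees of freedom below are hypothetical; the degrees of freedom simply count the constraints listed for each hypothesis above and may differ from what the package uses.

```r
# Generic likelihood-ratio test (hypothetical helper, not part of stpm)
lr_test <- function(loglik_null, loglik_alt, df) {
  # The null model is nested in the alternative, so the statistic should be >= 0;
  # small negative values from numerical optimization are truncated at zero
  stat <- max(2 * (loglik_alt - loglik_null), 0)
  p <- pchisq(stat, df = df, lower.tail = FALSE)
  c(statistic = stat, df = df, p.value = p)
}

# Number of constrained parameters per hypothesis, counted from the list above
# (an assumption about the reference distribution, not taken from the package)
constraints <- c(H01 = 2, H02 = 1, H03 = 2, H04 = 1, H05 = 1)

# Invented log-likelihoods for a null (H02) and an alternative model
out <- lr_test(loglik_null = -1052.4, loglik_alt = -1049.9,
               df = constraints[["H02"]])
out
```

A small p value indicates that the restricted model fits noticeably worse than the unrestricted one, so the corresponding hypothesis is rejected.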
library(stpm)
n <- 1000
# Data simulation:
data <- simdata_time_dep(N=n, format="long")
head(data)
# Hypotheses testing
## H01
res <- spm_time_dep(data, verbose=FALSE,
                    frm = list(at="a", f1t="f1", Qt="Q", ft="f", bt="b", mu0t="mu0"),
                    start=list(a=-0.05, f1=80, Q=1e-8, f=90, b=5, mu0=0.001),
                    lb=c(a=-1, f1=30, Q=1e-9, f=10, b=1, mu0=1e-6),
                    ub=c(a=0, f1=120, Q=1e-7, f=150, b=10, mu0=1e-2),
                    opts = list(algorithm = "NLOPT_LN_NELDERMEAD",
                                maxeval = 200, ftol_rel = 1e-12), lrtest="H01")
res$alternative$lr.test.pval
## H02
res <- spm_time_dep(data, verbose=FALSE,
                    frm = list(at="a", f1t="f1", Qt="1e-6", ft="f", bt="b", mu0t="mu0"),
                    start=list(a=-0.05, f1=80, Q=1e-8, f=90, b=5, mu0=0.001),
                    lb=c(a=-1, f1=30, Q=1e-9, f=10, b=1, mu0=1e-6),
                    ub=c(a=0, f1=120, Q=1e-7, f=150, b=10, mu0=1e-2),
                    opts = list(algorithm = "NLOPT_LN_NELDERMEAD",
                                maxeval = 200, ftol_rel = 1e-12), lrtest="H02")
res$alternative$lr.test.pval
## H03
res <- spm_time_dep(data, verbose=FALSE,
                    frm = list(at="a", f1t="f1", Qt="Q", ft="f", bt="b", mu0t="mu0"),
                    start=list(a=-0.05, f1=80, Q=1e-8, f=90, b=5, mu0=0.001),
                    ub=c(a=0, f1=120, Q=1e-7, f=150, b=10, mu0=1e-2),
                    opts = list(algorithm = "NLOPT_LN_NELDERMEAD",
                                maxeval = 200, ftol_rel = 1e-12), lrtest="H03")
res$alternative$lr.test.pval
## H04
res <- spm_time_dep(data, verbose=FALSE,
                    frm = list(at="a", f1t="120", Qt="Q", ft="f", bt="b", mu0t="mu0"),
                    start=list(a=-0.05, f1=80, Q=1e-8, f=90, b=5, mu0=0.001),
                    lb=list(a=-1, f1=30, Q=1e-9, f=10, b=1, mu0=1e-6),
                    opts = list(algorithm = "NLOPT_LN_NELDERMEAD",
                                maxeval = 200, ftol_rel = 1e-12), lrtest="H04")
res$alternative$lr.test.pval
## H05
res <- spm_time_dep(data, verbose=FALSE,
                    frm = list(at="-0.1", f1t="f1", Qt="Q", ft="f", bt="b", mu0t="mu0"),
                    start=list(a=-0.05, f1=80, Q=1e-8, f=90, b=5, mu0=0.001),
                    opts = list(algorithm = "NLOPT_LN_NELDERMEAD",
                                maxeval = 200, ftol_rel = 1e-12), lrtest="H05")
res$alternative$lr.test.pval
Akushevich, I., A. Kulminski, and K. G. Manton. 2005. “Life Tables with Covariates: Dynamic Model for Nonlinear Analysis of Longitudinal Data.” Mathematical Population Studies 12 (2). Informa UK Limited: 51–80. doi:10.1080/08898480590932296.
Arbeev, Konstantin G., Igor Akushevich, Alexander M. Kulminski, Svetlana V. Ukraintseva, and Anatoliy I. Yashin. 2014. “Joint Analyses of Longitudinal and Time-to-Event Data in Research on Aging: Implications for Predicting Health and Survival.” Frontiers in Public Health 2 (November). Frontiers Media SA. doi:10.3389/fpubh.2014.00228.
Arbeev, Konstantin G., Igor Akushevich, Alexander M. Kulminski, Liubov S. Arbeeva, Lucy Akushevich, Svetlana V. Ukraintseva, Irina V. Culminskaya, and Anatoli I. Yashin. 2009. “Genetic Model for Longitudinal Studies of Aging, Health, and Longevity and Its Potential Application to Incomplete Data.” Journal of Theoretical Biology 258 (1). Elsevier BV: 103–11. doi:10.1016/j.jtbi.2009.01.023.
Arbeev, Konstantin G., Alan A. Cohen, Liubov S. Arbeeva, Emmanuel Milot, Eric Stallard, Alexander M. Kulminski, Igor Akushevich, Svetlana V. Ukraintseva, Kaare Christensen, and Anatoliy I. Yashin. 2016. “Optimal Versus Realized Trajectories of Physiological Dysregulation in Aging and Their Relation to Sex-Specific Mortality Risk.” Frontiers in Public Health 4 (January). Frontiers Media SA. doi:10.3389/fpubh.2016.00003.
He, Liang, Ilya Zhbannikov, Konstantin G. Arbeev, Anatoliy I. Yashin, and Alexander M. Kulminski. 2017. “A Genetic Stochastic Process Model for Genome-Wide Joint Analysis of Biomarker Dynamics and Disease Susceptibility with Longitudinal Data.” Genetic Epidemiology 41 (7). Wiley: 620–35. doi:10.1002/gepi.22058.
Woodbury, Max A., and Kenneth G. Manton. 1977. “A Random-Walk Model of Human Mortality and Aging.” Theoretical Population Biology 11 (1). Elsevier BV: 37–48. doi:10.1016/0040-5809(77)90005-3.
Yashin, Anatoli I., Konstantin G. Arbeev, Igor Akushevich, Aliaksandr Kulminski, Lucy Akushevich, and Svetlana V. Ukraintseva. 2007. “Stochastic Model for Analysis of Longitudinal Data on Aging and Mortality.” Mathematical Biosciences 208 (2). Elsevier BV: 538–51. doi:10.1016/j.mbs.2006.11.006.
Yashin, Anatoli I., Konstantin G. Arbeev, Aliaksandr Kulminski, Igor Akushevich, Lucy Akushevich, and Svetlana V. Ukraintseva. 2007. “Health Decline, Aging and Mortality: How Are They Related?” Biogerontology 8 (3). Springer Nature: 291–302. doi:10.1007/s10522-006-9073-3.