%%\VignetteIndexEntry{User guide to msm with worked examples}
%\VignettePackage{msm}
\documentclass{article}
%% Need to modify Sweave.sty to pass pdftex option to graphicx.
%\usepackage{Sweave-local}
\usepackage{graphics}
\RequirePackage[T1]{fontenc}
%% Check if we are compiling under latex or pdflatex
\ifx\pdftexversion\undefined
\RequirePackage[dvips]{graphicx}
\else
\RequirePackage[pdftex]{graphicx}
\RequirePackage{epstopdf}
\fi
\RequirePackage{ae,fancyvrb}
\IfFileExists{upquote.sty}{\RequirePackage{upquote}}{}
\setkeys{Gin}{width=0.8\textwidth}
\DefineVerbatimEnvironment{Sinput}{Verbatim}{fontshape=sl}
\DefineVerbatimEnvironment{Soutput}{Verbatim}{}
\DefineVerbatimEnvironment{Scode}{Verbatim}{fontshape=sl}
\newenvironment{Schunk}{}{}
\usepackage{times}
\usepackage{url}
\addtolength{\textwidth}{2cm}
\newcommand{\Exp}{\mathop{\mathrm{Exp}}}
\newcommand{\etal}{{\textit{et al.}}}
\newcommand{\Rfunction}[1]{{\texttt{#1}}}
\newcommand{\Robject}[1]{{\texttt{#1}}}
\newcommand{\Rpackage}[1]{{\textit{#1}}}
\newcommand{\Rmethod}[1]{{\texttt{#1}}}
\newcommand{\Rfunarg}[1]{{\texttt{#1}}}
\newcommand{\Rclass}[1]{{\textit{#1}}}
%% version <- 1.3
<<echo=FALSE>>=
version <- gsub("Version: +", "",
                packageDescription("msm", lib.loc=c("../..",.libPaths()))$Version)
@
\title{Multi-state modelling with R: the {\tt msm} package
\vskip 0.2in
\large{Version
%
<<echo=FALSE,results=tex>>=
cat(version)
@
\vskip 0.1in
<<echo=FALSE,results=tex>>=
cat(format(Sys.time(), "%d %B, %Y"))
@
}}
\author{Christopher Jackson\\MRC Biostatistics Unit\\Cambridge, U.K.\\
\texttt{chris.jackson@mrc-bsu.cam.ac.uk}}
\date{} %% label with date when Rnw is compiled, not when tex is compiled. should be release date of package
\bibliographystyle{unsrt}

\begin{document}
\maketitle

\begin{abstract}
The multi-state Markov model is a useful way of describing a process in which
an individual moves through a series of states in continuous time.  The
\Rpackage{msm} package for R allows a general multi-state model to be fitted
to longitudinal data.  Data often consist of observations of the process at
arbitrary times, so that the exact times when the state changes are
unobserved.  For example, the progression of chronic diseases is often
described by stages of severity, and the state of the patient may only be
known at doctor or hospital visits.  Features of \Rpackage{msm} include the
ability to model transition rates and hidden Markov output models in terms of
covariates, and the ability to model data with a variety of observation
schemes, including censored states.

Hidden Markov models, in which the true path through states is only observed
through some error-prone marker, can also be fitted.  The observation is
generated, conditionally on the underlying states, via some distribution.  An
example is a screening misclassification model in which states are observed
with error.  More generally, hidden Markov models can have a continuous
response, with some arbitrary distribution, conditionally on the underlying
state.

This manual introduces the theory behind multi-state Markov and hidden Markov
models, and gives a tutorial in the typical use of the \Rpackage{msm} package,
illustrated by some typical applications to modelling chronic diseases.

Much of the material in this manual is published, in a more concise form, in
Journal of Statistical Software (2011) 38(8):1--29,
\url{http://www.jstatsoft.org/v38/i08/}
\end{abstract}

\section{Multi-state models}
\label{sec:multistate}

\subsection{Introduction}
\label{sec:intro}

Figure \ref{fig:multi} illustrates a multi-state model in continuous time.
Its four states are labelled {\bf 1, 2, 3, 4}.  At a time $t$ the individual
is in state $S(t)$.  The arrows show which transitions are possible between
states.  The next state to which the individual moves, and the time of the
change, are governed by a set of {\em transition intensities} $q_{rs}(t,
z(t))$ for each pair of states $r$ and $s$.  The intensities may also depend
on the time of the process $t$, or more generally a set of individual-specific
or time-varying explanatory variables $z(t)$.  The intensity represents the
instantaneous risk of moving from state $r$ to state $s$:
\begin{equation}
\label{eq:multi:intensity}
q_{rs}(t, z(t)) = \lim_{\delta t \rightarrow 0} P (S(t+\delta t) = s | S(t) = r) / \delta t
\end{equation}
The intensities form a matrix $Q$ whose rows sum to zero, so that the diagonal
entries are defined by $q_{rr} = - \sum_{s \neq r} q_{rs}$.  To fit a
multi-state model to data, we estimate this transition intensity matrix.  We
concentrate on {\em Markov} models here.  The Markov assumption is that future
evolution only depends on the current state.  That is, $q_{rs}(t, z(t),
\mathcal{F}_t)$ is independent of $\mathcal{F}_t$, the observation history of
the process up to the time preceding $t$.  See, for example, Cox and
Miller\cite{cox:miller} for a thorough introduction to the theory of
continuous-time Markov chains.

\begin{figure}[p]
  \begin{center}
    \scalebox{0.4}{\includegraphics{figures/multistate}}
\[ Q =
\left(
\begin{array}{llll}
q_{11} & q_{12} & q_{13} & q_{14}\\
q_{21} & q_{22} & q_{23} & q_{24}\\
q_{31} & q_{32} & q_{33} & q_{34}\\
q_{41} & q_{42} & q_{43} & q_{44}\\
\end{array}
\right )
\]
    \caption{\label{fig:multi}General multi-state model.}
  \end{center}
\end{figure}

In a time-homogeneous continuous-time Markov model, a single period of
occupancy (or \emph{sojourn time}) in state $r$ has an exponential
distribution, with rate $-q_{rr}$ (and mean $-1 / q_{rr}$).  The remaining
elements of the $r$th row of $Q$ are \emph{proportional to} the probabilities
governing the next state after $r$ to which the individual makes a transition.
The probability that the individual's next move from state $r$ is to state $s$
is $-q_{rs} / q_{rr}$.

\subsection{Disease progression models}

The development of the \Rpackage{msm} package was motivated by applications to
disease modelling.  Many chronic diseases have a natural interpretation in
terms of staged progression.  Multi-state Markov models in continuous time are
often used to model the course of diseases.  A commonly-used model is
illustrated in Figure \ref{fig:disease}.  This represents a series of
successively more severe disease stages, and an `absorbing' state, often
death.  The patient may advance into or recover from adjacent disease stages,
or die at any disease stage.  Observations of the state $S_i(t)$ are made on a
number of individuals $i$ at arbitrary times $t$, which may vary between
individuals.  The stages of disease may be modelled as a homogeneous
continuous-time Markov process, with a transition intensity matrix $Q$ of the
form shown in Figure \ref{fig:disease}.

A commonly-used model is the \emph{illness-death} model, with three states
representing health, illness and death (Figure \ref{fig:multi:illdeath}).
Transitions are permitted from health to illness, illness to death and health
to death.  Recovery from illness to health is sometimes also considered.
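For intuition, the following small sketch in R (with hypothetical intensities
for the illness-death model with recovery, just described) computes the mean
sojourn times and next-state probabilities implied by a given $Q$:
\begin{Scode}
## Hypothetical intensities (per year) for an illness-death model
## with recovery: states 1 = health, 2 = illness, 3 = death
Q <- rbind(c(-0.15,  0.10,  0.05),
           c( 0.20, -0.30,  0.10),
           c( 0.00,  0.00,  0.00))  # death is absorbing: row of zeros
-1 / diag(Q)[1:2]     # mean sojourn times in states 1 and 2
-Q[1, -1] / Q[1, 1]   # P(next state is 2 or 3 | leaving state 1)
\end{Scode}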
A wide range of medical situations has been modelled using multi-state
methods, for example, screening for abdominal aortic aneurysms (Jackson
\etal\cite{jackson:sharples:2003}), problems following lung transplantation
(Jackson and Sharples\cite{jackson:sharples:2002}), problems following heart
transplantation (Sharples\cite{sharp:gibbs}, Klotz and
Sharples\cite{klotz:est}), hepatic cancer (Kay\cite{kay:mark}), HIV infection
and AIDS (Longini \etal\cite{long1}, Satten and Longini\cite{sattlong},
Guihenneuc-Jouyaux \etal\cite{rich}, Gentleman \etal\cite{gentlaw}), diabetic
complications (Marshall and Jones\cite{retino2}, Andersen\cite{andersen88}),
breast cancer screening (Duffy and Chen\cite{duffy95}, Chen
\etal\cite{duffy96}), cervical cancer screening (Kirby and
Spiegelhalter\cite{kirby:spiegelhalter}) and liver cirrhosis (Andersen
\etal\cite{ander:proth}).  Many of these references also describe the
mathematical theory, which will be reviewed in the following sections.

\begin{figure}[p]
  \centering
  \vskip 1cm
  \scalebox{1.0}{\includegraphics{figures/general}}
\[ Q =
\left(
\begin{array}{llllll}
q_{11} & q_{12} & 0 & 0 & \ldots & q_{1n}\\
q_{21} & q_{22} & q_{23} & 0 & \ldots & q_{2n}\\
0 & q_{32} & q_{33} & q_{34} & \ddots & q_{3n}\\
0 & 0 & q_{43} & q_{44} & \ddots & q_{4n}\\
\vdots & \vdots & \ddots & \ddots & \ddots & \vdots\\
0 & 0 & 0 & 0 & \ldots & 0\\
\end{array}
\right )
\]
  \caption{\label{fig:disease}General model for disease progression.}
\end{figure}

\begin{figure}[p]
  \begin{center}
    \scalebox{0.4}{\includegraphics{figures/illdeath}}
    \caption{Illness-death model.}
    \label{fig:multi:illdeath}
  \end{center}
\end{figure}

\subsection{Arbitrary observation times}
\label{sec:arbitr-observ-times}

Longitudinal data from monitoring disease progression are often incomplete in
some way.  Usually patients are seen at intermittent follow-up visits, at
which monitoring information is collected, but information from the periods
between visits is not available.  Often the exact time of disease onset is
unknown.  Thus, the changes of state in a multi-state model usually occur at
unknown times.  Also a subject may only be followed up for a portion of their
disease history.  A fixed observation schedule may be specified in advance,
but in practice times of visits may vary due to patient and hospital
pressures.  The states of disease progression models often include death.
Death times are commonly recorded to within a day.  Also observations may be
censored.  For example, at the end of a study, an individual may be known only
to be alive, and in an unknown state.

A typical sampling situation is illustrated in Figure
\ref{fig:multi:sampling}.  The individual is observed on four occasions over
10 months.  The final occasion is the death date, which is recorded to within
a day.  The only other information available is the occupation of states 2, 2
and 1 at times 1.5, 3.5 and 5 months respectively.  The times of movement
between states and the state occupancy in between the observation times are
unknown.  Although the patient was in state 3 between 7 and 9 months, this was
not observed at all.

\paragraph{Informative sampling times} To fit a model to longitudinal data
with arbitrary sampling times we must consider the reasons why observations
were made at the given times.  This is analogous to the problem of missing
data, where the fact that a particular observation is missing may implicitly
give information about the value of that observation.  Possible observation
schemes include:
\begin{itemize}
\item {\em fixed}.
  Each patient is observed at fixed intervals specified in advance.
\item {\em random}.  The sampling times vary randomly, independently of the
  current state of the disease.
\item {\em doctor's care}.  More severely ill patients are monitored more
  closely.  The next sampling time is chosen on the basis of the current
  disease state.
\item patient {\em self-selection}.  A patient may decide to visit the doctor
  on occasions when they are in a poor condition.
\end{itemize}

Gr\"uger \etal \cite{gruger:valid} discussed conditions under which sampling
times are \emph{informative}.  If a multi-state model is fitted, ignoring the
information available in the sampling times, then inference may be biased.
Mathematically, because the sampling times are often themselves random, they
should be modelled along with the observation process.  However, the ideal
situation is where the joint likelihood for the times and the process is
proportional to the likelihood obtained if the sampling times were fixed in
advance.  Then the parameters of the process can be estimated independently of
the parameters of the sampling scheme.  In particular, they showed that fixed,
random and doctor's care observation policies are not informative, whereas
patient self-selection is informative.  Note that \Rpackage{msm} does not deal
with informative sampling times.  See, e.g., \cite{sweeting:inform:msm} for
some methods in this case, which require specialised programming.

\begin{figure}[p]
  \begin{center}
    \scalebox{0.6}{\rotatebox{270}{\includegraphics{figures/sampling}}}
    \caption{Evolution of a multi-state model.  The process is observed on
      four occasions.}
    \label{fig:multi:sampling}
  \end{center}
\end{figure}

\subsection{Likelihood for the multi-state model}
\label{sec:multi:likelihood}

Kalbfleisch and Lawless\cite{kalbfleisch:lawless} and later Kay
\cite{kay:mark} described a general method for evaluating the likelihood for a
multi-state model in continuous time, applicable to any form of transition
matrix.  The only available information is the observed state at a set of
times, as in Figure \ref{fig:multi:sampling}.  The sampling times are assumed
to be non-informative.

\paragraph{Transition probability matrix} The likelihood is calculated from
the \emph{transition probability matrix} $P(t)$.  For a time-homogeneous
process, the $(r,s)$ entry of $P(t)$, $p_{rs}(t)$, is the probability of being
in state $s$ at a time $t+u$ in the future, given the state at time $u$ is
$r$.  It does not say anything about the time of transition from $r$ to $s$;
indeed the process may have entered other states between times $u$ and $t+u$.
$P(t)$ can be calculated by taking the matrix exponential of the scaled
transition intensity matrix (see, for example, Cox and Miller
\cite{cox:miller}).
\begin{equation}
\label{eq:exptq}
P(t) = \Exp(tQ)
\end{equation}
The matrix exponential $\Exp$ is different from a scalar exponential.  It is
defined by the same power series $\Exp(X) = I + X + X^2/2! + X^3/3! + \ldots$
as the scalar exponential, where $I$ is the identity matrix, except that each
power $X^k$ is defined by matrix products, not element-wise scalar
multiplication.  It is notoriously difficult to calculate reliably, as
discussed by Moler and van Loan \cite{matrixexp}.

For simpler models, it is feasible to calculate an analytic expression for
each element of $P(t)$ in terms of $Q$.  This is generally faster and avoids
the potential numerical instability of calculating the matrix exponential.
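Numerically, equation \ref{eq:exptq} can be checked with the
\Rfunction{MatrixExp} function supplied with \Rpackage{msm}, here reusing the
hypothetical illness-death $Q$ shown earlier:
\begin{Scode}
library(msm)
P2 <- MatrixExp(Q, t=2)  # P(2) = Exp(2Q): transition probabilities over 2 years
rowSums(P2)              # each row of P(t) sums to 1
\end{Scode}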
Symbolic algebra software, such as Mathematica, can be helpful for obtaining
these analytic expressions.  For example, the three-state illness-death model
with no recovery has a transition intensity matrix of
\[ Q =
\left(
\begin{array}{lll}
-(q_{12} + q_{13}) & q_{12} & q_{13}\\
0 & - q_{23} & q_{23}\\
0 & 0 & 0\\
\end{array}
\right )
\]
The corresponding time $t$ transition probabilities are
\begin{eqnarray*}
p_{11}(t) & = & e^{-(q_{12} + q_{13})t}\\
p_{12}(t) & = & \left\{
\begin{array}{ll}
\frac{ q_{12} }{q_{12} + q_{13} - q_{23} } (e^{-q_{23} t} - e^{-(q_{12} + q_{13})t}) & (q_{12} + q_{13} \neq q_{23})\\
q_{12}te^{-(q_{12} + q_{13})t} & (q_{12} + q_{13} = q_{23})
\end{array}
\right. \\
p_{13}(t) & = & \left\{
\begin{array}{ll}
1 - e^{-(q_{12} + q_{13})t} - \frac{ q_{12} }{q_{12} + q_{13} - q_{23} } (e^{-q_{23} t} - e^{-(q_{12} + q_{13})t}) & (q_{12} + q_{13} \neq q_{23})\\
(-1 + e^{(q_{12} + q_{13})t} - q_{12}t)e^{-(q_{12} + q_{13})t} & (q_{12} + q_{13} = q_{23})
\end{array}
\right. \\
p_{21}(t) & = & 0\\
p_{22}(t) & = & e^{-q_{23}t}\\
p_{23}(t) & = & 1 - e^{-q_{23}t}\\
p_{31}(t) & = & 0\\
p_{32}(t) & = & 0\\
p_{33}(t) & = & 1\\
\end{eqnarray*}
The \Rpackage{msm} package calculates $P(t)$ analytically for selected 2, 3, 4
and 5-state models, illustrated in Figures \ref{fig:anp2}--\ref{fig:anp5}.
For other models, which can have any transition structure on any number of
states in principle, $P(t)$ is determined from the matrix exponential.  This
is calculated using eigensystem decomposition (if eigenvalues are distinct) or
a method based on Pad\'e approximants with scaling and squaring
\cite{matrixexp} (if there are repeated eigenvalues).

Notice that the states are not labelled in these figures.  Each graph can
correspond to several different $Q$ matrices, depending on how the states are
labelled.  For example, Figure~\ref{fig:anp2} a) illustrates the model defined
by either
\( Q =
\left(
\begin{array}{ll}
- q_{12} & q_{12} \\
0 & 0
\end{array}
\right)
\)
or
\( Q =
\left(
\begin{array}{ll}
0 & 0\\
q_{21} & - q_{21}
\end{array}
\right)
\).

\begin{figure}
  \begin{center}
    a) \includegraphics[width=3cm]{figures/p2q1}
    \hskip 3cm
    b) \includegraphics[width=3cm]{figures/p2q12}
  \end{center}
  \caption{\label{fig:anp2}Two-state models fitted using analytic $P(t)$
    matrices in \Rpackage{msm}.  Implemented for all permutations of state
    labels 1, 2.}
\end{figure}

\begin{figure}
  \begin{center}
    a) \includegraphics[width=3cm]{figures/p3q12} \hskip 3cm
    b) \includegraphics[width=5cm]{figures/p3q14} \vskip 1cm
    c) \includegraphics[width=3cm]{figures/p3q16} \hskip 3cm
    d) \includegraphics[width=3cm]{figures/p3q124} \vskip 1cm
    e) \includegraphics[width=3cm]{figures/p3q135} \hskip 3cm
    f) \includegraphics[width=3cm]{figures/p3q1246}
  \end{center}
  \caption{\label{fig:anp3}Three-state models fitted using analytic $P(t)$
    matrices in \Rpackage{msm}.  Implemented for all permutations of state
    labels 1, 2, 3.}
\end{figure}

\begin{figure}
  \begin{center}
    a) \includegraphics[width=7cm]{figures/p4q159} \vskip 1cm
    b) \includegraphics[width=7cm]{figures/p4q13569}
  \end{center}
  \caption{\label{fig:anp4}Four-state models fitted using analytic $P(t)$
    matrices in \Rpackage{msm}.
    Implemented for all permutations of state labels 1, 2, 3, 4.}
\end{figure}

\begin{figure}
  \begin{center}
    a) \includegraphics[width=9cm]{figures/p5q1_6_11_16} \vskip 1cm
    b) \includegraphics[width=9cm]{figures/p5q1_4_6_8_11_12_16} \vskip 1cm
    c) \includegraphics[width=5cm]{figures/p5q1_6_7_11_12}
  \end{center}
  \caption{\label{fig:anp5}Five-state models fitted using analytic $P(t)$
    matrices in \Rpackage{msm}.  Implemented for all permutations of state
    labels 1, 2, 3, 4, 5.}
\end{figure}

\paragraph{Likelihood for intermittently-observed processes} Suppose $i$
indexes $M$ individuals.  The data for individual $i$ consist of a series of
times $(t_{i1}, \ldots, t_{in_i})$ and corresponding states $(S(t_{i1}),
\ldots, S(t_{in_i}))$.  Consider a general multi-state model, with a pair of
successive observed disease states $S(t_j), S(t_{j+1})$ at times $t_j,
t_{j+1}$.  The contribution to the likelihood from this pair of states is
\begin{equation}
\label{eq:multi:lik:contrib}
L_{i, j} = p_{S(t_j)S(t_{j+1})}(t_{j+1} - t_j)
\end{equation}
This is the entry of the transition matrix $P(t)$ at the $S(t_j)$th row and
$S(t_{j+1})$th column, evaluated at $t = t_{j+1} - t_j$.

The full likelihood $L(Q)$ is the product of all such terms $L_{i,j}$ over all
individuals and all transitions.  It depends on the unknown transition matrix
$Q$, which was used to determine $P(t)$.

\paragraph{Exactly-observed death times} In observational studies of chronic
diseases, it is common that the time of death is known, but the state at the
instant before death is unknown.  If $S(t_{j+1}) = D$ is such a death state,
then the contribution to the likelihood is summed over the unknown state $m$
at the instant before death:
\begin{equation}
\label{eq:multi:lik:death}
L_{i, j} = \sum_{m \neq D} p_{S(t_j),m}(t_{j+1} - t_j) q_{m, D}
\end{equation}
The sum is taken over all possible states $m$ which can be visited between
$S(t_j)$ and $D$.

\paragraph{Exactly-observed transition times} If the times $(t_{i1}, \ldots,
t_{in_i})$ had been the {\em exact} transition times between the states, with
no transitions between the observation times, then the contributions would be
of the form
\begin{equation}
\label{eq:multi:lik:exact}
L_{i, j} = \exp(q_{S(t_j)S(t_{j})}(t_{j+1} - t_j)) q_{S(t_j)S(t_{j+1})}
\end{equation}
since the state is assumed to be $S(t_j)$ throughout the interval between
$t_j$ and $t_{j+1}$, with a known transition to state $S(t_{j+1})$ at
$t_{j+1}$.  \Rpackage{msm} is restricted to Markov models, but much richer
models are possible for this type of data.  For example, Putter \etal
\cite{putter:mstate} discussed the \Rpackage{mstate} software for
semi-parametric multi-state models with non-parametric baseline hazards and
Cox regression.  The Markov assumption is restrictive but necessary in general
to compute a likelihood for intermittently-observed processes.

\paragraph{Censored states} A censored quantity is one whose exact value is
unknown, but known to be in a certain interval.  For example, in survival
analysis, a death time is \emph{right-censored} if the study ends and the
patient is still alive, since the death time is known to be greater than the
end time.  In multi-state models for intermittently-observed processes, the
times of changes of state are usually \emph{interval censored}, known to be
within bounded intervals.  This leads to a likelihood based on equation
\ref{eq:multi:lik:contrib}.  In some circumstances, \emph{states} may be
censored as well as \emph{event times}.
For example, at the end of some chronic disease studies, patients are known to
be alive but in an \emph{unknown state}.  For such a censored observation
$S(t_{j+1})$ ($j+1 = n_i$), known only to be a state in the set $C$, the
equivalent contribution to the likelihood is
\begin{equation}
\label{eq:multi:lik:deathcens}
L_{i, j} = \sum_{m \in C} p_{S(t_j),m}(t_{j+1} - t_j)
\end{equation}
Note that this special likelihood is not needed if the state is known at the
end of the study.  In this case, likelihood \ref{eq:multi:lik:contrib}
applies.  Although the \emph{survival time} is censored, the \emph{state} at
the end of the study is not censored.

More generally, suppose every observation from a particular individual is
censored.  Observations $1, 2, \ldots, n_i$ are known only to be in the sets
$C_1, C_2, \ldots, C_{n_i}$ respectively.  The likelihood for this individual
is a sum of the likelihoods of all possible paths through the unobserved
states.
\begin{equation}
\label{eq:multi:lik:cens}
L_i = \sum_{s_{n_i} \in C_{n_i}} \ldots \sum_{s_2 \in C_2} \sum_{s_1 \in C_1}
p_{s_1 s_2}(t_2 - t_1) p_{s_2 s_3} (t_3 - t_2) \ldots
p_{s_{n_i-1} s_{n_i}} (t_{n_i} - t_{n_i-1})
\end{equation}
Suppose the states comprising the set $C_j$ are $c^{(j)}_1, \ldots,
c^{(j)}_{m_j}$.  This likelihood can also be written as a matrix product, say,
\begin{equation}
\label{eq:multi:lik:cens:matrix}
L_i = \mathbf{1}^T P^{1,2} P^{2,3} \ldots P^{n_i-1, n_i} \mathbf{1}
\end{equation}
where $P^{j-1, j}$ is an $m_{j-1} \times m_j$ matrix with $(r,s)$ entry
$p_{c^{(j-1)}_r c^{(j)}_s}(t_j - t_{j-1})$, and $\mathbf{1}$ is the vector of
ones.

The \Rpackage{msm} package allows multi-state models to be fitted to data from
processes with arbitrary observation times (panel data), exactly-observed
transition times, exact death times and censored states, or a mixture of these
schemes.

\subsection{Covariates}
\label{sec:multi:covariates}

The relation of constant or time-varying characteristics of individuals to
their transition rates is often of interest in a multi-state model.
Explanatory variables for a particular transition intensity can be
investigated by modelling the intensity as a function of these variables.
Marshall and Jones \cite{retino2} described a form of a {\em proportional
  hazards} model, where the transition intensity matrix elements $q_{rs}$
which are of interest can be replaced by
\[ q_{rs}(z(t)) = q_{rs}^{(0)} \exp(\beta_{rs}^T z(t)) \]
The new $Q$ is then used to determine the likelihood.  If the covariates
$z(t)$ are time dependent, the contributions to the likelihood of the form
$p_{rs} (t - u)$ are replaced by
\[ p_{rs}(t - u, z(u)) \]
although this requires that the value of the covariate is known at every
observation time $u$.  Sometimes covariates are observed at different times to
the main response, for example recurrent disease events or other biological
markers.  In some of these cases it could be assumed that the covariate is a
step function which remains constant between its observation times.  If the
main response (the state of the Markov process) is not observed at the times
when the covariate changes, it could be considered as a ``censored'' state (as
in Section \ref{sec:multi:likelihood}).

The \Rpackage{msm} package allows individual-specific or time-dependent
covariates to be fitted to transition intensities.  In order to calculate
transition probabilities $P(t)$ on which the likelihood depends,
time-dependent covariates are assumed to be piecewise-constant.
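As a small numeric illustration of the proportional-intensities model above
(with hypothetical values):
\begin{Scode}
q0   <- 0.25     # hypothetical baseline intensity q_rs^(0)
beta <- log(2)   # log hazard ratio: this covariate doubles the rate
q0 * exp(beta * c(0, 1))  # intensity at covariate values z = 0 and z = 1
\end{Scode}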
Models whose intensities change with time are called
\emph{time-inhomogeneous}.  An important special case handled by
\Rpackage{msm} is the model in which intensities change at a series of times
common to each individual.

Marshall and Jones \cite{retino2} described likelihood ratio and Wald tests
for covariate selection and testing hypotheses, for example whether the effect
of a covariate is the same for all forward transitions in a disease
progression model, or whether the effect on backward transitions is equal to
minus the effect on forward transitions.

\subsection{Hidden Markov models}
\label{sec:hmm}

In a {\em hidden Markov model} (HMM) the states of the Markov chain are not
observed.  The observed data are governed by some probability distribution
(the \emph{emission} distribution) conditionally on the unobserved state.  The
evolution of the underlying Markov chain is governed by a transition intensity
matrix $Q$ as before (Figure \ref{fig:multi:hidden}).  Hidden Markov models
are mixture models, where observations are generated from a certain number of
unknown distributions.  However, the distribution changes through time
according to the states of a hidden Markov chain.  This class of model is
commonly used in areas such as speech and signal processing
\cite{juang:rabiner} and the analysis of biological sequence data
\cite{biolog:seq}.  In engineering and biological sequencing applications, the
Markov process usually evolves over an equally-spaced, discrete `time' space.
Therefore most of the theory of HMM estimation was developed for discrete-time
models.

HMMs have less frequently been used in medicine, where continuous-time
processes are often more suitable.  A disease process evolves in continuous
time, and patients are often monitored at irregular and differing intervals.
These models are suitable for estimating population quantities for chronic
diseases which have a natural staged interpretation, but which can only be
diagnosed by an error-prone marker.  The \Rpackage{msm} package can fit
continuous-time hidden Markov models with a variety of emission
distributions.

\begin{figure}[htbp]
  \begin{center}
    \scalebox{0.6}{\rotatebox{270}{\includegraphics{figures/hidden}}}
    \caption{A hidden Markov model in continuous time.  Observed states are
      generated conditionally on an underlying Markov process.}
    \label{fig:multi:hidden}
  \end{center}
\end{figure}

\subsubsection{Misclassification models}

An example of a hidden Markov model is a multi-state model with
misclassification.  Here the observed data are states, assumed to be
misclassifications of the true, underlying states.

For example, consider a disease progression model with at least a disease-free
and a disease state.  Screening for the presence of the disease can sometimes
be subject to error.  Then the Markov disease process $S_i(t)$ for individual
$i$ is not observed directly, but through realisations $O_i(t)$.  The quality
of a diagnostic test is often measured by the probabilities that the true and
observed states are equal, $Pr(O_i(t) = r | S_i(t) = r)$.  Where $r$
represents a `positive' disease state, this is the {\em sensitivity}, or the
probability that a true positive is detected by the test.  Where $r$
represents a `negative' or disease-free state, this represents the {\em
  specificity}, or the probability that, given the condition of interest is
absent, the test produces a negative result.
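To fix ideas before the formal definition in the next section, sensitivity and
specificity form the diagonal of a matrix of classification probabilities
(a sketch with hypothetical values):
\begin{Scode}
## Rows = true state, columns = observed state (hypothetical values):
## state 1 = disease-free, state 2 = diseased
E <- rbind(c(0.95, 0.05),   # specificity 0.95 for the disease-free state
           c(0.10, 0.90))   # sensitivity 0.90 for the disease state
rowSums(E)                  # each row sums to 1
\end{Scode}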
As an extension to the simple multi-state model described in section
\ref{sec:multistate}, the \Rpackage{msm} package can fit a general multi-state
model with misclassification.  For patient $i$, observation time $t_{ij}$,
observed states $O_{ij}$ are generated conditionally on true states $S_{ij}$
according to a {\em misclassification matrix} $E$.  This is an $n \times n$
matrix, whose $(r,s)$ entry is
\begin{equation}
\label{eq:misc}
e_{rs} = Pr(O(t_{ij}) = s | S(t_{ij}) = r),
\end{equation}
which we first assume to be independent of time $t$.  Analogously to the
entries of $Q$, some of the $e_{rs}$ may be fixed to reflect knowledge of the
diagnosis process.  For example, the probability of misclassification may be
negligibly small for non-adjacent states of disease.  Thus the progression
through underlying states is governed by the transition intensity matrix $Q$,
while the observation process of the underlying states is governed by the
misclassification matrix $E$.

To investigate explanatory variables $w(t)$ for the probability $e_{rs}$ of
misclassification as state $s$ given underlying state $r$, a multinomial
logistic regression model can be used:
\begin{equation}
\label{eq:misccovs}
\log \frac{e_{rs}(t)}{e_{rs_0}(t)} = \gamma_{rs}^T w(t).
\end{equation}
where $s_0$ is some baseline state, usually chosen as the underlying state, or
the state with the highest probability (for numerical stability).

\subsubsection{General hidden Markov model}
\label{sec:hmm:general}

Consider now a general hidden Markov model in continuous time.  The true state
of the model $S_{ij}$ evolves as an unobserved Markov process.  Observed data
$y_{ij}$ are generated conditionally on the true states $S_{ij} = 1, 2,
\ldots, n$ according to a set of distributions $f_1(y | \theta_1, \gamma_1)$,
$f_2(y | \theta_2, \gamma_2)$, $\ldots$, $f_n(y | \theta_n, \gamma_n)$,
respectively.  $\theta_r$ is a vector of parameters for the state $r$
distribution.  One or more of these parameters may depend on explanatory
variables through a link-transformed linear model with coefficients
$\gamma_r$.

\subsubsection{Likelihood for general hidden Markov models}

A type of EM algorithm known as the {\em Baum-Welch} or {\em forward-backward}
algorithm \cite{baum:petrie66, baum:petrie70} is commonly used for hidden
Markov model estimation in discrete-time applications.  See, for example,
Durbin \etal \cite{biolog:seq}, Albert \cite{albert99}.  A generalisation of
this algorithm to continuous time was described by Bureau \etal
\cite{bureau:hughes:shiboski:00}.

The \Rpackage{msm} package uses a direct method of calculating likelihoods for
continuous-time models based on matrix products.  This type of method has been
described by Macdonald and Zucchini \cite[pp.\ 77--79]{macdonald:zucchini},
Lindsey \cite[p.\ 73]{lindsey:rm} and Guttorp \cite{guttorp}.  Satten and
Longini \cite{sattlong} used this method to calculate likelihoods for a hidden
Markov model in continuous time with observations of a continuous marker
generated conditionally on underlying discrete states.

Patient $i$'s contribution to the likelihood is
\begin{eqnarray}
\label{eq:multi:hiddencontrib}
L_i & = & Pr(y_{i1}, \ldots, y_{in_i}) \\
& = & \sum Pr(y_{i1}, \ldots, y_{in_i} | S_{i1}, \ldots, S_{in_i})
Pr(S_{i1}, \ldots, S_{in_i}) \nonumber
\end{eqnarray}
where the sum is taken over all possible paths of underlying states $S_{i1},
\ldots, S_{in_i}$.  Assume that the observed states are conditionally
independent given the values of the underlying states.
Also assume the Markov property, $Pr(S_{ij}|S_{i,j-1}, \ldots, S_{i1}) =
Pr(S_{ij}|S_{i,j-1})$.

Then the contribution $L_i$ can be written as a product of matrices, as
follows.  To derive this matrix product, decompose the overall sum in equation
\ref{eq:multi:hiddencontrib} into sums over each underlying state.  The sum is
accumulated over the unknown first state, the unknown second state, and so on
until the unknown final state:
\begin{eqnarray}
\label{eq:multi:hiddenlik}
L_i & = & \sum_{S_{i1}} Pr(y_{i1}|S_{i1})Pr(S_{i1}) \sum_{S_{i2}}
Pr(y_{i2}|S_{i2})Pr(S_{i2}|S_{i1}) \sum_{S_{i3}} Pr(y_{i3}|S_{i3})
Pr(S_{i3}|S_{i2}) \nonumber \\
& & \ldots \sum_{S_{in_i}} Pr(y_{in_i}|S_{in_i}) Pr(S_{in_i}|S_{i,n_i-1})
\end{eqnarray}
where $Pr(y_{ij}|S_{ij})$ is the emission probability density.  For
misclassification models, this is the misclassification probability
$e_{S_{ij} O_{ij}}$.  For general hidden Markov models, this is the
probability density $f_{S_{ij}}(y_{ij}|\theta_{S_{ij}},\gamma_{S_{ij}})$.
$Pr(S_{i,j+1}|S_{ij})$ is the $(S_{ij}, S_{i,j+1})$ entry of the Markov chain
transition matrix $P(t)$ evaluated at $t = t_{i,j+1} - t_{ij}$.

Let $f$ be the vector whose $r$th element is the product of the initial state
occupation probability $Pr(S_{i1}=r)$ and $Pr(y_{i1}| r)$, and let $\mathbf 1$
be a column vector consisting of ones.  For $j = 2, \ldots, n_i$ let $T_{ij}$
be the $R \times R$ matrix (where $R$ is the number of states) with $(r,s)$
entry
\[ Pr(y_{ij}| s) p_{rs} (t_{ij} - t_{i,j-1}) \]
Then subject $i$'s likelihood contribution is
\begin{equation}
\label{eq:multi:hidden:matprod}
L_i = f T_{i2} T_{i3} \ldots T_{in_i} \mathbf 1
\end{equation}
If $S(t_{j}) = D$ is an absorbing state such as death, measured without error,
whose entry time is known exactly, then the contribution to the likelihood is
summed over the unknown state at the previous instant before death.  The
$(r,s)$ entry of $T_{ij}$ is then
\[ p_{rs}(t_{j} - t_{j-1}) q_{s, D} \]

Section \ref{sec:fitting:hmm:misc} describes how to fit multi-state models
with misclassification using the \Rpackage{msm} package, and Section
\ref{sec:fitting:hmm:general} describes how to apply general hidden Markov
models.

\subsubsection{Example of a general hidden Markov model}
\label{sec:hmm:example:fev}

Jackson and Sharples \cite{jackson:sharples:2002} described a model for
FEV$_1$ (forced expiratory volume in 1 second) in recipients of lung
transplants.  These patients were at risk of BOS (bronchiolitis obliterans
syndrome), a progressive, chronic deterioration in lung function.  In this
example, BOS was modelled as a discrete, staged process, a model of the form
illustrated in Figure \ref{fig:disease}, with 4 states.  State 1 represents
absence of BOS.  State 1 BOS is roughly defined as a sustained drop below
80\% of baseline FEV$_1$, while state 2 BOS is a sustained drop below 65\% of
baseline.  FEV$_1$ is measured as a percentage of a baseline value for each
individual, determined in the first six months after transplant, and assumed
to be 100\% of baseline at six months.

As FEV$_1$ is subject to high short-term variability due to acute events and
natural fluctuations, the exact BOS state at each observation time is
difficult to determine.  Therefore, a hidden Markov model for FEV$_1$,
conditionally on underlying BOS states, was used to model the natural history
of the disease.  Discrete states are appropriate as onset is often sudden.
\paragraph{Model 1} Jackson \cite{my:phd} considered models for these data
where FEV$_1$ was Normally distributed, with an unknown mean and variance
conditionally on each state (\ref{eq:fev:normal}).  This model seeks the most
likely location for the within-state FEV$_1$ means.
\begin{equation}
\label{eq:fev:normal}
y_{ij} | \{ S_{ij} = k\} \sim N(\mu_k + \beta x_{ij}, \sigma^2_k)
\end{equation}

\paragraph{Model 2} Jackson and Sharples \cite{jackson:sharples:2002} used a
more complex two-level model for FEV$_1$ measurements.  Level 1
(\ref{eq:fev:level1}) represents the short-term fluctuation error of the
marker around its underlying continuous value $y^{hid}_{ij}$.  Level 2
(\ref{eq:fev:level2}) represents the distribution of $y^{hid}_{ij}$
conditionally on each underlying state, as follows.
\begin{equation}
\label{eq:fev:level1}
y_{ij} | y^{hid}_{ij} \qquad \sim N ( y^{hid}_{ij} + \beta x_{ij} , \sigma^2_\epsilon)
\end{equation}
\begin{equation}
\label{eq:fev:level2}
y^{hid}_{ij} | S_{ij} \quad \sim \quad \left\{
\begin{array}{cll}
\mbox{State} & \mbox{Three state model} & \mbox{Four state model} \\
S_{ij} = 0 & N(\mu_0, \sigma^2_0)I_{[80, \infty)} & N(\mu_0, \sigma^2_0)I_{[80, \infty)} \\
S_{ij} = 1 & N(\mu_1, \sigma^2_1)I_{(0, 80)} & Uniform(65, 80) \\
S_{ij} = 2 & \mbox{(death)} & N(\mu_2, \sigma^2_2)I_{(0, 65)} \\
S_{ij} = 3 & & \mbox{(death)}
\end{array}
\right .
\end{equation}
Integrating over $y^{hid}_{ij}$ gives an explicit distribution for $y_{ij}$
conditionally on each underlying state (given in Section
\ref{sec:fitting:hmm:general}, Table \ref{tab:hmm:dists}).  Similar
distributions were originally applied by Satten and Longini \cite{sattlong} to
modelling the progression through discrete, unobserved HIV disease states
using continuous CD4 cell counts.  The \Rpackage{msm} package includes
density, quantile, cumulative distribution and random number generation
functions for these distributions.

In both models 1 and 2, the term $\beta x_{ij}$ models the short-term
fluctuation of the marker in terms of acute events.  $x_{ij}$ is an indicator
for the occurrence of an acute rejection or infection episode within 14 days
of the observation of FEV$_1$.

Section \ref{sec:fitting:hmm:general} describes how these and more general
hidden Markov models can be fitted with the \Rpackage{msm} package.

\clearpage

\section{Fitting multi-state models with {\tt msm}}

<<echo=FALSE>>=
options(width = 60)
@

\Rpackage{msm} is a package of functions for multi-state modelling using the R
statistical software.  The \Rfunction{msm} function itself implements
maximum-likelihood estimation for general multi-state Markov or hidden Markov
models in continuous time.  We illustrate its use with a set of data from
monitoring heart transplant patients.  Throughout this section
``\textsl{\texttt{>}}'' indicates the R command prompt,
\textsl{\texttt{slanted typewriter}} text indicates R commands, and
\texttt{typewriter} text R output.

\subsection{Installing {\tt msm}}
\label{sec:installing}

The easiest way to install the \Rpackage{msm} package on a computer connected
to the Internet is to run the R command:
\begin{Scode}
install.packages("msm")
\end{Scode}
This downloads \Rpackage{msm} from the CRAN archive of contributed R packages
(\texttt{cran.r-project.org} or one of its mirrors) and installs it to the
default R system library.
To install to a different location, for example if you are a normal user with
no administrative privileges, create a directory in which R packages are to be
stored, say, \texttt{/your/library/dir}, and run
\begin{Scode}
install.packages("msm", lib='/your/library/dir')
\end{Scode}
After \Rpackage{msm} has been installed, its functions can be made visible in
an R session by
<<>>=
library(msm)
@
or, if it has been installed into a non-default library,
\begin{Scode}
library(msm, lib.loc='/your/library/dir')
\end{Scode}

\subsection{Getting the data in}
\label{sec:datain}

The data are specified as a series of observations, grouped by patient.  At
minimum there should be a data frame with variables indicating
\begin{itemize}
\item the time of the observation,
\item the observed state of the process.
\end{itemize}
If the data do not also contain
\begin{itemize}
\item the subject identification number,
\end{itemize}
then all the observations are assumed to be from the same subject.  The
subject ID does not need to be numeric, but data must be grouped by subject,
and observations must be ordered by time within subjects.  If the model
includes variables with missing values, then the corresponding observations
are omitted by \Rfunction{msm} with a warning.  As in any statistical
analysis, if you have missing data, it is worth checking that omitting them
will not bias the results.

An example data set, taken from monitoring a set of heart transplant
recipients, is provided with \Rpackage{msm}.  (Note: since \Rpackage{msm}
version 1.3, the command \Rfunction{data(cav)} is no longer needed to load the
data --- it is now ``lazy-loaded'' when required).  Sharples \etal
\cite{my:cav} studied the progression of coronary allograft vasculopathy
(CAV), a post-transplant deterioration of the arterial walls, using these
data.  Risk factors and the accuracy of the screening test were investigated
using multi-state Markov and hidden Markov models.

The first three patient histories are shown below.  There are 622 patients in
all.  \Robject{PTNUM} is the subject identifier.  Approximately each year
after transplant, each patient has an angiogram, at which CAV can be
diagnosed.  The result of the test is in the variable \Robject{state}, with
possible values 1, 2, 3 representing CAV-free, mild CAV and moderate or severe
CAV respectively.  A value of 4 is recorded at the date of death.
\Robject{years} gives the time of the test in years since the heart
transplant.  Other variables include \Robject{age} (age at screen),
\Robject{dage} (donor age), \Robject{sex} (0=male, 1=female), \Robject{pdiag}
(primary diagnosis, or reason for transplant --- IHD represents ischaemic
heart disease, IDC represents idiopathic dilated cardiomyopathy),
\Robject{cumrej} (cumulative number of rejection episodes), and
\Robject{firstobs}, an indicator which is 1 when the observation corresponds
to the patient's transplant (the first observation), and 0 when the
observation corresponds to a later angiogram.

<<>>=
cav[1:21,]
@

A useful way to summarise multi-state data is as a frequency table of pairs of
consecutive states.  This counts, over all individuals, for each state $r$ and
$s$, the number of times an individual had an observation of state $r$
followed by an observation of state $s$.  The function
\Rfunction{statetable.msm} can be used to produce such a table, as follows,
<<>>=
statetable.msm(state, PTNUM, data=cav)
@
Thus there were 148 CAV-free deaths, 48 deaths from state 2, and 55 deaths
from state 3.
On only four occasions was there an observation of severe CAV followed by an
observation of no CAV.

\subsection{Specifying a model}
\label{sec:specifying:model}

We now specify the multi-state model to be fitted to the data.  A model is
governed by a transition intensity matrix $Q$.  For the heart transplant
example, there are four possible states through which the patient can move,
corresponding to CAV-free, mild/moderate CAV, severe CAV and death.  We assume
that the patient can advance or recover from consecutive states while alive,
and die from any state.  Thus the model is illustrated by Figure
\ref{fig:disease} with four states, and we have
\[ Q =
\left(
\begin{array}{llll}
-(q_{12} + q_{14}) & q_{12} & 0 & q_{14}\\
q_{21} & -(q_{21}+q_{23}+q_{24}) & q_{23} & q_{24}\\
0 & q_{32} & -(q_{32}+q_{34}) & q_{34}\\
0 & 0 & 0 & 0 \\
\end{array}
\right )
\]
It is important to remember that this defines which \emph{instantaneous}
transitions can occur in the Markov process, and that the data are
\emph{snapshots} of the process (see section \ref{sec:arbitr-observ-times}).
Although there were 44 occasions on which a patient was observed in state 1
followed by state 3, we can still have $q_{13}=0$.  The underlying model
specifies that the patient must have passed through state 2 in between, rather
than jumping straight from 1 to 3.  If your data represent the exact and
complete transition times of the process, then you must specify
\Rfunarg{exacttimes=TRUE} or \Rfunarg{obstype=2} in the call to
\Rfunction{msm}.

To tell \Rfunction{msm} what the allowed transitions of our model are, we
define a matrix of the same size as $Q$, containing zeroes in the positions
where the entries of $Q$ are zero.  All other positions contain an initial
value for the corresponding transition intensity.  The diagonal entries
supplied in this matrix do not matter, as the diagonal entries of $Q$ are
defined as minus the sum of all the other entries in the row.  This matrix
will eventually be used as the \Rfunarg{qmatrix} argument to the
\Rfunction{msm} function.  For example,
<<>>=
Q <- rbind ( c(0, 0.25, 0, 0.25),
             c(0.166, 0, 0.166, 0.166),
             c(0, 0.25, 0, 0.25),
             c(0, 0, 0, 0) )
@
Fitting the model is a process of finding values of the seven unknown
transition intensities: $q_{12}$, $q_{14}$, $q_{21}$, $q_{23}$, $q_{24}$,
$q_{32}$, $q_{34}$, which maximise the likelihood.

\subsection{Specifying initial values}
\label{sec:inits}

The likelihood is maximised by numerical methods, which need a set of initial
values to start the search for the maximum.  For reassurance that the true
maximum likelihood estimates have been found, models should be run repeatedly
starting from different initial values.  However a sensible choice of initial
values can be important for unstable models with flat or multi-modal
likelihoods.  For example, the transition rates for a model with
misclassification could be initialised at the corresponding estimates for an
approximating model without misclassification.  Initial values for a model
without misclassification could be set by supposing that transitions between
states take place only at the observation times.  If we observe $n_{rs}$
transitions from state $r$ to state $s$, and a total of $n_r$ transitions from
state $r$, then $-q_{rs} / q_{rr}$ can be estimated by $n_{rs} / n_r$.  Then,
given a total of $T_r$ years spent in state $r$, the mean sojourn time $-1 /
q_{rr}$ can be estimated as $T_r / n_r$.  Thus, $n_{rs} / T_r$ is a crude
estimate of $q_{rs}$, as illustrated in the sketch below.
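A small worked version of this arithmetic, with hypothetical counts:
\begin{Scode}
## Hypothetical counts: 20 transitions observed out of state 1,
## of which 15 were to state 2, over 100 person-years in state 1
n_12 <- 15; n_1 <- 20; T_1 <- 100
T_1 / n_1    # crude mean sojourn time in state 1: 5 years
n_12 / T_1   # crude q_12 = (n_12 / n_1) * (n_1 / T_1) = 0.15
\end{Scode}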
Such default initial values, computed in this way from the data, can be used
by supplying \Rfunarg{gen.inits=TRUE} in the call to \Rfunction{msm} below,
along with a \Rfunarg{qmatrix} whose non-zero entries represent the allowed
transitions of the model.  Alternatively the function
\Rfunction{crudeinits.msm} could be used to get this matrix of initial values
explicitly as follows.  These methods are only available for non-hidden Markov
models.
<<>>=
Q.crude <- crudeinits.msm(state ~ years, PTNUM, data=cav, qmatrix=Q)
@
However, if there are many changes of state in between the observation times,
then this crude approach may fail to give sensible initial values.  For the
heart transplant example we could also guess that the mean period in each
state before moving to the next is about 2 years, and there is an equal
probability of progression, recovery or death.  This gives $q_{rr} = - 0.5$
for $r = 1, 2, 3$, and $q_{12} = q_{14} = 0.25$, $q_{21} = q_{23} = q_{24} =
0.166$, $q_{32} = q_{34} = 0.25$, and the initial value matrix \Robject{Q}
shown above, which we now use to fit the model.

\subsection{Running \Rfunction{msm}}
\label{sec:running}

To fit the model, call the \Rfunction{msm} function with the appropriate
arguments.  For our running example, we have defined a data set \Robject{cav},
a matrix \Robject{Q} indicating the allowed transitions, and initial values.
We are ready to run \Rfunction{msm}.

\paragraph{Model 1: simple bidirectional model}
<<>>=
cav.msm <- msm( state ~ years, subject=PTNUM, data = cav,
                qmatrix = Q, deathexact = 4)
@
In this example the day of death is assumed to be recorded exactly, as is
usual in studies of chronic diseases.  At the previous instant before death
the state of the patient is unknown.  Thus we specify \Rfunarg{deathexact =
  4}, to indicate to \Rfunction{msm} that the entry times into state 4 are
observed in this manner.  If the model had five states, and states 4 and 5
were two competing causes of death with times recorded exactly in this way,
then we would specify \Rfunarg{deathexact = c(4,5)}.

By default, the data are assumed to represent snapshots of the process at
arbitrary times.  However, observations can also represent exact times of
transition, ``exact death times'', or a mixture of these.  See the
\Rfunarg{obstype} argument to \Rfunction{msm}.

While the \Rfunction{msm} function runs, it searches for the maximum of the
likelihood of the unknown parameters.  Internally, it uses the R function
\Rfunction{optim} to minimise the minus log-likelihood.  When the data set,
the model, or both, are large, then this may take a long time.  It can then be
useful to see the progress of the optimisation algorithm.  To do this, we can
specify a \Rfunarg{control} argument to \Rfunction{msm}, which is passed
internally to the \Rfunction{optim} function.  For example \texttt{control =
  list(trace=1, REPORT=1)}.  See the help page for \Rfunction{optim},
<<eval=FALSE>>=
help(optim)
@
for more options to control the optimisation.\footnote{Note that since version
  1.3.2, \Rfunarg{method="BFGS"} is the default optimisation algorithm in
  \Rfunction{msm}, since it can use analytic derivatives, which are available
  for most models.}

When completed, the \Rfunction{msm} function returns a value.  This value is a
list of the important results of the model fitting, including the parameter
estimates and their covariances.  To keep these results for post-processing,
we store them in an R object, here called \Robject{cav.msm}.
When running several similar \Rfunction{msm} models, it is recommended to
store the respective results in informatively-named objects.

\subsection{Showing results}

To show the maximum likelihood estimates and 95\% confidence intervals, type
the name of the fitted model object at the R command prompt.\footnote{This is
  equivalent to typing \texttt{print.msm(cav.msm)}.  The function
  \Rfunction{print.msm} formats the important information in the model object
  for printing on the screen.}  The confidence level can be changed using the
\Rfunarg{cl} argument to \Rfunction{msm}.
<<>>=
cav.msm
@
From the estimated intensities, we see patients are three times as likely to
develop symptoms as to die without symptoms (transitions from state 1).  After
disease onset (state 2), progression to severe symptoms (state 3) is 50\% more
likely than recovery.  Once in the severe state, death is more likely than
recovery, and a mean of $1 / 0.44 = 2.3$ years is spent in state 3 before
death or recovery.

Section \ref{sec:extractor} describes various functions that can be used to
obtain summary information from the fitted model.

\subsection{Covariates on the transition rates}
\label{sec:msm:covariates}

We now model the effect of explanatory variables on the rates of transition,
using a proportional intensities model.  Now we have an intensity matrix
$Q(z)$ which depends on a covariate vector $z$.  For each entry of $Q(z)$, the
transition intensity for patient $i$ at observation time $j$ is
$q_{rs}(z_{ij}) = q_{rs}^{(0)} \exp(\beta_{rs}^T z_{ij})$.  The covariates $z$
are specified through the \Rfunarg{covariates} argument to \Rfunction{msm}.
If $z_{ij}$ is time-dependent, we assume it is constant in between the
observation times of the Markov process.  \Rfunction{msm} calculates the
probability for a state transition from times $t_{i,j-1}$ to $t_{ij}$ using
the covariate value at time $t_{i,j-1}$.

We consider a model with just one covariate, female sex.  Out of the 622
transplant recipients, 535 are male and 87 are female.  By default, all linear
covariate effects $\beta_{rs}$ are initialised to zero.  To specify different
initial values, use a \Rfunarg{covinits} argument, as described in
\Rfunction{help(msm)}.  Initial values given in the \Rfunarg{qmatrix}
represent the intensities with covariate values set to their means in the
data.  In the following model, all transition intensities are modelled in
terms of sex.

\paragraph{Model 2: sex as a covariate}
<<>>=
cavsex.msm <- msm( state ~ years, subject=PTNUM, data = cav,
                   qmatrix = Q, deathexact = 4, covariates = ~ sex)
@
Printing the \Robject{msm} object now displays the estimated covariate effects
and their confidence intervals (note that since version 1.3.2 these are
\emph{hazard ratios} $\exp(\beta_{rs})$, not \emph{log hazard ratios}
$\beta_{rs}$ as in previous versions).
<<>>=
cavsex.msm
@
The sizes of the confidence intervals for some of the hazard ratios suggest
there is no information in the data about the corresponding covariate effects,
leading to a likelihood that is a flat function of these parameters, and this
model should be simplified.  The first column shown in the output is the
estimated transition intensity matrix $q_{rs}(z) = q_{rs}^{(0)}
\exp(\beta_{rs}^T z)$ with the covariate $z$ set to its mean value in the
data.  This represents an average intensity matrix for the population of 535
male and 87 female patients.
To extract separate intensity matrices for male and female patients ($z = 0$
and $1$ respectively), use the function \Rfunction{qmatrix.msm}, as shown
below.  This and similar summary functions will be described in more detail in
section \ref{sec:extractor}.
<<>>=
qmatrix.msm(cavsex.msm, covariates=list(sex=0)) # Male
qmatrix.msm(cavsex.msm, covariates=list(sex=1)) # Female
@
Since \Rpackage{msm} version 1.2.3, different transition rates may be easily
modelled on different covariates by specifying a named list of formulae as the
\Rfunarg{covariates} argument.  Each element of the list has a name
identifying the transition.  In the model below, the transition rate from
state 1 to state 2 and the rate from state 1 to state 4 are each modelled on
sex as a covariate, but no other intensities have covariates on them.

\paragraph{Model 2a: transition-specific covariates}
<<eval=FALSE>>=
cavsex.msm <- msm( state ~ years, subject=PTNUM, data = cav,
                   qmatrix = Q, deathexact = 4,
                   covariates = list("1-2" = ~ sex, "1-4" = ~ sex) )
@
We may also want to constrain the effect of a covariate to be equal for
certain transition rates, to reduce the number of parameters in the model, or
to investigate hypotheses on the covariate effects.  A \Rfunarg{constraint}
argument can be used to indicate which of the transition rates have common
covariate effects.

\paragraph{Model 3: constrained covariate effects}
<<eval=FALSE>>=
cav3.msm <- msm( state ~ years, subject=PTNUM, data = cav,
                 qmatrix = Q, deathexact = 4,
                 covariates = ~ sex,
                 constraint = list(sex=c(1,2,3,1,2,3,2)) )
@
This constrains the effect of sex to be equal for the progression rates
$q_{12}, q_{23}$, equal for the death rates $q_{14}, q_{24}, q_{34}$, and
equal for the recovery rates $q_{21}, q_{32}$.  The intensity parameters are
assumed to be ordered by reading across the rows of the transition matrix,
starting at the first row: ($q_{12}, q_{14}, q_{21}, q_{23}, q_{24}, q_{32},
q_{34}$), giving constraint indicators \Rfunarg{(1,2,3,1,2,3,2)}.  Any vector
of increasing numbers can be used for the indicators.  Negative entries can be
used to indicate that some effects are equal to minus others:
\Rfunarg{(1,2,3,-1,2,3,2)} sets the fourth effect to be minus the first.

In a similar manner, we can constrain some of the baseline transition
intensities to be equal to one another, using the \Rfunarg{qconstraint}
argument.  For example, to constrain the rates $q_{12}$ and $q_{23}$ to be
equal, and $q_{24}$ and $q_{34}$ to be equal, specify \Rfunarg{qconstraint =
  c(1,2,3,1,4,5,4)}.

\subsection{Fixing parameters at their initial values}

For exploratory purposes we may want to fit a model assuming that some
parameters are fixed, and estimate the remaining parameters.  This may be
necessary in cases where there is not enough information in the data to be
able to estimate a proposed model, and we have strong prior information about
a certain transition rate.  To do this, use the \Rfunarg{fixedpars} argument
to \Rfunction{msm}.  For model 1, the following statement fixes the parameters
numbered 6, 7, that is, $q_{32}$, $q_{34}$, to their initial values (0.25 and
0.25, respectively).

\paragraph{Model 4: fixed parameters}
<<eval=FALSE>>=
cav4.msm <- msm( state ~ years, subject=PTNUM, data = cav,
                 qmatrix = Q, deathexact = 4,
                 control = list(trace=2, REPORT=1),
                 fixedpars = c(6, 7) )
@
A \Rfunarg{fixedpars} statement can also be used for fixing covariate effect
parameters to zero, that is to assume no effect of a covariate on a certain
transition rate.
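Nested models such as those above can be compared with likelihood ratio tests
using the \Rfunction{lrtest.msm} function.  A sketch, comparing model 1 (no
covariates) with model 2 (sex on all seven transition rates), assuming both
models have been fitted as shown earlier:
\begin{Scode}
## Likelihood ratio test: model 1 nested within model 2 (7 df)
lrtest.msm(cav.msm, cavsex.msm)
\end{Scode}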
\subsection{Extractor functions}
\label{sec:extractor}

We may want to extract some of the information from the \Rfunction{msm} model
fit for post-processing, for example for plotting graphs or generating summary
tables.  A set of functions is provided for extracting interesting features of
the fitted model.

\begin{description}
\item[Intensity matrices] The function \Rfunction{qmatrix.msm} extracts the
  estimated transition intensity matrix and its confidence intervals for a
  given set of covariate values, as shown in section
  \ref{sec:msm:covariates}.  Confidence intervals are calculated from the
  covariance matrix of the estimates by assuming the distribution is symmetric
  on the log scale.

  Standard errors for the intensities are also available from the object
  returned by \Rfunction{qmatrix.msm}.  These are calculated by the delta
  method.  The \Rpackage{msm} package provides a general-purpose function
  \Rfunction{deltamethod} for estimating the variance of a function of a
  random variable $X$ given the expectation and variance of $X$.  See
  \texttt{help(deltamethod)} for further details.  Bootstrap confidence
  intervals are also available for \Rfunction{qmatrix.msm} and for most output
  functions; these are often more accurate, at the cost of computational time.
  For more about bootstrapping in \Rpackage{msm}, see Section \ref{sec:boot}.

\item[Transition probability matrices] The function \Rfunction{pmatrix.msm}
  extracts the estimated transition probability matrix $P(t)$ for a given time
  $t$.  For example, for model 1, the 10-year transition probabilities are
  given by:
<<>>=
pmatrix.msm(cav.msm, t=10)
@
  Thus, a typical person in state 1, disease-free, has a probability of 0.5 of
  being dead ten years from now, a probability of 0.3 of still being
  disease-free, and probabilities of 0.1 each of being alive with
  mild/moderate or severe disease.  This assumes $Q$ is constant within the
  desired time interval.  For non-homogeneous processes, where $Q$ varies with
  time-dependent covariates but can be approximated as piecewise constant,
  there is an equivalent function \Rfunction{pmatrix.piecewise.msm}.  Consult
  its help page for further details.

  If \Rfunarg{ci="norm"} is specified, then a confidence interval is
  calculated based on drawing a random sample (default size 1000) from the
  assumed multivariate normal distribution of the maximum likelihood estimates
  and covariance matrix, and transforming.  If \Rfunarg{ci="boot"} is
  specified, then a bootstrap confidence interval for the transition
  probability matrix is calculated (see Section \ref{sec:boot}).  However,
  both of these are computationally intensive, particularly the bootstrap
  method, so no confidence interval is calculated by default.

\item[Mean sojourn times] The function \Rfunction{sojourn.msm} extracts the
  estimated mean sojourn times in each transient state $r$, for a given set of
  covariate values.  This is calculated as $-1 / \hat q_{rr}$, where $\hat
  q_{rr}$ is the $r$th diagonal entry of the estimated transition intensity
  matrix.
<<>>=
sojourn.msm(cav.msm)
@

\item[Probability that each state is next] The function \Rfunction{pnext.msm}
  extracts the matrix of probabilities $-q_{rs} / q_{rr}$ that the next state
  after state $r$ is state $s$, for each $r$ and $s$.  Together with the mean
  sojourn times, this gives a more intuitive parameterisation of a
  continuous-time Markov model than the raw transition intensities $q_{rs}$.
  Note these are different from the transition probabilities in a given time
  $t$ returned by \Rfunction{pmatrix.msm}.
<<>>=
pnext.msm(cav.msm)
@

\item[Total length of stay] Mean sojourn times describe the average period in a single stay in a state. For processes with successive periods of recovery and relapse, we may want to forecast the total time spent healthy or diseased, before death. The function \Rfunction{totlos.msm} estimates the forecasted total length of time spent in each transient state $s$ between two future time points $t_1$ and $t_2$, for a given set of covariate values. This defaults to the expected amount of time spent in each state between the start of the process (time 0, the present time) and death or a specified future time. This is obtained as
\[ L_s = \int_{t_1}^{t_2} P(t)_{r,s} dt \]
where $r$ is the state at the start of the process, which defaults to 1. This is calculated using numerical integration. For model 1, each patient is forecast to spend 8.8 years disease-free, 2.2 years with mild or moderate disease, and 1.8 years with severe disease. Bootstrap and asymptotic confidence intervals are available, as for \Rfunction{pmatrix.msm}, but are not calculated by default.

<<>>=
totlos.msm(cav.msm)
@

\item[Expected first passage times] The function \Rfunction{efpt.msm} estimates the expected time until the process first enters a given state or set of states, also called the ``hitting time''. See its help page for further details.

\item[Expected number of visits] The function \Rfunction{envisits.msm} estimates the expected number of visits to a state, computed in a similar way to the total length of stay. See its help page for further details.

\item[Ratio of transition intensities] The function \Rfunction{qratio.msm} estimates a ratio of two entries of the transition intensity matrix at a given set of covariate values, together with a confidence interval estimated assuming normality on the log scale and using the delta method. For example, we may want to estimate the ratio of the recovery rate $q_{21}$ to the corresponding rate of progression $q_{12}$ into the first state of disease. In model 1, this ratio is estimated as 1.8, so that recovery is 1.8 times as likely as progression.

<<>>=
qratio.msm(cav.msm, ind1=c(2,1), ind2=c(1,2))
@

\item[Hazard ratios for transition] The function \Rfunction{hazard.msm} gives the estimated hazard ratios corresponding to each covariate effect on the transition intensities. 95\% confidence limits are computed by assuming normality of the log-effect.

<<>>=
hazard.msm(cavsex.msm)
@

\end{description}

\paragraph{Setting covariate values} All of these extractor functions take an argument called \Rfunarg{covariates}. If this argument is omitted, for example,

<>=
qmatrix.msm(cav.msm)
@

then the intensity matrix is evaluated as $Q(\bar x)$ with all covariates set to their mean values $\bar x$ in the data. For factor (categorical) variables, the mean of the 0/1 dummy variable for each factor level is used, representing an average over all values in the data, rather than a specific factor level.

Alternatively, set \Rfunarg{covariates} to 0 to return the result $Q(0)$ with covariates set to zero. This will usually be preferable for categorical covariates, where we wish to see the result for the baseline category.

<>=
qmatrix.msm(cavsex.msm, covariates = 0)
@

Finally, the desired covariate values can be specified explicitly as a list,

<>=
qmatrix.msm(cavsex.msm, covariates = list(sex = 1))
@

Values of categorical covariates must be quoted.
For example, consider a covariate \texttt{smoke}, representing tobacco smoking status, with three levels, \texttt{NON, CURRENT, EX}, representing a non-smoker, current smoker or ex-smoker.

\begin{Scode}
qmatrix.msm(example.msm, covariates = list(age = 60, smoke = "CURRENT"))
\end{Scode}

\subsection{Survival plots}

In studies of chronic disease, an important use of multi-state models is in predicting the probability of survival for patients in increasingly severe states of disease, for some time $t$ in the future. This can be obtained directly from the transition probability matrix $P(t)$. The \Rfunction{plot} method for \Robject{msm} objects produces a plot of the expected probability of survival against time, from each transient state. Survival is defined as not entering the final absorbing state.

<>=
plot(cav.msm, legend.pos=c(8, 1))
@

This shows that the 10-year survival probability with severe CAV is approximately 0.1, as opposed to 0.3 with mild CAV and 0.5 without CAV. With severe CAV, the survival probability diminishes very quickly to around 0.3 in the first five years after transplant. The \Rfunarg{legend.pos} argument adjusts the position of the legend in case of clashes with the plot lines. A \Rfunarg{times} argument can be supplied to indicate the time interval over which to forecast survival.

A more sophisticated analysis of these data might explore competing causes of death from causes related or unrelated to the disease under study.

\subsection{Bootstrapping} \label{sec:boot}

Most of \Rpackage{msm}'s output functions present confidence intervals based on asymptotic standard errors calculated from the Hessian, or transformations of these using the delta method. The asymptotic standard errors are expected to be underestimates of the true standard errors (Cram\'er-Rao lower bound). For some output functions, such as \Rfunction{pmatrix.msm}, and functions based on \Rfunction{pmatrix.msm} such as \Rfunction{totlos.msm} and \Rfunction{prevalence.msm}, the delta method cannot be used at all to obtain standard errors. In these cases, confidence intervals can be calculated by drawing a random sample from the assumed multivariate normal distribution of the maximum likelihood estimates and covariance matrix, and transforming. However, this is still based on potentially inaccurate asymptotic theory.

The \Rpackage{msm} package provides the function \Rfunction{boot.msm} to enable bootstrap refitting of \Rfunction{msm} models, an alternative way to estimate uncertainty. For non-hidden Markov models, a bootstrap dataset is drawn by resampling pairs of consecutive states from the full data, i.e.\ \emph{transitions}. These are assumed to be independent when calculating the likelihood (Section \ref{sec:multi:likelihood}). For hidden Markov models and models with censoring, a bootstrap dataset is drawn by resampling complete series from independent subjects. The bootstrap datasets have the same number of transitions, or subjects, respectively, as the original data.

For most output extractor functions provided with \Rpackage{msm}, the option \Rfunarg{ci="boot"} is available, as a wrapper around \Rfunction{boot.msm}, to enable bootstrap confidence intervals to be calculated. But any user-defined output statistic can be bootstrapped, as follows. The function \Rfunction{boot.msm} is called with the fitted \Rfunction{msm} model as its first argument, and an R function specifying the statistic to be bootstrapped as its second argument, \texttt{stat}.
The return value from \Rfunction{boot.msm} is a list of \texttt{B} replicates (by default, \texttt{B=1000}) of the desired statistic. For example, to bootstrap the transition intensity matrix of the heart transplantation model \Robject{cav.msm}:

\begin{Scode}
q.list <- boot.msm(cav.msm, stat=function(x){qmatrix.msm(x)$estimates})
\end{Scode}

Note that for \Rfunction{boot.msm} to be able to refit the original model that produced \Robject{cav.msm}, all objects used in the original model fit (for example, in this case, \Robject{Q}) must be in the working environment. Otherwise, \Rfunction{boot.msm} will give an ``object not found'' error.

The user can then summarise these replicates by calculating empirical standard deviations or quantile-based intervals. In this example, \Robject{q.list} is a list of 1000 4$\times$4 matrices. The following code calculates the bootstrap standard error as the empirical standard deviation of the 1000 replicates, and a 95\% bootstrap confidence interval from the empirical 2.5\% and 97.5\% quantiles.

\begin{Scode}
q.array <- array(unlist(q.list), dim=c(4,4,1000))
apply(q.array, c(1,2), sd)
apply(q.array, c(1,2), function(x)quantile(x, c(0.025, 0.975)))
\end{Scode}

Note that when bootstrapping, the refits of the model to the resampled datasets may occasionally fail to converge (as discussed in Section \ref{sec:failure}), even if the original model fit did converge. In these cases, a warning is given, but \Rfunction{boot.msm} simply discards the failed dataset and moves on to the next bootstrap iteration. Unless convergence failure occurs for a large proportion of iterations, this should not affect the accuracy of the final bootstrap confidence intervals.

\subsection{Convergence failure} \label{sec:failure}

Inevitably, if over-complex models are applied to insufficient data, the parameters of the model will not be identifiable. This will result in the optimisation algorithm failing to find the maximum of the log-likelihood, or even failing to evaluate the likelihood. For example, it will commonly be inadvisable to include several covariates in a model simultaneously.

In some circumstances, the optimisation may report convergence, but fail to calculate any standard errors. In these cases, the Hessian of the log-likelihood at the reported solution is not positive definite. Thus the reported solution may be a saddle point rather than the maximum likelihood, or it may be close to the maximum.

\begin{description}
\item[Model simplification] Firstly, make sure there are not too many parameters in the model. There may not be enough information in the data on a certain transition rate. It is recommended to count all the pairs of transitions between states in successive observation times, making a frequency table of previous state against current state (function \Rfunction{statetable.msm}; see the example below), and to do this for any subgroups defined by covariates. Although the data are a series of snapshots of a continuous-time process, and the actual transitions take place in between the observation times, this type of table may still be helpful. If there are not many observed `transitions' from state 2 to state 4, for example, then the data may be insufficient to estimate $q_{24}$.

For a staged disease model (Figure \ref{fig:disease}), the number of disease states should be low enough that all transition rates can be estimated. Consecutive states of disease severity should be merged if necessary.
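For example, for the heart transplant data used throughout this section, the frequency table of pairs of consecutive observed states could be produced as follows (a sketch, assuming the \Robject{cav} data are loaded; output not shown).

\begin{Scode}
## frequency table of previous state against current state
statetable.msm(state, PTNUM, data = cav)
\end{Scode}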
If it is realistic, consider applying constraints on the intensities or the covariate effects, so that the parameters are equal for certain transitions, or zero for certain transitions.

Be careful to use an observation scheme and transition matrix appropriate to your data (see Section \ref{sec:arbitr-observ-times}). By default, \Rfunction{msm} assumes that the data represent snapshots of the process, and the true state is unknown between observation times. In such circumstances, it is rarely feasible to estimate an intensity matrix with \emph{instantaneous} transitions allowed between every pair of states. Estimation is easier if the complete course of the process is known (\Rfunarg{exacttimes = TRUE} in the call to \Rfunction{msm}). Understand the difference between \emph{instantaneous} and \emph{interval} transitions: although individuals may be in state 1 at time $t_r$, and state 3 at time $t_{r+1}$, this does not mean that instantaneous transitions from 1 to 3 should be permitted.

\item[Initial values] Make sure that a sensible set of initial values has been chosen. The optimisation may only converge within a limited range of `informative' initial values. It is also sensible to fit the model with several different sets of initial values, to ensure that the estimation has converged to a global rather than a local optimum.

\item[Scaling] It is often necessary to apply a scaling factor to normalise the likelihood (\Rfunarg{fnscale}), or certain individual parameters (\Rfunarg{parscale}). This may prevent overflow or underflow problems within the optimisation. For example, if the value of the $-2 \times$ log-likelihood is around 5000, then the following option leads to a minimisation of the $-2 \times$ log-likelihood on an approximate unit scale: \Rfunarg{control = list(fnscale = 5000)}. Since version 1.4.1, \Rfunarg{fnscale} is applied automatically, set to minus twice the log-likelihood at the initial values, unless the user has already supplied it.

It is also advisable to analyse all variables, including covariates and the time unit, on a roughly normalised scale: for example, working in terms of a time unit of months or years instead of days, when the data range over thousands of days.

\item[Convergence criteria] ``False convergence'', in which \Rfunction{optim()} reports convergence of the optimisation but the Hessian is not positive definite, can sometimes be solved by tightening the criterion for reporting convergence (\Rfunarg{reltol}, which defaults to \texttt{1e-08}). For example, \Rfunarg{control = list(reltol = 1e-16)}. Alternatively, consider using smaller step sizes for the numerical approximation to the gradient used in calculating the Hessian. This is given by the control parameter \Rfunarg{ndeps}. For example, for a model with 5 parameters, \Rfunarg{control = list(ndeps = rep(1e-6, 5))}.

\item[Choice of algorithm] By default, since version 1.3.2, \Rfunction{msm} uses the BFGS method of \Rfunction{optim}, which makes use of analytic derivatives. Analytic derivatives are available for all models in \Rpackage{msm}, apart from hidden Markov models with unknown initial state probabilities, misclassification models with equality constraints on misclassification probabilities, and truncated or measurement-error outcome distributions. This speeds up optimisation.
Alternative algorithms are available, such as \Rfunarg{method = "CG"}, or the \Rfunction{nlm} R function via \Rfunarg{msm(..., opt.method = "nlm", ...)}. Note also that the Fisher scoring method, available for non-hidden Markov models for panel data via \Rfunarg{msm(..., opt.method = "fisher", ...)}, is expected to be faster than the generic methods, but less robust to bad initial values. Since version 1.3.2, \Rpackage{msm} can also use \Rfunarg{method = "bobyqa"} from the \Rpackage{minqa} package, a fast derivative-free method.

\item[\Rfunction{optim} ``function cannot be evaluated at initial parameters''] To diagnose this problem, run \Rfunction{msm} again with \Rfunarg{fixedpars=TRUE} set, to calculate the $-2 \times$ log-likelihood at the initial values. This will probably be \Robject{Inf}. To show the contributions of individual subjects to the overall log-likelihood, call \Rfunction{logLik.msm(x, by.subject=TRUE)}, where \Robject{x} is the fitted model object. If only a few subjects give an infinite log-likelihood, then you can check whether their state histories are particularly unusual and conflict with the model. For example, they might appear to make unusually large jumps between states in short periods of time. For models with misclassification, note that the default true initial state distribution \Rfunarg{initprobs} puts all individuals in true state 1 at their first observation. If someone actually starts in a much higher state, this may result in an infinite log-likelihood, and changing \Rfunarg{initprobs} would be sensible.
\end{description}

\subsection{Model assessment} \label{sec:model-assessment}

\paragraph{Observed and expected prevalence} To compare the relative fit of two nested models, it is easy to compare their likelihoods. However, it is not always easy to determine how well a fitted multi-state model describes an irregularly-observed process. Ideally we would like to compare observed data with fitted or expected data under the model.

If there were times at which all individuals were observed, then the fit of the expected numbers in each state, or {\em prevalences}, could be assessed directly at those times. Otherwise, some approximations are necessary. We could assume that an individual's state at an arbitrary time $t$ was the same as the state at their previous observation time. This might be fairly accurate if observation times are close together. This approach is taken by the function \Rfunction{prevalence.msm}, which constructs a table of observed and expected numbers and percentages of individuals in each state at a set of times.

A set of expected counts can be produced if the process begins at a common time for all individuals. Suppose that at this time each individual is in the same known state $r_0$. Then, given $n(t)$ individuals under observation at time $t$, the expected number of individuals in state $r$ at time $t$ is $n(t) P(t)_{r_0,r}$. If the covariates on which $P(t)$ depends vary between individuals, then this can be averaged over the covariates observed in the data.

For example, we calculate the observed and expected numbers and percentages at two-yearly intervals up to 20 years after transplant, for the heart transplant model \Robject{cav.msm}. The number of individuals still alive and under observation decreases from 622 to 251 at year 20. The observed and expected percentages are plotted against time.
<<>>=
options(digits=3)
prevalence.msm(cav.msm, times=seq(0,20,2))
@
<>=
plot.prevalence.msm(cav.msm, mintime=0, maxtime=20)
@

Comparing the observed and expected percentages in each state, we see that the number of individuals who die (State 4) is under-estimated by the model from about year 8 onwards. Similarly, the number of individuals still alive and free of CAV (State 1) is over-estimated by the model from about year 8 onwards.

Such discrepancies could have many causes. One possibility is that the transition rates vary with the time since the beginning of the process, the age of the patient, or some other omitted covariate, so that the Markov model is {\em non-homogeneous}. This could be accounted for by modelling the intensities as functions of age, for example as piecewise-constant functions. The \Rfunarg{pci} argument to \Rfunction{msm} can be used to automatically construct models with transition intensities which are piecewise-constant in time. In this example, the hazard of death may increase with age, so that the model underestimates the number of deaths when forecasting far into the future.

Another cause of poor model fit may sometimes be the failure of the Markov assumption. That is, the transition intensities may depend on the time spent in the current state (a semi-Markov process) or on other characteristics of the process history. Accounting for the process history is difficult, as the process is only observed through a series of snapshots. Semi-Markov models may in principle be fitted to this type of data using phase-type distributions. Since version 1.4.1, the \Rfunarg{phase.states} option to \Rfunction{msm} can be used to define some phase-type models. See \Rfunction{help(msm)} for further details.

However, if it is known that individuals who died would not have been followed up after a certain time, had they survived to that time, then they should not be included in the observed prevalence of the death state after that time. This can be accounted for by passing a vector of maximum potential follow-up times, one for each individual in the same order as the original data, in the \Rfunarg{censtime} argument to \Rfunction{prevalence.msm}. Ignoring the potential follow-up times is likely to have resulted in overestimates of the number of deaths at later times for the ``observed'' prevalences in the CAV example, though these times are not available in the data supplied with \Rpackage{msm}.

\paragraph{Pearson-type goodness-of-fit test} \label{sec:pearson}

Suppose that the true transition times are unknown, and the data consist of observations of the process at arbitrary times which differ between individuals (panel data). Assessing goodness of fit by prevalence counts then involves estimating the observed prevalence at a series of points by some form of interpolation. This is only advisable if observation times are close together. An alternative method of assessing goodness-of-fit is to construct tables of observed and expected numbers of transitions, as described by Aguirre-Hernandez and Farewell \cite{ahf}. This leads to a formal test of goodness-of-fit, analogous to the classical Pearson $\chi^2$ test for contingency tables. The tables are constructed as follows.
Each pair of successive observations in the data (a \emph{transition}) is classified by
\begin{itemize}
\item the starting state $r$ and finishing state $s$,
\item the time between the start of the process and the first of the pair of observations (indexed by $h$),
\item the time interval between the observations (indexed by $l_h$, within categories $h$),
\item (if there are fitted covariates) the impact of covariates, as summarised by $q_{rr}$ (indexed by $c$),
\item any other grouping of interest for diagnosing lack of fit (indexed by $g$).
\end{itemize}
Groupings of continuous quantities are normally defined by quantiles, so that there are a similar number of observed transitions in each (one-dimensional) category. The observed and expected numbers of transitions in each group are then defined by
\[ o_{hl_h rscg} = \sum I(S(t_{i,j+1}) = s, S(t_{ij}) = r) \]
\[ e_{hl_h rscg} = \sum P(S(t_{i,j+1}) = s | S(t_{ij}) = r) \]
where $I(A)$ is the indicator function for an event $A$ and the summation is over the set of transitions in the category defined by $h,l_h,c,g$, over all individuals $i$. The Pearson-type test statistic is then
\[ T = \sum_{hl_h rscg} \frac{(o_{hl_h rscg} - e_{hl_h rscg})^2}{e_{hl_h rscg}} \]
The classical Pearson test statistic is distributed as $\chi^2_{n-p}$, where $n$ is the number of independent cells in the table and $p$ is the number of estimated parameters. But the null distribution of $T$ is not exactly $\chi^2$, since the time intervals are non-identical, so the observed transitions are realizations from a set of independent but non-identical multinomial distributions. Titman \cite{titman:asympnull} showed that the null distribution of $T$ is asymptotically approximated by a weighted sum of $\chi^2_1$ random variables. Aguirre-Hernandez and Farewell \cite{ahf} also showed that $\chi^2_{n-p}$ is a good approximation if there are no fitted covariates. For models with covariates, the null mean of $T$ is higher than $n - p$, but lower than $n$. Therefore, upper and lower bounds for the true $p$-value of the statistic can be obtained from the $\chi^2_{n-p}$ and $\chi^2_n$ distributions. Aguirre-Hernandez and Farewell \cite{ahf} also described a bootstrap procedure for obtaining an accurate $p$-value.

Titman and Sharples \cite{titman:sharples} described modifications to the test to correct for the biases introduced when, in addition to the panel-data observation scheme:
\begin{itemize}
\item Times of death are known exactly. In this case, transitions ending in death are classified according to the next scheduled observation time after the death, which is estimated by multiple imputation from a Kaplan-Meier estimate of the distribution of time intervals between observations.
\item An individual's final observation is censored, so that they are only known to be alive at that point.
\item States are misclassified.
\end{itemize}

The \Rpackage{msm} package provides the function \Rfunction{pearson.msm} to perform the Pearson-type test. By default, three groups are used for each of $h$, $l_h$ and $c$. Often the number of groups will need to be reduced in cases where the resulting contingency tables are sparse (so that there are several low expected counts and the variance of $T$ is inflated). The test is now performed on the model \Robject{cav.msm} for the heart transplant dataset (a version of which was also analysed by Titman and Sharples \cite{titman:sharples}). The default three interval groups are used, with two groups for the time since the start of the process.
The \Rfunarg{transitions} argument groups the transitions from state 3 to each of states 1, 2 and 3 (the 9th, 10th and 11th transitions) together in the table, since these transitions are infrequent.

<<>>=
options(digits=2)
pearson.msm(cav.msm, timegroups=2, transitions=c(1,2,3,4,5,6,7,8,9,9,9,10))
@

The first two tables in the output show the contingency tables of observed and expected numbers of transitions. Note that the observed number of transitions in certain categories is not a whole number, since these are averaged over multiple imputations of the next scheduled observation time following deaths. The column \texttt{Time} is the group defined by the time since the start of the process, and the column \texttt{Interval} is the group defined by the intervals between observations. The columns indicate the allowed transitions, or pairs of states which can be observed in successive observations. The third table presents the ``deviance'', the value of $\frac{(o_{hl_h rscg} - e_{hl_h rscg})^2}{e_{hl_h rscg}}$ for each cell, multiplied by the sign of $o_{hl_h rscg} - e_{hl_h rscg}$ to indicate whether there were more or fewer transitions than expected in each cell. These can indicate areas of bad fit. For example, systematic changes in deviance by time, or by the time interval between observations, can indicate that a model with time-varying transition intensities is more suitable. Changes in deviance by covariate impact may indicate heterogeneity between individuals which is unexplained by the fitted covariates. Changes in deviance with the length of the interval between observations may also indicate failure of the Markov assumption, and that a semi-Markov model (in which intensities depend on the time spent in the current state) may fit better.

In this example, the test statistic is 100. \Robject{p.upper} is an upper bound for the $p$-value of the statistic, based on an asymptotic $\chi^2_{42}$ distribution. A statistic of 100 is far into the upper tail of this distribution, so even this upper bound is very small, and the model does not fit well. It is not clear from the table of deviances which aspects of the fit are most influential to the test statistic. However, the two-way Markov model itself is not biologically plausible, as discussed in Section \ref{sec:fitting:hmm:misc}.

For non-hidden Markov models for panel data, \Rfunction{pearson.msm} also presents the accurate analytic $p$-value of Titman \cite{titman:asympnull}. For all models, \Rfunction{pearson.msm} provides an option for parametric bootstrapping to obtain an accurate $p$-value.

\subsection{Fitting misclassification models with \Rpackage{msm}} \label{sec:fitting:hmm:misc}

In fact, in the heart transplant example from section \ref{sec:datain}, it is not medically realistic for patients to recover from a diseased state to a healthy state. Progression of coronary artery vasculopathy is thought to be an irreversible process. The angiography scan for CAV is actually subject to error, which leads to some false measurements of CAV states and apparent recoveries. Thus we account for misclassification by fitting a \emph{hidden Markov model} using \Rpackage{msm}. Firstly, we replace the two-way multi-state model by a one-way model with transition intensity matrix
\[ Q = \left( \begin{array}{llll}
-(q_{12} + q_{14}) & q_{12} & 0 & q_{14}\\
0 & -(q_{23}+q_{24}) & q_{23} & q_{24}\\
0 & 0 & -q_{34} & q_{34}\\
0 & 0 & 0 & 0 \\
\end{array} \right ) \]
We also assume that true state 1 (CAV-free) can be classified as state 1 or 2, state 2 (mild/moderate CAV) can be classified as state 1, 2 or 3, while state 3 (severe CAV) can be classified as state 2 or 3.
Recall that state 4 represents death. Thus our matrix of misclassification probabilities is
\[ E = \left( \begin{array}{llll}
1 - e_{12} & e_{12} & 0 & 0 \\
e_{21} & 1 - e_{21} - e_{23} & e_{23} & 0 \\
0 & e_{32} & 1 - e_{32} & 0 \\
0 & 0 & 0 & 0\\
\end{array} \right) \]
with underlying states as rows, and observed states as columns.

To model observed states with misclassification, we define a matrix \Rfunarg{ematrix} indicating the states that can be misclassified. Rows of this matrix correspond to true states, columns to observed states. It should contain zeroes in the positions where misclassification is not permitted. Non-zero entries are initial values for the corresponding misclassification probabilities. We then call \Rfunction{msm} as before, but with this matrix as the \Rfunarg{ematrix} argument. Initial values of 0.1 are assumed for each of the four misclassification probabilities $e_{12}, e_{21}, e_{23}, e_{32}$. Zeroes are given where the elements of $E$ are zero. The diagonal elements supplied in \Rfunarg{ematrix} are ignored, as rows must sum to one. The matrix \Rfunarg{qmatrix}, specifying permitted transition intensities and their initial values, also changes to correspond to the new $Q$ representing the progression-only model for the underlying states.

The true state for every patient at the date of transplant is known to be ``CAV-free'', not misclassified. To indicate this, we use the argument \Rfunarg{obstrue} to \Rfunction{msm}. This is set to be a variable in the dataset, \Rfunarg{firstobs}, indicating where the observed state equals the true state. This takes the value of 1 at the patient's first observation, at the transplant date, and 0 elsewhere.

The default BFGS quasi-Newton optimisation algorithm is used, which is often faster and more robust than the Nelder-Mead simplex-based algorithm.

An optional argument \Rfunarg{initprobs} could also have been given here, representing the vector of the probabilities of occupying each true state at the initial observation (equation \ref{eq:multi:hidden:matprod}). This can also be a matrix with number of rows equal to the number of subjects, if these probabilities are subject-dependent and known. If not given, all individuals are assumed to be in true state 1 at their initial observation. If \Rfunarg{est.initprobs=TRUE} is specified, then these probabilities are estimated as part of the model fit, using the vector \Rfunarg{initprobs} as initial values. Covariate effects on these probabilities can also be estimated using a multinomial logistic regression model, if an \Rfunarg{initcovariates} argument is specified. See \Rfunction{help(msm)} for further details.

\paragraph{Model 5: multi-state model with misclassification}

<<>>=
Qm <- rbind(c(0, 0.148, 0, 0.0171), c(0, 0, 0.202, 0.081),
            c(0, 0, 0, 0.126), c(0, 0, 0, 0))
ematrix <- rbind(c(0, 0.1, 0, 0), c(0.1, 0, 0.1, 0),
                 c(0, 0.1, 0, 0), c(0, 0, 0, 0))
cavmisc.msm <- msm(state ~ years, subject = PTNUM, data = cav,
                   qmatrix = Qm, ematrix = ematrix, deathexact = 4,
                   obstrue = firstobs)
cavmisc.msm
@

Thus there is an estimated probability of about 0.03 that mild/moderate CAV will be diagnosed erroneously in a CAV-free patient, but a rather higher probability of 0.17 that underlying mild/moderate CAV will be diagnosed as CAV-free. Between the two CAV states, the mild state will be misdiagnosed as severe with a probability of 0.06, and the severe state will be misdiagnosed as mild with a probability of 0.12.
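The mean sojourn times in the underlying states, summarised in the next paragraph, can be extracted in the usual way (a sketch; output not shown):

\begin{Scode}
sojourn.msm(cavmisc.msm)
\end{Scode}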
The model also estimates the progression rates through underlying states. An average of 8 years is spent disease-free, an average of about 3 years is spent with mild/moderate disease, and periods of severe disease also last about 3 years on average before death.

\subsection{Effects of covariates on misclassification rates}

We can investigate how the probabilities of misclassification depend on covariates in a similar way to the transition intensities, using a \Rfunarg{misccovariates} argument to \Rfunction{msm}. For example, we now include female sex as a covariate for the misclassification probabilities. The linear effects on the log odds of each misclassified state relative to the true state are initialised to zero by default (but this can be changed with the \Rfunarg{misccovinits} argument).

\paragraph{Model 6: misclassification model with misclassification probabilities modelled on sex}

<<>>=
cavmiscsex.msm <- msm(state ~ years, subject = PTNUM, data = cav,
                      qmatrix = Qm, ematrix = ematrix, deathexact = 4,
                      misccovariates = ~sex, obstrue=firstobs)
@
<<>>=
cavmiscsex.msm
@

The large confidence interval for the odds ratio for 1/2 misclassification suggests there is no information in the data about the difference between genders in the false positive rates for angiography. On the other hand, women have slightly more false negatives.

\subsection{Extractor functions}

As well as the functions described in section \ref{sec:extractor} for extracting useful information from fitted models, there are a number of extractor functions specific to models with misclassification.

\begin{description}
\item[Misclassification matrix] The function \Rfunction{ematrix.msm} gives the estimated misclassification probability matrix at the given covariate values. For illustration, the fitted misclassification probabilities for men and women in model 6 are given by

<<>>=
ematrix.msm(cavmiscsex.msm, covariates=list(sex=0))
ematrix.msm(cavmiscsex.msm, covariates=list(sex=1))
@

The confidence intervals for the estimates for women are wider, since there are only 87 women in this set of 622 patients.

\item[Odds ratios for misclassification] The function \Rfunction{odds.msm} gives the estimated odds ratios corresponding to each covariate effect on the misclassification probabilities.

\begin{Scode}
odds.msm(cavmiscsex.msm)
\end{Scode}

\item[Observed and expected prevalences] The function \Rfunction{prevalence.msm} is intended to assess the goodness of fit of the hidden Markov model for the \emph{observed} states to the data. Tables of observed prevalences of the observed states are calculated as described in section \ref{sec:model-assessment}, by assuming that observed states are retained between observation times. The expected numbers of individuals in each observed state are calculated similarly. Suppose the process begins at a common time for all individuals, and at this time the probability of occupying \emph{true} state $r$ is $f_r$. Then, given $n(t)$ individuals under observation at time $t$, the expected number of individuals in true state $r$ at time $t$ is the $r$th element of the vector $n(t) f P(t)$. Thus the expected number of individuals in \emph{observed} state $r$ is the $r$th element of the vector $n(t) f P(t) E$, where $E$ is the misclassification probability matrix. The expected prevalences (not shown) for this example are similar to those forecast by the model without misclassification, with underestimates of the rates of death from 8 years onwards.
To improve this model's long-term prediction ability, it is probably necessary to account for the natural increase in the hazard of death from any cause as people become older.

\item[Goodness-of-fit test] The Pearson-type goodness-of-fit test is performed as in Section \ref{sec:pearson}. The table of deviances indicates that there are more 1-3 and 1-4 transitions than expected in short intervals, and fewer in long intervals. This may indicate some time-dependence in the transition rates. Indeed, Titman \cite{titman:phd} found that a model with piecewise-constant transition intensities gave a greatly improved fit to these data.

<<>>=
pearson.msm(cavmisc.msm, timegroups=2, transitions=c(1,2,3,4,5,6,7,8,9,9,9,10))
@

\end{description}

\subsection{Recreating the path through underlying states}

In speech recognition and signal processing, {\em decoding} is the procedure of determining the underlying states that are most likely to have given rise to the observations. The most common method of reconstructing the most likely state path is the {\em Viterbi} algorithm. Originally proposed by Viterbi \cite{viterbi}, it is also described by Durbin \etal \cite{biolog:seq} and Macdonald and Zucchini \cite{macdonald:zucchini} for discrete-time hidden Markov chains. For continuous-time models it proceeds as follows. Suppose that a hidden Markov model has been fitted and a Markov transition matrix $P(t)$ and misclassification matrix $E$ are known. Let $v_k(t_i)$ be the probability of the most probable path ending in state $k$ at time $t_i$.

\begin{enumerate}
\item Estimate $v_k(t_1)$ using known or estimated initial-state occupation probabilities (\texttt{initprobs}), each multiplied by the probability of the observed outcome at the initial time, given that the true initial state is $k$.
\item For $i = 2 \ldots N$, calculate $v_l(t_i) = e_{l,O_{t_i}} \max_k v_k(t_{i-1}) P_{kl}(t_{i} - t_{i-1})$. Let $K_i(l)$ be the maximising value of $k$.
\item At the final time point $t_N$, the most likely underlying state $S^*_N$ is the value of $k$ which maximises $v_k(t_N)$.
\item Trace back through the time points, setting $S^*_{i-1} = K_i(S^*_i)$.
\end{enumerate}

The computations should be done in log space to prevent underflow. The \Rpackage{msm} package provides the function \Rfunction{viterbi.msm} to implement this method. For example, the following is an extract from the result of calling \Rfunction{viterbi.msm} to determine the most likely underlying states for all patients. The results are shown for patient 100103, who appeared to `recover' to a less severe state of disease after being observed in state 3. We assume this is not biologically possible for the true states, so we expect that either the observation of state 3 at time 4.98 was an erroneous observation of state 2, or their apparent state 2 at time 5.94 was actually state 3. According to the most likely path constructed using the Viterbi algorithm, it is the observation at time 5.94 which is most probably misclassified.

<<>>=
vit <- viterbi.msm(cavmisc.msm)
vit[vit$subject==100103,]
@

\subsection{Fitting general hidden Markov models with \Rpackage{msm}} \label{sec:fitting:hmm:general}

The \Rpackage{msm} package provides a framework for fitting continuous-time hidden Markov models with general, continuous outcomes. As before, we use the \Rfunction{msm} function itself.
\paragraph{Specifying the hidden Markov model} A hidden Markov model consists of two related components:
\begin{itemize}
\item the model for the evolution of the underlying Markov chain,
\item the set of models for the observed data conditionally on each underlying state.
\end{itemize}
The model for the transitions between underlying states is specified as before, by supplying a \Rfunarg{qmatrix}. The model for the outcomes is specified using the argument \Rfunarg{hmodel} to \Rfunction{msm}. This is a list, with one element for each underlying state, in order. Each element of the list should be an object returned by a hidden Markov model \emph{constructor function}. The HMM constructor functions provided with \Rpackage{msm} are listed in Table \ref{tab:hmm:dists}. There is a separate constructor function for each class of outcome distribution, such as uniform, normal or gamma.

Consider a three-state hidden Markov model, with a transition intensity matrix of
\[ Q = \left( \begin{array}{lll}
-q_{12} & q_{12} & 0 \\
0 & -q_{23} & q_{23}\\
0 & 0 & 0 \\
\end{array} \right ) \]

Suppose the outcome distribution for state 1 is Normal$(\mu_1, \sigma^2_1)$, the distribution for state 2 is Normal$(\mu_2, \sigma^2_2)$, and state 3 is exactly observed. Observations of state 3 are given a label of -9 in the data. Here our \Rfunarg{hmodel} argument should be a list of objects returned by \Rfunction{hmmNorm} and \Rfunction{hmmIdent} constructor functions. We must specify initial values for the parameters as the arguments to the constructor functions. For example, we take initial values of $\mu_1 = 90, \sigma_1 = 8, \mu_2 = 70, \sigma_2 = 8$. Initial values for $q_{12}$ and $q_{23}$ are 0.25 and 0.2. Finally, suppose the observed data are in a variable called \Robject{y}, the measurement times are in \Robject{time}, and subject identifiers are in \Robject{ptnum}.
The call to \Rfunction{msm} to estimate the parameters of this hidden Markov model would then be
\begin{Scode}
msm( y ~ time, subject=ptnum, data = example.df,
     qmatrix = rbind( c(0, 0.25, 0), c(0, 0, 0.2), c(0, 0, 0) ),
     hmodel = list (hmmNorm(mean=90, sd=8),
                    hmmNorm(mean=70, sd=8),
                    hmmIdent(-9)) )
\end{Scode}

\begin{table}[htbp]
\scriptsize
\centering
\begin{tabular}{lp{0.8in}p{0.6in}p{0.6in}p{0.7in}l}
\hline
Function & Distribution & Parameters & & Location (link) & Density for an observation $x$ \\
\hline
\Rfunction{hmmCat} & Categorical & \Rfunarg{prob, basecat} & $p,c_0$ & $p$ (logit) & $p_x$, $x = 1, \ldots, n$ \\
\Rfunction{hmmIdent} & Identity & \Rfunarg{x} & $x_0$ & & $I_{x = x_0}$ \\
\Rfunction{hmmUnif} & Uniform & \Rfunarg{lower, upper} & $l,u$ & & $1 / (u - l)$, $l \leq x \leq u$ \\
\Rfunction{hmmNorm} & Normal & \Rfunarg{mean, sd} & $\mu,\sigma$ & $\mu$ (identity) & $\phi(x, \mu, \sigma) = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp(-(x - \mu)^2/(2 \sigma^2) )$ \\
\Rfunction{hmmLNorm} & Log-normal & \Rfunarg{meanlog, sdlog} & $\mu,\sigma$ & $\mu$ (identity) & $\frac{1}{x \sqrt{2 \pi \sigma^2}} \exp(-(\log x - \mu)^2 / (2 \sigma^2))$ \\
\Rfunction{hmmExp} & Exponential & \Rfunarg{rate} & $\lambda$ & $\lambda$ (log) & $\lambda e^{- \lambda x}$, $x > 0$ \\
\Rfunction{hmmGamma} & Gamma & \Rfunarg{shape, rate} & $n,\lambda$ & $\lambda$ (log) & $\frac{\lambda^n}{\Gamma(n)}x^{n-1} \exp(-\lambda x)$, $x > 0, n > 0, \lambda > 0$ \\
\Rfunction{hmmWeibull} & Weibull & \Rfunarg{shape, scale} & $a, b$ & $b$ (log) & $\frac{a}{b} (\frac{x}{b})^{a-1} \exp{(-(\frac{x}{b})^a)}$, $x > 0$ \\
\Rfunction{hmmPois} & Poisson & \Rfunarg{rate} & $\lambda$ & $\lambda$ (log) & $\lambda^x \exp(-\lambda)/x!$, $x = 0, 1, 2, \ldots$ \\
\Rfunction{hmmBinom} & Binomial & \Rfunarg{size, prob} & $n,p$ & $p$ (logit) & ${n \choose x} p^x (1-p)^{n-x}$ \\
\Rfunction{hmmNBinom} & Negative binomial & \Rfunarg{disp, prob} & $n,p$ & $p$ (logit) & $\Gamma(x+n)/(\Gamma(n)\,x!)\; p^n (1-p)^x$ \\
\Rfunction{hmmBetaBinom} & Beta-binomial & \Rfunarg{size, meanp, sdp} & $n,\mu,\sigma$ & $\mu$ (logit) & ${n \choose x} Beta(x+a,n-x+b)/Beta(a,b)$ \\
 & & & & & where $a=\mu/\sigma, b=(1-\mu)/\sigma$ \\
\Rfunction{hmmBeta} & Beta & \Rfunarg{shape1,shape2} & $a,b$ & & $ \Gamma(a+b) / (\Gamma(a)\Gamma(b))x^{a-1}(1-x)^{b-1}$ \\
\Rfunction{hmmT} & Student $t$ & \Rfunarg{mean, scale, df} & $\mu,\sigma,k$ & $\mu$ (identity) & $\frac{\Gamma\left((k+1)/2\right)}{\Gamma(k/2)}{\sqrt{\frac{1}{k\pi\sigma^2}}}\left\{1 + \frac{1}{k\sigma^2}(x - \mu)^{2} \right\}^{-(k + 1)/2}$ \\
\Rfunction{hmmTNorm} & Truncated normal & \Rfunarg{mean, sd, lower, upper} & $\mu,\sigma,l,u$ & $\mu$ (identity) & \parbox[t]{2in}{$\phi(x, \mu, \sigma) / \\ (\Phi(u, \mu, \sigma) - \Phi(l, \mu, \sigma))$, \\ where $\Phi(x,\mu,\sigma) = \int_{-\infty}^x \phi(u,\mu,\sigma) du$} \\
\Rfunction{hmmMETNorm} & Normal with truncation and measurement error & \Rfunarg{mean, sd, lower, upper, sderr, meanerr} & \parbox[t]{1in} {$\mu_0,\sigma_0,l,u$, \\ $\sigma_\epsilon,\mu_\epsilon$} & $\mu_\epsilon$ (identity) & \parbox[t]{2in}{$( \Phi(u, \mu_2, \sigma_3) - \Phi(l, \mu_2, \sigma_3)) / $ \\ $(\Phi(u, \mu_0, \sigma_0) - \Phi(l, \mu_0, \sigma_0)) $ \\ $\times \phi(x, \mu_0 + \mu_\epsilon, \sigma_2)$, \\ $\sigma_2^2 = \sigma_0^2 + \sigma_\epsilon^2$, \\ $\sigma_3 = \sigma_0 \sigma_\epsilon / \sigma_2$, \\ $\mu_2 = ((x - \mu_\epsilon) \sigma_0^2 + \mu_0 \sigma_\epsilon^2) / \sigma_2^2$} \\
\Rfunction{hmmMEUnif} & Uniform with measurement error & \Rfunarg{lower, upper, sderr, meanerr} & $l,u,\mu_\epsilon,\sigma_\epsilon$ & $\mu_\epsilon$ (identity) & \parbox[t]{2in}{$(\Phi(x, \mu_\epsilon+l, \sigma_\epsilon) - \Phi(x, \mu_\epsilon+u, \sigma_\epsilon)) / \\ (u - l)$} \\
\hline
\end{tabular}
\caption{Hidden Markov model distributions in \Rpackage{msm}.}
\label{tab:hmm:dists}
\end{table}

\paragraph{Covariates on hidden Markov model parameters} Most of the outcome distributions can be parameterised by covariates, using a link-transformed linear model. For example, an observation $y_{ij}$ may have distribution $f_1$ conditionally on underlying state 1. The link-transformed parameter $\theta_1$ is a linear function of the covariate vector $x_{ij}$ at the same observation time.
\begin{eqnarray*}
\label{eq:hmm:covs}
y_{ij} | S_{ij} & \sim & f_1 (y | \theta_1, \gamma_1)\\
g(\theta_1) & = & \alpha + \beta^T x_{ij}
\end{eqnarray*}
Specifically, parameters named as the ``Location'' parameter in Table \ref{tab:hmm:dists} can be modelled in terms of covariates, with the given link function.

The \Rfunarg{hcovariates} argument to \Rfunction{msm} specifies the model for covariates on the hidden Markov outcome distributions. This is a list of the same length as the \Rfunarg{hmodel} list, with one element for each underlying state. Each element of the list is a formula, in standard R linear model notation, defining the covariates on the distribution for the corresponding state. If there are no covariates for a certain hidden state, then insert a \texttt{NULL} in the corresponding place in the list. For example, in the three-state normal-outcome example above, suppose that the normal means on states 1 and 2 are parameterised by a single covariate $x$.
\[ \mu_1 = \alpha_1 + \beta_1 x_{ij}, \qquad \mu_2 = \alpha_2 + \beta_2 x_{ij}.
\]
The equivalent call to \Rfunction{msm} would be
\begin{Scode}
msm( state ~ time, subject=ptnum, data = example.df,
     qmatrix = rbind( c(0, 0.25, 0), c(0, 0, 0.2), c(0, 0, 0) ),
     hmodel = list (hmmNorm(mean=90, sd=8),
                    hmmNorm(mean=70, sd=8),
                    hmmIdent(-9)),
     hcovariates = list ( ~ x, ~ x, NULL) )
\end{Scode}

\paragraph{Constraints on hidden Markov model parameters} Sometimes it is realistic that parameters are shared between some of the state-specific outcome distributions. For example, the Normally-distributed outcome in the previous example could have a common variance $\sigma^2_1 = \sigma^2_2 = \sigma^2$ between states 1 and 2, but differing means. It would also be realistic for any covariates on the mean to have a common effect $\beta_1 = \beta_2 = \beta$ on the state 1 and 2 outcome distributions.

The argument \Rfunarg{hconstraint} to \Rfunction{msm} specifies which hidden Markov model parameters are constrained to be equal. This is a named list. Each element is a vector of constraints on the named hidden Markov model parameter. The vector has length equal to the number of times that class of parameter appears in the whole model. As for the other constraint arguments such as \Rfunarg{qconstraint}, identical values of this vector indicate parameters constrained to be equal.

For example, consider the three-state hidden Markov model described above, with normally-distributed outcomes for states 1 and 2. To constrain the outcome variance to be equal for states 1 and 2, and to also constrain the effect of \Robject{x} on the outcome mean to be equal for states 1 and 2, specify
\begin{Scode}
hconstraint = list(sd = c(1,1), x = c(1,1))
\end{Scode}

Parameters of the outcome distributions may also be constrained within specific ranges. If chosen carefully, this may improve identifiability of hidden Markov states. For example, to constrain the mean for state 1 to be between 80 and 110, and the mean for state 2 to be between 50 and 80, specify
\begin{Scode}
hranges = list(mean = list(lower = c(80,50), upper = c(110,80)))
\end{Scode}
Maximum likelihood estimation is then performed on the appropriate log or logit-transformed scale so that these constraints are satisfied. See the \Rfunction{msm} help page for further details. Note that initial values should be strictly within the ranges, and not on the range boundary.

\paragraph{FEV$_1$ after lung transplants} Now we give an example of fitting a hidden Markov model to a real dataset. The data on FEV$_1$ measurements from lung transplant recipients, described in Section \ref{sec:hmm:example:fev}, are provided with the \Rpackage{msm} package in a dataset called \Robject{fev}. We fit Models 1 and 2, each with three states and a common $Q$ matrix.

<<>>=
three.q <- rbind(c(0, exp(-6), exp(-9)), c(0, 0, exp(-6)), c(0, 0, 0))
@

The simpler Model 1 is specified as follows. Under this model the FEV$_1$ outcome is Normal, with unknown means and variances differing between BOS state 1 and state 2. \Rfunarg{hcovariates} specifies that the mean of the Normal outcome depends linearly on acute events. Specifically, this covariate is an indicator for the occurrence of an acute event within 14 days of the observation, denoted \texttt{acute} in the data. As an initial guess, we suppose the mean FEV$_1$ is 100\% baseline in state 1, and 54\% baseline in state 2, with corresponding standard deviations 16 and 18, and that FEV$_1$ observations coinciding with acute events are on average 8\% baseline lower.
\Rfunarg{hconstraint} specifies that the acute event effect is equal between state 1 and state 2. Days of death are coded as 999 in the \texttt{fev} outcome variable.

<<>>=
hmodel1 <- list(hmmNorm(mean=100, sd=16), hmmNorm(mean=54, sd=18),
                hmmIdent(999))
fev1.msm <- msm(fev ~ days, subject=ptnum, data=fev, qmatrix=three.q,
                deathexact=3, hmodel=hmodel1,
                hcovariates=list(~acute, ~acute, NULL),
                hcovinits = list(-8, -8, NULL),
                hconstraint = list(acute = c(1,1)))
fev1.msm
sojourn.msm(fev1.msm)
@

Printing the \Rclass{msm} object \Robject{fev1.msm} shows estimates and confidence intervals for the transition intensities and the hidden Markov model parameters. The estimated within-state means of FEV$_1$ are around 98\% and 52\% baseline, respectively. From the estimated transition intensities, individuals spend around 1421 days (3.9 years) before getting BOS, after which they live for an average of 1248 days (3.4 years). FEV$_1$ is lower by an average of 8\% baseline within 14 days of acute events.

Model 2, where the outcome distribution is a more complex two-level model, is specified as follows. We use the distribution defined by equations \ref{eq:fev:level1}--\ref{eq:fev:level2}. The \Rfunction{hmmMETNorm} constructor defines the truncated normal outcome with an additional normal measurement error. The explicit probability density for this distribution is given in Table \ref{tab:hmm:dists}. Our initial values are 90 and 54 for the means of the within-state distribution of \emph{underlying} FEV$_1$, and 16 and 18 for the standard deviations. This time, underlying FEV$_1$ is truncated normal. The truncation limits \Rfunarg{lower} and \Rfunarg{upper} are not estimated. We take an initial measurement error standard deviation of \Rfunarg{sderr=8}. The extra shift \Rfunarg{meanerr} in the measurement error model is fixed to zero and not estimated. The \Rfunarg{hconstraint} specifies that the measurement error variance $\sigma^2_\epsilon$ is equal between responses in states 1 and 2, as is the effect of short-term acute events on the FEV$_1$ response.

The convergence of maximum likelihood estimation in this example is particularly sensitive to the optimisation method and options, the initial values, the unit of the time variable and whether covariates are centered, probably because the likelihood surface is irregular near the true maximum.

\begin{Scode}
hmodel2 <- list(hmmMETNorm(mean=90, sd=16, sderr=8,
                           lower=80, upper=Inf, meanerr=0),
                hmmMETNorm(mean=54, sd=18, sderr=8,
                           lower=0, upper=80, meanerr=0),
                hmmIdent(999))
fev2.msm <- msm(fev ~ days, subject=ptnum, data=fev, qmatrix=three.q,
                deathexact=3, hmodel=hmodel2,
                hcovariates=list(~acute, ~acute, NULL),
                hcovinits = list(-8, -8, NULL),
                hconstraint = list(sderr = c(1,1), acute = c(1,1)),
                control=list(maxit=10000), center=TRUE)
\end{Scode}

Under this model the standard deviation of FEV$_1$ measurements caused by measurement error (more realistically, natural short-term fluctuation) is around 9\% baseline. The estimated effect of acute events on FEV$_1$, and the sojourn times in the BOS-free state and in BOS before death, are similar to Model 1.

The following code will create a plot that illustrates a trajectory of declining FEV$_1$ from the first lung transplant recipient in this dataset. The Viterbi algorithm is used to locate the most likely point at which this individual moved from BOS state 1 to BOS state 2, according to the fitted Model 2. This is illustrated by a vertical dotted line.
This is the point at which the individual's lung function started to remain consistently below 80\% baseline FEV$_1$.

\begin{Scode}
keep <- fev$ptnum==1 & fev$fev<999
plot(fev$days[keep], fev$fev[keep], type="l",
     ylab=expression(paste("% baseline ", FEV[1])),
     xlab="Days after transplant")
vit <- viterbi.msm(fev2.msm)[keep,]
(max1 <- max(vit$time[vit$fitted==1]))
(min2 <- min(vit$time[vit$fitted==2]))
## midpoint between the last time in state 1 and the first in state 2
abline(v = mean(c(max1, min2)), lty=2)
text(max1 - 500, 50, "STATE 1")
text(min2 + 500, 50, "STATE 2")
\end{Scode}
\includegraphics{figures/fev_viterbi}

\paragraph{An alternative way of specifying a misclassification model} This general framework for specifying hidden Markov models can also be used to specify multi-state models with misclassification. A misclassification model is a hidden Markov model with a categorical outcome distribution. So instead of an \Rfunarg{ematrix} argument to \Rfunction{msm}, we can use a \Rfunarg{hmodel} argument with \Rfunction{hmmCat} constructor functions. \Rfunction{hmmCat} takes at least one argument, \Rfunarg{prob}, a vector of probabilities of observing outcomes $1, 2, \ldots, n$ respectively, where $n$ is the length of \Rfunarg{prob}. All outcome probabilities with an initial value of zero are assumed to be fixed at zero. \Rfunarg{prob} is scaled if necessary to sum to one.

The model in section \ref{sec:fitting:hmm:misc} specifies that an individual occupying underlying state 1 can be observed as state 1 or 2, underlying state 2 can be observed as state 1, 2 or 3, state 3 can be observed as state 2 or 3, and underlying state 4 (death) cannot be misclassified. Initial values of 0.1 are given for the 1-2, 2-1, 2-3 and 3-2 misclassification probabilities.

This is equivalent to the model below, specified using a \Rfunarg{hmodel} argument to \Rfunction{msm}. The maximum likelihood estimates should be the same as before (Model 5) if \Rfunarg{obstrue=firstobs} is removed from the \Rfunction{msm()} call. \Rfunarg{obstrue} has a slightly different meaning for models specified with \Rfunarg{hmodel}. If supplied, the variable indicated by \Rfunarg{obstrue} should contain the true state if this is known, and \Robject{NA} otherwise, whereas the \Rfunarg{state} variable contains an observation generated from the HMM outcome model given the (unobserved) true state. For models specified with \Rfunarg{ematrix}, the \Rfunarg{state} variable contains the (observed) true state itself. Thus the \Rfunarg{hmodel} specification is not strictly suitable for the CAV data, since the true state is both \emph{known} and \emph{observed} at the time of transplant.

\begin{Scode}
Qm <- rbind(c(0, 0.148, 0, 0.0171), c(0, 0, 0.202, 0.081),
            c(0, 0, 0, 0.126), c(0, 0, 0, 0))
cavmisc.msm <- msm(state ~ years, subject = PTNUM, data = cav,
                   hmodel = list (hmmCat(c(0.9, 0.1, 0, 0)),
                                  hmmCat(c(0.1, 0.8, 0.1, 0)),
                                  hmmCat(c(0, 0.1, 0.9, 0)),
                                  hmmIdent(4)),
                   qmatrix = Qm, deathexact = 4)
cavmisc.msm
\end{Scode}

\subsubsection{Hidden Markov models with multivariate outcomes}

Since version 1.5.2, \Rpackage{msm} can fit models in which multiple outcomes are generated at each time point, conditionally on a single hidden Markov state. The outcomes must be independent conditionally on the hidden state, but they may be generated from the same or different univariate distributions. See \Rfunction{help(hmmMV)} for detailed documentation and a worked example; a minimal sketch of the interface is given below.
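This sketch assumes a hypothetical data frame \Robject{example.df} containing two outcome variables \texttt{y1} and \texttt{y2} at each observation time, generated conditionally on a two-state hidden process with normally-distributed outcomes; the initial values are arbitrary. Consult \Rfunction{help(hmmMV)} for the authoritative interface.

\begin{Scode}
## combine the two outcomes into a matrix column of the data frame
example.df$y <- cbind(example.df$y1, example.df$y2)
msm( y ~ time, subject=ptnum, data = example.df,
     qmatrix = rbind(c(0, 0.1), c(0, 0)),
     hmodel = list(hmmMV(hmmNorm(mean=90, sd=8), hmmNorm(mean=25, sd=5)),
                   hmmMV(hmmNorm(mean=70, sd=8), hmmNorm(mean=15, sd=5))) )
\end{Scode}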
\subsubsection{Defining a new hidden Markov model distribution}

Suppose the hidden Markov model outcome distributions supplied with \Rpackage{msm} (Table \ref{tab:hmm:dists}) are insufficient. We want to define our own univariate distribution, called \Rfunction{hmmNewDist}, taking two parameters \Robject{location} and \Robject{scale}. Download the source package, for example \texttt{msm-0.7.2.tar.gz} for version 0.7.2, from CRAN and edit the files in it, as follows.

\begin{enumerate}
\item Add an element to \Robject{.msm.HMODELPARS} in the file \texttt{R/constants.R}, naming the parameters of the distribution. For example
\begin{Scode}
newdist = c('location', 'scale')
\end{Scode}
\item Add a corresponding element to the C variable \texttt{HMODELS} in the file \texttt{src/lik.c}. This MUST be in the same position as in the \Robject{.msm.HMODELPARS} list. For example,
\begin{Scode}
hmmfn HMODELS[] = { ..., hmmNewDist };
\end{Scode}
\item The new distribution is allowed to have one parameter which can be modelled in terms of covariates. Add the name of this parameter to the named vector \Robject{.msm.LOCPARS} in \texttt{R/constants.R}, for example \texttt{newdist = 'location'}. Specify \texttt{newdist = NA} if there is no such parameter.
\item Suppose we have specified a parameter with a non-standard name, that is, one which doesn't already appear in \Robject{.msm.HMODELPARS}. Standard names include, for example, \texttt{'mean'}, \texttt{'sd'}, \texttt{'shape'} or \texttt{'scale'}. Then we should add the allowed range of the parameter to \Robject{.msm.PARRANGES}. In this example, we add \texttt{location = c(-Inf, Inf)} to \Robject{.msm.PARRANGES}. This ensures that the optimisation to estimate the parameter takes place on a suitable scale, for example, a log scale for a parameter constrained to be positive. If the parameter should be fixed during maximum likelihood estimation (for example, the denominator of a binomial distribution), then add its name to \Robject{.msm.AUXPARS}.
\item Add an R constructor function for the distribution to \texttt{R/hmm-dists.R}. For a simple univariate distribution, this is of the form
\begin{Scode}
hmmNewDist <- function(location, scale) {
    hmmDIST(label = "newdist",
            link = "identity",
            r = function(n) rnewdist(n, location, scale),
            match.call())
}
\end{Scode}
\begin{itemize}
\item The \texttt{'label'} must be the same as the name you supplied for the new element of \Robject{.msm.HMODELPARS}.
\item \texttt{link} is the link function for modelling the location parameter of the distribution as a linear function of covariates. This should be the quoted name of an R function. A log link is \texttt{'log'} and a logit link is \texttt{'qlogis'}. If using a new link function other than \texttt{'identity'}, \texttt{'log'}, or \texttt{'qlogis'}, you should add its name to the vector \Robject{.msm.LINKFNS} in \texttt{R/constants.R}, and add the name of the corresponding inverse link to \Robject{.msm.INVLINK}. You should also add the names of these functions to the C array \texttt{LINKFNS} in \texttt{src/lik.c}, and write the functions if they do not already exist.
\item \texttt{r} is an R function, of the above format, returning a vector of \texttt{n} random values from the distribution. You should write this if it doesn't already exist.
\end{itemize}
\item Add the name of the new constructor function to the NAMESPACE in the top-level directory of the source package.
\item Write a C function to compute the probability density of the distribution, and put this in \texttt{src/hmm.c}, with a declaration in \texttt{src/hmm.h}. This must be of the form
\begin{Scode}
double hmmNewDist(double x, double *pars)
\end{Scode}
where \texttt{*pars} is a vector of the parameters of the distribution, and the density is evaluated at \texttt{x}.
\item (Optionally) Write a C function to compute the derivatives of the probability density with respect to the parameters, and put this in \texttt{src/hmmderiv.c}, with a declaration in \texttt{src/hmm.h}. Then add the model to \texttt{DHMODELS} in \texttt{src/lik.c} (in the same position as in \texttt{HMODELS}) and to \Robject{.msm.HMODELS.DERIV} in \texttt{R/constants.R}. This will generally double the speed of maximum likelihood estimation, but analytic derivatives will not be available for all distributions.
\item Update the documentation (\texttt{man/hmm-dists.Rd}) and the distribution table in \texttt{inst/doc/msm-manual.Rnw}, if you like.
\item Recompile the package (see the ``Writing R Extensions'' manual).
\end{enumerate}

Your new distribution will be available to use in the \Rfunarg{hmodel} argument to \Rfunction{msm}, as, for example,
\begin{Scode}
hmodel = list(..., hmmNewDist(location = 0, scale = 1), ...)
\end{Scode}

If your distribution may be of interest to others, ask me (\texttt{chris.jackson@mrc-bsu.cam.ac.uk}) to include it in a future release.

\clearpage

\section{\Rpackage{msm} reference guide}

The R help page for \Rfunction{msm} gives details of all the allowed arguments and options to the \Rfunction{msm} function. To view this online in R, type:
<>=
help(msm)
@
Similarly, all the other functions in the package have help pages, which should always be consulted in case of doubt about how to call them. The web-browser based help interface may often be convenient: type
<>=
help.start()
@
and navigate to \textsf{Packages} $\ldots$ \textsf{msm}, which brings up a list of all the functions in the package, with links to their documentation, and a link to this manual in PDF format.

\appendix

\section{Changes in the msm package}

For a detailed list of the changes in recent versions of \Rpackage{msm}, see the \texttt{NEWS} file in the top-level directory of the installed package.

Development versions of \Rpackage{msm} can be found on GitHub at \url{https://github.com/chjackson/msm}. These are often more recent than the version released on CRAN, so if you think you have found a bug, then please check first to see whether it has been fixed on the GitHub version.

\section{Get in touch}

If you use \Rpackage{msm} in your work, whatever your field of application, please let me know, for my own interest! Suggestions for improvement are welcome.

\clearpage
\bibliography{msm}

\end{document}