Initialization

This first example uses a Cox model and the ovarian data set from the survival package. The first step is to initialize the R6 object.

library(tidyverse)
library(survival)
library(CaseBasedReasoning)
ovarian$resid.ds <- factor(ovarian$resid.ds)
ovarian$rx <- factor(ovarian$rx)
ovarian$ecog.ps <- factor(ovarian$ecog.ps)

# initialize R6 object
coxBeta <- CoxBetaModel$new(Surv(futime, fustat) ~ age + resid.ds + rx + ecog.ps)

All cases with missing values in the learning and end point variables are dropped (na.omit), and the reduced data set without missing values is saved internally. A message reports how many cases were dropped. Character variables are converted to factors.
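The preprocessing described above can be approximated in base R. This is only a sketch of the idea, not the package's internal code: the helper name `prepare_cases` and the toy data frame `df` are hypothetical and made up for illustration.

```r
# Hypothetical helper: mimic the described preprocessing on a data frame
prepare_cases <- function(data, vars) {
  # keep only rows that are complete in the variables of interest
  keep <- stats::complete.cases(data[, vars, drop = FALSE])
  message("Dropped cases with missing values: ", sum(!keep))
  out <- data[keep, , drop = FALSE]
  # convert character columns to factors
  is_chr <- vapply(out, is.character, logical(1))
  if (any(is_chr)) {
    out[is_chr] <- lapply(out[is_chr], factor)
  }
  out
}

# Toy data (made up): rows 2 and 3 each contain one missing value
df <- data.frame(
  futime = c(10, 20, NA, 40),
  fustat = c(1, 0, 1, 1),
  age    = c(50, NA, 61, 70),
  rx     = c("a", "b", "a", "b"),
  stringsAsFactors = FALSE
)
clean <- prepare_cases(df, c("futime", "fustat", "age", "rx"))
```

Here `clean` keeps the two complete rows, and its `rx` column is a factor.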

Similar Cases

After initialization, we may want to find, for each case in the query data, the most similar cases in the learning data.

n <- nrow(ovarian)
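# draw 80% of the rows for training; the remaining rows form the query set
trainID <- sample(seq_len(n), floor(0.8 * n), replace = FALSE)
testID <- setdiff(seq_len(n), trainID)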

# fit model 
ovarian[trainID, ] %>% 
  coxBeta$fit()
## Dropped cases with missing values: 0
## Start learning...
## Learning finished in: 0.78 seconds.
# get similar cases
ovarian[trainID, ] %>%
  coxBeta$get_similar_cases(queryData = ovarian[testID, ], k = 3) -> matchedData
## Start caclulating similar cases...
## Similar cases calculation finished in: 0.02 seconds.
knitr::kable(head(matchedData))
futime  fustat  age      resid.ds  rx  ecog.ps  scDist     caseId  scCaseId  group
115     1       74.4932  2         1   1        0.0000000  0       1         Query Data
59      1       72.3315  2         1   1        0.2143509  1       1         Matched Data
268     1       74.5041  2         1   2        0.3158370  2       1         Matched Data
156     1       66.4658  2         1   2        0.4812288  3       1         Matched Data
803     0       39.2712  1         1   1        0.0000000  0       2         Query Data
744     0       50.1096  1         2   1        0.0860680  1       2         Matched Data
You may then extract the similar cases and the verum data and put them together.

Note 1: In the initialization step, all cases with missing values in the variables of data and endPoint were dropped. You therefore need to handle missing values (NA) yourself.

Note 2: The data.table returned from coxBeta$get_similar_cases has four additional columns:

  1. caseId: Position of a row within its group. In the output above (k = 3), the query case has caseId 0 and its three matched cases have caseId 1, 2 and 3, ordered by increasing scDist.
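  2. scDist: The calculated distance between a matched case and its query case (0 for the query case itself)
  3. scCaseId: Group number that links a query case to its matched cases
  4. group: Indicates whether a row is query data or matched data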
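As a sketch of how these columns can be used, the following base R code separates query rows from matched rows and pairs them again via scCaseId. The data frame `matched` is built by hand from the rows of the output shown earlier (a subset of the columns), so the example runs without the package. The object names are illustrative only.

```r
# Rows copied from the output shown above (subset of columns)
matched <- data.frame(
  futime   = c(115, 59, 268, 156, 803, 744),
  age      = c(74.4932, 72.3315, 74.5041, 66.4658, 39.2712, 50.1096),
  scDist   = c(0, 0.2143509, 0.3158370, 0.4812288, 0, 0.0860680),
  caseId   = c(0, 1, 2, 3, 0, 1),
  scCaseId = c(1, 1, 1, 1, 2, 2),
  group    = c("Query Data", "Matched Data", "Matched Data",
               "Matched Data", "Query Data", "Matched Data")
)

# Separate query cases from their matches
query_rows <- matched[matched$group == "Query Data", ]
match_rows <- matched[matched$group == "Matched Data", ]

# Pair every matched case with its query case via scCaseId
pairs <- merge(query_rows, match_rows, by = "scCaseId",
               suffixes = c(".query", ".match"))

# Closest match per query case: smallest scDist within each scCaseId
closest <- do.call(rbind, lapply(
  split(match_rows, match_rows$scCaseId),
  function(g) g[which.min(g$scDist), ]
))
```

Each row of `pairs` holds one query case next to one of its matched cases, which makes it easy to compare covariates or outcomes side by side.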

Distance Matrix

Alternatively, if you are only interested in the distance matrix, proceed as follows:

ovarian %>%
  coxBeta$calc_distance_matrix() -> distMatrix
## Start calculating distance matrix...
## Distance matrix calculation finished in: 0 seconds.

coxBeta$calc_distance_matrix() calculates the full distance matrix. This matrix has the dimension cases of data versus cases of query data. If no query data set is available, this function calculates an n times n distance matrix of all pairs in data. The distance matrix is saved internally in the CoxBetaModel object: coxBeta$distMat.
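To make these dimensions concrete, the following sketch builds such a matrix by hand. The distance used here, a weighted sum of absolute differences with made-up weights `w`, is an assumption chosen for illustration and is not necessarily what calc_distance_matrix() computes. The function name `weighted_dist` and the toy data are hypothetical.

```r
# Hypothetical distance: weighted sum of absolute differences per variable
weighted_dist <- function(data, query = data, w) {
  data  <- as.matrix(data)
  query <- as.matrix(query)
  d <- matrix(0, nrow = nrow(data), ncol = nrow(query))
  for (j in seq_len(ncol(data))) {
    # outer() yields all pairwise differences for variable j
    d <- d + abs(w[j]) * abs(outer(data[, j], query[, j], "-"))
  }
  d
}

# Toy data (made up)
data  <- data.frame(age = c(50, 60, 70), score = c(1, 2, 3))
query <- data.frame(age = c(55, 65),     score = c(1, 3))
w     <- c(0.1, 0.5)  # made-up weights

d_query <- weighted_dist(data, query, w)  # 3 x 2: data cases vs. query cases
d_self  <- weighted_dist(data, w = w)     # 3 x 3: all pairs within data
```

Without a query data set, the matrix is square and symmetric with zeros on the diagonal, because every case is compared with every other case in data, including itself.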

Check Proportional Hazards Assumption

pp <- coxBeta$check_ph()
pp
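For comparison, the proportional hazards assumption of an ordinary Cox model can be checked directly with the survival package, using coxph() and cox.zph(). This sketch is independent of CoxBetaModel. Whether check_ph() relies on the same test internally is not stated here, so treat it as a related illustration rather than a description of the method.

```r
library(survival)

# Fit a Cox model on the ovarian data with the same covariates as above
dat <- survival::ovarian
dat$resid.ds <- factor(dat$resid.ds)
dat$rx       <- factor(dat$rx)
dat$ecog.ps  <- factor(dat$ecog.ps)
fit <- coxph(Surv(futime, fustat) ~ age + resid.ds + rx + ecog.ps, data = dat)

# Test based on scaled Schoenfeld residuals; small p-values suggest
# a violation of the proportional hazards assumption
zph <- cox.zph(fit)
print(zph)
```

Calling plot(zph) afterwards draws the residuals over time for each term, which helps to judge visually whether an effect changes over follow-up.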