Multivariate Survival Analysis

Marcel Wiesweg

2022-02-11

For a very short introduction on survival data, please refer to the vignette on univariate analysis. Multivariate analysis, using the technique of Cox regression, is applied when there are multiple, potentially interacting covariates. While the log-rank test and Kaplan-Meier plots require categorical variables, Cox regression works with continuous variables. (Of course, you can use it with categorical variables as well, but this has implications which are described below.)

The Multivariate Analysis result.

For the purpose of this vignette, we use the lung dataset from the survival package:

inst time status age sex ph.ecog ph.karno pat.karno meal.cal wt.loss
3 306 2 74 1 1 90 100 1175 NA
3 455 2 68 1 0 90 90 1225 15
3 1010 1 56 1 0 90 90 NA 15
5 210 2 57 1 1 90 60 1150 11
1 883 2 60 1 0 100 90 NA 0
12 1022 1 74 1 1 50 80 513 0
7 310 2 68 2 2 70 60 384 10
11 361 2 71 2 2 60 80 538 1
1 218 2 53 1 1 70 80 825 16
7 166 2 61 1 2 70 70 271 34

We load the tidyverse, for some tools the tidytidbits package and finally and the survivalAnalysis package:

library(tidyverse)
library(tidytidbits)
library(survivalAnalysis)

Possibly interesting covariates are patient age, sex, ECOG status and the amount of weight loss. Sex is encoded as a numerical vector, but is in fact categorical. We need to make it a factor. ECOG status is at least ordinally scaled, so we leave it numerical for now. Following the two-step philosophy of survivalAnalysis, we first perform the analysis with analyse_multivariate and store the result object. We provide readable labels for the covariates to allow easy interpretation. There is a print() implementation which prints essential information for our result.

covariate_names <- c(age="Age at Dx",
                     sex="Sex",
                     ph.ecog="ECOG Status",
                     wt.loss="Weight Loss (6 mo.)",
                     `sex:female`="Female",
                     `ph.ecog:0`="ECOG 0",
                     `ph.ecog:1`="ECOG 1",
                     `ph.ecog:2`="ECOG 2",
                     `ph.ecog:3`="ECOG 3")

survival::lung %>%
  mutate(sex=rename_factor(sex, `1` = "male", `2` = "female")) %>%
  analyse_multivariate(vars(time, status),
                       covariates = vars(age, sex, ph.ecog, wt.loss),
                       covariate_name_dict = covariate_names) ->
  result
print(result)
#> $colname_unmangle_dict
#>      time    status       age       sex   ph.ecog   wt.loss 
#>    "time"  "status"     "age"     "sex" "ph.ecog" "wt.loss" 
#> 
#> $data
#>     time status age    sex ph.ecog wt.loss
#> 1    306      2  74   male       1      NA
#> 2    455      2  68   male       0      15
#> 3   1010      1  56   male       0      15
#> 4    210      2  57   male       1      11
#> 5    883      2  60   male       0       0
#> 6   1022      1  74   male       1       0
#> 7    310      2  68 female       2      10
#> 8    361      2  71 female       2       1
#> 9    218      2  53   male       1      16
#> 10   166      2  61   male       2      34
#> 11   170      2  57   male       1      27
#> 12   654      2  68 female       2      23
#> 13   728      2  68 female       1       5
#> 14    71      2  60   male      NA      32
#> 15   567      2  57   male       1      60
#> 16   144      2  67   male       1      15
#> 17   613      2  70   male       1      -5
#> 18   707      2  63   male       2      22
#> 19    61      2  56 female       2      10
#> 20    88      2  57   male       1      NA
#> 21   301      2  67   male       1      17
#> 22    81      2  49 female       0      -8
#> 23   624      2  50   male       1      16
#> 24   371      2  58   male       0      13
#> 25   394      2  72   male       0       0
#> 26   520      2  70 female       1       6
#> 27   574      2  60   male       0     -13
#> 28   118      2  70   male       3      20
#> 29   390      2  53   male       1      -7
#> 30    12      2  74   male       2      20
#> 31   473      2  69 female       1      -1
#> 32    26      2  73   male       2      20
#> 33   533      2  48   male       2     -11
#> 34   107      2  60 female       2     -15
#> 35    53      2  61   male       2      10
#> 36   122      2  62 female       2      NA
#> 37   814      2  65   male       2      28
#> 38   965      1  66 female       1       4
#> 39    93      2  74   male       2      24
#> 40   731      2  64 female       1      15
#> 41   460      2  70   male       1      10
#> 42   153      2  73 female       2      11
#> 43   433      2  59 female       0      27
#> 44   145      2  60 female       2      NA
#> 45   583      2  68   male       1       7
#> 46    95      2  76 female       2     -24
#> 47   303      2  74   male       0      30
#> 48   519      2  63   male       1      10
#> 49   643      2  74   male       0       2
#> 50   765      2  50 female       1       4
#> 51   735      2  72 female       1       9
#> 52   189      2  63   male       0       0
#> 53    53      2  68   male       0       0
#> 54   246      2  58   male       0       7
#> 55   689      2  59   male       1      15
#> 56    65      2  62   male       0      NA
#> 57     5      2  65 female       0       5
#> 58   132      2  57   male       2      18
#> 59   687      2  58 female       1      10
#> 60   345      2  64 female       1      -3
#> 61   444      2  75 female       2       8
#> 62   223      2  48   male       1      68
#> 63   175      2  73   male       1      NA
#> 64    60      2  65 female       1       0
#> 65   163      2  69   male       1       0
#> 66    65      2  68   male       2       8
#> 67   208      2  67 female       2       2
#> 68   821      1  64 female       0       3
#> 69   428      2  68   male       0       0
#> 70   230      2  67   male       1      23
#> 71   840      1  63   male       0      -1
#> 72   305      2  48 female       1      29
#> 73    11      2  74   male       2       0
#> 74   132      2  40   male       1       3
#> 75   226      2  53 female       1       3
#> 76   426      2  71 female       1      19
#> 77   705      2  51 female       0       0
#> 78   363      2  56 female       1      -2
#> 79    11      2  81   male       0      15
#> 80   176      2  73   male       0      30
#> 81   791      2  59   male       0       5
#> 82    95      2  55   male       1      15
#> 83   196      1  42   male       1       8
#> 84   167      2  44 female       1      -1
#> 85   806      1  44   male       1       1
#> 86   284      2  71   male       1      14
#> 87   641      2  62 female       1       1
#> 88   147      2  61   male       0       4
#> 89   740      1  44 female       1      39
#> 90   163      2  72   male       2       2
#> 91   655      2  63   male       0      -1
#> 92   239      2  70   male       1      23
#> 93    88      2  66   male       1       8
#> 94   245      2  57 female       1      14
#> 95   588      1  69 female       0      13
#> 96    30      2  72   male       2       7
#> 97   179      2  69   male       1      25
#> 98   310      2  71   male       1       0
#> 99   477      2  64   male       1       0
#> 100  166      2  70 female       0      10
#> 101  559      1  58 female       0      15
#> 102  450      2  69 female       1       3
#> 103  364      2  56   male       1       4
#> 104  107      2  63   male       1       0
#> 105  177      2  59   male       2      32
#> 106  156      2  66   male       1      14
#> 107  529      1  54 female       1      -3
#> 108   11      2  67   male       1      NA
#> 109  429      2  55   male       1       5
#> 110  351      2  75 female       2      11
#> 111   15      2  69   male       0      10
#> 112  181      2  44   male       1       5
#> 113  283      2  80   male       1       6
#> 114  201      2  75 female       0       1
#> 115  524      2  54 female       1      15
#> 116   13      2  76   male       2      20
#> 117  212      2  49   male       2      20
#> 118  524      2  68   male       2      30
#> 119  288      2  66   male       2      24
#> 120  363      2  80   male       1      11
#> 121  442      2  75   male       0       0
#> 122  199      2  60 female       2      10
#> 123  550      2  69 female       1       0
#> 124   54      2  72   male       2      -3
#> 125  558      2  70   male       0      17
#> 126  207      2  66   male       1      20
#> 127   92      2  50   male       1      13
#> 128   60      2  64   male       1       0
#> 129  551      1  77 female       2      28
#> 130  543      1  48 female       0       4
#> 131  293      2  59 female       1      52
#> 132  202      2  53   male       1      20
#> 133  353      2  47   male       0       5
#> 134  511      1  55 female       1      49
#> 135  267      2  67   male       0       6
#> 136  511      1  74 female       2      37
#> 137  371      2  58 female       1       0
#> 138  387      2  56   male       2      NA
#> 139  457      2  54   male       1      -5
#> 140  337      2  56   male       0      15
#> 141  201      2  73 female       2     -16
#> 142  404      1  74   male       1      38
#> 143  222      2  76   male       2       8
#> 144   62      2  65 female       1       0
#> 145  458      1  57   male       1      30
#> 146  356      1  53 female       1       2
#> 147  353      2  71   male       0       2
#> 148  163      2  54   male       1      13
#> 149   31      2  82   male       0      27
#> 150  340      2  59 female       0       0
#> 151  229      2  70   male       1      -2
#> 152  444      1  60   male       0       7
#> 153  315      1  62 female       0       0
#> 154  182      2  53 female       1       4
#> 155  156      2  55   male       2      10
#> 156  329      2  69   male       2      20
#> 157  364      1  68 female       1       7
#> 158  291      2  62   male       2      27
#> 159  179      2  63   male       1      -2
#> 160  376      1  56 female       1      17
#> 161  384      1  62 female       0       8
#> 162  268      2  44 female       1       2
#> 163  292      1  69   male       2      36
#> 164  142      2  63   male       1       2
#> 165  413      1  64   male       1      16
#> 166  266      1  57 female       0       3
#> 167  194      2  60 female       1      33
#> 168  320      2  46   male       0       4
#> 169  181      2  61   male       1       0
#> 170  285      2  65   male       0       0
#> 171  301      1  61   male       1       2
#> 172  348      2  58 female       0      10
#> 173  197      2  56   male       1      37
#> 174  382      1  43 female       0       6
#> 175  303      1  53   male       1      12
#> 176  296      1  59 female       1       0
#> 177  180      2  56   male       2      -2
#> 178  186      2  55 female       1      NA
#> 179  145      2  53 female       1      13
#> 180  269      1  74 female       0       0
#> 181  300      1  60   male       0       5
#> 182  284      1  39   male       0      -5
#> 183  350      2  66 female       0      NA
#> 184  272      1  65 female       1      -1
#> 185  292      1  51 female       0       0
#> 186  332      1  45 female       0       5
#> 187  285      2  72 female       2      20
#> 188  259      1  58   male       0       8
#> 189  110      2  64   male       1      12
#> 190  286      2  53   male       0       8
#> 191  270      2  72   male       1      14
#> 192   81      2  52   male       2      NA
#> 193  131      2  50   male       1      NA
#> 194  225      1  64   male       1      33
#> 195  269      2  71   male       1      -2
#> 196  225      1  70   male       0       6
#> 197  243      1  63 female       1       0
#> 198  279      1  64   male       1       4
#> 199  276      1  52 female       0       0
#> 200  135      2  60   male       1       0
#> 201   79      2  64 female       1      37
#> 202   59      2  73   male       1       5
#> 203  240      1  63 female       0       0
#> 204  202      1  50 female       0       1
#> 205  235      1  63 female       0       0
#> 206  105      2  62   male       2      NA
#> 207  224      1  55 female       0      23
#> 208  239      2  50 female       2      -3
#> 209  237      1  69   male       1      NA
#> 210  173      1  59 female       1      10
#> 211  252      1  60 female       0      -2
#> 212  221      1  67   male       1      23
#> 213  185      1  69   male       1       0
#> 214   92      1  64 female       2      31
#> 215   13      2  65   male       1      10
#> 216  222      1  65   male       1      18
#> 217  192      1  41 female       1     -10
#> 218  183      2  76   male       2       7
#> 219  211      1  70 female       2       3
#> 220  175      1  57 female       0      11
#> 221  197      1  67   male       1       2
#> 222  203      1  71 female       1       0
#> 223  116      2  76   male       1       0
#> 224  188      1  77   male       1       3
#> 225  191      1  39   male       0      -5
#> 226  105      1  75 female       2       5
#> 227  174      1  66   male       1       1
#> 228  177      1  58 female       1       0
#> 
#> $data_attributes
#> $data_attributes$survival_ids
#>     time   status 
#>   "time" "status" 
#> 
#> $data_attributes$factor_ids
#>       age       sex   ph.ecog   wt.loss 
#>     "age"     "sex" "ph.ecog" "wt.loss" 
#> 
#> $data_attributes$strata_ids
#> character(0)
#> 
#> 
#> $coxph
#> Call:
#> coxph(formula = Surv(time, status) ~ age + sex + ph.ecog + wt.loss, 
#>     data = data)
#> 
#>                coef exp(coef)  se(coef)      z        p
#> age        0.013369  1.013459  0.009628  1.389 0.164951
#> sexfemale -0.590775  0.553898  0.175339 -3.369 0.000754
#> ph.ecog    0.515111  1.673824  0.125988  4.089 4.34e-05
#> wt.loss   -0.009006  0.991034  0.006658 -1.353 0.176135
#> 
#> Likelihood ratio test=31.02  on 4 df, p=3.032e-06
#> n= 213, number of events= 151 
#>    (15 observations deleted due to missingness)
#> 
#> $summary
#> Call:
#> coxph(formula = Surv(time, status) ~ age + sex + ph.ecog + wt.loss, 
#>     data = data)
#> 
#>   n= 213, number of events= 151 
#>    (15 observations deleted due to missingness)
#> 
#>                coef exp(coef)  se(coef)      z Pr(>|z|)    
#> age        0.013369  1.013459  0.009628  1.389 0.164951    
#> sexfemale -0.590775  0.553898  0.175339 -3.369 0.000754 ***
#> ph.ecog    0.515111  1.673824  0.125988  4.089 4.34e-05 ***
#> wt.loss   -0.009006  0.991034  0.006658 -1.353 0.176135    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#>           exp(coef) exp(-coef) lower .95 upper .95
#> age          1.0135     0.9867    0.9945    1.0328
#> sexfemale    0.5539     1.8054    0.3928    0.7811
#> ph.ecog      1.6738     0.5974    1.3076    2.1427
#> wt.loss      0.9910     1.0090    0.9782    1.0041
#> 
#> Concordance= 0.647  (se = 0.026 )
#> Likelihood ratio test= 31.02  on 4 df,   p=3e-06
#> Wald test            = 29.94  on 4 df,   p=5e-06
#> Score (logrank) test = 30.65  on 4 df,   p=4e-06
#> 
#> 
#> $summaryAsFrame
#>    factor.id         factor.name factor.value        HR  Lower_CI  Upper_CI
#> 1 sex:female                 Sex       female 0.5538981 0.3928084 0.7810504
#> 2    wt.loss Weight Loss (6 mo.) <continuous> 0.9910344 0.9781867 1.0040508
#> 3        age           Age at Dx <continuous> 1.0134589 0.9945143 1.0327643
#> 4    ph.ecog         ECOG Status <continuous> 1.6738244 1.3075807 2.1426504
#>      Inv_HR Inv_Lower_CI Inv_Upper_CI            p
#> 1 1.8053862    1.2803272    2.5457706 7.535384e-04
#> 2 1.0090467    0.9959656    1.0222997 1.761353e-01
#> 3 0.9867199    0.9682752    1.0055160 1.649509e-01
#> 4 0.5974342    0.4667117    0.7647712 4.340524e-05
#> 
#> $overall
#> $overall$n
#> [1] 213
#> 
#> $overall$covariates
#> [1] "age"     "sex"     "ph.ecog" "wt.loss"
#> 
#> $overall$`Likelihood ratio test p`
#> [1] 3.031988e-06
#> 
#> $overall$`Wald test p`
#> [1] 5.029969e-06
#> 
#> $overall$`Score (logrank) test p`
#> [1] 3.61563e-06
#> 
#> 
#> $formula
#> [1] "Surv(time, status) ~ age + sex + ph.ecog + wt.loss"
#> 
#> attr(,"class")
#> [1] "SurvivalAnalysisResult"             "SurvivalAnalysisMultivariateResult"
#> [3] "list"

A Note on Categorical Variables

In the example above, you may have noted that the hazard ratio is given for women compared to men (women have a better outcome, HR 0.55). What is the hazard ratio for men compared to women? In the case of a binary variable, this is simply the inverted HR, which is also always provided (1.81). Things get more complicated with three or more categories. The rule is: You must choose one level of the factor as the reference level with defined hazard ratio 1.0. Then, for k levels, there will be k-1 pseudo variables created which represent the hazard ratio of this level compared to subjects in the reference level. (For the comparison of one level vs. all remaining subjects see the paragraph on one-hot analysis further down.)

As an example, we consider the two covariates which were significant above, sex and ECOG, and regard the ECOG status as a categorical variable with four levels. As reference level, we choose ECOG=0 with the parameter reference_level_dict.

survival::lung %>%
  mutate(sex=rename_factor(sex, `1` = "male", `2` = "female"),
         ph.ecog = as.factor(ph.ecog)) %>%
  analyse_multivariate(vars(time, status),
                       covariates = vars(sex, ph.ecog),
                       covariate_name_dict=covariate_names,
                       reference_level_dict=c(ph.ecog="0"))
#> $colname_unmangle_dict
#>      time    status       sex   ph.ecog 
#>    "time"  "status"     "sex" "ph.ecog" 
#> 
#> $data
#>     time status    sex ph.ecog
#> 1    306      2   male       1
#> 2    455      2   male       0
#> 3   1010      1   male       0
#> 4    210      2   male       1
#> 5    883      2   male       0
#> 6   1022      1   male       1
#> 7    310      2 female       2
#> 8    361      2 female       2
#> 9    218      2   male       1
#> 10   166      2   male       2
#> 11   170      2   male       1
#> 12   654      2 female       2
#> 13   728      2 female       1
#> 14    71      2   male    <NA>
#> 15   567      2   male       1
#> 16   144      2   male       1
#> 17   613      2   male       1
#> 18   707      2   male       2
#> 19    61      2 female       2
#> 20    88      2   male       1
#> 21   301      2   male       1
#> 22    81      2 female       0
#> 23   624      2   male       1
#> 24   371      2   male       0
#> 25   394      2   male       0
#> 26   520      2 female       1
#> 27   574      2   male       0
#> 28   118      2   male       3
#> 29   390      2   male       1
#> 30    12      2   male       2
#> 31   473      2 female       1
#> 32    26      2   male       2
#> 33   533      2   male       2
#> 34   107      2 female       2
#> 35    53      2   male       2
#> 36   122      2 female       2
#> 37   814      2   male       2
#> 38   965      1 female       1
#> 39    93      2   male       2
#> 40   731      2 female       1
#> 41   460      2   male       1
#> 42   153      2 female       2
#> 43   433      2 female       0
#> 44   145      2 female       2
#> 45   583      2   male       1
#> 46    95      2 female       2
#> 47   303      2   male       0
#> 48   519      2   male       1
#> 49   643      2   male       0
#> 50   765      2 female       1
#> 51   735      2 female       1
#> 52   189      2   male       0
#> 53    53      2   male       0
#> 54   246      2   male       0
#> 55   689      2   male       1
#> 56    65      2   male       0
#> 57     5      2 female       0
#> 58   132      2   male       2
#> 59   687      2 female       1
#> 60   345      2 female       1
#> 61   444      2 female       2
#> 62   223      2   male       1
#> 63   175      2   male       1
#> 64    60      2 female       1
#> 65   163      2   male       1
#> 66    65      2   male       2
#> 67   208      2 female       2
#> 68   821      1 female       0
#> 69   428      2   male       0
#> 70   230      2   male       1
#> 71   840      1   male       0
#> 72   305      2 female       1
#> 73    11      2   male       2
#> 74   132      2   male       1
#> 75   226      2 female       1
#> 76   426      2 female       1
#> 77   705      2 female       0
#> 78   363      2 female       1
#> 79    11      2   male       0
#> 80   176      2   male       0
#> 81   791      2   male       0
#> 82    95      2   male       1
#> 83   196      1   male       1
#> 84   167      2 female       1
#> 85   806      1   male       1
#> 86   284      2   male       1
#> 87   641      2 female       1
#> 88   147      2   male       0
#> 89   740      1 female       1
#> 90   163      2   male       2
#> 91   655      2   male       0
#> 92   239      2   male       1
#> 93    88      2   male       1
#> 94   245      2 female       1
#> 95   588      1 female       0
#> 96    30      2   male       2
#> 97   179      2   male       1
#> 98   310      2   male       1
#> 99   477      2   male       1
#> 100  166      2 female       0
#> 101  559      1 female       0
#> 102  450      2 female       1
#> 103  364      2   male       1
#> 104  107      2   male       1
#> 105  177      2   male       2
#> 106  156      2   male       1
#> 107  529      1 female       1
#> 108   11      2   male       1
#> 109  429      2   male       1
#> 110  351      2 female       2
#> 111   15      2   male       0
#> 112  181      2   male       1
#> 113  283      2   male       1
#> 114  201      2 female       0
#> 115  524      2 female       1
#> 116   13      2   male       2
#> 117  212      2   male       2
#> 118  524      2   male       2
#> 119  288      2   male       2
#> 120  363      2   male       1
#> 121  442      2   male       0
#> 122  199      2 female       2
#> 123  550      2 female       1
#> 124   54      2   male       2
#> 125  558      2   male       0
#> 126  207      2   male       1
#> 127   92      2   male       1
#> 128   60      2   male       1
#> 129  551      1 female       2
#> 130  543      1 female       0
#> 131  293      2 female       1
#> 132  202      2   male       1
#> 133  353      2   male       0
#> 134  511      1 female       1
#> 135  267      2   male       0
#> 136  511      1 female       2
#> 137  371      2 female       1
#> 138  387      2   male       2
#> 139  457      2   male       1
#> 140  337      2   male       0
#> 141  201      2 female       2
#> 142  404      1   male       1
#> 143  222      2   male       2
#> 144   62      2 female       1
#> 145  458      1   male       1
#> 146  356      1 female       1
#> 147  353      2   male       0
#> 148  163      2   male       1
#> 149   31      2   male       0
#> 150  340      2 female       0
#> 151  229      2   male       1
#> 152  444      1   male       0
#> 153  315      1 female       0
#> 154  182      2 female       1
#> 155  156      2   male       2
#> 156  329      2   male       2
#> 157  364      1 female       1
#> 158  291      2   male       2
#> 159  179      2   male       1
#> 160  376      1 female       1
#> 161  384      1 female       0
#> 162  268      2 female       1
#> 163  292      1   male       2
#> 164  142      2   male       1
#> 165  413      1   male       1
#> 166  266      1 female       0
#> 167  194      2 female       1
#> 168  320      2   male       0
#> 169  181      2   male       1
#> 170  285      2   male       0
#> 171  301      1   male       1
#> 172  348      2 female       0
#> 173  197      2   male       1
#> 174  382      1 female       0
#> 175  303      1   male       1
#> 176  296      1 female       1
#> 177  180      2   male       2
#> 178  186      2 female       1
#> 179  145      2 female       1
#> 180  269      1 female       0
#> 181  300      1   male       0
#> 182  284      1   male       0
#> 183  350      2 female       0
#> 184  272      1 female       1
#> 185  292      1 female       0
#> 186  332      1 female       0
#> 187  285      2 female       2
#> 188  259      1   male       0
#> 189  110      2   male       1
#> 190  286      2   male       0
#> 191  270      2   male       1
#> 192   81      2   male       2
#> 193  131      2   male       1
#> 194  225      1   male       1
#> 195  269      2   male       1
#> 196  225      1   male       0
#> 197  243      1 female       1
#> 198  279      1   male       1
#> 199  276      1 female       0
#> 200  135      2   male       1
#> 201   79      2 female       1
#> 202   59      2   male       1
#> 203  240      1 female       0
#> 204  202      1 female       0
#> 205  235      1 female       0
#> 206  105      2   male       2
#> 207  224      1 female       0
#> 208  239      2 female       2
#> 209  237      1   male       1
#> 210  173      1 female       1
#> 211  252      1 female       0
#> 212  221      1   male       1
#> 213  185      1   male       1
#> 214   92      1 female       2
#> 215   13      2   male       1
#> 216  222      1   male       1
#> 217  192      1 female       1
#> 218  183      2   male       2
#> 219  211      1 female       2
#> 220  175      1 female       0
#> 221  197      1   male       1
#> 222  203      1 female       1
#> 223  116      2   male       1
#> 224  188      1   male       1
#> 225  191      1   male       0
#> 226  105      1 female       2
#> 227  174      1   male       1
#> 228  177      1 female       1
#> 
#> $data_attributes
#> $data_attributes$survival_ids
#>     time   status 
#>   "time" "status" 
#> 
#> $data_attributes$factor_ids
#>       sex   ph.ecog 
#>     "sex" "ph.ecog" 
#> 
#> $data_attributes$strata_ids
#> character(0)
#> 
#> 
#> $coxph
#> Call:
#> coxph(formula = Surv(time, status) ~ sex + ph.ecog, data = data)
#> 
#>              coef exp(coef) se(coef)      z        p
#> sexfemale -0.5449    0.5799   0.1681 -3.241  0.00119
#> ph.ecog1   0.4182    1.5192   0.1994  2.097  0.03602
#> ph.ecog2   0.9475    2.5792   0.2248  4.216 2.49e-05
#> ph.ecog3   2.0485    7.7565   1.0269  1.995  0.04605
#> 
#> Likelihood ratio test=29.5  on 4 df, p=6.172e-06
#> n= 227, number of events= 164 
#>    (1 observation deleted due to missingness)
#> 
#> $summary
#> Call:
#> coxph(formula = Surv(time, status) ~ sex + ph.ecog, data = data)
#> 
#>   n= 227, number of events= 164 
#>    (1 observation deleted due to missingness)
#> 
#>              coef exp(coef) se(coef)      z Pr(>|z|)    
#> sexfemale -0.5449    0.5799   0.1681 -3.241  0.00119 ** 
#> ph.ecog1   0.4182    1.5192   0.1994  2.097  0.03602 *  
#> ph.ecog2   0.9475    2.5792   0.2248  4.216 2.49e-05 ***
#> ph.ecog3   2.0485    7.7565   1.0269  1.995  0.04605 *  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#>           exp(coef) exp(-coef) lower .95 upper .95
#> sexfemale    0.5799     1.7245    0.4171    0.8062
#> ph.ecog1     1.5192     0.6582    1.0277    2.2459
#> ph.ecog2     2.5792     0.3877    1.6602    4.0067
#> ph.ecog3     7.7565     0.1289    1.0366   58.0390
#> 
#> Concordance= 0.642  (se = 0.025 )
#> Likelihood ratio test= 29.5  on 4 df,   p=6e-06
#> Wald test            = 30.7  on 4 df,   p=4e-06
#> Score (logrank) test = 32.71  on 4 df,   p=1e-06
#> 
#> 
#> $summaryAsFrame
#>    factor.id factor.name factor.value        HR  Lower_CI   Upper_CI    Inv_HR
#> 1 sex:female         Sex       female 0.5798863 0.4170846  0.8062348 1.7244761
#> 2  ph.ecog:1 ECOG Status            1 1.5192159 1.0276575  2.2459010 0.6582343
#> 3  ph.ecog:2 ECOG Status            2 2.5791741 1.6602467  4.0067172 0.3877210
#> 4  ph.ecog:3 ECOG Status            3 7.7564573 1.0365901 58.0389756 0.1289248
#>   Inv_Lower_CI Inv_Upper_CI            p
#> 1    1.2403335    2.3975953 1.191350e-03
#> 2    0.4452556    0.9730869 3.601567e-02
#> 3    0.2495809    0.6023201 2.490622e-05
#> 4    0.0172298    0.9647014 4.604714e-02
#> 
#> $overall
#> $overall$n
#> [1] 227
#> 
#> $overall$covariates
#> [1] "sex"     "ph.ecog"
#> 
#> $overall$`Likelihood ratio test p`
#> [1] 6.172497e-06
#> 
#> $overall$`Wald test p`
#> [1] 3.528987e-06
#> 
#> $overall$`Score (logrank) test p`
#> [1] 1.372107e-06
#> 
#> 
#> $formula
#> [1] "Surv(time, status) ~ sex + ph.ecog"
#> 
#> attr(,"class")
#> [1] "SurvivalAnalysisResult"             "SurvivalAnalysisMultivariateResult"
#> [3] "list"

Sidenote: We are very much used to see hazard ratios for a binary covariate. Please note the different interpretation for continous variables: The hazard ratio is to be interpreted with regard to a change of size 1. With a binary variable, there is only one step from 0 to 1. With age, the range is different! Assume we detect a hazard ratio of 1.04 for age in some study. What is the calculated HR of a 75y old patient compared to a 45y old?

exp((75-45)*log(1.04)) 
#> [1] 3.243398

A Forest Plot

The usual method to display results of multivariate analyses is the forest plot. The survivalAnalysis package provides an implementation which generates ready-to-publish plots and allows extensive customization.

forest_plot(result)
#> Warning: `lgl_along()` is deprecated as of rlang 0.2.0.
#> This warning is displayed once per session.

Ok, this one is not ready to publish. We need to tweak some parameters:

forest_plot(result,
            factor_labeller = covariate_names,
            endpoint_labeller = c(time="OS"),
            orderer = ~order(HR),
            labels_displayed = c("endpoint", "factor", "n"),
            ggtheme = ggplot2::theme_bw(base_size = 10),
            relative_widths = c(1, 1.5, 1),
            HR_x_breaks = c(0.25, 0.5, 0.75, 1, 1.5, 2))

The forest_plot function actually accepts any number of results and will display them all in the same plot. For example, you may want to display both OS and PFS in the same plot, but of course ordered and with a line separating the two. To do that,

Finally, if you want separate plots but display them in a grid, use forest_plot_grid to do the grid layout and forest_plot’s title argument to add the A, B, C… labels.

Multiple Univariate Analyses

It is common practice to perform univariate analyses of all covariates first and take only those into the multivariate analysis which were significant to some level in the univariate analysis. (I see some pros and strong cons with this procedure, but am open to learn more on this). The univariate part can easily be achieved using purrr’s map function. A forest plot, as already said, will happily plot multiple results, even if they come as a list.


df <- survival::lung %>% mutate(sex=rename_factor(sex, `1` = "male", `2` = "female"))

map(vars(age, sex, ph.ecog, wt.loss), function(by)
{
  analyse_multivariate(df,
                       vars(time, status),
                       covariates = list(by), # covariates expects a list
                       covariate_name_dict = covariate_names)
}) %>%
  forest_plot(factor_labeller = covariate_names,
              endpoint_labeller = c(time="OS"),
              orderer = ~order(HR),
              labels_displayed = c("endpoint", "factor", "n"),
              ggtheme = ggplot2::theme_bw(base_size = 10))

Note how the values only slighlty differ. The number of cases is larger each time because the multivariate analysis will only include the subset of subjects for which all covariates are known. Age is significant in UV but not in MV - there is probably interaction between ECOG and age.

One-Hot Encoding

We are moving to the grounds of exploratory analysis. For a somewhat interesting example, we add the KRAS mutational status to the data set (by random sampling, for the sake of this tutorial). No, there is a categorical variable with five levels, but none of these comes natural as reference level. One may argue that wild type should be the reference level, but we may want to know if wild type is better than any KRAS mutation. If we omit wild-type and compare only among mutated tumors, there is definitely no suitable reference level.

The one-hot parameter triggers a mode where for each factor level, the hazard ratio vs. the remaining cohort is plotted. This means that no level is omitted. Please be aware of the statistical caveats. And please note that this has nothing to do any more with multivariate analysis. In fact, now you need the result of the univariate, categorically-minded analyse_survival.


survival::lung %>% 
  mutate(kras=sample(c("WT", "G12C", "G12V", "G12D", "G12A"), 
                     nrow(.), 
                     replace = TRUE, 
                     prob = c(0.6, 0.24, 0.16, 0.06, 0.04))
          ) %>%
  analyse_survival(vars(time, status), by=kras) %>%
  forest_plot(use_one_hot=TRUE,
              endpoint_labeller = c(time="OS"),
              orderer = ~order(HR),
              labels_displayed = c("endpoint", "factor", "n"),
              ggtheme = ggplot2::theme_bw(base_size = 10))

We randomly assigned the RAS status, so results will differ at each generation of this vignette. If one of the subgroups should differ significantly, by the law of small numbers, it will probably be one of the rarer variants.