Type: | Package |
Title: | Estimate Latent Classes on a Mixture of Continuous and Dichotomous Data |
Version: | 1.0.0 |
Description: | EQ-5D value set estimation can be done using the hybrid model likelihood as described by Oppe and van Hout (2010) <doi:10.1002/hec.3560> and Ramos-Goñi et al. (2017) <doi:10.1097/MLR.0000000000000283>. The package is based on 'flexmix()' and contains, among other things, an M-step driver as described by Leisch (2004) <doi:10.18637/jss.v011.i08>. Users can estimate latent classes and address preference heterogeneity. Both uncensored and censored data are supported. Furthermore, heteroscedasticity can be taken into account. It is possible to control for different covariates on the continuous and dichotomous parts of the data, and start values can differ between the expected latent classes. |
License: | MIT + file LICENSE |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.3.1 |
Imports: | flexmix, bbmle, ggplot2, methods, stats, utils |
Suggests: | testthat (≥ 3.0.0) |
Config/testthat/edition: | 3 |
Depends: | R (≥ 2.10) |
NeedsCompilation: | no |
Packaged: | 2025-09-30 10:46:26 UTC; selkenkamp |
Author: | Svenja Elkenkamp [aut, cre], Kim Rand [aut], John Grosser [aut], EuroQol [fnd] |
Maintainer: | Svenja Elkenkamp <svenja.elkenkamp@uni-bielefeld.de> |
Repository: | CRAN |
Date/Publication: | 2025-10-06 08:30:18 UTC |
M-step driver to be used in flexmix
Description
Function used in flexmix M-Step to estimate hybrid model
Usage
FLXMRhyreg(
formula = . ~ .,
family = c("hyreg"),
type = NULL,
type_cont = NULL,
type_dich = NULL,
variables_both = NULL,
variables_cont = NULL,
variables_dich = NULL,
stv = NULL,
offset = NULL,
opt_method = "BFGS",
optimizer = "optim",
lower = -Inf,
upper = Inf,
...
)
Arguments
formula |
linear model |
family |
default |
type |
|
type_cont |
value of |
type_dich |
value of |
variables_both |
|
variables_cont |
|
variables_dich |
character vector; variables to be fitted only on dichotomous data. See Details of hyreg2 |
stv |
|
offset |
offset as in |
opt_method |
|
optimizer |
|
lower |
lower bound for censored data. If this is used, |
upper |
upper bound for censored data. If this is used, |
... |
additional arguments for |
Value
a model object that can be used in hyreg2 as input for the parameter model in flexmix::flexmix()
Author(s)
Svenja Elkenkamp and Kim Rand
Examples
formula <- y ~ -1 + x1 + x2 + x3
the$k <- 2
stv <- setNames(c(0.2,0,1,1,1),c(colnames(simulated_data_norm)[3:5],c("sigma","theta")))
x <- model.matrix(formula,simulated_data_norm)
y <- simulated_data_norm$y
w <- 1
model <- FLXMRhyreg(formula = formula,
family=c("hyreg"),
type = simulated_data_norm$type,
stv = stv,
type_cont = "TTO",
type_dich = "DCE_A",
opt_method = "L-BFGS-B",
control = list(iter.max = 1000, verbose = 4),
offset = NULL,
optimizer = "optim",
variables_both = names(stv)[!is.element(names(stv),c("sigma","theta"))],
variables_cont = NULL,
variables_dich = NULL,
lower = -Inf,
upper = Inf
)
M-step driver to be used in flexmix accounting for heteroscedasticity
Description
Function used in flexmix M-Step to estimate hybrid model accounting for heteroscedasticity
Usage
FLXMRhyreg_het(
data,
formula = . ~ .,
formula_sigma = formula_sigma,
family = c("hyreg"),
type = NULL,
type_cont = NULL,
type_dich = NULL,
variables_both = NULL,
variables_cont = NULL,
variables_dich = NULL,
stv = NULL,
stv_sigma = NULL,
offset = NULL,
opt_method = "BFGS",
optimizer = "optim",
lower = -Inf,
upper = Inf,
...
)
Arguments
data |
a |
formula |
linear model |
formula_sigma |
|
family |
default |
type |
|
type_cont |
value of |
type_dich |
value of |
variables_both |
|
variables_cont |
|
variables_dich |
character vector; variables to be fitted only on dichotomous data. See Details of hyreg2_het |
stv |
|
stv_sigma |
|
offset |
offset as in |
opt_method |
|
optimizer |
|
lower |
lower bound for censored data. If this is used, |
upper |
upper bound for censored data. If this is used, |
... |
additional arguments for |
Value
a model object that can be used in hyreg2_het as input for the parameter model in flexmix::flexmix()
Author(s)
Svenja Elkenkamp and Kim Rand
Examples
formula <- y ~ -1 + x1 + x2 + x3
formula_sigma <- y ~ x1 + x2 + x3
stv <- setNames(c(0.2,0,1,1),c(colnames(simulated_data_norm)[3:5],c("theta")))
stv_sigma <- setNames(c(0.2,0.2,0.1,1),c(colnames(simulated_data_norm)[3:5],c("(Intercept)")))
x <- model.matrix(formula,simulated_data_norm)
y <- simulated_data_norm$y
w <- 1
model <- FLXMRhyreg_het( data = simulated_data_norm,
formula = formula,
formula_sigma = formula_sigma,
family=c("hyreg"),
type = simulated_data_norm$type,
stv = stv,
stv_sigma = stv_sigma,
type_cont = "TTO",
type_dich = "DCE_A",
opt_method = "L-BFGS-B",
control = list(iter.max = 1000, verbose = 4),
offset = NULL,
optimizer = "optim",
variables_both = names(stv)[!is.element(names(stv),c("theta"))],
variables_cont = NULL,
variables_dich = NULL,
lower = -Inf,
upper = Inf
)
extract parameter estimates as named vector
Description
Function to export coefficient values and names from a model fitted with hyreg2 or hyreg2_het.
These values can be used as stv for a new model with k > 1.
Usage
get_stv(mod, comp = "Comp.1")
Arguments
mod |
|
comp |
|
Value
named vector of parameter estimates from mod. Can be used as stv for additional model estimations
using hyreg2 or hyreg2_het.
Author(s)
Svenja Elkenkamp
Examples
formula <- y ~ -1 + x1 + x2 + x3 | id
k <- 1
stv <- setNames(c(0.2,0,1,1,1),c(colnames(simulated_data_norm)[3:5],c("sigma","theta")))
control = list(iter.max = 1000, verbose = 4)
rm(counter)
mod <- hyreg2(formula = formula,
data = simulated_data_norm,
type = simulated_data_norm$type,
stv = stv,
k = k,
type_cont = "TTO",
type_dich = "DCE_A",
opt_method = "L-BFGS-B",
control = control,
latent = "both",
id_col = "id"
)
new_stv <- get_stv(mod)
# new_stv can be used as stv in another estimation with hyreg2
function to decode which group or observation was classified to which class by the model
Description
This function can be used to decode the classification from a model generated using hyreg2 or hyreg2_het
and to see which group or observation was assigned to which class.
Usage
give_id(data, model, id_col = NULL)
Arguments
data |
a |
model |
a flexmix |
id_col |
|
Value
dataframe of two columns. The first column is named as the provided id_col, or "observation" if id_col
was not given as an input. The second column is named "mod_comp" and indicates the assigned class for
this group or observation.
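The returned table can be joined back onto the data. The following is a minimal base-R sketch under stated assumptions: `toy` and `classes` are hypothetical stand-ins (no model is fitted here), and the class labels "Comp.1" and "Comp.2" are assumed for illustration only; the actual values stored in "mod_comp" may differ.

```r
# Hypothetical stand-ins: a small data set and a classification table with the
# column layout described above (an id column plus "mod_comp").
toy <- data.frame(id = c(1, 1, 2, 2, 3), y = c(0.5, 0.7, 0.2, 0.1, 0.9))
classes <- data.frame(id = c(1, 2, 3),
                      mod_comp = c("Comp.1", "Comp.2", "Comp.1"))  # assumed labels

# Attach the assigned class to every observation of the same group
toy_classified <- merge(toy, classes, by = "id")

# Number of observations per assigned class
class_counts <- table(toy_classified$mod_comp)
class_counts
```

Merging by the id column keeps one row per observation, so the class of a group is repeated for each of its observations.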
Author(s)
Svenja Elkenkamp & John Grosser
Examples
# estimate a model using simulated_data_norm
### using grouping variable id ####
formula <- y ~ -1 + x1 + x2 + x3 | id
k <- 1
stv <- setNames(c(0.2,0.2,0.2,1,1),c(colnames(simulated_data_norm)[3:5],c("sigma","theta")))
control <- list(iter.max = 1000, verbose = 4)
hyflex_mod <- hyreg2(formula = formula,
data = simulated_data_norm,
type = simulated_data_norm$type,
stv = stv,
k = k,
type_cont = "TTO",
type_dich = "DCE_A",
opt_method = "L-BFGS-B",
control = control,
latent = "both",
id_col = "id"
)
# use of function give_id
give_id(data = simulated_data_norm,
model = hyflex_mod,
id_col = "id")
function for model estimation for EQ-5D valuesets
Description
Estimation of hybrid model for EQ-5D data
Usage
hyreg2(
formula,
data,
type,
type_cont,
type_dich,
k = 1,
control = NULL,
stv = NULL,
offset = NULL,
opt_method = "BFGS",
optimizer = "optim",
lower = -Inf,
upper = Inf,
latent = "both",
id_col = NULL,
classes_only = FALSE,
variables_both = NULL,
variables_dich = NULL,
variables_cont = NULL,
...
)
Arguments
formula |
linear model |
data |
a |
type |
either the name of the column in |
type_cont |
value of |
type_dich |
Value of |
k |
|
control |
control list for |
stv |
|
offset |
offset as in |
opt_method |
|
optimizer |
|
lower |
|
upper |
|
latent |
|
id_col |
|
classes_only |
|
variables_both |
|
variables_dich |
|
variables_cont |
|
... |
additional arguments for |
Details
see details of different inputs listed below
Value
model object of type flexmix or list of model objects of type flexmix
formula
A typical R formula of the form y ~ x1 + x2 + … should be provided.
Additionally, it is possible to include a grouping variable for repeated measures by using "| xg",
where xg is the column containing the group memberships. The resulting formula will look
like this: y ~ x1 + x2 + … | xg. In flexmix, this is the grouping variable specification for repeated
measurements: the model is fit conditional on grouping, so that all observations with the same group
are treated as belonging together when computing likelihood contributions. One possible grouping
variable is an id number that identifies answers by the same participant. We highly recommend using a
grouping variable, since otherwise the algorithm for k = 2 tends to classify all continuous data into
one estimated class and all dichotomous data into the other.
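To illustrate the formula layout, here is a minimal base-R sketch that separates the model part from an optional grouping term. The helper `split_group` is hypothetical and not part of the package; how hyreg2 itself processes the formula is not shown in this manual.

```r
# Hypothetical helper (not part of the package): separate the model part
# of a formula from an optional grouping term written as "| xg".
split_group <- function(f) {
  rhs <- f[[3]]
  has_group <- is.call(rhs) && identical(rhs[[1]], as.name("|"))
  if (has_group) {
    group <- as.character(rhs[[3]])  # name of the grouping column
    f[[3]] <- rhs[[2]]               # keep only the model terms
  } else {
    group <- NULL
  }
  list(model = f, group = group)
}

parts <- split_group(y ~ -1 + x1 + x2 + x3 | id)
parts$model  # formula without the grouping term
parts$group  # "id"
```

The model part obtained this way is the kind of formula that can be passed to model.matrix(), which does not expect a grouping term.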
data
A dataframe having the following columns: all independent variables (x) and the dependent variable y
used in formula; one column for the grouping variable xg if grouping should be used, e.g. id numbers
of participants with repeated measurements; and one column indicating whether an observation belongs
to the continuous or the dichotomous data, with the entries type_cont and type_dich (e.g., for a
column called "type" with the entries "TTO" for continuous datapoints and "DCE" for dichotomous
datapoints, type_cont will be "TTO" and type_dich will be "DCE").
One row should match one observation (one datapoint).
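The layout described above can be sketched with a small hand-made data frame. All values are hypothetical, and coding the dichotomous responses as 0/1 is an assumption made for this illustration.

```r
# Hypothetical toy data in the layout described above: one row per observation
toy <- data.frame(
  id   = rep(1:3, each = 4),                     # grouping variable (participant)
  type = rep(c("TTO", "TTO", "DCE", "DCE"), 3),  # continuous vs. dichotomous rows
  x1   = rep(c(0, 1), 6),
  x2   = rep(c(1, 0, 0, 1), 3),
  y    = c(0.8, 0.5, 1, 0, 0.9, 0.4, 0, 1, 0.7, 0.6, 1, 1)
)

type_cont <- "TTO"
type_dich <- "DCE"

# Every row must be labelled as either continuous or dichotomous
types_ok <- all(toy$type %in% c(type_cont, type_dich))

# Dichotomous responses are coded 0/1 in this toy example (an assumption)
dich_ok <- all(toy$y[toy$type == type_dich] %in% c(0, 1))

c(types_ok = types_ok, dich_ok = dich_ok)
```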
start values (stv)
If the same start values stv are to be used for all latent classes, the given start values must be a
named vector. Otherwise (if different start values are assumed for each latent class), a list of named
vectors should be used. In this case, there must be one entry in the list for each latent class. Each
start value vector must include start values for sigma and theta. Currently, it is necessary to use
the names "sigma" and "theta" for these values.
If users are unsure for which variables start values must be provided, this can be checked by
calling colnames(model.matrix(formula,data)). In this call, the formula should not include the
grouping variable.
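The two accepted shapes of stv can be sketched as follows. The data frame `toy` and all start values are hypothetical; only base R and stats functions are used, and no model is fitted.

```r
# Toy data (hypothetical, only for illustration)
set.seed(1)
toy <- data.frame(y = rnorm(6), x1 = rnorm(6), x2 = rnorm(6), x3 = rnorm(6))

# Names of the regression parameters, from the formula without grouping variable
f <- y ~ -1 + x1 + x2 + x3
par_names <- colnames(model.matrix(f, toy))

# Same start values for all classes: one named vector incl. "sigma" and "theta"
stv_shared <- setNames(c(0.2, 0, 1, 1, 1), c(par_names, "sigma", "theta"))

# Different start values per class (here k = 2): one named vector per class
stv_by_class <- list(
  stv_shared,
  setNames(c(1, 1, 0.5, 2, 2), c(par_names, "sigma", "theta"))
)

# Check that every vector carries exactly the required names
required <- c(par_names, "sigma", "theta")
names_ok <- all(vapply(stv_by_class,
                       function(s) setequal(names(s), required),
                       logical(1)))
names_ok
```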
latent, id_col, classes_only
In some situations, it can be useful to identify the latent classes on only one type of data while
estimating the model parameters on both types of data. In such cases, the input variable latent can be
used to specify on which type of data the classification should be done.
If "cont" or "dich" is used, the input parameter id_col must be specified and gives the name, i.e. a
character string, of the grouping variable for classification. Some groups may be removed from
the data, since they have only continuous or only dichotomous observations. Then in a first step,
a model is estimated only on the continuous/dichotomous data and the achieved classification is stored.
In a next step, model parameters are estimated separately for each identified class on both types
of data using this classification. The output object of hyreg2 in this case is a list of k models.
Additionally, at position k+1 of the list, a data frame containing the corresponding classifications
from the first step is returned. Each element k in the list contains the estimated parameters for one
of the latent classes. When setting the input variable classes_only to TRUE, the second step is left
out and the estimated classes from step one are given as output.
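Before using latent = "cont" or latent = "dich", it can help to see which groups contain only one type of data. The following base-R sketch uses hypothetical toy data; the exact rule hyreg2 applies when removing groups is not documented here, so this is only an approximation of that check.

```r
# Hypothetical toy data: participant 3 has only dichotomous observations
toy <- data.frame(
  id   = c(1, 1, 2, 2, 3, 3),
  type = c("TTO", "DCE", "TTO", "DCE", "DCE", "DCE")
)

# Number of distinct data types observed per group
types_per_id <- tapply(toy$type, toy$id, function(t) length(unique(t)))

# Groups with only one data type; these are candidates for removal
incomplete_ids <- names(types_per_id)[types_per_id < 2]
incomplete_ids
```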
variables_both, variables_cont, variables_dich
It is possible to specify partial coefficients, which are used only on continuous or dichotomous data.
Example: Suppose different models should be specified for continuous and dichotomous data:
Model continuous data: y ~ x1 + x3
Model dichotomous data: y ~ x1 + x2
The formula input to hyreg2 must then include all parameters that occur in either model: y ~ x1 + x2 + x3
The assignment of parameters to data types is then achieved via the input arguments
variables_both, variables_cont, and variables_dich:
- variables_both = "x1",
- variables_cont = "x3" and
- variables_dich = "x2".
Every variable included in the provided formula (except the grouping variable) must appear in exactly
one of these vectors. One of the variables_ vectors can also be NULL, if no variables should be used
only on this type of the data.
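The "exactly one vector" rule can be checked before fitting. This is a minimal base-R sketch of such a check; it is not the validation performed by the package itself.

```r
f <- y ~ x1 + x2 + x3

variables_both <- "x1"
variables_cont <- "x3"
variables_dich <- "x2"

# Predictors named in the formula
predictors <- attr(terms(f), "term.labels")

# How often each predictor is assigned across the three vectors
# (a NULL vector simply contributes nothing to c())
assigned <- c(variables_both, variables_cont, variables_dich)
counts <- table(factor(assigned, levels = predictors))

# TRUE if every predictor appears exactly once and nothing else is assigned
partition_ok <- all(counts == 1) && all(assigned %in% predictors)
partition_ok
```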
Author(s)
Svenja Elkenkamp, Kim Rand and John Grosser
Examples
formula <- y ~ -1 + x1 + x2 + x3 | id
k <- 2
stv <- setNames(c(0.2,0,1,1,1),c(colnames(simulated_data_norm)[3:5],c("sigma","theta")))
control = list(iter.max = 1000, verbose = 4)
rm(counter)
mod <- hyreg2(formula = formula,
data = simulated_data_norm,
type = simulated_data_norm$type, # also "type" would work
stv = stv,
k = k,
type_cont = "TTO",
type_dich = "DCE_A",
opt_method = "L-BFGS-B",
control = control,
latent = "cont",
id_col = "id"
)
summary_hyreg2(mod)
function for model estimation for EQ-5D valueset data accounting for heteroscedasticity in continuous data
Description
Estimation of hybrid model for EQ-5D data
Usage
hyreg2_het(
formula,
formula_sigma = NULL,
data,
type,
type_cont,
type_dich,
k = 1,
control = NULL,
stv = NULL,
stv_sigma = NULL,
offset = NULL,
opt_method = "BFGS",
optimizer = "optim",
lower = -Inf,
upper = Inf,
latent = "both",
id_col = NULL,
classes_only = FALSE,
variables_both = NULL,
variables_dich = NULL,
variables_cont = NULL,
...
)
Arguments
formula |
linear model |
formula_sigma |
linear |
data |
a |
type |
either the name of the column in |
type_cont |
value of |
type_dich |
Value of |
k |
|
control |
control list for |
stv |
|
stv_sigma |
|
offset |
offset as in |
opt_method |
|
optimizer |
|
lower |
|
upper |
|
latent |
|
id_col |
|
classes_only |
|
variables_both |
|
variables_dich |
|
variables_cont |
|
... |
additional arguments for |
Details
see details of different inputs listed below
Value
model object of type flexmix; coefficients named ..._h are coefficients for heteroscedasticity
formula
A typical R formula of the form y ~ x1 + x2 + … should be provided.
Additionally, it is possible to include a grouping variable for repeated measures by using "| xg",
where xg is the column containing the group memberships. The resulting formula will look
like this: y ~ x1 + x2 + … | xg. In flexmix, this is the grouping variable specification for repeated
measurements: the model is fit conditional on grouping, so that all observations with the same group
are treated as belonging together when computing likelihood contributions. One possible grouping
variable is an id number that identifies answers by the same participant. We highly recommend using a
grouping variable, since otherwise the algorithm for k = 2 tends to classify all continuous data into
one estimated class and all dichotomous data into the other.
data
A dataframe having the following columns: all independent variables (x) and the dependent variable y
used in formula; one column for the grouping variable xg if grouping should be used, e.g. id numbers
of participants with repeated measurements; and one column indicating whether an observation belongs
to the continuous or the dichotomous data, with the entries type_cont and type_dich (e.g., for a
column called "type" with the entries "TTO" for continuous datapoints and "DCE" for dichotomous
datapoints, type_cont will be "TTO" and type_dich will be "DCE").
One row should match one observation (one datapoint).
start values (stv)
If the same start values stv are to be used for all latent classes, the given start values must be a
named vector. Otherwise (if different start values are assumed for each latent class), a list of named
vectors should be used. In this case, there must be one entry in the list for each latent class. Each
start value vector must include a start value for theta, and currently it is necessary to use the name
"theta" for this value. Unlike in hyreg2, sigma must not be included when using hyreg2_het (see
formula_sigma, stv_sigma below).
If users are unsure for which variables start values must be provided, this can be checked by
calling colnames(model.matrix(formula,data)). In this call, the formula should not include the
grouping variable.
formula_sigma, stv_sigma
To account for heteroscedasticity in the data, an additional formula formula_sigma and an additional
vector of start values for this formula (stv_sigma) can be specified.
The provided formula_sigma must be linear and the vector stv_sigma must contain start values for
all parameters used in the formula. If neither formula_sigma nor stv_sigma is provided, the same
inputs as for formula (without controlling for groups) and stv (without sigma) are used.
The estimates for sigma can be identified in the model output by the ending "_h". It is important to
note that, when using hyreg2_het, neither stv nor stv_sigma may include sigma, because sigma is
estimated with its own formula (in contrast to hyreg2, where sigma must always be specified in stv).
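The naming rules for stv and stv_sigma can be sketched with base R. The data frame `toy` and all start values are hypothetical, and no model is fitted. Whether the order of the names matters to hyreg2_het is not stated in this manual, so the order used here is only one possibility.

```r
# Toy data (hypothetical, only for illustration)
set.seed(1)
toy <- data.frame(y = rnorm(6), x1 = rnorm(6), x2 = rnorm(6), x3 = rnorm(6))

formula       <- y ~ -1 + x1 + x2 + x3
formula_sigma <- y ~ x1 + x2 + x3

# Parameter names implied by each formula
mean_names  <- colnames(model.matrix(formula, toy))        # no intercept
sigma_names <- colnames(model.matrix(formula_sigma, toy))  # includes "(Intercept)"

# Start values: theta belongs to stv; sigma appears in neither vector
stv       <- setNames(c(0.2, 0, 1, 1), c(mean_names, "theta"))
stv_sigma <- setNames(c(1, 0.2, 0.2, 0.1), sigma_names)

# TRUE if neither vector contains an entry named "sigma"
no_sigma <- !("sigma" %in% c(names(stv), names(stv_sigma)))
no_sigma
```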
latent, id_col, classes_only
In some situations, it can be useful to identify the latent classes on only one type of data while
estimating the model parameters on both types of data. In such cases, the input variable latent can be
used to specify on which type of data the classification should be done.
If "cont" or "dich" is used, the input parameter id_col must be specified and gives the name, i.e. a
character string, of the grouping variable for classification. Some groups may be removed from
the data, since they have only continuous or only dichotomous observations. Then in a first step,
a model is estimated only on the continuous/dichotomous data and the achieved classification is stored.
In a next step, model parameters are estimated separately for each identified class on both types
of data using this classification. The output object of hyreg2_het in this case is a list of k models.
Additionally, at position k+1 of the list, a data frame containing the corresponding classifications
from the first step is returned. Each element k in the list contains the estimated parameters for one
of the latent classes. When setting the input variable classes_only to TRUE, the second step is left
out and the estimated classes from step one are given as output.
variables_both, variables_cont, variables_dich
It is possible to specify partial coefficients, which are used only on continuous or dichotomous data.
Example: Suppose different models should be specified for continuous and dichotomous data:
Model continuous data: y ~ x1 + x3
Model dichotomous data: y ~ x1 + x2
The formula input to hyreg2_het must then include all parameters that occur in either model: y ~ x1 + x2 + x3
The assignment of parameters to data types is then achieved via the input arguments
variables_both, variables_cont, and variables_dich:
- variables_both = "x1",
- variables_cont = "x3" and
- variables_dich = "x2".
Every variable included in the provided formula (except the grouping variable) must appear in exactly
one of these vectors. One of the variables_ vectors can also be NULL, if no variables should be used
only on this type of the data.
Author(s)
Svenja Elkenkamp, Kim Rand and John Grosser
Examples
formula <- y ~ -1 + x1 + x2 + x3
formula_sigma <- y ~ x1 + x2 + x3
k <- 1
stv <- setNames(c(0.2,0,1,1),c(colnames(simulated_data_norm)[3:5],c("theta")))
stv_sigma <- setNames(c(0.2,0,1,1),c(colnames(simulated_data_norm)[3:5],c("(Intercept)")))
control = list(iter.max = 1000, verbose = 4)
rm(counter)
mod <- hyreg2_het(formula = formula,
formula_sigma = formula_sigma,
data = simulated_data_norm,
type = simulated_data_norm$type, # or "type"
stv = stv,
stv_sigma = stv_sigma,
k = k,
type_cont = "TTO",
type_dich = "DCE_A",
opt_method = "L-BFGS-B",
control = control,
latent = "both",
id_col = "id"
)
summary_hyreg2(mod)
plot function to visualize the classification based on the model estimated using hyreg2 or hyreg2_het
Description
This function can be used to visualize the classification based on the model for different variables.
ggplot2::ggplot() is used.
Usage
plot_hyreg2(
data,
x,
y,
id_col,
id_df_model,
type_to_plot = NULL,
colors = NULL
)
Arguments
data |
a |
x |
|
y |
|
id_col |
|
id_df_model |
|
type_to_plot |
|
colors |
|
Details
id_col_df has to be provided in any case, even if the model was estimated without a grouping variable.
Since there might be no grouping variable in the data, we recommend creating a new column called
"observation" in data that holds the row names (observation numbers) as character values and using
this column as input for id_col in plot_hyreg2. You can then additionally use
id_df_model = give_id(data, model, "observation"); see the example.
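Creating the recommended "observation" column takes one line of base R. The data frame `toy` is a hypothetical stand-in for data without a grouping variable.

```r
# Hypothetical toy data without a grouping variable
toy <- data.frame(x1 = c(0.1, 0.4, 0.9), y = c(1.2, 0.7, 0.3))

# Use the row names as character identifiers, one per observation
toy$observation <- as.character(rownames(toy))
toy$observation
```

The column name "observation" can then be passed as id_col.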
Value
ggplot
object visualizing x against y by classes from the model
Author(s)
Svenja Elkenkamp & John Grosser
Examples
# estimate a model using simulated_data_norm
formula <- y ~ -1 + x1 + x2 + x3 | id
k <- 2
stv <- setNames(c(0.2,0.2,0.2,1,1),c(colnames(simulated_data_norm)[3:5],c("sigma","theta")))
control <- list(iter.max = 1000, verbose = 4)
hyflex_mod <- hyreg2(formula = formula,
data = simulated_data_norm,
type = simulated_data_norm$type,
stv = stv,
k = k,
type_cont = "TTO",
type_dich = "DCE_A",
opt_method = "L-BFGS-B",
control = control,
latent = "cont",
id_col = "id"
)
# plotting the variables id against y
plot_hyreg2(data = simulated_data_norm,
x = "id",
y = "y",
id_col = "id",
id_df_model = give_id(data = simulated_data_norm,
model = hyflex_mod,
id_col = "id"))
simulated_data
Description
simulated_data
Usage
simulated_data
Format
simulated_data
A simulated data frame with 480 rows and 25 columns, following a combination of normal and binomial distribution
- type
type of data
- y
result of the model y = -1 + mo2 + mo3 + ... + ad4 + ad5
- mo2, mo3, mo4, mo5, sc2, sc3, sc4, sc5, ua2, ua3, ua4, ua5, pd2, pd3, pd4, pd5, ad2, ad3, ad4, ad5
dummy variables for EQ5D data simulation
- class
original class of the data point
- id
id number of observations to simulate different persons
- y_cens
column y censored at 2 (upper boundary)
...
Source
simulated with true parameter values: Class 1: sigma = XXX, theta = XXX and c(mo2,mo3,mo4,mo5, XXX) = c(XXX) Class 2: sigma = XXX, theta = XXX and c(mo2,mo3,mo4,mo5, XXX) = c(XXX)
simulated_data_mo
Description
simulated_data_mo
Usage
simulated_data_mo
Format
simulated_data_mo
A simulated data frame with 480 rows and 9 columns, following a combination of normal and binomial distribution
- type
type of data
- y
result of the model y = -1 + mo2 + mo3 + mo4 + mo5
- mo2, mo3, mo4, mo5
dummy variables
- class
original class of the data point
- id
id number of observations to simulate different persons
- y_cens
column y censored at 0 (lower boundary)
...
Source
simulated with true parameter values: Class 1: sigma = XXX, theta = XXX and c(mo2,mo3,mo4,mo5) = c(XXX) Class 2: sigma = XXX, theta = XXX and c(mo2,mo3,mo4,mo5) = c(XXX)
simulated_data_norm
Description
simulated_data_norm
Usage
simulated_data_norm
Format
simulated_data_norm
A simulated data frame with 600 rows and 8 columns, following a combination of normal and binomial distribution
- type
type of data
- y
result of the model y = x1 + x2 + x3
- x1, x2, x3
random numbers from rnorm
- class
original class of the data point
- id
id number of observations to simulate different persons
- y_cens
column y censored at 3
...
Source
simulated with true parameter values: Class 1: sigma = 1.0, theta = 5 and c(x1,x2,x3) = c(0.5, -0.3, 0.8) Class 2: sigma = 0.5, theta = 2 and c(x1,x2,x3) = c(1.4, 2.3, -0.2)
construct model summary as in base R
Description
get model parameters of a model generated by hyreg2 or hyreg2_het
Usage
summary_hyreg2(object)
Arguments
object |
|
Value
summary object of bbmle::mle2() model
Author(s)
Svenja Elkenkamp
Examples
formula <- y ~ -1 + x1 + x2 + x3 | id
k <- 1
stv <- setNames(c(0.2,0,1,1,1),c(colnames(simulated_data_norm)[3:5],c("sigma","theta")))
control = list(iter.max = 1000, verbose = 4)
rm(counter)
mod <- hyreg2(formula = formula,
data = simulated_data_norm,
type = simulated_data_norm$type,
stv = stv,
k = k,
type_cont = "TTO",
type_dich = "DCE_A",
opt_method = "L-BFGS-B",
control = control,
latent = "both",
id_col = "id"
)
summary_hyreg2(mod)
creating environment for package internal objects
Description
creating environment for package internal objects
Usage
the
Format
An object of class environment of length 1.