Fixed a long-standing issue in the internal
augment() function that affected ordered factors
(#713).
Previously, augment() would:
The old behavior could degrade imputation quality for ordinal
outcomes when using the "polr" method, potentially causing
model convergence issues or increased noise in imputations.
The issue did not affect methods for unordered factors
("logreg", "polyreg",
"mnar.logreg"), where level order is inconsequential.
Thanks to @mmansolf for identifying the problem and
suggesting a fix. The updated augment() now correctly
preserves the ordered class and level order of factor
variables.
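Why the ordered class matters can be illustrated in base R. The sketch below is not the code of augment(); it uses a hypothetical factor `f` to show how rebuilding a factor from its labels drops the ordered class and falls back to alphabetical level order, and how passing both explicitly avoids that.

```r
# Hypothetical ordinal variable with a meaningful level order
f <- factor(c("low", "high", "medium", "low"),
            levels = c("low", "medium", "high"), ordered = TRUE)

# Rebuilding from the labels loses the ordered class and reorders the levels
g <- factor(as.character(f))
is.ordered(g)
levels(g)

# Passing levels and ordered status explicitly preserves both
h <- factor(as.character(f), levels = levels(f), ordered = is.ordered(f))
identical(h, f)
```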
When passive methods are used without a user-specified
visitSequence, mice now automatically moves all passive
variables to the end of the visitSequence. This change in
behavior ensures greater consistency between passive variables and
their predictors at the end of each iteration.
The new behavior works well for simple cases. However, for more
complex situations — especially when passive variables depend on other
passive variables — it is recommended to manually specify a
visitSequence that updates each passive variable
immediately after one of its right-hand side predictors changes.
(#699)
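A manual visitSequence for a passive variable might look like the sketch below. It assumes the boys data shipped with mice and the usual passive formula for bmi; the particular ordering in `vis` is one illustrative choice, not a recommendation from the package authors.

```r
library(mice)

# Passive imputation of bmi from hgt (cm) and wgt (kg) in the boys data
meth <- make.method(boys)
meth["bmi"] <- "~ I(wgt / (hgt / 100)^2)"

# Prevent bmi from feeding back into the models for hgt and wgt
pred <- make.predictorMatrix(boys)
pred[c("hgt", "wgt"), "bmi"] <- 0

# Visit bmi directly after each of its right-hand side variables is imputed
vis <- c("hgt", "bmi", "wgt", "bmi", "hc", "gen", "phb", "tv", "reg")

imp <- mice(boys, method = meth, predictorMatrix = pred,
            visitSequence = vis, m = 2, maxit = 2,
            seed = 1, printFlag = FALSE)
imp$visitSequence
```

Because the sequence is supplied by the user, mice leaves the position of bmi as specified.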
Adds the calltype argument to
mice() for mixing predictorMatrix and
formulas specifications per variable-block. The
calltype argument allows the user to specify some variables
(or blocks of variables) by the formulas argument, and
other variables by the predictorMatrix argument. (Note: This
argument was called modeltype in version 3.17.1).
calltype is a character vector of
length(blocks) elements that indicates how the imputation
model is specified. Entries can be one of two values: "pred"
or "formula". If calltype = "pred", the
predictors of the imputation model for the block are specified by the
corresponding row of the predictorMatrix. If
calltype = "formula", the imputation model is specified by
the relevant entry in formulas. The default depends on the
presence of the formulas argument. If formulas
is present, then mice() sets
calltype = "formula" for any block for which a
formula is specified. Otherwise,
calltype = "pred".
Introduces an optimized matchindex C++
function to improve speed of predictive mean matching
(#695)
dawidd6/action-download-artifact@v9
pool.r.squared() (#700)
lasso.select.norm() and lasso.norm() into one file test-mice.impute.lasso.norm.R
lasso.select.logreg() and lasso.logreg() into one file test-mice.impute.lasso.logreg.R
mice 3.17.0 - with the dfcom argument of pool(..., dfcom = .., ) (#689, #706, #707)
method and formulas (#698)
Imputing categorical data by predictive mean
matching. Predictive mean matching (PMM) is the default method
of mice() for imputing numerical variables, but it has long
been possible to impute factors. This enhancement introduces better
support to work with categorical variables in PMM. The former
system translated factors into integers by
ynum <- as.integer(f). However, the order of integers in
ynum may have no sensible interpretation for an unordered
factor. The new system quantifies ynum and
could yield better results because of higher \(R^2\). The method calculates the canonical
correlation between y (as dummy matrix) and a linear
combination of imputation model predictors x. The algorithm
then replaces each category of y by a single number taken
from the first canonical variate. After this step, the imputation model
is fitted, and the predicted values from that model are extracted to
function as the similarity measure for the matching step.
The method works for both ordered and unordered factors. No
special precautions are taken to ensure monotonicity between the
category numbers and the quantifications, so the method should be able
to preserve quadratic and other non-monotone relations of the predicted
metric. It may be beneficial to remove very sparsely filled categories,
for which there is a new trim argument. All you have to do to use
the new technique is to specify
mice(..., method = "pmm", ...). Both numerical and
categorical variables will then be imputed by PMM.
Potential advantages are:
Note that we still lack solid evidence for these claims. (#576). Contributed @stefvanbuuren
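The quantification step described above can be sketched in base R with stats::cancor(). This is an illustration of the idea only, not the implementation used inside mice; the iris data and the variable names `ydummy`, `yscore` and `quant` are assumptions made for the example.

```r
# Unordered factor outcome and numeric predictors (built-in iris data)
y <- iris$Species
x <- as.matrix(iris[, c("Sepal.Length", "Sepal.Width",
                        "Petal.Length", "Petal.Width")])

# Dummy-code y, dropping the reference category to avoid collinearity
ydummy <- model.matrix(~ y)[, -1, drop = FALSE]

# Canonical correlation between the predictors and the dummy matrix
cc <- cancor(x, ydummy)

# Scores on the first canonical variate of y: one number per category
yscore <- drop(sweep(ydummy, 2, cc$ycenter) %*% cc$ycoef[, 1])
quant <- tapply(yscore, y, mean)
quant

# Quantified outcome that a numeric imputation model could work with
ynum <- as.numeric(quant[as.character(y)])
```

Each category receives a single number, and nothing forces these numbers to follow the original category order.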
New system-independent method for pooling: This
version introduces a new function pool.table() that takes a
tidy table of parameter estimates stemming from m repeated
analyses. The input data must consist of three columns (parameter name,
estimate, standard error) and a specification of the degrees of freedom
of the model fitted to the complete data. The pool.table()
function outputs 14 pooled statistics in a tidy form. The primary use of
pool.table() is to support parameter pooling for techniques
that have no tidy() or glance() methods,
either within R or outside R. The
pool.table() function also allows for novel workflows
that 1) break apart the traditional pool() function into a
data-wrangling part and a parameters-reducing part, and 2) do not
necessarily depend on classed R objects. (#574). Contributed @stefvanbuuren
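The parameters-reducing part can be sketched in base R with Rubin's rules. The helper `pool_rubin()`, the column names `term`, `estimate` and `std.error`, and the numbers are all hypothetical; the sketch does not call pool.table() and omits the degrees of freedom and the other pooled statistics that function reports.

```r
# Tidy table of estimates from m = 3 repeated analyses (made-up numbers)
tab <- data.frame(
  term      = rep(c("(Intercept)", "x"), times = 3),
  estimate  = c(1.0, 0.50, 1.2, 0.40, 1.1, 0.60),
  std.error = c(0.20, 0.10, 0.20, 0.10, 0.20, 0.10)
)

# Reduce the m estimates per parameter with Rubin's rules
pool_rubin <- function(tab) {
  do.call(rbind, lapply(split(tab, tab$term), function(d) {
    m    <- nrow(d)
    qbar <- mean(d$estimate)        # pooled estimate
    ubar <- mean(d$std.error^2)     # within-imputation variance
    b    <- var(d$estimate)         # between-imputation variance
    t    <- ubar + (1 + 1 / m) * b  # total variance
    data.frame(term = d$term[1], m = m, qbar = qbar,
               ubar = ubar, b = b, t = t, se = sqrt(t))
  }))
}

pooled <- pool_rubin(tab)
pooled
```

Since the input is a plain data frame, the estimates could equally well come from software outside R.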
literanger: Adds support for the
literanger package for rf imputation that is
about twice as fast as ranger (#648). Thanks @stephematician for
the contribution.
The complete(..., action = "long", ...) command puts
the columns named ".imp" and ".id" in the last
two positions of the long data (instead of first two positions). In this
way, the columns of the imputed data will have the same positions as in
the original data, which is more user-friendly and easier to work with.
Note that any existing code that assumes that variables
".imp" and ".id" are in columns 1 and 2 will
need to be modified. The advice is to modify the code using the variable
names ".imp" and ".id". If you want the old
behaviour, specify the argument order = "first". (#569).
Contributed @stefvanbuuren
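Code that selects the bookkeeping columns by name works under either column order. A minimal sketch, assuming the nhanes data shipped with mice:

```r
library(mice)

imp <- mice(nhanes, m = 3, maxit = 1, seed = 1, printFlag = FALSE)
long <- complete(imp, action = "long")

# Fragile: depends on .imp and .id being in columns 1 and 2
# first_imp <- long[long[, 1] == 1, -(1:2)]

# Robust: select by name, independent of column position
first_imp <- long[long$.imp == 1, setdiff(names(long), c(".imp", ".id"))]
head(first_imp)
```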
Drops support for S4. Convert S4-related code to S3. Syntax
as(df, "mids") is deprecated. Use as.mids(df)
instead.
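A conversion with as.mids() might look as follows. The sketch assumes the nhanes data and a long-format data frame that includes the original data, as produced by complete(..., include = TRUE).

```r
library(mice)

imp <- mice(nhanes, m = 2, maxit = 1, seed = 1, printFlag = FALSE)

# Long format including the original data (.imp == 0)
long <- complete(imp, action = "long", include = TRUE)

# S3 conversion back to a mids object
imp2 <- as.mids(long)
class(imp2)
```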
dots argument to ranger::ranger(...) in mice.impute.rf() (#563). Contributed @edbonneville
blocks argument at various places
blocks in initialize_chain()
rbind(), when formulas are concatenated and duplicate names are found, also rename the duplicated variables in formulas by their new name
NEWS.md formatting to get correct version sequence on CRAN and in-package NEWS
make.method() in a more efficient way (resolves #672)
as.mids() from filling the imp object for complete variables
mids, mads, mira and mipo objects
complete() that auto-repeated imputed values into cells that should NOT be imputed (occurred as a special case of rbind(), where the first set of rows was imputed and the second was not).
type by the more informative pred (currently active row of predictorMatrix)
filter.mids() that incorrectly removed empty components in the imp object
ibind() that incorrectly used length(blocks) as the first dimension of the chainMean and chainVar objects
visitSequence, chainMean and chainVar components of the mids object
minpuc argument in quickpred() (#634)
coef() not available on S4 object when using with lavaan (#615, #616)
.github/dependabot.yml configuration to automate daily check (#598)
roxygen2 7.3.1 requirements
Rprofile prints to stdout on Fedora, R version 4.1.3 (#646, #647). Thanks @brookslogan for the fix.
methods and rlang from Depends
ampute() helpers
\link statements that do not pass CRAN checks
Expands futuremice() functionality by allowing for
external packages and user-written functions (#550). Contributed @thomvolker
Adds GH issue templates bug_report,
feature_request and help_wanted (#560).
Contributed @hanneoberman
rbind.mids() and cbind.mids() to conform to CRAN policy
mitml and glmnet to imports so that test code conforms to _R_CHECK_DEPENDS_ONLY=true flag in R CMD check
futuremice() if there is no .Random.seed yet.
predictorMatrix for case F by adding a predictorMatrix argument to make.predictorMatrix()
mice.impute.mpmm() example code
mice.impute.2lonly.pmm() (#555)
tidy(), update(), format() and sum()
R CMD check with _R_CHECK_DEPENDS_ONLY=true
futuremice() that throws an error when the number of cores is not specified, but the number of available cores is greater than the number of imputations.
mice.impute.mpmm() that changed the column order of the data
Adds a function futuremice() with support for
parallel imputation using the future package (#504).
Contributed @thomvolker, @gerkovink
Adds multivariate predictive mean matching
mice.impute.mpmm(). (#460). Contributed @Mingyang-Cai
Adds convergence() for convergence evaluation
(#484). Contributed @hanneoberman
Reverts the internal seed behaviour back to
mice 3.13.10 (#515). #432 introduced new local seed in
response to #426. However, various issues arose with this facility
(#459, #492, #502, #505). This version restores the old behaviour using
global .Random.seed. Contributed @gerkovink
Adds a custom.t argument to pool() that
allows the advanced user to specify a custom rule for calculating the
total variance \(T\). Contributed @gerkovink
Adds new argument exclude to
mice.impute.pmm() that excludes a user-specified vector of
values from matching. Excluded values will not appear in the
imputations. Since the observed values are not imputed, the
user-specified values are still being used to fit the imputation model
(#392, #519). Contributed @gerkovink
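The effect of exclude can be checked with a direct call to the imputation function. This is a sketch on simulated data; it assumes that mice.impute.pmm() may be called directly with the arguments y, ry and x, and the excluded values 9, 10 and 11 are arbitrary choices for illustration.

```r
library(mice)

set.seed(123)
n  <- 200
x  <- matrix(rnorm(n), ncol = 1, dimnames = list(NULL, "x"))
y  <- round(10 + 2 * x[, 1] + rnorm(n))

# Response indicator: TRUE = observed, FALSE = to be imputed
ry <- runif(n) > 0.3
y[!ry] <- NA

# Values that must never be used as imputations (hypothetical choice)
excluded <- c(9, 10, 11)

imps <- mice.impute.pmm(y, ry, x, exclude = excluded)
any(imps %in% excluded)
```

In a regular analysis the argument would typically be passed through the mice() call rather than to the imputation function itself.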
.R and .Rmd files
sampler.R (#511)
inherits() to check on class membership
parlmice()
prop, patterns and weights matrices for pattern with only 1’s
D1() and D2() (#420)
mice()
make.where()
test-mice.impute.rf.R (#448)
.Random.seed reads from the .GlobalEnv by get(".Random.seed", envir = globalenv(), mode = "integer", inherits = FALSE)
lastSeedValue variable name
x$lastSeedValue problem in cbind.mids() (#502)
ampute()
mice() by smarter random seed initialisation (#459)
drop = FALSE buglet in mice.impute.rf() (#447, #448)
withr package should have version 2.4.0 (published in January 2021) or higher. Versions withr 2.3.0 and before may give Error: object 'local_seed' is not exported by 'namespace:withr'. Either update manually, or install the patched version mice 3.14.1 from GitHub. (#445). NOTE: withr is no longer needed in mice 3.15.0
Adds four new univariate functions using the lasso for automatic variable selection. Contributed by @EdoardoCostantini (#438).
mice.impute.lasso.norm() for lasso linear regression
mice.impute.lasso.logreg() for lasso logistic regression
mice.impute.lasso.select.norm() for lasso selector + linear regression
mice.impute.lasso.select.logreg() for lasso selector + logistic regression
Adds Jamshidian and Jalal’s non-parametric MCAR test,
mice::MCAR() and associated plot method. Contributed by
@cjvanlissa
(#423).
Adds two new functions pool.syn() and
pool.scalar.syn() that specialise pooling estimates from
synthetic data. The "reiter2003" pooling rule assumes that
synthetic data were created from complete data. Thanks Thom Volker
(#436).
By default, mice.impute.rf() now uses the faster
ranger package as back-end instead of
randomForest package. If you want the old behaviour specify
the rfPackage = "randomForest" argument to the
mice(...) call. Contributed @prockenschaub (#431).
.Random.seed (#426, #432) by implementing withr::local_preserve_seed() and withr::local_seed(). This change provides stabler behavior in complex scripts. The change does not appear to break reproducibility when mice() was run with a seed. Nevertheless, if you run into a reproducibility problem, install mice 3.13.12 or before.
mice.impute.quadratic(), adds a parameter quad.outcome containing the name of the outcome variable in the complete-data model. Contributed @Mingyang-Cai, @gerkovink (#408)
pool() so that it processes the parameters from all gamlss sub-models. Thanks Marcio Augusto Diniz (#406, #405)
pool() can extract robust.se from the object returned by broom::tidy() (#310)
pool() cannot take a mids object (#433)
mice.impute.2l.lmer() to indicate a problem in fitting the imputation model (#385)
post parameter (#326)
install.on.demand() broke the standard CRAN workflow. mice 3.14.0 does not call install.on.demand() anymore for recommended packages. Also, install.on.demand() will not run anymore in non-interactive mode.
mice:::barnard.rubin() function for infinite dfcom. Thanks @huftis (#441).
Xi <- as.matrix(...) in mice.impute.2l.lmer() that occurred when a cluster contains only one observation (#384)
predictorMatrix to a monotone pattern if visitSequence = "monotone" and maxit = 1 (#316)
md.pattern() (#318, #323)
make.formulas() (#305, #324)
newdata in mice.mids() (#313, #325)
where element created in rbind() (#319)
mids2spss() replaces the foreign by haven package. Contributed Gerko Vink (#291)
tests\testhat\test-D1.R that failed on mitml 0.4-0
with.mids() function to old version because the change in commit 4634094 broke downstream package metafor (#292)
mice.impute.rf() in finding candidate donors (#288, #289)
Much faster predictive mean matching. The new
matchindex C function makes predictive mean matching
50 to 600 times faster. The speed of pmm
is now on par with normal imputation (mice.impute.norm())
and with the miceFast package, without compromising on the
statistical quality of the imputations. Thanks to Polkas https://github.com/Polkas/miceFast/issues/10 and
suggestions by Alexander Robitzsch. See #236 for more details.
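For orientation, the matching step that matchindex accelerates can be written naively in base R. The function `pmm_match()` below is a hypothetical illustration, not the matchindex algorithm: it sorts all donors for every target, and it skips the random draw of the regression parameters that a proper imputation would include.

```r
# Naive predictive mean matching: for each target, sample one of the
# k donors whose predicted value is closest to the target's prediction
pmm_match <- function(yhat_obs, yhat_mis, y_obs, k = 5L) {
  vapply(yhat_mis, function(target) {
    nearest <- order(abs(yhat_obs - target))[seq_len(k)]
    y_obs[nearest[sample.int(k, 1L)]]
  }, numeric(1))
}

set.seed(1)
x    <- rnorm(100)
y    <- 2 * x + rnorm(100)
mis  <- seq_len(100) <= 20           # first 20 values treated as missing
fit  <- lm(y ~ x, subset = !mis)
yhat <- predict(fit, newdata = data.frame(x = x))

# Imputations are always observed values taken from the donors
imps <- pmm_match(yhat[!mis], yhat[mis], y[!mis])
```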
New ignore argument to
mice(). This argument is a logical vector of
nrow(data) elements indicating which rows are ignored when
creating the imputation model. We may use the ignore
argument to split the data into a training set (on which the imputation
model is built) and a test set (that does not influence the imputation
model estimates). The argument is based on the suggestion in https://github.com/amices/mice/issues/32#issuecomment-355600365.
See #32 for more background and techniques. Crafted by Patrick
Rockenschaub
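A train/test split with ignore could be set up as in the sketch below, which assumes the 25-row nhanes data and arbitrarily treats the last five rows as the test set.

```r
library(mice)

# TRUE marks rows that are imputed but do not contribute to the
# estimates of the imputation model
ign <- c(rep(FALSE, 20), rep(TRUE, 5))

imp <- mice(nhanes, ignore = ign, m = 2, maxit = 2,
            seed = 1, printFlag = FALSE)
nrow(complete(imp))
```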
New filter() function for mids
objects. New filter() method that subsets a
mids object (multiply-imputed data set). The method accepts
a logical vector of length nrow(data), or an expression to
construct such a vector from the incomplete data. (#269). Crafted by
Patrick Rockenschaub.
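Both ways of specifying the subset are sketched below, assuming the nhanes data. The call goes through the dplyr generic, which dispatches to the mids method once mice is loaded.

```r
library(mice)

imp <- mice(nhanes, m = 2, maxit = 1, seed = 1, printFlag = FALSE)

# Subset by an external logical vector of length nrow(data)
imp_a <- dplyr::filter(imp, c(rep(TRUE, 13), rep(FALSE, 12)))

# Subset by an expression evaluated on the incomplete data
imp_b <- dplyr::filter(imp, age >= 2)

nrow(complete(imp_a))
nrow(complete(imp_b))
```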
Breaking change: The matcher
algorithm in pmm has changed to matchindex for
speed improvements. If you want the old behavior, specify
mice(..., use.matcher = TRUE).
cpp11 package (#286)
with.mids() by calling eval_tidy() on a quosure. Does not yet solve #265.
pool() and pool.scalar() (#142, #106, #190 and others)
tidy.mipo more flexible (#276)
nelsonaalen() gets a tibble (#272)
NAs can appear in the imputed data (#267)
quickpred() documentation (#268)
sum.scores()
lm.mids(), glm.mids(), pool.compare()
.pmm.match() and expandcov()
return() calls placed just before end-of-function
printFlag value (#258)
amices
df.residual, which caused problematic behavior in the D1(), D2(), D3(), anova() and pool(). mice now extracts the relevant information from other parts of the objects returned by survival::coxph(), which solves long-standing issues with the integration of the Cox model (#246).
Rcpp dependency to work with tidyr 1.1.1 (#248).
Non-file package-anchored link(s) in documentation object.
ampute documentation (#251).
suggests.
tidy.mipo() and glance.mipo() return standardized output that conforms to broom specifications. Kindly contributed by Vincent Arel Bundock (#240).
D3 testing script that produced an error on CRAN (#244).
The D3() function in mice gave
incorrect results. This version solves a problem in the calculation of
the D3-statistic. See #226 and #228 for more details. The
documentation explains why results from mice::D3() and
mitml::testModels() may differ.
The pool() function is now more forgiving when there
is no glance() function (#233)
It is possible to bypass remove.lindep() by setting
eps = 0 (#225)
plot.mids() documentation
This version adds two new NARFCS methods for imputing data under
the Missing Not at Random (MNAR) assumption. NARFCS is a
generalised version of the so-called \(\delta\)-adjustment method. Margarita
Moreno-Betancur and Ian White kindly contributed the functions
mice.impute.mnar.norm() and
mice.impute.mnar.logreg(). These functions aid in
performing sensitivity analysis to investigate the impact of different
MNAR assumptions on the conclusion of the study. An alternative for MNAR
is the older mice.impute.ri() function.
Installation of mice is faster. External packages
needed for imputation and analyses are now installed on demand. The
number of dependencies as estimated by
rsconnect::appDependencies() decreased from 132 to
83.
The name clash with the complete() function of
tidyr should no longer be a problem.
There is now a more flexible pool() function that
integrates better with the broom and
broom.mixed packages.
pool.compare(). Use D1() instead (#220)
utils::globalVariables()
tidyr by defining complete.mids() as an S3 method for the tidyr::complete() generic (#212)
pool() function to deal with multiple sets of parameters. Currently supported keywords are: term (all broom functions), component (some broom.mixed functions) and y.values (for multinom() model) (#219)
install.on.demand() function for lighter installation
toenail2 and remove dependency on HSAUR3
ampute in extreme cases (#216)
pool with mgcv::gam (#218)
.gitattributes for consistent line endings
polr() always fail (#206)
data.frame (#208)
mira-class documentation (#207)
CALIBERrfimpute
2lonly.norm and 2lonly.pmm
a2 to elementwise division by a matrix of observations
2lonly.norm and 2lonly.pmm
2lonly.pmm
2lonly.mean now also works with factors
imputationMethod argument in examples by method
check.predictorMatrix() (#191)
toenail data from orphaned DPpackage package
DPpackage from Suggests field in DESCRIPTION
md.pattern() (#170, #177)
as.mids() (#173)
mice.impute.xxx() so that mice::mice() works as expected (#55)
mids2spss(), thanks Edgar Schoreit (#149)
predictorMatrix. mice 3.3.1 will impute those variables using the intercept only
nelsonaalen() function for data where variables time or status have already been defined (#140), thanks matthieu-faron
mice 3.0.0 - mice 3.2.0 under passive imputation.
broom 0.5.0 (#128)
mice.impute.2l.norm() (#129)
mice.impute.2l.norm() (#129)
D1() (#128)
md.pattern (#126)
rbind and cbind (#114)
rbind problem when method is a list (#113)
parlmice (#109)
dfcom argument to pool() (#105, #110)
parlmice + bugfix (#107)
parlmice (#104)
flux (#102)
estimice (#101)
parent.frame (#98)
NEWS.md, index.Rmd and online package documentation
.R instead of .r
updateLog (#8, @alexanderrobitzsch)
md.pattern (#90)
m (#89)
Version 3.0 represents a major update that implements the following features:
blocks: The main algorithm iterates over blocks. A
block is simply a collection of variables. In the common MICE algorithm
each block was equivalent to one variable, which - of course - is the
default. The blocks argument allows mixing univariate
imputation methods with multivariate imputation methods. The
blocks feature bridges two seemingly disparate approaches,
joint modeling and fully conditional specification, into one
framework;
where: The where argument is a logical
matrix of the same size of data that specifies which cells
should be imputed. This opens up some new analytic
possibilities;
Multivariate tests: There are new functions D1(),
D2(), D3() and anova() that
perform multivariate parameter tests on the repeated analyses from
multiply-imputed data;
formulas: The old form argument has
been redesigned and is now renamed to formulas. This provides
an alternative way to specify imputation models that exploits the full
power of R’s native formulas.
Better integration with the tidyverse framework,
especially for packages dplyr, tibble and
broom;
Improved numerical algorithms for low-level imputation functions. Better handling of duplicate variables.
Last but not least: A brand new edition AND online version of Flexible Imputation of Missing Data. Second Edition.
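As an illustration of the where argument from the feature list above, the sketch below asks mice to impute every cell, observed or not, which is one route to fully synthetic data. It assumes the nhanes data and the make.where() helper with the keyword "all".

```r
library(mice)

# Logical matrix with the same dimensions as the data; TRUE = impute
where <- make.where(nhanes, "all")

imp <- mice(nhanes, where = where, m = 2, maxit = 2,
            seed = 1, printFlag = FALSE)

# Every variable now has one imputation per row of the data
sapply(imp$imp, nrow)
```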
mids object in mice (thanks stephematician) (#61)
rbind.mids (thanks stephematician) (#59)
pool.compare() in handling factors (#60)
rbind.mids in handling where (#59)
as.mids(), add as()
cart not accepting a matrix (thanks Joerg Drechsler)
pool() to list of models
ampute function and vignettes (Rianne Schouten)
mice.impute.2l.sys to mice.impute.2l.lmer
where argument to mice
wy argument to imputation functions
mice.impute.2l.sys(), author Shahab Jolani
cbind() function
mids object
lattice package
xyplot.mads
mice.impute.2lonly.pmm()
ampute() by Rianne Schouten
mice function (thanks Ben Ogorek)
cbind.mids() replaced by calls to cbind()
miceVignettes on github (thanks Gerko Vink)
README for GitHub
ccn –> ncc, icn –> nic
cc(), ncc(), cci(), ic(), nic() and ici() use S3 dispatch
multinom MaxNWts type fix in polyreg and polr #9
pool.compare #12
as.mids if names not same as all columns #11
glmer models #5
midastouch: predictive mean matching for small samples (thanks Philip Gaffert, Florian Meinfelder)
rpart call
ridge to 2l.norm()
.o files
as.mids() bug that crashed miceadds::mice.1chain()
impute.polyreg() bug that bombed if there were no predictors (thanks Jan Graffelman)
as.mids() bug that gave incorrect \(m\) (several users)
pool.compare() error for lmer object (thanks Claudio Bustos)
mice.impute.2l.norm() if just one NA (thanks Jeroen Hoogland)
pool.scalar() now can do Barnard-Rubin adjustment
pool() now handles class lmerMod from the lme4 package
.pmm.match() for safety
mice.impute.pmm() for increased visibility
mice.impute.rf() from 100 to 10 (thanks Anoop Shah)
long2mids() deprecated. Use as.mids() instead
lattice back into DEPENDS to find generic xyplot() and friends
2lonly.pmm (thanks Alexander Robitzsch, Gerko Vink, Judith Godin)
as.mids() (thanks Tommy Nyberg, Gerko Vink)
mdc() in example mice.impute.quadratic()
mice.impute.rf() if just one NA (thanks Anoop Shah)
summary.mipo() when names(x$qbar) equals NULL (thanks Aiko Kuhn)
ncol() in mice.impute.2lonly.mean()