LMMsolver

CRAN RStudio mirror downloads R-CMD-check codecov

The aim of the LMMsolver package is to provide an efficient and flexible system to estimate variance components using restricted maximum likelihood or REML (Patterson and Thompson 1971), for models where the mixed model equations are sparse (Boer 2023). An example of an application is using splines to model spatial (Rodríguez-Álvarez et al. 2018; Boer, Piepho, and Williams 2020) or temporal (Bustos-Korts et al. 2019) trends. Another example is mixed model Quantitative Trait Locus (QTL) analysis for multiparental populations, allowing for heterogeneous residual variance and design matrices with Identity-By-Descent (IBD) probabilities (Li et al. 2021).

Installation

install.packages("LMMsolver")
remotes::install_github("Biometris/LMMsolver", ref = "develop", dependencies = TRUE)

Example

As an example of the functionality of the package we use a model defined in Rodríguez-Álvarez et al. (2015). It uses the USprecip data set in the spam package (Furrer and Sain 2010).

library(LMMsolver)
library(ggplot2)

## Get precipitation data from spam
data(USprecip, package = "spam")

## Only use observed data.
USprecip <- as.data.frame(USprecip)
USprecip <- USprecip[USprecip$infill == 1, ]

A two-dimensional P-spline can be defined with the spl2D() function, with longitude and latitude as covariates. The number of segments chosen here is equal to the number of segments used in Rodríguez-Álvarez et al. (2015).

obj1 <- LMMsolve(fixed = anomaly ~ 1,
                 spline = ~spl2D(x1 = lon, x2 = lat, nseg = c(41, 41)),
                 data = USprecip)

The summary function gives a table with the effective dimensions and the penalty parameters:

summary(obj1)
#> Table with effective dimensions and penalties: 
#> 
#>           Term Effective Model Nominal Ratio Penalty
#>    (Intercept)      1.00     1       1  1.00    0.00
#>  lin(lon, lat)      3.00     3       3  1.00    0.00
#>         s(lon)    302.60  1936    1932  0.16    0.26
#>         s(lat)    409.09  1936    1932  0.21    0.08
#>       residual   5190.31  5906    5902  0.88   13.53
#> 
#>  Total Effective Dimension: 5906

The spatial trend for the precipitation can now be plotted on the map of the USA.

plotDat <- obtainSmoothTrend(obj1, grid = c(200, 300), includeIntercept = TRUE)
plotDat <- sf::st_as_sf(plotDat, coords = c("lon", "lat"))
usa <- sf::st_as_sf(maps::map("usa", regions = "main", plot = FALSE))
sf::st_crs(usa) <- sf::st_crs(plotDat)
intersection <- sf::st_intersects(plotDat, usa)
plotDat <- plotDat[!is.na(as.numeric(intersection)), ]

ggplot(usa) + 
  geom_sf(color = NA) +
  geom_tile(data = plotDat, 
            mapping = aes(geometry = geometry, fill = ypred), 
            linewidth = 0,
            stat = "sf_coordinates") +
  scale_fill_gradientn(colors = topo.colors(100))+
  labs(title = "Precipitation (anomaly)", 
       x = "Longitude", y = "Latitude") +
  coord_sf() +
  theme(panel.grid = element_blank())

Further examples can be found in the vignette.

vignette("Solving_Linear_Mixed_Models")

References

Boer, Martin P. 2023. “Tensor Product P-Splines Using a Sparse Mixed Model Formulation.” Statistical Modelling 23 (5-6): 465–79. https://doi.org/https://doi.org/10.1177/1471082X231178591.

Boer, Martin P., Hans-Peter Piepho, and Emlyn R. Williams. 2020. “Linear Variance, P-splines and Neighbour Differences for Spatial Adjustment in Field Trials: How are they Related?J. Agric. Biol. Environ. Stat. 25 (4): 676–98. https://doi.org/10.1007/S13253-020-00412-4.

Bustos-Korts, Daniela, Martin P. Boer, Marcos Malosetti, Scott Chapman, Karine Chenu, Bangyou Zheng, and Fred A. van Eeuwijk. 2019. “Combining Crop Growth Modeling and Statistical Genetic Modeling to Evaluate Phenotyping Strategies.” Front. Plant Sci. 10 (November). https://doi.org/10.3389/fpls.2019.01491.

Furrer, R, and SR Sain. 2010. “spam: A sparse matrix R package with emphasis on MCMC methods for Gaussian Markov random fields.” J. Stat. Softw. https://core.ac.uk/download/pdf/6340272.pdf.

Li, Wenhao, Martin P. Boer, Chaozhi Zheng, Ronny V. L. Joosen, and Fred A. van Eeuwijk. 2021. “An IBD-based mixed model approach for QTL mapping in multiparental populations.” Theor. Appl. Genet. 2021 1 (August): 1–18. https://doi.org/10.1007/S00122-021-03919-7.

Patterson, HD, and R Thompson. 1971. “Recovery of inter-block information when block sizes are unequal.” Biometrika. https://doi.org/10.1093/biomet/58.3.545.

Rodríguez-Álvarez, María Xosé, Martin P. Boer, Fred A. van Eeuwijk, and Paul H. C. Eilers. 2018. “Correcting for spatial heterogeneity in plant breeding experiments with P-splines.” Spat. Stat. 23 (March): 52–71. https://doi.org/10.1016/J.SPASTA.2017.10.003.

Rodríguez-Álvarez, María Xosé, Dae Jin Lee, Thomas Kneib, María Durbán, and Paul Eilers. 2015. “Fast smoothing parameter separation in multidimensional generalized P-splines: the SAP algorithm.” Stat. Comput. 25 (5): 941–57. https://doi.org/10.1007/S11222-014-9464-2.