This vignette describes how to perform F-informed
multidimensional scaling using the FinfoMDS package.
FinfoMDS was developed by Soobin Kim (sbbkim@ucdavis.edu).
A proposal of the method and its full description can be found at:
This vignette was updated in August 2025.
Multidimensional scaling (MDS) is a dimensionality reduction technique used in microbial ecology to represent multivariate structure while preserving pairwise distances between samples. Although refinements of MDS have improved its ability to reveal patterns among sample groups, these MDS-based methods often require prior assumptions for inference, which limits their broader application in general microbiome analysis.
Here, we introduce a new MDS-based ordination,
F-informed MDS (implemented in the R package
FinfoMDS), which configures the data distribution based on the
F-statistic: the ratio of the dispersion between groups with different
labels to the dispersion within groups sharing a common label. Our
approach offers a well-founded refinement of MDS whose representations
align with statistical test results, which can benefit broader
compositional data analyses in microbiology and ecology.
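To make the F-statistic concrete, here is a minimal base-R sketch of the PERMANOVA-style pseudo-F computed directly from a distance matrix: the between-group sum of squares over a - 1 degrees of freedom, divided by the within-group sum of squares over N - a degrees of freedom, where N is the number of samples and a the number of groups. The function name `pseudo_f` is hypothetical. We assume, without confirmation from the package, that this is the kind of statistic F-informed MDS targets; the internal computation in FinfoMDS may differ.

```r
# Hypothetical helper: PERMANOVA-style pseudo-F statistic from a distance matrix.
#   SS_total   = (1 / N) * sum of squared distances over all pairs i < j
#   SS_within  = sum over groups g of (1 / n_g) * sum of squared distances within g
#   SS_between = SS_total - SS_within
#   F = (SS_between / (a - 1)) / (SS_within / (N - a))
# Illustrative sketch only, not the FinfoMDS implementation.
pseudo_f <- function(D, y) {
  D2 <- as.matrix(D)^2              # squared pairwise distances (accepts "dist" objects)
  y <- as.factor(y)
  N <- length(y)
  a <- nlevels(y)
  ss_total <- sum(D2[upper.tri(D2)]) / N
  ss_within <- 0
  for (g in levels(y)) {
    idx <- which(y == g)
    Dg <- D2[idx, idx, drop = FALSE]
    ss_within <- ss_within + sum(Dg[upper.tri(Dg)]) / length(idx)
  }
  ss_between <- ss_total - ss_within
  (ss_between / (a - 1)) / (ss_within / (N - a))
}

# Toy example: two well-separated groups on a line
x <- c(0, 1, 10, 11)
labels <- c("A", "A", "B", "B")
pseudo_f(dist(x), labels)   # 200
```

A large value means samples with different labels lie far apart relative to the spread among samples that share a label.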
To install the official release version of this package, start R (version “4.5”) and enter:
For older R versions, please refer to the appropriate Bioconductor release.
As introduced earlier, F-informed MDS is a microbiome data
visualization tool designed to reflect statistical significance. Its
output, a 2D representation, is obtained by iterating an optimization
algorithm over successive epochs. The procedure is summarized in the
diagram below and is implemented in the fmds()
function.
For the fmds() function to operate, the following arguments
are required: 1) the labels y, and 2) either a distance matrix
D or a design matrix X (if both are provided,
D takes precedence). The following optional arguments can also
be adjusted according to user preference: 1) the
hyperparameter lambda, 2) the initial representation
z0, 3) the maximum number of epochs nit, and 4) the
stopping-criterion threshold threshold_p. Upon completion,
fmds() returns a two-column matrix containing the 2D
coordinates of the samples represented by the distance matrix D.
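The sketch below sets every optional argument explicitly. It uses a small subset of the built-in iris data as a stand-in for a microbiome dataset and initialises z0 with classical MDS (cmdscale()). Both choices, as well as the assumption that z0 should be an N x 2 matrix of starting coordinates, are ours and are not taken from the package documentation. The fmds() call only runs if FinfoMDS is installed.

```r
# Toy data standing in for a microbiome dataset (illustrative assumption):
# 10 samples from each of the three iris species.
idx <- c(1:10, 51:60, 101:110)
D <- dist(iris[idx, 1:4])               # Euclidean distance matrix ("dist" object)
y <- droplevels(iris$Species[idx])      # group labels, one per sample

# Assumed form of z0: an N x 2 matrix of starting coordinates,
# here taken from classical (metric) MDS.
z0 <- cmdscale(D, k = 2)

if (requireNamespace("FinfoMDS", quietly = TRUE)) {
  result <- FinfoMDS::fmds(D = D, y = y,
                           lambda = 0.3,        # hyperparameter, must not exceed 1
                           z0 = z0,             # initial representation (assumed N x 2)
                           nit = 100,           # maximum number of epochs
                           threshold_p = 0.05)  # stopping criterion
  print(dim(result))                      # expected: one row per sample, two columns
}
```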
As an example, we use an algal-associated bacterial community (Kim et
al., 2022). First, load a phyloseq-class object by
typing:
Next, compute the weighted UniFrac distance from this dataset and obtain its label set:
require(phyloseq)
#> Loading required package: phyloseq
D <- distance(microbiome, method = 'wunifrac') # requires phyloseq package
y <- sample_data(microbiome)$Treatment
Then, compute the F-informed MDS by running:
result <- fmds(D = D, y = y, lambda = 0.3, threshold_p = 0.05)
#> epoch 0 lambda 0.3 total 0.10 mds 0.02 conf 0.24 p_z 0.505 p_0 0.082
#> epoch 1 lambda 0.3 total 0.14 mds 0.01 conf 0.43 p_z 0.389 p_0 0.082
#> epoch 2 lambda 0.3 total 0.10 mds 0.01 conf 0.30 p_z 0.264 p_0 0.082
#> epoch 3 lambda 0.3 total 0.06 mds 0.01 conf 0.17 p_z 0.161 p_0 0.082
#> epoch 4 lambda 0.3 total 0.01 mds 0.01 conf 0.00 p_z 0.098 p_0 0.082
#> Lambda 0.30 ...halt iteration
This procedure iterates until the 2D representation converges with its
p-value (p_z in the log) within threshold_p of the p-value of the
original distance matrix (p_0), or until the default maximum of 100
epochs (nit) is reached, whichever occurs first. We have observed that
setting lambda between 0.3 and 0.5 typically yields the best results;
however, this hyperparameter can be set to any value not exceeding 1.
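The stopping rule can be checked against the log above. The hypothetical helper first_halt_epoch() encodes our reading of it: iteration stops at the first epoch whose p_z lies within threshold_p of p_0. This is an interpretation of the printed output, not code from FinfoMDS, and it ignores the convergence condition on the 2D distributions.

```r
# Hypothetical helper reflecting our reading of the stopping rule: return the
# first epoch (0-based, as in the log) at which |p_z - p_0| <= threshold_p,
# or NA if no listed epoch satisfies it. Not part of FinfoMDS.
first_halt_epoch <- function(p_z, p_0, threshold_p = 0.05) {
  hit <- which(abs(p_z - p_0) <= threshold_p)
  if (length(hit) == 0) NA_integer_ else hit[1] - 1L
}

# p-values copied from the fmds() log above (epochs 0 to 4)
p_z <- c(0.505, 0.389, 0.264, 0.161, 0.098)
first_halt_epoch(p_z, p_0 = 0.082, threshold_p = 0.05)   # 4
```

At epoch 3 the gap is 0.161 - 0.082 = 0.079, still above 0.05; at epoch 4 it drops to 0.016, which matches the point where the log reports halting.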
The 2D representation of the community dataset is returned as a matrix and can be visualized by typing:
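One way to draw this representation is with base R graphics, coloring each sample by its label. The function plot_fmds() is a hypothetical convenience wrapper, not part of FinfoMDS. It assumes only what is stated above: the result is a two-column matrix with one row per sample, in the same order as y.

```r
# Hypothetical helper: scatter plot of a two-column coordinate matrix,
# with one color per group label and a legend. Returns the per-sample
# color indices invisibly.
plot_fmds <- function(result, y, ...) {
  y <- as.factor(y)
  cols <- seq_len(nlevels(y))[as.integer(y)]   # palette index for each sample
  plot(result[, 1], result[, 2], col = cols, pch = 19,
       xlab = "Dimension 1", ylab = "Dimension 2", ...)
  legend("topright", legend = levels(y), col = seq_len(nlevels(y)),
         pch = 19, bty = "n")
  invisible(cols)
}

# Toy coordinates for four samples in two groups
z <- matrix(c(0, 1, 10, 11,
              0, 1, 0, 1), ncol = 2)
plot_fmds(z, c("A", "A", "B", "B"), main = "Toy example")
```

With the objects from the example above, the corresponding call would be plot_fmds(result, y).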
H Kim, JA Kimbrel, CA Vaiana, JR Wollard, X Mayali, and CR Buie (2022). Bacterial response to spatial gradients of algal-derived nutrients in a porous microplate. The ISME Journal, 16(4):1036–1045.
sessionInfo()
#> R version 4.5.2 (2025-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.3 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] phyloseq_1.54.0 FinfoMDS_1.0.0 BiocStyle_2.38.0
#>
#> loaded via a namespace (and not attached):
#> [1] gtable_0.3.6 xfun_0.54 bslib_0.9.0
#> [4] ggplot2_4.0.1 rhdf5_2.54.0 Biobase_2.70.0
#> [7] lattice_0.22-7 rhdf5filters_1.22.0 vctrs_0.6.5
#> [10] tools_4.5.2 generics_0.1.4 biomformat_1.38.0
#> [13] stats4_4.5.2 parallel_4.5.2 cluster_2.1.8.1
#> [16] pkgconfig_2.0.3 Matrix_1.7-4 data.table_1.17.8
#> [19] RColorBrewer_1.1-3 S7_0.2.1 S4Vectors_0.48.0
#> [22] lifecycle_1.0.4 compiler_4.5.2 farver_2.1.2
#> [25] stringr_1.6.0 Biostrings_2.78.0 Seqinfo_1.0.0
#> [28] codetools_0.2-20 permute_0.9-8 htmltools_0.5.8.1
#> [31] sys_3.4.3 buildtools_1.0.0 sass_0.4.10
#> [34] yaml_2.3.10 crayon_1.5.3 jquerylib_0.1.4
#> [37] MASS_7.3-65 cachem_1.1.0 vegan_2.7-2
#> [40] iterators_1.0.14 foreach_1.5.2 nlme_3.1-168
#> [43] digest_0.6.38 stringi_1.8.7 reshape2_1.4.5
#> [46] maketools_1.3.2 splines_4.5.2 ade4_1.7-23
#> [49] fastmap_1.2.0 grid_4.5.2 cli_3.6.5
#> [52] magrittr_2.0.4 survival_3.8-3 ape_5.8-1
#> [55] scales_1.4.0 rmarkdown_2.30 XVector_0.50.0
#> [58] igraph_2.2.1 multtest_2.66.0 evaluate_1.0.5
#> [61] knitr_1.50 IRanges_2.44.0 mgcv_1.9-4
#> [64] rlang_1.1.6 Rcpp_1.1.0 glue_1.8.0
#> [67] BiocManager_1.30.27 BiocGenerics_0.56.0 jsonlite_2.0.0
#> [70] R6_2.6.1 Rhdf5lib_1.32.0 plyr_1.8.9