---
title: "Getting Started with BioMoR"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting Started with BioMoR}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# BioMoR: Bioinformatics Modeling with Recursion and Autoencoder-Based Ensembles

BioMoR is an R package for bioinformatics modeling that integrates:

- Recursive Transformer architectures via Mixture-of-Recursions (MoR) (Bae et al. 2025 doi:10.48550/arXiv.2507.10524)
- Autoencoder-based representation learning (Hinton & Salakhutdinov 2006 doi:10.1126/science.1127647)
- Random Forests for robust tree-based modeling (Breiman 2001 doi:10.1023/A:1010933404324)
- XGBoost for efficient gradient boosting (Chen & Guestrin 2016 doi:10.1145/2939672.2939785)
- Stacked ensembles to combine diverse models for stronger predictive power.

It is designed as a benchmarking framework for predictive workflows in bioinformatics, enabling consistent cross-validation, calibration, and threshold optimization.

## Motivation

Modern bioinformatics involves high-dimensional and noisy data such as genomics, transcriptomics, and proteomics. BioMoR addresses these challenges by:

- Using Mixture-of-Recursions (MoR) for adaptive recursive depth and computational efficiency.
- Learning latent embeddings through autoencoders to improve classifier generalization.
- Leveraging ensemble methods (RF, XGB) for robustness.
- Providing a standardized benchmarking interface to evaluate models on ROC-AUC, PR-AUC, F1, Balanced Accuracy, Brier score, calibration, and threshold optimization.
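
As a rough illustration of what several of the metrics listed above measure, the following base-R sketch computes F1, balanced accuracy, the Brier score, and ROC-AUC by hand. It does not use or reproduce BioMoR's internals: the vectors `truth` and `prob`, their values, and the 0.5 cutoff are made-up assumptions for illustration only.

```r
# Hypothetical inputs: true labels (1 = positive class) and predicted probabilities
truth <- c(1, 1, 1, 0, 0, 0, 0, 1)
prob  <- c(0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7)

# Turn probabilities into hard predictions at an assumed cutoff of 0.5
threshold <- 0.5
pred <- as.integer(prob >= threshold)

# Confusion-matrix counts
tp <- sum(pred == 1 & truth == 1)
fp <- sum(pred == 1 & truth == 0)
fn <- sum(pred == 0 & truth == 1)
tn <- sum(pred == 0 & truth == 0)

# Threshold-dependent metrics
sensitivity <- tp / (tp + fn)
specificity <- tn / (tn + fp)
precision   <- tp / (tp + fp)
f1 <- 2 * precision * sensitivity / (precision + sensitivity)
balanced_accuracy <- (sensitivity + specificity) / 2

# Brier score: mean squared error of the predicted probabilities
brier <- mean((prob - truth)^2)

# ROC-AUC via the rank (Mann-Whitney) formulation
r <- rank(prob)
n_pos <- sum(truth == 1)
n_neg <- sum(truth == 0)
roc_auc <- (sum(r[truth == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

round(c(F1 = f1, BalancedAccuracy = balanced_accuracy,
        Brier = brier, ROC_AUC = roc_auc), 4)
```

F1 and balanced accuracy depend on the chosen cutoff, whereas the Brier score and ROC-AUC are computed from the probabilities directly, which is one reason a benchmark reports them side by side.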

## Example Workflow

We illustrate with the classic iris dataset, recoding the species into a binary label for simplicity:

```{r, message=FALSE}
library(BioMoR)

# Prepare dataset: recode labels to binary
data(iris)
iris$Label <- ifelse(iris$Species == "setosa", "Active", "Inactive")
# Drop Species so the original class column cannot leak into the predictors
iris$Species <- NULL

# Cross-validation control
ctrl <- get_cv_control(cv = 3)

# Train a Random Forest
fit <- train_rf(iris, outcome_col = "Label", ctrl = ctrl)

# Benchmark the model
results <- biomor_benchmark(fit, iris, outcome_col = "Label")

# Print metrics
results$metrics
```

## Visualization

```{r, fig.height=4, fig.width=6}
# ROC Curve
results$plots$ROC

# Precision-Recall Curve
results$plots$PR

# Threshold Optimization
results$plots$Thresholds

# Calibration Curve
results$plots$Calibration
```

## Extending BioMoR

- Replace `train_rf()` with `train_xgb_caret()` for XGBoost.
- Incorporate autoencoder features via `train_autoencoder()` and `get_embeddings()`.
- Use `train_biomor()` to stack multiple models.
- Benchmark across models to compare pipelines in one consistent framework.
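
To make the stacking idea mentioned above more concrete, here is a minimal base-R sketch of a two-level ensemble. It is not how `train_biomor()` is implemented. Two logistic regressions stand in for the base learners, and the versicolor-versus-rest target, the fold count, and all variable names are assumptions chosen for illustration.

```r
set.seed(42)
data(iris)

# Assumed binary target for this sketch: versicolor (1) versus the rest (0)
y <- as.integer(iris$Species == "versicolor")
X <- iris[, 1:4]

# Assign each row to one of k folds
k <- 3
folds <- sample(rep(seq_len(k), length.out = nrow(X)))

# Collect out-of-fold predictions from two base learners
oof <- matrix(NA_real_, nrow(X), 2,
              dimnames = list(NULL, c("sepal", "petal")))
for (f in seq_len(k)) {
  train <- folds != f
  d_train <- cbind(X[train, ], y = y[train])
  m_sepal <- glm(y ~ Sepal.Length + Sepal.Width, data = d_train, family = binomial)
  m_petal <- glm(y ~ Petal.Length + Petal.Width, data = d_train, family = binomial)
  oof[!train, "sepal"] <- predict(m_sepal, X[!train, ], type = "response")
  oof[!train, "petal"] <- predict(m_petal, X[!train, ], type = "response")
}

# Meta-learner: a logistic regression on the base learners' out-of-fold predictions
meta <- glm(y ~ sepal + petal, data = data.frame(oof, y = y), family = binomial)
stacked <- predict(meta, type = "response")

head(round(stacked, 3))
```

Fitting the meta-learner on out-of-fold predictions, rather than on in-sample fits, keeps the base learners from passing their own overfitting on to the second level.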