---
title: "Getting Started with BioMoR"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting Started with BioMoR}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## BioMoR: Bioinformatics Modeling with Recursion and Autoencoder-Based Ensembles

BioMoR is an R package for bioinformatics modeling that integrates:

- Recursive Transformer architectures via Mixture-of-Recursions (MoR) (Bae et al. 2025, doi:10.48550/arXiv.2507.10524)
- Autoencoder-based representation learning (Hinton & Salakhutdinov 2006, doi:10.1126/science.1127647)
- Random Forests for robust tree-based modeling (Breiman 2001, doi:10.1023/A:1010933404324)
- XGBoost for efficient gradient boosting (Chen & Guestrin 2016, doi:10.1145/2939672.2939785)
- Stacked ensembles that combine diverse models for stronger predictive power

It is designed as a benchmarking framework for predictive workflows in bioinformatics, enabling consistent cross-validation, calibration, and threshold optimization.

## Motivation

Modern bioinformatics works with high-dimensional, noisy data from genomics, transcriptomics, and proteomics. BioMoR addresses these challenges by:

- Using Mixture-of-Recursions (MoR) for adaptive recursive depth and computational efficiency.
- Learning latent embeddings through autoencoders to improve classifier generalization.
- Leveraging ensemble methods (Random Forest, XGBoost) for robustness.
- Providing a standardized benchmarking interface that evaluates models on ROC-AUC, PR-AUC, F1, balanced accuracy, and Brier score, together with calibration and threshold-optimization diagnostics.

## Example Workflow

We illustrate the workflow with the built-in `iris` dataset, recoding the three species into a binary label (setosa versus the rest) for simplicity:

```{r, message=FALSE}
library(BioMoR)

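# Fix the random seed so the cross-validation folds are reproducible
set.seed(123)

# Prepare dataset: recode labels to binary (setosa vs. the rest)
data(iris)
iris$Label <- factor(ifelse(iris$Species == "setosa", "Active", "Inactive"))

# Drop the original Species column so it cannot leak the label into the model
iris$Species <- NULL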

# Cross-validation control
ctrl <- get_cv_control(cv = 3)

# Train a Random Forest
fit <- train_rf(iris, outcome_col = "Label", ctrl = ctrl)

# Benchmark the model
results <- biomor_benchmark(fit, iris, outcome_col = "Label")

# Print metrics
results$metrics
```
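
To make the reported metrics concrete, here is a minimal base-R sketch of how a few of them can be computed from true labels and predicted probabilities. It is not BioMoR's internal implementation: the toy vectors `truth` and `prob` and the helper names `roc_auc()` and `class_metrics()` are assumptions introduced only for illustration.

```r
# Toy inputs (assumed for illustration): 1 = "Active", 0 = "Inactive"
truth <- c(1, 1, 1, 0, 0, 0, 1, 0)
prob  <- c(0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1)  # predicted P(Active)

# Brier score: mean squared error of the predicted probabilities
brier <- mean((prob - truth)^2)

# ROC-AUC via the rank-sum (Mann-Whitney) identity
roc_auc <- function(truth, prob) {
  r  <- rank(prob)            # average ranks handle tied probabilities
  n1 <- sum(truth == 1)
  n0 <- sum(truth == 0)
  (sum(r[truth == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Threshold-dependent metrics at a given probability cutoff
class_metrics <- function(truth, prob, threshold = 0.5) {
  pred <- as.integer(prob >= threshold)
  tp <- sum(pred == 1 & truth == 1)
  fp <- sum(pred == 1 & truth == 0)
  fn <- sum(pred == 0 & truth == 1)
  tn <- sum(pred == 0 & truth == 0)
  sensitivity <- tp / (tp + fn)
  specificity <- tn / (tn + fp)
  # F1 written in count form; defined as 0 when there are no true positives
  f1 <- if (tp == 0) 0 else 2 * tp / (2 * tp + fp + fn)
  c(balanced_accuracy = (sensitivity + specificity) / 2, f1 = f1)
}

brier
roc_auc(truth, prob)
class_metrics(truth, prob, threshold = 0.5)
```

The Brier score and ROC-AUC depend only on the predicted probabilities, whereas F1 and balanced accuracy depend on the chosen cutoff, which is why threshold optimization is treated as a separate step.
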

## Visualization

```{r, fig.height=4, fig.width=6}
# ROC Curve
results$plots$ROC
# Precision-Recall Curve
results$plots$PR
# Threshold Optimization
results$plots$Thresholds
# Calibration Curve
results$plots$Calibration
```
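
The ideas behind the threshold and calibration plots can be sketched in base R. This is an illustrative sketch under stated assumptions, not the code BioMoR uses to build its plots: the toy data, the helper `f1_at()`, the threshold grid, and the two-bin calibration table are all hypothetical choices made for this example.

```r
# Toy inputs (assumed for illustration): 1 = "Active", 0 = "Inactive"
truth <- c(1, 1, 1, 0, 0, 0, 1, 0)
prob  <- c(0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1)  # predicted P(Active)

# F1 score at a single probability cutoff
f1_at <- function(threshold, truth, prob) {
  pred <- prob >= threshold
  tp <- sum(pred & truth == 1)
  fp <- sum(pred & truth == 0)
  fn <- sum(!pred & truth == 1)
  if (tp == 0) return(0)
  2 * tp / (2 * tp + fp + fn)
}

# Threshold optimization: scan a grid of cutoffs and keep the best F1
thresholds <- seq(0.05, 0.95, by = 0.1)
f1_values  <- vapply(thresholds, f1_at, numeric(1), truth = truth, prob = prob)
best_threshold <- thresholds[which.max(f1_values)]

# Calibration: within each probability bin, compare the mean predicted
# probability with the observed fraction of positives
bins <- cut(prob, breaks = c(0, 0.5, 1), include.lowest = TRUE)
calibration <- data.frame(
  bin            = levels(bins),
  mean_predicted = as.vector(tapply(prob, bins, mean)),
  observed_rate  = as.vector(tapply(truth, bins, mean))
)

best_threshold
calibration
```

A well-calibrated model has `mean_predicted` close to `observed_rate` in every bin. With real data, more bins and a finer threshold grid would normally be used.
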

## Extending BioMoR

- Replace `train_rf()` with `train_xgb_caret()` for XGBoost.
- Incorporate autoencoder features via `train_autoencoder()` and `get_embeddings()`.
- Use `train_biomor()` to stack multiple models.
- Benchmark across models to compare pipelines in one consistent framework.
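
For readers unfamiliar with stacking, the following base-R sketch shows the general idea: base models produce out-of-fold predictions, and a meta-learner is trained on those predictions. It is a conceptual illustration only and makes no claim about how `train_biomor()` is implemented. The simulated data, the two logistic-regression base models, and the five-fold split are all assumptions chosen for brevity.

```r
set.seed(1)

# Simulated binary outcome driven by two features (assumed toy data)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, 1, plogis(1.5 * x1 - 1.0 * x2))
dat <- data.frame(y = y, x1 = x1, x2 = x2)

# Assign each row to one of five cross-validation folds
folds <- sample(rep(1:5, length.out = n))

# Out-of-fold predictions from two deliberately simple base models
oof <- data.frame(y = dat$y, m1 = numeric(n), m2 = numeric(n))
for (k in 1:5) {
  train <- dat[folds != k, ]
  test  <- dat[folds == k, ]
  fit1 <- glm(y ~ x1, data = train, family = binomial())
  fit2 <- glm(y ~ x2, data = train, family = binomial())
  oof$m1[folds == k] <- predict(fit1, newdata = test, type = "response")
  oof$m2[folds == k] <- predict(fit2, newdata = test, type = "response")
}

# Meta-learner: logistic regression on the base models' predictions
meta    <- glm(y ~ m1 + m2, data = oof, family = binomial())
stacked <- predict(meta, type = "response")

# Compare Brier scores of the base models and the stacked model
c(
  base_1  = mean((oof$m1 - oof$y)^2),
  base_2  = mean((oof$m2 - oof$y)^2),
  stacked = mean((stacked - oof$y)^2)
)
```

Training the meta-learner on out-of-fold predictions, rather than in-sample fits, keeps it from simply rewarding whichever base model overfits the most. In BioMoR itself, `train_biomor()` is the documented entry point for stacking.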