1 Introduction

Fancy infers microbial co-abundance networks using a hybrid scoring framework that combines kNN-based mutual information (MI), MRNET edge selection, and distance correlation (dCor) within a bootstrap resampling loop. Each bootstrap iteration computes an MI matrix, applies MRNET for sparse edge selection, and independently computes pairwise dCor. The final hybrid score ranks edges by combining MRNET edge frequency with dCor stability across bootstraps.

This hybrid approach targets nonlinear associations between metagenome-assembled genomes (MAGs). These include threshold effects, competitive exclusion and other ecological interactions that linear correlation measures can miss.

1.1 Positioning relative to existing Bioconductor packages

Association-network frameworks used for microbiome data are available through Bioconductor, CRAN and related tools. They span several methodological classes that differ in their assumptions about linearity, sparsity and conditional independence:

| Class of method | Representative tools | Core idea | Limitation relevant to FANCY |
|---|---|---|---|
| Correlation / co-expression modules | WGCNA | Pearson/biweight correlation, hierarchical module detection | Mainly captures linear or monotonic dependence |
| Mutual-information networks | minet (CLR, ARACNE, MRNET) | Information-theoretic dependence | No explicit frequency or ecological-coordination weighting |
| Sparse conditional-dependence graphs | SpiecEasi, glasso-based methods | Sparse precision matrices, indirect-edge reduction | Assumes sparse graphical structure; can miss nonlinear ecological relationships |
| Association ensembles | CoNet and similar | Combine multiple association metrics | Flexible but not specifically tuned to nonlinear ecological coordination |

FANCY was developed as a complementary framework that emphasises nonlinear and context-dependent ecological associations. Its hybrid score combines two components, both aggregated over bootstrap resamples:

  • an information-theoretic component: how often MRNET selects an edge from the kNN mutual-information matrix.
  • a distance-correlation component: how stable the dCor estimate for that pair is.

FANCY does not infer direct or conditional-independence interaction networks. It is intended for exploratory identification of coordinated microbial structure and module organisation that may be difficult to capture with purely correlation- or precision-based approaches. The package is best understood as hypothesis-generating: it surfaces nonlinear and potentially state-dependent ecological associations that warrant follow-up with mechanistic or causal-inference methods.

1.2 Integration with the Bioconductor microbiome workflow

FANCY is designed to slot into standard Bioconductor microbiome analyses. Input abundance tables can be derived from phyloseq or mia / SummarizedExperiment workflows. For example, you can prepare the input in three steps:

  1. Extract the counts with otu_table() (or assay()).
  2. Orient the matrix so that samples are rows and MAGs are columns.
  3. Apply a CLR transform, either via compositions (which FANCY imports) or with Fancy's own filter_mags() and clr_normalize() helpers.

The resulting samples × MAGs matrix is then passed to fancy(). Edge tables and module assignments are returned as standard data frames and adjacency matrices. They are ready for downstream graph analysis with igraph, tidygraph or ggraph, or for export to Cytoscape via the workflow described later in this vignette. FANCY therefore complements rather than replaces existing Bioconductor microbiome and network packages.
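The base-R code below is a minimal sketch of that preparation step. It turns a toy count table into a CLR-transformed samples × MAGs matrix. The following are illustrative assumptions:

  • the toy counts;
  • the helper name clr_samples_by_mags();
  • the pseudocount of 0.5.

Fancy's filter_mags() and clr_normalize(), or compositions::clr(), may differ in detail, for example in how zeros are handled.

```r
## Hypothetical toy count table: 6 samples (rows) x 4 MAGs (columns).
## From phyloseq you might start with as(phyloseq::otu_table(ps), "matrix"),
## transposing first when phyloseq::taxa_are_rows(ps) is TRUE.
set.seed(1)
counts <- matrix(rpois(6 * 4, lambda = 20), nrow = 6,
                 dimnames = list(paste0("Sample_", 1:6), paste0("MAG_", 1:4)))

## CLR per sample: log abundance minus the sample's mean log abundance
## (the log of its geometric mean). The pseudocount guards against log(0).
clr_samples_by_mags <- function(counts, pseudocount = 0.5) {
  log_x <- log(counts + pseudocount)
  sweep(log_x, 1, rowMeans(log_x), FUN = "-")
}

toy_input <- clr_samples_by_mags(counts)
dim(toy_input)                 # 6 samples x 4 MAGs, the orientation fancy() expects
round(rowSums(toy_input), 10)  # each sample's CLR values sum to zero
```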

library(Fancy)

2 Example Data

Fancy ships with a subset of data from a published cattle rumen metagenome study. It contains 100 MAGs across 321 samples. The CLR-transformed abundances were pre-computed on the full 2178-MAG compositional reference:

  • MAGs 1-2: Bulleidia vs AC2028 (both Bacillota), showing threshold competitive exclusion
  • MAGs 3-4: RUG023 (Spirochaetota) vs Cryptobacteroides (Bacteroidota), showing an L-shaped nonlinear relationship
  • MAGs 5-100: high-quality MAGs spanning 9 phyla
data(fancy_tiny_clr)
data(fancy_tiny_counts)
data(fancy_tiny_coverage)
data(fancy_tiny_taxonomy)
data(fancy_tiny_metadata)

dim(fancy_tiny_clr)    # MAGs x samples (CLR-transformed)
#> [1] 100 321
fancy_tiny_clr[1:5, 1:5]
#>         Sample_001 Sample_002 Sample_003 Sample_004  Sample_005
#> MAG_001 -1.4452339 -0.8091287 -0.7510971  0.1360897 -1.27453076
#> MAG_002  0.6111012  0.7826824  0.9602607 -1.5487055  0.24604355
#> MAG_003 -0.7414102  1.1039286  1.6685132  3.9810748  0.04382526
#> MAG_004 -1.0266201 -1.0009358 -0.7654210 -0.7946228 -0.37945251
#> MAG_005  0.3110781  0.1937208 -0.2376735 -0.3140157 -0.06940226

The taxonomy spans 9 phyla:

phyla_counts <- sort(table(fancy_tiny_taxonomy$Phyla), decreasing = TRUE)
par(mar = c(5, 10, 2, 1))
barplot(phyla_counts, horiz = TRUE, las = 1,
        col = phyla_palette(names(phyla_counts))[names(phyla_counts)],
        xlab = "Number of MAGs", main = "Phyla Distribution")

2.1 Coverage distribution

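Coverage values are the real covered fraction of each MAG's genome in each sample. Most MAGs in this subset were selected for genome quality (covered fraction >= 0.5 in at least 200 of the 321 samples), so most coverage values are high. The dashed red line marks min_coverage = 0.3.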

cov_vals <- unlist(fancy_tiny_coverage)
hist(cov_vals, breaks = 60, main = "Genome Coverage Distribution",
     xlab = "Covered fraction", col = "steelblue", border = "white")
abline(v = 0.3, col = "red", lwd = 2, lty = 2)
text(0.32, par("usr")[4] * 0.85, "min_coverage = 0.3",
     col = "red", cex = 0.8, pos = 4)

cat("Fraction with coverage >= 0.3:", round(mean(cov_vals >= 0.3, na.rm = TRUE), 4), "\n")
#> Fraction with coverage >= 0.3: 0.6763
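The selection rule quoted above (covered fraction >= 0.5 in at least 200 samples) is a prevalence filter. The code below is a minimal sketch of such a filter. It assumes the coverage matrix has MAGs as rows and samples as columns, as fancy_tiny_clr does. keep_by_coverage() is a hypothetical helper, not the implementation of Fancy's filter_mags().

```r
## Hypothetical helper: keep MAGs whose covered fraction reaches min_cov
## in at least min_samples samples. Rows = MAGs, columns = samples.
keep_by_coverage <- function(coverage, min_cov = 0.3, min_samples = 1) {
  coverage <- as.matrix(coverage)
  n_ok <- rowSums(coverage >= min_cov, na.rm = TRUE)
  rownames(coverage)[n_ok >= min_samples]
}

## Toy example: MAG_1 is always well covered, MAG_2 never is
set.seed(7)
toy_cov <- matrix(runif(4 * 10), nrow = 4,
                  dimnames = list(paste0("MAG_", 1:4), paste0("Sample_", 1:10)))
toy_cov["MAG_1", ] <- 0.9
toy_cov["MAG_2", ] <- 0.1
kept <- keep_by_coverage(toy_cov, min_cov = 0.5, min_samples = 8)
kept
```

On the shipped data, the analogous call would be keep_by_coverage(fancy_tiny_coverage, 0.5, 200). This holds only if that object has the same MAGs × samples orientation.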

2.2 Metadata

The metadata links each sample to methane emissions (CH4) and cattle breed.

head(fancy_tiny_metadata)
#>       Sample       Breed   CH4
#> 1 Sample_001       Luing 17.96
#> 2 Sample_002       Luing 18.16
#> 3 Sample_003 Charolais_X 25.97
#> 4 Sample_004 Charolais_X 18.85
#> 5 Sample_005       Luing 20.93
#> 6 Sample_006 Charolais_X 12.70
hist(fancy_tiny_metadata$CH4, breaks = 15, main = "CH4 Emissions",
     xlab = "CH4 (g/day)", col = "grey80", border = "white")

3 Preparing the Network Input

The shipped CLR matrix (fancy_tiny_clr) was computed on the full 2178-MAG compositional reference. Each sample's geometric mean therefore reflects the whole community rather than this 100-MAG subset. Network functions in Fancy expect samples as rows and MAGs as columns, so we transpose:

net_input <- t(fancy_tiny_clr)
dim(net_input)  # samples x MAGs
#> [1] 321 100
net_input[1:5, 1:5]
#>               MAG_001    MAG_002     MAG_003    MAG_004     MAG_005
#> Sample_001 -1.4452339  0.6111012 -0.74141025 -1.0266201  0.31107806
#> Sample_002 -0.8091287  0.7826824  1.10392858 -1.0009358  0.19372079
#> Sample_003 -0.7510971  0.9602607  1.66851318 -0.7654210 -0.23767346
#> Sample_004  0.1360897 -1.5487055  3.98107483 -0.7946228 -0.31401566
#> Sample_005 -1.2745308  0.2460435  0.04382526 -0.3794525 -0.06940226

Note: If you are working with your own raw count data, run filter_mags() then clr_normalize() before transposing. The example data ships pre-CLR to avoid recomputing CLR on a reduced MAG set, which would distort the geometric mean.
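A small simulation shows why recomputing CLR on a subset is a problem. CLR-transforming the full table and then keeping 10 MAGs gives different values from CLR-transforming only those 10 MAGs.

The sketch uses toy Poisson counts and a base-R helper, clr_rows(), rather than Fancy's clr_normalize(). The two results differ by a per-sample constant, which is the change in the geometric-mean reference. Because this shift varies across samples, it can alter associations computed across samples.

```r
## Toy counts: 5 samples x 50 MAGs (all counts >= 1, so no pseudocount needed)
set.seed(2)
full <- matrix(rpois(5 * 50, lambda = 30) + 1, nrow = 5)

## CLR per sample (rows): log values centred on each sample's mean log
clr_rows <- function(x) {
  lx <- log(x)
  sweep(lx, 1, rowMeans(lx), FUN = "-")
}

clr_then_subset <- clr_rows(full)[, 1:10]   # reference = all 50 MAGs
subset_then_clr <- clr_rows(full[, 1:10])   # reference = only 10 MAGs
shift <- subset_then_clr - clr_then_subset

## Within a sample the shift is constant; across samples it differs
apply(shift, 1, function(r) diff(range(r)))  # ~0 for every sample
shift[, 1]                                   # one offset per sample
```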

4 Why a Hybrid Scoring Framework?

Fancy combines two complementary approaches in a bootstrap framework:

  1. Mutual information + MRNET: kNN-based MI can, in principle, detect any form of statistical dependence, linear or nonlinear. MRNET then sparsifies the MI network with a maximum-relevance/minimum-redundancy criterion. This suppresses redundant (often indirect) edges and keeps the most informative associations. The fraction of bootstrap iterations in which MRNET selects an edge gives the EdgeFrequency.

  2. Distance correlation (dCor): dCor is computed independently in each bootstrap iteration and provides a second nonlinear dependence measure. The ratio of mean dCor to its standard deviation across bootstraps gives the Stability score. Edges whose dCor is both high and consistent across resamples therefore score highest.
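The sketch below makes the dCor stability score concrete. It implements the sample distance correlation in base R, using the V-statistic form of Székely et al. It then computes a mean/sd stability over bootstrap resamples for a U-shaped toy relationship.

This is an illustration only. Fancy appears to use the energy package for dCor (see the session info), and its exact resampling and scaling steps are not reproduced here. EdgeFrequency would be obtained analogously, as the mean of per-bootstrap MRNET selection flags.

```r
## Sample distance correlation (V-statistic form), sketched in base R
dcor_base <- function(x, y) {
  a <- as.matrix(dist(x))
  b <- as.matrix(dist(y))
  ## Double-centre each distance matrix
  A <- a - outer(rowMeans(a), colMeans(a), "+") + mean(a)
  B <- b - outer(rowMeans(b), colMeans(b), "+") + mean(b)
  denom <- sqrt(mean(A * A) * mean(B * B))
  if (denom == 0) return(0)
  sqrt(max(mean(A * B), 0) / denom)
}

## Toy U-shaped relationship: Pearson r is close to zero
set.seed(3)
n <- 150
x <- runif(n, -1, 1)
y <- x^2 + rnorm(n, sd = 0.05)

## dCor in each bootstrap resample, then Stability = mean / sd
n_boot <- 50
boot_dcor <- numeric(n_boot)
for (b in seq_len(n_boot)) {
  idx <- sample.int(n, n, replace = TRUE)
  boot_dcor[b] <- dcor_base(x[idx], y[idx])
}
stability <- mean(boot_dcor) / sd(boot_dcor)

c(pearson_r = cor(x, y), dcor = dcor_base(x, y), stability = stability)
```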

The final HybridScore ranks edges by combining both signals. Fancy offers two scoring modes:

4.1 Multiplicative scoring (default)

\[\text{HybridScore} \;=\; \text{EdgeFrequency}^{\,w_1} \;\times\; \text{Stability.dcor.scaled}^{\,w_2}\]

\(w_1\) controls the weight on MRNET edge frequency and \(w_2\) controls the weight on dCor stability (default \(w_1 = 0.3\), \(w_2 = 0.7\)). Edges need support from both components: a near-zero value in either pulls the score toward zero.
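As a worked check of the formula, take the EdgeFrequency and Stability.dcor.scaled values reported for the two known nonlinear pairs in Table 1 (Section 6.1). With w1 = w2 = 0.5, as in the run below, the formula reproduces their HybridScore. hybrid_multiplicative() is a hypothetical helper written from the formula above, not a Fancy export.

```r
## HybridScore = EdgeFrequency^w1 * Stability.dcor.scaled^w2
hybrid_multiplicative <- function(edge_freq, stab_scaled, w1 = 0.3, w2 = 0.7) {
  edge_freq^w1 * stab_scaled^w2
}

## MAG_001/MAG_002 and MAG_003/MAG_004 (values from Table 1)
scores <- hybrid_multiplicative(c(0.96, 0.44), c(0.209, 0.109), w1 = 0.5, w2 = 0.5)
round(scores, 3)  # 0.448 0.219

## A near-zero component drags the score down, even with the default weights
hybrid_multiplicative(1, 0.01)  # about 0.040
```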

4.2 Additive scoring

\[\text{HybridScore} \;=\; w_1 \times \text{EdgeFrequency} \;+\; w_2 \times \text{Stability.dcor.scaled}\]

Set score_type = "additive" in fancy() to use this mode. This allows edges with strong dCor but weak MRNET frequency (or vice versa) to retain a meaningful score — useful when users want purely dCor-discovered edges to carry weight.
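The difference between the two modes is easiest to see for an edge that MRNET never selects but whose dCor is very stable. The input values below are hypothetical. Both helpers are written from the formulas above with the default weights, rather than taken from Fancy.

```r
hybrid_multiplicative <- function(ef, st, w1 = 0.3, w2 = 0.7) ef^w1 * st^w2
hybrid_additive       <- function(ef, st, w1 = 0.3, w2 = 0.7) w1 * ef + w2 * st

## Hypothetical edge: EdgeFrequency = 0, Stability.dcor.scaled = 0.9
ef <- 0
st <- 0.9
c(multiplicative = hybrid_multiplicative(ef, st),  # 0: no MRNET support vetoes the edge
  additive       = hybrid_additive(ef, st))        # 0.63: dCor alone still counts
```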

4.3 dCor rescue (multiplicative mode)

For multiplicative scoring, a dcor_rescue_percentile option is available (e.g. dcor_rescue_percentile = 0.6). Edges whose Stability.dcor.scaled exceeds the given percentile receive a minimum EdgeFrequency floor (the 30th percentile of non-zero EdgeFrequency values). This prevents MRNET from suppressing edges that dCor finds reliably — important for users who believe some associations are only detectable by dCor and not MI/MRNET.
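The code below sketches the rescue rule as described. It assumes both percentiles are taken over the scored edges, using R's default quantile() definition (type 7). The helper rescue_edge_frequency() and the toy vectors are hypothetical, and Fancy's implementation may differ in these details.

```r
## Floor EdgeFrequency for edges whose scaled dCor stability is above the
## given percentile; the floor is the 30th percentile of non-zero EdgeFrequency
rescue_edge_frequency <- function(edge_freq, stab_scaled, rescue_percentile = 0.6) {
  stab_cut <- quantile(stab_scaled, rescue_percentile, names = FALSE)
  ef_floor <- quantile(edge_freq[edge_freq > 0], 0.3, names = FALSE)
  rescue <- stab_scaled > stab_cut
  edge_freq[rescue] <- pmax(edge_freq[rescue], ef_floor)
  edge_freq
}

## Toy edges: edge 1 is never selected by MRNET but has very stable dCor
ef <- c(0.00, 0.10, 0.40, 0.80, 1.00)
st <- c(0.95, 0.20, 0.30, 0.90, 0.50)
rescue_edge_frequency(ef, st, 0.6)  # 0.37 0.10 0.40 0.80 1.00
```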

Together, these options let the hybrid score capture non-monotonic and threshold-type ecological interactions that Pearson or Spearman correlation can miss.

5 Running the Pipeline

fancy() wraps the full pipeline: bootstrap network inference, edge scoring, and thresholding.

| Parameter | Default | Suggested | Description |
|---|---|---|---|
| n_bootstrap | 100 | 100 to 200 | Number of bootstrap iterations |
| k | 5 | 3 to 7 | kNN parameter for MI estimation |
| find_k | FALSE | TRUE for exploratory runs | Auto-select k via elbow method |
| cpus | 4 | 2 to 8 | Number of parallel workers |
| use_mrnet | TRUE | TRUE | Apply MRNET for sparse edge selection |
| w1 | 0.3 | 0.2 to 0.4 | Weight for edge frequency |
| w2 | 0.7 | 0.6 to 0.8 | Weight for dCor stability |
| threshold_method | "quantile" | "quantile" or "top_n" | How to threshold edges |
| threshold_value | 0.7 | 0.7 to 0.9 | Quantile cutoff (keeps the top 30 percent at 0.7) |

We run 100 bootstrap iterations so that EdgeFrequency and dCor stability are estimated from enough resamples to be meaningful. We also set the following options:

  • k = 3, the lower end of the suggested range;
  • cpus = 6L, for parallel processing;
  • the default 70th percentile threshold.

Because this dataset contains high-quality MAGs subset from real data (100 MAGs, 321 samples), we use equal weights (w1 = w2 = 0.5). MRNET edge frequency and dCor stability then contribute equally to the hybrid score.

result <- fancy(
  net_input,
  n_bootstrap     = 100L,
  k               = 3L,
  cpus            = 6L,
  w1              = 0.5,
  w2              = 0.5,
  threshold_method = "quantile",
  threshold_value  = 0.7,
  verbose          = TRUE
)
#> R Version:  R version 4.6.0 RC (2026-04-17 r89917)
#> Library minet loaded.
#> Library energy loaded.
#> Library knnmi loaded.

6 Inspecting Results

The result is a list with class "fancy":

str(result, max.level = 1)
#> List of 6
#>  $ edges               : tibble [685 × 8] (S3: tbl_df/tbl/data.frame)
#>  $ all_edges           : tibble [2,283 × 8] (S3: tbl_df/tbl/data.frame)
#>  $ all_edges_unfiltered: tibble [4,950 × 8] (S3: tbl_df/tbl/data.frame)
#>  $ k                   : int 3
#>  $ n_bootstrap         : int 100
#>  $ params              :List of 11
#>  - attr(*, "class")= chr "fancy"
cat("\nRetained edges:", nrow(result$edges), "\n")
#> 
#> Retained edges: 685
cat("Total scored edges:", nrow(result$all_edges), "\n")
#> Total scored edges: 2283

With 100 MAGs there are choose(100, 2) = 4,950 possible pairwise edges. After bootstrap filtering (minimum EdgeFrequency and dCor stability), 2283 edges are scored. The 70th percentile threshold retains 685 of these.
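The counts in this paragraph can be checked directly. The ceiling() comparison is only a back-of-the-envelope check. It confirms that the reported numbers correspond to roughly the top 30% of scored edges, and the top 20% in Section 9. Ties at the cutoff could shift the exact count.

```r
choose(100, 2)                 # 4950 possible undirected MAG pairs
n_scored <- 2283               # edges surviving the bootstrap pre-filter
ceiling((1 - 0.7) * n_scored)  # 685, matching the edges retained at the 70th percentile
ceiling((1 - 0.8) * n_scored)  # 457, matching the edges retained at the 80th percentile
```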

Top 5 edges by hybrid score:

top5 <- head(result$edges[order(-result$edges$HybridScore), ], 5)
knitr::kable(as.data.frame(top5), digits = 3, row.names = FALSE)
| source | target | EdgeFrequency | mean.dcor | sd.dcor | Stability.dcor | Stability.dcor.scaled | HybridScore |
|---|---|---|---|---|---|---|---|
| MAG_002 | MAG_085 | 1 | 0.872 | 0.016 | 56.256 | 1 | 1 |
| MAG_002 | MAG_090 | 1 | 0.882 | 0.015 | 57.929 | 1 | 1 |
| MAG_007 | MAG_023 | 1 | 0.870 | 0.016 | 54.772 | 1 | 1 |
| MAG_007 | MAG_027 | 1 | 0.917 | 0.011 | 82.203 | 1 | 1 |
| MAG_008 | MAG_033 | 1 | 0.905 | 0.017 | 53.592 | 1 | 1 |

6.1 Diagnostic Table for Known Nonlinear Pairs

The all_edges_unfiltered component contains all 4950 pairs with their intermediate metrics, including pairs that were dropped by the hard pre-filter. This is useful for inspecting known nonlinear pairs:

## Known nonlinear pairs (MAGs 1-4)
pairs <- list(c("MAG_001", "MAG_002"), c("MAG_003", "MAG_004"))
diag_rows <- lapply(pairs, function(p) {
  uf <- result$all_edges_unfiltered
  idx <- which((uf$source == p[1] & uf$target == p[2]) |
               (uf$source == p[2] & uf$target == p[1]))
  if (length(idx)) as.data.frame(uf[idx[1], ])
})
diag_df <- do.call(rbind, diag_rows)
knitr::kable(
  diag_df[, c("source", "target", "EdgeFrequency", "mean.dcor", "sd.dcor",
              "Stability.dcor", "Stability.dcor.scaled", "HybridScore")],
  digits = 3, row.names = FALSE,
  caption = "Intermediate metrics for known nonlinear MAG pairs"
)

Table 1: Intermediate metrics for known nonlinear MAG pairs
| source | target | EdgeFrequency | mean.dcor | sd.dcor | Stability.dcor | Stability.dcor.scaled | HybridScore |
|---|---|---|---|---|---|---|---|
| MAG_001 | MAG_002 | 0.96 | 0.539 | 0.034 | 16.023 | 0.209 | 0.448 |
| MAG_003 | MAG_004 | 0.44 | 0.366 | 0.032 | 11.410 | 0.109 | 0.219 |

The table shows where each pair sits: EdgeFrequency (how often MRNET selected the edge across bootstraps), mean.dcor and sd.dcor (dCor mean and variability), Stability.dcor (mean/sd ratio), and the final HybridScore. If a pair has EdgeFrequency below 0.3, it was dropped from all_edges by the hard pre-filter but remains visible here.

7 Edges Across Hybrid Score Bands

The scatter plots below show pairs sampled from five hybrid score bands, annotated with each pair's hybrid score and Pearson r. Edges with high hybrid scores can have weak Pearson r, which indicates nonlinear dependencies that linear measures miss.

all_scored <- result$all_edges[order(-result$all_edges$HybridScore), ]

## Define five score bands and sample 2 pairs from each
bands <- list(
  c(0.9, Inf),
  c(0.6, 0.7),
  c(0.45, 0.55),
  c(0.3, 0.4),
  c(0.1, 0.2)
)
band_labels <- c("~1.0", "0.6-0.7", "0.45-0.55", "0.3-0.4", "0.1-0.2")

set.seed(42)
show_pairs <- data.frame()
for (b in seq_along(bands)) {
  lo <- bands[[b]][1]
  hi <- bands[[b]][2]
  in_band <- all_scored[all_scored$HybridScore >= lo &
                        all_scored$HybridScore < hi, ]
  if (nrow(in_band) == 0) {
    ## If exact band is empty, take the 2 closest edges to midpoint
    mid <- (lo + hi) / 2
    all_scored$dist_tmp <- abs(all_scored$HybridScore - mid)
    in_band <- head(all_scored[order(all_scored$dist_tmp), ], 2)
    all_scored$dist_tmp <- NULL
  }

  ## For mid bands (0.3-0.55): prefer low sd.dcor + weak Pearson r
  ## For lowest band (0.1-0.2): prefer higher Pearson r to show that
  ## linearly correlated pairs correctly receive low hybrid scores
  if ((hi <= 0.55 && lo >= 0.3) && nrow(in_band) > 2) {
    in_band$abs_r <- vapply(seq_len(nrow(in_band)), function(j) {
      abs(cor(net_input[, in_band$source[j]],
              net_input[, in_band$target[j]]))
    }, numeric(1))
    low_r <- in_band[in_band$abs_r < 0.4, ]
    if (nrow(low_r) >= 2) {
      low_r <- low_r[order(low_r$sd.dcor), ]
      picked <- head(low_r, 2)
    } else {
      in_band <- in_band[order(in_band$sd.dcor), ]
      picked <- head(in_band, 2)
    }
    picked$abs_r <- NULL
  } else if (lo < 0.3 && nrow(in_band) > 2) {
    in_band$abs_r <- vapply(seq_len(nrow(in_band)), function(j) {
      abs(cor(net_input[, in_band$source[j]],
              net_input[, in_band$target[j]]))
    }, numeric(1))
    ## Take the two pairs with the largest |r| above 0.4 (fallback: the two largest |r|)
    high_r <- in_band[order(-in_band$abs_r), ]
    picked <- head(high_r[high_r$abs_r > 0.4, ], 2)
    if (nrow(picked) < 2) picked <- head(high_r, 2)
    picked$abs_r <- NULL
  } else {
    n_pick <- min(2, nrow(in_band))
    picked <- in_band[sample(seq_len(nrow(in_band)), n_pick), ]
  }

  picked$band <- band_labels[b]
  show_pairs <- rbind(show_pairs, picked)
}

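## Colour samples by CH4 group. Assumption: groups are CH4 tertiles, with
## metadata rows matched to the sample order of net_input.
ch4 <- fancy_tiny_metadata$CH4[match(rownames(net_input),
                                     fancy_tiny_metadata$Sample)]
ch4_grp <- cut(ch4, breaks = quantile(ch4, c(0, 1/3, 2/3, 1), na.rm = TRUE),
               include.lowest = TRUE, labels = c("Low", "Mid", "High"))
ch4_cols <- c(Low = "dodgerblue", Mid = "gold",
              High = "firebrick")[as.character(ch4_grp)]

par(mfrow = c(5, 2), mar = c(4, 4.5, 3, 1))
for (i in seq_len(nrow(show_pairs))) {
  src  <- show_pairs$source[i]
  tgt  <- show_pairs$target[i]   # named tgt to avoid masking base::t()
  hs   <- round(show_pairs$HybridScore[i], 3)
  band <- show_pairs$band[i]

  src_genus <- fancy_tiny_taxonomy[src, "Genus"]
  tgt_genus <- fancy_tiny_taxonomy[tgt, "Genus"]

  xi <- net_input[, tgt]
  yi <- net_input[, src]
  cti <- cor.test(xi, yi)

  plot(xi, yi,
       xlab = paste0(tgt_genus, " (CLR)"),
       ylab = paste0(src_genus, " (CLR)"),
       main = paste0("Score ", band, ": ", src_genus, " vs ", tgt_genus),
       pch = 19, col = adjustcolor(ch4_cols, 0.7), cex = 1.1)
  legend("topright", bty = "n", cex = 0.9,
         legend = c(
           paste0("Hybrid = ", hs),
           paste0("r = ", round(cti$estimate, 3),
                  " (p = ", format.pval(cti$p.value, digits = 2), ")")
         ))
  if (i == 1) {
    ## Show the CH4 colour key once, on the first panel
    legend("bottomleft", bty = "n", cex = 0.8, title = "CH4",
           legend = c("Low", "Mid", "High"),
           pch = 19, col = c("dodgerblue", "gold", "firebrick"))
  }
}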

As the score bands illustrate, the hybrid framework surfaces nonlinear patterns across a range of strengths. We recommend inspecting scatter plots for top-ranking edges to confirm the biological context before interpretation.

8 Nonlinear Interactions Fancy Captured

MAGs 1-4 form two pairs with known nonlinear associations (see Section 2). We plot their CLR-transformed abundances and annotate each panel with the pair's Pearson r and its hybrid score from the pipeline.

# ---- Pair 1: Threshold competitive exclusion (Bulleidia vs AC2028) ----
x1 <- net_input[, "MAG_002"]
y1 <- net_input[, "MAG_001"]
ct1 <- cor.test(x1, y1)
hs1 <- get_hybrid_score(result$all_edges_unfiltered, "MAG_001", "MAG_002")

par(mar = c(4.5, 4.5, 3.5, 1))
plot(x1, y1,
     xlab = "AC2028 (CLR)",
     ylab = "Bulleidia (CLR)",
     main = "MAG_001 vs MAG_002\nThreshold Competitive Exclusion",
     cex.main = 0.85, cex.axis = 0.75, cex.lab = 0.85,
     pch = 19, col = adjustcolor(ch4_cols, 0.8), cex = 0.9)
lo <- loess(y1 ~ x1, span = 0.75)
ox <- order(x1)
lines(x1[ox], predict(lo)[ox], col = "black", lwd = 2.5)

legend("topright", bty = "n", cex = 0.75,
       legend = c(
         paste0("Pearson r = ", round(ct1$estimate, 3),
                " (p = ", format.pval(ct1$p.value, digits = 2), ")"),
         paste0("Hybrid score = ", hs1)
       ))
legend("bottomleft", bty = "n", cex = 0.7, title = "CH4 (g/day)",
       legend = c("Low", "Mid", "High"),
       pch = 19, col = c("dodgerblue", "gold", "firebrick"))

# ---- Pair 2: L-shaped relationship (RUG023 vs Cryptobacteroides) ----
x2 <- net_input[, "MAG_004"]
y2 <- net_input[, "MAG_003"]
ct2 <- cor.test(x2, y2)
hs2 <- get_hybrid_score(result$all_edges_unfiltered, "MAG_003", "MAG_004")

par(mar = c(4.5, 4.5, 3.5, 1))
plot(x2, y2,
     xlab = "Cryptobacteroides (CLR)",
     ylab = "RUG023 (CLR)",
     main = "MAG_003 vs MAG_004\nL-Shaped Nonlinear Relationship",
     cex.main = 0.85, cex.axis = 0.75, cex.lab = 0.85,
     pch = 19, col = adjustcolor(ch4_cols, 0.8), cex = 0.9)

legend("topright", bty = "n", cex = 0.75,
       legend = c(
         paste0("Pearson r = ", round(ct2$estimate, 3),
                " (p = ", format.pval(ct2$p.value, digits = 2), ")"),
         paste0("Hybrid score = ", hs2)
       ))
legend("bottomleft", bty = "n", cex = 0.7, title = "CH4 (g/day)",
       legend = c("Low", "Mid", "High"),
       pch = 19, col = c("dodgerblue", "gold", "firebrick"))

Both pairs have weak Pearson correlations, yet the scatter plots show clear nonlinear structure.

The L-shaped pair (MAG_003 vs MAG_004) is nevertheless lost from the final network. MRNET selects it in only 44% of bootstraps (EdgeFrequency = 0.44), and its scaled dCor stability is low (0.109). This is despite its dCor being consistent across resamples (mean 0.366, sd 0.032). The weighted product of these components (with w1 = w2 = 0.5) is only 0.219, too low for the edge to be retained.

This is a deliberate trade-off: the multiplicative hybrid formula prioritises edges supported by both MI/MRNET and dCor, favouring robustness over sensitivity. Associations that dCor detects but MRNET rarely selects can still be found by inspecting all_edges_unfiltered sorted by Stability.dcor.scaled.

Users who prefer such dCor-led edges to be retained have two options:

  • score_type = "additive", where strong dCor alone carries a meaningful score;
  • dcor_rescue_percentile (e.g. 0.6), which gives high-stability edges a minimum EdgeFrequency floor in the multiplicative formula.
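The code below is a minimal sketch of that inspection step. dcor_led_edges() is a hypothetical helper that keeps pairs with low MRNET support and ranks them by dCor stability. It is demonstrated on a two-row data frame built from Table 1; on a real run you would pass result$all_edges_unfiltered instead. The 0.5 EdgeFrequency cutoff is an arbitrary choice for illustration.

```r
## Pairs that MRNET rarely selects, ranked by scaled dCor stability
dcor_led_edges <- function(uf, max_edge_freq = 0.5, n = 20) {
  low_mrnet <- uf[uf$EdgeFrequency < max_edge_freq, , drop = FALSE]
  head(low_mrnet[order(-low_mrnet$Stability.dcor.scaled), , drop = FALSE], n)
}

## Two rows copied from Table 1 (known nonlinear pairs)
uf_demo <- data.frame(
  source                = c("MAG_001", "MAG_003"),
  target                = c("MAG_002", "MAG_004"),
  EdgeFrequency         = c(0.96, 0.44),
  Stability.dcor.scaled = c(0.209, 0.109),
  HybridScore           = c(0.448, 0.219)
)
dcor_led_edges(uf_demo)  # returns only the MAG_003/MAG_004 row
```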

9 Re-thresholding

threshold_edges() re-thresholds the full scored edge list without re-running the bootstrap. This is useful for exploring different stringency levels.

| Parameter | Default | Suggested | Description |
|---|---|---|---|
| method | "quantile" | "quantile" or "top_n" | Thresholding strategy |
| value | 0.8 | 0.7 to 0.9 for quantile; 10 to 50 for top_n | Cutoff value |
strict <- threshold_edges(result$all_edges, method = "quantile", value = 0.8)
cat("Edges at 80th percentile:", nrow(strict), "\n")
#> Edges at 80th percentile: 457
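Both thresholding strategies are simple to express. The sketch below is a hypothetical re-implementation applied to toy scores, built on two assumptions:

  • a quantile cutoff keeps edges with HybridScore at or above R's default quantile();
  • top_n keeps the n highest-scoring edges.

threshold_edges() itself may handle ties or edge cases differently.

```r
## Hypothetical re-implementation of the two thresholding strategies
threshold_sketch <- function(edges, method = c("quantile", "top_n"), value = 0.8) {
  method <- match.arg(method)
  if (method == "quantile") {
    cutoff <- quantile(edges$HybridScore, value, names = FALSE)
    edges[edges$HybridScore >= cutoff, , drop = FALSE]
  } else {
    edges[head(order(-edges$HybridScore), value), , drop = FALSE]
  }
}

## Toy edge table with 100 distinct scores
toy_scores <- data.frame(source = paste0("MAG_", 1:100), target = "MAG_X",
                         HybridScore = (1:100) / 100)
nrow(threshold_sketch(toy_scores, "quantile", 0.7))  # 30
nrow(threshold_sketch(toy_scores, "top_n", 10))      # 10
```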

10 Score Distribution

The S3 plot() method shows the hybrid score distribution with the threshold cutoff marked.

plot(result)

11 Taxonomy Annotation

Merge taxonomy onto edges to see which lineages are co-abundant.

edges <- result$edges

# Add source taxonomy
edges <- merge(edges, fancy_tiny_taxonomy[, c("Genus", "Phyla")],
               by.x = "source", by.y = "row.names", all.x = TRUE)
names(edges)[names(edges) == "Genus"] <- "source_Genus"
names(edges)[names(edges) == "Phyla"] <- "source_Phyla"

# Add target taxonomy
edges <- merge(edges, fancy_tiny_taxonomy[, c("Genus", "Phyla")],
               by.x = "target", by.y = "row.names", all.x = TRUE)
names(edges)[names(edges) == "Genus"] <- "target_Genus"
names(edges)[names(edges) == "Phyla"] <- "target_Phyla"

head(edges[order(-edges$HybridScore),
     c("source_Genus", "target_Genus", "source_Phyla", "target_Phyla",
       "HybridScore")], 15)
#>         source_Genus     target_Genus source_Phyla target_Phyla HybridScore
#> 42           CAG-791          CAG-791    Bacillota    Bacillota           1
#> 54           CAG-791          CAG-791    Bacillota    Bacillota           1
#> 55           CAG-791          CAG-791    Bacillota    Bacillota           1
#> 63           CAG-791          CAG-791    Bacillota    Bacillota           1
#> 64           CAG-791          CAG-791    Bacillota    Bacillota           1
#> 70           CAG-791          CAG-791    Bacillota    Bacillota           1
#> 76           CAG-791          CAG-791    Bacillota    Bacillota           1
#> 77           CAG-791          CAG-791    Bacillota    Bacillota           1
#> 79           CAG-791          CAG-791    Bacillota    Bacillota           1
#> 80           CAG-791          CAG-791    Bacillota    Bacillota           1
#> 83           CAG-791          CAG-791    Bacillota    Bacillota           1
#> 98  Succiniclasticum Succiniclasticum    Bacillota    Bacillota           1
#> 111          CAG-791           RUG842    Bacillota    Bacillota           1
#> 140           RUG740           RUG740    Bacillota    Bacillota           1
#> 162       Prevotella       Prevotella Bacteroidota Bacteroidota           1

12 Network Visualization

If igraph is installed, plot_network() draws the co-abundance network with nodes coloured by phylum and edges weighted by HybridScore. Communities are detected with modularity-based clustering, analogous to GLay in Cytoscape. Shaded regions group MAGs that are more densely connected to each other than to MAGs outside their community.

plot_network(result, fancy_tiny_taxonomy, community = TRUE)
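Some users will want the community assignments as data rather than a plot. The sketch below runs one modularity-based method, igraph's cluster_fast_greedy(), on a toy weighted edge list. The edge list has the same source/target/HybridScore columns as result$edges.

This is not necessarily the algorithm plot_network() uses. The toy network, two triangles joined by one weak edge, is hypothetical.

```r
if (requireNamespace("igraph", quietly = TRUE)) {
  ## Toy edge list: two tightly linked triangles joined by one weak edge
  toy_edges <- data.frame(
    source      = c("A", "A", "B", "D", "D", "E", "C"),
    target      = c("B", "C", "C", "E", "F", "F", "D"),
    HybridScore = c(0.90, 0.80, 0.85, 0.90, 0.80, 0.85, 0.10)
  )
  ## Extra columns (HybridScore) become edge attributes
  g <- igraph::graph_from_data_frame(toy_edges, directed = FALSE)
  ## Use HybridScore as the edge weight for modularity optimisation
  comm <- igraph::cluster_fast_greedy(g, weights = igraph::E(g)$HybridScore)
  memb <- igraph::membership(comm)
  print(memb)  # A, B, C share one community; D, E, F share another
}
```

With a real run, the same calls would apply to result$edges, because its first two columns are source and target.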

13 Exporting for Cytoscape

export_cytoscape() writes node and edge tables as tab-separated files that Cytoscape can import directly.

| Parameter | Default | Description |
|---|---|---|
| fancy_result | | Object of class "fancy" |
| taxonomy | | Data.frame with MAG IDs as rownames; must have a Phyla column |
| file_prefix | "fancy_network" | Path prefix for output TSV files |
| edges | "thresholded" | Which edges to export: "thresholded" or "all" |
cyto <- export_cytoscape(
  result,
  fancy_tiny_taxonomy,
  file_prefix = tempfile("fancy")
)

The returned list contains the same data.frames that were written to disk:

head(cyto$edges)
#>    source  target HybridScore
#> 1 MAG_002 MAG_085           1
#> 2 MAG_002 MAG_090           1
#> 3 MAG_007 MAG_023           1
#> 4 MAG_007 MAG_027           1
#> 5 MAG_008 MAG_033           1
#> 6 MAG_008 MAG_044           1
head(cyto$nodes)
#>        id   Domain          Phyla          Class              Order
#> 1 MAG_001 Bacteria      Bacillota        Bacilli Erysipelotrichales
#> 2 MAG_002 Bacteria      Bacillota     Clostridia     Lachnospirales
#> 3 MAG_003 Bacteria  Spirochaetota   Spirochaetia   Sphaerochaetales
#> 4 MAG_004 Bacteria   Bacteroidota    Bacteroidia      Bacteroidales
#> 5 MAG_005 Bacteria Actinomycetota Coriobacteriia   Coriobacteriales
#> 6 MAG_006 Bacteria      Bacillota  Negativicutes  Acidaminococcales
#>                Family             Genus           Species   Color
#> 1 Erysipelotrichaceae         Bulleidia                sp #377EB8
#> 2     Lachnospiraceae            AC2028            AC2028 #377EB8
#> 3   Sphaerochaetaceae            RUG023            RUG023 #999999
#> 4              UBA932 Cryptobacteroides Cryptobacteroides #4DAF4A
#> 5     Eggerthellaceae           UBA9715           UBA9715 #E41A1C
#> 6  Acidaminococcaceae  Succiniclasticum  Succiniclasticum #377EB8

To import into Cytoscape:

  1. File > Import > Network from File: select the *_edges.tsv file. Map source to Source Node, target to Target Node, and HybridScore to an edge attribute.
  2. File > Import > Table from File: select the *_nodes.tsv file. Map the id column to the node key column.
  3. Use the Color column in the node table to set node fill colour by mapping it as a passthrough in the Style panel.
  4. Apps > Community Detection (GLay) to run GLay community clustering on the network.

14 Saving Results

To save the pipeline result and intermediate objects for later use without re-running the bootstrap, run the code below. It writes to a temporary directory; point file at a permanent location to keep the file.

save(result, net_input,
     fancy_tiny_clr, fancy_tiny_counts, fancy_tiny_coverage,
     fancy_tiny_taxonomy, fancy_tiny_metadata,
     all_scored, show_pairs, diag_df, edges, top5, strict,
     file = file.path(tempdir(), "vignette_objects.RData"))

15 Session Info

sessionInfo()
#> R version 4.6.0 RC (2026-04-17 r89917)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.23-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] knnmi_1.0        energy_1.7-12    minet_3.70.0     Fancy_0.99.0    
#> [5] BiocStyle_2.40.0
#> 
#> loaded via a namespace (and not attached):
#>  [1] tensorA_0.36.2.1    tidyr_1.3.2         sass_0.4.10        
#>  [4] generics_0.1.4      robustbase_0.99-7   stringi_1.8.7      
#>  [7] hms_1.1.4           digest_0.6.39       magrittr_2.0.5     
#> [10] evaluate_1.0.5      bayesm_3.1-7        bookdown_0.46      
#> [13] fastmap_1.2.0       jsonlite_2.0.0      snowfall_1.84-6.3  
#> [16] tinytex_0.59        BiocManager_1.30.27 purrr_1.2.2        
#> [19] jquerylib_0.1.4     cli_3.6.6           rlang_1.2.0        
#> [22] gsl_2.1-9           withr_3.0.2         cachem_1.1.0       
#> [25] yaml_2.3.12         otel_0.2.0          tools_4.6.0        
#> [28] parallel_4.6.0      tzdb_0.5.0          dplyr_1.2.1        
#> [31] compositions_2.0-9  boot_1.3-32         vctrs_0.7.3        
#> [34] R6_2.6.1            lifecycle_1.0.5     magick_2.9.1       
#> [37] stringr_1.6.0       MASS_7.3-65         pkgconfig_2.0.3    
#> [40] pillar_1.11.1       bslib_0.11.0        glue_1.8.1         
#> [43] Rcpp_1.1.1-1.1      DEoptimR_1.1-4      xfun_0.57          
#> [46] tibble_3.3.1        tidyselect_1.2.1    knitr_1.51         
#> [49] igraph_2.3.1        htmltools_0.5.9     snow_0.4-4         
#> [52] rmarkdown_2.31      readr_2.2.0         compiler_4.6.0