---
title: "multiWGCNA: the full workflow"
author: "Dario Tommasini"
date: "`r Sys.Date()`"
output: 
  BiocStyle::html_document:
    toc: true
vignette: >
  %\VignetteIndexEntry{General Workflow}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# Introduction

The multiWGCNA R package is a WGCNA-based procedure designed for RNA-seq datasets with two biologically meaningful variables. We developed this package because we've found that the experimental design for network analysis can be ambiguous. For example, should I analyze my treatment and wildtype samples separately or together? We find that it is informative to perform all the possible design combinations, and subsequently integrate these results. For more information, please see Tommasini and Fogel. BMC Bioinformatics. 2023.

In this vignette, we will be showing how users can perform a full start-to-finish multiWGCNA analysis. We will be using the regional autism data from Voineagu et al. 2011 (https://www.nature.com/articles/nature10110). This dataset has two sample traits: 1) autism versus control, and 2) temporal cortex versus frontal cortex. 

# Install the multiWGCNA R package

```{r eval=FALSE}
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("multiWGCNA")
```

The development version of multiWGCNA can be installed from GitHub like this:

```{r eval=FALSE}
if (!require("devtools", quietly = TRUE))
    install.packages("devtools")
    
devtools::install_github("fogellab/multiWGCNA")
```

Load multiWGCNA library

```{r}
library(multiWGCNA)
```

# Load microarray data from human post-mortem brains

Since this is purely an example of how to run the entire analysis from start to finish, we are going to limit our analysis to 1500 randomly selected probes. 

Input data: 

* The expression data can be either a SummarizedExperiment object or a data.frame with genes as rows and samples as columns
* The sampleTable should be a data.frame with the first column named "Sample". The second and third column names can be arbitrary, i.e. "disease" or "Genotype"


```{r}
# Download data from the ExperimentHub
library(ExperimentHub)
eh = ExperimentHub()

# Note: this requires the SummarizedExperiment package to be installed
eh_query = query(eh, c("multiWGCNAdata"))
autism_se = eh_query[["EH8219"]]

# Collect the metadata in the sampleTable
sampleTable = colData(autism_se)

# Randomly sample 1500 genes from the expression matrix
set.seed(1)
autism_se = autism_se[sample(rownames(autism_se), 1500),]

# Check the data
assays(autism_se)[[1]][1:5, 1:5]
# sampleTable$Status = paste0("test_", sampleTable$Status, "_test")
# sampleTable$Sample = paste0(sampleTable$Sample, "_test")
sampleTable

# Define our conditions for trait 1 (disease) and 2 (brain region)
conditions1 = unique(sampleTable[,2])
conditions2 = unique(sampleTable[,3])
```

# Perform network construction, module eigengene calculation, module-trait correlation 

Also, let's find best trait for each module and identify outlier modules (ie modules driven by a single sample). 

Let's use power = 10 since Voineagu et al. 2011 used that for all their networks. They also constructed "unsigned" networks and a mergeCutHeight of 0.10 so that modules with similar expression are merged. This step may take a minute.  

```{r}
# Construct the combined networks and all the sub-networks (autism only, controls only, FC only, and TC only)
autism_networks = constructNetworks(autism_se, sampleTable, conditions1, conditions2,
                                  networkType = "unsigned", power = 10,
                                  minModuleSize = 40, maxBlockSize = 25000,
                                  reassignThreshold = 0, minKMEtoStay = 0.7,
                                  mergeCutHeight = 0.10, numericLabels = TRUE,
                                  pamRespectsDendro = FALSE, verbose=3,
                                  saveTOMs = TRUE)
```

# Compare modules by overlap across conditions

For calculating significance of overlap, we use the hypergeometric test (also known as the one-sided Fisher exact test). We'll save the results in a list. Then, let's take a look at the mutual best matches between autism modules and control modules. 

```{r, fig.height = 4, fig.width = 7}
# Save results to a list
results=list()
results$overlaps=iterate(autism_networks, overlapComparisons, plot=TRUE)

# Check the reciprocal best matches between the autism and control networks
head(results$overlaps$autism_vs_controls$bestMatches)
```

You can use the TOMFlowPlot to plot the recruitment of genes from one network analysis to another. See Figure 6 from Tommasini and Fogel, 2023 for an example. Note that you will need to set saveTOMs = TRUE in the constructNetworks function above. 

```{r, fig.height=6, fig.width=4}
networks = c("autism", "controls")
toms = lapply(networks, function(x) {
  load(paste0(x, '-block.1.RData'))
  get("TOM")
})

# Check module autism_005
TOMFlowPlot(autism_networks, 
            networks, 
            toms, 
            genes_to_label = topNGenes(autism_networks$autism, "autism_005"), 
            color = 'black', 
            alpha = 0.1, 
            width = 0.05)
```

# Perform differential module expression analysis 

This test for an association between the module eigengene (ME) and the two sample traits or their interaction. Therefore, the model is ME = trait1 + trait2 + trait1*trait2 and tests for a significant association to the traits using factorial ANOVA. These results can be stored in the diffModExp component of the results list. It can also perform PERMANOVA, which uses multivariate distances and can thus be applied to the module expression rather than just the first principal component (module eigengene). 

Module combined_004 seems to be the module most significantly associated with disease status using standard ANOVA (module combined_000 represents genes that were unassigned so disregard this module). Let's use the diffModuleExpression function to visually check this module for an association to autism status. 

Note: By setting plot = TRUE, the runDME function generates a PDF file called "combined_DME.pdf" in the current directory with these plots for all the modules. 

```{r, fig.height = 6, fig.width = 7}
# Run differential module expression analysis (DME) on combined networks
results$diffModExp = runDME(autism_networks[["combined"]], 
                            sampleTable, 
                            p.adjust="fdr", 
                            refCondition="Tissue", 
                            testCondition="Status") 
                            # plot=TRUE, 
                            # out="combined_DME.pdf")

# to run PERMANNOVA
# library(vegan)
# results$diffModExp = runDME(autism_networks[["combined"]], p.adjust="fdr", refCondition="Tissue", 
#                          testCondition="Status", plot=TRUE, test="PERMANOVA", out="PERMANOVA_DME.pdf")

# Check adjusted p-values for the two sample traits
results$diffModExp

# You can check the expression of a specific module like this. Note that the values reported in the bottom panel title are p-values and not FDR-adjusted like in results$diffModExp
diffModuleExpression(autism_networks[["combined"]], 
                     geneList = topNGenes(autism_networks[["combined"]], "combined_004"), 
                     design = sampleTable,
                     test = "ANOVA",
                     plotTitle = "combined_004",
                     plot = TRUE)
```

# Perform the module preservation analysis

Determine if any modules in the autism data are not preserved in the healthy data (or vice versa). This is the slowest step as the calculation of permuted statistics takes a while. Typically, this can be parallelized using the enableWGCNAThreads function, but for this vignette we'll just do 10 permutations so one core should suffice.   

```{r, fig.height = 3, fig.width = 7}
# To enable multi-threading
# library(doParallel)
# library(WGCNA)
# nCores = 8
# registerDoParallel(cores = nCores)
# enableWGCNAThreads(nThreads = nCores)

# Calculate preservation statistics
results$preservation=iterate(autism_networks[conditions1], # this does autism vs control; change to "conditions2" to perform comparison between FC and TC
                             preservationComparisons, 
                             write=FALSE, 
                             plot=TRUE, 
                             nPermutations=10)
```

# Summarize interesting results from the analyses

These include differentially preserved trait-associated modules and differentially expressed modules. 

```{r}
# Print a summary of the results
summarizeResults(autism_networks, results)
```

# Print the session info

```{r}
sessionInfo()
```