We introduce a Statistical Approach via Pseudo-value Information and Estimation for Differential Network Analysis (SOHPIE; pronounced as “Sofie”) [1]. This is a regression modeling method for differential network (DN) analysis that can include covariate information in analyzing microbiome data.
Please install the following R packages before using SOHPIE-DNA.
# library(robustbase) # To fit a robust regression.
# library(parallel) # To use mclapply() when reestimating the association matrix.
# library(dplyr) # For the convenience of tabulating p-values, coefficients, and q-values.
# library(fdrtool) # For false discovery rate control.
# library(gtools) # To estimate an association matrix via SparCC.
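As a convenience, the check below reports which of the packages listed above are not yet installed. It is a minimal sketch in base R: the object names (required_pkgs, missing_pkgs) are illustrative only, and the install line is left commented out because it needs internet access.

```r
# Packages listed above; 'parallel' ships with R itself, the others come from CRAN.
required_pkgs <- c("robustbase", "parallel", "dplyr", "fdrtool", "gtools")

# Flag each package as installed (TRUE) or not (FALSE) without attaching it.
is_installed <- vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required_pkgs[!is_installed]

# Install only what is missing (uncomment to run).
# if (length(missing_pkgs) > 0) install.packages(missing_pkgs)
print(missing_pkgs)
```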
Two sample datasets are available in this package. One (combinedamgut) is from the American Gut Project [2] and contains 138 taxa and 268 subjects. In this user manual, only the first 30 of the 138 taxa are used to keep the demonstration simple. The other (combineddietswap) is from a geographical epidemiology study of a diet swap intervention [3] and includes 112 taxa and 37 subjects (20 African Americans from Pittsburgh and 17 rural South Africans). The full data from each study are available in the SpiecEasi and microbiome R packages, respectively.
The main grouping variable is an indicator of whether a subject lives with a dog. After data processing, we obtain the subject indices for each group: ‘Not living with a dog (Group A)’ and ‘Living with a dog (Group B)’. We need these indices to estimate the group-specific \(p \times p\) association matrices (and to re-estimate the association matrices later for the pseudo-value calculations).
# Note: Again, we will use a toy example with the first 30 out of 138 taxa.
OTUtab = combinedamgut[ , 8:37]
# Clinical/demographic covariates (phenotypic data):
# Note: All of the covariates in phenodat below will be included in the regression
# when you use the SOHPIE_DNA function later. Please make sure that
# phenodat contains only the variables to be analyzed.
phenodat = combinedamgut[, 1:7] # first column is ID, so not using it.
# Obtain indices of each grouping factor.
# In this example, a variable indicating the status of living with a dog was chosen (i.e. bin_dog).
# Accordingly, Groups A and B imply living without and with a dog, respectively.
newindex_grpA = which(combinedamgut$bin_dog == 0)
newindex_grpB = which(combinedamgut$bin_dog == 1)
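To illustrate why the group indices are needed, the sketch below computes jackknife pseudo-values of a network centrality measure separately within each group. It is a toy illustration under stated assumptions, not the package's implementation: the data are simulated, Pearson correlation stands in for SparCC as the association estimator, the centrality measure (sum of absolute associations with the other taxa) is one plausible choice, and all object and function names are hypothetical.

```r
set.seed(1)

# Toy data: 12 subjects x 4 taxa (simulated; not the American Gut data).
toy_counts <- matrix(rpois(12 * 4, lambda = 20), nrow = 12,
                     dimnames = list(NULL, paste0("taxon", 1:4)))
toy_grpA <- 1:6    # hypothetical indices of Group A subjects
toy_grpB <- 7:12   # hypothetical indices of Group B subjects

# Stand-in association estimator (assumption): Pearson correlation.
# SOHPIE estimates associations via SparCC instead.
assoc_fun <- function(x) cor(x)

# Centrality per taxon: sum of absolute associations with all other taxa.
centrality <- function(x) {
  a <- abs(assoc_fun(x))
  diag(a) <- 0
  rowSums(a)
}

# Jackknife pseudo-values within one group of n subjects:
#   pseudo_i = n * theta_hat - (n - 1) * theta_hat_without_subject_i
jackknife_pseudo <- function(x) {
  n <- nrow(x)
  theta_full <- centrality(x)
  t(vapply(seq_len(n), function(i) {
    n * theta_full - (n - 1) * centrality(x[-i, , drop = FALSE])
  }, numeric(ncol(x))))
}

# Re-estimate within each group, then stack: one row per subject, one column per taxon.
pseudo_A <- jackknife_pseudo(toy_counts[toy_grpA, ])
pseudo_B <- jackknife_pseudo(toy_counts[toy_grpB, ])
pseudo_all <- rbind(pseudo_A, pseudo_B)
dim(pseudo_all)
```

Each column of pseudo_all could then serve as the response in a per-taxon regression on the covariates, which is the role the pseudo-value regression plays in the next step.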
Once the data processing step above is complete, we can fit a pseudo-value regression using the SOHPIE_DNA function. An important note: please provide the OTU table and the clinical/demographic data (i.e. metadata) to the function as separate objects. In addition, you must supply the indices for each group of the binary indicator variable that serves as the main predictor (e.g. living with a dog vs. without a dog).
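The fitting call itself is not shown in this section, although later examples use an object named SOHPIEres. The wrapper below is a hedged sketch of what such a call might look like. The argument names (OTUdat, clindat, groupA, groupB, and c for the trimming proportion of the robust regression) and the value 0.5 are assumptions that should be checked against ?SOHPIE_DNA; the wrapper name fit_sohpie is hypothetical.

```r
# Sketch only: the argument names passed to SOHPIE_DNA() below are assumptions;
# confirm them with ?SOHPIE_DNA before use.
fit_sohpie <- function(OTUtab, phenodat, newindex_grpA, newindex_grpB, trim = 0.5) {
  if (!requireNamespace("SOHPIE", quietly = TRUE)) {
    stop("Package 'SOHPIE' is required for this sketch.")
  }
  SOHPIE::SOHPIE_DNA(OTUdat = OTUtab,        # taxa table (subjects in rows)
                     clindat = phenodat,      # covariates entering the regression
                     groupA = newindex_grpA,  # subject indices, Group A
                     groupB = newindex_grpB,  # subject indices, Group B
                     c = trim)                # assumed trimming proportion
}

# Usage with the objects created above (not run here):
# SOHPIEres <- fit_sohpie(OTUtab, phenodat, newindex_grpA, newindex_grpB)
```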
SOHPIE also provides convenience functions for use after fitting a pseudo-value regression. With these you can quickly extract the names of taxa that are significantly differentially connected (DC; DCtaxa_tab), as well as the adjusted p-values (q-values; qval and qval_specific_var) and coefficient estimates (coeff and coeff_specific_var), either for all variables in the regression or for a specific variable.
# qval() function will get you a table with q-values.
qval(SOHPIEres)
#> bin_dog age sex bin_floss bin_exercise cat_alcohol1
#> 326792 0.6718599537 0.03101449 0.06266118 0.9162595 1 0.52326081
#> 348374 0.5748463782 0.74520158 0.19217181 0.9600343 1 0.38502163
#> 181016 0.6307894498 0.73537257 0.31097113 0.9552732 1 0.69537716
#> 191687 0.6176794513 0.75533084 0.19065216 0.9475789 1 0.14040928
#> 305760 0.3901074464 0.30332590 0.05846960 0.8306534 1 0.14218172
#> 326977 0.3865110096 0.72056596 0.06122356 0.9307835 1 0.46242556
#> 194648 0.7171013989 0.66708462 0.21097982 0.6282862 1 0.63387885
#> 28186 0.6598291209 0.35869206 0.17999026 0.9616961 1 0.70635985
#> 541301 0.5466453444 0.70558475 0.32229419 0.6282862 1 0.11265657
#> 198941 0.6674430798 0.03955728 0.18871585 0.9654564 1 0.58934495
#> 353985 0.6246109890 0.27581278 0.28870141 0.9529683 1 0.59538815
#> 187524 0.6184338262 0.01049243 0.37866069 0.9398391 1 0.36751633
#> 182054 0.7192386938 0.14048573 0.17444083 0.9421658 1 0.72538489
#> 175537 0.5925609895 0.56989007 0.09704646 0.8213131 1 0.08461242
#> 9753 0.0007377574 0.28063614 0.17340556 0.8853829 1 0.14024956
#> 194211 0.3940861909 0.64146842 0.18019947 0.7018547 1 0.08905296
#> 188518 0.6178576071 0.70399261 0.06210556 0.8894388 1 0.66679831
#> 189396 0.5593156353 0.67969064 0.28480546 0.6282862 1 0.38300187
#> 90487 0.5818455566 0.01376920 0.06194247 0.9502697 1 0.37477822
#> 203708 0.6031857425 0.18927892 0.15567288 0.9068511 1 0.32739979
#> 173965 0.6776070018 0.71064343 0.05174975 0.9370214 1 0.59055892
#> 194661 0.5880918436 0.29693394 0.06244570 0.9548121 1 0.49558941
#> 512309 0.3999094450 0.67878505 0.13539747 0.8109234 1 0.74848265
#> 170124 0.7334064348 0.68746552 0.29771847 0.9366570 1 0.75842568
#> 216862 0.6282296460 0.71829359 0.11440726 0.9608466 1 0.71383068
#> 352304 0.3996296770 0.67747514 0.19140209 0.9033311 1 0.73118550
#> 191306 0.6777469157 0.62493368 0.19400474 0.9618242 1 0.72866880
#> 191541 0.5992836390 0.41855510 0.21526287 0.9500501 1 0.61848756
#> 191547 0.3349451520 0.56546219 0.17931233 0.9656134 1 0.62444641
#> 195493 0.6313144806 0.35204345 0.18474762 0.9563969 1 0.14443690
#> cat_alcohol2 bin_migraine
#> 326792 0.9431084 0.7577956
#> 348374 0.9392681 0.8684028
#> 181016 0.9553457 0.8297496
#> 191687 0.9596005 0.8801983
#> 305760 0.8233684 0.7194888
#> 326977 0.9174052 0.8803676
#> 194648 0.9152999 0.7289431
#> 28186 0.6542158 0.5837704
#> 541301 0.9428093 0.5831110
#> 198941 0.6542158 0.7357105
#> 353985 0.9545933 0.1835796
#> 187524 0.9673489 0.3214464
#> 182054 0.9343531 0.1835796
#> 175537 0.9470631 0.8343346
#> 9753 0.9629066 0.6973559
#> 194211 0.9401837 0.6534395
#> 188518 0.9429131 0.8767364
#> 189396 0.9413717 0.8080775
#> 90487 0.9622269 0.1835796
#> 203708 0.9430064 0.7378143
#> 173965 0.9628792 0.8567233
#> 194661 0.9103574 0.6680625
#> 512309 0.9576967 0.6177373
#> 170124 0.9499444 0.8679927
#> 216862 0.9513115 0.8424656
#> 352304 0.8919443 0.8659532
#> 191306 0.9386446 0.4095100
#> 191541 0.9501993 0.1835796
#> 191547 0.9302687 0.8628035
#> 195493 0.9581127 0.8865934
The qval_specific_var function retrieves the q-values of a specific variable (bin_dog in this example).
# Create an object to keep the table with q-values.
qvaltab <- qval(SOHPIEres)
# Retrieve a vector of q-values for a single variable of interest.
qval_specific_var(qvaltab = qvaltab, varname = "bin_dog")
#> bin_dog
#> 326792 0.6718599537
#> 348374 0.5748463782
#> 181016 0.6307894498
#> 191687 0.6176794513
#> 305760 0.3901074464
#> 326977 0.3865110096
#> 194648 0.7171013989
#> 28186 0.6598291209
#> 541301 0.5466453444
#> 198941 0.6674430798
#> 353985 0.6246109890
#> 187524 0.6184338262
#> 182054 0.7192386938
#> 175537 0.5925609895
#> 9753 0.0007377574
#> 194211 0.3940861909
#> 188518 0.6178576071
#> 189396 0.5593156353
#> 90487 0.5818455566
#> 203708 0.6031857425
#> 173965 0.6776070018
#> 194661 0.5880918436
#> 512309 0.3999094450
#> 170124 0.7334064348
#> 216862 0.6282296460
#> 352304 0.3996296770
#> 191306 0.6777469157
#> 191541 0.5992836390
#> 191547 0.3349451520
#> 195493 0.6313144806
DCtaxa_tab returns a list containing (1) the names and q-values of taxa that are significantly DC between the two biological conditions and (2) the names of the DC taxa only.
# Please do NOT forget to provide the name of the variable in DCtaxa_tab(groupvar = )
# and the level of significance (0.3 in this example).
# The result is stored under a different name so that it does not mask the function.
DCtaxa_res <- DCtaxa_tab(qvaltab = qvaltab, groupvar = "bin_dog", alpha = 0.3)
DCtaxa_res
#> $DCtaxa_complete_tab
#> bin_dog
#> 9753 0.0007377574
#>
#> $DCtaxa_names_only
#> [1] "9753"
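The selection above amounts to keeping the taxa whose q-value for the grouping variable falls below the chosen threshold. The base R sketch below reproduces that logic on four bin_dog q-values copied from the table earlier in this manual. It is an illustration only: whether the package uses a strict inequality is an assumption, and the object names are hypothetical.

```r
# Four bin_dog q-values copied from the q-value table (row names are taxon IDs).
qvals_subset <- data.frame(
  bin_dog = c(0.6718599537, 0.3901074464, 0.0007377574, 0.3349451520),
  row.names = c("326792", "305760", "9753", "191547")
)

alpha <- 0.3  # level of significance used in this example

# Keep taxa with q-value below alpha (strict inequality assumed).
is_dc <- qvals_subset$bin_dog < alpha
dc_tab <- qvals_subset[is_dc, , drop = FALSE]  # names and q-values
dc_names <- rownames(dc_tab)                   # names only
dc_names
```

Only taxon 9753 passes the threshold here, which matches the output shown above.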
[1] Ahn S, Datta S. (2023). Differential Co-Abundance Network Analyses for Microbiome Data Adjusted for Clinical Covariates Using Jackknife Pseudo-Values. Under Review at \(\textit{BMC Bioinformatics}\).
[2] McDonald D. et al. (2018). American Gut: an Open Platform for Citizen Science Microbiome Research. \(\textit{mSystems}\). 3(3), e00031–18.
[3] O’Keefe SJ. et al. (2015). Fat, fibre and cancer risk in African Americans and rural Africans. \(\textit{Nat Commun}\). 6, 6342.