title: “robin”

author: “Valeria Policastro - Dario Righelli”


ROBIN

In network analysis, many community detection algorithms have been developed. However, their applications leave one important question unaddressed: the statistical validation of the results. Are the detected communities significant, or are they merely a result of chance due to the positions of edges in the network?

ROBIN (ROBustness In Network) is an R package for the validation of community detection. It has a double aim: it studies the robustness of a community detection algorithm, and it compares the robustness of two community detection algorithms.

The package implements a methodology that detects if the community structure found by a detection algorithm is statistically significant or is a result of chance, merely due to edge positions in the network. It performs a perturbation strategy and runs a null model to build a set of procedures based on different stability measures.

In particular, it provides:

1) A procedure to examine the stability of the partition recovered against random perturbations of the original graph structure
2) Three tests to determine whether the obtained clustering departs significantly from the null model
3) A routine to compare different detection algorithms applied to the same network to discover which fits better
4) A graphical interactive representation

Install package

library(devtools)
install_github("ValeriaPolicastro/robin",force=TRUE)
library("robin")

Preparation of the Graph

As input, the ROBIN package expects a network that can be read from different formats: edgelist, pajek, graphml, gml, ncol, lgl, dimacs, and graphdb files, as well as igraph graphs.

With the prepGraph function we create, from the selected input file, the igraph object needed to run ROBIN.

my_network <- system.file("example/football.gml", package="robin")
# downloaded from: http://www-personal.umich.edu/~mejn/netdata/
graph <- prepGraph(file=my_network, file.format="gml")
graph
## IGRAPH fa1689d U--- 115 613 -- 
## + attr: id (v/n), label (v/c), value (v/n)
## + edges from fa1689d:
##  [1] 1--  2 1--  5 1-- 10 1-- 17 1-- 24 1-- 34 1-- 36 1-- 42 1-- 66 1-- 91
## [11] 1-- 94 1--105 2-- 26 2-- 28 2-- 34 2-- 38 2-- 46 2-- 58 2-- 90 2--102
## [21] 2--104 2--106 2--110 3--  4 3--  7 3-- 14 3-- 15 3-- 16 3-- 48 3-- 61
## [31] 3-- 65 3-- 73 3-- 75 3--101 3--107 4--  6 4-- 12 4-- 27 4-- 41 4-- 53
## [41] 4-- 59 4-- 73 4-- 75 4-- 82 4-- 85 4--103 5--  6 5-- 10 5-- 17 5-- 24
## [51] 5-- 29 5-- 42 5-- 70 5-- 94 5--105 5--109 6-- 11 6-- 12 6-- 53 6-- 75
## [61] 6-- 82 6-- 85 6-- 91 6-- 98 6-- 99 6--108 7--  8 7-- 33 7-- 40 7-- 48
## [71] 7-- 56 7-- 59 7-- 61 7-- 65 7-- 86 7--101 7--107 8--  9 8-- 22 8-- 23
## + ... omitted several edges
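
The text does not describe how prepGraph reads each format. As a rough illustration only, the sketch below uses igraph's own read_graph function to load a tiny network written in the ncol format (one of the formats listed above). The three-edge network and the temporary file are made up for the example, and whether prepGraph delegates to read_graph is an assumption.

```r
library(igraph)

# Write a tiny three-edge network to a temporary file in "ncol" format
# (one edge per line, given as two vertex names). Illustrative data only.
tmp <- tempfile(fileext = ".ncol")
writeLines(c("a b", "b c", "c a"), tmp)

# Read it back as an undirected igraph object.
g <- read_graph(tmp, format = "ncol", directed = FALSE)

vcount(g)  # number of vertices
ecount(g)  # number of edges
```

The resulting object is an ordinary igraph graph, the same kind of object that the printed output above shows for the football network.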

Random Graph

The random function creates the random graph (null model) that the package requires. The graph argument must be of the same type as the object returned by the prepGraph function.

graphRandom <- random(graph=graph)
graphRandom
## IGRAPH 7de5771 U--- 115 613 -- 
## + attr: id (v/n), label (v/c), value (v/n)
## + edges from 7de5771:
##  [1]  1-- 93  1-- 95  1-- 10 17-- 92  1-- 24 17-- 34 36-- 45  1-- 42
##  [9] 66--115 55--113  1-- 91  1--107  2--  9  2-- 28 77-- 95  2-- 72
## [17]  2-- 46  2-- 37  2-- 91 47--102  2-- 24  2-- 29  2-- 88  4-- 33
## [25]  3-- 82  3-- 81  1-- 71  3-- 16  3-- 71 34-- 91  3-- 35  3-- 68
## [33] 79--108 75--113  6-- 74  4--104  4-- 46  4--101 92-- 98  4-- 53
## [41] 21-- 59  4-- 43 32--103 30-- 81 76-- 85  4-- 18  5-- 89  5-- 10
## [49]  5-- 19  5--  7  5-- 40 42--113 64-- 70  5-- 92 33--105  4--  5
## [57]  6-- 94  6-- 12  6-- 70  6-- 75  6-- 97  6-- 46  6-- 91  6-- 98
## + ... omitted several edges
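
How the null model is built is not spelled out here. One common choice, assumed in this sketch purely for illustration, is degree-preserving rewiring: edges are swapped at random while every vertex keeps its degree. This is consistent with the output above, where the random graph has the same number of vertices and edges as the original. The sketch uses igraph's rewire and keeping_degseq on a small built-in network, and the number of rewiring trials is an arbitrary choice.

```r
library(igraph)

set.seed(1)
g <- make_graph("Zachary")   # small built-in example network

# Rewire edges at random while keeping every vertex's degree fixed.
# The number of rewiring trials (10 per edge) is an arbitrary choice.
g_null <- rewire(g, with = keeping_degseq(niter = 10 * ecount(g)))

# The rewired graph keeps the size and degree sequence of the original.
all(degree(g) == degree(g_null))
ecount(g) == ecount(g_null)
```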

Plot graph

To obtain a graphical representation of the network, we use the plotGraph function, implemented with the aid of the networkD3 package.

plotGraph(graph)

Create Community

To create the communities, the package provides all the community detection algorithms implemented in igraph.

methodCommunity(graph=graph, method="fastGreedy") #as community 
## IGRAPH clustering fast greedy, groups: 6, mod: 0.55
## + groups:
##   $`1`
##    [1]   7  14  16  33  40  48  61  65 101 107
##   
##   $`2`
##    [1]   8   9  10  17  22  23  24  42  47  50  52  54  68  69  74  78  79
##   [18]  89 105 109 111 112 115
##   
##   $`3`
##    [1]   1   2  20  26  30  31  34  36  38  46  56  80  81  83  90  94  95
##   [18] 102 104 106 110
##   + ... omitted several groups/vertices
membershipCommunities(graph=graph, method="fastGreedy") #as membership
##   [1] 3 3 5 5 5 5 1 2 2 2 5 5 4 1 4 1 2 6 4 3 6 2 2 2 5 3 4 6 5 3 3 4 1 3 4
##  [36] 3 6 3 4 1 5 2 6 4 6 3 2 1 6 2 5 2 5 2 4 3 6 6 6 6 1 4 6 6 1 6 6 2 2 5
##  [71] 6 4 5 2 5 6 6 2 2 3 3 5 3 5 5 4 6 6 2 3 5 6 6 3 3 6 6 6 5 4 1 3 5 3 2
## [106] 3 1 5 2 3 2 2 6 6 2
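
For readers who want to see the two kinds of result side by side, the sketch below calls igraph directly on a small built-in network. It assumes, without confirmation from the text, that the "fastGreedy" method corresponds to igraph's cluster_fast_greedy. The first object is a communities object, the second a plain membership vector with one label per vertex.

```r
library(igraph)

g <- make_graph("Zachary")       # small built-in example network

comm <- cluster_fast_greedy(g)   # communities object
members <- membership(comm)      # one community label per vertex

length(comm)        # number of communities
modularity(comm)    # modularity of the partition
length(members)     # equals the number of vertices
```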

Community Plot

The plotComm function gives an interactive 3D plot of the communities for the chosen algorithm.

members <- membershipCommunities(graph=graph, method="fastGreedy")
plotComm(graph=graph, members=members)

Procedure to validate the robustness

The robinRobust function runs the ROBIN procedure to validate the robustness of the communities detected in your network. In this example we use the “vi” distance as the stability measure, the independent type of procedure, and the Louvain algorithm for community detection, but users can choose other measures (“nmi”, “split.join”, “adjusted.rand”) and algorithms (“walktrap”, “edgeBetweenness”, “fastGreedy”, “spinglass”, “leadingEigen”, “labelProp”, “infomap”, “optimal”, “other”) implemented in the package. We save the output list in the proc variable for later use.

proc <- robinRobust(graph=graph, graphRandom=graphRandom, measure="vi", 
                  method="louvain", type="independent")
## [1] 31
## [1] 61
## [1] 92
## [1] 123
## [1] 153
## [1] 184
## [1] 215
## [1] 245
## [1] 276
## [1] 306
## [1] 337
## [1] 368
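
The idea behind the procedure can be sketched with igraph alone: perturb the graph by increasing amounts, re-run the detection algorithm, and measure how far the new partition moves from the original one. The sketch below is not the package's implementation. The perturbation scheme (degree-preserving rewiring), the perturbation levels, and the use of a single run per level are all assumptions made for illustration.

```r
library(igraph)

set.seed(1)
g <- make_graph("Zachary")   # small built-in example network

# Reference partition on the unperturbed graph.
base <- as.integer(membership(cluster_louvain(g)))

# Perturb the graph by increasing amounts and measure how far the new
# partition moves from the reference one (VI = 0 means identical partitions).
# The perturbation levels and the rewiring scheme are illustrative choices.
fractions <- c(0.05, 0.20, 0.40, 0.60)
vi <- sapply(fractions, function(p) {
  n_swaps <- round(p * ecount(g))
  g_pert <- rewire(g, with = keeping_degseq(niter = n_swaps))
  pert <- as.integer(membership(cluster_louvain(g_pert)))
  compare(base, pert, method = "vi")
})

data.frame(perturbation = fractions, vi = vi)
```

Running the same loop on a null-model graph would give a second curve to set against the first.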

Robin Plots

To compare the curves obtained from the procedure, we can plot them with the plotRobin function. The y-axis shows the average of the chosen measure, and the x-axis shows the percentage of perturbation, for both the real data and the null model.

The model1 and model2 arguments are, respectively, the Mean and the MeanRandom contained inside the robinRobust output list.

plotRobin(graph=graph, model1=proc$Mean, model2=proc$MeanRandom, 
legend=c("real data", "null model"), measure="vi")

Statistical Tests between Real data and Null model

Now we test the differences between these two curves with:

- Functional data analysis
- Gaussian Process
- Area Under the Curve (AUC)

The model1 and model2 arguments are, respectively, the Mean and the MeanRandom contained inside the robinRobust output list.

robinFDATest(graph=graph, model1=proc$Mean, model2=proc$MeanRandom, 
             measure="vi")
## [1] "First step: basis expansion"
## Swapping 'y' and 'argvals', because 'y' is  simpler,
##   and 'argvals' should be;  now  dim(argvals) =  13 ;  dim(y) =  13 x 20 
## [1] "Second step: joint univariate tests"
## [1] "Third step: interval-wise combination and correction"
## [1] "creating the p-value matrix: end of row 2 out of 9"
## [1] "creating the p-value matrix: end of row 3 out of 9"
## [1] "creating the p-value matrix: end of row 4 out of 9"
## [1] "creating the p-value matrix: end of row 5 out of 9"
## [1] "creating the p-value matrix: end of row 6 out of 9"
## [1] "creating the p-value matrix: end of row 7 out of 9"
## [1] "creating the p-value matrix: end of row 8 out of 9"
## [1] "creating the p-value matrix: end of row 9 out of 9"
## [1] "Interval Testing Procedure completed"

## $ask
## [1] TRUE
robinGPTest(ratio=proc$ratios)
##  Profile  1 
##  Profile  2
## [1] 133.2466
robinAUC(graph=graph, model1=proc$Mean, model2=proc$MeanRandom, 
             measure="vi")
## $area1
## [1] 0.1145604
## 
## $area2
## [1] 0.245196
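
How robinAUC computes its areas is not stated here. As a minimal sketch, assuming a simple trapezoidal rule, the area under a stability curve can be computed in base R as follows. The trapz helper and the two curves are hypothetical and the values are made up for illustration. They are not package output.

```r
# Trapezoidal-rule area under a curve sampled at points x with heights y.
# Hypothetical helper, not part of the package.
trapz <- function(x, y) {
  stopifnot(length(x) == length(y), length(x) >= 2)
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

# Hypothetical stability curves (made-up values, not package output).
x          <- seq(0, 0.6, by = 0.1)
real_curve <- c(0.00, 0.05, 0.10, 0.14, 0.18, 0.21, 0.24)
null_curve <- c(0.00, 0.20, 0.32, 0.40, 0.45, 0.48, 0.50)

area_real <- trapz(x, real_curve)
area_null <- trapz(x, null_curve)
area_real < area_null   # the lower curve has the smaller area
```

With a distance such as “vi”, a smaller area means the curve stays closer to zero across the perturbation levels.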

Comparison of Two Different Methods

Now we want to see which of two algorithms detects the communities better. For example, taking the Fast Greedy and Louvain algorithms, we want to check which one fits our network better. First, we plot the communities found by each algorithm to get an idea of the network of interest.

membersFast <- membershipCommunities(graph=graph, method="fastGreedy")
membersLouv <- membershipCommunities(graph=graph, method="louvain")
plotComm(graph=graph, members=membersFast)
plotComm(graph=graph, members=membersLouv)

Secondly, we run the robinCompare function to compare the two algorithms. As before, we store the output in the comp variable for later use.

comp <- robinCompare(graph=graph, method1="fastGreedy",
                method2="louvain", measure="vi", type="independent")
## [1] 31
## [1] 61
## [1] 92
## [1] 123
## [1] 153
## [1] 184
## [1] 215
## [1] 245
## [1] 276
## [1] 306
## [1] 337
## [1] 368

Thirdly, we plot the curves of the two methods for comparison. The model1 and model2 arguments are, respectively, the Mean1 and the Mean2 contained inside the robinCompare output list.

plotRobin(graph=graph, model1=comp$Mean1, model2=comp$Mean2, measure="vi", 
legend=c("fastGreedy", "louvain"), title="FastGreedy vs Louvain")

In this example, the Louvain algorithm fits the network of interest better, as its stability-measure curve varies less than the one obtained with the Fast Greedy method.

Statistical Tests between two community detection algorithms

Now we test the differences between the two curves, as we did before. For the comparison of two methods, the model1 argument must be the Mean1 and the model2 argument must be the Mean2, both contained inside the robinCompare output list.

robinFDATest(graph=graph, model1=comp$Mean1, model2=comp$Mean2, measure="vi")
## [1] "First step: basis expansion"
## Swapping 'y' and 'argvals', because 'y' is  simpler,
##   and 'argvals' should be;  now  dim(argvals) =  13 ;  dim(y) =  13 x 20 
## [1] "Second step: joint univariate tests"
## [1] "Third step: interval-wise combination and correction"
## [1] "creating the p-value matrix: end of row 2 out of 9"
## [1] "creating the p-value matrix: end of row 3 out of 9"
## [1] "creating the p-value matrix: end of row 4 out of 9"
## [1] "creating the p-value matrix: end of row 5 out of 9"
## [1] "creating the p-value matrix: end of row 6 out of 9"
## [1] "creating the p-value matrix: end of row 7 out of 9"
## [1] "creating the p-value matrix: end of row 8 out of 9"
## [1] "creating the p-value matrix: end of row 9 out of 9"
## [1] "Interval Testing Procedure completed"

## $ask
## [1] TRUE
robinGPTest(ratio=comp$ratios1vs2)
##  Profile  1 
##  Profile  2
## [1] 34.19659
robinAUC(graph=graph, model1=comp$Mean1, model2=comp$Mean2, measure="vi")
## $area1
## [1] 0.1686563
## 
## $area2
## [1] 0.1205174