---
title: "Working with StarBioTrek package"
author: "Claudia Cava, Isabella Castiglioni"
date: "`r Sys.Date()`"
output: 
    BiocStyle::html_document:
        toc: true
        number_sections: false
        toc_depth: 2
        highlight: haddock

references:


- id: ref1
  title: graphite - a Bioconductor package to convert pathway topology to gene network
  author: 
  - family: Sales G, et al.
    given:
  journal: BMC bioinformatics
  volume: 13
  DOI: "10.1186/1471-2105-13-20"
  number: 
  pages: 20
  issued:
    year: 2012 
    
- id: ref2
  title: The GeneMANIA prediction server biological network integration for gene prioritization and predicting gene function
  author: 
  - family: Warde-Farley D, et al.
    given:
  journal: Nucleic Acids Res.
  volume: 38
  number: 
  pages: W214-20
  issued:
    year: 2010 
    
    
- id: ref3
  title: qgraph Network visualizations of relationships in psychometric data
  author: 
  - family:  S. Epskamp,  et al.
    given:
  journal:  Journal of Statistical Software
  volume: 48
  number: 4
  pages: 1-18
  issued:
    year: 2012

- id: ref4
  title: GOplot an R package for visually combining expression data with functional analysis
  author: 
  - family: Walter W, et al.
    given:
  journal: Bioinformatics
  volume: 31
  number: 17
  pages: 2912-4
  issued:
    year: 2015 



- id: ref6
  title: GC-content normalization for RNA-Seq data
  author: 
  - family: Risso, D., Schwartz, K., Sherlock, G., & Dudoit, S. 
    given:
  journal: BMC Bioinformatics
  volume: 12
  number: 1
  pages: 480
  issued:
    year: 2011 

- id: ref7
  title: Identification of a CpG island methylator phenotype that defines a distinct subgroup of glioma
  author: 
  - family: Noushmehr, H., Weisenberger, D.J., Diefes, K., Phillips, H.S., Pujara, K., Berman, B.P., Pan, F., Pelloski, C.E., Sulman, E.P., Bhat, K.P. et al.
    given:
  journal: Cancer cell
  volume: 17
  number: 5
  pages: 510-522
  issued:
    year: 2010

- id: ref8
  title: Molecular Profiling Reveals Biologically Discrete Subsets and Pathways of Progression in Diffuse Glioma
  author: 
  - family: Ceccarelli, Michele and Barthel, Floris P and Malta, Tathiane M and Sabedot, Thais S and Salama, Sofie R and Murray, Bradley A and Morozova, Olena and Newton, Yulia and Radenbaugh, Amie and Pagnotta, Stefano M and others
    given:
  journal: Cell
  URL: "http://doi.org/10.1016/j.cell.2015.12.028"
  DOI: "10.1016/j.cell.2015.12.028"
  volume: 164
  number: 3
  pages: 550-563
  issued:
    year: 2016


- id: ref9
  title: Comprehensive molecular profiling of lung adenocarcinoma
  author: 
  - family: Cancer Genome Atlas Research Network and others
    given:
  journal: Nature
  URL: "http://doi.org/10.1038/nature13385"
  DOI: "10.1038/nature13385"
  volume: 511
  number: 7511
  pages: 543-550
  issued:
    year: 2014


- id: ref10
  title: Comprehensive molecular characterization of gastric adenocarcinoma
  author: 
  - family: Cancer Genome Atlas Research Network and others
    given:
  journal: Nature
  URL: "http://doi.org/10.1038/nature13480"
  DOI: "10.1038/nature13480"
  issued:
    year: 2014

- id: ref11
  title: Comprehensive molecular portraits of human breast tumours
  author: 
  - family: Cancer Genome Atlas Research Network and others
    given:
  journal: Nature
  URL: "http://doi.org/10.1038/nature11412"
  DOI: "10.1038/nature11412"
  volume: 490
  number: 7418
  pages: 61-70
  issued:
    year: 2012
  
- id: ref12
  title: Comprehensive molecular characterization of human colon and rectal cancer
  author: 
  - family: Cancer Genome Atlas Research Network and others
    given:
  journal: Nature
  URL: "http://doi.org/10.1038/nature11252"
  DOI: "10.1038/nature11252"
  volume: 487
  number: 7407
  pages: 330-337
  issued:
    year: 2012    

- id: ref13
  title: Genomic classification of cutaneous melanoma
  author: 
  - family: Cancer Genome Atlas Research Network and others
    given:
  journal: Cell
  URL: "http://doi.org/10.1016/j.cell.2015.05.044"
  DOI: "10.1016/j.cell.2015.05.044"
  volume: 161
  number: 7
  pages: 1681-1696
  issued:
    year: 2015    

- id: ref14
  title: Comprehensive genomic characterization of head and neck squamous cell carcinomas
  author: 
  - family: Cancer Genome Atlas Research Network and others
    given:
  journal: Nature
  URL: "http://doi.org/10.1038/nature14129"
  DOI: "10.1038/nature14129"
  volume: 517
  number: 7536
  pages: 576-582
  issued:
    year: 2015    

- id: ref15
  title: The somatic genomic landscape of chromophobe renal cell carcinoma
  author: 
  - family: Davis, Caleb F and Ricketts, Christopher J and Wang, Min and Yang, Lixing and Cherniack, Andrew D and Shen, Hui and Buhay, Christian and Kang, Hyojin and Kim, Sang Cheol and Fahey, Catherine C and others
    given:
  journal: Cancer Cell
  URL: "http://doi.org/10.1016/j.ccr.2014.07.014"
  DOI: "10.1016/j.ccr.2014.07.014"
  volume: 26
  number: 3
  pages: 319-330
  issued:
    year: 2014    


- id: ref16
  title: Comprehensive genomic characterization of squamous cell lung cancers
  author: 
  - family: Cancer Genome Atlas Research Network and others
    given:
  journal: Nature
  URL: "http://doi.org/10.1038/nature11404"
  DOI: "10.1038/nature11404"
  volume: 489
  number: 7417
  pages: 519-525
  issued:
    year: 2012   

- id: ref17
  title: Integrated genomic characterization of endometrial carcinoma
  author: 
  - family: Cancer Genome Atlas Research Network and others
    given:
  journal: Nature
  URL: "http://doi.org/10.1038/nature12113"
  DOI: "10.1038/nature12113"
  volume: 497
  number: 7447
  pages: 67-73
  issued:
    year: 2013   

- id: ref18
  title: Integrated genomic characterization of papillary thyroid carcinoma
  author: 
  - family: Cancer Genome Atlas Research Network and others
    given:
  journal: Cell
  URL: "http://doi.org/10.1016/j.cell.2014.09.050"
  DOI: "10.1016/j.cell.2014.09.050"
  volume: 159
  number: 3
  pages: 676-690
  issued:
    year: 2014   

- id: ref19
  title: The molecular taxonomy of primary prostate cancer
  author: 
  - family: Cancer Genome Atlas Research Network and others
    given:
  journal: Cell
  URL: "http://doi.org/10.1016/j.cell.2015.10.025"
  DOI: "10.1016/j.cell.2015.10.025"
  volume: 163
  number: 4
  pages: 1011-1025
  issued:
    year: 2015   
    

- id: ref20
  title: Comprehensive Molecular Characterization of Papillary Renal-Cell Carcinoma
  author: 
  - family: Linehan, W Marston and Spellman, Paul T and Ricketts, Christopher J and Creighton, Chad J and Fei, Suzanne S and Davis, Caleb and Wheeler, David A and Murray, Bradley A and Schmidt, Laura and Vocke, Cathy D and others
    given:
  journal: NEW ENGLAND JOURNAL OF MEDICINE
  URL: "http://doi.org/10.1056/NEJMoa1505917"
  DOI: "10.1056/NEJMoa1505917"
  volume: 374
  number: 2
  pages: 135-145
  issued:
    year: 2016    
    
- id: ref21
  title: Comprehensive molecular characterization of clear cell renal cell carcinoma
  author: 
  - family: Cancer Genome Atlas Research Network and others
    given:
  journal: Nature
  URL: "http://doi.org/10.1038/nature12222"
  DOI: "10.1038/nature12222"
  volume: 499
  number: 7456
  pages: 43-49
  issued:
    year: 2013        
          
- id: ref22
  title: Comprehensive Pan-Genomic Characterization of Adrenocortical Carcinoma
  author: 
  - family: Cancer Genome Atlas Research Network and others
    given:
  journal: Cancer Cell
  URL: "http://dx.doi.org/10.1016/j.ccell.2016.04.002"
  DOI: "10.1016/j.ccell.2016.04.002"
  volume: 29
  number: 5
  pages: 723-736
  issued:
    year: 2016 

- id: ref23
  title: Complex heatmaps reveal patterns and correlations in multidimensional genomic data
  author: 
  - family: Gu, Zuguang and Eils, Roland and Schlesner, Matthias
  given:
  journal: Bioinformatics
  URL: "http://dx.doi.org/10.1093/bioinformatics/btw313"
  DOI: "10.1093/bioinformatics/btw313"
  pages: "btw313"
  issued:
   year: 2016 

- id: ref24
  title: "TCGA Workflow: Analyze cancer genomics and epigenomics data using Bioconductor packages"
  author: 
  - family:  Silva, TC and Colaprico, A and Olsen, C and D'Angelo, F and Bontempi, G and Ceccarelli, M and Noushmehr, H
  given:
  journal: F1000Research
  URL: "http://dx.doi.org/10.12688/f1000research.8923.1"
  DOI: "10.12688/f1000research.8923.1"
  volume: 5
  number: 1542
  issued:
   year: 2016 

- id: ref25
  title: "TCGAbiolinks: an R/Bioconductor package for integrative analysis of TCGA data"
  author: 
  - family:  Colaprico, Antonio and Silva, Tiago C. and Olsen, Catharina and Garofano, Luciano and Cava, Claudia and Garolini, Davide and Sabedot, Thais S. and Malta, Tathiane M. and Pagnotta, Stefano M. and Castiglioni, Isabella and Ceccarelli, Michele and Bontempi, Gianluca and Noushmehr, Houtan
  given:
  journal: Nucleic Acids Research
  URL: "http://dx.doi.org/10.1093/nar/gkv1507"
  DOI: "10.1093/nar/gkv1507"
  volume: 44
  number: 8
  pages: e71
  issued:
   year: 2016 

vignette: >
  %\VignetteIndexEntry{Working with StarBioTrek package}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---
  
```{r setup, include=FALSE}
knitr::opts_chunk$set(dpi = 300)
knitr::opts_chunk$set(cache=FALSE)
```

```{r, echo = FALSE, message = FALSE, warning = FALSE}
library(StarBioTrek)
```
# Introduction 

Motivation: 
New technologies have made it possible to identify marker gene signatures. However, gene expression-based signatures have limitations: they do not consider the metabolic role of the genes, and they are affected by genetic heterogeneity across patient cohorts. Considering the activity of entire pathways rather than the expression levels of individual genes can be a way to overcome these limitations [@ref12].
`StarBioTrek` provides methods to measure pathway activity and cross-talk among pathways, integrating network information and TCGA data. New measures are under development.  

# Installation

To install use the code below.

```{r, eval = FALSE}
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("StarBioTrek")
```



# `Get data`: Get pathway and network data

## `SELECT_path_species`: Select the pathway database and species of interest

The user can select the pathway database and species of interest using functions implemented in the `graphite` package [@ref1].

```{r, eval = TRUE}
library(graphite)
sel<-pathwayDatabases()
```

```{r, eval = TRUE, echo = FALSE}
knitr::kable(sel, digits = 2,
             caption = "List of pathway databases and species", row.names = FALSE)
```

## `GetData`: Searching pathway data for download 

The user can easily search pathway data and the genes they contain using the `GetData` function. It can download pathways from several databases and species using the following parameters:

```{r, eval = TRUE}
species="hsapiens"
pathwaydb="kegg"
path<-GetData(species,pathwaydb)
```


## `GetPathData`: Get genes inside pathways

The user can identify the genes inside the pathways of interest.

```{r, eval = FALSE}
pathway_ALLGENE<-GetPathData(path_ALL=path[1:3])
```

## `GetPathNet`: Get interacting genes inside pathways

`GetPathNet` generates a list of interacting genes for each pathway.


```{r, eval = FALSE}
pathway_net<-GetPathNet(path_ALL=path[1:3])
```




## `ConvertedIDgenes`: Convert gene IDs into gene symbols

The user can convert the gene IDs of each pathway into gene symbols.

```{r, eval = TRUE}
pathway<-ConvertedIDgenes(path_ALL=path[1:10])
```
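
As a rough illustration of the conversion step, the sketch below maps Entrez-style identifiers to gene symbols with a small lookup table in base R. The table `id_to_symbol`, the list `toy_pathways` and the placeholder identifier `"0000"` are made up for this example; they are not part of `StarBioTrek`, and this is not how `ConvertedIDgenes` is implemented.

```{r, eval = FALSE}
# Hypothetical lookup table: Entrez gene ID -> gene symbol
id_to_symbol <- c("7157" = "TP53", "672" = "BRCA1", "1956" = "EGFR")

# Hypothetical pathways given as vectors of Entrez IDs;
# "0000" stands for an identifier that is missing from the table
toy_pathways <- list(
  pathway_A = c("7157", "672"),
  pathway_B = c("1956", "0000")
)

# Translate each pathway and drop identifiers that have no symbol
toy_symbols <- lapply(toy_pathways, function(ids) {
  symbols <- unname(id_to_symbol[ids])
  symbols[!is.na(symbols)]
})

toy_symbols
```

In practice the mapping would come from an annotation resource rather than a hand-written table.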




## `getNETdata`: Searching network data for download 
You can easily search network data from GeneMANIA using the `getNETdata` function [@ref2].
The network category can be selected with one of the following values: 

* **PHint** Physical_interactions 
* **COloc**  Co-localization 
* **GENint** Genetic_interactions 
* **PATH** Pathway 
* **SHpd**  Shared_protein_domains 

The species can be selected with one of the following values:

* **Arabidopsis_thaliana**
* **Caenorhabditis_elegans**
* **Danio_rerio**
* **Drosophila_melanogaster**
* **Escherichia_coli**
* **Homo_sapiens**
* **Mus_musculus**
* **Rattus_norvegicus**
* **Saccharomyces_cerevisiae**

By default, the organism is *Homo sapiens*. 
The example shows the shared protein domain network for *Saccharomyces cerevisiae*. For more information, see the `SpidermiR` package.

```{r, eval = TRUE}
organismID="Saccharomyces_cerevisiae"
netw<-getNETdata(network="SHpd",organismID)
```


# `Integration data`: Integration between pathway and network data 

## `pathnet`: Network of interacting genes for each pathway according to a network type (PHint, COloc, GENint, PATH, SHpd)

The function `pathnet` creates a network of interacting genes (downloaded from GeneMANIA) for each pathway. Interacting genes are genes that belong to the same pathway and are connected in the network chosen by the user through the parameters of the function `getNETdata`.
The output is a network of genes belonging to the same pathway.  

```{r, eval = TRUE}
lista_net<-pathnet(genes.by.pathway=pathway[1:5],data=netw)
```
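
The filtering idea described in this section can be sketched in base R as follows. The edge list `toy_network`, its column names and the gene names are invented for illustration; the objects returned by `getNETdata` and `pathnet` may be structured differently, so this is only a sketch of the idea, not the package's implementation.

```{r, eval = FALSE}
# Hypothetical network: one row per interaction between two genes
toy_network <- data.frame(
  gene_a = c("G1", "G1", "G3", "G5"),
  gene_b = c("G2", "G3", "G4", "G6"),
  stringsAsFactors = FALSE
)

# Hypothetical pathways given as vectors of gene names
toy_pathways <- list(
  P1 = c("G1", "G2", "G3"),
  P2 = c("G3", "G4", "G7")
)

# For each pathway, keep only the edges whose two endpoints
# both belong to that pathway
toy_net_by_pathway <- lapply(toy_pathways, function(genes) {
  keep <- toy_network$gene_a %in% genes & toy_network$gene_b %in% genes
  toy_network[keep, , drop = FALSE]
})

toy_net_by_pathway
```

Here `P1` keeps the edges G1-G2 and G1-G3, while `P2` keeps only G3-G4, because G7 has no partner in the toy network.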


## `listpathnet`: List of interacting genes for each pathway (list of genes) according to a network type (PHint, COloc, GENint, PATH, SHpd)

The function `listpathnet` creates a list of interacting genes for each pathway. Interacting genes are genes that belong to the same pathway and are connected in the network chosen by the user through the parameters of the function `getNETdata`.
The output is, for each pathway, the list of genes that belong to the pathway and have an interaction in the network.  

```{r, eval = TRUE}
list_path<-listpathnet(lista_net=lista_net,pathway=pathway[1:5])
```
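
A minimal sketch of the step from per-pathway networks to per-pathway gene lists is given below. The list `toy_edges` and its column names are assumptions made for this example and do not reflect the actual structure of `lista_net`.

```{r, eval = FALSE}
# Hypothetical per-pathway edge lists (one data frame per pathway)
toy_edges <- list(
  P1 = data.frame(gene_a = c("G1", "G1"), gene_b = c("G2", "G3"),
                  stringsAsFactors = FALSE),
  P2 = data.frame(gene_a = "G3", gene_b = "G4",
                  stringsAsFactors = FALSE)
)

# Collect every gene that takes part in at least one interaction
toy_gene_list <- lapply(toy_edges, function(edges) {
  unique(c(edges$gene_a, edges$gene_b))
})

toy_gene_list
```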






# `Pathway summary indexes`: Score for each pathway 

## `GE_matrix`: grouping gene expression profiles in pathways
`GE_matrix` takes human KEGG pathway data and a gene expression matrix and returns the gene expression levels grouped by pathway.

Starting from a gene expression matrix (rows are genes and columns are samples, TCGA data), the function `GE_matrix` creates a profile of gene expression levels for each pathway given by the user:

```{r, eval = TRUE}
list_path_gene<-GE_matrix(DataMatrix=tumo[,1:2],genes.by.pathway=pathway[1:10])
```
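
To make the grouping concrete, the following base R sketch splits a small expression matrix by pathway membership. The matrix `toy_expr` and the list `toy_pathways` are invented for this example, and the output format is an assumption; `GE_matrix` may return its result in a different shape.

```{r, eval = FALSE}
# Hypothetical expression matrix: genes in rows, samples in columns
toy_expr <- matrix(
  c(1, 2,
    3, 4,
    5, 6,
    7, 8),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("G1", "G2", "G3", "G4"), c("S1", "S2"))
)

# Hypothetical pathways; G9 is not measured in toy_expr
toy_pathways <- list(P1 = c("G1", "G2"), P2 = c("G3", "G4", "G9"))

# One sub-matrix per pathway, restricted to the measured genes
toy_by_pathway <- lapply(toy_pathways, function(genes) {
  toy_expr[intersect(genes, rownames(toy_expr)), , drop = FALSE]
})

toy_by_pathway
```

Genes that are listed in a pathway but absent from the expression matrix (G9 here) are silently dropped in this sketch.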




## `GE_matrix_mean`: Mean gene expression for the genes in each pathway
`GE_matrix_mean` takes human KEGG pathway data and a gene expression matrix and returns a PXG matrix (pathways in the columns and genes in the rows) containing the mean gene expression of only those genes that belong to the pathways given as input by the user.

```{r, eval = TRUE}
list_path_plot<-GE_matrix_mean(DataMatrix=tumo[,1:2],genes.by.pathway=pathway[1:10])
```



## `average`: Average of genes for each pathway starting from a matrix of gene expression 
Starting from a gene expression matrix (rows are genes and columns are samples, TCGA data), the function `average` creates an average matrix (SXP: S are the samples and P the pathways) of gene expression for each pathway:

```{r, eval = FALSE}
score_mean<-average(pathwayexpsubset=list_path_gene)
```
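
The samples-by-pathways layout can be illustrated with a short base R sketch. The objects `toy_expr` and `toy_pathways` are made up, and the code only mirrors the description above; it is not the implementation of `average`.

```{r, eval = FALSE}
# Hypothetical expression matrix: genes in rows, samples in columns
toy_expr <- matrix(
  c(1, 2,
    3, 4,
    5, 6,
    7, 8),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("G1", "G2", "G3", "G4"), c("S1", "S2"))
)
toy_pathways <- list(P1 = c("G1", "G2"), P2 = c("G3", "G4"))

# For each pathway, average its genes within every sample;
# sapply binds the per-pathway vectors into a samples x pathways matrix
toy_mean <- sapply(toy_pathways, function(genes) {
  colMeans(toy_expr[intersect(genes, rownames(toy_expr)), , drop = FALSE])
})

toy_mean
```

In this toy case, pathway `P1` has a mean of 2 in sample `S1` (genes with values 1 and 3) and pathway `P2` has a mean of 7 in sample `S2` (values 6 and 8).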



## `stdv`: Standard deviations of genes for each pathway starting from a matrix of gene expression 
Starting from a matrix of gene expression (rows are genes and columns are samples, TCGA data) the function `stdv` creates a standard deviation matrix of gene expression for each pathway:

```{r, eval = TRUE}
score_st_dev<-stdv(gslist=list_path_gene)
```
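
A comparable sketch for the standard deviation is shown below, again on invented data (`toy_expr`, `toy_pathways`) and without any claim about how `stdv` works internally. It also shows an edge case worth keeping in mind: a pathway with a single measured gene has no defined standard deviation.

```{r, eval = FALSE}
# Hypothetical expression matrix: genes in rows, samples in columns
toy_expr <- matrix(
  c(1, 2,
    3, 4,
    5, 6,
    9, 8),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("G1", "G2", "G3", "G4"), c("S1", "S2"))
)
toy_pathways <- list(P1 = c("G1", "G2"), P2 = c("G3", "G4"), P3 = "G1")

# Standard deviation of the pathway genes within every sample
toy_sd <- sapply(toy_pathways, function(genes) {
  apply(toy_expr[intersect(genes, rownames(toy_expr)), , drop = FALSE], 2, sd)
})

toy_sd  # column P3 is NA because sd() of a single value is undefined
```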




# `Pathway cross-talk indexes`: Score for pairwise pathways 




## `eucdistcrtlk`: Euclidean distance for cross-talk measure
Starting from a gene expression matrix (rows are genes and columns are samples, TCGA data), the function `eucdistcrtlk` creates a Euclidean distance matrix of gene expression for each pair of pathways.

```{r, eval = FALSE}
score_euc_distance<-eucdistcrtlk(dataFilt=tumo[,1:2],pathway_exp=pathway[1:10])
```


## `dsscorecrtlk`: Discriminating score for cross-talk measure
Starting from a gene expression matrix (rows are genes and columns are samples, TCGA data), the function `dsscorecrtlk` creates a discriminating score matrix for each pair of pathways as a measure of cross-talk. The discriminating score is given by $\lvert M_1 - M_2 \rvert / (S_1 + S_2)$, where $M_1$ and $M_2$ are the means and $S_1$ and $S_2$ the standard deviations of the expression levels of the genes in pathway 1 and in pathway 2, respectively.


```{r, eval = FALSE}
cross_talk_st_dv<-dsscorecrtlk(dataFilt=tumo[,1:2],pathway_exp=pathway[1:10])
```
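
The discriminating score can be worked through on a small example. The sketch below computes it for every pair of pathways within each sample, using invented data (`toy_expr`, `toy_pathways`) and a hypothetical helper `pathway_stat`; the row and column layout is an assumption and may differ from the output of `dsscorecrtlk`.

```{r, eval = FALSE}
# Hypothetical expression matrix: genes in rows, samples in columns
toy_expr <- matrix(
  c(1, 2,
    3, 4,
    5, 6,
    9, 8),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("G1", "G2", "G3", "G4"), c("S1", "S2"))
)
toy_pathways <- list(P1 = c("G1", "G2"), P2 = c("G3", "G4"),
                     P3 = c("G2", "G3"))

# Hypothetical helper: per-sample summary (mean or sd) of one pathway
pathway_stat <- function(genes, fun) {
  apply(toy_expr[intersect(genes, rownames(toy_expr)), , drop = FALSE], 2, fun)
}

# All unordered pairs of pathways, one pair per column
pairs <- combn(names(toy_pathways), 2)

# Discriminating score |M1 - M2| / (S1 + S2) for each pair and sample
ds <- apply(pairs, 2, function(p) {
  m1 <- pathway_stat(toy_pathways[[p[1]]], mean)
  m2 <- pathway_stat(toy_pathways[[p[2]]], mean)
  s1 <- pathway_stat(toy_pathways[[p[1]]], sd)
  s2 <- pathway_stat(toy_pathways[[p[2]]], sd)
  abs(m1 - m2) / (s1 + s2)
})
colnames(ds) <- paste(pairs[1, ], pairs[2, ], sep = "_")

# Rows are pathway pairs, columns are samples
toy_ds <- t(ds)
toy_ds
```

For the pair `P1_P2` in sample `S2`, the means are 3 and 7 and both standard deviations equal the square root of 2, so the score is 4 divided by twice that value, which is again the square root of 2.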


# `Selection of pathway cross-talk`: Selection of pathway cross-talk

## `svm_classification`: SVM classification

Given the substantial difference in the activities of many pathways between two classes (e.g. normal and cancer), we examined how effectively the two classes can be distinguished on the basis of their pairwise pathway profiles. 
This function is used to find the interacting pathways that are altered in a particular pathology, ranked in terms of Area Under the Curve (AUC). The AUC is estimated by k-fold cross-validation (k = 10). The function randomly selects a fraction of the TCGA data (e.g. `nf = 60`, i.e. 60% of the original dataset) to form the training set and assigns the remaining samples (40% of the original dataset) to the testing set. For each pair of pathways, the user can obtain a score matrix with the methods described above (e.g. `dev_std_crtlk`) and can focus on the pairs of pathways that are able to differentiate a particular subtype from the normal type.

```{r, eval = FALSE}
nf <- 60
res_class <- svm_classification(TCGA_matrix = score_euc_distance[1:30, ], nfs = nf,
                                normal = colnames(norm[, 1:10]), tumour = colnames(tumo[, 1:10]))
```
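
The evaluation scheme (random training/testing split followed by an AUC estimate) can be sketched with base R only. Note the assumptions: the data are simulated, a single random split is shown instead of repeated cross-validation, and a logistic regression fitted with `glm` stands in for the SVM so that no extra package is needed. The helper `auc` is hypothetical and uses the rank-based (Mann-Whitney) formulation.

```{r, eval = FALSE}
set.seed(1)

# Simulated score of one pathway pair for 50 normal (0) and 50 tumour (1) samples
n <- 50
toy_data <- data.frame(
  score = c(rnorm(n, mean = 0), rnorm(n, mean = 1.5)),
  label = rep(c(0, 1), each = n)
)

# Random split: nf percent of the samples for training, the rest for testing
nf <- 60
train_idx <- sample(nrow(toy_data), size = round(nrow(toy_data) * nf / 100))
train <- toy_data[train_idx, ]
test <- toy_data[-train_idx, ]

# Stand-in classifier (logistic regression, NOT an SVM)
fit <- glm(label ~ score, data = train, family = binomial)
pred <- predict(fit, newdata = test, type = "response")

# Rank-based AUC: probability that a tumour sample gets a higher
# predicted value than a normal sample
auc <- function(pred, label) {
  r <- rank(pred)
  n_pos <- sum(label == 1)
  n_neg <- sum(label == 0)
  (sum(r[label == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

auc_test <- auc(pred, test$label)
auc_test
```

Repeating the split several times and averaging the resulting AUC values would give a more stable estimate than a single split.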



# `IPPI`: Driver genes for each pathway

The function `IPPI` uses pathway and network data to identify the driver genes for each pathway. For details, see Cava et al., BMC Genomics, 2017.  

```{r, eval = FALSE}
 DRIVER_SP<-IPPI(pathax=pathway_matrix[,1:3],netwa=netw_IPPI[1:50000,])
```

# `Visualization`: Gene interactions and pathways

`StarBioTrek` provides several functions that prepare data for the visualization of gene-gene interactions and pathway cross-talk with the `qgraph` package [@ref3]. The function `plotcrosstalk` prepares the data:

```{r, eval = TRUE}
formatplot<-plotcrosstalk(pathway_plot=pathway[1:6],gs_expre=tumo)
library(qgraph)
qgraph(formatplot[[1]], minimum = 0.25, cut = 0.6, vsize = 5,
       groups = formatplot[[2]], legend = TRUE, borders = FALSE,
       layoutScale = c(0.8, 0.8))
```

```{r, eval = TRUE}
qgraph(formatplot[[1]],groups=formatplot[[2]], layout="spring", diag = FALSE,
cut = 0.6,legend.cex = 0.5,vsize = 6,layoutScale=c(0.8,0.8))
```


A circle plot can be generated using the function `circleplot` [@ref4]. A score can be assigned to each gene.

```{r, eval = FALSE}
formatplot<-plotcrosstalk(pathway_plot=pathway[1:6],gs_expre=tumo)
score<-runif(length(formatplot[[2]]), min=-10, max=+10)
circleplot(preplot=formatplot,scoregene=score)
```

```{r, fig.width=6, fig.height=4, echo=FALSE, fig.align="center"}
library(png)
library(grid)
img <- readPNG("circleplot.png")
grid.raster(img)
```


******

### Session Information
******
```{r sessionInfo}
sessionInfo()
```

# References