---
title: "The iPath User's Guide"
author:  
  - name: Kenong Su
    email: kenong.su@emory.edu
  - name: Steve Qin
    email: zhaohui.qin@emory.edu
shorttitle: iPath guide
package: iPath
abstract: >
   This vignette introduces the usage of the Bioconductor package iPath (individualized pathway analysis), which is capable of identifying highly predictive biomarkers for clinical outcomes. It includes two major steps: calculating the personalized iES for each sample and each pathway, and investigating whether stratified tumor samples are associated with clinical. Here, we introduce iPath, or individual-level pathway analysis, to quantify the magnitude of alteration occurring for a particular pathway at the individual sample level. Our goal is to understand cancer one tumor sample at a time outcomes. 
vignette: >
  %\VignetteIndexEntry{The iPath User's Guide}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
  %\usepackage[utf8]{inputenc}
output:
  BiocStyle::html_document:
    toc: true
    toc_float:
      collapsed: true
      smooth_scroll: true
    fig_width: 5
---
<style type="text/css">
  pre:not([class]) {
    background-color: white;
  }
  .tocify-subheader > .tocify-item {
  text-indent: 2px;}
</style>


```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

\vspace{.1in}
# Installation and help

## Install iPath
To install this package, start R (version  > "4.0") and enter:

```{r quickYo, eval = FALSE}
if(!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("iPath")
```

## Help for iPath
If you have any iPath-related  questions, please post to the GitHub Issue section of iPath at https://github.com/suke18/iPath/issues, which will be helpful for the construction of iPath. 

# Introduction
## Background

Identifying biomarkers to predict the clinical outcomes of individual patients is a fundamental problem in clinical oncology. Multiple single-gene biomarkers have already been identified and used in the clinics. However, multiple oncogenes or tumor-suppressor genes are involved during the process of tumorigenesis. Additionally, the efficacy of single-gene biomarkers is limited by the extensively variable expression levels measured by high-throughput assays. In this study, we hypothesize that in individual tumor samples, the disruption of transcription homeostasis in key pathways or gene set plays an important role in tumorigenesis and has profound implications for the patient’s clinical outcome. We devised a computational method named iPath to identify, at the individual sample level, which pathways or gene sets significantly deviate from their norms. We conducted a pan-cancer analysis and demonstrated that iPath is capable of identifying highly predictive biomarkers for clinical outcomes, including overall survival, tumor subtypes, and tumor stage classifications.

## Citation
- For using (**iPath**) feature selection procedure, DOI is 10.1016/j.crmeth.2021.100050
- For downloading TCGA dataset, https://www.bioconductor.org/packages/release/bioc/html/RTCGA.html


# Calculate iES
iPath requires an normalized expression matrix with rows representing the genes and columns representing the samples. To preprocess the expression matrix, iPath filters out the genes depending on standard deviations (sd). Here, we sampled PRAD TCGA dataset for illustration. It is noted that iPath requires a gene set database (GSDB) as another input, which can be obtained by the MSigDB database. 

## Load the data
The `PRAD_data` dataset is loaded with three objects including the RPKM expression matrix (`prad_exprs`), corresponding phenotype information (prad_inds). `prad_inds` is the binary vector with 0 representing normal and 1 representing tumor sample, and simulated clinical dataset (`prad_cli`).
```{r load_data, eval = TRUE,  message=FALSE, results='hide', include = TRUE}
library(iPath)
data(PRAD_data)
dim(prad_exprs)
data(GSDB_example)
head(prad_cli)
```

## Calculate iES per sample per pathway
The core of iPath is to calculate the iES score for each patient and pathway. The function *iES_cal2* requires two input an expression matrix and gene set database (GSDB). The returned matrix contains iES with rows corresponding to the pathways and columns corresponding to the samples. 

```{r cal_iES2, eval = TRUE, message=FALSE, results='hide', include = TRUE}
iES_mat = iES_cal2(Y = prad_exprs, GSDB = GSDB_example)
iES_mat[1:2, 1:4]
```

# Test association with survival outcomes
After computing iES matrix, it is important to investigate whether the classified normal-like and perturbed groups exist significance different in terms of survival outcomes. To perform the classifcaiton in tumor samples, we use normal sampels as reference by fiting a Guassian Mixture. The investigation is conducted for each individual pathway. `iES_surv` function inputs the iES matrix from the `iES_cal2` step, the clinical data, and the binary vector indicating the patient phenotypes; for example, 0 represents normal sample and 1 represents tumor sample. 
```{r survival, eval = TRUE, message=FALSE, include = TRUE}
surv_outcomes = iES_surv(iES_mat = iES_mat, cli = prad_cli, indVec = prad_inds)
head(surv_outcomes)
```
# Data visualization
## waterfall 
We also provide two forms of visualization for iES scores. One is the waterfall plot ranked from the smallest to the largest. 
```{r waterfall, eval = TRUE, message=FALSE, include = TRUE}
water_fall(iES_mat, gs_str = "SimPathway2", indVec = prad_inds)
density_fall(iES_mat, gs_str = "SimPathway2", indVec = prad_inds)
```

## one survival outcomes
```{r onesurvival, eval = TRUE, message=FALSE, include = TRUE}
iES_survPlot(iES_mat = iES_mat, cli = prad_cli, gs_str = "SimPathway1", indVec = prad_inds, title = TRUE)
```

# Session Info
```{r}
sessionInfo()
```