---
title: "scMAGeCK"
author: "Wei Li, Xiaolong Cheng"
date: "Oct 22, 2019"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{scMAGeCK}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

## Introduction

scMAGeCK is a computational model to identify genes associated with multiple expression phenotypes from CRISPR screening coupled with single-cell RNA sequencing data (CROP-seq).

scMAGeCK is based on our previous [MAGeCK](http://mageck.sourceforge.net) and MAGeCK-VISPR models for pooled CRISPR screens, but further extends to scRNA-seq as the readout of the screening experiment. scMAGeCK consists of two modules: scMAGeCK-Robust Rank Aggregation (RRA), a sensitive and precise algorithm to detect genes whose perturbation links to one single marker expression; and scMAGeCK-LR, a linear-regression based approach that unravels the perturbation effects on thousands of gene expressions, especially from cells undergo multiple perturbations.

## Usage

### scmageck_rra

```{r}
    library(scMAGeCK)
    ### BARCODE file contains cell identity information, generated from the cell identity collection step
    BARCODE <- system.file("extdata","barcode_rec.txt",package = "scMAGeCK")
    
    ### RDS can be a Seurat object or local RDS file path that contains the scRNA-seq dataset
    RDS <- system.file("extdata","singles_dox_mki67_v3.RDS",package = "scMAGeCK")
    
    ### Set RRA executable file path. 
    ###    You can generate RRA executable file by following commands:
    ###        wget https://bitbucket.org/weililab/scmageck/downloads/RRA_0.5.9.zip
    ###        unzip RRA_0.5.9.zip
    ###        cd RRA_0.5.9
    ###        make
    RRAPATH <- "/Library/RRA_0.5.9/bin/RRA"
    
    target_gene <- "MKI67"
    
    rra_result <- scmageck_rra(BARCODE=BARCODE, RDS=RDS, GENE=target_gene, RRAPATH=RRAPATH, 
             LABEL='dox_mki67', NEGCTRL=NULL, KEEPTMP=FALSE, PATHWAY=FALSE, SAVEPATH=NULL)
    
```

### scmageck_lr

```{r}
    library(scMAGeCK)
    ### BARCODE file contains cell identity information, generated from the cell identity collection step
    BARCODE <- system.file("extdata","barcode_rec.txt",package = "scMAGeCK")
    ### RDS can be a Seurat object or local RDS file path that contains the scRNA-seq dataset
    RDS <- system.file("extdata","singles_dox_mki67_v3.RDS",package = "scMAGeCK")
    
    lr_result <- scmageck_lr(BARCODE=BARCODE, RDS=RDS, LABEL='dox_scmageck_lr', 
            NEGCTRL = 'NonTargetingControlGuideForHuman', PERMUTATION = 1000, SAVEPATH=NULL, LAMBDA=0.01)
    lr_score <- lr_result[1]
    lr_score_pval <- lr_result[2]
```


## Output

### scmageck_rra

The scmageck_rra function will output the ranking and p values of each perturbed genes, using the RRA program in MAGeCK. Users familiar with the MAGeCK program may find it similar with the [gene_summary](https://sourceforge.net/p/mageck/wiki/output/#gene_summary_txt) output in MAGeCK.

Here is the example output of scMAGeCK-RRA:

    Row.names  items_in_group.low  lo_value.low  p.low  FDR.low goodsgrna.low  items_in_group.high  lo_value.high  p.high  FDR.high  goodsgrna.high
    TP53    271     0.11832 0.95619 1       48      271     1.014e-83       4.9975e-06      0.00015 184

Explanations of each column are below:

|Column|Content|
|------|-------|
|Row.names| Perturbed gene name|
|items_in_group.low| The number of single-cells with each gene perturbed |
|lo_value.low | The RRA score in negative selection (reducing the marker expression if this gene is perturbed). The RRA score uses a p value from rank order statistics to measure the degree of selection; the smaller score, the stronger the selection is. More information on the calculation of RRA score can be found in our original [MAGeCK](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-014-0554-4) paper. |      
|p.low  | The raw p-value (using permutation) of this gene in negative selection  |  
|FDR.low | The false discovery rate of this gene in negative selection |
|goodsgrna.low | The number of single-cells that passes the threshold and is considered in the RRA score calculation in negative selection|
|items_in_group.high| The same as items_in_group.low: the number of single-cells with each gene perturbed) |
|lo_value.high| The RRA score in positive selection (increasing the marker expression if this gene is perturbed|      
|p.high| The raw p-value (using permutation) of this gene in positive selection  |  
|FDR.high| The false discovery rate of this gene in positive selection |
|goodsgrna.high| The number of single-cells that passes the threshold and is considered in the RRA score calculation in positive selection|


### scmageck_lr

The scmageck_lr function will generate several files below:

|File|Description|
|----|----------|
|lr_score|The score (similar with log fold change) of each perturbed gene (rows) on each marker gene (columns)|
|lr_score.pval|The associated p values of each score|
|LR.RData|An R object to store scores and p values|

The format of score.txt and score.pval.txt is a simple table file with rows corresponding to perturbed genes and columns corresponding to marker genes. For example in the score.txt,

    Perturbedgene  APC                ARID1A               TP53               MKI67
         APC       0.138075836476524  -0.0343441660045313  0.214449590551132  -0.150287676553705


This row records the effects of perturbing APC gene on the expressions of APC, ARID1A, TP53 and MKI67.

## Contact us
Questions? Comments? Join the [MAGeCK Google group](https://groups.google.com/forum/?hl=en#!forum/mageck) or email us (<wli2@childrensnational.org>) directly.

Any advice and suggestions will be greatly appreciated.