---
title: "Copy number analysis"
author: "Anand Mayakonda"
date: "`r Sys.Date()`"
output: 
  html_document:
    toc: true
    toc_depth: 3
    toc_float: true
    number_sections: true
    self_contained: yes
    css: corp-styles.css
    highlight: pygments
vignette: >
  %\VignetteIndexEntry{04: Copy number analysis}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r}
library(maftools)
```

Note: This vignette was not evaluated.

# Introduction

`maftools` provides a set of functions to facilitate copy number analysis using [ASCAT](https://github.com/VanLoo-lab/ascat) for tumor-normal or tumor-only WGS datasets.
Although there exists [ascatNgs](https://github.com/cancerit/ascatNgs), it requires the installation of Perl and C modules to fetch the read counts across the markers. `maftools` bypass these requirements entirely within R with the C code baked in. However, `maftools` only generates the required read counts, BAF, and logR files. Downstream analyses have to be done with [ASCAT](https://github.com/VanLoo-lab/ascat). 

ASCAT is not available on CRAN or Bioconductor and needs to be installed from [GitHub](https://github.com/VanLoo-lab/ascat)

```{r, eval=FALSE}
remotes::install_github(repo = 'VanLoo-lab/ascat/ASCAT')
```

If you use `maftools` functions for CNV analysis, please cite the ASCAT publication

------------------------------------------------------------------------
***Van Loo P, Nordgard SH, LingjÃ¦rde OC, et al. Allele-specific copy number analysis of tumors. Proc Natl Acad Sci U S A. 2010;107(39):16910-16915. doi:10.1073/pnas.1009843107***
------------------------------------------------------------------------

# Step-1: Get nucleotide counts for genetic markers 

Below command will generate two tsv files `tumor_nucleotide_counts.tsv` and `normal_nucleotide_counts.tsv` that can be used for downstream analysis. Note that the function will process ~900K SNPs from [Affymetrix Genome-Wide Human SNP 6.0 Array](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL6801). The process can be sped up linearly by increasing `nthreads` which will launch each chromosome on a separate thread. 
Currently `hg19` and `hg38` are supported.

```{r, eval=FALSE}
#Matched normal BAM files are strongly recommended
counts = maftools::gtMarkers(t_bam = "tumor.bam",
                             n_bam = "normal.bam",
                             build = "hg19")
```

# Step-2: Prepare input files for ASCAT with `prepAscat()`

## Tumor-Normal pair

Below command takes `tumor_nucleotide_counts.tsv` and `normal_nucleotide_counts.tsv` files, filter SNPs with low coverage (default <15), estimate BAF, logR, and generates the input files for ASCAT.


```{r, eval=FALSE}
library(ASCAT)
ascat.bc = maftools::prepAscat(t_counts = "tumor_nucleotide_counts.tsv",
                               n_counts = "normal_nucleotide_counts.tsv",
                               sample_name = "tumor")

# Library sizes:
# Tumor:  1830168947
# Normal: 1321201848
# Library size difference: 1.385
# ------
# Counts file: tumor_nucleotide_counts.tsv
# Markers: 932148
# Removed 2982 duplicated loci
# Markers > 15: 928607
# ------
# Counts file: normal_nucleotide_counts.tsv
# Markers: 932148
# Removed 2982 duplicated loci
# Markers > 15: 928311
# ------
# Final number SNPs: 928107
# Generated following files:
# tumor_nucleotide_counts.tumour.BAF.txt
# tumor_nucleotide_counts.tumour.logR.txt
# tumor_nucleotide_counts.normal.BAF.txt
# tumor_nucleotide_counts.normal.logR.txt
# ------
```

Generated BAF and logR files can be processed with [ASCAT functions](https://www.crick.ac.uk/research/labs/peter-van-loo/software). 
The below code chunk shows minimal usage with ASCAT. See [here](https://github.com/VanLoo-lab/ascat/tree/master/ExampleData) for further workflow examples. 

```{r, eval=FALSE}

ascat.bc = ASCAT::ascat.loadData(
  Tumor_LogR_file = "tumor_nucleotide_counts.tumour.logR.txt",
  Tumor_BAF_file = "tumor_nucleotide_counts.tumour.BAF.txt",
  Germline_LogR_file = "tumor_nucleotide_counts.normal.logR.txt",
  Germline_BAF_file = "tumor_nucleotide_counts.normal.BAF.txt",
  chrs = c(1:22, "X", "Y"),
  sexchromosomes = c("X", "Y")
)

ASCAT::ascat.plotRawData(ASCATobj = ascat.bc, img.prefix = "tumor")
ascat.bc = ASCAT::ascat.aspcf(ascat.bc)
ASCAT::ascat.plotSegmentedData(ascat.bc)
ascat.output = ASCAT::ascat.runAscat(ascat.bc) 
```

## Tumor only

In tumor-only mode, read counts are normalized for median depth of coverage across autosomes. 

```{r, eval=FALSE}
ascat.bc = maftools::prepAscat_t(t_counts = "tumor_nucleotide_counts.tsv", sample_name = "tumor_only")

# Library sizes:
# Tumor: 1830168947
# Counts file: tumor_nucleotide_counts.tsv
# Markers: 932148
# Removed 2982 duplicated loci
# Markers > 15: 928607
# Median depth of coverage (autosomes): 76
# ------
# Generated following files:
# tumor_only.tumour.BAF.txt
# tumor_only.tumour.logR.txt
# ------
```

The output logR and BAF files can be processed with _ASCAT without matched normal data protocol_:

```{r, eval=FALSE}
ascat.bc = ASCAT::ascat.loadData(
  Tumor_LogR_file = "tumor_only.tumour.logR.txt",
  Tumor_BAF_file = "tumor_only.tumour.BAF.txt",
  chrs = c(1:22, "X", "Y"),
  sexchromosomes = c("X", "Y")
)

ASCAT::ascat.plotRawData(ASCATobj = ascat.bc, img.prefix = "tumor_only")
ascat.gg = ASCAT::ascat.predictGermlineGenotypes(ascat.bc) 
ascat.bc = ASCAT::ascat.aspcf(ascat.bc, ascat.gg=ascat.gg) 
ASCAT::ascat.plotSegmentedData(ascat.bc)
ascat.output = ASCAT::ascat.runAscat(ascat.bc) 
```

## CBS segmentation

Alternatively, tumor logR files generated by `prepAscat()`/`prepAscat_t()` can be processed with `segmentLogR()` function that performs circular binary segmentation and returns the [DNAcopy](https://bioconductor.org/packages/release/bioc/html/DNAcopy.html) object.

```{r, eval=FALSE}
maftools::segmentLogR(tumor_logR = "tumor.tumour.logR.txt", sample_name = "tumor")

# Analyzing: tumor 
#   current chromosome: 1 
#   current chromosome: 2 
#   current chromosome: 3 
#   current chromosome: 4 
#   current chromosome: 5 
#   current chromosome: 6 
#   current chromosome: 7 
#   current chromosome: 8 
#   current chromosome: 9 
#   current chromosome: 10 
#   current chromosome: 11 
#   current chromosome: 12 
#   current chromosome: 13 
#   current chromosome: 14 
#   current chromosome: 15 
#   current chromosome: 16 
#   current chromosome: 17 
#   current chromosome: 18 
#   current chromosome: 19 
#   current chromosome: 20 
#   current chromosome: 21 
#   current chromosome: 22 
#   current chromosome: MT 
#   current chromosome: X 
#   current chromosome: Y 
# Segments are written to: tumor_only.tumour_cbs.seg
# Segments are plotted to: tumor_only.tumour_cbs.png
```

# Processing Mosdepth output

[Mosdepth](https://github.com/brentp/mosdepth) offers the fastest way to estimate coverage metrics from WGS bam files. Output generated by mosdepth can be processed with maftools function `plotMosdepth` and `plotMosdepth_t` for CNV analysis by performing segmentation and plotting.

Below `mosdepth` command generates `tumor.regions.bed.gz` and `normal.regions.bed.gz` that contains depth of coverage across the genome in fixed windows.

```
mosdepth -n -b 5000 tumor tumor.bam
mosdepth -n -b 5000 normal normal.bam
```

The output `{prefix}.regions.bed.gz` can be imported and analyzed with `maftools` in tumor/normal or tumor only mode.

If you use the functions for CNV analysis, please cite the mosdepth publication

------------------------------------------------------------------------
***Pedersen BS, Quinlan AR. Mosdepth: quick coverage calculation for genomes and exomes. Bioinformatics. 2018;34(5):867-868. doi:10.1093/bioinformatics/btx699***
------------------------------------------------------------------------

## Tumor normal pair

```{r, eval=FALSE}
plotMosdepth(
  t_bed = "tumor.regions.bed.gz",
  n_bed = "normal.regions.bed.gz",
  segment = TRUE,
  sample_name = "tumor"
)

# Coverage ratio T/N: 1.821
# Running CBS segmentation:
# Analyzing: tumor01 
#   current chromosome: 1 
#   current chromosome: 2 
#   current chromosome: 3 
#   current chromosome: 4 
#   current chromosome: 5 
#   current chromosome: 6 
#   current chromosome: 7 
#   current chromosome: 8 
#   current chromosome: 9 
#   current chromosome: 10 
#   current chromosome: 11 
#   current chromosome: 12 
#   current chromosome: 13 
#   current chromosome: 14 
#   current chromosome: 15 
#   current chromosome: 16 
#   current chromosome: 17 
#   current chromosome: 18 
#   current chromosome: 19 
#   current chromosome: 20 
#   current chromosome: 21 
#   current chromosome: 22 
#   current chromosome: X 
#   current chromosome: Y 
# Segments are written to: tumor01_cbs.seg
# Plotting
```

<p align="left">
<img src="https://user-images.githubusercontent.com/8164062/160853005-4f79ae83-8fb1-493c-b240-2b794274c36d.png" height="350" width="700">
</p>

## Tumor only 

Above tumor sample without the germline control, normalized for median depth of coverage

```{r, eval=FALSE}
plotMosdepth_t(bed = "tumor.regions.bed.gz")
```

<p align="left">
<img src="https://user-images.githubusercontent.com/8164062/160853684-85021668-a515-4e3d-8ccb-dddf992098d9.png" height="320" width="700">
</p>

# Session Info

```{r}
sessionInfo()
```