---
title: "LACHESIS - Real-time inference of evolutionary dynamics during tumor initiation based on whole genome sequencing data"
date: "`r format(Sys.Date(), '%m/%d/%Y')`"
abstract: >
  Understanding tumor evolution, particularly the temporal order of driver mutations, is increasingly important for improving diagnosis and treatment strategies. A recent study by Körber et al. (Nature Genetics, 2023) identified a bimodal distribution in the timing of the most recent common ancestor (MRCA) in neuroblastomas. This early or late MRCA distinction strongly correlated with clinical outcome, suggesting its potential as a prognostic biomarker. To facilitate the application of these findings, we developed LACHESIS, a package to estimate MRCA timing using the molecular clock of somatic single nucleotide variant (SSNV) accumulation. LACHESIS is based on the statistical model introduced by Körber et al. and provides tools for analyzing tumor evolution. This vignette details the functionality of the package and demonstrates its application through typical workflows, enabling users to integrate MRCA timing into their analyses.
  LACHESIS package version: `r packageVersion("LACHESIS")`

output:
  BiocStyle::html_document:
    highlight: pygments
    toc: true
    toc_float:
      toc_collapsed: true
    toc_depth: 3
    number_sections: false
    fig_width: 5
vignette: >
    %\VignetteIndexEntry{LACHESIS - Real-time inference of evolutionary dynamics during tumor initiation based on whole genome sequencing data}
    %\VignetteEngine{knitr::rmarkdown}
    %\VignetteEncoding{UTF-8}
    %\usepackage[utf8]{inputenc}
---

```{r setup, echo=FALSE, results="hide"}
knitr::opts_chunk$set(
    tidy = FALSE,
    cache = FALSE,
    dev = "CairoPNG",
    message = FALSE, error = FALSE, warning = TRUE
)
```

# Installation

To install this package, start R and enter:

```{r install, eval=FALSE}
if (!require("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}

BiocManager::install("LACHESIS")
```

For older versions of R, please refer to the appropriate [Bioconductor release.](https://bioconductor.org/about/release-announcements/)

# Citation

If you use LACHESIS in published research, please cite:

  *Körber V, Stainczyk SA, Kurilov R, Henrich KO, Hero B, Brors B, Westermann F, Höfer T. (2023)
  Neuroblastoma arises in early fetal development and its evolutionary duration predicts outcome.
  *Nat Genet.*, **55**:619-630.
  [10.1038/s41588-023-01332-y](https://www.nature.com/articles/s41588-023-01332-y)*


<!-- Other R/ Bioconductor Packages with similar aims are  .... -->

# Input Data

LACHESIS takes sample names, somatic single nucleotide variants and allele-specific copy number information as input. Though not strictly required, you may, ideally, also provide an estimate for the tumor cell purity and ploidy. Sample name, paths to respective variant files,  and optionally further specifications can either be specified by vectors or in a tab-delimited **sample-specification file**. The package can be run on a single sample or on a cohort. A detailed **log file** will be generated, capturing the date, time, and provided input parameters.

#### Copy Number Variation (CNV) File

The CNV file is a tab-delimited file that contains segmented copy number information for your sample of interest. The CNV file requires columns for the **chromosome number**, **start** and **end position** of the segment, and either the **total copy number** or the **number of A- and B-alleles**. We encourage you to provide allele-specific copy number information; if this information is not provided, LACHESIS will assume that a given total copy arose by the minimal number of genomic changes (e.g., whole genome duplication, resulting in a 2:2 configuration for a total copy number of 4 or single-chromosomal missegregation, resulting in a 2:1 configuration for a copy number of 3), which may yield wrong results in some cases. 

#### Single Nucleotide Variant (SNV) File

The SNV file should be in VCF format (Variant Call Format), a standardized text file format with a **header section** (lines starting with # and ##) and a **data section** where each row represents a variant. We recommend using only the variants that have passed the filtering process.

# Package Output

LACHESIS utilizes whole genome sequencing data to determine the mutational timing of a tumor's most recent common ancestor (MRCA). In addition, it will time the emergence of clonal (and thus early) copy number gains relative to the MRCA. If multiple chromosomal gains preceded the MRCA, LACHESIS tests whether they can be explained by a single aneuploidization event, in an earlier common ancestor (ECA) of the tumor, or whether they resulted from sequential chromosomal misseggregations. LACHESIS generates an output folder for each patient ID, which contains the following information:

#### Datatables (.txt)

* **SNV_timing_per_segment_ID.txt:** purity, ploidy, and the estimated mean, lower, and upper bounds for the timing of the tumor's MRCA and ECA (if detected)
* **SNV_timing_per_SNV_ID.txt:** SNV context, clonality status and association with ECA/ MRCA.
* **MRCA_densities_ID.txt:** per-segment mutation time, differentiated by minor and major copy number (A and B) For more details go to [Estimating Mutation Densities at ECA/MRCA (MRCA)](#estimating-mutation-densities-at-ecamrca-mrca)

#### Graphs (.pdf)

* **SNV_densities.pdf:** histograms of normalized, mean single-copy and multi-copy mutation densities and timeline of early tumor evolution
* **VAF_histogram_strat.pdf:** measured copy numbers along the genome and variant allele frequency (VAF) histograms of SNVs stratified by copy number and minor/major allele count
* **VAF_histogram:** variant allele frequency histogram and density distribution of all SNVs
* **SNV_timing_per_SNV_ID.pdf:** clonality status of SNVs associated with specific chromosomal gains

Additionally, if the function was applied to a cohort, the following files will be included:

* **SNV_densities_cohort.pdf:** histograms and cumulative distributions (with 95% confidence intervals) of mean mutation densities per segment at MRCA and ECA
* **Driver_mutations_cohort.pdf:** known driver SNVs and their evolutionary timing clonal (pre-CNV, post-CNV, unknown) or subclonal
* **lachesis_clin.par.pdf:** correlation between SNV densities computed by LACHESIS and clinical parameters (age, survival,...)
* **survival_plots.pdf:** Kaplan-Meier plots stratified by MRCA density if providing survival information

## Structure

There are two ways to perform analyses using LACHESIS: either by utilizing the sub-functions individually or by calling the wrapper function. All of the functions presented in the main workflow are also invoked by the wrapper function. In addition, the wrapper function summarizes results across samples, allowing cohort analysis to be performed.

```{r structure, echo=FALSE, fig.show='hold', fig.wide = TRUE}
knitr::include_graphics(system.file("extdata", "LACHESIS_package_structure.png", package = "LACHESIS"))
```

# LACHESIS Main Workflow

## Importing Data

### Importing Copy Number Information (readCNV)

The `readCNV` function converts a user-specified BED file containing copy number information into a standardized format. It performs several quality checks on the input file and returns a clean, structured dataframe. 
Users can specify column identifiers for chromosomal positions and allele-specific copy number (A and B alleles) using either column names or indices. If no identifiers are provided, the function will attempt to identify the relevant columns using standard nomenclature (e.g., "chrom", "start", "end", etc.). 

#### Handling Missing Allele Information
If total copy number information is available but the number of A and B alleles is missing, the function estimates allele counts. To this end, the function assumes the molecular scenario with minimal step count to be the most likely one. For example, a total copy number of 4 is assumed to stem from whole genome duplication, corresponding to a 2:2 configuration; likewise a total copy number of 3 is assumed to stem from chromosomal missegregation, corresponding to a 2:1 configuration. Formally, the total copy number is divided by two, with one allele rounded up and the other rounded down.

#### Input 

```{r readCNV_function, eval = FALSE}
readCNV(cn.info = NULL, chr.col = NULL, start.col = NULL, end.col = NULL, A.col = NULL, B.col = NULL, tcn.col = NULL, merge.tolerance = 10^5, ignore.XY = TRUE, max.cn = 4, tumor.id = NULL)
```

#### Output

A `data.table` containing the following information:

```{r readCNV_output, echo=FALSE}
library(knitr)

readcnv_output <- data.frame(
    Column1 = c("*Chr*", "*Start*", "*End*", "*A*", "*B*", "*TCN*"),
    Column2 = c(
        "the chromosome number",
        "start position of the segment",
        "end position of the segment",
        "copy number of the major allele",
        "copy number of the minor allele",
        "the total copy number"
    )
)

kable(readcnv_output, col.names = c("Output", "Definition"), escape = FALSE)
```

#### Example using output from ACESeq as copy number information

```{r readCNV_example1, message=FALSE, warning=FALSE}
aceseq_cn <- system.file("extdata", "ACESeq/NBE11_comb_pro_extra2.59_0.83.txt", package = "LACHESIS")
cn_data <- LACHESIS::readCNV(aceseq_cn)
head(cn_data, n = 1)
```

#### Example using output from ASCAT as copy number information

```{r readCNV_example2, message=FALSE, warning=FALSE}
ascat_cn <- system.file("extdata", "ASCAT/S98.segments.txt", package = "LACHESIS")
cn_data <- LACHESIS::readCNV(ascat_cn)
head(cn_data, n = 1)
```


#### Example using output from PURPLE as copy number information

```{r readCNV_example3, message=FALSE, warning=FALSE}
purple_cn <- system.file("extdata", "PURPLE/NB-S-599-T.purple.cnv.somatic.tsv", package = "LACHESIS")
cn_data <- LACHESIS::readCNV(purple_cn)
head(cn_data, n = 1)
```

### Importing Variant Information (readVCF)  

The `readVCF` function is used to import a VCF file and extract key genomic information such as read counts and VAFs. Supported tools to generate the VCF file are [Mutect2](https://gatk.broadinstitute.org/hc/en-us/articles/360037593851-Mutect2), [Strelka2](https://github.com/Illumina/strelka?tab=readme-ov-file), [Sentieon](https://support.sentieon.com/manual/) and the [DKFZ SNV Calling Workflow](https://github.com/DKFZ-ODCF/SNVCallingWorkflow).

#### Input 

```{r readVCF_function, eval=FALSE}
readVCF(vcf = NULL, ignore.XY = TRUE, vcf.source = "strelka", min.vaf = 0.01, min.depth = 30, t.sample = NULL, info.af = "AF", info.dp = "DP", filter.value = "PASS")
```

#### Output

A `data.table` containing the following information:

```{r readVCF_output, echo=FALSE}
library(knitr)

readvcf_output <- data.frame(
    Column1 = c("*chrom*", "*pos*", "*ref*", "*alt*", "*t_ref_count*", "*t_alt_count*", "*t_depth*", "*t_vaf*"),
    Column2 = c(
        "the chromosome",
        "the chromosomal coordinate of the SNV",
        "the reference nucleotide at this position",
        "the alternate nucleotide at this position",
        "the number of reads reporting the reference nucleotide",
        "the number of reads reporting the alternate nucleotide",
        "the read coverage at this position",
        "the variant allele frequency of the SSNV"
    )
)

knitr::kable(readvcf_output, col.names = c("Output", "Definition"), escape = FALSE)
```

#### Example: SNV calls obtained with Mutect

```{r readVCF_example1, message=FALSE, warning=FALSE}
mutect_vcf <- system.file("extdata", "mutect.somatic.vcf.gz", package = "LACHESIS")
m_data <- LACHESIS::readVCF(vcf = mutect_vcf, vcf.source = "mutect", filter.value = ".")
head(m_data, n = 1)
```

#### Example: SNV calls obtained with Strelka

```{r readVCF_example2, message=FALSE, warning=FALSE}
strelka_vcf <- system.file("extdata", "strelka2.somatic.snvs.vcf.gz", package = "LACHESIS")
s_data <- LACHESIS::readVCF(vcf = strelka_vcf, vcf.source = "strelka", filter.value = "PASS")
head(s_data, n = 1)
```

#### Example: SNV calls obtained with the dkfz SNV calling workflow

```{r readVCF_example3, message=FALSE, warning=FALSE}
dkfz_vcf <- system.file("extdata", "NBE15", "snvs_NBE15_somatic_snvs_conf_8_to_10.vcf", package = "LACHESIS")
d_data <- LACHESIS::readVCF(vcf = dkfz_vcf, vcf.source = "dkfz")
head(d_data, n = 1)
```


### Plotting Variant Allele Frequencies (plotVAFdistr) 

The `plotVAFdistr` function generates a VAF histogram of somatic SNVs. If `showdensity` is set to `TRUE`, an additional density plot is displayed.

#### Input  

```{r plotVAF_function, eval = FALSE}
plotVAFdistr(vaf = NULL, vaf.interval = 0.05, t_sample = NULL, vaf.show.counts = FALSE, vaf.show.density = TRUE, vaf.col = "#34495e", vaf.border = "#bdc3c7", srtcounts = 45, output.file = NULL)
```

#### Example

```{r plot1, message=FALSE, warning=FALSE, fig.cap="VAF distribution of SSNVs"}
strelka_vcf <- system.file("extdata", "strelka2.somatic.snvs.vcf.gz", package = "LACHESIS")
s_data <- LACHESIS::readVCF(vcf = strelka_vcf, vcf.source = "strelka")
head(s_data, n = 1)
LACHESIS::plotVAFdistr(s_data)
```

This VAF histogram provides a first overview of the tumor’s mutational landscape. The observed clonal (~0.3) and subclonal (~0.1) peaks align with expectations based on the tumor purity (0.83) and ploidy (2.59). 

### Assigning copy number states to single nucleotide variants (nbImport)

The function `nbImport` integrates CNV and SNV information into a single datatable. Each variant is assigned to its corresponding copy number segment and status. 

#### Input  

```{r nbImport_function, eval = FALSE}
nbImport(cnv = NULL, snv = NULL, purity = NULL, ploidy = NULL)
```

##### Assignment of mutational signatures

LACHESIS can incorporate mutational signatures generated by either [SigProfilerAssignment](https://github.com/AlexandrovLab/SigProfilerAssignment) or [MutationalPatterns](https://bioconductor.org/packages/release/bioc/html/MutationalPatterns.html). SigProfilerAssignment is a Python-based tool developed by the Alexandrov Lab, whereas MutationalPatterns is an R package available through Bioconductor. Both tools assign previously known mutational signatures to sequenced samples. If signature assignment has already been performed with SigProfilerAssignment, the outputfile, typically named "Decomposed_MutationType_Probabilities.txt", can be provided via sig.file parameter. If sig.file is not specified and sig.assign = TRUE, the probabilities for the SBS signatures will be automatically computed using MutationalPatterns.  

A global random seed (set.seed()) should be specified before running LACHESIS to create reproducable results.

#### Output

A `data.table` containing the following information:

```{r nbImport_output, echo=FALSE}
library(knitr)

nb_import_output <- data.frame(
    Column1 = c("*chrom*", "*cn_start*", "*cn_end*", "*A*", "*B*", "*TCN*", "*snv_start*", "*ref*", "*alt*", "*t_ref_count*", "*t_alt_count*", "*t_depth*", "*t_vaf*", "*snv_end*"),
    Column2 = c(
        "the chromosome",
        "start of the segment",
        "end of the segment",
        "copy number of the major allele",
        "copy number of the minor allele",
        "the total copy number",
        "the chromosomal start coordinate of the SNV",
        "the reference nucleotide at this position",
        "the alternate nucleotide at this position",
        "the number of reads reporting the reference nucleotide",
        "the number of reads reporting the alternate nucleotide",
        "the read coverage at this position",
        "the variant allele frequency of the SSNV",
        "the chromosomal end coordinate of the SNV"
    )
)

knitr::kable(nb_import_output, col.names = c("Output", "Definition"), escape = FALSE)
```

#### Example using all variants from vcf file

```{r nbImport_example1, message=FALSE, warning=FALSE}
snvs <- system.file("extdata", "NBE15", "snvs_NBE15_somatic_snvs_conf_8_to_10.vcf", package = "LACHESIS")
s_data <- LACHESIS::readVCF(vcf = snvs, vcf.source = "dkfz")
aceseq_cn <- system.file("extdata", "NBE15", "NBE15_comb_pro_extra2.51_1.txt", package = "LACHESIS")
c_data <- LACHESIS::readCNV(aceseq_cn)
nb <- LACHESIS::nbImport(cnv = c_data, snv = s_data, purity = 1, ploidy = 2.51)
```

#### Example using variants associated with specific SBS mutational signatures from vcf file

```{r nbImport_example2, message=FALSE, warning=FALSE}
snvs <- system.file("extdata", "NBE15", "snvs_NBE15_somatic_snvs_conf_8_to_10.vcf", package = "LACHESIS")
s_data <- LACHESIS::readVCF(vcf = snvs, vcf.source = "dkfz")
aceseq_cn <- system.file("extdata", "NBE15", "NBE15_comb_pro_extra2.51_1.txt", package = "LACHESIS")
c_data <- LACHESIS::readCNV(aceseq_cn)
sig.filepath <- system.file("extdata", "NBE15_Decomposed_MutationType_Probabilities.txt", package = "LACHESIS")
head(sig.filepath, n = 1)
nb <- LACHESIS::nbImport(cnv = c_data, snv = s_data, purity = 1, ploidy = 2.51, sig.assign = TRUE, ID = "NBE15", sig.file = sig.filepath, sig.select = c("SBS1", "SBS5", "SBS40a", "SBS18"))
head(nb, n = 1)
nb.2 <- LACHESIS::nbImport(cnv = c_data, snv = s_data, purity = 1, ploidy = 2.51, sig.assign = TRUE, ID = "NBE15", sig.select = c("SBS1", "SBS5", "SBS40", "SBS18"))
head(nb.2, n = 1)
```

### Plotting Imported Data (plotNB)  

This function visualizes the results generated by `nbImport`. The top plot displays the inferred copy numbers along the genome. By default, the human genome build hg19 is used as reference. However, alternative genome builds such as hg18 or hg38 can be specified if required.
The bottom plots show VAF histograms of SNVs, stratified by copy number and minor/major alleles, as well as the clonality status of each SNV.

#### Input 

```{r plotNB_function, eval = FALSE}
plotNB(nb = NULL, snvClonality = NULL, ref.build = "hg19", min.cn = 2, max.cn = 4, nb.col.abline = "gray70", nb.col.cn.2 = "#7f8c8d", nb.col.cn = "#16a085", nb.col.hist = "#34495e", nb.border = NA, nb.breaks = 100, samp.name = NULL, output.file = NULL, sig.show = FALSE)
```

#### Example using all variants from vcf file

```{r plot2, message=FALSE, warning=FALSE, fig.cap="*top plot* - copy number profile along the genome, *bottom plots* - VAF distribution of SSNVs stratified by copy number and minor/major alleles"}
snvs <- system.file("extdata", "NBE15", "snvs_NBE15_somatic_snvs_conf_8_to_10.vcf", package = "LACHESIS")
s_data <- LACHESIS::readVCF(vcf = snvs, vcf.source = "dkfz")
aceseq_cn <- system.file("extdata", "NBE15", "NBE15_comb_pro_extra2.51_1.txt", package = "LACHESIS")
c_data <- LACHESIS::readCNV(aceseq_cn)
nb <- snvs <- system.file("extdata", "NBE15", "snvs_NBE15_somatic_snvs_conf_8_to_10.vcf", package = "LACHESIS")
nb <- LACHESIS::nbImport(cnv = c_data, snv = s_data, purity = 1, ploidy = 2.51)
cl_muts <- LACHESIS::clonalMutationCounter(nb)
norm_muts <- LACHESIS::normalizeCounts(cl_muts)
mrca <- LACHESIS::MRCA(norm_muts)
snvClonality <- LACHESIS::estimateClonality(nbObj = nb, mrcaObj = mrca, ID = "NBE15", purity = 1)
LACHESIS::plotNB(nb = nb, snvClonality = snvClonality)
```

The copy number plot highlights multiple chromosomal aberrations across different copy number states, including a gain of chromosome 1q to four copies and a whole-chromosome gain of chromosome 5 to three copies. SNVs located on these segments are shown in the per-copy-number VAF histograms below. The dotted lines indicate the expected clonal VAFs for each copy number state. As we would expect, subclonal SNVs appear to the left of these lines, while clonal peaks align closely with the dotted lines.

#### Example using variants assosciated with specific SBS mutational signatures from vcf file

```{r plot3, message=FALSE, warning=FALSE, fig.cap="*top plot* - copy number profile along the genome, *bottom plots* - VAF distribution of SSNVs stratified by copy number and minor/major alleles"}
set.seed(123)
snvs <- system.file("extdata", "NBE15", "snvs_NBE15_somatic_snvs_conf_8_to_10.vcf", package = "LACHESIS")
s_data <- LACHESIS::readVCF(vcf = snvs, vcf.source = "dkfz")
aceseq_cn <- system.file("extdata", "NBE15", "NBE15_comb_pro_extra2.51_1.txt", package = "LACHESIS")
c_data <- LACHESIS::readCNV(aceseq_cn)
sig.filepath <- system.file("extdata", "NBE15_Decomposed_MutationType_Probabilities.txt", package = "LACHESIS")
nb <- LACHESIS::nbImport(cnv = c_data, snv = s_data, purity = 1, ploidy = 2.51, sig.assign = TRUE, ID = "NBE15", sig.file = sig.filepath, sig.select = c("SBS1", "SBS5", "SBS40a", "SBS18"))
cl_muts <- LACHESIS::clonalMutationCounter(nb)
norm_muts <- LACHESIS::normalizeCounts(cl_muts)
mrca <- LACHESIS::MRCA(norm_muts)
snvClonality <- LACHESIS::estimateClonality(nbObj = nb, mrcaObj = mrca, ID = "NBE15", purity = 1)
LACHESIS::plotNB(nb = nb, snvClonality = snvClonality, sig.show = TRUE)
```

In addition to the VAF histograms stratified by clonality, here we show the same histograms stratified by mutational signature. In this case, SBS1, SBS5, SBS18, and SBS40a are observed—a typical constellation in untreated neuroblastoma. These signatures appear to be distributed relatively evenly and do not show any clear association with SNVs occurring before or after copy number gains or clonal expansion.

## Processing Clonal Mutations 

### Counting Clonal Mutations (ClonalMutationalCounter)

This function counts the number of clonal mutations residing on a single or multiple copies per genomic segment. Segments of equal copy number and A:B conformation are chromosome-wise merged. 

#### Input

```{r clonalmutcount_function, eval = FALSE}
clonalMutationCounter(nbObj = NULL, min.cn = 1, max.cn = 4, chromosomes = c(1:22))
```

#### Output

A `data.table` containing the following information:

```{r clonalmutcount_output, echo=FALSE}
library(knitr)

process_clonal_output <- data.frame(
    Column1 = c("*chrom*", "*TCN*", "*A*", "*B*", "*Seglength*", "*n_mut_A*", "*n_mut_B*", "*n_mut_total_clonal*"),
    Column2 = c(
        "the chromosome",
        "the total copy number",
        "copy number of the major allele",
        "copy number of the minor allele",
        "the segment length",
        "the number of mutations per copy with a multiplicity of A",
        "the number of mutations per copy with a multiplicity of B",
        "the total number of clonal mutations"
    )
)

knitr::kable(process_clonal_output, col.names = c("Column name", "Definition"), escape = FALSE)
```

#### Example

```{r clonalmutcount_example1, message=FALSE, warning=FALSE}
snvs <- system.file("extdata", "NBE15", "snvs_NBE15_somatic_snvs_conf_8_to_10.vcf", package = "LACHESIS")
s_data <- LACHESIS::readVCF(vcf = snvs, vcf.source = "dkfz")
aceseq_cn <- system.file("extdata", "NBE15", "NBE15_comb_pro_extra2.51_1.txt", package = "LACHESIS")
c_data <- LACHESIS::readCNV(aceseq_cn)
nb <- LACHESIS::nbImport(cnv = c_data, snv = s_data, purity = 1, ploidy = 2.51)
cl_muts <- LACHESIS::clonalMutationCounter(nb)
head(cl_muts, n = 1)
```

### Normalizing Clonal Mutation Counts (normalizeCounts)

This function translates the clonal mutation counts obtained with `clonalMutationCounter` into "molecular time". The normalized counts correspond to the number of mutations accumulated between conception/gastrulation and MRCA/copy number gain on a single DNA copy. 

#### Input

```{r normalizeCounts_function, eval=FALSE}
normalizeCounts(countObj)
```

#### Output

A `data.table` containing the following information:

```{r normalizeCounts_output, echo=FALSE}
library(knitr)

normalize_counts_output <- data.frame(
    Column1 = c("*chrom*", "*TCN*", "*A*", "*B*", "*Seglength*", "*n_mut_A*", "*n_mut_B*", "*n_mut_total_clonal*", "*density_total_mean*", "*density_total_lower*", "*density_total_upper*", "*density_A_mean*", "*density_A_lower*", "*density_A_upper*", "*density_B_mean*", "*density_B_lower*", "*density_B_upper*"),
    Column2 = c(
        "the chromosome",
        "the total copy number",
        "copy number of the major allele",
        "copy number of the minor allele",
        "the segment length",
        "the number of mutations per copy with a multiplicity of A",
        "the number of mutations per copy with a multiplicity of B",
        "the total number of mutations",
        "mean mutation densities (1/Mb) per copy",
        "lower limit of mutation densities (1/Mb) per copy according to a 95% confidence interval",
        "upper limit of mutation densities (1/Mb) per copy ccording to a 95% confidence interval",
        "mean mutation densities (1/Mb) per copy with a multiplicity of A",
        "lower limit of mutation densities (1/Mb) per copy with a multiplicity of A according to a 95% confidence interval",
        "upper limit of mutation densities (1/Mb) per copy with a multiplicity of B according to a 95% confidence interval",
        "mean mutation densities (1/Mb) per copy with a multiplicity of B",
        "lower limit of mutation densities (1/Mb) per copy with a multiplicity of B",
        "upper limit of mutation densities (1/Mb) per copy with a multiplicity of B"
    )
)

knitr::kable(normalize_counts_output, col.names = c("Column name", "Definition"), escape = FALSE)
```

#### Example

```{r normalizeCounts_example1, message=FALSE, warning=FALSE}
snvs <- system.file("extdata", "NBE15", "snvs_NBE15_somatic_snvs_conf_8_to_10.vcf", package = "LACHESIS")
s_data <- LACHESIS::readVCF(vcf = snvs, vcf.source = "dkfz")
aceseq_cn <- system.file("extdata", "NBE15", "NBE15_comb_pro_extra2.51_1.txt", package = "LACHESIS")
c_data <- LACHESIS::readCNV(aceseq_cn)
nb <- LACHESIS::nbImport(cnv = c_data, snv = s_data, purity = 1, ploidy = 2.51)
cl_muts <- LACHESIS::clonalMutationCounter(nb)
norm_muts <- LACHESIS::normalizeCounts(cl_muts)
head(norm_muts, n = 1)
```

## Inferring Clonal Evolution

### Estimating Mutation Densities at ECA/MRCA (MRCA)

This function estimates mutation densities at MRCA and, if present, ECA from the normalized clonal mutation counts obtained with `normalizeCounts`. Optionally, the average false positive rate of clonal mutations (e.g. due to incomplete tissue sampling) and the standard deviation of the false positive rate can be defined (fp.mean, fp.sd).

#### Input Parameters 

```{r MRCA_function, eval = FALSE}
MRCA(normObj = NULL, min.seg.size = 10^7, fp.mean = 0, fp.sd = 0, excl.chr = NULL)
```

#### Output

* **SNV_timing_per_segment_ID.txt:** purity, ploidy, and the estimated mean, lower, and upper bounds for the timing of the tumor's MRCA and ECA (if detected).

* **MRCA_densities_ID.txt:** per-segment mutation time, differentiated by minor and major copy number (A and B). For each segment,
```{r MRCA_output, echo=FALSE}
library(knitr)

mrca_dens_df <- data.frame(
    Column1 = c(
        "chrom", "TCN", "A", "B", "Seglength", "n_mut_A", "n_mut_B", "n_mut_total_clonal",
        "density_total_mean", "density_total_lower", "density_total_upper",
        "density_A_mean", "density_A_lower", "density_A_upper", "density_B_mean",
        "density_B_lower", "density_B_upper", "p_total_to_mrca", "p_A_to_mrca",
        "p_B_to_mrca", "p_adj_total_to_mrca", "p_adj_A_to_mrca", "p_adj_B_to_mrca",
        "MRCA_qual", "p_A_to_eca", "p_B_to_eca", "p_adj_A_to_eca", "p_adj_B_to_eca",
        "A_time", "B_time"
    ),
    Column2 = c(
        "the chromosome",
        "the total copy number",
        "copy number of the major allele",
        "copy number of the minor allele",
        "the segment length",
        "the normalized number of mutations per copy with a multiplicity of A",
        "the normalized number of mutations per copy with a multiplicity of B",
        "the normalized number of mutations per copy",
        "mean mutation densities (1/Mb) per copy",
        "lower limit of mutation densities (1/Mb) per copy according to a 95% confidence interval",
        "upper limit of mutation densities (1/Mb) per copy according to a 95% confidence interval",
        "mean mutation densities (1/Mb) per copy with a multiplicity of A",
        "lower limit of mutation densities (1/Mb) per copy with a multiplicity of A according to a 95% confidence interval",
        "upper limit of mutation densities (1/Mb) per copy with a multiplicity of B according to a 95% confidence interval",
        "mean mutation densities (1/Mb) per copy with a multiplicity of B",
        "lower limit of mutation densities (1/Mb) per copy with a multiplicity of B according to a 95% confidence interval",
        "upper limit of mutation densities (1/Mb) per copy with a multiplicity of B according to a 95% confidence interval",
        "probability that the mutation density per copy agrees with the mean mutation density at the MRCA",
        "probability that the mutation density per copy with a multiplicity of A agrees with the mean mutation density at the MRCA",
        "probability that the mutation density per copy with a multiplicity of B agrees with the mean mutation density at the MRCA",
        "Holm-corrected probability that the mutation density per copy agrees with the mean mutation density at the MRCA",
        "Holm-corrected probability that the mutation density per copy with a multiplicity of A agrees with the mean mutation density at the MRCA",
        "Holm-corrected probability that the mutation density per copy with a multiplicity of B agrees with the mean mutation density at the MRCA",
        "Quality control - PASS if `p_adj_to_mrca >= 0.01`",
        "probability that the mutation density per copy with a multiplicity of A agrees with the mean mutation density at the ECA",
        "probability that the mutation density per copy with a multiplicity of B agrees with the mean mutation density at the ECA",
        "Holm-corrected probability that the mutation density per copy with a multiplicity of A agrees with the mean mutation density at the ECA",
        "Holm-corrected probability that the mutation density per copy with a multiplicity of B agrees with the mean mutation density at the ECA",
        "Time point at which the copy number change of the major allele occurred (can be `ECA`, `MRCA`, or, if `A=1`, NA)",
        "Time point at which the copy number change of the major allele occurred (can be `ECA`, `MRCA`, or, if `B=1`, NA)"
    ),
    stringsAsFactors = FALSE
)

knitr::kable(mrca_dens_df, col.names = c("Column Name", "Definition"), align = "l")
```

#### Example

```{r MRCA_example1, message=FALSE, warning=FALSE}
snvs <- system.file("extdata", "NBE15", "snvs_NBE15_somatic_snvs_conf_8_to_10.vcf", package = "LACHESIS")
s_data <- LACHESIS::readVCF(vcf = snvs, vcf.source = "dkfz")
aceseq_cn <- system.file("extdata", "NBE15", "NBE15_comb_pro_extra2.51_1.txt", package = "LACHESIS")
c_data <- LACHESIS::readCNV(aceseq_cn)
nb <- LACHESIS::nbImport(cnv = c_data, snv = s_data, purity = 1, ploidy = 2.51)
cl_muts <- LACHESIS::clonalMutationCounter(nb)
norm_muts <- LACHESIS::normalizeCounts(cl_muts)
mrca <- LACHESIS::MRCA(norm_muts)
head(mrca, n = 1)
```

### Plotting Mutation Densities (plotMutationalDensities)

This function visualizes the results of the MRCA analysis. The top plot presents histograms of mean mutation densities, while the bottom plot illustrates a timeline of early tumor evolution. This timeline highlights mutation densities (mean values and 95% confidence intervals) associated with individual chromosomal gains. Additionally, the mutation densities at the ECA and MRCA are depicted. Optionally, the real-time estimation can be calculated based on a user-specified mutation rate (SNVs/day). Details on these rates are available in the literature. For neuroblastoma, a default value of 3.2 SNVs/ day is set according to calculations performed by Körber et al. ([Nature 2023](https://www.nature.com/articles/s41588-023-01332-y)).

#### Input 

```{r plotMutDens_function, eval = FALSE}
plotMutationDensities(mrcaObj = NULL, samp.name = NULL, min.seg.size = 10^7, mut.col.zero = "#4FB12B", mut.col.multi = "#176A02", mut.border = NULL, mut.show.density = TRUE, mut.breaks = NULL, mut.xaxis = NULL, mut.show.realtime = FALSE, mut.snv.rate = 3.2, output.file = NULL)
```

#### Example

```{r plot4, message=FALSE, warning=FALSE, fig.cap="*top plots* - Distribution of non-amplified and amplified clonal mutation densities across segments, *bottom plot* - Evolutionary timeline showing mutation densities at chromosomal gains (mean and 95% CI) and at the tumor's ECA and MRCA (mean and 95% CI of)"}
snvs <- system.file("extdata", "NBE15", "snvs_NBE15_somatic_snvs_conf_8_to_10.vcf", package = "LACHESIS")
s_data <- LACHESIS::readVCF(vcf = snvs, vcf.source = "dkfz")
aceseq_cn <- system.file("extdata", "NBE15", "NBE15_comb_pro_extra2.51_1.txt", package = "LACHESIS")
c_data <- LACHESIS::readCNV(aceseq_cn)
nb <- LACHESIS::nbImport(cnv = c_data, snv = s_data, purity = 1, ploidy = 2.51)
cl_muts <- LACHESIS::clonalMutationCounter(nb)
norm_muts <- LACHESIS::normalizeCounts(cl_muts)
mrca <- LACHESIS::MRCA(norm_muts)
LACHESIS::plotMutationDensities(mrca)
```

The single-copy SNVs place the MRCA at approximately 0.22 SNVs per Mb. The ECA, defined by multi-copy SNVs, is positioned before the MRCA, close to 0 SNVs per Mb and most of the chromosomal gains are asscoiated with it. This pattern is typical in high-risk neuroblastoma, where aneuploidy occurs during the first trimester, followed by the emergence of the MRCA around birth or during the first year of life.

### Estimating Clonality per SNV (estimateClonality)

This function determines the clonality status of each SNV, classifying them as clonal or subclonal. For clonal variants, the function further infers whether the mutation arose before or after a copy number alteration (pre-CNV or post-CNV) if possible.
 
#### Input 

```{r estimateClonality_function, eval = FALSE}
estimateClonality(nbObj = NULL, mrcaObj = NULL, ID = NULL, purity = NULL)
```

#### Example

```{r estimateClonality_example1, message=FALSE, warning=FALSE}
snvs <- system.file("extdata", "NBE15", "snvs_NBE15_somatic_snvs_conf_8_to_10.vcf", package = "LACHESIS")
s_data <- LACHESIS::readVCF(vcf = snvs, vcf.source = "dkfz")
aceseq_cn <- system.file("extdata", "NBE15", "NBE15_comb_pro_extra2.51_1.txt", package = "LACHESIS")
c_data <- LACHESIS::readCNV(aceseq_cn)
nb <- LACHESIS::nbImport(cnv = c_data, snv = s_data, purity = 1, ploidy = 2.51)
cl_muts <- LACHESIS::clonalMutationCounter(nb)
norm_muts <- LACHESIS::normalizeCounts(cl_muts)
mrca <- LACHESIS::MRCA(norm_muts)
LACHESIS::estimateClonality(nbObj = nb, mrcaObj = mrca, ID = "NBE15", purity = 1)
```

### Plotting Clonality per SNV (plotClonality)

This function visualizes the output from estimateClonality, providing an overview of clonal and subclonal SNVs across the chromosomal segments. If a mutational signature file is supplied, the function further stratifies SNV clonality by SBS signature within each chromosome for detailed analysis.

#### Input 

```{r plotClonality_function, eval = FALSE}
plotClonality(snvClonality = snvClonality, nb = NULL, sig.assign = FALSE, output.file = NULL)
```

#### Example

```{r plotClonality_example1, message=FALSE, warning=FALSE}
snvs <- system.file("extdata", "NBE15", "snvs_NBE15_somatic_snvs_conf_8_to_10.vcf", package = "LACHESIS")
s_data <- LACHESIS::readVCF(vcf = snvs, vcf.source = "dkfz")
aceseq_cn <- system.file("extdata", "NBE15", "NBE15_comb_pro_extra2.51_1.txt", package = "LACHESIS")
c_data <- LACHESIS::readCNV(aceseq_cn)
nb <- LACHESIS::nbImport(cnv = c_data, snv = s_data, purity = 1, ploidy = 2.51, ID = "NBE15")
cl_muts <- LACHESIS::clonalMutationCounter(nb)
norm_muts <- LACHESIS::normalizeCounts(cl_muts)
mrca <- LACHESIS::MRCA(norm_muts)
snvClonality <- LACHESIS::estimateClonality(nbObj = nb, mrcaObj = mrca, ID = "NBE15", purity = 1)
LACHESIS::plotClonality(snvClonality = snvClonality, nb = nb, sig.assign = FALSE)
```

This graph provides a first overview of whether SNVs occurred before or after copy number gains. A more detailed analysis of specific SNVs—such as known driver mutations—and their timing relative to the gains can be performed using the per-SNV table output. In this case, we observe a large number of SNVs occurring after the gain on chromosome 7, potentially triggered by this gain, whereas the small number of preceding mutations on chromosome 1 may provide insights into the initiation of the chromosome 1 gain.  

# LACHESIS Wrapper Function

This function calls all of the previously mentioned functions, and saves the results to the specified output directory. In addition, for every patient a datatable with mean, lower and upper bounds of mutation density at MRCA and ECA (if detected) is generated. These data tables will be combined into a single table, enabling comprehensive cohort analysis.
Input can be specified as vectors or in a tab-delimited **sample-specification file**

#### Input 

```{r LACHESIS_function, eval = FALSE}
LACHESIS(
    input.files = NULL, ids = NULL, vcf.tumor.ids = NULL, cnv.files = NULL, snv.files = NULL,
    vcf.source = NULL, purity = NULL, ploidy = NULL, cnv.chr.col = NULL, cnv.start.col = NULL,
    cnv.end.col = NULL, cnv.A.col = NULL, cnv.B.col = NULL, cnv.tcn.col = NULL, age = NULL,
    OS.time = NULL, OS = NULL, EFS.time = NULL, EFS = NULL, output.dir = NULL,
    ignore.XY = TRUE, min.cn = 1, max.cn = 4, merge.tolerance = 10^5, min.vaf = 0.01,
    min.depth = 30, vcf.info.af = "AF", vcf.info.dp = "DP", min.seg.size = 10^7, fp.mean = 0,
    fp.sd = 0, excl.chr = NULL, ref_build = "hg19", ...
)
```

#### Output

A `data.table` containing the following information:

```{r LACHESIS_output, echo=FALSE}
library(knitr)

lachesis_output <- data.frame(
    Column1 = c(
        "Sample_ID", "MRCA_time_mean", "MRCA_time_lower/ MRCA_time_upper", "ECA_time_mean", "ECA_time_lower/ ECA_time_upper", "Ploidy", "Purity", "Age", "OS.time", "OS", "EFS.time", "EFS"
    ),
    Column2 = c(
        "the tumor sample ID",
        "lower and upper bounds of mutation density at MRCA as calculated in [MRCA function](#estimating-mutation-densities-at-ecamrca-mrca)",
        "mean mutation density at MRCA as calculated in [MRCA function](#estimating-mutation-densities-at-ecamrca-mrca)",
        "mean mutation density at ECA as calculated in [MRCA function](#estimating-mutation-densities-at-ecamrca-mrca)",
        "lower and upper bounds of mutation density at ECA as calculated in [MRCA function](#estimating-mutation-densities-at-ecamrca-mrca)",
        "user-specified average copy number in the tumor sample",
        "user-specified tumor cell content",
        "user-specified age at diagnosis",
        "user-specified overall survival time",
        "user-specified overall survival indicator variable",
        "user-specified eventfree survival time",
        "user-specified eventfree survival indicator variable"
    ),
    stringsAsFactors = FALSE
)

knitr::kable(lachesis_output, col.names = c("Column Name", "Definition"), align = "l")
```


#### Example with vectors

```{r LACHESIS_example1, message=FALSE, warning=FALSE}
strelka_vcf <- system.file("extdata", "strelka2.somatic.snvs.vcf.gz", package = "LACHESIS")
aceseq_cn <- system.file("extdata", "ACESeq/NBE11_comb_pro_extra2.59_0.83.txt", package = "LACHESIS")
lachesis <- LACHESIS::LACHESIS(ids = "NBE11", cnv.files = aceseq_cn, snv.files = strelka_vcf, vcf.source = "strelka", purity = 0.83, ploidy = 2.59)
head(lachesis, n = 3)
```

#### Example with tab-delimited sample-specification file

```{r LACHESIS_example2, message=FALSE, warning=FALSE}
lachesis_input <- system.file("extdata", "Sample_template.txt", package = "LACHESIS")
```

```{r LACHESIS_example2_1, echo=FALSE}
input.files <- system.file("extdata", "Sample_template.txt", package = "LACHESIS")
input.files <- data.table::fread(input.files)

# cnv and snv files for example tumors
nbe11 <- list.files(system.file("extdata/NBE11/", package = "LACHESIS"), full.names = TRUE)
nbe15 <- list.files(system.file("extdata/NBE15/", package = "LACHESIS"), full.names = TRUE)
nbe26 <- list.files(system.file("extdata/NBE26/", package = "LACHESIS"), full.names = TRUE)

cnv.file <- c(nbe11[1], nbe15[1], nbe26[1])
snv.file <- c(nbe11[2], nbe15[2], nbe26[2])

input.files$cnv.file <- cnv.file
input.files$snv.file <- snv.file

# Make an example input file with paths to cnv and snv file along with other meta data
lachesis_input <- tempfile(pattern = "lachesis", tmpdir = tempdir(), fileext = ".tsv")
data.table::fwrite(x = input.files, file = lachesis_input, sep = "\t")
```

<!-- ```{r LACHESIS_example2_3} -->
<!-- #contents of the template -->
<!-- lachesis_input |> data.table::fread() |> head() -->
<!-- ``` -->


```{r LACHESIS_example2_4, message=FALSE, warning=FALSE}
lachesis <- LACHESIS::LACHESIS(input.files = lachesis_input)
head(lachesis, n = 3)
```

#### Example with multiple sample and data frame input

```{r LACHESIS_example3, message=FALSE, warning=FALSE}
nbe11_vcf <- system.file("extdata", "NBE11/snvs_NBE11_somatic_snvs_conf_8_to_10.vcf", package = "LACHESIS")
nbe11_cn <- read.delim(system.file("extdata", "NBE11/NBE11_comb_pro_extra2.59_0.83.txt", package = "LACHESIS"), sep = "\t", header = TRUE)
nbe15_vcf <- system.file("extdata", "NBE15/snvs_NBE15_somatic_snvs_conf_8_to_10.vcf", package = "LACHESIS")
nbe15_cn <- read.delim(system.file("extdata", "NBE15/NBE15_comb_pro_extra2.51_1.txt", package = "LACHESIS"), sep = "\t", header = TRUE)

lachesis <- LACHESIS::LACHESIS(ids = c("NBE11", "NBE15"), cnv.files = list(nbe11_cn, nbe15_cn), snv.files = c(nbe11_vcf, nbe15_vcf), vcf.source = c("dkfz", "dkfz"), purity = c(0.83, 1), ploidy = c(2.59, 2.51), cnv.chr.col = c(1, 1), cnv.start.col = c(2, 2), cnv.end.col = c(3, 3), cnv.A.col = c(34, 34), cnv.B.col = c(35, 35), cnv.tcn.col = c(37, 37))
head(lachesis, n = 3)
```


### Plotting Cohort Analysis (plotLACHESIS)

The function `plotLACHESIS` plots cumulative distributions of SSNV densities at ECA and MRCA across a cohort.

#### Input 

```{r plotLACHESIS_function, eval = FALSE}
plotLachesis(lachesis = NULL, lach.suppress.outliers = FALSE, lach.log.densities = FALSE, lach.col.multi = "#176A02", lach.border = NULL, binwidth = NULL, lach.col.zero = "#4FB12B", output.file = NULL)
```

#### Example

```{r plotLACHESIS_example1, message=FALSE, warning=FALSE}
lachesis_input <- system.file("extdata", "Sample_template.txt", package = "LACHESIS")
```

```{r plotLACHESIS_example1_1, echo=FALSE}
input.files <- system.file("extdata", "Sample_template.txt", package = "LACHESIS")
input.files <- data.table::fread(input.files)

# cnv and snv files for example tumors
nbe11 <- list.files(system.file("extdata/NBE11/", package = "LACHESIS"), full.names = TRUE)
nbe15 <- list.files(system.file("extdata/NBE15/", package = "LACHESIS"), full.names = TRUE)
nbe26 <- list.files(system.file("extdata/NBE26/", package = "LACHESIS"), full.names = TRUE)

cnv.file <- c(nbe11[1], nbe15[1], nbe26[1])
snv.file <- c(nbe11[2], nbe15[2], nbe26[2])

input.files$cnv.file <- cnv.file
input.files$snv.file <- snv.file

# Make an example input file with paths to cnv and snv file along with other meta data
lachesis_input <- tempfile(pattern = "lachesis", tmpdir = tempdir(), fileext = ".tsv")
data.table::fwrite(x = input.files, file = lachesis_input, sep = "\t")
```

```{r plot5, message=FALSE, warning=FALSE, fig.cap="Cumulative distribution of SSNV densities in ECA (dark green) and MRCA (light green)"}
lachesis <- LACHESIS::LACHESIS(input.files = lachesis_input)
LACHESIS::plotLachesis(lachesis)
```

This graph provides a cohort-level overview, illustrating two late-MRCA (SNVs per Mb > 0.05) and one early-MRCA sample. In the late-MRCA cases, an early ECA was detected, indicating a longer evolution rather than a later onset.

### Plotting Clinical Correlations (plotClinicalCorrelations)

This function takes SNV densities as input and correlates them with clinical data specified by the user, such as age at diagnosis, survival data etc.

#### Input 

```{r plotClinCor_function, eval = FALSE}
plotClinicalCorrelations(lachesis = NULL, clin.par = "Age", clin.suppress.outliers = FALSE, clin.log.densities = FALSE, output.file = NULL)
```

#### Example 

```{r plot6, message=FALSE, warning=FALSE, fig.cap="Correlation between age at diagnosis and SNV densities at ECA and MRCA"}
input.files <- system.file("extdata", "Sample_template.txt", package = "LACHESIS")
lachesis <- LACHESIS::LACHESIS(input.files = lachesis_input)

LACHESIS::plotClinicalCorrelations(lachesis)
```

The cohort size in this case is too small to draw reliable conclusions from these correlation plots. However, in larger neuroblastoma cohorts (Körber et al.), a correlation between patient age and MRCA timing has been observed.

### Plotting Survival (plotSurvival)

The function `plotSurvival` plots correlation between SNV density timing at MRCA with Survival

#### Input

```{r plotSurvival_function, eval = FALSE}
plotSurvival(lachesis = NULL, mrca.cutpoint = NULL, output.dir = NULL, surv.time = "OS.time", surv.event = "OS", surv.palette = c("dodgerblue", "dodgerblue4"), surv.time.breaks = NULL, surv.time.scale = 1, surv.title = "Survival probability", surv.ylab = "Survival")
```

#### Example

```{r plot7, message=FALSE, warning=FALSE, fig.cap="Correlation between Survival and SNV densities at ECA and MRCA", echo = FALSE}
input.files <- system.file("extdata", "Sample_template.txt", package = "LACHESIS")
lachesis <- LACHESIS::LACHESIS(input.files = lachesis_input)

LACHESIS::plotSurvival(lachesis, surv.time = "EFS.time", surv.event = "EFS", mrca.cutpoint = 0.05)
```

The timing of the MRCA serves as a robust predictor of event-free survival. Tumors with an early MRCA are linked to favorable overall survival outcomes, while late-MRCA tumors are more frequently associated with relapse.

### Classifying Tumor Evolution (classifyLACHESIS)

The function `classifyLACHESIS` classifies a tumor's start of clonal outgrowth during tumorigenesis as "early" or "late" (favorable/ unfavorable prognosis) depending on the mutation density at its MRCA.

#### Input

```{r classifyLACHESIS_function, eval = FALSE}
classifyLACHESIS(lachesis, mrca.cutpoint = NULL, output.dir = NULL, entity = "neuroblastoma", infer.cutpoint = FALSE, surv.time = "OS.time", surv.event = "OS")
```

#### Example

```{r classifyLACHESIS_example1, message=FALSE, warning=FALSE}
input.files <- system.file("extdata", "Sample_template.txt", package = "LACHESIS")
lachesis <- LACHESIS::LACHESIS(input.files = lachesis_input)

LACHESIS::classifyLACHESIS(lachesis, entity = "neuroblastoma")
```

# Quality control/ Filtering

## Hypermutants

To exclude hypermutant cases, the tumor mutational burden (TMB) should be below 10 mut/Mb ([Campbell et al](https://pubmed.ncbi.nlm.nih.gov/29056344/)). 

## Ploidy and purity

A minimum tumor purity of 30% is recommended for accurate estimation of variant numbers. Additionally, some pipelines (e.g., AceSeq) may assign a default purity of 1 when it cannot be reliably estimated. Such cases should be excluded from analysis.

To ensure valid results, tumor ploidy should be below the max.cn parameter [(e.g., readcnv)](#input). LACHESIS default value is max.cn = 4, therefore we recommend excluding samples with ploidy >4. 

## VAF distribution

The user should verify that the measured clonal peaks align with the expected variant allele frequencies (VAFs), indicated by the dashed lines in the stratified VAF histograms. If the observed VAFs do not match the expected patterns, an alternative purity and ploidy solution if provided by the CNV calling pipeline should be used. Otherwise the sample should not be considered for further analysis. 

#### Positive Example

```{r vaf-plots-positive, echo=FALSE, fig.show='hold', out.width="70%", fig.cap = "Clonal peaks (i.e. ->) correlate with expected VAFs (----)"}
knitr::include_graphics(system.file("extdata", "VAF_positive_example.png", package = "LACHESIS"))
```

#### Negative Example

```{r vaf-plots-negative, echo=FALSE, fig.show='hold', out.width="70%", fig.cap = "Clonal peaks cannot be clearly identified, a load of noise (i.e. ->) between the expected VAFs (----)"}
knitr::include_graphics(system.file("extdata", "VAF_negative_example.png", package = "LACHESIS"))
```

# Warning Messages

## readCNV

see [Importing Copy Number Information](#importing-copy-number-information-readcnv) for details

```{r warning_1, eval = FALSE}
warning("Removing x segments with start > end")
```
This warning indicates that there are rows inside the provided cnv file where the end position is less than the start position. Corresponding genomic segments will be removed from the data. 

```{r warning_2, eval = FALSE}
warning("Less than 10% of the genome with valid copy number information.")
```
The total length of all valid genomic segments in the cnv file is less than 300 million base pairs (3*10^8 bp). The function assumes a default human genome size of 3 × 10⁹ bp. 

```{r warning_3, eval = FALSE}
warning("No *x* identifier provided, assuming *y*")
```
The user should specify the column names or index for the cnv file (cnv.x.col/ x.col). If no identifiers are provided, the function will attempt to identify the relevant columns using standard nomenclature. Please verify that the expected data for x is located in column y.

```{r warning_4, eval = FALSE}
warning("Allele information is not provided and will be assumed 1:1 in disomic regions, 2:1 in trisomic regions, 2:2 in tetrasomic regions, ... .")
```
This warning appears when no major allele column (cnv.A.col/ A.col) is specified or cannot be inferred using standard nomenclature. LACHESIS will assume that the total copy number arose through the minimal number of genomic changes. However, this assumption may lead to incorrect results in some cases.
To ensure accuracy, we strongly recommend providing allele-specific copy number information.

## nbImport

see [Assigning copy number states to single nucleotide variants](#assining-copy-number-states-to-single-nucleotide-variants-nbimport) for details

```{r warning_5, eval = FALSE}
warning("Removed x variants with no copy number overlaps")
```
After integrating overlapping snv and cnv data, x variants with empty cnv start column were removed. 

## MRCA

see [Estimating Mutation Densities at ECA/MRCA](#estimating-mutation-densities-at-ecamrca-mrca) for details

```{r warning_6, eval = FALSE}
warning(sum(workObj$MRCA_qual == "FAIL"), " segments did not conform to the mutation density at MRCA.")
```


## LACHESIS

see [LACHESIS Wrapper Function](#lachesis-wrapper-function) for details

```{r warning_7, eval = FALSE}
warning("No sample name provided for samples x; sample name was set to 1 - y")
```
This warning message specifies samples where no ID was provided, the total number of missing IDs (y) was calculated and IDs were specified as numeric values (1, 2, 3, ..., y). 

```{r warning_8, eval = FALSE}
warning("No CNV/ SNV file provided for sample(s); sample(s) will be excluded")
```
The cnv/ snv file is missing for the specified sample, therefor it was excluded from the analysis. Please refer to [Input Data](#input-data) for details on data requirements. 

```{r warning_9, eval = FALSE}
warning("No output directory specified. Per-sample output will be discarded.")
```
If the output directory (output.dir) was not specified, [LACHESIS Output](#package-output) will not be displayed or saved.

```{r warning_10, eval = FALSE}
warning("Insufficient data for sample ", x$ID)
```
This warning appears if the datatable generated at [nbImport](#output-2) is empty.

```{r warning_11, eval = FALSE}
warning("No column identifier  provided for samples x; column name will be inferred")
```
The parameter vcf.tumor.ids should be specified by the user, it will be passed to [readVCF](#importing-variant-information-readvcf) as t.sample, indicating the tumor sample name.

```{r warning_12, eval = FALSE}
warning("Too few segments to estimate MRCA density for sample x.")
```
The datatable generated at [normalizeCounts](#output-4) only has one row, no reliable estimation of MRCA density is possible. 

## plotLachesis

see [Plotting Cohort Analysis](#plotting-cohort-analysis-plotlachesis) for details

```{r warning_13, eval = FALSE}
warning("Cannot produce summary statistics for a single case. Returning null.")
```
This warning messages indicates that the data for a single patient was specified to LACHESIS and therefore cumulative distributions of SSNV densities at ECA and MRCA across a cohort can not be plotted. 

## plotLachesis, plotClinicalCorrelations, plotSurvival

see [Plotting Cohort Analysis](#plotting-cohort-analysis-plotlachesis), [Plotting Clincal Correlations](#plotting-clincal-correlations-plotclinicalcorrelations), [Plotting Survival](#plotting-survival-plotsurvival) for details

```{r warning_14, eval = FALSE}
warning("Removing x samples with missing MRCA density estimate.")
```
The column *MRCA_time_mean* of [LACHESIS wrapper function](#output-7) output for the specified samples is empty; these samples will not be considered for further analysis. 

```{r warning_15, eval = FALSE}
warning("No sample with MRCA density estimate provided. Returning zero.")
```
[LACHESIS wrapper function](#output-7) output is empty; no further analysis will be performed 

## plotSurvival

see [Plotting Survival](#plotting-survival-plotsurvival) for details

```{r warning_16, eval = FALSE}
warning("Removing x samples with missing survival time.")
```
The column named according to parameter `surv.time` of [LACHESIS wrapper function](#output-7) output for specified samples is empty; these samples will not be considered for further analysis. 

```{r warning_17, eval = FALSE}
warning("No survival events in cohort Returning zero.")
```
The column named according to parameter `surv.event` of [LACHESIS wrapper function](#output-7) is empty; no further analysis will be performed. 

# How To Get Help 

In case of any questions, feel free to open an issue on [**Github**](https://github.com/VerenaK90/LACHESIS/issues).

<!-- # Acknowledgments -->

<!-- # Funding -->

# Session Info

```{r sessionInfo}
sessionInfo()
```

<!-- # References -->