---
title: "CopyNeutralIMA"
author:
  - name: Moritz Przybilla
    affiliation:
    - &dkfz Division of Theoretical Bioinformatics, German Cancer Research Center (DKFZ), Heidelberg, Germany
  - name: Xavier Pastor
    affiliation:
    - Heidelberg Center for Personalized Oncology (DKFZ-HIPO), Heidelberg, Germany
    - *dkfz
    email: xavier.pastor@compbio-dev.com

date: "`r Sys.Date()`"
output:
  BiocStyle::html_document:
    toc: true
    toc_float: true
    number_sections: true
  github_document:
    output_file: ../README.md
    toc: true
    toc_float: true
    number_sections: true
link-citations: true
bibliography: ../inst/REFERENCES.bib
vignette: >
  %\VignetteIndexEntry{CopyNeutralIMA}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
<style>
body {
text-align: justify}
</style>

```{r, echo = FALSE}
knitr::opts_chunk$set(collapse=TRUE, comment='#>')
```

# Overview
*CopyNeutralIMA* provides reference samples for performing copy-number variation (CNV) analysis using Illumina Infinium 450k or EPIC DNA methylation arrays. 
There is a number of R/Bioconductor packages that do genomic copy number profiling, including [*conumee*](http://bioconductor.org/packages/release/bioc/html/conumee.html) [@conumee], [*ChAMP*](http://bioconductor.org/packages/release/bioc/html/ChAMP.html) [@champ] or *CopyNumber450k*, now deprecated. In order to extract information about the copy number alterations, a set of copy neutral samples is required as a reference. The package *CopyNumber450kData*, usually used to provide the reference, is no longer available. Additionally, there has never been an effort to provide reference samples for the EPIC arrays. To fill this gap of lacking reference samples, we here introduce the *CopyNeutralIMA* package. 

# Description

In this package we provide a set of 51 IlluminaHumanMethylation450k and 13 IlluminaHumanMethylationEPIC samples. The provided samples consist of material from healthy individuals with nominally no copy number aberrations. Users of *conumee* or other copy number profiling packages may use this data package as reference genomes.

# Data

We selected the data from different studies accessible in the [Gene Expression Omnibus (GEO)](https://www.ncbi.nlm.nih.gov/geo/). In particular, for 450k arrays samples from [GSE49618](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE49618) [@GSE49618], [GSE61441](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE61441) [@GSE61441] and [GSE106089](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE106089) [@GSE106089] were chosen. For EPIC arrays, normal or control samples from series [GSE86831](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE86831)/[GSE86833](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE86833) [@GSE86831], [GSE98990](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE98990) [@GSE98990] and [GSE100825](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE100825) [@GSE100825] were chosen.

# Example with *conumee*

First, we load the data we want to analyse and rename it. We will use the examples provided by the [*minfiData*](https://bioconductor.org/packages/release/data/experiment/html/minfiData.html) [@minfiData] package and will follow the steps described in the vignette of *conumee*.

```{r read_tcga, message=F, warning=F}
library(minfi)
library(conumee)
library(minfiData)

data(RGsetEx)
sampleNames(RGsetEx) <- pData(RGsetEx)$Sample_Name
cancer <- pData(RGsetEx)$status == 'cancer'
RGsetEx <- RGsetEx[,cancer]
RGsetEx
```

After loading the data we normalize it:
```{r normalize_tcga}
MsetEx <- preprocessIllumina(RGsetEx)
MsetEx
```

Now we load our control samples, from the same array type as our test samples and normalize them:
```{r prepare_controls, message=F}
library(CopyNeutralIMA)
ima <- annotation(MsetEx)[['array']]
RGsetCtrl <- getCopyNeutralRGSet(ima)
# preprocess as with the sample data
MsetCtrl <- preprocessIllumina(RGsetCtrl)
MsetCtrl
```

Finally we can run the conumee analysis following the author's indications:
```{r conumee}
# use the information provided by conumee to create annotation files or define
# them according to the package instructions
data(exclude_regions)
data(detail_regions)
anno <- CNV.create_anno(array_type = "450k", exclude_regions = exclude_regions, detail_regions = detail_regions)

# load in the data from the reference and samples to be analyzed
control.data <- CNV.load(MsetCtrl)
ex.data <- CNV.load(MsetEx)

cnv <- CNV.fit(ex.data["GroupB_1"], control.data, anno)
cnv <- CNV.bin(cnv)
cnv <- CNV.detail(cnv)
cnv <- CNV.segment(cnv)
cnv

CNV.genomeplot(cnv)
CNV.genomeplot(cnv, chr = 'chr18')

head(CNV.write(cnv, what = 'segments'))
head(CNV.write(cnv, what='probes'))
```

---
nocite: |
    @minfi
...

# Session info {.unnumbered}
```{r session_info, echo = F}
sessionInfo()
```

# References