---
title: "MACPET"
author:
  - name: "Ioannis Vardaxis"
    email: ioannis.vardaxis@ntnu.no
date: "`r doc_date()`"
package: "`r BiocStyle::pkg_ver('MACPET')` Rversion >=`r getRversion()`"
abstract: |
    This vignette gives an introduction to the MACPET package which can be
    used for the analysis of paired-end DNA data like ChIA-PET. Throughout the
    vignette an introduction of MACPET classes, methods and functions will be given.
references:
- id: macpet
  title: MACPET Model-based Analysis for ChIA-PET
  author:
  - family: Vardaxis
    given: Ioannis
  - family: Drabl\o s
    given: Finn
  - family: Rye
    given: Morten
  - family: Lindqvist
    given: Bo Henry
  container-title: To be published
  volume: 
  URL: 
  DOI: 
  issue: 
  publisher:
  page: 
  type: article-journal
  issued:
    year: 
    month: 
output:
 BiocStyle::pdf_document
vignette: >
 %\VignetteIndexEntry{MACPET}
 %\VignetteEngine{knitr::rmarkdown}
 %\VignetteEncoding{UTF-8}
---

```{r style,eval=TRUE,echo=FALSE,results='hide'}
BiocStyle::latex
```

<!-- Intro -->
\section{Introduction}

The \Biocpkg{MACPET} package can be used for general analysis of paired-end (PET) data like ChIA-PET.
\Biocpkg{MACPET} currently implements the following five stages:

* Stage 0 (Linker filtering): Identifies linkers A and B in the fastq files and classifies the
reads as usable (A/A,B/B), chimeric (A/B,B/A) and ambiguous (non/A, non/B, A/non, B/non).
* Stage 1 (Mapping to the reference genome): Maps the usable reads separately into the reference genome using 
\Biocpkg{Rbowtie} package, and keeps only uniquely mapped reads with zero mismatch per read. 
It then maps the unmapped reads to the reference genome with at most one mismatch and keeps the uniquely mapped reads. Uniquely mapped reads with zero or one mismatch are then merged and paired, their duplicates are marked and a paired-end bam file is created which is used in State 2.
* Stage 2 (PET classification): Classifies the PETs as self-ligated (short genomic distance, same chromosome),
intra-chromosomal (long genomic distance, same chromosome) by finding the self-ligated cut-off using the elbow method,
and inter-chromosomal (different chromosomes).  Furthermore, it removes identically mapped PETs for reducing noise created by amplification procedures. Moreover, it can remove black-listed regions based on the genome of the data. Note that loading the data into R might take a while depending on the size of the data.
* Stage 3 (Peak-calling): Uses the self-ligated PETs found in Stage 2 and segments the genome into non-overlapping
regions. It then uses both reads of each PET and applies 2D mixture models for identifying two-dimensional 
clusters which represent candidate binding sites using the
skewed generalized students-t distributions (SGT). Finally, it uses a local Poisson
model for finding significant binding sites. 
* Stage 4 (Interaction-calling): This stage uses the intra- and inter-chromosomal PETs found in State 2, as well as the significant Peaks found in Stage 3 for calling for significant interactions. NOTE: currently inter-chromosomal PETs are not supported.


\Biocpkg{MACPET} identifies binding site locations more accurately than other algorithms which use only one end (like MACS) [@macpet]. The output from Stage 3 in \Biocpkg{MACPET} can be used for interaction analysis using either MANGO or MICC, or the user can run State 4 in \Biocpkg{MACPET} for interaction analysis. Note that in the case of using the output from \Biocpkg{MACPET} in MANGO or MICC for interaction analysis, the user should use the self-ligated cut-off found by \Biocpkg{MACPET}, and not the one found in MANGO or MICC. Both of those algorithms allow the user to specify the self-ligated cut-off. MACPET is mainly written in C++, and it supports the  \Biocpkg{BiocParallel}  package.


Before starting with examples of how to use \Biocpkg{MACPET}, create a 
test folder to save all the output files of the examples presented in this 
vignette:
```{r,eval=TRUE,echo=TRUE}
#Create a temporary test folder, or anywhere you want:
SA_AnalysisDir=file.path(tempdir(),"MACPETtest")
dir.create(SA_AnalysisDir)#where you will save the results.
```

Load the package:
```{r}
library(MACPET)
```

<!-- Classes -->
\section{\Biocpkg{MACPET} Classes}
\label{sec:classes}

\Biocpkg{MACPET} provides five different classes which all inherit from the 
\Rclass{GInteractions} class in the \Biocpkg{InteractionSet} package. Therefore, 
every method associated with the \Rclass{GInteractions} class is also applicable 
to the \Biocpkg{MACPET} classes. Every \Biocpkg{MACPET} class contains information 
of the PETs associated with the corresponding class, their start/end coordinates on 
the genome as well as which chromosome they belong to. 
This section provides an overview of the \Biocpkg{MACPET} classes, while methods
associated with each class are presented in latter sections.
The classes provided by \Biocpkg{MACPET} are the following:

* \textcolor{LimeGreen}{\Rclass{PSelf}} class contains information 
about the self-ligated PETs in the data. This class is created using either the 
\Rfunction{MACPETUlt} function at stage 2 or the \Rfunction{ConvertToPSelf} 
function.
* \textcolor{LimeGreen}{\Rclass{PSFit}} class is an update of the
\textcolor{LimeGreen}{\Rclass{PSelf}} class, which contains information about
which binding site each PET belongs to, as well as significant peaks found by the
peak-calling algorithm. This class is created using the \Rfunction{MACPETUlt}
function at stage 3.
* \textcolor{LimeGreen}{\Rclass{PInter}} class contains information about Inter-chromosomal
PETs in the data. This class is created using the \Rfunction{MACPETUlt}
function at stage 2.
* \textcolor{LimeGreen}{\Rclass{PIntra}} class contains information about Intra-chromosomal
PETs in the data. This class is created using the \Rfunction{MACPETUlt}
function at stage 2.
* \textcolor{LimeGreen}{\Rclass{GenomeMap}} class contains information about
the interactions in the genome. This class is created using the \Rfunction{MACPETUlt}
function at stage 4. Then the user can use the \Rfunction{GetSignInteractions} function for subseting the 
significant interactions from the object and return a \Rclass{GInteractions} class object.

\subsection{\textcolor{LimeGreen}{\Rclass{PSelf}} Class}

The \textcolor{LimeGreen}{\Rclass{PSelf}} class contains pair-end tag information of self-ligated PETs
which is used for binding site analysis.
```{r}
load(system.file("extdata", "MACPET_pselfData.rda", package = "MACPET"))
class(MACPET_pselfData) #example name
MACPET_pselfData #print method
```


Extra information of this class is stored as list in the metadata entries with 
the following elements: 

* Self_info: a two-column data.frame with information about the chromosomes
in the data (chrom) and  the total PET counts of each chromosome (PET.counts). 
* SLmean: which is the mean size of the self-ligated PETs.
* MaxSize: The maximum self-ligated PET size in the data.
* MinSize: The minimum self-ligated PET size in the data.
```{r}
metadata(MACPET_pselfData)
```

One can also access information about chromosome lengths etc.
```{r}
seqinfo(MACPET_pselfData)
```


\subsection{\textcolor{LimeGreen}{\Rclass{PSFit}} Class}

The \textcolor{LimeGreen}{\Rclass{PSFit}} class adds information
to the \textcolor{LimeGreen}{\Rclass{PSelf}} class about the
peak each PET belongs to, as well as the total number of peaks
in each chromosome in the data, p-values and FDR for each peak.

```{r}
load(system.file("extdata", "MACPET_psfitData.rda", package = "MACPET"))
class(MACPET_psfitData) #example name
MACPET_psfitData #print method
```


This class updates the Self_info data frame of the \textcolor{LimeGreen}{\Rclass{PSelf}} class
with two extra columns: the total regions each chromosome is segmented into (Region.counts)
and the total candidate peaks of each chromosome (Peak.counts). Moreover, this class contains
a metadata entry which is a matrix containing region and peak IDs for each PET in the data (Classification.Info).
Finally, it also contains a metadata entry with information about each peak found (Peaks.Info). Peaks.Info is
a data.frame with the following entries:

* Chrom: The name of the chromosome
* Pets: Total PETs in the peak.
* Peak.Summit: Summit of the peak.
* Up.Summit: Summit of the left-stream PETs.
* Down.Summit: Summit of the right-stream PETs.
* CIQ.Up.start: Start of 95 Quantile confidence interval for the left-stream PETs.
* CIQ.Up.end: End of 95 Quantile confidence interval for the left-stream PETs.
* CIQ.Up.size: Size of 95 Quantile confidence interval for the left-stream PETs.
* CIQ.Down.start: Start of 95 Quantile confidence interval for the right-stream PETs.
* CIQ.Down.end: End of 95 Quantile confidence interval for the right-stream PETs.
* CIQ.Down.size: Size of 95 Quantile confidence interval for the right-stream PETs.
* CIQ.Peak.size: Size of the Peak based on the interval (CIQ.Up.start,CIQ.Down.end).
* sdx: The standard deviation of the upstream PETs.
* lambdax: The skewness of the upstream PETs. 
* sdy: The standard deviation of the downstream PETs.
* lambday: The skewness of the downstream PETs.
* lambdaUp: The expected number of PETs in the left-stream Peak region by random chance.
* FoldEnrichUp: Fold enrichment for the left-stream Peak region.
* p.valueUp: p-value for the left-stream Peak region.
* lambdaDown: The expected number of PETs in the right-stream Peak region by random chance.
* FoldEnrichDown: Fold enrichment for the right-stream Peak region.
* p.valueDown: p-value for the right-stream Peak region.
* p.value: p-value for the Peak (p.valueUp*p.valueDown).
* FDRUp: FDR correction for the left-stream Peak region.
* FDRDown: FDR correction for the right-stream Peak region.
* FDR: FDR correction for the Peak.
```{r}
head(metadata(MACPET_psfitData)$Peaks.Info)
```

One can also access information about chromosome lengths etc, using  
\Rfunction{seqinfo(MACPET\_psfitData)}.


\subsection{\textcolor{LimeGreen}{\Rclass{PInter}} Class}

The \textcolor{LimeGreen}{\Rclass{PInter}} class contains pair-end tag information
of Inter-chromosomal PETs:

```{r}
load(system.file("extdata", "MACPET_pinterData.rda", package = "MACPET"))
class(MACPET_pinterData) #example name
MACPET_pinterData #print method
```

One can also access information about chromosome lengths etc, using  
\Rfunction{seqinfo(MACPET\_pinterData)}.

It also contains a two-element metadata list with the following elements:

* InteractionCounts: a table with the total number of Inter-chromosomal PETs between chromosomes. Where
the rows represent the "from"  anchor and the columns the "to" anchor.
```{r,eval=TRUE}
metadata(MACPET_pinterData)
```


\subsection{\textcolor{LimeGreen}{\Rclass{PIntra}} Class}


The \textcolor{LimeGreen}{\Rclass{PIntra}} class contains pair-end tag information
of Intra-chromosomal PETs.

```{r}
load(system.file("extdata", "MACPET_pintraData.rda", package = "MACPET"))
class(MACPET_pintraData)#example name
MACPET_pintraData#print method
```

One can also access information about chromosome lengths etc, using  
\Rfunction{seqinfo(MACPET\_pintraData)}.

It also contains a two-element metadata list with the following elements:

* InteractionCounts: a data.frame with the total number of Intra-chromosomal PETs for each chromosome (Counts).
```{r}
metadata(MACPET_pintraData)
```
\subsection{\textcolor{LimeGreen}{\Rclass{GenomeMap}} Class}

The \textcolor{LimeGreen}{\Rclass{GenomeMap}} class contains all potential interactions between pairs
of peaks, as well as the peaks' anchors.
```{r}
load(system.file("extdata", "MACPET_GenomeMapData.rda", package = "MACPET"))
class(MACPET_GenomeMapData) #example name
MACPET_GenomeMapData #print method
```
Extra information of this class is stored as list in the metadata entries with 
the following elements: 

* pvalue: The p-value of the interaction.
* FDR: The FDR of the interaction.
* Order: The order the interaction was entered into the model.
* TotalInterPETs:  The total interaction PETs between every two interacting peaks.

Each row in the metadata entry corresponds to the same row in the main object.
```{r}
metadata(MACPET_GenomeMapData)
```
<!--  methods -->
\section{\Biocpkg{MACPET} Methods}

This section describes methods associated with the classes in the \Biocpkg{MACPET} package.

<!-- summary -->
\subsection{summary-method}
All \Biocpkg{MACPET} classes are associated with a summary method which sums
up the information stored in each class:

\subsubsection{\textcolor{LimeGreen}{\Rclass{PSelf}} Class}

\Rfunction{summary} for \textcolor{LimeGreen}{\Rclass{PSelf}} class prints information about
the total number of self-ligated PETs for each chromosome, as well as the total number of
self-ligated PETs in the data, their min/max length and genome information of the data:
```{r}
class(MACPET_pselfData)
summary(MACPET_pselfData)
```


\subsubsection{\textcolor{LimeGreen}{\Rclass{PSFit}} Class}
\Rfunction{summary} for \textcolor{LimeGreen}{\Rclass{PSFit}} class adds information to the 
\Rfunction{summary} of \textcolor{LimeGreen}{\Rclass{PSelf}} class. The new information is
the total regions found and analysed for each chromosome and the total number of 
candidate binding sites found on each chromosome:
```{r}
class(MACPET_psfitData)
summary(MACPET_psfitData)
```


\subsubsection{\textcolor{LimeGreen}{\Rclass{PIntra}} Class}
\Rfunction{summary} for \textcolor{LimeGreen}{\Rclass{PIntra}} class prints 
information about the total number of intra-ligated PETs for each chromosome,
as well as information about the genome. The user can 
choose to plot a heat-map for the total number of intra-ligated PETs on each chromosome:
```{r}
class(MACPET_pintraData)
requireNamespace("ggplot2")
requireNamespace("reshape2")
summary(MACPET_pintraData,heatmap=TRUE)
```

\subsubsection{\textcolor{LimeGreen}{\Rclass{PInter}} Class}
\Rfunction{summary} for \textcolor{LimeGreen}{\Rclass{PInter}} class prints 
information about the total number of inter-ligated PETs for each chromosome,
as well as information about the genome. The user can 
choose to plot a heat-map for the total number of inter-ligated PETs connecting the chromosomes:
```{r}
class(MACPET_pinterData)
requireNamespace("ggplot2")
requireNamespace("reshape2")
summary(MACPET_pinterData,heatmap=TRUE)
```
\subsubsection{\textcolor{LimeGreen}{\Rclass{GenomeMap}} Class}

\Rfunction{summary} for \textcolor{LimeGreen}{\Rclass{GenomeMap}} class prints information about
the total number of interactions in the data. The user can provide a threshold for the FDR cut-off of the
significant interactions to make the summary from. Alternatively if threshold=NULL all the interactions will be used for the summary.
```{r}
class(MACPET_GenomeMapData)
summary(MACPET_GenomeMapData)
```

<!-- plot -->
\subsection{plot-method}
All \Biocpkg{MACPET} classes are associated with a plot method which can be 
used to visualize counts, PETs in a region, as well as binding sites. Here we give
some examples for the usage of the plot methods, however more arguments can be
provided to the plot methods, see \Rpackage{MACPET::plot}.

\subsubsection{\textcolor{LimeGreen}{\Rclass{PSelf}} Class}
\Rfunction{plot} for \textcolor{LimeGreen}{\Rclass{PSelf}}  Class will create a
bar-plot showing the total number of self-ligated PETs on each chromosome. 
The x-axis are the chromosomes and the y-axis are the corresponding frequencies.
```{r}
requireNamespace("ggplot2")
class(MACPET_pselfData)
# PET counts plot
plot(MACPET_pselfData)
```


\subsubsection{\textcolor{LimeGreen}{\Rclass{PSFit}} Class}
\Rfunction{plot} for \textcolor{LimeGreen}{\Rclass{PSFit}} Class will create a
bar-plot (if kind="PeakCounts") showing the total number of candidate binding 
sites found on each chromosome. The x-axis are the chromosomes and the y-axis 
are the corresponding frequencies.
```{r}
class(MACPET_psfitData)
#binding site couts:
plot(MACPET_psfitData,kind="PeakCounts")
```

Other kind of plots are also supported for this class. For example if 
kind="PeakPETs", then a visual representation of a region will be plotted 
(RegIndex chooses which region to plot with 1 meaning the one with the
highest total of PETs in it). The x-axis are the genomic coordinates of the region
and the y-axis if the sizes of the PETs. Each segment represents a PET from its
start to its end coordinate. Different colors of colors represent which 
binding site each PET belongs to, with red (PeakID=0) representing the noise
cluster. Vertical lines represent the exact binding location of the 
binding site.
```{r}
# region example with binding sites:
plot(MACPET_psfitData,kind="PeakPETs",RegIndex=1)
```


\subsubsection{\textcolor{LimeGreen}{\Rclass{PIntra}} Class}
\Rfunction{plot} for \textcolor{LimeGreen}{\Rclass{PIntra}} Class will create a
bar-plot showing the total number of intra-ligated PETs on each chromosome. 
The x-axis are the chromosomes and the y-axis are the corresponding 
frequencies.
```{r}
class(MACPET_pintraData)
#plot counts:
plot(MACPET_pintraData)
```

\subsubsection{\textcolor{LimeGreen}{\Rclass{PInter}} Class}
\Rfunction{plot} for \textcolor{LimeGreen}{\Rclass{PInter}} Class.
Each node represents a chromosome where the size of the node is proportional to
the total number of Inter-chromosomal PETs leaving from this chromosome. Edges
connect interacting chromosomes where the thickness of each edge is proportional
to the total number of Inter-chromosomal PETs connecting the two chromosomes.
```{r}
class(MACPET_pinterData)
requireNamespace("igraph")
#network plot:
plot(MACPET_pinterData)
```

\subsubsection{\textcolor{LimeGreen}{\Rclass{GenomeMap}} Class}
\Rfunction{plot} for \textcolor{LimeGreen}{\Rclass{GenomeMap}} Class.
Different kind of plot can be created using the Type parameter.
The user can also specify a threshold for the significant interactions to make the plots from.
In the following example, each node represents a chromosome and the edges show which chromosomes
have significant interactions between them.
```{r}
class(MACPET_GenomeMapData)
requireNamespace("igraph")
#network plot:
plot(MACPET_GenomeMapData,Type='network-circle')
```


\subsection{exportPeaks methods}
\textcolor{LimeGreen}{\Rclass{PSFit}} 
class has a method which exports the binding site information stored in 
\Rfunction{metadata(object)[['Peaks.Info']]} into csv files in a given directory
if one wishes to have the binding sites in an excel file.
The user can also specify a threshold for the FDR. If no threshold is specified  all
the binding sites found by the algorithm are exported.

```{r,eval=TRUE,echo=TRUE}
class(MACPET_psfitData)#PSFit class
exportPeaks(object=MACPET_psfitData,file.out="Peaks",threshold=1e-5,savedir=SA_AnalysisDir)
```


\subsection{PeaksToGRanges methods}
\textcolor{LimeGreen}{\Rclass{PSFit}}
class has also a method which converts the binding sites found by the peak-calling algorithm
into a \Rclass{GRanges} object with start and end coordinates the binding site's confidence interval (CIQ.Up.start,CIQ.Down.end).
It furthermore contains information about the total number of PETs in the peak (TotPETs),
the p-value of the peak (p.value) and its FDR (FDR). The user can also specify an FDR
 threshold for returning significant peaks. If threshold=NULL, all the found peaks
 are returned.

```{r,eval=TRUE,echo=TRUE}
class(MACPET_psfitData)#PSFit class
object=PeaksToGRanges(object=MACPET_psfitData,threshold=1e-5)
object
```

\subsection{TagsToGInteractions methods}

\textcolor{LimeGreen}{\Rclass{PSFit}}
class has also a method which returns only PETs belonging to peaks 
(removing noisy or insignificant PETs) as a
\Rclass{GInteractions} object. This might be useful if one wishes to
visualize the tags belonging to PETs of binding sites on the
genome-browser. The user can also specify an FDR
 threshold for returning significant peaks. 
 If threshold=NULL, all the found peaks are returned.

```{r,eval=TRUE,echo=TRUE}
class(MACPET_psfitData)#PSFit class
TagsToGInteractions(object=MACPET_psfitData,threshold=1e-5)

```


\subsection{PeaksToNarrowPeak methods}
\textcolor{LimeGreen}{\Rclass{PSFit}} 
class has a method which converts peaks of an object of \textcolor{LimeGreen}{\Rclass{PSFit}}
class to narrowPeak object. 
The object is saved in a user specified directory and can be used in the
MANGO or MICC algorithms for interaction analysis. Alternatively, the user can use stage 4 in \Rfunction{MACPETUlt} for running the interaction analysis stage.

```{r,eval=TRUE,echo=TRUE}
class(MACPET_psfitData)#PSFit class
PeaksToNarrowPeak(object=MACPET_psfitData,threshold=1e-5,
                  file.out="MACPET_peaks.narrowPeak",savedir=SA_AnalysisDir)
```


\subsection{ConvertToPSelf methods}

This method if for the 
\textcolor{LimeGreen}{\Rclass{GInteractions}} class. It converts a
\Rclass{GInteractions} object to \textcolor{LimeGreen}{\Rclass{PSelf}} object. 
This method could be used in case the user already has the self-ligated PETs
separated from the rest of the data and wishes to run a binding site analysis on those only using stage 3 in  \Rfunction{MACPETUlt}. The output object will be saved in the user-specified directory.

```{r,eval=TRUE,echo=TRUE}
 #--remove information and convert to GInteractions:
object=MACPET_pselfData
#--remove information and convert to GInteractions:
S4Vectors::metadata(object)=list(NULL)
class(object)='GInteractions'
#----input parameters
S2_BlackList=TRUE
SA_prefix="MACPET"
S2_AnalysisDir=SA_AnalysisDir

ConvertToPSelf(object=object,
               S2_BlackList=S2_BlackList,
               SA_prefix=SA_prefix,
               S2_AnalysisDir=S2_AnalysisDir)
#load object:
rm(MACPET_pselfData)#old object
load(file.path(S2_AnalysisDir,"MACPET_pselfData"))
class(MACPET_pselfData)

```

\subsection{GetSignInteractions methods}
\textcolor{LimeGreen}{\Rclass{GenomeMap}} 
class has a method which subsets the significant interactions given a user-specified FDR threshold and
returns either a \textcolor{LimeGreen}{\Rclass{GInteractions}} class object for the interactions (each row corresponds to one interaction), or it saves the significant interactions into an excel file in a user specified directory. Metadata columns are also provided which given further information about each interaction.

```{r,eval=TRUE,echo=TRUE}
class(MACPET_GenomeMapData)#GenomeMap class
GetSignInteractions(object=MACPET_GenomeMapData,
                     threshold = NULL,
                     ReturnedAs='GInteractions')
```

\subsection{GetShortestPath methods}
\textcolor{LimeGreen}{\Rclass{GenomeMap}} 
class has a method which finds the length of the  shortest path between two user-specified peaks. Currently it only finds the shortest paths between intra-chromosomal peaks. Therefore, the peaks have to be on the same chromosome.
The resulting value is a two-element list with the first element named LinearPathLength
for the linear length of the path between summits of the two peaks,
and the second element named ThreeDPathLength for the 3D length of the shortest path
between summits of the two peaks.

```{r,eval=TRUE,echo=TRUE}
class(MACPET_GenomeMapData)#GenomeMap class
GetShortestPath(object=MACPET_GenomeMapData,
                     threshold = NULL,
                     ChrFrom="chr1",
                     ChrTo="chr1",
                     SummitFrom=10000,
                     SummitTo=1000000)
```


<!--  supplementary functions -->
\section{\Biocpkg{MACPET} Supplementary functions}

\subsection{AnalysisStatistics function}

\Rfunction{AnalysisStatistics} function can be used for all the
classes of the \Biocpkg{MACPET} package for printing and/or saving statistics
of the classes in csv file in a given working directory. Input for Self-ligated
PETs of one of the classes (\textcolor{LimeGreen}{\Rclass{PSelf}},
\textcolor{LimeGreen}{\Rclass{PSFit}}) is mandatory, while input for
the Intra- and Inter-chromosomal PETs is not. 

If the input for the Self-ligated PETs is of \textcolor{LimeGreen}{\Rclass{PSFit}}
class, a threshold can be given for the FDR cut-off.

Here is an example:
```{r,echo=TRUE,eval=TRUE}
AnalysisStatistics(x.self=MACPET_psfitData,
                   x.intra=MACPET_pintraData,
                   x.inter=MACPET_pinterData,
                   file.out='AnalysisStats',
                   savedir=SA_AnalysisDir,
                   threshold=1e-5)

```

\subsection{ConvertToPE\_BAM function}

\Rfunction{ConvertToPE\_BAM} in case the user has two separate BAM files from 
read 1 and 2 of the paired data, and needs to pair them in one paired-end 
BAM file for further analysis in stage 2-3 on the \Rfunction{MACPETUlt} function.
The output paired-end BAM file will be saved in the user-specified directory.

Here is an example:
```{r,echo=TRUE,eval=TRUE}
requireNamespace('ggplot2')

#Create a temporary forder, or anywhere you want:
S1_AnalysisDir=SA_AnalysisDir

#directories of the BAM files:
BAM_file_1=system.file('extdata', 'SampleChIAPETDataRead_1.bam', package = 'MACPET')
BAM_file_2=system.file('extdata', 'SampleChIAPETDataRead_2.bam', package = 'MACPET')
SA_prefix="MACPET"

#convert to paired-end BAM:
ConvertToPE_BAM(S1_AnalysisDir=S1_AnalysisDir,
             SA_prefix=SA_prefix,
             S1_BAMStream=2000000,S1_image=TRUE,
             S1_genome="hg19",BAM_file_1=BAM_file_1,
             BAM_file_2=BAM_file_2)

#test if the resulted BAM is paired-end:
PairedBAM=file.path(S1_AnalysisDir,paste(SA_prefix,"_Paired_end.bam",sep=""))
Rsamtools::testPairedEndBam(file = PairedBAM, index = PairedBAM)

bamfile = Rsamtools::BamFile(file = PairedBAM,asMates = TRUE)
GenomicAlignments::readGAlignmentPairs(file = bamfile,use.names = FALSE,
                                with.which_label = FALSE,
                                strandMode = 1)

```

<!-- analysis workflow -->
\section{Peak Calling Workflow}

The main function which the user can use for running a paired-end data 
analysis is called \Rfunction{MACPETUlt}. It consists of the  five stages described
in the introduction section. The user may run the whole pipeline/analysis at once using Stages=c(0:4) or step by step using a single stage at a time. The function supports the \Biocpkg{BiocParallel} package.

For the following example we run stages 2 and 4 of the \Rfunction{MACPETUlt} only.
The reason is that for running state 1, the bowtie index is needed which is 
too big for downloading it here.


```{r,echo=TRUE,eval=TRUE}

#give directory of the BAM file:
S2_PairedEndBAMpath=system.file('extdata', 'SampleChIAPETData.bam', package = 'MACPET')

#give prefix name:
SA_prefix="MACPET"

#parallel backhead can be created using the BiocParallel package
#parallel backhead can be created using the BiocParallel package
#requireNamespace('BiocParallel')
#snow <- BiocParallel::SnowParam(workers = 4, type = 'SOCK', progressbar=FALSE)
#BiocParallel::register(snow, default=TRUE)

# packages for plotting:
requireNamespace('ggplot2')

#-run for the whole binding site analysis:
MACPETUlt(SA_AnalysisDir=SA_AnalysisDir,
       SA_stages=c(2:4),
       SA_prefix=SA_prefix,
       S2_PairedEndBAMpath=S2_PairedEndBAMpath,
       S2_image=TRUE,
       S2_BlackList=TRUE,
       S3_image=TRUE,
       S4_image=TRUE,
       S4_FDR_peak=1)# the data is small so use all the peaks found.

#load results:
SelfObject=paste(SA_prefix,"_pselfData",sep="")
load(file.path(SA_AnalysisDir,"S2_results",SelfObject))
SelfObject=get(SelfObject)
class(SelfObject) # see methods for this class

IntraObject=paste(SA_prefix,"_pintraData",sep="")
load(file.path(SA_AnalysisDir,"S2_results",IntraObject))
IntraObject=get(IntraObject)
class(IntraObject) # see methods for this class

InterObject=paste(SA_prefix,"_pinterData",sep="")
load(file.path(SA_AnalysisDir,"S2_results",InterObject))
InterObject=get(InterObject)
class(InterObject) # see methods for this class

SelfFitObject=paste(SA_prefix,"_psfitData",sep="")
load(file.path(SA_AnalysisDir,"S3_results",SelfFitObject))
SelfFitObject=get(SelfFitObject)
class(SelfFitObject) # see methods for this class

GenomeMapObject=paste(SA_prefix,"_GenomeMapData",sep="")
load(file.path(SA_AnalysisDir,"S4_results",GenomeMapObject))
GenomeMapObject=get(GenomeMapObject)
class(GenomeMapObject) # see methods for this class

#-----delete test directory:
unlink(SA_AnalysisDir,recursive=TRUE)

```


 \Rfunction{MACPETUlt} saves its outputs in SA\_AnalysisDir in the folders
 S0\_results, S1\_results, S2\_results, S3\_results and S4\_results based on the stages run.
 The output of \Rfunction{MACPETUlt} in those folders is the following:
 
Stage 0: (output saved in a folder named S0\_results in  SA\_AnalysisDir)

* SA\_prefix\_usable\_1.fastq.gz: fastq.gz files with the usable 5-end tags. To be used in Stage 1.
* SA\_prefix\_usable\_2.fastq.gz: fastq.gz files with the usable 3-end tags. To be used in Stage 1.
* SA\_prefix\_chimeric\_1.fastq.gz: fastq.gz files with the chimeric 5-end tags.
* SA\_prefix\_chimeric\_2.fastq.gz: fastq.gz files with the chimeric 3-end tags.
* SA\_prefix\_ambiguous\_1.fastq.gz: fastq.gz files with the ambiguous 5-end tags. 
* SA\_prefix\_ambiguous\_2.fastq.gz: fastq.gz files with the ambiguous 3-end tags.
* SA\_prefix\_stage\_0\_image.jpg: Pie chart image with the split of two fastq files used as input (if S0\_image==TRUE)


Stage 1: (output saved in a folder named S1\_results in  SA\_AnalysisDir)

* SA\_prefix\_usable\_1.sam: sam file with the mapped 5-end reads (if S1\_rmSam==FALSE).
* SA\_prefix\_usable\_2.sam: sam file with the mapped 3-end reads (if S1\_rmSam==FALSE).
* SA\_prefix\_Paired\_end.bam: paired-end bam file with the mapped PETs. To be used in Stage 2
* SA\_prefix\_Paired\_end.bam.bai: .bai paired-end bam file with the mapped PETs. To be used in Stage 2
* SA\_prefix\_stage\_1\_p1\_image.jpg: Pie-chart for the mapped/unmapped reads from SA\_prefix\_usable\_1.sam
and SA\_prefix\_usable\_2.sam (if S1\_image==TRUE).
* SA\_prefix\_stage\_1\_p2\_image.jpg: Pie-chart for the paired/unpaired reads of SA\_prefix\_Paired\_end.bam (if S1\_image==TRUE).


Stage 2: (output saved in a folder named S2\_results in  SA\_AnalysisDir)

* SA\_prefix\_pselfData: An object of \textcolor{LimeGreen}{\Rclass{PSelf}}
 class, containing the self-ligated PETs. To be used in Stage 3.
* SA\_prefix\_pintraData: An object of \textcolor{LimeGreen}{\Rclass{PIntra}}
 class, containing the intra-chromosomal PETs.
* SA\_prefix\_pinterData: An object of \textcolor{LimeGreen}{\Rclass{PInter}}
 class, containing the inter-chromosomal PETs.
* SA\_prefix\_stage\_2\_p1\_image.jpg: Pie-chart reliable/duplicated/black-listed PETs of SA\_prefix\_Paired\_end.bam (if S2\_image==TRUE).
* SA\_prefix\_stage\_2\_p2\_image.jpg: Histogram with the self-ligated/intra-chromosomal cut-off of SA\_prefix\_Paired\_end.bam (if S2\_image==TRUE).
* SA\_prefix\_stage\_2\_p3\_image.jpg: Pie-chart for the self-ligated/intra-chromosomal/inter-chromosomal PETs of SA\_prefix\_Paired\_end.bam (if S2\_image==TRUE).

Stage 3: (output saved in a folder named S3\_results in  SA\_AnalysisDir)

* SA\_prefix\_psfitData: An object of \textcolor{LimeGreen}{\Rclass{PSFit}} class. This object contains
 peaks found by the peak-calling algorithm along with their p-values and FDR.
* SA\_prefix\_stage\_3\_p1\_image.jpg: Sizes of the upstream vs downstream peaks of each binding site given the binding site's FDR (if S3\_image==TRUE).
* SA\_prefix\_stage\_3\_p2\_image.jpg: FDR of the binding sites. The horizontal red line is at FDR=0.05 (if S3\_image==TRUE).
* SA\_prefix\_stage\_3\_p3\_image.jpg: Comparison of binding site sizes given their FDR (if S3\_image==TRUE).
* SA\_prefix\_stage\_3\_p4\_image.jpg: FDR for the upstream/donwstream peaks of the binding sites given the binding sites FDR (if S3\_image==TRUE).

Stage 4: (output saved in a folder named S4\_results in  SA\_AnalysisDir)

* SA\_prefix\_GenomeMapData: An object of \textcolor{LimeGreen}{\Rclass{GenomeMap}} class. This object contains all the interactions found in the data.
* SA\_prefix\_stage\_4\_p1\_image.jpg: Pie charts for the total number of peaks used in the 
interaction analysis as well as the total number of interaction PETs used (if S4\_image==TRUE).



Stages 0:4:

*  All the above outputs in separate folders.
 

Furthermore a log-file named SA\_prefix\_analysis.log with the progress of the analysis
is also saved in the SA\_AnalysisDir.