---
title: 'OMICsPCAdata: Supporting data for package OMICsPCA'
author: "Subhadeep Das"
date: "`r Sys.Date()`"
output:
    html_document:
      df_print: tibble
      fig_height: 7
      fig_width: 9
      highlight: pygments
      keep_md: yes
      number_sections: yes
      theme: lumen
      toc: yes
      toc_depth: 6
vignette: >
    %\VignetteEncoding{UTF-8} 
    %\VignetteIndexEntry{OMICsPCAdata} 
    %\VignetteEngine{knitr::rmarkdown}
---

<style type="text/css">

body, td {
    font-size: 20px;
    text-align: justify
}
code.r{
    font-size: 15px;
}
pre {
    font-size: 15px
}
</style>


```{r include = TRUE, echo = FALSE, message = FALSE, warning = FALSE}
library(OMICsPCAdata)
library(kableExtra)
library(rmarkdown)
library(knitr)
library(MultiAssayExperiment)
```

```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#",
eval = TRUE
)

```

```{r global_options, include=FALSE}
knitr::opts_chunk$set(fig.pos = 'H')
```

```{r echo = FALSE}
datalist <- data(package = "OMICsPCAdata")
datanames <- datalist$results[,3]
data(list = datanames)
assays <- assays(multi_assay)
```

# Overview

This package contains suporting data for the package OMICsPCA.
The datasets included in this package are used to run the examples
provided at the vignette and the man pages of the
functions of OMICsPCA package.

# Datasets

OMICsPCAdata contains 4 data sets including 1 
expression dataset (CAGE) and 3 ChIP-seq 
experiments of various Histone
modifications. Each dataset contains the values 
corresponding to 28770 GENCODE TSS groups (rows).
TSS groups are obtained by merging the neighboring TSSs
together. Each column contains either ChIP-seq
peak intensities (for Histone modifications) or length
of DHS (for DNaseI hypersensitivity) or number of reads
(tpm) of CAGE defined TSSs (CTSS), coming from a specific
cell line.


The datasets are described below:

```{r echo =FALSE}
data_summary = data.frame(
Name = c(
"CAGE" ,"H2az" ,"H3k9ac",
"H3k4me1"),

Type_of_Assay = c(
"Expression of Transcription Start Sites(TSS)",
"location of Histone modification peaks",
"location of Histone modification peaks",
"location of Histone modification peaks"
),

No_of_Cell_lines =c(
ncol(assays$CAGE),
ncol(assays$H2az),
ncol(assays$H3k9ac),
ncol(assays$H3k4me1)
),

Name_of_Cell_lines = c(
paste(names(assays$CAGE), collapse = " ,"),
paste(names(assays$H2az), collapse = " ,"),
paste(names(assays$H3k9ac), collapse = " ,"),
paste(names(assays$H3k4me1), collapse = " ,")
),

Type_of_data = c(
"Cap Analysis\nof\nGene Expression",
"ChIP-seq",
"ChIP-seq",
"ChIP-seq"
)
)
names(data_summary) <- c("Assay", "Assay\nType", "Number of\nCell lines",
"Name of\ncell lines", "Experiment")
```
\  
```{r,eval = TRUE, echo=FALSE, results='asis'}
#for html
knitr::kable(data_summary, caption = "Summary of data sets", align = 'c') %>% 
kable_styling("bordered",full_width = FALSE, position = "center") %>%
column_spec(1, bold = FALSE, border_right = FALSE,
border_left = FALSE, width = "5em") %>%
column_spec(2, bold = FALSE, border_right = FALSE,
border_left = FALSE, width = "5em") %>%
column_spec(3, border_right = FALSE, width = "4em") %>%
column_spec(4, border_right = FALSE, width = "23em") %>%
column_spec(5, border_right = FALSE, width = "5em")
```
\  

# Example of datasets

Each dataset contains the value of 28770 GENCODE TSS groups in
multiple Cell lines. Here is an example:

## CAGE data

```{r}
# The CAGE data set contains normalized CAGE data of 28770 GENCODE
#TSS groups in from 31 cell lines

dim(assays$CAGE)

# Let's look at the first five rows and columns of this dataset
head(assays$CAGE[1:5,1:5])
```