---
title: "cBioPortalData: API Reference Guide for Devs"
author: "Marcel Ramos & Levi Waldron"
date: "`r format(Sys.time(), '%B %d, %Y')`"
vignette: >
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteIndexEntry{cBioPortal Developer Guide}
  %\VignetteEncoding{UTF-8}
output:
  BiocStyle::html_document:
    number_sections: no
    toc: yes
    toc_depth: 4
---

```{r, setup, include=FALSE}
knitr::opts_chunk$set(cache = TRUE)
```

# Installation

Please use the devel version of the `AnVIL` Bioconductor package.

```{r,include=TRUE,results="hide",message=FALSE,warning=FALSE}
library(cBioPortalData)
library(AnVIL)
```

# Introduction

The cBioPortal for Cancer Genomics [website](https://cbioportal.org) is a great
resource for interactive exploration of study datasets. However, it does not
easily allow the analyst to obtain and further analyze the data.

We've developed the `cBioPortalData` package to fill this need to
programmatically access the data resources available on the cBioPortal.

The `cBioPortalData` package provides an R interface for accessing the
cBioPortal study data within the Bioconductor ecosystem.

It downloads study data from the cBioPortal API (https://cbioportal.org/api)
and uses Bioconductor infrastructure to cache and represent the data.

We use the [`MultiAssayExperiment`][1] (@Ramos2017-er) package to integrate,
represent, and coordinate multiple experiments for the studies availble in the
cBioPortal. This package in conjunction with `curatedTCGAData` give access to
a large trove of publicly available bioinformatic data. Please see our
publication [here][2] (@Ramos2020-ya).

[1]: https://dx.doi.org/10.1158/0008-5472.CAN-17-0344
[2]: https://dx.doi.org/10.1200/CCI.19.00119

We demonstrate common use cases of `cBioPortalData` and `curatedTCGAData`
during Bioconductor conference
[workshops](https://waldronlab.io/MultiAssayWorkshop/).

# Overview

This vignette is for users / developers who would like to learn more about
the available data in `cBioPortalData` and to learn how to hit other endpoints
in the cBioPortal API implementation. The functionality demonstrated
here is used internally by the package to create integrative representations
of study datasets.

Note. To avoid overloading the API service, the API was designed to only query
a part of the study data. Therefore, the user is required to enter either a set
of genes of interest or a gene panel identifier.

# API representation

Obtaining the cBioPortal API representation object

```{r}
(cbio <- cBioPortal())
```

## Operations

Check available tags, operations, and descriptions as a `tibble`:

```{r}
tags(cbio)
head(tags(cbio)$operation)
```

### Searching through the API

```{r}
searchOps(cbio, "clinical")
```

## Studies

Get the list of studies available:

```{r}
getStudies(cbio)
```

## Clinical Data

Obtain the clinical data for a particular study:

```{r}
clinicalData(cbio, "acc_tcga")
```

## Molecular Profiles

A table of molecular profiles for a particular study can be obtained by
running the following:

```{r}
mols <- molecularProfiles(cbio, "acc_tcga")
mols[["molecularProfileId"]]
```

## Molecular Profile Data

The data for a molecular profile can be obtained with prior knowledge of
available `entrezGeneIds`:

```{r}
molecularData(cbio, molecularProfileIds = "acc_tcga_rna_seq_v2_mrna",
    entrezGeneIds = c(1, 2),
    sampleIds = c("TCGA-OR-A5J1-01",  "TCGA-OR-A5J2-01")
)
```

## Genes

### All available genes

A list of all the genes provided by the API service including hugo symbols,
and entrez gene IDs can be obtained by using the `geneTable` function:

```{r}
geneTable(cbio)
```

### Gene Panels

```{r}
genePanels(cbio)
getGenePanel(cbio, "IMPACT341")
```

## Molecular Gene Panels

### genePanelMolecular

```{r}
gprppa <- genePanelMolecular(cbio,
    molecularProfileId = "acc_tcga_rppa",
    sampleListId = "acc_tcga_all")
gprppa
```

### getGenePanelMolecular

```{r}
getGenePanelMolecular(cbio,
    molecularProfileIds = c("acc_tcga_rppa", "acc_tcga_gistic"),
    sampleIds = allSamples(cbio, "acc_tcga")$sampleId
)
```

### getDataByGenes

```{r}
getDataByGenes(cbio, "acc_tcga", genePanelId = "IMPACT341",
    molecularProfileIds = "acc_tcga_rppa", sampleListId = "acc_tcga_rppa")
```


It uses the `getAllGenesUsingGET` function from the API.

## Samples

### Sample List Identifiers

To display all available sample list identifiers for a particular study ID,
one can use the `sampleLists` function:

```{r}
sampleLists(cbio, "acc_tcga")
```

### Sample Identifiers

One can obtain the barcodes / identifiers for each sample using a specific
sample list identifier, in this case we want all the copy number alteration
samples:

```{r}
samplesInSampleLists(cbio, "acc_tcga_cna")
```

This returns a `CharacterList` of all identifiers for each sample list
identifier input:

```{r}
samplesInSampleLists(cbio, c("acc_tcga_cna", "acc_tcga_cnaseq"))
```

### All samples within a study ID

```{r}
allSamples(cbio, "acc_tcga")
```

### Info on Samples

```{r}
getSampleInfo(cbio, studyId = "acc_tcga",
    sampleListIds = c("acc_tcga_rppa", "acc_tcga_cna"))
```

# Advanced Usage

The `cBioPortal` API representation is not limited to the functions
provided in the package. Users who wish to make use of any of the endpoints
provided by the API specification should use the dollar sign `$` function
to access the endpoints.

First the user should see the input for a particular endpoint as detailed
in the API:

```{r}
cbio$getGeneUsingGET
```

Then the user can provide such input:

```{r}
(resp <- cbio$getGeneUsingGET("BRCA1"))
```

which will require the user to 'translate' the response using `httr::content`:

```{r}
httr::content(resp)
```

## Clearing the cache

For users who wish to clear the entire `cBioPortalData` cache, it is
recommended that they use:

```{r,eval=FALSE}
unlink("~/.cache/cBioPortalData/")
```

# sessionInfo

```{r}
sessionInfo()
```