% % NOTE -- ONLY EDIT beadarraySNP.Rnw!!! % beadarraySNP.tex file will get overwritten. % %\VignetteIndexEntry{beadarraySNP Vignette} %\VignetteDepends{} %\VignetteKeywords{SNP Analysis} %\VignettePackage{beadarraySNP} \documentclass{article} \usepackage{hyperref} \textwidth=6.2in \textheight=8.5in %\parskip=.3cm \oddsidemargin=.1in \evensidemargin=.1in \headheight=-.3in \newcommand{\Rfunction}[1]{{\texttt{#1}}} \newcommand{\Robject}[1]{{\texttt{#1}}} \newcommand{\Rpackage}[1]{{\textsf{#1}}} \newcommand{\Rmethod}[1]{{\texttt{#1}}} \newcommand{\Rfunarg}[1]{{\texttt{#1}}} \newcommand{\Rclass}[1]{{\textit{#1}}} \newcommand{\classdef}[1]{% {\em #1} } \begin{document} \title{Introduction of beadarraySNP} \maketitle \section*{Introduction} beadarraySNP is a Bioconductor package for the analysis of Illumina genotyping BeadArray data, especially derived from the GoldenGate assay. The functionality includes importing datafiles produced by Illumina software, doing LOH analysis and performing copy number analysis. \section{Import} The package contains an artificially small example to show some of the key aspects of analysis. Normally an experiment contains 96 samples spread over four probe panels of about 1500 SNPs each. The example has data from 2 colorectal tumors and corresponding leukocyte DNA of 1 probe (or OPA) panel. This particular OPA panel contains probes on chromosomes 16-20 and X and Y. <>= library(beadarraySNP) @ In this case all datafiles have been put in 1 directory. The import function \Rfunction{read.SnpSetIllumina} can be invoked to use all separate directories that are used with the Illumina software. We start by loading in the data into a \Rclass{SnpSetIllumina} object. A samplesheet is used to identify the samples in the experiment. This file can be created by the Illumina software. The next code shows the example samplesheet. The import function searches for the line starting with [Data]. The "Sample\_Well", "Sample\_Plate", and "Sample\_Group" columns are not used, but the other columns should have meaningful values. The "Pool\_ID" and "Sentrix\_ID" should be identical for all samples in 1 samplesheet. Samples from multiple experiments or multiple probe panels can be combined after they are imported. <>= datadir <- system.file("testdata", package="beadarraySNP") readLines(paste(datadir,"4samples_opa4.csv",sep="/")) @ <>= SNPdata <- read.SnpSetIllumina(paste(datadir,"4samples_opa4.csv",sep="/"),datadir) SNPdata @ The targets file contains extra information on the samples which are important during normalization. The \Rfunarg{NorTum} column indicates normal and tumor samples. The tumor samples are not used for the invariant set in between sample normalization. The \Rfunarg{Gender} column is used to properly normalize the sex chromosomes. The following lines show a possible way to add these columns to the \Rfunarg{phenoData} slot of the object. If a \Rfunarg{NorTum} and/or \Rfunarg{Gender} column is in the \Rfunarg{phenoData} slot they will be automatically used later on. <>= pd<-read.AnnotatedDataFrame(paste(datadir,"targets.txt",sep="/"),sep="\t") pData(SNPdata)<-cbind(pData(SNPdata),pData(pd)) @ \section{Quality control} Illumina Sentrix arrays contain 96 wells to process up to 96 samples. Each well can be used with a separate set of probes or OPA panel. The example shows the QC of the full experiment form which the example files were taken. This experiment consisted of 24 samples with 4 probe panels each <>= qc<-calculateQCarray(SNPdata) data(QC.260) plotQC(QC.260,"greenMed") @ Other types of plots ("intensityMed","greenMed","redMed","validn","annotation", "samples") show the median intensity, median red intensity or identifying information. <>= par(mfrow=c(2,2),mar=c(4,2,1,1)) reportSamplePanelQC(QC.260,by=8) SNPdata<-removeLowQualitySamples(SNPdata,1500,100,"OPA") @ Based on these findings, low quality samples can be removed from the experiment \section{Normalization} Normalization to calculate copy number is a multi-step process. In Oosting(2007) we have determined that the following procedure provides the optimal strategy: \begin{enumerate} \item Perform quantile normalization between both colors of a sample. This is allowed because the frequencies of both alleles throughout a sample are nearly identical in practice. This action also neutralizes any dye bias. \item Scale each sample using the median of the high quality heterozygous SNPs as the normalization factor. Genomic regions that show copy number alterations are likely to show LOH(loss of heterozygosity), or are harder to genotype leading to a decrease quality score of the call. \item Scale each probe using the normal samples in the experiment. Assume that these samples are diploid, and have a copy number of 2. \end{enumerate} <>= SNPnrm<-normalizeBetweenAlleles.SNP(SNPdata) SNPnrm<-normalizeWithinArrays.SNP(SNPnrm,callscore=0.8,relative=TRUE,fixed=FALSE,quantilepersample=TRUE) SNPnrm<-normalizeLoci.SNP(SNPnrm,normalizeTo=2) @ \section{Reporting} Although the OPA panels contain distinct chromosomes, also a few spurious SNPs on other chromosomes are in it. We first select the probes that are located on the chromosomes for this OPA panel. <>= SNPnrm<-SNPnrm[featureData(SNPnrm)$CHR %in% c("4","16","17","18","19","20","X","Y"),] reportSamplesSmoothCopyNumber(SNPnrm,normalizedTo=2,smooth.lambda=4) @ A figure is created of all 4 samples in the dataset with copynumber along the chromosomes. <>= reportSamplesSmoothCopyNumber(SNPnrm, normalizedTo=2, paintCytobands =TRUE, smooth.lambda=4,organism="hsa",sexChromosomes=TRUE) @ By using the \Rfunarg{organism} a figure is created that shows all chromosomes in 2 columns. Experimental data is plotted wherever it is available in the dataset. \section{References} Oosting J, Lips EH, van Eijk R, Eilers PH, Szuhai K, Wijmenga C, Morreau H, van Wezel T. High-resolution copy number analysis of paraffin-embedded archival tissue using SNP BeadArrays. Genome Res. 2007 Mar;17(3):368-76. Epub 2007 Jan 31. \section{Session Information} The version number of R and packages loaded for generating the vignette were: <>= toLatex(sessionInfo()) @ \end{document}