\name{ConsensusClusterPlus}
\alias{ConsensusClusterPlus}
\alias{calcICL}
\title{Run ConsensusClusterPlus}
\description{
ConsensusClusterPlus determines cluster number and class membership by stability evidence.
calcICL calculates cluster-consensus and item-consensus.
}
\usage{
ConsensusClusterPlus(
d=NULL, maxK = 3, reps=10, pItem=0.8, pFeature=1, clusterAlg="hc",title="untitled_consensus_cluster",
innerLinkage="average", finalLinkage="average", distance="pearson", ml=NULL,
tmyPal=NULL,seed=NULL,plot=NULL,writeTable=FALSE,weightsItem=NULL,weightsFeature=NULL,verbose=F)
calcICL(res,title="untitled_consensus_cluster",plot=NULL,writeTable=FALSE)
}
\arguments{
\item{d}{matrix where columns are items/samples and rows are features, for example a gene expression matrix with genes in rows and microarrays in columns. Alternatively, an ExpressionSet object.}
\item{maxK}{integer value. maximum cluster number to evaluate. }
\item{reps}{integer value. number of subsamples. }
\item{pItem}{numerical value. proportion of items to sample. }
\item{pFeature}{numerical value. proportion of features to sample. }
\item{clusterAlg}{character value. clustering algorithm: "hc" for hierarchical (hclust) or "km" for k-means. See Details for supplying a custom algorithm. }
\item{title}{ character value for output directory. The directory is created only if plot is not NULL or writeTable is TRUE. The title can be an absolute or relative path. }
\item{innerLinkage}{hierarchical linkage method for subsampling. }
\item{finalLinkage}{hierarchical linkage method for consensus matrix. }
\item{distance}{character value. sample distance measures: "pearson","spearman", or "euclidean". }
\item{ml}{optional. prior result, if supplied then only do graphics and tables.}
\item{tmyPal}{optional character vector of colors for consensus matrix}
\item{seed}{optional numerical value. sets random seed for reproducible results.}
\item{plot}{character value. NULL - print to screen; 'pdf' or 'png' - write plots to files of that type.}
\item{writeTable}{logical value. TRUE - write output and log to csv.}
\item{weightsItem}{optional numerical vector. weights to be used for sampling items.}
\item{weightsFeature}{optional numerical vector. weights to be used for sampling features.}
\item{res}{ result of ConsensusClusterPlus.}
\item{verbose}{ logical value. If TRUE, print progress messages to the screen. This is useful for large datasets.}
}
\details{
ConsensusClusterPlus implements the Consensus Clustering algorithm of Monti, et al (2003) and extends this method with new functionality and visualizations.
Its utility is to provide quantitative stability evidence for determining a cluster count and cluster membership in an unsupervised analysis.
ConsensusClusterPlus takes a numerical data matrix with items as columns and features as rows. The function subsamples this matrix according to pItem, pFeature, weightsItem, and weightsFeature, and clusters each subsample into 2 to maxK clusters using the algorithm given by clusterAlg. Agglomerative hierarchical (hclust) and k-means clustering are built in. To use a different clustering algorithm, of which many are available in R, supply your own function as a simple programming hook; see the commented-out example, which uses divisive hierarchical clustering.
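
The subsample-and-cluster loop described above can be sketched in base R. This is a simplified illustration under stated assumptions, not the package's internal code: it assumes a single k, unweighted item sampling, no feature sampling, and average-linkage hclust on 1 - Pearson correlation. The name \code{toyConsensus} and its arguments are hypothetical.
\preformatted{
toyConsensus <- function(d, k = 2, reps = 10, pItem = 0.8) {
  n <- ncol(d)
  together <- matrix(0, n, n)  # times a pair of items was co-clustered
  sampled  <- matrix(0, n, n)  # times a pair of items was co-sampled
  for (r in seq_len(reps)) {
    idx <- sort(sample(n, floor(pItem * n)))       # subsample items
    dd  <- as.dist(1 - cor(d[, idx], method = "pearson"))
    cl  <- cutree(hclust(dd, method = "average"), k)
    sampled[idx, idx]  <- sampled[idx, idx] + 1
    together[idx, idx] <- together[idx, idx] + outer(cl, cl, "==")
  }
  cm <- together / pmax(sampled, 1)  # consensus index in [0, 1]
  diag(cm) <- 1
  cm
}
}
Calling \code{toyConsensus(d, k = 3)} on a numerical matrix returns an items-by-items matrix; entries near 0 or 1 suggest pairings that are stable across subsamples.
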
For a detailed description of usage, output and images, see the vignette, which can be opened with openVignette().
}
\value{
ConsensusClusterPlus returns a list of length maxK. Each element is a list containing consensusMatrix (numerical matrix), consensusTree (hclust), and consensusClass (consensus class assignments). ConsensusClusterPlus also produces images.
calcICL returns a list of two elements, clusterConsensus and itemConsensus, corresponding to cluster-consensus and item-consensus. See Monti, et al (2003) for formulas.
}
\author{ Matt Wilkerson mwilkers@med.unc.edu }
\references{
Monti, S., Tamayo, P., Mesirov, J., Golub, T. (2003) Consensus Clustering:
A Resampling-Based Method for Class Discovery and Visualization of Gene
Expression Microarray Data. Machine Learning, 52, 91-118.
}
\examples{
## obtain gene expression data
library(Biobase)
data(geneData)
d=geneData
#median center genes
d = sweep(d,1, apply(d,1,median))
## run consensus cluster
rcc = ConsensusClusterPlus(d,maxK=4,reps=100,pItem=0.8,pFeature=1,title="example")
## ICL
resICL = calcICL(rcc,title="example")
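## inspect the results (illustrative sketch; element names are those listed
## under Value, and it is assumed here that rcc[[k]] holds the result for
## k clusters; k = 3 is an arbitrary choice)
k3 = rcc[[3]]
table(k3$consensusClass)        # number of items per consensus class
dim(k3$consensusMatrix)         # items x items consensus matrix
head(resICL$clusterConsensus)   # cluster-consensus values
head(resICL$itemConsensus)      # item-consensus values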
##example of programming hook for clusterAlg:
#library(cluster)
#dianaHook = function(this_dist,k){
#tmp = diana(this_dist,diss=TRUE)
#assignment = cutree(tmp,k)
#return(assignment)
#}
#ConsensusClusterPlus(d,maxK=6,reps=25,pItem=0.8,pFeature=1,title="example",plot="png",clusterAlg="dianaHook")
}
\keyword{ methods }