CLUTO: A Clustering Toolkit. By Roseline Antai
What is CLUTO? CLUTO is a software package for clustering high-dimensional datasets and for analyzing the characteristics of the resulting clusters.
Algorithms of CLUTO: vcluster and scluster. Major difference: input format. vcluster takes the actual multidimensional representation of the objects to be clustered. scluster takes the similarity matrix (or graph) between these objects.
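To make the difference in input concrete, the sketch below starts from a small object-by-feature matrix (the kind of input vcluster takes) and derives an object-by-object similarity matrix (the kind of input scluster takes). Cosine similarity is an assumption chosen for illustration; the slides do not name a similarity measure. The sample data is made up, and the code does not read or write CLUTO's own file formats.

```python
import math


def cosine_similarity_matrix(rows):
    """Return the pairwise cosine similarities between the row vectors."""
    norms = [math.sqrt(sum(x * x for x in row)) for row in rows]
    sim = []
    for i, a in enumerate(rows):
        sim_row = []
        for j, b in enumerate(rows):
            dot = sum(x * y for x, y in zip(a, b))
            denom = norms[i] * norms[j]
            # An all-zero vector has no direction; treat its similarity as 0.
            sim_row.append(dot / denom if denom else 0.0)
        sim.append(sim_row)
    return sim


# vcluster-style input: one row per object, one column per feature.
objects = [
    [1.0, 0.0, 2.0],
    [2.0, 0.0, 4.0],
    [0.0, 3.0, 0.0],
]

# scluster-style input: one row and one column per object.
similarities = cosine_similarity_matrix(objects)
```

The first two objects point in the same direction, so their similarity is close to 1, while the third shares no features with them and gets a similarity of 0.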
Reporting and Analysis Parameters: These control the amount of information that vcluster and scluster report about the clusters, as well as the analysis performed on the discovered clusters. Examples: 1. -clustfile=string (default is MatrixFile.clustering.Nclusters, or GraphFile.clustering.Nclusters). 2. -clabelfile=string (name of the file that stores the labels of the columns; used when -showfeatures, -showsummaries or -labeltree are used).
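3. -rlabelfile=string (name of the file that stores the labels of the rows, i.e. the objects to be clustered). 4. -rclassfile=string (name of the file that stores the class labels of the rows). 5. -showtree. 6. -showfeatures (shows the descriptive and discriminating features).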
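As a rough illustration of how the reporting options listed above might be combined, the sketch below assembles an argument list for vcluster and computes the default clustering file name. The helper names and the file names (docs.mat, docs.rlabel) are hypothetical, and the placement of the input file and cluster count after the options is an assumption that should be checked against the CLUTO manual.

```python
def default_clustfile(input_file, n_clusters):
    """Default output name from the slides: <input file>.clustering.<Nclusters>."""
    return f"{input_file}.clustering.{n_clusters}"


def build_vcluster_args(matrix_file, n_clusters, rlabelfile=None,
                        rclassfile=None, clabelfile=None,
                        showfeatures=False, showtree=False):
    """Assemble an argument list for vcluster (hypothetical helper)."""
    args = ["vcluster"]
    # Options that take a file name are written as -name=value.
    for name, value in (("rlabelfile", rlabelfile),
                        ("rclassfile", rclassfile),
                        ("clabelfile", clabelfile)):
        if value is not None:
            args.append(f"-{name}={value}")
    # Options without a value are plain switches.
    if showfeatures:
        args.append("-showfeatures")
    if showtree:
        args.append("-showtree")
    # Assumption: the input file and the number of clusters come last.
    args += [matrix_file, str(n_clusters)]
    return args


args = build_vcluster_args("docs.mat", 4,
                           rlabelfile="docs.rlabel", showfeatures=True)
clustfile = default_clustfile("docs.mat", 4)
```

The resulting list could be passed to `subprocess.run` on a machine where CLUTO is installed; the sketch itself only builds strings and does not invoke the program.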
Cluster Visualization Parameters: These produce simple plots of the original input matrix that show how the different objects (rows) and features (columns) are clustered together. Examples: 1. -plottree=string gives a graphic representation of the entire hierarchical tree. 2. -plotmatrix=string shows how the rows of the original matrix are clustered together.
Classfile and rlabelfile Evo Sem Imp Imp Deo Deo Imp Imp Deo Deo Imp Deo Deo Imp Sem Deo Sem Imp Imp Evo
The plot uses red to denote positive values and green to denote negative values. Bright red/green indicate large positive/negative values, whereas colors close to white indicate values close to zero.
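The colour scheme described above can be sketched as a small mapping from a matrix value to an RGB triple. The linear fade from white to the pure colour and the `max_abs` scaling parameter are assumptions made for illustration; the slides do not say how CLUTO scales colour intensity.

```python
def value_to_rgb(value, max_abs):
    """Map a matrix value to an (R, G, B) tuple on a green-white-red scale."""
    if max_abs <= 0:
        return (255, 255, 255)
    # Clamp the intensity to [0, 1]: 0 is white, 1 is the pure colour.
    t = min(abs(value) / max_abs, 1.0)
    fade = round(255 * (1.0 - t))
    if value > 0:
        return (255, fade, fade)  # positive values move towards bright red
    if value < 0:
        return (fade, 255, fade)  # negative values move towards bright green
    return (255, 255, 255)        # zero stays white


largest_positive = value_to_rgb(2.0, 2.0)
largest_negative = value_to_rgb(-2.0, 2.0)
zero = value_to_rgb(0.0, 2.0)
```

Values near zero come out close to white, and values at or beyond `max_abs` in magnitude come out as pure red or pure green.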