
1 OAK RIDGE NATIONAL LABORATORY, U.S. DEPARTMENT OF ENERGY
RobustMap: A Fast and Robust Algorithm for Dimension Reduction and Clustering
Lionel F. Lovett, II
Advisors: George Ostrouchov and Houssain Kettani
Computer Science and Mathematics Division, Oak Ridge National Laboratory
Summer 2005

2 Why Dimension Reduction?
• Data Mining
  • Large Databases
    • Number of Items
    • Number of Attributes (high-dimensionality) with items
• Visualization
  • Requires low dimensional views (2 or 3)
• Structure discovery
  • Patterns
  • Clusters
• Fast similarity searching
  • Images, video, documents, character recognition, face recognition, DNA sequences
• Data Reduction

3 RobustMap Uses Distances and Mimics PCA Like FastMap
Faloutsos and Lin (1995) (FastMap)
• Choose two very distant points as principal axis
• Project onto orthogonal hyperplane
• Repeat
• Each axis O(n), given distances
• Distances updated using cosine law as needed
• Result is a mapping as well as the transformation to map new items
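The FastMap steps above can be sketched in a few lines. This is a minimal illustration, not the code behind the slides: it assumes the items are rows of a NumPy array `X` with Euclidean distances, and the function name `fastmap`, the `seed` argument, and the returned `pivots` list are hypothetical. The pivot pair is picked with the usual heuristic of starting from a random object and taking the farthest point twice.

```python
import numpy as np

def fastmap(X, k, seed=0):
    """Map the rows of X to k coordinates with FastMap-style pivot axes (sketch)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    coords = np.zeros((n, k))
    pivots = []

    def sqdist_to(i, level):
        # Squared distance from item i to every item, measured in the
        # hyperplane orthogonal to the first `level` axes (cosine-law update).
        d2 = np.sum((X - X[i]) ** 2, axis=1)
        d2 -= np.sum((coords[:, :level] - coords[i, :level]) ** 2, axis=1)
        return np.maximum(d2, 0.0)  # clamp tiny negative rounding errors

    for level in range(k):
        # Heuristic pivot choice: random start, farthest point, farthest again.
        start = rng.integers(n)
        a = int(np.argmax(sqdist_to(start, level)))
        d2_a = sqdist_to(a, level)
        b = int(np.argmax(d2_a))
        d2_ab = d2_a[b]
        if d2_ab == 0.0:
            break  # all residual distances are zero; nothing left to extract
        d2_b = sqdist_to(b, level)
        # Coordinate of every item along the pivot axis ab (cosine law).
        coords[:, level] = (d2_a + d2_ab - d2_b) / (2.0 * np.sqrt(d2_ab))
        pivots.append((a, b))
    return coords, pivots
```

Each axis needs only the distances from the two pivots, which is where the O(n) cost per axis comes from. Keeping the pivot pairs is what would let a new item be mapped with the same formula.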

4 Projection to Pivot Axis and to Orthogonal Hyperplane
Given pivot axis ab, FastMap computes coordinates along the axis and projections onto the orthogonal hyperplane.
[Figure: points y and z and their projections y' and z' onto the hyperplane orthogonal to pivot axis ab, with distance d(y',z'); triangle a, b, y with sides d(a,y), d(b,y), d(a,b) and coordinate c_y along the axis]
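For reference, the two quantities in this figure follow from the cosine law. The formulas below are the standard FastMap ones from Faloutsos and Lin (1995), written with the slide's labels; c_z is assumed to be the coordinate of z, defined in the same way as c_y.

```latex
c_y = \frac{d_{a,y}^{2} + d_{a,b}^{2} - d_{b,y}^{2}}{2\, d_{a,b}},
\qquad
d_{y',z'}^{2} = d_{y,z}^{2} - (c_y - c_z)^{2}
```

The first expression gives the coordinate along the pivot axis. The second gives the distance between the projections, which is the distance used when choosing the next axis.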

5 Problems with FastMap? OUTLIERS!
• Outliers are points that, on average, lie far from the other members of their cluster.
• When selecting pivot points based on distance, FastMap considers all the points of a dataset.
• Because it includes outliers, FastMap is not robust.

6 FastMap Pivot Pair: Choosing Outliers
[Figure: FastMap pivot pair chosen from outlying points. Axis does not represent majority of data]

7 RobustMap: Clustering and Excluding Outliers
• Compute n distances from random object
• Take point of largest distance
• Repeat
• Clustering
• Estimate distance distribution from two extreme points
• Find probability of extreme points
• Exclude most extreme cluster of low probability points
• Finish projection using remaining points
• Diagnostic histogram and cluster plots
Ratio Function
• Uses only distances from pivots (2nd and 3rd)
• Computes ratios: data fraction / probability of data
• Looks for splits according to ratio threshold
• Discards smaller portion
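One simple way to keep a farthest-point search from landing on an outlier is to trim the maximum at a quantile. The slides list a "quantile of trimmed max" parameter but do not say how RobustMap applies it, so the sketch below is an assumption and not the RobustMap procedure. The function name `trimmed_farthest` and the default `q=0.95` are hypothetical.

```python
import numpy as np

def trimmed_farthest(dist, q=0.95):
    """Index of the largest distance that does not exceed the q-quantile (sketch)."""
    dist = np.asarray(dist, dtype=float)
    cutoff = np.quantile(dist, q)                # trimming level
    candidates = np.flatnonzero(dist <= cutoff)  # never empty: the minimum qualifies
    return int(candidates[np.argmax(dist[candidates])])

# Distances from some starting object; the last item is an obvious outlier.
d = [1.0, 2.0, 3.0, 4.0, 100.0]
plain = int(np.argmax(d))             # picks the outlier (index 4)
robust = trimmed_farthest(d, q=0.8)   # picks index 3 instead
```

With q = 1 the function reduces to the plain farthest-point choice that FastMap uses.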

8 Dataset Generator
• Generates clustered data from a mixture of multivariate normal densities
• There are five parameters
  • Number of dimensions
  • Number of clusters
  • Cluster variability
  • Cluster mixing proportions
  • Seed for random number generator
• Other RobustMap parameters
  • Number of dimensions to extract
  • Quantile of trimmed max
  • Ratio threshold for outlying cluster extraction
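A generator with these five parameters could look roughly like the sketch below. It is an illustration under stated assumptions, not the generator used for the slides: the clusters are spherical (one standard deviation per run), the cluster centers are drawn at random with an arbitrary spread of 10, and the item count `n` is an extra argument. The name `generate_clusters` is hypothetical.

```python
import numpy as np

def generate_clusters(n, n_dims, n_clusters, variability, proportions, seed):
    """Draw n items from a mixture of spherical multivariate normals (sketch)."""
    rng = np.random.default_rng(seed)
    proportions = np.asarray(proportions, dtype=float)
    proportions = proportions / proportions.sum()  # normalize mixing weights
    # Assumption: centers are random draws; the spread of 10.0 is arbitrary.
    centers = rng.normal(0.0, 10.0, size=(n_clusters, n_dims))
    # Pick a cluster for each item according to the mixing proportions.
    labels = rng.choice(n_clusters, size=n, p=proportions)
    # Add spherical normal noise with standard deviation `variability`.
    X = centers[labels] + rng.normal(0.0, variability, size=(n, n_dims))
    return X, labels

X, labels = generate_clusters(200, n_dims=5, n_clusters=3, variability=1.0,
                              proportions=[0.5, 0.3, 0.2], seed=42)
```

Fixing the seed makes a run repeatable, which is useful when comparing parameter settings on the same data.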

9 Results
• RobustMap identifies and excludes outlying clusters
• RobustMap performs dimension reduction
• RobustMap exploits robust statistics
• RobustMap exploits fast machine learning algorithms (runtime O(nk))

10 Decomposition of Climate Model Run Data with PCA (EOF)
• 135 year CCM3 run at T42 resolution, CO2 increase to 3x
• Average Monthly Surface Temperature, 1620 x 2500 matrix (Putman, Drake, Ostrouchov, 2000)
[Figure, PCA: image vector shown as PC 1 + PC 2 + PC 3 + PC 4 + ... + PC 1620, with amplitude-in-time plots for PC 1 to PC 4. Concise 4-d summary of 135 year run. Winter warming more severe than summer warming.]
[Figure, RobustMap: RM 1 + RM 2 + RM 3 + RM 4 + ..., with amplitude-in-time plots for RM 1 to RM 4. Concise 4-d summary of 135 year run. Winter warming more severe than summer warming. 1000 x FASTER!]

11 Future Plans
• Ratio
  • Compute threshold from probability theory
  • Create loop for remaining clusters
• Develop better probability theory for RobustMap
• Add application context visualization

12 Applications
• Searching
  • Images and multimedia databases
  • String databases (spelling, typing and OCR error correction)
  • Medical databases
• Data Mining and Visualization
  • Medical databases (ECGs, X-rays, MRI brain scans)
  • Demographic Data
  • Time Series
  • Business, Commerce, and Financial Data
  • Climate, Astrophysics, Chemistry, and Biology Data

13 Questions?

