1 Efficient Algorithms for Non-parametric Clustering With Clutter. Weng-Keen Wong and Andrew Moore. (In partial fulfillment of the speaking requirement)

2 Problems From the Physical Sciences Minefield detection (Dasgupta and Raftery 1998) Earthquake faults (Byers and Raftery 1998)

3 Problems From the Physical Sciences (Pereira 2002) (Sloan Digital Sky Survey 2000)

4 A Simplified Example

5 Clustering with Single Linkage Clustering [figure panels: Single Linkage Clustering / MST / Resulting Clusters]

6 Clustering with Mixture Models [figure panels: Mixture of Gaussians with a Uniform Background Component / Resulting Clusters]

7 Clustering with CFF (Cuevas-Febrero-Fraiman) [figure: original dataset]

8 Related Work
(Dasgupta and Raftery 98): a mixture-model approach, with a mixture of Gaussians for the features and a Poisson process for the clutter.
(Byers and Raftery 98): the k-nearest-neighbour distances of all points are modeled as a mixture of two gamma distributions, one for clutter and one for the features; each data point is then classified according to the component it was most likely generated from.

9 Outline
1. Introduction: Clustering and Clutter
2. The Cuevas-Febrero-Fraiman Algorithm
3. Optimizing Step One of CFF
4. Optimizing Step Two of CFF
5. Results

10 The CFF Algorithm, Step One: find the high-density datapoints.

11 The CFF Algorithm, Step Two: cluster the high-density points using single linkage clustering; stop when the link length > ε.
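Concretely, single linkage with an ε cutoff yields the connected components of the graph joining points at distance ≤ ε. A minimal brute-force Python sketch (our own illustration, not the paper's optimized method; the function name and union-find helper are ours):

    import math
    from itertools import combinations

    def eps_linkage_clusters(points, eps):
        # Single-linkage clusters with link cutoff eps: the connected
        # components of the graph joining points at distance <= eps.
        parent = list(range(len(points)))

        def find(i):  # union-find root with path compression
            while parent[i] != i:
                parent[i] = parent[parent[i]]
                i = parent[i]
            return i

        for i, j in combinations(range(len(points)), 2):  # O(N^2) pass
            if math.dist(points[i], points[j]) <= eps:
                parent[find(i)] = find(j)

        clusters = {}
        for i in range(len(points)):
            clusters.setdefault(find(i), []).append(i)
        return list(clusters.values())

    print(eps_linkage_clusters([(0, 0), (1, 0), (10, 0)], eps=2.0))
    # -> [[0, 1], [2]]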

12 The CFF Algorithm Originally intended to estimate the number of clusters. It can also be used to find clusters against a noisy background.

13 Step One: Density Estimators Finding high-density points requires a density estimator. We want to make as few assumptions about the underlying density as possible, so we use a non-parametric density estimator.

14 A Simple Non-Parametric Density Estimator A datapoint is a high-density datapoint if the number of datapoints within a hypersphere of radius h around it exceeds a threshold c.
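A direct O(N²) rendering of this rule (a sketch under our own naming; the paper speeds this step up, as the next slide notes):

    import math

    def high_density_points(points, h, c):
        # Keep a point iff more than c other datapoints fall within
        # a hypersphere of radius h around it.
        return [p for p in points
                if sum(1 for q in points
                       if q is not p and math.dist(p, q) <= h) > c]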

15 Speeding up the Non-Parametric Density Estimator This is addressed in a separate paper (Gray and Moore 2001). Two basic ideas: 1. Use a dual-tree algorithm (Gray and Moore 2000). 2. Cut the search off early without computing exact densities (Moore 2000).

16 Step Two: Euclidean Minimum Spanning Trees (EMSTs) Traditional MST algorithms assume you are given all pairwise distances, which implies O(N^2) memory usage. We instead want a Euclidean Minimum Spanning Tree algorithm.

17 Optimizing the Clustering Step We exploit recent results in computational geometry for efficient EMSTs. This involves modifying the GeoMST2 algorithm of (Narasimhan et al. 2000). GeoMST2 is based on Well-Separated Pairwise Decompositions (WSPDs) (Callahan 1995). Our optimizations gain an order of magnitude speedup, especially in higher dimensions.

18 Outline for Optimizing Step Two
1. High-level overview of GeoMST2
2. Properties of a WSPD
3. How to create a WSPD
4. More detailed description of GeoMST2
5. Our optimizations

19 Intuition behind GeoMST2

20 Intuition behind GeoMST2

21 High Level Overview of GeoMST2 [diagram: a list of pairs (A_1,B_1), (A_2,B_2), …, (A_m,B_m) labeled "Well-Separated Pairwise Decomposition"]

22 High Level Overview of GeoMST2 The WSPD is a list of pairs (A_1,B_1), (A_2,B_2), …, (A_m,B_m). Each pair (A_i,B_i) represents a possible edge in the MST.

23 High Level Overview of GeoMST2
(A_1,B_1), (A_2,B_2), …, (A_m,B_m)
1. Create the Well-Separated Pairwise Decomposition.
2. Take the pair (A_i,B_i) that corresponds to the shortest edge.
3. If the vertices of that edge are not in the same connected component, add the edge to the MST. Repeat Step 2.

24 A Well-Separated Pair (Callahan 1995) Let A and B be point sets in ℝ^d. Let R_A and R_B be their respective bounding hyper-rectangles. Define MargDistance(A,B) to be the minimum distance between R_A and R_B.

25 A Well-Separated Pair (Cont) The point sets A and B are considered well-separated if: MargDistance(A,B) ≥ max{Diam(R_A), Diam(R_B)}.
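In code, with hyper-rectangles stored as (lo, hi) coordinate tuples (a sketch; the representation and names are ours):

    import math

    def marg_distance(rect_a, rect_b):
        # Minimum distance between two axis-aligned hyper-rectangles.
        (lo_a, hi_a), (lo_b, hi_b) = rect_a, rect_b
        gaps = (max(lo_b[d] - hi_a[d], lo_a[d] - hi_b[d], 0.0)
                for d in range(len(lo_a)))
        return math.hypot(*gaps)

    def diam(rect):
        # Length of the rectangle's diagonal.
        lo, hi = rect
        return math.hypot(*(h - l for h, l in zip(hi, lo)))

    def is_well_separated(rect_a, rect_b):
        # The slide's condition: margin distance at least the larger diameter.
        return marg_distance(rect_a, rect_b) >= max(diam(rect_a), diam(rect_b))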

26 Interaction Product The interaction product between two point sets A and B is defined as: A ⊗ B = {{p,p′} | p ∈ A, p′ ∈ B, p ≠ p′}

27 Interaction Product The interaction product between two point sets A and B is defined as: A ⊗ B = {{p,p′} | p ∈ A, p′ ∈ B, p ≠ p′} This is the set of all distinct pairs with one element of the pair from A and the other from B.

28 Interaction Product Definition The interaction product between two point sets A and B is defined as: A ⊗ B = {{p,p′} | p ∈ A, p′ ∈ B, p ≠ p′} For example: A = {1,2,3}, B = {4,5}: A ⊗ B = {{1,4}, {1,5}, {2,4}, {2,5}, {3,4}, {3,5}}
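As a two-line check (our own snippet; frozensets capture the unordered pairs):

    def interaction_product(a, b):
        # All distinct unordered pairs {p, p'} with p in A and p' in B.
        return {frozenset((p, q)) for p in a for q in b if p != q}

    print(interaction_product({1, 2, 3}, {4, 5}))
    # six pairs: {1,4}, {1,5}, {2,4}, {2,5}, {3,4}, {3,5}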

29 Interaction Product Now let A and B be the same point set, i.e. A = B = {0,1,2,3,4}. Then A ⊗ B = {{0,1}, {0,2}, {0,3}, {0,4}, {1,2}, {1,3}, {1,4}, {2,3}, {2,4}, {3,4}}

30 Interaction Product With A = B = {0,1,2,3,4}: A ⊗ B = {{0,1}, {0,2}, {0,3}, {0,4}, {1,2}, {1,3}, {1,4}, {2,3}, {2,4}, {3,4}} Think of this as the set of all possible edges in a complete, undirected graph with {0,1,2,3,4} as the vertices.

31 A Well-Separated Pairwise Decomposition
Pair #1: ([0],[1])
Pair #2: ([0,1],[2])
Pair #3: ([0,1,2],[3,4])
Pair #4: ([3],[4])
Claim: the set of pairs {([0],[1]), ([0,1],[2]), ([0,1,2],[3,4]), ([3],[4])} forms a Well-Separated Pairwise Decomposition.

32 WSPD Properties 1 and 2
If P is a point set in ℝ^d, then a WSPD of P is a set of pairs (A_1,B_1), …, (A_k,B_k) with the following properties:
1. A_i ⊆ P and B_i ⊆ P for all i = 1,…,k
2. A_i ∩ B_i = ∅ for all i = 1,…,k
With P = {0,1,2,3,4}, the pairs {([0],[1]), ([0,1],[2]), ([0,1,2],[3,4]), ([3],[4])} clearly satisfy Properties 1 and 2.

33 WSPD Property 3
3. (A_i ⊗ B_i) ∩ (A_j ⊗ B_j) = ∅ for all i,j such that i ≠ j
From {([0],[1]), ([0,1],[2]), ([0,1,2],[3,4]), ([3],[4])} we get the following interaction products:
A_1 ⊗ B_1 = {{0,1}}
A_2 ⊗ B_2 = {{0,2}, {1,2}}
A_3 ⊗ B_3 = {{0,3}, {1,3}, {2,3}, {0,4}, {1,4}, {2,4}}
A_4 ⊗ B_4 = {{3,4}}
These interaction products are all disjoint.

34 WSPD Property 4
4. The union of all the interaction products, ∪_{i=1,…,k} (A_i ⊗ B_i), equals P ⊗ P.
P ⊗ P = {{0,1}, {0,2}, {0,3}, {0,4}, {1,2}, {1,3}, {1,4}, {2,3}, {2,4}, {3,4}}
A_1 ⊗ B_1 = {{0,1}}
A_2 ⊗ B_2 = {{0,2}, {1,2}}
A_3 ⊗ B_3 = {{0,3}, {1,3}, {2,3}, {0,4}, {1,4}, {2,4}}
A_4 ⊗ B_4 = {{3,4}}
The union of the above interaction products gives back P ⊗ P.
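Properties 3 and 4 can be checked mechanically on this example (our own snippet, reusing the interaction_product helper from above):

    def interaction_product(a, b):
        return {frozenset((p, q)) for p in a for q in b if p != q}

    pairs = [({0}, {1}), ({0, 1}, {2}), ({0, 1, 2}, {3, 4}), ({3}, {4})]
    products = [interaction_product(a, b) for a, b in pairs]

    # Property 3: the interaction products are pairwise disjoint.
    assert all(products[i].isdisjoint(products[j])
               for i in range(4) for j in range(i + 1, 4))

    # Property 4: their union is exactly P (x) P.
    P = {0, 1, 2, 3, 4}
    assert set().union(*products) == interaction_product(P, P)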

35 WSPD Property 5
5. A_i and B_i are well-separated for all i = 1,…,k

36 Two Points to Note about WSPDs Two distinct points are considered to be well-separated For any data set of size n, there is a trivial WSPD of size (n choose 2)

37 A Well-Separated Pairwise Decomposition (Continued) If there are n points in P, a WSPD of P with O(n) pairs can be constructed in O(n log n) time using a fair split tree (Callahan 1995).

38 A Fair Split Tree

39 Creating a WSPD Are the nodes outlined in yellow well-separated? No.

40 Creating a WSPD Recurse on children of node with widest dimension

41 Creating a WSPD Recurse on children of node with widest dimension

42 Creating a WSPD Recurse on children of node with widest dimension

43 Creating a WSPD And so on…

44 Base Case Eventually you will find a well-separated pair of nodes. Add this pair to the WSPD.

45 Another Example of the Base Case

46 Creating a WSPD
    FindWSPD(W, NodeA, NodeB)
        if( IsWellSeparated(NodeA, NodeB) )
            AddPair(W, NodeA, NodeB)
        else
            if( MaxHrectDimLength(NodeA) < MaxHrectDimLength(NodeB) )
                Swap(NodeA, NodeB)
            FindWSPD(W, NodeA->Left, NodeB)
            FindWSPD(W, NodeA->Right, NodeB)
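Below is a self-contained, runnable Python rendering of this recursion (a sketch under our own assumptions: a simplified split tree that halves the bounding box along its widest dimension stands in for Callahan's fair split tree, the input points are assumed distinct, and all names are ours):

    import math

    class Node:
        # A split-tree node over a list of distinct points (tuples).
        def __init__(self, points):
            dims = range(len(points[0]))
            self.points = points
            self.lo = tuple(min(p[d] for p in points) for d in dims)
            self.hi = tuple(max(p[d] for p in points) for d in dims)
            self.left = self.right = None
            if len(points) > 1:
                # Split midway along the widest dimension of the bounding box.
                d = max(dims, key=lambda i: self.hi[i] - self.lo[i])
                mid = (self.lo[d] + self.hi[d]) / 2.0
                self.left = Node([p for p in points if p[d] <= mid])
                self.right = Node([p for p in points if p[d] > mid])

        def width(self):  # MaxHrectDimLength: widest side of the rectangle
            return max(h - l for h, l in zip(self.hi, self.lo))

        def diam(self):   # length of the rectangle's diagonal
            return math.hypot(*(h - l for h, l in zip(self.hi, self.lo)))

    def marg_distance(a, b):
        gaps = (max(b.lo[d] - a.hi[d], a.lo[d] - b.hi[d], 0.0)
                for d in range(len(a.lo)))
        return math.hypot(*gaps)

    def is_well_separated(a, b):
        return marg_distance(a, b) >= max(a.diam(), b.diam())

    def find_wspd(wspd, node_a, node_b):
        # Direct translation of the slide's FindWSPD.
        if is_well_separated(node_a, node_b):
            wspd.append((node_a, node_b))
        else:
            if node_a.width() < node_b.width():
                node_a, node_b = node_b, node_a   # split the wider node
            find_wspd(wspd, node_a.left, node_b)
            find_wspd(wspd, node_a.right, node_b)

    def wspd_of(root):
        # Pair up the two children of every internal node.
        pairs = []
        if root.left is not None:
            find_wspd(pairs, root.left, root.right)
            pairs += wspd_of(root.left) + wspd_of(root.right)
        return pairs

    root = Node([(0.0, 0.0), (1.0, 0.0), (0.5, 4.0), (6.0, 6.0)])
    for a, b in wspd_of(root):
        print(a.points, b.points)

Calling wspd_of on the root pairs up the two children of every internal node, which is how this construction covers every pair of points exactly once.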

47 High Level Overview of GeoMST2
(A_1,B_1), (A_2,B_2), …, (A_m,B_m)
1. Create the Well-Separated Pairwise Decomposition.
2. Take the pair (A_i,B_i) that corresponds to the shortest edge.
3. If the vertices of that edge are not in the same connected component, add the edge to the MST. Repeat Step 2.

48 Bichromatic Closest Pair Distance Given a pair (A_i,B_i), the Bichromatic Closest Pair (BCP) distance is the smallest distance from a point in A_i to a point in B_i.
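A brute-force rendering (our own sketch; GeoMST2 itself computes and updates these distances far more cleverly than this):

    import math

    def bcp_distance(points_a, points_b):
        # Smallest distance from a point in A to a point in B.
        return min(math.dist(p, q) for p in points_a for q in points_b)

    print(bcp_distance([(0, 0), (1, 1)], [(3, 1), (5, 0)]))  # -> 2.0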

49 High Level Overview of GeoMST2
(A_1,B_1), (A_2,B_2), …, (A_m,B_m)
1. Create the Well-Separated Pairwise Decomposition.
2. Take the pair (A_i,B_i) with the shortest BCP distance.
3. If A_i and B_i are not already connected, add the edge to the MST. Repeat Step 2.
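A minimal sketch of this loop (ours, under simplifying assumptions: WSPD pairs are given as index lists, each pair's BCP edge is computed by brute force up front and held in a priority queue, and the lazy BCP recomputation of the real GeoMST2 is omitted):

    import heapq
    import math

    def geomst2_loop(points, wspd_pairs):
        # wspd_pairs: list of (indices_A, indices_B) from the WSPD.
        parent = list(range(len(points)))

        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]
                i = parent[i]
            return i

        heap = [min((math.dist(points[i], points[j]), i, j)
                    for i in a for j in b)        # the pair's BCP edge
                for a, b in wspd_pairs]
        heapq.heapify(heap)

        mst = []
        while heap and len(mst) < len(points) - 1:
            d, p, q = heapq.heappop(heap)         # shortest BCP distance
            rp, rq = find(p), find(q)
            if rp != rq:                          # not already connected
                parent[rp] = rq
                mst.append((p, q, d))
        return mst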

50 GeoMST2 Example Start Current MST

51 GeoMST2 Example Iteration 1 Current MST

52 GeoMST2 Example Iteration 2 Current MST

53 GeoMST2 Example Iteration 3 Current MST

54 GeoMST2 Example Iteration 4 Current MST

55 High Level Overview of GeoMST2
(A_1,B_1), (A_2,B_2), …, (A_m,B_m)
1. Create the Well-Separated Pairwise Decomposition.
2. Take the pair (A_i,B_i) with the shortest BCP distance.
3. If A_i and B_i are not already connected, add the edge to the MST. Repeat Step 2.
Modification for CFF: if the BCP distance > ε, terminate.
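Since the pairs come off the queue in order of BCP distance, the change to the loop sketched above is a single early exit (again our sketch; eps is the ε link-length threshold):

    import heapq
    import math

    def cff_step_two(points, wspd_pairs, eps):
        # geomst2_loop with the CFF early exit: stop once the next BCP
        # distance exceeds eps, and return the resulting clusters.
        parent = list(range(len(points)))

        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]
                i = parent[i]
            return i

        heap = [min((math.dist(points[i], points[j]), i, j)
                    for i in a for j in b) for a, b in wspd_pairs]
        heapq.heapify(heap)

        while heap:
            d, p, q = heapq.heappop(heap)
            if d > eps:                # slide 55's modification
                break
            parent[find(p)] = find(q)

        clusters = {}
        for i in range(len(points)):
            clusters.setdefault(find(i), []).append(i)
        return list(clusters.values())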

56 Optimizations We don't need the full EMST; we just need to cluster all points that are within distance ε of each other. This allows two optimizations to the GeoMST2 code.

57 High Level Overview of GeoMST2
(A_1,B_1), (A_2,B_2), …, (A_m,B_m)
1. Create the Well-Separated Pairwise Decomposition.
2. Take the pair (A_i,B_i) with the shortest BCP distance.
3. If A_i and B_i are not already connected, add the edge to the MST. Repeat Step 2.
The optimizations take place in Step 1.

58 Recall: How to Create the WSPD

59 Optimization 1 Illustration

60 Optimization 1 Ignore all links that are > ε. Every pair (A_i,B_i) in the WSPD becomes an edge unless it joins two already-connected components. If MargDistance(A_i,B_i) > ε, then a link of length ≤ ε cannot exist between a point in A_i and a point in B_i, so don't include such a pair in the WSPD. (Both optimizations are sketched in code after Optimization 2 below.)

61 Optimization 2 Illustration

62 Optimization 2 Join all elements that are within distance ε of each other. If the maximum distance separating the bounding hyper-rectangles of A_i and B_i is ≤ ε, then join all the points in A_i and B_i if they are not already connected, and do not add such a pair (A_i,B_i) to the WSPD.
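Here is how both checks might drop into the FindWSPD recursion (our sketch: it reuses Node, marg_distance, and is_well_separated from the sketch after slide 46, eps is the ε threshold, and the star-shaped union trick is our own way of realizing "join all the points"):

    import math

    def max_rect_distance(a, b):
        # Largest possible distance between a point in a's rectangle
        # and a point in b's rectangle.
        spans = (max(a.hi[d] - b.lo[d], b.hi[d] - a.lo[d])
                 for d in range(len(a.lo)))
        return math.hypot(*spans)

    def find_wspd_cff(wspd, joins, node_a, node_b, eps):
        if marg_distance(node_a, node_b) > eps:
            return                      # Optimization 1: no link <= eps can cross
        if max_rect_distance(node_a, node_b) <= eps:
            # Optimization 2: every cross-pair is within eps, so record
            # direct joins (each a genuine <= eps link) instead of a pair.
            a0, b0 = node_a.points[0], node_b.points[0]
            joins += [(p, b0) for p in node_a.points]
            joins += [(a0, q) for q in node_b.points]
            return
        if is_well_separated(node_a, node_b):
            wspd.append((node_a, node_b))
        else:
            if node_a.width() < node_b.width():
                node_a, node_b = node_b, node_a
            find_wspd_cff(wspd, joins, node_a.left, node_b, eps)
            find_wspd_cff(wspd, joins, node_a.right, node_b, eps)

The recorded joins can be fed straight into the union-find structure, which is why such pairs never need to enter the priority queue.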

63 Implications of the Optimizations Reduce the amount of time spent creating the WSPD. Reduce the number of well-separated pairs, thereby speeding up the GeoMST2 algorithm by shrinking the priority queue.

64 Results We ran the step-two algorithms on subsets of the Sloan Digital Sky Survey: 7 attributes (4 colors, 2 sky coordinates, 1 redshift value). We compared Kruskal, GeoMST2, and ε-clustering.

65 Results (GeoMST2 vs ε-Clustering vs Kruskal in 4D)

66 Results (GeoMST2 vs ε-Clustering in 3D)

67 Results (GeoMST2 vs ε-Clustering in 4D)

68 Results (Change in Time as ε Changes for 4D Data)

69 Results (Increasing Dimensions vs Time)

70 Future Work A more accurate, faster non-parametric density estimator. Use ball trees instead of a fair split tree. Optimize the algorithm if we keep h constant but vary c and ε.

71 Conclusions ε-clustering outperforms GeoMST2 by nearly an order of magnitude in higher dimensions. Combining the optimizations in both steps yields an efficient algorithm for clustering against clutter on massive data sets.