Distance Between Two Partitions Joe Previte Penn State Erie.


Motivation of problem: The FAA partitions the US. Partitions change over time. We need to compare two partitions, specifically to assign a distance (how far apart are the partitions?). They used an ad hoc method that didn't really work (more on this later).

Sample partition Even these are partitioned!

Concepts from mathematics: sets, partitions, distance/metric.

Partition of S: a set of nonempty subsets of S that are pairwise disjoint and exhaust S (that is, their union is all of S).
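As a small illustration of the definition, here is a sketch that checks the three conditions (nonempty, pairwise disjoint, exhaustive) for a finite set. The function name `is_partition` is made up for this example and is not from the talk.

```python
from itertools import combinations

def is_partition(blocks, s):
    """Return True if `blocks` is a partition of the finite set `s`."""
    blocks = [frozenset(b) for b in blocks]
    # Every block must be nonempty.
    if any(len(b) == 0 for b in blocks):
        return False
    # Blocks must be pairwise disjoint.
    if any(a & b for a, b in combinations(blocks, 2)):
        return False
    # Blocks must exhaust S: their union is all of S.
    return frozenset().union(*blocks) == frozenset(s)

print(is_partition([{1, 2}, {3}], {1, 2, 3}))      # disjoint and exhaustive
print(is_partition([{1, 2}, {2, 3}], {1, 2, 3}))   # blocks overlap
```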

Metric on S: A metric or a distance on S is a function $d: S \times S \to \mathbb{R}$ that satisfies the following for all $x, y, z$ in S: (1) $d(x,y) \ge 0$; (2) $d(x,y) = 0$ if and only if $x = y$; (3) $d(x,y) = d(y,x)$; (4) $d(x,y) \le d(x,z) + d(z,y)$. S with d is called a metric space.

Hausdorff Distance: Let $A \subseteq S$ (A compact). Define $d(x,A) = \inf_{a \in A} d(x,a)$ (for all intents and purposes, this is the shortest distance from x to A). Define $A_\delta = \{ x \in S : d(x,A) < \delta \}$ (this is just A 'fattened up' a bit).

Hausdorff Distance Example: A and $A_1$ ($S = \mathbb{R}^2$). [Figure: a set A and its fattening $A_1$, with the width 1 marked.]

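Hausdorff Distance: Let A, B be subsets of S, a metric space. $\mathrm{Dist}_H(A,B) = \inf \{ \delta > 0 : B \subseteq A_\delta \text{ and } A \subseteq B_\delta \}$. [Figure: sets A and B with lengths 3 and 4 marked; for the pictured sets, $\mathrm{Dist}_H(A,B) = 3$.]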
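For finite point sets the infimum in this definition reduces to a max of min distances, which is easy to compute directly. The following is a minimal sketch under that assumption (sets given as lists of coordinate tuples with the Euclidean metric). The names `point_to_set` and `hausdorff` are illustrative only.

```python
import math

def point_to_set(x, a):
    """d(x, A): shortest Euclidean distance from point x to the finite set a."""
    return min(math.dist(x, p) for p in a)

def hausdorff(a, b):
    """Hausdorff distance between two finite, nonempty point sets."""
    # Smallest delta (as an infimum) with every point of a inside b fattened by delta...
    a_to_b = max(point_to_set(p, b) for p in a)
    # ...and every point of b inside a fattened by delta.
    b_to_a = max(point_to_set(q, a) for q in b)
    return max(a_to_b, b_to_a)

print(hausdorff([(0, 0), (1, 0)], [(1, 0), (4, 0)]))
```

Regions such as airspace sectors are not finite sets, so in practice one would have to sample them (for example along their boundaries). That sampling step is an assumption of this sketch, not something stated in the talk.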

Example 2: For $S = \mathbb{R}$ (the real line), $\mathrm{Dist}_H([0,1], \{1,4\}) = ?$
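One way to work this example out from the definition, writing $A = [0,1]$ and $B = \{1,4\}$ (labels chosen here for convenience):

```latex
\begin{align*}
d(1, A) &= 0, \qquad d(4, A) = 3
  &&\Rightarrow\quad B \subseteq A_\delta \iff \delta > 3, \\
\sup_{x \in [0,1]} d(x, B) &= \sup_{x \in [0,1]} (1 - x) = 1
  &&\Rightarrow\quad A \subseteq B_\delta \iff \delta > 1, \\
\mathrm{Dist}_H(A, B) &= \inf \{ \delta > 0 : \delta > 3 \text{ and } \delta > 1 \} = 3.
\end{align*}
```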

Example 3: For $S = \mathbb{R}$ (the real line), $A = \mathbb{Q} \cap [0,1]$, $B = (0,1)$. $\mathrm{Dist}_H(A,B) = ?$
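A sketch of the reasoning for this example, which shows why compactness matters: the distance comes out to zero even though the two sets differ.

```latex
\begin{align*}
&\text{Fix any } \delta > 0. \\
&\text{Rationals are dense in } [0,1]
  \;\Rightarrow\; d(x, A) = 0 < \delta \text{ for every } x \in (0,1)
  \;\Rightarrow\; B \subseteq A_\delta. \\
&\text{Every } a \in A \text{ lies in } [0,1] = \overline{(0,1)}
  \;\Rightarrow\; d(a, B) = 0 < \delta
  \;\Rightarrow\; A \subseteq B_\delta. \\
&\text{Hence } \mathrm{Dist}_H(A, B) = 0 \text{ although } A \neq B
  \text{ (both sets have closure } [0,1]\text{).}
\end{align*}
```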

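The Hausdorff function is (almost) a metric on the subsets of S: for $A \subseteq S$, $B \subseteq S$, and $d: S \times S \to \mathbb{R}$: (1) $\mathrm{Dist}_H(A,B) \ge 0$ for all A, B; (2) $\mathrm{Dist}_H(A,B) = 0$ if and only if $A = B$; (3) $\mathrm{Dist}_H(A,B) = \mathrm{Dist}_H(B,A)$; (4) $\mathrm{Dist}_H(A,B) \le \mathrm{Dist}_H(A,C) + \mathrm{Dist}_H(C,B)$. Need A, B compact. Compact = closed and bounded (in $\mathbb{R}^n$).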

Hausdorff Distance: It is a distance between compact subsets of a metric space. So the set of all compact subsets of a compact metric space S is itself a metric space with metric $\mathrm{Dist}_H$. Or, the power set of S is almost a metric space.

What about partitions? Example: S = square (same number of elements). [Figure: partition $P_1 = \{S_1, S_2, S_3\}$ and partition $P_2 = \{T_1, T_2, T_3\}$ of the square.] Natural idea: take pairwise Hausdorff distances between 'best fits'.

[Figure: $S_1, S_2, S_3$ paired with $T_1, T_2, T_3$.] Set $\mathrm{Dist}_P(P_1, P_2) = \mathrm{Dist}_H(S_1,T_1) + \mathrm{Dist}_H(S_2,T_2) + \mathrm{Dist}_H(S_3,T_3)$. We technically take the closure of the elements of the partition; the resulting sets are NOT a partition but share boundaries.

How do we make the pairwise assignment?! [Figure: two 'random' partitions of the square.] (How do we pair partition elements?)

We cheat! General Definition (same number of partition elements, n): $\mathrm{Dist}_P(P_1, P_2) = \min_{f} \sum_{i=1}^{n} \mathrm{Dist}_H(A_i, B_{f(i)})$, where $f: \{1,\dots,n\} \to \{1,\dots,n\}$. The minimum here is over all possible bijections from $\{1,\dots,n\}$ to itself. (So we take the min over all possible assignments.)
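The general definition can be sketched in code by trying every bijection. This is a brute-force illustration only, assuming each partition element is represented by a finite sample of points. The names `hausdorff` and `dist_p` are hypothetical.

```python
import math
from itertools import permutations

def hausdorff(a, b):
    """Hausdorff distance between two finite, nonempty point sets."""
    a_to_b = max(min(math.dist(p, q) for q in b) for p in a)
    b_to_a = max(min(math.dist(q, p) for p in a) for q in b)
    return max(a_to_b, b_to_a)

def dist_p(p1, p2):
    """Min over all bijections f of the sum of Dist_H(A_i, B_f(i))."""
    if len(p1) != len(p2):
        raise ValueError("this definition needs the same number of elements")
    n = len(p1)
    return min(
        sum(hausdorff(p1[i], p2[f[i]]) for i in range(n))
        for f in permutations(range(n))  # every bijection of {0, ..., n-1}
    )

# Two toy "partitions", each element sampled by a single point.
p1 = [[(0, 0)], [(10, 0)]]
p2 = [[(10, 1)], [(0, 2)]]
print(dist_p(p1, p2))
```

Enumerating all n! bijections is only practical for small n. For larger n, an assignment solver such as `scipy.optimize.linear_sum_assignment` applied to the matrix of pairwise Hausdorff distances would be the usual substitute.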

Straightforward Proof: The proof that this definition obeys the properties of a metric is straightforward, EXCEPT: $\mathrm{Dist}_P(P_1, P_2) = 0$ implies only that the partition elements of $P_1$, $P_2$ agree up to closures.

Problem arises when $|P_1| \ne |P_2|$: Now what to do if the numbers of elements of the partitions differ? [Figure: $P_1 = \{S_1, S_2, S_3\}$ and $P_2 = \{T_1, T_2, T_3, T_4\}$.]

It was at this point that I actually got involved in the problem. The FAA had an ad hoc way to compute the 'distance'. The resulting schemes were not actually metrics. They assigned distances that were too big for partitions that 'looked' close.

FAA solution #1: Add a fixed penalty $\big| |P_1| - |P_2| \big|$ to the best pairwise assignment. [Figure: $\{S_1, S_2, S_3\}$ and $\{T_1, T_2, T_3, T_4\}$.]

Trouble! Now these 2 partitions are the same distance from the first!! We want the 2nd one to be further (it looks further!). [Figure: $\{S_1, S_2, S_3\}$ compared with $\{I_1, I_2, I_3, I_4\}$, and $\{S_1, S_2, S_3\}$ compared with $\{T_1, T_2, T_3, T_4\}$.]

Sorry FAA! Then this partition is further from $P_1$ than this one (looks closer to me). [Figure: $P_1 = \{S_1, S_2, S_3\}$, a partition $\{T_1, \dots, T_5\}$, and a partition $\{I_1, \dots, I_4\}$.]

FAA solution #2 (cheat some more): $\mathrm{Dist}_P(P_1, P_2) = \min_{f} \sum_{i=1}^{n} \mathrm{Dist}_H(A_i, B_{f(i)})$, where $f: \{1,2,3,\dots,\min\{|P_1|, |P_2|\}\} \to \{1,2,3,\dots,\max\{|P_1|, |P_2|\}\}$. Minimal assignment strategy again (Hey, it worked before!!). [Figure: $\{S_1, S_2, S_3\}$ and $\{T_1, \dots, T_5\}$.]

Sorry again FAA!! (triangle ineq.) [Figure: partitions $\{S_1, S_2, S_3\}$, $\{W_1, W_2, W_3, W_4\}$, and $\{T_1, T_2, T_3, T_4\}$.]

One KEY is an observation: General Definition (same number of partition elements, n): $\mathrm{Dist}_P(P_1, P_2) = \min_{f} \sum_{i=1}^{n} \mathrm{Dist}_H(A_i, B_{f(i)})$, where $f: \{1,\dots,n\} \to \{1,\dots,n\}$. Here $P_1$, $P_2$ can be any collections of n compact subsets of S, that is, elements of (Power Set(S))^n whose entries are compact subsets.

To solve the problem: Add $\big| |P_1| - |P_2| \big|$ copies of the same compact subset of S to the smaller partition. This set should be canonical; S itself can be used (as one extreme), or any compact set (such as the centroid). The new 'partitions' each contain n compact subsets, so the distance can be computed.
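A rough sketch of this padding step, again assuming partition elements are given as finite point samples. The names `pad_and_compare` and `canonical` are made up for illustration, and the canonical set used in the example is an arbitrary choice.

```python
import math
from itertools import permutations

def hausdorff(a, b):
    """Hausdorff distance between two finite, nonempty point sets."""
    a_to_b = max(min(math.dist(p, q) for q in b) for p in a)
    b_to_a = max(min(math.dist(q, p) for p in a) for q in b)
    return max(a_to_b, b_to_a)

def dist_p(p1, p2):
    """Min over all bijections of the summed Hausdorff distances (equal sizes)."""
    n = len(p1)
    return min(
        sum(hausdorff(p1[i], p2[f[i]]) for i in range(n))
        for f in permutations(range(n))
    )

def pad_and_compare(p1, p2, canonical):
    """Pad the smaller collection with copies of `canonical`, then compare."""
    small, large = sorted((list(p1), list(p2)), key=len)
    # Add | |P1| - |P2| | copies of the same canonical compact set.
    padded = small + [canonical] * (len(large) - len(small))
    return dist_p(padded, large)

p1 = [[(0, 0)], [(4, 0)]]
p2 = [[(0, 0)], [(4, 0)], [(2, 0)]]
print(pad_and_compare(p1, p2, canonical=[(2, 3)]))
```

The result depends on which canonical set is chosen, which is the location dependence the later slides point out.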

Straightforward Proof: The proof that this definition obeys the properties of a metric is again straightforward, AGAIN EXCEPT: $\mathrm{Dist}_P(P_1, P_2) = 0$ implies only that the partition elements of $P_1$, $P_2$ agree up to closures.

Problems that FAA encountered mostly eliminated (Most of these arose since they did not have a metric)

Example: Throw in $S_4$ (centroid), keeping $S_1, S_2, S_3$ intact. [Figure: $\{S_1, S_2, S_3, S_4\}$ and $\{T_1, T_2, T_3, T_4\}$.]

Still a 'problem' (we just moved $T_4$): location dependence. [Figure: $\{S_1, S_2, S_3\}$ and $\{T_1, T_2, T_3, T_4\}$.]

Another 'problem': we might want this distance to be small! (It isn't.) [Figure: $\{W_1, W_2, W_3, W_4\}$ and $\{T_1, T_2, T_3, T_4\}$.]

The final solution? Introduce a set of points A in S that are uniformly distributed and compute Hausdorff distances in S/A. That is, change the space to be the identification space S/A. In the case $|P_1| \ne |P_2|$, throw in copies of A.
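The talk does not spell out how distances in S/A are computed. One standard choice, assumed here, is the quotient distance obtained by gluing all of A into a single point: the distance between x and y becomes the smaller of the direct distance and the detour through A, $\min\{d(x,y),\ d(x,A) + d(y,A)\}$. The sketch below uses that assumption for finite point sets. All function names are hypothetical.

```python
import math

def dist_to_set(x, a):
    """d(x, A) for a finite point set a."""
    return min(math.dist(x, p) for p in a)

def quotient_dist(x, y, a):
    """Distance between x and y after all points of a are identified."""
    direct = math.dist(x, y)
    via_a = dist_to_set(x, a) + dist_to_set(y, a)  # hop through the glued point
    return min(direct, via_a)

def hausdorff_quotient(u, v, a):
    """Hausdorff distance between finite sets u and v, measured in S/A."""
    def one_sided(src, dst):
        return max(min(quotient_dist(p, q, a) for q in dst) for p in src)
    return max(one_sided(u, v), one_sided(v, u))

a = [(0, 0), (10, 0)]  # toy stand-in for the uniformly distributed set A
print(quotient_dist((1, 0), (9, 0), a))         # detour through A is shorter
print(hausdorff_quotient([(4, 0)], [(5, 0)], a))  # direct path is shorter
```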

Example: Always throw in A (uniformly distributed) and compute all distances in S/A. With this, we have a workable metric! [Figure: $\{S_1, S_2, S_3\}$ and $\{T_1, T_2, T_3, T_4\}$, with the point set A.]

Future goals: Need a student to write code to demonstrate the computation. Need to interact with the FAA. Can apply to image recognition (the colors of an image create a partition of the image).

Thanks Cookies