# Distance Between Two Partitions Joe Previte Penn State Erie.


Distance Between Two Partitions Joe Previte Penn State Erie

Motivation of problem: The FAA partitions the US. Partitions change over time. Need to compare 2 partitions, specifically to assign a distance (how far apart are the partitions?). They used an ad hoc method that didn’t really work (more on this later).

Sample partition Even these are partitioned!

Concepts from mathematics: sets, partitions, distance/metric.

Partition of S: a set of nonempty subsets of S that are pairwise disjoint and exhaust S (their union is all of S).
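For a finite set, the three conditions in this definition can be checked directly. The following is a minimal sketch; the function name `is_partition` and the use of Python sets are illustrative assumptions, not part of the talk.

```python
def is_partition(blocks, universe):
    """Return True if `blocks` is a partition of the finite set `universe`."""
    blocks = [set(b) for b in blocks]
    # Every block must be nonempty.
    if any(len(b) == 0 for b in blocks):
        return False
    union = set().union(*blocks)
    # Pairwise disjoint: block sizes add up to the size of the union
    # exactly when no element is shared between two blocks.
    if sum(len(b) for b in blocks) != len(union):
        return False
    # Exhaustive: together the blocks give the universe and nothing more.
    return union == set(universe)


print(is_partition([{1, 2}, {3}], {1, 2, 3}))     # a partition
print(is_partition([{1, 2}, {2, 3}], {1, 2, 3}))  # blocks overlap
print(is_partition([{1}, {3}], {1, 2, 3}))        # 2 is not covered
```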

Metric on S: A metric or a distance on S is a function $d: S \times S \to \mathbb{R}$ that satisfies the following for all $x, y, z \in S$: (1) $d(x,y) \ge 0$; (2) $d(x,y) = 0$ if and only if $x = y$; (3) $d(x,y) = d(y,x)$; (4) $d(x,y) \le d(x,z) + d(z,y)$. S with d is called a metric space.
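On a finite set of sample points the four axioms can be tested by brute force. This is only a sanity check on the sampled points, not a proof; `is_metric` and its tolerance parameter are hypothetical names chosen for this sketch.

```python
from itertools import product


def is_metric(points, d, tol=1e-12):
    """Brute-force check of the four metric axioms on a finite sample."""
    points = list(points)
    for x, y in product(points, repeat=2):
        if d(x, y) < 0:                      # (1) non-negativity
            return False
        if (d(x, y) == 0) != (x == y):       # (2) zero exactly when x == y
            return False
        if abs(d(x, y) - d(y, x)) > tol:     # (3) symmetry
            return False
    for x, y, z in product(points, repeat=3):
        if d(x, y) > d(x, z) + d(z, y) + tol:  # (4) triangle inequality
            return False
    return True


print(is_metric([0, 1, 3.5], lambda x, y: abs(x - y)))    # usual distance on R
print(is_metric([0, 1, 2], lambda x, y: (x - y) ** 2))    # squared distance
```

The squared difference fails axiom (4): from 0 to 2 it gives 4, but going through 1 gives 1 + 1 = 2.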

Hausdorff Distance: Let $A \subseteq S$ ($A$ compact). Define $d(x,A) = \inf_{a \in A} d(x,a)$ (FAIP: this is the shortest distance from $x$ to get to $A$). Define $A_\delta = \{x \in S : d(x,A) < \delta\}$ (this is just $A$ ‘fattened up’ a bit).

Hausdorff Distance Example: $A$ and $A_1$ ($S = \mathbb{R}^2$). [Figure: a set $A$ and its fattening $A_1$, drawn with a margin of width 1.]

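Hausdorff Distance: Let $A$, $B$ be subsets of $S$, a metric space. $\mathrm{Dist}_H(A,B) = \inf\{\delta > 0 : B \subseteq A_\delta \text{ and } A \subseteq B_\delta\}$. [Figure: two sets $A$ and $B$ with marked lengths 3 and 4; here $\mathrm{Dist}_H(A,B) = 3$.]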

Example 2: For $S = \mathbb{R}$ (real line), $\mathrm{Dist}_H([0,1], \{1,4\}) = {?}$
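For compact sets, the infimum over fattenings agrees with the larger of the two one-sided quantities $\sup_{a \in A} d(a,B)$ and $\sup_{b \in B} d(b,A)$, which is easy to compute for finite point sets. The sketch below uses that formula and approximates $[0,1]$ by 101 evenly spaced sample points (an assumption made only so the example is computable); the function names are illustrative.

```python
import math


def directed(A, B):
    """Largest distance from a point of A to the set B."""
    return max(min(math.dist(a, b) for b in B) for a in A)


def hausdorff(A, B):
    """Hausdorff distance between two finite point sets (tuples of coordinates)."""
    return max(directed(A, B), directed(B, A))


# Example 2 on the real line: [0,1] sampled at 0, 0.01, ..., 1.
interval = [(i / 100,) for i in range(101)]
two_points = [(1.0,), (4.0,)]

print(directed(interval, two_points))   # 1.0: the point 0 is 1 away from {1,4}
print(directed(two_points, interval))   # 3.0: the point 4 is 3 away from [0,1]
print(hausdorff(interval, two_points))  # 3.0
```

The sampling is exact here because the extreme distances are attained at the endpoints 0 and 1 of the interval, so the value 3 is also the Hausdorff distance for the full interval.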

Example 3: For $S = \mathbb{R}$ (real line), $A = \mathbb{Q} \cap [0,1]$, $B = (0,1)$. $\mathrm{Dist}_H(A,B) = {?}$
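One way to work this example out, sketched with the fattening notation $A_\delta$, $B_\delta$ from the definition of the Hausdorff distance: both sets have closure $[0,1]$, so each one sits inside every fattening of the other.

$$
\begin{aligned}
&\overline{A} = \overline{B} = [0,1] \;\Longrightarrow\; d(x,A) = d(x,B) = 0 \text{ for every } x \in [0,1],\\
&\text{so } A \subseteq B_\delta \text{ and } B \subseteq A_\delta \text{ for every } \delta > 0,\\
&\text{hence } \mathrm{Dist}_H(A,B) = \inf\{\delta : \delta > 0\} = 0 \text{ although } A \ne B.
\end{aligned}
$$

Neither set is closed, so neither is compact, and the distance cannot tell them apart from their common closure.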

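The Hausdorff function is (almost) a metric on the subsets of S. For $A \subseteq S$, $B \subseteq S$, $C \subseteq S$: (1) $\mathrm{Dist}_H(A,B) \ge 0$ for all $A$, $B$; (2) $\mathrm{Dist}_H(A,B) = 0$ if and only if $A = B$; (3) $\mathrm{Dist}_H(A,B) = \mathrm{Dist}_H(B,A)$; (4) $\mathrm{Dist}_H(A,B) \le \mathrm{Dist}_H(A,C) + \mathrm{Dist}_H(C,B)$. Need $A$, $B$ compact (in $\mathbb{R}^n$, compact = closed and bounded).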

Hausdorff Distance: It is a distance between compact subsets of a metric space. So the set of all compact subsets of a compact metric space S is itself a metric space with metric $\mathrm{Dist}_H$. Or, the power set of S is almost a metric space.

What about partitions? Example: S = square (same number of elements). Natural idea: take pairwise Hausdorff distances between ‘best fits’. [Figure: partition $P_1 = \{S_1, S_2, S_3\}$ and partition $P_2 = \{T_1, T_2, T_3\}$ of the square.]

Set $\mathrm{Dist}_P(P_1, P_2) = \mathrm{Dist}_H(S_1,T_1) + \mathrm{Dist}_H(S_2,T_2) + \mathrm{Dist}_H(S_3,T_3)$. We technically take the closure of the elements of the partition; the resulting sets are NOT a partition but share boundaries. [Figure: the pieces $S_1, S_2, S_3$ paired with $T_1, T_2, T_3$.]

How do we make the pairwise assignment?! Two ‘random’ partitions of the square (How do we pair partition elements?)

We cheat! General Definition (same number of partition elements, n): $\mathrm{Dist}_P(P_1, P_2) = \min_{f} \sum_{i=1}^{n} \mathrm{Dist}_H(A_i, B_{f(i)})$, where the minimum is over all possible bijections $f: \{1,\dots,n\} \to \{1,\dots,n\}$. (So we take the min over all possible assignments.)
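The minimum over bijections can be computed by brute force for small n. In this sketch each partition element is represented by a finite set of points standing in for its closure, and the Hausdorff distance uses the max-of-directed-distances formula for finite sets; both choices, and the names `hausdorff` and `dist_p`, are assumptions made for illustration.

```python
import math
from itertools import permutations


def hausdorff(A, B):
    """Hausdorff distance between two finite point sets."""
    d_ab = max(min(math.dist(a, b) for b in B) for a in A)
    d_ba = max(min(math.dist(a, b) for a in A) for b in B)
    return max(d_ab, d_ba)


def dist_p(P1, P2):
    """Minimum over all bijections f of sum_i Dist_H(A_i, B_f(i))."""
    if len(P1) != len(P2):
        raise ValueError("this definition needs the same number of elements")
    n = len(P1)
    return min(
        sum(hausdorff(P1[i], P2[f[i]]) for i in range(n))
        for f in permutations(range(n))  # each tuple f is one bijection
    )


P1 = [[(0, 0)], [(1, 0)], [(2, 0)]]
P2 = [[(2, 0)], [(0, 0)], [(1, 0)]]   # same pieces, listed in another order
P3 = [[(0, 1)], [(1, 0)], [(2, 0)]]   # first piece moved up by 1

print(dist_p(P1, P2))  # 0.0: the order of the pieces does not matter
print(dist_p(P1, P3))  # 1.0
```

Enumerating all n! bijections is only practical for small n; an assignment solver such as `scipy.optimize.linear_sum_assignment` applied to the matrix of pairwise Hausdorff distances would be one way to scale this up.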

Straightforward Proof: The proof that this definition obeys the properties of a metric is straightforward, EXCEPT: $\mathrm{Dist}_P(P_1, P_2) = 0$ implies only that the partition elements of $P_1$, $P_2$ agree up to closures.

Problem arises when $|P_1| \ne |P_2|$. Now what to do if the number of elements of the partitions differs? [Figure: $P_1 = \{S_1, S_2, S_3\}$ and $P_2 = \{T_1, T_2, T_3, T_4\}$.]

It was at this point that I actually got involved in the problem. The FAA had an ad hoc way to compute the ‘distance’. The resulting schemes were not actually metrics. They assigned distances that were too big for partitions that ‘looked’ close.

FAA solution #1: Add a fixed penalty $\big||P_1| - |P_2|\big|$ to the best pairwise assignment. [Figure: $P_1 = \{S_1, S_2, S_3\}$ and $P_2 = \{T_1, T_2, T_3, T_4\}$.]

Trouble! Now these 2 partitions are the same distance from the first!! Want the 2nd one to be further (it looks further!). [Figure: $\{S_1, S_2, S_3\}$ compared with two four-element partitions, $\{I_1, I_2, I_3, I_4\}$ and $\{T_1, T_2, T_3, T_4\}$.]

Sorry FAA! Then this partition is further from $P_1$ than this one (looks closer to me). [Figure: $P_1 = \{S_1, S_2, S_3\}$ with a five-element partition $\{T_1, \dots, T_5\}$ and a four-element partition $\{I_1, \dots, I_4\}$.]

FAA solution #2 (cheat some more): $\mathrm{Dist}_P(P_1, P_2) = \min_{f} \sum_{i=1}^{n} \mathrm{Dist}_H(A_i, B_{f(i)})$, with $n = \min\{|P_1|, |P_2|\}$ and $f: \{1,2,3,\dots,\min\{|P_1|, |P_2|\}\} \to \{1,2,3,\dots,\max\{|P_1|, |P_2|\}\}$. Minimal assignment strategy again (Hey, it worked before!!). [Figure: $\{S_1, S_2, S_3\}$ and $\{T_1, \dots, T_5\}$.]

Sorry again FAA!! (triangle ineq.) [Figure: three partitions, $\{S_1, S_2, S_3\}$, $\{W_1, W_2, W_3, W_4\}$ and $\{T_1, T_2, T_3, T_4\}$.]
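The figure with the original counterexample is not recoverable from the transcript, but the failure is easy to reproduce on a toy case. The sketch below assumes the assignment $f$ is one-to-one from the smaller collection into the larger one, and it uses collections of single points on the real line rather than genuine partitions of a common set; for singletons the Hausdorff distance is just $|a - b|$. All names are illustrative.

```python
from itertools import permutations


def dist_h_points(a, b):
    """Hausdorff distance between the singletons {a} and {b} on the real line."""
    return abs(a - b)


def dist_injective(P, Q):
    """Best one-to-one assignment of the smaller collection into the larger."""
    small, large = (P, Q) if len(P) <= len(Q) else (Q, P)
    return min(
        sum(dist_h_points(small[i], image[i]) for i in range(len(small)))
        # permutations(large, r) lists every injective choice of r targets
        for image in permutations(large, len(small))
    )


P1 = [0.0]
P2 = [0.0, 5.0]
P3 = [5.0]

d12 = dist_injective(P1, P2)  # 0.0: the extra element of P2 is ignored
d23 = dist_injective(P2, P3)  # 0.0: same reason
d13 = dist_injective(P1, P3)  # 5.0

print(d13 <= d12 + d23)  # False: the triangle inequality fails
```

Unmatched elements cost nothing under this scheme, so two different collections can sit at distance 0 from a common third one while being far from each other.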

One KEY is an observation. General Definition (same number of partition elements, n): $\mathrm{Dist}_P(P_1, P_2) = \min_{f} \sum_{i=1}^{n} \mathrm{Dist}_H(A_i, B_{f(i)})$, with the minimum over bijections $f: \{1,\dots,n\} \to \{1,\dots,n\}$. Here $P_1$, $P_2$ can be any collections of n compact subsets of S, that is, elements of (Power Set(S))^n whose entries are compact.

To solve the problem: Add $\big||P_1| - |P_2|\big|$ copies of the same compact subset of S to the smaller partition. This set should be canonical: S itself can be used (as one extreme), or any other compact set (for example, the centroid). The new ‘partitions’ each contain n compact subsets, so the distance can be computed.
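A minimal sketch of the padding idea, under the same simplifications as before: pieces are finite point sets, the Hausdorff distance is computed with the max-of-directed-distances formula, and the minimum over bijections is found by brute force. The `filler` argument plays the role of the canonical compact set; the function names are assumptions for this example.

```python
import math
from itertools import permutations


def hausdorff(A, B):
    """Hausdorff distance between two finite point sets."""
    d_ab = max(min(math.dist(a, b) for b in B) for a in A)
    d_ba = max(min(math.dist(a, b) for a in A) for b in B)
    return max(d_ab, d_ba)


def dist_p_padded(P1, P2, filler):
    """Pad the smaller collection with copies of `filler`, then minimise
    the summed Hausdorff distances over all bijections."""
    P1, P2 = list(P1), list(P2)
    # A list times a non-positive number is empty, so only the
    # shorter collection actually grows.
    P1 += [filler] * (len(P2) - len(P1))
    P2 += [filler] * (len(P1) - len(P2))
    n = len(P1)
    return min(
        sum(hausdorff(P1[i], P2[f[i]]) for i in range(n))
        for f in permutations(range(n))
    )


P1 = [[(0, 0)], [(2, 0)]]
P2 = [[(0, 0)], [(2, 0)], [(1, 0)]]
filler = [(1, 1)]  # hypothetical canonical set, e.g. a centroid

print(dist_p_padded(P1, P2, filler))  # 1.0: the filler is matched to {(1, 0)}
print(dist_p_padded(P2, P1, filler))  # 1.0: symmetric in its arguments
```

Because the same filler is used for every pair of partitions, both padded collections live in the space of n-tuples of compact sets where the bijection-based distance is already defined.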

Straightforward Proof: The proof that this definition obeys the properties of a metric is again straightforward, again EXCEPT: $\mathrm{Dist}_P(P_1, P_2) = 0$ implies only that the partition elements of $P_1$, $P_2$ agree up to closures.

Problems that FAA encountered mostly eliminated (Most of these arose since they did not have a metric)

Example: Throw in $S_4$ (centroid), keeping $S_1, S_2, S_3$ intact. [Figure: $\{S_1, S_2, S_3\}$ with the added $S_4$, compared with $\{T_1, T_2, T_3, T_4\}$.]

Still a ‘problem’ (we just moved $T_4$): location dependence. [Figure: $\{S_1, S_2, S_3\}$ and $\{T_1, T_2, T_3, T_4\}$ with $T_4$ in a new position.]

Another ‘problem’: we might want this to be small! (They aren’t.) [Figure: $\{W_1, W_2, W_3, W_4\}$ and $\{T_1, T_2, T_3, T_4\}$.]

The final solution? Introduce a set of points A in S that are uniformly distributed and compute Hausdorff distances in S/A. That is, change the space to be the identification space S/A. In the case $|P_1| \ne |P_2|$, throw in copies of A.
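The slides do not give a formula for distances in S/A. One common way to measure distance after collapsing a closed set A to a single point is $\min\{d(x,y),\ d(x,A) + d(y,A)\}$: either travel directly, or take the shortcut through the collapsed set. The sketch below assumes that formula and a finite A; it illustrates the idea and is not necessarily the construction used in the talk.

```python
import math


def dist_to_set(x, A):
    """Distance from the point x to the finite set A."""
    return min(math.dist(x, a) for a in A)


def quotient_dist(x, y, A):
    """Distance between x and y once all points of A are identified
    (assumed formula: direct path versus shortcut through A)."""
    return min(math.dist(x, y), dist_to_set(x, A) + dist_to_set(y, A))


A = [(0, 0), (10, 0)]  # hypothetical identified points

print(quotient_dist((1, 0), (9, 0), A))  # 2.0: shortcut through A beats 8.0
print(quotient_dist((1, 0), (2, 0), A))  # 1.0: the direct path is shorter
```

Hausdorff distances in S/A would then be computed with `quotient_dist` in place of the original point-to-point distance.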

Example: Always throw in A (uniformly distributed) and compute all distances in S/A. With this, we have a workable metric! [Figure: $\{S_1, S_2, S_3\}$ and $\{T_1, T_2, T_3, T_4\}$ with the point set $A$.]

Future goals: Need a student to write code to demonstrate the computation. Need to interact with the FAA. Can apply to image recognition (the colors of an image create a partition of the image).