1
Fair Clustering through Fairlets (NIPS 2017)
Flavio Chierichetti, Ravi Kumar, Silvio Lattanzi, Sergei Vassilvitskii
2
Objective
A fair clustering algorithm under the disparate impact doctrine, where each protected class must have approximately equal representation in every cluster.
Formulation of fair clustering under the k-center and k-median objectives.
3
Clustering and Fairness
Given a set X of points lying in some metric space, the goal is to find a partition of X into k different clusters, optimizing a particular objective function.
Unprotected attributes: coordinates. Protected attribute: color.
Disparate impact translates to color balance in each cluster.
4
The two objectives
k-center
Given a set of data points $X$ with distances $d(x_i, x_j) \in \mathbb{N}$ satisfying the triangle inequality, find a subset $C \subseteq X$ with $|C| = k$ such that the maximum distance from a point in $X$ to its closest point in $C$ is minimized:
$$\phi(X, C) = \max_{x \in X} \min_{c \in C} d(x, c)$$
k-median
Given a set of data points $X$, choose $k$ centers $c_i$ so as to minimize the sum of the distances from each $x$ to its nearest $c_i$:
$$\psi(X, C) = \sum_{x \in X} \min_{c \in C} d(x, c)$$
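As a small illustration of the k-center and k-median objectives, the sketch below evaluates both costs for a fixed set of centers. It assumes Euclidean distance as the metric and uses hypothetical function names (`k_center_cost`, `k_median_cost`) and toy points that do not come from the slides.

```python
from math import dist  # Euclidean distance; assumed here as one example of a metric


def k_center_cost(points, centers):
    """k-center cost: largest distance from any point to its nearest center."""
    return max(min(dist(x, c) for c in centers) for x in points)


def k_median_cost(points, centers):
    """k-median cost: sum of distances from each point to its nearest center."""
    return sum(min(dist(x, c) for c in centers) for x in points)


# Toy data (hypothetical): four points on a plane, two of them used as centers.
X = [(0, 0), (0, 3), (4, 0), (10, 0)]
C = [(0, 0), (10, 0)]

print(k_center_cost(X, C))  # 4.0
print(k_median_cost(X, C))  # 7.0
```

The two costs differ only in how the per-point distances are aggregated (maximum versus sum), which is why the same nearest-center computation appears in both functions.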
5
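Balance
For a subset $Y \subseteq X$,
$$\mathrm{balance}(Y) = \min\left(\frac{\#\mathrm{RED}(Y)}{\#\mathrm{BLUE}(Y)}, \frac{\#\mathrm{BLUE}(Y)}{\#\mathrm{RED}(Y)}\right) \in [0, 1]$$
$$\mathrm{balance}(\mathcal{C}) = \min_{C \in \mathcal{C}} \mathrm{balance}(C)$$
A subset with an equal number of red and blue points has balance 1, while a monochromatic subset has balance 0.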
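The balance definition can be sketched in a few lines. The representation below (a cluster as a list of the strings "red" and "blue") and the function names are assumptions made for illustration only.

```python
def balance(subset):
    """Balance of a non-empty list of colors ("red" or "blue")."""
    red = sum(1 for color in subset if color == "red")
    blue = sum(1 for color in subset if color == "blue")
    if red == 0 or blue == 0:
        return 0.0  # a monochromatic subset has balance 0
    return min(red / blue, blue / red)


def clustering_balance(clusters):
    """Balance of a clustering: the minimum balance over its clusters."""
    return min(balance(cluster) for cluster in clusters)


print(balance(["red", "blue"]))         # 1.0
print(balance(["red", "red"]))          # 0.0
print(balance(["red", "red", "blue"]))  # 0.5
print(clustering_balance([["red", "blue"], ["red", "red", "blue"]]))  # 0.5
```

Because the clustering balance is a minimum, a single poorly balanced cluster determines the balance of the whole clustering.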
6
LEMMA
Lemma A: Let $Y, Y' \subseteq X$ be disjoint. If $\mathcal{C}$ is a clustering of $Y$ and $\mathcal{C}'$ is a clustering of $Y'$, then $\mathrm{balance}(\mathcal{C} \cup \mathcal{C}') = \min(\mathrm{balance}(\mathcal{C}), \mathrm{balance}(\mathcal{C}'))$.
Lemma B: Let $\mathrm{balance}(X) = b/r$ for some integers $1 \le b \le r$ such that $\gcd(b, r) = 1$. Then there exists a clustering $\mathcal{A} = \{Y_1, \ldots, Y_m\}$ of $X$ such that
$|Y_j| \le b + r$ for each $Y_j \in \mathcal{A}$, i.e., each cluster is small;
$\mathrm{balance}(\mathcal{A}) = b/r = \mathrm{balance}(X)$.
$\mathcal{A}$ is a $(b, r)$-fairlet decomposition of $X$ and each $Y \in \mathcal{A}$ is a fairlet.
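Both lemmas can be checked on a toy instance. The sketch below works on colors only and ignores distances, since Lemma B as stated here is about cluster sizes and balance. It relies on the observation that if the balance of the whole set is exactly b/r in lowest terms, the color counts are m·b and m·r for some integer m, so the points can be split into m groups of b minority and r majority points. The function names and the data are hypothetical.

```python
from fractions import Fraction
from math import gcd


def balance(colors):
    """Exact balance of a non-empty list of colors ("red" or "blue")."""
    red, blue = colors.count("red"), colors.count("blue")
    if red == 0 or blue == 0:
        return Fraction(0)
    return min(Fraction(red, blue), Fraction(blue, red))


def fairlet_decomposition(colors):
    """Split a two-colored list into groups whose balance equals that of the whole list."""
    minority, majority = sorted(("red", "blue"), key=colors.count)
    n_min, n_maj = colors.count(minority), colors.count(majority)
    if n_min == 0:
        raise ValueError("a monochromatic set has balance 0 and no (b, r)-fairlets")
    m = gcd(n_min, n_maj)          # number of fairlets
    b, r = n_min // m, n_maj // m  # b/r is balance(X) in lowest terms
    return [[minority] * b + [majority] * r for _ in range(m)]


# Lemma B on a toy set: 4 blue and 6 red points, so balance(X) = 2/3.
X = ["blue"] * 4 + ["red"] * 6
fairlets = fairlet_decomposition(X)
print(balance(X))                  # 2/3
print([len(f) for f in fairlets])  # [5, 5]

# Lemma A on two clusterings of disjoint point sets.
C1 = [["red", "blue"]]
C2 = [["red", "red", "blue"]]
union_balance = min(balance(c) for c in C1 + C2)
print(union_balance)               # 1/2
```

`Fraction` is used instead of floating-point division so that the equality balance(A) = b/r = balance(X) can be tested exactly.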
7
$(t, k)$-fair clustering
In the $(t, k)$-fair center (resp. $(t, k)$-fair median) problem, the goal is to partition $X$ into $\mathcal{C}$ such that $|\mathcal{C}| = k$, $\mathrm{balance}(\mathcal{C}) \ge t$, and $\phi(X, \mathcal{C})$ (resp. $\psi(X, \mathcal{C})$) is minimized.
8
Fair k-center: $(1, 1)$-fairlets
Create a graph $G = (B \cup R, E)$ with $E = \{(b_i, r_j)\}$ and edge weights $w_{ij} = d(b_i, r_j)$.
A decomposition into fairlets corresponds to some perfect matching in the graph.
$\phi(X, Y)$ is exactly the cost of the maximum-weight edge in the matching.
Define $G_\tau$ as a threshold graph that has the same nodes as $G$ but only those edges whose weight is at most $\tau$.
We can then look for the minimum $\tau$ for which the corresponding graph has a perfect matching.
Finally, for each fairlet $Y_i$ we can arbitrarily set one of the two nodes as the center.
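The threshold-graph search described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes Euclidean distance, uses a simple augmenting-path matching (Kuhn's algorithm) rather than a faster matching routine, and binary-searches over the sorted pairwise distances. All function names are hypothetical.

```python
from math import dist


def has_perfect_matching(blue, red, tau):
    """Check the threshold graph (edges of weight <= tau) for a perfect matching."""
    match_of_red = [None] * len(red)  # index of the blue point matched to each red point

    def try_augment(i, visited):
        # Try to match blue point i, re-routing earlier matches if needed.
        for j in range(len(red)):
            if dist(blue[i], red[j]) <= tau and j not in visited:
                visited.add(j)
                if match_of_red[j] is None or try_augment(match_of_red[j], visited):
                    match_of_red[j] = i
                    return True
        return False

    matched = sum(try_augment(i, set()) for i in range(len(blue)))
    return matched == len(blue), match_of_red


def one_one_fairlets(blue, red):
    """Return (tau, fairlets) for the smallest threshold admitting a perfect matching."""
    if not blue or len(blue) != len(red):
        raise ValueError("(1, 1)-fairlets need equally many blue and red points")
    thresholds = sorted({dist(b, r) for b in blue for r in red})
    lo, hi = 0, len(thresholds) - 1
    while lo < hi:  # feasibility is monotone in tau, so binary search applies
        mid = (lo + hi) // 2
        if has_perfect_matching(blue, red, thresholds[mid])[0]:
            hi = mid
        else:
            lo = mid + 1
    tau = thresholds[lo]
    _, match_of_red = has_perfect_matching(blue, red, tau)
    return tau, [(blue[i], red[j]) for j, i in enumerate(match_of_red)]


blue_pts = [(0, 0), (2, 0)]
red_pts = [(1, 0), (5, 0)]
tau, fairlets = one_one_fairlets(blue_pts, red_pts)
print(tau)       # 3.0
print(fairlets)  # each pair is one fairlet; either member can serve as its center
```

In this toy instance both blue points are within distance 1 of the red point (1, 0), but only one of them can use it, so the smallest feasible threshold is the distance 3 between (2, 0) and (5, 0).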
9
Fair k-center: $(1, t')$-fairlets
Transform the problem into a minimum cost flow (MCF) problem with the following edges:
A $(\beta, \rho)$ edge with cost 0 and capacity $\min(|B|, |R|)$.
A $(\beta, b_i)$ edge for each $b_i \in B$ and an $(r_i, \rho)$ edge for each $r_i \in R$ [cost 0 and capacity $t' - 1$].
For each $b_i \in B$ and each $j \in [t']$, a $(b_i, b_i^j)$ edge, and similarly for each $r_i \in R$ [cost 0 and capacity 1].
For each $b_i \in B$, $r_j \in R$ and each $1 \le k, \ell \le t'$, a $(b_i^k, r_j^\ell)$ edge with capacity 1. The cost of each such edge is 1 if $d(b_i, r_j) \le \tau$ and $\infty$ otherwise.
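To make the construction concrete, the sketch below only enumerates the edges of the flow network as (tail, head, cost, capacity) tuples; it does not solve the flow. Several details are assumptions: Euclidean distance as the metric, the node labels, and the direction of the red copy edges (from each copy into its red point, so that flow can run from the source side to the sink side). Node supplies and demands are not specified on the slide and are left out.

```python
from math import dist, inf


def build_mcf_edges(blue, red, t_prime, tau):
    """List the edges (tail, head, cost, capacity) of the flow network for threshold tau."""
    # Direct edge between the two special nodes.
    edges = [("beta", "rho", 0, min(len(blue), len(red)))]
    copies = range(1, t_prime + 1)
    for i in range(len(blue)):
        edges.append(("beta", ("b", i), 0, t_prime - 1))
        edges += [(("b", i), ("b", i, k), 0, 1) for k in copies]
    for j in range(len(red)):
        edges.append((("r", j), "rho", 0, t_prime - 1))
        # Assumed direction: copy -> red point.
        edges += [(("r", j, l), ("r", j), 0, 1) for l in copies]
    for i, b in enumerate(blue):
        for j, r in enumerate(red):
            # Pairs further apart than tau get infinite cost.
            cost = 1 if dist(b, r) <= tau else inf
            edges += [(("b", i, k), ("r", j, l), cost, 1) for k in copies for l in copies]
    return edges


blue_pts = [(0, 0), (4, 0)]
red_pts = [(1, 0), (3, 0), (9, 0)]
edges = build_mcf_edges(blue_pts, red_pts, t_prime=2, tau=2.0)
print(len(edges))  # 40
```

The edge count follows directly from the four edge families: 1 + |B|(1 + t') + |R|(1 + t') + |B||R|t'^2, which is 1 + 6 + 9 + 24 = 40 for this toy instance. The resulting list could be handed to any minimum cost flow solver once supplies and demands are fixed.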
10
Fair k-center: $(1, t')$-fairlets
11
LEMMA
Lemma C: Let $A$ be an optimal solution of cost $C$ to the MCF instance. Then it is possible to construct a $(1, t')$-fairlet decomposition for the $(1/t', k)$-fair center problem of cost at most $C$.
12
Theorem
For each fixed $t' \ge 3$, finding an optimal $(1, t')$-fairlet decomposition is NP-hard.
Finding the minimum cost $(1/t', k)$-fair median clustering is NP-hard.
13
Greedy Furthest-Point Algorithm
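The slide gives only the name of the algorithm. A minimal sketch of the standard greedy furthest-point heuristic for k-center is shown below, assuming Euclidean distance and taking the first input point as the arbitrary starting center. The function name and data are hypothetical.

```python
from math import dist


def greedy_furthest_point(points, k):
    """Pick k centers: start at points[0], then repeatedly add the point furthest from all chosen centers."""
    centers = [points[0]]
    # nearest[i] is the distance from points[i] to its closest chosen center.
    nearest = [dist(p, centers[0]) for p in points]
    while len(centers) < k:
        idx = max(range(len(points)), key=nearest.__getitem__)
        centers.append(points[idx])
        nearest = [min(d, dist(p, points[idx])) for p, d in zip(points, nearest)]
    return centers


pts = [(0, 0), (1, 0), (10, 0), (11, 0), (5, 8)]
print(greedy_furthest_point(pts, 3))  # [(0, 0), (11, 0), (5, 8)]
```

Keeping the running `nearest` list means each new center costs one pass over the points, so the whole procedure takes on the order of n·k distance computations.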
14
Datasets
Diabetes (1000 records, gender to be balanced)
Bank (1000 records, married or unmarried to be balanced)
Census (600 records, gender to be balanced)
15
Results
16
Future Work
Extend this idea to situations where the protected class is not binary.
Extend the idea to other clustering objective functions.
17
References
Gonzalez, Teofilo F. "Clustering to minimize the maximum intercluster distance." Theoretical Computer Science 38 (1985).
18
THANK YOU