Fair Clustering through Fairlets (NIPS 2017)


1 Fair Clustering through Fairlets (NIPS 2017)
Flavio Chierichetti, Ravi Kumar, Silvio Lattanzi, Sergei Vassilvitskii

2 Objective
A fair clustering algorithm under the disparate impact doctrine, where each protected class must have approximately equal representation in every cluster.
Formulation of fair clustering under the k-center and k-median objectives.

3 Clustering and Fairness
Given a set X of points in some metric space, the goal is to partition X into k clusters so as to optimize a particular objective function.
Each point has unprotected attributes (its coordinates) and a protected attribute (its color).
Avoiding disparate impact translates to color balance in each cluster.

4 The two objectives
k-center: Given a set of data points $X$ with distances $d(x_i, x_j) \in \mathbb{N}$ satisfying the triangle inequality, find a subset $C \subseteq X$ with $|C| = k$ such that the maximum distance from a point in $X$ to its closest point in $C$ is minimized:
$\phi(X, C) = \max_{x \in X} \min_{c \in C} d(x, c)$
k-median: Given a set of data points $X$, choose the $k$ centers $c_i$ so as to minimize the sum of the distances from each $x$ to its nearest $c_i$:
$\psi(X, C) = \sum_{x \in X} \min_{c \in C} d(x, c)$
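The two objectives can be evaluated directly from their definitions. The following is a minimal sketch, assuming points are coordinate tuples and d is the Euclidean distance; the slide only requires d to be a metric, and the function names `k_center_cost` and `k_median_cost` are illustrative.

```python
from math import dist  # Euclidean distance (Python 3.8+); an assumed choice of metric


def k_center_cost(points, centers):
    """phi(X, C): the largest distance from any point to its nearest center."""
    return max(min(dist(x, c) for c in centers) for x in points)


def k_median_cost(points, centers):
    """psi(X, C): the sum of distances from each point to its nearest center."""
    return sum(min(dist(x, c) for c in centers) for x in points)


X = [(0, 0), (1, 0), (4, 0), (5, 0)]
C = [(0, 0), (5, 0)]
print(k_center_cost(X, C))  # 1.0
print(k_median_cost(X, C))  # 2.0
```

Both functions only score a given set of centers; choosing the centers is the optimization problem itself.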

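5 Balance
For $Y \subseteq X$:
$\mathrm{balance}(Y) = \min\left( \frac{\#\mathrm{RED}(Y)}{\#\mathrm{BLUE}(Y)}, \frac{\#\mathrm{BLUE}(Y)}{\#\mathrm{RED}(Y)} \right) \in [0, 1]$
For a clustering $C$:
$\mathrm{balance}(C) = \min_{c \in C} \mathrm{balance}(c)$
A subset with an equal number of red and blue points has balance 1, while a monochromatic subset has balance 0.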
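The balance definition is short enough to compute by hand, but a small sketch makes the edge cases explicit. It assumes each point is represented only by its color label, that exactly two labels ("red" and "blue") occur, and that a subset missing one color has balance 0; the function names are illustrative.

```python
def balance(cluster):
    """Balance of one subset, given as a list of 'red' / 'blue' labels."""
    red = sum(1 for color in cluster if color == "red")
    blue = len(cluster) - red  # assumes every non-red label is blue
    if red == 0 or blue == 0:
        return 0.0  # monochromatic (or empty) subset
    return min(red / blue, blue / red)


def clustering_balance(clusters):
    """Balance of a clustering: the minimum balance over its clusters."""
    return min(balance(cluster) for cluster in clusters)


print(balance(["red", "blue"]))                    # 1.0
print(balance(["red", "red"]))                     # 0.0
print(clustering_balance([["red", "blue"],
                          ["red", "blue", "blue"]]))  # 0.5
```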

6 Lemma
Lemma A: Let $Y, Y' \subseteq X$ be disjoint. If $C$ is a clustering of $Y$ and $C'$ is a clustering of $Y'$, then $\mathrm{balance}(C \cup C') = \min(\mathrm{balance}(C), \mathrm{balance}(C'))$.
Lemma B: Let $\mathrm{balance}(X) = b/r$ for some integers $1 \le b \le r$ such that $\gcd(b, r) = 1$. Then there exists a clustering $\mathcal{Y} = \{Y_1, \ldots, Y_m\}$ of $X$ such that
$|Y_j| \le b + r$ for each $Y_j \in \mathcal{Y}$, i.e., each cluster is small;
$\mathrm{balance}(\mathcal{Y}) = b/r = \mathrm{balance}(X)$.
$\mathcal{Y}$ is a $(b, r)$-fairlet decomposition of $X$, and each $Y \in \mathcal{Y}$ is a fairlet.
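One simple way to see why a decomposition as in Lemma B exists: if the minority color has b·g points and the majority color has r·g points, where g is the gcd of the two counts, the points can be dealt into g groups of b minority and r majority points each. The sketch below does exactly that. It ignores distances entirely, so it illustrates existence only, not a low-cost decomposition; the function name and the assumption that both colors are non-empty are illustrative.

```python
from math import gcd


def trivial_fairlet_decomposition(reds, blues):
    """Split two non-empty color classes into groups of size b + r with balance b/r.

    Distances are ignored; this only illustrates that such a decomposition exists.
    """
    minority, majority = sorted((reds, blues), key=len)
    g = gcd(len(minority), len(majority))
    b, r = len(minority) // g, len(majority) // g  # balance(X) = b / r in lowest terms
    return [minority[i * b:(i + 1) * b] + majority[i * r:(i + 1) * r]
            for i in range(g)]


fairlets = trivial_fairlet_decomposition(["r1", "r2"], ["b1", "b2", "b3", "b4"])
print(fairlets)  # [['r1', 'b1', 'b2'], ['r2', 'b3', 'b4']]
```

Here balance(X) = 2/4 = 1/2, so b = 1, r = 2, and every group has b + r = 3 points with balance 1/2.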

7 $(t, k)$-fair clustering
In the $(t, k)$-fair center (resp. $(t, k)$-fair median) problem, the goal is to partition $X$ into $C$ such that $|C| = k$, $\mathrm{balance}(C) \ge t$, and $\phi(X, C)$ (resp. $\psi(X, C)$) is minimized.

8 Fair k-center: $(1, 1)$-fairlets
Create a graph $G = (B \cup R, E)$ with edges $E = \{(b_i, r_j)\}$ and weights $w_{ij} = d(b_i, r_j)$.
A decomposition into fairlets corresponds to a perfect matching in the graph.
$\phi(X, Y)$ is exactly the cost of the maximum-weight edge in the matching.
Define $G_\tau$ as the threshold graph that has the same nodes as $G$ but only those edges whose weight is at most $\tau$.
We can then look for the minimum $\tau$ for which the corresponding graph has a perfect matching.
Finally, for each fairlet $Y_i$ we can arbitrarily set one of its two nodes as the center.
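The threshold search described on this slide can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: points are coordinate tuples with Euclidean distance, |B| = |R|, the perfect-matching test uses a simple augmenting-path (Kuhn) routine, and candidate thresholds are scanned linearly for clarity. The function names are hypothetical.

```python
from math import dist  # Euclidean distance; an assumed choice of metric


def perfect_matching(blues, reds, tau):
    """Return a perfect matching in the threshold graph G_tau, or None if there is none."""
    blue_matched_to = [None] * len(reds)  # blue index matched to each red index

    def try_augment(i, seen):
        # Try to match blue i, re-routing earlier matches along an augmenting path.
        for j in range(len(reds)):
            if j not in seen and dist(blues[i], reds[j]) <= tau:
                seen.add(j)
                if blue_matched_to[j] is None or try_augment(blue_matched_to[j], seen):
                    blue_matched_to[j] = i
                    return True
        return False

    for i in range(len(blues)):
        if not try_augment(i, set()):
            return None
    return [(blue_matched_to[j], j) for j in range(len(reds))]


def one_one_fairlets(blues, reds):
    """Smallest tau whose threshold graph has a perfect matching, plus the matching."""
    assert len(blues) == len(reds), "(1, 1)-fairlets need equally many blue and red points"
    # Only pairwise distances can be the optimal threshold, so scan them in order.
    for tau in sorted({dist(b, r) for b in blues for r in reds}):
        matching = perfect_matching(blues, reds, tau)
        if matching is not None:
            return tau, matching
    return None  # only reached when both lists are empty


B = [(0, 0), (10, 0)]
R = [(1, 0), (9, 0)]
print(one_one_fairlets(B, R))  # (1.0, [(0, 0), (1, 1)])
```

Each returned pair (i, j) is one fairlet {b_i, r_j}. Because feasibility is monotone in the threshold, a binary search over the sorted distances would also work.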

9 Fair k-center: $(1, t')$-fairlets
Transform the problem into a minimum cost flow (MCF) problem with the following edges:
A $(\beta, \rho)$ edge with cost 0 and capacity $\min(|B|, |R|)$.
A $(\beta, b_i)$ edge for each $b_i \in B$ and an $(r_i, \rho)$ edge for each $r_i \in R$ [cost 0, capacity $t' - 1$].
For each $b_i \in B$ and for each $j \in [t']$, a $(b_i, b_i^j)$ edge, and similarly for each $r_i \in R$ [cost 0, capacity 1].
For each $b_i \in B$, $r_j \in R$ and for each $1 \le k, l \le t'$, a $(b_i^k, r_j^l)$ edge with capacity 1. The cost of each such edge is 1 if $d(b_i, r_j) \le \tau$ and $\infty$ otherwise.

10 Fair k-center: $(1, t')$-fairlets

11 Lemma
Lemma C: Let $\mathcal{Y}$ be an optimal solution of cost $C$ to the MCF instance. Then it is possible to construct a $(1, t')$-fairlet decomposition for the $(1/t', k)$-fair center problem of cost at most $C$.

12 Theorem
For each fixed $t' \ge 3$, finding an optimal $(1, t')$-fairlet decomposition is NP-hard.
Finding the minimum-cost $(1/t', k)$-fair median clustering is NP-hard.

13 Greedy Furthest point Algorithm
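The body of this slide did not survive in the transcript. For orientation, the following is a sketch of the standard greedy furthest-point heuristic for k-center, commonly attributed to Gonzalez (1985) and known to give a 2-approximation. It is not a reconstruction of the slide: the Euclidean metric, the choice of the first center, and the function name are assumptions.

```python
from math import dist  # Euclidean distance; an assumed choice of metric


def greedy_furthest_point(points, k, first=0):
    """Pick k centers: start from points[first], then repeatedly add the point
    that is furthest from the centers chosen so far."""
    centers = [points[first]]
    # Distance from each point to its nearest chosen center.
    nearest = [dist(p, centers[0]) for p in points]
    while len(centers) < k:
        idx = max(range(len(points)), key=nearest.__getitem__)
        centers.append(points[idx])
        nearest = [min(d, dist(p, points[idx])) for d, p in zip(nearest, points)]
    return centers


pts = [(0, 0), (1, 0), (10, 0), (11, 0)]
print(greedy_furthest_point(pts, 2))  # [(0, 0), (11, 0)]
```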

14 Datasets
Diabetes (1000 records, gender to be balanced)
Bank (1000 records, married or unmarried to be balanced)
Census (600 records, gender to be balanced)

15 Results

16 Future Work
Extend this idea to situations where the protected class is not binary.
Extend the idea to other clustering objective functions.

17 References
Gonzalez, Teofilo F. "Clustering to minimize the maximum intercluster distance." Theoretical Computer Science 38 (1985).

18 THANK YOU

