On the Limits of Dictatorial Classification. Reshef Meir, School of Computer Science and Engineering, Hebrew University. Joint work with Shaull Almagor, Assaf Michaely and Jeffrey S. Rosenschein.


1 On the Limits of Dictatorial Classification Reshef Meir School of Computer Science and Engineering, Hebrew University Joint work with Shaull Almagor, Assaf Michaely and Jeffrey S. Rosenschein

2 Outline: Strategy-proof classification: an example; Motivation; Our model and previous results; Filling the gap: proving a lower bound; The weighted case

3 The Motivating Questions. Do "strategyproof" considerations apply to learning? If agents have an incentive to lie, what can we do about it? Approximation; randomization; and even clever use of dictators…

4 Strategic labeling: an example. [Figure: the ERM classifier makes 5 errors on the joint dataset.]

5 There is a better classifier! (for me…)

6 If I just change the labels… [Figure: the resulting classifier now makes 2+5 = 7 errors.]

7 Classification. The supervised classification problem. Input: a set of labeled data points {(x_i, y_i)}, i = 1..m. Output: a classifier c from some predefined concept class C (e.g., functions of the form f : X → {−,+}). We usually want c not just to classify the sample correctly, but to generalize well, i.e., to minimize R(c) = E_{(x,y)~D}[c(x) ≠ y], the expected error w.r.t. the distribution D (the 0/1 loss function).

8 Classification (cont.) A common approach is to return the ERM (Empirical Risk Minimizer), i.e., the concept in C that best fits the given samples (has the lowest number of errors). This generalizes well under some assumptions on the concept class C (e.g., linear classifiers tend to generalize well). But with multiple experts, we can't trust our ERM!

9 Where do we find "experts" with incentives? Example 1: a firm learning purchase patterns. Information is gathered from local retailers, and the resulting policy affects them: "the best policy is the policy that fits my pattern."

10 Example 2: Internet polls / polls of experts. [Diagram: Users → Reported Dataset → Classification Algorithm → Classifier]

11 Motivation from other domains: aggregating partitions; judgment aggregation; facility location (on the binary cube). A judgment-aggregation example:

  Agent   A   B   A&B   A∨¬B
  1       T   F   F     T
  2       F   T   F     F
  3       F   F   F     T

12 A problem instance is defined by: a set of agents I = {1,...,n}; a set of data points X = {x_1,...,x_m} ⊆ 𝒳. For each x_k ∈ X, agent i has a label y_ik ∈ {−,+}. Each pair s_ik = ⟨x_k, y_ik⟩ is a sample. All samples of a single agent compose the labeled dataset S_i = {s_i1,...,s_i,m(i)}. The joint dataset S = ⟨S_1, S_2,…, S_n⟩ is our input; m = |S|. We denote the dataset with the reported labels by S'.

13 Input example: each agent i labels the same points X ∈ 𝒳^m with Y_i ∈ {−,+}^m, so S = ⟨S_1, S_2,…, S_n⟩ = ⟨(X,Y_1),…, (X,Y_n)⟩. [Figure: three agents' labelings of the same point set.]

14 Mechanisms. A mechanism M receives a labeled dataset S and outputs c = M(S) ∈ C. Private risk of agent i (% of errors on S_i): R_i(c,S) = |{k : c(x_ik) ≠ y_ik}| / m_i. Global risk (% of errors on S): R(c,S) = |{(i,k) : c(x_ik) ≠ y_ik}| / m. We allow non-deterministic mechanisms, and measure the expected risk.
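The two risk notions on this slide can be sketched in a few lines of Python, assuming labels are encoded as +1/−1 and a classifier is any function from points to labels; all names here (`private_risk`, `global_risk`, the sample data) are illustrative, not from the paper.

```python
def private_risk(c, S_i):
    """Fraction of agent i's samples that c mislabels: R_i(c,S) = |{k : c(x_ik) != y_ik}| / m_i."""
    return sum(1 for x, y in S_i if c(x) != y) / len(S_i)

def global_risk(c, S):
    """Fraction of all samples, over all agents, that c mislabels."""
    m = sum(len(S_i) for S_i in S)
    return sum(1 for S_i in S for x, y in S_i if c(x) != y) / m

# Two agents labeling the same three points differently:
S = [
    [(0, +1), (1, +1), (2, -1)],   # agent 1
    [(0, -1), (1, -1), (2, -1)],   # agent 2
]
all_negative = lambda x: -1
print(private_risk(all_negative, S[0]))  # 2/3: agent 1 disagrees on two of her three points
print(global_risk(all_negative, S))      # 1/3: two errors out of six samples
```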

15 ERM. We compare the outcome of M to the ERM: c* = ERM(S) = argmin_{c∈C} R(c,S), and r* = R(c*,S). Can our mechanism simply compute and return the ERM?
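For a finite concept class, the ERM of this slide is a direct minimization; a minimal sketch, assuming the tiny "all positive" / "all negative" class that appears later in the talk (names are illustrative):

```python
def erm(concepts, S):
    """Return (c*, r*): the concept name minimizing global risk, and that risk."""
    def global_risk(c):
        m = sum(len(S_i) for S_i in S)
        return sum(1 for S_i in S for x, y in S_i if c(x) != y) / m
    best = min(concepts, key=lambda name: global_risk(concepts[name]))
    return best, global_risk(concepts[best])

concepts = {"all+": lambda x: +1, "all-": lambda x: -1}
S = [[(0, +1), (1, +1)], [(0, -1), (1, +1)]]
print(erm(concepts, S))  # ('all+', 0.25): one error out of four samples
```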

16 Requirements (the most important slide). 1. Good approximation: for every S, R(M(S),S) ≤ α·r*. 2. Strategy-proofness (SP): for all i, S, S_i', R_i(M(S_{-i}, S_i'), S) ≥ R_i(M(S), S), i.e., lying never lowers an agent's true risk. ERM(S) is 1-approximating but not SP; ERM(S_1) is SP but gives a bad approximation. Are there any mechanisms that guarantee both SP and good approximation?
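Why ERM(S) fails SP can be seen with a made-up two-agent instance over the class {all-positive, all-negative}, where ERM reduces to a majority vote over all reported labels; the numbers below are illustrative, not from the paper:

```python
def erm_sign(labels):
    """ERM over C = {all+, all-} is a (weak) majority vote over all +1/-1 labels."""
    return +1 if sum(labels) >= 0 else -1

def risk(c, labels):
    """An agent's true risk when the constant classifier c is chosen."""
    return sum(1 for y in labels if y != c) / len(labels)

truth_1 = [+1, +1, +1, -1, -1]   # agent 1: mostly '+', so she prefers all-positive
truth_2 = [-1, -1, -1, -1, +1]   # agent 2: mostly '-'

honest = erm_sign(truth_1 + truth_2)    # 4 '+' vs 6 '-'  ->  all-negative
lie    = erm_sign([+1] * 5 + truth_2)   # agent 1 reports all '+': 6 '+' vs 4 '-'  ->  all-positive
print(risk(honest, truth_1))  # 0.6: agent 1's true risk when truthful
print(risk(lie, truth_1))     # 0.4: lying strictly lowers her risk
```

So a single agent can flip the majority by exaggerating her labels, which is exactly the manipulation the SP condition rules out.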

17 Related work. A study of SP mechanisms in regression learning [supervised learning]: O. Dekel, F. Fischer and A. D. Procaccia, SODA 2008, JCSS 2009. No SP mechanisms for clustering [unsupervised learning]: J. Perote-Peña and J. Perote, Economics Bulletin 2003.

18 Previous work: a simple case (Meir, Procaccia and Rosenschein, AAAI 2008). Tiny concept class, |C| = 2: either "all positive" or "all negative". Theorem: There is an SP 2-approximation mechanism, and there is no SP α-approximation mechanism for any α < 2.

19 Previous work: general concept classes (Meir, Procaccia and Rosenschein, IJCAI 2009). Theorem: Selecting a dictator at random is SP and guarantees a bounded approximation ratio. This holds for any concept class C, and generalizes well from sampled data when C has bounded VC dimension. Open question #1: are there better mechanisms? Open question #2: what if agents are weighted?
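The random-dictator mechanism from this slide is easy to sketch: pick one agent uniformly at random and return the ERM on her data alone. It is SP because the outcome depends only on the chosen agent's own labels, which she has no reason to misreport. This is a minimal illustration, not the paper's code:

```python
import random

def random_dictator(concepts, S, rng=random):
    """Pick a dictator uniformly at random; return the ERM on her samples only."""
    i = rng.randrange(len(S))
    def risk_i(c):
        return sum(1 for x, y in S[i] if c(x) != y) / len(S[i])
    return min(concepts.values(), key=risk_i)

S = [[(0, +1), (1, +1), (2, -1)]]        # a single agent, for a deterministic demo
c = random_dictator({"all+": lambda x: +1, "all-": lambda x: -1}, S)
print(c(0))  # 1: the lone dictator's majority label is '+'
```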

20 Our main result: a lower bound. Theorem: There is a concept class C (with |C| = 3) for which no SP mechanism can beat the approximation ratio of the random-dictator mechanism, matching the upper bound from IJCAI-09. The proof is by a careful reduction to a voting scenario; we will see the proof sketch.

21 Proof sketch. Gibbard ['77] proved that every (randomized) SP voting rule over 3 candidates must be a lottery over dictators*. We define X = {x, y, z} and C as follows, where each concept labels exactly one point positive:

  point:  x  y  z
  c_x:    +  −  −
  c_y:    −  +  −
  c_z:    −  −  +

We also restrict the agents so that each agent can have mixed labels on just one point. [Figure: the agents' label profiles on x, y, z.]
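The reduction's concept class can be written out directly, assuming points named 'x', 'y', 'z'; since each concept labels exactly one point '+', choosing a classifier amounts to choosing one of three "candidates" (an illustrative sketch):

```python
# Each concept is a constant "one positive point" classifier over X = {x, y, z}.
C = {
    "c_x": lambda p: +1 if p == "x" else -1,
    "c_y": lambda p: +1 if p == "y" else -1,
    "c_z": lambda p: +1 if p == "z" else -1,
}
for name, c in C.items():
    print(name, [c(p) for p in "xyz"])
# prints c_x [1, -1, -1], then c_y [-1, 1, -1], then c_z [-1, -1, 1]
```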

22 Proof sketch (cont.) Suppose that M is SP. [Figure: the same label profiles.]

23 Proof sketch (cont.) Suppose that M is SP. Then: 1. M must be monotone on the mixed point; 2. M must ignore the mixed point; 3. M is a (randomized) voting rule over preferences such as c_z > c_y > c_x and c_x > c_z > c_y.

24 Proof sketch (cont.) 4. By Gibbard ['77], M is a random dictator. 5. We construct an instance where random dictators perform poorly.

25 Weighted agents. We must still select a dictator randomly, but the selection probability may be based on weight. The naive approach only gives a 3-approximation; an optimal SP algorithm matches the lower bound.
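A sketch of the weighted setting, assuming (an assumption, not stated on the slide) that the naive approach is to pick each agent with probability proportional to her weight; the optimal SP rule mentioned on the slide uses a different selection distribution, which is not detailed here:

```python
import random

def weighted_dictator(concepts, S, weights, rng=random):
    """Pick agent i with probability w_i / sum(w); return the ERM on her samples only."""
    i = rng.choices(range(len(S)), weights=weights, k=1)[0]
    def risk_i(c):
        return sum(1 for x, y in S[i] if c(x) != y) / len(S[i])
    return min(concepts.values(), key=risk_i)

# Degenerate weights make the demo deterministic: agent 1 is always chosen.
S = [[(0, +1)], [(0, -1)]]
c = weighted_dictator({"all+": lambda x: +1, "all-": lambda x: -1}, S, [1, 0])
print(c(0))  # 1: agent 1's preferred constant classifier
```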

26 Future work: other concept classes; other loss functions (linear loss, quadratic loss, …); alternative assumptions on the structure of the data; other models of strategic behavior; …

