Strategy-Proof Classification Reshef Meir, School of Computer Science and Engineering, Hebrew University. Joint work with Ariel D. Procaccia and Jeffrey S. Rosenschein

Strategy-Proof Classification – Outline: Introduction (learning and classification; an example of strategic behavior); Motivation (decision making; machine learning); Our Model; Some Results

Classification The supervised classification problem: – Input: a set of labeled data points {(x_i, y_i)}_{i=1..m} – Output: a classifier c from some predefined concept class C (functions of the form f : X → {−, +}) – We usually want c not only to classify the sample correctly but to generalize well, i.e. to minimize Risk(c) ≡ E_{(x,y)~D}[L(c(x) ≠ y)], where D is the distribution from which the training data were sampled and L is some loss function.
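To make the risk definition concrete, here is a minimal Python sketch (not from the original slides; the threshold concept class and the toy sample are assumptions chosen only for illustration) that estimates Risk(c) by the empirical 0-1 loss on a labeled sample.

```python
# Hypothetical illustration: empirical 0-1 risk of a classifier on a labeled sample.
# The concept class here (threshold classifiers on the real line) is an assumption
# chosen only to make the example runnable.

def threshold_classifier(t):
    """Return a classifier c: R -> {-1, +1} that labels x positive iff x >= t."""
    return lambda x: +1 if x >= t else -1

def empirical_risk(c, sample):
    """Fraction of examples (x, y) in the sample that c misclassifies (0-1 loss)."""
    errors = sum(1 for x, y in sample if c(x) != y)
    return errors / len(sample)

if __name__ == "__main__":
    sample = [(0.2, -1), (0.7, +1), (1.5, +1), (-0.3, -1), (0.9, -1)]
    c = threshold_classifier(0.5)
    print(empirical_risk(c, sample))  # 1 error out of 5 -> 0.2
```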

Classification (cont.) A common approach is to return the ERM (empirical risk minimizer), i.e. the concept in C that best fits the given samples (a.k.a. the training data) – If finding it is hard, try to approximate it. This works well under some assumptions on the concept class C. Should we do the same when the data comes from many experts?
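A minimal sketch of ERM over a small finite concept class, again with purely illustrative names: scan C and return the concept with the lowest empirical risk. As the slide notes, finding the exact ERM can be computationally hard for richer classes, in which case one approximates it.

```python
# Hypothetical ERM sketch: exhaustively search a small, finite concept class
# and return the concept with the lowest empirical 0-1 risk.

def empirical_risk(c, sample):
    return sum(1 for x, y in sample if c(x) != y) / len(sample)

def erm(concept_class, sample):
    """Return (best concept, its empirical risk) over a finite concept class."""
    return min(((c, empirical_risk(c, sample)) for c in concept_class),
               key=lambda pair: pair[1])

if __name__ == "__main__":
    # Concept class: the two constant classifiers ("all positive", "all negative").
    all_pos = lambda x: +1
    all_neg = lambda x: -1
    sample = [(1, +1), (2, +1), (3, -1), (4, +1)]
    best, risk = erm([all_pos, all_neg], sample)
    print(risk)  # "all positive" errs on one of four examples -> 0.25
```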

Strategic labeling: an example [Figure: the ERM classifier on the joint dataset makes 5 errors.]

There is a better classifier! (for me…)

If I just change my labels… [Figure: the new ERM classifier now makes 2 + 4 = 6 errors in total.]

Motivation – decision making: The ECB makes decisions based on reports from the national banks. National bankers gather positive/negative data from local institutions, each country reports to the ECB, and a yes/no decision is taken at the European level. Bankers might misreport their data in order to sway the central decision.

Motivation – machine learning (spam filter): [Diagram: managers provide labels for e-mails; the reported dataset is fed to a classification algorithm, which outputs a classifier (a spam filter, e.g. in Outlook).]

Learning (cont.) Some e-mails may be considered spam by certain managers and relevant by others. A manager might misreport labels to bias the final classifier towards her point of view.

A problem is characterized by: – An input space X – A set of classifiers (concept class) C, where every classifier c ∈ C is a function c : X → {+, −} – Optional assumptions and restrictions. Example 1: all linear separators in R^n. Example 2: all subsets of a finite set Q.

A problem instance is defined by: – A set of agents I = {1,...,n} – A partial dataset for each agent i ∈ I, X_i = {x_{i,1},...,x_{i,m(i)}} ⊆ X – For each x_{i,k} ∈ X_i, agent i has a label y_{i,k} ∈ {+, −}; each pair s_{i,k} = ⟨x_{i,k}, y_{i,k}⟩ is an example, and all the examples of a single agent compose her labeled dataset S_i = {s_{i,1},...,s_{i,m(i)}} – The joint dataset S = ⟨S_1, S_2,…, S_n⟩ is our input, with m = |S| – We denote the dataset with the reported labels by S′.
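One possible Python representation of such an instance (an assumption made here only so that the later sketches have something concrete to work with): agent i's dataset S_i is a list of (x, y) pairs, and the joint dataset S is the list of all agents' datasets.

```python
# Hypothetical representation of a problem instance: agent i's labeled dataset S_i
# is a list of (x, y) pairs with y in {-1, +1}; the joint dataset S is the list of
# all agents' datasets.

from typing import List, Tuple

Example = Tuple[float, int]        # (x_ik, y_ik)
AgentData = List[Example]          # S_i
JointData = List[AgentData]        # S = <S_1, ..., S_n>

def joint_size(S: JointData) -> int:
    """m = |S|, the total number of examples across all agents."""
    return sum(len(S_i) for S_i in S)

if __name__ == "__main__":
    S: JointData = [
        [(0.1, +1), (0.4, +1)],             # agent 1
        [(0.6, -1), (0.8, -1), (0.9, +1)],  # agent 2
        [(0.2, -1)],                        # agent 3
    ]
    print(joint_size(S))  # 6
```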

Input: an example [Figure: three agents, each holding her own points X_i with labels Y_i ∈ {−, +}^{m_i}.] S = ⟨S_1, S_2,…, S_n⟩ = ⟨(X_1, Y_1),…, (X_n, Y_n)⟩

Mechanisms A mechanism M receives a labeled dataset S′ and outputs c ∈ C. Private risk of agent i: R_i(c, S) = |{k : c(x_{i,k}) ≠ y_{i,k}}| / m_i. Global risk: R(c, S) = |{(i,k) : c(x_{i,k}) ≠ y_{i,k}}| / m. We allow non-deterministic mechanisms – the outcome is then a random variable, and we measure the expected risk.
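A small sketch of the two risk notions under the representation above (all names are illustrative): the private risk counts only agent i's misclassified points, while the global risk counts misclassified points over the whole joint dataset.

```python
# Hypothetical sketch of the private risk R_i(c, S) and the global risk R(c, S)
# for the 0-1 loss, with S given as a list of per-agent lists of (x, y) pairs.

def private_risk(c, S, i):
    """R_i(c, S): fraction of agent i's examples that c misclassifies."""
    S_i = S[i]
    return sum(1 for x, y in S_i if c(x) != y) / len(S_i)

def global_risk(c, S):
    """R(c, S): fraction of all examples (over all agents) that c misclassifies."""
    m = sum(len(S_i) for S_i in S)
    errors = sum(1 for S_i in S for x, y in S_i if c(x) != y)
    return errors / m

if __name__ == "__main__":
    all_pos = lambda x: +1
    S = [[(1, +1), (2, -1)], [(3, -1), (4, -1)]]
    print(private_risk(all_pos, S, 0))  # 0.5
    print(global_risk(all_pos, S))      # 3 of 4 misclassified -> 0.75
```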

ERM We compare the outcome of M to the ERM: c* = ERM(S) = argmin_{c ∈ C} R(c, S), with r* = R(c*, S). Can our mechanism simply compute and return the ERM?

Requirements 1. Good approximation: ∀S, R(M(S), S) ≤ β·r*. 2. Strategy-proofness (SP): ∀ i, S, S_i′, R_i(M(S), S) ≤ R_i(M(S_{-i}, S_i′), S), i.e. no agent can lower her private risk by misreporting her labels. ERM(S) is 1-approximating but not SP; ERM(S_1) is SP but gives a bad approximation. Are there mechanisms that guarantee both SP and a good approximation?
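To make the strategy-proofness requirement concrete, here is a brute-force sketch (purely illustrative, and limited to a single small instance) that checks whether some agent can lower her private risk, measured on her true labels, by misreporting; a mechanism is SP exactly when no such beneficial misreport exists on any instance. The example at the bottom uses ERM over the two constant classifiers and finds a beneficial lie, matching the slide's claim that ERM is not SP.

```python
# Hypothetical brute-force check of strategy-proofness on one small instance:
# enumerate every possible label misreport S_i' of every agent i and test whether
# it strictly lowers that agent's private risk (measured on her TRUE labels).

from itertools import product

def private_risk(c, S, i):
    S_i = S[i]
    return sum(1 for x, y in S_i if c(x) != y) / len(S_i)

def beneficial_misreport_exists(mechanism, S):
    """True if some agent can strictly gain by reporting false labels."""
    truthful_outcome = mechanism(S)
    for i, S_i in enumerate(S):
        xs = [x for x, _ in S_i]
        for labels in product([-1, +1], repeat=len(xs)):  # every possible report
            S_report = list(S)
            S_report[i] = list(zip(xs, labels))
            lied_outcome = mechanism(S_report)
            if private_risk(lied_outcome, S, i) < private_risk(truthful_outcome, S, i):
                return True
    return False

if __name__ == "__main__":
    all_pos, all_neg = (lambda x: +1), (lambda x: -1)

    def erm_over_constants(S):
        # ERM over {"all positive", "all negative"}: majority of reported labels.
        labels = [y for S_i in S for _, y in S_i]
        return all_pos if sum(labels) >= 0 else all_neg

    S = [[(1, +1), (2, +1), (3, -1)],
         [(4, -1), (5, -1), (6, -1), (7, +1)]]
    print(beneficial_misreport_exists(erm_over_constants, S))  # True: ERM is not SP
```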

Suppose |C| = 2, as in the ECB example. There is a trivial deterministic SP 3-approximation mechanism. Theorem: there is no deterministic SP α-approximation mechanism for any α < 3. R. Meir, A. D. Procaccia and J. S. Rosenschein, Incentive Compatible Classification under Constant Hypotheses: A Tale of Two Functions, AAAI 2008.
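The slide does not spell out the trivial mechanism. One natural candidate, offered here only as an assumed reading and not as a quote from the slides or the paper, is a weighted vote: each agent's weight (her number of points) goes to whichever constant classifier fits her own data better. Her report only determines which side receives her weight, so lying can never help her; whether this particular sketch attains the 3-approximation stated on the slide is likewise an assumption.

```python
# Hypothetical sketch of a simple deterministic SP mechanism for |C| = 2,
# C = {"all positive", "all negative"}: each agent's weight (her number of points)
# goes to the constant classifier she privately prefers; the heavier side wins.
# This is an assumed reading of the "trivial" mechanism mentioned on the slide.

def constant_vote_mechanism(S):
    all_pos, all_neg = (lambda x: +1), (lambda x: -1)
    weight_pos, weight_neg = 0, 0
    for S_i in S:
        positives = sum(1 for _, y in S_i if y == +1)
        # Agent i prefers "all positive" iff at least half of her points are positive.
        if positives * 2 >= len(S_i):
            weight_pos += len(S_i)
        else:
            weight_neg += len(S_i)
    return all_pos if weight_pos >= weight_neg else all_neg
```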

Proof: C = {“all positive”, “all negative”}. [Figure: the lower-bound construction.] R. Meir, A. D. Procaccia and J. S. Rosenschein, Incentive Compatible Classification under Constant Hypotheses: A Tale of Two Functions, AAAI 2008.

Randomization comes to the rescue There is a randomized SP 2-approximation mechanism (when |C| = 2) – the randomization is non-trivial. Once again, no SP mechanism can do better. R. Meir, A. D. Procaccia and J. S. Rosenschein, Incentive Compatible Classification under Constant Hypotheses: A Tale of Two Functions, AAAI 2008.

Negative results Theorem: there are concept classes (including linear separators) for which no SP mechanism achieves a constant approximation ratio. Proof idea: – We first construct a classification problem that is equivalent to a voting problem – We then use impossibility results from social choice theory to show that there must be a dictator. R. Meir, A. D. Procaccia and J. S. Rosenschein, On the Power of Dictatorial Classification, in submission.

Classification as voting [Figure: one candidate classifier makes only 2 errors, while another makes (m-2) errors.]

More positive results Suppose all agents control the same data points, i.e. X_1 = X_2 = … = X_n. Theorem: selecting a dictator at random is SP and guarantees a 3-approximation – true for any concept class C, and a 2-approximation when each S_i is separable. [Figure: three agents labeling the same points.] R. Meir, A. D. Procaccia and J. S. Rosenschein, Incentive Compatible Classification with Shared Inputs, in submission.
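A minimal sketch (with assumed names) of the random-dictator mechanism from the slide in the shared-inputs setting: pick one agent uniformly at random and return an ERM over her labels alone. The dictator has no incentive to lie, and no other agent's report affects the outcome.

```python
# Hypothetical sketch of the "random dictator" mechanism for shared inputs:
# pick an agent uniformly at random and return the ERM on her dataset only.

import random

def empirical_risk(c, S_i):
    return sum(1 for x, y in S_i if c(x) != y) / len(S_i)

def erm(concept_class, S_i):
    return min(concept_class, key=lambda c: empirical_risk(c, S_i))

def random_dictator(concept_class, S, rng=random):
    """SP: the dictator cannot gain by lying, and nobody else affects the outcome."""
    i = rng.randrange(len(S))          # choose the dictator uniformly at random
    return erm(concept_class, S[i])    # fit the dictator's own labels
```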

Proof idea [Figure] The average pairwise distance between the green dots cannot be much higher than the average distance from them to the star.

Generalization So far we have only compared our results to the ERM, i.e. to the data at hand. We want learning algorithms that generalize well from sampled data – with minimal strategic bias. Can we still ask for SP algorithms?

Generalization (cont.) There is a fixed distribution D_X on X. Each agent holds a private function Y_i : X → {+, −} – possibly non-deterministic. The algorithm is allowed to sample from D_X and ask the agents for their labels. We evaluate the result against the optimal risk, averaging over all agents, i.e. the risk of c is the average over agents i of E_{x~D_X}[c(x) ≠ Y_i(x)].

Generalization (cont.) [Figure: the distribution D_X over X and the agents' private labeling functions Y_1, Y_2, Y_3.]

Generalization mechanisms Our mechanism is used as follows: 1. Sample m data points i.i.d. from D_X 2. Ask the agents for their labels 3. Run the SP mechanism on the labeled data and return the result. Does it work? – That depends on our game-theoretic and learning-theoretic assumptions.
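A sketch of how the three steps could be wired together (the function and parameter names are assumptions for illustration): draw m points i.i.d. from D_X, collect each agent's reported labels on those points, and hand the resulting joint dataset to an SP mechanism, e.g. a random dictator.

```python
# Hypothetical sketch of a generalization mechanism: sample points from D_X,
# query each agent's (possibly strategic) labeling of those points, then run an
# SP mechanism on the resulting joint dataset.

import random

def generalization_mechanism(sample_from_DX, agent_labelers, sp_mechanism, m):
    """
    sample_from_DX: callable returning one point x ~ D_X
    agent_labelers: list of callables, agent i's reported label for a point x
    sp_mechanism:   callable mapping the joint labeled dataset to a classifier
    """
    xs = [sample_from_DX() for _ in range(m)]                      # step 1: sample i.i.d.
    S = [[(x, label(x)) for x in xs] for label in agent_labelers]  # step 2: ask for labels
    return sp_mechanism(S)                                         # step 3: run the SP mechanism

if __name__ == "__main__":
    sample_from_DX = lambda: random.uniform(0, 1)
    agents = [lambda x: +1 if x > 0.3 else -1,
              lambda x: +1 if x > 0.7 else -1]
    all_pos, all_neg = (lambda x: +1), (lambda x: -1)

    def dictator_over_constants(S):
        # Toy SP mechanism: a random dictator choosing between the two constant classifiers.
        S_i = S[random.randrange(len(S))]
        return all_pos if sum(y for _, y in S_i) >= 0 else all_neg

    c = generalization_mechanism(sample_from_DX, agents, dictator_over_constants, m=20)
    print(c(0.5))  # +1 or -1, depending on the sampled points and the chosen dictator
```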

The “truthful approach” Assumption A: agents do not lie unless they gain at least ε. Theorem: w.h.p. the following holds – there is no ε-beneficial lie, and the approximation ratio (if no one lies) is close to 3. Corollary: with enough samples, the expected approximation ratio is close to 3. The number of required samples is polynomial in n and 1/ε. R. Meir, A. D. Procaccia and J. S. Rosenschein, Incentive Compatible Classification with Shared Inputs, in submission.

The “rational approach” Assumption B: agents always play a dominant strategy if one exists. Theorem: with enough samples, the expected approximation ratio is close to 3. The number of required samples is polynomial in 1/ε (and does not depend on n). R. Meir, A. D. Procaccia and J. S. Rosenschein, Incentive Compatible Classification with Shared Inputs, in submission.

Previous and future work A study of SP mechanisms in regression learning [1]; there are no SP mechanisms for clustering [2]. Future directions: other concept classes, other loss functions, alternative assumptions on the structure of the data. [1] O. Dekel, F. Fischer and A. D. Procaccia, Incentive Compatible Regression Learning, SODA. [2] J. Perote-Peña and J. Perote, The impossibility of strategy-proof clustering, Economics Bulletin.