Algorithmic Game Theory & Machine Learning Christos Papadimitriou.

Algorithmic Game Theory & Machine Learning Christos Papadimitriou

AGT’s three recent brushes with ML
1. Classification when the data are strategic (with Hardt, Megiddo, Wootters, ITCS 2016)
2. Statistical learning when the data sources are strategic (with Cai, Daskalakis, COLT 2015)
3. A theorem in Topology as solution concept (with Georgios Piliouras, ITCS 2016)

1. Classification when the data are strategic
Fact: The number of books in the household is an excellent predictor of success in college – actually, a bit better than grades.
So, an admissions classifier should use this!

Problem: [figure: a classifier separating applicants in the (grades, books) plane]

The data are strategic!
Applicants will buy enough books to get in (or will invest unreasonable money and effort to improve their SATs, or …)
(Assume they know your classifier.)
As long as their utility of admission covers this expense.

So, you may have to… [figure: the classifier shifted in the (grades, books) plane]

But of course…
This way you will misclassify a different set (and inconvenience many…)
Suppose applicants have the same utility ±1, and the same cost c(x, y) ≥ 0 of moving from x to y.
You want to maximize the (expected) score = #correctly classified − #misclassified.

Stackelberg!
Given: data, cost function, ideal classifier h…
…come up with a modified classifier h*…
…such that, when the data respond by moving to the closest admitted point (if the cost of moving < 2; recall utility = ±1)…
…the best score is achieved.
Call this theoretical full-information maximum optscore.
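The Stackelberg computation can be illustrated on a toy one-feature instance (a sketch: the cost c(x, y) = (y − x)⁺, the true threshold 5, and the sample size are hypothetical, not from the talk — only the utility gain of 2 comes from the ±1 utilities above):

```python
import random

def best_response(x, theta, gain=2.0):
    """Applicant at x facing threshold theta: move up to theta whenever the
    gaming cost (theta - x) is smaller than the utility gain from admission."""
    if x >= theta:
        return x                      # already admitted, no reason to move
    if theta - x < gain:
        return theta                  # worth buying books up to the threshold
    return x                          # too expensive; stay put

def score(theta, points, h):
    """Stackelberg score: +1 per correctly classified point after the
    points best-respond, -1 per misclassified point."""
    s = 0
    for x in points:
        admitted = best_response(x, theta) >= theta
        s += 1 if admitted == h(x) else -1
    return s

random.seed(0)
t_true = 5.0                          # hypothetical ground-truth threshold
h = lambda x: x >= t_true             # ideal classifier on truthful data
points = [random.uniform(0, 10) for _ in range(500)]

naive = score(t_true, points, h)      # using h directly: gamed by applicants
thetas = [i * 0.1 for i in range(101)]
best = max(thetas, key=lambda th: score(th, points, h))
print(naive, best, score(best, points, h))
```

The best modified threshold lands at the true threshold plus the gaming budget (here, 7 = 5 + 2): applicants who truly qualify can afford to reach it, while those who do not cannot.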

Our goal in classification
Given data space X, distribution D on X, target classification h: X → ±1, sample m points from D labeled by h.
Based on these, construct in poly time (poly in m, 1/ε, log 1/δ) a classifier f: X → ±1 such that, with probability 1 − δ, the expected score on a new sample is within ε of optscore.

Hard problem even with nonstrategic data
The distribution and concept class must have low “classification complexity” (e.g., VC dimension, Rademacher complexity), and then the problem must be polynomial-time learnable.

Surprise: strategic data is “easier” to classify!
Theorem: If the cost function is separable, then we can classify strategic points within ε of the theoretical optscore, as long as the distribution and concept class have low classification complexity.
Note: No computational complexity constraint!

Separable cost functions
c(x, y) = (f(y) − g(x))⁺
Example: c(x, y) = (⟨β, y⟩ − ⟨β, x⟩)⁺
In fact, the same theorem holds if c is the minimum of a finite number k of separable functions (“bounded separability dimension”); complexity ≈ m^(k+1).
NB: For simple metrics (unbounded k), even the full-information problem is NP-complete.

How come?
Suppose c(x, y) = (⟨β, y⟩ − ⟨β, x⟩)⁺.
[Figure: the vector β; your optimal, but computationally complex, Stackelberg classifier; and a linear (and thus computationally easy) Stackelberg classifier that does as well!]
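One way to reconstruct the slide’s argument as a short derivation (a sketch under the separability assumption above; S denotes the set of points the classifier admits, and the gain 2 comes from the ±1 utilities):

```latex
% With a separable cost c(x,y) = (f(y) - g(x))^+ and admitted set S:
\min_{y \in S} c(x,y) = \bigl(f^* - g(x)\bigr)^+ ,
\qquad f^* := \min_{y \in S} f(y).
% A rejected agent at x moves (and gets admitted) iff f^* - g(x) < 2,
% so the admitted set after best responses is
S \;\cup\; \{\, x : g(x) > f^* - 2 \,\},
% which depends on x only through g(x). For g(x) = \langle \beta, x \rangle,
% a halfspace \{\, x : \langle \beta, x \rangle \ge \theta \,\} achieves
% the same outcome, with the threshold \theta playing the role of f^* - 2.
```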

AGT’s three recent brushes with ML
1. Classification when the data are strategic (with Hardt, Megiddo, Wootters, ITCS 2016)
2. Statistical learning when the data sources are strategic (with Cai, Daskalakis, COLT 2015)
3. A theorem in Topology as solution concept (with Georgios Piliouras, ITCS 2016)

2. Strategic Data Sources
Context: statistical estimation.
There is a ground truth function F(x). You estimate it from data points (xᵢ, yᵢ). You get the data points from workers.
Each worker i has a convex function σᵢ: with effort e ≥ 0 at x, she finds y = F(x) in expectation, with variance [σᵢ(e)]².

Meet the agents
You (the Statistician) have a way, from data {xᵢ, yᵢ}, to find an approximation G of F (think linear regression, but much more general).
Your loss is L = E_{x~D}[(F(x) − G(x))²], plus the payments to the workers, Σᵢ pᵢ.
Assumption: L depends only on {xᵢ}, D, {σᵢ} (not on F; the estimation is unbiased).
Each worker i wants to minimize eᵢ − pᵢ.
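A toy instance of the unbiasedness assumption — slope-through-origin regression, with hypothetical xᵢ’s and σᵢ’s — illustrating why the expected loss L depends only on {xᵢ} and {σᵢ}, not on F (here, not on the true slope):

```python
import random

def expected_loss(xs, sigmas, Ex2):
    """Expected estimation loss E[(F(x)-G(x))^2] for the slope-through-origin
    estimator: a function of the x's and the workers' variances only."""
    sx2 = sum(x * x for x in xs)
    var_beta = sum((x * s) ** 2 for x, s in zip(xs, sigmas)) / sx2 ** 2
    return Ex2 * var_beta             # loss = E[x^2] * Var(beta_hat)

def simulate_loss(beta, xs, sigmas, Ex2, trials=20000, seed=1):
    """Monte Carlo check: average loss over noisy reports y_i = beta*x_i + noise."""
    rng = random.Random(seed)
    sx2 = sum(x * x for x in xs)
    total = 0.0
    for _ in range(trials):
        ys = [beta * x + rng.gauss(0.0, s) for x, s in zip(xs, sigmas)]
        bhat = sum(x * y for x, y in zip(xs, ys)) / sx2   # unbiased estimator
        total += Ex2 * (beta - bhat) ** 2                 # E_x[(beta*x - bhat*x)^2]
    return total / trials

xs = [1.0, 2.0, 3.0]                  # hypothetical data points
sigmas = [0.5, 1.0, 0.2]              # worker i's std-dev at her chosen effort
Ex2 = 1.0                             # E[x^2] under the test distribution D
analytic = expected_loss(xs, sigmas, Ex2)
print(analytic, simulate_loss(1.0, xs, sigmas, Ex2), simulate_loss(5.0, xs, sigmas, Ex2))
```

Because the estimator is unbiased, the simulated loss for true slope 1 and true slope 5 coincides with the analytic value that never mentions F at all.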

Social Optimum
OPT = min_{W′, x, e} [ L({xᵢ}, D, {eᵢ}) + Σᵢ eᵢ ]
Notice that OPT is your (the Statistician’s) wildest dream of a total loss: it means that you have persuaded the workers to exert the optimal (for you) effort with no surplus (you can’t do better, because of individual rationality).

Surprise: It can be achieved!
Theorem: There is a mechanism which achieves the “wildest dream” loss OPT at a dominant-strategy equilibrium.
Trick: Promise payments pᵢ = Aᵢ − Bᵢ (yᵢ − G₋ᵢ(x₋ᵢ))², where Aᵢ extracts all surplus and Bᵢ induces the optimal effort.
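A numerical sketch of the worker’s incentives under this payment rule (the effort-to-noise curve σ(e) = 1/(1 + e), the target effort, and δ² are illustrative assumptions; the constants play the roles the slide assigns them — Bᵢ makes the target effort optimal, Aᵢ leaves zero surplus):

```python
def sigma(e):
    """Hypothetical effort-to-noise curve: more effort, less noise."""
    return 1.0 / (1.0 + e)

def expected_utility(e, A, B, delta2):
    """Worker's expected utility under the payment p = A - B*(y - G_-i)^2.
    Since y is unbiased with variance sigma(e)^2, E[(y - G_-i)^2] =
    sigma(e)^2 + delta2, where delta2 (the others' estimation error) does
    not depend on this worker's effort."""
    return A - B * (sigma(e) ** 2 + delta2) - e

target_e = 1.0                        # effort level the statistician wants
delta2 = 0.1                          # assumed E[(F(x_i) - G_-i(x_-i))^2]
B = (1.0 + target_e) ** 3 / 2.0       # first-order condition makes target_e optimal
A = target_e + B * (sigma(target_e) ** 2 + delta2)   # zero surplus at target_e

grid = [i * 0.001 for i in range(5000)]
best_e = max(grid, key=lambda e: expected_utility(e, A, B, delta2))
print(best_e, expected_utility(best_e, A, B, delta2))
```

The worker’s best response on the grid is the statistician’s target effort, and her utility there is (up to rounding) zero — exactly the “zero surplus, optimum effort” promise of the mechanism.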

Caveat
This means that the incentive problem can be solved, essentially by a powerful contract.
What is less clear is how to solve the computational problem – how to compute OPT.
Btw, if the xᵢ’s are given, even ridge regression can be managed the same way.

AGT’s three recent brushes with ML
1. Classification when the data are strategic (with Hardt, Megiddo, Wootters, ITCS 2016)
2. Statistical learning when the data sources are strategic (with Cai, Daskalakis, COLT 2015)
3. A theorem in Topology and solution concepts (with Georgios Piliouras, ITCS 2016)

3. Solution Concepts from the Topology of Learning in Games
Fact: Learning dynamics usually fail to find the Nash equilibrium.
Is this bad news for learning dynamics? Or for Nash equilibrium?
In particular, can we view learning dynamics as a novel solution concept?
(“learning dynamics” ≈ “replicator dynamics”)

Basic idea
The asymptotic behavior of the dynamics generalizes the Nash equilibrium.
Coarse correlated equilibria are “achieved” this way.
(But suppose we are more interested in behavior than in performance.)
What other behaviors are there besides equilibria? Limit cycles, perhaps?

Basic idea seems to work: matching pennies
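The cycling behavior in this figure can be reproduced with a short simulation (a sketch: standard matching-pennies payoffs and an Euler discretization are assumed):

```python
def replicator_step(p, q, dt=0.001):
    """One Euler step of replicator dynamics for matching pennies.
    p = player 1's probability of Heads (wants to match);
    q = player 2's probability of Heads (wants to mismatch)."""
    dp = p * (1 - p) * (4 * q - 2)    # payoff(Heads) - payoff(Tails) = 4q - 2
    dq = q * (1 - q) * (2 - 4 * p)
    return p + dt * dp, q + dt * dq

p, q = 0.9, 0.5                       # start away from the Nash point (0.5, 0.5)
traj = [(p, q)]
for _ in range(20000):
    p, q = replicator_step(p, q)
    traj.append((p, q))

dist = [abs(p - 0.5) + abs(q - 0.5) for p, q in traj]
print(min(dist), max(dist))           # the orbit circles the Nash equilibrium
```

The distance to the unique (mixed) Nash equilibrium never shrinks toward zero: the trajectory stays on a closed orbit around it, which is exactly the behavior the asymptotic solution concept is meant to capture.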

Basic idea seems to work (cont.): coordination

Basic idea does not work! The dynamics (even of two-player games) can be chaotic…

Conley to the Rescue
Chain Recurrent Set (CRS): an “almost” limit cycle – it can be turned into a cycle by finitely many, arbitrarily small jumps.
Fundamental Theorem of Dynamical Systems [Conley 1984]: Any dynamical system ends up in some chain recurrent set, drawn there by a Lyapunov function.
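A minimal illustration of chain recurrence (a sketch on a hypothetical system — an irrational rotation of the circle, where the true orbit never returns exactly to its start, but an ε-chain closes with a single small jump):

```python
def circle_dist(a, b):
    """Distance on the unit circle R/Z."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)

def eps_chain(x0, alpha, eps, max_steps=100000):
    """Find an eps-chain from x0 back to itself under the rotation
    x -> x + alpha (mod 1): follow the true orbit until it comes within
    eps of x0 (guaranteed by pigeonhole), then close the chain with one
    jump smaller than eps."""
    x = x0
    chain = [x0]
    for _ in range(max_steps):
        x = (x + alpha) % 1.0
        chain.append(x)
        if circle_dist(x, x0) < eps:
            chain.append(x0)          # the single arbitrarily small jump
            return chain
    return None

alpha = 0.5 ** 0.5                    # irrational rotation: no true periodic orbit
chain = eps_chain(0.0, alpha, eps=0.01)
print(len(chain))
```

Every point of this rotation is chain recurrent even though no point is periodic — the kind of “almost cycle” behavior the CRS notion is designed to include.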

One CRS

Five CRS’s

The CRS structure of a game
The CRSs of a game form a DAG.
[Figure: the matching pennies DAG and the coordination game DAG, with sink nodes labeled (0.5, 2) and (0.5, 2)]
Probabilities and scores of the sink nodes determine the expected performance of the dynamics.
Work in progress: characterization of CRSs for 2-player games.

Bottom line
John Nash based his solution concept on the most sophisticated theorem in the topology of his era.
Shouldn’t we do the same?

Thank You!