Slide 1: Learning in Games
Chi-Jen Lu, Academia Sinica
Slide 2: Outline
- What machine learning can do for game theory
- What game theory can do for machine learning
Slide 3: Two-player zero-sum games
Slide 4: Zero-sum games
Example payoff matrix (entries are the utility / reward of Player 1; Player 2's utility is the negation):

                      Player 2
                    b1      b2
  Player 1   a1     -1       1
             a2      1      -1
Slide 5: Zero-sum games
$K_i$: action set of player $i$.
$U$: utility matrix of Player 1, with entries $U(a, b)$.
Player 1 maximizes; Player 2 minimizes. Neither wants to play first:
$\max_{a \in K_1} \min_{b \in K_2} U(a,b) \le \min_{b \in K_2} \max_{a \in K_1} U(a,b)$,
with strict inequality ($<$) in many games.
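As a quick check of this inequality, here is a minimal sketch (assuming NumPy) that computes both sides for the 2x2 matrix from the previous slide:

```python
import numpy as np

# Utility matrix of Player 1 from the 2x2 example above.
U = np.array([[-1, 1],
              [ 1, -1]])

# Player 1 moves first: pick the row whose worst case is best.
maximin = U.min(axis=1).max()   # max_a min_b U(a, b)

# Player 2 moves first: pick the column whose worst case is best for it.
minimax = U.max(axis=0).min()   # min_b max_a U(a, b)

print(maximin, minimax)  # -1 1: a strict gap, so no pure-strategy equilibrium
```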
Slide 6: Zero-sum games
Minimax Theorem: over distributions (mixed strategies) $A$, $B$,
$\max_A \min_B U(A,B) = \min_B \max_A U(A,B)$,
where $U(A,B) = \mathbb{E}_{a \sim A,\, b \sim B}[U(a,b)]$.
How to find such $A$, $B$ efficiently?
Slide 7: Online Learning
Slide 8: Online learning / decision making
Making decisions or predictions, and then paying the prices, repeatedly ("I wish I had…").
Slide 9: Many examples
Predicting weather, trading stocks, commuting to work, …
Network routing, scheduling, resource allocation, online advertising, …
Slide 10: Problem formulation
Play for $T$ rounds, with action set $K$. In round $t$:
- play an action $x^{(t)} \in K$, or a distribution $x^{(t)} \in \Delta(K)$ over $K$
- receive a reward $f^{(t)}(x^{(t)})$ (for a distribution, the expected reward $\mathbb{E}_{a \sim x^{(t)}}[f^{(t)}(a)]$)
How to choose $x^{(t)}$? What is the goal?
Slide 11: Goal: minimize regret
$\text{Regret} = \max_{x^*} \sum_{t=1}^{T} f^{(t)}(x^*) \;-\; \sum_{t=1}^{T} f^{(t)}(x^{(t)})$
the total reward of the best fixed strategy in hindsight ("I wish I had…"), minus the total reward of the online algorithm.
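A minimal sketch of measuring this quantity empirically, assuming a finite action set and a reward table (the names are hypothetical):

```python
import numpy as np

def regret(rewards, plays):
    """rewards: (T, |K|) array; rewards[t, a] = reward of action a in round t.
    plays: length-T sequence of the actions the algorithm actually played."""
    T = len(plays)
    total_online = sum(rewards[t, plays[t]] for t in range(T))
    best_fixed = rewards.sum(axis=0).max()  # best single action in hindsight
    return best_fixed - total_online
```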
Slide 12: No-regret algorithms
$T$-step regret $\approx \sqrt{T}$, so the average regret per step $\approx 1/\sqrt{T} \to 0$.
- Finite action space $K$: time and space per step $\approx |K|$.
- Convex $K \subseteq \mathbb{R}^d$ and concave $f^{(t)}$'s: time and space per step $\approx d$, via gradient-descent type algorithms.
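A minimal sketch of one such gradient-descent type algorithm: online projected gradient ascent for concave rewards, with the unit Euclidean ball as my (assumed) choice of feasible set $K$:

```python
import numpy as np

def online_gradient_ascent(grad_fns, d):
    """grad_fns[t](x): gradient of the round-t concave reward f^(t) at x.
    Plays x^(t), then steps along the revealed gradient and projects onto K."""
    x = np.zeros(d)
    plays = []
    for t, grad in enumerate(grad_fns):
        plays.append(x.copy())
        eta = 1.0 / np.sqrt(t + 1)      # step size ~ 1/sqrt(t)
        x = x + eta * grad(x)           # ascent step: rewards are maximized
        norm = np.linalg.norm(x)
        if norm > 1.0:                  # project back onto the unit ball
            x /= norm
    return plays
```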
Slide 13: Applications in other areas
- algorithms: approximation algorithms
- complexity: hardcore sets for derandomization
- optimization: LP duality
- biology: evolution
- game theory: minimax theorem
Slide 14: Zero-sum games
Minimax Theorem: $\max_A \min_B U(A,B) = \min_B \max_A U(A,B)$ over distributions $A$, $B$, with $U(A,B) = \mathbb{E}_{a \sim A,\, b \sim B}[U(a,b)]$.
How to find such $A$, $B$ efficiently? Have the two players run no-regret algorithms against each other on the one-shot game, getting plays $x^{(1)}, \ldots, x^{(T)}$ and $y^{(1)}, \ldots, y^{(T)}$, and average:
$A = \frac{1}{T} \sum_{t=1}^{T} x^{(t)}$, $B = \frac{1}{T} \sum_{t=1}^{T} y^{(t)}$.
Time and space per step $\approx$ #(actions) (huge?), and $T \approx 1/\varepsilon^2$ rounds suffice for an $\varepsilon$-approximate minimax pair.
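A minimal sketch of this self-play scheme, using multiplicative weights as the no-regret algorithm (a standard choice; the talk does not fix a particular one):

```python
import numpy as np

def minimax_via_no_regret(U, T=10000, eta=0.05):
    """U: (m, n) utility matrix of Player 1 (the maximizer).
    Both players run multiplicative weights against each other;
    the time-averaged plays form an approximate minimax pair (A, B)."""
    m, n = U.shape
    x = np.ones(m) / m          # Player 1's current mixed strategy
    y = np.ones(n) / n          # Player 2's current mixed strategy
    A, B = np.zeros(m), np.zeros(n)
    for _ in range(T):
        A += x
        B += y
        px = U @ y              # Player 1's expected payoff per action
        py = U.T @ x            # Player 2's expected payment per action
        x = x * np.exp(eta * px)    # Player 1 upweights high-payoff actions
        y = y * np.exp(-eta * py)   # Player 2 upweights low-payment actions
        x /= x.sum()
        y /= y.sum()
    return A / T, B / T
```

On the 2x2 matrix from Slide 4, both averages approach the uniform strategy (1/2, 1/2), the game's minimax solution.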
Slide 15: Influence maximization games
Slide 16: Opinion formation in social networks
A population of $n$ individuals, each with some internal opinion from $[-1, 1]$.
Each tries to express an opinion close to both her neighbors' expressed opinions and her own internal one.
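One standard way to make "close to neighbors' opinions and her internal one" concrete is repeated local averaging (my concretization, in the spirit of Friedkin-Johnsen dynamics; the slide only states the goal informally):

```python
import numpy as np

def expressed_opinions(adj, internal, iters=100):
    """adj: (n, n) symmetric 0/1 adjacency matrix; internal: opinions in [-1, 1].
    Each individual repeatedly moves to the average of her internal opinion
    and her neighbors' expressed opinions, which minimizes the sum of
    squared gaps to both."""
    z = internal.astype(float).copy()
    deg = adj.sum(axis=1)
    for _ in range(iters):
        z = (internal + adj @ z) / (1.0 + deg)
    return z
```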
Slide 17: Opinion formation in social networks
A zero-sum game between two players (parties):
- goal: one player tries to make the $n$ shades of grey (the expressed opinions) darker, the other lighter
- actions: each player controls the internal opinions of $k$ individuals
Find a minimax strategy?
Slide 18: Opinion formation in social networks
Same game as on the previous slide. Solution: a no-regret algorithm for online combinatorial optimization, namely follow the perturbed leader (sketched below).
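A minimal sketch of follow the perturbed leader over size-$k$ subsets, the combinatorial action set here (the per-individual reward vectors are a hypothetical stand-in for the game's actual payoffs):

```python
import numpy as np

def follow_the_perturbed_leader(reward_vectors, n, k, scale):
    """reward_vectors[t][i]: round-t reward for controlling individual i.
    Each round, play the best size-k subset for the perturbed cumulative
    rewards; the top-k selection is the combinatorial best-response oracle."""
    rng = np.random.default_rng(0)
    cumulative = np.zeros(n)
    plays = []
    for r in reward_vectors:
        perturbed = cumulative + rng.exponential(scale, size=n)
        plays.append(np.argsort(perturbed)[-k:])   # top-k individuals
        cumulative += r
    return plays
```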
Slide 19: Markov games
Slide 20: Games with states
States: e.g., board configurations. A policy maps states to actions (possibly randomized).
The minimax theorem still holds, with $A$, $B$ now ranging over policies:
$\max_A \min_B U(A,B) = \min_B \max_A U(A,B)$
Slide 21: Games with states
How to find such $A$, $B$ efficiently? Treating each policy as a single action is hopeless:
$\#(\text{policies}) \approx \#(\text{actions})^{\#(\text{states})}$, which is huge.
Slide 22: Games with states
Solution: a no-regret algorithm for the two-player Markov decision process.
Time and space $\approx$ poly(#(states), #(actions)): still huge for many games.
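For intuition about the poly(#(states), #(actions)) scale, here is a sketch of Shapley's classical value iteration for discounted zero-sum Markov games, solving one small matrix game per state per iteration (a textbook alternative shown for concreteness, not the talk's no-regret algorithm):

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(U):
    """Value of the zero-sum matrix game U (Player 1 maximizes), via an LP:
    maximize v subject to x^T U >= v in every column, x a distribution."""
    m, n = U.shape
    c = np.zeros(m + 1); c[0] = -1.0                 # variables (v, x); minimize -v
    A_ub = np.hstack([np.ones((n, 1)), -U.T])        # v - (x^T U)_b <= 0 for all b
    A_eq = np.hstack([[[0.0]], np.ones((1, m))])     # sum(x) = 1
    bounds = [(None, None)] + [(0, None)] * m
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n),
                  A_eq=A_eq, b_eq=[1.0], bounds=bounds)
    return res.x[0]

def shapley_value_iteration(U, P, gamma=0.9, iters=200):
    """U[s]: (m, n) stage payoffs in state s; P[s][a][b]: distribution over
    next states. Iterate V(s) <- value of the game U[s] + gamma * E[V(next)]."""
    V = np.zeros(len(U))
    for _ in range(iters):
        V = np.array([matrix_game_value(U[s] + gamma * P[s] @ V)
                      for s in range(len(U))])
    return V
```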
Slide 23: Outline
- What machine learning can do for game theory
- What game theory can do for machine learning
Slide 24: Algorithms vs. adversaries
Slide 25: No-regret algorithms
There exists an algorithm $A$ such that for every sequence of loss functions $c = (c^{(1)}, \ldots, c^{(T)})$: $\text{Regret}(A, c) \le O(\sqrt{T})$.
Equivalently, $\min_A \max_c \text{Regret}(A, c) \le O(\sqrt{T})$, and this is tight: one can find adversarial $c$ forcing regret $\ge \Omega(\sqrt{T})$.
Can we get smaller regret, say $\log T$, against a benign class of $c$?
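A quick empirical illustration of the $\Omega(\sqrt{T})$ barrier: even i.i.d. random $\pm 1$ losses for two actions force regret on the order of $\sqrt{T}$. The learner below is standard multiplicative weights (my choice of no-regret algorithm for the demonstration):

```python
import numpy as np

rng = np.random.default_rng(0)
T, n = 10000, 2
losses = rng.choice([-1.0, 1.0], size=(T, n))   # adversary: random signs

eta = np.sqrt(np.log(n) / T)
w = np.ones(n)
alg_loss = 0.0
for t in range(T):
    p = w / w.sum()                 # current distribution over actions
    alg_loss += p @ losses[t]       # expected loss this round
    w *= np.exp(-eta * losses[t])   # downweight actions with high loss

best_fixed = losses.sum(axis=0).min()
print(alg_loss - best_fixed)        # typically on the order of sqrt(T) ~ 100
```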
Slide 26: More generally…
For any algorithm design problem and any cost measure (regret, time, space, …):
there exists an algorithm $A$ such that for every input $x$: $\text{cost}(A, x) \le \ldots$
That is, $\min_A \max_x \text{cost}(A, x) \le \ldots$, with matching lower bounds $\ge \ldots$ from adversarial inputs.
Slide 27: Generative adversarial networks
Slide 28: Learning generative models
[image: generated ("fake!") images]
Slide 29: Learning generative models
[image: more generated ("fake!") images]
Slide 30: Learning generative models
Training data: real face images.
Learn a generative model $G$: random seeds $\to$ (novel / fake) face images.
Slide 31: Learning generative models
Training data: real face images. Learn a generative model $G$: random seeds $\to$ (novel / fake) face images.
How to train a good $G$? It would help if we could evaluate how good or bad $G$ is…
Discriminator $D(x)$: $\approx 1$ if $x$ is fake, $\approx -1$ if $x$ is real. But how to get a good $D$?
Slide 32: Play the zero-sum game
$D$ tries to distinguish fake images from real ones by valuing them differently; $G$ tries to fool $D$:
$\min_G \max_D \; \mathbb{E}_z[D(G(z))] - \mathbb{E}_{x \sim \text{real}}[D(x)]$
Learning the generative model $G$ thus reduces to finding a minimax solution to this game.
Still not an easy task! $G$ and $D$ are deep neural nets, so the action sets are huge.
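A minimal sketch of the usual approach, alternating gradient steps on the slide's objective, with tiny PyTorch MLPs on 1-D toy data standing in for deep nets on images (real GAN training needs many stabilizing tricks that are omitted here):

```python
import torch
import torch.nn as nn

# Toy generator and discriminator: tiny MLPs instead of deep conv nets.
G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))
opt_G = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_D = torch.optim.Adam(D.parameters(), lr=1e-3)

def real_batch(n=64):
    return torch.randn(n, 1) * 0.5 + 2.0   # "real" data: a shifted Gaussian

for step in range(2000):
    # Inner max over D: ascend E_z[D(G(z))] - E_real[D(x)].
    fake = G(torch.randn(64, 8)).detach()
    loss_D = -(D(fake).mean() - D(real_batch()).mean())
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # Outer min over G: descend E_z[D(G(z))], i.e., fool the current D.
    loss_G = D(G(torch.randn(64, 8))).mean()
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
```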