1 Learning in Games
Chi-Jen Lu, Academia Sinica

2 Outline
What machine learning can do for game theory
What game theory can do for machine learning

3 Two-player zero-sum games

4 Zero-sum games
A 2x2 example. Rows: actions of player 1; columns: actions of player 2.

  Utility (reward) of player 1:    -1   1
                                    1  -1

  Utility (reward) of player 2 is the negation: the two payoffs always sum to zero.

5 Zero-sum games
$K_i$: action set of player $i$.
$U$: utility matrix of player 1, with entries $U(a, b)$.
Player 1 maximizes $U$; player 2 minimizes it.
Neither player wants to play first:
$\max_{a \in K_1} \min_{b \in K_2} U(a,b) \;\le\; \min_{b \in K_2} \max_{a \in K_1} U(a,b)$,
with strict inequality $<$ in many games.

6 Zero-sum games
Minimax Theorem: over mixed strategies (distributions) $A$ and $B$, with $U(A,B) = \mathbb{E}_{a \sim A,\, b \sim B}[U(a,b)]$,
$\max_A \min_B U(A,B) = \min_B \max_A U(A,B)$.
How to find such $A, B$ efficiently?
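The slide leaves the question open; for small games one classical baseline (my addition, not the slide's) is to solve the row player's linear program directly. A minimal sketch with scipy, on a 2x2 matrix of my own choosing:

```python
# A minimal sketch: computing a minimax (optimal mixed) strategy for the
# row player of a small zero-sum game by linear programming.
# The 2x2 "matching pennies" matrix below is just an illustrative choice.
import numpy as np
from scipy.optimize import linprog

U = np.array([[-1.0, 1.0],
              [1.0, -1.0]])   # U[a, b]: payoff to player 1 (the maximizer)
n, m = U.shape

# Variables: (x_1, ..., x_n, v). Maximize v subject to
#   sum_a x_a * U[a, b] >= v   for every column b,
#   sum_a x_a = 1,  x_a >= 0.
c = np.zeros(n + 1); c[-1] = -1.0              # linprog minimizes, so minimize -v
A_ub = np.hstack([-U.T, np.ones((m, 1))])      # v - x^T U[:, b] <= 0 for each b
b_ub = np.zeros(m)
A_eq = np.array([[1.0] * n + [0.0]])           # probabilities sum to 1
b_eq = np.array([1.0])
bounds = [(0, None)] * n + [(None, None)]      # v is free

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
x, v = res.x[:n], res.x[-1]
print("optimal mixed strategy:", x, "game value:", v)   # (0.5, 0.5), value 0
```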

7 Online Learning

8 Online learning / decision making
Making decisions/predictions and then paying the price, repeatedly. ("I wish I had…")

9 Many examples
Predicting weather, trading stocks, commuting to work, …
Network routing, scheduling, resource allocation, online advertising, …

10 Problem formulation
Play for T rounds, choosing from an action set K. In round t:
play an action $x^{(t)} \in K$, or a distribution $x^{(t)} \in \Delta(K)$ over K;
receive the reward $r^{(t)}(x^{(t)})$, which for a distribution means the expected reward $\mathbb{E}_{a \sim x^{(t)}}[r^{(t)}(a)]$.
How to choose $x^{(t)}$? And what should the goal be?

11 Goal: minimize regret
Regret: the total reward of the best fixed strategy minus the total reward of the online algorithm ("I wish I had…"):
$\text{Regret} \;=\; \max_{x^*} \sum_{t=1}^{T} r^{(t)}(x^*) \;-\; \sum_{t=1}^{T} r^{(t)}(x^{(t)})$

12 No-regret algorithms
T-step regret $\approx \sqrt{T}$, so the average regret per step $\approx 1/\sqrt{T} \to 0$.
Finite action set K: time and space per step $\approx |K|$.
Convex $K \subseteq \mathbb{R}^d$ and concave $r^{(t)}$'s: time and space per step $\approx d$, via gradient-descent-type algorithms.
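For the finite-action case, here is a minimal sketch of one classical no-regret algorithm, Hedge (multiplicative weights); the uniform random rewards are just to exercise the code:

```python
# A minimal sketch of Hedge / multiplicative weights for a finite action set.
# Any reward sequence bounded in [0, 1] works; the random one is illustrative.
import numpy as np

def hedge(rewards, eta):
    """rewards: T x K array, rewards[t, a] = r^(t)(a). Returns the play distributions."""
    T, K = rewards.shape
    log_w = np.zeros(K)                  # log-weights, for numerical stability
    plays = np.zeros((T, K))
    for t in range(T):
        x = np.exp(log_w - log_w.max())
        x /= x.sum()                     # x^(t): distribution over the K actions
        plays[t] = x
        log_w += eta * rewards[t]        # multiplicative update on each action
    return plays

rng = np.random.default_rng(0)
T, K = 10_000, 5
rewards = rng.uniform(0, 1, size=(T, K))
plays = hedge(rewards, eta=np.sqrt(np.log(K) / T))  # tuning giving O(sqrt(T log K)) regret

alg = (plays * rewards).sum()            # expected total reward of the algorithm
best = rewards.sum(axis=0).max()         # total reward of the best fixed action
print("regret:", best - alg, " sqrt(T log K):", np.sqrt(T * np.log(K)))
```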

13 Applications in other areas
algorithms: approximation algorithms
complexity: hardcore sets / derandomization
optimization: LP duality
biology: evolution
game theory: the minimax theorem

14 Zero-sum games
Minimax Theorem: $\max_A \min_B U(A,B) = \min_B \max_A U(A,B)$ over distributions $A, B$, with $U(A,B) = \mathbb{E}_{a \sim A,\, b \sim B}[U(a,b)]$.
How to find such $A, B$ efficiently? Have the two players run no-regret algorithms against each other:
Get the plays $x^{(1)}, \ldots, x^{(T)}$ and $y^{(1)}, \ldots, y^{(T)}$, and output the averages $A = \frac{1}{T} \sum_{t=1}^{T} x^{(t)}$ and $B = \frac{1}{T} \sum_{t=1}^{T} y^{(t)}$.
With $T \approx 1/\varepsilon^2$ rounds this yields an $\varepsilon$-approximate minimax pair. Time and space per round $\approx$ #(actions), which may already be huge for a one-shot game.
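A minimal sketch of this self-play recipe, instantiated with Hedge on the 2x2 game from above (the learning-rate tuning is the usual sqrt(log K / T) choice):

```python
# Approximating a minimax solution of a zero-sum game by having both players
# run Hedge against each other and averaging their plays over T rounds.
import numpy as np

U = np.array([[-1.0, 1.0],
              [1.0, -1.0]])          # payoff to player 1 (the maximizer)
T = 20_000
n, m = U.shape
eta = np.sqrt(np.log(max(n, m)) / T)

log_wx, log_wy = np.zeros(n), np.zeros(m)
A, B = np.zeros(n), np.zeros(m)      # running sums of the plays
for t in range(T):
    x = np.exp(log_wx - log_wx.max()); x /= x.sum()
    y = np.exp(log_wy - log_wy.max()); y /= y.sum()
    A += x; B += y
    log_wx += eta * (U @ y)          # player 1's reward for action a is U(a, y)
    log_wy -= eta * (U.T @ x)        # player 2 *loses* U(x, b), so flip the sign
A /= T; B /= T                        # averaged strategies approximate minimax play
print("A:", A, "B:", B, "value:", A @ U @ B)   # both near (0.5, 0.5), value near 0
```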

15 Influence maximization games

16 Opinion formation in social networks
A population of n individuals, each with some internal opinion from [-1, 1]. Each tries to express an opinion close to both her neighbors' expressed opinions and her own internal one.
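The slides don't pin down the dynamics; one standard quadratic-cost model consistent with this description (Friedkin-Johnsen style, my assumption) has each individual i express the z_i minimizing (z_i - s_i)^2 + sum_j w_ij (z_i - z_j)^2, whose unique equilibrium solves a linear system. A minimal sketch with an arbitrary example graph:

```python
# Equilibrium expressed opinions in the quadratic-cost model (an assumption,
# not spelled out on the slide): setting each best-response derivative to zero
# gives (I + L) z = s, where L is the weighted graph Laplacian and s the
# internal opinions.
import numpy as np

s = np.array([-1.0, -0.5, 0.2, 1.0])           # internal opinions in [-1, 1]
W = np.array([[0, 1, 0, 0],                    # symmetric adjacency weights (a path)
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(W.sum(axis=1)) - W                 # graph Laplacian
z = np.linalg.solve(np.eye(len(s)) + L, s)     # equilibrium expressed opinions
print(z)                                       # each z_i pulled toward its neighbors
```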

17 Opinion formation in social networks
A zero-sum game between two players/parties: one's goal is to make the n opinions (shades of grey) darker, the other's to make them lighter. Each player's action: control the opinions of k individuals, so each player has $\binom{n}{k}$ actions. How to find a minimax strategy?

18 Opinion formation in social networks
The same zero-sum game, with $\binom{n}{k}$ actions per player: far too many to run a generic no-regret algorithm over. Solution: a no-regret algorithm for online combinatorial optimization, such as Follow the Perturbed Leader (see the sketch below).
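The slides name Follow the Perturbed Leader; here is a minimal sketch of FPL (Kalai-Vempala style) for the choose-k-of-n structure. The random per-round rewards stand in for the game's actual payoffs, and the noise scale is my guess at a reasonable tuning:

```python
# Follow the Perturbed Leader for a combinatorial action set: choose k of n
# individuals each round, with reward linear in the chosen set. FPL adds
# random noise to cumulative rewards and plays the best perturbed set, which
# here reduces to a top-k selection: O(n log n) per step instead of
# enumerating all C(n, k) subsets.
import numpy as np

rng = np.random.default_rng(0)
n, k, T = 50, 5, 5_000
scale = np.sqrt(T)                     # noise scale; sqrt(T) up to problem constants

cum = np.zeros(n)                      # cumulative per-individual reward so far
total_alg = 0.0
for t in range(T):
    noise = rng.exponential(scale, size=n)
    chosen = np.argsort(cum + noise)[-k:]      # best perturbed set = top-k coordinates
    r = rng.uniform(0, 1, size=n)              # per-individual rewards (random stand-in)
    total_alg += r[chosen].sum()
    cum += r

best_fixed = np.sort(cum)[-k:].sum()           # best fixed k-subset in hindsight
print("avg regret per step:", (best_fixed - total_alg) / T)
```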

19 Markov games

20 Games with states
States: e.g., board configurations. A policy maps states to (randomized) actions.
Minimax theorem, now over policies $A, B$: $\max_A \min_B U(A,B) = \min_B \max_A U(A,B)$.

21 Games with states
Minimax theorem over policies: $\max_A \min_B U(A,B) = \min_B \max_A U(A,B)$.
How to find such policies $A, B$ efficiently? The policy space is enormous: #(policies) $\approx$ #(actions)^{#(states)}, huge even for modest games.

22 Games with states
Solution: a no-regret algorithm for the two-player Markov decision process, with time and space $\approx$ poly(#(states), #(actions)). Still huge for many games.
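The slides don't spell out the two-player MDP algorithm; as a classical baseline (a swapped-in technique, not the slides' method), here is a hedged sketch of Shapley's minimax value iteration for a discounted zero-sum Markov game, with an LP sub-solver like the one sketched earlier. The tiny random game is arbitrary:

```python
# Shapley's value iteration: V(s) gets the value of the one-shot matrix game
# whose entries are immediate reward plus discounted expected continuation
# value; the update is a gamma-contraction, so V converges geometrically.
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(M):
    """Value of the zero-sum matrix game M (row player maximizes), via LP."""
    n, m = M.shape
    c = np.zeros(n + 1); c[-1] = -1.0
    res = linprog(c, A_ub=np.hstack([-M.T, np.ones((m, 1))]), b_ub=np.zeros(m),
                  A_eq=[[1.0] * n + [0.0]], b_eq=[1.0],
                  bounds=[(0, None)] * n + [(None, None)])
    return res.x[-1]

rng = np.random.default_rng(0)
S, A, B, gamma = 4, 3, 3, 0.9
R = rng.uniform(-1, 1, size=(S, A, B))          # R[s, a, b]: reward to player 1
P = rng.dirichlet(np.ones(S), size=(S, A, B))   # P[s, a, b, s']: transition kernel

V = np.zeros(S)
for _ in range(200):
    Q = R + gamma * P @ V                        # Q[s]: stage game at state s
    V = np.array([matrix_game_value(Q[s]) for s in range(S)])
print("state values:", V)
```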

23 Outline
What machine learning can do for game theory
What game theory can do for machine learning

24 Algorithms vs. adversaries

25 No-regret algorithm
$\exists$ algorithm $A$ s.t. $\forall$ sequence of loss functions $c = (c^{(1)}, \ldots, c^{(T)})$: $\text{Regret}(A, c) \le O(\sqrt{T})$; equivalently, $\min_A \max_c \text{Regret}(A, c) \le O(\sqrt{T})$.
Matching lower bound: find an adversarial $c$ forcing regret $\ge \Omega(\sqrt{T})$.
Smaller regret, e.g. $O(\log T)$? Possible against a benign class of $c$.
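A standard sketch of why $\Omega(\sqrt{T})$ is unavoidable, under the assumption of two actions and i.i.d. uniform $\{0,1\}$ losses (the slide states only the bound):

```latex
% Two actions; the adversary draws each loss c^{(t)}(1), c^{(t)}(2) in {0,1}
% independently and uniformly. Any algorithm then has expected total loss T/2,
% while the best fixed action in hindsight profits from random fluctuations:
\[
\mathbb{E}\Big[\min_{x^* \in \{1,2\}} \sum_{t=1}^{T} c^{(t)}(x^*)\Big]
  \;=\; \frac{T}{2} \;-\; \frac{1}{2}\,
        \mathbb{E}\Big[\Big|\sum_{t=1}^{T} \big(c^{(t)}(1) - c^{(t)}(2)\big)\Big|\Big]
  \;=\; \frac{T}{2} \;-\; \Theta(\sqrt{T}),
\]
% using min(X, Y) = (X + Y)/2 - |X - Y|/2 and the fact that the absolute sum of
% T i.i.d. centered terms has expectation Theta(sqrt(T)). Hence every
% algorithm's expected regret is at least Theta(sqrt(T)).
```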

26 More generally…
For any algorithm design problem and any cost measure (regret, time, space, …): "$\exists$ algorithm $A$ s.t. $\forall$ input $x$: $\text{cost}(A, x) \le \ldots$" is the statement $\min_A \max_x \text{cost}(A, x) \le \ldots$, and an adversarial lower bound is $\min_A \max_x \text{cost}(A, x) \ge \ldots$

27 Generative adversarial networks

28 Learning generative models
[Figure: sample (face) images, annotated "fake images!"]

30 Learning generative models
Training data: real face images. Learn a generative model G: random seeds → novel (fake) face images.

31 Learning generative models
Training data: real face images. Learn a generative model G: random seeds → novel (fake) face images.
How to train a good G? If we could evaluate how good/bad G is… Discriminator: $D(x) \approx +1$ if $x$ is fake, $-1$ if $x$ is real. But then, how to get a good D?

32 Play the zero-sum game
D tries to distinguish fake images from real ones by behaving differently on them; G tries to fool D:
$\min_G \max_D \; \mathbb{E}_z[D(G(z))] - \mathbb{E}_{x \sim \text{real}}[D(x)]$
Learning the generative model G $\Rightarrow$ finding a minimax solution to the game. Still not an easy task: G and D are deep neural nets, i.e., huge action sets.
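The slides stop at the minimax formulation; as an illustrative heuristic (not a method the slides prescribe), here is a minimal sketch of alternating gradient steps on exactly this objective, with toy data and tiny PyTorch networks of my own choosing:

```python
# Alternating gradient steps on  min_G max_D  E_z[D(G(z))] - E_{x~real}[D(x)].
# Toy 8-dimensional "images" stand in for real faces.
import torch
import torch.nn as nn

dim, batch = 8, 64
G = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, dim))
D = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, 1))
opt_G = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_D = torch.optim.Adam(D.parameters(), lr=1e-3)

def real_batch():                      # stand-in for a batch of real face images
    return torch.randn(batch, dim) + 3.0

for step in range(2000):
    # Discriminator ascends the objective: make D large on fakes, small on reals.
    z = torch.randn(batch, 4)
    loss_D = -(D(G(z).detach()).mean() - D(real_batch()).mean())
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # Generator descends it: make its fakes score low under D (i.e., fool D).
    z = torch.randn(batch, 4)
    loss_G = D(G(z)).mean()
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
```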

