Slide 1: Learning in Games
Chi-Jen Lu, Academia Sinica
Slide 2: Outline
- What machine learning can do for game theory
- What game theory can do for machine learning
Slide 3: Two-player zero-sum games
Slide 4: Zero-sum games
Example payoff matrix (entries are the utility / reward of Player 1; Player 2's utility is the negation):

                      Player 2
                    b1      b2
  Player 1   a1     -1       1
             a2      1      -1
Slide 5: Zero-sum games
$K_i$: action set of player $i$.
$U$: utility matrix of Player 1, with entries $U(a, b)$.
Player 1 maximizes; Player 2 minimizes. Neither wants to play first:
$\max_{a \in K_1} \min_{b \in K_2} U(a,b) \le \min_{b \in K_2} \max_{a \in K_1} U(a,b)$,
with strict inequality ($<$) in many games.
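As a quick check of this inequality, here is a minimal sketch (assuming NumPy) that computes both sides for the 2x2 matrix from the previous slide:

```python
import numpy as np

# Utility matrix of Player 1 from the 2x2 example above.
U = np.array([[-1, 1],
              [ 1, -1]])

# Player 1 moves first: pick the row whose worst case is best.
maximin = U.min(axis=1).max()   # max_a min_b U(a, b)

# Player 2 moves first: pick the column whose worst case is best for it.
minimax = U.max(axis=0).min()   # min_b max_a U(a, b)

print(maximin, minimax)  # -1 1: a strict gap, so no pure-strategy equilibrium
```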
Slide 6: Zero-sum games
Minimax Theorem: over distributions (mixed strategies) $A$, $B$,
$\max_A \min_B U(A,B) = \min_B \max_A U(A,B)$,
where $U(A,B) = \mathbb{E}_{a \sim A,\, b \sim B}[U(a,b)]$.
How to find such $A$, $B$ efficiently?
Slide 7: Online Learning
Slide 8: Online learning / decision making
Making decisions or predictions, and then paying the prices, repeatedly ("I wish I had…").
Slide 9: Many examples
Predicting weather, trading stocks, commuting to work, …
Network routing, scheduling, resource allocation, online advertising, …
Slide 10: Problem formulation
Play for $T$ rounds, with action set $K$. In round $t$:
- play an action $x^{(t)} \in K$, or a distribution $x^{(t)} \in \Delta(K)$ over $K$
- receive a reward $f^{(t)}(x^{(t)})$ (for a distribution, the expected reward $\mathbb{E}_{a \sim x^{(t)}}[f^{(t)}(a)]$)
How to choose $x^{(t)}$? What is the goal?
Slide 11: Goal: minimize regret
$\text{Regret} = \max_{x^*} \sum_{t=1}^{T} f^{(t)}(x^*) \;-\; \sum_{t=1}^{T} f^{(t)}(x^{(t)})$
the total reward of the best fixed strategy in hindsight ("I wish I had…"), minus the total reward of the online algorithm.
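A minimal sketch of measuring this quantity empirically, assuming a finite action set and a reward table (the names are hypothetical):

```python
import numpy as np

def regret(rewards, plays):
    """rewards: (T, |K|) array; rewards[t, a] = reward of action a in round t.
    plays: length-T sequence of the actions the algorithm actually played."""
    T = len(plays)
    total_online = sum(rewards[t, plays[t]] for t in range(T))
    best_fixed = rewards.sum(axis=0).max()  # best single action in hindsight
    return best_fixed - total_online
```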
Slide 12: No-regret algorithms
$T$-step regret $\approx \sqrt{T}$, so the average regret per step $\approx 1/\sqrt{T} \to 0$.
- Finite action space $K$: time and space per step $\approx |K|$.
- Convex $K \subseteq \mathbb{R}^d$ and concave $f^{(t)}$'s: time and space per step $\approx d$, via gradient-descent type algorithms.
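A minimal sketch of one such gradient-descent type algorithm: online projected gradient ascent for concave rewards, with the unit Euclidean ball as my (assumed) choice of feasible set $K$:

```python
import numpy as np

def online_gradient_ascent(grad_fns, d):
    """grad_fns[t](x): gradient of the round-t concave reward f^(t) at x.
    Plays x^(t), then steps along the revealed gradient and projects onto K."""
    x = np.zeros(d)
    plays = []
    for t, grad in enumerate(grad_fns):
        plays.append(x.copy())
        eta = 1.0 / np.sqrt(t + 1)      # step size ~ 1/sqrt(t)
        x = x + eta * grad(x)           # ascent step: rewards are maximized
        norm = np.linalg.norm(x)
        if norm > 1.0:                  # project back onto the unit ball
            x /= norm
    return plays
```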
Slide 13: Applications in other areas
- algorithms: approximation algorithms
- complexity: hardcore sets for derandomization
- optimization: LP duality
- biology: evolution
- game theory: minimax theorem
Slide 14: Zero-sum games
Minimax Theorem: $\max_A \min_B U(A,B) = \min_B \max_A U(A,B)$ over distributions $A$, $B$, with $U(A,B) = \mathbb{E}_{a \sim A,\, b \sim B}[U(a,b)]$.
How to find such $A$, $B$ efficiently? Have the two players run no-regret algorithms against each other on the one-shot game, getting plays $x^{(1)}, \ldots, x^{(T)}$ and $y^{(1)}, \ldots, y^{(T)}$, and average:
$A = \frac{1}{T} \sum_{t=1}^{T} x^{(t)}$, $B = \frac{1}{T} \sum_{t=1}^{T} y^{(t)}$.
Time and space per step $\approx$ #(actions) (huge?), and $T \approx 1/\varepsilon^2$ rounds suffice for an $\varepsilon$-approximate minimax pair.
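A minimal sketch of this self-play scheme, using multiplicative weights as the no-regret algorithm (a standard choice; the talk does not fix a particular one):

```python
import numpy as np

def minimax_via_no_regret(U, T=10000, eta=0.05):
    """U: (m, n) utility matrix of Player 1 (the maximizer).
    Both players run multiplicative weights against each other;
    the time-averaged plays form an approximate minimax pair (A, B)."""
    m, n = U.shape
    x = np.ones(m) / m          # Player 1's current mixed strategy
    y = np.ones(n) / n          # Player 2's current mixed strategy
    A, B = np.zeros(m), np.zeros(n)
    for _ in range(T):
        A += x
        B += y
        px = U @ y              # Player 1's expected payoff per action
        py = U.T @ x            # Player 2's expected payment per action
        x = x * np.exp(eta * px)    # Player 1 upweights high-payoff actions
        y = y * np.exp(-eta * py)   # Player 2 upweights low-payment actions
        x /= x.sum()
        y /= y.sum()
    return A / T, B / T
```

On the 2x2 matrix from Slide 4, both averages approach the uniform strategy (1/2, 1/2), the game's minimax solution.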
Slide 15: Influence maximization games
Slide 16: Opinion formation in social networks
A population of $n$ individuals, each with some internal opinion from $[-1, 1]$.
Each tries to express an opinion close to both her neighbors' expressed opinions and her own internal one.
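One standard way to make "close to neighbors' opinions and her internal one" concrete is repeated local averaging (my concretization, in the spirit of Friedkin-Johnsen dynamics; the slide only states the goal informally):

```python
import numpy as np

def expressed_opinions(adj, internal, iters=100):
    """adj: (n, n) symmetric 0/1 adjacency matrix; internal: opinions in [-1, 1].
    Each individual repeatedly moves to the average of her internal opinion
    and her neighbors' expressed opinions, which minimizes the sum of
    squared gaps to both."""
    z = internal.astype(float).copy()
    deg = adj.sum(axis=1)
    for _ in range(iters):
        z = (internal + adj @ z) / (1.0 + deg)
    return z
```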
Slide 17: Opinion formation in social networks
A zero-sum game between two players (parties):
- goal: one player tries to make the $n$ shades of grey (the expressed opinions) darker, the other lighter
- actions: each player controls the internal opinions of $k$ individuals
Find a minimax strategy?
Slide 18: Opinion formation in social networks
Same game as on the previous slide. Solution: a no-regret algorithm for online combinatorial optimization, namely follow the perturbed leader (sketched below).
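A minimal sketch of follow the perturbed leader over size-$k$ subsets, the combinatorial action set here (the per-individual reward vectors are a hypothetical stand-in for the game's actual payoffs):

```python
import numpy as np

def follow_the_perturbed_leader(reward_vectors, n, k, scale):
    """reward_vectors[t][i]: round-t reward for controlling individual i.
    Each round, play the best size-k subset for the perturbed cumulative
    rewards; the top-k selection is the combinatorial best-response oracle."""
    rng = np.random.default_rng(0)
    cumulative = np.zeros(n)
    plays = []
    for r in reward_vectors:
        perturbed = cumulative + rng.exponential(scale, size=n)
        plays.append(np.argsort(perturbed)[-k:])   # top-k individuals
        cumulative += r
    return plays
```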
Slide 19: Markov games
Slide 20: Games with states
States: e.g., board configurations. A policy maps states to actions (possibly randomized).
The minimax theorem still holds, with $A$, $B$ now ranging over policies:
$\max_A \min_B U(A,B) = \min_B \max_A U(A,B)$
Slide 21: Games with states
How to find such $A$, $B$ efficiently? Treating each policy as a single action is hopeless:
$\#(\text{policies}) \approx \#(\text{actions})^{\#(\text{states})}$, which is huge.
Slide 22: Games with states
Solution: a no-regret algorithm for the two-player Markov decision process.
Time and space $\approx$ poly(#(states), #(actions)): still huge for many games.
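For intuition about the poly(#(states), #(actions)) scale, here is a sketch of Shapley's classical value iteration for discounted zero-sum Markov games, solving one small matrix game per state per iteration (a textbook alternative shown for concreteness, not the talk's no-regret algorithm):

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(U):
    """Value of the zero-sum matrix game U (Player 1 maximizes), via an LP:
    maximize v subject to x^T U >= v in every column, x a distribution."""
    m, n = U.shape
    c = np.zeros(m + 1); c[0] = -1.0                 # variables (v, x); minimize -v
    A_ub = np.hstack([np.ones((n, 1)), -U.T])        # v - (x^T U)_b <= 0 for all b
    A_eq = np.hstack([[[0.0]], np.ones((1, m))])     # sum(x) = 1
    bounds = [(None, None)] + [(0, None)] * m
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n),
                  A_eq=A_eq, b_eq=[1.0], bounds=bounds)
    return res.x[0]

def shapley_value_iteration(U, P, gamma=0.9, iters=200):
    """U[s]: (m, n) stage payoffs in state s; P[s][a][b]: distribution over
    next states. Iterate V(s) <- value of the game U[s] + gamma * E[V(next)]."""
    V = np.zeros(len(U))
    for _ in range(iters):
        V = np.array([matrix_game_value(U[s] + gamma * P[s] @ V)
                      for s in range(len(U))])
    return V
```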
Slide 23: Outline
- What machine learning can do for game theory
- What game theory can do for machine learning
Slide 24: Algorithms vs. adversaries
Slide 25: No-regret algorithms
There exists an algorithm $A$ such that for every sequence of loss functions $c = (c^{(1)}, \ldots, c^{(T)})$: $\text{Regret}(A, c) \le O(\sqrt{T})$.
Equivalently, $\min_A \max_c \text{Regret}(A, c) \le O(\sqrt{T})$, and this is tight: one can find adversarial $c$ forcing regret $\ge \Omega(\sqrt{T})$.
Can we get smaller regret, say $\log T$, against a benign class of $c$?
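A quick empirical illustration of the $\Omega(\sqrt{T})$ barrier: even i.i.d. random $\pm 1$ losses for two actions force regret on the order of $\sqrt{T}$. The learner below is standard multiplicative weights (my choice of no-regret algorithm for the demonstration):

```python
import numpy as np

rng = np.random.default_rng(0)
T, n = 10000, 2
losses = rng.choice([-1.0, 1.0], size=(T, n))   # adversary: random signs

eta = np.sqrt(np.log(n) / T)
w = np.ones(n)
alg_loss = 0.0
for t in range(T):
    p = w / w.sum()                 # current distribution over actions
    alg_loss += p @ losses[t]       # expected loss this round
    w *= np.exp(-eta * losses[t])   # downweight actions with high loss

best_fixed = losses.sum(axis=0).min()
print(alg_loss - best_fixed)        # typically on the order of sqrt(T) ~ 100
```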
Slide 26: More generally…
For any algorithm design problem and any cost measure (regret, time, space, …):
there exists an algorithm $A$ such that for every input $x$: $\text{cost}(A, x) \le \ldots$
That is, $\min_A \max_x \text{cost}(A, x) \le \ldots$, with matching lower bounds $\ge \ldots$ from adversarial inputs.
Slide 27: Generative adversarial networks
Slide 28: Learning generative models
[image: generated ("fake!") images]
Slide 29: Learning generative models
[image: more generated ("fake!") images]
Slide 30: Learning generative models
Training data: real face images.
Learn a generative model $G$: random seeds $\to$ (novel / fake) face images.
Slide 31: Learning generative models
Training data: real face images. Learn a generative model $G$: random seeds $\to$ (novel / fake) face images.
How to train a good $G$? It would help if we could evaluate how good or bad $G$ is…
Discriminator $D(x)$: $\approx 1$ if $x$ is fake, $\approx -1$ if $x$ is real. But how to get a good $D$?
Slide 32: Play the zero-sum game
$D$ tries to distinguish fake images from real ones by valuing them differently; $G$ tries to fool $D$:
$\min_G \max_D \; \mathbb{E}_z[D(G(z))] - \mathbb{E}_{x \sim \text{real}}[D(x)]$
Learning the generative model $G$ thus reduces to finding a minimax solution to this game.
Still not an easy task! $G$ and $D$ are deep neural nets, so the action sets are huge.
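A minimal sketch of the usual approach, alternating gradient steps on the slide's objective, with tiny PyTorch MLPs on 1-D toy data standing in for deep nets on images (real GAN training needs many stabilizing tricks that are omitted here):

```python
import torch
import torch.nn as nn

# Toy generator and discriminator: tiny MLPs instead of deep conv nets.
G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))
opt_G = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_D = torch.optim.Adam(D.parameters(), lr=1e-3)

def real_batch(n=64):
    return torch.randn(n, 1) * 0.5 + 2.0   # "real" data: a shifted Gaussian

for step in range(2000):
    # Inner max over D: ascend E_z[D(G(z))] - E_real[D(x)].
    fake = G(torch.randn(64, 8)).detach()
    loss_D = -(D(fake).mean() - D(real_batch()).mean())
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # Outer min over G: descend E_z[D(G(z))], i.e., fool the current D.
    loss_G = D(G(torch.randn(64, 8))).mean()
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
```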