Published by Stacey Ludwick. Modified over 2 years ago.

2
Online learning and game theory
Adam Kalai (joint with Sham Kakade)

4
How do we learn?
Goal: learn a function $f: X \to Y$.
Batch (offline) model:
  Get training data $(x_1,y_1),\ldots,(x_n,y_n)$ drawn independently from some distribution $\mathcal{D}$ over $X \times Y$.
  We output $f: X \to Y$ with low error $P_{\mathcal{D}}[f(x) \neq y]$.
Online (repeated game) model, for $i = 1,2,\ldots,n$:
  Observe the $i$th example $x_i \in X$.
  We predict its label.
  Observe the true label $y_i \in \{-,+\}$.
  Goal: make as few mistakes as possible.
Distribution-free learning, $Y = \{-,+\}$.
Batch =? Online

5
Outline
1. Online/batch learnability of $\mathcal{F}$
   Online learnability $\Rightarrow$ batch learnability
   Finite learning: batch and online (via weighted majority)
   Batch learnability $\Leftrightarrow$ online learnability?
2. Online learning in repeated games
   Zero-sum: weighted majority
   General-sum: no "internal regret" $\Rightarrow$ correlated equilibrium

6
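Online learning
$X = \mathbb{R}^2$, $Y = \{-,+\}$
Adversary picks $(x_1,y_1) \in X \times Y$. We see $x_1$, we predict $z_1$, we see $y_1$.
…
Adversary picks $(x_n,y_n) \in X \times Y$. We see $x_n$, we predict $z_n$, we see $y_n$.
Online algorithm: $A(x_1,y_1,\ldots,x_{i-1},y_{i-1},x_i) = z_i$
"Empirical error": $\mathrm{err}(A,\text{data}) = |\{i \mid z_i \neq y_i\}| / n$
[Figure: points in the plane labeled - and +, with a new point marked ?]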
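The protocol above can be sketched in a few lines of Python. This is only an illustration: the names `run_online` and `repeat_last_label` are hypothetical, labels are encoded as +1/-1, and the toy learner is an assumption chosen to keep the example short.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]          # an example x in R^2
Example = Tuple[Point, int]          # (x, y) with y in {-1, +1}
Learner = Callable[[List[Example], Point], int]


def run_online(learner: Learner, data: List[Example]) -> float:
    """Play the online protocol and return the empirical error err(A, data)."""
    history: List[Example] = []
    mistakes = 0
    for x, y in data:
        z = learner(history, x)      # predict z_i from past examples and x_i
        if z != y:
            mistakes += 1
        history.append((x, y))       # y_i is revealed only after predicting
    return mistakes / len(data)


def repeat_last_label(history: List[Example], x: Point) -> int:
    """Hypothetical toy learner: repeat the last label seen (+1 at the start)."""
    return history[-1][1] if history else +1


data = [((0.0, 1.0), +1), ((1.0, 1.0), +1), ((2.0, 0.0), -1), ((3.0, 0.0), -1)]
print(run_online(repeat_last_label, data))  # one mistake in four rounds: 0.25
```

Any function with the `Learner` signature can be plugged in, which mirrors the form $A(x_1,y_1,\ldots,x_{i-1},y_{i-1},x_i) = z_i$.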

8
Batch Learning
$X = \mathbb{R}^2$, $Y = \{-,+\}$
Data: $(x_1,y_1),\ldots,(x_n,y_n)$, drawn from a distribution $\mathcal{D}$ over $X \times Y$
Learning algorithm $A$ outputs $f: X \to Y$
"Generalization error": $\mathrm{err}(f,\mathcal{D}) = \Pr_{\mathcal{D}}[f(x) \neq y]$
"Empirical error": $\mathrm{err}(f,\text{data}) = |\{i \mid f(x_i) \neq y_i\}| / n$
[Figure: scatter plot of - and + labeled points, with the regions that $f$ labels - and +]

9
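Online/batch learnability of $\mathcal{F}$
Family $\mathcal{F}$ of functions $f: X \to Y$ ($Y = \{-,+\}$)
Alg. $A$ learns $\mathcal{F}$ online if $\exists\, k, c > 0$:
  Online input: data $(x_1,y_1),\ldots,(x_n,y_n)$
  $\mathrm{Regret}(A,\text{data}) = \mathrm{err}(A,\text{data}) - \min_{g \in \mathcal{F}} \mathrm{err}(g,\text{data})$
  $\forall\, \text{data}$: $E[\mathrm{Regret}(A,\text{data})] \le k / n^c$
Alg. $B$ batch learns $\mathcal{F}$ if $\exists\, k, c > 0$:
  Input: $(x_1,y_1),\ldots,(x_n,y_n)$ independent from a distribution $\mathcal{D}$ over $X \times Y$
  Output: $f \in \mathcal{F}$, $\mathrm{Regret}(f,\mathcal{D}) = \mathrm{err}(f,\mathcal{D}) - \min_{g \in \mathcal{F}} \mathrm{err}(g,\mathcal{D})$
  $\forall\, \mathcal{D}$: $E_{\text{data}}[\mathrm{Regret}(B,\mathcal{D})] \le k / n^c$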

11
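Online learnable $\Rightarrow$ Batch learnable
Given online learning algorithm $A$, define batch learning algorithm $B$:
  Input: $(x_1,y_1),(x_2,y_2),\ldots,(x_n,y_n)$ from a distribution $\mathcal{D}$
  Let $f_i: X \to Y$ be $f_i(x) = A(x_1,y_1,\ldots,x_{i-1},y_{i-1},x)$
  Pick $i \in \{1,2,\ldots,n\}$ at random and output $f_i$
Analysis:
  $E[\mathrm{Regret}(A,\text{data})] = E[\mathrm{err}(A,\text{data})] - E[\min_{g \in \mathcal{F}} \mathrm{err}(g,\text{data})]$
  $E[\mathrm{Regret}(B,\mathcal{D})] = E[\mathrm{err}(B,\mathcal{D})] - \min_{g \in \mathcal{F}} \mathrm{err}(g,\mathcal{D})$
  Since the data are independent draws from $\mathcal{D}$, $E[\mathrm{err}(B,\mathcal{D})] = E[\mathrm{err}(A,\text{data})]$, and $E[\min_{g \in \mathcal{F}} \mathrm{err}(g,\text{data})] \le \min_{g \in \mathcal{F}} \mathrm{err}(g,\mathcal{D})$.
  Hence $E[\mathrm{Regret}(B,\mathcal{D})] \le E[\mathrm{Regret}(A,\text{data})]$.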
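The conversion from an online algorithm to a batch algorithm can be sketched as follows. This is a minimal illustration, not the authors' code: `online_to_batch` and `majority_label` are hypothetical names, examples are one-dimensional for brevity, and labels are encoded as +1/-1.

```python
import random
from typing import Callable, List, Tuple

Example = Tuple[float, int]                      # (x, y) with y in {-1, +1}
Learner = Callable[[List[Example], float], int]  # A(history, x) -> z


def online_to_batch(learner: Learner, data: List[Example],
                    rng: random.Random) -> Callable[[float], int]:
    """Pick i uniformly from {1, ..., n}; return f_i(x) = A(first i-1 examples, x)."""
    i = rng.randrange(1, len(data) + 1)
    prefix = list(data[: i - 1])                 # frozen copy of the first i-1 examples
    return lambda x: learner(prefix, x)


def majority_label(history: List[Example], x: float) -> int:
    """Hypothetical toy online learner: predict the most common label so far."""
    return +1 if sum(y for _, y in history) >= 0 else -1


data = [(0.2, +1), (0.5, +1), (0.9, +1), (0.4, +1)]
f = online_to_batch(majority_label, data, random.Random(0))
print(f(0.7))
```

The returned `f` is an ordinary function of `x`, so it can be evaluated on fresh examples like any batch hypothesis.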

12
Outline
1. Online/batch learnability of $\mathcal{F}$
   Online learnability $\Rightarrow$ batch learnability
   Finite learning: batch and online (via weighted majority)
   Batch learnability $\not\Rightarrow$ online learnability
   Batch learnability $\Rightarrow$ transductive online learnability
2. Online learning in repeated games
   Zero-sum: weighted majority $\Rightarrow$ equilibrium
   General-sum: no "internal regret" $\Rightarrow$ correlated equilibrium

13
Online majority algorithm
Say there is some perfect $f^* \in \mathcal{F}$, $\mathrm{err}(f^*,\text{data}) = 0$. Say $|\mathcal{F}| = F$.
Predict according to the majority of the consistent (live) $f$'s.
Each mistake Maj makes eliminates $\ge 1/2$ of the live $f$'s.
Maj's #mistakes $\le \log_2(F)$, so $\mathrm{err}(\text{Maj},\text{data}) \le \log_2(F)/n$.
[Table: predictions of $f_1, f_2, f_3, \ldots, f_F$ on $x_1, x_2, x_3, \ldots, x_n$, alongside the (live) majority prediction and the truth $y$]
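A minimal sketch of the majority algorithm, assuming a finite class of threshold functions on $[0,1]$ as the hypothetical family (the slide does not fix a particular $\mathcal{F}$) and labels encoded as +1/-1:

```python
import math
from typing import Callable, List, Sequence

Hypothesis = Callable[[float], int]


def majority_algorithm(hypotheses: Sequence[Hypothesis],
                       xs: Sequence[float], ys: Sequence[int]) -> int:
    """Run the online majority algorithm and return its number of mistakes."""
    live: List[Hypothesis] = list(hypotheses)
    mistakes = 0
    for x, y in zip(xs, ys):
        vote = sum(f(x) for f in live)
        z = +1 if vote >= 0 else -1              # majority of the live hypotheses
        if z != y:
            mistakes += 1
        live = [f for f in live if f(x) == y]    # keep only consistent hypotheses
    return mistakes


def threshold(c: float) -> Hypothesis:
    """Illustrative hypothesis: + at or above c, - below it."""
    return lambda x: +1 if x >= c else -1


hypotheses = [threshold(k / 16) for k in range(16)]   # F = 16
target = hypotheses[5]                                 # the perfect f*
xs = [0.9, 0.05, 0.5, 0.2, 0.35, 0.3, 0.33, 0.31]
ys = [target(x) for x in xs]
mistakes = majority_algorithm(hypotheses, xs, ys)
print(mistakes, "<=", math.log2(len(hypotheses)))
```

Because a perfect hypothesis is never eliminated, the live set is never empty, and each mistake removes at least half of it.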

14
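Naive batch learning
Say there is some perfect $f^* \in \mathcal{F}$, $\mathrm{err}(f^*,\text{data}) = 0$. Say $|\mathcal{F}| = F$.
Select a consistent $f$.
Say $\forall g \neq f^*$: $\mathrm{err}(g,\mathcal{D}) = \log(F)/n$. Then $P[\mathrm{err}(g,\text{data}) = 0] = (1 - \log(F)/n)^n \approx 1/F$.
Wow! Online looks like batch.
[Table: predictions of $f_1, f_2, f_3, \ldots, f_F$ on $x_1, x_2, x_3, \ldots, x_n$, alongside the truth $y$, with the perfect $f \in \mathcal{F}$ marked]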

15
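Naive batch learning
Naive batch algorithm: choose $f \in \mathcal{F}$ that minimizes $\mathrm{err}(f,\text{data})$. ($F = |\mathcal{F}|$)
For any $f \in \mathcal{F}$: $P\big[\,|\mathrm{err}(f,\text{data}) - \mathrm{err}(f,\mathcal{D})| > c/\sqrt{n}\,\big] \le 2e^{-2c^2}$
$P\big[\,\exists f \in \mathcal{F}: |\mathrm{err}(f,\text{data}) - \mathrm{err}(f,\mathcal{D})| > 10\sqrt{\ln(F)/n}\,\big] \le 2F e^{-200 \ln F} \le 2^{-100}$
$E[\mathrm{Regret}(\text{n.b.},\mathcal{D})] \le c\sqrt{\ln(F)/n}$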
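The union-bound step can be checked numerically. The sketch below assumes the standard Hoeffding inequality for a single function, $2e^{-2n\epsilon^2}$, multiplied by the number of functions; the function names and the example values (1000 functions, $\epsilon = 0.1$, $\delta = 0.05$) are illustrative assumptions.

```python
import math


def union_bound_failure_prob(num_functions: int, n: int, eps: float) -> float:
    """Hoeffding plus union bound on P[some f has |err(f,data) - err(f,D)| > eps]."""
    return 2 * num_functions * math.exp(-2 * n * eps ** 2)


def sample_size(num_functions: int, eps: float, delta: float) -> int:
    """Smallest n for which the bound above is at most delta."""
    return math.ceil(math.log(2 * num_functions / delta) / (2 * eps ** 2))


n = sample_size(num_functions=1000, eps=0.1, delta=0.05)
print(n, union_bound_failure_prob(1000, n, 0.1))
```

The required sample size grows only logarithmically in the number of functions, which is why the regret of the naive batch algorithm scales with $\sqrt{\ln(F)/n}$.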

16
Weighted majority' [LW89]
Assign a weight to each $f$: $\forall f \in \mathcal{F}$, $w(f) = 1$. ($F = |\mathcal{F}|$)
On period $i = 1,2,\ldots,n$:
  Predict the weighted majority of the $f$'s.
  For each $f$: if $f(x_i) \neq y_i$, $w(f) := w(f)/2$.
WM' errs $\Rightarrow$ total weight decreases by $\ge 25\%$.
Final total weight $\le F \cdot (3/4)^{\#\text{mistakes}(\text{WM}')}$
Final total weight $\ge 2^{-\min_f \#\text{mistakes}(f)}$
$\#\text{mistakes}(\text{WM}') \le 2.41\,(\min_f \#\text{mistakes}(f) + \log_2 F)$
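A minimal sketch of this deterministic halving-weights variant, assuming a hypothetical family of 16 threshold functions and synthetic labels with a few flips, so that no hypothesis is perfect:

```python
import math
from typing import Callable, Sequence, Tuple

Hypothesis = Callable[[float], int]


def weighted_majority_prime(hypotheses: Sequence[Hypothesis],
                            xs: Sequence[float],
                            ys: Sequence[int]) -> Tuple[int, int]:
    """Return (#mistakes of WM', #mistakes of the best single hypothesis)."""
    weights = [1.0] * len(hypotheses)
    f_mistakes = [0] * len(hypotheses)
    wm_mistakes = 0
    for x, y in zip(xs, ys):
        preds = [f(x) for f in hypotheses]
        plus = sum(w for w, p in zip(weights, preds) if p == +1)
        minus = sum(w for w, p in zip(weights, preds) if p == -1)
        z = +1 if plus >= minus else -1          # weighted majority vote
        if z != y:
            wm_mistakes += 1
        for k, p in enumerate(preds):
            if p != y:                           # halve the weight of every wrong f
                weights[k] /= 2
                f_mistakes[k] += 1
    return wm_mistakes, min(f_mistakes)


def threshold(c: float) -> Hypothesis:
    return lambda x: +1 if x >= c else -1


hypotheses = [threshold(k / 16) for k in range(16)]
xs = [((7 * i) % 20) / 20 + 0.01 for i in range(40)]
# Labels follow the threshold 5/16, except that every 10th label is flipped.
ys = [hypotheses[5](x) if i % 10 else -hypotheses[5](x) for i, x in enumerate(xs)]
wm, best = weighted_majority_prime(hypotheses, xs, ys)
print(wm, "<=", 2.41 * (best + math.log2(len(hypotheses))))
```

The constant 2.41 is $1/\log_2(4/3)$ rounded up, which is what the two total-weight inequalities give when combined.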

17
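Weighted majority [LW89]
Assign a weight to each $f$: $\forall f \in \mathcal{F}$, $w(f) = 1$. ($F = |\mathcal{F}|$)
On period $i = 1,2,\ldots,n$:
  Predict the weighted majority of the $f$'s.
  For each $f$: if $f(x_i) \neq y_i$, $w(f) := w(f)(1 - \epsilon)$.
Thm: $E[\mathrm{Regret}(\text{WM},\text{data})] \le 2\sqrt{\ln(F)/n}$
Wow! Online looks like batch.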
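The sketch below assumes the randomized form of weighted majority, in which the prediction follows a hypothesis drawn with probability proportional to its weight; that reading, the step size $\epsilon = \sqrt{\ln(F)/n}$, and the synthetic data are assumptions made for illustration. Instead of sampling, it accumulates the exact expected number of mistakes, so the output is deterministic.

```python
import math
from typing import Callable, Sequence, Tuple

Hypothesis = Callable[[float], int]


def randomized_wm(hypotheses: Sequence[Hypothesis], xs: Sequence[float],
                  ys: Sequence[int], eps: float) -> Tuple[float, int]:
    """Return (expected #mistakes of randomized WM, #mistakes of the best f)."""
    weights = [1.0] * len(hypotheses)
    f_mistakes = [0] * len(hypotheses)
    expected = 0.0
    for x, y in zip(xs, ys):
        wrong = [f(x) != y for f in hypotheses]
        total = sum(weights)
        # Probability of erring = weight fraction held by wrong hypotheses.
        expected += sum(w for w, bad in zip(weights, wrong) if bad) / total
        for k, bad in enumerate(wrong):
            if bad:
                weights[k] *= 1 - eps
                f_mistakes[k] += 1
    return expected, min(f_mistakes)


def threshold(c: float) -> Hypothesis:
    return lambda x: +1 if x >= c else -1


hypotheses = [threshold(k / 16) for k in range(16)]
n = 200
xs = [((37 * i) % 101) / 101 for i in range(n)]
ys = [hypotheses[5](x) if i % 10 else -hypotheses[5](x) for i, x in enumerate(xs)]
eps = math.sqrt(math.log(len(hypotheses)) / n)
expected, best = randomized_wm(hypotheses, xs, ys, eps)
regret = (expected - best) / n
print(regret, "<=", 2 * math.sqrt(math.log(len(hypotheses)) / n))
```

With this choice of `eps`, the usual potential argument bounds the per-round regret by $\epsilon + \ln(F)/(\epsilon n) = 2\sqrt{\ln(F)/n}$.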

18
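Weighted majority extensions…
Tracking
On any window $W$: $E[\mathrm{Regret}(\text{WM},W)] \le c \cdot \ldots$
[Table: predictions of $f_1, f_2, f_3, \ldots, f_F$ and of WM on $x_1, x_2, x_3, \ldots, x_n$, alongside the truth $y$, with a window $W$ of consecutive examples marked]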

19
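Weighted majority extensions…
Multi-armed bandit
You don't see $x_i$.
You pick $f$.
Find out if you erred.
$E[\mathrm{Regret}] \le c \cdot \ldots$
[Table: predictions of $f_1, f_2, f_3, \ldots, f_F$ and of WM on $x_1, x_2, x_3, \ldots, x_n$, alongside the truth $y$, with only the chosen entries revealed]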

20
Outline
1. Online/batch learnability of $\mathcal{F}$
   Online learnability $\Rightarrow$ batch learnability
   Finite learning: batch and online (via weighted majority)
   Batch learnability $\not\Rightarrow$ online learnability
   Batch learnability $\Rightarrow$ (transductive) online learnability
2. Online learning in repeated games
   Zero-sum: weighted majority $\Rightarrow$ equilibrium
   General-sum: no "internal regret" $\Rightarrow$ correlated equilibrium

21
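Batch $\not\Rightarrow$ Online
Define $f_c: [0,1] \to \{+,-\}$, $f_c(x) = \mathrm{sgn}(x - c)$.
Simple threshold functions: $\mathcal{F} = \{f_c \mid c \in [0,1]\}$
Batch learnable: Yes
Online learnable: No!
Adversary does a "random binary search": each label is equally likely to be +/-.
$E[\mathrm{Regret}] = 1/2$ for any online algorithm.
[Figure: the interval from 0 to 1 with $x_1 = .5$ labeled +, $x_2$ labeled -, $x_3$ labeled +, $x_4$ labeled -, and $x_5$ next]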
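The adversary's "random binary search" can be simulated directly. In this sketch the function names, the toy learner, and the convention that ties at $x = c$ never occur (the final $c$ lies strictly inside the remaining interval) are assumptions for illustration.

```python
import random
from typing import Callable, List, Tuple

Example = Tuple[float, int]
Learner = Callable[[List[Example], float], int]


def random_binary_search_game(learner: Learner, n: int,
                              rng: random.Random) -> Tuple[int, float, List[Example]]:
    """Adversary queries the midpoint of the interval still consistent with past labels."""
    lo, hi = 0.0, 1.0
    history: List[Example] = []
    mistakes = 0
    for _ in range(n):
        x = (lo + hi) / 2
        z = learner(history, x)
        y = rng.choice([-1, +1])     # fair coin, independent of the prediction
        mistakes += int(z != y)
        history.append((x, y))
        if y == +1:
            hi = x                   # sgn(x - c) = + requires c < x
        else:
            lo = x                   # sgn(x - c) = - requires c > x
    c = (lo + hi) / 2                # a threshold consistent with every label
    return mistakes, c, history


def always_plus(history: List[Example], x: float) -> int:
    """Hypothetical learner; any other learner also errs w.p. 1/2 per round."""
    return +1


mistakes, c, history = random_binary_search_game(always_plus, 30, random.Random(1))
consistent = all((+1 if x > c else -1) == y for x, y in history)
print(mistakes, c, consistent)
```

Some threshold always achieves zero error on the sequence while the learner errs about half the time, which is where the regret of 1/2 comes from.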

22
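Key idea: transductive online learning [KakadeK05]
We see $x_1,x_2,\ldots,x_n \in X$ in advance.
$y_1,y_2,\ldots,y_n \in \{+,-\}$ are revealed online.
[Figure: the interval from 0 to 1 with $x_1 = .5$ labeled +, $x_2$ labeled -, $x_3$ labeled +, $x_4$ labeled -]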

23
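Key idea: transductive online learning [KakadeK05]
$X = \mathbb{R}^2$, $Y = \{-,+\}$
Adversary picks $(x_1,y_1),\ldots,(x_n,y_n) \in X \times Y$.
Adversary reveals $x_1,x_2,\ldots,x_n$.
We predict $z_1$, we see $y_1$. We predict $z_2$, we see $y_2$. … We predict $z_n$, we see $y_n$.
Transductive online algorithm: $T(x_1,y_1,\ldots,x_{i-1},y_{i-1},x_i,x_{i+1},\ldots,x_n) = z_i$
"Empirical error": $\mathrm{err}(A,\text{data}) = |\{i \mid z_i \neq y_i\}| / n$
[Figure: points in the plane labeled - and +, with a new point marked ?]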

24
Algorithm for trans. online learning [KK05]
We see $x_1,x_2,\ldots,x_n \in X$ in advance.
$y_1,y_2,\ldots,y_n \in \{+,-\}$ are revealed online.
There are $L$ distinct labelings $(f(x_1),f(x_2),\ldots,f(x_n))$ over all $f \in \mathcal{F}$.
The effective size of $\mathcal{F}$ is $L$.
Run WM on $L$ functions.
$E[\mathrm{Regret}(\text{WM},\text{data})] \le 2\sqrt{\ln(L)/n}$
[Table: predictions of the functions in $\mathcal{F}$ on $x_1, x_2, x_3, \ldots, x_n$; functions with identical rows are merged]
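Counting the distinct labelings is a one-liner once the $x_i$ are known in advance. The sketch below uses a hypothetical grid of 1001 threshold functions to show that a large class can collapse to a handful of effective functions on a fixed set of points.

```python
from typing import Callable, Sequence, Set, Tuple

Hypothesis = Callable[[float], int]


def distinct_labelings(hypotheses: Sequence[Hypothesis],
                       xs: Sequence[float]) -> Set[Tuple[int, ...]]:
    """The set of labelings (f(x_1), ..., f(x_n)) realized by some f in the class."""
    return {tuple(f(x) for x in xs) for f in hypotheses}


def threshold(c: float) -> Hypothesis:
    return lambda x: +1 if x > c else -1


hypotheses = [threshold(k / 1000) for k in range(1001)]   # 1001 functions
xs = [0.1, 0.3, 0.5, 0.7, 0.9]                            # known in advance
labelings = distinct_labelings(hypotheses, xs)
print(len(hypotheses), "functions,", len(labelings), "distinct labelings")
```

Weighted majority can then be run with one weight per labeling rather than one per function.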

25
Candidate efficient algorithm
[Table: predictions of the functions in $\mathcal{F}$ on $x_1, x_2, x_3, \ldots, x_n$, alongside the true $y$ and a row of labels marked "(random)"]

26
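How many labelings? Shattering & VC
Def: $S \subseteq X$ is shattered if there are $2^{|S|}$ ways to label $S$ by $f \in \mathcal{F}$.
$\mathrm{VC}(\mathcal{F}) = \max \{\, |S| : S \text{ is shattered by } \mathcal{F} \,\}$
VC dimension captures the complexity of $\mathcal{F}$.
Example: [Figure: small point sets with - and + labels]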
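For a finite class and a finite domain, shattering can be checked by brute force. The sketch below is illustrative only: the threshold and interval families are hypothetical examples, and the result is the VC dimension restricted to the given domain.

```python
from itertools import combinations
from typing import Callable, Sequence

Hypothesis = Callable[[float], int]


def is_shattered(hypotheses: Sequence[Hypothesis], subset: Sequence[float]) -> bool:
    """True if the class realizes all 2^|S| labelings of the subset."""
    labelings = {tuple(f(x) for x in subset) for f in hypotheses}
    return len(labelings) == 2 ** len(subset)


def vc_dimension(hypotheses: Sequence[Hypothesis], domain: Sequence[float]) -> int:
    """Largest size of a shattered subset of the (finite) domain."""
    d = 0
    for size in range(1, len(domain) + 1):
        if any(is_shattered(hypotheses, s) for s in combinations(domain, size)):
            d = size
        else:
            break        # subsets of shattered sets are shattered, so we can stop
    return d


grid = [0.5, 1.5, 2.5, 3.5, 4.5]
thresholds = [lambda x, c=c: +1 if x > c else -1 for c in grid]
intervals = [lambda x, a=a, b=b: +1 if a < x < b else -1
             for a, b in combinations(grid, 2)]
domain = [1.0, 2.0, 3.0, 4.0]
print(vc_dimension(thresholds, domain), vc_dimension(intervals, domain))
```

Thresholds cannot label a pair as (+, -) from left to right, and intervals cannot label a triple as (+, -, +), which is why the search stops at 1 and 2 respectively.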

27
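How many labelings? Shattering & VC
Sauer's lemma: # labelings $L = O(n^{\mathrm{VC}(\mathcal{F})})$
$\Rightarrow E[\mathrm{Regret}(\text{WM},\text{data})] \le 2\sqrt{\ln(L)/n} = O\big(\sqrt{\mathrm{VC}(\mathcal{F}) \ln(n)/n}\big)$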
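The exact form of Sauer's lemma bounds the number of labelings of $n$ points by $\sum_{i=0}^{d} \binom{n}{i}$ for a class of VC dimension $d$. A small sketch (the function name is hypothetical) shows how slowly this grows compared with the $2^n$ labelings that are possible in principle:

```python
import math


def sauer_bound(n: int, d: int) -> int:
    """Sauer's lemma: at most sum_{i=0}^{d} C(n, i) labelings of n points."""
    return sum(math.comb(n, i) for i in range(d + 1))


for n in (5, 10, 100):
    print(n, sauer_bound(n, 1), sauer_bound(n, 2), 2 ** n)
```

For $d = 1$ the bound is $n + 1$, which thresholds on a line attain exactly.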

28
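Cannot batch learn faster than $\mathrm{VC}(\mathcal{F})$
Shattered set $S$, $|S| = \mathrm{VC}(\mathcal{F})$, $n > 0$
Distribution $\mathcal{D}$ over $X \times Y$; batch training set of size $n$
Each $x \in S$ is not in the training set with probability $(1 - 1/n)^n \approx e^{-1}$
$\Rightarrow E[\mathrm{Regret}(B,\mathcal{D})] \ge c\,\mathrm{VC}(\mathcal{F})/n$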

29
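Putting it together
Transductive online: $E[\mathrm{Regret}(\text{WM},\text{data})] = O\big(\sqrt{\mathrm{VC}(\mathcal{F}) \ln(n)/n}\big)$
Batch: $E[\mathrm{Regret}(B,\mathcal{D})] \ge c\,\mathrm{VC}(\mathcal{F})/n$
Trans. online learnable $\Leftrightarrow$ batch learnable $\Leftrightarrow$ finite $\mathrm{VC}(\mathcal{F})$
Almost identical to the standard VC bound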

30
Learnability conclusions
Finite $\mathrm{VC}(\mathcal{F})$ characterizes batch and transductive learnability.
Open problem: what property of $\mathcal{F}$ characterizes online learnability (non-transductive)?
Efficiency!?
  The WM algorithm requires enumeration of $\mathcal{F}$.
  Thm [KK05]: if one can efficiently find the lowest-error $f \in \mathcal{F}$, then one can design an efficient online learning algorithm.

31
Online learning in repeated games

32
Repeated games
Rounds $i = 1,2,\ldots,n$:
  Players simultaneously choose actions.
  Players receive payoffs; goal: maximize total payoff.
Learning: players need not know the opponent or the game.
Feedback: a player only finds out the payoff of his action and of the alternatives (not the opponent's action).
Example: rock-paper-scissors (each cell: payoff to Pl. 1, payoff to Pl. 2)
                   Pl. 2
               R      P      S
Pl. 1    R    0,0   -1,1    1,-1
         P    1,-1   0,0   -1,1
         S   -1,1    1,-1   0,0

33
(Mixed) Nash Equilibrium
Each player chooses a distribution over actions.
Players are optimizing relative to their opponent(s).
[Figure: the rock-paper-scissors payoff table, with each action played with probability 1/3]

34
Online learning in 0-sum games (Schapire recap)
Payoff is $A(i,j)$ for pl. 1, $-A(i,j)$ for pl. 2.
Going first is a disadvantage: $\max_i \min_j A(i,j) \le \min_j \max_i A(i,j)$
Mixed strategies $p$ (pl. 1) and $q$ (pl. 2): $\max_p \min_q A(p,q) \le \min_q \max_p A(p,q)$
Min-max theorem: "="

35
Online learning in 0-sum games (Schapire recap)
Each player uses weighted majority:
  Maintain a weight on each action, initially equal.
  Choose an action with probability proportional to its weight.
  Find out the payoffs of each action (assume payoffs are in $[-1,1]$).
  For each action: weight $\leftarrow$ weight $\cdot\,(1 + \epsilon \cdot \text{payoff})$
Regret = possible improvement (in hindsight) from always playing a single action.
WM $\Rightarrow$ regret is low.
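A sketch of one player running this multiplicative update on rock-paper-scissors. Several details are assumptions made for illustration: the step size `eps`, the fixed opponent sequence (R, R, P repeated), and the use of exact expected payoffs in place of sampled actions so that the output is deterministic.

```python
import math
from typing import List, Sequence, Tuple

# Payoff to player 1 in rock-paper-scissors; rows and columns are R, P, S.
A = [[0, -1, 1],
     [1, 0, -1],
     [-1, 1, 0]]


def multiplicative_weights(payoff: Sequence[Sequence[float]],
                           opponent_actions: Sequence[int],
                           eps: float) -> Tuple[float, float]:
    """Return (average expected payoff of WM, average payoff of best fixed action)."""
    k = len(payoff)
    weights: List[float] = [1.0] * k
    action_totals = [0.0] * k
    expected_total = 0.0
    for j in opponent_actions:
        total = sum(weights)
        probs = [w / total for w in weights]       # play proportional to weight
        gains = [payoff[i][j] for i in range(k)]   # payoff of every action this round
        expected_total += sum(p * g for p, g in zip(probs, gains))
        for i in range(k):
            action_totals[i] += gains[i]
            weights[i] *= 1 + eps * gains[i]       # weight <- weight * (1 + eps * payoff)
    n = len(opponent_actions)
    return expected_total / n, max(action_totals) / n


n = 999
opponent = [(0, 0, 1)[t % 3] for t in range(n)]    # hypothetical opponent: R, R, P, ...
eps = math.sqrt(math.log(3) / n)
avg, best = multiplicative_weights(A, opponent, eps)
print(avg, best, best - avg)
```

Against this opponent the weights concentrate on paper, and the gap `best - avg` is the regret in the sense defined on the slide.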

36
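Online learning in 0-sum games (Schapire recap)
Actions are $(a_1,b_1),(a_2,b_2),\ldots,(a_n,b_n)$.
Regret of pl. 1 is $\max_a \frac{1}{n}\sum_{i=1}^{n} A(a,b_i) - \frac{1}{n}\sum_{i=1}^{n} A(a_i,b_i)$.
Let $\hat{p}$ and $\hat{q}$ be the empirical distributions of actions $a_1,\ldots,a_n$ and $b_1,\ldots,b_n$, respectively.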

37
WM $\Rightarrow$ "min-max" theorem
$\max_p \min_q A(p,q) = \min_q \max_p A(p,q)$ = "value of game"
Using WM, each player guarantees regret $\to 0$, regardless of the opponent.
  Can beat an idiot in tic-tac-toe
Reasonable strategy to use
Justifies how such equilibria might arise

38
Justifying equilibrium in games
Online learning gives a plausible explanation for how equilibrium might arise.
Nash equilibrium
  Zero-sum games: unique value; fast and easy to "learn"
  General-sum games: not unique; fast and easy to "learn"?? Polynomial time algorithm to find one??

39
General-sum games
No unique "value"
Many very different equilibria
Can't naively improve a "no regret" algorithm (by playing a single mixed strategy)
Low regret for both players $\not\Rightarrow$ equilibrium
e.g.
          A      B
    A    1,1    0,0
    B    0,0    2,2

40
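General-sum games
Low regret $\not\Rightarrow$ Nash equilibrium, e.g., (1,1),(2,2),(1,1),(2,2),(1,1),(2,2),…
[Table: 4×4 payoff matrix over actions 1-4 for each player; visible entries by row: 0,0  -1,-1  1,1  1,-1 / -1,-1  0,0  1,-1  1,1 / -1,1  1,1 / -1,1  1,1]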

41
Refined notion of regret
Can't naively improve a "no regret" algorithm (by playing a single mixed strategy)
Might be able to naively improve it by replacing: "When alg. suggests 1, play 3"
[Table: 4×4 payoff matrix over actions 1-4 for each player; visible entries by row: 0,0  -1,-1  1,1  1,-1 / -1,-1  0,0  1,-1  1,1 / -1,1  1,1 / -1,1  1,1]

42
Internal regret
Internal regret $IR(i,j)$ is how much we could have improved by replacing all occurrences of action $i$ with action $j$.
No internal regret $\Rightarrow$ correlated equilibrium
Calibration $\Rightarrow$ correlated equilibrium [FosterVohra]
Correlated Equilibrium [Aumann]: best strategy is to listen to the fish
[Figure: 3×3 payoff table (visible entries by row: 1,0  0,1  0,0 / 1,0  0,1 / 0,0  1,0) with probability 1/6 on selected cells, and the recommendations "Play row 1" and "Play col. 2"]
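Internal regret can be computed directly from a history of play. The sketch below is illustrative: it uses the rock-paper-scissors payoffs as a stand-in game, reports totals rather than per-round averages, and `internal_regret` is a hypothetical name.

```python
from typing import Dict, Sequence, Tuple

# Payoff to the row player in rock-paper-scissors; indices 0, 1, 2 are R, P, S.
A = [[0, -1, 1],
     [1, 0, -1],
     [-1, 1, 0]]


def internal_regret(payoff: Sequence[Sequence[float]],
                    my_actions: Sequence[int],
                    opp_actions: Sequence[int]) -> Dict[Tuple[int, int], float]:
    """IR(i, j): total gain from replacing every play of action i with action j."""
    k = len(payoff)
    ir = {(i, j): 0.0 for i in range(k) for j in range(k) if i != j}
    for a, b in zip(my_actions, opp_actions):
        for j in range(k):
            if j != a:
                ir[(a, j)] += payoff[j][b] - payoff[a][b]
    return ir


my_actions = [0, 0, 1]      # we played R, R, P
opp_actions = [1, 1, 0]     # the opponent played P, P, R
ir = internal_regret(A, my_actions, opp_actions)
print(ir[(0, 2)], ir[(0, 1)], ir[(1, 0)])
```

Here replacing rock with scissors would have gained 4 in total, while replacing paper with rock would have lost 1, so only the first swap counts as regret.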

43
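Low internal regret $\to$ correlated eq.
Sequence like (1,1),(2,1),(3,2),…
Think about this as a distribution $P$.
No internal regret $\Leftrightarrow$ correlated eq.
[Figure: 3×3 payoff table (visible entries by row: 1,0  0,1  0,0 / 1,0  0,1 / 0,0  1,0) with probability 1/6 on selected cells, and the recommendations "Play row 1" and "Play col. 2"]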

44
Online learning in games: conclusions
Online learning in zero-sum games
  Weighted majority (low regret)
  Achieves the value of the game
Online learning in general-sum games
  Low internal regret
  Achieves correlated equilibrium
Open problems
  Are there natural dynamics $\Rightarrow$ Nash equilibrium?
  Is correlated equilibrium going to take over?
