No-Regret Algorithms for Online Convex Programs Geoffrey J. Gordon Carnegie Mellon University Presented by Nicolas Chapados 21 February 2007.


Outline
Online learning setting
Definition of Regret
Safe Set
Lagrangian Hedging (gradient form)
Lagrangian Hedging (optimization form)
Mention of Theoretical Results
Application: One-Card Poker

Online Learning
Sequence of trials t = 1, 2, …
At each trial we must pick a hypothesis $y_t$
Correct answer revealed in the form of a convex loss function $\ell_t(y_t)$
Just before seeing the t-th example, total loss is given by $L_t = \sum_{i=1}^{t-1} \ell_i(y_i)$

Goal of Paper
Introduce Lagrangian Hedging algorithm
Generalization of other algorithms
– Hedge (Freund and Schapire)
– Weighted Majority (Littlestone and Warmuth)
– External-regret Matching (Hart and Mas-Colell)
(CMU Technical Report is much clearer than NIPS paper)

Regret
If we had used a fixed hypothesis $y$, the loss would have been $\sum_{i=1}^{t-1} \ell_i(y)$
The regret is the difference between the total loss of the adaptive and fixed hypotheses: $\rho_t(y) = \sum_{i=1}^{t-1} \ell_i(y_i) - \sum_{i=1}^{t-1} \ell_i(y)$
Positive regret means that we should have preferred the fixed hypothesis
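As a rough illustration of the definition above, the following Python sketch computes the regret of a sequence of plays against one fixed hypothesis. The helper names (`linear_loss`, `regret`) and the two-action data are made up for the example and do not come from the paper.

```python
def linear_loss(c):
    """Return the loss function y -> c . y for a cost vector c (illustrative)."""
    return lambda y: sum(ci * yi for ci, yi in zip(c, y))


def regret(losses, plays, y_fixed):
    """Total loss of the adaptive plays minus total loss of a fixed hypothesis."""
    adaptive = sum(loss(y) for loss, y in zip(losses, plays))
    fixed = sum(loss(y_fixed) for loss in losses)
    return adaptive - fixed


# Hypothetical data: three trials over two actions.
losses = [linear_loss([1.0, 0.0]), linear_loss([1.0, 0.0]), linear_loss([0.0, 1.0])]
plays = [[1.0, 0.0], [0.5, 0.5], [0.5, 0.5]]

regret_vs_action_1 = regret(losses, plays, [0.0, 1.0])  # always play the second action
regret_vs_action_0 = regret(losses, plays, [1.0, 0.0])  # always play the first action
```

In this toy run the learner loses 2 in total, while always playing the second action would have lost 1, so the regret against that action is positive.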

Hypothesis Set
Assume that hypothesis set $Y$ is a convex subset of $\mathbb{R}^d$
For example, the simplex of probability distributions
The corners of $Y$ represent pure actions and the middle region a probability distribution over actions

Loss Function
Minimize a linear loss: $\ell_t(y) = c_t \cdot y$

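Regret Vector
Keep the state of the learning algorithm
Vector that keeps information about actual losses and gradient of loss function
Define regret vector $s_t$ by the recursion $s_{t+1} = s_t + (y_t \cdot c_t)\,u - c_t$, with $s_1 = 0$, where $c_t$ is the gradient of the loss $\ell_t$
Arbitrary vector $u$ which satisfies $u \cdot y = 1$ for all $y \in Y$
Example: if $y$ is a probability distribution, then $u$ can be the vector of all ones.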

Use of Regret Vector
Given any hypothesis $y$, we can use the regret vector to compute its regret: $\rho_t(y) = s_t \cdot y$
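A small sketch of how the regret vector can be maintained and queried. It assumes linear losses with cost vectors $c_t$, the update $s_{t+1} = s_t + (y_t \cdot c_t)\,u - c_t$ starting from zero, and $u \cdot y = 1$ on the hypothesis set; the function names and the data are illustrative only.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def update_regret_vector(s, y, c, u):
    """One step of the assumed recursion s <- s + (y . c) u - c."""
    incurred = dot(y, c)  # loss actually paid on this trial
    return [si + incurred * ui - ci for si, ui, ci in zip(s, u, c)]


# Hypothetical data: three trials over two actions, hypotheses on the simplex.
costs = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
plays = [[1.0, 0.0], [0.5, 0.5], [0.5, 0.5]]
u = [1.0, 1.0]  # all ones, since each play sums to 1

s = [0.0, 0.0]
for y, c in zip(plays, costs):
    s = update_regret_vector(s, y, c, u)

# Regret against a fixed hypothesis is a single inner product with s.
regret_from_vector = dot(s, [0.0, 1.0])

# Direct computation for comparison: adaptive loss minus fixed loss.
regret_direct = sum(dot(c, y) for c, y in zip(costs, plays)) - sum(
    dot(c, [0.0, 1.0]) for c in costs
)
```

The point of the construction is that the learner only needs to store one vector of dimension d, yet can recover the regret against every fixed hypothesis from it.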

Safe Set
Region of the regret space in which the regret is guaranteed to be nonpositive for all hypotheses
Goal of the Lagrangian Hedging algorithm is to keep its regret vector "near" the safe set

Safe Set (continued)
[Figure: the hypothesis set Y and the corresponding safe set S]

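Unnormalized Hypotheses
Consider the cone of unnormalized hypotheses: $\bar{Y} = \{ \lambda y \mid y \in Y,\ \lambda \ge 0 \}$
The safe set is a cone that is polar to this cone of unnormalized hypotheses: $S = \bar{Y}^{\perp} = \{ s \mid s \cdot \bar{y} \le 0 \ \text{for all}\ \bar{y} \in \bar{Y} \}$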
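When the hypothesis set is a polytope such as the simplex, membership in the safe set can be tested at the corners alone, because the linear function $s \cdot y$ attains its maximum over the polytope at a corner, and scaling by a nonnegative factor does not change the sign. A minimal sketch, assuming the hypothesis set is supplied as a list of its corners (`in_safe_set` is a name introduced here):

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def in_safe_set(s, corners, tol=1e-12):
    """True if s . y <= 0 for every corner, hence for every hypothesis."""
    return all(dot(s, v) <= tol for v in corners)


# Corners of the 2-action simplex: the two pure actions.
simplex_corners = [[1.0, 0.0], [0.0, 1.0]]

safe = in_safe_set([-1.0, -0.5], simplex_corners)   # nonpositive regret everywhere
unsafe = in_safe_set([0.0, 1.0], simplex_corners)   # positive regret vs. second action
```

For the simplex the cone of unnormalized hypotheses is the nonnegative orthant, so the test reduces to checking that every coordinate of the regret vector is nonpositive.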

Lagrangian Hedging (Setting)
At each step, the algorithm chooses its play according to the current regret vector and a closed convex potential function $F(s)$
Define (sub)gradient of $F(s)$ as $f(s)$
Potential function is what defines the problem to be solved
E.g. Hedge / Weighted Majority:

Lagrangian Hedging (Gradient)
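The pseudo-code for this slide did not survive in the transcript. What follows is a sketch of the gradient form under stated assumptions: linear losses with cost vectors $c_t$, the regret-vector update $s_{t+1} = s_t + (y_t \cdot c_t)\,u - c_t$, and a Hedge-style exponential potential whose gradient is proportional to $e^{\eta s_i}$. The function names, the learning rate `eta`, the `fallback` play, and the toy costs are illustrative and are not taken from the paper.

```python
import math


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def hedge_gradient(s, eta):
    """Gradient of an exponential potential, up to a positive scale factor.

    Subtracting max(s) only rescales the result and avoids overflow.
    """
    m = max(s)
    return [math.exp(eta * (si - m)) for si in s]


def lagrangian_hedging(costs, u, fallback, eta=1.0):
    """Play one hypothesis per cost vector; return the plays and final regret vector."""
    s = [0.0] * len(u)
    plays = []
    for c in costs:
        y_hat = hedge_gradient(s, eta)           # unnormalized hypothesis
        norm = dot(y_hat, u)
        if norm > 0:
            y = [v / norm for v in y_hat]        # normalize so that u . y = 1
        else:
            y = list(fallback)                   # any element of the hypothesis set
        plays.append(y)
        incurred = dot(y, c)                     # observe c_t, pay y_t . c_t
        s = [si + incurred * ui - ci for si, ui, ci in zip(s, u, c)]
    return plays, s


# Toy run: the first action always costs 1, the second always costs 0.
costs = [[1.0, 0.0]] * 50
plays, s = lagrangian_hedging(costs, u=[1.0, 1.0], fallback=[0.5, 0.5], eta=1.0)
```

With an exponential gradient the normalizer is always positive, so the fallback branch never fires here; it is kept because other potentials can have a zero gradient inside the safe set.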

Optimization Form
In practice, may be difficult to define, evaluate and differentiate an appropriate potential function
Optimization form: same pseudo-code as previously, but define $F$ in terms of a simpler hedging function $W$
Example corresponding to previous $F_1$

Optimization Form (cont’d)
Then may obtain $F$ as:
And the (sub)gradient as:
Which we may plug into the previous pseudo-code
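The equations on this slide are missing from the transcript. As a hedged illustration only, suppose $F(s)$ is the supremum of $s \cdot \bar{y} - W(\bar{y})$ over the cone of unnormalized hypotheses, with the maximizer serving as the (sub)gradient. Both this form and the particular $W$ below (an unnormalized negative entropy on the nonnegative orthant) are assumptions made for the example; the $W$ was chosen because its maximizer has the closed form $\bar{y}_i = e^{s_i}$.

```python
import math


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def hedging_function(y_bar):
    """Assumed W: sum over coordinates of y ln y - y, taken as 0 at y = 0."""
    return sum(v * math.log(v) - v if v > 0 else 0.0 for v in y_bar)


def objective(s, y_bar):
    """The quantity maximized over unnormalized hypotheses."""
    return dot(s, y_bar) - hedging_function(y_bar)


def gradient_via_optimization(s):
    """Maximizer of the objective: setting s_i - ln y_i = 0 gives y_i = exp(s_i)."""
    return [math.exp(si) for si in s]


def potential(s):
    """Value of the objective at its maximizer."""
    return objective(s, gradient_via_optimization(s))


s = [0.0, 1.0]
y_star = gradient_via_optimization(s)
F_value = potential(s)  # equals exp(0) + exp(1) for this W
```

Normalizing `y_star` so that its coordinates sum to 1 gives exponential weights of the same kind Hedge uses, which is why this choice of $W$ is a natural one to experiment with.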

Theoretical Results (In a nutshell: it all works)

One-Card Poker
Hypothesis space is the set of sequence weight vectors
– information about when it is player i’s turn to move and the actions available at that time
Two players: gambler and dealer
Ante = $1 / given 1 card from 13-card deck
Gambler Bets / Dealer Bets / Gambler Bets
A player may fold
If neither folds: player with highest card wins pot

Why is it interesting?
Elements of more complicated games:
– Incomplete information
– Chance events
– Multiple stages
Optimal play requires randomization and bluffing

Results in Self-Play

Results Against Fixed Opponent
