Power of Selective Memory. Slide 1 The Power of Selective Memory Shai Shalev-Shwartz Joint work with Ofer Dekel, Yoram Singer Hebrew University, Jerusalem.

Power of Selective Memory. Slide 2 Outline: online learning and loss bounds; hypothesis space (PST); margin of prediction and hinge loss; an online learning algorithm; trading margin for depth of the PST; automatic calibration; a self-bounded online algorithm for learning PSTs

Power of Selective Memory. Slide 3 Online Learning For t = 1, 2, …: get an instance; predict a target based on the instance; get the true target and suffer a loss; update the prediction mechanism
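The protocol on this slide can be sketched as a generic loop. This is only an illustration of the round structure, not the algorithm from the talk: `OnlineLearner`, `MajorityLearner`, and `run_online` are hypothetical names introduced here, and counting 0-1 mistakes is an assumption, since the transcript does not preserve the loss used on this slide.

```python
from typing import Iterable, Protocol, Tuple


class OnlineLearner(Protocol):
    """Hypothetical interface: anything that can predict and then be updated."""

    def predict(self, x) -> int: ...

    def update(self, x, y: int) -> None: ...


class MajorityLearner:
    """Toy learner: predicts the sign of the running sum of labels seen so far."""

    def __init__(self) -> None:
        self.balance = 0

    def predict(self, x) -> int:
        return 1 if self.balance >= 0 else -1

    def update(self, x, y: int) -> None:
        self.balance += y


def run_online(learner: OnlineLearner, stream: Iterable[Tuple[object, int]]) -> int:
    """Run the predict / reveal / update loop and return the number of mistakes."""
    mistakes = 0
    for x, y in stream:
        y_hat = learner.predict(x)   # predict before the true target is revealed
        if y_hat != y:               # suffer a (0-1) loss; an assumed choice of loss
            mistakes += 1
        learner.update(x, y)         # update the prediction mechanism
    return mistakes
```

Any learner exposing `predict` and `update` can be plugged into `run_online`, which keeps the protocol separate from the hypothesis class.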

Power of Selective Memory. Slide 4 Analysis of Online Algorithm Relative loss bounds (external regret): For any fixed hypothesis h :

Power of Selective Memory. Slide 5 Prediction Suffix Tree (PST) Each hypothesis is parameterized by a triplet: context function

Power of Selective Memory. Slide 6 PST Example (figure: a tree with node values 0, -3, 1, 4, -2, 7)
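A prediction suffix tree can be sketched as a map from contexts (suffixes of the observed sequence) to real values. The sketch below is an assumption-heavy illustration: the transcript does not preserve the prediction formula, so the depth discount of sqrt(1/2) per level, the dictionary representation, and the function name `pst_predict` are all choices made here, and the node values in the usage note are hypothetical rather than taken from the slide's figure.

```python
import math


def pst_predict(tree, history, decay=math.sqrt(0.5)):
    """Score a history by summing the values of its suffixes stored in the tree.

    `tree` maps a context string (oldest symbol first) to a real value; the
    empty string is the root. Each context of length `depth` is discounted by
    decay**depth. The discount is an assumption, not stated in the transcript.
    """
    score = tree.get("", 0.0)
    for depth in range(1, len(history) + 1):
        context = history[-depth:]        # the last `depth` symbols
        if context not in tree:
            break                         # in a suffix tree, no deeper context exists either
        score += (decay ** depth) * tree[context]
    return score
```

For example, with the hypothetical tree `{"": 0.0, "+": -1.0, "-": 1.0, "+-": 0.5}`, the history `"+-"` matches the contexts `"-"` and `"+-"`, while the history `"-+"` matches only `"+"`. The sign of the score is the predicted symbol.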

Power of Selective Memory. Slide 7 Margin of Prediction Margin of prediction Hinge loss
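The formulas on this slide were not preserved, so the following uses the standard textbook definitions as a stand-in: the margin is the label times the real-valued prediction, and the hinge loss penalizes margins below 1. The unit threshold and the function names are assumptions.

```python
def margin(y, score):
    """Signed margin of a real-valued prediction `score` for a label y in {-1, +1}."""
    return y * score


def hinge_loss(y, score):
    """Standard hinge loss: zero when the margin is at least 1, linear below that."""
    return max(0.0, 1.0 - margin(y, score))
```

Under these definitions a prediction can have the correct sign and still suffer a loss, for instance `hinge_loss(1, 0.25)` is `0.75`. That is the sense in which a "margin error" differs from a prediction mistake.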

Power of Selective Memory. Slide 8 Complexity of hypothesis Define the complexity of hypothesis as We can also extend g s.t. and get

Power of Selective Memory. Slide 9 Algorithm I : Learning Unbounded-Depth PST Init: For t=1,2,… Get and predict Get and suffer loss Set Update weight vector Update tree
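The "update weight vector / update tree" steps can be sketched as an additive update along the suffix path, creating nodes as needed. This is a sketch under stated assumptions, not the talk's exact rule: the step size `tau` is left as a caller-supplied parameter because its formula was not preserved, and the per-level discount of sqrt(1/2) is an assumption. The names `DECAY` and `update_tree` are introduced here.

```python
import math

DECAY = math.sqrt(0.5)  # assumed per-level discount; not stated in the transcript


def update_tree(tree, history, y, tau, max_depth=None):
    """Add y * tau * DECAY**depth to every suffix of `history`, growing the tree.

    `tree` maps a context string (oldest symbol first) to a real value.
    `max_depth` optionally caps how deep the tree may grow in this round.
    """
    limit = len(history) if max_depth is None else min(max_depth, len(history))
    for depth in range(1, limit + 1):
        context = history[-depth:]
        tree[context] = tree.get(context, 0.0) + y * tau * (DECAY ** depth)
    return tree
```

With a hypothetical step size of 1/3, updating an empty tree on history `"+"` with label -1 and then on history `"+-"` with label +1 gives values of roughly -0.24, 0.24, and 0.17. These are close to the node values shown in the example slides that follow, but the step size is a guess and should not be read as the authors' choice.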

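Power of Selective Memory. Slide 10 Example: observed y = (none yet); predictions = ?; tree: root 0

Power of Selective Memory. Slide 11 Example: observed y = +; predictions = ?; tree: root 0

Power of Selective Memory. Slide 12 Example: observed y = +; predictions = ??; tree: root 0

Power of Selective Memory. Slide 13 Example: observed y = +-; predictions = ??; tree: root 0, node value -.23 (branch labeled +)

Power of Selective Memory. Slide 14 Example: observed y = +-; predictions = ???; tree: root 0, node value -.23 (branch labeled +)

Power of Selective Memory. Slide 15 Example: observed y = +-+; predictions = ???; tree: root 0, node values -.23, .23, .16 (branches labeled + and -)

Power of Selective Memory. Slide 16 Example: observed y = +-+; predictions = ???-; tree: root 0, node values -.23, .23, .16 (branches labeled + and -)

Power of Selective Memory. Slide 17 Example: observed y = +-+-; predictions = ???-; tree: root 0, node values -.42, .23, .16, -.14, -.09 (branches labeled + and -)

Power of Selective Memory. Slide 18 Example: observed y = +-+-; predictions = ???-+; tree: root 0, node values -.42, .23, .16, -.14, -.09 (branches labeled + and -)

Power of Selective Memory. Slide 19 Example: observed y = +-+-+; predictions = ???-+; tree: root 0, node values -.42, .41, .29, -.14, -.09, .09, .06 (branches labeled + and -)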

Power of Selective Memory. Slide 20 Analysis Let be a sequence of examples and assume that Let be an arbitrary hypothesis Let be the loss of on the sequence of examples. Then,

Power of Selective Memory. Slide 21 Proof Sketch Define Upper bound Lower bound Upper + lower bounds give the bound in the theorem

Power of Selective Memory. Slide 22 Proof Sketch (Cont.) Where does the lower bound come from? For simplicity, assume that and Define a Hilbert space: The context function g_{t+1} is the projection of g_t onto the half-space where f is the function
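The projection step can be illustrated in finite dimensions. The definition of the half-space was not preserved in the transcript, so this sketch assumes the usual margin constraint, the set of vectors w with y * <w, x> >= 1, and uses plain Python lists as a stand-in for the Hilbert space. The function name is hypothetical.

```python
def project_onto_halfspace(w, x, y):
    """Euclidean projection of w onto {v : y * <v, x> >= 1} (assumed constraint).

    If w already satisfies the constraint, or x is the zero vector, w is
    returned unchanged. Otherwise w moves along y * x by the smallest step
    that makes the constraint tight.
    """
    norm_sq = sum(xi * xi for xi in x)
    current_margin = y * sum(wi * xi for wi, xi in zip(w, x))
    if current_margin >= 1.0 or norm_sq == 0.0:
        return list(w)
    tau = (1.0 - current_margin) / norm_sq   # step size of the projection
    return [wi + tau * y * xi for wi, xi in zip(w, x)]
```

After the projection the constraint holds with equality, which is why an update of this kind changes the hypothesis only as much as the current example requires.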

Power of Selective Memory. Slide 23 Example revisited The following hypothesis has cumulative loss of 2 and complexity of 2. Therefore, the number of mistakes is bounded above by 12. y = +-+-+-+-

Power of Selective Memory. Slide 24 Example revisited The following hypothesis has cumulative loss of 1 and complexity of 4. Therefore, the number of mistakes is bounded above by 18. But this tree is very shallow (figure: root 0 with two child nodes valued 1.41 and -1.41, branches labeled + and -; y = +-+-+-+-). Problem: The tree we learned is much deeper!
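The bound itself is missing from the transcript, but the two worked examples (loss 2, complexity 2, bound 12 on slide 23; loss 1, complexity 4, bound 18 on slide 24) can be checked against a candidate form. The sketch below assumes the bound is linear, a * complexity + b * loss, and solves for the two coefficients. This only shows which linear form fits the two quoted numbers; it does not establish the theorem's actual statement. The function name is hypothetical.

```python
from fractions import Fraction


def fit_linear_bound(examples):
    """Solve a * complexity + b * loss = bound for (a, b) from two examples.

    `examples` is a pair of (complexity, loss, bound) triples. Uses Cramer's
    rule with exact rational arithmetic.
    """
    (c1, l1, b1), (c2, l2, b2) = examples
    det = c1 * l2 - c2 * l1
    if det == 0:
        raise ValueError("the two examples do not determine a unique linear form")
    a = Fraction(b1 * l2 - b2 * l1, det)
    b = Fraction(c1 * b2 - c2 * b1, det)
    return a, b


# (complexity, loss, bound) as quoted on slides 23 and 24
a, b = fit_linear_bound([(2, 2, 12), (4, 1, 18)])
```

Under the linearity assumption the two examples give a = 4 and b = 2, that is, a bound of the form 4 * complexity + 2 * loss.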

Power of Selective Memory. Slide 25 Geometric Intuition

Power of Selective Memory. Slide 26 Geometric Intuition (Cont.) Let's force g_{t+1} to be sparse by "canceling" the new coordinate

Power of Selective Memory. Slide 27 Geometric Intuition (Cont.) Now we can show that:

Power of Selective Memory. Slide 28 Trading margin for sparsity We got that If is much smaller than we can get a loss bound! Problem: What happens if is very small and therefore? Solution: Tolerate small margin errors! Conclusion: If we tolerate small margin errors, we can get a sparser tree.

Power of Selective Memory. Slide 29 Automatic Calibration Problem: The value of is unknown. Solution: Use the data itself to estimate it! More specifically: Denote If we keep then we get a mistake bound

Power of Selective Memory. Slide 30 Algorithm II: Learning Self-Bounded-Depth PST Init: For t=1,2,… Get and predict Get and suffer loss If do nothing! Otherwise: Set Update w and the tree as in Algo. I, up to depth d_t

Power of Selective Memory. Slide 31 Analysis – Loss Bound Let be a sequence of examples and assume that Let be an arbitrary hypothesis Let be the loss of on the sequence of examples. Then,

Power of Selective Memory. Slide 32 Analysis – Bounded depth Under the previous conditions, the depth of all the trees learned by the algorithm is bounded above by

Power of Selective Memory. Slide 33 Example revisited Performance of Algo. II on y = +-+-+-+-…: only 3 mistakes; the last PST is of depth 5; the margin is 0.61 (after normalization); the margin of the max-margin tree (of infinite depth) is 0.7071. (Figure: the last PST, root 0, node values -.55, .55, .39, -.22, -.07, .07, .05, -.03, -.05, branches labeled + and -.)

Power of Selective Memory. Slide 34 Conclusions: discriminative online learning of PSTs; loss bound; trading margin for sparsity; automatic calibration. Future work: experiments; feature selection and extraction; support vector selection.
