Download presentation

Presentation is loading. Please wait.

Published byCorey Matthey Modified over 3 years ago

1
Near-Optimal Algorithms for Online Matrix Prediction Elad Hazan (Technion) Satyen Kale (Yahoo! Labs) Shai Shalev-Shwartz (Hebrew University)

2
Three Prediction Problems: I. Online Collaborative Filtering Users: {1, 2, …, m} Movies: {1, 2, …, n} On round t: User i t arrives and is interested in movie j t Output predicted rating p t in [-1, 1] User responds with actual rating r t in [-1, 1] Loss = (p t – r t ) 2 Comparison class: all m x n matrices with entries in [-1, 1] of trace norm τ For each such matrix W, predicted rating = W(i t, j t ) Regret = loss of alg – loss of best bounded trace-norm matrix Sum of singular values If no entries repeated, [Cesa-Bianchi, Shamir 11]: O(n 3/2 ) regret for τ = O(n)

3
Three Prediction Problems: II. Online Max Cut 2 political parties Voters: {1, 2, …, n} On round t: Voters i t, j t arrive Output prediction: votes agree or disagree Loss = 1 if incorrect prediction, 0 o.w. Comparison class: all possible bipartitions Bipartition prediction = agree if i t, j t in same partition, disagree o.w. Regret = loss of alg – loss of best bipartition Weight = #(disagree) - #(agree) Best bipartition = Max Cut! Inefficient alg using the 2 n bipartitions as experts: regret =

4
Three Prediction Problems: III. Online Gambling [Abernethy 10, Kleinberg, Niculescu-Mizil, Sharma 10] Teams: {1, 2, …, n} In round t: Teams i t, j t compete Output: prediction which team will win Loss = 1 if incorrect prediction, 0 o.w. Comparison class: all possible permutations π Permutation π prediction = i t if π(i t ) π(j t ); j t o.w. Regret = loss of alg – loss of best permutation Weight = #(i wins) - #(j wins) i j Best permutation = Min Feedback Arc Set! Inefficient alg using the n! permutations as experts: regret = Trivial bound of considered hard to improve (e.g. [Kanade, Steinke 12])

5
Results Upper BoundLower Bound Online Collaborative Filtering Online Max Cut Online Gambling Stochastic; solves $50 open problem of [Srebro, Shamir 11] By [Kleinberg, Niculescu-Mizil, Sharma 10]

6
One meta-problem to rule them all: Online Matrix Prediction (OMP) In round t: Receive pair i t, j t in [m] x [n] Output prediction p t in [-1, 1] Receive true value y t in [-1, 1] Suffer loss L(p t, y t ) Comparison class: set W of m x n matrices with entries in [-1, 1] Prediction for matrix W: entry W(i t, j t ) Regret = loss of alg – loss of best comparison matrix m x n matrices 12……n 1 2 : m

7
Online Collaborative Filtering as OMP Users: {1, 2, …, m} Movies: {1, 2, …, n} On round t: User i t arrives and is interested in movie j t Output predicted rating p t in [-1, 1] User responds with actual rating r t Loss = (p t – r t ) 2 Comparison class: W = all m x n matrices with entries in [-1, 1] of trace norm τ For each such matrix W, predicted rating = W(i t, j t )

8
Online Max Cut as OMP On round t: Voters i t, j t arrive Output prediction: votes agree or disagree Loss = 1 if incorrect prediction, 0 o.w. Comparison class: all possible bipartitions Bipartition prediction = agree if i t, j t in same partition, disagree o.w. 2 political parties Voters: {1, 2, …, n} W = all 2 n cut matrices W S corresponding to subsets S of [n] W S (i, j) = 0 if both i, j in S or [n] \ S, = 1 o.w. 00111 00111 11000 11000 11000 S S [n] \ S

9
Online Gambling as OMP Teams: {1, 2, …, n} In round t: Teams i t, j t compete Output: prediction which team will win Loss = 1 if incorrect prediction, 0 o.w. π(1)π(2)……π(n ) π(1)11111 π(2)01111 :00111 :00011 π(n ) 00001 12……n 110110 211111 :00110 :00010 n10111 Comparison class: all possible permutations π Permutation π prediction = i t if π(i t ) π(j t ); j t o.w. W = all n! permutation matrices W π corresponding to permutations π W π (i, j) = 1 if π(i) π(j) = 0 o.w.

10
Decomposability 0W WTWT 0 P = N W is (β, τ)-decomposable if where P, N are positive semidefinite Diagonal entries P ii, N ii β Sum of traces Tr(P) + Tr(N) τ Class W is (β, τ)-decomposable if every W in W is. Symmetric square matrix of order m + n

11
Main Result for (,τ)-decomposable OMP An efficient algorithm for OMP with (β, τ)-decomposable W and Lipschitz losses with regret bound

12
The Technology Theorem: Matrix Exponentiated Gradient [Tsuda, Rätsch, Warmuth 06]/ Matrix Multiplicative Weights [Arora, K. 07] algorithm Online Learning problem: in round t, Learner chooses density (i.e. psd, trace 1) matrix X t Nature reveals loss matrix M t with eigenvalues in [-1, 1] Learner suffers loss Tr(M t X t ) Goal: minimize regret = loss of learner – loss of best density matrix

13
Overview of Algorithm for OMP W W K All square symmetric X of order 2(m+n) s.t. X is positive semidefinite Diagonals X ii β Trace Tr(X) τ P0 0N W Matrix MW algorithm + Bregman projections into K 0W WTWT 0 = P - N

14
Decomposability Theorems Online Collaborative Filtering Trace norm τ matrices are ((m + n), 2τ)-decomposable. Online Max Cut Cut matrices W S are (½, 2n)-decomposable. Online Gambling Permutation matrices W π are (O(log n), O(n log n))-decomposable.

15
Decomposability Theorems Online Collaborative Filtering Trace norm τ matrices are ((m + n), 2τ)-decomposable. Online Max Cut Cut matrices W S are (½, 2n)-decomposable. Online Gambling Permutation matrices W π are (O(log n), O(n log n))-decomposable.

16
Decomposability for OCF Thm: Any symmetric matrix M of order n with entries in [-1, 1] and trace norm τ is (n, τ)-decomposable Eigenvalue decomposition: Defineand Clearly Tr(P) + Tr(N) = trace-norm(M) = τ. Diagonals of (P + N) 2 = M 2 bounded by n. So diagonals of (P + N) bounded by n. So diagonals of P, N bounded by n.

17
Decomposability Theorems Online Collaborative Filtering Trace norm τ matrices are ((m + n), 2τ)-decomposable. Online Max Cut Cut matrices W S are (½, 2n)-decomposable. Online Gambling Permutation matrices W π are (O(log n), O(n log n))-decomposable.

18
Decomposability for Online Gambling Thm: The all 1s upper triangular matrix of order n is (O(log n), O(n log n))-decomposable. T(n) = One rank-1 matrix + two non-overlapping T(n/2) B(n) = 1 + B(n/2) B(n) = O(log n).

19
Concluding Remarks Gave near-optimal algorithms for various online matrix prediction problems Exploited spectral structure of comparison matrices to get near-tight convex relaxations Solved 2 COLT open problems from [Abernethy 10] and [Shamir, Srebro 11] Open problem: get rid of the logarithmic gap between upper and lower bounds Decompositions in the paper are optimal up to constant factors, so a fundamentally different algorithm seems necessary

20
Thanks!

Similar presentations

OK

Introduction to Machine Learning Fall 2013 Perceptron (6) Prof. Koby Crammer Department of Electrical Engineering Technion 1.

Introduction to Machine Learning Fall 2013 Perceptron (6) Prof. Koby Crammer Department of Electrical Engineering Technion 1.

© 2017 SlidePlayer.com Inc.

All rights reserved.

Ads by Google

Ppt on why technology is important in our daily life Ppt on power factor improvement methods Ppt on panel discussion questions Ppt on corporate etiquettes Ppt on technical analysis of stocks Ppt on trade related intellectual property rights Ppt on rainwater harvesting techniques A ppt on loch ness monster found Ppt on first conditional games Ppt on ms access create tab