A fast algorithm for learning large scale preference relations. Vikas C. Raykar and Ramani Duraiswami (University of Maryland, College Park), Balaji Krishnapuram (Siemens Medical Solutions USA).


Slide 1: A fast algorithm for learning large scale preference relations. Vikas C. Raykar and Ramani Duraiswami, University of Maryland, College Park; Balaji Krishnapuram, Siemens Medical Solutions USA. AISTATS 2007.

Slide 2: Learning. Many learning tasks can be viewed as function estimation.

Slide 3: Learning from examples (diagram: training data fed to a learning algorithm). Not all supervised learning procedures fit the standard classification/regression framework; this talk is mainly concerned with ranking/ordering.

Slide 4: Ranking / ordering. For some applications ordering is more important. Example 1: information retrieval. Sort in the order of relevance.

Slide 5: Ranking / ordering. For some applications ordering is more important. Example 2: recommender systems. Sort in the order of preference.

Slide 6: Ranking / ordering. For some applications ordering is more important. Example 3: medical decision making. Decide over different treatment options.

Slide 7: Plan of the talk.
Ranking formulation
Algorithm
Fast algorithm
Results

Slide 8: Preference relations. Given a preference relation we can order/rank a set of instances. Goal: learn a preference relation. Training data: a set of pairwise preferences.

Slide 9: Ranking function. Goal: learn a preference relation. New goal: learn a ranking function, which provides a numerical score. Why not use a classifier/ordinal regressor as the ranking function? The ranking function is not unique.

Slide 10: Why is ranking different? The learning algorithm trains on pairwise preference relations, and the loss counts pairwise disagreements.

Slide 11: Training data, more formally. From these two we can get a set of pairwise preference relations.
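As a concrete sketch (function and variable names are mine, not from the talk), pairwise preference relations can be derived from graded labels like this:

```python
def pairwise_preferences(labels):
    """All index pairs (i, j) with labels[i] > labels[j],
    i.e. instance i is preferred over instance j."""
    return [(i, j)
            for i in range(len(labels))
            for j in range(len(labels))
            if labels[i] > labels[j]]

# three instances with graded relevance labels 2, 0 and 1
prefs = pairwise_preferences([2, 0, 1])
```

Each (i, j) pair is one training constraint: the ranking function should score instance i above instance j.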

Slide 12: Loss function: the generalized Wilcoxon-Mann-Whitney (WMW) statistic. Minimize the fraction of pairwise disagreements, i.e., maximize the fraction of pairwise agreements: the total number of pairwise agreements divided by the total number of pairwise preference relations.
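A minimal sketch of the WMW statistic as the slide defines it (names are mine):

```python
def wmw(scores, prefs):
    """Fraction of pairwise preference relations the scores agree with:
    (# of pairwise agreements) / (total # of pairwise preferences)."""
    agree = sum(1 for i, j in prefs if scores[i] > scores[j])
    return agree / len(prefs)
```

A perfect ranking gives 1.0; a fully reversed one gives 0.0.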

Slide 13: Consider a two-class problem (figure: positive and negative instances).

Slide 14: Function class: linear ranking functions. Different algorithms use different function classes: RankNet, neural networks; RankSVM, an RKHS; RankBoost, boosted decision stumps.

Slide 15: Plan of the talk.
Ranking formulation
– Training data: pairwise preference relations
– Ideal loss function: WMW statistic
– Function class: linear ranking functions
Algorithm
Fast algorithm
Results

Slide 16: The likelihood. Maximizing the WMW directly is a discrete optimization problem. Assumption: every pair is drawn independently. Model each pair with a sigmoid [Burges et al.]; choose w to maximize the log-likelihood.
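A sketch of the resulting objective, assuming a linear ranking function f(x) = w·x and the sigmoid pair model of the slide (the numerically stable log-sigmoid form is standard practice, not from the talk):

```python
import math

def log_likelihood(w, X, prefs):
    """Sum over preference pairs (i, j) of log sigma(f(x_i) - f(x_j)),
    for the linear ranking function f(x) = w . x."""
    def f(x):
        return sum(wk * xk for wk, xk in zip(w, x))
    total = 0.0
    for i, j in prefs:
        d = f(X[i]) - f(X[j])
        # log sigma(d) = min(d, 0) - log(1 + exp(-|d|)), stable for large |d|
        total += min(d, 0.0) - math.log1p(math.exp(-abs(d)))
    return total
```

Each satisfied pair contributes a term close to 0 and each violated pair a large negative term, so maximizing over w pushes preferred instances to higher scores.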

Slide 17: The MAP estimator.

Slide 18: Another interpretation. What we want to maximize: the 0-1 indicator function. What we actually maximize: the log-sigmoid. The log-sigmoid is a lower bound for the indicator function.
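The lower-bound claim can be spot-checked numerically (a quick sketch, not the talk's argument):

```python
import math

def log_sigmoid(d):
    """Numerically stable log sigma(d)."""
    return min(d, 0.0) - math.log1p(math.exp(-abs(d)))

def indicator(d):
    """0-1 indicator of a correctly ordered pair (score difference d > 0)."""
    return 1.0 if d > 0 else 0.0

# log sigma(d) <= 1[d > 0] everywhere on a grid of score differences
grid = [x / 10.0 for x in range(-50, 51)]
all_below = all(log_sigmoid(d) <= indicator(d) for d in grid)
```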

Slide 19: Lower bounding the WMW. Log-likelihood <= WMW.

Slide 20: Gradient-based learning. Use a nonlinear conjugate-gradient algorithm: it requires only gradient evaluations, no function evaluations, and no second derivatives. The gradient is given by the expression on the slide.
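As a sketch, the gradient of the pairwise log-likelihood for a linear ranking function is a sum over pairs of (1 - sigma(d)) (x_i - x_j); plain gradient ascent stands in below for the nonlinear conjugate-gradient method of the talk, and all names are mine:

```python
import math

def sigmoid(d):
    """Numerically stable logistic function."""
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)

def gradient(w, X, prefs):
    """d/dw of sum log sigma(w . (x_i - x_j)) over preference pairs."""
    g = [0.0] * len(w)
    for i, j in prefs:
        diff = [a - b for a, b in zip(X[i], X[j])]
        d = sum(wk * dk for wk, dk in zip(w, diff))
        c = 1.0 - sigmoid(d)          # derivative of log sigma at d
        for k, dk in enumerate(diff):
            g[k] += c * dk
    return g

def fit(X, prefs, dim, steps=200, lr=0.5):
    """Gradient ascent on the pairwise log-likelihood."""
    w = [0.0] * dim
    for _ in range(steps):
        w = [wk + lr * gk for wk, gk in zip(w, gradient(w, X, prefs))]
    return w
```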

Slide 21: RankNet. Training data: pairwise preference relations. Loss: cross entropy. Function class: neural net. Algorithm: backpropagation.

Slide 22: RankSVM. Training data: pairwise preference relations. Loss: pairwise disagreements. Function class: RKHS. Algorithm: SVM.

Slide 23: RankBoost. Training data: pairwise preference relations. Loss: pairwise disagreements. Function class: decision stumps. Algorithm: boosting.

Slide 24: Plan of the talk.
Ranking formulation
– Training data: pairwise preference relations
– Loss function: WMW statistic
– Function class: linear ranking functions
Algorithm
– Maximize a lower bound on WMW
– Use conjugate-gradient
– Quadratic complexity
Fast algorithm
Results

Slide 25: Key idea. Use an approximate gradient: it is extremely fast (linear time), converges to the same solution, and requires only a few more iterations.

Slide 26: Core computational primitive: weighted summation of erfc functions.
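Computed naively, this primitive costs O(MN): for M targets y_j and N weighted sources x_i, G(y_j) = sum_i w_i erfc(y_j - x_i). A direct sketch (names are mine):

```python
import math

def erfc_sum_direct(ws, xs, ys):
    """G(y_j) = sum_i w_i * erfc(y_j - x_i), evaluated naively in O(MN)."""
    return [sum(w * math.erfc(y - x) for w, x in zip(ws, xs)) for y in ys]
```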

Slide 27: Notion of approximation.

Slide 28: Example.

Slide 29: 1. Beaulieu's series expansion. Retain only the first few terms contributing to the desired accuracy; derive bounds to choose the number of terms.

Slide 30: 2. Error bounds.

Slide 31: 3. Use the truncated series.

Slide 32: 4. Regrouping. The coefficients A and B do not depend on y and can be precomputed in O(pN); once A and B are precomputed, the sums at all M targets can be evaluated in O(pM). The total cost is reduced from O(MN) to O(p(M+N)).
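The talk's expansion is Beaulieu's series; the same regrouping trick can be illustrated with a Taylor expansion of erfc about a cluster centre xc (everything below is my sketch, not the paper's exact scheme). The source-dependent moments A_k are accumulated once in O(pN); each target then costs O(p):

```python
import math

SQRT_PI = math.sqrt(math.pi)

def erfc_derivs(u, p):
    """erfc(u) and its first p-1 derivatives, using
    erfc^(k)(u) = -2/sqrt(pi) * (-1)^(k-1) * H_{k-1}(u) * exp(-u^2)
    with the physicists' Hermite recurrence H_{n+1} = 2u H_n - 2n H_{n-1}."""
    out = [math.erfc(u)]
    e = math.exp(-u * u)
    h_prev, h = 0.0, 1.0              # H_{-1} (unused) and H_0
    for k in range(1, p):
        out.append(-2.0 / SQRT_PI * (-1.0) ** (k - 1) * h * e)
        h_prev, h = h, 2.0 * u * h - 2.0 * (k - 1) * h_prev
    return out

def erfc_sum_fast(ws, xs, ys, p=10):
    """Approximate G(y) = sum_i w_i erfc(y - x_i) in O(p(N + M)) for
    clustered sources: expanding erfc(y - x) about xc separates the
    x-dependent moments A_k from the y-dependent derivative factors."""
    xc = sum(xs) / len(xs)
    A = [0.0] * p                     # A_k = sum_i w_i (xc - x_i)^k / k!
    for w, x in zip(ws, xs):
        t = w
        for k in range(p):
            A[k] += t
            t *= (xc - x) / (k + 1)
    return [sum(a, ) if False else sum(a * d for a, d in zip(A, erfc_derivs(y - xc, p)))
            for y in ys]
```

With one cluster the approximation is only accurate when the x_i stay close to xc; the paper combines its series with error bounds and space subdivision to guarantee a chosen accuracy.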

Slide 33: 5. Other tricks. Exploit the rapid saturation of the erfc function; use space subdivision; choose the parameters to achieve the error bound. See the technical report.

Slide 34: Numerical experiments.

Slide 35: Precision vs. speedup.

Slide 36: Plan of the talk.
Ranking formulation
– Training data: pairwise preference relations
– Loss function: WMW statistic
– Function class: linear ranking functions
Algorithm
– Maximize a lower bound on WMW
– Use conjugate-gradient
– Quadratic complexity
Fast algorithm
– Use fast approximate gradient
– Fast summation of erfc functions
Results

Slide 37: Datasets. 12 public benchmark datasets; five-fold cross-validation experiments; CG tolerance 1e-3; accuracy for the gradient computation 1e-6.

Slide 38: Direct vs. fast – WMW statistic.
Dataset 1: Direct 0.536, Fast 0.534
Dataset 2: 0.917
Dataset 3: 0.623
Dataset 4*: 0.979
WMW is similar for the exact and the fast approximate versions.

Slide 39: Direct vs. fast – time taken.
Dataset 1: Direct 1736 secs, Fast 2 secs
Dataset 2: Direct 6731 secs, Fast 19 secs
Dataset 3: Direct 2557 secs, Fast 4 secs
Dataset 4*: 47 secs

Slide 40: Effect of gradient approximation.

Slide 41: Comparison with other methods: RankNet (neural network), RankSVM (SVM), RankBoost (boosting).

Slide 42: Comparison with other methods. WMW is similar for all methods. The proposed method is faster than all the others; the next best time is achieved by RankBoost. Only the proposed method can handle large datasets.

Slide 43: Sample result. Dataset 8 (N=950, d=10, S=5).
Method              Time (secs)  WMW
RankNCG direct      333          0.984
RankNCG fast        3            0.984
RankNet linear      1264         0.951
RankNet two layer   2464         0.765
RankSVM linear      34           0.984
RankSVM quadratic   1332         0.996
RankBoost           6            0.958

Slide 44: Sample result. Dataset 11 (N=4177, d=9, S=3).
Method              Time (secs)  WMW
RankNCG direct      1736         0.536
RankNCG fast        2            0.534
RankNet linear
RankNet two layer
RankSVM linear
RankSVM quadratic
RankBoost           63           0.535

Slide 45: Application to collaborative filtering. Predict movie ratings for a user based on the ratings provided by other users. MovieLens dataset (www.grouplens.org): 1 million ratings (1-5), 3592 movies, 6040 users. Feature vector for each movie: the ratings provided by d other users.

Slide 46: Collaborative filtering results.

Slide 47: Collaborative filtering results (continued).

Slide 48: Plan/conclusion of the talk.
Ranking formulation
– Training data: pairwise preference relations
– Loss function: WMW statistic
– Function class: linear ranking functions
Algorithm
– Maximize a lower bound on WMW
– Use conjugate-gradient
– Quadratic complexity
Fast algorithm
– Use fast approximate gradient
– Fast summation of erfc functions
Results
– Similar accuracy as other methods
– But much, much faster

Slide 49: Conclusion (same outline as slide 48) and future work.
Future work
– Other applications: neural networks, probit regression
Code coming soon.

Slide 50: Conclusion (same outline as slide 48) and future work.
Future work
– Other applications: neural networks, probit regression
– Nonlinear kernelized variation

Slide 51: Thank you! Questions?

