Published by Dustin Tronson. Modified over 3 years ago.

1
Learning User Preferences
Jason Rennie, MIT CSAIL, jrennie@gmail.com
Advisor: Tommi Jaakkola

2
Information Extraction
Informal Communication: e-mail, mailing lists, bulletin boards
Issues:
– Context switching
– Abbreviations & shortened forms
– Variable punctuation, formatting, grammar

3
Thesis Advertisement: Outline
The thesis is not an end-to-end IE system
We address some IE problems:
1. Identifying & Resolving Named Entities
2. Tracking Context
3. Learning User Preferences

4
Identifying Named Entities
"Rialto is now open until 11pm"
Facts/opinions are usually about a named entity
Tools typically rely on punctuation, capitalization, formatting, grammar
We developed a criterion to identify topic-oriented words using occurrence statistics
[Rennie & Jaakkola, SIGIR 2005]

5
Resolving Named Entities
"They're now open until 11pm"
What does "they" refer to?
Clustering
– Group noun phrases that co-refer
McCallum & Wellner (2005)
– Excellent for proper nouns
Our contribution: better modeling of non-proper nouns (incl. pronouns)

6
Tracking Context
"The Swordfish was fabulous"
– Indirect comment on restaurant.
– Restaurant identified by context.
Use word statistics to find topic switches
Contribution: new sentence clustering algorithm

7
Learning User Preferences
Examples:
– "I loved Rialto last night."
– "Overall, Oleana was worth the money"
– "Radius wasn't bad, but wasn't great"
– "Om was purely pretentious"
Issues:
1. Translate text to partial ordering or rating
2. Predict unobserved ratings

8
Preference Problems
Single User w/ Item Features
Multi-user, no features
– Aka Collaborative Filtering

9
Single User, Item Features
User Weights: -0.1 +10+500+2
Feature Values:
Restaurant         10 Tables  #9 Park    Lumiere    Tanjore    Chennai    Rndzvous
Capacity           30         90         60         80         40         80
Price              30         60         50         30         20         40
French?            1          0          1          0          0          0
New American?      0          1          0          0          0          1
Ethnic?            0          0          0          1          1          0
Formality          2          4          3          1          0          2
Location           2          3          1          2          0          2
Preference Scores  +8         -4         +1         -7         -6         -3
Thresholds separating ratings 1 to 5: $\theta_1 = -5$, $\theta_2 = -2$, $\theta_3 = 3$, $\theta_4 = 6$
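
The step from preference scores to ratings on this slide can be sketched as a threshold lookup. This is a minimal illustration, not the author's code: the function name `score_to_rating` is hypothetical, and sending a score that ties with a threshold to the higher rating is an assumption. The scores and thresholds are the ones shown on the slide.

```python
from bisect import bisect_right

def score_to_rating(score, thresholds):
    """Map a real-valued preference score to a rating in 1..len(thresholds)+1.

    `thresholds` must be sorted ascending; the rating is one more than the
    number of thresholds at or below the score (tie handling is an assumption).
    """
    return bisect_right(thresholds, score) + 1

# Values from the slide: four thresholds separate five rating levels.
thresholds = [-5, -2, 3, 6]
scores = [8, -4, 1, -7, -6, -3]   # 10 Tables, #9 Park, Lumiere, Tanjore, Chennai, Rndzvous

ratings = [score_to_rating(s, thresholds) for s in scores]
print(ratings)
```

Under these assumptions the first four ratings come out as 5, 2, 3, 1, which agrees with the observed ratings shown on the following slide.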

10
Single User, Item Features
User Weights: ? ? ? ? ? ? ?
Feature Values:
Restaurant         10 Tables  #9 Park    Lumiere    Tanjore    Chennai    Rndzvous
Capacity           30         90         60         80         40         80
Price              30         60         50         30         20         40
French?            1          0          1          0          0          0
New American?      0          1          0          0          0          1
Ethnic?            0          0          0          1          1          0
Formality          2          4          3          1          0          2
Location           2          3          1          2          0          2
Preference Scores  ?          ?          ?          ?          ?          ?
Ratings            5          2          3          1          ?          ?

11
Many Users, No Features
Weights: ? (unknown)
Features: ? (unknown)
Ratings:
2 3 2 3 2 3
2 1 5 1 2 4
1 2 1 3 1 3
5 2 3 5 2 4
4 2 5 2 1 5
3 3 3 5 3 2
4 5 2 4 3 5
Preference Scores:
-2.5  1.4 -0.9  5.6  3.1 -1.8
-2.7  0.2 -4.2  2.1  0.2 -4.2
 2.1 -2.5  1.4 -0.9  5.6  3.1
-1.8 -2.7  0.2 -4.2  2.1 -2.5
 1.4 -0.9  5.6  3.1 -1.8 -2.7
 0.2 -4.2 -1.4  0.7  3.4 -0.8
 1.9 -2.2  4.7  2.6 -3.5 -2.1

12
Collaborative Filtering
Possible goals:
– Predict missing entries
– Cluster users or items
Applications:
– Movies, Books
– Genetic Interaction
– Network routing
– Sports performance
Ratings matrix (axes labeled users and items):
2 3 2 3 2 3
2 1 5 1 2 4
1 2 1 3 1 3
5 2 3 5 2 4
4 2 5 2 1 5
3 3 3 5 3 2
4 5 2 4 3 5

13
Outline
Single User, Features
– Loss functions, Convexity, Large Margin
– Loss function for Ratings
Many Users, No Features
– Feature Selection, Rank, SVD
– Regularization: tie together multiple tasks
– Optimization: scale to large problems
Extensions

14
This Talk: Contributions
Implementation and systematic evaluation of loss functions for Single User prediction.
Scaling Multi-user regularization to large (thousands of users/items) problems
– Analysis of optimization
Extensions
– Hybrid: features + multiple users
– Observation model & multiple ratings

15
Rating Classification
n ordered classes
Learn weight vector, thresholds
[Figure: points labeled 1, 2 and 3, with weight vector w]

16
Loss Functions
[Figure panels: 0-1, Hinge, Logistic, Margin Agreement, Smooth Hinge, Mod. Least Squares]
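
The slide only names the losses, so the following is a sketch under assumptions: each loss is written as a function of the margin `z` (label times score for a binary problem), using the standard textbook definitions. The function names are illustrative and the exact scaling used in the talk is not given in the source.

```python
import math

def zero_one(z):
    """0-1 (sign agreement) loss: penalize any non-positive margin."""
    return 0.0 if z > 0 else 1.0

def margin_agreement(z):
    """Like 0-1 loss, but requires a margin of at least 1."""
    return 0.0 if z >= 1 else 1.0

def hinge(z):
    """Hinge loss: linear penalty below margin 1 (convex bound on 0-1 loss)."""
    return max(0.0, 1.0 - z)

def smooth_hinge(z):
    """Hinge with a quadratic piece on (0, 1) so the derivative is continuous."""
    if z >= 1:
        return 0.0
    if z <= 0:
        return 0.5 - z
    return 0.5 * (1.0 - z) ** 2

def modified_least_squares(z):
    """Squared penalty below margin 1, zero above it."""
    return 0.0 if z >= 1 else (1.0 - z) ** 2

def logistic(z):
    """Logistic loss: log(1 + exp(-z)); fine for moderate |z|."""
    return math.log1p(math.exp(-z))

for z in (-1.0, 0.0, 0.5, 2.0):
    print(z, zero_one(z), margin_agreement(z), hinge(z),
          smooth_hinge(z), modified_least_squares(z), round(logistic(z), 4))
```

All but the first two are convex in `z`, which is the property the next slides rely on.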

17
Convexity
Convex function => every local minimum is a global minimum
A set is convex if every line segment between two of its points lies within the set

18
Convexity of Loss Functions
0-1 loss is not convex
– Local minima, sensitive to small changes
Convex Bound
– Large margin solution with regularization
– Stronger guarantees

19
Proportional Odds
McCullagh introduced the original rating model
– Linear interaction: weights & features
– Thresholds
– Maximum likelihood
[McCullagh, 1980]
[Figure: points labeled 1, 2 and 3, with weight vector w]
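
The slide lists the ingredients of the model but not its formula. As a sketch of the standard logit form of the proportional odds model (the symbols $w$, $x$, $\theta_k$ and $n$ are notation assumed here, not taken from the slide), the cumulative probability of a rating is a logistic function of a threshold minus the linear score:

```latex
\[
P(y \le k \mid x) \;=\; \frac{1}{1 + \exp\!\bigl(-(\theta_k - w^\top x)\bigr)},
\qquad k = 1, \dots, n-1,
\qquad \theta_1 \le \theta_2 \le \dots \le \theta_{n-1},
\]
\[
P(y = k \mid x) \;=\; P(y \le k \mid x) - P(y \le k-1 \mid x),
\qquad P(y \le 0 \mid x) = 0, \quad P(y \le n \mid x) = 1 .
\]
```

The weights and thresholds are then fit by maximizing the likelihood of the observed ratings.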

20
Immediate-Thresholds
[Figure: rating axis with levels 1 2 3 4 5]
[Shashua & Levin, 2003]

21
Some Errors are Better than Others
[Figure: a user's ratings compared with those of System 1 and System 2]

22
Not a Bound on Absolute Diff.
[Figure: rating axis with levels 1 2 3 4 5]

23
All-Thresholds Loss
[Figure: rating axis with levels 1 2 3 4 5]
[Srebro, Rennie & Jaakkola, NIPS 2004]
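
The difference between the immediate-thresholds and all-thresholds constructions can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the hinge is used as the per-threshold penalty, the function names are hypothetical, and the example thresholds are borrowed from the single-user slide.

```python
def hinge(z):
    """Per-threshold margin penalty (an assumed choice; any margin loss works)."""
    return max(0.0, 1.0 - z)

def immediate_threshold_loss(score, rating, thresholds, penalty=hinge):
    """Penalize only the two thresholds that bound the true rating's interval.

    thresholds[k-1] separates rating k from rating k+1, for k = 1..n-1.
    """
    loss = 0.0
    if rating > 1:                      # lower neighbor: score should lie above it
        loss += penalty(score - thresholds[rating - 2])
    if rating <= len(thresholds):       # upper neighbor: score should lie below it
        loss += penalty(thresholds[rating - 1] - score)
    return loss

def all_threshold_loss(score, rating, thresholds, penalty=hinge):
    """Penalize every threshold that the score falls on the wrong side of."""
    loss = 0.0
    for k, theta in enumerate(thresholds, start=1):
        if k < rating:                  # thresholds below the true rating
            loss += penalty(score - theta)
        else:                           # thresholds at or above the true rating
            loss += penalty(theta - score)
    return loss

thresholds = [-5, -2, 3, 6]
# A score of 8 predicts rating 5; compare a near miss (true 4) with a far miss (true 1).
print(immediate_threshold_loss(8, 4, thresholds), all_threshold_loss(8, 4, thresholds))
print(immediate_threshold_loss(8, 1, thresholds), all_threshold_loss(8, 1, thresholds))
```

With these numbers the two losses agree on the near miss, while the all-thresholds loss adds a term for every threshold crossed on the far miss, so larger rating errors cost more.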

24
Experiments
               Multi-Class  Imm-Thresh  All-Thresh  p-value
MLS            .7486        .7491       .6700       1.7e-18
Hinge          .7433        .7628       .6702       6.6e-17
Logistic       .7490        .7248       .6623       7.3e-22
Least Squares: 1.3368
[Rennie & Srebro, IJCAI 2005]

25
Many Users, No Features
Weights: ? (unknown)
Features: ? (unknown)
Ratings:
2 3 2 3 2 3
2 1 5 1 2 4
1 2 1 3 1 3
5 2 3 5 2 4
4 2 5 2 1 5
3 3 3 5 3 2
4 5 2 4 3 5
Preference Scores:
-2.5  1.4 -0.9  5.6  3.1 -1.8
-2.7  0.2 -4.2  2.1  0.2 -4.2
 2.1 -2.5  1.4 -0.9  5.6  3.1
-1.8 -2.7  0.2 -4.2  2.1 -2.5
 1.4 -0.9  5.6  3.1 -1.8 -2.7
 0.2 -4.2 -1.4  0.7  3.4 -0.8
 1.9 -2.2  4.7  2.6 -3.5 -2.1

26
Background: $L_p$-norms
$L_0$: number of non-zero entries: $\|\cdot\|_0 = 3$
$L_1$: sum of absolute values: $\|\cdot\|_1 = 5$
$L_2$: Euclidean length: $\|\cdot\|_2 = 2$
General: $\|v\|_p = \left(\sum_i |v_i|^p\right)^{1/p}$
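
As a small worked example of the general formula, the sketch below computes the three norms for one vector. The vector `[3, 0, -4]` and the helper name `lp_norm` are illustrative choices, not values from the slide.

```python
def lp_norm(v, p):
    """Return the L_p norm of a sequence of numbers.

    p == 0 is treated as the count of non-zero entries, the usual convention
    (it is not a true norm); p > 0 uses (sum_i |v_i|^p)^(1/p).
    """
    if p == 0:
        return sum(1 for x in v if x != 0)
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

v = [3, 0, -4]          # illustrative vector
l0 = lp_norm(v, 0)      # two non-zero entries
l1 = lp_norm(v, 1)      # |3| + |0| + |-4|
l2 = lp_norm(v, 2)      # sqrt(9 + 16)
print(l0, l1, l2)
```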

27
Background: Feature Selection
Objective: Loss + Regularization
[Figure panels: $L_2$ squared, $L_1$]

28
Singular Value Decomposition
$X = U S V^\top$
– U, V: orthogonal (rotation)
– S: diagonal, non-negative
Eigenvalues of $X X^\top = U S V^\top V S U^\top = U S S U^\top$ are the squared singular values of X
Rank = $\|s\|_0$
SVD: used to obtain the least-squares low-rank approximation

29
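Low Rank Matrix Factorization
[Figure: $U \times V^\top \approx Y$, where $X = U V^\top$ has rank k; rows of the example rating matrix Y (two rows have a missing entry): 245142 312254 424131 33424 231432 22145 241423 131143 422531]
Sum-Squared Loss, Fully Observed Y: use SVD to find the global optimum
Classification Error Loss, Partially Observed Y: non-convex, no explicit solution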

30
Low-Rank: Non-Convex Set
[Figure: regions labeled Rank 1 and Rank 2]
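
A two-by-two example makes the non-convexity concrete. The matrices $A$ and $B$ below are illustrative and do not come from the slide: both have rank 1, yet their midpoint has rank 2, so the set of matrices with rank at most 1 does not contain the line segment between them.

```latex
\[
A = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \qquad
B = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}, \qquad
\operatorname{rank}(A) = \operatorname{rank}(B) = 1,
\]
\[
\tfrac{1}{2} A + \tfrac{1}{2} B
= \begin{pmatrix} \tfrac{1}{2} & 0 \\ 0 & \tfrac{1}{2} \end{pmatrix},
\qquad
\operatorname{rank}\!\left(\tfrac{1}{2} A + \tfrac{1}{2} B\right) = 2 .
\]
```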

31
Trace Norm Regularization
[Fazel et al., 2001]
Trace Norm: sum of singular values

32
Many Users, No Features
Weights: U
Features: V
Ratings (Y):
2 3 2 3 2 3
2 1 5 1 2 4
1 2 1 3 1 3
5 2 3 5 2 4
4 2 5 2 1 5
3 3 3 5 3 2
4 5 2 4 3 5
Preference Scores (X):
-2.5  1.4 -0.9  5.6  3.1 -1.8
-2.7  0.2 -4.2  2.1  0.2 -4.2
 2.1 -2.5  1.4 -0.9  5.6  3.1
-1.8 -2.7  0.2 -4.2  2.1 -2.5
 1.4 -0.9  5.6  3.1 -1.8 -2.7
 0.2 -4.2 -1.4  0.7  3.4 -0.8
 1.9 -2.2  4.7  2.6 -3.5 -2.1

33
Max Margin Matrix Factorization
Objective: All-Thresholds Loss + Trace Norm
Convex function of X and $\theta$
Low rank in X
[Srebro, Rennie & Jaakkola, NIPS 2004]
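
The objective itself did not survive on this slide. A sketch of the form it usually takes in the cited work is given below; the symbols are assumptions introduced here: $\|X\|_\Sigma$ is the trace norm, $C$ a trade-off constant, $S$ the set of observed (user, item) pairs, $R$ the number of rating levels, $h$ a margin penalty such as the hinge, and $\theta_{ir}$ the thresholds of user $i$.

```latex
\[
\min_{X,\,\theta} \;\; \|X\|_\Sigma
\;+\; C \sum_{(i,j) \in S} \; \sum_{r=1}^{R-1}
h\!\bigl( T_{ij}^{r} \, (\theta_{ir} - X_{ij}) \bigr),
\qquad
T_{ij}^{r} =
\begin{cases}
+1 & \text{if } r \ge Y_{ij}, \\
-1 & \text{if } r < Y_{ij}.
\end{cases}
\]
```

The inner sum is the all-thresholds loss for one observed rating, and the trace norm term is what encourages a low-rank $X$ while keeping the problem convex.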

34
Properties of the Trace Norm
The factorization $U\sqrt{S}$, $V\sqrt{S}$ minimizes both quantities
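
The two quantities referred to here are not legible on the slide. The standard identity this statement matches, written with assumed notation ($\|\cdot\|_F$ the Frobenius norm, $\sigma_i$ the singular values of $X$), is:

```latex
\[
\|X\|_\Sigma \;=\; \sum_i \sigma_i
\;=\; \min_{X = A B^\top} \|A\|_F \, \|B\|_F
\;=\; \min_{X = A B^\top} \tfrac{1}{2} \bigl( \|A\|_F^2 + \|B\|_F^2 \bigr).
\]
```

With the SVD $X = U S V^\top$, taking $A = U\sqrt{S}$ and $B = V\sqrt{S}$ gives $A B^\top = X$ and $\|A\|_F^2 = \|B\|_F^2 = \operatorname{tr}(S) = \sum_i \sigma_i$, so both minima are attained by this factorization.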

35
Factorized Optimization
Factorized Objective (tight bound):
Gradient descent: $O(n^3)$ per round
Stationary points, but no local minima
[Rennie & Srebro, ICML 2005]
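
The factorized objective is missing from the slide. A sketch of one form consistent with the trace norm identity is shown below; all symbols are assumptions introduced here ($U_i$ and $V_j$ are rows of the factor matrices, $C$ a trade-off constant, $S$ the observed entries, $h$ a margin penalty, $T_{ij}^{r} = +1$ if $r \ge Y_{ij}$ and $-1$ otherwise).

```latex
\[
J(U, V, \theta) \;=\;
\tfrac{1}{2} \bigl( \|U\|_F^2 + \|V\|_F^2 \bigr)
\;+\; C \sum_{(i,j) \in S} \; \sum_{r=1}^{R-1}
h\!\bigl( T_{ij}^{r} \, (\theta_{ir} - U_i V_j^\top) \bigr).
\]
```

Since $\tfrac{1}{2}(\|U\|_F^2 + \|V\|_F^2) \ge \|U V^\top\|_\Sigma$ with equality at the minimizing factorization, this upper-bounds the trace-norm objective and is tight at the optimum, which is one reading of "tight bound" on the slide.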

36
Collaborative Prediction Results
Data sets (size, sparsity): EachMovie 36656x1648, 96%; MovieLens 6040x3952, 96%
               EachMovie                   MovieLens
Algorithm      Weak Error  Strong Error    Weak Error  Strong Error
URP            .8596       .8859           .6946       .7104
Attitude       .8787       .8845           .6912       .7000
MMMF           .8548       .8439           .6650       .6725
[URP & Attitude: Marlin, 2004]
[MMMF: Rennie & Srebro, 2005]

37
Extensions
Multi-user + Features
Observation model
– Predict which restaurants a user will rate, and
– The rating she will make
Multiple ratings per user/restaurant
– E.g. Food, Service and Décor ratings
SVD Parameterization

38
Multi-User + Features
Feature parameters (V):
– Some are fixed
– Some are learned
Learn weights (U) for all features
Fixed part of V does not affect regularization
[Figure: V split into Fixed Features and Learned Features]

39
Observation Model
Common assumption: ratings observed at random
Restaurant selection:
– Geography, popularity, price, food style
Remove bias: model observation process

40
Observation Model
Model as binary classification
Add binary classification loss
Tie together rating and observation models
$X = U_X V^\top$, $W = U_W V^\top$

41
Multiple Ratings
Users may provide multiple ratings:
– Service, Décor, Food
Add in loss functions
Stack parameter matrices for regularization

42
SVD Parameterization
Too many parameters: $U A A^{-1} V^\top = X$ is another factorization of X
Alternate: U, S, V
– U, V orthogonal, S diagonal
Advantages:
– Not over-parameterized
– Exact objective (not a bound)
– No stationary points

43
Summary
Loss function for ratings
Regularization for multiple users
Scaled MMMF to large problems (e.g. > 1000x1000)
Trace norm: widely applicable
Extensions
Code: http://people.csail.mit.edu/jrennie/matlab

44
Thanks!
Helen, for supporting me for 7.5 years!
Tommi Jaakkola, for answering all my questions and directing me to the end!
Mike Collins and Tommy Poggio for additional guidance.
Nati Srebro & John Barnett for endless valuable discussions and ideas.
Amir Globerson, David Sontag, Luis Ortiz, Luis Perez-Breva, Alan Qi, Patrycja Missiuro & all past members of Tommi's reading group for paper discussions, conference trips and feedback on my talks.
Many, many others who have helped me along the way!

45
Low-Rank Optimization
[Figure: low-rank set, with labels Objective Minimum, Low-Rank Minimum and Low-Rank Local Minimum]
