
Slide 1: Learning User Preferences
Jason Rennie, MIT CSAIL (jrennie@gmail.com)
Advisor: Tommi Jaakkola

Slide 2: Information Extraction
Informal communication: e-mail, mailing lists, bulletin boards.
Issues:
- Context switching
- Abbreviations & shortened forms
- Variable punctuation, formatting, grammar

Slide 3: Thesis Advertisement: Outline
This thesis is not an end-to-end IE system. We address some IE problems:
1. Identifying & resolving named entities
2. Tracking context
3. Learning user preferences

Slide 4: Identifying Named Entities
Example: "Rialto is now open until 11pm"
Facts and opinions are usually about a named entity. Existing tools typically rely on punctuation, capitalization, formatting, and grammar. We developed a criterion for identifying topic-oriented words using occurrence statistics [Rennie & Jaakkola, SIGIR 2005].

Slide 5: Resolving Named Entities
Example: "They're now open until 11pm". What does "they" refer to?
Clustering: group noun phrases that co-refer [McCallum & Wellner, 2005]; excellent for proper nouns.
Our contribution: better modeling of non-proper nouns (including pronouns).

Slide 6: Tracking Context
Example: "The Swordfish was fabulous"
- An indirect comment on a restaurant
- The restaurant is identified by context
We use word statistics to find topic switches.
Contribution: a new sentence clustering algorithm.

Slide 7: Learning User Preferences
Examples:
- "I loved Rialto last night."
- "Overall, Oleana was worth the money."
- "Radius wasn't bad, but wasn't great."
- "Om was purely pretentious."
Issues:
1. Translate text to a partial ordering or rating
2. Predict unobserved ratings

Slide 8: Preference Problems
- Single user with item features
- Multiple users, no features (a.k.a. collaborative filtering)

Slide 9: Single User, Item Features
[Figure: a worked example. Six restaurants (10 Tables, #9 Park, Lumiere, Tanjore, Chennai, Rendezvous) are described by feature values (capacity, price, French?, New American?, ethnic?, formality, location). A user weight vector maps the feature values to preference scores, which match the observed ratings.]

Slide 10: Single User, Item Features
[Figure: the same example with the user weights and preference scores unknown. Given the feature values and a few observed ratings (some missing), the task is to learn the weights.]
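The setup on these two slides is a linear model: preference scores are the product of item features with a user weight vector. A minimal sketch with NumPy, where the feature matrix and weight vector are made-up illustrative values, not the numbers from the slide:

```python
import numpy as np

# Rows: restaurants; columns: item features (e.g. capacity, price, cuisine
# flags, formality, location). All values are invented for illustration.
X = np.array([
    [30.0, 30.0, 1, 0, 0, 2, 2],
    [90.0, 60.0, 0, 1, 0, 4, 3],
    [60.0, 50.0, 1, 0, 0, 3, 1],
    [80.0, 30.0, 0, 0, 1, 1, 2],
])

# Hypothetical user weight vector: one weight per feature.
w = np.array([-0.1, 1.0, 5.0, 0.0, 0.0, 2.0, -1.0])

# Preference scores are a linear function of the features.
scores = X @ w

# Ranking items by score gives the predicted preference order.
order = np.argsort(-scores)
print(scores)  # [34. 56. 54. 22.]
print(order)   # [1 2 0 3]
```

Learning the weights from observed ratings (the unknowns on slide 10) is then a regression or ordinal-classification problem over w.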

Slide 11: Many Users, No Features
[Figure: each user has an unknown weight vector and each item an unknown feature vector; their product gives preference scores, which generate the partially observed ratings matrix. Both factors must be learned.]

Slide 12: Collaborative Filtering
[Figure: a users-by-items ratings matrix.]
Possible goals:
- Predict missing entries
- Cluster users or items
Applications:
- Movies, books
- Genetic interaction
- Network routing
- Sports performance

Slide 13: Outline
Single user, features:
- Loss functions, convexity, large margin
- A loss function for ratings
Many users, no features:
- Feature selection, rank, SVD
- Regularization: tie together multiple tasks
- Optimization: scale to large problems
Extensions

Slide 14: This Talk: Contributions
- Implementation and systematic evaluation of loss functions for single-user prediction
- Scaling multi-user regularization to large problems (thousands of users/items), with an analysis of the optimization
- Extensions: hybrid (features + multiple users); observation model & multiple ratings

Slide 15: Rating Classification
n ordered classes. Learn a weight vector and thresholds.
[Figure: items projected onto the weight vector w, with thresholds dividing the line into rating classes 1, 2, 3.]
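A sketch of threshold-based rating prediction with made-up weights and thresholds (illustrative values, not from the talk): the score w·x is mapped to the ordinal class whose threshold interval contains it.

```python
import numpy as np

def predict_rating(x, w, thresholds):
    """Map the score w.x to an ordinal rating in 1..len(thresholds)+1."""
    score = np.dot(w, x)
    # The rating is 1 plus the number of thresholds the score exceeds.
    return int(np.sum(score > np.asarray(thresholds))) + 1

# Hypothetical weights and thresholds (illustrative values).
w = np.array([0.5, -1.0, 2.0])
thresholds = [-2.0, 0.0, 1.5, 3.0]  # four thresholds -> five rating classes

print(predict_rating(np.array([1.0, 1.0, 1.0]), w, thresholds))  # score 1.5 -> rating 3
```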

Slide 16: Loss Functions
[Figure: plots of the loss functions considered: 0-1, hinge, logistic, margin agreement, smooth hinge, and modified least squares.]
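Sketches of the margin-based losses named on this slide, written as functions of the margin z = y·(w·x); these are the standard formulas, with the smooth hinge in the piecewise form used by Rennie & Srebro:

```python
import numpy as np

def zero_one(z):
    """0-1 loss: 1 if the margin is non-positive, else 0."""
    return np.where(z <= 0, 1.0, 0.0)

def hinge(z):
    """Hinge loss: max(0, 1 - z)."""
    return np.maximum(0.0, 1.0 - z)

def logistic(z):
    """Logistic loss: log(1 + exp(-z))."""
    return np.log1p(np.exp(-z))

def smooth_hinge(z):
    """Smooth hinge: linear for z <= 0, quadratic for 0 < z < 1, zero for z >= 1."""
    return np.where(z >= 1, 0.0,
           np.where(z <= 0, 0.5 - z, 0.5 * (1.0 - z) ** 2))

def modified_least_squares(z):
    """Modified least squares: max(0, 1 - z)^2."""
    return np.maximum(0.0, 1.0 - z) ** 2

z = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
print(hinge(z))  # [3. 2. 1. 0. 0.]
```

All but the 0-1 loss are convex in z, which is the point developed on the next slides.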

Slide 17: Convexity
A convex function has no local minima (other than the global minimum).
A set is convex if every line segment between two of its points lies entirely within the set.

Slide 18: Convexity of Loss Functions
The 0-1 loss is not convex: it has local minima and is sensitive to small changes.
Minimizing a convex bound on the 0-1 loss instead gives:
- A large-margin solution with regularization
- Stronger guarantees

Slide 19: Proportional Odds
McCullagh introduced the original rating model [McCullagh, 1980]:
- Linear interaction between weights and features
- Thresholds
- Fit by maximum likelihood
[Figure: items projected onto w, with thresholds separating rating classes.]

Slide 20: Immediate-Thresholds
[Figure: the immediate-thresholds construction penalizes a prediction only against the two thresholds adjacent to the true rating class.] [Shashua & Levin, 2003]

Slide 21: Some Errors are Better than Others
[Figure: a user's true ratings compared against two systems' predictions; a system whose errors are near-misses should be preferred over one that is off by many rating levels.]

Slide 22: Not a Bound on Absolute Difference
[Figure: the immediate-thresholds loss does not bound the absolute difference between the predicted and true rating.]

Slide 23: All-Thresholds Loss
[Figure: the all-thresholds loss sums a margin penalty over every threshold, so predictions further from the true rating incur larger loss.] [Srebro, Rennie & Jaakkola, NIPS 2004]
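A sketch of the all-thresholds loss for a single example, assuming hinge as the per-threshold loss: every threshold contributes a term, with the sign chosen so that thresholds below the true rating push the score up and thresholds above push it down.

```python
def all_thresholds_loss(score, rating, thresholds):
    """All-thresholds loss for one example.

    score:      real-valued prediction w.x
    rating:     true ordinal label in 1..len(thresholds)+1
    thresholds: sorted cut points theta_1 < ... < theta_k
    """
    loss = 0.0
    for k, theta in enumerate(thresholds, start=1):
        # s = +1 for thresholds below the true rating, -1 for those above.
        s = 1.0 if k < rating else -1.0
        loss += max(0.0, 1.0 - s * (score - theta))  # hinge per threshold
    return loss

thresholds = [-1.5, 0.0, 1.5]  # three thresholds -> four rating classes
print(all_thresholds_loss(2.0, 4, thresholds))   # 0.5: score near class 4
print(all_thresholds_loss(-2.0, 4, thresholds))  # 9.0: score far from class 4
```

Because every violated threshold adds a term, the loss grows with the distance between predicted and true rating, which is exactly the property slide 22 shows the immediate-thresholds construction lacks.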

Slide 24: Experiments

              Multi-Class   Imm-Thresh   All-Thresh   p-value
  MLS         .7486         .7491        .6700        1.7e-18
  Hinge       .7433         .7628        .6702        6.6e-17
  Logistic    .7490         .7248        .6623        7.3e-22

Least squares: 1.3368
[Rennie & Srebro, IJCAI 2005]

Slide 25: Many Users, No Features
[Figure: the partially observed ratings matrix factored as user weights times item features, repeated from slide 11.]

Slide 26: Background: Lp Norms
- L0: number of non-zero entries (slide example: ||.||_0 = 3)
- L1: sum of absolute values (slide example: ||.||_1 = 5)
- L2: Euclidean length (slide example: ||.||_2 = 2)
- General: ||v||_p = (sum_i |v_i|^p)^(1/p)
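A quick check of these definitions with NumPy; the vector v here is illustrative, not the example vector from the slide figure.

```python
import numpy as np

v = np.array([0.0, 3.0, -4.0, 0.0, 1.0])  # illustrative vector

l0 = np.count_nonzero(v)   # L0: number of non-zero entries
l1 = np.sum(np.abs(v))     # L1: sum of absolute values
l2 = np.linalg.norm(v)     # L2: Euclidean length

def lp_norm(v, p):
    """General Lp norm: (sum_i |v_i|^p)^(1/p)."""
    return np.sum(np.abs(v) ** p) ** (1.0 / p)

print(l0, l1, l2)  # 3, 8.0, sqrt(26)
```

The L0 "norm" counts non-zeros, which is why it reappears below as the rank of a matrix (applied to the vector of singular values) and why L1 and the trace norm serve as its convex surrogates.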

Slide 27: Background: Feature Selection
Objective: Loss + Regularization
[Figure: comparison of squared-L2 and L1 regularizers.]

Slide 28: Singular Value Decomposition
X = U S V', where:
- U, V: orthogonal (rotations)
- S: diagonal, non-negative
The eigenvalues of X X' = U S V' V S U' = U S^2 U' are the squared singular values of X.
Rank(X) = ||s||_0, the number of non-zero singular values.
The SVD is used to obtain the least-squares low-rank approximation.
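These facts can be checked directly with NumPy: the squared singular values match the eigenvalues of X X', and truncating the SVD gives the least-squares low-rank approximation (the Eckart-Young theorem).

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4))

U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Squared singular values equal the top eigenvalues of X X'.
eigvals = np.sort(np.linalg.eigvalsh(X @ X.T))[::-1][: len(s)]
assert np.allclose(eigvals, s ** 2)

# Rank = number of non-zero singular values (L0 norm of s).
rank = int(np.sum(s > 1e-10))

# Least-squares rank-k approximation: keep the top k singular values.
k = 2
X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print(rank, np.linalg.norm(X - X_k))
```

The Frobenius error of the rank-k truncation is exactly the norm of the discarded singular values, which is why the SVD solves the fully observed sum-squared-loss problem on the next slide.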

Slide 29: Low-Rank Matrix Factorization
Approximate the ratings matrix Y by X = U x V of rank k.
- Sum-squared loss, fully observed Y: use the SVD to find the global optimum.
- Classification-error loss, partially observed Y: non-convex, with no explicit solution.

Slide 30: Low Rank is a Non-Convex Set
[Figure: the sum of two rank-1 matrices can have rank 2, so the set of low-rank matrices is not convex.]

Slide 31: Trace Norm Regularization
Trace norm: the sum of the singular values [Fazel et al., 2001].
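A small NumPy sketch of the trace norm (also called the nuclear norm) as the sum of singular values, playing the same role for rank that the L1 norm plays for L0:

```python
import numpy as np

def trace_norm(X):
    """Trace (nuclear) norm: the sum of the singular values of X."""
    return float(np.sum(np.linalg.svd(X, compute_uv=False)))

# Rank-1 sanity check: for X = u v', the only singular value is ||u|| * ||v||.
u = np.array([1.0, 2.0])
v = np.array([3.0, 0.0, 4.0])
X = np.outer(u, v)
print(trace_norm(X))  # sqrt(5) * 5
```

Unlike rank, the trace norm is convex, which is what makes the MMMF objective on slide 33 a convex function of X.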

Slide 32: Many Users, No Features
[Figure: the ratings matrix Y is modeled via X = U V', where U holds the user weights and V the item features.]

Slide 33: Max-Margin Matrix Factorization
All-thresholds loss with trace-norm regularization: a convex function of X that encourages low rank in X. [Srebro, Rennie & Jaakkola, NIPS 2004]

Slide 34: Properties of the Trace Norm
The trace norm is the minimum, over factorizations X = U V', of (||U||_F^2 + ||V||_F^2) / 2. Writing the SVD as X = U_0 S V_0', the factorization U = U_0 sqrt(S), V = V_0 sqrt(S) minimizes both quantities.

Slide 35: Factorized Optimization
The factorized objective is a tight bound on the trace-norm objective.
Gradient descent costs O(n^3) per round.
The factorized objective has stationary points, but no local minima. [Rennie & Srebro, ICML 2005]
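A minimal sketch of factorized optimization, using squared loss on the observed entries for simplicity (the thesis uses the all-thresholds loss): the trace-norm penalty on X = U V' is replaced by the Frobenius bound (||U||_F^2 + ||V||_F^2) / 2 from slide 34, and plain gradient descent is run on U and V. Sizes, rates, and data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k, lam, lr = 8, 6, 3, 0.1, 0.01

# Partially observed ratings in 1..5; mask marks the observed entries.
Y = rng.integers(1, 6, size=(n_users, n_items)).astype(float)
mask = rng.random((n_users, n_items)) < 0.7

# Factorized parameterization X = U V'; small random initialization.
U = 0.1 * rng.standard_normal((n_users, k))
V = 0.1 * rng.standard_normal((n_items, k))

def objective(U, V):
    # Squared loss on observed entries + Frobenius bound on the trace norm.
    R = mask * (U @ V.T - Y)
    return 0.5 * np.sum(R ** 2) + 0.5 * lam * (np.sum(U ** 2) + np.sum(V ** 2))

obj0 = objective(U, V)
for _ in range(1000):
    R = mask * (U @ V.T - Y)   # residual on observed entries only
    gU = R @ V + lam * U       # gradient w.r.t. U
    gV = R.T @ U + lam * V     # gradient w.r.t. V
    U -= lr * gU
    V -= lr * gV

print(obj0, objective(U, V))   # the objective drops substantially
```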

Slide 36: Collaborative Prediction Results
Datasets (size, sparsity): EachMovie, 36656 x 1648, 96% sparse; MovieLens, 6040 x 3952, 96% sparse.

              EachMovie             MovieLens
              Weak      Strong      Weak      Strong
  URP         .8596     .8859       .6946     .7104
  Attitude    .8787     .8845       .6912     .7000
  MMMF        .8548     .8439       .6650     .6725

[URP & Attitude: Marlin, 2004] [MMMF: Rennie & Srebro, 2005]

Slide 37: Extensions
- Multi-user + features
- Observation model: predict which restaurants a user will rate, and the rating she will give
- Multiple ratings per user/restaurant (e.g. food, service, and décor ratings)
- SVD parameterization

Slide 38: Multi-User + Features
[Figure: the item-feature matrix V split into fixed and learned columns.]
Feature parameters (V): some are fixed, some are learned.
Learn weights (U) for all features. The fixed part of V does not affect the regularization.

Slide 39: Observation Model
The common assumption is that ratings are observed at random. In reality, restaurant selection depends on geography, popularity, price, and food style. To remove this bias, we model the observation process.

Slide 40: Observation Model
- Model observation as binary classification (is an entry rated or not?)
- Add a binary-classification loss term
- Tie the rating and observation models together through the shared item features: X = U_X V, W = U_W V

Slide 41: Multiple Ratings
Users may provide multiple ratings (e.g. service, décor, food):
- Add a loss term for each rating type
- Stack the parameter matrices for regularization

Slide 42: SVD Parameterization
The U, V factorization has too many parameters: (U A)(A^-1 V) = X is another factorization of X for any invertible A.
Alternative: parameterize as U, S, V with U, V orthogonal and S diagonal.
Advantages:
- Not over-parameterized
- Exact objective (not a bound)
- No stationary points

Slide 43: Summary
- A loss function for ratings
- Regularization for multiple users
- Scaled MMMF to large problems (e.g. larger than 1000 x 1000)
- The trace norm is widely applicable
- Extensions
Code: http://people.csail.mit.edu/jrennie/matlab

Slide 44: Thanks!
Helen, for supporting me for 7.5 years! Tommi Jaakkola, for answering all my questions and directing me to the end! Mike Collins and Tomaso Poggio for additional guidance. Nati Srebro & John Barnett for endless valuable discussions and ideas. Amir Globerson, David Sontag, Luis Ortiz, Luis Perez-Breva, Alan Qi, Patrycja Missiuro, and all past members of Tommi's reading group for paper discussions, conference trips, and feedback on my talks. Many, many others who have helped me along the way!

Slide 45: Low-Rank Optimization
[Figure: level sets of the objective with the low-rank set overlaid, contrasting the unconstrained objective minimum, the low-rank minimum, and a low-rank local minimum.]
