
Slide 1: Learning User Preferences
Jason Rennie, MIT CSAIL (jrennie@gmail.com)
Advisor: Tommi Jaakkola

Slide 2: Information Extraction
Informal communication: e-mail, mailing lists, bulletin boards.
Issues:
- Context switching
- Abbreviations & shortened forms
- Variable punctuation, formatting, grammar

Slide 3: Thesis Advertisement: Outline
This thesis is not an end-to-end IE system. We address some IE problems:
1. Identifying & resolving named entities
2. Tracking context
3. Learning user preferences

Slide 4: Identifying Named Entities
Example: "Rialto is now open until 11pm"
Facts and opinions are usually about a named entity. Existing tools typically rely on punctuation, capitalization, formatting, and grammar. We developed a criterion for identifying topic-oriented words using occurrence statistics [Rennie & Jaakkola, SIGIR 2005].

Slide 5: Resolving Named Entities
Example: "They're now open until 11pm". What does "they" refer to?
Clustering: group noun phrases that co-refer [McCallum & Wellner, 2005]; excellent for proper nouns.
Our contribution: better modeling of non-proper nouns (including pronouns).

Slide 6: Tracking Context
Example: "The Swordfish was fabulous"
- An indirect comment on a restaurant
- The restaurant is identified by context
We use word statistics to find topic switches.
Contribution: a new sentence clustering algorithm.

Slide 7: Learning User Preferences
Examples:
- "I loved Rialto last night."
- "Overall, Oleana was worth the money."
- "Radius wasn't bad, but wasn't great."
- "Om was purely pretentious."
Issues:
1. Translate text to a partial ordering or rating
2. Predict unobserved ratings

Slide 8: Preference Problems
- Single user with item features
- Multiple users, no features (a.k.a. collaborative filtering)

Slide 9: Single User, Item Features
[Figure: a worked example. Six restaurants (10 Tables, #9 Park, Lumiere, Tanjore, Chennai, Rendezvous) are described by feature values (capacity, price, French?, New American?, ethnic?, formality, location). A user weight vector maps the feature values to preference scores, which match the observed ratings.]

Slide 10: Single User, Item Features
[Figure: the same example with the user weights and preference scores unknown. Given the feature values and a few observed ratings (some missing), the task is to learn the weights.]
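The setup on these two slides is a linear model: preference scores are the product of item features with a user weight vector. A minimal sketch with NumPy, where the feature matrix and weight vector are made-up illustrative values, not the numbers from the slide:

```python
import numpy as np

# Rows: restaurants; columns: item features (e.g. capacity, price, cuisine
# flags, formality, location). All values are invented for illustration.
X = np.array([
    [30.0, 30.0, 1, 0, 0, 2, 2],
    [90.0, 60.0, 0, 1, 0, 4, 3],
    [60.0, 50.0, 1, 0, 0, 3, 1],
    [80.0, 30.0, 0, 0, 1, 1, 2],
])

# Hypothetical user weight vector: one weight per feature.
w = np.array([-0.1, 1.0, 5.0, 0.0, 0.0, 2.0, -1.0])

# Preference scores are a linear function of the features.
scores = X @ w

# Ranking items by score gives the predicted preference order.
order = np.argsort(-scores)
print(scores)  # [34. 56. 54. 22.]
print(order)   # [1 2 0 3]
```

Learning the weights from observed ratings (the unknowns on slide 10) is then a regression or ordinal-classification problem over w.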

Slide 11: Many Users, No Features
[Figure: each user has an unknown weight vector and each item an unknown feature vector; their product gives preference scores, which generate the partially observed ratings matrix. Both factors must be learned.]

Slide 12: Collaborative Filtering
[Figure: a users-by-items ratings matrix.]
Possible goals:
- Predict missing entries
- Cluster users or items
Applications:
- Movies, books
- Genetic interaction
- Network routing
- Sports performance

Slide 13: Outline
Single user, features:
- Loss functions, convexity, large margin
- A loss function for ratings
Many users, no features:
- Feature selection, rank, SVD
- Regularization: tie together multiple tasks
- Optimization: scale to large problems
Extensions

Slide 14: This Talk: Contributions
- Implementation and systematic evaluation of loss functions for single-user prediction
- Scaling multi-user regularization to large problems (thousands of users/items), with an analysis of the optimization
- Extensions: hybrid (features + multiple users); observation model & multiple ratings

Slide 15: Rating Classification
n ordered classes. Learn a weight vector and thresholds.
[Figure: items projected onto the weight vector w, with thresholds dividing the line into rating classes 1, 2, 3.]
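A sketch of threshold-based rating prediction with made-up weights and thresholds (illustrative values, not from the talk): the score w·x is mapped to the ordinal class whose threshold interval contains it.

```python
import numpy as np

def predict_rating(x, w, thresholds):
    """Map the score w.x to an ordinal rating in 1..len(thresholds)+1."""
    score = np.dot(w, x)
    # The rating is 1 plus the number of thresholds the score exceeds.
    return int(np.sum(score > np.asarray(thresholds))) + 1

# Hypothetical weights and thresholds (illustrative values).
w = np.array([0.5, -1.0, 2.0])
thresholds = [-2.0, 0.0, 1.5, 3.0]  # four thresholds -> five rating classes

print(predict_rating(np.array([1.0, 1.0, 1.0]), w, thresholds))  # score 1.5 -> rating 3
```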

Slide 16: Loss Functions
[Figure: plots of the loss functions considered: 0-1, hinge, logistic, margin agreement, smooth hinge, and modified least squares.]
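Sketches of the margin-based losses named on this slide, written as functions of the margin z = y·(w·x); these are the standard formulas, with the smooth hinge in the piecewise form used by Rennie & Srebro:

```python
import numpy as np

def zero_one(z):
    """0-1 loss: 1 if the margin is non-positive, else 0."""
    return np.where(z <= 0, 1.0, 0.0)

def hinge(z):
    """Hinge loss: max(0, 1 - z)."""
    return np.maximum(0.0, 1.0 - z)

def logistic(z):
    """Logistic loss: log(1 + exp(-z))."""
    return np.log1p(np.exp(-z))

def smooth_hinge(z):
    """Smooth hinge: linear for z <= 0, quadratic for 0 < z < 1, zero for z >= 1."""
    return np.where(z >= 1, 0.0,
           np.where(z <= 0, 0.5 - z, 0.5 * (1.0 - z) ** 2))

def modified_least_squares(z):
    """Modified least squares: max(0, 1 - z)^2."""
    return np.maximum(0.0, 1.0 - z) ** 2

z = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
print(hinge(z))  # [3. 2. 1. 0. 0.]
```

All but the 0-1 loss are convex in z, which is the point developed on the next slides.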

Slide 17: Convexity
A convex function has no local minima (other than the global minimum).
A set is convex if every line segment between two of its points lies entirely within the set.

Slide 18: Convexity of Loss Functions
The 0-1 loss is not convex: it has local minima and is sensitive to small changes.
Minimizing a convex bound on the 0-1 loss instead gives:
- A large-margin solution with regularization
- Stronger guarantees

Slide 19: Proportional Odds
McCullagh introduced the original rating model [McCullagh, 1980]:
- Linear interaction between weights and features
- Thresholds
- Fit by maximum likelihood
[Figure: items projected onto w, with thresholds separating rating classes.]

Slide 20: Immediate-Thresholds
[Figure: the immediate-thresholds construction penalizes a prediction only against the two thresholds adjacent to the true rating class.] [Shashua & Levin, 2003]

Slide 21: Some Errors are Better than Others
[Figure: a user's true ratings compared against two systems' predictions; a system whose errors are near-misses should be preferred over one that is off by many rating levels.]

Slide 22: Not a Bound on Absolute Difference
[Figure: the immediate-thresholds loss does not bound the absolute difference between the predicted and true rating.]

Slide 23: All-Thresholds Loss
[Figure: the all-thresholds loss sums a margin penalty over every threshold, so predictions further from the true rating incur larger loss.] [Srebro, Rennie & Jaakkola, NIPS 2004]
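A sketch of the all-thresholds loss for a single example, assuming hinge as the per-threshold loss: every threshold contributes a term, with the sign chosen so that thresholds below the true rating push the score up and thresholds above push it down.

```python
def all_thresholds_loss(score, rating, thresholds):
    """All-thresholds loss for one example.

    score:      real-valued prediction w.x
    rating:     true ordinal label in 1..len(thresholds)+1
    thresholds: sorted cut points theta_1 < ... < theta_k
    """
    loss = 0.0
    for k, theta in enumerate(thresholds, start=1):
        # s = +1 for thresholds below the true rating, -1 for those above.
        s = 1.0 if k < rating else -1.0
        loss += max(0.0, 1.0 - s * (score - theta))  # hinge per threshold
    return loss

thresholds = [-1.5, 0.0, 1.5]  # three thresholds -> four rating classes
print(all_thresholds_loss(2.0, 4, thresholds))   # 0.5: score near class 4
print(all_thresholds_loss(-2.0, 4, thresholds))  # 9.0: score far from class 4
```

Because every violated threshold adds a term, the loss grows with the distance between predicted and true rating, which is exactly the property slide 22 shows the immediate-thresholds construction lacks.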

Slide 24: Experiments

              Multi-Class   Imm-Thresh   All-Thresh   p-value
  MLS         .7486         .7491        .6700        1.7e-18
  Hinge       .7433         .7628        .6702        6.6e-17
  Logistic    .7490         .7248        .6623        7.3e-22

Least squares: 1.3368
[Rennie & Srebro, IJCAI 2005]

Slide 25: Many Users, No Features
[Figure: the partially observed ratings matrix factored as user weights times item features, repeated from slide 11.]

Slide 26: Background: Lp Norms
- L0: number of non-zero entries (slide example: ||.||_0 = 3)
- L1: sum of absolute values (slide example: ||.||_1 = 5)
- L2: Euclidean length (slide example: ||.||_2 = 2)
- General: ||v||_p = (sum_i |v_i|^p)^(1/p)
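A quick check of these definitions with NumPy; the vector v here is illustrative, not the example vector from the slide figure.

```python
import numpy as np

v = np.array([0.0, 3.0, -4.0, 0.0, 1.0])  # illustrative vector

l0 = np.count_nonzero(v)   # L0: number of non-zero entries
l1 = np.sum(np.abs(v))     # L1: sum of absolute values
l2 = np.linalg.norm(v)     # L2: Euclidean length

def lp_norm(v, p):
    """General Lp norm: (sum_i |v_i|^p)^(1/p)."""
    return np.sum(np.abs(v) ** p) ** (1.0 / p)

print(l0, l1, l2)  # 3, 8.0, sqrt(26)
```

The L0 "norm" counts non-zeros, which is why it reappears below as the rank of a matrix (applied to the vector of singular values) and why L1 and the trace norm serve as its convex surrogates.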

Slide 27: Background: Feature Selection
Objective: Loss + Regularization
[Figure: comparison of squared-L2 and L1 regularizers.]

Slide 28: Singular Value Decomposition
X = U S V', where:
- U, V: orthogonal (rotations)
- S: diagonal, non-negative
The eigenvalues of X X' = U S V' V S U' = U S^2 U' are the squared singular values of X.
Rank(X) = ||s||_0, the number of non-zero singular values.
The SVD is used to obtain the least-squares low-rank approximation.
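These facts can be checked directly with NumPy: the squared singular values match the eigenvalues of X X', and truncating the SVD gives the least-squares low-rank approximation (the Eckart-Young theorem).

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4))

U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Squared singular values equal the top eigenvalues of X X'.
eigvals = np.sort(np.linalg.eigvalsh(X @ X.T))[::-1][: len(s)]
assert np.allclose(eigvals, s ** 2)

# Rank = number of non-zero singular values (L0 norm of s).
rank = int(np.sum(s > 1e-10))

# Least-squares rank-k approximation: keep the top k singular values.
k = 2
X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print(rank, np.linalg.norm(X - X_k))
```

The Frobenius error of the rank-k truncation is exactly the norm of the discarded singular values, which is why the SVD solves the fully observed sum-squared-loss problem on the next slide.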

Slide 29: Low-Rank Matrix Factorization
Approximate the ratings matrix Y by X = U x V of rank k.
- Sum-squared loss, fully observed Y: use the SVD to find the global optimum.
- Classification-error loss, partially observed Y: non-convex, with no explicit solution.

Slide 30: Low Rank is a Non-Convex Set
[Figure: the sum of two rank-1 matrices can have rank 2, so the set of low-rank matrices is not convex.]

Slide 31: Trace Norm Regularization
Trace norm: the sum of the singular values [Fazel et al., 2001].
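A small NumPy sketch of the trace norm (also called the nuclear norm) as the sum of singular values, playing the same role for rank that the L1 norm plays for L0:

```python
import numpy as np

def trace_norm(X):
    """Trace (nuclear) norm: the sum of the singular values of X."""
    return float(np.sum(np.linalg.svd(X, compute_uv=False)))

# Rank-1 sanity check: for X = u v', the only singular value is ||u|| * ||v||.
u = np.array([1.0, 2.0])
v = np.array([3.0, 0.0, 4.0])
X = np.outer(u, v)
print(trace_norm(X))  # sqrt(5) * 5
```

Unlike rank, the trace norm is convex, which is what makes the MMMF objective on slide 33 a convex function of X.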

Slide 32: Many Users, No Features
[Figure: the ratings matrix Y is modeled via X = U V', where U holds the user weights and V the item features.]

Slide 33: Max-Margin Matrix Factorization
All-thresholds loss with trace-norm regularization: a convex function of X that encourages low rank in X. [Srebro, Rennie & Jaakkola, NIPS 2004]

Slide 34: Properties of the Trace Norm
The trace norm is the minimum, over factorizations X = U V', of (||U||_F^2 + ||V||_F^2) / 2. Writing the SVD as X = U_0 S V_0', the factorization U = U_0 sqrt(S), V = V_0 sqrt(S) minimizes both quantities.

Slide 35: Factorized Optimization
The factorized objective is a tight bound on the trace-norm objective.
Gradient descent costs O(n^3) per round.
The factorized objective has stationary points, but no local minima. [Rennie & Srebro, ICML 2005]
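A minimal sketch of factorized optimization, using squared loss on the observed entries for simplicity (the thesis uses the all-thresholds loss): the trace-norm penalty on X = U V' is replaced by the Frobenius bound (||U||_F^2 + ||V||_F^2) / 2 from slide 34, and plain gradient descent is run on U and V. Sizes, rates, and data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k, lam, lr = 8, 6, 3, 0.1, 0.01

# Partially observed ratings in 1..5; mask marks the observed entries.
Y = rng.integers(1, 6, size=(n_users, n_items)).astype(float)
mask = rng.random((n_users, n_items)) < 0.7

# Factorized parameterization X = U V'; small random initialization.
U = 0.1 * rng.standard_normal((n_users, k))
V = 0.1 * rng.standard_normal((n_items, k))

def objective(U, V):
    # Squared loss on observed entries + Frobenius bound on the trace norm.
    R = mask * (U @ V.T - Y)
    return 0.5 * np.sum(R ** 2) + 0.5 * lam * (np.sum(U ** 2) + np.sum(V ** 2))

obj0 = objective(U, V)
for _ in range(1000):
    R = mask * (U @ V.T - Y)   # residual on observed entries only
    gU = R @ V + lam * U       # gradient w.r.t. U
    gV = R.T @ U + lam * V     # gradient w.r.t. V
    U -= lr * gU
    V -= lr * gV

print(obj0, objective(U, V))   # the objective drops substantially
```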

Slide 36: Collaborative Prediction Results
Datasets (size, sparsity): EachMovie, 36656 x 1648, 96% sparse; MovieLens, 6040 x 3952, 96% sparse.

              EachMovie             MovieLens
              Weak      Strong      Weak      Strong
  URP         .8596     .8859       .6946     .7104
  Attitude    .8787     .8845       .6912     .7000
  MMMF        .8548     .8439       .6650     .6725

[URP & Attitude: Marlin, 2004] [MMMF: Rennie & Srebro, 2005]

Slide 37: Extensions
- Multi-user + features
- Observation model: predict which restaurants a user will rate, and the rating she will give
- Multiple ratings per user/restaurant (e.g. food, service, and décor ratings)
- SVD parameterization

Slide 38: Multi-User + Features
[Figure: the item-feature matrix V split into fixed and learned columns.]
Feature parameters (V): some are fixed, some are learned.
Learn weights (U) for all features. The fixed part of V does not affect the regularization.

Slide 39: Observation Model
The common assumption is that ratings are observed at random. In reality, restaurant selection depends on geography, popularity, price, and food style. To remove this bias, we model the observation process.

Slide 40: Observation Model
- Model observation as binary classification (is an entry rated or not?)
- Add a binary-classification loss term
- Tie the rating and observation models together through the shared item features: X = U_X V, W = U_W V

Slide 41: Multiple Ratings
Users may provide multiple ratings (e.g. service, décor, food):
- Add a loss term for each rating type
- Stack the parameter matrices for regularization

Slide 42: SVD Parameterization
The U, V factorization has too many parameters: (U A)(A^-1 V) = X is another factorization of X for any invertible A.
Alternative: parameterize as U, S, V with U, V orthogonal and S diagonal.
Advantages:
- Not over-parameterized
- Exact objective (not a bound)
- No stationary points

Slide 43: Summary
- A loss function for ratings
- Regularization for multiple users
- Scaled MMMF to large problems (e.g. larger than 1000 x 1000)
- The trace norm is widely applicable
- Extensions
Code: http://people.csail.mit.edu/jrennie/matlab

Slide 44: Thanks!
Helen, for supporting me for 7.5 years! Tommi Jaakkola, for answering all my questions and directing me to the end! Mike Collins and Tomaso Poggio for additional guidance. Nati Srebro & John Barnett for endless valuable discussions and ideas. Amir Globerson, David Sontag, Luis Ortiz, Luis Perez-Breva, Alan Qi, Patrycja Missiuro, and all past members of Tommi's reading group for paper discussions, conference trips, and feedback on my talks. Many, many others who have helped me along the way!

Slide 45: Low-Rank Optimization
[Figure: level sets of the objective with the low-rank set overlaid, contrasting the unconstrained objective minimum, the low-rank minimum, and a low-rank local minimum.]
