
1 Cross-Modal Recommendation: Moving from Shallow Learning to Deep Learning
He Xiangnan, Research Fellow, National University of Singapore

2 Motivation: Recommender Systems
- Netflix: 60+% of the movies watched are recommended.
- Google News: the recommender generates 38+% of click-throughs.
- Amazon: 35% of sales come from recommendations.
In the current age of information overload, recommender systems play an important role in helping users find the information they want. Many online systems that interact with users are, in effect, recommender systems. Together with online advertising, they are the main revenue source for many websites: at Netflix, over 60% of movie traffic comes from recommendations, and at Amazon, over 35% of sales come from recommendations. (Statistics from Xavier Amatriain.)

3 Motivation: Cross-Modal Recommendation
Rich cross-modal information:
- User-item interactions (ratings, likes, clicks, purchases...)
- User profiles (age, gender...)
- Item profiles (descriptions, images...)
- Textual reviews
- Contexts (location, time...)
The "recommender problem": estimate a scoring function that predicts how much a user will like an item, based on the available information.
Besides transaction records, there is a lot of rich side information: user-item interactions such as ratings and clicks, user demographics, item attributes, textual reviews, and various contexts. These data come in multiple modalities, such as categorical variables, text, images, and videos. [click] The recommender problem is to estimate a scoring function that predicts how much a user will like an item. So the key research question for cross-modal recommendation is how to effectively fuse all the available information to better estimate this scoring function. [Zhang et al., KDD 2016 (CKE)]

4 Collaborative Filtering
"Traditional" view of collaborative filtering (CF): "CF makes predictions (filtering) about a user's interests by collecting preference information from many users (collaborating)."
1. Memory-based: predict by memorizing similar users' ratings.
2. Model-based: predict by inferring from an underlying model.
Collaborative filtering is the default technique for modern recommender systems. The basic idea is that to predict a user's interest, not only his own history is considered, but also the histories of other, similar users. Typically, the data for CF is the user-item interaction history, e.g., a table of user ID, item ID, and rating score, and the CF task can be formulated as estimating the missing entries of the user-item rating matrix. E.g., matrix factorization (MF) learns a latent vector for each user and item; the score between user u and item i is the inner product of their latent vectors: s(u, i) = <p_u, q_i>.
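A minimal sketch of MF scoring in NumPy; the matrix names, sizes, and random initialization are illustrative assumptions, not from the slides:

```python
# Matrix factorization scoring sketch: each user/item has a learned
# k-dimensional latent vector; the prediction is their inner product.
import numpy as np

n_users, n_items, k = 100, 50, 8
P = np.random.randn(n_users, k) * 0.1   # user latent vectors (learned in practice)
Q = np.random.randn(n_items, k) * 0.1   # item latent vectors (learned in practice)

def score(u, i):
    """Predicted preference of user u for item i: s(u, i) = <p_u, q_i>."""
    return P[u] @ Q[i]

print(score(3, 7))  # predicted score for user 3 and item 7
```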

5 Recommendation as a Learning Problem
"Standard" supervised learning view of CF: matrix/tensor data can be represented by a design matrix of feature vectors via one-hot encoding. Standard ML methods can then be applied:
- Logistic Regression
- SVM
- Decision Trees
- Bayesian Networks
- Neural Networks
- ...
[Rendle, ICDM 2010]
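A minimal sketch of this encoding, with illustrative field sizes and data: each (user, item) transaction becomes one sparse row of the design matrix, with exactly one non-zero in the user block and one in the item block.

```python
# Build a sparse design matrix from (user, item, rating) transactions
# by one-hot encoding each categorical field, as in Rendle's FM setup.
import numpy as np
from scipy.sparse import lil_matrix

n_users, n_items = 4, 3
transactions = [(0, 2, 5.0), (1, 0, 3.0), (3, 1, 4.0)]  # (user, item, rating)

X = lil_matrix((len(transactions), n_users + n_items))
y = np.zeros(len(transactions))
for row, (u, i, r) in enumerate(transactions):
    X[row, u] = 1.0                # one-hot user ID block
    X[row, n_users + i] = 1.0      # one-hot item ID block
    y[row] = r

X = X.tocsr()                      # each row has exactly 2 non-zeros
```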

6 A Generic Solution for Cross-Modal Data
The pipeline fuses all modalities into one sparse feature representation:
- Rating data (user-item interactions)
- User data: e.g., user gender, age, occupation, personality...
- Item data: e.g., item category, description, image...
- Context data: e.g., location, time, weather, mood...
Input features:
1. Categorical features: user/item ID, bag-of-words, historical features... (one-hot encoded)
2. Real-valued features: textual/visual embeddings, converted features (e.g., TF-IDF, GBDT)...
These feed a predictive sparse ML model (the recommender); see the sketch below.
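A minimal sketch of the fusion step, assuming illustrative field names, sizes, and values: categorical fields are one-hot encoded and concatenated with real-valued blocks (e.g., a visual embedding) into a single sparse vector.

```python
# Fuse categorical one-hot blocks with real-valued blocks into one
# sparse feature vector per transaction. All names/sizes are assumptions.
import numpy as np
from scipy.sparse import csr_matrix, hstack

def one_hot(index, size):
    v = np.zeros(size)
    v[index] = 1.0
    return csr_matrix(v)            # shape (1, size)

user_block = one_hot(7, 100)        # user ID field (100 users)
item_block = one_hot(42, 500)       # item ID field (500 items)
gender_block = one_hot(1, 2)        # user gender field
img_block = csr_matrix(np.random.rand(1, 64))  # item visual embedding

x = hstack([user_block, item_block, gender_block, img_block]).tocsr()
print(x.shape)                      # (1, 666): one row of the design matrix
```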

7 Advantages of Such a Generic Solution
One model for all: regardless of the application, all a practitioner needs to do is feature engineering and model hyper-parameter tuning. Controllable complexity: only the non-zero features in the design matrix matter, which is more efficient than tensor methods. What models work?

8 Requirements for a Good Model
Key properties to capture:
1. Collaborative filtering effect: user ID + item ID.
2. Cross-feature effect: e.g., females around age 20 like pink (gender x age x visual).
3. Strong generalization ability: feature combinations at test time may never have appeared in training.
In the next part:
Shallow methods: Logistic Regression, Factorization Machines.
Deep methods: Wide&Deep, Neural Factorization Machines (our recent work).

9 Shallow Methods – Logistic Regression
GBDT features -> LR. LR is a single-layer neural network.
Pros: - Simple & easy to interpret.
Cons: - Features are mutually independent (cross features must be designed manually).
GBDT can extract non-linear feature interactions, and the CF effect can be captured by feeding MF embedding features into the model. This was Facebook's CTR solution in 2014; a sketch of the pipeline follows below. [He et al. ADKDD 2014]
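A minimal scikit-learn sketch of the GBDT-to-LR pipeline on illustrative random data; the slide does not prescribe this exact implementation:

```python
# GBDT -> LR pipeline (Facebook 2014 style): the GBDT learns non-linear
# feature interactions; each sample is re-encoded by the leaf it falls
# into in every tree, and those one-hot leaf indices feed a linear LR.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder

X = np.random.rand(1000, 10)           # raw real-valued features (illustrative)
y = np.random.randint(0, 2, 1000)      # click / no-click labels (illustrative)

gbdt = GradientBoostingClassifier(n_estimators=50, max_depth=3)
gbdt.fit(X, y)

leaves = gbdt.apply(X)[:, :, 0]        # leaf index per tree: (n_samples, n_trees)
X_leaves = OneHotEncoder().fit_transform(leaves)  # sparse "cross features"

lr = LogisticRegression(max_iter=1000) # simple, interpretable top layer
lr.fit(X_leaves, y)
```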

10 Shallow Methods – Factorization Machine
Model of FM: y(x) = w0 + Σi wi xi + Σi Σj>i <vi, vj> xi xj
Example: S = wESPN + wNike + <vESPN, vNike>
Another example: S = wESPN + wNike + wMale + <vESPN, vNike> + <vESPN, vMale> + <vNike, vMale>
Pros: - Feature embeddings allow strong generalization. - Feature interactions are learned automatically.
Cons: - Only second-order feature interactions (inefficient for higher-order interactions). - Only linear interactions.
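A minimal NumPy sketch of FM scoring, using Rendle's O(kn) reformulation of the pairwise term; all parameter names are illustrative:

```python
# Factorization Machine prediction. The pairwise term
# sum_{i<j} <v_i, v_j> x_i x_j is computed in linear time via
# 0.5 * (||V^T x||^2 - sum_i ||v_i||^2 x_i^2).
import numpy as np

def fm_predict(x, w0, w, V):
    """x: sparse one-hot/real feature vector (n,);
    w0: global bias; w: linear weights (n,); V: embeddings (n, k)."""
    linear = w0 + w @ x
    Vx = V.T @ x                                        # (k,)
    pairwise = 0.5 * (Vx @ Vx - ((V ** 2).T @ (x ** 2)).sum())
    return linear + pairwise
```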

11 Deep Methods – Wide&Deep
Google's app recommendation solution in 2016. The deep part concatenates the feature embeddings and passes them through hidden layers; its output is combined with the wide (linear) part.
Pros: - Feature embeddings allow strong generalization. - The deep part can learn feature interactions of any order (implicitly).
Cons: - Feature interactions learned by the hidden layers are a "black box". - The deep part easily over-generalizes.
A sketch of the architecture follows below. [Cheng et al. DLRS 2016]
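A minimal tf.keras sketch of the architecture with illustrative sizes and a two-field (user, item) layout; this is not Google's production model:

```python
# Wide&Deep sketch: a linear "wide" part that memorizes feature
# co-occurrences, plus an embedding MLP "deep" part that generalizes.
import tensorflow as tf

n_users, n_items, k = 1000, 500, 16    # illustrative sizes
user_id = tf.keras.Input(shape=(), dtype="int32")
item_id = tf.keras.Input(shape=(), dtype="int32")

# Deep part: embed each ID, concatenate, pass through an MLP.
u_emb = tf.keras.layers.Embedding(n_users, k)(user_id)
i_emb = tf.keras.layers.Embedding(n_items, k)(item_id)
deep = tf.keras.layers.Concatenate()([u_emb, i_emb])
deep = tf.keras.layers.Dense(64, activation="relu")(deep)
deep = tf.keras.layers.Dense(32, activation="relu")(deep)
deep_logit = tf.keras.layers.Dense(1)(deep)

# Wide part: per-ID scalar weights (an embedding of size 1 = linear model).
u_w = tf.keras.layers.Embedding(n_users, 1)(user_id)
i_w = tf.keras.layers.Embedding(n_items, 1)(item_id)
wide_logit = tf.keras.layers.Add()([u_w, i_w])

logit = tf.keras.layers.Add()([wide_logit, deep_logit])
output = tf.keras.layers.Activation("sigmoid")(logit)
model = tf.keras.Model([user_id, item_id], output)
```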

12 Deep Methods – Neural FM
Our work in SIGIR 2017 and IJCAI 2017. Learn second-order interactions with FM and explain them with attention; learn high-order interactions with a deep neural network. Explain a recommendation by identifying the most predictive interactions: <Female, Age 20>, <Age 20, iPhone>, <Female, Color Pink>, ... Outperforms FM by 7% and Google's Wide&Deep by 3%. Our deep recommendation solution performs representation learning on features and, most importantly, is self-explainable. [click] The core design of our solution is attention-augmented pairwise pooling. It allows explaining a recommendation by identifying the most predictive interactions; e.g., we recommend the iPhone in Rose Gold to a user because she is a female of age 20, and people with a similar profile tend to buy the iPhone in Rose Gold. Our solution outperforms the factorization machine by 7% and Google's Wide&Deep solution by 3% on recommendation and CTR evaluation. (The model combines second-order and high-order interactions.) A sketch of the pooling layer follows below. [He and Chua, SIGIR 2017; Xiao et al., IJCAI 2017]
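A minimal NumPy sketch of attention-augmented pairwise pooling in the spirit of AFM/NFM; the parameter shapes and the dense double loop are illustrative assumptions, not the papers' exact implementation:

```python
# Attention over pairwise feature interactions: each interaction
# (v_i x_i) * (v_j x_j) gets a learned attention weight, so the most
# predictive pairs can be read off for explanation. W, b, h are the
# attention-network parameters (shapes are assumptions).
import numpy as np

def attention_pooling(x, V, W, b, h):
    """x: features (n,); V: embeddings (n, k); W: (t, k); b: (t,); h: (t,)."""
    nz = np.nonzero(x)[0]
    pairs, logits = [], []
    for a in range(len(nz)):
        for c in range(a + 1, len(nz)):
            i, j = nz[a], nz[c]
            p = (V[i] * x[i]) * (V[j] * x[j])            # element-wise interaction
            pairs.append(p)
            logits.append(h @ np.maximum(W @ p + b, 0.0))  # attention logit
    att = np.exp(logits) / np.sum(np.exp(logits))        # softmax weights
    pooled = (att[:, None] * np.array(pairs)).sum(axis=0)  # (k,) vector
    return pooled, att   # att identifies the most predictive interactions
```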

13 Personal Thoughts on Deep Recommendation
Generic models that allow easy feature engineering are preferable in industry; however, most research papers only propose a specific model for a specific domain with certain inputs. Shallow models are still dominant, e.g., linear, factorization, and tree models. Directly applying existing DL methods may not work; the key reason is that strong representation power leads to over-generalization. Future research should focus on designing better and explainable neural components that meet the properties of a specific task.

14 Thanks!

