1 Dongheng Sun 04/26/2011 Learning with Matrix Factorizations By Nathan Srebro.

Similar presentations
Pattern Recognition and Machine Learning
Google News Personalization: Scalable Online Collaborative Filtering
Biointelligence Laboratory, Seoul National University
Roberto Battiti, Mauro Brunato. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Feb 2014.
Dimensionality Reduction PCA -- SVD
Graph Laplacian Regularization for Large-Scale Semidefinite Programming Kilian Weinberger et al. NIPS 2006 presented by Aggeliki Tsoli.
Segmentation and Fitting Using Probabilistic Methods
Bayesian Robust Principal Component Analysis Presenter: Raghu Ranganathan ECE / CMR Tennessee Technological University January 21, 2011 Reading Group (Xinghao.
Machine Learning & Data Mining CS/CNS/EE 155 Lecture 14: Embeddings 1Lecture 14: Embeddings.
Visual Recognition Tutorial
Pattern Recognition and Machine Learning
Predictive Automatic Relevance Determination by Expectation Propagation Yuan (Alan) Qi Thomas P. Minka Rosalind W. Picard Zoubin Ghahramani.
Machine Learning CUNY Graduate Center Lecture 3: Linear Regression.
Independent Component Analysis (ICA) and Factor Analysis (FA)
Recommender systems Ram Akella February 23, 2011 Lecture 6b, i290 & 280I University of California at Berkeley Silicon Valley Center/SC.
Bioinformatics Challenge. Learning in very high dimensions with very few samples. Acute leukemia dataset: 7,129 genes vs. 72 samples. Colon cancer.
Recommender systems Ram Akella November 26 th 2008.
Pattern Recognition. Introduction. Definitions.. Recognition process. Recognition process relates input signal to the stored concepts about the object.
Cao et al. ICML 2010 Presented by Danushka Bollegala.
PATTERN RECOGNITION AND MACHINE LEARNING
Machine Learning CUNY Graduate Center Lecture 3: Linear Regression.
Outline Separating Hyperplanes – Separable Case
ArrayCluster: an analytic tool for clustering, data visualization and module finder on gene expression profiles. Team members: 李祥豪, 謝紹陽, 江建霖.
EMIS 8381 – Spring Netflix and Your Next Movie Night Nonlinear Programming Ron Andrews EMIS 8381.
PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 3: LINEAR MODELS FOR REGRESSION.
CS 782 – Machine Learning Lecture 4 Linear Models for Classification  Probabilistic generative models  Probabilistic discriminative models.
ECE 8443 – Pattern Recognition LECTURE 10: HETEROSCEDASTIC LINEAR DISCRIMINANT ANALYSIS AND INDEPENDENT COMPONENT ANALYSIS Objectives: Generalization of.
LANGUAGE MODELS FOR RELEVANCE FEEDBACK Lee Won Hee.
MACHINE LEARNING 8. Clustering. Motivation. Based on E. Alpaydin, Introduction to Machine Learning, © 2004 The MIT Press (V1.1). Classification problem:
Event retrieval in large video collections with circulant temporal encoding CVPR 2013 Oral.
EigenRank: A ranking oriented approach to collaborative filtering By Nathan N. Liu and Qiang Yang Presented by Zachary 1.
Guest lecture: Feature Selection Alan Qi Dec 2, 2004.
Pairwise Preference Regression for Cold-start Recommendation Speaker: Yuanshuai Sun
Web Search and Text Mining Lecture 5. Outline Review of VSM More on LSI through SVD Term relatedness Probabilistic LSI.
Personalization Services in CADAL. Zhang Yin, Zhuang Yuting, Wu Jiangqin. College of Computer Science, Zhejiang University. November 19, 2006.
Matrix Factorization & Singular Value Decomposition Bamshad Mobasher DePaul University Bamshad Mobasher DePaul University.
Collaborative Deep Learning for Recommender Systems
PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 1: INTRODUCTION.
Biointelligence Laboratory, Seoul National University
Dimensionality Reduction and Principle Components Analysis
Information Retrieval: Models and Methods
Chapter 7. Classification and Prediction
Deep Feedforward Networks
Large-Scale Content-Based Audio Retrieval from Text Queries
Probability Theory and Parameter Estimation I
LECTURE 11: Advanced Discriminant Analysis
Information Retrieval: Models and Methods
Personalized Social Image Recommendation
Multimodal Learning with Deep Boltzmann Machines
Asymmetric Correlation Regularized Matrix Factorization for Web Service Recommendation Qi Xie1, Shenglin Zhao2, Zibin Zheng3, Jieming Zhu2 and Michael.
Recommender Systems. Adopted from Bin UIC.
Roberto Battiti, Mauro Brunato
Advanced Artificial Intelligence
Q4 : How does Netflix recommend movies?
Collaborative Filtering Matrix Factorization Approach
Where did we stop? The Bayes decision rule guarantees an optimal classification… … But it requires the knowledge of P(ci|x) (or p(x|ci) and P(ci)) We.
Word Embedding Word2Vec.
Connecting Data with Domain Knowledge in Neural Networks -- Use Deep learning in Conventional problems Lizhong Zheng.
Biointelligence Laboratory, Seoul National University
Michal Rosen-Zvi University of California, Irvine
Learning Theory Reza Shadmehr
Probabilistic Latent Preference Analysis
Machine learning overview
Ch 3. Linear Models for Regression (2/2) Pattern Recognition and Machine Learning, C. M. Bishop, Previously summarized by Yung-Kyun Noh Updated.
Recommendation Systems
Linear Discrimination
EM Algorithm and its Applications
Restructuring Sparse High Dimensional Data for Effective Retrieval
Jia-Bin Huang Virginia Tech
Recommender Systems Problem formulation Machine Learning.
Presentation transcript:

1 Dongheng Sun 04/26/2011 Learning with Matrix Factorizations By Nathan Srebro

2 Outline
- Matrix Factorization Models and Formulations
  -- Dimensionality reduction and applications
  -- Matrix Factorization
  -- Matrix Factorization Models
  -- Loss functions
- Finding Low Rank Approximations
  -- Frobenius Low Rank Approximation
  -- Weighted Low Rank Approximations (WLRA)
  -- EM Approach and Newton Approach
- Maximum Margin Matrix Factorization
  -- Collaborative Filtering and Collaborative Prediction

3 Dimensionality reduction
- The underlying premise: important aspects of the data can be captured via a low-dimensional representation.

4 Applications
- Signal reconstruction: the reduced representation may correspond to some hidden signal or process that is observed indirectly (factor analysis).
- Lossy compression: reduce memory requirements and computational costs.

5 Applications
- Understanding structure: understand the relationship between items in the corpus (documents/images) and the major modes of variation (item features, such as word appearances or pixel color levels).
- Prediction: the data matrix is only partially observed (e.g., not all users rated, or saw, all movies); matrix factorization can be used to predict the unobserved entries (collaborative filtering).

6 Linear Dimensionality Reduction
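The body of this slide did not survive the transcript; the standard formulation of linear dimensionality reduction as a rank-k matrix factorization, which the following slides build on, can be written as:

```latex
% Approximate the n x d data matrix Y by a rank-k matrix X = U V':
Y \approx X = U V^{\top}, \qquad U \in \mathbb{R}^{n \times k},\; V \in \mathbb{R}^{d \times k},\; k \ll \min(n, d)
% Each row of U is a k-dimensional representation of the corresponding row of Y,
% and V maps that representation back to the original d coordinates.
```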

7 Matrix Factorization
Each row of U functions as a “feature vector”, and each column of V′ is a linear predictor, predicting the entries in the corresponding column of Y based on the “features” in U. [MMMFnips04]
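A minimal numpy sketch of this view; the factorization here is obtained from a truncated SVD purely for illustration, and the matrix sizes and rank are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 100, 20, 3          # illustrative sizes: n rows, d columns, rank k
Y = rng.normal(size=(n, d))   # data matrix

# Rank-k factorization X = U V'.
Us, s, Vt = np.linalg.svd(Y, full_matrices=False)
U = Us[:, :k] * s[:k]         # each row of U is a k-dimensional "feature vector"
V = Vt[:k, :].T               # each column of V' (row of V) is a linear predictor

# Column a of X is predicted from the features in U by the linear predictor V[a]:
a = 0
col_a = U @ V[a]
assert np.allclose(col_a, (U @ V.T)[:, a])
```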

8 Matrix Factorization Models and Formulations
- Gaussian additive noise: Y = X + Z, Z ~ Gaussian.
- General additive noise: Y = X + Z, Z ~ any distribution.
- Low-rank models for matrices of counts: “bag of words” for a corpus of text documents, where rows correspond to documents and columns to words; entries Y_ia can be boolean, occurrences, or frequencies.

9 Loss functions
- Sum squared error: the Frobenius distance between Y and X.
- Another loss function: the negative log-likelihood.
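The formulas themselves did not survive the transcript; the standard forms they refer to, with Y the target matrix and X = UV' the low-rank approximation, are:

```latex
% Sum-squared error (squared Frobenius distance):
\|Y - X\|_F^2 = \sum_{i,a} (Y_{ia} - X_{ia})^2
% Negative log-likelihood under an observation model p(Y_{ia} \mid X_{ia}):
-\log p(Y \mid X) = -\sum_{i,a} \log p(Y_{ia} \mid X_{ia})
% For i.i.d. Gaussian noise the negative log-likelihood reduces to the
% sum-squared error, up to an additive constant and a scale factor.
```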

10 Outline
- Matrix Factorization Models and Formulations
  -- Dimensionality reduction and applications
  -- Matrix Factorization
  -- Matrix Factorization Models
  -- Loss functions
- Finding Low Rank Approximations
  -- Frobenius Low Rank Approximation
  -- Weighted Low Rank Approximations (WLRA)
  -- EM Approach and Newton Approach
- Maximum Margin Matrix Factorization
  -- Collaborative Filtering and Collaborative Prediction

11 Frobenius Low Rank Approximations
- Loss function: the sum-squared (Frobenius) distance between the rank-k matrix X = UV' and the target matrix A.
- Minimizing the loss function: requiring V'V = I and U'U to be diagonal yields U = AV (the orthogonal solution); the optimum is given by the leading components of the SVD of A.
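A minimal numpy sketch of this orthogonal solution, computed directly with a truncated SVD; the matrix A and the rank are illustrative:

```python
import numpy as np

def frobenius_low_rank(A: np.ndarray, k: int) -> np.ndarray:
    """Best rank-k approximation of A under the Frobenius loss (truncated SVD)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

A = np.random.default_rng(0).normal(size=(50, 30))
X = frobenius_low_rank(A, k=5)
print(np.linalg.norm(A - X, "fro"))  # value of the loss at the optimum
```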

12 Weighted Low Rank Approximations (WLRA)
- Loss function: a sum-squared error in which each entry of the residual Y - UV' is multiplied by a nonnegative weight W_ia.
- Minimizing the loss function: for fixed V, each row of U can be found independently by solving a weighted least-squares problem.
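A sketch of the per-row update this refers to, under the assumption that the loss is sum_{i,a} W_ia (Y_ia - (UV')_ia)^2; with V fixed, row i of U is an ordinary weighted least-squares solution:

```python
import numpy as np

def update_U(Y: np.ndarray, W: np.ndarray, V: np.ndarray) -> np.ndarray:
    """For fixed V, minimize sum_{i,a} W[i,a] * (Y[i,a] - (U @ V.T)[i,a])**2 over U.

    Row i of U solves an independent weighted least-squares problem:
        U[i] = (V' diag(W[i]) V)^{-1} V' diag(W[i]) Y[i]
    (assumes each row has enough positively weighted entries to be well posed).
    """
    n, k = Y.shape[0], V.shape[1]
    U = np.zeros((n, k))
    for i in range(n):
        VW = V * W[i][:, None]                        # diag(W[i]) @ V
        U[i] = np.linalg.solve(VW.T @ V, VW.T @ Y[i])
    return U
```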

13 Weighted Low Rank Approximations
- Gradient-based optimization: optimize the projected objective J(V) = min_U J(U, V), with U recovered in closed form for each V. The parameter space of J(V) is of course much smaller than that of J(U, V), making optimization of J(V) more tractable.

14 EM Approach and Newton Approach
- EM approach: simpler to implement.
- Newton approach: for other loss functions.
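A sketch of the EM idea for WLRA with 0/1 weights: the E-step fills the unobserved (zero-weight) entries from the current low-rank estimate, and the M-step re-factors the completed matrix with an ordinary unweighted truncated SVD. The initialization and fixed iteration count here are illustrative choices:

```python
import numpy as np

def wlra_em(Y: np.ndarray, W: np.ndarray, k: int, n_iters: int = 50) -> np.ndarray:
    """EM for weighted low-rank approximation with 0/1 weights W."""
    X = np.where(W > 0, Y, 0.0)                # initialize missing entries at 0
    for _ in range(n_iters):
        filled = W * Y + (1 - W) * X           # E-step: impute unobserved entries from X
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        X = (U[:, :k] * s[:k]) @ Vt[:k, :]     # M-step: unweighted rank-k approximation
    return X
```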

15 Outline
- Matrix Factorization Models and Formulations
  -- Dimensionality reduction and applications
  -- Matrix Factorization
  -- Matrix Factorization Models
  -- Loss functions
- Finding Low Rank Approximations
  -- Frobenius Low Rank Approximation
  -- Weighted Low Rank Approximations (WLRA)
  -- EM Approach and Newton Approach
- Maximum Margin Matrix Factorization
  -- Collaborative Filtering and Collaborative Prediction

16 Collaborative Filtering and Collaborative Prediction
- “Collaborative filtering”: providing users with information on what items they might like, or dislike, based on their preferences so far (perhaps as inferred from their actions) and how they relate to the preferences of other users.
- “Collaborative prediction”: predicting the user’s preference regarding each item, answering queries of the form “Will I like this movie?”.

17 Collaborative Filtering
--- providing users with information on what items they might like, or dislike,
--- based on their preferences so far and how they relate to the preferences of other users.

18 Matrix Completion
--- predicting the unobserved entries, based on a partially observed target matrix.
--- other application: filling in missing values in a mostly observed matrix of experiment results.
--- example: gene expression analysis.

19 Matrix Factorization for Collaborative Prediction
- Methods mostly differ in:
  -- how they relate the real-valued entries in X to the preferences in Y: viewing the entries in X as mean parameters, as natural parameters, replacing unobserved entries by zeros, and so on.
  -- the measure of discrepancy: a sum-squared loss, a logistic loss, ...
- Method for this paper's collaborative prediction: Y = X + Z, X = UV', and X is used to predict the unobserved entries (see the sketch below).
- Loss function:
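A minimal sketch of this setup, assuming (since the slide's loss formula is not reproduced in the transcript) a sum-squared loss taken over the observed entries only; U and V are fit by plain gradient descent and X = UV' then supplies predictions for the unobserved entries. The rank, step size, and iteration count are illustrative:

```python
import numpy as np

def fit_and_predict(Y: np.ndarray, observed: np.ndarray, k: int = 5,
                    lr: float = 0.005, n_iters: int = 1000) -> np.ndarray:
    """Fit X = U V' to the observed entries of Y; return X as predictions.

    `observed` is a 0/1 mask with the same shape as Y.
    """
    rng = np.random.default_rng(0)
    n, d = Y.shape
    U = 0.1 * rng.normal(size=(n, k))
    V = 0.1 * rng.normal(size=(d, k))
    for _ in range(n_iters):
        R = observed * (U @ V.T - Y)                 # residual on observed entries only
        U, V = U - lr * (R @ V), V - lr * (R.T @ U)  # gradient step on the squared loss
    return U @ V.T                                   # predictions for all entries
```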

20 Matrix Factorization for Collaborative Prediction
--- When U is fixed, each row of U functions as a “feature vector”, and each column of V′ is a linear predictor.
--- Fitting U and V: learning feature vectors that work well across all of the prediction problems.

21 What I have read and Future work
- What I have read:
  -- Significance of recommendation
  -- The challenge of group recommendation
  -- Methods for recommendation
  -- Tag recommendation
- Future work:
  -- Determine appropriate parameters for dynamic social media data of wide diversity
  -- Extend the current techniques to integrate more social information

22 Significance of recommendation
- Recommend content: help users browse.
- Recommend users to a user: help expand relationships.
- Recommend users to a group: help increase community size.

23 The challenge of group recommendation
- The loose semantics associated with an interest group:
  --- groups may share overlapping interests;
  --- users may contribute their images to multiple groups;
  --- groups may be formed upon non-visual properties, e.g., a group about “London”.
- Solution: contextual information, such as image annotations, capture location, and time, to provide more insight beyond the image content.

24 Methods for Recommendation
- Content-based recommendation: learning a model to represent the user's preferences.
- Collaborative filtering and collaborative prediction.
- Hybrid methods: a hybrid training strategy that combines the content-based and CF methods. (ChenChen)
- Matrix factorization.

25 Tag recommendation
- Significance: tag recommendation is important to social tagging and image search:
  --- motivate users to contribute more useful tags to an image;
  --- remind users of richer and more specific tags;
  --- suppress the noise in the social tagging system.

26 Tag recommendation
- Drawbacks of social tagging:
  -- the polysemy and synonym problem:
    --- different users may tag similar images with different words;
    --- it is difficult for users to input all the tags with an equivalent meaning.
  -- ambiguity: e.g., “apple”.

27 Tag recommendation
- Tag ranking:
  -- estimate initial relevance scores for the tags based on probability density estimation;
  -- refine the relevance scores with a random walk over a tag similarity graph.
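A sketch of the random-walk refinement step, assuming the tag similarity matrix is row-normalized into transition probabilities and the initial scores come from the density-estimation step; the damping parameter alpha and the iteration count are assumptions of this sketch, not values taken from the slide:

```python
import numpy as np

def refine_scores(S: np.ndarray, r0: np.ndarray, alpha: float = 0.85,
                  n_iters: int = 100) -> np.ndarray:
    """Refine initial tag relevance scores r0 by a random walk over tag similarities S."""
    P = S / S.sum(axis=1, keepdims=True)           # row-normalize similarities into transitions
    r = r0.copy()
    for _ in range(n_iters):
        r = alpha * (P.T @ r) + (1 - alpha) * r0   # walk step plus restart to initial scores
    return r
```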

28 What impressed me most
- A hybrid training strategy: combines Gibbs sampling (provides better initialization) and the Expectation-Maximization algorithm (faster).
- User and group contacts: boolean indicator vs. frequency.
- Learn the relationships between tags.

29 Future work
- Determine appropriate parameters for dynamic social media data of wide diversity.
- Survey of applications and motivations.
- Design and revise models.
- Experiments and paper writing.

30 Future work
- Extend the current techniques to integrate more social information.
- Add user contact information.
- Combine feature selection in TMSM.
- Group cleaning.