
1 Learning with Matrix Factorizations, by Nathan Srebro. Presented by Dongheng Sun, 04/26/2011.

2 Outline
 Matrix Factorization Models and Formulations
 -- Dimensionality reduction and applications
 -- Matrix factorization
 -- Matrix factorization models
 -- Loss functions
 Finding Low Rank Approximations
 -- Frobenius low rank approximation
 -- Weighted low rank approximations (WLRA)
 -- EM approach and Newton approach
 Maximum Margin Matrix Factorization
 -- Collaborative filtering and collaborative prediction

3 Dimensionality Reduction
 The underlying premise: important aspects of the data can be captured via a low-dimensional representation.

4 Applications
 Signal reconstruction: the reduced representation may correspond to some hidden signal or process that is observed indirectly (factor analysis).
 Lossy compression: reduce memory requirements and computational costs.

5 Applications
 Understanding structure: understand the relationship between items in the corpus (documents/images) and the major modes of variation (item features, such as word appearances or pixel color levels).
 Prediction: when the data matrix is only partially observed (e.g. not all users rated, or saw, all movies), matrix factorization can be used to predict the unobserved entries (collaborative filtering).

6 Linear Dimensionality Reduction
 Approximate the data matrix Y by a low-rank matrix X = UV': each data point is represented by k coordinates (a row of U) in the subspace spanned by the columns of V.

7 Matrix Factorization
 X = UV': each row of U functions as a "feature vector", and each column of V' is a linear predictor, predicting the entries in the corresponding column of Y based on the "features" in U. [MMMFnips04]
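A minimal numeric sketch (mine, not from the slides; sizes and names are illustrative) of reading X = UV' this way:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 5, 4, 2              # 5 rows of Y, 4 columns, rank-2 model
U = rng.normal(size=(n, k))    # row U[i] is the "feature vector" of row i
V = rng.normal(size=(m, k))    # row V[a] (column a of V') is a linear predictor
X = U @ V.T                    # X[i, a] = U[i] @ V[a] predicts Y[i, a]
assert np.isclose(X[2, 3], U[2] @ V[3])
```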

8 Matrix Factorization Models and Formulations
 Gaussian additive noise: Y = X + Z, Z ~ Gaussian.
 General additive noise: Y = X + Z, Z ~ any distribution.
 Low-rank models for matrices of counts: "bag of words"; for a corpus of text documents, rows correspond to documents and columns to words, and entries Y_ia can be boolean, occurrences, or frequencies.

9 Loss Functions
 Sum-squared error: the Frobenius distance, \|Y - X\|_{Fro}^2 = \sum_{ia} (Y_{ia} - X_{ia})^2.
 Another loss function: the negative log-likelihood, -\log p(Y \mid X), matched to the assumed noise model.
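A small NumPy sketch (my own) of the two losses; under i.i.d. Gaussian noise the negative log-likelihood reduces to the Frobenius loss up to scaling and an additive constant:

```python
import numpy as np

def frobenius_loss(Y, X):
    # Sum-squared error: squared Frobenius distance between Y and X
    return np.sum((Y - X) ** 2)

def gaussian_nll(Y, X, sigma=1.0):
    # Negative log-likelihood of Y = X + Z with Z ~ N(0, sigma^2) i.i.d.
    return (frobenius_loss(Y, X) / (2 * sigma ** 2)
            + Y.size * np.log(sigma * np.sqrt(2 * np.pi)))
```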

10 Outline
 Matrix Factorization Models and Formulations
 -- Dimensionality reduction and applications
 -- Matrix factorization
 -- Matrix factorization models
 -- Loss functions
 Finding Low Rank Approximations
 -- Frobenius low rank approximation
 -- Weighted low rank approximations (WLRA)
 -- EM approach and Newton approach
 Maximum Margin Matrix Factorization
 -- Collaborative filtering and collaborative prediction

11 Frobenius Low Rank Approximations
 Loss function: J(U, V) = \|A - UV'\|_{Fro}^2.
 Minimizing the loss function: requiring V'V = I and U'U diagonal (the orthogonal solution) yields U = AV, and the optimal V spans the leading eigenvectors of A'A, i.e. the solution is the truncated SVD of A.
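Equivalently, by the Eckart-Young theorem, the optimum is the truncated SVD; a quick NumPy sketch (mine):

```python
import numpy as np

def frobenius_lra(A, k):
    # Best rank-k approximation in Frobenius norm:
    # keep only the top-k singular triplets of A
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

A = np.random.default_rng(1).normal(size=(6, 4))
X = frobenius_lra(A, 2)
assert np.linalg.matrix_rank(X) == 2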

12 Weighted Low Rank Approximations
 Loss function: J(U, V) = \sum_{ia} W_{ia} (Y_{ia} - (UV')_{ia})^2.
 Minimizing the loss function: for fixed V, each row of U is the solution of an independent weighted least-squares problem.
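A sketch of the fixed-V update (my own, assuming Y and W are n-by-m and V is m-by-k so that X = UV'):

```python
import numpy as np

def update_U(Y, W, V):
    # For fixed V, row U[i] minimizes sum_a W[i,a] * (Y[i,a] - U[i] @ V[a])^2,
    # a weighted least-squares problem with normal equations
    # (V' diag(W[i]) V) U[i] = V' diag(W[i]) Y[i]
    n, k = Y.shape[0], V.shape[1]
    U = np.zeros((n, k))
    for i in range(n):
        G = V.T @ (W[i][:, None] * V)    # k x k weighted Gram matrix
        b = V.T @ (W[i] * Y[i])
        U[i] = np.linalg.solve(G, b)
    return U
```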

13 Weighted Low Rank Approximations
 Gradient-based optimization: optimize J(V) = min_U J(U, V), recovering the optimal U for each candidate V.
 The parameter space of J(V) is much smaller than that of J(U, V), making optimization of J(V) more tractable.

14 EM Approach and Newton Approach
 EM approach: simpler to implement. For 0/1 weights, the E-step fills the unobserved entries with values from the current low-rank estimate, and the M-step is an unweighted rank-k SVD of the filled matrix (see the sketch below).
 Newton approach: a second-order method, useful for other (non-quadratic) loss functions.
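A sketch of the EM iteration under the 0/1-weights assumption (my own implementation, not the paper's code):

```python
import numpy as np

def wlra_em(Y, W, k, n_iters=100):
    # E-step: fill unobserved entries (W == 0) with the current estimate X.
    # M-step: unweighted rank-k SVD of the filled matrix.
    X = W * Y
    for _ in range(n_iters):
        filled = W * Y + (1 - W) * X
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        X = (U[:, :k] * s[:k]) @ Vt[:k, :]
    return X
```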

15 Outline
 Matrix Factorization Models and Formulations
 -- Dimensionality reduction and applications
 -- Matrix factorization
 -- Matrix factorization models
 -- Loss functions
 Finding Low Rank Approximations
 -- Frobenius low rank approximation
 -- Weighted low rank approximations (WLRA)
 -- EM approach and Newton approach
 Maximum Margin Matrix Factorization
 -- Collaborative filtering and collaborative prediction

16 Collaborative Filtering and Collaborative Prediction
 "Collaborative filtering": providing users with information on what items they might like, or dislike, based on their preferences so far (perhaps as inferred from their actions) and how they relate to the preferences of other users.
 "Collaborative prediction": predicting the user's preference regarding each item, answering queries of the form "Will I like this movie?"

17 Collaborative Filtering
 Providing users with information on what items they might like, or dislike, based on their preferences so far and how they relate to the preferences of other users.

18 Matrix Completion
 Predicting the unobserved entries based on a partially observed target matrix.
 Other application: filling in missing values in a mostly observed matrix of experiment results, e.g. gene expression analysis.

19 Matrix Factorization for Collaborative Prediction
 Methods mostly differ in:
 -- how they relate real-valued entries in X to preferences in Y: viewing the entries in X as mean parameters, as natural parameters, replacing unobserved entries by zeros, and so on;
 -- the measure of discrepancy: a sum-squared loss, a logistic loss, etc.
 Method for this paper's collaborative prediction: Y = X + Z with X = UV'; fit X to the observed entries and use it to predict the unobserved ones.
 Loss function: the sum-squared error over the observed entries, \sum_{ia \: observed} (Y_{ia} - X_{ia})^2.
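With 0/1 observation weights this is exactly the WLRA setting above; a short sketch (mine) of the loss, with predictions read off the fitted X:

```python
import numpy as np

def observed_loss(Y, X, observed):
    # Sum-squared error restricted to observed entries;
    # equivalent to the WLRA loss with 0/1 weights
    return np.sum(observed * (Y - X) ** 2)

# After fitting a rank-k X to the observed entries (e.g. with wlra_em above),
# the unobserved entries of X are the collaborative predictions:
# the prediction for user i, item a is X[i, a] wherever observed[i, a] == 0.
```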

20 Matrix Factorization for Collaborative Prediction
 When U is fixed, each row of U functions as a "feature vector" and each column of V' is a linear predictor, so fitting each column of V' is a separate linear prediction problem.
 Fitting U and V jointly: learning feature vectors that work well across all of the prediction problems.

21 What I Have Read and Future Work
 What I have read:
 -- significance of recommendation;
 -- the challenge of group recommendation;
 -- methods for recommendation;
 -- tag recommendation.
 Future work:
 -- determine appropriate parameters for dynamic social media data of wide diversity;
 -- extend the current techniques to integrate more social information.

22 Significance of Recommendation
 Recommend content: helps users browse.
 Recommend users to a user: helps expand relationships.
 Recommend users to a group: helps increase community size.

23 The Challenge of Group Recommendation
 The loose semantics associated with an interest group:
 -- groups may share overlapping interests;
 -- users may contribute their images to multiple groups;
 -- groups may be formed upon non-visual properties, e.g. a group about "London".
 Solution: contextual information, such as image annotations, capture location, and capture time, provides more insight beyond the image content.

24 Methods for Recommendation
 Content-based recommendation: learning a model to represent the user's preferences.
 Collaborative filtering and collaborative prediction.
 Hybrid methods: a hybrid training strategy combining the content-based and CF methods. (ChenChen)
 Matrix factorization.

25 Tag Recommendation
 Significance: tag recommendation is important to social tagging and image search:
 -- motivates users to contribute more useful tags to an image;
 -- reminds users of richer and more specific tags;
 -- suppresses the noise in the social tagging system.

26 Tag Recommendation
 Drawbacks of social tagging:
 -- The polysemy and synonymy problem: different users may tag similar images with different words, and it is difficult for a user to input all the tags of equivalent meaning.
 -- Ambiguity: e.g. "apple" (the fruit vs. the company).

27 Tag Recommendation
 Tag ranking:
 -- estimate initial relevance scores for the tags based on probability density estimation;
 -- refine the relevance scores with a random walk over a tag-similarity graph.
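A sketch of the refinement step (a generic random walk with restart; the similarity matrix S, damping factor alpha, and the function name are my assumptions, not from the slides):

```python
import numpy as np

def refine_scores(scores, S, alpha=0.5, n_iters=50):
    # Random walk with restart over a tag-similarity graph:
    # repeatedly mix neighbor scores (via the row-normalized
    # similarity matrix) with the initial relevance scores
    P = S / S.sum(axis=1, keepdims=True)   # transition probabilities
    r0 = scores / scores.sum()
    r = r0.copy()
    for _ in range(n_iters):
        r = alpha * (P.T @ r) + (1 - alpha) * r0
    return r
```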

28 What Impressed Me Most
 A hybrid training strategy: combines Gibbs sampling (provides better initialization) and the Expectation-Maximization algorithm (faster).
 User and group contacts: boolean indicator vs. frequency.
 Learning the relationships between tags.

29 Future Work
 Determine appropriate parameters for dynamic social media data of wide diversity:
 -- survey applications and motivations;
 -- design and revise models;
 -- experiments and paper writing.

30 Future Work
 Extend the current techniques to integrate more social information:
 -- add user contact information;
 -- combine feature selection in TMSM;
 -- group cleaning.

