1/11 Tea Talk: Weighted Low Rank Approximations Ben Marlin Machine Learning Group Department of Computer Science University of Toronto April 30, 2003

2/11 Paper Details: Authors: Nathan Srebro, Tommi Jaakkola (MIT) Title: Weighted Low Rank Approximations URL: Submitted: ICML 2003

3/11 Motivation: Missing Data: Weighted LRA naturally handles data matrices with missing elements by using a 0/1 weight matrix. Noisy Data: Weighted LRA naturally handles data matrices with different noise variance estimates σ_ij for each of the elements of the matrix by setting W_ij = 1/σ_ij.

4/11 The Problem: Given an n×m data matrix D and an n×m weight matrix W, construct a rank-K approximation X = UV′ to D that minimizes the error in the weighted Frobenius norm, E_WF(X) = Σ_ij W_ij (D_ij − X_ij)². [Figure: D (n×m) ≈ X = UV′, with U of size n×K, V′ of size K×m, and weight matrix W of size n×m.]
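As a concrete illustration of this objective (not from the original slides; the function name and argument layout below are assumptions), a minimal MATLAB sketch:

function E = weighted_frob_error(D, W, U, V)
  % Weighted Frobenius error of the rank-K candidate X = U*V'.
  X = U * V';                        % rank-K reconstruction
  E = sum(sum(W .* (D - X).^2));     % weighted sum of squared residuals
end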

5/11 Relationship to standard SVD: Critical points of E_WF can be local minima that are not global minima. wSVD does not admit a solution based on eigenvectors of the data matrix D. Adding the requirement that U and V are orthogonal results in a weighted low rank approximation analogous to SVD.

6/11 Optimization Approach: Main Idea: For a given V, the optimal U_V* can be calculated analytically, as can the gradient of the projected objective function E*_WF(V) = E_WF(U_V*, V). Thus, perform gradient descent on E*_WF(V). Row by row, the analytic solution is the weighted least-squares fit (U_V*)_i = D_i d(W_i) V (V′ d(W_i) V)^(-1), where d(W_i) is the m×m matrix with the i-th row of W along the diagonal and D_i is the i-th row of D.
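A minimal MATLAB sketch of this per-row solve (written for these notes, not taken from the slides; the function name is an assumption):

function U = optimal_U_given_V(D, W, V)
  % For fixed V, solve one weighted least-squares problem per row of D:
  % U(i,:) = D_i d(W_i) V (V' d(W_i) V)^(-1).
  n = size(D, 1);
  K = size(V, 2);
  U = zeros(n, K);
  for i = 1:n
    dWi = diag(W(i, :));                            % d(W_i): i-th row of W on the diagonal
    U(i, :) = (D(i, :) * dWi * V) / (V' * dWi * V); % right-divide solves the K-by-K system
  end
end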

7/11 Missing Value Approach: Main Idea: Consider a model of the data matrix given by D = X + Z, where Z is white Gaussian noise. With 0/1 weights, the weighted cost of X is equivalent (up to sign and an additive constant) to the log-likelihood of the observed entries. This suggests an EM approach: in the E-step the missing values in D are filled in according to the values in X, creating a matrix F; in the M-step X is re-estimated as the rank-K SVD of F.
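A brief sketch of that equivalence (assuming unit noise variance for simplicity; this derivation is not in the original slides):

$$\log p(D_{\mathrm{obs}} \mid X) \;=\; -\tfrac{1}{2}\sum_{ij:\,W_{ij}=1} (D_{ij}-X_{ij})^2 + \mathrm{const} \;=\; -\tfrac{1}{2}\,E_{WF}(X) + \mathrm{const},$$

so maximizing the likelihood over rank-K matrices X is the same as minimizing the weighted Frobenius error with 0/1 weights.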

8/11 Missing Value Approach: Extension to General Weights: Consider a system with several data matrices D_n = X + Z_n, where the Z_n are independent Gaussian white noise. The maximum likelihood X in this case is found by taking the rank-K SVD of the mean of the filled-in matrices F_n. Now consider a weighted rank-K approximation problem where W_ij = w_ij/N with w_ij ∈ {1,…,N}. Such a problem can be converted to the setting above by observing D_ij in w_ij of the N matrices D_n. For any N, the mean of the N matrices F_n is given by (1/N) Σ_n F_n = W∘D + (1 − W)∘X, where ∘ denotes element-wise multiplication.

9/11 Missing Value Approach: EM Algorithm: This approach yields an extremely simple EM algorithm:
E-Step: F^t = W∘D + (1 − W)∘X^t
M-Step: Obtain U, S, V from the SVD of F^t, keep the top K singular values, and set X^{t+1} = USV′
In MATLAB:

function X = wsvd(D, W, K)
  X = zeros(size(D));                       % initial guess X^0
  Xold = inf * ones(size(D));
  while (sum(sum((X - Xold).^2)) > eps)     % iterate until X stops changing
    Xold = X;
    [U, S, V] = svd(W.*D + (1-W).*X);       % E-step: fill in missing entries, then factor
    S(K+1:end, K+1:end) = 0;                % keep only the top K singular values
    X = U * S * V';                         % M-step: rank-K reconstruction
  end

10/11 Example: Synthetic rank-2 matrix. [Figure: the data matrix, the 0/1 weight matrix, and the wSVD reconstruction with K = 2.]
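A hypothetical MATLAB run along the same lines (the synthetic data below is invented for illustration and is not the slide's original example):

% Build a random rank-2 matrix, hide roughly 30% of its entries, and reconstruct it.
n = 20; m = 15; K = 2;
Dtrue = randn(n, K) * randn(K, m);         % exact rank-2 data
W = double(rand(n, m) > 0.3);              % 0/1 weights: 1 = observed, 0 = missing
D = Dtrue .* W;                            % missing entries never enter the weighted cost
X = wsvd(D, W, K);                         % EM-based weighted rank-2 approximation
observed_err = sum(sum(W .* (D - X).^2));  % error on the observed entries
recovery_err = norm(Dtrue - X, 'fro');     % how well the hidden entries were recovered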

11/11 The End