A Note on Rectangular Quotients
By Achiya Dax
Hydrological Service, Jerusalem, Israel

The Symmetric Case
S = ( a_ij ), a symmetric positive semi-definite n x n matrix,
with eigenvalues  λ_1 ≥ λ_2 ≥ … ≥ λ_n ≥ 0  and eigenvectors  v_1, v_2, …, v_n :
    S v_j = λ_j v_j,   j = 1, …, n.
In matrix form,  S V = V D,  where
    V = [v_1, v_2, …, v_n],   V^T V = V V^T = I,   D = diag{ λ_1, λ_2, …, λ_n },
so that
    S = V D V^T = Σ_j λ_j v_j v_j^T.

Low-Rank Approximations
S = λ_1 v_1 v_1^T + … + λ_n v_n v_n^T
T_1 = λ_1 v_1 v_1^T
T_2 = λ_1 v_1 v_1^T + λ_2 v_2 v_2^T
⋮
T_k = λ_1 v_1 v_1^T + λ_2 v_2 v_2^T + … + λ_k v_k v_k^T
T_k is a low-rank approximation of order k.

The Rayleigh Quotient
ρ = ρ(v, S) = v^T S v / v^T v
ρ = arg min f(θ),   f(θ) = || S v − θ v ||_2
ρ estimates the eigenvalue corresponding to v.

The Power Method
Starting with some unit vector p_0, the k-th iteration, k = 1, 2, 3, …, is:
Step 1: Compute  w_k = S p_{k-1}
Step 2: Compute  ρ_k = ( p_{k-1} )^T w_k
Step 3: Normalize  p_k = w_k / || w_k ||_2
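A minimal NumPy sketch of these three steps; the function name, seed, and iteration count are illustrative choices, not part of the original note, and it assumes S p_k never vanishes:

```python
import numpy as np

def power_method(S, iters=50):
    """Estimate the dominant eigenpair of a symmetric PSD matrix S."""
    p = np.random.default_rng(0).standard_normal(S.shape[0])
    p /= np.linalg.norm(p)              # unit starting vector p_0
    rho = 0.0
    for _ in range(iters):
        w = S @ p                       # Step 1: w_k = S p_{k-1}
        rho = p @ w                     # Step 2: rho_k = p_{k-1}^T w_k
        p = w / np.linalg.norm(w)       # Step 3: p_k = w_k / ||w_k||_2
    return rho, p
```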

THE POWER METHOD
Asymptotic Rates of Convergence ( assuming λ_1 > λ_2 ):
{ p_k } → v_1 at a linear rate, proportional to λ_2 / λ_1.
{ ρ_k } → λ_1 at a linear rate, proportional to ( λ_2 / λ_1 )^2.
Monotony:  λ_1 ≥ … ≥ ρ_k ≥ … ≥ ρ_2 ≥ ρ_1 > 0.

THE POWER METHOD
The asymptotic rates of convergence depend on the ratio λ_2 / λ_1 and can be arbitrarily slow. Yet ρ_k provides a fair estimate of λ_1 within a few iterations!
For a "worst case analysis" see D.P. O'Leary, G.W. Stewart and J.S. Vandergraft, "Estimating the largest eigenvalue of a positive definite matrix", Math. Comp., 33 (1979), pp. –1292.

THE POWER METHOD
An eigenvector v_j is called "large" if λ_j ≥ λ_1 / 2 and "small" if λ_j < λ_1 / 2.
In most practical situations, for "small" eigenvectors p_k^T v_j becomes negligible after a small number of iterations. Thus, after a few iterations p_k actually lies in a subspace spanned by the "large" eigenvectors.

Deflation by Subtraction
S   = λ_1 v_1 v_1^T + … + λ_n v_n v_n^T
S_1 = S − λ_1 v_1 v_1^T = λ_2 v_2 v_2^T + … + λ_n v_n v_n^T
S_2 = S_1 − λ_2 v_2 v_2^T = λ_3 v_3 v_3^T + … + λ_n v_n v_n^T
⋮
S_{n-1} = λ_n v_n v_n^T
S_n = 0
Hotelling (1933, 1943)
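A hedged sketch of this deflation scheme, driven by the power iterations above; the function name and iteration count are illustrative, and it assumes k does not exceed the rank of S:

```python
import numpy as np

def hotelling_deflation(S, k, iters=50):
    """Hotelling deflation: peel off the k dominant eigenpairs of a symmetric PSD matrix S."""
    S = S.astype(float).copy()
    rng = np.random.default_rng(0)
    pairs = []
    for _ in range(k):
        v = rng.standard_normal(S.shape[0])
        for _ in range(iters):               # power iterations on the current residual S_j
            w = S @ v
            v = w / np.linalg.norm(w)
        lam = v @ S @ v                      # Rayleigh quotient of the unit vector v
        pairs.append((lam, v))
        S -= lam * np.outer(v, v)            # S_j = S_{j-1} - lambda_j v_j v_j^T
    return pairs
```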

The Frobenius Norm
A = ( a_ij ),   || A ||_F = ( Σ_{i,j} | a_ij |^2 )^{1/2}

The Minimum Norm Approach
Let the vector v* solve the minimum norm problem
    minimize E(v) = || S − v v^T ||_F^2.
Then  v_1 = v* / || v* ||_2  and  λ_1 = (v*)^T v*.

The Symmetric Quotient
Given any vector u, the Symmetric Quotient
    θ(u) = u^T S u / ( u^T u )^2
solves the one-parameter problem
    minimize f(θ) = || S − θ u u^T ||_F^2.
That is,  θ(u) = arg min f(θ).
If || u ||_2 = 1 then  θ(u) = ρ(u) = u^T S u.

The Symmetric Quotient Equality
The equality
    || S − θ(u) u u^T ||_F^2 = || S ||_F^2 − ( ρ(u) )^2
means that solving
    minimize F(u) = || S − u u^T ||_F^2
is equivalent to solving
    maximize ρ(u) = u^T S u / u^T u.
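A quick numerical check of this equality; an illustrative NumPy snippet with an arbitrary test matrix and vector:

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((5, 5))
S = B @ B.T                                   # a symmetric PSD test matrix
u = rng.standard_normal(5)

theta = (u @ S @ u) / (u @ u) ** 2            # symmetric quotient theta(u)
rho = (u @ S @ u) / (u @ u)                   # Rayleigh quotient rho(u)

lhs = np.linalg.norm(S - theta * np.outer(u, u), 'fro') ** 2
rhs = np.linalg.norm(S, 'fro') ** 2 - rho ** 2
print(np.isclose(lhs, rhs))                   # True: the symmetric quotient equality
```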

Can we extend these tools to rectangular matrices?

The Rectangular Case
A = ( a_ij ), a real m x n matrix,  p = min{ m, n },
with singular values  σ_1 ≥ σ_2 ≥ … ≥ σ_p ≥ 0,
left singular vectors  u_1, u_2, …, u_p,
and right singular vectors  v_1, v_2, …, v_p :
    A v_j = σ_j u_j,   A^T u_j = σ_j v_j,   j = 1, …, p.

The Singular Value Decomposition
A = U Σ V^T
Σ = diag{ σ_1, σ_2, …, σ_p },   p = min{ m, n }
U = [u_1, u_2, …, u_p],   U^T U = I
V = [v_1, v_2, …, v_p],   V^T V = I
A V = U Σ,   A^T U = V Σ
A v_j = σ_j u_j,   A^T u_j = σ_j v_j,   j = 1, …, p.

Low-Rank Approximations
A = U Σ V^T = Σ_j σ_j u_j v_j^T
A = σ_1 u_1 v_1^T + σ_2 u_2 v_2^T + … + σ_p u_p v_p^T
B_1 = σ_1 u_1 v_1^T
B_2 = σ_1 u_1 v_1^T + σ_2 u_2 v_2^T
⋮
B_k = σ_1 u_1 v_1^T + σ_2 u_2 v_2^T + … + σ_k u_k v_k^T
B_k is a low-rank approximation of order k. ( Also called "truncated SVD" or "filtered SVD". )
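A short sketch of B_k computed directly from a full SVD with numpy.linalg.svd; the helper name is an illustrative choice:

```python
import numpy as np

def truncated_svd(A, k):
    """Rank-k approximation B_k = sum_{j<=k} sigma_j u_j v_j^T (the 'truncated SVD')."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

A = np.random.default_rng(2).standard_normal((8, 5))
B2 = truncated_svd(A, 2)
print(np.linalg.matrix_rank(B2))   # 2
```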

The Minimum Norm Approach
Let the vectors u* and v* solve the problem
    minimize F(u, v) = || A − u v^T ||_F^2.
Then  u_1 = u* / || u* ||_2,  v_1 = v* / || v* ||_2,  and  σ_1 = || u* ||_2 || v* ||_2.
( See the Eckart–Young and Schmidt–Mirsky theorems. )

The Rectangular Quotient
Given any vectors u and v, the Rectangular Quotient
    θ(u, v) = u^T A v / ( u^T u ) ( v^T v )
solves the one-parameter problem
    minimize f(θ) = || A − θ u v^T ||_F^2.
That is,  θ(u, v) = arg min f(θ).

The Rectangular Rayleigh Quotient
Given two vectors u and v, the Rectangular Rayleigh Quotient
    ρ(u, v) = u^T A v / ( || u ||_2 || v ||_2 )
estimates the "corresponding" singular value.

The Rectangular Rayleigh Quotient
Given two unit vectors u and v, the Rectangular Rayleigh Quotient
    ρ(u, v) = u^T A v / ( || u ||_2 || v ||_2 )
solves the following three problems:
    minimize f_1(θ) = || A − θ u v^T ||_F
    minimize f_2(θ) = || A v − θ u ||_2
    minimize f_3(θ) = || A^T u − θ v ||_2

The Rectangular Quotients Equality
Given any pair of vectors u and v, the Rectangular Quotient
    θ(u, v) = u^T A v / ( u^T u ) ( v^T v )
satisfies
    || A − θ(u, v) u v^T ||_F^2 = || A ||_F^2 − ( ρ(u, v) )^2.
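A numerical check of the rectangular quotients equality; an illustrative snippet with an arbitrary test matrix and vectors:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 4))
u = rng.standard_normal(6)
v = rng.standard_normal(4)

theta = (u @ A @ v) / ((u @ u) * (v @ v))                     # rectangular quotient theta(u, v)
rho = (u @ A @ v) / (np.linalg.norm(u) * np.linalg.norm(v))   # rectangular Rayleigh quotient

lhs = np.linalg.norm(A - theta * np.outer(u, v), 'fro') ** 2
rhs = np.linalg.norm(A, 'fro') ** 2 - rho ** 2
print(np.isclose(lhs, rhs))                                   # True
```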

The Rectangular Quotients Equality
Solving the least norm problem
    minimize F(u, v) = || A − u v^T ||_F^2
is therefore equivalent to solving
    maximize ρ(u, v) = u^T A v / ( || u ||_2 || v ||_2 ).

Approximating a Left Singular Vector
Given the right singular vector v_1, the corresponding left singular vector u_1 is attained by solving the least norm problem
    minimize g(u) = || A − u v_1^T ||_F^2.
That is,  u_1 = A v_1 / v_1^T v_1.
( The rows of A are orthogonalized against v_1^T. )

Approximating a Right Singular Vector
Given the left singular vector u_1, the corresponding right singular vector v_1 is attained by solving the least norm problem
    minimize h(v) = || A − u_1 v^T ||_F^2.
That is,  v_1 = A^T u_1 / u_1^T u_1.
( The columns of A are orthogonalized against u_1. )

Rectangular Iterations – Motivation
The k-th iteration, k = 1, 2, 3, …, starts with u_{k-1} and v_{k-1} and ends with u_k and v_k.
Given v_{k-1}, the vector u_k is obtained by solving the problem
    minimize g(u) = || A − u v_{k-1}^T ||_F^2,
that is,  u_k = A v_{k-1} / v_{k-1}^T v_{k-1}.
Then v_k is obtained by solving the problem
    minimize h(v) = || A − u_k v^T ||_F^2,
which gives  v_k = A^T u_k / u_k^T u_k.

Rectangular Iterations – Implementation
The k-th iteration, k = 1, 2, 3, …:
    u_k = A v_{k-1} / v_{k-1}^T v_{k-1},
    v_k = A^T u_k / u_k^T u_k.
The sequence { v_k / || v_k ||_2 } is obtained by applying the Power Method to the matrix A^T A.
The sequence { u_k / || u_k ||_2 } is obtained by applying the Power Method to the matrix A A^T.
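A minimal sketch of these alternating rectangular iterations in NumPy; the function name, starting vector, and iteration count are illustrative, and it assumes σ_1 > σ_2 so the iterates converge to the dominant triplet:

```python
import numpy as np

def rectangular_iterations(A, iters=50):
    """Alternating 'rectangular' iterations for the dominant singular triplet of A."""
    rng = np.random.default_rng(0)
    v = rng.standard_normal(A.shape[1])
    for _ in range(iters):
        u = A @ v / (v @ v)         # minimize ||A - u v^T||_F over u
        v = A.T @ u / (u @ u)       # minimize ||A - u v^T||_F over v
    sigma = np.linalg.norm(u) * np.linalg.norm(v)        # estimate of sigma_1
    return sigma, u / np.linalg.norm(u), v / np.linalg.norm(v)
```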

Left Iterations
    u_k = A v_{k-1} / v_{k-1}^T v_{k-1},   v_k = A^T u_k / u_k^T u_k,
    v_k^T v_k = v_k^T A^T u_k / u_k^T u_k.
Right Iterations
    v_k = A^T u_{k-1} / u_{k-1}^T u_{k-1},   u_k = A v_k / v_k^T v_k,
    u_k^T u_k = u_k^T A v_k / v_k^T v_k.
Can one see a difference?

Some Useful Relations
In both cases we have
    u_k^T u_k v_k^T v_k = u_k^T A v_k,
    || u_k ||_2 || v_k ||_2 = u_k^T A v_k / ( || u_k ||_2 || v_k ||_2 ) = ρ(u_k, v_k),
and
    θ(u_k, v_k) = u_k^T A v_k / ( u_k^T u_k v_k^T v_k ) = 1.
The objective function F(u, v) = || A − u v^T ||_F^2 satisfies
    F(u_k, v_k) = || A ||_F^2 − u_k^T u_k v_k^T v_k
and
    F(u_k, v_k) − F(u_{k+1}, v_{k+1}) = u_{k+1}^T u_{k+1} v_{k+1}^T v_{k+1} − u_k^T u_k v_k^T v_k > 0.
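A quick numerical confirmation of these relations after one "left" iteration; an illustrative snippet with an arbitrary test matrix:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((7, 5))
v = rng.standard_normal(5)
u = A @ v / (v @ v)                               # u_k from v_{k-1}
v = A.T @ u / (u @ u)                             # v_k from u_k

print(np.isclose((u @ u) * (v @ v), u @ A @ v))   # u_k^T u_k v_k^T v_k = u_k^T A v_k
theta = (u @ A @ v) / ((u @ u) * (v @ v))
print(np.isclose(theta, 1.0))                     # theta(u_k, v_k) = 1
F = np.linalg.norm(A - np.outer(u, v), 'fro') ** 2
print(np.isclose(F, np.linalg.norm(A, 'fro') ** 2 - (u @ u) * (v @ v)))
```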

Convergence Properties
Inherited from the Power Method, assuming σ_1 > σ_2:
The sequences { u_k / || u_k ||_2 } and { v_k / || v_k ||_2 } converge at a linear rate, proportional to ( σ_2 / σ_1 )^2.
{ u_k^T u_k v_k^T v_k } → ( σ_1 )^2 at a linear rate, proportional to ( σ_2 / σ_1 )^4.
Monotony:  ( σ_1 )^2 ≥ u_{k+1}^T u_{k+1} v_{k+1}^T v_{k+1} ≥ u_k^T u_k v_k^T v_k > 0.

Convergence Properties  k = || u k || 2 || v k || 2 provides a fair estimate of  1 within a few rectangular iterations !

Convergence Properties
After a few rectangular iterations, { ρ_k, u_k, v_k } provides a fair estimate of the dominant triplet { σ_1, u_1, v_1 }.

Deflation by Subtraction
A_1 = A = σ_1 u_1 v_1^T + … + σ_p u_p v_p^T
A_2 = A_1 − σ_1 u_1 v_1^T = σ_2 u_2 v_2^T + … + σ_p u_p v_p^T
A_3 = A_2 − σ_2 u_2 v_2^T = σ_3 u_3 v_3^T + … + σ_p u_p v_p^T
⋮
A_{k+1} = A_k − σ_k u_k v_k^T = σ_{k+1} u_{k+1} v_{k+1}^T + … + σ_p u_p v_p^T

Deflation by Subtraction
A_1 = A
A_2 = A_1 − σ_1 u_1 v_1^T
A_3 = A_2 − σ_2 u_2 v_2^T
⋮
A_{k+1} = A_k − σ_k u_k v_k^T
⋮
where { σ_k, u_k, v_k } denotes a computed dominant singular triplet of A_k.
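A hedged sketch of this deflation loop, using a few "left" rectangular iterations per triplet; names and iteration counts are illustrative, and it assumes k does not exceed rank(A):

```python
import numpy as np

def svd_by_deflation(A, k, iters=20):
    """Deflation: repeatedly subtract a computed dominant triplet of the residual A_k."""
    R = A.astype(float).copy()
    rng = np.random.default_rng(0)
    triplets = []
    for _ in range(k):
        v = rng.standard_normal(R.shape[1])
        for _ in range(iters):               # 'left' rectangular iterations on the residual A_k
            u = R @ v / (v @ v)
            v = R.T @ u / (u @ u)
        sigma = np.linalg.norm(u) * np.linalg.norm(v)
        triplets.append((sigma, u / np.linalg.norm(u), v / np.linalg.norm(v)))
        R -= np.outer(u, v)                  # A_{k+1} = A_k - sigma_k u_k v_k^T  (theta = 1 here)
    return triplets
```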

The Main Motivation
At the k-th stage, k = 1, 2, …, a few rectangular iterations provide a fair estimate of a dominant triplet of A_k.

Low-Rank Approximation via Deflation
σ_1 ≥ σ_2 ≥ … ≥ σ_p ≥ 0,
A = σ_1 u_1 v_1^T + σ_2 u_2 v_2^T + … + σ_p u_p v_p^T
B_1 = σ*_1 u*_1 v*_1^T   ( * means computed values )
B_2 = σ*_1 u*_1 v*_1^T + σ*_2 u*_2 v*_2^T
⋮
B_ℓ = σ*_1 u*_1 v*_1^T + σ*_2 u*_2 v*_2^T + … + σ*_ℓ u*_ℓ v*_ℓ^T
B_ℓ is a low-rank approximation of order ℓ. ( Also called "truncated SVD" or the "filtered part" of A. )

Low-Rank Approximation of Order ℓ
A = σ_1 u_1 v_1^T + σ_2 u_2 v_2^T + … + σ_p u_p v_p^T
B_ℓ = σ*_1 u*_1 v*_1^T + σ*_2 u*_2 v*_2^T + … + σ*_ℓ u*_ℓ v*_ℓ^T
B_ℓ = U_ℓ Σ_ℓ V_ℓ^T
U_ℓ = [u*_1, u*_2, …, u*_ℓ],   V_ℓ = [v*_1, v*_2, …, v*_ℓ],   Σ_ℓ = diag{ σ*_1, σ*_2, …, σ*_ℓ }
( * means computed values )
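A small illustrative helper that stacks computed triplets (for example, those returned by the deflation sketch above) into the factors U_ℓ, Σ_ℓ, V_ℓ; the function name is an assumption, not from the note:

```python
import numpy as np

def assemble_low_rank(triplets):
    """Stack computed triplets (sigma*, u*, v*) into U_l, Sigma_l, V_l with B_l = U_l Sigma_l V_l^T."""
    sigmas, us, vs = zip(*triplets)
    U = np.column_stack(us)
    V = np.column_stack(vs)
    Sigma = np.diag(sigmas)
    return U, Sigma, V

# Illustrative usage, reusing the deflation sketch above:
#   triplets = svd_by_deflation(A, 3)
#   U, Sigma, V = assemble_low_rank(triplets)
#   B3 = U @ Sigma @ V.T
```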

What About Orthogonality?
Does U_ℓ^T U_ℓ = I and V_ℓ^T V_ℓ = I ?
The theory behind the Power Method suggests that the more accurate the computed singular triplets, the smaller the deviation from orthogonality.
Is there a difference ( regarding deviation from orthogonality ) between U_ℓ and V_ℓ ?

Orthogonality Properties
( Assuming exact arithmetic. )
Theorem 1: Consider the case when each singular triplet { σ*_j, u*_j, v*_j } is computed by a finite number of "Left Iterations" ( at least one iteration for each triplet ). In this case
    U_ℓ^T U_ℓ = I   and   U_ℓ^T A_{ℓ+1} = 0,
regardless of the actual number of iterations!

Left Iterations
    u_k = A v_{k-1} / v_{k-1}^T v_{k-1},   v_k = A^T u_k / u_k^T u_k.
Right Iterations
    v_k = A^T u_{k-1} / u_{k-1}^T u_{k-1},   u_k = A v_k / v_k^T v_k.
Can one see a difference?

Orthogonality Properties
( Assuming exact arithmetic. )
Theorem 2: Consider the case when each singular triplet { σ*_j, u*_j, v*_j } is computed by a finite number of "Right Iterations" ( at least one iteration for each triplet ). In this case
    V_ℓ^T V_ℓ = I   and   A_{ℓ+1} V_ℓ = 0,
regardless of the actual number of iterations!

Finite Termination
Assuming exact arithmetic, let r = rank(A).
Corollary: In both cases we have
    A = B_r = σ*_1 u*_1 v*_1^T + … + σ*_r u*_r v*_r^T,
regardless of the number of iterations per singular triplet!

A New QR Decomposition
Assuming exact arithmetic, let r = rank(A).
In both cases we obtain an effective "rank-revealing" QR decomposition
    A = U_r Σ_r V_r^T.
In "Left Iterations"  U_r^T U_r = I.  In "Right Iterations"  V_r^T V_r = I.

The Orthogonal Basis Problem
The problem is to compute an orthogonal basis of Range(A).
The Householder and Gram–Schmidt orthogonalization methods use a "column pivoting for size" policy, which completely determines the basis.

The Orthogonal Basis Problem
The new method, "Orthogonalization via Deflation", has greater freedom in choosing the basis.
At the k-th stage, the ultimate choice for a new vector to enter the basis is u_k, the k-th left singular vector of A. ( But accurate computation of u_k can be "too expensive". )

The Main Theme
At the k-th stage, a few rectangular iterations are sufficient to provide a fair substitute for u_k.

Applications in Missing Data Reconstruction
Consider the case when some entries of A are missing:
* Missing data in DNA microarrays
* Tables of annual rain data
* Tables of water levels in observation wells
* Web search engines
Standard SVD algorithms are unable to handle such matrices. The Minimum Norm Approach is easily adapted to handle matrices with missing entries.

A Modified Algorithm
The objective function F(u, v) = || A − u v^T ||_F^2 is redefined as
    F(u, v) = Σ ( a_ij − u_i v_j )^2,
where the sum is restricted to the known entries of A.
( As before, u = (u_1, u_2, …, u_m)^T and v = (v_1, v_2, …, v_n)^T denote the vectors of unknowns. )
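A minimal sketch of the resulting alternating minimization for a single rank-one term, with the sum restricted by a boolean mask of known entries; names are illustrative, and it assumes every row and column of A has at least one known entry:

```python
import numpy as np

def rank_one_missing(A, mask, iters=100):
    """Rank-one fit of the known entries: minimize sum_{(i,j) known} (a_ij - u_i v_j)^2."""
    M = mask.astype(float)                       # 1 where a_ij is known, 0 where missing
    A0 = np.where(mask, A, 0.0)                  # missing entries contribute nothing to the sums
    v = np.random.default_rng(0).standard_normal(A.shape[1])
    for _ in range(iters):
        u = (A0 @ v) / (M @ (v * v))             # per-row least squares over the known columns
        v = (A0.T @ u) / (M.T @ (u * u))         # per-column least squares over the known rows
    return u, v
```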

The Minimum Norm Approach – Concluding Remarks
* Adds new insight into 'old' methods and concepts.
* Fast Power Methods. ( Relaxation methods, line search acceleration, etc. )
* Opens the door for new methods and concepts. ( The rectangular quotients equality, rectangular iterations, etc. )
* Orthogonalization via Deflation: a new QR decomposition. ( Low-rank approximations, rank revealing. )
* Capable of handling problems with missing data.