Scientific Computing Chapter 3 - Linear Least squares

Slides:



Advertisements
Similar presentations
5.4 Basis And Dimension.
Advertisements

5.1 Real Vector Spaces.
Linear Algebra Applications in Matlab ME 303. Special Characters and Matlab Functions.
Algebraic, transcendental (i.e., involving trigonometric and exponential functions), ordinary differential equations, or partial differential equations...
Least Squares example There are 3 mountains u,y,z that from one site have been measured as 2474 ft., 3882 ft., and 4834 ft.. But from u, y looks 1422 ft.
MATH 685/ CSI 700/ OR 682 Lecture Notes
Solving Linear Systems (Numerical Recipes, Chap 2)
Systems of Linear Equations
Chapter 2 Matrices Finite Mathematics & Its Applications, 11/e by Goldstein/Schneider/Siegel Copyright © 2014 Pearson Education, Inc.
Lesson 8 Gauss Jordan Elimination
Solution of linear system of equations
Chapter 9 Gauss Elimination The Islamic University of Gaza
1cs542g-term High Dimensional Data  So far we’ve considered scalar data values f i (or interpolated/approximated each component of vector values.
Linear Algebraic Equations
1cs542g-term Notes  Simpler right-looking derivation (sorry):
Chapter 5 Orthogonality
1cs542g-term Notes  r 2 log r is technically not defined at r=0 but can be smoothly continued to =0 there  Question (not required in assignment):
Goldstein/Schnieder/Lay: Finite Math & Its Applications, 9e 1 of 86 Chapter 2 Matrices.
Ch 7.9: Nonhomogeneous Linear Systems
Math for CSLecture 41 Linear Least Squares Problem Over-determined systems Minimization problem: Least squares norm Normal Equations Singular Value Decomposition.
Information Retrieval in Text Part III Reference: Michael W. Berry and Murray Browne. Understanding Search Engines: Mathematical Modeling and Text Retrieval.
Matrices and Systems of Equations
The Terms that You Have to Know! Basis, Linear independent, Orthogonal Column space, Row space, Rank Linear combination Linear transformation Inner product.
Chapter 2 Matrices Definition of a matrix.
1 Neural Nets Applications Vectors and Matrices. 2/27 Outline 1. Definition of Vectors 2. Operations on Vectors 3. Linear Dependence of Vectors 4. Definition.
Ordinary least squares regression (OLS)
MOHAMMAD IMRAN DEPARTMENT OF APPLIED SCIENCES JAHANGIRABAD EDUCATIONAL GROUP OF INSTITUTES.
ECE 530 – Analysis Techniques for Large-Scale Electrical Systems
Finite Mathematics & Its Applications, 10/e by Goldstein/Schneider/SiegelCopyright © 2010 Pearson Education, Inc. 1 of 86 Chapter 2 Matrices.
Chapter 10 Real Inner Products and Least-Square (cont.)
5  Systems of Linear Equations: ✦ An Introduction ✦ Unique Solutions ✦ Underdetermined and Overdetermined Systems  Matrices  Multiplication of Matrices.
SVD(Singular Value Decomposition) and Its Applications
1 1.1 © 2012 Pearson Education, Inc. Linear Equations in Linear Algebra SYSTEMS OF LINEAR EQUATIONS.
CHAPTER SIX Eigenvalues
Computational Methods in Physics PHYS 3437 Dr Rob Thacker Dept of Astronomy & Physics (MM-301C)
Systems and Matrices (Chapter5)
Eigenvalue Problems Solving linear systems Ax = b is one part of numerical linear algebra, and involves manipulating the rows of a matrix. The second main.
BMI II SS06 – Class 3 “Linear Algebra” Slide 1 Biomedical Imaging II Class 3 – Mathematical Preliminaries: Elementary Linear Algebra 2/13/06.
Advanced Computer Graphics Spring 2014 K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology.
MA2213 Lecture 5 Linear Equations (Direct Solvers)
Systems of Linear Equation and Matrices
MATH 250 Linear Equations and Matrices
CHAPTER FIVE Orthogonality Why orthogonal? Least square problem Accuracy of Numerical computation.
Presentation by: H. Sarper
ΑΡΙΘΜΗΤΙΚΕΣ ΜΕΘΟΔΟΙ ΜΟΝΤΕΛΟΠΟΙΗΣΗΣ 4. Αριθμητική Επίλυση Συστημάτων Γραμμικών Εξισώσεων Gaussian elimination Gauss - Jordan 1.
SVD: Singular Value Decomposition
Matrices CHAPTER 8.1 ~ 8.8. Ch _2 Contents  8.1 Matrix Algebra 8.1 Matrix Algebra  8.2 Systems of Linear Algebra Equations 8.2 Systems of Linear.
Chapter 3 Solution of Algebraic Equations 1 ChE 401: Computational Techniques for Chemical Engineers Fall 2009/2010 DRAFT SLIDES.
Orthogonalization via Deflation By Achiya Dax Hydrological Service Jerusalem, Israel
Elementary Linear Algebra Anton & Rorres, 9th Edition
Solving Linear Systems of Equations
MATH 685/ CSI 700/ OR 682 Lecture Notes Lecture 4. Least squares.
Elementary Linear Algebra Anton & Rorres, 9 th Edition Lecture Set – 07 Chapter 7: Eigenvalues, Eigenvectors.
Eigenvalues The eigenvalue problem is to determine the nontrivial solutions of the equation Ax= x where A is an n-by-n matrix, x is a length n column.
Chapter 9 Gauss Elimination The Islamic University of Gaza
Advanced Computer Graphics Spring 2014 K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology.
Linear Systems Dinesh A.
Instructor: Mircea Nicolescu Lecture 8 CS 485 / 685 Computer Vision.
ECE 530 – Analysis Techniques for Large-Scale Electrical Systems Prof. Hao Zhu Dept. of Electrical and Computer Engineering University of Illinois at Urbana-Champaign.
Unit #1 Linear Systems Fall Dr. Jehad Al Dallal.
1 Objective To provide background material in support of topics in Digital Image Processing that are based on matrices and/or vectors. Review Matrices.
Matrices, Vectors, Determinants.
Chapter 12: Data Analysis by linear least squares Overview: Formulate problem as an over-determined linear system of equations Solve the linear system.
Section 8 Numerical Analysis CSCI E-24 José Luis Ramírez Herrán October 20, 2015.
5 Systems of Linear Equations and Matrices
Chapter 10: Solving Linear Systems of Equations
Singular Value Decomposition
~ Least Squares example
~ Least Squares example
Presentation transcript:

Scientific Computing Chapter 3 - Linear Least squares Guanghua tan guanghuatan@gmail.com School of computer and communication, HNU

Outline Least Squares Data Fitting Least Squares Data Fitting Existence, Uniqueness, and Conditioning Solving Linear Least Squares Problems

Method of Least Squares Measurement errors are inevitable in observational and experimental sciences Errors can be smoothed out by averaging over many cases, i.e., taking more measurements than are strictly necessary to determine parameters of system Resulting system is over-determined, so usually there is no exact solution

Example: Fitting a plane A plane can be determined by 3 points Scan points may contain many points Plane equation

Linear Least Squares For linear problems, we obtain over- determined linear system Ax = b, with m×n matrix A, m > n System is better written , since equality is not exactly satisfiable when m>n Least squares solution x minimizes squared Euclidean norm of residual vector

Data Fitting Given m data points (ti, yi), find n-vector x of parameters that gives “best fit” to model function f(t, x) Problem is linear if function f is linear in components of x, where functions depend only on t

Data Fitting Polynomial fitting is linear, since polynomial linear in coefficients, though nonlinear in independent variable t Fitting sum of exponentials is example of nonlinear problem For now, we will consider only linear least squares problems

Example: Data Fitting Fitting quadratic polynomial to five data points gives linear squares problem

Example: Data Fitting For data over-determined 5×3 linear system is Solution, we will see how to compute later, is So approximating polynomial is

Outline Least Squares Data Fitting Existence, Uniqueness, and Conditioning Solving Linear Least Squares Problems

Existence and Uniqueness Linear least squares problem always has solution Solution is unique, if and only if, columns of A are linearly independent, i.e., rank(A) = n, where A is m × n If rank(A) < n, then A is rank-deficient, and solution of linear least squares problem is not unique For now we assume A has full column rank n

Necessary condition for a minimum: Normal Equations Necessary condition for a minimum: To minimize squared Euclidean norm of residual vector take derivative with respect to x and set it to 0 which reduces to n×n linear system of normal equations

Geometric Interpretation Vector v1 and v2 are orthogonal if their product is zero, Space spanned by columns of m×n matrix A, , is of dimension at most n if m>n, b generally does not lie in span(A), so there is no exact solution to Ax = b

Geometric Interpretation Vector y = Ax in span(A) closest to b in 2- norm occurs when residual r = b – Ax is orthogonal to span(A) again giving system of normal equations The solution to the least squares problem is

Pseudo-inverse and Condition Number Non-square m×n matrix A has no inverse in usual sense If rank(A) = n, pseudo-inverse is defined by and condition number by By convention,

Condition Number Just as condition number of square matrix measures closeness to singularity, condition number of rectangular matrix measures closeness to rank deficiency Least squares solution of is given by

Sensitivity and Conditioning Sensitivity of least squares solution to depends on b as well as A Define angle θ between b and y = Ax by Bound on perturbation ∆x in solution x due to perturbation ∆b in b is given by

Sensitivity and Conditioning Similarly, for perturbation E in matrix A, Condition number of least squares solution is about cond(A) if residual is small, but can be squared or arbitrarily worse for large residual

Outline Least Squares Data Fitting Existence, Uniqueness, and Conditioning Solving Linear Least Squares Problems

Normal Equations Method If m×n matrix A has rank n, then symmetric n×n matrix ATA is positive definite, so its Cholesky factorization can be used to obtain solution x to system of normal equations which has same solution as linear squares problem Normal equations method involves transformations rectangular square triangular

Example: Normal Equations Method For polynomial data-fitting example given previously, normal equations method gives

Example, continued Cholesky factorization of symmetric positive definite matrix gives Solving lower triangular system Lz = ATb by forward-substitution gives Solving upper triangular system LTx = z by back- substitution gives

Shortcomings of Normal Equations Information can be lost in forming ATA and ATb For example, take where is positive number smaller than Then in floating-point arithmetic which is singular

Shortcomings of Normal Equations Sensitivity of solution is also worsened, since

Augmented System Method Definition of residual together with orthogonality requirement give (m+n)×(m+n) augmented system Augmented system is symmetric but not positive definite, is larger than original system, and requires storing two copies of A But it allows greater freedom in choosing pivots in computing LU factorization Another way to transform a least squares problem into a square linear system is to embed it in a larger system. The definition of the residual vector r, together with the requirement that the residual to the columns of A, gives the system of two equatiions

Augmented System Introducing scaling parameter gives system which allows control over relative weights of two subsystems in choosing pivots Reasonable rule of thumb is to take Augmented system is sometimes useful, but is far from ideal in work and storage required

Orthogonal Transformation We seek alternative method that avoids numerical difficulties of normal equations We need numerically robust transformation that produces easier problem without changing solution What kind of transformation leaves least squares solution unchanged? In view f the potential numerical difficulties with the normal equations approach, we need an alternative that does not require formation of the cross-product matrix and right-hand-side vector. Thus, we seek a more numerically robust type of transformation that will yield a simpler problem whose solution is the same as that of the original least square problem but is more easily computed As with square linear system

Orthogonal Transformations Square matrix Q is orthogonal if Multiplication of vector by orthogonal matrix preserves Euclidean norm Thus, multiplying both sides of least squares problem by orthogonal matrix does not change its solution

Triangular Least Squares As with square linear systems, suitable target in simplifying least squares problems is triangular form Upper triangular over-determined (m>n) least squares problem has form where R is nxn upper triangular and b is partitioned similarly Residual is

Triangular Least Squares We have no control over second term, , but first term becomes zero if x satisfies n×n triangular system which can be solved by back-substitution Resulting x is least squares solution, and minimum sum of squares is

Triangular Least Squares So our strategy is to transform general least squares problem to triangular form using orthogonal transformation so that least squares solution is preserved

QR Factorization Given m×n matrix A, with m>n, we seek m×m orthogonal matrix Q such that where R is n×n and upper triangular Linear least squares problem is then transformed into triangular least squares problem which has same solution, since

Orthogonal Bases If we partition m×m orthogonal matrix Q = [Q1 Q2], where Q1 is m×n, then is called reduced QR factorization of A Columns of Q1 are orthonomal basis for span(A) Solution to least squares problem is given by solution to square system

Computing QR Factorization To compute QR factorization of m×n matrix A, with m>n, we annihilate subdiagonal entries of successive columns of A, eventually reaching upper triangular form Similar to LU factorization by Gaussian elimination, but use orthogonal transformations instead of elementary elimination matrices

Computing QR Factorization Possible methods include Householder transformations Givens rotations Gram-Schmidt orthogonalization We will focus mainly on the use of Householder transformations, which is the most popular and generally the most effective approach in this context, but we will sketch the other two methods as well

Householder Transformations Householder transformation has form for nonzero vector v H is orthogonal and symmetric: Given vector a, we want to choose v so that

Householder transformation

Householder Transformation Substituting into formula for H, we can take And , with sign chosen to avoid cancellation, i.e.

Example: Householder Transformation If , then we take where Since a1 is positive, we choose negative sign for to avoid cancellation, so

Example: Householder Transformation To confirm that transformation works,

Householder QR Factorization To compute QR factorization of A, use householder transformation to annihilate sub-diagonal entries of each successive column Each Householder transformation is applied to entire matrix, but does not affect prior columns, so zeros are preserved Process just described produces factorization where R is n×n and upper triangular

Householder QR Factorization If , then In applying Householder transformation H to arbitrary vector u which is much cheaper than general matrix-vector multiplication and requires only vector v, not full matrix H

Implementation of Householder QR Factorization Pseudo description For k = 1 to n {loop over columns} { compute Householder vector for current col} if βk = 0 then { skip current column if continue with next k it’s already zero } for j = k to n { apply transformation to remaining sub-matrix } end

Householder QR Factorization To preserve solution of linear least squares problem, right-hand side b is transformed by same sequence of Householder transformations Then solve triangular least squares problem For solving linear least squares problem, product Q of Householder transformations need not be formed explicitly

Householder QR Factorization R can be stored in upper triangle of array initially containing A Householder vectors v can be stored in (now zero) lower triangular portion of A (almost) Householder transformations most easily applied in this form anyway

Example: Householder QR Factorization For polynomial data fitting example given previously, with Householder vector v1 for annihilating subdiagonal entries of first column of A is

Example, continued Applying resulting Householder transformation H1 yields transformed matrix and right-hand side Householder vector v2 for annihilating subdiagonal entries of second column of H1A is

Example, continued Applying resulting Householder transformation H2 yields Householder vector v3 for annihilating subdiagonal entries of third column of H2H1A

Example, continued Applying resulting Householder transformation H3 yields Now solve upper triangular system by back-substitution to obtain

Givens rotations Given rotations introduce zeros one at a time Given vector , choose scalars c and s so that Previous equation can be rewritten

Givens Rotations Gaussian elimination yields triangular system Back-substitution then gives Finally, , implies

Example: Givens Rotation Let , to annihilate second entry we compute cosine and sine Rotation is then given by To confirm that rotation works,

Givens QR Factorization More generally, to annihilate selected component of vector in n dimensions, rotate target component with another component By systematically annihilating successive entries, we can reduce matrix to upper triangular form using sequence of Given rotations

Givens QR Factorization Each rotation is orthogonal, so their product is orthogonal, producing QR factorization Straightforward implementation of Givens method requires about 50% more work than Householder method, and also requires more storage, since each rotation requires two numbers, c and s, to define it

Givens QR Factorization These disadvantages can be overcome, but requires more complicated implementation Givens can be advantageous for computing QR factorization when many entries of matrix are already zero, since those annihilations can be skipped

Gram-Schmidt Orthogonalization Given vectors a1 and a2, we seek orthonormal vectors q1 and q2 having same span This can be accomplished by subtracting from second vector its projection onto first vector and normalizing both resulting vectors, as shown in diagram

Gram-Schmidt Orthogonalization Process can be extended to any number of vectors , orthogonalizing each successive vector against all preceding ones, giving classical Gram-Schmidt Resulting qk and rjk form reduced QR factorization of A

Modified Gram-Schmidt Classical Gram-Schmidt procedure often suffers loss of orthogonality in finite-precision Also, separate storage is required for A, Q, and R, since original ak are need in inner loop, so qk cannot overwrite columns of A Both deficiencies are improved by modified Gram-Schmidt procedure, with each vector orthogonalized in turn against all subsequent vectors, so qk can overwrite ak

Modified Gram-Schmidt Factorization Modified Gram-Schmidt algorithm

Rank Deficiency If rank(A)<n, then QR factorization still exists, but yields singular upper triangular factor R, and multiple vectors x give minimum residual norm Common practice selects minimum residual solutions x having smallest norm Can be computed by QR factorization with column pivoting or by singular value decomposition (SVD) Rank of matrix is often not clear cut in practice, so relative tolerance is used to determine rank So far we have assumed that A is of full column ran, i.e. rank(A)=n. If this is not the case, i.e., if A has linearly dependent columns, then the QR factorization still exists, but the upper triangular factor R is singular This situation usually arises from a poorly designed experiment, insufficient data, or inadequare or redundamt model

Example: Near Rank Deficiency Consider 3x2 matrix Computing QR factorization R is extremely close to singular (exactly singular to 3-digit accuracy of problem statement)

Example: Near Rank Deficiency If R is used to solve linear least squares problem, result is highly sensitive to perturbations in right-hand side For practical purpose, rank(A)=1 rather than 2, because columns are nearly linearly dependent

QR with Column Pivoting Instead of processing columns in natural order, select for reduction at each stage column of remaining unreduced submatrix having maximum Euclidean norm If rank(A)=k<n, then after k steps, norms of remaining unreduced columns will be zero (or “negligible” in finite-precision arithmetic) below row k Yields orthogonal factorization of form Where R is kxk, upper triangular, and nonsingular, and permutation matrix that performs column interchanges

QR with Column Pivoting Basic Solution to least squares problem can now be computed by solving triangular system , where c1 contains first k components of QTb, and then taking Minimum-norm solution can be computed, if desired, at expense of additional processing to annihilate S Basic solution, i.e, a solution having at most k nonzero components

QR with Column Pivoting Rank(A) is usually unknown, so rank is determined by monitoring norms of remaining unreduced columns and terminating factorization when maximum value falls below chosen tolerance

Singular Value Decomposition Singular value decomposition (SVD) of m×n matrix A has form where U is m×m orthogonal matrix, V is n×n orthogonal matrix, and ∑ is m×n diagonal matrix, with

Singular Value Decomposition Diagonal entries , called singular values of A, are usually ordered so that Columns ui of U and vi of V are called left and right singular vectors Computation of SVD will be discussed in next chapter, here only covers the application of SVD

Example: SVD SVD of is given by

Applications of SVD Minimum norm solution to is given by For ill-conditioned or rank deficient problems, “small” singular values can be omitted from summation to stabilize solution Euclidean matrix norm: Euclidean condition number of matrix: Rank of matrix: number of nonzero singular values

Pseudo-inverse Define pseudo-inverse of ( possibly rectangular ) diagonal matrix by transposing and taking scalar pseudo-inverse of each entry Then pseudo-inverse of general real m×n matrix A is given by Pseudo-inverse always exists whether or not matrix is square or has full rank If A is square and nonsingular, then In all cases, minimum-norm solution to is given by

Lower-Rank Matrix Approximation Another way to write SVD is with Ei has rank 1 and can be stored using only m+n storage locations Condensed approximation to A is obtained by omitting from summation terms corresponding to small singular values Approximation using k largest singular values is closest matrix of rank k to A Approximation is useful in image processing, data compression , information retrieval, cryptography, etc

Total Least Squares Ordinary least squares is applicable when right-hand side b is subject to random error but matrix A is known accurately When all data, including A, are subject to error, then total least squares is more appropriate Total least squares minimizes orthogonal distances, rather than vertical distances, between model and data

Total least squares solution can be computed from SVD of [A, b]

Comparison of Methods Forming normal equations matrix ATA requires about n2m/2 multiplications, and solving resulting symmetric linear system requires about n3/6 multiplications Solving least squares problem using Householder QR factorization requires about mn2-n3/3 multiplications If m ≈ n, both methods require about same amount of work If m>>n, Householder QR requires about twice as much as normal equations The choice among them depends on the particular problem being solved and involves tradeoffs among efficiency, accuracy and reliability

Comparison of Methods Cost of SVD is proportional to mn2+n3, with proportionality constant ranging from 4 to 10, depending on algorithms used Normal equations method produces solution whose relative error is proportional to [cond(A)]2 Required Cholesky factorization can be expected to break down if or worse

Comparison of Methods Householder method produces solution whose relative error is proportional to which is best possible, since is inherent sensitivity of solution to least squares problem Householder method can be expected to break down (in back-substitution phase) only if cond(A) or worse

Comparison Methods Householder is more accurate and more broadly applicable than normal equations These advantages may not be worth additional cost, however, when problem is sufficiently well conditioned that normal equations provide sufficient accuracy For rank-deficient or nearly rank-deficient problems, Householder with column pivoting can produce useful solution when normal equations method fails outright SVD is even more robust and reliable than Householder, but substantially more expensive

Time Adjustment I want to adjust the class next Thursday to next Tuesday Is there any time available? Morning or afternoon?