Matrix Differential Calculus. By Dr. Md. Nurul Haque Mollah, Professor, Dept. of Statistics, University of Rajshahi, Bangladesh. 01-10-11.



Outline
- Differentiable Functions
- Classification of Functions and Variables for Derivatives
- Derivatives of Scalar Functions w.r.t. a Vector Variable
- Derivatives of Scalar Functions w.r.t. a Matrix Variable
- Derivatives of Vector Functions w.r.t. a Scalar Variable
- Derivatives of Vector Functions w.r.t. a Vector Variable
- Derivatives of Vector Functions w.r.t. a Matrix Variable
- Derivatives of Matrix Functions w.r.t. a Scalar Variable
- Derivatives of Matrix Functions w.r.t. a Vector Variable
- Derivatives of Matrix Functions w.r.t. a Matrix Variable
- Some Applications of Matrix Differential Calculus

1. Differentiable Functions

A real-valued function $g : X \to \mathbb{R}$, where $X \subseteq \mathbb{R}^n$ is an open set, is said to be continuously differentiable if the partial derivatives $\partial g/\partial x_i$ exist at each $x \in X$ and are continuous functions of $x$ over $X$. In this case we write $g \in C^1$ over $X$. Generally, we write $g \in C^p$ over $X$ if all partial derivatives of order $p$ exist and are continuous as functions of $x$ over $X$. If $g \in C^p$ over $\mathbb{R}^n$, we simply write $g \in C^p$. If $g \in C^1$ on $X$, the gradient of $g$ at a point $x \in X$ is defined as

$\nabla g(x) = \left( \frac{\partial g(x)}{\partial x_1}, \ldots, \frac{\partial g(x)}{\partial x_n} \right)'$

If $g \in C^2$ over $X$, the Hessian of $g$ at $x$ is defined to be the symmetric $n \times n$ matrix having $\partial^2 g(x)/\partial x_i \partial x_j$ as its $ij$th element:

$\nabla^2 g(x) = \left[ \frac{\partial^2 g(x)}{\partial x_i \partial x_j} \right]$

If $f : X \to \mathbb{R}^m$, where $X \subseteq \mathbb{R}^n$, then $f$ is represented by the column vector of its component functions as $f(x) = (f_1(x), \ldots, f_m(x))'$. If $X$ is open, we write $f \in C^p$ on $X$ if $f_i \in C^p$ on $X$ for each $i$. Then the derivative of the vector function $f$ with respect to the vector variable $x$ is defined by the matrix of partial derivatives

$\frac{\partial f}{\partial x} = \left[ \frac{\partial f_j(x)}{\partial x_i} \right]$

If $g$ is a real-valued function of $(x, y)$, where $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^m$, we write $\nabla_x g$ and $\nabla_y g$ for the gradients of $g$ with respect to $x$ and $y$, respectively.

If $f = (f_1, \ldots, f_m)'$ with each $f_i \in C^1$, then $\nabla f = [\nabla f_1, \ldots, \nabla f_m]$. For $f : \mathbb{R}^n \to \mathbb{R}^m$ and $g : \mathbb{R}^m \to \mathbb{R}$, consider the composite function $h$ defined by $h(x) = g(f(x))$. Then, if $f \in C^1$ and $g \in C^1$, the chain rule of differentiation is stated as

$\nabla h(x) = \nabla f(x)\, \nabla g(f(x))$

2. Classification of Functions and Variables for Derivatives

Let us consider scalar functions g, vector functions f, and matrix functions F. Each of these may depend on one real variable x, a vector of real variables x, or a matrix of real variables X. We thus obtain the classification of functions and variables shown in the following table.

                      Scalar variable x | Vector variable x | Matrix variable X
Scalar function g          g(x)         |       g(x)        |       g(X)
Vector function f          f(x)         |       f(x)        |       f(X)
Matrix function F          F(x)         |       F(x)        |       F(X)

Some Examples of Scalar, Vector and Matrix Functions

3. Derivatives of Scalar Functions w.r.t. a Vector Variable

3.1 Definition

Consider a scalar-valued function g of m variables, $g = g(x_1, x_2, \ldots, x_m) = g(x)$, where $x = (x_1, x_2, \ldots, x_m)'$. Assuming the function g is differentiable, its gradient with respect to x is the m-dimensional column vector of partial derivatives

$\frac{\partial g}{\partial x} = \left( \frac{\partial g}{\partial x_1}, \frac{\partial g}{\partial x_2}, \ldots, \frac{\partial g}{\partial x_m} \right)'$

3.2 Example 1

Consider the simple linear function of x, $g(x) = a'x = \sum_{i=1}^m a_i x_i$, where a is a constant vector. Then the gradient of g with respect to x is given by

$\frac{\partial g}{\partial x} = \frac{\partial (a'x)}{\partial x} = a$

Also we can write it as $\partial (x'a)/\partial x = a$. Because the gradient is constant (independent of x), the Hessian matrix of g is zero.
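As a quick numerical sketch (assuming NumPy is available; `num_grad` is a small helper written here, not a library routine), a central-difference gradient of $g(x) = a'x$ recovers the constant vector a:

```python
import numpy as np

def num_grad(g, x, h=1e-6):
    """Central-difference gradient of a scalar function g at the point x."""
    grad = np.zeros_like(x, dtype=float)
    for i in range(len(x)):
        e = np.zeros_like(x, dtype=float)
        e[i] = h
        grad[i] = (g(x + e) - g(x - e)) / (2 * h)
    return grad

a = np.array([2.0, -1.0, 3.0])
g = lambda x: a @ x                   # g(x) = a'x
x0 = np.array([0.5, 1.5, -2.0])

# The numerical gradient matches the constant vector a at any point.
print(np.allclose(num_grad(g, x0), a))
```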

Example 2

Consider the quadratic form $g(x) = x'Ax = \sum_{i=1}^m \sum_{j=1}^m a_{ij} x_i x_j$, where $A = (a_{ij})$ is an m × m square matrix. Then the gradient of g(x) with respect to x is given by

$\frac{\partial g}{\partial x} = \frac{\partial (x'Ax)}{\partial x} = (A + A')x$

Then the second-order gradient, or Hessian matrix, of $g(x) = x'Ax$ with respect to x becomes

$\frac{\partial^2 g}{\partial x\, \partial x'} = A + A'$

which reduces to $2A$ when A is symmetric.
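The gradient formula for the quadratic form can be verified numerically; this is a sketch assuming NumPy, with an illustrative randomly generated matrix:

```python
import numpy as np

# For g(x) = x'Ax, the gradient is (A + A')x.
rng = np.random.default_rng(0)
m = 4
A = rng.standard_normal((m, m))
x = rng.standard_normal(m)

g = lambda v: v @ A @ v
h = 1e-5
# Central-difference gradient, one coordinate direction e at a time.
grad = np.array([(g(x + h*e) - g(x - h*e)) / (2*h) for e in np.eye(m)])

print(np.allclose(grad, (A + A.T) @ x, atol=1e-6))
```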

3.3 Some useful rules for derivatives of scalar functions w.r.t. vectors

For computing the gradients of products and quotients of functions, as well as of composite functions, the same rules apply as for ordinary functions of one variable. Thus

$\nabla (fg) = g \nabla f + f \nabla g, \qquad \nabla (f/g) = \frac{g \nabla f - f \nabla g}{g^2}, \qquad \nabla f(g(x)) = f'(g(x))\, \nabla g(x)$

The gradient of the composite function f(g(x)) can be generalized to any number of nested functions, giving the same chain rule of differentiation that is valid for functions of one variable.
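The chain rule can be illustrated with a hypothetical composite (a NumPy sketch; the choice of functions is ours, not from the slides): $f(u) = \sin u$ and $g(x) = x'x$, so the gradient of $f(g(x))$ is $\cos(x'x) \cdot 2x$:

```python
import numpy as np

x = np.array([0.3, -0.7, 1.1])
h = 1e-6

# Composite f(g(x)) with f = sin and g(x) = x'x.
comp = lambda v: np.sin(v @ v)

# Central-difference gradient versus the chain-rule answer cos(x'x) * 2x.
num = np.array([(comp(x + h*e) - comp(x - h*e)) / (2*h) for e in np.eye(3)])
analytic = np.cos(x @ x) * 2 * x

print(np.allclose(num, analytic, atol=1e-6))
```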

3.4 Fundamental Rules for Matrix Differential Calculus

In differential notation, the basic rules are the same as in ordinary calculus (for conformable matrices A and B):

$d(A + B) = dA + dB, \quad d(AB) = (dA)B + A(dB), \quad d(A') = (dA)', \quad d(A^{-1}) = -A^{-1}(dA)A^{-1}$

$d\,\mathrm{tr}(A) = \mathrm{tr}(dA), \quad d|A| = |A|\,\mathrm{tr}(A^{-1}dA)$

3.5 Some useful derivatives of scalar functions w.r.t. a vector variable

$\frac{\partial (a'x)}{\partial x} = a, \quad \frac{\partial (x'x)}{\partial x} = 2x, \quad \frac{\partial (x'Ax)}{\partial x} = (A + A')x, \quad \frac{\partial (x'Ay)}{\partial x} = Ay$

4. Derivatives of Scalar Functions w.r.t. a Matrix Variable

4.1 Definition

Consider a scalar-valued function f of the elements of a matrix $X = (x_{ij})$, $f = f(X) = f(x_{11}, x_{12}, \ldots, x_{ij}, \ldots, x_{mn})$. Assuming the function f is differentiable, its matrix gradient with respect to X is the m × n matrix of partial derivatives

$\frac{\partial f}{\partial X} = \left[ \frac{\partial f}{\partial x_{ij}} \right]$

4.2 Example 1

The trace of a matrix is a scalar function of the matrix elements. Let $X = (x_{ij})$ be an m × m square matrix whose trace is denoted tr(X). Then

$\frac{\partial\, \mathrm{tr}(X)}{\partial X} = I_m$

Proof: The trace of X is defined by $\mathrm{tr}(X) = \sum_{i=1}^m x_{ii}$. Taking the partial derivative of tr(X) with respect to one of the elements, say $x_{ij}$, gives

$\frac{\partial\, \mathrm{tr}(X)}{\partial x_{ij}} = \delta_{ij} = \begin{cases} 1, & i = j \\ 0, & i \neq j \end{cases}$

Thus we get

$\frac{\partial\, \mathrm{tr}(X)}{\partial X} = \left[ \delta_{ij} \right] = I_m$
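A numerical sketch of this result (assuming NumPy): perturbing $x_{ij}$ changes tr(X) only when $i = j$, so the matrix gradient comes out as the identity:

```python
import numpy as np

m = 3
X = np.arange(1.0, 10.0).reshape(m, m)
h = 1e-6
grad = np.zeros((m, m))
for i in range(m):
    for j in range(m):
        E = np.zeros((m, m))
        E[i, j] = h
        # Central difference of tr(X) with respect to the (i, j) element.
        grad[i, j] = (np.trace(X + E) - np.trace(X - E)) / (2 * h)

print(np.allclose(grad, np.eye(m)))
```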

4.2 Example 2

The determinant of a matrix is a scalar function of the matrix elements. Let $X = (x_{ij})$ be an m × m invertible square matrix whose determinant is denoted |X|. Then

$\frac{\partial |X|}{\partial X} = |X| (X^{-1})'$

Proof: The inverse of a matrix X is obtained as $X^{-1} = \mathrm{adj}(X)/|X|$, where adj(X) is known as the adjoint (adjugate) matrix of X. It is defined by $\mathrm{adj}(X) = (C_{ij})'$, where $C_{ij} = (-1)^{i+j} M_{ij}$ is the cofactor with respect to $x_{ij}$ and $M_{ij}$ is the minor with respect to $x_{ij}$.

The minor $M_{ij}$ is obtained by first taking the (m−1) × (m−1) sub-matrix of X that remains when the i-th row and j-th column of X are removed, and then computing the determinant of this sub-matrix. Thus the determinant |X| can be expressed in terms of the cofactors as follows (cofactor expansion along row i):

$|X| = \sum_{k=1}^m x_{ik} C_{ik}$

Row i can be any row, and the result is always the same. None of the elements of the i-th row appear in the cofactors $C_{ik}$, so the determinant is a linear function of these elements. Taking now the partial derivative of |X| with respect to one of the elements, say $x_{ij}$, gives

$\frac{\partial |X|}{\partial x_{ij}} = C_{ij}$

Thus we get

$\frac{\partial |X|}{\partial X} = (C_{ij}) = \mathrm{adj}(X)' = |X| (X^{-1})'$

This also implies that

$\frac{\partial \log |X|}{\partial X} = \frac{1}{|X|} \frac{\partial |X|}{\partial X} = (X^{-1})' = (X')^{-1}$
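A numerical sketch of the determinant gradient (assuming NumPy; the test matrix is an arbitrary well-conditioned choice):

```python
import numpy as np

rng = np.random.default_rng(1)
m = 3
X = rng.standard_normal((m, m)) + 3 * np.eye(m)   # keep X safely invertible
h = 1e-6
grad = np.zeros((m, m))
for i in range(m):
    for j in range(m):
        E = np.zeros((m, m))
        E[i, j] = h
        # Central difference of det(X) with respect to the (i, j) element.
        grad[i, j] = (np.linalg.det(X + E) - np.linalg.det(X - E)) / (2 * h)

# Compare with the analytic result |X| (X^{-1})'.
print(np.allclose(grad, np.linalg.det(X) * np.linalg.inv(X).T, atol=1e-6))
```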

4.3 Some useful derivatives of scalar functions w.r.t. a matrix variable

$\frac{\partial\, \mathrm{tr}(X)}{\partial X} = I, \qquad \frac{\partial (a'Xb)}{\partial X} = ab', \qquad \frac{\partial |X|}{\partial X} = |X|(X^{-1})', \qquad \frac{\partial \log|X|}{\partial X} = (X')^{-1}$

Derivatives of traces w.r.t. a matrix

$\frac{\partial\, \mathrm{tr}(AX)}{\partial X} = A', \qquad \frac{\partial\, \mathrm{tr}(X'AX)}{\partial X} = (A + A')X, \qquad \frac{\partial\, \mathrm{tr}(X^k)}{\partial X} = k\,(X^{k-1})'$


Derivatives of determinants w.r.t. a matrix

$\frac{\partial |AXB|}{\partial X} = |AXB|\,(X^{-1})', \qquad \frac{\partial \log|X'X|}{\partial X} = 2X(X'X)^{-1}$

5. Derivatives of Vector Functions w.r.t. a Scalar Variable

5.1 Definition

Consider the vector-valued function f of a scalar variable x, $f(x) = [f_1(x), f_2(x), \ldots, f_n(x)]'$. Assuming the function f is differentiable, its gradient with respect to x is the n-dimensional row vector of partial derivatives

$\frac{\partial f}{\partial x} = \left[ \frac{\partial f_1(x)}{\partial x}, \frac{\partial f_2(x)}{\partial x}, \ldots, \frac{\partial f_n(x)}{\partial x} \right]$

5.2 Example

Let $f(x) = [f_1(x), f_2(x), \ldots, f_n(x)]'$. Then the gradient of f with respect to x is obtained by differentiating each component function $f_i(x)$ separately with respect to x.
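For illustration, take the hypothetical example $f(x) = [x, x^2, \sin x]'$ (our choice, not from the slides), so $df/dx = [1, 2x, \cos x]$; a NumPy sketch:

```python
import numpy as np

# Vector-valued function of a scalar: differentiate component by component.
f = lambda x: np.array([x, x**2, np.sin(x)])
x0, h = 0.8, 1e-6

# Central difference of each component at once.
num = (f(x0 + h) - f(x0 - h)) / (2 * h)
print(np.allclose(num, [1.0, 2*x0, np.cos(x0)], atol=1e-6))
```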

6. Derivatives of Vector Functions w.r.t. a Vector Variable

6.1 Definition

Consider the vector-valued function f of a vector variable $x = (x_1, x_2, \ldots, x_m)'$, $f(x) = y = [y_1 = f_1(x), y_2 = f_2(x), \ldots, y_n = f_n(x)]'$. Assuming the function f is differentiable, its gradient with respect to x is the m × n matrix of partial derivatives

$\frac{\partial f}{\partial x} = \left[ \frac{\partial y_j}{\partial x_i} \right], \quad i = 1, \ldots, m;\; j = 1, \ldots, n$

6.2 Example

Let f(x) = y with components $y_j = f_j(x_1, \ldots, x_m)$. Then the gradient of f(x) with respect to x is the m × n matrix whose (i, j) element is $\partial y_j / \partial x_i$.
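For illustration, take the hypothetical example $f(x_1, x_2) = [x_1 + x_2, x_1 x_2]'$ (our choice). Under the convention above, the gradient has (i, j) entry $\partial y_j / \partial x_i$, giving $[[1, x_2], [1, x_1]]$; a NumPy sketch:

```python
import numpy as np

f = lambda x: np.array([x[0] + x[1], x[0] * x[1]])
x = np.array([2.0, 5.0])
h = 1e-6

# Row i holds the partial derivatives of all components w.r.t. x_i.
grad = np.array([(f(x + h*e) - f(x - h*e)) / (2*h) for e in np.eye(2)])
print(np.allclose(grad, [[1.0, x[1]], [1.0, x[0]]]))
```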

7. Derivatives of Vector Functions w.r.t. a Matrix Variable

7.1 Definition

Consider the vector-valued function f of a matrix variable $X = (x_{ij})$ of order m × n, $f(X) = y = [y_1 = f_1(X), y_2 = f_2(X), \ldots, y_q = f_q(X)]'$. Assuming the function f is differentiable, its gradient with respect to X is the mn × q matrix of partial derivatives

$\frac{\partial f}{\partial X} = \left[ \frac{\partial y_k}{\partial x_{ij}} \right]$

with the mn rows indexed by the elements $x_{ij}$ of X and the q columns by the components of y.

7.2 Example

Let f(X) = y as above. Then the gradient of f(X) with respect to the matrix variable X is obtained by stacking, for each component $y_k$, the partial derivatives $\partial y_k / \partial x_{ij}$ into the corresponding column.

8. Derivatives of Matrix Functions w.r.t. a Scalar Variable

8.1 Definition

Consider the matrix-valued function F of a scalar variable x, $F(x) = Y = [y_{ij} = f_{ij}(x)]_{m \times n}$. Assuming the function F is differentiable, its gradient with respect to the scalar x is the m × n matrix of partial derivatives

$\frac{\partial F}{\partial x} = \left[ \frac{\partial y_{ij}}{\partial x} \right]_{m \times n}$

8.2 Example

Let F(x) = Y with elements $y_{ij} = f_{ij}(x)$. Then the gradient of F(x) with respect to the scalar variable x is obtained by differentiating each element $f_{ij}(x)$ with respect to x.
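For illustration, take the hypothetical example $F(x) = [[x, x^2], [\sin x, e^x]]$ (our choice) and differentiate element by element; a NumPy sketch:

```python
import numpy as np

F = lambda x: np.array([[x, x**2], [np.sin(x), np.exp(x)]])
x0, h = 0.5, 1e-6

# Central difference of the whole matrix at once: dF/dx element by element.
num = (F(x0 + h) - F(x0 - h)) / (2 * h)
expected = np.array([[1.0, 2*x0], [np.cos(x0), np.exp(x0)]])
print(np.allclose(num, expected, atol=1e-5))
```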

9. Derivatives of Matrix Functions w.r.t. a Vector Variable

9.1 Definition

Consider the matrix-valued function F of a vector variable $x = (x_1, x_2, \ldots, x_m)'$, $F(x) = Y = [y_{jk} = f_{jk}(x)]_{n \times q}$. Assuming the function F is differentiable, its gradient with respect to the vector x is the m × nq matrix of partial derivatives

$\frac{\partial F}{\partial x} = \left[ \frac{\partial y_{jk}}{\partial x_i} \right]$

with row i containing the partial derivatives of all nq elements of Y with respect to $x_i$.

9.2 Example

Let F(x) = Y as above. Then the gradient of F(x) with respect to the vector variable x is obtained by differentiating every element $y_{jk}$ with respect to each component $x_i$.

10. Derivatives of Matrix Functions w.r.t. a Matrix Variable

10.1 Definition

Consider the matrix-valued function F of a matrix variable $X = (x_{ij})_{m \times p}$, $F(X) = Y = [y_{kl} = f_{kl}(X)]_{n \times q}$. Assuming the function F is differentiable, its gradient with respect to the matrix X is the mp × nq matrix of partial derivatives

$\frac{\partial F}{\partial X} = \left[ \frac{\partial y_{kl}}{\partial x_{ij}} \right]$

10.2 Example

Let F(X) = Y as above. Then the gradient of F(X) with respect to the matrix variable X collects the partial derivatives $\partial y_{kl} / \partial x_{ij}$ of every element of Y with respect to every element of X.


Some important rules for matrix differentiation

Homework

11. Some Applications of Matrix Differential Calculus

1. Test of independence between functions
2. Expansion of Taylor series
3. Transformations of multivariate density functions
4. Multiple integration
5. And so on.

Test of Independence

A set of functions $f_1, \ldots, f_n$ of the variables $x_1, \ldots, x_n$ is said to be functionally dependent (not independent of each other) if their Jacobian determinant is zero. That is,

$J = \left| \frac{\partial (f_1, \ldots, f_n)}{\partial (x_1, \ldots, x_n)} \right| = 0$

Example: To show that a given set of functions is not independent of one another, compute their Jacobian determinant and verify that it vanishes identically; then the functions are not independent.
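For illustration, take the hypothetical pair $u = x + y$ and $v = (x + y)^2$ (our choice). Since $v = u^2$, the two functions are dependent, and their Jacobian determinant vanishes identically; a NumPy sketch:

```python
import numpy as np

def jacobian_det(x, y):
    # J = | du/dx  du/dy |   | 1        1      |
    #     | dv/dx  dv/dy | = | 2(x+y)   2(x+y) |
    J = np.array([[1.0, 1.0],
                  [2*(x + y), 2*(x + y)]])
    return np.linalg.det(J)

# The determinant is (numerically) zero at an arbitrary point.
print(abs(jacobian_det(1.3, -0.4)) < 1e-12)
```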

Taylor series expansions of multivariate functions

In deriving some of the gradient-type learning algorithms, we have to resort to the Taylor series expansion of a function g(x) of a scalar variable x:

$g(x) = g(x_0) + g'(x_0)(x - x_0) + \frac{1}{2} g''(x_0)(x - x_0)^2 + \cdots$   (3.19)

We can do a similar expansion for a function $g(x) = g(x_1, x_2, \ldots, x_m)$ of m variables. We have

$g(x) = g(x_0) + \nabla g(x_0)'(x - x_0) + \frac{1}{2}(x - x_0)' \nabla^2 g(x_0)(x - x_0) + \cdots$   (3.20)

where the derivatives are evaluated at the point $x_0$. The second term is the inner product of the gradient vector with the vector $x - x_0$, and the third term is a quadratic form with the symmetric Hessian matrix $\partial^2 g / \partial x \partial x'$. The truncation error depends on the distance $\|x - x_0\|$; the distance has to be small if g(x) is approximated using only the first- and second-order terms.
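The second-order expansion in (3.20) can be checked numerically with a hypothetical test function $g(x) = e^{-x'x}$ (our choice; its gradient and Hessian at $x_0$ are entered analytically); a NumPy sketch:

```python
import numpy as np

# g(x) = exp(-x'x); at x0 the gradient is -2*x0*g(x0) and the
# Hessian is (4*x0*x0' - 2I)*g(x0).
g = lambda x: np.exp(-(x @ x))
x0 = np.array([0.2, -0.1])
grad = -2 * x0 * g(x0)
H = (4 * np.outer(x0, x0) - 2 * np.eye(2)) * g(x0)

# Second-order Taylor approximation at a nearby point x0 + d.
d = np.array([0.001, -0.002])
approx = g(x0) + grad @ d + 0.5 * d @ H @ d
print(abs(g(x0 + d) - approx) < 1e-6)
```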

The same expansion can be made for a scalar function of a matrix variable. The second-order term already becomes complicated, because the second-order gradient is a four-dimensional tensor. But we can easily extend the first-order term in (3.20), the inner product of the gradient with the vector $x - x_0$, to the matrix case. Remember that the vector inner product is defined as

$a'b = \sum_i a_i b_i$

For the matrix case, this must become the sum

$\sum_i \sum_j \frac{\partial g}{\partial x_{ij}} (x_{ij} - x^0_{ij})$

This is the sum of the products of corresponding elements, just like in the vectorial inner product. This can be nicely presented in matrix form when we remember that, for any two matrices of the same size, say A and B,

$\mathrm{tr}(A'B) = \sum_i \sum_j a_{ij} b_{ij}$

With this notation, we have

$g(X) \approx g(X_0) + \mathrm{tr}\left[ \left( \frac{\partial g}{\partial X} \right)' (X - X_0) \right]$   (3.21)

for the first two terms in the Taylor series of a function g of a matrix variable.
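The identity $\mathrm{tr}(A'B) = \sum_i \sum_j a_{ij} b_{ij}$ used above can be confirmed directly (a NumPy sketch with arbitrary random matrices):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((3, 4))

# Matrix inner product tr(A'B) equals the sum of elementwise products.
print(np.isclose(np.trace(A.T @ B), np.sum(A * B)))
```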