Today: Wrap-up of Probability; Vectors, Matrices; Calculus

Presentation transcript:

Lecture 3: Math Primer II. Machine Learning, Andrew Rosenberg.

Today: wrap-up of probability; vectors and matrices; calculus, including differentiation with respect to a vector.

Properties of probability density functions: the sum rule and the product rule.
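The two rules themselves were images in the original slides; their standard statements (as in Bishop, Pattern Recognition and Machine Learning) are:

p(X) = Σ_Y p(X, Y)        (sum rule; the sum becomes an integral for continuous variables)
p(X, Y) = p(Y | X) p(X)   (product rule)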

Expected Values. Given a random variable X with distribution p(X), what is the expected value of X?
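The defining formula was an image; the standard definition is:

E[X] = Σ_x x p(x)  for a discrete variable, and  E[X] = ∫ x p(x) dx  for a continuous one.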

Multinomial Distribution. If a variable x can take 1-of-K states, we represent the distribution of this variable as a multinomial distribution. The probability of x being in state k is μₖ.
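In the standard 1-of-K coding (the slide's formula was an image), x is a binary vector with exactly one element equal to 1, and

p(x | μ) = Πₖ μₖ^(xₖ),   with μₖ ≥ 0 and Σₖ μₖ = 1.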

Expected Value of a Multinomial. The expected value is the vector of state probabilities: E[x | μ] = μ = (μ₁, …, μ_K)ᵀ.

Gaussian Distribution: one dimension and D dimensions.
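The density formulas were images in the transcript; the standard forms are:

One dimension:   N(x | μ, σ²) = (1 / √(2πσ²)) · exp(−(x − μ)² / (2σ²))
D dimensions:    N(x | μ, Σ) = (1 / ((2π)^(D/2) |Σ|^(1/2))) · exp(−½ (x − μ)ᵀ Σ⁻¹ (x − μ))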

Gaussians

How machine learning uses statistical modeling: the expected value of a function serves as the hypothesis, and the variance serves as the confidence in that hypothesis.

Variance. The variance of a random variable describes how much variability there is around the expected value. It is calculated as the expected squared error: Var[X] = E[(X − E[X])²] = E[X²] − (E[X])².

Covariance. The covariance of two random variables expresses how they vary together: Cov[X, Y] = E[(X − E[X])(Y − E[Y])]. If two variables are independent, their covariance equals zero (though zero covariance does not imply independence).
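The slides contain no code; as a minimal sketch of estimating these quantities from samples (NumPy is my choice here, not the slides'):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=10_000)  # samples of X
y = 0.5 * x + rng.normal(size=10_000)            # Y, correlated with X

print(x.mean())            # estimate of E[X]; close to 2.0
print(x.var())             # estimate of Var[X]; close to 1.5**2 = 2.25
print(np.cov(x, y)[0, 1])  # estimate of Cov[X, Y]; positive, since Y grows with X
```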

Linear Algebra. Vectors: a one-dimensional array; if not specified, assume x is a column vector. Matrices: a two-dimensional array with n rows and m columns, typically denoted with capital letters.
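A minimal NumPy sketch of these conventions (the library choice is mine, not the slides'):

```python
import numpy as np

x = np.array([[2.0],
              [1.0]])            # a column vector (2 x 1), per the slide's convention
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])  # an n-by-m matrix with n = 2 rows, m = 3 columns

print(x.shape)  # (2, 1)
print(A.shape)  # (2, 3)
```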

Transposition. Transposing a matrix swaps its rows and columns: (Aᵀ)ᵢⱼ = Aⱼᵢ.
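For example (a sketch, again assuming NumPy):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])
print(A.T)                 # [[1 4], [2 5], [3 6]]
print(A.shape, A.T.shape)  # (2, 3) (3, 2): rows and columns swap
```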

Addition. Two matrices can be added iff they have the same dimensions: if A and B are both n-by-m matrices, then (A + B)ᵢⱼ = Aᵢⱼ + Bᵢⱼ.

Multiplication. To multiply two matrices, the inner dimensions must be the same: an n-by-m matrix can be multiplied by an m-by-k matrix, and the result is n-by-k.
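A sketch of both rules in NumPy (my choice of library):

```python
import numpy as np

A = np.ones((2, 3))   # n-by-m with n = 2, m = 3
B = np.ones((3, 4))   # m-by-k with m = 3, k = 4

C = A @ B             # inner dimensions (3 and 3) match
print(C.shape)        # (2, 4): the product is n-by-k

print((A + A).shape)  # (2, 3): addition needs identical dimensions
# A @ A               # would raise ValueError: inner dimensions 3 and 2 differ
```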

Inversion. The inverse of an n-by-n (square) matrix A is denoted A⁻¹ and satisfies A⁻¹A = AA⁻¹ = I, where I is the identity matrix: the n-by-n matrix with ones along the diagonal and zeros elsewhere, i.e., Iᵢⱼ = 1 iff i = j, and 0 otherwise. (Not every square matrix is invertible.)

Identity Matrix. Matrices are invariant under multiplication by the identity matrix: AI = IA = A.
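A NumPy check of both slides' claims (a sketch; in numerical practice one usually solves linear systems with np.linalg.solve rather than forming inverses explicitly):

```python
import numpy as np

A = np.array([[4.0, 7.0],
              [2.0, 6.0]])               # square and non-singular (det = 10)
A_inv = np.linalg.inv(A)

print(np.allclose(A @ A_inv, np.eye(2)))  # True: A A^-1 = I
print(np.allclose(np.eye(2) @ A, A))      # True: I A = A
```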

Helpful matrix inversion properties
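The properties themselves were images in the transcript; the standard identities are:

(A⁻¹)⁻¹ = A
(AB)⁻¹ = B⁻¹ A⁻¹
(Aᵀ)⁻¹ = (A⁻¹)ᵀ
(cA)⁻¹ = (1/c) A⁻¹   for a nonzero scalar c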

Norm. The norm of a vector x represents its Euclidean length: ‖x‖ = √(xᵀx) = √(x₁² + ⋯ + xₙ²).

Positive Definiteness. The quadratic form xᵀMx combines a vector x with a square matrix M. M is positive definite if xᵀMx > 0 for all x ≠ 0, and positive semi-definite if xᵀMx ≥ 0 for all x. Exercises: show that the quadratic form evaluates to a scalar, and consider why we should care about positive definiteness and semi-definiteness.
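One way to answer the slide's two questions: if x is n-by-1 and M is n-by-n, then xᵀMx has dimensions (1×n)(n×n)(n×1) = 1×1, i.e., a scalar. And we care because covariance matrices are always positive semi-definite (which keeps variances non-negative and Gaussian densities well defined), and because a positive definite Hessian at a stationary point guarantees a strict local minimum in optimization.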

Calculus: derivatives and integrals; optimization.

Derivatives. The derivative of a function f defines the slope of f at a point x: f′(x) = lim_{h→0} (f(x + h) − f(x)) / h.

Derivative Example
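The worked example here was an image; a representative one (my choice, not necessarily the slide's): for f(x) = x³ + 2x, the derivative is f′(x) = 3x² + 2, so the slope at x = 1 is f′(1) = 5.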

Integrals. Integration is the inverse operation of differentiation (up to an additive constant). Graphically, an integral can be considered the area under the curve defined by f(x).

Integration Example
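The worked example was again an image; a representative one, reusing the function above: ∫₀² (3x² + 2) dx = [x³ + 2x]₀² = 8 + 4 = 12.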

Vector Calculus. Differentiation with respect to a matrix or vector; the gradient; change of variables with a vector.

Derivative w.r.t. a Vector. Given a vector x and a scalar-valued function f(x), how can we find f′(x)?
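The definition on the slide was an image; the standard convention is that the derivative of a scalar function with respect to a vector stacks the partial derivatives:

∂f/∂x = (∂f/∂x₁, ∂f/∂x₂, …, ∂f/∂xₙ)ᵀ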

Example Derivation. The derivative of a scalar function with respect to a vector is also referred to as the gradient of the function.
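The worked example was an image; a standard instance of this kind of derivation (my choice): for f(x) = xᵀx = Σᵢ xᵢ², each partial is ∂f/∂xᵢ = 2xᵢ, so ∇f(x) = 2x. More generally, for a square matrix A, ∇(xᵀAx) = (A + Aᵀ)x.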

Useful Vector Calculus Identities: scalar multiplication and the product rule.
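The identities themselves were images; standard forms consistent with the slide's headings, for scalar-valued f and g and a constant c, are:

∇(c f(x)) = c ∇f(x)                        (scalar multiplication)
∇(f(x) g(x)) = g(x) ∇f(x) + f(x) ∇g(x)     (product rule)
∂(aᵀx)/∂x = a,   ∂(xᵀBx)/∂x = (B + Bᵀ)x    (common special cases)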

Useful Vector Calculus Identities: the derivative of an inverse, and change of variables.
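Standard statements of both (the slide's formulas were images):

∂A⁻¹/∂t = −A⁻¹ (∂A/∂t) A⁻¹                 (derivative of an inverse)

and, for an invertible change of variables y = g(x), a density transforms as
p_y(y) = p_x(g⁻¹(y)) · |det J|, where J is the Jacobian of g⁻¹.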

Optimization. We have an objective function f(x) that we would like to maximize or minimize. For an unconstrained optimum, set the first derivative to zero and solve.
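A one-line worked instance (my own): to minimize f(x) = (x − 3)² + 1, set f′(x) = 2(x − 3) = 0, giving x* = 3; f″(x) = 2 > 0 confirms it is a minimum.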

Optimization with Constraints. What if we want to constrain the parameters of the model, e.g., require the mean to be less than 10? We then find the best likelihood subject to a constraint. Two functions are involved: an objective function to maximize, and an inequality that must be satisfied.

Lagrange Multipliers. Find the maxima of f(x, y) subject to a constraint on x and y.

General Form. Maximizing f subject to a constraint: introduce a new variable λ (the Lagrange multiplier) and find the maxima of the resulting function.
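Written out (the slide's formulas were images; this is the standard Lagrangian construction): to maximize f(x) subject to g(x) = 0, define

Λ(x, λ) = f(x) + λ g(x)

and find stationary points by solving ∇ₓΛ = 0 together with ∂Λ/∂λ = g(x) = 0.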

Example. Maximizing a function subject to a constraint: introduce a new variable and find the maxima.

Example (continued). We now have three equations in three unknowns.

Example (continued). Eliminate λ, then substitute and solve.
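The example's equations were images in the transcript; a standard instance of exactly this procedure (the example from Bishop, PRML Appendix E, which may not match the slide's numbers): maximize f(x, y) = 1 − x² − y² subject to g(x, y) = x + y − 1 = 0. The Lagrangian is Λ = 1 − x² − y² + λ(x + y − 1), and setting its partials to zero gives −2x + λ = 0, −2y + λ = 0, and x + y − 1 = 0: three equations in three unknowns. Eliminating λ yields x = y; substituting into the constraint gives the maximum at x = y = 1/2 (with λ = 1).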

Why does machine learning need these tools? Calculus: optimization, because we need to identify the maximum likelihood or the minimum risk; and integration, which allows the marginalization of continuous probability density functions. Linear algebra: many features lead to high-dimensional spaces, and vectors and matrices allow us to compactly describe and manipulate high-dimensional feature spaces.

Why does machine learning need these tools? Vector calculus: all of the optimization needs to be performed in high-dimensional spaces, optimizing multiple variables simultaneously (gradient descent), and we want to take marginals over high-dimensional distributions like Gaussians.
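Gradient descent itself is not spelled out in these slides; a minimal NumPy sketch (my own) that minimizes the quadratic f(x) = ½ xᵀAx − bᵀx, whose gradient is Ax − b:

```python
import numpy as np

def gradient_descent(A, b, lr=0.1, steps=200):
    """Minimize f(x) = 0.5 x^T A x - b^T x for symmetric positive definite A."""
    x = np.zeros_like(b)
    for _ in range(steps):
        grad = A @ x - b   # gradient of the quadratic objective
        x = x - lr * grad  # step downhill, scaled by the learning rate
    return x

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])   # symmetric positive definite
b = np.array([1.0, 1.0])

x_star = gradient_descent(A, b)
print(x_star)                 # approaches [0.2, 0.4]
print(np.linalg.solve(A, b))  # exact solution of A x = b, for comparison
```

Because the objective is a positive definite quadratic, the minimizer coincides with the solution of A x = b; for general objectives, gradient descent only guarantees convergence toward a local minimum.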

Next Time: Linear Regression. Then: Regularization.