Intro to Linear Methods Reading: DH&S, Ch 5.{1-4,8} hip to be hyperplanar...

Then & now...
Last time:
- k-NN geometry
- Bayesian decision theory: Bayes optimal classifiers & Bayes error
- NN in daily life
Today:
- Intro to linear methods
- Formulation
- Math
- The linear regression problem

Linear methods
Both methods we've seen so far:
- are classification methods
- use intersections of linear boundaries
What about regression? What can we do with a single linear surface? Not as much as you'd like, but still surprisingly a lot. Linear regression is the proto-learning method.

Warning! Change of notation!
I usually write:
- x for the data vector
- y for the class variable; y for the class vector
In this section, your book uses:
- x for the data vector
- y for the "augmented" data vector
- g(x) for the class; b for the class vector

Linear regression prelims
Basic idea: assume g(x) is a linear function of x (see the model below).
Our job: find the best weights to fit g(x) "as well as possible".
Note on notation: the weight vector is a double-u, w, not an omega, ω; the two look nearly identical in most math fonts (the "little curly squiggle").
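In the w, w0 notation above, the model can be written as:

```latex
g(\mathbf{x}) \;=\; \mathbf{w}^{\top}\mathbf{x} + w_0 \;=\; \sum_{j=1}^{d} w_j x_j + w_0
```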

Linear regression prelims
By "as well as possible", we mean here: minimum squared error. The loss function is the sum of squared differences between g(xᵢ) and the targets bᵢ over the training data (see below).
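In the w, w0 notation, with target bᵢ for each training point xᵢ, the minimum-squared-error criterion is the familiar sum of squared residuals:

```latex
J(\mathbf{w}, w_0) \;=\; \sum_{i=1}^{n} \bigl(\mathbf{w}^{\top}\mathbf{x}_i + w_0 - b_i\bigr)^{2}
```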

Useful definitions
Definition: A trick is a clever mathematical hack.
Definition: A method is a trick you use more than once.

A helpful "method"
Recall g(x) = w·x + w0. We want to be able to write g(x) compactly as a single dot product. Introduce a "pseudo-feature" of x that is identically 1; now we have the augmented data vector y and augmented weight vector a shown below.
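One common way to spell out the augmentation, consistent with the y/a/b notation from the change-of-notation slide:

```latex
\mathbf{y} = \begin{bmatrix} 1 \\ x_1 \\ \vdots \\ x_d \end{bmatrix},\qquad
\mathbf{a} = \begin{bmatrix} w_0 \\ w_1 \\ \vdots \\ w_d \end{bmatrix},\qquad
g(\mathbf{x}) = \mathbf{a}^{\top}\mathbf{y}
```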

A helpful "method"
Now we have the augmented data vector y and the augmented weight vector a, so g(x) = aᵀy, and our "loss function" becomes J(a) = Σᵢ (aᵀyᵢ − bᵢ)².

Minimizing loss
Finally, we can write the loss in matrix form: stack the augmented data vectors yᵢ as the rows of a matrix Y and the targets bᵢ into a vector b (see below). We want the "best" set of weights a: the weights that minimize the loss. Q: how do you find the minimum of a function w.r.t. some parameter?
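In that stacked notation, the squared-error loss becomes a single vector norm:

```latex
Y = \begin{bmatrix} \mathbf{y}_1^{\top} \\ \vdots \\ \mathbf{y}_n^{\top} \end{bmatrix},\qquad
\mathbf{b} = \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix},\qquad
J(\mathbf{a}) \;=\; \lVert Y\mathbf{a} - \mathbf{b} \rVert^{2} \;=\; (Y\mathbf{a}-\mathbf{b})^{\top}(Y\mathbf{a}-\mathbf{b})
```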

Minimizing loss
Back up to the 1-d case. Suppose you had a scalar loss function l(w) of a single weight w, and wanted to find the w that minimizes l(w). Standard answer: take the derivative, set it equal to 0, and solve. To be sure you have a minimum, check that the second derivative is positive too... (a worked example follows).
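As a concrete illustration of the recipe (an example, not the slide's own), take squared error with a single weight and no intercept:

```latex
l(w) = \sum_{i}(w\,x_i - b_i)^2,\qquad
\frac{dl}{dw} = 2\sum_{i} x_i (w\,x_i - b_i) = 0
\;\Longrightarrow\;
w = \frac{\sum_i x_i b_i}{\sum_i x_i^2},\qquad
\frac{d^2 l}{dw^2} = 2\sum_i x_i^2 \;\ge\; 0
```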

5 minutes of math...
Some useful linear algebra identities. If A and B are matrices:
(AB)ᵀ = BᵀAᵀ
(AB)⁻¹ = B⁻¹A⁻¹ (for invertible square matrices)

5 minutes of math...
What about derivatives of vectors/matrices? There's more than one kind... For the moment, we'll need the derivative of a scalar-valued function with respect to a vector. If x is a vector of variables, y is a vector of constants, and A is a matrix of constants, then:
∂(yᵀx)/∂x = y
∂(xᵀx)/∂x = 2x
∂(xᵀAx)/∂x = (A + Aᵀ)x
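Not from the slides, but a quick way to convince yourself of these identities is a finite-difference check; a minimal NumPy sketch:

```python
# Numerically check the vector-derivative identities with central differences.
import numpy as np

rng = np.random.default_rng(0)
d = 4
x = rng.normal(size=d)
y = rng.normal(size=d)
A = rng.normal(size=(d, d))

def num_grad(f, x, eps=1e-6):
    """Central-difference gradient of a scalar function f at x."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

print(np.allclose(num_grad(lambda v: y @ v, x), y))                             # d/dx (y^T x) = y
print(np.allclose(num_grad(lambda v: v @ v, x), 2 * x))                         # d/dx (x^T x) = 2x
print(np.allclose(num_grad(lambda v: v @ A @ v, x), (A + A.T) @ x, atol=1e-5))  # d/dx (x^T A x) = (A + A^T) x
```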

Exercise
Derive the vector derivative expressions above. Then find an expression for the minimum squared error weight vector a in the loss function J(a) = ||Ya − b||².
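One way to work the second part, using the identities from the previous slide (a sketch of the standard derivation):

```latex
J(\mathbf{a}) = (Y\mathbf{a}-\mathbf{b})^{\top}(Y\mathbf{a}-\mathbf{b})
             = \mathbf{a}^{\top}Y^{\top}Y\mathbf{a} - 2\,\mathbf{b}^{\top}Y\mathbf{a} + \mathbf{b}^{\top}\mathbf{b},
\qquad
\nabla_{\mathbf{a}} J = 2\,Y^{\top}Y\mathbf{a} - 2\,Y^{\top}\mathbf{b} = \mathbf{0}
\;\Longrightarrow\;
\mathbf{a} = (Y^{\top}Y)^{-1}Y^{\top}\mathbf{b}
```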

The LSE method
The quantity YᵀY is called a Gram matrix; it is positive semidefinite and symmetric. The quantity (YᵀY)⁻¹Yᵀ is the pseudoinverse of Y. It may not exist if the Gram matrix is not invertible. The complete "learning algorithm" is 2 whole lines of Matlab code.
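The slide refers to Matlab; an equivalent sketch in NumPy (illustrative function names, not the course's code) is about as short:

```python
# Least-squares linear regression via the pseudoinverse: a NumPy sketch of the
# two-line "learning algorithm" described above.
import numpy as np

def fit_lse(X, b):
    """Return the augmented weight vector a minimizing ||Y a - b||^2,
    where Y is X with a leading column of ones (the pseudo-feature)."""
    Y = np.hstack([np.ones((X.shape[0], 1)), X])  # augment with pseudo-feature
    return np.linalg.pinv(Y) @ b                  # a = (Y^T Y)^{-1} Y^T b when Y^T Y is invertible

def predict(a, X):
    """Evaluate g(x) = a^T y for each row of X."""
    Y = np.hstack([np.ones((X.shape[0], 1)), X])
    return Y @ a

# Tiny usage example on synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_a = np.array([1.0, 2.0, -1.0, 0.5])          # [w0, w1, w2, w3]
b = predict(true_a, X) + 0.01 * rng.normal(size=50)
a_hat = fit_lse(X, b)
print(np.round(a_hat, 2))                         # close to true_a
```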