Linear Models
Tony Dodd
24-25 January 2007, An Overview of State-of-the-Art Data Modelling

Overview
Linear models. Parameter estimation. Linear in the parameters. Classification. The nonlinear bits.

Linear models
A linear model has the general form $f(\mathbf{x}) = \sum_{j=0}^{d} w_j x_j$, where $x_j$ is the $j$th component of the input. Assume $x_0 = 1$, and therefore $w_0$ is the bias. Can represent lines and planes. Should ALWAYS try a linear model first!
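
A minimal sketch (hypothetical weights and input, not from the slides) of evaluating such a linear model in Python, with the constant $x_0 = 1$ prepended so that $w_0$ acts as the bias:

```python
import numpy as np

def linear_model(w, x):
    """Evaluate f(x) = w0 + w1*x1 + ... + wd*xd by prepending x0 = 1."""
    x_aug = np.concatenate(([1.0], x))  # x0 = 1 supplies the bias term
    return float(w @ x_aug)

w = np.array([0.5, 2.0, -1.0])                 # bias w0, then one weight per input
print(linear_model(w, np.array([3.0, 1.0])))   # 0.5 + 2*3 - 1*1 = 5.5
```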

Parameter estimation
Least squares estimation. Choose the parameters that minimise the sum of squared errors $E(\mathbf{w}) = \sum_{i=1}^{n} \bigl(y_i - f(\mathbf{x}_i)\bigr)^2$. Unique minimum… Optimal when the noise is Gaussian.
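
A short derivation sketch (standard least squares, not spelled out on the slide): writing the cost with the design matrix $\mathbf{X}$ introduced on the next slide and setting the gradient to zero shows why the minimum is unique (for full-rank $\mathbf{X}$) and gives the closed-form solution.

```latex
E(\mathbf{w}) = \|\mathbf{y} - \mathbf{X}\mathbf{w}\|^2, \qquad
\nabla_{\mathbf{w}} E = -2\,\mathbf{X}^\top(\mathbf{y} - \mathbf{X}\mathbf{w}) = \mathbf{0}
\;\Longrightarrow\;
\hat{\mathbf{w}} = (\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{y}.
```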

Least squares cost function (figure)

Least squares parameters
Define the design matrix $\mathbf{X}$, whose $i$th row is the input $\mathbf{x}_i^\top$. Then the optimal parameters are given by $\hat{\mathbf{w}} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y}$.
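
A minimal sketch (with made-up data) of this estimate in Python; `np.linalg.lstsq` solves the same least-squares problem more stably than forming $(\mathbf{X}^\top\mathbf{X})^{-1}$ explicitly:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=50)
y = 2.0 + 1.5 * x + rng.normal(0.0, 1.0, size=50)    # noisy straight line

X = np.column_stack([np.ones_like(x), x])            # design matrix: bias column + input
w_hat = np.linalg.solve(X.T @ X, X.T @ y)            # (X'X)^{-1} X'y
# Equivalent but numerically safer:
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w_hat)    # approximately [2.0, 1.5]
```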

Example (figures)

How can we generalise this?
Consider instead $f(\mathbf{x}) = \sum_j w_j \phi_j(\mathbf{x})$, where $\phi_j$ is a nonlinear function of the inputs. Nonlinear transform of the inputs and then form a linear model (more tomorrow).

Linear in the parameters
A nonlinear model that is often called linear. Can apply simple estimation to the parameters. But… it is nonlinear in the basis functions.

Parameter estimation
Define the design matrix $\boldsymbol{\Phi}$, with entries $\Phi_{ij} = \phi_j(\mathbf{x}_i)$. Then the optimal parameters are given by $\hat{\mathbf{w}} = (\boldsymbol{\Phi}^\top \boldsymbol{\Phi})^{-1} \boldsymbol{\Phi}^\top \mathbf{y}$.
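
A minimal sketch (assuming Gaussian basis functions and made-up data): because the model is linear in the parameters, the fit is still an ordinary least-squares solve, just with $\boldsymbol{\Phi}$ in place of $\mathbf{X}$:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 40)
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=40)      # nonlinear target

centres = np.linspace(0.0, 1.0, 9)                          # assumed basis-function centres
width = 0.1                                                 # assumed basis-function width
Phi = np.exp(-(x[:, None] - centres[None, :])**2 / (2 * width**2))  # Phi[i, j] = phi_j(x_i)

w_hat, *_ = np.linalg.lstsq(Phi, y, rcond=None)             # linear-in-the-parameters fit
y_hat = Phi @ w_hat                                         # function estimate at the inputs
```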

Example (figures)

Example – how does it work? (figures)
Add all these (the individual weighted basis functions) together to get the function estimate.

Example – when it all goes wrong
More on this later.

Linear classification
How do we apply linear models to classification, where the output is now categorical? Discriminant analysis. Probit analysis. Log-linear regression. Logistic regression.

Logistic regression
A regression model for Bernoulli-distributed targets. Form the linear model $P(y = 1 \mid \mathbf{x}) = \sigma(\mathbf{w}^\top \mathbf{x})$, where $\sigma(a) = 1 / (1 + e^{-a})$ is the logistic function.
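
A minimal sketch of that model in Python (the design matrix is assumed to carry a leading column of ones for the bias):

```python
import numpy as np

def sigmoid(a):
    """Logistic function: maps the linear model's output to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-a))

def predict_proba(w, X):
    """P(y = 1 | x) = sigma(w'x) for each row of the design matrix X."""
    return sigmoid(X @ w)
```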

Can we generalise it?
Instead of the linear model $\mathbf{w}^\top \mathbf{x}$, use a linear-in-the-parameters model $\sum_j w_j \phi_j(\mathbf{x})$ inside the logistic function.

Parameter estimation
Maximum likelihood: maximise the probability of getting the observed results given the parameters. Although there is a unique optimum, iterative techniques are needed (there is no closed-form solution).
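
A minimal sketch (plain gradient ascent on the log-likelihood, with made-up data) of that iterative fitting; Newton's method / iteratively reweighted least squares is the usual choice in practice, gradient ascent just keeps the example short:

```python
import numpy as np

def fit_logistic(X, y, lr=0.5, n_iter=5000):
    """Maximum-likelihood logistic regression by gradient ascent."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))       # current P(y = 1 | x)
        w += lr * X.T @ (y - p) / len(y)         # gradient of the log-likelihood
    return w

rng = np.random.default_rng(2)
x = rng.normal(size=200)
p_true = 1.0 / (1.0 + np.exp(-(2.0 * x - 0.5)))
y = (rng.uniform(size=200) < p_true).astype(float)

X = np.column_stack([np.ones_like(x), x])
print(fit_logistic(X, y))    # roughly [-0.5, 2.0]
```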

Example (figures)

Example – class probabilities (figure)

But…

Basis function optimisation
Need to estimate: type of basis functions, number of basis functions, positions of basis functions. These are nonlinear problems – difficult!

Types of basis functions
Usually choose a favourite! Examples include: polynomials, $\phi_j(x) = x^j$; Gaussians, $\phi_j(x) = \exp\bigl(-(x - c_j)^2 / (2\sigma^2)\bigr)$; …
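
A minimal sketch of these two families for a one-dimensional input (the centres $c_j$ and width $\sigma$ are assumed choices):

```python
import numpy as np

def polynomial_basis(x, degree):
    """Columns 1, x, x^2, ..., x^degree for a 1-D input array x."""
    return np.vstack([x**j for j in range(degree + 1)]).T

def gaussian_basis(x, centres, sigma):
    """One Gaussian bump per centre: exp(-(x - c_j)^2 / (2 sigma^2))."""
    return np.exp(-(x[:, None] - centres[None, :])**2 / (2 * sigma**2))
```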

Number of basis functions
How many basis functions? Slowly increase the number until the data are overfit. Exploratory vs optimal. More on this in the next talk.
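
A minimal sketch (made-up data and an assumed train/test split) of that procedure: fit with an increasing number of Gaussian basis functions and watch the held-out error, which typically starts rising again once the training data are overfit:

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0.0, 1.0, 60)
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=60)
train, test = np.arange(0, 60, 2), np.arange(1, 60, 2)       # alternate points held out

for m in range(2, 21, 2):                                    # number of basis functions
    centres = np.linspace(0.0, 1.0, m)
    Phi = np.exp(-(x[:, None] - centres[None, :])**2 / (2 * 0.05**2))
    w, *_ = np.linalg.lstsq(Phi[train], y[train], rcond=None)
    test_err = np.mean((y[test] - Phi[test] @ w)**2)
    print(m, round(float(test_err), 4))
```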

Positions of basis functions
This is really difficult! One easy possibility is to put one basis function on each data point. Another is a uniform grid (but curse of dimensionality). An advantage of global basis functions, e.g. polynomials, is that you don’t need to optimise positions.
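
A minimal sketch of the one-basis-function-per-data-point placement (Gaussian bumps, assumed width): the centres are simply the training inputs themselves, so the design matrix is square; this is essentially what kernel methods do.

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.sort(rng.uniform(0.0, 1.0, size=20))
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=20)

width = 0.1                                                   # assumed basis-function width
Phi = np.exp(-(x[:, None] - x[None, :])**2 / (2 * width**2))  # one Gaussian centred on each x_i
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
y_hat = Phi @ w                                               # fitted values at the data points
```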

Concluding remarks
Always try a linear model first. You can make the model nonlinear in the inputs but linear in the parameters. But choosing the basis functions becomes a nonlinear optimisation problem. Is least squares/maximum likelihood the best way?