Data Modeling
Patrice Koehl, Department of Biological Sciences, National University of Singapore

Data Modeling
- Data Modeling: least squares
- Data Modeling: non-linear least squares
- Data Modeling: robust estimation

Data Modeling
- Data Modeling: least squares
  - Linear least squares
- Data Modeling: non-linear least squares
- Data Modeling: robust estimation

Least squares
Suppose that we are fitting N data points (x_i, y_i) (with errors σ_i on each data point) to a model Y defined with M parameters a_j: Y = Y(x; a_1, …, a_M). The standard procedure is least squares: the fitted values for the parameters a_j are those that minimize:
χ² = Σ_{i=1}^{N} [ (y_i − Y(x_i; a_1, …, a_M)) / σ_i ]²
Where does this come from?

Least squares
Let us suppose that:
- The data points are independent of each other
- Each data point has a measurement error that is random, distributed as a Gaussian around the "true" value Y(x_i)
The probability of the data points, given the model Y, is then:
P(Data | Model) ∝ Π_{i=1}^{N} exp[ −(y_i − Y(x_i))² / (2σ_i²) ]

Least squares
Application of Bayes's theorem:
P(Model | Data) ∝ P(Data | Model) P(Model)
With no information on the models, we can assume that the prior probability P(Model) is constant. Finding the coefficients a_1, …, a_M that maximize P(Model | Data) is then equivalent to finding the coefficients that maximize P(Data | Model). This is equivalent to maximizing its logarithm, or minimizing the negative of its logarithm, namely:
−ln P(Data | Model) = (1/2) Σ_{i=1}^{N} [ (y_i − Y(x_i)) / σ_i ]² + constant
Minimizing this quantity is exactly minimizing χ².

Fitting data to a straight line

This is the simplest case:
Y(x) = a x + b
Then:
χ² = Σ_{i=1}^{N} [ (y_i − a x_i − b) / σ_i ]²
The parameters a and b are obtained from the two equations:
∂χ²/∂a = 0 and ∂χ²/∂b = 0

Fitting data to a straight line
Let us define:
S = Σ_i 1/σ_i²,  S_x = Σ_i x_i/σ_i²,  S_y = Σ_i y_i/σ_i²,  S_xx = Σ_i x_i²/σ_i²,  S_xy = Σ_i x_i y_i/σ_i²,  Δ = S S_xx − S_x²
then a and b are given by:
a = (S S_xy − S_x S_y) / Δ
b = (S_xx S_y − S_x S_xy) / Δ

Fitting data to a straight line
We are not done!
Uncertainty on the values of a and b:
σ_a² = S / Δ,  σ_b² = S_xx / Δ
Evaluate goodness of fit:
- Compute χ² and compare it to N − M (here N − 2)
- Compute the residual error on each data point: Y(x_i) − y_i
- Compute the correlation coefficient R²
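
The formulas above translate directly into code. Here is a minimal NumPy sketch of the weighted straight-line fit for the model Y(x) = a x + b; the function name fit_line and the synthetic data are illustrative, not part of the original slides.

```python
import numpy as np

def fit_line(x, y, sigma):
    """Weighted least-squares fit of y = a*x + b, with per-point errors sigma."""
    w = 1.0 / sigma**2
    S, Sx, Sy = w.sum(), (w * x).sum(), (w * y).sum()
    Sxx, Sxy = (w * x**2).sum(), (w * x * y).sum()
    delta = S * Sxx - Sx**2
    a = (S * Sxy - Sx * Sy) / delta        # slope
    b = (Sxx * Sy - Sx * Sxy) / delta      # intercept
    sig_a, sig_b = np.sqrt(S / delta), np.sqrt(Sxx / delta)
    chi2 = (w * (y - a * x - b)**2).sum()  # compare to N - 2
    return a, b, sig_a, sig_b, chi2

# Illustrative synthetic data
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 20)
sigma = 0.5 * np.ones_like(x)
y = 2.0 * x + 1.0 + rng.normal(0.0, sigma)
print(fit_line(x, y, sigma))
```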

Fitting data to a straight line

General Least Squares
The model is a linear combination of M basis functions X_k(x):
Y(x) = Σ_{k=1}^{M} a_k X_k(x)
Then:
χ² = Σ_{i=1}^{N} [ (y_i − Σ_{k=1}^{M} a_k X_k(x_i)) / σ_i ]²
The minimization of χ² occurs when the derivatives of χ² with respect to the parameters a_1, …, a_M are 0. This leads to M equations:
Σ_{i=1}^{N} (1/σ_i²) [ y_i − Σ_{k=1}^{M} a_k X_k(x_i) ] X_j(x_i) = 0,   j = 1, …, M

General Least Squares
Define the design matrix A such that
A_ij = X_j(x_i) / σ_i

General Least Squares
Define two vectors b and a such that
b_i = y_i / σ_i
and a contains the parameters: a = (a_1, …, a_M)ᵀ
Note that χ² can then be rewritten as:
χ² = |A a − b|²
The parameters a that minimize χ² satisfy:
(Aᵀ A) a = Aᵀ b
These are the normal equations for the linear least-squares problem.

General Least Squares
How to solve a general least-squares problem:
1) Build the design matrix A and the vector b
2) Find the parameters a_1, …, a_M that minimize χ² = |A a − b|² (usually by solving the normal equations)
3) Compute the uncertainty on each parameter a_j: if C = Aᵀ A, then
σ²(a_j) = (C⁻¹)_jj
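
As a sketch of this recipe, the fragment below fits a polynomial (basis functions X_k(x) = x^k, chosen here purely as an example) by building the design matrix and solving the normal equations; the parameter uncertainties come from (AᵀA)⁻¹. In practice, solving the problem through an SVD or np.linalg.lstsq is numerically safer than forming the normal equations when AᵀA is nearly singular.

```python
import numpy as np

def general_lsq(x, y, sigma, degree=2):
    """General linear least squares with polynomial basis X_k(x) = x**k."""
    # Design matrix A_ij = X_j(x_i) / sigma_i and vector b_i = y_i / sigma_i
    A = np.vander(x, degree + 1, increasing=True) / sigma[:, None]
    b = y / sigma
    C = A.T @ A                        # normal-equations matrix
    a = np.linalg.solve(C, A.T @ b)    # solve (A^T A) a = A^T b
    cov = np.linalg.inv(C)             # covariance of the fitted parameters
    sigma_a = np.sqrt(np.diag(cov))    # uncertainty on each a_j
    chi2 = np.sum((A @ a - b)**2)
    return a, sigma_a, chi2
```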

Data Modeling
- Data Modeling: least squares
- Data Modeling: non-linear least squares
- Data Modeling: robust estimation

Non-linear least squares
In the general case, the model g(X_1, …, X_n) is a non-linear function of the parameters X_1, …, X_n; χ² is then also a non-linear function of these parameters:
χ² = Σ_{i=1}^{N} [ (Y_i − g(X_1, …, X_n; t_i)) / σ_i ]²
Finding the parameters X_1, …, X_n is then treated as finding the X_1, …, X_n that minimize χ².
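
To make this concrete, here is a minimal sketch (not from the slides) that minimizes χ² for an illustrative non-linear model g(t) = X1 exp(−X2 t), using scipy.optimize.least_squares on the weighted residuals; the model, synthetic data, and starting guess are assumptions of this example.

```python
import numpy as np
from scipy.optimize import least_squares

def g(params, t):
    X1, X2 = params
    return X1 * np.exp(-X2 * t)        # illustrative non-linear model

def residuals(params, t, Y, sigma):
    return (Y - g(params, t)) / sigma  # chi^2 = sum(residuals**2)

# Illustrative synthetic data
rng = np.random.default_rng(1)
t = np.linspace(0.0, 5.0, 30)
sigma = 0.05 * np.ones_like(t)
Y = 2.0 * np.exp(-1.3 * t) + rng.normal(0.0, sigma)

fit = least_squares(residuals, x0=[1.0, 1.0], args=(t, Y, sigma))
print(fit.x, np.sum(fit.fun**2))       # fitted parameters and chi^2
```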

Minimizing χ²
Some definitions:
Gradient: The gradient of a smooth function f with continuous first and second derivatives is defined as:
∇f(x) = [ ∂f/∂x_1, ∂f/∂x_2, …, ∂f/∂x_n ]ᵀ
Hessian: The n × n symmetric matrix of second derivatives, H(x), is called the Hessian:
H_ij(x) = ∂²f / (∂x_i ∂x_j)

Minimizing χ²
Minimization of a multi-variable function is usually an iterative process, in which updates of the state variable x are computed using the gradient and, in some (favorable) cases, the Hessian.
Steepest descent (SD): The simplest iteration scheme consists of following the "steepest descent" direction:
x_{k+1} = x_k − λ ∇f(x_k)
(λ sets the minimum along the line defined by the gradient)
Usually, SD methods lead to improvement quickly, but then exhibit slow progress toward a solution. They are commonly recommended for initial minimization iterations, when the starting function and gradient-norm values are very large.
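
A minimal steepest-descent sketch (illustrative only, with a crude backtracking choice of the step λ rather than an exact line minimization):

```python
import numpy as np

def steepest_descent(f, grad, x0, lam=1.0, tol=1e-8, max_iter=1000):
    """Minimize f by following -grad(f), with simple backtracking on the step."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        step = lam
        while f(x - step * g) >= f(x) and step > 1e-12:
            step *= 0.5                 # shrink until the move decreases f
        x = x - step * g
    return x

# Example: a simple ill-conditioned quadratic (illustration only)
f = lambda x: x[0]**2 + 10.0 * x[1]**2
grad = lambda x: np.array([2.0 * x[0], 20.0 * x[1]])
print(steepest_descent(f, grad, [3.0, 2.0]))
```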

Minimizing χ²
Conjugate gradients (CG): In each step of a conjugate gradient method, a search vector p_k is defined by a recursive formula:
p_{k+1} = −∇f(x_{k+1}) + β_{k+1} p_k
The corresponding new position is found by line minimization along p_k:
x_{k+1} = x_k + λ_k p_k
The CG methods differ in their definition of β.
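
In practice non-linear CG is rarely hand-coded; as a hedged illustration, SciPy's general-purpose minimizer offers a non-linear conjugate-gradient method that can be applied to the same quadratic used in the steepest-descent sketch above:

```python
import numpy as np
from scipy.optimize import minimize

f = lambda x: x[0]**2 + 10.0 * x[1]**2
grad = lambda x: np.array([2.0 * x[0], 20.0 * x[1]])

res = minimize(f, x0=[3.0, 2.0], jac=grad, method='CG')  # non-linear conjugate gradient
print(res.x)
```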

Minimizing χ²
Newton's method: Newton's method is a popular iterative method for finding a zero of a one-dimensional function f:
x_{k+1} = x_k − f(x_k) / f′(x_k)
(The figure on the original slide shows the successive iterates x_0, x_1, x_2, x_3.)
It can be adapted to the minimization of a one-dimensional function, in which case the iteration formula is:
x_{k+1} = x_k − f′(x_k) / f″(x_k)
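
A minimal sketch of the one-dimensional Newton iteration for minimization; the test function and its derivatives are illustrative only:

```python
def newton_minimize_1d(fprime, fsecond, x0, tol=1e-10, max_iter=100):
    """Newton iteration x_{k+1} = x_k - f'(x_k)/f''(x_k) for a 1-D minimum."""
    x = x0
    for _ in range(max_iter):
        step = fprime(x) / fsecond(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# Example: minimize f(x) = (x - 2)**4 + x**2 (illustration only)
fprime = lambda x: 4.0 * (x - 2.0)**3 + 2.0 * x
fsecond = lambda x: 12.0 * (x - 2.0)**2 + 2.0
print(newton_minimize_1d(fprime, fsecond, x0=0.0))
```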

Minimizing χ²
The equivalent iterative scheme for multivariate functions is based on:
x_{k+1} = x_k − H(x_k)⁻¹ ∇f(x_k)
Several implementations of Newton's method exist that avoid computing the full Hessian matrix: quasi-Newton, truncated Newton, "adopted-basis Newton-Raphson" (ABNR), …
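
A bare-bones multivariate Newton sketch, assuming the gradient and Hessian can be evaluated explicitly (the quadratic test functions are illustrative); when the Hessian is too expensive, a quasi-Newton method such as BFGS (e.g. scipy.optimize.minimize with method='BFGS') is typically used instead.

```python
import numpy as np

def newton_minimize(grad, hess, x0, tol=1e-10, max_iter=50):
    """Newton iteration x_{k+1} = x_k - H(x_k)^{-1} grad f(x_k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        x = x - np.linalg.solve(hess(x), g)  # solve H * step = g instead of inverting H
    return x

# Same quadratic as above: Newton converges in a single step
grad = lambda x: np.array([2.0 * x[0], 20.0 * x[1]])
hess = lambda x: np.diag([2.0, 20.0])
print(newton_minimize(grad, hess, [3.0, 2.0]))
```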

Data analysis and Data Modeling
- Data Modeling: least squares
- Data Modeling: non-linear least squares
- Data Modeling: robust estimation

Robust estimation of parameters
Least-squares modeling assumes Gaussian statistics for the experimental data points; this may not always be true, however. There are other possible distributions that may lead to better models in some cases. One of the most popular alternatives is to use a double-exponential (Laplace) distribution of the form:
P(y_i | Y(x_i)) ∝ exp( −| y_i − Y(x_i) | )
Let us look again at the simple case of fitting a straight line to a set of data points (t_i, Y_i), which is now written as finding a and b that minimize:
Σ_{i=1}^{N} | Y_i − (a t_i + b) |
For a given slope a, the optimal intercept is b = median(Y − a t), and a is then found by non-linear minimization.
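
A hedged sketch of this robust fit: for each trial slope a the intercept is taken as the median of Y − a t, and the total absolute deviation is minimized over a with a scalar optimizer (the use of scipy.optimize.minimize_scalar and the synthetic data are assumptions of this example).

```python
import numpy as np
from scipy.optimize import minimize_scalar

def robust_line_fit(t, Y):
    """Fit Y ~ a*t + b by minimizing the sum of absolute deviations."""
    def abs_dev(a):
        b = np.median(Y - a * t)           # optimal intercept for this slope
        return np.sum(np.abs(Y - (a * t + b)))
    res = minimize_scalar(abs_dev)         # 1-D minimization over the slope a
    a = res.x
    return a, np.median(Y - a * t)

# Illustrative data with one outlier that would bias a least-squares fit
t = np.linspace(0.0, 10.0, 21)
Y = 2.0 * t + 1.0
Y[5] += 30.0
print(robust_line_fit(t, Y))
```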

Robust estimation of parameters