Nonlinear regression.


CASE STUDY – ENSO (El Niño–Southern Oscillation)

How to analyse data?

How to analyse data? Plot!

How to analyse data? Plot! The human brain is one of the most powerful computational tools, and it works differently than a computer…

What if the data have no linear correlation?

1. Linearization – transform a nonlinear problem into a linear one. Example: $y = B e^{Ax}$, so $\ln y = \ln B + Ax$, i.e. $Y = b + ax$ with $Y = \ln y$, $b = \ln B$, $a = A$.
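As an illustration, a minimal Python/NumPy sketch of this linearization; the data arrays are a hypothetical example, not taken from the slides.

```python
import numpy as np

# Hypothetical data assumed to follow y = B * exp(A * x) (not from the slides)
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 2.5])
y = np.array([2.1, 2.9, 4.2, 5.8, 8.3, 11.7])

# Linearize: ln(y) = ln(B) + A * x, then fit a straight line to (x, ln y)
A, lnB = np.polyfit(x, np.log(y), 1)   # slope = A, intercept = ln(B)
B = np.exp(lnB)
print(f"y ~= {B:.3f} * exp({A:.3f} * x)")
```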

A few words about r. In the case of linear regression, the r coefficient indicates the degree of linear dependence between the data. However, there is a more general approach…

A few words about r
$S_r = \sum_{i=1}^{n}\left(y_i - f(x_i)\right)^2$ – error of the model
$S_t = \sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2$ – discrepancy between the data and a single estimate (the mean)

A few words about r
$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}, \qquad s_y = \sqrt{\frac{S_t}{n-1}}$

A few words about r
$s_{y/x} = \sqrt{\frac{S_r}{n-2}}$ – standard error of the estimate; the spread around the regression line

A few words about r
The error reduction obtained by describing the data with a model (a straight line) is $S_t - S_r$; since this quantity is scale dependent, it is normalized:
$r^2 = \frac{S_t - S_r}{S_t}$
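A short Python sketch computing $S_r$, $S_t$, $s_{y/x}$ and $r^2$ for a straight-line fit; the data are a hypothetical example, not from the slides.

```python
import numpy as np

# Hypothetical data used to illustrate S_r, S_t, s_y/x and r^2
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1, 5.8])

a, b = np.polyfit(x, y, 1)           # straight-line fit y ~ a*x + b
f = a * x + b                        # model predictions
n = len(x)

S_r = np.sum((y - f) ** 2)           # error of the model
S_t = np.sum((y - y.mean()) ** 2)    # spread around the mean
s_yx = np.sqrt(S_r / (n - 2))        # standard error of the estimate
r2 = (S_t - S_r) / S_t               # coefficient of determination
print(f"S_r={S_r:.4f}, S_t={S_t:.4f}, s_y/x={s_yx:.4f}, r^2={r2:.4f}")
```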

Anscombe's example: $y = 0.5x + 3$, $r^2 = 0.67$

2. Polynomial
$f(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \ldots$
Same approach as in the case of linear regression – least squares.

2. Polynomial
$f(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \ldots$
$e_i = y_i - f(x_i) = y_i - a_0 - a_1 x_i - a_2 x_i^2 - \ldots = y_i - \sum_{j=0}^{N} a_j x_i^j$

2. Polynomial
$f(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \ldots$
$e_i^2 = \left(y_i - \sum_{j=0}^{N} a_j x_i^j\right)^2$
$SSE(a_0, a_1, \ldots) = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n}\left(y_i - \sum_{j=0}^{N} a_j x_i^j\right)^2$

How to adjust a and b so that SSE is the smallest?
$SSE(a,b) = \sum_{i=1}^{n}\left(y_i - a x_i - b\right)^2$
How to find the minimum of the function SSE(a,b)?
$\frac{\partial SSE(a,b)}{\partial a} = 0, \qquad \frac{\partial SSE(a,b)}{\partial b} = 0$

How to adjust the coefficients $a_0, a_1, \ldots$ so that SSE is the smallest?
$SSE(a_0, a_1, \ldots) = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n}\left(y_i - \sum_{j=0}^{N} a_j x_i^j\right)^2$
$\frac{\partial SSE}{\partial a_0} = \frac{\partial}{\partial a_0}\sum_{i=1}^{n}\left(y_i - \sum_{j=0}^{N} a_j x_i^j\right)^2 = \sum_{i=1}^{n} 2\left(y_i - \sum_{j=0}^{N} a_j x_i^j\right)(-1) = -2\sum_{i=1}^{n}\left(y_i - \sum_{j=0}^{N} a_j x_i^j\right)$
$\frac{\partial SSE}{\partial a_1} = \frac{\partial}{\partial a_1}\sum_{i=1}^{n}\left(y_i - \sum_{j=0}^{N} a_j x_i^j\right)^2 = \sum_{i=1}^{n} 2\left(y_i - \sum_{j=0}^{N} a_j x_i^j\right)(-x_i) = -2\sum_{i=1}^{n} x_i\left(y_i - \sum_{j=0}^{N} a_j x_i^j\right)$
$\frac{\partial SSE}{\partial a_2} = \frac{\partial}{\partial a_2}\sum_{i=1}^{n}\left(y_i - \sum_{j=0}^{N} a_j x_i^j\right)^2 = \sum_{i=1}^{n} 2\left(y_i - \sum_{j=0}^{N} a_j x_i^j\right)(-x_i^2) = -2\sum_{i=1}^{n} x_i^2\left(y_i - \sum_{j=0}^{N} a_j x_i^j\right)$

How to adjust the coefficients $a_0, a_1, \ldots$ so that SSE is the smallest?
$SSE(a_0, a_1, \ldots) = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n}\left(y_i - \sum_{j=0}^{N} a_j x_i^j\right)^2$
In general, for the k-th coefficient:
$\frac{\partial SSE}{\partial a_k} = \frac{\partial}{\partial a_k}\sum_{i=1}^{n}\left(y_i - \sum_{j=0}^{N} a_j x_i^j\right)^2 = \sum_{i=1}^{n} 2\left(y_i - \sum_{j=0}^{N} a_j x_i^j\right)(-x_i^k) = -2\sum_{i=1}^{n} x_i^k\left(y_i - \sum_{j=0}^{N} a_j x_i^j\right)$

How to adjust the coefficients $a_0, a_1, \ldots$ so that SSE is the smallest?
Setting the derivatives to zero, we obtain a set of N+1 linear equations:
$\frac{\partial SSE}{\partial a_k} = -2\sum_{i=1}^{n} x_i^k\left(y_i - \sum_{j=0}^{N} a_j x_i^j\right) = 0$
$\sum_{i=1}^{n} x_i^k\left(y_i - \sum_{j=0}^{N} a_j x_i^j\right) = 0$
$\sum_{i=1}^{n} x_i^k y_i - \sum_{i=1}^{n}\sum_{j=0}^{N} a_j x_i^{j+k} = 0$
$\sum_{i=1}^{n}\sum_{j=0}^{N} a_j x_i^{j+k} = \sum_{i=1}^{n} x_i^k y_i$

How to adjust the coefficients $a_0, a_1, \ldots$ so that SSE is the smallest?
$\sum_{i=1}^{n}\sum_{j=0}^{N} a_j x_i^{j+k} = \sum_{i=1}^{n} x_i^k y_i$
Row 1, k = 0: $\; n\,a_0 + a_1\sum_{i=1}^{n} x_i + a_2\sum_{i=1}^{n} x_i^2 + \ldots + a_N\sum_{i=1}^{n} x_i^N = \sum_{i=1}^{n} y_i$
Row 2, k = 1: $\; a_0\sum_{i=1}^{n} x_i + a_1\sum_{i=1}^{n} x_i^2 + a_2\sum_{i=1}^{n} x_i^3 + \ldots + a_N\sum_{i=1}^{n} x_i^{N+1} = \sum_{i=1}^{n} x_i y_i$
Row N+1, k = N: $\; a_0\sum_{i=1}^{n} x_i^N + a_1\sum_{i=1}^{n} x_i^{N+1} + a_2\sum_{i=1}^{n} x_i^{N+2} + \ldots + a_N\sum_{i=1}^{n} x_i^{2N} = \sum_{i=1}^{n} x_i^N y_i$

How to solve it? As a linear matrix equation:
$\begin{pmatrix} n & \sum_{i=1}^{n} x_i & \cdots & \sum_{i=1}^{n} x_i^N \\ \sum_{i=1}^{n} x_i & \sum_{i=1}^{n} x_i^2 & \cdots & \sum_{i=1}^{n} x_i^{N+1} \\ \vdots & \vdots & \ddots & \vdots \\ \sum_{i=1}^{n} x_i^N & \sum_{i=1}^{n} x_i^{N+1} & \cdots & \sum_{i=1}^{n} x_i^{2N} \end{pmatrix} \begin{pmatrix} a_0 \\ a_1 \\ \vdots \\ a_N \end{pmatrix} = \begin{pmatrix} \sum_{i=1}^{n} y_i \\ \sum_{i=1}^{n} x_i y_i \\ \vdots \\ \sum_{i=1}^{n} x_i^N y_i \end{pmatrix}$
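A sketch in Python that builds exactly this matrix system for a degree-N polynomial and solves it with NumPy; the sample data are hypothetical, chosen to be roughly quadratic.

```python
import numpy as np

def polyfit_normal_equations(x, y, N):
    """Fit a degree-N polynomial by building and solving the normal equations."""
    # Matrix entry (k, j) is sum_i x_i^(j+k); right-hand side entry k is sum_i x_i^k * y_i
    M = np.array([[np.sum(x ** (j + k)) for j in range(N + 1)] for k in range(N + 1)])
    rhs = np.array([np.sum((x ** k) * y) for k in range(N + 1)])
    return np.linalg.solve(M, rhs)          # coefficients a_0, ..., a_N

# Hypothetical data, roughly following y = 1 + 2x^2
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 9.2, 19.1, 32.8])
print(polyfit_normal_equations(x, y, 2))    # approximately [1, 0, 2]
```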

Nonlinear regression – 2. Normal equations

Linear regression – a different approach
$y_1 = a x_1 + b, \quad y_2 = a x_2 + b, \quad \ldots, \quad y_n = a x_n + b$
$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix}, \qquad \mathbf{y} = A\mathbf{z}$

Linear regression – a different approach
$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix}, \qquad \mathbf{y} = A\mathbf{z}$
This system cannot be solved exactly: it is overdetermined – too many equations – and $A$ is not a square matrix, so it cannot be inverted. Solution? Let's make it a square matrix!

Linear regression – a different approach
Solution? Let's make it a square matrix! $A^T A = C$ is a square matrix:
$\begin{pmatrix} x_1 & x_2 & \cdots & x_n \\ 1 & 1 & \cdots & 1 \end{pmatrix} \begin{pmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{pmatrix} = \begin{pmatrix} \sum_{i=1}^{n} x_i^2 & \sum_{i=1}^{n} x_i \\ \sum_{i=1}^{n} x_i & n \end{pmatrix}$

Linear regression – a different approach
$\mathbf{y} = A\mathbf{z} \quad | \cdot A^T$
$A^T \mathbf{y} = A^T A\,\mathbf{z}, \qquad A^T \mathbf{y} = \begin{pmatrix} x_1 & x_2 & \cdots & x_n \\ 1 & 1 & \cdots & 1 \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} \sum_{i=1}^{n} x_i y_i \\ \sum_{i=1}^{n} y_i \end{pmatrix}$
$\mathbf{z} = \left(A^T A\right)^{-1} A^T \mathbf{y}$
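A minimal Python sketch of the same computation, using hypothetical data; note that $A^T A\,\mathbf{z} = A^T \mathbf{y}$ is solved directly rather than forming the inverse explicitly, which is the numerically preferable choice.

```python
import numpy as np

# Straight-line fit y ~ a*x + b via the normal equations (hypothetical data)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

A = np.column_stack([x, np.ones_like(x)])   # design matrix: columns [x_i, 1]
z = np.linalg.solve(A.T @ A, A.T @ y)       # solve A^T A z = A^T y
a, b = z
print(f"a = {a:.4f}, b = {b:.4f}")
```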

Example
X = W (kg): 0.5   1.5   2.0   2.5   3.0
Y = L (m):  0.77   1.1   1.22   1.31   1.4

Example
X = W (kg): 0.5   1.5   2.0   2.5   3.0
Y = L (m):  0.77   1.1   1.22   1.31   1.4
$y = a x^b \;\Rightarrow\; \ln y = b\ln x + \ln a \;\Rightarrow\; Y = \alpha X + \beta$, with $Y = \ln y$, $X = \ln x$, $\alpha = b$, $\beta = \ln a$

Approach 1 – least squares
$y = a x^b, \quad \ln y = b\ln x + \ln a, \quad Y = \alpha X + \beta$
$Y = \begin{pmatrix} \ln 0.77 \\ \ln 1.1 \\ \ln 1.22 \\ \ln 1.31 \\ \ln 1.4 \end{pmatrix}, \qquad A = \begin{pmatrix} \ln 0.5 & 1 \\ \ln 1.5 & 1 \\ \ln 2.0 & 1 \\ \ln 2.5 & 1 \\ \ln 3.0 & 1 \end{pmatrix}$

Approach 1 – least squares
$y = a x^b, \quad Y = \alpha X + \beta$
$\begin{pmatrix} \sum_{i=1}^{n} X_i^2 & \sum_{i=1}^{n} X_i \\ \sum_{i=1}^{n} X_i & n \end{pmatrix} = \begin{pmatrix} 3.172 & 2.42 \\ 2.42 & 5.0 \end{pmatrix}$

Approach 1 – least squares
$y = a x^b, \quad Y = \alpha X + \beta$
$\begin{pmatrix} \sum_{i=1}^{n} X_i Y_i \\ \sum_{i=1}^{n} Y_i \end{pmatrix} = \begin{pmatrix} 0.9747 \\ 0.6393 \end{pmatrix}$

Approach 1 – least squares
$\begin{pmatrix} 3.172 & 2.42 \\ 2.42 & 5.0 \end{pmatrix} \begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \begin{pmatrix} 0.9747 \\ 0.6393 \end{pmatrix} \;\Rightarrow\; \begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \begin{pmatrix} 0.3326 \\ -0.0331 \end{pmatrix}$
$b = \alpha, \quad a = e^{\beta}, \quad\text{so}\quad y = 0.9674\, x^{0.3326}$

Approach 2 – normal equations
$y = a x^b, \quad \ln y = b\ln x + \ln a, \quad Y = \alpha X + \beta$
$Y = A\mathbf{z}, \qquad \mathbf{z} = \begin{pmatrix} \alpha \\ \beta \end{pmatrix}, \qquad A = \begin{pmatrix} \ln 0.5 & 1 \\ \ln 1.5 & 1 \\ \ln 2.0 & 1 \\ \ln 2.5 & 1 \\ \ln 3.0 & 1 \end{pmatrix}$

Approach 2 – normal equations
$\mathbf{z} = \left(A^T A\right)^{-1} A^T Y$
$A^T A = \begin{pmatrix} \ln 0.5 & \ln 1.5 & \ln 2.0 & \ln 2.5 & \ln 3.0 \\ 1 & 1 & 1 & 1 & 1 \end{pmatrix} \begin{pmatrix} \ln 0.5 & 1 \\ \ln 1.5 & 1 \\ \ln 2.0 & 1 \\ \ln 2.5 & 1 \\ \ln 3.0 & 1 \end{pmatrix} = \begin{pmatrix} 3.172 & 2.42 \\ 2.42 & 5.0 \end{pmatrix}$
$\left(A^T A\right)^{-1} = \begin{pmatrix} 3.172 & 2.42 \\ 2.42 & 5.0 \end{pmatrix}^{-1} = \begin{pmatrix} 0.4999 & -0.242 \\ -0.242 & 0.3171 \end{pmatrix}$

Approach 2 – normal equations
$A^T Y = \begin{pmatrix} \ln 0.5 & \ln 1.5 & \ln 2.0 & \ln 2.5 & \ln 3.0 \\ 1 & 1 & 1 & 1 & 1 \end{pmatrix} \begin{pmatrix} \ln 0.77 \\ \ln 1.1 \\ \ln 1.22 \\ \ln 1.31 \\ \ln 1.4 \end{pmatrix} = \begin{pmatrix} 0.9747 \\ 0.6393 \end{pmatrix}$
$\mathbf{z} = \left(A^T A\right)^{-1} A^T Y = \begin{pmatrix} 0.4999 & -0.242 \\ -0.242 & 0.3171 \end{pmatrix} \begin{pmatrix} 0.9747 \\ 0.6393 \end{pmatrix} = \begin{pmatrix} 0.3326 \\ -0.0331 \end{pmatrix}$
$b = \alpha, \quad a = e^{\beta}, \quad\text{so}\quad y = 0.9674\, x^{0.3326}$
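The worked example can be reproduced with a few lines of NumPy (a sketch; the printed result should match the slide's $y = 0.9674\,x^{0.3326}$ up to rounding):

```python
import numpy as np

# Fit y = a * x^b to the (W, L) data by linearizing ln(y) = b*ln(x) + ln(a)
# and solving the normal equations.
x = np.array([0.5, 1.5, 2.0, 2.5, 3.0])
y = np.array([0.77, 1.1, 1.22, 1.31, 1.4])

X, Y = np.log(x), np.log(y)
A = np.column_stack([X, np.ones_like(X)])
alpha, beta = np.linalg.solve(A.T @ A, A.T @ Y)   # z = (A^T A)^{-1} A^T Y
b, a = alpha, np.exp(beta)
print(f"y = {a:.4f} * x^{b:.4f}")                 # expected: y = 0.9674 * x^0.3326
```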

Approach 2 – normal equations
X = W (kg): 0.5   1.5   2.0   2.5   3.0
Y = L (m):  0.77   1.1   1.22   1.31   1.4

Summing up
The linear regression problem can be formulated as follows: approximating every $y_i$ as a linear function of $x_i$, $y_i = f(x_i) + e_i$ with $f(x) = ax + b$, we look for the parameters $a, b$ for which SSE is the smallest. The solution is
$\mathbf{z} = \left(A^T A\right)^{-1} A^T \mathbf{y}, \quad\text{where}\quad A = \begin{pmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{pmatrix}, \quad \mathbf{y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad \mathbf{z} = \begin{pmatrix} a \\ b \end{pmatrix}$

Example
Fit the function $f(x, a_0, a_1) = a_0\left(1 - e^{-a_1 x}\right)$ to data.
$SSE(a_0, a_1) = \sum_{i=1}^{n}\left(y_i - a_0\left(1 - e^{-a_1 x_i}\right)\right)^2$
The partial derivatives of SSE with respect to $a_0$ and $a_1$ are:
$\frac{\partial SSE}{\partial a_0} = -2\sum_{i=1}^{n}\left(y_i - a_0\left(1 - e^{-a_1 x_i}\right)\right)\left(1 - e^{-a_1 x_i}\right) = 0$
$\frac{\partial SSE}{\partial a_1} = -2\sum_{i=1}^{n}\left(y_i - a_0\left(1 - e^{-a_1 x_i}\right)\right) a_0 x_i e^{-a_1 x_i} = 0$
We obtain a set of nonlinear equations. One way is to solve them directly – but with no error control. Another is the Gauss–Newton iterative technique.

Iterative technique – the Gauss–Newton method
Problem: given a set of data $(x_i, y_i)$ and a function $f(x, a_0, a_1, a_2, \ldots)$, fit
$y_i = f(x_i, a_0, a_1, a_2, \ldots) + e_i \quad\text{for every } i = 1, \ldots, N$
so that the sum of the squared random errors $e_i$ is the smallest. In vector notation: $\mathbf{y} = f(\mathbf{x}, \mathbf{a}) + \mathbf{e}$

Iterative technique – the Gauss–Newton method
To illustrate the process we use the case of two parameters, $a_0$ and $a_1$. The truncated Taylor expansion that defines the values of the model function $f$ used in step $j$ of the iteration has the form
$f(x_i)_{j+1} = f(x_i)_j + \frac{\partial f(x_i)}{\partial a_0}\,\Delta a_0 + \frac{\partial f(x_i)}{\partial a_1}\,\Delta a_1 \quad\text{for every } i = 1, \ldots, N$
We do not "travel" in $x$ but in $\mathbf{a}$ – we look for a better estimate of $\mathbf{a}$. The values of $\Delta a_0$ and $\Delta a_1$ are determined from the least-squares computation at each step; they are the increments added to the latest parameter estimates to generate the next estimates. This expression is said to linearize the original model with respect to the parameters.

How to solve a truly nonlinear problem
$y_i = f(x_i) + e_i$
$f(x_i)_{j+1} = f(x_i)_j + \frac{\partial f(x_i)}{\partial a_0}\,\Delta a_0 + \frac{\partial f(x_i)}{\partial a_1}\,\Delta a_1$
$y_i = f(x_i)_j + \frac{\partial f(x_i)}{\partial a_0}\,\Delta a_0 + \frac{\partial f(x_i)}{\partial a_1}\,\Delta a_1 + e_i$
$\begin{pmatrix} y_1 - f(x_1) \\ y_2 - f(x_2) \\ \vdots \\ y_N - f(x_N) \end{pmatrix} = \begin{pmatrix} \frac{\partial f(x_1)}{\partial a_0} & \frac{\partial f(x_1)}{\partial a_1} \\ \frac{\partial f(x_2)}{\partial a_0} & \frac{\partial f(x_2)}{\partial a_1} \\ \vdots & \vdots \\ \frac{\partial f(x_N)}{\partial a_0} & \frac{\partial f(x_N)}{\partial a_1} \end{pmatrix} \begin{pmatrix} \Delta a_0 \\ \Delta a_1 \end{pmatrix} + \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_N \end{pmatrix}$

How to solve a truly nonlinear problem
Now we drop the random error terms to obtain an overdetermined system, to which we apply the least-squares strategy to determine the normal system:
$\begin{pmatrix} y_1 - f(x_1) \\ y_2 - f(x_2) \\ \vdots \\ y_N - f(x_N) \end{pmatrix} = \begin{pmatrix} \frac{\partial f(x_1)}{\partial a_0} & \frac{\partial f(x_1)}{\partial a_1} \\ \frac{\partial f(x_2)}{\partial a_0} & \frac{\partial f(x_2)}{\partial a_1} \\ \vdots & \vdots \\ \frac{\partial f(x_N)}{\partial a_0} & \frac{\partial f(x_N)}{\partial a_1} \end{pmatrix} \begin{pmatrix} \Delta a_0 \\ \Delta a_1 \end{pmatrix}$

How to solve a truly nonlinear problem
We denote
$\mathbf{b} = \begin{pmatrix} y_1 - f(x_1) \\ y_2 - f(x_2) \\ \vdots \\ y_N - f(x_N) \end{pmatrix}, \qquad J = \begin{pmatrix} \frac{\partial f(x_1)}{\partial a_0} & \frac{\partial f(x_1)}{\partial a_1} \\ \frac{\partial f(x_2)}{\partial a_0} & \frac{\partial f(x_2)}{\partial a_1} \\ \vdots & \vdots \\ \frac{\partial f(x_N)}{\partial a_0} & \frac{\partial f(x_N)}{\partial a_1} \end{pmatrix}, \qquad \mathbf{z} = \begin{pmatrix} \Delta a_0 \\ \Delta a_1 \end{pmatrix}$

How to solve a truly nonlinear problem
$\mathbf{b} = J\mathbf{z} \quad | \cdot J^T$
$J^T \mathbf{b} = J^T J\,\mathbf{z}$

How to solve a truly nonlinear problem
$J^T \mathbf{b} = J^T J\,\mathbf{z} \;\Rightarrow\; \mathbf{z} = \left(J^T J\right)^{-1} J^T \mathbf{b} = \begin{pmatrix} \Delta a_0 \\ \Delta a_1 \end{pmatrix}$

How to solve a truly nonlinear problem
The entries of the vector $\mathbf{z} = \begin{pmatrix} \Delta a_0 \\ \Delta a_1 \end{pmatrix}$ are used to update the values of the parameters:
$a_{0,j+1} = a_{0,j} + \Delta a_0, \qquad a_{1,j+1} = a_{1,j} + \Delta a_1$

How to solve a truly nonlinear problem
The iterative procedure continues, and we test for convergence using an approximate relative error test:
$\left|\frac{a_{0,j+1} - a_{0,j}}{a_{0,j+1}}\right| < \text{tolerance}, \qquad \left|\frac{a_{1,j+1} - a_{1,j}}{a_{1,j+1}}\right| < \text{tolerance}$

Example
Fit the function $f(x, a_0, a_1) = a_0\left(1 - e^{-a_1 x}\right)$ to:
x: 0.25   0.75   1.25   1.75   2.25
y: 0.28   0.57   0.68   0.74   0.79

Example
Fit the function $f(x, a_0, a_1) = a_0\left(1 - e^{-a_1 x}\right)$. We use the initial guesses $a_0 = 1.0$ and $a_1 = 1.0$. The partial derivatives with respect to $a_0$ and $a_1$ are:
$\frac{\partial f}{\partial a_0} = 1 - e^{-a_1 x}, \qquad \frac{\partial f}{\partial a_1} = a_0 x e^{-a_1 x}$
Next we evaluate the entries of the matrix $J$ (rounded to two decimals):
$J = \begin{pmatrix} \frac{\partial f(x_1)}{\partial a_0} & \frac{\partial f(x_1)}{\partial a_1} \\ \vdots & \vdots \\ \frac{\partial f(x_N)}{\partial a_0} & \frac{\partial f(x_N)}{\partial a_1} \end{pmatrix} = \begin{pmatrix} 0.22 & 0.19 \\ 0.53 & 0.35 \\ 0.71 & 0.36 \\ 0.83 & 0.30 \\ 0.89 & 0.24 \end{pmatrix}$
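A quick Python check of these entries (a sketch; the values printed to two decimals should agree with the matrix above up to rounding):

```python
import numpy as np

# Evaluate the model and the Jacobian J at the initial guess a0 = a1 = 1
# for the example data.
x = np.array([0.25, 0.75, 1.25, 1.75, 2.25])
a0, a1 = 1.0, 1.0

f = a0 * (1 - np.exp(-a1 * x))                   # model values f(x_i)
J = np.column_stack([1 - np.exp(-a1 * x),        # df/da0
                     a0 * x * np.exp(-a1 * x)])  # df/da1
print(np.round(J, 2))
```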

Example
Then we compute the entries of the vector $\mathbf{b}$ (again rounded to two decimals):
$\mathbf{b} = \begin{pmatrix} y_1 - f(x_1) \\ y_2 - f(x_2) \\ y_3 - f(x_3) \\ y_4 - f(x_4) \\ y_5 - f(x_5) \end{pmatrix} = \begin{pmatrix} 0.28 - 0.22 \\ 0.57 - 0.53 \\ 0.68 - 0.71 \\ 0.74 - 0.83 \\ 0.79 - 0.89 \end{pmatrix} = \begin{pmatrix} 0.06 \\ 0.04 \\ -0.03 \\ -0.09 \\ -0.10 \end{pmatrix}$
Using the normal-system equation $\mathbf{z} = \left(J^T J\right)^{-1} J^T \mathbf{b}$:
$\mathbf{z} = \begin{pmatrix} \Delta a_0 \\ \Delta a_1 \end{pmatrix} = \begin{pmatrix} -0.27 \\ 0.50 \end{pmatrix}$
So the new set of parameters $a_0$ and $a_1$ is:
$\begin{pmatrix} a_0 \\ a_1 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix} + \begin{pmatrix} -0.27 \\ 0.50 \end{pmatrix} = \begin{pmatrix} 0.73 \\ 1.50 \end{pmatrix}$
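Putting the whole iteration together, a minimal Gauss–Newton sketch for this model in Python. The first increment should come out roughly $(-0.27, 0.50)$, as above; the converged values stated in the comment are an approximation added here, not taken from the slides.

```python
import numpy as np

def gauss_newton(x, y, a0, a1, tol=1e-6, max_iter=50):
    """Gauss-Newton fit of f(x) = a0 * (1 - exp(-a1 * x)) to the data (x, y)."""
    for _ in range(max_iter):
        f = a0 * (1 - np.exp(-a1 * x))
        J = np.column_stack([1 - np.exp(-a1 * x),        # df/da0
                             a0 * x * np.exp(-a1 * x)])  # df/da1
        b = y - f                                         # residuals
        da = np.linalg.solve(J.T @ J, J.T @ b)            # z = (J^T J)^{-1} J^T b
        a0, a1 = a0 + da[0], a1 + da[1]
        if max(abs(da[0] / a0), abs(da[1] / a1)) < tol:   # approximate relative error test
            break
    return a0, a1

x = np.array([0.25, 0.75, 1.25, 1.75, 2.25])
y = np.array([0.28, 0.57, 0.68, 0.74, 0.79])
print(gauss_newton(x, y, 1.0, 1.0))   # converges to roughly a0 ~ 0.79, a1 ~ 1.68
```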

Problems
1. It may converge slowly.
2. It may oscillate and continually change direction.
3. It may not converge.

More about the normal distribution
How can we measure the similarity of a given distribution to N(0,1)?
[Figure: the standard normal (Z) curve, $\mu = 0$, $\sigma = 1$, with cumulative probabilities marked at integer z values; the mean, median and mode coincide.]
Mean: $\bar{x} = \frac{\sum_{i=1}^{N} x_i}{N} = 0$; Median = Mean = Mode
Std. deviation: $\sigma = \sqrt{\frac{\sum_{i=1}^{N}\left(x_i - \mu\right)^2}{N}} = 1$
Skewness: $s_1 = \frac{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^3}{\left(\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2\right)^{3/2}} = 0$
Kurtosis: $k = \frac{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^4}{\left(\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2\right)^{2}} = 3$
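As a sketch, these moments can be estimated for any sample with NumPy and compared with the N(0,1) reference values; the synthetic sample below is generated for illustration only.

```python
import numpy as np

# Check how close a sample is to N(0,1) using the moments defined above
rng = np.random.default_rng(0)
x = rng.standard_normal(10_000)

mu = x.mean()
sigma = np.sqrt(np.mean((x - mu) ** 2))                        # population std deviation
skew = np.mean((x - mu) ** 3) / np.mean((x - mu) ** 2) ** 1.5  # skewness
kurt = np.mean((x - mu) ** 4) / np.mean((x - mu) ** 2) ** 2    # kurtosis
print(f"mean={mu:.3f}, std={sigma:.3f}, skewness={skew:.3f}, kurtosis={kurt:.3f}")
# For N(0,1): mean ~ 0, std ~ 1, skewness ~ 0, kurtosis ~ 3
```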
