
1 Nonlinear regression

2 (figure slide)

3 Case study: ENSO (El Niño–Southern Oscillation)

4 (figure slide)

5 (figure slide)

6 How to analyse data?

7 How to analyse data? Plot!

8 How to analyse data? Plot!
The human brain is one of the most powerful computational tools. It works differently than a computer…

9 What if the data have no linear correlation?

10 1. Linearization – transform a nonlinear problem into a linear one
Example: $y = B e^{Ax}$, so $\log y = \log B + Ax$, i.e. $Y = b + ax$ with $Y = \log y$, $a = A$, $b = \log B$.
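A minimal sketch of this linearization trick in Python (an addition to the transcript; the data are synthetic and invented for the example):

```python
import numpy as np

# Synthetic data assumed to follow y = B * exp(A * x), with a little noise
rng = np.random.default_rng(0)
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
y = 2.0 * np.exp(0.8 * x) * (1.0 + 0.02 * rng.standard_normal(x.size))

# Linearize: log y = log B + A x, then fit a straight line to (x, log y)
A_hat, logB_hat = np.polyfit(x, np.log(y), deg=1)
B_hat = np.exp(logB_hat)
print(f"y ~ {B_hat:.3f} * exp({A_hat:.3f} * x)")
```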

11 A few words about r
In the case of linear regression, the r coefficient indicates the degree of linear dependence between the data. However, there is a more general approach…

12 A few words about r
$S_r = \sum_{i=1}^{n} \big(y_i - f(x_i)\big)^2$ (error of the model)
$S_t = \sum_{i=1}^{n} (y_i - \bar{y})^2$ (discrepancy between the data and a single estimate, the mean)

13 A few words about r
$\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i, \qquad s_y = \sqrt{\frac{S_t}{n-1}}$

15 A few words about r
$s_{y/x} = \sqrt{\frac{S_r}{n-2}}$ (standard error of the estimate: the spread around the line)

17 A few words about r
$r^2 = \frac{S_t - S_r}{S_t}$ is the error reduction due to describing the data in terms of a model (a straight line). The difference $S_t - S_r$ is scale dependent, so it is normalized by $S_t$.
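A short sketch (added here, not part of the slides) computing $S_r$, $S_t$ and $r^2$ for a straight-line fit; the data values are invented:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.8, 5.1, 7.2, 8.9, 11.1])    # invented data

a, b = np.polyfit(x, y, deg=1)              # least-squares line y = a*x + b
f = a * x + b

S_r = np.sum((y - f) ** 2)                  # error of the model
S_t = np.sum((y - y.mean()) ** 2)           # spread around the mean alone
r2 = (S_t - S_r) / S_t
print(f"r^2 = {r2:.4f}")
```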

18 Anscombe's example: four very different datasets that all give the same fit, $y = 0.5x + 3$ with $r^2 = 0.67$.

19 2. Polynomial regression
$f(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \dots$
Same approach as in the case of linear regression: least squares.

20 2. Polynomial regression
$f(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \dots$
$e_i = y_i - f(x_i) = y_i - a_0 - a_1 x_i - a_2 x_i^2 - \dots = y_i - \sum_{j=0}^{N} a_j x_i^j$

21 2. Polynomial regression
$e_i^2 = \Big(y_i - \sum_{j=0}^{N} a_j x_i^j\Big)^2$
$SSE(a_0, a_1, \dots) = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \Big(y_i - \sum_{j=0}^{N} a_j x_i^j\Big)^2$
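Evaluating this SSE for a candidate polynomial is a one-liner in NumPy (an illustrative addition; the coefficients and data below are arbitrary):

```python
import numpy as np

def sse(coeffs, x, y):
    # SSE(a_0, a_1, ...) for f(x) = sum_j a_j x^j; coeffs = [a_0, a_1, ...]
    f = np.polynomial.polynomial.polyval(x, coeffs)
    return np.sum((y - f) ** 2)

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 9.2, 19.1])     # arbitrary example data
print(sse([1.0, 0.0, 2.0], x, y))       # SSE of the candidate f(x) = 1 + 2x^2
```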

22 How to adjust a and b so that SSE is smallest?
$SSE(a, b) = \sum_{i=1}^{n} (y_i - a x_i - b)^2$
How to calculate the minimum of the function SSE(a, b)? Set its partial derivatives to zero:
$\frac{\partial SSE(a,b)}{\partial a} = 0, \qquad \frac{\partial SSE(a,b)}{\partial b} = 0$

23 How to adjust the coefficients so that SSE is smallest?
$SSE(a_0, a_1, \dots) = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \Big(y_i - \sum_{j=0}^{N} a_j x_i^j\Big)^2$
$\frac{\partial SSE}{\partial a_0} = \sum_{i=1}^{n} 2\Big(y_i - \sum_{j=0}^{N} a_j x_i^j\Big)(-1) = -2 \sum_{i=1}^{n} \Big(y_i - \sum_{j=0}^{N} a_j x_i^j\Big)$
$\frac{\partial SSE}{\partial a_1} = \sum_{i=1}^{n} 2\Big(y_i - \sum_{j=0}^{N} a_j x_i^j\Big)(-x_i) = -2 \sum_{i=1}^{n} x_i \Big(y_i - \sum_{j=0}^{N} a_j x_i^j\Big)$
$\frac{\partial SSE}{\partial a_2} = \sum_{i=1}^{n} 2\Big(y_i - \sum_{j=0}^{N} a_j x_i^j\Big)(-x_i^2) = -2 \sum_{i=1}^{n} x_i^2 \Big(y_i - \sum_{j=0}^{N} a_j x_i^j\Big)$

24 How to adjust the coefficients so that SSE is smallest?
In general, for the k-th coefficient:
$\frac{\partial SSE}{\partial a_k} = \sum_{i=1}^{n} 2\Big(y_i - \sum_{j=0}^{N} a_j x_i^j\Big)(-x_i^k) = -2 \sum_{i=1}^{n} x_i^k \Big(y_i - \sum_{j=0}^{N} a_j x_i^j\Big)$

25 How to adjust the coefficients so that SSE is smallest?
Setting each derivative to zero, we obtain a set of N+1 linear equations:
$\frac{\partial SSE}{\partial a_k} = -2 \sum_{i=1}^{n} x_i^k \Big(y_i - \sum_{j=0}^{N} a_j x_i^j\Big) = 0$
$\sum_{i=1}^{n} x_i^k y_i - \sum_{i=1}^{n} \sum_{j=0}^{N} a_j x_i^{j+k} = 0$
$\sum_{i=1}^{n} \sum_{j=0}^{N} a_j x_i^{j+k} = \sum_{i=1}^{n} x_i^k y_i$

26 How to adjust the coefficients so that SSE is smallest?
$\sum_{i=1}^{n} \sum_{j=0}^{N} a_j x_i^{j+k} = \sum_{i=1}^{n} x_i^k y_i$
Row 1, k = 0: $n a_0 + a_1 \sum_{i=1}^{n} x_i + a_2 \sum_{i=1}^{n} x_i^2 + \dots + a_N \sum_{i=1}^{n} x_i^N = \sum_{i=1}^{n} y_i$
Row 2, k = 1: $a_0 \sum_{i=1}^{n} x_i + a_1 \sum_{i=1}^{n} x_i^2 + a_2 \sum_{i=1}^{n} x_i^3 + \dots + a_N \sum_{i=1}^{n} x_i^{N+1} = \sum_{i=1}^{n} x_i y_i$
Row N+1, k = N: $a_0 \sum_{i=1}^{n} x_i^N + a_1 \sum_{i=1}^{n} x_i^{N+1} + a_2 \sum_{i=1}^{n} x_i^{N+2} + \dots + a_N \sum_{i=1}^{n} x_i^{2N} = \sum_{i=1}^{n} x_i^N y_i$

27 How to solve it? A linear system in matrix form
$\begin{pmatrix} n & \sum x_i & \cdots & \sum x_i^N \\ \sum x_i & \sum x_i^2 & \cdots & \sum x_i^{N+1} \\ \vdots & \vdots & \ddots & \vdots \\ \sum x_i^N & \sum x_i^{N+1} & \cdots & \sum x_i^{2N} \end{pmatrix} \begin{pmatrix} a_0 \\ a_1 \\ \vdots \\ a_N \end{pmatrix} = \begin{pmatrix} \sum y_i \\ \sum x_i y_i \\ \vdots \\ \sum x_i^N y_i \end{pmatrix}$ (all sums over $i = 1, \dots, n$)
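A sketch (added to the transcript) that builds and solves this normal system with NumPy; the degree and data are arbitrary choices:

```python
import numpy as np

def polyfit_normal(x, y, N):
    # Solve the normal equations M a = rhs, where
    # M[k, j] = sum_i x_i^(j+k) and rhs[k] = sum_i x_i^k * y_i
    A = x[:, None] ** np.arange(N + 1)   # Vandermonde-style design matrix
    M = A.T @ A
    rhs = A.T @ y
    return np.linalg.solve(M, rhs)       # coefficients a_0, ..., a_N

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 2.1, 5.2, 9.8, 17.1])   # invented data
print(polyfit_normal(x, y, N=2))           # agrees with np.polyfit(x, y, 2)[::-1]
```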

28 Nonlinear regression 2: Normal equations

29 Linear regression: a different approach
$y_1 = a x_1 + b$
$y_2 = a x_2 + b$
$\vdots$
$y_n = a x_n + b$
$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix}$, i.e. $y = Az$

30 Linear regression: a different approach
$y = Az$
This system cannot be solved exactly: it is overdetermined (too many equations), and A is not a square matrix, so it cannot be inverted. Solution? Let's make it a square matrix!

31 Linear regression: a different approach
Solution? Let's make it a square matrix: $A^T A = C$ is square.
$\begin{pmatrix} x_1 & x_2 & \cdots & x_n \\ 1 & 1 & \cdots & 1 \end{pmatrix} \begin{pmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{pmatrix} = \begin{pmatrix} \sum_{i=1}^{n} x_i^2 & \sum_{i=1}^{n} x_i \\ \sum_{i=1}^{n} x_i & n \end{pmatrix}$

32 Linear regression: a different approach
Multiply both sides of $y = Az$ by $A^T$:
$\begin{pmatrix} x_1 & x_2 & \cdots & x_n \\ 1 & 1 & \cdots & 1 \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} \sum_{i=1}^{n} x_i y_i \\ \sum_{i=1}^{n} y_i \end{pmatrix}$
$A^T y = A^T A z \;\Rightarrow\; z = (A^T A)^{-1} A^T y$
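The same recipe in a few lines of NumPy (an illustrative addition; the data are invented):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.8, 5.1, 7.2, 8.9, 11.1])     # invented data

A = np.column_stack([x, np.ones_like(x)])    # design matrix with rows [x_i, 1]
z = np.linalg.solve(A.T @ A, A.T @ y)        # z = (A^T A)^{-1} A^T y
a, b = z
print(f"y ~ {a:.3f} * x + {b:.3f}")
```

In practice, np.linalg.lstsq(A, y, rcond=None) solves the same least-squares problem without forming $A^T A$ explicitly, which is numerically safer.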

33 Example
X = W (kg): 0.5, 1.5, 2.0, 2.5, 3.0
Y = L (m): 0.77, 1.10, 1.22, 1.31, 1.40

35 Example
X = W (kg): 0.5, 1.5, 2.0, 2.5, 3.0
Y = L (m): 0.77, 1.10, 1.22, 1.31, 1.40
$y = a x^b$, so $\ln y = b \ln x + \ln a$, i.e. $Y = \alpha X + \beta$ with $Y = \ln y$, $X = \ln x$, $\alpha = b$, $\beta = \ln a$.

36 Approach 1 – least squares
$y = a x^b, \quad \ln y = b \ln x + \ln a, \quad Y = \alpha X + \beta$
$Y = \begin{pmatrix} \ln 0.77 \\ \ln 1.1 \\ \ln 1.22 \\ \ln 1.31 \\ \ln 1.4 \end{pmatrix}, \qquad A = \begin{pmatrix} \ln 0.5 & 1 \\ \ln 1.5 & 1 \\ \ln 2.0 & 1 \\ \ln 2.5 & 1 \\ \ln 3.0 & 1 \end{pmatrix}$

37 Approach 1 – least squares
$\begin{pmatrix} \sum_{i=1}^{n} X_i^2 & \sum_{i=1}^{n} X_i \\ \sum_{i=1}^{n} X_i & n \end{pmatrix} = \begin{pmatrix} 3.172 & 2.42 \\ 2.42 & 5.0 \end{pmatrix}$

38 Approach 1 – least squares
$\begin{pmatrix} \sum_{i=1}^{n} X_i Y_i \\ \sum_{i=1}^{n} Y_i \end{pmatrix} = \begin{pmatrix} 0.9747 \\ 0.6393 \end{pmatrix}$

39 Approach 1 – least squares
$\begin{pmatrix} 3.172 & 2.42 \\ 2.42 & 5.0 \end{pmatrix} \begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \begin{pmatrix} 0.9747 \\ 0.6393 \end{pmatrix} \;\Rightarrow\; \begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \begin{pmatrix} 0.3326 \\ -0.0331 \end{pmatrix}$
With $b = \alpha$ and $a = e^{\beta}$: $y = 0.9674\, x^{0.3326}$
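The whole worked example fits in a few lines (added as a cross-check; it reproduces the numbers above up to rounding):

```python
import numpy as np

W = np.array([0.5, 1.5, 2.0, 2.5, 3.0])       # X = W (kg)
L = np.array([0.77, 1.1, 1.22, 1.31, 1.4])    # Y = L (m)

X, Y = np.log(W), np.log(L)
alpha, beta = np.polyfit(X, Y, deg=1)         # fit Y = alpha*X + beta
a, b = np.exp(beta), alpha
print(f"y = {a:.4f} * x^{b:.4f}")             # y = 0.9674 * x^0.3326
```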

40 Approach 2 – normal equations
$y = a x^b, \quad \ln y = b \ln x + \ln a, \quad Y = \alpha X + \beta$
$A = \begin{pmatrix} \ln 0.5 & 1 \\ \ln 1.5 & 1 \\ \ln 2.0 & 1 \\ \ln 2.5 & 1 \\ \ln 3.0 & 1 \end{pmatrix}, \qquad Y = Az, \qquad z = \begin{pmatrix} \alpha \\ \beta \end{pmatrix}$

41 Approach 2 – normal equations
$z = (A^T A)^{-1} A^T Y$
$A^T A = \begin{pmatrix} \ln 0.5 & \ln 1.5 & \ln 2.0 & \ln 2.5 & \ln 3.0 \\ 1 & 1 & 1 & 1 & 1 \end{pmatrix} \begin{pmatrix} \ln 0.5 & 1 \\ \ln 1.5 & 1 \\ \ln 2.0 & 1 \\ \ln 2.5 & 1 \\ \ln 3.0 & 1 \end{pmatrix} = \begin{pmatrix} 3.172 & 2.42 \\ 2.42 & 5.0 \end{pmatrix}$
$(A^T A)^{-1} = \begin{pmatrix} 3.172 & 2.42 \\ 2.42 & 5.0 \end{pmatrix}^{-1} = \begin{pmatrix} 0.4999 & -0.242 \\ -0.242 & 0.3172 \end{pmatrix}$

42 Approach 2 – normal equations
$A^T Y = \begin{pmatrix} \ln 0.5 & \ln 1.5 & \ln 2.0 & \ln 2.5 & \ln 3.0 \\ 1 & 1 & 1 & 1 & 1 \end{pmatrix} \begin{pmatrix} \ln 0.77 \\ \ln 1.1 \\ \ln 1.22 \\ \ln 1.31 \\ \ln 1.4 \end{pmatrix} = \begin{pmatrix} 0.9747 \\ 0.6393 \end{pmatrix}$
$z = (A^T A)^{-1} A^T Y = \begin{pmatrix} 0.4999 & -0.242 \\ -0.242 & 0.3172 \end{pmatrix} \begin{pmatrix} 0.9747 \\ 0.6393 \end{pmatrix} = \begin{pmatrix} 0.3326 \\ -0.0331 \end{pmatrix}$
With $b = \alpha$ and $a = e^{\beta}$: $y = 0.9674\, x^{0.3326}$

43 Approach 2 – normal equations
X = W (kg): 0.5, 1.5, 2.0, 2.5, 3.0
Y = L (m): 0.77, 1.10, 1.22, 1.31, 1.40

44 Summing up
The linear regression problem can be formulated as follows: approximating every $y_i$ as a linear function of $x_i$, $y_i = f(x_i) + e_i$ with $f(x) = ax + b$, we look for the parameters a and b for which SSE is smallest. The solution is:
$z = (A^T A)^{-1} A^T y$, where $A = \begin{pmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{pmatrix}, \quad y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad z = \begin{pmatrix} a \\ b \end{pmatrix}$

45 Example
Fit the function $f(x, a_0, a_1) = a_0 (1 - e^{-a_1 x})$ to data.
$SSE(a_0, a_1) = \sum_{i=1}^{n} \big(y_i - a_0 (1 - e^{-a_1 x_i})\big)^2$
Setting the partial derivatives of SSE with respect to $a_0$ and $a_1$ to zero (dropping the constant factor $-2$):
$\frac{\partial SSE}{\partial a_0} = \sum_{i=1}^{n} \big(y_i - a_0 (1 - e^{-a_1 x_i})\big)\big(1 - e^{-a_1 x_i}\big) = 0$
$\frac{\partial SSE}{\partial a_1} = \sum_{i=1}^{n} \big(y_i - a_0 (1 - e^{-a_1 x_i})\big)\, a_0 x_i e^{-a_1 x_i} = 0$
We obtain a set of nonlinear equations. One way is to solve them directly, but with no error control. Another is the Gauss-Newton iterative technique.
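In practice a library routine can handle such a fit directly; here is a sketch (an addition to the transcript) using scipy.optimize.curve_fit with the example data from the later slides:

```python
import numpy as np
from scipy.optimize import curve_fit

def f(x, a0, a1):
    return a0 * (1.0 - np.exp(-a1 * x))

x = np.array([0.25, 0.75, 1.25, 1.75, 2.25])
y = np.array([0.28, 0.57, 0.68, 0.74, 0.79])

popt, pcov = curve_fit(f, x, y, p0=[1.0, 1.0])   # initial guess a0 = a1 = 1
print(popt)                                      # fitted [a0, a1]
```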

46 Iterative technique – Gauss-Newton method
Problem: given a set of data $(x_i, y_i)$ and a function $f(x, a_0, a_1, a_2, \dots)$, fit
$y_i = f(x_i, a_0, a_1, a_2, \dots) + e_i$ for every $i = 1, \dots, N$
so that the sum of the squared random errors $e_i$ is smallest. In vector notation: $y = f(x, a) + e$.

47 Iterative technique – Gauss-Newton method
To illustrate the process we use the case of two parameters, $a_0$ and $a_1$. The truncated Taylor expansion that defines the model values $f$ used in each step of the iterative process has the form:
$f(x_i)_{j+1} = f(x_i)_j + \frac{\partial f(x_i)}{\partial a_0} \Delta a_0 + \frac{\partial f(x_i)}{\partial a_1} \Delta a_1$ for every $i = 1, \dots, N$
We don't "travel" in x but in a: we look for a better estimate of a. The values of $\Delta a_0$ and $\Delta a_1$ are determined by a least-squares computation at each step; they are the increments added to the latest parameter estimates to generate the next ones. This expression is said to linearize the original model with respect to the parameters.

48 How to solve a truly nonlinear problem
Substituting the linearization $f(x_i)_{j+1} = f(x_i)_j + \frac{\partial f(x_i)}{\partial a_0} \Delta a_0 + \frac{\partial f(x_i)}{\partial a_1} \Delta a_1$ into $y_i = f(x_i) + e_i$ gives
$y_i = f(x_i)_j + \frac{\partial f(x_i)}{\partial a_0} \Delta a_0 + \frac{\partial f(x_i)}{\partial a_1} \Delta a_1 + e_i$, or in matrix form:
$\begin{pmatrix} y_1 - f(x_1) \\ y_2 - f(x_2) \\ \vdots \\ y_N - f(x_N) \end{pmatrix} = \begin{pmatrix} \partial f(x_1)/\partial a_0 & \partial f(x_1)/\partial a_1 \\ \partial f(x_2)/\partial a_0 & \partial f(x_2)/\partial a_1 \\ \vdots & \vdots \\ \partial f(x_N)/\partial a_0 & \partial f(x_N)/\partial a_1 \end{pmatrix} \begin{pmatrix} \Delta a_0 \\ \Delta a_1 \end{pmatrix} + \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_N \end{pmatrix}$

49 How to solve a truly nonlinear problem
Now we drop the random error terms to obtain an overdetermined system, to which we apply the least-squares strategy to derive the normal system:
$\begin{pmatrix} y_1 - f(x_1) \\ y_2 - f(x_2) \\ \vdots \\ y_N - f(x_N) \end{pmatrix} = \begin{pmatrix} \partial f(x_1)/\partial a_0 & \partial f(x_1)/\partial a_1 \\ \partial f(x_2)/\partial a_0 & \partial f(x_2)/\partial a_1 \\ \vdots & \vdots \\ \partial f(x_N)/\partial a_0 & \partial f(x_N)/\partial a_1 \end{pmatrix} \begin{pmatrix} \Delta a_0 \\ \Delta a_1 \end{pmatrix}$

50 How to solve a truly nonlinear problem
Define:
$J = \begin{pmatrix} \partial f(x_1)/\partial a_0 & \partial f(x_1)/\partial a_1 \\ \partial f(x_2)/\partial a_0 & \partial f(x_2)/\partial a_1 \\ \vdots & \vdots \\ \partial f(x_N)/\partial a_0 & \partial f(x_N)/\partial a_1 \end{pmatrix}, \quad b = \begin{pmatrix} y_1 - f(x_1) \\ y_2 - f(x_2) \\ \vdots \\ y_N - f(x_N) \end{pmatrix}, \quad z = \begin{pmatrix} \Delta a_0 \\ \Delta a_1 \end{pmatrix}$

51 How to solve a truly nonlinear problem
$b = Jz$; multiplying both sides by $J^T$:
$J^T b = J^T J z$

52 How to solve a truly nonlinear problem
$z = (J^T J)^{-1} J^T b = \begin{pmatrix} \Delta a_0 \\ \Delta a_1 \end{pmatrix}$

53 How to solve a truly nonlinear problem
The entries of the vector $z = \begin{pmatrix} \Delta a_0 \\ \Delta a_1 \end{pmatrix}$ are used to update the values of the parameters:
$a_{0,j+1} = a_{0,j} + \Delta a_0, \qquad a_{1,j+1} = a_{1,j} + \Delta a_1$

54 How to solve a truly nonlinear problem
The iterative procedure continues, and we test for convergence using an approximate relative error test:
$\left| \frac{a_{0,j+1} - a_{0,j}}{a_{0,j+1}} \right| < \text{tolerance}, \qquad \left| \frac{a_{1,j+1} - a_{1,j}}{a_{1,j+1}} \right| < \text{tolerance}$
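A compact sketch of the whole loop (an addition to the transcript; the function and parameter names, tolerance, and iteration cap are illustrative choices):

```python
import numpy as np

def gauss_newton(f, jac, x, y, a, tol=1e-6, max_iter=50):
    # f(x, a): model values; jac(x, a): N x p Jacobian w.r.t. the parameters a
    a = np.asarray(a, dtype=float)
    for _ in range(max_iter):
        J = jac(x, a)                            # partial derivatives at current a
        b = y - f(x, a)                          # residual vector
        dz = np.linalg.solve(J.T @ J, J.T @ b)   # normal equations for the step
        a = a + dz
        if np.all(np.abs(dz / a) < tol):         # approximate relative error test
            break
    return a
```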

55 Example
Fit the function $f(x, a_0, a_1) = a_0 (1 - e^{-a_1 x})$ to:
x: 0.25, 0.75, 1.25, 1.75, 2.25
y: 0.28, 0.57, 0.68, 0.74, 0.79

56 Example
We use $a_0 = 1.0$ and $a_1 = 1.0$ as initial guesses. The partial derivatives with respect to $a_0$ and $a_1$ are:
$\frac{\partial f}{\partial a_0} = 1 - e^{-a_1 x}, \qquad \frac{\partial f}{\partial a_1} = a_0 x e^{-a_1 x}$
Next we evaluate the entries of the matrix J:
$J \approx \begin{pmatrix} 0.22 & 0.19 \\ 0.52 & 0.35 \\ 0.71 & 0.35 \\ 0.82 & 0.30 \\ 0.89 & 0.23 \end{pmatrix}$

57 Example
Then we compute the entries of the vector b (using unrounded values of $f(x_i)$):
$b = \begin{pmatrix} y_1 - f(x_1) \\ y_2 - f(x_2) \\ \vdots \\ y_5 - f(x_5) \end{pmatrix} \approx \begin{pmatrix} 0.06 \\ 0.04 \\ -0.03 \\ -0.09 \\ -0.10 \end{pmatrix}$
Using the normal-system equation $z = (J^T J)^{-1} J^T b$:
$z = \begin{pmatrix} \Delta a_0 \\ \Delta a_1 \end{pmatrix} = \begin{pmatrix} -0.27 \\ 0.50 \end{pmatrix}$
So the new set of parameters $a_0$ and $a_1$ is:
$\begin{pmatrix} a_0 \\ a_1 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix} + \begin{pmatrix} -0.27 \\ 0.50 \end{pmatrix} = \begin{pmatrix} 0.73 \\ 1.5 \end{pmatrix}$
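The code below (an addition) reproduces this first step and then iterates to convergence, reusing the gauss_newton sketch given after slide 54:

```python
import numpy as np

def f(x, a):
    return a[0] * (1.0 - np.exp(-a[1] * x))

def jac(x, a):
    da0 = 1.0 - np.exp(-a[1] * x)            # df/da0
    da1 = a[0] * x * np.exp(-a[1] * x)       # df/da1
    return np.column_stack([da0, da1])

x = np.array([0.25, 0.75, 1.25, 1.75, 2.25])
y = np.array([0.28, 0.57, 0.68, 0.74, 0.79])

a = np.array([1.0, 1.0])
J, b = jac(x, a), y - f(x, a)
print(np.linalg.solve(J.T @ J, J.T @ b))     # first step: approx [-0.27, 0.50]
print(gauss_newton(f, jac, x, y, a))         # converged parameters
```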

58 Problems
1. It may converge slowly.
2. It may oscillate and continually change direction.
3. It may not converge.

59 More about the normal distribution
How can we measure the similarity of a given distribution to N(0,1)?
(Figure: the standard normal curve, $\mu = 0$, $\sigma = 1$, with its cumulative probabilities at $z = -3, \dots, 3$.)
For N(0,1): Median = Mean = Mode
Mean: $\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i = 0$
Std. deviation: $\sigma = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} (x_i - \mu)^2} = 1$
Skewness: $s_1 = \frac{1}{N} \sum_{i=1}^{N} \frac{(x_i - \mu)^3}{\sigma^3} = 0$
Kurtosis: $k = \frac{1}{N} \sum_{i=1}^{N} \frac{(x_i - \mu)^4}{\sigma^4} = 3$
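As a quick numerical companion (not from the slides), these four moments can be estimated from a sample and compared against 0, 1, 0, 3; scipy's skew and kurtosis are used, with fisher=False so a normal sample gives kurtosis near 3:

```python
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(42)
sample = rng.standard_normal(100_000)       # sample drawn from N(0, 1)

print(sample.mean())                        # close to 0
print(sample.std(ddof=1))                   # close to 1
print(skew(sample))                         # close to 0
print(kurtosis(sample, fisher=False))       # close to 3 (Pearson definition)
```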

