
1 Nonlinear regression

2 (figure slide)

3 Case study: ENSO (El Niño–Southern Oscillation)

4 (figure slide)

5 (figure slide)

6 How to analyse data?

7 How to analyse data? Plot!

8 How to analyse data? Plot!
The human brain is one of the most powerful computational tools. It works differently than a computer…

9 What if the data have no linear correlation?

10 1. Linearization – transform a nonlinear problem into a linear one
Example: $y = B e^{Ax}$, so $\log y = \log B + Ax$, i.e. $Y = b + ax$ with $Y = \log y$, $a = A$, $b = \log B$.
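A minimal sketch of this linearization trick in Python (an addition to the transcript; the data are synthetic and invented for the example):

```python
import numpy as np

# Synthetic data assumed to follow y = B * exp(A * x), with a little noise
rng = np.random.default_rng(0)
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
y = 2.0 * np.exp(0.8 * x) * (1.0 + 0.02 * rng.standard_normal(x.size))

# Linearize: log y = log B + A x, then fit a straight line to (x, log y)
A_hat, logB_hat = np.polyfit(x, np.log(y), deg=1)
B_hat = np.exp(logB_hat)
print(f"y ~ {B_hat:.3f} * exp({A_hat:.3f} * x)")
```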

11 A few words about r
In the case of linear regression, the r coefficient indicates the degree of linear dependence between the data. However, there is a more general approach…

12 A few words about r
$S_r = \sum_{i=1}^{n} \big(y_i - f(x_i)\big)^2$ (error of the model)
$S_t = \sum_{i=1}^{n} (y_i - \bar{y})^2$ (discrepancy between the data and a single estimate, the mean)

13 A few words about r
$\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i, \qquad s_y = \sqrt{\frac{S_t}{n-1}}$

15 A few words about r
$s_{y/x} = \sqrt{\frac{S_r}{n-2}}$ (standard error of the estimate: the spread around the line)

17 A few words about r
$r^2 = \frac{S_t - S_r}{S_t}$ is the error reduction due to describing the data in terms of a model (a straight line). The difference $S_t - S_r$ is scale dependent, so it is normalized by $S_t$.
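A short sketch (added here, not part of the slides) computing $S_r$, $S_t$ and $r^2$ for a straight-line fit; the data values are invented:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.8, 5.1, 7.2, 8.9, 11.1])    # invented data

a, b = np.polyfit(x, y, deg=1)              # least-squares line y = a*x + b
f = a * x + b

S_r = np.sum((y - f) ** 2)                  # error of the model
S_t = np.sum((y - y.mean()) ** 2)           # spread around the mean alone
r2 = (S_t - S_r) / S_t
print(f"r^2 = {r2:.4f}")
```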

18 Anscombe's example: four very different datasets that all give the same fit, $y = 0.5x + 3$ with $r^2 = 0.67$.

19 2. Polynomial regression
$f(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \dots$
Same approach as in the case of linear regression: least squares.

20 2. Polynomial regression
$f(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \dots$
$e_i = y_i - f(x_i) = y_i - a_0 - a_1 x_i - a_2 x_i^2 - \dots = y_i - \sum_{j=0}^{N} a_j x_i^j$

21 2. Polynomial regression
$e_i^2 = \Big(y_i - \sum_{j=0}^{N} a_j x_i^j\Big)^2$
$SSE(a_0, a_1, \dots) = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \Big(y_i - \sum_{j=0}^{N} a_j x_i^j\Big)^2$
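Evaluating this SSE for a candidate polynomial is a one-liner in NumPy (an illustrative addition; the coefficients and data below are arbitrary):

```python
import numpy as np

def sse(coeffs, x, y):
    # SSE(a_0, a_1, ...) for f(x) = sum_j a_j x^j; coeffs = [a_0, a_1, ...]
    f = np.polynomial.polynomial.polyval(x, coeffs)
    return np.sum((y - f) ** 2)

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 9.2, 19.1])     # arbitrary example data
print(sse([1.0, 0.0, 2.0], x, y))       # SSE of the candidate f(x) = 1 + 2x^2
```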

22 How to adjust a and b so that SSE is smallest?
$SSE(a, b) = \sum_{i=1}^{n} (y_i - a x_i - b)^2$
How to calculate the minimum of the function SSE(a, b)? Set its partial derivatives to zero:
$\frac{\partial SSE(a,b)}{\partial a} = 0, \qquad \frac{\partial SSE(a,b)}{\partial b} = 0$

23 How to adjust the coefficients so that SSE is smallest?
$SSE(a_0, a_1, \dots) = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \Big(y_i - \sum_{j=0}^{N} a_j x_i^j\Big)^2$
$\frac{\partial SSE}{\partial a_0} = \sum_{i=1}^{n} 2\Big(y_i - \sum_{j=0}^{N} a_j x_i^j\Big)(-1) = -2 \sum_{i=1}^{n} \Big(y_i - \sum_{j=0}^{N} a_j x_i^j\Big)$
$\frac{\partial SSE}{\partial a_1} = \sum_{i=1}^{n} 2\Big(y_i - \sum_{j=0}^{N} a_j x_i^j\Big)(-x_i) = -2 \sum_{i=1}^{n} x_i \Big(y_i - \sum_{j=0}^{N} a_j x_i^j\Big)$
$\frac{\partial SSE}{\partial a_2} = \sum_{i=1}^{n} 2\Big(y_i - \sum_{j=0}^{N} a_j x_i^j\Big)(-x_i^2) = -2 \sum_{i=1}^{n} x_i^2 \Big(y_i - \sum_{j=0}^{N} a_j x_i^j\Big)$

24 How to adjust the coefficients so that SSE is smallest?
In general, for the k-th coefficient:
$\frac{\partial SSE}{\partial a_k} = \sum_{i=1}^{n} 2\Big(y_i - \sum_{j=0}^{N} a_j x_i^j\Big)(-x_i^k) = -2 \sum_{i=1}^{n} x_i^k \Big(y_i - \sum_{j=0}^{N} a_j x_i^j\Big)$

25 How to adjust the coefficients so that SSE is smallest?
Setting each derivative to zero, we obtain a set of N+1 linear equations:
$\frac{\partial SSE}{\partial a_k} = -2 \sum_{i=1}^{n} x_i^k \Big(y_i - \sum_{j=0}^{N} a_j x_i^j\Big) = 0$
$\sum_{i=1}^{n} x_i^k y_i - \sum_{i=1}^{n} \sum_{j=0}^{N} a_j x_i^{j+k} = 0$
$\sum_{i=1}^{n} \sum_{j=0}^{N} a_j x_i^{j+k} = \sum_{i=1}^{n} x_i^k y_i$

26 How to adjust the coefficients so that SSE is smallest?
$\sum_{i=1}^{n} \sum_{j=0}^{N} a_j x_i^{j+k} = \sum_{i=1}^{n} x_i^k y_i$
Row 1, k = 0: $n a_0 + a_1 \sum_{i=1}^{n} x_i + a_2 \sum_{i=1}^{n} x_i^2 + \dots + a_N \sum_{i=1}^{n} x_i^N = \sum_{i=1}^{n} y_i$
Row 2, k = 1: $a_0 \sum_{i=1}^{n} x_i + a_1 \sum_{i=1}^{n} x_i^2 + a_2 \sum_{i=1}^{n} x_i^3 + \dots + a_N \sum_{i=1}^{n} x_i^{N+1} = \sum_{i=1}^{n} x_i y_i$
Row N+1, k = N: $a_0 \sum_{i=1}^{n} x_i^N + a_1 \sum_{i=1}^{n} x_i^{N+1} + a_2 \sum_{i=1}^{n} x_i^{N+2} + \dots + a_N \sum_{i=1}^{n} x_i^{2N} = \sum_{i=1}^{n} x_i^N y_i$

27 How to solve it? A linear system in matrix form
$\begin{pmatrix} n & \sum x_i & \cdots & \sum x_i^N \\ \sum x_i & \sum x_i^2 & \cdots & \sum x_i^{N+1} \\ \vdots & \vdots & \ddots & \vdots \\ \sum x_i^N & \sum x_i^{N+1} & \cdots & \sum x_i^{2N} \end{pmatrix} \begin{pmatrix} a_0 \\ a_1 \\ \vdots \\ a_N \end{pmatrix} = \begin{pmatrix} \sum y_i \\ \sum x_i y_i \\ \vdots \\ \sum x_i^N y_i \end{pmatrix}$ (all sums over $i = 1, \dots, n$)
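A sketch (added to the transcript) that builds and solves this normal system with NumPy; the degree and data are arbitrary choices:

```python
import numpy as np

def polyfit_normal(x, y, N):
    # Solve the normal equations M a = rhs, where
    # M[k, j] = sum_i x_i^(j+k) and rhs[k] = sum_i x_i^k * y_i
    A = x[:, None] ** np.arange(N + 1)   # Vandermonde-style design matrix
    M = A.T @ A
    rhs = A.T @ y
    return np.linalg.solve(M, rhs)       # coefficients a_0, ..., a_N

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 2.1, 5.2, 9.8, 17.1])   # invented data
print(polyfit_normal(x, y, N=2))           # agrees with np.polyfit(x, y, 2)[::-1]
```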

28 Nonlinear regression 2: Normal equations

29 Linear regression: a different approach
$y_1 = a x_1 + b$
$y_2 = a x_2 + b$
$\vdots$
$y_n = a x_n + b$
$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix}$, i.e. $y = Az$

30 Linear regression: a different approach
$y = Az$
This system cannot be solved exactly: it is overdetermined (too many equations), and A is not a square matrix, so it cannot be inverted. Solution? Let's make it a square matrix!

31 Linear regression: a different approach
Solution? Let's make it a square matrix: $A^T A = C$ is square.
$\begin{pmatrix} x_1 & x_2 & \cdots & x_n \\ 1 & 1 & \cdots & 1 \end{pmatrix} \begin{pmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{pmatrix} = \begin{pmatrix} \sum_{i=1}^{n} x_i^2 & \sum_{i=1}^{n} x_i \\ \sum_{i=1}^{n} x_i & n \end{pmatrix}$

32 Linear regression: a different approach
Multiply both sides of $y = Az$ by $A^T$:
$\begin{pmatrix} x_1 & x_2 & \cdots & x_n \\ 1 & 1 & \cdots & 1 \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} \sum_{i=1}^{n} x_i y_i \\ \sum_{i=1}^{n} y_i \end{pmatrix}$
$A^T y = A^T A z \;\Rightarrow\; z = (A^T A)^{-1} A^T y$
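The same recipe in a few lines of NumPy (an illustrative addition; the data are invented):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.8, 5.1, 7.2, 8.9, 11.1])     # invented data

A = np.column_stack([x, np.ones_like(x)])    # design matrix with rows [x_i, 1]
z = np.linalg.solve(A.T @ A, A.T @ y)        # z = (A^T A)^{-1} A^T y
a, b = z
print(f"y ~ {a:.3f} * x + {b:.3f}")
```

In practice, np.linalg.lstsq(A, y, rcond=None) solves the same least-squares problem without forming $A^T A$ explicitly, which is numerically safer.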

33 Example
X = W (kg): 0.5, 1.5, 2.0, 2.5, 3.0
Y = L (m): 0.77, 1.10, 1.22, 1.31, 1.40

35 Example
X = W (kg): 0.5, 1.5, 2.0, 2.5, 3.0
Y = L (m): 0.77, 1.10, 1.22, 1.31, 1.40
$y = a x^b$, so $\ln y = b \ln x + \ln a$, i.e. $Y = \alpha X + \beta$ with $Y = \ln y$, $X = \ln x$, $\alpha = b$, $\beta = \ln a$.

36 Approach 1 – least squares
$y = a x^b, \quad \ln y = b \ln x + \ln a, \quad Y = \alpha X + \beta$
$Y = \begin{pmatrix} \ln 0.77 \\ \ln 1.1 \\ \ln 1.22 \\ \ln 1.31 \\ \ln 1.4 \end{pmatrix}, \qquad A = \begin{pmatrix} \ln 0.5 & 1 \\ \ln 1.5 & 1 \\ \ln 2.0 & 1 \\ \ln 2.5 & 1 \\ \ln 3.0 & 1 \end{pmatrix}$

37 Approach 1 – least squares
$\begin{pmatrix} \sum_{i=1}^{n} X_i^2 & \sum_{i=1}^{n} X_i \\ \sum_{i=1}^{n} X_i & n \end{pmatrix} = \begin{pmatrix} 3.172 & 2.42 \\ 2.42 & 5.0 \end{pmatrix}$

38 Approach 1 – least squares
$\begin{pmatrix} \sum_{i=1}^{n} X_i Y_i \\ \sum_{i=1}^{n} Y_i \end{pmatrix} = \begin{pmatrix} 0.9747 \\ 0.6393 \end{pmatrix}$

39 Approach 1 – least squares
$\begin{pmatrix} 3.172 & 2.42 \\ 2.42 & 5.0 \end{pmatrix} \begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \begin{pmatrix} 0.9747 \\ 0.6393 \end{pmatrix} \;\Rightarrow\; \begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \begin{pmatrix} 0.3326 \\ -0.0331 \end{pmatrix}$
With $b = \alpha$ and $a = e^{\beta}$: $y = 0.9674\, x^{0.3326}$
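The whole worked example fits in a few lines (added as a cross-check; it reproduces the numbers above up to rounding):

```python
import numpy as np

W = np.array([0.5, 1.5, 2.0, 2.5, 3.0])       # X = W (kg)
L = np.array([0.77, 1.1, 1.22, 1.31, 1.4])    # Y = L (m)

X, Y = np.log(W), np.log(L)
alpha, beta = np.polyfit(X, Y, deg=1)         # fit Y = alpha*X + beta
a, b = np.exp(beta), alpha
print(f"y = {a:.4f} * x^{b:.4f}")             # y = 0.9674 * x^0.3326
```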

40 Approach 2 – normal equations
$y = a x^b, \quad \ln y = b \ln x + \ln a, \quad Y = \alpha X + \beta$
$A = \begin{pmatrix} \ln 0.5 & 1 \\ \ln 1.5 & 1 \\ \ln 2.0 & 1 \\ \ln 2.5 & 1 \\ \ln 3.0 & 1 \end{pmatrix}, \qquad Y = Az, \qquad z = \begin{pmatrix} \alpha \\ \beta \end{pmatrix}$

41 Approach 2 – normal equations
$z = (A^T A)^{-1} A^T Y$
$A^T A = \begin{pmatrix} \ln 0.5 & \ln 1.5 & \ln 2.0 & \ln 2.5 & \ln 3.0 \\ 1 & 1 & 1 & 1 & 1 \end{pmatrix} \begin{pmatrix} \ln 0.5 & 1 \\ \ln 1.5 & 1 \\ \ln 2.0 & 1 \\ \ln 2.5 & 1 \\ \ln 3.0 & 1 \end{pmatrix} = \begin{pmatrix} 3.172 & 2.42 \\ 2.42 & 5.0 \end{pmatrix}$
$(A^T A)^{-1} = \begin{pmatrix} 3.172 & 2.42 \\ 2.42 & 5.0 \end{pmatrix}^{-1} = \begin{pmatrix} 0.4999 & -0.242 \\ -0.242 & 0.3172 \end{pmatrix}$

42 Approach 2 – normal equations
$A^T Y = \begin{pmatrix} \ln 0.5 & \ln 1.5 & \ln 2.0 & \ln 2.5 & \ln 3.0 \\ 1 & 1 & 1 & 1 & 1 \end{pmatrix} \begin{pmatrix} \ln 0.77 \\ \ln 1.1 \\ \ln 1.22 \\ \ln 1.31 \\ \ln 1.4 \end{pmatrix} = \begin{pmatrix} 0.9747 \\ 0.6393 \end{pmatrix}$
$z = (A^T A)^{-1} A^T Y = \begin{pmatrix} 0.4999 & -0.242 \\ -0.242 & 0.3172 \end{pmatrix} \begin{pmatrix} 0.9747 \\ 0.6393 \end{pmatrix} = \begin{pmatrix} 0.3326 \\ -0.0331 \end{pmatrix}$
With $b = \alpha$ and $a = e^{\beta}$: $y = 0.9674\, x^{0.3326}$

43 Approach 2 – normal equations
X = W (kg): 0.5, 1.5, 2.0, 2.5, 3.0
Y = L (m): 0.77, 1.10, 1.22, 1.31, 1.40

44 Summing up
The linear regression problem can be formulated as follows: approximating every $y_i$ as a linear function of $x_i$, $y_i = f(x_i) + e_i$ with $f(x) = ax + b$, we look for the parameters a and b for which SSE is smallest. The solution is:
$z = (A^T A)^{-1} A^T y$, where $A = \begin{pmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{pmatrix}, \quad y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad z = \begin{pmatrix} a \\ b \end{pmatrix}$

45 Example
Fit the function $f(x, a_0, a_1) = a_0 (1 - e^{-a_1 x})$ to data.
$SSE(a_0, a_1) = \sum_{i=1}^{n} \big(y_i - a_0 (1 - e^{-a_1 x_i})\big)^2$
Setting the partial derivatives of SSE with respect to $a_0$ and $a_1$ to zero (dropping the constant factor $-2$):
$\frac{\partial SSE}{\partial a_0} = \sum_{i=1}^{n} \big(y_i - a_0 (1 - e^{-a_1 x_i})\big)\big(1 - e^{-a_1 x_i}\big) = 0$
$\frac{\partial SSE}{\partial a_1} = \sum_{i=1}^{n} \big(y_i - a_0 (1 - e^{-a_1 x_i})\big)\, a_0 x_i e^{-a_1 x_i} = 0$
We obtain a set of nonlinear equations. One way is to solve them directly, but with no error control. Another is the Gauss-Newton iterative technique.
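In practice a library routine can handle such a fit directly; here is a sketch (an addition to the transcript) using scipy.optimize.curve_fit with the example data from the later slides:

```python
import numpy as np
from scipy.optimize import curve_fit

def f(x, a0, a1):
    return a0 * (1.0 - np.exp(-a1 * x))

x = np.array([0.25, 0.75, 1.25, 1.75, 2.25])
y = np.array([0.28, 0.57, 0.68, 0.74, 0.79])

popt, pcov = curve_fit(f, x, y, p0=[1.0, 1.0])   # initial guess a0 = a1 = 1
print(popt)                                      # fitted [a0, a1]
```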

46 Iterative technique – Gauss-Newton method
Problem: given a set of data $(x_i, y_i)$ and a function $f(x, a_0, a_1, a_2, \dots)$, fit
$y_i = f(x_i, a_0, a_1, a_2, \dots) + e_i$ for every $i = 1, \dots, N$
so that the sum of the squared random errors $e_i$ is smallest. In vector notation: $y = f(x, a) + e$.

47 Iterative technique – Gauss-Newton method
To illustrate the process we use the case of two parameters, $a_0$ and $a_1$. The truncated Taylor expansion that defines the model values $f$ used in each step of the iterative process has the form:
$f(x_i)_{j+1} = f(x_i)_j + \frac{\partial f(x_i)}{\partial a_0} \Delta a_0 + \frac{\partial f(x_i)}{\partial a_1} \Delta a_1$ for every $i = 1, \dots, N$
We don't "travel" in x but in a: we look for a better estimate of a. The values of $\Delta a_0$ and $\Delta a_1$ are determined by a least-squares computation at each step; they are the increments added to the latest parameter estimates to generate the next ones. This expression is said to linearize the original model with respect to the parameters.

48 How to solve a truly nonlinear problem
Substituting the linearization $f(x_i)_{j+1} = f(x_i)_j + \frac{\partial f(x_i)}{\partial a_0} \Delta a_0 + \frac{\partial f(x_i)}{\partial a_1} \Delta a_1$ into $y_i = f(x_i) + e_i$ gives
$y_i = f(x_i)_j + \frac{\partial f(x_i)}{\partial a_0} \Delta a_0 + \frac{\partial f(x_i)}{\partial a_1} \Delta a_1 + e_i$, or in matrix form:
$\begin{pmatrix} y_1 - f(x_1) \\ y_2 - f(x_2) \\ \vdots \\ y_N - f(x_N) \end{pmatrix} = \begin{pmatrix} \partial f(x_1)/\partial a_0 & \partial f(x_1)/\partial a_1 \\ \partial f(x_2)/\partial a_0 & \partial f(x_2)/\partial a_1 \\ \vdots & \vdots \\ \partial f(x_N)/\partial a_0 & \partial f(x_N)/\partial a_1 \end{pmatrix} \begin{pmatrix} \Delta a_0 \\ \Delta a_1 \end{pmatrix} + \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_N \end{pmatrix}$

49 How to solve a truly nonlinear problem
Now we drop the random error terms to obtain an overdetermined system, to which we apply the least-squares strategy to derive the normal system:
$\begin{pmatrix} y_1 - f(x_1) \\ y_2 - f(x_2) \\ \vdots \\ y_N - f(x_N) \end{pmatrix} = \begin{pmatrix} \partial f(x_1)/\partial a_0 & \partial f(x_1)/\partial a_1 \\ \partial f(x_2)/\partial a_0 & \partial f(x_2)/\partial a_1 \\ \vdots & \vdots \\ \partial f(x_N)/\partial a_0 & \partial f(x_N)/\partial a_1 \end{pmatrix} \begin{pmatrix} \Delta a_0 \\ \Delta a_1 \end{pmatrix}$

50 How to solve a truly nonlinear problem
Define:
$J = \begin{pmatrix} \partial f(x_1)/\partial a_0 & \partial f(x_1)/\partial a_1 \\ \partial f(x_2)/\partial a_0 & \partial f(x_2)/\partial a_1 \\ \vdots & \vdots \\ \partial f(x_N)/\partial a_0 & \partial f(x_N)/\partial a_1 \end{pmatrix}, \quad b = \begin{pmatrix} y_1 - f(x_1) \\ y_2 - f(x_2) \\ \vdots \\ y_N - f(x_N) \end{pmatrix}, \quad z = \begin{pmatrix} \Delta a_0 \\ \Delta a_1 \end{pmatrix}$

51 How to solve a truly nonlinear problem
$b = Jz$; multiplying both sides by $J^T$:
$J^T b = J^T J z$

52 How to solve a truly nonlinear problem
$z = (J^T J)^{-1} J^T b = \begin{pmatrix} \Delta a_0 \\ \Delta a_1 \end{pmatrix}$

53 How to solve a truly nonlinear problem
The entries of the vector $z = \begin{pmatrix} \Delta a_0 \\ \Delta a_1 \end{pmatrix}$ are used to update the values of the parameters:
$a_{0,j+1} = a_{0,j} + \Delta a_0, \qquad a_{1,j+1} = a_{1,j} + \Delta a_1$

54 How to solve a truly nonlinear problem
The iterative procedure continues, and we test for convergence using an approximate relative error test:
$\left| \frac{a_{0,j+1} - a_{0,j}}{a_{0,j+1}} \right| < \text{tolerance}, \qquad \left| \frac{a_{1,j+1} - a_{1,j}}{a_{1,j+1}} \right| < \text{tolerance}$
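A compact sketch of the whole loop (an addition to the transcript; the function and parameter names, tolerance, and iteration cap are illustrative choices):

```python
import numpy as np

def gauss_newton(f, jac, x, y, a, tol=1e-6, max_iter=50):
    # f(x, a): model values; jac(x, a): N x p Jacobian w.r.t. the parameters a
    a = np.asarray(a, dtype=float)
    for _ in range(max_iter):
        J = jac(x, a)                            # partial derivatives at current a
        b = y - f(x, a)                          # residual vector
        dz = np.linalg.solve(J.T @ J, J.T @ b)   # normal equations for the step
        a = a + dz
        if np.all(np.abs(dz / a) < tol):         # approximate relative error test
            break
    return a
```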

55 Example
Fit the function $f(x, a_0, a_1) = a_0 (1 - e^{-a_1 x})$ to:
x: 0.25, 0.75, 1.25, 1.75, 2.25
y: 0.28, 0.57, 0.68, 0.74, 0.79

56 Example
We use $a_0 = 1.0$ and $a_1 = 1.0$ as initial guesses. The partial derivatives with respect to $a_0$ and $a_1$ are:
$\frac{\partial f}{\partial a_0} = 1 - e^{-a_1 x}, \qquad \frac{\partial f}{\partial a_1} = a_0 x e^{-a_1 x}$
Next we evaluate the entries of the matrix J:
$J \approx \begin{pmatrix} 0.22 & 0.19 \\ 0.52 & 0.35 \\ 0.71 & 0.35 \\ 0.82 & 0.30 \\ 0.89 & 0.23 \end{pmatrix}$

57 Example
Then we compute the entries of the vector b (using unrounded values of $f(x_i)$):
$b = \begin{pmatrix} y_1 - f(x_1) \\ y_2 - f(x_2) \\ \vdots \\ y_5 - f(x_5) \end{pmatrix} \approx \begin{pmatrix} 0.06 \\ 0.04 \\ -0.03 \\ -0.09 \\ -0.10 \end{pmatrix}$
Using the normal-system equation $z = (J^T J)^{-1} J^T b$:
$z = \begin{pmatrix} \Delta a_0 \\ \Delta a_1 \end{pmatrix} = \begin{pmatrix} -0.27 \\ 0.50 \end{pmatrix}$
So the new set of parameters $a_0$ and $a_1$ is:
$\begin{pmatrix} a_0 \\ a_1 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix} + \begin{pmatrix} -0.27 \\ 0.50 \end{pmatrix} = \begin{pmatrix} 0.73 \\ 1.5 \end{pmatrix}$
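The code below (an addition) reproduces this first step and then iterates to convergence, reusing the gauss_newton sketch given after slide 54:

```python
import numpy as np

def f(x, a):
    return a[0] * (1.0 - np.exp(-a[1] * x))

def jac(x, a):
    da0 = 1.0 - np.exp(-a[1] * x)            # df/da0
    da1 = a[0] * x * np.exp(-a[1] * x)       # df/da1
    return np.column_stack([da0, da1])

x = np.array([0.25, 0.75, 1.25, 1.75, 2.25])
y = np.array([0.28, 0.57, 0.68, 0.74, 0.79])

a = np.array([1.0, 1.0])
J, b = jac(x, a), y - f(x, a)
print(np.linalg.solve(J.T @ J, J.T @ b))     # first step: approx [-0.27, 0.50]
print(gauss_newton(f, jac, x, y, a))         # converged parameters
```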

58 Problems
1. It may converge slowly.
2. It may oscillate and continually change direction.
3. It may not converge.

59 More about the normal distribution
How can we measure the similarity of a given distribution to N(0,1)?
(Figure: the standard normal curve, $\mu = 0$, $\sigma = 1$, with its cumulative probabilities at $z = -3, \dots, 3$.)
For N(0,1): Median = Mean = Mode
Mean: $\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i = 0$
Std. deviation: $\sigma = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} (x_i - \mu)^2} = 1$
Skewness: $s_1 = \frac{1}{N} \sum_{i=1}^{N} \frac{(x_i - \mu)^3}{\sigma^3} = 0$
Kurtosis: $k = \frac{1}{N} \sum_{i=1}^{N} \frac{(x_i - \mu)^4}{\sigma^4} = 3$
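As a quick numerical companion (not from the slides), these four moments can be estimated from a sample and compared against 0, 1, 0, 3; scipy's skew and kurtosis are used, with fisher=False so a normal sample gives kurtosis near 3:

```python
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(42)
sample = rng.standard_normal(100_000)       # sample drawn from N(0, 1)

print(sample.mean())                        # close to 0
print(sample.std(ddof=1))                   # close to 1
print(skew(sample))                         # close to 0
print(kurtosis(sample, fisher=False))       # close to 3 (Pearson definition)
```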

