
1 Deep Learning

2 About the Course
CS6501: Vision and Language
Instructor: Vicente Ordonez
Website:
Location: Thornton Hall E316
Times: Tuesday - Thursday, 12:30PM - 1:45PM
Faculty office hours: Tuesdays 3 - 4pm (Rice 310)
Discuss in Piazza.

3 Today
A quick review of machine learning:
Linear Regression
Neural Networks
Backpropagation

4 Linear Regression
Prediction, Inference, Testing:
$a_j = \sum_i w_{ji} x_i + b_j$, or in matrix form $a = W^T x + b$
Training, Learning, Parameter estimation: minimize an objective over the dataset $D = \{(x^{(d)}, y^{(d)})\}$:
$L(W, b) = \sum_{d=1}^{|D|} \ell(a^{(d)}, y^{(d)})$, with $W^*, b^* = \arg\min_{W,b} L(W, b)$
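For concreteness, a minimal NumPy sketch of these two phases. The shapes (5 inputs, 2 outputs) are assumptions chosen to anticipate the movie example on the next slides, and squared error is used as the concrete choice of the per-example loss l:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(5, 2))    # weights, one column per output
b = np.zeros(2)                # biases

def predict(x):
    """Prediction / inference: a = W^T x + b."""
    return W.T @ x + b

def objective(D):
    """L(W, b) = sum over the dataset D of l(a^(d), y^(d)),
    here with l taken to be the squared error."""
    return sum(np.sum((predict(x) - y) ** 2) for x, y in D)
```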

5 Linear Regression
Example: Hollywood movie data. The variables are: production costs, promotional costs, genre of the movie, box office first week, total book sales, total revenue USA, and total revenue international.
Each movie d contributes one row $x_1^{(d)}, x_2^{(d)}, \ldots, x_7^{(d)}$, for $d = 1, \ldots, 5$; on this slide all seven columns are still shown together as variables x.

6 Linear Regression
Example: Hollywood movie data, now split into inputs and outputs.
Input variables x: production costs, promotional costs, genre of the movie, box office first week, total book sales, giving $x_1^{(d)}, \ldots, x_5^{(d)}$.
Output variables y: total revenue USA and total revenue international, giving $y_1^{(d)}, y_2^{(d)}$.
Each movie $d = 1, \ldots, 5$ contributes one row $(x_1^{(d)}, \ldots, x_5^{(d)}, y_1^{(d)}, y_2^{(d)})$.

7 Linear Regression
Example: Hollywood movie data, now also split by rows.
Training data: movies $d = 1, \ldots, 4$, each with inputs $x_1^{(d)}, \ldots, x_5^{(d)}$ and outputs $y_1^{(d)}, y_2^{(d)}$.
Test data: movie $d = 5$, with inputs $x_1^{(5)}, \ldots, x_5^{(5)}$ and outputs $y_1^{(5)}, y_2^{(5)}$.

8 Linear Regression – Least Squares
$a_j = \sum_i w_{ji} x_i + b_j$, or in matrix form $a = W^T x + b$
Training, Learning, Parameter estimation: minimize an objective over $D = \{(x^{(d)}, y^{(d)})\}$:
$L(W, b) = \sum_{d=1}^{|D|} \ell(a^{(d)}, y^{(d)})$, with $W^*, b^* = \arg\min_{W,b} L(W, b)$

9 Linear Regression – Least Squares
$a_j = \sum_i w_{ji} x_i + b_j$, or in matrix form $a = W^T x + b$
Training: the generic loss $\ell$ now becomes the squared error, over $D = \{(x^{(d)}, y^{(d)})\}$:
$L(W, b) = \sum_{d=1}^{|D|} \left( a^{(d)} - y^{(d)} \right)^2$, with $W^*, b^* = \arg\min_{W,b} L(W, b)$

10 Linear Regression – Least Squares
Substituting $a_j^{(d)} = \sum_i w_{ji} x_i^{(d)} + b_j$ into $L(W, b) = \sum_{d=1}^{|D|} \left( a^{(d)} - y^{(d)} \right)^2$ gives, for each output j:
$L_j(W, b) = \sum_{d=1}^{|D|} \left( \sum_i w_{ji} x_i^{(d)} + b_j - y_j^{(d)} \right)^2$

11 Linear Regression – Least Squares
To solve $W^*, b^* = \arg\min_{W,b} L(W, b)$, differentiate $L_j$ with respect to each weight $w_{uv}$:
$\frac{d L_j}{d w_{uv}}(W, b) = \frac{d}{d w_{uv}} \sum_{d=1}^{|D|} \left( \sum_i w_{ji} x_i^{(d)} + b_j - y_j^{(d)} \right)^2$

12 Linear Regression – Least Squares
The derivative moves inside the sum; setting it to zero yields a closed-form solution:
$\frac{d L_j}{d w_{uv}}(W, b) = \sum_{d=1}^{|D|} \frac{d}{d w_{uv}} \left( \sum_i w_{ji} x_i^{(d)} + b_j - y_j^{(d)} \right)^2 = 0$
$W = (X^T X)^{-1} X^T Y$
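A minimal NumPy sketch of this closed form. It assumes X is the |D| x n design matrix (with a ones column appended to absorb the bias b) and Y the |D| x m target matrix; np.linalg.lstsq solves the same system without forming the inverse explicitly, which is numerically safer:

```python
import numpy as np

def least_squares_fit(X, Y):
    """Closed-form least squares, W = (X^T X)^{-1} X^T Y."""
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W

# Toy usage with made-up data: 10 movies, 5 features plus a ones
# column for the bias, and 2 revenue targets.
X = np.hstack([np.random.rand(10, 5), np.ones((10, 1))])
Y = np.random.rand(10, 2)
W = least_squares_fit(X, Y)    # shape (6, 2)
```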

13 Neural Network with One Layer
$a_j = \mathrm{sigmoid}\left( \sum_i w_{ji} x_i + b_j \right)$, with weight matrix $W = [w_{ji}]$.
[Diagram: five inputs $x_1, \ldots, x_5$ fully connected to two outputs $a_1, a_2$.]
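As a sketch in NumPy, with the 5-input, 2-output shapes taken from the diagram:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def one_layer_forward(x, W, b):
    """a_j = sigmoid(sum_i w_ji x_i + b_j); W is stored as
    (outputs x inputs), so W[j, i] = w_ji and the sum is W @ x."""
    return sigmoid(W @ x + b)

x = np.random.rand(5)
W = np.random.randn(2, 5)
b = np.zeros(2)
a = one_layer_forward(x, W, b)   # shape (2,)
```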

14 Neural Network with One Layer
$a_j^{(d)} = \mathrm{sigmoid}\left( \sum_i w_{ji} x_i^{(d)} + b_j \right)$
Substituting into $L(W, b) = \sum_{d=1}^{|D|} \left( a^{(d)} - y^{(d)} \right)^2$:
$L_j(W, b) = \sum_{d=1}^{|D|} \left( \mathrm{sigmoid}\left( \sum_i w_{ji} x_i^{(d)} + b_j \right) - y_j^{(d)} \right)^2$

15 Neural Network with One Layer
$L_j(W, b) = \sum_{d=1}^{|D|} \left( \mathrm{sigmoid}\left( \sum_i w_{ji} x_i^{(d)} + b_j \right) - y_j^{(d)} \right)^2$
$\frac{d L_j}{d w_{uv}} = \frac{d}{d w_{uv}} \sum_{d=1}^{|D|} \left( \mathrm{sigmoid}\left( \sum_i w_{ji} x_i^{(d)} + b_j \right) - y_j^{(d)} \right)^2 = 0$
We can compute this derivative, but there is no closed-form solution for W when dL/dw = 0.

16 Gradient Descent
1. Start with a random value of w (e.g. w = 12).
2. Compute the gradient (derivative) of L(w) at the point w = 12 (e.g. dL/dw = 6).
3. Recompute w as: w = w - lambda * (dL/dw), and repeat from step 2.
[Figure: the loss curve L(w), with the descent step taken downhill from w = 12.]
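A minimal Python sketch of the three steps, using the toy loss L(w) = (w - 9)^2, chosen (as an assumption) so that dL/dw = 2(w - 9) equals 6 at w = 12, matching the slide's example numbers:

```python
w = 12.0        # step 1: initial value of w
lam = 0.1       # learning rate lambda (value assumed)
for _ in range(100):
    grad = 2 * (w - 9)      # step 2: gradient of L at the current w
    w = w - lam * grad      # step 3: update w
print(w)        # close to the minimizer w* = 9
```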

17 Gradient Descent (full batch: expensive)
λ = 0.01
Initialize w and b randomly
for e = 0, num_epochs do
  Compute: dL(w,b)/dw and dL(w,b)/db
  Update w: w = w − λ dL(w,b)/dw
  Update b: b = b − λ dL(w,b)/db
  Print: L(w,b) // useful to see if this is becoming smaller or not
end
Here $L(w, b) = \sum_{i=1}^{n} \ell(w, b)$ sums over the entire dataset, which makes each iteration expensive.

18 Stochastic Gradient Descent
λ = 0.01
Initialize w and b randomly
for e = 0, num_epochs do
  Compute: dL_B(w,b)/dw and dL_B(w,b)/db
  Update w: w = w − λ dL_B(w,b)/dw
  Update b: b = b − λ dL_B(w,b)/db
  Print: L_B(w,b) // useful to see if this is becoming smaller or not
end
Here $L_B(w, b) = \sum_{i=1}^{B} \ell(w, b)$ sums over a small batch of size B, so each iteration is cheap.
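A runnable NumPy sketch of this loop for the least-squares model; the batch construction, shapes, and gradient averaging are assumptions, since the slide leaves them implicit:

```python
import numpy as np

def sgd(X, Y, lam=0.01, num_epochs=100, batch_size=32):
    """Minibatch SGD: each update uses the gradient of L_B over a
    small batch B rather than over the full dataset."""
    n_samples, n_features = X.shape
    w = np.random.randn(n_features)
    b = 0.0
    for e in range(num_epochs):
        order = np.random.permutation(n_samples)
        for start in range(0, n_samples, batch_size):
            B = order[start:start + batch_size]
            err = X[B] @ w + b - Y[B]            # residuals on the batch
            dw = 2.0 * X[B].T @ err / len(B)     # dL_B/dw, averaged for stability
            db = 2.0 * err.mean()                # dL_B/db
            w -= lam * dw
            b -= lam * db
        print(e, np.sum((X @ w + b - Y) ** 2))   # is L becoming smaller?
    return w, b
```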

19 Deep Learning Lab
$a_j = \mathrm{sigmoid}\left( \sum_i w_{ji} x_i + b_j \right)$

20 Two Layer Neural Network
[Diagram: inputs $x_1, \ldots, x_4$ feed a hidden layer $a_1, \ldots, a_4$, which produces the prediction $\hat{y}_1$, compared against the label $y_1$.]

21 Forward Pass
$z_j = \sum_{i=0}^{n} w_{1ij} x_i + b_1$
$a_j = \mathrm{Sigmoid}(z_j)$
$p_1 = \sum_{i=0}^{n} w_{2i} a_i + b_2$
$\hat{y}_1 = \mathrm{Sigmoid}(p_1)$
$\mathrm{Loss} = L(\hat{y}_1, y_1)$
[Diagram: the same two-layer network, annotated with $z$, $a$, $p_1$, and $\hat{y}_1$.]
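A minimal NumPy sketch of this forward pass, with shapes taken from the diagram (4 inputs, 4 hidden units, 1 output) and squared error assumed for L, which the slide leaves generic:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def forward(x, W1, b1, W2, b2, y1):
    """Forward pass matching the slide's equations."""
    z = W1 @ x + b1            # z_j = sum_i w_1ij x_i + b_1
    a = sigmoid(z)             # hidden activations a_1..a_4
    p1 = W2 @ a + b2           # p_1 = sum_i w_2i a_i + b_2
    y_hat = sigmoid(p1)        # prediction y_hat_1
    loss = (y_hat - y1) ** 2   # Loss = L(y_hat_1, y_1), squared error assumed
    return z, a, p1, y_hat, loss

x = np.random.rand(4)
W1, b1 = np.random.randn(4, 4), np.zeros(4)
W2, b2 = np.random.randn(4), 0.0
z, a, p1, y_hat, loss = forward(x, W1, b1, W2, b2, y1=1.0)
```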

22 Backward Pass - Backpropagation
GradInputs (gradients with respect to layer inputs):
$\frac{\partial L}{\partial \hat{y}_1} = \frac{\partial}{\partial \hat{y}_1} L(\hat{y}_1, y_1)$
$\frac{\partial L}{\partial p_1} = \left( \frac{\partial}{\partial p_1} \mathrm{Sigmoid}(p_1) \right) \frac{\partial L}{\partial \hat{y}_1}$
$\frac{\partial L}{\partial a_k} = \left( \frac{\partial}{\partial a_k} \sum_{i=0}^{n} w_{2i} a_i + b_2 \right) \frac{\partial L}{\partial p_1}$
$\frac{\partial L}{\partial z_j} = \left( \frac{\partial}{\partial z_j} \mathrm{Sigmoid}(z_j) \right) \frac{\partial L}{\partial a_j}$
$\frac{\partial L}{\partial x_k} = \left( \frac{\partial}{\partial x_k} \sum_{i=0}^{n} w_{1ij} x_i + b_1 \right) \frac{\partial L}{\partial z_j}$
GradParams (gradients with respect to parameters):
$\frac{\partial L}{\partial w_{1ij}} = \frac{\partial z_j}{\partial w_{1ij}} \frac{\partial L}{\partial z_j}$
$\frac{\partial L}{\partial w_{2i}} = \frac{\partial p_1}{\partial w_{2i}} \frac{\partial L}{\partial p_1}$
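Continuing the sketch from the forward pass, each line below applies one chain-rule factor from this slide, again with the squared-error loss assumed:

```python
import numpy as np

def backward(x, W1, W2, a, y_hat, y1):
    """Backward pass for the forward() sketch above."""
    dL_dyhat = 2.0 * (y_hat - y1)               # dL/dy_hat for L = (y_hat - y1)^2
    dL_dp1 = y_hat * (1.0 - y_hat) * dL_dyhat   # through Sigmoid(p1)
    dL_dW2 = a * dL_dp1                         # GradParams, layer 2
    dL_db2 = dL_dp1
    dL_da = W2 * dL_dp1                         # GradInputs into layer 2
    dL_dz = a * (1.0 - a) * dL_da               # through Sigmoid(z)
    dL_dW1 = np.outer(dL_dz, x)                 # GradParams, layer 1
    dL_db1 = dL_dz
    dL_dx = W1.T @ dL_dz                        # GradInputs (dL/dx_k)
    return dL_dW1, dL_db1, dL_dW2, dL_db2, dL_dx
```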

23 Layer-wise implementation

24 Layer-wise implementation
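The original slides showed framework code for this idea. As a hypothetical Python sketch of the same layer-wise style (names and structure are illustrative, not the original code), each layer exposes a forward() and a backward(), so a network is just a list of layers chained together:

```python
import numpy as np

class SigmoidLinearLayer:
    """One module: a linear map followed by a sigmoid."""
    def __init__(self, n_in, n_out):
        self.W = np.random.randn(n_out, n_in) * 0.1
        self.b = np.zeros(n_out)

    def forward(self, x):
        self.x = x                              # cache input for backward
        self.a = 1.0 / (1.0 + np.exp(-(self.W @ x + self.b)))
        return self.a

    def backward(self, grad_output):
        # grad_output is dL/da; go back through the sigmoid, then the linear map.
        dz = self.a * (1.0 - self.a) * grad_output
        self.dW = np.outer(dz, self.x)          # GradParams
        self.db = dz
        return self.W.T @ dz                    # GradInputs, for the previous layer
```

Stacking such modules reproduces the forward/backward chaining of the previous two slides.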

25 Automatic Differentiation
You only need to write code for the forward pass; the backward pass is computed automatically.
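For instance, with PyTorch's autograd, one framework that works this way (the slide does not name a specific library):

```python
import torch

x = torch.randn(5)
y = torch.tensor([1.0, 0.0])
W = torch.randn(2, 5, requires_grad=True)
b = torch.zeros(2, requires_grad=True)

y_hat = torch.sigmoid(W @ x + b)   # the only code we wrote: the forward pass
loss = ((y_hat - y) ** 2).sum()
loss.backward()                    # backward pass computed automatically
print(W.grad, b.grad)              # dL/dW and dL/db
```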

26 Questions?

