Slide 1: Deep Learning

Slide 2: About the Course
CS6501: Vision and Language
Instructor: Vicente Ordonez
Website:
Location: Thornton Hall E316
Times: Tuesday - Thursday 12:30PM - 1:45PM
Faculty office hours: Tuesdays 3 - 4pm (Rice 310)
Discuss in Piazza:

Slide 3: Today
A quick review of machine learning: linear regression, neural networks, and backpropagation.

Slide 4: Linear Regression
Prediction, Inference, Testing:
  \hat{y}_i = \sum_j w_{ij} x_j + b_i,  i.e.  \hat{y} = W x + b
Training, Learning, Parameter estimation (objective minimization):
  L(W, b) = \sum_{i=1}^{|D|} \ell(\hat{y}^{(i)}, y^{(i)}),  where  D = \{(x^{(i)}, y^{(i)})\}
  W^*, b^* = \arg\min_{W,b} L(W, b)

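A minimal NumPy sketch of the prediction rule and the training objective above; the function and array names are illustrative, not from the slides:

    import numpy as np

    def predict(W, b, x):
        # Prediction: y_hat = W x + b
        return W @ x + b

    def total_loss(W, b, X, Y, per_example_loss):
        # L(W, b): sum of the per-example loss l(y_hat, y) over the dataset D.
        return sum(per_example_loss(predict(W, b, x), y) for x, y in zip(X, Y))
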
Slide 5: Linear Regression Example: Hollywood movie data
Variables: production costs, promotional costs, genre of the movie, box office first week, total book sales, total revenue USA, total revenue international.
[Table: five movies, one per row, with all seven variables as columns x_1^{(i)}, ..., x_7^{(i)}.]

Slide 6: Linear Regression Example: Hollywood movie data
Input variables x: production costs, promotional costs, genre of the movie, box office first week, total book sales.
Output variables y: total revenue USA, total revenue international.
[Table: the same five movies, now with columns split into inputs x_1^{(i)}, ..., x_5^{(i)} and outputs y_1^{(i)}, y_2^{(i)}.]

Slide 7: Linear Regression Example: Hollywood movie data
[Table: the same split, with rows partitioned: movies 1-4 are the training data and movie 5 is held out as test data.]

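As a sketch of this layout in NumPy (the 5 x 7 array and the 4/1 row split mirror the table; the values are random placeholders):

    import numpy as np

    # Hypothetical 5 movies x 7 variables; columns 0-4 are the inputs,
    # columns 5-6 are the two revenue outputs, as in the table above.
    data = np.random.rand(5, 7)

    X, Y = data[:, :5], data[:, 5:]     # input variables x, output variables y
    X_train, Y_train = X[:4], Y[:4]     # movies 1-4: training data
    X_test,  Y_test  = X[4:], Y[4:]     # movie 5: test data
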
Slide 8: Linear Regression – Least Squares
Training, Learning, Parameter estimation (objective minimization):
  \hat{y}_i = \sum_j w_{ij} x_j + b_i,  i.e.  \hat{y} = W x + b
  L(W, b) = \sum_{i=1}^{|D|} \ell(\hat{y}^{(i)}, y^{(i)}),  where  D = \{(x^{(i)}, y^{(i)})\}
  W^*, b^* = \arg\min_{W,b} L(W, b)

Slide 9: Linear Regression – Least Squares
For least squares, the per-example loss is the squared error:
  L(W, b) = \sum_{i=1}^{|D|} \| \hat{y}^{(i)} - y^{(i)} \|^2,  where  D = \{(x^{(i)}, y^{(i)})\}
  W^*, b^* = \arg\min_{W,b} L(W, b)

Slide 10: Linear Regression – Least Squares
Substituting the prediction \hat{y}_i^{(d)} = \sum_j w_{ij} x_j^{(d)} + b_i into the loss:
  L(W, b) = \sum_{d=1}^{|D|} \| \hat{y}^{(d)} - y^{(d)} \|^2
  L(W, b) = \sum_{d=1}^{|D|} \sum_i ( \sum_j w_{ij} x_j^{(d)} + b_i - y_i^{(d)} )^2

Slide 11: Linear Regression – Least Squares
To minimize, take the derivative with respect to each weight w_{uv}:
  \frac{\partial L}{\partial w_{uv}}(W, b) = \frac{\partial}{\partial w_{uv}} ( \sum_{d=1}^{|D|} \sum_i ( \sum_j w_{ij} x_j^{(d)} + b_i - y_i^{(d)} )^2 )
  W^*, b^* = \arg\min L(W, b)

Slide 12: Linear Regression – Least Squares
The derivative passes inside the sum; setting it to zero yields a closed-form solution:
  \frac{\partial L}{\partial w_{uv}}(W, b) = \sum_{d=1}^{|D|} \frac{\partial}{\partial w_{uv}} ( \sum_j w_{ij} x_j^{(d)} + b_i - y_i^{(d)} )^2 = 0
  ...  W = (X^T X)^{-1} X^T Y

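A NumPy sketch of the closed-form solution, folding the bias b into W by appending a constant column to X (data and shapes are illustrative; np.linalg.lstsq solves the same normal equations without forming the explicit inverse):

    import numpy as np

    # Hypothetical data: 100 movies, 5 input variables, 2 outputs.
    X = np.random.rand(100, 5)
    Y = np.random.rand(100, 2)

    # Fold the bias b into W by appending a constant-1 column to X.
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])

    # Solves the normal equations W = (X^T X)^{-1} X^T Y; lstsq is the
    # numerically safer route than inverting X^T X explicitly.
    W, *_ = np.linalg.lstsq(Xb, Y, rcond=None)
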
Slide 13: Neural Network with One Layer
  W = [w_{ij}]
[Diagram: inputs x_1, ..., x_5 fully connected to outputs a_1, a_2.]
  a_i = sigmoid( \sum_j w_{ij} x_j + b_i )

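A sketch of this single layer in NumPy, with the 5-input, 2-output shape from the diagram (weight and input values are random placeholders):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    W = np.random.randn(2, 5)   # W = [w_ij], 2 outputs x 5 inputs
    b = np.random.randn(2)
    x = np.random.randn(5)

    a = sigmoid(W @ x + b)      # a_i = sigmoid(sum_j w_ij x_j + b_i)
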
Slide 14: Neural Network with One Layer
With the sigmoid prediction \hat{y}_i^{(d)} = sigmoid( \sum_j w_{ij} x_j^{(d)} + b_i ), the least-squares loss becomes:
  L(W, b) = \sum_{d=1}^{|D|} \| \hat{y}^{(d)} - y^{(d)} \|^2
  L(W, b) = \sum_{d=1}^{|D|} \sum_i ( sigmoid( \sum_j w_{ij} x_j^{(d)} + b_i ) - y_i^{(d)} )^2

Slide 15: Neural Network with One Layer
  L(W, b) = \sum_{d=1}^{|D|} \sum_i ( sigmoid( \sum_j w_{ij} x_j^{(d)} + b_i ) - y_i^{(d)} )^2
  \frac{\partial L}{\partial w_{uv}} = \frac{\partial}{\partial w_{uv}} \sum_{d=1}^{|D|} \sum_i ( sigmoid( \sum_j w_{ij} x_j^{(d)} + b_i ) - y_i^{(d)} )^2 = 0
We can compute this derivative, but there is no closed-form solution for W when dL/dw = 0.

Slide 16: Gradient Descent
1. Start with a random value of w (e.g. w = 12).
2. Compute the gradient (derivative) of L(w) at the point w = 12 (e.g. dL/dw = 6).
3. Recompute w as: w = w - lambda * (dL/dw).
[Plot: L(w) versus w, illustrating the descent step from w = 12.]

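A 1-D sketch of the three steps above, using an illustrative loss L(w) = (w - 3)^2 that is not from the slides:

    # Gradient descent in one dimension.
    lam = 0.01                  # step size (lambda)
    w = 12.0                    # 1. start with a random value of w
    for _ in range(1000):
        grad = 2 * (w - 3)      # 2. gradient dL/dw of L(w) = (w - 3)^2
        w = w - lam * grad      # 3. recompute w
    print(w)                    # converges toward the minimizer w = 3
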
Slide 17: Gradient Descent

  lambda = 0.01
  Initialize w and b randomly
  for e = 0, num_epochs do
      Compute: dL(w,b)/dw and dL(w,b)/db
      Update w: w = w - lambda * dL(w,b)/dw
      Update b: b = b - lambda * dL(w,b)/db
      Print: L(w,b)   // Useful to see if this is becoming smaller or not.
  end

Since L(w,b) = \sum_{i=1}^{n} \ell(w,b) sums over the entire training set, every iteration is expensive.

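A runnable NumPy version of this loop for the least-squares model from the earlier slides; the data are random placeholders, and the step size is taken smaller than the slide's 0.01 to keep this particular random example stable:

    import numpy as np

    lam, num_epochs = 0.001, 100
    X = np.random.rand(100, 5)          # illustrative inputs
    Y = np.random.rand(100, 2)          # illustrative targets

    W = np.random.randn(2, 5)           # initialize w and b randomly
    b = np.random.randn(2)

    for e in range(num_epochs):
        Yhat = X @ W.T + b              # predictions over the *entire* dataset
        dW = 2 * (Yhat - Y).T @ X       # dL/dW for the squared-error loss
        db = 2 * (Yhat - Y).sum(axis=0) # dL/db
        W = W - lam * dW                # update w
        b = b - lam * db                # update b
        print(((Yhat - Y) ** 2).sum())  # L(w, b): should shrink across epochs
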
Slide 18: Stochastic Gradient Descent

  lambda = 0.01
  Initialize w and b randomly
  for e = 0, num_epochs do
      Compute: dL_B(w,b)/dw and dL_B(w,b)/db on a mini-batch B
      Update w: w = w - lambda * dL_B(w,b)/dw
      Update b: b = b - lambda * dL_B(w,b)/db
      Print: L_B(w,b)   // Useful to see if this is becoming smaller or not.
  end

Here L_B(w,b) = \sum_{i=1}^{|B|} \ell(w,b) sums over a mini-batch B only, so each iteration is cheap.

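The same model as in the previous sketch (it reuses X, Y, W, b, lam, num_epochs from there), but each step now uses a random mini-batch instead of the full dataset; the batch size 16 is an illustrative choice:

    batch_size = 16
    for e in range(num_epochs):
        idx = np.random.choice(len(X), batch_size, replace=False)
        Xb, Yb = X[idx], Y[idx]             # sample a mini-batch B
        Yhat = Xb @ W.T + b
        dW = 2 * (Yhat - Yb).T @ Xb         # dL_B/dW
        db = 2 * (Yhat - Yb).sum(axis=0)    # dL_B/db
        W = W - lam * dW
        b = b - lam * db
        print(((Yhat - Yb) ** 2).sum())     # L_B(w, b) on the current batch
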
Slide 19: Deep Learning Lab
  a_i = sigmoid( \sum_j w_{ij} x_j + b_i )

Slide 20: Two Layer Neural Network
[Diagram: inputs x_1, ..., x_4 feed a hidden layer a_1, ..., a_4, which produces an output \hat{y}_1 that is compared against the target y_1.]

Slide 21: Forward Pass
[Diagram: the same two-layer network, annotated with the quantities below.]
  z_i = \sum_{j=0}^{n} w_{1ij} x_j + b_1
  a_i = sigmoid(z_i)
  p_1 = \sum_{j=0}^{n} w_{2j} a_j + b_2
  \hat{y}_1 = sigmoid(p_1)
  Loss = L(\hat{y}_1, y_1)

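A NumPy sketch of this forward pass, with shapes matching the diagram (4 inputs, 4 hidden units, 1 output; all values are placeholders, and the squared error stands in for L):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    W1, b1 = np.random.randn(4, 4), np.random.randn(4)
    W2, b2 = np.random.randn(4), np.random.randn()
    x, y1 = np.random.randn(4), 1.0

    z = W1 @ x + b1             # z_i = sum_j w_1ij x_j + b_1
    a = sigmoid(z)              # a_i = sigmoid(z_i)
    p1 = W2 @ a + b2            # p_1 = sum_j w_2j a_j + b_2
    yhat1 = sigmoid(p1)         # y_hat_1 = sigmoid(p_1)
    loss = (yhat1 - y1) ** 2    # Loss = L(y_hat_1, y_1)
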
Slide 22: Backward Pass - Backpropagation
Gradients flow backwards from the loss: derivatives with respect to a layer's inputs (GradInputs) are passed to the previous layer, while derivatives with respect to its parameters (GradParams) are used for the updates.
  \frac{\partial L}{\partial \hat{y}_1} = \frac{\partial}{\partial \hat{y}_1} L(\hat{y}_1, y_1)
  \frac{\partial L}{\partial p_1} = ( \frac{\partial}{\partial p_1} sigmoid(p_1) ) \frac{\partial L}{\partial \hat{y}_1}
  \frac{\partial L}{\partial a_j} = ( \frac{\partial}{\partial a_j} [ \sum_j w_{2j} a_j + b_2 ] ) \frac{\partial L}{\partial p_1}        (GradInputs)
  \frac{\partial L}{\partial w_{2j}} = ( \frac{\partial}{\partial w_{2j}} [ \sum_j w_{2j} a_j + b_2 ] ) \frac{\partial L}{\partial p_1}   (GradParams)
  \frac{\partial L}{\partial z_i} = ( \frac{\partial}{\partial z_i} sigmoid(z_i) ) \frac{\partial L}{\partial a_i}
  \frac{\partial L}{\partial x_j} = ( \frac{\partial}{\partial x_j} [ \sum_j w_{1ij} x_j + b_1 ] ) \frac{\partial L}{\partial z_i}        (GradInputs)
  \frac{\partial L}{\partial w_{1ij}} = ( \frac{\partial}{\partial w_{1ij}} [ \sum_j w_{1ij} x_j + b_1 ] ) \frac{\partial L}{\partial z_i}  (GradParams)

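The same chain rule written out in NumPy, continuing the variables from the forward-pass sketch above; sigmoid'(u) = sigmoid(u) * (1 - sigmoid(u)) is used throughout:

    dL_dyhat1 = 2 * (yhat1 - y1)              # dL/dy_hat_1 for the squared error
    dL_dp1 = yhat1 * (1 - yhat1) * dL_dyhat1  # through sigmoid at the output
    dL_dW2 = a * dL_dp1                       # GradParams, layer 2
    dL_db2 = dL_dp1
    dL_da = W2 * dL_dp1                       # GradInputs, layer 2
    dL_dz = a * (1 - a) * dL_da               # through sigmoid at the hidden layer
    dL_dW1 = np.outer(dL_dz, x)               # GradParams, layer 1
    dL_db1 = dL_dz
    dL_dx = W1.T @ dL_dz                      # GradInputs, layer 1
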
Slide 23: Layer-wise implementation

Slide 24: Layer-wise implementation

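The code figures on these two slides are not reproduced in this transcript; below is a minimal sketch of what a layer-wise implementation typically looks like, assuming each layer exposes a forward and a backward method (class and method names are illustrative, not the course's actual code):

    import numpy as np

    class Linear:
        def __init__(self, n_in, n_out):
            self.W = np.random.randn(n_out, n_in)
            self.b = np.random.randn(n_out)

        def forward(self, x):
            self.x = x                            # cache the input for backward
            return self.W @ x + self.b

        def backward(self, grad_out):
            self.dW = np.outer(grad_out, self.x)  # GradParams
            self.db = grad_out
            return self.W.T @ grad_out            # GradInputs for previous layer

    class Sigmoid:
        def forward(self, z):
            self.a = 1.0 / (1.0 + np.exp(-z))     # cache the activation
            return self.a

        def backward(self, grad_out):
            return self.a * (1 - self.a) * grad_out

    # Layers chain: forward runs left to right, backward right to left.
    layers = [Linear(4, 4), Sigmoid(), Linear(4, 1), Sigmoid()]
    out = np.random.randn(4)
    for layer in layers:
        out = layer.forward(out)
    grad = 2 * (out - 1.0)                        # dL/dy_hat for squared error
    for layer in reversed(layers):
        grad = layer.backward(grad)
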
Slide 25: Automatic Differentiation
You only need to write code for the forward pass; the backward pass is computed automatically.

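For instance, with PyTorch (one framework that provides this; an assumed choice, since the slide does not name one), a single backward() call produces all the gradients derived by hand above:

    import torch

    # The same two-layer network, but only the forward pass is written out.
    W1 = torch.randn(4, 4, requires_grad=True)
    b1 = torch.randn(4, requires_grad=True)
    W2 = torch.randn(4, requires_grad=True)
    b2 = torch.randn(1, requires_grad=True)
    x, y1 = torch.randn(4), torch.tensor(1.0)

    a = torch.sigmoid(W1 @ x + b1)
    yhat1 = torch.sigmoid(W2 @ a + b2)
    loss = (yhat1 - y1) ** 2

    loss.backward()     # the backward pass is derived and run automatically
    print(W1.grad)      # dL/dW1, with no hand-written backpropagation code
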
Slide 26: Questions?