
1 Logistic Regression Geoff Hulten

2 Overview of Logistic Regression
A linear model for classification and probability estimation. Can be very effective when:
- The problem is linearly separable
- Or there are a lot of relevant features (tens to hundreds of thousands can work)
- You need something simple and efficient as a baseline
- You need efficient runtime
Logistic regression will generally not be the most accurate option.

3 Components of Learning Algorithm: Logistic Regression
Model Structure – Linear model with sigmoid activation
Loss Function – Log Loss
Optimization Method – Gradient Descent

4 Structure of Logistic Regression
One weight per feature plus a bias weight.
Linear model: w0 + Σ_{i=1..n} wi * xi
Prediction: ŷ = sigmoid(w0 + Σ_{i=1..n} wi * xi)
Sigmoid: sigmoid(z) = 1 / (1 + e^(-z))
Raising the classification threshold (e.g. .9 instead of .5) shrinks the region where the model predicts 1.
Example weights from the slide's figure: w0 = .25, w1 = -1, w2 = 1.
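A minimal sketch of this structure in Python (function names and the feature values in the example call are illustrative, not from the slides; the weights are the example weights above):

```python
import math

def sigmoid(z):
    # Squash any real-valued score into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(weights, x):
    # weights[0] is the bias weight w0; weights[1:] pair with the features in x
    score = weights[0] + sum(w_i * x_i for w_i, x_i in zip(weights[1:], x))
    return sigmoid(score)

def predict(weights, x, threshold=0.5):
    # Apply the classification threshold to the estimated probability
    return 1 if predict_probability(weights, x) >= threshold else 0

# Example weights from the slide: w0 = .25, w1 = -1, w2 = 1
# (the feature values are made up for illustration)
print(predict_probability([0.25, -1.0, 1.0], [1.0, 1.0]))  # ~0.562
print(predict([0.25, -1.0, 1.0], [1.0, 1.0]))              # 1 at threshold .5
```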

5 Intuition about additional dimensions
3 dimensions: the decision surface is a plane.
N dimensions: the decision surface is an n-dimensional hyper-plane.
High dimensions are weird: high-dimensional hyper-planes can represent quite a lot.
(Figure: the decision plane for the example weights w0 = .25, w1 = -1, w2 = 1, with predict-1 and predict-0 regions shown at higher and lower thresholds.)

6 Loss Function: Log Loss
ŷ – the predicted y (pre-threshold)
Log Loss:
- If y is 1: -log(ŷ)
- If y is 0: -log(1 - ŷ)
Use natural log (base e). Examples:
  ŷ     y    Loss
  .9    1    .105
  .5    1    .693
  .1    1    2.3
  .95   0    2.99
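A quick sketch for checking the example losses above (assuming the labels shown in the table; names are illustrative):

```python
import math

def log_loss(y_hat, y):
    # -log(y_hat) when the label is 1, -log(1 - y_hat) when the label is 0
    return -math.log(y_hat) if y == 1 else -math.log(1.0 - y_hat)

print(round(log_loss(0.90, 1), 3))  # 0.105
print(round(log_loss(0.50, 1), 3))  # 0.693
print(round(log_loss(0.10, 1), 3))  # 2.303
print(round(log_loss(0.95, 0), 3))  # 2.996
```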

7 Logistic Regression Loss Function Summary
Log Loss:
Loss(ŷ, y) = -log(ŷ) if y = 1
Loss(ŷ, y) = -log(1 - ŷ) if y = 0
Same thing expressed in sneaky math:
Loss(ŷ, y) = -y * log(ŷ) - (1 - y) * log(1 - ŷ)
Average across the data set:
Loss(dataSet) = (1/n) * Σ_{j=1..n} Loss(ŷ_j, y_j)
ŷ is pre-thresholding. Use natural log (base e).
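The combined ("sneaky math") form and the data set average written as a small Python sketch (function names and the example values are illustrative):

```python
import math

def log_loss(y_hat, y):
    # Combined form: the y and (1 - y) factors select the right branch
    return -y * math.log(y_hat) - (1 - y) * math.log(1.0 - y_hat)

def dataset_loss(y_hats, ys):
    # Average log loss across the data set
    return sum(log_loss(y_hat, y) for y_hat, y in zip(y_hats, ys)) / len(ys)

print(round(dataset_loss([0.9, 0.5, 0.95], [1, 1, 0]), 3))  # ~1.265
```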

8 Logistic Regression Optimization: Gradient Descent
(Figure: the 'initial' model with w0 = .25, w1 = -1 and the updated model with w0 = .1, w1 = -1.6, showing the predict-1 / predict-0 boundary shifting to better fit the training set.)

9 Finding the Gradient
The gradient is the derivative of the loss function with respect to the model parameters θ (all the w's): one partial derivative per weight.
Gradient for w_i for training sample x (calculus you don't need to remember):
∂Loss(x)/∂w_i = (ŷ - y) * x_i
Average across the training data set:
∂Loss(TrainSet)/∂w_i = (1/n) * Σ_{j=0..n-1} (ŷ_j - y_j) * x_ji
Compute simultaneously for all w_i with one pass over the data.
Update each weight by stepping away from the gradient:
w_i = w_i - α * ∂Loss(TrainSet)/∂w_i
Note: x_0 = 1.0 for all samples, so w_0 is the bias weight.
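A sketch of computing the gradient for all weights in one pass over the data (assuming, as the slide notes, that each sample includes the bias feature x[0] = 1.0; names are illustrative):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradients(weights, xs, ys):
    # One partial derivative per weight, computed in a single pass over the training set.
    # Each sample x includes the bias feature x[0] = 1.0.
    n = len(xs)
    grads = [0.0] * len(weights)
    for x, y in zip(xs, ys):
        y_hat = sigmoid(sum(w_i * x_i for w_i, x_i in zip(weights, x)))
        for i in range(len(weights)):
            grads[i] += (y_hat - y) * x[i]   # (y_hat - y) * x_i for this sample
    return [g / n for g in grads]            # average across the training set
```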

10 Logistic Regression Optimization Algorithm
Initialize model weights to 0.
Do numIterations steps of gradient descent (thousands of steps):
- Find the gradient for each weight by averaging across the training set
- Update each weight by taking a step of size α opposite the gradient
Parameters:
- α – size of the step to take in each iteration
- numIterations – number of iterations of gradient descent to perform (or use a convergence criterion)
- Threshold – value between 0 and 1 to convert ŷ into a classification
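Putting the pieces together, a minimal end-to-end sketch of the training loop (the fit function, parameter defaults, and the tiny OR-style data set are illustrative assumptions, not from the slides):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit(xs, ys, alpha=0.1, num_iterations=1000):
    # xs: training samples, each including the bias feature x[0] = 1.0
    weights = [0.0] * len(xs[0])                 # initialize model weights to 0
    for _ in range(num_iterations):              # fixed number of gradient descent steps
        grads = [0.0] * len(weights)
        for x, y in zip(xs, ys):                 # one pass over the training set
            y_hat = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
            for i in range(len(weights)):
                grads[i] += (y_hat - y) * x[i]
        # step of size alpha opposite the averaged gradient
        weights = [w - alpha * g / len(xs) for w, g in zip(weights, grads)]
    return weights

# Illustrative usage on a tiny, linearly separable data set (OR of two features)
xs = [[1.0, 0.0, 0.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0], [1.0, 1.0, 1.0]]
ys = [0, 1, 1, 1]
weights = fit(xs, ys, alpha=0.5, num_iterations=5000)
```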

