Probabilistic Models for Linear Regression
Foundations of Algorithms and Machine Learning (CS60020), IIT KGP, 2017: Indrajit Bhattacharya
Regression Problem

N i.i.d. training samples $\{x_n, y_n\}_{n=1}^N$
Response / Output / Target: $y_n \in \mathbb{R}$
Input / Feature vector: $x_n \in \mathbb{R}^M$

Linear regression: $y_n = w^T x_n + \epsilon_n$
Polynomial regression: $y_n = w^T \phi(x_n) + \epsilon_n$ with $\phi_j(x) = x^j$; still a linear function of $w$
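Both models are linear in $w$; only the feature map $\phi$ changes. A minimal numpy sketch of the monomial basis (the helper name poly_features is illustrative, not from the slides):

```python
import numpy as np

def poly_features(x, degree):
    """Monomial basis phi_j(x) = x**j for j = 0..degree, one row per sample."""
    # increasing=True orders columns as [1, x, x**2, ..., x**degree]
    return np.vander(np.asarray(x, dtype=float), N=degree + 1, increasing=True)

# Example: expand 5 scalar inputs into a 5 x 4 cubic design matrix
Phi = poly_features(np.linspace(0.0, 1.0, 5), degree=3)
```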
Least Squares Formulation
Deterministic error term $\epsilon_n$
Minimize the total error $E(w) = \sum_n \epsilon_n^2 = \sum_n (y_n - w^T x_n)^2$
$w^* = \arg\min_w E(w)$
Find the gradient w.r.t. $w$ and equate it to 0: $w^* = (X^T X)^{-1} X^T y$
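A sketch of this closed form in numpy; np.linalg.lstsq solves the same normal equations without forming the inverse explicitly, which is better conditioned:

```python
import numpy as np

def least_squares(X, y):
    """Least-squares estimate w* = (X^T X)^{-1} X^T y.

    X: N x M design matrix (one sample per row); y: length-N target vector.
    lstsq avoids explicitly inverting X^T X, which can be ill-conditioned.
    """
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w
```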
Regularization for Regression
How does regression overfit?
Adding regularization to regression: minimize $E_1(w, D) + \lambda E_2(w)$
Regularization for Regression
Possibilities for regularizers:
$\ell_2$ norm $w^T w$ (ridge regression): quadratic, so continuous and convex; $w^* = (\lambda I + X^T X)^{-1} X^T y$
$\ell_1$ norm (lasso)
Choosing $\lambda$: cross validation (wastes training data) …
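A sketch of the ridge closed form; for $\lambda > 0$ the matrix $\lambda I + X^T X$ is always invertible, so a direct solve works (lam is an illustrative variable name):

```python
import numpy as np

def ridge(X, y, lam):
    """Ridge estimate w* = (lam * I + X^T X)^{-1} X^T y."""
    M = X.shape[1]
    # For lam > 0 the system is positive definite, hence uniquely solvable
    return np.linalg.solve(lam * np.eye(M) + X.T @ X, X.T @ y)
```

The $\ell_1$ (lasso) penalty has no such closed form and needs an iterative solver.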
Probabilistic formulation
Model X and Y as random variables
Directly model the conditional distribution of Y
IID: $Y_n \mid X_n = x \sim \mathrm{Dist}(y \mid x)$
Linear: $Y_n = w^T X_n + \epsilon_n$, $\epsilon_n \sim \mathrm{Dist}(\epsilon)$
Gaussian noise: $\epsilon_n \sim \mathcal{N}(0, \sigma^2)$, so $p(y \mid x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{(y - w^T x)^2}{2\sigma^2}\right\}$
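To make the model concrete, a small sketch that samples from this generative story and evaluates the Gaussian log-likelihood (w_true, sigma, and the sizes are illustrative values, not fixed by the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Generative model: y = w^T x + eps, eps ~ N(0, sigma^2)
N, M = 100, 3
w_true = rng.normal(size=M)      # illustrative "true" weights
sigma = 0.5                      # illustrative noise level
X = rng.normal(size=(N, M))
y = X @ w_true + rng.normal(scale=sigma, size=N)

def log_likelihood(w, X, y, sigma):
    """Sum of log p(y_n | x_n; w) under the Gaussian noise model."""
    resid = y - X @ w
    n = len(y)
    return -0.5 * n * np.log(2 * np.pi * sigma**2) - resid @ resid / (2 * sigma**2)
```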
Probabilistic formulation
[Figure omitted: image from Michael Jordan's book]
Maximum Likelihood Estimation
Formulate the likelihood:
$L(w) = \prod_n p(y_n \mid x_n; w) = \left(\frac{1}{2\pi\sigma^2}\right)^{N/2} \exp\left\{-\frac{1}{2\sigma^2} \sum_n (y_n - w^T x_n)^2\right\}$
Up to constants, maximizing $\log L(w)$ is minimizing $\ell(w) = \sum_n (y_n - w^T x_n)^2$
Recovers the LMS formulation!
Maximize to get the MLE:
$w_{ML} = (X^T X)^{-1} X^T y$
$\sigma^2_{ML} = \frac{1}{N} \sum_n (y_n - w_{ML}^T x_n)^2$
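A sketch of both ML estimates; $w_{ML}$ coincides with the least-squares solution, and $\sigma^2_{ML}$ is just the mean squared residual:

```python
import numpy as np

def mle(X, y):
    """Gaussian-noise ML estimates: w_ML = (X^T X)^{-1} X^T y and
    sigma^2_ML = (1/N) sum_n (y_n - w_ML^T x_n)**2."""
    w_ml, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ w_ml
    sigma2_ml = resid @ resid / len(y)
    return w_ml, sigma2_ml
```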
Bayesian Linear Regression
Model $W$ as a random variable with prior distribution $p(w) = \mathcal{N}(\mu_0, \Sigma_0)$; $w$ and $\mu_0$ are $M \times 1$, $\Sigma_0$ is $M \times M$
Derive the posterior distribution $p(w \mid y) = \mathcal{N}(\mu_N, \Sigma_N)$ (for some $\mu_N$, $\Sigma_N$)
Derive the mean of the posterior distribution: $w_B = E[W \mid y] = \mu_N$
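The slide leaves $\mu_N$ and $\Sigma_N$ unspecified. Assuming a known noise variance $\sigma^2$, the standard conjugate-Gaussian update (as in, e.g., Bishop's PRML) gives $\Sigma_N = (\Sigma_0^{-1} + \sigma^{-2} X^T X)^{-1}$ and $\mu_N = \Sigma_N (\Sigma_0^{-1} \mu_0 + \sigma^{-2} X^T y)$; a sketch under that assumption:

```python
import numpy as np

def bayes_posterior(X, y, mu0, Sigma0, sigma2):
    """Posterior N(mu_N, Sigma_N) over w, assuming known noise variance sigma2
    and the standard conjugate-Gaussian update (not spelled out on the slide)."""
    prec0 = np.linalg.inv(Sigma0)        # prior precision
    precN = prec0 + X.T @ X / sigma2     # posterior precision
    SigmaN = np.linalg.inv(precN)
    muN = SigmaN @ (prec0 @ mu0 + X.T @ y / sigma2)
    return muN, SigmaN                   # point estimate w_B = mu_N
```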
Iterative Solutions for Normal Equations
Direct solutions have limitations
Iterative solutions
First-order method: gradient descent
$w^{(t+1)} \leftarrow w^{(t)} + \rho \sum_n (y_n - w^{(t)T} x_n)\, x_n$
Convergence guarantees:
Convergence in probability to the correct solution for an appropriate fixed step size
Sure convergence with decreasing step sizes
Stochastic gradient descent:
Update based on a single data point at each step
Often converges faster
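A minimal sketch of both update rules; the step size rho and the iteration counts are illustrative choices, not values from the slides:

```python
import numpy as np

def gradient_descent(X, y, rho=0.01, steps=1000):
    """Batch rule from the slide: w += rho * sum_n (y_n - w^T x_n) x_n."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w += rho * X.T @ (y - X @ w)
    return w

def sgd(X, y, rho=0.01, epochs=10, seed=0):
    """Stochastic variant: the same update computed from one data point per step."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for n in rng.permutation(len(y)):
            w += rho * (y[n] - w @ X[n]) * X[n]
    return w
```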
Advantages of Probabilistic Modeling
Makes assumptions explicit
Modularity: conceptually simple to change the model by swapping in appropriate distributions
Summary

Probabilistic formulation of linear regression
Recovers the least squares formulation
Iterative algorithms for training
Forms of regularization