Lecture 02: Linear Regression

Presentation on theme: "Lecture 02: Linear Regression"— Presentation transcript:

1 Lecture 02: Linear Regression
CS480/680: Intro to ML
9/14/17, Yao-Liang Yu

2 Outline
Announcements
Linear Regression
Regularization
Cross-validation

3 Outline
Announcements
Linear Regression
Regularization
Cross-validation

4 Announcements
Assignment 1 is out; due in two weeks.
TA office hour?
Enrolment: CS680 permission numbers have been sent; CS480 has ~7 seats available on Quest, ask the CS advisors!

5 Outline
Announcements
Linear Regression
Regularization
Cross-validation

6 Regression
Given pairs (x_i, y_i), find a function f such that f(x_i) ≈ y_i.
x_i: feature vector, a d-dimensional real vector
y_i: response, an m-dimensional real vector (say m = 1)

7 How much should I bid?
Interpolation vs. extrapolation
Linear vs. nonlinear

8 Exact interpolation
Theorem. For any finite data set D = {(x_i, y_i): i = 1, …, n} (with distinct x_i), there exist infinitely many functions f such that f(x_i) = y_i for all i.
On new data x, predict ŷ = f(x).
The prediction can be wildly different for different f!
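A minimal NumPy sketch of this point (not from the slides; the data and the "bump" factor of 50 are illustrative): two functions that fit the data exactly yet disagree badly at a new point.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 1.0, 4.0, 9.0])            # consistent with y = x**2

# Interpolant 1: the unique degree-3 polynomial through the four points.
f1 = np.poly1d(np.polyfit(x, y, deg=3))

# Interpolant 2: add a polynomial "bump" that vanishes at every x_i,
# so it interpolates the very same data.
bump = np.poly1d(50.0 * np.poly(x))           # 50*(t-0)(t-1)(t-2)(t-3)
f2 = f1 + bump

assert np.allclose(f1(x), y) and np.allclose(f2(x), y)
print(f1(5.0), f2(5.0))                       # ~25 vs. ~6025: wildly different
```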

9 Statistical Learning
Assume the data (X_i, Y_i) are i.i.d. samples from an unknown distribution.
Least-squares regression: minimize the average (expected) squared error E[(f(X) − Y)^2].
Since the distribution is unknown, this average error is difficult to evaluate directly; the role of training data, via the law of large numbers, is to approximate it by the empirical average.

10 The regression function
The minimizer of the expected squared error is the regression function m(x) = E[Y | X = x]; what remains is the inherent noise variance.
Many ways to estimate m(X). Simplest: let's assume it is linear (affine)!
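The decomposition behind these bullets (a reconstruction; the slide's own equation is not in the transcript): writing m(x) = E[Y | X = x],

```latex
\mathbb{E}\big(f(X)-Y\big)^2
  = \underbrace{\mathbb{E}\big(f(X)-m(X)\big)^2}_{\text{how far } f \text{ is from } m}
  + \underbrace{\mathbb{E}\big(m(X)-Y\big)^2}_{\text{inherent noise variance}}
```

because the cross term vanishes after conditioning on X. So m is the best possible regressor, and no f can beat the inherent noise variance.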

11 Linear Least-squares Regression

12 Finally
min_W ||XW − Y||_F^2: the sum of squared residuals between the hyperplane (again!) parameterized by W and the true responses Y.

13 Why least squares?
Theorem (Sondermann'86; Friedland and Torokhti'07; Yu and Schuurmans'11). Among all minimizers of min_W ||AWB − C||_F, W = A⁺CB⁺ is the one with minimal Frobenius norm.
Pseudo-inverse: A⁺ is the unique matrix G such that AGA = A, GAG = G, (AG)^T = AG, (GA)^T = GA.
Via the singular value decomposition A = USV^T: A⁺ = VS⁺U^T, where S⁺ inverts the nonzero singular values.
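A NumPy sketch (not from the slides) of the pseudo-inverse via the SVD, checking the four defining conditions above:

```python
import numpy as np

def pinv_svd(A, tol=1e-12):
    """Moore-Penrose pseudo-inverse via the SVD A = U S V^T."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_inv = np.zeros_like(s)
    s_inv[s > tol] = 1.0 / s[s > tol]        # invert only nonzero singular values
    return Vt.T @ np.diag(s_inv) @ U.T

A = np.array([[1.0, 2.0], [2.0, 4.0]])       # rank 1: no ordinary inverse
G = pinv_svd(A)

# The four defining conditions from the slide:
assert np.allclose(A @ G @ A, A) and np.allclose(G @ A @ G, G)
assert np.allclose((A @ G).T, A @ G) and np.allclose((G @ A).T, G @ A)
assert np.allclose(G, np.linalg.pinv(A))     # matches NumPy's built-in
```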

14 Optimization detour
Fermat's theorem. At a minimizer x, necessarily the (Fréchet) derivative at x vanishes: ∇f(x) = 0.
Example. f(x) = x^T A x + x^T b + c, so [Df(x)]^T = ∇f(x) = (A + A^T)x + b.
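A quick NumPy sanity check of this gradient formula against central finite differences (a sketch, not from the slides; the dimensions and random data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
A = rng.normal(size=(d, d))
b = rng.normal(size=d)
c = 1.5

f = lambda x: x @ A @ x + x @ b + c
grad = lambda x: (A + A.T) @ x + b           # the formula from the slide

x = rng.normal(size=d)
eps = 1e-6
# Central differences approximate each partial derivative of f at x.
fd = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps) for e in np.eye(d)])
assert np.allclose(fd, grad(x), atol=1e-4)
```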

15 Solving least squares
Normal equation: X^T X W = X^T Y.
X^T X may not be invertible, but a solution always exists.
Even when it is invertible, never compute W = (X^T X)^{-1} X^T Y explicitly! Instead, solve the linear system.
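A NumPy sketch of this advice (the data here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 100, 5
X = rng.normal(size=(n, d))
Y = rng.normal(size=(n, 1))

# Solve X^T X W = X^T Y as a linear system...
W_solve = np.linalg.solve(X.T @ X, X.T @ Y)
# ...or let an SVD-based routine handle (possibly rank-deficient) X directly.
W_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
assert np.allclose(W_solve, W_lstsq)
```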

16 Prediction
Once we have W, we can predict ŷ for a new x. How to evaluate?
Sometimes we evaluate with a different loss than the one we trained with; this leads to a beautiful theory of calibration.

17 Outline
Announcements
Linear Regression
Regularization
Cross-validation

18 Ill-posedness
Let x_1 = (0, 1), x_2 = (ε, 1), y_1 = 1, y_2 = −1.
Then w = X^{-1} y = (−2/ε, 1)^T.
A slight perturbation of the input leads to chaotic behaviour: as ε → 0, the weights blow up.
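The same example in NumPy (a sketch):

```python
import numpy as np

y = np.array([1.0, -1.0])
for eps in [1e-1, 1e-4, 1e-8]:
    X = np.array([[0.0, 1.0], [eps, 1.0]])
    print(eps, np.linalg.solve(X, y))    # first weight is -2/eps: it blows up
```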

19 Tikhonov regularization (Hoerl and Kennard'70)
Ridge regression: min_W ||XW − Y||_F^2 + λ||W||_F^2, with regularization constant (hyperparameter) λ > 0.
With positive λ, a slight perturbation of the input leads to a proportional (w.r.t. 1/λ) perturbation of the output.
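A NumPy sketch using the standard ridge solution (X^T X + λI)^{-1} X^T y, applied to the ill-posed example above:

```python
import numpy as np

def ridge(X, y, lam):
    d = X.shape[1]
    # X^T X + lam*I is always invertible for lam > 0.
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

y = np.array([1.0, -1.0])
X = np.array([[0.0, 1.0], [1e-8, 1.0]])
print(ridge(X, y, lam=0.1))              # small, stable weights instead of ~ -2e8
```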

20 Data augmentation
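The slide's equations are not in the transcript; assuming it shows the standard trick, here is a sketch: ridge regression equals ordinary least squares on data augmented with √λ·I rows and zero responses.

```python
# Ridge as least squares on augmented data:
#   X_aug = [X; sqrt(lam) I],  y_aug = [y; 0]
# since ||X_aug w - y_aug||^2 = ||Xw - y||^2 + lam * ||w||^2.
import numpy as np

rng = np.random.default_rng(2)
n, d, lam = 50, 3, 0.5
X, y = rng.normal(size=(n, d)), rng.normal(size=n)

X_aug = np.vstack([X, np.sqrt(lam) * np.eye(d)])
y_aug = np.concatenate([y, np.zeros(d)])

w_aug = np.linalg.lstsq(X_aug, y_aug, rcond=None)[0]
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
assert np.allclose(w_aug, w_ridge)
```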

21 Sparsity
The ridge regression weight is always dense: computationally heavy and cumbersome to interpret.
Lasso (Tibshirani'96): replace the squared penalty with the l1 norm, min_w ||Xw − y||^2 + λ||w||_1, which drives many weights exactly to zero.
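A sketch using scikit-learn (an assumption; the lecture does not mention any library, and the alpha values are illustrative):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(3)
n, d = 200, 10
X = rng.normal(size=(n, d))
w_true = np.zeros(d)
w_true[:2] = [3.0, -2.0]                  # only 2 of 10 features matter
y = X @ w_true + 0.1 * rng.normal(size=n)

print(Ridge(alpha=1.0).fit(X, y).coef_)  # all 10 coefficients nonzero (dense)
print(Lasso(alpha=0.1).fit(X, y).coef_)  # most coefficients exactly 0.0 (sparse)
```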

22 Regularization vs. Constraint
Regularized form min_w f(w) + λ g(w): computationally appealing.
Constrained form min_w f(w) subject to g(w) ≤ c: theoretically appealing.
Every regularized solution solves some constrained problem (always true); the converse holds under mild conditions.

23 Outline
Announcements
Linear Regression
Regularization
Cross-validation

24 Cross-validation
[Figure: the training data is split into folds 1, 2, …, k; one fold at a time serves as the validation set, with the test set held out separately.]

25 Cross-validation
[Figure: fold 1 held out for validation; the remaining folds are used for training.]
For each λ: perf_1

26 Cross-validation
[Figure: fold 2 held out for validation.]
For each λ: perf_1 + perf_2

27 Cross-validation
[Figure: fold k held out for validation.]
For each λ: perf_1 + perf_2 + … + perf_k

28 Cross-validation
For each λ: perf(λ) = perf_1 + perf_2 + … + perf_k
λ* = argmax_λ perf(λ); retrain with λ* on the full training set to obtain W_{λ*}.
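A NumPy sketch of the whole procedure from slides 24-28 (the fold count, λ grid, and negative-MSE score are illustrative choices, not from the slides):

```python
import numpy as np

def ridge(X, y, lam):
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def cv_select(X, y, lambdas, k=5, seed=0):
    n = X.shape[0]
    # Split a shuffled index set into k folds.
    folds = np.array_split(np.random.default_rng(seed).permutation(n), k)
    best_lam, best_perf = None, -np.inf
    for lam in lambdas:
        perf = 0.0                                   # perf(lam) = sum of perf_i
        for i in range(k):
            val = folds[i]                           # fold i held out
            tr = np.concatenate([folds[j] for j in range(k) if j != i])
            w = ridge(X[tr], y[tr], lam)
            perf += -np.mean((X[val] @ w - y[val]) ** 2)  # perf_i = negative MSE
        if perf > best_perf:                         # lam* = argmax perf(lam)
            best_lam, best_perf = lam, perf
    return best_lam, ridge(X, y, best_lam)           # retrain: W_{lam*}

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=100)
lam_star, w_star = cv_select(X, y, lambdas=[0.01, 0.1, 1.0, 10.0])
print(lam_star, w_star)
```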

29 Questions?

30 Robustness

31 Gauss vs. Laplace

32 Multi-task learning
Everything we've shown still holds when Y is m-dimensional (a matrix with m columns).
But then we can solve for each column of Y independently; things get more interesting once we add regularization.
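A NumPy sketch of the column-wise claim (dimensions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
n, d, m = 50, 4, 3
X, Y = rng.normal(size=(n, d)), rng.normal(size=(n, m))

# Solving all m tasks at once...
W_joint = np.linalg.lstsq(X, Y, rcond=None)[0]
# ...gives the same W as solving one single-output problem per column of Y.
W_cols = np.column_stack([np.linalg.lstsq(X, Y[:, j], rcond=None)[0]
                          for j in range(m)])
assert np.allclose(W_joint, W_cols)
```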

33 Linear regression
Assumption: the data (X_i, Y_i) are i.i.d. samples from an unknown distribution.
Dream: minimize the true risk E[(f(X) − Y)^2], but the distribution is unknown…
Law of Large Numbers: the empirical average converges to the expectation.
Reality: minimize the empirical risk (1/n) Σ_i (f(x_i) − y_i)^2 instead.

