Lecture 02: Linear Regression

Presentation on theme: "Lecture 02: Linear Regression"— Presentation transcript:

1 Lecture 02: Linear Regression
CS480/680: Intro to ML
9/14/17, Yao-Liang Yu

2 Outline
Announcements
Linear Regression
Regularization
Cross-validation

3 Outline
Announcements
Linear Regression
Regularization
Cross-validation

4 Announcements
Assignment 1 is out; due in two weeks.
TA office hour?
Enrolment: CS680 permission numbers have been sent; CS480 has ~7 seats available on Quest, ask the CS advisors!

5 Outline
Announcements
Linear Regression
Regularization
Cross-validation

6 Regression
Given pairs (x_i, y_i), find a function f such that f(x_i) ≈ y_i.
x_i: feature vector, a d-dimensional real vector
y_i: response, an m-dimensional real vector (say m = 1)

7 How much should I bid?
Interpolation vs. extrapolation
Linear vs. nonlinear

8 Exact interpolation
Theorem. For any finite data set D = {(x_i, y_i): i = 1, …, n} (with distinct x_i), there exist infinitely many functions f such that f(x_i) = y_i for all i.
On new data x, predict ŷ = f(x).
The prediction can be wildly different for different f!
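A minimal NumPy sketch of this point (not from the slides; the data and the "bump" factor of 50 are illustrative): two functions that fit the data exactly yet disagree badly at a new point.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 1.0, 4.0, 9.0])            # consistent with y = x**2

# Interpolant 1: the unique degree-3 polynomial through the four points.
f1 = np.poly1d(np.polyfit(x, y, deg=3))

# Interpolant 2: add a polynomial "bump" that vanishes at every x_i,
# so it interpolates the very same data.
bump = np.poly1d(50.0 * np.poly(x))           # 50*(t-0)(t-1)(t-2)(t-3)
f2 = f1 + bump

assert np.allclose(f1(x), y) and np.allclose(f2(x), y)
print(f1(5.0), f2(5.0))                       # ~25 vs. ~6025: wildly different
```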

9 Statistical Learning
Assume the data (X_i, Y_i) are i.i.d. samples from an unknown distribution.
Least-squares regression: minimize the average (expected) squared error E[(f(X) − Y)^2].
Since the distribution is unknown, this average error is difficult to evaluate directly; the role of training data, via the law of large numbers, is to approximate it by the empirical average.

10 The regression function
The minimizer of the expected squared error is the regression function m(x) = E[Y | X = x]; what remains is the inherent noise variance.
Many ways to estimate m(X). Simplest: let's assume it is linear (affine)!
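The decomposition behind these bullets (a reconstruction; the slide's own equation is not in the transcript): writing m(x) = E[Y | X = x],

```latex
\mathbb{E}\big(f(X)-Y\big)^2
  = \underbrace{\mathbb{E}\big(f(X)-m(X)\big)^2}_{\text{how far } f \text{ is from } m}
  + \underbrace{\mathbb{E}\big(m(X)-Y\big)^2}_{\text{inherent noise variance}}
```

because the cross term vanishes after conditioning on X. So m is the best possible regressor, and no f can beat the inherent noise variance.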

11 Linear Least-squares Regression

12 Finally
min_W ||XW − Y||_F^2: the sum of squared residuals between the hyperplane (again!) parameterized by W and the true responses Y.

13 Why least squares?
Theorem (Sondermann'86; Friedland and Torokhti'07; Yu and Schuurmans'11). Among all minimizers of min_W ||AWB − C||_F, W = A⁺CB⁺ is the one with minimal Frobenius norm.
Pseudo-inverse: A⁺ is the unique matrix G such that AGA = A, GAG = G, (AG)^T = AG, (GA)^T = GA.
Via the singular value decomposition A = USV^T: A⁺ = VS⁺U^T, where S⁺ inverts the nonzero singular values.
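A NumPy sketch (not from the slides) of the pseudo-inverse via the SVD, checking the four defining conditions above:

```python
import numpy as np

def pinv_svd(A, tol=1e-12):
    """Moore-Penrose pseudo-inverse via the SVD A = U S V^T."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_inv = np.zeros_like(s)
    s_inv[s > tol] = 1.0 / s[s > tol]        # invert only nonzero singular values
    return Vt.T @ np.diag(s_inv) @ U.T

A = np.array([[1.0, 2.0], [2.0, 4.0]])       # rank 1: no ordinary inverse
G = pinv_svd(A)

# The four defining conditions from the slide:
assert np.allclose(A @ G @ A, A) and np.allclose(G @ A @ G, G)
assert np.allclose((A @ G).T, A @ G) and np.allclose((G @ A).T, G @ A)
assert np.allclose(G, np.linalg.pinv(A))     # matches NumPy's built-in
```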

14 Optimization detour
Fermat's theorem. At a minimizer x, necessarily the (Fréchet) derivative at x vanishes: ∇f(x) = 0.
Example. f(x) = x^T A x + x^T b + c, so [Df(x)]^T = ∇f(x) = (A + A^T)x + b.
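A quick NumPy sanity check of this gradient formula against central finite differences (a sketch, not from the slides; the dimensions and random data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
A = rng.normal(size=(d, d))
b = rng.normal(size=d)
c = 1.5

f = lambda x: x @ A @ x + x @ b + c
grad = lambda x: (A + A.T) @ x + b           # the formula from the slide

x = rng.normal(size=d)
eps = 1e-6
# Central differences approximate each partial derivative of f at x.
fd = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps) for e in np.eye(d)])
assert np.allclose(fd, grad(x), atol=1e-4)
```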

15 Solving least squares
Normal equation: X^T X W = X^T Y.
X^T X may not be invertible, but a solution always exists.
Even when it is invertible, never compute W = (X^T X)^{-1} X^T Y explicitly! Instead, solve the linear system.
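A NumPy sketch of this advice (the data here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 100, 5
X = rng.normal(size=(n, d))
Y = rng.normal(size=(n, 1))

# Solve X^T X W = X^T Y as a linear system...
W_solve = np.linalg.solve(X.T @ X, X.T @ Y)
# ...or let an SVD-based routine handle (possibly rank-deficient) X directly.
W_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
assert np.allclose(W_solve, W_lstsq)
```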

16 Prediction
Once we have W, we can predict ŷ for a new x. How to evaluate?
Sometimes we evaluate with a different loss than the one we trained with; this leads to a beautiful theory of calibration.

17 Outline
Announcements
Linear Regression
Regularization
Cross-validation

18 Ill-posedness
Let x_1 = (0, 1), x_2 = (ε, 1), y_1 = 1, y_2 = −1.
Then w = X^{-1} y = (−2/ε, 1)^T.
A slight perturbation of the input leads to chaotic behaviour: as ε → 0, the weights blow up.
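The same example in NumPy (a sketch):

```python
import numpy as np

y = np.array([1.0, -1.0])
for eps in [1e-1, 1e-4, 1e-8]:
    X = np.array([[0.0, 1.0], [eps, 1.0]])
    print(eps, np.linalg.solve(X, y))    # first weight is -2/eps: it blows up
```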

19 Tikhonov regularization (Hoerl and Kennard'70)
Ridge regression: min_W ||XW − Y||_F^2 + λ||W||_F^2, with regularization constant (hyperparameter) λ > 0.
With positive λ, a slight perturbation of the input leads to a proportional (w.r.t. 1/λ) perturbation of the output.
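A NumPy sketch using the standard ridge solution (X^T X + λI)^{-1} X^T y, applied to the ill-posed example above:

```python
import numpy as np

def ridge(X, y, lam):
    d = X.shape[1]
    # X^T X + lam*I is always invertible for lam > 0.
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

y = np.array([1.0, -1.0])
X = np.array([[0.0, 1.0], [1e-8, 1.0]])
print(ridge(X, y, lam=0.1))              # small, stable weights instead of ~ -2e8
```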

20 Data augmentation
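The slide's equations are not in the transcript; assuming it shows the standard trick, here is a sketch: ridge regression equals ordinary least squares on data augmented with √λ·I rows and zero responses.

```python
# Ridge as least squares on augmented data:
#   X_aug = [X; sqrt(lam) I],  y_aug = [y; 0]
# since ||X_aug w - y_aug||^2 = ||Xw - y||^2 + lam * ||w||^2.
import numpy as np

rng = np.random.default_rng(2)
n, d, lam = 50, 3, 0.5
X, y = rng.normal(size=(n, d)), rng.normal(size=n)

X_aug = np.vstack([X, np.sqrt(lam) * np.eye(d)])
y_aug = np.concatenate([y, np.zeros(d)])

w_aug = np.linalg.lstsq(X_aug, y_aug, rcond=None)[0]
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
assert np.allclose(w_aug, w_ridge)
```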

21 Sparsity
The ridge regression weight is always dense: computationally heavy and cumbersome to interpret.
Lasso (Tibshirani'96): replace the squared penalty with the l1 norm, min_w ||Xw − y||^2 + λ||w||_1, which drives many weights exactly to zero.
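A sketch using scikit-learn (an assumption; the lecture does not mention any library, and the alpha values are illustrative):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(3)
n, d = 200, 10
X = rng.normal(size=(n, d))
w_true = np.zeros(d)
w_true[:2] = [3.0, -2.0]                  # only 2 of 10 features matter
y = X @ w_true + 0.1 * rng.normal(size=n)

print(Ridge(alpha=1.0).fit(X, y).coef_)  # all 10 coefficients nonzero (dense)
print(Lasso(alpha=0.1).fit(X, y).coef_)  # most coefficients exactly 0.0 (sparse)
```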

22 Regularization vs. Constraint
Regularized form min_w f(w) + λ g(w): computationally appealing.
Constrained form min_w f(w) subject to g(w) ≤ c: theoretically appealing.
Every regularized solution solves some constrained problem (always true); the converse holds under mild conditions.

23 Outline
Announcements
Linear Regression
Regularization
Cross-validation

24 Cross-validation
[Figure: the training data is split into folds 1, 2, …, k; one fold at a time serves as the validation set, with the test set held out separately.]

25 Cross-validation
[Figure: fold 1 held out for validation; the remaining folds are used for training.]
For each λ: perf_1

26 Cross-validation
[Figure: fold 2 held out for validation.]
For each λ: perf_1 + perf_2

27 Cross-validation
[Figure: fold k held out for validation.]
For each λ: perf_1 + perf_2 + … + perf_k

28 Cross-validation
For each λ: perf(λ) = perf_1 + perf_2 + … + perf_k
λ* = argmax_λ perf(λ); retrain with λ* on the full training set to obtain W_{λ*}.
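A NumPy sketch of the whole procedure from slides 24-28 (the fold count, λ grid, and negative-MSE score are illustrative choices, not from the slides):

```python
import numpy as np

def ridge(X, y, lam):
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def cv_select(X, y, lambdas, k=5, seed=0):
    n = X.shape[0]
    # Split a shuffled index set into k folds.
    folds = np.array_split(np.random.default_rng(seed).permutation(n), k)
    best_lam, best_perf = None, -np.inf
    for lam in lambdas:
        perf = 0.0                                   # perf(lam) = sum of perf_i
        for i in range(k):
            val = folds[i]                           # fold i held out
            tr = np.concatenate([folds[j] for j in range(k) if j != i])
            w = ridge(X[tr], y[tr], lam)
            perf += -np.mean((X[val] @ w - y[val]) ** 2)  # perf_i = negative MSE
        if perf > best_perf:                         # lam* = argmax perf(lam)
            best_lam, best_perf = lam, perf
    return best_lam, ridge(X, y, best_lam)           # retrain: W_{lam*}

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=100)
lam_star, w_star = cv_select(X, y, lambdas=[0.01, 0.1, 1.0, 10.0])
print(lam_star, w_star)
```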

29 Questions?

30 Robustness

31 Gauss vs. Laplace

32 Multi-task learning
Everything we've shown still holds when Y is m-dimensional (a matrix with m columns).
But then we can solve for each column of Y independently; things get more interesting once we add regularization.
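A NumPy sketch of the column-wise claim (dimensions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
n, d, m = 50, 4, 3
X, Y = rng.normal(size=(n, d)), rng.normal(size=(n, m))

# Solving all m tasks at once...
W_joint = np.linalg.lstsq(X, Y, rcond=None)[0]
# ...gives the same W as solving one single-output problem per column of Y.
W_cols = np.column_stack([np.linalg.lstsq(X, Y[:, j], rcond=None)[0]
                          for j in range(m)])
assert np.allclose(W_joint, W_cols)
```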

33 Linear regression
Assumption: the data (X_i, Y_i) are i.i.d. samples from an unknown distribution.
Dream: minimize the true risk E[(f(X) − Y)^2], but the distribution is unknown…
Law of Large Numbers: the empirical average converges to the expectation.
Reality: minimize the empirical risk (1/n) Σ_i (f(x_i) − y_i)^2 instead.

