Presentation on theme: "Machine Learning and Data Mining: Linear regression (adapted from Prof. Alexander Ihler)" — Presentation transcript:

1 Machine Learning and Data Mining: Linear regression (adapted from Prof. Alexander Ihler)

2 Supervised learning. Notation: features x, targets y, predictions ŷ, parameters θ. The program (the "learner") is characterized by some parameters θ and is a procedure (using θ) that outputs a prediction. The training data (examples) provide features and feedback / target values; the learning algorithm scores performance with a "cost function" and changes θ to improve it.

3 Linear regression. Define the form of the function f(x) explicitly, then find a good f(x) within that family. (Plot: target y versus feature x, with a fitted line.) "Predictor": evaluate the line at x and return the result r. A sketch of the standard form follows. (c) Alexander Ihler
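The formula image on this slide did not survive the transcript; for a single feature the predictor has the standard linear form

$$\hat{y} = f(x) = \theta_0 + \theta_1 x,$$

where θ0 is the intercept and θ1 the slope of the line being evaluated.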

4 More dimensions? (Plots: target y versus two features x1, x2.) (c) Alexander Ihler

5 Notation. Define "feature" x0 = 1 (constant). Then the model can be written compactly, as sketched below. (c) Alexander Ihler
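The compact form itself is missing from the transcript; with the constant feature x0 = 1 it is the standard vector form

$$f(\mathbf{x}) = \theta_0 x_0 + \theta_1 x_1 + \dots + \theta_n x_n = \boldsymbol{\theta} \cdot \mathbf{x},$$

so the intercept θ0 is absorbed into the same dot product as the other parameters.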

6 Measuring error. The error or "residual" is the difference between the observation y and the prediction ŷ. (c) Alexander Ihler

7 Mean squared error. How can we quantify the error? One common choice is the mean squared error (MSE), written out below. We could choose something else, of course, but MSE is computationally convenient (more later), measures the variance of the residuals, and corresponds to the likelihood under a Gaussian model of the "noise". (c) Alexander Ihler
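The MSE formula referred to here is not in the transcript; over m training examples it is

$$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} - \hat{y}^{(i)} \right)^2 = \frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} - \boldsymbol{\theta} \cdot \mathbf{x}^{(i)} \right)^2,$$

i.e. the average of the squared residuals.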

8 MSE cost function. Rewrite using matrix form (Matlab): >> e = y' - th*X'; J = e*e'/m; (c) Alexander Ihler
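A minimal sketch of running that one-liner end to end; the variable names follow the slide, but the data values here are made up purely for illustration.

x  = [1; 2; 3; 4];            % one feature, m = 4 examples (made-up values)
y  = [2.1; 3.9; 6.2; 8.1];    % targets (made-up values)
m  = length(y);
X  = [ones(m,1) x];           % prepend the constant feature x0 = 1
th = [0.5 1.8];               % parameters as a row vector [th0 th1]
e  = y' - th*X';              % residuals, 1 x m
J  = e*e'/m;                  % mean squared error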

9 Visualizing the cost function. (Plot: the surface J(θ) over the parameters θ0, θ1.) (c) Alexander Ihler

10 Supervised learning (recap). Notation: features x, targets y, predictions ŷ, parameters θ. The program (the "learner") is characterized by some parameters θ and is a procedure (using θ) that outputs a prediction. The training data (examples) provide features and feedback / target values; the learning algorithm scores performance with a "cost function" and changes θ to improve it.

11 Finding good parameters. We want to find the parameters which minimize our error. Think of a cost "surface": the height at each θ is the error (cost) for that θ. (c) Alexander Ihler

12 Machine Learning and Data Mining: Linear regression, direct minimization (adapted from Prof. Alexander Ihler)

13 MSE minimum. Consider a simple problem: one feature, two data points, two unknowns (θ0, θ1), two equations. We can solve this system directly (see below). However, most of the time m > n: there may be no linear function that hits all the data exactly, so instead we solve directly for the minimum of the MSE function. (c) Alexander Ihler
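The system itself is not shown in the transcript; with two points (x^(1), y^(1)) and (x^(2), y^(2)) it is

$$\theta_0 + \theta_1 x^{(1)} = y^{(1)}, \qquad \theta_0 + \theta_1 x^{(2)} = y^{(2)},$$

which gives

$$\theta_1 = \frac{y^{(2)} - y^{(1)}}{x^{(2)} - x^{(1)}}, \qquad \theta_0 = y^{(1)} - \theta_1 x^{(1)}$$

(the line through the two points), provided x^(1) ≠ x^(2).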

14 SSE minimum. Reordering, we arrive at the solution below. The matrix X(XᵀX)⁻¹ is called the "pseudo-inverse" of Xᵀ; if Xᵀ is square and its columns are independent, it is exactly the inverse. If m > n the system is overdetermined, and the solution gives the minimum-MSE fit. (c) Alexander Ihler
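The reordered equation does not appear in the transcript; with θ a row vector of parameters and X the m × n matrix whose rows are the examples (as in the Matlab code on the next slide), setting the gradient of the MSE to zero gives the normal equations

$$\theta\, X^T X = y^T X \quad\Longrightarrow\quad \theta = y^T X \left( X^T X \right)^{-1}.$$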

15 Matlab SSE. This is easy to solve in Matlab…
% y = [y1 ; … ; ym]
% X = [x1_0 … x1_m ; x2_0 … x2_m ; …]
% Solution 1: "manual"
th = y' * X * inv(X' * X);
% Solution 2: "mrdivide"
th = y' / X';    % th*X' = y'  =>  th = y' / X'
(c) Alexander Ihler
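Not on the slide, but worth noting as a standard Matlab alternative: the backslash (mldivide) operator solves the same least-squares problem directly and avoids explicitly forming inv(X'*X), which is numerically preferable.

% Solution 3 (not from the lecture): backslash / mldivide
th = (X \ y)';    % least-squares solution as a column, transposed to a row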

16 Effects of MSE choice. Sensitivity to outliers: a single outlying datum can dominate the fit (here, a 16² cost for this one datum), because squared error places a heavy penalty on large errors. (c) Alexander Ihler

17 L1 error. (Plots: fits using the L2 cost on the original data, the L1 cost on the original data, and the L1 cost on the outlier data.) (c) Alexander Ihler
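The L1 cost itself is not shown in the transcript; the mean absolute error (MAE) referred to on the next slide is

$$J_{L1}(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left| y^{(i)} - \boldsymbol{\theta} \cdot \mathbf{x}^{(i)} \right|,$$

which penalizes large residuals only linearly and is therefore less sensitive to outliers than MSE.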

18 Cost functions for regression. We can use the MSE, the MAE, or something else entirely; "arbitrary" cost functions can't be minimized in closed form, so we use gradient descent instead (a sketch follows). (c) Alexander Ihler
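A minimal gradient-descent sketch for the MSE cost, since the slide only names the idea; it assumes X (m x n, with the constant column), y (m x 1) and the row-vector convention from the earlier Matlab slides, and the step size and iteration count are illustrative choices, not values from the lecture.

alpha = 0.01;                     % learning rate (illustrative)
m     = length(y);
th    = zeros(1, size(X,2));      % initialize parameters (row vector)
for iter = 1:5000
    e  = y' - th*X';              % residuals, 1 x m
    g  = -(2/m) * e * X;          % gradient of J(th) = e*e'/m
    th = th - alpha * g;          % take a step downhill
end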

19 Machine Learning and Data Mining: Linear regression, nonlinear features (adapted from Prof. Alexander Ihler)

20 Nonlinear functions. What if our hypotheses are not lines? For example, higher-order polynomials. (c) Alexander Ihler

21 Nonlinear functions. Single feature x, predict target y. It is sometimes useful to think of this as a "feature transform": add features such as x², x³, … and then do linear regression in the new features (see the sketch below). (c) Alexander Ihler
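A minimal sketch of the feature-transform idea in Matlab: build polynomial features from a single input and reuse exactly the same least-squares solution. The data values and the degree (3) are made up for illustration.

x    = (0:0.5:4)';                    % single feature, m examples (made-up)
y    = sin(x) + 0.1*randn(size(x));   % some nonlinear target (made-up)
Phi  = [ones(size(x)) x x.^2 x.^3];   % transformed features [1, x, x^2, x^3]
th   = y' / Phi';                     % same linear regression, new features
yhat = th * Phi';                     % predictions on the training inputs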

22 Higher-order polynomials. These are fit in the same way, just with more "features".

23 Features. In general, we can use any features we think are useful: other information about the problem (sq. footage, location, age, …), polynomial functions (features [1, x, x², x³, …]), or other functions (1/x, sqrt(x), x1·x2, …). "Linear regression" means linear in the parameters; the features themselves can be as complex as we want! (c) Alexander Ihler

24 Higher-order polynomials. Are more features better? The hypotheses are "nested": a 2nd-order model is more general than a 1st-order one, a 3rd-order model more general than a 2nd-order one, and so on, so each fits the observed data better.

25 Overfitting and complexity. More complex models will always fit the training data better, but they may "overfit" the training data, learning complex relationships that are not really present. (Plots: a complex model versus a simple model fit to the same X, Y data.) (c) Alexander Ihler

26 Test data. After training the model, go out and get more data from the world: new observations (x, y). How well does our model perform on them? (Plot: training data and new, "test" data.) (c) Alexander Ihler

27 Training versus test error. Plot MSE as a function of model complexity (polynomial order). The training error decreases: a more complex function fits the training data better. What about new, "test" data? From 0th to 1st order the test error decreases (underfitting); at higher orders it increases again (overfitting). (Plot: mean squared error versus polynomial order, for training data and new, "test" data; a sketch follows.) (c) Alexander Ihler
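A minimal sketch of producing such a train/test curve in Matlab; the data generation, split sizes, and maximum order are made up for illustration, not taken from the lecture.

xtr = 4*rand(20,1);  ytr = sin(xtr) + 0.2*randn(20,1);   % training set (made-up)
xte = 4*rand(20,1);  yte = sin(xte) + 0.2*randn(20,1);   % new "test" set (made-up)
err_tr = zeros(1,6);  err_te = zeros(1,6);
for p = 0:5
    Phi_tr = xtr.^(0:p);                      % polynomial features up to order p
    Phi_te = xte.^(0:p);                      % (implicit expansion, R2016b+)
    th = ytr' / Phi_tr';                      % fit on the training data only
    err_tr(p+1) = mean((ytr' - th*Phi_tr').^2);   % training MSE
    err_te(p+1) = mean((yte' - th*Phi_te').^2);   % test MSE
end
plot(0:5, err_tr, 0:5, err_te);               % test error rises once we overfit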

