
1 CSE 5331/7331 Fall 2007: Regression
Margaret H. Dunham
Department of Computer Science and Engineering, Southern Methodist University
Some slides extracted from Data Mining, Introductory and Advanced Topics, Prentice Hall, 2002.
CSE 5331/7331 F'07 © Prentice Hall

2 Table of Contents
Linear Regression
Nonlinear Regression
Logistic Regression
Metrics

3 Remember High School?
$y = mx + b$
You need two points to determine a straight line.
You need two points to find values for m and b.
THIS IS REGRESSION
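
The two-point idea can be made concrete with a minimal sketch. The function name `line_through` and the sample points are illustrative assumptions, not part of the slides.

```python
def line_through(p1, p2):
    """Return slope m and intercept b of the line through two (x, y) points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("vertical line: slope is undefined")
    m = (y2 - y1) / (x2 - x1)  # rise over run
    b = y1 - m * x1            # solve y1 = m*x1 + b for b
    return m, b


m, b = line_through((1.0, 3.0), (3.0, 7.0))
print(m, b)  # 2.0 1.0
```

With more than two points that do not lie exactly on one line, no single m and b passes through all of them, which is where fitting comes in.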

4 Regression
Predict future values based on past values
Linear regression assumes a linear relationship exists:
$y = c_0 + c_1 x_1 + \dots + c_n x_n$
Find values to best fit the data

5 Linear Regression

6 Linear Regression
Assume data fits a predefined function
Determine best values for regression coefficients $c_0, c_1, \dots, c_n$
Assume an error: $y = c_0 + c_1 x_1 + \dots + c_n x_n + \epsilon$
Estimate the error using the mean squared error over the training set.
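
A minimal sketch of fitting the coefficients for the single-predictor case $y = c_0 + c_1 x$, using the textbook closed-form least-squares solution. The mean squared error is taken here in its standard form, $\mathrm{MSE} = \frac{1}{k}\sum_{i=1}^{k}(y_i - \hat{y}_i)^2$ over $k$ training points, which is an assumption about the formula intended on the slide. The function names and the sample data are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares estimates (c0, c1) for y = c0 + c1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    c1 = sxy / sxx              # slope
    c0 = mean_y - c1 * mean_x   # intercept
    return c0, c1


def mean_squared_error(xs, ys, c0, c1):
    """Average squared difference between observed and predicted y."""
    return sum((y - (c0 + c1 * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Illustrative training data (not from the slides).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
c0, c1 = fit_line(xs, ys)
print(c0, c1, mean_squared_error(xs, ys, c0, c1))
```

The same criterion extends to several predictors $x_1, \dots, x_n$, where the coefficients are usually found by solving a linear system rather than by the two formulas above.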

7 Linear Regression Poor Fit
Why use least squares (the sum of squared errors)? http://curvefit.com/sum_of_squares.htm
Linear doesn’t always work well

8 Nonlinear Regression
Data does not nicely fit a straight line
Fit data to a curve
Many possible functions
Not as easy and straightforward as linear regression
How nonlinear regression works: http://curvefit.com/how_nonlin_works.htm
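
To illustrate why curve fitting is less straightforward, here is a minimal sketch of one common iterative approach, Gauss-Newton with step halving, applied to the curve $y = a e^{bx}$. The choice of curve, the starting guess, the function names, and the noise-free sample data are all assumptions made for the example; the slides do not prescribe a particular method.

```python
import math


def sse(a, b, xs, ys):
    """Sum of squared errors for the model y = a * exp(b * x)."""
    return sum((y - a * math.exp(b * x)) ** 2 for x, y in zip(xs, ys))


def fit_exponential(xs, ys, a=1.0, b=0.3, iterations=50):
    """Gauss-Newton with step halving for y = a * exp(b * x)."""
    for _ in range(iterations):
        s11 = s12 = s22 = g1 = g2 = 0.0
        for x, y in zip(xs, ys):
            e = math.exp(b * x)
            r = y - a * e              # residual at this point
            j1, j2 = e, a * x * e      # d(model)/da and d(model)/db
            s11 += j1 * j1
            s12 += j1 * j2
            s22 += j2 * j2
            g1 += j1 * r
            g2 += j2 * r
        det = s11 * s22 - s12 * s12
        if abs(det) < 1e-12:
            break                      # normal equations are singular
        # Solve the 2x2 normal equations for the parameter update.
        da = (g1 * s22 - s12 * g2) / det
        db = (s11 * g2 - s12 * g1) / det
        # Halve the step until the error actually decreases.
        step, current = 1.0, sse(a, b, xs, ys)
        while step > 1e-8 and sse(a + step * da, b + step * db, xs, ys) >= current:
            step /= 2
        if step <= 1e-8:
            break                      # no further improvement found
        a, b = a + step * da, b + step * db
    return a, b


# Noise-free illustrative data generated from a = 2, b = 0.5.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]
a, b = fit_exponential(xs, ys)
print(a, b)
```

Unlike the linear case, the result depends on the starting guess and the procedure can stall in a poor local fit, so the initial values matter.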

9 Logistic Regression
Generalized linear model
Predict discrete outcome
– Binomial (binary) logistic regression
– Multinomial logistic regression
One dependent variable
Logistic Regression by Gerard E. Dallal: http://www.tufts.edu/~gdallal/logistic.htm

10 Logistic Regression (cont’d)
Log odds function: $\ln\left(\frac{p}{1-p}\right)$
$p$ is the probability that the outcome is 1
Odds: the probability that the event occurs divided by the probability that it does not occur, $\frac{p}{1-p}$
The log odds function is strictly increasing as $p$ increases
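
A short sketch of the odds and log odds definitions, with the function names `odds` and `log_odds` chosen here for illustration. It prints the values for a few probabilities to show that the log odds grow as p grows.

```python
import math


def odds(p):
    """Probability the event occurs divided by the probability it does not."""
    return p / (1.0 - p)


def log_odds(p):
    """Natural log of the odds; defined for 0 < p < 1."""
    return math.log(odds(p))


for p in (0.1, 0.25, 0.5, 0.75, 0.9):
    print(p, round(odds(p), 3), round(log_odds(p), 3))
```

At p = 0.5 the odds are 1 and the log odds are 0; probabilities below 0.5 give negative log odds and probabilities above 0.5 give positive ones.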

11 Why Log Odds?
Shape of curve is desirable
Relationship to probability
Range: $-\infty$ to $+\infty$
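
Because the log odds can take any real value, they can be modeled as a linear function $c_0 + c_1 x$ and mapped back to a probability with the logistic (sigmoid) function. The sketch below fits a binary logistic regression by plain gradient descent on the log-loss. The learning rate, epoch count, function names, and toy data are assumptions for illustration, not a method taken from the slides.

```python
import math


def sigmoid(z):
    """Inverse of the log odds: maps any real z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, labels, rate=0.1, epochs=5000):
    """Gradient descent for log odds = c0 + c1 * x with 0/1 labels."""
    c0 = c1 = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad0 = grad1 = 0.0
        for x, t in zip(xs, labels):
            err = sigmoid(c0 + c1 * x) - t   # predicted probability minus label
            grad0 += err
            grad1 += err * x
        c0 -= rate * grad0 / n
        c1 -= rate * grad1 / n
    return c0, c1


# Toy data: larger x makes outcome 1 more likely, with some overlap.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
c0, c1 = fit_logistic(xs, labels)
print(sigmoid(c0 + c1 * 1.0), sigmoid(c0 + c1 * 4.0))
```

The overlap in the toy labels is deliberate: with perfectly separable data the coefficients would keep growing without settling.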

12 P-value
The probability of obtaining a value at least as extreme as the observed value, assuming the null hypothesis is true (for an upper-tailed test, the probability that the variable exceeds the observed value)
http://en.wikipedia.org/wiki/P-value
http://sportsci.org/resource/stats/pvalues.html
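
As a rough illustration, assume the variable follows a standard normal distribution (an assumption made here, not stated on the slide). The probability of exceeding an observed value z can then be computed from the complementary error function in Python's standard library. The name `upper_tail_p_value` is hypothetical.

```python
import math


def upper_tail_p_value(z):
    """P(Z > z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


for z in (0.0, 1.0, 1.96, 3.0):
    print(z, upper_tail_p_value(z))
```

An observed value of 1.96 leaves roughly 2.5% of the distribution above it, which is why it appears as the cutoff for a two-sided test at the 5% level.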

13 Correlation
Examine the degree to which the values for two variables behave similarly.
Correlation coefficient r:
– r = 1: perfect correlation
– r = -1: perfect but opposite correlation
– r = 0: no correlation
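
The three reference values of r can be checked with a small sketch of the Pearson correlation coefficient. The slide does not name a specific coefficient, so Pearson's r is assumed here, and the function name and data are illustrative.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


print(pearson_r([1, 2, 3], [2, 4, 6]))         # values rise together
print(pearson_r([1, 2, 3], [6, 4, 2]))         # one rises as the other falls
print(pearson_r([1, 2, 3, 4], [1, -1, -1, 1])) # no linear relationship
```

Note that r = 0 only rules out a linear relationship; the last pair of sequences is clearly related by a curve.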

14 Covariance
Degree to which two variables vary in the same manner
Correlation is normalized and covariance is not
http://www.ds.unifi.it/VL/VL_EN/expect/expect3.html
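
The difference between the two measures shows up when the units change. In this sketch, which assumes the sample covariance (dividing by n - 1) and uses made-up height and weight figures, converting heights from meters to centimeters multiplies the covariance by 100 but leaves the correlation unchanged.

```python
import math


def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Covariance normalized by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


# Illustrative data (not from the slides).
heights_m = [1.5, 1.6, 1.7, 1.8]
weights_kg = [50.0, 58.0, 66.0, 79.0]
heights_cm = [100.0 * h for h in heights_m]

print(covariance(heights_m, weights_kg), covariance(heights_cm, weights_kg))
print(correlation(heights_m, weights_kg), correlation(heights_cm, weights_kg))
```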

15 Residual
Error
– Difference between desired output and predicted output
– May actually use sum of squares

