Chapter 5. Classification and Prediction


1 Chapter 5. Classification and Prediction
What is classification? What is prediction?
Issues regarding classification and prediction
Classification by decision tree induction
Bayesian classification
Classification by back propagation
Support vector machines
Associative classification: classification by association rule analysis
Lazy learners (or learning from your neighbors)
Other classification methods
Prediction
Accuracy
Summary

2 Simple Linear Regression
Simple Linear Regression Model
Least Squares Method
Coefficient of Determination
Model Assumptions
Testing for Significance
Using the Estimated Regression Equation for Estimation and Prediction
Computer Solution
Residual Analysis: Validating Model Assumptions
Residual Analysis: Outliers and Influential Observations

3 What Is Prediction?
Prediction is similar to classification: first, construct a model; second, use the model to predict unknown values. The major method for prediction is regression: linear and multiple regression, and non-linear regression. Prediction is different from classification: classification predicts categorical class labels, while prediction models continuous-valued functions.

4 Predictive Modeling in Databases
Predictive modeling: predict data values or construct generalized linear models based on the database data; one can only predict value ranges or category distributions. Method outline: minimal generalization, attribute relevance analysis, generalized linear model construction, prediction. Determine the major factors that influence the prediction; data relevance analysis uses uncertainty measurement, entropy analysis, expert judgement, etc. Multi-level prediction: drill-down and roll-up analysis.

5 Regression Analysis and Log-Linear Models in Prediction
Linear regression: Y = β0 + β1X. The two parameters, β0 and β1, specify the line and are estimated from the data at hand, using the least squares criterion on the known values (X1, Y1), (X2, Y2), …. Multiple regression: Y = b0 + b1X1 + b2X2; many nonlinear functions can be transformed into the above. Log-linear models: the multi-way table of joint probabilities is approximated by a product of lower-order tables, e.g. p(a, b, c, d) = αab × βac × χad × δbcd.

6 The Simple Linear Regression Model
y = 0 + 1x +  Simple Linear Regression Equation E(y) = 0 + 1x Estimated Simple Linear Regression Equation y = b0 + b1x ^ November 8, 2018 Data Mining: Concepts and Techniques

7 Simple Linear Regression Model
The population regression model: Y = β0 + β1X + ε, where β0 is the population Y intercept, β1 the population slope coefficient, X the independent variable, Y the dependent variable, and ε the random error term; β0 + β1X is the linear component and ε the random error component.

8 Simple Linear Regression Model
(continued) (Figure: for a given Xi, the observed value of Y equals the predicted value of Y on the line plus the random error εi for that Xi value; slope = β1, intercept = β0.)

9 Simple Linear Regression Equation
The simple linear regression equation provides an estimate of the population regression line: ŷi = b0 + b1xi, where ŷi is the estimated (or predicted) y value for observation i, b0 the estimate of the regression intercept, b1 the estimate of the regression slope, and xi the value of x for observation i. The individual random error terms ei have a mean of zero.

10 Least Squares Estimators
b0 and b1 are obtained by finding the values of b0 and b1 that minimize the sum of the squared differences between y and ŷ: SSE = Σ(yi − ŷi)² = Σ(yi − (b0 + b1xi))². Differential calculus is used to obtain the coefficient estimators b0 and b1 that minimize SSE.

11 Least Squares Estimators
(continued) The slope coefficient estimator is b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)², and the constant (y-intercept) is b0 = ȳ − b1x̄. The regression line always goes through the mean (x̄, ȳ).
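These closed-form estimators are simple to compute directly. A minimal Python sketch (an illustration added here, not part of the original slides; the function name is ours):

```python
import numpy as np

def least_squares(x, y):
    # b1 = sum((x - x_mean)(y - y_mean)) / sum((x - x_mean)^2)
    # b0 = y_mean - b1 * x_mean
    x, y = np.asarray(x, float), np.asarray(y, float)
    b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
    b0 = y.mean() - b1 * x.mean()
    return b0, b1
```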

12 Finding the Least Squares Equation
The coefficients b0 and b1, and other regression results in this chapter, will be found using a computer. Hand calculations are tedious; statistical routines are built into Excel, and other statistical analysis software can be used.

13 Model Assumptions
Assumptions about the error term ε: the error ε is a random variable with mean zero; the variance of ε, denoted σ2, is the same for all values of the independent variable; the values of ε are independent; the error ε is a normally distributed random variable.

14 Linear Regression Model Assumptions
The true relationship form is linear (Y is a linear function of X, plus random error). The error terms εi are independent of the x values. The error terms are random variables with mean 0 and constant variance σ2 (the constant variance property is called homoscedasticity). The random error terms εi are not correlated with one another, so that E(εiεj) = 0 for all i ≠ j.

15 Interpretation of the Slope and the Intercept
b0 is the estimated average value of y when the value of x is zero (if x = 0 is in the range of observed x values). b1 is the estimated change in the average value of y as a result of a one-unit change in x.

16 Simple Linear Regression Example
A real estate agent wishes to examine the relationship between the selling price of a home and its size (measured in square feet). A random sample of 10 houses is selected. Dependent variable (Y) = house price in $1000s; independent variable (X) = square feet.

17 Sample Data for House Price Model
House Price in $1000s (Y)  Square Feet (X)
245  1400
312  1600
279  1700
308  1875
199  1100
219  1550
405  2350
324  2450
319  1425
255  …

18 Graphical Presentation
House price model: scatter plot (figure).

19 Regression Using Excel
Tools / Data Analysis / Regression

20 Regression Statistics
Excel output (Regression Statistics: Multiple R, R Square, Adjusted R Square, Standard Error; Observations = 10. ANOVA: df Regression 1, Residual 8, Total 9. Coefficient table: Intercept and Square Feet, each with coefficient, t Stat, P-value, and 95% limits.) The regression equation is: house price = 98.25 + 0.1098 (square feet).

21 Graphical Presentation
House price model: scatter plot and regression line. Slope = 0.1098, Intercept = 98.25.

22 Interpretation of the Intercept, b0
b0 is the estimated average value of Y when the value of X is zero (if X = 0 is in the range of observed X values). Here, no houses had 0 square feet, so b0 = 98.25 just indicates that, for houses within the range of sizes observed, roughly $98,000 is the portion of the house price not explained by square feet.

23 Interpretation of the Slope Coefficient, b1
b1 measures the estimated change in the average value of Y as a result of a one-unit change in X. Here, b1 = .10977 tells us that the average value of a house increases by .10977($1000) = $109.77, on average, for each additional square foot of size.

24 Example: Reed Auto Sales
Simple linear regression. Reed Auto periodically has a special week-long sale. As part of the advertising campaign, Reed runs one or more television commercials during the weekend preceding the sale. Data from a sample of 5 previous sales are shown below.
Number of TV Ads  Number of Cars Sold
1  14
3  24
2  18
1  17
3  27

25 Example: Reed Auto Sales
Slope for the estimated regression equation: b1 = (Σxy − (Σx)(Σy)/n) / (Σx² − (Σx)²/n) = (220 − (10)(100)/5) / (24 − (10)²/5) = 20/4 = 5. y-intercept for the estimated regression equation: b0 = ȳ − b1x̄ = 20 − 5(2) = 10. Estimated regression equation: ŷ = 10 + 5x.
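A quick check of these hand calculations in Python (our illustration; the data are from the slide above):

```python
import numpy as np

ads = np.array([1, 3, 2, 1, 3], dtype=float)
cars = np.array([14, 24, 18, 17, 27], dtype=float)

# np.polyfit returns the highest-degree coefficient first: [b1, b0]
b1, b0 = np.polyfit(ads, cars, deg=1)
print(b0, b1)  # 10.0 5.0  ->  y_hat = 10 + 5x
```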

26 Example: Reed Auto Sales
Scatter diagram (figure).

27 Measures of Variation
Total variation is made up of two parts: SST = SSR + SSE, where SST = Σ(yi − ȳ)² is the total sum of squares, SSR = Σ(ŷi − ȳ)² the regression sum of squares, and SSE = Σ(yi − ŷi)² the error sum of squares; ȳ = average value of the dependent variable, yi = observed values of the dependent variable, ŷi = predicted value of y for the given xi value.

28 Measures of Variation
(continued) SST (total sum of squares) measures the variation of the yi values around their mean ȳ. SSR (regression sum of squares) is the explained variation attributable to the linear relationship between x and y. SSE (error sum of squares) is the variation attributable to factors other than the linear relationship between x and y.

29 Measures of Variation
(continued) (Figure: for a point (xi, yi), the decomposition SST = Σ(yi − ȳ)², SSR = Σ(ŷi − ȳ)², SSE = Σ(yi − ŷi)² shown graphically around the fitted line.)

30 Coefficient of Determination, R2
The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable: R2 = SSR/SST. The coefficient of determination is also called R-squared and is denoted R2; note that 0 ≤ R2 ≤ 1.
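Computing the decomposition for the Reed Auto data makes the definition concrete (our sketch; the totals can be checked by hand):

```python
import numpy as np

ads = np.array([1, 3, 2, 1, 3], dtype=float)
cars = np.array([14, 24, 18, 17, 27], dtype=float)

b1, b0 = np.polyfit(ads, cars, 1)          # b0 = 10, b1 = 5
pred = b0 + b1 * ads

sst = ((cars - cars.mean()) ** 2).sum()    # 114.0, total variation
ssr = ((pred - cars.mean()) ** 2).sum()    # 100.0, explained
sse = ((cars - pred) ** 2).sum()           # 14.0, unexplained
r2 = ssr / sst                             # ~0.877
```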

31 Examples of Approximate r2 Values
r2 = 1: perfect linear relationship between X and Y; 100% of the variation in Y is explained by variation in X. (Figures: points lying exactly on an upward-sloping and a downward-sloping line.)

32 Examples of Approximate r2 Values
0 < r2 < 1: weaker linear relationships between X and Y; some but not all of the variation in Y is explained by variation in X. (Figures: points scattered around upward- and downward-sloping lines.)

33 Examples of Approximate r2 Values
r2 = 0: no linear relationship between X and Y; the value of Y does not depend on X (none of the variation in Y is explained by variation in X).

34 Regression Statistics
Excel output (Observations = 10; ANOVA df: Regression 1, Residual 8, Total 9): R Square = .5808, so 58.08% of the variation in house prices is explained by variation in square feet.

35 Correlation and R2
The coefficient of determination, R2, for a simple regression is equal to the square of the simple correlation coefficient.

36 The Correlation Coefficient
Sample correlation coefficient: r = ±√R2, where the sign of r is the sign of b1, the slope of the estimated regression equation.

37 Example: Reed Auto Sales
Sample correlation coefficient: the sign of b1 in the equation is "+", so rxy = +√.8772 = +.9366.

38 Estimation of Model Error Variance
An estimator for the variance of the population model error is s² = SSE/(n − 2). Division by n − 2 instead of n − 1 is because the simple regression model uses two estimated parameters, b0 and b1, instead of one. se = √(SSE/(n − 2)) is called the standard error of the estimate.

39 Regression Statistics
Excel output (Regression Statistics): the Standard Error entry is se = 41.33; Observations = 10; ANOVA df: Regression 1, Residual 8, Total 9.

40 Comparing Standard Errors
se is a measure of the variation of observed y values from the regression line. (Figures: two scatters with the same fitted line but different spread.) The magnitude of se should always be judged relative to the size of the y values in the sample data; e.g., se = $41.33K is moderately small relative to house prices in the $200K–$300K range.

41 Inferences About the Regression Model
The variance of the regression slope coefficient b1 is estimated by s²b1 = s²e / Σ(xi − x̄)², where sb1 is the estimate of the standard error of the least squares slope and se is the standard error of the estimate.

42 Regression Statistics
Excel output (coefficient table): the Standard Error column for Square Feet gives sb1 = .03297 (Observations = 10; ANOVA df: Regression 1, Residual 8, Total 9).

43 Comparing Standard Errors of the Slope
sb1 is a measure of the variation in the slope of regression lines from different possible samples. (Figures: a small sb1 versus a large sb1.)

44 Inference about the Slope: t Test
t test for a population slope: is there a linear relationship between X and Y? Null and alternative hypotheses: H0: β1 = 0 (no linear relationship); H1: β1 ≠ 0 (a linear relationship does exist). Test statistic: t = (b1 − β1)/sb1 with d.f. = n − 2, where b1 = regression slope coefficient, β1 = hypothesized slope, and sb1 = standard error of the slope.
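The same test is easy to run in code. A hedged sketch (the function name is ours; scipy assumed available):

```python
import numpy as np
from scipy import stats

def slope_t_test(x, y):
    # t test for H0: beta1 = 0 in simple linear regression
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    b1, b0 = np.polyfit(x, y, 1)
    resid = y - (b0 + b1 * x)
    s_e = np.sqrt((resid ** 2).sum() / (n - 2))        # std. error of estimate
    s_b1 = s_e / np.sqrt(((x - x.mean()) ** 2).sum())  # std. error of the slope
    t = b1 / s_b1
    p = 2 * stats.t.sf(abs(t), df=n - 2)               # two-tailed p-value
    return t, p
```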

45 Inference about the Slope: t Test
(continued) Estimated regression equation: house price = 98.25 + 0.1098 (square feet); the slope of this model is 0.1098. Does square footage of the house affect its sales price? (Data as in the house-price table above.)

46 Inferences about the Slope: t Test Example
From the Excel output, the Square Feet row of the coefficient table gives b1 = .10977 with standard error sb1 = .03297, so t = b1/sb1 = 3.329.

47 Inferences about the Slope: t Test Example
(continued) H0: β1 = 0; H1: β1 ≠ 0. Test statistic: t = 3.329. d.f. = 10 − 2 = 8, and t8,.025 = 2.3060. Decision: since t = 3.329 > 2.3060, the statistic falls in the rejection region (α/2 = .025 in each tail), so reject H0. Conclusion: there is sufficient evidence that square footage affects house price.

48 Inferences about the Slope: t Test Example
(continued) P-value approach. H0: β1 = 0; H1: β1 ≠ 0. This is a two-tail test, so the p-value is P(t > 3.329) + P(t < −3.329) for 8 d.f. Decision: p-value < α, so reject H0. Conclusion: there is sufficient evidence that square footage affects house price.

49 Confidence Interval Estimate for the Slope
Confidence interval estimate of the slope: b1 ± tn−2,α/2 sb1, d.f. = n − 2. From the Excel printout for house prices, at the 95% level of confidence the confidence interval for the slope is (0.0337, 0.1858).

50 Confidence Interval Estimate for the Slope
(continued) Since the units of the house price variable are $1000s, we are 95% confident that the average impact on sales price is between $33.70 and $185.80 per square foot of house size. This 95% confidence interval does not include 0. Conclusion: there is a significant relationship between house price and square feet at the .05 level of significance.

51 F-Test for Significance
F test statistic: F = MSR/MSE, where MSR = SSR/k and MSE = SSE/(n − k − 1); F follows an F distribution with k numerator and (n − k − 1) denominator degrees of freedom (k = the number of independent variables in the regression model).
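A small sketch of this computation (ours; scipy assumed), checked against the Reed Auto numbers that appear later:

```python
from scipy import stats

def overall_f_test(ssr, sse, n, k):
    # Overall F test: H0 says all slope coefficients are zero
    msr = ssr / k
    mse = sse / (n - k - 1)
    F = msr / mse
    p = stats.f.sf(F, k, n - k - 1)
    return F, p

# Reed Auto (k = 1, n = 5): F = 100 / (14/3) = 21.43
print(overall_f_test(ssr=100.0, sse=14.0, n=5, k=1))
print(stats.f.ppf(0.95, 1, 3))   # critical value F.05(1, 3) = 10.13
```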

52 Regression Statistics
Excel output (ANOVA): the F statistic has 1 and 8 degrees of freedom; Significance F is the p-value for the F test.

53 F-Test for Significance
(continued) H0: β1 = 0; H1: β1 ≠ 0; α = .05; df1 = 1, df2 = 8. Critical value: F.05 = 5.32. Decision: the test statistic falls in the rejection region, so reject H0 at α = .05. Conclusion: there is sufficient evidence that house size affects selling price.

54 Testing for Significance: F Test
Hypotheses: H0: β1 = 0; Ha: β1 ≠ 0. Test statistic: F = MSR/MSE. Rejection rule: reject H0 if F > Fα, where Fα is based on an F distribution with 1 d.f. in the numerator and n − 2 d.f. in the denominator.

55 Example: Reed Auto Sales
F test. Hypotheses: H0: β1 = 0; Ha: β1 ≠ 0. Rejection rule: for α = .05 and d.f. = 1, 3, F.05 = 10.13, so reject H0 if F > 10.13. Test statistic: F = MSR/MSE = 100/4.667 = 21.43. Conclusion: since 21.43 > 10.13, we can reject H0.

56 Prediction
The regression equation can be used to predict a value for y given a particular x. For a specified value xn+1, the predicted value is ŷn+1 = b0 + b1xn+1.

57 Predictions Using Regression Analysis
Predict the price for a house with 2000 square feet: house price = 98.25 + 0.1098(2000) = 317.85. The predicted price for a house with 2000 square feet is 317.85 ($1,000s) = $317,850.

58 Relevant Data Range
When using a regression model for prediction, only predict within the relevant range of the data; it is risky to try to extrapolate far beyond the range of observed X's.

59 Estimating Mean Values and Predicting Individual Values
Goal: form intervals around ŷ to express uncertainty about the value of y for a given xi. A confidence interval covers the expected value of y given xi; a prediction interval covers a single observed y given xi. (Figure: both intervals drawn around the line ŷ = b0 + b1xi.)

60 Confidence Interval for the Average Y, Given X
Confidence interval estimate for the expected value of y given a particular xn+1: ŷn+1 ± tn−2,α/2 · se · √(1/n + (xn+1 − x̄)² / Σ(xi − x̄)²). Notice that the formula involves the term (xn+1 − x̄)², so the size of the interval varies according to the distance of xn+1 from the mean x̄.

61 Prediction Interval for an Individual Y, Given X
Prediction interval estimate for an actual observed value of y given a particular xn+1: ŷn+1 ± tn−2,α/2 · se · √(1 + 1/n + (xn+1 − x̄)² / Σ(xi − x̄)²). The extra 1 under the square root adds to the interval width to reflect the added uncertainty for an individual case.
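The two intervals differ only in that extra variance term, which the sketch below (ours; scipy assumed) makes explicit:

```python
import numpy as np
from scipy import stats

def interval(x, y, x_new, level=0.95, individual=False):
    # CI for the mean of y at x_new, or PI for a single new observation
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    b1, b0 = np.polyfit(x, y, 1)
    resid = y - (b0 + b1 * x)
    s_e = np.sqrt((resid ** 2).sum() / (n - 2))
    h = 1.0 / n + (x_new - x.mean()) ** 2 / ((x - x.mean()) ** 2).sum()
    if individual:
        h += 1.0                       # the extra term for an individual y
    t = stats.t.ppf(1 - (1 - level) / 2, df=n - 2)
    y_hat = b0 + b1 * x_new
    half = t * s_e * np.sqrt(h)
    return y_hat - half, y_hat + half
```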

62 Estimation of Mean Values: Example
Confidence interval estimate for E(Yn+1|Xn+1): find the 95% confidence interval for the mean price of 2,000-square-foot houses. Predicted price: ŷ = 317.85 ($1,000s). The confidence interval endpoints are 280.66 and 354.90, i.e., from $280,660 to $354,900.

63 Estimation of Individual Values: Example
Interval estimate for yn+1: find the 95% interval for an individual house with 2,000 square feet. Predicted price: ŷ = 317.85 ($1,000s). The interval endpoints are 215.50 and 420.07, i.e., from $215,500 to $420,070.

64 Finding Confidence and Prediction Intervals in Excel
In Excel, use PHStat | regression | simple linear regression…. Check the "confidence and prediction interval for x =" box and enter the x value and confidence level desired.

65 Finding Confidence and Prediction Intervals in Excel
(continued) (Screenshot: the input values, the predicted ŷ, the confidence interval estimate for E(Yn+1|Xn+1), and the interval estimate for an individual yn+1.)

66 Graphical Analysis
The linear regression model is based on minimizing the sum of squared errors. If outliers exist, their potentially large squared errors may have a strong influence on the fitted regression line. Be sure to examine your data graphically for outliers and extreme points, and decide, based on your model and logic, whether the extreme points should remain or be removed.

67 Chapter Summary
Introduced the linear regression model. Reviewed correlation and the assumptions of linear regression. Discussed estimating the simple linear regression coefficients. Described measures of variation. Described inference about the slope. Addressed estimation of mean values and prediction of individual values.

68 The Multiple Regression Model
Idea: examine the linear relationship between one dependent variable (Y) and two or more independent variables (Xi). Multiple regression model with k independent variables: Y = β0 + β1X1 + β2X2 + … + βkXk + ε (Y-intercept β0, population slopes β1, …, βk, random error ε).

69 Multiple Regression Equation
The coefficients of the multiple regression model are estimated using sample data. Multiple regression equation with k independent variables: ŷ = b0 + b1x1 + b2x2 + … + bkxk, with estimated intercept b0 and estimated slope coefficients b1, …, bk; ŷ is the estimated (or predicted) value of y. In this chapter we will always use a computer to obtain the regression slope coefficients and other regression summary measures.

70 Multiple Regression Equation
(continued) Two-variable model: ŷ = b0 + b1x1 + b2x2. (Figure: a fitted plane with slope b1 in the x1 direction and slope b2 in the x2 direction.)

71 Standard Multiple Regression Assumptions
The values xi and the error terms εi are independent. The error terms are random variables with mean 0 and constant variance σ2 (the constant variance property is called homoscedasticity).

72 Standard Multiple Regression Assumptions
(continued) The random error terms εi are not correlated with one another, so that E(εiεj) = 0 for all i ≠ j. It is not possible to find a set of numbers c0, c1, …, ck such that c0 + c1x1i + … + ckxki = 0 (this is the property of no linear relation among the Xj's).

73 Example: 2 Independent Variables
A distributor of frozen dessert pies wants to evaluate factors thought to influence demand. Dependent variable: pie sales (units per week). Independent variables: price (in $) and advertising ($100's). Data are collected for 15 weeks.

74 Pie Sales Example
Week  Pie Sales  Price ($)  Advertising ($100s)
1   350  5.50  3.3
2   460  7.50  …
3   …    8.00  3.0
4   430  …     4.5
5   …    6.80  …
6   380  …     4.0
7   …    4.50  …
8   470  6.40  3.7
9   450  7.00  3.5
10  490  5.00  …
11  340  7.20  …
12  300  7.90  3.2
13  440  5.90  …
14  …    …     …
15  …    …     2.7
Multiple regression equation: Sales = b0 + b1 (Price) + b2 (Advertising)

75 Estimating a Multiple Linear Regression Equation
Excel will be used to generate the coefficients and measures of goodness of fit for multiple regression. Excel: Tools / Data Analysis… / Regression. PHStat: PHStat / Regression / Multiple Regression…
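The same fit can be sketched in Python. For illustration only, the snippet below uses just the four fully transcribed rows of the data table above (statsmodels assumed installed); with so few rows the numbers are not meaningful, but the mechanics match the Excel run:

```python
import numpy as np
import statsmodels.api as sm

price       = np.array([5.50, 6.40, 7.00, 7.90])
advertising = np.array([3.3, 3.7, 3.5, 3.2])
sales       = np.array([350.0, 470.0, 450.0, 300.0])

X = sm.add_constant(np.column_stack([price, advertising]))
fit = sm.OLS(sales, X).fit()
print(fit.params)   # b0, b1 (price), b2 (advertising)
```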

76 Multiple Regression Output
Excel multiple regression output (Regression Statistics: Multiple R, R Square, Adjusted R Square, Standard Error; Observations = 15. ANOVA: df Regression 2, Residual 12, Total 14. Coefficient table rows: Intercept, Price, Advertising.)

77 The Multiple Regression Equation
Sales = b0 + b1 (Price) + b2 (Advertising), where Sales is in number of pies per week, Price is in $, and Advertising is in $100's. b1: sales will decrease, on average, by |b1| pies per week for each $1 increase in selling price, net of the effects of changes due to advertising. b2: sales will increase, on average, by b2 pies per week for each $100 increase in advertising, net of the effects of changes due to price.

78 Coefficient of Determination, R2
Reports the proportion of total variation in y explained by all x variables taken together: R2 = SSR/SST, the ratio of the explained variability to total sample variability.

79 Coefficient of Determination, R2
(continued) Excel output (Observations = 15; ANOVA df: Regression 2, Residual 12, Total 14): R Square = .521, so 52.1% of the variation in pie sales is explained by the variation in price and advertising.

80 Estimation of Error Variance
Consider the population regression model Y = β0 + β1X1 + … + βKXK + ε. The unbiased estimate of the variance of the errors is s²e = SSE/(n − K − 1), where SSE = Σe²i = Σ(yi − ŷi)². The square root of the variance, se, is called the standard error of the estimate.

81 Standard Error, se
Excel output (Regression Statistics): the Standard Error entry is se (Observations = 15; ANOVA df: Regression 2, Residual 12, Total 14). The magnitude of this value can be compared to the average y value.

82 Adjusted Coefficient of Determination
R2 never decreases when a new X variable is added to the model, even if the new variable is not an important predictor. This can be a disadvantage when comparing models. What is the net effect of adding a new variable? We lose a degree of freedom when a new X variable is added; did the new X variable add enough explanatory power to offset the loss of one degree of freedom?

83 Adjusted Coefficient of Determination
(continued) Used to correct for the fact that adding non-relevant independent variables will still reduce the error sum of squares: adjusted R2 = 1 − (1 − R2)(n − 1)/(n − K − 1), where n = sample size and K = number of independent variables. Adjusted R2 provides a better comparison between multiple regression models with different numbers of independent variables; it penalizes excessive use of unimportant independent variables and is smaller than R2.
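The correction is a one-liner; a sketch (ours), checked against the pie-sales figures quoted on the nearby slides:

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    # R2 corrected for sample size n and number of predictors k
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Pie sales: R2 = .521 with n = 15, k = 2 gives ~.442 (44.2%)
print(adjusted_r2(0.521, 15, 2))
```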

84 Regression Statistics
Excel output (Observations = 15; ANOVA df: Regression 2, Residual 12, Total 14): Adjusted R Square = .442, so 44.2% of the variation in pie sales is explained by the variation in price and advertising, taking into account the sample size and number of independent variables.

85 Coefficient of Multiple Correlation
The coefficient of multiple correlation is the correlation between the predicted value and the observed value of the dependent variable. It is the square root of the multiple coefficient of determination and is used as another measure of the strength of the linear relationship between the dependent variable and the independent variables. It is comparable to the correlation between Y and X in simple regression.

86 Evaluating Individual Regression Coefficients
Use t tests for individual coefficients; they show whether a specific independent variable is conditionally important. Hypotheses: H0: βj = 0 (no linear relationship); H1: βj ≠ 0 (a linear relationship does exist between xj and y).

87 Evaluating Individual Regression Coefficients
(continued) H0: βj = 0 (no linear relationship); H1: βj ≠ 0 (a linear relationship does exist between xj and y). Test statistic: t = bj/sbj (df = n − k − 1).

88 Evaluating Individual Regression Coefficients
(continued) Excel output (Observations = 15; ANOVA df: Regression 2, Residual 12, Total 14): the t-value for Price has p-value .0398; the t-value for Advertising is t = 2.855, with p-value .0145.

89 Example: Evaluating Individual Regression Coefficients
From the Excel output, for Price and Advertising: H0: βj = 0; H1: βj ≠ 0. d.f. = 15 − 2 − 1 = 12, α = .05, t12,.025 = 2.1788. The test statistic for each variable falls in the rejection region (p-values < .05). Decision: reject H0 for each variable. Conclusion: there is evidence that both Price and Advertising affect pie sales at α = .05.

90 Confidence Interval Estimate for the Slope
Confidence interval limits for the population slope βj: bj ± tn−K−1,α/2 · sbj. Here t has (15 − 2 − 1) = 12 d.f. Example: form a 95% confidence interval for the effect of changes in price (x1) on pie sales: −24.975 ± (2.1788)(10.832), so the interval is −48.58 < β1 < −1.37.

91 Confidence Interval Estimate for the Slope
(continued) The Excel output also reports these interval endpoints: weekly sales are estimated to be reduced by between 1.37 and 48.58 pies for each $1 increase in the selling price. 

92 Test on All Coefficients
F test for overall significance of the model: shows whether there is a linear relationship between all of the X variables considered together and Y. Use the F test statistic. Hypotheses: H0: β1 = β2 = … = βk = 0 (no linear relationship); H1: at least one βi ≠ 0 (at least one independent variable affects Y).

93 F-Test for Overall Significance
Test statistic: F = MSR/MSE = (SSR/K) / (SSE/(n − K − 1)), where F has K (numerator) and (n − K − 1) (denominator) degrees of freedom. The decision rule is: reject H0 if F > FK,n−K−1,α.

94 F-Test for Overall Significance
(continued) Excel output (ANOVA): the F statistic has 2 and 12 degrees of freedom; Significance F is the p-value for the F test.

95 F-Test for Overall Significance
(continued) H0: β1 = β2 = 0; H1: β1 and β2 not both zero; α = .05; df1 = 2, df2 = 12. Critical value: F.05 = 3.885. Decision: since the F test statistic is in the rejection region (p-value < .05), reject H0. Conclusion: there is evidence that at least one independent variable affects Y.

96 Tests on a Subset of Regression Coefficients
Consider a multiple regression model involving variables xj and zj, say y = β0 + β1x1 + … + βKxK + α1z1 + … + αrzr + ε, and the null hypothesis that the z variable coefficients are all zero: H0: α1 = … = αr = 0.

97 Tests on a Subset of Regression Coefficients
(continued) Goal: compare the error sum of squares for the complete model with the error sum of squares for the restricted model. First run a regression for the complete model and obtain SSE. Next run a restricted regression that excludes the z variables (the number of variables excluded is r) and obtain the restricted error sum of squares SSE(r). Compute the F statistic F = [(SSE(r) − SSE)/r] / MSE, with MSE from the complete model, and apply the decision rule for a significance level α.
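A compact sketch of this partial F test (ours; scipy assumed):

```python
from scipy import stats

def partial_f_test(sse_full, sse_restricted, r, df_full):
    # H0: the r excluded coefficients are all zero.
    # df_full = residual degrees of freedom of the complete model.
    F = ((sse_restricted - sse_full) / r) / (sse_full / df_full)
    p = stats.f.sf(F, r, df_full)
    return F, p
```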

98 Prediction
Given a population regression model, and a new observation of a data point (x1,n+1, x2,n+1, …, xK,n+1), the best linear unbiased forecast of yn+1 is ŷn+1 = b0 + b1x1,n+1 + … + bKxK,n+1. It is risky to forecast for new X values outside the range of the data used to estimate the model coefficients, because we do not have data to support that the linear model extends beyond the observed range.

99 Using The Equation to Make Predictions
Predict sales for a week in which the selling price is $5.50 and advertising is $350. Note that Advertising is in $100's, so $350 means x2 = 3.5. Predicted sales = b0 + b1(5.50) + b2(3.5) pies.

100 Predictions in PHStat
PHStat | regression | multiple regression…. Check the "confidence and prediction interval estimates" box.

101 Predictions in PHStat
(continued) (Screenshot: input values; the predicted y value; the confidence interval for the mean y value, given these x's; and the prediction interval for an individual y value, given these x's.)

102 Residuals in Multiple Regression
Two-variable model: for a sample observation yi with predicted value ŷi, the residual is ei = yi − ŷi. (Figure: the residual as the vertical distance from the observation to the fitted plane at (x1i, x2i).)

103 Nonlinear Regression Models
The relationship between the dependent variable and an independent variable may not be linear; review the scatter diagram to check for non-linear relationships. Example: a quadratic model, in which the second independent variable is the square of the first variable.

104 Quadratic Regression Model
Model form: yi = β0 + β1xi + β2x²i + εi, where β0 = Y intercept, β1 = regression coefficient for the linear effect of X on Y, β2 = regression coefficient for the quadratic effect on Y, and εi = random error in Y for observation i.
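Fitting a quadratic model is just multiple regression with x and x² as the two predictors. A sketch with hypothetical data (statsmodels assumed):

```python
import numpy as np
import statsmodels.api as sm

x = np.array([1.0, 2, 3, 4, 5, 6, 7, 8])            # hypothetical data
y = np.array([1.2, 3.9, 9.1, 15.8, 25.3, 36.1, 48.9, 63.8])

X = sm.add_constant(np.column_stack([x, x ** 2]))   # linear + squared terms
fit = sm.OLS(y, X).fit()
print(fit.params)       # b0, b1, b2
print(fit.pvalues[2])   # t-test p-value for the quadratic term beta2
```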

105 Linear vs. Nonlinear Fit
(Figures: linear and nonlinear fits with their residual plots. A linear fit does not give random residuals; a nonlinear fit gives random residuals.)

106 Quadratic Regression Model
Quadratic models may be considered when the scatter diagram takes on one of the following shapes (four panels, by the signs of the coefficients): β1 < 0, β2 > 0; β1 > 0, β2 > 0; β1 < 0, β2 < 0; β1 > 0, β2 < 0, where β1 is the coefficient of the linear term and β2 the coefficient of the squared term.

107 Testing for Significance: Quadratic Effect
Testing the quadratic effect: compare the linear regression estimate with the quadratic regression estimate. Hypotheses: H0: β2 = 0 (the quadratic term does not improve the model); H1: β2 ≠ 0 (the quadratic term improves the model).

108 Testing for Significance: Quadratic Effect
(continued) Testing the quadratic effect. Hypotheses: H0: β2 = 0 (the quadratic term does not improve the model); H1: β2 ≠ 0 (the quadratic term improves the model). The test statistic is t = (b2 − β2)/sb2, where b2 = squared-term slope coefficient, β2 = hypothesized slope (zero), and sb2 = standard error of the slope.

109 Testing for Significance: Quadratic Effect
(continued) Testing the quadratic effect: compare R2 from the simple regression to R2 from the quadratic model. If R2 from the quadratic model is larger than R2 from the simple model, then the quadratic model is a better model.

110 Example: Quadratic Model
Purity increases as filter time increases:
Purity  Filter Time
3   1
7   2
8   …
15  5
22  …
33  …
40  10
54  12
67  13
70  14
78  …
85  …
87  16
99  17

111 Example: Quadratic Model
(continued) Simple regression results: ŷ = b0 + b1·Time. The t statistic, F statistic, and R2 are all high (Significance F = 2.078E-10), but the residuals are not random.

112 Example: Quadratic Model
(continued) Quadratic regression results: ŷ = b0 + b1·Time + b2·(Time)². The quadratic term is significant (p-value 1.165E-05) and improves the model: R2 is higher and se is lower, and the residuals are now random (Significance F = 2.368E-13).

113 The Log Transformation
The multiplicative model: Y = β0 · X1^β1 · X2^β2 · ε (original multiplicative model). Taking logs gives the transformed multiplicative model: log Y = log β0 + β1 log X1 + β2 log X2 + log ε.

114 Interpretation of coefficients
For the multiplicative model, when both dependent and independent variables are logged: the coefficient bk of the (logged) independent variable Xk can be interpreted as follows: a 1 percent change in Xk leads to an estimated bk percent change in the average value of Y; bk is the elasticity of Y with respect to a change in Xk.
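A sketch of the log-log fit (ours; hypothetical positive-valued data, statsmodels assumed):

```python
import numpy as np
import statsmodels.api as sm

x = np.array([1.0, 2, 3, 4, 5, 6, 7, 8])
y = np.array([2.1, 4.3, 5.8, 8.4, 9.9, 12.5, 13.8, 16.2])

X = sm.add_constant(np.log(x))       # log-transformed predictor
fit = sm.OLS(np.log(y), X).fit()     # log-transformed response
print(fit.params[1])  # b_k: the elasticity of Y with respect to X
```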

115 Dummy Variables
A dummy variable is a categorical independent variable with two levels (yes or no, on or off, male or female), recorded as 0 or 1. Regression intercepts are different if the variable is significant; equal slopes are assumed for the other variables. If there are more than two levels, the number of dummy variables needed is (number of levels − 1).

116 Dummy Variable Example
Let y = pie sales, x1 = price, and x2 = holiday (x2 = 1 if a holiday occurred during the week, x2 = 0 if there was no holiday that week): ŷ = b0 + b1x1 + b2x2.

117 Dummy Variable Example
(continued) Holiday (x2 = 1): ŷ = (b0 + b2) + b1x1. No holiday (x2 = 0): ŷ = b0 + b1x1. Different intercept, same slope. If H0: β2 = 0 is rejected, then "Holiday" has a significant effect on pie sales.

118 Interpreting the Dummy Variable Coefficient
Example: Sales = number of pies sold per week; Price = pie price in $; Holiday = 1 if a holiday occurred during the week, 0 if no holiday occurred. b2 = 15: on average, sales were 15 pies greater in weeks with a holiday than in weeks without a holiday, given the same price.

119 Dummy Variable Models (More than 2 Levels)
Dummy variables can be used in situations in which the categorical variable of interest has more than two categories. They can also be useful in experimental design: experimental design is used to identify possible causes of variation in the value of the dependent variable; Y outcomes are measured at specific combinations of levels for treatment and blocking variables, and the goal is to determine how the different treatments influence the Y outcome.

120 Dummy Variable Models (More than 2 Levels)
Consider a categorical variable with K levels; the number of dummy variables needed is one less than the number of levels, K − 1. Example: y = house price, x1 = square feet. If the style of the house is also thought to matter (style = ranch, split level, condo): three levels, so two dummy variables are needed.

121 Dummy Variable Models (More than 2 Levels)
(continued) Example: let "condo" be the default category, and let x2 and x3 be used for the other two categories: y = house price, x1 = square feet, x2 = 1 if ranch (0 otherwise), x3 = 1 if split level (0 otherwise). The multiple regression equation is: ŷ = b0 + b1x1 + b2x2 + b3x3.
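Dummy columns like x2 and x3 are typically generated rather than typed by hand. A sketch with hypothetical rows (pandas assumed):

```python
import pandas as pd

df = pd.DataFrame({
    "sqft":  [1400, 1600, 2350, 1875],              # hypothetical houses
    "style": ["ranch", "condo", "split", "ranch"],
})
# Keep K - 1 = 2 dummies; "condo" is the default (all-zero) category
dummies = pd.get_dummies(df["style"])[["ranch", "split"]]
X = pd.concat([df[["sqft"]], dummies], axis=1)
print(X)   # sqft, ranch (0/1), split (0/1); condo rows are 0 in both
```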

122 Interpreting the Dummy Variable Coefficients (with 3 Levels)
Consider the regression equation ŷ = b0 + b1x1 + b2x2 + b3x3. For a condo (x2 = x3 = 0): ŷ = b0 + b1x1. For a ranch (x2 = 1, x3 = 0): ŷ = (b0 + b2) + b1x1; with the same square feet, a ranch will have an estimated average price b2 thousand dollars higher than a condo. For a split level (x2 = 0, x3 = 1): ŷ = (b0 + b3) + b1x1; with the same square feet, a split level will have an estimated average price b3 thousand dollars higher than a condo.

123 Interaction Between Explanatory Variables
Hypothesizes interaction between pairs of x variables: the response to one x variable may vary at different levels of another x variable. The model contains two-way cross-product terms, e.g., y = β0 + β1x1 + β2x2 + β3x1x2 + ε.

124 Effect of Interaction
Given y = β0 + β1x1 + β2x2 + β3x1x2 + ε: without the interaction term, the effect of x1 on y is measured by β1; with the interaction term, the effect of x1 on y is measured by β1 + β3x2, so the effect changes as x2 changes.

125 Interaction Example
Suppose x2 is a dummy variable and the estimated regression equation is ŷ = 1 + 2x1 + 3x2 + 4x1x2. For x2 = 1: ŷ = 1 + 2x1 + 3(1) + 4x1(1) = 4 + 6x1. For x2 = 0: ŷ = 1 + 2x1 + 3(0) + 4x1(0) = 1 + 2x1. The slopes are different: the effect of x1 on y depends on the value of x2.

126 Significance of Interaction Term
The coefficient b3 is an estimate of the difference in the coefficient of x1 when x2 = 1 compared to when x2 = 0. The t statistic for b3 can be used to test the hypothesis H0: β3 = 0. If we reject the null hypothesis, we conclude that there is a difference in the slope coefficient for the two subgroups.

127 Multiple Regression Assumptions
Errors (residuals) from the regression model: ei = yi − ŷi. Assumptions: the errors are normally distributed, the errors have a constant variance, and the model errors are independent.

128 Analysis of Residuals in Multiple Regression
These residual plots are used in multiple regression: residuals vs. ŷi; residuals vs. x1i; residuals vs. x2i; residuals vs. time (if time series data). Use the residual plots to check for violations of the regression assumptions.

129 Chapter Summary
Developed the multiple regression model. Tested the significance of the multiple regression model. Discussed adjusted R2. Tested individual regression coefficients. Tested portions of the regression model. Used quadratic terms and log transformations in regression models. Used dummy variables. Evaluated interaction effects. Discussed using residual plots to check model assumptions.

130 The Stages of Model Building
Stages: Model Specification (*), Coefficient Estimation, Model Verification, Interpretation and Inference. Model specification: understand the problem to be studied; select dependent and independent variables; identify the model form (linear, quadratic, …); determine the data required for the study.

131 The Stages of Model Building
(continued) Stages: Model Specification, Coefficient Estimation (*), Model Verification, Interpretation and Inference. Coefficient estimation: estimate the regression coefficients using the available data; form confidence intervals for the regression coefficients; for prediction, the goal is the smallest se; if estimating individual slope coefficients, examine the model for multicollinearity and specification bias.

132 The Stages of Model Building
(continued) Stages: Model Specification, Coefficient Estimation, Model Verification (*), Interpretation and Inference. Model verification: logically evaluate the regression results in light of the model (i.e., are the coefficient signs correct?); are any coefficients biased or illogical?; evaluate the regression assumptions (i.e., are residuals random and independent?); if any problems are suspected, return to model specification and adjust the model.

133 The Stages of Model Building
(continued) Stages: Model Specification, Coefficient Estimation, Model Verification, Interpretation and Inference (*). Interpretation and inference: interpret the regression results in the setting and units of your study; form confidence intervals or test hypotheses about the regression coefficients; use the model for forecasting or prediction.

134 Basic Concepts
Regression equation: Yi = β0 + β1Xi + εi. β0 and β1 are parameters whose values are to be estimated; they have fixed but unknown values. εi is the error term at observation i; the error is a random variable. Characteristics of the error: E(ε) is zero (the expected value of the error is zero); Var(ε) is constant for all observations (= σ2). A stronger assumption, ε ~ N(0, σ2), i.e., errors are normally distributed, is used for hypothesis testing.

135 Basic Concepts (cont.)
In the previous equation, the parameters β0 and β1 are fixed but unknown, the Xi are fixed and known values of the explanatory variable, and εi is a random variable; so Yi is a random variable due to εi. Taking the expected value of both sides: E(Yi) = β0 + β1Xi + E(εi) = β0 + β1Xi, since E(εi) = 0; this is the regression equation. Knowing β0, β1 and the X value, we can predict Y on the average: Yi_head = b0 + b1Xi, where b0 and b1 are estimates of β0 and β1.

136 Basic Concepts (cont.)
Yi_head is a point estimate of Y for a particular value of X, i.e., when the explanatory variable X takes a particular value; b0 and b1 are estimates of the unknown parameters β0 and β1.

137 Basic Concepts (cont.)
(Figure: scatter of points around the fitted line; for observation i at Xi, the vertical distance Yi − Y_head is the residual.)

138 Estimation of Parameters β0, β1
Total prediction error: the sum of squared errors SSE = Σni=1 (Yi − Y_head)², the deviation between the actual value Yi and the predicted value Y_head, squared and summed; otherwise positive deviations would cancel negative deviations. Take the derivative of SSE with respect to b0 and b1, equate to zero, and solve for b0 and b1. Note that b0 and b1 are estimates of β0 and β1.

139 Estimation of Parameters (cont.)
SSE = Σni=1 (Yi − (b0 + b1Xi))². dSSE/db0 = Σni=1 2ui·u′i, where ui = Yi − b0 − b1Xi and u′i = ∂ui/∂b0 = −1, so dSSE/db0 = Σni=1 2(Yi − b0 − b1Xi)(−1). Derivative with respect to b1: u′i = ∂(Yi − b0 − b1Xi)/∂b1 = −Xi, so dSSE/db1 = Σni=1 2(Yi − b0 − b1Xi)(−Xi).

140 Estimation of Parameters (cont.)
At the point where SSE is minimum, dSSE/db0 = 0 and dSSE/db1 = 0. So: Σni=1 (Yi − b0 − b1Xi) = 0 (1) and Σni=1 (Yi − b0 − b1Xi)(Xi) = 0 (2). Two equations, two unknowns b0 and b1; solve for b0 and b1 in terms of the Xi's and Yi's.

141 Estimation of Parameters (cont.)
From the first equation: Σni=1 Yi − n·b0 − b1 Σni=1 Xi = 0. Dividing by n, the number of observations: (ΣYi)/n − b0 − b1(ΣXi)/n = 0, i.e., Y_mean = b0 + b1·X_mean; the regression line passes through the mean of Y and the mean of X. The formulas to calculate the estimates b0 and b1 are: b1 = Σ(Xi − X_mean)(Yi − Y_mean) / Σ(Xi − X_mean)², and b0 = Y_mean − b1·X_mean.

142 Example
A pizza chain around universities: sales versus student population. Problem: predict sales of pizza Y given the student population X of the university, based on data points (Xi, Yi).

143 Coefficient of determination
For the i-th observation, Yi − Yi_head, the difference between the observed and predicted value of Y, is called the residual. Total sum of squares: SST = Σ(Yi − Y_mean)². Each term in SST is a measure of the error in estimating Y without knowing any explanatory variable X. In the pizza example, estimating the sales of pizza without knowing the university population X means using the mean value of pizza sales, Y_mean, directly; Y_mean is the best point estimate of pizza sales if no other information is provided by any other variable such as X, the university population.

144 Coefficient of Determination (cont.)
(Figure.) For observation i: Yi − Y_mean = (Yi − Y_head) + (Y_head − Y_mean); the deviation from the mean can be explained by the error from the regression plus the deviation of Y_head from the mean.

145 Coefficient of Determination (cont.)
In this particular example, knowing the university population helps to predict sales to some extent. How much? By how much the estimated Y values deviate from the mean: Y_head − Y_mean. SSR = Σ(Yi_head − Y_mean)²: the deviations of Y_head from the mean of Y, squared and summed over all observations. In general, SST = SSR + SSE.

146 Coefficient of Determination (cont.)
SST is the total variation in Y, part of which can be explained by knowing a variable X: knowing the university population can explain sales to some extent. This is SSR, the sum of squares due to regression. But there is still some variation in Y (pizza sales) that cannot be predicted from X: SSE, the sum of squared errors.

147 A Measure of Goodness of Fit for Regression
If every observation fits perfectly, i.e., all points are on the regression line, then Yi − Yi_head = 0 and SSE is zero (no error): X perfectly explains the variation in Y, and SST = SSR. The ratio of SSR to SST gives us a measure of the goodness of fit of the regression line: r2 = SSR/SST, between 0 and 1.

148 Coefficient of Determination and Correlation
r2 can be interpreted as the percentage of the total sum of squares that can be explained using the regression equation; SSR/SST = r2 is the coefficient of determination. The correlation coefficient r between variables X and Y measures the degree of linear association between the two numerical variables: rxy = (sign of b1)·√(coefficient of determination), so r is between −1 and +1.

149 Assumptions About the Error
(Figures: two scatter plots.) We assume that the error has constant variance and that errors are independent of X and Y. For some problems these assumptions are violated: e.g., with Y = height and X = weight for babies, errors are likely to be small compared with adults.

150 Testing for Significance
Estimation of the variance of the error, σ2. SSE, the sum of squared errors, measures variability around the estimated line. MSE, the mean squared error: s2 = MSE = SSE/(n − 2) is an unbiased estimate of the variance of the error. We divide by n − 2 because there are n errors but only n − 2 of them are independent: as we estimate two parameters b0 and b1, there are two equations involving the errors, and knowing n − 2 errors, the remaining 2 can be calculated from the least squares equations.

151 Standard Error of Estimate
Standard error of estimate: s = √MSE, an estimate of the standard deviation of the error. For the pizza example, MSE = SSE/(10 − 2) = 1530/8 = 191.25, and s = √191.25 = 13.83.

152 t-Tests
What about the hypothesized variable, the university population in the pizza example? Does it explain sales to some extent; is its coefficient significantly different from zero? H0, the null hypothesis: β1 = 0; Ha, the alternative: β1 ≠ 0. b1 is computed from the X and Y values, and the Y's depend on the errors, so b1 is a random variable with a sampling distribution. E(b1) = β1, so b1 is an unbiased estimate of β1. The estimated standard deviation of b1 is given on the next slide.

153 t-Tests (cont.)
sb1 = (standard error of estimate)/√(Σ(Xi − X_mean)²), i.e., sb1 = s/√(Σ(Xi − X_mean)²). For the pizza example: sb1 = 13.83/√568 = 0.5803. So the sampling distribution of b1 is t-distributed with 8 degrees of freedom, a mean value of 5, and a standard deviation of 0.5803. H0: β1 = 0; Ha: β1 ≠ 0. t = (b1 − β1)/sb1 = (5 − 0)/0.5803 = 8.62.

154 t-Tests (cont.)
With a confidence level of .01 and 8 degrees of freedom, the critical value of t(8) is 3.355. Since 8.62 > 3.355, reject the null hypothesis that β1 is zero.

155 The F Test
Testing the overall significance of the regression equation. In simple (one-variable) regression, the F test and the t test are equivalent; in multiple regression, the F test is used for testing the significance of the regression, and t tests for testing the significance of each variable. MSR = SSR/k, the mean square regression, where k is the number of variables.

156 The F Test (cont.)
F = MSR/MSE. Under the null hypothesis H0: β1 = 0, MSR/MSE has an F distribution with k and n − 2 degrees of freedom. As MSR gets close to zero, F gets close to zero and the regression is less likely to be significant. The F statistic is greater than zero; the question is to what extent.

157 Residual Plots
Residuals are the errors Yi − Yi_head. Plot residuals versus X or versus Y_head; check the assumption of independent residuals and detect outliers.

158 Multiple Regression
Regression model: Yi = β0 + β1X1i + β2X2i + … + βkXki + εi. Taking the expectation: E(Yi) = β0 + β1X1i + β2X2i + … + βkXki, as E(εi) = 0 by assumption. Estimated regression equation: Yi_head = b0 + b1X1i + b2X2i + … + bkXki.

159 Least Squares Method
Minimize SSE = Σ(Yi − Yi_head)², just as in simple regression.

160 Example
The yearly spending of a customer is related to her income X1 and distance to the market X2. Regression model: Yi = β0 + β1X1i + β2X2i + εi. b0, b1, b2 are point estimates of β0, β1, β2: Yi_head = b0 + b1X1i + b2X2i. With the fitted coefficients, the company can predict a yearly spending value for each customer. Interpretation of β1: increasing income by one unit, what is the change in Y, holding the effect of distance constant?

161 Testing for Significance
SST = SSR + SSE. When the regression is likely to be significant, SSR is high and SSE is small; when the regression is likely to be insignificant, SSE is high and SSR is small. F test: testing the overall significance of the regression equation. t test: testing the significance of individual variables, only after accepting the overall significance of the regression.

162 F Test
H0: the regression is insignificant, i.e., β1 = 0, β2 = 0, …, βk = 0. Ha: at least one of the parameters is not zero. Under the null hypothesis H0, the ratio MSR/MSE is F with k and n − k − 1 degrees of freedom, where MSR = SSR/k and MSE = SSE/(n − k − 1), an unbiased estimate of the variance of the error. If the regression is significant, MSR is high, MSE is low, and the F ratio is high. Given a confidence level, if F > F_critical, reject H0 (accept Ha): the regression is significant.

163 Example
Test the significance of the regression. H0: β1 = 0, β2 = 0; Ha: at least one is different from zero. MSR = 10.8, MSE = 0.328, F = 10.8/0.328 = 32.9. Under the null, the ratio MSR/MSE is F with 2 and 10 − 3 = 7 degrees of freedom. Choosing a significance level of 0.01, the critical value of the F distribution with 2 and 7 degrees of freedom is 9.55. F = 32.9 > 9.55, i.e., F > F_critical, so the null is rejected: the regression is significant.

164 t Test for Significance of Individual Coefficients
H0: βi = 0; Ha: βi ≠ 0. t = bi/sbi, where sbi is the estimate of the standard deviation of bi. Reject the null if t < −tα/2 or t > tα/2, where tα/2 is based on a t distribution with n − k − 1 degrees of freedom.

165 Example
b1 = 0.061, with tb1 = b1/sb1 = 6.18; b2 = 923 with sb2 = 222.1, so tb2 = 923/222.1 = 4.15. At α = 0.01 the critical t with 7 degrees of freedom is 3.49. Reject the null hypothesis that β1 = 0, and reject the null hypothesis that β2 = 0; these are different hypotheses.

166 Variable-Selection Procedures
Stepwise regression: at each iteration, the first consideration is whether the least significant variable currently in the model can be removed because its F value, FMIN, is less than the user-specified or default F value, FREMOVE. If no variable can be removed, the procedure checks whether the most significant variable not in the model can be added because its F value, FMAX, is greater than the user-specified or default F value, FENTER. If no variable can be removed and no variable can be added, the procedure stops.

167 Variable-Selection Procedures
Forward selection: this procedure is similar to stepwise regression, but does not permit a variable to be deleted. The forward-selection procedure starts with no independent variables and adds variables one at a time as long as a significant reduction in the error sum of squares (SSE) can be achieved.
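A hedged sketch of this greedy selection-by-SSE-reduction idea (ours; the threshold name f_enter stands in for FENTER):

```python
import numpy as np

def forward_select(X, y, f_enter=4.0):
    # Add, one at a time, the variable giving the largest significant
    # reduction in SSE (partial F > f_enter). X is an (n, p) array.
    n, p = X.shape
    selected, remaining = [], list(range(p))
    sse_cur = ((y - y.mean()) ** 2).sum()   # intercept-only model
    while remaining:
        best = None
        for j in remaining:
            A = np.column_stack([np.ones(n), X[:, selected + [j]]])
            beta, *_ = np.linalg.lstsq(A, y, rcond=None)
            sse = ((y - A @ beta) ** 2).sum()
            F = (sse_cur - sse) / (sse / (n - len(selected) - 2))
            if F > f_enter and (best is None or sse < best[1]):
                best = (j, sse)
        if best is None:
            break                            # no significant addition left
        selected.append(best[0]); remaining.remove(best[0]); sse_cur = best[1]
    return selected
```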

168 Variable-Selection Procedures
Backward elimination: this procedure begins with a model that includes all the independent variables the modeler wants considered. It then attempts to delete one variable at a time by determining whether the least significant variable currently in the model can be removed because its F value, FMIN, is less than the user-specified or default F value, FREMOVE. Once a variable has been removed from the model it cannot reenter at a subsequent step.

169 Variable-Selection Procedures
Best-subsets regression: the three preceding procedures are one-variable-at-a-time methods offering no guarantee that the best model for a given number of variables will be found. Some software packages include best-subsets regression, which enables the user to find, given a specified number of independent variables, the best regression model. Minitab output identifies the two best one-variable estimated regression equations, the two best two-variable equations, and so on.

170 Example: Economic indicator problem
There are tens of macroeconomic variables, say 45 in total. Which one is the best predictor of the inflation rate three months ahead? Develop a simple model to predict inflation using only a couple of those 45 macro variables. Best-stepwise feature selection: first select the single macro variable that best predicts inflation among the 45 (try 45 models; say $/TL is selected). Repeat: the k-th variable is entered from among the remaining 45 − (k − 1) variables. Stop introducing new variables at some point.

171 Example (cont.)
Best feature elimination: develop a model including all 45 variables; then remove just one of them (try 45 models, each excluding just one of the 45 variables). Repeat, eliminating a further variable at each step, until a stopping criterion is met. Rarely used compared with feature selection.

172 Qualitative independent variables
Gender: a 0–1 variable. Major: if there are M categories, use M binary variables with values 0 or 1. Example, three categories: applied science, engineering, others. Use M − 1 of them in the regression equation: Income = β0 + β1M1 + β2M2 + error, where M1 = 1 if applied science (0 otherwise) and M2 = 1 if engineering (0 otherwise). If the person is in "others" (neither applied science nor engineering), then M1 = 0 and M2 = 0.

173 Model Building
Curvilinear relationships: Y = β0 + β1X1 + β2X2 + β3X2² + β4X1X2. Example: predict sales as a function of the number of months a salesperson has been employed: Sales = f(months_employed).

174 Adding or deleting a variable
Adding a variable to a regression decreases SSE. This is always true; even an irrelevant variable decreases SSE. SSE(X1) − SSE(X1, X2) is the reduction in the error sum of squares. An F test is used to determine whether this reduction is significant: (SSE1 − SSE2)/(number of variables added) is the reduction in SSE per added variable.

175 Adding or Deleting a Variable (cont.)
Or: (SSE(reduced) − SSE(full))/p, where p is the number of variables added (there are p new variables in the full model). F test: F = [(SSE(reduced) − SSE(full))/p] / MSE(full). If the addition of the new variables is significant, F is high. The null hypothesis is that the new variables are not significant; if F > F_critical, reject the null. F_critical is based on an F distribution with p and n − k − p − 1 degrees of freedom.

176 Example
Testing the effect of major on income: since major is represented by a dummy qualitative variable, it is encoded with more than one variable. For M categories of major, use M − 1 dummy variables.

177 Forecasting Applications
Univariate forecasts: Y is a variable evolving over time whose future values are to be predicted based on its past values and possibly other variables: Yt = β0 + β1Yt−1 + β2Yt−2 + … + βpYt−p + other variables + errort. Estimate a regression equation of order p for Y: the AR(p), autoregressive of order p, model.
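An AR(p) fit is ordinary least squares on lagged copies of the series. A minimal sketch (ours):

```python
import numpy as np

def fit_ar(y, p):
    # y_t = b0 + b1*y_{t-1} + ... + bp*y_{t-p}, estimated by OLS
    y = np.asarray(y, float)
    n = len(y)
    X = np.column_stack([np.ones(n - p)] +
                        [y[p - k:n - k] for k in range(1, p + 1)])
    beta, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return beta   # [b0, b1, ..., bp]
```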

178 Example
Predict the dollar based on its own past values and possibly other variables: inflation rate, interest rate, stock index, …. Test the significance of its own effect, or the effect of inflation, using the F test procedure.

179 Multivariate Forecasts
A set of variables affect each other; there are strong relations between the variables. dollar_t = f(previous values of the dollar, inflation, interest rate, stock index); but also inflation_t = f(previous values of the dollar, inflation, interest rate, stock index); and interest_rate_t = f(previous values of the dollar, inflation, interest rate, stock index).

180 Multivariate Forecasts (cont.)
Estimate separate regression equations for each variable. Use the F test procedure to test the significance of the overall regression, as well as the effects of individual variables. Use the estimated and refined equations to make forecasts.

