Presentation on theme: "Introduction to Regression Analysis"— Presentation transcript:
1 Introduction to Regression Analysis
We use sample data to:
estimate a population mean ($\mu$) or a difference of means ($\mu_1 - \mu_2$);
estimate a population proportion ($p$) or a difference of proportions ($p_1 - p_2$);
test a hypothesis about $\mu$ or ($\mu_1 - \mu_2$);
test a hypothesis about $p$ or ($p_1 - p_2$).
Now we want to use sample data to investigate the relationships among a group of variables and to create a mathematical model that can be used to predict the value of one of those variables in the future.
The process of finding a mathematical model (an equation) that best fits the data is known as regression analysis.
2 Introduction to Regression Analysis
The variable to be predicted (or modeled), y, is called the dependent variable.
The variables used to predict (or model) y are called independent variables and are denoted by the symbols x1, x2, x3, etc.
General form of the probabilistic model in regression:
$y = E(y) + \varepsilon$
where
$y$ = dependent variable
$E(y)$ = mean or expected value of y, the deterministic component
$\varepsilon$ = unexplainable, or random, error component
Estimation/prediction equation: $\hat{y}$, the estimate of $E(y)$ computed from sample data.
3 Form of the Simple Linear Regression Model
$y = \beta_0 + \beta_1 x + \varepsilon$
$\mu_{y|x} = \beta_0 + \beta_1 x$ is the mean value of the dependent variable y when the value of the independent variable is x.
$\beta_0$ is the y-intercept, the mean of y when x is 0 (meaningful only when values of x near 0 have been observed).
$\beta_1$ is the slope, the change in the mean of y per unit change in x (over the range of sample x-values).
$\varepsilon$ is an error term that describes the effect on y of all factors other than x.
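The split between the deterministic component and the random error can be illustrated with a small simulation. This is only a sketch: the parameter values are hypothetical, and drawing the error from a normal distribution is an assumption made here for illustration, not something the slide states.

```python
import random

# Hypothetical parameter values, chosen only for illustration.
BETA0, BETA1, SIGMA = 2.0, 0.5, 1.0


def mean_y(x):
    """Deterministic component: the mean of y at a given x."""
    return BETA0 + BETA1 * x


def simulate_y(x, rng):
    """One observed y: the mean at x plus a random error term.

    The normal error distribution is an assumption of this sketch.
    """
    epsilon = rng.gauss(0.0, SIGMA)
    return mean_y(x) + epsilon


rng = random.Random(0)  # fixed seed so the run is repeatable
for x in [1, 2, 3]:
    print(x, mean_y(x), round(simulate_y(x, rng), 3))
```

Each printed row shows x, the mean of y at that x, and one simulated observation that scatters around that mean.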
5 Regression Terms
$\beta_0$ and $\beta_1$ are called regression parameters: $\beta_0$ is the y-intercept and $\beta_1$ is the slope.
We do not know the true values of these parameters, so we must use sample data to estimate them.
$b_0$ is the estimate of $\beta_0$ and $b_1$ is the estimate of $\beta_1$.
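6 The Least Squares Point Estimates
Estimation/prediction equation: $\hat{y} = b_0 + b_1 x$
Slope: $b_1 = \dfrac{SS_{xy}}{SS_{xx}}$
y-intercept: $b_0 = \bar{y} - b_1 \bar{x}$
where $SS_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y})$, $SS_{xx} = \sum (x_i - \bar{x})^2$, and n = sample size.
MS EXCEL: =SLOPE(y range, x range) and =INTERCEPT(y range, x range)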
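As a rough illustration of the least squares point estimates, the sketch below computes the slope as SSxy / SSxx and the intercept as the mean of y minus the slope times the mean of x. The five data points are made up for this example and do not come from the slides.

```python
# Hypothetical sample data, not taken from the slides.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squares and cross-products about the means.
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_xx = sum((xi - x_bar) ** 2 for xi in x)

b1 = ss_xy / ss_xx        # least squares slope
b0 = y_bar - b1 * x_bar   # least squares y-intercept

print(f"y-hat = {b0:.2f} + {b1:.2f} x")
```

For the same data, Excel's =SLOPE and =INTERCEPT should agree with `b1` and `b0`.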
7 An Estimator of $\sigma^2$
$s^2 = \dfrac{SSE}{n - 2}$, with $SSE = \sum (y_i - \hat{y}_i)^2$
where n = sample size and $s = \sqrt{s^2}$ = standard deviation of error = standard error of estimate.
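A minimal sketch of this estimator, assuming the usual definition of SSE as the sum of squared residuals about the fitted line. The data set is hypothetical.

```python
import math

# Hypothetical sample data, not taken from the slides.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Least squares fit.
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar

# Sum of squared errors about the fitted line.
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

s2 = sse / (n - 2)   # estimate of the error variance
s = math.sqrt(s2)    # standard error of estimate

print(f"SSE = {sse:.3f}, s^2 = {s2:.3f}, s = {s:.4f}")
```

The divisor is n - 2 rather than n because two parameters, the intercept and the slope, are estimated from the same data.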
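8 A 100(1-$\alpha$)% confidence interval for the simple linear regression slope $\beta_1$
$b_1 \pm t_{\alpha/2}\, s_{b_1}$, with $s_{b_1} = \dfrac{s}{\sqrt{SS_{xx}}}$
where $t_{\alpha/2}$ is based on (n-2) degrees of freedom.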
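The interval can be sketched in a few lines. The data are hypothetical, and the critical value 3.182 is read from a standard t table for $\alpha/2 = 0.025$ with 3 degrees of freedom, so it applies only to this made-up sample of five points.

```python
import math

# Hypothetical sample data, not taken from the slides.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))   # standard error of estimate
s_b1 = s / math.sqrt(ss_xx)    # standard error of the slope

# t_{alpha/2} for alpha = 0.05 with n - 2 = 3 degrees of freedom (t table).
T_CRIT = 3.182

lower = b1 - T_CRIT * s_b1
upper = b1 + T_CRIT * s_b1
print(f"95% CI for the slope: ({lower:.3f}, {upper:.3f})")
```

With so few observations the interval is wide and includes 0, so this sample alone would not establish a nonzero slope.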
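9 Testing the Significance of the Slope
One-tailed test: $H_0: \beta_1 = 0$ versus $H_a: \beta_1 < 0$ (or $H_a: \beta_1 > 0$)
Two-tailed test: $H_0: \beta_1 = 0$ versus $H_a: \beta_1 \neq 0$
Test statistic: $t = \dfrac{b_1}{s_{b_1}}$, with $s_{b_1} = \dfrac{s}{\sqrt{SS_{xx}}}$
Rejection region, one-tailed: $t < -t_{\alpha}$ (or $t > t_{\alpha}$), where $t_{\alpha}$ is based on (n-2) degrees of freedom.
Rejection region, two-tailed: $|t| > t_{\alpha/2}$, where $t_{\alpha/2}$ is based on (n-2) degrees of freedom.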
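A short sketch of the two-tailed test, under the same assumptions as before: hypothetical data, and a critical value of 3.182 taken from a t table for $\alpha = 0.05$ and 3 degrees of freedom.

```python
import math

# Hypothetical sample data, not taken from the slides.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s_b1 = math.sqrt(sse / (n - 2)) / math.sqrt(ss_xx)

# Test statistic for H0: beta1 = 0.
t_stat = b1 / s_b1

# Two-tailed critical value, alpha = 0.05, df = n - 2 = 3 (t table).
T_CRIT = 3.182
reject_h0 = abs(t_stat) > T_CRIT

print(f"t = {t_stat:.4f}, reject H0: {reject_h0}")
```

For a one-tailed test, the comparison would instead use $t_{\alpha}$ and only the tail named in the alternative hypothesis.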
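10 The 100(1-$\alpha$)% confidence interval for the mean value of y for x = $x_p$
$\hat{y} \pm t_{\alpha/2}\, s \sqrt{\dfrac{1}{n} + \dfrac{(x_p - \bar{x})^2}{SS_{xx}}}$
where $t_{\alpha/2}$ is based on (n-2) degrees of freedom.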
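11 The 100(1-$\alpha$)% prediction interval for an individual y for x = $x_p$
$\hat{y} \pm t_{\alpha/2}\, s \sqrt{1 + \dfrac{1}{n} + \dfrac{(x_p - \bar{x})^2}{SS_{xx}}}$
where $t_{\alpha/2}$ is based on (n-2) degrees of freedom.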
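The confidence interval for the mean of y and the prediction interval for an individual y differ only by the extra 1 under the square root. The sketch below computes both half-widths at a hypothetical point $x_p = 4$, using made-up data and a t-table critical value for 3 degrees of freedom.

```python
import math

# Hypothetical sample data, not taken from the slides.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))

T_CRIT = 3.182   # t_{alpha/2}, alpha = 0.05, df = 3 (t table)
x_p = 4          # hypothetical value of x at which to estimate or predict

y_hat = b0 + b1 * x_p
distance = 1 / n + (x_p - x_bar) ** 2 / ss_xx

ci_half = T_CRIT * s * math.sqrt(distance)      # mean of y at x_p
pi_half = T_CRIT * s * math.sqrt(1 + distance)  # individual y at x_p

print(f"y-hat at x_p = {y_hat:.2f}")
print(f"95% CI for the mean: {y_hat - ci_half:.3f} to {y_hat + ci_half:.3f}")
print(f"95% PI for one y:    {y_hat - pi_half:.3f} to {y_hat + pi_half:.3f}")
```

The prediction interval is always the wider of the two, because an individual y carries its own random error in addition to the uncertainty in the fitted line.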
12 Simple Coefficient of Determination
$r^2 = \dfrac{\text{Explained variation}}{\text{Total variation}}$
About 100($r^2$)% of the sample variation in y can be explained by using x to predict y in the simple linear regression model.
[Figure: for an observation ($x_i$, $y_i$), the total variation is split into explained variation and unexplained variation.]
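One way to see the decomposition is to compute the three variations directly. This sketch assumes the usual sums of squares: total variation about the mean of y, explained variation of the fitted values about that mean, and unexplained variation of the observations about the fitted line. The data are hypothetical.

```python
# Hypothetical sample data, not taken from the slides.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

total = sum((yi - y_bar) ** 2 for yi in y)                     # total variation
explained = sum((yh - y_bar) ** 2 for yh in y_hat)             # explained variation
unexplained = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained variation

r2 = explained / total
print(f"total = {total:.2f}, explained = {explained:.2f}, unexplained = {unexplained:.2f}")
print(f"r^2 = {r2:.2f}")
```

For a least squares fit, the explained and unexplained variations add up to the total variation.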
13 The Coefficient of Correlation
$r = \dfrac{SS_{xy}}{\sqrt{SS_{xx}\, SS_{yy}}}$
r is used for a sample and $\rho$ (rho) for a population.
$-1 \le r \le 1$
r > 0 means that y increases as x increases.
r < 0 means that y decreases as x increases.
$r \approx 0$ means little or no linear relationship between y and x.
The closer r is to 1 or -1, the stronger the linear relationship.
High correlation does not imply causality; it shows only that a linear trend may exist between x and y.
Equivalently, $r = +\sqrt{r^2}$ when $b_1 > 0$, or $r = -\sqrt{r^2}$ when $b_1 < 0$.
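A brief sketch showing that the two routes to r agree: the direct formula from the sums of squares, and the square root of $r^2$ with the sign of the slope attached. The data are hypothetical, and computing $r^2$ as 1 - SSE/SSyy is a choice made for this example.

```python
import math

# Hypothetical sample data, not taken from the slides.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_yy = sum((yi - y_bar) ** 2 for yi in y)

# Route 1: direct formula.
r = ss_xy / math.sqrt(ss_xx * ss_yy)

# Route 2: square root of r^2, carrying the sign of the slope b1.
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
r2 = 1 - sse / ss_yy
r_from_r2 = math.copysign(math.sqrt(r2), b1)

print(f"r = {r:.4f}, sign(b1) * sqrt(r^2) = {r_from_r2:.4f}")
```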
14 Exercise
What is the range of values that the coefficient of determination can assume? ___
If the value of r is -0.96, what does this indicate about the dependent variable as the independent variable increases? __
If the correlation between sales and advertising is +0.6, what percent of the variation in sales can be attributed to advertising? __
What does the coefficient of determination equal if r = 0.89?
15 Exercise
In the regression equation, what does the letter "b" represent?
What is the null hypothesis to test the significance of the slope in a regression equation?
The regression equation is Ŷ = X, the sample size is 8, and the standard error of the slope is What is the test statistic to test the significance of the slope?