# Introduction to Regression Analysis

Introduction to Regression Analysis
So far, we have used sample data to:

- estimate a population mean ($\mu$) or a difference of means ($\mu_1 - \mu_2$)
- estimate a population proportion ($p$) or a difference of proportions ($p_1 - p_2$)
- test a hypothesis about $\mu$ or ($\mu_1 - \mu_2$)
- test a hypothesis about $p$ or ($p_1 - p_2$)

Now we want to use sample data to investigate the relationships among a group of variables and to create a mathematical model that can be used to predict the value of one of those variables in the future. The process of finding a mathematical model (an equation) that best fits the data is known as regression analysis.

Introduction to Regression Analysis
The variable to be predicted (or modeled), y, is called the dependent variable. The variables used to predict (or model) y are called independent variables and are denoted by the symbols $x_1, x_2, x_3$, etc.

General form of the probabilistic model in regression:

$$y = E(y) + \varepsilon$$

where

- $y$ = dependent variable
- $E(y)$ = mean or expected value of y, the deterministic component
- $\varepsilon$ = unexplainable, or random error, component

The estimation/prediction equation, fitted from sample data, gives $\hat{y}$, an estimate of $E(y)$.

Form of The Simple Linear Regression Model
$$y = \beta_0 + \beta_1 x + \varepsilon$$

- $\mu_{y \mid x} = \beta_0 + \beta_1 x$ is the mean value of the dependent variable y when the value of the independent variable is x.
- $\beta_0$ is the y-intercept, the mean of y when x is 0 (meaningful only when values of x near 0 have been observed).
- $\beta_1$ is the slope, the change in the mean of y per unit change in x (over the range of sample x-values).
- $\varepsilon$ is an error term that describes the effect on y of all factors other than x.
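To make the roles of the deterministic part and the error term concrete, here is a minimal simulation sketch. The parameter values and the normally distributed error are assumptions chosen purely for illustration; the slides do not specify them.

```python
import random

# Hypothetical parameter values, chosen only for illustration.
BETA0, BETA1, SIGMA = 2.0, 0.5, 1.0

def simulate_y(x, rng):
    """Draw one y from y = beta0 + beta1*x + epsilon.

    Assumption: epsilon is normal with mean 0 and standard deviation SIGMA.
    """
    epsilon = rng.gauss(0.0, SIGMA)
    return BETA0 + BETA1 * x + epsilon

rng = random.Random(0)  # fixed seed so the run is repeatable
x = 4.0
draws = [simulate_y(x, rng) for _ in range(10_000)]
sample_mean = sum(draws) / len(draws)

print(f"mean of y at x = {x}: {BETA0 + BETA1 * x}")
print(f"average of simulated y values: {sample_mean:.3f}")
```

Individual draws scatter around the line, but their average settles near the mean value of y at that x, which is what the deterministic component describes.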

The Simple Linear Regression Model Illustrated

Regression Terms
β0 and β1 are called regression parameters: β0 is the y-intercept and β1 is the slope. We do not know the true values of these parameters, so we must use sample data to estimate them. b0 is the estimate of β0 and b1 is the estimate of β1.

The Least Squares Point Estimates
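Estimation/prediction equation:

$$\hat{y} = b_0 + b_1 x$$

Slope:

$$b_1 = \frac{SS_{xy}}{SS_{xx}}$$

y-intercept:

$$b_0 = \bar{y} - b_1 \bar{x}$$

where

$$SS_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - \frac{\left(\sum x_i\right)\left(\sum y_i\right)}{n}, \qquad SS_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}$$

and n = sample size.

MS EXCEL: =SLOPE(y range, x range) and =INTERCEPT(y range, x range)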
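As a sketch of how the point estimates can be computed by hand, the following uses the usual least squares formulas $b_1 = SS_{xy}/SS_{xx}$ and $b_0 = \bar{y} - b_1\bar{x}$. The five data points are made up for illustration and do not come from the slides.

```python
def least_squares(x, y):
    """Return (b0, b1), the least squares intercept and slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of squares and cross products about the means.
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical sample data (illustration only).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = least_squares(x, y)
print(f"y-hat = {b0:.2f} + {b1:.2f} x")
```

For this made-up sample the fitted line is ŷ = 2.20 + 0.60x, which should agree with what =INTERCEPT and =SLOPE return in Excel for the same ranges.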

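An Estimator of σ²
$$s^2 = \frac{SSE}{n-2}, \qquad SSE = \sum (y_i - \hat{y}_i)^2 = SS_{yy} - b_1 SS_{xy}$$

where n = sample size.

$s = \sqrt{s^2}$ = standard deviation of error = standard error of estimate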

A 100(1-α)% confidence interval for the simple linear regression slope β1
$$b_1 \pm t_{\alpha/2}\, s_{b_1}, \qquad s_{b_1} = \frac{s}{\sqrt{SS_{xx}}}$$

where $t_{\alpha/2}$ is based on (n-2) degrees of freedom.
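The interval can be sketched as follows, assuming the standard form $b_1 \pm t_{\alpha/2}\, s_{b_1}$ with $s_{b_1} = s/\sqrt{SS_{xx}}$. The data are hypothetical, and the critical value is passed in by hand (3.182 is the t-table value for a 95% interval with 3 degrees of freedom) rather than computed.

```python
import math

def slope_confidence_interval(x, y, t_crit):
    """Return (lower, upper) for the slope, given a t critical value
    looked up for n - 2 degrees of freedom."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    s_b1 = s / math.sqrt(ss_xx)  # standard error of the slope
    margin = t_crit * s_b1
    return b1 - margin, b1 + margin

# Hypothetical sample data (illustration only); n = 5, so df = 3.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

lower, upper = slope_confidence_interval(x, y, t_crit=3.182)
print(f"95% CI for the slope: ({lower:.3f}, {upper:.3f})")
```

With so few points the interval is wide and includes 0, so this made-up sample alone would not establish a nonzero slope.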

Testing the Significance of the Slope
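| | One-Tailed Test | Two-Tailed Test |
|---|---|---|
| Null hypothesis | $H_0: \beta_1 = 0$ | $H_0: \beta_1 = 0$ |
| Alternative hypothesis | $H_a: \beta_1 < 0$ (or $\beta_1 > 0$) | $H_a: \beta_1 \neq 0$ |
| Rejection region | $t < -t_\alpha$ (or $t > t_\alpha$) | $\lvert t \rvert > t_{\alpha/2}$ |

Test statistic:

$$t = \frac{b_1}{s_{b_1}} = \frac{b_1}{s/\sqrt{SS_{xx}}}$$

where $t_\alpha$ and $t_{\alpha/2}$ are based on (n-2) degrees of freedom.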
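A minimal sketch of the two-tailed test, assuming the usual test statistic $t = b_1 / s_{b_1}$ with $s_{b_1} = s/\sqrt{SS_{xx}}$. The data are hypothetical, and the critical value 3.182 (α = 0.05, 3 degrees of freedom) is taken from a t-table rather than computed.

```python
import math

def slope_t_statistic(x, y):
    """Return t = b1 / s_b1 for testing H0: beta1 = 0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    return b1 / (s / math.sqrt(ss_xx))

# Hypothetical sample data (illustration only); n = 5, so df = 3.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

t = slope_t_statistic(x, y)
t_crit = 3.182  # t-table value for alpha/2 = 0.025, df = 3
reject_h0 = abs(t) > t_crit  # two-tailed rejection region
print(f"t = {t:.4f}, reject H0: {reject_h0}")
```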

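The 100(1-α)% confidence interval for the mean value of y for x = xp
$$\hat{y} \pm t_{\alpha/2}\, s \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}}$$

where $t_{\alpha/2}$ is based on (n-2) degrees of freedom.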

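The 100(1-α)% prediction interval for an individual y for x = xp
$$\hat{y} \pm t_{\alpha/2}\, s \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}}$$

where $t_{\alpha/2}$ is based on (n-2) degrees of freedom.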
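The two intervals differ only by the extra "1 +" under the square root, which the sketch below makes visible. It assumes the standard half-widths $t_{\alpha/2}\, s \sqrt{1/n + (x_p - \bar{x})^2/SS_{xx}}$ for the mean of y and $t_{\alpha/2}\, s \sqrt{1 + 1/n + (x_p - \bar{x})^2/SS_{xx}}$ for an individual y. The data, the value x_p = 4, and the t-table value 3.182 (95%, 3 degrees of freedom) are illustrative assumptions.

```python
import math

def intervals_at(x, y, x_p, t_crit):
    """Return (y_hat, ci_half_width, pi_half_width) at x = x_p."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    y_hat = b0 + b1 * x_p
    distance = 1 / n + (x_p - x_bar) ** 2 / ss_xx
    ci_half = t_crit * s * math.sqrt(distance)      # mean value of y
    pi_half = t_crit * s * math.sqrt(1 + distance)  # individual y
    return y_hat, ci_half, pi_half

# Hypothetical sample data (illustration only); n = 5, so df = 3.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

y_hat, ci_half, pi_half = intervals_at(x, y, x_p=4, t_crit=3.182)
print(f"y-hat at x = 4: {y_hat:.2f}")
print(f"95% CI for the mean of y: {y_hat:.2f} +/- {ci_half:.3f}")
print(f"95% PI for an individual y: {y_hat:.2f} +/- {pi_half:.3f}")
```

The prediction interval is always the wider of the two, because it must also cover the random error of a single observation.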

Simple Coefficient of Determination
$$r^2 = \frac{\text{Explained Variation}}{\text{Total Variation}}$$

About $100(r^2)\%$ of the sample variation in y can be explained by using x to predict y in the simple linear regression model.

Figure: at an observation $(x_i, y_i)$, the total variation is split into explained variation and unexplained variation.
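The decomposition can be checked numerically. This sketch assumes the usual definitions: total variation $\sum (y_i - \bar{y})^2$, explained variation $\sum (\hat{y}_i - \bar{y})^2$, and unexplained variation $\sum (y_i - \hat{y}_i)^2$. The data are hypothetical.

```python
def variation_decomposition(x, y):
    """Return (total, explained, unexplained) variation for a least squares fit."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    total = sum((yi - y_bar) ** 2 for yi in y)
    explained = sum((fi - y_bar) ** 2 for fi in fitted)
    unexplained = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return total, explained, unexplained

# Hypothetical sample data (illustration only).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

total, explained, unexplained = variation_decomposition(x, y)
r_squared = explained / total
print(f"total = {total:.2f}, explained = {explained:.2f}, unexplained = {unexplained:.2f}")
print(f"r^2 = {r_squared:.2f}")
```

For this made-up sample, about 60% of the variation in y is explained by x.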

The coefficient of correlation
$$r = \frac{SS_{xy}}{\sqrt{SS_{xx}\, SS_{yy}}}$$

Equivalently, $r = +\sqrt{r^2}$ when $b_1 > 0$, or $r = -\sqrt{r^2}$ when $b_1 < 0$.

- r denotes the sample correlation and $\rho$ (rho) the population correlation.
- $-1 \le r \le 1$
- r > 0 means that y increases as x increases.
- r < 0 means that y decreases as x increases.
- $r \approx 0$ means little or no linear relationship between y and x.
- The closer r is to 1 or -1, the stronger the linear relationship.
- High correlation does not imply causality. Only a linear trend may exist between x and y.
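The two ways of obtaining r can be compared in a few lines. This sketch assumes the standard formula $r = SS_{xy}/\sqrt{SS_{xx}\,SS_{yy}}$ and the rule that r takes the sign of the slope $b_1$. The data are hypothetical.

```python
import math

def correlation(x, y):
    """Return (r, r_from_r2): r computed directly, and r rebuilt as
    sqrt(r^2) with the sign of the slope b1."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    r = ss_xy / math.sqrt(ss_xx * ss_yy)
    b1 = ss_xy / ss_xx
    r_squared = ss_xy ** 2 / (ss_xx * ss_yy)
    r_from_r2 = math.copysign(math.sqrt(r_squared), b1)
    return r, r_from_r2

# Hypothetical sample data (illustration only).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r, r_from_r2 = correlation(x, y)
print(f"r = {r:.4f}, sign(b1) * sqrt(r^2) = {r_from_r2:.4f}")
```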

Exercise

1. What is the range of values that the coefficient of determination can assume? ___
2. If the value of r is -0.96, what does this indicate about the dependent variable as the independent variable increases? ___
3. If the correlation between sales and advertising is +0.6, what percent of the variation in sales can be attributed to advertising? ___
4. What does the coefficient of determination equal if r = 0.89?

Exercise

1. In the regression equation, what does the letter "b" represent?
2. What is the null hypothesis to test the significance of the slope in a regression equation?
3. The regression equation is Ŷ = X, the sample size is 8, and the standard error of the slope is … What is the test statistic to test the significance of the slope?

Exercise

- Page 488, no. 26
- Page 494, no. 31
- Page 500, no. 38