9. SIMPLE LINEAR REGRESSION AND CORRELATION

9. SIMPLE LINEAR REGRESSION AND CORRELATION
9.1 Regression and Correlation
9.2 Regression Model
9.3 Probabilistic Models
9.4 Fitting the Model: The Least Squares Approach
9.5 The Least-Squares Line
9.6 The Least-Squares Assumption
9.7 Model Assumptions of Simple Regression
9.8 Assessing the Utility of the Model: Making Inferences About the Slope
9.9 The Coefficient of Correlation
9.10 Calculating r²
9.11 Correlation Model
9.12 The Correlation Coefficient
9.13 The Coefficient of Determination
9.14 Using the Model for Estimation and Prediction

9.1 Regression and Correlation
Regression: helps ascertain the probable form of the relationship between variables, and is used to predict or estimate the value of one variable corresponding to a given value of another variable.
Correlation: measures the strength of the relationship between variables.

9.2 Regression Model
Two variables, X and Y, are of interest. The model is

y = β₀ + β₁x + ε

where
x = independent variable
y = dependent variable
ε = random error component
β₀ (beta zero) = y-intercept of the line
β₁ (beta one) = slope of the line: the amount of increase or decrease in the deterministic component of y for every one-unit increase or decrease in x.

Figure 9a. Regression Model

9.3 Probabilistic Models 9.3.1 General Form of Probabilistic Models y = Deterministic component + Random error Where y is the variable of interest. We always assume that the mean value of the random error equals 0. This is equivalent to assuming that the mean value of y, E(y), equals the deterministic component of the model; that is, E(y) = Deterministic component

9.3.2 A First-Order (Straight-Line) Probabilistic Model

y = β₀ + β₁x + ε

where
y = dependent or response variable (variable to be modeled)
x = independent or predictor variable (variable used as a predictor of y)
E(y) = β₀ + β₁x = deterministic component
ε (epsilon) = random error component
β₀ (beta zero) = y-intercept of the line, that is, the point at which the line intercepts or cuts through the y-axis (see Figure 9b below)
β₁ (beta one) = slope of the line, that is, the amount of increase (or decrease) in the deterministic component of y for every one-unit increase in x.
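As a sketch of this probabilistic model, one can simulate y = β₀ + β₁x + ε and check that the sample mean of y at a fixed x approaches the deterministic component E(y). The parameter values below are hypothetical, chosen only for illustration; the section defines the form of the model, not specific numbers.

```python
import random

# Hypothetical parameter values for illustration only.
beta0, beta1, sigma = 1.0, 0.7, 0.5   # intercept, slope, error std deviation

random.seed(0)

def simulate_y(x):
    """One draw from y = beta0 + beta1*x + epsilon, epsilon ~ N(0, sigma)."""
    epsilon = random.gauss(0.0, sigma)
    return beta0 + beta1 * x + epsilon

# The deterministic component E(y) = beta0 + beta1*x is the mean of y at x,
# because the random error has mean 0.
x = 3.0
expected_y = beta0 + beta1 * x
samples = [simulate_y(x) for _ in range(100_000)]
mean_y = sum(samples) / len(samples)
print(round(expected_y, 2), round(mean_y, 2))
```

With many simulated observations, the average of y at x = 3 settles near E(y) = 1.0 + 0.7(3) = 3.1, illustrating the assumption that the mean of the random error is 0.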

Figure 9b. The straight-line model

9.4 Fitting the Model: The Least Squares Approach

Table 9a. Reaction Time Versus Drug Percentage

Subject | Amount of Drug x (%) | Reaction Time y (seconds)
--------|----------------------|--------------------------
1       |                      |
2       |                      |
3       |                      |
4       |                      |
5       |                      |

Figure 9c. (1) Scattergram and (2) visual straight line fitted for the data in Table 9a

9.5 The Least-Squares Line
The desired line, called the least-squares line, is obtained by the method of least squares:

ŷ = b₀ + b₁x

where
ŷ = value on the vertical axis
x = value on the horizontal axis
b₀ = point where the line crosses the vertical axis (y-intercept)
b₁ = the amount by which ŷ changes for each one-unit change in x (slope).

9.5.1 Definition of the Least-Squares Line
The least-squares line is the one with the following two properties:
1. The sum of the errors (SE) equals 0.
2. The sum of squared errors (SSE) is smaller than that for any other straight-line model.

9.5.2 Formulas for the Least-Squares Estimates

Slope: b₁ = SSxy / SSxx
y-intercept: b₀ = ȳ − b₁x̄

where
SSxy = Σ(xᵢ − x̄)(yᵢ − ȳ)
SSxx = Σ(xᵢ − x̄)²
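A minimal Python sketch of these formulas follows. The numeric entries of Table 9a did not survive extraction, so the x (drug %) and y (reaction time) lists below are hypothetical, assumed only for the sake of the example.

```python
# Hypothetical data, illustrative only.
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# SSxy = sum of (x - x_bar)(y - y_bar);  SSxx = sum of (x - x_bar)^2
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_xx = sum((xi - x_bar) ** 2 for xi in x)

b1 = ss_xy / ss_xx           # slope estimate: 7 / 10 = 0.7
b0 = y_bar - b1 * x_bar      # intercept estimate: 2 - 0.7*3 = -0.1
print(b1, round(b0, 1))
```

For these assumed data the fitted least-squares line is ŷ = −0.1 + 0.7x.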

Figure 9d. Scatter Diagram

The total deviation (y − ȳ): the vertical distance of an observed value from the mean.
The explained deviation (ŷ − ȳ): shows how much the total deviation is reduced when the regression line is fitted to the points.
The unexplained deviation (y − ŷ): the portion of the total deviation not accounted for by the regression line, that is, the vertical distance from an observed point to the line.
Total deviation = explained deviation + unexplained deviation.

Total sum of squares (SST) = Σ(yᵢ − ȳ)²: measures the total variation in the observed values of Y.
Explained sum of squares (SSR) = Σ(ŷᵢ − ȳ)²: measures the amount of the total variability in the observed values of Y that is accounted for by the linear relationship between the observed values of X and Y.
Unexplained sum of squares (SSE) = Σ(yᵢ − ŷᵢ)²: measures the dispersion of the observed Y values about the regression line.
These satisfy SST = SSR + SSE.
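The three sums of squares, and the identity SST = SSR + SSE, can be checked numerically. The data below are hypothetical (the table's values were lost in extraction):

```python
# Hypothetical data, illustrative only.
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ss_xx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]   # fitted values

sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained variation
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained variation
print(round(sst, 2), round(ssr, 2), round(sse, 2))     # SST = SSR + SSE
```

Here SST = 6.0 splits into SSR = 4.9 explained by the line and SSE = 1.1 left unexplained.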

9.6 The Least-Squares Assumption
Consider now a reasonable criterion for estimating α and β from data. The method of ordinary least squares (OLS) determines values of α and β (since these will be estimated from data, we will replace α and β with the Latin letters a and b)

so that the sum of the squared vertical deviations (residuals) between the data and the fitted line, Residuals = Data − Fit, is less than the sum of the squared vertical deviations from any other straight line that could be fitted through the data: minimize Σ(Data − Fit)².

  A "vertical deviation" is the vertical distance from an observed point to the line. Each deviation in the sample is squared and the least-squares line is defined to be the straight line that makes the sum of these squared deviations a minimum: Data = a + bX + Residuals.

Figure 9e(a) illustrates the regression relationship between two variables, Y and X. The arithmetic mean of the observed values of Y is denoted by ȳ. The vertical dashed lines represent the total deviations of each value y from the mean value ȳ. Part (b) of Figure 9e shows a linear least-squares regression line fitted to the observed points.

Figure 9e: The total variation of Y and the least-squares regression between Y and X. (a) Total variation (b) Least-squares regression

The total variation can be expressed in terms of (1) the variation explained by the regression and (2) a residual portion called the unexplained variation.

Figure 9f(a) shows the explained variation, which is expressed by the vertical distance between any fitted (predicted) value and the mean, or ŷ − ȳ. The circumflex (^) over the y is used to represent fitted values determined by a model. Thus, it is also customary to write a = α̂ and b = β̂. Figure 9f(b) shows the unexplained or residual variation: the vertical distance between the observed values and the predicted values (y − ŷ).

Figure 9f. The explained and unexplained variation in least-squares regression. (a)  Explained variation (b) Unexplained variation

9.7 Model Assumptions of Simple Regression
Assumption 1: The mean of the probability distribution of ε is 0. That is, the average of the values of ε over an infinitely long series of experiments is 0 for each setting of the independent variable x. This assumption implies that the mean value of y, E(y), for a given value of x is E(y) = β₀ + β₁x.

Assumption 2: The variance of the probability distribution of ε is constant for all settings of the independent variable x. For our straight-line model, this assumption means that the variance of ε is equal to a constant, say σ², for all values of x.
Assumption 3: The probability distribution of ε is normal.
Assumption 4: The values of ε associated with any two observed values of y are independent. That is, the value of ε associated with one value of y has no effect on the values of ε associated with other y values.

Figure 9g. The probability distribution of ε

9.8 Assessing the Utility of the Model: Making Inferences About the Slope

9.8.1 A Test of Model Utility: Simple Linear Regression

Test statistic: t = b₁ / s(b₁), where s(b₁) = s / √SSxx

One-Tailed Test: H₀: β₁ = 0; Hₐ: β₁ < 0 (or Hₐ: β₁ > 0). Rejection region: t < −t(α) (or t > t(α)).
Two-Tailed Test: H₀: β₁ = 0; Hₐ: β₁ ≠ 0. Rejection region: |t| > t(α/2).

where t(α) and t(α/2) are based on (n − 2) degrees of freedom. Assumption: refer to the four assumptions about ε in Section 9.7.

Figure 9h. Rejection region and calculated t value for testing H₀: β₁ = 0 versus Hₐ: β₁ ≠ 0

9.8.2 A Confidence Interval for the Simple Linear Regression Slope

b₁ ± t(α/2) · s(b₁)

where the estimated standard error of b₁ is calculated by s(b₁) = s / √SSxx, and t(α/2) is based on (n − 2) degrees of freedom. Assumption: refer to the four assumptions about ε in Section 9.7.
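The test statistic and confidence interval for the slope can be sketched as follows, again on hypothetical data (the table's values were lost in extraction). The critical value 3.182 is t(.025) with 3 degrees of freedom, taken from a standard t table.

```python
import math

# Hypothetical data, illustrative only.
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ss_xx
b0 = y_bar - b1 * x_bar

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))      # estimate of sigma, on n-2 = 3 df
s_b1 = s / math.sqrt(ss_xx)       # estimated standard error of b1

t_stat = b1 / s_b1                # test statistic for H0: beta1 = 0
t_crit = 3.182                    # t(.025) with 3 df, from a t table
ci = (b1 - t_crit * s_b1, b1 + t_crit * s_b1)
print(round(t_stat, 2), tuple(round(v, 2) for v in ci))
```

The interval excluding 0 (and |t| exceeding the critical value) would indicate that x contributes information for predicting y.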

9.9 The Coefficient of Correlation
Definition: The Pearson product moment coefficient of correlation, r, is a measure of the strength of the linear relationship between two variables x and y. It is computed (for a sample of n measurements on x and y) as follows:

r = SSxy / √(SSxx · SSyy)

Figure 9i. Value of r and their implication 1) Positive r : y increases as x increases

2) r near zero: little or no relationship between y and x

3) Negative r : y decreases as x increases

4) r = 1: a perfect positive relationship between y and x

5) r = -1: a perfect negative relationship between y and x

6) r near 0: little or no relationship between y and x

9.10 Calculating r²

r² = (SST − SSE) / SST = 1 − SSE / SST

where r = the sample correlation coefficient; equivalently, r² is the square of r.
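The correlation coefficient r and its square can be computed directly from the sums of squares, here on hypothetical data (the table's values were lost in extraction):

```python
import math

# Hypothetical data, illustrative only.
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_yy = sum((yi - y_bar) ** 2 for yi in y)

r = ss_xy / math.sqrt(ss_xx * ss_yy)   # Pearson correlation coefficient
r2 = r ** 2                            # coefficient of determination
print(round(r, 3), round(r2, 3))
```

For these data r ≈ 0.904 and r² ≈ 0.817, i.e., about 82% of the sample variation in y is explained by the linear relationship with x, matching 1 − SSE/SST.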

Figure 9j. r² as a measure of the closeness of fit of the sample regression line to the sample observations

9.11 Correlation Model
We have what is called the correlation model when Y and X are both random variables. Involving two variables implies a co-relationship between them, with one variable treated as dependent and the other as independent.

9.12 The Correlation Coefficient (ρ)
Measures the strength of the linear relationship between X and Y. It may assume any value between −1 and +1. If ρ = 1, there is perfect direct linear correlation; if ρ = −1, there is perfect inverse linear correlation.

9.13 The Coefficient of Determination Figure 9k. A comparison of the sum of squares of deviations for two models


9.13.1 Coefficient of Determination
Definition:

r² = 1 − SSE / SST

It represents the proportion of the total sample variability around ȳ that is explained by the linear relationship between y and x. (In simple linear regression, it may also be computed as the square of the coefficient of correlation r.)

9.14 Using The Model for Estimation and Prediction

9.14.1 Sampling Errors for the Estimator of the Mean of y and the Predictor of an Individual New Value of y
The standard deviation of the sampling distribution of the estimator ŷ of the mean of y at a specific value of x, say xp, is

σ(ŷ) = σ √(1/n + (xp − x̄)² / SSxx)

where σ is the standard deviation of the random error ε. We refer to σ(ŷ) as the standard error of ŷ.

The standard deviation of the prediction error for the predictor ŷ of an individual new y value at a specific value of x is

σ(y − ŷ) = σ √(1 + 1/n + (xp − x̄)² / SSxx)

where σ is the standard deviation of the random error ε. We refer to σ(y − ŷ) as the standard error of prediction.

9.14.2 A Confidence Interval for the Mean Value of y at x = xp

ŷ ± t(α/2) · s √(1/n + (xp − x̄)² / SSxx)

where t(α/2) is based on (n − 2) degrees of freedom.

9.14.3 A Prediction Interval for an Individual New Value of y at x = xp

ŷ ± t(α/2) · s √(1 + 1/n + (xp − x̄)² / SSxx)

where t(α/2) is based on (n − 2) degrees of freedom.
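Both intervals at x = 4 can be sketched as below, on hypothetical data (the table's values were lost in extraction); note the prediction interval is always wider than the confidence interval because of the extra 1 under the square root.

```python
import math

# Hypothetical data, illustrative only.
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ss_xx
b0 = y_bar - b1 * x_bar
s = math.sqrt(sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2))
t_crit = 3.182                    # t(.025) with n-2 = 3 df, from a t table

xp = 4                            # the x value of interest
y_hat = b0 + b1 * xp              # point estimate / prediction at xp
se_mean = s * math.sqrt(1 / n + (xp - x_bar) ** 2 / ss_xx)      # for E(y)
se_pred = s * math.sqrt(1 + 1 / n + (xp - x_bar) ** 2 / ss_xx)  # for new y
ci = (y_hat - t_crit * se_mean, y_hat + t_crit * se_mean)
pi = (y_hat - t_crit * se_pred, y_hat + t_crit * se_pred)
print(round(y_hat, 2),
      tuple(round(v, 2) for v in ci),
      tuple(round(v, 2) for v in pi))
```

Both intervals are centered at ŷ = b₀ + b₁xp; only their widths differ.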

Figure 9l. A 95% confidence interval for mean sales and a prediction interval for drug concentration when x = 4

Figure 9m. Error of estimating the mean value of y for a given value of x

Figure 9n. Error of predicting a future value of y for a given value of x

Figure 9o. Confidence intervals for mean values and prediction intervals for new values