Correlation and Regression Ch 4

Why Regression and Correlation
We need to be able to analyze the relationship between two variables (up to now we have only looked at single variables). We can use one variable to predict the other, but regression and correlation do NOT imply causation. Example: in the late 1940s, analysts found a strong correlation between the amount of ice cream consumed and the rate of new polio cases. Did ice cream cause polio? No, but both peak in the summer months, which produces the strong correlation.

Scatter Plots (pg 158)

Possible Relationships (pg 159)
Positive linear
Negative linear
No apparent relationship
Nonlinear relationship

Correlation Coefficient (r)
How we quantify the strength and direction of the linear relationship. It is always between -1 and 1 inclusive: 1 or -1 is a perfect fit (the data points lie on a perfectly straight line), and 0 is no linear correlation at all.

Correlation Coefficient
r = Σ[(x_i − x̄)(y_i − ȳ)] / [(n − 1)·s_x·s_y], where s_x is the sample standard deviation of the x-values and s_y is the sample standard deviation of the y-values. It takes too long to calculate by hand, so use Excel®: =CORREL(array1, array2)
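As a minimal sketch (not from the textbook), the same computation can be done in Python on a small made-up lot-size / sale-price data set; the numbers below are hypothetical, but the result matches what Excel's =CORREL returns for the same arrays.

```python
# Sample correlation coefficient r (sketch with made-up data).
import statistics as stats

x = [0.5, 0.7, 1.0, 1.2, 1.5]      # hypothetical lot sizes (acres)
y = [120, 135, 150, 160, 175]      # hypothetical sale prices ($1000s)

n = len(x)
x_bar, y_bar = stats.mean(x), stats.mean(y)
s_x, s_y = stats.stdev(x), stats.stdev(y)    # sample standard deviations

# r = sum((x_i - x_bar)(y_i - y_bar)) / ((n - 1) * s_x * s_y)
r = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ((n - 1) * s_x * s_y)
print(round(r, 4))   # same value Excel's =CORREL(array1, array2) gives
```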

Test for Linear Correlation
Find the absolute value of r.
In Table G, go to row n and read off the critical value.
If |r| is GREATER than the critical value from Table G, the variables ARE linearly correlated (either positively or negatively).
If |r| is LESS THAN OR EQUAL to the critical value from Table G, the variables are NOT linearly correlated.
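A minimal sketch of that decision rule in Python; the critical value shown is only a placeholder, since the real one must be looked up in Table G for your sample size n.

```python
# Decision rule for the linear-correlation test (sketch only).
r = 0.95                 # correlation coefficient computed for the data (hypothetical)
critical_value = 0.632   # placeholder: look this up in Table G at row n

if abs(r) > critical_value:
    print("The variables ARE linearly correlated")
else:
    print("The variables are NOT linearly correlated")
```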

Lot/Sale Price Example
Is there a linear correlation between the size of the lot and the sale price for the data from the earlier slide?

Regression Lines: Least-Squares Method
Ŷ = mx + b (notice the notation for the y: it is called "y-hat"). The slope (m) is the estimated change in y per unit of x (how much y increases or decreases per unit of x). The y-intercept (b) is the estimated value of y when x is zero. Let's find the equation of the regression line for the lot/sale price example using Excel®.
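As a rough sketch (not the textbook's example), the same least-squares line can be fit in Python; the lot sizes and prices below are made up, and np.polyfit returns the slope m and intercept b that Excel's linear trendline would display for that data.

```python
# Fit y-hat = m*x + b by least squares (sketch with made-up data).
import numpy as np

x = np.array([0.5, 0.7, 1.0, 1.2, 1.5])   # hypothetical lot sizes (acres)
y = np.array([120, 135, 150, 160, 175])   # hypothetical sale prices ($1000s)

m, b = np.polyfit(x, y, deg=1)   # slope and intercept of the least-squares line
print(m, b)                      # same equation Excel displays on the trendline

print(m * 1.1 + b)               # interpolation: predicted price for a 1.1-acre lot
```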

Excel for the Least-Squares Method
Enter the data in Excel.
Insert chart: marked scatter (no lines connecting the points).
Click the plus sign next to your chart and check Trendline.
Click the arrow to the right of "Trendline" and go to More Options.
Select Type: Linear.
Options: check the display equation box and the show R-squared value box (we talk about R-squared later).
Pg 175, figure 4.15

Use of Regression Lines
Use regression lines to predict: interpolation (within the range of the plotted data) and extrapolation (outside the range of the plotted data). There is possible error in both interpolation and extrapolation (predicted value vs. observed value). The prediction error (or residual) is y − Ŷ.

SSE, SST, and SSR
We are going to briefly talk about three different measures that have ugly equations. Individually the numbers are not that useful, but at the end we will put them together to find a useful value, so just be patient. And, just so you know, we will use Excel® to calculate all of these values too.

Sum of Squares Error (SSE)
We want our prediction errors to be small. We use SSE = Σ(y − Ŷ)² to measure the total prediction error. When we use the least-squares criterion, our SSE is minimized. We will use Excel® in just a little bit.

Standard Error of the Estimate (s)
s = √(SSE / (n − 2)) gives the measure of a typical residual (typical prediction error, kind of like the "average" error). We want it to be small.

SST
Is SSE = 12 "small," which would indicate that our regression line is useful? We have to find a couple of other values to help answer this. The Total Sum of Squares is SST = (n − 1)s², where s² is the sample variance of the y-values, or equivalently SST = Σ(y − ȳ)².

SSR
The Sum of Squares Regression, SSR = Σ(Ŷ − ȳ)², measures the amount of improvement in the accuracy of our estimates when using the regression equation compared to relying only on the mean of the y-values.

SST, SSR, SSE
SST = SSR + SSE (pg 190, ex 4.16)
In Excel®: File, Options, Add-Ins, Go, check Analysis ToolPak, OK.
On your DATA tab, "Data Analysis" should appear on the right.
Select Data Analysis, select Regression, OK.
Fill in the y-values and x-values, OK.
In the table that appears, under ANOVA, the SS column is where we get SST (Total), SSR (Regression), and SSE (Residual).
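For comparison, here is a hedged Python sketch (again with made-up numbers) that computes the same SS column Excel's ANOVA table reports and checks that SST = SSR + SSE; it also shows the standard error of the estimate and r² built from the same pieces.

```python
# Sums of squares for simple linear regression (sketch with made-up data).
import numpy as np

x = np.array([0.5, 0.7, 1.0, 1.2, 1.5])   # hypothetical lot sizes (acres)
y = np.array([120, 135, 150, 160, 175])   # hypothetical sale prices ($1000s)
n = len(x)

m, b = np.polyfit(x, y, deg=1)            # least-squares slope and intercept
y_hat = m * x + b                         # predicted values

SSE = np.sum((y - y_hat) ** 2)            # residual ("error") sum of squares
SST = np.sum((y - y.mean()) ** 2)         # total sum of squares = (n - 1) * s^2
SSR = np.sum((y_hat - y.mean()) ** 2)     # regression sum of squares

print(np.isclose(SST, SSR + SSE))         # SST = SSR + SSE
print(np.sqrt(SSE / (n - 2)))             # standard error of the estimate s
print(SSR / SST)                          # coefficient of determination r^2
```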

Coefficient of Determination
r² = SSR/SST measures the goodness of fit of the regression equation to the data and is always between 0 and 1 inclusive. This was on our original trendline graph!

Coefficient of Determination
Read the tan box below problem 4.17 on pg 190. The closer the coefficient of determination is to 1, the better the fit of the regression equation to the data (0 is a horrible fit). Pg 194 #43