REGRESSION
Goal: predict one variable (say Y) using another variable (say X) by setting up an equation connecting X and Y. In linear regression the equation is linear: y = α + βx, where α is the y-intercept and β is the slope. We fit a line, the regression line y = α + βx, to the data (scatter plot).

REGRESSION LINE – understanding the coefficients
Regression line: y = α + βx, α = y-intercept, β = slope.
Example: the study of income and savings from the last lecture. X = income, Y = savings, data in thousands of $. Suppose y = −4 + 0.14x.
Slope: change in y per unit increase in x. For a $1,000 increase in income, savings increase by 0.14($1,000) = $140.
Intercept: value of y when x = 0. If one has no income, one has "−4($1,000)" = −$4,000 in savings. Nonsense in this case.
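The slope interpretation can be sanity-checked with a tiny sketch, using the rounded line y = −4 + 0.14x from this slide (the function name `predict_savings` is just for illustration):

```python
# Rounded fitted line from the slide: savings = -4 + 0.14 * income,
# both measured in thousands of $.
def predict_savings(income):
    return -4 + 0.14 * income

# Slope interpretation: a one-unit ($1,000) increase in income changes
# the predicted savings by the slope, 0.14 thousand = $140.
delta = predict_savings(51) - predict_savings(50)
print(delta)  # ~0.14, i.e. $140
```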

REGRESSION LINE – least squares principle
How do we find the line of best fit to the data? Use the least squares principle. Given data (x_i, y_i), the observed value of y is y_i and the fitted value of y is α + βx_i. Find the line, i.e. find α and β, that minimizes
Σ_i (observed − fitted)² = Σ_i (y_i − α − βx_i)² = Σ(residuals)² = Σ(errors)².

REGRESSION LINE – formulas
The least squares line y = a + bx has slope b and intercept a that minimize Σ_i (y_i − a − bx_i)². The solution to this minimization problem is
b = Σ_i (x_i − x̄)(y_i − ȳ) / Σ_i (x_i − x̄)² = (Σ_i x_i y_i − n x̄ ȳ) / (Σ_i x_i² − n x̄²)
and
a = ȳ − b x̄.
Both a and b are sample estimates of α and β. Finally, the fitted regression equation/line is ŷ = a + bx.
NOTE: the slope of the regression line has the same sign as r_XY.
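These formulas translate directly into a minimal Python sketch (the data values below are made up for illustration):

```python
def least_squares(xs, ys):
    """Fit y = a + b*x by least squares; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # b = sum (x_i - xbar)(y_i - ybar) / sum (x_i - xbar)^2
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x  # a = ybar - b * xbar
    return a, b

a, b = least_squares([1, 2, 3, 4], [2.1, 4.0, 6.2, 7.9])
```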

EXAMPLE
Income and savings. Find the regression line.
Solution: Recall the summary statistics: X = income, Y = savings, n = 10, Σx_i = 463, Σx_i² = 23533, Σy_i = 27.4, r = 0.963.
Additional stats (MINITAB output, descriptive statistics): N, Mean, StDev, SE Mean, Minimum, Maximum for income and savings.
Then b = 0.141 and a = ȳ − b x̄ = 2.74 − 0.141(46.3) ≈ −3.79.
The regression line is: savings = 0.141(income) − 3.79, in thousands of $.
Range of applicability of the regression equation ≈ the range of the data.
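The intercept can be checked numerically from the summary statistics on this slide (n = 10 is inferred from the df = 8 used in the later inference slides):

```python
# Summary statistics from the slide (income and savings, thousands of $).
n = 10
sum_x, sum_y = 463, 27.4
b = 0.141                 # fitted slope

mean_x = sum_x / n        # 46.3
mean_y = sum_y / n        # 2.74
a = mean_y - b * mean_x   # intercept: ybar - b * xbar
print(round(a, 2))        # ~ -3.79
```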

INFERENCE FOR REGRESSION: t-test
The main purpose of regression is prediction of y from x. For the prediction to be meaningful, we need y to depend significantly on x. In terms of the regression equation y = α + βx, we need β ≠ 0.
Goal: test the hypothesis Ho: β = 0 (y does not depend on x).
The test statistic is based on b, the point estimate of β:
t = b / SE(b), where SE(b) = s / √(Σ_i (x_i − x̄)²) and s² = SSE/(n − 2).
Under Ho, the test statistic has a t distribution with df = n − 2. For a two-sided Ha, we reject Ho if |t| > t_{α/2}, where α is the significance level of the test. One-sided alternatives, as usual.
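The test statistic can be computed end-to-end in a short sketch (hypothetical data; `slope_t_stat` is an illustrative name, not from the lecture):

```python
import math

def slope_t_stat(xs, ys):
    """t = b / SE(b) for Ho: beta = 0, with SE(b) = s / sqrt(Sxx)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    a = mean_y - b * mean_x
    # s^2 = SSE / (n - 2), where SSE is the sum of squared residuals.
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))
    return b / (s / math.sqrt(sxx))  # compare with t_{alpha/2}, df = n - 2

t = slope_t_stat([1, 2, 3, 4], [2.1, 4.0, 6.2, 7.9])
```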

EXAMPLE
Income and savings. Does the amount of savings depend significantly on income? Use significance level 5%.
Solution. Ho: β = 0 (savings do not depend on income); Ha: β ≠ 0 (savings depend on income).
Test statistic: t = b / SE(b) = 10.1. Critical number: t_{0.025}(8) = 2.306. Since t = 10.1 > 2.306, reject Ho. Savings depend significantly on income. Estimated p-value: 2P(T > 10.1) ≈ 0.

(1−α)100% CONFIDENCE INTERVAL FOR β
A (1−α)100% CI for β is b ± t_{α/2} · SE(b), where t_{α/2} is the percentile from a t distribution with n − 2 df.
Example. Income and savings. Find a 90% CI for the slope of the regression line of savings (y) on income (x).
Solution. 90% CI, so α = 0.1 and α/2 = 0.05; df = 8, t_{0.05} = 1.86. With SE(b) = b/t = 0.141/10.1 ≈ 0.0140, the 90% CI for β is 0.141 ± 1.86(0.0140) ≈ (0.115, 0.167).
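The worked interval can be checked numerically. Recovering SE(b) from the reported test statistic t = b/SE(b) = 10.1 is an assumption of this sketch:

```python
b = 0.141          # fitted slope
t_stat = 10.1      # t statistic from the t-test slide
se_b = b / t_stat  # SE(b) recovered from t = b / SE(b)

t_crit = 1.86      # t_{0.05} with 8 df
lo = b - t_crit * se_b
hi = b + t_crit * se_b
print(round(lo, 3), round(hi, 3))  # ~ 0.115  0.167
```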

PREDICTION
Two possibilities. Given a value of x, say x*:
1. Predict the average value of y for x = x*, or
2. Predict an individual value of y for x = x*.
"Average" error/residual: s = √(SSE/(n − 2)).
In both cases the point prediction uses the regression equation: ŷ* = a + bx*.
Predicting the average value: standard error s √(1/n + (x* − x̄)² / Σ_i (x_i − x̄)²); the (1−α)100% confidence interval for the predicted mean value is ŷ* ± t_{α/2} times this standard error.
Predicting an individual value: standard error s √(1 + 1/n + (x* − x̄)² / Σ_i (x_i − x̄)²); the (1−α)100% prediction interval for the individual future value is ŷ* ± t_{α/2} times this standard error.

PREDICTION, contd.
NOTE: The prediction interval for an individual value is longer than the confidence interval for the mean. This is because the variability in an individual value is larger than the variability in the mean.
NOTE: Both intervals become longer as x* moves further from the center of the data (further from x̄).
Example. Income and savings. Find point estimates, a 90% CI for the mean savings of families with income of $50k, and a 90% PI for the savings of a family with income of $50k.
Solution: Both intervals need t_{0.05} with df = 8: t_{0.05} = 1.86.
Point estimates: The average amount of savings for families with income of $50k is $3,262. For a family with income of $50k, we predict savings of $3,262.

EXAMPLE, contd.
"Average" residual: s ≈ 0.635.
90% CI: 3.262 ± 1.86(0.2073) = (2.877, 3.648).
90% PI: 3.262 ± 1.86(0.6681) = (2.02, 4.504) — longer than the CI!
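Both intervals can be re-derived from the slide's numbers; small rounding differences from the slide are expected:

```python
y_hat = 3.262                      # predicted savings at income = $50k (thousands)
t_crit = 1.86                      # t_{0.05}, 8 df
se_mean, se_ind = 0.2073, 0.6681   # standard errors from the slide

ci = (y_hat - t_crit * se_mean, y_hat + t_crit * se_mean)  # CI for the mean
pi = (y_hat - t_crit * se_ind, y_hat + t_crit * se_ind)    # PI for an individual
# The PI is always wider than the CI at the same x*.
```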

CORRELATION AND REGRESSION
Coefficient of determination. Say we regress Y on X: ŷ = a + bx. As x changes, ŷ changes, so variability in x causes variability in ŷ via the regression equation. The square of the correlation coefficient r has a special meaning: R² = r² is called the coefficient of determination = the fraction of the variability in Y explained by the variability in X via the regression of Y on X.
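The "fraction of variability explained" reading of R² can be verified on made-up data: for a least-squares line, r² equals 1 − SSE/SST:

```python
xs = [1, 2, 3, 4]
ys = [2.1, 4.0, 6.2, 7.9]          # hypothetical data
n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n

sxx = sum((x - mean_x) ** 2 for x in xs)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sst = sum((y - mean_y) ** 2 for y in ys)   # total variability in y

b = sxy / sxx
a = mean_y - b * mean_x
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained part

r_squared = sxy ** 2 / (sxx * sst)   # square of the correlation coefficient
explained = 1 - sse / sst            # fraction of variability explained
```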

EXAMPLE
Income and savings. What percent of the variability in savings is explained by the variability in income?
Solution. The correlation coefficient was r = 0.963. The coefficient of determination is r² = (0.963)² ≈ 0.927. About 92.7% of the variability in savings is explained by the variability in income.

REGRESSION DIAGNOSTICS: RESIDUAL ANALYSIS
Regression model: Y = α + βx + ε, ε ~ N(0, σ).
For the inference to work, we need the residuals to be approximately normal. The standard method is a normal probability plot of the residuals; use a statistical package like MINITAB. The model works well if the normal probability plot is approximately a straight line.
Example. Income and savings. The plot is approximately a straight line, so the model works well.
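Short of a full probability plot, the residuals themselves are easy to compute; a useful sanity check before examining normality is that least-squares residuals always sum to (numerically) zero:

```python
xs = [1, 2, 3, 4]
ys = [2.1, 4.0, 6.2, 7.9]          # hypothetical data
n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
sxx = sum((x - mean_x) ** 2 for x in xs)
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
a = mean_y - b * mean_x

# Residuals: observed minus fitted values.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
# Sorting them is the first step of a normal probability (quantile) plot.
residuals.sort()
```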