Regression and Correlation methods


Chapter 11: Regression and Correlation Methods
EPI 809/Spring 2008

Learning Objectives
As a result of this class, you will be able to: describe the linear regression model; state the regression modeling steps; explain ordinary least squares; compute regression coefficients; understand and check model assumptions; predict the response variable; and comment on SAS output.

Learning Objectives (continued)
Correlation models; the link between a correlation model and a regression model; the test of the coefficient of correlation.

Models

What Is a Model?
A representation of some phenomenon. A model need not be mathematical or statistical (a non-math/stats model).

What Is a Math/Stats Model?
Often describes the relationship between variables.
Types: deterministic models (no randomness) and probabilistic models (with randomness).

Deterministic Models
Hypothesize exact relationships. Suitable when prediction error is negligible.
Example: body mass index (BMI) is a measure of body fat based on weight and height.
Metric formula: $BMI = \dfrac{\text{weight (kg)}}{\text{height (m)}^2}$
Non-metric formula: $BMI = \dfrac{\text{weight (lb)} \times 703}{\text{height (in)}^2}$
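
Because the BMI formulas are deterministic, the same inputs always return the same value. A minimal Python sketch of both formulas follows; the function names `bmi_metric` and `bmi_imperial` and the example inputs are illustrative assumptions, not part of the slides.

```python
def bmi_metric(weight_kg: float, height_m: float) -> float:
    """BMI from weight in kilograms and height in meters."""
    return weight_kg / height_m ** 2


def bmi_imperial(weight_lb: float, height_in: float) -> float:
    """BMI from weight in pounds and height in inches (703 converts the units)."""
    return weight_lb * 703 / height_in ** 2


# Illustrative inputs only (not from the slides): 70 kg at 1.75 m.
print(round(bmi_metric(70.0, 1.75), 2))
```

The two functions should agree closely for the same person expressed in different units, since 703 is only a rounded conversion factor.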

Probabilistic Models
Hypothesize 2 components: a deterministic component and a random error.
Example: systolic blood pressure (SBP) of newborns is 6 times the age in days plus random error: $SBP = 6 \times \text{age (days)} + \varepsilon$.
The random error may be due to factors other than age in days (e.g., birthweight).
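
To make the two components concrete, here is a small Python sketch that simulates the newborn SBP model. The slide does not state a distribution for the error at this point, so the normal error with standard deviation 2 is purely an assumption for illustration, as is the function name `simulate_sbp`.

```python
import random


def simulate_sbp(age_days, error_sd=2.0, rng=None):
    """Deterministic part (6 x age in days) plus a random error term."""
    rng = rng if rng is not None else random.Random()
    deterministic = 6 * age_days
    # Assumed error distribution: normal, mean 0, sd error_sd (illustration only).
    error = rng.gauss(0.0, error_sd)
    return deterministic + error


rng = random.Random(809)  # fixed seed so the run is repeatable
for age in (1, 2, 3):
    print(age, round(simulate_sbp(age, rng=rng), 1))
```

Setting `error_sd=0.0` collapses the sketch to the deterministic model, which is one way to see what the error term adds.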

Types of Probabilistic Models

Regression Models

Regression Models
Describe the relationship between one dependent variable and one or more explanatory variables, using an equation to set up the relationship.
Numerical dependent (response) variable.
1 or more numerical or categorical independent (explanatory) variables.
Used mainly for prediction and estimation.

Regression Modeling Steps
1. Hypothesize the deterministic component. Estimate the unknown parameters.
2. Specify the probability distribution of the random error term. Estimate the standard deviation of the error.
3. Evaluate the fitted model.
4. Use the model for prediction and estimation.

Model Specification

Specifying the Deterministic Component
1. Define the dependent variable and the independent variable.
2. Hypothesize the nature of the relationship: expected effects (i.e., coefficients’ signs), functional form (linear or non-linear), and interactions.

Model Specification Is Based on Theory
1. Theory of the field (e.g., epidemiology)
2. Mathematical theory
3. Previous research
4. ‘Common sense’

Thinking Challenge: Which Is More Logical?
[Figure: four candidate plots of CD+ counts (Y) against years since seroconversion (X).]
Note: with a positive linear relationship, the response increases infinitely. Discuss the concept of ‘relevant range’.

OB/GYN Study

Types of Regression Models
This typology is based on the number of explanatory variables and the nature of the relationship between X and Y.
Simple (1 explanatory variable): linear or non-linear.
Multiple (2+ explanatory variables): linear or non-linear.

Linear Regression Model

Linear Equations

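Linear Regression Model
1. The relationship between the variables is a linear function:
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
where $Y_i$ is the dependent (response) variable (e.g., CD+ count), $X_i$ is the independent (explanatory) variable (e.g., years since seroconversion), $\beta_0$ is the population Y-intercept, $\beta_1$ is the population slope, and $\varepsilon_i$ is the random error.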

Population & Sample Regression Models
[Figure: a random sample is drawn from a population in which the relationship between X and Y is unknown; the sample is used to estimate that relationship.]

Population Linear Regression Model
[Figure: observed values plotted about the population regression line; $\varepsilon_i$ = random error, the vertical distance from an observed value to the line.]

Sample Linear Regression Model
[Figure: observed values and one unsampled observation plotted about the fitted sample regression line; $\hat{\varepsilon}_i$ = random error (residual).]

Estimating Parameters: Least Squares Method

Scatter Plot
1. Plot of all $(X_i, Y_i)$ pairs.
2. Suggests how well the model will fit.
[Figure: example scatter plot of Y against X.]

Thinking Challenge
How would you draw a line through the points? How do you determine which line ‘fits best’?
[Figure: the same scatter plot of Y against X with candidate lines: slope changed with intercept unchanged; slope unchanged with intercept changed; both slope and intercept changed.]

Least Squares
1. ‘Best fit’ means the differences between actual Y values and predicted Y values are a minimum. But positive differences offset negative ones, so square the errors!
2. LS minimizes the sum of the squared differences (errors) (SSE).
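
The point about offsetting errors can be checked numerically. The sketch below uses the estriol and birthweight data from the example later in these slides; the two candidate lines and the helper names `residuals` and `sse` are chosen here for illustration only.

```python
def residuals(xs, ys, intercept, slope):
    """Observed minus predicted Y for the line Y = intercept + slope * X."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def sse(xs, ys, intercept, slope):
    """Sum of squared errors for the same line."""
    return sum(e ** 2 for e in residuals(xs, ys, intercept, slope))


estriol = [1, 2, 3, 4, 5]
birthweight = [1, 1, 2, 2, 4]

# Two candidate lines, chosen for illustration: (intercept, slope).
candidates = {"flat line at mean Y": (2.0, 0.0), "sloped line": (-0.1, 0.7)}
for name, (b0, b1) in candidates.items():
    raw = sum(residuals(estriol, birthweight, b0, b1))
    total = sse(estriol, birthweight, b0, b1)
    print(f"{name}: sum of errors = {raw:.2f}, SSE = {total:.2f}")
```

For both lines the raw errors sum to zero (up to floating-point rounding), so the plain sum cannot tell them apart, whereas the SSE is 6.00 for the flat line and 1.10 for the sloped one.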

Least Squares Graphically

Coefficient Equations
Prediction equation, sample slope, and sample Y-intercept.
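
The equations themselves did not survive in this transcript. For reference, the standard least-squares expressions for simple linear regression are given here in the computational (sums) form; the notation $SS_{xy}$ and $SS_{xx}$ is an assumption and may differ from what the slide used.

```latex
\begin{align*}
\text{Prediction equation:} \quad & \hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i \\[4pt]
\text{Sample slope:} \quad & \hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}}
  = \frac{\sum X_i Y_i - \dfrac{\left(\sum X_i\right)\left(\sum Y_i\right)}{n}}
         {\sum X_i^2 - \dfrac{\left(\sum X_i\right)^2}{n}} \\[4pt]
\text{Sample Y-intercept:} \quad & \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}
\end{align*}
```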

Derivation of Parameters
Least squares (L-S): minimize the squared error.
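
The derivation on the slides was not captured in this transcript. A sketch of the usual argument, in the notation of the linear regression model above, is to treat the SSE as a function of the two estimates and set both partial derivatives to zero:

```latex
\begin{align*}
SSE(\hat{\beta}_0, \hat{\beta}_1) &= \sum_{i=1}^{n} \left( Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i \right)^2 \\[4pt]
\frac{\partial\, SSE}{\partial \hat{\beta}_0} &= -2 \sum \left( Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i \right) = 0
  \;\Longrightarrow\; \sum Y_i = n \hat{\beta}_0 + \hat{\beta}_1 \sum X_i \\[4pt]
\frac{\partial\, SSE}{\partial \hat{\beta}_1} &= -2 \sum X_i \left( Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i \right) = 0
  \;\Longrightarrow\; \sum X_i Y_i = \hat{\beta}_0 \sum X_i + \hat{\beta}_1 \sum X_i^2 \\[4pt]
\hat{\beta}_1 &= \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2},
  \qquad \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}
\end{align*}
```

The two implied equations are the normal equations; solving them simultaneously gives the last line.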

Computation Table

Interpretation of Coefficients
1. Slope ($\hat{\beta}_1$): estimated Y changes by $\hat{\beta}_1$ for each 1-unit increase in X. If $\hat{\beta}_1 = 2$, then Y is expected to increase by 2 for each 1-unit increase in X.
2. Y-intercept ($\hat{\beta}_0$): average value of Y when X = 0. If $\hat{\beta}_0 = 4$, then average Y is expected to be 4 when X is 0.

Parameter Estimation Example
Obstetrics: what is the relationship between mother’s estriol level and birthweight, using the following data?

Estriol (mg/24h)    Birthweight (g/1000)
       1                     1
       2                     1
       3                     2
       4                     2
       5                     4

Scatterplot: Birthweight vs. Estriol Level

Parameter Estimation Solution Table

Parameter Estimation Solution
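
The solution table and arithmetic were not captured in this transcript. As a hedged reconstruction, the Python sketch below computes the usual column sums and the least-squares estimates for the estriol data using the standard computational formulas; the function name `least_squares` is an illustrative choice.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line of ys on xs."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    ss_xy = sum_xy - sum_x * sum_y / n   # corrected sum of cross-products
    ss_xx = sum_x2 - sum_x ** 2 / n      # corrected sum of squares of X
    slope = ss_xy / ss_xx
    intercept = sum_y / n - slope * sum_x / n
    return intercept, slope


estriol = [1, 2, 3, 4, 5]
birthweight = [1, 1, 2, 2, 4]
b0, b1 = least_squares(estriol, birthweight)
print(f"intercept = {b0:.2f}, slope = {b1:.2f}")
```

The result agrees with the estimates reported in the SAS output for this example (intercept -0.10, slope 0.70).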

Coefficient Interpretation Solution
1. Slope ($\hat{\beta}_1$): birthweight (Y) is expected to increase by 0.7 units for each 1-unit increase in estriol (X).
2. Intercept ($\hat{\beta}_0$): average birthweight (Y) is -0.10 units when estriol level (X) is 0. This is difficult to explain, since birthweight should always be positive.

SAS code for fitting a simple linear regression

    data BW;                 /* Reading data in SAS */
      input estriol birthw @@;
      cards;
    1 1 2 1 3 2 4 2 5 4
    ;
    run;

    proc reg data=BW;        /* Fitting linear regression models */
      model birthw = estriol;
    run;

Parameter Estimation: SAS Computer Output

Parameter Estimates

                     Parameter    Standard
Variable      DF      Estimate       Error    t Value    Pr > |t|
Intercept      1      -0.10000     0.63509      -0.16      0.8849
Estriol        1       0.70000     0.19149       3.66      0.0354

The Intercept estimate is $\hat{\beta}_0$ and the Estriol estimate is $\hat{\beta}_1$.
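
The standard errors and t values in this output can be checked by hand. The Python sketch below applies the textbook formulas for simple linear regression (error variance estimated as SSE/(n - 2)); it does not describe how SAS computes them internally, and the function name `slr_inference` is illustrative. The p-values are not reproduced because they need a t-distribution routine outside the standard library.

```python
import math


def slr_inference(xs, ys):
    """Estimates, standard errors and t values for a simple linear regression."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = ss_xy / ss_xx
    intercept = y_bar - slope * x_bar
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))  # estimated standard deviation of the error
    se_slope = s / math.sqrt(ss_xx)
    se_intercept = s * math.sqrt(1 / n + x_bar ** 2 / ss_xx)
    return {
        "intercept": (intercept, se_intercept, intercept / se_intercept),
        "slope": (slope, se_slope, slope / se_slope),
    }


fit = slr_inference([1, 2, 3, 4, 5], [1, 1, 2, 2, 4])
for name, (estimate, std_err, t_value) in fit.items():
    print(f"{name}: estimate={estimate:.5f} se={std_err:.5f} t={t_value:.2f}")
```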

Parameter Estimation Thinking Challenge
You’re a veterinary epidemiologist for the county cooperative. You gather the following data:

Food (lb.)    Milk yield (lb.)
    4               3.0
    6               5.5
   10               6.5
   12               9.0

What is the relationship between cows’ food intake and milk yield?

Scattergram: Milk Yield vs. Food Intake*
[Figure: milk yield (lb.) on the vertical axis against food intake (lb.) on the horizontal axis.]

Parameter Estimation Solution Table*

Parameter Estimation Solution*
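
The worked solution was not captured in this transcript. As a hedged reconstruction from the four data points in the challenge (X = food intake, Y = milk yield, n = 4), the standard least-squares arithmetic runs as follows:

```latex
\begin{align*}
\sum X_i &= 4 + 6 + 10 + 12 = 32, &
\sum Y_i &= 3.0 + 5.5 + 6.5 + 9.0 = 24 \\
\sum X_i Y_i &= 12 + 33 + 65 + 108 = 218, &
\sum X_i^2 &= 16 + 36 + 100 + 144 = 296 \\[4pt]
\hat{\beta}_1 &= \frac{218 - \dfrac{(32)(24)}{4}}{296 - \dfrac{(32)^2}{4}}
  = \frac{218 - 192}{296 - 256} = \frac{26}{40} = 0.65 \\[4pt]
\hat{\beta}_0 &= \bar{Y} - \hat{\beta}_1 \bar{X} = 6 - (0.65)(8) = 0.8
\end{align*}
```

These values match the slope of .65 lb. and the intercept of 0.8 lb. quoted in the interpretation slide.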

Coefficient Interpretation Solution*
1. Slope ($\hat{\beta}_1$): milk yield (Y) is expected to increase by 0.65 lb. for each 1 lb. increase in food intake (X).
2. Y-intercept ($\hat{\beta}_0$): average milk yield (Y) is expected to be 0.8 lb. when food intake (X) is 0.