Simple Linear Regression


Chapter 11: Simple Linear Regression

Learning Objectives
As a result of this class, you will be able to:
1. Describe the Linear Regression Model
2. State the Regression Modeling Steps
3. Explain Ordinary Least Squares; Understand and Check Model Assumptions
4. Compute Regression Coefficients
5. Predict Response Variable
6. Interpret Computer Output

Models
1. Representation of Some Phenomenon
2. Mathematical Model Is a Mathematical Expression of Some Phenomenon
3. Often Describe Relationships Between Variables
4. Types
   - Deterministic Models
   - Probabilistic Models

Deterministic Models
1. Hypothesize Exact Relationships
2. Suitable When Prediction Error Is Negligible
3. Example: Force Is Exactly Mass Times Acceleration, \(F = m \cdot a\)

Probabilistic Models
1. Hypothesize 2 Components
   - Deterministic
   - Random Error
2. Example: Sales Volume Is 10 Times Advertising Spending + Random Error, \(Y = 10X + \varepsilon\)
   - Random Error May Be Due to Factors Other Than Advertising
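The contrast between the deterministic and probabilistic versions of the sales example can be sketched in a few lines of Python. This is only an illustration: the normal error distribution, the standard deviation `noise_sd`, and the seed are assumptions made for the sketch, not values from the slides.

```python
import random

def simulate_sales(ad_spend, noise_sd=2.0, seed=0):
    """Return (deterministic, probabilistic) sales for the rule Y = 10X.

    noise_sd and seed are hypothetical choices for this illustration.
    """
    rng = random.Random(seed)
    # Deterministic component only: Y = 10X
    deterministic = [10 * x for x in ad_spend]
    # Deterministic component plus a random error term: Y = 10X + epsilon
    probabilistic = [10 * x + rng.gauss(0, noise_sd) for x in ad_spend]
    return deterministic, probabilistic

ad_spend = [1, 2, 3, 4, 5]
det, prob = simulate_sales(ad_spend)
for x, d, p in zip(ad_spend, det, prob):
    print(f"X={x}: deterministic Y={d}, probabilistic Y={p:.2f}")
```

The deterministic column repeats exactly on every run; the probabilistic column scatters around it, which is the part regression has to estimate through.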

Types of Probabilistic Models (diagram not reproduced)

Regression Models
1. Answer ‘What Is the Relationship Between the Variables?’
2. Equation Used
   - 1 Numerical Dependent (Response) Variable: What Is to Be Predicted
   - 1 or More Numerical or Categorical Independent (Explanatory) Variables
3. Used Mainly for Prediction & Estimation

Regression Modeling Steps
1. Hypothesize Deterministic Component
2. Estimate Unknown Model Parameters
3. Specify Probability Distribution of Random Error Term
   - Estimate Standard Deviation of Error
4. Evaluate Model
5. Use Model for Prediction & Estimation

Model Specification

Specifying the Model
1. Define Variables
2. Hypothesize Nature of Relationship
   - Expected Effects (i.e., Coefficients’ Signs)
   - Functional Form (Linear or Non-Linear)
   - Interactions

Model Specification Is Based on Theory
1. Theory of Field (e.g., Sociology)
2. Mathematical Theory
3. Previous Research
4. ‘Common Sense’

Thinking Challenge: Which Is More Logical?
With a positive linear relationship, sales increase indefinitely. Discuss the concept of the ‘relevant range’.

Types of Regression Models
This typology is based on the number of explanatory variables & the nature of the relationship between X & Y.
- 1 Explanatory Variable: Simple (Linear or Non-Linear)
- 2+ Explanatory Variables: Multiple (Linear or Non-Linear)

Linear Regression Model

Linear Equations (figure not reproduced)

Linear Regression Model
1. Relationship Between Variables Is a Linear Function
\[ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \]
   - \(Y_i\): Dependent (Response) Variable (e.g., income)
   - \(X_i\): Independent (Explanatory) Variable (e.g., education)
   - \(\beta_0\): Population Y-Intercept
   - \(\beta_1\): Population Slope
   - \(\varepsilon_i\): Random Error

Population & Sample Regression Models
(Figure not reproduced: a random sample is drawn from a population in which the relationship is unknown.)

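Population Linear Regression Model
(Figure not reproduced: observed values plotted against the regression line; \(\varepsilon_i\) = random error.)

Sample Linear Regression Model
(Figure not reproduced: observed values and an unsampled observation plotted against the fitted line; \(\hat{\varepsilon}_i\) = random error.)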

Estimating Parameters: Least Squares Method

Scattergram
1. Plot of All \((X_i, Y_i)\) Pairs
2. Suggests How Well Model Will Fit
(Figure not reproduced: Y plotted against X.)

Thinking Challenge
How would you draw a line through the points? How do you determine which line ‘fits best’?

Least Squares
1. ‘Best Fit’ Means the Differences Between Actual Y Values & Predicted Y Values Are a Minimum
   - But Positive Differences Offset Negative Ones, So the Differences Are Squared
2. LS Minimizes the Sum of the Squared Differences (SSE)
\[ SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 \]

Least Squares Graphically (figure not reproduced)

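Coefficient Equations
Prediction Equation:
\[ \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x \]
Sample Slope:
\[ \hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}}, \qquad SS_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}), \quad SS_{xx} = \sum (x_i - \bar{x})^2 \]
Sample Y-Intercept:
\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]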
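A minimal sketch of the least squares coefficient formulas in plain Python follows. The function name `least_squares` and the check data are illustrative choices, not part of the slides; the formulas are the standard slope SSxy/SSxx and intercept ybar minus slope times xbar.

```python
def least_squares(x, y):
    """Return (intercept, slope) of the least squares line through (x, y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of cross-products and squares of deviations from the means
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = ss_xy / ss_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Sanity check on points that lie exactly on Y = 1 + 2X
b0, b1 = least_squares([0, 1, 2, 3], [1, 3, 5, 7])
print(f"intercept = {b0:.2f}, slope = {b1:.2f}")
```

Because the check points have no error, the fitted line recovers the intercept 1 and slope 2 exactly, which is a quick way to confirm the formulas are wired up correctly before using real data.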

Computation Table (table not reproduced)

Interpretation of Coefficients
1. Slope (\(\hat{\beta}_1\))
   - Estimated Y Changes by \(\hat{\beta}_1\) for Each 1 Unit Increase in X
   - If \(\hat{\beta}_1 = 2\), then Sales (Y) Is Expected to Increase by 2 for Each 1 Unit Increase in Advertising (X)
2. Y-Intercept (\(\hat{\beta}_0\))
   - Average Value of Y When X = 0
   - If \(\hat{\beta}_0 = 4\), then Average Sales (Y) Is Expected to Be 4 When Advertising (X) Is 0

Parameter Estimation Example
You’re a marketing analyst for Hasbro Toys. You gather the following data:

Ad $    Sales (Units)
1       1
2       1
3       2
4       2
5       4

What is the relationship between sales & advertising?

Scattergram: Sales vs. Advertising (figure not reproduced)

Guess the Parameters!

Parameter Estimation Solution Table (table not reproduced)

Parameter Estimation Solution
\[ \hat{y} = -0.10 + 0.70x \]
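The solution table and equations did not survive in this transcript. The arithmetic below is a reconstruction from the five data points in the Hasbro example, using the shortcut forms of the sums of squares; it is a sketch of the calculation rather than a copy of the original slide.

```latex
\begin{align*}
\sum x_i &= 15, \quad \sum y_i = 10, \quad \sum x_i^2 = 55, \quad \sum x_i y_i = 37, \quad n = 5 \\
\bar{x} &= 15/5 = 3, \qquad \bar{y} = 10/5 = 2 \\
SS_{xy} &= \sum x_i y_i - n\,\bar{x}\,\bar{y} = 37 - 5(3)(2) = 7 \\
SS_{xx} &= \sum x_i^2 - n\,\bar{x}^2 = 55 - 5(3)^2 = 10 \\
\hat{\beta}_1 &= SS_{xy} / SS_{xx} = 7/10 = 0.70 \\
\hat{\beta}_0 &= \bar{y} - \hat{\beta}_1 \bar{x} = 2 - 0.70(3) = -0.10
\end{align*}
```

These values agree with the slope of .7 and the intercept of -.10 quoted in the interpretation slide and the computer output.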

Coefficient Interpretation Solution
1. Slope (\(\hat{\beta}_1\))
   - Sales Volume (Y) Is Expected to Increase by .7 Units for Each $1 Increase in Advertising (X)
2. Y-Intercept (\(\hat{\beta}_0\))
   - Average Value of Sales Volume (Y) Is -.10 Units When Advertising (X) Is 0
   - Difficult to Explain to Marketing Manager
   - Expect Some Sales Without Advertising

Parameter Estimation Computer Output

Parameter Estimates
                Parameter   Standard   T for H0:
Variable   DF   Estimate    Error      Param=0     Prob>|T|
INTERCEP   1    -0.1000     0.6350     -0.157      0.8849
ADVERT     1     0.7000     0.1914      3.656      0.0354

The Parameter Estimate column holds \(\hat{\beta}_k\): \(\hat{\beta}_0\) (INTERCEP) and \(\hat{\beta}_1\) (ADVERT).

Derivation of Parameter Equations Goal: Minimize squared error

Derivation of Parameter Equations
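The derivation itself is missing from the transcript. A standard version, written in the notation used elsewhere in this deck, is sketched here; it may differ in layout from the original slide.

```latex
\begin{align*}
SSE(\beta_0, \beta_1) &= \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2 \\
\frac{\partial SSE}{\partial \beta_0} &= -2 \sum (y_i - \beta_0 - \beta_1 x_i) = 0
  \;\Rightarrow\; \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \\
\frac{\partial SSE}{\partial \beta_1} &= -2 \sum x_i (y_i - \beta_0 - \beta_1 x_i) = 0
  \;\Rightarrow\; \sum x_i y_i - \hat{\beta}_0 \sum x_i - \hat{\beta}_1 \sum x_i^2 = 0 \\
\text{Substituting } \hat{\beta}_0: \quad
\sum x_i y_i - n\,\bar{x}\,\bar{y} &= \hat{\beta}_1 \left( \sum x_i^2 - n\,\bar{x}^2 \right)
  \;\Rightarrow\; \hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}}
\end{align*}
```

Setting both partial derivatives to zero gives the two normal equations; solving them together yields the slope and intercept formulas.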

Parameter Estimation Thinking Challenge
You’re an economist for the county cooperative. You gather the following data:

Fertilizer (lb.)    Yield (lb.)
4                   3.0
6                   5.5
10                  6.5
12                  9.0

What is the relationship between fertilizer & crop yield?

Scattergram: Crop Yield vs. Fertilizer* (figure not reproduced; Yield (lb.) plotted against Fertilizer (lb.))

Parameter Estimation Solution Table* (table not reproduced)

Parameter Estimation Solution*
\[ \hat{y} = 0.8 + 0.65x \]

Coefficient Interpretation Solution*
1. Slope (\(\hat{\beta}_1\))
   - Crop Yield (Y) Is Expected to Increase by .65 lb. for Each 1 lb. Increase in Fertilizer (X)
2. Y-Intercept (\(\hat{\beta}_0\))
   - Average Crop Yield (Y) Is Expected to Be 0.8 lb. When No Fertilizer (X) Is Used

Probability Distribution of Random Error

Linear Regression Assumptions
1. Mean of Probability Distribution of Error Is 0
2. Probability Distribution of Error Has Constant Variance (Exercise: Constant across what?)
3. Probability Distribution of Error Is Normal
4. Errors Are Independent

Error Probability Distribution (figure not reproduced)

Random Error Variation
1. Variation of Actual Y from Predicted Y
2. Measured by Standard Error of Regression Model
   - Sample Standard Deviation of \(\hat{\varepsilon}\), Denoted s
3. Affects Several Factors
   - Parameter Significance
   - Prediction Accuracy
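As a rough check on the standard error of the regression model, the sketch below computes s for the Hasbro advertising data. It assumes the usual estimator, the square root of SSE divided by n - 2 degrees of freedom, which the transcript does not state explicitly; the coefficients are the ones reported on the slides.

```python
import math

# Hasbro example data and fitted coefficients from the slides
ads = [1, 2, 3, 4, 5]
sales = [1, 1, 2, 2, 4]
b0, b1 = -0.1, 0.7

# Residual = actual Y minus predicted Y
residuals = [y - (b0 + b1 * x) for x, y in zip(ads, sales)]
sse = sum(e ** 2 for e in residuals)

# Two parameters were estimated, so n - 2 degrees of freedom remain (assumed formula)
n = len(ads)
s = math.sqrt(sse / (n - 2))
print(f"SSE = {sse:.2f}, s = {s:.5f}")
```

The result reproduces the value s = .60553 that later slides quote as given, which suggests the assumed formula is the one the deck uses.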

Evaluating the Model: Testing for Significance

Test of Slope Coefficient
1. Shows If There Is a Linear Relationship Between X & Y
2. Involves Population Slope \(\beta_1\)
3. Hypotheses
   - \(H_0: \beta_1 = 0\) (No Linear Relationship)
   - \(H_a: \beta_1 \neq 0\) (Linear Relationship)
4. Theoretical Basis Is Sampling Distribution of Slope

Sampling Distribution of Sample Slopes
All Possible Sample Slopes
- Sample 1: 2.5
- Sample 2: 1.6
- Sample 3: 1.8
- Sample 4: 2.1
- ... (Very large number of sample slopes)
(Figure not reproduced: sampling distribution of \(\hat{\beta}_1\), with standard deviation \(S_{\hat{\beta}_1}\).)

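Slope Coefficient Test Statistic
\[ t = \frac{\hat{\beta}_1}{S_{\hat{\beta}_1}}, \qquad S_{\hat{\beta}_1} = \frac{s}{\sqrt{SS_{xx}}}, \qquad df = n - 2 \]
where \(SS_{xx} = \sum (x_i - \bar{x})^2\).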

Test of Slope Coefficient Example
You’re a marketing analyst for Hasbro Toys. You find b0 = -.1, b1 = .7 & s = .60553.

Ad $    Sales (Units)
1       1
2       1
3       2
4       2
5       4

Is the relationship significant at the .05 level?

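Solution Table (table not reproduced)

Test of Slope Parameter Solution
- \(H_0: \beta_1 = 0\)
- \(H_a: \beta_1 \neq 0\)
- \(\alpha = .05\)
- \(df = 5 - 2 = 3\)
- Critical Value(s): \(t = \pm 3.182\)
- Test Statistic: \(t = 3.656\)
- Decision: Reject at \(\alpha = .05\)
- Conclusion: There is evidence of a relationship

Test Statistic Solution
\[ S_{\hat{\beta}_1} = \frac{s}{\sqrt{SS_{xx}}} = \frac{.60553}{\sqrt{10}} \approx .1915, \qquad t = \frac{\hat{\beta}_1}{S_{\hat{\beta}_1}} = \frac{.70}{.19149} \approx 3.656 \]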

Test of Slope Parameter Computer Output

Parameter Estimates
                Parameter   Standard   T for H0:
Variable   DF   Estimate    Error      Param=0     Prob>|T|
INTERCEP   1    -0.1000     0.6350     -0.157      0.8849
ADVERT     1     0.7000     0.1914      3.656      0.0354

- Parameter Estimate: \(\hat{\beta}_k\)
- Standard Error: \(S_{\hat{\beta}_k}\), the estimated standard deviation of the sampling distribution
- T for H0: \(t = \hat{\beta}_k / S_{\hat{\beta}_k}\)
- Prob>|T|: P-Value
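The ADVERT row of the output can be checked by hand. The sketch below assumes the standard error of the slope is s divided by the square root of SSxx, and it hard-codes the two-tailed critical value 3.182 (t table, alpha = .05, 3 degrees of freedom) rather than computing it; the variable names are illustrative.

```python
import math

# Values given on the slides for the Hasbro example
ads = [1, 2, 3, 4, 5]
b1 = 0.7
s = 0.60553

# Assumption: two-tailed critical t for alpha = .05 with df = 3, taken from a t table
T_CRITICAL = 3.182

x_bar = sum(ads) / len(ads)
ss_xx = sum((x - x_bar) ** 2 for x in ads)

se_b1 = s / math.sqrt(ss_xx)   # estimated standard deviation of the sample slope
t_stat = b1 / se_b1            # test statistic for H0: beta1 = 0
reject_h0 = abs(t_stat) > T_CRITICAL

print(f"SE(b1) = {se_b1:.4f}, t = {t_stat:.3f}, reject H0: {reject_h0}")
```

The computed standard error and t statistic agree with the printed output to rounding, and the statistic exceeds the critical value, consistent with the slide's decision to reject at the .05 level.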

Measures of Variation in Regression
1. Total Sum of Squares (\(SS_{yy}\))
   - Measures Variation of Observed \(Y_i\) Around the Mean \(\bar{Y}\)
2. Explained Variation (SSR)
   - Variation Due to Relationship Between X & Y
3. Unexplained Variation (SSE)
   - Variation Due to Other Factors

Variation Measures
- Total sum of squares: \(\sum (Y_i - \bar{Y})^2\)
- Unexplained sum of squares: \(\sum (Y_i - \hat{Y}_i)^2\)
- Explained sum of squares: \(\sum (\hat{Y}_i - \bar{Y})^2\)

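Coefficient of Determination
1. Proportion of Variation ‘Explained’ by Relationship Between X & Y
\[ r^2 = \frac{\text{Explained Variation}}{\text{Total Variation}} = \frac{SS_{yy} - SSE}{SS_{yy}}, \qquad 0 \le r^2 \le 1 \]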

Coefficient of Determination Examples (figure not reproduced)

Coefficient of Determination Example
You’re a marketing analyst for Hasbro Toys. You find \(\hat{\beta}_0 = -0.1\) & \(\hat{\beta}_1 = 0.7\).

Ad $    Sales (Units)
1       1
2       1
3       2
4       2
5       4

Interpret a coefficient of determination of 0.8167.

\(r^2\) Computer Output

Root MSE    0.60553    R-square   0.8167
Dep Mean    2.00000    Adj R-sq   0.7556
C.V.       30.27650

- Root MSE: s
- R-square: \(r^2\)
- Adj R-sq: \(r^2\) adjusted for number of explanatory variables & sample size
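The summary statistics in this output can be reproduced from the five data points. The sketch assumes the common definitions: r-squared as 1 - SSE/SSyy, adjusted r-squared as 1 - (1 - r-squared)(n - 1)/(n - 2) for one explanatory variable, and C.V. as 100 times Root MSE over the mean of Y. These formulas are not spelled out in the transcript.

```python
import math

# Hasbro example data and fitted coefficients from the slides
ads = [1, 2, 3, 4, 5]
sales = [1, 1, 2, 2, 4]
b0, b1 = -0.1, 0.7

n = len(sales)
y_bar = sum(sales) / n                                  # Dep Mean
ss_yy = sum((y - y_bar) ** 2 for y in sales)            # total variation
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(ads, sales))  # unexplained

r2 = 1 - sse / ss_yy
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)               # one explanatory variable
root_mse = math.sqrt(sse / (n - 2))
cv = 100 * root_mse / y_bar

print(f"R-square = {r2:.4f}, Adj R-sq = {adj_r2:.4f}")
print(f"Root MSE = {root_mse:.5f}, Dep Mean = {y_bar:.5f}, C.V. = {cv:.5f}")
```

An r-squared of about 0.82 means roughly 82% of the variation in sales around its mean is accounted for by the linear relationship with advertising in this sample.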

Using the Model for Prediction & Estimation

Prediction With Regression Models
1. Types of Predictions
   - Point Estimates
   - Interval Estimates
2. What Is Predicted
   - Population Mean Response E(Y) for Given X (Point on Population Regression Line)
   - Individual Response (\(Y_i\)) for Given X

What Is Predicted (figure not reproduced)

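Confidence Interval Estimate of Mean Y
\[ \hat{y} \pm t_{\alpha/2}\, s \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}}, \qquad df = n - 2 \]
where \(x_p\) is the X value at which the mean is estimated and \(SS_{xx} = \sum (x_i - \bar{x})^2\).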

Factors Affecting Interval Width
1. Level of Confidence (\(1 - \alpha\))
   - Width Increases as Confidence Increases
2. Data Dispersion (s)
   - Width Increases as Variation Increases
3. Sample Size
   - Width Decreases as Sample Size Increases
4. Distance of \(X_p\) from Mean \(\bar{X}\)
   - Width Increases as Distance Increases

Why Distance from Mean?
The closer to the mean \(\bar{X}\), the less variability; an X value far from \(\bar{X}\) shows greater dispersion than a value such as \(X_1\) near it (figure not reproduced). This is due to the variability in estimated slope parameters.

Confidence Interval Estimate Example
You’re a marketing analyst for Hasbro Toys. You find b0 = -.1, b1 = .7 & s = .60553.

Ad $    Sales (Units)
1       1
2       1
3       2
4       2
5       4

Estimate the mean sales when advertising is $4 at the .05 level.

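Solution Table (table not reproduced)

Confidence Interval Estimate Solution
X to be predicted: \(x_p = 4\)
\[ \hat{y} = -.1 + .7(4) = 2.7 \]
\[ 2.7 \pm 3.182\,(.60553) \sqrt{\frac{1}{5} + \frac{(4 - 3)^2}{10}} \approx 2.7 \pm 1.055 \]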

Prediction Interval of Individual Response
\[ \hat{y} \pm t_{\alpha/2}\, s \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}}, \qquad df = n - 2 \]
Note the 1 under the radical in the standard error formula. The effect of the extra \(S_{yx}\) is to increase the width of the interval. This will be seen in the interval bands.

Why the Extra ‘S’?
The error in predicting some future value of Y is the sum of 2 errors:
1. the error of estimating the mean Y, E(Y|X)
2. the random error that is a component of the value of Y to be predicted
Even if we knew the population regression line exactly, we would still make \(\varepsilon\) error.

Interval Estimate Computer Output

       Dep Var   Pred    Std Err   Low95%   Upp95%   Low95%    Upp95%
Obs    SALES     Value   Predict   Mean     Mean     Predict   Predict
1      1.000     0.600   0.469     -0.892   2.092    -1.837    3.037
2      1.000     1.300   0.332      0.244   2.355    -0.897    3.497
3      2.000     2.000   0.271      1.138   2.861    -0.111    4.111
4      2.000     2.700   0.332      1.644   3.755     0.502    4.897
5      4.000     3.400   0.469      1.907   4.892     0.962    5.837

- Pred Value in row 4: Predicted Y when X = 4
- Std Err Predict: \(S_{\hat{Y}}\)
- Low95% Mean to Upp95% Mean: Confidence Interval
- Low95% Predict to Upp95% Predict: Prediction Interval
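Both intervals in the output can be approximated with a short function. This sketch assumes the usual formulas (standard error of the mean response is s times the square root of 1/n + (xp - xbar)^2/SSxx, with an extra 1 under the radical for an individual response) and a hard-coded critical value of 3.182 for 95% confidence with 3 degrees of freedom. The function name `intervals` is illustrative.

```python
import math

# Values given on the slides for the Hasbro example
ads = [1, 2, 3, 4, 5]
b0, b1, s = -0.1, 0.7, 0.60553

# Assumption: two-tailed critical t for 95% confidence with df = 3, from a t table
T_CRITICAL = 3.182

def intervals(x_p):
    """Return (y_hat, confidence interval for mean Y, prediction interval for Y)."""
    n = len(ads)
    x_bar = sum(ads) / n
    ss_xx = sum((x - x_bar) ** 2 for x in ads)
    y_hat = b0 + b1 * x_p
    # Term shared by both intervals; grows as x_p moves away from x_bar
    base = 1 / n + (x_p - x_bar) ** 2 / ss_xx
    half_mean = T_CRITICAL * s * math.sqrt(base)
    half_pred = T_CRITICAL * s * math.sqrt(1 + base)   # the extra 1 widens the band
    return (y_hat,
            (y_hat - half_mean, y_hat + half_mean),
            (y_hat - half_pred, y_hat + half_pred))

y_hat, ci, pi = intervals(4)
print(f"Predicted Y = {y_hat:.3f}")
print(f"95% CI for mean Y:       ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"95% PI for individual Y: ({pi[0]:.3f}, {pi[1]:.3f})")
```

Small differences in the third decimal place relative to the printed output are expected, since the critical value here is rounded and the output appears to truncate rather than round.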

Hyperbolic Interval Bands
Note:
1. As we move farther from the mean, the bands get wider.
2. The prediction interval bands are wider. Why? (extra \(S_{yx}\))

Correlation Models

Correlation Models
1. Answer ‘How Strong Is the Linear Relationship Between 2 Variables?’
2. Coefficient of Correlation Used
   - Population Correlation Coefficient Denoted \(\rho\) (Rho)
   - Values Range from -1 to +1
   - Measures Degree of Association
3. Used Mainly for Understanding

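Sample Coefficient of Correlation
1. Pearson Product Moment Coefficient of Correlation, r:
\[ r = \frac{SS_{xy}}{\sqrt{SS_{xx}\, SS_{yy}}} \]
where \(SS_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y})\), \(SS_{xx} = \sum (x_i - \bar{x})^2\) and \(SS_{yy} = \sum (y_i - \bar{y})^2\).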
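A minimal sketch of the Pearson coefficient, applied to the Hasbro advertising data used throughout the deck. The function name `pearson_r` is an illustrative choice; the formula is the standard SSxy over the square root of SSxx times SSyy.

```python
import math

def pearson_r(x, y):
    """Pearson product moment coefficient of correlation for paired data."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    return ss_xy / math.sqrt(ss_xx * ss_yy)

ads = [1, 2, 3, 4, 5]
sales = [1, 1, 2, 2, 4]
r = pearson_r(ads, sales)
print(f"r = {r:.4f}, r squared = {r ** 2:.4f}")
```

Squaring r gives the same 0.8167 reported as R-square in the computer output, which illustrates that in simple linear regression the coefficient of determination is the square of the correlation coefficient.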

Coefficient of Correlation Values
Scale: -1.0, -.5, 0, +.5, +1.0
- -1.0: Perfect Negative Correlation
- 0: No Correlation
- +1.0: Perfect Positive Correlation
- Moving from 0 toward -1.0: Increasing degree of negative correlation
- Moving from 0 toward +1.0: Increasing degree of positive correlation

Coefficient of Correlation Examples (figure not reproduced)

Test of Coefficient of Correlation
1. Shows If There Is a Linear Relationship Between 2 Numerical Variables
2. Same Conclusion as Testing Population Slope \(\beta_1\)
3. Hypotheses
   - \(H_0: \rho = 0\) (No Correlation)
   - \(H_a: \rho \neq 0\) (Correlation)
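The transcript does not show the test statistic for the correlation test. A commonly used form, applied to the Hasbro data with r computed from the five observations, is sketched below; treat it as an illustration rather than the original slide content.

```latex
\begin{align*}
t &= \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}, \qquad df = n - 2 \\
  &= \frac{.9037\,\sqrt{5-2}}{\sqrt{1 - .8167}} \approx 3.656
\end{align*}
```

This matches the t value of 3.656 reported for the ADVERT slope in the computer output, which is why the correlation test and the slope test lead to the same conclusion.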

Conclusion
1. Described the Linear Regression Model
2. Stated the Regression Modeling Steps
3. Explained Ordinary Least Squares
4. Computed Regression Coefficients
5. Predicted Response Variable
6. Interpreted Computer Output

End of Chapter