
1 Regression Analysis The contents in this chapter are from Chapters 20-23 of the textbook. The cntry15.sav data will be used. The data contain information on 15 countries:
lifeexpf: female life expectancy
birthrat: births per 1000 population
Both are scale variables.

2 Linear regression model

3 It is obvious that the points are not randomly scattered over the grid; instead, there appears to be a pattern: as birthrate increases, life expectancy decreases. How do we choose the “best” line? The least squares principle is recommended. Linear regression model

4 Least squares principle

5 Dependent variable: the variable you wish to predict.
Independent variable: the variable(s) used to make the prediction.
Simple linear regression: a single numerical independent variable X is used to predict the numerical dependent variable Y.
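For reference, the standard form of the simple linear regression model is

$$ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \qquad i = 1, \dots, n, $$

where $\beta_0$ is the intercept, $\beta_1$ is the slope, and $\varepsilon_i$ is a random error term with mean 0 and constant variance $\sigma^2$.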

6 Least squares principle
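The least squares principle chooses the intercept $b_0$ and slope $b_1$ that minimize the sum of squared vertical distances between the observed points and the fitted line:

$$ \min_{b_0,\,b_1} \sum_{i=1}^{n} \bigl(y_i - b_0 - b_1 x_i\bigr)^2 , $$

which leads to the standard solutions

$$ b_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}, \qquad b_0 = \bar{y} - b_1 \bar{x}. $$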

7

8

9 Linear regression model

10 Linear regression model The regression model becomes: life expectancy = 90 - (0.70 x birthrate). That tells us that for an increase of 1 in birthrate, there is a decrease in life expectancy of 0.70 years.
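SPSS produced the fit above; as a rough sketch, the same least-squares line could be computed in Python with scipy. The data below are synthetic stand-ins for the cntry15.sav values, which are not reproduced here.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for the 15 countries in cntry15.sav (illustration only).
rng = np.random.default_rng(0)
birthrat = rng.uniform(10, 50, size=15)                  # births per 1000 population
lifeexpf = 90 - 0.70 * birthrat + rng.normal(0, 2, 15)   # female life expectancy

fit = stats.linregress(birthrat, lifeexpf)
print(f"slope = {fit.slope:.3f}, intercept = {fit.intercept:.2f}")
# For an increase of 1 in birthrate, predicted life expectancy changes by `slope` years.
```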

11 Prediction and residuals

12 Coefficient of Correlation It measures the strength of the linear relationship between two numerical variables.

13 Coefficient of Correlation The coefficient of correlation r satisfies -1 ≤ r ≤ 1.
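Its usual sample formula (the standard Pearson definition) is

$$ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} . $$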

14 Prediction and residuals The coefficient of determination measures the proportion of the variation in the dependent variable that is explained by the regression.
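In symbols, using the sums of squares from the ANOVA decomposition (a standard identity, stated here for reference):

$$ R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}, \qquad 0 \le R^2 \le 1, $$

and in simple linear regression $R^2$ equals the square of the coefficient of correlation, $R^2 = r^2$.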

15 ANOVA

16 Testing hypotheses about the assumptions
Independence: all of the observations are independent.
Variance homogeneity: the variance of the distribution of the dependent variable must be the same for all values of the independent variable.
Normality: for each value of the independent variable, the distribution of the related dependent variable follows a normal distribution.

17 Testing hypotheses

18 Testing that the slope is zero In this example, the sample slope is about -0.70 and its standard error is 0.05, so the value of the t statistic is -0.70/0.05 = -14, and the associated p-value is essentially 0. We reject the hypothesis that the slope is zero: there appears to be a linear relationship between 1992 female life expectancy and birthrate. The 95% confidence interval for the population slope is (-0.805, …). Testing hypotheses
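A quick check of this arithmetic outside SPSS (assuming n = 15 countries, so the slope's t statistic has n - 2 = 13 degrees of freedom; the rounded slope and standard error from the slide are used):

```python
from scipy import stats

slope, se, n = -0.70, 0.05, 15
df = n - 2                                   # 13 degrees of freedom
t = slope / se                               # -14.0
p = 2 * stats.t.sf(abs(t), df)               # two-sided p-value, effectively 0
half_width = stats.t.ppf(0.975, df) * se     # margin for a 95% confidence interval
print(t, p, (slope - half_width, slope + half_width))
```

With these rounded inputs the interval comes out to roughly (-0.81, -0.59), consistent with the lower limit reported on the slide.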

19 Prediction The regression equation obtained can be used to predict life expectancy from birthrate. For a country with a birthrate of 30 per 1000 population: predicted life expectancy = 90 - (0.70 x 30) ≈ 69 years (69.08 years using the unrounded coefficients).

20 Predicting means and individual observations The plot on the next page shows the standard error of the predicted mean life expectancy for different values of birthrate. The vertical line at 32.9 is the average birthrate for all cases. The farther a birthrate is from the sample mean, the larger the standard error of the predicted mean.
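The pattern in that plot follows from the standard formula for the standard error of a predicted mean at a value $x_0$ of the independent variable (stated here for reference):

$$ \widehat{SE}\bigl(\hat{\mu}_{Y \mid x_0}\bigr) = s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i}(x_i - \bar{x})^2}} , $$

where $s$ is the residual standard deviation; the $(x_0 - \bar{x})^2$ term is what makes the standard error grow as $x_0$ moves away from the sample mean (32.9 here).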

21 Plot of standard error of predicted mean

22 The 95% confidence region for the fitted regression line

23 Statistical diagnostics
Is the model correct?
Are there any outliers?
Is the variance constant?
Is the error normally distributed?

24 Statistical diagnostics Residuals provide much useful information about the above four issues in statistical diagnostics. You cannot judge the relative size of a residual by looking at its value alone, because it depends on the units of the dependent variable and so is not convenient to use. Standardized residuals: divide each residual by the estimated standard deviation of the residuals.

25 Statistical diagnostics If the distribution of residuals is approximately normal, about 95% of the standardized residuals should be between -2 and 2, and about 99% should be between -2.58 and 2.58. This makes it easy to see whether there are any outliers.

26 Statistical diagnostics When you compute standardized residuals, all of the observed residuals are divided by the same number. However, the variability of the dependent variable is not constant for all points; it depends on the value of the independent variable. The studentized residual takes these differences in variability from point to point into account: it is calculated by dividing each residual by an estimate of the standard deviation of the residual at that point.

27 Statistical diagnostics A residual divided by an estimate of the standard deviation of the residual at that point is called a studentized residual. Studentized residuals make it easier to see violations of the regression assumptions.
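In symbols, if $e_i$ is the $i$-th residual, $s$ the residual standard deviation, and $h_{ii}$ the leverage of observation $i$, the two kinds of residuals discussed above are (standard definitions, stated for reference):

$$ \text{standardized: } \frac{e_i}{s}, \qquad \text{studentized: } \frac{e_i}{s\sqrt{1 - h_{ii}}} . $$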

28 Statistical diagnostics

29 Standardized Residual Stem-and-Leaf Plot of the standardized residuals (Frequency, Stem & Leaf columns; each leaf: 1 case)

30 Checking for normality

31 If the data are a sample from a normal distribution, you expect the points to fall more or less on a straight line. You can see that the two largest residuals in absolute value (Thailand and Namibia) are stragglers from the line. The next page is a detrended normal plot. If the data are from a normal distribution, the points in the detrended normal plot should fall randomly in a band around 0. Checking for normality

32 Checking for normality

33 Testing for normality Many statistical tests for normality have been proposed; one of them is the Kolmogorov-Smirnov test.
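A minimal sketch of running this test outside SPSS with scipy, assuming the standardized residuals are available as an array (synthetic values are used as a stand-in here):

```python
import numpy as np
from scipy import stats

# Stand-in for the 15 standardized residuals (illustration only).
rng = np.random.default_rng(1)
std_resid = rng.standard_normal(15)

# One-sample Kolmogorov-Smirnov test against the standard normal distribution.
stat, p = stats.kstest(std_resid, "norm")
print(f"K-S statistic = {stat:.3f}, p-value = {p:.3f}")
# When the mean and variance are estimated from the same data, the
# Lilliefors-corrected version of the test is the more appropriate choice.
```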

34 Checking for constant variance Residual plot: a plot of the studentized residuals against the estimated (fitted) values. From the residual plot you can see whether there is any pattern. When the assumptions hold, the residuals appear to be randomly scattered around a horizontal line through 0.

35 Checking for constant variance

36 Checking linearity When the relationship between two variables is not linear, you can sometimes transform the variables to make the relationship linear, for example by taking the logarithm, sine, exponential, etc. The next plot is a scatterplot of female life expectancy against the natural log of phones per 100.

37 Multiple Regression Models Considering the country.sav data, you are interested in predicting female life expectancy from:
Urban: percentage of the population living in urban areas
Docs: number of doctors per 10,000 people
Beds: number of hospital beds per 10,000 people
Gdp: per capita gross domestic product in dollars
Radios: radios per people

38 Multiple Regression Models A linear regression model relating female life expectancy to these five predictors is given below. A scatterplot matrix is useful for a first look at the relationships.
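In general form, with the predictors listed on the previous slide, such a model can be written as

$$ Y = \beta_0 + \beta_1\,\text{urban} + \beta_2\,\text{docs} + \beta_3\,\text{beds} + \beta_4\,\text{gdp} + \beta_5\,\text{radios} + \varepsilon . $$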

39 Scatterplot matrix

40 Scatterplot matrix The relationship between female life expectancy and the percentage of the population living in urban areas appears to be more or less linear. The other four independent variables appear to be related to female life expectancy, but the relationships are not linear, so we take the natural log of the values of these four independent variables.
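A rough sketch of fitting the resulting model outside SPSS (the file path, column names, and the use of pandas/statsmodels are assumptions, not taken from the slides; reading .sav files with pandas requires the pyreadstat package):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Load the data (file path and column names are assumptions).
df = pd.read_spss("country.sav")

X = pd.DataFrame({
    "urban":   df["urban"],
    "lndocs":  np.log(df["docs"]),
    "lnbeds":  np.log(df["beds"]),
    "lngdp":   np.log(df["gdp"]),
    "lnradio": np.log(df["radios"]),
})
X = sm.add_constant(X)                       # adds the intercept term
model = sm.OLS(df["lifeexpf"], X, missing="drop").fit()
print(model.summary())                       # coefficients, R-squared, ANOVA F test
```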

41

42 Correlation matrix

43 Regression coefficients The estimated regression model has the form Y = b0 + b1(urban) + b2(lndocs) + b3(lnbeds) + b4(lngdp) + b5(lnradio), with the estimated coefficients b0, ..., b5 read from the SPSS coefficients table.

44 SPSS output: model summary statistics

45 SPSS output: ANOVA This regression is meaningful, as the significance level reported in the ANOVA table is very small. The residual variance is also given in the table.