Statistical Data Analysis - Lecture 17 - 29/04/03

Regression
We have looked at ANOVA models for a single response with one or two factors. Sometimes these factors are known as predictors, although that term is usually reserved for continuous variables. What do we do when we have a single response and one (or more) variables we believe might be related to the response? One solution is linear regression and least squares.

Simple linear regression
When we have a single response and a single predictor or explanatory variable, we might, depending on what we are interested in, fit a simple linear regression model. Why would we do such a thing?
- If we plotted the (possibly transformed) data and saw that they were related in some linear fashion
- If we are interested in predicting the response from experimental or observational data
- If we think that a line would be an adequate summary of the data

What does linear mean?
Linear = line (we will see later on that this concept must be extended to higher dimensions). Technically, linear refers to the coefficients in the regression model. What is this regression model we keep referring to? In school (hopefully) we learned a number of ways to describe a straight line. A straight line can be described by two points on a graph: the line passes through the points $(x_1, y_1)$ and $(x_2, y_2)$.

[Figure: a straight line through the points $(x_1, y_1)$ and $(x_2, y_2)$, drawn on x and y axes]
We can also describe the line in terms of a slope and an intercept. The slope is the change in the y-value for a unit change in the x-value, i.e. $(y_2 - y_1)/(x_2 - x_1)$; in this simple situation we can think of it as the change in the height of the line as we move along the x-axis. The intercept is the height of the line when $x = 0$.

Simple linear regression
Perhaps now we have the tools to begin to write down a model. Generally we have more than two points to work with; ideally we wouldn't fit a regression model to a data set with fewer than thirty points. We have a number of responses, $y_i$, and an associated measurement $x_i$ (which we assume is taken without measurement error) which we think explains our response. However, the points (usually) don't lie exactly on a straight line: there is a bit of "noise" associated with each measurement. Does this sound familiar?

A probability model for simple linear regression
The response variable, $Y$, is described by an intercept and a coefficient for the predictor $X$:
$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$
If we subtract the model from the data we expect to find residuals that are independently and identically normally distributed with mean zero and standard deviation $\sigma$, i.e. $\varepsilon_i \sim N(0, \sigma^2)$. How do we find the intercept and the coefficient for the predictor $X$?

The method of least squares
We have a "theoretical" model which we believe describes the data. How do we "fit" the model to our data? That is, how do we find the slope and intercept for our model? These ideas are best illustrated with some data. Our data set has 50 observations, with a response Y and a predictor X. Given this is bivariate data, the first thing we do is plot it.
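
The lecture's actual data set is not reproduced in the transcript, so the R sketch below simulates a stand-in with the same shape: 50 observations clustered around a line. The intercept 3 and slope 5 are invented values, and the error SD of 5.6 simply echoes the residual SD quoted later in the lecture.

    # Hypothetical stand-in for the lecture's 50-observation data set
    set.seed(1)
    x <- runif(50, min = 0, max = 20)                # predictor, assumed measured without error
    y <- 3 + 5 * x + rnorm(50, mean = 0, sd = 5.6)   # response = line + normal noise
    plot(x, y, main = "Response vs predictor")       # always plot bivariate data first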

We can see from our plot that the data points seem to cluster around a straight line. Maybe a regression model is appropriate.

We could try fitting a line "by eye", but everyone's best guess would probably be different. We want consistency.

Least squares
The least squares procedure is a method for fitting regression lines. It attempts to find the intercept and the slope such that the residual sum of squares is minimised, i.e. find $\beta_0$ and $\beta_1$ such that
$\sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$
is minimised. The minimum value of this function is zero, but this is hardly ever achieved. The least squares fitted values are denoted $\hat{y}_i$.
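
To make the idea concrete, the sketch below (continuing with the simulated data above) minimises the residual sum of squares numerically and checks the answer against R's built-in lm() fit; the function name rss is just a local choice.

    rss <- function(b) sum((y - b[1] - b[2] * x)^2)  # residual sum of squares
    ls.fit <- optim(c(0, 0), rss)                    # numerical minimisation over (b0, b1)
    ls.fit$par                                       # estimated intercept and slope
    coef(lm(y ~ x))                                  # lm() should agree closely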

[Figure: scatterplot with a fitted line; coloured segments join each point vertically to the line]
We try to find a slope and intercept that make these segments as small as possible, i.e. we try to minimise the vertical distances from the points to the fitted values. (Note that these are vertical distances, measured in the y direction only, not perpendicular distances to the line.)

Fitted values and residuals
After we fit the regression line, we get fitted coefficient values $\hat{\beta}_0$ and $\hat{\beta}_1$ for the intercept and the slope. If we apply these to the data, we get the fitted values, i.e.
$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$
Corresponding to the fitted values are the residuals, our estimates of the errors in the model, i.e.
$e_i = y_i - \hat{y}_i$
We use the residuals to assess model fit, and to check the model assumptions.
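
In R these quantities come straight from the fitted model object (again using the simulated data):

    fit <- lm(y ~ x)                # least squares fit
    head(fitted(fit))               # fitted values, y-hat
    head(resid(fit))                # residuals, e = y - y-hat
    all.equal(y, as.numeric(fitted(fit) + resid(fit)))  # data = fitted + residual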

Checking model assumptions
Recall that when we proposed the regression model we made an assumption: namely, that the errors are normally distributed with mean zero and variance $\sigma^2$. This means, as in ANOVA, we need to check whether the residuals are normally distributed and whether the variances are equal amongst the residuals. We've seen how to check for normality: a norplot (normal probability plot).
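
In R, a norplot of the residuals takes two lines (base graphics):

    qqnorm(resid(fit), main = "Norplot of residuals")  # sample vs normal quantiles
    qqline(resid(fit))                                 # reference line through the quartiles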

[Figure: normal probability plot of the residuals, with a fitted straight line]
Our norplot follows a straight line reasonably well, so we might think our assumption of normality is satisfied. The intercept of a fitted line on this plot is zero – what does that mean? The slope is 5.62 – what does that mean? (Recall that on a norplot the intercept of the fitted line estimates the mean of the data and the slope estimates the standard deviation, so here the residuals have mean zero and estimated standard deviation 5.62.)
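
Those two numbers can be recovered by fitting a line to the norplot's points; a quick sketch:

    q <- qqnorm(resid(fit), plot.it = FALSE)  # theoretical (x) and sample (y) quantiles
    coef(lm(q$y ~ q$x))                       # intercept ~ mean 0, slope ~ residual SD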

Equality of variance
It is possible to have normality without equality of variance, i.e. in some situations we would fit the model
$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad \varepsilon_i \sim N(0, \sigma_i^2)$
with a different variance for each error. However, that is not the model we fitted to this set of data: we assumed equal variances for every error, i.e. $\sigma_i^2 = \sigma^2$ for all $i$. We check this assumption, as before, with a pred-res plot. This time, however, we will generally see less patterning, because the data are not grouped.

[Figure: "Residuals vs Fitted" plot for lm(formula = y ~ x); residuals range from roughly -10 to 15 across fitted values 20 to 100, with observations 16, 49 and 17 labelled as the most extreme residuals]
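
R draws exactly this diagnostic as the first plot of a fitted lm object; it can also be built by hand:

    plot(fit, which = 1)   # built-in "Residuals vs Fitted" plot, labels extreme points
    # or by hand:
    plot(fitted(fit), resid(fit), xlab = "Fitted values", ylab = "Residuals")
    abline(h = 0, lty = 2)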

What do we look for in a pred-res plot?
- Extreme residuals – our estimated standard deviation of the residuals is 5.6, which gives a yardstick for "extreme"
- More negative residuals than positive, or vice versa
- Strong patterns or trends in the residuals
What do these features mean? If we have extreme residuals then there are a number of possible reasons:
- An outlier or a data entry error
- Poor model fit
- Possible high leverage points elsewhere
If there is a disproportionate ratio of positive to negative residuals then we may have a skewed response variable.
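
These checks can also be done numerically; a sketch of two quick ones (not part of the original slides):

    rstandard(fit)                   # residuals standardised by their estimated SD
    which(abs(rstandard(fit)) > 2)   # candidates for "extreme" residuals
    table(sign(resid(fit)))          # balance of negative vs positive residuals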

Interpreting pred-res plots
If we have strong patterns in the pred-res plot then this can mean a number of things:
- The equality of variance assumption has been violated – this is usually shown by a funnel shape in the plot
- The simple linear model did not explain the trend in the data, i.e. there is some trend remaining in the data which might require the addition of extra model terms – this is more likely in multivariate regression
- The data require transformation before a linear model is appropriate

[Figure: pred-res plot where the spread of the residuals fans out as the fitted values increase]
This funnel effect is evidence of "non-homogeneity of variance". Usually we can get around it by transforming the data or fitting a different model. It is never valid to proceed from this point to the interpretation of the regression coefficients.

[Figure: pred-res plot with a strong curved trend in the residuals]
We usually see this type of effect when the real trend is non-linear. The actual model here is $y = e^{5x}$, which is definitely a non-linear model. Taking logs would cure this problem – why? Because $\log y = 5x$, which is linear in $x$.
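
A small demonstration of that cure. The simulation below is a sketch: the noise is made multiplicative (an invented choice) so that the log transform gives an exactly linear relationship.

    x2 <- runif(50, 0, 1)
    y2 <- exp(5 * x2) * exp(rnorm(50, sd = 0.2))  # y = exp(5x) with multiplicative noise
    par(mfrow = c(1, 2))
    plot(x2, y2, main = "Raw scale: curved")      # clearly non-linear trend
    plot(x2, log(y2), main = "Log scale: linear") # log(y) = 5x + noise, a straight line
    par(mfrow = c(1, 1))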

[Figure: pred-res plot with many small negative residuals and a few large positive ones]
Here we have more negative residuals than positive, and the extreme residuals are all positive. This says the errors are skewed, which violates our assumption of normality. The real model here was $y = 5x + 3 + \varepsilon$, with $\varepsilon \sim \text{Exp}(50)$.
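
This pattern is easy to reproduce; a sketch, reading Exp(50) as an exponential with rate 50 (an assumption, since the slide does not say whether 50 is the rate or the mean):

    x3 <- runif(50, 0, 20)
    y3 <- 5 * x3 + 3 + rexp(50, rate = 50)  # right-skewed (exponential) errors
    fit3 <- lm(y3 ~ x3)
    plot(fitted(fit3), resid(fit3))         # many small negative residuals,
    abline(h = 0, lty = 2)                  # a few large positive ones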