Introduction to Regression ©2005 Dr. B. C. Paul

Things Favoring ANOVA Analysis
ANOVA tells you whether a factor is controlling a result. It requires that the controlling factor be easily categorized (for example, Spring, Summer, Fall). It tends to work well on non-quantitative, unordered, or discontinuous controlling factors. It does not quantify the magnitude or type of an effect, only its existence. Example: gas mileage is influenced by the season of the year, the driving distance, and the driver.
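To make the ANOVA idea concrete, here is a minimal Python sketch, assuming made-up MPG readings grouped by season (the data and the helper name `one_way_anova_f` are illustrative, not from the slides). It computes the one-way ANOVA F statistic by hand as the ratio of between-season variance to within-season variance. Note that it only tests whether season matters; it says nothing about how much or in what direction, which is the limitation described above.

```python
# Hypothetical MPG readings grouped by season (made-up numbers for illustration).
mpg_by_season = {
    "Spring": [24.1, 25.3, 23.8, 24.9],
    "Summer": [26.2, 27.0, 25.8, 26.5],
    "Fall": [23.5, 24.0, 22.9, 23.7],
}


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA on a dict of groups."""
    all_values = [v for values in groups.values() for v in values]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n
    group_means = {name: sum(vals) / len(vals) for name, vals in groups.items()}
    # Between-group scatter: how far each category's mean sits from the grand mean.
    ss_between = sum(len(vals) * (group_means[name] - grand_mean) ** 2
                     for name, vals in groups.items())
    # Within-group scatter: how far readings sit from their own category's mean.
    ss_within = sum((v - group_means[name]) ** 2
                    for name, vals in groups.items() for v in vals)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


f, df1, df2 = one_way_anova_f(mpg_by_season)
print(f"F = {f:.1f} on ({df1}, {df2}) degrees of freedom")
```

A large F suggests the factor matters. Turning F into a significance level needs the F distribution with (df_between, df_within) degrees of freedom, which the Python standard library does not provide.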

Things Favoring Regression Analysis
Suppose your gas mileage data is recorded against outside temperature, distance driven, and age of driver. These data can be categorized only by arbitrary divisions. Suppose I want to know quantitatively how these continuous numeric variables control gas mileage.

What Regression Does
The idea is that you have a “dependent variable” that is a function of some “independent variable”: Y = F(X). This could be gas mileage as a function of temperature. The simplest form of a function is a straight line: Y = b0 + b1*X.

Reminders on Linear Form
b0 is the intercept of the line with the vertical axis (the value of Y at X = 0). b1 represents the units of rise in Y per unit of run in X (i.e., it is the slope of the line).
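A small sketch of these two reminders, assuming two hypothetical points on a line: the slope is rise over run, and the intercept is where the line meets X = 0. The helper name `line_through` is illustrative.

```python
def line_through(p1, p2):
    """Return (b0, b1) for the straight line Y = b0 + b1*X through two points."""
    (x1, y1), (x2, y2) = p1, p2
    b1 = (y2 - y1) / (x2 - x1)  # slope: rise in Y per unit of run in X
    b0 = y1 - b1 * x1           # intercept: value of Y where the line crosses X = 0
    return b0, b1


b0, b1 = line_through((2.0, 7.0), (6.0, 15.0))
print(b0, b1)          # 3.0 2.0
print(b0 + b1 * 10.0)  # 23.0
```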

Idea Behind Linear Regression
Most of the variation of Y can be explained as a linear function of X. The portion of variation in Y due to other known, unknown, or random causes is normally distributed about the regression line. The degree to which we missed predicting Y using X can be measured by squaring the difference between the actual and predicted values. We select the linear coefficients b0 and b1 such that the sum of all these squared differences is minimized. For this class we will skip the derivation and the formulas used to get b0 and b1; linear regression is readily done by most calculators and by our friend SPSS.
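For readers curious about what the calculator or SPSS does behind the scenes, the standard closed-form least-squares solution is b1 = Sxy / Sxx and b0 = ȳ - b1·x̄, where Sxy is the sum of (x - x̄)(y - ȳ) and Sxx is the sum of (x - x̄)². A minimal Python sketch, assuming made-up data and a hypothetical helper name:

```python
def least_squares(xs, ys):
    """Return (b0, b1) minimizing the sum of squared errors for Y = b0 + b1*X."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx         # slope
    b0 = y_bar - b1 * x_bar  # the fitted line passes through the point of means
    return b0, b1


# Hypothetical data, made up for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = least_squares(xs, ys)
print(f"Y = {b0:.2f} + {b1:.2f}*X")
```

On Python 3.10 or later, `statistics.linear_regression(xs, ys)` returns the same slope and intercept.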

Doing Linear Regression With SPSS
Begin by entering the data. In this case we will consider gas mileage as a function of the distance a car is driven. We believe there may be a relationship because vehicles take a while to warm up and get better mileage after warming up.

Why Did I Pick Linear Regression?
The controlling variable was continuous, not a category. If I had looked at gas mileage as a function of gender, the controlling variable would have been a category (male, female). A linear relationship is an easy one to consider, and there are ways of plotting data to see whether there appears to be a linear trend.

A Note on Modeling
Statistical methods are all about fitting mathematical models to real data. A linear regression attempts to fit a straight-line function of X through the Y data. Ultimately, the quality of what I do depends on how well the model represents reality. A poorly fit model will still produce answers, but right answers cost more and are harder to get.

Visually Examining Our Data Set
Go to Graphs and click to pull down the menu. Highlight and click on scatter plot.

You Will Be Given a Choice of Types of Scatter Plots
The default is a simple scatter plot, which I am going to accept. I will click the Define button to move to the next screen.

I Need to Define What to Plot on the Y and X Axes
The Y axis is my dependent variable. In this case I believe that MPG is a function of distance traveled. To make MPG my Y variable, I highlight MPG and then click the arrow by Y Axis.

Next, Choose My Independent Variable
Since I believe that MPG might be a function of distance driven, I next select Distance and click the arrow by X Axis to move the variable over. Then I click OK to go to the plot.

Out Comes My Plot
I see a fairly clear indication that gas mileage improves with the length of the trip.

Now Getting On to Regression
Click the pull-down menu for Analyze, highlight Regression to pop the side menu out, then highlight and click Linear.

Select the Regression Variables
Note that I selected MPG as my dependent variable and Distance as my independent variable.

Click OK and Out Comes Stuff
First it tells me about the variables that entered (i.e., what it tried to make MPG a function of). I told it to make MPG a function of distance, and the table says it entered distance as the controlling variable. Method “Enter” means it entered that variable because I told it to.

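Next Box Tells Me About How Well I Did Guessing a Linear Model
R² is called the coefficient of determination (it is the square of R, which for a single predictor is the size of the Pearson product-moment correlation coefficient). It tells me how much of the total scatter in the data is explained by my linear regression on one variable (distance). An R² of 0.393 means 39.3% was explained.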

More Interpretation
The R value tells you how well your data followed a straight line: 1 means the points lie exactly on a straight line, and 0 means there is no straight-line trend at all. (Points on a circle would score about 0 even though X and Y are exactly related; the relationship just is not a linear one.) The standard error of the estimate is roughly how far, on average, you would miss your guess if you just gave the mileage predicted by the equation.
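The quantities on these two slides can be reproduced by hand. A minimal sketch, assuming made-up data and a hypothetical helper `fit_summary`: R² is 1 - SSE/SST, R is the size of Pearson's correlation between X and Y (true for a single predictor), and the standard error of the estimate is the square root of SSE/(n - 2). The usage lines check the circle remark.

```python
import math


def fit_summary(xs, ys):
    """Return (R, R squared, standard error of the estimate) for a one-predictor line."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))  # scatter left unexplained
    sst = sum((y - y_bar) ** 2 for y in ys)                      # total scatter about the mean
    r_squared = 1 - sse / sst
    # With one predictor, R is the size of Pearson's correlation between X and Y.
    r = abs(s_xy) / math.sqrt(s_xx * sst)
    # Two coefficients were estimated, so the error variance uses n - 2 degrees of freedom.
    see = math.sqrt(sse / (n - 2))
    return r, r_squared, see


# Points on a circle: X and Y are exactly related, but not linearly, so R comes out near 0.
angles = [2 * math.pi * k / 12 for k in range(12)]
r, r2, _ = fit_summary([math.cos(a) for a in angles], [math.sin(a) for a in angles])
print(f"circle: R = {r:.3f}")
```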

The ANOVA Table
SPSS does an ANOVA on the linear model as a predictor. The table reports the F value for the regression, and the chance of getting an F value that high if the model fit were a fluke is essentially 0.
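The regression ANOVA table splits the total scatter of Y into the part the line explains (regression) and the part it misses (residual), each divided by its degrees of freedom to give a mean square; F is their ratio. A minimal sketch, assuming made-up data and a hypothetical helper name:

```python
def regression_anova(xs, ys):
    """Build an ANOVA table for a one-predictor least-squares line."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    b0 = y_bar - b1 * x_bar
    predicted = [b0 + b1 * x for x in xs]
    ss_regression = sum((p - y_bar) ** 2 for p in predicted)     # scatter the line explains
    ss_error = sum((y - p) ** 2 for y, p in zip(ys, predicted))  # scatter left over
    df_regression, df_error = 1, n - 2
    ms_regression = ss_regression / df_regression
    ms_error = ss_error / df_error
    return {
        "Regression": (ss_regression, df_regression, ms_regression),
        "Residual": (ss_error, df_error, ms_error),
        "F": ms_regression / ms_error,
    }


table = regression_anova([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
for row in ("Regression", "Residual"):
    ss, df, ms = table[row]
    print(f"{row:<10} SS={ss:9.3f} df={df} MS={ms:9.3f}")
print(f"F = {table['F']:.1f}")
```

For one predictor, F has 1 and n - 2 degrees of freedom and equals the square of the slope's t statistic.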

The Coefficients Table
The Coefficients table gives the regression constants: B0 (the constant) and B1 = 0.654, so the fitted equation is Y = B0 + 0.654*X.

How Good Are Our Coefficients?
A test statistic is computed for each coefficient in the equation. The “null hypothesis” is that the slope or intercept is actually 0, and the test statistic has a t distribution. The table also gives the standard deviation (standard error) of each coefficient estimate, for the constant as well as for the slope.

Significance of the Coefficients
The significance levels in this table give the chance of getting a t statistic this large if the real value of the regression coefficient were 0. As can be seen, for both coefficients that chance is essentially zero, so we can confidently reject the idea that either coefficient is 0.
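The standard errors and t values in the Coefficients table follow standard formulas for simple regression: SE(b1) = s / √Sxx and SE(b0) = s·√(1/n + x̄²/Sxx), where s is the standard error of the estimate, and t = coefficient / SE. A minimal sketch, assuming made-up data and a hypothetical helper name:

```python
import math


def coefficient_t_tests(xs, ys):
    """Return {name: (estimate, standard error, t)} for the constant and slope."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))  # standard error of the estimate
    se_b1 = s / math.sqrt(s_xx)
    se_b0 = s * math.sqrt(1 / n + x_bar ** 2 / s_xx)
    # Each t tests the null hypothesis that the true coefficient is 0.
    return {
        "Constant": (b0, se_b0, b0 / se_b0),
        "Slope": (b1, se_b1, b1 / se_b1),
    }


results = coefficient_t_tests([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
for name, (estimate, se, t) in results.items():
    print(f"{name:<8} B={estimate:7.3f} SE={se:6.3f} t={t:7.2f}")
```

With these particular numbers the constant's t is small, so, unlike the slides' example, we could not rule out an intercept of 0. A two-sided significance level compares |t| with a t distribution on n - 2 degrees of freedom; if SciPy is available, `2 * scipy.stats.t.sf(abs(t), n - 2)` computes it.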

Some Conclusions
There is definitely a linear influence of miles driven on gas mileage; however, the linear relationship explains only about 40% of the variability in the data, so we know there is still something else out there. We may also want to examine our residuals to see whether they show trends indicating that we might be missing something, or that our assumption of normally distributed residuals with constant spread about the model is wrong. That assumption might fail, for example, if we were wrong about a linear model being the best fit.

Examining Residuals of Regression
Set up your linear regression in the usual manner.

Selecting Plots
After setting your dependent and independent variables, and before clicking OK, click Plots instead.

Picking Residual Plots
Plot the residual on the Y axis against the predicted value on the X axis. Ask for histograms and normal probability plots.

More Plots
Use the Next button to select another plot. This time put the residual on the Y axis against the dependent variable. Finally, click Continue.
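The residual plots requested here are built from two columns per case: the predicted value and the residual (actual minus predicted). A minimal sketch of those columns, assuming made-up data and a hypothetical helper `residuals`:

```python
def residuals(xs, ys):
    """Return (predicted, residual) lists for a least-squares line of ys on xs."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    b0 = y_bar - b1 * x_bar
    predicted = [b0 + b1 * x for x in xs]
    # Residual = actual - predicted: positive means the model under-predicted,
    # negative means it over-predicted.
    resid = [y - p for y, p in zip(ys, predicted)]
    return predicted, resid


xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
predicted, resid = residuals(xs, ys)
for y, p, e in zip(ys, predicted, resid):
    print(f"actual={y:5.2f} predicted={p:5.2f} residual={e:+.2f}")
```

For any least-squares line with a constant, the residuals sum to zero, so their histogram is always centered on zero; what matters is its shape.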

You Will Still Get the Usual Tables We Saw Before
Scroll down to see what is new.

Some Abnormality in the Histogram
A histogram is a bar chart showing the number of results falling in different numeric intervals. In this case we can see there may be two families of unexplained events, and one of them is causing the model to over-predict (note the negative tail).
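A histogram is just a count of values per equal-width interval. A minimal text-mode sketch, assuming a made-up list of residuals with a small cluster far out on the negative side (the helper name `text_histogram` is hypothetical):

```python
def text_histogram(values, bins=5):
    """Count values in equal-width intervals and print a bar for each interval."""
    lo, hi = min(values), max(values)
    if hi == lo:  # all values identical: everything falls in one interval
        print(f"{lo:7.2f} | {'#' * len(values)}")
        return [len(values)]
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # min(...) keeps the largest value in the last interval instead of overflowing.
        counts[min(int((v - lo) / width), bins - 1)] += 1
    for i, count in enumerate(counts):
        left = lo + i * width
        print(f"{left:7.2f} to {left + width:7.2f} | {'#' * count}")
    return counts


# Hypothetical residuals with a small cluster far out on the negative side.
example_residuals = [-3.1, -2.8, -0.4, -0.1, 0.2, 0.3, 0.5, 0.6, 0.9, 1.0, 1.1, 1.8]
text_histogram(example_residuals)
```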

We Have a Cumulative Probability Plot
Cumulative probability counts all the samples that should have come up by a certain point (it is the integral of the probability distribution). Normally distributed residuals would plot on a straight line. This plot is somewhat straight, but the slope at the center is wrong and the tails drift off. (More commentary on reading cumulative probability plots later.)
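A minimal sketch of how the points of such a normal cumulative probability (P-P) plot can be computed, assuming the common plotting position (i - 0.5)/n for the observed cumulative proportion (SPSS may use a slightly different formula) and made-up residual lists. The helper names are hypothetical; `statistics.NormalDist` is from the Python standard library (3.8+).

```python
from statistics import NormalDist, mean, stdev


def normal_pp_points(residuals):
    """Return (observed, expected) cumulative probabilities for a normal P-P plot."""
    n = len(residuals)
    m, s = mean(residuals), stdev(residuals)
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    points = []
    for i, r in enumerate(sorted(residuals), start=1):
        observed = (i - 0.5) / n                     # share of the sample at or below this rank
        expected = standard_normal.cdf((r - m) / s)  # share a normal curve predicts by this value
        points.append((observed, expected))
    return points


def max_departure(points):
    """Largest gap between the plotted points and the straight diagonal."""
    return max(abs(obs - exp) for obs, exp in points)


normal_like = [NormalDist().inv_cdf((i - 0.5) / 20) for i in range(1, 21)]
skewed = [0.1, 0.1, 0.2, 0.2, 0.3, 0.5, 0.8, 1.5, 3.0, 6.0]
print(f"normal-looking residuals: max departure {max_departure(normal_pp_points(normal_like)):.3f}")
print(f"skewed residuals:         max departure {max_departure(normal_pp_points(skewed)):.3f}")
```

Points from the normal-looking sample hug the diagonal; the skewed sample drifts away from it, much like the plot on this slide.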

Look for Trends That Have Been Systematically Missed
This plot shows the residual (the amount we missed by) against the predicted value. If there is a trend in the points, it may tell us what we missed. In this case the points are pretty scattered.

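Missing Trends
There is a definite trend in the residuals relative to the actual MPG. Be careful how much to read into this: when R² is below 1, residuals plotted against the actual Y always trend upward (their correlation is the square root of 1 - R²), so the plot mainly restates that about 60% of the variation is unexplained. Either way, we are still missing a variable or factor (whose effect might itself be linear).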
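Why does a plot of residuals against the actual Y trend upward? For least-squares fits, residuals are uncorrelated with the predicted values but correlated with the actual values by exactly the square root of 1 - R². A minimal sketch, assuming made-up data and a hypothetical helper `pearson_r`, checks both facts:

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    s_ab = sum((u - a_bar) * (v - b_bar) for u, v in zip(a, b))
    s_aa = sum((u - a_bar) ** 2 for u in a)
    s_bb = sum((v - b_bar) ** 2 for v in b)
    return s_ab / math.sqrt(s_aa * s_bb)


# Hypothetical data, made up for illustration.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [3.2, 2.9, 5.1, 4.0, 6.8, 5.5, 7.9, 6.6]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
b0 = y_bar - b1 * x_bar
predicted = [b0 + b1 * x for x in xs]
resid = [y - p for y, p in zip(ys, predicted)]
r_squared = pearson_r(xs, ys) ** 2

print(f"corr(residual, predicted) = {pearson_r(resid, predicted):+.3f}")  # ~0 for least squares
print(f"corr(residual, actual Y)  = {pearson_r(resid, ys):+.3f}")
print(f"sqrt(1 - R^2)             = {math.sqrt(1 - r_squared):+.3f}")
```

This is why the residual-versus-predicted plot is the one to inspect for missed trends.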

Consider Another Data Set
We have an independent and a dependent variable. (The data set could represent any problem we wished to model.)

Tell it to do a regression of the dependent variable against the independent variable. Be sure to also ask for the residual plots.

Go to Results
The R² value is darn close to 1, and 1 is a perfect straight line. How much closer do you want to be? This regression looks like it fits like a glove: the mean square for regression is 5 orders of magnitude greater than the mean square for error, and the F statistic blows the null hypothesis off the map.

No Chance the Slope or Constant are Zero

There is some evidence the distribution of residuals is a little skewed.

The residual distribution is definitely skewed off to one side
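Skewness can also be put into a number rather than judged by eye. A minimal sketch of the moment coefficient of skewness (g1), applied to made-up residual lists; a long negative tail gives a negative value. The helper name is hypothetical, and statistics packages such as SPSS typically report a small-sample adjusted version, so their numbers will differ slightly.

```python
def skewness(values):
    """Moment coefficient of skewness g1: 0 for symmetric data, negative for a long left tail."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - m) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


print(f"{skewness([-2, -1, 0, 1, 2]):+.2f}")  # symmetric
print(f"{skewness([-3.1, -2.8, -0.4, -0.1, 0.2, 0.3, 0.5, 0.6, 0.9, 1.0, 1.1, 1.8]):+.2f}")
```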

Oh Boy, Can You See the Trend We Missed Here?
Here the residuals follow the clear and unmistakable shape of an effect we missed.

This Thing Has a Second Order or Curved Effect
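The trap on these slides is easy to reproduce. A minimal sketch, assuming made-up data in which Y is exactly X squared: a straight line still earns a very high R², yet the residuals run positive, then negative, then positive again, which is the curved shape the residual plot exposes.

```python
# Hypothetical data with a purely curved (second-order) effect: Y = X squared.
xs = list(range(1, 11))
ys = [x ** 2 for x in xs]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
s_xx = sum((x - x_bar) ** 2 for x in xs)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
b0 = y_bar - b1 * x_bar
resid = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
sse = sum(e ** 2 for e in resid)
sst = sum((y - y_bar) ** 2 for y in ys)

print(f"Y = {b0:.1f} + {b1:.1f}*X, R^2 = {1 - sse / sst:.3f}")  # Y = -22.0 + 11.0*X, R^2 = 0.950
print("residual signs:", " ".join("+" if e > 0 else "-" for e in resid))  # + + - - - - - - + +
```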

OK, Now What Do I Do?
Linear regression rapidly and quantitatively fits a simple linear function of one variable to another. We noted that there had to be other effects on gas mileage, but the simple linear regression we used handles only one independent variable. We also noted that sometimes second- or higher-order effects of a variable are present, and a straight line just doesn't fit those. We may want some more powerful tools to fall back on (we just try the easy stuff first).