Agresti/Franklin Statistics, 1 of 88 Chapter 11 Analyzing Association Between Quantitative Variables: Regression Analysis Learn…. To use regression analysis.


Agresti/Franklin Statistics, 1 of 88 Chapter 11 Analyzing Association Between Quantitative Variables: Regression Analysis Learn…. To use regression analysis to explore the association between two quantitative variables

Agresti/Franklin Statistics, 2 of 88  Section 11.1 How Can We “Model” How Two Variables Are Related?

Agresti/Franklin Statistics, 3 of 88 Regression Analysis The first step of a regression analysis is to identify the response and explanatory variables We use y to denote the response variable We use x to denote the explanatory variable

Agresti/Franklin Statistics, 4 of 88 The Scatterplot The first step in answering the question of association is to look at the data A scatterplot is a graphical display of the relationship between two variables

Agresti/Franklin Statistics, 5 of 88 Example: What Do We Learn from a Scatterplot in the Strength Study? An experiment was designed to measure the strength of female athletes The goal of the experiment was to find the maximum number of pounds that each individual athlete could bench press

Agresti/Franklin Statistics, 6 of 88 Example: What Do We Learn from a Scatterplot in the Strength Study? 57 high school female athletes participated in the study The data consisted of the following variables: x: the number of 60-pound bench presses an athlete could do y: maximum bench press

Agresti/Franklin Statistics, 7 of 88 Example: What Do We Learn from a Scatterplot in the Strength Study? For the 57 girls in this study, these variables are summarized by: x: mean = 11.0, st.deviation = 7.1 y: mean = 79.9 lbs, st.dev. = 13.3 lbs

Agresti/Franklin Statistics, 8 of 88 Example: What Do We Learn from a Scatterplot in the Strength Study?

Agresti/Franklin Statistics, 9 of 88 The Regression Line Equation When the scatterplot shows a linear trend, a straight line fitted through the data points describes that trend The regression line is: $\hat{y} = a + bx$, where $\hat{y}$ is the predicted value of the response variable y, $a$ is the y-intercept, and $b$ is the slope

Agresti/Franklin Statistics, 10 of 88 Example: Which Regression Line Predicts Maximum Bench Press? The t test for the slope uses $t = (b - 0)/se$ with $df = n - 2$

Agresti/Franklin Statistics, 11 of 88 Example: What Do We Learn from a Scatterplot in the Strength Study? The MINITAB output shows the following regression equation: BP = 63.5 + 1.49 (BP_60) The y-intercept is 63.5 and the slope is 1.49 The slope of 1.49 tells us that predicted maximum bench press increases by about 1.5 pounds for every additional 60-pound bench press an athlete can do
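As a small illustration, the prediction equation with y-intercept 63.5 and slope 1.49 can be evaluated directly. The sketch below is only an illustration; the function name `predict_max_bench_press` is hypothetical and not part of the slides.

```python
def predict_max_bench_press(bp_60, intercept=63.5, slope=1.49):
    """Predicted maximum bench press (lbs) for a given number of 60-pound presses.

    The intercept and slope are the values reported on the slide;
    the function name is a hypothetical helper.
    """
    return intercept + slope * bp_60


# Prediction at the sample mean x = 11.0, and one press above it
at_mean = predict_max_bench_press(11)
at_mean_plus_one = predict_max_bench_press(12)

print(round(at_mean, 2))                     # predicted y at x = 11
print(round(at_mean_plus_one - at_mean, 2))  # change per additional press = slope
```

The prediction at the mean of x (11.0) lands close to the reported mean of y (79.9 lbs), which is what one expects because a least squares line passes through the point of means.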

Agresti/Franklin Statistics, 12 of 88 Outliers Check for outliers by plotting the data The regression line can be pulled toward an outlier and away from the general trend of points

Agresti/Franklin Statistics, 13 of 88 Influential Points An observation can be influential in affecting the regression line when two things happen: Its x value is low or high compared to the rest of the data It does not fall in the straight-line pattern that the rest of the data have

Agresti/Franklin Statistics, 14 of 88 Residuals are Prediction Errors The regression equation is often called a prediction equation The difference between an observed outcome and its predicted value is the prediction error, called a residual

Agresti/Franklin Statistics, 15 of 88 Residuals Each observation has a residual A residual is the vertical distance between the data point and the regression line
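A residual can be computed as observed y minus predicted y. The sketch below uses the strength-study intercept (63.5) and slope (1.49); the athlete with 10 presses and a maximum of 85 lbs is a made-up observation used only for illustration.

```python
def residual(x, y_observed, intercept=63.5, slope=1.49):
    """Residual = observed y minus predicted y (vertical distance to the line)."""
    y_predicted = intercept + slope * x
    return y_observed - y_predicted


# HYPOTHETICAL observation: 10 sixty-pound presses, observed maximum of 85 lbs
res = residual(10, 85.0)
print(round(res, 1))  # positive: the point lies above the regression line
```

A positive residual means the observation lies above the line (the equation under-predicts); a negative residual means it lies below.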

Agresti/Franklin Statistics, 16 of 88 Residuals We can summarize how near the regression line the data points fall by the sum of squared residuals: $\sum (\text{residual})^2 = \sum (y - \hat{y})^2$ The regression line has the smallest sum of squared residuals and is called the least squares line
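The least squares property can be checked numerically. The following sketch fits a line with the usual closed-form least squares formulas and compares its sum of squared residuals with that of two slightly shifted lines. The data are hypothetical, chosen only to resemble the strength study, and the helper names are assumptions.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the line y-hat = a + b*x with the smallest sum of squared residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def sum_squared_residuals(xs, ys, a, b):
    """Sum over observations of (y - y-hat)^2 for the line y-hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# HYPOTHETICAL data (not from the study)
xs = [2, 5, 8, 11, 14, 20]
ys = [65, 72, 74, 80, 86, 92]

a, b = least_squares_line(xs, ys)
sse_best = sum_squared_residuals(xs, ys, a, b)
sse_shifted = sum_squared_residuals(xs, ys, a + 1.0, b)   # intercept moved up
sse_steeper = sum_squared_residuals(xs, ys, a, b + 0.1)   # slope increased

print(sse_best < sse_shifted, sse_best < sse_steeper)
```

Any line other than the fitted one gives a larger sum of squared residuals on the same data.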

Agresti/Franklin Statistics, 17 of 88 Regression Model: A Line Describes How the Mean of y Depends on x At a given value of x, the equation $\hat{y} = a + bx$ predicts a single value of the response variable But… we should not expect all subjects at that value of x to have the same value of y Variability occurs in the y values

Agresti/Franklin Statistics, 18 of 88 The Regression Line The regression line connects the estimated means of y at the various x values In summary, $\hat{y} = a + bx$ describes the relationship between x and the estimated means of y at the various values of x

Agresti/Franklin Statistics, 19 of 88 The Population Regression Equation The population regression equation describes the relationship in the population between x and the means of y The equation is: $\mu_y = \alpha + \beta x$

Agresti/Franklin Statistics, 20 of 88 The Population Regression Equation In the population regression equation, α is a population y-intercept and β is a population slope These are parameters In practice we estimate the population regression equation using the prediction equation for the sample data

Agresti/Franklin Statistics, 21 of 88 The Population Regression Equation The population regression equation merely approximates the actual relationship between x and the population means of y It is a model A model is a simple approximation for how variables relate in the population

Agresti/Franklin Statistics, 22 of 88 The Regression Model

Agresti/Franklin Statistics, 23 of 88 The Regression Model If the true relationship is far from a straight line, this regression model may be a poor one

Agresti/Franklin Statistics, 24 of 88 Variability about the Line At each fixed value of x, variability occurs in the y values around their mean, $\mu_y$ The probability distribution of y values at a fixed value of x is a conditional distribution At each value of x, there is a conditional distribution of y values An additional parameter σ describes the standard deviation of each conditional distribution

Agresti/Franklin Statistics, 25 of 88 A Statistical Model A statistical model never holds exactly in practice. It is merely a simple approximation for reality Even though it does not describe reality exactly, a model is useful if the true relationship is close to what the model predicts

Agresti/Franklin Statistics, 26 of 88 Find the predicted fertility for Vietnam, which had the highest value of x = 91. a.5.25 b c d For recent data on several nations, the prediction equation relating y = fertility rate to x = female economic activity (the female labor force as a percentage of the male labor force) is:

Agresti/Franklin Statistics, 27 of 88 Find the residual for Vietnam, which had y = 2.3. a b c d For recent data on several nations, the prediction equation relating y = fertility rate to x = female economic activity (the female labor force as a percentage of the male labor force) is:

Agresti/Franklin Statistics, 28 of 88  Section 11.2 How Can We Describe Strength of Association?

Agresti/Franklin Statistics, 29 of 88 Correlation The correlation, denoted by r, describes linear association The correlation ‘r’ has the same sign as the slope ‘b’ The correlation ‘r’ always falls between -1 and +1 The larger the absolute value of r, the stronger the linear association

Agresti/Franklin Statistics, 30 of 88 Correlation and Slope We can’t use the slope to describe the strength of the association between two variables because the slope’s numerical value depends on the units of measurement

Agresti/Franklin Statistics, 31 of 88 Correlation and Slope The correlation is a standardized version of the slope The correlation does not depend on units of measurement

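Agresti/Franklin Statistics, 32 of 88 Correlation and Slope The correlation and the slope are related in the following way: $r = b\left(\frac{s_x}{s_y}\right)$, where $s_x$ and $s_y$ are the sample standard deviations of x and y *** Exam I: xbar and ybar reminder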

Agresti/Franklin Statistics, 33 of 88 Example: What’s the Correlation for Predicting Strength? For the female athlete strength study: x: number of 60-pound bench presses y: maximum bench press x: mean = 11.0, st.dev. = 7.1 y: mean = 79.9 lbs., st.dev. = 13.3 lbs. Regression equation: $\hat{y} = 63.5 + 1.49x$

Agresti/Franklin Statistics, 34 of 88 Example: What’s the Correlation for Predicting Strength? $r = b\left(\frac{s_x}{s_y}\right) = 1.49\left(\frac{7.1}{13.3}\right) \approx 0.80$ The variables have a strong, positive association
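The relation r = b(s_x / s_y) can be applied to the summary statistics of the strength study (slope 1.49, standard deviations 7.1 and 13.3). This is a minimal sketch; the function name is hypothetical.

```python
def correlation_from_slope(b, s_x, s_y):
    """Correlation as the standardized slope: r = b * (s_x / s_y)."""
    return b * (s_x / s_y)


# Summary statistics reported for the strength study
r = correlation_from_slope(b=1.49, s_x=7.1, s_y=13.3)
print(round(r, 2))  # 0.8
```

Because r rescales the slope by the two standard deviations, it does not change when x or y is expressed in different units.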

Agresti/Franklin Statistics, 35 of 88 The Squared Correlation Another way to describe the strength of association refers to how close predictions for y tend to be to observed y values The variables are strongly associated if you can predict y much better by substituting x values into the prediction equation than by merely using the sample mean y and ignoring x

Agresti/Franklin Statistics, 36 of 88 The Squared Correlation Consider the prediction error: the difference between the observed and predicted values of y Using the regression line to make a prediction, each error is: $y - \hat{y}$ Using only the sample mean, $\bar{y}$, to make a prediction, each error is: $y - \bar{y}$

Agresti/Franklin Statistics, 37 of 88 The Squared Correlation When we predict y using $\bar{y}$ (that is, ignoring x), the error summary equals: $\sum (y - \bar{y})^2$ This is called the total sum of squares

Agresti/Franklin Statistics, 38 of 88 The Squared Correlation When we predict y using x with the regression equation, the error summary is: $\sum (y - \hat{y})^2$ This is called the residual sum of squares

Agresti/Franklin Statistics, 39 of 88 The Squared Correlation When a strong linear association exists, the regression equation predictions tend to be much better than the predictions using $\bar{y}$ We measure the proportional reduction in error and call it $r^2$: $r^2 = \frac{\sum (y - \bar{y})^2 - \sum (y - \hat{y})^2}{\sum (y - \bar{y})^2}$

Agresti/Franklin Statistics, 40 of 88 The Squared Correlation We use the notation $r^2$ for this measure because it equals the square of the correlation r
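The claim that the proportional reduction in error equals the square of the correlation can be checked numerically. The sketch below computes both quantities on a small hypothetical data set (not from the study); the function names are assumptions.

```python
import math


def proportional_reduction_in_error(xs, ys):
    """(total SS - residual SS) / total SS for the least squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    total_ss = sum((y - y_bar) ** 2 for y in ys)                          # errors using y-bar
    residual_ss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))     # errors using y-hat
    return (total_ss - residual_ss) / total_ss


def correlation(xs, ys):
    """Sample correlation r."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


# HYPOTHETICAL data
xs = [2, 5, 8, 11, 14, 20]
ys = [65, 72, 74, 80, 86, 92]

pre = proportional_reduction_in_error(xs, ys)
r = correlation(xs, ys)
print(abs(pre - r ** 2) < 1e-9)  # the two measures agree
```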

Agresti/Franklin Statistics, 41 of 88 Example: What Does $r^2$ Tell Us in the Strength Study? For the female athlete strength study: x: number of 60-pound bench presses y: maximum bench press The correlation value was found to be r = 0.80 We can calculate $r^2$ from r: $(0.80)^2 = 0.64$ For predicting maximum bench press, the regression equation has 64% less error than $\bar{y}$ has

Agresti/Franklin Statistics, 42 of 88 Correlation r and Its Square $r^2$ Both r and $r^2$ describe the strength of association r falls between -1 and +1 It represents the slope of the regression line when x and y have been standardized $r^2$ falls between 0 and 1 It summarizes the reduction in sum of squared errors in predicting y using the regression line instead of using $\bar{y}$

Agresti/Franklin Statistics, 43 of 88 Find the predicted math SAT score for a student who has the verbal SAT score of 800. a.250 b.500 c.650 d.750 All Students who attend Lake Woebegone College must take the math and verbal SAT exams. Both exams have a mean of 500 and a standard deviation of 100. The regression equation relating y = math SAT score and x = verbal SAT score is: See example 10 on pg 544

Agresti/Franklin Statistics, 44 of 88 Find the r-value. a. 0.5 b. 0.25 c. 1.00 d. 0.75 All Students who attend Lake Woebegone College must take the math and verbal SAT exams. Both exams have a mean of 500 and a standard deviation of 100. The regression equation relating y = math SAT score and x = verbal SAT score is:

Agresti/Franklin Statistics, 45 of 88 Find the $r^2$ value. a. 0.5 b. 0.25 c. 1.00 d. 0.75 All Students who attend Lake Woebegone College must take the math and verbal SAT exams. Both exams have a mean of 500 and a standard deviation of 100. The regression equation relating y = math SAT score and x = verbal SAT score is:

Agresti/Franklin Statistics, 46 of 88  Section 11.3 How Can We Make Inferences About the Association?

Agresti/Franklin Statistics, 47 of 88 Descriptive and Inferential Parts of Regression The sample regression equation, r, and $r^2$ are descriptive parts of a regression analysis The inferential parts of regression use the tools of confidence intervals and significance tests to provide inference about the regression equation, the correlation and r-squared in the population of interest

Agresti/Franklin Statistics, 48 of 88 Assumptions for Regression Analysis Basic assumption for using regression line for description: The population means of y at different values of x have a straight-line relationship with x, that is: $\mu_y = \alpha + \beta x$ This assumption states that a straight-line regression model is valid This can be checked with a scatterplot.

Agresti/Franklin Statistics, 49 of 88 Assumptions for Regression Analysis Extra assumptions for using regression to make statistical inference: The data were gathered using randomization The population values of y at each value of x follow a normal distribution, with the same standard deviation at each x value

Agresti/Franklin Statistics, 50 of 88 Assumptions for Regression Analysis Models, such as the regression model, merely approximate the true relationship between the variables A relationship will not be exactly linear, with exactly normal distributions for y at each x and with exactly the same standard deviation of y values at each x value

Agresti/Franklin Statistics, 51 of 88 Testing Independence between Quantitative Variables Suppose that the slope β of the regression line equals 0 Then… The mean of y is identical at each x value The two variables, x and y, are statistically independent: The outcome for y does not depend on the value of x It does not help us to know the value of x if we want to predict the value of y

Agresti/Franklin Statistics, 52 of 88 Testing Independence between Quantitative Variables

Agresti/Franklin Statistics, 53 of 88 Testing Independence between Quantitative Variables Steps of Two-Sided Significance Test about a Population Slope β: 1. Assumptions: The population satisfies the regression line $\mu_y = \alpha + \beta x$ Randomization The population values of y at each value of x follow a normal distribution, with the same standard deviation at each x value

Agresti/Franklin Statistics, 54 of 88 Testing Independence between Quantitative Variables Steps of Two-Sided Significance Test about a Population Slope β: 2. Hypotheses: $H_0: \beta = 0$, $H_a: \beta \neq 0$ 3. Test statistic: $t = (b - 0)/se$ Software supplies sample slope b and its se

Agresti/Franklin Statistics, 55 of 88 Testing Independence between Quantitative Variables Steps of Two-Sided Significance Test about a Population Slope β: 4. P-value: Two-tail probability of t test statistic value more extreme than observed: Use t distribution with df = n - 2 5. Conclusions: Interpret P-value in context If decision needed, reject $H_0$ if P-value ≤ significance level
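The five steps can be sketched in code. The slides say only that software supplies b and its se; the standard error formula used here, se = s / sqrt(Σ(x − x̄)²) with s = sqrt(residual SS / (n − 2)), is the usual textbook one and is an assumption relative to the slides. The data are hypothetical, and the P-value is obtained by numerically integrating the t density with Simpson's rule rather than by calling a statistics library.

```python
import math


def t_pdf(t, df):
    """Density of the t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_coef) / math.sqrt(df * math.pi) * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p_value(t_stat, df, steps=20000):
    """Two-tail probability beyond |t_stat|, via Simpson's rule on [0, |t_stat|]."""
    upper = abs(t_stat)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3  # P(0 < T < |t_stat|)
    return max(0.0, 1.0 - 2.0 * central_half)


def slope_t_test(xs, ys):
    """Return (b, se, t, df, p) for testing H0: beta = 0 against Ha: beta != 0."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    residual_ss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    df = n - 2
    s = math.sqrt(residual_ss / df)   # residual standard deviation
    se = s / math.sqrt(sxx)           # ASSUMED textbook formula for se of b
    t_stat = (b - 0) / se
    return b, se, t_stat, df, two_sided_p_value(t_stat, df)


# HYPOTHETICAL data
xs = [2, 5, 8, 11, 14, 20]
ys = [65, 72, 74, 80, 86, 92]
b, se, t_stat, df, p_value = slope_t_test(xs, ys)
print(round(b, 3), round(se, 3), round(t_stat, 2), df, p_value < 0.05)
```

In practice a statistics package would report b, se, t, and the P-value directly; the hand-rolled integration is only there to keep the sketch dependency-free.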

Agresti/Franklin Statistics, 56 of 88 Example: Is Strength Associated with 60-Pound Bench Press?

Agresti/Franklin Statistics, 57 of 88 Example: Is Strength Associated with 60-Pound Bench Press? Conduct a two-sided significance test of the null hypothesis of independence Assumptions: A scatterplot of the data revealed a linear trend so the straight-line regression model seems appropriate The scatter of points has a similar spread at different x values The sample was a convenience sample, not a random sample, so this is a concern

Agresti/Franklin Statistics, 58 of 88 Example: Is Strength Associated with 60-Pound Bench Press? Hypotheses: $H_0: \beta = 0$, $H_a: \beta \neq 0$ Test statistic: P-value: Conclusion: An association exists between the number of 60-pound bench presses and maximum bench press

Agresti/Franklin Statistics, 59 of 88 A Confidence Interval for β A small P-value in the significance test of $H_0: \beta = 0$ suggests that the population regression line has a nonzero slope To learn how far the slope β falls from 0, we construct a confidence interval: $b \pm t_{.025}(se)$, with $df = n - 2$ for a 95% interval

Agresti/Franklin Statistics, 60 of 88 Example: Estimating the Slope for Predicting Maximum Bench Press Construct a 95% confidence interval for β Based on a 95% CI, we can conclude, on average, the maximum bench press increases by between 1.2 and 1.8 pounds for each additional 60-pound bench press that an athlete can do

Agresti/Franklin Statistics, 61 of 88 Example: Estimating the Slope for Predicting Maximum Bench Press Let’s estimate the effect of a 10-unit increase in x: Since the 95% CI for β is (1.2, 1.8), the 95% CI for 10β is (12, 18) On the average, we infer that the maximum bench press increases by at least 12 pounds and at most 18 pounds, for an increase of 10 in the number of 60-pound bench presses
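The interval arithmetic can be sketched as follows, using the form b ± t(se). The slope b = 1.49 comes from the slides, but the standard error (0.15) and the t critical value (about 2.004 for df = 57 − 2 = 55) are assumptions chosen to be consistent with the reported interval (1.2, 1.8); neither value appears in the transcript.

```python
def slope_confidence_interval(b, se, t_crit):
    """Confidence interval for the slope: b plus or minus t_crit * se."""
    margin = t_crit * se
    return b - margin, b + margin


b = 1.49                # slope from the slides
se_assumed = 0.15       # ASSUMPTION: standard error is not given in the transcript
t_crit_assumed = 2.004  # ASSUMPTION: approximate t_.025 for df = 55

low, high = slope_confidence_interval(b, se_assumed, t_crit_assumed)

# A 10-unit increase in x multiplies both endpoints by 10
low_10, high_10 = 10 * low, 10 * high

print(round(low, 1), round(high, 1))   # 1.2 1.8
print(round(low_10), round(high_10))   # 12 18
```

Multiplying both endpoints by 10 works because the effect of a 10-unit change in x is 10β, a fixed multiple of the slope.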