Published by Katrina Fox
Modified over 4 years ago
Chapter 3: Relationships Between Quantitative Variables. Copyright ©2011 Brooks/Cole, Cengage Learning
Principal Idea: The description and confirmation of relationships between variables are very important in research.
Three Tools We Will Use: Scatterplot, a two-dimensional graph of data values. Correlation, a statistic that measures the strength and direction of a linear relationship between two quantitative variables. Regression equation, an equation that describes the average relationship between a quantitative response variable and an explanatory variable.
3.1 Looking for Patterns with Scatterplots. Questions to Ask about a Scatterplot: What is the average pattern? Does it look like a straight line, or is it curved? What is the direction of the pattern? How much do individual points vary from the average pattern? Are there any unusual data points?
Positive/Negative Association and Linear Relationship: Two variables have a positive association when the values of one variable tend to increase as the values of the other variable increase. Two variables have a negative association when the values of one variable tend to decrease as the values of the other variable increase. Two variables have a linear relationship when the pattern of their relationship resembles a straight line.
Example 3.1 Height and Handspan: Data shown are the first 12 observations of a data set that includes the heights (in inches) and fully stretched handspans (in centimeters) of 167 college students.
Example 3.1 Height and Handspan: Taller people tend to have greater handspan measurements than shorter people do. When two variables tend to increase together, we say that they have a positive association. The handspan and height measurements may have a linear relationship.
Example 3.2 Driver Age and Maximum Legibility Distance of Highway Signs: A research firm determined the maximum distance at which each of 30 drivers could read a newly designed sign. The 30 participants in the study ranged in age from 18 to 82 years old. We want to examine the relationship between age and the sign legibility distance.
Example 3.2 Driver Age and Maximum Legibility Distance of Highway Signs: We see a negative association with a linear pattern. We will use a straight-line equation to model this relationship.
Example 3.3 The Development of Musical Preferences: The 108 participants in the study ranged in age from 16 to 86 years old. We want to examine the relationship between song-specific age (age in the year the song was popular) and musical preference (positive score above average, negative score below average).
Example 3.3 The Development of Musical Preferences: Popular music preferences are acquired in late adolescence and early adulthood. The association is nonlinear.
Groups and Outliers: Use different plotting symbols or colors to represent different subgroups. Look for outliers, points that have an unusual combination of data values.
3.2 Describing Linear Patterns with a Regression Line. When the best equation for describing the relationship between x and y is a straight line, the equation is called the regression line. The regression line has two purposes: to estimate the average value of y at any specified value of x, and to predict the value of y for an individual, given that individual’s x value.
Example 3.5 Height and Handspan (cont): Scatterplot with ‘best’ regression line (via Minitab). Based on the line, at a height of 60 inches the handspan is about 18 cm, and at a height of 70 inches the handspan is about 21.5 cm. So a change in height of 10 inches corresponds to a change in handspan of about 3.5 cm, and 3.5/10 = 0.35. The estimated slope is about 0.35 cm per inch.
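The slope arithmetic in Example 3.5 can be sketched in a few lines of Python. The function name `slope_from_points` is an illustrative assumption; the two points are the values read off the fitted line in the slide.

```python
def slope_from_points(x1, y1, x2, y2):
    """Slope of the straight line through (x1, y1) and (x2, y2)."""
    return (y2 - y1) / (x2 - x1)

# Points read off the regression line in Example 3.5:
# height in inches, handspan in centimeters.
slope = slope_from_points(60, 18.0, 70, 21.5)
print(round(slope, 2))  # 0.35 (cm per inch)
```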
The Equation for the Regression Line: ŷ = b0 + b1x. ŷ is spoken as “y-hat,” and it is also referred to as either predicted y or estimated y. b0 is the intercept of the straight line. The intercept is the value of y when x = 0. b1 is the slope of the straight line. The slope tells us how much of an increase (or decrease) there is for the y variable when the x variable increases by one unit. The sign of the slope tells us whether y increases or decreases when x increases.
Example 3.6 Writing the Regression Equation for Height and Handspan. Regression equation: Handspan = –3 + 0.35(Height). Estimate the average handspan for people 60 inches tall: Average handspan = –3 + 0.35(60) = 18 cm. Predict the handspan for someone who is 60 inches tall: Predicted handspan = –3 + 0.35(60) = 18 cm. Note: in a statistical relationship, there is variation from the average pattern.
Interpreting the y-Intercept and the Slope. Regression equation: Handspan = –3 + 0.35(Height). b0 = –3 is the y-intercept, the estimated or predicted handspan for someone whose height (x) is 0 inches. It has no meaningful interpretation in this example. b1 = 0.35 is the slope: we estimate that handspan increases by 0.35 cm, on average, for each increase of 1 inch in height.
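As a minimal sketch, the handspan equation can be evaluated directly, using the intercept b0 = –3 and slope b1 = 0.35 stated above. The function name `predict_handspan` is an assumption made for illustration.

```python
def predict_handspan(height_in, b0=-3.0, b1=0.35):
    """Predicted (or estimated average) handspan in cm for a height in inches."""
    return b0 + b1 * height_in

# Same number serves as the estimated average and the individual prediction.
handspan_60 = predict_handspan(60)
print(round(handspan_60, 1))  # about 18 cm

# Raising height by 1 inch changes the prediction by the slope, 0.35 cm.
step = predict_handspan(61) - predict_handspan(60)
print(round(step, 2))
```

The second calculation illustrates the slope interpretation: the difference between predictions one inch apart equals b1.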
Example 3.7 Driver Age and Maximum Legibility Distance of Highway Signs. Regression equation: Distance = 577 – 3(Age). The slope of –3 tells us that, on average, the legibility distance decreases 3 feet when age increases by 1 year. Estimate the average distance for 20-year-old drivers: Average distance = 577 – 3(20) = 517 ft. Predict the legibility distance for a 20-year-old driver: Predicted distance = 577 – 3(20) = 517 ft.
Prediction Errors and Residuals: Prediction error = difference between the observed value of y and the predicted value ŷ. Residual = y – ŷ.
Example 3.8 Prediction Errors for the Highway Sign Data
Regression equation: ŷ = 577 – 3x
x = Age   y = Distance   ŷ = 577 – 3x        Residual = y – ŷ
18        510            577 – 3(18) = 523   510 – 523 = –13
20        590            577 – 3(20) = 517   590 – 517 = 73
22        516            577 – 3(22) = 511   516 – 511 = 5
The residual can be computed for all 30 observations. A positive residual means the observed value is higher than predicted. A negative residual means the observed value is lower than predicted.
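The residual calculation for the three observations shown in Example 3.8 can be reproduced as follows. This is a sketch: the function name `predicted_distance` is an assumption, while the equation and the three (age, distance) pairs come from the slide.

```python
def predicted_distance(age, b0=577, b1=-3):
    """Predicted legibility distance in feet from the line 577 - 3 * age."""
    return b0 + b1 * age

# (age, observed distance) pairs listed in Example 3.8.
observations = [(18, 510), (20, 590), (22, 516)]

# Residual = observed y minus predicted y.
residuals = [y - predicted_distance(x) for x, y in observations]
print(residuals)  # [-13, 73, 5]
```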
The Least Squares Estimation Criterion: The least squares regression line minimizes the sum of squared prediction errors. SSE = sum of squared prediction errors = Σ(y – ŷ)². Formulas for slope and intercept: b1 = Σ(x – x̄)(y – ȳ) / Σ(x – x̄)² and b0 = ȳ – b1x̄, where x̄ and ȳ are the sample means of x and y.
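A small sketch of the least squares criterion, assuming the standard closed-form estimates b1 = Σ(x – x̄)(y – ȳ) / Σ(x – x̄)² and b0 = ȳ – b1x̄. The data values and the function names `least_squares_fit` and `sse` are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
def least_squares_fit(xs, ys):
    """Return (b0, b1) for the line that minimizes the sum of squared errors."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def sse(xs, ys, b0, b1):
    """Sum of squared prediction errors for the line b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, not taken from the slides.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]
b0, b1 = least_squares_fit(xs, ys)
print(round(b1, 3), round(sse(xs, ys, b0, b1), 3))

# Any other line, such as one with a shifted intercept, has a larger SSE.
print(sse(xs, ys, b0, b1) < sse(xs, ys, b0 + 0.5, b1))
```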
3.3 Measuring Strength and Direction with Correlation. Correlation r indicates the strength and the direction of a straight-line relationship. The strength of the relationship is determined by the closeness of the points to a straight line. The direction is determined by whether one variable generally increases or generally decreases when the other variable increases.
Interpreting the Correlation Coefficient: r is always between –1 and +1. The magnitude indicates the strength; r = –1 or +1 indicates a perfect linear relationship. The sign indicates the direction. r = 0 indicates a slope of 0, so knowing x does not change the predicted value of y. Formula for correlation: r = [1/(n – 1)] Σ [(x – x̄)/sx][(y – ȳ)/sy], where sx and sy are the sample standard deviations of x and y.
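The properties listed above can be checked with a short sketch. It assumes the usual sample correlation, the average product of standardized x and y values with an n – 1 divisor; the function name `correlation` and the data are hypothetical.

```python
import math

def correlation(xs, ys):
    """Sample correlation r from standardized values (n - 1 divisor)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return sum(products) / (n - 1)

# Hypothetical data: points exactly on a rising line, then on a falling line.
r_up = correlation([1, 2, 3, 4], [2, 4, 6, 8])
r_down = correlation([1, 2, 3, 4], [8, 6, 4, 2])
print(round(r_up, 6), round(r_down, 6))
```

Points that fall exactly on a line give r of +1 or –1 (up to floating-point rounding), with the sign matching the direction of the line.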
Example 3.10 Correlation Between Handspan and Height. Regression equation: Handspan = –3 + 0.35(Height). The correlation indicates a somewhat strong positive linear relationship.
Example 3.11 Correlation Between Age and Sign Legibility Distance. Regression equation: Distance = 577 – 3(Age). Correlation r = –0.8, a somewhat strong negative linear association.
Example 3.12 Left and Right Handspans: If you know the span of a person’s right hand, can you accurately predict his/her left handspan? The correlation indicates a very strong positive linear relationship.
Example 3.13 Verbal SAT and GPA: Grade point averages (GPAs) and verbal SAT scores for a sample of 100 university students. The correlation indicates a moderately strong positive linear relationship.
Example 3.14 Age and Hours of TV Watching per Day: Relationship between age and hours of daily television viewing for 1299 survey respondents. The correlation indicates a weak connection. Note: a few respondents claimed to watch more than 24 hours/day!
Example 3.15 Hours of Sleep and Hours of Study: Relationship between reported hours of sleep the previous 24 hours and the reported hours of study during the same period for a sample of 116 college students. Correlation r = –0.36, a not too strong negative association.
Interpretation of and Formula for r²: The squared correlation r² is between 0 and 1 and indicates the proportion of variation in the response explained by x. r² = (SSTO – SSE)/SSTO. SSTO = sum of squares total = sum of squared differences between observed y values and the sample mean ȳ. SSE = sum of squared errors (residuals) = sum of squared differences between observed y values and predicted values based on the least squares line.
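The two sums of squares can be combined into r² as sketched below, assuming the usual definition r² = (SSTO – SSE)/SSTO with the least squares line supplying the predicted values. The function name `r_squared` and the data are hypothetical.

```python
def r_squared(xs, ys):
    """Proportion of variation in y explained by x: (SSTO - SSE) / SSTO."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Least squares slope and intercept.
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    b0 = y_bar - b1 * x_bar
    # SSTO: squared differences between observed y and the mean of y.
    ssto = sum((y - y_bar) ** 2 for y in ys)
    # SSE: squared differences between observed y and predicted y.
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    return (ssto - sse) / ssto

# Hypothetical data, not taken from the slides.
r2 = r_squared([1, 2, 3, 4], [2, 4, 5, 8])
print(round(r2, 4))
```

For these made-up values SSTO is 18.75 and SSE is 0.70, so x explains roughly 96% of the variation in y.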
Interpretation of r²: Height and right handspan: r² = 0.55, so height explains 55% of the variation among observed right handspans. TV viewing and age: r² is only about 0.0185, or 1.85%; knowing a person’s age doesn’t help much in predicting the amount of daily TV viewing.
Reading Computer Results for Regression
3.4 Regression and Correlation Difficulties and Disasters: extrapolating too far beyond the observed range of x values; allowing outliers to overly influence results; combining groups inappropriately; using correlation and a straight-line equation to describe curvilinear data.
Extrapolation: It is risky to use a regression equation to predict values far outside the range where the original data fell (called extrapolation). There is no guarantee that the relationship will continue beyond the range for which we have observed data.
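One possible way to guard against extrapolation in code is to refuse predictions outside the observed range of x. The sketch below uses the highway sign line, 577 – 3(Age), and the age range of 18 to 82 reported for Example 3.2; the function name and the choice to raise an error are assumptions, not something the slides prescribe.

```python
def predict_distance(age, min_age=18, max_age=82):
    """Predict legibility distance in feet, but only inside the observed age range."""
    if not (min_age <= age <= max_age):
        # Outside the data range the linear pattern is not guaranteed to hold.
        raise ValueError(
            f"age {age} is outside the observed range {min_age} to {max_age}"
        )
    return 577 - 3 * age

print(predict_distance(20))  # 517

try:
    predict_distance(100)
except ValueError as err:
    print("refused:", err)
```

A warning instead of an error would be an equally reasonable design when modest extrapolation is acceptable.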
Example 3.17 Height and Foot Length: Three outliers were data entry errors. The regression equation and the correlation were computed for both the uncorrected and the corrected data. With the corrected data, r = 0.69.
Example 3.18 Earthquakes in US: The correlation was computed for all data and without SF. Without SF, r = –0.824.
Example 3.19 Height and Lead Feet: Scatterplot of all data: college student heights and responses to the question “What is the fastest you have ever driven a car?” Scatterplot by gender: combining two groups led to an illegitimate correlation.
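The effect of combining groups can be illustrated with made-up numbers. In the sketch below, the two groups, their heights (inches), and their fastest speeds (mph) are entirely hypothetical and are not the data from Example 3.19; the `correlation` helper assumes the usual sample correlation formula.

```python
import math

def correlation(xs, ys):
    """Sample correlation r computed from sums of squares and cross-products."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical groups: within each group, speed does not depend on height.
heights_a, speeds_a = [62, 64, 66, 68], [90, 85, 85, 90]
heights_b, speeds_b = [68, 70, 72, 74], [110, 105, 105, 110]

r_a = correlation(heights_a, speeds_a)
r_b = correlation(heights_b, speeds_b)
r_all = correlation(heights_a + heights_b, speeds_a + speeds_b)
print(round(r_a, 3), round(r_b, 3), round(r_all, 3))
```

Within each hypothetical group the correlation is zero, yet the pooled data show a fairly strong positive correlation, because the groups differ in both average height and average speed.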
Example 3.20 U.S. Population Predictions: Population of the US (in millions) for each census year between 1790 and 2000. The regression line of population on year gives a prediction of about 269 million for the year 2030, a poor prediction because the pattern is curved, not linear.
3.5 Correlation Does Not Prove Causation. Interpretations of an Observed Association: (1) causation; (2) confounding factors are present; (3) the explanatory and response variables are both affected by other variables; (4) the response variable is causing a change in the explanatory variable.
Case Study 3.1 A Weighty Issue: Relationship between actual and ideal weight, shown separately for females and males.