1 EXPLORING RELATIONSHIPS BETWEEN QUANTITATIVE VARIABLES SCATTERPLOTS, ASSOCIATION, AND CORRELATION ADDITIONAL REFERENCE READING MATERIAL COURSEPACK PAGES.


1 1 EXPLORING RELATIONSHIPS BETWEEN QUANTITATIVE VARIABLES SCATTERPLOTS, ASSOCIATION, AND CORRELATION ADDITIONAL REFERENCE READING MATERIAL COURSEPACK PAGES 29 – 58

2 LINEAR RELATIONSHIPS BETWEEN TWO VARIABLES X AND Y WHY STUDY LINEAR RELATIONSHIPS? LINEAR RELATIONSHIPS ARE THE EASIEST TO UNDERSTAND AND ANALYZE; MANY RELATIONSHIPS ARE APPROXIMATELY LINEAR; VARIABLES WITH A NONLINEAR RELATIONSHIP CAN OFTEN BE TRANSFORMED SO THAT THE RELATIONSHIP BETWEEN THE TRANSFORMED VARIABLES IS LINEAR. FOR EXAMPLE, CONSIDER THE EQUATION RELATING BRAIN WEIGHT W AND BODY WEIGHT Z.

3 3 EXAMPLES RELATIONSHIP BETWEEN SMOKING AND LUNG CANCER; RELATIONSHIP BETWEEN ALTITUDE AND THE BOILING POINT OF WATER; RELATIONSHIP BETWEEN TEMPERATURE AND OZONE CONCENTRATION IN THE AIR; IN THESE EXAMPLES, TWO VARIABLES ARE INVOLVED NAMELY: THE RESPONSE VARIABLE Y, AND THE EXPLANATORY VARIABLE, X.

4 Three Tools We Will Use Scatterplot, a two-dimensional graph of data values Correlation, a statistic that measures the strength and direction of a linear relationship between two quantitative variables. Regression equation, an equation that describes the average relationship between a quantitative response and explanatory variable. 4

5 LEAST SQUARES LINE (REGRESSION LINE) GIVEN A SET OF n OBSERVATIONS $(x_1, y_1), \ldots, (x_n, y_n)$, WHAT LINE “BEST” FITS THE OBSERVATIONS? WE SHALL ANSWER THIS QUESTION GRAPHICALLY USING A SCATTERPLOT, AND ANALYTICALLY USING THE LEAST SQUARES REGRESSION FORMULA. SCATTERPLOTS: A SCATTERPLOT IS A PLOT OF THE POINTS $(x_i, y_i)$, $i = 1, \ldots, n$.

6 6 WHAT LINE “BEST FITS” THE SET OF OBSERVATIONS? GRAPHICAL SOLUTION

7 EXAMPLE: GIVEN THE SET OF OBSERVATIONS (1,2), (2,5), (3,4), (4,1), (5,8), (6,3), (7,2), PLOT A SCATTERGRAM. [SCATTERGRAM: Y ON THE VERTICAL AXIS AGAINST X ON THE HORIZONTAL AXIS, SEVEN POINTS]

8 Example: Height and Handspan 8 Data shown are the first 12 observations of a data set that includes the heights (in inches) and fully stretched handspans (in centimeters) of 167 college students

9 Example: Height and Handspan Taller people tend to have greater handspan measurements than shorter people do. When two variables tend to increase together, we say that they have a positive association. The handspan and height measurements may have a linear relationship. 9

10 Example: Driver Age and Maximum Legibility Distance of Highway Signs A research firm determined the maximum distance at which each of 30 drivers could read a newly designed sign. The 30 participants in the study ranged in age from 18 to 82 years old. We want to examine the relationship between age and the sign legibility distance 10

11 Example: Driver Age and Maximum Legibility Distance of Highway Signs 11

12 Example: Driver Age and Maximum Legibility Distance of Highway Signs We see a negative association with a linear pattern. We will use a straight-line equation to model this relationship. 12

13 13 LOOKING AT SCATTERPLOTS SCATTERPLOTS ARE THE BEST WAY TO START OBSERVING THE RELATIONSHIP BETWEEN TWO QUANTITATIVE VARIABLES. BY JUST LOOKING AT THEM, YOU CAN SEE PATTERNS, TRENDS, RELATIONSHIPS, AND EVEN THE OCCASIONAL EXTRAORDINARY VALUE SITTING APART FROM THE OTHERS. THERE ARE FOUR THINGS WE LOOK FOR IN A SCATTERPLOT. –DIRECTION –FORM –STRENGTH –UNUSUAL FEATURES

14 Looking for Patterns with Scatterplots Questions to Ask about a Scatterplot What is the average pattern? Does it look like a straight line, or is it curved? What is the direction of the pattern? How much do individual points vary from the average pattern? Are there any unusual data points? 14

15 What We Look for in a Scatterplot We examine a scatterplot to study association. How do values on the response variable change as values of the explanatory variable change? You can describe the overall pattern of a scatterplot by the trend, direction, and strength of the relationship between the two variables. –Trend: linear, curved, clusters, no pattern –Direction: positive, negative, no direction –Strength: how closely the points fit the trend Also look for outliers from the overall trend.

16 DIRECTION POSITIVE: THE PATTERN RUNS FROM THE BOTTOM LEFT TO THE UPPER RIGHT. NEGATIVE: THE PATTERN RUNS FROM THE UPPER LEFT TO THE LOWER RIGHT. NEITHER. [THREE SCATTERPLOT PANELS: POSITIVE, NEGATIVE, NEITHER]

17 17 FORM STRAIGHT CURVED EXOTIC NO PATTERNS

18 FORM: POSITIVE STRAIGHT DIRECTION [SCATTERPLOT: POSITIVELY STRAIGHT RELATIONSHIP]

19 Example: 100 Cars on the Lot of a Used-Car Dealership Question: Would you expect a positive association, a negative association or no association between the age of the car and the mileage on the odometer? –Positive association –Negative association –No association 19

20 FORM: NEGATIVE STRAIGHT DIRECTION [SCATTERPLOT: NEGATIVELY STRAIGHT RELATIONSHIP]

21 Positive and Negative Associations Two quantitative variables x and y are –Positively associated when high values of x tend to occur with high values of y. low values of x tend to occur with low values of y. –Negatively associated when high values of one variable tend to pair with low values of the other variable. 21

22 Positive, Negative Associations, Linear Relationships Two variables have a positive association when the values of one variable tend to increase as the values of the other variable increase. Two variables have a negative association when the values of one variable tend to decrease as the values of the other variable increase. Two variables have a linear relationship when the pattern of their relationship resembles a straight line. 22

23 FORM: CURVED RELATIONSHIP [SCATTERPLOT: CURVED ASSOCIATION BETWEEN X AND Y]

24 FORM: EXOTIC – SHARP POINTS [SCATTERPLOT: OUTSTANDING FEATURE – SHARP POINTS]

25 25 FORM:NO CLEAR PATTERNS

26 STRENGTH [THREE SCATTERPLOT PANELS: STRONG, MODERATE, WEAK]

27 UNUSUAL FEATURES [TWO SCATTERPLOT PANELS: OUTLIERS, SUBGROUPS]

28 Unusual Features 28

29 29 EXAMPLES

30 REGRESSION WISDOM: CORRELATION COEFFICIENT CORRELATION COEFFICIENT r THE CORRELATION COEFFICIENT IS A NUMERICAL MEASURE OF THE DIRECTION AND STRENGTH OF A LINEAR ASSOCIATION. FOR A SET OF PAIRED DATA $(x_i, y_i)$, $i = 1, \ldots, n$, THE LINEAR CORRELATION COEFFICIENT r IS GIVEN BY $r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$

31 WHERE $\bar{x}$ AND $\bar{y}$ ARE THE MEANS, AND $s_x$ AND $s_y$ ARE RESPECTIVELY THE STANDARD DEVIATIONS, OF X AND Y. EXAMPLE: FIND THE LINEAR CORRELATION COEFFICIENT FOR THE FOLLOWING FIVE PAIRS OF NUMBERS: (6,5), (10,3), (14,7), (19,8), (21,12).
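The example above can be worked through in a few lines of Python. This is a minimal sketch that assumes the usual sample formula for r (the average product of z-scores, with divisor n - 1); the function name `pearson_r` is an illustrative choice, not something from the slides.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation: sum of z-score products divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample standard deviations (divisor n - 1)
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    z_products = (((x - mean_x) / s_x) * ((y - mean_y) / s_y)
                  for x, y in zip(xs, ys))
    return sum(z_products) / (n - 1)

# The pairs listed in the example: (6,5), (10,3), (14,7), (19,8), (21,12)
xs = [6, 10, 14, 19, 21]
ys = [5, 3, 7, 8, 12]
r = pearson_r(xs, ys)
print(round(r, 3))  # 0.855
```

A value of about 0.855 indicates a fairly strong positive linear association for these pairs.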

32 PROPERTIES OF CORRELATION COEFFICIENT, r $-1 \le r \le 1$, WITH POSITIVE r MEANING POSITIVE RELATIONSHIP AND NEGATIVE r MEANING NEGATIVE RELATIONSHIP BETWEEN THE TWO VARIABLES. r = 1 IF AND ONLY IF ALL POINTS LIE ON A LINE WITH POSITIVE SLOPE. r = -1 IF AND ONLY IF ALL POINTS LIE ON A LINE WITH NEGATIVE SLOPE. THE VALUE OF r DOES NOT CHANGE IF THE UNITS OF MEASUREMENT ARE CHANGED. r MEASURES THE STRENGTH AND DIRECTION OF THE LINEAR RELATIONSHIP BETWEEN Y AND X.
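Two of these properties are easy to check numerically. The sketch below reuses the five pairs from the earlier example and assumes the standard formula for r; the unit conversions (multiplying x by 2.54, and the linear change 1.8y + 32) are arbitrary illustrations, not conversions used in the slides.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation computed from sums of deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

xs = [6, 10, 14, 19, 21]
ys = [5, 3, 7, 8, 12]

r_original = pearson_r(xs, ys)
# Change of units: rescale x, and rescale-and-shift y (both with positive scale)
r_rescaled = pearson_r([2.54 * x for x in xs], [1.8 * y + 32 for y in ys])
# Swap the roles of the two variables
r_swapped = pearson_r(ys, xs)

print(r_original, r_rescaled, r_swapped)
```

All three values agree up to floating-point rounding, which illustrates that r is unaffected by a change of units and by which variable is labeled x.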

33 THE VALUE OF r DOES NOT DEPEND ON WHICH OF THE TWO VARIABLES IS LABELED X. THE VALUE OF r IS A MEASURE OF THE EXTENT TO WHICH X AND Y ARE LINEARLY RELATED – THAT IS, THE EXTENT TO WHICH THE POINTS IN THE SCATTER PLOT FALL CLOSE TO A STRAIGHT LINE. A VALUE OF r CLOSE TO ZERO DOES NOT RULE OUT ANY STRONG RELATIONSHIP BETWEEN X AND Y; THERE COULD STILL BE A STRONG RELATIONSHIP BUT ONE THAT IS NOT LINEAR. [SCALE OF r FROM -1 TO +1 WITH MARKS AT -0.8, -0.5, 0, 0.5, 0.8: STRONG BEYOND ±0.8, MODERATE BETWEEN ±0.5 AND ±0.8, WEAK BETWEEN -0.5 AND 0.5]

34 Scatterplots and Correlation Coefficient Let’s get a feel for the correlation r by looking at its values for the scatterplots shown below 34

35 35

36 36 CORRELATION CONDITIONS QUANTITATIVE VARIABLES CONDITION CORRELATION APPLIES ONLY TO QUANTITATIVE VARIABLES. BE SURE NOT TO APPLY CORRELATION TO CATEGORICAL DATA MASQUERADING AS QUANTITATIVE. CHECK THE VARIABLES’ UNITS AND WHAT THEY MEASURE.

37 37 STRAIGHT ENOUGH CONDITION MAKE SURE THE FORM OF THE SCATTERPLOT IS STRAIGHT ENOUGH THAT A LINEAR RELATIONSHIP MAKES SENSE. CORRELATION MEASURES THE STRENGTH ONLY OF THE LINEAR ASSOCIATION, AND WILL BE MISLEADING IF THE RELATIONSHIP IS NOT LINEAR. IF A RELATIONSHIP IS CURVED, THEN SUMMARIZING ITS STRENGTH WITH A CORRELATION WOULD BE MISLEADING.

38 38 OUTLIER CONDITION OUTLIERS CAN DISTORT THE CORRELATION DRAMATICALLY. AN OUTLIER CAN MAKE AN OTHERWISE WEAK CORRELATION LOOK BIG OR HIDE A STRONG CORRELATION. AN OUTLIER CAN EVEN GIVE AN OTHERWISE POSITIVE ASSOCIATION A NEGATIVE CORRELATION (AND VICE VERSA) WHEN YOU SEE AN OUTLIER, IT’S OFTEN A GOOD IDEA TO REPORT THE CORRELATION WITH AND WITHOUT THAT POINT.
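The advice to report the correlation with and without an outlier can be illustrated with a small sketch. The data below are hypothetical (invented for illustration, not taken from the slides): six points with a weak association plus one extreme point at (15, 15).

```python
import math

def pearson_r(xs, ys):
    """Sample correlation computed from sums of deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical data with a weak linear association
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 1, 3, 2, 3, 2]
outlier = (15, 15)  # hypothetical extreme point

r_without = pearson_r(xs, ys)
r_with = pearson_r(xs + [outlier[0]], ys + [outlier[1]])

print(f"without outlier: r = {r_without:.2f}")
print(f"with outlier:    r = {r_with:.2f}")
```

In this made-up case a single point turns a weak correlation (roughly 0.36) into one that looks strong (roughly 0.94), which is why both values are worth reporting.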

39 39 WHICH LINE “BEST FITS” THE SET OF OBSERVATIONS? THE ANALYTICAL APPROACH

40 Regression Line The first step of a regression analysis is to identify the response and explanatory variables. –We use y to denote the response variable. –We use x to denote the explanatory variable. 40

41 FITTING THE MODEL: THE LEAST SQUARES METHOD CONSIDER THE EXAMPLE: SUPPOSE AN APPLIANCE STORE CONDUCTS A FIVE-MONTH EXPERIMENT TO DETERMINE THE EFFECT OF ADVERTISING ON SALES REVENUE. THE RESULTS ARE SHOWN IN THE TABLE.
MONTH   ADVERTISING EXPENDITURE, X ($100s)   SALES REVENUE, Y ($1,000s)
1       1                                    1
2       2                                    1
3       3                                    2
4       4                                    2
5       5                                    4

42 FIRST STEP IS TO MAKE A SCATTERGRAM [SCATTERGRAM: SALES REVENUE Y ($1000s) AGAINST AD. EXPENDITURE X ($100s)]

43 WHAT IS THE BEST FIT? [SCATTERGRAM WITH POSSIBLE FITS]

44 GENERAL EQUATION OF A REGRESSION LINE y = a + bx + error, WHERE a + bx IS THE LINEAR PART. THE LINEAR PART OF THE EQUATION REQUIRES DETERMINATION OF TWO COEFFICIENTS – a (THE Y-INTERCEPT) AND b (THE SLOPE) IN ORDER TO PREDICT VALUES OF Y. ONCE a AND b ARE OBTAINED, THE STRAIGHT LINE IS KNOWN AND CAN BE PLOTTED ON THE SCATTER DIAGRAM. THEN WE COULD MAKE A VISUAL COMPARISON OF HOW WELL OUR PARTICULAR STATISTICAL MODEL (A STRAIGHT LINE) FITS THE ORIGINAL DATA.

45 Regression Equation The regression line predicts the value for the response variable y as a straight-line function of the value x of the explanatory variable. Let $\hat{y}$ denote the predicted value of y. The equation for the regression line has the form $\hat{y} = a + bx$. In this formula, a denotes the y-intercept and b denotes the slope.

46 Example: Height Based on Human Remains In the regression equation for this example, $\hat{y}$ is the predicted height and $x$ is the length of a femur (thighbone), measured in centimeters. Use the regression equation to predict the height of a person whose femur length was 50 cm.

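47 COMPUTING a AND b LETTING $\bar{x}$ AND $\bar{y}$ REPRESENT THE MEANS OF $x$ AND $y$ RESPECTIVELY, THE RESULTING FORMULAS FOR THE INTERCEPT AND SLOPE ARE GIVEN BY $\hat{b} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$ AND $\hat{a} = \bar{y} - \hat{b}\,\bar{x}$.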
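As a worked illustration, the sketch below applies the standard least squares formulas (slope = sum of cross-deviations over sum of squared x-deviations, intercept = mean of y minus slope times mean of x) to the five-month advertising data from the earlier example, read as x = 1, 2, 3, 4, 5 and y = 1, 1, 2, 2, 4. The function name `least_squares` is an illustrative choice.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope

ad_spend = [1, 2, 3, 4, 5]  # advertising expenditure, $100s
revenue = [1, 1, 2, 2, 4]   # sales revenue, $1,000s
a, b = least_squares(ad_spend, revenue)
print(f"a = {a:.2f}, b = {b:.2f}")  # a = -0.10, b = 0.70
```

For these data the fitted line is approximately y-hat = -0.1 + 0.7x: each additional $100 of advertising goes with about $700 more predicted sales revenue.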

48 ALTERNATIVE FORMULA WHEN THE STANDARD DEVIATIONS OF X AND Y VARIABLES ARE KNOWN $\hat{b} = r\,\frac{s_y}{s_x}$ AND $\hat{a} = \bar{y} - \hat{b}\,\bar{x}$.
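A quick numerical check of the alternative route, assuming it is the usual identity slope = r times s_y over s_x with intercept = mean of y minus slope times mean of x. The sketch uses the advertising data from the earlier example and Python's standard `statistics` module for the means and sample standard deviations.

```python
import statistics

ad_spend = [1, 2, 3, 4, 5]  # advertising expenditure, $100s
revenue = [1, 1, 2, 2, 4]   # sales revenue, $1,000s

n = len(ad_spend)
mean_x = statistics.mean(ad_spend)
mean_y = statistics.mean(revenue)
s_x = statistics.stdev(ad_spend)  # sample standard deviation (divisor n - 1)
s_y = statistics.stdev(revenue)

# Correlation from the summary statistics
cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(ad_spend, revenue))
r = cross / ((n - 1) * s_x * s_y)

# Slope and intercept from r and the standard deviations
slope = r * s_y / s_x
intercept = mean_y - slope * mean_x
print(f"r = {r:.3f}, slope = {slope:.2f}, intercept = {intercept:.2f}")
```

The slope and intercept come out the same as with the deviation-sum formulas (about 0.7 and -0.1), so the two routes are interchangeable when only summary statistics are available.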

49 NOTE: THE ‘HAT’ IS PLACED OVER THE LETTERS a AND b TO REMIND US THAT THESE ARE THE VALUES THAT MINIMIZE THE SUM OF SQUARED DEVIATIONS. CLASS WORK: Example From Midterm 1 Review Sheet

50 Example: Baseball Scoring Vs Batting average Given the following statistics from data on baseball scoring versus batting average, find the regression line. 50

51 Example: Baseball Scoring Vs Batting Average 51

52 Example: Baseball Scoring Vs Batting Average The regression line to predict team scoring from batting average is 52

53 Slope: Positive, Negative, Zero 53

54 Interpreting the Slope Slope: measures the change in the predicted value of the response variable (y) for a 1 unit increase in the explanatory variable (x). Example: A 1 cm increase in femur length corresponds to a 2.4 cm increase in predicted height.

55 The Slope and Correlation Slope: –Numerical value depends on the units used to measure the variables. –Does not tell us whether the association is strong or weak. –The two variables must be identified as response and explanatory variables. –The regression equation can be used to predict values of the response variable for given values of the explanatory variable. 55

56 Interpreting the y - Intercept y-Intercept: –The predicted value for y when x = 0; –This fact helps in plotting the line; –May not have any interpretative value if no observations had x values near 0; It does not make sense for femur length to be 0 cm, so the y-intercept for the equation is not a relevant predicted height. 56

57 57 REGRESSION WISDOM: PREDICTION, RESIDUALS, CORRELATION PREDICTION OBTAINING THE REGRESSION FORMULA IS NOT THE END OF THE ANALYSIS. MOSTLY, WE ARE INTERESTED IN PREDICTING FUTURE OUTCOMES WITH THE REGRESSION FUNCTION. TYPES OF PREDICTION EXTRAPOLATION: EXTRAPOLATION IS THE USE OF A REGRESSION LINE FOR PREDICTION OUTSIDE THE RANGE OF VALUES OF THE EXPLANATORY VARIABLE X THAT IS USED TO OBTAIN THE LINE. SUCH PREDICTION CANNOT BE TRUSTED.

58 Extrapolation is Dangerous Extrapolation: Using a regression line to predict y-values for x-values outside the observed range of the data. –Riskier the farther we move from the range of the given x-values. –There is no guarantee that the relationship given by the regression equation holds outside the range of sampled x-values. 58

59 59 INTERPOLATION: INTERPOLATION IS THE USE OF A REGRESSION LINE FOR PREDICTION WITHIN THE RANGE OF VALUES OF THE EXPLANATORY VARIABLE X THAT IS USED TO OBTAIN THE LINE. INTERPOLATION IS GENERALLY SAFE. REMARKS EXTRAPOLATION SHOULD BE HANDLED WITH CAUTION. LIMIT PREDICTIONS TO X VALUES WHICH ARE WITHIN THE RANGE OF THE DATA USED TO COMPUTE THE LEAST SQUARES LINE.
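One way to put this remark into practice is to have the prediction step report whether a requested x lies outside the range used to fit the line. The sketch below is only an illustration: the helper `predict` and its parameters are hypothetical, and the coefficients (-0.1 and 0.7) and range (1 to 5) are those obtained from the advertising data in the earlier example.

```python
def predict(x, intercept, slope, x_min, x_max):
    """Return (predicted y, flag) where flag is True if x is an extrapolation."""
    y_hat = intercept + slope * x
    is_extrapolation = not (x_min <= x <= x_max)
    return y_hat, is_extrapolation

# Line fitted to the advertising data: x observed from 1 to 5 ($100s)
inside = predict(3, intercept=-0.1, slope=0.7, x_min=1, x_max=5)
outside = predict(10, intercept=-0.1, slope=0.7, x_min=1, x_max=5)

for label, (y_hat, flagged) in (("x = 3", inside), ("x = 10", outside)):
    note = "extrapolation, treat with caution" if flagged else "interpolation"
    print(f"{label}: predicted y = {y_hat:.1f} ({note})")
```

The flag does not make an extrapolated prediction any more trustworthy; it only makes it harder to use one by accident.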

60 60 INTERPOLATING AND EXTRAPOLATING ILLUSTRATION

61 DO NOT MAKE PREDICTIONS OUTSIDE THE CONTEXT OF THE STUDY IN WHICH THE DATA WERE COLLECTED. FOR EXAMPLE, IT IS INAPPROPRIATE TO USE THE LEAST SQUARES LINE FITTED TO DATA ON AMERICAN BABIES TO PREDICT WEIGHTS FOR BABIES BORN IN CHINA. DIAGNOSTICS: AFTER OBTAINING THE REGRESSION LINE, WE WOULD LIKE TO KNOW HOW WELL THE REGRESSION LINE FITS THE DATA. ALSO, WE WOULD LIKE TO KNOW IF THERE IS ANY POINT THAT STRONGLY AFFECTS THE REGRESSION LINE. A DIAGNOSTIC ANALYSIS SUCH AS THE ANALYSIS OF RESIDUALS IS VERY USEFUL.

62 REGRESSION WISDOM - RESIDUALS THE DISCREPANCY BETWEEN DATA AND MODEL IS CALLED A RESIDUAL. HENCE, A RESIDUAL IS THE DIFFERENCE BETWEEN AN OBSERVED VALUE OF Y AND THE VALUE PREDICTED BY THE REGRESSION LINE. THAT IS, RESIDUAL = OBSERVED Y – PREDICTED Y = $y - \hat{y}$, WHERE $\hat{y} = a + bx$.

63 NOTATION: A RESIDUAL IS DENOTED BY THE LETTER e. THE RESIDUAL FOR INDIVIDUAL i IS DENOTED BY $e_i = y_i - \hat{y}_i$. EXAMPLE: A LINEAR MODEL RELATING HURRICANES’ WIND SPEEDS TO THEIR CENTRAL PRESSURES IS MaxWindSpeed = 955.27 – 0.897 × CentralPressure. HURRICANE KATRINA HAD A CENTRAL PRESSURE OF 920 MILLIBARS. WHAT DOES OUR REGRESSION MODEL PREDICT FOR HER MAXIMUM WIND SPEED? HOW GOOD IS THAT PREDICTION, GIVEN THAT KATRINA’S ACTUAL WIND SPEED WAS MEASURED AT 110 KNOTS?
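The Katrina question can be answered by plugging the numbers from the slide into the model and taking observed minus predicted. This is a small sketch; the function and variable names are illustrative, and the wind speed is assumed to be in knots as in the slide.

```python
def predicted_max_wind_speed(central_pressure_mb):
    """Linear model from the slide: MaxWindSpeed = 955.27 - 0.897 * CentralPressure."""
    return 955.27 - 0.897 * central_pressure_mb

katrina_pressure = 920        # millibars
katrina_observed = 110        # knots

katrina_predicted = predicted_max_wind_speed(katrina_pressure)
katrina_residual = katrina_observed - katrina_predicted  # observed - predicted

print(f"predicted: {katrina_predicted:.2f} knots")
print(f"residual:  {katrina_residual:.2f} knots")
```

The model predicts about 130.03 knots, so the residual is about -20.03 knots: the negative sign means the model overestimated Katrina's actual wind speed by roughly 20 knots.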

64 Analysis of Residuals ANALYSIS OF RESIDUALS HELPS US TO ASSESS THE ADEQUACY OF A MODEL AND HELPS TO IDENTIFY OUTLIERS OR OTHER INTERESTING DATA POINTS. WHEN A REGRESSION MODEL IS APPROPRIATE, IT SHOULD MODEL THE UNDERLYING RELATIONSHIP. NOTHING INTERESTING SHOULD BE LEFT BEHIND. SO AFTER WE FIT A REGRESSION MODEL, WE USUALLY PLOT THE RESIDUALS IN THE HOPE OF FINDING … NOTHING.

65 65 Analysis of Residuals – Residual Plots RESIDUAL PLOT IS A SCATTERPLOT OF THE RESIDUALS [ON THE VERTICAL AXIS] AGAINST THE EXPLANATORY VARIABLE, X, ON THE HORIZONTAL AXIS. THE PLOT SHOULD NOT HAVE ANY INTERESTING FEATURES, LIKE A DIRECTION OR SHAPE. IT SHOULD STRETCH HORIZONTALLY, WITH ABOUT THE SAME AMOUNT OF SCATTER THROUGHOUT. IT SHOULD SHOW NO BENDS, AND IT SHOULD HAVE NO OUTLIERS.

66 NOTE: SUM OF RESIDUALS = 0 [A RESIDUAL PLOT: RESIDUALS e ON THE VERTICAL AXIS AGAINST x ON THE HORIZONTAL AXIS]
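The note that the residuals sum to zero holds for any least squares line with an intercept, and it can be checked numerically. The sketch below fits the advertising data from the earlier example with the standard least squares formulas and then lists the residuals; variable names are illustrative.

```python
ad_spend = [1, 2, 3, 4, 5]  # advertising expenditure, $100s
revenue = [1, 1, 2, 2, 4]   # sales revenue, $1,000s

n = len(ad_spend)
mean_x = sum(ad_spend) / n
mean_y = sum(revenue) / n

# Least squares slope and intercept
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(ad_spend, revenue))
         / sum((x - mean_x) ** 2 for x in ad_spend))
intercept = mean_y - slope * mean_x

# Residual = observed y - predicted y
residuals = [y - (intercept + slope * x) for x, y in zip(ad_spend, revenue)]
residual_sum = sum(residuals)

print([round(e, 2) for e in residuals])
print(f"sum of residuals = {residual_sum:.10f}")
```

The individual residuals are roughly 0.4, -0.3, 0.0, -0.7 and 0.6, and their sum is zero up to floating-point rounding.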

67 67 CLASS EXAMPLES

68 68 INTERPRETATION OF POSITIVE AND NEGATIVE RESIDUALS POSITIVE RESIDUAL: THE MODEL OR PREDICTED VALUES UNDERESTIMATE THE ACTUAL DATA VALUE. NEGATIVE RESIDUAL: THE MODEL OR PREDICTED VALUES OVERESTIMATE THE ACTUAL DATA VALUE

69 69 REMARK MOST COMPUTER STATISTICS PACKAGES PLOT THE RESIDUALS AGAINST THE PREDICTED VALUES, RATHER THAN AGAINST THE X-VALUES. WHEN THE SLOPE IS NEGATIVE, THE TWO VERSIONS ARE MIRROR IMAGES. WHEN THE SLOPE IS POSITIVE, THEY ARE VIRTUALLY IDENTICAL EXCEPT FOR THE AXIS LABELS. SINCE ALL WE CARE ABOUT IS THE PATTERNS (OR, BETTER, LACK OF PATTERNS) IN THE RESIDUAL PLOT, IT REALLY DOES NOT MATTER WHICH WAY WE PLOT THE RESIDUALS.

70 70 CLASS EXAMPLES

71 COEFFICIENT OF DETERMINATION THE COEFFICIENT OF DETERMINATION MEASURES THE PROPORTION OF VARIATION THAT IS EXPLAINED BY THE INDEPENDENT VARIABLE X IN THE REGRESSION MODEL. THAT IS, IT MEASURES THE PROPORTION OF THE TOTAL VARIABILITY IN Y THAT IS REMOVED BY ADDING X TO THE LINEAR MODEL. NOTATION: $r^2$. THE COEFFICIENT OF DETERMINATION IS USEFUL WHEN INTERPRETING r. ITS SYMBOL EXPLAINS HOW IT IS COMPUTED; TO OBTAIN IT, SIMPLY SQUARE r – THE CORRELATION COEFFICIENT.
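The two descriptions of the coefficient of determination (the square of r, and the proportion of variability in y removed by the line) can be compared numerically. The sketch below uses the advertising data from the earlier example and the standard definitions of the total and residual sums of squares; the names `sst` and `sse` are illustrative.

```python
import math

ad_spend = [1, 2, 3, 4, 5]  # advertising expenditure, $100s
revenue = [1, 1, 2, 2, 4]   # sales revenue, $1,000s

n = len(ad_spend)
mean_x = sum(ad_spend) / n
mean_y = sum(revenue) / n
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ad_spend, revenue))
s_xx = sum((x - mean_x) ** 2 for x in ad_spend)
sst = sum((y - mean_y) ** 2 for y in revenue)  # total variability in y

# Route 1: square the correlation coefficient
r = s_xy / math.sqrt(s_xx * sst)
r_squared_from_r = r ** 2

# Route 2: proportional reduction in squared error after fitting the line
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x
sse = sum((y - (intercept + slope * x)) ** 2
          for x, y in zip(ad_spend, revenue))
r_squared_from_errors = 1 - sse / sst

print(round(r_squared_from_r, 4), round(r_squared_from_errors, 4))
```

Both routes give about 0.8167, so roughly 82% of the variation in sales revenue is accounted for by its linear relationship with advertising expenditure in this small data set.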

72 Coefficient of Determination The typical way to interpret $r^2$ is as the proportion of the variation in the y-values that is accounted for by the linear relationship of y with x. When a strong linear association exists, the regression equation predictions tend to be much better than the predictions using only $\bar{y}$. We measure the proportional reduction in error and call it $r^2$.

73 Coefficient of Determination $r^2$ measures the proportion of the variation in the y-values that is accounted for by the linear relationship of y with x. A correlation of 0.9 means that $r^2 = 0.81$: –81% of the variation in the y-values can be explained by the explanatory variable, x.

74 PROPERTIES OF COEFFICIENT OF DETERMINATION 1. $0 \le r^2 \le 1$. 2. $r^2 = 1$ IF AND ONLY IF ALL POINTS LIE ON A LINE. 3. $r^2$ DOES NOT CHANGE IF THE UNITS OF MEASUREMENT ARE CHANGED. 4. $r^2$ MEASURES THE STRENGTH OF LINEAR ASSOCIATION BETWEEN THE VARIABLES Y AND X. IT IS POSSIBLE THAT X AND Y ARE STRONGLY RELATED, BUT $r^2$ IS CLOSE TO 0.

75 Remark THE COEFFICIENT OF DETERMINATION, WHEN CONVERTED TO A PERCENTAGE, INDICATES WHAT PERCENTAGE OF THE VARIANCE IN ONE VARIABLE IS ACCOUNTED FOR BY THE OTHER VARIABLE. Examples From Midterm Review 1 Sheet

76 Outliers and Influential Points A regression outlier is an observation that lies far away from the trend that the rest of the data follows. An observation is influential if both of the following hold: –its x value is relatively low or high compared to the remainder of the data; –the observation is a regression outlier. Influential observations tend to pull the regression line toward that data point and away from the rest of the data points.

77 Be Cautious of Influential Points One reason to plot the data before you do a correlation or regression analysis is to check for unusual observations. Search for observations that are regression outliers, being well removed from the trend that the rest of the data follow. 77

78 Outliers and Influential Points 78

79 Outliers and Influential Points An Observation Is a Regression Outlier if it is Far Removed from the Trend that the Rest of the Data Follow. The top two points are regression outliers. Not all regression outliers are influential in affecting the correlation or slope. Question: Which regression outlier in this figure is influential? 79

80 Correlation Does not Imply Causation In a regression analysis, suppose that as x goes up, y also tends to go up (or down). Can we conclude that there’s a causal connection, with changes in x causing changes in y? –A strong correlation between x and y means that there is a strong linear association that exists between the two variables. –A strong correlation between x and y, does not mean that x causes y to change. 80

81 Correlation Does not Imply Causation (Extra-Credit Exercise) Data are available for all fires in Chicago last year on x = number of firefighters at the fire and y = cost of damages due to the fire. 1. Would you expect the correlation to be negative, zero, or positive? 2. If the correlation is positive, does this mean that having more firefighters at a fire causes the damages to be worse? Yes or No? 3. Identify a third variable that could be considered a common cause of x and y: –Distance from the fire station –Intensity of the fire –Size of the fire

82 Lurking Variables & Confounding A lurking variable is a variable, usually unobserved, that influences the association between the variables of primary interest. –Ice cream sales and drowning: lurking variable = temperature –Reading level and shoe size: lurking variable = age –Childhood obesity rate and GDP: lurking variable = time When two explanatory variables are both associated with a response variable but are also associated with each other, there is said to be confounding. Lurking variables are not measured in the study but have the potential for confounding.

83 The Effect of Lurking Variables on Associations Lurking variables can affect associations in many ways. For instance, a lurking variable may be a common cause of both the explanatory and response variable. In practice, there’s usually not a single variable that causally explains a response variable or the association between two variables. More commonly, there are multiple causes. When there are multiple causes, the association among them makes it difficult to study the effect of any single variable. 83

84 The Effects of Confounding on Associations When two explanatory variables are both associated with a response variable but are also associated with each other, confounding occurs. It is difficult to determine whether either of them truly causes the response because a variable’s effect could be at least partly due to its association with the other variable. 84

85 85 LEVERAGE AND INFLUENTIAL POINTS LEVERAGE DATA POINTS WITH X-VARIABLES FAR FROM THE MEAN OF X ARE SAID TO EXERT LEVERAGE ON A LINEAR MODEL. HIGH LEVERAGE POINTS PULL THE LINE CLOSE TO THEM, AND SO THEY CAN HAVE A LARGE EFFECT ON THE LINE, SOMETIMES COMPLETELY DETERMINING THE SLOPE AND Y- INTERCEPT. WITH HIGH ENOUGH LEVERAGE, THEIR RESIDUALS CAN APPEAR TO BE DECEPTIVELY SMALL.

86 INFLUENTIAL POINT IF OMITTING A POINT FROM THE DATA RESULTS IN A VERY DIFFERENT REGRESSION MODEL, THEN THAT POINT IS CALLED AN INFLUENTIAL POINT. ILLUSTRATIVE EXAMPLES
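The omit-and-refit idea can be sketched directly: fit the line with and without a suspect point and compare the slopes. The base data below are the advertising data from the earlier example; the extra point (12, 0) is hypothetical, added only to show what a high-leverage regression outlier can do.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - slope * mean_x, slope

xs = [1, 2, 3, 4, 5]
ys = [1, 1, 2, 2, 4]
suspect = (12, 0)  # hypothetical point: far from the mean of x and off the trend

_, slope_without = least_squares(xs, ys)
_, slope_with = least_squares(xs + [suspect[0]], ys + [suspect[1]])

print(f"slope without the point: {slope_without:.2f}")
print(f"slope with the point:    {slope_with:.2f}")
```

Here one added point changes the slope from about 0.70 to about -0.10, reversing the apparent direction of the relationship, so the point would be called influential.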

