1 Chapter 2 Looking at Data - Relationships

2 Relations Among Variables
Response variable - Outcome measurement (or characteristic) of a study. Also called: dependent variable, outcome, and endpoint. Labelled as y.
Explanatory variable - Condition that explains or causes changes in response variables. Also called: independent variable and predictor. Labelled as x.
Theories are usually generated about relationships among variables, and statistical methods can be used to test them.
Research questions are stated such as: Do changes in x cause changes in y?

3 Scatterplots
Identify the explanatory and response variables of interest, and label them as x and y.
Obtain a set of individuals and observe the pair (x_i, y_i) for each individual. There will be n pairs.
Statistical convention places the response variable (y) on the vertical (up/down) axis and the explanatory variable (x) on the horizontal (left/right) axis. (Note: economists reverse the axes in price/quantity demand plots.)
Plot the n pairs of points (x, y) on the graph.

4 France August 2003 Heat Wave Deaths
Individuals: 13 cities in France
Response: Excess deaths (%), Aug 1-19, 2003 vs. 1999-2002
Explanatory variable: Change in mean temperature in the period (°C)

5 France August 2003 Heat Wave Deaths
Scatterplot with a possible outlier marked

6 Example - Pharmacodynamics of LSD
Response (y) - Math score (mean among 5 volunteers)
Explanatory (x) - LSD tissue concentration (mean of 5 volunteers)
Raw data and scatterplot of score vs. LSD concentration
Source: Wagner et al. (1968)

7 Manufacturer Production/Cost Relation
y = Amount produced
x = Total cost
n = 48 months (not in order)

8 Manufacturer Production/Cost Relation

9 Correlation
Numerical measure to summarize the strength of the linear (straight-line) association between two variables
Bounded between -1 and +1 (labelled as r)
– Values near -1 → Strong negative association
– Values near 0 → Weak or no linear association
– Values near +1 → Strong positive association
Not affected by linear transformation of either x or y (e.g. a change of units), except that multiplying by a negative constant reverses the sign of r
Does not distinguish between response and explanatory variable (x and y can be interchanged)
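The properties of r listed on this slide can be checked numerically. The sketch below is only an illustration: the data are made up (they are not from the slides), and the helper name `correlation` is an assumption introduced here. It computes r from its definition, then recomputes it with x and y swapped, with x converted to new units, and with x multiplied by -1.

```python
import math

def correlation(xs, ys):
    """Pearson correlation r between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length n >= 2")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up illustrative data (NOT from the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = correlation(x, y)
r_swapped = correlation(y, x)                            # x and y interchanged
r_rescaled = correlation([32 + 1.8 * v for v in x], y)   # x in new units
r_flipped = correlation([-v for v in x], y)              # negative multiplier

print(r, r_swapped, r_rescaled, r_flipped)
```

The first three values agree up to rounding, and the last has the same magnitude with the opposite sign.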

10 Excess French Heatwave Deaths

11 Examples

12 Least-Squares Regression
Goal: Fit a line that “best fits” the relationship between the response variable and the explanatory variable
Equation of a straight line: y = a + bx
– a - y-intercept (value of y when x = 0)
– b - slope (amount y changes as x increases by 1 unit)
Prediction: Often want to predict what y will be at a given level of x (e.g. How much will it cost to fill an order of 1000 t-shirts?)
Extrapolation: Using a fitted line outside the range of the explanatory variable observed in the sample: BAD IDEA
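One way to make the extrapolation warning concrete is to have the prediction step refuse values of x outside the observed range. This is a minimal sketch, not a standard routine: the function name `predict`, the coefficients, and the observed range are all hypothetical values chosen for illustration.

```python
def predict(a, b, x, x_min, x_max):
    """Predict y = a + b*x, refusing to extrapolate outside [x_min, x_max]."""
    if not (x_min <= x <= x_max):
        raise ValueError(
            f"x={x} is outside the observed range [{x_min}, {x_max}]"
        )
    return a + b * x

# Hypothetical fitted line and observed range (NOT from the slides).
a, b = 50.0, 2.5
x_min, x_max = 100, 2000

# Predicted cost for an order of 1000 units: 50 + 2.5 * 1000
cost_1000 = predict(a, b, 1000, x_min, x_max)
print(cost_1000)
```

Calling `predict(a, b, 5000, x_min, x_max)` raises a `ValueError`, because 5000 lies beyond the largest x in the hypothetical sample.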

13 Least-Squares Regression
y = a + bx is a deterministic equation
Sample data don’t fall on a straight line, but rather around one
Obtain the equation that “best fits” a sample of data points
Error - Difference between observed response and predicted response (from the equation)
Least-squares criterion: Choose the line that minimizes the sum of squared errors
Resulting regression line: y-hat = a + bx, with a and b chosen by the least-squares criterion
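For a single explanatory variable, the least-squares criterion has a well-known closed-form solution: b = S_xy / S_xx and a = y-bar - b * x-bar, where S_xy is the sum of (x - x-bar)(y - y-bar) and S_xx is the sum of (x - x-bar) squared. The sketch below applies these formulas to made-up data (not from the slides) and checks that nudging either coefficient increases the sum of squared errors. The function names are assumptions introduced for this example.

```python
def least_squares(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx              # slope
    a = y_bar - b * x_bar      # intercept: line passes through (x-bar, y-bar)
    return a, b

def sum_squared_errors(a, b, xs, ys):
    """Sum of squared differences between observed and predicted y."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Made-up illustrative data (NOT from the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = least_squares(x, y)
best = sum_squared_errors(a, b, x, y)

# Any other line does worse under the least-squares criterion.
worse_intercept = sum_squared_errors(a + 0.1, b, x, y)
worse_slope = sum_squared_errors(a, b + 0.1, x, y)
print(a, b, best, worse_intercept, worse_slope)
```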

14 Excess French Heatwave Deaths
For each 1°C increase in mean temperature, excess mortality increases by about 20%

15 Effect of an Outlier (Paris)
Re-fitting the model without Paris, which had a very high excess mortality (using Excel)
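The same kind of comparison can be sketched in code: fit the line once with every point and once with the suspect point removed. The data below are made up purely for illustration and are not the French city data; the last point plays the role of the outlier.

```python
def least_squares(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b

# Made-up data (NOT the heat wave data). The first five points lie
# exactly on y = 2x; the last point has an unusually high response.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0, 25.0]

a_all, b_all = least_squares(x, y)              # fit with the outlier
a_trim, b_trim = least_squares(x[:-1], y[:-1])  # fit without it

print("with outlier:    intercept", a_all, "slope", b_all)
print("without outlier: intercept", a_trim, "slope", b_trim)
```

In this made-up case a single point nearly doubles the estimated slope, which is why both fits are worth reporting.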

16 Squared Correlation
The squared correlation represents the fraction of the variation in the response variable that is “explained” by the explanatory variable
Represents the improvement (reduction in sum of squared errors) from using x (and the fitted equation y-hat) to predict y, as opposed to ignoring x (and simply using the sample mean y-bar) to predict y
0 ≤ r² ≤ 1
– Values near 0 → x does not help predict y (regression line flat)
– Values near 1 → x predicts y well (data near regression line)
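The "reduction in sum of squared errors" reading can be written as r² = 1 - SSE/SST, where SST is the sum of squared errors when every y is predicted by y-bar and SSE is the sum of squared errors around the fitted line. The sketch below, on made-up data (not from the slides), computes r² this way and compares it with the square of the correlation. All variable names are assumptions for this example.

```python
import math

# Made-up illustrative data (NOT from the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)

# Least-squares line y-hat = a + b*x
b = sxy / sxx
a = y_bar - b * x_bar

# SST: squared errors when x is ignored and y-bar is the prediction.
sst = syy
# SSE: squared errors when the fitted line is the prediction.
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

r_squared_from_errors = 1 - sse / sst
r = sxy / math.sqrt(sxx * syy)

print(r_squared_from_errors, r ** 2)
```

The two printed values agree up to rounding.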

17 Residual Analysis
Residuals: Differences between observed responses and their predicted values: e = y - y-hat
Useful to plot the residuals versus the level of the explanatory variable (x)
Outliers: Large (positive or negative) residuals. Values of y that are inconsistent with prediction
Influential observations: Cases where the level of the explanatory variable is far away from the other individuals (extreme x values)
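A short sketch of the residual calculation, on made-up data (not from the slides) in which one response is deliberately too high. It computes e = y - y-hat for each case and picks out the case with the largest absolute residual. Treating that case as the candidate outlier is a simplification assumed here, not a formal rule.

```python
# Made-up data (NOT from the slides): four points lie on y = 2x,
# and the middle response is far too high.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 12.0, 8.0, 10.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
     / sum((xi - x_bar) ** 2 for xi in x))
a = y_bar - b * x_bar

# Residual = observed response minus predicted response.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Least-squares residuals sum to zero (up to rounding).
total = sum(residuals)

# Candidate outlier: the case with the largest absolute residual.
worst = max(range(n), key=lambda i: abs(residuals[i]))

for xi, ei in zip(x, residuals):
    print(f"x = {xi:4.1f}   residual = {ei:6.2f}")
print("largest |residual| at index", worst)
```

Printing the residuals against x is a text stand-in for the residual plot the slide recommends.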

18 France Heatwave Mortality
Paris (outlier)

19 Miscellaneous Topics
Lurking variable: Variable not included in the regression analysis that may influence the association between y and x. The resulting association between y and x is sometimes referred to as spurious.
Association does not imply causation (it is one of various steps to demonstrating cause-and-effect)
Do not extrapolate outside the range of x observed in the study
Some relationships are not linear, which may show low correlation even when the relation is strong
Correlations based on averages across individuals tend to be higher than those based on individuals
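The last two points can be illustrated with small constructed examples. Both data sets below are made up for this purpose and do not come from the slides. In the first, y is completely determined by x, yet r is zero because the relation is a symmetric curve. In the second, three groups of two individuals show a moderate correlation, while the three group averages fall exactly on a line.

```python
import math

def correlation(xs, ys):
    """Pearson correlation r between two equal-length sequences."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# 1) Strong but non-linear relation: y = x^2 on a symmetric range.
x_curve = [-3, -2, -1, 0, 1, 2, 3]
y_curve = [v ** 2 for v in x_curve]
r_curve = correlation(x_curve, y_curve)

# 2) Individuals versus group averages (made-up groups of two).
groups = [
    [(0, 2), (2, 0)],
    [(2, 4), (4, 2)],
    [(4, 6), (6, 4)],
]
individuals = [point for group in groups for point in group]
r_individuals = correlation([p[0] for p in individuals],
                            [p[1] for p in individuals])

means = [(sum(p[0] for p in g) / len(g), sum(p[1] for p in g) / len(g))
         for g in groups]
r_means = correlation([m[0] for m in means], [m[1] for m in means])

print("curved relation:", r_curve)
print("individuals:", r_individuals, "group averages:", r_means)
```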

20 Causation
Association between x and y demonstrated
Time order confirmed (x “occurs” before y)
Alternative explanations are considered and explained away:
– Lurking variables - Another variable causes both x and y
– Confounding - Two explanatory variables are highly related, and which causes y cannot be determined
Dose-response effect
Plausible cause

