Chapter 3 Association: Contingency, Correlation, and Regression Section 3.1 How Can We Explore the Association between Two Categorical Variables?

Slides:



Advertisements
Similar presentations
Chapter 4 The Relation between Two Variables
Advertisements

Agresti/Franklin Statistics, 1 of 52 Chapter 3 Association: Contingency, Correlation, and Regression Learn …. How to examine links between two variables.
Agresti/Franklin Statistics, 1 of 63  Section 2.4 How Can We Describe the Spread of Quantitative Data?
Chapter 6: Exploring Data: Relationships Lesson Plan
Chapter 3 Association: Contingency, Correlation, and Regression
Copyright © 2013, 2009, and 2007, Pearson Education, Inc. Chapter 3 Association: Contingency, Correlation, and Regression Section 3.4 Cautions in Analyzing.
Chapter 2: Looking at Data - Relationships /true-fact-the-lack-of-pirates-is-causing-global-warming/
BPS - 5th Ed. Chapter 51 Regression. BPS - 5th Ed. Chapter 52 u Objective: To quantify the linear relationship between an explanatory variable (x) and.
Ch 2 and 9.1 Relationships Between 2 Variables
Basic Practice of Statistics - 3rd Edition
Chapter 5 Regression. Chapter 51 u Objective: To quantify the linear relationship between an explanatory variable (x) and response variable (y). u We.
Descriptive Methods in Regression and Correlation
ASSOCIATION: CONTINGENCY, CORRELATION, AND REGRESSION Chapter 3.
Stat 1510: Statistical Thinking and Concepts Scatterplots and Correlation.
Chapter 6: Exploring Data: Relationships Lesson Plan Displaying Relationships: Scatterplots Making Predictions: Regression Line Correlation Least-Squares.
BPS - 3rd Ed. Chapter 41 Scatterplots and Correlation.
Copyright © 2013, 2009, and 2007, Pearson Education, Inc. Chapter 3 Association: Contingency, Correlation, and Regression Section 3.2 The Association.
Notes Bivariate Data Chapters Bivariate Data Explores relationships between two quantitative variables.
1 Chapter 10 Correlation and Regression 10.2 Correlation 10.3 Regression.
Chapter 3 Section 3.1 Examining Relationships. Continue to ask the preliminary questions familiar from Chapter 1 and 2 What individuals do the data describe?
Chapter 10 Correlation and Regression
Essential Statistics Chapter 41 Scatterplots and Correlation.
Objectives (IPS Chapter 2.1)
Notes Bivariate Data Chapters Bivariate Data Explores relationships between two quantitative variables.
BPS - 3rd Ed. Chapter 51 Regression. BPS - 3rd Ed. Chapter 52 u Objective: To quantify the linear relationship between an explanatory variable (x) and.
Chapter 5 Regression BPS - 5th Ed. Chapter 51. Linear Regression  Objective: To quantify the linear relationship between an explanatory variable (x)
BPS - 5th Ed. Chapter 51 Regression. BPS - 5th Ed. Chapter 52 u Objective: To quantify the linear relationship between an explanatory variable (x) and.
1 EXPLORING RELATIONSHIPS BETWEEN QUANTITATIVE VARIABLES SCATTERPLOTS, ASSOCIATION, AND CORRELATION ADDITIONAL REFERENCE READING MATERIAL COURSEPACK PAGES.
CHAPTER 4 SCATTERPLOTS AND CORRELATION BPS - 5th Ed. Chapter 4 1.
Chapter 4 Scatterplots and Correlation. Explanatory and Response Variables u Interested in studying the relationship between two variables by measuring.
Relationships If we are doing a study which involves more than one variable, how can we tell if there is a relationship between two (or more) of the.
The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers CHAPTER 3 Describing Relationships 3.2 Least-Squares.
Chapter 7 Scatterplots, Association, and Correlation.
Chapter 5 Regression. u Objective: To quantify the linear relationship between an explanatory variable (x) and response variable (y). u We can then predict.
Business Statistics for Managerial Decision Making
Copyright © 2013, 2009, and 2007, Pearson Education, Inc. Chapter 3 Association: Contingency, Correlation, and Regression Section 3.3 Predicting the Outcome.
BPS - 3rd Ed. Chapter 51 Regression. BPS - 3rd Ed. Chapter 52 u To describe the change in Y per unit X u To predict the average level of Y at a given.
Stat 1510: Statistical Thinking and Concepts REGRESSION.
Copyright © 2013, 2009, and 2007, Pearson Education, Inc. Chapter 3 Association: Contingency, Correlation, and Regression Section 3.1 The Association.
Unit 3 – Association: Contingency, Correlation, and Regression Lesson 3-2 Quantitative Associations.
Chapter 5: 02/17/ Chapter 5 Regression. 2 Chapter 5: 02/17/2004 Objective: To quantify the linear relationship between an explanatory variable (x)
Essential Statistics Chapter 41 Scatterplots and Correlation.
Describing Relationships
Essential Statistics Regression
Basic Practice of Statistics - 3rd Edition
Basic Practice of Statistics - 3rd Edition
Basic Practice of Statistics - 5th Edition
Chapter 7 Part 1 Scatterplots, Association, and Correlation
Chapter 2 Looking at Data— Relationships
residual = observed y – predicted y residual = y - ŷ
CHAPTER 3 Describing Relationships
^ y = a + bx Stats Chapter 5 - Least Squares Regression
Chapter 2 Looking at Data— Relationships
Least Squares Regression
Basic Practice of Statistics - 5th Edition Regression
HS 67 (Intro Health Stat) Regression
Chapter 3 Association: Contingency, Correlation, and Regression
Basic Practice of Statistics - 3rd Edition
Basic Practice of Statistics - 3rd Edition Regression
Essential Statistics Scatterplots and Correlation
CHAPTER 3 Describing Relationships
CHAPTER 3 Describing Relationships
CHAPTER 3 Describing Relationships
CHAPTER 3 Describing Relationships
CHAPTER 3 Describing Relationships
CHAPTER 3 Describing Relationships
Basic Practice of Statistics - 3rd Edition Lecture Powerpoint
Honors Statistics Review Chapters 7 & 8
Basic Practice of Statistics - 3rd Edition
CHAPTER 3 Describing Relationships
Presentation transcript:

Chapter 3 Association: Contingency, Correlation, and Regression Section 3.1 How Can We Explore the Association between Two Categorical Variables?

Learning Objectives 1. Identify variable type: Response or Explanatory 2. Define Association 3. Contingency tables 4. Calculate proportions and conditional proportions

Learning Objective 1: Response and Explanatory variables Response variable (Dependent Variable) the outcome variable on which comparisons are made Explanatory variable (Independent variable) defines the groups to be compared with respect to values on the response variable Example: Response/Explanatory Blood alcohol level/# of beers consumed Grade on test/Amount of study time Yield of corn per bushel/Amount of rainfall

Learning Objective 2: Association The main purpose of data analysis with two variables is to investigate whether there is an association and to describe that association An association exists between two variables if a particular value for one variable is more likely to occur with certain values of the other variable

Learning Objective 3: Contingency Table A contingency table: Displays two categorical variables The rows list the categories of one variable The columns list the categories of the other variable Entries in the table are frequencies

Learning Objective 3: Contingency Table What is the response variable? What is the explanatory variable?

Learning Objective 4: Calculate proportions and conditional proportions

What proportion of organic foods contain pesticides? What proportion of conventionally grown foods contain pesticides? What proportion of all sampled items contain pesticide residuals?

Learning Objective 4: Calculate proportions and conditional proportions Use side by side bar charts to show conditional proportions Allows for easy comparison of the explanatory variable with respect to the response variable

Learning Objective 4: Calculate proportions and conditional proportions If there was no association between organic and conventional foods, then the proportions for the response variable categories would be the same for each food type

Chapter 3 Association: Contingency, Correlation, and Regression Section 3.2 How Can We Explore the Association between Two Quantitative Variables?

Learning Objectives: 1. Constructing scatterplots 2. Interpreting a scatterplot 3. Correlation 4. Calculating correlation

Learning Objective 1: Scatterplot Graphical display of relationship between two quantitative variables: Horizontal Axis: Explanatory variable, x Vertical Axis: Response variable, y

Learning Objective 1: Internet Usage and Gross National Product (GDP) Data Set

Enter values of explanatory variable (x) in L1 Enter values of of response variable (y) in L2 STAT PLOT Plot 1 on Type: scatter plot X list: L2 Y list: L1 ZOOM 9:ZoomStat Graph Learning Objective 1: Internet Usage and Gross National Product (GDP)

Learning Objective 1: Baseball Average and Team Scoring

Enter values of explanatory variable (x) in L1 Enter values of of response variable (y) in L2 STAT PLOT Plot 1 on Type: scatter plot X list: L1 Y list: L2 ZOOM 9:ZoomStat Graph Learning Objective 1: Baseball Average and Team Scoring Use L3 for x and L4 for y. You will use data from prior example again later on in the PowerPoint.

Learning Objective 2: Interpreting Scatterplots You can describe the overall pattern of a scatterplot by the trend, direction, and strength of the relationship between the two variables Trend: linear, curved, clusters, no pattern Direction: positive, negative, no direction Strength: how closely the points fit the trend Also look for outliers from the overall trend

Learning Objective 2: Interpreting Scatterplots: Direction/Association Two quantitative variables x and y are Positively associated when High values of x tend to occur with high values of y Low values of x tend to occur with low values of y Negatively associated when high values of one variable tend to pair with low values of the other variable

Learning Objective 2: Example: 100 cars on the lot of a used-car dealership Would you expect a positive association, a negative association or no association between the age of the car and the mileage on the odometer? a) Positive association b) Negative association c) No association

Learning Objective 2: Example: Did the Butterfly Ballot Cost Al Gore the 2000 Presidential Election?

Learning Objective 3: Linear Correlation, r Measures the strength and direction of the linear association between x and y A positive r value indicates a positive association A negative r value indicates a negative association An r value close to +1 or -1 indicates a strong linear association An r value close to 0 indicates a weak association

Learning Objective 3: Correlation coefficient: Measuring Strength & Direction of a Linear Relationship

Learning Objective 3: Properties of Correlation Always falls between -1 and +1 Sign of correlation denotes direction (-) indicates negative linear association (+) indicates positive linear association Correlation has a unitless measure - does not depend on the variables’ units Two variables have the same correlation no matter which is treated as the response variable Correlation is not resistant to outliers Correlation only measures strength of linear relationship

Leaning Objective 4: Calculating the Correlation Coefficient CountryPer Capita GDP (x)Life Expectancy (y) Austria Belgium Finland France Germany Ireland Italy Netherlands Switzerland United Kingdom Per Capita Gross Domestic Product and Average Life Expectancy for Countries in Western Europe

Learning Objective 4: Calculating the Correlation Coefficient xy = = sum = s x =1.532s y =0.795

Learning Objective 4: Internet Usage and Gross National Product (GDP) 1. STAT CALC menu 2. Choose 8: LinReg(a+bx) 3. 1 st number = x variable 4. 2 nd number = y variable 5. Enter Correlation =.889

Learning Objective 4: Baseball Average and Team Scoring 1. Enter x data into L1 2. Enter y data into L2 3. STAT CALC memu 4. Choose 8: LinReg(a+bx) 5. 1 st number = x variable 6. 2 nd number = y variable 7. Enter Correlation =.874

Learning Objective 4: Cereal: Sodium and Sugar

Chapter 3 Association: Contingency, Correlation, and Regression Section 3.3 How Can We Predict the Outcome of a Variable?

Learning Objectives 1. Definition of a regression line 2. Use a regression equation for prediction 3. Interpret the slope and y-intercept of a regression line 4. Identify the least-squares regression line as the one that minimizes the sum of squared residuals 5. Calculate the least-squares regression line

Learning Objectives 6. Compare roles of explanatory and response variables in correlation and regression 7. Calculate r 2 and interpret

Learning Objective 1: Regression Analysis The first step of a regression analysis is to identify the response and explanatory variables We use y to denote the response variable We use x to denote the explanatory variable

Learning Objective 1: Regression Line A regression line is a straight line that describes how the response variable (y) changes as the explanatory variable (x) changes A regression line predicts the value of the response variable (y) for a given level of the explanatory variable (x) The y-intercept of the regression line is denoted by a The slope of the regression line is denoted by b

Learning Objective 2: Example: How Can Anthropologists Predict Height Using Human Remains? Regression Equation: is the predicted height and is the length of a femur (thighbone), measured in centimeters Use the regression equation to predict the height of a person whose femur length was 50 centimeters

Learning Objective 3: Interpreting the y-Intercept y-Intercept: The predicted value for y when x = 0 Helps in plotting the line May not have any interpretative value if no observations had x values near 0

Learning Objective 3: Interpreting the Slope Slope: measures the change in the predicted variable (y) for a 1 unit increase in the explanatory variable in (x) Example: A 1 cm increase in femur length results in a 2.4 cm increase in predicted height

Learning Objective 3: Slope Values: Positive, Negative, Equal to 0

Learning Objective 3: Regression Line At a given value of x, the equation: Predicts a single value of the response variable But… we should not expect all subjects at that value of x to have the same value of y Variability occurs in the y values!

Learning Objective 3: The Regression Line The regression line connects the estimated means of y at the various x values In summary, Describes the relationship between x and the estimated means of y at the various values of x

Learning Objective 4: Residuals Measures the size of the prediction errors, the vertical distance between the point and the regression line Each observation has a residual Calculation for each residual: A large residual indicates an unusual observation

Learning Objective 4: “Least Squares Method” Yields the Regression Line Residual sum of squares: The least squares regression line is the line that minimizes the vertical distance between the points and their predictions, i.e., it minimizes the residual sum of squares Note: the sum of the residuals about the regression line will always be zero

Learning Objective 5: Regression Formulas for y-Intercept and Slope Slope: Y-Intercept: Regression line always passes through

Learning Objective 5: Calculating the slope and y intercept for the regression line y intercept=-2.28 Slope =26.4

Learning Objective 5: Internet Usage and Gross National Product (GDP)

Learning Objective 5: Internet Usage and Gross National Product Enter x data into L1 Enter y data into L2 1. STAT CALC menu 2. Choose 8: LinReg(a+bx) 3. 1 st number = x variable 4. 2 nd number = y variable 5. Enter =1.548x-3.63

Learning Objective 5: Baseball Average and Team Scoring

Learning Objective 5: Baseball average and Team Scoring 1. Enter x data into L1 2. Enter y data into L2 3. STAT CALC 4. Choose 8: LinReg(a+bx) 5. 1 st number = x variable 6. 2 nd number = y variable 7. Enter

Learning Objective 5: Cereal: Sodium and Sugar

Learning Objective 6: The Slope and the Correlation Correlation: Describes the strength of the linear association between 2 variables Does not change when the units of measurement change Does not depend upon which variable is the response and which is the explanatory

Learning Objective 6: The Slope and the Correlation Slope: Numerical value depends on the units used to measure the variables Does not tell us whether the association is strong or weak The two variables must be identified as response and explanatory variables The regression equation can be used to predict values of the response variable for given values of the explanatory variable

Learning Objective 7: The Squared Correlation When a strong linear association exists, the regression equation predictions tend to be much better than the predictions using only We measure the proportional reduction in error and call it, r 2

Learning Objective 7: The Squared Correlation measures the proportion of the variation in the y-values that is accounted for by the linear relationship of y with x A correlation of.9 means that 81% of the variation in the y-values can be explained by the explanatory variable, x

Chapter 3 Association: Contingency, Correlation, and Regression Section 3.4 What Are Some Cautions in Analyzing Association?

Learning Objectives: 1. Extrapolation 2. Outliers and Influential Observations 3. Correlations does not imply causation 4. Lurking variables and confounding 5. Simpson’s Paradox

Learning Objective 1: Extrapolation Extrapolation: Using a regression line to predict y-values for x-values outside the observed range of the data Riskier the farther we move from the range of the given x-values There is no guarantee that the relationship given by the regression equation holds outside the range of sampled x-values

Learning Objective 2: Outliers and Influential Points Construct a scatterplot Search for data points that are well outside of the trend that the remainder of the data points follow

Learning Objective 2: Outliers and Influential Points A regression outlier is an observation that lies far away from the trend that the rest of the data follows An observation is influential if Its x value is relatively low or high compared to the remainder of the data The observation is a regression outlier Influential observations tend to pull the regression line toward that data point and away from the rest of the data

Learning Objective 2: Outliers and Influential Points Impact of removing an Influential data point

Learning Objective 3: Correlation does not Imply Causation A strong correlation between x and y means that there is a strong linear association that exists between the two variables A strong correlation between x and y, does not mean that x causes y

Data are available for all fires in Chicago last year on x = number of firefighters at the fires and y = cost of damages due to fire 1. Would you expect the correlation to be negative, zero, or positive? 2. If the correlation is positive, does this mean that having more firefighters at a fire causes the damages to be worse? Yes or No 3. Identify a third variable that could be considered a common cause of x and y: a.Distance from the fire station b.Intensity of the fire c.Size of the fire Learning Objective 3: Association does not imply causation

Learning Objective 4: Lurking Variables & Confounding A lurking variable is a variable, usually unobserved, that influences the association between the variables of primary interest Ice cream sales and drowning – lurking variable =temperature Reading level and shoe size – lurking variable=age Childhood obesity rate and GDP-lurking variable=time When two explanatory variables are both associated with a response variable but are also associated with each other, there is said to be confounding Lurking variables are not measured in the study but have the potential for confounding

Learning Objective 5: Simpson’s Paradox Simpson’s Paradox: When the direction of an association between two variables changes after we include a third variable and analyze the data at separate levels of that variable

Learning Objective 5: Simpson’s Paradox Example Probability of Death of Smoker = 139/582=24% Probability of Death of Nonsmoker=230/732=31% Is Smoking Actually Beneficial to Your Health? This can’t be true that smoking improves your chances of living! What’s going on!

Learning Objective 5: Simpson’s Paradox Example Break out Data by Age

Learning Objective 5: Simpson’s Paradox Example An association can look quite different after adjusting for the effect of a third variable by grouping the data according to the values of the third variable