Agresti/Franklin Statistics, 1 of 52
Chapter 3 Association: Contingency, Correlation, and Regression
Learn how to examine links between two variables.


Variables
Response variable: the outcome variable
Explanatory variable: the variable that explains the outcome variable

Association
An association exists between the two variables if a particular value for one variable is more likely to occur with certain values of the other variable.

Agresti/Franklin Statistics, 4 of 52  Section 3.1 How Can We Explore the Association Between Two Categorical Variables?

Example: Food Type and Pesticide Status

Example: Food Type and Pesticide Status
What is the response variable?
What is the explanatory variable?
[Contingency table: rows Food Type (Organic, Conventional); columns Pesticides (Yes, No)]

Example: Food Type and Pesticide Status
What proportion of organic foods contain pesticides?
What proportion of conventionally grown foods contain pesticides?
[Contingency table: rows Food Type (Organic, Conventional); columns Pesticides (Yes, No)]

Example: Food Type and Pesticide Status
What proportion of all sampled items contain pesticide residues?
[Contingency table: rows Food Type (Organic, Conventional); columns Pesticides (Yes, No)]

Contingency Table
The Food Type and Pesticide Status table is called a contingency table.
A contingency table:
Displays two categorical variables
The rows list the categories of one variable
The columns list the categories of the other variable
Entries in the table are frequencies
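
As a rough illustration of how such a table is built from raw observations, the Python sketch below counts (food type, pesticide status) pairs with collections.Counter. The records are hypothetical, made up only to show the structure, and are not the textbook's data; the helper name contingency_table is likewise illustrative.

```python
from collections import Counter

# Hypothetical records (NOT the textbook's data): (food_type, pesticide_residue)
records = [
    ("Organic", "Yes"), ("Organic", "No"), ("Organic", "No"),
    ("Conventional", "Yes"), ("Conventional", "Yes"), ("Conventional", "No"),
    ("Conventional", "Yes"), ("Organic", "No"),
]

row_categories = ["Organic", "Conventional"]  # explanatory variable (rows)
col_categories = ["Yes", "No"]                # response variable (columns)

def contingency_table(records, rows, cols):
    """Count how often each (row category, column category) pair occurs."""
    counts = Counter(records)
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}

table = contingency_table(records, row_categories, col_categories)
for r in row_categories:
    print(r, [table[r][c] for c in col_categories])
```

Each cell is a frequency, as the slide says; pairs that never occur simply get a count of 0.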

Example: Food Type and Pesticide Status
Contingency Table Showing Conditional Proportions

Example: Food Type and Pesticide Status
What is the sum over each row?
What proportion of organic foods contained pesticide residues?
What proportion of conventional foods contained pesticide residues?
[Contingency table: rows Food Type (Organic, Conventional); columns Pesticides (Yes, No)]
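
A minimal sketch of the arithmetic behind these questions, using hypothetical counts (not the textbook's). Conditional proportions divide each cell by its row total, so every row sums to 1; the overall proportion divides a column total by the grand total. The function names are illustrative.

```python
# Hypothetical counts (NOT the textbook's data)
counts = {
    "Organic":      {"Yes": 20, "No": 80},
    "Conventional": {"Yes": 300, "No": 100},
}

def conditional_proportions(table):
    """Divide each cell by its row total, so every row sums to 1."""
    result = {}
    for row, cells in table.items():
        total = sum(cells.values())
        result[row] = {col: n / total for col, n in cells.items()}
    return result

def marginal_proportion(table, col):
    """Proportion of all observations that fall in column `col`."""
    grand_total = sum(sum(cells.values()) for cells in table.values())
    return sum(cells[col] for cells in table.values()) / grand_total

props = conditional_proportions(counts)
print(props["Organic"]["Yes"])             # 0.2
print(props["Conventional"]["Yes"])        # 0.75
print(marginal_proportion(counts, "Yes"))  # 0.64  (320 / 500)
```

Comparing the two conditional proportions (0.2 vs 0.75 here) is what reveals an association; the overall proportion alone cannot.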

Example: Food Type and Pesticide Status

Example: For the following pair of variables, which is the response variable and which is the explanatory variable?
College grade point average (GPA) and high school GPA
a. College GPA: response variable; high school GPA: explanatory variable
b. College GPA: explanatory variable; high school GPA: response variable

Agresti/Franklin Statistics, 14 of 52  Section 3.2 How Can We Explore the Association Between Two Quantitative Variables?

Scatterplot
Graphical display of two quantitative variables:
Horizontal axis: explanatory variable, x
Vertical axis: response variable, y

Example: Internet Usage and Gross Domestic Product (GDP)

Positive Association
Two quantitative variables, x and y, are said to have a positive association when high values of x tend to occur with high values of y, and low values of x tend to occur with low values of y.

Negative Association
Two quantitative variables, x and y, are said to have a negative association when high values of x tend to occur with low values of y, and low values of x tend to occur with high values of y.

Example: Did the Butterfly Ballot Cost Al Gore the 2000 Presidential Election?

Linear Correlation: r
Measures the strength of the linear association between x and y
A positive r-value indicates a positive association
A negative r-value indicates a negative association
An r-value close to +1 or -1 indicates a strong linear association
An r-value close to 0 indicates a weak linear association

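Calculating the correlation, r
$r = \frac{1}{n-1}\sum \left(\frac{x - \bar{x}}{s_x}\right)\left(\frac{y - \bar{y}}{s_y}\right)$, where $\bar{x}$ and $\bar{y}$ are the sample means and $s_x$ and $s_y$ are the sample standard deviations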
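
A minimal Python sketch of the z-score form of the correlation: standardize each x and y with the sample mean and sample standard deviation, multiply the paired z-scores, and divide the sum by n - 1. It uses only the standard-library statistics module; the car-age and mileage values are hypothetical.

```python
from statistics import mean, stdev

def correlation(x, y):
    """Pearson r: sum of products of z-scores, divided by n - 1."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)  # sample standard deviations (divide by n - 1)
    return sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
               for xi, yi in zip(x, y)) / (n - 1)

# Hypothetical data: car age (years) and odometer mileage (thousands of miles)
age = [1, 2, 3, 5, 8]
mileage = [12, 25, 33, 60, 95]
print(round(correlation(age, mileage), 3))
```

Points that lie exactly on an increasing line give r = 1, and on a decreasing line r = -1.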

Example: 100 cars on the lot of a used-car dealership
Would you expect a positive association, a negative association, or no association between the age of the car and the mileage on the odometer?
a. Positive association
b. Negative association
c. No association

Agresti/Franklin Statistics, 23 of 52  Section 3.3 How Can We Predict the Outcome of a Variable?

Regression Line
Predicts the value of the response variable, y, as a straight-line function of the value of the explanatory variable, x: $\hat{y} = a + bx$, where $\hat{y}$ is the predicted value of y, a is the y-intercept, and b is the slope

Example: How Can Anthropologists Predict Height Using Human Remains?
The regression equation has the form $\hat{y} = a + bx$, where $\hat{y}$ is the predicted height and x is the length of a femur (thighbone), both measured in centimeters

Example: How Can Anthropologists Predict Height Using Human Remains?
Use the regression equation to predict the height of a person whose femur length was 50 centimeters.

Interpreting the y-Intercept
y-intercept: the predicted value for y when x = 0
Helps in plotting the line
May not have any interpretive value if no observations had x values near 0

Interpreting the Slope
Slope: measures the change in the predicted value of the response variable for every one-unit increase in the explanatory variable
Example: A 1 cm increase in femur length corresponds to a 2.4 cm increase in predicted height
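
A small sketch of plugging a value into a fitted line y-hat = a + b*x. The slope 2.4 is the femur example's slope; the intercept HYPOTHETICAL_A is a made-up placeholder, not the textbook's value, so the predicted heights themselves are only illustrative. What the sketch does show is the slope interpretation: raising x by 1 cm raises y-hat by b, whatever the intercept is.

```python
def predict(a, b, x):
    """Predicted response y-hat = a + b * x for a fitted regression line."""
    return a + b * x

SLOPE = 2.4             # cm of predicted height per cm of femur length
HYPOTHETICAL_A = 60.0   # placeholder intercept, NOT the textbook's value

y_hat_50 = predict(HYPOTHETICAL_A, SLOPE, 50)  # prediction for a 50 cm femur
y_hat_51 = predict(HYPOTHETICAL_A, SLOPE, 51)  # one centimeter longer

# The difference equals the slope (up to floating-point rounding)
print(round(y_hat_51 - y_hat_50, 6))
```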

Slope Values: Positive, Negative, Equal to 0

Residuals
Measure the size of the prediction errors
Each observation has a residual
Calculation for each residual: $y - \hat{y}$ (observed value minus predicted value)

Residuals
A large residual (in absolute value) indicates an unusual observation
Large residuals can easily be found by constructing a histogram of the residuals

“Least Squares Method” Yields the Regression Line
Residual sum of squares: $\sum (\text{residual})^2 = \sum (y - \hat{y})^2$
The optimal line through the data is the line that minimizes the residual sum of squares
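
A minimal sketch of residuals and the residual sum of squares for two hand-picked candidate lines on hypothetical data. The line with the smaller sum fits better in the least-squares sense, and the observation with the largest absolute residual is the most unusual one. The helper names are illustrative.

```python
def residuals(x, y, a, b):
    """Residual for each observation: observed y minus predicted a + b*x."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

def residual_sum_of_squares(x, y, a, b):
    """Sum of squared residuals for the line y-hat = a + b*x."""
    return sum(e ** 2 for e in residuals(x, y, a, b))

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 12.0]

# Two candidate lines, given as (a, b)
for a, b in [(0.0, 2.0), (-1.0, 2.5)]:
    print(a, b, round(residual_sum_of_squares(x, y, a, b), 3))

# Index of the observation with the largest absolute residual under line 1
res = residuals(x, y, 0.0, 2.0)
worst = max(range(len(res)), key=lambda i: abs(res[i]))
print("most unusual observation:", x[worst], y[worst])
```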

Regression Formulas for y-Intercept and Slope
Slope: $b = r\left(\frac{s_y}{s_x}\right)$
y-intercept: $a = \bar{y} - b\bar{x}$
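
A sketch of the two formulas in Python on hypothetical data: it computes r from z-scores, then b = r(s_y/s_x) and a = y-bar - b*x-bar. Nudging either coefficient away from these values should increase the residual sum of squares, which is what the least-squares property promises. The function names are illustrative.

```python
from statistics import mean, stdev

def least_squares_line(x, y):
    """Return (a, b) using b = r * s_y / s_x and a = y_bar - b * x_bar."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)
    # correlation: sum of products of z-scores divided by n - 1
    r = sum((xi - x_bar) / s_x * (yi - y_bar) / s_y
            for xi, yi in zip(x, y)) / (n - 1)
    b = r * s_y / s_x
    a = y_bar - b * x_bar
    return a, b

def rss(x, y, a, b):
    """Residual sum of squares for the line y-hat = a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 12.0]
a, b = least_squares_line(x, y)
print(round(a, 2), round(b, 2))  # -0.71 2.37
```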

The Slope and the Correlation
Correlation:
Describes the strength of the linear association between two variables
Does not change when the units of measurement change
It is not necessary to identify which variable is the response and which is the explanatory

The Slope and the Correlation
Slope:
Numerical value depends on the units used to measure the variables
Does not tell us whether the association is strong or weak
The two variables must be identified as response and explanatory variables
The regression equation can be used to predict the response variable
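
The contrast between correlation and slope can be checked numerically. This sketch uses hypothetical femur and height values and converts femur length from centimeters to inches: r stays the same while the slope is multiplied by 2.54. Swapping the roles of x and y also leaves r unchanged, but the slope for predicting femur from height is not the reciprocal of the original slope (the two slopes multiply to r squared).

```python
from statistics import mean, stdev

def correlation(x, y):
    """Pearson r from z-scores (sample standard deviations)."""
    x_bar, y_bar, s_x, s_y = mean(x), mean(y), stdev(x), stdev(y)
    return sum((xi - x_bar) / s_x * (yi - y_bar) / s_y
               for xi, yi in zip(x, y)) / (len(x) - 1)

def slope(x, y):
    """Least-squares slope for predicting y from x: b = r * s_y / s_x."""
    return correlation(x, y) * stdev(y) / stdev(x)

# Hypothetical femur lengths and heights, both in centimeters
femur_cm = [40.0, 44.0, 47.0, 50.0, 53.0]
height_cm = [158.0, 165.0, 173.0, 180.0, 186.0]
femur_in = [v / 2.54 for v in femur_cm]  # same femurs, measured in inches

r_cm = correlation(femur_cm, height_cm)
r_in = correlation(femur_in, height_cm)        # same as r_cm
b_cm = slope(femur_cm, height_cm)
b_in = slope(femur_in, height_cm)              # 2.54 times b_cm
r_swapped = correlation(height_cm, femur_cm)   # same as r_cm
b_swapped = slope(height_cm, femur_cm)         # b_swapped * b_cm == r_cm ** 2
print(round(r_cm, 4), round(r_in, 4), round(b_cm, 3), round(b_in, 3))
```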

Agresti/Franklin Statistics, 36 of 52  Section 3.4 What Are Some Cautions in Analyzing Associations?

Extrapolation
Extrapolation: using a regression line to predict y-values for x-values outside the observed range of the data
Extrapolation is riskier the farther we move from the range of the observed x-values
There is no guarantee that the relationship follows the same trend outside the range of x-values
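
A hypothetical illustration: suppose the true relationship is y = x squared, but data were collected only for x between 0 and 5. The straight line fitted to that range predicts reasonably well inside it and increasingly badly beyond it. The helper fit_line is illustrative and uses the least-squares slope Sxy/Sxx, which equals r(s_y/s_x).

```python
def fit_line(x, y):
    """Least-squares (a, b) for y-hat = a + b*x, with b = Sxy / Sxx."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    return y_bar - b * x_bar, b

# Hypothetical truth: y = x**2, observed only for x in 0..5
x_obs = [0, 1, 2, 3, 4, 5]
y_obs = [v ** 2 for v in x_obs]
a, b = fit_line(x_obs, y_obs)  # a is about -3.33, b is about 5

for x_new in [5, 10, 20]:
    predicted = a + b * x_new
    actual = x_new ** 2
    print(x_new, round(predicted, 2), actual)
```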

Regression Outliers
Construct a scatterplot
Search for data points that are well removed from the trend that the rest of the data points follow

Influential Observation
An observation that has a large effect on the regression analysis
Two conditions must hold for an observation to be influential:
Its x-value is relatively low or high compared to the rest of the data
It is a regression outlier, falling quite far from the trend that the rest of the data follow
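
A sketch of the two conditions using made-up points on the line y = 2x. A regression outlier at a typical x-value shifts the line up but, in this example, leaves the slope unchanged; an outlier with an extreme x-value reverses the sign of the slope. The helper ls_slope is illustrative and computes the least-squares slope as Sxy/Sxx, which equals r(s_y/s_x).

```python
def ls_slope(x, y):
    """Least-squares slope b = Sxy / Sxx (equivalent to r * s_y / s_x)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

# Hypothetical data lying exactly on y = 2x
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
slope_base = ls_slope(x, y)               # 2.0

# Regression outlier at a typical x-value (x = 3 is the mean of x)
slope_mid = ls_slope(x + [3], y + [30])   # still 2.0

# Regression outlier with an extreme x-value
slope_far = ls_slope(x + [20], y + [0])   # about -0.26

print(slope_base, slope_mid, round(slope_far, 2))
```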

Which Regression Outlier is Influential?

Example: Does More Education Cause More Crime?

Correlation Does Not Imply Causation
A correlation between x and y means that there is a linear trend between the two variables
A correlation between x and y does not mean that x causes y

Lurking Variable
A lurking variable is a variable, usually unobserved, that influences the association between the variables of primary interest

Simpson’s Paradox
The direction of an association between two variables can change after we include a third variable and analyze the data at separate levels of that variable

Example: Is Smoking Actually Beneficial to Your Health?


An association can look quite different after adjusting for the effect of a third variable by grouping the data according to the values of the third variable.
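
A small numerical sketch of Simpson's paradox with hypothetical counts (deaths, total) split by age group; these are invented and are not the textbook's smoking data. Within each age group smokers have the higher death rate, yet after collapsing over age the direction reverses.

```python
# Hypothetical counts (NOT the textbook's data): (deaths, total) per group
data = {
    "younger": {"smoker": (5, 100), "nonsmoker": (2, 50)},
    "older":   {"smoker": (10, 20), "nonsmoker": (45, 100)},
}

# Death rate within each age group (third variable held fixed)
within = {
    group: {status: deaths / total for status, (deaths, total) in cells.items()}
    for group, cells in data.items()
}

# Death rate after collapsing over age (third variable ignored)
overall = {}
for status in ("smoker", "nonsmoker"):
    deaths = sum(data[group][status][0] for group in data)
    total = sum(data[group][status][1] for group in data)
    overall[status] = deaths / total

print(within)   # smokers have the higher death rate in both age groups
print(overall)  # but the lower death rate overall
```

The reversal happens because, in these made-up counts, most smokers fall in the younger group, where death rates are low for everyone.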

Data are available for all fires in Chicago last year on x = number of firefighters at the fires and y = cost of damages due to fire.
Would you expect the correlation to be negative, zero, or positive?
a. Negative
b. Zero
c. Positive

If the correlation is positive, does this mean that having more firefighters at a fire causes the damages to be worse?
a. Yes
b. No

Identify a third variable that could be considered a common cause of x and y:
a. Distance from the fire station
b. Intensity of the fire
c. Time of day that the fire was discovered
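
As a hedged illustration of a common cause, the simulation below assumes a made-up model in which fire intensity drives both the number of firefighters and the damage cost, while firefighters have no effect on damage at all. The coefficients, noise levels, and sample size are invented. The raw correlation between x and y comes out strongly positive. Removing the linear effect of intensity from both variables leaves a correlation near 0; this is a partial correlation, a technique not covered in these slides.

```python
import random
from statistics import mean, stdev

def correlation(x, y):
    """Pearson r from z-scores (sample standard deviations)."""
    x_bar, y_bar, s_x, s_y = mean(x), mean(y), stdev(x), stdev(y)
    return sum((xi - x_bar) / s_x * (yi - y_bar) / s_y
               for xi, yi in zip(x, y)) / (len(x) - 1)

def residuals_on(z, v):
    """Residuals from the least-squares line predicting v from z."""
    z_bar, v_bar = mean(z), mean(v)
    sxy = sum((zi - z_bar) * (vi - v_bar) for zi, vi in zip(z, v))
    sxx = sum((zi - z_bar) ** 2 for zi in z)
    b = sxy / sxx
    a = v_bar - b * z_bar
    return [vi - (a + b * zi) for zi, vi in zip(z, v)]

random.seed(1)
# Hypothetical model: intensity drives both x and y; x has no effect on y
intensity = [random.uniform(1, 10) for _ in range(200)]
firefighters = [3 * s + random.gauss(0, 1) for s in intensity]
damage = [10 * s + random.gauss(0, 5) for s in intensity]

r_xy = correlation(firefighters, damage)            # strongly positive
r_adjusted = correlation(residuals_on(intensity, firefighters),
                         residuals_on(intensity, damage))  # near 0
print(round(r_xy, 3), round(r_adjusted, 3))
```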