Objectives: 2.1 Scatterplots
• Scatterplots
• Explanatory and response variables
• Interpreting scatterplots
• Outliers
Adapted from authors’ slides © 2012 W.H. Freeman and Company

Relationship of two numerical variables
Most statistical studies involve more than one variable, and the primary questions are about their relationships. Questions one can ask:
• Which variable(s) are explanatory and which are responses?
• Do we want to know how one variable affects the value of another?
• Or do we simply want to measure their association?
• How is the relationship best described?
• Is the association positive or negative?
• How can we predict one variable from the value of the other(s)?
• Can a straight line be used effectively, or is the relationship more complex?
• How well (closely) do the data fit the relationship we describe?
• How strong (or weak) is the relationship?
• Is the relationship “significant”? (Can we reject H0: no association?)
• How do the data deviate from the overall pattern?

Examples: variables of interest
Here are two data sets that may interest you:
• The weight of a calf (at a given week) and its girth. Does the weight of the calf influence its girth, and what sort of relationship is there? Can we reliably predict the girth from the weight? How does the relationship change over time?
• Your midterm scores. Is there a relationship between the scores on midterms 1 and 2 and the score on midterm 3? Is this relationship strong or weak? If the relationship is strong, then your final grade is pretty much determined. If it is weak, then those who did well still need to work hard, and those who did poorly can still change their grade by working hard.
These data sets are available on my website.
Our objective in the next few lectures is to plot these data in a meaningful way, look at the plot for a relationship, and describe that relationship (this is descriptive statistics). Then we will describe how to measure the strength of the relationship and how to do prediction.

Explanatory and response variables
A response variable measures or records an outcome of a study. An explanatory variable explains changes in the response variable. Typically, the explanatory variable is plotted on the x axis, and the response variable is plotted on the y axis.
Example: two numerical variables for each of 16 students.
Explanatory variable: number of beers
Response variable: blood alcohol content
We are interested in the relationship between the two variables: how is one affected by changes in the other?

Looking at relationships: Scatterplots
[Table columns: Student, Beers, BAC]
In a scatterplot, each axis represents one of the variables, and the data are plotted as points on the graph. We look for an overall pattern and for deviations from the pattern.
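As a rough illustration of how data pairs become points on a graph, here is a minimal sketch that draws a text-only scatterplot using just the Python standard library. The helper name `text_scatterplot` and the beers/BAC values are made up for this example; they are not the 16-student data from the slides, and real work would normally use plotting software.

```python
def text_scatterplot(xs, ys, width=40, height=12):
    """Return a text grid with one '*' per (x, y) pair.

    x (explanatory) runs left to right; y (response) runs bottom to top.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and the same length")
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Avoid division by zero when all values of a variable are identical.
    x_span = (x_max - x_min) or 1.0
    y_span = (y_max - y_min) or 1.0
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y - y_min) / y_span * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 of the grid is the top line
    return "\n".join("".join(line).rstrip() for line in grid)


# Hypothetical values for illustration only (not the data from the slides).
beers = [1, 2, 3, 4, 5, 6, 7, 8]
bac = [0.02, 0.03, 0.05, 0.06, 0.08, 0.10, 0.11, 0.13]

plot = text_scatterplot(beers, bac)
print(plot)
```

With these made-up values the stars climb from the lower left to the upper right, which is the overall pattern one would describe as a positive association.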

Interpreting scatterplots
After plotting two variables on a scatterplot, we describe the relationship by examining the direction, form, and strength of the association. We look for an overall pattern …
• Direction: positive, negative, no direction.
• Form: straight line, curved, clusters, no pattern.
• Strength: how closely the points fit the “form”.
… and for deviations from that pattern.
• Do the points fit more closely in one part of the form than in another?
• Are there outliers?
• Would it be appropriate to extrapolate the relationship we see?

Form and direction of an association
[Figure panels: Straight Line Relationship, Curved Relationship, No Relationship. Direction labels: Negative, Positive, Neither, Positive]

Positive association: High values of the response variable tend to occur together with high values of the explanatory variable.
Negative association: High values of the response variable tend to occur together with low values of the explanatory variable.
Flat (no) association: The values of the response variable are similarly distributed for all values of the other variable. Nothing about the response variable can be predicted from the explanatory variable.
Complex association: For some values of the explanatory variable the variables appear to be positively associated, but for other values they appear to be negatively associated (curvature). Or the explanatory variable may predict information about the response variable other than its general (average) level.
Positive or Negative?

Strength of the association
The strength of the relationship between the two variables can be seen by how much variation, or scatter, there is around the main form.
[Figure 1] This is a very strong positive relationship. The daily amount of gas consumed can be predicted quite accurately for a given temperature value. Y varies very little for a given X.
[Figure 2] This is a weak positive relationship. For a particular median household income (X), you cannot predict the state per capita income (Y) very well. Y varies widely for a given X.

How to scale a scatterplot
Using an inappropriate scale for a scatterplot can give an incorrect impression and interpretation of the data. Both variables should be given a similar amount of space:
• The plot is roughly square.
• Space cannot be reduced without removing some points.
[Figure: same data in all four plots.] There is a negative relationship between swim time and pulse rate.

Outliers
An outlier is a data point that is exceptionally unusual or unexpected. It falls outside the overall pattern of the relationship.
[Figure 1] This point is not in line with the others. It is an outlier of the relationship.
[Figure 2] This point is unusual in its values, but it is not an outlier of the relationship.

Objectives: 2.2 Correlation
• The correlation coefficient r
• Properties of the correlation coefficient
Adapted from authors’ slides © 2012 W.H. Freeman and Company

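Measuring relationship: correlation
• The correlation coefficient is a measure of the direction and strength of a linear relationship.
• It is calculated using the standardized values (z-scores) of both the x and y variables:

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$

Here n is the number of individuals, \bar{x} and \bar{y} are the sample means, and s_x and s_y are the sample standard deviations. The first factor is the z-score for x and the second is the z-score for y.
• r is positive if the relationship is positive and negative if the relationship is negative.
• r is always between −1 and 1. The closer it is to −1 or 1, the stronger the linear relationship. But a value close to 0 does not necessarily mean there is no relationship.
• r has no units of measurement and does not depend on the units for x and y.
Compute this with your calculator or software!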
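The z-score description of r can be sketched in a few lines of Python. This is only an illustration using the standard library; the function name `correlation` and the five data pairs are made up for the example and do not come from the slides.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Correlation r as the average product of z-scores, dividing by n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long lists with at least 2 values")
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (n - 1)
    z_products = [
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y)  # z-score for x times z-score for y
        for x, y in zip(xs, ys)
    ]
    return sum(z_products) / (n - 1)


# Hypothetical data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = correlation(x, y)
print(f"r = {r:.3f}")
```

Because each value is standardized before the products are summed, the result has no units, which is why r does not depend on how x and y are measured.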

The correlation coefficient r
[Slide values: Time to swim, Pulse rate, Correlation]
This indicates a moderately strong negative relationship. “Time to Swim” is the explanatory variable here and belongs on the x axis. However, the value of r is the same regardless of how we label or plot the variables. The value of r would also be the same if, for example, “Time to Swim” were measured in seconds and “Pulse Rate” were measured in beats per hour.
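The two properties stated here (r does not change when the variables are swapped, and r does not change when the units change) can be checked numerically. The sketch below uses made-up swim times and pulse rates, not the values from the slide, and a hypothetical helper named `correlation`.

```python
from math import sqrt


def correlation(xs, ys):
    """Correlation r computed from sums of deviation products."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


# Hypothetical values for illustration only.
time_minutes = [34.1, 35.7, 34.5, 36.0, 35.2]
pulse_per_minute = [152, 124, 140, 130, 136]

# Same measurements expressed in different units.
time_seconds = [t * 60 for t in time_minutes]
pulse_per_hour = [p * 60 for p in pulse_per_minute]

r_original = correlation(time_minutes, pulse_per_minute)
r_swapped = correlation(pulse_per_minute, time_minutes)
r_rescaled = correlation(time_seconds, pulse_per_hour)

print(f"original units: {r_original:.4f}")
print(f"x and y swapped: {r_swapped:.4f}")
print(f"seconds and beats per hour: {r_rescaled:.4f}")
```

All three printed values agree up to floating-point rounding, and with these made-up numbers r is negative, matching the direction described for swim time and pulse rate.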

r ranges from −1 to +1
The correlation coefficient r quantifies the strength and direction of a linear relationship between two quantitative variables.
Strength: how closely the points follow a straight line.
Direction: r is positive when individuals with higher X values tend to have higher values of Y, and negative when individuals with higher X values tend to have lower values of Y.

Automobiles in Albuquerque were randomly selected (at a shopping center) in 1974 and given an emissions test. Total hydrocarbon emissions level and model year were observed.
Direction: negative. Form: straight line? Strength: weak. r = −0.483

Pollutants were observed over a 28-day period. The carbon pollutants and the ozone level are to be related.
Direction: positive. Form: straight line. Strength: moderate. r = 0.687

The efficiency of an industrial biofilter is tested at different temperature levels.
Direction: positive. Form: straight line. Strength: moderate to strong. r = 0.891

The nickel-to-iron ratio was measured in oat plants, and the plant age (in days after emergence) was also recorded.
Direction: complex (positive until 50 days, then negative). Form: curved. Strength: strong (if the curve is taken into account). r = 0.479
The correlation measures the degree to which the points fit a straight line, not a curve.
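To see why r can understate a strong curved relationship, consider an extreme made-up case: points that lie exactly on a symmetric curve. The sketch below is an illustration under that assumption, with a hypothetical helper named `correlation`; it does not use the oat plant data.

```python
from math import sqrt


def correlation(xs, ys):
    """Correlation r computed from sums of deviation products."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


# Hypothetical data: y is completely determined by x, but along a curve.
x_all = [-3, -2, -1, 0, 1, 2, 3]
y_all = [x ** 2 for x in x_all]

# Only the rising half of the same curve.
x_right = [x for x in x_all if x >= 0]
y_right = [x ** 2 for x in x_right]

r_curve = correlation(x_all, y_all)
r_half = correlation(x_right, y_right)

print(f"whole curve: r = {r_curve:.3f}")
print(f"rising half only: r = {r_half:.3f}")
```

Over the whole curve the falling and rising halves cancel, so r is essentially zero even though y is perfectly predictable from x. On the rising half alone r is close to 1, because that portion is nearly a straight line.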

Example: correlations between midterm scores
[Correlation table: rows and columns are Midterm 1, Midterm 2, Midterm 3]
We can see from the correlations above that, as expected, the correlations between the midterm scores are positive (the correlation coefficients are all greater than zero). However, none of the correlation coefficients is very large, which means that the association is not strong: a midterm score cannot be predicted well from the previous midterm scores. This is good news, because it appears that you can improve! The correlation is strongest between midterm 1 and midterm 3, which I did not expect.
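A table of pairwise correlations like the one on this slide could be produced along the following lines. This is a sketch only: the scores are invented for six hypothetical students, and the helper names `correlation` and `correlation_matrix` are assumptions, not the instructor's actual data or code.

```python
from math import sqrt


def correlation(xs, ys):
    """Correlation r computed from sums of deviation products."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


def correlation_matrix(columns):
    """Return {name_a: {name_b: r}} for every pair of named columns."""
    names = list(columns)
    return {
        a: {b: correlation(columns[a], columns[b]) for b in names}
        for a in names
    }


# Hypothetical scores for six students (illustration only).
scores = {
    "Midterm 1": [55, 70, 62, 88, 91, 47],
    "Midterm 2": [60, 65, 70, 80, 85, 58],
    "Midterm 3": [50, 75, 60, 82, 95, 66],
}

matrix = correlation_matrix(scores)
for row_name in scores:
    row = " ".join(f"{matrix[row_name][col]:6.3f}" for col in scores)
    print(f"{row_name}: {row}")
```

Each diagonal entry is 1 because a variable is perfectly correlated with itself, and the table is symmetric because r does not depend on which variable is treated as x.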