Introduction to bivariate data

Presentation on theme: "Introduction to bivariate data"— Presentation transcript:

1 Introduction to bivariate data

2 Why Two Variables instead of just one?

3 We

4 Let’s collect some data.
(Except we don’t have time, so you will just read about collecting this data.) Imagine that everyone in class counts the number of text messages they received yesterday. Then I select a sample of 10 students and actually record their number of messages received. That’s one-variable data, so we could make a dotplot or some other kind of graphical display.

5 How the heck do we do that?
I want to use this data to PREDICT the number of text messages received by the next randomly chosen student. How the heck do we do that? The best number we can use as a predictor is the mean from the 10 students in our sample. But that still might not be a very good prediction. Can we make our prediction better? Is there some other variable that influences the number of text messages a student might receive?
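Here is a minimal sketch of why the mean is a sensible single-number prediction. The ten counts below are made up for illustration (the slides never list the actual sample), and "best" is taken here to mean "smallest total squared error", which is an assumption about what the slide intends.

```python
from statistics import mean

# Hypothetical message counts for the 10 sampled students (not from the slides).
received = [12, 30, 7, 45, 22, 18, 5, 60, 27, 14]


def total_squared_error(guess, data):
    """Add up the squared distance between one guess and every observed value."""
    return sum((value - guess) ** 2 for value in data)


# The sample mean is our single-number prediction for the next student.
prediction = mean(received)
print("Prediction for the next student:", prediction)

# Compare the mean against guesses a little lower and a little higher.
for guess in (prediction - 5, prediction, prediction + 5):
    print(guess, total_squared_error(guess, received))
```

Any guess other than the mean gives a larger total squared error for the same data, but the error at the mean can still be large. That is the sense in which the prediction "still might not be very good".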

6 How about the number of messages sent?
Could we collect data on messages sent and received, and then use that to predict the number of messages received based on how many messages were sent?

7 This is what we mean by Bivariate (Two Variable) Data
Instead of looking at one variable at a time, we look at two variables that are related. Perhaps one even depends on the other. We are able to use the value of one of the variables to make a prediction about the other.

8 Instead of having one axis or scale, we’re going to have two axes
They are the same ones we call “x” and “y” in algebra. But instead of just having equations like we usually did in algebra, we’re going to start with data, which we can display as a scatterplot.

9 Correlation and the correlation coefficient
How close are we to a line?

10 In AP Stats, when it comes to Bivariate Data, we like lines
Computers and graphing calculators can create all sorts of equations from data, but we’re only going to be interested in creating lines. So we need to judge how close our data is to being linear. In fact, we’re going to calculate a number that quantifies how linear our data is.

11 Let’s start with this set of data:
We can also do some summary statistics: the mean and standard deviation of the x’s and the y’s.
$\bar{x} = 2.75$ and $s_x = 1.708$
$\bar{y} = 2.5$ and $s_y = 1.291$

x    y
1    …
2    3
3    …
5    4
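The summary statistics can be checked with a short sketch. The x-values 1, 2, 3, 5 appear in the slides. The y-values 1, 2, 3, 4 are the set consistent with the stated mean and standard deviation, and their order in the list below is an assumption that is not meant to show which y goes with which x. Note that `stdev` divides by n − 1, which is what a sample standard deviation calls for.

```python
from statistics import mean, stdev

x_values = [1, 2, 3, 5]
# Assumed set of y-values; list order here does NOT indicate pairing with x.
y_values = [1, 2, 3, 4]

x_bar, s_x = mean(x_values), stdev(x_values)  # stdev uses the n - 1 divisor
y_bar, s_y = mean(y_values), stdev(y_values)

print(f"x-bar = {x_bar:.2f}, s_x = {s_x:.3f}")
print(f"y-bar = {y_bar:.2f}, s_y = {s_y:.3f}")
```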

12 Let’s look at the plot again with the mean x and mean y values added.
Using z-scores, we can see how far from the mean each of our points is, relative to the others. When we do z-scores, we can get positive answers (above the mean) and negative answers (below the mean). The z-scores for the x-value and y-value of each point are given on the next slide.

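13 $\bar{x} = 2.75$ and $s_x = 1.708$; $\bar{y} = 2.5$ and $s_y = 1.291$
$z = \frac{x - \bar{x}}{s_x}$ or $z = \frac{y - \bar{y}}{s_y}$

x-coord.   y-coord.   z-score of x                   z-score of y
1          …          (1 − 2.75)/1.708 = −1.0246     …
2          3          (2 − 2.75)/1.708 = −0.4391     (3 − 2.5)/1.291 = 0.3873
3          …          (3 − 2.75)/1.708 = 0.1464      …
5          4          (5 − 2.75)/1.708 = 1.3173      (4 − 2.5)/1.291 = 1.1619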
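A small sketch of the z-score step, using only the x-values 1, 2, 3, 5. The helper name `z_scores` is made up for this example.

```python
from statistics import mean, stdev


def z_scores(values):
    """Return (value - mean) / sample standard deviation for each value."""
    center = mean(values)
    spread = stdev(values)  # sample SD, n - 1 divisor
    return [(value - center) / spread for value in values]


x_values = [1, 2, 3, 5]
z_x = z_scores(x_values)
for value, z in zip(x_values, z_x):
    print(value, round(z, 4))
```

The results can differ from the slide values in the fourth decimal place, because the slides divide by the rounded standard deviation 1.708 while this sketch keeps full precision.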

14 And now, just for kicks, let’s multiply the x z-score and y z-score for each point together! OK, it’s not just for kicks. This is how we combine the effects of the x-coordinate and the y-coordinate. The fact that we multiply is also not arbitrary. It’s how we get positive or negative slope, or positive or negative correlation (remember that from Algebra I?). Points below $\bar{y}$ have negative z-scores, and points above are positive. Points to the left of $\bar{x}$ have negative z-scores, and points to the right are positive. Multiplication rules from algebra: if the signs are the same, the product is positive; if the signs are different, the product is negative.

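15 No matter where the x- and y-axes are, the mean of the x-values and the mean of the y-values split the graph into four quadrants. If there are more data points in the blue quadrants, you have a positive relationship, or positive correlation. If there are more points in the white quadrants, you have a negative relationship, or negative correlation.
Left of $\bar{x}$, above $\bar{y}$: z-score of x negative, z-score of y positive, product = negative.
Right of $\bar{x}$, above $\bar{y}$: z-score of x positive, z-score of y positive, product = positive.
Left of $\bar{x}$, below $\bar{y}$: z-score of x negative, z-score of y negative, product = positive.
Right of $\bar{x}$, below $\bar{y}$: z-score of x positive, z-score of y negative, product = negative.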
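The quadrant idea can be sketched by counting how many points have same-sign deviations versus opposite-sign deviations. The two data sets and the function name `quadrant_counts` are hypothetical, chosen only to show a rising and a falling pattern. Since a standard deviation is positive, the sign of a deviation from the mean matches the sign of the z-score, so the sketch skips the division.

```python
from statistics import mean


def quadrant_counts(points):
    """Count points whose x- and y-deviations agree in sign vs. disagree."""
    x_bar = mean([px for px, _ in points])
    y_bar = mean([py for _, py in points])
    same_sign = 0
    opposite_sign = 0
    for px, py in points:
        product = (px - x_bar) * (py - y_bar)
        if product > 0:
            same_sign += 1      # upper-right or lower-left quadrant
        elif product < 0:
            opposite_sign += 1  # upper-left or lower-right quadrant
        # A product of exactly 0 means the point sits on a mean line.
    return same_sign, opposite_sign


# Hypothetical data sets, not from the slides.
rising = [(1, 2), (2, 3), (3, 6), (4, 9)]
falling = [(1, 9), (2, 6), (3, 3), (4, 2)]

print(quadrant_counts(rising))
print(quadrant_counts(falling))
```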

16 So, back to this multiplication thing…
x-coord.   y-coord.   z-score of x   z-score of y   product
1          …          −1.0246        …              …
2          3          −0.4391        0.3873         (−0.4391)(0.3873) = −0.1701
3          …          0.1464         …              …
5          4          1.3173         1.1619         (1.3173)(1.1619) = 1.531

And while we’re at it, why don’t we go ahead and find an average of these products of the x z-score and the y z-score? (Notice we’ve been using x-bar and y-bar, not µ or σ, so when we take the average, we’re going to divide by how many we have minus 1.)
Average = (sum of the four products)/(4 − 1)
This number, the average of the products of the z-scores for the x- and y-coordinate of each point, is our magical number that tells us how close our points are to being a perfect line.

17 It’s not really magic. It’s how mathematicians and statisticians decided to measure “how close are we to a line?” This number is called the CORRELATION COEFFICIENT, and the symbol we use is r. The formula, based on the steps we just did, is
$r = \frac{1}{n-1} \sum \left(\frac{x - \bar{x}}{s_x}\right)\left(\frac{y - \bar{y}}{s_y}\right)$
The sign of r tells us if our data points have a positive correlation or negative correlation.
If the value calculated for r is exactly 1 or negative 1, our points are in an exact line.
If the value of r (regardless of sign) is greater than .8, we say that we have a strong relationship.
If the value of r is between .5 and .8, we say that we have a moderate relationship.
If the value of r is less than .5, we say we have a weak relationship.
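The formula translates almost line for line into a short sketch. The function name `correlation` and the pairing of the y-values with the x-values are assumptions. The pairing (1, 1), (2, 3), (3, 2), (5, 4) is one arrangement consistent with the stated means and standard deviations, not a confirmed copy of the slide's table.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Average the products of paired z-scores, dividing by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample SDs, n - 1 divisor
    products = [
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y)
        for x, y in zip(xs, ys)
    ]
    return sum(products) / (n - 1)


xs = [1, 2, 3, 5]
ys = [1, 3, 2, 4]  # ASSUMED pairing, see the note above

r = correlation(xs, ys)
print(round(r, 4))

# Points on an exact line give r = 1 (rising) or r = -1 (falling).
print(round(correlation([1, 2, 3], [2, 4, 6]), 4))
print(round(correlation([1, 2, 3], [6, 4, 2]), 4))
```

Under that assumed pairing, r comes out a little above .8, which the guidelines above would call a strong positive relationship. A different pairing of the same y-values would give a different r.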

18 r is a quantitative value that tells us the strength and direction of the linear relationship that exists in any set of data points. r only tells us what kind of linear relationship we have.
Interpretation of r: When you are asked to interpret the meaning of the correlation coefficient, r, you always do so using the following sentence:
There is a [weak/moderate/strong (depending on what the value of r is)], [positive/negative (depending on the sign of r)] LINEAR relationship between [x variable in context] and [y variable in context].
MEMORIZE this sentence. You will change the value of the blue words based on the data set that you have.
Luckily, we don’t have to use that awful formula. The calculator calculates this value for us in a couple of different ways. Your first handout has all the steps you need.
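As a memory aid, the sentence template can be sketched as a small function. The name `interpret_r` is made up, and the cutoffs are the ones from the previous slide (greater than .8 strong, .5 to .8 moderate, less than .5 weak). The slides do not say how to word r = 0, so this sketch simply rejects it.

```python
def interpret_r(r, x_name, y_name):
    """Fill in the interpretation sentence for a correlation coefficient."""
    if not -1 <= r <= 1 or r == 0:
        # r = 0 has no direction; the slides do not cover that case.
        raise ValueError("r must be a nonzero value between -1 and 1")

    size = abs(r)  # strength ignores the sign
    if size > 0.8:
        strength = "strong"
    elif size >= 0.5:
        strength = "moderate"
    else:
        strength = "weak"

    direction = "positive" if r > 0 else "negative"
    return (
        f"There is a {strength}, {direction} linear relationship "
        f"between {x_name} and {y_name}."
    )


print(interpret_r(0.83, "messages sent", "messages received"))
print(interpret_r(-0.42, "messages sent", "messages received"))
```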

