Covariance and Correlation


Two Variables: Covariance and Correlation

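[Figure: x and y axes dividing the plane into quadrants I (upper right), II (upper left), III (lower left), and IV (lower right).]

When you first encountered the coordinate system used for graphs, you may recall talking about the first quadrant, second quadrant, and so on, as shown in the graph.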

[Figure: the same axes, with a second set of quadrants I-IV drawn inside what was the first quadrant.]

Later, we might break up what was really the first quadrant into another four quadrants. Please be on the lookout for this.

Overview

Sometimes we want to study two variables together. As an example, consider the number of commercials a store runs during the weekend (one variable) and the store's sales volume (the other variable). Presumably sales volume depends on the number of commercials, so sales volume is the dependent variable and the number of commercials is the independent variable. On the next screen I have a scatter plot of the data. Notice I put the dependent variable on the vertical, or y, axis. Also notice that I put in lines at the means of the x and y variables. I also put in Roman numerals to represent quadrants that I will refer to later.

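[Scatter plot: number of commercials (x) against sales volume (y), with lines drawn at the mean of x = 3 and the mean of y = 51, dividing the plot into quadrants I-IV.]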

Thought experiment

Imagine we are all in a room together. Say I ask you to turn around, but keep your eyes open. Then I sneak over to the light switch and flip the switch. (By the way, how do you catch a unique bird? Unique up on it!) Now, since the lights were on when we started, let's say a flip is me turning the switch to off and back to on. So with each flip the lights go off once, but come back on. How many times would the lights go off if I flip the switch 4 times? The lights would go out 4 times! Wouldn't it be wild if the number of commercials were like me flipping the switch and the sales were like the lights going off? We could get sales to be exactly what we want.

The covariance and correlation measures are designed to tell us how closely our variables are related, like the switch-flipping and lights-out phenomenon. By the way, why not just show a gazillion commercials? Because the cost of another commercial might be greater than the additional sales generated. Profit would thus fall. Companies want to maximize profit, not sales. See how easy economics is?

Let's look at two data points, written as (# of commercials, sales) = (x, y): (2, 50) and (5, 57). Recall that a deviation (not a standard deviation here) is a value minus the mean. For the first point the x deviation is 2 - 3 = -1, and the y deviation is 50 - 51 = -1. Note this point is in quadrant III in the graph and both deviations are negative. If we multiply the two deviations together we get a positive number: (-1)(-1) = 1.

For the second point the x deviation is 5 - 3 = 2, and the y deviation is 57 - 51 = 6. Note this point is in quadrant I in the graph and both deviations are positive. If we multiply the two deviations together we get a positive number: (2)(6) = 12. Note that in our example in the graph we do not have any points in quadrant II or IV. But if we did, one deviation would be negative and the other would be positive, so multiplying the two together would give a negative number. Also note that a point on one of the mean lines has a zero deviation for that variable, so its product of deviations is zero.
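The deviation arithmetic for the two points can be checked with a short Python sketch. The means (3 and 51) and the two points come from the slides; the function names `deviation_product` and `quadrant` are made up here for illustration.

```python
# Means read off the scatter plot described in the slides.
X_MEAN = 3
Y_MEAN = 51


def deviation_product(x, y, x_mean=X_MEAN, y_mean=Y_MEAN):
    """Return (x deviation, y deviation, product of the deviations) for one point."""
    dx = x - x_mean
    dy = y - y_mean
    return dx, dy, dx * dy


def quadrant(dx, dy):
    """Name the quadrant, relative to the mean lines, from the signs of the deviations."""
    if dx == 0 or dy == 0:
        return "on a mean line"
    if dx > 0:
        return "I" if dy > 0 else "IV"
    return "II" if dy > 0 else "III"


for point in [(2, 50), (5, 57)]:
    dx, dy, product = deviation_product(*point)
    print(point, "deviations:", dx, dy, "product:", product, "quadrant:", quadrant(dx, dy))
```

Points in quadrants I and III always give a positive product, and points in quadrants II and IV always give a negative one, which is the fact the covariance formula builds on.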

Sample covariance

$s_{xy} = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$

WOW, what the heck is this? A formula. Break it down:
1) For each data point find its x deviation and its y deviation.
2) Multiply the x and y deviations for each point.
3) Add up, or sum, the multiplied deviations.
4) Divide the sum by n - 1. (Why n - 1? Why not! Just do it for now, please.)

Now think about the graph I had before. If most of the values are in quadrants I and III, then step 2) above gives mostly positive values and the sum should be positive. If most values were in quadrants II and IV we should have mostly negative values and a negative sum.
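The four steps of the formula can be sketched in Python as follows. The transcript does not list the underlying data, so the ten (commercials, sales) pairs below are an assumption: illustrative values chosen to include the two points discussed earlier and to have means of 3 and 51. The function name `sample_covariance` is likewise just for illustration.

```python
# Illustrative data only (an assumption; the slides do not list the data set).
commercials = [2, 5, 1, 3, 4, 1, 5, 3, 4, 2]        # x
sales = [50, 57, 41, 54, 54, 38, 63, 48, 59, 46]    # y, in hundreds of dollars


def sample_covariance(xs, ys):
    """Sample covariance: sum of products of deviations, divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Steps 1 and 2: find each point's deviations and multiply them.
    products = [(x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)]
    # Steps 3 and 4: sum the products and divide by n - 1.
    return sum(products) / (n - 1)


print(sample_covariance(commercials, sales))  # 11.0 for this illustrative data
```

With these assumed values every product of deviations is zero or positive, so the sum, and therefore the covariance, is positive.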

Some interpretation

A positive sample covariance typically means that most of the values in the data are in quadrants I and III. This suggests that if the x value is above its mean then the y value is above its mean, and if the x value is below its mean the y value is below its mean. x and y are mostly on the same sides of their means.

A negative sample covariance typically means that most of the values in the data are in quadrants II and IV. This suggests that if the x value is above its mean then the y value is below its mean, and if the x value is below its mean the y value is above its mean. x and y are mostly on opposite sides of their means.

The sample covariance tells us about the direction of the relationship between the two variables. A positive value indicates x and y move in the same direction. A negative value means x and y move in opposite directions.

What do you think is the direction of the relationship between the price of a product and the quantity of the product demanded? Yes, you are right, it is negative. The higher the price, the lower the quantity demanded. What if the sample covariance is zero? Then x and y show no linear relationship! What is the covariance between the weekly rain in Spain and the amount of wheat grown in the Great Plains? Probably about zero! Now, when I say the sample covariance indicates the direction of the relationship, I really should say linear relationship.

A problem and a solution

The sample covariance in our example between number of commercials and sales volume is 11. The sales were measured in hundreds of dollars. So, for example, 50 really meant 5000. Guess what happens to the covariance if we measure 5000 as 5000 (100 times more than 50). The covariance becomes 1100 (100 times more than 11). This is OK as far as direction goes, because it is still positive and shows the same positive relationship as 11. But we also want to know how strong the relationship is. Covariance does not show this, because its size depends on the units of measurement. The correlation coefficient does show the strength of the relationship.
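The effect of changing units can be reproduced with a small Python sketch. As before, the data values are an assumption (illustrative numbers chosen to give means of 3 and 51 and a covariance of 11), and `sample_covariance` is a helper name introduced here.

```python
# Illustrative data only (an assumption; the slides do not list the data set).
commercials = [2, 5, 1, 3, 4, 1, 5, 3, 4, 2]
sales_hundreds = [50, 57, 41, 54, 54, 38, 63, 48, 59, 46]  # sales in hundreds of dollars
sales_dollars = [100 * y for y in sales_hundreds]          # the same sales in dollars


def sample_covariance(xs, ys):
    """Sample covariance: sum of products of deviations, divided by n - 1."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    return sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / (n - 1)


cov_hundreds = sample_covariance(commercials, sales_hundreds)
cov_dollars = sample_covariance(commercials, sales_dollars)
print(cov_hundreds, cov_dollars)  # the second value is 100 times the first
```

The sign stays the same but the size changes with the units, so the size of a covariance alone says nothing about how strong the relationship is.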

Correlation coefficient

The correlation coefficient uses the covariance in the calculation: $r_{xy} = \dfrac{s_{xy}}{s_x s_y}$, so the correlation coefficient is the covariance divided by the product of the two standard deviations. For positive relationships between the variables $r_{xy}$ will range from 0 to 1, and for negative relationships the value will range from 0 to -1. So, in total the correlation coefficient ranges from -1 to 1.

Remember the flipping the switch and the lights out example. The correlation coefficient would be exactly 1. Let me elaborate. Say each of you gets one minute to flick the switch. Will each of you flick the same number of times? Probably not, but however many times you flick, the lights will go out that same number of times. (More on the next screen.)
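A minimal Python sketch of the correlation formula follows. The data are again an assumption (illustrative values, not taken from the slides), and the helper names are made up for this example. It also checks that the correlation coefficient, unlike the covariance, does not change when sales are measured in dollars instead of hundreds of dollars.

```python
import math

# Illustrative data only (an assumption; the slides do not list the data set).
commercials = [2, 5, 1, 3, 4, 1, 5, 3, 4, 2]
sales_hundreds = [50, 57, 41, 54, 54, 38, 63, 48, 59, 46]
sales_dollars = [100 * y for y in sales_hundreds]


def sample_covariance(xs, ys):
    """Sample covariance: sum of products of deviations, divided by n - 1."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    return sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / (n - 1)


def sample_std(values):
    """Sample standard deviation (divides by n - 1, like the covariance)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def correlation(xs, ys):
    """Correlation coefficient: covariance over the product of the standard deviations."""
    return sample_covariance(xs, ys) / (sample_std(xs) * sample_std(ys))


r_hundreds = correlation(commercials, sales_hundreds)
r_dollars = correlation(commercials, sales_dollars)
print(round(r_hundreds, 2), round(r_dollars, 2))  # both round to 0.93 for this data
```

Dividing by the two standard deviations cancels the units, which is why the same number comes out either way.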

The point is that if we took each of your flick and lights-out combinations and put them in one graph, all the points could be connected by a straight line. Data that all lie exactly on a straight line have $r_{xy} = 1$ if the line goes up the hill from left to right and -1 if it goes down the hill from left to right. The 1 or -1 means the relationship is as strong as it can possibly be. A value of zero means there is no linear relationship, which is as weak as you can get, and values closer to zero mean a weaker relationship.

Think of our lights example again. If I know the flicks, do I even have to watch the lights? NO, because the relationship is perfect. For example, 2 flicks means 2 lights out. Unfortunately, in business the relationships are usually not exactly like the lights. But the closer to -1 or 1 you get, the stronger the relationship is between the two variables.
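The perfect-relationship case can be illustrated with a short Python sketch. The flick counts below are hypothetical, as is the "downhill" line used to show a correlation of -1. The `correlation` helper is written so that the n - 1 terms cancel, which is algebraically the same as dividing the covariance by the two standard deviations.

```python
import math


def correlation(xs, ys):
    """Correlation coefficient; the n - 1 factors in s_xy, s_x and s_y cancel out."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sum_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sum_xx = sum((x - x_mean) ** 2 for x in xs)
    sum_yy = sum((y - y_mean) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


# Hypothetical flick counts for four students; each flick puts the lights out once.
flicks = [2, 4, 5, 7]
lights_out = list(flicks)

# A made-up straight line that goes down the hill from left to right.
downhill = [10 - 2 * f for f in flicks]

r_up = correlation(flicks, lights_out)
r_down = correlation(flicks, downhill)
print(round(r_up, 6), round(r_down, 6))  # 1.0 and -1.0
```

Any set of points lying exactly on an upward-sloping line gives 1, regardless of how steep the line is, and any set on a downward-sloping line gives -1.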