Business Statistics - QBM117 Scatter diagrams and measures of association.


Objectives
- To briefly introduce the topic of regression and correlation.
- To explore relationships between two variables using the graphical technique of scatter diagrams.
- To introduce two measures of association that can be used to measure the amount of association between two variables.

Regression and correlation: measuring and predicting relationships
- Regression and correlation show us how to summarise the relationship between two factors, based on a bivariate (two-variable) set of data.
- Correlation is a measure of the strength of the relationship between the two variables.
- Regression helps us to predict one variable from the other.
In earlier modules we learnt to look at data, compute and interpret probabilities, draw random samples and perform statistical inference. Now we apply these concepts to explore relationships between several variables.

In our earlier studies we learnt to summarise univariate (single-variable) data using statistical summaries such as the mean, to describe the centre, and the standard deviation, to describe the variability. With bivariate data we could use these same statistics to summarise each variable separately; however, the payoff comes from studying both variables together, to explore the relationship between them.

Exploring relationships using scatterplots
Economists and business operators are often interested in relationships between two quantitative variables. For example:
- How does advertising affect sales in my business?
- If I increase the price of this product, what effect will this have on demand?
- What effect are inflation rates having on unemployment rates, on the price of petrol, on the price of new homes, etc.?

Exploring relationships using scatterplots and correlations
Scatterplots provide useful insights into the structure of the data, such as:
- Is the relationship between the two variables linear or nonlinear?
- Are there any outliers in the data?
- What is the strength of the relationship between the two variables?

Correlation is a summary measure of the strength of the relationship. It is both helpful and limited.
- If the scatterplot shows either a well-behaved linear relationship or no relationship at all, then the correlation provides an excellent summary of the relationship.
- If, however, there are problems with the data, such as a nonlinear relationship or outliers, the correlation can be misleading.
Correlation on its own therefore has limited use, as its interpretation depends on the type of relationship in the data.

The Scatterplot
- A scatterplot is simply a plot of all the data.
- If one variable is seen as causing, affecting or influencing the other, it is plotted on the x (horizontal) axis and is referred to as the independent variable. The variable that is affected or influenced by the other is plotted on the y (vertical) axis and is referred to as the dependent variable.
- If neither causes, affects or influences the other, it does not matter which one is plotted where.

Correlation measures the strength of the relationship between the two variables
- Correlation, denoted ρ (rho) for a population and r for a sample, varies from –1 to +1, summarising the strength of the relationship in the data.
- A correlation of 1 indicates a perfect straight-line relationship, with higher values of one variable associated with perfectly predictable higher values of the other variable.
- A correlation of –1 indicates a perfect inverse straight-line relationship, with one variable decreasing as the other increases.
- For correlations between –1 and 1, the size of the correlation indicates the strength of the relationship, while the sign (+ or –) indicates the direction (increasing or decreasing).
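The slides do not state a formula for r. For reference, the usual definition of the sample (Pearson) correlation coefficient for n paired observations (x_i, y_i), with sample means x-bar and y-bar, is sketched below. The notation is an assumption and is not taken from the lecture.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \; \sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad -1 \le r \le 1 .
```

The numerator is positive when above-average x values tend to go with above-average y values, and negative when they go with below-average y values, which is where the sign of r comes from.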

- A correlation of 0 generally indicates no relationship, just randomness.
- Correlations must be interpreted with caution, as nonlinear structures and outliers can distort the usual interpretation.
- Correlation measures how close the data points are to being exactly on a tilted straight line. It has nothing to do with the steepness (slope) of the line.
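The point that correlation reflects closeness to a straight line, not its steepness, can be checked numerically. The sketch below is only an illustration: the helper `pearson_r` and all data values are hypothetical and do not come from the lecture.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation coefficient r for paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y_shallow = [0.1 * v for v in x]    # points exactly on a line with slope 0.1
y_steep = [10 * v + 3 for v in x]   # points exactly on a line with slope 10
y_down = [-2 * v for v in x]        # points exactly on a line with slope -2

r_shallow = pearson_r(x, y_shallow)
r_steep = pearson_r(x, y_steep)
r_down = pearson_r(x, y_down)

print(r_shallow, r_steep, r_down)
```

The shallow and steep lines give the same r (up to rounding), because every point lies exactly on a line in both cases. Only the direction of the tilt changes the result, through the sign.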

Interpreting Correlation
- r = 1: a perfect straight line tilting up to the right.
- r = 0: no overall tilt. No relationship?
- r = –1: a perfect straight line tilting down to the right.
[Figure: example scatterplots of Y against X]

Various types of relationships
A linear relationship is observed when
- the scatterplot shows points bunched randomly around a straight line.
- The points could be tightly bunched, falling almost exactly on a line, or, more likely, they will be well scattered, forming a ‘cloud’ of points.

Example: Exploring TV Ratings
People Meters vs. Nielsen Index: two measures of the market share of 10 TV shows.
- Correlation is r = [value missing]: very strong positive association (since r is close to 1).
- Linear relationship: straight line with scatter.
- Increasing relationship: tilts up and to the right.
[Figure: scatterplot with axes Nielsen Index and People Meters]

Example: Merger Deals
Dollars vs. Deals, for mergers and acquisitions by investment bankers. 134 deals worth $63 billion by Goldman Sachs.
- Correlation is r = [value missing]: strong positive association.
- Linear relationship: straight line with scatter.
- Increasing relationship: tilts up and to the right.
[Figure: scatterplot with axes Deals and Dollars (Billions)]

Example: Mortgage Rates & Fees
Interest Rate vs. Loan Fee, for mortgages. If the interest rate is lower, does the bank make it up with a higher loan fee?
- Correlation is r = –[value missing]: negative association.
- Linear relationship: straight line with scatter.
- Decreasing relationship: tilts down and to the right.

Various types of relationships
No relationship is observed when
- the scatterplot shows a random scatter of points with no tilt either upward or downward.
- The points could look like a ‘cloud’ of points that is either circular or oval shaped.
- The oval could be either up and down or left and right, but it is not tilted (as you move from left to right).

Example: The Stock Market
Today’s vs. Yesterday’s Percent Change. Is there momentum? If the market was up yesterday, is it more likely to be up today? Or is each day’s performance independent?
- Correlation is r = 0.11. A weak relationship? No relationship?
- Tilt is neither up nor down.

Various types of relationships
A nonlinear relationship is observed when
- the scatterplot shows points bunched around a curve, rather than a straight line.
- Correlation and regression analysis must be used with care on nonlinear data sets.
- For most problems we first transform one or both of the variables to obtain a linear relationship, then we fit a regression.
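The idea of transforming a variable to obtain a linear relationship can be sketched as follows. The slides do not say which transformation to use, so the natural logarithm here is an assumption, and the exponential data and the helper `pearson_r` are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data following an exponential curve, invented for illustration.
x = list(range(10))
y = [2.0 * math.exp(0.5 * v) for v in x]

# Assumed transformation: natural log of y. Other curves may need other transforms.
log_y = [math.log(v) for v in y]

r_raw = pearson_r(x, y)        # curved relationship: r understates the pattern
r_log = pearson_r(x, log_y)    # after the transform the points lie on a line

print(f"r on raw data:  {r_raw:.3f}")
print(f"r after log(y): {r_log:.3f}")
```

After the transform the points fall exactly on a straight line, so the correlation rises to 1 and a straight-line regression of log(y) on x would be appropriate for this made-up data.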

Example: Stock Options
Call Price vs. Strike Price, for stock options. “Call Price” is the price of the option contract to buy stock at the “Strike Price”. The right to buy at a lower strike price has more value.
- A nonlinear relationship: not a straight line, but a curved relationship.
- Correlation is r = –[value missing]. A negative relationship: higher strike price goes with lower call price.

Example: Maximizing Yield
Output Yield vs. Temperature, for an industrial process with a “best” optimal temperature setting.
- A nonlinear relationship: not a straight line, but a curved relationship.
- Correlation is r = –[value missing]. r suggests no relationship, but the relationship is strong. It tilts neither up nor down.
[Figure: scatterplot of Yield of process against Temperature]
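A strong curved relationship with a correlation near zero is easy to reproduce. The sketch below uses invented temperatures and yields (not the data behind the slide) that peak at a hypothetical optimum of 150, and a hypothetical helper `pearson_r`.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical process: yield is an exact quadratic in temperature,
# highest at 150 and falling away symmetrically on both sides.
temperature = list(range(100, 201, 10))
yield_pct = [90.0 - 0.02 * (t - 150) ** 2 for t in temperature]

r = pearson_r(temperature, yield_pct)
best_temp = temperature[yield_pct.index(max(yield_pct))]

print(f"r = {r:.6f}")
print(f"temperature with the highest yield: {best_temp}")
```

Yield is completely determined by temperature in this made-up data, yet r is essentially zero because the rising and falling halves of the curve cancel. This is why the scatterplot should be inspected before the correlation is interpreted.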

Outliers
- A data point is an outlier if it does not fit the relationship of the rest of the data.
- It can distort statistical summaries and make them very misleading.
- Watch out for outliers by looking at the scatterplot, and if you can justify removing an outlier (by finding that it should not have been there), then do so.
- If you have to leave it, be aware of the problems it can cause and consider reporting statistical summaries (e.g. the correlation coefficient) both with and without it.

Example: Cost and Quantity
Cost vs. Number Produced, for a production facility. It usually costs more to produce more.
- An outlier is visible: a disaster (a fire at the factory), with high cost but few produced.
- With the outlier, r = –0.623.
- Outlier removed (more details visible): r = [value missing].
[Figure: two scatterplots of Cost against Number produced, with and without the outlier]
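Reporting the correlation both with and without an outlier can be sketched as below. The production figures are invented for illustration and are not the data behind the slide; the helper `pearson_r` is also hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical (number produced, cost) pairs for ordinary periods.
normal = [(20, 3000), (25, 3300), (30, 3700), (35, 4000),
          (40, 4400), (45, 4700), (50, 5000)]
# Hypothetical outlier: very few units produced at very high cost.
outlier = (5, 10000)


def correlation(points):
    """Correlation between the two coordinates of a list of (x, y) pairs."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return pearson_r(xs, ys)


r_with = correlation(normal + [outlier])
r_without = correlation(normal)

print(f"r with outlier:    {r_with:.3f}")
print(f"r without outlier: {r_without:.3f}")
```

In this made-up data a single unusual point is enough to turn a strong positive correlation into a negative one, so quoting both figures gives a more honest summary than either alone.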

Reading for next lecture Read Chapter 18 Sections (Chapter 11 Sections 11.1 – 11.3 abridged)