
Business Statistics - QBM117 Scatter diagrams and measures of association.


1 Business Statistics - QBM117 Scatter diagrams and measures of association

2 Objectives
• To introduce briefly the topic of regression and correlation.
• To explore relationships between two variables using the graphical technique of scatter diagrams.
• To introduce two measures of association which can be used to measure the amount of association between two variables.

3 Regression and correlation: measuring and predicting relationships
• Regression and correlation show us how to summarise the relationship between two factors, based on a bivariate (two-variable) set of data.
• Correlation is a measure of the strength of the relationship between the two variables.
• Regression helps us to predict one variable from the other.
In earlier modules we learnt to look at data, compute and interpret probabilities, draw random samples and perform statistical inference. Now we apply these concepts to explore relationships between several variables.
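Slide 3 says regression helps us predict one variable from the other. As a rough preview, and assuming the usual ordinary least-squares method (the slides have not yet said how the line is fitted), the sketch below fits a straight line ŷ = b0 + b1·x to a small data set and uses it to make a prediction. The function name `least_squares_line` and the advertising and sales figures are hypothetical, made up for illustration.

```python
def least_squares_line(x, y):
    """Fit y-hat = b0 + b1 * x by ordinary least squares; return (b0, b1)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length and at least 2 points")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: sum of cross-deviations divided by sum of squared x-deviations.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    # The least-squares line always passes through (x_bar, y_bar).
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical data (made up): advertising spend and sales, both in $'000.
advertising = [1, 2, 3, 4, 5]
sales = [12, 15, 19, 20, 24]

b0, b1 = least_squares_line(advertising, sales)
predicted_at_6 = b0 + b1 * 6
print(f"sales-hat = {b0:.2f} + {b1:.2f} * advertising")
print(f"predicted sales when advertising = 6: {predicted_at_6:.2f}")
```

For these made-up numbers the fitted line is ŷ = 9.3 + 2.9x, so an advertising spend of 6 predicts sales of about 26.7.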

4 In our earlier studies we learnt to summarise univariate (single-variable) data using statistical summaries such as the mean, to describe the centre, and the standard deviation, to describe the variability. With bivariate data we could use these same statistics to summarise each variable separately; however, the payoff comes from studying the two variables together to explore the relationship between them.
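To see why studying the two variables together pays off, the sketch below keeps the same six y values but pairs them with x in two different orders. The mean and standard deviation of y, computed one variable at a time, cannot tell the two data sets apart, but the correlation, which depends on how the values are paired, changes completely. The data are made up, and `sample_correlation` is a hypothetical helper using the usual formula for the sample correlation coefficient.

```python
import math
import statistics


def sample_correlation(x, y):
    """Sample correlation r = S_xy / sqrt(S_xx * S_yy)."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


x = [1, 2, 3, 4, 5, 6]
y_paired = [2, 4, 6, 8, 10, 12]     # y rises steadily with x
y_reordered = [8, 2, 12, 4, 10, 6]  # the same six y values, paired differently

for label, y in [("paired", y_paired), ("reordered", y_reordered)]:
    # Univariate summaries of y are identical for both orderings...
    print(label, "mean:", statistics.mean(y), "sd:", round(statistics.stdev(y), 3))
    # ...but the bivariate summary (correlation with x) is not.
    print(label, "r:", round(sample_correlation(x, y), 3))

r_paired = sample_correlation(x, y_paired)
r_reordered = sample_correlation(x, y_reordered)
```

Here r falls from 1 to about 0.09, even though the mean of y (7) and its standard deviation are unchanged.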

5 Exploring relationships using scatterplots
Economists and business operators are often interested in relationships between two quantitative variables. For example:
• How does advertising affect sales in my business?
• If I increase the price of this product, what effect will this have on demand?
• What effect are inflation rates having on unemployment rates, the price of petrol, the price of new homes, etc.?

6 Exploring relationships using scatterplots and correlations
Scatterplots provide useful insights into the structure of the data, answering questions such as:
• Is the relationship between the two variables linear or nonlinear?
• Are there any outliers in the data?
• What is the strength of the relationship between the two variables?

7 Correlation is a summary measure of the strength of the relationship. It is both helpful and limited.
• If the scatterplot shows either a well-behaved linear relationship or no relationship at all, then the correlation provides an excellent summary of the relationship.
• If, however, there are problems with the data, such as a nonlinear relationship or outliers, the correlation can be misleading.
Therefore correlation on its own is of limited use, as its interpretation depends on the type of relationship in the data.

8 The Scatterplot
• is simply a plot of all the data, with each (x, y) pair shown as one point.
• If one variable is seen as causing, affecting or influencing the other, then it is plotted on the x (horizontal) axis. This variable is referred to as the independent variable. The variable that is affected or influenced by the other is plotted on the y (vertical) axis. This variable is referred to as the dependent variable.
• If neither variable causes, affects or influences the other, it does not matter which one is plotted on which axis.

9 Correlation measures the strength of the relationship between the two variables
• Correlation, denoted ρ (rho) for a population and r for a sample, varies from –1 to +1, summarising the strength of the relationship in the data.
• A correlation of 1 indicates a perfect straight-line relationship, with higher values of one variable associated with perfectly predictable higher values of the other variable.
• A correlation of –1 indicates a perfect inverse straight-line relationship, with one variable decreasing as the other increases.
• For correlations between –1 and 1, the size of the correlation indicates the strength of the relationship, while the sign (+ or –) indicates the direction (increasing or decreasing).
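The slides do not give a formula, but the usual definition of the sample correlation is r = s_xy / (s_x · s_y): the sample covariance of x and y divided by the product of their sample standard deviations. A minimal sketch of that calculation follows; the function name `sample_correlation` and the five data pairs are hypothetical, made up for illustration.

```python
import math


def sample_correlation(x, y):
    """Sample correlation coefficient r = s_xy / (s_x * s_y).

    The (n - 1) divisors in the sample covariance and in the two sample
    standard deviations cancel, so only the sums of deviations are needed.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be equal-length with at least 2 values")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical paired observations (made up for illustration).
x = [2, 4, 5, 7, 9]
y = [3, 5, 4, 8, 9]
r = sample_correlation(x, y)
print(f"r = {r:.3f}")
```

For these numbers r is about 0.94: a strong, positive (increasing) relationship, since r is close to +1.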

10
• A correlation of 0 generally indicates no (linear) relationship, just randomness.
• Correlations must be interpreted with caution, as nonlinear structures and outliers can distort the usual interpretation.
• Correlation measures how close the data points are to lying exactly on a tilted straight line. It has nothing to do with the steepness (slope) of the line.
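To illustrate that correlation ignores steepness, the sketch below computes r for points lying exactly on three lines with very different slopes. Every increasing line gives r = 1 and the decreasing one gives r = –1; only the direction of the tilt matters. The lines and the helper name `sample_correlation` are hypothetical choices for illustration.

```python
import math


def sample_correlation(x, y):
    """Sample correlation r = S_xy / sqrt(S_xx * S_yy); undefined if x or y is constant."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when one variable does not vary")
    return s_xy / math.sqrt(s_xx * s_yy)


x = [1, 2, 3, 4, 5]
lines = {
    "gentle, slope 0.5": [0.5 * v + 1 for v in x],
    "steep, slope 20": [20 * v - 3 for v in x],
    "falling, slope -4": [-4 * v + 50 for v in x],
}
results = {name: sample_correlation(x, y) for name, y in lines.items()}
for name, r in results.items():
    print(f"{name}: r = {r:.3f}")
```

A perfectly flat line is the exception: y never varies, so r is undefined (the helper raises an error) rather than 0 or 1.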

11 Interpreting Correlation
• r = 1: a perfect straight line tilting up to the right
• r = 0: no overall tilt. No relationship?
• r = –1: a perfect straight line tilting down to the right
[Figure: six example scatterplots of Y against X]

12 Various types of relationships
A linear relationship is observed when
• the scatterplot shows points bunched randomly around a straight line.
• The points could be tightly bunched, falling almost exactly on a line, or, more likely, well scattered, forming a 'cloud' of points.
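The sketch below simulates points around the same straight line (y = 2x) with a little and with a lot of random scatter. The underlying line is identical, but the well-scattered cloud gives a noticeably smaller r than the tightly bunched points. The seed, noise levels and helper name `sample_correlation` are arbitrary, hypothetical choices for illustration.

```python
import math
import random


def sample_correlation(x, y):
    """Sample correlation r = S_xy / sqrt(S_xx * S_yy); the n - 1 divisors cancel."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


rng = random.Random(117)  # fixed seed so the example is repeatable
x = list(range(50))
tight = [2 * v + rng.gauss(0, 1) for v in x]       # points almost on the line
scattered = [2 * v + rng.gauss(0, 20) for v in x]  # a wide 'cloud' around the same line

r_tight = sample_correlation(x, tight)
r_scattered = sample_correlation(x, scattered)
print(f"tightly bunched: r = {r_tight:.3f}")
print(f"well scattered:  r = {r_scattered:.3f}")
```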

13 Example: Exploring TV Ratings
• People Meters vs. Nielsen Index: two measures of the market share of 10 TV shows
• Correlation is r = 0.974: very strong positive association (since r is close to 1)
• Linear relationship: a straight line with scatter
• Increasing relationship: tilts up and to the right
[Figure: scatterplot of People Meters against Nielsen Index]

14 Example: Merger Deals
• Dollars vs. Deals for mergers and acquisitions by investment bankers
• Goldman Sachs: 134 deals worth $63 billion
• Correlation is r = 0.790: strong positive association
• Linear relationship: a straight line with scatter
• Increasing relationship: tilts up and to the right
[Figure: scatterplot of Dollars (billions) against Deals]

15 Example: Mortgage Rates & Fees
• Interest Rate vs. Loan Fee for mortgages
• If the interest rate is lower, does the bank make it up with a higher loan fee?
• Correlation is r = –0.654: negative association
• Linear relationship: a straight line with scatter
• Decreasing relationship: tilts down and to the right

16 Various types of relationships
No relationship is observed when
• the scatterplot shows a random scatter of points with no tilt either upward or downward.
• The points could look like a 'cloud' that is either circular or oval.
• The oval could run up and down or left and right, but it is not tilted (as you move from left to right).

17 Example: The Stock Market
• Today's vs. Yesterday's Percent Change
• Is there momentum? If the market was up yesterday, is it more likely to be up today? Or is each day's performance independent?
• Correlation is r = 0.11: a weak relationship? No relationship?
• Tilt is neither up nor down
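The momentum question pairs each day's percent change with the previous day's. A minimal sketch of how those (yesterday, today) pairs can be formed from a single daily series and then correlated; the percent changes below are made up, not the data behind r = 0.11, and `lag_pairs` and `sample_correlation` are hypothetical helpers.

```python
import math


def sample_correlation(x, y):
    """Sample correlation r = S_xy / sqrt(S_xx * S_yy); the n - 1 divisors cancel."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def lag_pairs(series):
    """Return (yesterday, today): each day's value paired with the one before it."""
    return series[:-1], series[1:]


# Hypothetical daily percent changes in a share-market index (made up).
daily_change = [0.4, -0.2, 0.1, 0.7, -0.5, 0.3, -0.1, 0.2, -0.6, 0.5]

yesterday, today = lag_pairs(daily_change)
r = sample_correlation(yesterday, today)
print(f"{len(today)} pairs, r(yesterday, today) = {r:.3f}")
```

A series of n days yields n – 1 pairs, since the first day has no "yesterday".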

18 Various types of relationships
A nonlinear relationship is observed when
• the scatterplot shows points bunched around a curve rather than a straight line.
• Correlation and regression analysis must be used with care on nonlinear data sets.
• For most such problems we first transform one or both of the variables to obtain a linear relationship, and then fit a regression.
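As one example of such a transformation (the slides do not say which one to use), data that grow exponentially curve upward on a scatterplot, but taking the logarithm of y straightens them out. The data and the helper name `sample_correlation` are made up for illustration.

```python
import math


def sample_correlation(x, y):
    """Sample correlation r = S_xy / sqrt(S_xx * S_yy); the n - 1 divisors cancel."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


x = list(range(1, 11))
y = [3 * 2 ** v for v in x]       # exponential growth: a curved scatterplot
log_y = [math.log(v) for v in y]  # log(y) = log(3) + x * log(2): a straight line

r_raw = sample_correlation(x, y)
r_log = sample_correlation(x, log_y)
print(f"r using y:      {r_raw:.3f}")
print(f"r using log(y): {r_log:.3f}")
```

Here r is about 0.80 on the raw scale and 1 after the transformation, so a straight-line regression of log(y) on x would suit this curved pattern.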

19 Example: Stock Options
• Call Price vs. Strike Price for stock options
• "Call Price" is the price of the option contract to buy stock at the "Strike Price". The right to buy at a lower strike price has more value.
• A nonlinear relationship: not a straight line, but a curved relationship
• Correlation is r = –0.895: a negative relationship, in which a higher strike price goes with a lower call price

20 Example: Maximizing Yield
• Output Yield vs. Temperature for an industrial process, with a "best" (optimal) temperature setting
• A nonlinear relationship: not a straight line, but a curved relationship
• Correlation is r = –0.0155: r suggests no relationship, but the relationship is strong
• It tilts neither up nor down
[Figure: scatterplot of Yield of process against Temperature]
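The sketch below shows how r can be essentially 0 for a strong curved relationship. In a made-up process whose yield peaks at 140 degrees, sampled evenly on either side of the peak, the rising half and the falling half cancel, so r = 0 (up to rounding) even though yield is completely determined by temperature. The curve and the helper name `sample_correlation` are hypothetical, not the slide's data.

```python
import math


def sample_correlation(x, y):
    """Sample correlation r = S_xy / sqrt(S_xx * S_yy); the n - 1 divisors cancel."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


temperature = list(range(120, 161, 5))  # 120, 125, ..., 160
# Hypothetical yield curve: highest at 140, falling away symmetrically.
process_yield = [900 - 0.5 * (t - 140) ** 2 for t in temperature]

r = sample_correlation(temperature, process_yield)
print(f"r = {r:.4f}")
```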

21 Outliers
• A data point is an outlier if it does not fit the relationship of the rest of the data.
• It can distort statistical summaries and make them very misleading.
• Watch out for outliers by looking at the scatterplot. If you can justify removing an outlier (by finding that it should not have been there), then do so.
• If you have to leave it in, be aware of the problems it can cause and consider reporting statistical summaries (e.g. the correlation coefficient) both with and without it.
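Following the advice to report summaries both with and without an outlier, the sketch below computes r twice on a small made-up cost data set (not the data from the slides): costs that rise with output, plus one point with very high cost and very low output. The helper name `sample_correlation` is hypothetical.

```python
import math


def sample_correlation(x, y):
    """Sample correlation r = S_xy / sqrt(S_xx * S_yy); the n - 1 divisors cancel."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical data (made up): units produced and cost in dollars.
produced = [20, 25, 30, 35, 40, 45, 50]
cost = [3000, 3300, 3500, 3900, 4100, 4500, 4700]

# One suspect observation: very high cost, very little output.
outlier = (5, 10000)

r_without = sample_correlation(produced, cost)
r_with = sample_correlation(produced + [outlier[0]], cost + [outlier[1]])
print(f"without outlier: r = {r_without:.3f}")
print(f"with outlier:    r = {r_with:.3f}")
```

In this made-up data the single outlier flips r from strongly positive to negative, the same kind of reversal seen in the cost and quantity example.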

22 Example: Cost and Quantity
• Cost vs. Number Produced for a production facility
• It usually costs more to produce more
• An outlier is visible: a disaster (a fire at the factory), with high cost but few produced
• With the outlier: r = –0.623
• Outlier removed: more detail is visible, r = 0.869
[Figure: two scatterplots of Cost against Number produced, with and without the outlier]

23 Reading for next lecture
Read Chapter 18, Sections 18.1–18.3 (Chapter 11, Sections 11.1–11.3, abridged)

