Review of Measures of Central Tendency, Dispersion & Association

Presentation transcript:

Review of Measures of Central Tendency, Dispersion & Association
Graphical Excellence
Measures of Central Tendency: Mean, Median, Mode
Measures of Dispersion: Variance, Standard Deviation, Range
Measures of Association: Covariance, Correlation Coefficient
Relationship of basic stats to OLS

Graphical Excellence Learning from Monkeys

Why Graphs & Stats? Used properly, graphs and descriptive statistics summarize many lines of data effectively for the reader. For example, what is a good approximation of the age of students in this class? We use graphs and basic statistics (mean, variance, covariance, etc.) to highlight trends and to motivate the research question. We use other tools for analysis: regression, case studies, content analysis, etc.

What story does this graph tell? What questions does the graph raise?

Graphical Excellence
The graph presents large data sets concisely and coherently: label your axes.
The ideas and concepts to be delivered are clearly understood by the viewer: state the units used (e.g., $ or $ in millions).

What’s the problem here?

Graphical Excellence
The display induces the viewer to address the substance of the data and not the form of the graph: select the appropriate type of graph (bar chart for levels, scatter plot for trends, etc.).
There is no distortion of what the data reveal: make sure the axes are not stretched or compressed to make a point.

Do New Stadiums Bring People in?

Things to be cautious about when observing a graph: Is a scale missing on one axis? Do not be influenced by a graph's caption. Are changes presented in absolute values only, or in percent form too?

Numerical Descriptive Measures
Measures of Central Tendency: Mean, Median, Mode
Measures of Dispersion: Variance, Standard Deviation
Measures of Association: Covariance, Correlation Coefficient

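Arithmetic mean
This is the most popular and useful measure of central location.
Mean = (Sum of the measurements) / (Number of measurements)
Sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$, where $n$ is the sample size.
Population mean: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$, where $N$ is the population size.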

Example 1 The mean of the sample of six measurements 7, 3, 9, -2, 4, 6 is given by $\bar{x} = \frac{7 + 3 + 9 - 2 + 4 + 6}{6} = 4.5$.
Example 2 Suppose the telephone bills of example 2.1 represent the population of measurements. The population mean is $\mu = \frac{42.19 + 15.30 + \dots + 53.21}{N} = 43.59$.
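
As a quick check of Example 1, the sample mean can be computed with Python's standard `statistics` module. This is only an illustrative sketch; the variable names are ours, not the slides'.

```python
from statistics import mean

# Sample of six measurements from Example 1
sample = [7, 3, 9, -2, 4, 6]

# Sum of the measurements divided by the number of measurements: 27 / 6
sample_mean = mean(sample)
print(sample_mean)
```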

Example 3 When many of the measurements have the same value, the measurements can be summarized in a frequency table. Suppose the number of children in a sample of 16 employees was recorded as follows:
NUMBER OF CHILDREN     0   1   2   3
NUMBER OF EMPLOYEES    3   4   7   2   (16 employees in total)
The mean is then $\bar{x} = \frac{3(0) + 4(1) + 7(2) + 2(3)}{16} = 1.5$.
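
The frequency-table calculation in Example 3 can be sketched as a weighted mean, where each value is weighted by the number of employees reporting it. The variable names below are illustrative assumptions.

```python
# Frequency table from Example 3
children = [0, 1, 2, 3]    # number of children
employees = [3, 4, 7, 2]   # number of employees with that many children

n = sum(employees)         # total number of employees in the sample

# Each value contributes (value * frequency) to the sum of measurements
weighted_mean = sum(c * f for c, f in zip(children, employees)) / n
print(n, weighted_mean)
```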

The median
The median of a set of measurements is the value that falls in the middle when the measurements are arranged in order of magnitude.
Example 4 Seven employee salaries were recorded (in 1000s): 28, 60, 26, 32, 30, 26, 29. Find the median salary.
Odd number of observations: first, sort the salaries; then, locate the value in the middle. Sorted: 26, 26, 28, 29, 30, 32, 60. The median is 29.
Suppose one employee's salary of $31,000 was added to the group recorded before. Find the median salary.
Even number of observations: first, sort the salaries; then, locate the values in the middle. Sorted: 26, 26, 28, 29, 30, 31, 32, 60. There are two middle values, 29 and 30, so the median is their average, 29.5.
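
The two cases in Example 4 can be checked with `statistics.median`, which sorts the data and averages the two middle values when the number of observations is even. A minimal sketch with illustrative names:

```python
from statistics import median

# Salaries in $1000s from Example 4
salaries = [28, 60, 26, 32, 30, 26, 29]

# Odd number of observations: the single middle value of the sorted list
median_odd = median(salaries)

# Even number of observations after adding the $31,000 salary:
# the average of the two middle values
median_even = median(salaries + [31])

print(sorted(salaries), median_odd)
print(sorted(salaries + [31]), median_even)
```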

The mode
The mode of a set of measurements is the value that occurs most frequently. A set of data may have one mode (or modal class), or two or more modes. For large data sets, the modal class is much more relevant than a single-value mode.

Example 5 The manager of a men's store observes the waist size (in inches) of trousers sold yesterday: 31, 34, 36, 33, 28, 34, 30, 34, 32, 40. The mode of this data set is 34 in. This information seems valuable (for example, for the design of a new display in the store), much more so than "the mean is 33.2 in." or "the median is 33.5 in.".
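
The three measures of central location for the waist-size data can be compared side by side. This sketch uses the standard `statistics` module; the variable names are our own.

```python
from statistics import mean, median, mode

# Waist sizes (inches) of trousers sold, from Example 5
waists = [31, 34, 36, 33, 28, 34, 30, 34, 32, 40]

waist_mode = mode(waists)      # most frequent value
waist_mean = mean(waists)      # 332 / 10
waist_median = median(waists)  # average of the two middle sorted values

print(waist_mode, waist_mean, waist_median)
```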

Relationship among Mean, Median, and Mode
If a distribution is symmetrical (and has a single peak), the mean, median and mode coincide.
If a distribution is not symmetrical, and is skewed to the left or to the right, the three measures differ.
A positively skewed distribution ("skewed to the right"): typically mode < median < mean.
A negatively skewed distribution ("skewed to the left"): typically mean < median < mode.

Measures of variability (Looking beyond the average) Measures of central location fail to tell the whole story about the distribution. A question of interest still remains unanswered: How typical is the average value of all the measurements in the data set? Or: How spread out are the measurements about the average value?

Observe two hypothetical data sets
Low variability data set: the average value provides a good representation of the values in the data set.
High variability data set: the same average value does not provide as good a representation of the values in the data set as before.

The range
The range of a set of measurements is the difference between the largest and smallest measurements (Range = Largest measurement - Smallest measurement). Its major advantage is the ease with which it can be computed. Its major shortcoming is its failure to provide information on the dispersion of the values between the two end points. How do all the measurements spread out between the smallest and the largest? The range cannot assist in answering this question.

The variance
This measure of dispersion reflects the values of all the measurements.
The variance of a population of N measurements x1, x2, …, xN having a mean $\mu$ is defined as $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$ (Excel uses the VARP formula).
The variance of a sample of n measurements x1, x2, …, xn having a mean $\bar{x}$ is defined as $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$ (Excel uses the VAR formula).

Consider two small populations:
Population A: 8, 9, 10, 11, 12
Population B: 4, 7, 10, 13, 16
The mean of both populations is 10, but the measurements in B are much more dispersed than those in A. Thus, a measure of dispersion is needed that agrees with this observation.
Let us start by calculating the sum of deviations from the mean.
A: (8-10) + (9-10) + (10-10) + (11-10) + (12-10) = -2 - 1 + 0 + 1 + 2 = 0
B: (4-10) + (7-10) + (10-10) + (13-10) + (16-10) = -6 - 3 + 0 + 3 + 6 = 0
The sum of deviations is zero in both cases; therefore, another measure is needed. The sum of squared deviations is used in calculating the variance. See example next.
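
The comparison of populations A and B can be reproduced in a few lines. The sketch below shows that the plain deviations cancel out while the squared deviations do not; `statistics.pvariance` computes the population variance (divide by N). The helper names are illustrative.

```python
from statistics import mean, pvariance

pop_a = [8, 9, 10, 11, 12]
pop_b = [4, 7, 10, 13, 16]

def sum_of_deviations(data):
    """Sum of (x - mean); positive and negative deviations cancel."""
    m = mean(data)
    return sum(x - m for x in data)

def sum_of_squared_deviations(data):
    """Sum of (x - mean)^2; squaring prevents cancellation."""
    m = mean(data)
    return sum((x - m) ** 2 for x in data)

for name, pop in (("A", pop_a), ("B", pop_b)):
    print(name, mean(pop), sum_of_deviations(pop),
          sum_of_squared_deviations(pop), pvariance(pop))
```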

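Let us calculate the variance of the two populations: $\sigma_A^2 = \frac{(8-10)^2 + (9-10)^2 + (10-10)^2 + (11-10)^2 + (12-10)^2}{5} = 2$ and $\sigma_B^2 = \frac{(4-10)^2 + (7-10)^2 + (10-10)^2 + (13-10)^2 + (16-10)^2}{5} = 18$.
Why is the variance defined as the average squared deviation? Why not use the sum of squared deviations as a measure of dispersion instead? After all, the sum of squared deviations increases in magnitude when the dispersion of a data set increases!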

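Which data set has a larger dispersion? Data set A consists of the value 1 five times and the value 3 five times (mean 2). Data set B consists of the values 1 and 5 (mean 3). Data set B is more dispersed around the mean.
Let us calculate the sum of squared deviations for both data sets:
SumA = (1-2)^2 + … + (1-2)^2 [5 times] + (3-2)^2 + … + (3-2)^2 [5 times] = 10
SumB = (1-3)^2 + (5-3)^2 = 8
The sum of squared deviations is larger for A only because A has more observations. However, when calculated on a "per observation" basis (variance), the data set dispersions are properly ranked:
$\sigma_A^2$ = SumA/N = 10/10 = 1
$\sigma_B^2$ = SumB/N = 8/2 = 4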
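
The point of this slide can be sketched in code: the raw sum of squared deviations grows with the number of observations, so it can rank data sets incorrectly, while the per-observation average does not. The sketch assumes, as the "5 times" annotations suggest, that data set A holds five 1s and five 3s and data set B holds the two values 1 and 5.

```python
from statistics import mean

# Assumed contents of the two data sets (see the note above)
data_a = [1] * 5 + [3] * 5
data_b = [1, 5]

def sum_sq_dev(data):
    """Sum of squared deviations from the mean."""
    m = mean(data)
    return sum((x - m) ** 2 for x in data)

sum_a, sum_b = sum_sq_dev(data_a), sum_sq_dev(data_b)

# Dividing by the number of observations gives the population variance
var_a = sum_a / len(data_a)
var_b = sum_b / len(data_b)

print(sum_a, sum_b)   # the raw sums rank A above B
print(var_a, var_b)   # the variances rank B above A
```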

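Example 6 Find the mean and the variance of the following sample of measurements (in years): 3.4, 2.5, 4.1, 1.2, 2.8, 3.7.
Solution
The mean is $\bar{x} = \frac{3.4 + 2.5 + 4.1 + 1.2 + 2.8 + 3.7}{6} = \frac{17.7}{6} = 2.95$ years.
A shortcut formula for the sample variance: $s^2 = \frac{1}{n-1}\left[\sum x_i^2 - \frac{(\sum x_i)^2}{n}\right] = \frac{1}{5}\left[(3.4^2 + 2.5^2 + \dots + 3.7^2) - \frac{(17.7)^2}{6}\right] = 1.075$ (years)$^2$.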
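
The shortcut formula in Example 6 can be checked against the definitional sample variance (`statistics.variance`, which divides by n - 1). A minimal sketch with illustrative names:

```python
from statistics import mean, variance

# Sample of measurements (in years) from Example 6
x = [3.4, 2.5, 4.1, 1.2, 2.8, 3.7]
n = len(x)

x_bar = mean(x)

# Shortcut: [sum of squares - (sum)^2 / n] / (n - 1)
shortcut_var = (sum(v * v for v in x) - sum(x) ** 2 / n) / (n - 1)

# Definitional form: sum of squared deviations / (n - 1)
definition_var = variance(x)

print(x_bar, shortcut_var, definition_var)
```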

The standard deviation of a set of measurements is the square root of the variance of the measurements.
Example 4.9 Rates of return over the past 10 years for two mutual funds are shown below. Which one has a higher level of risk?
Fund A: 8.3, -6.2, 20.9, -2.7, 33.6, 42.9, 24.4, 5.2, 3.1, 30.05
Fund B: 12.1, -2.8, 6.4, 12.2, 27.8, 25.3, 18.2, 10.7, -1.3, 11.4

Solution Let us use the Excel printout that is run from the "Descriptive statistics" sub-menu (use file Xm04-10). Fund A should be considered riskier because its standard deviation is larger.
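
For readers without the Excel file, the same comparison can be sketched in Python. The code below treats the ten returns of each fund as a sample and uses `statistics.stdev` (the square root of the sample variance); that choice is an assumption on our part, not something the slides specify.

```python
from statistics import mean, stdev

fund_a = [8.3, -6.2, 20.9, -2.7, 33.6, 42.9, 24.4, 5.2, 3.1, 30.05]
fund_b = [12.1, -2.8, 6.4, 12.2, 27.8, 25.3, 18.2, 10.7, -1.3, 11.4]

# Sample standard deviation of each fund's returns
sd_a = stdev(fund_a)
sd_b = stdev(fund_b)

print(f"Fund A: mean={mean(fund_a):.2f}, sd={sd_a:.2f}")
print(f"Fund B: mean={mean(fund_b):.2f}, sd={sd_b:.2f}")

# The fund with the larger standard deviation is considered riskier
riskier = "A" if sd_a > sd_b else "B"
print("Riskier fund:", riskier)
```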

The coefficient of variation
The coefficient of variation of a set of measurements is the standard deviation divided by the mean value: $CV = \sigma / \mu$ for a population and $cv = s / \bar{x}$ for a sample. This coefficient provides a proportionate measure of variation. A standard deviation of 10 may be perceived as large when the mean value is 100, but only moderately large when the mean value is 500.
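
The example with a standard deviation of 10 can be made concrete with a small helper. The function name is hypothetical and introduced only for illustration.

```python
def coefficient_of_variation(std_dev, mean_value):
    """Standard deviation expressed as a proportion of the mean."""
    return std_dev / mean_value

# Same standard deviation, different means
cv_mean_100 = coefficient_of_variation(10, 100)
cv_mean_500 = coefficient_of_variation(10, 500)

# Relative to the mean, the variation is five times larger in the first case
print(cv_mean_100, cv_mean_500)
```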

Interpreting Standard Deviation The standard deviation can be used to compare the variability of several distributions and to make a statement about the general shape of a distribution.

Measures of Association
Two numerical measures are presented for describing the linear relationship between two variables depicted in a scatter diagram.
Covariance: is there any pattern to the way two variables move together?
Correlation coefficient: how strong is the linear relationship between two variables?

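The covariance
Population covariance: $COV(X,Y) = \frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{N}$ (Excel uses this formula to calculate Cov).
Sample covariance: $cov(X,Y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$.
$\mu_x$ ($\mu_y$) is the population mean of the variable X (Y). N is the population size. n is the sample size.
NOTE: The formula in Excel does not give you the sample covariance.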

If the two variables move in the same direction (both increase or both decrease), the covariance is a large positive number. If the two variables move in opposite directions (one increases when the other decreases), the covariance is a large negative number. If the two variables are unrelated, the covariance will be close to zero.
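
The sign behaviour of the covariance, and the difference between dividing by N (population) and by n - 1 (sample), can be sketched as follows. The data are hypothetical values made up for illustration; they are not the data of any example in these slides.

```python
from statistics import mean

def covariance(x, y, sample=True):
    """Average product of deviations; divide by n-1 (sample) or N (population)."""
    mx, my = mean(x), mean(y)
    total = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return total / (len(x) - 1 if sample else len(x))

# Hypothetical data: y_up tends to rise with x, y_down tends to fall
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 5, 4, 5]
y_down = [5, 4, 5, 4, 2]

cov_pop = covariance(x, y_up, sample=False)    # divides by N
cov_sample = covariance(x, y_up, sample=True)  # divides by n - 1
cov_negative = covariance(x, y_down)

print(cov_pop, cov_sample, cov_negative)
```

Because the population form divides by N and the sample form by n - 1, a population covariance can be converted to the sample version by multiplying by N / (N - 1).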

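The coefficient of correlation
This coefficient answers the question: How strong is the association between X and Y?
Population coefficient of correlation: $\rho = \frac{COV(X,Y)}{\sigma_x \sigma_y}$. Sample coefficient of correlation: $r = \frac{cov(X,Y)}{s_x s_y}$.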

$\rho$ or r = +1: strong positive linear relationship, COV(X,Y) > 0
$\rho$ or r = 0: no linear relationship, COV(X,Y) = 0
$\rho$ or r = -1: strong negative linear relationship, COV(X,Y) < 0

If the two variables are very strongly positively related, the coefficient value is close to +1 (strong positive linear relationship). If the two variables are very strongly negatively related, the coefficient value is close to -1 (strong negative linear relationship). No straight line relationship is indicated by a coefficient close to zero.
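
The coefficient of correlation is the covariance scaled by the two standard deviations, which is what confines it to the interval from -1 to +1. A minimal sketch on hypothetical data (not the advertising data of the example that follows):

```python
from math import sqrt
from statistics import mean

def correlation(x, y):
    """Sample coefficient of correlation: cov(X, Y) / (s_x * s_y)."""
    mx, my = mean(x), mean(y)
    n = len(x)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    s_x = sqrt(sum((a - mx) ** 2 for a in x) / (n - 1))
    s_y = sqrt(sum((b - my) ** 2 for b in y) / (n - 1))
    return cov / (s_x * s_y)

# Hypothetical data
x = [1, 2, 3, 4, 5]
y_noisy = [2, 4, 5, 4, 5]           # positive but imperfect relationship
y_line_up = [2 * v + 1 for v in x]  # exact increasing straight line
y_line_down = [10 - 3 * v for v in x]  # exact decreasing straight line

r_noisy = correlation(x, y_noisy)
r_up = correlation(x, y_line_up)
r_down = correlation(x, y_line_down)

print(round(r_noisy, 3), round(r_up, 3), round(r_down, 3))
```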

Example 7 Compute the covariance and the coefficient of correlation to measure how advertising expenditure and sales level are related to one another. Base your calculation on the data provided in example 2.3

Use the procedure below to obtain the required summations: tabulate the columns x, y, xy, x^2 and y^2, and total each column. Similarly, $s_y$ = 8.839.

Excel printout (covariance matrix and correlation matrix)
Interpretation
The covariance (10.2679) indicates that advertisement expenditure and sales level are positively related.
The coefficient of correlation (.797) indicates that there is a strong positive linear relationship between advertisement expenditure and sales level.

The Least Squares Method
We are seeking a line that best fits the data. We define the "best fit line" as the line for which the sum of squared differences between it and the data points is minimized: minimize $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$, where $y_i$ is the actual y value of point i and $\hat{y}_i$ is the y value of point i calculated from the equation of the line.

Different lines generate different errors, and thus different sums of squared errors.

The coefficients b0 and b1 of the line $\hat{y} = b_0 + b_1 x$ that minimizes the sum of squared errors are calculated from the data: $b_1 = \frac{cov(X,Y)}{s_x^2}$ and $b_0 = \bar{y} - b_1 \bar{x}$.
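
The link between the basic statistics and OLS can be sketched directly: for a simple regression line, the slope is the sample covariance divided by the sample variance of x, and the intercept makes the line pass through the point of means. The data below are hypothetical, and the function names are ours.

```python
from statistics import mean

def least_squares_line(x, y):
    """Return (b0, b1) using b1 = cov(X, Y) / var(X) and b0 = y_bar - b1 * x_bar."""
    mx, my = mean(x), mean(y)
    n = len(x)
    cov_xy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    var_x = sum((a - mx) ** 2 for a in x) / (n - 1)
    b1 = cov_xy / var_x
    b0 = my - b1 * mx
    return b0, b1

def sum_squared_errors(x, y, b0, b1):
    """Sum of squared differences between actual y and the line's y."""
    return sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = least_squares_line(x, y)
sse_best = sum_squared_errors(x, y, b0, b1)

# Any other line, for instance one with a slightly different slope,
# gives a larger sum of squared errors
sse_other = sum_squared_errors(x, y, b0, b1 + 0.1)

print(b0, b1, sse_best, sse_other)
```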