BUS173: Applied Statistics

Slides:



Advertisements
Similar presentations
Chapter 3, Numerical Descriptive Measures
Advertisements

Measures of Dispersion
Statistics for Managers using Microsoft Excel 6th Edition
Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chap 3-1 Business Statistics: A Decision-Making Approach 7 th Edition Chapter.
Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chap 3-1 Business Statistics: A Decision-Making Approach 7 th Edition Chapter.
Basic Business Statistics (10th Edition)
© 2002 Prentice-Hall, Inc.Chap 3-1 Basic Business Statistics (8 th Edition) Chapter 3 Numerical Descriptive Measures.
Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson2-1 Lesson 2: Descriptive Statistics.
Chap 3-1 EF 507 QUANTITATIVE METHODS FOR ECONOMICS AND FINANCE FALL 2008 Chapter 3 Describing Data: Numerical.
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall Ch. 2-1 Statistics for Business and Economics 7 th Edition Chapter 2 Describing Data:
Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chap 3-1 Business Statistics: A Decision-Making Approach 7 th Edition Chapter.
Statistics for Managers Using Microsoft Excel, 5e © 2008 Pearson Prentice-Hall, Inc.Chap 3-1 Statistics for Managers Using Microsoft® Excel 5th Edition.
Statistics for Managers Using Microsoft® Excel 4th Edition
1 Business 260: Managerial Decision Analysis Professor David Mease Lecture 1 Agenda: 1) Course web page 2) Greensheet 3) Numerical Descriptive Measures.
1 Pertemuan 02 Ukuran Numerik Deskriptif Matakuliah: I0262-Statistik Probabilitas Tahun: 2007.
Basic Business Statistics 10th Edition
Business Statistics: A Decision-Making Approach, 6e © 2005 Prentice-Hall, Inc. Chap 3-1 Introduction to Statistics Chapter 3 Using Statistics to summarize.
Statistics for Managers using Microsoft Excel 6th Global Edition
Business and Economics 7th Edition
© 2003 Prentice-Hall, Inc.Chap 3-1 Business Statistics: A First Course (3 rd Edition) Chapter 3 Numerical Descriptive Measures.
Fall 2006 – Fundamentals of Business Statistics 1 Chapter 3 Describing Data Using Numerical Measures.
Chap 3-1 Statistics for Business and Economics, 6e © 2007 Pearson Education, Inc. Chapter 3 Describing Data: Numerical Statistics for Business and Economics.
Describing Data: Numerical
Numerical Descriptive Measures
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 3-1 Chapter 3 Numerical Descriptive Measures Statistics for Managers.
Numerical Descriptive Techniques
Chapter 3 – Descriptive Statistics
Modified by ARQ, from © 2002 Prentice-Hall.Chap 3-1 Numerical Descriptive Measures Chapter %20ppts/c3.ppt.
Chap 2-1 Copyright ©2013 Pearson Education, Inc. publishing as Prentice Hall Measures of Central Tendency Measures of Variability Covariance and Correlation.
STAT 280: Elementary Applied Statistics Describing Data Using Numerical Measures.
Business Statistics: A Decision-Making Approach, 6e © 2005 Prentice-Hall, Inc. Chap 3-1 Business Statistics: A Decision-Making Approach 6 th Edition Chapter.
Lecture 3 Describing Data Using Numerical Measures.
Variation This presentation should be read by students at home to be able to solve problems.
Chap 3-1 A Course In Business Statistics, 4th © 2006 Prentice-Hall, Inc. A Course In Business Statistics 4 th Edition Chapter 3 Describing Data Using Numerical.
Business Statistics, A First Course (4e) © 2006 Prentice-Hall, Inc. Chap 3-1 Chapter 3 Numerical Descriptive Measures Business Statistics, A First Course.
Business Statistics Spring 2005 Summarizing and Describing Numerical Data.
Basic Business Statistics Chapter 3: Numerical Descriptive Measures Assoc. Prof. Dr. Mustafa Yüzükırmızı.
Basic Business Statistics, 11e © 2009 Prentice-Hall, Inc. Chap 3-1 Chapter 3 Numerical Descriptive Measures (Summary Measures) Basic Business Statistics.
Describing Data Descriptive Statistics: Central Tendency and Variation.
Basic Business Statistics, 11e © 2009 Prentice-Hall, Inc. Chap 3-1 Chapter 3 Numerical Descriptive Measures Basic Business Statistics 11 th Edition.
Statistical Methods © 2004 Prentice-Hall, Inc. Week 3-1 Week 3 Numerical Descriptive Measures Statistical Methods.
Describing Data: Summary Measures. Identifying the Scale of Measurement Before you analyze the data, identify the measurement scale for each variable.
Chapter 2 Describing Data: Numerical
Yandell – Econ 216 Chap 3-1 Chapter 3 Numerical Descriptive Measures.
Descriptive Statistics ( )
Measures of Dispersion
Statistics for Managers Using Microsoft® Excel 5th Edition
Business and Economics 6th Edition
Descriptive Statistics
Chapter 3 Describing Data Using Numerical Measures
Numerical Descriptive Measures
Descriptive Statistics: Numerical Methods
Ch 4 實習.
Descriptive Statistics
Chapter 3 Describing Data Using Numerical Measures
Numerical Descriptive Measures
Business and Economic Forecasting
STA 291 Spring 2008 Lecture 5 Dustin Lueker.
STA 291 Spring 2008 Lecture 5 Dustin Lueker.
Numerical Descriptive Measures
Quartile Measures DCOVA
Numerical Descriptive Measures
Numerical Descriptive Measures
Numerical Descriptive Measures
MBA 510 Lecture 2 Spring 2013 Dr. Tonya Balan 4/20/2019.
Measures of Central Tendency Measures of Variability
Business and Economics 7th Edition
Numerical Descriptive Measures
Presentation transcript:

BUS173: Applied Statistics Instructor: Nahid Farnaz (Nhn) North South University BUS173: Applied Statistics Chapter 3 Describing Data: Numerical

Measures of Central Tendency Overview Central Tendency Mean In this chapter, we describe data numerically with measures of central tendency measures of variability measures of the direction and strength of relationship between two variables. Median Mode Arithmetic average Midpoint of ranked values Most frequently observed value

Arithmetic Mean The arithmetic mean (mean) is the most common measure of central tendency For a population of N values: For a sample of size n: Population values Population size Observed values Sample size

Arithmetic Mean The most common measure of central tendency (continued) The most common measure of central tendency Mean = sum of values divided by the number of values Affected by extreme values (outliers) 0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10 Mean = 3 Mean = 4

Median In an ordered list, the median is the “middle” number (50% above, 50% below) Not affected by extreme values 0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10 Median = 3 Median = 3

Finding the Median The location of the median: If the number of values is odd, the median is the middle number If the number of values is even, the median is the average of the two middle numbers Note that is not the value of the median, only the position of the median in the ranked data

Mode A measure of central tendency Value that occurs most often Not affected by extreme values Used for either numerical or categorical data There may may be no mode There may be several modes 0 1 2 3 4 5 6 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 No Mode Mode = 9

Which measure of location is the “best”? Mean is generally used, unless extreme values (outliers) exist Then median is often used, since the median is not sensitive to extreme values. Example: Median home prices may be reported for a region – less sensitive to outliers

Shape of a Distribution Describes how data are distributed Measures of shape Symmetric or skewed Left-Skewed Symmetric Right-Skewed Mean < Median Mean = Median Median < Mean

Exercise 3.2 A department store manager is interested in the number of complaints received by the customer service dept. about the quality of electrical products sold. Records over a 5 week period show the following number of complaints for each week: 13, 15, 8, 16, 8 Compute the mean number of weekly complaints Compute the median Find the mode Ans: a.12; b.13; c.8 Statistics for Business and Economics, 6e © 2007 Pearson Education, Inc.

Ex. 3.6 The demand for bottled water increases during the hurricane season in Florida. A random sample of 7 hours showed that the following numbers of 1 gallon bottles were sold in one store: 40, 55, 62, 43, 50, 60, 65. Describe the central tendency of the data Comment on symmetry or skewness Ans: Mean=53.57; no unique mode; median=55; left skewed Statistics for Business and Economics, 6e © 2007 Pearson Education, Inc.

Measures of Variability Variation Range Interquartile Range Variance Standard Deviation Coefficient of Variation Measures of variation give information on the spread or variability of the data values. Same center, different variation

Range = Xlargest – Xsmallest Simplest measure of variation Difference between the largest and the smallest observations: Range = Xlargest – Xsmallest Example: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 Range = 14 - 1 = 13

Disadvantages of the Range Ignores the way in which data are distributed Sensitive to outliers 7 8 9 10 11 12 7 8 9 10 11 12 Range = 12 - 7 = 5 Range = 12 - 7 = 5 1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5 Range = 5 - 1 = 4 1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120 Range = 120 - 1 = 119

Interquartile Range Can eliminate some outlier problems by using the interquartile range Eliminate high- and low-valued observations and calculate the range of the middle 50% of the data Interquartile range = 3rd quartile – 1st quartile IQR = Q3 – Q1

Quartile Formulas Find a quartile by determining the value in the appropriate position in the ranked data, where First quartile position: Q1 = 0.25(n+1) Second quartile position: Q2 = 0.50(n+1) (the median position) Third quartile position: Q3 = 0.75(n+1) where n is the number of observed values

Minimum < Q1 < Median Q2 < Q3 < Maximum Interquartile Range Example: Median (Q2) X X Q1 Q3 maximum minimum 25% 25% 25% 25% 12 30 45 57 70 Interquartile range = 57 – 30 = 27 Five number summary: Minimum < Q1 < Median Q2 < Q3 < Maximum

Quartiles Example: Find the first quartile (n = 9) Sample Ranked Data: 11 12 13 16 16 17 18 21 22 (n = 9) Q1 = is in the 0.25(9+1) = 2.5 position of the ranked data so use the value half way between the 2nd and 3rd values, so Q1 = 12.5

Example 3.4: Waiting Times at Gilotti’s Grocery (five number summary) A stem and leaf display for a sample of 25 waiting times (in seconds) is given below. Compute the 5 number summary. Calculate and interpret IQR Minutes N = 25 Stem Unit =10.0; Leaf Unit = 1.0 1 2 4 6 7 8 9 3

Population Variance Average of squared deviations of values from the mean Population variance: Where = population mean N = population size xi = ith value of the variable x

Sample Variance Average (approximately) of squared deviations of values from the mean Sample variance: Where = sample mean n = sample size Xi = ith value of the variable X

Population Standard Deviation Most commonly used measure of variation Shows variation about the mean Has the same units as the original data Population standard deviation:

Sample Standard Deviation Most commonly used measure of variation Shows variation about the mean Has the same units as the original data Sample standard deviation:

Example: Sample Standard Deviation Sample Data (xi) : 10 12 14 15 17 18 18 24 n = 8 Mean = x = 16 A measure of the “average” scatter around the mean

Example 3.5 A professor teaches two large sections of marketing and randomly selects a sample of test scores from both sections. Find the range and standard deviation from each sample: Section 1 50 60 70 80 90 Section 2 72 68 74 66 Statistics for Business and Economics, 6e © 2007 Pearson Education, Inc.

Chebyshev’s Theorem This theorem established data intervals for any data set, regardless of the shape and distribution For any population with mean μ and standard deviation σ , and k > 1 , the percentage of observations that fall within the interval [μ kσ] Is at least Where k is the number of standard deviations.

Chebyshev’s Theorem (continued) Regardless of how the data are distributed, at least (1 - 1/k2) of the values will fall within k standard deviations of the mean (for k > 1) Examples: (1 - 1/12) = 0% ……..... k=1 (μ ± 1σ) (1 - 1/22) = 75% …........ k=2 (μ ± 2σ) (1 - 1/32) = 89% ………. k=3 (μ ± 3σ) At least within

The Empirical Rule If the data distribution is bell-shaped, then the interval: contains about 68% of the values in the population or the sample 68%

The Empirical Rule contains about 95% of the values in the population or the sample contains about 99.7% of the values in the population or the sample 95% 99.7%

Exercise 3.18 Use Chebychev’s theorem to approximate each of the following observations if the mean is 250 and S.D. is 20. Approximately what proportion of the observations is Between 190 and 310? Between 210 and 290? Between 230 and 270? Ans: a) 88.9% k=3; b) 75% k=2; c) 0% k=1

Coefficient of Variation Measures relative variation Always in percentage (%) Shows standard deviation relative to mean Can be used to compare two or more sets of data measured in different units

Comparing Coefficient of Variation Stock A: Average price last year = $50 Standard deviation = $5 Stock B: Average price last year = $100 Both stocks have the same standard deviation, but stock B is less variable relative to its price

The Sample Covariance The covariance measures the strength of the linear relationship between two variables The population covariance: The sample covariance: Only concerned with the strength of the relationship No causal effect is implied

Interpreting Covariance Covariance between two variables: Cov(x,y) > 0 x and y tend to move in the same direction Cov(x,y) < 0 x and y tend to move in opposite directions Cov(x,y) = 0 x and y are independent

Correlation Coefficient Measures the relative strength of the linear relationship between two variables Population correlation coefficient: Sample correlation coefficient:

Features of Correlation Coefficient, r Unit free Ranges between –1 and 1 The closer to –1, the stronger the negative linear relationship The closer to 1, the stronger the positive linear relationship The closer to 0, the weaker any positive linear relationship

Scatter Plots of Data with Various Correlation Coefficients Y Y Y some examples of scatter plots and their corresponding correlation coefficients. X X X r = -1 r = -.6 r = 0 Y Y Y X X X r = +1 r = +.3 r = 0

Interpreting the Result There is a relatively strong positive linear relationship between test score #1 and test score #2 Students who scored high on the first test tended to score high on second test

Example 3.13 (Covariance and Correlation Coefficient) Rising Hills Manufacturing Inc. wishes to study the relationship between the number of workers, X, and the number of tables produced, Y. it has obtained a random sample of 10 hours of production. The following (x,y) combinations of points were obtained: (12,20) (30,60) (15,27) (24,50) (14,21) (18,30) (28,61) (26,54) (19,32) (27,57) Compute the covariance and correlation coefficient. Briefly discuss the relationship between X and Y.

Obtaining Linear Relationships An equation can be fit to show the best linear relationship between two variables: Y = β0 + β1X Where Y is the dependent variable and X is the independent variable

Least Squares Regression Estimates for coefficients β0 and β1 are found to minimize the sum of the squared residuals The least-squares regression line, based on sample data, is Where b1 is the slope of the line and b0 is the y-intercept:

Example 3.15 Using the answers from Example 3.13 (Rising Hills Inc.), compute the sample regression coefficients b1 and b0. What is the sample regression line? Statistics for Business and Economics, 6e © 2007 Pearson Education, Inc.