Presentation on theme: "COMPLETE BUSINESS STATISTICS"— Presentation transcript:
Slide 1: COMPLETE BUSINESS STATISTICS by AMIR D. ACZEL & JAYAVEL SOUNDERPANDIAN, 6th edition. Prepared by Lloyd Jaisingh, Morehead State University.
Slide 2: Introduction and Descriptive Statistics (Chapter 1). Using Statistics; Percentiles and Quartiles; Measures of Central Tendency; Measures of Variability; Grouped Data and the Histogram; Skewness and Kurtosis; Relations between the Mean and Standard Deviation; Methods of Displaying Data; Exploratory Data Analysis; Using the Computer.
Slide 3: LEARNING OBJECTIVES. After studying this chapter, you should be able to: Distinguish between qualitative data and quantitative data. Describe nominal, ordinal, interval, and ratio scales of measurement. Describe the difference between population and sample. Calculate and interpret percentiles and quartiles. Explain measures of central tendency and how to compute them. Create different types of charts that describe data sets. Use Excel templates to compute various measures and create charts.
Slide 4: WHAT IS STATISTICS? Statistics is a science that helps us make better decisions in business and economics as well as in other fields. Statistics teaches us how to summarize, analyze, and draw meaningful inferences from data that then lead to improved decisions. These decisions help us improve the running of, for example, a department, a company, or the entire economy.
Slide 5: 1-1 Using Statistics (Two Categories). Descriptive Statistics: collect, organize, summarize, display, and analyze data. Inferential Statistics: predict and forecast values of population parameters; test hypotheses about values of population parameters; make decisions.
Slide 6: Types of Data - Two Types (p.28). Qualitative (categorical or nominal): examples are color, gender, and nationality. Quantitative (measurable or countable): examples are temperatures, salaries, and the number of points scored on a 100-point exam.
Slide 7: Scales of Measurement (p.28-29). Analytical or metric type: interval scale, ratio scale. Categorical or nonmetric type: nominal scale, ordinal scale.
Slide 8: Samples and Populations (p.29). A population consists of the set of all measurements in which the investigator is interested. A sample is a subset of the measurements selected from the population. A census is a complete enumeration of every item in a population.
Slide 9: Simple Random Sample. Sampling from the population is often done randomly, such that every possible sample of equal size (n) has an equal chance of being selected. A sample selected in this way is called a simple random sample, or just a random sample. A random sample allows chance to determine its elements.
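A minimal Python sketch of drawing a simple random sample, assuming the population is held in a list. The function name `simple_random_sample`, the `seed` parameter, and the population of 100 ID numbers are illustrative assumptions, not part of the slides.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct elements so that every subset of size n is equally likely."""
    rng = random.Random(seed)          # a fixed seed makes the draw reproducible
    return rng.sample(list(population), n)

# Hypothetical population: ID numbers 1..100
population = range(1, 101)
sample = simple_random_sample(population, n=5, seed=42)
print(sample)
```

Sampling is done without replacement, so no element can appear twice in the sample.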
Slide 10: Samples and Populations. Population (size N); sample (size n).
Slide 11: Why Sample? A census of a population may be impossible, impractical, or too costly.
Slide 13: 1-2 Percentiles and Quartiles. Given any set of numerical observations, order them according to magnitude. The Pth percentile in the ordered set is that value below which lie P% (P percent) of the observations in the set. The position of the Pth percentile is given by (n + 1)P/100, where n is the number of observations in the set.
Slide 14: Example 1-2 (p.33). A large department store collects data on sales made by each of its salespeople. The number of sales made on a given day by each of 20 salespeople is shown on the next slide, together with the same data sorted by magnitude.
Slide 16: Example 1-2 (Continued), Percentiles. Find the 50th, 80th, and 90th percentiles of this data set. To find the 50th percentile, determine the data point in position (n + 1)P/100 = (20 + 1)(50/100) = 10.5. Thus, the percentile is located at the 10.5th position. The 10th observation is 16, and the 11th observation is also 16. The 50th percentile lies halfway between the 10th and 11th values and is thus 16.
Slide 17: Example 1-2 (Continued), Percentiles. To find the 80th percentile, determine the data point in position (n + 1)P/100 = (20 + 1)(80/100) = 16.8. Thus, the percentile is located at the 16.8th position. The 16th observation is 19, and the 17th observation is 20. The 80th percentile is a point lying 0.8 of the way from 19 to 20 and is thus 19.8.
Slide 18: Example 1-2 (Continued), Percentiles. To find the 90th percentile, determine the data point in position (n + 1)P/100 = (20 + 1)(90/100) = 18.9. Thus, the percentile is located at the 18.9th position. The 18th observation is 21, and the 19th observation is 22. The 90th percentile is a point lying 0.9 of the way from 21 to 22 and is thus 21.9.
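The position rule (n + 1)P/100 with interpolation between neighboring observations can be sketched in Python as follows. The function name `percentile` and the clamping of positions that fall outside 1..n are assumptions made for this sketch. The `sales` list is also an assumption: a reconstruction of the Example 1-2 data chosen to agree with the sum, percentiles, and quartiles quoted on the slides.

```python
import math

# Assumed reconstruction of the 20 daily sales counts from Example 1-2
sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

def percentile(values, p):
    """Pth percentile using position (n + 1)P/100 and linear interpolation."""
    data = sorted(values)
    n = len(data)
    pos = (n + 1) * p / 100            # 1-based position in the ordered data
    if pos <= 1:                       # assumption: clamp to the smallest value
        return data[0]
    if pos >= n:                       # assumption: clamp to the largest value
        return data[-1]
    lower = math.floor(pos)            # observation just below the position
    frac = pos - lower                 # fraction of the way to the next observation
    return data[lower - 1] + frac * (data[lower] - data[lower - 1])

for p in (50, 80, 90):
    print(p, round(percentile(sales, p), 2))
```

Rounded to two decimals, the three results should match the slide values 16, 19.8, and 21.9.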
Slide 19: Quartiles - Special Percentiles (p.35). Quartiles are the percentage points that break down the ordered data set into quarters. The first quartile is the 25th percentile. It is the point below which lie 1/4 of the data. The second quartile is the 50th percentile. It is the point below which lie 1/2 of the data. This is also called the median. The third quartile is the 75th percentile. It is the point below which lie 3/4 of the data.
Slide 20: Quartiles and Interquartile Range. The first quartile, Q1 (25th percentile), is often called the lower quartile. The second quartile, Q2 (50th percentile), is often called the median or the middle quartile. The third quartile, Q3 (75th percentile), is often called the upper quartile. The interquartile range is the difference between the third and the first quartiles, Q3 - Q1.
Slide 25: Summary Measures: Population Parameters, Sample Statistics. Measures of Central Tendency: median, mode, mean. Measures of Variability: range, interquartile range, variance, standard deviation. Other summary measures: skewness, kurtosis.
Slide 26: 1-3 Measures of Central Tendency or Location (p.36). Median: the middle value when the data are sorted in order of magnitude; the 50th percentile. Mode: the most frequently occurring value. Mean: the average.
Slide 27: Example - Median (data from Example 1-2). The median is the middle value of data sorted in order of magnitude. It is the 50th percentile. Position: (20 + 1)(50/100) = 10.5. Median = 16 + (.5)(0) = 16. See slide # 19 for the template output.
Slide 28: Example - Mode (data from Example 1-2). [Dot plot of the sales data.] Mode = 16. The mode is the most frequently occurring value. It is the value with the highest frequency. See slide # 19 for the template output.
Slide 29: Arithmetic Mean or Average. The mean of a set of observations is their average: the sum of the observed values divided by the number of observations.
Population mean: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$
Sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
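Slide 30: Example - Mean (data from Example 1-2). Sales: 9, 6, 12, 10, 13, 15, 16, 14, 14, 16, 17, 16, 24, 21, 22, 18, 19, 18, 20, 17. Sum: $\sum_{i=1}^{n} x_i = 317$. Sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{317}{20} = 15.85$. See slide # 19 for the template output.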
Slide 31: Example - Mode (data from Example 1-2). [Dot plot of the sales data; each dot represents one value.] Mean = 15.85. Median and Mode = 16. See slide # 19 for the template output.
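The three measures of central tendency can be checked with Python's standard `statistics` module. This is a sketch only; the `sales` list is an assumed reconstruction of the Example 1-2 data, chosen to agree with the sum of 317 and the median and mode of 16 quoted on the slides.

```python
import statistics

# Assumed reconstruction of the 20 daily sales counts from Example 1-2
sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

mean = statistics.mean(sales)      # sum of the values divided by their count
median = statistics.median(sales)  # average of the 10th and 11th sorted values
mode = statistics.mode(sales)      # most frequently occurring value

print(mean, median, mode)
```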
Slide 32: Exercise (p.40), 5 min. Example 1-4; 1-13 to 1-16 (see textbook p.698); 1-17 (Ans: mean = 592.93, median = 566, LQ = 546, UQ = 618.75, outlier = 940, suspected outlier = 399).
Slide 33: 1-4 Measures of Variability or Dispersion (p.40). Range: the difference between the maximum and minimum values. Interquartile range: the difference between the third and first quartiles (Q3 - Q1). Variance: the average* of the squared deviations from the mean. Standard deviation: the square root of the variance. *Definitions of population variance and sample variance differ slightly.
Slide 34: Example - Range and Interquartile Range (data from Example 1-2). Range = Maximum - Minimum = 24 - 6 = 18. First quartile: Q1 = 13 + (.25)(1) = 13.25. Third quartile: Q3 = 18 + (.75)(1) = 18.75. Interquartile range = Q3 - Q1 = 18.75 - 13.25 = 5.5.
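The range and interquartile range can be sketched in Python using the (n + 1)P/100 position rule for the quartiles. The helper name `percentile` and its clamping behavior are assumptions made for this sketch, and the `sales` list is an assumed reconstruction of the Example 1-2 data that agrees with the quartiles quoted on the slide.

```python
import math

# Assumed reconstruction of the 20 daily sales counts from Example 1-2
sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

def percentile(values, p):
    """Pth percentile using position (n + 1)P/100 and linear interpolation."""
    data = sorted(values)
    n = len(data)
    pos = (n + 1) * p / 100
    if pos <= 1:                       # assumption: clamp to the smallest value
        return data[0]
    if pos >= n:                       # assumption: clamp to the largest value
        return data[-1]
    lower = math.floor(pos)
    return data[lower - 1] + (pos - lower) * (data[lower] - data[lower - 1])

data_range = max(sales) - min(sales)   # maximum minus minimum
q1 = percentile(sales, 25)             # position 5.25
q3 = percentile(sales, 75)             # position 15.75
iqr = q3 - q1

print(data_range, q1, q3, iqr)
```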
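Slide 35: Variance and Standard Deviation.
Population variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} = \frac{\sum_{i=1}^{N} x_i^2 - \frac{\left(\sum_{i=1}^{N} x_i\right)^2}{N}}{N}$, with $\sigma = \sqrt{\sigma^2}$.
Sample variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1}$, with $s = \sqrt{s^2}$.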
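A short Python sketch of both variance definitions, including the shortcut form based on the sum of squares. The variable names are illustrative, and the `sales` list is an assumed reconstruction of the Example 1-2 data (sum 317, n = 20).

```python
import math
import statistics

# Assumed reconstruction of the 20 daily sales counts from Example 1-2
sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

n = len(sales)
mean = sum(sales) / n

# Sum of squared deviations, by the definition and by the shortcut formula
ss = sum((x - mean) ** 2 for x in sales)
ss_shortcut = sum(x * x for x in sales) - sum(sales) ** 2 / n

pop_var = ss / n               # divide by N when the data are the whole population
sample_var = ss / (n - 1)      # divide by n - 1 when the data are a sample
pop_sd = math.sqrt(pop_var)
sample_sd = math.sqrt(sample_var)

print(round(sample_var, 4), round(sample_sd, 4))
```

The standard library offers the same quantities as `statistics.pvariance`, `statistics.variance`, `statistics.pstdev`, and `statistics.stdev`, which makes a convenient cross-check.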
Slide 40: 1-5 Grouped Data and the Histogram. Dividing data into groups, classes, or intervals. Groups should be: Mutually exclusive (not overlapping): every observation is assigned to only one group. Exhaustive: every observation is assigned to a group. Equal-width (if possible): the first or last group may be open-ended.
Slide 41: Frequency Distribution. A table with two columns listing each and every group (class, or interval) of values and the associated frequency of each group, that is, the number of observations assigned to each group. The sum of frequencies is the number of observations: N for a population, n for a sample. The class midpoint is the middle value of a group, class, or interval. Relative frequency is the proportion of total observations in each class. Sum of relative frequencies = 1.
Slide 42: Example 1-7: Frequency Distribution (p.47). Columns: x = Spending Class ($); f(x) = Frequency (number of customers); f(x)/n = Relative Frequency. Classes: 0 to less than 100; 100 to less than 200; 200 to less than 300; 300 to less than 400; 400 to less than 500; 500 to less than 600. Example of relative frequency: 30/184 = 0.163. Sum of relative frequencies = 1.
Slide 43: Cumulative Frequency Distribution. Columns: x = Spending Class ($); F(x) = Cumulative Frequency; F(x)/n = Cumulative Relative Frequency. Classes: 0 to less than 100; 100 to less than 200; 200 to less than 300; 300 to less than 400; 400 to less than 500; 500 to less than 600. The cumulative frequency of each group is the sum of the frequencies of that and all preceding groups.
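The construction of frequency, relative frequency, and cumulative frequency columns can be sketched in Python as below. The function name `frequency_table` and the ten spending amounts are hypothetical and serve only as an illustration; they are not the Example 1-7 data. The classes are equal-width intervals of the form "low to less than high".

```python
from itertools import accumulate

def frequency_table(values, low, width, n_classes):
    """Count observations in equal-width classes [low + k*width, low + (k+1)*width)."""
    counts = [0] * n_classes
    for v in values:
        idx = int((v - low) // width)
        if not 0 <= idx < n_classes:
            raise ValueError(f"{v} falls outside the classes")  # classes must be exhaustive
        counts[idx] += 1
    n = len(values)
    rel = [c / n for c in counts]            # relative frequencies, summing to 1
    cum = list(accumulate(counts))           # running totals of the frequencies
    cum_rel = [c / n for c in cum]           # cumulative relative frequencies
    return counts, rel, cum, cum_rel

# Hypothetical spending amounts in dollars (not the textbook data)
spending = [45, 99.5, 120, 130, 250, 260, 270, 310, 480, 520]
counts, rel, cum, cum_rel = frequency_table(spending, low=0, width=100, n_classes=6)

for k, (f, r, F) in enumerate(zip(counts, rel, cum)):
    print(f"{k * 100} to less than {(k + 1) * 100}: f={f}, f/n={r:.2f}, F={F}")
```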
Slide 45: Histogram. A histogram is a chart made of bars of different heights. Widths and locations of bars correspond to widths and locations of data groupings. Heights of bars correspond to frequencies or relative frequencies of data groupings.
Slide 48: 1-6 Skewness and Kurtosis (p.49). Skewness: a measure of the asymmetry of a frequency distribution. Skewed to the left: skewness < 0. Symmetric or unskewed: skewness = 0. Skewed to the right: skewness > 0. Kurtosis: a measure of the flatness or peakedness of a frequency distribution. Platykurtic (relatively flat); mesokurtic (normal); leptokurtic (relatively peaked). The formulas are given on p.51.
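Slide 52: Kurtosis. Platykurtic: a flat distribution. The smaller the kurtosis value, the flatter the distribution.
Slide 53: Kurtosis. Mesokurtic: not too flat and not too peaked.
Slide 54: Kurtosis. Leptokurtic: a peaked distribution. The larger the kurtosis value, the more peaked the distribution.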
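The slides refer to p.51 for the formulas without reproducing them, so the following Python sketch assumes the standardized-moment definitions: skewness as the average cubed standardized deviation and kurtosis as the average fourth-power standardized deviation, under which a normal distribution has kurtosis 3. The function name `skewness_kurtosis` and the small data sets are hypothetical.

```python
def skewness_kurtosis(values):
    """Population skewness and kurtosis from central moments (assumed definitions)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # population variance
    m3 = sum((x - mean) ** 3 for x in values) / n   # third central moment
    m4 = sum((x - mean) ** 4 for x in values) / n   # fourth central moment
    return m3 / m2 ** 1.5, m4 / m2 ** 2

# Hypothetical data sets chosen to show the sign of the skewness
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]      # long tail to the right
left_skewed = [-10, -2, -1, -1, -1]  # long tail to the left

for name, data in [("symmetric", symmetric),
                   ("right", right_skewed),
                   ("left", left_skewed)]:
    skew, kurt = skewness_kurtosis(data)
    print(name, round(skew, 3), round(kurt, 3))
```

Under these definitions a kurtosis below 3 indicates a flatter (platykurtic) shape and a kurtosis above 3 a more peaked (leptokurtic) one.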
Slide 55: 1-7 Relations between the Mean and Standard Deviation (p.51) (important). Chebyshev's Theorem: applies to any distribution, regardless of shape; places lower limits on the percentages of observations within a given number of standard deviations from the mean. Empirical Rule: applies only to roughly mound-shaped and symmetric distributions; specifies approximate percentages of observations within a given number of standard deviations from the mean.
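Slide 56: Chebyshev's Theorem. At least $1 - \frac{1}{k^2}$ of the elements of any distribution lie within k standard deviations of the mean.
k = 2: at least $1 - \frac{1}{2^2} = \frac{3}{4} = 75\%$ lie within 2 standard deviations of the mean.
k = 3: at least $1 - \frac{1}{3^2} = \frac{8}{9} \approx 89\%$ lie within 3 standard deviations of the mean.
k = 4: at least $1 - \frac{1}{4^2} = \frac{15}{16} \approx 94\%$ lie within 4 standard deviations of the mean.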
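A small Python sketch of the Chebyshev lower bound 1 - 1/k^2, compared with the share of observations actually found within k standard deviations of the mean. The function names are illustrative, and the `sales` list is an assumed reconstruction of the Example 1-2 data. The population standard deviation is used here, which is a choice made for this sketch.

```python
import statistics

# Assumed reconstruction of the 20 daily sales counts from Example 1-2
sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

def chebyshev_bound(k):
    """Minimum share of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is informative only for k > 1")
    return 1 - 1 / k ** 2

def fraction_within(values, k):
    """Observed share of values within k population standard deviations of the mean."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return sum(abs(x - mu) <= k * sigma for x in values) / len(values)

for k in (2, 3, 4):
    print(k, round(chebyshev_bound(k), 4), fraction_within(sales, k))
```

The observed shares can exceed the bound by a wide margin, since the theorem only guarantees a minimum that holds for any distribution.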
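Slide 57: Empirical Rule. For roughly mound-shaped and symmetric distributions, approximately: 68% of the observations lie within 1 standard deviation of the mean; 95% lie within 2 standard deviations of the mean; almost all (about 99.7%) lie within 3 standard deviations of the mean.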
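As a rough check, the Python sketch below counts the share of observations within 1, 2, and 3 sample standard deviations of the mean, to be compared with the commonly quoted empirical-rule figures of about 68%, 95%, and almost all. The `sales` list is an assumed reconstruction of the Example 1-2 data, and treating it as roughly mound-shaped is itself an assumption.

```python
import statistics

# Assumed reconstruction of the 20 daily sales counts from Example 1-2
sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

mean = statistics.mean(sales)
s = statistics.stdev(sales)            # sample standard deviation (divisor n - 1)

def share_within(values, center, spread, k):
    """Share of values no more than k * spread away from center."""
    return sum(abs(x - center) <= k * spread for x in values) / len(values)

shares = {k: share_within(sales, mean, s, k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within {k} standard deviation(s): {share:.0%}")
```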
Slide 59: 1-8 Methods of Displaying Data. Pie charts: categories represented as percentages of the total. Bar graphs: heights of rectangles represent group frequencies. Frequency polygons: height of the line represents frequency. Ogives: height of the line represents cumulative frequency. Time plots: represent values over time.
Slide 65: 1-9 Exploratory Data Analysis (EDA). Techniques to determine relationships and trends, identify outliers and influential observations, and quickly describe or summarize data sets. Stem-and-leaf displays: a quick-and-dirty listing of all observations; conveys some of the same information as a histogram. Box plots: show the median, the lower and upper quartiles, and the maximum and minimum.
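A stem-and-leaf display can be sketched in a few lines of Python. The sketch assumes non-negative integer data, with the tens digit as the stem and the units digit as the leaf; the function name `stem_and_leaf` is illustrative, and the `sales` list is an assumed reconstruction of the Example 1-2 data.

```python
from collections import defaultdict

# Assumed reconstruction of the 20 daily sales counts from Example 1-2
sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

def stem_and_leaf(values):
    """Return display lines; stems are tens digits, leaves are units digits."""
    groups = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        groups[stem].append(leaf)
    lines = []
    for stem in range(min(groups), max(groups) + 1):   # keep empty stems visible
        leaves = "".join(str(leaf) for leaf in groups.get(stem, []))
        lines.append(f"{stem} | {leaves}")
    return lines

display = stem_and_leaf(sales)
print("\n".join(display))
```

Read sideways, the rows of leaves have the same outline as a histogram with classes of width 10, while still listing every observation.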
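Slide 67: Box Plot (p.62). Elements of a box plot: the box runs from Q1 to Q3, with the median marked inside it. Its length is the interquartile range (IQR), so half of the data lie within the box. Inner fences: Q1 - 1.5(IQR) and Q3 + 1.5(IQR). Outer fences: Q1 - 3(IQR) and Q3 + 3(IQR). The lines from the box extend to the smallest data point not below the inner fence and to the largest data point not exceeding the inner fence. Suspected outlier (*): a point between the inner and outer fences. Outlier (o): a point beyond the outer fences.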
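The fence calculations can be sketched in Python as follows. Quartiles use the (n + 1)P/100 position rule; the names `percentile` and `box_plot_summary`, the clamping in `percentile`, and the two extra values 33 and 45 are assumptions made for illustration. The `sales` list is an assumed reconstruction of the Example 1-2 data.

```python
import math

# Assumed reconstruction of the 20 daily sales counts from Example 1-2
sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

def percentile(values, p):
    """Pth percentile using position (n + 1)P/100 and linear interpolation."""
    data = sorted(values)
    n = len(data)
    pos = (n + 1) * p / 100
    if pos <= 1:                       # assumption: clamp to the smallest value
        return data[0]
    if pos >= n:                       # assumption: clamp to the largest value
        return data[-1]
    lower = math.floor(pos)
    return data[lower - 1] + (pos - lower) * (data[lower] - data[lower - 1])

def box_plot_summary(values):
    """Quartiles, fences, whisker ends, suspected outliers, and outliers."""
    data = sorted(values)
    q1, q3 = percentile(data, 25), percentile(data, 75)
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    outer = (q1 - 3 * iqr, q3 + 3 * iqr)
    inside = [x for x in data if inner[0] <= x <= inner[1]]
    suspected = [x for x in data
                 if outer[0] <= x < inner[0] or inner[1] < x <= outer[1]]
    outliers = [x for x in data if x < outer[0] or x > outer[1]]
    return {"q1": q1, "median": percentile(data, 50), "q3": q3, "iqr": iqr,
            "inner": inner, "outer": outer,
            "whiskers": (inside[0], inside[-1]),
            "suspected": suspected, "outliers": outliers}

base = box_plot_summary(sales)
extended = box_plot_summary(sales + [33, 45])   # two hypothetical extreme values
print(base["inner"], base["outer"], base["whiskers"])
print(extended["suspected"], extended["outliers"])
```

Adding observations changes the quartiles and therefore the fences, so the fences are recomputed from the extended data rather than reused.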