Descriptive Statistics: Numerical Methods Chapter 3 Descriptive Statistics: Numerical Methods
Descriptive Statistics 3.1 Describing Central Tendency 3.2 Measures of Variation 3.3 Percentiles, Quartiles and Box-and-Whiskers Displays 3.4 Covariance, Correlation, and the Least Square Line 3.5 Weighted Means and Grouped Data (Optional) 3.6 The Geometric Mean (Optional)
Describing Central Tendency In addition to describing the shape of a distribution, want to describe the data set’s central tendency A measure of central tendency represents the center or middle of the data
Parameters and Statistics A population parameter is a number calculated from all the population measurements that describes some aspect of the population A sample statistic is a number calculated using the sample measurements that describes some aspect of the sample
Measures of Central Tendency Mean, The average or expected value Median, Md The value of the middle point of the ordered measurements Mode, Mo The most frequent value
The Mean Population X1, X2, …, XN m Population Mean Sample x1, x2, …, xn Sample Mean
The Sample Mean For a sample of size n, the sample mean is defined as and is a point estimate of the population mean It is the value to expect, on average and in the long run
Example 3.1: The Car Mileage Case Example 3.1: Sample mean for first five car mileages from Table 3.1: 30.8, 31.7, 30.1, 31.6, 32.1
The Median The median Md is a value such that 50% of all measurements, after having been arranged in numerical order, lie above (or below) it If the number of measurements is odd, the median is the middlemost measurement in the ordering If the number of measurements is even, the median is the average of the two middlemost measurements in the ordering
Example: Car Mileage Case Example 3.1: First five observations from Table 3.1: 30.8, 31.7, 30.1, 31.6, 32.1 In order: 30.1, 30.8, 31.6, 31.7, 32.1 There is an odd so median is one in middle, or 31.6
The Mode The mode Mo of a population or sample of measurements is the measurement that occurs most frequently Modes are the values that are observed “most typically” Sometimes higher frequencies at two or more values If there are two modes, the data is bimodal If more than two modes, the data is multimodal When data are in classes, the class with the highest frequency is the modal class The tallest box in the histogram
Histogram Describing the 50 Mileages
Relationships Among Mean, Median and Mode
Measures of Variation Knowing the measures of central tendency is not enough Both of the distributions below have identical measures of central tendency
Measures of Variation Range Largest minus the smallest measurement Variance The average of the squared deviations of all the population measurements from the population mean Standard The square root of the variance Deviation
The Range Largest minus smallest Measures the interval spanned by all the data For Figure 3.13, largest repair time is 5 and smallest is 3 Range is 5 – 3 = 2 days
Population Variance and Standard Deviation The population variance (σ2) is the average of the squared deviations of the individual population measurements from the population mean (µ) The population standard deviation (σ) is the positive square root of the population variance
Variance For a population of size N, the population variance σ2 is: For a sample of size n, the sample variance s2 is:
Standard Deviation Population standard deviation (σ): Sample standard deviation (s):
Example: Chris’s Class Sizes This Semester Data points are: 60, 41, 15, 30, 34 Mean is 36 Variance is: Standard deviation is:
Example: Sample Variance and Standard Deviation Example 3.7: data for first five car mileages from Table 3.1 are 30.8, 31.7, 30.1, 31.6, 32.1 The sample mean is 31.26
The Empirical Rule for Normal Populations If a population has mean µ and standard deviation σ and is described by a normal curve, then 68.26% of the population measurements lie within one standard deviation of the mean: [µ-σ, µ+σ] 95.44% of the population measurements lie within two standard deviations of the mean: [µ-2σ, µ+2σ] 99.73% of the population measurements lie within three standard deviations of the mean: [µ-3σ, µ+3σ]
The Empirical Rule and Tolerance Intervals
Example 3.9: The Car Mileage Case Continued 68.26% of all individual cars will have mileages in the range [x±s] = [31.6±0.8] = [30.8, 32.4] mpg 95.44% of all individual cars will have mileages in the range [x±2s] = [31.6±1.6] = [30.0, 33.2] mpg 99.73% of all individual cars will have mileages in the range [x±3s] = [31.6±2.4] = [29.2, 34.0] mpg
Estimated Tolerance Intervals in the Car Mileage Case
Chebyshev’s Theorem Let µ and σ be a population’s mean and standard deviation, then for any value k> 1 At least 100(1 - 1/k2 )% of the population measurements lie in the interval [µ-kσ, µ+kσ] Only practical for non-mound-shaped distribution population that is not very skewed
z Scores For any x in a population or sample, the associated z score is The z score is the number of standard deviations that x is from the mean A positive z score is for x above (greater than) the mean A negative z score is for x below (less than) the mean
Example: z Score Population of profit margins for five American companies: 8%, 10%, 15%, 12%, 5% µ = 10%, σ = 3.406%
Coefficient of Variation Measures the size of the standard deviation relative to the size of the mean Coefficient of variation =standard deviation/mean × 100% Used to: Compare the relative variabilities of values about the mean Compare the relative variability of populations or samples with different means and different standard deviations Measure risk