MAT 446 Supplementary Note for Ch 1


1 MAT 446 Supplementary Note for Ch 1
Myung Song, Ph.D. © 2009 W.H. Freeman and Company

2 Key Statistical Concepts
Keller: Stats for Mgmt&Econ, 7th Ed. May 14, 2018
Population: the group of all items of interest to a statistics practitioner. Frequently very large; sometimes infinite. E.g., all Florida voters.
Sample: a set of data drawn from the population. Potentially very large, but smaller than the population. E.g., a sample of 765 voters exit-polled on election day.

3 Key Statistical Concepts
Parameter: a descriptive measure of a population.
Statistic: a descriptive measure of a sample.

4 Key Statistical Concepts
[Diagram labels: Population, Sample, Subset, Statistic, Parameter]
Populations have parameters; samples have statistics.

5 Stem plot (stem-and-leaf)
If your data set is the ages of 12 persons: {9, 9, 22, 32, 33, 39, 39, 42, 49, 52, 58, 70}
How to make a stem plot: each data value is split into a "leaf" (usually the last digit) and a "stem" (the other digits). For example, "32" is split into "3" (stem) and "2" (leaf). The stem values are listed down the page, and the leaf values are listed next to them. This way each stem groups the scores, and each leaf indicates a score within that group.
STEM | LEAVES
0 | 9 9
1 |
2 | 2
3 | 2 3 9 9
4 | 2 9
5 | 2 8
6 |
7 | 0
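The splitting procedure above can be sketched in a few lines of Python. This is a minimal illustration, assuming non-negative integer data with the last digit as the leaf; the function name `stem_plot` is hypothetical and not part of the course material.

```python
from collections import defaultdict


def stem_plot(values):
    """Return the rows of a stem plot, one string per stem.

    Assumes non-negative integers; stem = all digits but the last,
    leaf = the last digit.
    """
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)

    rows = []
    # List every stem from smallest to largest, including empty ones,
    # so that gaps in the data stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        row = " ".join(str(leaf) for leaf in leaves.get(stem, []))
        rows.append(f"{stem} | {row}".rstrip())
    return rows


# Ages of the 12 persons from the slide.
ages = [9, 9, 22, 32, 33, 39, 39, 42, 49, 52, 58, 70]
lines = stem_plot(ages)
print("\n".join(lines))
```

Keeping the empty stems in the output is a deliberate choice: dropping them would hide gaps in the distribution.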

6 Stem plot (stem-and-leaf)
To compare two related distributions, a back-to-back stem plot with common stems is useful. Stem plots do not work well for large datasets. When the observed values have too many digits, trim the numbers before making a stem plot. When plotting a moderate number of observations, you can split each stem.

7 Histograms Example: Weight Data―Introductory Statistics Class

8 Histograms Example: Weight Data―Introductory Statistics Class

9 Describing Distributions
A distribution is symmetric if the right and left sides of the graph are approximately mirror images of each other.
A distribution is skewed to the right (right-skewed) if the right side of the graph (containing the half of the observations with larger values) is much longer than the left side. It is skewed to the left (left-skewed) if the left side of the graph is much longer than the right side.
[Figure: three example distributions labeled Symmetric, Skewed-left, and Skewed-right]

10 Pie Charts and Bar Graphs
US Solid Waste (2000)

Material                    Weight (million tons)   Percent of total
Food scraps                  25.9                    11.2 %
Glass                        12.8                     5.5 %
Metals                       18.0                     7.8 %
Paper, paperboard            86.7                    37.4 %
Plastics                     24.7                    10.7 %
Rubber, leather, textiles    15.8                     6.8 %
Wood                         12.7
Yard trimmings               27.7                    11.9 %
Other                         7.5                     3.2 %
Total                       231.9                   100.0 %
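Each "percent of total" entry is the material's weight divided by the total weight. As a rough sketch, the column can be recomputed from the weights; the variable names are hypothetical, and the total of 231.9 is taken as stated on the slide (the rounded weights themselves sum to about 231.8).

```python
# Weights in million tons, as listed on the slide.
weights = {
    "Food scraps": 25.9,
    "Glass": 12.8,
    "Metals": 18.0,
    "Paper, paperboard": 86.7,
    "Plastics": 24.7,
    "Rubber, leather, textiles": 15.8,
    "Wood": 12.7,
    "Yard trimmings": 27.7,
    "Other": 7.5,
}

# Total as stated on the slide (assumed to be computed before rounding).
TOTAL = 231.9

# Percent of total for each material, rounded to one decimal place.
percents = {name: round(100 * w / TOTAL, 1) for name, w in weights.items()}

for name, pct in percents.items():
    print(f"{name:<27}{weights[name]:>6.1f}{pct:>7.1f} %")
```

These percentages are what a pie chart displays as slice sizes and what a bar graph displays as bar heights.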

11 Pie Charts and Bar Graphs

12 Measuring Center: The Mean
The most common measure of center is the arithmetic average, or mean. To find the mean $\bar{x}$ (pronounced “x-bar”) of a set of observations, add their values and divide by the number of observations. If the n observations are $x_1, x_2, x_3, \ldots, x_n$, their mean is:
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$
or in more compact notation
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

13 Measuring Center: The Median
Because the mean cannot resist the influence of extreme observations, it is not a resistant measure of center. Another common measure of center is the median. The median M is the midpoint of a distribution, the number such that half of the observations are smaller and the other half are larger.
To find the median of a distribution:
1. Arrange all observations from smallest to largest.
2. If the number of observations n is odd, the median M is the center observation in the ordered list.
3. If the number of observations n is even, the median M is the average of the two center observations in the ordered list.
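The two definitions translate directly into code. The following is a minimal sketch, not a library implementation; the function names are hypothetical, and the data are the 12 ages from the stem plot slide.

```python
def mean(xs):
    """Arithmetic mean: sum of the observations divided by their count."""
    return sum(xs) / len(xs)


def median(xs):
    """Median by the three-step rule: sort, then take the center value
    (odd n) or the average of the two center values (even n)."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


# Ages of the 12 persons from the stem plot slide.
ages = [9, 9, 22, 32, 33, 39, 39, 42, 49, 52, 58, 70]

print("mean   =", mean(ages))
print("median =", median(ages))
```

Because n = 12 is even here, the median is the average of the 6th and 7th ordered values.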

14 Measuring Center
Use the data below to calculate the mean and median of the commuting times (in minutes) of 20 randomly selected New York workers.
[Stem plot of the 20 commuting times]
Key: 4|5 represents a New York worker who reported a 45-minute travel time to work.

15 Comparing the Mean and Median
The mean and median measure center in different ways, and both are useful.
Comparing the Mean and the Median
The mean and median of a roughly symmetric distribution are close together. If the distribution is exactly symmetric, the mean and median are exactly the same. In a skewed distribution, the mean is usually farther out in the long tail than is the median.
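A small experiment makes the difference concrete. The sketch below uses Python's standard `statistics` module on the 12 ages from the stem plot slide, then replaces the largest age with a hypothetical extreme value of 700 (an invented number, used only for illustration) to show how each measure responds to a long right tail.

```python
import statistics

# Ages of the 12 persons from the stem plot slide.
ages = [9, 9, 22, 32, 33, 39, 39, 42, 49, 52, 58, 70]

# Hypothetical variant: the largest value is replaced by an extreme
# observation to create a long right tail.
ages_with_extreme = ages[:-1] + [700]

mean_before = statistics.mean(ages)
median_before = statistics.median(ages)
mean_after = statistics.mean(ages_with_extreme)
median_after = statistics.median(ages_with_extreme)

print("original:     mean =", mean_before, " median =", median_before)
print("with extreme: mean =", mean_after, " median =", median_after)
```

The median is unchanged because the two center observations are the same in both lists, while the mean is pulled toward the extreme value.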

16 Measuring Spread: Quartiles
A measure of center alone can be misleading. A useful numerical description of a distribution requires both a measure of center and a measure of spread.
How to Calculate the Quartiles and the Interquartile Range
To calculate the quartiles:
1. Arrange the observations in increasing order and locate the median M.
2. The first quartile Q1 is the median of the observations located to the left of the median in the ordered list.
3. The third quartile Q3 is the median of the observations located to the right of the median in the ordered list.
The interquartile range (IQR) is defined as: IQR = Q3 – Q1
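The quartile rule above can be sketched as follows. Statistical software often uses other quartile conventions, so this sketch implements the "median of each half" rule by hand; it assumes that when n is odd the median itself is excluded from both halves. The function names are hypothetical, and the data are the 12 ages from the stem plot slide.

```python
def median_sorted(s):
    """Median of an already sorted list."""
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 == 1 else (s[mid - 1] + s[mid]) / 2


def quartiles(xs):
    """Return (Q1, M, Q3) using the median-of-halves rule."""
    s = sorted(xs)
    n = len(s)
    half = n // 2
    lower = s[:half]                                # left of the median
    upper = s[half:] if n % 2 == 0 else s[half + 1:]  # right of the median
    return median_sorted(lower), median_sorted(s), median_sorted(upper)


# Ages of the 12 persons from the stem plot slide.
ages = [9, 9, 22, 32, 33, 39, 39, 42, 49, 52, 58, 70]

q1, m, q3 = quartiles(ages)
iqr = q3 - q1
print(f"Q1 = {q1}, M = {m}, Q3 = {q3}, IQR = {iqr}")
```

Adding the smallest and largest observations to these three values gives the five-number summary.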

17 Five-Number Summary
The minimum and maximum values alone tell us little about the distribution as a whole. Likewise, the median and quartiles tell us little about the tails of a distribution. To get a quick summary of both center and spread, combine all five numbers. The five-number summary of a distribution consists of the smallest observation, the first quartile (lower fourth), the median, the third quartile (upper fourth), and the largest observation, written in order from smallest to largest.
Minimum   Q1   M   Q3   Maximum

18 Boxplots
The five-number summary divides the distribution roughly into quarters. This leads to a new way to display quantitative data, the boxplot.
How to Make a Boxplot
1. Draw and label a number line that includes the range of the distribution.
2. Draw a central box from Q1 to Q3.
3. Note the median M inside the box.
4. Extend lines (whiskers) from the box out to the minimum and maximum values that are not outliers.

19 Suspected Outliers: The 1.5 × IQR Rule
In addition to serving as a measure of spread, the interquartile range (IQR), or fourth spread, is used as part of a rule of thumb for identifying outliers.
The 1.5 × IQR Rule for Outliers
Call an observation an outlier if it falls more than 1.5 × IQR above the third quartile or below the first quartile.
In the New York travel time data, we found Q1 = 15 minutes, Q3 = 42.5 minutes, and IQR = 27.5 minutes. For these data,
1.5 × IQR = 1.5(27.5) = 41.25
Q1 – 1.5 × IQR = 15 – 41.25 = –26.25
Q3 + 1.5 × IQR = 42.5 + 41.25 = 83.75
Any travel time shorter than –26.25 minutes or longer than 83.75 minutes is considered an outlier.
[Stem plot of the travel times]
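The rule reduces to two cutoffs computed from the quartiles. Below is a minimal sketch using the quartiles quoted on the slide (Q1 = 15, Q3 = 42.5); the function names and the default multiplier argument `k` are illustrative choices, not part of the course material.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower, upper) cutoffs: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def is_outlier(x, lower, upper):
    """An observation is flagged if it lies strictly outside the cutoffs."""
    return x < lower or x > upper


# Quartiles of the New York travel time data, as given on the slide.
lower, upper = outlier_fences(15, 42.5)
print("lower cutoff =", lower)
print("upper cutoff =", upper)

# Check two of the travel times that appear in the data.
for time in (65, 85):
    print(time, "outlier?", is_outlier(time, lower, upper))
```

Since a travel time cannot be negative, only the upper cutoff can flag anything in this data set.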

20 Boxplots
Consider our NY travel times data. Construct a boxplot.
10 30 5 25 40 20 15 85 65 60 45
5 10 15 20 25 30 40 45 60 65 85
Min = 5   Q1 = 15   M = 22.5   Q3 = 42.5   Max = 85
The maximum, 85, is an outlier by the 1.5 × IQR rule.

