Math 3680 Lecture #1 Graphical Representation of Data.


1 Math 3680 Lecture #1 Graphical Representation of Data

2 In this first lecture, we will discuss some brief quantitative measures that capture essential properties of a data set. This is often important in presentations:
– It is often not necessary to report exactly how each subject fared in an experiment.
– Instead, report succinct summaries of the data.
– Your audience has a short attention span.
– Communicate only the most important information.

3 Types of Variables

4 DEFINITIONS
Population - the class of individuals or set of measurements, either existing or conceptual, about which some generalization is sought.
Sample - a subset of measurements from the population; the part of the population being examined.
Units/Subjects - the things or people in a population.
Inference - a generalization made about a population based on a sample.
Parameters - numerical facts about a population that investigators want to know.
Statistics - numbers that can be computed from a sample. Parameters are estimated by statistics.

5 Variables: Data can be characterized as qualitative or quantitative.
Qualitative or categorical variables have answers which are descriptive words or phrases.
– Ordinal: can be meaningfully ranked (e.g. survey data, grades)
– Nominal: cannot be meaningfully ranked (e.g. race, gender, etc.)
Quantitative variables have answers which are numbers.
– Discrete variables (e.g. number of home runs) have gaps between possible values
– Continuous variables (e.g. household income) have no gaps between possible values

6 Exercise: Classify the following variables as qualitative (nominal or ordinal) or quantitative (discrete or continuous).
– occupation
– weight
– opinion of teaching effectiveness
– region of residence
– grade point average
– height
– number of televisions owned
– blood type
– size of wrench randomly chosen from a wrench set

7 Median, Interquartile Range and Box-and-Whiskers Plot

8 Definition: Mode
The mode is the most frequently occurring value. (With rare exceptions, the mode is useless.)
What is the mode for the given data (points scored in the NFL postseason, 1992-94)?
26482838935154413172122929
133021383441727133020282329
17522030132010341029324031
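As a rough illustration of the definition, the sketch below finds the mode (or modes) of a list in Python. The postseason scores above are not legible in this transcript, so the example borrows the eight hourly defect counts from the discrete-data example later in the lecture. The function name `modes` is an illustrative choice, not something from the lecture.

```python
from collections import Counter

def modes(values):
    """Return every value that attains the highest frequency, in sorted order."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

# Hourly defect counts from the discrete-data example in this lecture.
defects = [4, 2, 4, 5, 10, 5, 3, 6]
print(modes(defects))  # 4 and 5 each occur twice, so both are returned
```

A tie like this one, with two values sharing the top frequency, is one reason the mode is rarely an informative summary.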

9 Definition: Median
The median is chosen so that half of the data lies above the median and half lies below. What is the median for the given data? To find the median, we first order the data, counting multiplicities:
524944 38 353431 3029 28 272624232221 20 17 1513 10 99330

10 If there is an even number of data values, the median must be constructed as above. If there is an odd number of data values, the median is simply the middle value. Short Cut: For a data set with n values, the median is the $\left(\frac{n+1}{2}\right)$th entry of the ordered data. If this rank ends in 0.5, we take the average of the data values in the two adjacent positions.
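The shortcut can be sketched in a few lines of Python. This is only a minimal illustration of the (n + 1)/2 rank rule; the function name `median` and the two sample lists (an 11-value set and the eight defect counts used elsewhere in this lecture) are choices made for the example.

```python
def median(values):
    """Median via the (n + 1) / 2 rank shortcut."""
    if not values:
        raise ValueError("median of an empty data set is undefined")
    data = sorted(values)
    n = len(data)
    rank = (n + 1) / 2               # 1-based position in the ordered data
    if rank.is_integer():            # odd n: the rank points at the middle value
        return data[int(rank) - 1]
    lower = data[int(rank) - 1]      # even n: the rank ends in .5, so average
    upper = data[int(rank)]          # the values in the two adjacent positions
    return (lower + upper) / 2

print(median([100, 99, 95, 91, 89, 60, 20, 15, 3, 1, 0]))  # n = 11, rank 6
print(median([4, 2, 4, 5, 10, 5, 3, 6]))                    # n = 8, rank 4.5
```

For the first list the rank is 6, so the median is the sixth ordered value, 60. For the second the rank is 4.5, so the median is the average of the fourth and fifth ordered values, (4 + 5)/2 = 4.5.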

11 While the median is often a useful summary for data, it is not complete by itself. In particular, it does not provide information about the spread of the data. Example: three data sets with median 60:
Set 1: 100, 60, 0
Set 2: 100, 99, 95, 91, 89, 60, 20, 15, 3, 1, 0
Set 3: 100, 78, 76, 75, 74, 60, 54, 51, 50, 49, 0

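12 Definition: Range. The range of a data set is the difference between the largest element and the smallest element. That is: range = largest – smallest. While the range measures variation, it is not perfect: all three data sets below have range 100, even though their values are spread very differently.
Set 1: 100, 60, 0
Set 2: 100, 99, 95, 91, 89, 60, 20, 15, 3, 1, 0
Set 3: 100, 78, 76, 75, 74, 60, 54, 51, 50, 49, 0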
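A short Python check of this slide's example, assuming the run-together digits on the slide are read as the three descending lists shown in the code (the only reading that keeps each list ordered with median 60). The helper name `data_range` is illustrative.

```python
def data_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)

# Assumed reading of the three data sets on this slide.
set_1 = [100, 60, 0]
set_2 = [100, 99, 95, 91, 89, 60, 20, 15, 3, 1, 0]
set_3 = [100, 78, 76, 75, 74, 60, 54, 51, 50, 49, 0]

for name, data in [("Set 1", set_1), ("Set 2", set_2), ("Set 3", set_3)]:
    print(name, "range =", data_range(data))
```

Each set reports the same range, because the range depends only on the two extreme values and ignores how the rest of the data are distributed.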

13 Definitions:
First Quartile. The first quartile is chosen so that 25% of the data lie at or below it.
Second Quartile. The second quartile is chosen so that 50% of the data lie at or below it.
Third Quartile. The third quartile is chosen so that 75% of the data lie at or below it.

14 Computing quartiles:
1. Rank the data from smallest to largest.
2. Find the median; it is the second quartile.
3. Take the lower half of the data. (If there is an odd number of measurements, include the median.) The median of this lower half is the first quartile, Q1.
4. Repeat for the upper half to find the third quartile, Q3.
5. The difference Q3 - Q1 is called the interquartile range (IQR).
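The five steps can be sketched in Python as follows. This is a minimal illustration of the include-the-median rule stated above, not a reference implementation; the function names are made up for the example, and the two 11-value test lists are an assumed reading of the data sets with median 60 from the range example.

```python
def _median_sorted(data):
    """Median of an already-sorted list."""
    n = len(data)
    mid = n // 2
    return data[mid] if n % 2 else (data[mid - 1] + data[mid]) / 2

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves rule.

    When n is odd, the median is included in both the lower and upper half.
    """
    data = sorted(values)                  # step 1: rank the data
    n = len(data)
    half = (n + 1) // 2                    # size of each half (median included if n is odd)
    q2 = _median_sorted(data)              # step 2: median = second quartile
    q1 = _median_sorted(data[:half])       # step 3: median of the lower half
    q3 = _median_sorted(data[n - half:])   # step 4: median of the upper half
    return q1, q2, q3

wide = [100, 99, 95, 91, 89, 60, 20, 15, 3, 1, 0]
tight = [100, 78, 76, 75, 74, 60, 54, 51, 50, 49, 0]
for data in (wide, tight):
    q1, q2, q3 = quartiles(data)
    print(q1, q2, q3, "IQR =", q3 - q1)    # step 5: interquartile range
```

Both lists have the same median and range, but the IQR comes out as 84 for the first and 25 for the second. Other tools (spreadsheets, statistics packages) may use slightly different quartile conventions, so their results can differ a little from this rule.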

15 Computing quartiles may be facilitated by using Microsoft Excel:

16 BOX-AND-WHISKER PLOTS
1. Draw a vertical scale to include the low and high values.
2. To the scale’s right, draw a box between the first and third quartiles.
3. Draw a line through the box at the median value.
4. Draw lines (whiskers) from the box to the low and high values.
5. Often the whiskers are instead drawn only to the most extreme data values within 1.5 IQR of the box, that is, no lower than Q1 - 1.5 IQR and no higher than Q3 + 1.5 IQR. Each possible outlier (between 1.5 and 3 IQR beyond Q1 or Q3) is marked with a +, and each probable outlier (more than 3 IQR beyond Q1 or Q3) is marked with a *.
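Step 5 can be made concrete with a small Python sketch. The function names are hypothetical, the treatment of values lying exactly on a 1.5 IQR or 3 IQR boundary (counted as the milder category) is an assumption, and the quartiles Q1 = 50.5 and Q3 = 75.5 are those obtained for the sample list with the include-the-median rule from the quartile slide.

```python
def classify(value, q1, q3):
    """Label a value by how far it lies outside the box, measured in IQRs."""
    iqr = q3 - q1
    distance = max(q1 - value, value - q3, 0)   # 0 if the value is inside the box
    if distance > 3 * iqr:
        return "probable outlier"               # would be marked with *
    if distance > 1.5 * iqr:
        return "possible outlier"               # would be marked with +
    return "within whiskers"

def whisker_ends(values, q1, q3):
    """Most extreme data values within 1.5 IQR of the box."""
    iqr = q3 - q1
    inside = [v for v in values if q1 - 1.5 * iqr <= v <= q3 + 1.5 * iqr]
    return min(inside), max(inside)

data = [100, 78, 76, 75, 74, 60, 54, 51, 50, 49, 0]
q1, q3 = 50.5, 75.5                             # assumed quartiles for this list
print(whisker_ends(data, q1, q3))
for v in (0, 100):
    print(v, classify(v, q1, q3))
```

Here the IQR is 25, so the whiskers may reach no further than 13 and 113. The value 0 lies 50.5 below Q1, which is more than 1.5 IQR (37.5) but less than 3 IQR (75), so it is flagged as a possible outlier, and the lower whisker stops at 49.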

17 Exercise: Draw a boxplot for the domestic gross receipts of the top 100 movies of all time: www.boxofficemojo.com/alltime/domestic.htm Note: Many statistical software packages (SPSS, SAS, etc.) can create boxplots automatically. Unfortunately, Excel is not one of them.

18 Stem-and-Leaf Plots
33 92 63 72 81 82 81 42 87 55 82 48 49 72 73 95 102 101 92 89 95 74 73 99
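One way to build a stem-and-leaf plot for these 24 values is sketched below in Python. It assumes non-negative integers, takes the last digit as the leaf and the remaining digits as the stem, and keeps empty stems so gaps stay visible; the function name `stem_and_leaf` is illustrative.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (all digits but the last) to its sorted leaves (last digit)."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    # Include every stem from smallest to largest, even if it has no leaves.
    return {s: plot.get(s, []) for s in range(min(plot), max(plot) + 1)}

data = [33, 92, 63, 72, 81, 82, 81, 42, 87, 55, 82, 48,
        49, 72, 73, 95, 102, 101, 92, 89, 95, 74, 73, 99]

for stem, leaves in stem_and_leaf(data).items():
    print(f"{stem:>2} | {' '.join(map(str, leaves))}")
```

Each printed row shows one stem followed by its leaves in increasing order, so the row for stem 7 reads `7 | 2 2 3 3 4` and the two values above 100 appear under stem 10.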

19 Histograms: Continuous Data

20 In a histogram:
1) A histogram is a special kind of bar chart.
2) Percentages are represented by areas, not heights.
3) The height of a block represents the percentage per horizontal unit.
4) Be sure to decide on the endpoint convention.
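The relationship between area, height and class width can be checked with a short Python sketch. The data are the 24 values from the stem-and-leaf slide; the class boundaries 30, 60, 80, 90, 110 and the endpoint convention (left endpoint included, right endpoint excluded) are assumptions chosen only to show unequal class widths, and the function name is hypothetical.

```python
def histogram_rows(values, edges):
    """Return (left, right, count, percent, height) for each class.

    height = percent per horizontal unit = percent / class width.
    Convention assumed here: left endpoint included, right endpoint excluded.
    """
    n = len(values)
    rows = []
    for left, right in zip(edges, edges[1:]):
        count = sum(left <= v < right for v in values)
        percent = 100 * count / n
        rows.append((left, right, count, percent, percent / (right - left)))
    return rows

data = [33, 92, 63, 72, 81, 82, 81, 42, 87, 55, 82, 48,
        49, 72, 73, 95, 102, 101, 92, 89, 95, 74, 73, 99]
edges = [30, 60, 80, 90, 110]   # illustrative classes of unequal width

for left, right, count, percent, height in histogram_rows(data, edges):
    print(f"{left:>3}-{right:<3} count={count} percent={percent:.1f} height={height:.3f}")
```

The classes 60-80 and 80-90 each hold 6 of the 24 values (25%), yet the narrower class gets a block twice as tall (2.5 versus 1.25 percent per unit), so that both blocks have the same area.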

21 Ex. Construct a histogram for the 2007 salaries of the 50 U.S. governors (p. 27, from Council of State Governments): http://www.stateline.org/live/details/story?contentId=207914
Class            Frequency    Relative frequency    Density per $1000
$70-90,000
$90-110,000
$110-120,000
$120-130,000
$130-150,000
$150-170,000
$170-210,000

22

23 Ex: Draw a histogram for the domestic gross receipts of all movies that grossed at least $100 million: www.boxofficemojo.com/alltime/domestic.htm
Class            Frequency    Relative frequency    Density per $1M
$100 - 110M
$110 - 120M
$120 - 130M
$130 - 150M
$150 - 175M
$175 - 200M
$200 - 250M
$250 - 300M
$300 - 800M

24

25 How do you decide on the classes?
1. Too few classes: very undescriptive.
2. Too many classes: very choppy.
3. Sturges’ rule of thumb: for a data set of size n, use $k \approx \log_2 n = \frac{\ln n}{\ln 2}$ classes, rounded up to the nearest integer.
4. For long tails, use wide classes as appropriate.
5. Within these guidelines, there are no absolute rules.
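The rule of thumb in item 3 is easy to evaluate in Python. The sketch below follows the version stated on this slide (log base 2 of n, rounded up); the more commonly quoted form of Sturges' rule adds 1 to this value. The function name is an illustrative choice.

```python
import math

def suggested_classes(n):
    """Number of classes suggested by the slide's rule: log2(n), rounded up."""
    if n < 1:
        raise ValueError("n must be a positive data-set size")
    return math.ceil(math.log2(n))

for n in (8, 24, 50, 100):
    print(n, suggested_classes(n))
```

For the 50 governors' salaries this suggests about 6 classes. Items 4 and 5 still apply: the result is a starting point, not a fixed requirement.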

26 Histograms and Excel Doing histograms correctly with Excel is very cumbersome. This chart was generated using the Histogram toolpack, as described on p. 42 of the textbook. What’s wrong with this picture?

27 Conclusion: Do NOT use Excel to make histograms Other software packages (R, Minitab, SPSS etc.) can make correct histograms. For now, just draw histograms by hand.

28 Histograms: Discrete Data

29 Example: A production line inspector records the number of defective items produced each hour of an eight-hour shift: 4 2 4 5 10 5 3 6
Number of items    Frequency    Relative frequency (Density)
2
3
4
5
6
7
8
9
10

30 Notice that the bar for 2 stretches from 1.5 to 2.5, giving that bar a width of 1.
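For discrete data like this, the frequency table can be tallied with a few lines of Python. This is only a sketch; the function name `frequency_table` is made up for the example. Because every bar has width 1, the relative frequency of each value is also the height (density) of its bar.

```python
from collections import Counter

def frequency_table(values):
    """Rows of (value, frequency, relative frequency) for every integer in range."""
    counts = Counter(values)
    n = len(values)
    # Counter returns 0 for values that never occur, so gaps appear as empty bars.
    return [(x, counts[x], counts[x] / n)
            for x in range(min(values), max(values) + 1)]

defects = [4, 2, 4, 5, 10, 5, 3, 6]
for value, freq, rel in frequency_table(defects):
    print(f"{value:>2}  frequency={freq}  relative frequency={rel:.3f}")
```

The values 4 and 5 each occur in 2 of the 8 hours (relative frequency 0.25), while 7, 8 and 9 never occur and so get bars of height 0 between 6 and 10.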

