Exploring Data Chapter 1 Displaying distributions with graphs Describing distributions with Numbers.

1 Exploring Data Chapter 1 Displaying distributions with graphs Describing distributions with Numbers

2 Different types of graphs
Categorical Data:
◦Use a Bar Graph
Quantitative Data:
◦Dot plots
◦Stem plots
◦Histograms

3 Things to remember!! Always, always, always plot your data!! Don’t forget your SOCS:
◦S – Shape
◦O – Outliers
◦C – Center
◦S – Spread

4 Bar Graphs: used to plot categorical data. The distribution of a categorical variable lists the categories and gives either the count or the percent of individuals who fall in each category. Example 1 The radio audience rating service Arbitron places the country’s 13,838 radio stations into categories that describe the kind of programs they broadcast. Here is the distribution of station formats.

5
Format                   Count of stations   Percent of stations
Adult Contemporary       1556                11.2
Adult Standards          1196                8.6
Contemporary hit         569                 4.1
Country                  2066                14.9
News/Talk/Information    2179                15.7
Oldies                   1060                7.7
Religious                2014                14.6
Rock                     869                 6.3
Spanish language         750                 5.4
Other format             1579                11.4
Total                    13838               99.9
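As a rough illustration of how the percent column follows from the counts, the sketch below recomputes each percent from the counts on the slide and prints a crude text bar graph. The variable names and the one-character-per-whole-percent scale are illustrative choices, not part of the original presentation.

```python
# Counts of radio stations by format, copied from the table on the slide.
counts = {
    "Adult Contemporary": 1556,
    "Adult Standards": 1196,
    "Contemporary hit": 569,
    "Country": 2066,
    "News/Talk/Information": 2179,
    "Oldies": 1060,
    "Religious": 2014,
    "Rock": 869,
    "Spanish language": 750,
    "Other format": 1579,
}

total = sum(counts.values())

# Percent of stations in each format, rounded to one decimal place as on the slide.
percents = {fmt: round(100 * n / total, 1) for fmt, n in counts.items()}

# Text bar graph: one '#' per whole percent (an arbitrary scale for illustration).
for fmt, pct in percents.items():
    print(f"{fmt:<24}{'#' * int(pct)} {pct}")

# The rounded percents need not add to exactly 100; the gap is roundoff error.
print("Total stations:", total)
print("Sum of rounded percents:", round(sum(percents.values()), 1))
```

Because each percent is rounded separately, the percent column sums to 99.9 rather than 100, which matches the Total row of the table.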

7 Dot Plots Use for quantitative data. Best for small amounts of data.

8 Example 2 The accompanying data on gender and birth weight (kg) of foals born to 15 thoroughbred mares appeared in the article “Suckling Behavior Does Not Measure Milk Intake in Horses” (Animal Behaviour (1999): 673-678). Construct a dot plot of the birth weights by gender.
Gender: F   M   M   F   F   M   F   F   M   F   M   F   M   F   F
Weight: 129 119 132 123 112 113 95  104 104 93  108 95  117 128 127
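One possible way to build the dot plot for Example 2 is sketched below. It is a text approximation rather than a true dot plot: it lists only the weights that occur instead of drawing a full number line. The helper name `dot_plot` is hypothetical.

```python
from collections import Counter

# Data copied from the slide: gender and birth weight (kg) of 15 foals.
genders = "F M M F F M F F M F M F M F F".split()
weights = [129, 119, 132, 123, 112, 113, 95, 104, 104, 93, 108, 95, 117, 128, 127]

# Group the birth weights by gender.
by_gender = {"F": [], "M": []}
for g, w in zip(genders, weights):
    by_gender[g].append(w)

def dot_plot(values):
    """Return one text line per distinct value, with one 'o' per observation."""
    tally = Counter(values)
    return [f"{v:>4} | {'o' * tally[v]}" for v in sorted(tally)]

for g, values in by_gender.items():
    print(f"Gender {g} (n = {len(values)})")
    print("\n".join(dot_plot(values)))
```

Printing the two groups one after the other makes it easy to compare the center and spread of the female and male birth weights.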

9 Stemplots:
◦Use with quantitative data
◦Gives a quick picture of the shape of a distribution
◦Shows symmetry, gaps, clusters, outliers
◦Use for small data sets

10 The accompanying observations are maximum flow rates (at 80 psi) for 34 different shower heads evaluated in a Consumer Reports article (July 1990). Construct two stem plots (one without splitting and one with split stems) and describe the most prominent features of the displays.
2.8 2.8 2.0 3.6 2.7 2.6 2.9 2.7 2.8 2.5
2.8 2.2 2.5 2.5 2.8 1.8 2.7 2.7 4.7 2.8
2.7 3.1 2.9 3.4 2.6 2.6 2.7 2.4 2.5 5.4
4.9 2.8 2.5
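The two stem plots asked for here could be produced along the following lines. This is a minimal sketch that uses the 33 flow rates actually listed on the slide, takes the whole number as the stem and the tenths digit as the leaf, and splits each stem into leaves 0-4 and 5-9. The function name `stemplot` and its `split` parameter are hypothetical.

```python
from collections import defaultdict

# The flow rates as listed on the slide (33 values appear in the transcript).
flows = [2.8, 2.8, 2.0, 3.6, 2.7, 2.6, 2.9, 2.7, 2.8, 2.5,
         2.8, 2.2, 2.5, 2.5, 2.8, 1.8, 2.7, 2.7, 4.7, 2.8,
         2.7, 3.1, 2.9, 3.4, 2.6, 2.6, 2.7, 2.4, 2.5, 5.4,
         4.9, 2.8, 2.5]

def stemplot(values, split=False):
    """Return stem plot lines; stems are whole numbers, leaves are tenths."""
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(round(v * 10), 10)
        half = 1 if (split and leaf >= 5) else 0  # second half holds leaves 5-9
        rows[(stem, half)].append(leaf)
    first, last = min(rows)[0], max(rows)[0]
    lines = []
    for stem in range(first, last + 1):
        for half in ((0, 1) if split else (0,)):
            leaves = "".join(str(leaf) for leaf in rows.get((stem, half), []))
            lines.append(f"{stem} | {leaves}".rstrip())
    return lines

print("\n".join(stemplot(flows)))
print()
print("\n".join(stemplot(flows, split=True)))
```

Without splitting, almost every leaf lands on stem 2, so the plot says little about shape. Splitting the stems spreads those leaves over two rows and makes the cluster in the upper 2s and the high values above 4 easier to see.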

11 Back to Back Stem Plot Literacy rates in Islamic nations
Country        Female percent   Male percent
Algeria        60               78
Bangladesh     31               50
Egypt          46               68
Iran           71               85
Jordan         86               96
Kazakhstan     99               100
Lebanon        82               95
Libya          71               92
Malaysia       85               92
Morocco        38               68
Saudi Arabia   70               84
Syria          63               89
Tajikistan     99               100
Tunisia        63               83
Turkey         78               94
Uzbekistan     99               100
Yemen          29               70
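A back-to-back stem plot of these literacy rates might be sketched as follows, with female leaves growing leftward from a shared column of stems and male leaves growing rightward. Stems are tens (so 100 has stem 10 and leaf 0). The function name `back_to_back` is hypothetical.

```python
# Literacy rates (percent) copied from the slide, in the same country order.
female = [60, 31, 46, 71, 86, 99, 82, 71, 85, 38, 70, 63, 99, 63, 78, 99, 29]
male = [78, 50, 68, 85, 96, 100, 95, 92, 92, 68, 84, 89, 100, 83, 94, 100, 70]

def back_to_back(left, right):
    """Return lines of a back-to-back stem plot with tens as stems."""
    both = left + right
    rows = []
    for stem in range(min(both) // 10, max(both) // 10 + 1):
        # Left leaves are written largest first so they increase away from the stem.
        l = "".join(str(v % 10) for v in sorted(left, reverse=True) if v // 10 == stem)
        r = "".join(str(v % 10) for v in sorted(right) if v // 10 == stem)
        rows.append((l, stem, r))
    width = max(len(l) for l, _, _ in rows)
    return [f"{l:>{width}} | {s:>2} | {r}".rstrip() for l, s, r in rows]

print("Female | stem | Male")
print("\n".join(back_to_back(female, male)))
```

Sharing one column of stems puts the two distributions on the same scale, so it is easy to see that the male rates sit higher overall than the female rates.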

12
Virginia Colleges             Tuition and fees ($)
Averett                       18430
Bluefield                     10615
Christendom                   14420
Christopher Newport           12626
DeVry                         12710
Eastern Mennonite             18220
Emory and Henry               16690
Ferrum                        16870
George Mason                  15816
Hampton                       14996
Hampden-Sydney                22944
Hollins                       21675
Liberty                       13150
Longwood                      12901
Lynchburg                     22885
Mary Baldwin                  19991
Marymount                     17090
Norfolk State                 14837
Old Dominion                  14688
Patrick Henry                 14645
Randolph-Macon                22625
Randolph-Macon Women’s        21740
Richmond                      34850
Roanoke                       22109
Saint Paul’s                  9420
Shenandoah                    19240
Sweet Briar                   21080
University of Virginia        22831
University of Virginia-Wise   14152
Virginia Commonwealth         17262
Virginia Intermont            15200
Virginia Military Institute   19991
Virginia State                11462
Virginia Tech                 16530
Virginia Union                12260
Washington and Lee            25760
William and Mary              21796

13 Histograms Used for large sets of data. A histogram breaks the range of values of a variable into classes and displays only the count or percent of the observations that fall into each class.
◦Divide the range of data into equal-width classes
◦Count the observations in each class (the ‘frequency’)
◦Draw bars to represent the classes, with height = frequency
◦Bars should touch (unlike bar graphs)
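The steps above can be sketched on the Virginia tuition figures from slide 12. The class width of $5,000 and the starting point of $5,000 are assumptions made for this illustration; other choices are equally defensible and will change the picture somewhat.

```python
# Tuition and fees ($) for the 37 Virginia colleges listed on slide 12.
tuition = [18430, 10615, 14420, 12626, 12710, 18220, 16690, 16870, 15816, 14996,
           22944, 21675, 13150, 12901, 22885, 19991, 17090, 14837, 14688, 14645,
           22625, 21740, 34850, 22109, 9420, 19240, 21080, 22831, 14152, 17262,
           15200, 19991, 11462, 16530, 12260, 25760, 21796]

width = 5000                                   # assumed equal class width
start = (min(tuition) // width) * width        # lower edge of the first class
n_classes = (max(tuition) - start) // width + 1

# Count the observations in each class; class i covers
# start + i*width up to, but not including, start + (i+1)*width.
counts = [0] * n_classes
for t in tuition:
    counts[(t - start) // width] += 1

# Draw one row per class; the bar length is the frequency.
for i, c in enumerate(counts):
    lo = start + i * width
    print(f"{lo:>6} to under {lo + width:<6} {'*' * c} ({c})")
```

Each value falls in exactly one class because a class includes its lower edge but not its upper edge, which is what lets the bars touch without double counting.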

14 You have probably heard that the distribution of scores on IQ tests follows a bell-shaped pattern. Let’s look at some actual IQ scores. Here are the scores of 60 fifth-grade students chosen at random from one school.
145 139 126 122 125 130 96 110 118
101 142 134 124 112 109 134 113 81 113
123 94 100 136 109 131 117 110 127 124
106 124 115 133 116 102 127 117 109 137
117 90 103 114 139 101 122 105 97 89
102 108 110 128 114 112 114 102 82 101

15 Distributions Look for the overall pattern and for striking deviations from that pattern. Describe the overall pattern by its shape, center, and spread, and note any outliers. Outlier: an individual value that falls outside the overall pattern.

16 SHAPE Does the distribution have one or more major peaks? A distribution with one peak is unimodal. Is the distribution approximately symmetric, or is it skewed in one direction? The basic shapes are symmetric, skewed right, and skewed left.

17 Outliers Look for points that are clearly apart from the body of the data, not just the most extreme observations in a distribution. We will discuss a test used to identify outliers in the next section. You should look for an explanation for any outlier; sometimes it is an error in recording the data. It is not a good idea to just delete or ignore outliers.

19 Relative Frequency Histograms do a good job of displaying the distribution of values of a quantitative variable. But to get information about an individual observation, such as its standing relative to the other observations, you should construct a relative cumulative frequency graph. Let’s look at the U.S. presidents example.

20
President        Age
Washington       57
J. Adams         61
Jefferson        57
Madison          57
Monroe           58
J.Q. Adams       57
Jackson          61
Van Buren        54
W.H. Harrison    68
Tyler            51
Polk             49
Taylor           64
Fillmore         50
Pierce           48
Buchanan         65
Lincoln          52
A. Johnson       56
Grant            46
Hayes            54
Garfield         49
Arthur           51
Cleveland        47
B. Harrison      55
Cleveland        55
McKinley         54
T. Roosevelt     42
Taft             51
Wilson           56
Harding          55
Coolidge         51
Hoover           54
F.D. Roosevelt   51
Truman           60
Eisenhower       61
Kennedy          43
L.B. Johnson     55
Nixon            56
Ford             61
Carter           52
Reagan           69
G.H.W. Bush      64
Clinton          46
G.W. Bush        54

21 1. Decide on class intervals and make a frequency table. Add three columns: relative frequency, cumulative frequency, and relative cumulative frequency.
Class   Frequency   Relative frequency   Cumulative frequency   Relative cumulative frequency
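One way to fill in this table for the presidents' ages on slide 20 is sketched below. The slide leaves the choice of class intervals open; the five-year classes starting at 40 used here are an assumption for illustration.

```python
# Ages copied from slide 20, in the order Washington through G.W. Bush.
ages = [57, 61, 57, 57, 58, 57, 61, 54, 68, 51, 49, 64, 50, 48, 65,
        52, 56, 46, 54, 49, 51, 47, 55, 55, 54, 42, 51, 56, 55, 51,
        54, 51, 60, 61, 43, 55, 56, 61, 52, 69, 64, 46, 54]

width, start = 5, 40                      # assumed classes: 40-44, 45-49, ...
n_classes = (max(ages) - start) // width + 1

# Frequency: how many ages fall in each class.
freq = [0] * n_classes
for a in ages:
    freq[(a - start) // width] += 1

# Build the rows: class label, frequency, relative frequency,
# cumulative frequency, relative cumulative frequency.
n = len(ages)
rows, running = [], 0
for i, f in enumerate(freq):
    running += f
    lo = start + i * width
    rows.append((f"{lo}-{lo + width - 1}", f, f / n, running, running / n))

print(f"{'Class':<8}{'Freq':>5}{'Rel':>9}{'Cum':>5}{'RelCum':>9}")
for label, f, rel, cum, rel_cum in rows:
    print(f"{label:<8}{f:>5}{rel:>9.1%}{cum:>5}{rel_cum:>9.1%}")
```

Plotting the relative cumulative frequency against the upper end of each class gives the relative cumulative frequency graph. With these classes, 21 of the 43 ages (about 48.8%) fall in the 50-54 class or below.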

22 Describing Distributions with Numbers

23
Two-seater Cars
Model                    City   Highway
Acura NSX                17     24
Audi TT Roadster         20     28
BMW Z4 Roadster          20     28
Cadillac XLR             17     25
Chevrolet Corvette       18     25
Dodge Viper              12     20
Ferrari 360 Modena       11     16
Ferrari Maranello        10     16
Ford Thunderbird         17     23
Honda Insight            60     66
Lamborghini Gallardo     9      15
Lamborghini Murcielago   9      13
Lotus Esprit             15     22
Maserati Spyder          12     17
Mazda Miata              22     28
Mercedes-Benz SL 500     16     23
Mercedes-Benz SL 600     13     19
Nissan 350Z              20     26
Porsche Boxster          20     29
Porsche Carrera 911      15     23
Toyota MR2               26     32

Minicompact Cars
Model                    City   Highway
Aston Martin Vanquish    12     19
Audi TT Coupe            21     29
BMW 325 CI               19     27
BMW 330 CI               19     28
BMW M3                   16     23
Jaguar XK8               18     26
Jaguar XKR               16     23
Lexus SC 430             18     23
Mini Cooper              25     32
Mitsubishi Eclipse       23     31
Mitsubishi Spyder        20     29
Porsche Cabriolet        18     26
Porsche Turbo 911        16     22

24 Construct a Stem Plot: this will help you describe the shape! In order to interpret measures of center and spread, you will need to think about the shape of the distribution.

25 Mean and Median Mean: the average value. Median: the middle value.
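In symbols, using the usual notation (not shown on the slide) of $x_1, x_2, \dots, x_n$ for the $n$ observations and $x_{(k)}$ for the $k$-th smallest observation, the two measures can be written as:

```latex
\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad
M =
\begin{cases}
x_{\left(\frac{n+1}{2}\right)} & \text{if } n \text{ is odd},\\[6pt]
\dfrac{x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}}{2} & \text{if } n \text{ is even}.
\end{cases}
```

So the mean uses the size of every observation, while the median depends only on the one or two observations in the middle of the ordered list.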

26 Two-seater Cars and Minicompact Cars: the same table of city and highway values shown on slide 23.

27 Looking at the data, are there any outliers? What happens to the mean if we remove the outlier? One weakness of the mean as a measure of center is that it is not resistant to outliers. The median is resistant to outliers.
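To see the effect, the sketch below uses the highway values for the two-seater cars and treats the Honda Insight (66) as the outlier, which is a judgment made from looking at the data rather than the result of a formal test. It uses Python's standard `statistics` module.

```python
from statistics import mean, median

# Highway values for the 21 two-seater cars, in table order.
highway = [24, 28, 28, 25, 25, 20, 16, 16, 23, 66, 15,
           13, 22, 17, 28, 23, 19, 26, 29, 23, 32]

# Drop the Honda Insight, the only car with a highway value of 66.
without_insight = [x for x in highway if x != 66]

print("With outlier:    mean =", round(mean(highway), 2),
      " median =", median(highway))
print("Without outlier: mean =", round(mean(without_insight), 2),
      " median =", median(without_insight))
```

Removing the one extreme car pulls the mean down by about two units, from roughly 24.7 to 22.6, while the median stays at 23. That is what it means for the median to be resistant and the mean not.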

28 Mean versus Median The mean and the median are the most common measures of center. The mean and median of a symmetric distribution are close together. In a skewed distribution, the mean is farther out in the ‘tail’ than is the median.

