Describing Distributions of Data


1 Describing Distributions of Data
Chapter 2 Describing Distributions of Data

2 Bar Graphs: Example 1

3 The distribution of a variable tells us what values the variable takes and how often it takes each value.
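One way to see a distribution is a frequency table that lists each value and how often it occurs. The Python sketch below is illustrative only. The function name `distribution` and the sample education labels are hypothetical, not data from these slides.

```python
from collections import Counter

def distribution(values):
    """Return (value, count, percent) for each distinct value, sorted by value."""
    counts = Counter(values)
    n = len(values)
    return [(value, count, 100 * count / n) for value, count in sorted(counts.items())]

# Hypothetical responses for a categorical variable such as level of education.
sample = ["HS", "College", "HS", "Graduate", "HS", "College"]
for value, count, pct in distribution(sample):
    print(f"{value:<10}{count:>3}{pct:>7.1f}%")
```

The same counts or percents are what a bar graph displays, one bar per category.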

4 Level of Education

5 Bar Graphs (useful info)
Bar graphs are useful for displaying the distribution of a categorical variable. They can also compare quantities that are not parts of a whole. Always label your axes and title your graph. Scale your axes evenly and label each category. Leave a space between the bars!

6 Example 2 (seat belts): Do most people wear seat belts?
Region | Percent wearing seat belts, 2008 | Percent wearing seat belts, 2003
Northeast | 78 | 74
Midwest | 79 | 75
South | 80 |
West | 93 | 84

7 Bar Graph (seat belts)

8 Dot plots: used to display quantitative variables
The dot plot is the simplest graph for displaying the distribution of a quantitative variable, but it does not work well with a large set of data. To make one, draw and label a number line from the minimum to the maximum value. Place one dot above the value of each observation, and stack repeated values evenly on top of each other.
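The dot plot steps above can be sketched in text form. This assumes integer-valued data (such as goals per game). `dot_plot` is a hypothetical helper, and `*` stands in for a dot.

```python
from collections import Counter

def dot_plot(values):
    """Return text lines for a dot plot of integer data.

    The bottom line is a number line from min to max; each observation adds
    one '*' above its value, and repeated values stack upward.
    """
    counts = Counter(values)
    axis = list(range(min(values), max(values) + 1))
    width = max(len(str(v)) for v in axis)  # column width fits the longest label
    lines = []
    for level in range(max(counts.values()), 0, -1):  # top row first
        row = " ".join(("*" if counts[v] >= level else " ").center(width) for v in axis)
        lines.append(row.rstrip())
    lines.append(" ".join(str(v).center(width) for v in axis))
    return lines

for line in dot_plot([1, 2, 2, 4, 3, 2, 0]):
    print(line)
```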

9 Table 2.3 Highway Gas Mileage for model year 2009 midsize cars

10 What do I see? The purpose of a graph is to help us understand the data. Look for the overall pattern, and look for striking deviations from that pattern, such as clusters and outliers. An outlier is an individual observation that falls outside the overall pattern of the graph. Once you spot an outlier, look for an explanation for it.

11 Example: How good is the U.S. women’s soccer team?
The number of goals scored by the U.S. women’s soccer team in 36 games played during the 2008 season is shown below: What does this data tell us about the performance of the U.S. women’s team in 2008?

12 Stem plots: When the values are too spread out, use a stem plot. Separate each observation into a stem (consisting of all but the final digit) and a leaf (the final digit). Stems can have as many digits as needed, but each leaf contains only a single digit. Write the stems in a vertical column with the smallest at the top. Write each leaf in the row to the right of its stem, and sort the leaves in increasing order as they move out from the stem.
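A minimal sketch of the stem plot steps, assuming non-negative integer data. The helper `stem_plot` and the sample values are hypothetical. Empty stems are kept so that gaps in the data stay visible.

```python
from collections import defaultdict

def stem_plot(values):
    """Return text lines of a stem plot for non-negative integers.

    Stem = all but the final digit, leaf = final digit. Stems run from smallest
    (top) to largest (bottom); leaves are sorted in increasing order.
    """
    leaves = defaultdict(list)
    for x in values:
        stem, leaf = divmod(x, 10)
        leaves[stem].append(leaf)
    width = len(str(max(leaves)))  # right-align stems in one column
    lines = []
    for stem in range(min(leaves), max(leaves) + 1):  # keep empty stems visible
        row = "".join(str(leaf) for leaf in sorted(leaves.get(stem, [])))
        lines.append(f"{stem:>{width}} | {row}".rstrip())
    return lines

for line in stem_plot([12, 15, 21, 9, 34, 31]):
    print(line)
```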

13 Where do older folks live?

14 SOCS
S: Spread, from the smallest to the largest value
O: Outliers
C: Center, the midpoint of the distribution
S: Shape, such as single-peaked, symmetric, etc.

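15 Symmetric and skewed distributions
Symmetric: the right and left sides of the graph are approximately mirror images of each other. Skewed right: the right side of the graph (containing the larger values) extends much farther out than the left side. Skewed left: the left side of the graph extends much farther out than the right side.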

16 Histograms: used when you have large amounts of data
Divide the range of the data into classes of equal width. Count the number of individuals in each class. Draw the histogram, with one bar per class whose height is the count. There is no space between the bars!
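The counting step can be sketched as follows. The sketch assumes classes that include their left endpoint and exclude their right one, so a value on a boundary goes into the higher class. `histogram_counts` and the data are hypothetical.

```python
def histogram_counts(values, start, width):
    """Count observations in equal-width classes.

    Classes are [start, start+width), [start+width, start+2*width), ...
    Assumes start <= min(values).
    """
    n_classes = int((max(values) - start) // width) + 1
    counts = [0] * n_classes
    for x in values:
        counts[int((x - start) // width)] += 1  # index of the class containing x
    return [(start + i * width, start + (i + 1) * width, c) for i, c in enumerate(counts)]

# Hypothetical data; print one text bar of '#' per class.
for low, high, count in histogram_counts([1, 2, 5, 7, 9, 10], start=0, width=5):
    print(f"[{low}, {high}) {'#' * count}")
```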

17 Where do older folks live?

18 2.2 Describing Distributions with Numbers

19 Measuring Center: the median
The median, M, is the midpoint of a distribution. Arrange all the observations in order of size, from smallest to largest. If the number of observations n is odd, the median M is the center observation in the ordered list. If the number of observations n is even, the median M is the average of the two center observations in the ordered list.
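The median rule above translates directly into code. This is a minimal Python sketch. The helper name `median` is ours, and the standard library's `statistics.median` follows the same odd/even rule.

```python
def median(values):
    """Median M: the center observation (n odd) or the average of the two centers (n even)."""
    xs = sorted(values)  # arrange observations from smallest to largest
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]
    return (xs[mid - 1] + xs[mid]) / 2

print(median([5, 1, 3]))      # odd n: 3
print(median([4, 1, 3, 2]))   # even n: 2.5
```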

20 Example

21 How many text messages?

22 Measuring spread with quartiles
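If we choose the median (the midpoint) to describe the center, the quartiles give us a natural way to measure spread. The first quartile Q1 is the median of the observations that lie to the left of the median's position in the ordered list, and the third quartile Q3 is the median of the observations to the right of it. The interquartile range (IQR) is the distance between them: IQR = Q3 − Q1.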
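The quartile rule above can be sketched as follows. It assumes the "median of each half" convention, which leaves the median itself out of both halves when n is odd. Statistical software may use other quartile conventions (for example, Python's `statistics.quantiles`), so its results can differ slightly. `quartiles` is a hypothetical helper.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, M, Q3) using the 'median of each half' rule.

    For odd n the median itself is left out of both halves. Needs n >= 2.
    """
    xs = sorted(values)
    half = len(xs) // 2
    lower = xs[:half]                                      # left of the median
    upper = xs[half + 1:] if len(xs) % 2 else xs[half:]    # right of the median
    return median(lower), median(xs), median(upper)

q1, m, q3 = quartiles([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(q1, m, q3, "IQR =", q3 - q1)
```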

23 How many text messages?

24 Five-number summary: Minimum, Q1, M, Q3, Maximum
These five numbers offer a reasonably complete description of center and spread.
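A short sketch that collects the five numbers in order. It assumes the same "median of each half" quartile rule as the quartile slide. `five_number_summary` is a hypothetical name, and the data are made up.

```python
from statistics import median

def five_number_summary(values):
    """Return (Minimum, Q1, M, Q3, Maximum); quartiles are medians of the lower/upper halves."""
    xs = sorted(values)
    half = len(xs) // 2
    lower = xs[:half]
    upper = xs[half + 1:] if len(xs) % 2 else xs[half:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

print(five_number_summary([3, 9, 1, 7, 5, 2, 8, 4, 6]))
```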

25 Boxplots: a graph of the five-number summary
A central box is drawn from the first quartile to the third quartile. A line in the box marks the median. Lines (whiskers) extend from the box out to the smallest and largest observations that are not outliers.
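The pieces of a boxplot can be computed before drawing anything. This sketch assumes the 1.5 × IQR rule for outliers and the "median of each half" quartile rule. `boxplot_parts` and its dictionary keys are hypothetical names. The whiskers stop at the most extreme observations that are not flagged as outliers.

```python
from statistics import median

def boxplot_parts(values):
    """Return the box (Q1, M, Q3), whisker ends, and outliers for a boxplot."""
    xs = sorted(values)
    half = len(xs) // 2
    q1 = median(xs[:half])
    q3 = median(xs[half + 1:] if len(xs) % 2 else xs[half:])
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in xs if low_fence <= x <= high_fence]   # not outliers
    outliers = [x for x in xs if x < low_fence or x > high_fence]
    return {"box": (q1, median(xs), q3), "whiskers": (inside[0], inside[-1]), "outliers": outliers}

print(boxplot_parts([1, 2, 3, 4, 5, 6, 7, 8, 9, 30]))
```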

26 Boxplot: how many text messages?

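27 Identifying outliers: the 1.5 × IQR rule
An observation is a suspected outlier if it falls more than 1.5 × IQR above the third quartile or below the first quartile, that is, if $x > Q_3 + 1.5(\mathrm{IQR})$ or $x < Q_1 - 1.5(\mathrm{IQR})$.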
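The 1.5 × IQR rule as a short function. It assumes quartiles found as medians of the lower and upper halves. `suspected_outliers` is a hypothetical helper that returns both fences so you can check the arithmetic by hand.

```python
from statistics import median

def suspected_outliers(values):
    """Return (low fence, high fence, outliers) under the 1.5 * IQR rule."""
    xs = sorted(values)
    half = len(xs) // 2
    q1 = median(xs[:half])
    q3 = median(xs[half + 1:] if len(xs) % 2 else xs[half:])
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return low, high, [x for x in xs if x < low or x > high]

print(suspected_outliers([-20, 1, 2, 3, 4, 5, 6, 7, 8, 9]))
```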

28 Measuring center: the mean
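The mean is the most common measure of center, and it goes hand in hand with the standard deviation as a measure of spread. For $n$ observations $x_1, x_2, \dots, x_n$, the mean is
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$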

29 Resistant vs. not resistant
The median is RESISTANT; the mean is NOT RESISTANT. The median sits in the middle of the ordered data and ignores how far out the values at each end of the distribution lie, so outliers have little effect on it. The mean incorporates every value in the data set, so outliers can have a large effect on it.
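A quick numerical check of resistance, using made-up data: add one extreme value and compare how far the mean and the median move. The values below are hypothetical. `statistics.mean` and `statistics.median` are from Python's standard library.

```python
from statistics import mean, median

data = [2, 3, 3, 4, 5]       # hypothetical observations
with_outlier = data + [40]   # the same observations plus one extreme value

for label, xs in [("without outlier", data), ("with outlier", with_outlier)]:
    print(f"{label}: mean = {mean(xs):.2f}, median = {median(xs)}")
```

The mean jumps from 3.40 to 9.50, while the median only moves from 3 to 3.5.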

30 Calculate the Median and Mean:
What do you see?

31 When to use Median vs. Mean?
The median is preferred when the data are skewed or have outliers. The mean is preferred when the data are roughly symmetric.

32 Standard deviation: If you summarize the data using the mean for center, use the standard deviation to measure spread around the mean. The idea of the standard deviation is to give, roughly, the average distance of the observations from the mean. For a sample, the standard deviation is denoted s and is computed as $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$, where $\bar{x}$ is the mean. s = 0 only when there is no variability, that is, when all observations have the same value. As the observations become more spread out about their mean, s gets larger.
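The formula for s written out step by step. This is a minimal sketch with a hypothetical helper name. Python's `statistics.stdev` computes the same sample quantity (dividing by n − 1).

```python
import math

def sample_sd(values):
    """Sample standard deviation s = sqrt(sum((x - xbar)**2) / (n - 1)); needs n >= 2."""
    n = len(values)
    xbar = sum(values) / n                                   # the mean
    squared_deviations = sum((x - xbar) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))

print(sample_sd([5, 5, 5, 5]))               # no variability: 0.0
print(sample_sd([2, 4, 4, 4, 5, 5, 7, 9]))   # more spread out: larger s
```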

33 Metabolism

34 Let’s summarize!!

35 Find 5 number summary Find the mean and standard deviation Are there any outliers?

36 Investigating the effect of outliers on the summary statistics
1. Calculate the mean, standard deviation, and five-number summary with and without the outliers. Compare the measures. What happens?


