Describing Distributions of Data


Chapter 2: Describing Distributions of Data

Bar Graphs: Example 1

The distribution of a variable tells us what values the variable takes and how often it takes these values.

Level of Education

Bar Graphs (useful info): Bar graphs are useful for displaying distributions of categorical variables. They can also compare quantities that are not parts of a whole. Always label your axes and title your graph. Scale your axes equally and label each category. Leave a space between each bar!

Example 2 (seat belts): Do most people wear seat belts?
Region       Percent wearing seat belts, 2008    Percent wearing seat belts, 2003
Northeast    78                                  74
Midwest      79                                  75
South        80
West         93                                  84

Bar Graph (seat belts)

Dot plots: Used to display quantitative variables. The dot plot is the simplest graph for displaying the distribution of a quantitative variable, but it does not work well with a large set of data. Draw and label a number line from the minimum to the maximum. Place one dot per observation above its value. Stack repeated values evenly on top of each other.

Table 2.3 Highway Gas Mileage for model year 2009 midsize cars

What do I see? The purpose of the graph is to help us understand the data. Look for an overall pattern. Look for striking deviations from that pattern. Look for clusters. Look for outliers: an outlier is an individual observation that falls outside the overall pattern of the graph. Once you spot an outlier, look for an explanation of it.

Example: How good is the U.S. women's soccer team? The number of goals scored by the U.S. women's soccer team in each of the 36 games played during the 2008 season is shown below: 4 4 1 4 2 4 2 6 3 3 1 3 5 6 2 1 2 1 4 1 1 1 0 1 4 2 4 1 2 1 2 3 0 1 1 1 What do these data tell us about the performance of the U.S. women's team in 2008?
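As a rough sketch of the dot plot steps, the goals data above can be tallied and drawn as text. The function name `text_dotplot` is made up for this illustration, and the plot is turned sideways (one row per value instead of one column) so it prints in a terminal.

```python
from collections import Counter

# Goals per game, copied from the slide (36 games).
goals = [4, 4, 1, 4, 2, 4, 2, 6, 3, 3, 1, 3, 5, 6, 2, 1, 2, 1,
         4, 1, 1, 1, 0, 1, 4, 2, 4, 1, 2, 1, 2, 3, 0, 1, 1, 1]


def text_dotplot(values):
    """Return one row per integer from min to max, one dot per observation."""
    tally = Counter(values)  # missing values count as 0
    rows = []
    for value in range(min(values), max(values) + 1):
        rows.append(f"{value:>2} | " + "." * tally[value])
    return "\n".join(rows)


print(text_dotplot(goals))
```

Read sideways, the rows show a peak at 1 goal and a tail stretching out to 6 goals.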

Stem plots: When the values are too spread out for a dot plot, use a stem plot. Separate each observation into a stem (consisting of all but the final digit) and a leaf (the final digit). Stems can have as many digits as needed; each leaf contains only a single digit. Write the stems in a vertical column with the smallest on top. Write each leaf in the row to the right of its stem. Sort the leaves in increasing order as they move out from the stem.
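The stem plot recipe can be sketched in a few lines. This is only an illustration: the function name `stemplot` and the sample values are hypothetical, and the code assumes non-negative whole numbers so that the leaf is simply the last digit.

```python
from collections import defaultdict


def stemplot(values):
    """Build a text stem plot for non-negative integers."""
    rows = defaultdict(list)
    for value in sorted(values):          # sorting keeps leaves in increasing order
        stem, leaf = divmod(value, 10)    # stem = all but final digit, leaf = final digit
        rows[stem].append(leaf)
    lines = []
    # Include every stem from smallest to largest, even if it has no leaves.
    for stem in range(min(rows), max(rows) + 1):
        leaves = "".join(str(leaf) for leaf in rows.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}")
    return "\n".join(lines)


# Hypothetical data, not taken from the slides.
sample = [12, 25, 31, 18, 25, 47, 30, 22, 19, 25]
print(stemplot(sample))
```

Keeping the empty stems in the column preserves gaps in the data, which helps when judging shape.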

Where do older folks live?

SOCS
S: spread (smallest to largest values)
O: outliers
C: center (midpoint of the distribution)
S: shape (single peak, symmetric, etc.)

Symmetric and skewed distributions: A distribution is symmetric if the right and left sides of the graph are approximately mirror images of each other. It is skewed right if the right side (the larger values) extends much farther out than the left side, and skewed left if the left side extends much farther out than the right.

Histograms: Used when you have large amounts of data. Divide the range of the data into classes of equal width. Count the number of individuals in each class. Draw the histogram. There is no space between the bars!
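The counting step for a histogram might look like the sketch below. The function name `class_counts`, its parameters, and the sample values are all invented for illustration. It also assumes each class includes its left endpoint and excludes its right endpoint, which is one common convention.

```python
def class_counts(values, start, width, n_classes):
    """Count how many values fall in each equal-width class.

    Class i covers start + i*width up to, but not including,
    start + (i + 1)*width. Values outside all classes are ignored.
    """
    counts = [0] * n_classes
    for value in values:
        index = int((value - start) // width)
        if 0 <= index < n_classes:
            counts[index] += 1
    return counts


# Hypothetical data, not taken from the slides.
values = [3, 7, 12, 15, 18, 21, 22, 25, 29, 34]
counts = class_counts(values, start=0, width=10, n_classes=4)

# Print a sideways text histogram, one row per class.
for i, count in enumerate(counts):
    print(f"{i * 10:>2} to <{(i + 1) * 10:<2} | " + "#" * count)
```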

Where do older folks live?

2.2 Describing Distributions with Numbers

Measuring center: the median. The median, M, is the midpoint of a distribution. Arrange all the observations in order of size, from smallest to largest. If the number of observations n is odd, the median M is the center observation in the ordered list. If the number of observations n is even, the median M is the average of the two center observations in the ordered list.

Example: 8 4 9 1 3 8 4 1 9 1 5
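A minimal sketch of the median rule follows. It treats the numbers on the example slide as a single list of 11 observations, which is an assumption because the slide's original layout is lost. The helper name `median_of` is made up here.

```python
def median_of(values):
    """Median by the rule on the slide: sort, then take the middle."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd n: the center observation
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average the two centers


# Assumption: the slide's numbers form one data set of 11 values.
data = [8, 4, 9, 1, 3, 8, 4, 1, 9, 1, 5]
print(sorted(data))      # 1 1 1 3 4 4 5 8 8 9 9
print(median_of(data))   # the 6th of 11 ordered values
```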

How many text messages?

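Measuring spread with quartiles: If we choose the median (the midpoint) to describe the center, the quartiles give us a natural way to measure spread. The first quartile Q1 is the median of the observations below the median in the ordered list, and the third quartile Q3 is the median of the observations above it. Interquartile range (IQR): IQR = Q3 - Q1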
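One way to sketch the quartiles is to split the ordered list into a lower and an upper half and take the median of each, leaving the overall median out when n is odd. That split rule is an assumption about the textbook's convention; software packages often use other rules and can give slightly different values. The function name and data are hypothetical.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q3) as medians of the lower and upper halves.

    Assumes at least two observations. When n is odd, the overall
    median is excluded from both halves.
    """
    ordered = sorted(values)
    n = len(ordered)
    lower = ordered[: n // 2]
    upper = ordered[(n + 1) // 2:]
    return median(lower), median(upper)


# Hypothetical data, not taken from the slides.
data = [7, 1, 12, 4, 10, 3, 8]
q1, q3 = quartiles(data)
iqr = q3 - q1
print(q1, q3, iqr)
```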

How many text messages?

Five-number summary: Minimum, Q1, M, Q3, Maximum. These five numbers offer a reasonably complete description of center and spread.

Boxplots: A boxplot is a graph of the five-number summary. A central box is drawn from the first quartile to the third quartile. A line in the box marks the median. Lines extend from the box out to the smallest and largest observations that are not outliers.
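As an illustration, the five-number summary can be computed for the soccer goals listed earlier in the deck. The function name is made up, and the quartiles again assume the split-halves rule (median of each half of the ordered data).

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, Q1, M, Q3, maximum) using the split-halves rule."""
    ordered = sorted(values)
    n = len(ordered)
    lower = ordered[: n // 2]          # observations below the median position
    upper = ordered[(n + 1) // 2:]     # observations above the median position
    return (ordered[0], median(lower), median(ordered),
            median(upper), ordered[-1])


# Goals per game for the 36 games in the soccer example.
goals = [4, 4, 1, 4, 2, 4, 2, 6, 3, 3, 1, 3, 5, 6, 2, 1, 2, 1,
         4, 1, 1, 1, 0, 1, 4, 2, 4, 1, 2, 1, 2, 3, 0, 1, 1, 1]

print(five_number_summary(goals))
```

Under that rule the summary for the goals data works out to 0, 1, 2, 4, 6.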

Boxplot: how many text messages?

Identifying outliers: Call an observation an outlier if it falls more than 1.5(IQR) above the third quartile or below the first quartile, that is, if it is greater than Q3 + 1.5(IQR) or less than Q1 - 1.5(IQR).

Measuring center: the mean. The mean is the most common measure of center, and it goes hand in hand with the standard deviation as a measure of spread. Denoted by x̄ (x-bar): x̄ = (x1 + x2 + ... + xn) / n, where n is the number of observations.

Resistant vs. not resistant: The median is RESISTANT; the mean is NOT RESISTANT. The median sits right in the middle of the ordered data and does not depend on how extreme the values at each end of the distribution are, so it is not affected by outliers. The mean incorporates every value in the data set, so outliers can have a large effect on it.
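The 1.5(IQR) rule can be sketched as a small filter. The function name, the data, and the quartile values passed in are hypothetical. For this list, Q1 = 3 and Q3 = 7 come from taking the median of each half of the ordered data.

```python
def find_outliers(values, q1, q3):
    """Return the observations beyond the 1.5 * IQR fences."""
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]


# Hypothetical data: ordered list 2 3 4 5 6 7 30, so Q1 = 3 and Q3 = 7.
data = [2, 3, 4, 5, 6, 7, 30]
print(find_outliers(data, q1=3, q3=7))   # fences are -3 and 13
```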
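A quick way to see resistance is to change one value into an extreme one and recompute both measures. The two lists below are hypothetical.

```python
from statistics import mean, median

# Hypothetical data: the second list replaces the largest value with an outlier.
typical = [2, 3, 4, 5, 6]
with_outlier = [2, 3, 4, 5, 60]

print(mean(typical), median(typical))            # both equal 4
print(mean(with_outlier), median(with_outlier))  # mean jumps, median stays at 4
```

The single extreme value pulls the mean from 4 up to 14.8, while the median does not move.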

Calculate the Median and Mean: What do you see??

When to use median vs. mean? The median is preferred when the data are skewed or have outliers. The mean is preferred when the data are roughly symmetric.

Standard deviation: If you are summarizing the data using the mean for center, you will want to use the standard deviation to measure spread around the mean. The idea of the standard deviation is to give roughly the average distance of observations from the mean. Use s to denote the standard deviation of a sample. s = 0 only when there is no variability (when all observations have the same value). As the observations become more spread out about their mean, s gets larger.
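A sketch of the calculation, assuming the usual sample formula in which the sum of squared deviations from the mean is divided by n - 1 before taking the square root (the slides do not show the formula). The function name and data are hypothetical.

```python
import math


def sample_sd(values):
    """Sample standard deviation s, dividing by n - 1 (needs n >= 2)."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))


# Hypothetical data with mean 5 and squared deviations summing to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_sd(data))       # sqrt(32 / 7)
print(sample_sd([3, 3, 3]))  # no variability, so s is 0
```

Because the deviations are squared and the divisor is n - 1, s is not literally the average distance from the mean, which is why "roughly" is the safer description.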

metabolism

Let’s summarize!!

Find the five-number summary. Find the mean and standard deviation. Are there any outliers?

Investigating the effect of outliers on the summary statistics: 1. Calculate the mean, standard deviation, and five-number summary with and without the outliers. Compare the measures. What happens?
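The comparison could be set up along these lines. The data set and the helper `summarize` are hypothetical, and only the mean, standard deviation, and median are shown to keep the sketch short.

```python
from statistics import mean, median, stdev


def summarize(values):
    """Collect a few summary statistics in a dictionary."""
    return {"mean": mean(values), "sd": stdev(values), "median": median(values)}


# Hypothetical data with one outlier (30).
data = [2, 3, 4, 5, 6, 7, 30]
trimmed = [x for x in data if x != 30]

with_outlier = summarize(data)
without_outlier = summarize(trimmed)

for name in ("mean", "sd", "median"):
    print(f"{name:>6}: {with_outlier[name]:.2f} with, "
          f"{without_outlier[name]:.2f} without")
```

In this made-up example, dropping the outlier shrinks the mean and standard deviation noticeably, while the median changes only from 5 to 4.5.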