Unit 4 Statistical Analysis Data Representations


3 RULES of EXPLORATORY DATA ANALYSIS
1. MAKE A PICTURE: find patterns that are difficult to see in a chart.
2. MAKE A PICTURE: show important features in a graph.
3. MAKE A PICTURE: communicate your data to others.

Concepts to know! Bar graph, histogram, dot plot, stem-and-leaf plot, box plot, scatter plot.

Categorical Data The objects being studied are grouped into categories based on some qualitative trait. The resulting data are merely labels or categories.

Categorical Data (Single Variable): Eye Color

Eye color            BLUE           BROWN          GREEN
Frequency (counts)   20             50             5
Relative frequency   20/75 = 0.27   50/75 = 0.67   5/75 = 0.07
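The relative frequencies in the eye-color slide can be checked with a few lines of Python. This is only a minimal sketch: the counts come from the slide, while the variable names are illustrative choices.

```python
from collections import Counter

# Eye-color counts taken from the slide above.
counts = Counter({"blue": 20, "brown": 50, "green": 5})

total = sum(counts.values())  # number of people counted in all

# Relative frequency = category count / total count.
relative = {color: n / total for color, n in counts.items()}

for color, n in counts.items():
    print(f"{color:<6} count={n:>3}  relative frequency={relative[color]:.2f}")
```

Because each value is rounded to two decimals, the printed relative frequencies may not add up to exactly 1.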

Pie Chart (Data is Counts or Percentages)

Bar Graph Summarizes categorical data. Horizontal axis represents categories, while vertical axis represents either counts (“frequencies”) or percentages (“relative frequencies”). Used to illustrate the differences in percentages (or counts) between categories.

Bar Graph (Shows distribution of data) Bin Width

Contingency Table (how data is distributed across multiple variables): Survival by Class

Survival   First   Second   Third   Crew   Total
ALIVE        203      118     178    212     711
DEAD         122      167     528    673    1490
Total        325      285     706    885    2201

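What can go wrong when working with categorical data?
Pay attention to the variables and what the percentages represent. Using the counts in the contingency table above, "203/325 = 62.5% of first-class passengers survived" is different from "203/711 = 28.6% of survivors were first-class passengers"!
Make sure you have a reasonably large data set. "67% of the rats tested died" sounds impressive until you learn that only three rats were tested: two died and one lived.
Marginal distribution: the totals in the margins of a contingency table give the distribution of each variable on its own.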
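To see how much the choice of denominator matters, here is a small sketch that computes three different percentages from the same cell of the class-by-survival table. The counts are from the slide; the variable names are illustrative.

```python
# Counts from the class-by-survival contingency table above.
classes = ["First", "Second", "Third", "Crew"]
alive = [203, 118, 178, 212]
dead = [122, 167, 528, 673]

# Marginal totals: one per class (column) and one for the ALIVE row.
class_totals = [a + d for a, d in zip(alive, dead)]
alive_total = sum(alive)
grand_total = alive_total + sum(dead)

i = classes.index("First")

# Same numerator (first-class survivors), three different denominators.
pct_first_who_survived = 100 * alive[i] / class_totals[i]  # out of first class
pct_survivors_in_first = 100 * alive[i] / alive_total      # out of all survivors
pct_of_everyone = 100 * alive[i] / grand_total             # out of everyone aboard

print(f"{pct_first_who_survived:.1f}% of first class survived")
print(f"{pct_survivors_in_first:.1f}% of survivors were first class")
print(f"{pct_of_everyone:.1f}% of everyone was a first-class survivor")
```

Each percentage answers a different question, so always state which group the percentage is taken out of.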

Analogy: Bar chart is to categorical data as histogram is to ... quantitative data.

Histogram

Histogram
1. Divide the measurement scale into equal-sized intervals (the BIN WIDTH).
2. Determine the number (or percentage) of measurements falling into each interval.
3. Draw a bar for each interval so that the bars' heights represent the number (or percent) falling into the intervals.
4. Label and title appropriately.
http://www.stat.sc.edu/~west/javahtml/Histogram.html

Histogram Use common sense in determining number of categories to use. Between 6 & 15 intervals is preferable (Trial-and-error works fine, too.)
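The binning step behind a histogram can be sketched in plain Python. The data values, the bin width of 5, and the helper name `histogram_counts` are all made up for illustration; trying other bin widths shows how the picture changes.

```python
def histogram_counts(values, bin_width, start):
    """Count how many values fall in each bin [left, left + bin_width)."""
    n_bins = int((max(values) - start) // bin_width) + 1
    counts = [0] * n_bins
    for v in values:
        counts[int((v - start) // bin_width)] += 1
    return counts

# Made-up measurements, for illustration only.
data = [2, 3, 3, 5, 7, 8, 8, 9, 12, 14]
bin_width, start = 5, 0
counts = histogram_counts(data, bin_width, start)

# Text histogram: one row per bin, one '#' per measurement in that bin.
for k, c in enumerate(counts):
    left = start + k * bin_width
    print(f"[{left:>2}, {left + bin_width:>2})  {'#' * c}")
```

With so few values a bin width of 5 gives only three bars, which illustrates the "too few categories" problem on a small scale.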

Too few categories

Too many categories

Dot Plot Summarizes quantitative data. Horizontal axis represents measurement scale. Plot one dot for each data point.
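A dot plot is simple enough to draw as text. The sketch below assumes small whole-number data; the data values and the function name `dot_plot` are hypothetical.

```python
from collections import Counter

def dot_plot(values):
    """Return text rows: stacked dots on top, the number line at the bottom."""
    counts = Counter(values)
    lo, hi = min(values), max(values)
    tallest = max(counts.values())
    rows = []
    # Build the plot from the top level down; each column is 3 characters wide.
    for level in range(tallest, 0, -1):
        rows.append("".join(" * " if counts[v] >= level else "   "
                            for v in range(lo, hi + 1)))
    # Horizontal axis: every whole number from the minimum to the maximum.
    rows.append("".join(f"{v:^3}" for v in range(lo, hi + 1)))
    return rows

# Made-up data, for illustration only.
data = [1, 2, 2, 3, 3, 3, 5]
rows = dot_plot(data)
print("\n".join(rows))
```

Every data point appears as exactly one dot, and the empty column above 4 shows up as a visible gap.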

Dot Plot

Stem-and-Leaf Plot Summarizes quantitative data. Each data point is broken down into a “stem” and a “leaf.” First, “stems” are aligned in a column. Then, “leaves” are attached to the stems.

High temperatures for the last week: 72, 78, 87, 90, 88, 86, 87, 89

Stem | Leaf
7    | 2 8
8    | 6 7 7 8 9
9    | 0

Key: 7 | 2 = 72 degrees
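The stem-and-leaf construction can be sketched as follows, using the temperatures from the slide. Splitting at the tens digit and the function name `stem_and_leaf` are assumptions that suit two-digit data like this.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted list of leaves (ones digits)."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # e.g. 72 -> stem 7, leaf 2
        plot[stem].append(leaf)
    return dict(plot)

# High temperatures from the slide above.
temps = [72, 78, 87, 90, 88, 86, 87, 89]
plot = stem_and_leaf(temps)

for stem in sorted(plot):
    print(stem, "|", " ".join(str(leaf) for leaf in plot[stem]))
```

Because the leaves come out sorted, the plot doubles as an ordered list of the data, which is why it is handy for finding the median.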

Box Plot Summarizes quantitative data. Vertical (or horizontal) axis represents measurement scale. Lines in box represent the 25th percentile (“first quartile”), the 50th percentile (“median”), and the 75th percentile (“third quartile”), respectively.

Box Plot

5 Number Summary: Minimum, Q1 (25th percentile), Median (50th percentile), Q3 (75th percentile), Maximum

An aside... Roughly speaking: The “25th percentile” is the number such that 25% of the data points fall below the number. The “median” or “50th percentile” is the number such that half of the data points fall below the number. The “75th percentile” is the number such that 75% of the data points fall below the number.
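A five-number summary can be computed by hand or in code. The sketch below uses the "median of each half" rule for the quartiles, which is one common classroom convention; software packages often use slightly different percentile rules, so small differences in Q1 and Q3 are normal. The function name is illustrative, and the data are the high temperatures from the stem-and-leaf slide.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # For an odd count, leave the middle value out of both halves.
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return data[0], median(lower), median(data), median(upper), data[-1]

temps = [72, 78, 87, 90, 88, 86, 87, 89]
summary = five_number_summary(temps)
print(summary)
```

These five numbers are exactly what a box plot draws: the ends of the whiskers (when there are no outliers), the edges of the box, and the line inside it.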

Using Box Plots to Compare Outliers

Strengths and Weaknesses of Graphs for Quantitative Data
Histograms: Use intervals. Good for judging the “shape” of the data. Not good for small data sets.
Stem-and-Leaf Plots: Good for sorting data (finding the median). Not good for large data sets.

Strengths and Weaknesses of Graphs for Quantitative Data
Dotplots: Use individual data points. Good for showing general descriptions of center and variation. Not good for judging shape for large data sets.
Boxplots: Good for showing center, spread, and outliers precisely. Not good for judging shape or for overall data analysis.

Analogy: Contingency table is to categorical data with two variables as scatterplot is to ... quantitative data with two variables.

Scatter Plots

Scatter Plots Summarizes the relationship between two quantitative variables. Horizontal axis represents one variable and vertical axis represents second variable. Plot one point for each pair of measurements.

No relationship

Summary Many possible types of graphs. Use common sense in reading graphs. When creating graphs, don’t summarize your data too much or too little. When creating graphs, label everything for others. Remember you are trying to communicate something to others!

How to Compare Distributions When you’re visualizing data, you have lots of options for how to display it. When comparing data sets on the same type of graph, it is important to focus on the relevant qualities. In order to do this, we need to CUSS!

Center: The area where about half of the observations (data) are on either side.
Unusual features: Gaps (where there is no data) and outliers.
Spread: The variability of the data. If the data has a wide range, it has a larger spread. If the data has a narrow range, it has a smaller spread.
Shape: Described by symmetry, skewness, number of peaks, etc.
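Some of the CUSS qualities can be backed up with numbers. The sketch below is only one possible approach. It uses the median for center and the range and interquartile range for spread. For outliers it uses the common 1.5 × IQR rule, which is an assumption and does not appear in the slides. The sign of mean minus median serves as a rough hint about skew. The data and the function name are made up.

```python
from statistics import mean, median, quantiles

def cuss_numbers(values):
    """Rough numeric companions to Center, Unusual features, Spread, Shape."""
    data = sorted(values)
    q1, _, q3 = quantiles(data, n=4)  # quartile cut points (library default rule)
    iqr = q3 - q1
    # Assumed convention: points beyond 1.5 * IQR from the box are outliers.
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "center": median(data),
        "outliers": [v for v in data if v < low_fence or v > high_fence],
        "spread_range": data[-1] - data[0],
        "spread_iqr": iqr,
        # Positive suggests a tail to the right, negative a tail to the left.
        "mean_minus_median": mean(data) - median(data),
    }

# Made-up data with one unusually large value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 40]
result = cuss_numbers(data)
print(result)
```

Numbers like these support a description but do not replace the picture: gaps and the number of peaks still have to be read from the graph.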

Center The center is probably at about 4. We’d need to do some calculations to be more precise. The center is at 5.

Unusual features Gap – a space in the data, somewhat balanced on the sides Outlier – a big gap with one, maybe two, pieces of data on the far side

Spread (Less Spread vs. More Spread) This measure mostly exists as a comparative, not an absolute.

Shape Symmetrical Uniform

Negative skew (skewed left): it tails off to the left. Positive skew (skewed right): it tails off to the right.

Nonsymmetric – Skewed Negative -- Skewed Left

Nonsymmetric – Skewed Positive -Skewed Right

Symmetrical Bi-Modal: two peaks, with the data balanced between them. Nonsymmetrical Bi-Modal: two peaks, with the data unbalanced between them.