Statistics: Unlocking the Power of Data Lock 5 STAT 250 Nathaniel Cannon Describing Data: Categorical Variables SECTIONS 2.1 One categorical variable Two.

Presentation transcript:

Statistics: Unlocking the Power of Data Lock 5 STAT 250 Nathaniel Cannon Describing Data: Categorical Variables SECTIONS 2.1 One categorical variable Two categorical variables

Vaccinations in California
What proportion of children in California are vaccinated?
California law requires students to provide proof of immunization for school, unless they have an approved exception:
• Medical exception
• Personal belief exception
Let’s look at the data!

Frequency Table
A frequency table shows the number of cases that fall in each category.
Table columns: Vaccines up to date, Medical Exception, Personal Belief Exception, Other, Total
Data from the California Department of Public Health: all kindergartens in California that reported data (required), 2014–2015.
Do you think schools that reported may differ from schools that didn’t report? Does sampling bias exist?
Minitab: Stat -> Tables -> Tally Individual Variables -> Counts

Bar Chart/Plot/Graph
In a bar chart, the height of each bar is the number of cases falling in that category.
Minitab: Graph -> Bar Chart

Histogram vs Bar Chart
This is a
a) Histogram
b) Bar chart
c) Other
d) I have no idea

Histogram vs Bar Chart
• A bar chart is for categorical data, and the x-axis has no numeric scale.
• A histogram is for quantitative data, and the x-axis is numeric.
• For a categorical variable, the number of bars equals the number of categories, and the number in each category is fixed.
• For a quantitative variable, the number of bars in a histogram is up to you (or your software), and the appearance can differ with different numbers of bars.

Proportion
Table columns: Vaccines up to date, Medical Exception, Personal Belief Exception, Other, Total

Relative Frequency Table
A relative frequency table shows the proportion of cases that fall in each category.
All the numbers in a relative frequency table sum to 1.
Table columns: Vaccines up to date, Medical Exception, Personal Belief Exception, Other, Total
Minitab: Stat -> Tables -> Tally Individual Variables -> Percents
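The link between a frequency table and a relative frequency table can be sketched in a few lines of Python. The eight cases below are made up purely for illustration (they are not the California data, whose counts are not shown here), and the variable names are assumptions.

```python
from collections import Counter

# Hypothetical vaccination status for eight children; NOT the California data.
statuses = [
    "up to date", "up to date", "up to date", "up to date", "up to date",
    "personal belief exception", "personal belief exception",
    "medical exception",
]

# Frequency table: number of cases in each category.
frequency = Counter(statuses)

# Relative frequency table: proportion of cases in each category.
total = sum(frequency.values())
relative_frequency = {
    category: count / total for category, count in frequency.items()
}

for category, count in frequency.items():
    print(f"{category:28s} {count:3d} {relative_frequency[category]:.3f}")
```

Each proportion is a count divided by the total number of cases, so the proportions necessarily sum to 1.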

Pie Chart
In a pie chart, the relative area of each slice of the pie corresponds to the proportion in each category.
Minitab: Graph -> Pie Chart

Summary: One Categorical Variable
Summary Statistics
• Proportion
• Frequency table
• Relative frequency table
Visualization
• Bar chart
• Pie chart

Two Categorical Variables
Look at the relationship between two categorical variables:
1. Relationship status
2. Gender

Two-Way Table
                     Female   Male   Total
In a Relationship        32     10      42
It’s Complicated         12      7      19
Single                   63     45     108
Total                   107     62     169
It doesn’t matter which variable is displayed in the rows and which in the columns.
Minitab: Stat -> Tables -> Tally Individual Variables -> Counts

Two-Way Table
What proportion of students in this sample are in a relationship?
a) 42/169 ≈ 25%
b) 32/107 ≈ 30%
c) 10/62 ≈ 16%
d) 32/42 ≈ 76%
                     Female   Male   Total
In a Relationship        32     10      42
It’s Complicated         12      7      19
Single                   63     45     108
Total                   107     62     169

Two-Way Table
What proportion of females in this sample are in a relationship?
a) 42/169 ≈ 25%
b) 32/107 ≈ 30%
c) 10/62 ≈ 16%
d) 32/42 ≈ 76%
                     Female   Male   Total
In a Relationship        32     10      42
It’s Complicated         12      7      19
Single                   63     45     108
Total                   107     62     169

Male and Female Proportions
30% of females in the sample say they are in a relationship.
16% of males in the sample say they are in a relationship.
Why the difference???

Difference in Proportions
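As a worked example, the difference in proportions for the relationship data can be computed from the fractions quoted on the neighboring slides (32 of 107 females and 10 of 62 males in a relationship). The symbols \hat{p}_F and \hat{p}_M are notation assumed here for the two sample proportions.

```latex
\hat{p}_F - \hat{p}_M = \frac{32}{107} - \frac{10}{62} \approx 0.299 - 0.161 = 0.138
```

The order of subtraction is a choice; reversing it gives about -0.138, so it helps to state which group comes first.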

Two-Way Table
What proportion of people in a relationship in this sample are female?
a) 42/169 ≈ 25%
b) 32/107 ≈ 30%
c) 10/62 ≈ 16%
d) 32/42 ≈ 76%
                     Female   Male   Total
In a Relationship        32     10      42
It’s Complicated         12      7      19
Single                   63     45     108
Total                   107     62     169

Two-Way Table
CAUTION: The proportion of females in a relationship is NOT THE SAME AS the proportion of people in a relationship who are female!
30% ≠ 76%!
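The two proportions in the caution differ only in their denominators, which a short Python sketch makes explicit. The counts come from the quiz fractions and the "It's Complicated" row; the "Single" row (63 and 45) is derived here from the column totals 107 and 62, and the dictionary layout is an assumption made for illustration.

```python
# Two-way table of relationship status by gender (counts per cell).
table = {
    "In a Relationship": {"Female": 32, "Male": 10},
    "It's Complicated": {"Female": 12, "Male": 7},
    "Single": {"Female": 63, "Male": 45},
}

# Column total: all females in the sample.
female_total = sum(row["Female"] for row in table.values())

# Row total: all people in a relationship.
relationship_total = sum(table["In a Relationship"].values())

both = table["In a Relationship"]["Female"]

# Proportion of females who are in a relationship (denominator: females).
p_relationship_given_female = both / female_total

# Proportion of people in a relationship who are female
# (denominator: people in a relationship).
p_female_given_relationship = both / relationship_total

print(f"{p_relationship_given_female:.0%} vs {p_female_given_relationship:.0%}")
```

The numerator is the same cell (32) in both cases; only the group being conditioned on changes.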

Side-by-Side Bar Chart
The height of each bar is the count in the corresponding cell of the two-way table.
Minitab: Graph -> Bar Chart -> Cluster

Segmented Bar Chart
A segmented bar chart is like a side-by-side bar chart, but the bars are stacked instead of side-by-side.
Minitab: Graph -> Bar Chart -> Stack

Vitamin D Injections
Many kidney dialysis patients get vitamin D injections to correct for a lack of calcium. Two forms of vitamin D injections are used: calcitriol and paricalcitol.
The records of 67,000 dialysis patients were examined; half received one drug and the other half received the other drug. After three years, 58.7% of those getting paricalcitol had survived, while only 51.5% of those getting calcitriol had survived.
Construct an approximate two-way table of the data (because the percentages are rounded, we can’t recover the exact counts; round to whole numbers).
Source: Teng, M., et al., “Survival of patients undergoing hemodialysis with paricalcitol or calcitriol therapy,” New England Journal of Medicine, July 31, 2003; 349(5).

Vitamin D Injections
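One way to build the approximate two-way table is sketched below in Python. It assumes an exact 33,500 / 33,500 split and rounds halves up; both products land exactly on a half (19,664.5 and 17,252.5), so a different rounding convention would shift each count by one. The helper name and the integer "per thousand" representation are illustrative choices, used to avoid floating-point rounding surprises.

```python
def round_half_up_permille(n, permille):
    """Return n * permille / 1000 rounded to a whole number, halves rounding up."""
    return (n * permille + 500) // 1000


total_patients = 67_000
group_size = total_patients // 2  # half received each drug (assumed exact split)

# Three-year survival rates, written per thousand: 58.7% and 51.5%.
survival_permille = {"paricalcitol": 587, "calcitriol": 515}

table = {}
for drug, permille in survival_permille.items():
    survived = round_half_up_permille(group_size, permille)
    table[drug] = {"survived": survived, "died": group_size - survived}

for drug, row in table.items():
    print(f"{drug:13s} survived={row['survived']:6d} died={row['died']:6d}")
```

Because the published percentages are themselves rounded, these counts are approximations rather than the study's actual numbers.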

Getting dataset from table
If you were to write the data from the two-way table out as an entire data set, what would it look like?
• How many columns would there be? What would they represent?
• How many rows would there be?
• Give an example of one of the rows.
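A rough Python sketch of going from a two-way table back to case-level data follows. The cell counts are approximate values assumed from rounding 58.7% and 51.5% of 33,500 patients, and the column names `drug` and `survived` are hypothetical.

```python
# Approximate cell counts (assumed; the exact counts cannot be recovered).
counts = {
    ("paricalcitol", "yes"): 19_665,
    ("paricalcitol", "no"): 13_835,
    ("calcitriol", "yes"): 17_253,
    ("calcitriol", "no"): 16_247,
}

# One row per patient, one column per categorical variable.
rows = [
    {"drug": drug, "survived": survived}
    for (drug, survived), n in counts.items()
    for _ in range(n)
]

print(len(rows))        # number of cases (patients)
print(sorted(rows[0]))  # the column names
print(rows[0])          # an example row
```

Each cell of the table becomes that many identical rows, so the number of rows equals the table's grand total and the number of columns equals the number of variables.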

Kidney Stones
               Success   Failure
Treatment A        273        77
Treatment B        289        61
Which treatment is better at removing kidney stones?
a) Treatment A
b) Treatment B
Source: R. Charig, D. R. Webb, S. R. Payne, O. E. Wickham (1986). "Comparison of treatment of renal calculi by open surgery, percutaneous nephrolithotomy, and extracorporeal shockwave lithotripsy". Br Med J (Clin Res Ed) 292 (6524): 879–882.

Kidney Stones
SMALL STONES   Success   Failure
Treatment A         81         6
Treatment B        234        36
Which treatment is better at removing small kidney stones?
a) Treatment A
b) Treatment B

Kidney Stones
LARGE STONES   Success   Failure
Treatment A        192        71
Treatment B         55        25
Which treatment is better at removing large kidney stones?
a) Treatment A
b) Treatment B

Kidney Stones
Treatment A is more effective for both small and large kidney stones, but the data show Treatment B to be more effective overall!
How is this possible!?!?

Kidney Stones – Simpson’s Paradox
Large Stones   Success   Failure   Success Rate
Treatment A        192        71            73%
Treatment B         55        25            69%

Small Stones   Success   Failure   Success Rate
Treatment A         81         6            93%
Treatment B        234        36            87%

ALL STONES     Success   Failure   Success Rate
Treatment A        273        77            78%
Treatment B        289        61            83%

Kidney Stones
Treatment A is used more often on large stones, which are harder to treat.
This is an example of Simpson’s Paradox: an observed relationship between two variables can change (or even reverse!) when a third variable is considered.
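The reversal can be checked directly from the success and failure counts in the kidney stone tables. The Python sketch below is illustrative only; the data layout and function name are assumptions.

```python
# (successes, failures) for each treatment, split by stone size.
results = {
    "small": {"A": (81, 6), "B": (234, 36)},
    "large": {"A": (192, 71), "B": (55, 25)},
}


def success_rate(successes, failures):
    """Proportion of treated cases that were successes."""
    return successes / (successes + failures)


# Within each stone size, compare the two treatments.
for size, by_treatment in results.items():
    for treatment, (s, f) in by_treatment.items():
        print(f"{size:5s} {treatment}: {success_rate(s, f):.0%} of {s + f}")

# Pool the stone sizes and compare again.
overall = {}
for treatment in ("A", "B"):
    s = sum(results[size][treatment][0] for size in results)
    f = sum(results[size][treatment][1] for size in results)
    overall[treatment] = (s, f)
    print(f"all   {treatment}: {success_rate(s, f):.0%} of {s + f}")
```

Treatment A handles 263 large stones but only 87 small ones, while Treatment B handles 80 large and 270 small, so A's pooled rate is dominated by the harder cases.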

Combined
               Treatment A   Treatment B
Successful       273 (78%)     289 (83%)
Unsuccessful      77            61

Summary: Two Categorical Variables
Summary Statistics
• Two-way table
• Difference in proportions
Visualization
• Side-by-side bar chart
• Segmented bar chart

Variable(s), Visualization, Summary Statistics
• Categorical
  Visualization: bar chart, pie chart
  Summary statistics: frequency table, relative frequency table, proportion, odds
• Quantitative
  Visualization: dotplot, histogram, boxplot
  Summary statistics: mean, median, max, min, standard deviation, range, IQR, five number summary
• Categorical vs Categorical
  Visualization: side-by-side bar chart, segmented bar chart
  Summary statistics: two-way table, difference in proportions, odds ratio
• Quantitative vs Categorical
  Visualization: side-by-side boxplots
  Summary statistics: statistics by group, difference in means
• Quantitative vs Quantitative
  Visualization: scatterplot
  Summary statistics: correlation

Descriptive Statistics
Think of a topic or question you would like to use data to help you answer.
• What would the cases be?
• What would the variables be? (Limit to one or two variables)

Descriptive Statistics
How would you visualize and summarize the variable or relationship between variables?
a) bar chart/pie chart, proportions, frequency table/relative frequency table
b) dotplot/histogram/boxplot, mean/median, sd/range/IQR, five number summary
c) side-by-side or segmented bar charts, difference in proportions, two-way table
d) side-by-side boxplot, difference in means
e) scatterplot, correlation

To Do
• Read Section 2.1
• Do HW 2.1 (due Friday, 2/13)
• Study for Exam 1 (Friday, 2/13)