1 Statistics: Unlocking the Power of Data Lock 5 STAT 250 Nathaniel Cannon Describing Data: Categorical Variables SECTIONS 2.1 One categorical variable Two categorical variables

2 Vaccinations in California
What proportion of children in California are vaccinated? California law requires students to provide proof of immunization for school, unless they have an approved exception:
• Medical exception
• Personal belief exception
Let’s look at the data!

3 Frequency Table
A frequency table shows the number of cases that fall in each category:
| Vaccines up to date | Medical Exception | Personal Belief Exception | Other | TOTAL |
|---|---|---|---|---|
| 480014 | 1009 | 13229 | 36391 | 530643 |
Data from the California Department of Public Health: all kindergartens in California that reported data (required), 2014 – 2015
Do you think schools that reported may differ from schools that didn’t report? Does sampling bias exist?
Minitab: Stat -> Tables -> Tally Individual Variables -> Counts
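
A frequency table is just a tally of category labels. The sketch below illustrates the idea in Python rather than Minitab; the `records` list is a small made-up sample (hypothetical data, not the California data), and the category names are chosen only for illustration.

```python
from collections import Counter

# Hypothetical raw data: one category label per case (NOT the California data).
records = [
    "Up to date", "Up to date", "Personal belief exception",
    "Up to date", "Medical exception", "Up to date", "Other",
]

# Counter tallies how many cases fall in each category.
frequency_table = Counter(records)

for category, count in frequency_table.most_common():
    print(f"{category:28s}{count:3d}")
print(f"{'TOTAL':28s}{sum(frequency_table.values()):3d}")
```

The counts always add up to the number of cases, which is a quick sanity check on any frequency table.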

4 Bar Chart/Plot/Graph
In a bar chart, the height of the bar is the number of cases falling in each category
Minitab: Graph -> Bar chart

5 Histogram vs Bar Chart
This is a
a) Histogram
b) Bar chart
c) Other
d) I have no idea

6 Histogram vs Bar Chart
This is a
a) Histogram
b) Bar chart
c) Other
d) I have no idea

7 Histogram vs Bar Chart
• A bar chart is for categorical data, and the x-axis has no numeric scale
• A histogram is for quantitative data, and the x-axis is numeric
• For a categorical variable, the number of bars equals the number of categories, and the number in each category is fixed
• For a quantitative variable, the number of bars in a histogram is up to you (or your software), and the appearance can differ with different numbers of bars
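
To see why the number of bars in a histogram is a choice, here is a minimal sketch that bins the same values into different numbers of equal-width bins. The function `histogram_counts` and the `hours` data are hypothetical, introduced only for illustration, and the sketch assumes the values are not all equal.

```python
def histogram_counts(values, num_bins):
    """Count how many values fall into each of num_bins equal-width bins."""
    low, high = min(values), max(values)
    width = (high - low) / num_bins  # assumes high > low
    counts = [0] * num_bins
    for v in values:
        # The maximum value would index one past the end; clamp it into the last bin.
        index = min(int((v - low) / width), num_bins - 1)
        counts[index] += 1
    return counts

# Hypothetical quantitative data (e.g. hours of sleep); not from the slides.
hours = [4, 5, 5, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 9, 10]

for k in (2, 3, 6):
    print(k, "bars:", histogram_counts(hours, k))
```

The data never change, but the picture does: each choice of `num_bins` gives a different set of bar heights. A bar chart has no such choice, because the categories fix the bars.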

8 Proportion

9 Proportion
| Vaccines up to date | Medical Exception | Personal Belief Exception | Other | TOTAL |
|---|---|---|---|---|
| 480014 | 1009 | 13229 | 36391 | 530643 |
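
The formula on the Proportion slides did not survive in this transcript. As a hedged reconstruction, the usual definition of a sample proportion, applied to the kindergarten counts, looks like this (the $\hat{p}$ notation is an assumption about what the slides used):

```latex
\hat{p} = \frac{\text{number of cases in the category}}{\text{total number of cases}},
\qquad
\hat{p}_{\text{up to date}} = \frac{480014}{530643} \approx 0.905
```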

10 Relative Frequency Table
A relative frequency table shows the proportion of cases that fall in each category
All the numbers in a relative frequency table sum to 1
| Vaccines up to date | Medical Exception | Personal Belief Exception | Other | TOTAL |
|---|---|---|---|---|
| 0.905 | 0.002 | 0.025 | 0.068 | 1 |
Minitab: Stat -> Tables -> Tally Individual Variables -> Percents
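
The relative frequencies can be recomputed from the counts by dividing each count by the total. This is a minimal Python sketch of that arithmetic, not the Minitab procedure; the variable names are illustrative.

```python
# Counts from the frequency table on the slides (California kindergartens, 2014-2015).
counts = {
    "Vaccines up to date": 480014,
    "Medical Exception": 1009,
    "Personal Belief Exception": 13229,
    "Other": 36391,
}

total = sum(counts.values())

# Relative frequency = count in category / total number of cases.
relative_freq = {category: n / total for category, n in counts.items()}

for category, p in relative_freq.items():
    print(f"{category:28s}{p:.3f}")
print(f"{'TOTAL':28s}{sum(relative_freq.values()):.3f}")
```

The unrounded proportions sum to 1, but rounded entries need not. "Other" works out to about 0.0686, which rounds to 0.069 rather than the 0.068 shown on the slide; the slide's value was presumably adjusted so that the rounded entries add to exactly 1.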

11 Pie Chart
In a pie chart, the relative area of each slice of the pie corresponds to the proportion in each category
Minitab: Graph -> Pie Chart

12 Summary: One Categorical Variable
Summary Statistics
• Proportion
• Frequency table
• Relative frequency table
Visualization
• Bar chart
• Pie chart

13 Two Categorical Variables
Look at the relationship between two categorical variables
1. Relationship status
2. Gender

14 Two-Way Table
| | Female | Male | Total |
|---|---|---|---|
| In a Relationship | 32 | 10 | 42 |
| It’s Complicated | 12 | 7 | 19 |
| Single | 63 | 45 | 108 |
| Total | 107 | 62 | 169 |
It doesn’t matter which variable is displayed in the rows and which in the columns
Minitab: Stat -> Tables -> Tally Individual Variables -> Counts

15 Two-Way Table
What proportion of students in this sample are in a relationship?
a) 42/169 ≈ 25%
b) 32/107 ≈ 30%
c) 10/62 ≈ 16%
d) 32/42 ≈ 76%
| | Female | Male | Total |
|---|---|---|---|
| In a Relationship | 32 | 10 | 42 |
| It’s Complicated | 12 | 7 | 19 |
| Single | 63 | 45 | 108 |
| Total | 107 | 62 | 169 |

16 Two-Way Table
What proportion of females in this sample are in a relationship?
a) 42/169 ≈ 25%
b) 32/107 ≈ 30%
c) 10/62 ≈ 16%
d) 32/42 ≈ 76%
| | Female | Male | Total |
|---|---|---|---|
| In a Relationship | 32 | 10 | 42 |
| It’s Complicated | 12 | 7 | 19 |
| Single | 63 | 45 | 108 |
| Total | 107 | 62 | 169 |

17 Male and Female Proportions
30% of females in the sample say they are in a relationship
16% of males in the sample say they are in a relationship
Why the difference???

18 Difference in Proportions
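
The content of this slide is missing from the transcript. As a hedged reconstruction, the difference in proportions for the relationship data, using the female and male column totals from the two-way table (the subscripts $F$ and $M$ are assumed notation), would be:

```latex
\hat{p}_F - \hat{p}_M = \frac{32}{107} - \frac{10}{62} \approx 0.299 - 0.161 = 0.138
```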

19 Two-Way Table
What proportion of people in a relationship in this sample are female?
a) 42/169 ≈ 25%
b) 32/107 ≈ 30%
c) 10/62 ≈ 16%
d) 32/42 ≈ 76%
| | Female | Male | Total |
|---|---|---|---|
| In a Relationship | 32 | 10 | 42 |
| It’s Complicated | 12 | 7 | 19 |
| Single | 63 | 45 | 108 |
| Total | 107 | 62 | 169 |

20 Two-Way Table
CAUTION: The proportion of females in a relationship is NOT THE SAME AS the proportion of people in a relationship who are female! 30% ≠ 76%!
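
The two proportions share a numerator (the 32 females in a relationship) and differ only in the denominator. The sketch below makes that explicit; the dictionary layout and helper names are illustrative choices, not part of the course materials.

```python
# Two-way table from the slides: rows are relationship status, columns are gender.
table = {
    "In a Relationship": {"Female": 32, "Male": 10},
    "It's Complicated": {"Female": 12, "Male": 7},
    "Single": {"Female": 63, "Male": 45},
}

def row_total(status):
    """Total number of people with the given relationship status."""
    return sum(table[status].values())

def column_total(gender):
    """Total number of people of the given gender."""
    return sum(row[gender] for row in table.values())

cell = table["In a Relationship"]["Female"]

# Proportion of females who are in a relationship: divide by the column total (107).
p_relationship_given_female = cell / column_total("Female")

# Proportion of people in a relationship who are female: divide by the row total (42).
p_female_given_relationship = cell / row_total("In a Relationship")

print(f"{p_relationship_given_female:.0%}")   # 30%
print(f"{p_female_given_relationship:.0%}")   # 76%
```

Always ask "proportion of which group?" before choosing the denominator.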

21 Side-by-Side Bar Chart
Minitab: Graph -> Bar Chart -> Cluster
The height of each bar is the number of the corresponding cell in the two-way table

22 Segmented Bar Chart
A segmented bar chart is like a side-by-side bar chart, but the bars are stacked instead of side-by-side
Minitab: Graph -> Bar Chart -> Stack

23 Vitamin D Injections
Many kidney dialysis patients get vitamin D injections to correct for a lack of calcium. Two forms of vitamin D injections are used: calcitriol and paricalcitol. The records of 67,000 dialysis patients were examined; half received one drug and half the other. After three years, 58.7% of those getting paricalcitol had survived, while only 51.5% of those getting calcitriol had survived.
Construct an approximate two-way table of the data (because the percentages are rounded, we can’t recover the exact counts; round to whole numbers).
Source: Teng, M., et al., “Survival of patients undergoing hemodialysis with paricalcitol or calcitriol therapy,” New England Journal of Medicine, July 31, 2003; 349(5): 446-456.

24 Vitamin D Injections
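
The worked answer on this slide is not in the transcript. One way to build the approximate two-way table is sketched below, assuming exactly 33,500 patients per drug (the slide only says "half") and treating the rounded survival percentages as exact. Variable names are illustrative.

```python
total_patients = 67_000
per_group = total_patients // 2  # assumption: an exact 50/50 split, 33,500 each

# Three-year survival rates from the slide (already rounded to one decimal place).
survival_rate = {"Paricalcitol": 0.587, "Calcitriol": 0.515}

two_way = {}
for drug, rate in survival_rate.items():
    survived = round(per_group * rate)  # approximate count, rounded to a whole patient
    two_way[drug] = {"Survived": survived, "Died": per_group - survived}

print(f"{'':14s}{'Survived':>10s}{'Died':>10s}{'Total':>10s}")
for drug, row in two_way.items():
    print(f"{drug:14s}{row['Survived']:>10d}{row['Died']:>10d}{per_group:>10d}")
```

Because the percentages are rounded to the nearest 0.1%, each count is only pinned down to within a couple of dozen patients; the table is an approximation, not a recovery of the original data.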

25 Getting dataset from table
If you were to write the data from the two-way table out as an entire data set, what would it look like?
How many columns would there be? What would they represent?
How many rows would there be? Give an example of one of the rows.
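
A two-way table is a compressed form of a data set with one row per case and one column per variable. The transcript does not make clear which table this slide refers to, so the sketch below uses the relationship-status table because its counts are exact; the column names are illustrative. The same idea applied to the vitamin D table would give 67,000 rows with two columns (drug and survival).

```python
# Two-way table from the relationship-status slides: (status, gender) -> count.
table = {
    ("In a Relationship", "Female"): 32, ("In a Relationship", "Male"): 10,
    ("It's Complicated", "Female"): 12, ("It's Complicated", "Male"): 7,
    ("Single", "Female"): 63, ("Single", "Male"): 45,
}

# One row per case (student), one column per variable.
dataset = [
    {"relationship_status": status, "gender": gender}
    for (status, gender), count in table.items()
    for _ in range(count)
]

print(len(dataset))  # number of rows = number of cases
print(dataset[0])    # an example row
```

Each cell count becomes that many identical rows, so the number of rows equals the grand total of the table and the number of columns equals the number of variables.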

26 Kidney Stones
| | Success | Failure |
|---|---|---|
| Treatment A | 273 | 77 |
| Treatment B | 289 | 61 |
Which treatment is better at removing kidney stones?
a) Treatment A
b) Treatment B
Source: R. Charig, D. R. Webb, S. R. Payne, O. E. Wickham (1986). "Comparison of treatment of renal calculi by open surgery, percutaneous nephrolithotomy, and extracorporeal shockwave lithotripsy". Br Med J (Clin Res Ed) 292 (6524): 879–882

27 Kidney Stones
| SMALL STONES | Success | Failure |
|---|---|---|
| Treatment A | 81 | 6 |
| Treatment B | 234 | 36 |
Which treatment is better at removing small kidney stones?
a) Treatment A
b) Treatment B

28 Kidney Stones
| LARGE STONES | Success | Failure |
|---|---|---|
| Treatment A | 192 | 71 |
| Treatment B | 55 | 25 |
Which treatment is better at removing large kidney stones?
a) Treatment A
b) Treatment B

29 Kidney Stones
Treatment A is more effective for both small and large kidney stones, but the data show Treatment B to be more effective overall! How is this possible!?!?

30 Kidney Stones – Simpson’s Paradox
| Large Stones | Success | Failure | Success Rate |
|---|---|---|---|
| Treatment A | 192 | 71 | 73% |
| Treatment B | 55 | 25 | 69% |
| Small Stones | Success | Failure | Success Rate |
|---|---|---|---|
| Treatment A | 81 | 6 | 93% |
| Treatment B | 234 | 36 | 87% |
| ALL STONES | Success | Failure | Success Rate |
|---|---|---|---|
| Treatment A | 273 | 77 | 78% |
| Treatment B | 289 | 61 | 83% |

31 Kidney Stones
Treatment A is used more often on large stones, which are harder to treat.
This is an example of Simpson’s Paradox: an observed relationship between two variables can change (or even reverse!) when a third variable is considered
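
The reversal can be checked directly from the counts. The sketch below recomputes the success rates within each stone size, pools the sizes, and then shows how unevenly the two treatments were assigned; the data structure and names are illustrative.

```python
# (successes, failures) by stone size and treatment, from the slides.
results = {
    "Small": {"A": (81, 6), "B": (234, 36)},
    "Large": {"A": (192, 71), "B": (55, 25)},
}

def success_rate(successes, failures):
    return successes / (successes + failures)

# Within each stone size, Treatment A has the higher success rate.
for size, by_treatment in results.items():
    rates = {t: success_rate(*sf) for t, sf in by_treatment.items()}
    print(size, {t: f"{r:.0%}" for t, r in rates.items()})

# Pool the two stone sizes: add up successes and failures per treatment.
combined = {
    t: tuple(sum(results[size][t][i] for size in results) for i in (0, 1))
    for t in ("A", "B")
}
overall = {t: success_rate(*sf) for t, sf in combined.items()}
print("All", {t: f"{r:.0%}" for t, r in overall.items()})

# The third variable: what share of each treatment's cases were large stones?
share_large = {
    t: sum(results["Large"][t]) / sum(combined[t]) for t in ("A", "B")
}
print("Share large:", {t: f"{s:.0%}" for t, s in share_large.items()})
```

About three quarters of Treatment A's cases were large stones, against under a quarter of Treatment B's, so A's overall rate is pulled toward the lower large-stone success rates even though A does better within each size.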

32 Kidney Stones

33 Combined
| | Treatment A | Treatment B |
|---|---|---|
| Successful | 273 (78%) | 289 (83%) |
| Unsuccessful | 77 | 61 |

35 Summary: Two Categorical Variables
Summary Statistics
• Two-way table
• Difference in proportions
Visualization
• Side-by-side bar chart
• Segmented bar chart

36
| Variable(s) | Visualization | Summary Statistics |
|---|---|---|
| Categorical | bar chart, pie chart | frequency table, relative frequency table, proportion, odds |
| Quantitative | dotplot, histogram, boxplot | mean, median, max, min, standard deviation, range, IQR, five number summary |
| Categorical vs Categorical | side-by-side bar chart, segmented bar chart | two-way table, difference in proportions, odds ratio |
| Quantitative vs Categorical | side-by-side boxplots | statistics by group, difference in means |
| Quantitative vs Quantitative | scatterplot | correlation |

37 Descriptive Statistics
Think of a topic or question you would like to use data to help you answer.
• What would the cases be?
• What would the variables be? (Limit to one or two variables)

38 Descriptive Statistics
How would you visualize and summarize the variable or relationship between variables?
a) bar chart/pie chart, proportions, frequency table/relative frequency table
b) dotplot/histogram/boxplot, mean/median, sd/range/IQR, five number summary
c) side-by-side or segmented bar charts, difference in proportions, two-way table
d) side-by-side boxplot, difference in means
e) scatterplot, correlation

39 To Do
• Read Section 2.1
• Do HW 2.1 (due Friday, 2/13)
• Study for Exam 1 (Friday, 2/13)

