
1 Statistics: Unlocking the Power of Data Lock 5
STAT 101, Dr. Kari Lock Morgan, 9/11/12
Describing Data: Two Variables (Sections 2.1, 2.4, 2.5)
Two categorical (2.1)
Quantitative and categorical (2.4)
Two quantitative (2.5)

2 The Big Picture
[Diagram: Population, Sample, Sampling, Statistical Inference, Descriptive Statistics]

3 Two Categorical Variables
Look at the relationship between two categorical variables:
1. Relationship status
2. Gender

4 Two-Way Table (data from our class survey)
                   Female    Male   Total
In a Relationship      32      10      42
It’s Complicated       12       7      19
Single                 63      45     108
Total                 107      62     169
It doesn’t matter which variable is displayed in the rows and which in the columns.
R: table(relationship, gender)
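The slide builds this table with table(relationship, gender) from the class-survey data, which are not included here. As a minimal sketch, assuming only the counts above, the R code below rebuilds one entry per student from those counts (so the vectors are reconstructed from the table, not the original survey file) and tabulates them; addmargins() appends the totals, and t() swaps rows and columns without changing the information.

```r
# Rebuild one entry per student from the counts in the two-way table.
# These vectors are reconstructed from the table, not the original survey file.
relationship <- rep(c("In a Relationship", "It's Complicated", "Single",   # females
                      "In a Relationship", "It's Complicated", "Single"),  # males
                    times = c(32, 12, 63, 10, 7, 45))
gender <- rep(c("Female", "Male"), times = c(107, 62))

tab <- table(relationship, gender)  # the same call as on the slide
print(tab)

# addmargins() adds a "Sum" row and column holding the totals
print(addmargins(tab))

# Swapping rows and columns gives the same information
print(t(tab))
```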

5 Two-Way Table
What proportion of students in this sample are in a relationship?
a) 42/169 ≈ 25%
b) 32/107 ≈ 30%
c) 10/62 ≈ 16%
d) 32/42 ≈ 76%

6 Two-Way Table
What proportion of females in this sample are in a relationship?
a) 42/169 ≈ 25%
b) 32/107 ≈ 30%
c) 10/62 ≈ 16%
d) 32/42 ≈ 76%

7 Two-Way Table
What proportion of males in this sample are in a relationship?
a) 42/169 ≈ 25%
b) 32/107 ≈ 30%
c) 10/62 ≈ 16%
d) 32/42 ≈ 76%

8 Male and Female Proportions
30% of females in the sample say they are in a relationship.
16% of males in the sample say they are in a relationship.
Why the difference???

9 Difference in Proportions
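The difference in proportions compares the proportion in a relationship among females with that among males. A short R sketch using the counts from the two-way table; subtracting in the order female minus male is a choice made here, and reversing it only flips the sign.

```r
# Students "In a Relationship" and group sizes, read off the two-way table
in_relationship <- c(Female = 32, Male = 10)
group_size      <- c(Female = 107, Male = 62)

# Proportion in a relationship within each gender
p_hat <- in_relationship / group_size
print(round(p_hat, 2))  # Female 0.30, Male 0.16

# Difference in proportions (female minus male)
diff_p <- p_hat[["Female"]] - p_hat[["Male"]]
print(round(diff_p, 2))  # 0.14
```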

10 Two-Way Table
What proportion of people in a relationship in this sample are female?
a) 42/169 ≈ 25%
b) 32/107 ≈ 30%
c) 10/62 ≈ 16%
d) 32/42 ≈ 76%

11 Two-Way Table
CAUTION: The proportion of females who are in a relationship (32/107) is NOT THE SAME AS the proportion of people in a relationship who are female (32/42)!
30% ≠ 76%!

12 Two-Way Table
What proportion of students in this sample are female and in a relationship?
a) 42/169 ≈ 25%
b) 32/169 ≈ 19%
c) 32/107 ≈ 30%
d) 10/62 ≈ 16%
e) 32/42 ≈ 76%
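The questions on these slides differ only in what the count of 32 is divided by. In R, prop.table() makes that choice explicit through its margin argument: margin = 2 conditions on the columns (gender), margin = 1 on the rows (relationship status), and no margin gives proportions of the whole sample. A minimal sketch with the survey counts entered directly:

```r
# Two-way table from the class survey, entered as counts
tab <- as.table(matrix(c(32, 12, 63,    # Female
                         10,  7, 45),   # Male
                       nrow = 3,
                       dimnames = list(
                         relationship = c("In a Relationship", "It's Complicated", "Single"),
                         gender       = c("Female", "Male"))))

by_gender       <- prop.table(tab, margin = 2)  # each column sums to 1
by_relationship <- prop.table(tab, margin = 1)  # each row sums to 1
of_everyone     <- prop.table(tab)              # all cells together sum to 1

# Proportion of females who are in a relationship: 32/107
print(round(by_gender["In a Relationship", "Female"], 2))        # 0.3
# Proportion of people in a relationship who are female: 32/42
print(round(by_relationship["In a Relationship", "Female"], 2))  # 0.76
# Proportion of all students who are female AND in a relationship: 32/169
print(round(of_everyone["In a Relationship", "Female"], 2))      # 0.19
```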

13 Side-by-Side Bar Chart
R: barplot(relationship~gender, beside=TRUE)
The height of each bar is the count in the corresponding cell of the two-way table.

14 Segmented Bar Chart
A segmented bar chart is like a side-by-side bar chart, but the bars are stacked instead of side-by-side.
R: barplot(relationship~gender)
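The formula calls on these two slides presumably rely on the course's R setup; base R's formula method for barplot() expects numeric heights on the left-hand side. A base-R sketch, assuming the counts are already in a two-way table: passing the table itself to barplot() draws both charts, with beside = TRUE for side-by-side bars and the default (stacked) for a segmented chart.

```r
tab <- as.table(matrix(c(32, 12, 63, 10, 7, 45), nrow = 3,
                       dimnames = list(
                         relationship = c("In a Relationship", "It's Complicated", "Single"),
                         gender       = c("Female", "Male"))))

# Side-by-side: one group of bars per gender (column), one bar per relationship status
mids_side <- barplot(tab, beside = TRUE, legend.text = TRUE,
                     main = "Side-by-side bar chart", ylab = "Count")

# Segmented: one stacked bar per gender; segment heights are the cell counts
mids_stacked <- barplot(tab, legend.text = TRUE,
                        main = "Segmented bar chart", ylab = "Count")
```

barplot() invisibly returns the bar midpoints: a 3 × 2 matrix for the side-by-side chart and one value per gender for the stacked chart.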

15 Summary: Two Categorical Variables
Summary Statistics
• Two-way table
• Difference in proportions
Visualization
• Side-by-side bar chart
• Segmented bar chart

16 Kidney Stones
R. Charig, D. R. Webb, S. R. Payne, O. E. Wickham (1986). "Comparison of treatment of renal calculi by open surgery, percutaneous nephrolithotomy, and extracorporeal shockwave lithotripsy". Br Med J (Clin Res Ed) 292 (6524): 879–882.
              Success  Failure
Treatment A       273       77
Treatment B       289       61
Which treatment is better at removing kidney stones?
a) Treatment A
b) Treatment B

17 Kidney Stones
Small Stones  Success  Failure
Treatment A        81        6
Treatment B       234       36
Which treatment is better at removing small kidney stones?
a) Treatment A
b) Treatment B

18 Kidney Stones
Large Stones  Success  Failure
Treatment A       192       71
Treatment B        55       25
Which treatment is better at removing large kidney stones?
a) Treatment A
b) Treatment B

19 Kidney Stones
Treatment A is more effective for both small and large kidney stones, but the data show Treatment B to be more effective overall! How is this possible!?!?

20 Kidney Stones – Simpson’s Paradox
Large Stones  Success  Failure  Success Rate
Treatment A       192       71           73%
Treatment B        55       25           69%

Small Stones  Success  Failure  Success Rate
Treatment A        81        6           93%
Treatment B       234       36           87%

All Stones    Success  Failure  Success Rate
Treatment A       273       77           78%
Treatment B       289       61           83%

21 Kidney Stones
Treatment A is used more often on large stones, which are harder to treat (263 of Treatment A’s 350 cases involve large stones, versus 80 of Treatment B’s 350), so its overall success rate is pulled down.
This is an example of Simpson’s Paradox: an observed relationship between two variables can change (or even reverse!) when a third variable is considered.
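The reversal can be checked directly from the counts. A sketch in R, storing the kidney-stone data as a 2 × 2 × 2 array (treatment × outcome × stone size); the helper success_rate() is a hypothetical name introduced here for illustration. Summing over stone size reproduces the "All Stones" table, and the comparison flips.

```r
# Counts from the slides: treatment x outcome x stone size.
# array() fills the first dimension fastest, so each group of four is
# (A success, B success, A failure, B failure).
stones <- array(c( 81, 234,  6, 36,   # small stones
                  192,  55, 71, 25),  # large stones
                dim = c(2, 2, 2),
                dimnames = list(treatment = c("A", "B"),
                                outcome   = c("Success", "Failure"),
                                size      = c("Small", "Large")))

# Hypothetical helper: success rate for each treatment (row) of a 2 x 2 table
success_rate <- function(counts) counts[, "Success"] / rowSums(counts)

# Within each stone size, Treatment A has the higher success rate
by_size <- sapply(c("Small", "Large"), function(s) success_rate(stones[, , s]))
print(round(by_size, 2))

# Add the two sizes together: now Treatment B looks better
combined <- apply(stones, c(1, 2), sum)
print(combined)
print(round(success_rate(combined), 2))  # A 0.78, B 0.83
```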

22 Kidney Stones

24 Small Stones
Treatment A: 81 successful (93%), 6 unsuccessful
Treatment B: 234 successful (87%), 36 unsuccessful
Slope = # successful / # unsuccessful = odds

25 Large Stones
Treatment A: 192 successful (73%), 71 unsuccessful
Treatment B: 55 successful (69%), 25 unsuccessful
Slope = # successful / # unsuccessful = odds
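These slides appear to draw each treatment as a line whose slope is successes divided by failures, that is, the odds of success. A minimal R sketch of that arithmetic using the counts above; it also shows that the combined odds come from adding the counts first, not from averaging the two slopes, which is why the mix of stone sizes each treatment received matters.

```r
# Successes and failures by stone size (rows) and treatment (columns), from the slides
success <- rbind(Small = c(A = 81, B = 234), Large = c(A = 192, B = 55))
failure <- rbind(Small = c(A = 6,  B = 36),  Large = c(A = 71,  B = 25))

# Odds of success = # successful / # unsuccessful (the slope on the slides)
odds <- success / failure
print(round(odds, 2))  # Small: A 13.5, B 6.5; Large: A 2.7, B 2.2

# Combined: add the counts over both sizes, then divide
combined_odds <- colSums(success) / colSums(failure)
print(round(combined_odds, 2))  # A 3.55, B 4.74
```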

26 Combined
Treatment A: 81 + 192 = 273 successful, 6 + 71 = 77 unsuccessful
Treatment B: 289 successful, 61 unsuccessful

27 Combined
Treatment A: 273 successful (78%), 77 unsuccessful
Treatment B: 289 successful (83%), 61 unsuccessful

30 Quantitative and Categorical Relationships Interested in a quantitative variable broken down by categorical groups

31 Tea and the Immune System
Mednick, Cai, Kanady, and Drummond (2008). “Comparing the benefits of caffeine, naps and placebo on verbal, motor and perceptual memory,” Behavioral Brain Research, 193, 79-86.
Participants were randomized to drink five or six cups of either tea or coffee every day for two weeks (both drinks have caffeine, but only tea has L-theanine). After two weeks, blood samples were exposed to an antigen, and production of interferon gamma (an immune system response) was measured.
Explanatory variable: tea or coffee
Response variable: measure of interferon gamma

32 Tea and the Immune System
If the tea drinkers have significantly higher levels of interferon gamma, can we conclude that drinking tea rather than coffee caused an increase in this aspect of the immune response?
a) Yes
b) No
This was a randomized experiment, so it is possible to draw conclusions about causality.

33 Side-by-Side Boxplots
R: boxplot(InterferonGamma~Drink)

34 Quantitative Statistics by a Categorical Variable
Any of the statistics we use for a quantitative variable can be looked at separately for each level of a categorical variable.
> mean(InterferonGamma~Drink)
  Coffee      Tea
17.70000 34.81818

35 Difference in Means
R: compareMean(InterferonGamma~Drink)
> mean(InterferonGamma~Drink)
  Coffee      Tea
17.70000 34.81818
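compareMean() is not part of base R (it presumably comes from a package or script used in the course), and the formula form mean(InterferonGamma~Drink) likewise appears to rely on an add-on package such as mosaic. The difference in means itself is simple arithmetic on the group means shown above. The R sketch below computes it, and shows the base-R tapply() route on hypothetical toy numbers, since the study data are not reproduced here.

```r
# Group means shown on the slide
group_means <- c(Coffee = 17.70000, Tea = 34.81818)

# Difference in means (tea minus coffee)
diff_means <- group_means[["Tea"]] - group_means[["Coffee"]]
print(diff_means)  # 17.11818

# Base-R way to get means by group from raw data.
# Hypothetical toy values, NOT the study data:
response <- c(10, 20, 30, 40)
drink    <- c("Coffee", "Coffee", "Tea", "Tea")
means_by_drink <- tapply(response, drink, mean)
print(means_by_drink)  # Coffee 15, Tea 35
print(means_by_drink[["Tea"]] - means_by_drink[["Coffee"]])  # 20
```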

36 Summary: One Quantitative and One Categorical
Summary Statistics
• Any summary statistics for quantitative variables, broken down by groups
• Difference in means
Visualization
• Side-by-side boxplots

37 Two Quantitative Variables
Summary Statistics: correlation
Visualization: scatterplot

38 Scatterplot
A scatterplot is the graph of the relationship between two quantitative variables.
R: plot(study_hours, gpa)

39 Direction of Association
A positive association means that values of one variable tend to be higher when values of the other variable are higher.
A negative association means that values of one variable tend to be lower when values of the other variable are higher.
Two variables are not associated if knowing the value of one variable does not give you any information about the value of the other variable.

40 Cars Data Handout
Quantitative Variables:
• Weight (pounds)
• City MPG
• Fuel capacity (gallons)
• Page number (in Consumer Reports)
• Time to go ¼ mile (in seconds)
• Acceleration time from 0 to 60 mph
Relationships:
• Weight vs. CityMPG
• Weight vs. FuelCapacity
• PageNum vs. Fuel Capacity
• Weight vs. QtrMile
• Acc060 vs. QtrMile
• CityMPG vs. QtrMile

41 Car Associations

42 Correlation
The correlation is a measure of the strength and direction of linear association between two quantitative variables.
Sample correlation: r
Population correlation: ρ (“rho”)
R: cor(x,y)
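For reference, the sample correlation can be written as an average product of z-scores, r = (1/(n − 1)) Σ ((xᵢ − x̄)/s_x)((yᵢ − ȳ)/s_y). A short R sketch that computes this by hand and compares it with cor(); the function name r_by_hand and the toy data are hypothetical.

```r
# Sample correlation from its definition:
# r = sum(z_x * z_y) / (n - 1), with z-scores based on the sample SD
r_by_hand <- function(x, y) {
  z_x <- (x - mean(x)) / sd(x)
  z_y <- (y - mean(y)) / sd(y)
  sum(z_x * z_y) / (length(x) - 1)
}

# Hypothetical toy data (not from the slides)
x <- c(2, 4, 6, 8, 10)
y <- c(1, 3, 2, 5, 4)

print(r_by_hand(x, y))  # 0.8
print(cor(x, y))        # 0.8, from the built-in function
```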

43 Car Correlations
What are the properties of correlation?
Correlations shown: (-.91), (.89), (-.08), (-.45), (.99), (.51)

44 Correlation
1. -1 ≤ r ≤ 1
2. The sign indicates the direction of association:
   1. positive association: r > 0
   2. negative association: r < 0
   3. no linear association: r ≈ 0
3. The closer r is to ±1, the stronger the linear association.
4. r has no units and does not depend on the units of measurement.
5. The correlation between X and Y is the same as the correlation between Y and X.
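Each property in the list can be checked numerically. A minimal R sketch on hypothetical toy data; the factor 2.54 stands in for a change of units (inches to centimeters), and the shift of 100 is arbitrary.

```r
# Hypothetical toy data (not from the slides)
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2, 1, 4, 3, 6, 5)
r <- cor(x, y)

print(r >= -1 && r <= 1)       # property 1: TRUE
print(cor(x, -y))              # property 2: reversing one variable flips the sign (-r)
print(cor(x, 3 * x + 1))       # property 3: a perfect positive line gives 1
print(cor(2.54 * x + 100, y))  # property 4: same as r after rescaling and shifting x
print(cor(y, x))               # property 5: same as cor(x, y)
```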

45 Correlation Guessing Game
http://istics.net/gett/gcstart.php?group_id=duke
Highest scorer in the class gets one extra point on the first exam!

46 Correlation: NFL Teams
r = 0.43

47 Correlation
Same plot, but with the Dolphins and Raiders (outliers) removed
r = 0.08
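The NFL example shows how much one or two points can move r. A small R illustration on made-up numbers (hypothetical, not the NFL data): five points with no linear association at all, plus one extreme point.

```r
# Hypothetical toy data: five points with no linear trend
x <- c(1, 2, 3, 4, 5)
y <- c(3, 1, 4, 1, 3)
r_without <- cor(x, y)
print(round(r_without, 2))  # 0

# Add a single outlier far from the rest
x_out <- c(x, 20)
y_out <- c(y, 20)
r_with <- cor(x_out, y_out)
print(round(r_with, 2))  # 0.97
```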

48 Human Cannonball
[Plot of Y vs. X]
What is the correlation between X and Y?
a) r > 0
b) r < 0
c) r = 0
Are X and Y associated?
a) Yes
b) No

49 Correlation Cautions
1. Correlation can be heavily affected by outliers. Always plot your data!
2. r = 0 means no linear association. The variables could still be otherwise associated. Always plot your data!
3. Correlation does not imply causation!
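Caution 2 is the human-cannonball situation: a curved relationship can have a correlation of zero even though Y is completely determined by X. A minimal R sketch using a hypothetical symmetric arc, y = 9 − x², in place of the slide's data:

```r
# Hypothetical arc: height is a perfect (but curved) function of distance
x <- -3:3
y <- 9 - x^2

print(y)                     # 0 5 8 9 8 5 0
print(round(cor(x, y), 10))  # 0: no linear association
# Yet y is exactly predictable from x, so the variables are associated.
plot(x, y, type = "b")       # always plot your data!
```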

50 Summary: Two Quantitative Variables
Summary Statistics: correlation
Visualization: scatterplot

51
Variable(s)                  | Visualization                               | Summary Statistics
Categorical                  | bar chart, pie chart                        | frequency table, relative frequency table, proportion
Quantitative                 | dotplot, histogram, boxplot                 | mean, median, max, min, standard deviation, range, IQR, five number summary
Categorical vs Categorical   | side-by-side bar chart, segmented bar chart | two-way table, difference in proportions
Quantitative vs Categorical  | side-by-side boxplots                       | statistics by group, difference in means
Quantitative vs Quantitative | scatterplot                                 | correlation

52 Descriptive Statistics
Think of a topic or question you would like to use data to help you answer.
• What would the cases be?
• What would the variables be? (Limit to one or two variables)

53 Descriptive Statistics
How would you visualize and summarize the variable or relationship between variables?
a) bar chart/pie chart, proportions, frequency table/relative frequency table
b) dotplot/histogram/boxplot, mean/median, sd/range/IQR, five number summary
c) side-by-side or segmented bar charts, difference in proportions, two-way table
d) side-by-side boxplot, difference in means
e) scatterplot, correlation

54 Descriptive Statistics
Would you have preferred two lectures on descriptive statistics (as we did), or no lectures on it and just a pointer to the book if you hadn’t already learned it?
a) Glad we spent class time on it
b) I knew it all – lectures were a waste of time
c) Didn’t know all of it, but would rather have just read the book to fill in gaps

55 To Do
• Read Sections 2.1, 2.4, 2.5
• Do Homework 2 (due Tuesday, 9/18)

