Author(s): Brenda Gunderson, Ph.D., 2011 License: Unless otherwise noted, this material is made available under the terms of the Creative Commons Attribution–Non-commercial–Share.

1 Author(s): Brenda Gunderson, Ph.D., 2011 License: Unless otherwise noted, this material is made available under the terms of the Creative Commons Attribution–Non-commercial–Share Alike 3.0 License: http://creativecommons.org/licenses/by-nc-sa/3.0/ We have reviewed this material in accordance with U.S. Copyright Law and have tried to maximize your ability to use, share, and adapt it. The citation key on the following slide provides information about how you may share and adapt this material. Copyright holders of content included in this material should contact open.michigan@umich.edu with any questions, corrections, or clarification regarding the use of content. For more information about how to cite these materials visit http://open.umich.edu/education/about/terms-of-use. Any medical information in this material is intended to inform and educate and is not a tool for self-diagnosis or a replacement for medical evaluation, advice, diagnosis or treatment by a healthcare professional. Please speak to your physician if you have questions about your medical condition. Viewer discretion is advised: Some medical content is graphic and may not be suitable for all viewers.

2 Attribution Key for more information see: http://open.umich.edu/wiki/AttributionPolicy Use + Share + Adapt Make Your Own Assessment Creative Commons – Attribution License Creative Commons – Attribution Share Alike License Creative Commons – Attribution Noncommercial License Creative Commons – Attribution Noncommercial Share Alike License GNU – Free Documentation License Creative Commons – Zero Waiver Public Domain – Ineligible: Works that are ineligible for copyright protection in the U.S. (17 USC § 102(b)) *laws in your jurisdiction may differ Public Domain – Expired: Works that are no longer protected due to an expired copyright term. Public Domain – Government: Works that are produced by the U.S. Government. (17 USC § 105) Public Domain – Self Dedicated: Works that a copyright holder has dedicated to the public domain. Fair Use: Use of works that is determined to be Fair consistent with the U.S. Copyright Act. (17 USC § 107) *laws in your jurisdiction may differ Our determination DOES NOT mean that all uses of this 3rd-party content are Fair Uses and we DO NOT guarantee that your use of the content is Fair. To use this content you should do your own independent analysis to determine whether or not your use will be Fair. { Content the copyright holder, author, or law permits you to use, share and adapt. } { Content Open.Michigan believes can be used, shared, and adapted because it is ineligible for copyright. } { Content Open.Michigan has used under a Fair Use determination. }

3 How would you describe the shape of the Sleep Hours distribution? A) Approximately symmetric B) Uniform C) Skewed to the right D) Skewed to the left

4 What if … … you had some data and you made a histogram of it and it looked like this… What would it tell you? [Histogram; axis labels: Response, Count]

5 Histograms: Other Comments NO SPACE BETWEEN BARS! Unless there are no observations in that interval. How Many Classes? Use your judgment: generally somewhere between 6 and 15 intervals. Better to use relative frequencies on y axis when comparing sets of obs. Software has defaults and many options. Note: skip dotplots and stem-and-leaf plots (but read section 2.5); summary pg 35-36.

6 One More Histogram Example: Study to find out number of hours children aged 8 to 12 years spent watching television on a typical day. Listing of all households; a random sample of 20 (of 100) households selected and all children aged 8 to 12 years in selected households interviewed → Histogram. Try it on your own – Solutions will be on Ctools (under Lecture Info) next week!

7 2.5 Numerical Summaries of Quantitative Data Notation for generic set of data: $x_1, x_2, x_3, \ldots, x_n$ where n = # items in the data set or sample size. Describing the Location/Center: Mean -- numerical average value. Represent mean of a sample (called a statistic) by … Median -- middle value when data arranged smallest → largest.

8 Numerical Summaries of French Fries Data Fries Example: Weight measurements for 12 small orders of Fries (gms). 77 72 69 80 63 67 78 74 70 83 71 79 What should we do first? Graph it! Based on histogram, distribution of weight is unimodal and approximately symmetric. Weights (in grams) range from 60’s to lower 80’s, centered around lower 70’s.
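As a rough companion to "Graph it!", the sketch below bins the twelve weights into classes and prints counts and relative frequencies as a text histogram. The class width of 5 grams is an assumption made for illustration and is not taken from the slides; `bin_start` is a hypothetical helper name.

```python
from collections import Counter

weights = [77, 72, 69, 80, 63, 67, 78, 74, 70, 83, 71, 79]  # grams, from the slide
BIN_WIDTH = 5  # assumed class width, not specified in the slides


def bin_start(x, width=BIN_WIDTH):
    """Left edge of the class [start, start + width) that contains x."""
    return (x // width) * width


counts = Counter(bin_start(w) for w in weights)
n = len(weights)

# One row per class: a bar of '#' marks, the count, and the relative frequency.
for start in sorted(counts):
    count = counts[start]
    rel_freq = count / n
    print(f"[{start}, {start + BIN_WIDTH}): {'#' * count:<5} count={count} rel.freq={rel_freq:.2f}")
```

With this class width the counts rise to a single peak in the 70 to 75 class and fall off on both sides, which matches the slide's description of a unimodal, approximately symmetric shape.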

9 Numerical Summaries of French Fries Data Fries Example: Weight measurements for 12 small orders of Fries (gms). 77 72 69 80 63 67 78 74 70 83 71 79 1.Compute the mean weight.

10 Numerical Summaries of French Fries Data Fries Example: Weight measurements for 12 small orders of Fries (gms). 77 72 69 80 63 67 78 74 70 83 71 79 1.Mean weight = 73.6 grams 2.Compute the median weight. 63, 67, 69, 70, 71, 72, 74, 77, 78, 79, 80, 83
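The mean and median quoted here can be checked with a few lines of Python. This is only a sketch using the standard library's `statistics` module; the variable names are illustrative.

```python
import statistics

weights = [77, 72, 69, 80, 63, 67, 78, 74, 70, 83, 71, 79]  # grams, from the slide

# Mean: sum of the observations divided by how many there are.
mean_weight = statistics.mean(weights)

# Median: with an even number of observations, the average of the two middle
# values of the sorted data (72 and 74 here).
median_weight = statistics.median(weights)

print(f"mean   = {mean_weight:.1f} grams")
print(f"median = {median_weight} grams")
```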

11 Numerical Summaries of French Fries Data Fries Example: Weight measurements for 12 small orders of Fries (gms). 1.Mean weight = 73.6 grams 2.Median weight = 73 grams 3.What if smallest weight entered as 3 instead of 63? Note: The mean is _________________ to extreme observations. The median is ________________ to extreme observations.
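One way to explore the "what if" question is to rerun both summaries with the smallest weight mistyped as 3. The sketch below does that; `typo_weights` is a hypothetical name for the altered data.

```python
import statistics

weights = [77, 72, 69, 80, 63, 67, 78, 74, 70, 83, 71, 79]  # grams, from the slide

# Same data, but with the smallest weight (63) entered as 3 by mistake.
typo_weights = [3 if w == 63 else w for w in weights]

for label, data in [("original", weights), ("with typo", typo_weights)]:
    print(f"{label:>9}: mean = {statistics.mean(data):.1f}, "
          f"median = {statistics.median(data)}")
```

The single bad entry pulls the mean down by about 5 grams, while the median does not move because the two middle values of the sorted data are unchanged.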

12 Some Pictures: Mean versus Median

13 2.5 Numerical Summaries of Quantitative Data Describing Spread: Range and Interquartile Range Midterms are returned and the “average” was reported as 76 out of 100. You received a score of 88. How should you feel?

14 Describing Spread: Range and Interquartile Range. Range: Measures spread over 100% of data. Range = High value – Low value = Maximum – Minimum … Percentiles – the pth percentile is the value such that p% of the observations fall at or below that value. Some common percentiles: Median: 50th percentile; First quartile: 25th percentile; Third quartile: 75th percentile

15 Describing Spread Five Number Summary:
Variable Name and Units (n = number of observations)
Median     M
Quartiles  Q1   Q3
Extremes   Min  Max
Interquartile Range: Measures spread over middle 50% of data. IQR = Q3 – Q1

16 Five-Number Summary of French Fries Data Try it! French Fries Data Ordered: 63, 67, 69, 70, 71, 72, 74, 77, 78, 79, 80, 83 Find the five-number summary: Range: IQR: Weight of Fries (in grams) (n = 12 orders) Median Quartiles Extremes Text Scores Example pg 15 will be posted on Ctools
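For checking a hand-worked answer, here is a minimal sketch of a five-number summary. It assumes the quartiles are taken as the medians of the lower and upper halves of the sorted data (leaving out the middle value when n is odd); software defaults often use other quartile rules and can give slightly different numbers. `five_number_summary` is a hypothetical helper, not a library function.

```python
import statistics


def five_number_summary(data):
    """Min, Q1, median, Q3, max using the median-of-halves quartile rule."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    # For odd n, skip the middle observation so both halves have equal size.
    upper = xs[half:] if n % 2 == 0 else xs[half + 1:]
    return {
        "min": xs[0],
        "q1": statistics.median(lower),
        "median": statistics.median(xs),
        "q3": statistics.median(upper),
        "max": xs[-1],
    }


weights = [63, 67, 69, 70, 71, 72, 74, 77, 78, 79, 80, 83]  # grams, from the slide
summary = five_number_summary(weights)
data_range = summary["max"] - summary["min"]
iqr = summary["q3"] - summary["q1"]

print(summary)
print(f"range = {data_range} grams, IQR = {iqr} grams")
```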

17 Boxplots Boxplot Steps: Label an axis with values to cover the minimum and maximum of the data. Make box with ends at quartiles Q1 and Q3. Draw a line in the box at the median M. Check for possible outliers using the 1.5*IQR rule and if any, plot them individually. Extend lines from end of box to smallest and largest obs that are not possible outliers. Note: Possible outliers are observations that are more than 1.5*IQR outside the quartiles. That is, observations that are below Q1 - 1.5*IQR or observations that are above Q3 + 1.5*IQR.

18 Outliers for French Fries Data Five-number summary:
Weight of Fries (in grams) (n = 12 orders)
Median     73
Quartiles  69.5  78.5
Extremes   63    83
Verify no outliers … IQR = 78.5 – 69.5 = 9 grams; 1.5*IQR = 1.5 (9) = 13.5 grams. Lower boundary (fence) = Q1 - 1.5*IQR = ______ Are there any observations that fall below this lower boundary? Upper boundary (fence) = Q3 + 1.5*IQR = ______ Are there any observations that fall above this upper boundary?
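The 1.5*IQR check can be sketched in code as follows, using the quartiles given on this slide. The variable names are illustrative.

```python
weights = [63, 67, 69, 70, 71, 72, 74, 77, 78, 79, 80, 83]  # grams, from the slide
q1, q3 = 69.5, 78.5  # quartiles from the five-number summary on the slide

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Possible outliers are observations strictly outside the fences.
outliers = [w for w in weights if w < lower_fence or w > upper_fence]

print(f"IQR = {iqr}, fences = ({lower_fence}, {upper_fence})")
print("possible outliers:", outliers)
```

Every weight lies between the two fences, so the list of possible outliers comes back empty.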

19 What if …? for French Fries Data Suppose largest weight of 83 → 93: 63, 67, 69, 70, 71, 72, 74, 77, 78, 79, 80, 93 Five-number summary:
Weight of Fries (in grams) (n = 12 orders)
Median     73
Quartiles  69.5  78.5
Extremes   63    93
Boundaries still 56 and 92. → One high outlier, the maximum value of 93. Modified boxplot → Why line extends out to 80?
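To see where the whisker of the modified boxplot stops, the sketch below applies the same fences (56 and 92) to the altered data and takes the smallest and largest observations that are not flagged. The variable names are illustrative.

```python
weights = [63, 67, 69, 70, 71, 72, 74, 77, 78, 79, 80, 93]  # largest value changed to 93
q1, q3 = 69.5, 78.5  # quartiles are unchanged by the new maximum

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Flagged points are plotted individually; the whiskers stop at the most
# extreme observations that are still inside the fences.
outliers = [w for w in weights if w < lower_fence or w > upper_fence]
inside = [w for w in weights if lower_fence <= w <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

print("possible outliers:", outliers)
print(f"whiskers run from {whisker_low} to {whisker_high}")
```

The upper whisker ends at 80 because that is the largest weight not flagged as a possible outlier; it does not extend to the fence value of 92.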

20 Notes on Boxplots: Side-by-side boxplots are good for... Watch out -- points plotted individually are... Can't confirm.... When reading values from a graph show what you are doing (so appropriate credit can be given on exam/quiz)

21 Side-by-side Boxplots Try It: Breakfast status (eat? Yes/No) Standardized grade score (on a 10-point scale). Review NOTES on bottom of page 16, then work with neighbor – try page 17 parts (a) to (c). We will clicker in the answers soon.

22 a. What is approx the lowest grade scored by a child who does have breakfast? A) 4 B) 4.5 C) 6

23 b. Among children who did not eat breakfast, 25% had a grade score of at least ______ points. A) 7.5 B) 6.5 C) 5.5

24 c. A = True or B = False? The symmetry in the boxplot for the children not eating breakfast implies that the distribution for the grade scores of such students is bell-shaped.

25 2.6 How to Handle Outliers See pages 45-46 for good examples. From Utts, Jessica M. and Robert F. Heckard. Mind on Statistics, Fourth Edition. 2012. Used with permission.

26 2.7 Features of Bell-Shaped Distributions Describing Spread with Standard Deviation: A measure of the spread of the observations from the mean. Interpret as “roughly, the average distance of the observations from the mean.” s = sample standard deviation = $\sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}$ Note: Squared standard deviation, $s^2$, is called the variance. We emphasize standard deviation since it is in original units.

27 Standard Deviation for French Fries Data Fries Example: Weight measurements for 12 small orders of Fries (gms). 77 72 69 80 63 67 78 74 70 83 71 79 Mean = 73.6 grams. Find the standard deviation.
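A step-by-step sketch of the calculation is given below. It assumes the usual sample definition, in which the sum of squared deviations from the mean is divided by n - 1 before taking the square root, and it cross-checks the result against the standard library's `statistics.stdev`.

```python
import math
import statistics

weights = [77, 72, 69, 80, 63, 67, 78, 74, 70, 83, 71, 79]  # grams, from the slide
n = len(weights)

mean_weight = sum(weights) / n

# Squared distance of each observation from the mean.
squared_devs = [(w - mean_weight) ** 2 for w in weights]

# Sample variance divides by n - 1 (assumed convention), not n.
variance = sum(squared_devs) / (n - 1)
s = math.sqrt(variance)

print(f"mean = {mean_weight:.1f} g, variance = {variance:.1f} g^2, s = {s:.1f} g")
print(f"statistics.stdev agrees: {math.isclose(s, statistics.stdev(weights))}")
```

Dividing by n - 1 gives a value that rounds to 5.9 grams; dividing by n instead would give roughly 5.7 grams.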

28 Interpretation of Standard Deviation Fries Example: Weight measurements for 12 small orders of Fries (gms). 77 72 69 80 63 67 78 74 70 83 71 79 Mean = 73.6 grams. Standard deviation = 5.9 grams. Interpretation: These weights of small order of french fries are roughly _________________ away from their mean weight of _____________, on average. Or On average, the weights of small orders of french fries vary by about ______________ from their mean weight of _______________.

29 Notes about the Standard Deviation: s = 0 means... Like the mean, s is... So use the mean and standard deviation for ____________________________________ the 5-number summary is better for ____________________________________.

30 Notes about the Standard Deviation: Technical Note about difference between population parameter and sample statistic...

