Thinking about variation
Learning Objectives By the end of this lecture, you should be able to: – Discuss, with an example, why it is important to know the variation when analyzing a dataset – Interpret a series of Normal curves relative to each other in terms of their center and variation – Compare values from different datasets by comparing their z-scores
Thoughts on variation continued Let’s take a moment to think about spread (again)… Suppose you score 12 out of 15 on a test. – Great score? – Good score? – Average score? – Poor score? – Terrible score? Answer: You can’t tell! I hope you’d agree that you’d at least need the mean in order to interpret how good a score this was. Okay then, so suppose I tell you that the mean was 11 / 15. Now answer the same question: Is 12/15, with a mean of 11, a great score, good score, average score, poor score, or terrible score? Answer: You STILL can’t tell! While you could say that it is somewhat better than average, you really have no way of knowing if it is approximately average, good, or great.
Thoughts on variation continued Suppose I tell you that the mean was 11 / 15. Is 12/15 a: – Great score? – Good score? – Average score? – Poor score? – Terrible score? Discussion: What’s missing from this interpretation is a measure of spread. Suppose I told you that of the 500 students who took this test, the vast majority scored between 10.5 and 11.5. In this case, you’d suspect that a score of 12 was, in fact, quite good, but you couldn’t put a number on it. KEY POINT: In order to properly interpret any score (from a Normal distribution), we simply cannot ignore the standard deviation! Suppose the standard deviation was 0.5. In this case, a score of 12 is two standard deviations above the mean (z = +2). This would be a score at about the 98th percentile, which is a great result. Suppose instead the standard deviation was 2. In that case, your z-score is +0.5 and you are at about the 69th percentile, which is good, but not fantastic. In other words, without knowing the spread, you simply do not know the story!
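The two cases above can be checked numerically. This is a minimal sketch, assuming Python 3.8 or later and its standard-library `statistics.NormalDist`; the helper names `z_score` and `percentile` are illustrative and not part of the lecture.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

def percentile(z):
    """Percent of a Normal distribution that lies below the given z-score."""
    return 100 * NormalDist().cdf(z)

# The lecture's test score: 12 out of 15, with a class mean of 11.
score, mean = 12, 11

# Same score, same mean, two different standard deviations.
for sd in (0.5, 2):
    z = z_score(score, mean, sd)
    print(f"sd = {sd}: z = {z:+.1f}, percentile = {percentile(z):.1f}")
```

The same score lands near the 98th percentile when the standard deviation is 0.5, but only near the 69th when it is 2.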
What’s different? What’s the same? In the first group, the means are different (μ = 10, 15, and 20) while the standard deviations are the same (σ = 3). In the second group, the means are the same (μ = 15) but the standard deviations are different (σ = 2, 4, and 6).
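For the group of curves that share a mean of 15, one way to see the effect of the standard deviation is to compute how tall each curve is at its center and where the middle 68% or so of values fall (within one standard deviation of the mean). A small sketch, assuming Python's standard-library `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

# Three Normal curves with the same mean but different standard deviations.
curves = [NormalDist(mu=15, sigma=s) for s in (2, 4, 6)]

# Height of each curve at its own mean: a smaller spread gives a taller peak.
peak_heights = [c.pdf(c.mean) for c in curves]

# Interval within one standard deviation of the mean (roughly the middle 68%).
middle_68 = [(c.mean - c.stdev, c.mean + c.stdev) for c in curves]

for c, height, (low, high) in zip(curves, peak_heights, middle_68):
    print(f"sd = {c.stdev:.0f}: peak height = {height:.3f}, "
          f"one-sd interval = {low:.0f} to {high:.0f}")
```

Because the total area under every curve is 1, a narrower curve has to be taller, which is why the curve with the smallest standard deviation has the highest peak.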
Another extremely useful thing about working with normally distributed data is that we can compare apples and oranges! That is, because we can convert any observation into a z-score, we can compare values from seemingly non-comparable distributions.
SAT vs ACT Question: Suppose that student A scores 1140 on their SAT, and student B scores 18.2 on their ACT. You are an admissions counselor and you need to make a decision based exclusively on their test scores. Can you use this data to decide? Answer: If you can convert these numbers to their corresponding z-scores, then absolutely! To do so, you would, of course, need to know the mean and standard deviation of the two exams. This information is routinely provided by the testing services. E.g. if student A had a z-score of +1, they were at about the 84th percentile for the SAT. If student B had a z-score of +1.3, they were at about the 90th percentile for the ACT. So even though they took completely different exams, you do have a way of comparing them!
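The percentile comparison can be sketched in a few lines. This assumes Python's standard-library `statistics.NormalDist` and starts from the two z-scores used in the example (+1 and +1.3); the lecture does not give the exam means and standard deviations, so none are invented here, and the dictionary keys are just illustrative labels.

```python
from statistics import NormalDist

# z-scores taken from the example; in practice each would be computed as
# (score - exam mean) / exam standard deviation.
z_scores = {"A (SAT)": 1.0, "B (ACT)": 1.3}

# Convert each z-score to a percentile on the standard Normal curve.
percentiles = {student: 100 * NormalDist().cdf(z)
               for student, z in z_scores.items()}

# The student with the higher percentile did better relative to their own exam.
higher = max(percentiles, key=percentiles.get)

for student, pct in percentiles.items():
    print(f"Student {student}: percentile = {pct:.1f}")
print(f"Higher relative standing: student {higher}")
```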
Example: Gestation time in malnourished mothers A study measured the gestation times of mothers in a poor neighborhood. While free prenatal vitamins were available, there was a great deal of misinformation about proper prenatal nutrition. The gestation times for this group are shown by the light-blue curve below. Over the next couple of years, a public health project was implemented at local health-care institutions in which women were also provided with nutritional counseling and healthier food. The results of a study done after the nutritional program was implemented are summarized by the orange curve below. Try to interpret the results in your own words…
The mean gestation time improved from about 250 days to about 266 days. In addition to the mean improving, more women had gestation times close to the mean (the peak of the orange curve is higher than the peak of the blue curve). There was more consistency in the “better nutrition” group: the spread of the orange distribution is narrower. (You can simply eyeball this, but you can also quantify it with the standard deviation.) Don’t feel bad if you didn’t automatically ‘get’ all these facts. That’s why we do examples here! Your goal should be to begin making these kinds of interpretations on your own.
A commonly accepted minimum for a healthy gestation period is about 240 days. How might we quantify the improvement shown below? Instead of waiting for me to answer, try to come up with it on your own. I.e. STOP and THINK about it for a moment… Answer: The best way would be to look at the percentage of women who reached the target of 240 days in each group.
Vitamins only (μ = 250, σ = 20, x = 240): In the group without nutritional counseling, what percent of mothers failed to carry their babies at least 240 days? Answer: z = (240 - 250) / 20 = -0.5, so about 31% of women failed to reach the target length of 240 days.
Nutritional counseling and better food (μ = 266, σ = 15, x = 240): z = (240 - 266) / 15 ≈ -1.73, so only about 4% of women in the nutritional assistance program failed to carry their babies 240 days! Conclusion: Compared to vitamin supplements alone, vitamins and better food resulted in a much smaller percentage of women with pregnancy terms below 240 days, or about 8 months (4% vs. 31%).
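Both percentages can be reproduced without a Normal table. A minimal sketch, assuming Python's standard-library `statistics.NormalDist` and the means and standard deviations quoted for the two groups (250 and 20 days for vitamins only, 266 and 15 days for the nutrition program); the names `TARGET_DAYS`, `groups`, and `below_target` are illustrative.

```python
from statistics import NormalDist

TARGET_DAYS = 240  # minimum gestation length used as the target in the example

# Gestation-time distributions for the two groups, in days.
groups = {
    "vitamins only": NormalDist(mu=250, sigma=20),
    "nutrition program": NormalDist(mu=266, sigma=15),
}

# cdf(x) is the area to the left of x, i.e. the share of pregnancies
# shorter than the target.
below_target = {name: dist.cdf(TARGET_DAYS) for name, dist in groups.items()}

for name, share in below_target.items():
    z = (TARGET_DAYS - groups[name].mean) / groups[name].stdev
    print(f"{name}: z = {z:.2f}, below {TARGET_DAYS} days = {share:.1%}")
```

The output should agree with the table-based answers of roughly 31% and 4%.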
Going in the other direction… Remember: stats teachers love this!! We may also want to find the range of observed values that corresponds to a given proportion (area under the curve). For that, we go backward, starting with the Normal table: we first find the desired area in the body of the table, then read the corresponding z-value from the left column and top row. For example, for an area of 1.25% (0.0125) to the left, the z-value is -2.24.
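The backward lookup has a direct software equivalent: the inverse of the cumulative distribution function. A short sketch, assuming Python's standard-library `statistics.NormalDist`, whose `inv_cdf` method takes an area to the left and returns the matching value; the variable names are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Area to the left of the z-value we are looking for (1.25%).
area_to_left = 0.0125

# inv_cdf plays the role of reading the table backward: area in, z out.
z = standard_normal.inv_cdf(area_to_left)
print(f"z for a left-tail area of {area_to_left}: {z:.2f}")

# Going forward again recovers the area we started from.
print(f"area to the left of z: {standard_normal.cdf(z):.4f}")
```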
Example: μ = 266, σ = 15, upper area 75% How long are the longest 75% of pregnancies when mothers in the neighborhood are entered in the “better food” program? Answer: This is another case where we start with an area and need to work back to our x. An upper area of 75% leaves 25% to the left, which corresponds to z ≈ -0.67, so x = 266 + (-0.67)(15) ≈ 256. Conclusion: The longest 75% of pregnancies in this group last about 256 days or longer.
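The cutoff can also be computed directly. A minimal sketch, assuming Python's standard-library `statistics.NormalDist` and the mean of 266 days and standard deviation of 15 days given for the nutrition program; the variable names are illustrative.

```python
from statistics import NormalDist

# Gestation times in the nutrition program group, in days.
program = NormalDist(mu=266, sigma=15)

# The longest 75% of pregnancies sit above the point that has 25% to its left.
upper_area = 0.75
area_to_left = 1 - upper_area

# Work back from the area to the observed value x.
cutoff_days = program.inv_cdf(area_to_left)

# Equivalent two-step route: find z first, then convert with x = mean + z * sd.
z = NormalDist().inv_cdf(area_to_left)
cutoff_from_z = program.mean + z * program.stdev

print(f"z = {z:.2f}, cutoff = {cutoff_days:.1f} days")
```

Both routes give a cutoff of about 256 days, so the longest 75% of pregnancies in this group last roughly 256 days or more.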