Daniel S. Yates The Practice of Statistics Third Edition Chapter 2: Describing Location in a Distribution Copyright © 2008 by W. H. Freeman & Company.

1 Daniel S. Yates The Practice of Statistics Third Edition Chapter 2: Describing Location in a Distribution Copyright © 2008 by W. H. Freeman & Company

2 2.1 – Measures of Relative Standing and Density Curves
– Describing the location of an individual within a distribution
– Density curves: graphical models (bell curve in 2.2)

3 Test scores for 25 students
79 81 80 77 73 83 74 93 78 80 75 67 73
77 83 86 90 79 85 83 89 84 82 77 72
The distribution seems roughly symmetric with no outliers. Jenny’s score is 86. How did she perform relative to her peers?

4 Statistics for test scores We can see that Jenny’s score of 86 is “above average” but how far above is it?

5 Standardized Value One way to describe relative position in a data set is to tell how many standard deviations above or below the mean the observation is. Standardized Value: “z-score” If the mean and standard deviation of a distribution are known, the “z-score” of a particular observation, x, is: z = (x - mean) / (standard deviation)

6 Calculating z-scores Consider the test data and Jenny’s score.
79 81 80 77 73 83 74 93 78 80 75 67 73
77 83 86 90 79 85 83 89 84 82 77 72
According to Minitab, the mean test score was 80 and the standard deviation was 6.07 points. Jenny’s score of 86 was above average. Her standardized score (z-score) is: z = (86-80)/6.07 = 0.99 Jenny’s score was almost one full standard deviation above the mean.

7 Calculating z-scores
79 81 80 77 73 83 74 93 78 80 75 67 73
77 83 86 90 79 85 83 89 84 82 77 72
6 | 7
7 | 2334
7 | 5777899
8 | 00123334
8 | 569
9 | 03
Jenny: z = (86-80)/6.07 = 0.99 {above average = +z}
Kevin: z = (72-80)/6.07 = -1.32 {below average = -z}
Katie: z = (80-80)/6.07 = 0 {average: z = 0}
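The three z-scores can be checked with a few lines of code. This is a minimal sketch in Python, chosen here only for illustration (the slides use Minitab); the helper name `z_score` is hypothetical. It recomputes the mean and sample standard deviation from the 25 scores and then standardizes each student's score.

```python
from statistics import mean, stdev

# The 25 test scores from the slides
scores = [79, 81, 80, 77, 73, 83, 74, 93, 78, 80, 75, 67, 73,
          77, 83, 86, 90, 79, 85, 83, 89, 84, 82, 77, 72]

def z_score(x, center, spread):
    """Standard deviations that x lies above (+) or below (-) the center."""
    return (x - center) / spread

m = mean(scores)   # mean of the 25 scores
s = stdev(scores)  # sample standard deviation (divides by n - 1)

students = {"Jenny": 86, "Kevin": 72, "Katie": 80}
z_scores = {name: round(z_score(x, m, s), 2) for name, x in students.items()}

for name, z in z_scores.items():
    print(f"{name}: z = {z}")
```

A positive z means the score is above the mean, a negative z means it is below, and z = 0 means it is exactly average.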

8 Comparing Scores Standardized values can be used to compare scores from two different distributions. Statistics Test: mean = 80, std dev = 6.07 Chemistry Test: mean = 76, std dev = 4 Jenny got an 86 in Statistics and 82 in Chemistry. On which test did she perform better?
Statistics: z = (86-80)/6.07 = 0.99
Chemistry: z = (82-76)/4 = 1.5
Although she had a lower score, she performed relatively better in Chemistry.

9 Percentiles Another measure of relative standing is a percentile rank. pth percentile: Value with p% of observations at or below it. median = 50th percentile {mean = 50th %ile if symmetric} Q1 = 25th percentile Q3 = 75th percentile
6 | 7
7 | 2334
7 | 5777899
8 | 00123334
8 | 569
9 | 03
Jenny got an 86. 22 of the 25 scores are ≤ 86. Jenny is at the 22/25 = 88th %ile.
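Jenny's percentile can be computed directly from the data. The sketch below is in Python and assumes the "at or below" counting convention used in the slide's calculation (22 of 25 scores ≤ 86); the function name `percentile_rank` is hypothetical.

```python
# The 25 test scores from the slides
scores = [79, 81, 80, 77, 73, 83, 74, 93, 78, 80, 75, 67, 73,
          77, 83, 86, 90, 79, 85, 83, 89, 84, 82, 77, 72]

def percentile_rank(value, data):
    """Percent of observations in data that are at or below value."""
    at_or_below = sum(1 for x in data if x <= value)
    return 100 * at_or_below / len(data)

jenny = percentile_rank(86, scores)
print(f"Jenny is at the {jenny:.0f}th percentile")
```

Other conventions (for example, counting only scores strictly below the value) give slightly different ranks for small data sets.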

10 Chebyshev’s Inequality The % of observations at or below a particular z-score depends on the shape of the distribution. An interesting observation (not an AP topic) regarding the % of observations around the mean in ANY distribution is Chebyshev’s Inequality. Chebyshev’s Inequality: In any distribution, the % of observations within k standard deviations of the mean (for k > 1) is at least 100(1 - 1/k^2)%
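To see what the inequality guarantees, it helps to compare the bound with real data. The following Python sketch evaluates the bound 100(1 - 1/k^2)% for k = 2 and k = 3 and compares it with the 25 test scores from the earlier slides; the function names are hypothetical, and the sample standard deviation is used as in the slides.

```python
from statistics import mean, stdev

# The 25 test scores from the slides
scores = [79, 81, 80, 77, 73, 83, 74, 93, 78, 80, 75, 67, 73,
          77, 83, 86, 90, 79, 85, 83, 89, 84, 82, 77, 72]

def chebyshev_bound(k):
    """Minimum percent of observations within k standard deviations of the mean."""
    return 100 * (1 - 1 / k**2)

def percent_within(data, k):
    """Observed percent of data within k sample standard deviations of the mean."""
    m, s = mean(data), stdev(data)
    inside = sum(1 for x in data if abs(x - m) <= k * s)
    return 100 * inside / len(data)

for k in (2, 3):
    print(f"k = {k}: at least {chebyshev_bound(k):.1f}%, "
          f"observed {percent_within(scores, k):.1f}%")
```

The observed percentages are well above the guaranteed minimum, which is typical: the bound has to hold for every possible distribution, so it is conservative for roughly symmetric data like these.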

11 Density Curve In Chapter 1, you learned how to plot a dataset to describe its shape, center, spread, etc. Sometimes, the overall pattern of a large number of observations is so regular that we can describe it using a smooth curve. Density Curve: An idealized description of the overall pattern of a distribution. Area underneath = 1, representing 100% of observations.

12 Density Curves Density Curves come in many different shapes; symmetric, skewed, uniform, etc. The area of a region of a density curve represents the % of observations that fall in that region. The median of a density curve cuts the area in half. The mean of a density curve is its “balance point.”

13 The mean of a density curve The balancing point. Easy to locate if the curve is symmetric. Difficult if it is skewed. There are mathematical ways to determine it.
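One such mathematical approach is numerical integration. The sketch below, in Python, uses a made-up skewed density curve, f(x) = 2x on the interval from 0 to 1 (an assumption for illustration, not a curve from the slides), and approximates its total area, its mean (balance point), and its median (the point that cuts the area in half) with midpoint sums.

```python
def density(x):
    """Hypothetical density curve: f(x) = 2x on [0, 1], zero elsewhere."""
    return 2 * x if 0 <= x <= 1 else 0.0

N = 100_000                                  # number of thin rectangles
dx = 1 / N                                   # width of each rectangle
mids = [(i + 0.5) * dx for i in range(N)]    # midpoint of each rectangle

# Total area under the curve (should be 1 for any density curve)
area = sum(density(x) * dx for x in mids)

# Mean = balance point = sum of x weighted by the area at x
mean_value = sum(x * density(x) * dx for x in mids)

# Median = first point where the accumulated area reaches one half
cumulative = 0.0
median_value = None
for x in mids:
    cumulative += density(x) * dx
    if cumulative >= 0.5:
        median_value = x
        break

print(f"area = {area:.4f}, mean = {mean_value:.4f}, median = {median_value:.4f}")
```

For this curve the long tail is on the left, so the mean (about 0.667) is pulled below the median (about 0.707), which matches the idea that the mean moves toward the tail of a skewed distribution.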

14 Area under the curve

15 Because density curves are idealized descriptions, we need to distinguish between the mean and standard deviation of the density curve and the mean and standard deviation of the actual observations.
Actual observations (statistics): mean x̄, st. dev. s
Idealized density curve (parameters): mean µ, st. dev. σ

16 2.1 Summary We can describe the overall pattern of a distribution using a density curve. The area under any density curve = 1. This represents 100% of observations. Areas on a density curve represent % of observations over certain regions. An individual observation’s relative standing can be described using a z-score or percentile rank.

17 2.2 – Normal Distributions Normal curves are an important class of density curves; they describe Normal distributions. Symmetric Single-peaked Bell-shaped Not “normal” in the sense of being average or natural. We capitalize Normal.

18 Normal Density Curves All Normal distributions have the same overall shape. The exact density curve for a particular Normal distribution is described completely by giving its mean µ and standard deviation σ. Where would you find the mean on a normal curve?

19 Normal Density Curve Mean and Standard Deviation Changing the mean µ without changing the standard deviation moves the Normal curve along the horizontal axis. The standard deviation σ controls the spread of the Normal curve.

20 Locating Standard Deviation Remember that µ and σ alone do not define most distributions (like many from Chapter 1), and that you can't “eyeball” the standard deviation of most distributions; these are properties special to the Normal distribution. As we move out in either direction from the center, µ, the curve changes from falling ever more steeply to falling ever less steeply. The points where this change of curvature occurs lie a distance σ on either side of µ.

21 Why Normal Distributions? 1) Normal distributions describe many sets of real data. For example... Scores on tests taken by many people (SAT's, psychological tests). Repeated careful measurements of the same quantity. Characteristics of biological populations (such as yields of corn and lengths of animal pregnancies).

22 Why Normal Distributions? 2) Normal distributions are good approximations to the results of many kinds of chance outcomes, such as tossing a coin many times. 3) Most important, we will see that many statistical inference procedures based on Normal distributions work well for other roughly symmetric distributions.

23 Be aware... Even though many sets of data follow a Normal distribution, many do not.
– Income distributions are skewed right.
– Some symmetric distributions are NOT Normal.
– Don't assume a distribution is Normal just because it looks like it.

24 All Normal distributions obey the following...

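25 The 68-95-99.7 Rule In a Normal distribution with mean µ and standard deviation σ, approximately 68% of observations fall within σ of µ, 95% within 2σ of µ, and 99.7% within 3σ of µ.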

26 Young women's Heights The distribution of heights of young women aged 18 to 24 is approximately Normal with mean µ = 64.5 inches and standard deviation σ = 2.5 inches. 95% of women are between what two heights? What percent of women are taller than 69.5 in?
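The two height questions can be explored numerically. This is a sketch in Python using `statistics.NormalDist` from the standard library (Python 3.8 or later); the variable and function names are hypothetical. It checks the 68-95-99.7 percentages for N(64.5, 2.5) and computes the exact area above 69.5 inches for comparison with the rule-of-thumb answer.

```python
from statistics import NormalDist

# Heights of young women, N(64.5, 2.5), in inches
heights = NormalDist(mu=64.5, sigma=2.5)

def percent_within(dist, k):
    """Percent of a Normal distribution within k standard deviations of its mean."""
    low = dist.mean - k * dist.stdev
    high = dist.mean + k * dist.stdev
    return 100 * (dist.cdf(high) - dist.cdf(low))

for k in (1, 2, 3):
    print(f"within {k} sd: {percent_within(heights, k):.2f}%")

# The middle 95% (by the rule) lies within 2 standard deviations of the mean
low_95 = heights.mean - 2 * heights.stdev
high_95 = heights.mean + 2 * heights.stdev
print(f"about 95% of heights are between {low_95} and {high_95} inches")

# Percent taller than 69.5 inches (2 standard deviations above the mean)
taller = 100 * (1 - heights.cdf(69.5))
print(f"taller than 69.5 in: {taller:.2f}%")
```

The rule gives (100% - 95%)/2 = 2.5% above 69.5 inches; the exact Normal area is slightly smaller because 95% is itself a rounded figure.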

27 Young women's Heights Approximately what percent of women are shorter than Mrs. Marshall? Shorthand: N(µ, σ), here N(64.5, 2.5) HW: pg. 137 #2.23-2.26

28 Standard Normal Distribution If we standardize values from a Normal distribution using the z-score formula, then the standardized value also has a Normal distribution. This new distribution is called the standard Normal distribution, N(0, 1). Note: Standardizing is a linear transformation, so it does not affect the shape of the distribution.

30 Because all Normal distributions are the same once we standardize, you can now find percentages under Normal curves without calculus, using the standard Normal table. Recall that an area under a density curve is a proportion of the observations in a distribution.

32 Using the Standard Normal Table 1) Find the proportion of observations from the standard Normal distribution that are less than 2.22.

33 Using the Standard Normal Table 2) Find the proportion of observations from the standard Normal distribution that are greater than -2.15. Area to the right = 1 - 0.0158 = 0.9842

34 Be aware... A common mistake is to look up a z-value in Table A and report the entry corresponding to that z-value, regardless of whether the problem asks for the area to the left or to the right of that z-value. Always sketch the standard Normal curve, mark the z-value, shade the area of interest, and make sure your answer is reasonable in the context of the problem.

35 Using the Standard Normal Table 3) Find the proportion of observations between -1.23 and 2.11. Area left of 2.11 – area left of -1.23: 0.9826 – 0.1093 = 0.8733
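The three table lookups can be cross-checked in software. The sketch below is in Python and uses `statistics.NormalDist` from the standard library in place of Table A; the variable names are hypothetical. Its `cdf` method returns the area to the left of a z-value, which is the quantity the table lists.

```python
from statistics import NormalDist

Z = NormalDist()  # standard Normal distribution, N(0, 1)

# 1) Area to the left of z = 2.22
less_than_2_22 = Z.cdf(2.22)

# 2) Area to the right of z = -2.15 is 1 minus the area to its left
greater_than_neg_2_15 = 1 - Z.cdf(-2.15)

# 3) Area between z = -1.23 and z = 2.11 is a difference of two left areas
between = Z.cdf(2.11) - Z.cdf(-1.23)

print(f"P(Z < 2.22)          = {less_than_2_22:.4f}")
print(f"P(Z > -2.15)         = {greater_than_neg_2_15:.4f}")
print(f"P(-1.23 < Z < 2.11)  = {between:.4f}")
```

Software answers can differ from table answers in the fourth decimal place, because each table entry is already rounded before the subtraction.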

38 Cholesterol in Young Boys For 14-year-old boys, the mean is µ = 170 milligrams of cholesterol per deciliter of blood (mg/dl) and the standard deviation is σ = 30 mg/dl. Levels above 240 mg/dl may require medical attention. What percent of 14-year-old boys have more than 240 mg/dl of cholesterol? Draw a picture. Use the table.

39 Cholesterol in Young Boys What percent of 14-year-old boys have blood cholesterol between 170 and 240 mg/dl?
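Both cholesterol questions follow the same two steps: standardize, then find an area under the standard Normal curve. The following Python sketch assumes the cholesterol levels are Normal with the stated mean and standard deviation; the helper `standardize` and the variable names are hypothetical.

```python
from statistics import NormalDist

MU, SIGMA = 170, 30   # mean and standard deviation in mg/dl

def standardize(x, mu, sigma):
    """Convert an observation x to a z-score."""
    return (x - mu) / sigma

Z = NormalDist()      # standard Normal distribution, N(0, 1)

z_240 = standardize(240, MU, SIGMA)   # about 2.33
z_170 = standardize(170, MU, SIGMA)   # exactly 0, since 170 is the mean

# Proportion above 240 mg/dl: area to the right of z_240
above_240 = 1 - Z.cdf(z_240)

# Proportion between 170 and 240 mg/dl: difference of two left areas
between_170_240 = Z.cdf(z_240) - Z.cdf(z_170)

print(f"above 240 mg/dl:         {100 * above_240:.2f}%")
print(f"between 170 and 240:     {100 * between_170_240:.2f}%")
```

Rounding z to 2.33 before using Table A gives answers that agree with these to about three decimal places.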

40 IQ Scores Based on N(100, 16), what % of people’s IQ scores would you expect to be:
–a) over 80?
–b) under 90?
–c) between 112 and 132?
–d) What IQ represents the 15th percentile?
–e) What IQ represents the 98th percentile?
–f) What is the IQR of the IQs?
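Parts a) through c) ask for an area given a value, while parts d) through f) go the other way and ask for a value given an area. A sketch of both directions in Python, using `statistics.NormalDist` (its `inv_cdf` method plays the role of reading Table A backwards); the variable names are hypothetical.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=16)   # N(100, 16)

# Value -> area: use the cumulative area to the left (cdf)
over_80 = 1 - iq.cdf(80)
under_90 = iq.cdf(90)
between_112_132 = iq.cdf(132) - iq.cdf(112)

# Area -> value: use the inverse of the cumulative area (inv_cdf)
p15 = iq.inv_cdf(0.15)                       # 15th percentile
p98 = iq.inv_cdf(0.98)                       # 98th percentile
iqr = iq.inv_cdf(0.75) - iq.inv_cdf(0.25)    # Q3 - Q1

print(f"a) over 80:              {100 * over_80:.1f}%")
print(f"b) under 90:             {100 * under_90:.1f}%")
print(f"c) between 112 and 132:  {100 * between_112_132:.1f}%")
print(f"d) 15th percentile:      {p15:.1f}")
print(f"e) 98th percentile:      {p98:.1f}")
print(f"f) IQR:                  {iqr:.1f}")
```

Working the same parts by hand with Table A is still worthwhile, since sketching and shading the curve is what catches left-versus-right mistakes.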

41 Assessing Normality You can only use these calculations and Table A if the data is Normal, so we must develop methods for assessing Normality. There are two methods:
–1) Construct a histogram or stemplot: set intervals to length s and compare counts to the Empirical Rule.
–2) Construct a Normal probability plot (done on a graphing calculator).

42 The Earth’s Density In 1798 the English scientist Henry Cavendish measured the density of the earth. He took 29 measurements and recorded the density as a multiple of the density of water.
5.50 5.61 4.88 5.07 5.26 5.55 5.36 5.29 5.58 5.65
5.57 5.53 5.62 5.29 5.44 5.34 5.79 5.10 5.27 5.39
5.42 5.47 5.63 5.34 5.46 5.30 5.75 5.68 5.85
Please enter in calculator

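43 Method 1: histogram/stemplot
48 | 8
49 |
50 | 7
51 | 0
52 | 6 7 9 9
53 | 0 4 4 6 9
54 | 2 4 6 7
55 | 0 3 5 7 8
56 | 1 2 3 5 8
57 | 5 9
58 | 5
Key: 48 | 8 = 4.88 Mean: 5.45 st. dev: 0.22
Counts in intervals of length s = 0.22 centered on the mean:
4.79 to 5.01: 1
5.01 to 5.23: 2
5.23 to 5.45: 11
5.45 to 5.67: 11
5.67 to 5.89: 4
5.89 to 6.11: 0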
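Method 1 amounts to counting how many observations fall within 1, 2, and 3 standard deviations of the mean and comparing with 68%, 95%, and 99.7%. The sketch below does that count in Python for Cavendish's 29 measurements; the function name `count_within` is hypothetical.

```python
from statistics import mean, stdev

# Cavendish's 29 measurements of the earth's density (multiples of water)
density = [5.50, 5.61, 4.88, 5.07, 5.26, 5.55, 5.36, 5.29, 5.58, 5.65,
           5.57, 5.53, 5.62, 5.29, 5.44, 5.34, 5.79, 5.10, 5.27, 5.39,
           5.42, 5.47, 5.63, 5.34, 5.46, 5.30, 5.75, 5.68, 5.85]

m = mean(density)    # about 5.45
s = stdev(density)   # sample standard deviation, about 0.22

def count_within(data, center, spread, k):
    """Number of observations within k standard deviations of the center."""
    return sum(1 for x in data if abs(x - center) <= k * spread)

counts = [count_within(density, m, s, k) for k in (1, 2, 3)]
percents = [round(100 * c / len(density), 1) for c in counts]

for k, c, p, target in zip((1, 2, 3), counts, percents, (68, 95, 99.7)):
    print(f"within {k} sd: {c} of {len(density)} = {p}% (rule: {target}%)")
```

With only 29 observations the percentages cannot match the rule exactly, so the comparison is a rough check rather than a formal test.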

44 Method 2: Normal Probability Plot Select the last stat plot option. The graph will show each data point plotted against its corresponding z-score. If the data is close to Normal, the graph will be roughly linear.
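The points of a Normal probability plot can also be computed without a calculator. The Python sketch below makes two assumptions that are not from the slides: it places the i-th smallest of n observations at cumulative proportion (i - 0.5)/n (calculators and software may use a slightly different formula), and it summarizes "roughly linear" with the correlation between the data and the expected z-scores. The function names are hypothetical.

```python
from statistics import NormalDist, mean

# Cavendish's 29 measurements of the earth's density (multiples of water)
density = [5.50, 5.61, 4.88, 5.07, 5.26, 5.55, 5.36, 5.29, 5.58, 5.65,
           5.57, 5.53, 5.62, 5.29, 5.44, 5.34, 5.79, 5.10, 5.27, 5.39,
           5.42, 5.47, 5.63, 5.34, 5.46, 5.30, 5.75, 5.68, 5.85]

def normal_scores(n):
    """Expected z-scores for n ordered observations, assuming positions (i - 0.5)/n."""
    Z = NormalDist()
    return [Z.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

def correlation(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

ordered = sorted(density)
z = normal_scores(len(ordered))
points = list(zip(ordered, z))   # (data value, expected z-score) pairs to plot

r = correlation(ordered, z)
print(f"correlation of the plot points: {r:.3f}")
```

A correlation close to 1 means the points lie close to a straight line, but the plot itself should still be inspected, since a single outlier or a curved pattern is easier to see than to summarize in one number.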

45 Method 2: continued Can provide this plot on the AP Exam to support Normality (or disprove it) Rule of Thumb: If you refer to a plot, you must provide it

46 Normal or Not? Examine the Normal probability plots below. Determine whether they represent Normal distributions and explain why or why not. [Figures: Normal probability plots 1) and 2)]

47 [Figures: Normal probability plots 3) and 4)] HW: pg. 154 #37-39, 41

