The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers CHAPTER 2 Modeling Distributions of Data 2.2 Density.


1 The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers CHAPTER 2 Modeling Distributions of Data 2.2 Density Curves and Normal Distributions

2 Learning Objectives: Density Curves and Normal Distributions. After this section, you should be able to: ESTIMATE the relative locations of the median and mean on a density curve. ESTIMATE areas (proportions of values) in a Normal distribution. FIND the proportion of z-values in a specified interval, or a z-score from a percentile in the standard Normal distribution. FIND the proportion of values in a specified interval, or the value that corresponds to a given percentile in any Normal distribution. DETERMINE whether a distribution of data is approximately Normal from graphical and numerical evidence.

3 Exploring Quantitative Data In Chapter 1, we developed a kit of graphical and numerical tools for describing distributions. Now, we’ll add one more step to the strategy.
1. Always plot your data: make a graph, usually a dotplot, stemplot, or histogram.
2. Look for the overall pattern (shape, center, and spread) and for striking departures such as outliers.
3. Calculate a numerical summary to briefly describe center and spread.
4. Sometimes the overall pattern of a large number of observations is so regular that we can describe it by a smooth curve.

4 Density Curves A density curve is a curve that is always on or above the horizontal axis and has area exactly 1 underneath it. A density curve describes the overall pattern of a distribution. The area under the curve and above any interval of values on the horizontal axis is the proportion of all observations that fall in that interval. Example: The overall pattern of this histogram of the scores of all 947 seventh-grade students in Gary, Indiana, on the vocabulary part of the Iowa Test of Basic Skills (ITBS) can be described by a smooth curve drawn through the tops of the bars.

5 Describing Density Curves Our measures of center and spread apply to density curves as well as to actual sets of observations. Distinguishing the Median and Mean of a Density Curve: The median of a density curve is the equal-areas point, the point that divides the area under the curve in half. The mean of a density curve is the balance point, at which the curve would balance if made of solid material. The median and the mean are the same for a symmetric density curve. They both lie at the center of the curve. The mean of a skewed curve is pulled away from the median in the direction of the long tail.
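
As a rough illustration of the equal-areas point and the balance point, the sketch below uses a made-up left-skewed density, f(x) = 2x on the interval from 0 to 1. This density and the helper names are assumptions chosen for illustration; they do not come from the textbook. The code approximates areas numerically, then locates the median and the mean.

```python
# Hypothetical left-skewed density for illustration: f(x) = 2x on [0, 1].
def density(x):
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


def area(lo, hi, steps=10_000):
    """Approximate the area under the density over [lo, hi] (midpoint rule)."""
    width = (hi - lo) / steps
    return sum(density(lo + (i + 0.5) * width) for i in range(steps)) * width


def median(tol=1e-6):
    """Equal-areas point: bisect until half the area lies to the left."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if area(0.0, mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def mean(steps=10_000):
    """Balance point: the density-weighted average of x."""
    width = 1.0 / steps
    xs = ((i + 0.5) * width for i in range(steps))
    return sum(x * density(x) for x in xs) * width


print(f"total area         = {area(0.0, 1.0):.4f}")
print(f"area over [0, 0.5] = {area(0.0, 0.5):.4f}")
print(f"median             = {median():.4f}")
print(f"mean               = {mean():.4f}")
```

The long tail of this density is on the left, and the mean (about 0.667) falls to the left of the median (about 0.707), which matches the rule that the mean is pulled toward the long tail.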

6 Describing Density Curves A density curve is an idealized description of a distribution of data. We distinguish between the mean and standard deviation of the density curve and the mean and standard deviation computed from the actual observations. The usual notation for the mean of a density curve is µ (the Greek letter mu). We write the standard deviation of a density curve as σ (the Greek letter sigma).

7 Normal Distributions One particularly important class of density curves is the Normal curves, which describe Normal distributions. All Normal curves have the same shape: symmetric, single-peaked, and bell-shaped. Any specific Normal curve is completely described by giving its mean µ and its standard deviation σ.

8 Normal Distributions A Normal distribution is described by a Normal density curve. Any particular Normal distribution is completely specified by two numbers: its mean µ and standard deviation σ. The mean of a Normal distribution is the center of the symmetric Normal curve. The standard deviation is the distance from the center to the change-of-curvature points on either side. We abbreviate the Normal distribution with mean µ and standard deviation σ as N(µ,σ). Why are the Normal distributions important in statistics? Normal distributions are good descriptions for some distributions of real data. Normal distributions are good approximations of the results of many kinds of chance outcomes. Many statistical inference procedures are based on Normal distributions.

9 The 68-95-99.7 Rule Although there are many Normal curves, they all have properties in common. The 68-95-99.7 Rule: In the Normal distribution with mean µ and standard deviation σ: Approximately 68% of the observations fall within σ of µ. Approximately 95% of the observations fall within 2σ of µ. Approximately 99.7% of the observations fall within 3σ of µ.
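
The three percentages in the rule are rounded areas under the standard Normal curve. A minimal sketch, assuming Python's standard-library `statistics.NormalDist` as a stand-in for a table or calculator (the helper name `proportion_within` is made up here), shows where they come from:

```python
from statistics import NormalDist

Z = NormalDist(mu=0.0, sigma=1.0)  # standard Normal distribution


def proportion_within(k):
    """Proportion of a Normal distribution within k standard deviations of the mean."""
    return Z.cdf(k) - Z.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {proportion_within(k):.4f}")
```

The computed areas are about 0.6827, 0.9545, and 0.9973, which the rule rounds to 68%, 95%, and 99.7%.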

10 Empirical Rule The 68-95-99.7 rule is also known as the Empirical Rule. Approximately 68% of observations fall within one standard deviation of the mean. Approximately 95% fall within two standard deviations of the mean. Approximately 99.7% fall within three standard deviations of the mean. The area under the entire curve is 100%. Any region that you calculate is known as a probability region. Remember: to use the Empirical Rule, the data must show a Normal, bell-shaped pattern in a histogram, dotplot, or stemplot. Always remember that the mean is in the middle. Just because a curve is bell-shaped does not mean it is Normal!

11 Example of the Empirical Rule The distribution of the scores of a vocabulary test for 7th graders in Indiana is N(6.84, 1.55).
a. What percent of the vocabulary scores are less than 3.74? Show your work.
b. What percent of the scores are between 5.29 and 9.94? Show your work.

12 Solution to examples: Always try to draw a sketch of what this Normal curve would look like. Based on the sketch:
a. 3.74 is exactly two standard deviations below the mean: µ - 2σ = 6.84 - 2(1.55) = 6.84 - 3.10 = 3.74. Likewise, µ + 2σ = 9.94. By the Empirical Rule, about 95% of all scores lie between 3.74 and 9.94, which leaves 100% - 95% = 5% outside that interval. Because the Normal curve is symmetric, that 5% is split evenly: half lies below 3.74 and the other half lies above 9.94. So 5% divided by two is 2.5%, and about 2.5% of scores are below 3.74.
b. Again referring to the picture, we want the area under the density curve between 5.29 and 9.94. Since 5.29 is 1σ below the mean and 9.94 is 2σ above the mean, we add 68% (the area within 1σ of the mean) to 13.5% (the area between 1σ and 2σ above the mean). Why 13.5%? The area between 1σ and 2σ on both sides is 95% - 68% = 27%, and we want only the upper half, so 27%/2 = 13.5%. Therefore 68% + 13.5% = 81.5%.
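
The Empirical Rule answers above are approximations. As a hedged cross-check, the sketch below computes the same two areas directly for N(6.84, 1.55), using Python's `statistics.NormalDist` in place of a table or calculator (the variable names are illustrative only):

```python
from statistics import NormalDist

scores = NormalDist(mu=6.84, sigma=1.55)  # N(6.84, 1.55) from the example

below_374 = scores.cdf(3.74)                         # part a
between_529_994 = scores.cdf(9.94) - scores.cdf(5.29)  # part b

print(f"P(X < 3.74)        = {below_374:.4f}  (Empirical Rule: about 0.025)")
print(f"P(5.29 < X < 9.94) = {between_529_994:.4f}  (Empirical Rule: about 0.815)")
```

The direct areas are close to, but not identical to, the rule-of-thumb answers, because 68%, 95%, and 99.7% are themselves rounded.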

13 Practice: application of the Empirical Rule. The distribution of batting averages for 432 MLB players is N(0.261, 0.034).
a. What percent of the batting averages are greater than 0.329? Show your work.
b. What percent of the batting averages are between 0.193 and 0.295? Show your work.

14 Homework: Read section 2.2. Do pg. 129 #40–44.

15 Exit Ticket: The average height of American adult males is 69 inches, with a standard deviation of 2.5 inches. What percent of adult males are taller than 74 inches? Show your work.

16 The Standard Normal Distribution Aim: How can we solve problems using the standard Normal distribution? H.W.: Read section 2.2; do pg. 129–130 #47–56. Do Now: Test scores have a mean of 79 and a standard deviation of 5.5. Determine the z-scores (standardized scores) of the following students:
1. Michael, whose exam score is 87.
2. Sarah, whose exam score is 75.
3. Fred, whose exam score is 58.

17 The Standard Normal Distribution All Normal distributions are the same if we measure in units of size σ from the mean µ as center. The standard Normal distribution is the Normal distribution with mean 0 and standard deviation 1. If a variable x has any Normal distribution N(µ,σ) with mean µ and standard deviation σ, then the standardized variable z = (x - µ)/σ has the standard Normal distribution, N(0,1).
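
Standardizing is a single subtraction and division. The sketch below applies it to the test scores from the earlier Do Now (mean 79, standard deviation 5.5); the function name `z_score` is an illustrative choice, not notation from the text.

```python
def z_score(x, mu, sigma):
    """Standardize x: how many standard deviations it lies from the mean."""
    return (x - mu) / sigma


mu, sigma = 79, 5.5  # test-score mean and standard deviation from the Do Now
for name, score in [("Michael", 87), ("Sarah", 75), ("Fred", 58)]:
    print(f"{name}: z = {z_score(score, mu, sigma):.2f}")
```

A positive z-score means the observation is above the mean and a negative one means it is below; Fred's score is nearly four standard deviations below the mean.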

18 The Standard Normal Table The standard Normal Table (Table A) is a table of areas under the standard Normal curve. The table entry for each value z is the area under the curve to the left of z. Suppose we want to find the proportion of observations from the standard Normal distribution that are less than 0.81. We can use Table A:
z     .00     .01     .02
0.7   .7580   .7611   .7642
0.8   .7881   .7910   .7939
0.9   .8159   .8186   .8212
P(z < 0.81) = .7910

19 Example: Determine the following proportions (probabilities):
1. P(Z < 2.14)
2. P(Z < 3.44)
3. P(Z > –0.67)
4. P(Z > 1.78)
5. P(–1.25 < Z < 0.81)
6. P(–0.58 < Z < 1.79)
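
Table A only lists areas to the left of z, so "greater than" and "between" questions need one extra step. The sketch below shows the three patterns, assuming `statistics.NormalDist` as a substitute for the printed table; the helper names `left`, `right`, and `between` are made up for this illustration.

```python
from statistics import NormalDist

Z = NormalDist()  # standard Normal, N(0, 1)


def left(z):
    """Area to the left of z, which is what Table A lists."""
    return Z.cdf(z)


def right(z):
    """Area to the right of z: total area 1 minus the area to the left."""
    return 1.0 - Z.cdf(z)


def between(a, b):
    """Area between a and b: area left of b minus area left of a."""
    return Z.cdf(b) - Z.cdf(a)


answers = [
    ("P(Z < 2.14)", left(2.14)),
    ("P(Z < 3.44)", left(3.44)),
    ("P(Z > -0.67)", right(-0.67)),
    ("P(Z > 1.78)", right(1.78)),
    ("P(-1.25 < Z < 0.81)", between(-1.25, 0.81)),
    ("P(-0.58 < Z < 1.79)", between(-0.58, 1.79)),
]
for label, p in answers:
    print(f"{label:<20} = {p:.4f}")
```

Answers worked from the printed table can differ from these in the fourth decimal place, because each table entry is already rounded before subtracting.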

20 Z-Transformation The table only shows the standard Normal distribution, N(0, 1). What happens if the distribution is Normal but not standard, e.g., N(5, 2.55)?

21 Aim: How can we understand the standard Normal distribution in context? Homework: Do pg. 129–130 #47–56. Test on Friday. Do Now: Using the table, determine the following:
1. P(Z < -1.06)
2. P(Z > 2.98)
3. P(0.55 < Z < 2.32)

22 Normal Distribution Calculations We can answer a question about areas in any Normal distribution by standardizing and using Table A or by using technology. How To Find Areas In Any Normal Distribution:
Step 1: State the distribution and the values of interest. Draw a Normal curve with the area of interest shaded and the mean, standard deviation, and boundary value(s) clearly identified.
Step 2: Perform calculations and show your work! Do one of the following: (i) Compute a z-score for each boundary value and use Table A or technology to find the desired area under the standard Normal curve; or (ii) use the normalcdf command and label each of the inputs.
Step 3: Answer the question.

23 Examples on using the table: Suppose that the average temperature in July in a certain region is a Normal random variable with parameters µ = 90 (measured in degrees Fahrenheit) and σ = 5. Find the following probabilities:
a. Temperature below 95.
b. Temperature above 88.
c. Temperature between 85 and 92.
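
One way to follow the standardize-then-look-up method on the temperature example is sketched below. It assumes `statistics.NormalDist` plays the role of Table A; the helper `z` and the variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()   # standard Normal, standing in for Table A
mu, sigma = 90, 5  # Step 1: July temperature is N(90, 5)


def z(x):
    """Step 2: standardize a boundary value."""
    return (x - mu) / sigma


below_95 = Z.cdf(z(95))                      # a. area left of z = 1.00
above_88 = 1 - Z.cdf(z(88))                  # b. area right of z = -0.40
between_85_92 = Z.cdf(z(92)) - Z.cdf(z(85))  # c. area between z = -1.00 and z = 0.40

# Step 3: answer the question in context.
print(f"P(temperature < 95)      = {below_95:.4f}")
print(f"P(temperature > 88)      = {above_88:.4f}")
print(f"P(85 < temperature < 92) = {between_85_92:.4f}")
```

On paper, the sketch of the shaded Normal curve and the written z-score calculations would still be part of the shown work; the code only replaces the table lookup.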

24 The original distribution and the z-score distribution Whatever the overall shape of the original distribution (skewed or symmetric), the distribution of z-scores has the same shape. If the original was skewed, the z-scores are skewed in the same direction. If it was symmetric, so are the z-scores. Why do you think this is?

25 Practice: Suppose that the growth, in inches, during the tenth year of life of a North American boy is a Normal random variable with parameters µ = 2 and σ = 1. Find the probability that a North American boy will grow:
a. At least one inch in his tenth year.
b. Less than ½ an inch in his tenth year.
c. Between 1 and 2 inches.

26 Practice solutions:
a. 0.8413
b. 0.0668
c. 0.3413

27 Using the Graphing Calculator: We can use the normalcdf command to calculate the same proportions (probabilities). Press 2nd VARS and choose normalcdf. On the old operating system, the command uses this syntax: normalcdf(lower bound, upper bound, mean, standard deviation). On more recent operating systems, choosing normalcdf( may instead bring up a screen with the fields lower (shown as -1e99), upper, µ (shown as 0), and σ (shown as 1), followed by Paste.
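
For readers without the calculator, a rough Python analogue can be sketched as follows. The function below is a hypothetical helper that copies the calculator's argument order; it is not part of any TI or Python library. It is checked here against the growth practice problem, N(2, 1), with ±1e99 standing in for "no bound" as on the calculator screen.

```python
from statistics import NormalDist


def normalcdf(lower, upper, mean=0.0, sd=1.0):
    """Area under the N(mean, sd) curve between lower and upper."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(upper) - dist.cdf(lower)


print(round(normalcdf(1, 1e99, 2, 1), 4))     # a. at least one inch
print(round(normalcdf(-1e99, 0.5, 2, 1), 4))  # b. less than half an inch
print(round(normalcdf(1, 2, 2, 1), 4))        # c. between 1 and 2 inches
```

Rounded to four decimal places, the three results agree with the practice solutions 0.8413, 0.0668, and 0.3413.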

28 Working Backwards: Normal Distribution Calculations Sometimes, we may want to find the observed value that corresponds to a given percentile. There are again three steps. How To Find Values From Areas In Any Normal Distribution:
Step 1: State the distribution and the values of interest. Draw a Normal curve with the area of interest shaded and the mean, standard deviation, and unknown boundary value clearly identified.
Step 2: Perform calculations and show your work! Do one of the following: (i) Use Table A or technology to find the value of z with the indicated area under the standard Normal curve, then “unstandardize” to transform back to the original distribution; or (ii) use the invNorm command and label each of the inputs.
Step 3: Answer the question.

29 Example on Working Backwards: Can we find the z-score that corresponds to a particular area? Suppose we want the z-score that corresponds to the 90th percentile. Let’s use the table to see which z-score corresponds to an area of 0.90. Which table entry is closest to 0.90?
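
The table search can be imitated in code. The sketch below scans a grid of z-values in steps of 0.01 (a grid chosen here to resemble Table A, which is an assumption) for the left-tail area closest to 0.90, and compares the result with a direct inverse calculation. The helper name `closest_table_z` is made up for this illustration.

```python
from statistics import NormalDist

Z = NormalDist()  # standard Normal, N(0, 1)


def closest_table_z(area):
    """Scan z = -3.49, -3.48, ..., 3.49 for the left-tail area nearest `area`."""
    grid = [i / 100 for i in range(-349, 350)]
    return min(grid, key=lambda z: abs(Z.cdf(z) - area))


print(closest_table_z(0.90))      # table-style answer, two decimal places
print(round(Z.inv_cdf(0.90), 4))  # direct inverse, similar in spirit to invNorm
```

The scan settles on z = 1.28, while the direct inverse gives about 1.2816; the table answer is simply the nearest value on a 0.01 grid.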

30 Practice: Using the table, determine the z-value that corresponds to the following:
1. The 20th percentile.
2. 45% of all observations are greater than z.

31 Using the graphing calculator: Since we are asked to find the z-value, we can assume this means the standard Normal distribution, i.e., N(0,1). Press 2nd VARS and choose invNorm(. On the old operating system, the syntax is invNorm(area, mean, s.d.). On the new operating system, a screen prompts for each input (area: 0.90, mean: 0, s.d.: 1) followed by Paste; the newer version of that screen also shows a tail option: left, center, or right.

32 Forewarning Students do not get full credit when they use "calculator speak" to show work on Normal calculations. For example, normalcdf(290, 100000, 304, 8) is not considered clear communication because not all graphing calculators follow this syntax. To get full credit, you must write down the full process:
1. State the distribution and area of interest, i.e., Normal distribution with mean = xyz and s.d. = abc.
2. Show calculations when finding the standardized score: z = (x - µ)/σ.
3. Answer the question.

33 Example: High cholesterol in the blood increases the risk of heart disease. For 14-year-old boys, the distribution of blood cholesterol is approximately Normal with µ = 170 milligrams of cholesterol per deciliter of blood (mg/dl) and σ = 30 mg/dl. What is the first quartile of the distribution of blood cholesterol?

34 Alternatively: We could also find the answer on the graphing calculator with the following command: invNorm(area: 0.25, mean: 170, sd: 30)
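
The same first-quartile calculation can be sketched in Python, assuming `statistics.NormalDist.inv_cdf` as a rough counterpart to invNorm. The second half of the block repeats the calculation the by-hand way: find z for area 0.25, then unstandardize with x = µ + zσ.

```python
from statistics import NormalDist

cholesterol = NormalDist(mu=170, sigma=30)  # N(170, 30), in mg/dl

# First quartile = 25th percentile: the value with area 0.25 to its left.
q1_direct = cholesterol.inv_cdf(0.25)

# By hand: z-score with area 0.25 to its left, then x = mu + z * sigma.
z = NormalDist().inv_cdf(0.25)
q1_unstandardized = 170 + z * 30

print(f"z  = {z:.4f}")
print(f"Q1 = {q1_direct:.1f} mg/dl")
```

With Table A, the closest entry to 0.25 is at z = -0.67, which gives 170 + (-0.67)(30) = 149.9 mg/dl, slightly different from the unrounded result of about 149.8 mg/dl.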

35 Practice Problems: use the table or a TI-84.
1. Cholesterol levels above 240 mg/dl may require medical attention. What percent of 14-year-old boys have more than 240 mg/dl?
2. People with cholesterol between 200 and 240 mg/dl are at considerable risk for heart disease. What percent of 14-year-old boys have blood cholesterol between 200 and 240 mg/dl?

36 Assessing Normality The Normal distributions provide good models for some distributions of real data. Many statistical inference procedures are based on the assumption that the population is approximately Normally distributed. A Normal probability plot provides a good assessment of whether a data set follows a Normal distribution. Interpreting Normal Probability Plots: If the points on a Normal probability plot lie close to a straight line, the plot indicates that the data are approximately Normal. Systematic deviations from a straight line indicate a non-Normal distribution. Outliers appear as points that are far away from the overall pattern of the plot.

37 Example: No space in the fridge? The measurements listed below describe the usable capacity (in cubic feet) of a sample of 36 side-by-side refrigerators (Consumer Reports, May 2010). Are the data close to Normal?
12.9 13.7 14.1 14.2 14.5 14.5 14.6 14.7 15.1 15.2 15.3 15.3
15.3 15.3 15.5 15.6 15.6 15.8 16.0 16.0 16.2 16.2 16.3 16.4
16.5 16.6 16.6 16.6 16.8 17.0 17.0 17.2 17.4 17.4 17.9 18.4

38 Example continued: The mean and standard deviation of these data are x̄ = 15.825 and s_x = 1.217.
x̄ ± 1s_x = (14.608, 17.042): 24 of 36 = 66.7%
x̄ ± 2s_x = (13.391, 18.259): 34 of 36 = 94.4%
x̄ ± 3s_x = (12.174, 19.476): 36 of 36 = 100%
These percents are quite close to what we would expect based on the 68–95–99.7 rule. Combined with the graph, this gives good evidence that this distribution is close to Normal.
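
The counts in this numerical check can be reproduced from the refrigerator data listed earlier. The sketch below is one way to do it with the standard library; the variable names are illustrative.

```python
from statistics import mean, stdev

# Usable capacity (cubic feet) of the 36 refrigerators from the example.
capacity = [
    12.9, 13.7, 14.1, 14.2, 14.5, 14.5, 14.6, 14.7, 15.1, 15.2, 15.3, 15.3,
    15.3, 15.3, 15.5, 15.6, 15.6, 15.8, 16.0, 16.0, 16.2, 16.2, 16.3, 16.4,
    16.5, 16.6, 16.6, 16.6, 16.8, 17.0, 17.0, 17.2, 17.4, 17.4, 17.9, 18.4,
]

x_bar = mean(capacity)
s_x = stdev(capacity)  # sample standard deviation (divides by n - 1)

counts = []
for k in (1, 2, 3):
    lo, hi = x_bar - k * s_x, x_bar + k * s_x
    count = sum(lo < x < hi for x in capacity)
    counts.append(count)
    share = count / len(capacity)
    print(f"within {k} s_x: ({lo:.3f}, {hi:.3f}) -> {count} of {len(capacity)} = {share:.1%}")
```

The counts come out to 24, 34, and 36 of 36, matching the percentages in the example.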

39 Normal Probability Plot: Read the example on page 123 for how to construct a Normal probability plot. However, we may be responsible for interpreting Normal probability plots: If the points on a Normal probability plot lie close to a straight line, the data are approximately Normally distributed. Systematic deviations from a straight line indicate a non-Normal distribution. Outliers appear as points that are far away from the overall pattern of the plot.

40 Interpreting Normal probability plot Here is a Normal probability plot (also called a Normal quantile plot) of the refrigerator data from the previous page. It is quite linear, supporting our earlier decision that the distribution is close to Normal.
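
The plot itself is not reproduced in this transcript, but its coordinates can be sketched. The code below pairs each ordered observation with the z-score expected at its percentile, then summarizes how linear the pairs are with a correlation. Two things here are assumptions rather than the textbook's method: the percentile convention (i - 0.5)/n, and the use of a correlation as a numerical summary of straightness.

```python
from statistics import NormalDist, mean

# Usable capacity (cubic feet) of the 36 refrigerators from the example.
capacity = [
    12.9, 13.7, 14.1, 14.2, 14.5, 14.5, 14.6, 14.7, 15.1, 15.2, 15.3, 15.3,
    15.3, 15.3, 15.5, 15.6, 15.6, 15.8, 16.0, 16.0, 16.2, 16.2, 16.3, 16.4,
    16.5, 16.6, 16.6, 16.6, 16.8, 17.0, 17.0, 17.2, 17.4, 17.4, 17.9, 18.4,
]

n = len(capacity)
ordered = sorted(capacity)
# Assumed convention: the i-th smallest value sits at percentile (i - 0.5) / n.
expected_z = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]


def correlation(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


r = correlation(ordered, expected_z)
print(f"r = {r:.3f}")  # values near 1 mean the plotted points lie close to a line
```

Plotting `ordered` against `expected_z` would give the points of a Normal probability plot; a correlation close to 1 is consistent with the plot looking roughly linear, though the graph should still be inspected for curvature and outliers.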

41 Section Summary: Density Curves and Normal Distributions. In this section, we learned how to: ESTIMATE the relative locations of the median and mean on a density curve. ESTIMATE areas (proportions of values) in a Normal distribution. FIND the proportion of z-values in a specified interval, or a z-score from a percentile in the standard Normal distribution. FIND the proportion of values in a specified interval, or the value that corresponds to a given percentile in any Normal distribution. DETERMINE whether a distribution of data is approximately Normal from graphical and numerical evidence.

