Using the 68-95-99.7 Rule Normal Quantile Plots

Using the 68-95-99.7 Rule Normal Quantile Plots

Learning Objectives
By the end of this lecture, you should be able to:
Calculate approximations of areas under the density curve using the 68-95-99.7 rule.
Identify the mathematical technique used to help confirm (though not guarantee!) that a distribution is indeed Normal.

A few numbers worth memorizing (though not just yet)
Because we encounter the Normal distribution so often, it is eventually worth memorizing the approximate areas from the Normal table that correspond to a few different z-scores. I say approximate because the values are rounded off. Let’s look at some z-scores and their corresponding areas (to the left) under the density curve. For now, however, do not bother memorizing them.
z = -2: 2.3%
z = -1: 16%
z = 0: 50%
z = +1: 84%
z = +2: 98%
What I do want you to memorize are the 3 numbers shown in a famous ‘rule’ on the next slide.
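The rounded table values on this slide can be checked with a short sketch. It assumes Python 3.8 or later and uses `statistics.NormalDist` from the standard library; the helper name `area_left` is made up for this illustration and is not part of the lecture.

```python
from statistics import NormalDist

# Standard Normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)

def area_left(z):
    """Area under the standard Normal curve to the left of z (illustrative helper)."""
    return std_normal.cdf(z)

# Print the left-hand area for each z-score listed on the slide.
for z in (-2, -1, 0, 1, 2):
    print(f"z = {z:+d}: {area_left(z) * 100:.1f}%")
```

The printed values carry one more decimal place than the slide, which is why they differ slightly from the rounded figures.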

The 68-95-99.7% Rule for Normal Distributions
This “rule” is essentially a shortcut for an approximation of certain areas under the Normal curve. There are only 3 numbers: 68, 95, and 99.7, and they are definitely worth memorizing. What do these numbers represent?
The area between (i.e. not below) -1 and +1 standard deviations corresponds to about 68% of the observations.
The area between -2 and +2 standard deviations corresponds to about 95% of the observations.
The area between -3 and +3 standard deviations corresponds to about 99.7% of the observations.
You WILL be asked to use these numbers on quizzes and exams. Please note that on your exams you will not be provided with the three numbers (68, 95, 99.7).
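To see where 68, 95, and 99.7 come from, here is a minimal sketch that computes the area between -k and +k standard deviations directly. It assumes Python 3.8 or later; `central_area` is a name chosen for this example only.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def central_area(k):
    """Area under the standard Normal curve between -k and +k standard deviations."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {central_area(k) * 100:.2f}%")
```

The results (about 68.27%, 95.45%, and 99.73%) show that the rule's three numbers are rounded values, not exact ones.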

Examples: The 68-95-99.7% ‘Shortcut’ Rule for Normal Distributions
Now let’s play around with these numbers by answering some questions. Note: All numbers refer to z-scores (i.e. standard deviations).
What percentage of observations lie between -1 and +1?
Answer: As we discussed a moment ago, the percentage of observations between -1 and +1 standard deviations is 68%.
What percentage lie between 0 and +1?
Answer: If -1 to +1 is 68%, then 0 to +1 is half of that, which is 34%. This is an important step to understand. Make sure you are clear on this point before moving on. One way to think of it: look at the area between z=0 (the black line) and z=+1, and note that it is half of the area between -1 and +1. A great way to visualize it is to shade in the area between z=0 and z=+1.
What percentage of observations lie below +1?
Answer: Let’s start by looking at our z=0 line. Make sure you recognize that the area to the left of z=0 represents 50% of observations. Now, what percentage of observations are between 0 and +1? Recall from the previous question that this is 34%. Therefore, from 0 to +1 is 34%, and below 0 is 50%, so the area to the left of +1 represents 84% of observations.
The z=0 line (drawn below) is very helpful in doing many of these calculations.

Examples: The 68-95-99.7% ‘Shortcut’ Rule for Normal Distributions
More examples:
What percentage of observations lies between 0 and +3?
Answer: Half of the area between -3 and +3 (99.7%), which is 49.85%.
What percentage of observations lies between -2 and +1?
Answer: Use your midline! I would solve this by adding the area between -2 and 0 (half of 95%) to the area between 0 and +1 (half of 68%): 47.5% + 34% = 81.5%.
What percentage of observations lies below -2?
Answer: While this too can be answered in a few different ways, I would like you to make sure you can do it this way: Look at the area between -2 and +2. Our ‘shortcut’ tells us that this contains 95% of observations. This means that the area above +2 and the area below -2 together comprise 5% of observations. Because the curve is symmetric, the area above +2 comprises 2.5% of observations, and the area below -2 also comprises 2.5% of observations. Answer: 2.5%
What percentage of observations lies above +3?
Answer: Use the same technique as was just discussed: Between -3 and +3 makes up 99.7%. Therefore below -3 and above +3 together make up 0.3%. Therefore below -3 is 0.15% and above +3 is 0.15%. Answer: 0.15%

Examples: The 68-95-99.7% ‘Shortcut’ Rule for Normal Distributions
One more!
What percentage of observations lies below +2 standard deviations?
Answer: Repeat the process from before to determine the area in each tail, that is, above +2 and below -2. That value was 2.5% for each tail. If 2.5% of values lie above +2, then 97.5% of observations lie below it. Answer: 97.5%
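The arithmetic in these examples always follows the same pattern: split a central area in half at the z=0 midline, then add or subtract. As a rough sketch (assuming whole-number z-scores from -3 to +3 only; the names `CENTRAL`, `below`, `above`, and `between` are invented for this illustration), the pattern can be written as:

```python
# Central areas (in percent) from the 68-95-99.7 rule, keyed by |z|.
CENTRAL = {0: 0.0, 1: 68.0, 2: 95.0, 3: 99.7}

def below(z):
    """Percent of observations below an integer z-score, using only the rule."""
    half = CENTRAL[abs(z)] / 2      # area between the midline (z=0) and z
    return 50.0 + half if z >= 0 else 50.0 - half

def above(z):
    """Percent of observations above an integer z-score."""
    return 100.0 - below(z)

def between(lo, hi):
    """Percent of observations between two integer z-scores."""
    return below(hi) - below(lo)

print(round(between(0, 1), 2))    # half of 68
print(round(between(-2, 1), 2))   # 47.5 + 34
print(round(below(-2), 2))        # one tail of the remaining 5
print(round(above(3), 2))         # one tail of the remaining 0.3
```

Results are rounded when printed because the halves are stored as floating-point numbers.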

The 68-95-99.7% ‘Shortcut’ Rule
Women’s heights: mean µ = 64.5 inches, standard deviation σ = 2.5 inches, N(µ, σ) = N(64.5, 2.5)
What percentage of women are between 62 and 67 inches tall?
Answer: We must, of course, begin by converting our observation values to z-scores. In this case, 62 corresponds to z = -1, and 67 corresponds to z = +1. So what is the area between -1 and +1? Our shortcut rule tells us that it is about 68%.
What is the range of heights between which about 95% of women fall?
Answer: About -2 to +2 SDs, so about 59.5 to 69.5 inches tall.
Without referring to a z-table, what is the approximate range of heights between which nearly all (over 99%) of women fall?
Answer: A quick answer would simply be to pick the -3 to +3 SD range, which corresponds to 99.7% of heights. Converting from z-scores to values gives us a range of 57 to 72 inches tall.

The 68-95-99.7% ‘Shortcut’ Rule for Normal Distributions
More examples, again using women’s heights with mean µ = 64.5, standard deviation σ = 2.5, N(µ, σ) = N(64.5, 2.5):
What percentage are taller than 67 inches?
Answer: If 68% of all women are between 62 and 67 inches tall, this means that 32% are outside of that range. In other words, 16% are shorter than 62 inches, and 16% are taller than 67 inches.
What percentage are shorter than 59.5 inches?
Answer: If 95% of all women are between 59.5 and 69.5 inches tall, then 5% are outside of that range. In other words, 2.5% are shorter than 59.5 inches and 2.5% are taller than 69.5 inches.
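The height examples rely on converting between raw values and z-scores with z = (x - µ) / σ and x = µ + z·σ. Here is a small sketch of those conversions using the N(64.5, 2.5) model from the slides. It assumes Python 3.8 or later, and the helper names `to_z` and `from_z` are made up for the example.

```python
from statistics import NormalDist

MU, SIGMA = 64.5, 2.5                    # women's heights in inches, from the slides
heights = NormalDist(mu=MU, sigma=SIGMA)

def to_z(x):
    """Convert a height in inches to a z-score."""
    return (x - MU) / SIGMA

def from_z(z):
    """Convert a z-score back to a height in inches."""
    return MU + z * SIGMA

# Ranges covered by the 68-95-99.7 rule.
for k in (1, 2, 3):
    print(f"+/-{k} SD: {from_z(-k)} to {from_z(k)} inches")

# Exact tail areas, for comparison with the shortcut answers of 16% and 2.5%.
print(f"taller than 67: {(1 - heights.cdf(67)) * 100:.1f}%")
print(f"shorter than 59.5: {heights.cdf(59.5) * 100:.1f}%")
```

The exact tail areas come out slightly different from the shortcut answers because 68 and 95 are rounded.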

Shortcut Rule or Z-Table?
Students have often been confused as to which should be used. For the most accurate result, use your z-table, as the shortcut rule uses rounded-off values. Also, if you are given z-scores that are not anywhere near whole numbers (e.g. z=+2.332), then there is no shortcut to use! The shortcut can only be used with whole-number (integer) z-scores between -3 and +3. The main purpose of learning the ‘shortcut’ rule (in addition to the fact that it comes up on all kinds of exams) is to encourage you to develop an understanding of what you are trying to do rather than just jumping to calculators and z-tables. For this course, you will be asked to do both.
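For a z-score such as +2.332, where the shortcut does not apply, software can stand in for the z-table. The sketch below assumes Python 3.8 or later and uses the standard library's `NormalDist.cdf` in place of a table lookup; the variable names are chosen for the example.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# A z-score that is nowhere near a whole number: no shortcut is available.
z = 2.332
area_below_z = std_normal.cdf(z)
print(f"area to the left of z = {z}: {area_below_z:.4f}")

# Even at whole-number z-scores, the shortcut is a rounded value.
shortcut_within_2 = 95.0
exact_within_2 = (std_normal.cdf(2) - std_normal.cdf(-2)) * 100
print(f"shortcut: {shortcut_within_2}%, table-style value: {exact_within_2:.2f}%")
```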

Is the distribution I’m looking at really Normal?
Deciding whether a dataset is indeed Normally distributed (or close to it) is a very important question. All the examples we’ve been discussing involving z-scores assume that the data are Normal. If the distribution of the data were not Normal, all of our answers and calculations would be flawed. Recall that there are many other types of distributions that are not Normal. Some examples include skewed, bimodal, uniform, random, binomial (later in the quarter), Poisson, etc. Each type of distribution has its own characteristic formulas, calculations, inference techniques, etc. However, because the Normal distribution is so commonly encountered in real-life situations, statisticians spend lots of time discussing it.
So how do you decide if a distribution is Normal? You might be tempted to say “look at a graph”. And this is true! When examining data, a chart is a great (often the best) place to start! However, as humans, we are easily fooled. There are many histograms (and related density curves) that look Normal but in fact are not. Fortunately, we do have a statistical test that can support (though not guarantee) the claim that our dataset is Normal.

Normal Quantile Plot
The Normal Quantile plot is a graph that helps us determine if a distribution is indeed Normal. It is a mathematical plot that we can create using our statistical software package of choice.
Here is the technique for creating a Normal Quantile Plot (this is discussed for interest only): The data points are ranked and the percentile ranks are converted to z-scores. The z-scores are then used for the x axis, against which the data are plotted on the y axis of the Normal quantile plot.
If the distribution is indeed Normal, the plot will show a fairly straight line, indicating a good match between the data and a Normal distribution.
Systematic deviations from a straight line indicate a non-Normal distribution.
Outliers appear as points that are far away from the overall pattern of the plot.
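The construction described on this slide (rank the data, convert percentile ranks to z-scores, pair each z-score with its data value) can be sketched as follows. Several choices here are assumptions, not part of the lecture: the plotting position (i - 0.5) / n, the use of a correlation coefficient as a rough numeric stand-in for "how straight is the plot", the function names, and the sample values, which are made up for illustration. It assumes Python 3.8 or later.

```python
from statistics import NormalDist, mean

def normal_quantile_points(data):
    """Return (z, value) pairs: theoretical z-score for x, sorted data value for y."""
    std_normal = NormalDist()
    ordered = sorted(data)                 # rank the data points
    n = len(ordered)
    points = []
    for i, value in enumerate(ordered, start=1):
        # Assumed plotting position: keeps every percentile strictly between 0 and 1.
        percentile = (i - 0.5) / n
        points.append((std_normal.inv_cdf(percentile), value))
    return points

def straightness(points):
    """Pearson correlation of z-scores and data values (near 1.0 means nearly straight)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical sample values, made up for illustration only.
sample = [4.9, 5.1, 5.3, 5.4, 5.5, 5.6, 5.7, 5.9, 6.1, 6.4]
points = normal_quantile_points(sample)
print(round(straightness(points), 3))
```

Plotting `points` with any charting tool gives the Normal quantile plot itself. The correlation is only a convenience for a text-only example, and a high value supports but does not guarantee Normality.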

Good fit to a straight line: the distribution of rainwater pH values is close to Normal.
Not a straight line: the data do not follow a Normal distribution.
Normal quantile plots are complex to do by hand, but they are standard features in most statistical software.

The Normal quantile test supports Normality, but does NOT guarantee it!
Two key points here:
If the plot is NOT straight, then your data is NOT Normal!
If the plot IS straight, then you have supported the idea that your dataset is Normal. However, you have NOT guaranteed it!
This concept (confirming / supportive tests) will come up with various other statistical concepts down the road. Whenever you encounter such tests, you should be sure to make use of them.