Chapter 9 Normal Distribution 9.1 Continuous distribution 9.2 The normal distribution 9.3 A check for normality 9.4 Application of the normal distribution.


Chapter 9 Normal Distribution 9.1 Continuous distribution 9.2 The normal distribution 9.3 A check for normality 9.4 Application of the normal distribution 9.5 Normal approximation to Binomial

9.1 Continuous Distribution

For a discrete distribution, for example the binomial distribution with n=5 and p=0.4, the probability distribution is a table of each value x against its probability f(x).

[Figure: probability histogram of x against P(x)]
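As a rough illustration of the table and histogram referred to above, the binomial probabilities for n=5 and p=0.4 can be computed directly. The function name binomial_pmf is made up for this sketch.

```python
from math import comb

def binomial_pmf(n: int, p: float) -> list[float]:
    """Return [f(0), ..., f(n)] for a Binomial(n, p) random variable."""
    return [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]

# The discrete distribution used as the example on the slide: n=5, p=0.4.
table = binomial_pmf(5, 0.4)
for x, fx in enumerate(table):
    print(f"x={x}  f(x)={fx:.4f}")
```

Each printed value is the height of one bar in the probability histogram, and the heights sum to 1.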

How to describe the distribution of a continuous random variable? For a continuous random variable, we also represent probabilities by areas: not by areas of rectangles, but by areas under continuous curves. For continuous random variables, the place of histograms is taken by continuous curves. Imagine a histogram with narrower and narrower classes. Then we can get a curve by joining the tops of the rectangles. This continuous curve is called a probability density (or probability distribution).

Continuous distributions: For any x, P(X=x)=0. (For a continuous distribution, the area under a point is 0.) We can't use P(X=x) to describe the probability distribution of X. Instead, consider P(a ≤ X ≤ b).

Density function A curve f(x): f(x) ≥ 0 The area under the curve is 1 P(a ≤ X ≤ b) is the area between a and b

P(2 ≤ X ≤ 4)= P(2 ≤ X<4)= P(2<X<4)

9.2 The normal distribution

A normal curve is bell shaped. Its density is given by $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}$, where μ and σ² are two parameters: the mean and variance of a normal population (σ is the standard deviation).
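The following sketch evaluates the standard formula for the normal density and checks numerically that the area under the curve is 1, as a density function requires. The parameter values μ=100 and σ²=10 are illustrative, and the helper names and the trapezoid rule are choices made for this sketch.

```python
from math import exp, pi, sqrt

def normal_density(x: float, mu: float, sigma: float) -> float:
    """Height of the normal curve with mean mu and standard deviation sigma at x."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

def area_under_density(a: float, b: float, mu: float, sigma: float,
                       steps: int = 10_000) -> float:
    """Trapezoid-rule approximation of the area under the curve between a and b."""
    h = (b - a) / steps
    total = 0.5 * (normal_density(a, mu, sigma) + normal_density(b, mu, sigma))
    for k in range(1, steps):
        total += normal_density(a + k * h, mu, sigma)
    return total * h

mu, sigma = 100.0, sqrt(10.0)  # illustrative values: mean 100, variance 10
total_area = area_under_density(mu - 10 * sigma, mu + 10 * sigma, mu, sigma)
one_sd_area = area_under_density(mu - sigma, mu + sigma, mu, sigma)
print(total_area, one_sd_area)
```

The second quantity is P(μ - σ ≤ X ≤ μ + σ), the area within one standard deviation of the mean.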

The normal, bell-shaped curve: μ=100, σ²=10

Normal curves: (μ=0, σ²=1) and (μ=5, σ²=1)

Normal curves: (μ=0, σ²=1) and (μ=0, σ²=2)

Normal curves: (μ=0, σ²=1) and (μ=2, σ²=0.25)

The standard normal curve: μ=0 and σ²=1

How to calculate the probability of a normal random variable? Each normal random variable, X, has a density function, say f(x) (it is a normal curve). The probability P(a<X<b) is the area between a and b under the normal curve f(x). Table I in the back of the book gives areas for a standard normal curve with μ=0 and σ=1. Probabilities for any normal curve (any μ and σ) can be rewritten in terms of a standard normal curve.

Table I: Normal-curve Areas. Table I is in the back of the book. We need it for tests. It gives areas under the standard normal curve between 0 and z (z>0). How do we get an area between a and b when a<b and a, b are positive? area[0,b] - area[0,a].

Get the probability from the standard normal table: Z denotes a standard normal random variable. The standard normal curve is symmetric about the origin 0. Draw a graph.

[Table I excerpt: P(0<Z<z), tabulated by z]

Examples
Example 9.1 P(0<Z<1) = 0.3413
Example 9.2 P(1<Z<2) = P(0<Z<2) - P(0<Z<1) = 0.4772 - 0.3413 = 0.1359

Examples
Example 9.3 P(Z ≥ 1) = 0.5 - P(0<Z<1) = 0.5 - 0.3413 = 0.1587

Examples
Example 9.4 P(Z ≥ -1) = 0.5 + P(0<Z<1) = 0.5 + 0.3413 = 0.8413

Examples
Example 9.5 P(-2<Z<1) = P(0<Z<2) + P(0<Z<1) = 0.4772 + 0.3413 = 0.8185

Examples
Example 9.6 P(Z ≤ 1.87) = 0.5 + P(0<Z ≤ 1.87) = 0.5 + 0.4693 = 0.9693

Examples
Example 9.7 P(Z<-1.87) = P(Z>1.87) = 0.5 - 0.4693 = 0.0307
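The table look-ups in Examples 9.2 to 9.7 can be cross-checked in software. This is a minimal sketch that uses Python's statistics.NormalDist in place of Table I. The helper table_area is a made-up name that mimics the table's P(0<Z<z) entries.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def table_area(z: float) -> float:
    """P(0 < Z < z) for z >= 0, the quantity tabulated in Table I."""
    return Z.cdf(z) - 0.5

p_9_2 = table_area(2) - table_area(1)   # P(1 < Z < 2)
p_9_3 = 0.5 - table_area(1)             # P(Z >= 1)
p_9_4 = 0.5 + table_area(1)             # P(Z >= -1), by symmetry
p_9_5 = table_area(2) + table_area(1)   # P(-2 < Z < 1), by symmetry
p_9_6 = 0.5 + table_area(1.87)          # P(Z <= 1.87)
p_9_7 = 0.5 - table_area(1.87)          # P(Z < -1.87) = P(Z > 1.87)

for name, value in [("9.2", p_9_2), ("9.3", p_9_3), ("9.4", p_9_4),
                    ("9.5", p_9_5), ("9.6", p_9_6), ("9.7", p_9_7)]:
    print(f"Example {name}: {value:.4f}")
```

Each line combines table areas in the same way as the corresponding example, so small differences from the slide values come only from the table's four-decimal rounding.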

From non-standard normal to standard normal: X is a normal random variable with mean μ and standard deviation σ. Set Z = (X - μ)/σ. Z is the standard unit, or z-score, of X. Then Z has a standard normal distribution and $P(a < X < b) = P\left(\frac{a-\mu}{\sigma} < Z < \frac{b-\mu}{\sigma}\right)$.

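Example 9.8 X is a normal random variable with μ = 120 and σ = 15. Find the probability P(X ≤ 135). Solution: the z-score of x = 135 is z = (135 - 120)/15 = 1, so P(X ≤ 135) = P(Z ≤ 1) = 0.5 + 0.3413 = 0.8413.

Example 9.8 (continued) P(X ≤ 150): the z-score of x = 150 is z = (150 - 120)/15 = 2, so P(X ≤ 150) = P(Z ≤ 2) = 0.5 + 0.4772 = 0.9772.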

9.3 Checking Normality

Most of the statistical tools we will use in this class assume normal distributions. In order to know if these are the right tools for a particular job, we need to be able to assess whether the data appear to have come from a normal population. A normal plot gives a good visual check for normality.

Simulation: 100 observations, normal with mean=5, st dev=1
    x <- rnorm(100, mean=5, sd=1)
    qqnorm(x)

The plot below shows results on alpha-fetoprotein (AFP) levels in maternal blood for normal and Down's syndrome fetuses. Source: "Estimating a woman's risk of having a pregnancy associated with Down's syndrome using her age and serum alpha-fetoprotein level", H. S. Cuckle, N. J. Wald, S. O. Thompson.

Normal Plot: The way these normal plots work is that straight means the data appear normal, and parallel means the groups have similar variances.

Normal plot In order to plot the data and check for normality, we compare our observed data to what we would expect from a sample of normal data.

To begin with, imagine taking n=5 random values from a standard normal population (μ=0, σ=1). Let Z(1) ≤ Z(2) ≤ Z(3) ≤ Z(4) ≤ Z(5) be the ordered values. Suppose we do this over and over, forever, and average each ordered value across all the samples to get E(Z(1)), E(Z(2)), E(Z(3)), E(Z(4)), E(Z(5)). On average, the smallest of n=5 standard normal values is about 1.16 standard deviations below average, the second smallest is about 0.50 standard deviations below average, and the middle value is at the average, 0 standard deviations from average.

The table of "rankits" from the Statistics in Biology table gives these expected values. For larger n, space is saved by just giving the positive values. The negative values are a mirror image of the positive values, since a standard normal distribution is symmetric about its mean of zero.

Check for normality: If X is normal, how do the ordered values of X, X(i), relate to the expected ordered Z values, E(Z(i))? For a normal distribution with mean μ and standard deviation σ, the expected values of the ordered data, X(i), will be a linear rescaling of the standard normal expected values: E(X(i)) ≈ μ + σ E(Z(i)). The observed data X(i) will be approximately linearly related to E(Z(i)): X(i) ≈ μ + σ E(Z(i)).

If we plot the ordered X values versus E(Z(i)), we should see roughly a straight line with intercept μ and slope σ.
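The straight-line relationship can be illustrated with simulated data. This sketch makes two assumptions that are not in the slides: it approximates E(Z(i)) by the standard normal quantile at (i - 0.5)/n, and it summarizes the plot by a least-squares line. The function names are made up.

```python
import random
from statistics import NormalDist, mean

def approx_expected_z(n: int) -> list[float]:
    """Approximate E(Z_(i)), i = 1..n, by standard normal quantiles.

    Assumption: plotting positions (i - 0.5)/n, one common convention.
    """
    return [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Least-squares (intercept, slope) of ys on xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

rng = random.Random(1)                               # fixed seed for repeatability
data = sorted(rng.gauss(5, 1) for _ in range(500))   # simulated normal data, mean 5, sd 1
intercept, slope = fit_line(approx_expected_z(len(data)), data)
print(intercept, slope)
```

For normal data the fitted intercept should land near the mean (5 here) and the slope near the standard deviation (1 here).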

Example: Lifetimes of springs under 900 N/mm² stress

[Table: i, E(Z(i)), X(i)]

The plot is fairly linear indicating that the data are pretty similar to what we would expect from normal data.

To compare results from different treatments, we can put more than one normal plot on the same graph. The intercept for the 900 stress level is above the intercept for the 950 stress group, indicating that the mean lifetime of the 900 stress group is greater than the mean of the 950 stress group. The slopes are similar, indicating that the variances or standard deviations are similar.

These plots were done in Excel. In Excel you can either enter values from the table of E(Z) values or generate approximations to these table values. One way to generate approximate E(Z) values is to generate evenly spaced percentiles of a standard normal, Z, distribution. The ordered X values correspond roughly to particular percentiles of a normal distribution. For example, if we had n=5 values, the 3rd ordered value would be roughly the median, or 50th percentile. A common method is to use percentiles given by a fixed function of the rank i and the sample size n.

For n=5 this would make the third ordered value, i=3, the 50th percentile. For E(Z) we would use the corresponding percentiles of a standard normal Z distribution. Percentiles expressed as fractions are called quantiles. The 0.5 quantile is the 50th percentile. Normal plots from this perspective are sometimes called Q-Q plots, since we are plotting standard normal quantiles versus the associated quantiles of the observed data.

For n = 10 values for the spring data, the corresponding normal percentiles would be:

[Table: i, percentile, Z quantile]

For assessing whether a plotted line is fairly straight, or whether two lines are fairly parallel, either the E(Z) values or the normal quantiles work fine. If you are doing the plot by hand it's easiest to use the E(Z) table. If you are doing these in Excel it's easiest to use the normal quantiles. The function NORMINV(p,0,1) finds the Z value corresponding to a given quantile. This is the inverse of the function that finds the cumulative probability for a given Z value.
Z → NORMDIST → probability: NORMDIST(1.645, 0, 1, TRUE) ≈ 0.95
Probability → NORMINV → Z: NORMINV(0.95, 0, 1) ≈ 1.645
(The TRUE in NORMDIST says to return the cumulative probability rather than the density curve height.)
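Outside Excel, the same pair of inverse operations can be sketched with Python's statistics.NormalDist, whose cdf and inv_cdf methods play the roles of NORMDIST(..., TRUE) and NORMINV. The plotting positions (i - 0.5)/n used below are an assumption. The slides do not preserve the formula they used.

```python
from statistics import NormalDist

std = NormalDist(0, 1)

prob = std.cdf(1.645)     # analogue of NORMDIST(1.645, 0, 1, TRUE)
z = std.inv_cdf(0.95)     # analogue of NORMINV(0.95, 0, 1)

def normal_quantiles(n: int) -> list[float]:
    """Standard normal quantiles at the assumed positions (i - 0.5)/n, i = 1..n."""
    return [std.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

q5 = normal_quantiles(5)    # the middle entry is the 0.5 quantile, i.e. 0
q10 = normal_quantiles(10)  # same construction for a sample of 10 values
print(prob, z, q5, q10)
```

The quantile lists are symmetric about zero, which mirrors the remark that tables of expected values only need to list the positive half.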

Excel File of Lifetime of Springs Data

For data that are not normal: Many types of data tend to follow a normal distribution, but many data sets aren't particularly normal. If the data aren't fairly normal we have several options.
Transform the data, meaning change the scale.
A log or ln scale is most common: weights of fish, concentrations, bilirubin levels in blood, pH (which is a log scale), RNA expression levels in a microarray experiment.
A reciprocal (1/Y) changes times to rates.
Other powers: square root for Poisson variables.

Non-normal data continued: Use a distribution other than the normal distribution.
Weibull distribution for lifetimes: motors at General Electric, patients in a clinical trial.

Weibull Distributions (Time to Failure, Non-binomial and Non-normal)
Infant Mortality: Fail immediately or last a long time.
Early Failure: These do not fail immediately, but many do fail early.
Old-age Wearout: Very few of these fail until they wear out.

Non-normal data continued: Use a nonparametric method, which doesn't assume any distribution.
Finding a distribution that models the data well, rather than using a nonparametric method:
Allows us to develop a more complete model.
Allows us to generalize to other situations.
Gives us more precise information for the same amount of effort.

The methods in this class largely apply to normal data or data that we can transform to normal. The EPA fish example is a good example of transforming data with a log transformation. Geometric means and harmonic means arise when we are working with transformed data. For example, fish weights are usually analyzed in the log scale. Having a mean in the log scale, we want to put this value back into the original scale, for example grams. The back-transformed mean from the log scale is the geometric mean. The back-transformed mean from a reciprocal scale (rates) is the harmonic mean. Back-transformed differences between means in the log scale correspond to ratios of geometric means in the original scale.

Suppose ln(X) = Y ~ N(μ, σ²). This means Y (or ln(X)) is distributed as normal with mean μ and variance σ². The geometric mean is $e^{\mu}$, the back-transformed population mean in the ln scale. If we have the difference between two means in the ln scale, then back-transforming gives us $e^{\mu_1 - \mu_2} = e^{\mu_1}/e^{\mu_2}$ = ratio of geometric means.

About geometric means: A fact is that if the variances of both populations in the log scale are the same, then the ratio of the population geometric means is the same as the ratio of the population means.
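A small sketch of the back-transformation, using made-up weights in grams (the two groups below are hypothetical and chosen so that one is exactly half the other). It computes the mean in the ln scale, back-transforms it, and checks that a difference of log-scale means back-transforms to a ratio of geometric means.

```python
from math import exp, log
from statistics import geometric_mean, mean

# Hypothetical fish weights in grams; group_b is group_a halved.
group_a = [120.0, 150.0, 200.0, 260.0, 310.0]
group_b = [60.0, 75.0, 100.0, 130.0, 155.0]

def log_scale_mean(values: list[float]) -> float:
    """Mean of the data after an ln transformation."""
    return mean([log(v) for v in values])

def back_transformed_mean(values: list[float]) -> float:
    """exp of the log-scale mean, which is the geometric mean."""
    return exp(log_scale_mean(values))

gm_a = back_transformed_mean(group_a)
gm_b = back_transformed_mean(group_b)

# Back-transforming the difference of log-scale means gives a ratio.
ratio = exp(log_scale_mean(group_a) - log_scale_mean(group_b))
print(gm_a, gm_b, ratio)
```

Because every weight in the second group is half the corresponding weight in the first, the ratio of geometric means comes out as 2.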

Question: Why not just use the means in the original scale? Answer: Means are best when populations are normal. Using the ratio of the geometric means will give us a more precise estimate of the true ratio than using the ratio of the means in the original scale.

A similar fact explains why we use means rather than medians. For a normal population the mean is the same as the median. We could use either the sample mean or the sample median to estimate μ. BUT, the mean will be a more precise guess at (estimate of) the true value, μ. It would take us roughly 50% more values (larger n) using the median as our guess at μ to accomplish the same degree of precision as we get using the mean as our guess at μ.
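The "roughly 50% more values" claim can be explored with a small simulation. This is only a sketch: the sample size, number of repetitions, and seed are arbitrary choices. For large samples from a normal population, the variance of the sample median is known to be about π/2 ≈ 1.57 times that of the sample mean.

```python
import random
from statistics import mean, median, pvariance

rng = random.Random(0)   # fixed seed so the run is repeatable
n, reps = 25, 2000       # arbitrary illustrative choices

means, medians = [], []
for _ in range(reps):
    sample = [rng.gauss(0, 1) for _ in range(n)]  # standard normal sample
    means.append(mean(sample))
    medians.append(median(sample))

# How much more variable the median is than the mean as an estimate of mu.
variance_ratio = pvariance(medians) / pvariance(means)
print(variance_ratio)
```

A ratio near 1.5 means the median needs about one and a half times as many observations to match the precision of the mean.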

9.4 Application of the normal distribution

Public Health Service Health Examination Survey: 6,672 Americans years old. The women's heights were approximately normal with mean 63 inches and standard deviation 2.5 inches. What percentage of women were over 68 inches tall?

Solution: X = height. P(X>68) = P(Z>(68-63)/2.5) = P(Z>2) = 0.5 - 0.4772 = 0.0228

Continuity Correction for a Better Approximation: Sometimes only integer values are possible for x, for example x = score on the LSAT, or x = number of heads in 10 tosses of a fair coin. A normal approximation is more accurate with a "continuity correction".

1976 LSAT: approximately normal, mean 650, st. dev 60. P(X ≥ 680) ≈ P(Z>(679.5-650)/60) = P(Z>0.49) = 0.5 - 0.1879 = 0.3121
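To see how much the continuity correction changes the answer in the LSAT example, the sketch below compares the tail area starting at 680 with the tail area starting at 679.5. The variable names are made up, and NormalDist replaces the table look-up, so the corrected value differs slightly from the slide's because z is not rounded to 0.49.

```python
from statistics import NormalDist

lsat = NormalDist(mu=650, sigma=60)  # approximate distribution of 1976 LSAT scores

# Scores are integers, so "X >= 680" covers the interval starting at 679.5.
without_correction = 1 - lsat.cdf(680)
with_correction = 1 - lsat.cdf(679.5)
print(without_correction, with_correction)
```

The corrected probability is slightly larger because it includes the half-unit strip from 679.5 to 680.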

9.5 Normal Approximation to Binomial

A binomial distribution: n=10, p=0.5, μ = np = 5, σ² = np(1-p) = 2.5, so σ = 1.58.
1. P(X ≥ 7) = 0.172 from the binomial.
2. P(X ≥ 7) ≈ P(Z>(6.5-5)/1.58) = P(Z>0.95) = 0.5 - 0.3289 = 0.1711 from the normal approximation.

[Figure: dots show binomial probabilities; smooth line shows the normal curve with the same mean and variance]

Normal Approximation Is Good If:
The normal curve has the same mean and standard deviation as the binomial.
np>5 and n(1-p)>5.
A continuity correction is made.
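The comparison for n=10 and p=0.5 can be reproduced as a sketch: the exact binomial tail is summed term by term, and the normal approximation is evaluated with and without the continuity correction. Variable names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 10, 0.5

# Exact binomial probability P(X >= 7).
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(7, n + 1))

# Normal curve with the same mean and standard deviation as the binomial.
mu, sigma = n * p, sqrt(n * p * (1 - p))
curve = NormalDist(mu, sigma)

approx = 1 - curve.cdf(6.5)             # with continuity correction
approx_uncorrected = 1 - curve.cdf(7)   # without continuity correction
print(exact, approx, approx_uncorrected)
```

With the correction the approximation agrees with the exact value to about three decimal places, while the uncorrected version is noticeably too small.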

Example: Records show that 60% of the customers of a service station pay with a credit card. Use the normal approximation to find the probabilities that among 100 customers:
1. At most 65 will pay with a credit card
2. At least 55 will pay with a credit card
3. Between 55 and 65 will pay with a credit card
4. Exactly 65 will pay with a credit card

Solution: X = number of customers who pay with a credit card. μ = np = 60, σ² = np(1-p) = 24, so σ = 4.8990

Normal Approximation 3. 4.
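One possible way to work the four parts of the service-station example is sketched below, applying the continuity correction to each. It assumes that "between 55 and 65" includes both endpoints, which the slides do not state. The variable names are made up.

```python
from math import sqrt
from statistics import NormalDist

n, p = 100, 0.6
# Normal curve with the binomial's mean (60) and standard deviation (about 4.899).
X = NormalDist(n * p, sqrt(n * p * (1 - p)))

at_most_65 = X.cdf(65.5)                    # 1. P(X <= 65)
at_least_55 = 1 - X.cdf(54.5)               # 2. P(X >= 55)
between_55_65 = X.cdf(65.5) - X.cdf(54.5)   # 3. P(55 <= X <= 65), endpoints assumed included
exactly_65 = X.cdf(65.5) - X.cdf(64.5)      # 4. P(X = 65) as the strip from 64.5 to 65.5
print(at_most_65, at_least_55, between_55_65, exactly_65)
```

Parts 1 and 2 give the same value here because 65.5 and 54.5 are equally far from the mean of 60.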