Normal Distribution
The shaded area is the probability that z > 1.
The normal distribution is actually a family of distributions, all with the same shape and parameterised by mean µ and standard deviation σ. It is usually defined by a reference member of the family, which is then used to define the other members. This reference member has µ = 0 and σ = 1.
Definition: A random variable Z has a normal (or Gaussian) distribution with mean 0 and standard deviation 1 if and only if its distribution function Ф(z), defined by Ф(z) = p(Z ≤ z), is given by
$$\Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}} \, e^{-t^{2}/2} \, dt$$
We write Z ~ N(0, 1) and say that Z has a standard normal distribution.
Definition: A random variable X has a normal (or Gaussian) distribution with mean µ and standard deviation σ if and only if
$$\frac{X - \mu}{\sigma} \sim N(0, 1)$$
We write X ~ N(µ, σ²) and say that X has a normal distribution.
The normal distribution is symmetric about its mean µ. In particular, if Z ~ N(0, 1), then p(Z ≤ -z) = p(Z ≥ z), i.e. Ф(-z) + Ф(z) = 1 for all z.
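The distribution function and the symmetry property can be checked numerically in R. The following is only a sketch: the helper name phi, the value z = 1.3 and the grid zs are illustrative choices, not part of the original slides. It integrates the standard normal density exp(-t²/2)/√(2π) and compares the result with R's built-in pnorm.

```r
# Standard normal density, written out explicitly (hypothetical helper name)
phi <- function(t) exp(-t^2 / 2) / sqrt(2 * pi)

z <- 1.3  # arbitrary illustrative value

# Distribution function by numerical integration of the density
Phi_numeric <- integrate(phi, lower = -Inf, upper = z)$value

# R's built-in standard normal distribution function
Phi_builtin <- pnorm(z)

print(c(numeric = Phi_numeric, builtin = Phi_builtin))

# Symmetry: pnorm(-z) + pnorm(z) should equal 1 for every z
zs <- seq(-3, 3, by = 0.5)
sym <- pnorm(-zs) + pnorm(zs)
print(sym)
```

The two values of the distribution function should agree to several decimal places, and every entry of sym should be 1 up to rounding error.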
Whatever the values of µ and σ, the area between µ - 2σ and µ + 2σ is always approximately 0.95 (95%).
Similarly, whatever the values of µ and σ, the area between µ - σ and µ + σ is always approximately 0.68 (68%).
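These two areas can be checked with pnorm. This is a minimal sketch; the function name within_k_sd is a hypothetical helper, and the values µ = 100, σ = 15 are used only to show that the answer does not depend on the parameters.

```r
# Probability that a normal variable lies within k standard deviations
# of its mean (hypothetical helper, not from the slides)
within_k_sd <- function(k, mu = 0, sigma = 1) {
  pnorm(mu + k * sigma, mean = mu, sd = sigma) -
    pnorm(mu - k * sigma, mean = mu, sd = sigma)
}

p1 <- within_k_sd(1)   # area within one standard deviation
p2 <- within_k_sd(2)   # area within two standard deviations

# Same calculation for a different member of the family
p2_other <- within_k_sd(2, mu = 100, sigma = 15)

print(c(one_sd = p1, two_sd = p2, two_sd_other = p2_other))
```

The results are close to 0.68 and 0.95 rather than exactly equal to them, which is why the figures are best read as approximations.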
Example: It has been suggested that IQ scores follow a normal distribution with mean 100 and standard deviation 15. Find the probability that a person chosen at random will have
(a) an IQ less than 70,
(b) an IQ greater than 110,
(c) an IQ between 70 and 110.
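One way to work the example is with pnorm, which accepts the mean and standard deviation directly. The variable names below are illustrative, and this is a sketch rather than the worked solution from the slides.

```r
mu <- 100     # mean of IQ scores (from the example)
sigma <- 15   # standard deviation of IQ scores (from the example)

# (a) p(X < 70): 70 is two standard deviations below the mean
p_a <- pnorm(70, mean = mu, sd = sigma)

# (b) p(X > 110): upper tail, so subtract the distribution function from 1
p_b <- 1 - pnorm(110, mean = mu, sd = sigma)

# (c) p(70 < X < 110): difference of two distribution function values
p_c <- pnorm(110, mean = mu, sd = sigma) - pnorm(70, mean = mu, sd = sigma)

print(round(c(a = p_a, b = p_b, c = p_c), 4))
```

The same answers follow from standardising by hand, since p(X < 70) = Ф((70 - 100)/15) = Ф(-2). The three probabilities are roughly 0.02, 0.25 and 0.72.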
Let X_1, X_2, ..., X_n be independent, identically distributed random variables with mean µ and variance σ². Let S = X_1 + X_2 + ... + X_n. Then elementary probability theory tells us that E(S) = nµ and var(S) = nσ². The Central Limit Theorem (CLT) further states that, provided n is not too small, S has an approximately normal distribution with the above mean nµ and variance nσ².
In other words, S approx ~ N(nµ, nσ²). The approximation improves as n increases. We will use R to demonstrate the CLT.
Let X_1, X_2, ..., X_6 come from the uniform distribution U(0, 1), whose density equals 1 on the interval from 0 to 1.
For any uniform distribution on [A, B], the mean µ is equal to (A + B)/2 and the variance σ² is equal to (B - A)²/12. So for our distribution, µ = 1/2 and σ² = 1/12.
The Central Limit Theorem therefore states that S should have an approximately normal distribution with mean nµ (i.e. 6 × 0.5 = 3) and variance nσ² (i.e. 6 × 1/12 = 0.5). This gives standard deviation 0.7071. In other words, S approx ~ N(3, 0.7071²).
Generate 10 000 results in each of six vectors for the uniform distribution on [0,1] in R, and add the vectors to form s.
> x1=runif(10000)
> x2=runif(10000)
> x3=runif(10000)
> x4=runif(10000)
> x5=runif(10000)
> x6=runif(10000)
> s=x1+x2+x3+x4+x5+x6
Consider the mean and standard deviation of S
> mean(s)
[1] 3.002503
> sd(s)
[1] 0.7070773
This agrees with our earlier calculations.
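The simulation can also be written more compactly with a matrix and rowSums. This is an alternative sketch, not the method on the slides: the seed and the names n_terms, n_sims, theory_mean and theory_sd are assumptions made here so that the run is reproducible and self-contained.

```r
set.seed(1)        # arbitrary seed, chosen only for reproducibility

n_terms <- 6       # number of uniform variables in each sum
n_sims  <- 10000   # number of simulated sums

# Each row holds one realisation of X_1, ..., X_6
x <- matrix(runif(n_terms * n_sims), nrow = n_sims, ncol = n_terms)

# Row sums give 10 000 simulated values of S
s <- rowSums(x)

# Values predicted from E(S) = n * mu and var(S) = n * sigma^2
theory_mean <- n_terms * 0.5
theory_sd   <- sqrt(n_terms / 12)

print(c(simulated = mean(s), theory = theory_mean))
print(c(simulated = sd(s), theory = theory_sd))
```

The simulated figures will differ slightly from run to run and from the values shown above, but should stay close to 3 and 0.7071.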
One way of examining whether the distribution is approximately normal is to produce a normal Q-Q plot. This is a plot of the sorted values of the vector S (the “data”) against what is in effect an idealised sample of the same size from the N(0,1) distribution.
If the CLT holds good, i.e. if S is approximately normal, then the plot should show an approximate straight line with intercept equal to the mean of S (here 3) and slope equal to the standard deviation of S (here 0.707).
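A sketch of the Q-Q check follows. It regenerates the sums so that it runs on its own (the seed is an arbitrary assumption), draws the plot with qqnorm when R is run interactively, and fits a straight line to the plotted points to read off the intercept and slope. Fitting with lm is a convenience chosen here, not something the slides prescribe.

```r
set.seed(1)  # arbitrary seed, chosen only for reproducibility

# 10 000 sums of six U(0,1) variables
s <- rowSums(matrix(runif(6 * 10000), ncol = 6))

# qqnorm returns the plotted points: x = N(0,1) quantiles, y = the data
qq <- qqnorm(s, plot.it = FALSE)

# Least-squares line through the Q-Q points
fit <- lm(y ~ x, data = data.frame(x = qq$x, y = qq$y))
coefs <- unname(coef(fit))
print(c(intercept = coefs[1], slope = coefs[2]))

# Draw the plot itself, with the line predicted by the CLT,
# only when a graphics device is sensibly available
if (interactive()) {
  qqnorm(s)
  abline(a = 3, b = sqrt(0.5), col = "red")
}
```

If the points follow the red line closely, the intercept and slope printed above should be near 3 and 0.707 respectively.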
Suppose that the random variables Y_1, Y_2, ..., Y_n model independent observations from a distribution with mean µ and variance σ². Then
$$\bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i$$
is the sample mean.
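Now by the CLT, the sample mean Ȳ satisfies
$$\bar{Y} \ \text{approx} \sim N\!\left(\mu, \frac{\sigma^{2}}{n}\right)$$
This is because Ȳ is the sum of the n terms Y_i/n, so in the result for a sum, µ is replaced by µ/n and σ by σ/n: the mean becomes n(µ/n) = µ and the variance becomes n(σ/n)² = σ²/n.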
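The mean and variance of the sample mean can be checked by simulation. The sketch below reuses the U(0,1) distribution with n = 6, for which µ = 1/2 and σ² = 1/12. The seed and variable names are assumptions made for illustration.

```r
set.seed(1)       # arbitrary seed, chosen only for reproducibility

n      <- 6       # sample size
n_sims <- 10000   # number of simulated samples

# Each row is one sample Y_1, ..., Y_n from U(0,1)
y <- matrix(runif(n * n_sims), nrow = n_sims, ncol = n)

# One sample mean per row
ybar <- rowMeans(y)

# Compare with the predicted mean mu and variance sigma^2 / n
mu     <- 1 / 2
sigma2 <- 1 / 12
print(c(simulated = mean(ybar), theory = mu))
print(c(simulated = var(ybar), theory = sigma2 / n))
```

The variance of the sample means should be close to σ²/n = 1/72, that is, n times smaller than the variance of a single observation.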
Recall from Statistics 2 that, if σ² is estimated by the sample variance s², an approximate confidence interval for µ is given by
$$\bar{y} \pm z \, \frac{s}{\sqrt{n}}$$
Here ȳ is the observed sample mean, and z is the standard normal quantile determined by the level of confidence required.
So for 95% confidence an approximate interval for µ is given by
$$\bar{y} \pm 2 \, \frac{s}{\sqrt{n}}$$
The value 2 is approximate; a more accurate value (1.96 to two decimal places) can be obtained from tables or by using the qnorm function in R.
Thus in R, an approximate 95% confidence interval for the mean µ is given by
> mean(y)+c(-1,1)*qnorm(0.975)*sqrt(var(y)/length(y))
where y is the vector of observations. A more accurate confidence interval, allowing for the fact that s² is only an estimate of σ², is given by use of the function t.test.
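The two intervals can be compared side by side. In this sketch the data vector y is simulated purely for illustration (20 values from N(10, 2²) with an arbitrary seed); in practice y would be the observed data.

```r
set.seed(1)                         # arbitrary seed, for reproducibility
y <- rnorm(20, mean = 10, sd = 2)   # hypothetical observations

# Multiplier for 95% confidence from the standard normal distribution
z <- qnorm(0.975)
print(z)

# Approximate interval based on the normal distribution
ci_normal <- mean(y) + c(-1, 1) * z * sqrt(var(y) / length(y))

# Interval from t.test, which uses the t distribution with n - 1 degrees
# of freedom in place of the normal distribution
ci_t <- as.numeric(t.test(y, conf.level = 0.95)$conf.int)

# The same t interval computed by hand with qt
ci_t_manual <- mean(y) +
  c(-1, 1) * qt(0.975, df = length(y) - 1) * sqrt(var(y) / length(y))

print(rbind(normal = ci_normal, t = ci_t))
```

Both intervals are centred on the sample mean. The t interval is slightly wider, and the difference shrinks as the number of observations grows.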