SADC Course in Statistics Importance of the normal distribution (Session 09)
Learning Objectives
At the end of this session you will be able to: discuss reasons why the normal probability distribution is important; state the Central Limit Theorem and its value in approximating binomial and Poisson probabilities by normal probabilities; and explain how the assumption of normality for a given random variable can be checked.
Importance of Normal Distribution
Many measurements can be closely approximated by the normal distribution, since many variables show normal variation as the result of many minor influences acting up and down.
Data which are not normal can often be transformed so that they are approximately normal.
The normal distribution underpins a lot of inference ideas.
We have seen that probability statements about any normally distributed variable can be made via the standard normal distribution, N(0,1).
The Central Limit Theorem (CLT)
One of the key reasons why the normal distribution is important is the Central Limit Theorem (CLT).
This theorem states that the sample mean of any random variable (with finite mean and variance) has an approximately normal distribution, provided that the sample size is sufficiently large.
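The statement above can be illustrated with a small simulation. This is only a sketch under assumptions that are not part of the slides: the population is taken to be Exponential(1), which is strongly right-skewed with mean 1 and standard deviation 1, and the function names, number of replications and seed are arbitrary illustrative choices.

```python
import random
import statistics

def sample_means(n, reps=5000, seed=1):
    """Means of `reps` samples of size n from an Exponential(1) population (assumed example)."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(reps)]

def skewness(values):
    """Moment-based skewness; close to 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)

# As n grows, the sample means centre on 1, their spread shrinks roughly
# like 1/sqrt(n), and their skewness moves towards 0 (the normal value).
for n in (1, 5, 30, 100):
    means = sample_means(n)
    print(n,
          round(statistics.fmean(means), 3),
          round(statistics.stdev(means), 3),
          round(skewness(means), 3))
```

The printed columns are the sample size, then the mean, standard deviation and skewness of the simulated sample means.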
Consequences of the Central Limit Theorem
Many statistical techniques are based on the assumption that the sample mean follows a normal distribution.
As a consequence of the Central Limit Theorem, this assumption is not invalidated as long as the sample size is large enough; a common rule of thumb is n greater than about 30.
The CLT also implies that binomial and Poisson probabilities approach normal probabilities as n becomes large (see the following slides).
Normal approximation to the binomial distribution
Recall that the form of the binomial distribution for p = 0.5 closely resembles the normal distribution.
This is because the binomial probabilities are symmetric when p = 0.5.
However, even with p ≠ 0.5, the normal approximation holds for large n, because a binomial random variable is the sum of n Bernoulli random variables (equivalently, n times their mean), and so the CLT applies.
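As a rough numerical check of this claim, the sketch below compares exact binomial cumulative probabilities with those from a normal distribution having the same mean np and variance np(1-p). The value p = 0.2, the continuity correction of 0.5 and the function name are illustrative choices, not taken from the slides.

```python
from math import comb, sqrt
from statistics import NormalDist

def max_normal_approx_error(n, p):
    """Largest |exact - approximate| value of P(R <= k) over k = 0..n."""
    approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
    exact_cdf = 0.0
    worst = 0.0
    for k in range(n + 1):
        # Accumulate the exact binomial probability P(R = k).
        exact_cdf += comb(n, k) * p ** k * (1 - p) ** (n - k)
        # Continuity correction: evaluate the normal CDF at k + 0.5.
        worst = max(worst, abs(exact_cdf - approx.cdf(k + 0.5)))
    return worst

# The worst-case error shrinks as n grows, even though p is not 0.5.
for n in (10, 50, 200):
    print(n, round(max_normal_approx_error(n, 0.2), 4))
```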
Normal approximation to the Poisson distribution
Recall from the previous session (slides 8-12) that as the Poisson parameter $\lambda$ becomes large, the shape of the Poisson distribution becomes bell-shaped and symmetrical.
This is again a consequence of the CLT, since $\lambda$ is the mean of the Poisson distribution.
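The same kind of check can be sketched for the Poisson case, comparing exact Poisson cumulative probabilities with a normal distribution whose mean and variance both equal the Poisson parameter. The parameter values, the continuity correction and the function name are assumptions made for illustration.

```python
from math import exp, sqrt
from statistics import NormalDist

def max_poisson_normal_error(lam):
    """Largest |exact - approximate| value of P(X <= k) for X ~ Poisson(lam)."""
    approx = NormalDist(mu=lam, sigma=sqrt(lam))
    pmf = exp(-lam)                      # P(X = 0)
    cdf = pmf
    worst = abs(cdf - approx.cdf(0.5))
    upper = int(lam + 10 * sqrt(lam)) + 10   # far enough into the upper tail
    for k in range(1, upper + 1):
        pmf *= lam / k                   # P(X = k) from P(X = k - 1)
        cdf += pmf
        worst = max(worst, abs(cdf - approx.cdf(k + 0.5)))
    return worst

# The Poisson skewness is 1/sqrt(lam), so the distribution becomes more
# symmetric, and the normal approximation more accurate, as lam grows.
for lam in (2, 10, 50):
    print(lam, round(1 / sqrt(lam), 3), round(max_poisson_normal_error(lam), 4))
```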
More formally…
If $\hat{p}$ is an average of a series of n Bernoulli random variables (0,1 variables), each with success probability p, then
\[
Z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}}
\]
has approximately a normal distribution with mean 0 and variance 1 (standard normal) when the sample size n is large.
Note that $\hat{p} = r/n$, where r = number of successes in n trials, i.e. r is a binomial random variable.
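A minimal simulation sketch of this standardisation follows, assuming illustrative values n = 200 and p = 0.3 (neither is from the slides). If Z is close to standard normal, its simulated values should have mean near 0, standard deviation near 1, and roughly 95% of them should fall between -1.96 and 1.96.

```python
import random
import statistics
from math import sqrt

def standardised_proportions(n, p, reps=2000, seed=42):
    """Simulate Z = (p_hat - p) / sqrt(p(1-p)/n) for `reps` samples of n Bernoulli trials."""
    rng = random.Random(seed)
    se = sqrt(p * (1 - p) / n)
    zs = []
    for _ in range(reps):
        r = sum(rng.random() < p for _ in range(n))   # number of successes
        p_hat = r / n                                  # sample proportion
        zs.append((p_hat - p) / se)
    return zs

z = standardised_proportions(n=200, p=0.3)
within = sum(abs(v) < 1.96 for v in z) / len(z)
print(round(statistics.fmean(z), 3), round(statistics.stdev(z), 3), round(within, 3))
```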
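and further …
The same result is true for the Poisson average: if $\bar{x}$ is the mean of n observations from a Poisson distribution with parameter $\lambda$, then Z defined below can be approximated by the standard normal distribution for large values of n (and, for a single count, when $\lambda$ is large).
\[
Z = \frac{\bar{x} - \lambda}{\sqrt{\lambda/n}}
\]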
Checking for normality
Thus the normal distribution plays an important role in statistics.
Most of the techniques covered in Modules H2 and H8 are based on assuming that the key response of interest follows a normal distribution.
We therefore need to be able to check whether measurements on a given random variable follow a normal distribution.
This is done by producing a normal probability plot.
Normal Probability Plot
Statistics software packages generally have a facility for producing this plot.
Below is the plot for maize cob weights.
In this plot, the Y-axis corresponds to the values you would expect from an actual normal distribution, and the X-axis corresponds to your data.
This implies that points lying close to a straight line indicate that the normality assumption is reasonable.
What do you deduce from the graph below?
[Figure: normal probability plot of maize cob weights]
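The coordinates of such a plot can be sketched as follows. The maize cob weights themselves are not available in this transcript, so the data here are simulated and purely hypothetical. The plotting positions (i - 0.5)/n are one common convention, and statistics packages may use slightly different ones. The correlation between the two axes is used only as a rough summary of how straight the plot is.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def normal_plot_points(data):
    """Pairs (ordered data value, expected standard normal score), data on the X-axis."""
    xs = sorted(data)
    n = len(xs)
    std_normal = NormalDist()
    # Assumed plotting positions (i - 0.5)/n; other conventions exist.
    scores = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(xs, scores))

def straightness(points):
    """Pearson correlation of the plotted points; near 1 means close to a straight line."""
    xs, ys = zip(*points)
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

rng = random.Random(7)
# Hypothetical data, NOT the maize cob weights from the slides.
normal_like = [rng.gauss(150, 20) for _ in range(200)]
skewed = [rng.expovariate(1 / 150) for _ in range(200)]

r_normal = straightness(normal_plot_points(normal_like))
r_skewed = straightness(normal_plot_points(skewed))
print(round(r_normal, 3), round(r_skewed, 3))
```

The list returned by `normal_plot_points` can be passed to any plotting tool to draw the graph; a curved pattern, as with the skewed sample, suggests departure from normality.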