Chapter 5 Continuous Distributions The Gaussian (Normal) Distribution.

When Discrete Distributions Aren’t Enough
Discrete distributions are used in situations involving counts. (Other situations are possible, but counts are the vast majority.) What happens when you want to measure things?
–Height
–Weight
–Miles per gallon
These aren’t counts. (Why not?) Measurements involve rounding and precision. When any level of precision is theoretically possible, we call the variable “continuous.” Its values come from the set of real numbers, i.e., the number line.

Real Numbers
[Figure: the real number line, running from -∞ through … -3 -2 -1 0 1 2 3 … to ∞]
The real numbers include all possible values between the pictured integers. That includes rational numbers like ½, 1/3, 237/573, etc. It also includes irrational numbers like π and √2. Every real number can be written with an infinite string of decimal places. There are “uncountably many” real numbers between any two distinct real numbers.

Intervals
An interval is a “piece” of the number line, that is, a subset of the real numbers with no “gaps”: for any two numbers in it, all real numbers between them are also included. Therefore an interval is described by its endpoints, with a few special considerations. The endpoints may or may not be included: round brackets exclude an endpoint, square brackets include it. Write the smaller endpoint first. Ex.: [0,1], (9,100), [3,6), (0,7]. If an interval goes on to infinity, the ∞ or -∞ symbol is used with a round bracket, since infinity is not a number. Ex.: [0,∞), (-∞,-10).

Definition of Continuous R.V.
A continuous random variable takes on values in some real interval.
[Figure: the real number line, running from -∞ through … -3 -2 -1 0 1 2 3 … to ∞]
Suppose a r.v. X takes values in [0,1]. How many different values are there? Suppose you assign some tiny positive probability to each real number in [0,1]. What is the total probability? Suppose you divide [0,1] into 10 subintervals. Can you assign probabilities to these so the total is 1?

Definition of Continuous R.V.
This illustrates the problem with assigning probabilities to individual numbers, and the contrasting ease of assigning probabilities to intervals. Summary:
–Any continuous distribution has infinitely many values.
–No single point has a positive probability.
–Said another way: every individual value of a continuous random variable has probability zero, which seems to make it an impossible event.
–Intervals can be assigned positive probability.

The Paradox
Obviously, a r.v. X must take on some value, and whatever value it takes cannot have been impossible. Yet every single value has probability zero. So a probability of zero does not mean “impossible.” The resolution: we never actually “mean” a single value. Measurements are given with a certain precision. Example: temperature is continuous, but when it is measured to the nearest degree, “70” really means the interval [69.5,70.5). Intervals can have positive probability, and we can make them as small as we like. The fact that no single value has positive probability agrees nicely with the fact that it is impossible to measure anything to its exact real-number value. Instead, we divide up our scale into equal-width subintervals based on the precision of the measuring device. These subintervals have positive probability.

Continuous Probabilities
Probabilities for a continuous random variable X are given by a probability function P, with P(X=k)=0 for any k. We might find positive probabilities for expressions like
–P(X>k) (the interval is (k,∞)),
–P(X<k) (the interval is (-∞,k)), or
–P(a<X<b) (the interval is (a,b)).
A formula that gives probabilities for X would need to give probabilities for intervals, rather than single values.

Has anything prepared us for this?
Tables of probability for discrete r.v.’s? Not if only individual values were given. Ungrouped histograms? No, for the same reason. Grouped histograms? Let’s see. Each bar represents a frequency for an interval, even though this is a discrete example.

What about relative frequency histograms?
Look at the histogram for the number of threes showing when two dice are tossed. Notice it shows the probabilities for 3 discrete values: 0, 1, and 2. Replace the discrete values with the intervals [0,1), [1,2), and [2,3). Then this histogram looks like it belongs to a continuous distribution with values in [0,3).
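As a check on the histogram, you can find the exact probabilities by listing all 36 equally likely outcomes of two dice. The Python sketch below assumes fair, independent dice, and the function name is our own. It counts the threes in each outcome. Because each bar has width 1, each bar’s area equals its probability.

```python
from fractions import Fraction
from itertools import product

def threes_distribution():
    """Exact distribution of the number of 3s showing when two fair dice are tossed."""
    counts = {0: 0, 1: 0, 2: 0}
    for a, b in product(range(1, 7), repeat=2):  # all 36 equally likely outcomes
        counts[(a == 3) + (b == 3)] += 1         # True counts as 1, False as 0
    return {k: Fraction(c, 36) for k, c in counts.items()}

dist = threes_distribution()
for k, p in dist.items():
    # Bar k covers the interval [k, k+1): width 1, so area = height = probability.
    print(f"[{k},{k + 1}): {p}")
```

The three bars come out to 25/36, 10/36 (printed in lowest terms as 5/18) and 1/36, which add up to 1.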

Making the Leap
Change the horizontal axis to show that each bar belongs to an interval. Each bar is 1 unit wide and its height represents the probability for that interval. Each bar is a rectangle whose area is 1 × height, so its area equals its height. Since the heights add up to 1, the total area of the shaded region is 1. Make the transition to the continuous case: instead of representing probability by height, use area.

What did we leap over?
This has been more of an analogy than an explanation; many details that require calculus are glossed over. The problem: we can’t represent probabilities by height at a point, because points all have probability zero. Solution: switch to areas, where the bottom boundary (on the x axis) represents an interval for which we want to determine probability. The area of the graph above that interval represents its probability. In calculus, these areas are called “definite integrals.” You don’t really need to know that, but you may come across the symbol $\int_a^b f(x)\,dx$, which means “the integral from a to b” of the curve f(x), that is, the area under f between a and b.

Uniform Distribution
A uniform distribution is defined on an interval outside of which there is no positive probability. (This keeps the total area from being infinite.) Inside that interval, every sub-interval of a given width has the same probability; that is what “uniform” means. A uniform distribution on the interval [0,3] is shown here. Note that the height is 1/3, because 3 × 1/3 = 1. However, we should not say that 1/3 is the probability of anything in particular.

Uniform Examples
Let X be a uniform r.v. on the interval [1,5]. Find P(X>3), P(X<5), P(2<X<3), and P(0<X<3).
Solution: The width of the distribution is 4, so the height of the graph is ¼ between 1 and 5. The area for any interval inside [1,5] is ¼ × the width of the interval.
–P(X>3)=(5-3)/4=1/2
–P(X<5)=(5-1)/4=1
–P(2<X<3)=(3-2)/4=1/4
–Careful! There is no probability below 1, so P(0<X<3)=P(1<X<3)=(3-1)/4=1/2
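All four answers above come from one rule: clip the interval to [1,5], then multiply its width by the height ¼. Here is a minimal Python sketch of that rule for any uniform distribution on [a,b]. The helper name `uniform_prob` is hypothetical; it does not come from any library.

```python
def uniform_prob(lo, hi, a, b):
    """P(lo < X < hi) for X uniform on [a, b].

    The density is 1/(b - a) on [a, b] and 0 elsewhere, so only the part of
    (lo, hi) that overlaps [a, b] contributes area.
    """
    left = max(lo, a)    # clip the interval to the support [a, b]
    right = min(hi, b)
    if right <= left:    # no overlap means no area
        return 0.0
    return (right - left) / (b - a)

inf = float("inf")
print(uniform_prob(3, inf, 1, 5))   # P(X > 3)     -> 0.5
print(uniform_prob(-inf, 5, 1, 5))  # P(X < 5)     -> 1.0
print(uniform_prob(2, 3, 1, 5))     # P(2 < X < 3) -> 0.25
print(uniform_prob(0, 3, 1, 5))     # P(0 < X < 3), clipped to (1, 3) -> 0.5
```

Single points have probability zero, so it makes no difference whether the endpoints lo and hi are included.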

More Uniform Examples
Let X be a uniform r.v. on the interval [0,8].
–Find P(X>3), P(X<5), and P(2<X<3 or 7<X<8).
–Find the median and the 90th percentile.
Solution: The width of the distribution is 8, so the height of the graph is 1/8.
–P(X>3)=(8-3)/8=5/8.
–P(X<5)=(5-0)/8=5/8.
–P(2<X<3 or 7<X<8)=P(2<X<3)+P(7<X<8)=1/8+1/8=1/4, since the two intervals do not overlap.
–The median must have half the probability above it and half below. Therefore the median is 4.
–P90 is a number such that 90% of the probability is below it, so (P90-0)/8=0.9, and P90=7.2.
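Percentiles run the same calculation backwards: solve (x - a)/(b - a) = p for x. The small Python sketch below uses the slide’s setup (X uniform on [0,8]). `uniform_percentile` is a hypothetical helper name.

```python
def uniform_percentile(p, a, b):
    """Value x with P(X < x) = p for X uniform on [a, b]: solve (x - a)/(b - a) = p."""
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    return a + p * (b - a)

a, b = 0, 8
# Disjoint intervals: add their probabilities, each one width/(b - a).
p_union = (3 - 2) / (b - a) + (8 - 7) / (b - a)

print(p_union)                        # 0.25
print(uniform_percentile(0.5, a, b))  # median: 4.0
print(uniform_percentile(0.9, a, b))  # 90th percentile: 7.2
```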

Probability Density Function
We have been dealing with the uniform distribution in terms of graphs. Before moving on, we need to put these ideas into mathematical notation. We were focusing on the areas of portions of a graph like the one below. But how do we define the region we want the area for?
–The bottom boundary is the x axis.
–The sides are vertical lines through the x values we want.
–The top of the region is a special “curve” (straight lines are curves too).
This curve is defined by a function, called the probability density function, or pdf. For a uniform distribution on an interval [a,b], it is:
$f(x) = \frac{1}{b-a}$ for $a \le x \le b$, and $f(x) = 0$ otherwise.
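This rough numerical sketch in Python connects the pdf with areas. It approximates the area under the uniform pdf on [0,3] with many thin rectangles, using the midpoint rule as a simple stand-in for the definite integral. The helper names are hypothetical, and the result is an approximation, not an exact integral.

```python
def uniform_pdf(x, a, b):
    """Density of the uniform distribution on [a, b]: constant 1/(b - a) inside, 0 outside."""
    return 1 / (b - a) if a <= x <= b else 0.0

def area(f, lo, hi, n=100_000):
    """Approximate the area under f between lo and hi with n midpoint rectangles."""
    width = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * width) for i in range(n)) * width

# Uniform on [0, 3]: height 1/3, so the total area is 1 and the area over [1, 2] is 1/3.
print(round(area(lambda x: uniform_pdf(x, 0, 3), 0, 3), 6))  # 1.0
print(round(area(lambda x: uniform_pdf(x, 0, 3), 1, 2), 6))  # 0.333333
```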

Normal Probability Distributions
The normal probability distribution (Gaussian distribution) is the most important distribution in all of statistics. Many continuous random variables have normal or approximately normal distributions. A normal distribution is defined by its pdf.

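The Normal pdf
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty$
The parameters are μ and σ. The mean of the distribution is μ and the standard deviation is σ. The median and mode are also μ. There is a normal distribution for every combination of a value of μ and a positive value of σ.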
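Written out in Python, the pdf above becomes a short calculation. This sketch assumes Python 3.8 or later, where the standard library’s `statistics.NormalDist` class has a `pdf` method to compare against. `normal_pdf` itself is our own illustrative name.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Normal density f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# The peak sits at x = mu; compare with the standard library's implementation.
print(normal_pdf(110, 110, 10))
print(NormalDist(110, 10).pdf(110))
```

Both lines print the peak height 1/(10√(2π)), about 0.0399, for the μ=110, σ=10 curve.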

Basic Shape
Here we see the basic shape of a normal distribution. The blue band is an example of an “area under the curve” that we might want to calculate. This particular distribution has μ=110 and σ=10. The “x” axis represents values of the r.v. X.

Effect of Changing μ
Changing μ just causes a horizontal shift, centering the graph in a different place.

Effect of Changing σ
Changing σ causes the graph to stretch out or squeeze together around the mean.

What does this mean?
The normal pdf is a complicated formula. Its antiderivative cannot be written in terms of elementary functions, so it is not easy to calculate probabilities from it, even if you know calculus. So, we use tables (or computers). We can’t have a table for every possible normal distribution. Instead, we have one table for the “standard” normal distribution, which has μ=0 and σ=1. This r.v. is called Z. It is easy to convert probability statements about other normal distributions into statements about Z.
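If you use a computer instead of the table, the cumulative probability P(Z < z) can be written with the error function: Φ(z) = ½[1 + erf(z/√2)]. Python’s `math.erf` computes erf. A minimal sketch (the function name is hypothetical):

```python
import math

def std_normal_cdf(z):
    """Phi(z) = P(Z < z) for the standard normal, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

print(round(std_normal_cdf(0), 4))     # 0.5: half the area lies left of the mean
print(round(std_normal_cdf(1.96), 4))  # 0.975
```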

Table 3, Appendix B entries
The table contains the area under the standard normal curve between 0 and a specific value of z.

Example: Find the area under the standard normal curve between z = 0 and z = 1.45.
[Figure: a portion of Table 3]
The Table 3 entry for z = 1.45 is 0.4265, so P(0 < Z < 1.45) = 0.4265.

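Example: Find the area under the normal curve to the right of Z = 1.45; P(Z > 1.45).
The whole right half of the curve has area 0.5000, and Table 3 gives 0.4265 between 0 and 1.45, so the area asked for is P(Z > 1.45) = 0.5000 - 0.4265 = 0.0735.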

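Example: Find the area to the left of Z = 1.45; P(Z < 1.45).
The left half of the curve has area 0.5000, and Table 3 gives 0.4265 between 0 and 1.45, so P(Z < 1.45) = 0.5000 + 0.4265 = 0.9265.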

Example: Find the area between Z = -1.26 and the mean (Z = 0).
The curve is symmetric about 0, so the area asked for equals the table area between 0 and 1.26: P(-1.26 < Z < 0) = 0.3962.

Example: Find the area to the left of .98; P(Z < .98). Area asked for Area from table 0.3365 Same as area asked for

Applications of Normal Distributions
Apply the techniques learned for the Z distribution to all normal distributions. Start with a probability question in terms of x-values. Convert, or transform, the question into an equivalent probability statement involving z-values.

Standardization
Suppose X is a normal r.v. with mean μ and standard deviation σ. Then the r.v.
$Z = \frac{X - \mu}{\sigma}$
has a standard normal distribution.

Example: A bottling machine is adjusted to fill bottles with a mean of 32.0 oz of soda and a standard deviation of 0.02 oz. Assume the amount of fill is normally distributed and a bottle is selected at random.
1. Find the probability the bottle contains between 32 oz and 32.025 oz.
2. Find the probability the bottle contains more than 31.97 oz.
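One way to check both parts is to standardize and then use a computer instead of Table 3. The sketch below assumes Python 3.8+ (`statistics.NormalDist`). `standardize` is an illustrative helper, and the z-values it prints are the ones you would look up in the table.

```python
from statistics import NormalDist

MU, SIGMA = 32.0, 0.02          # mean and standard deviation of the fill, in oz
fill = NormalDist(MU, SIGMA)

def standardize(x):
    """z = (x - mu) / sigma: how many standard deviations x lies from the mean."""
    return (x - MU) / SIGMA

# 1. P(32 < X < 32.025) is the same as P(0 < Z < 1.25)
p1 = fill.cdf(32.025) - fill.cdf(32.0)
# 2. P(X > 31.97) is the same as P(Z > -1.5)
p2 = 1 - fill.cdf(31.97)

print(round(standardize(32.025), 2), round(p1, 4))  # 1.25 0.3944
print(round(standardize(31.97), 2), round(p2, 4))   # -1.5 0.9332
```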

Other Normal Applications
Find a cutoff point: a value of X such that there is a certain probability in a specified interval defined by x. Example: The waiting time X at a certain bank is approximately normally distributed with a mean of 3.7 minutes and a standard deviation of 1.4 minutes. The bank would like to claim that 95% of all customers are waited on by a teller within c minutes. Find the value of c that makes this statement true.

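Solution: We need P(X < c) = 0.95, so the area between the mean and c is 0.4500. In Table 3, 0.4500 lies halfway between the entries for z = 1.64 (0.4495) and z = 1.65 (0.4505), so z ≈ 1.645. Converting back to x: c = μ + zσ = 3.7 + 1.645(1.4) ≈ 6.0 minutes.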
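You can also find the cutoff directly with an inverse cumulative probability, which is what the table search does by hand. This Python sketch assumes Python 3.8+ for `statistics.NormalDist.inv_cdf`.

```python
from statistics import NormalDist

wait = NormalDist(mu=3.7, sigma=1.4)   # waiting time in minutes

# c is the 95th percentile: P(X < c) = 0.95
c = wait.inv_cdf(0.95)
z = NormalDist().inv_cdf(0.95)         # the same cutoff on the Z scale

print(round(z, 3))                     # 1.645
print(round(c, 2))                     # 6.0 (minutes)
print(round(3.7 + z * 1.4, 2))         # x = mu + z*sigma gives the same value: 6.0
```

The computer uses z = 1.6449 rather than 1.645, so its c is about 6.0028 minutes. Both round to 6.0.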

Notation: If X is a normal random variable with mean μ and standard deviation σ, this is often denoted $X \sim N(\mu, \sigma^2)$.
Example: Suppose X is a normal random variable with μ = 35 and σ = 6. A convenient notation to identify this random variable is $X \sim N(35, 36)$, since σ² = 36.
$z(\alpha)$ and $z_\alpha$ are commonly used notations for the z-score (point on the z axis) such that there is α of the area (probability) to the right of $z(\alpha)$ or $z_\alpha$.

Illustrations: z(0.10) represents the value of Z such that the area to the right under the standard normal curve is 0.10. z(0.80) represents the value of Z such that the area to the right under the standard normal curve is 0.80. Since more than half the area lies to its right, z(0.80) is negative.

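Example: Find the numerical value of z(0.10).
The notation says the area to the right of z(0.10) is 0.10, so the area between 0 and z(0.10) is 0.5000 - 0.1000 = 0.4000. Use Table 3: look for an area as close as possible to 0.4000. The closest entry is 0.3997, at z = 1.28, so z(0.10) = 1.28.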
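Area α to the right means area 1 - α to the left, so z(α) is the inverse cumulative probability evaluated at 1 - α. A short Python sketch (Python 3.8+; `z_alpha` is a hypothetical name):

```python
from statistics import NormalDist

def z_alpha(alpha):
    """z(alpha): the z-score with area alpha to its RIGHT under the standard normal curve."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be strictly between 0 and 1")
    # Area alpha to the right means area 1 - alpha to the left.
    return NormalDist().inv_cdf(1 - alpha)

print(round(z_alpha(0.10), 2))   # 1.28
print(round(z_alpha(0.80), 2))   # -0.84, negative as expected
```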

Note: The values of Z that will be used regularly come from one of the following situations:
1. The z-score such that there is a specified area in one tail of the normal distribution.
2. The z-scores that bound a specified middle proportion of the normal distribution.

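Example: Find the z-scores that bound the middle 0.99 of the normal distribution.
The middle 0.99 leaves 0.01 for the two tails, or 0.005 in each, so we need the area 0.4950 between 0 and z. Use Table 3: 0.4950 lies halfway between the entries for z = 2.57 (0.4949) and z = 2.58 (0.4951), so the bounds are z ≈ -2.575 and z ≈ 2.575.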
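Here is the same two-tail reasoning in Python (3.8+, `statistics.NormalDist`). `middle_bounds` is a hypothetical helper. The computer’s answer, ±2.5758, rounds to ±2.576. The table-based ±2.575 comes from interpolating between two four-digit entries.

```python
from statistics import NormalDist

def middle_bounds(proportion):
    """z-scores (-z, z) enclosing the given middle proportion of the standard normal."""
    tail = (1 - proportion) / 2          # area left over in each tail
    z = NormalDist().inv_cdf(1 - tail)   # upper bound has 1 - tail to its left
    return -z, z

lo, hi = middle_bounds(0.99)
print(round(lo, 3), round(hi, 3))   # -2.576 2.576
```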

Excel has 4 normal distribution functions to choose from.

Excel Functions
–NormDist
–NormInv
–NormSDist
–NormSInv
The “S” in the second pair is for “Standard” normal. The first pair lets you enter the parameters (mean and standard deviation). The “Dist” functions are used to find probabilities; the “Inv” functions are used to find x or z values.

NORMSDIST
Returns the standard normal cumulative distribution function.
Syntax: NORMSDIST(z)
Unlike the tables in the book, the probability Excel gives is not for (0,z) but for (-∞,z). So for positive z values it will be 0.5 more than the table value.

NORMDIST
Syntax: NORMDIST(x,mean,standard_dev,cumulative)
“Cumulative” is a logical value that determines the form of the function. If cumulative is TRUE, NORMDIST returns the cumulative probability, that is, the probability for (-∞,x). This is similar to using the standard normal tables. If cumulative is FALSE, it returns the probability density function, that is, the y-value on the graph of the normal curve. This is useful if you want to make a graph that shows the normal curve. You can use 1 for TRUE and 0 for FALSE.

NORMSINV
Returns the z-value (inverse) of the standard normal cumulative distribution.
Syntax: NORMSINV(probability)
“Probability” is the cumulative probability, that is, the probability for the interval (-∞,z) for which you want to find the z value.

NORMINV
Returns the inverse of the normal cumulative distribution for the specified mean and standard deviation.
Syntax: NORMINV(probability,mean,standard_dev)
“Probability” is the probability of the interval (-∞,x) for which you want to find the x-value.
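If you don’t have Excel, the four functions have close counterparts in Python’s `statistics.NormalDist` (Python 3.8+). The lowercase function names below are hypothetical wrappers chosen to mirror the Excel names. They are not part of Excel or of any Python library. Like Excel, the `cdf` results are for (-∞,x), not the (0,z) areas in Table 3.

```python
from statistics import NormalDist

# Rough Python counterparts of the four Excel functions (hypothetical wrapper names).
def normsdist(z):
    return NormalDist().cdf(z)                      # area over (-inf, z)

def normdist(x, mean, sd, cumulative):
    d = NormalDist(mean, sd)
    return d.cdf(x) if cumulative else d.pdf(x)     # FALSE/0 gives the curve's height

def normsinv(probability):
    return NormalDist().inv_cdf(probability)        # z with `probability` to its left

def norminv(probability, mean, sd):
    return NormalDist(mean, sd).inv_cdf(probability)

print(round(normsdist(1.45), 4))              # 0.9265 = 0.5 + 0.4265 from Table 3
print(round(normdist(110, 110, 10, True), 4)) # 0.5
print(round(normsinv(0.95), 3))               # 1.645
print(round(norminv(0.95, 3.7, 1.4), 2))      # 6.0
```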
