Chapter 7 Random Variables and Continuous Distributions.


Consider the random variable x = the weight (in pounds) of a full-term newborn child. Suppose that weight is reported to the nearest pound. A probability histogram displays the distribution of weights, and the area of the rectangle centered over 7 pounds represents the probability that 6.5 < x < 7.5. What type of variable is this? What is the sum of the areas of all the rectangles?

Now suppose that weight is reported to the nearest 0.1 pound. Notice that the rectangles in this probability histogram are narrower and the histogram begins to have a smoother appearance. If weight is measured with greater and greater accuracy, the histogram approaches a smooth curve. The shaded area under that curve represents the probability that 6 < x < 8. This is an example of a density curve.

Probability Distributions for Continuous Variables. The probability distribution of a continuous random variable is specified by a curve called a density curve. The function that describes this curve is denoted by f(x) and is called the density function. The probability that x falls in a given interval is the area under the curve and above that interval.

Properties of continuous probability distributions: 1. f(x) ≥ 0 (the curve cannot dip below the horizontal axis). 2. The total area under the density curve equals 1.

Let x denote the amount of gravel sold (in tons) during a randomly selected week at a particular sales facility. Suppose that the density curve has a height f(x) above the value x, where f(x) = 2(1 - x) for 0 ≤ x ≤ 1 and f(x) = 0 otherwise. The density curve is a straight line that falls from height 2 at x = 0 to height 0 at x = 1.

Gravel problem continued... What is the probability that at most ½ ton of gravel is sold during a randomly selected week? The probability is the shaded area under the curve and above the interval from 0 to 0.5. This area can be found by using the formula for the area of a trapezoid or, more easily, by finding the area of the triangle above the interval from 0.5 to 1 and subtracting that area from 1: P(x ≤ ½) = 1 - ½(0.5)(1) = 0.75.

Gravel problem continued... What is the probability that exactly ½ ton of gravel is sold during a randomly selected week? The probability would be the area under the curve and above the single value 0.5. How do we find the area of a line segment? Since a line segment has NO area, the probability that exactly ½ ton is sold equals 0: P(x = ½) = 0.

Gravel problem continued... What is the probability that less than ½ ton of gravel is sold during a randomly selected week? P(x < ½) = 1 - ½(0.5)(1) = 0.75. Does the probability change depending on whether the ½ is included? No: because P(x = ½) = 0, P(x < ½) = P(x ≤ ½). Hmmm... This is different from discrete probability distributions, where including or excluding a value can change the probability!
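
The gravel answers can be checked numerically. The sketch below assumes the density is the straight line f(x) = 2(1 - x) on the interval from 0 to 1, which is what the triangle and trapezoid areas in the worked answers imply. The helper names gravel_density and area_under are illustrative, and the midpoint rule stands in for the geometric area formulas.

```python
# Assumed density, consistent with the areas used in the worked answers:
# f(x) = 2(1 - x) for 0 <= x <= 1, and 0 elsewhere.
def gravel_density(x):
    return 2.0 * (1.0 - x) if 0.0 <= x <= 1.0 else 0.0


def area_under(f, a, b, steps=100_000):
    """Approximate the area under f between a and b with the midpoint rule."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width


total_area = area_under(gravel_density, 0.0, 1.0)      # property 2: should be 1
p_at_most_half = area_under(gravel_density, 0.0, 0.5)  # P(x <= 1/2)
p_exactly_half = area_under(gravel_density, 0.5, 0.5)  # zero-width interval

print(round(total_area, 4), round(p_at_most_half, 4), p_exactly_half)
```

The zero-width interval has no area, which is why including or excluding the endpoint ½ leaves the probability unchanged.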

Suppose x is a continuous random variable defined as the amount of time (in minutes) taken by a clerk to process a certain type of application form. Suppose x has a probability distribution with density function f(x) = 0.5 for 4 < x < 6 and f(x) = 0 otherwise. The graph of f(x), the density curve, is a horizontal line at height 0.5 above the interval from 4 to 6 (horizontal axis: time in minutes; vertical axis: density).

Application Problem Continued... What is the probability that it takes more than 5.5 minutes to process the application form? Find the probability by calculating the area of the shaded region (base × height): P(x > 5.5) = (0.5)(0.5) = 0.25. When the density is constant over an interval (resulting in a horizontal density curve), the probability distribution is called a uniform distribution.

Application Problem Continued... What are the mean and standard deviation of the time needed to process a form?
μ = (4 + 6)/2 = 5 minutes
σ² = (6 - 4)²/12 = 1/3
σ = √(1/3) ≈ 0.5774 minutes
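
The uniform-distribution results can be reproduced with a few lines of arithmetic. This is a minimal sketch that assumes the density is constant on the interval from 4 to 6 minutes, as the worked answers indicate. The variable names are illustrative.

```python
import math

a, b = 4.0, 6.0          # endpoints of the uniform distribution (minutes)
height = 1.0 / (b - a)   # constant density, chosen so the total area is 1

p_more_than_5_5 = (b - 5.5) * height  # base x height of the shaded rectangle
mean = (a + b) / 2                    # center of the interval
variance = (b - a) ** 2 / 12          # variance of a uniform distribution
std_dev = math.sqrt(variance)

print(p_more_than_5_5, mean, round(variance, 4), round(std_dev, 4))
```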

Other Density Curves. Some density curves resemble the one in the accompanying figure. Integral calculus is used to find the area under these curves. Don’t worry: we will use tables (with the values already calculated). We can also use calculators or statistical software to find the area.

The probability that a continuous random variable x lies between a lower limit a and an upper limit b is
P(a < x < b) = (cumulative area to the left of b) - (cumulative area to the left of a)
P(a < x < b) = P(x < b) - P(x < a)
This will be useful later in this chapter!
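
As a small illustration of the cumulative-area rule, the sketch below applies it to the uniform processing-time distribution from the application problem. The function names uniform_cdf and prob_between, and the limits 4.5 and 5.5, are chosen here for illustration only.

```python
def uniform_cdf(x, low=4.0, high=6.0):
    """Cumulative area to the left of x for a uniform density on [low, high]."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


def prob_between(cdf, a, b):
    """P(a < x < b) = (area to the left of b) - (area to the left of a)."""
    return cdf(b) - cdf(a)


p = prob_between(uniform_cdf, 4.5, 5.5)
print(p)
```

The same prob_between helper works with any cumulative-area function, which is how the z table is used later in the chapter.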

Mean and Variance for Continuous Random Variables. For continuous probability distributions, μ_x and σ_x can be defined and computed using methods from calculus. The mean value μ_x locates the center of the continuous distribution. The standard deviation, σ_x, measures the extent to which the continuous distribution spreads out around μ_x.

A company receives concrete of a certain type from two different suppliers. Let
x = compression strength of a randomly selected batch from Supplier 1
y = compression strength of a randomly selected batch from Supplier 2
Suppose that
μ_x = 4650 pounds/inch², σ_x = 200 pounds/inch²
μ_y = 4500 pounds/inch², σ_y = 275 pounds/inch²
The first supplier is preferred to the second both in terms of mean value and variability.
(Figure: density curves for x and y, with the horizontal axis marked at 4300, 4500, 4700 and 4900.)

Normal Distributions
Continuous probability distribution
Symmetrical, bell-shaped (unimodal) density curve defined by μ and σ
Area under the curve equals 1
Probability is calculated by finding the area under the curve
As σ increases, the curve flattens and spreads out
As σ decreases, the curve gets taller and thinner
How is this done mathematically? To overcome the need for calculus, we rely on technology or on a table of areas for the standard normal distribution.
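
The effect of σ on the shape of the curve can be seen numerically. The sketch below uses Python's standard-library statistics.NormalDist. The mean of 6 and the standard deviations 1 and 3 are illustrative values, not part of the definition.

```python
from statistics import NormalDist

# Two normal curves with the same mean and different spreads (illustrative values).
narrow = NormalDist(mu=6, sigma=1)
wide = NormalDist(mu=6, sigma=3)

# Height of each density curve at its mean: the larger sigma gives the lower peak.
peak_narrow = narrow.pdf(6)
peak_wide = wide.pdf(6)

# Area within one standard deviation of the mean is the same for both curves.
area_narrow = narrow.cdf(7) - narrow.cdf(5)
area_wide = wide.cdf(9) - wide.cdf(3)

print(round(peak_narrow, 4), round(peak_wide, 4))
print(round(area_narrow, 4), round(area_wide, 4))
```

The peak drops as σ grows because the total area must stay equal to 1 while the curve spreads out.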

Two normal curves, A and B, are shown. Do these two normal curves have the same mean? If so, what is it? YES, both have mean 6. Which normal curve has a standard deviation of 3? Curve B. Which normal curve has a standard deviation of 1? Curve A.

Notice that the normal curve curves downward between the two points that are one standard deviation on either side of the mean (μ - σ and μ + σ). At those points, the normal curve begins to curve upward.

Standard Normal Distribution. The standard normal distribution is a normal distribution with μ = 0 and σ = 1. It is customary to use the letter z to represent a variable whose distribution is described by the standard normal curve (or z curve).

Using the Table of Standard Normal (z) Curve Areas. For any number z* from -3.89 to 3.89, rounded to two decimal places, Appendix Table 2 gives the area under the z curve and to the left of z*: P(z < z*) = P(z ≤ z*), where the letter z is used to represent a random variable whose distribution is the standard normal distribution. To use the table: find the correct row and column (see the following example). The number at the intersection of that row and column is the probability.

Suppose we are interested in the probability that z is less than -1.62. In the table of areas: find the row labeled -1.6, find the column labeled .02, and find the intersection of the row and column.

z*      .00     .01     .02     .03
-1.7    .0446   .0436   .0427   .0418
-1.6    .0548   .0537   .0526   .0516
-1.5    .0668   .0655   .0643   .0630
…       …       …       …       …

P(z < -1.62) = .0526

Suppose we are interested in the probability that z is less than 2.31.

z*      .00     .01     .02     .03
2.2     .9861   .9864   .9868   .9871
2.3     .9893   .9896   .9898   .9901
2.4     .9918   .9920   .9922   .9925
…       …       …       …       …

P(z < 2.31) = .9896

Suppose we are interested in the probability that z is greater than 2.31. The Table of Areas gives the area to the LEFT of z*. To find the area to the right, subtract the value in the table from 1.

z*      .00     .01     .02     .03
2.2     .9861   .9864   .9868   .9871
2.3     .9893   .9896   .9898   .9901
2.4     .9918   .9920   .9922   .9925
…       …       …       …       …

P(z > 2.31) = 1 - .9896 = .0104
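
The three table lookups can be checked with software instead of the printed table. This is a minimal sketch using Python's standard-library statistics.NormalDist, whose cdf method returns the area to the left of a value. The variable names are illustrative.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

p_left_of_neg_1_62 = z.cdf(-1.62)     # area to the left of -1.62
p_left_of_2_31 = z.cdf(2.31)          # area to the left of 2.31
p_right_of_2_31 = 1 - z.cdf(2.31)     # area to the right: subtract from 1

print(round(p_left_of_neg_1_62, 4))
print(round(p_left_of_2_31, 4))
print(round(p_right_of_2_31, 4))
```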

Suppose we are interested in finding the z* that cuts off the smallest 2% of the distribution: P(z < z*) = .02. To find z*: look for the area .0200 in the body of the table, then follow the row and column back out to read the z value. Since .0200 doesn’t appear in the body of the table, use the value closest to it (.0202).

z*      .04     .05     .06
-2.1    .0162   .0158   .0154
-2.0    .0207   .0202   .0197
-1.9    .0262   .0256   .0250
…       …       …       …

z* = -2.05

Suppose we are interested in finding the z* that cuts off the largest 5% of the distribution: P(z > z*) = .05. Remember that the Table of Areas gives the area to the LEFT of z*, so first compute 1 - (area to the right of z*) = 1 - .05 = .95. Then look up this value in the body of the table.

z*      .04     .05     .06
1.5     .9382   .9394   .9406
1.6     .9495   .9505   .9515
1.7     .9591   .9599   .9608
…       …       …       …

Since .9500 is exactly halfway between .9495 (z = 1.64) and .9505 (z = 1.65), we can average the two z values: z* = 1.645.
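
Working backward from an area to a z value can also be done with software. The sketch below assumes Python's standard-library statistics.NormalDist, whose inv_cdf method takes the area to the LEFT of the unknown value, just as the table does. Results can differ slightly from table readings because the table is rounded to two decimal places.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal curve

# Smallest 2%: the area to the left of z* is 0.02.
z_smallest_2_percent = z.inv_cdf(0.02)

# Largest 5%: the area to the right is 0.05, so the area to the left is 0.95.
z_largest_5_percent = z.inv_cdf(1 - 0.05)

print(round(z_smallest_2_percent, 3), round(z_largest_5_percent, 3))
```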

Finding Probabilities for Other Normal Curves. To find the probabilities for other normal curves, standardize the relevant values and then use the table of z areas. If x is a random variable whose behavior is described by a normal distribution with mean μ and standard deviation σ, then
P(x < b) = P(z < b*)
P(x > a) = P(z > a*)
P(a < x < b) = P(a* < z < b*)
where z is a variable whose distribution is standard normal and
a* = (a - μ)/σ, b* = (b - μ)/σ
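
Standardizing can be sketched in a few lines. The function name standardize and the values μ = 50, σ = 10 and b = 65 below are hypothetical, chosen only to show that the area to the left of b under the original curve equals the area to the left of b* under the z curve.

```python
from statistics import NormalDist


def standardize(value, mu, sigma):
    """Convert a value to a z-score: (value - mean) / standard deviation."""
    return (value - mu) / sigma


mu, sigma = 50.0, 10.0   # hypothetical mean and standard deviation
b = 65.0                 # hypothetical upper limit

b_star = standardize(b, mu, sigma)
p_direct = NormalDist(mu, sigma).cdf(b)   # P(x < b) on the original curve
p_via_z = NormalDist().cdf(b_star)        # P(z < b*) on the standard normal curve

print(b_star, round(p_direct, 4), round(p_via_z, 4))
```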

Data on the length of time to complete registration for classes using an online registration system suggest that the distribution of the variable x = time to register for students at a particular university can be well approximated by a normal distribution with mean μ = 12 minutes and standard deviation σ = 2 minutes.

Registration Problem Continued... x = time to register, μ = 12 minutes and σ = 2 minutes. What is the probability that it will take a randomly selected student less than 9 minutes to complete registration? Standardize this value, then look it up in the table: P(x < 9) = P(z < (9 - 12)/2) = P(z < -1.50) = .0668.

Registration Problem Continued... x = time to register, μ = 12 minutes and σ = 2 minutes. What is the probability that it will take a randomly selected student more than 13 minutes to complete registration? Standardize this value, look it up in the table, and subtract from 1: P(x > 13) = P(z > (13 - 12)/2) = P(z > 0.50) = 1 - .6915 = .3085.

Registration Problem Continued... x = time to register, μ = 12 minutes and σ = 2 minutes. What is the probability that it will take a randomly selected student between 7 and 15 minutes to complete registration? Standardize these values, look them up in the table, and subtract (value for b*) - (value for a*): P(7 < x < 15) = P(-2.50 < z < 1.50) = .9332 - .0062 = .9270.
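
The three registration probabilities can be verified with software. This sketch assumes Python's standard-library statistics.NormalDist and works directly on the x scale, so the standardizing step happens inside cdf. The variable names are illustrative.

```python
from statistics import NormalDist

registration = NormalDist(mu=12, sigma=2)  # time to register, in minutes

p_less_than_9 = registration.cdf(9)                             # P(x < 9)
p_more_than_13 = 1 - registration.cdf(13)                       # P(x > 13)
p_between_7_and_15 = registration.cdf(15) - registration.cdf(7) # P(7 < x < 15)

print(round(p_less_than_9, 4))
print(round(p_more_than_13, 4))
print(round(p_between_7_and_15, 4))
```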

Registration Problem Continued... x = time to register, μ = 12 minutes and σ = 2 minutes. Because some students do not log off properly, the university would like to log off students automatically after some time has elapsed. It is decided to select this time so that only 1% of students will be automatically logged off while still trying to register. What time should the automatic log off be set at? We need the time a with P(x > a) = .01, so the area to the left of a is .99. Look up the area .99 in the body of the table: the closest entry, .9901, gives z* = 2.33. Then use the formula for standardizing to solve for a: 2.33 = (a - 12)/2, so a = 12 + 2.33(2) = 16.66 minutes.
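
The log-off time can also be found with an inverse-normal calculation. The sketch below assumes Python's standard-library statistics.NormalDist. Because software does not round z* to two decimal places, the result differs slightly from the table-based answer.

```python
from statistics import NormalDist

registration = NormalDist(mu=12, sigma=2)  # time to register, in minutes

# Only 1% should be logged off early, so 99% of the area lies to the left.
area_to_left = 1 - 0.01
log_off_time = registration.inv_cdf(area_to_left)

# The same value via the standardizing formula: x = mu + z* times sigma.
z_star = NormalDist().inv_cdf(area_to_left)
log_off_time_via_z = 12 + z_star * 2

print(round(log_off_time, 2), round(log_off_time_via_z, 2))
```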

Will my calculator do any of this normal stuff?
normalpdf: use for graphing ONLY
normalcdf: will find the probability (area) from a lower bound to an upper bound
invNorm (inverse normal): will find the z-score from a probability (the area to the left)

Ways to Assess Normality. Some of the most frequently used statistical methods are valid only when x1, x2, …, xn have come from a population distribution that is at least approximately normal. One way to see whether an assumption of population normality is plausible is to construct a normal probability plot of the data. A normal probability plot is a scatterplot of (normal score, observed value) pairs. What should happen if our data set is normally distributed?

Consider a random sample with n = 5. To find the appropriate normal scores for a sample of size 5, divide the standard normal curve into 5 equal-area regions. Why are these regions not the same width? Each region has an area equal to 0.2.

Consider a random sample with n = 5. Next, find the median z-score for each region: -1.28, -0.524, 0, 0.524, 1.28. Why is the median not in the “middle” of each region? These are the normal scores that we would plot our data against. We use technology (calculators or statistical software) to compute these normal scores.
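
The n = 5 normal scores can be reproduced by locating the median of each equal-area region, that is, the z value with half of that region's area on either side. The sketch below assumes Python's standard-library statistics.NormalDist, and the function name normal_scores is illustrative. Statistical packages often use slightly different conventions, so the n = 10 scores listed later will not be matched exactly by this formula.

```python
from statistics import NormalDist


def normal_scores(n):
    """Median z-score of each of n equal-area regions under the z curve."""
    z = NormalDist()
    # Region i covers areas i/n to (i+1)/n; its median sits at area (i + 0.5)/n.
    return [z.inv_cdf((i + 0.5) / n) for i in range(n)]


scores = normal_scores(5)
print([round(s, 3) for s in scores])
```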

Ways to Assess Normality continued... A strong linear pattern in a normal probability plot suggests that population normality is plausible. On the other hand, a systematic departure from a straight-line pattern, such as curvature (which would indicate skewness in the data) or outliers, indicates that it is not reasonable to assume that the population distribution is normal.

Let’s construct a normal probability plot. The values of the normal scores depend on the sample size n. The normal scores when n = 10 are:
-1.539 -1.001 -0.656 -0.376 -0.123 0.123 0.376 0.656 1.001 1.539
The following data represent egg weights (in grams) for a sample of 10 eggs:
53.04 53.50 52.53 53.00 53.07
52.86 52.66 53.23 53.26 53.16
Sketch a scatterplot by pairing the smallest normal score with the smallest observation from the data set, and so on. Since the normal probability plot is approximately linear, it is plausible that the distribution of egg weights is approximately normal.
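
The pairing step can be sketched in code. The example below sorts the egg weights, pairs them with the listed n = 10 normal scores, and prints the points that would be plotted. The correlation coefficient is an added numeric summary of how linear the pattern is; it is an assumption of this sketch, not part of the slide's method, and the helper name correlation is illustrative.

```python
import math

normal_scores = [-1.539, -1.001, -0.656, -0.376, -0.123,
                 0.123, 0.376, 0.656, 1.001, 1.539]
egg_weights = [53.04, 53.50, 52.53, 53.00, 53.07,
               52.86, 52.66, 53.23, 53.26, 53.16]

# Pair the smallest normal score with the smallest observation, and so on.
points = list(zip(normal_scores, sorted(egg_weights)))


def correlation(pairs):
    """Pearson correlation between the two coordinates of the pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


r = correlation(points)
for score, weight in points:
    print(f"{score:7.3f}  {weight:6.2f}")
print("correlation:", round(r, 3))
```

A correlation close to 1 agrees with the visual impression of an approximately linear plot.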

Three data sets are shown, each with a boxplot and a normal probability plot.
First data set: notice that the boxplot is approximately symmetrical and that the normal probability plot is approximately linear.
Second data set: notice that the boxplot is approximately symmetrical except for the outlier and that the normal probability plot shows the outlier.
Third data set: notice that the boxplot is skewed left and that the normal probability plot shows this skewness.
