Chapter 6 Probability Distributions

Section 6.1 Summarizing Possible Outcomes and Their Probabilities

Randomness
A random variable is a numerical measurement of the outcome of a random phenomenon. Examples of random phenomena: the gender of the next consumer, the party affiliation of the next voter. Often, the randomness results from selecting a random sample from a population or performing a randomized experiment.

Random Variable
Use a lowercase letter near the end of the alphabet, such as x, to symbolize a particular value of the random variable. Use a capital letter, such as X, to refer to the random variable itself. Example: Flip a coin three times. X = number of heads in the 3 flips defines the random variable; x = 2 represents a possible value of the random variable.

Probability Distribution
The probability distribution of a random variable specifies its possible values and their probabilities. Note: It is the randomness of the variable that allows us to specify probabilities for the outcomes.

Probability Distribution of a Discrete Random Variable
A discrete random variable X takes a set of separate values (such as 0,1,2,…) as its possible outcomes. Its probability distribution assigns a probability P(x) to each possible value x: For each x, the probability P(x) falls between 0 and 1. The sum of the probabilities for all the possible x values equals 1.

Example 1: # of Heads when flipping three fair coins
What is the sample space S? What are the corresponding probabilities? Now let X = # of the heads obtained. What are the possible values for X? What are the corresponding probabilities for X? Summarize into a probability distribution table. Now X is called a random variable (r.v.). Is X a discrete or continuous r.v.?
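The questions in Example 1 can be checked by brute-force enumeration. The following is a minimal Python sketch, assuming the three coins are fair and independent so that all eight sequences are equally likely; the variable names are illustrative and not part of the slides.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space S: all 2^3 equally likely sequences of H and T.
sample_space = list(product("HT", repeat=3))

# X = number of heads in each outcome; count how often each value occurs.
counts = Counter(outcome.count("H") for outcome in sample_space)

# P(x) = (number of outcomes with x heads) / (total number of outcomes).
distribution = {
    x: Fraction(n, len(sample_space)) for x, n in sorted(counts.items())
}

for x, p in distribution.items():
    print(f"P(X = {x}) = {p}")
```

Because X can take only the separate values 0, 1, 2, and 3, it is a discrete random variable, and the probabilities add to 1.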

Example 2: Number of Home Runs in a Game
What is the estimated probability of at least three home runs? X = # of home runs. Table 6.1 Probability Distribution of Number of Home Runs in a Game for San Francisco Giants.

The probability of at least three home runs in a game is P(X≥3) = P(3) + P(4) + P(5 or more) = 0.0556 + 0.0185 + 0 = 0.0741, using the probabilities P(3) = 0.0556 and P(4) = 0.0185 that appear in the mean calculation for this table. The probabilities for 0 through 4 already sum to 1, so P(5 or more) is 0 to four decimal places.

Mean of a Discrete Probability Distribution
The mean of a probability distribution for a discrete random variable is μ = Σ x·P(x), where the sum is taken over all possible values of x. The mean of a probability distribution is denoted by the parameter μ. The mean is a weighted average; values of x that are more likely receive greater weight P(x). E.g.: if you toss a fair coin twice, how many heads would you expect to see?
# heads   Prob   Product
0         1/4    0*1/4=0
1         1/2    1*1/2=1/2
2         1/4    2*1/4=1/2
μ = 0 + (1/2) + (1/2) = 1

Expected Value of X
The mean of a probability distribution of a random variable X is also called the expected value of X. The expected value reflects not what we’ll observe in a single observation, but rather what we expect for the average in a long run of observations. Note: It is not unusual for the expected value of a random variable to equal a number that is NOT a possible outcome. Eg1: In the previous example, the expected number of heads in two random tosses of a fair coin is 1. Eg2: The expected number of dots for the toss of a fair die is 3.5 (Verify.)
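The two expected values quoted in this slide can be verified with a few lines of Python. This is only a sketch: the helper name expected_value and the dictionary layout {value: probability} are assumptions made for illustration.

```python
from fractions import Fraction


def expected_value(distribution):
    """Mean of a discrete distribution given as {value: probability}."""
    return sum(x * p for x, p in distribution.items())


# Number of heads in two tosses of a fair coin.
two_tosses = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

# Number of dots on one toss of a fair die: each face has probability 1/6.
fair_die = {face: Fraction(1, 6) for face in range(1, 7)}

print(expected_value(two_tosses))       # expected number of heads
print(float(expected_value(fair_die)))  # expected number of dots
```

The die result, 3.5, is not a face of the die, which illustrates the note that an expected value need not be a possible outcome.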

Example: Number of Home Runs in a Game
Find the mean of this probability distribution. Table 6.1 Probability Distribution of Number of Home Runs in a Game for San Francisco Giants

Example 1: # of Heads obtained when flipping three fair coins
X      3     2     1     0
Prob   1/8   3/8   3/8   1/8
The mean: μ = (3)(1/8) + (2)(3/8) + (1)(3/8) + (0)(1/8) = 1.5

For the home run distribution in Table 6.1, the mean is: μ = 0 * P(0) + 1 * P(1) + 2 * P(2) + 3 * P(3) + 4 * P(4) = 0(0.3889) + 1(0.3148) + 2(0.2222) + 3(0.0556) + 4(0.0185) = 1

The Standard Deviation of a Probability Distribution
The standard deviation of a probability distribution, denoted by the parameter σ, measures variability from the mean. Larger values of σ correspond to greater spread. Roughly, σ describes how far the random variable falls, on the average, from the mean of its distribution. If you want to know more, STT315 is a good class.
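The slides do not state a formula for σ. The sketch below assumes the standard definition for a discrete distribution, σ = sqrt(Σ (x − μ)²·P(x)), and applies it to the three-coin distribution from Example 1. The function names are illustrative.

```python
from math import sqrt


def mean(distribution):
    """Weighted average of the values, with weights P(x)."""
    return sum(x * p for x, p in distribution.items())


def standard_deviation(distribution):
    """Square root of the probability-weighted squared deviations from the mean."""
    mu = mean(distribution)
    variance = sum((x - mu) ** 2 * p for x, p in distribution.items())
    return sqrt(variance)


# Number of heads in three flips of a fair coin, as {value: probability}.
three_coins = {0: 1 / 8, 1: 3 / 8, 2: 3 / 8, 3: 1 / 8}

print(mean(three_coins))
print(round(standard_deviation(three_coins), 4))
```

Here the variance works out to 0.75, so the number of heads typically falls a little under one head away from the mean of 1.5.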

Continuous Random Variable
A continuous random variable has an infinite continuum of possible values in an interval. Examples are: time, age and size measures such as height and weight. Continuous variables are usually measured in a discrete manner because of rounding.

Probability Distribution of a Continuous Random Variable
A continuous random variable has possible values that form an interval. Its probability distribution is specified by a curve. Each interval has probability between 0 and 1. The interval containing all possible values has probability equal to 1.

Probability Distribution of a Continuous Random Variable
Smooth curve approximation. Figure 6.2 Probability Distribution of Commuting Time. The area under the curve for values higher than 45 is the probability that commuting time exceeds 45 minutes. Question: Identify the area under the curve that represents the probability that commuting time is less than 15 minutes, which equals 0.29.

Section 6.2 Probabilities for Bell-Shaped Distributions

Density Curves
Example: Here is a histogram of vocabulary scores of 947 seventh graders. The smooth curve drawn over the histogram is a mathematical model for the distribution.

Density Curves
An important property of a density curve is that areas under the curve correspond to relative frequencies. Note that the relative frequency of vocabulary scores ≤ 6 is roughly equal to the area under the density curve to the left of 6 (area = 0.293).

Density Curves and Normal Distribution
Density curves come in any imaginable shape. Some are well known mathematically and others aren’t.

Density Curves and Normal Distribution
Definition, pg 56 Introduction to the Practice of Statistics, Sixth Edition © 2009 W.H. Freeman and Company

Normal distributions
Normal (or Gaussian) distributions are a family of symmetrical, bell-shaped density curves defined by a mean μ (mu) and a standard deviation σ (sigma): N(μ, σ). The density is
f(x) = (1 / (σ √(2π))) e^(−(1/2)((x − μ)/σ)²)
where e = 2.71828… is the base of the natural logarithm and π = pi = 3.14159…
Commonly called the bell curve: if you were skiing down it, you would go steeper and steeper, and then it starts to flatten out. This is the equation; you don’t have to know it. Basically, for every value x (for example, gestation time), you can plug it in and get f(x), the value on the y axis. What we have done here is to go from a histogram, which is just your few data points, to this curve, which represents the density at any possible value of x, whether you have it in your data set or not.
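Plugging a value x into the normal density can be sketched in Python as follows. This assumes the standard normal density formula with mean mu and standard deviation sigma; the function name normal_pdf is illustrative.

```python
from math import exp, pi, sqrt


def normal_pdf(x, mu, sigma):
    """Height f(x) of the N(mu, sigma) density curve at x."""
    z = (x - mu) / sigma  # distance from the mean in standard deviations
    return exp(-0.5 * z * z) / (sigma * sqrt(2 * pi))


# The curve is highest at the mean and symmetric around it.
print(normal_pdf(0, 0, 1))
print(normal_pdf(-1, 0, 1), normal_pdf(1, 0, 1))
```

Note that f(x) is a curve height, not a probability; probabilities come from areas under the curve.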

Normal Distribution
The normal distribution is symmetric, bell-shaped, and characterized by its mean μ and standard deviation σ. The normal distribution is the most important distribution in statistics. Many variables have an approximately normal distribution. The normal distribution also can approximate many discrete distributions well when there are a large number of possible outcomes. Many statistical methods use it even when the data are not bell shaped.

A family of density curves
Here means are the same (μ = 15) while standard deviations are different (σ = 2, 4, and 6). Here means are different (μ = 10, 15, and 20) while standard deviations are the same (σ = 3).

Normal Distribution
Normal distributions are:
Bell shaped
Symmetric around the mean
The mean (μ) and the standard deviation (σ) completely describe the density curve. Increasing/decreasing μ moves the curve along the horizontal axis. Increasing/decreasing σ controls the spread of the curve.

Normal Distribution
Within what interval do almost all of the men’s heights fall? Women’s heights? Figure 6.4 Normal Distributions for Women’s Height and Men’s Height. For each different combination of μ and σ values, there is a normal distribution with mean μ and standard deviation σ. Question: Given that μ = 70 and σ = 4, within what interval do almost all of the men’s heights fall?

Normal Distribution: 68-95-99.7 Rule for Any Normal Curve
≈ 68% of the observations fall within one standard deviation of the mean.
≈ 95% of the observations fall within two standard deviations of the mean.
≈ 99.7% of the observations fall within three standard deviations of the mean.
Figure 6.5 The Normal Distribution. The probability equals approximately 0.68 within 1 standard deviation of the mean, approximately 0.95 within 2 standard deviations, and approximately 0.997 within 3 standard deviations. Question: How do these probabilities relate to the empirical rule?
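The three percentages in the rule can be reproduced numerically. The sketch below relies on the identity P(|Z| < k) = erf(k/√2) for a standard normal Z, which holds for any normal curve once distances are measured in standard deviations; the function name prob_within is illustrative.

```python
from math import erf, sqrt


def prob_within(k):
    """Probability that a normal variable falls within k standard deviations of its mean."""
    return erf(k / sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} SD: {prob_within(k):.4f}")
```

The rounded values 68%, 95%, and 99.7% are approximations of these areas.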

Example: 68-95-99.7% Rule
Heights of adult women can be approximated by a normal distribution, μ = 65 inches; σ = 3.5 inches.
68-95-99.7 Rule for women’s heights:
68% are between 61.5 and 68.5 inches
95% are between 58 and 72 inches
99.7% are between 54.5 and 75.5 inches

Z-Scores and the Standard Normal Distribution
The z-score for a value x of a random variable is the number of standard deviations that x falls from the mean: z = (x − μ)/σ. A negative (positive) z-score indicates that the value is below (above) the mean. Z-scores can be used to calculate the probabilities of a normal random variable using a TI calculator.

The standard normal random variable:
mean=0 and SD=1 Denoted by N(0,1) Symmetric around 0 68% are within (-1,1), 95% are within (-2,2) and 99.7% are within (-3,3)

Using TI84 to find the probability for N(0,1), the standard normal distribution
1. Press the [2nd] and [VARS] keys to get to the DISTR menu.
2. Select the 2nd choice, normalcdf, which gives you the normal distribution probabilities.
3. The format for this command is normalcdf(lowerbound, upperbound, mu, sigma).
Eg1. P(-1.96 < Z < 1.96) = normalcdf(-1.96, 1.96, 0, 1) = 0.95 for a standard normal curve.
Eg2. P(Z < 0) = normalcdf(-999, 0) = 0.5, which is the area to the left of 0 (mu and sigma default to 0 and 1 when omitted).
Exercise: Let Z be a standard normal r.v. Find: (1) Pr(Z>3)= (2) Pr(Z<5)= (3) Pr(Z<-0.31)= (4) Pr(Z>0.31)= (5) Pr(0.31<Z<2.56)= (6) Pr(-2.56<Z<-0.31)= (7) Pr(-0.31<Z<2.56)= (8) Pr(-2.56<Z<0.31)=
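For readers without a TI-84, the same areas can be computed in Python. This is a sketch that mimics the calculator command with the standard library's statistics.NormalDist; the wrapper name normalcdf and its default arguments are chosen here to mirror the calculator and are not a built-in Python function.

```python
from statistics import NormalDist


def normalcdf(lower, upper, mu=0.0, sigma=1.0):
    """Area under the N(mu, sigma) curve between lower and upper."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)


print(round(normalcdf(-1.96, 1.96), 4))  # Eg1 from the slide
print(round(normalcdf(-999, 0), 4))      # Eg2: left tail area of 0
```

The exercise items can be checked the same way, using a large bound such as 999 or -999 for an open-ended tail.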

The standard Normal distribution
Because all Normal distributions share the same properties, we can standardize our data to transform any Normal curve N(μ, σ) into the standard Normal curve N(0,1), for example N(64.5, 2.5) => N(0,1), with the horizontal axis becoming standardized height (no units). For each x we calculate a new value, z (called a z-score): z = (x − μ)/σ.

Standardizing: calculating z-scores
A z-score measures the number of standard deviations that a data value x is from the mean μ. When x is 1 standard deviation larger than the mean, then z = 1. When x is 2 standard deviations larger than the mean, then z = 2. When x is larger than the mean, z is positive. When x is smaller than the mean, z is negative.

We do this by standardizing the distributions. All this really does is redefine them, changing not the shape but the bottom axis, so that instead of being N(mu, sigma) they are N(mean = 0, sd = 1), and the bottom axis is in terms of the SD rather than the height. You get this by calculating a value z for every point x in your data set. If you were to then draw the density curve for the z values, you would get a curve with a mean of 0 and an sd of 1.

Once you have standardized, you can look up any value you want using a table. So, for instance, we knew that 68% of women were between 62 and 67 inches tall from knowing simple rules about 1, 2, 3 sd from the mean. But suppose we wanted to know the percentage of women who are less than 63 inches tall. We can’t just use those rules; we need to standardize and go to Table A (standard normal probabilities), on the green card in the book or in the back. First standardize x to get z, the number of sd from the mean: z = (63 − 64.5)/2.5 = −0.6, so x is 0.6 sd to the left of the mean (z is negative). Look for −0.6 in the left column (z), and then going across the row, under the .00 column (there are no more decimals in −0.6), you find 0.2743. About twenty-seven percent of women are shorter than 63 inches tall.
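The table lookup for women shorter than 63 inches can be cross-checked in Python. This sketch assumes the N(64.5, 2.5) height distribution used in the standardizing slides and uses statistics.NormalDist in place of Table A.

```python
from statistics import NormalDist

mu, sigma = 64.5, 2.5  # women's heights in inches (assumed from the slides)
x = 63                 # height of interest

# Standardize: number of standard deviations x lies from the mean.
z = (x - mu) / sigma

# Area to the left of z under the standard normal curve N(0, 1).
p_below = NormalDist().cdf(z)

print(z)
print(round(p_below, 4))
```

The area agrees with the roughly twenty-seven percent read from the table.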

Transformation from N(µ, σ) to N(0,1)
If X is N(μ, σ), then Z is N(0,1), where Z = (X − μ)/σ. For each x we calculate a new value, z (called a z-score). Because all Normal distributions share the same properties, we can standardize our data to transform any Normal curve N(μ, σ) into the standard Normal curve N(0,1). Such a transformation preserves the area! Example with N(65, 3.5): Z = (X − 65)/3.5. Z-score of 68.5 = 1, z-score of 61.5 = −1, z-score of 58 = −2, z-score of 72 = 2. The z-score tells us how many SDs the observation is above or below the average.

The NCAA defines a “partial qualifier”, eligible to practice and receive an athletic scholarship but not to compete, as a student whose combined SAT score is at least 720. Assume the scores follow a normal distribution with avg = 1026 and sd = 209. What proportion of all students who take the SAT would be partial qualifiers? That is, what proportion have scores between 720 and 820? About 9% of all students who take the SAT have scores between 720 and 820. Q: What percent of students score below 720, that is, what is Pr(X < 720)? (Verify the answer, 0.072.)
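Both SAT answers can be verified with a short Python sketch, assuming scores follow N(1026, 209) as stated; the variable names are illustrative.

```python
from statistics import NormalDist

sat = NormalDist(mu=1026, sigma=209)

# Proportion scoring below 720.
p_below_720 = sat.cdf(720)

# Proportion scoring between 720 and 820: difference of two left-tail areas.
p_720_to_820 = sat.cdf(820) - sat.cdf(720)

print(round(p_below_720, 3))
print(round(p_720_to_820, 3))
```

Subtracting two left-tail areas is the same operation the calculator performs in normalcdf(720, 820, 1026, 209).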

Women heights
Women’s heights follow approximately a N(μ, σ) = N(65", 3.5") distribution. What percent of women are shorter than 69 inches tall? Area = ???
mean μ = 65", standard deviation σ = 3.5", x (height) = 69"
We calculate z, the standardized value of x: z = (x − μ)/σ = (69 − 65)/3.5 ≈ 1.14.

Review: p-th percentile
The p-th percentile of a distribution is the value that has p percent of the observations fall at or below it. The slide’s figure illustrates the 90th percentile.

Inverse normal calculations for N(0,1)
We may also want to find the observed range of values that corresponds to a given proportion (area under the curve). The following instructions are for N(0,1). The percentile always refers to the area to the left. On the TI-84, press the [2nd] and [VARS] keys to get to the DISTR menu, then select the third choice, invNorm. The format for this command is invNorm(area of the left tail). EX: Find (1) the 25th percentile (2) the 55th percentile (3) the 10th percentile (4) the 90th percentile.
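The inverse calculation has a direct Python counterpart. The sketch below wraps statistics.NormalDist.inv_cdf in a helper named inv_norm, a name chosen here to echo the calculator command; the mu and sigma defaults are an assumption matching N(0,1).

```python
from statistics import NormalDist


def inv_norm(area_left, mu=0.0, sigma=1.0):
    """Value with the given area to its left under the N(mu, sigma) curve."""
    return NormalDist(mu, sigma).inv_cdf(area_left)


# Percentiles of the standard normal distribution from the exercise.
for area in (0.25, 0.55, 0.10, 0.90):
    print(f"{area:.2f} -> z = {inv_norm(area):.4f}")
```

Percentiles below the 50th come out negative and those above it positive, because the area is always measured from the left.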

Example 1: Suppose the height of a randomly selected 5-year-old child follows a normal distribution with μ = 100 cm and σ = 6 cm. (1) What’s the 90th percentile? (107.68) (2) What’s the 50th percentile? (100) (3) What’s the 10th percentile? (92.32) (4) What’s the 25th percentile? (5) What’s the 56th percentile? (95.95, 100.9)

Example 2: A soft-drink machine is regulated so that it discharges an average of 200 milliliters per cup with SD 15 milliliters. Assuming normality: (1) Find the prob. that a cup will contain more than 220 milliliters. (2) Find the prob. that a cup will contain between 180 and 230 milliliters. (3) Find the 40th percentile of the discharge amount. (4) Find the 89th percentile of the discharge amount.

Remember we started out the day using gestation time as an example. Women who are malnourished risk having premature babies, and studies are being done to see whether different diets and vitamin supplements work better. Let’s say the goal is to get them to carry the baby at least 240 days (8 months). For treatment 1, say vitamins only, we get a normal distribution with mean 250 and sd 20. Treatment 2 is vitamins plus a meals on wheels program; the mean is 266 and the sd is 15. You can eyeball this and see that more of the women in treatment 2 are above our goal of 240 days, but how much of an improvement is it? The mean is increased, but the spread has changed too. Let’s standardize to get the proportion of women below 240 in each distribution. Go through the calculation: the bottom line is that adding food to vitamins resulted in the proportion of women with gestation times of less than 240 days going from 30.85% to only 4.18%. You see figures like this in news stories all the time. If you went to the primary (medical) literature, you would see that they had to go through this rigamarole to get you that tidy summary.
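The gestation comparison can be reproduced with a short Python sketch, assuming the two treatment distributions are N(250, 20) and N(266, 15) as described. The dictionary keys are illustrative labels. Because the code uses the exact z-score rather than one rounded to two decimals for a table lookup, the second proportion may come out slightly below the quoted 4.18%.

```python
from statistics import NormalDist

goal_days = 240  # target minimum gestation time

# Gestation time distributions for the two treatments (days).
treatments = {
    "vitamins only": NormalDist(mu=250, sigma=20),
    "vitamins plus meals": NormalDist(mu=266, sigma=15),
}

# Proportion of women whose gestation time falls below the goal.
below_goal = {name: dist.cdf(goal_days) for name, dist in treatments.items()}

for name, proportion in below_goal.items():
    print(f"{name}: {proportion:.2%} below {goal_days} days")
```

The same approach answers the soft-drink questions by swapping in NormalDist(mu=200, sigma=15) and the relevant cut-offs.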
