Chapter 6 Probability Distributions


Chapter 6 Probability Distributions Section 6.1 Summarizing Possible Outcomes and Their Probabilities

Randomness A random variable is a numerical measurement of the outcome of a random phenomenon. E.g.: the gender of the next consumer, the party affiliation of the next voter. Often, the randomness results from selecting a random sample from a population or performing a randomized experiment.

Random Variable Use lowercase letters near the end of the alphabet, such as x, to symbolize a particular value of the random variable. Use a capital letter, such as X, to refer to the random variable itself. Example: Flip a coin three times. X = number of heads in the 3 flips defines the random variable; x = 2 represents a possible value of the random variable.

Probability Distribution The probability distribution of a random variable specifies its possible values and their probabilities. Note: It is the randomness of the variable that allows us to specify probabilities for the outcomes.

Probability Distribution of a Discrete Random Variable A discrete random variable X takes a set of separate values (such as 0,1,2,…) as its possible outcomes. Its probability distribution assigns a probability P(x) to each possible value x: For each x, the probability P(x) falls between 0 and 1. The sum of the probabilities for all the possible x values equals 1.

Example 1: # of Heads when flipping three fair coins What is the sample space S? What are the corresponding probabilities? Now let X = # of the heads obtained. What are the possible values for X? What are the corresponding probabilities for X? Summarize into a probability distribution table. Now X is called a random variable (r.v.). Is X a discrete or continuous r.v.?
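The questions in Example 1 can be checked by brute-force enumeration. The following is a minimal sketch, assuming the three flips are independent and fair so that all eight sequences are equally likely; the variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space S: all 2**3 = 8 equally likely sequences of H and T.
sample_space = ["".join(flips) for flips in product("HT", repeat=3)]

# X = number of heads in each outcome; count how many outcomes give each value.
counts = Counter(outcome.count("H") for outcome in sample_space)

# P(x) = (number of outcomes with x heads) / (number of outcomes in S).
distribution = {
    x: Fraction(n, len(sample_space)) for x, n in sorted(counts.items())
}

print(sample_space)
for x, p in distribution.items():
    print(f"P(X = {x}) = {p}")
```

Because X can only take the separate values 0, 1, 2, and 3, it is a discrete random variable.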

Example 2: Number of Home Runs in a Game What is the estimated probability of at least three home runs? X = # of home runs

Table 6.1 Probability Distribution of Number of Home Runs in a Game for San Francisco Giants

Number of home runs   Probability
0                     0.3889
1                     0.3148
2                     0.2222
3                     0.0556
4                     0.0185
5 or more             0.0000

Example 2: Number of Home Runs in a Game Table 6.1 Probability Distribution of Number of Home Runs in a Game for San Francisco Giants The probability of at least three home runs in a game is P(X≥3)=P(3)+P(4)+P(5 or more)= 0.0556 + 0.0185 + 0 = 0.0741

Mean of a Discrete Probability Distribution The mean of a probability distribution for a discrete random variable is μ = Σ x·P(x), where the sum is taken over all possible values of x. The mean of a probability distribution is denoted by the parameter μ. The mean is a weighted average; values of x that are more likely receive greater weight P(x). E.g.: if you toss a fair coin twice, how many heads would you expect to see?

# heads   Prob   Product
0         1/4    0*1/4 = 0
1         1/2    1*1/2 = 1/2
2         1/4    2*1/4 = 1/2

μ = 0 + (1/2) + (1/2) = 1

Expected Value of X The mean of a probability distribution of a random variable X is also called the expected value of X. The expected value reflects not what we’ll observe in a single observation, but rather what we expect for the average in a long run of observations. Note: It is not unusual for the expected value of a random variable to equal a number that is NOT a possible outcome. Eg1: In the previous example, the expected number of heads in two tosses of a fair coin is 1. Eg2: The expected number of dots for the toss of a fair die is 3.5. (Verify.)
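The "(Verify.)" prompt can be worked through numerically. This is a small sketch, assuming a fair six-sided die and two independent fair coin tosses; `expected_value` is a hypothetical helper name, not something from the slides.

```python
from fractions import Fraction

def expected_value(distribution):
    """Return the sum of x * P(x) over a {value: probability} mapping."""
    return sum(x * p for x, p in distribution.items())

# Eg1: number of heads in two tosses of a fair coin.
two_tosses = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

# Eg2: number of dots on one toss of a fair die.
fair_die = {face: Fraction(1, 6) for face in range(1, 7)}

print(expected_value(two_tosses))       # 1
print(float(expected_value(fair_die)))  # 3.5
```

The die result, 3.5, is not a face of the die, which illustrates the note that an expected value need not be a possible outcome.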

Example: Number of Home Runs in a Game Find the mean of this probability distribution. Table 6.1 Probability Distribution of Number of Home Runs in a Game for San Francisco Giants

Example 1: # of Heads obtained when flipping three fair coins

X      3     2     1     0
Prob   1/8   3/8   3/8   1/8

The mean: μ = (3)(1/8) + (2)(3/8) + (1)(3/8) + (0)(1/8) = 1.5

Example 2: Number of Home Runs in a Game The mean: μ = 0 * P(0) + 1 * P(1) + 2 * P(2) + 3 * P(3) + 4 * P(4) = 0(0.3889) + 1(0.3148) + 2(0.2222) + 3(0.0556) + 4(0.0185) = 1
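The home-run arithmetic can be reproduced in a few lines. This sketch uses the probabilities quoted in the mean calculation above and treats "5 or more" as probability 0, as the slides do; the variable names are illustrative.

```python
# Probabilities for 0 to 4 home runs, as quoted in the mean calculation.
home_runs = {0: 0.3889, 1: 0.3148, 2: 0.2222, 3: 0.0556, 4: 0.0185}

# A valid probability distribution must sum to 1 (up to rounding).
total = sum(home_runs.values())

# P(X >= 3) = P(3) + P(4) + P(5 or more), with P(5 or more) taken as 0.
p_at_least_3 = sum(p for x, p in home_runs.items() if x >= 3)

# Mean = sum of x * P(x) over all possible values of x.
mean = sum(x * p for x, p in home_runs.items())

print(f"total probability = {total:.4f}")
print(f"P(X >= 3) = {p_at_least_3:.4f}")
print(f"mean = {mean:.4f}")
```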

The Standard Deviation of a Probability Distribution The standard deviation of a probability distribution, denoted by the parameter σ, measures variability from the mean. Larger values of σ correspond to greater spread. Roughly, σ describes how far the random variable falls, on the average, from the mean of its distribution. If you want to know more, STT315 is a good class.
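The slides do not give a formula for σ. For a discrete random variable, the standard definition is σ = sqrt(Σ (x − μ)²·P(x)). The sketch below applies that definition to the two running examples; the formula and the helper name `mean_and_sd` are additions, not part of the original slides.

```python
import math

def mean_and_sd(distribution):
    """Return (mu, sigma) for a {value: probability} mapping."""
    mu = sum(x * p for x, p in distribution.items())
    # Variance is the probability-weighted average squared distance from mu.
    variance = sum((x - mu) ** 2 * p for x, p in distribution.items())
    return mu, math.sqrt(variance)

three_coins = {0: 1 / 8, 1: 3 / 8, 2: 3 / 8, 3: 1 / 8}
home_runs = {0: 0.3889, 1: 0.3148, 2: 0.2222, 3: 0.0556, 4: 0.0185}

for name, dist in (("three coins", three_coins), ("home runs", home_runs)):
    mu, sigma = mean_and_sd(dist)
    print(f"{name}: mean = {mu:.4f}, sd = {sigma:.4f}")
```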

Continuous Random Variable A continuous random variable has an infinite continuum of possible values in an interval. Examples are: time, age and size measures such as height and weight. Continuous variables are usually measured in a discrete manner because of rounding.

Probability Distribution of a Continuous Random Variable A continuous random variable has possible values that form an interval. Its probability distribution is specified by a curve. Each interval has probability between 0 and 1. The interval containing all possible values has probability equal to 1.

Probability Distribution of a Continuous Random Variable Smooth curve approximation Figure 6.2 Probability Distribution of Commuting Time. The area under the curve for values higher than 45 is 0.15. Question: Identify the area under the curve represented by the probability that commuting time is less than 15 minutes, which equals 0.29.

Chapter 6 Probability Distributions Section 6.2 Probabilities for Bell-Shaped Distributions

Density Curves Example: Here is a histogram of vocabulary scores of 947 seventh graders. The smooth curve drawn over the histogram is a mathematical model for the distribution.

Density Curves An important property of a density curve is that areas under the curve correspond to relative frequencies. In the figure, relative frequency = .303 and area = .293. Note that the relative frequency of vocabulary scores <= 6 is roughly equal to the area under the density curve for scores <= 6.

Density Curves and Normal Distribution Density curves come in any imaginable shape. Some are well known mathematically and others aren’t.

Density Curves and Normal Distribution Definition, pg 56 Introduction to the Practice of Statistics, Sixth Edition © 2009 W.H. Freeman and Company

Normal distributions Normal (or Gaussian) distributions are a family of symmetrical, bell-shaped density curves defined by a mean μ (mu) and a standard deviation σ (sigma): N(μ, σ). They are commonly called the bell curve: if you were skiing down it, you would go steeper and steeper, and then it would start to flatten out. This is the equation (you don’t have to know it): f(x) = (1 / (σ√(2π))) · e^(−(x − μ)² / (2σ²)), where e = 2.71828… is the base of the natural logarithm and π = pi = 3.14159…. Basically, for every value x, such as a gestation time, you can plug it in and get f(x), the value on the y axis. What we have done here is to go from a histogram, which is just your few data points, to this curve, which is a representation of what values you would get for any possible value of x, whether you have it in your data set or not.
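To make "plug in x and get f(x)" concrete, here is a minimal sketch of the normal density written out by hand and cross-checked against Python's standard library. The function name `normal_pdf` and the example parameters are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Height of the N(mu, sigma) density curve at x."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)

# Illustrative curve only: N(64.5, 2.5).
mu, sigma = 64.5, 2.5
for x in (62.0, 64.5, 67.0):
    by_hand = normal_pdf(x, mu, sigma)
    by_library = NormalDist(mu, sigma).pdf(x)
    print(f"x = {x}: f(x) = {by_hand:.6f} (library: {by_library:.6f})")
```

The heights at 62.0 and 67.0 agree because the curve is symmetric about the mean.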

Normal Distribution The normal distribution is symmetric, bell-shaped, and characterized by its mean μ and standard deviation σ. The normal distribution is the most important distribution in statistics. Many variables have an approximately normal distribution. The normal distribution can also approximate many discrete distributions well when there are a large number of possible outcomes. Many statistical methods use it even when the data are not bell-shaped.

A family of density curves Here means are the same (μ = 15) while standard deviations are different (σ = 2, 4, and 6). Here means are different (μ = 10, 15, and 20) while standard deviations are the same (σ = 3).

Normal Distribution Normal distributions are bell-shaped and symmetric around the mean. The mean (μ) and the standard deviation (σ) completely describe the density curve. Increasing/decreasing μ moves the curve along the horizontal axis. Increasing/decreasing σ controls the spread of the curve.

Normal Distribution Within what interval do almost all of the men’s heights fall? Women’s heights? Figure 6.4 Normal Distributions for Women’s Height and Men’s Height. For each different combination of μ and σ values, there is a normal distribution with mean μ and standard deviation σ. Question: Given that μ = 70 and σ = 4, within what interval do almost all of the men’s heights fall?

Normal Distribution: 68-95-99.7 Rule for Any Normal Curve ≈ 68% of the observations fall within one standard deviation of the mean. ≈ 95% of the observations fall within two standard deviations of the mean. ≈ 99.7% of the observations fall within three standard deviations of the mean. Figure 6.5 The Normal Distribution. The probability equals approximately 0.68 within 1 standard deviation of the mean, approximately 0.95 within 2 standard deviations, and approximately 0.997 within 3 standard deviations. Question: How do these probabilities relate to the empirical rule?

Example: 68-95-99.7% Rule Heights of adult women can be approximated by a normal distribution with μ = 65 inches and σ = 3.5 inches. 68-95-99.7 Rule for women’s heights: 68% are between 61.5 and 68.5 inches; 95% are between 58 and 72 inches; 99.7% are between 54.5 and 75.5 inches.
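The 68-95-99.7 rule can be checked against exact normal areas. The sketch below assumes women's heights follow N(65, 3.5), the values implied by the intervals above; `prob_within` is a hypothetical helper.

```python
from statistics import NormalDist

heights = NormalDist(mu=65, sigma=3.5)  # women's heights, in inches

def prob_within(dist, k):
    """Probability of falling within k standard deviations of the mean."""
    low = dist.mean - k * dist.stdev
    high = dist.mean + k * dist.stdev
    return dist.cdf(high) - dist.cdf(low)

for k in (1, 2, 3):
    low = heights.mean - k * heights.stdev
    high = heights.mean + k * heights.stdev
    print(f"within {k} sd ({low} to {high} inches): {prob_within(heights, k):.4f}")
```

The same three probabilities come out for any choice of mean and standard deviation, which is why the rule holds for any normal curve.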

Z-Scores and the Standard Normal Distribution The z-score for a value x of a random variable is the number of standard deviations that x falls from the mean. A negative (positive) z-score indicates that the value is below (above) the mean. Z-scores can be used to calculate probabilities for a normal random variable using a TI calculator.

The standard normal random variable: mean=0 and SD=1 Denoted by N(0,1) Symmetric around 0 68% are within (-1,1), 95% are within (-2,2) and 99.7% are within (-3,3)

Using the TI-84 to find probabilities for N(0,1), the standard normal distribution 1. Press the [2nd] + [VARS] keys to get to the DISTR menu. 2. Select the 2nd choice, normalcdf, which gives normal distribution probabilities. 3. The format for this command is normalcdf(lowerbound, upperbound, mu, sigma). Eg1. P(-1.96 < Z < 1.96) = normalcdf(-1.96, 1.96, 0, 1) = 0.95 for a standard normal curve. Eg2. P(Z < 0) = normalcdf(-999, 0) = 0.5, which is the area to the left of 0; here -999 stands in for negative infinity, and mu and sigma default to 0 and 1 when omitted. Exercise: Let Z be a standard normal r.v. Find (1) Pr(Z>3)= (2) Pr(Z<5)= (3) Pr(Z<-0.31)= (4) Pr(Z>0.31)= (5) Pr(0.31<Z<2.56)= (6) Pr(-2.56<Z<-0.31)= (7) Pr(-0.31<Z<2.56)= (8) Pr(-2.56<Z<0.31)=
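For readers without a TI-84, the same areas can be computed in software. The following sketch defines a Python function that mirrors the calculator's argument order; the name `normalcdf` is borrowed from the calculator command for readability and is a hypothetical helper, not a Python built-in.

```python
from statistics import NormalDist

INF = float("inf")

def normalcdf(lower, upper, mu=0.0, sigma=1.0):
    """Area under the N(mu, sigma) curve between lower and upper."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)

# The two worked examples from the slide.
print(f"P(-1.96 < Z < 1.96) = {normalcdf(-1.96, 1.96):.4f}")
print(f"P(Z < 0) = {normalcdf(-INF, 0):.4f}")

# The eight exercise probabilities, each given as (lower, upper) bounds.
exercises = [
    (3, INF), (-INF, 5), (-INF, -0.31), (0.31, INF),
    (0.31, 2.56), (-2.56, -0.31), (-0.31, 2.56), (-2.56, 0.31),
]
for number, (lower, upper) in enumerate(exercises, start=1):
    print(f"({number}) P({lower} < Z < {upper}) = {normalcdf(lower, upper):.4f}")
```

By symmetry, exercises (3) and (4) should agree, as should (5) and (6), and (7) and (8); that makes a useful sanity check on calculator entries.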

The standard Normal distribution Because all Normal distributions share the same properties, we can standardize our data to transform any Normal curve N(μ,σ) into the standard Normal curve N(0,1). N(64.5, 2.5) => N(0,1), with the horizontal axis becoming standardized height (no units). For each x we calculate a new value, z (called a z-score): z = (x − μ)/σ.

Standardizing: calculating z-scores A z-score measures the number of standard deviations that a data value x is from the mean μ: z = (x − μ)/σ. When x is 1 standard deviation larger than the mean, then z = 1. When x is 2 standard deviations larger than the mean, then z = 2. When x is larger than the mean, z is positive. When x is smaller than the mean, z is negative.

We do this by standardizing the distributions. Really, all this does is redefine them: the shape does not change, only the bottom axis, so that instead of being N(mu, sigma) they are N(mean = 0, sd = 1), and the bottom axis is in terms of the SD rather than the height. You get this by calculating a value z for every point x in your data set. If you were to then draw the density curve for the z values, you would get a curve with a mean of 0 and an sd of 1. Once you have standardized, you can look up any value you want using a table. So, for instance, with women's heights N(64.5, 2.5), we knew that 68% of women were between 62 and 67 inches tall from the simple rules about 1, 2, and 3 sd from the mean. But suppose we wanted to know the percentage of women who are less than 63 inches tall. We can’t just use those rules; we need to standardize and go to Table A (standard normal probabilities), on the green card in the book or in the back. First standardize x to get z, the number of sd from the mean: z = (63 − 64.5)/2.5 = −0.6, that is, 0.6 sd to the left of the mean (negative). Look for −0.6 in the left column (z), and then, going across the row, under the .00 column (no more decimals on −0.6), you find .2743. About twenty-seven percent of women are shorter than 63 inches tall.
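The table lookup described in these notes can be reproduced in software. This is a sketch under the notes' assumption that women's heights follow N(64.5, 2.5); it standardizes x = 63 and then reads the left-tail area from the standard normal curve.

```python
from statistics import NormalDist

mu, sigma = 64.5, 2.5  # women's heights in inches, as assumed in the notes
x = 63.0

# Step 1: standardize x to get z, the number of sd from the mean.
z = (x - mu) / sigma

# Step 2: the Table A entry for z is the area to the left of z under N(0,1).
area_left = NormalDist().cdf(z)

print(f"z = {z:.2f}")
print(f"proportion shorter than {x} inches = {area_left:.4f}")
```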

Transformation from N(µ, σ) to N(0,1) If X is N(µ,σ), then Z is N(0,1), where Z=(X-µ)/σ. For each x we calculate a new value, z (called a z-score). Because all Normal distributions share the same properties, we can standardize our data to transform any Normal curve N(µ,σ) into the standard Normal curve N(0,1). Such a transformation keeps the area! Example: Z=(X-65)/3.5. Z-score of 68.5 = 1, Z-score of 61.5 = -1, Z-score of 58 = -2, Z-score of 72 = 2. The Z-score tells us how many SD the observation is above or below the average.
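The claim that standardizing "keeps the area" can be checked directly. The sketch below uses the N(65, 3.5) example from this slide; `z_score` is a hypothetical helper name.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) mu."""
    return (x - mu) / sigma

mu, sigma = 65, 3.5
original = NormalDist(mu, sigma)
standard = NormalDist()  # N(0, 1)

for x in (58, 61.5, 68.5, 72):
    z = z_score(x, mu, sigma)
    # Area to the left of x under N(65, 3.5) versus area to the left of z under N(0, 1).
    print(f"x = {x}: z = {z:+.1f}, "
          f"area under N(65, 3.5) = {original.cdf(x):.4f}, "
          f"area under N(0, 1) = {standard.cdf(z):.4f}")
```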

The NCAA defines a “partial qualifier”, eligible to practice and receive an athletic scholarship but not to compete, as a student whose combined SAT score is at least 720. Assume the scores follow a normal distribution with mean = 1026 and sd = 209. What proportion of all students who take the SAT would be partial qualifiers? That is, what proportion have scores between 720 and 820? About 9% of all students who take the SAT have scores between 720 and 820. Q: What percentage of the students score below 720, that is, what is Pr(X<720)? (Verify the answer, 0.072.)
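Both SAT answers can be verified numerically. This sketch takes the slide's assumption that combined scores follow N(1026, 209) at face value.

```python
from statistics import NormalDist

sat = NormalDist(mu=1026, sigma=209)

# Proportion with scores between 720 and 820 (the partial qualifiers).
p_between = sat.cdf(820) - sat.cdf(720)

# Proportion with scores below 720.
p_below_720 = sat.cdf(720)

print(f"P(720 < X < 820) = {p_between:.4f}")
print(f"P(X < 720) = {p_below_720:.4f}")
```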

Women's heights Approximately N(µ, σ) = N(65", 3.5"). What percent of women are shorter than 68 inches tall? Figure labels: mean µ = 65" (z = 0), standard deviation σ = 3.5", x (height) = 69" (z = ?), area = ???. We calculate z, the standardized value of x: z = (x − µ)/σ.
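The missing z and area on this slide can be filled in by computation. Note that the slide text asks about 68 inches while the figure labels show x = 69 inches, so this sketch evaluates both rather than choosing one; the N(65, 3.5) model is taken from the slide, and `percent_shorter_than` is a hypothetical helper.

```python
from statistics import NormalDist

heights = NormalDist(mu=65, sigma=3.5)  # women's heights, in inches

def percent_shorter_than(x, dist):
    """Return (z, percent) for the area to the left of x under dist."""
    z = (x - dist.mean) / dist.stdev
    return z, 100 * dist.cdf(x)

for x in (68, 69):
    z, percent = percent_shorter_than(x, heights)
    print(f"x = {x} inches: z = {z:.2f}, {percent:.1f}% of women are shorter")
```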

Review: p-th percentile The p-th percentile of a distribution is the value such that p percent of the observations fall at or below it. Example: the 90th percentile.

Inverse normal calculations for N(0,1) We may also want to find the observed range of values that corresponds to a given proportion/area under the curve. The following instructions are for N(0,1). The percentile always refers to the area to the left. On the TI-84, press the [2nd] and [VARS] keys to get to the DISTR menu; select the third choice, invNorm. The format for this command is invNorm(area of the left tail). EX: Find (1) the 25th percentile, (2) the 55th percentile, (3) the 10th percentile, (4) the 90th percentile.
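The four percentile exercises can also be worked without a calculator. In this sketch, `NormalDist.inv_cdf` from Python's standard library plays the role of invNorm: it takes the left-tail area and returns the z-value; the variable names are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # N(0, 1)

# Percentile p corresponds to a left-tail area of p / 100.
percentiles = {p: standard_normal.inv_cdf(p / 100) for p in (25, 55, 10, 90)}

for p, z in percentiles.items():
    print(f"{p}th percentile of N(0,1): z = {z:.4f}")
```

By symmetry, the 10th and 90th percentiles are the same distance from 0 on opposite sides.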

Example 1: Suppose the height of a randomly selected 5-year-old child follows a normal distribution with μ = 100 cm and σ = 6 cm. (1) What’s the 90th percentile? (107.68) (2) What’s the 50th percentile? (100) (3) What’s the 10th percentile? (92.32) (4) What’s the 25th percentile? (5) What’s the 56th percentile? (96.4, 100.9)

Example 2: A soft-drink machine is regulated so that it discharges an average of 200 milliliters per cup with SD 15 milliliters. Assuming normality, (1) find the probability that a cup will contain more than 220 milliliters; (2) find the probability that a cup will contain between 180 and 230 milliliters; (3) find the 40th percentile of the discharge amount; (4) find the 89th percentile of the discharge amount.

Remember we started out the day using gestation time as an example. Women who are malnourished risk having premature babies, and studies are being done to see whether different diet and vitamin supplements work better. Let’s say the goal is to get them to carry the baby at least 240 days (8 months). For treatment 1, say vitamins only, we get a normal distribution with mean 250 and sd 20. Treatment 2 is vitamins plus a meals-on-wheels program; the mean is 266 and the sd is 15. You can eyeball this and see that more of the women in treatment 2 are above our goal of 240 days, but how much of an improvement is it? The mean has increased, but the spread has changed too. Let’s standardize to get the proportion of women below 240 days in each distribution. Going through the calculation, the bottom line is that adding food to vitamins resulted in the proportion of women with gestation times of less than 240 days going from 30.85% to only 4.18%. You see figures like this in news stories all the time. If you went to the primary (medical) literature, you would see that they had to go through this rigamarole to get you that tidy summary.
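The three closing examples can be worked with the same two operations: a left-tail area (cdf) and its inverse (inv_cdf). This is a sketch under the normality assumptions stated in each example. The slide's quoted figures appear to come from a z-table rounded to two decimals, so software results may differ slightly in the last digits.

```python
from statistics import NormalDist

# Example 1: heights of 5-year-olds, assumed N(100 cm, 6 cm).
child_height = NormalDist(mu=100, sigma=6)
child_percentiles = {
    p: child_height.inv_cdf(p / 100) for p in (90, 50, 10, 25, 56)
}

# Example 2: soft-drink discharge, assumed N(200 ml, 15 ml).
discharge = NormalDist(mu=200, sigma=15)
p_more_than_220 = 1 - discharge.cdf(220)
p_between_180_230 = discharge.cdf(230) - discharge.cdf(180)
percentile_40 = discharge.inv_cdf(0.40)
percentile_89 = discharge.inv_cdf(0.89)

# Gestation example: proportion below the 240-day goal under each treatment.
p_short_treatment_1 = NormalDist(mu=250, sigma=20).cdf(240)
p_short_treatment_2 = NormalDist(mu=266, sigma=15).cdf(240)

for p, value in child_percentiles.items():
    print(f"{p}th percentile of child height: {value:.2f} cm")
print(f"P(discharge > 220) = {p_more_than_220:.4f}")
print(f"P(180 < discharge < 230) = {p_between_180_230:.4f}")
print(f"40th percentile = {percentile_40:.1f} ml, 89th = {percentile_89:.1f} ml")
print(f"below 240 days: treatment 1 = {p_short_treatment_1:.4f}, "
      f"treatment 2 = {p_short_treatment_2:.4f}")
```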