Discrete Probability Distributions


1 Discrete Probability Distributions

2 Discrete vs. Continuous
Discrete: A random variable (RV) that can take only certain values along an interval:
Cars passing by a point
Results of a coin toss
Students taking a class
Continuous: An RV that can take on any value at any point along an interval:
Temperature, time, distance, money, etc.
This is something we’ve discussed before. Basically, if you can count the number of times something happens, it’s discrete. Continuous variables can be broken down into smaller and smaller parts, making them virtually impossible to count. Money is the exception: it can be counted, but we consider it to be a continuous RV. We will deal with discrete RVs in this lesson; most of the rest of the class will deal with continuous RVs.

3 Frequency Distribution
Number of times an observation occurs in a given population.
Example: 1,000 coin tosses, repeated 500 times, counting the number of heads in each set of 1,000 tosses.
If I did this once (1,000 tosses), how many heads do you think I would get?
Take a minute to explain what a frequency distribution is.
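The coin-toss experiment described on this slide can be simulated. The following is a minimal sketch, assuming a fair coin and Python's standard pseudo-random generator; the fixed seed and the variable names are illustrative choices, not part of the original lesson.

```python
import random
from collections import Counter

random.seed(0)  # fixed seed so the run is repeatable (illustrative assumption)

N_EXPERIMENTS = 500   # how many times the experiment is repeated
N_TOSSES = 1000       # tosses per experiment

# Number of heads in each experiment; each toss is heads with probability 0.5.
heads_counts = [
    sum(random.random() < 0.5 for _ in range(N_TOSSES))
    for _ in range(N_EXPERIMENTS)
]

# Frequency distribution: how many experiments produced each head count.
frequency = Counter(heads_counts)

# Show the five smallest head counts observed and how often each occurred.
for heads in sorted(frequency)[:5]:
    print(heads, frequency[heads])
```

Most of the 500 head counts should cluster around 500, which is the intuition the slide's question is after.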

4 What is a variable?
Variable: a symbol (A, B, x, y, etc.) that can take on any of a specific set of values
X = number of heads
Y = temperature
Random variable: the outcome of a statistical experiment
Understanding the notation will help you to understand the problems. Sometimes, I get lazy and don’t use the complete notation.

5 Random variable notation
Capital letter represents the RV
X = total number of heads in 4 tosses
P(X) represents the probability of X
Lower-case letter represents one of the values of the RV
P(X=x) is the probability the RV will assume a specific value
P(X=2) is the probability that we will have exactly 2 heads in the 4 tosses
So, in 4 tosses, I can get 16 results. How many of those 16 include having exactly 2 heads? We can probably figure this out by listing all 16. But what happens if I toss the coin 100 times? We’ll get to that in a bit. First…
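Listing all 16 results by hand is tedious, so here is a small sketch that enumerates them and counts the ones with exactly 2 heads. It assumes a fair coin, so every sequence is equally likely; the variable names are illustrative.

```python
from itertools import product

# All possible results of 4 tosses: 2^4 = 16 sequences such as ('H', 'T', 'H', 'H').
outcomes = list(product("HT", repeat=4))

# Keep the sequences with exactly 2 heads.
two_heads = [o for o in outcomes if o.count("H") == 2]

p_two_heads = len(two_heads) / len(outcomes)
print(len(outcomes), len(two_heads), p_two_heads)  # 16 6 0.375
```

So P(X=2) = 6/16 = 0.375 for 4 tosses.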

6 Probability Distribution
Relative frequency distribution that should, theoretically, occur for observations from a given population.
Outcome   # heads   Probability
HH        2         .25
HT        1         .25
TH        1         .25
TT        0         .25
X    P(X)
0    0.25
1    0.50
2    0.25
An example will make clear the relationship between random variables and probability distributions. Suppose you flip a coin 2 times. There are four possible outcomes (n^k: 2^2=4): HH, HT, TH, and TT. Let the variable X represent the number of heads that result from this experiment. The variable X can take on the values 0, 1, or 2. In this example, X is a random variable because its value is determined by the outcome of a statistical experiment.
A probability distribution is a table or an equation that links each outcome of a statistical experiment with its probability of occurrence. These are individual probabilities. Construct the probability distribution table.

7 Cumulative probability distribution
Probability that the value of an RV falls within a specified range.
Coin toss: P(X≤1)
# Heads   P(X=x)   P(X≤x)
0         0.25     0.25
1         0.50     0.75
2         0.25     1.00
Return to the coin flip experiment. If we flip a coin two times, we might ask: What is the probability that the coin flips would result in one or fewer heads? The answer would be a cumulative probability. It would be the probability that the coin flip experiment results in zero heads plus the probability that the experiment results in one head.
P(X=x) is individual; P(X<=x) is cumulative. What is P(X<=1)? 0.75 (demonstrate)
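The individual and cumulative columns for the two-toss experiment can be built by enumeration. This is a sketch under the assumption of a fair coin; `pmf` and `cdf` are illustrative names for the P(X=x) and P(X≤x) columns.

```python
from collections import Counter
from itertools import accumulate, product

# Enumerate the 4 equally likely outcomes of 2 tosses and count heads in each.
outcomes = list(product("HT", repeat=2))
heads = Counter(o.count("H") for o in outcomes)

values = sorted(heads)                              # [0, 1, 2]
pmf = [heads[x] / len(outcomes) for x in values]    # individual: P(X = x)
cdf = list(accumulate(pmf))                         # cumulative: P(X <= x)

for x, p, c in zip(values, pmf, cdf):
    print(x, p, c)
```

The second row printed gives P(X≤1) = 0.75, matching the slide.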

8 Characteristics of a Discrete Probability Distribution
For any value of x, 0 ≤ P(x) ≤ 1
The values of x are exhaustive; i.e., the distribution contains all the possible values
The values of x are mutually exclusive; i.e., only one value can occur for an experiment
The sum of the probabilities equals 1
Pretty much what we said about probability all along, with the addition of the “values of x” part.

9 Mean and standard deviation
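Mean of a discrete distribution is called the expected value: $\mu = E(X) = \sum x \, P(x)$
Variance: $\sigma^2 = \sum (x - \mu)^2 \, P(x)$
Standard deviation: $\sigma = \sqrt{\sigma^2}$
Expected value: demonstrate using dice example
I don’t know of a way to get Excel to give you expected values.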
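As a sketch of the dice example, the mean, variance, and standard deviation of a single roll can be computed directly from the definitions. A fair six-sided die is assumed; the variable names are illustrative.

```python
from math import sqrt

# Fair six-sided die: each face 1..6 has probability 1/6.
faces = range(1, 7)
p = 1 / 6

mean = sum(x * p for x in faces)                      # expected value
variance = sum((x - mean) ** 2 * p for x in faces)    # probability-weighted squared deviations
std_dev = sqrt(variance)

print(round(mean, 4), round(variance, 4), round(std_dev, 4))  # 3.5 2.9167 1.7078
```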

10 Practice
Determine the Mean (µ) or Expected Value (E(x)) for the following data.
X      0     1     2
P(x)   0.6   0.3   0.1
E(x) = .6×0 + .3×1 + .1×2 = .5

11 Practice
A music shop is holding a promotion in which the customer rolls a die and deducts from the price of a CD a number of dollars equal to the number rolled. If the owner pays $5.00 for each disk and prices them at $9.00, what will his expected profit be on each CD during this promotion?
We already computed the EV of a die roll (3.5), so the expected revenue from each sale would be 9 - 3.5 = $5.50. Subtracting the $5.00 cost, the expected profit is 5.50 - 5.00 = $0.50 per CD.
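The same answer can be checked by computing the profit for each possible roll and averaging. This is a minimal sketch assuming a fair die; the constant names are illustrative.

```python
PRICE = 9.00   # listed price of a CD
COST = 5.00    # what the owner pays per CD

# Profit for each possible roll: price minus the discount (the roll) minus cost.
profits = [PRICE - roll - COST for roll in range(1, 7)]

# Each roll has probability 1/6, so the expected profit is the plain average.
expected_profit = sum(profits) / len(profits)
print(expected_profit)  # 0.5
```

Note that two of the six rolls (5 and 6) produce a loss, even though the expected profit is positive.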

12 Binomial Distributions
There are 2 or more identical trials
In each trial, there can be only 2 outcomes (success or failure)
Trials are statistically independent: the outcome of one trial does not influence the outcome of the next
Probability of success remains the same from one trial to the next
Also known as a Bernoulli process
Be careful labeling success and failure. If we’re tracking the probability of a machine or part failing, the failure would be a success. Sometimes, dead is a success.
Examples: Yes or no; above freezing or below freezing; red light or green light. Deck of cards: 10 or not a 10? Buys a house or doesn’t buy a house? Has red hair or does not have red hair (SRS of all males under 30)?

13 Binomial Experiment?
An article in a 1988 issue of The New England Journal of Medicine talked about a TB outbreak. One person caught the disease in 1995
232 workers sampled from a very large population were given a TB test
The number of workers testing positive is the variable of interest
If we test all 232 workers for the disease, is this a binomial experiment?
There are 2 or more identical trials: yes, 232 trials
In each trial, there can be only 2 outcomes (success or failure): success = has TB (testing positive)
Trials are statistically independent: one person testing positive does not affect the next person’s results
Probability of success remains the same from one trial to the next: yes, but we don’t know what that is

14 Binomial Experiment?
Bill has to sell 3 cars to meet his monthly quota. He has 5 customers, but 3 of them are interested in the same car and will leave if that car is sold. He has a 30% chance of a sale with each customer. Is this a binomial experiment?
There are 2 or more identical trials: yes, 5 trials
In each trial, there can be only 2 outcomes (success or failure): success = sells car
Trials are statistically independent: NO, one person buying the car affects the next person’s results
Probability of success remains the same from one trial to the next: NO, .3 if not purchased, 0 if already bought

15 Binomial Distributions
Probability of exactly x successes in n trials:
$P(X = x) = \binom{n}{x} \, \pi^{x} (1 - \pi)^{n - x}$
Where:
π = probability of success for any trial
n = number of trials
x = number of successes
(n - x) = number of failures
Note on notation: The book uses the pi symbol for probability. You may also see p.
Suppose a die is tossed 5 times. What is the probability of getting exactly 2 fours?
Solution: This is a binomial experiment in which the number of trials is equal to 5, the number of successes is equal to 2, and the probability of success on a single trial is 1/6 or about 0.167.
Look at the first part of this equation. It’s the “n choose r” formula you learned in counting rules. This is how it’s applied. We first determine how many ways we can get a success. Then we determine the probability of success and failure…
P(X=2) = 5c2 * (0.167)^2 * (0.833)^3 = .161
TABLES IN BACK OF BOOK: A1 and A2
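The binomial formula translates almost directly into code. The sketch below uses Python's `math.comb` for the "n choose x" part; the function name `binomial_pmf` is an illustrative choice, not something from the book or from Excel.

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x): probability of exactly x successes in n independent trials."""
    # comb(n, x) counts the ways to place x successes among n trials;
    # p**x * (1 - p)**(n - x) is the probability of any one such arrangement.
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Exactly 2 fours in 5 tosses of a die.
print(round(binomial_pmf(2, 5, 1 / 6), 3))  # 0.161
```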

16 Binomial Distributions
Expected value: EV = np
Variance: var = np(1-p)
So, if we roll a die 24 times, we would expect a 4 to occur 4 times: EV = 24*.167 = 4
Not so worried about the variance at this point, but it’s not hard to compute. For the same example: var = np(1-p) = 4(.833) = 3.33

17 Binomial Distributions in Excel
=binom.dist(number_s,trials,probability_s,cumulative)
Where:
Number_s = number of successes
Trials = number of trials
Probability_s = probability of success
Cumulative: False, if we want the probability of exactly x; True, if we want the probability of all the values up to and including x
Example: P(X=5) for 5 trials with a probability of success of .1:
=binom.dist(5,5,.1,false)

18 Binomial Experiment
We’re going to select 5 households at random in a city where the unemployment rate is 10% to see if the head of the household is unemployed. What is the probability that all 5 are employed?
Is this a binomial experiment? Why or why not?
Be careful in how you define success
Work it out for 5, then try it again for exactly four.
P(X=5) = 5c5 * .9^5 * .1^0 = .5905

19 These examples came from the StatTrek website: http://stattrek
Acceptance to college
The probability that a student is accepted to a prestigious college is 0.3. If 5 students apply, what is the probability that at most 2 are accepted?
Solution: What is our expected value (mean of the distribution)? This can help you see if your solution is in the ballpark. To solve this problem, we compute 3 individual probabilities, using the binomial formula. The sum of all these probabilities is the answer we seek. Thus,
P(X<=2) = 5c0 * .3^0(1-.3)^5 + 5c1 * .3^1(1-.3)^4 + 5c2 * .3^2(1-.3)^3
P(X<=2) = 0.16807 + 0.36015 + 0.30870
P(X<=2) = 0.8369
Demo how to use the tables in the book. Then use Excel and demo using the random number generator: binomial RV with number of variables = 1, number of random numbers = 100, number of trials = 5, p = .3. Check w/ countif, range, “<3”. Compute w/ binomdist and show the probability distribution.
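A cumulative binomial probability is just the sum of the individual ones, which the following sketch makes explicit. The function name `binomial_cdf` is illustrative; it plays the same role as calling the Excel function with cumulative set to true.

```python
from math import comb

def binomial_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x): sum of the individual binomial probabilities for 0..x."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))

n, p = 5, 0.3
print(n * p)                              # expected value np, a ballpark check
print(round(binomial_cdf(2, n, p), 4))    # 0.8369
```

With an expected value of about 1.5 acceptances, a high probability of "at most 2" is plausible.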

20 Probability Distribution
NOTE: No distribution curve. Blocks (or points, etc.) only!
[Chart: probability of each number of students accepted; x-axis: # Accepted]

21 Cumulative Probability Distribution
[Chart: cumulative probability by number of students accepted; x-axis: # Accepted]

22 Coin flipping - again
What is the probability of getting 45 or fewer heads in 100 tosses of a fair coin?
Solution: To solve this problem, we compute 46 individual probabilities, using the binomial formula. The sum of all these probabilities is the answer we seek. Thus,
P(X<=45) = 100c0 * .5^0(1-.5)^100 +…+ 100c45 * .5^45(1-.5)^55
Right now, this is the only way we know how to do this. The answer is 0.184.
Can’t use the tables in the back of the book to figure this one; they end at n=25.
Demo using Excel: =binomdist(x,n,p,cumulative), where false = individual and true = cumulative.
Simulate by generating random Bernoulli data. Count using “countif”: =COUNTIF($A$101:$Y$101,"<46")
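Summing 46 terms by hand is impractical, but it is a one-line loop in code. This sketch computes the exact sum with `math.comb`, assuming a fair coin; the variable names are illustrative.

```python
from math import comb

n, p = 100, 0.5

# Sum the 46 individual probabilities P(X = 0), ..., P(X = 45).
p_at_most_45 = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(46))

print(round(p_at_most_45, 3))  # 0.184
```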

23 Probability Distribution
With a large number of trials, the binomial distribution approximates the shape of the normal distribution.

24 Cumulative Probability Distribution
This is also typical of a normal distribution – the S curve.

25 The World Series
What is the probability that the World Series will last 4 games? 5 games? 6 games? 7 games? Assume the teams are evenly matched.
Solution: This is a very tricky application of the binomial distribution. If you can follow the logic of this solution, you have a good understanding of the material covered to this point. Use the probability tables in the back of the book to solve each situation.
In the World Series, there are two baseball teams. The series ends when the winning team wins 4 games. Therefore, we define a success as a win by the team that ultimately becomes the World Series champion. For the purpose of this analysis, we assume that the teams are evenly matched. Therefore, the probability that a particular team wins a particular game is 0.5.
Let's look first at the simplest case. What is the probability that the series lasts only 4 games? This can occur if one team wins the first 4 games. The probability of the National League team winning 4 games in a row is: P(X=4) = 4c4 * (0.5)^4 * (0.5)^0 = 0.0625
But we also have to compute the probability of the American League team winning 4 games in a row, which, of course, is also 0.0625. Therefore, the probability the series ends in four games is 0.0625 + 0.0625 = 0.125
Now let's tackle the question of finding the probability that the World Series ends in 5 games. The trick in finding this solution is to recognize that the series can only end in 5 games if one team has won 3 out of the first 4 games. So let's first find the probability that the American League team wins exactly 3 of the first 4 games. P(X=3) = 4c3 * (0.5)^3 * (0.5)^1 = 0.25
Okay, here comes some more tricky stuff, so listen up. Given that the American League team has won 3 of the first 4 games, the American League team has a .5 chance of winning the fifth game to end the series. Therefore, the probability of the American League team winning the series in 5 games is 0.25 * 0.50 = 0.125. Since the National League team could also win the series in 5 games, the probability that the series ends in 5 games would be 0.125 + 0.125 = 0.25.
The rest of the problem would be solved in the same way. You should find that the probability of the series ending in 6 games is 0.3125: P(X=3) = 5c3 * (0.5)^3 * (0.5)^2 = 0.3125 (times 0.50 for the sixth game, times 2 for either team), and the probability of the series ending in 7 games is also 0.3125: P(X=3) = 6c3 * (0.5)^3 * (0.5)^3 = 0.3125
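The World Series reasoning can be condensed into a short function as a check on the four answers. This is a sketch assuming evenly matched teams (p = 0.5 for every game); the function name `p_series_length` is illustrative.

```python
from math import comb

P_WIN = 0.5  # evenly matched teams: each game is a 50/50 proposition

def p_series_length(games: int) -> float:
    """Probability that a best-of-7 series ends in exactly `games` games."""
    # One particular team wins exactly 3 of the first (games - 1) games...
    p_three_of_first = comb(games - 1, 3) * P_WIN**3 * (1 - P_WIN) ** (games - 4)
    # ...and then wins the final game. Either team can be that team, hence the factor 2.
    return 2 * p_three_of_first * P_WIN

for g in range(4, 8):
    print(g, p_series_length(g))
```

The four probabilities (0.125, 0.25, 0.3125, 0.3125) add up to 1, as they must, since the series has to end in 4 to 7 games.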

26 Poisson Distribution
Applies for events occurring over time, space, or distance
Examples:
Number of cars driving past a point
Number of defects per foot in manufactured pipe
Number of knots in a section of wood panel
Number of accidents per day at a job site
STOP HERE FOR CLASS ONE. If there’s still time, students can begin the homework.
Poisson is a very useful distribution, and much easier to compute than the binomial.

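27 Poisson Distribution
$P(X = x) = \dfrac{\lambda^{x} e^{-\lambda}}{x!}$
Where λ (lambda) = mean number of occurrences in the interval and x = number of occurrences
e is the base of the natural logarithm system and is equal to approximately 2.71828
Any number raised to a negative exponent is the same as 1 divided by that number raised to the positive exponent. Example: 2^-2 is the same as 1/2^2
The hardest part about computing a Poisson distribution is determining lambda.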

28 Poisson Distribution
There were 438 children born in a small town last year. What is the probability that, on any given day, no children were born?
Here’s an example. Since we want to know the probability of 0 children being born on any day, we need lambda to tell us the rate of children born/day; i.e., 438/365 = 1.2 (forget about leap years)
Now we plug into the formula: P(X=0) = 1.2^0 * e^-1.2 / 0! = .3012
There’s also a chart in the book for this.
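The births example can be reproduced with a few lines of code. This is a sketch of the Poisson formula using Python's standard `math` module; the function name `poisson_pmf` and the parameter name `lam` are illustrative.

```python
from math import exp, factorial

def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for a Poisson distribution with mean rate lam (lambda)."""
    return lam**x * exp(-lam) / factorial(x)

lam = 438 / 365                          # births per day = 1.2
print(round(poisson_pmf(0, lam), 4))     # 0.3012
```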

29 Poisson Distribution in Excel
=poisson.dist(x,mean,cumulative)
Where:
X = the number we’re looking for
Mean = lambda
Cumulative: True = probability of all values up to and including x; False = probability of x
Example: =poisson.dist(0,1.2,false)

30 Probability Distribution
Typical of a Poisson distribution. Note that – in theory – the probability is asymptotic; i.e. it never reaches 0.

31 Cumulative Poisson Distribution
Suppose the average number of lions seen on a 1-day safari is 5. What is the probability that tourists will see fewer than four lions on the next 1-day safari?
Use the tables in the book to solve this one.
Lambda = 5
P(X=0) = .0067
P(X=1) = .0337
P(X=2) = .0842
P(X=3) = .1404
So P(X<4) = .2650
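The table lookups for the lion example can be checked by computing the four Poisson terms directly. A minimal sketch, with illustrative variable names:

```python
from math import exp, factorial

lam = 5  # average number of lions seen per 1-day safari

# "Fewer than four" means X = 0, 1, 2, or 3.
terms = [lam**x * exp(-lam) / factorial(x) for x in range(4)]

print([round(t, 4) for t in terms])   # [0.0067, 0.0337, 0.0842, 0.1404]
print(round(sum(terms), 4))           # 0.265
```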

32 Cumulative Poisson Distribution
Final note on Poisson. When n is large (>20) and p is small (<.05), we can let np=lambda and use the Poisson to approximate the binomial.
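To see how close the approximation is, the two distributions can be compared side by side. The values n = 100 and p = .02 below are illustrative assumptions chosen to satisfy the rule of thumb; they do not come from the lesson.

```python
from math import comb, exp, factorial

n, p = 100, 0.02      # illustrative values with n > 20 and p < .05
lam = n * p           # lambda = np

xs = range(5)
binom_probs = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in xs]
poisson_probs = [lam**x * exp(-lam) / factorial(x) for x in xs]

# Print the exact binomial probability next to its Poisson approximation.
for x, b, q in zip(xs, binom_probs, poisson_probs):
    print(x, round(b, 4), round(q, 4))
```

For these values the two columns agree to within a few thousandths.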

33 Hypergeometric Distribution
Sampling without replacement
Compare to binomial:
There are 2 or more identical trials
In each trial, there can be only 2 outcomes (success or failure)
The random variable is the number of successes in n trials
Unlike the binomial (where trials are statistically independent and the probability of success remains the same from one trial to the next):
Trials are not statistically independent
Probability of success changes from one trial to the next
The way to recognize a hypergeometric is the sampling without replacement. Remember, with very large populations we can still use the binomial.
Frequently used in quality control inspections: inspecting a sample of n items drawn from a population of N. It makes no sense at all to replace, especially when a defective is chosen.

34 Hypergeometric Distribution
$P(X = x) = \dfrac{\binom{s}{x} \binom{N - s}{n - x}}{\binom{N}{n}}$
Where:
N = size of the population
n = size of the sample
s = number of successes in the population
x = number of successes in the sample
Unfortunately, no chart in the book for this one.
The mean of the distribution is equal to n*s/N.
The variance is n*s*(N-s)*(N-n) / [N^2*(N-1)].

35 Hypergeometric in Excel
=hypgeom.dist(sample_s, number_sample, population_s, number_population, cumulative)
Where:
sample_s = number of successes in the sample
number_sample = size of the sample
population_s = number of successes in the population
number_population = size of the population
cumulative = same as before
Example: =hypgeom.dist(2, 4, 6, 20, false)

36 Hypergeometric Distribution
20 businesses filed tax returns
6 of the returns were filled out incorrectly
The IRS has randomly selected 4 of the 20 returns to audit
What is the probability that exactly 2 of the 4 selected for audit will be filled out incorrectly?
P(X=2) = 6c2 * (20-6)c(4-2) / 20c4 = .2817
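The audit example can be verified with a short function built from three "choose" terms. This is a sketch; the name `hypergeom_pmf` is illustrative, and its argument order (x, n, s, N) is simply chosen to mirror the order of the Excel function's arguments.

```python
from math import comb

def hypergeom_pmf(x: int, n: int, s: int, N: int) -> float:
    """P(X = x): x successes in a sample of n drawn without replacement
    from a population of N that contains s successes."""
    return comb(s, x) * comb(N - s, n - x) / comb(N, n)

# Exactly 2 incorrect returns among 4 audited, with 6 incorrect out of 20 filed.
print(round(hypergeom_pmf(2, 4, 6, 20), 4))  # 0.2817
```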

37 Hypergeometric Distribution

38 Cumulative Hypergeometric
Suppose we select 5 cards from an ordinary deck of playing cards. What is the probability of obtaining 2 or fewer hearts?
N=52, n=5, s=13, x<3
P(X<3) = 13c2 * (52-13)c(5-2) / 52c5 + 13c1 * (52-13)c(5-1) / 52c5 + 13c0 * (52-13)c(5-0) / 52c5 = .9072
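As with the other distributions, the cumulative hypergeometric probability is a sum of individual terms. The sketch below adds up the three terms for the card example; variable names are illustrative.

```python
from math import comb

N, n, s = 52, 5, 13   # deck size, cards drawn, hearts in the deck

# P(X <= 2) = P(X = 0) + P(X = 1) + P(X = 2)
p_two_or_fewer = sum(
    comb(s, x) * comb(N - s, n - x) / comb(N, n) for x in range(3)
)

print(round(p_two_or_fewer, 4))  # 0.9072
```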

39 Cumulative Hypergeometric

40 Summary
Random variables
Discrete v. continuous
Probability distributions
Cumulative distributions
Expected values
Binomial
Poisson
Hypergeometric
There are other discrete distributions, but these are the only 3 we’ll study in this class.

