12. Discrete probability distributions


1 12. Discrete probability distributions
The Practice of Statistics in the Life Sciences Third Edition © 2014 W.H. Freeman and Company

2 Objectives (PSLS Chapter 12)
Discrete probability distributions
The binomial setting and binomial distributions
Binomial probabilities
Binomial mean and standard deviation
The Normal approximation to binomial distributions
The Poisson distributions
Poisson probabilities

3 Binomial setting and distributions
Binomial distributions are models for some categorical variables, typically representing the number of successes in a series of n independent trials. The observations must meet these requirements:
the total number of observations n is fixed in advance;
each observation falls into just one of two categories, success or failure;
the outcomes of all n observations are statistically independent;
all n observations have the same probability p of “success.”
The terms success and failure come from a gambling background (the first applied probability theories were motivated by gambling questions). In the binomial setting, the success/failure labels carry no good/bad connotation: a success could describe a heart attack just as well as patient remission.

4 Applications for binomial distributions
Binomial distributions describe the possible number of times that a particular event will occur in a sequence of observations.
In a clinical trial, a patient’s condition may improve or not. The binomial distribution describes the number of patients who improved (not how much better they feel) among the study participants.
Is a child obese or not (based on their body mass index)? The binomial distribution describes the number of obese children in a random sample of school-age children.
In a quality control study, we assess the number of defective items in a lot of goods, irrespective of the type of defect.

5 Binomial parameters
We express a binomial distribution for the count X of successes among n observations as a function of the parameters n and p: B(n, p). The parameter n is the total number of observations. The parameter p is the probability of success on each observation. The count of successes X can be any whole number between 0 and n.
The CDC estimates that a third of adult men are obese. In a random sample of 10 adult men, each man is either obese or not. The variable X is the number of obese men among those 10 men sampled, our count of “successes.” For each man, the probability of success, “obese,” is 1/3. The number X of obese men among 10 men has the binomial distribution B(n = 10, p = 1/3).

6 Binomial probabilities
The number of ways of arranging k successes in a series of n observations (with constant probability p of success) is the number of possible combinations (unordered sequences). This can be calculated with the binomial coefficient:
$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$
where k = 0, 1, 2, ..., or n. The binomial coefficient “n choose k” uses the factorial notation “!”. The factorial n! for any strictly positive whole number n is:
n! = n × (n − 1) × (n − 2) × … × 3 × 2 × 1
By convention, 0! = 1.
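
As a quick check on the factorial definition, here is a minimal Python sketch. The helper name `n_choose_k` is ours, not from the slides; it spells out the factorial formula and compares it with the standard library's `math.comb`.

```python
import math

def n_choose_k(n: int, k: int) -> int:
    """Binomial coefficient n! / (k! (n - k)!), written out with factorials."""
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

# math.comb computes the same quantity directly, without the huge intermediates.
print(n_choose_k(25, 5), math.comb(25, 5))  # both print 53130
```

For large n, `math.comb` is usually the better choice, since it avoids building the full factorials.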

7 P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2)
The binomial coefficient counts the number of ways in which k successes can be arranged among n observations. The binomial probability P(X = k) is this count multiplied by the probability of any specific arrangement of the k successes:
$$P(X = k) = \binom{n}{k}\, p^k (1-p)^{n-k}$$
Writing q = 1 − p, the full distribution is:
X        P(X)
0        $\binom{n}{0} p^0 q^n = q^n$
1        $\binom{n}{1} p^1 q^{n-1}$
2        $\binom{n}{2} p^2 q^{n-2}$
…        …
k        $\binom{n}{k} p^k q^{n-k}$
…        …
n        $\binom{n}{n} p^n q^0 = p^n$
Total    1
The probability that a binomial random variable falls in a range of values is the sum of the probabilities of getting exactly each number of successes in that range. For example:
P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2)
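
The rule "count of arrangements times the probability of one arrangement" can be sketched in a few lines of Python. The function names `binom_pmf` and `binom_cdf` are illustrative choices, and the parameters n = 10, p = 1/3 are borrowed from the obesity example on the binomial parameters slide.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k): number of arrangements times the probability of one arrangement."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k): sum of the exact probabilities for 0, 1, ..., k successes."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

n, p = 10, 1 / 3
p_at_most_2 = binom_cdf(2, n, p)                        # P(X = 0) + P(X = 1) + P(X = 2)
total = sum(binom_pmf(k, n, p) for k in range(n + 1))   # should be 1, up to round-off
print(f"P(X <= 2) = {p_at_most_2:.4f}, total = {total:.6f}")
```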

8 The frequency of color blindness (dyschromatopsia) in the Caucasian American male population is estimated to be about 8%. In a group of 25 Caucasian American males, what is the probability that exactly five are color blind?
$$\begin{aligned} P(X = 5) &= \frac{n!}{k!\,(n-k)!}\, p^k (1-p)^{n-k} = \frac{25!}{5!\,20!}\,(0.08)^5 (0.92)^{20} \\ &= \frac{21 \times 22 \times 23 \times 24 \times 25}{1 \times 2 \times 3 \times 4 \times 5}\,(0.08)^5 (0.92)^{20} \\ &= 53{,}130 \times (0.08)^5 \times (0.92)^{20} \approx 0.0329 \end{aligned}$$
Use technology:
Excel: P(X = 5) = BINOM.DIST(5, 25, 0.08, 0) ≈ 0.0329
TI-83: P(X = 5) = binompdf(25, 0.08, 5) ≈ 0.0329
CrunchIt!: P(X = 5) ≈ 0.0329 (Binomial for n = 25, p = 0.08, X = 5)
General syntax:
Excel: BINOM.DIST(number_s, trials, probability_s, cumulative = 0)
TI-83: 2nd DIST, binompdf(n, p, k)
CrunchIt!: Distribution calculator, Discrete, Binomial
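
The slide names Excel, the TI-83, and CrunchIt! as tools. As an additional illustration that is not part of the original deck, the same color-blindness calculation can be sketched with Python's standard library; the variable names are ours.

```python
from math import comb

n, p, k = 25, 0.08, 5                          # sample size, P(color blind), target count

ways = comb(n, k)                              # arrangements of 5 successes among 25
one_arrangement = p**k * (1 - p)**(n - k)      # probability of any single arrangement
prob = ways * one_arrangement                  # P(X = 5)

print(ways)             # 53130
print(round(prob, 4))   # 0.0329
```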

9 The probability that exactly 2 adults in the sample have depression is
The incidence of major depression in adults is about 10%. A random sample of 50 adults will be tested for depression. The variable X is the number of individuals diagnosed with depression among all 50 and has the binomial distribution Bin(n = 50, p = 0.1). The probability that exactly 2 adults in the sample have depression is
A) 0.010
B) 0.020
C) 0.078
D) 0.100
E) 0.112
$$P(X = 2) = \frac{n!}{k!\,(n-k)!}\, p^k (1-p)^{n-k} = \frac{50!}{2!\,48!}\,(0.1)^2 (0.9)^{48} = \frac{49 \times 50}{2}\,(0.1)^2 (0.9)^{48} = 1{,}225 \times (0.1)^2 \times (0.9)^{48} \approx 0.078$$
so the answer is C.
Excel: BINOM.DIST(2, 50, 0.1, 0) ≈ 0.078
TI-83: binompdf(50, 0.1, 2) ≈ 0.078

10 Binomial mean and variance
The center and spread of the binomial distribution for a count X are defined by the mean μ and standard deviation σ:
$$\mu = np, \qquad \sigma = \sqrt{np(1-p)}$$
The incidence of major depression in adults is about 10%. A random sample of 50 adults will be tested for depression. The variable X is the number of individuals diagnosed with depression among all 50 and has the binomial distribution Bin(n = 50, p = 0.1). Thus,
$$\mu = 50 \times 0.1 = 5, \qquad \sigma = \sqrt{50 \times 0.1 \times 0.9} = \sqrt{4.5} \approx 2.12$$
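
A small Python sketch of the binomial mean and standard deviation, applied to the depression example Bin(n = 50, p = 0.1). The function name `binom_mean_sd` is a hypothetical helper, not something from the textbook.

```python
from math import sqrt

def binom_mean_sd(n: int, p: float) -> tuple[float, float]:
    """Mean np and standard deviation sqrt(np(1 - p)) of a binomial count."""
    return n * p, sqrt(n * p * (1 - p))

mu, sigma = binom_mean_sd(50, 0.1)
print(f"mean = {mu:.2f}, sd = {sigma:.2f}")   # mean = 5.00, sd = 2.12
```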

11 Effect of changing p when n is fixed
Binomial distributions are skewed when p is close to 0 or close to 1 (especially if the sample is small).

12 Effect of changing n for a fixed value of p

13 Normal approximation to binomial
If n is large, and p is not too close to 0 or 1, the binomial distribution can be approximated by a Normal distribution. Practically, the Normal approximation can be used when both np ≥ 10 and n(1 − p) ≥ 10. The approximation can be improved by using a continuity correction to take into account the fact that the Normal distribution is continuous.
The Normal approximation was very useful before wide access to technology because factorial calculations can be quite intensive for large values of n. The Normal approximation also has implications for modeling sample proportions (proportions are simply counts divided by a constant, but they do not follow a binomial distribution because they are not whole numbers). We’ll see some direct applications in Chapter 13 on sampling distributions.

14 The incidence of major depression in adults is about 10%.
Count of adults diagnosed with depression in a sample of 20 adults, Bin(n = 20, p = 0.1): np = 2 is too small, so the Normal approximation should not be used.
Count of adults diagnosed with depression in a sample of 100 adults, Bin(n = 100, p = 0.1): np = 10 and n(1 − p) = 90, so the Normal approximation is OK.
[Figures: Binomial, n = 20, p = 0.1; Binomial, n = 100, p = 0.1]
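
The np ≥ 10 and n(1 − p) ≥ 10 rule of thumb is easy to encode. The sketch below uses a hypothetical helper, `normal_approx_ok`, with a small tolerance that we added so floating-point round-off cannot flip a borderline case such as np = 10.

```python
def normal_approx_ok(n: int, p: float, threshold: float = 10.0) -> bool:
    """Rule of thumb: both np and n(1 - p) must be at least `threshold`."""
    eps = 1e-9  # tolerance for floating-point round-off in borderline cases
    return n * p >= threshold - eps and n * (1 - p) >= threshold - eps

for n in (20, 100):
    # n = 20 fails (np = 2); n = 100 passes (np = 10, n(1 - p) = 90)
    print(n, normal_approx_ok(n, 0.1))
```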

15 The frequency of color blindness (dyschromatopsia) in the Caucasian American male population is about 8%. We take a random sample of size 125 from this population. What is the probability that 6 individuals or fewer in the sample are color blind?
Distribution of the count X: B(n = 125, p = 0.08), so np = 10.
P(X ≤ 6) = BINOM.DIST(6, 125, 0.08, 1) ≈ 0.120, or about 12%
Normal approximation: N(μ = np = 10, σ = √(np(1 − p)) = 3.033)
P(X ≤ 6) ≈ NORM.DIST(6, 10, 3.033, 1) ≈ 0.094, or about 9%
Or z = (x − µ)/σ = (6 − 10)/3.033 = −1.32, so P(X ≤ 6) ≈ 0.0934 from Table B
The Normal approximation is reasonable, though not perfect. Here p = 0.08 is not close to 0.5, but np = 10 and n(1 − p) = 115.
Using a continuity correction greatly improves the approximation: P(X ≤ 6.5) = NORM.DIST(6.5, 10, 3.033, 1) ≈ 0.124, or about 12%
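
To compare the exact binomial probability with the two Normal approximations side by side, one option is the Python standard library (an alternative to the Excel calls on the slide, not something the slide itself uses). `statistics.NormalDist` needs Python 3.8 or later; the variable names are ours.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p, k = 125, 0.08, 6

# Exact binomial: sum P(X = i) for i = 0..6
exact = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Normal approximation with the binomial mean and standard deviation
approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
plain = approx.cdf(k)            # no continuity correction
corrected = approx.cdf(k + 0.5)  # continuity correction: evaluate at 6.5

print(f"exact = {exact:.3f}, normal = {plain:.3f}, corrected = {corrected:.3f}")
```

The corrected value should land much closer to the exact probability than the uncorrected one, which is the point the slide makes.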

16 Distributions for the color blindness example.
The larger the sample size, the better the Normal approximation fits the binomial distribution.
[Figure panels: n = 125 and n = 1000]

17 The Poisson distributions
A Poisson distribution describes the count X of occurrences of an event in fixed, finite intervals of time or space when occurrences are all independent, and the probability of an occurrence is the same over all possible intervals.
Think of the Poisson distribution as describing the number of items in containers:
Items                    Containers
Radioactive decays       Second
Weeds                    Acre of farm land
Fleas                    Dog
Cardiovascular deaths    County / year

18 If we divide a natural lawn into 1 ft² quadrants, we can count how many dandelions are in each quadrant. Dandelion seeds are wind-spread. The probabilities of a quadrant containing 0, 1, 2, 3, … dandelions are given by a Poisson distribution under two conditions:
(i) independence of dandelions: the presence of one dandelion in a quadrant does not make the presence of another more or less likely.
(ii) homogeneity of quadrants: each quadrant is equally likely to contain dandelions.

19 Poisson probabilities
If μ is the population mean number of occurrences for a specified interval of time or space, then the Poisson probability of observing k occurrences (k = 0, 1, 2, …) at constant μ (> 0) is:
$$P(X = k) = \frac{e^{-\mu}\,\mu^k}{k!}$$
The Poisson distribution has mean μ and standard deviation σ:
$$\sigma = \sqrt{\mu}$$
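
A minimal Python sketch of the Poisson probability, with a numerical check that the probabilities sum to 1 and that the mean and standard deviation come out as μ and √μ. The name `poisson_pmf`, the value μ = 5, and the truncation at k = 60 are illustrative assumptions; the tail beyond 60 is negligible for this μ.

```python
from math import exp, factorial, sqrt

def poisson_pmf(k: int, mu: float) -> float:
    """P(X = k) = e^(-mu) * mu^k / k! for a Poisson count with mean mu."""
    return exp(-mu) * mu**k / factorial(k)

mu = 5.0
ks = range(60)  # truncated support; the remaining tail is negligible for mu = 5
total = sum(poisson_pmf(k, mu) for k in ks)
mean = sum(k * poisson_pmf(k, mu) for k in ks)
sd = sqrt(sum((k - mean) ** 2 * poisson_pmf(k, mu) for k in ks))

print(f"total = {total:.6f}, mean = {mean:.4f}, sd = {sd:.4f}, sqrt(mu) = {sqrt(mu):.4f}")
```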

20 The Poisson distribution is skewed when μ < 5.
Effect of changing μ: The Poisson distribution is skewed when μ < 5.

21 The number of deer crossing a road at night during mating season in a particular rural area can be modeled with a Poisson distribution. A local survey conducted over 4 nights found a total of 20 deer crossings. Based on this information, what is the probability that fewer than three deer would cross on a given night during mating season in this area?
To compute this probability using the Poisson distribution, we need to know μ. In this case μ = 20 / 4 = 5 deer crossings per night. Fewer than three crossings means X ≤ 2, so
$$P(X < 3) = P(X = 0) + P(X = 1) + P(X = 2) = e^{-5}\left(\frac{5^0}{0!} + \frac{5^1}{1!} + \frac{5^2}{2!}\right) = 18.5\,e^{-5} \approx 0.125$$
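
The deer-crossing question asks for P(X < 3) with μ = 5. A short Python sketch of that sum, using only the standard library and variable names of our choosing:

```python
from math import exp, factorial

mu = 20 / 4  # 20 crossings observed over 4 nights, so 5 per night

# "Fewer than three" means X = 0, 1, or 2
p_fewer_than_3 = sum(exp(-mu) * mu**k / factorial(k) for k in range(3))

print(f"P(X < 3) = {p_fewer_than_3:.4f}")
```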

22 Probability of 5 severe rainstorms next year
Historical records over 20 years in a particular town indicate an average of 4 severe rainstorms per year. We model the occurrences of severe rainstorms with the Poisson distribution (μ = 4).
Probability of no severe rainstorm next year: $P(X = 0) = 4^0 e^{-4} / 0! = 0.018$
Probability of 5 severe rainstorms next year: $P(X = 5) = 4^5 e^{-4} / 5! = 0.156$
Probability of 1 or more severe rainstorms next year: $P(X \ge 1) = 1 - P(X = 0) = 1 - 0.018 = 0.982$
Probability of more than 5 severe rainstorms next year: $P(X > 5) = 1 - P(X \le 5) = 1 - 0.785 = 0.215$
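
The four rainstorm probabilities can be reproduced with a few lines of Python. This is a sketch under the slide's stated model (Poisson with μ = 4); the helper name `pmf` and the variable names are ours.

```python
from math import exp, factorial

mu = 4  # average number of severe rainstorms per year

def pmf(k: int) -> float:
    """Poisson probability of exactly k severe rainstorms in a year."""
    return exp(-mu) * mu**k / factorial(k)

p_none = pmf(0)                                        # P(X = 0)
p_five = pmf(5)                                        # P(X = 5)
p_at_least_one = 1 - p_none                            # P(X >= 1), by the complement rule
p_more_than_five = 1 - sum(pmf(k) for k in range(6))   # 1 - P(X <= 5)

print(f"{p_none:.3f} {p_five:.3f} {p_at_least_one:.3f} {p_more_than_five:.3f}")
```

Using the complement rule for the last two keeps the sums finite, which is why the slide writes them as 1 minus a cumulative probability.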

