Part A: Concepts & binomial distributions Part B: Normal distributions


1 Part A: Concepts & binomial distributions Part B: Normal distributions
4: Probability
Part A: Concepts & binomial distributions
Part B: Normal distributions
11/13/2018 Unit 4: Intro to probability (Biostat)

2 Definitions
Random variable: a numerical quantity that takes on different values depending on chance
Population: the set of all possible values for a random variable
Event: an outcome or set of outcomes for a random variable
Probability: the proportion of times an event occurs in the population; the (long-run) expected proportion

3 Probability (definition #1)
The probability of an event is its relative frequency (proportion) in the population.
Example: Let A be the event of selecting a female at random from an HIV+ population.
There are 600 people in the population, of whom 159 are female.
Therefore, Pr(A) = 159 ÷ 600 = 0.265

4 Probability (definition #2)
The probability of an event is its expected proportion when the process is repeated again and again under the same conditions.
Example: Select 100 individuals at random; 24 are female.
Pr(A) ≈ 24 ÷ 100 = 0.24
This is only an estimate (unless n is very, very big).
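
A minimal Python sketch contrasting definitions #1 and #2. It assumes sampling with replacement and a fixed random seed; both are illustrative choices, not something the slides specify, and the name `estimate_pr` is hypothetical.

```python
import random

# Population from the slide: 600 people, 159 of them female.
POPULATION_SIZE = 600
FEMALES = 159

# Definition #1: relative frequency in the population.
exact_pr = FEMALES / POPULATION_SIZE


def estimate_pr(n, seed=0):
    """Definition #2: estimate Pr(A) from n random draws (with replacement)."""
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    hits = sum(rng.random() < exact_pr for _ in range(n))
    return hits / n


print(f"exact Pr(A) = {exact_pr:.3f}")
for n in (100, 1_000, 100_000):
    # The estimate should settle near the exact value as n grows.
    print(f"n = {n:>7,}: estimate = {estimate_pr(n):.3f}")
```

With small n the estimate can sit noticeably away from 0.265, which is the point of the slide's "only an estimate" remark.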

5 Probability (definition #3)
The probability of an event is a quantifiable level of belief between 0 and 1.
Probability  Verbal expression
0.00         Never
0.05         Seldom
0.20         Infrequent
0.50         As often as not
0.80         Very frequent
0.95         Highly likely
1.00         Always
Example: Prior experience suggests a quarter of the population is female. Therefore, Pr(A) ≈ 0.25

6 Some rules of probability

7 Types of random variables
Discrete random variables have a finite (or countable) set of possible outcomes, e.g., the number of females in a sample of size n (0, 1, 2, …, n). We cover binomial random variables.
Continuous random variables have a continuum of possible outcomes, e.g., average body weight (lbs) in a sample (160, 160.5, …). We cover Normal random variables.
There are other random variable families, but only binomial and Normal RVs are covered for now.

8 Binomial distributions
Most popular type of discrete RV
Based on the Bernoulli trial: a random event characterized by “success” or “failure”
Examples: coin flip (heads or tails); survival (yes or no)

9 Binomial random variables
Binomial random variable: the random number of successes in n independent Bernoulli trials
A family of distributions identified by two parameters:
n = number of trials
p = probability of success for each trial
Notation: X ~ b(n, p)
X = random variable
~ = “distributed as”
b(n, p) = binomial RV with parameters n and p

10 “Four patients” example
A treatment is successful 75% of the time.
We treat 4 patients.
X = random number of successes, which varies (0, 1, 2, 3, or 4) according to a binomial distribution
X ~ b(4, 0.75)

11 Binomial formula
The probability of i successes is Pr(X = i) = nCi · p^i · q^(n–i)
where
nCi = the binomial coefficient (next slide)
p = probability of success for each trial
q = probability of failure = 1 – p

12 Binomial coefficient (“choose function”)
nCi = n! ÷ [i! × (n – i)!]
where ! = the factorial function: x! = x × (x – 1) × (x – 2) × … × 1
Example: 4! = 4 × 3 × 2 × 1 = 24
By definition 1! = 1 and 0! = 1
nCi = the number of ways to choose i items out of n
Example: “4 choose 2”: 4C2 = 4! ÷ (2! × 2!) = 24 ÷ 4 = 6
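
The choose function can be sketched in Python as follows. The helper name `choose` is hypothetical; `math.factorial` and `math.comb` are standard-library functions (`math.comb` requires Python 3.8 or later).

```python
from math import comb, factorial


def choose(n, i):
    """nCi written out with factorials: n! / (i! * (n - i)!)."""
    return factorial(n) // (factorial(i) * factorial(n - i))


print(factorial(4))   # 24
print(choose(4, 2))   # 6
print(comb(4, 2))     # 6, the built-in equivalent

# All coefficients needed for n = 4 trials.
print([choose(4, i) for i in range(5)])  # [1, 4, 6, 4, 1]
```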

13 “Four patients” example
n = 4 and p = 0.75 (so q = 1 – 0.75 = 0.25)
Question: What is the probability of 0 successes? (i = 0)
Pr(X = 0) = nCi · p^i · q^(n–i) = 4C0 · 0.75^0 · 0.25^(4–0) = 1 · 1 · 0.0039 = 0.0039

14 X~b(4, 0.75), continued
Pr(X = 1) = 4C1 · 0.75^1 · 0.25^(4–1) = 4 · 0.75 · 0.015625 = 0.0469
Pr(X = 2) = 4C2 · 0.75^2 · 0.25^(4–2) = 6 · 0.5625 · 0.0625 = 0.2109
(Do not demonstrate all calculations. Students should prove to themselves that they can derive and interpret these values.)

15 X~b(4, 0.75), continued
Pr(X = 3) = 4C3 · 0.75^3 · 0.25^(4–3) = 4 · 0.4219 · 0.25 = 0.4219
Pr(X = 4) = 4C4 · 0.75^4 · 0.25^(4–4) = 1 · 0.3164 · 1 = 0.3164

16 The distribution X~b(4, 0.75)
Probability table for X~b(4, 0.75)
Successes  Probability
0          0.0039
1          0.0469
2          0.2109
3          0.4219
4          0.3164
[Figure: probability curve for X~b(4, 0.75)]
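
The whole probability table can be generated from the binomial formula. This is a small sketch; the function name `binom_pmf` is a hypothetical helper, not something from the slides.

```python
from math import comb


def binom_pmf(i, n, p):
    """Pr(X = i) for X ~ b(n, p): nCi * p^i * q^(n-i), with q = 1 - p."""
    q = 1 - p
    return comb(n, i) * p**i * q**(n - i)


n, p = 4, 0.75
table = {i: binom_pmf(i, n, p) for i in range(n + 1)}

print("Successes  Probability")
for i, pr in table.items():
    print(f"{i:<9}  {pr:.4f}")

# The probabilities over all possible outcomes must sum to 1.
print(f"total = {sum(table.values()):.4f}")
```

Note that Pr(X = 3) is exactly 0.421875, which rounds to 0.4219; with that value the five probabilities sum to 1.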

17 Area under the curve (AUC) concept
The area under a probability curve (AUC) = probability! Get it?
Pr(X = 2) = .2109

18 Cumulative probability (left tail)
Cumulative probability = Pr(X ≤ i) = probability of a value less than or equal to i
Illustrative example: X~b(4, 0.75)
Pr(X ≤ 0) = Pr(X = 0) = 0.0039
Pr(X ≤ 1) = Pr(X ≤ 0) + Pr(X = 1) = 0.0039 + 0.0469 = 0.0508
Pr(X ≤ 2) = Pr(X ≤ 1) + Pr(X = 2) = 0.0508 + 0.2109 = 0.2617
Pr(X ≤ 3) = Pr(X ≤ 2) + Pr(X = 3) = 0.2617 + 0.4219 = 0.6836
Pr(X ≤ 4) = Pr(X ≤ 3) + Pr(X = 4) = 0.6836 + 0.3164 = 1.0000

19 X~b(4, 0.75)
i  Probability function Pr(X = i)  Cumulative probability Pr(X ≤ i)
0  0.0039                          0.0039
1  0.0469                          0.0508
2  0.2109                          0.2617
3  0.4219                          0.6836
4  0.3164                          1.0000
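
Cumulative probabilities are running sums of the probability function. A short sketch using the standard-library `itertools.accumulate`; the variable names are illustrative.

```python
from itertools import accumulate
from math import comb

n, p = 4, 0.75

# Probability function Pr(X = i) for i = 0..n.
pmf = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]

# Cumulative probability Pr(X <= i): running total of the pmf.
cdf = list(accumulate(pmf))

print("i  Pr(X = i)  Pr(X <= i)")
for i, (point, cumulative) in enumerate(zip(pmf, cdf)):
    print(f"{i}  {point:.4f}     {cumulative:.4f}")
```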

20 Cumulative probability
Left tail = cumulative probability
The area under the shaded bars in the left tail sums to 0.2617, i.e., Pr(X ≤ 2) = 0.2617
Area under “curve” = probability
Bring it on!

21 Reasoning
Use the probability model to reason about chance.
I hypothesize p = 0.75, but observe only 2 successes in 4 trials. Should I doubt my hypothesis?
ANS: No. When p = 0.75, you’ll see 2 or fewer successes about 26% of the time (Pr(X ≤ 2) = 0.2617), which is not that unusual.
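
The same reasoning can be checked by simulation: treat 4 patients many times under the hypothesis p = 0.75 and count how often 2 or fewer succeed. This is a sketch only; the function name, the number of repetitions, and the fixed seed are assumptions made for illustration.

```python
import random


def simulate_left_tail(k, n, p, reps=100_000, seed=1):
    """Fraction of simulated experiments with k or fewer successes."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    count = 0
    for _ in range(reps):
        # One experiment: n independent Bernoulli trials with success prob p.
        successes = sum(rng.random() < p for _ in range(n))
        if successes <= k:
            count += 1
    return count / reps


estimate = simulate_left_tail(k=2, n=4, p=0.75)
print(f"simulated Pr(X <= 2) = {estimate:.3f}  (exact value: 0.2617)")
```

An outcome that turns up in roughly one experiment out of four gives little reason to doubt the hypothesized p.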

22 StaTable probability calculator
Link on course homepage
Three versions: Java (browser), Windows, Palm
Screenshot labels: Probability; Cumulative probability

23 Intro to Probability, Part B
The Normal distributions

24 The Normal distributions
Most popular continuous model
Recognized by de Moivre (1667–1754)
Extended by Laplace (1749–1827)
How’s my hair? Looks good.

25 Probability density function (curve)
Example: vocabulary scores of 947 seventh graders
The smooth curve drawn over the histogram is a model of the actual distribution
The mathematical model is the Normal probability density function (pdf)

26 Area under curve
The area under the curve (AUC) concept applies
The shaded bars (left tail) represent scores ≤ 6.0 = 30.3% of scores
Pr(X ≤ 6) = 0.303

27 Areas under curve (cont.)
Now translate this to the area under the curve (AUC)
The scale of the Y-axis is adjusted so the total AUC = 1
The AUC to the left of 6.0 (shaded) = 0.293
Therefore, the AUC “models” the proportional area in the bars of the histogram, i.e., the probabilities of the associated ranges

28 Density Curves

29 Normal distributions
Normal distributions = a family of distributions with common characteristics
Normal distributions have two parameters:
Mean µ locates the center of the curve
Standard deviation σ quantifies spread (at the points of inflection)
Arrows indicate points of inflection

30 68–95–99.7 rule for Normal RVs
68% of the AUC falls within 1 standard deviation of the mean (µ ± σ)
95% falls within 2 standard deviations (µ ± 2σ)
99.7% falls within 3 standard deviations (µ ± 3σ)
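
The rule can be checked numerically with the standard library's `statistics.NormalDist` (Python 3.8+). This is a sketch; the helper name `within_k_sd` is hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()  # Standard Normal: mu = 0, sigma = 1


def within_k_sd(k):
    """AUC between -k and +k standard deviations of the mean."""
    return Z.cdf(k) - Z.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} sd of the mean: {within_k_sd(k):.4f}")
```

The computed areas are about 0.6827, 0.9545, and 0.9973, so the 68%, 95%, and 99.7% figures are convenient roundings.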

31 Illustrative example: WAIS
Wechsler adult intelligence scores (WAIS) vary according to a Normal distribution with μ = 100 and σ = 15

32 Another example (male height)
Adult male height is approximately Normal with µ = 70.0 inches and σ = 2.8 inches (NHANES, 1980)
Shorthand: X ~ N(70, 2.8)
Therefore:
68% of heights = µ ± σ = 70.0 ± 2.8 = 67.2 to 72.8
95% of heights = µ ± 2σ = 70.0 ± 2(2.8) = 64.4 to 75.6
99.7% of heights = µ ± 3σ = 70.0 ± 3(2.8) = 61.6 to 78.4

33 Another example (male height)
What proportion of men are less than 72.8 inches tall? (Note: 72.8 is one σ above μ)
By the rule, 68% of the AUC lies between −1σ and +1σ, leaving 16% in each tail
Therefore, 16% + 68% = 84% of men are less than 72.8 inches tall
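
The 84% answer can be cross-checked against the Normal model directly. This sketch assumes the slide's parameters (μ = 70.0, σ = 2.8) and uses `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

# Adult male height model from the slide: X ~ N(70, 2.8).
height = NormalDist(mu=70.0, sigma=2.8)

# Proportion of men shorter than 72.8 inches (one sigma above the mean).
pr_below = height.cdf(72.8)

# Middle band between 67.2 and 72.8 inches (mu - sigma to mu + sigma).
middle = height.cdf(72.8) - height.cdf(67.2)

print(f"Pr(X < 72.8) = {pr_below:.4f}")
print(f"Pr(67.2 < X < 72.8) = {middle:.4f}")
```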

34 Male Height Example
What proportion of men are less than 68 inches tall?
68 does not fall on a ±σ marker. To determine the AUC, we must first standardize the value.

35 Standardized value = z score
To standardize a value, simply subtract μ and divide by σ: z = (x – μ) ÷ σ
This is now a z-score
The z-score tells you the number of standard deviations the value falls from μ

36 Example: Standardize a male height of 68”
Recall X ~ N(70, 2.8)
z = (x – μ) ÷ σ = (68 – 70) ÷ 2.8 = −0.71
Therefore, the value 68 is 0.71 standard deviations below the mean of the distribution
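
Standardizing is a one-line calculation. A minimal sketch, with `z_score` as a hypothetical helper name:

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    return (x - mu) / sigma


z = z_score(68, mu=70.0, sigma=2.8)
print(round(z, 2))  # -0.71
```

The negative sign records that 68 inches lies below the mean.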

37 Men’s Height (NHANES, 1980)
What proportion of men are less than 68 inches tall? = What proportion of a Standard z curve is less than −0.71?
You can now look up the AUC in a Standard Normal “Z” table.

38 Using the Standard Normal table
z      .00    .01    .02
−0.8   .2119  .2090  .2061
−0.7   .2420  .2389  .2358
−0.6   .2743  .2709  .2676
Pr(Z ≤ −0.71) = .2389

39 Summary (finding Normal probabilities)
Draw curve w/ landmarks
Shade area
Standardize value(s)
Use Z table to find appropriate AUC
(In the height example, the shaded AUC = .2389)

40 Right “tail”
What proportion of men are greater than 68” tall?
Greater than → look at right “tail”
Area in right tail = 1 – (area in left tail) = 1 – .2389 = .7611
Therefore, 76.11% of men are greater than 68 inches tall.
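
The table lookup and the right-tail subtraction can be reproduced in Python. This sketch uses `statistics.NormalDist` in place of the printed Z table, which is a substitution and not the slides' method; the variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # Standard Normal distribution

# Left tail: area below the rounded z-score used with the printed table.
left_tail = Z.cdf(-0.71)

# Right tail: total AUC is 1, so subtract the left tail.
right_tail = 1 - left_tail

print(f"Pr(Z <= -0.71) = {left_tail:.4f}")
print(f"Pr(Z >  -0.71) = {right_tail:.4f}")

# Without rounding z to two decimals, the left tail is slightly smaller,
# because (68 - 70) / 2.8 is a little below -0.71.
exact_left = NormalDist(mu=70.0, sigma=2.8).cdf(68)
print(f"Pr(X < 68), unrounded z = {exact_left:.4f}")
```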

41 Z percentiles
z_p = the z score with cumulative probability p
What is the 50th percentile on Z? ANS: z.5 = 0
What is the 2.5th percentile on Z? ANS: z.025 ≈ −2
What is the 97.5th percentile on Z? ANS: z.975 ≈ 2

42 Finding Z percentile in the table
Look up the closest entry in the table
Find the corresponding z score
e.g., What is the 1st percentile on Z? z.01 = −2.33 (the closest cumulative proportion is .0099)
z      .02    .03    .04
−2.3   .0102  .0099  .0096
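
Percentiles can also be read off with the inverse of the cumulative probability function. A sketch using `NormalDist.inv_cdf` from the standard library instead of the printed table; `z_percentile` is a hypothetical helper name.

```python
from statistics import NormalDist

Z = NormalDist()  # Standard Normal distribution


def z_percentile(p):
    """z score whose cumulative probability (left-tail AUC) equals p."""
    return Z.inv_cdf(p)


for p in (0.01, 0.025, 0.50, 0.975):
    print(f"z_{p} = {z_percentile(p):+.2f}")
```

The 2.5th and 97.5th percentiles come out near −1.96 and +1.96, which the slides round to −2 and 2.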

43 Unstandardizing a value
How tall must a man be to place in the lower 10% for men aged 18 to 24?
[Figure: Normal curve of height values with the lower .10 of the AUC shaded]

44 Table A: Standard Normal Table
Use Table A
Look up the closest proportion in the table
Find the corresponding standardized score
Solve for X (“un-standardize” the score)

45 Table A: Standard Normal Proportion
z      .07    .08    .09
−1.3   .0853  .0838  .0823
−1.2   .1020  .1003  .0985
−1.1   .1210  .1190  .1170
Pr(Z < −1.28) = .1003

46 Men’s Height Example (NHANES, 1980)
How tall must a man be to place in the lower 10% for men aged 18 to 24?
[Figure: lower .10 of the AUC shown against both height values and standardized values]

47 Observed Value for a Standardized Score
“Unstandardize” the z-score to find the associated x: x = μ + zσ

48 Observed Value for a Standardized Score
x = μ + zσ = 70 + (−1.28)(2.8) = 70 + (−3.58) = 66.42
A man would have to be approximately 66.42 inches tall or less to place in the lower 10% of the population
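
The un-standardizing step can be sketched as follows. The helper name `unstandardize` is hypothetical, and the second calculation uses `NormalDist.inv_cdf` as a cross-check in place of the printed table.

```python
from statistics import NormalDist


def unstandardize(z, mu, sigma):
    """Observed value x that corresponds to standardized score z."""
    return mu + z * sigma


# Table-based route: z = -1.28 is the table entry closest to the 10th percentile.
x_table = unstandardize(-1.28, mu=70.0, sigma=2.8)
print(f"x from table z-score = {x_table:.2f} inches")

# Cross-check: ask the N(70, 2.8) model for its 10th percentile directly.
x_model = NormalDist(mu=70.0, sigma=2.8).inv_cdf(0.10)
print(f"x from inverse cdf   = {x_model:.2f} inches")
```

The two routes agree to about a hundredth of an inch; the small gap comes from rounding z to two decimals.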

