# Section III Gaussian distribution Probability distributions (Binomial, Poisson)

Notation

| Statistic | Sample | Population |
|---|---|---|
| Mean | Ȳ | μ |
| Std deviation | S or SD | σ |
| Proportion | P | π |
| Mean difference | d | δ |
| Correlation coeff | r | ρ |
| Rate (regression) | b | β |
| Num of obs | n | N |

Densities and percentiles: BMI = 22 is the 88th percentile.

Standard Z scores

Definition: Z = (Y − mean)/SD, or equivalently Y = mean + Z × SD.

Z is how many SD units Y lies above or below the mean. The mean and SD may be sample values (Ȳ, S) or, if they are known, population values (μ, σ).

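Survival data, mean = 17.54, SD = 11.68

| Y | Y − mean | Z = (Y − mean)/SD |
|---|---|---|
| 4 | -13.54 | -1.16 |
| 6 | -11.54 | -0.99 |
| 8 | -9.54 | -0.82 |
| 8 | -9.54 | -0.82 |
| 12 | -5.54 | -0.47 |
| 14 | -3.54 | -0.30 |
| 15 | -2.54 | -0.22 |
| 17 | -0.54 | -0.05 |
| 19 | 1.46 | 0.13 |
| 22 | 4.46 | 0.38 |
| 24 | 6.46 | 0.55 |
| 34 | 16.46 | 1.41 |
| 45 | 27.46 | 2.35 |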
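The Z scores for the survival data can be recomputed with a few lines of Python. This is a minimal sketch using only the standard library; the variable names are illustrative, and it assumes the SD on the slide is the sample SD (n − 1 in the denominator), which is what reproduces 11.68.

```python
from statistics import mean, stdev

# Survival data from the slide
survival = [4, 6, 8, 8, 12, 14, 15, 17, 19, 22, 24, 34, 45]

y_bar = mean(survival)   # sample mean
s = stdev(survival)      # sample SD (n - 1 in the denominator)

# Z = (Y - mean) / SD for each observation
z_scores = [(y - y_bar) / s for y in survival]

for y, z in zip(survival, z_scores):
    print(f"{y:3d}  {y - y_bar:7.2f}  {z:6.2f}")
```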

Standard Gaussian (Normal): a distribution model

Selected Gaussian percentiles: Z and the corresponding lower area (table not reproduced)

Gaussian percentiles: the EXCEL function =NORMSDIST(Z) gives the percentile (lower area) for a given Z. The EXCEL function =NORMSINV(p) gives Z for a given percentile p.

Example: SAT Verbal, mean = μ = 500, SD = σ = 100

- What is your percentile if Y = 700? Z = (700 − 500)/100 = 2.0, area = 0.977 = 97.7%.
- What score is the 80th percentile? Z₀.₈₀ = 0.842, so Y = 500 + 0.842 (100) = 584.
- What percent are between 450 and 500? For Y = 450, Z = (450 − 500)/100 = −0.5, area = 0.3085. For Y = 500, Z = 0, area = 0.5000. The area between is 0.5000 − 0.3085 = 0.1915 ≈ 19%.
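The same three SAT answers can be checked in Python. This is a sketch, not part of the original slides: it assumes Python 3.8+ and uses `statistics.NormalDist`, whose `cdf` and `inv_cdf` methods play the roles of NORMSDIST and NORMSINV here (applied directly to Y rather than to Z).

```python
from statistics import NormalDist

# SAT Verbal model from the example: mean 500, SD 100
sat = NormalDist(mu=500, sigma=100)

# Percentile for Y = 700 (lower-tail area)
pct_700 = sat.cdf(700)

# Score at the 80th percentile
score_80 = sat.inv_cdf(0.80)

# Proportion scoring between 450 and 500
between = sat.cdf(500) - sat.cdf(450)

print(f"percentile at 700: {pct_700:.3f}")
print(f"80th percentile score: {score_80:.0f}")
print(f"area between 450 and 500: {between:.4f}")
```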

Example: Anesthesia

Effective dose: μ = 50 mg/kg, σ = 10 mg/kg. Lethal dose: μ = 110 mg/kg, σ = 20 mg/kg.

- Q1: What dose will put 90% to sleep? Z₀.₉₀ = 1.28, so Y = 50 + 1.28 (10) = 62.8 mg/kg.
- Q2: What is the risk of death from this dose? Z = (62.8 − 110)/20 = −2.36, area < 1%.
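A short Python check of the anesthesia example, assuming (as the slide does) that both the effective dose and the lethal dose are Gaussian. The variable names are illustrative only.

```python
from statistics import NormalDist

effective = NormalDist(mu=50, sigma=10)   # effective dose, mg/kg
lethal = NormalDist(mu=110, sigma=20)     # lethal dose, mg/kg

# Q1: dose that puts 90% to sleep = 90th percentile of the effective dose
dose_90 = effective.inv_cdf(0.90)

# Q2: risk of death = proportion whose lethal dose is below that dose
risk = lethal.cdf(dose_90)

print(f"dose for 90% asleep: {dose_90:.1f} mg/kg")
print(f"risk of death at that dose: {risk:.4f}")
```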

Prediction intervals (not confidence intervals)

If μ and σ are known and the data are known to have a Gaussian distribution, then for Z > 0 equal to the kth percentile of the standard Gaussian, the interval (μ − Zσ, μ + Zσ) is the (2k − 100)% prediction interval. For example, Z = 2 is approximately the 97.5th percentile, so (μ − 2σ, μ + 2σ) is approximately the 95% prediction interval. This implies SD ≈ range/4 (extremes excluded).

Normal distributions: differences and sums

If Y₁ and Y₂ are independent and each has a normal distribution with the means and SDs below,

| Variable | Mean | SD |
|---|---|---|
| Y₁ | μ₁ | σ₁ |
| Y₂ | μ₂ | σ₂ |

then the difference and the sum also have normal distributions:

| | Mean | SD |
|---|---|---|
| Difference = Y₁ − Y₂ | μ₁ − μ₂ | √(σ₁² + σ₂²) |
| Sum = Y₁ + Y₂ | μ₁ + μ₂ | √(σ₁² + σ₂²) |

Q: If σ₁ = σ₂, what is the mean difference with 100% overlap?
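The rule for differences and sums can be illustrated with `statistics.NormalDist`, which supports `+` and `-` between two independent normal distributions. The means and SDs below are hypothetical values chosen only so the arithmetic is easy to follow (3² + 4² = 5²); they do not come from the slides.

```python
from statistics import NormalDist

# Hypothetical independent normal variables (illustrative values)
y1 = NormalDist(mu=10, sigma=3)
y2 = NormalDist(mu=4, sigma=4)

# Difference: mean = mu1 - mu2, SD = sqrt(sigma1^2 + sigma2^2)
diff = y1 - y2

# Sum: mean = mu1 + mu2, same SD as the difference
total = y1 + y2

print(diff.mean, diff.stdev)
print(total.mean, total.stdev)
```

Note that the SDs add in quadrature for both the sum and the difference; only the means differ.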

Difference of two normals

Specificity & Sensitivity

For serum creatinine in normal adults, μ = 1.1 mg/dl and σ = 0.2 mg/dl. In one type of renal disease, μ = 1.7 mg/dl and σ = 0.4 mg/dl.

If a cutoff value of 1.6 mg/dl is used:

- Prob false positive = prob Y > 1.6 given normal
- Prob false negative = prob Y < 1.6 given disease
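The two error probabilities for the 1.6 mg/dl cutoff follow directly from the Gaussian lower-tail areas. The sketch below assumes both creatinine distributions are Gaussian, as the slide implies; names such as `false_pos` are illustrative.

```python
from statistics import NormalDist

normal = NormalDist(mu=1.1, sigma=0.2)    # normal adults, mg/dl
disease = NormalDist(mu=1.7, sigma=0.4)   # renal disease, mg/dl
cutoff = 1.6

# False positive: a normal adult with Y above the cutoff
false_pos = 1 - normal.cdf(cutoff)

# False negative: a diseased patient with Y below the cutoff
false_neg = disease.cdf(cutoff)

specificity = 1 - false_pos
sensitivity = 1 - false_neg

print(f"false positive: {false_pos:.4f}, specificity: {specificity:.4f}")
print(f"false negative: {false_neg:.4f}, sensitivity: {sensitivity:.4f}")
```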

Data transformations & logs

Some continuous variables follow the Gaussian on a transformed scale, not the original scale. Statland implies that perhaps 80% of continuous lab test variables follow a Gaussian either on the original scale (50%) or on a transformed scale, usually the log scale. (Clinical Decision Levels for Lab Tests, 2nd ed., 1987, Med Econ)

Example: Bilirubin

| | Bilirubin, umol/L | Log bilirubin, log₁₀ umol/L |
|---|---|---|
| Mean | 64.3 | 1.55 |
| Median | 34.7 | 1.54 |
| SD | 104.3 | 0.456 |
| n | 216 | 216 |

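95% prediction intervals

| | Original scale | log₁₀ scale |
|---|---|---|
| Mean | 64.3 | 1.55 |
| SD | 104.3 | 0.456 |
| 2 SD | 208.6 | 0.912 |
| Lower | -144.3 | 0.64 |
| Upper | 272.9 | 2.46 |

Geometric mean = 10^1.55 = 35.5 umol/L. Prediction interval: (10^0.64, 10^2.46), or (4.3, 290). On the original scale the lower limit is negative, which is impossible for a concentration; the back-transformed log-scale interval is not.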
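The log-scale interval and its back-transformation can be reproduced as follows. This is a minimal sketch using the summary statistics from the slide (log₁₀ mean 1.55, log₁₀ SD 0.456) and the slide's "mean ± 2 SD" rule; the variable names are illustrative.

```python
# Summary statistics on the log10 scale, from the bilirubin example
log_mean = 1.55
log_sd = 0.456

# Approximate 95% prediction interval on the log10 scale: mean +/- 2 SD
lower_log = log_mean - 2 * log_sd
upper_log = log_mean + 2 * log_sd

# Back-transform to the original scale (umol/L)
geo_mean = 10 ** log_mean
lower = 10 ** lower_log
upper = 10 ** upper_log

print(f"log10 interval: ({lower_log:.2f}, {upper_log:.2f})")
print(f"geometric mean: {geo_mean:.1f}")
print(f"back-transformed interval: ({lower:.1f}, {upper:.0f})")
```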

Normal probability plot: bilirubin, original scale. Data are Gaussian if the plot is a straight line; this plot is not straight, so the data are not Gaussian on the original scale.

Normal probability plot: bilirubin, log scale. Data are Gaussian if the plot is a straight line, as it is here.

Log transformation (cont.)

- The distribution of ratios is much closer to Gaussian on the log scale.
- The "inverse" of 3/1 is 1/3. This is symmetric only on the log scale:
  - Original: 100/1, 10/1, 1/1, 1/10, 1/100
  - Log: 2, 1, 0, -1, -2
- This is true for OR, RR and HR.
- Measures of growth & proliferation have distributions closer to the Gaussian on the log scale.

Data distributions that tend to be Gaussian on the log scale

- Growth measures: bacterial CFU
- Ab or Ag titers (IgA, IgG, …)
- pH
- Neurological stimuli (dB, Snellen units)
- Steroids, hormones (Estrogen, Testosterone)
- Cytokines (IL-1, MCP-1, …)
- Liver function (Bilirubin, Creatinine)
- Hospital length of stay (can be Poisson)

Quick Probability Theory

Mutually exclusive events are the levels of one variable, for example blood type:

| Blood type | Probability |
|---|---|
| A | 30% |
| B | 12% |
| AB | 8% |
| O | 50% |

Probability of A or O = 30% + 50% = 80%. Mutually exclusive probabilities add. All (exhaustive) categories sum to 100%.

Probability: independent events

The probabilities of two independent events (two or more variables) multiply. If 5% of pregnant women have gestational diabetes and 8% of pregnant women have pre-eclampsia, then the probability of gestational diabetes and pre-eclampsia together is 5% × 8% = 0.4%, if the two are independent.

Conditional probability Probability of an event changes if made conditional on another event. Probability (prevalence) of TB is 0.1% in general population. In Vietnamese immigrants, TB probability is 4%. Conditional on being a Vietnamese immigrant, probability is 4%.

Conditional Probability & Bayes (diagram): total population n = 1,000,000; A = Vietnamese, n = 5000; B = TB+, n = 1000; A ∩ B, n = 200. We want prob(TB | Vietnamese) but can't check all Vietnamese for TB.

Conditional Prob & Bayes Rule

What is the TB prevalence in the Orange Co Vietnamese population? It is too hard to take a census of all Vietnamese. Assume we know:

- P(A) = proportion in Orange Co who are Vietnamese = 0.5%
- P(B) = proportion in Orange Co who have TB = 0.1%
- P(A|B) = proportion of those with TB who are Vietnamese = 20%

We want P(B|A) = P(A|B) P(B) / P(A) = (0.2 × 0.001)/0.005 = 0.04 = 4%.

Bayes rule for conditional probability (formula)

Probability of B given A = joint probability of A and B / probability of A = (probability of A given B × probability of B) / probability of A:

$$P(B \mid A) = \frac{P(A \cap B)}{P(A)} = \frac{P(A \mid B)\,P(B)}{P(A)}$$

If A and B are independent, $P(B \mid A) = P(B)$. Also, when the $A_i$ are mutually exclusive and exhaustive, $P(B) = \sum_i P(B \mid A_i)\,P(A_i)$ (sum over all $A_i$).

Example: Bayes rule, with A = Vietnamese, B = TB+

In a population of 1,000,000:

- 5000 (0.5% = 0.005) are Vietnamese = P(A)
- 1000 (0.1% = 0.001) have TB+ = P(B)
- Of the 1000 with TB+, 200 (20% = 0.20) are Vietnamese = P(A|B)

We want the probability of TB given Vietnamese, P(B|A): P(B|A) = 0.20 (0.001)/0.005 = 0.04 = 4%, which equals 200/5000. We can't test all Vietnamese for TB+, but we can check all TB+ for Vietnamese.
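The Bayes rule calculation can be written as a small function and checked against the direct count 200/5000. This is an illustrative sketch; the function name `bayes` and the variable names are not from the slides.

```python
def bayes(p_a_given_b, p_b, p_a):
    """Return P(B|A) = P(A|B) * P(B) / P(A)."""
    return p_a_given_b * p_b / p_a

# Counts from the example
population = 1_000_000
vietnamese = 5000      # A
tb_positive = 1000     # B
both = 200             # A and B

p_a = vietnamese / population          # P(A) = 0.005
p_b = tb_positive / population         # P(B) = 0.001
p_a_given_b = both / tb_positive       # P(A|B) = 0.20

p_b_given_a = bayes(p_a_given_b, p_b, p_a)

print(f"P(TB+ | Vietnamese) = {p_b_given_a:.2%}")
```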

Bayes rule (graph): in a population of 1,000,000, 5000 are Vietnamese (A), 1000 are TB+ (B), and 200 are both Vietnamese and TB+ (A ∩ B). Conditional probability of TB+ given Vietnamese, P(B|A) = 200/5000 = 4%. Check all TB+ for Vietnamese rather than check all Vietnamese for TB.

Bayesian vs Frequentist

The Bayesian computes

$$\text{Prob}(\text{hypothesis} \mid \text{data}) = \frac{\text{Prob}(\text{data} \mid \text{hypothesis})\,\text{Prob}(\text{hypothesis})}{\text{Prob}(\text{data})}$$

that is, data likelihood × prior probability, divided by Prob(data). If the data (evidence) refute a hypothesis, Prob(data | hypothesis) = 0, so Prob(hypothesis | data) = 0.

The frequentist computes Prob(data* | hypothesis) = p value, where data* means the observed data or more extreme data.

Binomial distribution

Population: positive = π = 0.30, negative = 1 − π = 0.70. Y = number of positive responses out of n trials.

n = 1

| Y | Probability |
|---|---|
| 0 | 0.700 |
| 1 | 0.300 |

n = 2

| Y | Probability |
|---|---|
| 0 | 0.49 = 0.7 × 0.7 |
| 1 | 0.42 = 0.7 × 0.3 × 2 |
| 2 | 0.09 = 0.3 × 0.3 |

Binomial (cont.)

n = 3

| Y | Probability |
|---|---|
| 0 | 0.343 |
| 1 | 0.441 |
| 2 | 0.189 |
| 3 | 0.027 |

n = 4

| Y | Probability |
|---|---|
| 0 | 0.2401 |
| 1 | 0.4116 |
| 2 | 0.2646 |
| 3 | 0.0756 |
| 4 | 0.0081 |

General binomial formula

The probability of y positives out of n, where π is the probability of a single positive, is

$$P(Y = y) = \frac{n!}{y!\,(n-y)!}\,\pi^{y}\,(1-\pi)^{n-y}$$

Mean = nπ, SD = √(nπ(1 − π)).

Example: the probability of y = 5 herpes cases out of n = 50 teens if herpes incidence = π = 4% = 0.04 is [50!/(5! 45!)] (0.04)^5 (0.96)^45 = 0.0346 ≈ 3.5%.

This can be computed using "=BINOMDIST(y,n,π,0)" in EXCEL. For example, =BINOMDIST(5,50,0.04,0) is 0.0346.
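The general binomial formula translates directly into Python. The sketch below uses `math.comb` (Python 3.8+) for n!/[y!(n−y)!]; the helper name `binom_pmf` is an illustrative choice, intended as a counterpart of BINOMDIST(y, n, π, 0).

```python
from math import comb

def binom_pmf(y, n, pi):
    """Probability of exactly y positives in n trials, each with probability pi."""
    return comb(n, y) * pi**y * (1 - pi) ** (n - y)

# Reproduce the n = 4 table for pi = 0.30
table_n4 = [binom_pmf(y, 4, 0.30) for y in range(5)]
for y, p in enumerate(table_n4):
    print(y, round(p, 4))

# Herpes example: y = 5 cases out of n = 50 with pi = 0.04
p_herpes = binom_pmf(5, 50, 0.04)
print(f"P(5 of 50) = {p_herpes:.4f}")
```

Summing the probabilities over y = 0, …, n is a quick sanity check: they should total 1.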

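Binomial: fair coin example

For π = 0.5 the probabilities are easy to compute. With y = number of "heads" (successes) out of n,

$$P(y \text{ out of } n) = \frac{n!}{y!\,(n-y)!} \cdot \frac{1}{2^{n}}$$

Example: n = 3, flip 3 fair coins, 2³ = 8 possibilities.

| Outcome | y |
|---|---|
| 0+0+0 | 0 |
| 0+0+1 | 1 |
| 0+1+0 | 1 |
| 1+0+0 | 1 |
| 0+1+1 | 2 |
| 1+0+1 | 2 |
| 1+1+0 | 2 |
| 1+1+1 | 3 |

| y | freq | prob |
|---|---|---|
| 0 | 1 | 1/8 |
| 1 | 3 | 3/8 |
| 2 | 3 | 3/8 |
| 3 | 1 | 1/8 |
| total | 8 | 8/8 |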

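Pascal's triangle

| n | Frequencies for y = 0 to n "successes" | 2ⁿ |
|---|---|---|
| 0 | 1 | 1 |
| 1 | 1 1 | 2 |
| 2 | 1 2 1 | 4 |
| 3 | 1 3 3 1 | 8 |
| 4 | 1 4 6 4 1 | 16 |
| 5 | 1 5 10 10 5 1 | 32 |

For n = 5, prob(y = 2) is 10/32 and prob(y ≤ 2) is (1 + 5 + 10)/32 = 16/32.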

Headache remedy success The “old” headache remedy was successful π=50% of the time, a true “population” value well established after years of study. A “new” remedy is tried in 10 persons and is successful in 7 of the 10 (70%). Is this enough evidence to “prove” that the new remedy is better?

Hypothesis testing: Binomial

How likely is y = 7 successes out of n = 10 if π = 0.5? prob = [10!/(7! 3!)] / 2¹⁰ = 120/1024 = 0.1172.

How likely is y = 7 or more (the p value)?

| y | Probability |
|---|---|
| 7 | 120/1024 = 0.1172 |
| 8 | 45/1024 = 0.0439 |
| 9 | 10/1024 = 0.0098 |
| 10 | 1/1024 = 0.0010 |
| total | 176/1024 = 0.1719 (the p value) |

How likely is observing y = 70 successes out of n = 100 if π = 0.5 for each trial? Prob(y = 70) = [100!/(70! 30!)] / 2¹⁰⁰ = 2.32 × 10⁻⁵.

How likely is it to observe 70 or more successes out of 100? pr(y = 70) + pr(y = 71) + … + pr(y = 100) = 3.93 × 10⁻⁵.

This is a simple example of hypothesis testing. The probability of observing y = 70 or more successes out of n = 100 under the "null hypothesis" that the true population π = 0.5 is called a one-sided p value.
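Both one-sided p values (7 or more of 10, and 70 or more of 100) can be computed exactly by summing binomial coefficients. This is a sketch for the fair-coin null hypothesis π = 0.5 only; the function name `upper_tail_fair` is illustrative.

```python
from math import comb

def upper_tail_fair(y, n):
    """One-sided p value: P(Y >= y) for n trials when pi = 0.5."""
    favourable = sum(comb(n, k) for k in range(y, n + 1))
    return favourable / 2**n

p_10 = upper_tail_fair(7, 10)      # 7 or more successes out of 10
p_100 = upper_tail_fair(70, 100)   # 70 or more successes out of 100
exact_70 = comb(100, 70) / 2**100  # exactly 70 out of 100

print(f"n=10:  p = {p_10:.4f}")
print(f"n=100: p = {p_100:.3g}, P(y=70) = {exact_70:.3g}")
```

The same 70% success rate gives very different evidence at n = 10 and n = 100, which is the point of the comparison.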

Gaussian approximation to Binomial

The approximation is ok for large n and π not near 0 or 1.

Example: π = 0.15, n = 50, mean = 0.15(50) = 7.5, SD = √(50(0.15)(0.85)) = 2.52.

- The actual 2.5th percentile is between 2 and 3; Gaussian: 7.5 − 2(2.5) = 2.5.
- The actual 97.5th percentile is between 12 and 13; Gaussian: 7.5 + 2(2.5) = 12.5.
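The comparison between the exact binomial percentiles and the Gaussian "mean ± 2 SD" limits can be checked numerically. This is a minimal sketch; `binom_cdf` is an illustrative helper that sums the exact binomial probabilities, and the Gaussian limits use the unrounded SD rather than 2.5.

```python
from math import comb, sqrt

n, pi = 50, 0.15
mean = n * pi
sd = sqrt(n * pi * (1 - pi))

def binom_cdf(y):
    """Exact P(Y <= y) for the binomial with the n and pi above."""
    return sum(comb(n, k) * pi**k * (1 - pi) ** (n - k) for k in range(y + 1))

# Gaussian approximation to the 2.5th and 97.5th percentiles
gauss_low = mean - 2 * sd
gauss_high = mean + 2 * sd

print(f"mean = {mean}, SD = {sd:.2f}")
print(f"Gaussian limits: ({gauss_low:.2f}, {gauss_high:.2f})")
for y in (2, 3, 12, 13):
    print(f"P(Y <= {y}) = {binom_cdf(y):.4f}")
```

The exact cumulative probabilities cross 0.025 between y = 2 and y = 3, and cross 0.975 between y = 12 and y = 13, which brackets the Gaussian limits.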

Poisson distribution for count data

For a patient, y is a non-negative integer: 0, 1, 2, 3, … The probability of y responses (or events) given mean μ is

$$P(Y = y) = \frac{\mu^{y}\,e^{-\mu}}{y!}$$

(Note: $\mu^{0} = 1$ by definition.) For the Poisson, if mean = μ then SD = √μ.

Examples: number of colds in a season, number of neurons fired in 30 sec (firing rate).

Poisson example

Q: If the average number of colds in a single winter is μ = 1.9, what is the probability that a given patient will have 4 colds in one winter?

A: (1.9)⁴ e^(−1.9) / (4 × 3 × 2 × 1) = 0.0812 ≈ 8%.

Q: What is the probability of 4 or more?

A: Find the probabilities for y = 0 to 3 and subtract their sum from 1: prob = 0.125 ≈ 12.5%.

This can be computed in EXCEL with "=POISSON(y,mean,0)". =POISSON(4, 1.9, 0) gives 0.0812. =POISSON(4, 1.9, 1) gives the cumulative probability of 4 or fewer (4, 3, 2, 1, 0), which is 0.9559.
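The colds example can be verified with a direct implementation of the Poisson formula. This is a sketch with illustrative helper names: `poisson_pmf` is intended as a counterpart of POISSON(y, mean, 0) and `poisson_cdf` of POISSON(y, mean, 1).

```python
from math import exp, factorial

def poisson_pmf(y, mu):
    """Probability of exactly y events when the mean is mu."""
    return mu**y * exp(-mu) / factorial(y)

def poisson_cdf(y, mu):
    """Cumulative probability of y or fewer events."""
    return sum(poisson_pmf(k, mu) for k in range(y + 1))

mu = 1.9                          # average colds per winter
p_4 = poisson_pmf(4, mu)          # exactly 4 colds
p_le_4 = poisson_cdf(4, mu)       # 4 or fewer
p_ge_4 = 1 - poisson_cdf(3, mu)   # 4 or more

print(f"P(4) = {p_4:.4f}, P(<=4) = {p_le_4:.4f}, P(>=4) = {p_ge_4:.4f}")
```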

Poisson distribution

Poisson process

The mean rate of events is h events per unit (the hazard rate). In T units, we expect μ = hT events on average. Substituting this average μ into $\mu^{y} e^{-\mu} / y!$ gives the probability of y events in T units.

Poisson process example: cancer clusters

Q: Given a cancer rate of h = 3/1000 person-years, what is the expected number of cases in 2 years in a population of 1500?

A: The rate over 2 years is 2 × (3/1000) = 6/1000 per person, so the expected number is μ = 6/1000 × 1500 = 9 cases. Equivalently, μ = hT with T = 1500 × 2 = 3000 person-years.

Q: What is the probability of observing exactly 15 cases?

A: With μ = 9, probability = (9¹⁵ e⁻⁹)/15! = 0.019431 ≈ 2%.

Q: What is the probability of observing 15 or more cases in 1500 persons?

A: Plug in y = 0, 1, 2, …, 14 and add to get Q = probability of 14 or fewer. The probability is 1 − Q = 1 − 0.958534 = 0.041466 ≈ 4%.

This can be computed with "=POISSON(y,μ,0)" in EXCEL for the probability of y events with mean μ. =POISSON(y,μ,1) gives the cumulative probability of y or fewer.
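The cancer cluster numbers can be reproduced by first converting the rate into an expected count and then applying the Poisson formula. This is a minimal sketch; the variable names are illustrative, and the person-time is taken as persons × years.

```python
from math import exp, factorial

rate = 3 / 1000      # cases per person-year (hazard rate h)
persons = 1500
years = 2

# Expected number of cases: mu = h * T, with T in person-years
mu = rate * persons * years

def poisson_pmf(y, mean):
    """Probability of exactly y events when the mean is `mean`."""
    return mean**y * exp(-mean) / factorial(y)

p_15 = poisson_pmf(15, mu)                                # exactly 15 cases
p_ge_15 = 1 - sum(poisson_pmf(k, mu) for k in range(15))  # 15 or more

print(f"expected cases: {mu:.1f}")
print(f"P(15) = {p_15:.6f}, P(>=15) = {p_ge_15:.6f}")
```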

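Summary: Descriptive stats for Normal, Binomial & Poisson

n = sample size. For the binomial, the entries are per trial, that is, for the proportion.

| Distribution | Mean | Variance | SD | SE |
|---|---|---|---|---|
| Normal | µ | σ² | σ | σ/√n |
| Binomial | π | π(1 − π) | √(π(1 − π)) | √(π(1 − π)/n) |
| Poisson | µ | µ | √µ | √(µ/n) |

SD = √variance, SE = SD/√n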