Supplemental Lecture Notes

1 - Introduction
2 - Exploratory Data Analysis
3 - Probability Theory
4 - Classical Probability Distributions
5 - Sampling Distributions / Central Limit Theorem
6 - Statistical Inference
7 - Correlation and Regression
(8 - Survival Analysis)

What is the connection between probability and random variables?
Events (and their corresponding probabilities) that involve experimental measurements can be described by random variables.

Example: X = Cholesterol level (mg/dL), a discrete random variable

POPULATION (random variable X)

  Pop values x_i     Probabilities p(x_i)
  x_1                p(x_1)
  x_2                p(x_2)
  x_3                p(x_3)
  ⋮                  ⋮
  Total              1

SAMPLE of size n (data x_1, x_2, x_3, x_4, x_5, x_6, …, x_n)

  Data values x_i    Relative Frequencies p(x_i) = f_i / n
  x_1                p(x_1)
  x_2                p(x_2)
  x_3                p(x_3)
  ⋮                  ⋮
  x_k                p(x_k)
  Total              1

Probability Histogram
Example: X = Cholesterol level (mg/dL), a discrete random variable on the POPULATION, with population values $x_1, x_2, x_3, \ldots$ and probabilities $p(x_1), p(x_2), p(x_3), \ldots$ that total 1.
The probability histogram plots the “density” above each value x, with Total Area = 1.
$p(x)$ = probability that the random variable X is equal to a specific value x, i.e., $p(x) = P(X = x)$. This is the “probability mass function” (pmf).

Consider the following discrete random variable…
Example: X = “value shown on a single random toss of a fair die (1, 2, 3, 4, 5, 6)”. X is said to be uniformly distributed over the values 1, 2, 3, 4, 5, 6.

Probability Table

  x    p(x) = P(X = x)
  1    1/6
  2    1/6
  3    1/6
  4    1/6
  5    1/6
  6    1/6

Probability Histogram: six bars of equal height (density f(x)) over x = 1, …, 6, with Total Area = 1.
“What is the probability of rolling a 4?” $P(X = 4) = p(4) = 1/6$.

Example: X = Cholesterol level (mg/dL), a discrete random variable on the POPULATION, with population values x and probabilities p(x) that total 1.
$F(x)$ = probability that the random variable X is less than or equal to a specific value x, i.e., $F(x) = P(X \le x)$. This is the “cumulative distribution function” (cdf). On the probability histogram (Total Area = 1), it is the area accumulated up to and including x.

Motivation ~ Consider the following discrete random variable…
Example: X = “value shown on a single random toss of a fair die (1, 2, 3, 4, 5, 6)”. X is said to be uniformly distributed over the values 1, 2, 3, 4, 5, 6.

Cumulative distribution

  x    p(x) = P(X = x)    F(x) = P(X ≤ x)
  1    1/6                1/6
  2    1/6                2/6
  3    1/6                3/6
  4    1/6                4/6
  5    1/6                5/6
  6    1/6                1

The graph of F(x) is a “staircase graph” from 0 to 1.
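The die table can be reproduced with a minimal R sketch. The variable names (`x`, `p`, `cdf`) are illustrative and not from the slides; the cdf is simply the running total of the pmf.

```r
# Fair die: pmf and cdf (names are illustrative)
x   <- 1:6
p   <- rep(1/6, 6)      # pmf: p(x) = P(X = x)
cdf <- cumsum(p)        # cdf: F(x) = P(X <= x), the running total of p
print(data.frame(x = x, pmf = p, cdf = cdf))
p_four <- p[x == 4]     # probability of rolling a 4
```

Plotting `cdf` against `x` with `type = "s"` would give the staircase graph described above.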

Calculating “interval probabilities”…
Example: X = Cholesterol level (mg/dL), a discrete random variable on the POPULATION.

  Pop vals x    pmf p(x)    cdf F(x) = P(X ≤ x)
  x_1           p(x_1)      F(x_1) = p(x_1)
  x_2           p(x_2)      F(x_2) = p(x_1) + p(x_2)
  x_3           p(x_3)      F(x_3) = p(x_1) + p(x_2) + p(x_3)
  ⋮             ⋮           ⋮
  Total         1           increases from 0 to 1

Let $a^-$ denote the population value immediately below a. Then
$F(b) = P(X \le b)$
$F(a^-) = P(X \le a^-)$
$F(b) - F(a^-) = P(X \le b) - P(X \le a^-) = P(a \le X \le b)$
This is the FUNDAMENTAL THEOREM OF CALCULUS (discrete form).

Hey!!! What about the population mean $\mu$ and the population variance $\sigma^2$???
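As a hedged illustration of the interval rule, the sketch below applies it to the fair die from the earlier example. The helper `interval_prob` is a hypothetical name; it computes F(b) minus the cdf just below a.

```r
# Interval probability P(a <= X <= b) = F(b) - F(a-), fair-die example
x <- 1:6
p <- rep(1/6, 6)
interval_prob <- function(a, b) {
  F_b       <- sum(p[x <= b])   # F(b)  = P(X <= b)
  F_a_minus <- sum(p[x <  a])   # F(a-) = P(X <  a)
  F_b - F_a_minus
}
p_3_to_5 <- interval_prob(3, 5)  # same as p(3) + p(4) + p(5)
```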

Example: X = Cholesterol level (mg/dL), a discrete random variable on the POPULATION, with population values x and pmf p(x) totaling 1.
Just as the sample mean $\bar{x}$ and sample variance $s^2$ were used to characterize “measure of center” and “measure of spread” of a dataset, we can now define the “true” population mean $\mu$ and population variance $\sigma^2$, using probabilities.
Population mean: $\mu = \sum x \, p(x)$. Also denoted by E[X], the “expected value” of the variable X.
Population variance: $\sigma^2 = \sum (x - \mu)^2 \, p(x)$.

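Example 1: X = Cholesterol level (mg/dL), a discrete random variable on the POPULATION.

  Pop values x_i    Probabilities p(x_i)
  210               1/6
  240               1/3
  270               1/2
  Total             1

$\mu = (210)(1/6) + (240)(1/3) + (270)(1/2) = 250$
$\sigma^2 = (-40)^2(1/6) + (-10)^2(1/3) + (20)^2(1/2) = 500$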
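The two population parameters can be checked with a short R sketch; the variable names are illustrative, and the values come from the table above.

```r
# Population mean and variance from a probability table
x <- c(210, 240, 270)
p <- c(1/6, 1/3, 1/2)
mu     <- sum(x * p)            # population mean, E[X]
sigma2 <- sum((x - mu)^2 * p)   # population variance
print(c(mean = mu, variance = sigma2))
```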

Example 2: X = Cholesterol level (mg/dL), a discrete random variable on the POPULATION.
Equally likely outcomes result in a “uniform distribution.”

  Pop values x_i    Probabilities p(x_i)
  180               1/3
  210               1/3
  240               1/3
  Total             1

$\mu = 210$ (clear from symmetry)
$\sigma^2 = (-30)^2(1/3) + (0)^2(1/3) + (30)^2(1/3) = 600$

To summarize…

POPULATION: discrete random variable X
  Probability Table: population values $x_i$ with probabilities (pmf) $p(x_i)$, totaling 1.
  Probability Histogram: Total Area = 1.

SAMPLE of size n: data $x_1, x_2, x_3, x_4, x_5, x_6, \ldots, x_n$
  Frequency Table: data values $x_1, \ldots, x_k$ with relative frequencies $p(x_i) = f_i / n$, totaling 1.
  Density Histogram: Total Area = 1.

What is the corresponding picture for a continuous random variable X?

Example 3: TWO INDEPENDENT POPULATIONS
X1 = Cholesterol level (mg/dL), X2 = Cholesterol level (mg/dL)

  x      p1(x)              x      p2(x)
  210    1/6                180    1/3
  240    1/3                210    1/3
  270    1/2                240    1/3
  Total  1                  Total  1

$\mu_1 = 250$, $\sigma_1^2 = 500$; $\mu_2 = 210$, $\sigma_2^2 = 600$.

D = X1 – X2 ~ ???

  d      Outcomes (x1, x2)
  -30    (210, 240)
  0      (210, 210), (240, 240)
  +30    (210, 180), (240, 210), (270, 240)
  +60    (240, 180), (270, 210)
  +90    (270, 180)

NOTE: By definition, this is the sample space of the experiment! What are the probabilities of the corresponding events “D = d” for d = -30, 0, 30, 60, 90?

Are they 1/9, 2/9, 3/9, 2/9, 1/9, in proportion to the number of outcomes? NO!!! The outcomes of D are NOT EQUALLY LIKELY!!! Instead, multiply probabilities via independence:

  d      Probabilities p(d)
  -30    (1/6)(1/3) = 1/18
  0      (1/6)(1/3) + (1/3)(1/3) = 3/18
  +30    (1/6)(1/3) + (1/3)(1/3) + (1/2)(1/3) = 6/18
  +60    (1/3)(1/3) + (1/2)(1/3) = 5/18
  +90    (1/2)(1/3) = 3/18

Probability Histogram: bars of height 1/18, 3/18, 6/18, 5/18, 3/18 at d = -30, 0, +30, +60, +90.

$\mu_D = (-30)(1/18) + (0)(3/18) + (30)(6/18) + (60)(5/18) + (90)(3/18) = 40$, so $\mu_D = \mu_1 - \mu_2$.
$\sigma_D^2 = (-70)^2(1/18) + (-40)^2(3/18) + (-10)^2(6/18) + (20)^2(5/18) + (50)^2(3/18) = 1100$, so $\sigma_D^2 = \sigma_1^2 + \sigma_2^2$.

What happens if the two populations are dependent? Later…
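The distribution of D can also be built mechanically. The following R sketch is one possible way to do it (the variable names are illustrative): it forms all nine (x1, x2) pairs, multiplies the marginal probabilities, which is valid only because the two populations are assumed independent, and then adds up the probabilities that share the same difference.

```r
# Distribution of D = X1 - X2 for two independent populations
x1 <- c(210, 240, 270); p1 <- c(1/6, 1/3, 1/2)
x2 <- c(180, 210, 240); p2 <- rep(1/3, 3)

d_vals <- outer(x1, x2, "-")   # 3 x 3 table of differences x1 - x2
d_prob <- outer(p1, p2)        # joint probabilities p1(x1) * p2(x2), by independence

# Add up the joint probabilities that share the same difference d
pmf_D <- tapply(as.vector(d_prob), as.vector(d_vals), sum)
d     <- as.numeric(names(pmf_D))
pmf_D <- as.vector(pmf_D)

mu_D  <- sum(d * pmf_D)
var_D <- sum((d - mu_D)^2 * pmf_D)
print(data.frame(d = d, p = pmf_D))
```

Under these assumptions `mu_D` equals the difference of the two means and `var_D` equals the sum of the two variances, matching the hand calculation.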

General: TWO INDEPENDENT POPULATIONS
For D = X1 – X2 with X1 and X2 independent:
Mean (X1 – X2) = Mean (X1) – Mean (X2), i.e., $\mu_D = \mu_1 - \mu_2$
Var (X1 – X2) = Var (X1) + Var (X2), i.e., $\sigma_D^2 = \sigma_1^2 + \sigma_2^2$
In Example 3, $\mu_D = 250 - 210 = 40$ and $\sigma_D^2 = 500 + 600 = 1100$.

IF the two populations are dependent, then the formula for the mean still holds, BUT the variance becomes
Var (X1 – X2) = Var (X1) + Var (X2) – 2 Cov (X1, X2).

These two formulas are valid for continuous as well as discrete distributions.

NOTICE TO STAT 324: The slides that follow contain more details on properties of Expected Values. They are not required for Stat 324, but if you are experiencing difficulty with the formulas, you may find them of some benefit. Special note regarding Slide 41: Similar to the “alternate computational formula” for sample variance $s^2$, such a formula also exists for population variance $\sigma^2$, derived there. Stat 324 material picks up with the Binomial Distribution.

General Properties of “Expectation” of X
Example: X = Cholesterol level (mg/dL), a discrete random variable on the POPULATION, with population values x and pmf p(x) totaling 1.

Suppose X is transformed to another random variable, say h(X). Then by def, $E[h(X)] = \sum h(x) \, p(x)$.

Suppose X is constant, say b, throughout the entire population. Then by def, $E[b] = \sum b \, p(x) = b \sum p(x)$. Then, since the probabilities total 1, $E[b] = b$.

Multiply X by any constant a. Then by def, $E[aX] = \sum a x \, p(x) = a \sum x \, p(x)$, i.e., $E[aX] = a \, E[X]$: the mean $\mu_X$ is also multiplied by a.

Add any constant b to X. Then $E[X + b] = \sum (x + b) \, p(x) = \sum x \, p(x) + b \sum p(x)$, i.e., $E[X + b] = E[X] + b$: b is also added to $\mu_X$.

Combining the two, $E[aX + b] = a \, E[X] + b$.

Population variance: $\sigma^2 = E[(X - \mu)^2] = E[X^2] - \mu^2$. This is the analogue of the “alternate computational formula” for the sample variance $s^2$.
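These properties can be spot-checked numerically. The R sketch below reuses the population values 210, 240, 270 from the earlier cholesterol example; the helper `E` and the constants `a` and `b` are arbitrary illustrative choices, not part of the slides.

```r
# Numerical check of the expectation properties on a small population
x <- c(210, 240, 270)
p <- c(1/6, 1/3, 1/2)
E <- function(h) sum(h(x) * p)        # E[h(X)] = sum of h(x) p(x)

a <- 2; b <- 5                        # arbitrary constants
mu  <- E(function(x) x)               # E[X]
lhs <- E(function(x) a * x + b)       # E[aX + b] computed from the definition
rhs <- a * mu + b                     # a E[X] + b

var_def <- E(function(x) (x - mu)^2)  # definition of the variance
var_alt <- E(function(x) x^2) - mu^2  # alternate computational formula
print(c(lhs = lhs, rhs = rhs, var_def = var_def, var_alt = var_alt))
```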

~ The Binomial Distribution ~
Used only when dealing with binary outcomes (two categories: “Success” vs. “Failure”), with a fixed probability of Success ($\pi$) in the population.
Calculates the probability of obtaining any given number of Successes in a random sample of n independent “Bernoulli trials.”
Has many applications and generalizations, e.g., multiple categories, variable probability of Success, etc.

How can we calculate the probability of any given number of Males in the sample?
POPULATION: 40% Male, 60% Female. For any randomly selected individual, define a binary random variable: 1 if Male (probability 0.4), 0 if Female (probability 0.6).
RANDOM SAMPLE, n = 100. Discrete random variable X = # Males in sample (0, 1, 2, 3, …, 99, 100), with pmf p(x) and cdf F(x).
How can we calculate the probability p(x) = P(X = x), for x = 0, 1, 2, 3, …, 100, i.e., P(X = 0), P(X = 1), P(X = 2), …, P(X = 99), P(X = 100)? And the cumulative probability F(x) = P(X ≤ x), for x = 0, 1, 2, 3, …, 100?
Example: How can we calculate the probability p(25) = P(X = 25)?
Solution: Model the sample as a sequence of independent coin tosses, with 1 = Heads (Male), 0 = Tails (Female), where P(H) = 0.4, P(T) = 0.6.

How many possible outcomes of n = 100 tosses exist with X = 25 Heads?
Number the tosses 1, 2, 3, 4, 5, …, 97, 98, 99, 100, and label the X = 25 Heads: { H1, H2, H3, …, H25 }.
There are 100 possible open slots for H1 to occupy.
For each one of them, there are 99 possible open slots left for H2 to occupy.
For each one of them, there are 98 possible open slots left for H3 to occupy.
…etc…
For each one of them, there are 77 possible open slots left for H24 to occupy.
For each one of them, there are 76 possible open slots left for H25 to occupy.
Hence, there are $100 \times 99 \times 98 \times \cdots \times 77 \times 76$ possible outcomes. This value is the number of permutations of 25 among 100 coins, denoted 100P25.

HOWEVER… this number unnecessarily includes the distinct permutations of the 25 among themselves, all of which have Heads in the same positions. For example, exchanging H1 and H2 leaves Heads in exactly the same slots; we would not want to count this as a distinct outcome.
How many is that? By the same logic, $25 \times 24 \times 23 \times \cdots \times 3 \times 2 \times 1$, i.e., “25 factorial”, denoted 25!.
Dividing,
$\dfrac{100 \times 99 \times 98 \times \cdots \times 77 \times 76}{25 \times 24 \times 23 \times \cdots \times 3 \times 2 \times 1} = \dfrac{100!}{25! \, 75!}$
This is “100-choose-25”, denoted $\binom{100}{25}$ or 100C25. It counts the number of combinations of 25 Heads among 100 coins.
R: choose(100, 25)
Calculator: 100 nCr 25
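The counting argument can be confirmed in R with a small sketch (variable names are illustrative): the permutation count divided by 25! should agree with `choose(100, 25)` up to floating-point rounding.

```r
# Permutations of 25 among 100, then divide out the 25! reorderings
n_perm <- prod(76:100)               # 100 x 99 x ... x 77 x 76 = 100P25
n_comb <- n_perm / factorial(25)     # 100! / (25! 75!)
print(all.equal(n_comb, choose(100, 25)))
```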

What is the probability of each such outcome?
Recall that, per toss, P(Heads) = $\pi$ = 0.4 and P(Tails) = $1 - \pi$ = 0.6.
Answer: Via independence in binary outcomes between any two coins, each sequence with 25 Heads and 75 Tails has probability
$0.4 \times 0.6 \times 0.6 \times 0.4 \times 0.6 \times \cdots \times 0.6 \times 0.4 \times 0.4 \times 0.6 = (0.4)^{25} (0.6)^{75}$
Therefore, the probability P(X = 25) is equal to
$P(X = 25) = \binom{100}{25} (0.4)^{25} (0.6)^{75}$
R: dbinom(25, 100, .4)

Question: What if the coin were “fair” (unbiased), i.e., $\pi = 1 - \pi = 0.5$?
Answer: Each outcome has probability $0.5 \times 0.5 \times \cdots \times 0.5 = (0.5)^{100}$. This is the “equally likely” scenario! Therefore $P(X = 25) = \binom{100}{25} (0.5)^{100}$.
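As a hedged check, the sketch below multiplies the count of outcomes by the probability of each outcome and compares the result with R's built-in `dbinom`; the variable names are illustrative.

```r
# P(X = 25) for n = 100 tosses: (number of outcomes) x (probability of each)
p_each    <- 0.4^25 * 0.6^75                # any one sequence with 25 Heads, 75 Tails
p_manual  <- choose(100, 25) * p_each
p_builtin <- dbinom(25, 100, 0.4)

# Fair coin: every sequence has probability 0.5^100
p_fair_manual  <- choose(100, 25) * 0.5^100
p_fair_builtin <- dbinom(25, 100, 0.5)
print(c(p_manual, p_builtin, p_fair_manual, p_fair_builtin))
```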

In general: n Bernoulli trials, independent, with constant probability ($\pi$) per trial
POPULATION: “Success” vs. “Failure” (in the example, 40% Male, 60% Female). For any randomly selected individual, define a binary random variable with P(“Success”) = $\pi$ and P(“Failure”) = $1 - \pi$.
RANDOM SAMPLE of size n (in the example, n = 100). Discrete random variable X = # “Successes” in sample (0, 1, 2, 3, …, n); in the example, X = # Males in sample (0, 1, 2, 3, …, 99, 100).
Example: What is the probability P(X = x), for x = 0, 1, 2, 3, …, n?
Solution: Model the sample as a sequence of n independent Bernoulli trials with P(“Success”) = $\pi$, P(“Failure”) = $1 - \pi$ (in the example, n = 100 independent coin tosses, with 1 = Heads (Male), 0 = Tails (Female)).
Then X is said to follow a Binomial distribution, written X ~ Bin(n, $\pi$), with “probability mass function”
$p(x) = \binom{n}{x} \pi^x (1 - \pi)^{n - x}$, x = 0, 1, 2, …, n.

Example: Blood Type probabilities, revisited

  Blood Type    Rh +    Rh –    Total
  O             .384    .077    .461
  A             .323    .065    .388
  B             .094    .017    .111
  AB            .032    .007    .039
  Total         .833    .166    .999

Suppose n = 10 individuals are to be selected at random from the population. Probability table for X = #(Type O)
Check:
1. Independent outcomes? Reasonably assume that outcomes “Type O” vs. “Not Type O” between two individuals are independent of each other. ✓
2. Constant probability $\pi$? From table, $\pi$ = P(Type O) = .461 throughout population. ✓
Binomial model applies? Yes, both conditions are met.

Example: Blood Type probabilities, revisited
Suppose n = 10 individuals are to be selected at random from the population, where P(Type O) = .461 from the Blood Type table. Probability table for X = #(Type O)
Binomial model applies. X ~ Bin(10, .461)
$p(x) = \binom{10}{x} (.461)^x (.539)^{10 - x}$
R: dbinom(0:10, 10, .461)

  x     p(x)
  0     $\binom{10}{0} (.461)^0 (.539)^{10}$
  1     $\binom{10}{1} (.461)^1 (.539)^9$
  2     $\binom{10}{2} (.461)^2 (.539)^8$
  3     $\binom{10}{3} (.461)^3 (.539)^7$
  4     $\binom{10}{4} (.461)^4 (.539)^6$
  5     $\binom{10}{5} (.461)^5 (.539)^5$
  6     $\binom{10}{6} (.461)^6 (.539)^4$
  7     $\binom{10}{7} (.461)^7 (.539)^3$
  8     $\binom{10}{8} (.461)^8 (.539)^2$
  9     $\binom{10}{9} (.461)^9 (.539)^1$
  10    $\binom{10}{10} (.461)^{10} (.539)^0$

The cdf column F(x) is the running total of the p(x) column.

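R code for the probability histogram of X ~ Bin(10, .461):

n = 10
p = .461
pmf = function(x) dbinom(x, n, p)     # binomial pmf p(x)
N = 100000                            # assumed pseudo-sample size; the original value was lost
x = 0:10
bin.dat = rep(x, round(N * pmf(x)))   # repeat each x in proportion to p(x)
hist(bin.dat, freq = FALSE, breaks = c(-.5, x + .5), col = "green")
axis(1, at = x)                       # label every integer value of x
axis(2)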

Example: Blood Type probabilities, revisited
For X = #(Type O) among n = 10 randomly selected individuals, X ~ Bin(10, .461), with $p(x) = \binom{10}{x} (.461)^x (.539)^{10 - x}$ (R: dbinom(0:10, 10, .461)).
Also, can show:
mean $\mu = \sum x \, p(x) = n\pi = (10)(.461) = 4.61$
variance $\sigma^2 = \sum (x - \mu)^2 \, p(x) = n\pi(1 - \pi) = 2.48$
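The shortcut formulas for the binomial mean and variance can be compared against the defining sums. A minimal R sketch, with illustrative variable names, for X ~ Bin(10, .461):

```r
# Bin(10, .461): probability table, then mean and variance from the definitions
n  <- 10
pi_O <- 0.461                        # P(Type O), from the blood type table
x   <- 0:n
pmf <- dbinom(x, n, pi_O)            # p(x) = P(X = x)
cdf <- pbinom(x, n, pi_O)            # F(x) = P(X <= x)

mu     <- sum(x * pmf)               # should match n * pi
sigma2 <- sum((x - mu)^2 * pmf)      # should match n * pi * (1 - pi)
print(data.frame(x = x, pmf = round(pmf, 4), cdf = round(cdf, 4)))
```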

Example: Blood Type probabilities, revisited (RARE EVENT!)
From the Blood Type table, P(Type AB–) = .007.
Suppose n = 1500 individuals are to be selected at random from the population. Probability table for X = #(Type AB–)
Binomial model applies. X ~ Bin(1500, .007)
Therefore, $p(x) = \binom{1500}{x} (.007)^x (.993)^{1500 - x}$, x = 0, 1, 2, …, 1500.
Also, can show mean $\mu = \sum x \, p(x) = n\pi = 10.5$ and variance $\sigma^2 = \sum (x - \mu)^2 \, p(x) = n\pi(1 - \pi) \approx 10.43$.
The probability histogram has a long positive skew as x → 1500, but the contribution of those values is ≈ 0.

Is there a better alternative? Poisson distribution:
$p(x) = \dfrac{e^{-\mu} \mu^x}{x!}$, x = 0, 1, 2, …, where mean and variance are $\mu = n\pi$ and $\sigma^2 = n\pi$.
Here $n\pi = 10.5$, so X ~ Poisson(10.5), approximately.
Notation: Sometimes the symbol $\lambda$ (“lambda”) is used instead of $\mu$ (“mu”).

Ex: Probability of exactly X = 15 Type(AB–) individuals = ?
Poisson: $\dfrac{e^{-10.5} (10.5)^{15}}{15!}$
Binomial: $\binom{1500}{15} (.007)^{15} (.993)^{1485}$
(both ≈ .0437)
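The closeness of the two models can be seen directly in R. This is a sketch with illustrative variable names, using the built-in `dbinom` and `dpois`.

```r
# Rare event: exact Binomial vs. Poisson approximation for P(X = 15)
n <- 1500
pi_AB <- 0.007                 # P(Type AB-), from the blood type table
lambda <- n * pi_AB            # Poisson mean, n * pi = 10.5

p_binom <- dbinom(15, n, pi_AB)
p_pois  <- dpois(15, lambda)
print(round(c(binomial = p_binom, poisson = p_pois), 4))
```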

Example: Deaths in Wisconsin
Assuming deaths among young adults are relatively rare, we know the following:
Average deaths per year $\lambda$ = 584.
Mortality rate ($\alpha$) seems constant.
Therefore, the Poisson distribution can be used as a good model to make future predictions about the random variable X = “# deaths” per year, for this population (15-24 yrs)… assuming current values will still apply.

Probability of exactly X = 600 deaths next year:
$P(X = 600) = \dfrac{e^{-584} (584)^{600}}{600!} = 0.0131$
R: dpois(600, 584)

Probability of exactly X = 1200 deaths in the next two years:
Mean of 584 deaths per yr ⇒ mean of 1168 deaths per two yrs, so let $\lambda$ = 1168:
$P(X = 1200) = \dfrac{e^{-1168} (1168)^{1200}}{1200!}$
R: dpois(1200, 1168)

Probability of at least one death per day:
$\lambda = 584 / 365 = 1.6$ deaths/day
P(X ≥ 1) = P(X = 1) + P(X = 2) + P(X = 3) + … True, but not practical.
$P(X \ge 1) = 1 - P(X = 0) = 1 - \dfrac{e^{-1.6} (1.6)^0}{0!} = 1 - e^{-1.6} = 0.798$
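The three calculations can be reproduced with `dpois`. A minimal sketch with illustrative variable names, taking the yearly mean of 584 from the example and assuming a 365-day year for the daily rate:

```r
# Poisson model for deaths per year, per two years, and per day
lambda_year <- 584
p_600  <- dpois(600, lambda_year)         # exactly 600 deaths next year
p_1200 <- dpois(1200, 2 * lambda_year)    # exactly 1200 deaths in two years

lambda_day <- lambda_year / 365           # 1.6 deaths per day
p_at_least_one <- 1 - dpois(0, lambda_day)  # complement of "no deaths"
print(round(c(p_600 = p_600, p_at_least_one = p_at_least_one), 4))
```

The complement rule avoids summing the infinitely many terms P(X = 1) + P(X = 2) + …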

Poisson Distribution (discrete)
$p(x) = \dfrac{e^{-\mu} \mu^x}{x!}$
For x = 0, 1, 2, …, this calculates P(x Events) in a random sample of n trials coming from a population with rare P(Event) = $\pi$, taking $\mu = n\pi$.
But it may also be used to calculate P(x Events) within a random interval of T time units, for a “Poisson process” having a known “Poisson rate” $\alpha$, taking $\mu = \alpha T$.
Recall… X = # “clicks” on a Geiger counter in normal background radiation.

By contrast, X = time between “clicks” on a Geiger counter in normal background radiation is a continuous variable. Time between events (failures, deaths, births, etc.) is often modeled by the Exponential Distribution (continuous). This is the subject of “Time-to-Event Analysis”, “Time-to-Failure Analysis”, “Reliability Analysis”, and “Survival Analysis”.

Classical Discrete Probability Distributions
Binomial ~ X = # Successes in n trials, P(Success) = $\pi$
Poisson ~ As above, but n large, $\pi$ small, i.e., Success RARE
Negative Binomial ~ X = # trials for k Successes, P(Success) = $\pi$
Geometric ~ As above, but specialized to k = 1
Hypergeometric ~ As Binomial, but $\pi$ changes between trials
Multinomial ~ As Binomial, but for multiple categories, with $\pi_1 + \pi_2 + \cdots + \pi_{last} = 1$ and $x_1 + x_2 + \cdots + x_{last} = n$

POPULATION: from a discrete to a continuous random variable X
Example: X = Cholesterol level (mg/dL)
Example: X = “reaction time”. “Pain Threshold” Experiment: Volunteers place one hand on metal plate carrying low electrical current; measure duration till hand withdrawn.
SAMPLE: density histograms of the data, drawn with time intervals = 5.0 secs, 2.0 secs, 1.0 secs, and 0.5 secs.
In principle, as the number of individuals in the samples increases without bound, the class interval widths can be made arbitrarily small, i.e., the scale at which X is measured can be made arbitrarily fine, since it is continuous.
“In the limit…” we obtain a density curve, with Total Area = 1.

“In the limit…” we obtain a density curve
f(x) = probability density function (pdf), with $f(x) \ge 0$ and Area = 1.
Cumulative probability $F(x) = P(X \le x)$ = Area under density curve up to x. F(x) increases continuously from 0 to 1.
As with discrete variables, the density f(x) is the height, NOT the probability p(x) = P(X = x). In fact, the zero area “limit” argument would seem to imply P(X = x) = 0 ??? (Later…)
However, we can define “interval probabilities” of the form $P(a \le X \le b)$, using cdf F(x).
An “interval probability” $P(a \le X \le b)$ can be calculated as the amount of area under the curve f(x) between a and b, or the difference $P(X \le b) - P(X \le a)$, i.e., $F(b) - F(a)$.
(Ordinarily, finding the area under a general curve requires calculus techniques… unless the “curve” is a straight line, for instance. Examples to follow…)

Consider the following continuous random variable…
Example: X = “Ages of children from 1 year old to 6 years old”. Further suppose that X is uniformly distributed over the interval [1, 6].

Density: f(x) = 0.2 for 1 ≤ x ≤ 6 (and 0 elsewhere).
Check? $f(x) \ge 0$ ✓. Base = 6 – 1 = 5, Height = 0.2, so Total Area = Base × Height = 5 × 0.2 = 1 ✓.

“What is the probability that a random child is 4 years old?” doesn’t mean P(X = 4), which is = 0 !!!!! The probability that a continuous random variable is exactly equal to any single value is ZERO! A single value is one point out of an infinite continuum of points on the real number line.
It actually means “What is the probability that a random child is between 4 and 5 years old?”, i.e., $P(4 \le X < 5) = (5 - 4)(0.2) = 0.2$.
NOTE: Since P(X = 5) = 0, no change for $P(4 \le X \le 5)$, $P(4 < X \le 5)$, or $P(4 < X < 5)$.

Cumulative probability $F(x) = P(X \le x)$ = Area under density curve up to x.
For any x in [1, 6], the area under the curve is F(x) = 0.2 (x – 1). F(x) increases continuously from 0 to 1 (compare with “staircase graph” for discrete case).

“What is the probability that a random child is under 5 years old?” F(5) = 0.8
“What is the probability that a random child is under 4 years old?” F(4) = 0.6
“What is the probability that a random child is between 4 and 5 years old?” $P(4 \le X \le 5) = F(5) - F(4) = 0.8 - 0.6 = 0.2$
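For the uniform example, R's built-in `dunif` (density) and `punif` (cdf) give the same numbers. A minimal sketch with illustrative variable names:

```r
# X uniform on [1, 6]: density height and cumulative probabilities
height <- dunif(4, min = 1, max = 6)   # 0.2 is the height f(4), NOT P(X = 4)
F4 <- punif(4, min = 1, max = 6)       # P(X <= 4)
F5 <- punif(5, min = 1, max = 6)       # P(X <= 5)
p_4_to_5 <- F5 - F4                    # P(4 <= X <= 5) = F(5) - F(4)
print(c(height = height, F4 = F4, F5 = F5, p_4_to_5 = p_4_to_5))
```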

Example: X = “Ages of children from 1 year old to 6 years old”
[Figure: density curve, and Cumulative Distribution Function F(x)]
Cumulative probability $F(x) = P(X \le x)$ = Area under density curve up to x
“What is the probability that a child is under 4 years old?”
“What is the probability that a child is under 5 years old?”
“What is the probability that a child is between 4 and 5?”

Fundamental Theorem of Calculus
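A continuous random variable X corresponds to a probability density function (pdf) f(x), whose graph is a density curve. f(x) is NOT a pmf! Its values are not probabilities; probabilities are areas under the curve. In summary, the cumulative distribution function (cdf) is $F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$, and by the Fundamental Theorem of Calculus, $f(x) = F'(x)$. F(x) increases continuously from 0 to 1. Moreover, $P(a \le X \le b) = \int_a^b f(x)\,dx = F(b) - F(a)$.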
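The relationship between a pdf and its cdf can be checked numerically. The sketch below is an illustration only: it assumes the standard normal distribution (covered later in these notes) as a convenient continuous example, and the point x0 and step size h are arbitrary choices.

```r
# Point at which to compare the pdf and the cdf (arbitrary choice)
x0 <- 1.2

# F(x0) as the area under the density curve from -Inf up to x0
area <- integrate(dnorm, lower = -Inf, upper = x0)$value

# f(x0) as the slope of the cdf at x0, via a central difference
h <- 1e-5
slope <- (pnorm(x0 + h) - pnorm(x0 - h)) / (2 * h)

print(c(area = area, cdf = pnorm(x0)))      # the two should agree closely
print(c(slope = slope, pdf = dnorm(x0)))    # the two should agree closely
```

Integrating the density recovers the cdf, and differentiating the cdf recovers the density, which is the content of the Fundamental Theorem of Calculus in this setting.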

SECTION 4.3 IN POSTED LECTURE NOTES

Four Examples: 1 For any b > 0, consider the following probability density function (pdf)... Determine the cumulative distribution function (cdf). For any x < 0, it follows that… For any … it follows that… Note: For any … it follows that… The resulting cdf is monotonic and continuous from 0 to 1.

Four Examples: 2 For any b > a > 0, consider the probability density function (pdf)... Determine the cumulative distribution function (cdf), working through each of the four ranges of x in turn: For any … it follows that… Then determine the mean and determine the variance.

WARNING: “IMPROPER INTEGRAL”
Four Examples: 3 Consider the following probability density function (pdf)... Confirm pdf. WARNING: “IMPROPER INTEGRAL”

Four Examples: 4 … does not exist!

[Figure: density histograms of X drawn with time intervals = 5.0 secs, 2.0 secs, 1.0 secs, and 0.5 secs, moving from DISCRETE toward CONTINUOUS; vertical axis “Density”.] Interval widths can be made arbitrarily small, i.e., the scale at which X is measured can be made arbitrarily fine, since it is continuous. As Δx → 0 and # rectangles → ∞, this “Riemann sum” approaches the area under the density curve f(x), expressed as a definite integral.
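The limiting argument can be illustrated numerically. This is a sketch under stated assumptions: the slide's density is not given in the transcript, so the standard normal density on [-5, 5] stands in for it, and riemann_area is a hypothetical helper that uses midpoint rectangles.

```r
# Midpoint Riemann sum: total area of rectangles of a given width under f
riemann_area <- function(f, lo, hi, width) {
  n_rect <- round((hi - lo) / width)             # number of rectangles
  mids <- lo + (seq_len(n_rect) - 0.5) * width   # midpoint of each base
  sum(f(mids) * width)                           # sum of base x height
}

widths <- c(5.0, 2.0, 1.0, 0.5)                  # same widths as the slide
areas <- sapply(widths, function(w) riemann_area(dnorm, -5, 5, w))

# Area under the curve on [-5, 5] from the cdf, for comparison
exact <- pnorm(5) - pnorm(-5)

print(data.frame(width = widths, area = areas, error = abs(areas - exact)))
```

With only two wide rectangles the sum is a poor estimate; as the rectangles narrow, the sum settles on the area given by the definite integral.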

~ The Normal Distribution ~ (a.k.a. “The Bell Curve”)
X ~ N(μ, σ), with mean μ and standard deviation σ
Johann Carl Friedrich Gauss
Symmetric, unimodal
Models many (but not all) natural systems
Mathematical properties make it useful to work with
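The transcript does not preserve the density formula, so the sketch below assumes the standard textbook form f(x) = exp(−(x − μ)² / (2σ²)) / (σ√(2π)) and checks it against R's built-in dnorm. The function name normal_density and the example parameters are illustrative.

```r
# Normal density written out from the textbook formula (assumed form)
normal_density <- function(x, mu, sigma) {
  exp(-((x - mu)^2) / (2 * sigma^2)) / (sigma * sqrt(2 * pi))
}

mu <- 25.4      # example mean
sigma <- 1.5    # example standard deviation
x <- seq(mu - 3 * sigma, mu + 3 * sigma, by = 0.5)

# Agreement with the built-in density
print(all.equal(normal_density(x, mu, sigma), dnorm(x, mean = mu, sd = sigma)))

# Symmetry about the mean: f(mu - d) equals f(mu + d)
d <- c(0.5, 1, 2)
print(all.equal(normal_density(mu - d, mu, sigma),
                normal_density(mu + d, mu, sigma)))
```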

Standard Normal Distribution
SPECIAL CASE: Z ~ N(0, 1), i.e., mean 0 and standard deviation 1. Total Area = 1. The cumulative distribution function (cdf) is denoted by Φ(z). It is not expressible in explicit, closed form, but is tabulated, and computable in R via the command pnorm.

Example: Standard Normal Distribution Z ~ N(0, 1). Find Φ(1.2) = P(Z ≤ 1.2), where 1.2 is the “z-score”. Total Area = 1.
Use the included table (Lecture Notes Appendix), or use R:
> pnorm(1.2)
The remaining area to the right is P(Z > 1.2) = 1 − Φ(1.2).
Note: Because this is a continuous distribution, P(Z = 1.2) = 0, so there is no difference between P(Z > 1.2) and P(Z ≥ 1.2), etc.
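A short sketch of the R lookup for this example, including the upper tail. The variable names are illustrative; pnorm and its lower.tail argument are base R.

```r
z <- 1.2

lower_tail <- pnorm(z)                       # Phi(1.2) = P(Z <= 1.2)
upper_tail <- pnorm(z, lower.tail = FALSE)   # P(Z > 1.2)

print(round(lower_tail, 4))   # 0.8849
print(round(upper_tail, 4))   # 0.1151

# The two tails together account for the total area of 1
print(all.equal(lower_tail + upper_tail, 1))
```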

Z ~ N(0, 1) versus X ~ N(μ, σ): Why be concerned about the standard normal, when most “bell curves” don’t have mean = 0 and standard deviation = 1? Any normal distribution can be transformed to the standard normal distribution via a simple change of variable, Z = (X − μ) / σ.

Example: Random Variable X = Age at first birth, POPULATION, Year 2010. X ~ N(25.4, 1.5), i.e., μ = 25.4 and σ = 1.5.
Question: What proportion of the population had their first child before the age of 27.2 years old? P(X < 27.2) = ?
The x-score = 27.2 must first be transformed to a corresponding z-score: z = (27.2 − 25.4) / 1.5 = 1.2. Hence P(X < 27.2) = P(Z < 1.2) = Φ(1.2).
Using R:
> pnorm(27.2, 25.4, 1.5)
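The z-score route and the direct pnorm call give the same answer. A minimal sketch using the numbers from the example; the variable names are illustrative.

```r
mu <- 25.4      # mean age at first birth
sigma <- 1.5    # standard deviation
x <- 27.2       # age of interest

# Standardize: how many standard deviations x lies above the mean
z <- (x - mu) / sigma

p_via_z  <- pnorm(z)                         # P(Z < z) on the standard scale
p_direct <- pnorm(x, mean = mu, sd = sigma)  # P(X < x) on the original scale

print(round(z, 2))                      # 1.2
print(round(c(p_via_z, p_direct), 4))   # both 0.8849
```

So roughly 88% of this population had their first child before age 27.2 under the stated model.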

Z ~ N(0, 1): What symmetric interval about the mean 0 contains 95% of the population values? That is, find −z.025 and +z.025 such that the central area is 0.95, leaving 0.025 in each tail.
Use the included table (Lecture Notes Appendix), or use R:
> qnorm(.025)
> qnorm(.975)
Result: −z.025 = −1.96 and +z.025 = +1.96 (the “.025 critical values”).
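A short sketch of the inverse lookup: qnorm takes an area to the left and returns the corresponding z-value. The variable names are illustrative.

```r
# Left-tail areas that cut off 2.5% in each tail
tail_areas <- c(0.025, 0.975)

z_crit <- qnorm(tail_areas)
print(round(z_crit, 2))     # -1.96  1.96

# Check: the area between the two critical values is 0.95
coverage <- pnorm(z_crit[2]) - pnorm(z_crit[1])
print(coverage)
```

qnorm and pnorm are inverses of one another, which is why feeding the critical values back into pnorm recovers the central area.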

X ~ N(25.4, 1.5): What symmetric interval about the mean age of 25.4 contains 95% of the population values? Transform the “.025 critical values” ±1.96 of Z ~ N(0, 1) back to the X scale: 25.4 ± 1.96 (1.5), i.e., 22.46 ≤ X ≤ 28.34 yrs.
Using R:
> areas = c(.025, .975)
> qnorm(areas, 25.4, 1.5)
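The interval on the age scale can be obtained either directly or by un-standardizing the Z critical values with x = μ + zσ. A minimal sketch with the example's parameters; the variable names are illustrative.

```r
mu <- 25.4
sigma <- 1.5
areas <- c(0.025, 0.975)

# Direct: quantiles of N(25.4, 1.5)
x_direct <- qnorm(areas, mean = mu, sd = sigma)

# Via Z: critical values on the standard scale, mapped back to ages
x_via_z <- mu + qnorm(areas) * sigma

print(round(x_direct, 2))   # 22.46 28.34
print(all.equal(x_direct, x_via_z))
```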

Similarly… Z ~ N(0, 1): What symmetric interval about the mean 0 contains 90% of the population values? Here the central area is 0.90, leaving 0.05 in each tail, so find −z.05 and +z.05.
Use the included table: the cumulative area 0.95 falls midway between the table entries for z = 1.64 and z = 1.65, so average 1.64 and 1.65 to get 1.645.
Use R:
> qnorm(.05)
> qnorm(.95)
Result: −z.05 = −1.645 and +z.05 = +1.645 (the “.05 critical values”).

In general… Z ~ N(0, 1): What symmetric interval about the mean 0 contains 100(1 – α)% of the population values? The central area is 1 – α, leaving α / 2 in each tail, and the endpoints −z α/2 and +z α/2 are the “α / 2 critical values”. The previous case, with central area 0.90 and the “.05 critical values”, corresponds to α = 0.10.
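The general recipe can be wrapped in a small function. This is a sketch; critical_values is a hypothetical helper name, and alpha stands for the α of the slide.

```r
# Symmetric interval about 0 containing 100 * (1 - alpha)% of N(0, 1)
critical_values <- function(alpha) {
  stopifnot(alpha > 0, alpha < 1)
  z <- qnorm(1 - alpha / 2)     # leaves alpha / 2 in the upper tail
  c(lower = -z, upper = z)
}

cv_90 <- critical_values(0.10)  # 90% interval: about -1.645 and 1.645
cv_95 <- critical_values(0.05)  # 95% interval: about -1.96 and 1.96

print(round(cv_90, 3))
print(round(cv_95, 3))
```

Smaller α means a larger central area and therefore wider critical values.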

Normal Approximation to the Binomial Distribution
Suppose a certain outcome exists in a population, with constant probability π. We will select a random sample of n individuals, so that the binary “Success vs. Failure” outcome of any individual is independent of the binary outcome of any other individual, i.e., n Bernoulli trials (e.g., coin tosses). P(Success) = π, P(Failure) = 1 – π. Discrete random variable X = # Successes in sample (0, 1, 2, 3, …, n). Then X is said to follow a Binomial distribution, written X ~ Bin(n, π), with “probability function” $p(x) = \binom{n}{x}\,\pi^{x}(1-\pi)^{n-x}$, x = 0, 1, 2, …, n. The binomial is discrete, while the normal distribution used to approximate it is continuous.
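The binomial probability function can be written out and compared with R's dbinom. This sketch assumes the standard form p(x) = C(n, x) π^x (1 − π)^(n − x); the name binom_pmf is illustrative, and p stands in for π because pi is already a constant in R.

```r
# Binomial pmf from the formula; p plays the role of the slide's pi
binom_pmf <- function(x, n, p) {
  choose(n, x) * p^x * (1 - p)^(n - x)
}

n <- 100
p <- 0.2
x <- 0:n

pmf_formula <- binom_pmf(x, n, p)
pmf_builtin <- dbinom(x, size = n, prob = p)

print(all.equal(pmf_formula, pmf_builtin))  # formula matches dbinom
print(sum(pmf_formula))                     # probabilities sum to 1
```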

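For X ~ Bin(100, 0.2):
> dbinom(10, 100, .2)
gives P(X = 10), the area of the single histogram rectangle at x = 10.
> pbinom(10, 100, .2)
gives P(X ≤ 10), the cumulative area of the rectangles up to and including x = 10.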
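The exact binomial areas can be set beside their normal approximations. This is a sketch under two assumptions not stated in the transcript: the approximating normal has mean nπ and standard deviation √(nπ(1 − π)), and a continuity correction of 0.5 is added to account for replacing rectangles with a smooth curve.

```r
n <- 100
p <- 0.2                        # the slide's pi
mu <- n * p                     # mean of X
sigma <- sqrt(n * p * (1 - p))  # standard deviation of X

k <- c(10, 20, 30)              # cutoffs at which to compare P(X <= k)

exact  <- pbinom(k, size = n, prob = p)
approx <- pnorm(k + 0.5, mean = mu, sd = sigma)  # continuity-corrected

print(data.frame(k = k, exact = exact, approx = approx,
                 abs_diff = abs(exact - approx)))
```

The agreement is closer near the center of the distribution than far out in the tail, where the relative error of the approximation is larger.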

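“Sampling Distribution” of the sample proportion
Therefore, if X ~ Bin(n, π) with nπ ≥ 15 and n(1 – π) ≥ 15, then X is approximately normal: $X \approx N\big(n\pi,\ \sqrt{n\pi(1-\pi)}\big)$. That is, the sample proportion $\hat{\pi} = X/n$ satisfies $\hat{\pi} \approx N\big(\pi,\ \sqrt{\pi(1-\pi)/n}\big)$, which is the “Sampling Distribution” of $\hat{\pi}$.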
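A small simulation can make the sampling distribution concrete. This sketch assumes the usual result that the sample proportion X/n is approximately normal with mean π and standard deviation √(π(1 − π)/n); the seed, the number of simulated samples, and the variable names are arbitrary choices.

```r
set.seed(1)                 # arbitrary seed, for reproducibility

n <- 100                    # sample size; n*p = 20 and n*(1-p) = 80, both >= 15
p <- 0.2                    # the slide's pi
n_samples <- 10000          # number of simulated samples

# One sample proportion per simulated sample
p_hat <- rbinom(n_samples, size = n, prob = p) / n

theory_sd <- sqrt(p * (1 - p) / n)

print(c(simulated_mean = mean(p_hat), theory_mean = p))
print(c(simulated_sd = sd(p_hat), theory_sd = theory_sd))
```

A histogram of p_hat (for example, hist(p_hat)) would show the bell shape centered near 0.2.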

Classical Continuous Probability Distributions
Normal distribution
Log-Normal ~ X is not normally distributed (e.g., skewed), but Y = “logarithm of X” is normally distributed
Student’s t-distribution ~ Similar to normal distribution, more flexible
F-distribution ~ Used when comparing multiple group means
Chi-squared distribution ~ Used extensively in categorical data analysis
Others for specialized applications ~ Gamma, Beta, Weibull…
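The log-normal entry can be illustrated in R: if Y = log(X) is normal, then P(X ≤ x) = P(Y ≤ log x). A minimal sketch; the parameter values are arbitrary, and plnorm is base R's log-normal cdf, parameterized by the mean and standard deviation of log(X).

```r
meanlog <- 0     # mean of Y = log(X)         (arbitrary example value)
sdlog <- 0.5     # standard deviation of Y    (arbitrary example value)
x <- c(0.5, 1, 2, 4)

p_lognormal <- plnorm(x, meanlog = meanlog, sdlog = sdlog)   # P(X <= x)
p_normal    <- pnorm(log(x), mean = meanlog, sd = sdlog)     # P(Y <= log x)

print(all.equal(p_lognormal, p_normal))
```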
