Published by Lisandro Jayne. Modified over 2 years ago.

1
Statistical Inference and Regression Analysis: Stat-GB, Stat-UB
Professor William Greene
Stern School of Business
IOMS Department
Department of Economics
Part 1 – Probability and Distribution Theory

2
Part 1 – Probability and Distribution Theory

3
1 – Probability

4
Sample Space
Random outcomes: the result of a process, such as a sequence of events, a number of events, or a measurement of a length of time, space, etc.
Outcomes, experiments, and sample spaces

5
Consumer Choice: 4 possible ways a randomly chosen traveler might travel between Sydney and Melbourne
$\Omega$ = {Air, Train, Bus, Car}

6
Market Behavior: Fair Isaac's credit card service to major vendors
$\Omega$ = {Reject, Accept}

7
Measurement of Lifetimes
A box of light bulbs states "Average life is 1500 hours."
Outcome = length of time until failure (lifetime) of a randomly chosen light bulb
$\Omega$ = {lifetime | lifetime > 0}

8
Events
Events are defined as:
  Subsets of the sample space, such as the empty set $\emptyset$
  Intersections of related events
  Complements, such as A and not A
  Disjoint sets, such as (Train, Bus) and (Air, Car)
Any subset, including $\Omega$, is a disjoint union of subsets: $\Omega$ = (Air, Train) $\cup$ (Bus, Car)

9
Probability is a Measure
The events on the sample space $\Omega$ form a $\sigma$-field, which:
  Contains at least one nonempty subset (event)
  Is closed under complementarity
  Is closed under countable union
Probability is a measure defined on all subsets of $\Omega$.
Axioms of Probability
  $P(\Omega) = 1$
  $\forall A \subseteq \Omega$, $P(A) \ge 0$
  If $A \cap B = \emptyset$, $P(A \cup B) = P(A) + P(B)$

10
Implications of the Axioms
  $P(\sim A) = 1 - P(A)$, as $A \cup \sim A = \Omega$
  $P(\emptyset) = 0$, as $\emptyset = \sim\Omega$ and $P(\Omega) = 1$
  $A \subseteq B \Rightarrow P(A) \le P(B)$, as $B = A \cup (\sim A \cap B)$
  $P(A \cup B) = P(A) + P(B) - P(A \cap B)$

11
Probability
Assigning probability: size of an event relative to size of sample space.
Counting rules for equally likely discrete outcomes
  Using combinations and permutations to count elements
  Example: Discrete uniform, poker hands
  Example (hypergeometric): the super committee (House 242R, 193D; Senate 49R, 51D&I)
Measurement for continuous outcomes

12
Applications: Games of Chance; Poker
In a 5-card hand from a deck of 52, there are (52*51*50*49*48)/(5*4*3*2*1) = 2,598,960 different possible hands. (Order doesn't matter.)
How many of these hands have 4 aces? 48 = the 4 aces plus any of the remaining 48 cards.
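The counts on this slide can be checked directly. The following is a minimal sketch using only Python's standard library (`math.comb`, available from Python 3.8); the variable names are illustrative and not part of the original slides.

```python
from math import comb

# Number of distinct 5-card hands from a 52-card deck (order does not matter).
total_hands = comb(52, 5)

# Hands containing all 4 aces: the 4 aces plus any one of the remaining 48 cards.
four_ace_hands = comb(4, 4) * comb(48, 1)

# With equally likely hands, probability = favorable count / total count.
prob_four_aces = four_ace_hands / total_hands

print(total_hands)      # 2598960
print(four_ace_hands)   # 48
print(prob_four_aces)
```

Because every hand is equally likely, the probability is just the ratio of the two counts, which is the "size of an event relative to size of sample space" rule.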

13
Some Poker Hands
Royal Flush – Top 5 cards in a suit
Straight Flush – 5 sequential cards in the same suit
4 of a kind – 4 cards of one kind plus any other card
Full House – 3 of one kind, 2 of another. (Also called a boat.)
Flush – 5 cards in a suit, not sequential
Straight – 5 cards in a numerical row, not the same suit

14
5 Card Poker Hands

15
The Dead Man's Hand
The dead man's hand is 5 cards: 2 aces, 2 8s, and some other 5th card. (Wild Bill Hickok was holding this hand when he was shot in the back and killed in 1876.)
The number of hands with two aces and two 8s is $\binom{4}{2}\binom{4}{2} \cdot 44 = 1{,}584$.
The rest of the story claims that Hickok held all black cards (the bullets). The probability for this hand falls to only 44/2,598,960. (The four cards in the picture and one of the remaining 44.)
Some claims have been made about the 5th card, but no one is sure; there is no record.
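A short sketch of the counting behind this slide, assuming the fifth card is any of the 44 cards that are neither an ace nor an 8. The variable names are illustrative.

```python
from math import comb

total_hands = comb(52, 5)

# Two of the 4 aces, two of the 4 eights, and any of the 44 cards
# that are neither an ace nor an eight.
two_aces_two_eights = comb(4, 2) * comb(4, 2) * 44

# All-black version: the two black aces and two black eights are fixed,
# leaving only the choice of the fifth card among the remaining 44.
black_aces_eights = 1 * 1 * 44

print(two_aces_two_eights)                 # 1584
print(two_aces_two_eights / total_hands)
print(black_aces_eights / total_hands)
```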

16
Budget Supercommittee

17
Conditional Probability
P(A|B) = P(A,B)/P(B) = size of A relative to a subset of $\Omega$
Basic result: P(A,B) = P(A|B) P(B) (follows from the definition)
Bayes' theorem
Applications: mammography, drug testing, lie detector test, PSA test.

18
Using Conditional Probabilities: Bayes' Theorem

19
Drug Testing
Data
  P(Test correctly indicates disease) = .98 (Sensitivity)
  P(Test correctly indicates absence) = .95 (Specificity)
  P(Disease) = .005 (Fairly rare)
Notation
  + = test indicates disease, – = test indicates no disease
  D = presence of disease, N = absence of disease
Data in this notation
  P(D) = .005 (Incidence of the disease)
  P(+|D) = .98 (Correct detection of the disease)
  P(–|N) = .95 (Correct failure to detect the disease)
What are P(D|+) and P(N|–)?
Note: P(D|+) = the probability that a patient actually has the disease when the test says they do.

20
More Information
Deduce: Since P(+|D) = .98, we know P(–|D) = .02 because P(–|D) + P(+|D) = 1. [P(–|D) is P(False negative).]
Deduce: Since P(–|N) = .95, we know P(+|N) = .05 because P(–|N) + P(+|N) = 1. [P(+|N) is P(False positive).]
Deduce: Since P(D) = .005, P(N) = .995 because P(D) + P(N) = 1.

21
Now, Use Bayes' Theorem
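The slide's worked calculation did not survive extraction, so the following is a sketch of how the two posterior probabilities could be computed from the data given earlier, using Bayes' theorem in the form P(H|E) = P(E|H)P(H) / [P(E|H)P(H) + P(E|not H)P(not H)]. The function and variable names are hypothetical.

```python
def bayes_posterior(prior, likelihood, likelihood_complement):
    """P(H | E) from P(H), P(E | H) and P(E | not H)."""
    numerator = likelihood * prior
    evidence = numerator + likelihood_complement * (1.0 - prior)
    return numerator / evidence

p_d = 0.005            # P(D), incidence of the disease
sensitivity = 0.98     # P(+ | D)
specificity = 0.95     # P(- | N)

# P(D | +): the evidence is a positive test; P(+ | N) = 1 - specificity.
p_d_given_pos = bayes_posterior(p_d, sensitivity, 1.0 - specificity)

# P(N | -): the hypothesis is "no disease", the evidence is a negative test;
# P(- | D) = 1 - sensitivity.
p_n_given_neg = bayes_posterior(1.0 - p_d, specificity, 1.0 - sensitivity)

print(p_d_given_pos)
print(p_n_given_neg)
```

With these inputs P(D|+) comes out just under 9 percent: because the disease is rare, most positive results are false positives even though the test is accurate.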

22
Independent Events
Definition: P(A|B) = P(A)
Multiplication rule: P(A,B) = P(A)P(B)
Application: Infectious disease transmission

23
2 – Random Variables

24
Random Variable
Definition: Maps elements of the sample space to a single variable; assigns a number to each outcome in $\Omega$
Discrete: Payoff to poker hands
Continuous: Light bulb lifetimes
Mixed: Ticket sales with capacity constraints (censoring)

25
Market Behavior: Fair Isaac's credit card service to major vendors
$\Omega$ = {Reject, Accept}
X = 0 (reject), 1 (accept)

26
Caribbean Stud Poker
[Figure with columns: Sample Space, Probability, Variable]

27
Features of Random Variables
Probability distribution
  Mass function: Prob(X = x) = f(x)
  Density function: f(x), x = ...
Cumulative probabilities; CDF: Prob(X $\le$ x) = F(x)
Quantiles: x such that F(x) = Q
Median: x = median, Q = 0.5

28
Discrete Random Variables
Elemental building block
  Bernoulli: Credit card applications
Discrete uniform: Die toss
Counting rules
  Binomial: Family composition
  Hypergeometric: House/Senate Supercommittee
Models
  Poisson: Diabetes incidence, accidents, etc.

29
Market Behavior: Fair Isaac's credit card service to major vendors
X = 0 (reject), 1 (accept)
$\text{Prob}(X = x) = (1 - p)^{1 - x} p^{x}$, x = 0, 1

30
Binomial
Sum of n Bernoulli trials

31
Examples

32
Poisson
Approximation to binomial
General model for a type of process

33
Poisson Approximation to Binomial

34
Diabetes Incidence per 1000

35
Poisson Distribution of Disease Cases in 1000 Draws with $\lambda = 7$
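To illustrate the Poisson approximation to the binomial, the sketch below compares the two mass functions for 1000 draws. It assumes an incidence of 7 per 1000 (p = 0.007), which is what a Poisson mean of 7 in 1000 draws implies, although the slide does not state p explicitly. Function names are illustrative.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """P(X = k) for the number of successes in n Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1.0 - p)**(n - k)

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson variable with mean lam."""
    return exp(-lam) * lam**k / factorial(k)

n, p = 1000, 0.007      # assumed: 1000 draws, incidence 7 per 1000
lam = n * p             # Poisson mean matched to the binomial mean

for k in range(0, 15, 2):
    print(k, round(binomial_pmf(k, n, p), 5), round(poisson_pmf(k, lam), 5))

# Largest pointwise gap between the two mass functions over k = 0..30.
max_gap = max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(31))
print(max_gap)
```

The approximation works well here because n is large and p is small while the mean np stays moderate.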

36
Poisson Process: Doctor visits in the survey year by people in a sample of 27,326. $\lambda$ = .8
The Poisson probability model is a description of this process, not an approximation.

37
Continuous RV
Density function, f(x)
Probability measure P(event) obtained using the density.
Application: Light bulb lifetimes?

38
Probability Density Function; PDF

39
CDF and Quantiles
pth quantile, 0 < p < 1
Quantile = $x_p$ such that $F(x_p) = p$, so $x_p = F^{-1}(p)$.
For p = .5, $x_p$ = median

40
Model for Light Bulb Lifetimes
This is the exponential model for lifetimes. The model is $f(\text{time}) = (1/\mu)\, e^{-\text{time}/\mu}$.

41
Model for Light Bulb Lifetimes
The area under the entire curve is 1.0.

42
Continuous Distribution
A partial area will be between 0.0 and 1.0, and will produce a probability. The probability associated with an interval such as 1000 < LIFETIME < 2000 equals the area under the curve from the lower limit to the upper.

43
Probability of a Single Value Is Zero
The probability associated with a single point, such as LIFETIME = 2000, equals 0.0.

44
Probabilities via the CDF

45
Probability for a Range of Values Based on CDF
Prob(Life < 2000) (.7364)
minus Prob(Life < 1000) (.4866)
equals Prob(1000 < Life < 2000) (.2498)
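These three numbers can be reproduced from the exponential CDF, F(t) = 1 - exp(-t/μ). The sketch below assumes μ = 1500 hours, the average life quoted on the light bulb box earlier in the deck; the function name is illustrative.

```python
from math import exp

def exponential_cdf(t, mean):
    """F(t) = 1 - exp(-t / mean) for t > 0, and 0 otherwise."""
    return 1.0 - exp(-t / mean) if t > 0 else 0.0

mean_life = 1500.0   # average life in hours (assumed from the earlier slide)

p_below_2000 = exponential_cdf(2000.0, mean_life)
p_below_1000 = exponential_cdf(1000.0, mean_life)
p_between = p_below_2000 - p_below_1000

print(round(p_below_2000, 4))   # 0.7364
print(round(p_below_1000, 4))   # 0.4866
print(round(p_between, 4))      # 0.2498
```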

46
Common Continuous RVs
Continuous random variables are all models; they do not occur in nature.
The model builder's toolkit:
  Continuous uniform
  Exponential
  Normal
  Lognormal
  Gamma
  Beta
Defined for specific types of outcomes

47
Continuous Uniform
$f(x) = 1/(b - a)$, $a \le x \le b$
$F(x) = (x - a)/(b - a)$, $a \le x \le b$

48
Exponential
$f(x) = \theta \exp(-\theta x)$, x > 0, 0 otherwise
$F(x) = 1 - \exp(-\theta x)$, x > 0
Median: $F(M) = .5 \Rightarrow 1 - \exp(-\theta M) = .5 \Rightarrow \exp(-\theta M) = .5 \Rightarrow -\theta M = \ln .5 \Rightarrow M = -\ln .5/\theta = (\ln 2)/\theta$

50
Gamma Density
Uses the gamma function

51
Gamma Distributed Random Variable
Used to model nonnegative random variables, e.g., survival of people and electronic components
Two special cases:
  P = 1 is the exponential distribution
  P = ½ and $\lambda$ = ½ is the chi-squared with one degree of freedom

52
Beta
Uses beta integrals

53
Normal Density – The Model
Mean = μ, standard deviation = σ

54
Normal Distributions
The scale and location (on the horizontal axis) depend on μ and σ. The shape of the distribution is always the same. (Bell curve)

56
Standard Normal Density (0,1)

57
Lognormal Distribution

58
Censoring and Truncation
Censoring
  Observation mechanism: values above or below a certain value are assigned the boundary value.
  Applications: ticket market (demand vs. sales given capacity constraints); top-coded income data
Truncation
  Observation mechanism: the relevant distribution only applies in a restricted range of the random variable.
  Application: on-site survey for recreation visits (truncated Poisson)
Incidental truncation: Income is observed only for those whose wealth (not income) exceeds $100,000.

59
Truncated Random Variable
Untruncated variable has density f(x)
Truncated variable has density f(x)/Prob(x is in range)
Truncated Normal:

60
$F(x \mid x > X_L)$
Truncated Normal: f(x | x > a) = f(x)/Prob(x > a)

61
Truncated Poisson
$f(x) = \exp(-\lambda)\lambda^{x}/\Gamma(x+1)$
$f(x \mid x > 0) = f(x)/\text{Prob}(x > 0) = f(x)/[1 - \text{Prob}(x = 0)] = \{\exp(-\lambda)\lambda^{x}/\Gamma(x+1)\}/\{1 - \exp(-\lambda)\}$
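A small sketch of the truncated Poisson mass function, checking that the rescaled probabilities sum to one over x = 1, 2, .... The value of the Poisson mean used here is illustrative only, and the function names are hypothetical.

```python
from math import exp, lgamma, log

def poisson_pmf(x, lam):
    """f(x) = exp(-lam) * lam**x / Gamma(x + 1), computed on the log scale."""
    return exp(-lam + x * log(lam) - lgamma(x + 1))

def truncated_poisson_pmf(x, lam):
    """f(x | x > 0) = f(x) / [1 - Prob(x = 0)], defined for x = 1, 2, ..."""
    if x < 1:
        return 0.0
    return poisson_pmf(x, lam) / (1.0 - exp(-lam))

lam = 0.8   # illustrative value only
total = sum(truncated_poisson_pmf(x, lam) for x in range(1, 100))
print(total)
```

Dividing by Prob(x > 0) raises every remaining probability by the same factor, so the truncated distribution is again a proper distribution.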

62
Representations of a Continuous Random Variable
Representations
  Density, f(x)
  CDF, F(x) = Prob(X $\le$ x)
  Survival, S(x) = Prob(X > x) = 1 – F(x)
  Hazard function, h(x) = –d ln S(x)/dx
Representations are one to one: each uniquely determines the distribution of the random variable.

63
Application: A Memoryless Process
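The slide's own derivation is not recoverable, but the exponential distribution is the standard memoryless continuous model: its survival function is S(t) = exp(-θt), so Prob(X > s + t | X > s) = S(s + t)/S(s) = S(t). The sketch below checks this numerically; the rate (a 1500-hour mean lifetime) and the two time points are assumptions chosen for illustration.

```python
from math import exp

def survival(t, theta):
    """S(t) = Prob(X > t) = exp(-theta * t) for an exponential variable."""
    return exp(-theta * t)

theta = 1.0 / 1500.0    # rate implied by a 1500-hour mean lifetime (assumed)
s, t = 1000.0, 500.0    # hours already survived, additional hours

# Prob(X > s + t | X > s) = S(s + t) / S(s)
conditional = survival(s + t, theta) / survival(s, theta)
unconditional = survival(t, theta)

print(conditional, unconditional)
```

The two printed values agree up to rounding error, which corresponds to a constant hazard h(x) = –d ln S(x)/dx = θ.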

64
A Change of Variable
Theorem: x is a continuous RV with continuous density f(x), and y = g(x) is a monotonic function over the range of x. Then
$f(y) = f(x(y))\,|dx(y)/dy| = f(x(y))\,|dg^{-1}(y)/dy|$

65
Change of Variable Applications
Standardized normal
Lognormal to normal
Fundamental probability transform

66
Standardized Normal
$X \sim N[\mu, \sigma^2]$
Prob[X < a] = F(a)
$\text{Prob}[X < a] = \text{Prob}[(X - \mu)/\sigma < (a - \mu)/\sigma]$
$y = (x - \mu)/\sigma$
$J = dx(y)/dy = \sigma$
$f(y) = \sigma f(\sigma y + \mu) = [1/\sqrt{2\pi}]\exp(-y^2/2)$
Only a table for the standard normal is needed.

67
Textbooks Provide Tables of Areas for the Standard Normal
Econometric Analysis, WHG, 2008, Appendix G, page 1093; Rice, Table 2
Note that values are given only for z from 0.00 upward. No values are given for negative z.

68
Computing Probabilities
Standard normal tables give probabilities when μ = 0 and σ = 1.
For other cases, do we need another table?
Probabilities for other cases are obtained by standardizing.
  Standardized variable is z = (x – μ)/σ
  z has mean 0 and standard deviation 1

69
Standard Normal Density

70
Standard Normal Distribution Facts
The random variable z runs from $-\infty$ to $+\infty$.
$\phi(z) > 0$ for all z, but for |z| > 4, it is essentially 0.
The total area under the curve equals 1.0.
The curve is symmetric around 0. (The normal distribution generally is symmetric around μ.)

71
Only Half the Table Is Needed
The area to the left of 0.0 is exactly 0.5.

72
Only Half the Table Is Needed
The area left of 1.60 is exactly 0.5 plus the area between 0.0 and 1.60.

73
Areas Left of Negative Z
Area left of –1.6 equals area right of +1.6.
Area right of +1.6 equals 1 – area to the left of +1.6.

74
Computing Probabilities by Standardizing: Example
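The slide's own example is not recoverable, so the following is a stand-in sketch of standardizing. It writes the standard normal CDF with the error function and compares the result with `statistics.NormalDist` from the standard library. The values of μ, σ, and a are hypothetical, chosen so that the standardized value is 1.6, the value used in the table-lookup slides.

```python
from math import erf, sqrt
from statistics import NormalDist

def std_normal_cdf(z):
    """Phi(z), the standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def normal_cdf(x, mu, sigma):
    """Prob(X <= x) for X ~ N[mu, sigma^2], obtained by standardizing."""
    z = (x - mu) / sigma
    return std_normal_cdf(z)

mu, sigma = 10.0, 2.0    # hypothetical values, not from the slides
a = 13.2                 # (a - mu) / sigma is 1.6 up to rounding

p_standardized = normal_cdf(a, mu, sigma)
p_direct = NormalDist(mu, sigma).cdf(a)

# Symmetry: the area left of -1.6 equals 1 minus the area left of +1.6.
left_of_minus = std_normal_cdf(-1.6)
right_of_plus = 1.0 - std_normal_cdf(1.6)

print(p_standardized, p_direct)
print(left_of_minus, right_of_plus)
```

The symmetry check is the reason a table covering only nonnegative z is sufficient.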

75
Lognormal Distribution

76
Lognormal Distribution of Monthly Wages in NLS

77
Log of Lognormal Variable

78
Fundamental Probability Transformation

79
Random Number Generation
The CDF is a monotonic function of x
If u = F(x), then $x = F^{-1}(u)$
We can generate u with a computer
Example: Exponential
Example: Normal

80
Generating Random Samples
Exponential
  $u = F(x) = 1 - \exp(-\theta x)$
  $1 - u = \exp(-\theta x)$
  $x = (-1/\theta)\ln(1 - u)$
Normal ($\mu$, $\sigma$)
  $u = \Phi(z)$
  $z = \Phi^{-1}(u)$
  $x = \sigma z + \mu = \sigma\Phi^{-1}(u) + \mu$
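The two inversion recipes can be sketched in Python with the standard library: `random.Random` supplies the uniform draws and `statistics.NormalDist().inv_cdf` supplies the inverse standard normal CDF. The parameter values, seed, and function names are illustrative assumptions.

```python
import random
from math import log
from statistics import NormalDist

def exponential_from_uniform(u, theta):
    """x = (-1/theta) * ln(1 - u), the inverse of F(x) = 1 - exp(-theta * x)."""
    return -log(1.0 - u) / theta

def normal_from_uniform(u, mu, sigma):
    """x = sigma * Phi^{-1}(u) + mu, valid for 0 < u < 1."""
    return sigma * NormalDist().inv_cdf(u) + mu

def uniform_open(rng):
    """Draw u in (0, 1); redraw in the rare case that u == 0.0."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u

rng = random.Random(12345)                  # fixed seed so the draws are replicable
theta, mu, sigma = 1.0 / 1500.0, 0.0, 1.0   # illustrative parameter values

exp_draws = [exponential_from_uniform(uniform_open(rng), theta) for _ in range(10000)]
norm_draws = [normal_from_uniform(uniform_open(rng), mu, sigma) for _ in range(10000)]

print(sum(exp_draws) / len(exp_draws))     # should be near 1/theta = 1500
print(sum(norm_draws) / len(norm_draws))   # should be near mu = 0
```

The guard against u = 0 is there because the inverse normal CDF is undefined at the endpoints of the unit interval.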

81
U[0,1] Generation
Linear congruential generator: x(n) = (a x(n-1) + b) mod m
Properties of RNGs
  Replicability: they are not RANDOM
  Period
  Randomness tests
The Mersenne twister: current state of the art (of pseudo-random number generation)
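A minimal sketch of a linear congruential generator, scaled to [0, 1). The constants a, b, and m are one commonly cited parameter set, chosen here only for illustration; they do not come from the slides, and a generator this simple is not recommended for serious simulation work.

```python
def lcg(seed, a=1664525, b=1013904223, m=2**32):
    """Yield U[0,1) draws from x(n) = (a * x(n-1) + b) mod m."""
    x = seed % m
    while True:
        x = (a * x + b) % m
        yield x / m

def take(generator, count):
    """Collect the first `count` values from a generator."""
    return [next(generator) for _ in range(count)]

first_run = take(lcg(seed=2024), 5)
second_run = take(lcg(seed=2024), 5)

print(first_run)
print(first_run == second_run)   # True: same seed, same "random" sequence
```

The equality check demonstrates the replicability property: the sequence is fully determined by the seed. Python's own `random` module uses the Mersenne twister as its core generator.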

82
3 – Joint Distributions

83
Jointly Distributed Random Variables
Usually some kind of association between the variables, e.g., two different financial assets
Joint CDF for two random variables: F(x, y) = Prob(X $\le$ x, Y $\le$ y)

84
Probability of a Rectangle
$\text{Prob}[a_1 < x \le b_1,\ a_2 < y \le b_2] = F(b_1, b_2) - F(b_1, a_2) - F(a_1, b_2) + F(a_1, a_2)$
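The rectangle formula can be checked on a case where the answer is known another way. The sketch below assumes two independent exponential variables (an illustrative choice, not from the slides), so the joint CDF is the product of the marginals and the rectangle probability must equal the product of two interval probabilities.

```python
from math import exp

def exp_cdf(x, theta):
    """Marginal CDF F(x) = 1 - exp(-theta * x) for x > 0, else 0."""
    return 1.0 - exp(-theta * x) if x > 0 else 0.0

def joint_cdf(x, y, theta_x=1.0, theta_y=0.5):
    """Joint CDF of two independent exponentials: F(x, y) = F_X(x) * F_Y(y)."""
    return exp_cdf(x, theta_x) * exp_cdf(y, theta_y)

def rectangle_probability(F, a1, b1, a2, b2):
    """Prob[a1 < X <= b1, a2 < Y <= b2] from a joint CDF F."""
    return F(b1, b2) - F(b1, a2) - F(a1, b2) + F(a1, a2)

a1, b1, a2, b2 = 0.5, 2.0, 1.0, 3.0
from_joint = rectangle_probability(joint_cdf, a1, b1, a2, b2)

# Under independence the same probability factors into two interval probabilities.
from_marginals = (exp_cdf(b1, 1.0) - exp_cdf(a1, 1.0)) * (exp_cdf(b2, 0.5) - exp_cdf(a2, 0.5))

print(from_joint, from_marginals)
```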

85
Joint Distributions
Discrete: Multinomial for R kinds of success in N independent trials
Continuous: Bi- and multivariate normal
Mixed: Conditional regression models

86
Multinomial Distribution

87
Probabilities: Inherited Color Blindness
Inherited color blindness has different incidence rates in men and women. Women usually carry the defective gene and men usually inherit it.
Pick an individual at random from the population.
  B = 1: has inherited color blindness; B = 0: not color blind
  G = 0: male; G = 1: female
Marginal: P(B=1) = 2.75%
Conditional: P(B=1|G=0) = 5.0% (1 in 20 men); P(B=1|G=1) = 0.5% (1 in 200 women)
Joint: P(B=1 and G=0) = 2.5%; P(B=1 and G=1) = 0.25%

88
Marginal Distributions
$\text{Prob}[X = x] = \sum_y \text{Prob}[X = x, Y = y]$
            Color Blind
Gender    B=0      B=1      Total
G=0       .4750    .0250    .5000
G=1       .4975    .0025    .5000
Total     .9725    .0275    1.0000
Prob[G=0] = Prob[G=0,B=0] + Prob[G=0,B=1]

89
Joint Continuous Distribution

90
Marginal Distributions

91
Two Leading Applications
Copula function: application in finance
Bivariate normal distribution

99
The Bivariate Normal Distribution

101
Independent Random Variables
$F(x, y) = \text{Prob}(X \le x, Y \le y) = \text{Prob}(X \le x)\,\text{Prob}(Y \le y) = F_X(x) F_Y(y)$
$f(x, y) = \partial^2 F(x, y)/\partial x\,\partial y = f_X(x) f_Y(y)$

102
Independent Normals

103
Conditional Distributions
              Color Blind
Gender      B=0 (No)   B=1 (Yes)   Total
G=0 (M)     .4750      .0250       .5000
G=1 (F)     .4975      .0025       .5000
Total       .9725      .0275       1.0000
Prob(Not color blind given male):
Prob(B=0|G=0) = Prob(B=0,G=0)/Prob(G=0) = .475/.50 = .950
Prob(B=1|G=0) = .025/.5 = .05
Prob(B=1|G=0) + Prob(B=0|G=0) = 1
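The marginal and conditional calculations for the color blindness example can be sketched as follows. Three of the joint probabilities (.475, .025, and .0025) appear in the slides; the fourth, Prob(G=1, B=0) = .4975, is derived here so that the table sums to one. Function names are illustrative.

```python
# Joint probabilities Prob(G = g, B = b); keys are (g, b).
# G: 0 = male, 1 = female.  B: 0 = not color blind, 1 = color blind.
joint = {
    (0, 0): 0.4750, (0, 1): 0.0250,
    (1, 0): 0.4975, (1, 1): 0.0025,   # (1, 0) is derived, not quoted
}

def marginal_g(g):
    """Prob(G = g) = sum over b of Prob(G = g, B = b)."""
    return sum(p for (gi, _), p in joint.items() if gi == g)

def marginal_b(b):
    """Prob(B = b) = sum over g of Prob(G = g, B = b)."""
    return sum(p for (_, bi), p in joint.items() if bi == b)

def conditional_b_given_g(b, g):
    """Prob(B = b | G = g) = Prob(G = g, B = b) / Prob(G = g)."""
    return joint[(g, b)] / marginal_g(g)

print(round(marginal_b(1), 4))                # 0.0275
print(round(conditional_b_given_g(0, 0), 4))  # 0.95
print(round(conditional_b_given_g(1, 0), 4))  # 0.05
print(round(conditional_b_given_g(1, 1), 4))  # 0.005
```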

104
Conditional Distribution Continuous Normal

105
Bivariate Normal
Joint distribution is bivariate normal
Marginal distributions are normal
Conditional distributions are normal

106
Y and Y|X
[Figure; axes labeled X and Y]

107
Model Building
Typically f(y|x) is of interest
x is generated by a separate process f(x)
Joint distribution is f(y,x) = f(y|x) f(x)
Ex: demographic
  y = log(household income | family size)
  x = family size
  $y|x \sim \text{Normal}(\mu_{y|x}, \sigma_{y|x})$
  $x \sim \text{Poisson}(\lambda)$
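The factorization f(y,x) = f(y|x) f(x) suggests a two-step recipe: draw x from its marginal, then draw y from the conditional given x. The sketch below follows that recipe with a Poisson marginal and a normal conditional. The linear conditional mean and every parameter value are hypothetical, chosen only to make the example run; the slides do not specify them.

```python
import random
from math import exp, factorial, pi, sqrt

def poisson_pmf(x, lam):
    """f(x) for a Poisson variable with mean lam."""
    return exp(-lam) * lam**x / factorial(x)

def normal_pdf(y, mean, sd):
    """Normal density with the given mean and standard deviation."""
    return exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * sqrt(2.0 * pi))

def conditional_mean(x, alpha=10.0, beta=0.25):
    """Hypothetical linear conditional mean of y given x."""
    return alpha + beta * x

def joint_density(y, x, lam=2.0, sd=0.5):
    """f(y, x) = f(y | x) * f(x)."""
    return normal_pdf(y, conditional_mean(x), sd) * poisson_pmf(x, lam)

def draw_poisson(rng, lam):
    """Inverse-CDF draw: smallest x with F(x) >= u (capped at x = 100)."""
    u, x = rng.random(), 0
    cumulative = poisson_pmf(0, lam)
    while cumulative < u and x < 100:
        x += 1
        cumulative += poisson_pmf(x, lam)
    return x

def draw_pair(rng, lam=2.0, sd=0.5):
    """Draw x from its marginal, then y from the conditional given x."""
    x = draw_poisson(rng, lam)
    y = rng.gauss(conditional_mean(x), sd)
    return x, y

rng = random.Random(7)
sample = [draw_pair(rng) for _ in range(5)]
print(sample)
```

Separating the marginal for x from the conditional for y mirrors the point that x is generated by its own process while interest centers on f(y|x).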

108
y|x ~ Normal[ x, 4 2 ], x = 1,2,3,4; Poisson
[Figure: conditional densities for X = 1, 2, 3, 4]
