1
**Statistical Inference and Regression Analysis: Stat-GB. 3302**

Statistical Inference and Regression Analysis: Stat-GB.3302, Stat-UB. Professor William Greene, Stern School of Business, IOMS Department, Department of Economics

2
**Part 1 – Probability and Distribution Theory**

3
1 – Probability

4
**Sample Space Random outcomes: The result of a process**

Sequence of events, Number of events, Measurement of a length of time, space, etc. Outcomes, experiments and sample spaces

5
Consumer Choice: 4 possible ways a randomly chosen traveler might travel between Sydney and Melbourne: Ω = {Air, Train, Bus, Car}

6
**Market Behavior: Fair Isaacs credit card service to major vendors**

Ω = {Reject, Accept}

7
**Measurement of Lifetimes**

A box of light bulbs states “Average life is 1500 hours.” Outcome = length of time until failure (lifetime) of a randomly chosen light bulb. Ω = {lifetime | lifetime > 0}

8
**Events: Ω = {Air, Train} ∪ {Bus, Car}**

Events are defined as subsets of the sample space Ω, such as: the empty set ∅; intersections of related events; complements such as A and “not A”; disjoint sets such as {Train, Bus} and {Air, Car}. Any subset, including Ω itself, is a disjoint union of subsets: Ω = {Air, Train} ∪ {Bus, Car}

9
**Probability is a Measure**

The sample space Ω is a σ-field: it contains at least one nonempty subset (event), is closed under complementation, and is closed under countable union. Probability is a measure defined on all subsets of Ω. Axioms of Probability: P(Ω) = 1; for any event A ⊆ Ω, P(A) ≥ 0; if A ∩ B = ∅, then P(A ∪ B) = P(A) + P(B)

10
**Implications of the Axioms**

P(~A) = 1 – P(A), since A ∪ ~A = Ω. P(∅) = 0, since ∅ = ~Ω and P(Ω) = 1. A ⊆ B implies P(A) ≤ P(B), since B = A ∪ (~A ∩ B). P(A ∪ B) = P(A) + P(B) – P(A ∩ B)

11
Probability Assigning probability: the ‘size’ of an event relative to the size of the sample space. Counting rules for equally likely discrete outcomes. Using combinations and permutations to count elements. Examples: discrete uniform, poker hands. Hypergeometric example: the supercommittee (House: 242 R, 193 D; Senate: 49 R, 51 D&I). Measurement for continuous outcomes

12
**Applications: Games of Chance; Poker**

In a 5 card hand from a deck of 52, there are (52×51×50×49×48)/(5×4×3×2×1) = 2,598,960 different possible hands (order doesn’t matter). How many of these hands contain all 4 aces? 48: the 4 aces plus any one of the remaining 48 cards.
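The counts above can be checked directly with binomial coefficients (a quick sketch; `math.comb` does the "n choose k" counting):

```python
# Counting 5-card poker hands: C(52, 5) total hands, and the hands
# that contain all four aces (the 4 aces plus any of the other 48 cards).
from math import comb

total_hands = comb(52, 5)             # number of distinct 5-card hands
four_aces = comb(4, 4) * comb(48, 1)  # choose all 4 aces, then the 5th card

print(total_hands, four_aces)  # 2598960 48
```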

13
Some Poker Hands Full house – 3 of one kind, 2 of another (also called a “boat”). Royal flush – the top 5 cards in a suit. Flush – 5 cards in a suit, not sequential. Straight flush – 5 sequential cards in the same suit. Straight – 5 cards in numerical order, not all the same suit. 4 of a kind – plus any other card

14
5 Card Poker Hands

15
The Dead Man’s Hand The dead man’s hand is 5 cards: 2 aces, 2 8’s, and some other 5th card. (Wild Bill Hickok was holding this hand when he was shot in the back and killed in 1876.) The number of hands with two aces and two 8’s is 6 × 6 × 44 = 1,584. The rest of the story claims that Hickok held all black cards (“the bullets”). The probability for this hand falls to only 44/2,598,960: the four specific black cards plus one of the remaining 44. Some claims have been made about the 5th card, but no one is sure – there is no record.
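The same combination counting verifies the dead man's hand numbers (a sketch; the variable names are mine):

```python
from math import comb

# Hands with exactly two aces and two 8's: choose 2 of the 4 aces,
# 2 of the 4 eights, and any 1 of the remaining 44 cards.
two_aces_two_eights = comb(4, 2) * comb(4, 2) * 44   # 6 * 6 * 44

# Hickok's specific hand: the two black aces and two black 8's are fixed,
# leaving only the choice of the 5th card.
hickok_hands = 1 * 1 * 44

print(two_aces_two_eights, hickok_hands)  # 1584 44
```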

16
**Budget Supercommittee**

17
**Conditional Probability**

P(A|B) = P(A,B)/P(B) = size of A relative to a subset of Ω. Basic result: P(A,B) = P(A|B) P(B) (follows from the definition). Bayes theorem. Applications – mammography, drug testing, lie detector tests, PSA tests.

18
**Using Conditional Probabilities: Bayes Theorem**

19
Drug Testing Data P(Test correctly indicates disease) = .98 (sensitivity) P(Test correctly indicates absence) = .95 (specificity) P(Disease) = .005 (fairly rare) Notation: + = test indicates disease, – = test indicates no disease; D = presence of disease, N = absence of disease. Data: P(D) = .005 (incidence of the disease) P(+|D) = .98 (correct detection of the disease) P(–|N) = .95 (correct failure to detect the disease) What are P(D|+) and P(N|–)? Note: P(D|+) = the probability that a patient actually has the disease when the test says they do.

20
More Information Deduce: since P(+|D) = .98, we know P(–|D) = .02 because P(–|D) + P(+|D) = 1. [P(–|D) is the probability of a false negative.] Deduce: since P(–|N) = .95, we know P(+|N) = .05 because P(–|N) + P(+|N) = 1. [P(+|N) is the probability of a false positive.] Deduce: since P(D) = .005, P(N) = .995 because P(D) + P(N) = 1.

21
Now, Use Bayes Theorem
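A worked computation with the values deduced above (a sketch; the variable names are mine). Bayes theorem gives P(D|+) = P(+|D)P(D)/P(+), with P(+) from the law of total probability:

```python
# Bayes theorem applied to the drug-testing numbers from the previous slides.
p_D = 0.005        # P(D), incidence of the disease
p_plus_D = 0.98    # P(+|D), sensitivity
p_minus_N = 0.95   # P(-|N), specificity

# Total probability: P(+) = P(+|D)P(D) + P(+|N)P(N)
p_plus = p_plus_D * p_D + (1 - p_minus_N) * (1 - p_D)
p_D_plus = p_plus_D * p_D / p_plus            # P(D|+)

# Same construction for P(N|-)
p_minus = (1 - p_plus_D) * p_D + p_minus_N * (1 - p_D)
p_N_minus = p_minus_N * (1 - p_D) / p_minus   # P(N|-)

print(round(p_D_plus, 4), round(p_N_minus, 4))  # 0.0897 0.9999
```

Even with a 98% sensitive test, a positive result implies only about a 9% chance of disease, because the disease is rare.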

22
**Independent events Definition: P(A|B) = P(A)**

Multiplication rule: P(A,B) = P(A)P(B). Application: infectious disease transmission

23
2 – Random Variables

24
Random Variable Definition: maps elements of the sample space to a single variable; assigns a number to each outcome in Ω. Discrete: payoff to poker hands. Continuous: light bulb lifetimes. Mixed: ticket sales with capacity constraints (censoring).

25
**Market Behavior: Fair Isaacs credit card service to major vendors**

Ω = {Reject, Accept}; X = 0 if reject, 1 if accept

26
Caribbean Stud Poker { Sample Space } Probability Variable

27
**Features of Random Variables**

Probability distribution Mass function (discrete): Prob(X=x) = f(x) Density function (continuous): f(x) Cumulative probabilities, CDF: F(x) = Prob(X ≤ x) Quantiles: x such that F(x) = Q Median: x such that F(x) = 0.5

28
**Discrete Random Variables**

Elemental building block Bernoulli: Credit card applications Discrete uniform: Die toss Counting Rules Binomial: Family composition Hypergeometric: House/Senate Supercommittee Models Poisson: Diabetes incidence, Accidents, etc.

29
**Market Behavior: Fair Isaacs credit card service to major vendors**

X = 0 if reject, 1 if accept. Prob(X=x) = (1-p)^(1-x) p^x, x = 0,1

30
Binomial: the sum of n Bernoulli trials
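The definition can be seen directly by simulation (an illustrative sketch with n and p chosen by me): summing n Bernoulli(p) trials yields a Binomial(n, p) variable with mean n·p.

```python
import random

random.seed(1)
n, p = 10, 0.3  # illustrative parameters

def binomial_draw(n, p):
    # A Binomial(n, p) draw is the sum of n independent Bernoulli(p) trials.
    return sum(1 if random.random() < p else 0 for _ in range(n))

draws = [binomial_draw(n, p) for _ in range(100_000)]
mean = sum(draws) / len(draws)
print(mean)  # close to n*p = 3
```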

31
Examples

32
Poisson: approximation to the binomial; general model for a type of count process

33
**Poisson Approximation to Binomial**

34
**Diabetes Incidence per 1000**

35
**Poisson Distribution of Disease Cases in 1000 Draws with λ = 7**

36
**Poisson Process: Doctor visits in the survey year by people in a sample of 27,326. λ = .8**

The Poisson probability model is a description of this process, not an approximation

37
**Continuous RV Density function, f(x)**

The probability measure P(event) is obtained by integrating the density. Application: light bulb lifetimes

38
**Probability Density Function; PDF**

39
**CDF and Quantiles: the pth quantile, 0 < p < 1**

Quantile: xp such that F(xp) = p, i.e., xp = F⁻¹(p). For p = .5, xp = the median

40
**Model for Light Bulb Lifetimes**

This is the exponential model for lifetimes. The model is f(time) = (1/μ) e^(–time/μ)

41
**Model for Light Bulb Lifetimes**

The area under the entire curve is 1.0.

42
**Continuous Distribution**

The probability associated with an interval such as 1000 < LIFETIME ≤ 2000 equals the area under the curve from the lower limit to the upper. A partial area will be between 0.0 and 1.0, and produces a probability.

43
**Probability of a Single Value Is Zero**

The probability associated with a single point, such as LIFETIME=2000, equals 0.0.

44
**Probabilities via the CDF**

45
**Probability for a Range of Values Based on CDF**

Prob(Life ≤ 2000) = .7364, minus Prob(Life ≤ 1000) = .4866, equals Prob(1000 < Life ≤ 2000) = .2498
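These numbers follow from the exponential lifetime model with mean 1500 hours (the light bulb slide), whose CDF is F(t) = 1 – e^(–t/μ). A quick check:

```python
from math import exp

mu = 1500.0  # mean lifetime in hours, from the light bulb slide

def F(t):
    # CDF of the exponential lifetime model: F(t) = 1 - exp(-t/mu)
    return 1.0 - exp(-t / mu)

p_interval = F(2000) - F(1000)  # Prob(1000 < Life <= 2000)
print(round(F(1000), 4), round(F(2000), 4), round(p_interval, 4))
# 0.4866 0.7364 0.2498
```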

46
Common Continuous RVs Continuous random variables are all models; they do not occur in nature. The model builder’s toolkit: Continuous uniform Exponential Normal Lognormal Gamma Beta Defined for specific types of outcomes

47
**Continuous Uniform f(x) = 1/(b – a), a < x < b**

F(x) = (x – a)/(b – a), a < x < b.

48
**Exponential f(x) = λ exp(–λx), x > 0; 0 otherwise**

Median: F(M) = .5 ⇒ 1 – exp(–λM) = .5 ⇒ exp(–λM) = .5 ⇒ –λM = ln .5 ⇒ M = –ln .5 / λ = (ln 2)/λ
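The derivation above can be checked numerically (λ is an illustrative value of mine):

```python
from math import exp, log

lam = 0.5               # illustrative rate
M = log(2) / lam        # median of the exponential: M = ln(2)/lambda

# Verify F(M) = 1 - exp(-lam*M) = 0.5
print(round(1.0 - exp(-lam * M), 4))  # 0.5
```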

50
**Gamma Density Uses the Gamma Function**

51
**Gamma Distributed Random Variable**

Used to model nonnegative random variables – e.g., survival times of people and electronic components. Two special cases: P = 1 is the exponential distribution; P = ½ and λ = ½ is the chi squared with one “degree of freedom”

52
**Beta Uses Beta Integrals**

53
**Normal Density – The Model**

Mean = μ, standard deviation = σ

54
Normal Distributions The scale and location (on the horizontal axis) depend on μ and σ. The shape of the distribution is always the same. (Bell curve)

56
**Standard Normal Density (0,1)**

57
**Lognormal Distribution**

58
**Censoring and Truncation**

Censoring: an observation mechanism; values above or below a certain value are assigned the boundary value. Applications: ticket markets (demand vs. sales given capacity constraints); top-coded income data. Truncation: an observation mechanism; the relevant distribution applies only in a restricted range of the random variable. Application: on-site survey for recreation visits (truncated Poisson). Incidental truncation: income is observed only for those whose wealth (not income) exceeds $100,000.

59
**Truncated Random Variable**

Untruncated variable has density f(x) Truncated variable has density f(x)/Prob(x is in range) Truncated Normal:

60
**Truncated Normal: f(x|x>a) = f(x)/Prob(x>a)**

F(x | x > XL )

61
**Truncated Poisson f(x) = exp(–λ) λ^x / Γ(x+1)**

f(x|x>0) = f(x)/Prob(x>0) = f(x) / [1 – Prob(x=0)] = {exp(–λ) λ^x / Γ(x+1)} / {1 – exp(–λ)}
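A small check of the truncated-at-zero pmf (λ is illustrative; Γ(x+1) = x! for integer x): dividing by 1 – Prob(x=0) renormalizes the pmf so it sums to one over x = 1, 2, ….

```python
from math import exp, factorial

lam = 0.8  # illustrative rate

def pois(x):
    # Poisson pmf; Gamma(x+1) = x! for integer x
    return exp(-lam) * lam**x / factorial(x)

def pois_trunc(x):
    # truncated-at-zero pmf: f(x|x>0) = f(x) / (1 - Prob(x=0))
    return pois(x) / (1.0 - exp(-lam))

total = sum(pois_trunc(x) for x in range(1, 50))
print(round(total, 10))  # 1.0 -- the truncated pmf sums to one over x >= 1
```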

62
**Representations of a Continuous Random Variable**

Density, f(x). CDF, F(x) = Prob(X ≤ x). Survival, S(x) = Prob(X > x) = 1 – F(x). Hazard function, h(x) = –d ln S(x)/dx. The representations are one to one – each uniquely determines the distribution of the random variable
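For the exponential these representations are easy to compute, and they show why the next slide's process is memoryless: h(x) = f(x)/S(x) = λ is constant in x (λ is an illustrative value of mine).

```python
from math import exp

lam = 0.25  # illustrative rate

def f(x): return lam * exp(-lam * x)   # density
def S(x): return exp(-lam * x)         # survival function, 1 - F(x)

# Hazard h(x) = f(x)/S(x); for the exponential it equals lam everywhere.
hazard = [f(x) / S(x) for x in (0.5, 1.0, 5.0, 20.0)]
print(hazard)  # each value equals lam = 0.25
```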

63
**Application: A Memoryless Process**

64
A Change of Variable Theorem: let x be a continuous RV with continuous density f(x), and let y = g(x) be a monotonic function over the range of x. Then f(y) = f(x(y)) |dx(y)/dy| = f(g⁻¹(y)) |dg⁻¹(y)/dy|

65
**Change of Variable Applications**

Standardized normal Lognormal to normal Fundamental probability transform

66
**Standardized Normal: X ~ N[μ, σ²], Prob[X ≤ a] = F(a)**

Prob[X ≤ a] = Prob[(X – μ)/σ ≤ (a – μ)/σ]. Let y = (x – μ)/σ; then J = dx(y)/dy = σ and f(y) = σ f(σy + μ) = [1/√(2π)] exp(–y²/2). Only a table for the standard normal is needed.

67
**Textbooks Provide Tables of Areas for the Standard Normal**

Econometric Analysis, WHG, 2008, Appendix G, page 1093; Rice, Table 2. Note that values are given only for z from 0.00 upward; no values are given for negative z.

68
**Computing Probabilities**

Standard normal tables give probabilities when μ = 0 and σ = 1. For other cases, do we need another table? No – probabilities for other cases are obtained by “standardizing.” The standardized variable is z = (x – μ)/σ, which has mean 0 and standard deviation 1.
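The standardizing step can be carried out in code, using the identity Φ(z) = (1 + erf(z/√2))/2 in place of the printed table (μ, σ, and a are illustrative values of mine):

```python
from math import erf, sqrt

def Phi(z):
    # Standard normal CDF via the error function: Phi(z) = (1 + erf(z/sqrt(2)))/2
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

mu, sigma = 500.0, 100.0   # illustrative mean and standard deviation
a = 660.0
z = (a - mu) / sigma       # standardize: z = (x - mu)/sigma = 1.6

print(round(Phi(z), 4))    # 0.9452, the table value for z = 1.60
```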

69
**Standard Normal Density**

70
**Standard Normal Distribution Facts**

The random variable z runs from –∞ to +∞. φ(z) > 0 for all z, but for |z| > 4 it is essentially 0. The total area under the curve equals 1.0. The curve is symmetric around 0. (The normal distribution generally is symmetric around μ.)

71
**Only Half the Table Is Needed**

The area to the left of 0.0 is exactly 0.5.

72
**Only Half the Table Is Needed**

The area left of 1.60 is exactly 0.5 plus the area between 0.0 and 1.60.

73
**Areas Left of Negative Z**

Area left of –1.6 equals area right of +1.6. Area right of +1.6 equals 1 – area to the left of +1.6.

74
**Computing Probabilities by Standardizing: Example**

75
**Lognormal Distribution**

76
**Lognormal Distribution of Monthly Wages in NLS**

77
**Log of Lognormal Variable**

78
**Fundamental Probability Transformation**

79
**Random Number Generation**

The CDF is a monotonic function of x. If u = F(x), then x = F⁻¹(u). We can generate u with a computer. Examples: exponential, normal

80
**Generating Random Samples**

Exponential: u = F(x) = 1 – exp(–λx) ⇒ 1 – u = exp(–λx) ⇒ x = –(1/λ) ln(1 – u). Normal(μ, σ): u = Φ(z) ⇒ z = Φ⁻¹(u) ⇒ x = σz + μ = σΦ⁻¹(u) + μ
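The exponential case of the inverse probability transform is a one-liner (a sketch; λ and the sample size are illustrative):

```python
import random
from math import log

random.seed(7)
lam = 2.0  # illustrative rate

def exp_draw():
    # Inverse probability transform: u = F(x) = 1 - exp(-lam*x)
    # => x = -(1/lam) * ln(1 - u), with u ~ U[0, 1)
    u = random.random()
    return -log(1.0 - u) / lam

draws = [exp_draw() for _ in range(200_000)]
print(sum(draws) / len(draws))  # close to the exponential mean 1/lam = 0.5
```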

81
U[0,1] Generation Linear congruential generator: x(n) = (a·x(n–1) + b) mod m. Properties of RNGs: replicability – they are not RANDOM; period; randomness tests. The Mersenne twister: the current state of the art (of pseudo-random number generation)
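A minimal LCG sketch illustrating both the recursion and replicability. The constants a = 16807, m = 2³¹ – 1, b = 0 are those of the classic "minimal standard" generator; the class name and structure are mine:

```python
class LCG:
    """Linear congruential generator: x(n) = (a*x(n-1) + b) mod m."""

    def __init__(self, seed=12345, a=16807, b=0, m=2**31 - 1):
        self.x, self.a, self.b, self.m = seed, a, b, m

    def next_uniform(self):
        self.x = (self.a * self.x + self.b) % self.m
        return self.x / self.m  # scale to a U[0, 1) draw

g1 = LCG(seed=12345)
us = [g1.next_uniform() for _ in range(5)]
print(us)

# Replicability: the same seed always reproduces the same sequence.
g2 = LCG(seed=12345)
assert [g2.next_uniform() for _ in range(5)] == us
```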

82
3 – Joint Distributions

83
**Jointly Distributed Random Variables**

There is usually some kind of association between the variables – e.g., two different financial assets. Joint CDF for two random variables: F(x, y) = Prob(X ≤ x, Y ≤ y)

84
**Probability of a Rectangle**

Prob[a1 < x ≤ b1, a2 < y ≤ b2] = F(b1, b2) – F(b1, a2) – F(a1, b2) + F(a1, a2)
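The rectangle formula can be checked numerically with an illustrative joint CDF. Here I use two independent exponentials (my choice), for which the rectangle probability also factors into a product of interval probabilities:

```python
from math import exp

# Illustrative marginal CDFs of two independent exponentials
def F1(x): return 1.0 - exp(-x)
def F2(y): return 1.0 - exp(-2.0 * y)

def F(x, y):          # joint CDF; under independence F(x,y) = F1(x)F2(y)
    return F1(x) * F2(y)

a1, b1, a2, b2 = 0.5, 1.5, 0.2, 0.8

# Rectangle formula: F(b1,b2) - F(b1,a2) - F(a1,b2) + F(a1,a2)
rect = F(b1, b2) - F(b1, a2) - F(a1, b2) + F(a1, a2)
# Under independence this equals the product of the interval probabilities.
direct = (F1(b1) - F1(a1)) * (F2(b2) - F2(a2))

print(rect, direct)  # the two agree
```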

85
Joint Distributions Discrete: Multinomial for R kinds of success in N independent trials Continuous: Bi- and Multivariate normal Mixed: Conditional regression models

86
**Multinomial Distribution**

87
**Probabilities: Inherited Color Blindness**

Inherited color blindness has different incidence rates in men and women: women usually carry the defective gene and men usually inherit the condition. Pick an individual at random from the population. B = 1: has inherited color blindness; B = 0: not color blind. G = 0: male; G = 1: female. Marginal: P(B=1) = 2.75%. Conditional: P(B=1|G=0) = 5.0% (1 in 20 men); P(B=1|G=1) = 0.5% (1 in 200 women). Joint: P(B=1 and G=0) = 2.5%; P(B=1 and G=1) = 0.25%

88
**Marginal Distributions**

Prob[X=x] = Σy Prob[X=x, Y=y]

           B=0      B=1      Total
G=0        .4750    .0250    .5000
G=1        .4975    .0025    .5000
Total      .9725    .0275    1.0000

Prob[G=0] = Prob[G=0, B=0] + Prob[G=0, B=1]

89
**Joint Continuous Distribution**

90
**Marginal Distributions**

91
**Two Leading Applications**

Copula Function - Application in Finance Bivariate Normal Distribution

99
**The Bivariate Normal Distribution**

101
**Independent Random Variables**

F(x, y) = Prob(X ≤ x, Y ≤ y) = Prob(X ≤ x) Prob(Y ≤ y) = FX(x) FY(y); f(x, y) = ∂²F(x, y)/∂x∂y = f(x) f(y)

102
Independent Normals

103
**Conditional Distributions**

           B=0 (No)   B=1 (Yes)   Total
G=0 (M)    .4750      .0250       .5000
G=1 (F)    .4975      .0025       .5000
Total      .9725      .0275       1.0000

Prob(not color blind given male): Prob(B=0|G=0) = Prob(B=0, G=0)/Prob(G=0) = .475/.50 = .95
Prob(B=1|G=0) = .025/.50 = .05
Prob(B=1|G=0) + Prob(B=0|G=0) = 1
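The conditioning step on this slide amounts to dividing a row of the joint table by its row total (a sketch; the dictionary layout is mine):

```python
# Joint probabilities P(B=b, G=g) from the color-blindness table.
joint = {
    (0, 0): 0.475,  (1, 0): 0.025,    # males (G=0)
    (0, 1): 0.4975, (1, 1): 0.0025,   # females (G=1)
}

pG0 = joint[(0, 0)] + joint[(1, 0)]   # marginal P(G=0) = 0.50
p_B0_G0 = joint[(0, 0)] / pG0         # P(B=0|G=0) = 0.95
p_B1_G0 = joint[(1, 0)] / pG0         # P(B=1|G=0) = 0.05

print(p_B0_G0, p_B1_G0, p_B0_G0 + p_B1_G0)  # the conditionals sum to 1
```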

104
**Conditional Distribution Continuous Normal**

105
**Bivariate Normal Joint distribution is bivariate normal**

Marginal distributions are normal Conditional distributions are normal

106
Y and Y|X

107
**Model Building Typically f(y|x) is of interest**

x is generated by a separate process with density f(x). The joint distribution is f(y, x) = f(y|x) f(x). Demographic example: y = log(household income | family size), x = family size; y|x ~ Normal(μy|x, σy|x); x ~ Poisson(λ)

108
y|x ~ Normal[μx, 4²], x = 1, 2, 3, 4; x ~ Poisson
