# Statistical Inference and Regression Analysis: Stat-GB. 3302

Statistical Inference and Regression Analysis: Stat-GB.3302, Stat-UB
Professor William Greene, Stern School of Business, IOMS Department and Department of Economics

Part 1 – Probability and Distribution Theory

1 – Probability

Sample Space
Random outcomes are the results of a process: a sequence of events, a number of events, a measurement of a length of time, of space, etc. The building blocks are outcomes, experiments, and sample spaces.

Consumer Choice: 4 possible ways a randomly chosen traveler might travel between Sydney and Melbourne. Ω = {Air, Train, Bus, Car}

Market Behavior: Fair Isaac's credit card service to major vendors
Ω = {Reject, Accept}

A box of light bulbs states “Average life is 1500 hours.” Outcome = length of time until failure (lifetime) of a randomly chosen light bulb. Ω = {lifetime | lifetime > 0}

Events
Events are subsets of the sample space: the empty set ∅, intersections of related events, complements such as A and “not A”, and disjoint sets such as {Train, Bus} and {Air, Car}. Any subset, including Ω itself, is a disjoint union of subsets: Ω = {Air, Train} ∪ {Bus, Car}

Probability is a Measure
The collection of events in the sample space Ω is a σ-field: it contains at least one nonempty subset (event), is closed under complementation, and is closed under countable union. Probability is a measure defined on all subsets of Ω. Axioms of Probability: P(Ω) = 1; A ⊂ Ω ⇒ P(A) ≥ 0; if A ∩ B = ∅, then P(A ∪ B) = P(A) + P(B).

Implications of the Axioms
P(~A) = 1 − P(A), since A ∪ ~A = Ω. P(∅) = 0, since ∅ = ~Ω and P(Ω) = 1. A ⊂ B ⇒ P(A) ≤ P(B), since B = A ∪ (~A ∩ B). P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

Probability
Assigning probability: the ‘size’ of an event relative to the size of the sample space. Counting rules for equally likely discrete outcomes; using combinations and permutations to count elements. Examples: the discrete uniform distribution, poker hands. A hypergeometric example: the supercommittee (House 242R, 193D; Senate 49R, 51D&I). Measurement for continuous outcomes.

Applications: Games of Chance; Poker
In a 5-card hand from a deck of 52, there are (52*51*50*49*48)/(5*4*3*2*1) = 2,598,960 different possible hands. (Order doesn’t matter.) How many of these hands have 4 aces? 48: the 4 aces plus any one of the remaining 48 cards.
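These counts can be verified directly with Python's `math.comb`; a quick check, not part of the original slides:

```python
from math import comb

# Number of distinct 5-card hands from a 52-card deck (order ignored)
total_hands = comb(52, 5)
print(total_hands)                 # 2598960

# Hands containing all 4 aces: the aces are fixed, the 5th card is any of the other 48
four_aces = comb(4, 4) * comb(48, 1)
print(four_aces)                   # 48
print(four_aces / total_hands)     # probability of being dealt 4 aces
```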

Some Poker Hands
Full House – 3 of one kind, 2 of another. (Also called a “boat.”) Royal Flush – the top 5 cards in a suit. Flush – 5 cards in a suit, not sequential. Straight Flush – 5 sequential cards in the same suit. Straight – 5 cards in numerical sequence, not all the same suit. 4 of a Kind – 4 of one kind plus any other card.

5 Card Poker Hands

The Dead Man’s Hand
The dead man’s hand is 5 cards: 2 aces, 2 8’s, and some other 5th card. (Wild Bill Hickok was holding this hand when he was shot in the back and killed in 1876.) The number of hands with two aces and two 8’s is 6 × 6 × 44 = 1,584. The rest of the story claims that Hickok held all black cards (the bullets). The probability for this hand falls to only 44/2,598,960. (The four black cards plus one of the remaining 44.) Some claims have been made about the 5th card, but no one is sure – there is no record.
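The two counts on this slide can also be reproduced with `math.comb` (a verification sketch, not part of the original slides):

```python
from math import comb

total = comb(52, 5)
# Two aces (any suits), two 8s (any suits), and any one of the remaining 44 cards
aces_and_eights = comb(4, 2) * comb(4, 2) * comb(44, 1)   # 6 * 6 * 44
print(aces_and_eights)                                     # 1584
# Restricting to the two black aces and two black 8s leaves only the 5th card free
black_version = comb(2, 2) * comb(2, 2) * comb(44, 1)
print(black_version, black_version / total)                # 44 hands
```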

Budget Supercommittee

Conditional Probability
P(A|B) = P(A,B)/P(B): the size of A relative to a subset of Ω. Basic result (follows from the definition): P(A,B) = P(A|B) P(B). Bayes theorem. Applications: mammography, drug testing, lie detector tests, the PSA test.

Using Conditional Probabilities: Bayes Theorem

Drug Testing
Data: P(Test correctly indicates disease) = .98 (sensitivity); P(Test correctly indicates absence) = .95 (specificity); P(Disease) = .005 (fairly rare).
Notation: + = test indicates disease, – = test indicates no disease; D = presence of disease, N = absence of disease. So P(D) = .005 (incidence of the disease), P(+|D) = .98 (correct detection of the disease), P(–|N) = .95 (correct failure to detect the disease).
What are P(D|+) and P(N|–)? Note that P(D|+) is the probability that a patient actually has the disease when the test says they do.

More Information
Deduce: since P(+|D) = .98, we know P(–|D) = .02, because P(–|D) + P(+|D) = 1. [P(–|D) is the probability of a false negative.]
Deduce: since P(–|N) = .95, we know P(+|N) = .05, because P(–|N) + P(+|N) = 1. [P(+|N) is the probability of a false positive.]
Deduce: since P(D) = .005, P(N) = .995, because P(D) + P(N) = 1.

Now, Use Bayes Theorem
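The calculation can be sketched in a few lines of Python, using the probabilities assembled on the previous slides:

```python
# Bayes theorem for the drug-testing numbers
p_d = 0.005              # P(D): incidence of the disease
p_pos_d = 0.98           # P(+|D): sensitivity
p_neg_n = 0.95           # P(-|N): specificity
p_n = 1 - p_d
p_pos_n = 1 - p_neg_n    # P(+|N): false positive rate
p_neg_d = 1 - p_pos_d    # P(-|D): false negative rate

# P(D|+) = P(+|D)P(D) / [P(+|D)P(D) + P(+|N)P(N)]
p_d_pos = p_pos_d * p_d / (p_pos_d * p_d + p_pos_n * p_n)
# P(N|-) = P(-|N)P(N) / [P(-|N)P(N) + P(-|D)P(D)]
p_n_neg = p_neg_n * p_n / (p_neg_n * p_n + p_neg_d * p_d)
print(round(p_d_pos, 4))   # 0.0897: most positives are false positives
print(round(p_n_neg, 4))   # 0.9999
```

The striking result is P(D|+) ≈ .09: because the disease is rare, a positive test is still usually a false positive.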

Independent Events
Definition: P(A|B) = P(A). Multiplication rule: P(A,B) = P(A)P(B). Application: infectious disease transmission.

2 – Random Variables

Random Variable
Definition: a random variable maps elements of the sample space to a single variable: it assigns a number X(ω) to each ω ∈ Ω. Discrete: payoff to poker hands. Continuous: lightbulb lifetimes. Mixed: ticket sales with capacity constraints (censoring).

Market Behavior: Fair Isaac's credit card service to major vendors
Ω = {Reject, Accept}; X = 0 if reject, 1 if accept.

Caribbean Stud Poker: {Sample Space}, probabilities, and the random variable (payoff).

Features of Random Variables
Probability Distribution
Mass function (discrete): Prob(X = x) = f(x). Density function (continuous): f(x). Cumulative probabilities (CDF): F(x) = Prob(X ≤ x). Quantile: the x such that F(x) = Q. Median: the x such that F(x) = 0.5.

Discrete Random Variables
Elemental building block – Bernoulli: credit card applications. Discrete uniform: die toss. Counting rules – Binomial: family composition. Hypergeometric: the House/Senate supercommittee. Models – Poisson: diabetes incidence, accidents, etc.

Market Behavior: Fair Isaac's credit card service to major vendors
X = 0 if reject, 1 if accept. Prob(X = x) = (1 − p)^(1−x) p^x, x = 0, 1.

Binomial Sum of n Bernoulli trials
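A minimal sketch of the binomial mass function; the family-composition numbers (n = 4 children, p = 0.5) are illustrative assumptions, not from the slides:

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X = number of successes in n Bernoulli(p) trials."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# e.g. probability of exactly 2 girls among 4 children with p = 0.5
print(binomial_pmf(2, 4, 0.5))                          # 0.375
# sanity check: the pmf sums to 1 over x = 0..n
print(sum(binomial_pmf(x, 4, 0.5) for x in range(5)))   # 1.0
```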

Examples

Poisson: an approximation to the binomial; a general model for a type of process.

Poisson Approximation to Binomial

Diabetes Incidence per 1000

Poisson Distribution of Disease Cases in 1000 Draws with λ = 7

Poisson Process: Doctor visits in the survey year by people in a sample of 27,326. λ = .8
The Poisson probability model is a description of this process, not an approximation.

Continuous RV
Density function, f(x). The probability measure P(event) is obtained by integrating the density. Application: lightbulb lifetimes?

Probability Density Function; PDF

CDF and Quantiles
The pth quantile, 0 < p < 1, is the value x_p such that F(x_p) = p; that is, x_p = F⁻¹(p). For p = .5, x_p is the median.

This is the exponential model for lifetimes: f(time) = (1/μ) e^(−time/μ).

The area under the entire curve is 1.0.

Continuous Distribution
The probability associated with an interval such as 1000 < LIFETIME < 2000 equals the area under the curve from the lower limit to the upper. A partial area will be between 0.0 and 1.0, and will produce a probability.

Probability of a Single Value Is Zero
The probability associated with a single point, such as LIFETIME=2000, equals 0.0.

Probabilities via the CDF

Probability for a Range of Values Based on the CDF
Prob(Life < 2000) = .7364, minus Prob(Life < 1000) = .4866, equals Prob(1000 < Life < 2000) = .2498.
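These three numbers are consistent with the exponential model with μ = 1500 hours (the average life quoted on the light bulb box); a quick check:

```python
from math import exp

mu = 1500.0                       # mean lifetime in hours (the lightbulb example)

def F(t):
    """Exponential CDF: F(t) = 1 - exp(-t/mu)."""
    return 1.0 - exp(-t / mu)

print(round(F(1000), 4))             # 0.4866
print(round(F(2000), 4))             # 0.7364
print(round(F(2000) - F(1000), 4))   # 0.2498 = Prob(1000 < Life < 2000)
```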

Common Continuous RVs
Continuous random variables are all models; they do not occur in nature. The model builder’s toolkit: continuous uniform, exponential, normal, lognormal, gamma, beta – each defined for specific types of outcomes.

Continuous Uniform
f(x) = 1/(b − a), a ≤ x ≤ b; F(x) = (x − a)/(b − a), a ≤ x ≤ b.

Exponential
f(x) = λ exp(−λx) for x > 0, and 0 otherwise.
Median: F(M) = .5 ⇒ 1 − exp(−λM) = .5 ⇒ exp(−λM) = .5 ⇒ −λM = ln .5 ⇒ M = −(ln .5)/λ = (ln 2)/λ.

Gamma Density Uses the Gamma Function

Gamma Distributed Random Variable
Used to model nonnegative random variables – e.g., survival times of people and electronic components. Two special cases: P = 1 is the exponential distribution; P = ½ and λ = ½ is the chi squared distribution with one “degree of freedom.”

Beta Uses Beta Integrals

Normal Density – The Model
Mean = μ, standard deviation = σ

Normal Distributions The scale and location (on the horizontal axis) depend on μ and σ. The shape of the distribution is always the same. (Bell curve)

Standard Normal Density (0,1)

Lognormal Distribution

Censoring and Truncation
Censoring: an observation mechanism in which values above or below a certain boundary are assigned the boundary value. Applications: the ticket market (demand vs. sales given capacity constraints); top-coded income data.
Truncation: an observation mechanism in which the relevant distribution applies only in a restricted range of the random variable. Application: an on-site survey of recreation visits; a truncated Poisson.
Incidental truncation: income is observed only for those whose wealth (not income) exceeds $100,000.

Truncated Random Variable
The untruncated variable has density f(x); the truncated variable has density f(x)/Prob(x is in range).

Truncated Normal: f(x | x > a) = f(x)/Prob(x > a)
F(x | x > x_L)

Truncated Poisson: f(x) = exp(−λ) λ^x / Γ(x+1)
f(x | x > 0) = f(x)/Prob(x > 0) = f(x)/[1 − Prob(x = 0)] = {exp(−λ) λ^x / Γ(x+1)} / {1 − exp(−λ)}
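A small sketch of the truncated Poisson pmf; the rate λ = .8 (the doctor-visits example from an earlier slide) is used here as an illustrative value:

```python
from math import exp, factorial

lam = 0.8   # illustrative rate, e.g. the doctor-visits example

def poisson_pmf(x):
    """Untruncated Poisson pmf."""
    return exp(-lam) * lam**x / factorial(x)

def truncated_pmf(x):
    """pmf of X | X > 0: rescale by Prob(X > 0) = 1 - exp(-lam)."""
    return poisson_pmf(x) / (1.0 - exp(-lam))

# the truncated probabilities sum to 1 over x = 1, 2, ...
print(sum(truncated_pmf(x) for x in range(1, 50)))   # ≈ 1.0
```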

Representations of a Continuous Random Variable
Density, f(x). CDF, F(x) = Prob(X ≤ x). Survival, S(x) = Prob(X > x) = 1 − F(x). Hazard function, h(x) = −d ln S(x)/dx. These representations are one to one – each uniquely determines the distribution of the random variable.

Application: A Memoryless Process

A Change of Variable
Theorem: let x be a continuous RV with continuous density f(x), and let y = g(x) be a monotonic function over the range of x. Then f(y) = f(x(y)) |dx(y)/dy| = f(g⁻¹(y)) |dg⁻¹(y)/dy|.

Change of Variable Applications
Standardized normal Lognormal to normal Fundamental probability transform

Standardized Normal
X ~ N[μ, σ²], Prob[X < a] = F(a). Prob[X < a] = Prob[(X − μ)/σ < (a − μ)/σ]. Let y = (x − μ)/σ, so x(y) = σy + μ and J = dx(y)/dy = σ. Then f(y) = f(σy + μ)·σ = [1/√(2π)] exp(−y²/2). Only a table for the standard normal is needed.

Textbooks Provide Tables of Areas for the Standard Normal
Econometric Analysis, WHG, 2008, Appendix G, page 1093; Rice, Table 2. Note that values are given only for nonnegative z; no values are given for negative z.

Computing Probabilities
Standard Normal Tables give probabilities when μ = 0 and σ = 1. For other cases, do we need another table? Probabilities for other cases are obtained by “standardizing.” Standardized variable is z = (x – μ)/ σ z has mean 0 and standard deviation 1

Standard Normal Density

Standard Normal Distribution Facts
The random variable z runs from −∞ to +∞. φ(z) > 0 for all z, but for |z| > 4 it is essentially 0. The total area under the curve equals 1.0. The curve is symmetric around 0. (The normal distribution generally is symmetric around μ.)

Only Half the Table Is Needed
The area to left of 0.0 is exactly 0.5.

Only Half the Table Is Needed
The area left of 1.60 is exactly 0.5 plus the area between 0.0 and 1.60.

Areas Left of Negative Z
The area left of −1.6 equals the area right of +1.6. The area right of +1.6 equals 1 − the area to the left of +1.6.

Computing Probabilities by Standardizing: Example
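A worked example under assumed numbers: X ~ N[500, 100²] and the cutoff 650 are hypothetical choices for illustration, and the standard normal CDF is computed from the error function rather than a table:

```python
from math import erf, sqrt

def Phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Hypothetical example: X ~ N(mu=500, sigma=100); find Prob(X < 650)
mu, sigma = 500.0, 100.0
z = (650.0 - mu) / sigma          # standardize: z = (x - mu)/sigma = 1.5
print(round(Phi(z), 4))           # 0.9332, the area left of z = 1.5
```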

Lognormal Distribution

Lognormal Distribution of Monthly Wages in NLS

Log of Lognormal Variable

Fundamental Probability Transformation

Random Number Generation
The CDF is a monotonic function of x: if u = F(x), then x = F⁻¹(u). We can generate u with a computer. Examples: exponential, normal.

Generating Random Samples
Exponential: u = F(x) = 1 − exp(−λx) ⇒ 1 − u = exp(−λx) ⇒ x = −(1/λ) ln(1 − u).
Normal(μ, σ): u = Φ(z), z = Φ⁻¹(u), x = σz + μ = σΦ⁻¹(u) + μ.
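The exponential inverse-CDF recipe above can be sketched directly; λ = 2 is an illustrative value:

```python
import random
from math import log

def exponential_draw(lam, rng=random):
    """Inverse-CDF draw: u = F(x) = 1 - exp(-lam*x)  =>  x = -(1/lam) ln(1 - u)."""
    u = rng.random()              # u ~ U[0, 1)
    return -log(1.0 - u) / lam

random.seed(1)                    # replicability
lam = 2.0
draws = [exponential_draw(lam) for _ in range(100_000)]
print(sum(draws) / len(draws))    # sample mean, close to 1/lam = 0.5
```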

U[0,1] Generation
Linear congruential generator: x(n) = (a·x(n−1) + b) mod m. Properties of RNGs: replicability (they are not RANDOM), period, randomness tests. The Mersenne twister is the current state of the art (of pseudo-random number generation).
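A minimal LCG sketch. The multiplier and modulus below are the constants used in some C-library rand() implementations, chosen here only as an illustration; the replicability property (same seed, same stream) is the point:

```python
def lcg(seed, a=1103515245, b=12345, m=2**31):
    """Minimal linear congruential generator: x(n) = (a*x(n-1) + b) mod m."""
    x = seed
    while True:
        x = (a * x + b) % m
        yield x / m               # scale the integer state to [0, 1)

g = lcg(seed=42)
sample = [next(g) for _ in range(5)]
print(sample)

# replicability: the same seed reproduces the same stream exactly
g2 = lcg(seed=42)
assert [next(g2) for _ in range(5)] == sample
```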

3 – Joint Distributions

Jointly Distributed Random Variables
Usually there is some kind of association between the variables – e.g., two different financial assets. Joint CDF for two random variables: F(x, y) = Prob(X ≤ x, Y ≤ y).

Probability of a Rectangle
Prob[a1 < X ≤ b1, a2 < Y ≤ b2] = F(b1, b2) − F(b1, a2) − F(a1, b2) + F(a1, a2)

Joint Distributions Discrete: Multinomial for R kinds of success in N independent trials Continuous: Bi- and Multivariate normal Mixed: Conditional regression models

Multinomial Distribution

Probabilities: Inherited Color Blindness
Inherited color blindness has different incidence rates in men and women; women usually carry the defective gene and men usually inherit it. Pick an individual at random from the population. B = 1: has inherited color blindness; B = 0: not color blind. G = 0: male; G = 1: female.
Marginal: P(B=1) = 2.75%.
Conditional: P(B=1|G=0) = 5.0% (1 in 20 men); P(B=1|G=1) = 0.5% (1 in 200 women).
Joint: P(B=1 and G=0) = 2.5%; P(B=1 and G=1) = 0.25%.

Marginal Distributions
Prob[X = x] = Σ_y Prob[X = x, Y = y]

Color blindness example:
Gender    B=0      B=1      Total
G=0       .475     .025     .50
G=1       .4975    .0025    .50
Total     .9725    .0275    1.00

Prob[G=0] = Prob[G=0, B=0] + Prob[G=0, B=1]

Joint Continuous Distribution

Marginal Distributions

Copula Function - Application in Finance Bivariate Normal Distribution

The Bivariate Normal Distribution

Independent Random Variables
F(x, y) = Prob(X ≤ x, Y ≤ y) = Prob(X ≤ x)·Prob(Y ≤ y) = F_X(x) F_Y(y)
f(x, y) = ∂²F(x, y)/∂x∂y = f(x) f(y)

Independent Normals

Conditional Distributions
Color blindness example:
Gender    B=0 (No)   B=1 (Yes)   Total
G=0 (M)   .475       .025        .50
G=1 (F)   .4975      .0025       .50
Total     .9725      .0275       1.00

Prob(not color blind given male): Prob(B=0|G=0) = Prob(B=0, G=0)/Prob(G=0) = .475/.50 = .95.
Prob(B=1|G=0) = .025/.50 = .05.
Check: Prob(B=1|G=0) + Prob(B=0|G=0) = 1.
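The conditional calculations can be reproduced from the joint probabilities (a sketch, using the numbers from this example):

```python
# joint probabilities P(G, B) for the color-blindness example
joint = {("G0", "B0"): 0.475, ("G0", "B1"): 0.025,
         ("G1", "B0"): 0.4975, ("G1", "B1"): 0.0025}

# marginal: P(G=0) = sum over B of P(G=0, B)
p_g0 = joint[("G0", "B0")] + joint[("G0", "B1")]        # 0.50

# conditionals: P(B|G=0) = P(G=0, B) / P(G=0)
p_b0_given_g0 = joint[("G0", "B0")] / p_g0              # 0.95
p_b1_given_g0 = joint[("G0", "B1")] / p_g0              # 0.05
print(p_b0_given_g0, p_b1_given_g0)

# a conditional distribution sums to 1
assert abs(p_b0_given_g0 + p_b1_given_g0 - 1.0) < 1e-12
```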

Conditional Distribution Continuous Normal

Bivariate Normal Joint distribution is bivariate normal
Marginal distributions are normal Conditional distributions are normal

Figure: Y and Y|X.

Model Building Typically f(y|x) is of interest
x is generated by a separate process, f(x). The joint distribution is f(y, x) = f(y|x) f(x). Demographic example: y = log(household income | family size), x = family size; y|x ~ Normal(μ_y|x, σ_y|x); x ~ Poisson(λ).

Figure: conditional distributions y|x ~ Normal[μ_x, 4²] for x = 1, 2, 3, 4, with x ~ Poisson.