1 Discrete and Categorical Data William N. Evans Department of Economics University of Maryland.

1 1 Discrete and Categorical Data William N. Evans Department of Economics University of Maryland

2 2 Introduction. The workhorse statistical model in the social sciences is the multivariate regression model, estimated by ordinary least squares (OLS): $y_i = \beta_0 + x_{1i}\beta_1 + x_{2i}\beta_2 + \dots + x_{ki}\beta_k + \varepsilon_i$, or in vector form $y_i = x_i\beta + \varepsilon_i$.

3 3 Linear model: $y_i = \alpha + \beta x_i + \varepsilon_i$. $\alpha$ and $\beta$ are “population” values that represent the true relationship between x and y. Unfortunately, these values are unknown. The job of the researcher is to estimate them. Notice that if we differentiate y with respect to x, we obtain $dy/dx = \beta$.

4 4 $\beta$ represents how much y will change for a given change in x. Examples: the increase in income for more education; the change in crime or bankruptcy when slots are legalized; the increase in test score if you study more.

5 5 Put some concreteness on the problem. State of Maryland budget problems: a drop in revenues; expensive K-12 school spending initiatives. Short-term solution: raise the tax on cigarettes by 34 cents/pack. Problem: a tax hike will reduce consumption of the taxable product. Question for the state: as taxes are raised, how much will cigarette consumption fall?

6 6 Simple model: $y_i = \alpha + \beta x_i + \varepsilon_i$. Suppose y is a state’s per capita consumption of cigarettes and x represents taxes on cigarettes. Question: how much will y fall if x is increased by 34 cents/pack? Problem: there are many reasons why people smoke, and cost is but one of them.

7 7 Data. (Y) State per capita cigarette consumption for the years 1980-1997. (X) Tax (state + federal) in real cents per pack. “Scatter plot” of the data. Negative covariance between the variables: when $x > \bar{x}$, it is more likely that $y < \bar{y}$; when $x < \bar{x}$, it is more likely that $y > \bar{y}$. Goal: pick values of $\alpha$ and $\beta$ that “best fit” the data (best fit is defined in a moment).

8 8 Notation. True model: $y_i = \alpha + \beta x_i + \varepsilon_i$. We observe data points $(y_i, x_i)$. The parameters $\alpha$ and $\beta$ are unknown. The actual error ($\varepsilon_i$) is unknown. Estimated model: $y_i = a + b x_i + e_i$, where $(a, b)$ are estimates of the parameters $(\alpha, \beta)$ and $e_i$ is an estimate of $\varepsilon_i$, with $e_i = y_i - a - b x_i$. How do you estimate a and b?

9 9 Objective: minimize the sum of squared errors (SSE): $\min_{a,b} \sum_i e_i^2 = \sum_i (y_i - a - b x_i)^2$. Squaring treats positive and negative errors equally: over- or under-predicting by “5” is the same magnitude of error. The objective is a “quadratic form”. The optimal values for a and b are those that make the first derivatives equal zero, since functions reach min or max values when derivatives are zero.
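
The minimization on this slide can be sketched in a few lines of Python. Setting the two first derivatives of the SSE to zero gives the standard closed-form solution $b = \sum_i (x_i - \bar{x})(y_i - \bar{y}) / \sum_i (x_i - \bar{x})^2$ and $a = \bar{y} - b\bar{x}$. The function names and the tax/consumption numbers below are made up for illustration; they are not the Maryland data from the talk.

```python
def ols_fit(x, y):
    """Return (a, b) that minimize sum((y_i - a - b*x_i)**2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Solutions of the two first-order conditions dSSE/da = 0 and dSSE/db = 0
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def sse(x, y, a, b):
    """Sum of squared errors for candidate estimates (a, b)."""
    return sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))


# Hypothetical data: tax in cents per pack, packs consumed per capita
taxes = [20.0, 30.0, 40.0, 50.0, 60.0]
packs = [130.0, 122.0, 118.0, 109.0, 101.0]

a, b = ols_fit(taxes, packs)
print(f"a = {a:.2f}, b = {b:.3f}, SSE = {sse(taxes, packs, a, b):.3f}")
```

Any other pair of values, for example `(a + 1, b)`, gives a larger SSE on the same data, which is what “best fit” means here.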

12 12 The model has a lot of nice features: statistical properties are easy to establish; optimal estimates are easy to obtain; parameter estimates are easy to interpret; the model maximizes prediction (if you minimize SSE you maximize $R^2$). The model does well as a first-order approximation to lots of problems.

13 13 Discrete and Qualitative Data. The OLS model works well when y is a continuous variable: income, wages, test scores, weight, GDP. It does not have as many nice properties when y is not continuous. Example: doctor visits. Integer values; low counts for most people; a mass of observations at zero.

14 14 Downside of forcing non-standard outcomes into the OLS world? Can predict outside the allowable range, e.g., negative MD visits. Does not describe the data generating process well, e.g., a mass of observations at zero. Violates many properties of OLS, e.g., heteroskedasticity.

15 15 This talk: look at situations when the data generating process does not lend itself well to OLS models. Mathematically describe the data generating process. Show how we use a different optimization procedure to obtain estimates. Describe the statistical properties.

16 16 Show how to interpret parameters. Illustrate how to estimate the models with the popular program STATA.

17 17 Types of data generating processes we will consider. Dichotomous events (yes or no): 1=yes, 0=no. Graduate high school? Work? Obese? Smoke? Ordinal data: self-reported health (poor, fair, good, excellent); strongly disagree, disagree, agree, strongly agree.

18 18 Count data: doctor visits, lost workdays, fatality counts. Duration data: time to failure, time to death, time to re-employment.

19 19 Econometric Resources. Recommended textbook: Jeffrey Wooldridge (undergraduate and graduate texts); lots of insight and mathematical/statistical detail; very good examples. Helpful web sites: my graduate class; Jeff Smith’s class.

20 20 STATA. Very fast, convenient, well-documented, cheap and flexible statistical package. Excellent for cross-section/panel data projects, not as great for time series. Not as easy as SAS for manipulating large data sets from flat files. I usually clean data in SAS and estimate models in STATA.

21 21 STATA Resources - Specific. “Regression Models for Categorical Dependent Variables Using STATA” by J. Scott Long and Jeremy Freese. Available for sale from the STATA website for $52 (www.stata.com). Comes with post-estimation subroutines that translate results; you do not need to buy the book to use the subroutines.

22 22 In the STATA command line, type: net search spost. This will give you a list of available programs to download. One is spostado from http://www.indiana.edu/~jslsoc/stata. Click on the link and install the files.

23 23 Continuous Distributions. Random variables with an infinite number of possible values. Examples: units of measure (time, weight, distance). Many discrete outcomes can be treated as continuous, e.g., SAT scores.

24 24 How to describe a continuous random variable: the Probability Density Function (PDF). The PDF for a random variable x is defined as f(x), where $f(x) \ge 0$ and $\int f(x)\,dx = 1$. Calculus review: the integral of a function gives the “area under the curve”.

26 26 Cumulative Distribution Function (CDF). Suppose x is a “measure” like distance or time, with $0 \le x < \infty$. We may be interested in $\Pr(x \le a)$.

27 27 CDF What if we consider all values?

28 28 Properties of the CDF. Note that $\Pr(x \le b) + \Pr(x > b) = 1$, so $\Pr(x > b) = 1 - \Pr(x \le b)$. Many times, it is easier to work with complements.

29 29 General notation for continuous distributions. The PDF is denoted by a lower-case letter, such as f(x). The CDF is denoted by an upper-case letter, such as F(a).

30 30 Standard Normal Distribution. The most frequently used continuous distribution. Symmetric “bell-shaped” distribution. As we will show, the normal has useful properties. Many variables we observe in the real world look normally distributed. Any normal can be translated into the ‘standard normal’.

31 31 Examples of variables that look normally distributed: IQ scores; SAT scores; heights of females; log income; average gestation (weeks of pregnancy). As we will show in a few weeks, sample means are approximately normally distributed in large samples.

32 32 Standard Normal Distribution. PDF: $\phi(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}$, for $-\infty < z < \infty$.

33 33 Notation. $\phi(z)$ is the standard normal PDF evaluated at z. $\Phi[a] = \Pr(z \le a)$ is the standard normal CDF evaluated at a.

35 35 Standard Normal. Notice that: the normal is symmetric, $\phi(a) = \phi(-a)$; the normal is “unimodal”; median = mean; area under the curve = 1; almost all of the area is between (-3, 3). Evaluations of the CDF are done with statistical functions (Excel, SAS, etc.) or tables.
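
The properties listed on this slide can be checked numerically. The sketch below uses the standard normal PDF $\phi(z) = e^{-z^2/2}/\sqrt{2\pi}$ and a simple trapezoid rule to approximate the area under the curve. The function names, the integration limits of (-8, 8) as a stand-in for the whole real line, and the step count are choices made here for illustration only.

```python
import math


def phi(z):
    """Standard normal PDF evaluated at z."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def area(lo, hi, steps=10000):
    """Trapezoid-rule approximation of the area under phi between lo and hi."""
    h = (hi - lo) / steps
    total = 0.5 * (phi(lo) + phi(hi))
    for k in range(1, steps):
        total += phi(lo + k * h)
    return total * h


total_area = area(-8.0, 8.0)    # (-8, 8) stands in for (-infinity, infinity)
central_area = area(-3.0, 3.0)  # the "almost all area" interval

print(f"phi(1.2) = {phi(1.2):.6f}, phi(-1.2) = {phi(-1.2):.6f}")
print(f"area on (-8, 8) = {total_area:.6f}")
print(f"area on (-3, 3) = {central_area:.4f}")
```

The two PDF values agree (symmetry), the total area is 1 to within numerical error, and about 99.7 percent of the area lies between -3 and 3.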

36 36 Standard Normal CDF: $\Pr(z \le -0.98) = \Phi[-0.98] = 0.1635$

38 38 $\Pr(z \le 1.41) = \Phi[1.41] = 0.9207$

40 40 $\Pr(z > 1.17) = 1 - \Pr(z \le 1.17) = 1 - \Phi[1.17] = 1 - 0.8790 = 0.1210$

42 42 $\Pr(0.1 \le z \le 1.9) = \Pr(z \le 1.9) - \Pr(z \le 0.1) = \Phi[1.9] - \Phi[0.1] = 0.9713 - 0.5398 = 0.4315$

46 46 Important Properties of the Normal Distribution: $\Pr(z \le A) = \Phi[A]$; $\Pr(z > A) = 1 - \Phi[A]$; $\Pr(z \le -A) = \Phi[-A]$; $\Pr(z > -A) = 1 - \Phi[-A] = \Phi[A]$.
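
The table look-ups on the preceding slides can be reproduced with a statistical function instead of a printed table. A minimal sketch, assuming only Python's standard library: the standard normal CDF can be written with the error function as $\Phi[a] = \frac{1}{2}\left(1 + \mathrm{erf}(a/\sqrt{2})\right)$. The function name `Phi` is chosen here to match the slide notation.

```python
import math


def Phi(a):
    """Standard normal CDF, Pr(z <= a), written with the error function."""
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))


# The worked examples from the slides
print(f"Pr(z <= -0.98)      = {Phi(-0.98):.4f}")
print(f"Pr(z <= 1.41)       = {Phi(1.41):.4f}")
print(f"Pr(z > 1.17)        = {1.0 - Phi(1.17):.4f}")
print(f"Pr(0.1 <= z <= 1.9) = {Phi(1.9) - Phi(0.1):.4f}")

# Symmetry property: Pr(z > -A) = 1 - Phi[-A] = Phi[A]
A = 0.75
print(f"1 - Phi[-A] = {1.0 - Phi(-A):.6f}, Phi[A] = {Phi(A):.6f}")
```

Working with complements, as in the third line, avoids needing a separate table for upper-tail probabilities.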

47 47 Maximum likelihood estimation. Observe n independent outcomes $(y_1, y_2, y_3, \dots, y_n)$, all drawn from the same distribution. Each $y_i$ is drawn from $f(y_i; \theta)$, where $\theta$ is an unknown parameter of the distribution f. Recall the definition of independence: if a and b are independent, $\Pr(a \text{ and } b) = \Pr(a)\Pr(b)$.

48 48 Because all the draws are independent, the probability that these particular n values of y would be drawn at random is called the ‘likelihood function’, and it equals $L = \Pr(y_1)\Pr(y_2)\cdots\Pr(y_n) = f(y_1; \theta) f(y_2; \theta) \cdots f(y_n; \theta)$.

49 49 MLE: pick the value of $\theta$ that maximizes the chance that these n values of y would have been generated randomly. To maximize L, we can maximize a monotonically increasing function of L, such as ln(L). Recall ln(abcd) = ln(a) + ln(b) + ln(c) + ln(d).

50 50 Max $\mathcal{L} = \ln(L) = \ln[f(y_1; \theta)] + \ln[f(y_2; \theta)] + \dots + \ln[f(y_n; \theta)] = \sum_i \ln[f(y_i; \theta)]$. Pick $\theta$ so that $\mathcal{L}$ is maximized: $d\mathcal{L}/d\theta = 0$.

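51 51 [Figure: the log-likelihood $\mathcal{L}$ plotted against $\theta$, with candidate values $\theta_1$ and $\theta_2$ marked]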

52 52 Example: Poisson. Suppose y measures ‘counts’ such as doctor visits. $y_i$ is drawn from a Poisson distribution, $f(y_i; \lambda) = e^{-\lambda} \lambda^{y_i} / y_i!$ for $\lambda > 0$, with $E[y_i] = Var[y_i] = \lambda$.

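53 53 Given n observations $(y_1, y_2, y_3, \dots, y_n)$, pick the value of $\lambda$ that maximizes $\mathcal{L}$: Max $\mathcal{L} = \sum_i \ln[f(y_i; \lambda)] = \sum_i \ln[e^{-\lambda} \lambda^{y_i} / y_i!] = \sum_i [-\lambda + y_i \ln(\lambda) - \ln(y_i!)] = -n\lambda + \ln(\lambda) \sum_i y_i - \sum_i \ln(y_i!)$.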

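54 54 $\mathcal{L} = -n\lambda + \ln(\lambda) \sum_i y_i - \sum_i \ln(y_i!)$. First-order condition: $d\mathcal{L}/d\lambda = -n + (1/\lambda) \sum_i y_i = 0$. Solve for $\lambda$: $\hat{\lambda} = \sum_i y_i / n = \bar{y}$, the sample mean of y.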
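
The closed-form result can be checked numerically: evaluate the Poisson log-likelihood over a grid of candidate values of $\lambda$ and confirm that the largest value occurs at the sample mean. This is only a sketch; the doctor-visit counts, the function name, and the grid spacing of 0.1 are made up for illustration.

```python
import math


def poisson_loglik(lam, ys):
    """Log-likelihood: sum of -lam + y*ln(lam) - ln(y!) over the sample."""
    # math.lgamma(y + 1) equals ln(y!) for non-negative integers y
    return sum(-lam + y * math.log(lam) - math.lgamma(y + 1) for y in ys)


# Hypothetical counts of doctor visits for ten people
visits = [0, 0, 1, 0, 2, 3, 0, 1, 5, 0]

lam_hat = sum(visits) / len(visits)  # closed-form MLE: the sample mean

# Brute-force check over candidate values 0.1, 0.2, ..., 5.0
grid = [0.1 * k for k in range(1, 51)]
best_on_grid = max(grid, key=lambda lam: poisson_loglik(lam, visits))

print(f"sample mean = {lam_hat:.2f}, best grid value = {best_on_grid:.2f}")
```

Both numbers come out at 1.2 for this sample, in line with $\hat{\lambda} = \bar{y}$.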

55 55 In most cases, however, we cannot find a ‘closed form’ solution for the parameter in $\ln[f(y_i; \theta)]$ and must ‘search’ over possible solutions. How does the search work? Start with a candidate value of $\theta$. Calculate $d\mathcal{L}/d\theta$.

56 56 If $d\mathcal{L}/d\theta > 0$, increasing $\theta$ will increase $\mathcal{L}$, so we increase $\theta$ some. If $d\mathcal{L}/d\theta < 0$, decreasing $\theta$ will increase $\mathcal{L}$, so we decrease $\theta$ some. Keep changing $\theta$ until $d\mathcal{L}/d\theta = 0$. How far you ‘step’ when you change $\theta$ is determined by a number of different factors.
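
The search described above can be sketched for the Poisson example, where the derivative is $-n + (1/\lambda)\sum_i y_i$. The fixed step size of 0.5, the use of the derivative divided by n, the stopping tolerance, and the function names are all simplifying assumptions made here; statistical packages generally choose the step in more sophisticated ways, and this is not a description of what STATA does internally.

```python
def slope_per_observation(lam, ys):
    """Derivative of the Poisson log-likelihood divided by n: -1 + mean(y)/lam."""
    return -1.0 + (sum(ys) / len(ys)) / lam


def search_for_mle(ys, start, step=0.5, tol=1e-10, max_iter=1000):
    """Move lam uphill until the derivative is (numerically) zero."""
    lam = start
    for _ in range(max_iter):
        slope = slope_per_observation(lam, ys)
        if abs(slope) < tol:
            break
        # Positive slope: increase lam. Negative slope: decrease lam.
        lam += step * slope
    return lam


# Hypothetical counts of doctor visits; the sample mean is 1.2
visits = [0, 0, 1, 0, 2, 3, 0, 1, 5, 0]

lam_from_above = search_for_mle(visits, start=3.0)  # starts where the slope is negative
lam_from_below = search_for_mle(visits, start=0.6)  # starts where the slope is positive

print(f"from above: {lam_from_above:.6f}, from below: {lam_from_below:.6f}")
```

Both starting points end at the sample mean, the same answer the closed-form solution gives for the Poisson case.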

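57 57 [Figure: $\mathcal{L}$ plotted against $\theta$; at $\theta_1$ the slope is positive, $d\mathcal{L}/d\theta > 0$]

58 58 [Figure: $\mathcal{L}$ plotted against $\theta$; at $\theta_3$ the slope is negative, $d\mathcal{L}/d\theta < 0$]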

59 59 Properties of MLE estimates. MLE is sometimes called efficient estimation: in large samples, no other consistent estimator can generate a smaller variance than the one obtained by MLE. Parameter estimates are approximately normally distributed when sample sizes are large.

