Introduction to log-linear models


1 Introduction to log-linear models
Saturday, February 02, 2019. Analysis of count data. Introduction to log-linear models. Log-linear analysis = analysis on a logarithmic scale.

2 Logarithmic scale
Natural logarithm: if y = ln x, then x = exp[y]. x changes exponentially with a linear change in y; y is measured on the log scale.

3 Logarithmic scale
If ln x = a, then x = exp(a).
If ln x = az and z is discrete, then a one-unit change in z multiplies x by exp(a).
If ln x = az and z is continuous, then the change in x associated with an infinitesimally small change in z is $dx/dz = a \exp(az) = a x$.

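4 Logarithmic scale and logit scale
(First-order) difference in ln is ln of ratio (odds): $\ln a - \ln b = \ln(a/b)$.
Second-order difference is ln of odds ratio: $(\ln a - \ln b) - (\ln c - \ln d) = \ln[(a/b)/(c/d)] = \ln \mathrm{OR}$.
If ln OR = 1.2 and ln a = -ln b = -ln c = ln d, then ln OR = 4 ln a, so ln a = 0.3.
ln(odds) = logit. If y = f(x) and y = ln(a/b), then y is measured on the logit scale.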

5 Log-linear analysis
Contingency-table analysis; categorical data analysis; discrete multivariate analysis (Bishop, Fienberg and Holland, 1975); analysis of cross-classified data; multivariate analysis of qualitative data (Goodman, 1978); count data analysis.

6 Log-linear model: fit a model to a table of counts / frequencies
Two data sets:
Survey: political attitudes of British electors
Survey: leaving parental home in the Netherlands

7 Survey: political attitudes of British electors
Source: Payne, C. (1977) The log-linear model for contingency tables. In: C. O'Muircheartaigh and C. Payne (eds.) The analysis of survey data. Vol. 2: Model fitting, Wiley, New York, pp [data p. 106]. (From Butler and Stokes, ‘Political change in Britain’, Macmillan, 2nd edition, 1974.)

8 Survey: leaving parental home in the Netherlands

9 Counts are generated by a Poisson process → Poisson distribution

10 The Poisson probability model
Let N be a random variable representing the number of events during a unit interval and let n be a realisation of N (COUNT). N is a Poisson r.v. following a Poisson distribution with parameter $\lambda$:
$\Pr\{N = n\} = \lambda^{n} \exp(-\lambda) / n!$
The parameter $\lambda$ is the expected number of events per unit time interval: $\lambda = E[N]$
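A minimal Python sketch of the Poisson probability model may help here. The function name `poisson_pmf` and the value `lam = 2.5` are illustrative assumptions, not taken from the slides; the check confirms numerically that the probabilities sum to one and that the mean equals the parameter.

```python
import math

def poisson_pmf(n: int, lam: float) -> float:
    """Pr{N = n} for a Poisson count with expected value lam."""
    return lam ** n * math.exp(-lam) / math.factorial(n)

lam = 2.5  # hypothetical expected number of events per unit interval
# Truncate the support at 59; the remaining tail mass is negligible for lam = 2.5.
probs = [poisson_pmf(n, lam) for n in range(60)]
total_prob = sum(probs)
mean = sum(n * p for n, p in enumerate(probs))
variance = sum((n - mean) ** 2 * p for n, p in enumerate(probs))
print(f"sum={total_prob:.6f} mean={mean:.6f} variance={variance:.6f}")
```

Both the mean and the variance come out equal to `lam`, which is the defining property of the Poisson distribution used in the slides that follow.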

11

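12 Likelihood function
Probability mass function: $\Pr\{N = n\} = \lambda^{n} \exp(-\lambda) / n!$
Log-likelihood function: $\ln L(\lambda) = n \ln \lambda - \lambda - \ln n!$
→ Likelihood equations to determine the ‘best’ value of parameter $\lambda$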

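13 Likelihood equations
$\partial \ln L / \partial \lambda = n/\lambda - 1 = 0$. Hence: $\hat{\lambda} = n$. For a Poisson variable: Var(N) = $\lambda$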
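As a numerical illustration of the likelihood equations, the sketch below assumes a small hypothetical sample of independent counts (the list `counts` is invented for the example). For such a sample the likelihood equation gives the sample mean as the estimate of the Poisson parameter, and a crude grid search over the log-likelihood lands on the same value.

```python
import math

def log_likelihood(lam, counts):
    """Poisson log-likelihood of independent counts; lgamma(n + 1) = ln n!."""
    return sum(n * math.log(lam) - lam - math.lgamma(n + 1) for n in counts)

counts = [3, 0, 2, 4, 1, 2]          # hypothetical observed counts
lam_hat = sum(counts) / len(counts)  # closed-form solution: the sample mean

# Crude grid search over 0.50, 0.51, ..., 4.49 as a numerical cross-check.
grid = [0.5 + 0.01 * k for k in range(400)]
lam_grid = max(grid, key=lambda lam: log_likelihood(lam, counts))
print(lam_hat, round(lam_grid, 2))
```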

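14 Log-linear model
Let i represent an individual with characteristics $x_i$.
The probability of observing $n_i$ events during a unit interval given that the expected number of events is $\lambda_i$:
$\Pr\{N_i = n_i \mid \lambda_i\} = \lambda_i^{n_i} \exp(-\lambda_i) / n_i!$
with $\lambda_i = \exp(\beta_0 + \beta_1 x_i)$ or $\ln \lambda_i = \beta_0 + \beta_1 x_i$: log-linear model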

15 The log-linear model The objective of log-linear analysis is to determine if the distribution of counts among the cells of a table can be explained by a simpler, underlying structure. Log-linear models specify different structures in terms of the cross-classified variables (rows, columns and layers of the table).

16 Log-linear models for two-way tables
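Saturated log-linear model: $\ln \lambda_{ij} = u + u^A_i + u^B_j + u^{AB}_{ij}$
$u$: overall effect (level); $u^A_i$ and $u^B_j$: main effects (marginal freq.); $u^{AB}_{ij}$: interaction effect.
In case of a 2 x 2 table: 4 observations and 9 parameters (1 + 2 + 2 + 4), hence the need for normalisation constraints.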

17 Survey: leaving parental home in the Netherlands
Research question: do females leave home earlier than males?

18 Descriptive statistics
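Leaving home: descriptive statistics (counts and percentages; N = 530)
Timing   Sex       Count   %
Early    Females   135     25.47
Early    Males      74     13.96
Late     Females   143     26.98
Late     Males     178     33.58
Odds of leaving home early rather than late: females 135/143, males 74/178.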

19 Log-linear models for two-way tables 4 models
Leaving home
Model 1: Null model or overall effect model. All categories are equiprobable (an observation is equally likely to fall into any cell): $\ln \lambda_{ij} = u$ for all i and j.
Estimate: u = 4.887; exp(4.887) = 132.5 = 530/4.
$\lambda_{ij}$ is the expected count (frequency) in cell (ij): category i of variable A (row) and category j of variable B (column).

20 Leaving home
Here $\lambda_{ij}$ is a cell frequency generated by a Poisson process, and $Var[aX] = a^2 Var[X]$ where a is a constant (e.g. Fingleton, 1984, p. 29).
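The null model can be checked with a few lines of Python. The four cell counts below are inferred from the percentages and the total of 530 shown in the slides. The standard error is an assumption of this sketch, not a value from the slides: it applies Var[aX] = a² Var[X] to the Poisson total and then a delta-method step to move to the log scale.

```python
import math

# Cell counts inferred from the reported percentages (N = 530).
counts = {
    ("early", "female"): 135,
    ("early", "male"): 74,
    ("late", "female"): 143,
    ("late", "male"): 178,
}
total = sum(counts.values())            # 530
expected = total / len(counts)          # every cell gets 530/4 = 132.5
overall_effect = math.log(expected)     # about 4.887

# Var[total] = total for a Poisson count, so Var[total/4] = total/16.
var_expected = total / len(counts) ** 2
# Delta method (assumption): se(ln x) is approximately se(x) / x.
se_overall = math.sqrt(var_expected) / expected
print(round(overall_effect, 3), round(se_overall, 4))
```

Under these assumptions the standard error of the overall effect simplifies to the square root of 1/530.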

21 Log-linear models for two-way tables
Leaving home
Model 2: B null model (GLIM). Categories of variable B (sex) are equiprobable within levels of variable A (age; time): $\ln \lambda_{ij} = u + u^A_i$ for all j.
GLIM output columns: estimate, s.e., parameter, exp(parameter), prediction.
Overall effect: exp(parameter) = 209/2 = 104.5
TIME(2): exp(parameter) = [321/2]/104.5

22 Log-linear models for two-way tables
Leaving home
Model 2: B null model (SPSS). Categories of variable B (sex) are equiprobable within levels of variable A (time): $\ln \lambda_{ij} = u + u^A_i$ for all j.
SPSS output columns: estimate, s.e., parameter, exp(parameter), for the parameters Overall effect, TIME(1) and TIME(2).

23 SPSS Model: Poisson. Design: Constant + TIMING
GENLOG timing sex
  /MODEL=POISSON
  /PRINT FREQ ESTIM CORR COV
  /PLOT NONE
  /CRITERIA =CIN(95) ITERATE(20) CONVERGE(.001) DELTA(0)
  /DESIGN timing
  /SAVE PRED .
Model: Poisson. Design: Constant + TIMING
Factor          Value     Observed %   Expected %
TIMING Early    Females   25.47        19.72
TIMING Early    Males     13.96        19.72
TIMING Late     Females   26.98        30.28
TIMING Late     Males     33.58        30.28
Parameter Estimates (asymptotic 95% CI): Parameter, Estimate, SE, Lower, Upper

24 SPSS Model: Poisson. Design: Constant + SEX + TIMING
GENLOG timing sex
  /MODEL=POISSON
  /PRINT FREQ ESTIM CORR COV
  /CRITERIA =CIN(95) ITERATE(20) CONVERGE(.001) DELTA(0)
  /DESIGN sex timing
  /SAVE PRED .
Model: Poisson. Design: Constant + SEX + TIMING
Factor          Value     Observed %   Expected %
TIMING Early    Females   25.47        20.68
TIMING Early    Males     13.96        18.75
TIMING Late     Females   26.98        31.77
TIMING Late     Males     33.58        28.80
Parameter Estimates (asymptotic 95% CI): Parameter, Estimate, SE, Lower, Upper, for Constant, [SEX = 1], [SEX = 2], [TIMING = 1], [TIMING = 2]

25 Log-linear models for two-way tables
Leaving home
Model 3: independence model (unsaturated model). Categories of variable B (sex) are not equiprobable, but the probability is independent of levels of variable A (age; time): $\ln \lambda_{ij} = u + u^A_i + u^B_j$.
GLIM output columns: estimate, s.e., parameter, exp(parameter), for the parameters Overall effect, TIME(2) and SEX(2).

26 LOG-LINEAR MODEL: predictions (unsaturated model)
Females leaving home early: 109.62
Females leaving home late: 109.62 * exp[TIME(2)]
Males leaving home early: 109.62 * exp[SEX(2)] = 99.37
Males leaving home late: 109.62 * exp[TIME(2)] * exp[SEX(2)]
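The predictions of the independence model can be sketched in Python as follows. The cell counts are inferred from the percentages and the total of 530 reported in the slides, and the variable names are illustrative. The sketch assumes the GLIM-style parameterisation in which the first category of each variable is the baseline, so the multipliers are simple ratios of marginal totals.

```python
# Rows: early, late; columns: females, males (counts inferred from the slides).
counts = [[135, 74], [143, 178]]
row_totals = [sum(row) for row in counts]        # early, late
col_totals = [sum(col) for col in zip(*counts)]  # females, males
total = sum(row_totals)

# Independence model: expected count = row total * column total / N.
expected = [[r * c / total for c in col_totals] for r in row_totals]

# Multiplicative parameters with (early, female) as the baseline cell.
baseline = expected[0][0]                 # about 109.6
time2 = row_totals[1] / row_totals[0]     # exp[TIME(2)]
sex2 = col_totals[1] / col_totals[0]      # exp[SEX(2)]

for label, value in [
    ("females early", baseline),
    ("females late", baseline * time2),
    ("males early", baseline * sex2),
    ("males late", baseline * time2 * sex2),
]:
    print(f"{label}: {value:.2f}")
```

The values agree with the figures on the slide up to rounding of the reported parameters.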

27 SPSS
Leaving home
Parameter   Estimate   SE      Effect
1           5.0280     .0721   Overall effect
Further parameters: Time(1), Time(2), Sex(1), Sex(2)

28 Log-linear models for two-way tables
Leaving home
Model 4: saturated model. The values of categories of variable B (sex) depend on levels of variable A (age; time): $\ln \lambda_{ij} = u + u^A_i + u^B_j + u^{AB}_{ij}$.
GLIM parameters (estimate, s.e.):
Overall effect = ln 135
TIME(2) = ln 143 - ln 135 (ln odds)
SEX(2) = ln 74 - ln 135 (ln odds)
TIME(2).SEX(2) = ln odds ratio

29 Log-linear model parameters and odds and odds ratios
Dummy-variable coding. Reference categories: early / female.
Interaction effect: ln odds ratio.
Main effects: ln odds (reference category).
Time effect: ln odds(females) = ln 143/135 = ln 1.0593 = 0.0576
Sex effect: ln odds(early) = ln 74/135 = ln 0.5481 = -0.6012
Overall effect: ln frequency. ln frequency(early, female) = ln 135 = 4.9053
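A short Python sketch of the dummy-coded saturated model for the leaving-home table. The count of 178 for males leaving late is inferred from the reported percentage and the total of 530; variable names are illustrative. The four parameters are plain log counts, log odds and a log odds ratio, and together they reproduce every cell exactly.

```python
import math

# Counts: (early, female) is the reference cell.
n_early_f, n_late_f, n_early_m, n_late_m = 135, 143, 74, 178

overall = math.log(n_early_f)                     # ln frequency of reference cell
time_effect = math.log(n_late_f / n_early_f)      # ln odds late vs early, females
sex_effect = math.log(n_early_m / n_early_f)      # ln odds male vs female, early
interaction = math.log((n_late_m * n_early_f) / (n_late_f * n_early_m))  # ln odds ratio

# A saturated model reproduces the observed counts.
pred_late_m = math.exp(overall + time_effect + sex_effect + interaction)
print(round(overall, 4), round(time_effect, 4), round(sex_effect, 4),
      round(interaction, 4), round(pred_late_m, 2))
```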

30 SPSS
Leaving home
Parameter   Estimate   SE      Effect
1           5.1846     .0748   Overall effect
Further parameters: Time(1), Time(2), Sex(1), Sex(2), Time(1) * Sex(1), Time(1) * Sex(2), Time(2) * Sex(1), Time(2) * Sex(2)

31 LOG-LINEAR MODEL: predictions
Leaving home
Expected frequencies: observed counts and the predictions of Model 1 to Model 5 for the cells Fem_<20, Mal_<20, Fem_>20 and Mal_>20.

32 Relation log-linear model and Poisson regression model
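$\ln \lambda_{ij} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$, where $x_1$ and $x_2$ are dummy variables (0 if categ. i or j = 1 and 1 if i or j = 2) and the interaction variable is $x_3 = x_1 x_2$.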

33 Log-linear model fit a model to a table of frequencies
Data: survey of political attitudes of British electors. Source: Payne, C. (1977) The log-linear model for contingency tables. In: C. O'Muircheartaigh and C. Payne (eds.) The analysis of survey data. Vol. 2: Model fitting, Wiley, New York, pp [data p. 106]. (From Butler and Stokes, ‘Political change in Britain’, Macmillan, 2nd edition, 1974.)

34 The classical approach
Geometric means (Birch, 1963). Effect coding (mean is ref. cat.).
Birch, M.W. (1963) ‘Maximum likelihood in three-way contingency tables’, J. Royal Stat. Soc. (B), 25:

35 The basic model
Political attitudes
Overall effect: 22.98/4 = 5.7456
Effect of party: Conservative: 11.4948/2 - 5.7456 = 0.0018; Labour: 11.4875/2 - 5.7456 = -0.0018
Effect of gender: Male: 11.4453/2 - 5.7456 = -0.0229; Female: 11.5370/2 - 5.7456 = 0.0229
Interaction effects (gender-party interaction effect):
Male conservative: -0.0933
Female conservative: 0.0933
Male labour: 0.0933
Female labour: -0.0933
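The geometric-mean (effect-coding) calculation can be sketched in Python. The four counts (279, 352, 335, 291) are the ones used elsewhere in these slides for the political-attitudes table; the dictionary layout and names are illustrative assumptions. Each effect is a mean of log counts minus the overall mean of log counts.

```python
import math

counts = {
    ("conservative", "male"): 279,
    ("conservative", "female"): 352,
    ("labour", "male"): 335,
    ("labour", "female"): 291,
}
parties = ("conservative", "labour")
genders = ("male", "female")
log_n = {cell: math.log(n) for cell, n in counts.items()}

# Overall effect: mean of the four log counts (log of the geometric mean).
overall = sum(log_n.values()) / 4
# Main effects: marginal mean of log counts minus the overall effect.
party = {p: sum(log_n[(p, g)] for g in genders) / 2 - overall for p in parties}
gender = {g: sum(log_n[(p, g)] for p in parties) / 2 - overall for g in genders}
# Interaction effects: what remains after overall and main effects.
interaction = {
    (p, g): log_n[(p, g)] - overall - party[p] - gender[g]
    for p in parties for g in genders
}

print(round(overall, 4))
print({k: round(v, 4) for k, v in party.items()})
print({k: round(v, 4) for k, v in gender.items()})
print({k: round(v, 4) for k, v in interaction.items()})
```

The effects within each set sum to zero, which is the normalisation that effect coding imposes.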

36 Parameters are subject to constraints: normalisation constraints
Political attitudes. The basic model.
Birch, M.W. (1963) ‘Maximum likelihood in three-way contingency tables’, J. Royal Stat. Soc. (B), 25:
Coding: effect coding. Parameters are subject to normalisation constraints: $\sum_i u^A_i = \sum_j u^B_j = \sum_i u^{AB}_{ij} = \sum_j u^{AB}_{ij} = 0$.
Only first-order contrasts can be estimated.

37 Political attitudes The basic model (GLIM) Estimate S.E.

38 Log-linear model parameters and odds and odds ratios
Dummy-variable coding. Reference categories: conservative / male.
Interaction effect: ln odds ratio.
Main effects: ln odds (reference category).
Party effect: ln odds(males) = ln 335/279 = ln 1.2007 = 0.1829
Gender effect: ln odds(conservatives) = ln 352/279 = ln 1.2616 = 0.2324
Overall effect: ln frequency. ln frequency(conservatives, males) = ln 279 = 5.6312

39 Log-linear model parameters and odds and odds ratios
Recall: translation from odds to probabilities. If you want to predict probabilities or proportions instead of odds: $p = \text{odds} / (1 + \text{odds})$.
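The translation from odds to probabilities is a one-liner; the sketch below uses the odds of voting Labour rather than Conservative among males (335/279) from the slides as a worked case. Function names are illustrative.

```python
import math

def odds_to_probability(odds: float) -> float:
    """p = odds / (1 + odds)."""
    return odds / (1.0 + odds)

def logit_to_probability(logit: float) -> float:
    """Same translation starting from ln(odds)."""
    return 1.0 / (1.0 + math.exp(-logit))

odds_labour_males = 335 / 279
p_from_odds = odds_to_probability(odds_labour_males)
p_from_logit = logit_to_probability(math.log(odds_labour_males))
print(round(p_from_odds, 4), round(p_from_logit, 4))
```

Both routes give the proportion of Labour voters among males, 335/(335 + 279).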

40 Log-linear model parameters and odds and odds ratios
Effect coding: +1: labour / female; -1: conservative / male.
Interaction effect (dummy coding): ln odds ratio.
Translation between dummy-variable coding and effect coding (Alba, 1987): effect-coded parameter = sign * (ln odds ratio)/4.
Cell                  Sign   Parameter
Male conservative     +1     (-0.3732/4) = -0.0933
Female conservative   -1     -(-0.3732/4) = 0.0933
Male labour           -1     -(-0.3732/4) = 0.0933
Female labour         +1     (-0.3732/4) = -0.0933
Translation between effect coding and dummy-variable coding: WEIGHTED SUM
(+1)(-0.0933)+(-1)(0.0933)+(-1)(0.0933)+(+1)(-0.0933) = -0.3732

41 Log-linear model parameters and odds and odds ratios
Effect coding: +1: labour / female; -1: conservative / male.
Main effects (dummy coding): ln odds (reference category). Gender effect: ln odds(conservatives) = ln 352/279 = ln 1.2616 = 0.2324.
Translation to effect coding: (ln odds)/2 + (ln odds ratio)/4, with sign +1 for female and -1 for male.
Female: 0.2324/2 + (-0.3732/4) = 0.0229
Male: -0.0229
Translation back to dummy coding: WEIGHTED SUM of the effect-coded ln frequencies, (+1)(female conservative) + (-1)(male conservative) = 0.2324.

42 Log-linear model parameters and odds and odds ratios
Effect coding: +1: labour / female; -1: conservative / male.
Main effects (dummy coding): ln odds (reference category). Party effect: ln odds(males) = ln 335/279 = ln 1.2007 = 0.1829.
Translation to effect coding: (ln odds)/2 + (ln odds ratio)/4, with sign -1 for conservative and +1 for labour.
Labour: 0.1829/2 + (-0.3732/4) = -0.0018
Conservative: 0.0018
Translation back to dummy coding: WEIGHTED SUM of the effect-coded ln frequencies, (-1)(conservative male) + (+1)(labour male) = 0.1829.

43 Log-linear model parameters and odds and odds ratios
Effect coding: +1: labour / female; -1: conservative / male.
Overall effect (dummy coding): ln frequency. ln frequency(conservatives, males) = ln 279 = 5.6312.
Translation to effect coding: ln frequency + (ln odds)/2 + (ln odds)/2 + (ln odds ratio)/4 = 5.6312 + 0.2324/2 + 0.1829/2 + (-0.3732/4) = 5.7456.
Translation back to dummy coding: WEIGHTED SUM of the effect-coded parameters for conservative males, (+1)[5.7456 + 0.0018 - 0.0229 - 0.0933] = 5.6312.
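The translation rules between the two codings can be verified numerically. The sketch below starts from the dummy-coded parameters of the political-attitudes table (reference cell: conservative males) and applies the (ln odds)/2 and (ln odds ratio)/4 rules named on the slides; the variable names are illustrative. It then compares the results with the effect-coded values computed directly from the log counts.

```python
import math

n_mc, n_fc, n_ml, n_fl = 279, 352, 335, 291  # male/female x conservative/labour

# Dummy coding, reference cell = conservative male.
ln_freq_ref = math.log(n_mc)
ln_odds_gender = math.log(n_fc / n_mc)   # females vs males, conservatives
ln_odds_party = math.log(n_ml / n_mc)    # labour vs conservative, males
ln_odds_ratio = math.log((n_mc * n_fl) / (n_fc * n_ml))

# Translation to effect coding (+1: labour / female).
effect_interaction = ln_odds_ratio / 4
effect_female = ln_odds_gender / 2 + ln_odds_ratio / 4
effect_labour = ln_odds_party / 2 + ln_odds_ratio / 4
effect_overall = (ln_freq_ref + ln_odds_gender / 2
                  + ln_odds_party / 2 + ln_odds_ratio / 4)

# Direct effect-coded values from the log counts, for comparison.
logs = [math.log(n) for n in (n_mc, n_fc, n_ml, n_fl)]
direct_overall = sum(logs) / 4
direct_female = (math.log(n_fc) + math.log(n_fl)) / 2 - direct_overall
direct_labour = (math.log(n_ml) + math.log(n_fl)) / 2 - direct_overall

print(round(effect_overall, 4), round(effect_female, 4),
      round(effect_labour, 4), round(effect_interaction, 4))
```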

44 Political attitudes The basic model (SPSS)

45 The basic model (1) Political attitudes
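One equation per cell, for $\ln \lambda_{11}$, $\ln \lambda_{12}$, $\ln \lambda_{21}$ and $\ln \lambda_{22}$: $\ln \lambda_{ij} = u + u^A_i + u^B_j + u^{AB}_{ij}$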

46 The design-matrix approach

47 Design matrix unsaturated log-linear model
Number of parameters exceeds number of equations → need for additional equations.
X'X is singular, so $(X'X)^{-1}$ does not exist → identify linear dependencies.

48 Design matrix unsaturated log-linear model
(additional eq.) Coding!

49 3 unknowns → 3 equations, where $\hat{\lambda}_{ij}$ is the frequency predicted by the model

50 Political attitudes

51 Political attitudes
314.17*1.0040*0.9772 = 308.23
314.17*[1/1.0040]* =
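The multiplicative prediction on this slide can be reproduced, up to rounding, from the marginal totals of the political-attitudes table. The sketch assumes the independence model with effect coding, so that the overall factor is the geometric mean of the expected counts and each main-effect factor is the square root of a ratio of marginal totals; names are illustrative.

```python
import math

# Rows: conservative, labour; columns: male, female.
counts = [[279, 352], [335, 291]]
row_totals = [sum(row) for row in counts]        # 631, 626
col_totals = [sum(col) for col in zip(*counts)]  # 614, 643
total = sum(row_totals)                          # 1257

expected = [[r * c / total for c in col_totals] for r in row_totals]

# Effect coding: overall factor = geometric mean of the expected counts.
overall = math.exp(sum(math.log(e) for row in expected for e in row) / 4)
party_conservative = math.sqrt(row_totals[0] / row_totals[1])
gender_male = math.sqrt(col_totals[0] / col_totals[1])

pred_male_conservative = overall * party_conservative * gender_male
print(round(overall, 2), round(party_conservative, 4),
      round(gender_male, 4), round(pred_male_conservative, 2))
```

Small differences in the last digit relative to the slide come from rounding of the reported parameters.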

52 Design matrix Saturated log-linear model

53 Political attitudes: exp[5.6312] = 279; exp[5.6312 + 0.1829] = exp[5.8141] = 335
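A design-matrix view of the saturated model can be sketched in a few lines of Python. The sketch assumes dummy coding with conservative males as the reference cell; the column order (constant, female, labour, female x labour) and the names `design` and `beta` are illustrative choices. Each predicted count is the exponential of the row of the design matrix times the parameter vector.

```python
import math

# Dummy-coded parameters: constant, gender (female), party (labour), interaction.
beta = [
    math.log(279),
    math.log(352 / 279),
    math.log(335 / 279),
    math.log((279 * 291) / (352 * 335)),
]

# One design-matrix row per cell: [constant, female, labour, female * labour].
design = {
    "male conservative": [1, 0, 0, 0],
    "female conservative": [1, 1, 0, 0],
    "male labour": [1, 0, 1, 0],
    "female labour": [1, 1, 1, 1],
}

predicted = {
    cell: math.exp(sum(x * b for x, b in zip(row, beta)))
    for cell, row in design.items()
}
for cell, value in predicted.items():
    print(f"{cell}: {value:.1f}")
```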

54 Political attitudes

55 Design matrix: other restrictions on parameters saturated log-linear model
(SPSS)

56 Political attitudes

57 Political attitudes
REF: males conservative: 335/279
REF: females labour: 352/291

58 Political attitudes

59 Prediction of counts or frequencies
Political attitudes
A. Effect coding: each of the counts 279, 352, 335 and 291 is the product of the exponentiated overall, party, gender and interaction effects.
B. Contrast coding (GLIM): each count is 279 multiplied by the exponentiated gender, party and interaction effects that apply to the cell.
291 (females voting labour)
279 (males voting conservative = ref. cat.)
352 (females voting conservative)
335 (males voting labour)
C. Contrast coding (SPSS; SPSS adds 0.5 to observed values): 279.5, 352.5, 291.5 (females voting labour = ref. cat.) and 335.5 are obtained in the same way, with a factor 1 for each effect at its reference category.

60 The Poisson regression model

61 The Poisson probability model
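Political attitudes. The Poisson probability model: $\Pr\{N_{ij} = n_{ij}\} = \lambda_{ij}^{n_{ij}} \exp(-\lambda_{ij}) / n_{ij}!$, with $\ln \lambda_{ij}$ given by the log-linear model.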

62

63

64

65 Design: Constant + DESTIN + ORIGIN
Model: Poisson. Parameter estimates (Estimate, SE) for: Overall; Destin 1 to Destin 4; Origin 1 to Origin 4.

66 Hybrid log-linear models
Hybrid log-linear models contain unconventional effect parameters. Interaction effects are restricted in a certain way → restrictions on interaction parameters.

67 Restrictions on effect parameters
Some parameter values are fixed:
e.g. offset (biproportional adjustment)
e.g. quasi-independence model ($\lambda_{ij} = 0$ for i = j)
Relation between some parameter values is fixed:
e.g. normalisation restrictions (coding)
e.g. hybrid log-linear models

68 Examples of hybrid log-linear models
Diagonals parameter model 1: (main) diagonal effect, with $c_k = 1$ for $i \neq j$ and $c_k = c$ for $i = j$ (diagonal). Off-diagonal elements are independent and diagonal elements are changed by a common factor.

69 $c_k = 1$ for $i \neq j$ and $c_k = c_i$ for $i = j$ (diagonal)
Diagonals parameter model 2: each diagonal element has a separate effect parameter, with $c_k = 1$ for $i \neq j$ and $c_k = c_i$ for $i = j$ (diagonal). Diagonal elements are predicted perfectly by the model.
Diagonals parameter model 3: the diagonal and each minor diagonal has a unique effect parameter, with k indicating the diagonal: k = R + i - j, where R is the number of rows (or columns). There are 2R-1 values of $c_k$. Application: APC models.
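The diagonal index of model 3 is easy to tabulate. The sketch below assumes a hypothetical square table with R = 4 rows and 1-based indices i and j; it confirms that k = R + i - j takes 2R-1 distinct values and that all cells on the main diagonal share k = R.

```python
R = 4  # hypothetical number of rows (and columns)

def diagonal_index(i: int, j: int, size: int) -> int:
    """k = R + i - j for 1-based row index i and column index j."""
    return size + i - j

# Table of k values: cells on the same (minor) diagonal share one index.
k_table = [[diagonal_index(i, j, R) for j in range(1, R + 1)]
           for i in range(1, R + 1)]
distinct_k = sorted({k for row in k_table for k in row})

for row in k_table:
    print(row)
print(len(distinct_k))  # 2R - 1 distinct diagonals
```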

70 Sufficient statistics Predicted marginal totals should satisfy the sufficient statistics
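Model: $\ln \lambda_{ij} = u + u^A_i + u^B_j + \ln c_k$, with $S_k$ the set of (i,j)-combinations with the same value of $c_k$.
Predicted cell frequencies should satisfy: $\sum_j \hat{\lambda}_{ij} = n_{i+}$, $\sum_i \hat{\lambda}_{ij} = n_{+j}$ and $\sum_{(i,j) \in S_k} \hat{\lambda}_{ij} = \sum_{(i,j) \in S_k} n_{ij}$.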

71 Algorithms for hybrid log-linear models
Generalized iterative scaling algorithm by Darroch and Ratcliffe (1972) Iterative proportional fitting (IPF) applied to unfolded table
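To give a feel for iterative proportional fitting, here is a minimal sketch of classical two-way IPF, which alternately rescales rows and columns until the fitted table matches the target margins. It is not the unfolded-table variant for hybrid models nor the Darroch and Ratcliffe algorithm; it fits the plain independence model, using the marginal totals of the political-attitudes table as targets. The function name `ipf` and its arguments are illustrative.

```python
def ipf(seed, row_targets, col_targets, tol=1e-10, max_iter=1000):
    """Scale a seed table until its margins match the target margins."""
    table = [row[:] for row in seed]
    for _ in range(max_iter):
        # Step 1: rescale each row to its target total.
        for i, row in enumerate(table):
            row_sum = sum(row)
            table[i] = [v * row_targets[i] / row_sum for v in row]
        # Step 2: rescale each column to its target total.
        for j, target in enumerate(col_targets):
            col_sum = sum(row[j] for row in table)
            for row in table:
                row[j] *= target / col_sum
        # Stop once the row totals still hold after the column step.
        if max(abs(sum(r) - t) for r, t in zip(table, row_targets)) < tol:
            break
    return table

# Margins of the political-attitudes table: party (631, 626), gender (614, 643).
fitted = ipf([[1.0, 1.0], [1.0, 1.0]], [631, 626], [614, 643])
for row in fitted:
    print([round(v, 2) for v in row])
```

Starting from a table of ones, the fitted values equal row total times column total divided by N, the expected counts of the independence model. A seed with a different interaction structure would preserve that structure while matching the margins.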

