G89.2247 Lecture 10

Slide 1
Examples of Binary Data
Binary Data and Correlation
Measurement Models and Binary Data
Measurement Models and Ordinal Data
Analyzing binary data with different SEM software packages

Slide 2. Examples of Binary Data
Some binary outcomes have categorical meaning.
  Did Tasha get an academic job? (yes/no)
  Has Jimmy ever injected heroin? (yes/no)
Other binary outcomes reflect passing some threshold.
  Did Jenna make the Dean's list this semester?
Other binary outcomes may reflect some complex position on an ordered dimension.
  True or False: I am an outgoing person
  True or False: I smoked marijuana last year

Slide 3. Dichotomized Data: A Bad Habit of Psychologists
Sometimes perfectly good quantitative data are made binary because it seems easier to talk about "High" vs. "Low".
The worst habit is the median split.
  Usually the High and Low groups are mixtures of the continua.
  Rarely is the median interpreted rationally.
See references:
  Cohen, J. (1983). The cost of dichotomization. Applied Psychological Measurement, 7.
  MacCallum, R.C., Zhang, S., Preacher, K.J., & Rucker, D.D. (2002). On the practice of dichotomization of quantitative variables. Psychological Methods, 7.

Slide 4. Correlations of Binary Data
Product-moment correlations computed on binary data are called phi coefficients.
Phi depends on the means of the two variables as well as their "strength of relationship".

Slide 5. Example: Phi is .13, Underlying r is .66
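
The dependence of phi on the cutpoints can be illustrated with a small simulation. The sketch below is not the example from the slide: the sample size, seed, and cutpoints are assumptions chosen for illustration. It draws pairs from a bivariate normal distribution with an underlying correlation of .66, dichotomizes them at two different pairs of cutpoints, and computes phi as the product-moment correlation of the resulting 0/1 variables.

```python
import math
import random


def pearson(xs, ys):
    """Product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_phi(r, cut_x, cut_y, n=20000, seed=1):
    """Phi between two variables dichotomized from a bivariate normal.

    r is the underlying correlation; cut_x and cut_y are the cutpoints
    (in standard deviation units). n and seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        x_star = z1
        y_star = r * z1 + math.sqrt(1.0 - r * r) * z2  # corr(x*, y*) = r
        xs.append(1 if x_star > cut_x else 0)
        ys.append(1 if y_star > cut_y else 0)
    return pearson(xs, ys)


# Same underlying r, different cutpoints (hypothetical values).
phi_median = simulate_phi(0.66, 0.0, 0.0)    # both variables split at the median
phi_uneven = simulate_phi(0.66, -1.0, 1.5)   # very different proportions positive
print(round(phi_median, 2), round(phi_uneven, 2))
```

Both phi values fall below the underlying .66, and the pair with unequal proportions positive falls much further, even though the strength of the underlying relationship has not changed.
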

Slide 6. Factor Analysis of Phi Coefficients
Loadings tend to be low.
In exploratory factor analysis, some factors emerge that cluster together variables that have the same proportion positive (mean values).
  In educational psychology these are called "difficulty factors".
  They are considered to be an artifact of the cutpoint.
Conventional psychometric wisdom says factor analysis of phi correlations is incorrect.

Slide 7. Phi Factor Analysis as Incorrect
Mislevy (1986) summarized problems with the analysis of phi coefficients in an often-cited paper on factor analysis of categorical data:
  Phi coefficients depend on the means of the X variables as well as their "strength of relationship".
  The linear factor model is inherently misspecified.
  More appropriate models exist.

Slide 8. The linear phi factor model is inherently misspecified
Suppose that the binary X variables are coded as (0, 1).
Consider the linear factor model:
  $X_j = \lambda_{1j} f_1 + \lambda_{2j} f_2 + e_j$, ($j = 1, 2, \ldots, q$).
Even if we assume that the model is meaningful for values between 0 and 1, there is no guarantee that the fitted values of $X_j$ will be in that interval.

Slide 9. Modern "appropriate" methods
Suppose $X$ is a dichotomized variable and $X^*$ is the original continuous variable:
  $X_j = 1$ if $X_j^* > \tau_j$ and $X_j = 0$ otherwise, for a threshold $\tau_j$.
Tetrachoric correlations estimate the correlations among the $X^*$ variables rather than the dichotomized ones.
When the sample size is large, SEM software will compute the tetrachoric correlations, assuming that the underlying distribution is bivariate normal.
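
To make the threshold model concrete, here is a minimal sketch of a two-step tetrachoric estimate for a single 2x2 table, using only the Python standard library. It is not the algorithm of any particular SEM package. The thresholds are taken from the marginal proportions, and the correlation is then found by bisection so that the bivariate normal probability of both variables being positive matches the observed proportion. The function names, the integration settings, and the example counts are all assumptions made for illustration.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal


def upper_orthant(a, b, rho, steps=400):
    """P(X* > a, Y* > b) for a standard bivariate normal with correlation rho.

    Integrates phi(x) * P(Y* > b | X* = x) over x > a with Simpson's rule.
    """
    s = math.sqrt(1.0 - rho * rho)
    upper = max(a, 0.0) + 8.0  # the normal density is negligible beyond this

    def g(x):
        return STD.pdf(x) * (1.0 - STD.cdf((b - rho * x) / s))

    h = (upper - a) / steps
    total = g(a) + g(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3.0


def phi_coefficient(n00, n01, n10, n11):
    """Phi for a 2x2 table; n10 means X = 1 and Y = 0."""
    num = n11 * n00 - n10 * n01
    den = math.sqrt((n10 + n11) * (n00 + n01) * (n01 + n11) * (n00 + n10))
    return num / den


def tetrachoric(n00, n01, n10, n11):
    """Two-step tetrachoric correlation for a 2x2 table of counts."""
    n = n00 + n01 + n10 + n11
    p_x = (n10 + n11) / n            # proportion with X = 1
    p_y = (n01 + n11) / n            # proportion with Y = 1
    p_11 = n11 / n                   # proportion with X = 1 and Y = 1
    a = STD.inv_cdf(1.0 - p_x)       # threshold with P(X* > a) = p_x
    b = STD.inv_cdf(1.0 - p_y)       # threshold with P(Y* > b) = p_y
    lo, hi = -0.999, 0.999
    for _ in range(40):              # the orthant probability increases with rho
        mid = (lo + hi) / 2.0
        if upper_orthant(a, b, mid) < p_11:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical table: both variables split at the median.
phi_hat = phi_coefficient(400, 200, 200, 400)
rho_hat = tetrachoric(400, 200, 200, 400)
print(round(phi_hat, 3), round(rho_hat, 3))
```

For this hypothetical table the tetrachoric estimate is noticeably larger than phi, which is the pattern the slides describe: the tetrachoric targets the correlation among the $X^*$ variables, not among the dichotomized ones.
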

Slide 11. Example of Factor Analysis
Use EQS to simulate a simple one-factor model.
Check the solution with SPSS.
Dichotomize variables at two thresholds.
Compute the biased factor analysis.
Compute the analysis based on tetrachoric correlations.
Note the standard errors!

Slide 12. Possible Overstatement of Conventional Wisdom
In many substantive fields, binary data are included in factor analyses and measurement models.
Inferences are not necessarily wrong.
  Means of binary data may be similar.
  Binary outcomes may be conceived more as categorical events than as measures of some underlying continuum.

Slide 13. Model Specification: Always a problem?
  $X_1 = \lambda_{11} f + e_1$
  $X_2 = \lambda_{12} f + e_2$
  ...
  $X_q = \lambda_{1q} f + e_q$
Whether the term $\lambda_{1j} f_1$ falls outside the interval (0, 1) depends on the distribution of f.
What do we know about the distribution of f? ONLY WHAT WE ASSUME.
  Normal (Gibbons et al.)
  Continuous and unbounded (Mislevy)
  Arbitrary (Bartholomew)
The distribution may be some other one that prevents out-of-range scores in the factor model.
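
A small calculation shows how much the answer depends on the assumed distribution of f. The sketch below compares a standard normal factor with a bounded factor (a uniform distribution scaled to unit variance). The intercept `mu`, the loading value, and the choice of a uniform distribution are hypothetical, introduced only to illustrate the point; none of them come from the lecture.

```python
import math
from statistics import NormalDist


def prob_out_of_range_normal(mu, lam):
    """P(mu + lam * f falls outside [0, 1]) when f is standard normal, lam > 0."""
    f = NormalDist()
    lower = (0.0 - mu) / lam   # f below this gives a fitted value under 0
    upper = (1.0 - mu) / lam   # f above this gives a fitted value over 1
    return f.cdf(lower) + (1.0 - f.cdf(upper))


def fitted_range_uniform(mu, lam):
    """Range of mu + lam * f when f is uniform with mean 0 and variance 1.

    A uniform variable with unit variance lives on [-sqrt(3), sqrt(3)].
    """
    half_width = lam * math.sqrt(3.0)
    return mu - half_width, mu + half_width


# Hypothetical intercept and loading.
p_out = prob_out_of_range_normal(0.5, 0.25)
low, high = fitted_range_uniform(0.5, 0.25)
print(round(p_out, 4), round(low, 3), round(high, 3))
```

With these illustrative values, a normal f produces out-of-range fitted values for a small but nonzero share of cases, while the bounded f keeps every fitted value inside the unit interval.
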

Slide 14. Generalization: Ordinal data, mixed data (binary, ordinal, quantitative)
When one variable is quantitative and the other is binary:
  The product-moment correlation is called the point-biserial correlation.
  The analogue of the tetrachoric is simply the biserial correlation.
When variables are ordinal:
  The product-moment r is the Spearman rank correlation.
  The inferred process correlation is the polychoric correlation.
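
The relation between the point-biserial and biserial correlations can be sketched with the standard conversion $r_b = r_{pb} \sqrt{p(1-p)} / h$, where $p$ is the proportion positive and $h$ is the standard normal density at the corresponding threshold. The simulation settings below (underlying correlation, cutpoint, sample size, seed) are assumptions for illustration only.

```python
import math
import random
from statistics import NormalDist


def pearson(xs, ys):
    """Product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def biserial_from_point_biserial(r_pb, p):
    """Convert a point-biserial r to a biserial r, assuming a normal Y*."""
    std = NormalDist()
    z = std.inv_cdf(1.0 - p)   # threshold with P(Y* > z) = p
    h = std.pdf(z)             # normal density at the threshold
    return r_pb * math.sqrt(p * (1.0 - p)) / h


# Simulate a quantitative x and a binary y cut from a latent y* (assumed values).
rng = random.Random(7)
true_r, cut, n = 0.6, 1.0, 20000
xs, ys = [], []
for _ in range(n):
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    y_star = true_r * z1 + math.sqrt(1.0 - true_r ** 2) * z2
    xs.append(z1)
    ys.append(1 if y_star > cut else 0)

r_pb = pearson(xs, ys)                       # point-biserial correlation
p_hat = sum(ys) / n                          # observed proportion positive
r_b = biserial_from_point_biserial(r_pb, p_hat)
print(round(r_pb, 2), round(r_b, 2))
```

In this simulated sample the point-biserial value is well below the underlying correlation, while the biserial estimate recovers it closely, provided the normality assumption for the latent variable holds.
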

Slide 15. Tetrachoric and polychoric correlations require large samples (in the thousands) to estimate
For small n's the estimates can be unstable.
Unstable estimates lead to covariance structures that have problems:
  Not positive definite
  Cannot be inverted
  Cannot be fit with SEM
Muthén's software Mplus has better estimators of the polychoric and tetrachoric values.
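
One way to see the "not positive definite" problem is to test a correlation matrix directly. The sketch below attempts a Cholesky factorization, which succeeds only for positive definite matrices. Both matrices are hypothetical: the second mimics what can happen when each correlation is estimated separately from a small sample and the entries end up mutually inconsistent.

```python
import math


def is_positive_definite(matrix):
    """Return True if a symmetric matrix admits a Cholesky factorization."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - s
                if pivot <= 0.0:
                    return False  # a non-positive pivot means the matrix is not PD
                lower[i][j] = math.sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return True


# Hypothetical correlation matrices.
consistent = [
    [1.0, 0.5, 0.4],
    [0.5, 1.0, 0.3],
    [0.4, 0.3, 1.0],
]
inconsistent = [
    [1.0, 0.9, -0.9],
    [0.9, 1.0, 0.9],
    [-0.9, 0.9, 1.0],
]
print(is_positive_definite(consistent), is_positive_definite(inconsistent))
```

A matrix that fails this check cannot be inverted in the way maximum likelihood fitting requires, which is why unstable pairwise estimates can stop an SEM analysis before it starts.
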

Slide 16. Interpretation of SEM models based on categorical data
Latent variables represent processes inferred from RECONSTRUCTED quantitative variables.
  Think in terms of $X^*$ rather than $X$.
  The unit is the standard deviation of the implied continuum.
  Effects are often larger.
Work on standard errors is still being done.