Challenges posed by Structural Equation Models Thomas Richardson Department of Statistics University of Washington Joint work with Mathias Drton, UC Berkeley.


Joint work with Mathias Drton (UC Berkeley) and Peter Spirtes (CMU).

Overview
- Challenges for likelihood inference
- Problems in model selection and interpretation
- Partial solution
  - a sub-class of path diagrams: ancestral graphs

Problems for Likelihood Inference
- The likelihood may be multimodal
  - e.g. the bivariate Gaussian Seemingly Unrelated Regressions (SUR) model (path diagram X1 → Y1, X2 → Y2, Y1 ↔ Y2) may have up to 3 local maxima.
- A consistent starting value does not guarantee that iterative procedures will find the MLE.
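The point about iterative procedures can be made concrete with a minimal sketch. It assumes the model Y1 = b1·X1 + e1, Y2 = b2·X2 + e2 with (e1, e2) bivariate Gaussian with covariance Σ, and maximizes the likelihood by alternating two exact steps. The first sets Σ to the residual covariance, which is the MLE of Σ for fixed coefficients. The second runs generalized least squares, which gives the MLE of the coefficients for fixed Σ. Neither step can decrease the likelihood, but the point the procedure converges to depends on where it starts. That is why it is worth running from several starting values. The function names and the simulated data are hypothetical. This is an illustration, not the ggm implementation, and this particular data set need not show several modes.

```python
import math
import random

# Assumed model: Y1 = b1*X1 + e1, Y2 = b2*X2 + e2, (e1, e2) ~ N(0, Sigma).


def residual_cov(beta, x1, y1, x2, y2):
    """MLE of the 2x2 error covariance for fixed (b1, b2): (1/n) * sum of r r^T."""
    b1, b2 = beta
    n = len(x1)
    r1 = [y - b1 * x for x, y in zip(x1, y1)]
    r2 = [y - b2 * x for x, y in zip(x2, y2)]
    s11 = sum(u * u for u in r1) / n
    s22 = sum(v * v for v in r2) / n
    s12 = sum(u * v for u, v in zip(r1, r2)) / n
    return s11, s12, s22


def profile_loglik(beta, x1, y1, x2, y2):
    """Log-likelihood maximized over Sigma, up to a constant: -(n/2) log det S(beta)."""
    s11, s12, s22 = residual_cov(beta, x1, y1, x2, y2)
    return -0.5 * len(x1) * math.log(s11 * s22 - s12 * s12)


def gls_step(sigma, x1, y1, x2, y2):
    """Maximize the likelihood over (b1, b2) for fixed Sigma (generalized least squares)."""
    s11, s12, s22 = sigma
    det = s11 * s22 - s12 * s12
    w11, w12, w22 = s22 / det, -s12 / det, s11 / det  # entries of Sigma^{-1}

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    # Normal equations A @ (b1, b2) = c, with A symmetric.
    a11, a12, a22 = w11 * dot(x1, x1), w12 * dot(x1, x2), w22 * dot(x2, x2)
    c1 = w11 * dot(x1, y1) + w12 * dot(x1, y2)
    c2 = w12 * dot(x2, y1) + w22 * dot(x2, y2)
    d = a11 * a22 - a12 * a12
    return (c1 * a22 - a12 * c2) / d, (a11 * c2 - a12 * c1) / d


def fit_sur(beta0, x1, y1, x2, y2, max_iter=500, tol=1e-10):
    """Alternate the Sigma step and the GLS step; return final beta and log-lik trace."""
    beta = beta0
    trace = [profile_loglik(beta, x1, y1, x2, y2)]
    for _ in range(max_iter):
        sigma = residual_cov(beta, x1, y1, x2, y2)
        beta = gls_step(sigma, x1, y1, x2, y2)
        trace.append(profile_loglik(beta, x1, y1, x2, y2))
        if trace[-1] - trace[-2] < tol:
            break
    return beta, trace


# Simulated data (an assumption, for illustration only).
rng = random.Random(0)
n = 50
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [rng.gauss(0, 1) for _ in range(n)]
errs = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
y1 = [0.5 * x + u for x, (u, _) in zip(x1, errs)]
y2 = [-0.3 * x + 0.6 * u + v for x, (u, v) in zip(x2, errs)]

# Run from several starting values and compare the attained log-likelihoods.
for start in [(0.0, 0.0), (3.0, -3.0), (-3.0, 3.0)]:
    beta_hat, trace = fit_sur(start, x1, y1, x2, y2)
    print(start, beta_hat, trace[-1])
```

If different starts end at different `beta_hat` values with different final log-likelihoods, only the largest of them is a candidate for the MLE.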

Problems for Likelihood Inference
- Discrete latent variable models are not curved exponential families
  - e.g. a latent class model: latent class variable C with indicators X1, X2, X3, X4
  - binary observed variables, ternary latent class variable
  - 15 parameters in the saturated model, 14 model parameters, BUT the model has 2 d.f. (Goodman)
- Usual asymptotics may not apply
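The counts on this slide come from simple arithmetic. The sketch below reproduces them. It is a naive parameter count under the usual latent class parameterization, with class probabilities plus one success probability per indicator per class. The function names are hypothetical. The naive count gives 15 − 14 = 1 d.f., whereas the slide reports 2 d.f. (Goodman). So the 14 parameters cannot all be identified, and the count itself is misleading.

```python
def saturated_params(n_vars, levels=2):
    """Free parameters of the saturated multinomial over n_vars categorical variables."""
    return levels ** n_vars - 1


def latent_class_params(n_vars, n_classes, levels=2):
    """Naive count: class probabilities plus class-conditional indicator distributions."""
    return (n_classes - 1) + n_classes * n_vars * (levels - 1)


sat = saturated_params(4)         # 2**4 - 1
lc = latent_class_params(4, 3)    # (3 - 1) + 3 * 4 * 1
naive_df = sat - lc               # what a naive count would report
print(sat, lc, naive_df)
```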

Problems for Likelihood Inference
- The likelihood may be highly multimodal, even in the asymptotic limit
  - after accounting for label switching/aliasing
  (same latent class model: C with indicators X1, X2, X3, X4)
- Why report one mode?
- The d.f. may vary as a function of the model parameters
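"Accounting for label switching" means discounting a trivial source of multiple modes. Permuting the labels of the latent classes gives a different parameter vector with exactly the same observed distribution. With 3 classes, every mode therefore comes with 3! = 6 aliases. The sketch below checks this for arbitrary, hypothetical parameter values. The slide's point is that multimodality remains even after these aliases are set aside.

```python
from itertools import permutations, product


def pattern_prob(pattern, class_probs, cond_probs):
    """P(X = pattern) in a latent class model with binary indicators.

    cond_probs[k][j] is P(X_j = 1 | C = k)."""
    total = 0.0
    for pk, probs in zip(class_probs, cond_probs):
        term = pk
        for x, p in zip(pattern, probs):
            term *= p if x == 1 else 1.0 - p
        total += term
    return total


def distribution(class_probs, cond_probs, n_vars=4):
    """Probabilities of all 2**n_vars response patterns."""
    return [pattern_prob(x, class_probs, cond_probs)
            for x in product((0, 1), repeat=n_vars)]


# Arbitrary illustrative parameters: 3 classes, 4 binary indicators.
class_probs = [0.2, 0.3, 0.5]
cond_probs = [[0.9, 0.8, 0.7, 0.6],
              [0.1, 0.3, 0.5, 0.7],
              [0.4, 0.4, 0.2, 0.9]]

base = distribution(class_probs, cond_probs)
aliases = []
for perm in permutations(range(3)):
    # Relabel the classes; the observed distribution should not change.
    relabelled = distribution([class_probs[k] for k in perm],
                              [cond_probs[k] for k in perm])
    aliases.append(relabelled)
```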

Problems for Model Selection
- SEM models with latent variables are not curved exponential families
  - standard χ² asymptotics do not necessarily apply, e.g. for LRTs
  - model selection criteria such as BIC are not guaranteed to be asymptotically consistent
  - the effective degrees of freedom may vary depending on the values of the model parameters
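For reference, here are the two standard tools this slide refers to. They are written for a model M with maximized log-likelihood ℓ̂_M, d_M free parameters and sample size n. The notation is assumed here, not taken from the slides. The χ² limit holds for nested models M0 ⊂ M1 when M0 is true and the usual regularity conditions apply.

```latex
\mathrm{BIC}(M) = -2\,\hat{\ell}_M + d_M \log n,
\qquad
2\,\bigl(\hat{\ell}_{M_1} - \hat{\ell}_{M_0}\bigr) \xrightarrow{\;d\;} \chi^2_{d_{M_1} - d_{M_0}}
\quad (M_0 \subset M_1).
```

Both formulas take d_M to be the dimension of the model at the true distribution. For a curved exponential family this equals the parameter count everywhere. For latent variable models the dimension can drop at some parameter values, and there both the BIC penalty and the χ² reference distribution can be wrong.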

Problems for Model Selection
- Many models may be equivalent:
  [Figure: four different path diagrams on X1, X2, Y1, Y2]
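The slide's four diagrams did not survive extraction, so the sketch below shows only the simplest case of equivalence. For DAGs, the Verma-Pearl criterion says two graphs are Markov equivalent if and only if they have the same skeleton and the same unshielded colliders. The graphs below are hypothetical DAGs on X1, X2, Y1, Y2. Path diagrams with bi-directed edges need a more involved criterion than this check.

```python
from itertools import combinations


def skeleton(edges):
    """Undirected version of a DAG given as (parent, child) pairs."""
    return {frozenset(e) for e in edges}


def v_structures(edges):
    """Unshielded colliders a -> c <- b with a, b non-adjacent, as (a, c, b), a < b."""
    adjacent = skeleton(edges)
    parents = {}
    for p, c in edges:
        parents.setdefault(c, set()).add(p)
    found = set()
    for c, ps in parents.items():
        for a, b in combinations(sorted(ps), 2):
            if frozenset((a, b)) not in adjacent:
                found.add((a, c, b))
    return found


def markov_equivalent(g1, g2):
    """Verma-Pearl criterion: same skeleton and same unshielded colliders."""
    return skeleton(g1) == skeleton(g2) and v_structures(g1) == v_structures(g2)


# Hypothetical DAGs on X1, X2, Y1, Y2 (not the diagrams from the slide).
chain = [("X1", "Y1"), ("Y1", "Y2"), ("Y2", "X2")]       # X1 -> Y1 -> Y2 -> X2
fork_at_y1 = [("Y1", "X1"), ("Y1", "Y2"), ("Y2", "X2")]  # X1 <- Y1 -> Y2 -> X2
collider = [("X1", "Y1"), ("Y2", "Y1"), ("Y2", "X2")]    # X1 -> Y1 <- Y2 -> X2
print(markov_equivalent(chain, fork_at_y1), markov_equivalent(chain, collider))
```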


Problems for Model Selection
- Models with different numbers of latent variables may be equivalent:
  - e.g. with unrestricted error covariance within blocks
  [Figure: two path diagrams relating the block X1, ..., Xp to the block Y1, ..., Yq through different numbers of latent variables]
Wegelin & Richardson (2001)


Two scenarios
- A single SEM model is proposed and fitted. The results are reported.
- The researcher fits a sequence of models, making modifications to an original specification.
  - Model equivalence implies:
    - the final model depends on the initial model chosen
    - the sequence of changes is often ad hoc
    - equivalent models may lead to very different substantive conclusions
  - Often many equivalence classes of models give a reasonable fit. Why report just one?

Partial Solution
- Embed each latent variable model in a 'larger' model without latent variables, characterized by conditional independence restrictions.
- We ignore non-independence constraints and inequality constraints.
[Figure: nested sets of distributions; the latent variable model lies inside the model imposing only the independence constraints on the observed variables]

Toy Example: The Generating Graph
- Begin with a graph G on the vertices a, b, c, d, t, and its associated set of independences:
  a ⊥⊥ c, b ⊥⊥ d, a ⊥⊥ d, a ⊥⊥ d | c, a ⊥⊥ d | b, a ⊥⊥ c | d, b ⊥⊥ d | a, a ⊥⊥ t, d ⊥⊥ t, b ⊥⊥ c | t, + others
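The figure of G did not survive extraction. As an assumption, we take G to be the DAG a → b ← t → c ← d, which reproduces every independence listed. The sketch checks each statement by d-separation, using the moralization criterion: x ⊥⊥ y | z holds in a DAG if and only if z separates x from y in the moral graph of the ancestral set of {x, y} ∪ z. The function names are hypothetical.

```python
def ancestral_set(parents, nodes):
    """The given nodes together with all their ancestors; parents maps child -> set of parents."""
    result, stack = set(nodes), list(nodes)
    while stack:
        v = stack.pop()
        for p in parents.get(v, ()):
            if p not in result:
                result.add(p)
                stack.append(p)
    return result


def d_separated(parents, x, y, z):
    """Test x _||_ y | z via separation in the moralized ancestral graph of {x, y} | z."""
    z = set(z)
    keep = ancestral_set(parents, {x, y} | z)
    nbrs = {v: set() for v in keep}
    for child in keep:
        ps = sorted(parents.get(child, ()))
        for i, p in enumerate(ps):
            nbrs[p].add(child)
            nbrs[child].add(p)
            for q in ps[i + 1:]:  # "marry" co-parents of the same child
                nbrs[p].add(q)
                nbrs[q].add(p)
    # Search for a path from x to y that avoids the conditioning set z.
    seen, stack = {x}, [x]
    while stack:
        v = stack.pop()
        if v == y:
            return False
        for w in nbrs[v] - seen - z:
            seen.add(w)
            stack.append(w)
    return True


# Assumed generating DAG: a -> b <- t -> c <- d (child -> parents).
G = {"b": {"a", "t"}, "c": {"t", "d"}}

listed = [("a", "c", ()), ("b", "d", ()), ("a", "d", ()), ("a", "d", ("c",)),
          ("a", "d", ("b",)), ("a", "c", ("d",)), ("b", "d", ("a",)),
          ("a", "t", ()), ("d", "t", ()), ("b", "c", ("t",))]
results = {(x, y, z): d_separated(G, x, y, z) for x, y, z in listed}
print(all(results.values()))
```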

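Toy Example: Marginalization
- Suppose now that some variables are unobserved (here t is hidden).
- Find the independence relations involving only the observed variables.
  - Independences of G: a ⊥⊥ c, b ⊥⊥ d, a ⊥⊥ d, a ⊥⊥ d | c, a ⊥⊥ d | b, a ⊥⊥ c | d, b ⊥⊥ d | a, a ⊥⊥ t, d ⊥⊥ t, b ⊥⊥ c | t, + others
  - The 'unobserved' independencies, which involve t (a ⊥⊥ t, d ⊥⊥ t, b ⊥⊥ c | t), are shown in red.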
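Marginalizing over t amounts to keeping exactly those independences x ⊥⊥ y | z in which x, y and every member of z are observed. The sketch below enumerates all of them by brute force. It again assumes the generating DAG a → b ← t → c ← d and the moralization test for d-separation, both restated so the block runs on its own. Under that assumption, it recovers exactly seven relations among a, b, c, d.

```python
from itertools import combinations


def ancestral_set(parents, nodes):
    """The given nodes together with all their ancestors; parents maps child -> set of parents."""
    result, stack = set(nodes), list(nodes)
    while stack:
        v = stack.pop()
        for p in parents.get(v, ()):
            if p not in result:
                result.add(p)
                stack.append(p)
    return result


def d_separated(parents, x, y, z):
    """Test x _||_ y | z via separation in the moralized ancestral graph of {x, y} | z."""
    z = set(z)
    keep = ancestral_set(parents, {x, y} | z)
    nbrs = {v: set() for v in keep}
    for child in keep:
        ps = sorted(parents.get(child, ()))
        for i, p in enumerate(ps):
            nbrs[p].add(child)
            nbrs[child].add(p)
            for q in ps[i + 1:]:
                nbrs[p].add(q)
                nbrs[q].add(p)
    seen, stack = {x}, [x]
    while stack:
        v = stack.pop()
        if v == y:
            return False
        for w in nbrs[v] - seen - z:
            seen.add(w)
            stack.append(w)
    return True


G = {"b": {"a", "t"}, "c": {"t", "d"}}  # assumed DAG; t is hidden
observed = ["a", "b", "c", "d"]

# Every pair of observed variables with every conditioning set of other observed ones.
found = set()
for x, y in combinations(observed, 2):
    rest = [v for v in observed if v not in (x, y)]
    for k in range(len(rest) + 1):
        for z in combinations(rest, k):
            if d_separated(G, x, y, z):
                found.add((x, y, frozenset(z)))

for x, y, z in sorted(found, key=lambda s: (s[0], s[1], sorted(s[2]))):
    print(x, "_||_", y, "|", sorted(z))
```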


Toy Example: 'Graphical Marginalization'
- Now construct a graph G* on a, b, c, d that represents the conditional independence relations among the observed variables:
  a ⊥⊥ c, b ⊥⊥ d, a ⊥⊥ d, a ⊥⊥ d | c, a ⊥⊥ d | b, a ⊥⊥ c | d, b ⊥⊥ d | a
- Bi-directed edges are required.
- G* represents all and only the distributions in which these independencies hold.

Equivalence Revisited
- Restrict the model class to path diagrams that include only observed variables and are characterized by conditional independence
  - ancestral graph Markov models
- For such models we can:
  - determine the entire class of equivalent models
  - identify which features they have in common
- The models are curved exponential families: the usual asymptotics do apply
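For graphs with only directed (→) and bi-directed (↔) edges, the ancestral condition reduces to two checks. There must be no directed cycle. No bi-directed edge may join a vertex to one of its ancestors. The sketch below implements just these two checks. Ancestral graphs in general also allow undirected edges, which need further conditions not covered here. The graph a → b ↔ c ← d is our assumed reading of the toy G*, since it reproduces the seven observed independences. The second example is a hypothetical non-ancestral graph.

```python
def proper_ancestors(directed, v):
    """All u with a directed path u -> ... -> v; directed is a set of (tail, head) pairs."""
    parents = {}
    for tail, head in directed:
        parents.setdefault(head, set()).add(tail)
    seen, stack = set(), [v]
    while stack:
        u = stack.pop()
        for p in parents.get(u, ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def is_ancestral(directed, bidirected):
    """Ancestral check for graphs with only -> and <-> edges (no undirected edges)."""
    nodes = {v for edge in directed | bidirected for v in edge}
    if any(v in proper_ancestors(directed, v) for v in nodes):
        return False  # a directed cycle
    for a, b in bidirected:
        if a in proper_ancestors(directed, b) or b in proper_ancestors(directed, a):
            return False  # a <-> b where one endpoint is an ancestor of the other
    return True


g_star = ({("a", "b"), ("d", "c")}, {("b", "c")})         # a -> b <-> c <- d
not_ancestral = ({("a", "b"), ("b", "c")}, {("a", "c")})  # a -> b -> c with a <-> c
print(is_ancestral(*g_star), is_ancestral(*not_ancestral))
```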

Ancestral Graphs and Equivalence Classes
[Figure sequence: an ancestral graph T on A, B, C, D, shown with the independences it encodes (A ⊥⊥ C, B ⊥⊥ D, A ⊥⊥ D, A ⊥⊥ D | C, A ⊥⊥ D | B, A ⊥⊥ C | D, B ⊥⊥ D | A); the equivalent ancestral graphs T, U and V; the Markov equivalence class of graphs with latent variables (graphs L, M, N, P, Q, R, + infinitely many others); and the equivalence class of ancestral graphs, represented by a partial ancestral graph.]

Measurement models
- If we have pure measurement models with several indicators per latent variable:
  - we may apply similar search methods among the latent variables (Spirtes et al. 2001; Silva et al. 2003)
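A pure measurement model constrains the covariance matrix in ways a search can detect. One standard example is vanishing tetrad differences. If four indicators load on a single latent factor, then σ_ij·σ_kl − σ_ik·σ_jl = 0 for every arrangement of the four. The sketch below checks this on implied covariance matrices built from hypothetical loadings. It also shows that splitting the indicators between two correlated factors breaks two of the three tetrads. It illustrates the constraint only, not any particular search algorithm.

```python
def factor_cov(loadings, factor_of, phi, uniq):
    """Implied covariance for unit-variance factors with correlation phi between distinct factors."""
    p = len(loadings)

    def fcorr(f, g):
        return 1.0 if f == g else phi

    return [[loadings[i] * loadings[j] * fcorr(factor_of[i], factor_of[j])
             + (uniq[i] if i == j else 0.0) for j in range(p)] for i in range(p)]


def tetrads(s, i, j, k, l):
    """The three tetrad differences for indicators i, j, k, l."""
    return (s[i][j] * s[k][l] - s[i][k] * s[j][l],
            s[i][j] * s[k][l] - s[i][l] * s[j][k],
            s[i][k] * s[j][l] - s[i][l] * s[j][k])


loadings = [0.9, 0.8, 0.7, 0.6]     # hypothetical values
uniq = [0.19, 0.36, 0.51, 0.64]

one_factor = factor_cov(loadings, [0, 0, 0, 0], 0.5, uniq)  # all on one factor
two_factor = factor_cov(loadings, [0, 0, 1, 1], 0.5, uniq)  # {0,1} and {2,3}

t1 = tetrads(one_factor, 0, 1, 2, 3)
t2 = tetrads(two_factor, 0, 1, 2, 3)
print(t1, t2)
```

In the two-factor case the first two differences equal λ1λ2λ3λ4(1 − φ²), which is nonzero whenever |φ| < 1, while the third still vanishes.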

Other Related Work
- Iterative ML estimation methods exist
  - guaranteed convergence
    - multimodality is still possible
  - implemented in the R package ggm (Marchetti & Drton, 2003)
- Current work:
  - extension to discrete data
    - parameterization and ML fitting for binary bi-directed graphs already exist
  - implementing search procedures in R

References
- Richardson, T., Spirtes, P. (2002) Ancestral graph Markov models. Ann. Stat., 30.
- Richardson, T. (2003) Markov properties for acyclic directed mixed graphs. Scand. J. Statist., 30(1).
- Drton, M., Richardson, T. (2003) A new algorithm for maximum likelihood estimation in Gaussian graphical models for marginal independence. UAI 03.
- Drton, M., Richardson, T. (2003) Iterative conditional fitting in Gaussian ancestral graph models. UAI.
- Drton, M., Richardson, T. (2004) Multimodality of the likelihood in the bivariate seemingly unrelated regressions model. Biometrika, 91(2).
- Marchetti, G., Drton, M. (2003) ggm package. Available from