Modeling Correlated/Clustered Multinomial Data Justin Newcomer Department of Mathematics and Statistics University of Maryland, Baltimore County Probability and Statistics Day, April 28, 2007 Joint Research with Professor Nagaraj K. Neerchal, UMBC and Jorge G. Morel, PhD, P&G Pharmaceuticals, Inc.

2 Motivation In the analysis of forest pollen, counts of the frequency of occurrence of different kinds of pollen grains are made at various levels of a sediment core. An attempt is then made to reconstruct the past vegetation changes in the area from which the core was taken. Example – Forest Pollen Count, Mosimann (1962)

3 Motivation Four arboreal types of fossil forest pollen (pine, fir, oak and alder) were counted in the Bellas Artes core from the Valley of Mexico. At various levels of the core, pollen was classified in clusters of 100 pollen grains. The data are the counts of the four pollen types within each cluster of 100 grains. Example – Forest Pollen Count, Mosimann (1962)

4 Motivation The probability function is the standard multinomial probability function, shown below. Key assumptions: each observation can be classified by exactly one of k possible outcomes, with probabilities π_1, ..., π_k, and all observations are independent of each other. In our example, since each pollen count comes from a cluster of 100 pollen grains, the individual observations within a cluster can be expected to be correlated. The possible correlations are a violation of the multinomial model assumptions! The Multinomial Model
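The standard multinomial probability function for a cluster of m observations, with counts T_1, ..., T_k and category probabilities π_1, ..., π_k, is

\[
P(T_1 = t_1, \ldots, T_k = t_k) \;=\; \frac{m!}{t_1!\,\cdots\, t_k!}\;\pi_1^{t_1}\cdots\pi_k^{t_k},
\qquad \sum_{j=1}^{k} t_j = m,\quad \sum_{j=1}^{k}\pi_j = 1.
\]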

5 Motivation How can we properly model these data and estimate the proportions of pollen grains? What are the effects of using the wrong model? Problem Statement

6 Overdispersion (Extra Variation) Data exhibit variances larger than those permitted by the multinomial model. Usually caused by a lack of independence or clustering of experimental units. "Overdispersion is not uncommon in practice. In fact, some would maintain that over-dispersion is the norm in practice and nominal dispersion the exception." – McCullagh and Nelder (1989) Overview

7 Overdispersion (Extra Variation) Usually characterized by the first two moments, given below. The quantity {1 + ρ²(m − 1)} is known as the design effect (Kish, 1965). The parameter ρ is known as the "intra class" or "intra cluster" correlation. We use ρ to denote a positive intra-cluster correlation, which corresponds to overdispersion. Multinomial Overdispersion
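A sketch of that moment structure, consistent with the design effect quoted above (notation: cluster size m, mean proportion vector π = (π_1, ..., π_k)'):

\[
E(\mathbf{T}) = m\,\boldsymbol{\pi},\qquad
\operatorname{Var}(\mathbf{T}) = \{1 + \rho^{2}(m-1)\}\; m\,\{\operatorname{diag}(\boldsymbol{\pi}) - \boldsymbol{\pi}\boldsymbol{\pi}'\}.
\]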

8 Parameter Estimation How can we properly model these data and estimate the proportions of pollen grains? Moment based: Quasi-Likelihood, Generalized Estimating Equations (easily implemented in SAS – Proc Genmod). Likelihood based: Finite Mixture Distribution, Dirichlet Multinomial Distribution (not currently in SAS – must write your own code).

9 Quasi-Likelihood Estimation Here we assume that overdispersion occurs by inflation of variances by a constant factor. Estimate the systematic structure of the model via maximum likelihood procedures, then inflate the variance by a suitable constant (see the sketch below). Wedderburn (1974), Cox and Snell (1989)
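A minimal sketch of this idea in the multinomial setting, with σ² denoting the inflation constant; the Pearson-based estimator shown is one common choice, not necessarily the one on the original slide:

\[
E(\mathbf{T}) = m\,\boldsymbol{\pi},\qquad
\operatorname{Var}(\mathbf{T}) = \sigma^{2}\, m\,\{\operatorname{diag}(\boldsymbol{\pi}) - \boldsymbol{\pi}\boldsymbol{\pi}'\},\qquad
\hat{\sigma}^{2} = \frac{X^{2}}{\text{residual degrees of freedom}},
\]

where X² is the Pearson chi-square statistic from the fitted multinomial model.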

10 Generalized Estimating Equations (GEE) Liang and Zeger (1986), Zeger and Liang (1986). Extension of quasi-likelihood to clustered and longitudinal data. The Generalized Estimating Equations are given below.
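In the standard Liang and Zeger (1986) formulation, the regression parameter vector β solves

\[
\sum_{i=1}^{n} \mathbf{D}_i'\,\mathbf{V}_i^{-1}\,\{\mathbf{Y}_i - \boldsymbol{\mu}_i(\boldsymbol{\beta})\} = \mathbf{0},
\qquad
\mathbf{D}_i = \frac{\partial \boldsymbol{\mu}_i}{\partial \boldsymbol{\beta}'},\quad
\mathbf{V}_i = \mathbf{A}_i^{1/2}\,\mathbf{R}_i(\boldsymbol{\alpha})\,\mathbf{A}_i^{1/2},
\]

where A_i holds the marginal variances and R_i(α) is a working correlation matrix.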

11 Likelihood Models for Correlated Multinomial Multinomial distribution with a Dirichlet prior on the cell probabilities; the resulting compound distribution is given below. Dirichlet Multinomial Distribution, Mosimann (1962)
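Integrating the multinomial likelihood against a Dirichlet(α_1, ..., α_k) prior gives the Dirichlet Multinomial probability function (standard form, written here in the Dirichlet parameterization):

\[
P(\mathbf{T} = \mathbf{t}) \;=\; \frac{m!}{t_1!\,\cdots\, t_k!}\;
\frac{\Gamma\!\big(\sum_{j}\alpha_j\big)}{\Gamma\!\big(m + \sum_{j}\alpha_j\big)}\;
\prod_{j=1}^{k}\frac{\Gamma(t_j + \alpha_j)}{\Gamma(\alpha_j)}.
\]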

12 Likelihood Models for Correlated Multinomial It can be shown that, if we reparameterize the Dirichlet parameters as sketched below, the moments of the Dirichlet Multinomial distribution are given by the overdispersed multinomial moments with design effect {1 + ρ²(m − 1)}. Dirichlet Multinomial Distribution, Mosimann (1962)
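A sketch of the reparameterization and moments, using standard results for the Dirichlet Multinomial: letting

\[
\pi_j = \frac{\alpha_j}{\sum_{l=1}^{k}\alpha_l},\qquad
\rho^{2} = \frac{1}{1 + \sum_{l=1}^{k}\alpha_l},
\]

the first two moments are

\[
E(\mathbf{T}) = m\,\boldsymbol{\pi},\qquad
\operatorname{Var}(\mathbf{T}) = \{1 + \rho^{2}(m-1)\}\; m\,\{\operatorname{diag}(\boldsymbol{\pi}) - \boldsymbol{\pi}\boldsymbol{\pi}'\}.
\]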

13 Likelihood Models for Correlated Multinomial Can be represented as T = Y·N + (X | N), where N ~ Binomial(ρ, m), Y ~ Multinomial(π, 1), N and Y are independent, and (X | N) ~ Multinomial(π, m − N) if N < m. A small simulation sketch of this representation is given below. Finite Mixture of Multinomials, Morel & Neerchal (1993)
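To make the representation concrete, here is a minimal Python/NumPy simulation sketch (not from the original slides; function and variable names are illustrative). It draws cluster counts T by the construction above and can be used to check empirically that they are overdispersed relative to the ordinary multinomial.

import numpy as np

def rclumped_multinomial(m, pi, rho, size, rng=None):
    """Simulate 'size' clusters from the random-clumped (finite mixture)
    representation T = Y*N + X described on the slide.
    m: cluster size, pi: category probabilities, rho: clumping parameter."""
    rng = np.random.default_rng() if rng is None else rng
    pi = np.asarray(pi, dtype=float)
    out = np.empty((size, len(pi)), dtype=int)
    for i in range(size):
        n = rng.binomial(m, rho)          # N ~ Binomial(rho, m)
        y = rng.multinomial(1, pi)        # Y ~ Multinomial(pi, 1)
        x = rng.multinomial(m - n, pi)    # (X | N) ~ Multinomial(pi, m - N)
        out[i] = y * n + x                # T = Y*N + X
    return out

# Usage: compare empirical variances with the plain multinomial variances m*pi*(1-pi)
rng = np.random.default_rng(2007)
pi = [0.4, 0.3, 0.2, 0.1]
T = rclumped_multinomial(m=100, pi=pi, rho=0.3, size=5000, rng=rng)
print("empirical variances:  ", T.var(axis=0).round(1))
print("multinomial variances:", (100 * np.array(pi) * (1 - np.array(pi))).round(1))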

14 Likelihood Models for Correlated Multinomial It can be shown that the first two moments of the Finite Mixture distribution take the same overdispersed form as those of the Dirichlet Multinomial, sketched below. Finite Mixture of Multinomials, Morel & Neerchal (1993)
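A sketch of those moments, which follow from the representation on the previous slide:

\[
E(\mathbf{T}) = m\,\boldsymbol{\pi},\qquad
\operatorname{Var}(\mathbf{T}) = \{1 + \rho^{2}(m-1)\}\; m\,\{\operatorname{diag}(\boldsymbol{\pi}) - \boldsymbol{\pi}\boldsymbol{\pi}'\},
\]

matching the first two moments of the Dirichlet Multinomial, so the two likelihood models are distinguished only through their higher-order structure.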

15 Maximum Likelihood Estimation Computed using the Fisher Scoring Algorithm (iteration shown below). The Fisher Information Matrix plays an important role. It can be computationally challenging, but approximations are available. The Dirichlet Multinomial FIM can be computed using marginal Beta-Binomial moments. Overview
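For reference, the generic Fisher scoring iteration for a parameter vector θ, with score vector U(θ) and Fisher information matrix I(θ), is

\[
\boldsymbol{\theta}^{(s+1)} = \boldsymbol{\theta}^{(s)} + \mathbf{I}\big(\boldsymbol{\theta}^{(s)}\big)^{-1}\,\mathbf{U}\big(\boldsymbol{\theta}^{(s)}\big).
\]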

16 Maximum Likelihood Estimation Maximum likelihood estimation results under the Finite Mixture and Dirichlet Multinomial distributions, with π_1 (pine), π_2 (fir), π_3 (oak) and π_4 (alder), where π_4 = 1 − (π_1 + π_2 + π_3). The naïve model underestimates the standard errors. The FM model gives smaller standard errors for the estimates of π. Example – Forest Pollen Count, Mosimann (1962)

17 Maximum Likelihood Estimation Simulation Study What are the effects of using the wrong model? After each simulation, we calculate the average of the determinants of the estimated asymptotic variance-covariance matrices under each model. A comparison of these averages gives us insight as to which model may be more efficient.

18 Maximum Likelihood Estimation Simulation Study The Joint Asymptotic Relative Efficiency (JARE) can be used to summarize the simulation results, as it indicates which estimate would have a smaller asymptotic variance. For a vector parameter, JARE is the ratio of the determinants of the asymptotic variance-covariance matrices (sketched below).
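In symbols, following the verbal definition above (the orientation of the ratio here is illustrative): for estimators \(\hat{\boldsymbol{\theta}}_A\) and \(\hat{\boldsymbol{\theta}}_B\) of the same vector parameter with asymptotic variance-covariance matrices Σ_A and Σ_B,

\[
\mathrm{JARE}\big(\hat{\boldsymbol{\theta}}_A, \hat{\boldsymbol{\theta}}_B\big) = \frac{\det(\boldsymbol{\Sigma}_B)}{\det(\boldsymbol{\Sigma}_A)},
\]

so a value greater than 1 indicates that \(\hat{\boldsymbol{\theta}}_A\) has the smaller generalized asymptotic variance.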

19 Conclusions If we observe correlated/clustered multinomial data, use of the naïve multinomial model causes the standard errors to be underestimated, which leads to erroneous inferences and inflated Type-I error rates. If the data truly come from a Finite Mixture distribution, then estimation using this model clearly outperforms the Dirichlet Multinomial in terms of efficiency. If we are unsure of the distribution, the FM model may underestimate the standard errors and the Dirichlet Multinomial model provides a safe alternative.

20 Future Work Extension to Include Covariates: covariates can be included and linked to the model parameters through "link" functions, as in the Generalized Linear Model (GLM) framework. Simulation Study: obtain the expressions for the efficiency of the likelihood models relative to GEE; use simulations to see if gains in efficiency of the likelihood models can be achieved over GEE. Does the inclusion of covariates change our conclusions? Does the choice of link function have an influence?

21 References
Cox, D.R. and Snell, E.J. (1989). Analysis of Binary Data, 2nd Ed. New York: Chapman and Hall.
Kish, L. (1965). Survey Sampling. New York: John Wiley & Sons.
Liang, K.Y. and Zeger, S.L. (1986). "Longitudinal data analysis using generalized linear models." Biometrika 73.
McCullagh, P. and Nelder, J.A. (1989). Generalized Linear Models, 2nd Ed. London: Chapman and Hall.
Morel, J.G. and Nagaraj, N.K. (1993). "A finite mixture distribution for modelling multinomial extra variation." Biometrika 80.
Mosimann, J.E. (1962). "On the Compound Multinomial Distribution, the Multivariate β-Distribution, and Correlation among Proportions." Biometrika 49.
Neerchal, N.K. and Morel, J.G. (1998). "Large cluster results for two parametric multinomial extra variation models." Journal of the American Statistical Association 93.
Wedderburn, R.W.M. (1974). "Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method." Biometrika 61.
Zeger, S.L. and Liang, K.Y. (1986). "Longitudinal data analysis for discrete and continuous outcomes." Biometrics 42.