Statistical Models for Partial Membership Katherine Heller Gatsby Computational Neuroscience Unit, UCL Sinead Williamson and Zoubin Ghahramani University of Cambridge

Partial Membership: Example: a person with a mixed ethnic background. Someone who is 50% Asian and 50% European partly belongs to 2 different groups (ethnicities). This partial membership may be relevant for predicting this person's phenotype or food preferences. Partial membership is conceptually not the same as uncertain membership: being certain that someone is half Asian and half European is very different from being unsure of their ethnicity. More evidence (like DNA tests) can help resolve uncertainty, but will not change their ethnicity memberships. There is existing work on modeling partial membership by the fuzzy logic community.

Outline: Goal: describe a fully probabilistic approach to data modeling with partial memberships. Introduction; Bayesian Partial Membership Model (BPM); BPM Learning; Experiments (Synthetic, Senate Roll Call data); Related Work; Conclusions (Nonparametric Extension?).

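Finite Mixture Models: Consider modeling a data set, $\mathcal{D} = \{x_1, \ldots, x_N\}$, using a finite mixture of K components: $p(x_n \mid \Theta) = \sum_{k=1}^{K} \rho_k \, p_k(x_n \mid \theta_k)$, where $\Theta = \{\theta_1, \ldots, \theta_K, \rho\}$, $\rho_k \ge 0$ and $\sum_k \rho_k = 1$. Equivalently, $p(x_n \mid \Theta) = \sum_{\pi_n} \prod_{k=1}^{K} \rho_k^{\pi_{nk}} \, p_k(x_n \mid \theta_k)^{\pi_{nk}}$, where $\pi_{nk} \in \{0, 1\}$ and $\sum_k \pi_{nk} = 1$: the $\pi_{nk}$ denote memberships of data points to clusters! Generative process: 1) choose a cluster; 2) generate a data point from that cluster.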
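The two-step generative process can be sketched as follows. This is a minimal illustration assuming 1-D Gaussian components; the Gaussian choice, the parameter values and the function names are hypothetical, not from the slides. Each membership vector it returns is one-hot, i.e. a hard membership.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, rho, means, stds):
    """Finite mixture density: sum_k rho_k * p_k(x | theta_k)."""
    return sum(r * gaussian_pdf(x, m, s) for r, m, s in zip(rho, means, stds))


def sample_mixture(n, rho, means, stds, rng):
    """Sample n points; return the data and one-hot membership vectors."""
    K = len(rho)
    data, memberships = [], []
    for _ in range(n):
        k = rng.choices(range(K), weights=rho)[0]   # 1) choose a cluster
        data.append(rng.gauss(means[k], stds[k]))   # 2) generate a point from it
        memberships.append([1 if j == k else 0 for j in range(K)])
    return data, memberships


rng = random.Random(0)
rho, means, stds = [0.3, 0.7], [-2.0, 3.0], [1.0, 0.5]
x, pi = sample_mixture(1000, rho, means, stds, rng)
print(sum(p[0] for p in pi) / len(pi))  # fraction of points in cluster 0, close to 0.3
```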

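Continuous Relaxation: In the finite mixture model, $\pi_{nk} \in \{0, 1\}$ and $\sum_k \pi_{nk} = 1$, so the $\pi_{nk}$ denote memberships of data points to clusters. Relaxing the constraint to $\pi_{nk} \in [0, 1]$ (still with $\sum_k \pi_{nk} = 1$) gives $p(x_n \mid \pi_n, \Theta) = \frac{1}{c} \prod_{k=1}^{K} p_k(x_n \mid \theta_k)^{\pi_{nk}}$, where $c$ is a normalizing constant; now the $\pi_{nk}$ denote partial memberships of data points to clusters!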

Why does this make sense? If there is an “Asian” cluster and a “European” cluster, the partial membership model will better capture people with mixed ethnicity, whose features lie in between. [Figure: Partial Membership Mixture Model, showing memberships (1,0), (0,1) and (.5,.5).]
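The "features lie in between" intuition can be checked for 1-D Gaussian clusters: raising each Gaussian to the power $\pi_k$, multiplying and renormalizing gives another Gaussian whose precision is the $\pi$-weighted sum of the component precisions. A minimal sketch; the Gaussian clusters, their values and the function names are illustrative assumptions.

```python
import math


def partial_membership_gaussian(pi, means, variances):
    """Normalized prod_k N(x; mu_k, var_k)^{pi_k} for 1-D Gaussians; returns (mean, variance).

    The exponents add up, so the result is Gaussian with precision sum_k pi_k / var_k.
    """
    precision = sum(p / v for p, v in zip(pi, variances))
    mean = sum(p * m / v for p, m, v in zip(pi, means, variances)) / precision
    return mean, 1.0 / precision


def unnormalized_product(x, pi, means, variances):
    """prod_k N(x; mu_k, var_k)^{pi_k}, before dividing by the normalizing constant c."""
    out = 1.0
    for p, m, v in zip(pi, means, variances):
        dens = math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
        out *= dens ** p
    return out


means, variances = [-2.0, 2.0], [1.0, 1.0]   # hypothetical "Asian" / "European" feature clusters
for pi in ([1.0, 0.0], [0.0, 1.0], [0.5, 0.5]):
    print(pi, partial_membership_gaussian(pi, means, variances))
# (1,0) -> mean -2.0, (0,1) -> mean 2.0, (.5,.5) -> mean 0.0: the mixed case lies in between
```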

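Exponential Family Distributions: Let's consider the case where every $p_k$ belongs to the same exponential family, $p_k(x_n \mid \theta_k) = \exp\{s(x_n)^\top \theta_k + h(x_n) + g(\theta_k)\}$, with sufficient statistics $s(x_n)$ and natural parameters $\theta_k$. The conjugate prior can be written as $p(\theta_k \mid \lambda, \nu) = \exp\{\lambda^\top \theta_k + \nu\, g(\theta_k) + f(\lambda, \nu)\}$. Since $\sum_k \pi_{nk} = 1$, the product $\prod_k p_k(x_n \mid \theta_k)^{\pi_{nk}}$ equals $\exp\{s(x_n)^\top \sum_k \pi_{nk}\theta_k + h(x_n)\}$ up to a factor independent of $x_n$. It follows that $p(x_n \mid \pi_n, \Theta) = \exp\{s(x_n)^\top (\sum_k \pi_{nk} \theta_k) + h(x_n) + g(\sum_k \pi_{nk} \theta_k)\}$: the same exponential family, with natural parameters $\sum_k \pi_{nk} \theta_k$.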
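For a binary feature, the Bernoulli distribution in this form has $s(x) = x$, $h(x) = 0$, natural parameter $\theta = \log(p / (1 - p))$ and $g(\theta) = -\log(1 + e^\theta)$. The sketch below (illustrative values and hypothetical function names, not from the slides) checks numerically that normalizing $\prod_k p_k(x \mid \theta_k)^{\pi_k}$ gives the same Bernoulli as using the combined natural parameter $\sum_k \pi_k \theta_k$.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def bernoulli_density(x, theta):
    """Bernoulli in exponential-family form: exp{x * theta + g(theta)}, g(theta) = -log(1 + e^theta)."""
    return math.exp(x * theta - math.log1p(math.exp(theta)))


def normalized_power_product(x, pi, thetas):
    """(1/c) * prod_k p(x | theta_k)^{pi_k}, normalized over x in {0, 1}."""
    def unnormalized(v):
        return math.prod(bernoulli_density(v, t) ** p for p, t in zip(pi, thetas))
    return unnormalized(x) / (unnormalized(0) + unnormalized(1))


thetas = [logit(0.9), logit(0.2)]   # illustrative per-cluster probabilities of x = 1
pi = [0.5, 0.5]
direct = normalized_power_product(1, pi, thetas)
via_natural = sigmoid(sum(p * t for p, t in zip(pi, thetas)))  # Bernoulli with sum_k pi_k theta_k
print(direct, via_natural)  # both approximately 0.6
```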

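Bayesian Partial Membership Model: Generative process: For each k: draw $\theta_k \sim \mathrm{Conj}(\lambda, \nu)$. Draw $\rho \sim \mathrm{Dir}(\beta)$ and $a \sim \mathrm{Exp}(\alpha)$. For each n: draw $\pi_n \sim \mathrm{Dir}(a\rho)$ and $x_n \sim \mathrm{Expon}(\sum_k \pi_{nk} \theta_k)$, where $\mathrm{Expon}(\eta)$ is the chosen exponential family with natural parameters $\eta$. Ethnicity example: $\theta_k$ defines a distribution over features for each of the K ethnic groups; $\rho$ defines the ethnic composition of the population; $a$ controls how similar to the population an individual is expected to be; $\pi_n$ is the ethnic composition of individual n; $x_n$ are the feature values of individual n.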
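A minimal sketch of this generative process for binary features, assuming a Bernoulli BPM. Assumptions not from the slides: $\mathrm{Exp}(\alpha)$ is read as a rate-$\alpha$ exponential; each cluster's per-feature probabilities are drawn uniformly from [0.05, 0.95] as a simple stand-in for the conjugate prior $\mathrm{Conj}(\lambda, \nu)$; all function names are hypothetical. Dirichlet draws are done in log space so that small concentrations $a\rho_k$ do not underflow.

```python
import math
import random


def sample_dirichlet(alpha, rng):
    """Dirichlet(alpha) draw, computed in log space so tiny concentrations don't underflow.

    Uses Gamma(a) =d Gamma(a + 1) * U**(1/a) with U ~ Uniform(0, 1].
    """
    logs = [math.log(rng.gammavariate(a + 1.0, 1.0)) + math.log(1.0 - rng.random()) / a
            for a in alpha]
    top = max(logs)
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def sample_bpm_bernoulli(N, K, D, alpha_rate, beta, rng):
    """Draw N binary D-dimensional points from a K-cluster Bernoulli BPM (hypothetical sketch)."""
    a = rng.expovariate(alpha_rate)            # a ~ Exp(alpha), read as a rate
    rho = sample_dirichlet([beta] * K, rng)    # rho ~ Dir(beta)
    theta = []                                 # theta_k: natural parameters (logits)
    for _ in range(K):
        probs = [rng.uniform(0.05, 0.95) for _ in range(D)]  # stand-in for Conj(lambda, nu)
        theta.append([math.log(p / (1.0 - p)) for p in probs])
    Pi, X = [], []
    for _ in range(N):
        pi_n = sample_dirichlet([a * r for r in rho], rng)   # pi_n ~ Dir(a * rho)
        eta = [sum(pi_n[k] * theta[k][d] for k in range(K)) for d in range(D)]
        X.append([1 if rng.random() < sigmoid(e) else 0 for e in eta])  # x_n ~ Bernoulli(eta)
        Pi.append(pi_n)
    return a, rho, theta, Pi, X


a, rho, theta, Pi, X = sample_bpm_bernoulli(N=50, K=3, D=32, alpha_rate=1.0, beta=1.0,
                                            rng=random.Random(1))
```

The sizes in the usage line simply mirror the synthetic experiment (50 points, 32 dimensions, 3 clusters); changing `alpha_rate` shifts how close individual memberships tend to stay to $\rho$.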

BPM Sampled Data Each of the four plots shows 3000 data points drawn from the BPM with the same 3 full-covariance Gaussian clusters.

BPM Theory: Lemma 1: In the limit as $a \to 0$, the exponential family BPM is a mixture of K components with mixing proportions $\rho$. Lemma 2: In the limit as $a \to \infty$, the exponential family BPM has only one component, with natural parameters $\sum_k \rho_k \theta_k$.
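Both limits come from the draw $\pi_n \sim \mathrm{Dir}(a\rho)$: as $a \to 0$ the draws concentrate on the corners of the simplex (hard memberships, with cluster k chosen with probability about $\rho_k$), and as $a \to \infty$ they concentrate at $\rho$ itself. A quick numerical sketch with illustrative values; the log-space Dirichlet sampler is an implementation choice, not from the slides.

```python
import math
import random


def sample_dirichlet(alpha, rng):
    """Dirichlet(alpha) draw in log space (Gamma(a) =d Gamma(a + 1) * U**(1/a))."""
    logs = [math.log(rng.gammavariate(a + 1.0, 1.0)) + math.log(1.0 - rng.random()) / a
            for a in alpha]
    top = max(logs)
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]


def draws(a, rho, n, rng):
    """n draws of pi_n ~ Dir(a * rho)."""
    return [sample_dirichlet([a * r for r in rho], rng) for _ in range(n)]


rng = random.Random(0)
rho = [0.2, 0.3, 0.5]

small = draws(0.01, rho, 2000, rng)   # a -> 0: nearly one-hot memberships
mean_max = sum(max(p) for p in small) / len(small)
winner_freq = [sum(p.index(max(p)) == k for p in small) / len(small) for k in range(3)]

large = draws(1e4, rho, 2000, rng)    # a -> infinity: memberships collapse onto rho
mean_l1 = sum(sum(abs(pk - rk) for pk, rk in zip(p, rho)) for p in large) / len(large)

print(mean_max, winner_freq, mean_l1)  # mean_max near 1, winner_freq near rho, mean_l1 near 0
```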

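BPM Learning: We want to infer all unknowns $\Omega = \{\Pi, \rho, \Theta, a\}$ given $X$: $p(\Omega \mid X, \alpha, \beta, \lambda, \nu) \propto p(X \mid \Pi, \Theta)\, p(\Theta \mid \lambda, \nu)\, p(\Pi \mid a, \rho)\, p(a \mid \alpha)\, p(\rho \mid \beta)$. We treat $\alpha, \beta, \lambda, \nu$ as fixed hyperparameters. Goal: infer $\Omega$ using MCMC. All parameters in the BPM are continuous, so we can use Hybrid Monte Carlo (HMC), an efficient MCMC method that uses gradient information to find high-probability regions.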

Synthetic Data: Generated a synthetic binary data set of 50 data points, 32 dimensions, and 3 clusters. Ran the HMC sampler for 4000 iterations. Compared the true generated membership matrix with the sampled membership matrices.
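Cluster labels are arbitrary, so a sampled membership matrix cannot be compared with the true one column by column. One comparison that is invariant to relabeling clusters uses $\Pi\Pi^\top$, whose entry (n, m) is $\sum_k \pi_{nk}\pi_{mk}$. A minimal sketch; the matrices, the mean-absolute-difference summary and the function names are illustrative assumptions, not necessarily the measure used in the experiment.

```python
def gram(Pi):
    """U = Pi Pi^T, so U[n][m] = sum_k pi_nk * pi_mk; unchanged if clusters are relabeled."""
    return [[sum(a * b for a, b in zip(p, q)) for q in Pi] for p in Pi]


def mean_abs_difference(U, V):
    """Average absolute entrywise difference between two square matrices."""
    n = len(U)
    return sum(abs(U[i][j] - V[i][j]) for i in range(n) for j in range(n)) / (n * n)


Pi_true = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8], [0.5, 0.5, 0.0]]
Pi_relabeled = [[row[2], row[0], row[1]] for row in Pi_true]  # same memberships, permuted labels
print(mean_abs_difference(gram(Pi_true), gram(Pi_relabeled)))      # ~0, up to rounding
print(mean_abs_difference(gram(Pi_true), gram([[1 / 3] * 3] * 3)))  # clearly > 0
```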

Senate Roll Call Data (2001-2002): (99 senators + 1 “outcome”) × 633 votes. K = 2 multivariate Bernoulli clusters. The model was adapted to handle missing data.

Senate Roll Call Comparisons: Fuzzy K-means. Blue: Senator Schumer; Black: “Outcome”; Red: Senator Ensign. Partial membership values are very sensitive to the exponent $\gamma$; for no value of $\gamma$ do the membership values make sense.

Senate Roll Call Comparisons: Dirichlet Process Mixtures. The DPM confidently infers 4 clusters. Uncertainty is not a good substitute for partial membership. Negative log predictive probability (in bits) across senators: DPM: mean 187, median 168, min 93, max 422, “Outcome” 224; BPM: mean 196, median 178, min 112, max 412, “Outcome” 245.
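These scores are in bits, i.e. negative base-2 log probabilities. As a hedged sketch, assuming the model supplies a predicted probability of a "yea" (coded 1) for each held-out vote and that missing votes are simply skipped (both are assumptions; the evaluation protocol is not spelled out here), a per-senator score could be computed like this. The function name is hypothetical.

```python
import math


def neg_log_predictive_bits(votes, probs):
    """Negative log2 predictive probability of observed binary votes.

    votes: 1, 0 or None (missing); probs: predicted P(vote = 1) for each vote.
    """
    total = 0.0
    for x, p in zip(votes, probs):
        if x is None:          # skip missing votes
            continue
        total -= math.log2(p if x == 1 else 1.0 - p)
    return total


print(neg_log_predictive_bits([1, 0, None, 1], [0.5, 0.5, 0.9, 0.25]))  # 4.0
```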

Image Data: 329 tower and sunset images with 240 simple binary texture and color features, and K = 2 clusters.

Related Work: Latent Dirichlet Allocation (LDA) and Mixed Membership Models; Fuzzy Clustering; Exponential Family PCA.

Future Work: It would be nice to have a nonparametric version. The obvious thing to try is Hierarchical Dirichlet Processes, but this would require summing over all infinitely many elements of $\pi_n$ (i.e., computing $\sum_{k=1}^{\infty} \pi_{nk} \theta_k$), which isn't computationally feasible. It is also semantically unappealing. Indian Buffet Processes might work: sample an IBP matrix, interpreting a 1 as having some nonzero amount of membership in that cluster, then draw the exact continuous amount separately.
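A sketch of the Indian Buffet Process idea, under loud assumptions: the IBP is sampled with the standard sequential "buffet" construction, and the continuous amounts for a row's active clusters are drawn from a flat Dirichlet. The Dirichlet choice and all function names are hypothetical, since the second step is left open. Rows with no active cluster are left empty here; a real model would need to handle them.

```python
import math
import random


def sample_poisson(lam, rng):
    """Poisson(lam) draw by Knuth's multiplication method (fine for small lam)."""
    threshold, k, prod = math.exp(-lam), 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def sample_ibp(N, alpha, rng):
    """IBP matrix as a list of active column indices per row, plus the number of columns."""
    counts, rows = [], []
    for i in range(1, N + 1):
        row = [k for k, m in enumerate(counts) if rng.random() < m / i]  # existing dishes
        n_new = sample_poisson(alpha / i, rng)                            # new dishes
        row += list(range(len(counts), len(counts) + n_new))
        counts += [0] * n_new
        for k in row:
            counts[k] += 1
        rows.append(row)
    return rows, len(counts)


def partial_memberships(rows, rng):
    """Hypothetical second step: flat-Dirichlet amounts over each row's active clusters."""
    out = []
    for row in rows:
        w = [rng.expovariate(1.0) for _ in row]  # normalized Exp(1) draws ~ flat Dirichlet
        total = sum(w)
        out.append({k: wk / total for k, wk in zip(row, w)})
    return out


rng = random.Random(3)
rows, K = sample_ibp(20, 2.0, rng)
memberships = partial_memberships(rows, rng)
```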

Conclusions: Developed a fully probabilistic approach to data modeling with partial membership. It uses continuous latent variables and can be seen as a relaxation of clustering with standard mixture models. Used Hybrid Monte Carlo for inference, which was extremely fast (finding sensible partial membership structure after very few samples).

Thank You

Partial Membership: Cornerstone of fuzzy set theory. Traditional set theory: items belong to a set or they don't, {0,1}. Fuzzy set theory: a membership function $\mu_A(x) \in [0, 1]$, where $\mu_A(x)$ denotes the degree to which $x$ belongs to set $A$. Fuzzy logic versus probabilistic models: there have been misguided arguments that fuzzy logic is different from or supersedes probability theory. While it might be easy to dismiss fuzzy logic, its framework for representing partial membership has inspired many researchers. Google Scholar: over 45,000 fuzzy clustering papers; the most cited are cited as frequently as the most cited papers in “NIPS” areas.

Related Work: Latent Dirichlet Allocation (LDA) and Mixed Membership Models. The BPM generates data points at the document level of LDA (there is no word plate). Whereas LDA (and mixed membership models) assume that words (or attributes) are drawn using $\pi_n$ as mixing proportions in a mixture model, and are factorized, the BPM uses $\pi_n$ to form a convex combination of natural parameters. Attributes are not drawn from a mixture model and need not be factorized. The BPM potentially allows faster MCMC sampling, since all its parameters are continuous, whereas LDA must infer a discrete topic assignment for each word.
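The difference shows up on a single binary attribute. With illustrative per-cluster probabilities (assumed values, not from the slides), a mixed-membership model draws the attribute from a mixture, so $P(x = 1) = \sum_k \pi_k p_k$, while the BPM averages the natural parameters (logits) and maps back. Function names are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def mixture_probability(pi, probs):
    """Mixed-membership style: attribute drawn from a mixture, P(x = 1) = sum_k pi_k p_k."""
    return sum(w * p for w, p in zip(pi, probs))


def bpm_probability(pi, probs):
    """BPM style: convex combination of natural parameters, P(x = 1) = sigmoid(sum_k pi_k logit(p_k))."""
    return sigmoid(sum(w * logit(p) for w, p in zip(pi, probs)))


pi, probs = [0.5, 0.5], [0.99, 0.5]
print(mixture_probability(pi, probs))  # about 0.745
print(bpm_probability(pi, probs))      # about 0.909
```

The BPM value is not the average of the two cluster probabilities: because the averaging happens on the logit scale, it is pulled toward the more extreme cluster.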

Mixed Membership Model Generation

Related Work: Fuzzy Clustering: Fuzzy k-means iteratively minimizes the objective $J = \sum_{n=1}^{N} \sum_{k=1}^{K} u_{nk}^{\gamma}\, d^2(x_n, c_k)$ with $\sum_k u_{nk} = 1$, where $d$ is the distance between a data point and a cluster center, $u_{nk}$ is the degree of membership of data point $n$ in cluster $k$, and $\gamma$ controls the amount of partial membership ($\gamma = 1$ is normal k-means). None of these variables have probabilistic interpretations.
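This objective is usually minimized by alternating two closed-form updates, the standard fuzzy c-means updates (stated from general knowledge rather than from the slides): memberships $u_{nk} = 1 / \sum_j (d_{nk} / d_{nj})^{2/(\gamma - 1)}$ and centers as $u^\gamma$-weighted means. A minimal sketch with hypothetical names, requiring $\gamma > 1$ and explicit initial centers:

```python
import math
import random


def fuzzy_kmeans(points, init_centers, gamma, iters):
    """Fuzzy k-means: minimize sum_n sum_k u_nk^gamma * d(x_n, c_k)^2 with sum_k u_nk = 1.

    points and centers are tuples of floats; gamma must be > 1.
    Returns the final centers and the memberships from the last iteration.
    """
    centers = list(init_centers)
    K, dim = len(centers), len(points[0])
    U = []
    for _ in range(iters):
        # Membership update: u_nk = 1 / sum_j (d_nk / d_nj)^(2 / (gamma - 1)).
        U = []
        for x in points:
            d = [max(math.dist(x, c), 1e-12) for c in centers]  # guard against d = 0
            U.append([1.0 / sum((d[k] / dj) ** (2.0 / (gamma - 1.0)) for dj in d)
                      for k in range(K)])
        # Center update: c_k = sum_n u_nk^gamma x_n / sum_n u_nk^gamma.
        new_centers = []
        for k in range(K):
            w = [u[k] ** gamma for u in U]
            total = sum(w)
            new_centers.append(tuple(sum(wn * x[i] for wn, x in zip(w, points)) / total
                                     for i in range(dim)))
        centers = new_centers
    return centers, U


rng = random.Random(0)
left = [(rng.gauss(-3.0, 0.3), rng.gauss(0.0, 0.3)) for _ in range(20)]
right = [(rng.gauss(3.0, 0.3), rng.gauss(0.0, 0.3)) for _ in range(20)]
points = left + right + [(0.0, 0.0)]           # last point sits between the clusters
centers, U = fuzzy_kmeans(points, [left[0], right[0]], gamma=2.0, iters=50)
print(U[-1])  # roughly [0.5, 0.5]
```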

Related Work: Exponential Family PCA: Originally formulated in terms of Bregman divergences, it can be seen as a non-Bayesian version of the BPM in which the $\pi_{nk}$ are not constrained (to sum to 1 or to be positive). It is therefore not a convex combination of natural parameters and lacks the same sort of partial membership interpretation. If we wanted, we could relax these same constraints to get a Bayesian version of Exponential Family PCA, but we'd have to tweak the model, e.g., by placing a Gaussian prior on $\pi_n$.

BPM Learning: Hybrid Monte Carlo is an MCMC method that uses gradient information. It simulates the dynamics of a system with continuous state variable $\Omega$ on an energy function $E(\Omega) = -\log\big(p(X \mid \Omega)\, p(\Omega)\big)$; the derivatives $\partial E / \partial \Omega$ provide forces on the state variables which encourage the system to find high-probability regions, while maintaining detailed balance.
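The core of one HMC transition can be sketched on a toy target. This is a generic textbook version (Gaussian momenta, leapfrog integration, Metropolis correction), not the authors' implementation. For the BPM, the energy would be the negative log joint of the parameters, and the constrained quantities ($\pi_n$ and $\rho$ on the simplex, $a > 0$) would presumably need an unconstrained reparameterization, which is an assumption not covered here. The toy target is a 2-D Gaussian with a hypothetical mean (1, -2).

```python
import math
import random


def hmc_step(q, energy, grad_energy, step_size, n_leapfrog, rng):
    """One Hybrid Monte Carlo transition from state q (a list of floats).

    energy(q) = -log p(q) up to a constant; grad_energy returns its gradient.
    """
    p0 = [rng.gauss(0.0, 1.0) for _ in q]                        # fresh momenta
    q_new = list(q)
    g = grad_energy(q_new)
    p = [m - 0.5 * step_size * gi for m, gi in zip(p0, g)]       # half momentum step
    for i in range(n_leapfrog):
        q_new = [qi + step_size * m for qi, m in zip(q_new, p)]  # full position step
        g = grad_energy(q_new)
        if i < n_leapfrog - 1:
            p = [m - step_size * gi for m, gi in zip(p, g)]      # full momentum step
    p = [m - 0.5 * step_size * gi for m, gi in zip(p, g)]        # final half step
    h_old = energy(q) + 0.5 * sum(m * m for m in p0)
    h_new = energy(q_new) + 0.5 * sum(m * m for m in p)
    # Metropolis correction: accept with probability min(1, exp(h_old - h_new)).
    if math.log(1.0 - rng.random()) < h_old - h_new:
        return q_new, True
    return list(q), False


MU = [1.0, -2.0]


def energy(q):
    return 0.5 * sum((qi - mi) ** 2 for qi, mi in zip(q, MU))


def grad_energy(q):
    return [qi - mi for qi, mi in zip(q, MU)]


rng = random.Random(0)
q, samples, accepted = [0.0, 0.0], [], 0
for t in range(3000):
    q, ok = hmc_step(q, energy, grad_energy, step_size=0.2, n_leapfrog=10, rng=rng)
    accepted += ok
    if t >= 200:
        samples.append(q)
means = [sum(s[d] for s in samples) / len(samples) for d in range(2)]
print(means, accepted / 3000)  # means near (1, -2); high acceptance on this easy target
```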

Bregman Divergence: $d_F(p, q) = F(p) - F(q) - \langle \nabla F(q), p - q \rangle$, where F is a strictly convex function and p and q are points. Intuitively, it is the difference between the value of F at p and the value of the first-order Taylor expansion of F around q, evaluated at p.
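Two standard instances make the definition concrete: $F(x) = \|x\|^2$ gives the squared Euclidean distance, and the negative entropy $F(x) = \sum_i x_i \log x_i$ gives the generalized KL divergence, which reduces to the usual KL divergence when p and q are probability vectors. A minimal sketch with hypothetical function names:

```python
import math


def bregman(F, grad_F, p, q):
    """d_F(p, q) = F(p) - F(q) - <grad F(q), p - q>."""
    return F(p) - F(q) - sum(g * (pi - qi) for g, pi, qi in zip(grad_F(q), p, q))


def sq_norm(x):
    return sum(v * v for v in x)


def sq_norm_grad(x):
    return [2.0 * v for v in x]


def neg_entropy(x):
    return sum(v * math.log(v) for v in x)


def neg_entropy_grad(x):
    return [math.log(v) + 1.0 for v in x]


p, q = [0.2, 0.8], [0.5, 0.5]
print(bregman(sq_norm, sq_norm_grad, p, q))          # squared Euclidean distance, about 0.18
print(bregman(neg_entropy, neg_entropy_grad, p, q))  # KL(p || q)
```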

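LDA Review: 1. For z = 1…K, draw $\phi_z \sim \mathrm{Dir}(\beta)$. 2. For d = 1…D: a) draw $\theta_d \sim \mathrm{Dir}(\alpha)$; b) for n = 1…$N_d$: i. draw $z_{dn} \sim \mathrm{Mult}(\theta_d)$; ii. draw $w_{dn} \sim \mathrm{Mult}(\phi_{z_{dn}})$. Here $\alpha, \beta$ are hyperparameters, $\theta_d$ are multinomial parameters for topics, $\phi_z$ are multinomial parameters for words given topics, $w$ are words, $z$ are topics, $K$ is the number of topics, $N_d$ is the number of words in document d, and $D$ is the number of documents.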
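The LDA generative process can be sketched directly. Assumptions not stated in the review: symmetric Dirichlet priors with scalar $\alpha$ and $\beta$, a vocabulary of V word ids (V and the function names are hypothetical), and a log-space Dirichlet sampler. Note that every word gets its own discrete topic $z_{dn}$.

```python
import math
import random


def sample_dirichlet(alpha, rng):
    """Dirichlet(alpha) draw in log space (Gamma(a) =d Gamma(a + 1) * U**(1/a))."""
    logs = [math.log(rng.gammavariate(a + 1.0, 1.0)) + math.log(1.0 - rng.random()) / a
            for a in alpha]
    top = max(logs)
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]


def sample_lda(D, K, V, doc_lengths, alpha, beta, rng):
    """Generate D documents; return per-topic word distributions and (topics, words) per doc."""
    phi = [sample_dirichlet([beta] * V, rng) for _ in range(K)]        # 1. phi_z ~ Dir(beta)
    docs = []
    for d in range(D):
        theta_d = sample_dirichlet([alpha] * K, rng)                   # 2a. theta_d ~ Dir(alpha)
        z = rng.choices(range(K), weights=theta_d, k=doc_lengths[d])   # 2b-i. z_dn ~ Mult(theta_d)
        w = [rng.choices(range(V), weights=phi[zn])[0] for zn in z]    # 2b-ii. w_dn ~ Mult(phi_z)
        docs.append((z, w))
    return phi, docs


rng = random.Random(0)
phi, docs = sample_lda(D=3, K=2, V=10, doc_lengths=[5, 8, 4], alpha=0.5, beta=0.1, rng=rng)
```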
