Latent Feature Models for Network Data over Time Jimmy Foulds Advisor: Padhraic Smyth (Thanks also to Arthur Asuncion and Chris Dubois)

Overview
- The task
- Prior work: Miller, Van Gael, Indian Buffet Processes
- The DRIFT model
- Inference
- Preliminary results
- Future work

The Task: Modeling Dynamic (Time-Varying) Social Networks
- Interested in prediction
- Model interpretation for sociological understanding
- Continuous-time relational events versus panel data?

Applications: Predicting Communications

Applications: Predicting Paper Co-authorship
- NIPS data

Prior Work
- Erdős-Rényi models are "pseudo-dynamic"
- Continuous-time Markov process models (Snijders 2006): the network stochastically optimizes an ERGM likelihood function
- Dynamic latent space model (Sarkar & Moore, 2005): each node (actor) is associated with a point in a low-dimensional space (Raftery et al. 2002), and link probability is a function of the distance between points; Gaussian jumps in the latent space at each timestep (toy sketch below)
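
To make the latent space idea concrete, here is a minimal Python sketch, not from the slides: the function name, the logistic link-from-distance form, and all parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_dynamic_latent_space(n_actors=20, n_steps=5, dim=2, jump_sd=0.1):
    """Toy dynamic latent space model: actor positions take Gaussian jumps
    each timestep, and links are more likely between nearby actors."""
    X = rng.normal(size=(n_actors, dim))                  # initial latent positions
    graphs = []
    for _ in range(n_steps):
        X = X + rng.normal(scale=jump_sd, size=X.shape)   # Gaussian jump per timestep
        D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        P = 1.0 / (1.0 + np.exp(D - 1.0))                 # link probability decays with distance
        Y = rng.random(P.shape) < P                       # sample an adjacency matrix
        np.fill_diagonal(Y, False)
        graphs.append(Y)
    return graphs

snapshots = simulate_dynamic_latent_space()
print([int(g.sum()) for g in snapshots])                  # number of links per timestep
```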

Prior Work
Nonparametric Latent Feature Relational Model (Miller et al. 2009)
- Each actor is associated with an unbounded, sparse vector of binary latent features, generated from an Indian Buffet Process prior
- The probability of a link between two actors is a function of the latent features of those actors (and additional covariates)

Prior Work
Nonparametric Latent Feature Relational Model (Miller et al. 2009), generative process:
- Z ~ IBP(α)
- W_kk' ~ N(0, σ_w²)
- Y_ij ~ σ(Z_i W Z_jᵀ + covariate terms)
A kind of blockmodel with overlapping classes (see the sketch below).
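
A minimal sketch of this generative process in Python, using a finite beta-Bernoulli approximation in place of the full IBP; the truncation level K, function name, and parameter values are illustrative assumptions rather than anything from the slides.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_nlfrm(n_actors=10, K=5, alpha=2.0, sigma_w=1.0):
    """Truncated sketch of the latent feature relational model:
    pi_k ~ Beta(alpha/K, 1) approximates the IBP, Z is binary,
    W ~ N(0, sigma_w^2), and P(Y_ij = 1) = sigmoid(Z_i W Z_j^T)
    (covariate terms omitted)."""
    pi = rng.beta(alpha / K, 1.0, size=K)              # per-feature probabilities
    Z = (rng.random((n_actors, K)) < pi).astype(int)   # binary latent features
    W = rng.normal(scale=sigma_w, size=(K, K))         # feature interaction weights
    logits = Z @ W @ Z.T
    Y = rng.random((n_actors, n_actors)) < 1.0 / (1.0 + np.exp(-logits))
    np.fill_diagonal(Y, False)
    return Z, W, Y

Z, W, Y = sample_nlfrm()
print(Z.sum(axis=1))        # number of active features per actor
```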

How to Make This Model Dynamic for Longitudinal Data?
- We would like the Zs to change over time, modeling changing interests, community memberships, ...
- We want to maintain the sparsity property, but also model persistence and the generation of new features

Infinite Factorial Hidden Markov Models (Van Gael et al., 2010)
A variant of the IBP:
- A probability distribution over a potentially infinite number of binary Markov chains
- Sparsity: at each timestep, introduce new features using the IBP distribution
- Persistence: a coin flip determines whether each feature persists to the next timestep (toy simulation below)
- Hidden Markov structure: the latent features are hidden, but we observe something at each timestep
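
A toy simulation of these chain dynamics for a single actor, not the full iFHMM: new features arrive at a Poisson rate as a stand-in for the IBP mechanism, and a feature that switches off stays off, which simplifies the actual Markov transition structure. All names and values are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate_feature_chains(n_steps=10, alpha=1.0, persist_prob=0.8):
    """Simplified iFHMM-style dynamics: each active feature persists to the
    next step with a coin flip, and Poisson(alpha) brand-new features are
    introduced per step (a stand-in for the IBP mechanism)."""
    chains = []                        # per-feature binary histories
    for t in range(n_steps):
        for chain in chains:           # persistence coin flip for existing features
            active = chain[-1] == 1 and rng.random() < persist_prob
            chain.append(1 if active else 0)
        for _ in range(rng.poisson(alpha)):
            chains.append([0] * t + [1])          # feature born at time t
    return np.array(chains)

chains = simulate_feature_chains()
print(chains.shape)                    # (features ever introduced, n_steps)
```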

DRIFT: the Dynamic Relational Infinite FeaTure Model
- The iFHMM models the evolution of one actor's features over time
- We use an iFHMM for each actor, but share the transition probabilities across actors
- Observed graphs are generated via the latent feature model of Miller et al. (2009): Y_ij ~ σ(Z_i W Z_jᵀ + ...)
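
Putting the pieces together, here is a rough generative sketch of DRIFT under a fixed truncation; the function name, the specific transition parameterization (p_on, p_stay), and all values are illustrative assumptions, not the model's actual nonparametric construction.

```python
import numpy as np

rng = np.random.default_rng(3)

def simulate_drift(n_actors=8, K=4, n_steps=6, sigma_w=1.0, p_on=0.1, p_stay=0.8):
    """Truncated sketch of DRIFT: every actor has its own binary feature chain,
    all chains share the transition probabilities (p_on: off->on, p_stay: on->on),
    and each timestep's graph is generated via sigmoid(Z_t W Z_t^T)."""
    W = rng.normal(scale=sigma_w, size=(K, K))
    Z = (rng.random((n_actors, K)) < p_on).astype(int)    # initial feature matrix
    graphs = []
    for _ in range(n_steps):
        trans_prob = np.where(Z == 1, p_stay, p_on)       # shared Markov dynamics
        Z = (rng.random(Z.shape) < trans_prob).astype(int)
        logits = Z @ W @ Z.T
        Y = rng.random((n_actors, n_actors)) < 1.0 / (1.0 + np.exp(-logits))
        np.fill_diagonal(Y, False)
        graphs.append(Y)
    return graphs

print([int(g.sum()) for g in simulate_drift()])           # links per timestep
```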

DRIFT: the Dynamic Relational Infinite FeaTure Model

Inference
Markov chain Monte Carlo inference:
- Use the "slice sampling" trick with the stick-breaking construction of the IBP to effectively truncate the number of features while still performing exact inference
- Blocked Gibbs sampling on the other variables
  - Forward-backward dynamic programming on each actor's feature chain (sketched below)
  - Metropolis-Hastings updates for the W's, since they are non-conjugate
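
As an illustration of the forward-backward step, here is a generic forward-filtering backward-sampling routine for one binary feature chain. It is a hedged sketch: the emission log-likelihoods are taken as an input array rather than computed from the link likelihood, and the transition parameterization and initial distribution are stand-ins for the shared chain parameters.

```python
import numpy as np

rng = np.random.default_rng(4)

def ffbs_binary_chain(log_lik, p_on=0.1, p_stay=0.8):
    """Forward-filtering backward-sampling for a single binary chain.
    log_lik[t, s] is the log emission likelihood of state s in {0, 1} at time t
    (in DRIFT this would come from the graph likelihood with the feature
    clamped to 0 or 1)."""
    T = log_lik.shape[0]
    trans = np.array([[1.0 - p_on, p_on],       # rows: from-state, cols: to-state
                      [1.0 - p_stay, p_stay]])
    # Forward pass: filtered state probabilities p(z_t | y_{1:t})
    alpha = np.zeros((T, 2))
    alpha[0] = np.array([1.0 - p_on, p_on]) * np.exp(log_lik[0])
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        pred = alpha[t - 1] @ trans
        alpha[t] = pred * np.exp(log_lik[t])
        alpha[t] /= alpha[t].sum()
    # Backward pass: sample states from p(z_t | z_{t+1}, y_{1:t})
    z = np.zeros(T, dtype=int)
    z[-1] = rng.random() < alpha[-1, 1]
    for t in range(T - 2, -1, -1):
        w = alpha[t] * trans[:, z[t + 1]]
        z[t] = rng.random() < w[1] / w.sum()
    return z

log_lik = rng.normal(size=(8, 2))    # stand-in emission log-likelihoods
print(ffbs_binary_chain(log_lik))
```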

Group DRIFT
Clustering to reduce the number of chains:
- Each actor has a hidden class variable c ∈ {1, ..., C}, with C < N
- The chains of infinite binary feature vectors are associated with classes rather than actors
- This allows us to scale up to large numbers of actors
- The clustering itself may be interpretable

Group DRIFT Inference
[Plate diagram: class assignments c_n for actors n = 1:N, classes i = 1:C, with β = 1/C]
- Inference for a and b is exactly the same as before
- Inference for the z's is similar: slightly different "emission" probability; run the forward-backward sampler on M*C chains rather than M*N chains
- Inference for the c's (each actor's assignment to a specific chain) is easy too
- Inference for W is similar (slightly different likelihood); note that we must now assume the diagonal of W can be non-zero
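
A small sketch of how the class-shared chains enter the likelihood: each actor simply uses the feature vector of its class at every timestep, and the graph is generated as before. The function name and argument shapes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)

def group_drift_graphs(chains, c, W):
    """Group DRIFT likelihood sketch: actors inherit the feature chain of their
    class. chains has shape (C, T, K), c maps each actor to a class, W is the
    K x K weight matrix (diagonal allowed to be non-zero)."""
    C, T, K = chains.shape
    graphs = []
    for t in range(T):
        Z = chains[:, t, :][c]            # actor n gets its class's features at time t
        logits = Z @ W @ Z.T
        Y = rng.random((len(c), len(c))) < 1.0 / (1.0 + np.exp(-logits))
        np.fill_diagonal(Y, False)
        graphs.append(Y)
    return graphs

chains = rng.integers(0, 2, size=(3, 5, 4))   # C=3 classes, T=5 steps, K=4 features
c = rng.integers(0, 3, size=12)               # class assignments for N=12 actors
W = rng.normal(size=(4, 4))
print(len(group_drift_graphs(chains, c, W)))  # one graph per timestep
```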

Preliminary Experimental Results (Synthetic Data)

Future Work: Extension to Continuous Time
- It is easy to use the IBP latent factor model as a covariate in the Relational Event Model (Butts 2008)
- How do we model the Zs changing over time in the continuous-time setting?

Thanks for Listening!