Statistical Models for Networks and Text
Jimmy Foulds, UCI Computer Science PhD Student
Advisor: Padhraic Smyth

Motivation
Networks often have text associated with them:
– Citation networks
– Email
– Social media
Can we leverage this text information for better prediction and sociological understanding?

Statistical Latent Variable Models
– Find low-dimensional representations of the data
– Conditional independence assumptions improve tractability of inference
– Unifying view: probabilistic matrix factorization
Y ∼ f(Λ), where Λ = ZW
(Λ is N × D, Z is N × K, and W is K × D)

Latent Variable Models for Text
Latent Dirichlet allocation (Blei et al. 2003) is a generative model for text:
– Documents are associated with distributions over topics θ_d
– Topics are distributions over words φ_k
– Each word w_{d,n} is associated with a latent topic variable z_{d,n}
The generative process:
For each document d:
  Draw topic proportions θ_d ~ Dirichlet(α)
  For each word w_{d,n}:
    Draw a topic assignment z_{d,n} ~ Discrete(θ_d)
    Draw a word from the chosen topic: w_{d,n} ~ Discrete(φ_{z_{d,n}})
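The generative process above can be sketched directly in NumPy. This is a minimal forward-sampling illustration; all sizes and hyperparameter values (D, K, V, N_d, α, β) are toy assumptions chosen for the example, not values from the presentation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (assumed for illustration): D docs, K topics, V vocab words, N_d words per doc
D, K, V, N_d = 5, 3, 20, 50
alpha, beta = 0.1, 0.1

# Topics phi_k ~ Dirichlet(beta): each topic is a distribution over the vocabulary
phi = rng.dirichlet(beta * np.ones(V), size=K)          # K x V

docs = []
for d in range(D):
    theta = rng.dirichlet(alpha * np.ones(K))           # theta_d ~ Dirichlet(alpha)
    z = rng.choice(K, size=N_d, p=theta)                # z_{d,n} ~ Discrete(theta_d)
    w = np.array([rng.choice(V, p=phi[k]) for k in z])  # w_{d,n} ~ Discrete(phi_{z_{d,n}})
    docs.append(w)
```

Sampling forward like this is useful for sanity-checking an inference implementation against data with known topics.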

Latent Variable Models for Text
Latent Dirichlet allocation (Blei et al. 2003) can be thought of as a latent variable model in a matrix factorization framework, where documents are represented by latent distributions over topics θ_d:
Λ = θφ
– Λ (documents × words): probability distributions over words for each document
– θ (documents × topics): probability distributions over topics
– φ (topics × words): probability distributions over words
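The factorization view can be checked numerically: multiplying the document-topic matrix by the topic-word matrix yields, row by row, each document's distribution over words. The sizes and concentrations below are toy assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy sizes: D documents, K topics, V vocabulary words
D, K, V = 4, 3, 10
theta = rng.dirichlet(0.5 * np.ones(K), size=D)  # D x K: per-document topic proportions
phi = rng.dirichlet(0.1 * np.ones(V), size=K)    # K x V: per-topic word distributions

# The product recovers the Lambda matrix: each document's distribution over words
Lam = theta @ phi                                # D x V
row_sums = Lam.sum(axis=1)                       # each row should sum to 1
```

Because each row of θ and each row of φ sums to one, every row of Λ = θφ is itself a valid probability distribution, which is exactly the property the factorization view relies on.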

Latent Variable Models for Networks
– Find low-dimensional representations of the data
– Conditional independence assumptions improve tractability of inference
– Unifying view: probabilistic matrix factorization
Y ∼ f(Λ), where Λ = ZWZᵀ
(Λ is N × N, Z is N × K, and W is K × K)
E.g. MMSB (Airoldi et al. 2008), LFRM (Miller et al. 2009), RTM (Chang and Blei 2009), Latent Factor Model (Hoff et al. 2002), …

Relational Topic Model (Chang and Blei 2009)
A latent variable model for networks of documents, e.g. citation networks:
– Text is associated with the nodes
– Documents are generated via LDA
– The probability of a link between two documents is a function of their latent topic assignments:
Pr(Y_ij = 1) = ψ(z_i, z_j)

The Nonparametric Latent Feature Relational Model (Miller et al. 2009)
Each actor is described by a row of binary latent features in the matrix Z.
[Figure: actors A, B and C, each with a subset of latent features from {Cycling, Fishing, Running, Tango, Salsa, Waltz}; Z is the binary actor-by-feature matrix]

The Nonparametric Latent Feature Relational Model (Miller et al. 2009)
Pr(Y_bc = 1) = σ(Z_b W Z_cᵀ) = σ(W_{Tango,Waltz} + W_{Tango,Running} + …)
Y ∼ entry-wise Bernoulli(σ(ZWZᵀ))
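The LFRM link model above can be sketched in a few lines: binary features interact through a weight matrix, and a sigmoid of the resulting score gives the link probability. The sizes and random initialization below are toy assumptions for illustration, not learned values.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Assumed toy sizes: N actors, K binary latent features (e.g. dance/sport interests)
N, K = 3, 6
Z = rng.binomial(1, 0.5, size=(N, K))  # binary latent feature matrix
W = rng.normal(size=(K, K))            # feature-feature interaction weights

# Pr(Y_ij = 1) = sigmoid(Z_i W Z_j^T): interacting feature pairs drive link odds
P = sigmoid(Z @ W @ Z.T)               # N x N matrix of link probabilities
Y = rng.binomial(1, P)                 # entry-wise Bernoulli draw of the network
```

Only the weight entries W_{k,k'} where actor i has feature k and actor j has feature k' contribute to the score, which is what the W_{Tango,Waltz} + W_{Tango,Running} + … expansion on the slide is showing.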

LFRM-LDA
A novel latent variable model for networks with text on the edges, e.g. email data:
– Associates LFRM features with LDA topics
– Generate the network via LFRM
– Generate the document on each edge via LDA
– The prior for each edge document's distribution over topics is a function of the sender's and receiver's latent features
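A minimal forward-sampling sketch of a model in this spirit is below. The LFRM and LDA pieces follow the earlier slides; the specific additive mapping from sender and receiver features to the edge document's Dirichlet prior is an illustrative assumption, not necessarily the construction used in the actual model, and all sizes are toy values.

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Assumed toy sizes: N actors, K latent features (one LDA topic per feature),
# V vocabulary words, N_w words per edge document
N, K, V, N_w = 4, 5, 12, 30
Z = rng.binomial(1, 0.5, size=(N, K))          # LFRM binary features
W = rng.normal(size=(K, K))                    # LFRM interaction weights
phi = rng.dirichlet(0.1 * np.ones(V), size=K)  # one topic (word distribution) per feature

P = sigmoid(Z @ W @ Z.T)                       # LFRM link probabilities
edges, edge_docs = [], []
for i in range(N):
    for j in range(N):
        if i != j and rng.random() < P[i, j]:  # draw the edge via LFRM
            # Edge document's topic prior depends on sender and receiver features;
            # this particular additive form is an assumption made for the sketch
            prior = 0.1 + Z[i] + Z[j]
            theta = rng.dirichlet(prior)                        # edge topic proportions
            z = rng.choice(K, size=N_w, p=theta)                # per-word topics
            words = np.array([rng.choice(V, p=phi[k]) for k in z])
            edges.append((i, j))
            edge_docs.append(words)
```

The key coupling is that the same latent features govern both who links to whom and what the resulting edge document talks about, which is what lets the model answer queries about likely recipients from text.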

LFRM-LDA: Discussion
– A model for networks with text associated with edges
– Associates LFRM latent features with LDA topics
– The model can be learned via standard blocked Gibbs sampling techniques
– Can answer queries such as "who is the likely recipient of this email, given the sender and its text?"
– LFRM features are associated with topics, which may be interpretable, allowing us to recover their semantics
– Future work: an experimental analysis of this model

Thanks!