Topic Models


Topic Models
John Unsworth, "Scholarly Primitives" (2000):
- Discovering
- Annotating
- Comparing
- Referring
- Sampling
- Illustrating
- Representing

Evidentiary Primitives:
- Digitizing (and OCR)
- Collecting (and cleaning: "scrubbing," "wrangling," "munging," etc.)
- Organizing (Clustering, Classifying, etc.) → topic modeling

Hermeneutic Primitives:
- Narrating
- Arguing
- Reframing (altering the context, etc.)

Topic Models
Idea of Topic Modeling (simplified)
Generally: an "unsupervised" method of creating a simplified representation of a body of materials.
Specifically: an unsupervised method of representing a corpus as a set of topics, with each document expressed as a distribution over those topics.
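Concretely, this representation is just two sets of distributions. A minimal sketch of the data structure, with all numbers invented for illustration:

```python
# Each document is a distribution over topics; each topic is a
# distribution over words. The probabilities in each mapping sum
# to 1 (only the largest entries are shown here).
# All numbers are invented for illustration.
doc_topics = {
    "document_1": {"topic_A": 0.70, "topic_B": 0.30},
    "document_2": {"topic_A": 0.10, "topic_B": 0.90},
}
topic_words = {
    "topic_A": {"broccoli": 0.30, "bananas": 0.15, "breakfast": 0.10},
    "topic_B": {"chinchillas": 0.20, "kittens": 0.20, "cute": 0.20},
}
```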

Topic Models
Goals of Topic Modeling (simplified)
- Clustering
- Hypothesis Forming
- Exploring
- Verifying/Proving?

Topic Models
Logical Process of Topic Modeling (simplified)
Edwin Chen, "Introduction to Latent Dirichlet Allocation" (2011):

Suppose you have the following set of sentences:
1. I like to eat broccoli and bananas.
2. I ate a banana and spinach smoothie for breakfast.
3. Chinchillas and kittens are cute.
4. My sister adopted a kitten yesterday.
5. Look at this cute hamster munching on a piece of broccoli.

What is latent Dirichlet allocation? It's a way of automatically discovering the topics that these sentences contain. For example, given these sentences and asked for 2 topics, LDA might produce something like:
- Sentences 1 and 2: 100% Topic A
- Sentences 3 and 4: 100% Topic B
- Sentence 5: 60% Topic A, 40% Topic B
- Topic A: 30% broccoli, 15% bananas, 10% breakfast, 10% munching, … (at which point, you could interpret Topic A to be about food)
- Topic B: 20% chinchillas, 20% kittens, 20% cute, 15% hamster, … (at which point, you could interpret Topic B to be about cute animals)

The question, of course, is: how does LDA perform this discovery?
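A minimal runnable version of Chen's thought experiment, using scikit-learn's LDA implementation (an assumption on my part: Chen's post does not prescribe a library, and with only five sentences the recovered topics will vary from run to run):

```python
# Minimal LDA run on Edwin Chen's five example sentences.
# Requires scikit-learn (pip install scikit-learn).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

sentences = [
    "I like to eat broccoli and bananas.",
    "I ate a banana and spinach smoothie for breakfast.",
    "Chinchillas and kittens are cute.",
    "My sister adopted a kitten yesterday.",
    "Look at this cute hamster munching on a piece of broccoli.",
]

# Bag-of-words counts; English stop words ("I", "and", ...) are dropped.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(sentences)

# Ask for 2 topics, as in Chen's walkthrough.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)  # one topic distribution per sentence

words = vectorizer.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = topic.argsort()[::-1][:4]  # indices of the topic's heaviest words
    print(f"Topic {k}:", ", ".join(words[i] for i in top))
for i, dist in enumerate(doc_topics):
    print(f"Sentence {i + 1}:", [round(p, 2) for p in dist])
```

The output pairs each sentence with a two-number topic mixture, analogous to Chen's "60% Topic A, 40% Topic B."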

Topic Models
Logical Process of Topic Modeling (simplified)
Ted Underwood, "Topic Modeling Made Just Simple Enough" (2012)

Topic Models
Logical Process of Topic Modeling (even more simplified)
- Treat a document or set of documents as a "bag of words."
- Use the Latent Dirichlet Allocation (LDA) algorithm to hypothesize the generation of the documents from sub-"bags" of words ("topics") that tend to collocate, by means of MALLET (the MAchine Learning for LanguagE Toolkit).
- Show each topic: for each topic, show the 10 or so words belonging to the topic that occur most frequently. Visualize in a word cloud (or by other means).
- Assume that a "topic" (sub-bag of words) is a "theme."

Example: Andrew Goldstone's interface (Dfr-Browser) for browsing topic models created from JSTOR journals; topic model of PMLA, 1889–2007.
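This pipeline runs MALLET from the command line. A minimal sketch of driving that workflow from Python, assuming MALLET is installed, its bin/mallet launcher is on the PATH, and a texts/ directory holds one plain-text file per document:

```python
# Run MALLET's two-step topic-modeling workflow from Python.
# Assumes the "mallet" launcher is on the PATH and that "texts/"
# contains one plain-text file per document.
import subprocess

# 1. Import the documents into MALLET's binary format, keeping
#    word order (--keep-sequence) and dropping English stop words.
subprocess.run(
    ["mallet", "import-dir",
     "--input", "texts",
     "--output", "corpus.mallet",
     "--keep-sequence",
     "--remove-stopwords"],
    check=True,
)

# 2. Train an LDA model and write the results to plain-text files.
subprocess.run(
    ["mallet", "train-topics",
     "--input", "corpus.mallet",
     "--num-topics", "20",
     "--output-topic-keys", "topic_keys.txt",   # top words per topic
     "--output-doc-topics", "doc_topics.txt"],  # topic mix per document
    check=True,
)
```

topic_keys.txt then lists the most prominent words in each topic, which is what the slide visualizes as word clouds; doc_topics.txt gives each document's topic proportions.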

Topic Models (plus other text analysis)

Topic Models
Hermeneutical Moves at a Low Level in the Topic Modeling Process (not covered in today's workshop)
- Scrubbing and creating a stop word list
- Chunking texts
- Predefining the number of topics to look for
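Two of these moves are easy to make concrete. A minimal sketch, where the added stop words and the 1,000-word chunk size are arbitrary illustrative choices rather than recommendations:

```python
# Two low-level preprocessing moves: extending a stop word list
# and chunking a long text into equal-sized pieces.
# The added words and the 1,000-word chunk size are arbitrary
# illustrative choices, not recommendations.
import re

BASE_STOPWORDS = {"the", "a", "an", "and", "of", "to", "in"}
# Corpus-specific "scrubbing": drop OCR junk or boilerplate terms.
stopwords = BASE_STOPWORDS | {"thee", "thou", "chapter"}

def chunk(text, size=1000):
    """Split a text into chunks of roughly `size` words each."""
    words = [w.lower() for w in re.findall(r"[a-zA-Z']+", text)
             if w.lower() not in stopwords]
    return [words[i:i + size] for i in range(0, len(words), size)]
```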

Topic Models
Matt Burton, "The Joy of Topic Modeling":
"… the brown squiggles along the bottom represent a vocabulary of words and the grey peaks represent individual word's probability density…. The list of top words, words that are "heavy" with more probabilistic mass, are the interesting group of words to examine because they are the co-occurring words in that topic distribution."
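Burton's "heavy" words can be made concrete: sort a topic's word distribution and total the probability mass carried by the few words at the top. A minimal sketch with an invented distribution:

```python
# How much probabilistic "mass" do a topic's top words carry?
# The distribution below is invented for illustration.
topic = {"river": 0.21, "water": 0.18, "boat": 0.09, "bank": 0.07,
         "fish": 0.05, "stone": 0.02, "cloud": 0.01}

top = sorted(topic.items(), key=lambda kv: kv[1], reverse=True)[:4]
mass = sum(p for _, p in top)
print("Top words:", [w for w, _ in top], f"(mass: {mass:.2f})")
```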

Topic Models – A Probabilistic Universe

Boris Tomashevsky's example of a narrative motif (theme) ("Thematics," 1925): "Raskolnikov kills the old woman"

Probabilistic rewriting: "There is a 74% chance that in this document Raskolnikov kills (82%) / wounds (15%) / ignores (3%) the old woman (68%) / young woman (23%) / other (9%)."