Semantic Representations with Probabilistic Topic Models

Presentation transcript:

Semantic Representations with Probabilistic Topic Models Mark Steyvers Department of Cognitive Sciences University of California, Irvine Joint work with: Tom Griffiths, UC Berkeley Padhraic Smyth, UC Irvine

Topic Models in Machine Learning Unsupervised extraction of content from large text collection Topics provide quick summary of content / gist What is in this corpus? What is in this document, paragraph, or sentence? What are similar documents to a query? What are the topical trends over time?

Topic Models in Psychology Topic models address three computational problems for semantic memory system: Gist extraction: what is this set of words about? Disambiguation: what is the sense of this word? - E.g. “football field” vs. “magnetic field” Prediction: what fact, concept, or word is next?

Two approaches to semantic representation: semantic networks and semantic spaces. [Figure: example network/space over words such as BAT, BALL, GAME, PLAY, FUN, MONEY, LOAN, CASH, BANK, RIVER, STREAM, THEATER, STAGE] Semantic networks: spreading activation, i.e., retrieval based on activating the neighbors of presented words; how are these networks learned? Semantic spaces: retrieval based on distance to the presented words in a semantic space; such spaces can be learned from large text corpora (e.g., Latent Semantic Analysis), but is this representation flexible enough?

Overview. I. Probabilistic topic models: generative model; statistical inference with Gibbs sampling. II. Explaining human memory: word association; semantic isolation; false memory. III. Information retrieval.

Probabilistic Topic Models. Extract topics from large text collections: unsupervised, generative, Bayesian statistical inference. Our modeling work is based on the pLSI model (Hofmann, 1999), the LDA model (Blei, Ng, & Jordan, 2001, 2003), and the Topics model (Griffiths & Steyvers, 2003, 2004).

Model input: "bag of words". A matrix of the number of times each word occurs in each document; note that some function words ("the", "a", "and", etc.) are deleted. [Table: example word-document count matrix with rows for words such as MONEY, BANK, STREAM, RIVER and columns for Doc1, Doc2, Doc3, ...] The matrix characterizes a simple verbal environment.

Probabilistic Topic Models. A topic represents a probability distribution over words; related words get high probability in the same topic. Example topics extracted from NIH/NSF grants (most likely words listed at the top). Topics can be combined to form, for example: (1) a grant on brain imaging and schizophrenia; (2) a grant on memory performance in Alzheimer patients.

Document = mixture of topics. [Figure: one example document is an 80% / 20% mixture of two topics; another document is 100% a single topic]

Generative Process. For each document, choose a mixture of topics: θ ~ Dirichlet(α). For each word position, sample a topic z in {1..T} from the mixture: z ~ Multinomial(θ). Sample the word from that topic's word distribution: w ~ Multinomial(φ(z)), where each topic's word distribution has prior φ ~ Dirichlet(β). [Plate diagram: Nd word positions per document, D documents, T topics]
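To make the generative story concrete, here is a minimal sketch of this process in Python/NumPy; the corpus size, vocabulary, and hyperparameter values below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

T, W = 2, 5          # number of topics, vocabulary size (toy values)
D, N_d = 4, 10       # number of documents, words per document
alpha, beta = 0.5, 0.5

# Topic-word distributions: phi[t] is a distribution over the W words
phi = rng.dirichlet(np.full(W, beta), size=T)

docs = []
for d in range(D):
    theta = rng.dirichlet(np.full(T, alpha))             # topic mixture for document d
    z = rng.choice(T, size=N_d, p=theta)                  # topic assignment for each word
    w = np.array([rng.choice(W, p=phi[t]) for t in z])    # word sampled from its topic
    docs.append(w)

print(docs[0])  # word indices of the first generated document
```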

Prior Distributions. Dirichlet priors encourage sparsity on topic mixtures and on topics: θ ~ Dirichlet(α) and φ ~ Dirichlet(β). [Figure: probability simplexes over (Topic 1, Topic 2, Topic 3) and (Word 1, Word 2, Word 3); darker colors indicate lower probability]
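A quick way to see the sparsity effect of the concentration parameter (a sketch; the values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)

# Small concentration: most probability mass lands on a few components (sparse mixtures)
print(rng.dirichlet([0.1, 0.1, 0.1]))
# Large concentration: draws are close to uniform mixtures
print(rng.dirichlet([10.0, 10.0, 10.0]))
```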

Creating an Artificial Dataset: two topics, 16 documents. [Figure: the generated word-document counts] Can we recover the original topics and topic mixtures from this data?

Statistical Inference. Three sets of latent variables: topic mixtures θ, word mixtures φ, and topic assignments z. Estimate the posterior distribution over topic assignments, P(z | w); we can later infer θ and φ from the assignments.

Statistical Inference. Exact inference is intractable: computing P(z | w) requires summing over T^n possible topic assignments for n word tokens. Use approximate methods: Markov chain Monte Carlo (MCMC) with Gibbs sampling.

Gibbs Sampling. The probability that word token i is assigned to topic t, given all other assignments:

P(z_i = t | z_-i, w) ∝ (C_wt + β) / (Σ_w' C_w't + Wβ) × (C_dt + α) / (Σ_t' C_dt' + Tα)

where C_wt is the count of word w assigned to topic t, C_dt is the count of topic t assigned to document d, and both counts exclude the current token i.
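A minimal sketch of this update in Python (not the authors' MATLAB toolbox); the variable names and data layout are made up for illustration.

```python
import numpy as np

def gibbs_sweep(words, doc_ids, z, C_wt, C_dt, alpha, beta, rng):
    """One Gibbs sweep over all word tokens.

    words, doc_ids, z : arrays giving word index, document index, and current
                        topic assignment for every token
    C_wt : word-by-topic count matrix,  C_dt : document-by-topic count matrix
    """
    W, T = C_wt.shape
    for i in range(len(words)):
        w, d, t_old = words[i], doc_ids[i], z[i]
        # Remove token i from the counts
        C_wt[w, t_old] -= 1
        C_dt[d, t_old] -= 1
        # Sampling distribution over topics for token i (doc-side denominator is
        # constant across topics, so it is dropped and we renormalize)
        p = (C_wt[w, :] + beta) / (C_wt.sum(axis=0) + W * beta) \
            * (C_dt[d, :] + alpha)
        t_new = rng.choice(T, p=p / p.sum())
        # Add token i back under its new assignment
        z[i] = t_new
        C_wt[w, t_new] += 1
        C_dt[d, t_new] += 1
    return z, C_wt, C_dt
```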

Example of Gibbs Sampling Assign word tokens randomly to topics: (●=topic 1; ○=topic 2 )

After 1 iteration Apply sampling equation to each word token: (●=topic 1; ○=topic 2 )

After 4 iterations (●=topic 1; ○=topic 2 )

After 8 iterations (●=topic 1; ○=topic 2 )

After 32 iterations  (●=topic 1; ○=topic 2 )

Algorithm input/output. INPUT: word-document counts (word order is irrelevant). OUTPUT: topic assignments for each word token, P(z_i); likely words in each topic, P(w | z); likely topics in each document (its "gist"), P(θ | d).
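Given the final counts from the sampler, point estimates of these distributions can be read off directly; a sketch continuing the hypothetical count matrices from the Gibbs sweep above:

```python
import numpy as np

def point_estimates(C_wt, C_dt, alpha, beta):
    """Estimate phi (words per topic) and theta (topics per document) from Gibbs counts."""
    W, T = C_wt.shape
    phi = (C_wt + beta) / (C_wt.sum(axis=0, keepdims=True) + W * beta)      # P(w | z)
    theta = (C_dt + alpha) / (C_dt.sum(axis=1, keepdims=True) + T * alpha)  # P(z | d)
    return phi, theta
```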

Software Public-domain MATLAB toolbox for topic modeling on the Web: http://psiexp.ss.uci.edu/research/programs_data/toolbox.htm

Examples Topics from New York Times Terrorism Wall Street Firms Stock Market Bankruptcy SEPT_11 WAR SECURITY IRAQ TERRORISM NATION KILLED AFGHANISTAN ATTACKS OSAMA_BIN_LADEN AMERICAN ATTACK NEW_YORK_REGION NEW MILITARY NEW_YORK WORLD NATIONAL QAEDA TERRORIST_ATTACKS WALL_STREET ANALYSTS INVESTORS FIRM GOLDMAN_SACHS FIRMS INVESTMENT MERRILL_LYNCH COMPANIES SECURITIES RESEARCH STOCK BUSINESS ANALYST WALL_STREET_FIRMS SALOMON_SMITH_BARNEY CLIENTS INVESTMENT_BANKING INVESTMENT_BANKERS INVESTMENT_BANKS WEEK DOW_JONES POINTS 10_YR_TREASURY_YIELD PERCENT CLOSE NASDAQ_COMPOSITE STANDARD_POOR CHANGE FRIDAY DOW_INDUSTRIALS GRAPH_TRACKS EXPECTED BILLION NASDAQ_COMPOSITE_INDEX EST_02 PHOTO_YESTERDAY YEN 10 500_STOCK_INDEX BANKRUPTCY CREDITORS BANKRUPTCY_PROTECTION ASSETS COMPANY FILED BANKRUPTCY_FILING ENRON BANKRUPTCY_COURT KMART CHAPTER_11 FILING COOPER BILLIONS COMPANIES BANKRUPTCY_PROCEEDINGS DEBTS RESTRUCTURING CASE GROUP

Example topics from an educational corpus PRINTING PAPER PRINT PRINTED TYPE PROCESS INK PRESS IMAGE PLAY PLAYS STAGE AUDIENCE THEATER ACTORS DRAMA SHAKESPEARE ACTOR TEAM GAME BASKETBALL PLAYERS PLAYER PLAY PLAYING SOCCER PLAYED JUDGE TRIAL COURT CASE JURY ACCUSED GUILTY DEFENDANT JUSTICE HYPOTHESIS EXPERIMENT SCIENTIFIC OBSERVATIONS SCIENTISTS EXPERIMENTS SCIENTIST EXPERIMENTAL TEST STUDY TEST STUDYING HOMEWORK NEED CLASS MATH TRY TEACHER Example topics from Psych Review abstracts SIMILARITY CATEGORY CATEGORIES RELATIONS DIMENSIONS FEATURES STRUCTURE SIMILAR REPRESENTATION OBJECTS STIMULUS CONDITIONING LEARNING RESPONSE STIMULI RESPONSES AVOIDANCE REINFORCEMENT CLASSICAL DISCRIMINATION MEMORY RETRIEVAL RECALL ITEMS INFORMATION TERM RECOGNITION LIST ASSOCIATIVE GROUP INDIVIDUAL GROUPS OUTCOMES INDIVIDUALS DIFFERENCES INTERACTION EMOTIONAL EMOTION BASIC EMOTIONS AFFECT STATES EXPERIENCES AFFECTIVE AFFECTS RESEARCH

Choosing the number of topics. Bayesian model selection. Generalization tests, e.g., perplexity on out-of-sample data. Non-parametric Bayesian approaches, in which the number of topics grows with the size of the data, e.g., Hierarchical Dirichlet Processes (HDP).
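As an illustration of the generalization test, a sketch of how held-out perplexity might be computed from the fitted distributions (variable names hypothetical):

```python
import numpy as np

def perplexity(heldout_docs, phi, theta):
    """Perplexity of held-out documents.

    heldout_docs : list of arrays of word indices
    phi   : topic-word probabilities, shape (W, T)
    theta : document-topic probabilities for the held-out docs, shape (D, T)
    """
    log_lik, n_tokens = 0.0, 0
    for d, words in enumerate(heldout_docs):
        p_w = phi[words, :] @ theta[d]      # P(w | d) = sum_z P(w | z) P(z | d)
        log_lik += np.log(p_w).sum()
        n_tokens += len(words)
    return np.exp(-log_lik / n_tokens)      # lower perplexity = better generalization
```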

Applications to Human Memory

Computational Problems for Semantic Memory System Gist extraction What is this set of words about? Disambiguation What is the sense of this word? Prediction what fact, concept, or word is next?

Disambiguation. P(z_FIELD | w): the topic posterior for the word "FIELD" changes with its context. [Figure: two candidate topics, one containing MAGNETIC, MAGNET, WIRE, NEEDLE, CURRENT, COIL, POLES and one containing BALL, GAME, TEAM, FOOTBALL, BASEBALL, PLAYERS, PLAY; "FIELD" alone is ambiguous between them, while "FOOTBALL FIELD" favors the sports topic]
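A sketch of how such a topic posterior can be computed with Bayes' rule from fitted topic-word probabilities (toy numbers, hypothetical names):

```python
import numpy as np

def topic_posterior(word_ids, phi, prior):
    """P(z | observed words), assuming all observed words share a single topic."""
    # Unnormalized log posterior: log P(z) + sum_i log P(w_i | z)
    log_post = np.log(prior) + np.log(phi[word_ids, :]).sum(axis=0)
    post = np.exp(log_post - log_post.max())
    return post / post.sum()

# Hypothetical two-topic example: columns = [magnetic topic, sports topic]
vocab = {"FIELD": 0, "FOOTBALL": 1, "MAGNET": 2}
phi = np.array([[0.10, 0.10],    # FIELD appears in both topics
                [0.01, 0.20],    # FOOTBALL mostly in the sports topic
                [0.20, 0.01]])   # MAGNET mostly in the magnetic topic
prior = np.array([0.5, 0.5])

print(topic_posterior([vocab["FIELD"]], phi, prior))                     # ambiguous
print(topic_posterior([vocab["FOOTBALL"], vocab["FIELD"]], phi, prior))  # favors sports topic
```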

Modeling Word Association

Word Association (norms from Nelson et al. 1998) CUE: PLANET

Word Association (norms from Nelson et al., 1998). CUE: PLANET. People's associates, ranked 1-8: EARTH, STARS, SPACE, SUN, MARS, UNIVERSE, SATURN, GALAXY (vocabulary = 5000+ words).

Word Association as a Prediction Problem. Given that a single word is observed, predict what other words might occur in that context. Under a single topic assumption: P(w_response | w_cue) = Σ_z P(w_response | z) P(z | w_cue).
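A sketch of how the model's ranked associates could be computed from this equation (hypothetical variable names):

```python
import numpy as np

def associates(cue_id, phi, topic_prior, top_k=8):
    """Rank predicted associates for a cue word under the single-topic assumption.

    phi : topic-word probabilities, shape (W, T)
    """
    # P(z | cue) by Bayes' rule, then P(w_response | cue) = sum_z P(w | z) P(z | cue)
    p_z_given_cue = phi[cue_id] * topic_prior
    p_z_given_cue /= p_z_given_cue.sum()
    p_response = phi @ p_z_given_cue
    p_response[cue_id] = 0.0                  # exclude the cue itself
    return np.argsort(-p_response)[:top_k]    # word ids of the top-ranked associates
```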

Word Association (norms from Nelson et al., 1998). CUE: PLANET. Associates ranked 1-8 by people: EARTH, STARS, SPACE, SUN, MARS, UNIVERSE, SATURN, GALAXY. Ranked 1-8 by the model: STARS, STAR, SUN, EARTH, SPACE, SKY, PLANET, UNIVERSE. The first associate "EARTH" has rank 4 in the model.

Median rank of first associate. [Figure: median rank of the first associate for the topics model and for LSA]

Episodic Memory Semantic Isolation Effects False Memory

Semantic Isolation Effect. Study this list: PEAS, CARROTS, BEANS, SPINACH, LETTUCE, HAMMER, TOMATOES, CORN, CABBAGE, SQUASH. Typical recall: HAMMER, PEAS, CARROTS, ...

Semantic isolation effect / von Restorff effect. Finding: contextually unique words are better remembered. Verbal explanations: attention, surprise, distinctiveness. Our approach: assume memories can be encoded and accessed at multiple levels of description, with semantic/gist aspects carrying generic information and verbatim aspects carrying specific information (cf. John Anderson on finding relevant memories and Sue Dumais on finding relevant documents).

Computational Problem: how to trade off specificity and generality, remembering both detail and gist? Dual-route topic model = topic model + encoding of specific words.

Dual-route topic model. Two ways to generate words: the topic model, or a verbatim word distribution unique to the document. Each word comes from a single route, selected by a switch variable x_i for every word i: x_i = 0 → topics route, x_i = 1 → verbatim route. Conditional probability of a word under a document: P(w | d) = P(x = 0 | d) Σ_z P(w | z) P(z | d) + P(x = 1 | d) P(w | verbatim_d).
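A sketch of that conditional probability with both routes written out explicitly (all variable names hypothetical):

```python
import numpy as np

def p_word_given_doc(w, d, p_switch, phi, theta, verbatim):
    """P(w | d) under the dual-route model.

    p_switch[d] : probability that a word in doc d uses the verbatim route (x = 1)
    phi         : topic-word probabilities, shape (W, T)
    theta       : document-topic probabilities, shape (D, T)
    verbatim    : document-specific word probabilities, shape (D, W)
    """
    p_topic_route = phi[w] @ theta[d]       # sum_z P(w | z) P(z | d)
    p_verbatim_route = verbatim[d, w]       # P(w | verbatim distribution of doc d)
    return (1 - p_switch[d]) * p_topic_route + p_switch[d] * p_verbatim_route
```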

Graphical Model. Variable x is a switch: x = 0 → sample from a topic; x = 1 → sample from the verbatim word distribution.

Applying Dual Route Topic Model to Human Memory Train model on educational corpus (TASA) 37K documents, 1700 topics Apply model to list memory experiments Study list is a “document” Recall probability based on model

[Figure: encoding and retrieval of the study words PEAS, CARROTS, BEANS, SPINACH, LETTUCE, HAMMER, TOMATOES, CORN, CABBAGE, SQUASH, with the contextually unique word encoded on the verbatim ("special" words) route]

Hunt & Lamb (2001 exp. 1) OUTLIER LIST PEAS CARROTS BEANS SPINACH LETTUCE HAMMER TOMATOES CORN CABBAGE SQUASH CONTROL LIST SAW SCREW CHISEL DRILL SANDPAPER HAMMER NAILS BENCH RULER ANVIL

False Memory (e.g. Deese, 1959; Roediger & McDermott). Study this list: Bed, Rest, Awake, Tired, Dream, Wake, Snooze, Blanket, Doze, Slumber, Snore, Nap, Peace, Yawn, Drowsy. Typical recall: SLEEP, BED, REST, ... (SLEEP, the critical lure, was never presented).

False memory effects (Robinson & Roediger, 1997); lure = ANGER. Study lists varying the number of the lure's associates. 3 associates: MAD, FEAR, HATE, SMOOTH, NAVY, HEAT, SALAD, TUNE, COURTS, CANDY, PALACE, PLUSH, TOOTH, BLIND, WINTER. 6 associates: MAD, FEAR, HATE, RAGE, TEMPER, FURY, SALAD, TUNE, COURTS, CANDY, PALACE, PLUSH, TOOTH, BLIND, WINTER. 9 associates: MAD, FEAR, HATE, RAGE, TEMPER, FURY, WRATH, HAPPY, FIGHT, CANDY, PALACE, PLUSH, TOOTH, BLIND, WINTER.

Modeling Serial Order Effects in Free Recall

Problem. The dual-route model predicts no sequential effects, but the order of words matters in human memory experiments. The standard Gibbs sampler is also psychologically implausible: it assumes the list is processed in parallel, so each item can influence the encoding of every other item.

Semantic isolation experiment to study order effects. Study lists 14 words long: 14 isolate lists (e.g. A A A B A A ... A A) and 14 control lists (e.g. A A A A A A ... A A), varying the serial position of the isolate (any of the 14 positions).

Immediate Recall Results Control list: A A A A A ... A Isolate list: B A A A A ... A

Immediate Recall Results Control list: A A A A A ... A Isolate list: A B A A A ... A

Immediate Recall Results Control list: A A A A A ... A Isolate list: A A B A A ... A

Immediate Recall Results

Modified Gibbs Sampling Scheme. Update items non-uniformly in the Gibbs sampler: the probability of updating item i after observing words 1..t decreases with how far item i lies back in the list, governed by a decay parameter, so words further back in time are less likely to be re-assigned.
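The exact form of the update probability is not reproduced here; purely as an illustration, a sketch assuming an exponential decay with distance from the current position (the decay form and parameter are assumptions):

```python
import numpy as np

def sample_item_to_update(t, decay, rng):
    """Pick which of items 1..t to re-sample, favoring recently studied items.

    decay : assumed decay rate; larger values concentrate updates on recent items.
    """
    lags = t - np.arange(1, t + 1)        # lag 0 for the newest item, t-1 for the oldest
    p = np.exp(-decay * lags)
    p /= p.sum()
    return rng.choice(np.arange(1, t + 1), p=p)
```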

Effect of Sampling Scheme. [Figure: recall as a function of study order under different sampling schemes; note the primacy effect and the cross-over.] The last sampling scheme corresponds to a particle filter with one particle; this captures the qualitative effects in the data *except* for recency effects.

Normalized Serial Position Effects. [Figure: data vs. model.] Delayed recall model: decay the counts in the special-words route.

Information Retrieval & Human Memory

Example Searching for information on Padhraic Smyth:

Query = “Smyth”

Query = “Smyth irish computer science department”

Query = “Smyth irish computer science department weather prediction seasonal climate fluctuations hmm models nips conference consultant yahoo netflix prize dave newman steyvers”

Problem: more information in a query can lead to worse search results, whereas human memory typically works better with more cues. How can we better match queries to documents, allowing partial matches and matches across documents?

Dual-route model for information retrieval. Encode documents with two routes: contextually unique words → verbatim route; thematic words → topics route.

Example encoding of a Psych Review abstract. Contextually unique words: ALCOVE, SCHAFFER, MEDIN, NOSOFSKY. Kruschke, J. K. ALCOVE: An exemplar-based connectionist model of category learning. Psychological Review, 99, 22-44. Topic 1 (p=0.21): learning phenomena acquisition learn acquired ... Topic 22 (p=0.17): similarity objects object space category dimensional categories spatial. Topic 61 (p=0.08): representations representation order alternative 1st higher 2nd descriptions problem form.

Retrieval Experiments. For each candidate document, calculate how likely the query is to have been "generated" from the model's encoding of that document.
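A sketch of this query-likelihood scoring under the dual-route encoding (hypothetical variable names):

```python
import numpy as np

def score_document(query_word_ids, d, p_switch, phi, theta, verbatim):
    """Log-likelihood of the query words under document d's dual-route encoding."""
    log_score = 0.0
    for w in query_word_ids:
        p_topic = phi[w] @ theta[d]        # topics route: sum_z P(w | z) P(z | d)
        p_verb = verbatim[d, w]            # verbatim route: doc-specific word probability
        log_score += np.log((1 - p_switch[d]) * p_topic + p_switch[d] * p_verb)
    return log_score

# Documents are then ranked by this score for a given query, e.g.:
# ranking = sorted(range(n_docs), key=lambda d: -score_document(query, d, sw, phi, th, vb))
```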

Information Retrieval Results. Evaluation metric: precision for the 10 highest-ranked documents.

APs:
Method   Title   Desc   Concepts
TFIDF    .406    .434   .549
LSI      .455    .469   .523
LDA      .478    .463   .556
SW       .488    .468   .561
SWB      .495    .473   .558

FRs:
Method   Title   Desc   Concepts
TFIDF    .300    .287   .483
LSI      .366    .327   .487
LDA      .428    .340
SW       .448    .407   .560
SWB      .459    .400

Information retrieval systems in the mind & web Similar computational demands: Both retrieve the most relevant items from a large information repository in response to external cues or queries. Useful analogies/ interdisciplinary approaches Many cognitive aspects in information retrieval Internet content is produced by humans Queries are formulated by humans

Recent Papers Steyvers, M., Griffiths, T.L., & Dennis, S. (2006). Probabilistic inference in human semantic memory. Trends in Cognitive Sciences, 10(7), 327-334. Griffiths, T.L., Steyvers, M., & Tenenbaum, J.B. (2007). Topics in Semantic Representation. Psychological Review, 114(2), 211-244. Griffiths, T.L., Steyvers, M., & Firl, A. (in press). Google and the mind: Predicting fluency with PageRank. Psychological Science. Steyvers, M. & Griffiths, T.L. (in press). Rational Analysis as a Link between Human Memory and Information Retrieval. In N. Chater and M. Oaksford (Eds.), The Probabilistic Mind: Prospects from Rational Models of Cognition. Oxford University Press. Chemudugunta, C., Smyth, P., & Steyvers, M. (2007, in press). Modeling General and Specific Aspects of Documents with a Probabilistic Topic Model. In: Advances in Neural Information Processing Systems, 19.

Text Mining Applications

Topics provide a quick summary of content: Who writes on what topics? What is in this corpus? What is in this document? What are the topical trends over time? Who is mentioned in what context?

Faculty Browser. The system spiders UCI/UCSD faculty websites related to CalIT2 (California Institute for Telecommunications and Information Technology) and applies the topic model to text extracted from PDF files. Browser demo: http://yarra.calit2.uci.edu/calit2/

[Screenshot: one topic, with the most prolific researchers for that topic]

[Screenshot: one researcher, the topics this researcher works on, and other researchers with similar topical interests]

Inferred network of researchers connected through topics

Analyzing the New York Times 330,000 articles 2000-2002

Extracted Named Entities Three investigations began Thursday into the securities and exchange_commission's choice of william_webster to head a new board overseeing the accounting profession. house and senate_democrats called for the resignations of both judge_webster and harvey_pitt, the commission's chairman. The white_house expressed support for judge_webster as well as for harvey_pitt, who was harshly criticized Thursday for failing to inform other commissioners before they approved the choice of judge_webster that he had led the audit committee of a company facing fraud accusations. “The president still has confidence in harvey_pitt,” said dan_bartlett, bush's communications director … Used standard algorithms to extract named entities: People Places Organizations

Standard Topic Model with Entities

Topic Trends (3 topic trends out of 400): Tour-de-France, Quarterly Earnings, Anthrax. [Figure: for each topic, the proportion of words assigned to that topic in each time slice]

Example of Extracted Entity-Topic Network

Prediction of Missing Entities in Text. Test article with entities removed: Shares of XXXX slid 8 percent, or $1.10, to $12.65 Tuesday, as major credit agencies said the conglomerate would still be challenged in repaying its debts, despite raising $4.6 billion Monday in taking its finance group public. Analysts at XXXX Investors service in XXXX said they were keeping XXXX and its subsidiaries under review for a possible debt downgrade, saying the company “will continue to face a significant debt burden,'' with large slices of debt coming due, over the next 18 months. XXXX said …

Actual missing entities: fitch, goldman-sachs, lehman-brother, moody, morgan-stanley, new-york-stock-exchange, standard-and-poor, tyco, tyco-international, wall-street, worldco

Predicted entities given the observed words: wall-street, new-york, nasdaq, securities-exchange-commission, sec, merrill-lynch, new-york-stock-exchange, goldman-sachs, standard-and-poor (matches with the actual entities: wall-street, new-york-stock-exchange, goldman-sachs, standard-and-poor)

Model Extensions

Model Extensions HMM-topics model Modeling aspects of syntax Hierarchical topic model Modeling relations between topics Collocation topic models Learning collocations of words within topics

Hidden Markov Topic Model

Hidden Markov Topics Model (Griffiths, Steyvers, Blei, & Tenenbaum, 2004). Syntactic dependencies → short-range dependencies; semantic dependencies → long-range dependencies. [Graphical model: a semantic state generates words w1..w4 from the topic model via topic assignments z1..z4; syntactic states s1..s4 generate words from an HMM]

Transition between the semantic state and syntactic states / Combining topics and syntax. [Figure: a state-transition diagram with two syntactic word classes, one emitting THE (0.6), A (0.3), MANY (0.1) and one emitting OF (0.6), FOR (0.3), BETWEEN (0.1), plus a semantic state that emits words from topic z = 1 (HEART, LOVE, SOUL, TEARS, JOY, each 0.2) or topic z = 2 (SCIENTIFIC, KNOWLEDGE, WORK, RESEARCH, MATHEMATICS, each 0.2).] Walking through the states generates a sentence one word at a time: THE ... THE LOVE ... THE LOVE OF ... THE LOVE OF RESEARCH ...
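A minimal sketch of this composite generative walk (the transition matrix and emission probabilities below are toy values, not those from the figure):

```python
import numpy as np

rng = np.random.default_rng(2)

# States: 0 = semantic (topic) state, 1 and 2 = syntactic word classes
trans = np.array([[0.2, 0.3, 0.5],
                  [0.9, 0.0, 0.1],
                  [0.7, 0.2, 0.1]])          # toy HMM transition matrix
class_words = {1: (["THE", "A", "MANY"], [0.6, 0.3, 0.1]),
               2: (["OF", "FOR", "BETWEEN"], [0.6, 0.3, 0.1])}
topics = [["HEART", "LOVE", "SOUL", "TEARS", "JOY"],
          ["SCIENTIFIC", "KNOWLEDGE", "WORK", "RESEARCH", "MATHEMATICS"]]
theta = [0.5, 0.5]                            # the document's topic mixture

s, sentence = 1, []                           # start in a syntactic state
for _ in range(6):
    if s == 0:                                # semantic state: emit a word from a topic
        z = rng.choice(len(topics), p=theta)
        sentence.append(rng.choice(topics[z]))
    else:                                     # syntactic state: emit a word from its class
        words, probs = class_words[s]
        sentence.append(rng.choice(words, p=probs))
    s = rng.choice(3, p=trans[s])             # move to the next state
print(" ".join(sentence))
```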

Semantic topics FOOD FOODS BODY NUTRIENTS DIET FAT SUGAR ENERGY MILK EATING FRUITS VEGETABLES WEIGHT FATS NEEDS CARBOHYDRATES VITAMINS CALORIES PROTEIN MINERALS MAP NORTH EARTH SOUTH POLE MAPS EQUATOR WEST LINES EAST AUSTRALIA GLOBE POLES HEMISPHERE LATITUDE PLACES LAND WORLD COMPASS CONTINENTS DOCTOR PATIENT HEALTH HOSPITAL MEDICAL CARE PATIENTS NURSE DOCTORS MEDICINE NURSING TREATMENT NURSES PHYSICIAN HOSPITALS DR SICK ASSISTANT EMERGENCY PRACTICE BOOK BOOKS READING INFORMATION LIBRARY REPORT PAGE TITLE SUBJECT PAGES GUIDE WORDS MATERIAL ARTICLE ARTICLES WORD FACTS AUTHOR REFERENCE NOTE GOLD IRON SILVER COPPER METAL METALS STEEL CLAY LEAD ADAM ORE ALUMINUM MINERAL MINE STONE MINERALS POT MINING MINERS TIN BEHAVIOR SELF INDIVIDUAL PERSONALITY RESPONSE SOCIAL EMOTIONAL LEARNING FEELINGS PSYCHOLOGISTS INDIVIDUALS PSYCHOLOGICAL EXPERIENCES ENVIRONMENT HUMAN RESPONSES BEHAVIORS ATTITUDES PSYCHOLOGY PERSON CELLS CELL ORGANISMS ALGAE BACTERIA MICROSCOPE MEMBRANE ORGANISM FOOD LIVING FUNGI MOLD MATERIALS NUCLEUS CELLED STRUCTURES MATERIAL STRUCTURE GREEN MOLDS PLANTS PLANT LEAVES SEEDS SOIL ROOTS FLOWERS WATER FOOD GREEN SEED STEMS FLOWER STEM LEAF ANIMALS ROOT POLLEN GROWING GROW

Syntactic classes BE MAKE GET HAVE GO TAKE DO FIND USE SEE HELP KEEP GIVE LOOK COME WORK MOVE LIVE EAT BECOME SAID ASKED THOUGHT TOLD SAYS MEANS CALLED CRIED SHOWS ANSWERED TELLS REPLIED SHOUTED EXPLAINED LAUGHED MEANT WROTE SHOWED BELIEVED WHISPERED THE HIS THEIR YOUR HER ITS MY OUR THIS THESE A AN THAT NEW THOSE EACH MR ANY MRS ALL MORE SUCH LESS MUCH KNOWN JUST BETTER RATHER GREATER HIGHER LARGER LONGER FASTER EXACTLY SMALLER SOMETHING BIGGER FEWER LOWER ALMOST ON AT INTO FROM WITH THROUGH OVER AROUND AGAINST ACROSS UPON TOWARD UNDER ALONG NEAR BEHIND OFF ABOVE DOWN BEFORE GOOD SMALL NEW IMPORTANT GREAT LITTLE LARGE * BIG LONG HIGH DIFFERENT SPECIAL OLD STRONG YOUNG COMMON WHITE SINGLE CERTAIN ONE SOME MANY TWO EACH ALL MOST ANY THREE THIS EVERY SEVERAL FOUR FIVE BOTH TEN SIX MUCH TWENTY EIGHT HE YOU THEY I SHE WE IT PEOPLE EVERYONE OTHERS SCIENTISTS SOMEONE WHO NOBODY ONE SOMETHING ANYONE EVERYBODY SOME THEN

NIPS Semantics: IMAGE IMAGES OBJECT OBJECTS FEATURE RECOGNITION VIEWS # PIXEL VISUAL DATA GAUSSIAN MIXTURE LIKELIHOOD POSTERIOR PRIOR DISTRIBUTION EM BAYESIAN PARAMETERS STATE POLICY VALUE FUNCTION ACTION REINFORCEMENT LEARNING CLASSES OPTIMAL * MEMBRANE SYNAPTIC CELL * CURRENT DENDRITIC POTENTIAL NEURON CONDUCTANCE CHANNELS EXPERTS EXPERT GATING HME ARCHITECTURE MIXTURE LEARNING MIXTURES FUNCTION GATE KERNEL SUPPORT VECTOR SVM KERNELS # SPACE FUNCTION MACHINES SET NETWORK NEURAL NETWORKS OUTPUT INPUT TRAINING INPUTS WEIGHTS # OUTPUTS NIPS Syntax: IN WITH FOR ON FROM AT USING INTO OVER WITHIN IS WAS HAS BECOMES DENOTES BEING REMAINS REPRESENTS EXISTS SEEMS SEE SHOW NOTE CONSIDER ASSUME PRESENT NEED PROPOSE DESCRIBE SUGGEST USED TRAINED OBTAINED DESCRIBED GIVEN FOUND PRESENTED DEFINED GENERATED SHOWN MODEL ALGORITHM SYSTEM CASE PROBLEM NETWORK METHOD APPROACH PAPER PROCESS HOWEVER ALSO THEN THUS THEREFORE FIRST HERE NOW HENCE FINALLY # * I X T N - C F P

Random sentence generation LANGUAGE: [S] RESEARCHERS GIVE THE SPEECH [S] THE SOUND FEEL NO LISTENERS [S] WHICH WAS TO BE MEANING [S] HER VOCABULARIES STOPPED WORDS [S] HE EXPRESSLY WANTED THAT BETTER VOWEL

Nested Chinese Restaurant Process

Topic Hierarchies. In the regular topic model, there are no relations between topics. The Nested Chinese Restaurant Process (Blei, Griffiths, Jordan, & Tenenbaum, 2004) learns a hierarchical structure over topics, as well as the topics within that structure. [Figure: a flat set of topics vs. a tree of topics]

Example: Psych Review Abstracts THE OF AND TO IN A IS A MODEL MEMORY FOR MODELS TASK INFORMATION RESULTS ACCOUNT SELF SOCIAL PSYCHOLOGY RESEARCH RISK STRATEGIES INTERPERSONAL PERSONALITY SAMPLING MOTION VISUAL SURFACE BINOCULAR RIVALRY CONTOUR DIRECTION CONTOURS SURFACES DRUG FOOD BRAIN AROUSAL ACTIVATION AFFECTIVE HUNGER EXTINCTION PAIN RESPONSE STIMULUS REINFORCEMENT RECOGNITION STIMULI RECALL CHOICE CONDITIONING SPEECH READING WORDS MOVEMENT MOTOR VISUAL WORD SEMANTIC ACTION SOCIAL SELF EXPERIENCE EMOTION GOALS EMOTIONAL THINKING GROUP IQ INTELLIGENCE SOCIAL RATIONAL INDIVIDUAL GROUPS MEMBERS SEX EMOTIONS GENDER EMOTION STRESS WOMEN HEALTH HANDEDNESS REASONING ATTITUDE CONSISTENCY SITUATIONAL INFERENCE JUDGMENT PROBABILITIES STATISTICAL IMAGE COLOR MONOCULAR LIGHTNESS GIBSON SUBMOVEMENT ORIENTATION HOLOGRAPHIC CONDITIONING STRESS EMOTIONAL BEHAVIORAL FEAR STIMULATION TOLERANCE RESPONSES

Generative Process. To generate a document, choose a path from the root of the hierarchy to a leaf, and draw each word from one of the topics along that path (using the same Psych Review topic hierarchy shown above).

Collocation Topic Model

What about collocations? Why are these words related: PLAY - GROUND, DOW - JONES, BUMBLE - BEE? This suggests at least two routes for association: semantic and collocation → integrate collocations into the topic model.

Collocation Topic Model. For each word, a switch variable x selects the route: if x = 0, sample the word from its topic; if x = 1, sample the word from a distribution conditioned on the previous word. [Graphical model: topic mixture → topic → word at each position, with a switch variable x between consecutive words]

Collocation Topic Model. Example: "DOW JONES RISES". JONES is more likely explained as a word following DOW (x = 1) than as a word sampled from a topic, while RISES is sampled from a topic (x = 0). Result: DOW_JONES is recognized as a collocation.
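A sketch of the comparison the model makes for each word (the probabilities are toy values; in the actual model the switch is sampled within the Gibbs sampler):

```python
import numpy as np

def route_posterior(p_word_given_topic_route, p_word_given_prev_word, p_switch):
    """Posterior probability that a word came from the collocation route (x = 1)."""
    p_x1 = p_switch * p_word_given_prev_word
    p_x0 = (1 - p_switch) * p_word_given_topic_route
    return p_x1 / (p_x0 + p_x1)

# Toy example for "DOW JONES": JONES is rare under any topic but very likely after DOW
print(route_posterior(p_word_given_topic_route=1e-4,
                      p_word_given_prev_word=0.4,
                      p_switch=0.2))   # close to 1, so JONES is treated as part of DOW_JONES
```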

Examples: topics from the New York Times (the same Terrorism, Wall Street Firms, Stock Market, and Bankruptcy topics shown earlier). With the collocation route, multi-word terms such as SEPT_11, OSAMA_BIN_LADEN, WALL_STREET_FIRMS, DOW_JONES, NASDAQ_COMPOSITE_INDEX, and BANKRUPTCY_PROTECTION appear as single entries within the topics.