
A Probabilistic Model for Retrospective News Event Detection
Zhiwei Li, Bin Wang*, Mingjing Li, Wei-Ying Ma
University of Science and Technology of China / Microsoft Research Asia
SIGIR 2005, presented 2005/09/13

Abstract
Retrospective news event detection (RED) is the discovery of previously unidentified events in a historical news corpus.
- Both the content and the time information of news articles are helpful for RED, but most research has focused on using content alone; little work has explored better uses of time information.
- This paper proposes a probabilistic model that incorporates both content and time information in a unified framework.
- It also builds an interactive RED system, HISCOVERY, which provides additional functions for presenting events: Photo Story and Chronicle.

Introduction
- News event: a specific thing that happens at a specific time and place.
- RED: the discovery of previously unidentified events in a historical news corpus. Example application: detecting the earthquakes that happened in the last ten years from historical news articles.
- What is explored: better representations of news articles and events that effectively model both content and time information, with events modeled in a probabilistic manner.

Introduction (cont.)
Main contributions:
- A multi-model RED algorithm in which both the content and the time information of news articles are modeled explicitly and effectively.
- An approach to determine the approximate number of events from the article count-over-time distribution.

Related Work
- RED was first proposed and defined by Yang et al. (SIGIR 1998), who introduced an agglomerative clustering algorithm, Group Average Clustering (GAC). Since then, little work has targeted RED directly.
- New event detection (NED), a closely related task, has been studied extensively. The most prevalent NED approaches were proposed by Allan et al. (SIGIR 1998) and Yang et al. (SIGIR 1998).
- Subsequent modifications fall into two lines: better representation of content, and better use of time information.

Related Work (cont.)
On utilizing content:
- TF-IDF weighting with cosine similarity.
- New distance metrics, such as the Hellinger distance (SIGIR 2003).
- Better document representations, e.g. feature selection (Yang et al., SIGKDD 2002).
- The use of named entities, studied in Allan et al. (1999), Yang et al. (2002), and Lam et al. (2001).
- Re-weighting of terms, first proposed by Allan et al. (1999).
- Kumaran et al. (SIGIR 2004) used both text classification and named entities to improve NED performance.

Related Work (cont.)
On utilizing time information, there are two kinds of usage:
- Some approaches use only the chronological order of documents.
- Others use decaying functions to modify the content similarity metrics (Brants et al., SIGIR 2003).

Characteristics of News Articles and Events
"Halloween" is a topic, while it includes many distinct events.

Characteristics of News Articles and Events (cont.)
The two most important characteristics of news articles and events:
1. News articles are driven by news events, so an event's article count changes over time: events appear as peaks in the count-time distribution. In some situations, however, the observed peaks and the events do not correspond exactly.
2. Articles reporting the same event are similar in both content and time across different news sites; in particular, the start and end times of an event's coverage are very similar across sites.
How the method exploits them:
- The first characteristic suggests modeling RED with a latent variable model in which events are latent variables and articles are observations.
- The second characteristic means that mixing articles from different sources gathers many news stories about the same event.

Multi-model Retrospective News Event Detection Method
- Multi-model approach: since content and timestamps have different characteristics, a multi-model formulation incorporates them in a unified probabilistic framework.
- Representation: following common knowledge about news, an article is represented by four kinds of information: who (persons), when (time), where (locations), and what (keywords).
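As a concrete sketch, the who/when/where/what representation might look like the following record (the field names and example values are illustrative, not from the paper):

```python
from dataclasses import dataclass

@dataclass
class NewsArticle:
    # the four kinds of information the model uses
    persons: list[str]     # who
    locations: list[str]   # where
    keywords: list[str]    # what
    timestamp: float       # when, e.g. days since the start of the corpus

article = NewsArticle(
    persons=["John Smith"],
    locations=["San Francisco"],
    keywords=["earthquake", "magnitude"],
    timestamp=12.0,
)
```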

The Generative Model of News Articles
- Content: modeled with mixtures of unigram models. Since persons and locations are especially important in news, persons, locations, and keywords are modeled by three separate mixtures of unigrams.
- Timestamps: the article count-time distribution is a mixture of per-event distributions, and a peak is usually well modeled by a Gaussian function; a Gaussian Mixture Model (GMM) is therefore chosen to model timestamps.
- The whole model combines four mixture models: three mixtures of unigrams and one GMM.

The Generative Model of News Articles (cont.)
The two-step generative process of a news article: first choose an event according to the mixture proportions, then generate the article's persons, locations, keywords, and timestamp from that event's component models (the slide shows this process as a figure).

The Generative Model of News Articles (cont.)
- A graphical representation of the model (shown as a figure on the slide).
- N denotes the term-space sizes of the three kinds of entities (N_p, N_l, and N_n).

Learning Model Parameters
- Model parameters can be estimated by the maximum likelihood method.
- Given an event j, the four kinds of information of the i-th article are conditionally independent, so the likelihood factorizes:
  p(x_i | e_j) = p(persons_i | e_j) p(locations_i | e_j) p(keywords_i | e_j) p(time_i | e_j)
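With this factorization, log p(x_i | e_j) is a sum of three unigram terms and one Gaussian time term. A minimal sketch under assumed data structures (smoothed per-event term distributions, and a (mean, variance) pair for time); the inputs below are hypothetical:

```python
import math

def log_likelihood(article, event):
    # log p(x_i | e_j): three unigram terms plus one Gaussian time term
    lp = 0.0
    for kind in ("persons", "locations", "keywords"):
        dist = event[kind]                        # p(term | e_j), smoothing assumed
        for term in article[kind]:
            lp += math.log(dist.get(term, 1e-6))  # unseen terms get a small floor
    mu, var = event["time"]                       # event j's Gaussian over time
    t = article["timestamp"]
    lp += -0.5 * math.log(2 * math.pi * var) - (t - mu) ** 2 / (2 * var)
    return lp

event = {"persons": {"john smith": 0.9},
         "locations": {"san francisco": 0.9},
         "keywords": {"earthquake": 0.9},
         "time": (0.0, 4.0)}
matching = {"persons": ["john smith"], "locations": ["san francisco"],
            "keywords": ["earthquake"], "timestamp": 1.0}
unrelated = {"persons": ["jane doe"], "locations": ["oslo"],
             "keywords": ["election"], "timestamp": 60.0}
```

An article that matches an event in all four fields scores much higher under that event than an unrelated one, which is what drives the clustering.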

Learning Model Parameters (cont.)
Expectation-Maximization (EM) algorithm:
- EM is applied to maximize the log-likelihood.
- Using the independence assumptions, the parameters of the four mixture models can be estimated independently.
- In the E-step, compute the posterior probabilities of events given articles:
  p(e_j | x_i) ∝ p(e_j) p(x_i | e_j)

Learning Model Parameters (cont.)
EM algorithm, M-step:
- Update the parameters of the four models.
- For the three mixtures of unigram models, the term distributions p(w | e_j) are updated from the posterior-weighted term counts (the slide gives the update formula).

Learning Model Parameters (cont.)
EM algorithm, M-step (cont.):
- The parameters of the GMM (the mean and variance of each event's Gaussian) are updated from the posterior-weighted timestamps.
- Since the means and variances of the GMM change consistently with the whole model, the Gaussian functions work like sliding windows on the timeline.

Learning Model Parameters (cont.)
EM algorithm, M-step (cont.):
- The mixture proportions p(e_j) are updated as the average posterior of each event over all articles.
- The EM algorithm increases the log-likelihood monotonically and stops at a local maximum.
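The E- and M-steps above can be sketched end to end for a stripped-down model with one unigram mixture (keywords) plus the timestamp GMM; the full model adds two more unigram mixtures in exactly the same way. The initialization, smoothing constants, and array layout are assumptions of this sketch, not the paper's implementation:

```python
import numpy as np

def em_red(term_counts, timestamps, k, n_iter=50, seed=0):
    """EM for a stripped-down model: one mixture of unigrams over
    keywords plus a Gaussian mixture over timestamps."""
    m, v = term_counts.shape                      # m articles, v vocabulary terms
    rng = np.random.default_rng(seed)
    resp = rng.dirichlet(np.ones(k), size=m)      # soft init of p(e_j | x_i)
    for _ in range(n_iter):
        # M-step: posterior-weighted parameter updates
        pi = resp.mean(axis=0)                             # proportions p(e_j)
        theta = resp.T @ term_counts + 0.01                # smoothed term counts
        theta /= theta.sum(axis=1, keepdims=True)          # unigram params p(w | e_j)
        w = resp.sum(axis=0)
        mu = (resp * timestamps[:, None]).sum(axis=0) / w  # Gaussian means
        var = (resp * (timestamps[:, None] - mu) ** 2).sum(axis=0) / w + 1e-3
        # E-step: p(e_j | x_i) ∝ p(e_j) p(words_i | e_j) p(t_i | e_j)
        log_p = (np.log(pi)
                 + term_counts @ np.log(theta).T
                 - 0.5 * np.log(2 * np.pi * var)
                 - 0.5 * (timestamps[:, None] - mu) ** 2 / var)
        log_p -= log_p.max(axis=1, keepdims=True)          # numerical stability
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
    return resp, pi, theta, mu, var

# toy corpus: two events, well separated in both vocabulary and time
counts = np.zeros((10, 4))
counts[:5, :2] = 5.0      # event A articles use terms 0-1
counts[5:, 2:] = 5.0      # event B articles use terms 2-3
times = np.array([0.0, 1.0, 0.5, 1.5, 0.2, 10.0, 11.0, 10.5, 9.5, 10.2])
resp, pi, theta, mu, var = em_red(counts, times, k=2)
labels = resp.argmax(axis=1)
```

On this toy data the responsibilities converge to the correct two-way partition, and the two Gaussian means land near the two bursts on the timeline, illustrating the "sliding window" behavior described above.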

How Many Events?
- Basic idea: the initial estimate of the number of events can be set to the number of peaks in the count-time distribution, but noise damages the distribution.
- Salient peaks: a salience score is therefore defined for each peak (the slide gives the formula), so that noisy minor peaks can be discounted.

How Many Events? (cont.)
- Salient peaks: use hill-climbing to detect all peaks and calculate their salience scores; the number of peaks in the top 20% is the initial estimate of k.
- Alternative: the user can specify the initial value of k, which is then refined by split/merge operations.
- Model selection: the Minimum Description Length (MDL) principle is applied to select among candidate values of k.
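A sketch of the initialization and model-selection machinery. The hill-climbing peak finder follows the slide; the salience score and the exact MDL instantiation are common stand-ins, since the slide's own formulas are not reproduced here:

```python
import math

def find_peaks(counts):
    # hill-climbing: indices that are local maxima of the count-time curve
    return [i for i in range(1, len(counts) - 1)
            if counts[i - 1] < counts[i] >= counts[i + 1]]

def salience(counts, i):
    # stand-in salience: peak height above the lowest point on either side
    # (the paper's actual score formula may differ)
    return counts[i] - max(min(counts[:i + 1]), min(counts[i:]))

def initial_k(counts, top_frac=0.2):
    # rank peaks by salience and keep the top 20% as the initial k
    ranked = sorted(find_peaks(counts), key=lambda i: salience(counts, i),
                    reverse=True)
    return max(1, int(round(len(ranked) * top_frac))), ranked

def mdl_score(log_lik, n_params, n_articles):
    # a common MDL instantiation: -log L + (p / 2) log M;
    # the candidate k with the lowest score is selected
    return -log_lik + 0.5 * n_params * math.log(n_articles)

counts = [0, 5, 1, 8, 2, 3, 1, 9, 0]   # toy daily article counts
k0, ranked = initial_k(counts)
```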

Event Summarization
Two ways to summarize news events:
1. Choose the features with maximum probability to represent an event. For event j, the 'protagonist' is the person with the maximum p(person_p | e_j). The readability of such summaries, however, is poor.
2. Choose one news article as the representative of each event: the article with the maximum p(x_i | e_j). The first article of each event is also a good representative.
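Both summarization strategies reduce to an argmax over model outputs; a sketch with hypothetical inputs:

```python
def summarize_event(j, articles, lik, person_probs):
    # 'protagonist': the person with maximum p(person | e_j)
    protagonist = max(person_probs[j], key=person_probs[j].get)
    # representative article: the one with maximum p(x_i | e_j)
    rep = max(range(len(articles)), key=lambda i: lik[i][j])
    return protagonist, articles[rep]

# hypothetical model outputs for one event (j = 0) over three articles
person_probs = {0: {"Al Gore": 0.35, "George W. Bush": 0.65}}
lik = [[0.02], [0.30], [0.11]]   # lik[i][j] = p(x_i | e_j)
articles = ["story-a", "story-b", "story-c"]
```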

Algorithm Summary
1. Initialization:
   a. Use the hill-climbing algorithm to find all peaks.
   b. Use salience scores to determine the top 20% of peaks, and initialize events accordingly.
2. Learn model parameters:
   a. E-step: compute posteriors.
   b. M-step: update parameters.
3. Increase/decrease the number of events until the minimum/maximum number of events is reached:
   a. Split/merge the current big/small peaks, and re-initialize events accordingly.
   b. Go to step 2.
4. Perform model selection by MDL.
5. Summarize the detected events.

Application: HISCOVERY System
- HISCOVERY (HIStory disCOVERY) provides two functions: Photo Story and Chronicle.
- News articles come from 12 news sites.
- Photo Story (shown as a screenshot on the slide).

Application: HISCOVERY System (cont.)
HISCOVERY Chronicle:
- The user enters a topic.
- HISCOVERY searches the news corpus to gather related articles.
- The proposed RED approach detects the events belonging to the topic, and the event summaries are sorted in chronological order.

Experimental Methods
Data preparation:
- The first dataset is the TDT4 dataset.
- The second dataset: three representative topics chosen from the TDT4 dataset, with articles downloaded from several news websites.

Experimental Methods (cont.)
Experimental design:
- In the first two experiments, the cluster number is set to the true number of events; in practice, the event number must be determined automatically.
- For comparison, Yang et al.'s augmented Group Average Clustering (GAC) and a kNN algorithm are chosen as baselines.
Evaluation measures:
- Once the contingency tables are obtained, the corresponding measures (precision, recall, and F1) are calculated.
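From the contingency-table counts, the three measures are computed in the usual way:

```python
def prf1(tp, fp, fn):
    # tp/fp/fn are true positives, false positives, and false negatives
    # taken from the per-event contingency table
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```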

Results
Overall performance on Dataset 1:
- The better performance of the full probabilistic model indicates the benefit of modeling named entities with separate models.
- Named entities are very important for news articles.

Results (cont.)
Overall performance on Dataset 2 (results table on the slide).

Results (cont.)
How many events?
- The salient-peak initialization is evaluated.
- Mutual information is used to measure the fitness of a partition against the ground truth.
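Mutual information between the detected partition and the ground-truth partition can be computed directly from co-occurrence counts; a self-contained sketch:

```python
from collections import Counter
from math import log

def mutual_information(pred, truth):
    # MI between two labelings of the same articles, estimated
    # from joint and marginal label counts
    n = len(pred)
    pa, pb = Counter(pred), Counter(truth)
    joint = Counter(zip(pred, truth))
    return sum((c / n) * log((c / n) / ((pa[a] / n) * (pb[b] / n)))
               for (a, b), c in joint.items())
```

Identical partitions (up to label renaming) reach the entropy of the labeling, while independent partitions score zero.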

Conclusions and Future Work
Contribution:
- A multi-model RED algorithm that models the two key characteristics of news articles and events.
Future work:
- Find better representations of the content of news articles.
- Study how to model news events with dynamic models, such as Hidden Markov Models (HMM) and Independent Component Analysis (ICA).