Systems Engineering and Engineering Management The Chinese University of Hong Kong Parameter Free Bursty Events Detection in Text Streams Gabriel Pui Cheong.


Parameter Free Bursty Events Detection in Text Streams
Gabriel Pui Cheong Fung, Jeffrey Xu Yu, Hongjun Lu, Philip S Yu
Systems Engineering and Engineering Management, The Chinese University of Hong Kong
VLDB 2005

Outline
Introduction
–Bursty events? Text streams? Etc.
A Possible Method
–Document pivot clustering
Proposed Work
–Feature pivot clustering
Results Highlight
Related Works
Summary & Future Work

Introduction (1 of 5)
Parameter Free Bursty Events Detection in Text Streams

Introduction (2 of 5)
Parameter Free Bursty Events Detection in Text Streams
–A text stream is a sequence of documents organized temporally
»E.g. news stories and e-mails
–Two kinds of stream: online vs. offline
»Online stream: open-ended.
»Offline stream: has boundaries.

Introduction (3 of 5)
Parameter Free Bursty Events Detection in Text Streams
–An event consists of a set of features that are useful for identifying (understanding) the event.
–A bursty event is an event that is hot in a specific period of time.
–We call the features that are used to identify a bursty event bursty features.
–E.g. the event “SARS” consists of the features “Outbreak, Atypic, Respire, …”
[Figure: number of news stories over time for an event, e.g. SARS]

Introduction (4 of 5)
Parameter Free Bursty Events Detection in Text Streams
–Given a text stream, find all of the bursty events
»In other words, find all of the bursty features (features that are “hot” in a specific period) and group them logically, such that the bursty features grouped together are useful for identifying an event.

Introduction (5 of 5)
Parameter Free Bursty Events Detection in Text Streams
–Parameter free: you do not need to tune the parameters yourself
»The framework is applicable to any corpus
»No fine tuning is necessary
»No parameter needs to be estimated
–Why is being parameter free useful?
»Without any prior knowledge about the information in a database, it is rather difficult to make any initial estimate
»In our problem, we are trying to identify the bursty events in a text stream, and we do not have any prior knowledge about the information in the database. We do not know what it contains. We do not even know whether there is any burst.

Problem Setting
Data archived
–Source: local news stories (South China Morning Post)
–Period: to
Some major settings
–Offline detection
–News stories that are released on the same day (i.e. news stories that appear in the same issue of the newspaper) are grouped together as a batch

Document Pivot Clustering Approach (1 of 3)
A possible method (not our approach)
–Step 1
»Objective: group similar events together
»Method: use clustering to group similar documents together (e.g. K-Means)
–Step 2
»Objective: extract the keywords of each event
»Method: use feature selection (e.g. information gain)
[Figure: all news stories are clustered into groups (Step 1), then the key features of each group are extracted (Step 2)]

Document Pivot Clustering Approach (2 of 3)
Some difficulties
1.The most similar documents may not report the same event
–From our experiments, we found that two documents that are the most similar in terms of their features may not necessarily report the same event
2.Clustering requires feature weightings (e.g. tf-idf)
–Feature weighting originated in IR. Its idea is that features appearing in fewer documents in the domain are more useful (obtain higher weights).
–For clustering: features appearing in many documents in a certain period should obtain higher weights.

Document Pivot Clustering Approach (3 of 3)
Some difficulties (cont’d)
3.A long-running event may be broken down into several small pieces
–This phenomenon appears in many reported studies (esp. in TDT)
4.It is difficult to figure out the bursty features
–Assume clustering can determine bursty events. However, there can be many clusters that are not “hot” (important). Determining which of the clusters are “hot” is difficult (it may require a ranking function, which is difficult to derive).
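Difficulty 2 can be illustrated with a small calculation. The sketch below assumes the common idf form log(N / df); the slides do not say which tf-idf variant was considered, and all counts are hypothetical.

```python
import math

def idf(total_docs: int, docs_with_feature: int) -> float:
    """Inverse document frequency, log(N / df): rarer features score higher."""
    return math.log(total_docs / docs_with_feature)

# Hypothetical counts for a period in which one event dominates the news.
total_docs = 1000
rare_feature_df = 5    # a feature mentioned in only a few stories
hot_feature_df = 400   # a feature mentioned in many stories during the period

rare_weight = idf(total_docs, rare_feature_df)
hot_weight = idf(total_docs, hot_feature_df)
print(f"rare: {rare_weight:.2f}, hot: {hot_weight:.2f}")
```

The feature that appears in many stories during the period, which is the one a burst detector cares about, receives the lower weight.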

Feature Pivot Clustering Approach
Overview of the framework
–Step 1
»Identify the bursty features
–Step 2
»Group the bursty features into bursty events
–Step 3
»Determine the hot periods of the bursty events
[Figure: all features are extracted from all news stories; bursty features are identified (Step 1), clustered into events (Step 2), and the hot period of each event is determined (Step 3)]

Identify the Bursty Features (1 of 7)
General idea
–Given a single feature, f, try to figure out whether it contains any bursty period.
–If so, then it is a bursty feature (in some specific periods)
[Figure: the distribution of a feature, f, among documents: number of documents containing f over time, with the bursty period marked]

Identify the Bursty Features (2 of 7)
Some more examples
[Figure: four plots of the number of documents containing the feature, f, over time: no burst; not a burst (stopword); burst without fading away; two bursts]

Identify the Bursty Features (3 of 7)
An obvious approach to discovering whether a feature is a bursty feature is to use a “threshold cut”
[Figure: the distribution of a feature, f, among documents: number of documents containing f over time, with a threshold line and the bursty period marked]

Identify the Bursty Features (4 of 7)
Challenges
–Setting one single threshold for all features is impossible
Another attempt: set a “percentage cut”
–Figure out the relative difference between the max and min of the “No. of docs containing the feature”
[Figure: number of documents containing the feature, f, over time against a threshold, for a stop-word and for a normal non-bursty feature]

Identify the Bursty Features (5 of 7)
Challenges
–Setting a percentage cut is also impossible
»Different features have different distributions
[Figure: two plots of the number of documents containing the feature, f, over time]
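A small sketch of why neither cut works across features. The daily counts are hypothetical, and the relative-range formula (max - min) / max is only one assumed reading of the “percentage cut”; the slides do not define it precisely.

```python
def days_above(daily_counts, threshold):
    """Return the indices of days whose count exceeds an absolute threshold."""
    return [day for day, count in enumerate(daily_counts) if count > threshold]

def relative_range(daily_counts):
    """(max - min) / max of the daily counts; 0.0 if the feature never appears."""
    highest, lowest = max(daily_counts), min(daily_counts)
    return (highest - lowest) / highest if highest else 0.0

# Hypothetical daily document counts for three features.
stopword = [95, 97, 96, 98, 95, 96]   # always frequent, never bursty
sparse_noise = [0, 1, 0, 2, 1, 0]     # rare and not bursty
rare_burst = [0, 1, 0, 12, 14, 1]     # rare, with a burst on days 3 and 4

features = {"stopword": stopword, "sparse_noise": sparse_noise, "rare_burst": rare_burst}
for name, counts in features.items():
    print(name, days_above(counts, 50), days_above(counts, 10),
          round(relative_range(counts), 2))
```

With a threshold of 50 the stopword is flagged every day and the real burst is missed; lowering it to 10 catches the burst but still flags the stopword. The relative range separates the stopword, but it gives the same value to the sparse noise and to the real burst.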

Identify the Bursty Features (6 of 7)
Our solution
–Treat each feature in the text stream as a probabilistic distribution
–For each day, we compute the probability of observing the number of documents that contain a particular feature, f_j
»What we have are:
N’: no. of news stories in the stream
n’: no. of news stories in a time window (one day)
K’: no. of news stories that contain the specific feature
n’ – K’: no. of news stories that do not contain the specific feature
»We can model the distribution of a feature in a time window (i.e. in a day) by a binomial distribution (the above four elements are enough for computing the binomial distribution)
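As a rough illustration of the binomial model, the sketch below computes the probability of seeing a given number of feature-bearing stories in one day. The counts are hypothetical, and estimating the per-story probability p as the feature's overall fraction in the stream is an assumption made here, not something the slides specify.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability that exactly k of n documents contain the feature."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Hypothetical counts.
stream_docs = 1000              # stories in the whole stream
stream_docs_with_feature = 50   # stories in the stream that contain the feature
p = stream_docs_with_feature / stream_docs  # assumed estimate of p

window_docs = 20                # stories published on one day
typical_day = binomial_pmf(1, window_docs, p)   # 1 of 20 stories has the feature
unusual_day = binomial_pmf(8, window_docs, p)   # 8 of 20 stories have the feature
print(typical_day, unusual_day)
```

A day whose count is close to the expected value n·p gets a relatively high probability, while a day with far more feature-bearing stories than expected gets a very small one.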

Identify the Bursty Features (7 of 7)
–If in any time window (day) the value of the binomial distribution (the probability of the number of documents that contain the feature) changes significantly, then it implies that the feature exhibits “abnormal” behavior
»The reason is that if the features are generated from an unknown probability distribution, then the value of the binomial distribution in each time window (each day) should be more or less constant
–Two reasons why it may drop significantly:
»Suddenly very few documents contain the specific feature
We are not interested in this kind of observation, as it only tells us that the specific feature is NOT a bursty feature in the corresponding time window (day). It gives no insight about whether it is a bursty feature NOW.
»Suddenly many documents contain the specific feature
We are interested in this kind of feature
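The two cases above can be told apart by checking which side of the expected count a window falls on. This is a minimal sketch under assumed inputs: the function names, the use of tail probabilities, and the numbers are illustrative choices and not the authors' exact procedure.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability that exactly k of n documents contain the feature."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(binomial_pmf(i, n, p) for i in range(k, n + 1))

def classify_window(k: int, n: int, p: float):
    """Label a window by the side of the expected count n*p it falls on,
    together with the probability of a count at least that extreme."""
    if k > n * p:
        return "above expectation", upper_tail(k, n, p)
    return "at or below expectation", 1.0 - upper_tail(k + 1, n, p)  # P(X <= k)

# Hypothetical feature: expected in 10% of the 50 stories published each day.
n, p = 50, 0.1
for count in [5, 21, 0]:
    print(count, classify_window(count, n, p))
```

Both 21 and 0 are improbable counts, but only the first lies above the expectation, so only that window is a candidate burst.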

Group the Bursty Features (1 of 2)
General idea
–Group the features that always appear together
»If the features always appear together, they should be discussing the same event
–Cluster the features
Challenge
–Should we group these two features together?
»Situation: whenever Feature B appears, Feature A also appears. Feature A appears in 1,000 stories. Feature B appears in 200 stories.
»We claim that they should not be grouped together, as Feature B is only a subset of Feature A. We want to group the features at the “same level”
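The challenge can be made concrete with document sets. In this sketch the story identifiers are hypothetical and the helper name `containment` is introduced only for illustration; it is not part of the authors' method.

```python
def containment(a: set, b: set) -> float:
    """Fraction of the stories containing feature a that also contain feature b."""
    return len(a & b) / len(a) if a else 0.0

# Hypothetical story ids: Feature B's stories are a subset of Feature A's.
stories_a = set(range(1000))   # Feature A appears in 1,000 stories
stories_b = set(range(200))    # Feature B appears in 200 of those stories

a_given_b = containment(stories_b, stories_a)  # share of B's stories that contain A
b_given_a = containment(stories_a, stories_b)  # share of A's stories that contain B
jaccard = len(stories_a & stories_b) / len(stories_a | stories_b)
print(a_given_b, b_given_a, jaccard)
```

Looking in one direction only, the two features seem to always co-occur; the reverse direction and the symmetric overlap show that they are not at the “same level”.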

Group the Bursty Features (2 of 2)
Our solution
–We try to figure out the probability of the features being grouped together given the observed document distribution of the text stream
»Find the maximum probability that the features would be grouped together (Expectation-Maximization, EM)

Determine the Hot Periods
General idea
–The hot periods are those with the highest average probability that the bursty features appear together
[Figure: document distribution over time]
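One possible reading of this idea, sketched under loud assumptions: the slides give no formula, so here the per-day “probability” of a feature is taken to be the fraction of that day's stories containing it, the average is taken over the features of one event, and only the single hottest day is picked rather than a full period. All values are hypothetical.

```python
def average_feature_probability(daily_fractions: dict) -> list:
    """For each day, average the per-feature fractions of stories containing the feature."""
    series = list(daily_fractions.values())
    days = len(series[0])
    return [sum(s[day] for s in series) / len(series) for day in range(days)]

# Hypothetical fractions of daily stories containing each feature of one event.
event_features = {
    "sars":     [0.01, 0.02, 0.30, 0.35, 0.05],
    "outbreak": [0.02, 0.01, 0.20, 0.25, 0.03],
}

averages = average_feature_probability(event_features)
hottest_day = max(range(len(averages)), key=averages.__getitem__)
print(averages, hottest_day)
```

Extending this from a single day to a contiguous period would need a rule for where the period starts and ends, which the slides do not describe.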

Results Highlight
Some events (bursty event: bursty features)
–SARS: Sars, Outbreak, Atypic, Respire, …
–Legislation: Article, Yip, Law, Rally, …
–Bird Flu: Bird, Flu
–Taiwan Issue: Taiwan, Chen, Shu, Bian
–Iraq War: Iraq, War, Saddam, …
–Gas: Victim, Might, Accident, Gas

Related Works (1 of 2)
TDT: automatic techniques for locating topically related material in streams of data (Wayne 2000 pp. 1487)
–Five major tasks: segmentation, tracking, detection, first story detection, linking
–Works well with the “document-pivot clustering” approach
»Tries to group similar documents to form an event (the event is not named, i.e. there is no need to extract or identify the main features of the event)
»No need to figure out the “bursty features”
–Other interesting issue
»Our approach naturally combines the detection task and the linking task

Related Works (2 of 2)
Many other related works
–Vlachos et al. SIGMOD’04
»Burst for online query
–Smith SIGIR’02
»Events detection
–Kleinberg KDD’02
»Burst and hierarchical structure
–Swan & Allan SIGIR’00
»Time varying features
–…

Summary & Future Work
Document Pivot Clustering vs. Feature Pivot Clustering
–Document pivot clustering: clustering is based on the content of the documents
–Feature pivot clustering: clustering is based on the distribution of features
Future Works
–Try to apply the framework to the TDT dataset
»However, TDT contains selected news stories from multiple sources. The distribution of features may be affected.
»Moreover, the time period of TDT is relatively short. We do not know whether the change in the distribution of features is significant enough for us to do analysis
–Try to assign the same features to multiple events (more realistic)
»However, this may lead to many new issues, such as a “cycle” appearing, or some parameters needing to be introduced

Thank you very much – The End –