DISCOVERING EVENT EVOLUTION GRAPHS FROM NEWSWIRES
Christopher C. Yang and Xiaodong Shi

Similar presentations
ADBIS 2007 Aggregating Multiple Instances in Relational Database Using Semi-Supervised Genetic Algorithm-based Clustering Technique Rayner Alfred Dimitar.

Google News Personalization: Scalable Online Collaborative Filtering
A probabilistic model for retrospective news event detection
Text Categorization.
Improvements and extras. Paul Thomas, CSIRO. Overview of the lectures: 1. Introduction to information retrieval (IR) 2. Ranked retrieval 3. Probabilistic retrieval.
Boolean and Vector Space Retrieval Models
Music Recommendation by Unified Hypergraph: Music Recommendation by Unified Hypergraph: Combining Social Media Information and Music Content Jiajun Bu,
Clustering for web documents. 박흠. Contents: Cluto, Criterion Functions for Document Clustering*, Experiments and Analysis.
A Vector Space Model for Automatic Indexing
Evaluation. Rong Jin. Evaluation is key to building effective and efficient search engines; it is usually carried out in controlled experiments.
Pete Bohman, Adam Kunk. Introduction, Related Work, System Overview, Indexing Scheme, Ranking, Evaluation, Conclusion.
Comparing Twitter Summarization Algorithms for Multiple Post Summaries David Inouye and Jugal K. Kalita SocialCom May 10 Hyewon Lim.
LYRIC-BASED ARTIST NETWORK METHODOLOGY Derek Gossi CS 765 Fall 2014.
Data Mining Techniques: Clustering
To See, or Not to See—Is That the Query? Robert R. Korfhage Dept. of Information Science University of Pittsburgh 1991 Reviewed by Yi-Bu Chen LIS 551 Information.
Systems Engineering and Engineering Management The Chinese University of Hong Kong Parameter Free Bursty Events Detection in Text Streams Gabriel Pui Cheong.
Learning to Detect Objects in Images via a Sparse, Part-Based Representation. S. Agarwal, A. Awan and D. Roth. IEEE Transactions on Pattern Analysis and.
Video summarization by video structure analysis and graph optimization. M.Phil. 2nd Term Presentation. Lu Shi, Dec 5, 2003.
Semantic text features from small world graphs Jure Leskovec, IJS + CMU John Shawe-Taylor, Southampton.
Chapter 2 Modeling. Computer Science and Information Engineering 4B, 陳建勳. Introduction: Traditional information retrieval systems usually adopt index terms to index and retrieve documents.
MANISHA VERMA, VASUDEVA VARMA PATENT SEARCH USING IPC CLASSIFICATION VECTORS.
Video summarization by graph optimization Lu Shi Oct. 7, 2003.
IR Models: Review Vector Model and Probabilistic.
Xiaomeng Su & Jon Atle Gulla Dept. of Computer and Information Science Norwegian University of Science and Technology Trondheim Norway June 2004 Semantic.
“A Comparison of Document Clustering Techniques” Michael Steinbach, George Karypis and Vipin Kumar (Technical Report, CSE, UMN, 2000) Mahashweta Das
Automatic Indexing. The vector model. Methods for calculating term weights in the vector model: simple term weights; inverse document frequency; signal.
Modeling (Chap. 2) Modern Information Retrieval Spring 2000.
Social Network Analysis via Factor Graph Model
Temporal Event Map Construction For Event Search Qing Li Department of Computer Science City University of Hong Kong.
«Tag-based Social Interest Discovery» Proceedings of the 17th International World Wide Web Conference (WWW2008) Xin Li, Lei Guo, Yihong Zhao Yahoo! Inc.,
Rui Yan, Yan Zhang Peking University
TEMPORAL VIDEO BOUNDARIES -PART ONE- SNUEE KIM KYUNGMIN.
An Integrated Approach to Extracting Ontological Structures from Folksonomies. Huairen Lin, Joseph Davis, Ying Zhou. ESWC 2009. Hyewon Lim, October 9th, 2009.
TEMPORAL EVENT CLUSTERING FOR DIGITAL PHOTO COLLECTIONS Matthew Cooper, Jonathan Foote, Andreas Girgensohn, and Lynn Wilcox ACM Multimedia ACM Transactions.
MINING RELATED QUERIES FROM SEARCH ENGINE QUERY LOGS Xiaodong Shi and Christopher C. Yang Definitions: Query Record: A query record represents the submission.
Chapter 7 Web Content Mining Xxxxxx. Introduction Web-content mining techniques are used to discover useful information from content on the web – textual.
Hierarchical Dirichlet Process (HDP) A Dirichlet process (DP) is a discrete distribution that is composed of a weighted sum of impulse functions. Weights.
UOS 1 Ontology Based Personalized Search Zhang Tao The University of Seoul.
VAST 2011 Sebastian Bremm, Tatiana von Landesberger, Martin Heß, Tobias Schreck, Philipp Weil, and Kay Hamacher Interactive-Graphics Systems TU Darmstadt,
Glasgow 02/02/04 NN k networks for content-based image retrieval Daniel Heesch.
April 14, 2003. Hang Cui, Ji-Rong Wen and Tat-Seng Chua. Hierarchical Indexing and Flexible Element Retrieval for Structured Document. Hang Cui, School of.
Pete Bohman, Adam Kunk. ChronoSearch: A System for Extracting a Chronological Timeline. ChronoChrono.
Presenter: Lung-Hao Lee (李龍豪). January 7, 309.
Yang Hu University of Pittsburgh Department of Computer Science.
CS 533 Information Retrieval Systems. Introduction, Connectivity Analysis, Kleinberg’s Algorithm, Problems Encountered, Improved Connectivity Analysis.
Wei Feng , Jiawei Han, Jianyong Wang , Charu Aggarwal , Jianbin Huang
Automatic Classification of Bookmarked Web Pages. Chris Staff. Second Talk, February 2007.
Intel Confidential – Internal Only Co-clustering of biological networks and gene expression data Hanisch et al. This paper appears in: bioinformatics 2002.
LANGUAGE MODELS FOR RELEVANCE FEEDBACK Lee Won Hee.
Semantic Wordfication of Document Collections Presenter: Yingyu Wu.
Motion Analysis using Optical flow CIS750 Presentation Student: Wan Wang Prof: Longin Jan Latecki Spring 2003 CIS Dept of Temple.
Algorithmic Detection of Semantic Similarity WWW 2005.
PCI th Panhellenic Conference in Informatics Clustering Documents using the 3-Gram Graph Representation Model 3 / 10 / 2014.
Vector Space Models.
Carnegie Mellon Novelty and Redundancy Detection in Adaptive Filtering Yi Zhang, Jamie Callan, Thomas Minka Carnegie Mellon University {yiz, callan,
Threshold Setting and Performance Monitoring for Novel Text Mining Wenyin Tang and Flora S. Tsai School of Electrical and Electronic Engineering Nanyang.
LogTree: A Framework for Generating System Events from Raw Textual Logs Liang Tang and Tao Li School of Computing and Information Sciences Florida International.
Finding document topics for improving topic segmentation Source: ACL2007 Authors: Olivier Ferret (18 route du Panorama, BP6) Reporter:Yong-Xiang Chen.
LexPageRank: Prestige in Multi-Document Text Summarization Gunes Erkan, Dragomir R. Radev (EMNLP 2004)
2005/09/13 A Probabilistic Model for Retrospective News Event Detection Zhiwei Li, Bin Wang*, Mingjing Li, Wei-Ying Ma University of Science and Technology.
An Adaptive User Profile for Filtering News Based on a User Interest Hierarchy Sarabdeep Singh, Michael Shepherd, Jack Duffy and Carolyn Watters Web Information.
CS791 - Technologies of Google, Spring. A Web-based Kernel Function for Measuring the Similarity of Short Text Snippets. By Mehran Sahami, Timothy.
Using Blog Properties to Improve Retrieval Gilad Mishne (ICWSM 2007)
Collection Fusion in Carrot2
A Case Study for Adaptive News Systems with Open User Model
Video Summarization by Spatial-Temporal Graph Optimization
Modern Information Retrieval
Representation of documents and queries
Information Retrieval and Web Design
Presentation transcript:

Event Evolution and Event Evolution Graph: We define event evolution as the directional dependencies, or relatedness, between two events within the same news affair that trace how the affair develops. The relationship between two such events is called an event evolution relationship. An event evolution graph is defined as a directed acyclic graph (DAG) G = (V, L) whose nodes V = {ε_1, ε_2, …, ε_n} are events and whose directed edges L = {(ε_i, ε_j)} are the event evolution relationships between them.

Figure: A partial event evolution graph for the news topic "Beslan school hostage crisis". The numbers in brackets indicate the temporal order of the events. The graph contains 10 events and 11 event evolution relationships.

Overview: In Topic Detection and Tracking (TDT), news stories are often organized into a flat hierarchical structure in which inter-cluster relationships are missing. Modeling the event evolution of news topics and presenting it as a graph can be useful in various applications. It can guide users through a news topic while they browse. It can also be combined with automatic summarization techniques and graphical interfaces to provide a graphical web news infomediary. We propose to represent the event evolution of a news topic with an event evolution graph, a directed graph whose vertices are events and whose edges are event evolution relationships.

Modeling Event Evolution Relationships: Previous work such as event threading uses the average pairwise story similarity as the measure of event evolution. Event threading neglects the properties of events and simply treats an event as an aggregate set of news stories.
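The graph definition G = (V, L) given at the start of this section can be made concrete with a small sketch. It assumes events are identified by hypothetical labels and that each event evolution relationship (ε_i, ε_j) is stored as a directed edge. The helper `is_dag` is also a hypothetical name. It uses Kahn's topological sort to check the acyclicity that the definition requires. This is an illustration, not the authors' code.

```python
from collections import defaultdict, deque


def is_dag(nodes, edges):
    """Check that a candidate graph G = (V, L) is acyclic using Kahn's algorithm.

    `nodes` lists the events (hypothetical labels); `edges` lists event
    evolution relationships as (source, target) pairs whose endpoints are in `nodes`.
    """
    indegree = {v: 0 for v in nodes}
    children = defaultdict(list)
    for src, dst in edges:
        children[src].append(dst)
        indegree[dst] += 1

    # Repeatedly remove events that have no remaining incoming relationship.
    queue = deque(v for v in nodes if indegree[v] == 0)
    removed = 0
    while queue:
        v = queue.popleft()
        removed += 1
        for w in children[v]:
            indegree[w] -= 1
            if indegree[w] == 0:
                queue.append(w)

    # Every event can be removed this way only if the graph has no cycle.
    return removed == len(nodes)


V = ["e1", "e2", "e3", "e4"]  # hypothetical events in temporal order
L = [("e1", "e2"), ("e1", "e3"), ("e2", "e4"), ("e3", "e4")]
print(is_dag(V, L))                   # True
print(is_dag(V, L + [("e4", "e1")]))  # False: e1 -> e2 -> e4 -> e1 is a cycle
```

If relationships are only ever created from an earlier event to a later one, this check always succeeds. Its main use is to catch relationships that were added in both directions.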
We propose to measure the event content similarity between two events as the cosine similarity of their event term vectors. This similarity is then combined with two decaying factors, temporal proximity and document distributional proximity, to obtain the confidence of each event evolution relationship.

Event Content Similarity: We use a simple bag-of-words model to represent the textual content of each news story. The event term vector of event ε_i is the average of the document term vectors of the stories that belong to ε_i. TF weights are used instead of the traditional TF-IDF weights. The event content similarity between events ε_i and ε_j is

$$\cos\big(etv(\varepsilon_i), etv(\varepsilon_j)\big) = \frac{etv(\varepsilon_i)\cdot etv(\varepsilon_j)}{\lVert etv(\varepsilon_i)\rVert\,\lVert etv(\varepsilon_j)\rVert}$$

where etv(·) is the event term vector representation of the set of stories belonging to an event.

Temporal Proximity: Assume the timestamp of an event ε_i is a time interval [s_i, e_i], and consider two events ε_i and ε_j with s_i ≤ s_j. Intuitively, the farther apart two events are along the timeline, the less likely an event evolution relationship exists between them. The temporal proximity between two events therefore decays with the temporal distance between them. It is parameterized by the event horizon T, defined as the time span of the entire news affair, and by α, a time-decaying weight between 0 and 1.

Document Distributional Proximity: In some cases, the proximity of news stories in their distribution is more useful than temporal proximity for measuring event evolution, for example when there is a burst of events and stories. We define the document distributional proximity in terms of three quantities. m is the number of documents that belong to the events happening between ε_i and ε_j. N is the total number of documents in the topic. β is a decaying factor.

Static Thresholding: To prune the generated event evolution graphs, we compute the confidence of every event evolution relationship. We then filter out undesirable relationships with a static threshold. The pruned graph G = (V, L) keeps only the relationships whose confidence meets the threshold.

Figure: Flat hierarchical structure of news topics in TDT.
Figure: Event evolution graph representation of news topics.

Experiments: The baseline is the event threading model combined with either the Nearest Parent or the Best Similarity graph model. When combined with static thresholding, the event evolution model substantially outperforms these rival models.

Figure: Precision and recall curves (interpolated to the standard 11 levels) of the comparative experimental results (α = 0.5, β = 0.5).
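The scoring and pruning steps described above (event content similarity, the two decaying factors and static thresholding) can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation.

The event content similarity follows the text: the cosine of averaged TF vectors. The transcript does not preserve the formulas for temporal proximity, document distributional proximity or the confidence combination, so the sketch makes the following assumptions:

- The two proximities use exponential decays, exp(−α·d/T) and exp(−β·m/N).
- Both proximities are multiplied with the content similarity.
- The temporal distance d is the gap between the end of ε_i and the start of ε_j.
- The threshold `theta` is a hypothetical value.

All function names and the event dictionary keys are also hypothetical.

```python
import math
from collections import Counter

# From the text: bag-of-words TF vectors, event term vector = average of the
# story vectors, cosine event content similarity.
# ASSUMPTIONS (formulas not preserved in the transcript): exponential decay for
# both proximities, multiplicative combination, gap-based temporal distance.


def tf_vector(story):
    """Term-frequency vector of one story (plain TF, no IDF)."""
    return Counter(story.lower().split())


def event_term_vector(stories):
    """etv(.): average of the TF vectors of the stories in one event."""
    total = Counter()
    for story in stories:
        total.update(tf_vector(story))
    return {term: count / len(stories) for term, count in total.items()}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def temporal_proximity(ev_i, ev_j, horizon, alpha):
    """ASSUMED form exp(-alpha * d / T), with d the gap between ev_i's end and
    ev_j's start (0 when the intervals overlap) and T the event horizon."""
    gap = max(0.0, ev_j["start"] - ev_i["end"])
    return math.exp(-alpha * gap / horizon)


def distributional_proximity(m, n_total, beta):
    """ASSUMED form exp(-beta * m / N): m stories in events between the pair,
    N stories in the whole topic."""
    return math.exp(-beta * m / n_total)


def evolution_edges(events, alpha=0.5, beta=0.5, theta=0.1):
    """Return (i, j, confidence) for relationships that pass the static threshold.

    `events` is a list of dicts with hypothetical keys "stories", "start", "end",
    sorted by start time; edges only point forward in time, so the result is a DAG.
    """
    vectors = [event_term_vector(ev["stories"]) for ev in events]
    sizes = [len(ev["stories"]) for ev in events]
    n_total = sum(sizes)
    horizon = (max(ev["end"] for ev in events)
               - min(ev["start"] for ev in events)) or 1.0
    edges = []
    for i in range(len(events)):
        for j in range(i + 1, len(events)):
            m = sum(sizes[i + 1:j])  # stories in events strictly between i and j
            conf = (cosine(vectors[i], vectors[j])
                    * temporal_proximity(events[i], events[j], horizon, alpha)
                    * distributional_proximity(m, n_total, beta))
            if conf >= theta:  # static threshold; theta is a hypothetical value
                edges.append((i, j, round(conf, 3)))
    return edges


events = [
    {"stories": ["gunmen seize school hostages",
                 "school hostages held by gunmen"], "start": 0, "end": 1},
    {"stories": ["negotiations over school hostages"], "start": 1, "end": 2},
    {"stories": ["football league results announced"], "start": 2, "end": 3},
    {"stories": ["hostages freed from school"], "start": 3, "end": 4},
]
print(evolution_edges(events))
# [(0, 1, 0.516), (0, 3, 0.329), (1, 3, 0.399)]
```

In this toy topic, the unrelated football event gets no edges. The later pairs are damped by both decaying factors: events 0 and 3 share the same content similarity as events 0 and 1, but they are further apart in time and separated by two stories. Candidate edges only point from an earlier event to a later one in start-time order, so the pruned graph is acyclic by construction. α = β = 0.5 mirrors the experimental setting, while `theta = 0.1` is arbitrary.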