Event Detection and Summarization in Weblogs with Temporal Collocations Chun-Yuan Teng and Hsin-Hsi Chen Department of Computer Science and Information.

Event Detection and Summarization in Weblogs with Temporal Collocations Chun-Yuan Teng and Hsin-Hsi Chen Department of Computer Science and Information Engineering National Taiwan University Taipei, Taiwan

2 Outline
Motivation
Temporal collocation
Event detection and summarization using temporal collocations
Experiments
– Datasets
– Evaluation of event detection
– Evaluation of event summarization
Conclusion

3 Motivation
Weblogs
– contain abundant life experiences and public opinions on different topics
– are highly sensitive to events occurring in the real world
– are associated with the personal information of bloggers
Problem
– How can we know what bloggers write about and discuss over time?
– Event detection is fundamental

4 Google Trends
– Plots the frequency of a word and the frequency of news over time
– E.g., select the news with the highest frequency of “president”
Ambiguous peak
– We do not know which president caused the peak of “president”.

5 Collocations
A combination of words gives a specific meaning.
Measures such as mean and variance, hypothesis testing, and mutual information are used to model the relationship between terms.
Can we model collocations over time?

6 Temporal Collocation
Mutual Information
– I(x,y) = log [ P(x,y) / (P(x) P(y)) ]
Temporal Mutual Information
– I(x,y|t) = log [ P(x,y|t) / (P(x|t) P(y|t)) ]
– P(x,y|t) denotes the probability of co-occurrence of terms x and y at timestamp t.
– P(x|t) and P(y|t) denote the probabilities of x and y at timestamp t.
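As a rough illustration of the temporal mutual information defined on this slide, the sketch below estimates the three probabilities from sentence counts at a single timestamp. The sentence-level estimate, the natural logarithm, the function name `temporal_mutual_information`, and the toy sentences are all assumptions made for the example; the slides do not say how the probabilities are estimated.

```python
import math


def temporal_mutual_information(sentences, x, y):
    """Estimate I(x,y|t) from the sentences observed at one timestamp t.

    `sentences` is a list of token sets. Probabilities are estimated as the
    fraction of sentences containing the term(s); this estimator is an
    assumption, not something stated on the slides.
    """
    n = len(sentences)
    count_x = sum(1 for tokens in sentences if x in tokens)
    count_y = sum(1 for tokens in sentences if y in tokens)
    count_xy = sum(1 for tokens in sentences if x in tokens and y in tokens)
    if count_xy == 0:
        return None  # the pair never co-occurs at t, so the log is undefined
    # log( P(x,y|t) / (P(x|t) * P(y|t)) ) with each P = count / n
    return math.log((count_xy * n) / (count_x * count_y))


# Hypothetical sentences for one day, already tokenized.
sentences_may_10 = [
    {"typhoon", "chanchu"},
    {"typhoon", "chanchu", "china"},
    {"ice", "cream"},
    {"red", "sox"},
]
mi = temporal_mutual_information(sentences_may_10, "typhoon", "chanchu")
```

In this toy data the two terms always appear together, so the estimate is log 2; a pair that never co-occurs returns `None` rather than negative infinity.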

7 Temporal Collocation
Change of Temporal Mutual Information
– C(x,y,t1,t2) = I(x,y|t2) − I(x,y|t1)
– C(x,y,t1,t2) is the change of temporal mutual information of terms x and y over the time interval [t1, t2]
– I(x,y|t1) and I(x,y|t2) are the temporal mutual information at timestamps t1 and t2, respectively

8 Event Detection
Identify the collocations resulting in events
Retrieve the descriptions of events

9 System Architecture
Pre-processing phase
– parse the weblogs
– retrieve the collocations
Event detection phase
– detect the unusual peak of the change of temporal mutual information
– identify the set of collocations resulting in an event in a specific time duration
Event summarization phase
– extract the collocations related to the seed collocations found in a specific time duration

10 Pre-processing Phase
Retrieve the collocations from the sentences in blog posts
– Propose the candidates within a window size
– Remove those candidates containing stop-words or with a low change of temporal mutual information
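The pre-processing step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the window is interpreted as "up to `window` tokens to the right", and the stop-word list, the function names, and the threshold parameter `min_change` are hypothetical.

```python
# Tiny illustrative stop-word list (assumption; the slides do not give one).
STOP_WORDS = {"the", "a", "an", "of", "in", "and", "to", "is"}


def candidate_collocations(tokens, window=2):
    """Propose ordered word pairs whose second word lies within `window`
    tokens to the right of the first, skipping pairs with a stop-word."""
    pairs = []
    for i, left in enumerate(tokens):
        for right in tokens[i + 1 : i + 1 + window]:
            if left in STOP_WORDS or right in STOP_WORDS:
                continue
            pairs.append((left, right))
    return pairs


def drop_low_change(pairs, change_of_mi, min_change):
    """Keep only candidates whose change of temporal mutual information
    reaches `min_change`; pairs missing from the lookup count as 0."""
    return [p for p in pairs if change_of_mi.get(p, 0.0) >= min_change]


pairs = candidate_collocations(["the", "typhoon", "hit", "china"], window=2)
kept = drop_low_change(pairs, {("typhoon", "china"): 1.2}, min_change=1.0)
```

Here `pairs` holds the three stop-word-free candidates from the sentence, and `kept` retains only the one whose (hypothetical) change score passes the threshold.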

11 Event Detection Phase
Remove the regular pattern by seasonal index
Measure the unusual peak of temporal mutual information to detect the plausible events
– change of temporal mutual information, (MI2-MI1), favors the events with high frequency
– relative change of temporal mutual information, (MI2-MI1)/MI1, favors the events with low mutual information
MI1 and MI2: temporal mutual information at timestamps t1 and t2
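The two peak measures on this slide are simple enough to write down directly. The sketch below assumes MI1 and MI2 are already computed; the function names and the numeric values are hypothetical, and the seasonal-index step is left out because the slides do not specify how it is calculated.

```python
def change_of_mi(mi1, mi2):
    """Absolute change of temporal mutual information: MI2 - MI1."""
    return mi2 - mi1


def relative_change_of_mi(mi1, mi2):
    """Relative change: (MI2 - MI1) / MI1; undefined when MI1 is zero."""
    if mi1 == 0:
        return None
    return (mi2 - mi1) / mi1


# Hypothetical (MI1, MI2) values for two collocations.
scores = {
    "established pair": (4.0, 6.0),   # already strongly associated at t1
    "emerging pair": (0.5, 2.0),      # weakly associated at t1
}
by_change = max(scores, key=lambda k: change_of_mi(*scores[k]))
by_relative = max(scores, key=lambda k: relative_change_of_mi(*scores[k]))
```

With these made-up numbers the absolute change ranks the established pair first (2.0 against 1.5), while the relative change ranks the emerging pair first (3.0 against 0.5), which is the contrast the slide describes.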

12 Event Summarization Phase
Select the collocations with the highest mutual information with the word w in a seed collocation
– Place the seed collocation into a collocation network
– Add the collocation having the highest mutual information
– Compute the mutual information of the multiword collocations when a new collocation is added
– Stop and return the words in the collocation network if the multiword mutual information is lower than a threshold
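The greedy expansion listed above might look like the sketch below. Several points are assumptions: the slides do not define the multiword mutual information, so a stand-in (`average_pairwise_mi`, the mean of the pairwise scores) is used; the word that would push the score below the threshold is not added; and the pairwise scores are hypothetical.

```python
from itertools import combinations


def average_pairwise_mi(words, pair_mi):
    """Stand-in for multiword mutual information (assumption): the mean
    pairwise score over all word pairs, with missing pairs counted as 0."""
    pairs = list(combinations(sorted(words), 2))
    return sum(pair_mi.get(p, 0.0) for p in pairs) / len(pairs)


def summarize_event(seed, pair_mi, threshold, multiword_mi=average_pairwise_mi):
    """Grow a collocation network from a seed collocation.

    `pair_mi` maps alphabetically sorted word pairs to mutual information.
    """
    network = set(seed)
    while True:
        best_word, best_score = None, float("-inf")
        for (a, b), score in pair_mi.items():
            for inside, outside in ((a, b), (b, a)):
                if inside in network and outside not in network and score > best_score:
                    best_word, best_score = outside, score
        if best_word is None:
            return network  # nothing left to attach
        if multiword_mi(network | {best_word}, pair_mi) < threshold:
            return network  # adding the word would fall below the threshold
        network.add(best_word)


# Hypothetical pairwise scores, keyed by sorted word pairs.
pair_mi = {
    ("chanchu", "typhoon"): 5.0,
    ("china", "typhoon"): 3.0,
    ("chanchu", "china"): 2.5,
    ("earthquake", "typhoon"): 1.0,
}
summary = summarize_event(("typhoon", "chanchu"), pair_mi, threshold=2.0)
```

In this toy run "china" is attached (average 3.5), and the expansion stops before "earthquake" because the average would drop to about 1.92.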

13 A Collocation Network

14 Data Sets
ICWSM weblog data set
– collected from May 1, 2006 through May 20, 2006
– about 20 GB
– 2,734,518 English weblog articles used for analysis
Gold standard
– The events posted in Wikipedia are not always complete, so we adopt recall rate as the measure
– The events listed in Wikipedia are not always discussed in weblogs, so we remove the events that are listed in Wikipedia but not referenced in the weblogs
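The evaluation protocol described for the gold standard can be illustrated with a small recall computation. The function name, the set-based representation of events, and the event labels are hypothetical; the numbers are toy values and are unrelated to the reported results.

```python
def recall(gold_events, detected_events, events_in_weblogs):
    """Recall against the gold standard, after dropping gold events that
    are never referenced in the weblogs."""
    usable_gold = gold_events & events_in_weblogs
    if not usable_gold:
        return None  # no gold event is discussed in the weblogs
    return len(usable_gold & detected_events) / len(usable_gold)


# Toy labels: event "e" is listed in the gold standard but never blogged about.
gold = {"a", "b", "c", "d", "e"}
in_weblogs = {"a", "b", "c", "d"}
detected = {"a", "b", "x"}
toy_recall = recall(gold, detected, in_weblogs)
```

Only four gold events remain after filtering, two of them are detected, and the extra detection "x" does not affect recall.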

15 Evaluation of Event Detection Phase
Recall rate: 75%

16 Performance of Event Detection Phase

17 Discussion

Change of MI (left) favors regular events and events with high frequency
– Time: May 03
– Feeling: feel left

Collocations, by change of MI:
May
Illegal immigrants
Feel left
Saturday night
Past weekend
White house
Red sox
Album tool
Sunday morning
Sunday night
Current music
Hate studying
Stephen Colbert
Thursday night
Can’t believe
Feel asleep
Ice cream
Oh god
Illegal immigration
Pretty cool
Illegal aliens

Relative change (right) favors persons or special events
– Terrorists killed on May 3: zacarias moussaoui, pramod mahajan
– Best actress award at the Golden Globe Awards on May 3: Geena Davis

Collocations               Relative change
casinos online
zacarias moussaoui
Tsunami warning
Conspirator zacarias       71.62
Artist formerly            57.04
Federal jury               41.78
Wed
Pramod mahajan             35.41
BBC version                35.21
Geena davis                33.64
Diet sodas                 32.50
Ving rhames                31.63
Stock picks                29.09
Happy hump                 28.45
Wong kan                   28.34
Sixapartcom movabletype
Aaron echolls              27.48
Phnom Penh                 25.78
Livejournal sixapartcom
George yeo                 20.34

18 Evaluation of Event Summarization
Method 1: Employ the highest temporal mutual information
Method 2: Utilize the highest product of temporal mutual information and change of temporal mutual information
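The difference between the two methods comes down to the ranking key. The sketch below assumes each candidate carries a temporal mutual information value and a change value; the function names and all numbers are hypothetical and are chosen only to show how a stable but uninformative collocation can rank high under Method 1 and drop under Method 2.

```python
def rank_method_1(candidates, top_k):
    """Method 1: rank by temporal mutual information alone."""
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    return [name for name, _, _ in ranked[:top_k]]


def rank_method_2(candidates, top_k):
    """Method 2: rank by the product of temporal mutual information
    and change of temporal mutual information."""
    ranked = sorted(candidates, key=lambda c: c[1] * c[2], reverse=True)
    return [name for name, _, _ in ranked[:top_k]]


# (collocation, temporal MI, change of temporal MI); values are made up.
candidates = [
    ("typhoon named", 4.0, 3.0),
    ("typhoon earthquake", 5.0, 0.1),
    ("near china", 3.0, 2.0),
]
top_1 = rank_method_1(candidates, top_k=2)
top_2 = rank_method_2(candidates, top_k=2)
```

With these values Method 1 returns "typhoon earthquake" first, while Method 2 returns "typhoon named" and "near china" and leaves "typhoon earthquake" out, because its change of mutual information is small.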

19 An Example of Event Retrieval
Typhoon Chanchu
– Typhoon Chanchu appeared in the Pacific Ocean around May 10, passed through the Philippines and China, and caused disasters in these areas.

20 Event Summarization for Typhoon Chanchu Using Method 1

21 Event Summarization for Typhoon Chanchu Using Method 2

22 Some Observations
The appearance of typhoon Chanchu cannot be found among the events listed in Wikipedia for May 10.
We can identify the appearance of typhoon Chanchu from collocations that describe it, such as “typhoon named” and “typhoon eye.”
The path of typhoon Chanchu can also be inferred from retrieved collocations such as “Philippine China” and “near China”.
The responses of bloggers, such as “unexpected typhoon” and “8 typhoons”, are also extracted.

23 Method 1 vs. Method 2
Method 1 shows more noise than Method 2.
The term “typhoon earthquake” is extracted by Method 1.
The term “typhoon earthquake” is not retrieved by Method 2 because Method 2 also considers the change of temporal mutual information.

24 Concluding Remarks
What we have done
– Introduce temporal mutual information to capture term-term association over time in weblogs
– Select the extracted collocations with an unusual peak in the relative change of temporal mutual information to represent an event
– Collect the collocations with the highest product of mutual information and change of temporal mutual information to summarize the specific event
Future work
– Model the collocations over time and location
– Model the relationship between the user-preferred usage of collocations and the profile of users

25 Thanks