Alexander Kotov, ChengXiang Zhai, Richard Sproat University of Illinois at Urbana-Champaign.

Roadmap: Problem definition, Previous work, Approach, Experiments, Summary

Motivation: Web data is generated by a large number of textual streams (news, blogs, tweets, etc.). Bursts of entity mentions (people, locations) correspond to particular events. Bursts of mentions of one entity are influenced by bursts of other entities. Intuition: bursts of semantically related entities should be temporally correlated.

Problem definition: [Figure: mention counts of entity 1 and entity 2 over time, annotated with sparsity, magnitude, and time lag; are the two correlated?]

Temporally correlated bursts. Problem: given a collection of textual streams, discover named entities with correlated bursts. Applications: provide multilingual summaries of real-life events; estimate the social impact of a particular event in different countries; differentiate between local and global events; discover transliterations of named entities.

Previous work: Burst detection: infinite-state automaton (Kleinberg 02), factorial HMMs (Krause 06), wavelet transformation (Zhu 03). Stream correlation: distance-based measures such as the Pearson coefficient (Chien 05), singular spectrum transformation (Ide 05), topic-based methods (PLSA, LDA; Wang 09).

Previous work (limitations): Smoothing is efficient for large amounts of data, but not precise. Existing methods do not abstract away from the raw data. Distance-based measures suffer from magnitude and sparsity problems. Temporal lags are not considered.
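To make the lag and sparsity problems concrete, the sketch below computes a plain Pearson correlation (one of the stream-correlation measures listed above) on two hypothetical daily mention-count series in which the second entity bursts exactly one day after the first. The series and the `pearson` helper are illustrative assumptions, not data or code from the talk.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length series (NaN if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        # A constant (e.g. all-zero, very sparse) series has no variance.
        return float("nan")
    return cov / math.sqrt(vx * vy)


# Hypothetical daily mention counts: entity_b bursts one day after entity_a.
entity_a = [0, 0, 0, 10, 2, 0, 0, 0, 0, 0]
entity_b = [0, 0, 0, 0, 10, 2, 0, 0, 0, 0]

print(round(pearson(entity_a, entity_a), 3))  # same series, no lag
print(round(pearson(entity_a, entity_b), 3))  # same burst shape, one-day lag
```

The identical series scores 1.0, while the one-day shift drops the coefficient to about 0.06 even though both bursts describe the same event. Pearson correlation itself is invariant to uniformly rescaling a series, so the magnitude problem shows up most directly in raw distance measures such as Euclidean distance.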

Approach: Difference in magnitude: normalization with a Markov-modulated Poisson process (MMPP). Temporal lag: flexible alignment of bursts using dynamic programming.

Markov-modulated Poisson process: an ergodic Markov chain over a finite number of states. Each state is associated with a Poisson distribution. The burstiness of a state is represented by the intensity parameter of its Poisson distribution. States are labeled by the rank of their intensity parameter.
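The generative view on this slide can be sketched with the standard library alone. The transition matrix, the intensities, and the helper names (`rank_labels`, `sample_poisson`, `sample_mmpp`) are hypothetical; the sketch only illustrates an ergodic chain whose states emit Poisson counts and are relabeled by the rank of their intensity.

```python
import math
import random
from typing import List, Sequence, Tuple


def rank_labels(rates: Sequence[float]) -> List[int]:
    """Relabel each state by the rank of its Poisson intensity (0 = least bursty)."""
    order = sorted(range(len(rates)), key=lambda k: rates[k])
    labels = [0] * len(rates)
    for rank, k in enumerate(order):
        labels[k] = rank
    return labels


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for the small rates used here."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_mmpp(trans: Sequence[Sequence[float]], rates: Sequence[float],
                length: int, start: int = 0,
                seed: int = 0) -> Tuple[List[int], List[int]]:
    """Walk the Markov chain and emit one Poisson count per time step (day)."""
    rng = random.Random(seed)
    states, counts = [], []
    s = start
    for _ in range(length):
        states.append(s)
        counts.append(sample_poisson(rates[s], rng))
        # Next state drawn from row s of the transition matrix.
        s = rng.choices(range(len(rates)), weights=trans[s])[0]
    return states, counts


# All transition probabilities are positive, so the chain is ergodic.
trans = [[0.90, 0.08, 0.02],
         [0.05, 0.90, 0.05],
         [0.10, 0.10, 0.80]]
rates = [5.0, 0.5, 20.0]     # hypothetical intensities, deliberately unordered
labels = rank_labels(rates)  # [1, 0, 2]: state 1 is the quietest
states, counts = sample_mmpp(trans, rates, length=30, start=1, seed=7)
ranked = [labels[s] for s in states]
```

Labeling by rank rather than by raw intensity is what lets two entities with very different absolute mention volumes share the same vocabulary of burst levels.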

Normalization: [Figure: raw mention counts of an entity over time and the corresponding MMPP states]
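One way to turn raw counts into a normalized state sequence is Viterbi decoding: given MMPP parameters, find the most likely state path for an entity's daily counts, so that entities with very different raw magnitudes become comparable sequences of rank labels. This is a minimal sketch assuming the parameters are already known (in practice they would have to be estimated, for example with EM); the names and numbers are hypothetical, and the talk does not say that this exact decoding procedure was used.

```python
import math
from typing import List, Sequence


def poisson_logpmf(k: int, lam: float) -> float:
    """log P(k | lam) for a Poisson distribution (lam > 0)."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def viterbi_states(counts: Sequence[int], trans: Sequence[Sequence[float]],
                   rates: Sequence[float], init: Sequence[float]) -> List[int]:
    """Most likely MMPP state path (log space); assumes all probabilities > 0."""
    n = len(rates)
    log_t = [[math.log(p) for p in row] for row in trans]
    score = [math.log(init[s]) + poisson_logpmf(counts[0], rates[s])
             for s in range(n)]
    back = []  # back[t-1][s] = best predecessor of state s at step t
    for c in counts[1:]:
        ptr, new = [], []
        for s in range(n):
            prev = max(range(n), key=lambda r: score[r] + log_t[r][s])
            ptr.append(prev)
            new.append(score[prev] + log_t[prev][s] + poisson_logpmf(c, rates[s]))
        back.append(ptr)
        score = new
    s = max(range(n), key=lambda r: score[r])
    path = [s]
    for ptr in reversed(back):
        s = ptr[s]
        path.append(s)
    return path[::-1]


rates = [1.0, 10.0, 40.0]  # hypothetical, sorted so index == rank label
trans = [[0.90, 0.05, 0.05],
         [0.05, 0.90, 0.05],
         [0.05, 0.05, 0.90]]
init = [1 / 3] * 3
counts = [0, 1, 0, 12, 9, 45, 38, 2, 1, 0]
print(viterbi_states(counts, trans, rates, init))
```

The high self-transition probabilities discourage the path from flickering between states on isolated noisy days, which is one reason a state sequence is a smoother input to the alignment step than raw counts.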

Normalization: MMPP consistently outperforms the baseline. Optimal performance is achieved with 3 states.

Burst Alignment

Burst alignment: [Figure: perfect alignment, exponential penalty, logarithmic penalty]

Burst alignment: a quadratic penalty function combined with a reward constant of 2 is optimal; the maximum permitted temporal gap is 1 day.
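The alignment step can be sketched as a weighted, monotone (LCS-style) dynamic program over burst days. The scoring scheme is an assumption chosen to mirror the slide's parameters: a matched pair of bursts earns a reward constant of 2 minus a quadratic penalty on their temporal gap, and bursts more than 1 day apart cannot be matched. The function names are hypothetical, and the paper's exact objective may differ.

```python
from typing import Callable, List, Sequence


def burst_days(states: Sequence[int]) -> List[int]:
    """Days whose MMPP rank label is above 0, i.e. not the quietest state."""
    return [t for t, s in enumerate(states) if s > 0]


def align_bursts(x: Sequence[int], y: Sequence[int], reward: float = 2.0,
                 max_gap: int = 1,
                 penalty: Callable[[int], float] = lambda gap: gap * gap) -> float:
    """Best monotone alignment score between the bursts of two state sequences.

    Hypothetical scoring: a matched pair earns reward - penalty(gap); pairs more
    than max_gap days apart cannot match; unmatched bursts contribute 0.
    """
    p, q = burst_days(x), burst_days(y)
    # dp[i][j] = best score aligning the first i bursts of x with the first j of y
    dp = [[0.0] * (len(q) + 1) for _ in range(len(p) + 1)]
    for i in range(1, len(p) + 1):
        for j in range(1, len(q) + 1):
            best = max(dp[i - 1][j], dp[i][j - 1])  # leave one burst unmatched
            gap = abs(p[i - 1] - q[j - 1])
            if gap <= max_gap:
                best = max(best, dp[i - 1][j - 1] + reward - penalty(gap))
            dp[i][j] = best
    return dp[len(p)][len(q)]
```

Usage: for `x = [0, 0, 2, 0, 0, 1, 0]` and `y = [0, 0, 0, 2, 0, 1, 0]`, the first bursts are matched across a one-day gap (2 - 1) and the second pair matches exactly (2), for a total score of 3.0. Swapping in a different `penalty` callable is how the exponential or logarithmic variants from the previous slide could be compared under this sketch.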

Dataset: news data crawled from RSS feeds over 4 months; basic named entity recognition; basic stemming.
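Turning crawled, entity-tagged articles into per-entity daily mention counts (the input to the MMPP step) might look like the following. The `(date, entity)` input format and the `daily_counts` helper are assumptions for illustration; entity recognition and stemming are assumed to have already happened.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List, Tuple


def daily_counts(mentions: Iterable[Tuple[date, str]],
                 start: date, end: date) -> Dict[str, List[int]]:
    """Dense per-entity series of mention counts, one slot per day in [start, end]."""
    days = (end - start).days + 1
    series: Dict[str, List[int]] = defaultdict(lambda: [0] * days)
    for day, entity in mentions:
        offset = (day - start).days
        if 0 <= offset < days:  # ignore mentions outside the crawl window
            series[entity][offset] += 1
    return dict(series)


# Hypothetical mentions extracted from three days of feeds.
mentions = [(date(2008, 1, 1), "davos"), (date(2008, 1, 1), "davos"),
            (date(2008, 1, 3), "ledger")]
print(daily_counts(mentions, date(2008, 1, 1), date(2008, 1, 3)))
```

Keeping explicit zero days matters: both the MMPP and the alignment step reason about when bursts occur, so silent days must be present in the series rather than dropped.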

Correlated bursts. Real-life events: Pattern 1: World Economic Forum in Davos, Switzerland, and death of actor Heath Ledger; Pattern 2: death of Bobby Fischer; Pattern 3: assassination of Benazir Bhutto; Pattern 4: major trading loss incident at a French bank and death of George Habash.

Mining transliterations. Static aligned corpora: pros: identical or semantically related contents, temporal topical alignment; cons: limited coverage. Web: pros: covers almost any domain; cons: differences in burst magnitude, temporal lag between bursts.

Transliteration: MMPP+DP outperforms one baseline (CS) in all entropy categories and the other baseline (PC) for low- and medium-entropy (more bursty) entities. The combination MMPP+DP performs better than MMPP alone.
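The entropy categories can be illustrated by computing the Shannon entropy of an entity's normalized daily mention distribution: a single sharp burst gives low entropy, while evenly spread mentions give high entropy. Both the exact entropy definition and the bucket thresholds below are hypothetical, since the slide does not specify them.

```python
import math
from typing import Sequence


def temporal_entropy(counts: Sequence[int]) -> float:
    """Shannon entropy (bits) of the mention distribution over days."""
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log2(p)
    return h


def entropy_bucket(counts: Sequence[int], low: float = 1.0, high: float = 2.5) -> str:
    """Hypothetical thresholds: below `low` is bursty, at or above `high` is flat."""
    h = temporal_entropy(counts)
    if h < low:
        return "low"
    if h < high:
        return "medium"
    return "high"


print(entropy_bucket([0, 0, 8, 0]))  # all mentions on one day
print(entropy_bucket([1] * 8))       # mentions spread evenly over 8 days
```

Under this reading, "low- and medium-entropy" entities are those whose mentions concentrate in a few days, which is exactly where burst-based alignment has the most signal to work with.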

Summary: a novel multi-stream text mining problem. Our approach can effectively discover correlated bursts corresponding to major and minor real-life events. It is effective for unsupervised discovery of transliterations. The method is data-independent and not limited to the textual domain.

Contributions: the first method to use MMPP for burst detection in textual streams; an algorithm for temporally flexible stream correlation based on bursts; an unsupervised method for language-independent transliteration without any linguistic knowledge.

Future work: applying the proposed method to non-textual data (e.g., sensor streams); burst correlations between entities in different types of Web 2.0 data (news and tweets, news and blogs, news and tags, etc.).