Attention and Event Detection Identifying, attributing and describing spatial bursts Early online identification of attention items in social media Louis Gong

Presentation transcript:

Attention and Event Detection Identifying, attributing and describing spatial bursts Early online identification of attention items in social media Louis Gong

Identifying, attributing and describing spatial bursts. Outline: Problem Description, Related Works, Solution, Experiment & Result, Q&A. Michael Mathioudakis, Nilesh Bansal, and Nick Koudas. Identifying, attributing and describing spatial bursts. Proc. VLDB Endow. 3, 1-2 (September 2010).

BlogScope Automatically collects information (blogosphere, news sources, social networks, online forums). Supports advanced information retrieval tasks with data mining and language processing. Warehouses metadata about the content (time of creation, demographic profile of the author).

Problem Description User-generated content on blogs, microblogging websites, wikis and social networks proliferates at a very high rate. The goal is to automate the process of information discovery over this vast collection of information. Examples: Barack Obama (2008); Bin Laden (recently).

Related Works
1. J. Kleinberg. Bursty and hierarchical structure in streams. In KDD. Proposes a model for burst identification over document streams.
2. J. M. Kleinberg and E. Tardos. Approximation algorithms for classification problems with pairwise relationships: metric labeling and Markov random fields. J. ACM, 2002. Provides a 2-approximation linear programming algorithm for the spatial burst detection problem.
3. Statistical discrepancy functions quantify the difference between distributions and are commonly used to identify regions where two spatial distributions differ significantly. Such regions can be interpreted as areas where one spatial distribution exhibits a burst in comparison with the other.

Solution Spatial burst identification. Burst attribution. Keyword-based description of bursts.

Spatial Bursts G: a grid; for a suitable choice of granularity, each geographical entity of interest (e.g. a city) corresponds to a cell. Rs: the spatial distribution of the related documents published within the temporal interval t. Ds: the spatial distribution of all the documents published within t. Spatial bursts are identified as cells for which the value of Rs is large in comparison with Ds.
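A minimal sketch of the comparison between Rs and Ds, assuming each distribution is given as per-cell document counts. The function name `bursty_cells`, the ratio test and the `threshold` value are illustrative assumptions; the slides do not say which discrepancy measure the paper actually uses.

```python
def bursty_cells(related_counts, all_counts, threshold=2.0):
    """Return {cell: ratio} for cells whose share of related documents (Rs)
    is at least `threshold` times their share of all documents (Ds).

    related_counts / all_counts map a grid cell to a document count.
    The ratio test is an assumed stand-in for a discrepancy function.
    """
    total_related = sum(related_counts.values())
    total_all = sum(all_counts.values())
    if total_related == 0 or total_all == 0:
        return {}

    result = {}
    for cell, r_count in related_counts.items():
        d_count = all_counts.get(cell, 0)
        if d_count == 0:
            continue  # no baseline for this cell, so no comparison is possible
        rs = r_count / total_related   # value of Rs in this cell
        ds = d_count / total_all       # value of Ds in this cell
        ratio = rs / ds
        if ratio >= threshold:
            result[cell] = ratio
    return result


related = {"Toronto": 60, "New York": 30, "London": 10}
everything = {"Toronto": 100, "New York": 500, "London": 400}
print(bursty_cells(related, everything))
```

In this made-up example only the Toronto cell is flagged, because its share of related documents is several times its share of all documents.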

Burst Attribution Attribute the burst to profile features. 1. Focus on a specific set of bursty cells and ask which demographic factors are those in the absence of which no burst would have been detected (e.g. "Toronto Film Festival"). 2. Compare a bursty region with a non-bursty region and find the demographic factors that make the difference.
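The first attribution question can be sketched as a leave-one-out test: drop all documents whose author has a given profile value and check whether the cell would still look bursty. This is only an illustration under assumptions: documents are plain dicts with a `cell` key plus demographic keys, burstiness is a simple ratio test with a hypothetical `threshold`, and the paper's actual attribution technique may differ.

```python
def cell_ratio(related_docs, all_docs, cell):
    """Share of `cell` among related documents divided by its share among all documents."""
    r_cell = sum(1 for d in related_docs if d["cell"] == cell)
    d_cell = sum(1 for d in all_docs if d["cell"] == cell)
    if not related_docs or d_cell == 0:
        return 0.0
    return (r_cell / len(related_docs)) / (d_cell / len(all_docs))


def explaining_factors(related_docs, all_docs, cell, features, threshold=2.0):
    """Return (feature, value) pairs whose removal makes the burst in `cell` disappear."""
    factors = []
    for feature in features:
        values = {d[feature] for d in related_docs if d["cell"] == cell}
        for value in sorted(values):
            # Remove every document whose author has this profile value.
            r = [d for d in related_docs if d[feature] != value]
            a = [d for d in all_docs if d[feature] != value]
            if cell_ratio(r, a, cell) < threshold:
                factors.append((feature, value))
    return factors


def make_docs(cell, age, n):
    """Build n identical toy documents (hypothetical data)."""
    return [{"cell": cell, "age": age} for _ in range(n)]


related = (make_docs("Toronto", "young", 8) + make_docs("Toronto", "old", 1)
           + make_docs("Other", "young", 1) + make_docs("Other", "old", 2))
everything = (make_docs("Toronto", "young", 10) + make_docs("Toronto", "old", 10)
              + make_docs("Other", "young", 40) + make_docs("Other", "old", 40))
print(explaining_factors(related, everything, "Toronto", ["age"]))
```

With this toy data the Toronto burst disappears once documents by young authors are removed, so that profile value is reported as the explaining factor.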

Keyword-based description of bursts. Query Expansion: identify the keywords $w_i$ highly related to q (bursts for a query q) and form the expanded queries $q \cup w_i$. Curve Estimation: the keywords w that occur frequently together with q often exhibit a burst themselves over the same interval. $q'[t]_{\text{est}} = (1 + \,\cdot\,)\,\min\{\, b(q)[t],\; b(w_i)[t] \,\}$ (the term added to 1 did not survive in the transcript).
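The curve estimation step can be sketched as an element-wise minimum of two burst curves. This assumes b(q)[t] and b(w)[t] are per-time-step burst scores given as equal-length lists; the `slack` parameter is a hypothetical stand-in for the quantity added to 1 in the slide's formula, which the transcript does not show.

```python
def estimate_joint_curve(b_q, b_w, slack=0.0):
    """Estimate the burst curve of the expanded query (q together with w).

    b_q, b_w: burst scores of q and of keyword w for the same time steps.
    slack: assumed multiplicative allowance on top of the minimum.
    """
    if len(b_q) != len(b_w):
        raise ValueError("burst curves must cover the same time steps")
    return [(1.0 + slack) * min(x, y) for x, y in zip(b_q, b_w)]


b_q = [0, 2, 5, 3]
b_w = [1, 1, 4, 6]
print(estimate_joint_curve(b_q, b_w))
print(estimate_joint_curve(b_q, b_w, slack=0.5))
```

The intuition is that documents matching both q and w cannot burst more strongly than either term alone, so the smaller of the two curves gives a cheap estimate without issuing the expanded query.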

Experiment & Result Average running time of the algorithms

Experiment & Result Queries q were submitted to BlogScope, with the temporal interval qt set to the first 10 days of March. Retrieving distributions Rs and Ds for a query.

Experiment & Result Parameter Sensitivity

Summary Scalable method to identify spatial information bursts. Efficient techniques to attribute bursts to specific demographic factors. Techniques to analyze bursts and effectively identify sets of keywords that describe the burst.

Early online identification of attention items in social media. Outline: Problem Description, ISIS Model, Experiment, Result, Q&A. Michael Mathioudakis, Nick Koudas, and Peter Marbach. In Proceedings of the Third ACM International Conference on Web Search and Data Mining (WSDM '10). ACM, New York, NY, USA.

Problem Description Activity in social media is manifested via interactions that involve text, images, links and other information items. Naturally, some items attract more attention than others, expressed as large volumes of linking, commenting or tagging activity. Identifying the information items that gather much attention in such a real-time information collective is a challenging task.

Comparison (traditional & social media) Traditional web pages: graph model (PageRank). Differences: 1. Activity in social media is associated with individual items (documents, pictures, news articles), so it is reasonable to measure the importance or attention-gathering potential of each item separately. 2. Linking activity in social media is the product of continuous interaction between participating individuals. The dynamic aspects of this process are not captured by the graph model.

Comparison (traditional & social media) 3. Linking is not the only action by which structure arises in social media, as individuals also interact by commenting, sharing, recommending or rating.

Subject Proposes the first formal definition and analysis of such a model and uses it as a basis to identify attention-gathering items in an online fashion. The aim is to identify individual items that attract a significant number of actions, with the main focus on "early identification" of such items.

ISIS Model An abstraction of social media activity. Information units (units): items such as blog posts, status messages, photos, etc. in the social media stream. Information sources (sources): individuals contributing information. A source participates in two sets of stochastic processes: 1. The process of emitting information units in a streaming fashion. 2. Processes of interaction with other sources.

ISIS Model Each unit is associated with a timestamp tp and a validity period dp. The validity periods of units emitted by the same source might overlap.
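As a rough illustration of the unit abstraction, the sketch below stores a unit's timestamp tp and validity period dp and checks which units are valid at a given time. The class and function names are hypothetical, and treating validity as the half-open interval [tp, tp + dp) is an assumption; the slides do not define the interval endpoints.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """An information unit emitted by a source (names are illustrative)."""
    source: str
    timestamp: float  # t_p: when the unit was emitted
    validity: float   # d_p: length of the validity period

    def is_valid_at(self, t):
        # Assumed convention: valid on [t_p, t_p + d_p).
        return self.timestamp <= t < self.timestamp + self.validity


def valid_units(units, t):
    """Return the units whose validity period contains time t."""
    return [u for u in units if u.is_valid_at(t)]


units = [
    Unit("blog_a", timestamp=0.0, validity=10.0),
    Unit("blog_a", timestamp=5.0, validity=10.0),  # overlaps the previous unit
    Unit("blog_b", timestamp=20.0, validity=5.0),
]
print(valid_units(units, 7.0))   # both units from blog_a are valid at once
print(valid_units(units, 12.0))  # only the second blog_a unit remains valid
```

The first call shows two units from the same source being valid simultaneously, which is the overlap the slide describes.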

ISIS Model Source interaction

Experiment Setting

Result Interaction weights of posts in (a) engadget.com (b) techcrunch.com

Result Attention Gathering Posts

Result Quality vs Efficiency Trade-offs

Summary ISIS Model: a general stochastic model for interacting streaming information sources. A measure of the attention-gathering potential of information units. Experimental results on real data collected from a period of blogging activity.

Q&A

Thank You Louis Gong