EVENT IDENTIFICATION IN SOCIAL MEDIA
Hila Becker, Luis Gravano (Columbia University); Mor Naaman (Rutgers University)

Presentation transcript:


Social Media Sites Host Many “Event” Documents
- Photo-sharing: Flickr
- Video-sharing: YouTube
- Social networking: Facebook

“Event” = something that occurs at a certain time in a certain place [Yang et al. ’99]
- Popular, widely known events: Presidential Inauguration, Thanksgiving Day Parade
- Smaller events, without traditional news coverage: local food drive, street fair
- …

Example: social media documents for the “All Points West” festival, Liberty State Park, New Jersey, 8/8/08

Identifying Events and Associated Social Media Documents
- Applications
  - Event search and browsing
  - Local search
  - ……
- General approach: group similar documents via clustering

Each cluster corresponds to one event and its associated social media documents.

Event Identification: Challenges
- Uneven data quality
  - Missing, short, uninformative text
  - … but revealing structured context available: tags, date/time, geo-coordinates
- Scalability
- Dynamic data stream of event information
- Unknown number of events
  - Necessary for many clustering algorithms
  - Difficult to estimate

Clustering Social Media Documents
- Social media document representation
- Social media document similarity
- Social media document clustering
  - Clustering task: definition
  - Ensemble algorithm: combining multiple clustering results
- Preliminary evaluation

Social Media Document Representation
- Title
- Description
- Tags
- Date/Time
- Location
- All-Text

Social Media Document Similarity
- Text: tf-idf weights, cosine similarity
- Location: geo-coordinate proximity
- Time: proximity in minutes

[Figure: two documents, A and B, compared feature by feature: Title, Description, Tags, Date/Time-Keywords, Date/Time-Proximity, Location-Keywords, Location-Proximity, All-Text]
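The three similarity types listed on this slide can be sketched as below. This is a minimal illustration, not the authors' implementation: the idf variant, the one-year normalization window for time, the `max_km` cutoff for location, and all function names are assumptions chosen only to keep each score in [0, 1].

```python
import math
from collections import Counter

MINUTES_PER_YEAR = 365 * 24 * 60  # assumed normalization window for time proximity
EARTH_RADIUS_KM = 6371.0


def tfidf_vectors(token_lists):
    """Build one tf-idf weight dictionary per tokenized document."""
    n_docs = len(token_lists)
    doc_freq = Counter(t for tokens in token_lists for t in set(tokens))
    vectors = []
    for tokens in token_lists:
        term_freq = Counter(tokens)
        # Assumed idf variant: log(N / df) + 1, so shared terms keep a nonzero weight.
        vectors.append({
            t: count * (math.log(n_docs / doc_freq[t]) + 1.0)
            for t, count in term_freq.items()
        })
    return vectors


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def time_similarity(minutes_a, minutes_b, window=MINUTES_PER_YEAR):
    """Proximity in minutes, mapped to [0, 1] (the window is an assumption)."""
    return max(0.0, 1.0 - abs(minutes_a - minutes_b) / window)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two geo-coordinates, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def location_similarity(lat1, lon1, lat2, lon2, max_km=100.0):
    """Geo-coordinate proximity mapped to [0, 1] (max_km is a hypothetical cutoff)."""
    return max(0.0, 1.0 - haversine_km(lat1, lon1, lat2, lon2) / max_km)
```

Each textual feature (title, description, tags, all-text) would get its own tf-idf vectors, so one pair of documents yields several independent similarity scores.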

Social Media Document Clustering Framework

[Diagram labels: Social media documents; Document feature representation; Event clusters]

Clustering: Ensemble Algorithm
- Individual clusterers: C_title, C_tags, C_time
- Clusterer weights: W_title, W_tags, W_time (learned in a training step)
- Consensus function f(C, W): combine ensemble similarities
- Output: ensemble clustering solution
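The experimental setup describes the consensus function as a weighted average of the clusterers' binary predictions. A minimal sketch of f(C, W) under that reading follows; the function names, the dict-based data layout, and the example weights are hypothetical.

```python
def same_cluster_vote(labels, i, j):
    """Binary prediction of one clusterer: 1 if documents i and j share a cluster."""
    return 1 if labels[i] == labels[j] else 0


def pair_consensus(clusterings, weights, i, j):
    """Weighted average of the clusterers' binary votes for the pair (i, j).

    clusterings: dict mapping clusterer name -> list of cluster labels per document
    weights:     dict mapping clusterer name -> weight (assumed learned beforehand)
    """
    total_weight = sum(weights[name] for name in clusterings)
    if total_weight == 0:
        return 0.0
    weighted_votes = sum(
        weights[name] * same_cluster_vote(labels, i, j)
        for name, labels in clusterings.items()
    )
    return weighted_votes / total_weight


# Hypothetical example: three clusterers over three documents.
clusterings = {
    "title": [0, 0, 1],
    "tags": [0, 0, 0],
    "time": [0, 1, 1],
}
weights = {"title": 0.5, "tags": 0.9, "time": 0.6}  # illustrative values only
score_01 = pair_consensus(clusterings, weights, 0, 1)  # title and tags agree
score_02 = pair_consensus(clusterings, weights, 0, 2)  # only tags agrees
```

The resulting pairwise scores lie in [0, 1] and can serve as the similarity fed to a final clustering step.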

Clustering: Measuring Quality
- Homogeneous clusters
- Complete clusters
- Metric: Normalized Mutual Information (NMI)
  - Shared information between clustering solution and “ground truth”
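As a sketch of how NMI can be computed from two label lists: the version below normalizes mutual information by the arithmetic mean of the two entropies, which is one common choice. The slides do not say which normalization was used, so treat that detail (and the function names) as an assumption.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(predicted, truth):
    """Normalized mutual information: 2 * I(P; T) / (H(P) + H(T))."""
    n = len(predicted)
    joint = Counter(zip(predicted, truth))
    pred_counts = Counter(predicted)
    truth_counts = Counter(truth)
    mutual_info = sum(
        (c / n) * math.log(c * n / (pred_counts[p] * truth_counts[t]))
        for (p, t), c in joint.items()
    )
    denom = entropy(predicted) + entropy(truth)
    # Both assignments are a single cluster: treat them as identical.
    return 1.0 if denom == 0 else 2.0 * mutual_info / denom
```

NMI is invariant to how clusters are named, so `nmi([0, 0, 1, 1], [1, 1, 0, 0])` scores a perfect match.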

Experimental Setup
- Data: >270K Flickr photos
  - Event labels from Yahoo!’s “upcoming” event database
  - Split into 3 parts for training/validation/testing
- Clusterers: single-pass algorithm with centroid similarity
- Weighting scheme: Normalized Mutual Information (NMI) scores on validation set
- Consensus function: weighted average of clusterers’ binary predictions
- Final prediction step: single-pass clustering algorithm
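A single-pass clusterer with centroid similarity can be sketched as follows. The slides give no threshold or update rule, so the `threshold` parameter, the running-mean centroid update, and the function names are assumptions for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return 0.0 if norm_u == 0.0 or norm_v == 0.0 else dot / (norm_u * norm_v)


def single_pass_cluster(vectors, threshold, similarity=cosine):
    """Assign each document, in arrival order, to the most similar centroid.

    A new cluster is opened when no centroid reaches `threshold`
    (a hypothetical tuning parameter). Returns one cluster label per document.
    """
    centroids, sizes, labels = [], [], []
    for vec in vectors:
        best_idx, best_sim = None, threshold
        for idx, centroid in enumerate(centroids):
            sim = similarity(vec, centroid)
            if sim >= best_sim:
                best_idx, best_sim = idx, sim
        if best_idx is None:
            centroids.append(dict(vec))
            sizes.append(1)
            labels.append(len(centroids) - 1)
        else:
            # Update the centroid as the running mean of its members.
            n = sizes[best_idx]
            centroid = centroids[best_idx]
            for term in set(centroid) | set(vec):
                centroid[term] = (centroid.get(term, 0.0) * n + vec.get(term, 0.0)) / (n + 1)
            sizes[best_idx] = n + 1
            labels.append(best_idx)
    return labels


docs = [
    {"festival": 1.0, "music": 1.0},
    {"festival": 1.0, "music": 0.8},
    {"food": 1.0, "drive": 1.0},
    {"food": 1.0},
]
labels = single_pass_cluster(docs, threshold=0.5)
```

Because each document is seen once and the number of clusters is not fixed in advance, this style of algorithm suits a data stream with an unknown number of events.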

Preliminary Evaluation Results
- Individual clusterer performance
  - Highest NMI: Tags, All-Text
  - Lowest NMI: Description, Title
- Ensemble performance, compared against all individual clusterers
  - Highest overall performance in terms of NMI
  - More homogeneous clusters: each event is spread over fewer clusters

Details in paper.

Future Work: Alternative Choices
- Document similarity metric
  - Ensemble approach
    - Weight assignment
    - Choice of clusterers
  - Train a classifier to predict document similarity
    - Features correspond to similarity scores: all-text, title, tags, time, location, etc.
    - Numeric values in [0,1]
    - State-of-the-art classifiers: SVM, Logistic Regression, …
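The classifier alternative can be illustrated with a small logistic regression trained on per-feature similarity scores. This is a sketch of the idea only: the pure-Python gradient descent, the learning rate, the feature order, and the toy training pairs are all made up for illustration and are not results from the paper.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(features, labels, lr=0.5, epochs=2000):
    """Fit logistic regression with full-batch gradient descent."""
    n_features = len(features[0])
    weights, bias = [0.0] * n_features, 0.0
    m = len(features)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for k in range(n_features):
                grad_w[k] += err * x[k]
            grad_b += err
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / m
    return weights, bias


def same_event_probability(weights, bias, x):
    """Predicted probability that a document pair belongs to the same event."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Toy pairs: [all-text similarity, time similarity, location similarity], all in [0, 1].
pairs = [[0.9, 0.95, 0.9], [0.8, 0.9, 0.85], [0.1, 0.2, 0.1], [0.2, 0.1, 0.3]]
same_event = [1, 1, 0, 0]  # hypothetical labels
weights, bias = train_logistic(pairs, same_event)
```

The predicted probability can then play the role of the learned document similarity in place of the weighted-vote consensus.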

- Final clustering step
  - Apply graph partitioning algorithms (requires estimating the number of clusters)
- Evaluation metrics: beyond NMI
- Datasets
  - Flickr, LastFM, YouTube
- Exploit social network connections

Conclusions
- Identified events and their corresponding social media documents
- Proposed a clustering solution
  - Leveraged different representations of social media documents
  - Employed various social media similarity metrics
  - Developed a weighted ensemble clustering approach
- Reported preliminary results of our event identification approach on a large-scale dataset of Flickr photographs