To Link or Not to Link? A Study on End-to-End Tweet Entity Linking
Stephen Guo, Ming-Wei Chang, Emre Kıcıman

Motivation
- Microblogs are data gold mines! Twitter reports that it alone captures over 340M short messages per day.
- There are many applications of information extraction from tweets:
  - Election results (Tumasjan et al., 2010)
  - Disease spreading (Paul and Dredze, 2011)
  - Tracking product feedback and sentiment (Asur and Huberman, 2010)
  - ...
- Existing tools (for example, NER) are often too limited: Stanford NER achieves only 44% F1 on a tweet data set [Ritter et al., 2011].

Entity Linking (Wikifier) in Tweets
Example tweet: “Oh Yes!! giants vs packers game now!! Touchdown!!”
- Q1: Which phrases should be linked? (mention detection)
- Q2: Which Wikipedia page should each selected phrase be linked to? (disambiguation)

Contributions
- A new evaluation scheme for entity linking, natural for microblogs
- A system that performs significantly better on tweets than other systems
  - Learns to detect mentions and perform linking jointly
  - Outperforms TagMe [Ferragina & Scaiella 2010] and [Cucerzan 07] by 15% F1
- What we have learned
  - Mention detection is a difficult problem
  - Entity information can help mention detection

Outline
- Task Definition (again!)
- Two Stage versus Joint
- Model + Features
- Results + Analysis

What should be linked?
Example tweet: “Oh Yes!! giants vs packers game now!! Touchdown!!”
- Comparing different Wikifiers is a tough problem [Cornolti, WWW 2013]
- Really, there is no good definition of what should be linked

Our Scenario
- What are people saying about the movie “The Town” on Twitter?
- Assume our customers are only interested in entities of certain types
  - Movies; Video Games; Sports Teams; …
  - Type information can be inferred directly from the corresponding Wikipedia page
- Now it is fair to compare different systems
  - We assume the types PER, LOC, ORG, BOOK, TVSHOW, MOVIE
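A minimal Python sketch of how such a type-restricted comparison could be scored, assuming a link counts as correct only when its token span and Wikipedia title both match a gold link exactly. The `entity_type` table and the page titles are hypothetical stand-ins for the output of a Wikipedia type categorizer. Links to pages outside the target types are ignored rather than counted as errors, which is what makes systems with different linking policies comparable.

```python
from typing import Dict, Set, Tuple

# A link: (tweet_id, start_token, end_token, wikipedia_title), end exclusive.
Link = Tuple[str, int, int, str]

TARGET_TYPES = {"PER", "LOC", "ORG", "BOOK", "TVSHOW", "MOVIE"}


def restrict_to_types(links: Set[Link], entity_type: Dict[str, str]) -> Set[Link]:
    """Keep only links whose Wikipedia page maps to one of the target types."""
    return {link for link in links if entity_type.get(link[3]) in TARGET_TYPES}


def evaluate(predicted: Set[Link], gold: Set[Link],
             entity_type: Dict[str, str]) -> Tuple[float, float, float]:
    """Precision, recall and F1 over exact (span, title) matches of typed links."""
    pred = restrict_to_types(predicted, entity_type)
    ref = restrict_to_types(gold, entity_type)
    tp = len(pred & ref)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


# Hypothetical title -> type table; "Touchdown" has no target type.
entity_type = {"New_York_Giants": "ORG", "Green_Bay_Packers": "ORG"}
gold = {("t1", 2, 3, "New_York_Giants"), ("t1", 4, 5, "Green_Bay_Packers")}
predicted = gold | {("t1", 7, 8, "Touchdown")}
print(evaluate(predicted, gold, entity_type))  # → (1.0, 1.0, 1.0)
```

The extra “Touchdown” link does not lower precision here because its page falls outside the assumed target types.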

The Desired Results
Example tweet: “Oh Yes!! giants vs packers game now!! Touchdown!!”

Terminology
Example tweet: “Oh Yes!! giants vs packers game now!! Touchdown!!”
- Mention candidates
- Entity mentions
- Assignment
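One way to pin these terms down, as a Python sketch: mention candidates are phrases found in a lookup table together with the Wikipedia pages they may link to; an assignment maps each candidate to one of those pages or to `None` (not linked); entity mentions are the candidates the assignment actually links. The rule that linked spans must not overlap is an assumption of this sketch, and the spans and titles are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Span = Tuple[int, int]  # [start, end) token offsets


@dataclass
class MentionCandidate:
    span: Span
    surface: str
    entities: List[str]  # Wikipedia titles this phrase may link to


# An assignment maps each candidate span to one of its entities, or None (not linked).
Assignment = Dict[Span, Optional[str]]


def entity_mentions(assignment: Assignment) -> Dict[Span, str]:
    """Entity mentions are the candidates the assignment actually links."""
    return {span: e for span, e in assignment.items() if e is not None}


def is_valid(candidates: List[MentionCandidate], assignment: Assignment) -> bool:
    """Each linked entity must be a candidate for its span; linked spans must not overlap."""
    by_span = {c.span: c for c in candidates}
    linked = entity_mentions(assignment)
    for span, e in linked.items():
        if span not in by_span or e not in by_span[span].entities:
            return False
    spans = sorted(linked)
    return all(prev[1] <= nxt[0] for prev, nxt in zip(spans, spans[1:]))


# Tokens of the example tweet: oh yes giants vs packers game now touchdown
candidates = [
    MentionCandidate((2, 3), "giants", ["New_York_Giants", "San_Francisco_Giants"]),
    MentionCandidate((4, 5), "packers", ["Green_Bay_Packers"]),
    MentionCandidate((7, 8), "touchdown", ["Touchdown"]),
]
assignment: Assignment = {(2, 3): "New_York_Giants", (4, 5): "Green_Bay_Packers", (7, 8): None}
print(is_valid(candidates, assignment), entity_mentions(assignment))
```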

Related Work
- Wikifier [Cucerzan, 2007; Milne and Witten, 2008; …]
  - Given a document, create Wikipedia-like links
  - Very difficult to evaluate/compare
  - Mention detection and disambiguation are often treated separately
- NER [Li et al., 2012; Ritter et al., 2011; …]
  - No linking
  - Limited types
- KBP [Ji et al., 2010; Ji et al., 2011; …]
  - Focus on the disambiguation aspect


What approach should we use?
- Task: Wikify entities of certain types (all named entities)
- Approach 1: Train a general named entity recognizer for those types, then link entities from the output of the first stage
  - Advanced models are available, but types are limited and adaptation is needed
- Approach 2: Learn to jointly detect mentions and disambiguate entities
  - Take advantage of Wikipedia information
  - Incorporate type information into our model

The Necessity of the Joint Approach
Example tweet: “The town is so so good, Don’t worry Ben, we already forgave you for Gigli”
- Q: Is “the town” a mention?
- Deep analysis with knowledge is required
  - Gigli is a Ben Affleck movie that received poor reviews
  - Ben Affleck is the lead actor in the movie “The Town”


Features
Example tweet: “Oh Yes!! giants vs packers game now!! Touchdown!!”
- Mention-specific features
- (Mention, entity) pair features
- Second-order features
- Type features
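A Python sketch of how these feature groups could combine into one linear score for a whole assignment, assuming the usual structured-model form w · Φ(x, y): first-order terms for each candidate (mention-specific, pair and type features would all be keys returned by `first_order`) plus second-order terms for every pair of linked entities. The feature names, values and weights are hypothetical; a Structural SVM would learn the weights, whereas here they are set by hand.

```python
from itertools import combinations
from typing import Callable, Dict, Optional, Tuple

Span = Tuple[int, int]
Features = Dict[str, float]


def score_assignment(
    assignment: Dict[Span, Optional[str]],
    weights: Features,
    first_order: Callable[[Span, Optional[str]], Features],
    second_order: Callable[[str, str], Features],
) -> float:
    """Linear score w . Phi(x, y): per-candidate terms plus pairwise entity terms."""
    def dot(feats: Features) -> float:
        return sum(weights.get(name, 0.0) * value for name, value in feats.items())

    total = sum(dot(first_order(span, e)) for span, e in assignment.items())
    linked = [e for e in assignment.values() if e is not None]
    total += sum(dot(second_order(a, b)) for a, b in combinations(linked, 2))
    return total


# Hypothetical features and weights, for illustration only.
LINK_PROB = {((2, 3), "New_York_Giants"): 0.6, ((4, 5), "Green_Bay_Packers"): 0.9}
RELATED = {frozenset({"New_York_Giants", "Green_Bay_Packers"})}


def first_order(span: Span, entity: Optional[str]) -> Features:
    if entity is None:
        return {"nil": 1.0}
    return {"link_prob": LINK_PROB.get((span, entity), 0.0)}


def second_order(a: str, b: str) -> Features:
    return {"related": 1.0} if frozenset({a, b}) in RELATED else {}


weights = {"link_prob": 2.0, "nil": 0.5, "related": 1.0}
assignment = {(2, 3): "New_York_Giants", (4, 5): "Green_Bay_Packers", (7, 8): None}
score = score_assignment(assignment, weights, first_order, second_order)
print(round(score, 6))  # → 4.5
```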

Mention-Specific Features

View Count
- Wikipedia page-view statistics
  - Logs exist for every hour
  - Very valuable data
- View count is useful
  - Sometimes the most linked entity in Wikipedia is not the most popular one
  - “jersey shore” ==> ?
    - Jersey Shore: links: 441; views: …
    - Jersey Shore (TV_series): links: 324; views: …
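A small Python sketch of turning link counts and view counts into two separate prior features per candidate page, so a model can weigh them independently. The link counts come from the slide; the view counts are invented placeholders (the slide’s figures are not preserved) chosen only to illustrate the case where the most linked page is not the most viewed one.

```python
from typing import Dict


def normalize(counts: Dict[str, int]) -> Dict[str, float]:
    """Turn raw counts into a distribution over candidate pages."""
    total = sum(counts.values())
    return {page: n / total for page, n in counts.items()} if total else {}


def popularity_features(link_counts: Dict[str, int],
                        view_counts: Dict[str, int]) -> Dict[str, Dict[str, float]]:
    """Per-candidate link prior and view prior for one surface form."""
    link_p, view_p = normalize(link_counts), normalize(view_counts)
    pages = set(link_counts) | set(view_counts)
    return {p: {"link_prior": link_p.get(p, 0.0), "view_prior": view_p.get(p, 0.0)}
            for p in pages}


# Link counts for "jersey shore" are from the slide; the view counts are
# invented placeholders, NOT real Wikipedia statistics.
links = {"Jersey_Shore": 441, "Jersey_Shore_(TV_series)": 324}
hypothetical_views = {"Jersey_Shore": 1_000, "Jersey_Shore_(TV_series)": 5_000}
feats = popularity_features(links, hypothetical_views)
print(max(feats, key=lambda p: feats[p]["link_prior"]))  # → Jersey_Shore
print(max(feats, key=lambda p: feats[p]["view_prior"]))  # → Jersey_Shore_(TV_series)
```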

Second-Order Features

Type Features
- The information content on Wikipedia is different from that on Twitter
  - Wikipedia is informational; tweets are actionable
  - Misspelled words: “watchin, watchn, …”
- We want to find context words for PER, LOC, ORG, … in tweets
  - Step 1: train a system
  - Step 2: use it to label 10 million unlabeled tweets
  - Step 3: collect popular contextual words for each type
  - Step 4: train a new system with one new feature: check whether the context matches the type
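Steps 3 and 4 could look roughly like the Python sketch below. It assumes a one-word window on each side of the mention and ranks context words by raw frequency; the actual window, ranking and cutoff are not given on the slide, and the toy “auto-labeled” tweets are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

# Step 2 output: tokens plus (start, end, type) mentions predicted by the step-1 system.
LabeledTweet = Tuple[List[str], List[Tuple[int, int, str]]]
TypeWords = Dict[str, Set[str]]


def mine_context_words(tweets: List[LabeledTweet],
                       top_k: int = 10) -> Tuple[TypeWords, TypeWords]:
    """Step 3: most frequent words right before / right after mentions of each type."""
    before: Dict[str, Counter] = defaultdict(Counter)
    after: Dict[str, Counter] = defaultdict(Counter)
    for tokens, mentions in tweets:
        for start, end, etype in mentions:
            if start > 0:
                before[etype][tokens[start - 1].lower()] += 1
            if end < len(tokens):
                after[etype][tokens[end].lower()] += 1

    def top(counter: Counter) -> Set[str]:
        return {word for word, _ in counter.most_common(top_k)}

    return ({t: top(c) for t, c in before.items()},
            {t: top(c) for t, c in after.items()})


def context_matches_type(tokens: List[str], start: int, end: int, etype: str,
                         before_words: TypeWords, after_words: TypeWords) -> bool:
    """Step 4 feature: does a neighboring word match the mined context of `etype`?"""
    prev_ok = start > 0 and tokens[start - 1].lower() in before_words.get(etype, set())
    next_ok = end < len(tokens) and tokens[end].lower() in after_words.get(etype, set())
    return prev_ok or next_ok


# Toy auto-labeled tweets (hypothetical).
tweets = [
    (["watching", "glee", "tonight"], [(1, 2, "TVSHOW")]),
    (["watchin", "glee", "finale"], [(1, 2, "TVSHOW")]),
    (["rip", "whitney", "houston"], [(1, 3, "PER")]),
]
before_words, after_words = mine_context_words(tweets)
print(context_matches_type(["im", "watching", "lost"], 2, 3, "TVSHOW", before_words, after_words))
```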

Mining Contextual Words
Entity type | Words appearing before the mention | Words appearing after the mention
Person | wr, dominating, rip, quarterback, singer, featuring, defender, rb, minister, actress, twitition, secretary | tarde, format, noite, suffers, dire, admits, senators, urges, performs, joins
TV Show | sbs, assistir, assistindo, otm, watching, nw, watchn, viagra, watchin, ver | skit, performances, premieres, finale, parody, marathon, season, episodes, spoilers, sketch

Procedure
- Testing, step 1: given a tweet, tokenize it, remove symbols, and segment hashtags
- Testing, step 2: for all k-grams in the tweet, do a table lookup to find mention candidates and the entities they can link to
- Testing, step 3: construct features and output the assignment with the trained model
- Learning: Structural SVM; Inference: exact/beam search
- A rule-based system for categorizing Wikipedia pages
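The three test-time steps can be sketched end to end in Python. Everything here is an assumption for illustration: the hashtag splitter only handles CamelCase tags, the lexicon and scores are hypothetical, overlapping links are disallowed, and beam search runs over a hand-written local score plus a pairwise entity-entity bonus in place of a trained Structural SVM.

```python
import re
from typing import Callable, Dict, List, Tuple

Span = Tuple[int, int]
Candidate = Tuple[Span, List[str]]


def preprocess(tweet: str) -> List[str]:
    """Step 1: tokenize, drop symbols, split CamelCase hashtags (#GoPackGo -> go pack go)."""
    tokens: List[str] = []
    for tok in re.findall(r"#?\w+", tweet):
        if tok.startswith("#"):
            tokens.extend(re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", tok[1:]))
        else:
            tokens.append(tok)
    return [t.lower() for t in tokens]


def find_candidates(tokens: List[str], lexicon: Dict[str, List[str]],
                    max_k: int = 4) -> List[Candidate]:
    """Step 2: look up every k-gram (k <= max_k) in a surface-form -> entities table."""
    found: List[Candidate] = []
    for i in range(len(tokens)):
        for k in range(1, max_k + 1):
            if i + k > len(tokens):
                break
            phrase = " ".join(tokens[i:i + k])
            if phrase in lexicon:
                found.append(((i, i + k), lexicon[phrase]))
    return found  # ordered by start position


def beam_search(candidates: List[Candidate], local: Callable[[Span, str], float],
                pair: Callable[[str, str], float], beam_size: int = 5) -> Dict[Span, str]:
    """Step 3: pick a non-overlapping assignment maximizing local + pairwise scores."""
    # Each state: (score, end of last linked span, linked span -> entity).
    beam: List[Tuple[float, int, Dict[Span, str]]] = [(0.0, 0, {})]
    for span, entities in candidates:
        expanded = []
        for score, last_end, linked in beam:
            expanded.append((score, last_end, linked))  # leave this candidate unlinked
            if span[0] < last_end:
                continue  # would overlap an already linked mention
            for e in entities:
                s = score + local(span, e) + sum(pair(e, o) for o in linked.values())
                expanded.append((s, span[1], {**linked, span: e}))
        expanded.sort(key=lambda state: state[0], reverse=True)
        beam = expanded[:beam_size]
    return beam[0][2]


# Hypothetical lexicon and scores, for illustration only.
lexicon = {"giants": ["New_York_Giants", "San_Francisco_Giants"],
           "packers": ["Green_Bay_Packers"], "game": ["Game"]}
local_score = {"New_York_Giants": 0.2, "San_Francisco_Giants": 0.4,
               "Green_Bay_Packers": 0.9, "Game": -0.5}
related = {frozenset({"New_York_Giants", "Green_Bay_Packers"})}

tokens = preprocess("Oh Yes!! giants vs packers game now!! Touchdown!!")
cands = find_candidates(tokens, lexicon)
best = beam_search(cands, lambda span, e: local_score[e],
                   lambda a, b: 1.0 if frozenset({a, b}) in related else 0.0)
print(best)  # → {(2, 3): 'New_York_Giants', (4, 5): 'Green_Bay_Packers'}
```

In this toy run the local score alone prefers San_Francisco_Giants, but the pairwise bonus with Green_Bay_Packers makes New_York_Giants win, which is the kind of interaction that joint inference can exploit and a pipeline deciding each mention in isolation cannot.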


Data
- We sample two sets of tweets
  - Train and Test 1 from [Ritter 2011]
  - Test 2 from Twitter, using entertainment keywords (“director”, “actress”, …)
- … is very high
  - Many, many algorithms focus on disambiguation
  - However, if the mentions are correctly extracted, the system already performs very well

Main Results
- Baselines: TagMe [Ferragina & Scaiella 2010] and Cucerzan [Cucerzan 07]
  - Cucerzan is designed for well-written documents
  - We have a more principled way to handle mention detection than TagMe

Impact of Features
- Entity information helps mention detection
- Mining contextual words helps a bit
- Capturing entity-entity relations also improves the model

Feature type | Test 1
Base + Cap. Rate | 45.6

Conclusion & Discussions
- We provide an experimental study on tweets
  - Jointly detect mentions and disambiguate
  - A structured learning approach
- What we have learned
  - Mention detection is a difficult problem
  - Entity information could potentially help mention detection
- Future work
  - Explore the connections between the joint approaches and the two-stage approaches [Illinois, ACL 2011; AIDA, VLDB 2011]
  - A more principled way to handle context