Named Entity Recognition In Tweets: An Experimental Study Alan Ritter Sam Clark Mausam Oren Etzioni University of Washington

Information Extraction: Motivation Status updates = short, realtime messages – Low overhead: can be created quickly, even on mobile devices – Realtime: users report events in progress, so tweets are often the most up-to-date source of information – Huge volume of users: people tweet about things they find interesting, so redundancy can be used as a measure of importance

Related Work (Applications) Extracting music performers and locations (Benson et al. 2011) Predicting polls (O'Connor et al. 2010) Product sentiment (Brody et al. 2011) Outbreak detection (Aramaki et al. 2011)

Outline Motivation Error Analysis of Off-the-Shelf Tools: POS Tagger, Named Entity Segmentation Named Entity Classification – Distant Supervision Using Topic Models Tools available:

Off The Shelf NLP Tools Fail

Twitter Has Noisy & Unique Style

Noisy Text: Challenges Lexical variation (misspellings, abbreviations) – '2m', '2ma', '2mar', '2mara', '2maro', '2marrow', '2mor', '2mora', '2moro', '2morow', '2morr', '2morro', '2morrow', '2moz', '2mr', '2mro', '2mrrw', '2mrw', '2mw', 'tmmrw', 'tmo', 'tmoro', 'tmorrow', 'tmoz', 'tmr', 'tmro', 'tmrow', 'tmrrow', 'tmrrw', 'tmrw', 'tmrww', 'tmw', 'tomaro', 'tomarow', 'tomarro', 'tomarrow', 'tomm', 'tommarow', 'tommarrow', 'tommoro', 'tommorow', 'tommorrow', 'tommorw', 'tommrow', 'tomo', 'tomolo', 'tomoro', 'tomorow', 'tomorro', 'tomorrw', 'tomoz', 'tomrw', 'tomz' Unreliable capitalization – "The Hobbit has FINALLY started filming! I cannot wait!" Unique grammar – "watchng american dad."
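To make the scale of that variation concrete, here is a minimal sketch (illustrative only; the talk does not present a normalizer) of dictionary-based lexical normalization:

```python
# Hypothetical normalization table built from the variant list above;
# a real system would need one such entry set per canonical word.
VARIANTS_OF_TOMORROW = {
    "2m", "2ma", "2mar", "2mara", "2maro", "2marrow", "2mor", "2mora",
    "2moro", "2morow", "2morr", "2morro", "2morrow", "2moz", "2mr",
    "2mro", "2mrrw", "2mrw", "2mw", "tmmrw", "tmo", "tmoro", "tmorrow",
    "tmoz", "tmr", "tmro", "tmrow", "tmrrow", "tmrrw", "tmrw", "tmrww",
    "tmw", "tomaro", "tomarow", "tomarro", "tomarrow", "tomm",
    "tommarow", "tommarrow", "tommoro", "tommorow", "tommorrow",
    "tommorw", "tommrow", "tomo", "tomolo", "tomoro", "tomorow",
    "tomorro", "tomorrw", "tomoz", "tomrw", "tomz",
}
NORMALIZATION_TABLE = {v: "tomorrow" for v in VARIANTS_OF_TOMORROW}

def normalize(token: str) -> str:
    """Map a known variant to its canonical form; pass others through."""
    return NORMALIZATION_TABLE.get(token.lower(), token)

assert normalize("2moro") == "tomorrow"
assert normalize("filming") == "filming"  # unknown tokens are unchanged
```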

PART OF SPEECH TAGGING

Part Of Speech Tagging: Accuracy Drops on Tweets Most common tag baseline: 76% (90% on the Brown corpus) Stanford POS tagger: 80% (97% on news) Most common errors: – Confusing common/proper nouns – Misclassifying interjections as nouns – Misclassifying verbs as nouns

POS Tagging Labeled 800 tweets with POS tags – about 16,000 tokens Also used labeled news + IRC chat data (Forsyth and Martell 2007) CRF + standard set of features – Contextual – Dictionary – Orthographic (a feature sketch follows below)
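As a rough illustration of that feature set (an assumed sketch, not the authors' code), the per-token features for a CRF tagger might look like:

```python
import re

CAP = re.compile(r"^[A-Z]")

def token_features(tokens, i, in_dictionary):
    """Features for tokens[i]; in_dictionary is any word-list lookup
    (a hypothetical callable standing in for the dictionary features)."""
    w = tokens[i]
    return {
        # contextual features: the word and its neighbors
        "word": w.lower(),
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
        "next": tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>",
        # orthographic features: shape cues that survive noisy spelling
        "capitalized": bool(CAP.match(w)),
        "has_digit": any(c.isdigit() for c in w),
        "prefix3": w[:3].lower(),
        "suffix3": w[-3:].lower(),
        # dictionary feature
        "in_dict": in_dictionary(w.lower()),
    }

# One dict per token, one sequence per tweet, in the shape expected by
# CRF toolkits such as sklearn-crfsuite.
```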

Results

XX/YY = XX is misclassified as YY

Named Entity Segmentation Off-the-shelf taggers perform poorly: Stanford NER achieves F1 = 0.44, not including classification

Annotating Named Entities Annotated 2,400 tweets (about 34K tokens) in order to train on in-domain data

Learning Sequence Labeling Task IOB encoding Conditional Random Fields Features: – Orthographic – Dictionaries – Contextual Example (Word / Label): T-Mobile / B-ENTITY, to / O, release / O, Dell / B-ENTITY, Streak / I-ENTITY, 7 / I-ENTITY, on / O, Feb / O, 2nd / O (a decoding sketch follows below)
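A small helper (assumed, not from the talk) shows how the IOB tags above decode back into entity spans:

```python
def iob_to_spans(tokens, tags):
    """Yield (start, end, surface) spans from parallel token/tag lists."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "B-ENTITY":
            if start is not None:
                spans.append((start, i, " ".join(tokens[start:i])))
            start = i
        elif tag == "O":
            if start is not None:
                spans.append((start, i, " ".join(tokens[start:i])))
                start = None
        # "I-ENTITY" simply extends the current span
    if start is not None:
        spans.append((start, len(tokens), " ".join(tokens[start:])))
    return spans

tokens = ["T-Mobile", "to", "release", "Dell", "Streak", "7", "on", "Feb", "2nd"]
tags   = ["B-ENTITY", "O", "O", "B-ENTITY", "I-ENTITY", "I-ENTITY", "O", "O", "O"]
assert iob_to_spans(tokens, tags) == [
    (0, 1, "T-Mobile"),
    (3, 6, "Dell Streak 7"),
]
```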

Performance (Segmentation Only)

NAMED ENTITY CLASSIFICATION

Challenges Plethora of distinctive, infrequent types – Bands, movies, products, etc. – Very little training data for these – Can't simply rely on supervised classification Tweets are very terse (often contain insufficient context)

Weakly Supervised NE Classification (Collins and Singer 1999; Etzioni et al. 2005; Kozareva 2006) Freebase lists provide a source of supervision, but entities often appear in many different lists. For example, "China" could be: – A country – A band – A person (member of the band "metal boys") – A film (released in 1943) We need some way to disambiguate (see the sketch below).
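Schematically (with made-up dictionary contents), a dictionary lookup alone returns the whole ambiguous candidate set:

```python
# Toy Freebase-style type lists; contents are illustrative only.
TYPE_LISTS = {
    "COUNTRY": {"china", "france"},
    "BAND": {"china", "metal boys"},
    "PERSON": {"china"},
    "FILM": {"china", "casablanca"},
}

def candidate_types(entity: str) -> set:
    """All types whose list contains the entity string."""
    e = entity.lower()
    return {t for t, members in TYPE_LISTS.items() if e in members}

assert candidate_types("China") == {"COUNTRY", "BAND", "PERSON", "FILM"}
```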

Distant Supervision With Topic Models Treat each entity as a "document" – words in the document are those which co-occur with the entity LabeledLDA (Ramage et al. 2009) – a constrained topic model – Each entity is associated with a distribution over topics, constrained using the Freebase dictionaries – Each topic is associated with a type (in Freebase)
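A simplified sketch of the "entity as document" construction; the extract_entities and tokenize callables are assumed placeholders:

```python
from collections import defaultdict

def build_entity_documents(tweets, extract_entities, tokenize):
    """Aggregate, per entity string, the words that co-occur with it.
    tweets: iterable of strings; the two callables are assumed helpers
    (e.g. extract_entities could be the CRF segmenter above)."""
    docs = defaultdict(list)
    for tweet in tweets:
        words = tokenize(tweet)
        for entity in extract_entities(tweet):
            # the entity's "document" is its bag of context words
            docs[entity].extend(
                w for w in words if w.lower() != entity.lower()
            )
    return docs
```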

Generative Story For each type, pick a random distribution over words – Type 1: TEAM: P(victory|T1)=0.02, P(played|T1)=0.01, … – Type 2: LOCATION: P(visiting|T2)=0.05, P(airport|T2)=0.02, … For each entity, pick a distribution over types, constrained by Freebase – Seattle: P(TEAM|Seattle)=0.6, P(LOCATION|Seattle)=0.4 For each position in the entity's document, first pick a type (e.g. TEAM, then later LOCATION), then pick a word based on that type (e.g. "victory", then "airport")
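The story above can be simulated directly; this toy (with made-up, renormalized probabilities) generates a pseudo-document for "Seattle":

```python
import random

WORD_DIST = {  # per-type word distributions (toy values)
    "TEAM": {"victory": 0.6, "played": 0.4},
    "LOCATION": {"visiting": 0.7, "airport": 0.3},
}
SEATTLE_TYPES = {"TEAM": 0.6, "LOCATION": 0.4}  # Freebase-constrained

def sample(dist):
    """Draw one key from a {item: probability} dict."""
    r, acc = random.random(), 0.0
    for item, p in dist.items():
        acc += p
        if r < acc:
            return item
    return item  # guard against floating-point round-off

def generate(n):
    words = []
    for _ in range(n):
        t = sample(SEATTLE_TYPES)           # first pick a type
        words.append(sample(WORD_DIST[t]))  # then a word given that type
    return words

print(generate(5))  # e.g. ['victory', 'airport', 'played', ...]
```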

Data/Inference Gather entities and the words with which they co-occur – extracted entities from about 60M status messages Used a set of 10 types from Freebase – they commonly occur in Tweets and have good coverage in Freebase Inference: collapsed Gibbs sampling – constrain types using Freebase; for entities not in Freebase, don't constrain (a sketch of the constrained update follows below)
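One resampling step of the constrained sampler might look like the following heavily simplified sketch; the count-table layout and hyperparameter values are assumptions, not the paper's:

```python
import random

def resample_type(word, entity, allowed_types, counts,
                  alpha=0.1, beta=0.01, vocab_size=50_000):
    """Sample a new type for one word token, restricted to the entity's
    Freebase-allowed types (all types if the entity is not in Freebase).
    counts holds n_et (entity-type), n_tw (type-word) and n_t (type)
    tallies, with the current token's assignment already removed."""
    weights = []
    for t in allowed_types:
        p_type = counts["n_et"][(entity, t)] + alpha
        p_word = (counts["n_tw"][(t, word)] + beta) / \
                 (counts["n_t"][t] + vocab_size * beta)
        weights.append(p_type * p_word)
    # draw a type proportional to the weights
    r, acc = random.uniform(0, sum(weights)), 0.0
    for t, w in zip(allowed_types, weights):
        acc += w
        if r <= acc:
            return t
    return allowed_types[-1]
```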

Type Lists

KKTNY = Kourtney and Kim Take New York RHOBH = Real Housewives of Beverly Hills

Evaluation Manually annotated the 2,400 tweets with the 10 entity types – Only used for testing purposes – No labeled examples for LabeledLDA & co-training

Classification Results: 10 Types (Gold Segmentation)

Precision = 0.85, Recall = 0.24

Why is LDA winning? Shares type information across mentions – unambiguous mentions help to disambiguate, and unlabeled examples provide an entity-specific prior Explicitly models ambiguity – each "entity string" is modeled as a (constrained) distribution over types, which takes better advantage of ambiguous training data

Segmentation + Classification

Related Work Named Entity Recognition – (Liu et al. 2011) POS Tagging – (Gimpel et al. 2011)

Calendar Demo Extract entities from millions of Tweets – using NER trained on labeled Tweets Extract and resolve temporal expressions – for example "Next Friday" = Count entity/day co-occurrences – ranked by the G² log-likelihood ratio (a sketch follows below) Plot the top 20 entities for each day
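The G² statistic on an entity/day contingency table can be computed directly; a sketch applying the standard formula (assumed, not the demo's code):

```python
import math

def g2(k11, k12, k21, k22):
    """G^2 = 2 * sum(O * ln(O / E)) over a 2x2 table:
    k11 = entity mentions on the day, k12 = entity on other days,
    k21 = other entities on the day, k22 = other entities, other days."""
    total = k11 + k12 + k21 + k22
    r1, r2 = k11 + k12, k21 + k22
    c1, c2 = k11 + k21, k12 + k22

    def term(obs, row, col):
        expected = row * col / total
        return obs * math.log(obs / expected) if obs > 0 else 0.0

    return 2.0 * (term(k11, r1, c1) + term(k12, r1, c2)
                  + term(k21, r2, c1) + term(k22, r2, c2))

# A spike of 30 mentions on one day against a flat background scores high:
print(round(g2(30, 70, 1000, 99000), 1))
```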

Contributions Analysis of challenges in noisy text Adapted NLP tools to Twitter Distant Supervision using Topic Models Tools available:

Classification Results (Gold Segmentation)

Classification Results By Type (Gold Segmentation)
