
1 Named Entity Recognition in Tweets: An Experimental Study. Alan Ritter, Sam Clark, Mausam, Oren Etzioni (University of Washington)

2-3 Information Extraction: Motivation
Status updates = short, realtime messages
– Low overhead: can be created quickly, even on mobile devices
– Realtime: users report events in progress; often the most up-to-date source of information
– Huge volume of users: people tweet about things they find interesting, so redundancy can be used as a measure of importance


4 Related Work (Applications)
– Extracting music performers and locations (Benson et al. 2011)
– Predicting polls (O'Connor et al. 2010)
– Product sentiment (Brody et al. 2011)
– Outbreak detection (Aramaki et al. 2011)

5 Outline
– Motivation
– Error analysis of off-the-shelf tools: POS tagger, named entity segmentation
– Named entity classification: distant supervision using topic models
Tools available: https://github.com/aritter/twitter_nlp

6 Off-the-Shelf NLP Tools Fail

7 Twitter Has Noisy & Unique Style

8 Noisy Text: Challenges
– Lexical variation (misspellings, abbreviations): '2m', '2ma', '2mar', '2mara', '2maro', '2marrow', '2mor', '2mora', '2moro', '2morow', '2morr', '2morro', '2morrow', '2moz', '2mr', '2mro', '2mrrw', '2mrw', '2mw', 'tmmrw', 'tmo', 'tmoro', 'tmorrow', 'tmoz', 'tmr', 'tmro', 'tmrow', 'tmrrow', 'tmrrw', 'tmrw', 'tmrww', 'tmw', 'tomaro', 'tomarow', 'tomarro', 'tomarrow', 'tomm', 'tommarow', 'tommarrow', 'tommoro', 'tommorow', 'tommorrow', 'tommorw', 'tommrow', 'tomo', 'tomolo', 'tomoro', 'tomorow', 'tomorro', 'tomorrw', 'tomoz', 'tomrw', 'tomz'
– Unreliable capitalization: "The Hobbit has FINALLY started filming! I cannot wait!"
– Unique grammar: "watchng american dad."

9 PART OF SPEECH TAGGING

10-11 Part of Speech Tagging: Accuracy Drops on Tweets
– Most-common-tag baseline: 76% (vs. 90% on the Brown corpus)
– Stanford POS tagger: 80% (vs. 97% on news)
Most common errors:
– Confusing common and proper nouns
– Misclassifying interjections as nouns
– Misclassifying verbs as nouns

12 POS Tagging
– Labeled 800 tweets with POS tags (about 16,000 tokens)
– Also used labeled news + IRC chat data (Forsyth and Martell 07)
– CRF + standard set of features: contextual, dictionary, orthographic (see the sketch below)
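As a rough illustration of that feature set, here is a minimal sketch (in Python) of per-token feature extraction feeding a CRF tagger; the helper name `token_features`, the dictionary argument, and the exact feature list are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch: contextual, dictionary, and orthographic features for a CRF POS tagger.

def token_features(tokens, i, dictionary):
    """Return a feature dict for token i in the sentence."""
    tok = tokens[i]
    return {
        # Contextual features: the token and its neighbors
        "word": tok.lower(),
        "prev_word": tokens[i - 1].lower() if i > 0 else "<S>",
        "next_word": tokens[i + 1].lower() if i < len(tokens) - 1 else "</S>",
        # Orthographic features: shape cues that survive Twitter's noisy style
        "is_capitalized": tok[0].isupper(),
        "all_caps": tok.isupper(),
        "has_digit": any(c.isdigit() for c in tok),
        "suffix3": tok[-3:].lower(),
        # Dictionary feature: membership in an external word list
        "in_dictionary": tok.lower() in dictionary,
    }

# Usage: feature dicts like this one would be fed to a CRF toolkit for training.
example = token_features("watchng american dad .".split(), 1, {"american", "dad"})
```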

13 Results

14 (Legend for the error figure) XX/YY = XX is misclassified as YY

15-16 Named Entity Segmentation
– Off-the-shelf taggers perform poorly
– Stanford NER: F1 = 0.44 (segmentation only, not including classification)


17 Annotating Named Entities
– Annotated 2,400 tweets (about 34K tokens)
– Train on in-domain data

18 Learning
– Sequence labeling task with IOB encoding (a sketch follows this slide)
– Conditional Random Fields
– Features: orthographic, dictionaries, contextual
Example encoding:
Word     | Label
T-Mobile | B-ENTITY
to       | O
release  | O
Dell     | B-ENTITY
Streak   | I-ENTITY
7        | I-ENTITY
on       | O
Feb      | O
2nd      | O
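Below is a minimal sketch of the IOB encoding step, assuming token-level entity spans as input; the `iob_encode` helper and the span indices are illustrative, not taken from the released code.

```python
# Hypothetical sketch: convert annotated entity spans into B-ENTITY / I-ENTITY / O tags.

def iob_encode(tokens, entity_spans):
    """entity_spans: list of (start, end) token indices, end exclusive."""
    tags = ["O"] * len(tokens)
    for start, end in entity_spans:
        tags[start] = "B-ENTITY"           # first token of the entity
        for i in range(start + 1, end):
            tags[i] = "I-ENTITY"           # remaining tokens inside the entity
    return list(zip(tokens, tags))

tokens = ["T-Mobile", "to", "release", "Dell", "Streak", "7", "on", "Feb", "2nd"]
print(iob_encode(tokens, [(0, 1), (3, 6)]))
# [('T-Mobile', 'B-ENTITY'), ('to', 'O'), ('release', 'O'), ('Dell', 'B-ENTITY'),
#  ('Streak', 'I-ENTITY'), ('7', 'I-ENTITY'), ('on', 'O'), ('Feb', 'O'), ('2nd', 'O')]
```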

19 Performance (Segmentation Only)

20 NAMED ENTITY CLASSIFICATION

21-22 Challenges
– Plethora of distinctive, infrequent types (bands, movies, products, etc.)
– Very little training data for these types, so we can't simply rely on supervised classification
– Tweets are very terse (often contain insufficient context)


23-24 Weakly Supervised NE Classification (Collins and Singer 99) (Etzioni et al. 05) (Kozareva 06)
– Freebase lists provide a source of supervision
– But entities often appear in many different lists; for example, "China" could be:
  – A country
  – A band
  – A person (member of the band "metal boys")
  – A film (released in 1943)
– We need some way to disambiguate

25 Distant Supervision With Topic Models
– Treat each entity as a "document"; the words in the document are those which co-occur with the entity (see the sketch below)
– LabeledLDA (Ramage et al. 2009): a constrained topic model
  – Each entity is associated with a distribution over topics, constrained based on Freebase dictionaries
  – Each topic is associated with a type (in Freebase)
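A minimal sketch of the "entity as document" setup, assuming tagged tweets and Freebase type dictionaries are already available; the helper names and data structures here are hypothetical, meant only to show how entity documents and label constraints could be built.

```python
# Hypothetical sketch: build entity "documents" and Freebase-derived label constraints.
from collections import defaultdict

def build_entity_documents(tagged_tweets):
    """tagged_tweets: list of (tokens, entity_spans) pairs.
    Returns a map from entity string to the bag of words that co-occur with it."""
    docs = defaultdict(list)
    for tokens, spans in tagged_tweets:
        entity_positions = {i for s, e in spans for i in range(s, e)}
        for s, e in spans:
            entity = " ".join(tokens[s:e]).lower()
            # Context words = everything in the tweet outside entity mentions
            context = [t.lower() for i, t in enumerate(tokens) if i not in entity_positions]
            docs[entity].extend(context)
    return docs

def allowed_types(entity, freebase_dicts, all_types):
    """Constrain an entity's topics to its Freebase types; unconstrained if unseen."""
    types = [t for t, names in freebase_dicts.items() if entity in names]
    return types or list(all_types)
```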

26-36 Generative Story (shown as a progressive build)
– For each type, pick a random distribution over words, e.g.:
  Type 1: TEAM      P(victory|T1) = 0.02, P(played|T1) = 0.01, ...
  Type 2: LOCATION  P(visiting|T2) = 0.05, P(airport|T2) = 0.02, ...
– For each entity, pick a distribution over types (constrained by Freebase), e.g.:
  Seattle: P(TEAM|Seattle) = 0.6, P(LOCATION|Seattle) = 0.4
– For each position, first pick a type, then pick a word based on that type: e.g. one position picks TEAM and emits "victory", another picks LOCATION and emits "airport"
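The same story can be written as a few lines of forward-sampling code; the probabilities below are the toy numbers from the slide, and the `sample`/`generate_word` names are illustrative.

```python
# Hypothetical sketch: forward sampling of the generative story with the slide's toy numbers.
import random

type_word_dist = {
    "TEAM":     {"victory": 0.02, "played": 0.01},
    "LOCATION": {"visiting": 0.05, "airport": 0.02},
}
entity_type_dist = {"Seattle": {"TEAM": 0.6, "LOCATION": 0.4}}

def sample(dist):
    items, weights = zip(*dist.items())
    return random.choices(items, weights=weights, k=1)[0]  # weights are renormalized

def generate_word(entity):
    t = sample(entity_type_dist[entity])  # e.g. "Seattle" drawn as TEAM
    w = sample(type_word_dist[t])         # e.g. emit "victory"
    return t, w

print(generate_word("Seattle"))
```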

37 Data/Inference
– Gather entities and the words which co-occur with them; entities extracted from about 60M status messages
– Used a set of 10 types from Freebase that commonly occur in tweets and have good coverage in Freebase
– Inference: collapsed Gibbs sampling, with types constrained using Freebase; for entities not in Freebase, types are left unconstrained (see the sketch below)
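A minimal sketch of one constrained collapsed-Gibbs update, assuming the usual LDA count arrays (document-type counts, type-word counts, type totals) and hyperparameters are maintained elsewhere; this is a generic LabeledLDA-style update written for illustration, not the authors' exact sampler.

```python
# Hypothetical sketch: resample the type of word w in entity-document d,
# restricted to the types Freebase allows for that entity.
import random

def resample_type(w, d, allowed, n_dk, n_kw, n_k, alpha, beta, vocab_size):
    """allowed: candidate type ids for document d (all types if entity not in Freebase).
    Counts are assumed to already exclude the current token."""
    weights = []
    for k in allowed:
        # P(z = k | rest) is proportional to (n_dk + alpha) * (n_kw + beta) / (n_k + V*beta)
        weights.append((n_dk[d][k] + alpha) *
                       (n_kw[k][w] + beta) / (n_k[k] + vocab_size * beta))
    return random.choices(allowed, weights=weights, k=1)[0]
```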

38 Type Lists

39 KKTNY = Kourtney and Kim Take New York; RHOBH = Real Housewives of Beverly Hills

40 Evaluation
– Manually annotated the 2,400 tweets with the 10 entity types
– Only used for testing purposes: no labeled examples for LabeledLDA & co-training

41 Classification Results: 10 Types (Gold Segmentation)

42 Precision = 0.85, Recall = 0.24

43 Classification Results: 10 Types (Gold Segmentation)

44 Why is LDA winning?
– Shares type information across mentions: unambiguous mentions help to disambiguate, and unlabeled examples provide an entity-specific prior
– Explicitly models ambiguity: each "entity string" is modeled as a (constrained) distribution over types, so it takes better advantage of ambiguous training data

45 Segmentation + Classification

46 Related Work
– Named entity recognition: (Liu et al. 2011)
– POS tagging: (Gimpel et al. 2011)

47 Calendar Demo http://statuscalendar.com
– Extract entities from millions of tweets, using NER trained on labeled tweets
– Extract and resolve temporal expressions; for example, "Next Friday" = 02-24-11
– Count entity/day co-occurrences and score them with the G² log-likelihood ratio (a sketch follows this slide)
– Plot the top 20 entities for each day
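A minimal sketch of the G² (log-likelihood ratio) score computed over a 2x2 entity/day contingency table; the counts in the usage note are hypothetical placeholders.

```python
# Sketch: G-squared statistic for ranking entity/day associations.
import math

def g_squared(k11, k12, k21, k22):
    """2x2 contingency table: k11 = mentions of the entity on the day,
    k12 = mentions of the entity on other days,
    k21/k22 = mentions of all other entities on that day / on other days."""
    n = k11 + k12 + k21 + k22
    row = [k11 + k12, k21 + k22]
    col = [k11 + k21, k12 + k22]
    observed = [k11, k12, k21, k22]
    expected = [row[0] * col[0] / n, row[0] * col[1] / n,
                row[1] * col[0] / n, row[1] * col[1] / n]
    # 0 * log(0) is treated as 0 by skipping zero cells
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)

# e.g. g_squared(120, 30, 5000, 200000) scores how strongly an entity is
# associated with one particular day (placeholder counts).
```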

48-49 Contributions
– Analysis of challenges in noisy text
– Adapted NLP tools to Twitter
– Distant supervision using topic models
Tools available: https://github.com/aritter/twitter_nlp


50 Classification Results (Gold Segmentation)

51 Classification Results By Type (Gold Segmentation)

52 Performance (Segmentation Only)

53 Part of Speech Tagging: Accuracy Drops on Tweets
– Most-common-tag baseline: 76% (vs. 90% on the Brown corpus)
– Stanford POS tagger: 80% (vs. 97% on news)

