Text Annotation: DBpedia Spotlight

1 Text Annotation: DBpedia Spotlight
From: (…) Upon their return, Lennon and McCartney went to New York to announce the formation of Apple Corps. To: the same sentence, with its entity mentions annotated as DBpedia resources.

2 Challenge: Term Ambiguity
...this apple on the palm of my hand... ...Apple tried to acquire Palm Inc.... ...eating an apple sitting by a palm tree... What do “apple” and “palm” mean in each case? Our objective is to recognize entities/topics and disambiguate their meaning, generating DBpedia annotations in text.

3 DBpedia Spotlight
DBpedia is a collection of entity descriptions extracted from Wikipedia and shared as Linked Data. DBpedia Spotlight uses data from DBpedia and text from the associated Wikipedia pages to learn how to recognize that a DBpedia resource was mentioned. Given plain text as input, it generates annotated text.

4 Stage 1: Spotting
Find substrings that seem worthy of annotation. The simplest approach relies on a dictionary of known entity names; other approaches include Named Entity Recognition, Keyphrase Extraction, ...
Input: (…) Upon their return, Lennon and McCartney went to New York to announce the formation of Apple Corps.
Output: “Lennon”, “McCartney”, “New York”, “Apple Corps”
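A dictionary-based spotter can be sketched as a greedy longest-match scan over the words of the input. This is a minimal illustration, not DBpedia Spotlight's own spotter: the `DICTIONARY` contents and the `spot` function are hypothetical, and tokenizing on `\w+` means names containing punctuation (e.g. "Apple, Inc") would not be matched.

```python
import re

# Hypothetical dictionary of known entity names; a real spotter would load
# many thousands of names (e.g. from DBpedia labels and redirects).
DICTIONARY = {"Lennon", "McCartney", "New York", "Apple Corps", "Apple"}


def spot(text, dictionary=DICTIONARY):
    """Return the dictionary names found in text, preferring the longest match."""
    tokens = re.findall(r"\w+", text)
    max_words = max(len(name.split()) for name in dictionary)
    spots = []
    i = 0
    while i < len(tokens):
        # Try the longest phrase starting at token i first, then shorter ones.
        for n in range(min(max_words, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in dictionary:
                spots.append(phrase)
                i += n  # skip past the matched phrase
                break
        else:
            i += 1  # no dictionary name starts at this token
    return spots


text = ("(…) Upon their return, Lennon and McCartney went to New York "
        "to announce the formation of Apple Corps.")
print(spot(text))  # ['Lennon', 'McCartney', 'New York', 'Apple Corps']
```

Preferring the longest match is what makes "Apple Corps" win over the shorter dictionary entry "Apple".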

5 Stage 2: Candidate Mapping
Find possible meanings for each of the spotted substrings.
Input (spotted names): “Lennon”, “McCartney”, “New York”, “Apple Corps”
Output (candidate map):
“Lennon”: { Lennon_(album), Lennon,_Michigan, … }
“McCartney”: { McCartney(surname), Paul_McCartney, … }
“New York”: { New_York_State, New_York_City, … }
“Apple Corps”: { Apple_Corps }
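In code, a candidate map is just a lookup from surface form to a set of resource identifiers. The sketch below reuses only the candidates visible on the slide (the slide's lists are truncated with "…", so nothing is added for them); `CANDIDATES` and `candidate_map` are hypothetical names.

```python
# Candidate map taken from the example on this slide. The slide's lists end
# with "…", so only the candidates it actually shows are included here.
CANDIDATES = {
    "Lennon": {"Lennon_(album)", "Lennon,_Michigan"},
    "McCartney": {"McCartney(surname)", "Paul_McCartney"},
    "New York": {"New_York_State", "New_York_City"},
    "Apple Corps": {"Apple_Corps"},
}


def candidate_map(spots, candidates=CANDIDATES):
    """Map each spotted surface form to a copy of its candidate set
    (empty if the surface form is unknown)."""
    return {s: set(candidates.get(s, ())) for s in spots}


spotted = ["Lennon", "McCartney", "New York", "Apple Corps"]
for surface, resources in candidate_map(spotted).items():
    print(surface, sorted(resources))
```

A surface form with a single candidate, like “Apple Corps”, needs no disambiguation; the others go on to Stage 3.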

6 Candidate Map: Disambiguation Pages
Collectively, disambiguation pages provide a list of ambiguous terms and the possible meanings of each.
Annotator 3 vs Annotator 4 (Kappa = 0.385)
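One way to turn disambiguation pages into candidate-map entries is to strip the "_(disambiguation)" suffix from each page title to get the surface form, then collect the resources the page links to. This is a sketch under assumptions: the input format (title mapped to a list of linked resources) and the page contents below are hypothetical, not real Wikipedia data. `str.removesuffix` requires Python 3.9 or later.

```python
from collections import defaultdict


def candidates_from_disambiguation(pages):
    """Build a candidate map from disambiguation pages.

    pages maps a disambiguation page title to the resources it links to.
    The surface form is the title without the "_(disambiguation)" suffix,
    with underscores turned back into spaces.
    """
    candidates = defaultdict(set)
    for title, meanings in pages.items():
        surface = title.removesuffix("_(disambiguation)").replace("_", " ")
        candidates[surface].update(meanings)
    return dict(candidates)


# Hypothetical excerpt of page contents, for illustration only.
pages = {
    "Apple_(disambiguation)": ["Apple", "Apple_Inc", "Apple_Corps"],
    "New_York_(disambiguation)": ["New_York_City", "New_York_State"],
}
print(candidates_from_disambiguation(pages))
```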

7 Candidate Map: Redirects
Redirects to Apple_Inc: AAPL; Apple (Company); Apple (Computers); Apple (company); Apple (computer); Apple Company; Apple Computer; Apple Computer Co.; Apple Computer Inc.; Apple Computer Incorporated; Apple Computer, Inc; Apple Computer, Inc.; Apple Computers; Apple Inc; Apple Incorporate; Apple Incorporated; Apple India; Apple comp; Apple compputer; Apple computer; Apple computer Inc; Apple computers; Apple inc; Apple inc.; Apple incoporated; Apple incorporated; Apple pc; Apple's; Apple, Inc
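Each redirect title can serve as an extra surface form pointing at the redirect's target, which is how misspellings such as "Apple compputer" still lead to Apple_Inc. The sketch below adds redirects to a candidate map; `REDIRECTS` holds only a few entries copied from the slide, and `add_redirects` is a hypothetical helper.

```python
# A few of the redirects to Apple_Inc shown on the slide.
REDIRECTS = {
    "AAPL": "Apple_Inc",
    "Apple Computer": "Apple_Inc",
    "Apple Computer, Inc.": "Apple_Inc",
    "Apple compputer": "Apple_Inc",  # misspellings are kept on purpose
}


def add_redirects(candidates, redirects):
    """Add each redirect title, and each target's own title, as a surface
    form of the target resource. Mutates and returns `candidates`."""
    for alias, target in redirects.items():
        candidates.setdefault(alias, set()).add(target)
        candidates.setdefault(target.replace("_", " "), set()).add(target)
    return candidates


candidates = add_redirects({}, REDIRECTS)
print(candidates["Apple compputer"])  # {'Apple_Inc'}
```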

8 Stage 3: Disambiguation
Select the correct candidate DBpedia resource for a given surface form. The decision is based on the context (1) in which the surface form was mentioned.
con·text n. 1. The parts of a discourse that surround a word or passage and can throw light on its meaning.
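Following the dictionary definition, the simplest notion of context is a window of words around the mention. The sketch below extracts such a window; the window size of 5 words per side and the function name `context_window` are assumptions for illustration, and the whole paragraph could equally be used as context.

```python
import re


def context_window(text, surface_form, size=5):
    """Return up to `size` words on each side of the first occurrence of
    surface_form in text (an empty list if it does not occur)."""
    start = text.find(surface_form)
    if start < 0:
        return []
    left = re.findall(r"\w+", text[:start])
    right = re.findall(r"\w+", text[start + len(surface_form):])
    return left[max(0, len(left) - size):] + right[:size]


text = ("(…) Upon their return, Lennon and McCartney went to New York "
        "to announce the formation of Apple Corps.")
print(context_window(text, "New York", size=3))
# ['McCartney', 'went', 'to', 'to', 'announce', 'the']
```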

9 Learning the Context for a resource
Collect context for DBpedia resources from all articles in Wikipedia, e.g. co-occurrence statistics: Lennon = {John:981, Beatles:320, McCartney:100, ...}
Types of context: Wikipedia pages; definitions from disambiguation pages; paragraphs that link to resources, e.g. (…) Upon their return, Lennon and McCartney went to New York to announce the formation of Apple Corps.
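Co-occurrence statistics like the Lennon vector above can be gathered by adding the words of every paragraph that links to a resource into that resource's counter. This is a sketch only: the input format (paragraph text paired with its link targets), the resource names, and the paragraphs are hypothetical, and no stopword filtering or weighting is applied.

```python
import re
from collections import Counter, defaultdict


def cooccurrence_stats(paragraphs):
    """paragraphs: iterable of (text, linked_resources) pairs.

    For every resource linked from a paragraph, add the paragraph's word
    counts to that resource's context vector. No stopword filtering is done.
    """
    stats = defaultdict(Counter)
    for text, resources in paragraphs:
        words = Counter(re.findall(r"\w+", text))
        for resource in resources:
            stats[resource].update(words)
    return stats


# Hypothetical paragraphs and link targets, for illustration only.
paragraphs = [
    ("Upon their return, Lennon and McCartney went to New York to announce "
     "the formation of Apple Corps.",
     ["John_Lennon", "Paul_McCartney", "New_York_City", "Apple_Corps"]),
    ("Lennon and McCartney wrote most of the Beatles songs.",
     ["John_Lennon", "The_Beatles"]),
]
stats = cooccurrence_stats(paragraphs)
print(stats["John_Lennon"].most_common(3))
```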

10 Disambiguation in DBpedia Spotlight
Model DBpedia resources as vectors of terms found in Wikipedia text. Define functions for term scoring and vector similarity (e.g. frequency and cosine). Rank the candidate resource vectors by their similarity to the vector of the input text, and choose the highest-ranking candidate.
Lennon = {Beatles, McCartney, rock, guitar, ...}
Lennon = {tf(Beatles)=320, tf(McCartney)=100, ...}
Cos(Input, Lennon) = 0.12
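The ranking step described above can be sketched with plain term-frequency vectors and cosine similarity. The resource names and the `Lennon,_Michigan` vector are hypothetical; the `John_Lennon` counts reuse the slide's tf(Beatles)=320 and tf(McCartney)=100. This illustrates the vector space idea, not DBpedia Spotlight's exact scoring.

```python
import math
import re
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two sparse term vectors (dict term -> weight)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def disambiguate(context_text, candidates):
    """Return the candidate resource whose vector is most similar to the
    term-frequency vector of context_text."""
    query = Counter(re.findall(r"\w+", context_text))
    return max(candidates, key=lambda r: cosine(query, candidates[r]))


# Hypothetical resource vectors; the John_Lennon counts reuse the slide's
# tf(Beatles)=320 and tf(McCartney)=100, the Michigan ones are made up.
candidates = {
    "John_Lennon": Counter({"Beatles": 320, "McCartney": 100}),
    "Lennon,_Michigan": Counter({"Michigan": 50, "village": 20}),
}
print(disambiguate("Lennon and McCartney went to New York", candidates))
# John_Lennon
```

Only terms shared by the input and a candidate contribute to the dot product, so a candidate with no overlapping terms scores 0.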

11 Many scoring techniques
Cosine similarity of tf*idf scores; Probabilistic; Collective scoring
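For the first technique, raw frequencies are reweighted by tf*idf before taking the cosine. The sketch below uses the standard formulation idf(t) = log(N / df(t)), treating each resource's context vector as one document; the example vectors are hypothetical, and the exact weighting used inside DBpedia Spotlight may differ.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: dict name -> Counter of raw term frequencies.

    Returns dict name -> {term: tf * idf} with idf(t) = log(N / df(t)),
    where N is the number of docs and df(t) the number containing t.
    """
    n = len(docs)
    df = Counter()
    for tf in docs.values():
        df.update(tf.keys())  # each doc counts once per term
    return {
        name: {t: f * math.log(n / df[t]) for t, f in tf.items()}
        for name, tf in docs.items()
    }


# Hypothetical context vectors, for illustration only.
docs = {
    "Apple_Inc": Counter({"apple": 5, "computer": 3}),
    "Apple": Counter({"apple": 4, "fruit": 2}),
}
print(tfidf_vectors(docs))
```

A term that occurs in every vector (here "apple") gets weight 0, so it no longer helps or hurts any candidate; the discriminating terms ("computer", "fruit") decide the cosine comparison.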
