Named Entity Disambiguation using Linked Data Danica Damljanović The University of Sheffield Brunel University London, 05 March 2012.

Presentation transcript:

1 Named Entity Disambiguation using Linked Data Danica Damljanović The University of Sheffield Brunel University London, 05 March 2012

2 Named Entity Disambiguation in TrendMiner
(Diagram of the TrendMiner Platform. Data sources: Newswire, Market data, Polls, …; components: Multilingual Text Processing (EN, DE, IT, BG, HI), Time-Series Machine Learning models, Cross-Lingual Summarisation, Knowledge-based Search and Browse; applications: Financial Decisions, Political Analysis; logos: Hardik Fintrade Pvt. Ltd., SORA, Eurokleis srl.)
Named Entity Recognition is the first step, and it is important to get it right!

3 Example

4 Linked Data
(Figure: Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/)

5 Why DBpedia?
- Regularly updated (from Wikipedia)
- Good source for named entities
- A hierarchy of concepts: a capital is also a city, but not vice versa
- Relations between concepts: Paris locatedIn France, ParisHilton bornIn NewYorkCity

6 Task
Identify named entities in text and attach the correct DBpedia URI to each of them.

7 Named Entity Recognition
ANNIE:
- Produces NE types such as Organization, Location and Person
- Resolves coreference: mentions of the same entity are linked, e.g. General Motors and GM
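To make the coreference idea concrete, here is a rough sketch of one acronym rule: an all-caps mention corefers with an earlier mention whose capitalised initials it spells. The helper names (`initials`, `is_acronym_of`, `link_acronyms`) are hypothetical. ANNIE's own orthographic coreference applies many more rules, and this is not its implementation.

```python
def initials(name: str) -> str:
    """Initials of the capitalised words in a name, e.g. "General Motors" -> "GM"."""
    return "".join(word[0] for word in name.split() if word[:1].isupper())


def is_acronym_of(short: str, full: str) -> bool:
    """True if `short` is an all-caps acronym spelling the initials of `full`."""
    return len(short) > 1 and short.isupper() and short == initials(full)


def link_acronyms(mentions: list[str]) -> dict[str, str]:
    """Map each acronym mention to the first earlier mention it abbreviates."""
    links: dict[str, str] = {}
    for i, short in enumerate(mentions):
        for full in mentions[:i]:  # only look back at earlier mentions
            if is_acronym_of(short, full):
                links[short] = full
                break
    return links


print(link_acronyms(["General Motors", "Detroit", "GM"]))
# {'GM': 'General Motors'}
```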

8 Entity Linking
The Large KB Gazetteer (LKB) matches text against URIs:
- It matches only against the values of the rdfs:label and foaf:name properties
- of all instances of the classes dbpedia-ont:Person, dbpedia-ont:Organisation and dbpedia-ont:Place.
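The following is a minimal sketch of a context-free label gazetteer in the spirit of the LKB. It is not the LKB's actual implementation. The `LABELS` pairs are hypothetical stand-ins for rdfs:label / foaf:name values, and the pairs are assumed to be already restricted to the three classes. The lookup lowercases text, splits it into word tokens, and reports every token span that equals some label, including overlapping ones.

```python
import re
from collections import defaultdict

# Hypothetical (URI, label) pairs; assumed already filtered to
# dbpedia-ont:Person / Organisation / Place instances.
LABELS = [
    ("dbpedia:Paris", "Paris"),
    ("dbpedia:Paris_Hilton", "Paris Hilton"),
    ("dbpedia:B_%28Los_Angeles_Railway%29", "B"),
]

TOKEN = re.compile(r"\w+")


def tokenize(text: str) -> list[str]:
    """Lower-cased word tokens; punctuation is dropped."""
    return [t.lower() for t in TOKEN.findall(text)]


def build_index(pairs):
    """Map each label's token sequence to the set of URIs carrying that label."""
    index = defaultdict(set)
    for uri, label in pairs:
        index[tuple(tokenize(label))].add(uri)
    return index


def lookup(text: str, index, max_len: int = 4):
    """Return (start, end, surface, URIs) for every span equal to a label."""
    toks = tokenize(text)
    matches = []
    for start in range(len(toks)):
        for end in range(start + 1, min(start + max_len, len(toks)) + 1):
            key = tuple(toks[start:end])
            if key in index:  # membership test does not add keys to the defaultdict
                matches.append((start, end, " ".join(key), sorted(index[key])))
    return matches


for match in lookup("Paris Hilton took line B in Paris", build_index(LABELS)):
    print(match)
```

Because no context is used, the sentence start yields both `paris` and `paris hilton`, and the lone letter `b` also matches. This is the kind of spurious annotation discussed on the next slide.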

9 So, why not just combine them?
- NE annotations produced by ANNIE lack a URI
- LKB does not use any context, so it produces spurious entities: e.g. each letter B is annotated as a possible mention of dbpedia:B_%28Los_Angeles_Railway%29, which refers to a line called B operated by the Los Angeles Railway

10 How to filter out the noise?
1. Identify NEs (Location, Organisation and Person) using ANNIE
2. For each NE, add the URIs of matching instances from DBpedia
3. For each ambiguous NE, calculate disambiguation scores
4. Remove all matches except the highest-scoring one
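The four steps can be sketched as a single filtering function. This assumes that mentions are plain strings and that scoring is supplied as a function. Both choices are simplifications, and the names and toy scores below are hypothetical. Gazetteer lookups that do not correspond to an ANNIE NE (such as the letter B) are never considered, and each remaining NE keeps only its best-scoring URI.

```python
from typing import Callable


def disambiguate(
    named_entities: list[str],
    candidates: dict[str, list[str]],
    score: Callable[[str, str], float],
) -> dict[str, str]:
    """Keep only the highest-scoring candidate URI for each named entity.

    named_entities: mentions found by the NE recogniser (step 1).
    candidates: gazetteer lookups, mention -> matching DBpedia URIs (step 2).
    score: disambiguation score for a (mention, URI) pair (step 3).
    """
    resolved: dict[str, str] = {}
    for mention in named_entities:  # lookups that are not NEs are ignored
        uris = candidates.get(mention, [])
        if uris:
            # Step 4: discard every match except the best-scoring one.
            resolved[mention] = max(uris, key=lambda uri: score(mention, uri))
    return resolved


# Hypothetical scores for illustration only.
toy_scores = {
    ("Paris", "dbpedia:Paris"): 0.9,
    ("Paris", "dbpedia:Paris_Hilton"): 0.4,
    ("Paris", "dbpedia:Paris,_Ontario"): 0.3,
}
result = disambiguate(
    ["Paris", "France"],
    {
        "Paris": ["dbpedia:Paris", "dbpedia:Paris_Hilton", "dbpedia:Paris,_Ontario"],
        "France": ["dbpedia:France"],
        "B": ["dbpedia:B_%28Los_Angeles_Railway%29"],
    },
    lambda m, u: toy_scores.get((m, u), 0.0),
)
print(result)  # {'Paris': 'dbpedia:Paris', 'France': 'dbpedia:France'}
```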

11 Disambiguation score
Uses context: a weighted sum of three similarity metrics:
- String similarity
- Structural similarity
- Contextual similarity
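A minimal sketch of the weighted sum follows. The talk does not state the weights, so the equal default weights are a placeholder assumption. Structural similarity is a yes/no relation test, so this sketch counts it as 1.0 or 0.0.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Weights:
    """Hypothetical weights; the actual values are not given in the talk."""
    string: float = 1 / 3
    structural: float = 1 / 3
    contextual: float = 1 / 3


def disambiguation_score(string_sim: float, structural_sim: bool,
                         contextual_sim: float, w: Weights = Weights()) -> float:
    """Weighted sum of the three similarity metrics for one (mention, URI) pair."""
    return (w.string * string_sim
            + w.structural * float(structural_sim)  # True -> 1.0, False -> 0.0
            + w.contextual * contextual_sim)
```

Keeping the weights in a frozen dataclass makes it easy to try alternative weightings without changing the scoring function.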

12 String similarity
The similarity between the text string and the labels of the matching URIs:

Compared strings                  Levenshtein  Jaccard  Monge-Elkan
Paris and Paris Hilton            0.4166667    0.5      1.0
Paris and Paris, Ontario          0.35714287   0.0      1.0
Paris Hilton and Paris, Ontario   0.4285714    0.0      0.6333333
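The Levenshtein and Jaccard columns can be reproduced with the sketch below. It assumes that Levenshtein similarity is normalised as 1 - distance / length of the longer string, and that Jaccard works on whitespace-separated tokens with punctuation kept, so "Paris," differs from "Paris". Under these assumptions the code gives 5/12, 5/14 and 6/14 for the three pairs, matching the table. Monge-Elkan is left out because its value depends on the inner token similarity it uses.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions), row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def jaccard_similarity(a: str, b: str) -> float:
    """|A ∩ B| / |A ∪ B| over whitespace tokens."""
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    return 1.0 if not union else len(sa & sb) / len(union)


for a, b in [("Paris", "Paris Hilton"), ("Paris", "Paris, Ontario"),
             ("Paris Hilton", "Paris, Ontario")]:
    print(f"{a} and {b}: Levenshtein {levenshtein_similarity(a, b):.7f}, "
          f"Jaccard {jaccard_similarity(a, b):.1f}")
```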

13 Structural similarity
Is there a relation between the ambiguous NE and any other NE from the same sentence or document?
- Paris ... France: true (Paris capitalOf France)
- Paris ... New York: true (ParisHilton bornIn NewYorkCity)
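Here is a sketch of the structural check using a small in-memory set of triples. The triples are hypothetical and written in the slide's shorthand rather than as full DBpedia URIs. Against DBpedia itself the same question could be posed, for example, as a SPARQL ASK query. A candidate passes if any triple links it, in either direction, to another NE in its context.

```python
# Hypothetical relation triples in the slide's shorthand.
TRIPLES = {
    ("Paris", "capitalOf", "France"),
    ("ParisHilton", "bornIn", "NewYorkCity"),
}


def related(a: str, b: str, triples=TRIPLES) -> bool:
    """True if some triple links a and b, in either direction."""
    return any({s, o} == {a, b} for s, _, o in triples)


def structural_similarity(candidate: str, context_entities: list[str],
                          triples=TRIPLES) -> bool:
    """Does the candidate relate to any other NE in the same sentence/document?"""
    return any(related(candidate, other, triples)
               for other in context_entities if other != candidate)


print(structural_similarity("Paris", ["France"]))        # True
print(structural_similarity("ParisHilton", ["France"]))  # False
```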

14 Contextual similarity
The degree to which two words appear with a similar set of other words (Random Indexing). Most similar terms (similarity:term) for each candidate:
- Paris, France: 0.9999999:paris, 0.3674829:métro, 0.356694:paul-martin, 0.34328446:lewden, 0.33907568:pimpfen, 0.33907568:théas, 0.33907568:werfft, 0.33907568:birmoverse, 0.33907568:cszhech, 0.330207:pierre
- Paris, Ontario: 0.6818793:paris, 0.6818793:ontario, 0.5707274:merrickville-wolford, 0.5707274:naiscoutaing, 0.5707274:neguaguon, 0.5707274:magnetewan, 0.5707274:wabauskang, 0.5679094:tp, 0.5468101:s-e, 0.42145208:henvey
- Paris Hilton: 0.7042532:hilton, 0.70425296:paris, 0.2825679:poverty-related, 0.276114:jaumont, 0.276114:jaune-montagne, 0.276114:malancourt-la-montagne, 0.26384133:mons–january, 0.26142785:métro, 0.26125407:tank-tread, 0.26125407:“plane’s
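Random Indexing can be illustrated with a toy sketch. Each word gets a sparse random index vector with a few +1/-1 entries. A word's context vector is the sum of the index vectors of its neighbours within a sliding window, and words are then compared by cosine similarity. The dimensions, window size and corpus below are hypothetical; the talk does not describe its model's parameters or training data. In the example, two words that occur in identical contexts end up with identical context vectors.

```python
import math
import random
from collections import defaultdict

# Hypothetical parameters: vector size, non-zero entries per index vector,
# and context window (words on each side).
DIM, NONZERO, WINDOW = 512, 8, 2


def index_vector(rng: random.Random) -> list[float]:
    """Sparse random ternary vector: a few +1/-1 entries, the rest zero."""
    vec = [0.0] * DIM
    for pos in rng.sample(range(DIM), NONZERO):
        vec[pos] = rng.choice((1.0, -1.0))
    return vec


def context_vectors(sentences: list[str], seed: int = 0) -> dict[str, list[float]]:
    """Sum the index vectors of each word's neighbours within the window."""
    rng = random.Random(seed)
    index: dict[str, list[float]] = {}
    context: dict[str, list[float]] = defaultdict(lambda: [0.0] * DIM)
    for sentence in sentences:
        words = sentence.lower().split()
        for word in words:
            if word not in index:
                index[word] = index_vector(rng)
        for i, word in enumerate(words):
            for j in range(max(0, i - WINDOW), min(len(words), i + WINDOW + 1)):
                if j != i:
                    neighbour, ctx = index[words[j]], context[word]
                    for k in range(DIM):
                        ctx[k] += neighbour[k]
    return dict(context)


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


vectors = context_vectors([
    "paris is the capital of france",
    "lyon is the capital of france",
])
print(round(cosine(vectors["paris"], vectors["lyon"]), 6))  # 1.0
```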

15 Evaluation
Data: 100 manually annotated Wikipedia user profiles

System                      Precision  Recall  F-measure
LKB                         0.03       0.86    0.05
LKB+ANNIE                   0.14       0.81    0.24
LKB+ANNIE+Disambiguation    0.66       0.75    0.70
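To check how the F-measure column relates to precision and recall, here is a small sketch that assumes the balanced F-measure (F1, the harmonic mean). Recomputing from the rounded table values gives 0.24 and 0.70 for the last two rows. For the LKB row it gives about 0.06 rather than 0.05, presumably because the reported precision and recall are themselves rounded.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta = 1 gives the harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


for name, p, r in [("LKB", 0.03, 0.86), ("LKB+ANNIE", 0.14, 0.81),
                   ("LKB+ANNIE+Disambiguation", 0.66, 0.75)]:
    print(f"{name}: F1 = {f_measure(p, r):.2f}")
```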

16 Conclusion
Using Linked Data as an additional knowledge source for resolving context eliminated a large number of incorrect annotations: precision rose from 0.14 to 0.66, while recall fell from 0.81 to 0.75.

17 Thank You! Questions?
More about the project: http://www.trendminer-project.eu
Contact: danica.damljanovic@gmail.com

