Named Entity Disambiguation using Linked Data
Danica Damljanović, The University of Sheffield
Brunel University London, 05 March 2012

Named Entity Disambiguation in TrendMiner
Inputs: newswire, market data, polls, …
Processing: multilingual text processing (EN, DE, IT, BG, HI), time-series machine learning models, cross-lingual summarisation, knowledge-based search and browse (TrendMiner platform)
Use cases: financial decisions, political analysis
Named entity recognition is the first step, and it is important to get it right!
Partners: Hardik Fintrade Pvt. Ltd., SORA, Eurokleis srl

Example

Linked Data
[Figure: Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch]

Why DBpedia?
- Regularly updated (from Wikipedia)
- A good source of named entities
- A hierarchy of concepts: a capital is also a city, but not vice versa
- Relations between concepts: Paris locatedIn France, ParisHilton bornIn NewYorkCity
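Both properties, a class hierarchy in which subsumption runs one way only and typed relations between instances, can be illustrated with a toy in-memory model. This is a minimal sketch: the contents of `SUBCLASS_OF` and `TRIPLES` are hypothetical stand-ins modelled on the slide's examples, not real DBpedia identifiers, and `is_a` simply follows subclass links upwards.

```python
# Toy model (hypothetical data, not real DBpedia identifiers) of a class
# hierarchy plus relations between instances.
SUBCLASS_OF = {
    "Capital": "City",   # every capital is a city ...
    "City": "Place",
    "Country": "Place",
}

TRIPLES = {
    ("Paris", "locatedIn", "France"),
    ("ParisHilton", "bornIn", "NewYorkCity"),
}

def is_a(cls: str, ancestor: str) -> bool:
    """True if cls equals ancestor or reaches it by following subclass links."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)  # None once we pass the top of the hierarchy
    return False

def related(subject: str, predicate: str, obj: str) -> bool:
    """True if the triple (subject, predicate, obj) is asserted."""
    return (subject, predicate, obj) in TRIPLES

print(is_a("Capital", "City"))   # a capital is a city
print(is_a("City", "Capital"))   # ... but not every city is a capital
print(related("Paris", "locatedIn", "France"))
```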

Task
Identify named entities in text and attach the correct DBpedia URI to each of them.

Named Entity Recognition: ANNIE
- Produces NE types such as Organization, Location and Person
- Resolves coreference: mentions that refer to the same entity are linked, e.g. General Motors and GM
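ANNIE's orthographic coreference component relies on hand-written rules; one of the simplest kinds of rule links an acronym to the full name it abbreviates, as with GM and General Motors. The functions below are a minimal sketch of that single idea, not ANNIE's actual rule set; the names `is_acronym_of` and `coreference_chains` and the stop-word list are assumptions for illustration.

```python
STOP_WORDS = {"of", "the", "and", "for", "&"}  # assumed: words skipped when forming initials

def is_acronym_of(short: str, full: str) -> bool:
    """True if `short` spells the initials of the content words of `full`,
    e.g. "GM" / "General Motors" or "BBC" / "the British Broadcasting Corporation"."""
    words = [w for w in full.replace(".", " ").split() if w.lower() not in STOP_WORDS]
    initials = "".join(w[0].upper() for w in words)
    return len(short) > 1 and short.replace(".", "").upper() == initials

def coreference_chains(mentions):
    """Greedy grouping: a mention joins the first chain whose first member it equals
    or abbreviates (or is abbreviated by); otherwise it starts a new chain."""
    chains = []
    for m in mentions:
        for chain in chains:
            head = chain[0]
            if m == head or is_acronym_of(m, head) or is_acronym_of(head, m):
                chain.append(m)
                break
        else:
            chains.append([m])
    return chains

print(coreference_chains(["General Motors", "Ford", "GM"]))
```

A real system would also check that linked mentions share an NE type; this sketch ignores types.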

Entity Linking: the Large Knowledge Base (LKB) Gazetteer
- Matches text against URIs
- Matches only against the values of the rdfs:label and foaf:name properties
- Only for instances of the classes dbpedia-ont:Person, dbpedia-ont:Organisation and dbpedia-ont:Place
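The matching step can be pictured as a dictionary from label strings to candidate URIs, built only from `rdfs:label` and `foaf:name` values of instances typed as Person, Organisation or Place. This sketch assumes the triples are already loaded as plain string tuples; the sample triples (including `dbpedia:Paris_(song)`) are hypothetical, and the real LKB gazetteer works over a full DBpedia dump with its own matching options, which are not modelled here. Because no context is used, every occurrence of a known label is reported, including "Paris" inside "Paris Hilton".

```python
from collections import defaultdict

LABEL_PROPERTIES = {"rdfs:label", "foaf:name"}
TARGET_CLASSES = {"dbpedia-ont:Person", "dbpedia-ont:Organisation", "dbpedia-ont:Place"}

def build_gazetteer(triples):
    """Map lower-cased label -> set of URIs, restricted to instances of the target classes."""
    typed = {s for s, p, o in triples if p == "rdf:type" and o in TARGET_CLASSES}
    gazetteer = defaultdict(set)
    for s, p, o in triples:
        if p in LABEL_PROPERTIES and s in typed:
            gazetteer[o.lower()].add(s)
    return gazetteer

def lookup(text, gazetteer, max_len=3):
    """Return (start_token, end_token, uris) for every token n-gram that equals a label."""
    tokens = text.split()
    matches = []
    for i in range(len(tokens)):
        for n in range(1, max_len + 1):
            if i + n > len(tokens):
                break
            phrase = " ".join(tokens[i:i + n]).lower()
            if phrase in gazetteer:
                matches.append((i, i + n, sorted(gazetteer[phrase])))
    return matches

SAMPLE_TRIPLES = [  # hypothetical sample data
    ("dbpedia:Paris", "rdf:type", "dbpedia-ont:Place"),
    ("dbpedia:Paris", "rdfs:label", "Paris"),
    ("dbpedia:Paris_Hilton", "rdf:type", "dbpedia-ont:Person"),
    ("dbpedia:Paris_Hilton", "foaf:name", "Paris Hilton"),
    ("dbpedia:Paris_(song)", "rdfs:label", "Paris"),  # untyped here, so ignored
]
gaz = build_gazetteer(SAMPLE_TRIPLES)
print(lookup("Paris Hilton visited Paris", gaz))
```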

So, why not just combine them?
- NE annotations generated by ANNIE have no URI
- LKB does not use any context, so it produces spurious entities: e.g. every letter B is annotated as a possible mention of dbpedia:B_%28Los_Angeles_Railway%29, which refers to a line called B operated by the Los Angeles Railway

How to filter out the noise?
1. Identify NEs (Location, Organisation and Person) using ANNIE
2. For each NE, add the URIs of matching instances from DBpedia
3. For each ambiguous NE, calculate disambiguation scores
4. Remove all matches except the highest-scoring one
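The four steps can be sketched as a small filtering function. Assumptions: `entities` stands for ANNIE's output as (mention, type) pairs using ANNIE's type names, `candidates` for the URIs the gazetteer attached to each mention, and `score` for the disambiguation score; all three names are hypothetical, and results are keyed by mention text as a simplification.

```python
from typing import Callable, Dict, List, Optional, Tuple

Entity = Tuple[str, str]  # (mention text, NE type), e.g. ("Paris", "Location")

def disambiguate(
    entities: List[Entity],
    candidates: Dict[str, List[str]],
    score: Callable[[str, str], float],
) -> Dict[str, Optional[str]]:
    """Keep at most one URI per named entity.

    Step 1 (NE recognition) is assumed done: `entities` is its output.
    Step 2: look up the candidate URIs for each mention.
    Steps 3-4: score the candidates of an ambiguous mention, keep the best one.
    """
    allowed = {"Location", "Organization", "Person"}  # ANNIE type names
    result: Dict[str, Optional[str]] = {}
    for mention, ne_type in entities:
        if ne_type not in allowed:
            continue
        uris = candidates.get(mention, [])
        if len(uris) <= 1:
            result[mention] = uris[0] if uris else None  # unambiguous, or no match
        else:
            result[mention] = max(uris, key=lambda uri: score(mention, uri))
    return result

# Hypothetical scores, standing in for the weighted-sum disambiguation score.
toy_scores = {("Paris", "dbpedia:Paris"): 0.9, ("Paris", "dbpedia:Paris,_Ontario"): 0.4}
linked = disambiguate(
    [("Paris", "Location"), ("Google", "Organization"), ("Monday", "Date")],
    {"Paris": ["dbpedia:Paris,_Ontario", "dbpedia:Paris"], "Google": ["dbpedia:Google"]},
    lambda m, u: toy_scores.get((m, u), 0.0),
)
print(linked)
```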

Disambiguation score
- Uses context
- A weighted sum of three similarity metrics: string similarity, structural similarity and contextual similarity
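Written out, with m an ambiguous mention, C(m) its set of candidate URIs and w_s, w_r, w_c the three weights (notation chosen here for illustration; the slide gives neither symbols nor weight values), the score and the final choice are:

```latex
\begin{aligned}
\mathrm{score}(m,u) &= w_{s}\,\mathrm{sim}_{\text{string}}(m,u) + w_{r}\,\mathrm{sim}_{\text{struct}}(m,u) + w_{c}\,\mathrm{sim}_{\text{context}}(m,u),\\
\hat{u}(m) &= \operatorname*{arg\,max}_{u \in C(m)} \mathrm{score}(m,u).
\end{aligned}
```

Keeping the weights non-negative with w_s + w_r + w_c = 1 is one natural choice, since the score then stays in [0, 1] whenever each metric does; whether the original system normalised the weights this way is not stated.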

String similarity
Compares the text string with the labels of the matching URIs, using an edit distance (Levenshtein) and token-based metrics (Jaccard, Monge-Elkan):
Pair | Levenshtein | Jaccard | Monge-Elkan
Paris vs. Paris Hilton |  | 0.5 | 1.0
Paris vs. Paris, Ontario |  | 0.0 | 1.0
Paris Hilton vs. Paris, Ontario |  | 0.0 | 
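The three metrics can be sketched in a few lines of Python. Assumptions: Levenshtein distance is turned into a similarity by dividing by the longer string's length; Jaccard compares sets of lower-cased, whitespace-separated tokens; Monge-Elkan averages, over the tokens of the first string, the best inner similarity against any token of the second, with normalised Levenshtein as the inner metric. With whitespace tokens the comma stays attached to "Paris,", which is consistent with the slide's Jaccard values (0.5, 0.0, 0.0). Monge-Elkan depends on the inner metric, so this sketch need not reproduce the slide's 1.0 for Paris vs. Paris, Ontario.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]

def levenshtein_sim(a: str, b: str) -> float:
    """1 - distance / length of the longer string, so identical strings score 1.0."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))

def jaccard(a: str, b: str) -> float:
    """|A & B| / |A | B| over lower-cased whitespace tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

def monge_elkan(a: str, b: str, inner=levenshtein_sim) -> float:
    """Mean over tokens of `a` of the best `inner` similarity to any token of `b`."""
    ta, tb = a.lower().split(), b.lower().split()
    if not ta or not tb:
        return 0.0
    return sum(max(inner(x, y) for y in tb) for x in ta) / len(ta)

pairs = [("Paris", "Paris Hilton"), ("Paris", "Paris, Ontario"), ("Paris Hilton", "Paris, Ontario")]
for a, b in pairs:
    print(f"{a!r} vs {b!r}: levenshtein={levenshtein_sim(a, b):.2f} "
          f"jaccard={jaccard(a, b):.2f} monge-elkan={monge_elkan(a, b):.2f}")
```

Note that Monge-Elkan is asymmetric: it asks how well each token of the first string is covered by the second.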

Structural similarity
Is there a relation between the ambiguous NE and any other NE from the same sentence or document?
- Paris ... France → true (Paris capitalOf France)
- Paris ... New York → true (ParisHilton bornIn NewYorkCity)
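A binary version of this check, matching the true/false outcomes on the slide, can be sketched as follows. Assumptions: the knowledge base is a set of (subject, predicate, object) triples, a relation counts in either direction, and the URIs of the other NEs in the sentence or document have already been collected; the triples in `KB` are hypothetical stand-ins mirroring the slide's examples.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]

def structural_similarity(candidate: str, context_uris: Iterable[str], triples: Set[Triple]) -> float:
    """1.0 if `candidate` is directly related (in either direction) to the URI
    of any other NE in the same sentence or document, else 0.0."""
    context = set(context_uris) - {candidate}
    for s, _, o in triples:
        if (s == candidate and o in context) or (o == candidate and s in context):
            return 1.0
    return 0.0

KB = {  # hypothetical triples
    ("dbpedia:Paris", "capitalOf", "dbpedia:France"),
    ("dbpedia:Paris_Hilton", "bornIn", "dbpedia:New_York_City"),
}
# A text mentioning "Paris" and "France": the city is related to France, Paris Hilton is not.
print(structural_similarity("dbpedia:Paris", ["dbpedia:France"], KB))
print(structural_similarity("dbpedia:Paris_Hilton", ["dbpedia:France"], KB))
```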

Contextual similarity
The degree to which two words appear with a similar set of other words, measured with Random Indexing. Terms associated with each sense of Paris:
- Paris (France): :paris :métro :paul-martin :lewden :pimpfen :théas :werfft :birmoverse :cszhech :pierre
- Paris (Ontario): :paris :ontario :merrickville-wolford :naiscoutaing :neguaguon :magnetewan :wabauskang :tp :s-e :henvey
- Paris Hilton: :hilton :paris :poverty-related :jaumont :jaune-montagne :malancourt-la-montagne :mons–january :métro :tank-tread :“plane’s
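Random Indexing builds word vectors incrementally: every word gets a fixed, sparse, random index vector, and a word's context vector is the sum of the index vectors of the words around it, so words that occur with similar neighbours end up with similar context vectors (compared here by cosine). The class below is a minimal from-scratch sketch; the dimensionality, number of non-zero entries, window size and seed are assumed values, and how the project turned such vectors into a score per candidate URI is not described on the slide, so that step is not modelled.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List

class RandomIndexing:
    """Minimal Random Indexing: sparse random index vectors summed over a context window."""

    def __init__(self, dim: int = 512, nonzeros: int = 8, window: int = 2, seed: int = 42):
        self.dim, self.nonzeros, self.window = dim, nonzeros, window
        self.rng = random.Random(seed)  # fixed seed -> reproducible index vectors
        self.index: Dict[str, Dict[int, int]] = {}  # word -> {position: +1 or -1}
        self.context: Dict[str, List[float]] = defaultdict(lambda: [0.0] * self.dim)

    def _index_vector(self, word: str) -> Dict[int, int]:
        """Give each new word a sparse ternary vector: half +1 entries, half -1 entries."""
        if word not in self.index:
            positions = self.rng.sample(range(self.dim), self.nonzeros)
            half = self.nonzeros // 2
            self.index[word] = {p: (1 if k < half else -1) for k, p in enumerate(positions)}
        return self.index[word]

    def train(self, sentences: List[str]) -> None:
        """Add the index vectors of each word's neighbours to that word's context vector."""
        for sentence in sentences:
            tokens = sentence.lower().split()
            for i, word in enumerate(tokens):
                vec = self.context[word]
                for j in range(max(0, i - self.window), min(len(tokens), i + self.window + 1)):
                    if j != i:
                        for pos, sign in self._index_vector(tokens[j]).items():
                            vec[pos] += sign

    def similarity(self, a: str, b: str) -> float:
        """Cosine similarity of two context vectors (0.0 if either word is unseen)."""
        va, vb = self.context.get(a), self.context.get(b)
        if va is None or vb is None:
            return 0.0
        dot = sum(x * y for x, y in zip(va, vb))
        norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(y * y for y in vb))
        return dot / norm if norm else 0.0

    def nearest(self, word: str, k: int = 10) -> List[str]:
        """The k words whose context vectors are most similar to that of `word`."""
        others = [w for w in self.context if w != word]
        return sorted(others, key=lambda w: self.similarity(word, w), reverse=True)[:k]

ri = RandomIndexing(window=1)
ri.train(["the cat sat on the mat", "the dog sat on the rug"])
print(ri.similarity("cat", "dog"))  # same neighbours ("the", "sat") -> same context vector
```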

Evaluation
Configurations compared by precision, recall and F-measure:
- LKB
- LKB + ANNIE
- LKB + ANNIE + disambiguation
Test data: Wikipedia user profiles, manually annotated
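Precision, recall and F-measure for entity linking can be computed by comparing the system's annotations with the manual ones. This sketch assumes strict matching, where an annotation counts as correct only when both its span and its URI match a gold annotation exactly; the slide does not say which matching criterion was used, and the offsets and URIs below are hypothetical.

```python
from typing import Set, Tuple

Annotation = Tuple[int, int, str]  # (start offset, end offset, DBpedia URI)

def evaluate(system: Set[Annotation], gold: Set[Annotation]) -> Tuple[float, float, float]:
    """Strict-match precision, recall and balanced F-measure (F1)."""
    correct = len(system & gold)
    precision = correct / len(system) if system else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = {(0, 5, "dbpedia:Paris"), (20, 26, "dbpedia:France")}
system = {  # two correct links plus two spurious ones (hypothetical)
    (0, 5, "dbpedia:Paris"),
    (20, 26, "dbpedia:France"),
    (30, 31, "dbpedia:B_%28Los_Angeles_Railway%29"),
    (40, 45, "dbpedia:Paris_Hilton"),
}
p, r, f = evaluate(system, gold)
print(f"precision={p:.2f} recall={r:.2f} f-measure={f:.2f}")
```

Removing spurious matches, which is what the disambiguation step aims to do, raises precision without lowering recall as long as no correct link is discarded.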

Conclusion
Using Linked Data as an additional knowledge source for resolving context eliminated a large number of incorrect annotations.

Thank you! Questions?
More about the project: project.eu
Contact: