Toward Semantic Web Information Extraction B. Popov, A. Kiryakov, D. Manov, A. Kirilov, D. Ognyanoff, M. Goranov Presenter: Yihong Ding.

1 Toward Semantic Web Information Extraction B. Popov, A. Kiryakov, D. Manov, A. Kirilov, D. Ognyanoff, M. Goranov Presenter: Yihong Ding

2 Toward a Semantic Web: Fully automatic methods for semantic annotation are needed. Related topics: information retrieval (IR), information extraction (IE), named-entity recognition (NER), and annotation processes.

3 Semantic Annotation Diagram

4 Named Entities: Named entities (NEs) are people, organizations, locations, and other things referred to by name. They may also include scalars and expressions such as numbers, amounts of money, and dates (NUMEX, TIMEX). Hypothesis: the named entities (and the relations between them) mentioned in a resource constitute an important part of its semantics.

5 Semantic Annotation of NEs: Semantic annotation of the NEs in a text includes recognition of the type of each entity in the text and identification of the individual entity. Comparison: the traditional NER approach results in: Yihong Ding; the semantic annotation of NEs should result in something like the following: Yihong Ding
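The contrast on this slide can be sketched as two annotation records over the same span. The markup from the original slide did not survive in this transcript, so the `Annotation` type, the flat `"Person"` label, and both `example.org` identifiers below are hypothetical placeholders, not KIM's actual output format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Annotation:
    text: str                       # surface string found in the document
    start: int                      # character offset where the mention begins
    end: int                        # character offset just past the mention
    entity_type: str                # flat NE type, or an ontology class
    instance: Optional[str] = None  # identifier of the individual, if resolved


doc = "Presenter: Yihong Ding"
start = doc.index("Yihong Ding")
end = start + len("Yihong Ding")

# Traditional NER: only a flat type is attached to the span.
ner = Annotation("Yihong Ding", start, end, "Person")

# Semantic annotation: an ontology class plus a reference to the individual.
# Both identifiers are made up for illustration.
semantic = Annotation(
    "Yihong Ding", start, end,
    entity_type="http://example.org/onto#Person",
    instance="http://example.org/kb#Person.1",
)
```

The only structural difference is the `instance` field: recognition fills `entity_type`, identification fills `instance`.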

6 The KIM Platform: The Knowledge and Information Management (KIM) Platform provides: automatic semantic annotation of NEs (and the relations between them); ontology population with NE individuals and relations; indexing and retrieval with respect to NEs; query and navigation over the formal knowledge.

7 KIM Constituents: KIM Ontology (KIMO); KIM World KB; KIM Server, with an API for remote access and integration; front-ends: KIM Web UI, plug-in for Internet Explorer, and KB Explorer.

8 KIM Bases: KIM is based on the following open-source platforms: GATE, the NLP and IE platform from the University of Sheffield; Sesame, the RDF(S) repository from Aidministrator b.v., with Ontology Middleware and Custom Inference by Ontotext as extensions of Sesame; Lucene, the open-source IR engine from Apache.

9 KIM Architecture

10 KIM Ontology (KIMO): Light-weight upper-level ontology with 250 NE classes and 100 relations and attributes. It covers mostly NE classes and ignores general concepts, and it includes classes representing lexical resources. www.ontotext.com/KIM/kimo.rdfs

11 KIM World KB: A projection of the world (domain ontology): quasi-exhaustive coverage of the most popular entities in the world; entities of general importance, like the ones that appear in the news. At present the KIM KB consists of about 200,000 entities: 50,000 locations, 130,000 organizations, 6,000 people, etc.

12 Entity Description: NEs are represented in the KIM World KB by semantic descriptions consisting of: aliases (Florida and FL); relations with other entities (Person hasPosition Position); attributes (latitude and longitude of geographic entities); the proper class of the NE.
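The four parts of an entity description map naturally onto a small record type. The sketch below is an assumption about shape only: the `EntityDescription` class, the `Province` class name, the `subRegionOf` relation, the `example.org` URIs, and the rounded coordinates are all illustrative and are not taken from the KIM World KB.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class EntityDescription:
    uri: str                                            # identifier of the individual
    entity_class: str                                   # proper class of the NE
    aliases: List[str] = field(default_factory=list)    # alternative names
    relations: List[Tuple[str, str]] = field(default_factory=list)  # (predicate, target URI)
    attributes: Dict[str, float] = field(default_factory=dict)      # literal-valued properties

    def has_alias(self, name: str) -> bool:
        """Case-insensitive check of a surface form against the known aliases."""
        wanted = name.casefold()
        return any(alias.casefold() == wanted for alias in self.aliases)


# Hypothetical description for the slide's "Florida & FL" example.
florida = EntityDescription(
    uri="http://example.org/kb#Florida",
    entity_class="Province",
    aliases=["Florida", "FL"],
    relations=[("subRegionOf", "http://example.org/kb#UnitedStates")],
    attributes={"latitude": 28.0, "longitude": -82.0},  # rough, illustrative values
)
```

Keeping aliases on the description is what lets a lookup step map both "Florida" and "FL" to the same individual.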

13 KIM Server: APIs for semantic annotation; document persistence; indexing and retrieval of documents with respect to NEs; semantic repository access and exploration.

14 KIM Semantic Information Extraction: Based on GATE, the NLP and IE platform; rules are now based on ontology classes instead of a flat set of NE types. Recognition and identification of the NEs. IE is supported by a semantic repository containing lexical and gazetteer resources, with annotations referring to entity descriptions. Ontology population with the newly recognized entities and relations.
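A gazetteer lookup whose entries carry an ontology class and an instance reference, rather than a flat NE type, can be sketched as follows. This is a toy longest-match lookup written for illustration; it does not use GATE's API, and the `GAZETTEER` entries, class names, and `kb:` identifiers are assumptions.

```python
import re

# Hypothetical gazetteer: lower-cased alias -> (ontology class, instance id).
GAZETTEER = {
    "florida": ("Province", "kb:Florida"),
    "fl": ("Province", "kb:Florida"),
    "university of sheffield": ("University", "kb:UniversityOfSheffield"),
}


def annotate(text):
    """Return (start, end, class, instance) tuples, preferring the longest match."""
    tokens = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+", text)]
    max_len = max(len(alias.split()) for alias in GAZETTEER)
    results = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tok[0].lower() for tok in tokens[i:i + n])
            if phrase in GAZETTEER:
                cls, inst = GAZETTEER[phrase]
                results.append((tokens[i][1], tokens[i + n - 1][2], cls, inst))
                i += n  # skip past the matched tokens
                break
        else:
            i += 1  # no alias starts at this token
    return results


text = "GATE comes from the University of Sheffield, not Florida."
annotations = annotate(text)
```

Each result points at an entity description through its instance id, so downstream rules can reason over the class hierarchy instead of a fixed list of types.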

15 KIM IE Pipeline

16 KIM Plug-in

17 KIM IE Performance: Evaluated over three human-annotated corpora of news articles (International Business News, International Political News, and UK Political News; ~500 articles): precision 86%, recall 84% with respect to the standard NE types. But these metrics are not representative of semantic annotation.
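For reference, precision and recall over typed spans are computed as below. This is the standard exact-match definition, shown on made-up toy data; it is not the evaluation script behind the figures on this slide.

```python
def precision_recall(predicted, gold):
    """Exact-match precision and recall over sets of (start, end, type) tuples."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Toy data: four gold annotations, four predictions, three of them correct.
gold = [(0, 5, "Person"), (10, 16, "Location"), (20, 28, "Organization"), (30, 34, "Location")]
predicted = [(0, 5, "Person"), (10, 16, "Location"), (20, 28, "Person"), (30, 34, "Location")]

p, r = precision_recall(predicted, gold)
```

Because a prediction counts only when span and type both match exactly, this measure cannot reward a near miss in the class hierarchy or penalize a wrong individual, which is why it falls short for semantic annotation.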

18 Semantic Annotation Metrics: There are no established metrics for semantic annotation: no human-annotated corpora with precise class and instance information; no metrics for the various partial matches: when a more specific class is recognized, when a more general class is recognized, and when the class is correctly recognized but the individual entity is not correctly identified.
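The three partial-match cases can be told apart mechanically once a class hierarchy is available. The sketch below only labels each case; how much credit each label should earn is exactly the open question the slide raises. The `PARENT` hierarchy, the label names, and the `kb:` identifiers are hypothetical.

```python
# Hypothetical class hierarchy: child class -> parent class.
PARENT = {
    "City": "Location",
    "Location": "Entity",
    "Person": "Entity",
    "Organization": "Entity",
}


def ancestors(cls):
    """All superclasses of cls, nearest first."""
    found = []
    while cls in PARENT:
        cls = PARENT[cls]
        found.append(cls)
    return found


def match_kind(pred_cls, pred_inst, gold_cls, gold_inst):
    """Label a predicted annotation relative to the gold annotation of the same span."""
    if pred_cls == gold_cls:
        return "exact" if pred_inst == gold_inst else "class_only"
    if gold_cls in ancestors(pred_cls):
        return "more_specific_class"  # prediction is a subclass of the gold class
    if pred_cls in ancestors(gold_cls):
        return "more_general_class"   # prediction is a superclass of the gold class
    return "mismatch"
```

For example, `match_kind("City", "kb:Paris", "Location", "kb:Paris")` yields `"more_specific_class"`, while the same class with a different individual yields `"class_only"`.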

19 Conclusion: It is possible to adopt traditional IE techniques for semantic annotation. It is worth using almost-exhaustive entity knowledge for IE. KIM is still under development: proper evaluation metrics; precise disambiguation; more advanced IE techniques; KIM ontology and KB development.
