Ontea: Pattern based Annotation Platform Michal Laclavík.

1 Ontea: Pattern based Annotation Platform Michal Laclavík

2 Ontea Method
Ontea: http://ontea.sourceforge.net
Motivation
– To create semantic metadata from texts or documents
Approach
– Even unstructured text contains patterns
– Patterns can be used to extract various objects from text
– Results are key – value pairs
– Such pairs can be transformed to ontology individuals:
  Class – individual
  Individual – property

3 Result Examples
Text
– Bratislava is the capital of Slovakia. Slovakia is in Europe.
Pattern: “(in|by) + (the)? *([A-Z][a-z]+)” for Location
Ontea discovers the key – value pair:
– Location – Europe
By transformation to the ontology knowledge base, it finds Europe as a continent using inference (sub-class of Location):
– Continent – Europe
More examples are in the table:

#  Text                     Key – value                  Pattern – regular expression
1  Apple, Inc.              Company: Apple               Company: ([A-Za-z0-9]+)[, ]+(Inc|Ltd)
2  Mountain View, CA 94043  Settlement: Mountain View    Settlement: ([A-Z][a-z]+[ ]*[A-Za-z]*)[ ]+[A-Z]{2}[ ]*[0-9]{5}
3  laclavik.ui@savba.sk     Email: laclavik.ui@savba.sk  Email: [-_.a-z0-9]+@[-_.a-zA-Z0-9]+\.[a-z]{2,8}
4  Mr. Michal Laclavik      Person: Michal Laclavik      Person: (Mr.|Mrs.|Dr.) ([A-Z][a-z]+ [A-Z][a-z]+)
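
The pattern-to-pair step on this slide can be sketched in a few lines of Python. This is only an illustration, not Ontea's own code: the names `PATTERNS` and `extract_pairs` are hypothetical, and the group index stored with each pattern is an assumption about which capture group holds the value. The Location pattern is a whitespace-normalized reading of the one quoted above (the extra space after "+" is assumed to be a transcription artifact). The Settlement pattern is left out because, as transcribed, it does not allow for the comma in its sample text.

```python
import re

# (key, pattern, group) triples; group 0 means "use the whole match".
# Company, Email and Person patterns are copied from the table on the slide.
# The Location pattern is a normalized reading of the slide's pattern (assumption).
PATTERNS = [
    ("Location", r"(in|by) +(the)? *([A-Z][a-z]+)", 3),
    ("Company", r"([A-Za-z0-9]+)[, ]+(Inc|Ltd)", 1),
    ("Email", r"[-_.a-z0-9]+@[-_.a-zA-Z0-9]+\.[a-z]{2,8}", 0),
    ("Person", r"(Mr.|Mrs.|Dr.) ([A-Z][a-z]+ [A-Z][a-z]+)", 2),
]


def extract_pairs(text, patterns=PATTERNS):
    """Return a list of (key, value) pairs for every pattern match in text."""
    pairs = []
    for key, pattern, group in patterns:
        for match in re.finditer(pattern, text):
            pairs.append((key, match.group(group)))
    return pairs


if __name__ == "__main__":
    sample = "Bratislava is the capital of Slovakia. Slovakia is in Europe."
    for key, value in extract_pairs(sample):
        print(f"{key} - {value}")
```

Keeping the key, the regular expression and the value group together in one table means new object types can be added without touching the matching loop.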

4 Features
Identification of concept instances from the ontology
Automatic population of ontologies with instances
Identifying relevance when creating instances, using information retrieval techniques
Large-scale semantic annotation of documents or texts using Google’s MapReduce architecture
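
To make the MapReduce feature concrete, here is a minimal in-process sketch of how pattern annotation could be split into a map step and a reduce step. It is an assumption-laden illustration, not a description of Ontea's actual Hadoop job: the function names, the single Location pattern and the in-memory corpus are all hypothetical, and a real deployment would run the two steps on a cluster rather than in one Python process.

```python
import re
from collections import defaultdict

# One illustrative pattern (normalized reading of the Location pattern; assumption).
LOCATION = re.compile(r"(in|by) +(the)? *([A-Z][a-z]+)")


def map_document(doc_id, text):
    """Map step: emit ((key, value), doc_id) for every pattern match."""
    for match in LOCATION.finditer(text):
        yield ("Location", match.group(3)), doc_id


def reduce_annotations(emitted):
    """Reduce step: group the ids of documents that share an annotation."""
    grouped = defaultdict(list)
    for annotation, doc_id in emitted:
        grouped[annotation].append(doc_id)
    return {annotation: sorted(ids) for annotation, ids in grouped.items()}


def annotate_corpus(corpus):
    """Run the map step over each document, then reduce all emitted pairs."""
    emitted = []
    for doc_id, text in corpus.items():
        emitted.extend(map_document(doc_id, text))
    return reduce_annotations(emitted)


if __name__ == "__main__":
    corpus = {
        "d1": "Slovakia is in Europe.",
        "d2": "Vienna is in Austria.",
        "d3": "Poland is in Europe.",
    }
    print(annotate_corpus(corpus))
```

Because each document is mapped independently, the map step parallelizes without coordination; only the grouping in the reduce step needs to see results from several documents.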

5 Advantages
Simple, customizable method
Not tied to document structure
Architecture built on the detection of key – value pairs and their various transformations. For example:
– Text: “Slovensko je v Európe” (Slovak for “Slovakia is in Europe”) =>
– Extraction: Location – Európe =>
– Transformation, Lemmatization: Location – Európa =>
– Transformation, Ontology: Continent – Europe
Scalable method, ported to Grid and Hadoop
Applicable to texts in any language
Success rate of 60%–90%, depending on the patterns, transformers and application used
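
The extraction and transformation chain in the example above can be sketched as a sequence of small functions applied to a key – value pair. Everything here is a toy stand-in under stated assumptions: `LEMMAS` and `ONTOLOGY` are hypothetical lookup tables replacing a real lemmatizer and ontology repository, and the Slovak pattern is invented for this one sentence.

```python
import re

# Hypothetical toy resources standing in for a real lemmatizer and ontology.
LEMMAS = {"Európe": "Európa"}
ONTOLOGY = {("Location", "Európa"): ("Continent", "Europe")}


def extract(text):
    """Extraction: 'v' (Slovak for 'in') followed by a capitalized word."""
    match = re.search(r"\bv ([A-Z]\w+)", text)
    return ("Location", match.group(1)) if match else None


def lemmatize(pair):
    """Transformation: replace the value with its base form when known."""
    key, value = pair
    return (key, LEMMAS.get(value, value))


def to_ontology(pair):
    """Transformation: map the pair to a known ontology individual, if any."""
    return ONTOLOGY.get(pair, pair)


def annotate(text):
    """Return the pair after extraction and after each transformation."""
    pair = extract(text)
    if pair is None:
        return []
    steps = [pair]
    for transform in (lemmatize, to_ontology):
        pair = transform(pair)
        steps.append(pair)
    return steps


if __name__ == "__main__":
    for key, value in annotate("Slovensko je v Európe"):
        print(f"{key} - {value}")
```

Each transformer takes a pair and returns a pair, so steps can be reordered, dropped or replaced per language without changing the extraction stage.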

6 Integration with other tools
[Diagram; labels recovered from the slide]
Tools: Ontea, DocConverter, Nalit, Morphonary, Lucene
Inputs: URL, Plain Text
Steps: Language Identification; Pattern Matching; Transformation: Lemmatization; Transformation: Relevance Identification; Transformation: Individual Search and Creation
Storage: Ontology Repository

7 Future research & development http://ontea.sourceforge.net/

