Ontologies, semantic annotation and GATE Kalina Bontcheva Johann Petrak University of Sheffield.

Slides:



Advertisements
Similar presentations
1/(20) Introduction to ANNIE Diana Maynard University of Sheffield March 2004
Advertisements

University of Sheffield NLP Exercise I Objective: Implement a ML component based on SVM to identify the following concepts in company profiles: company.
University of Sheffield NLP Machine Learning in GATE Angus Roberts, Horacio Saggion, Genevieve Gorrell.
University of Sheffield NLP Module 4: Machine Learning.
Chronos: A Tool for Handling Temporal Ontologies in Protégé
1 A Description Logic with Concrete Domains CS848 presentation Presenter: Yongjuan Zou.
So What Does it All Mean? Geospatial Semantics and Ontologies Dr Kristin Stock.
Research topics Semantic Web - Spring 2007 Computer Engineering Department Sharif University of Technology.
COMP 6703 eScience Project Semantic Web for Museums Student : Lei Junran Client/Technical Supervisor : Tom Worthington Academic Supervisor : Peter Strazdins.
Annotating Documents for the Semantic Web Using Data-Extraction Ontologies Dissertation Proposal Yihong Ding.
Detecting Economic Events Using a Semantics-Based Pipeline 22nd International Conference on Database and Expert Systems Applications (DEXA 2011) September.
The RDF meta model: a closer look Basic ideas of the RDF Resource instance descriptions in the RDF format Application-specific RDF schemas Limitations.
Toward Semantic Web Information Extraction B. Popov, A. Kiryakov, D. Manov, A. Kirilov, D. Ognyanoff, M. Goranov Presenter: Yihong Ding.
Ontology-based Information Extraction for Business Intelligence
Editing Description Logic Ontologies with the Protege OWL Plugin.
Towards a semantic extraction of named entities Diana Maynard, Kalina Bontcheva, Hamish Cunningham University of Sheffield, UK.
Artificial Intelligence Research Centre Program Systems Institute Russian Academy of Science Pereslavl-Zalessky Russia.
Semantic Web Technologies Lecture # 2 Faculty of Computer Science, IBA.
OMAP: An Implemented Framework for Automatically Aligning OWL Ontologies SWAP, December, 2005 Raphaël Troncy, Umberto Straccia ISTI-CNR
RDF (Resource Description Framework) Why?. XML XML is a metalanguage that allows users to define markup XML separates content and structure from formatting.
Semantic Web. Course Content
Ontology Development Kenneth Baclawski Northeastern University Harvard Medical School.
Practical RDF Chapter 1. RDF: An Introduction
Survey of Semantic Annotation Platforms
Protege OWL Plugin Short Tutorial. OWL Usage The world wide web is a natural application area of ontologies, because ontologies could be used to describe.
INF 384 C, Spring 2009 Ontologies Knowledge representation to support computer reasoning.
Funded by: European Commission – 6th Framework Project Reference: IST WP 2: Learning Web-service Domain Ontologies Miha Grčar Jožef Stefan.
A view-based approach for semantic service descriptions Carsten Jacob, Heiko Pfeffer, Stephan Steglich, Li Yan, and Ma Qifeng
Combining terminology resources and statistical methods for entity recognition: an evaluation Angus Roberts, Robert Gaizauskas, Mark Hepple, Yikun Guo.
11 CORE Architecture Mauro Bruno, Monica Scannapieco, Carlo Vaccari, Giulia Vaste Antonino Virgillito, Diego Zardetto (Istat)
University of Sheffield NLP Teamware: A Collaborative, Web-based Annotation Environment Kalina Bontcheva, Milan Agatonovic University of Sheffield.
Metadata. Generally speaking, metadata are data and information that describe and model data and information For example, a database schema is the metadata.
Introduction to GATE Developer Ian Roberts. University of Sheffield NLP Overview The GATE component model (CREOLE) Documents, annotations and corpora.
LOD for the Rest of Us Tim Finin, Anupam Joshi, Varish Mulwad and Lushan Han University of Maryland, Baltimore County 15 March 2012
Efficient RDF Storage and Retrieval in Jena2 Written by: Kevin Wilkinson, Craig Sayers, Harumi Kuno, Dave Reynolds Presented by: Umer Fareed 파리드.
SKOS. Ontologies Metadata –Resources marked-up with descriptions of their content. No good unless everyone speaks the same language; Terminologies –Provide.
11 CORE Architecture Mauro Bruno, Monica Scannapieco, Carlo Vaccari, Giulia Vaste Antonino Virgillito, Diego Zardetto (Istat)
Introduction to the Semantic Web and Linked Data Module 1 - Unit 2 The Semantic Web and Linked Data Concepts 1-1 Library of Congress BIBFRAME Pilot Training.
Metadata Common Vocabulary a journey from a glossary to an ontology of statistical metadata, and back Sérgio Bacelar
Of 33 lecture 1: introduction. of 33 the semantic web vision today’s web (1) web content – for human consumption (no structural information) people search.
University of Sheffield, NLP Module 6: ANNIC Kalina Bontcheva © The University of Sheffield, This work is licensed under the Creative Commons.
Semantic web Bootstrapping & Annotation Hassan Sayyadi Semantic web research laboratory Computer department Sharif university of.
1/(27) GATE Ontology Tools GATE Training Course October 2006 Kalina Bontcheva
WonderWeb. Ontology Infrastructure for the Semantic Web. IST Project Review Meeting, 11 th March, WP2: Tools Raphael Volz Universität.
Semantic Interoperability in GIS N. L. Sarda Suman Somavarapu.
Ontology Technology applied to Catalogues Paul Kopp.
University of Sheffield, NLP Module 6: Ontologies and Semantic Annotation © The University of Sheffield, This work is licensed under the Creative.
GoRelations: an Intuitive Query System for DBPedia Lushan Han and Tim Finin 15 November 2011
1 Budapest University of Technology and Economics Department of Measurement and Information Systems Budapest University of Technology and Economics Fault.
Of 24 lecture 11: ontology – mediation, merging & aligning.
Linked Open Data Dataset from Related Documents Petya Osenova and Kiril Simov IICT-BAS LDL-2016, LREC, Portoroz.
16 April 2011 Alan, Edison, etc, Saturday.. Knowledge, Planning and Robotics 1.Knowledge 2.Types of knowledge 3.Representation of knowledge 4.Planning.
Semantic Web. P2 Introduction Information management facilities not keeping pace with the capacity of our information storage. –Information Overload –haphazardly.
Ontologies and semantic annotation. University of Sheffield, NLP Ontology – A Definition “An Ontology is a formal specification of a shared conceptualisation.”
University of Sheffield NLP Module 9: Ontologies and Semantic Annotation © The University of Sheffield, This work is licensed under the Creative.
University of Sheffield NLP Sentiment Analysis (Opinion Mining) with Machine Learning in GATE.
OWL (Ontology Web Language and Applications) Maw-Sheng Horng Department of Mathematics and Information Education National Taipei University of Education.
Towards a framework for architectural design decision support
Encoding Extraction as Inferences
Module 9: Ontologies and Semantic Annotation
Kenneth Baclawski et. al. PSB /11/7 Sa-Im Shin
Presented by: Hassan Sayyadi
Supervised Machine Learning
CS416 Compiler Design lec00-outline September 19, 2018
ece 627 intelligent web: ontology and beyond
2. An overview of SDMX (What is SDMX? Part I)
CS416 Compiler Design lec00-outline February 23, 2019
Hierarchical, Perceptron-like Learning for OBIE
Lec00-outline May 18, 2019 Compiler Design CS416 Compiler Design.
A framework for ontology Learning FROM Big Data
Presentation transcript:

Ontologies, semantic annotation and GATE Kalina Bontcheva Johann Petrak University of Sheffield

University of Sheffield, NLP Topics Ontologies Semantic annotation Ontology population Ontology learning

University of Sheffield, NLP Ontology - What? “An Ontology is a formal specification of a shared conceptualisation.” [Gruber] Set of concepts (instances and classes) Relationships between concepts (is-a, is-subclass, is-part, located-in) Allows reasoning – Class membership, inferred properties... – Need tradeoff: expressivity vs. reasoning complexity and decidability

University of Sheffield, NLP Ontology – How? RDF/RDFS – Triple-based representation scheme OWL 1.1 / OWL 2 – Ontology representation formalism based on RDF/RDFS Description Logic – Logic based KR formalism used for OWL, allows well-defined sublanguages. OWL 1.1: OWL-Lite, OWL-DL, OWL-Full official sublanguages, several inofficial others OWL 2: language profiles ==> expressiveness / reasoning effort trade-off

University of Sheffield, NLP OWL – Issues OWA – Open World Assumption: if something is not in the ontology, it can still be true No UNA – No Unique Name Assumption: one entity can have different names owl:Class vs. rdfs:Class

University of Sheffield, NLP Ontologies in GATE Abstract ontology model for the API: Comes with one concrete implementation preinstalled: Sesame/OWLIM Comes with several tools: – Ontology Visualizer/Editor – OntoGazetteer, OntoRootGazetteer – Ontology support in JAPE

University of Sheffield, NLP Ontology implementation SwiftOWLIM2 from Ontotext A Sesame1 repository SAIL Fast in memory repository, scales to millions of statements (depending on RAM) Supports “almost OWL-Lite” SwiftOWLIM is exchangeable with persistence-based BigOWLIM: not free, scales to billions of statements. Planned: Migration to Sesame2/OWLIM3

University of Sheffield, NLP Ontology API Ontology, Ontology resources represented as Java objects: gate.creole.ontology Ontology, OClass, OResource, URI, Literal Currently: ~ OWL-Lite actions OWLIMOntologyLR is a Java Ontology object JAPE RHS can access Ontology object

University of Sheffield, NLP Ontology API URI uri = new URI(“ OClass c = ontology.addClass(uri); Datatype dt = new Datatype(XMLStringURI); DatatypeProperty dtp = ontology.addDatatypeProperty(uri2,domain,dt); OInstance i = ontology.addOInstance(uri3,c); Set scs = c.getSuperClasses(DIRECT_CLOSURE); i.addDatatypePropertyValue(dtp, new Literal(“thevalue”));

University of Sheffield, NLP Ontology Viewer/Editor Basic viewing of ontologies, to allow their linking to texts via semantic annotation Some edit functionalities: –create new concepts and instances –define new properties and property values –deletion Some limitations of what's supported, basically chosen from practical needs for semantic annotation Not a Protege replacement

University of Sheffield, NLP Ontology Editor

University of Sheffield, NLP PROTON Ontology -a light-weight upper-level ontology; -250 NE classes; -100 relations and attributes; entity descriptions; -covers mostly NE classes, and ignores general concepts; -includes classes representing lexical resources. proton.semanticweb.org

University of Sheffield, NLP Hands-on 1 ● Load Ontology_Tools plugin ● Language Resource → New → OWLIMOntologyLR ● URI: load from web or from local file: load protonust.owl ● Format: rdfxml, ntriples, turtle ● Default default NS: ● Resolves all imports automatically when loading ● Double-click ontology LR to view/edit

University of Sheffield, NLP Semantic Annotation “Semantic”: link the annotation to a concept in an ontology. The semantic link connects the text mention to knowledge about the concept that is mentioned. The mention can link to an instance, a class, or a property – i.e. to a resource Use the semantic link to access additional data about the concept – use for disambiguation and further annotation processing Use for NER, IE, querying,...

University of Sheffield, NLP Semantic Annotation :London a City ;... :Company a :Organization. XYZ-02FA a :Company ; rdfs:label ; :basedIn :London-UK... XYZ-98 a :Company ; rdfs:label ; :basedIn :Boston-US … XYZ was established on 03 November 1978 in London. The company opened a plant in Bulgaria in.. Ontology Document

University of Sheffield, NLP Semantic Annotation :London a City ;... :Company a :Organization. XYZ-02FA a :Company ; rdfs:label ; :basedIn :London-UK... XYZ-98 a :Company ; rdfs:label ; :basedIn :Boston-US … XYZ was established on 03 November 1978 in London. The company opened a plant in Bulgaria in.. Ontology Document

University of Sheffield, NLP Semantic Annotation vs. “traditional” Link to hierarchy of concepts instead of flat set of concepts Larger space of possible annotations - harder to get it right + candidate concepts have associated knowledge that can be used to support decision + found concepts can be generalized based on ontology: context(company) < context(organization) → ontology aware JAPE in GATE

University of Sheffield, NLP Semantic Annotation: How? Manually: ontology based annotation – GATE OAT (Ontology Annotation Tool) Automatically – Gazetteer/rule/pattern based – Similarity based – Classifier (ML) based – Parser based – Combinations thereof

University of Sheffield, NLP GATE OAT Show document and ontology class hierarchy side- by-side Interactive creation of annotations that link to the ontology class/instance Allows on-the-fly instance creation For: – Creating Evaluation Corpus – Creating ML-Training Corpus

University of Sheffield, NLP OAT

University of Sheffield, NLP OAT

University of Sheffield, NLP OAT

University of Sheffield, NLP Hands-on 2 ● (Load Ontology_Tools plugin) ● Load ontology protonust.owl ● Load a document from corpus_original (encoding iso ) ● Create annotation ● Create annotation and instance ● Load document from corpus_annotated and show annotations

University of Sheffield, NLP Semantic Annotation: Automatic Create language resources from existing ontology: – Retrieve or generate possible mentions and create gazetteer lists or gazetteer – Preprocess document – Annotate document with gazetteer – Disambiguation, postprocessing

University of Sheffield, NLP OntoGazetteer Map ontology classes to gazetteer lists e.g. List of first names to class “Person” Uses Hash Gazetteer internally Provides a GUI to establish the mappings Mapping file could also be created by other means – Gazetteer list file name / ontology class URI For simple situations w/ few classes and many instances per class

University of Sheffield, NLP OntoGazetteer

University of Sheffield, NLP Onto Root Gazetteer Tries to find mentions in resource names (fragement ids), data property values, labels Converts “CamelCase” names, hyphen, underscore Produce multiword subsequences Finds lemma of mentions using the GATE Morphological Analyzer Creates a gazetteer PR that can be used with the FlexibleGazetteerPR

University of Sheffield, NLP Onto Root Gazetteer OntoRootGazeteer: – Generate candidate list from ontology – Run Tokeniser, POS tagger, Morphological Analyser(M.A.) and find lemmata/stems Document pipeline: – Run Tokenizer, POS tagger, M.A. and find lemmata/stems and place in Token.root Flexible gazetteer: – Match Token.root (not text as DefaultGazetteer) using OntoRootGazetteer

University of Sheffield, NLP Hands-on 3 Plugin Ontology_Tools for OntoRootGazetteer Plugin Tools for GATE Morphological Analyser Load Ontology Create Tokeniser, POS Tagger, and Morphological Analyser Create and configure OntoRootGazetteer Create Flexible Gazetteer – add OntoRootGazetteer as gazetteerInst – Specify Token.root for inputFeatureNames

University of Sheffield, NLP Hands-on 3 Ontology LR POS Tagger PR Tokeniser PR

University of Sheffield, NLP Hands-on 3 Create pipeline Create and add Sentence splitter Add Tokeniser Add POS Tagger Add Morphological Analyser Add Flexible Gazetteer Run

University of Sheffield, NLP Postprocess Original annotations contain just candidate URIs and classes. Original annotations might overlap Pull in additional knowledge for – Disambiguation (which person of that name?) – Semantic enrichment for subsequent processing stages

University of Sheffield, NLP Ontology-aware JAPE Rule: LocationLookup ( {Lookup.class == Location} ):location –> :location.Location = { } Matches any name of a class that is a subclass of Location

University of Sheffield, NLP Ontology Population Annotate document and find mentions of what could be (new) instances in the ontology – Use traditional NER, linked to ontology – Use semantic annotation based on existing knowledge – Use ML Create ontology instances and property values (“ABOX”) from the final annotations

University of Sheffield, NLP Ontology population :London a City ;... :Company a :Organization. XYZ was established on 03 November 1978 in London. The company opened a plant in Bulgaria in..

University of Sheffield, NLP Ontology population :London a City ;... :Company a :Organization. XYZ was established on 03 November 1978 in London. The company opened a plant in Bulgaria in..

University of Sheffield, NLP Ontology population :London a City ;... :Company a :Organization. XYZ was established on 03 November 1978 in London. The company opened a plant in Bulgaria in.. :XYZ-001 a :Company ; :established-in :London.

University of Sheffield, NLP Ontology Population Populate Ontology with Instances: – Of classes – Of properties connecting class instances with other class instances or values (literals) – Graph describing n-ary relations or events … Strategy – Place in domain ontology? – Place in intermediate ontology/KB?

University of Sheffield, NLP Ontology Population Place directly in domain ontology: + Simple & straight-forward - Cannot model likelihoods, hard to model meta information (where from, which context) Can easily leave sub-language or become inconsistent Knowledge arrives incrementally but has dependencies Place in intermediate ontology - Processing more complex Apropriate model for intermediate ontology? + Can do iterative improvement Can model meta information

University of Sheffield, NLP Ontology Population: JAPE Rule: FindEntities ({Mention}):mention –> :mention{ Annotation mentionAnn = (Annotation)mentionAnnots.iterator().next(); String className = (String)mentionAnn.getFeature().get(“class”); List matches = ontology.getOresourcesByName(className); Use qualified name! Check if null!

University of Sheffield, NLP Ontology Population: JAPE // find the resource representing the class for(OResource aResource : matches ) { if(aResource instanceof Oclass) { aClass = (Oclass) aResource; Break; } // get Text of mention String mentionName = doc.getContent(). getContent()( mentionAnn.getStartNode().getOffset(), mentionAnn.getEndNode().getOffset()). toString();

University of Sheffield, NLP Ontology Population: JAPE // populate the ontology gate.creole.ontolog.URI uri = OntologyUtilities.createURI( Ontology, mentionName, false); if(!ontology.containsOInstance(uri)) { ontology.addOInstance(uri, aClass); }

University of Sheffield, NLP Hands-on 4 Open protonust.owl ontology Create corpus from corpus_annotated (encoding iso ) Create JAPE file populate.jape or download populate.jape from Create Pipeline and run JAPE transducer View ontology

University of Sheffield, NLP Recap Semantic Annotation –Mentions of instances in the text are annotated wrt concepts (classes) in the ontology. –Requires that instances are disambiguated. –It is the text which is modified. Ontology Population –Generates new instances in an ontology from a text. –Links unique mentions of instances in the text to instances of concepts in the ontology. –It is the ontology which is modified.

University of Sheffield, NLP Ontology Learning Extraction of (domain) ontologies from natural language text –Machine learning –Natural language processing Tools: OntoLearn, OntoLT, ASIUM, Mo’K Workbench, JATKE, TextToOnto, … Slide courtesy of Johanna Volker, UKARL

University of Sheffield, NLP Ontology Learning – Tasks drive( Peter, his-car )‏ Relation instance extraction drive( person, car )‏ Relation extraction instance-of( Peter, person )‏ Instance classification Peter, his-car Instance extraction subclass-of( car, vehicle )‏ Concept classification car, vehicle, person Concept extraction Slide courtesy of Johanna Volker, UKARL

University of Sheffield, NLP OL – Problems Text Understanding Words are ambiguous –‘A bank is a financial institution. A bank is a piece of furniture.’  subclass-of( bank, financial institution ) ? Natural Language is informal –‘The sea is water.’  subclass-of( sea, water ) ? Sentences may be underspecified –‘Mary started the book.’  read( Mary, book_1 ) ? Anaphores –‘Peter lives in Munich. This is a city in Bavaria.’  instance-of( Munich, city ) ? Metaphores, … Slide courtesy of Johanna Volker, UKARL

University of Sheffield, NLP What is an instance / concept? –‘The koala is an animal living in Australia.’  instance-of( koala, animal ) subclass-of( koala, animal ) ? How to deal with opinions and quoted speech? –‘Tom thinks that Peter loves Mary.’  love( Peter, Mary ) ? Knowledge is changing –instance-of( Pluto, planet ) ? Conclusion: Ontology learning is difficult. What we can learn is fuzzy and uncertain. Ontology maintenance is important. OL – Problems Knowledge Modeling Slide courtesy of Johanna Volker, UKARL

University of Sheffield, NLP Ontology Learning Approaches Concept Classification Heuristics –‘image processing software’  subclass-of( image processing software, software )‏ Patterns –‘animals such as dogs’ –‘dogs and other animals’ –‘a dog is an animal’  subclass-of( dog, animal )‏ Slide courtesy of Johanna Volker, UKARL

University of Sheffield, NLP JAPE Patterns for Ontology Learning rule: Hearst_1 ( (NounPhrase):superconcept {Token.string=="such"} {Token.string=="as"} (NounPhrasesAlternatives):subconcept ):hearst1 --> :hearst1.SubclassOfRelation = { rule = "Hearst1" }, :subconcept.Domain = { rule = "Hearst1" }, :superconcept.Range = { rule = "Hearst1" } Slide courtesy of Johanna Volker, UKARL