An Ontology Creation Methodology: A Phased Approach Jon Atle Gulla Norwegian University of Science and Technology; Norway Vijay Sugumaran.


Presentation transcript:

1 An Ontology Creation Methodology: A Phased Approach
Jon Atle Gulla, Norwegian University of Science and Technology, Norway
Vijay Sugumaran, Oakland University, USA

2 Agenda
Ontology development
Traditional ontology learning
Limitations of ontology learning
A phased approach to ontology learning

3 The Challenge
How to develop large complex ontologies?
How to keep ontologies updated in dynamic domains?

4 Ontology Modeling vs. Learning
Traditional ontology engineering approach:
–Project: Form team of ontology and domain experts
–Ontology & domain experts: Collaborative manual modeling process
–Domain experts: Verify ontology against domain knowledge
–Ontology experts: Verify ontology against syntactic and semantic quality measures
Expensive and time-consuming approach
Stable domains assumed
Ontology learning approach:
–Domain experts: Find representative domain text
–Tool: Extract candidate classes, individuals and properties automatically from domain texts
–Ontology & domain experts: Verify candidate structures and complete ontology
Can also be used to verify domain quality of existing ontology
Cost-effective approach
Not unproblematic in dynamic domains

5 Agenda
Ontology development
Traditional ontology learning
Limitations of ontology learning
A phased approach to ontology learning

6 Ontology Learning Basis
People communicate using domain-specific concepts
People document using domain-specific concepts
Ontology learning: Extract ontology structures from written documentation
Requirements:
–Documents representative for domain terminology
–Documents cover all the terminology
–Well-defined and consistent use of terminology in domain
Diagram labels: Ontology discussions; Realm of ontology learning; Realm of ontology engineering; Ontology in use

7 Levels of Ontology Learning
Levels listed in order of increasing degree of difficulty, each with an example:
Terms: sponsors, costs, charter
Synonyms: (leader, manager, lead)
Concepts: PROJECT
Concept hierarchies: is_a(MANAGER, EMPLOYEE)
Relations: FINANCE(ag:SPONSOR, go:PROJECT)
Rules: ∀x,y (manager(x,y) → report(y,x))

8 Ontology Learning Strategies
Term extraction
–Linguistic analysis
–Statistical analysis
Synonyms
–Classification-based techniques
–Distribution-based techniques
Concept formation
–Structure recognition
–Keyphrase generation
–Instance learning
Concept hierarchy
–Clustering
–Lexico-syntactic patterns
–Head-modifier approaches
–Subsumption approaches
–Classification-based techniques
Relations
–Association rules
–Concept vectors
Rules
–Structure recognition for meta-property recognition
–Dependency trees and path similarities
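The slide only names the head-modifier approach for concept hierarchies. One common reading is that a multi-word noun phrase is a specialization of its head, the rightmost word or words. The following is a minimal sketch under that assumption; the function name and the sample candidates are hypothetical and do not come from the presentation.

```python
def head_modifier_hierarchy(candidates):
    """Propose is_a(phrase, head) pairs where the head is a shorter
    candidate formed by the trailing words of the phrase."""
    known = set(candidates)
    pairs = []
    for phrase in sorted(known):
        words = phrase.split()
        # Try the longest trailing sub-phrase first, so each phrase is
        # attached to its most specific known parent.
        for start in range(1, len(words)):
            head = " ".join(words[start:])
            if head in known:
                pairs.append((phrase, head))
                break
    return pairs


# Hypothetical candidate phrases, for illustration only.
candidates = ["project scope statement", "scope statement", "statement",
              "project manager", "manager"]
for child, parent in head_modifier_hierarchy(candidates):
    print(f"is_a({child.upper()}, {parent.upper()})")
```

Pairs proposed this way are still only candidates; as in the rest of the process, experts would need to verify them.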

9 Ontology Learning Process
Steps: Automatic extraction of concept and relationship candidates; Manual selection of candidates and completion of model
Diagram labels: Scope management; WBS; Business need; Constituent components; Product description...; Domain text; Reference set; Concept candidates; PMBOK; Search ontology; Abstract elements; Constraints; Properties; Rules

10 Ex 1. Learning Concept/Individual Candidates
Input text: Scope planning is the process of progressively elaborating and documenting the project work (project scope) that produces the product of the project.
POS tagging: Scope/NNP planning/NN is/VBZ the/DT process/NN of/IN progressively/RB elaborating/VBG and/CC documenting/VBG the/DT project/NN work/NN (/( project/NN scope/NN )/) that/WDT produces/VBZ the/DT product/NN of/IN the/DT project/NN ./.
Stopword removal (571 words), then lemmatization/stemming (POS tags not shown): Scope plan process progress elaborate document project work project scope produce product project
Select consecutive nouns as candidate phrases: {scope planning, process, project work, project scope, product, project}
Calculate tf.idf score for phrases: {(scope planning, ), (project scope, ), (product, ), (project work, ), (project, ), (process, )}
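The candidate-selection step of Ex 1 can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' tool: it starts from the already tagged sentence on the slide, skips stopword removal and lemmatization (selecting noun runs makes them unnecessary for this one sentence), and uses a smoothed tf.idf variant with a made-up reference set. The function names and the reference documents are hypothetical.

```python
import math
from collections import Counter

# Tagged sentence as shown on the slide (word/TAG tokens).
TAGGED = ("Scope/NNP planning/NN is/VBZ the/DT process/NN of/IN progressively/RB "
          "elaborating/VBG and/CC documenting/VBG the/DT project/NN work/NN (/( "
          "project/NN scope/NN )/) that/WDT produces/VBZ the/DT product/NN of/IN "
          "the/DT project/NN ./.")


def noun_phrases(tagged_text):
    """Return lowercased runs of consecutive noun tokens (tags starting with NN)."""
    phrases, run = [], []
    for token in tagged_text.split():
        word, tag = token.rsplit("/", 1)
        if tag.startswith("NN"):
            run.append(word.lower())
        else:
            if run:
                phrases.append(" ".join(run))
            run = []
    if run:
        phrases.append(" ".join(run))
    return phrases


def tf_idf(doc_phrases, reference_docs):
    """Relative frequency in the document times a smoothed inverse document
    frequency over the reference documents (each one a set of phrases)."""
    counts = Counter(doc_phrases)
    n_docs = len(reference_docs)
    scores = {}
    for phrase, count in counts.items():
        doc_freq = sum(1 for doc in reference_docs if phrase in doc)
        idf = math.log((1 + n_docs) / (1 + doc_freq))
        scores[phrase] = (count / len(doc_phrases)) * idf
    return scores


phrases = noun_phrases(TAGGED)
# Hypothetical reference set: phrases seen in two other documents.
reference = [{"project", "process"}, {"project"}]
for phrase, score in sorted(tf_idf(phrases, reference).items(), key=lambda kv: -kv[1]):
    print(f"{phrase}: {score:.3f}")
```

With this smoothing, a phrase that occurs in every reference document scores zero, which matches the intuition that generic terms are poor domain concept candidates.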

11 Classes Relevant to the Drama Genre
Data sources: IMDB, Wikipedia, Videoload
Keyphrase extraction technique
Noun phrases ranked according to various statistical measures

12 Ex 2. Learning Relationship Candidates
Pipeline components (from diagram): Tokenizer; GATE Sentence splitter; GATE Tagger; GATE Lemmatizer; GATE Noun phrase extractor; Noun phrase indexer; Association rules miner; Association rules; Concept profiles; Concept similarity calculation; Concept profile builder; Lucene Document indexer; Lucene Paragraph indexer; Lucene Sentence indexer; Light stemmer; Relationship merger

13 Relationships Relevant to Drama Genre
Association rules on extracted concepts
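Association rules over extracted concepts can be illustrated with a small support/confidence computation. The sketch below assumes each document (or paragraph, or sentence) has already been reduced to the set of concepts it mentions; it only considers rules between pairs of concepts, and the thresholds, function name and sample concepts are hypothetical rather than taken from the project.

```python
from itertools import permutations


def mine_pair_rules(transactions, min_support=0.5, min_confidence=0.8):
    """Return rules (a, b, support, confidence) meaning 'a implies b'.

    support    = share of transactions containing both a and b
    confidence = share of transactions containing a that also contain b
    """
    n = len(transactions)
    items = set().union(*transactions)
    rules = []
    for a, b in permutations(sorted(items), 2):
        count_a = sum(1 for t in transactions if a in t)
        count_ab = sum(1 for t in transactions if a in t and b in t)
        support = count_ab / n
        confidence = count_ab / count_a
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules


# Hypothetical concept sets, one per indexed text unit.
transactions = [
    {"actor", "director", "film"},
    {"actor", "film"},
    {"director", "film"},
    {"actor", "film", "award"},
]
for a, b, support, confidence in mine_pair_rules(transactions):
    print(f"{a} -> {b} (support={support:.2f}, confidence={confidence:.2f})")
```

A rule such as actor -> film only signals that the two concepts are related; naming the relationship is left to the manual selection step.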

14 Automatic OWL Generation

15 Agenda
Ontology development
Traditional ontology learning
Limitations of ontology learning
A phased approach to ontology learning

16 Limitations of Ontology Learning
Different techniques produce different results
Different data sources produce different results
Lost control over process
Extensive verification of final ontology needed
New data hard to combine with old data

17 Agenda
Ontology development
Traditional ontology learning
Limitations of ontology learning
A phased approach to ontology learning

18 Ontology Learning for Entertainment Domain
Ontology evolution for Deutsche Telekom's Videoload download service
What does Brangelina mean?
Should Pitt be Brad Pitt or Michael Pitt?
Actor vs. Schauspieler?
All movies of Brad Pitt?
Last movie of Pitt?

19 Ontology Learning Project
Duration: Nov 2007 – Nov 2009
Domain: movie download service
Ontology analysis and creation based on indexed noun phrases from movie documents
Ontology used for search and navigation on top of FAST search platform
Ontology learning challenges:
–Domain changes from one day to another
–No consistent domain terminology
–No professional domain terminology
–Multiple languages
–Movies about anything... unlimited domain
–Ontology needs to be up to date to support search

20 Ontology Workbench
3 phases that are carried out independently:
–Crawling into Lucene indices
–Supervised extraction of candidates
–Combining candidates into ontology structures

21 Interactive Ontology Development
Expandable indices
Subset of data source
Focus of analysis
List of techniques
Partial results
Stored results
Set operations for combining results
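The idea of combining stored results with set operations can be sketched as below, assuming each technique's partial result is stored as a set of candidate phrases under a name. The storage layout, the function name and the sample results are hypothetical; the slide does not describe how the workbench implements this.

```python
def combine(stored, names, operation):
    """Combine the named stored result sets with one set operation."""
    sets = [stored[name] for name in names]
    if operation == "union":          # favor recall: anything any technique found
        return set().union(*sets)
    if operation == "intersection":   # favor precision: only what all techniques agree on
        return set.intersection(*sets)
    if operation == "difference":     # what the first technique found that the others missed
        return sets[0].difference(*sets[1:])
    raise ValueError(f"unknown operation: {operation}")


# Hypothetical stored results from two extraction techniques.
stored = {
    "tfidf": {"project scope", "scope planning", "project"},
    "keyphrase": {"project scope", "scope planning", "work breakdown structure"},
}
print(sorted(combine(stored, ["tfidf", "keyphrase"], "intersection")))
print(sorted(combine(stored, ["tfidf", "keyphrase"], "union")))
print(sorted(combine(stored, ["tfidf", "keyphrase"], "difference")))
```

Keeping each technique's output as a separate stored result is what makes this kind of combination possible, since results from new data or new techniques can be merged without rerunning earlier phases.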

22 Thank you

