An Ontology Creation Methodology: A Phased Approach

Jon Atle Gulla, Norwegian University of Science and Technology, Norway (jag@idi.ntnu.no)
Vijay Sugumaran, Oakland University, USA (sugumara@oakland.edu)

Agenda
- Ontology development
- Traditional ontology learning
- Limitations of ontology learning
- A phased approach to ontology learning

The Challenge
- How do we develop large, complex ontologies?
- How do we keep ontologies up to date in dynamic domains?

Ontology Modeling vs. Learning
Traditional ontology engineering approach:
- Project: Form a team of ontology experts and domain experts
- Ontology and domain experts: Collaborative manual modeling process
- Domain experts: Verify the ontology against domain knowledge
- Ontology experts: Verify the ontology against syntactic and semantic quality measures
- Expensive and time-consuming
- Assumes a stable domain
Ontology learning approach:
- Domain experts: Find representative domain texts
- Tool: Automatically extract candidate classes, individuals and properties from the domain texts
- Ontology and domain experts: Verify the candidate structures and complete the ontology
- Can also be used to verify the domain quality of an existing ontology
- Cost-effective
- Not without problems in dynamic domains

Ontology Learning Basis
- People communicate using domain-specific concepts
- People document using domain-specific concepts
- Ontology learning: Extract ontology structures from written documentation
Requirements:
- Documents are representative of the domain terminology
- Documents cover all of the terminology
- Terminology is used in a well-defined and consistent way in the domain
Ontology discussions: realm of ontology engineering
Ontology in use: realm of ontology learning

Levels of Ontology Learning (degree of difficulty increases from bottom to top)
- Rules: ∀x,y (manager(x,y) → report(y,x))
- Relations: FINANCE(ag: SPONSOR, go: PROJECT)
- Concept hierarchies: is_a(MANAGER, EMPLOYEE)
- Concepts: PROJECT
- Synonyms: (leader, manager, lead)
- Terms: sponsors, costs, charter
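The lower layers feed the upper ones. The sketch below is a small Python illustration of that stacking, under stated assumptions. The synonym table, the hierarchy and the person names are hypothetical. manager(x, y) is read as "x manages y", which the slide implies but does not state.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical synonym level: each term in a synonym set maps to one concept.
SYNONYMS: Dict[str, str] = {"leader": "MANAGER", "manager": "MANAGER", "lead": "MANAGER"}

# Hypothetical concept hierarchy level: is_a(MANAGER, EMPLOYEE). Assumed acyclic.
IS_A: Dict[str, str] = {"MANAGER": "EMPLOYEE"}


def concept_of(term: str) -> str:
    """Map a surface term to its concept; unknown terms become their own concept."""
    return SYNONYMS.get(term.lower(), term.upper())


def superclasses(concept: str) -> List[str]:
    """Follow is_a links upward from a concept."""
    chain = []
    while concept in IS_A:
        concept = IS_A[concept]
        chain.append(concept)
    return chain


def apply_report_rule(manager_facts: Set[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """Rule level: for all x, y: manager(x, y) -> report(y, x).

    An input pair (x, y) is read as "x manages y" (an assumption);
    an output pair (y, x) is read as "y reports to x".
    """
    return {(y, x) for (x, y) in manager_facts}


print(concept_of("Lead"), superclasses("MANAGER"))  # MANAGER ['EMPLOYEE']
print(apply_report_rule({("anna", "bob")}))  # {('bob', 'anna')}
```

Each layer only consumes the output of the layer below it. This matches the slide's point that the upper layers are harder to learn from text.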

Ontology Learning Strategies
- Term extraction: linguistic analysis; statistical analysis
- Synonyms: classification-based techniques; distribution-based techniques
- Concept formation: structure recognition; keyphrase generation; instance learning
- Concept hierarchy: clustering; lexico-syntactic patterns; head-modifier approaches; subsumption approaches
- Relations: association rules; concept vectors
- Rules: structure recognition for meta-property recognition; dependency trees and path similarities
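The head-modifier approach to concept hierarchies is probably the simplest of these strategies to illustrate. The sketch below assumes English noun compounds with the head on the right. Under that assumption, dropping the leftmost modifier gives a more general candidate parent. The function name is hypothetical, and this is not presented as the authors' implementation.

```python
from typing import Iterable, List, Tuple


def head_modifier_is_a(phrases: Iterable[str]) -> List[Tuple[str, str]]:
    """Propose is_a(compound, head) candidates from multiword noun phrases.

    Assumes right-headed English compounds, so removing the leftmost
    modifier yields a more general term: 'project scope' -> 'scope'.
    """
    pairs = set()
    for phrase in phrases:
        words = phrase.lower().split()
        if len(words) > 1:  # single nouns have no modifier to strip
            pairs.add((" ".join(words), " ".join(words[1:])))
    return sorted(pairs)


candidates = ["scope planning", "process", "project work", "project scope"]
for child, parent in head_modifier_is_a(candidates):
    print(f"is_a({child}, {parent})")
```

The output consists of candidates, not facts. A pair such as is_a(scope planning, planning) still has to be accepted by an expert.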

Ontology Learning Process (figure; example domain: PMBOK)
- Inputs: domain text, reference set
- Automatic extraction of concept and relationship candidates, e.g. scope management, WBS, business need, constituent components, product description, ...
- Manual selection of candidates and completion of the model (search ontology)
- Abstract elements of the model: constraints, properties, rules

Ex 1. Learning Concept/Individual Candidates
Input: "Scope planning is the process of progressively elaborating and documenting the project work (project scope) that produces the product of the project."
1. POS tagging: Scope/NNP planning/NN is/VBZ the/DT process/NN of/IN progressively/RB elaborating/VBG and/CC documenting/VBG the/DT project/NN work/NN (/( project/NN scope/NN )/) that/WDT produces/VBZ the/DT product/NN of/IN the/DT project/NN ./.
2. Stopword removal (stopword list of 571 words)
3. Lemmatization/stemming (POS tags not shown): Scope plan process progress elaborate document project work project scope produce product project
4. Select consecutive nouns as candidate phrases: {scope planning, process, project work, project scope, product, project}
5. Calculate tf.idf scores for the phrases: {(scope planning, 0.0097), (project scope, 0.0047), (product, 0.0043), (project work, 0.0008), (project, 0.0001), (process, 0.0000)}
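Steps 4 and 5 can be sketched in Python. The sketch takes the slide's word/TAG output and groups consecutive noun tokens into phrases. It then scores each phrase with one common tf.idf variant: relative frequency times log(N / document frequency). The lemmatization step is skipped, and the three-document reference set is hypothetical. The scores on the slide came from the authors' own reference set and are not reproduced here.

```python
import math
from typing import List


def noun_phrase_candidates(tagged: str) -> List[str]:
    """Group runs of consecutive noun tokens (NN, NNS, NNP, NNPS) into phrases.

    Input uses the word/TAG format shown on the slide; lemmatization is skipped.
    """
    phrases, run = [], []
    for token in tagged.split():
        word, _, tag = token.rpartition("/")
        if tag.startswith("NN"):
            run.append(word.lower())
        elif run:  # a non-noun ends the current run
            phrases.append(" ".join(run))
            run = []
    if run:
        phrases.append(" ".join(run))
    return phrases


def tf_idf(phrase: str, doc: List[str], corpus: List[List[str]]) -> float:
    """Relative frequency of phrase in doc times log(N / document frequency)."""
    df = sum(1 for d in corpus if phrase in d)
    if df == 0 or not doc:
        return 0.0
    tf = doc.count(phrase) / len(doc)
    return tf * math.log(len(corpus) / df)


tagged = ("Scope/NNP planning/NN is/VBZ the/DT process/NN of/IN progressively/RB "
          "elaborating/VBG and/CC documenting/VBG the/DT project/NN work/NN (/( "
          "project/NN scope/NN )/) that/WDT produces/VBZ the/DT product/NN of/IN "
          "the/DT project/NN ./.")
doc = noun_phrase_candidates(tagged)
print(doc)  # ['scope planning', 'process', 'project work', 'project scope', 'product', 'project']

# Hypothetical reference set: this document plus two other phrase lists.
corpus = [doc, ["process", "project"], ["process", "risk"]]
for phrase in sorted(set(doc), key=lambda p: (-tf_idf(p, doc, corpus), p)):
    print(phrase, round(tf_idf(phrase, doc, corpus), 4))
```

Because "process" occurs in every document of this toy reference set, its idf is log(1) = 0 and it scores zero. A phrase used everywhere carries no domain-specific signal.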

Classes Relevant to the Drama Genre
- Data sources: IMDB, Wikipedia, Videoload
- Keyphrase extraction technique
- Noun phrases ranked according to various statistical measures

Ex 2. Learning Relationship Candidates (pipeline figure)
- GATE preprocessing: tokenizer, sentence splitter, tagger, lemmatizer, noun phrase extractor
- Noun phrase indexer: Lucene indices at document and paragraph level, with a light stemmer
- Association rules miner: produces association rules
- Profile builder: produces concept profiles, used for concept similarity calculation
- Relationship merger: merges the relationship candidates
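The profile branch of the pipeline can be sketched in Python. The slide does not define a concept profile, so the sketch makes an assumption. It treats a profile as the counts of the other words in the paragraphs that mention the concept. It compares profiles with cosine similarity. The stopword set and the sample paragraphs are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, Iterable, List

# Hypothetical stopword list; a real system would use a full list.
STOPWORDS = {"a", "an", "and", "the", "in", "of"}


def build_profiles(paragraphs: Iterable[str], concepts: List[str]) -> Dict[str, Counter]:
    """Profile of a concept = counts of the other words in paragraphs that mention it."""
    profiles = {c: Counter() for c in concepts}
    for para in paragraphs:
        words = re.findall(r"[a-z]+", para.lower())
        padded = " " + " ".join(words) + " "
        for concept in concepts:
            if f" {concept} " in padded:  # whole-word (or whole-phrase) match
                own = set(concept.split())
                profiles[concept].update(
                    w for w in words if w not in STOPWORDS and w not in own)
    return profiles


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse word-count vectors."""
    dot = sum(n * b[w] for w, n in a.items() if w in b)
    norm = math.sqrt(sum(n * n for n in a.values())) * math.sqrt(sum(n * n for n in b.values()))
    return dot / norm if norm else 0.0


paragraphs = [
    "The detective solves the murder case.",
    "A detective investigates a murder in London.",
    "The romance blossoms in Paris.",
    "A love story and romance in Paris.",
]
profiles = build_profiles(paragraphs, ["detective", "murder", "romance"])
print(cosine(profiles["detective"], profiles["murder"]))
print(cosine(profiles["detective"], profiles["romance"]))  # 0.0
```

The "detective" and "murder" profiles share context words, so they come out similar. They become a candidate relationship for an expert to label.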

Relationships Relevant to the Drama Genre
- Association rules mined over the extracted concepts
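The sketch below is a simplified, pairwise version of association rule mining, not a full Apriori implementation. It assumes that each transaction is the set of extracted concepts found in one paragraph. The thresholds and the concept sets are hypothetical.

```python
from itertools import permutations
from typing import List, Set, Tuple


def association_rules(transactions: List[Set[str]], min_support: float = 0.1,
                      min_confidence: float = 0.5) -> List[Tuple[str, str, float, float]]:
    """Mine pairwise rules A -> B; each transaction is the concept set of one paragraph.

    support(A -> B)    = fraction of transactions containing both A and B
    confidence(A -> B) = transactions with A and B / transactions with A
    """
    if not transactions:
        return []
    n = len(transactions)
    concepts = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(concepts, 2):
        with_a = sum(1 for t in transactions if a in t)  # > 0: a occurs somewhere
        both = sum(1 for t in transactions if a in t and b in t)
        support = both / n
        if support >= min_support:
            confidence = both / with_a
            if confidence >= min_confidence:
                rules.append((a, b, support, confidence))
    return rules


paragraph_concepts = [
    {"actor", "film"},
    {"actor", "film", "director"},
    {"director", "film"},
    {"actor", "award"},
]
for a, b, s, c in association_rules(paragraph_concepts, min_support=0.5, min_confidence=0.6):
    print(f"{a} -> {b}  support={s:.2f}  confidence={c:.2f}")
```

A rule such as director -> film only says the two concepts co-occur. Naming the relationship, for example "directs", is still left to the ontology and domain experts.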

Automatic OWL Generation
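Only the title of this slide survives. The sketch below therefore assumes a simple final step: accepted candidates are written out as OWL in Turtle syntax. It covers classes, subclass axioms from is_a candidates, and object properties from named relationships. The namespace and the helper names are hypothetical. The authors' tool and output format are not given in the source.

```python
from typing import Iterable, Tuple


def local_name(label: str, upper_first: bool = True) -> str:
    """Turn 'project scope' into 'ProjectScope' (or 'projectScope'); keeps acronyms like 'WBS'."""
    name = "".join(w[:1].upper() + w[1:] for w in label.split())
    return name if upper_first else name[:1].lower() + name[1:]


def to_owl_turtle(base: str, classes: Iterable[str],
                  subclass_of: Iterable[Tuple[str, str]],
                  properties: Iterable[Tuple[str, str, str]]) -> str:
    """Serialize accepted candidates as OWL classes, subclass axioms and object properties."""
    lines = [
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        f"@prefix : <{base}> .",
        "",
    ]
    for c in sorted(classes):
        lines.append(f":{local_name(c)} a owl:Class .")
    for child, parent in sorted(subclass_of):
        lines.append(f":{local_name(child)} rdfs:subClassOf :{local_name(parent)} .")
    for prop, domain, rng in sorted(properties):
        lines.append(f":{local_name(prop, upper_first=False)} a owl:ObjectProperty ; "
                     f"rdfs:domain :{local_name(domain)} ; rdfs:range :{local_name(rng)} .")
    return "\n".join(lines)


print(to_owl_turtle(
    "http://example.org/movies#",  # hypothetical namespace
    {"movie", "drama", "director"},
    {("drama", "movie")},
    {("has director", "movie", "director")},
))
```

Sorting every section makes the output deterministic. Two runs over the same candidates can then be compared with an ordinary text diff.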

Limitations of Ontology Learning
- Different techniques produce different results
- Different data sources produce different results
- Loss of control over the process
- Extensive verification of the final ontology is needed
- New data is hard to combine with old data

Ontology Learning for the Entertainment Domain
Ontology evolution for Deutsche Telekom's Videoload download service
Example challenges:
- What does "Brangelina" mean?
- Should "Pitt" be Brad Pitt or Michael Pitt?
- "Actor" vs. "Schauspieler" (German for "actor")?
- All movies of Brad Pitt?
- Last movie of Pitt?

Ontology Learning Project
- Duration: Nov 2007 – Nov 2009
- Domain: movie download service
- Ontology analysis and creation based on indexed noun phrases from movie documents
- Ontology used for search and navigation on top of the FAST search platform
Ontology learning challenges:
- The domain changes from one day to the next
- No consistent domain terminology
- No professional domain terminology
- Multiple languages
- Movies can be about anything, so the domain is unlimited
- The ontology must be kept up to date to support search

Ontology Workbench
Three phases that are carried out independently:
1. Crawling into Lucene indices
2. Supervised extraction of candidates
3. Combining candidates into ontology structures
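One way to keep the three phases independent is to let each phase read its input from, and write its output to, a shared store. Any phase can then be rerun on its own. The sketch below uses a plain dict as that store and a dict of phrase lists as a toy stand-in for the Lucene indices. The phase functions, the store keys and the default parent "Thing" are all hypothetical.

```python
import json
from collections import Counter
from typing import Dict, List, MutableMapping

# Any string-keyed mapping works as the store: a dict here, or e.g.
# shelve.open("workbench.db") to keep results on disk between sessions.
Store = MutableMapping[str, str]


def phase_crawl(store: Store, docs: Dict[str, List[str]]) -> None:
    """Phase 1: record each document's noun phrases (toy stand-in for Lucene indices)."""
    index = json.loads(store.get("index", "{}"))
    index.update(docs)  # a new crawl extends the index instead of replacing it
    store["index"] = json.dumps(index)


def phase_extract(store: Store, min_df: int) -> None:
    """Phase 2: keep phrases found in at least min_df documents; the analyst picks min_df."""
    index = json.loads(store["index"])
    df = Counter(p for phrases in index.values() for p in set(phrases))
    store["candidates"] = json.dumps(sorted(p for p, n in df.items() if n >= min_df))


def phase_combine(store: Store, parents: Dict[str, str]) -> Dict[str, str]:
    """Phase 3: attach stored candidates to expert-chosen parent classes (default 'Thing')."""
    candidates = json.loads(store["candidates"])
    return {c: parents.get(c, "Thing") for c in candidates}


store: Dict[str, str] = {}
phase_crawl(store, {"d1": ["drama", "actor"], "d2": ["drama", "director"]})
phase_extract(store, min_df=2)
print(phase_combine(store, {"drama": "Genre"}))  # {'drama': 'Genre'}
phase_extract(store, min_df=1)  # rerun phase 2 alone with a looser threshold
print(phase_combine(store, {"drama": "Genre"}))
```

Rerunning phase 2 with a new threshold does not touch the crawl. A new crawl likewise does not discard earlier documents.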

Interactive Ontology Development
- Expandable indices
- Subset of the data source
- Focus of analysis
- List of techniques
- Partial results
- Stored results
- Set operations for combining results
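Combining stored partial results with set operations might look like the Python sketch below. It assumes that each stored result is a set of candidate concepts, keyed by data source and technique. All keys and concept sets are hypothetical examples, not results from the project.

```python
from typing import Dict, Set, Tuple

# Hypothetical stored partial results: (data source, technique) -> candidate concepts.
STORED: Dict[Tuple[str, str], Set[str]] = {
    ("imdb", "tfidf"): {"drama", "actor", "director", "sequel"},
    ("wikipedia", "tfidf"): {"drama", "actor", "screenplay"},
    ("imdb", "association_rules"): {"drama", "director", "award"},
}

OPERATIONS = {
    "union": set.union,                # everything any run proposed
    "intersection": set.intersection,  # only what all runs agree on
    "difference": set.difference,      # what the first run adds over the others
}


def combine(op: str, *keys: Tuple[str, str]) -> Set[str]:
    """Combine stored results with a set operation; the stored sets are not modified."""
    first, *rest = (STORED[k] for k in keys)
    return OPERATIONS[op](first, *rest)


print(sorted(combine("intersection", ("imdb", "tfidf"), ("wikipedia", "tfidf"))))  # ['actor', 'drama']
print(sorted(combine("difference", ("imdb", "tfidf"), ("imdb", "association_rules"))))  # ['actor', 'sequel']
```

An intersection across sources gives a conservative candidate set. A difference shows what a new source or technique contributes beyond the results the expert has already reviewed.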

Thank you