Ontologies & Databases: Similarities & Differences Ontolog Panel

Presentation transcript:

Ontologies & Databases: Similarities & Differences Ontolog Panel Dr. Leo Obrst MITRE Information Semantics Center for Innovative Computing & Informatics October 12, 2006

Summary
Databases:
  Focus on local semantics that capture only some aspects of the real world
  Typically keep that semantics implicit
  Use logic structurally
  Their schemas are not generally reusable
Ontologies:
  Focus on global semantics of the real world
  Make that semantics explicit
  Enable machine interpretability by using a logic-based modeling language
  Are reusable as true models of a portion of the world

Tightness of Coupling & Semantic Explicitness
[Figure: data and application constructs plotted by Looseness of Coupling against Semantics Explicitness, ranging from "Local; Implicit, TIGHT" at lower left to "Far; Explicit, Loose" at upper right. Two curves are labeled "Performance = k / Integration_Flexibility" and "From Synchronous Interaction to Asynchronous Communication". Labels in the graphic: Same Process Space, Same CPU, Same OS, Same Programming Language, Same Local Area Network, Same Wide Area Network, Client-Server, Same Intranet, Compiling, Linking, Agent Programming, Web Services: SOAP, Distributed Systems, OOP, Applets, Java, Semantic Brokers, Middleware, Web, Peer-to-peer, N-Tier Architecture, Same Address Space, Same DBMS, Federated DBs, Data Warehouses, Data Marts, Workflow Ontologies, Semantic Mappings, XML, XML Schema, Conceptual Models, RDF/S, OWL, Web Services: UDDI, WSDL, OWL-S, Proof, Rules, Modal Policies: SWRL, FOL+, Enterprise Ontologies, EA Ontologies, EA Brokers, EA, SOA, EAI. Groups, from bottom to top: 1 System: Small Set of Developers, Systems of Systems, Enterprise, Community, Internet.]

The point of this slide is to show that we have evolved over time in information technology from tightly coupled systems to loosely coupled systems, primarily to deal with increasing heterogeneity and increasingly many kinds of heterogeneity. Correlated to that evolution is the need for increasingly more explicit semantics to deal with the heterogeneity.

The range is from lower left, very tightly coupled with very implicit semantics (local), to upper right, very loosely coupled with very explicit semantics (far): red and blue arrows run from lower left to upper right. Red font indicates the data constructs that have evolved over time in information technology to adjust to increasingly necessary loose coupling. Blue font indicates the application constructs that have evolved in the same way.

Green font (far right of graphic) indicates, from bottom to top, the evolution of the groups involved in dealing with the heterogeneity: from the original small set of developers creating one system with a couple of applications or subprocedures and one database (where the developers can make implicit agreements as to the semantics, by nodding their heads that yes, this API and this database schema mean what we say they mean), to systems of systems, to enterprise-level computing, to the Internet. At the Internet level, we can't all nod our heads and say, yes, this is what we mean. We need increasingly more explicit semantics, and that semantics needs to be increasingly more machine-interpretable.

Rationale for increasing loose coupling: dealing with the heterogeneity forced on us over time in information technology. We have progressed from writing and executing programs in the same programming language on the same operating system on the same computer/CPU, in the same address space and same process space, to a vast heterogeneous world, inventing new data and application constructs at each increase in heterogeneity. This increased heterogeneity forces us to adopt increasingly more explicit semantics to deal with it.

The two curves indicate different things. The right curve (From Synchronous Interaction to Asynchronous Communication) roughly characterizes the evolution from synchronous to asynchronous. The left curve (Performance = k / Integration_Flexibility) roughly demonstrates the inverse relationship between performance and integration flexibility, i.e., tightly coupled systems generally have better performance than loosely coupled systems, but increasingly we need to emphasize integration flexibility over that performance, a tradeoff to deal with heterogeneity.

Ontology Spectrum: One View
[Figure: a spectrum running "from less to more expressive", from weak semantics (bottom) to strong semantics (top). Labels, listed from strong to weak:]
  Modal Logic
  First Order Logic
  Logical Theory (relation: Is Disjoint Subclass of, with transitivity property)
  Description Logic; DAML+OIL, OWL
  UML
  Conceptual Model (relation: Is Subclass of)
  RDF/S
  Semantic Interoperability
  XTM
  Extended ER
  Thesaurus (relation: Has Narrower Meaning Than)
  ER
  DB Schemas, XML Schema
  Structural Interoperability
  Taxonomy (relation: Is Sub-Classification of)
  Relational Model, XML
  Syntactic Interoperability

Ontology Spectrum: Application
[Figure: ontology types ordered by expressivity, from weak to strong, each paired with typical applications.]
Term-based:
  Taxonomy: Categorization, Simple Search & Navigation, Simple Indexing
  Thesaurus: Synonyms, Enhanced Search (Improved Recall) & Navigation, Cross Indexing
Concept-based (Ontology):
  Conceptual Model: Enterprise Modeling (system, service, data), Question-Answering (Improved Precision), Querying, SW Services
  Logical Theory: Real World Domain Modeling, Semantic Search (using concepts, properties, relations, rules), Machine Interpretability (M2M, M2H semantic interoperability), Automated Reasoning, SW Services

Example: Metadata Registry/Repository – Contains Objects + Classification
[Figure: registry contents grouped into four kinds of objects: Data Objects, Classification Objects, Terminology Objects, Meaning Objects. Other labels in the graphic: Data Element, Data Attribute, Data Value, Data Schema, Documents, XML DTD, XML Schema, Namespace, Class, Taxonomy, Thesaurus, Conceptual Model, Ontology, Concept, Property, Relation, Attribute, Value, Instance, Privileged Taxonomic Relation, Term (can be multi-lingual), Keyword List.]

Approximate Cost/Benefit of Moving up the Ontology Spectrum
[Figure: cost and benefit curves plotted over time for Taxonomy, Thesaurus, Conceptual Model, and Logical Theory.]
Higher initial costs at each step up
Much lower eventual costs because of reuse, less analyst labor
Increasingly greater benefit because of increased semantic interoperability, precision, and level of machine-human interaction

What Problems Do Ontologies Help Solve?
Heterogeneous database problem
  Different organizational units, Service Needers/Providers have radically different databases
    Different syntactically: what's the format?
    Different structurally: how are they structured?
    Different semantically: what do they mean?
  They all speak different languages
Enterprise-wide system interoperability problem
  Currently: system-of-systems, vertical stovepipes
  Ontologies act as conceptual model representing enterprise consensus semantics
  Well-defined, sound, consistent, extensible, reusable, modular models
Relevant document retrieval/question-answering problem
  What is the meaning of your query?
  What is the meaning of documents that would satisfy your query?
  Can you obtain only meaningful, relevant documents?

A Business Example of Ontology
[Figure: three catalog tables, labeled Supplier A, Buyer, and Supplier B, related through a common concept, Washer, with properties Catalog No., Shape, Size, Price, and Manufacturer (iMetal Corp., E-Machina).]

Mfr No.   Shape    Size (in)   Price ($US)
550298    Square   1.25        .45
550296    Round    1.5         .35
XAB023                         .75
XAB035                         .25
…

Catalog No.   Shape    Size (in)   Price ($US)
XAB035        Square   1.25        .25
XAB023        Round    1.5         .75
…

Part No.   Geom.   Diam (mm)   Price ($US)
550298     S       31          .45
550296     R       37          .35
…
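The mapping idea behind this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Washer` class, the two mapping functions, and the expansion of the codes "S" and "R" to "Square" and "Round" are hypothetical, and the sample rows assume that part 550298 belongs to the table with the columns Part No., Geom., Diam (mm), Price ($US) and that XAB035 belongs to the table with Catalog No., Shape, Size (in), Price ($US).

```python
from dataclasses import dataclass

MM_PER_INCH = 25.4  # standard unit conversion factor


@dataclass(frozen=True)
class Washer:
    """Hypothetical common (ontology-level) description of a washer."""
    part_id: str
    shape: str        # "Round" or "Square"
    size_in: float    # size in inches
    price_usd: float


# Assumed expansion of the abbreviated geometry codes.
GEOM_CODES = {"R": "Round", "S": "Square"}


def from_inch_catalog(row: dict) -> Washer:
    """Map a row that already uses Shape, Size (in), and Price ($US)."""
    return Washer(row["Catalog No."], row["Shape"],
                  row["Size (in)"], row["Price ($US)"])


def from_metric_catalog(row: dict) -> Washer:
    """Map a row that uses Geom. codes and a diameter in millimetres."""
    return Washer(row["Part No."], GEOM_CODES[row["Geom."]],
                  round(row["Diam (mm)"] / MM_PER_INCH, 2),
                  row["Price ($US)"])


a = from_inch_catalog({"Catalog No.": "XAB035", "Shape": "Square",
                       "Size (in)": 1.25, "Price ($US)": 0.25})
b = from_metric_catalog({"Part No.": "550298", "Geom.": "S",
                         "Diam (mm)": 31, "Price ($US)": 0.45})
print(a)
print(b)
```

Once both rows are expressed against the same concept, they can be compared on shape, size, and price even though the source tables use different column names, codes, and units.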

Ontologies & the Data Integration Problem
DBs provide generality of storage and efficient access
Formal data model of databases insufficiently semantically expressive
The process of developing a database discards meaning
  Conceptual Model -> Logical Model -> Physical Model
  Keys signify some relation, but no solid semantics
  DB Semantics = Schema + Business Rules + Application Code
Ontologies can represent the rich common semantics that spans DBs
  Link the different structures
  Establish semantic properties of data
  Provide mappings across data based on meaning
  Also capture the rest of the meaning of data:
    Enterprise rules
    Application code (the inextricable semantics)

Background on Relational Calculus for Databases
Tuple Relational Calculus (TRC)
  More like a pre-relational file structure format
Domain Relational Calculus (DRC)
  Similar to logic as a modeling language
Relational Algebra (RA)
Roughly equivalent expressivity: all the above
SQL: slightly more powerful because of some computation, ordering, etc.
These use the syntax of FOL but only a very simplified semantics
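To make the "roughly equivalent expressivity" point concrete, here is a minimal Python sketch that answers one query in a calculus style (a declarative set comprehension over tuples) and in an algebra style (composed select and project operators). The `employees` relation and the helper functions `select` and `project` are hypothetical and exist only for this illustration.

```python
# A tiny in-memory relation; names and values are made up for illustration.
employees = [
    {"name": "Ann", "dept": "R&D", "salary": 120},
    {"name": "Bob", "dept": "Ops", "salary": 80},
    {"name": "Cy", "dept": "R&D", "salary": 95},
]

# Calculus style (declarative):
#   { t.name | t in Employee and t.dept = "R&D" and t.salary > 90 }
calculus_result = {
    t["name"] for t in employees if t["dept"] == "R&D" and t["salary"] > 90
}


def select(relation, predicate):
    """Relational selection: keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Relational projection: keep the named attributes, dropping duplicates."""
    return {tuple(row[a] for a in attributes) for row in relation}


# Algebra style (procedural): project_name(select_{dept="R&D" and salary>90}(Employee))
algebra_result = {
    name
    for (name,) in project(
        select(employees, lambda r: r["dept"] == "R&D" and r["salary"] > 90),
        ["name"],
    )
}

print(calculus_result == algebra_result)
```

Both formulations return the same set of names; the difference is only in whether the query says what is wanted or how to compute it.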

Ontologies & Databases
Ontologies are about vocabularies and their meanings, with an explicit, expressive, and well-defined semantics, possibly machine-interpretable
Ontologies try to limit the possible formal models of interpretation (semantics) of those vocabularies to the set of meanings a modeler intends, i.e., close to the human conceptualization
  None of the other "vocabularies" such as database schemas or object models, with less expressive semantics, does that
  The approaches with less expressive semantics typically assume that humans will look at the "vocabularies" and supply the semantics via the human semantic interpreter (your mental model)
  Additionally, a human developer will code programs to enforce the local semantics that the database/DBMS cannot
    They may or may not get it right
    Other humans will have to read that code, interpret it, and see if it's actually doing what everyone thinks it should be doing
  The higher you go in terms of data warehouses, marts, etc., the more human-interpreted semantic error creeps in
Ontologies model generic real world concepts and their meanings, unlike either database schemas or object models, which are typically very specific to a particular set of applications and represent limited semantics
  A given ontology cannot model completely any given domain
  However, in capturing real world (and imaginary, if you wish, e.g., you might want a theory of unicorns and other fantastic beasts) semantics, you are thereby enabled to reuse, extend, refine, generalize, etc., that semantic model

Ontologies & Databases
It's suggested you reuse ontologies
You generally cannot reuse database schemas
  You might be able to take a database conceptual schema and use that as the basis of an ontology, but that would still be a leap from an Entity-Relation model to a Conceptual Model (say, UML, i.e., a weak ontology) to a Logical Theory (strong ontology)
  In much the same way, you can start with a taxonomy or a thesaurus and migrate it to an ontology
  But logical and physical schemas are typically pretty useless, since they incorporate non-real-world knowledge (and in non-machine-interpretable form)
  By the time you have the physical schema, you just have relations and key information: you've thrown away the little semantics you had at the conceptual schema level
The methodologies for ontologies and databases are similar (as for all models in the Ontology Spectrum) insofar as the database designer or knowledge/ontology engineer has to consider an information space that captures certain kinds of knowledge
  However, a database designer does not care about the real world, per se, but about constructing a specific local container/structure of data that will hold his/her users' data in an access-efficient way
  A good database designer will sit down with users and generate use cases/scenarios based on interaction with the users
  Similarly for ontologists: they'll sit down with domain experts/SMEs and get a sense of the semantics of the part of the world that these folks are knowledgeable about
  A good ontologist will analyze the data available (if available; bottom up) and also analyze what the domain expert says (top down)
  In many cases (intelligence analysis, e.g.), the ontologist won't only ask the SME what kinds of questions that person asks for their tasks, but also what kinds of questions they would like to ask and which are impossible to get answered currently by using mainstream database and system technology

The Database Design Process: 3 Stages
In interaction with prospective users and stakeholders of the proposed database, the database designer will create a conceptual schema, usually using a modeling language and tools based on Entity-Relation models, extended ER models, or, recently, on object-oriented models using UML
Once this conceptual schema is captured, the designer will refine it to become a logical schema, sometimes called a logical data model, still in an ER language or UML
  The logical schema typically results from refining the conceptual schema using normalization and other techniques to move closer to the so-called physical model that will be implemented to create the actual database, i.e., by normalizing the relations (and attributes, if the conceptual schema contains these) using the same ER and UML languages
Finally, the designer refines the logical schema to become the physical schema, where the tables, columns, keys, etc., are defined, and then the physical tables are optimized in terms of which elements to index and in which sectors of the database to place the various data elements
A data dictionary may be created for the database; this expresses, in natural language documentation, what the various elements of the database are intended to mean
  The data dictionary is only semantically interpretable by human beings, since it is written in natural language
The most expressive real-world semantics of the database creation process thus exists in the conceptual schema and the data dictionary
  The conceptual schema may be kept around as part of the documentation of the process of developing the database, an artifact of that process
  The data dictionary will typically be kept as documentation
Unfortunately, the underlying physical database and its schema may be changed dramatically without the original conceptual schema and the data dictionary being comparably changed
This is also typically the case with UML models used to create object-oriented systems and sometimes to define enterprise architectures

The Database Design Process
Databases typically try to enforce 3 kinds of integrity
1) Domain integrity (note that this is not the same notion of "domain" we generally use in logic/ontologies): domains are usually datatype domains, i.e., integers, strings, real numbers, or column-data domains
   Typically you don't have any symbolic objects at all in a database, just strings
   So on data entry or update, say of a row, some program (or the DBMS) will make sure that if a column is defined to contain only integer data, the user can only enter integer data
2) Referential integrity: this refers to key relationships, primary and foreign
   This kind of integrity is structural, making sure that if a key gets updated, any key in any other place that depends on it gets updated appropriately too
   Add, Delete, Update (usually considered an initial Delete, followed by an Add)
3) Semantic integrity: this is the hardest part
   Represents real-world constraints, sometimes called "business rules", that you want to hold over your data
   Databases and DBMSs can't usually do this (even with active and passive triggers), and so auxiliary programming code usually has to enforce this
   Example: "no other employee can make more than the CEO", or other cross-dependencies
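The three kinds of integrity can be illustrated with Python's standard `sqlite3` module. This is a minimal sketch, not a recommended design: the `dept` and `employee` tables, the `rejected` helper, and the `add_employee` function are hypothetical. Domain and referential integrity are declared in the schema, while the CEO rule is enforced by auxiliary application code, in line with the slide's point.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE employee (
    id      INTEGER PRIMARY KEY,
    title   TEXT NOT NULL,
    salary  INTEGER NOT NULL CHECK (typeof(salary) = 'integer'),  -- domain integrity
    dept_id INTEGER NOT NULL REFERENCES dept(id)                  -- referential integrity
);
""")
conn.execute("INSERT INTO dept VALUES (1, 'HQ')")
conn.execute("INSERT INTO employee VALUES (1, 'CEO', 500, 1)")


def rejected(sql):
    """Return True if the DBMS itself refuses the statement."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


# 1) Domain integrity: a non-integer salary is refused by the CHECK constraint.
domain_rejected = rejected("INSERT INTO employee VALUES (2, 'Clerk', 'lots', 1)")

# 2) Referential integrity: a dangling department key is refused.
fk_rejected = rejected("INSERT INTO employee VALUES (2, 'Clerk', 50, 99)")


# 3) Semantic integrity: the business rule lives in application code.
def add_employee(emp_id, title, salary, dept_id):
    (ceo_salary,) = conn.execute(
        "SELECT salary FROM employee WHERE title = 'CEO'").fetchone()
    if title != "CEO" and salary > ceo_salary:
        raise ValueError("business rule violated: salary exceeds the CEO's")
    conn.execute("INSERT INTO employee VALUES (?, ?, ?, ?)",
                 (emp_id, title, salary, dept_id))


add_employee(2, "Engineer", 100, 1)
try:
    add_employee(3, "VP", 900, 1)
    semantic_rejected = False
except ValueError:
    semantic_rejected = True

print(domain_rejected, fk_rejected, semantic_rejected)
```

Note that nothing in the schema records the CEO rule: anyone who bypasses `add_employee` and inserts rows directly can violate it, which is exactly the weakness the slide describes.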
You can't really check consistency of a database in the same way you can for an ontology in a logical knowledge representation language
For databases, you can just enforce as best as you can the above 3 kinds of integrity
For an ontology, you can check consistency in two ways:
  Syntactically (proof theory)
  Semantically (model theory)
But you can do this at two levels:
(1) Prove that your KR language is sound and complete, i.e., at the meta-level
  Sound ('Phi |- A' implies 'Phi |= A'): the proof system will not prove anything that is not valid
  Complete ('Phi |= A' implies 'Phi |- A'): the proof system is strong enough to prove everything that's valid
  'Phi |- A' means something like: A can be derived from Phi in the proof system, i.e., A is a syntactic consequence of Phi
  'Phi |= A' means that A is a semantic consequence or entailment of Phi: A holds in every model (or valuation system) M (with truth values, etc.) in which Phi holds, i.e., the argument is valid
  Both |- and |= are called turnstiles, syntactic and semantic respectively
(2) Check the consistency of a theory (ontology), i.e., at the object level
  This is usually something like negation consistency: there is no A such that both 'Phi |- A' and 'Phi |- ~A', i.e., no contradiction can be derived
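The semantic side of these checks can be sketched for a very small logic. The following Python example handles propositional formulas only and decides 'Phi |= A' by enumerating truth assignments; the tuple encoding of formulas and the function names are assumptions made for this illustration. It checks consistency model-theoretically (Phi has at least one model); for propositional logic, which has sound and complete proof systems, this coincides with negation consistency as stated above.

```python
from itertools import product

# Formulas are nested tuples (an encoding chosen for this sketch):
#   ("var", "p"), ("not", f), ("and", f, g), ("or", f, g), ("implies", f, g)


def atoms(f):
    """Collect the variable names occurring in a formula."""
    if f[0] == "var":
        return {f[1]}
    return set().union(*(atoms(g) for g in f[1:]))


def holds(f, v):
    """Evaluate formula f under the truth assignment v (a dict name -> bool)."""
    op = f[0]
    if op == "var":
        return v[f[1]]
    if op == "not":
        return not holds(f[1], v)
    if op == "and":
        return holds(f[1], v) and holds(f[2], v)
    if op == "or":
        return holds(f[1], v) or holds(f[2], v)
    if op == "implies":
        return (not holds(f[1], v)) or holds(f[2], v)
    raise ValueError(f"unknown operator: {op}")


def models(theory, extra=()):
    """Yield every assignment (over all mentioned atoms) that satisfies the theory."""
    names = sorted(set().union(*(atoms(f) for f in list(theory) + list(extra))))
    for values in product([True, False], repeat=len(names)):
        v = dict(zip(names, values))
        if all(holds(f, v) for f in theory):
            yield v


def entails(theory, a):
    """Phi |= A: A is true in every model of Phi."""
    return all(holds(a, v) for v in models(theory, [a]))


def consistent(theory):
    """Phi is consistent (satisfiable) if it has at least one model."""
    return any(True for _ in models(theory))


p, q = ("var", "p"), ("var", "q")
phi = [p, ("implies", p, q)]

print(entails(phi, q))                  # q follows from p and p -> q
print(consistent(phi))                  # phi has a model
print(consistent(phi + [("not", q)]))   # adding ~q leaves no model
```

Enumeration is exponential in the number of atoms and does not extend to first-order logic, where entailment is only semi-decidable; practical ontology reasoners use specialized procedures instead.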

Ontology Design
If you are creating common knowledge (as opposed to deep domain knowledge), you can in fact use your own intuition and understanding of the world to develop your ontology
It certainly helps to have a good background in formal ontology or formal semantics, because then you've already learned
  1) a rigorous, systematic methodology
  2) formal machinery for expressing fine details of world semantics
  3) an appreciation of many alternative analyses, pitfalls, errors, etc.
  4) complex knowledge about things in the world and insight into your pretheoretical knowledge
In linguistics we say that although everyone knows how to use a natural language like English, very few know how to characterize that knowledge or know about prospective theories of that knowledge
Naive speakers don't have good subjective insight into how they do things; they just do them

Ontologies vs. Databases
As is so often the case with non-ontological approaches to capturing the semantics of data, systems, and services, the modeling process stops at a syntactic and structural model and throws even the impoverished semantic model away, to act as a historical artifact, completely separated from the evolution of the live database, system, or service, and still only semantically interpretable by a human being who can read the documents, interpret the graphics, supply the real world knowledge of the domain, and understand how the database, system, or service will actually be implemented and used
Ontologists want to shift some of that "semantic interpretative burden" to machines and have them eventually mimic human semantics, i.e., understand what we mean
  The result would be to bring the machine up to the human, not force the human to the machine level
  By "machine semantic interpretation" we mean: by structuring and constraining in logical, axiomatic language the symbols humans supply, the machine will conclude via an automated inference process roughly what a human would in comparable circumstances
  The knowledge representation language that enables this automated inference must be a language that both makes fine modeling distinctions and has a formal or axiomatic semantics for those distinctions, so that no direct human involvement will be necessary (the meaning of "automated inference")
The primary purpose of databases is storage and ease of access to data, not complex use
  Software applications (with the data semantics embedded in nonreusable code via programmers) and human beings must focus on data use, manipulation, and transformation, all of which require a high degree of interpretation of the data
Extending the capabilities of a database often requires significant reprogramming and restructuring of the database schema
Extending the capabilities of an ontology can often be done by adding to its set of constituent relationships
  In theory, this may also include relationships for semantic mapping, whereas semantic mapping between multiple databases will require external applications
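The contrast between extending a schema and extending an ontology can be sketched with a toy fact base in Python. The triples, the class names other than Washer, and the two inference rules (transitivity of subClassOf and inheritance of type along subClassOf) are assumptions made for this illustration and are far weaker than what a real ontology language provides.

```python
def infer(facts):
    """Forward-chain two rules to a fixed point:
       (a subClassOf b), (b subClassOf c) => (a subClassOf c)
       (x type a),       (a subClassOf b) => (x type b)
    """
    facts = set(facts)
    while True:
        new = set()
        for (a, p, b) in facts:
            for (c, q, d) in facts:
                if b == c and q == "subClassOf" and p in ("subClassOf", "type"):
                    new.add((a, p, d))
        if new <= facts:      # nothing new was derived: fixed point reached
            return facts
        facts |= new


# Hypothetical starting ontology.
ontology = {
    ("Washer", "subClassOf", "Fastener"),
    ("Fastener", "subClassOf", "HardwarePart"),
    ("w1", "type", "Washer"),
}

before = infer(ontology)

# Extending the ontology means adding one more relationship;
# the inference code above is left unchanged.
extended = ontology | {("HardwarePart", "subClassOf", "ManufacturedItem")}
after = infer(extended)

print(("w1", "type", "HardwarePart") in before)
print(("w1", "type", "ManufacturedItem") in after)
```

A comparable change in a relational design would typically mean altering tables and the application code that interprets them, whereas here the new conclusion about w1 falls out of the existing inference rules.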