Data Model vs. Ontology
Dr. Tatiana Malyuta, Associate Professor, CUNY; Consultant for DoD
Dr. Barry Smith, UB, NCOR
Data Model – Purpose
To provide a consistent and efficiently functioning data store for one or more particular business applications
– Represents specific business concepts in a way that determines the organization of data in the store
– Commonly used representations are relational and graph; they are supported by data management technologies, e.g. relational (Oracle, MySQL) and graph (Neo4j, RDF/OWL stores)
Efficiency requires
– Application-specific representations
– Storing only the data needed by the application
Objective (shared) representation of the domain is not the purpose
– There are multiple data models for the same domain, to accommodate different business applications
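To make "multiple data models for the same domain" concrete, here is a minimal sketch using Python's built-in sqlite3 module. The two applications (payroll and training), their table names, and their column labels are hypothetical. Each schema stores only what its own application needs, so the same real person ends up recorded twice, under different keys and labels.

```python
import sqlite3

# Hypothetical schemas: two applications model the same domain entity
# (a person) differently, each keeping only the data it needs.
PAYROLL_SCHEMA = "CREATE TABLE Emp (EmpNo INTEGER PRIMARY KEY, PName TEXT, Salary REAL)"
TRAINING_SCHEMA = "CREATE TABLE Trainee (TID INTEGER PRIMARY KEY, PersonN TEXT, Course TEXT)"


def build_store(schema: str) -> sqlite3.Connection:
    """Create an in-memory store whose organization is dictated by one application."""
    conn = sqlite3.connect(":memory:")
    conn.execute(schema)
    return conn


payroll = build_store(PAYROLL_SCHEMA)
training = build_store(TRAINING_SCHEMA)

# The same real person is recorded twice, with different keys and labels.
payroll.execute("INSERT INTO Emp VALUES (1, 'John', 5000.0)")
training.execute("INSERT INTO Trainee VALUES (77, 'John', 'Java')")


def columns(conn: sqlite3.Connection, table: str) -> list[str]:
    """List a table's column names (field 1 of each PRAGMA table_info row)."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


print(columns(payroll, "Emp"))       # ['EmpNo', 'PName', 'Salary']
print(columns(training, "Trainee"))  # ['TID', 'PersonN', 'Course']
```

Nothing in either store records that `PName` and `PersonN` describe the same kind of thing. That connection exists only in the heads of the applications' designers.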
Data Silos
Numerous partial, idiosyncratic representations of the domain in data models, and numerous versions of the data in data stores
– No re-usability
– No single version of truth
(Figure: separate silos for Accounts Receivable, Accounts Payable, and Budget)
Ontology – Purpose
Objective representation of reality
– The commonly used representation is a graph, supported by RDF-based semantic technologies
Objective (shared) representation of the domain: one authoritative ontology for the domain of reality, meant for re-use
Storing vast volumes of data is not the purpose
Financial Ontology
A single domain ontology (or a collection of ontologies)
– To be re-used in different applications
– A single version of truth (as we know it today)
Note: we discuss ontologies built in accordance with the methodology and architecture pioneered by Dr. Smith.
Comparison
Although some technologies support a particular paradigm best, technology is not the defining factor in distinguishing between a data model and an ontology. We compare not technologies but paradigms.
Data Model – Types
Types are general or repeatable entities capable of being instantiated by indefinitely many particulars.
Data model types and instances are abstractions embodying efficient ways of describing the data about reality that an application needs (efficient both for reasoning and for storage)
– Different abstractions are chosen depending on the business need
The data model term 'person' is used to define an efficient storage solution for the data about persons needed by a particular application.
Ontology – Types
Ontology types and instances are on the side of reality.
An ontology must provide one term, and one definition, for each salient type of entity in each domain of interest.
The ontology term 'person', when it is used to represent data about persons, is designed to establish a link between these data and persons in reality.
Data Model – Organization
An arbitrary combination of selected types, suited for efficient data processing
The data model view of reality is flat and rigid
– For example, a model that stores exactly one skill per person must be changed to accommodate a person with multiple skills
– Such changes require significant effort because of the relative rigidity of data representation languages and the need to re-arrange the physical data store
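The cost of that rigidity can be sketched with Python's sqlite3 module. This assumes a hypothetical `Person(Name, Skill)` table like the first Person/Skill table in the Types and Individuals example, and a hypothetical new requirement that John also has a Sewing Skill. The single `Skill` column cannot hold a second value, so the physical store has to be re-arranged before the new fact can be recorded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Original (hypothetical) model: exactly one skill per person, held in a column.
conn.execute("CREATE TABLE Person (Name TEXT PRIMARY KEY, Skill TEXT)")
conn.executemany("INSERT INTO Person VALUES (?, ?)",
                 [("John", "Computer Skill"), ("Mary", "Sewing Skill")])

# New requirement: John also has a Sewing Skill. The store must be
# restructured: add a link table, copy the data over, and rebuild
# Person without its Skill column.
conn.executescript("""
    CREATE TABLE PersonSkill (Name TEXT, Skill TEXT, PRIMARY KEY (Name, Skill));
    INSERT INTO PersonSkill SELECT Name, Skill FROM Person;
    CREATE TABLE Person2 (Name TEXT PRIMARY KEY);
    INSERT INTO Person2 SELECT Name FROM Person;
    DROP TABLE Person;
    ALTER TABLE Person2 RENAME TO Person;
""")

# Only now can the new fact be stored.
conn.execute("INSERT INTO PersonSkill VALUES ('John', 'Sewing Skill')")

skills = [s for (s,) in conn.execute(
    "SELECT Skill FROM PersonSkill WHERE Name = 'John' ORDER BY Skill")]
print(skills)  # ['Computer Skill', 'Sewing Skill']
```

The migration itself is only part of the cost. Every application query that read `Person.Skill` must also be rewritten against the new `PersonSkill` table.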
Ontology – Organization
Each type appears only once in the ontology hierarchy.
The ontology view of reality is synoptic: it represents, in non-redundant fashion, an entire hierarchy of types at different levels of generality.
Each term is associated in an intelligible way with its subsuming and subsumed terms (and thus with its ancestor and descendant types) in the hierarchy of more and less general types.
The representation is more flexible: changes are easier to make and less disruptive.
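A minimal sketch of such a hierarchy follows, assuming single inheritance and a hypothetical fragment of skill types. Each type is stored exactly once, mapped to its parent. Its ancestors (subsuming types) and descendants (subsumed types) are then derived from that one entry rather than stored redundantly.

```python
from typing import Optional

# Hypothetical skill hierarchy: each type appears exactly once, mapped to its
# single parent (single inheritance is assumed); the root maps to None.
PARENT: dict[str, Optional[str]] = {
    "Skill": None,
    "Computer Skill": "Skill",
    "Sewing Skill": "Skill",
    "Programming Skill": "Computer Skill",
    "Java": "Programming Skill",
    "C++": "Programming Skill",
}


def ancestors(t: str) -> list[str]:
    """Return the subsuming types of t, from nearest to most general."""
    result = []
    parent = PARENT[t]
    while parent is not None:
        result.append(parent)
        parent = PARENT[parent]
    return result


def descendants(t: str) -> set[str]:
    """Return every type subsumed, directly or indirectly, by t."""
    children = [c for c, p in PARENT.items() if p == t]
    found = set(children)
    for child in children:
        found |= descendants(child)
    return found


print(ancestors("Java"))                      # ['Programming Skill', 'Computer Skill', 'Skill']
print(sorted(descendants("Computer Skill")))  # ['C++', 'Java', 'Programming Skill']

# Adding a type is a single, local change; nothing else is restructured.
PARENT["Python"] = "Programming Skill"
```

The last line illustrates the flexibility claim. A new type slots in under its parent, and every ancestor and descendant query picks it up automatically.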
Data Model vs. Ontology – Types and Individuals
Data model 1:
Person Name | Skill
John | Computer Skill
Mary | Sewing Skill
Ontology (Skill hierarchy): Skill > Computer Skill > Programming Skill > Java, C++
Data model 2:
Person Name | Skill
John | Java
Mary | C++
Data Model – Labels
Labels are not as important, because databases are not directly exposed to users: they are presented via an application that exposes the database content using the specific vocabulary of a narrow community of users.
– Labels can be anything, e.g. 'PN', 'PName', 'PersName', 'PersonN', etc. for a person's name
– The meaning of a label is often derived from its context (e.g. 'Name' is used for both the name of the Person and the name of the Skill in one of the examples)
Ontology – Labels
Labels are exposed to users.
They are nouns and noun phrases from natural language, and each type has a unique name that designates it unambiguously regardless of the context in which it is used, e.g. PersonName, SkillName.
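One way to picture the difference is a lookup table that resolves context-dependent database labels to unique ontology labels. This is a minimal sketch. The table names and the mapping are hypothetical, and only the labels PersonName and SkillName come from the slide. In the database, 'Name' means different things depending on the table. The ontology label needs no context.

```python
# Hypothetical database columns: (table, column). 'Name' is ambiguous on its
# own, and 'PN' is a different local label for the same kind of data.
DB_COLUMNS = [("Person", "Name"), ("Skill", "Name"), ("Staff", "PN")]

# Hypothetical mapping from each context-bound label to a unique ontology label.
EXPLICATION = {
    ("Person", "Name"): "PersonName",
    ("Skill", "Name"): "SkillName",
    ("Staff", "PN"): "PersonName",
}


def ontology_label(table: str, column: str) -> str:
    """Resolve a context-dependent column label to its unique ontology label."""
    return EXPLICATION[(table, column)]


for table, column in DB_COLUMNS:
    print(f"{table}.{column} -> {ontology_label(table, column)}")
```

Note that two differently labeled columns (`Person.Name`, `Staff.PN`) resolve to the same term, while two identically labeled columns (`Person.Name`, `Skill.Name`) resolve to different ones.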
Closed and Open World Assumptions (impact of technologies)
Database reasoning is confined to search based on the closed world assumption: if we do not find something in the database, then it does not exist in the world defined by the database.
Ontologies are based on the idea that we can never describe entities in the real world completely. This means that, from the absence of a particular term 'A' in an ontology, we cannot infer that As do not exist. It also means that ontologies are constructed in a way that allows easy addition of new types and relations.
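The contrast can be sketched in a few lines. The facts below are hypothetical. The sketch is deliberately simplified: real OWL reasoners also derive new facts and can conclude that something is false when it contradicts asserted axioms. It only contrasts how a missing fact is treated.

```python
from enum import Enum


class Answer(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"


# Hypothetical asserted facts: (person, skill) pairs.
FACTS = {("John", "Java"), ("Mary", "C++")}


def closed_world(person: str, skill: str) -> Answer:
    """Database-style reasoning: whatever is not stored is taken to be false."""
    return Answer.TRUE if (person, skill) in FACTS else Answer.FALSE


def open_world(person: str, skill: str) -> Answer:
    """Ontology-style reasoning: absence of a fact does not imply its negation.
    This sketch records no negative facts, so a missing fact is UNKNOWN."""
    return Answer.TRUE if (person, skill) in FACTS else Answer.UNKNOWN


print(closed_world("John", "C++"))  # Answer.FALSE
print(open_world("John", "C++"))    # Answer.UNKNOWN
```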
Life Span
Data models are created in ad hoc ways to capture a targeted selection of features; a data model is usually not reused, which results in numerous data silos for a domain.
Ontologies will grow and expand as new knowledge is gained over time.
Summary of Comparison
Dimension of comparison | Traditional data model | Ontologies
Closeness to reality | Variable, application-specific | Reality is always the prime focus
Conceptualization of the domain | Flat and partial (always at the level of detail needed for a particular implementation) | Hierarchical, simultaneously describing the same domain at different levels of detail
Vocabulary | Application-specific, not intended for sharing | Application-independent, intended to support sharing and reuse
Structure or organization of types | Groupings of types to accommodate data access patterns | Taxonomies (type hierarchies) always used to describe and classify the domain
Combinability | Can rarely be combined; even where possible, this typically requires significant manual effort | If the ontology-building methodology is followed, the results will be combinable automatically
Flexibility | Rigid; changes normally require significant effort | Flexible; changes can normally be effected very easily
Semantic Enhancement of Data Models by Ontology
Semantic Enhancement (SE) is realized with the help of ontologies that are used to explicate data models and annotate data instances
– The vocabulary of the ontologies used for explications and annotations provides agile horizontal integration
– Ontologies, by virtue of their nature and organization, provide semantic enhancement of data
Example table:
PersonID | Name | Description
111 | Java | Programming
222 | SQL | Database
(Figure: the table is linked to ontology terms from a Skill hierarchy (Skill, Computer Skill, Programming Skill; SQL, Java, C++) and an Education hierarchy (Education, Technical Education).)
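A minimal sketch of explication and annotation over the example table follows. The ontology labels for the fields, the `skill:` identifier prefix, and the `SkillType` key are all hypothetical. The point is that the stored rows are never modified. The enhancement lives beside them and produces an enriched view on demand.

```python
# Rows exactly as stored in the example table; SE never modifies them.
ROWS = [
    {"PersonID": 111, "Name": "Java", "Description": "Programming"},
    {"PersonID": 222, "Name": "SQL", "Description": "Database"},
]

# Hypothetical explication: each database field is linked to an ontology term
# whose label is unambiguous outside this table.
FIELD_EXPLICATION = {
    "PersonID": "PersonIdentifier",
    "Name": "SkillName",
    "Description": "SkillDescription",
}

# Hypothetical annotation: each stored skill value is linked to an ontology
# type; the "skill:" prefix is an invented identifier scheme.
VALUE_ANNOTATION = {"Java": "skill:Java", "SQL": "skill:SQL"}


def semantic_view(row: dict) -> dict:
    """Return an enhanced copy of a row; the stored row is left untouched."""
    view = {FIELD_EXPLICATION[field]: value for field, value in row.items()}
    view["SkillType"] = VALUE_ANNOTATION[row["Name"]]
    return view


for row in ROWS:
    print(semantic_view(row))
```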
The Meaning of 'Enhancement'
Semantic enhancement (enrichment) of data is an arm's length approach (no change to the data)
– Through simple explication, we associate an entire knowledge system with a database field
– This enables analytics to process data, e.g. about computer skills, "vertically" along the Skill hierarchy, as well as "horizontally" via relations between Skill and Education
– And further: while the data in the database does not change, its analysis can become richer and richer as our understanding of reality changes
For this richness to be leveraged by different communities, persons, and applications, the ontology needs to have the properties mentioned above and be constructed in accordance with the principles of SE (see References).
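The vertical and horizontal analyses can be sketched over the example rows as follows. The placement of SQL, Java and C++ under Programming Skill is assumed from the figure's layout. The `ACQUIRED_THROUGH` relation linking Computer Skill to Technical Education is a hypothetical stand-in for whatever Skill/Education relation a real ontology would define. The stored rows stay fixed throughout. Only the knowledge around them grows.

```python
from typing import Optional

# Unchanged database rows: (PersonID, skill name) from the example table.
ROWS = [(111, "Java"), (222, "SQL")]

# Assumed Skill hierarchy (child -> parent), reconstructed from the figure.
PARENT: dict[str, Optional[str]] = {
    "Skill": None,
    "Computer Skill": "Skill",
    "Programming Skill": "Computer Skill",
    "SQL": "Programming Skill",
    "Java": "Programming Skill",
    "C++": "Programming Skill",
}

# Hypothetical relation between Skill types and Education types.
ACQUIRED_THROUGH = {"Computer Skill": "Technical Education"}


def is_a(t: str, ancestor: str) -> bool:
    """True if t equals ancestor or is subsumed by it in the hierarchy."""
    current: Optional[str] = t
    while current is not None:
        if current == ancestor:
            return True
        current = PARENT.get(current)
    return False


def persons_with(skill_type: str) -> list[int]:
    """Vertical analysis: persons whose recorded skill falls under skill_type."""
    return [pid for pid, skill in ROWS if is_a(skill, skill_type)]


def related_education(skill: str) -> list[str]:
    """Horizontal analysis: follow the Skill-Education relation from any ancestor."""
    return [edu for s, edu in ACQUIRED_THROUGH.items() if is_a(skill, s)]


print(persons_with("Computer Skill"))  # [111, 222]
print(related_education("Java"))       # ['Technical Education']
```

Neither answer is available from the database alone. It never mentions "Computer Skill" or "Technical Education". Both answers come from the ontology that the rows are linked to.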
SE and Data Integration
Traditional integration approaches involve the creation of a new model, used either for
– A new physical store (data warehouse): expensive, resource- and time-consuming, and yet another data store that is rigid (a potential data silo) and not interoperable with other stores
– Querying the data sources via the new model: fragile
Both approaches entail loss and/or distortion of data and semantics, and provide only 'local' integration (they do not lead to interoperability with other sources).
SE of a store
– Does not require data reorganization or the creation of another store
– Changes to it are non-intrusive
– Leads to integration of the store with other stores that have been enhanced previously or will be in the future
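Here is a minimal sketch of integration through SE, assuming two hypothetical SQLite stores designed independently with different local labels. Each store is explicated against the shared ontology terms PersonName and SkillName. A query phrased in those terms then runs against both stores in place, with no warehouse and no reorganization. The f-string builds SQL only from the trusted explication table, never from user input.

```python
import sqlite3

# Two hypothetical, independently designed stores; neither is reorganized.
hr = sqlite3.connect(":memory:")
hr.execute("CREATE TABLE Emp (PName TEXT, Skl TEXT)")
hr.executemany("INSERT INTO Emp VALUES (?, ?)", [("John", "Java"), ("Ann", "Sewing")])

training = sqlite3.connect(":memory:")
training.execute("CREATE TABLE Trainee (PersonN TEXT, Topic TEXT)")
training.executemany("INSERT INTO Trainee VALUES (?, ?)", [("Mary", "C++")])

# Hypothetical explications: each store's local column labels are linked to
# shared ontology terms. Enhancing another store later means adding one more
# entry here, without touching the stores already listed.
EXPLICATIONS = [
    (hr, "Emp", {"PersonName": "PName", "SkillName": "Skl"}),
    (training, "Trainee", {"PersonName": "PersonN", "SkillName": "Topic"}),
]


def query(*terms: str) -> list[tuple]:
    """Answer a query phrased in ontology terms against every enhanced store."""
    results = []
    for conn, table, mapping in EXPLICATIONS:
        # Column and table names come only from the trusted mapping above.
        cols = ", ".join(mapping[t] for t in terms)
        results.extend(conn.execute(f"SELECT {cols} FROM {table}").fetchall())
    return results


print(sorted(query("PersonName", "SkillName")))
# [('Ann', 'Sewing'), ('John', 'Java'), ('Mary', 'C++')]
```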
References
Barry Smith et al., "IAO-Intel – An Ontology of Information Artifacts in the Intelligence Domain", STIDS Conference, 2013.
Barry Smith, Tatiana Malyuta, William S. Mandrick, Chia Fu, Kesny Parent, Milan Patel, "Horizontal Integration of Warfighter Intelligence Data: A Shared Semantic Resource for the Intelligence Community", STIDS Conference, 2012.
Barry Smith, Tatiana Malyuta, David Salmen, William Mandrick, Kesny Parent, Shouvik Bardhan, Jamie Johnson, "Ontology for the Intelligence Analyst", Crosstalk: The Journal of Defense Software Engineering.
David Salmen, Tatiana Malyuta, Alan Hansen, Shaun Cronen, Barry Smith, "Integration of Intelligence Data through Semantic Enhancement", STIDS Conference.