
1 Department of Computer Science & Engineering University of California, San Diego CSE-291:Ontologies in Data and Process Integration Spring 2004 Bertram Ludäscher LUDAESCH@SDSC.EDU

2 Overview
Introduction to ontologies:
  – What are ontologies (and some related formalisms)?
  – How do we represent ontologies?
  – What can we do with them/to them?
Introduction to some specific formalisms
  – Logic, Description Logics, OWL, FCA, TMs, ...
New themes / possible topics (vs. Spring ’03):
  – querying concept graphs
  – use of ontologies in query processing (specifically mediation)
  – process ontologies, capturing procedural knowledge
  – philosophical approaches
Some guest lectures
“Class Action”:
  – Theoretical studies: surveying/comparing/analyzing approaches (based on research literature)
  – Practical studies, e.g., experiments with reasoning tools and graph querying tools

3 Overview …
Today
  – Introduction to ontologies
  – Example application
  – Description logic (first steps)
Next week:
  – More on DL, FOL, reasoning, …

4 Ontologies in Data and Information Integration

5 An Online Shopper’s Information Integration Problem
El Cheapo: “Where can I get the cheapest copy (including shipping cost) of Wittgenstein’s Tractatus Logico-Philosophicus within a week?”
[Figure: “One-World” mediation; labels: Information Integration, addall.com, amazon.com, A1books.com, half.com, barnes&noble.com]

6 A Home Buyer’s Information Integration Problem
What houses for sale under $500k have at least 2 bathrooms, 2 bedrooms, a nearby school ranking in the upper third, in a neighborhood with below-average crime rate and diverse population?
[Figure: “Multiple-Worlds” mediation; sources: Realtor, Demographics, School Rankings, Crime Stats]

7 A Neuroscientist’s Information Integration Problem
What is the cerebellar distribution of rat proteins with more than 70% homology with human NCS-1? Any structure specificity? How about other rodents?
[Figure: “Complex Multiple-Worlds” mediation; sources: protein localization (NCMIR), neurotransmission (SENSELAB), sequence info (CaPROT), morphometry (SYNAPSE)]
Biomedical Informatics Research Network http://nbirn.net

8

9 Standard (XML-Based) Mediator Architecture
[Figure: layered architecture, top to bottom]
USER/Client: Query Q(G(S_1, ..., S_k))
MEDIATOR: Integrated Global (XML) View G; Integrated View Definition G(..) ← S_1(..) … S_k(..)
(XML) Queries & Results pass between the mediator and the wrappers
Sources S_1, S_2, ..., S_k, each behind a Wrapper that exports an (XML) View
Wrappers are implemented as web services
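The layering on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two sources, their record layouts, the prices, and all function names (wrapper_1, wrapper_2, integrated_view, query_cheapest) are hypothetical, and plain dictionaries stand in for XML views.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]

# Hypothetical raw sources S_1 and S_2 with different native layouts (made-up values).
SOURCE_1 = [("Tractatus", 12.50, 3.00)]             # (title, price, shipping)
SOURCE_2 = [{"name": "Tractatus", "total": 14.25}]  # total already includes shipping


def wrapper_1() -> List[Record]:
    """Wrapper for S_1: export tuples in the common record form."""
    return [{"title": t, "total_cost": p + s, "source": "S1"} for t, p, s in SOURCE_1]


def wrapper_2() -> List[Record]:
    """Wrapper for S_2: rename fields to the common record form."""
    return [{"title": r["name"], "total_cost": r["total"], "source": "S2"} for r in SOURCE_2]


def integrated_view(wrappers: Iterable[Callable[[], List[Record]]]) -> List[Record]:
    """G defined over S_1, ..., S_k: here simply the union of the wrapped views."""
    return [rec for w in wrappers for rec in w()]


def query_cheapest(title: str) -> Record:
    """Client query Q(G(S_1, ..., S_k)): the cheapest offer for a title."""
    offers = [r for r in integrated_view([wrapper_1, wrapper_2]) if r["title"] == title]
    return min(offers, key=lambda r: r["total_cost"])


best = query_cheapest("Tractatus")
print(best["source"], best["total_cost"])
```

The client only ever sees the integrated view; adding a source means adding one wrapper, not changing the query.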

10 Some BIRNing Data Integration Questions
Data Integration Approaches:
  – Let’s just share data, e.g., link everything from a web page!
  – ... or better put everything into a relational or XML database
  – ... and do remote access using the Grid
  – ... or just use Web services!
Nice try. But:
  – “Find the files where the amygdala was segmented.”
  – “Which other structures were segmented in the same files?”
  – “Did the volume of any of those structures differ much from normal?”
  – “What is the cerebellar distribution of rat proteins with more than 70% homology with human NCS-1? Any structure specificity? How about other rodents?”
Biomedical Informatics Research Network http://nbirn.net

11 Heterogeneous Data Integration
Requires advanced metadata and processing
  – Attributes must be semantically typed
  – Collection protocols must be known
  – Units and measurement scale must be known
  – Measurement relationships must be known, e.g., that ArealDensity = Count/Area
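A minimal sketch of why the semantic type and unit matter for the ArealDensity = Count/Area relationship. The Measurement class, its field names, and the sample values are assumptions made for illustration; they are not taken from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    value: float
    semantic_type: str  # e.g. "Count", "Area", "ArealDensity"
    unit: str           # e.g. "individuals", "m^2"


def areal_density(count: Measurement, area: Measurement) -> Measurement:
    """Derive ArealDensity = Count / Area, refusing mistyped inputs."""
    if count.semantic_type != "Count" or area.semantic_type != "Area":
        raise TypeError("areal_density needs a Count and an Area")
    if area.value <= 0:
        raise ValueError("area must be positive")
    return Measurement(count.value / area.value, "ArealDensity",
                       f"{count.unit}/{area.unit}")


# Made-up sample values.
density = areal_density(Measurement(120, "Count", "individuals"),
                        Measurement(40.0, "Area", "m^2"))
print(density)
```

Without the semantic type, nothing stops a program from dividing two arbitrary numeric columns and calling the result a density.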

12 Information Integration Challenges
System aspects: “Grid” Middleware
  – distributed data & computing
  – Web Services, WSDL/SOAP, …
  – sources = functions, files, databases, …
Syntax & Structure: XML-Based Mediators
  – wrapping, restructuring
  – XML queries and views
  – sources = XML databases
Semantics: Model-Based/Semantic Mediators
  – conceptual models and declarative views
  – Semantic Web/Knowledge Grid stuff: ontologies, description logics (RDF(S), DAML+OIL, OWL, ...)
  – sources = knowledge bases (DB+CMs+ICs)
[Figure labels: Syntax, Structure, Semantics, System aspects]
  → reconciling S^4 heterogeneities
  → “gluing” together multiple data sources
  → bridging information and knowledge gaps computationally

13 Information Integration from a DB Perspective
Information Integration Problem
  – Given: data sources S_1, ..., S_k (DBMS, web sites, ...) and user questions Q_1, ..., Q_n that can be answered using the S_i
  – Find: the answers to Q_1, ..., Q_n
The Database Perspective: source = “database”
  → S_i has a schema (relational, XML, OO, ...)
  → S_i can be queried
  → define virtual (or materialized) integrated views V over S_1, ..., S_k using database query languages (SQL, XQuery, ...)
  → questions become queries Q_i against V(S_1, ..., S_k)

14 What’s the Problem with XML & Complex Multiple-Worlds?
XML is Syntax
  – DTDs talk about element nesting
  – XML Schema schemas give you data types
  – need anything else? => write comments!
Domain Semantics is complex:
  – implicit assumptions, hidden semantics → sources seem unrelated to the non-expert
Need Structure and Semantics beyond XML trees!
  → employ richer OO models
  → make domain semantics and “glue knowledge” explicit
  → use ontologies to fix terminology and conceptualization
  → avoid ambiguities by using formal semantics

15 XML-Based vs. Model-Based Mediation
[Figure: the two approaches side by side, both built on raw data]
XML-based mediation: XML elements and XML models; structural constraints (DTDs, e.g. A = (B*|C),D; B = ...; parent, child, sibling, ...); no domain constraints; Integrated-DTD := XML-QL(Src1-DTD, ...)
Model-based mediation: (XML) objects and conceptual models (classes C1, C2, C3; relations R; is-a, has-a, ...); logical domain constraints (IF ... THEN ...); ontologies, DMs, PMs; Integrated-CM := CM-QL(Src1-CM, ...)
CM ~ {Descr. Logic, ER, UML, RDF/XML(-Schema), …}
CM-QL ~ {F-Logic, DAML+OIL, …}

16 Knowledge Representation: Relating Theory to the World via Formal Models
Source: John F. Sowa, Knowledge Representation: Logical, Philosophical, and Computational Foundations
“All models are wrong, but some models are useful!”

17 What is an ontology (and what is it good for)? And the answer is...

18 Glossary (wordreference.com)
ontology noun 1 (Philosophy) the branch of metaphysics that deals with the nature of being 2 (Logic) the set of entities presupposed by a theory
taxonomy noun 1 a the branch of biology concerned with the classification of organisms into groups based on similarities of structure, origin, etc. b the practice of arranging organisms in this way 2 the science or practice of classification [ETYMOLOGY: 19th Century: from French taxonomie, from Greek taxis order + -nomy]
thesaurus noun (plural: -ruses, -ri [ -raı ]) 1 a book containing systematized lists of synonyms and related words 2 a dictionary of selected words or topics 3 (rare) a treasury [ETYMOLOGY: 18th Century: from Latin, Greek: treasure]

19 Glossary (wordreference.com)
concept noun 1 an idea, esp. an abstract idea; example: the concepts of biology 2 (Philosophy) a general idea or notion that corresponds to some class of entities and that consists of the characteristic or essential features of the class 3 (Philosophy) a the conjunction of all the characteristic features of something b a theoretical construct within some theory c a directly intuited object of thought d the meaning of a predicate 4 [modifier] (of a product, esp. a car) created as an exercise to demonstrate the technical skills and imagination of the designers, and not intended for mass production or sale [ETYMOLOGY: 16th Century: from Latin conceptum something received or conceived, from concipere to take in, conceive]
contingent adjective 1 [when postpositive, often foll by on or upon] dependent on events, conditions, etc., not yet known; conditional 2 (Logic) (of a proposition) true under certain conditions, false under others; not necessary 3 (in systemic grammar) denoting contingency (sense 4) 4 (Metaphysics) (of some being) existing only as a matter of fact; not necessarily existing 5 happening by chance or without known cause; accidental 6 that may or may not happen; uncertain
glossary noun (plural: -ries); an alphabetical list of terms peculiar to a field of knowledge with definitions or explanations. Sometimes called: gloss [ETYMOLOGY: 14th Century: from Late Latin glossarium; see gloss 2]

20 1st Attempt: Ontologies in CS
An ontology is ...
  – an explicit specification of a conceptualization [Gruber93]
  – a shared understanding of some domain of interest [Uschold, Gruninger96]
Some aspects and parameters:
  – a formal specification (reasoning and “execution”)
  – ... of a conceptualization of a domain (community)
  – ... of some part of world that is of interest (application)
Provides:
  – A common vocabulary of terms
  – Some specification of the meaning of the terms (semantics)
  – A shared “understanding” for people and machines

21 Ontology as a philosophical discipline
Ontology as a philosophical discipline, which deals with the nature and the organization of reality:
  – Ontology as such is usually contrasted with Epistemology, which deals with the nature and sources of our knowledge [a.k.a. Theory of Knowledge]. Aristotle defined Ontology as the science of being as such: "unlike the special sciences, each of which investigates a class of beings and their determinations, Ontology regards all the species of being qua being and the attributes which belong to it qua being" (Aristotle, Metaphysics, IV, 1).
In this sense Ontology tries to answer the question: What is being? What exists? (the nature of being, not an enumeration of “stuff” around us…)

22 Some different uses of the word “Ontology” [Guarino’95]
1. Ontology as a philosophical discipline
2. Ontology as an informal conceptual system
3. Ontology as a formal semantic account
4. Ontology as a specification of a “conceptualization”
5. Ontology as a representation of a conceptual system via a logical theory
   5.1 characterized by specific formal properties
   5.2 characterized only by its specific purposes
6. Ontology as the vocabulary used by a logical theory
7. Ontology as a (meta-level) specification of a logical theory
http://ontology.ip.rm.cnr.it/Papers/KBKS95.pdf

23 Ontologies vs Conceptualizations
Given a logical language L ...
  – ... a conceptualization is a set of models of L which describes the admissible (intended) interpretations of its non-logical symbols (the vocabulary)
  – ... an ontology is a (possibly incomplete) axiomatization of a conceptualization.
[Figure: the conceptualization C(L) and the ontology shown as regions of the set of all models M(L); logic theories are consistent sets of sentences, closed under logical consequence]
[Guarino96] http://www-ksl.stanford.edu/KR96/Guarino-What/P003.html
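The distinction can be made concrete with a toy propositional language. In this sketch the vocabulary {black, white}, the choice of intended models, and the single axiom are all assumptions picked for illustration; the point is only that an axiomatization may admit more models than the conceptualization it approximates.

```python
from itertools import product

SYMBOLS = ("black", "white")  # non-logical vocabulary of a toy language L

# M(L): every truth assignment to the vocabulary.
all_models = [dict(zip(SYMBOLS, values))
              for values in product([False, True], repeat=len(SYMBOLS))]


def intended(m):
    """Conceptualization C(L), assumed for this toy: exactly one colour holds."""
    return m["black"] != m["white"]


def axiom(m):
    """Ontology axiom: black => not white."""
    return (not m["black"]) or (not m["white"])


conceptualization = [m for m in all_models if intended(m)]
ontology_models = [m for m in all_models if axiom(m)]

# The axiom holds in every intended model, but the axiomatization is incomplete:
# it also admits the unintended model in which nothing is black or white.
extra = [m for m in ontology_models if m not in conceptualization]
print(len(all_models), len(ontology_models), len(conceptualization), extra)
```

Adding a second axiom such as "black or white" would remove the extra model and make the toy ontology match the conceptualization exactly.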

24 Ontologies vs Knowledge Bases
An ontology is a particular KB, describing facts assumed to be always true by a community of users:
  – in virtue of the agreed-upon meaning of the vocabulary used (analytical knowledge): black => not white
  – ... whose truth does not descend from the meaning of the vocabulary used (non-analytical, common knowledge): Rome is the capital of Italy
An arbitrary KB may describe facts which are contingently true, and relevant to a particular epistemic state:
  – Mr Smith’s pathology is either cirrhosis or diabetes

25 Formal Ontology [Guarino’96]
Theory of formal distinctions
  – among things
  – among relations
Basic tools
  – Theory of parthood: What counts as a part of a given entity? What properties does the part relation have? Are there different kinds of parts?
  – Theory of integrity: What counts as a whole? In which sense are its parts connected?
  – Theory of identity: How can an entity change while keeping its identity? What are its essential properties? Under which conditions does an entity lose its identity? Does a change of “point of view” change the identity conditions?
  – Theory of dependence: Can a given entity exist alone, or does it depend on other entities?

26 Ontology: Definition and Scope [Sowa]
The subject of ontology is the study of the categories of things that exist or may exist in some domain. The product of such a study, called an ontology, is a catalog of the types of things that are assumed to exist in a domain of interest D from the perspective of a person who uses a language L for the purpose of talking about D. The types in the ontology represent the predicates, word senses, or concept and relation types of the language L when used to discuss topics in the domain D. An uninterpreted logic, such as predicate calculus, conceptual graphs, or KIF, is ontologically neutral. It imposes no constraints on the subject matter or the way the subject may be characterized. By itself, logic says nothing about anything, but the combination of logic with an ontology provides a language that can express relationships about the entities in the domain of interest.
http://users.bestweb.net/~sowa/ontology/index.htm

27 Ontology: Definition and Scope [Sowa]
An informal ontology may be specified by a catalog of types that are either undefined or defined only by statements in a natural language. A formal ontology is specified by a collection of names for concept and relation types organized in a partial ordering by the type-subtype relation. Formal ontologies are further distinguished by the way the subtypes are distinguished from their supertypes: an axiomatized ontology distinguishes subtypes by axioms and definitions stated in a formal language, such as logic or some computer-oriented notation that can be translated to logic; a prototype-based ontology distinguishes subtypes by a comparison with a typical member or prototype for each subtype. Large ontologies often use a mixture of definitional methods: formal axioms and definitions are used for the terms in mathematics, physics, and engineering; and prototypes are used for plants, animals, and common household items.
http://users.bestweb.net/~sowa/ontology/index.htm

28 Why develop an ontology?
To make domain assumptions explicit
  – Easier to change domain assumptions
  – Easier to understand, update, and integrate legacy data → data integration
To separate domain knowledge from operational knowledge
  – Re-use domain and operational knowledge separately
A community reference for applications
To share a consistent understanding of what information means.
[Carole Goble, Nigel Shadbolt, Ontologies and the Grid Tutorial]

29 What is being shared?
Metadata
  – Data describing the content and meaning of resources and services.
  – But everyone must speak the same language…
Terminologies
  – Shared and common vocabularies
  – For search engines, agents, curators, authors and users
  – But everyone must mean the same thing…
Ontologies
  – Shared and common understanding of a domain
  – Essential for search, exchange and discovery
→ Ontologies aim at sharing meaning
[Carole Goble, Nigel Shadbolt, Ontologies and the Grid Tutorial]

30 Origin and History
Humans require words (or at least symbols) to communicate efficiently. The mapping of words to things is indirect. We do it by creating concepts that refer to things. The relation between symbols and things has been described in the form of the meaning triangle.
[Figure: meaning triangle; labels: “Jaguar”, Concept]
Ogden, C. K. & Richards, I. A. 1923. "The Meaning of Meaning." 8th Ed. New York, Harcourt, Brace & World, Inc; before: Frege, Peirce; see [Sowa 2000]
[Carole Goble, Nigel Shadbolt, Ontologies and the Grid Tutorial]

31 Human and machine communication...
[Figure: Human Agents 1 and 2 (HA1, HA2) exchange a symbol, e.g. “JAGUAR”, via natural language; Machine Agents 1 and 2 (MA1, MA2) exchange symbols via protocols. Labels: Internal models, Formal models, Things, Concept, Symbol, Meaning Triangle, Ontology Description, Ontology, Formal Semantics, commit, “a specific domain, e.g. animals”]
[Maedche et al., 2002]

32 An explicit description of a domain
Concepts (class, set, type, predicate)
  – event, gene, gammaBurst, atrium, molecule, cat
Properties of concepts and relationships between them (slot)
  – Taxonomy: generalisation ordering among concepts: isA, partOf, subProcess
  – Relationship, Role or Attribute: functionOf, hasActivity, location, eats, size
[Figure: example concept graph; nodes: animal, rodent, cow, cat, mouse, dog, domestic, vermin; relationship: eats]
[Carole Goble, Nigel Shadbolt, Ontologies and the Grid Tutorial]

33 Concepts
Primitive concepts:
  – properties are necessary
  – Globular protein must have hydrophobic core (but a protein with a hydrophobic core need not be a globular protein)
    GlobularProtein ⊑ ∃has-a.HydrophobicCore
Defined concepts:
  – properties are necessary + sufficient
  – Eukaryotic cells must have a nucleus.
  – Every cell that contains a nucleus must be Eukaryotic.
    EukaryoticCell ≡ ∃has-a.Nucleus
[Robert Stevens]
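The practical difference between the two kinds of concept shows up in classification. The sketch below hand-codes the slide's two axioms for a single individual; it is not a description logic reasoner, and the Individual class and the example individuals are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Individual:
    name: str
    asserted: Set[str] = field(default_factory=set)  # concepts stated explicitly
    has_a: Set[str] = field(default_factory=set)     # fillers of the has-a role


def infer(ind: Individual) -> Set[str]:
    """Return the concepts that follow from the two axioms for one individual."""
    concepts = set(ind.asserted)
    # Defined concept (necessary and sufficient): having a nucleus is enough
    # to classify the individual as a EukaryoticCell.
    if "Nucleus" in ind.has_a:
        concepts.add("EukaryoticCell")
    # Primitive concept (necessary only): a hydrophobic core is required of a
    # GlobularProtein, but it never makes something a GlobularProtein, so no
    # rule is applied here.
    return concepts


yeast = Individual("yeast_cell", has_a={"Nucleus"})
protein_x = Individual("protein_x", has_a={"HydrophobicCore"})
print(infer(yeast), infer(protein_x))
```

Only the defined concept lets a reasoner recognize new members automatically; membership in a primitive concept has to be asserted.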

34 What is a concept?
Different communities have different notions of what a concept means:
  – Formal concept analysis (see http://www.math.tu-dresden.de/~ganter/fba.html) talks about formal concepts
  – Description Logics (see http://dl.kr.org/) talk about concept labels
  – ISO-704:2000 – Terminology Work (see http://www.iso.ch/)
  – Often the classical notion of a frame in AI or a class in OO modeling is seen as equivalent to a concept.

35 Formal Concept Analysis (FCA)
[Figure: a concept lattice produced by Formal Concept Analysis]
[Sowa, http://users.bestweb.net/~sowa/misc/mathw.htm]
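For readers unfamiliar with FCA, the following sketch computes the formal concepts (extent/intent pairs) of a tiny context by brute force. The context itself is hypothetical, loosely reusing the animal names that appear elsewhere in these slides; real FCA tools use far more efficient algorithms.

```python
from itertools import combinations

# Hypothetical formal context: object -> set of attributes.
CONTEXT = {
    "cat": {"animal", "domestic"},
    "dog": {"animal", "domestic"},
    "cow": {"animal", "domestic"},
    "mouse": {"animal", "rodent", "vermin"},
}
ATTRIBUTES = set().union(*CONTEXT.values())


def extent(attrs):
    """Objects that have every attribute in attrs."""
    return frozenset(o for o, a in CONTEXT.items() if attrs <= a)


def intent(objs):
    """Attributes shared by every object in objs (all attributes if objs is empty)."""
    if not objs:
        return frozenset(ATTRIBUTES)
    return frozenset(set.intersection(*(CONTEXT[o] for o in objs)))


def formal_concepts():
    """Enumerate (extent, intent) pairs by closing every attribute subset."""
    found = set()
    for r in range(len(ATTRIBUTES) + 1):
        for attrs in combinations(sorted(ATTRIBUTES), r):
            objs = extent(set(attrs))
            found.add((objs, intent(objs)))
    return found


concepts = formal_concepts()
for objs, attrs in sorted(concepts, key=lambda c: -len(c[0])):
    print(sorted(objs), sorted(attrs))
```

Ordering these pairs by inclusion of their extents gives the concept lattice.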

36 An explicit description of a domain
Constraints or axioms on properties and concepts:
  – value: integer
  – domain: cat
  – cardinality: at most 1
  – range: 0 <= X <= 100
  – oligonucleotides < 20 base pairs
  – cows are larger than dogs
  – cats cannot eat only vegetation
  – cats and dogs are disjoint
Values or concrete domains
  – integer, strings
  – 20, tryptophan-synthetase
[Figure: example concept graph; nodes: animal, rodent, cow, cat, mouse, dog, domestic, vermin; relationship: eats]
[Carole Goble, Nigel Shadbolt, Ontologies and the Grid Tutorial]

37 An explicit description of a domain
Individuals or Instances
  – sulphur, trpA Gene, felix
Nominals
  – Concepts that cannot have instances
  – Instances that are used in conceptual definitions
  – ItalianDog = Dog bornIn Italy
Instances
  – An ontology = concepts + properties + axioms + values + nominals
  – A knowledge base = ontology + instances
[Figure: example concept graph; nodes: animal, rodent, cow, cat, mouse, dog, domestic, vermin; relationship: eats; instances: mickey, felix, jerry, tom]
[Carole Goble, Nigel Shadbolt, Ontologies and the Grid Tutorial]

38 Light and Heavy expressivity
Lightweight
  – Concepts, atomic types
  – Is-a hierarchy
  – Relationships between concepts
Heavyweight
  – Metaclasses
  – Type constraints on relations
  – Cardinality constraints
  – Taxonomy of relations
  – Reified statements
  – Axioms
  – Semantic entailments
  – Expressiveness
  – Inference systems
A matter of rigour and representational expressivity
[Carole Goble, Nigel Shadbolt, Ontologies and the Grid Tutorial]

39 A semantic continuum [Mike Uschold, Boeing Corp]
[Figure: a left-to-right continuum with four stages]
  – Implicit: shared human consensus
  – Informal (explicit): text descriptions
  – Formal (for humans): semantics hardwired; used at runtime
  – Formal (for machines): semantics processed and used at runtime
Examples shown: Pump: “a device for moving a gas or liquid from one place or container to another”; (pump has (superclasses (…))
Further to the right means:
  – Less ambiguity
  – More likely to have correct functionality
  – Better inter-operation (hopefully)
  – Less hardwiring
  – More robust to change
  – More difficult

40 Some Ontologies and “Ontologies” (coming soon to a project near you)

41 SMART (Meta)data I: Logical Data Views Source: NADAM Team (Boyan Brodaric et al.) Adoption of a standard (meta)data model => wrap data sets into unified virtual views

42 SMART Metadata II: Multihierarchical Rock Classification for “Thematic Queries” (GSC) –– or: Taxonomies are not only for biologists... Composition Genesis Fabric Texture “smart discovery & querying” via multiple, independent concept hierarchies (controlled vocabularies) data at different description levels can be found and processed

43 SMART Metadata III: Source Contextualization & Ontology Refinement
Focused GEON ontology working meeting last week... (GEON, SCEC/KR, GSC, ESRI)

44 Gene Ontology http://www.geneontology.org “a dynamic controlled vocabulary that can be applied to all eukaryotes” Built by the community for the community. Three organising principles: Molecular function, Biological process, Cellular component Isa and Part of taxonomy – but not good! ~10,000 concepts Lightweight ontology, Poor semantic rigour. Ok when small and used for annotation. Obstacle when large, evolving and used for mining.

45 Controlled vocabulary
AGROVOC: Agricultural Vocabulary

46 AN APPLICATION OF ONTOLOGIES: An Ontology-Driven Framework for Data Transformation in Scientific Workflows (from DILS’04) Shawn Bowers Bertram Ludäscher San Diego Supercomputer Center University of California, San Diego

47 Outline
Background (SEEK Project)
Scientific Workflows
The Problem: Reusing Structurally Incompatible Services
The Ontology-Driven Framework
Future Work

48 Outline
Background (SEEK Project)
Scientific Workflows
The Problem: Reusing Structurally Incompatible Services
The Ontology-Driven Framework
Future Work

49 Science Environment for Ecological Knowledge (SEEK)
Domain Science Driver
  – Ecology (LTER), biodiversity, …
Analysis & Modeling System
  – Design and execution of ecological models and analysis
  – End user focus
  – {application,upper}-ware
Semantic Mediation System
  – Data Integration of hard-to-relate sources and processes
  – Semantic Types and Ontologies
  – upper middleware
EcoGrid
  – Access to ecology data and tools
  – {middle,under}-ware
Architecture (cf. US cyberinfrastructure, UK e-Science) [figure annotation: “this paper”]

50 Outline
The SEEK Project
Scientific Workflows
  – Focus: analysis & component integration on top of data integration
The Problem: Reusing Structurally Incompatible Services
The Ontology-Driven Framework
Future Work

51 Promoter Identification in Kepler [SSDBM’03]
Problems
  – Many components (web services) are NOT designed to fit! “The problem P that X solves is simple, and X doesn’t solve it well”
  – Semantically meaningful connections are structurally incompatible
Approach
  – Distinguish structural type and semantic type
  – Structural type: e.g. XML Schema
  – Semantic type: e.g. OWL expressions
  – Exploit the (optional!) semantic type as much as possible

52 A Very Simple Scientific Workflow
[Figure: workflow with services S_1 (life stage property) and S_2 (mortality rate for period) and ports P_1 to P_5]

53 A Very Simple Scientific Workflow
[Figure: the same workflow, annotated with the observations data set]
Population samples for life stages of the common field grasshopper [Begon et al, 1996]:
  Phase        Observed
  Eggs           44,000
  Instar I        3,513
  Instar II       2,529
  Instar III      1,922
  Instar IV       1,461
  Adults          1,300

54 A Very Simple Scientific Workflow
[Figure: the same workflow, annotated with the observations table of slide 53 and the life stage periods]
Periods of development in terms of phases:
  Period     Phases
  Nymphal    {Instar I, Instar II, Instar III, Instar IV}

55 A Very Simple Scientific Workflow
[Figure: the same workflow, annotated with the observations, the life stage periods, and the result]
k-value for each period of observation: [(nymphal, 0.44)]
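The slides do not spell out how the k-value is computed. The sketch below assumes the usual life-table definition of "killing power": the difference between the base-10 logarithms of the number entering a period and the number surviving it. With the unrounded counts from the observations table this gives roughly 0.43 for the nymphal period, close to the 0.44 shown on the slide; the small gap presumably comes from rounding in the source table. The function and variable names are made up for the example.

```python
import math

# Phase -> number observed, copied from the slide's observations table.
OBSERVED = {
    "Eggs": 44000, "Instar I": 3513, "Instar II": 2529,
    "Instar III": 1922, "Instar IV": 1461, "Adults": 1300,
}
PHASE_ORDER = list(OBSERVED)  # dicts keep insertion order
PERIODS = {"Nymphal": ["Instar I", "Instar II", "Instar III", "Instar IV"]}


def k_value(phases):
    """log10(count entering the period) - log10(count entering the phase after it)."""
    start = phases[0]
    after = PHASE_ORDER[PHASE_ORDER.index(phases[-1]) + 1]
    return math.log10(OBSERVED[start]) - math.log10(OBSERVED[after])


k_values = [(name.lower(), round(k_value(phases), 2))
            for name, phases in PERIODS.items()]
print(k_values)
```

Note that the computation needs both inputs of the workflow: the per-phase counts and the mapping from periods to phases.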

56 Scientific Workflows A scientific workflow consists of a network of connected services … A service can be any software component (including a web service or even a data source) … Each service (optionally) takes input and (optionally) produces output

57 Scientific Workflows
SEEK adopts a Ptolemy II “workflow” model:
  – A service is called an actor
  – Each actor has zero or more input and output ports (and possibly parameters)
  – Data flows through a workflow based on connections made from output to input ports
  – (ignored here: different models of computation, directors, …)
[Figure: workflow with services S_1 (life stage property) and S_2 (mortality rate for period) and ports P_1 to P_5]
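A minimal sketch of the actor/port idea in plain Python. It is not the Ptolemy II or Kepler API: the Actor and Workflow classes, the port names, and the fixed firing order are all simplifying assumptions, and directors and models of computation are ignored just as on the slide.

```python
from typing import Callable, Dict, List, Tuple


class Actor:
    """A service with named input and output ports."""

    def __init__(self, name: str, inputs: List[str], outputs: List[str],
                 fire: Callable[[Dict[str, object]], Dict[str, object]]):
        self.name = name
        self.inputs = inputs
        self.outputs = outputs
        self.fire = fire


class Workflow:
    def __init__(self) -> None:
        self.actors: List[Actor] = []
        # (target actor, input port) -> (source actor, output port)
        self.links: Dict[Tuple[str, str], Tuple[str, str]] = {}

    def add(self, actor: Actor) -> Actor:
        self.actors.append(actor)
        return actor

    def connect(self, src: Actor, out_port: str, dst: Actor, in_port: str) -> None:
        if out_port not in src.outputs or in_port not in dst.inputs:
            raise ValueError("unknown port")
        self.links[(dst.name, in_port)] = (src.name, out_port)

    def run(self) -> Dict[Tuple[str, str], object]:
        """Fire actors in insertion order (assumed to respect the connections)."""
        produced: Dict[Tuple[str, str], object] = {}
        for actor in self.actors:
            args = {p: produced[self.links[(actor.name, p)]] for p in actor.inputs}
            for port, value in actor.fire(args).items():
                produced[(actor.name, port)] = value
        return produced


wf = Workflow()
source = wf.add(Actor("source", [], ["out"], lambda _: {"out": [3513, 2529, 1922]}))
total = wf.add(Actor("total", ["in"], ["sum"], lambda a: {"sum": sum(a["in"])}))
wf.connect(source, "out", total, "in")
result = wf.run()
print(result[("total", "sum")])
```

A data source is just an actor with no input ports, which matches the slide's remark that a service can even be a data source.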

58 Outline
The SEEK Project
Scientific Workflows
The Problem: Reusing Structurally Incompatible Services
The Ontology-Driven Framework
Future Work

59 Service Reusability A scientist wishes to connect two (independent) services Source Service Source Service Target Service Target Service PsPs PtPt Desired Connection

60 Service Reusability In Ptolemy II/Kepler (and in web services), input and output ports (message parts) have structural types (XML Schema) Source Service Source Service Target Service Target Service PsPs PtPt Structural Type P t Structural Type P s Desired Connection

61 Service Reusability Unless “designed to fit,” independent services are structurally incompatible. Generally, the source output type will not be a subtype of the target input type.
[Diagram: structural type of Ps and structural type of Pt are incompatible (⋠); desired connection from Ps to Pt]

62 Service Reusability A transformation mapping is required to connect the services, artificially creating subtype compatibility. If such a mapping exists, the services are “structurally feasible.”
[Diagram: structural types of Ps and Pt are incompatible (⋠); after applying the transformation to Ps, the result is a subtype (≺) of the structural type of Pt]

63 Service Reusability SEEK annotates services with semantic types for discovery and interoperability of services
[Diagram: semantic type of Ps and semantic type of Pt, both drawn from ontologies (OWL), are compatible (⊑); desired connection from Ps to Pt]

64 Service Reusability Services can be semantically compatible, but structurally incompatible
[Diagram: semantic types of Ps and Pt (from OWL ontologies) are compatible (⊑), while their structural types are incompatible (⋠); a transformation applied to Ps yields a subtype (≺) of the structural type of Pt]

65 Example Structural Types (XML)
[Diagram: S1 (life stage property) and S2 (mortality rate for period) with ports P1 to P5]
structType(P2):
root population = (sample)*
elem sample = (meas, lsp)
elem meas = (cnt, acc)
elem cnt = xsd:integer
elem acc = xsd:double
elem lsp = xsd:string
Sample values: cnt 44,000; acc 0.95; lsp Eggs; …
structType(P3):
root cohortTable = (measurement)*
elem measurement = (phase, obs)
elem phase = xsd:string
elem obs = xsd:integer
Sample values: phase Eggs; obs 44,000; …

66 Example Semantic Types Portion of SEEK measurement ontology
[Diagram: classes Observation, MeasContext, Entity, MeasProperty, AccuracyQualifier, EcologicalProperty, AbundanceCount, LifeStageProperty, NumericValue, SpatialLocation; properties hasContext (0:*), appliesTo (1:1), hasProperty (0:*), itemMeasured (1:*), hasLocation, hasCount (1:1), hasValue (1:1)]

67 Example Semantic Types Portion of SEEK measurement ontology
[Same ontology diagram as on slide 66]
Same in OWL, a description logic standard (here, Sparrow syntax):
Observation subClassOf forall hasContext/MeasContext and forall hasProperty/MeasProperty and exists itemMeasured/Entity.
MeasContext subClassOf exists appliesTo/Entity and atMost 1/appliesTo.
EcologicalProperty subClassOf Entity.
LifeStageProperty subClassOf EcologicalProperty.
AbundanceCount subClassOf EcologicalProperty and exists hasLocation/SpatialLocation and atMost 1/hasLocation and exists hasCount/NumericValue and atMost 1/hasCount.

68 Example Semantic Types Semantic types for P2 and P3
[Diagram: semType(P3) is an Observation with hasContext (1:1) to a MeasContext that appliesTo (1:1) a LifeStageProperty, and itemMeasured an AbundanceCount with hasCount (1:1) a number value; semType(P2) ⊑ semType(P3) and additionally has hasProperty (1:1) to an AccuracyQualifier with hasValue (1:1)]

69 Example Semantic Types Semantic types for P2 and P3
[Same diagram as on slide 68]
semType(P3) subClassOf Observation and exists hasContext/(MeasurementContext and exists appliesTo/LifeStageProperty and atMost 1/appliesTo) and exists itemMeasured/AbundanceCount and atMost 1/itemMeasured.
semType(P2) subClassOf Observation and exists hasContext/(MeasurementContext and exists appliesTo/LifeStageProperty and atMost 1/appliesTo) and exists itemMeasured/AbundanceCount and atMost 1/itemMeasured and exists hasProperty/AccuracyQualifier and atMost 1/hasProperty.
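The claim semType(P2) ⊑ semType(P3) can be read directly off the two definitions: P2 states every conjunct that P3 states, plus two more. The sketch below checks exactly that and nothing more. It is a purely syntactic comparison of identical conjuncts, not a description logic reasoner, and the function name `told_subsumed_by` is made up for illustration.

```python
# Each semantic type as a set of conjuncts, copied from the definitions above.
CONTEXT = ("exists hasContext/(MeasurementContext and "
           "exists appliesTo/LifeStageProperty and atMost 1/appliesTo)")

sem_type_p3 = frozenset({
    "Observation",
    CONTEXT,
    "exists itemMeasured/AbundanceCount",
    "atMost 1/itemMeasured",
})

# semType(P2) repeats all of P3's conjuncts and adds the accuracy qualifier.
sem_type_p2 = sem_type_p3 | {
    "exists hasProperty/AccuracyQualifier",
    "atMost 1/hasProperty",
}


def told_subsumed_by(sub, sup):
    """True if every conjunct required by `sup` is also stated by `sub`."""
    return sup <= sub


print(told_subsumed_by(sem_type_p2, sem_type_p3))  # True
print(told_subsumed_by(sem_type_p3, sem_type_p2))  # False
```

A real OWL reasoner would also use the class hierarchy and the meaning of the restrictions, so it can find subsumptions that this set comparison misses.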

70 Outline
The SEEK Project
Scientific Workflows
The Problem: Reusing Structurally Incompatible Services
The Ontology-Driven Framework
Future Work

71 The Ontology-Driven Framework
–Define semantic registration mappings (“semantic views”) to connect structural and semantic types
–Use registration mappings to (semi-)automate transformation, based on derived structural correspondences
–Depending on the ontologies and registration mappings, it may not be possible to find an appropriate transformation (since the correspondence is often under-specified)

72 The Ontology-Driven Framework
[Diagram: Source Service port Ps and Target Service port Pt with the desired connection; semantic types of Ps and Pt (from OWL ontologies) are compatible (⊑); a registration mapping (output) links the structural type of Ps to its semantic type, and a registration mapping (input) links the structural type of Pt to its semantic type]

73 Registration Example (simple XPaths)
/population/sample == semType(P2)
/population/sample/meas/cnt == semType(P2).itemMeasured
/population/sample/meas/cnt/text() == semType(P2).itemMeasured.hasCount
/population/sample/meas/acc == semType(P2).hasProperty
/population/sample/meas/acc/text() == semType(P2).hasProperty.hasValue
/population/sample/lsp/text() == semType(P2).hasContext.appliesTo
structType(P2):
root population = (sample)*
elem sample = (meas, lsp)
elem meas = (cnt, acc)
elem cnt = xsd:integer
elem acc = xsd:double
elem lsp = xsd:string
Sample values: cnt 44,000; acc 0.95; lsp Eggs; …

74 Registration Example (simple XPaths)
[Registration mappings and structType(P2) as on slide 73]
/population/sample == semType(P2)
Each sample is an instance of the semantic type

75 Registration Example (simple XPaths)
[Registration mappings and structType(P2) as on slide 73]
/population/sample/meas/cnt == semType(P2).itemMeasured
Each sample’s cnt represents the itemMeasured object

76 Registration Example (simple XPaths)
[Registration mappings and structType(P2) as on slide 73]
/population/sample/meas/cnt/text() == semType(P2).itemMeasured.hasCount
Each sample’s cnt value represents the hasCount value of the corresponding itemMeasured object

77 Registration Example (simple XPaths) … similarly for P3:
/cohortTable/measurement == semType(P3)
/cohortTable/measurement/obs == semType(P3).itemMeasured
/cohortTable/measurement/obs/text() == semType(P3).itemMeasured.hasCount
/cohortTable/measurement/phase/text() == semType(P3).hasContext.appliesTo
structType(P3):
root cohortTable = (measurement)*
elem measurement = (phase, obs)
elem phase = xsd:string
elem obs = xsd:integer
Sample values: phase Eggs; obs 44,000; …
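One way to read a registration mapping is as a lookup table that labels nodes of an instance document with paths into the semantic type. The sketch below applies the P2 registration to a one-sample document. The instance document is an assumption built from the schema and the sample values on the slides, the function `annotate` is hypothetical, and the trailing `text()` step is handled by hand because Python's `xml.etree.ElementTree` supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET

# Registration mapping for P2, copied from the slides:
# XPath into structType(P2) -> path into semType(P2).
REGISTRATION_P2 = {
    "/population/sample": "semType(P2)",
    "/population/sample/meas/cnt": "semType(P2).itemMeasured",
    "/population/sample/meas/cnt/text()": "semType(P2).itemMeasured.hasCount",
    "/population/sample/meas/acc": "semType(P2).hasProperty",
    "/population/sample/meas/acc/text()": "semType(P2).hasProperty.hasValue",
    "/population/sample/lsp/text()": "semType(P2).hasContext.appliesTo",
}

# Assumed instance document (element layout follows structType(P2)).
DOC = """<population>
  <sample><meas><cnt>44000</cnt><acc>0.95</acc></meas><lsp>Eggs</lsp></sample>
</population>"""


def annotate(doc, registration):
    """Return (semantic path, element tag or text value) pairs."""
    root = ET.fromstring(doc)
    found = []
    for xpath, sem_path in registration.items():
        steps = xpath.strip("/").split("/")
        wants_text = steps[-1] == "text()"
        if wants_text:
            steps = steps[:-1]
        if steps[0] != root.tag:
            continue  # the path does not start at this document's root
        # The remaining steps form a path relative to the root element.
        nodes = [root] if len(steps) == 1 else root.findall("/".join(steps[1:]))
        for node in nodes:
            found.append((sem_path, node.text if wants_text else node.tag))
    return found


annotations = annotate(DOC, REGISTRATION_P2)
for sem_path, value in annotations:
    print(sem_path, "<-", value)
```

Paths that end in an element yield the element name, and paths that end in `text()` yield the stored value, which mirrors the distinction the slides draw between the itemMeasured object and its hasCount value.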

78 The Ontology-Driven Framework
[Diagram as on slide 72, now with a correspondence between the structural types of Ps and Pt, derived from the output and input registration mappings]

79 Correspondence Example
Source-side semantic registration mapping:
/population/sample == semType(P2)
/population/sample/meas/cnt == semType(P2).itemMeasured
/population/sample/meas/cnt/text() == semType(P2).itemMeasured.hasCount
/population/sample/meas/acc == semType(P2).hasProperty
/population/sample/meas/acc/text() == semType(P2).hasProperty.hasValue
/population/sample/lsp/text() == semType(P2).hasContext.appliesTo
Target-side semantic registration mapping:
/cohortTable/measurement == semType(P3)
/cohortTable/measurement/obs == semType(P3).itemMeasured
/cohortTable/measurement/obs/text() == semType(P3).itemMeasured.hasCount
/cohortTable/measurement/phase/text() == semType(P3).hasContext.appliesTo
Source type tree:
population
  sample *
    meas
      cnt (xsd:integer)
      acc (xsd:double)
    lsp (xsd:string)
Target type tree:
cohortTable
  measurement *
    obs (xsd:integer)
    phase (xsd:string)

80 Correspondence Example
[Source and target registration mappings and type trees as on slide 79]
We want to “compose” the registrations to obtain structural correspondences

81 Correspondence Example
[Source and target registration mappings and type trees as on slide 79]
/population/sample == semType(P2)
/cohortTable/measurement == semType(P3)
These fragments correspond

82 Correspondence Example
[Source and target registration mappings and type trees as on slide 79]
/population/sample/meas/cnt == semType(P2).itemMeasured
/cohortTable/measurement/obs == semType(P3).itemMeasured
These fragments correspond

83 Correspondence Example
[Source and target registration mappings and type trees as on slide 79]
/population/sample/meas/cnt/text() == semType(P2).itemMeasured.hasCount
/cohortTable/measurement/obs/text() == semType(P3).itemMeasured.hasCount
These fragments correspond

84 Correspondence Example
[Source and target registration mappings and type trees as on slide 79]
/population/sample/lsp/text() == semType(P2).hasContext.appliesTo
/cohortTable/measurement/phase/text() == semType(P3).hasContext.appliesTo
These fragments correspond
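The four correspondences walked through above all arise the same way: a source XPath and a target XPath are paired when they are registered to the same path below their respective semantic types. A minimal sketch of that composition follows, assuming that matching on the path below the root is justified because semType(P2) ⊑ semType(P3). The helper names `below_root` and `compose` are illustrative.

```python
# Source-side and target-side registration mappings, copied from the slides.
source_reg = {
    "/population/sample": "semType(P2)",
    "/population/sample/meas/cnt": "semType(P2).itemMeasured",
    "/population/sample/meas/cnt/text()": "semType(P2).itemMeasured.hasCount",
    "/population/sample/meas/acc": "semType(P2).hasProperty",
    "/population/sample/meas/acc/text()": "semType(P2).hasProperty.hasValue",
    "/population/sample/lsp/text()": "semType(P2).hasContext.appliesTo",
}
target_reg = {
    "/cohortTable/measurement": "semType(P3)",
    "/cohortTable/measurement/obs": "semType(P3).itemMeasured",
    "/cohortTable/measurement/obs/text()": "semType(P3).itemMeasured.hasCount",
    "/cohortTable/measurement/phase/text()": "semType(P3).hasContext.appliesTo",
}


def below_root(sem_path):
    """Drop the leading semType(...) so P2 and P3 paths become comparable."""
    return sem_path.split(".", 1)[1] if "." in sem_path else ""


def compose(source, target):
    """Pair source and target XPaths registered to the same semantic path."""
    target_by_sem = {below_root(sem): xpath for xpath, sem in target.items()}
    return [
        (xpath, target_by_sem[below_root(sem)])
        for xpath, sem in source.items()
        if below_root(sem) in target_by_sem
    ]


correspondences = compose(source_reg, target_reg)
for src, dst in correspondences:
    print(src, "<->", dst)
```

The two `acc` entries have no counterpart on the target side, so they are dropped: the target does not ask for an accuracy qualifier.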

85 The Ontology-Driven Framework
[Diagram as on slide 78, now with a "Generate" step: the correspondence between the structural types is used to generate the transformation applied to Ps]

86 Example Result (XQuery) Based on the structural correspondences and certain assumptions, we derive the transformation XQuery:
<cohortTable>{
  for $s in /population/sample return
  <measurement>
    { for $c in $s/meas/cnt return <obs>{$c/text()}</obs> }
    { for $l in $s/lsp return <phase>{$l/text()}</phase> }
  </measurement>
}</cohortTable>
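For readers without an XQuery processor at hand, the same loop structure can be sketched with Python's standard library. The element names `cohortTable`, `measurement`, `obs`, and `phase` come from the target-side registration; the input document is an assumed instance of structType(P2), and the function name `transform` is hypothetical. This sketch emits `phase` before `obs` so the result follows the target content model (phase, obs).

```python
import xml.etree.ElementTree as ET


def transform(population_xml):
    """Rewrite a population document into a cohortTable document."""
    source = ET.fromstring(population_xml)
    target = ET.Element("cohortTable")
    # One measurement per sample, as in the outer "for $s" loop.
    for sample in source.findall("sample"):
        measurement = ET.SubElement(target, "measurement")
        for lsp in sample.findall("lsp"):
            ET.SubElement(measurement, "phase").text = lsp.text
        for cnt in sample.findall("meas/cnt"):
            ET.SubElement(measurement, "obs").text = cnt.text
    return ET.tostring(target, encoding="unicode")


# Assumed input; the accuracy value is present but has no target counterpart.
doc = ("<population><sample><meas><cnt>44000</cnt><acc>0.95</acc></meas>"
       "<lsp>Eggs</lsp></sample></population>")
result = transform(doc)
print(result)
```

For the one-sample input above this prints `<cohortTable><measurement><phase>Eggs</phase><obs>44000</obs></measurement></cohortTable>`.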

87 Assumptions Made (or why this may not work for you…)
Common XPath prefixes refer to the same element
Elements in correspondences have compatible cardinalities
–source is equivalent to or stricter than target (e.g., + is stricter than *)
Primitive data types are compatible
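The cardinality assumption can be made concrete by treating each marker as an occurrence range and requiring the source range to fit inside the target range. This is a sketch under that reading; the marker set and the function name `at_least_as_strict` are assumptions, not something the slides define.

```python
# Occurrence ranges (min, max) for common cardinality markers.
# None stands for "unbounded".
RANGES = {"1": (1, 1), "?": (0, 1), "+": (1, None), "*": (0, None)}


def at_least_as_strict(source, target):
    """True if every count allowed by `source` is also allowed by `target`."""
    s_min, s_max = RANGES[source]
    t_min, t_max = RANGES[target]
    min_ok = s_min >= t_min
    max_ok = t_max is None or (s_max is not None and s_max <= t_max)
    return min_ok and max_ok


print(at_least_as_strict("+", "*"))  # True: + is stricter than *
print(at_least_as_strict("*", "+"))  # False: * allows zero occurrences
print(at_least_as_strict("*", "*"))  # True: equivalent cardinalities
```

In the running example both `sample` and `measurement` are starred, so the correspondence between them passes this check.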

