Semantic Data Integration in myGrid and ourGrid (SEEK)
National e-Science Centre, e-Science Institute, Edinburgh, May 14th, 2004
meets e-Science … meets e-Science, Edinburgh, May 9-11,
Plan of the Day
9:00–10:30 SEEK Data Integration & Semantic Extensions
10:30–11:00 BREAK
11:00–12:30 myGrid Data Integration & Semantic Extensions
12:30–13:45 LUNCH
13:45–15:45 Interoperable Semantic Registration, Mediation, Workflows
15:45–16:00 BREAK
16:00–17:00 Plenary Session
SEEK Data Integration & Semantic Extensions Shawn Bowers (SDSC/UCSD) Bertram Ludaescher (SDSC/UCSD) & SEEK KR-SMS Team & GEON KR Team Sparrow
Purpose / Goals
Link-Up:
– … [on] data / services with “semantics”
– … to do semantic data & service integration
– also: an e-Science “Sister Project” to facilitate knowledge exchange & collaboration between UK- and US-based projects (where is the web/wiki page?)
Specifically:
– What approaches do we all use to express the semantics of data, services, and workflows?
– How can we make them interoperable?
… keeping in mind …
– What problem is it that the XYZ solution solves?
Science Environment for Ecological Knowledge (SEEK)
Domain Science Driver
– Ecology (LTER), biodiversity, …
Analysis & Modeling System (Kepler)
– Design & execution of ecological models & analysis
– End (& power) user focus
– {application,upper}-ware
Semantic Mediation System (Sparrow Toolkit)
– Data integration of hard-to-relate sources and processes
– Semantic types and ontologies
– upper middleware
EcoGrid
– Access to ecology data and tools
– {middle,under}-ware
[Figure labels: architecture; our focus; one specific problem (DILS’04)]
Heterogeneous Data Integration
Requires advanced metadata and processing
– Attributes must be semantically typed
– Collection protocols must be known
– Units and measurement scale must be known
– Measurement relationships must be known, e.g., that ArealDensity = Count/Area
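The ArealDensity = Count/Area relationship can be illustrated with a minimal Python sketch. The `Measurement` class, the unit table, and the function name are hypothetical and exist only for this example; they are not part of SEEK.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Measurement:
    value: float
    unit: str           # e.g. "individuals", "ha"
    semantic_type: str  # e.g. "Count", "Area"

# Hypothetical conversion factors from an area unit to square metres.
AREA_TO_M2 = {"m^2": 1.0, "ha": 10_000.0, "km^2": 1_000_000.0}

def areal_density(count: Measurement, area: Measurement) -> Measurement:
    """Derive ArealDensity = Count / Area, refusing wrongly typed inputs."""
    if count.semantic_type != "Count" or area.semantic_type != "Area":
        raise TypeError("areal density needs a Count and an Area")
    area_m2 = area.value * AREA_TO_M2[area.unit]  # normalise the unit first
    return Measurement(count.value / area_m2, f"{count.unit}/m^2", "ArealDensity")
```

The point of the sketch is that the derivation is only safe when both the semantic type and the unit of each attribute are known, which is what the metadata requirements above ask for.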
A Neuroscientist’s Information Integration Problem
What is the cerebellar distribution of rat proteins with more than 70% homology with human NCS-1? Any structure specificity? How about other rodents?
Information Integration: “Complex Multiple-Worlds” Mediation
Sources: protein localization (NCMIR), neurotransmission (SENSELAB), sequence info (CaPROT), morphometry (SYNAPSE)
Biomedical Informatics Research Network
A Home Buyer’s Information Integration Problem
What houses for sale under $500k have at least 2 bathrooms, 2 bedrooms, a nearby school ranking in the upper third, in a neighborhood with below-average crime rate and diverse population?
Information Integration: “Multiple-Worlds” Mediation
Sources: Realtor, Demographics, School Rankings, Crime Stats
An Online Shopper’s Information Integration Problem
El Cheapo: “Where can I get the cheapest copy (including shipping cost) of Wittgenstein’s Tractatus Logico-Philosophicus within a week?”
Information Integration: “One-World” Mediation
Sources: addall.com, amazon.com, A1books.com, half.com, barnes&noble.com
Standard (XML-Based) Mediator Architecture
USER/Client: Query Q(G(S1, ..., Sk))
MEDIATOR: (XML) queries & results
– Integrated global (XML) view G
– Integrated view definition: G(..) ← S1(..) … Sk(..)
Wrappers: one (XML) view per source S1, S2, …, Sk
– wrappers implemented as web services
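The layering Q(G(S1, ..., Sk)) can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the two in-memory sources, their field names, and the union-style global view stand in for XML sources, web-service wrappers, and a real view definition.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]

# Hypothetical raw sources S1 and S2 with different local schemas.
S1 = [{"title": "Tractatus", "price_usd": 12.0, "ship_usd": 4.0}]
S2 = [{"name": "Tractatus", "total": 15.0}]

def wrap_s1() -> Iterable[Row]:
    """Wrapper for S1: export rows in the shared (global) schema."""
    for r in S1:
        yield {"title": r["title"], "total": r["price_usd"] + r["ship_usd"], "source": "S1"}

def wrap_s2() -> Iterable[Row]:
    """Wrapper for S2: rename fields to the shared schema."""
    for r in S2:
        yield {"title": r["name"], "total": r["total"], "source": "S2"}

def global_view(wrappers: List[Callable[[], Iterable[Row]]]) -> List[Row]:
    """G(S1, ..., Sk): here simply the union of all wrapped source views."""
    return [row for wrapper in wrappers for row in wrapper()]

def cheapest(title: str, view: List[Row]) -> Row:
    """Q: a client query posed against the global view only."""
    offers = [r for r in view if r["title"] == title]
    return min(offers, key=lambda r: r["total"])
```

The client calls `cheapest("Tractatus", global_view([wrap_s1, wrap_s2]))` and never sees the local schemas; a real mediator would push parts of the query down to the sources instead of materialising the whole view.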
Information Integration: Problems and “Solutions”
System aspects: “Grid” Infrastructure
– Authentication, single sign-on, …
– distributed computation
– web services, WSDL/SOAP, …
– sources = functions, files, databases, …
Syntax & Structure: (XML-Based) Database Mediators
– wrapping, restructuring
– distributed (XML) queries and views
– sources = (XML) databases
Semantics: Model-Based/Semantic Mediators
– conceptual models, declarative views
– ontologies, description logics (OWL, RDF, …)
– sources = knowledge bases (DB+CMs+ICs)
Reconciling S4 heterogeneities (System aspects, Syntax, Structure, Semantics)
– “gluing” together multiple data sources
– bridging information and knowledge gaps computationally
Exercise: Classify (system, syntax, structure, semantics, something else …)
– “9:00” vs “9am” vs “21:00” vs “9 ct”
– “3 miles” (land|sea) (here UK|US|elsewhere) (now|elsewhen)
– …
– “picea rubens” (name vs concept … in biological taxonomies)
– …
Different Types of “Ontologies” and Representations
Overloaded/sloppy term for a …
– “Napkin drawing”, “concept space” (e.g. in PPT)
– Labeled graph, semantic network, concept map (e.g. in RDF)
– Controlled vocabulary (structured or flat)
– Database schema (relational, XML, …)
– Conceptual schema (ER, UML, …)
– Thesaurus (synonyms, broader term/narrower term)
– Taxonomy
– Formal ontology, e.g., in [Description] Logic (e.g. in OWL): “formalization of a specification”
An ontology may …
– constrain possible interpretations of terms
– specify a theory by defining and relating concepts of a domain of interest
theory = set of logic models (= “allowed/intended interpretations” of symbols)
Community-Based Ontology Development
Draft of a geochemistry ontology developed by scientists
Current concept maps and emerging ontologies
… in GEON:
1. Igneous Rocks/Plutons
2. Seismology
3. Geochemistry
… in SEEK:
1. Taxon
2. Units
3. Measurements
4. …?
Creating and Sharing Concept Maps
(here: Seismology concept map, Cmap tool; Kai Lin, GEON)
– Lock up scientists for 2+ days
– Add CS/KRDB types
– Create concept maps
– Refine
– Iterate
From napkin drawings, to concept maps, to ontologies
Graph (RDF) Queries on Ontologies
– RQL query: show all “products”
– Visualization of query results
Prototype: Kai Lin, GEON
Ontologies: Cui bono?
What are ontologies used for?
– Conceptual models of a domain or application (communication means, system/database design, …)
– Classification of … concepts (taxonomy) and data/object instances, based on properties and concept definitions
– Analysis of ontologies, e.g.
   graph queries (reachability, path queries, …)
   reasoning (concept subsumption, consistency checking, …)
– Targets for semantic data registration
– Conceptual indexes and views for “smart” operations
Using ontologies for …
– Smart data discovery
– Smart service discovery
– Smart (data) querying
– Smart data integration (declarative)
– Smart workflow planning (execution!?) (procedural)
Here: def_macro “smart” := (ontology|semantics)-(based|enhanced|enabled)
Specifically in SMS …
“smart data discovery”, e.g., …
– asking for A, retrieve B’s too, since B isa A
“smart connections”, e.g., …
– data/source binding to AMS (Kepler) services (actors)
– service-to-service semantic (and structural?) type checking
– service-to-service & data-to-service “gluing”: insert structural transformations and unit conversions, suggest services based on parameter chasing (parameter ontologies)
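The “asking for A, retrieve B’s too” idea amounts to expanding a query concept with everything below it in the isa hierarchy. A minimal Python sketch follows; the toy hierarchy, the dataset tags, and the function names are made up for illustration.

```python
from typing import Dict, List, Set

# Hypothetical isa hierarchy: child concept -> set of direct parents.
ISA: Dict[str, Set[str]] = {"B": {"A"}, "C": {"B"}}

def subconcepts(concept: str, isa: Dict[str, Set[str]]) -> Set[str]:
    """Return every concept X with X isa* concept, including concept itself."""
    result = {concept}
    changed = True
    while changed:  # iterate until no new descendant is found
        changed = False
        for child, parents in isa.items():
            if child not in result and parents & result:
                result.add(child)
                changed = True
    return result

def discover(concept: str, datasets: Dict[str, str], isa: Dict[str, Set[str]]) -> List[str]:
    """Return names of datasets registered to the concept or any subconcept."""
    wanted = subconcepts(concept, isa)
    return sorted(name for name, tag in datasets.items() if tag in wanted)
```

With datasets tagged `{"d1": "A", "d2": "B", "d3": "C", "d4": "Z"}`, a request for A also returns the datasets tagged B and C. A description-logic reasoner would compute the same expansion from inferred subsumptions rather than from an explicitly listed hierarchy.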
… specifically in SMS … (cont’d)
“smart data integration”, e.g., …
– concept-based instance classification and data enumeration (as part of integrated/mediated views)
– discovery and use of new join relations across sources
– rewriting queries (against which SEEK/EcoGrid/EML schemas??) using ontologies & integrity constraints
– generation of feasible distributed query plans in the presence of access patterns (web services), views, integrity constraints (ICs)
Need for “semantic registration/annotation”
– Linking data structures/objects to conceptual structures
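Access patterns restrict which arguments of a web service must be supplied before it can be called. The Python sketch below shows only the simplest case, a greedy search for an executable call order in a conjunctive plan; it is an illustration under that assumption and not the algorithm from the cited papers, and the service names are invented.

```python
from typing import List, Optional, Set, Tuple

# A call is (service name, input variables, output variables).
Call = Tuple[str, Tuple[str, ...], Tuple[str, ...]]

def executable_order(calls: List[Call], bound: Set[str]) -> Optional[List[str]]:
    """Greedily order calls so every input is bound before its call runs.

    Returns the ordered service names, or None if no such order exists.
    """
    bound = set(bound)      # copy: variables whose values are already known
    pending = list(calls)
    plan: List[str] = []
    while pending:
        ready = next((c for c in pending if set(c[1]) <= bound), None)
        if ready is None:
            return None     # some input can never be supplied
        pending.remove(ready)
        plan.append(ready[0])
        bound |= set(ready[2])  # outputs become available to later calls
    return plan
```

For example, with `isbn_of(title -> isbn)` and `price_of(isbn -> price)` and only `title` given, the sketch schedules `isbn_of` first; with nothing given it reports that no executable order exists.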
Things to “Register”
Data files (individual files)
– Shapefile as a blob (+ file type)
Collections (of files; nested; e.g., satellite data)
Databases (has schema and can be queried)
– Shapefile with schema registered
Ontologies
Services (web + grid services)
Other/external applications
Ontologies and Data Management
(watch out for Semantic Data Registration later)
[Diagram labels: Schema, Conceptual Model, Ontology, Data, Metadata, Design Artifact; edge label: “use concepts from” (explicitly or implicitly)]
A Multi-Hierarchical Rock Classification “Ontology” (GSC)
Hierarchies: Composition, Genesis, Fabric, Texture
Application Example: Geologic Map Integration
– Domain knowledge; knowledge representation: ontologies!?
– Nevada, +/- a few hundred million years
– “Semantic Registration” of shapefiles to a shared ontology
– Concept-based queries; also allows viewing of British-registered USGS data through Canadian eyes
Example: Smart Connections [DILS’04]
Services can be semantically compatible, but structurally incompatible
[Diagram: desired connection from Source Service port Ps to Target Service port Pt. Semantic Type Ps vs Semantic Type Pt: compatible (⊑), with respect to ontologies (OWL). Structural Type Ps vs Structural Type Pt: incompatible (⋠).]
Example: Smart Connections [DILS’04] (cont’d)
[Diagram: desired connection from Source Service port Ps to Target Service port Pt. Semantic types compatible (⊑) with respect to ontologies (OWL). Registration Mapping (Output) links Structural Type Ps to Semantic Type Ps; Registration Mapping (Input) links Structural Type Pt to Semantic Type Pt. From both mappings: generate a Correspondence, then a Transformation between the structural types.]
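The step from registration mappings to a generated transformation can be sketched in Python. This is a deliberately simplified illustration: the mappings, field names, and concepts are hypothetical, and fields are matched only when they are registered to the identical concept, whereas the approach on the slide relies on subsumption in an OWL ontology.

```python
from typing import Callable, Dict

# Hypothetical registration mappings: structural field -> ontology concept.
SOURCE_REG = {"spp": "Species", "cnt": "Count", "plot_area": "Area"}
TARGET_REG = {"species_name": "Species", "abundance": "Count"}

def correspondences(source_reg: Dict[str, str], target_reg: Dict[str, str]) -> Dict[str, str]:
    """Pair each target field with a source field registered to the same concept."""
    by_concept = {concept: field for field, concept in source_reg.items()}
    return {t: by_concept[c] for t, c in target_reg.items() if c in by_concept}

def make_transformation(source_reg: Dict[str, str],
                        target_reg: Dict[str, str]) -> Callable[[dict], dict]:
    """Generate a record-level transformation from the two registrations."""
    corr = correspondences(source_reg, target_reg)
    missing = set(target_reg) - set(corr)
    if missing:
        raise ValueError(f"no source field for target fields: {sorted(missing)}")
    def transform(record: dict) -> dict:
        # Restructure a source record into the target's structural type.
        return {t: record[s] for t, s in corr.items()}
    return transform
```

Applied to a source record `{"spp": "Picea rubens", "cnt": 7, "plot_area": 2.0}`, the generated function returns a record with the target's field names and drops the field the target does not ask for.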
The Sparrow Toolkit (Origins)
Annoyance with ugly, user-unfriendly XML syntaxes (e.g., OWL in XML, rules in XML, … anything in XML)
– Note: others got annoyed too, but we didn’t know [OWL Concrete Abstract Syntax, Bechhofer et al.]
– (well, we knew about Triple, but that’s only RDF …)
Instead use a lean syntax (how XML should have been)
– owl employee isa person and worksfor some employer.
– owl mother eqv person and female and hasChild some person.
– rdf john, worksfor, 'IBM'.
– … are both human and machine readable
– … in fact the language was invented around the corner …
– … and this is the “parser”:
    % Prolog operator table; a lower priority number binds tighter.
    :- op(1100, fx, owl), op(1100, fx, rdf).
    :- op(600, xfx, isa), op(600, xfx, eqv).
    :- op(550, xfy, or), op(500, xfy, and), op(350, fx, not).
    :- op(400, xfy, some), op(400, xfy, only).
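For readers without a Prolog system at hand, the following Python sketch reads a small subset of the lean syntax shown above. It is an assumption-laden illustration, not Sparrow itself: it handles only `isa`/`eqv`, `and`, `some`/`only`, and `rdf` triples (no `or`, `not`, or parentheses), it flattens chains of `and` instead of nesting them, and the tuple representation is invented for this example.

```python
def parse_restriction(expr: str):
    """Parse 'prop some filler' or 'prop only filler'; otherwise a plain name."""
    for quantifier in ("some", "only"):
        prop, sep, filler = expr.partition(f" {quantifier} ")
        if sep:
            return (quantifier, prop.strip(), filler.strip())
    return expr.strip()

def parse_and(expr: str):
    """Parse a conjunction; 'and' binds more loosely than 'some'/'only'."""
    parts = [parse_restriction(p) for p in expr.split(" and ")]
    return parts[0] if len(parts) == 1 else ("and", *parts)

def parse_statement(text: str):
    """Parse one 'owl ...' or 'rdf ...' statement terminated by a period."""
    body = text.strip().rstrip(".").strip()
    kind, _, rest = body.partition(" ")
    if kind == "rdf":
        subject, predicate, obj = (p.strip().strip("'") for p in rest.split(","))
        return ("rdf", subject, predicate, obj)
    if kind == "owl":
        for op in ("isa", "eqv"):
            left, sep, right = rest.partition(f" {op} ")
            if sep:
                return ("owl", op, left.strip(), parse_and(right))
    raise ValueError(f"unsupported statement: {text!r}")
```

The order in which the functions split the text mirrors the operator priorities: the statement keyword is split off first, then `isa`/`eqv`, then `and`, and finally `some`/`only`. Prolog gets the same effect from the operator table alone, which is why the slide can call four directives a parser.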
Sparrow (a poor man’s OWL tool …)
Simple ASCII-based RDF and OWL entry and manipulation
Sparrow Toolkit
Much more than a lean syntax for OWL & RDF
– Syntax transformation services: RDF, OWL, … → Sparrow → RDF, OWL, LaTeX, FO/LeanTap, …
– Semantic registration services: Semantic Annotation language
– Reasoning services: classification, consistency checking, conversion, query rewriting, …
Will be provided in Kepler
– e.g., as actors, but also as type extensions
Sparrow: The Name
“A poor man’s OWL”
– or what XML really should look like
“Lieber den Spatz in der Hand als die Taube auf dem Dach”
– Better a sparrow in the hand than a pigeon/dove on the roof
Also: In Memoriam
In Memoriam: Dusky Seaside Sparrow
Some work in progress … [short-paper SSDBM’04]
References: SMS
– An Ontology Driven Framework for Data Transformation in Scientific Workflows. S. Bowers and B. Ludäscher. International Workshop on Data Integration in the Life Sciences (DILS), LNCS, Leipzig, Germany, March.
– On Integrating Scientific Resources through Semantic Registration. S. Bowers, K. Lin, and B. Ludäscher. 16th International Conference on Scientific and Statistical Database Management (SSDBM'04), Santorini Island, Greece, June 2004.
– Towards a Generic Framework for Semantic Registration of Scientific Data. S. Bowers and B. Ludäscher. Semantic Web Technologies for Searching and Retrieving Scientific Data (SCISW), Sanibel Island, Florida.
– Processing First-Order Queries under Limited Access Patterns. A. Nash and B. Ludäscher. Proc. 23rd ACM Symposium on Principles of Database Systems (PODS'04), Paris, France, June 2004, to appear.
– Processing Unions of Conjunctive Queries with Negation under Limited Access Patterns. A. Nash and B. Ludäscher. 9th Intl. Conference on Extending Database Technology (EDBT'04), Heraklion, Crete, Greece, March 2004, LNCS.
– Web Service Composition Through Declarative Queries: The Case of Conjunctive Queries with Union and Negation. B. Ludäscher and A. Nash. Research abstract (poster), 20th Intl. Conference on Data Engineering (ICDE'04), Boston, IEEE Computer Society, April.
– Teaching: Graduate Class CSE-291, Ontologies in Data and Process Integration (Bertram; guest lectures by Shawn)
– …
References: Kepler
– Kepler: An Extensible System for Design and Execution of Scientific Workflows. I. Altintas, C. Berkley, E. Jaeger, M. Jones, B. Ludäscher, and S. Mock. 16th International Conference on Scientific and Statistical Database Management (SSDBM'04), Santorini Island, Greece, June 2004.
– Kepler: Towards a Grid-Enabled System for Scientific Workflows. I. Altintas, C. Berkley, E. Jaeger, M. Jones, B. Ludäscher, and S. Mock. Workshop on Workflow in Grid Systems, Global Grid Forum (GGF10), Berlin, Germany, March.
– A Web Service Composition and Deployment Framework for Scientific Workflows. I. Altintas, E. Jaeger, K. Lin, B. Ludäscher, and A. Memon. Intl. Conference on Web Services (ICWS), San Diego, California, July.
– Kepler: Towards a Grid-Enabled System for Scientific Workflows (presentation). I. Altintas, C. Berkley, E. Jaeger, M. Jones, B. Ludäscher (presenter), and S. Mock. Workflow in Grid Systems (GGF10), Berlin, March 9th.
– Kepler/GEON User Manual. E. Jaeger.
– The Computational Chemistry Prototyping Environment. K. Baldridge, J. Greenberg, W. Sudholt, K. Bhatia, S. Mock, I. Altintas, C. Amoreira, Y. Potier, and M. Taufer.
– …