Semantic Data Integration in myGrid and ourGrid (SEEK). National e-Science Centre, e-Science Institute, Edinburgh, May 14th, 2004.

1 Semantic Data Integration in myGrid and ourGrid (SEEK). National e-Science Centre, e-Science Institute, Edinburgh, May 14th, 2004

2 meets e-Science … meets e-Science, Edinburgh, May 9-11, 2004 2 Plan of the Day 9:00–10:30 –SEEK Data Integration & Semantic Extensions 10:30–11:00 BREAK 11:00–12:30 –myGrid Data Integration & Semantic Extensions 12:30–13:45 LUNCH 13:45–15:45 –Interoperable Semantic Registration, Mediation, Workflows 15:45–16:00 BREAK 16:00–17:00 –Plenary Session

3 SEEK Data Integration & Semantic Extensions Shawn Bowers (SDSC/UCSD) Bertram Ludaescher (SDSC/UCSD) & SEEK KR-SMS Team & GEON KR Team Sparrow http://seek.ecoinformatics.org

4 Purpose / Goals Link-Up: –… [on] data / services with “semantics” –… to do semantic data & service integration –also: an e-Science “Sister Project” to facilitate knowledge exchange & collaboration between UK & US based projects (where is the web/wiki page?) Specifically: –What approaches to express semantics of data, services, and workflows do we all use? –How can we make them interoperable? … keeping in mind … –What problem is it that the XYZ solution solves?

5 Science Environment for Ecological Knowledge (SEEK) Domain Science Driver –Ecology (LTER), biodiversity, … Analysis & Modeling System –Design & execution of ecological models & analysis –End (& power) user focus –{application,upper}-ware → Kepler Semantic Mediation System –Data integration of hard-to-relate sources and processes –Semantic types and ontologies –upper middleware → Sparrow Toolkit EcoGrid –Access to ecology data and tools –{middle,under}-ware [Figure labels: architecture; our focus; one specific problem (DILS’04)]

6 Heterogeneous Data Integration Requires advanced metadata and processing –Attributes must be semantically typed –Collection protocols must be known –Units and measurement scale must be known –Measurement relationships must be known, e.g., that ArealDensity = Count/Area
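
A minimal Python sketch of what "semantically typed attributes with known units and measurement relationships" could look like in practice. The class name, the unit table, and the sample numbers are illustrative assumptions, not part of SEEK; only the relationship ArealDensity = Count/Area comes from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """A value tagged with a semantic type and a unit (names are illustrative)."""
    semantic_type: str  # e.g. "Count", "Area", "ArealDensity"
    value: float
    unit: str           # e.g. "individuals", "ha"


# Assumed conversion factors from a few area units to square metres.
TO_SQUARE_METRES = {"m^2": 1.0, "ha": 10_000.0, "km^2": 1_000_000.0}


def areal_density(count: Measurement, area: Measurement) -> Measurement:
    """Apply ArealDensity = Count / Area after checking the semantic types."""
    if count.semantic_type != "Count" or area.semantic_type != "Area":
        raise TypeError("areal_density needs a Count and an Area")
    if area.unit not in TO_SQUARE_METRES:
        raise ValueError(f"unknown area unit: {area.unit}")
    area_m2 = area.value * TO_SQUARE_METRES[area.unit]
    return Measurement("ArealDensity", count.value / area_m2, f"{count.unit}/m^2")


# Made-up sample: 50 individuals counted on a 2 ha plot.
density = areal_density(
    Measurement("Count", 50, "individuals"),
    Measurement("Area", 2, "ha"),
)
```

Without the semantic type and unit on each attribute, the division could still be computed, but nothing would stop it from being applied to two columns that merely happen to be numeric.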

8 A Neuroscientist’s Information Integration Problem What is the cerebellar distribution of rat proteins with more than 70% homology with human NCS-1? Any structure specificity? How about other rodents? ? Information Integration: protein localization (NCMIR), neurotransmission (SENSELAB), sequence info (CaPROT), morphometry (SYNAPSE) “Complex Multiple-Worlds” Mediation Biomedical Informatics Research Network http://nbirn.net

9 A Home Buyer’s Information Integration Problem What houses for sale under $500k have at least 2 bathrooms, 2 bedrooms, a nearby school ranking in the upper third, in a neighborhood with below-average crime rate and diverse population? ? Information Integration: Realtor, Demographics, School Rankings, Crime Stats “Multiple-Worlds” Mediation

10 An Online Shopper’s Information Integration Problem El Cheapo: “Where can I get the cheapest copy (including shipping cost) of Wittgenstein’s Tractatus Logico-Philosophicus within a week?” ? Information Integration: addall.com, amazon.com, A1books.com, half.com, barnes&noble.com “One-World” Mediation

11 Standard (XML-Based) Mediator Architecture [Figure: a USER/Client poses a query Q(G(S1, ..., Sk)) to the MEDIATOR and receives (XML) queries & results; each source S1, S2, ..., Sk sits behind a wrapper that exports an (XML) view; the mediator holds the integrated global (XML) view G, given by the integrated view definition G(..) ← S1(..) … Sk(..); wrappers are implemented as web services]
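
The mediator architecture can be sketched in a few lines of Python. This is only an illustration under strong simplifying assumptions: rows are plain dictionaries rather than XML, the global view G is just the union of the source views, and the shop names and prices are made up.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


class Wrapper:
    """Exports one source as rows in a common format (stand-in for an XML view)."""

    def __init__(self, name: str, fetch: Callable[[], Iterable[Row]]):
        self.name = name
        self.fetch = fetch

    def view(self) -> List[Row]:
        # Tag every row with the source it came from.
        return [dict(row, source=self.name) for row in self.fetch()]


class Mediator:
    """Holds the integrated view definition G over the wrapped sources."""

    def __init__(self, wrappers: List[Wrapper]):
        self.wrappers = wrappers

    def global_view(self) -> List[Row]:
        # G(..) <- S1(..) ... Sk(..): here G is simply the union of the views.
        rows: List[Row] = []
        for wrapper in self.wrappers:
            rows.extend(wrapper.view())
        return rows

    def query(self, predicate: Callable[[Row], bool]) -> List[Row]:
        # Q(G(S1, ..., Sk)): evaluate the client query against the global view.
        return [row for row in self.global_view() if predicate(row)]


# Two hypothetical book shops with invented prices.
shop_a = Wrapper("shop_a", lambda: [{"title": "Tractatus", "price": 12.0, "shipping": 4.0}])
shop_b = Wrapper("shop_b", lambda: [{"title": "Tractatus", "price": 10.0, "shipping": 7.5}])
mediator = Mediator([shop_a, shop_b])

hits = mediator.query(lambda row: row["title"] == "Tractatus")
cheapest = min(hits, key=lambda row: row["price"] + row["shipping"])
```

A real mediator would push parts of the query down to the sources instead of materializing the whole global view first; the sketch only shows how the client sees one schema while the wrappers hide the individual sources.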

12 Information Integration: Problems and “Solutions” System aspects: “Grid” Infrastructure –Authentication, single sign-on, … –distributed computation –web services, WSDL/SOAP, … –sources = functions, files, databases, … Syntax & Structure: (XML-Based) Database Mediators –wrapping, restructuring –distributed (XML) queries and views –sources = (XML) databases Semantics: Model-Based/Semantic Mediators –conceptual models, declarative views –ontologies, description logics (OWL, RDF, …) –sources = knowledge bases (DB+CMs+ICs) Syntax, Structure, Semantics, System aspects: reconciling S4 heterogeneities; “gluing” together multiple data sources; bridging information and knowledge gaps computationally

13 Exercise: Classify (system, syntax, structure, semantics, something else …) “9:00” vs “9am” vs “21:00” vs “9 ct” “3 miles” (land|sea) (here UK|US|elsewhere) (now|elsewhen) … “picea rubens” (name vs concept … in biological taxonomies) …

14 Different Types of “Ontologies” and Representations Overloaded/sloppy term for a … –“Napkin drawing”, “concept space” (e.g. in PPT) –Labeled graph, semantic network, concept map (e.g. in RDF) –Controlled vocabulary (structured or flat) –Database schema (relational, XML, …) –Conceptual schema (ER, UML, …) –Thesaurus (synonyms, broader term/narrower term) –Taxonomy –Formal ontology, e.g., in [Description] Logic (e.g. in OWL) “formalization of a specification” An ontology may … –constrain possible interpretations of terms –specify a theory by defining and relating concepts of a domain of interest; theory = set of logic models (= “allowed/intended interpretations” of symbols)

15 Community-Based Ontology Development Draft of a geochemistry ontology developed by scientists Current concept maps and emerging ontologies in GEON: 1. Igneous Rocks/Plutons 2. Seismology 3. Geochemistry … in SEEK: 1. Taxon 2. Units 3. Measurements 4. …?

16 Creating and Sharing Concept Maps (here: Seismology concept map, Cmap tool; Kai Lin, GEON) Lock up scientists for 2+ days Add CS/KRDB types Create concept maps Refine Iterate → from napkin drawings, to concept maps, to ontologies

20 Graph (RDF) Queries on Ontologies visualization RQL Query: Show all “products” Query Results Prototype: Kai Lin, GEON

21 Ontologies: Cui bono? What are ontologies used for? –Conceptual models of a domain or application (communication means, system/database design, …) –Classification of … concepts (taxonomy) and data/object instances based on properties and concept definitions –Analysis of ontologies, e.g., graph queries (reachability, path queries, …) and reasoning (concept subsumption, consistency checking, …) –Targets for semantic data registration –Conceptual indexes and views for “smart” operations
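
The "analysis of ontologies" item can be illustrated with a small Python sketch of a reachability/path query over a labeled graph. The triples and concept names are made up, and the subsumption test only follows asserted isa edges; it is not a substitute for a description logic reasoner.

```python
from collections import deque

# Toy triples (subject, predicate, object); the concept names are invented.
TRIPLES = [
    ("Granite", "isa", "PlutonicRock"),
    ("PlutonicRock", "isa", "IgneousRock"),
    ("IgneousRock", "isa", "Rock"),
    ("Basalt", "isa", "VolcanicRock"),
    ("VolcanicRock", "isa", "IgneousRock"),
]


def find_path(triples, start, goal, predicate="isa"):
    """Breadth-first search; returns a shortest path from start to goal or None."""
    edges = {}
    for subject, pred, obj in triples:
        if pred == predicate:
            edges.setdefault(subject, []).append(obj)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in edges.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


def subsumed_by(triples, sub, sup):
    """Concept subsumption over asserted isa edges only (no OWL reasoning)."""
    return find_path(triples, sub, sup) is not None


granite_path = find_path(TRIPLES, "Granite", "Rock")
```

Here `granite_path` is the chain of concepts from `Granite` up to `Rock`, and `subsumed_by(TRIPLES, "Basalt", "Rock")` answers the corresponding yes/no question.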

22 Using ontologies for … Smart data discovery Smart service discovery Smart (data) querying Smart data integration (declarative) Smart workflow planning (execution!?) (procedural) Here: def_macro “smart” := (ontology|semantics)-(based|enhanced|enabled)

23 Specifically in SMS … “smart data discovery” – e.g., … –asking for A, retrieve B’s too, since B isa A “smart connections” – e.g., … –data/source binding to AMS (Kepler) services (actors) –service-to-service semantic (and structural?) type checking –service-to-service & data-to-service “gluing” (insert structural transformations, unit conversions, suggest services based on parameter chasing (parameter ontologies))
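
The "asking for A, retrieve B’s too, since B isa A" idea can be sketched as query expansion over an isa hierarchy. The Python below is a minimal illustration; the concept names, dataset names, and the flat one-concept-per-dataset registration are all assumptions made for the example.

```python
def descendants(isa_edges, concept):
    """All concepts B with B isa* concept, including the concept itself."""
    result = {concept}
    changed = True
    while changed:
        changed = False
        for child, parent in isa_edges:
            if parent in result and child not in result:
                result.add(child)
                changed = True
    return result


def discover(datasets, isa_edges, concept):
    """Names of datasets registered to the concept or to any of its subconcepts."""
    wanted = descendants(isa_edges, concept)
    return sorted(name for name, tag in datasets.items() if tag in wanted)


# Invented hierarchy (child, parent) and invented dataset registrations.
ISA = [("Sparrow", "Bird"), ("Bird", "Animal"), ("Oak", "Plant")]
DATASETS = {"survey_1": "Sparrow", "survey_2": "Bird", "plots_7": "Oak"}

found = discover(DATASETS, ISA, "Bird")
```

A purely keyword-based search for "Bird" would miss `survey_1`; expanding the request along the isa edges returns it as well.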

24 … specifically in SMS … (Cont’d) “smart data integration” – e.g., … –concept-based instance classification and data enumeration (as part of integrated/mediated views) –discovery and use of new join relations across sources –rewriting queries (against which SEEK/EcoGrid/EML schemas??) using ontologies & integrity constraints –generation of feasible distributed query plans in the presence of access patterns (web services), views, integrity constraints (ICs) → Need for “semantic registration/annotation” –Linking data structures/objects to conceptual structures
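
To make "feasible query plans in the presence of access patterns" concrete, here is a small Python sketch. It assumes a deliberately simplified model in which each web service is a subgoal that can only be called once all of its input variables are bound, and it orders the subgoals greedily. The service names are hypothetical, and this is not the algorithm from the papers listed in the references.

```python
def executable_order(subgoals, bound=()):
    """Greedy ordering: repeatedly pick a subgoal whose inputs are all bound.

    Each subgoal is a tuple (service_name, input_vars, output_vars).
    Returns a list of service names, or None if no feasible order exists.
    """
    bound = set(bound)
    remaining = list(subgoals)
    plan = []
    while remaining:
        ready = next((g for g in remaining if set(g[1]) <= bound), None)
        if ready is None:
            return None  # some required input can never be supplied
        plan.append(ready[0])
        bound |= set(ready[2])  # the call binds its output variables
        remaining.remove(ready)
    return plan


# Hypothetical services: one needs an ISBN, the other produces ISBNs.
SERVICES = [
    ("price_by_isbn", ["isbn"], ["price"]),
    ("books_by_author", ["author"], ["isbn"]),
]

plan = executable_order(SERVICES, bound=["author"])
```

With `author` bound, the feasible plan calls `books_by_author` first so that `price_by_isbn` has an ISBN to work with; with nothing bound, the function returns `None` because neither service can be called.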

25 Things to “Register” Data files (individual files) –Shapefile as a blob (+ file type) Collections (of files; nested; e.g., satellite data) Databases (has schema and can be queried) –Shapefile with schema registered Ontologies Services (web + grid services) Other/external applications

26 Ontologies and Data Management (→ watch out for Semantic Data Registration later) [Figure labels: Schema; Conceptual Model; Ontology; Data; Metadata; Design Artifact; edge label: use concepts from (explicitly or implicitly)]

27 A Multi-Hierarchical Rock Classification “Ontology” (GSC) Composition Genesis Fabric Texture

28 Application Example: Geologic Map Integration domain knowledge Knowledge representation Ontologies!? Nevada +/- a few hundred million years “Semantic Registration” of shapefiles to a shared ontology → concept-based queries; also allows … → … viewing of British-registered USGS data through Canadian eyes

29 Example: Smart Connections [DILS’04] Services can be semantically compatible, but structurally incompatible [Figure: Source Service with port Ps, Target Service with port Pt, and the desired connection between them; semantic type of Ps vs semantic type of Pt: compatible (⊑), with semantic types taken from Ontologies (OWL); structural type of Ps vs structural type of Pt: incompatible (⋠); further labels: (Ps), (≺)]

30 Example: Smart Connections [DILS’04] [Figure: Source Service with port Ps, Target Service with port Pt, and the desired connection between them; semantic type of Ps vs semantic type of Pt: compatible (⊑); Registration Mapping (Output) and Registration Mapping (Input) relate the structural types of Ps and Pt to Ontologies (OWL); labels: Correspondence, Generate, Transformation, (Ps)]
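
A rough Python sketch of the idea behind the two Smart Connections slides: registration mappings tie structural fields to ontology concepts, a correspondence is derived from semantic compatibility, and a transformation is generated from it. All field and concept names are invented, the ontology is assumed to be a cycle-free single-parent hierarchy, and the generated transformation only renames fields; the framework in the DILS’04 paper is more general.

```python
def is_subconcept(isa, sub, sup):
    """True if sub equals sup or reaches it via isa links (child -> parent).

    Assumes a cycle-free, single-parent hierarchy.
    """
    while sub is not None:
        if sub == sup:
            return True
        sub = isa.get(sub)
    return False


def field_correspondence(out_registration, in_registration, isa):
    """Pair each target field with a source field whose concept it subsumes."""
    pairs = {}
    for t_field, t_concept in in_registration.items():
        for s_field, s_concept in out_registration.items():
            if is_subconcept(isa, s_concept, t_concept):
                pairs[t_field] = s_field
                break
    return pairs


def make_transformation(out_registration, in_registration, isa):
    """Return a function reshaping a source record into the target structure,
    or None if some target field has no semantically compatible source field."""
    pairs = field_correspondence(out_registration, in_registration, isa)
    if set(pairs) != set(in_registration):
        return None
    return lambda record: {t: record[s] for t, s in pairs.items()}


# Invented ontology fragment and registration mappings.
ISA = {"SpeciesCount": "Count", "Count": "Measurement",
       "PlotArea": "Area", "Area": "Measurement"}
OUT_REG = {"n_obs": "SpeciesCount", "plot_m2": "PlotArea"}  # source output port
IN_REG = {"count": "Count", "area": "Area"}                 # target input port

transform = make_transformation(OUT_REG, IN_REG, ISA)
converted = transform({"n_obs": 12, "plot_m2": 40.0})
```

The two ports have different field names, so a direct connection would fail a structural check, yet the registrations show that every target field is covered by a semantically compatible source field, which is enough to generate the reshaping step.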

31 The Sparrow Toolkit (Origins) Annoyance with ugly, user-unfriendly XML syntaxes (e.g., OWL in XML, rules in XML, … anything in XML) –Note: others got annoyed too, but we didn’t know [OWL Concrete Abstract Syntax, Bechhofer et al.] –(well, we knew about Triple, but that’s only RDF…) Instead use a lean syntax (how XML should have been) – owl employee isa person and worksfor some employer. – owl mother eqv person and female and hasChild some person. – rdf john, worksfor, 'IBM'. –… are both human and machine readable –… in fact the language was invented around the corner… –… and this is the “parser”: :- op(1100, fx, owl), op(1100, fx, rdf). :- op(600, xfx, isa), op(600, xfx, eqv). :- op(550, xfy, or), op(500, xfy, and), op(350, fx, not). :- op(400, xfy, some), op(400, xfy, only).
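
To show what the operator table buys, the following Python sketch mimics how the declared priorities group an `owl` statement into a nested term. It is an illustrative re-implementation for whitespace-separated tokens only, not part of Sparrow: it ignores the `rdf` form, quoting, and parentheses, and it drops the outer `owl` wrapper.

```python
# Prolog priorities from the slide: a lower number binds tighter.
INFIX = {"isa": 600, "eqv": 600, "or": 550, "and": 500, "some": 400, "only": 400}
NON_ASSOC = {"isa", "eqv"}  # declared xfx; the others are xfy (right-associative)
PREFIX = {"not": 350}       # declared fx


def parse_owl(statement):
    """Parse 'owl <expr>.' into nested tuples (operator, left, right)."""
    tokens = statement.strip().rstrip(".").split()
    if not tokens or tokens[0] != "owl":
        raise ValueError("expected a statement starting with 'owl'")
    pos = 1

    def parse(max_priority):
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of statement")
        token = tokens[pos]
        pos += 1
        if token in PREFIX:
            # fx: the argument must have strictly lower priority
            left, left_priority = (token, parse(PREFIX[token] - 1)), PREFIX[token]
        else:
            left, left_priority = token, 0
        while pos < len(tokens) and tokens[pos] in INFIX:
            op = tokens[pos]
            priority = INFIX[op]
            # The left argument is always an 'x': strictly lower priority.
            if priority > max_priority or left_priority >= priority:
                break
            pos += 1
            right_max = priority - 1 if op in NON_ASSOC else priority
            left, left_priority = (op, left, parse(right_max)), priority
        return left

    tree = parse(1100)
    if pos != len(tokens):
        raise ValueError(f"unexpected token: {tokens[pos]}")
    return tree


employee = parse_owl("owl employee isa person and worksfor some employer.")
mother = parse_owl("owl mother eqv person and female and hasChild some person.")
```

Because `some` (400) binds tighter than `and` (500), which binds tighter than `isa` (600), the first statement groups as `isa(employee, and(person, some(worksfor, employer)))` without any parentheses in the source text.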

32 Sparrow (a poor man’s OWL tool …) Simple ASCII-based RDF and OWL entry and manipulation

33 Sparrow Toolkit Much more than a lean syntax for OWL & RDF –Syntax transformation services: RDF, OWL, … → Sparrow → RDF, OWL, LaTeX, FO/LeanTap, … –Semantic registration services: Semantic Annotation language –Reasoning services: Classification, Consistency checking, Conversion, Query rewriting, … Will be provided in Kepler –e.g., as actors, but also as type extensions

34 Sparrow: The Name “A poor man’s OWL” –or what XML really should look like “Lieber den Spatz in der Hand als die Taube auf dem Dach” –Better a sparrow in the hand than a pigeon/dove on the roof Also: In Memoriam:

35 In Memoriam: Dusky Seaside Sparrow

36 Some work in progress … [short-paper SSDBM’04]

37 References SMS: –An Ontology Driven Framework for Data Transformation in Scientific Workflows. S. Bowers and B. Ludäscher. In International Workshop on Data Integration in the Life Sciences (DILS), LNCS, Leipzig, Germany, March 2004. –On Integrating Scientific Resources through Semantic Registration. S. Bowers, K. Lin, and B. Ludäscher. 16th International Conference on Scientific and Statistical Database Management (SSDBM'04), 21-23 June 2004, Santorini Island, Greece. –Towards a Generic Framework for Semantic Registration of Scientific Data. S. Bowers and B. Ludäscher. In Semantic Web Technologies for Searching and Retrieving Scientific Data (SCISW), Sanibel Island, Florida, 2003. –Processing First-Order Queries under Limited Access Patterns. Alan Nash and B. Ludäscher. Proc. 23rd ACM Symposium on Principles of Database Systems (PODS'04), Paris, France, June 2004, to appear. –Processing Unions of Conjunctive Queries with Negation under Limited Access Patterns. Alan Nash and B. Ludäscher. 9th Intl. Conference on Extending Database Technology (EDBT'04), Heraklion, Crete, Greece, March 2004, LNCS. –Web Service Composition Through Declarative Queries: The Case of Conjunctive Queries with Union and Negation. B. Ludäscher and Alan Nash. Research abstract (poster), 20th Intl. Conference on Data Engineering (ICDE'04), Boston, IEEE Computer Society, April 2004. –Teaching: Graduate Class CSE-291, Ontologies in Data and Process Integration: http://www.sdsc.edu/~ludaesch/CSE-291-Spring-04/ (Bertram; guest lectures by Shawn) –…

38 References Kepler: –Kepler: An Extensible System for Design and Execution of Scientific Workflows. I. Altintas, C. Berkley, E. Jaeger, M. Jones, B. Ludäscher, S. Mock. 16th International Conference on Scientific and Statistical Database Management (SSDBM'04), 21-23 June 2004, Santorini Island, Greece. –Kepler: Towards a Grid-Enabled System for Scientific Workflows. I. Altintas, C. Berkley, E. Jaeger, M. Jones, B. Ludäscher, and S. Mock. In Workshop on Workflow in Grid Systems, Global-Grid Forum (GGF10), Berlin, Germany, March 2004. –A Web Service Composition and Deployment Framework for Scientific Workflows. I. Altintas, E. Jaeger, K. Lin, B. Ludäscher, A. Memon. In Intl. Conference on Web Services (ICWS), San Diego, California, July 2004. –Kepler: Towards a Grid-Enabled System for Scientific Workflows. Ilkay Altintas, Chad Berkley, Efrat Jaeger, Matthew Jones, Bertram Ludäscher (presenter), Steve Mock. Workflow in Grid Systems (GGF10), Berlin, March 9th, 2004. –Kepler/GEON User Manual. Efrat Jaeger. –The Computational Chemistry Prototyping Environment. Kim Baldridge, Jerry Greenberg, Wibke Sudholt, Karan Bhatia, Stephen Mock, Ilkay Altintas, Cline Amoreira, Yohan Potier, Michela Taufer. –…

