Semantic Data Integration in myGrid and ourGrid (SEEK) National e-Science Centre e-Science Institute, Edinburgh May 14 th, 2004.

Slides:



Advertisements
Similar presentations
Dr. Leo Obrst MITRE Information Semantics Information Discovery & Understanding Command & Control Center February 6, 2014February 6, 2014February 6, 2014.
Advertisements

1 ICS-FORTH EU-NSF Semantic Web Workshop 3-5 Oct Christophides Vassilis Database Technology for the Semantic Web Vassilis Christophides Dimitris Plexousakis.
1 ICS-FORTH & Univ. of Crete SeLene November 15, 2002 A View Definition Language for the Semantic Web Maganaraki Aimilia.
Overview of the Science Environment for Ecological Knowledge (SEEK) Ricardo Scachetti Pereira.
CH-4 Ontologies, Querying and Data Integration. Introduction to RDF(S) RDF stands for Resource Description Framework. RDF is a standard for describing.
Of 27 lecture 7: owl - introduction. of 27 ece 627, winter ‘132 OWL a glimpse OWL – Web Ontology Language describes classes, properties and relations.
® IBM Software Group © 2006 IBM Corporation Rational Software France Object-Oriented Analysis and Design with UML2 and Rational Software Modeler 04. Other.
Building and Analyzing Social Networks Web Data and Semantics in Social Network Applications Dr. Bhavani Thuraisingham February 15, 2013.
Using the Semantic Web to Construct an Ontology- Based Repository for Software Patterns Scott Henninger Computer Science and Engineering University of.
Research topics Semantic Web - Spring 2007 Computer Engineering Department Sharif University of Technology.
Ontology Notes are from:
Fungal Semantic Web Stephen Scott, Scott Henninger, Leen-Kiat Soh (CSE) Etsuko Moriyama, Ken Nickerson, Audrey Atkin (Biological Sciences) Steve Harris.
COMP 6703 eScience Project Semantic Web for Museums Student : Lei Junran Client/Technical Supervisor : Tom Worthington Academic Supervisor : Peter Strazdins.
Module 2b: Modeling Information Objects and Relationships IMT530: Organization of Information Resources Winter, 2007 Michael Crandall.
Semantic Web Technologies Lecture # 2 Faculty of Computer Science, IBA.
Knowledge Mediation in the WWW based on Labelled DAGs with Attached Constraints Jutta Eusterbrock WebTechnology GmbH.
Amarnath Gupta Univ. of California San Diego. An Abstract Question There is no concrete answer …but …
January, 23, 2006 Ilkay Altintas
Ontology Development Kenneth Baclawski Northeastern University Harvard Medical School.
 Copyright 2005 Digital Enterprise Research Institute. All rights reserved. Towards Translating between XML and WSML based on mappings between.
GEON-UTEP GEON-Knowledge Representation WG Update GEON-KR list (currently) Bertram Ludaescher (SDSC: Bertram Ludaescher (SDSC:
Data R&D Issues for GTL Data and Knowledge Systems San Diego Supercomputer Center University of California, San Diego Bertram Ludäscher
GEON AHM, April 16-18, SDSC C YBERINFRASTRUCTURE FOR THE G EOSCIENCES Towards Semantic Mediation for GEON: Facilitating Scientific Data Integration using.
CSE-291: Ontologies in Data & Process Integration Department of Computer Science & Engineering University of California, San Diego CSE-291: Ontologies.
Recording application executions enriched with domain semantics of computations and data Master of Science Thesis Michał Pelczar Krakow,
Pipelines and Scientific Workflows with Ptolemy II Deana Pennington University of New Mexico LTER Network Office Shawn Bowers UCSD San Diego Supercomputer.
CYBERINFRASTRUCTURE FOR THE GEOSCIENCES GEON IT Advances: ⁃ Data Integration ⁃ GEON Workbench ⁃ Scientific Workflows Bertram Ludäscher.
1 Ontology-based Semantic Annotatoin of Process Template for Reuse Yun Lin, Darijus Strasunskas Depart. Of Computer and Information Science Norwegian Univ.
Semantic Mediation in SEEK/Kepler: Exploiting Semantic Annotation for Discovery, Analysis, and Integration of Scientific Data and Workflows Bertram Ludäscher.
EU Project proposal. Andrei S. Lopatenko 1 EU Project Proposal CERIF-SW Andrei S. Lopatenko Vienna University of Technology
SEEK EcoGrid l Integrate diverse data networks from ecology, biodiversity, and environmental sciences l Metacat, DiGIR, SRB, Xanthoria,... l EML is the.
Chad Berkley NCEAS National Center for Ecological Analysis and Synthesis (NCEAS), University of California Santa Barbara Long Term Ecological Research.
Metadata. Generally speaking, metadata are data and information that describe and model data and information For example, a database schema is the metadata.
updated CmpE 583 Fall 2008 Ontology Integration- 1 CmpE 583- Web Semantics: Theory and Practice ONTOLOGY INTEGRATION Atilla ELÇİ Computer.
1 Ilkay ALTINTAS - July 24th, 2007 Ilkay ALTINTAS Director, Scientific Workflow Automation Technologies Laboratory San Diego Supercomputer Center, UCSD.
Ontologies in Data and Application Integration – an Update Kai Lin Bertram Ludäscher Knowledge-Based Information Systems Lab Data and Knowledge Systems.
©Ferenc Vajda 1 Semantic Grid Ferenc Vajda Computer and Automation Research Institute Hungarian Academy of Sciences.
Quality views: capturing and exploiting the user perspective on data quality Paolo Missier, Suzanne Embury, Mark Greenwood School of Computer Science University.
Ontology-Based Computing Kenneth Baclawski Northeastern University and Jarg.
ICCS WSES BOF Discussion. Possible Topics Scientific workflows and Grid infrastructure Utilization of computing resources in scientific workflows; Virtual.
SKOS. Ontologies Metadata –Resources marked-up with descriptions of their content. No good unless everyone speaks the same language; Terminologies –Provide.
© Geodise Project, University of Southampton, Knowledge Management in Geodise Geodise Knowledge Management Team Barry Tao, Colin Puleston, Liming.
Kepler includes contributors from GEON, SEEK, SDM Center and Ptolemy II, supported by NSF ITRs (SEEK), EAR (GEON), DOE DE-FC02-01ER25486.
Knowledge-Based Integration of Neuroscience Data Sources Amarnath Gupta Bertram Ludäscher Maryann Martone University of California San Diego.
Knowledge Representation Breakout KR: to create content (objects, reltnshps) for SMS (logic/inference) that will be useful for enhancing the discovery.
Trustworthy Semantic Webs Dr. Bhavani Thuraisingham The University of Texas at Dallas Lecture #4 Vision for Semantic Web.
Of 33 lecture 1: introduction. of 33 the semantic web vision today’s web (1) web content – for human consumption (no structural information) people search.
Scientific Workflow systems: Summary and Opportunities for SEEK and e-Science.
An Ontology-Driven Framework for Data Transformation in Scientific Workflows Shawn Bowers Bertram Ludäscher San Diego Supercomputer Center University of.
Issues in Ontology-based Information integration By Zhan Cui, Dean Jones and Paul O’Brien.
CYBERINFRASTRUCTURE FOR THE GEOSCIENCES GEON IT Advances: ⁃ Data Integration ⁃ GEON Workbench ⁃ Scientific Workflows Bertram Ludäscher.
NeuroLOG ANR-06-TLOG-024 Software technologies for integration of process and data in medical imaging A transitional.
1 Class exercise II: Use Case Implementation Deborah McGuinness and Peter Fox CSCI Week 8, October 20, 2008.
Using DAML+OIL Ontologies for Service Discovery in myGrid Chris Wroe, Robert Stevens, Carole Goble, Angus Roberts, Mark Greenwood
SEEK Science Environment for Ecological Knowledge l EcoGrid l Ecological, biodiversity and environmental data l Computational access l Standardized, open.
Information Architecture The Open Group UDEF Project
Semantic Mediation and Scientific Workflows Bertram Ludäscher Data and Knowledge Systems San Diego Supercomputer Center University of California, San Diego.
GEONSearch: From Searching to Recommending GeoInformatics 2006 May 10-12, Reston, Virginia Ullas Nambiar, Bertram Ludaescher Dept. of Computer Science.
An Extensible Model-Based Mediator System with Domain Maps Amarnath Gupta * Bertram Ludäscher * Maryann E. Martone + * San Diego Supercomputer Center (SDSC)
Semantic Data Extraction for B2B Integration Syntactic-to-Semantic Middleware Bruno Silva 1, Jorge Cardoso 2 1 2
Semantic Interoperability in GIS N. L. Sarda Suman Somavarapu.
National Partnership of Advanced Computational Infrastructure San Diego Supercomputer Center KNOW-ME (KNOWledge-Map-Explorer) Semantic Browsing of Integrated.
Ontology Technology applied to Catalogues Paul Kopp.
Efrat Jaeger – SDSC Bertram Ludäscher – UC DAVIS Krishna Sinha – Virginia Tech Ashraf Memon – SDSC Ghulam Memon – SDSC Ilkay Altintas – SDSC Kai Lin –
EcoGrid in SEEK A Data Grid System for Ecology Bertram Ludaescher University of California, Davis Arcot Rajasekar San Diego Supercomputer Center, University.
The Semantic Web By: Maulik Parikh.
Data R&D Issues for GTL Bertram Ludäscher Data and Knowledge Systems
ece 720 intelligent web: ontology and beyond
A Semantic Type System and Propagation
Ontologies: Introduction and Some Uses
Presentation transcript:

Semantic Data Integration in myGrid and ourGrid (SEEK) National e-Science Centre e-Science Institute, Edinburgh May 14 th, 2004

meets e-Science … meets e-Science, Edinburgh, May 9-11, Plan of the Day 9:00–10:30 –SEEK Data Integration & Semantic Extensions 10:30–11:00 BREAK 11:00–12:30 –myGrid Data Integration & Semantic Extensions 12:30–13:45 LUNCH 13:45–15:45 –Interoperable Semantic Registration, Mediation, Workflows 15:45–16:00 BREAK 16:00–17:00 –Plenary Session

SEEK Data Integration & Semantic Extensions Shawn Bowers (SDSC/UCSD) Bertram Ludaescher (SDSC/UCSD) & SEEK KR-SMS Team & GEON KR Team Sparrow

meets e-Science … meets e-Science, Edinburgh, May 9-11, Purpose / Goals Link-Up: –… [on] data / services with “semantics” –… to do semantic data & service integration –also: an e-Science “Sister Project” to facilitate knowledge exchange & collaboration between UK & US based projects (where is the web/wiki page?) Specifically: –What approaches to express semantics of data, services, and workflows do we all use? –How can we make them interoperable? … keeping in mind… –What problem is it that the XYZ solution solves ?

meets e-Science … meets e-Science, Edinburgh, May 9-11, Science Environment for Ecological Knowledge Domain Science Driver –Ecology (LTER), biodiversity, … Analysis & Modeling System –Design & execution of ecological models & analysis –End (&power) user focus – {application,upper}-ware  Kepler Semantic Mediation System –Data Integration of hard-to- relate sources and processes –Semantic Types and Ontologies – upper middleware  Sparrow Toolkit EcoGrid –Access to ecology data and tools – {middle,under}-ware one specific problem (DILS’04) our focus architecture

meets e-Science … meets e-Science, Edinburgh, May 9-11, Heterogeneous Data integration Requires advanced metadata and processing –Attributes must be semantically typed –Collection protocols must be known –Units and measurement scale must be known –Measurement relationships must be known e.g., that ArealDensity=Count/Area

meets e-Science … meets e-Science, Edinburgh, May 9-11,

A Neuroscientist’s Information Integration Problem What is the cerebellar distribution of rat proteins with more than 70% homology with human NCS-1? Any structure specificity? How about other rodents??InformationIntegration protein localization (NCMIR) neurotransmission(SENSELAB) sequence info (CaPROT) morphometry(SYNAPSE) “ComplexMultiple-Worlds”Mediation Biomedical Informatics Research Network

A Home Buyer’s Information Integration Problem What houses for sale under $500k have at least 2 bathrooms, 2 bedrooms, a nearby school ranking in the upper third, in a neighborhood with below-average crime rate and diverse population??InformationIntegration Realtor Demographics School Rankings Crime Stats “Multiple-Worlds”Mediation

An Online Shopper’s Information Integration Problem El Cheapo: “Where can I get the cheapest copy (including shipping cost) of Wittgenstein’s Tractatus Logicus-Philosophicus within a week?” ? Information Integration addall.com“One-World”Mediation amazon.com A1books.com half.com barnes&noble.com

meets e-Science … meets e-Science, Edinburgh, May 9-11, Standard (XML-Based) Mediator Architecture MEDIATOR (XML) Queries & Results S1S1 Wrapper (XML) View S2S2 Wrapper (XML) View SkSk Wrapper (XML) View Integrated Global (XML) View G Integrated View Definition G(..)  S 1 (..)…S k (..) USER/Client USER/Client Query Q ( G (S 1,..., S k ) ) Query Q ( G (S 1,..., S k ) ) wrappers implemented as web services

meets e-Science … meets e-Science, Edinburgh, May 9-11, Information Integration: Problems and “Solutions” System aspects: “Grid” Infrastructure –Authentication, single sign-on, … –distributed computation –web wervices, WSDL/SOAP, … – sources = functions, files, databases, … Syntax & Structure: (XML-Based) Database Mediators –wrapping, restructuring –distributed (XML) queries and views – sources = (XML) databases Semantics: Model-Based/Semantic Mediators –conceptual models, declarative views –ontologies, description logics (OWL, RDF,…) – sources = knowledge bases (DB+CMs+ICs) Syntax Structure Semantics System aspects reconciling S 4 heterogeneitiesreconciling S 4 heterogeneities “gluing” together multiple data sources“gluing” together multiple data sources bridging inforbmation and knowledge gaps computationallybridging inforbmation and knowledge gaps computationally

meets e-Science … meets e-Science, Edinburgh, May 9-11, Exercise: Classify (system, syntax, structure, semantics, sth else …) “9:00” vs “9am” vs “21:00” vs “9 ct” “3 miles” (land|sea) (here UK|US|elsewhere) (now|elsewhen) … “picea rubens” (name vs concept … in biological taxonomies) …

meets e-Science … meets e-Science, Edinburgh, May 9-11, Different Types of “Ontologies” and Representations Overloaded/sloppy for a… –“ Napkin drawing ”, “concept space” (e.g. in PPT) – Labeled graph, semantic network, concept map (e.g. in RDF) – Controlled vocabulary (structured or flat) – Database schema (relational, XML, …) – Conceptual schema (ER, UML, … ) – Thesaurus (synonyms, broader term/narrower term) – Taxonomy – Formal ontology, e.g., in [Description] Logic (e.g. in OWL) “formalization of a specification” An ontology may … –constrain possible interpretation of terms – specify a theorydefiningrelating concepts – specify a theory by defining and relating concepts of a domain of interest logic models (=“allowed/intented intepretations” of symbolstheory = set of logic models (=“allowed/intented intepretations” of symbols)

meets e-Science … meets e-Science, Edinburgh, May 9-11, Community-Based Ontology Development Draft of a geochemistry ontology developed by scientists Current concept maps and emerging ontologies in GEON: 1.Igneous Rocks/Plutons 2.Seismology 3.Geochemistry … in SEEK: 1.Taxon 2.Units 3.Measurements 4.…?

meets e-Science … meets e-Science, Edinburgh, May 9-11, Creating and Sharing Concept Maps (here: Seismology concept map, Cmap tool; Kai Lin, GEON) Lock up scientists for 2+ days Add CS/KRDB types Create concept maps Refine Iterate  from napkin drawings, to concept maps, to ontologies

meets e-Science … meets e-Science, Edinburgh, May 9-11,

meets e-Science … meets e-Science, Edinburgh, May 9-11,

meets e-Science … meets e-Science, Edinburgh, May 9-11,

meets e-Science … meets e-Science, Edinburgh, May 9-11, Graph (RDF) Queries on Ontologies visualization RQL Query: Show all “ products ” Query Results Prototype: Kai Lin, GEON

meets e-Science … meets e-Science, Edinburgh, May 9-11, Ontologies: Qui bono? What are ontologies used for? – Conceptual models of a domain or application, (communication means, system/database design, …) – Classification of … concepts (taxonomy) and data/object instances based on properties and concept definitions – Analysis of ontologies e.g. Graph queries (reachability, path queries, …) Reasoning (concept subsumption, consistency checking, …) – Targets for semantic data registration – Conceptual indexes and views for “smart” operations

meets e-Science … meets e-Science, Edinburgh, May 9-11, Using ontologies for … Smart data discovery Smart service discovery Smart (data) querying Smart data integration (declarative) Smart workflow planning (execution !?) (procedural) Here: def_macro “smart” := (ontology|semantics) – (based|enhanced|enabled)

meets e-Science … meets e-Science, Edinburgh, May 9-11, Specifically in SMS.. “smart data discovery” – e.g., … –asking for A, retrieve B’s too, since B isa A “smart connections” – e.g., … –data/source binding to AMS (Kepler) services (actors) –service-to-service semantic (and structural?) type checking –service-to-service & data-to-service “gluing” (insert structural transformations, unit conversions, suggest services based on parameter chasing (parameter ontologies)

meets e-Science … meets e-Science, Edinburgh, May 9-11, … specifically in SMS.. (Cont’d) “smart data integration”– e.g., … –concept-based instance classification and data enumeration (as part of integrated/mediated views) –discovery and use of new join relations across sources –rewriting queries (against which SEEK/EcoGrid/EML schemas??) using ontologies & integrity constraints –generation of feasible distributed query plans in the presence of access patterns (web services), views, integrity constraints (ICs)  Need for “semantic registration/annotation” –Linking data structures/objects to conceptual structures

meets e-Science … meets e-Science, Edinburgh, May 9-11, Things to “Register” Data files (individual files) –Shapefile as a blob (+ file type) Collections (of files; nested; eg satellite data) Databases (has schema and can be queried) –Shapefile with schema registered Ontologies Services (web + grid services) Other/external applications

meets e-Science … meets e-Science, Edinburgh, May 9-11, Ontologies and Data Management (  watch out for Semantic Data Registration later) Schema Conceptual Model Conceptual Model Ontology Data  Metadata Design Artifact use concepts from (explicitly or implicitly)

meets e-Science … meets e-Science, Edinburgh, May 9-11, A Multi-Hierarchical Rock Classification “Ontology” (GSC) Composition Genesis Fabric Texture

Application Example: Geologic Map Integration domain knowledge domain knowledge Knowledge representation Ontologies!? Nevada +/- a few hundred million years “Semantic Registration” of shapefiles to a shared ontology  concept-based queries; also allows …  … viewing of British-registered USGS data through Canadian eyes

meets e-Science … meets e-Science, Edinburgh, May 9-11, Example: Smart Connections [DILS’04] Services can be semantically compatible, but structurally incompatible Source Service Source Service Target Service Target Service PsPs PtPt Semantic Type P s Semantic Type P t Structural Type P t Structural Type P s Desired Connection Incompatible Compatible (⋠)(⋠) (⊑)(⊑) (Ps)(Ps) (Ps)(Ps)  (≺)(≺) Ontologies (OWL)

meets e-Science … meets e-Science, Edinburgh, May 9-11, Example: Smart Connections [DILS’04] Source Service Source Service Target Service Target Service PsPs PtPt Semantic Type P s Semantic Type P t Structural Type P t Structural Type P s Desired Connection Compatible (⊑)(⊑) Registration Mapping (Output) Registration Mapping (Input) Correspondence Generate (Ps)(Ps) (Ps)(Ps) Ontologies (OWL) Transformation

meets e-Science … meets e-Science, Edinburgh, May 9-11, The Sparrow Toolkit (Origins) Annoyance with ugly, user-unfriendly XML syntaxes (e.g., OWL in XML, rules in XML, … anything in XML) –Note: others got annoyed too, but we didn’t know [OWL Concrete Abstract Syntax, Bechhofer et al.] –(well, we knew about Triple, but that’s only RDF…) Instead use a lean syntax (how XML should have been) – owl employee isa person and worksfor some employer. – owl mother eqv person and female and hasChild some person. – rdf john, worksfor, ‘IBM’. –… are both human and machine readable –… in fact the language was invented around the corner… –… and this is the “parser”: :- op(1100, fx, owl ), op(1100, fx, rdf ), :- op(600, xfx, isa ), op(600, xfx, eqv ). :- op(550, xfy, or ), op(500, xfy, and ), op(350, fx, not ). :- op(400, xfy, some ), op(400, xfy, only ).

meets e-Science … meets e-Science, Edinburgh, May 9-11, Sparrow (a poor man’s OWL tool …) Simple ASCII-based RDF and OWL entry and manipulation

meets e-Science … meets e-Science, Edinburgh, May 9-11, Sparrow Toolkit Much more than a lean syntax for OWL & RDF –Syntax transformation services: RDF, OWL, …  Sparrow  RDF, OWL, LaTeX, FO/LeanTap, … –Semantic registration services Semantic Annotation language –Reasoning services Classification, Consistency checking, Conversion, Query rewriting, … Will be provided in Kepler –e.g., as actors, but also as type extensions

meets e-Science … meets e-Science, Edinburgh, May 9-11, Sparrow: The Name “A poor man’s OWL” –or how XML really should look like “ Lieber den Spatz in der Hand als die Taube auf dem Dach ” –Better a sparrow in the hand than a pigeon/dove on the roof Also: In Memoriam :

meets e-Science … meets e-Science, Edinburgh, May 9-11, In Memoriam : Dusky Seaside Sparrow

meets e-Science … meets e-Science, Edinburgh, May 9-11, Some work in progress … [short-paper SSDBM’04]

meets e-Science … meets e-Science, Edinburgh, May 9-11, References SMS: – An Ontology Driven Framework for Data Transformation in Scientific Workflows. S. Bowers and B. Ludäscher. In International Workshop on Data Integration in the Life Sciences (DILS), LNCS, Leipzig, Germany, March – On Integrating Scientific Resources through Semantic Registration, S. Bowers, K. Lin, and B. Ludäscher, 16th International Conference on Scientific and Statistical Database Management (SSDBM'04), June 2004, Santorini Island, Greece. – Towards a Generic Framework for Semantic Registration of Scientific Data. S. Bowers and B. Ludäscher. In Semantic Web Technologies for Searching and Retrieving Scientific Data (SCISW), Sanibel Island, Florida, – Processing First-Order Queries under Limited Access Patterns, Alan Nash and B. Ludäscher, Proc. 23rd ACM Symposium on Principles of Database Systems (PODS'04) Paris, France, June 2004, to appear. – Processing Unions of Conjunctive Queries with Negation under Limited Access Patterns, Alan Nash and B. Ludäscher., 9th Intl. Conference on Extending Database Technology (EDBT'04) Heraklion, Crete, Greece, March 2004, LNCS. – Web Service Composition Through Declarative Queries: The Case of Conjunctive Queries with Union and Negation, B. Ludäscher and Alan Nash. Research abstract (poster), 20th Intl. Conference on Data Engineering (ICDE'04) Boston, IEEE Computer Society, April – Teaching : Graduate Class: CSE-291 – Ontologies in Data and Process Integration : (Bertram; guest lectures by Shawn) –…

meets e-Science … meets e-Science, Edinburgh, May 9-11, References Kepler – Kepler: An Extensible System for Design and Execution of Scientific Workflows, I. Altintas, C. Berkley, E. Jaeger, M. Jones, B. Ludäscher, S. Mock, 16th International Conference on Scientific and Statistical Database Management (SSDBM'04), June 2004, Santorini Island, Greece. – Kepler: Towards a Grid-Enabled System for Scientific Workflows. I. Altintas, C. Berkley, E. Jaeger, M. Jones, B. Ludäscher, and S. Mock In Workshop on Workflow in Grid Systems, Global-Grid Forum (GGF10), Berlin, Germany, March – A Web Service Composition and Deployment Framework for Scientific Workflows, I. Altintas, E. Jaeger, K. Lin, B. Ludaescher, A. Memon, In Intl. Conference on Web Services (ICWS), San Diego, California, July – Kepler: Towards a Grid-Enabled System for Scientific Workflows, Ilkay Altintas, Chad Berkley, Efrat Jaeger, Matthew Jones, Bertram Ludäscher (presenter), Steve Mock, Workflow in Grid Systems (GGF10), Berlin, March 9th, – Kepler/GEON User Manual, Efrat Jaeger. – The Computational Chemistry Prototyping Environment, Kim Baldridge, Jerry Greenberg, Wibke Sudholt, Karan Bhatia, Stephen Mock, Ilkay Altintas, Cline Amoreira, Yohan Potier, Mucaehl Taufer –…