eXtended Metadata Registry (XMDR) for Ecoinformatics Test Bed. Interagency/International Cooperation on Ecoinformatics, Copenhagen, Denmark, June 20, 2006.


1 eXtended Metadata Registry (XMDR) for Ecoinformatics Test Bed
Interagency/International Cooperation on Ecoinformatics, Copenhagen, Denmark, June 20, 2006
Bruce Bargmeyer
Lawrence Berkeley National Laboratory and Berkeley Water Center, University of California, Berkeley
Tel: +1 510-495-2905
bebargmeyer@lbl.gov

2 XMDR Purpose
- Improve data management through use of stronger semantics management
  - Databases
  - XML data
- Enable a new wave of semantic computing
  - Take meaning of data into account
  - Process across relations as well as properties
  - May use reasoning engines, e.g., to draw inferences

3 Enterprise Vocabulary Services (EVS) Concepts Unite NCI MDR
[Diagram of one registered data element and the concepts behind it:]
- Object Class: Chemopreventive Agent
- Property: NSCNumber
- Conceptual Domain: Agent
- Data Element Concept: Chemopreventive Agent NSC Number
- Data Element: Chemopreventive Agent Name
- Value Domain: NSC Code
- Context: caCORE
- Representation: Code
- Classification Schemes: caDSRTraining
- Valid Values: Cyclooxygenase Inhibitor, Doxercalciferol, Eflornithine, …, Ursodiol
Source: Denise Warzel, National Cancer Institute

4 Vocabulary Management
- Vocabulary management is the first step for use of semantic technologies
  - Define concepts and relationships
  - Harmonize terminology, resolve conflicts
  - Collaborate with stakeholders
- An approach
  - Select a domain of interest
  - Enter core concepts and relationships
  - Enter metadata describing enterprise data
  - Engage community in vocabulary review
  - Harmonize, validate, and vet the vocabulary

5 Use XMDR
- For vocabulary repository
  - Register, harmonize, validate, and vet definitions and relations
- To register mappings between multiple vocabularies
- To register mappings of concepts to data
- To provide semantics services
- To register and manage the provenance of data
XMDR is part of the infrastructure for semantics and data management.

6 XMDR Use
- Upside
  - Collaborative
    - Supports interaction with community of interest
    - Shared evolution and dissemination
    - Enables review cycle
  - Standards-based: don't lock semantics into proprietary technology
  - Foundation for strategic data-centric applications
  - Lays the foundation for ontology-based information management
  - Content is reusable for many purposes
- Downside
  - Managing semantics is HARD WORK, no matter how friendly the tools
  - Needs integration with other components

7 XMDR Project Participants
- Collaborative, interagency effort
  - EPA, USGS, NCI, Mayo Clinic, DOD, LBNL, and others
- Draws on and contributes to Interagency/International Cooperation on Ecoinformatics
- Involves Ecoterm and international, national, state, and local government agencies and other organizations as content providers and potential users
- Interacts with many organizations around the world through ISO/IEC standards committees
- Expected to interact with R&D under the EU 7th Framework Program

8 XMDR Update
- Extended the capabilities to register more difficult kinds of metadata and concept systems
  - Linguistic ontologies (OMEGA)
  - Axiomatized ontologies (OpenCyc)
- Created new draft of ISO/IEC 11179. Working Draft 4 is out for comment; Committee Draft 1 is to go out in June.
  - Includes UML packages to make it easier to understand and easier to align with other standards
  - Looking at alignment with OASIS ebXML Registry
- Worked on mapping existing 11179 MDR (E2) extended content to proposed Edition 3, particularly the Cancer Data Standards Repository (caDSR).

9 XMDR Update
- Created new version of XMDR prototype software keyed to ISO/IEC 11179 Working Draft 4.
  - Revised ontology
  - Revised software
  - Reloaded previous content
  - Loading new content (ongoing)
    - OMEGA linguistic ontology
    - Cancer Data Standards Repository (caDSR)
    - OpenCyc ontology
    - SIC and NAICS codes
    - Mapping of NAICS to SIC codes
  - Improved interface

10 XMDR for Ecoinformatics Test Bed
- Demonstrate the use of the eXtended Metadata Registry (XMDR) to unite concept systems (such as ontologies) and metadata (which describes data) to support semantic services that help to answer tough questions.
- Load selected concept systems and metadata into the XMDR, then use semantics technologies, including semantics services, to make use of data and demonstrate the results.
- The demonstration is intended to help answer questions that are swirling around emerging semantics technologies.
  - Do these open new doors? Answer new questions?
  - How does this fit into the rest of what EPA is doing?
  - How can EPA lead in the use of these new technologies?
  - Why and how should EPA invest in the infrastructure that is necessary to make effective use of semantic technologies?
  - How is EPA aligning? What is the EPA strategy?

11 XMDR in Ecoinformatics Test Bed
Think of XMDR as "embedded": an essential part of an infrastructure upon which applications are built.
- Embed XMDR in EU FP7 project technology
- Embed XMDR in traditional database application environment
- Embed XMDR in new semantic computing environment

12 XMDR in Ecoinformatics Test Bed
- Include XMDR (ISO/IEC 11179 Edition 3) in architectures: DoD, EPA, Federal Enterprise Architecture
- Include XMDR as key enabling capability for Ecoinformatics
- Looking for a collaborator who has the "rest of the story" and can demonstrate the utility of XMDR

13 XMDR Demonstration using Water Information
Potential collaboration with the following:
- USGS Terminology Web Services
- EU FP7 EcoSemantics project
- GEOSS data integration
- Water Information System for Europe
- Water Data Infrastructure (WADI)
- Berkeley Water Center (BWC) Microsoft Technical Computing Initiative (TCI)
- BWC Digital Watershed Research Thrust Area
- Estuarine and Great Lakes Program (EAGLES)
- LBNL Environmental Modeling projects

14 XMDR in Ecoinformatics Test Bed
Demonstrate capabilities:
- Register existing and formative water-related concept systems, based on their underlying structures, such as graphs of varying complexity.
  - Register water ontologies as they are developed.
- Interrelate concept systems with each other.
- Support efforts to converge on consistency through harmonization and vetting activities.
- Interrelate concepts in concept systems with concepts in metadata and concepts in databases, knowledge bases, and text.
- Provide semantic services needed to support traditional computing as well as semantic computing.
  - E.g., dereferencing the URIs used in creating RDF statements, by providing relevant information describing the referenced concept and its authoritative standing within some community of interest.

15 Collaborate with USGS Terminology Web Services
- Already working with Mike Frame
- Capability to use web service to access terms in multiple concept systems
- Developed XMDR REST API to support this
- More from Mike Frame

16 XMDR Embedded in EcoSemantics Architecture

17 XMDR Prototype Modular Architecture: primary functional components
[Architecture diagram. Components shown:]
- Users: web browsers, client software
- Human User Interface
- Application Program Interface
- Authentication Service
- Search & Content Serving
- Registry Store
- Logic Index and Text Index
- Logic Indexer and Text Indexer
- Mapping Engine
- Validation
- Content Loading & Transformation
- Metadata Sources: concept systems, data elements
- Metamodel specs (UML & Editing)
- XMDR metamodel (OWL & XML Schema)
- Standard XMDR files
- XMDR data model & exchange format: XML, RDF, OWL

18 XMDR Prototype open source software components
[Architecture diagram. Components shown, with the software used for each in parentheses:]
- Users: web browsers, client software
- Human User Interface (HTML from JSP and JavaScript; Exhibit)
- Application Program Interface (REST)
- Authentication Service
- Search & Content Serving (Jena, Lucene)
- Registry Store
- Logic Index and Text Index
- Logic Indexer (Jena & Pellet)
- Text Indexer (Lucene)
- Mapping Engine
- Validation (XML Schema)
- Content Loading & Transformation (LexGrid & custom)
- Metadata Sources: concept systems, data elements
- Metamodel specs (UML & Editing) (Poseidon, Protege)
- XMDR metamodel (OWL & XML Schema)
- Standard XMDR files
- XMDR data model & exchange format: XML, RDF, OWL
- Postgres Database

19 New REST-style API facilitates interface for Web Services
[Architecture diagram: the component architecture of slide 18, repeated with the label "Third Party Software".]

20 Collaborate with GEOSS (with EPA and Others)
- Global Earth Observation System of Systems (GEOSS) ten-year implementation plan.
- GEOSS is envisioned as a large national and international cooperative effort to bring together existing and new hardware and software, making it all compatible in order to supply data and information at no cost. The U.S. and developed nations have a unique role in developing and maintaining the system, collecting data, enhancing data distribution, and providing models to help all of the world's nations. Outcomes and benefits of a global informational system will include:
  - disaster reduction
  - integrated water resource management
  - ocean and marine resource monitoring and management
  - weather and air quality monitoring, forecasting, and advisories
  - biodiversity conservation
  - sustainable land use and management
  - public understanding of environmental factors affecting human health and well-being
  - better development of energy resources
  - adaptation to climate variability and change
- Demonstrate data integration

21 GEOSS Standards and Interoperability Forum (ADC Co-Chair Meeting, 27 Nov 2006)
[Process diagram. Boxes: Experts, SDOs, Community; GEOSS Interoperability Registry; Base GEOSS Standards; GEOSS Standards Registry; GEOSS Societal Benefit Activity; GEOSS Components Registry. Arrow labels: References; Recommendation. Steps:]
- Request for help with interoperability between two GEOSS components
- Study for possible existing solutions
- Register the issue as "under review"
- Register the recommendations, if "accepted"
From: S.J.S. Khalsa, IEEE Geoscience and Remote Sensing Society, GEOS Interoperability

22 Collaborate with Water Information System for Europe (WISE)
- Register metadata about WISE data elements
- Register concept systems with concepts used in WISE data (glossary … ontology)
- Support data harmonization
- Initially shows support for traditional database computing
- Helps to enable introduction of semantic computing for WISE
- Are there any people working on WISE metadata and concept systems?

23 Collaboration with EPA Estuarine and Great Lakes Program (EAGLES)
The EAGLES Program is designed to:
- Develop indicators and/or procedures useful for evaluating the "health" or condition of important coastal natural resources (e.g., lakes, streams, coral reefs, coastal wetlands, inland wetlands, rivers, estuaries) at multiple scales, ranging from individual communities to coastal drainage areas to entire biogeographical regions.
- Develop indicators, indices, and/or procedures useful for evaluating the integrated condition of multiple resource/ecosystem types within a defined watershed, drainage basin, or larger biogeographical region of the U.S.
- Develop landscape measures that characterize landscape attributes and that concomitantly serve as quantitative indicators of a range of environmental endpoints, including water quality, watershed quality, freshwater/estuarine/marine biological condition, and habitat suitability.
- Develop nested suites of indicators that can both quantify the health or condition of a resource or system and identify its primary stressors at local to regional scales.
Proposed collaboration:
- XMDR as extension to the Environmental Information Management System (EIMS)

24 Collaborate with Water Data Infrastructure (WADI)
- WADI is a semantic computing application.
- WADI goes from data collection to indicator display.
- XMDR could support concept management for WADI.
- WADI still needs some R&D and demonstration.
- E.g., work on "integration" between a "data layer" (real data of RWS, all in XML and some basic low-level RDF) and some higher layer of vocabularies/thesauri/ontologies

25 Potential Collaboration with Berkeley Water Center Digital Watershed Research Thrust Area
- Understanding hydrological processes with sufficient accuracy, in the face of anthropogenic and global changes, is a prerequisite to successful water management.
- Progress in this area requires research in engineering and IT: data, technologies, modeling, analysis tools (Theme 1), and cyberinfrastructure (Theme 2).
- Developing an understanding requires synthesis of theory, concepts, and engineering/IT tools.

26 Digital Watershed Theme 1: TOOLS
Development of novel sensors, technologies, and modeling/analysis approaches is needed to provide information about complex water systems and to ensure cost-effective and sustainable delivery of clean water. Examples:
- SENSORS to autonomously measure important components of the water cycle and water quality at sufficient resolution and coverage.
- TECHNOLOGIES that promote, for example, point-of-use clean water use or cost-efficient desalinization.
- NUMERICAL APPROACHES that represent the coupling between atmosphere, vegetation, vadose, and groundwater processes that are important for accurately predicting watershed behavior and sustainability.

27 Theme 2: Water CyberInfrastructure
This theme focuses on the development of cyberinfrastructure that will enable researchers and water managers to:
- Curate, assimilate, and clean complex, multi-scale datasets collected from networked micro sensors to global satellite platforms;
- Connect datasets to analysis, modeling, and visualization tools to facilitate hypothesis testing and eventually decision making.

28 Microsoft Technical Computing Initiative Approach
- Demonstrate an advanced cyberinfrastructure approach for tackling 21st century challenges by leveraging web service concepts, technologies, and information technology expertise;
- Early focus will integrate the most critical components needed to address relevant science questions, rather than creating a fully developed problem-solving environment.
- Demonstrate prototypes with end-to-end scenarios, and use feedback from water scientists to refine and augment
- Work on two different, yet scientifically related projects that will:
  - Permit us to understand what is common and what is distinct between different water research approaches;
  - Allow us to work with a wide range of water datasets and analysis techniques;
  - Provide demonstration vehicles to two different water research communities.

29 Technical Computing Initiative
The Microsoft TCI will focus on development based on the needs of different water research communities.
CA WATER RESOURCES
- Extremely diverse datasets from many data providers;
- Datasets typically "dirtier" and larger than AmeriFlux;
- Project offers significant potential for transferability to other basins;
- Will build on advances developed under Carbon-Climate portal.
CARBON-CLIMATE
- Protocols for AmeriFlux data acquisition and reporting are well defined;
- Data are small and fairly clean;
- Will permit development and testing of a portal that will be rapidly useful for water scientists.
- Advances developed during this project will be applied to the development of the more challenging Central Valley portal.

30 Carbon-Climate Workbench
[Workbench diagram. Recoverable labels:]
- Web Service Interface to Data and Tools
- Hosted data: Ameriflux Climate Data, Statsgo Soils Data, MODIS products (LAI, Temp, Fpar, Veg Index, Surf Refl, NPP, Albedo); import other datasets
- Web-based Workbench access: choose Ameriflux area/transect, time range, data type
- Design Workflow: data harvest (Sites 1-16), gap fill (A technique, B technique), Canoak Model (Site 1, Site 9), statistical and graphical analysis, version control, network display
- Ecology Toolbox: Data Cleaning Tools, Data Mining and Analysis Tools, Modeling Tools, Visualization Tools, Knowledge Generation Tools
- Compute Resources

31 California Water CyberInfrastructure
- BWC is in discussion with several groups to determine optimal project/place to develop and demonstrate Water TCI.
- Criteria:
  - Agency involvement and interest;
  - Problem characteristics (science and socioeconomic importance; reward/risk);
  - Leveraging opportunity (projects / datasets);
  - Transferability to other basins;
  - Visibility
  - Springboard for Digital CAL synthesis
- Ideal: Work with two different basins to explore what is similar and different in terms of water data IT and science challenges;
- Long term: Scalability between water agency / basin datasets and supply/demand estimates and DWR State components. State Water Plan.

32 Example Water TCI focus: Central Valley Water Resources and Quality
- Across the US, groundwater supplies roughly 40 percent of drinking water;
- The State of California alone uses about 16 million acre-feet of groundwater each year, more than any other state in the nation, and 80% of that goes toward crop irrigation;
- The 400-mile-long Central Valley supplies ¼ of the food in the US.
- California groundwater quantity and quality is critical to the economic viability of the state;
- Recognizing this importance, USGS has developed a $50 million program focusing on CA water quality monitoring.
- PROBLEM: Disparate datasets and tools hinder ability to assess water resources and quality in Central Valley (and most basins in world)…
[Image: Central Valley. Source: Ken Belitz (USGS)]

33 USGS and State Water Resources Control Board GAMA* and RASA** Projects
- The importance of California groundwater quality and resources has prompted the USGS and SWRCB to develop a project to model flow pathways in the Central Valley (Central Valley RASA) and a $50M project to monitor groundwater quality (GAMA);
- As the GAMA project focuses on intensive data collection, no plans have been made to curate these data or to federate them with the other water datasets critical for understanding water balance and quality over time in the Central Valley.
* Ground Water Ambient Monitoring and Assessment Program
** Regional Aquifer Systems Analysis
(Ref: Ken Belitz, USGS)

34 Example of GAMA Water Quality Data
List of analytes:
- Volatile organic compounds
- Pesticides
- Stable isotopes, D, O-18
- Tritium-3He / noble gases
- Specific conductance
- Stable isotopes, 3H/He, noble gases
- Carbon isotopes (C-13, C-14)
- Radon, radium, gross alpha/beta
- Field parameters: temp, EC, DO, turbidity, pH, alk.
- Major ions and trace elements
- Arsenic and iron speciation
- Nutrients (nitrates, phosphates)
- Dissolved organic carbon
- Emerging contaminants
- E. coli, total coliform, coliphage
- Selected "emerging contaminants"
  - Pharmaceuticals
  - N-nitrosodimethylamine (NDMA)
  - Perchlorate
  - 1,4-dioxane
  - Chromium (total and VI)
Source: Ken Belitz (USGS)

35 California Water Portal (Digital CAL)
[Architecture diagram. Recoverable labels:]
- Distributed California Water Resource Datasets
- Data Harvesting and Transformations
- BWC Data Gateway
- BWC Analysis Gateway: Data Cleaning, Models, Analysis Tools
- Computational Resources
- BWC Water Portal: Dissemination and Archiving
- Knowledge Discovery, Hypothesis Testing, Water Synthesis

36 FYI: Special Edition of IJMSO
- Editing special edition of International Journal of Metadata, Semantics and Ontology
  - Open Forum on Metadata Registries
  - Topics related to metadata registries
- Inviting people to write articles
- Contact Bruce Bargmeyer

37 In Response to Mike Frame's Question
Describe the API for Terminology Web Services.

38 Initial XMDR REST-style Application Programming Interface (API)
- Search Methods (GET)
  - Text Search
  - SPARQL Search
  - XMDR Search (not documented yet)
- Registry Information Methods
  - Summary information
  - Registered models
  - Identified Items
- Method Parameters
  - Can be included as part of any method
  - Passed as part of the URL
  - Accept_type (what XML components to expect)
  - Stylesheet (how to display results)

39 REST API (Search Methods)
All URIs are relative to the application root. Asterisks (*, **, ***) are reproduced from the slide.

Text Search
  URI: search/text?query={queryText}
  Method: GET
  Representation: application/xml (searchResult)
  Accept (request): any (ignored)
  Description: Start a text search.

Text Search Results
  URI: search/text/{queryID}?offset={offset}&maxResults={maxResults}
  Method: GET
  Description: Retrieve the results of a text search.
  Representation / Accept (request):
    application/xml (textResultSet) / application/xml, application/*, or */*
    application/exhibit * / application/exhibit

SPARQL Search
  URI: search/sparql?query={queryText}&model={modelNameN}
  Method: GET
  Representation: application/xml (searchResult)
  Accept (request): any (ignored)
  Description: Start a SPARQL search.

SPARQL Search Results
  URI: search/sparql/{queryID}?offset={offset}&maxResults={maxResults}
  Method: GET
  Description: Retrieve the results of a SPARQL search.
  Representation / Accept (request):
    application/xml (sparqlResultSet) / application/xml, application/*, or */*
    application/sparql-results+xml ** / application/sparql-results+xml
    application/sparql-results+json *** / application/sparql-results+json, application/json
    application/exhibit * / application/exhibit
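
As a minimal sketch of how a client might assemble the search URIs listed on slide 39, the Python below builds them with the standard library only. Several things here are assumptions rather than part of the presentation: the application root `http://xmdr.example.org/` is a placeholder, the function names are hypothetical, the default page size of 10 is arbitrary, and sending one `model` parameter per model name is a guess at what `{modelNameN}` means.

```python
from urllib.parse import quote, urlencode

# Placeholder application root (assumption); the slides do not give the real address.
APP_ROOT = "http://xmdr.example.org/"


def text_search_url(query_text, root=APP_ROOT):
    """URI that starts a text search: search/text?query={queryText}."""
    return root + "search/text?" + urlencode({"query": query_text})


def sparql_search_url(query_text, model_names=(), root=APP_ROOT):
    """URI that starts a SPARQL search: search/sparql?query=...&model=...

    Assumption: one `model` parameter is sent per model name.
    """
    params = [("query", query_text)] + [("model", m) for m in model_names]
    return root + "search/sparql?" + urlencode(params)


def results_url(kind, query_id, offset=0, max_results=10, root=APP_ROOT):
    """URI that pages through results: search/{kind}/{queryID}?offset=..&maxResults=.."""
    if kind not in ("text", "sparql"):
        raise ValueError("kind must be 'text' or 'sparql'")
    paging = urlencode({"offset": offset, "maxResults": max_results})
    return f"{root}search/{kind}/{quote(query_id, safe='')}?{paging}"


if __name__ == "__main__":
    print(text_search_url("water quality"))
    print(results_url("text", "jfs934js", offset=0, max_results=25))
```

The two-step shape (start a search, then fetch pages by query ID) follows the table: the first request returns a searchResult, and the results resource is addressed by the `{queryID}` it carries.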

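40 * REST API (Search Results)
[The XML examples on this slide lost their markup in extraction; only the text content survives.]
- searchResult (application/xml): surviving value "jfs934js"
- textResultSet (application/xml): surviving comment "element names will be names of fields in the Lucene document and element values will be their string values"; surviving values "…" and "0"
- sparqlResultSet (application/xml): surviving value "0"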
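
Slide 39 lists `application/sparql-results+json` among the representations for SPARQL search results. Assuming the service returns the standard W3C SPARQL Query Results JSON layout (a `head.vars` list and a `results.bindings` list), a client could flatten a result page as sketched below. The function name and the sample document are invented for illustration and are not actual XMDR output.

```python
import json


def bindings_to_rows(sparql_json_text):
    """Flatten a W3C SPARQL JSON result into a list of {variable: value} dicts.

    Variables left unbound in a solution are reported as None.
    """
    doc = json.loads(sparql_json_text)
    variables = doc["head"]["vars"]
    rows = []
    for binding in doc["results"]["bindings"]:
        rows.append(
            {v: binding[v]["value"] if v in binding else None for v in variables}
        )
    return rows


# Illustrative input only (assumption); not real registry content.
SAMPLE = """
{
  "head": {"vars": ["concept", "label"]},
  "results": {"bindings": [
    {"concept": {"type": "uri", "value": "http://example.org/c/1"},
     "label": {"type": "literal", "value": "estuary"}},
    {"concept": {"type": "uri", "value": "http://example.org/c/2"}}
  ]}
}
"""

if __name__ == "__main__":
    for row in bindings_to_rows(SAMPLE):
        print(row)
```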

41 * REST: Registry (content) methods
All URIs are relative to the application root. (* indicates that the feature is not yet implemented.)

Registry content
  URI: content/
    GET *
      Representation: application/xml (contentList)
      Accept (request): any (ignored)
      Description: Retrieve the names of the models (concept systems) registered in the registry.
    POST *
      Representation: XML/RDF
      Description: Create a new item in the registry.
  URI: content/{path} (where path does not correspond to an identifier for an item in the registry)
    GET *
      Representation: application/xml (contentList)
      Accept (request): any (ignored)
      Description: Retrieve the immediate next portion of the path.

Identified Item
  URI: content/{ID}
    GET
      Representation: application/rdf+xml
      Accept (request): any (ignored)
      Description: Retrieve an Identified Item from the registry.
    PUT *
      Representation: XML/RDF
      Description: Update an Identified Item in the registry.
    DELETE *
      Representation: none
      Description: Remove an Identified Item from the registry.
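
The registry methods on slide 41 can be summarized as a small lookup table that a client consults before issuing a request. The sketch below encodes the slide's implemented/not-yet-implemented markers and builds the one request the slide shows as available, GET on `content/{ID}`. The application root, the function name, and the choice to percent-encode the whole identifier are assumptions made for illustration.

```python
from urllib.parse import quote

# Placeholder application root (assumption); the slides do not give the real address.
APP_ROOT = "http://xmdr.example.org/"

# (resource, HTTP method) -> True if the slide shows it without a star.
# Per the slide's footnote, starred methods are not yet implemented.
REGISTRY_METHODS = {
    ("content/", "GET"): False,
    ("content/", "POST"): False,
    ("content/{path}", "GET"): False,
    ("content/{ID}", "GET"): True,
    ("content/{ID}", "PUT"): False,
    ("content/{ID}", "DELETE"): False,
}


def identified_item_request(item_id, method="GET", root=APP_ROOT):
    """Return (method, url, headers) for an Identified Item request.

    Raises NotImplementedError for methods the slide marks as not yet implemented.
    """
    if not REGISTRY_METHODS.get(("content/{ID}", method), False):
        raise NotImplementedError(f"{method} content/{{ID}} is not available yet")
    # Assumption: the identifier is percent-encoded as a single path segment.
    url = root + "content/" + quote(item_id, safe="")
    # The slide says the Accept header is ignored for this method; it is set here
    # only to document the representation the slide lists (application/rdf+xml).
    return method, url, {"Accept": "application/rdf+xml"}


if __name__ == "__main__":
    print(identified_item_request("example-item-id"))
```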

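42 * REST API (Registry Results)
[The XML example on this slide lost its markup in extraction; only the text content survives.]
- contentList (application/xml): surviving values "nameOfItem", "…", "nameOfItemN"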

43 REST API (Method Parameters)
- acceptType: Treated as the Accept header value in the HTTP request (limited support: only 1 type with no modifiers).
- stylesheet *: Apply the stylesheet at the provided URI or path to the results (for now, must be on the application server).
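
Because the method parameters travel in the URL, a client can add them to any request URI. The sketch below appends `acceptType` and `stylesheet` to an existing URI and enforces the slide's stated limit of a single media type with no modifiers. The function name and example URLs are hypothetical, and the parameter spelling follows slide 43 (slide 38 writes it as `Accept_type`). The `stylesheet` parameter is starred on the slide, which slide 41 uses to mark features not yet implemented.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_method_params(url, accept_type=None, stylesheet=None):
    """Return `url` with the acceptType and/or stylesheet parameters appended."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    if accept_type is not None:
        # Slide 43: only one type, with no modifiers, is supported.
        if "," in accept_type or ";" in accept_type:
            raise ValueError("acceptType must be a single media type without modifiers")
        params.append(("acceptType", accept_type))
    if stylesheet is not None:
        params.append(("stylesheet", stylesheet))
    return urlunsplit(parts._replace(query=urlencode(params)))


if __name__ == "__main__":
    # Hypothetical host and query ID, for illustration only.
    base = "http://xmdr.example.org/search/text/jfs934js?offset=0"
    print(with_method_params(base, accept_type="application/xml"))
```

Passing the Accept value in the URL is convenient for clients, such as a browser address bar, that cannot easily set HTTP headers.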

44 Acknowledgements
- Susan Hubbard, BWC
- John McCarthy, LBNL
- Karlo Berket, LBNL
This material is based upon work supported by the National Science Foundation under Grant No. 0637122, USEPA and USDOD. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation, USEPA or USDOD.

