1 SC 32/WG 2 Tutorial Metadata Registry Standards July 16, 2007 Bruce Bargmeyer University of California, Berkeley and Lawrence Berkeley National Laboratory.


1 SC 32/WG 2 Tutorial Metadata Registry Standards July 16, 2007 Bruce Bargmeyer University of California, Berkeley and Lawrence Berkeley National Laboratory Tel: +1 510-495-2905 bebargmeyer@lbl.gov JTC1 SC32 N1649

2 Topics
- Standards development: OMG, ISO (TC 37 & JTC 1/SC 32), W3C, OASIS
  - Align, Coordinate, Integrate: Standards, Recommendations, Specifications
- Semantics Challenges and Future Directions

3 Align, Coordinate, Integrate Standards
WG 2 doing OK internally: 24707, 11179 E3, 19763, 20944

4 Align, Coordinate, Integrate Standards
SC 32 (WG 1, WG 2, WG 3, WG 4)? Clearwater meeting a step forward

5 Align, Coordinate, Integrate Standards/Recommendations/Specifications for Semantic Computing
[Figure: Users at the center of four standards efforts. ISO/IEC JTC 1/SC 32: ISO/IEC 11179 Metadata Registries (Metadata Registry, Terminology, Thesaurus, Taxonomy, Data Standards, Ontology, Structured Metadata). ISO TC 37: Terminology (Concept and Referent linked by the relations Refers To, Symbolizes, Stands For; examples “Rose”, “ClipArt Rose”). W3C: Semantic Web (RDF Graph: Node, Edge; Subject, Predicate, Object). OMG: Object Management (MOF, ODM, CWM, IMM).]

6 Standards Development: Semantics Management and Semantics Services – Semantic Computing
OMG, W3C, ISO/IEC JTC 1 SC 32, ISO TC 37: Align, Co-develop, Fast Track, PAS Submission …

7 Standards Development: Semantics Management and Semantics Services – Semantic Computing
OMG, W3C, ISO/IEC JTC 1 SC 32: Align, integrate, co-develop, Fast Track, PAS Submission … Can we coordinate content? (W3C)

8 A Success
OMG and ISO/IEC JTC 1 SC 32: ISO/IEC 24707 (Common Logic) and the OMG Ontology Definition Metamodel (ODM). Some text and figures are identical in the two standards.

9 Standards Development: Semantics Management and Semantics Services – Semantic Computing
Ongoing effort: ISO/IEC 11179 (Edition 3), ISO/IEC JTC 1 SC 32

10 Standards Development: Semantics Management and Semantics Services – Semantic Computing
Possible effort: 11179 E3 proposals; OMG RFP - MOF? IMM

11 Standards Development: Semantics Management and Semantics Services – Semantic Computing
Hopeful? ISO/IEC 11179 (Edition 3), ISO/IEC JTC 1 SC 32 & OMG IMM

12 Other Possibilities
- OASIS ebXML Registry
- W3C Semantic Web Deployment WG
- TC 37

13 The Ageless Information Problem
Getting the information that we need, when we need it, without afflicting the excellent minds of humans with toil and drudgery.
The litany:
- Too much or too little, irrelevant, not authoritative, out of date
- Unknown quality, not trustable, lacks provenance, no certainty measures
- Difficult to find, difficult to access, difficult to use
- Meaning not clear, relationship to other information not clear
- Data creators do not have the same understanding of the data as end users
- Recorded data loses much real world meaning, context, relationships
- Much of the meaning of data is buried in the processes used to manipulate the data (e.g., in computer code)
- Need improvements in efficiency and effectiveness
Every time we solve it, we re-create it.
cf: Data, Information, Knowledge, Wisdom

14 New Semantics Capabilities Proposed for ISO/IEC 11179 MDR (Edition 3)
- Improve traditional data management/data administration
  - Use stronger semantics management and semantics services capabilities
- Enable something new
  - Semantic computing

15 Semantic Computing: The Nub of It
- Processing that takes “meaning” into account
  - Makes use of concept systems, e.g., thesauri and/or ontologies
  - Moves some of the “meaning” of data from computer code to managed semantics
- Processing that uses (e.g., reasons across) the relations between things, not just computing about the things themselves
- Processing that helps to take people out of the computation, reducing the human toil
  - Semantics “grounding” for data, data discovery, extraction, mapping, translation, formatting, validation, inferencing, …
- Delivering higher-level results that are more helpful for the user’s thought and action

16 In The Epic Information Struggle We Have Made Heroic Progress
[Figure: Files; Machine Processing, Computer Processing; Cards, Tape, Disk]

17 In The Epic Information Struggle We Have Made Heroic Progress
In structuring data and text --
- Structured Data
  - Columns on cards & tape (possibly comma separated)
  - Hierarchical (DBMS)
  - Network
  - Table (relational DBMS)
  - Hierarchy (XML)
  - Graph (RDF)
- Semi-structured text
  - nroff, troff, LaTeX …
  - SGML
  - HTML
  - XML

18 In The Epic Information Struggle We Have Made Heroic Progress
In documenting data and text (e.g., semantics management) –
- Data Standards
  - Code sets
- (Meta)Data Standards
  - Data element definitions, valid values, value meanings
  - Metadata registries (MDR, ISO/IEC 11179)
  - Other standards as presented at this conference
- Concept systems (or KOS)
  - Glossaries
  - Dictionaries
  - Thesauri
  - Taxonomies
  - Ontologies
  - Graphs

19 Semantic Management Proposals for 11179 Edition 3
- Improve data management through use of stronger semantics management
  - Databases
  - XML data
  - Other “traditional” data
- Enable new wave of semantic computing
  - Take meaning of data into account
  - Process across relations as well as properties
  - May use reasoning engines, e.g., to draw inferences

20 Semantics Improve Data Management/Data Administration
Enterprise Vocabulary Services (EVS) Concepts Unite NCI MDR
Object Class: Chemopreventive Agent
Property: NSC Number
Conceptual Domain: Agent
Data Element Concept: Chemopreventive Agent NSC Number
Data Element: Chemopreventive Agent Name
Value Domain: NSC Code
Context: caCORE
Representation: Code
Classification Schemes: caDSRTraining
Valid Values: Cyclooxygenase Inhibitor, Doxercalciferol, Eflornithine, …, Ursodiol
Source: Denise Warzel, National Cancer Institute
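The registry entry on this slide can be pictured as a few linked records. The following is only a minimal sketch: the class and field names are hypothetical and are not the ISO/IEC 11179 metamodel or the caDSR schema, and the valid-value list holds only the four values the slide shows (the slide elides the rest).

```python
from dataclasses import dataclass

# Hypothetical record types for illustration; not the ISO/IEC 11179 metamodel.
@dataclass(frozen=True)
class DataElementConcept:
    name: str
    object_class: str
    property_name: str
    conceptual_domain: str

@dataclass(frozen=True)
class ValueDomain:
    name: str
    representation: str
    valid_values: tuple  # only the values visible on the slide

@dataclass(frozen=True)
class DataElement:
    name: str
    context: str
    concept: DataElementConcept
    value_domain: ValueDomain

    def accepts(self, value):
        """True if `value` is one of the registered valid values."""
        return value in self.value_domain.valid_values

element = DataElement(
    name="Chemopreventive Agent Name",
    context="caCORE",
    concept=DataElementConcept(
        name="Chemopreventive Agent NSC Number",
        object_class="Chemopreventive Agent",
        property_name="NSC Number",
        conceptual_domain="Agent",
    ),
    value_domain=ValueDomain(
        name="NSC Code",
        representation="Code",
        valid_values=("Cyclooxygenase Inhibitor", "Doxercalciferol",
                      "Eflornithine", "Ursodiol"),
    ),
)
```

Here `element.accepts("Ursodiol")` is true and `element.accepts("Tylenol")` is false, which is the kind of validation a registered value domain makes possible.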

21 Semantic Computing Application: Find and process non-explicit data
For example: patient data on drugs contains brand names (e.g., Tylenol, Anacin-3, Datril, …); however, we want to study patients taking analgesic agents.
Concept hierarchy: Analgesic Agent > Non-Narcotic Analgesic > Acetaminophen, Nonsteroidal Antiinflammatory Drug, Analgesic and Antipyretic; Acetaminophen > Datril, Anacin-3, Tylenol
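A minimal sketch of how a concept system could answer this question: walk each recorded drug name up its "is a kind of" links until the target concept is reached. The shape of the hierarchy is an assumption reconstructed from the slide's flattened figure, and the patient records are invented for illustration.

```python
# Child -> parent links. The hierarchy shape is an assumption reconstructed
# from the slide's figure; it is not copied from a published vocabulary.
PARENT = {
    "Non-Narcotic Analgesic": "Analgesic Agent",
    "Acetaminophen": "Non-Narcotic Analgesic",
    "Nonsteroidal Antiinflammatory Drug": "Non-Narcotic Analgesic",
    "Analgesic and Antipyretic": "Non-Narcotic Analgesic",
    "Tylenol": "Acetaminophen",
    "Anacin-3": "Acetaminophen",
    "Datril": "Acetaminophen",
}

def is_a(term, ancestor):
    """Follow parent links from `term`; True if `ancestor` is reached."""
    while term is not None:
        if term == ancestor:
            return True
        term = PARENT.get(term)
    return False

# Hypothetical patient records: (patient id, drug name as recorded).
patients = [("P1", "Tylenol"), ("P2", "Datril"), ("P3", "Ursodiol")]

analgesic_patients = [pid for pid, drug in patients
                      if is_a(drug, "Analgesic Agent")]
```

The brand names never mention "analgesic", yet P1 and P2 are selected because the concept system supplies the missing link.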

22 A Semantics Application: Specify and compute across Relations, e.g., within a food web in an Arctic ecosystem
An organism is connected to another organism for which it is a source of food energy and material by an arrow representing the direction of biomass transfer.
Source: http://en.wikipedia.org/wiki/Food_web#Food_web (from SPIRE)
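Computing across relations can be sketched as a graph traversal. In the sketch below, each edge points from a food source to its consumer, matching the arrow direction described on the slide. The species and links are illustrative assumptions, not data from SPIRE.

```python
from collections import deque

# Edges run from food source to consumer (direction of biomass transfer).
# The species and links are illustrative, not taken from SPIRE.
FOOD_WEB = {
    "phytoplankton": ["zooplankton"],
    "zooplankton": ["arctic cod"],
    "arctic cod": ["ringed seal", "seabird"],
    "ringed seal": ["polar bear"],
}

def dependents(source):
    """Return every organism that directly or indirectly feeds on `source`."""
    found, queue = set(), deque([source])
    while queue:
        for consumer in FOOD_WEB.get(queue.popleft(), []):
            if consumer not in found:
                found.add(consumer)
                queue.append(consumer)
    return found

affected = dependents("zooplankton")
```

The answer depends on the relations between organisms, not on any property stored with a single organism.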

23 Semantics Application: Combine Data, Metadata & Concept Systems
Data:
ID  Date      Temp  Hg
A   06-09-13  4.4   4
B   06-09-13  9.3   2
X   06-09-13  6.7   78
Metadata:
Name  Datatype  Definition                     Units
ID    text      Monitoring Station Identifier  not applicable
Date  date      Date                           yy-mm-dd
Temp  number    Temperature (to 0.1 degree C)  degrees Celsius
Hg    number    Mercury contamination          micrograms per liter
Concept system: Contamination > Biological, Chemical, Radioactive; Chemical > lead, cadmium, mercury
Inference Search Query: “find water bodies downstream from Fletcher Creek where chemical contamination was over 10 micrograms per liter between December 2001 and March 2003”
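A minimal sketch of the inference step in this slide: the query asks about "chemical contamination", the concept system says mercury is a chemical contaminant, and the metadata says the Hg column records mercury in micrograms per liter. The dictionary layout and function names are assumptions for illustration. The sketch covers only the concept-to-column step and the threshold; the "downstream from Fletcher Creek" and date-range conditions are omitted because the slide gives no data for them.

```python
# Concept system (child -> parent), reconstructed from the slide's figure.
CONCEPT_PARENT = {
    "Biological": "Contamination", "Chemical": "Contamination",
    "Radioactive": "Contamination",
    "lead": "Chemical", "cadmium": "Chemical", "mercury": "Chemical",
}

# Metadata: which concept and units each data column records (assumed layout).
COLUMN_METADATA = {
    "Temp": {"concept": "temperature", "units": "degrees Celsius"},
    "Hg": {"concept": "mercury", "units": "micrograms per liter"},
}

# Data rows as shown on the slide.
ROWS = [
    {"ID": "A", "Date": "06-09-13", "Temp": 4.4, "Hg": 4},
    {"ID": "B", "Date": "06-09-13", "Temp": 9.3, "Hg": 2},
    {"ID": "X", "Date": "06-09-13", "Temp": 6.7, "Hg": 78},
]

def falls_under(concept, ancestor):
    """True if `concept` equals `ancestor` or lies below it."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = CONCEPT_PARENT.get(concept)
    return False

def columns_measuring(ancestor, units):
    """Columns whose registered concept falls under `ancestor` in `units`."""
    return [col for col, meta in COLUMN_METADATA.items()
            if meta["units"] == units and falls_under(meta["concept"], ancestor)]

def stations_over(ancestor, units, threshold):
    """Station IDs where any matching column exceeds `threshold`."""
    cols = columns_measuring(ancestor, units)
    return sorted({row["ID"] for row in ROWS for col in cols
                   if row[col] > threshold})

hits = stations_over("Chemical", "micrograms per liter", 10)
```

No column is named "chemical contamination"; the Hg column is found only by combining the metadata with the concept system.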

24 Semantics Application: Use data from systems that record the same facts with different terms
- Reduce the human toil of drawing information together and performing analysis.

25 Challenge: Use data from systems that record the same facts with different terms
[Figure: registries and repositories that each hold Common Content: OASIS/ebXML Registries, ISO 11179 Registries, Ontological Registries, CASE Tool Repositories, UDDI Registries, Software Component Registries, Database Catalogs, Dublin Core Registries. Terms they use for the same kind of content: Country Identifier, Data Element, XML Tag, Term Hierarchy, Attribute, Business Specification, Table Column, Business Object, Coverage.]

26 Same Fact, Different Terms
Data Element Concept: Name: Country Identifiers; Unique ID: 5769 (Context, Definition, Conceptual Domain, Maintenance Org., Steward, Classification, Registration Authority, Others: no values shown)
Countries: Algeria, Belgium, China, Denmark, Egypt, France, ..., Zimbabwe
Data Elements: Unique ID: 4572 (Name, Context, Definition, Value Domain, Maintenance Org., Steward, Classification, Registration Authority, Others: no values shown)
ISO 3166 English Name: Algeria, Belgium, China, Denmark, Egypt, France, ..., Zimbabwe
ISO 3166 French Name: L'Algérie, Belgique, Chine, Danemark, Egypte, La France, ..., Zimbabwe
ISO 3166 2-Alpha Code: DZ, BE, CN, DK, EG, FR, ..., ZW
ISO 3166 3-Alpha Code: DZA, BEL, CHN, DNK, EGY, FRA, ..., ZWE
ISO 3166 3-Numeric Code: 012, 056, 156, 208, 818, 250, ..., 716
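Once the different representations are registered against one concept, any of them can be resolved to a single form. This is a minimal sketch using only the seven countries shown on the slide; the choice of the 2-alpha code as the canonical form and the function name `to_alpha2` are assumptions for illustration.

```python
# (2-alpha, 3-alpha, 3-numeric, English name, French name) as on the slide.
COUNTRIES = [
    ("DZ", "DZA", "012", "Algeria", "L'Algérie"),
    ("BE", "BEL", "056", "Belgium", "Belgique"),
    ("CN", "CHN", "156", "China", "Chine"),
    ("DK", "DNK", "208", "Denmark", "Danemark"),
    ("EG", "EGY", "818", "Egypt", "Egypte"),
    ("FR", "FRA", "250", "France", "La France"),
    ("ZW", "ZWE", "716", "Zimbabwe", "Zimbabwe"),
]

# Index every representation (case-insensitively) to the 2-alpha code.
INDEX = {}
for record in COUNTRIES:
    for term in record:
        INDEX[term.casefold()] = record[0]

def to_alpha2(term):
    """Map any registered representation to the 2-alpha code, else None."""
    return INDEX.get(term.strip().casefold())
```

With this index, `to_alpha2("Belgique")`, `to_alpha2("056")` and `to_alpha2("bel")` all resolve to `"BE"`, so records from systems that use different terms can be joined on the same fact.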

27 Challenge: Draw information together from a broad range of studies, databases, reports, etc.

28 A semantics application: Information Extraction and Use
[Figure: Extraction Engine (Segment, Classify, Associate, Normalize, Deduplicate), followed by Discover patterns, Select models, Fit parameters, Inference, Report results, leading to Actionable Information and Decision Support; supported by 11179-3 (E3) XMDR]

29 Extraction Engines
- Find concepts and relations between concepts in text, tables, data, audio, video, …
- Produce databases (relational tables, graph structures), and other output
- Functions:
  - Segment – find text snippets (boundaries important)
  - Classify – determine database field for text segment
  - Associate – decide which text segments belong together
  - Normalize – put information into standard form
  - Deduplicate – collapse redundant information
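The functions listed on this slide can be sketched as a toy pipeline. This is a deliberately simplified illustration, not how a production extraction engine works: the rules are hand-written stand-ins, the lookup table and sample text are hypothetical, and the Associate step is omitted.

```python
import re

# Hypothetical lookup used by the Normalize step (brand name -> generic).
CANONICAL_DRUG = {
    "tylenol": "Acetaminophen",
    "datril": "Acetaminophen",
    "acetaminophen": "Acetaminophen",
}

def segment(text):
    """Segment: split raw text into snippets at '.' and ';' boundaries."""
    return [s.strip() for s in re.split(r"[.;]", text) if s.strip()]

def classify(snippet):
    """Classify: choose a database field for the snippet (toy rule)."""
    lowered = snippet.lower()
    return "drug" if any(name in lowered for name in CANONICAL_DRUG) else "other"

def normalize(snippet):
    """Normalize: replace a recognized drug mention with its standard form."""
    lowered = snippet.lower()
    for name, canonical in CANONICAL_DRUG.items():
        if name in lowered:
            return canonical
    return snippet

def extract(text):
    """Run the steps in order and deduplicate the resulting records."""
    seen, records = set(), []
    for snippet in segment(text):
        field = classify(snippet)
        value = normalize(snippet) if field == "drug" else snippet
        if (field, value) not in seen:  # Deduplicate
            seen.add((field, value))
            records.append((field, value))
    return records

records = extract("Patient took Tylenol; later given Datril. Reports mild headache")
```

The two drug mentions collapse into one normalized record, which is where registered code sets and concept systems can support the Normalize and Classify steps.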

30 Metadata Registries are Useful
Registered semantics
- For “training” extraction engines
- The “Normalize” function can make use of standard code sets that have mapping between representation forms.
- The “Classify” function can interact with pre-established concept systems.
Provenance
- High precision for proper nouns, less precision (e.g., 70%) for other concepts -> impacts downstream processing. Need to track precision.

31 Challenge: Gain Common Understanding of meaning between Data Creators and Data Users
[Figure: data creators and users (EEA, USGS, DoD, EPA, others) each run information systems holding text and data coded against their own term lists, some in English (environ, agriculture, climate, human health, industry, tourism, soil, water, air) and some in Spanish (ambiente, agricultura, tiempo, salud humana, industria, turismo, tierra, agua, aire). Goal: a common interpretation of what the data represents.]

32 Practical Vocabulary Management
- Vocabulary Management is essential for use of semantic technologies
  - Define concepts and relationships
  - Harmonize terminology, resolve conflicts
  - Collaborate with stakeholders
- An approach
  - Select a domain of interest
  - Enter core concepts and relationships
  - Engage community in vocabulary review
  - Harmonize, validate and vet the vocabulary
  - Enter metadata describing enterprise data
  - Link concept system to metadata

33 Use eXtended MDR Capabilities
- For vocabulary repository
  - Register, harmonize, validate, and vet definitions and relations
- To register mappings between multiple vocabularies
- To register mappings of concepts to data
- To provide semantics services
- To register and manage the provenance of data
11179-3 (E3) is part of the infrastructure for semantics and data management.
These capabilities are proposed for ISO/IEC 11179 Edition 3.

34 11179 (E3) Use
- Upside
  - Collaborative
    - Supports interaction with community of interest
    - Shared evolution and dissemination
    - Enables Review Cycle
  - Standards-based – don’t lock semantics into proprietary technology
  - Foundation for strategic data centric applications
  - Lays the foundation for Ontology-based Information Management
  - Content is reusable for many purposes
- Downside
  - Managing semantics is HARD WORK - No matter how friendly the tools
  - Needs integration with other components

35 Some Challenges
- Data management and metadata management must evolve to address more complex data structures (relational, object, hierarchies, graphs)
  - Query capabilities
    - More than SQL, XQuery, SPARQL
  - Discovery mechanisms
    - More than Google
  - Access, mining, extraction
We need stronger semantics management

36 Metadata Registry Support for:
- Registering and mapping ontologies
- Ontology Evolution
- Registering Process Ontologies

37 Thank You
- Acknowledgements
  - Karlo Berket, LBNL
  - Kevin Keck, LBNL
  - John McCarthy, LBNL
  - Harold Solbrig, Apelon
This material is based upon work supported by the National Science Foundation under Grant No. 0637122, USEPA and USDOD. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation, USEPA or USDOD.
Bruce Bargmeyer, Lawrence Berkeley National Laboratory & Berkeley Water Center, University of California, Berkeley. Tel: +1 510-495-2905 bebargmeyer@lbl.gov

