Presentation on theme: "Direction of Proposals for New Edition (E3) of ISO/IEC 11179 XMDR Working Group Presentation to SC 32/WG 2 meeting September, 2005 Toronto, Canada."— Presentation transcript:
Direction of Proposals for New Edition (E3) of ISO/IEC 11179
XMDR Working Group
Presentation to SC 32/WG 2 meeting, September 2005, Toronto, Canada
Page 2 XMDR Presentation Where have we been? Where are we now? … & where are we planning to go?
[Timeline figure. Labels: System manuals; Data dictionaries; Terminologies, ontologies, etc.; XML & related standards; Semantic grids; Semantics management for data; Semantics services (SSOA); Complex semantics management; Data engineering/XML Data; Data Standards/Data Administration; edition markers E, E2, E3; XMDR Project]
Page 4 XMDR Presentation The semantics challenge has evolved
Computer Era: 3rd Generation Languages
Challenge: Automated Data Processing – convert paper data systems to automated systems and improve processing.
Coded data to save memory, disk & tape.
Began to identify data with meaningful names
–Data naming methods were innovative and helpful
Described data using unstructured text in manuals and/or with comments embedded in software
–Only visible & useful to programmers
Text/documents were not computerized (remember typewriters, stencils, mimeographs, carbon paper?)
ISO/IEC JTC 1/SC 14 developed standard code sets (valid values)
Focus: Nomenclature for data only
Page 5 XMDR Presentation The semantics challenge has evolved
Computer Era: Early DBMS, 4th GL query systems, word processing
Challenge: Manage data – schema integration, eliminate “bit-twiddling”
Document data in data dictionaries, software packages usually linked to a DBMS.
Enforce “integrity constraints (e.g., valid values)”.
Use “description” field to describe data
Manage data life cycle
–Standard code sets (valid values) were useful, but difficult to manage – tended to be left behind by programming changes required to keep up with real world changes.
–Data naming methods failed to achieve interoperation of content between applications and between organizations, but remain useful as human friendly identifiers
SC 14 began to develop methodology for data element standardization
Edition 1 - Part 3, written in text, had ~15 attributes for data elements (editor: Netherlands)
Focus on standards for data elements
Page 6 XMDR Presentation The semantics challenge has evolved
Computer Era: DBMS, query systems, word processing (continued)
Challenge: Manage data – DBMS schema integration, data quality
Began to model data and processes
Modeling standards became useful: ERD, NIAM, UML
Word processing began to capture text documents
Keywords, glossaries, thesauri, and taxonomies became “machine readable”, but were treated as documents and were used manually
SC 14 developed methodology for data element standardization (E1)
Parts 4 & 5 covered data definitions and names. Part 6 covered registration.
Part 2 WD suggested development of a global taxonomy, then changed to specify classification attributes (term, definition & identifier) in Part 2 (E1)
All parts of the standard were written in text.
Focus on managing data elements and classification of data elements
Page 7 XMDR Presentation The semantics challenge has evolved
Computer Era: Maturing Relational DBMS, Metadata Registries, XML, early WWW
Challenge: Manage metadata, use terminology for data integration, data interoperability, data provenance, XML schema integration
SC 14 -> SC 32/WG 2. Developed Edition 2:
Broadened from data elements to management of all “administered items”; theme became “metadata registries”
Part 3 was expressed as a metamodel.
Part 3 included a “classification scheme region” (nodes & relationships) to improve semantics management
–Link terms in definitions and valid values to terms and definitions in vocabularies and terminologies
–Align concepts used in data with concepts used in text
–Use computers to create and manage terminologies, thesauri, taxonomies
Part 2 (E2) restated the classification scheme region attributes from Part 3. (All Part 2 E1 attributes were included in Part 3 (E2)).
Focus on semantics for data and text
Page 8 XMDR Presentation The semantics challenge has evolved
Computer Era: WWW, Concept systems, XMDR
Challenge: Semantics management & semantics services
SC 32/WG 2 is developing Edition 3 (E3).
Proposals are made to extend semantics management and semantics services for MDR
Page 9 XMDR Presentation MDR Uses
Data administration (design time)
–Data engineering
–Design of databases, DB applications, XML Schemas
–Design of messages
–Concept specification: Terminologies, Taxonomies, Ontologies …
–Data documentation
Data Integration & harmonization (design + run time)
–Federated queries, data warehousing
–Discovery of hidden relationships between data
–Provide links between concepts and data
Support for interactive users (run time)
–Data entry forms, output explanation
–Data discovery
–Show data provenance
–Provide understanding of data and related concepts
Semantic Services (design + run time)
–MDR metadata interchange
–Ground concepts found in RDF statements and ontologies
–Semantic computing
–Semantic grids & semantic web services
Page 10 XMDR Presentation ISO/IEC MDR Standard Goals
Used to record and link:
–Data elements
–Data element concepts
–Conceptual Domains
–Value Domains: e.g., enumerated value domains
–Classification Schemes
–…
Goal:
–To record the unambiguous meaning of data
Human understandable semantics: Current paradigm is natural language definitions
For E3: Machine “understandable”: Formal definitions (and axioms). Machine “understandable” in the sense that a computer can make use of concept systems for processing
Page 11 XMDR Presentation Advanced E3 Use Scenario
A user is concerned about a specific type of cancer
Wants to discover any documents on the web (reliable and unreliable sources) about the disease, causes, treatment, victims, and researchers
Wants to link concepts and individuals found in text to metadata and data in databases (where metadata/data relate to the concepts/individuals)
Wants to find relevant information where the terms used for the concepts vary: by regions, disciplines, scientific nomenclature, vernacular usage, language, and names of individuals.
Wants to find information that is related through generalization and specialization and other relationships.
Note: No assumption of federation or central control over data and text generation. However, well managed concept systems and metadata (e.g., data definitions) help.
Page 13 XMDR Presentation *Concept Use and Integration with Part 3, Edition 2
Object Class: Chemopreventive Agent
Property: NSCNumber
Conceptual Domain: Agent
Data Element Concept: Chemopreventive Agent NSC Number
Data Element: Chemopreventive Agent Name
Value Domain: NSC Code
Context: caCORE
Representation: Code
Classification Schemes: caDSRTraining
Valid Values: Cyclooxygenase Inhibitor, Doxercalciferol, Eflornithine, …, Ursodiol
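The items on this slide can be read as a small linked data structure. The following is a minimal sketch in Python, assuming a much-simplified view of the Part 3 metamodel: the class names, field names, and the `permits` helper are illustrative inventions, not names taken from the standard or from caDSR. Only the four valid values shown on the slide are included; the slide elides the rest.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class DataElementConcept:
    """Meaning of the data: an object class plus one of its properties."""
    name: str
    object_class: str
    property_name: str
    conceptual_domain: str


@dataclass(frozen=True)
class EnumeratedValueDomain:
    """Representation of the data: a closed set of permitted values."""
    name: str
    valid_values: FrozenSet[str]

    def permits(self, value: str) -> bool:
        # Hypothetical helper: membership test against the enumerated values.
        return value in self.valid_values


@dataclass(frozen=True)
class DataElement:
    """A data element links a concept (meaning) to a value domain (representation)."""
    name: str
    concept: DataElementConcept
    value_domain: EnumeratedValueDomain
    representation: str
    context: str
    classification_scheme: str


# Values copied from the slide; the slide lists more valid values than shown here.
concept = DataElementConcept(
    name="Chemopreventive Agent NSC Number",
    object_class="Chemopreventive Agent",
    property_name="NSCNumber",
    conceptual_domain="Agent",
)
domain = EnumeratedValueDomain(
    name="NSC Code",
    valid_values=frozenset(
        {"Cyclooxygenase Inhibitor", "Doxercalciferol", "Eflornithine", "Ursodiol"}
    ),
)
element = DataElement(
    name="Chemopreventive Agent Name",
    concept=concept,
    value_domain=domain,
    representation="Code",
    context="caCORE",
    classification_scheme="caDSRTraining",
)
```

With these definitions, `element.value_domain.permits("Ursodiol")` checks a candidate value against the enumerated domain, while `element.concept` carries the meaning independently of how the value is represented.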
Page 14 XMDR Presentation Semantic Management Extensions Goals for Edition 3
Sharable data that can easily be identified, shared, integrated, and made interoperable across information systems and organizations (a continuing challenge)
–Unambiguous metadata characteristics to register semantic, syntactic and lexical information about data and text
Human AND machine “understandable”
Maintain backward compatibility with Edition 2 (E2) implementations.
Registration and management of any semantic information useful for administering and managing the content of data and text
Page 15 XMDR Presentation Semantic Management Extensions Goals for Edition 3
Specify a disciplined way to manage linkage of concept systems (KOS) to administered items.
Improve the linkage of concept systems to data and text
Enable users to find correspondences between concepts in text and in data, where these are found in dispersed documents and databases. Concepts may be given linguistic expression with terms that vary by synonymy, discipline, region, language, etc.
Registration of semantics to facilitate concept (and data) mapping, inference, aggregation
Manage metadata for not only DBMS & XML schemas, but also for knowledge bases, concept systems, …
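The goal of finding correspondences when terms vary by synonymy, region, or language can be illustrated with a small term-to-concept index. This is only a sketch under stated assumptions: the concept identifiers, the sample terms, and the function names `build_term_index` and `concepts_for` are all hypothetical, and a real registry would draw its terms from managed concept systems rather than a hard-coded dictionary.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def build_term_index(concepts: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Map each case-folded term to the set of concept ids it can express."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for concept_id, terms in concepts.items():
        for term in terms:
            index[term.casefold()].add(concept_id)
    return index


def concepts_for(index: Dict[str, Set[str]], term: str) -> Set[str]:
    """Return the concept ids registered for a term, ignoring letter case."""
    return set(index.get(term.casefold(), set()))


# Illustrative data only: one concept expressed by a vernacular term,
# a scientific term, and a German term; a second concept for contrast.
concepts = {
    "concept:cancer": ["cancer", "malignant neoplasm", "Krebs"],
    "concept:agent": ["agent", "drug"],
}
index = build_term_index(concepts)
```

A lookup such as `concepts_for(index, "Malignant Neoplasm")` and one such as `concepts_for(index, "krebs")` resolve to the same concept id, which is the kind of correspondence between text and data that the slide describes. Generalization and specialization relationships would need a richer structure than this flat index.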
Page 16 XMDR Presentation Semantic Management Extensions Goals for Edition 3 (Continued)
Manage both data life cycle and ontology life cycle
Help to harmonize ontologies
Manage metamodels, reference ontologies & local ontologies
Restate Part 3 as an ontology and in Common Logic to enable use in semantics technologies (Semantic Web, inference engines, reasoners, …).
Restate Part 3 using MOF registries provide support for ISO/IEC
Specify semantics services for a semantics service oriented architecture. Enabler for semantic computing, semantic agents, semantic grids.
–Semantic services needed for semantic web and semantic grids to become part of ISO/IEC
Page 17 XMDR Presentation XMDR Intentions
We want to try to capture existing thesauri, terminologies, and ontologies as sources for the semantic specification of data elements to be used in databases, XML documents, messages, etc.
We want to incorporate more formal semantic specifications (e.g., ontologies, formal statements (axioms, sentences, ...)) to permit more precise semantic specifications (cf. natural language definitions).
We want to incorporate formal semantic specifications to facilitate machine processing of semantic specifications, e.g., by inference engines, agents, etc. Such machine processing of semantic specifications can be used in support of federated database access, web service identification and coordination, agent-based computations, etc.
We want to provide a framework for the registration, harmonization, evolution and standardization of ontologies.
Page 18 XMDR Presentation Conceptual vs. Information Centric Metadata Standards
[Figure. Conceptual Level: Ontology Standards (OWL, KIF, CL, ...); Terminology Standards. Information Artifacts: OMG Standards (MOF, CWM, UML); Metadata. Between the two levels: Connections ???]
Page 19 XMDR Presentation Space of Metadata Standards
[Figure. Conceptual models of the “real world”: Terminology Standards; Ontology Standards (OWL, KIF, CL, XTM, ...). About information artifacts (data elements, schemas, UML models, ...): OMG Standards (MOF, UML, CWM); Metadata Registry Standards. Spanning both: MMF & ISO/IEC Edition 3]
ISO/IEC connects both conceptual models and information artifacts.
Page 20 XMDR Presentation ISO/IEC Metadata Registry Standard
Connects both:
–Conceptual models of the real world: concepts, data element concepts, classification schemes; terminologies, taxonomies, ontologies
–Information artifacts: data elements, enumerated values, ...; UML models (e.g., in caDSR)