
XMDR Project Overview. Frank Olken & Kevin D. Keck, Lawrence Berkeley National Laboratory. 9th Open Forum on Metadata Registries, Kobe, Japan, 2006-03-21.


1 XMDR Project Overview
Frank Olken & Kevin D. Keck, {olken,kdkeck}@lbl.gov
Lawrence Berkeley National Laboratory
Presentation to Open Metadata Forum, Kobe, Japan, March 21, 2006

2 XMDR means: Extended Metadata Registry

3 The Cast
● Bruce Bargmeyer (LBNL) = Principal Investigator
● Kevin Keck (LBNL) = architect & stds. (design)
● Frank Olken (LBNL) = content characterization & stds. (design)
● John McCarthy (LBNL) = prototype development (management)
● Karlo Berket (LBNL) = prototype development
● Harold Solbrig (Mayo) = content preprocessing via LexGrid, stds.
● Gayle Hodge (USGS) = content characterization, acquisition
● Denise Warzel (NCI) = content acquisition, standards, design
● Larry Fitzwater (EPA) = program mgt. (vision, direction)
● Nancy Lawler (DOD) = program mgt. (vision, direction)
● Sam Chance (DOD) = program mgt. (vision, direction)

4 Organizational Cast
● Lawrence Berkeley National Laboratory
● Environmental Protection Agency
● National Cancer Institute
● Mayo Clinic
● United States Geological Survey
● Department of Defense

5 Goals
● Assist revisions of ISO/IEC 11179 Metadata Registry Standard to encompass additional semantic descriptions and resources
  ○ Vocabularies, thesauri, etc.
  ○ Ontologies
  ○ Relationships
  ○ Semantic types
● Design and implement prototype Extended Metadata Registry
● Load metadata content into prototype
● Demonstrate prototype

6 Why Metadata Registries?
● Facilitate reuse / standardization / integration / exchange of data
● Design time:
  ○ Database / messaging / application / forms designers
  ○ Data warehouse design
● Run time:
  ○ Query formulation / optimization
  ○ Federated data query optimization / processing
  ○ Extract, Transform, Load (ETL) of data warehouses
  ○ Semantic services, composition, workflows, ...
● Users
  ○ Finding, understanding data
  ○ Understanding data entry forms

7 Why Standards?
● Developing metamodel to serve as design for next generation metadata registries
● Evolve ISO/IEC 11179 Metadata Registry Standard
  ○ Edition 2 (current)
    ▪ UML modeling, relational DB technology implementation
  ○ Edition 3 (new)
    ▪ UML + OWL (Web Ontology Language) / MOF (Meta Object Facility) / CL (Common Logic) modeling
    ▪ Add support for ontologies

8 More on Why MDR Standards?
● MDR Standards
  ○ Can improve metadata creation practice
  ○ Can improve metadata and data reuse
  ○ Facilitate MDR adoption by organizations
  ○ Facilitate MDR interoperability
  ○ Facilitate MDR software marketing
  ○ Facilitate MDR procurement
  ○ Facilitate alignment / mapping among metadata schemas, ...

9 Proposed Changes to ISO/IEC 11179
● Support for ontologies, etc.
● More formal modeling of relationships
● Semantic types (?)

10 Changes to ISO/IEC 11179 Std.
● Add support for ontologies, vocabularies
  ○ Add ontologies
  ○ Add predicates (logical formulae)
  ○ Add axioms (asserted to be true)
  ○ Add support for modularization of ontologies
    ▪ Add inclusion mechanisms for concept systems and ontologies
    ▪ Assert axioms in context of containing ontology
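A minimal sketch of what an inclusion mechanism with context-scoped axioms could look like. It is not taken from the XMDR prototype or from ISO/IEC 11179; the `Ontology` class, the `effective_axioms` helper, and the sample ontologies are all hypothetical names chosen for illustration. Each axiom is reported together with the ontology that asserts it, and an included ontology is visited only once even when it is reachable by several paths.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Ontology:
    """Hypothetical registry item: a named ontology with axioms and inclusions."""
    name: str
    axioms: List[str] = field(default_factory=list)          # asserted directly here
    includes: List["Ontology"] = field(default_factory=list)  # included ontologies


def effective_axioms(onto: Ontology,
                     _seen: Optional[Set[str]] = None) -> List[Tuple[str, str]]:
    """Return (asserting ontology, axiom) pairs visible from `onto`.

    Inclusions are followed recursively; each ontology contributes once.
    """
    seen = set() if _seen is None else _seen
    if onto.name in seen:
        return []
    seen.add(onto.name)
    pairs = [(onto.name, axiom) for axiom in onto.axioms]
    for included in onto.includes:
        pairs.extend(effective_axioms(included, seen))
    return pairs


# Illustrative content only.
units = Ontology("units", ["Kilogram is-a MassUnit"])
chemistry = Ontology("chemistry", ["Benzene is-a Hydrocarbon"], includes=[units])
environment = Ontology("environment", ["Benzene is-a Pollutant"],
                       includes=[chemistry, units])

for context, axiom in effective_axioms(environment):
    print(f"[{context}] {axiom}")
```

Keeping the asserting ontology next to each axiom is one way to stay agnostic about which modules a given registry user chooses to trust or load.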

11 Why add support for ontologies?
● More precise specification of data semantics (than natural language definitions)
● Machine processing of semantic specifications of data
  ○ Classification, subsumption testing, alignment, spatial, temporal reasoning
● Reusable semantic specifications for subject domains
● Conceptual data models to facilitate data integration
● Encoding of much current work on data semantics and terminologies as ontologies
● Useful for machine learning

12 Issues in Including Ontologies in ISO/IEC 11179
● Lack of agreement on logical formalisms
  ○ FOL, description logic (which?), ...
● Hence, MDR std. must be agnostic among logic formalisms
● Poses difficulties for:
  ○ Standards specification
  ○ MDR implementation
  ○ MDR interoperability
● See work of OMG Ontology Definition Metamodel (ODM) standard

13 Changes to ISO/IEC 11179 Std.
● Formalize specification of semantic relationships
  ○ Refinement of Edition 2 Classification Schemes
  ○ Add relationships (types), roles, links (instances) among concepts
  ○ Specify attributes of relationships
    ▪ Reflexivity, irreflexivity, symmetry, anti-symmetry, transitivity
  ○ To support inference across semantic relationships
    ▪ e.g., transitive closure over is-a, part-of, ...

14 Relationship Modeling in ISO/IEC 11179 Edition 3
● Edition 2 has classification schemes and specialized relationships among various metamodel entities
● Proposed for Edition 3:
● Binary and N-ary semantic relationships among concepts (a.k.a. relations)
● Treat data element concept, conceptual value domain, value meaning, etc. as subtypes of concept
● More detailed characterization of relationships:
  ○ Roles / links
  ○ Reflexivity, symmetry, anti-symmetry, transitivity, ...
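One way to picture the proposed characterization is a relationship type that carries its role names and declared properties, plus a check that recorded links do not contradict those declarations. This is a sketch under assumptions: `RelationshipType`, `violations`, and the sample links are hypothetical and do not reproduce the Edition 3 metamodel. Only binary links are handled.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class RelationshipType:
    """Hypothetical relationship type with role names and declared properties."""
    name: str
    roles: Tuple[str, str]      # e.g. ("part", "whole")
    reflexive: bool = False
    irreflexive: bool = False
    symmetric: bool = False
    antisymmetric: bool = False
    transitive: bool = False


def violations(rel: RelationshipType,
               links: Iterable[Tuple[str, str]]) -> List[str]:
    """List conflicts between the declared properties and the given links."""
    pairs = set(links)
    problems = []
    if rel.irreflexive:
        # A self-link contradicts irreflexivity.
        problems += [f"{a} {rel.name} {a} (irreflexive)"
                     for a, b in sorted(pairs) if a == b]
    if rel.antisymmetric:
        # Links in both directions between distinct concepts contradict
        # anti-symmetry; a < b reports each such pair once.
        problems += [f"{a} <-> {b} (anti-symmetric)"
                     for a, b in sorted(pairs) if a < b and (b, a) in pairs]
    return problems


part_of = RelationshipType("part-of", ("part", "whole"),
                           irreflexive=True, antisymmetric=True, transitive=True)
links = [("wheel", "car"), ("car", "wheel"), ("engine", "engine")]
print(violations(part_of, links))
```

Symmetry and transitivity are treated here as licenses to infer further links rather than as constraints, so they produce no violations.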

15 Why care about relationship characterization?
● Who cares about reflexivity, irreflexivity, symmetry, transitivity?
● Answer: need this information for inference on semantic relationships (usually binary)
  ○ Example: Does it make sense to compute transitive closure?
    ▪ Is-a: transitive
    ▪ Part-of: sometimes transitive
    ▪ Equals: transitive, symmetric
    ▪ Similar: usually symmetric, typically not transitive
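The point of the example can be sketched in a few lines: links are expanded only by the inferences that the relationship's declared properties license. The `infer` function and the sample concepts are hypothetical, and the closure is computed by a plain reachability search rather than by any particular inference engine.

```python
from collections import defaultdict
from typing import Iterable, Set, Tuple

Link = Tuple[str, str]


def infer(links: Iterable[Link], transitive: bool = False,
          symmetric: bool = False) -> Set[Link]:
    """Return the given links plus those implied by the declared properties."""
    inferred = set(links)
    if symmetric:
        inferred |= {(b, a) for a, b in inferred}
    if transitive:
        successors = defaultdict(set)
        for a, b in inferred:
            successors[a].add(b)
        for start in list(successors):
            # Depth-first search for everything reachable from `start`.
            stack = list(successors[start])
            reachable = set()
            while stack:
                node = stack.pop()
                if node not in reachable:
                    reachable.add(node)
                    stack.extend(successors.get(node, ()))
            inferred |= {(start, node) for node in reachable}
    return inferred


is_a = [("beagle", "dog"), ("dog", "mammal"), ("mammal", "animal")]
similar = [("a", "b"), ("b", "c")]

# Is-a is transitive, so the closure is meaningful.
print(("beagle", "animal") in infer(is_a, transitive=True))   # True
# Similar is symmetric but not transitive, so a-c is not inferred.
print(("a", "c") in infer(similar, symmetric=True))           # False
```

For a relationship such as part-of, which is only sometimes transitive, a registry would need finer-grained information than a single flag before applying the closure.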

16 Semantic Types for ISO/IEC 11179
● ISO/IEC 11179 Edition 2 has “datatypes”
  ○ Associated with “value domain”
  ○ i.e., datatypes are an aspect of representation, NOT semantics
● Semantic Types
  ○ Concern meaning rather than representation
  ○ Uses:
    ▪ Constraints over relationship roles
    ▪ Attribute of concepts, conceptual value domains, ...
    ▪ Ubiquitous in ontologies, schemas, ...
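As an illustration of semantic types used as constraints over relationship roles, the sketch below checks whether the concepts filling each role of a link have an acceptable semantic type. The type hierarchy, the `treats` relationship, and every name in the tables are invented for this example and are not drawn from ISO/IEC 11179 or from any loaded vocabulary.

```python
from typing import Dict, List, Optional

# Hypothetical semantic type hierarchy: type -> its direct supertype.
SUBTYPE_OF: Dict[str, str] = {
    "Drug": "Substance",
    "Substance": "Entity",
    "Disease": "Entity",
}

# Hypothetical role constraints: relationship -> role -> required semantic type.
ROLE_TYPES: Dict[str, Dict[str, str]] = {
    "treats": {"agent": "Substance", "condition": "Disease"},
}

# Hypothetical assignment of semantic types to concepts.
CONCEPT_TYPE: Dict[str, str] = {
    "aspirin": "Drug",
    "headache": "Disease",
    "fever": "Disease",
}


def is_subtype(semantic_type: Optional[str], ancestor: str) -> bool:
    """True if `semantic_type` equals `ancestor` or descends from it."""
    while semantic_type is not None:
        if semantic_type == ancestor:
            return True
        semantic_type = SUBTYPE_OF.get(semantic_type)
    return False


def check_link(relationship: str, fillers: Dict[str, str]) -> List[str]:
    """Return the roles whose filler concept has an unacceptable semantic type."""
    required = ROLE_TYPES[relationship]
    return [role for role, concept in fillers.items()
            if not is_subtype(CONCEPT_TYPE.get(concept), required[role])]


print(check_link("treats", {"agent": "aspirin", "condition": "headache"}))  # []
print(check_link("treats", {"agent": "fever", "condition": "aspirin"}))
```

Note that the check concerns meaning only: both "aspirin" and "fever" could share the same datatype (a character string) in a value domain and still be distinguished here.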

17 Some Issues for Semantic Types
● Alternative approaches:
  ○ Build semantic types into 11179 metamodel
  ○ Reuse relationships for semantic type specifications
  ○ Treat semantic types as unary predicates in ontologies + axioms
● Should we have a standard set of semantic types (at least base types)?
  ○ Yes, for interoperability
  ○ No, for flexibility
● Collection types, type constructors?

18 Why Construct A Prototype?
● To explore alternative revisions to ISO/IEC 11179
● To demonstrate that proposed revisions to ISO/IEC 11179 Metadata Registry Std. are:
  ○ Feasible
  ○ Useful
● To experiment with alternative architectures / technologies for constructing extended metadata registries
  ○ Text retrieval engines: Lucene
  ○ Inference engines: Jena, Kowari (?), ...
  ○ Service oriented architecture (SOA)
● To facilitate deployment of revised ISO/IEC Metadata Registries
  ○ Example implementation
  ○ Open source code!

19 Why Content?
● Content characterization assists in shaping revisions to ISO/IEC 11179
● Content characterization assists in selection of content to load
● Content ingestion, installation, querying provides a means to exercise the prototype
  ○ Testing
  ○ Demonstration
  ○ Performance evaluation
  ○ Utility evaluation

20 Metadata Content Activities
● Content Characterization
  ○ e.g., graph theoretic characterization
● Content Acquisition
● Content Preprocessing
  ○ Into standard formats for loading (H. Solbrig)
● Content Loading
● Content Querying

21 Desiderata for Content Selection
● Accessibility
  ○ Licensing, source cooperation, unclassified
● Documentation, familiarity to XMDR collaborators
● Funder interest
● Diversity of metadata types, subject areas
● Diverse graph structures (of semantic relationships)
● OWL encodings available
● Moderate size
● Opportunities for mappings among metadata sets
● Multi-linguality

22 Content Characterization
● Provenance: name, source, contact, ...
● Type of metadata:
  ○ Thesauri, ontology, ISO/IEC 11179 metadata registry, ...
● Graph characterization
  ○ Tree, faceted classification, partial order (directed acyclic graph), cyclic graph, ...
● Size: # concepts, # links, # bytes
● Definitions?
● File formats
● OWL encoding?
● Multilingual
● Availability / licensing issues
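A graph characterization of this kind can be sketched as a small classifier over (child, parent) links, such as narrower-term to broader-term pairs. The `characterize` function and its output labels are assumptions made for illustration; faceted classifications are not detected, and this is not the characterization tool used by the project. Cycle detection uses a Kahn-style topological pass starting from concepts that have no parent.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple, Union


def characterize(links: Iterable[Tuple[str, str]]) -> Dict[str, Union[str, int]]:
    """Classify a concept graph given as (child, parent) links and report its size."""
    parents = defaultdict(set)
    children = defaultdict(set)
    nodes = set()
    for child, parent in links:
        parents[child].add(parent)
        children[parent].add(child)
        nodes.update((child, parent))

    # Kahn-style pass: peel off concepts whose parents have all been processed.
    remaining = {n: len(parents.get(n, ())) for n in nodes}
    ready = [n for n, count in remaining.items() if count == 0]
    processed = 0
    while ready:
        node = ready.pop()
        processed += 1
        for child in children.get(node, ()):
            remaining[child] -= 1
            if remaining[child] == 0:
                ready.append(child)

    roots = sum(1 for n in nodes if not parents.get(n))
    if processed < len(nodes):
        shape = "cyclic graph"            # some concepts sit on a cycle
    elif all(len(parents.get(n, ())) <= 1 for n in nodes):
        shape = "tree" if roots == 1 else "forest"
    else:
        shape = "directed acyclic graph"  # some concept has several parents
    link_count = sum(len(p) for p in parents.values())
    return {"shape": shape, "concepts": len(nodes), "links": link_count}


print(characterize([("dog", "mammal"), ("cat", "mammal"), ("mammal", "animal")]))
print(characterize([("bat", "mammal"), ("bat", "flyer"),
                    ("mammal", "animal"), ("flyer", "animal")]))
print(characterize([("a", "b"), ("b", "a")]))
```

The resulting shape is useful because it bounds what the registry must support: a tree admits simple path-based indexing, while a cyclic graph rules out any algorithm that assumes a topological order.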

23 Why Graph-theoretic Content Characterization?
● Important structural taxonomy
● Impacts:
  ○ Expressivity required of registry
  ○ Content representation, index structures
  ○ Search, matching algorithms
  ○ Computational complexity of search, matching, ...
  ○ Inference algorithms
  ○ Computational complexity of inference
  ○ Design / implementation / performance of metadata registries

24 Loaded content metadata sets
● National Cancer Institute Thesaurus (NCIT)
● Defense Technical Information Center (DTIC) Thesaurus
● General Multilingual Environmental Thesaurus (GEMET)
● Adult Mouse Anatomical Dictionary
● EPA Terms of the Environment
● ISO 3166 Country Codes
● ISO 4217 Currency Codes

25 Other Metadata Sets of Interest
● NCI Cancer Data Standards Repository (caDSR)
● EPA Environmental Data Registry (EDR)
● NLM Unified Medical Language System (UMLS)
● USGS Geographic Names Information System (GNIS)
● Integrated Taxonomic Information System (ITIS)
● NBII Biocomplexity Thesaurus
● ISO 639 Language Identifiers
● Logical Observation Identifiers Names and Codes (LOINC)
● Getty Thesaurus of Geographic Names (TGN)
● NASA Semantic Web for Earth and Environmental Terminology (SWEET)
● Dublin Core Metadata (?)

26 Conclusions
● XMDR Activities
  ○ ISO/IEC 11179 revisions
    ▪ Support for ontologies, etc.
    ▪ Relationships
    ▪ Semantic types
  ○ Prototype development
  ○ Content (characterization, loading, query)
  ○ Prototype testing, performance evaluation, demos

27 Coming in Second Part of Talk (Kevin Keck):
● Detailed discussion of the architecture and technology of the prototype...

28 Acknowledgements
● Financial support from U.S. Dept. of Defense, U.S. Environmental Protection Agency
● In-kind contributions from U.S. National Cancer Institute, Mayo Clinic, U.S. Geological Survey
● Support from program managers: Nancy Lawler (DOD) and Sam Chance (DOD)
● Comments on drafts of this talk by John L. McCarthy

29 Contact Information:
● Project:
  ○ http://xmdr.org/
● Frank Olken:
  ○ Lawrence Berkeley National Laboratory
  ○ Email: olken@lbl.gov
  ○ Tel: 510-486-5891
  ○ URL: http://www.lbl.gov/~olken

