Presentation is loading. Please wait.

Presentation is loading. Please wait.

1 eXtended Metadata Registries (XMDR) NKOS Workshop June 11, 2005 Bruce Bargmeyer Chair: ISO/IEC JTC1/SC32-Data Mgmt & Interchange.

Similar presentations


Presentation on theme: "1 eXtended Metadata Registries (XMDR) NKOS Workshop June 11, 2005 Bruce Bargmeyer Chair: ISO/IEC JTC1/SC32-Data Mgmt & Interchange."— Presentation transcript:

1 1 eXtended Metadata Registries (XMDR) NKOS Workshop June 11, 2005 Bruce Bargmeyer bebargmeyer@lbl.gov Chair: ISO/IEC JTC1/SC32-Data Mgmt & Interchange PI: XMDR Project Lawrence Berkley National Laboratory University of California WWW.XMDR.ORG

2 2 XMDR Project Draws Together Users ISO/IEC 11179 Metadata Registries Terminology CONCEPT Referent Refers To Symbolizes Stands For “Rose”, “ClipArt” Metadata Registry Terminology Thesaurus Taxonomy Data Standards Ontology Structured Metadata ISO/IEC JTC 1/SC 32 ISO TC 37 & …

3 3 Align, Coordinate, Integrate Standards/Recommendations ISO/IEC JTC 1/SC 32 Us ers ISO/IEC 11179 Metadata Registries Terminology CONCEPT Referent Refers To Symbolizes Stands For “Rose”, “ClipArt” Metadata Registry Terminology Thesaurus Taxonomy Data Standards Ontology Structured Metadata ISO TC 37 & … Semantic Web W3C

4 4 XMDR Metadata Registries Extensions F Register (and manage) any semantics that are useful in managing data. u Enable users to find and register correspondences between the content (concepts & relationships) of multiple KOSs u Enable registration of links between concepts and data in databases F Provide Semantic Services -- lay foundation for semantics based computing: Semantics Service Oriented Architecture, Semantic Grids, Semantics based workflows, Semantic Web … F Prepare standards proposals, especially for ISO/IEC 11179 Parts 2 & 3 F Develop a reference implementation

5 5 Data Elements DZ BE CN DK EG FR... ZW ISO 3166 English Name ISO 3166 3-Numeric Code 012 056 156 208 818 250... 716 ISO 3166 2-Alpha Code Algeria Belgium China Denmark Egypt France... Zimbabwe Name: Context: Definition: Unique ID: 4572 Value Domain: Maintenance Org. Steward: Classification: Registration Authority: Others ISO 3166 French Name L`Algérie Belgique Chine Danemark Egypte La France... Zimbabwe DZA BEL CHN DNK EGY FRA... ZWE ISO 3166 3-Alpha Code Current 11179 Metadata Registry Content E.g., Country Identifier Algeria Belgium China Denmark Egypt France... Zimbabwe Name: Country Identifiers Context: Definition: Unique ID: 5769 Conceptual Domain: Maintenance Org.: Steward: Classification: Registration Authority: Others Data Element Concept

6 6 Data Element List – Address Group Alice Wilson 161 North Street Happy Valley MO 63105 USA Use ISO/IEC 11179 MDR to Create XML Schemas, DBMS Schemas 33c Name Street Address City, State Postal Code Country

7 7 XMDR: Register Ontologies Concept Geographic Area Geographic Sub-Area Country Country Identifier Country NameCountry Code Short Name ISO 3166 2-Character Code ISO 3166 3- Character Code Long Name Distributor Country Name Mailing Address Country Name ISO 3166 3-Numeric Code FIPS Code

8 8 XMDR: Register Any Knowledge Organization System (KOS) F Keywords F Glossaries F Gazetteers F Thesauri F Taxonomies F Concept system (ISO TC 37) F Ontologies F Axiomititized Ontologies

9 9 XMDR: Register Graphs Directed Graph Directed Acyclic Graph Graph Undirected Graph Bipartite Graph Partial Order Graph Faceted Classification Clique Partial Order Tree Tree Lattice Ordered Tree Note: not all bipartite graphs are undirected. Graph Taxonomy:

10 10 Samples of Eco & Bio Graph Data F Nutrient cycles in microbial ecologies These are bipartite graphs, with two sets of nodes, microbes and reactants (nutrients), and directed edges indicating input and output relationships. Such nutrient cycle graphs are used to model the flow of nutrients in microbial ecologies, e.g., subsurface microbial ecologies for bioremediation. F Chemical structure graphs: Here atoms are nodes, and chemical bonds are represented by undirected edges. Multi- electron bonds are often represented by multiple edges between nodes (atoms), hence these are multigraphs. Common queries include subgraph isomorphism. Chemical structure graphs are commonly used in chemoinformatics systems, such as Chem Abstracts, MDL Systems, etc. F Sequence data and multiple sequence alignments. DNA/RNA/Protein sequences can be modeled as linear graphs F Topological adjacency relationships also arise in anatomy. These relationships differ from partonomies in that adjacency relationships are undirected and not generally transitive.

11 11 Eco & Bio Graph Data (Continued) F Taxonomies of proteins, chemical compounds, and organisms,... These taxonomies (classification systems) are usually represented as directed acyclic graphs (partial orders or lattices). They are used when querying the pathways databases. Common queries are subsumption testing between two terms/concepts, i.e., is one concept a subset or instance of another. Note that some phylogenetic tree computations generate unrooted, i.e., undirected. trees. F Metabolic pathways: chemical reactions used for energy production, synthesis of proteins, carbohydrates, etc. Note that these graphs are usually cyclic. F Signaling pathways: chemical reactions for information transmission and processing. Often these reactions involve small numbers of molecules. Graph structure is similar to metabolic pathways. F Partonomies are used in biological settings most often to represent common topological relationships of gross anatomy in multi-cellular organisms. They are also useful in sub-cellular anatomy, and possibly in describing protein complexes. They are comprised of part-of relationships (in contrast to is-a relationships of taxonomies). Part-of relationships are represented by directed edges and are transitive. Partonomies are directed acyclic graphs. F Data Provenance relationships are used to record the source and derivation of data. Here, some nodes are used to represent either individual "facts" or "datasets" and other nodes represent "data sources" (either labs or individuals). Edges between "datasets" and "data sources" indicate "contributed by". Other edges (between datasets (or facts)) indicate derived from (e.g., via inference or computation). Data provenance graphs are usually directed acyclic graphs.

12 12 A graph theoretic characterization F Readily comprehensible characterization of metadata structures F Graph structure has implications for: u Integrity Constraint Enforcement u Data structures u Query languages u Combining metadata sets u Algorithms for query processing

13 13 Trees F In metadata settings trees are almost most often directed u edges indicate direction F In metadata settings trees are usually partial orders u Transtivity is implied (see next slide) u Not true for some trees with mixed edge types. u Not always true for all partonomies

14 14 Example: Tree Oakland Berkeley Alameda County California part-of Santa Clara San Jose Santa Clara County part-of

15 15 Trees - cont. F Uniform vs. non-uniform height subtrees F Uniform height subtrees u fixed number of levels u common in dimensions of multi-dimensional data models F Non-uniform height subtrees u common terminologies

16 16 Partially Ordered Trees F A conventional directed tree F Plus, assumption of transitivity F Usually only show immediate ancestors (transitive reduction) F Edges of transitive closure are implied F Classic Example: u Simple Taxonomy, “is-a” relationship

17 17 Example: Partial Order Tree PolioSmallpox Infectious Disease Disease is-a Diabetes Heart disease Chronic Disease is-a Signifies inferred is-a relationship

18 18 Ordered Trees F Order here refers to order among sibling nodes (not related to partial order discussed elsewhere) F XML documents are ordered trees u Ordering of “sub-elements” is to support classic linear encoding of documents

19 19 Example: Ordered Tree part-of Paper Title page Section Bibliography Note: implicit ordering relation among parts of paper.

20 20 Faceted Classification F Classification scheme has mulitple facets F Each facet = partial order tree F Categories = conjunction of facet values (often written as [facet1, facet2, facet3]) F Faceted classification = a simplified partial order graph F Introduced by Ranganathan in 19 th century, as Colon Classification scheme F Faceted classification can be descirbed with Description Logc, e.g., OWL-DL

21 21 Example: Faceted Classification Wheeled Vehicle Facet 2 wheeled 4 wheeled 3 wheeled is-a Internal Combustion Human Powered Vehicle Propulsion Facet Bicycle Tricycle Motorcycle Auto is-a

22 22 Faceted Classifications and Multi- dimensional Data Model F MDM – a.k.a. OLAP data model u Online Analytical Processing data model u Star / Snowflake schemas F Fact Tables u fact = function over Cartesian product of dimensions u dimensions = facets n geographic region, product category, year,...

23 23 Directed Acylic Graphs F Graph: u Directed edges u No cycles u No assumptions about transitivity (e.g., mixed edge types, some partonomies) u Nodes may have multiple parents F Examples: u Partonomies (“part-of”) - transitivity is not always true

24 24 Example: Directed Acyclic Graph Wheeled Vehicle 2 Wheeled Vehicle 4 Wheeled Vehicle 3 Wheeled Vehicle is-a Internal Combustion Vehicle Human Powered Vehicle Propelled Vehicle Bicycle Tricycle Motorcycle Auto is-a Vehicle

25 25 Partial Order Graphs F Directed acyclic graphs + inferred transitivity F Nodes may have multiple parents F Most taxonomies drawn as transitive reduction, transitive closure edges are implied. F Examples: u all taxonomies u most partonomies u multiple inheritance F POGs can be described in Description Logic, e.g., OWL- DL

26 26 Example: Partial Order Graph Wheeled Vehicle 2 Wheeled Vehicle 4 Wheeled Vehicle 3 Wheeled Vehicle is-a Internal Combustion Vehicle Human Powered Vehicle Propelled Vehicle Bicycle Tricycle Motorcycle Auto is-a Vehicle Dashed line = inferred is-a (transitive closure)

27 27 Directed Graph F Generalization of DAG (directed acyclic graph) F Cycles are allowed F Arises when many edge types allowed F Example: UMLS

28 28 Lattices F A partial order F For every pair of elements A and B u There exists a least upper bound u There exists a greatest lower bound F Example: u The power set (all possible subsets) of a finite set u LUB(A,B) = union of two sets A, B u GLB(A,B) = intersect of two sets A,B

29 29 Example Lattice: Powerset of 3 element set {a,b,c} {c} {a} {b} {b,c}{a,b} {a,c} Denotes subset Empty Set

30 30 Lattices - Applications F Formal Concept Analysis u synthesizing taxonomies F Machine Learning u concept learning

31 31 Bipartite Graphs F Vertices = two disjoint sets, V and W F All edges connect one vertex from V and one vertex from W F Examples: u mappings among value representations u mappings among schemas u (entity/attribute, relationship) nodes in Conceptual Graphs

32 32 Example Bipartite Graph California Massachusetts Oregon CA MA OR States Two-letter state codes

33 33 Clique F Clique = complete graph (or subgraph) u all possible edges are present F Used to represent equivalence classes F Typically, on undirected graphs

34 34 Example of Clique California CA Calif. CAL Here edges denote synonymy.

35 35 Compound Graphs F Edges can point to/from subgraphs, not just simple nodes F Used in conceptual graphs F CG is isomorphic to First Order Logic F Could be used to specify contexts for subgraphs

36 36 Example Compound Graph Colin Powell claimed Iraq had WMDs

37 37 Conclusions F We can characterize metadata structure in terms of graph structures F Partial Order Graphs are the most common structure: u used for taxonomies, partonomies u support multiple inheritance, faceted classification u implicit inclusion of inferred transitive closure edges

38 38 Challenges F How to register & manage the various graph structures? u DBMS, File systems …. F How to query the graph structures? u XQuery for XML u Poor to non-existent graph query languages F How to get adequate performance, even in high performance computing environment F User interface complexity F How to manage semantic drift F Versions F How to interrelate graphs with other graphs and with data F Granularity at which to register metadata (then point to greater detail elsewhere?)

39 39 ISO/IEC 11179 Metadata Registries + XMDR F Register and manage semantics that are or can be harmonized and vetted by some Community of Interest (COI) F Provide Semantic Services F E.g., the semantics can be referenced by RDF statements (subjects, predicates, objects) F The semantics can be used to bootstrap the Semantic Web and Semantic Computing u A “vocabulary” that is grounded for some COI F Enable registration using formal logic as well as natural language

40 40 Terminolocgical & Formal (Axiomatized) Ontologies The difference between a terminological ontology and a formal ontology is one of degree: as more axioms are added to a terminological ontology, it may evolve into a formal or axiomatized ontology. Cyc has the most detailed axioms and definitions; it is an example of an axiomatized or formal ontology. WordNet is usually considered a terminological ontology. Building, Sharing, and Merging Ontologies John F. Sowa

41 41 An Axiom for an Axiomatized Ontology Definition: The resource_cost_point predicate, cpr, specifies the cost_value, c, (monetary units) of a resource, r, required by an activity, a, upto a certain time point, t. If a resource of the terminal use or consume states, s, for an activity, a, are enabled at time point, t, there must exist a cost_value, c, at time point, t, for the activity, a,that uses or consumes the resource, r. The time interval, ti = [ts, te], during which a resource is used or consumed byan activity is specified in the use or consume specifications as use_spec(r, a, ts, te, q) or consume_spec(r, a, ts, te, q) where activity, a, uses or consumes quantity, q, of resource, r, during the time interval [ts, te]. Hence, Axiom: ∀ a, s, r, q, ts, te, (use_spec(r, a, ts, te, q) ∧ enabled(s, a, t)) ∨ (consume_spec(r, a, ts, te, q) ∧ enabled(s, a, t))≡ ∃ c, cpr(a,c,t,r) Cost Ontology for TOronto Virtual Enterprise (TOVE)

42 42 XMDR: Vocabularies – Vetted with a Community of Interest to Boostrap the Semantic Web and Semantic Computing dbA:ma344 “AB”^^ai:StateCode ai: StateUSPSCode @prefix dbA: “http:/www.epa.gov/databaseA” @prefix ai: “http://www.epa/gov/edr/sw/AdministeredItem#” dbA:e0139 ai: MailingAddress

43 43 ISO/IEC 11179 Part 3 Metamodel (Metadata Registry Schema) Expressed in: F UML model XMDR: Also express in: F OWL Ontology F Common Logic

44 44 Ontology Editor Protege 11179 OWL Ontology XMDR Prototype: Modular Architecture-- Initial Implemented Modules MetadataValidator AuthenticationService MappingEngine Registry External Interface Generalization Composition (tight ownership) Aggregation (loose ownership) Jena, Xerces Java RetrievalIndex FullTextIndex Lucene LogicBasedIndex Jena, OWI KS Racer RegistryStore WritableRegistryStore Subversion

45 45 XMDR Content Priority List Phase 1 (V.A) National Drug File Reference Terminology DTIC Thesaurus (Defense Technology Info. Center Thesaurus) NCI Thesaurus National Cancer Institute Thesaurus NCI Data Elements (National Cancer Institute Data Standards Registry UMLS (non-proprietary portions) GEMET (General Multilingual Environmental Thesaurus) EDR Data Elements (Environmental Data Registry) ISO 3166 Country Codes – from EPA EDR USGS Geographic Names Information System (GNIS)

46 46 XMDR Content Priority List Phase 2 LOINC Logical Observation Identifiers Names and Codes ITIS Integrated Taxonomic Information System Getty Thesaurus of Geographic Names (TGN) SIC (Standard Industrial Classification System) NAICS (North American Industrial Classification System) NAIC-SIC mappings UNSPSC (United Nations Standard Products and Services Codes) EPA Chemical Substance Registry System EPA Terminology Reference System ISO Language Identifiers ISO 639-3 Part 3 IETF Language Identifiers RFC 1766 Units Ontology

47 47 XMDR Content Priority List Phase 3 HL7 Terminology HL7 Data Elements GO (Gene Ontology) NBII Biocomplexity Thesaurus EPA Web Registry Controlled Vocabulary BioPAX Ontology NASA SWEET Ontologies NDRTF

48 48 Project Status This year: F Building XMDR Core (content & System) Next year: F Extend XMDR-Core (content & system) F R&D Semantic Services in a Semantic Service Oriented Architecture Expected to be five-year project

49 49 XMDR Project Participants F Collaborative, interagency effort u US Environmental Protection Agency u United States Geological Survey u National Cancer Institute u Mayo Clinic u US Department of Defense u Lawrence Berkeley National Laboratory u & others F Interagency/International Cooperation on Ecoinformatics u EPA, European Environment Agency, UNEP u Ecoterm

50 50 Job Posting F Vacancy Announcement--Research position to be posted at LBNL—looking for person with education and experience in semantics, terminology, system development WWW.XMDR.ORG

51 51 Acknowledgements and References F Frank Olken, LBNL F Kevin Keck, LBNL F John McCarthy, LBNL


Download ppt "1 eXtended Metadata Registries (XMDR) NKOS Workshop June 11, 2005 Bruce Bargmeyer Chair: ISO/IEC JTC1/SC32-Data Mgmt & Interchange."

Similar presentations


Ads by Google