eXtended Metadata Registry (XMDR)

Presentation transcript:

eXtended Metadata Registry (XMDR) Interagency/International Cooperation on Ecoinformatics Washington DC May 23, 2005 Bruce Bargmeyer, Lawrence Berkeley National Laboratory University of California Tel: +1 510-495-2905 bebargmeyer@lbl.gov

XMDR Project Background Collaborative, interagency effort: EPA, USGS, NCI, Mayo Clinic, DOD, LBNL … & others. Draws on and contributes to Interagency/International Cooperation on Ecoinformatics. Involves Ecoterm, international, national, state, and local government agencies, and other organizations. Recognizes the great potential of semantics-based computing and management of metadata for improving collection, maintenance, dissemination, and processing of very diverse data structures. Collaboration arises from needs for traditional data administration, for sharing diverse data across multiple organizations, for managing complex semantics associated with data, and for emerging semantic computing capabilities. Many Players, Many Interests … Shared Context

11179 Metadata Registries Extensions Register (and manage) any semantics that are useful for managing data. E.g., this may include registering not only permissible values (concepts) and definitions, but may extend to registration of the full concept systems in which the permissible values are found. E.g., may want to register keywords, thesauri, taxonomies, ontologies, axiomatized ontologies…. Support traditional data management and data administration. Lay foundation for semantic computing: Semantics Service Oriented Architecture, Semantic Grids, Semantics-based workflows, Semantic Web ….

XMDR Draws Together Metadata Terminology Registries Users CONCEPT Referent Refers To Symbolizes Stands For “Rose”, “ClipArt” Metadata Registry Terminology Thesaurus Themes Data Standards Ontology GEMET Structured Metadata 11179 Metadata Registry

What is Metadata? What is Terminology (a concept system)? Users Metadata Registries Terminology CONCEPT Referent Refers To Symbolizes Stands For “Rose”, “ClipArt” Metadata Registry Terminology Thesaurus Themes Data Standards Ontology GEMET Structured Metadata 11179 Metadata Registry

Metadata Registries
Data Element Concept. Name: Country Identifiers; Context: ; Definition: ; Unique ID: 5769; Conceptual Domain: ; Maintenance Org.: ; Steward: ; Classification: ; Registration Authority: ; Others. Conceptual domain: Afghanistan, Belgium, China, Denmark, Egypt, France, Germany, …
Data Elements. Three data elements (Unique IDs 4572, 3820, 1047), each recording Name, Context, Definition, Unique ID, Value Domain, Maintenance Org., Steward, Classification, Registration Authority, Others. Their value domains:
ISO 3166 English Name    ISO 3166 3-Alpha Code    ISO 3166 3-Numeric Code
Afghanistan              AFG                      004
Belgium                  BEL                      056
China                    CHN                      156
Denmark                  DNK                      208
Egypt                    EGY                      818
France                   FRA                      250
Germany                  DEU                      276
…                        …                        …
In order to reduce costs associated with managing metadata, we want to enable interchange of metadata including terminology between metadata registries. Organizations that are responsible for particular terminology and data elements can propagate these changes to other 11179 metadata registries.

What is Metadata/Terminology? Fuji (variety name), 4129 (product look-up (PLU) code), Product of Canada (country of origin). PLU codes consist of 4 to 5 numbers: 4 numbers = conventional produce; 5 numbers, starting with 9 = organic produce; 5 numbers, starting with 8 = genetically engineered produce. PLU codes are established by the International Federation for Produce Coding, a coalition of fruit and vegetable associations coordinated by the Produce Marketing Association.
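The PLU rules on this slide can be read as a small decision procedure. The following is a minimal sketch in Python, assuming only the three rules stated above; the function name `classify_plu` and the `"unknown"` fallback are illustrative choices, not part of any PLU specification.

```python
def classify_plu(code: str) -> str:
    """Classify a PLU code using only the three rules stated on the slide."""
    if not code.isdigit():
        raise ValueError(f"PLU codes are numeric, got {code!r}")
    if len(code) == 4:
        return "conventional"
    if len(code) == 5 and code[0] == "9":
        return "organic"
    if len(code) == 5 and code[0] == "8":
        return "genetically engineered"
    # Anything else is not covered by the slide's rules (hypothetical fallback).
    return "unknown"


print(classify_plu("4129"))   # the Fuji apple code from the slide
print(classify_plu("94129"))  # same code with the organic prefix
```

The point of the example is that the meaning of "4129" is not in the data value itself; it comes from registered metadata (the coding rules) that a program has to know about.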

What is Metadata/Terminology? Fuji (variety name), 4129 (product look-up (PLU) code), Product of Canada (country of origin). New PLUs are assigned by the PEIB. More information about the PEIB may be found at their website: Produce Electronic Information Board.

What is Metadata/Terminology? Fuji (variety name), 4129 (product look-up (PLU) code), Product of Canada (country of origin)
4128 Apple, Cripps Pink, Small, 100 size and smaller, less than 205g [Notes: As of 1 Jun 2001, Pink Lady® is a registered trademark of certain Cripps Pink apples. Last revised: 1 Jun 2002]
4129 Apple, Fuji, Small, 100 size and smaller, less than 205g
4130 Apple, Cripps Pink, Large, 88 size and larger, 205g and above [Notes: As of 1 Jun 2001, Pink Lady® is a registered trademark of certain Cripps Pink apples. Last revised: 1 Jun 2002]
4131 Apple, Fuji, Large, 88 size and larger, 205g and above

What is Metadata/Terminology? Fuji (variety name), 4129 (product look-up (PLU) code), Product of Canada (country of origin). Fruit: the developed ovary of a seed plant with its contents and accessory parts, as the pea pod, nut, tomato, or pineapple. Fruit Apple Orange

What is Metadata/Terminology? Fuji (variety name), 4129 (product look-up (PLU) code), Product of Canada (country of origin). fruit (froot), n., pl. fruits, (esp. collectively) fruit, v. –n. 1. any product of plant growth useful to humans or animals. 2. the developed ovary of a seed plant with its contents and accessory parts, as the pea pod, nut, tomato, or pineapple. 3. the edible part of a plant developed from a flower, with any accessory tissues, as the peach, mulberry, or banana. 4. the spores and accessory organs of ferns, mosses, fungi, algae, or lichen. 5. anything produced or accruing; product, result, or effect; return or profit: the fruits of one's labors. 6. Slang (disparaging and offensive). a male homosexual. 7. –v.i., v.t. to bear or cause to bear fruit: a tree that fruits in late summer; careful pruning that sometimes fruits a tree.

What is Metadata/Terminology? Fuji (variety name), 4129 (product look-up (PLU) code), Product of Canada (country of origin). fruit Fruit Apple Orange Fly Horse Fruit Flies Fruit flies lay eggs in fruit

What is Terminology? “Rose”, “ClipArt” Refers To Symbolizes Stands For CONCEPT Referent Refers To Symbolizes Stands For “Rose”, “ClipArt” C.K. Ogden/I.A. Richards, The Meaning of Meaning A Study in the Influence of Language upon Thought and The Science of Symbolism London 1923, 10th edition 1969

Registering Terminology Definition: Any of several game fishes of the genus Salmo, related to the salmon... Concept Refers To Symbolizes Term Referent Stands For trout Salmo trutta brown trout truite

Registering Terminology any of several game fishes of the genus Salmo, related to the salmon... Concept Terms Context trout Salmo trutta truite common name scientific name French name UID=6349
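One way to picture this registration is as a single concept record that carries its terms keyed by context. The sketch below is only an illustration in Python; the class name `Concept` and its fields are assumptions made for this example and are not taken from ISO/IEC 11179 or the XMDR implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A registered concept: one definition, many terms, each tied to a context."""
    uid: int
    definition: str
    terms: dict[str, str] = field(default_factory=dict)  # context -> term

    def term_in(self, context: str) -> str:
        """Return the term that designates this concept in the given context."""
        return self.terms[context]


# Values copied from the slide; the record layout itself is hypothetical.
trout = Concept(
    uid=6349,
    definition="any of several game fishes of the genus Salmo, "
               "related to the salmon...",
    terms={
        "common name": "trout",
        "scientific name": "Salmo trutta",
        "French name": "truite",
    },
)

print(trout.term_in("scientific name"))
```

Keeping the identifier on the concept rather than on any single term is what lets several registries agree that "trout" and "truite" designate the same thing.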

Concepts into Data Data Elements Name: trout species Definition: The names of species of trout. Values: brook trout Salvelinus fontinalis brown trout Salmo trutta cutthroat trout Oncorhynchus clarkii Terms Context Concept Brown trout Salmo trutta truite common name scientific name French name UID=6349

XML Schemas EDI Messages Systems: STORET Envirofacts . . . DBMS Query XML Schemas EDI Messages Data Interchange W3C RDF Vocabularies Ontology Data Elements Terms Context Concept Brown trout Salmo trutta truite common name scientific name French name UID=6349

Continuing Challenge Synonyms, Homonyms, Provenance Synonyms: so many ways to name, identify, and state the same thing (one concept--many terms) Homonyms: different meanings for the same terms and identifiers (one term—many concepts) Provenance: How to record the who, where, when, why, and how that is relevant to data
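Synonyms and homonyms both show up when term-to-concept registrations are indexed in the two directions. The following is a rough sketch in Python; the concept identifiers 9001 and 9002 for two senses of "fruit" are made up for illustration, while 6349 is the trout identifier used earlier in the deck.

```python
from collections import defaultdict

# (term, concept id) pairs; ids 9001 and 9002 are hypothetical.
registrations = [
    ("trout", 6349),
    ("truite", 6349),
    ("Salmo trutta", 6349),
    ("fruit", 9001),  # sense: edible part of a plant
    ("fruit", 9002),  # sense: product or result ("fruits of one's labors")
]

terms_by_concept = defaultdict(set)
concepts_by_term = defaultdict(set)
for term, concept_id in registrations:
    terms_by_concept[concept_id].add(term)
    concepts_by_term[term].add(concept_id)

# One concept, many terms: synonyms.
synonyms = {c: t for c, t in terms_by_concept.items() if len(t) > 1}
# One term, many concepts: homonyms.
homonyms = {t: c for t, c in concepts_by_term.items() if len(c) > 1}

print(synonyms)
print(homonyms)
```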

Two Points of View I wanna be free: Programs, system developers, scientists, … that want to get something done quickly, without the drag of documentation and uniformity. Let me do it quickly, my way, and let others accept it. Coherence within some large Universe of Discourse Data users who want to get a coherent view across the boundaries of individual programs, organizations, scientific studies. E.g., media specific programs. Harmonize and standardize data and terminology. Document data/terminology in structured ways. Then easier to find, access, analyze, understand and use data. Market driven approaches to data management may provide a means to draw these closer together. E.g., anyone can register anything, and a community of interest gives it some declared level of acceptance.

Data Management Evolution Trying to manage semantics: What does data mean? Can data be compared? What is the provenance of data? [Freedom vs. Coherence] 3rd Generation languages – naming conventions, system documentation Data Base Management Systems – Data dictionary for schema, valid values, etc. Metadata Registries for data sharing organization-wide or across environmental domain of discourse XML – Metadata Registries and XML registries for managing XML tags, data, and XML artifacts. Semantic computing – Metadata Registries for managing the “vocabulary” and concept systems, e.g., ontologies.

Movement Toward Semantics Management Going beyond traditional Data Standards and Data Administration In addition to anchoring data with definitions, we want to process data and concepts based on context and relationships, possibly using inferences and rules. In addition to natural language, we want to capture semantics with more formal description techniques FOL, DL, Common Logic, OWL Going beyond information system interoperability and data interchange to processing based on inferences and probabilistic correspondence between concepts found in natural language (in the wild) and both data in databases and concepts found in concept systems.

Purposes of XMDR Project 1. Propose revisions to 11179 Parts 2 & 3 (3rd Ed.) – to serve as the design for the next generation of metadata registries. 2. Demonstrate Reference Implementation – to validate the proposed revisions Extend semantics management capabilities Enable registration of correspondences between multiple concept systems and between concept systems and data Explore uses of terminologies and ontologies Systematize representation of concepts and relationships Enable registration of metadata for knowledge bases Adapt & test emerging semantic technologies Provide an environment for developing and interrelating ontologies

What is an ontology? The subject of ontology is the study of the categories of things that exist or may exist in some domain. The product of such a study, called an ontology, is a catalog of the types of things that are assumed to exist in a domain of interest D from the perspective of a person who uses a language L for the purpose of talking about D. The types in the ontology represent the predicates, word senses, or concept and relation types of the language L when used to discuss topics in the domain D. Building, Sharing, and Merging Ontologies-John F. Sowa

Terminological & Formal (Axiomatized) Ontologies The difference between a terminological ontology and a formal ontology is one of degree: as more axioms are added to a terminological ontology, it may evolve into a formal or axiomatized ontology. Cyc has the most detailed axioms and definitions; it is an example of an axiomatized or formal ontology. EDR and WordNet are usually considered terminological ontologies. Building, Sharing, and Merging Ontologies John F. Sowa

An Axiom for an Axiomatized Ontology Definition: The resource_cost_point predicate, cpr, specifies the cost_value, c, (monetary units) of a resource, r, required by an activity, a, up to a certain time point, t. If a resource of the terminal use or consume states, s, for an activity, a, are enabled at time point, t, there must exist a cost_value, c, at time point, t, for the activity, a, that uses or consumes the resource, r. The time interval, ti = [ts, te], during which a resource is used or consumed by an activity is specified in the use or consume specifications as use_spec(r, a, ts, te, q) or consume_spec(r, a, ts, te, q) where activity, a, uses or consumes quantity, q, of resource, r, during the time interval [ts, te]. Hence, Axiom: ∀ a, s, r, q, ts, te, (use_spec(r, a, ts, te, q) ∧ enabled(s, a, t)) ∨ (consume_spec(r, a, ts, te, q) ∧ enabled(s, a, t)) ≡ ∃ c, cpr(a, c, t, r) Cost Ontology for TOronto Virtual Enterprise (TOVE)

Samples of Eco & Bio Graph Data Nutrient cycles in microbial ecologies: These are bipartite graphs, with two sets of nodes, microbes and reactants (nutrients), and directed edges indicating input and output relationships. Such nutrient cycle graphs are used to model the flow of nutrients in microbial ecologies, e.g., subsurface microbial ecologies for bioremediation. Chemical structure graphs: Here atoms are nodes, and chemical bonds are represented by undirected edges. Multi-electron bonds are often represented by multiple edges between nodes (atoms), hence these are multigraphs. Common queries include subgraph isomorphism. Chemical structure graphs are commonly used in chemoinformatics systems, such as Chem Abstracts, MDL Systems, etc. Sequence data and multiple sequence alignments: DNA/RNA/Protein sequences can be modeled as linear graphs. Topological adjacency relationships also arise in anatomy. These relationships differ from partonomies in that adjacency relationships are undirected and not generally transitive.

Eco & Bio Graph Data (Continued) Taxonomies of proteins, chemical compounds, and organisms, ... These taxonomies (classification systems) are usually represented as directed acyclic graphs (partial orders or lattices). They are used when querying the pathways databases. Common queries are subsumption testing between two terms/concepts, i.e., is one concept a subset or instance of another. Note that some phylogenetic tree computations generate unrooted, i.e., undirected, trees. Metabolic pathways: chemical reactions used for energy production, synthesis of proteins, carbohydrates, etc. Note that these graphs are usually cyclic. Signaling pathways: chemical reactions for information transmission and processing. Often these reactions involve small numbers of molecules. Graph structure is similar to metabolic pathways. Partonomies are used in biological settings most often to represent common topological relationships of gross anatomy in multi-cellular organisms. They are also useful in sub-cellular anatomy, and possibly in describing protein complexes. They consist of part-of relationships (in contrast to is-a relationships of taxonomies). Part-of relationships are represented by directed edges and are transitive. Partonomies are directed acyclic graphs. Data Provenance relationships are used to record the source and derivation of data. Here, some nodes are used to represent either individual "facts" or "datasets" and other nodes represent "data sources" (either labs or individuals). Edges between "datasets" and "data sources" indicate "contributed by". Other edges (between datasets (or facts)) indicate derived from (e.g., via inference or computation). Data provenance graphs are usually directed acyclic graphs.

A graph theoretic characterization Readily comprehensible characterization of metadata structures Graph structure has implications for: Integrity Constraint Enforcement Data structures Query languages Combining metadata sets Algorithms for query processing

Definition of a graph Graph = vertex (node) set + edge set Nodes, edges may be labeled Edge set = binary relation over nodes cf. NIAM Labeled edge set RDF triples (subject, predicate, object) predicate = edge label Typically edges are directed

Example of a graph infectious disease is-a is-a influenza measles
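The labeled-edge definition and this small example can be written directly as a set of (subject, predicate, object) triples. The sketch below uses plain Python tuples rather than an RDF library; the helper name `objects` is an illustrative assumption, and the edge direction (influenza is-a infectious disease) is the reading assumed here.

```python
# Each labeled, directed edge is one (subject, predicate, object) triple.
triples = {
    ("influenza", "is-a", "infectious disease"),
    ("measles", "is-a", "infectious disease"),
}

# The vertex set is every name that appears as a subject or an object.
nodes = {s for s, _, _ in triples} | {o for _, _, o in triples}


def objects(subject: str, predicate: str) -> set[str]:
    """Follow edges with the given label out of the given node."""
    return {o for s, p, o in triples if s == subject and p == predicate}


print(sorted(nodes))
print(objects("measles", "is-a"))
```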

Types of Metadata Graph Structures Trees Partially Ordered Trees Ordered Trees Faceted Classifications Directed Acyclic Graphs Partially Ordered Graphs Lattices Bipartite Graphs Directed Graphs Cliques Compound Graphs

Graph Taxonomy Graph Directed Graph Undirected Graph Directed Acyclic Graph Bipartite Graph Clique Partial Order Graph Faceted Classification Lattice Partial Order Tree Tree Note: not all bipartite graphs are undirected. Ordered Tree

Trees In metadata settings trees are most often directed; edges indicate direction. In metadata settings trees are usually partial orders: transitivity is implied (see next slide). Not true for some trees with mixed edge types. Not always true for all partonomies.

Example: Tree California part-of part-of Alameda County Santa Clara County part-of part-of part-of part-of Oakland Berkeley Santa Clara San Jose

Trees - cont. Uniform vs. non-uniform height subtrees. Uniform height subtrees: fixed number of levels; common in dimensions of multi-dimensional data models. Non-uniform height subtrees: common in terminologies.

Partially Ordered Trees A conventional directed tree Plus, assumption of transitivity Usually only show immediate ancestors (transitive reduction) Edges of transitive closure are implied Classic Example: Simple Taxonomy, “is-a” relationship

Example: Partial Order Tree Disease is-a is-a Infectious Disease Chronic Disease is-a is-a is-a is-a Polio Smallpox Diabetes Heart disease Signifies inferred is-a relationship
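The inferred is-a relationships in this example are the transitive closure of the edges actually drawn. As a minimal sketch (the function names `ancestors` and `is_a` are chosen for this illustration), subsumption testing can be done by walking parent links in Python:

```python
# Only the immediate (drawn) is-a edges are stored: child -> set of parents.
parents = {
    "Infectious Disease": {"Disease"},
    "Chronic Disease": {"Disease"},
    "Polio": {"Infectious Disease"},
    "Smallpox": {"Infectious Disease"},
    "Diabetes": {"Chronic Disease"},
    "Heart disease": {"Chronic Disease"},
}


def ancestors(node: str) -> set[str]:
    """All nodes reachable by following is-a edges upward (transitive closure)."""
    seen: set[str] = set()
    stack = list(parents.get(node, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(parents.get(parent, ()))
    return seen


def is_a(child: str, ancestor: str) -> bool:
    """Subsumption test: is `child` a (possibly inferred) kind of `ancestor`?"""
    return ancestor in ancestors(child)


print(is_a("Polio", "Disease"))          # inferred edge, not drawn on the slide
print(is_a("Polio", "Chronic Disease"))
```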

Ordered Trees Order here refers to order among sibling nodes (not related to partial order discussed elsewhere) XML documents are ordered trees Ordering of “sub-elements” is to support classic linear encoding of documents

Example: Ordered Tree Paper part-of part-of part-of Title page Section Bibliography Note: implicit ordering relation among parts of paper.
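Because XML documents are ordered trees, a parser hands back child elements in document order. The following sketch uses Python's standard `xml.etree.ElementTree`; the element names are invented for this example to mirror the parts of the paper on the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for the "Paper" example; tag names are illustrative.
paper = ET.fromstring(
    "<paper>"
    "<titlepage/>"
    "<section/>"
    "<bibliography/>"
    "</paper>"
)

# Iterating an element yields its children in the order they were written,
# which is the implicit sibling ordering the slide refers to.
order = [child.tag for child in paper]
print(order)
```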

Faceted Classification Classification scheme has multiple facets Each facet = partial order tree Categories = conjunction of facet values (often written as [facet1, facet2, facet3]) Faceted classification = a simplified partial order graph Introduced by Ranganathan in the 20th century (1933), as the Colon Classification scheme Faceted classification can be described with Description Logic, e.g., OWL-DL

Example: Faceted Classification Wheeled Vehicle Facet Vehicle Propulsion Facet is-a is-a is-a is-a is-a 2 wheeled 3 wheeled 4 wheeled Human Powered Internal Combustion is-a is-a is-a is-a is-a is-a is-a is-a is-a Bicycle Tricycle Auto Motorcycle

Faceted Classifications and Multi-dimensional Data Model MDM – a.k.a. OLAP data model Online Analytical Processing data model Star / Snowflake schemas Fact Tables fact = function over Cartesian product of dimensions dimensions = facets geographic region, product category, year, ...

Directed Acyclic Graphs Directed edges No cycles No assumptions about transitivity (e.g., mixed edge types, some partonomies) Nodes may have multiple parents Examples: Partonomies (“part-of”) - transitivity is not always true

Example: Directed Acyclic Graph Vehicle Wheeled Vehicle is-a is-a Propelled Vehicle is-a is-a is-a is-a is-a 3 Wheeled Vehicle 4 Wheeled Vehicle Human Powered Vehicle 2 Wheeled Vehicle Internal Combustion Vehicle is-a is-a is-a is-a is-a is-a is-a is-a is-a Bicycle Tricycle Auto Motorcycle
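The "no cycles" condition can be checked mechanically: a directed graph is acyclic exactly when it has a topological ordering. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (3.9+). The parent assignments for the vehicle example are assumed from ordinary usage, because the diagram's edges are not recoverable from the transcript.

```python
from graphlib import CycleError, TopologicalSorter

# node -> set of parents (is-a targets); assignments are assumed.
vehicle = {
    "Wheeled Vehicle": {"Vehicle"},
    "Propelled Vehicle": {"Vehicle"},
    "2 Wheeled Vehicle": {"Wheeled Vehicle"},
    "3 Wheeled Vehicle": {"Wheeled Vehicle"},
    "4 Wheeled Vehicle": {"Wheeled Vehicle"},
    "Human Powered Vehicle": {"Propelled Vehicle"},
    "Internal Combustion Vehicle": {"Propelled Vehicle"},
    "Bicycle": {"2 Wheeled Vehicle", "Human Powered Vehicle"},
    "Tricycle": {"3 Wheeled Vehicle", "Human Powered Vehicle"},
    "Auto": {"4 Wheeled Vehicle", "Internal Combustion Vehicle"},
    "Motorcycle": {"2 Wheeled Vehicle", "Internal Combustion Vehicle"},
}


def is_acyclic(graph: dict[str, set[str]]) -> bool:
    """True if the directed graph admits a topological order (has no cycle)."""
    try:
        list(TopologicalSorter(graph).static_order())
        return True
    except CycleError:
        return False


print(is_acyclic(vehicle))
print(is_acyclic({"a": {"b"}, "b": {"a"}}))
```

Note that "Bicycle" has two parents here, which a tree could not represent.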

Partial Order Graphs Directed acyclic graphs + inferred transitivity Nodes may have multiple parents Most taxonomies drawn as transitive reduction, transitive closure edges are implied. Examples: all taxonomies most partonomies multiple inheritance POGs can be described in Description Logic, e.g., OWL-DL

Example: Partial Order Graph Vehicle Wheeled Vehicle is-a is-a Propelled Vehicle is-a is-a is-a is-a is-a 2 Wheeled Vehicle 3 Wheeled Vehicle 4 Wheeled Vehicle Human Powered Vehicle Internal Combustion Vehicle is-a is-a is-a is-a is-a is-a is-a is-a is-a Bicycle Tricycle Auto Motorcycle Dashed line = inferred is-a (transitive closure)
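Going the other way, from a graph that includes the inferred (dashed) edges back to the drawn one, is a transitive reduction. The following Python sketch works for acyclic graphs only. The fragment of the vehicle graph and its parent assignments are assumptions for illustration, and the function names are chosen for this example.

```python
def reachable(graph: dict[str, set[str]], start: str) -> set[str]:
    """All proper ancestors of `start`, following parent edges."""
    seen: set[str] = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for parent in graph.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def transitive_reduction(graph: dict[str, set[str]]) -> dict[str, set[str]]:
    """Drop each edge node->p that is implied by a path through another parent.

    Assumes the input is a directed acyclic graph.
    """
    return {
        node: {
            p for p in ps
            if not any(p in reachable(graph, q) for q in ps if q != p)
        }
        for node, ps in graph.items()
    }


# A fragment of the vehicle graph with inferred is-a edges written out.
closure = {
    "Wheeled Vehicle": {"Vehicle"},
    "Propelled Vehicle": {"Vehicle"},
    "2 Wheeled Vehicle": {"Wheeled Vehicle", "Vehicle"},
    "Human Powered Vehicle": {"Propelled Vehicle", "Vehicle"},
    "Bicycle": {
        "2 Wheeled Vehicle", "Human Powered Vehicle",
        "Wheeled Vehicle", "Propelled Vehicle", "Vehicle",
    },
}

reduced = transitive_reduction(closure)
print(sorted(reduced["Bicycle"]))
```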

Directed Graph Generalization of DAG (directed acyclic graph) Cycles are allowed Arises when many edge types allowed Example: UMLS

Lattices A partial order in which, for every pair of elements A and B, there exists a least upper bound (LUB) and there exists a greatest lower bound (GLB). Example: the power set (all possible subsets) of a finite set, where LUB(A,B) = union of the two sets A, B and GLB(A,B) = intersection of the two sets A, B

Example Lattice: Powerset of 3 element set {a,b,c} {a,b} {a,c} {b,c} {c} {a} {b} Empty Set Denotes subset
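The power-set lattice is small enough to enumerate and check. In this Python sketch the least upper bound is set union and the greatest lower bound is set intersection, as on the preceding slide; the names `lub` and `glb` are just labels for this illustration.

```python
from itertools import combinations

base = {"a", "b", "c"}

# All 2**3 = 8 subsets of the base set, from the empty set up to {a, b, c}.
powerset = [
    frozenset(combo)
    for size in range(len(base) + 1)
    for combo in combinations(sorted(base), size)
]


def lub(x: frozenset, y: frozenset) -> frozenset:
    """Least upper bound under the subset order: the union."""
    return x | y


def glb(x: frozenset, y: frozenset) -> frozenset:
    """Greatest lower bound under the subset order: the intersection."""
    return x & y


# Every pair has both bounds inside the power set, so it is a lattice.
closed = all(
    lub(x, y) in powerset and glb(x, y) in powerset
    for x in powerset
    for y in powerset
)
print(len(powerset), closed)
```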

Lattices - Applications Formal Concept Analysis synthesizing taxonomies Machine Learning concept learning

Bipartite Graphs Vertices = two disjoint sets, V and W All edges connect one vertex from V and one vertex from W Examples: mappings among value representations mappings among schemas (entity/attribute, relationship) nodes in Conceptual Graphs

Example Bipartite Graph California CA Massachusetts MA Oregon OR States Two-letter state codes
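The state-to-code mapping can be checked against the definition of a bipartite graph: the two vertex sets are disjoint and every edge joins one vertex from each. A minimal Python sketch, with the function name `is_bipartite` chosen for this example:

```python
states = {"California", "Massachusetts", "Oregon"}
codes = {"CA", "MA", "OR"}
edges = {
    ("California", "CA"),
    ("Massachusetts", "MA"),
    ("Oregon", "OR"),
}


def is_bipartite(v: set[str], w: set[str], edges: set[tuple[str, str]]) -> bool:
    """Check the definition for a given partition (V, W) of the vertices."""
    return v.isdisjoint(w) and all(
        (a in v and b in w) or (a in w and b in v) for a, b in edges
    )


print(is_bipartite(states, codes, edges))
# An edge between two states would break the property.
print(is_bipartite(states, codes, edges | {("California", "Oregon")}))
```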

Clique Clique = complete graph (or subgraph) all possible edges are present Used to represent equivalence classes Typically, on undirected graphs

Example of Clique California Calif. CA CAL Here edges denote synonymy.
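A synonym set forms a clique because every term is a synonym of every other, so all pairwise edges are present. The Python sketch below builds the complete graph on the four terms from the slide and tests the property; `is_clique` is an illustrative name.

```python
from itertools import combinations

terms = ["California", "Calif.", "CA", "CAL"]

# Undirected synonymy edges: one per unordered pair of terms.
edges = {frozenset(pair) for pair in combinations(terms, 2)}


def is_clique(nodes: list[str], edges: set[frozenset]) -> bool:
    """True if every pair of nodes is joined by an edge."""
    return all(frozenset(pair) in edges for pair in combinations(nodes, 2))


print(len(edges))  # 4 nodes give 4 * 3 / 2 = 6 edges
print(is_clique(terms, edges))
```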

Compound Graphs Edges can point to/from subgraphs, not just simple nodes Used in conceptual graphs CG is isomorphic to First Order Logic Could be used to specify contexts for subgraphs

Example Compound Graph Colin Powell claimed [Iraq had WMDs]

Conclusions We can characterize metadata structure in terms of graph structures Partial Order Graphs are the most common structure: used for taxonomies, partonomies support multiple inheritance, faceted classification implicit inclusion of inferred transitive closure edges

Challenges How to register & manage the various graph structures? DBMS, File systems …. How to query the graph structures? XQuery for XML Poor to non-existent graph query languages How to get adequate performance, even in high performance computing environment User interface complexity How to manage semantic drift Versions How to interrelate graphs with other graphs and with data Granularity at which to register metadata (then point to greater detail elsewhere?)

How can Terminologies and Ontologies help Manage Metadata? At the level of metadata instances in a registry, connect metadata entities via shared terms via automatic indexing of metadata words via text values from specific metadata elements At the level of the 11179 (or other) metamodel, ontologies can help specify formal relationships is-a and part-of hierarchies, etc. Inheritance, aggregation, … for automatic searching of sub-classes & inverses to specify semantic pathways for indexing

Major Tasks, Deliverables & Milestones
Research and Development Task/Deliverable: Initial Architecture Design; Present Proposed 11179 Part 2 Revisions to SC32 WG2 mtg in DC; Test Implementation (External Users); Prepare Draft Revision of 11179 Part 2 for SC32 mtg in Berlin; System Test & Evaluation (Internal Participants); Identify, Select Metadata Sources; Identify, Select Technologies; Develop Project Plan; System Test
Develop Project Plan - LBNL – Jul 04
Develop and Test RI – Extend (Revise) 11179 Part 2 Standard April 05 – LBNL
Identify, Select Appropriate Technologies – DoD – Dec 04
Identify, Select Metadata Sources – LBNL – Dec 04
Content Assessment and Selection – LBNL – Aug – Dec 04
Install Contents into Test and Demo System – LBNL – Dec 04 to End of Project
Test and Demo System Metadata Content Selection and Prioritization
Test and Demo System Content Acquisition
Test and Demo System Content Installation
System Test and Feedback – All – NLT Feb 05
Technical Assistance – All – As Required
General: Limited User Interface. Registration of Content will be Prioritized to Ensure Optimal Coverage (esp. Across versions, domains). Help Functions will be Limited to Loading Metadata Sources and Testing. Query Optimization will be Limited. Most Deliverable Documents will be to Recommend Choices During RDT&E of the RI. They will be Working Documents and Not Necessarily Publication Quality. Project Duration is Expected to be July 04 – May 05; and may be extended if additional resources are available.
Gantt Chart Forthcoming

General Tasks/Intentions
Task/Intention: Limited Query Optimization; Brief ISO/IEC L8; Documents will Recommend Choices; Brief DOD Metadata Working Group; Brief IC Metadata Working Group; Limited Help Functions; Prioritized Content Registration; Limited User Interface (Initially); Will Seek to Promote Awareness.

Potential Standards/Technologies
DBMS: Object, XML, Relational, RDF/Graph, Logic, Text, Document, Multimedia
Knowledge Representation: Web Ontology Language (OWL), Common Logic (CL)
Middleware/Messaging: Cocoon 2, Jini, CoABS, JMS, XMLBlaster, SOAP
XML [Semantic] Web Services: Axis, JWSDP
Agent Development: ABLE, JADE
Engines/Servers: OMS (IBM), Federator/OMS (OWI), Jess
Open Source and Risk Tolerant

Architecture Approach
Fully modular approach
Exemplars: Apache Web Server, Eclipse IDE, Protégé Ontology Editor
Benefits: numerous modules are relatively easy to implement; clean separation of concerns, with high reusability and portability; tooling support required is minimal
Collaborative, Interagency effort to Extend the Semantics Management Capabilities of ISO/IEC 11179. Test and Demonstrate Extended Capabilities with a Reference Implementation. Serve as a Design for Operational 11179 Registries. Focuses on Extending Classification and Terminology Aspects Specified in 11179 Part 2. Adapting and Adopting Useful Emerging Technologies. Result of Work will be Draft Documents that Propose Revisions for ISO/IEC 11179 Parts 2 and 3 as Version 3. Additionally, the Reference Implementation (RI) Prototype will Help Resolve Issues of How to Register, Manage, and Interrelate Complex Metadata Standards.

XMDR Prototype Architecture: Initial Implemented Modules
[Architecture diagram] Modules: Registry, External Interface, RegistryStore, WritableRegistryStore, MetadataValidator, AuthenticationService, MappingEngine, RetrievalIndex, FullTextIndex, LogicBasedIndex, Ontology Editor, 11179 OWL Ontology
Technologies labeled in the diagram: Java, Subversion, Protégé, Jena, Xerces, OWI KS, Racer, Lucene
Legend: Generalization; Composition (tight ownership); Aggregation (loose ownership)
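To make the modular decomposition more concrete, here is a minimal sketch of how a few of the named modules might fit together. Only the module names (Registry, RegistryStore, WritableRegistryStore, MetadataValidator, RetrievalIndex, FullTextIndex) come from the slide. The flattened diagram does not preserve which modules are joined by generalization, composition, or aggregation, so every relationship, method name, and field below is an assumption, as is the hypothetical `InMemoryStore`. The prototype is listed as Java with Lucene; Python and a toy inverted index are used here purely for brevity.

```python
from abc import ABC, abstractmethod


class RegistryStore(ABC):
    """Read-only access to registered items (method name is an assumption)."""

    @abstractmethod
    def get(self, item_id): ...


class WritableRegistryStore(RegistryStore):
    """Assumed generalization: a store that can also accept new items."""

    @abstractmethod
    def put(self, item_id, item): ...


class InMemoryStore(WritableRegistryStore):
    """Hypothetical concrete store, backed by a dict."""

    def __init__(self):
        self._items = {}

    def get(self, item_id):
        return self._items.get(item_id)

    def put(self, item_id, item):
        self._items[item_id] = item


class RetrievalIndex(ABC):
    """Common interface assumed for FullTextIndex and LogicBasedIndex."""

    @abstractmethod
    def add(self, item_id, item): ...

    @abstractmethod
    def query(self, text): ...


class FullTextIndex(RetrievalIndex):
    """Toy word-level inverted index standing in for a Lucene-backed module."""

    def __init__(self):
        self._postings = {}

    def add(self, item_id, item):
        for word in item["definition"].lower().split():
            self._postings.setdefault(word, set()).add(item_id)

    def query(self, text):
        return sorted(self._postings.get(text.lower(), set()))


class MetadataValidator:
    """Rejects items lacking the (assumed) required fields."""

    REQUIRED = ("name", "definition")

    def validate(self, item):
        missing = [field for field in self.REQUIRED if not item.get(field)]
        if missing:
            raise ValueError(f"missing fields: {missing}")


class Registry:
    """Assumed composition root: owns a store, a validator, and indexes."""

    def __init__(self, store, validator, indexes):
        self.store = store
        self.validator = validator
        self.indexes = list(indexes)

    def register(self, item_id, item):
        self.validator.validate(item)  # validate before anything is stored
        self.store.put(item_id, item)
        for index in self.indexes:
            index.add(item_id, item)

    def lookup(self, item_id):
        return self.store.get(item_id)

    def search(self, text):
        hits = set()
        for index in self.indexes:
            hits.update(index.query(text))
        return sorted(hits)
```

Because the registry depends only on the abstract store and index interfaces, a module can be swapped (for example, a logic-based index added alongside the full-text one) without touching the others, which is the separation of concerns the modular approach aims for.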

XMDR Content Priority List, Phase 1
(V.A) National Drug File Reference Terminology (?); DTIC Thesaurus (Defense Technical Information Center Thesaurus); NCI Thesaurus (National Cancer Institute Thesaurus); NCI Data Elements (National Cancer Institute Data Standards Registry); UMLS (non-proprietary portions); GEMET (General Multilingual Environmental Thesaurus); EDR Data Elements (Environmental Data Registry); ISO 3166 Country Codes (from EPA EDR); USGS Geographic Names Information System (GNIS)

XMDR Content Priority List, Phase 2
LOINC (Logical Observation Identifiers Names and Codes); ITIS (Integrated Taxonomic Information System); Getty Thesaurus of Geographic Names (TGN); SIC (Standard Industrial Classification System); NAICS (North American Industry Classification System); NAICS-SIC mappings; UNSPSC (United Nations Standard Products and Services Code); EPA Chemical Substance Registry System; EPA Terminology Reference System; ISO Language Identifiers (ISO 639-3); IETF Language Identifiers (RFC 1766); Units Ontology

XMDR Content Priority List, Phase 3
HL7 Terminology; HL7 Data Elements; GO (Gene Ontology); NBII Biocomplexity Thesaurus; EPA Web Registry Controlled Vocabulary; BioPAX Ontology; NASA SWEET Ontologies; NDRTF

Coming Year (Proposed)
Extension of XMDR core (data & system); Semantic Services; Greater interaction with Ecoterm organizations; Interaction with Ecoinformatics Test Bed project

Acknowledgements and References
Frank Olken, LBNL; Kevin Keck, LBNL; John McCarthy, LBNL