Manage Scientific Metadata Using XML Yang, R., M. Kafatos and X. Wang, Managing Scientific Metadata Using XML, IEEE Internet Computing, Volume: 6, Issue: 4, pp July-Aug, 2002
Outline Abstract Introduction Metadata XML DIMES Conclusion
Abstract With explosively increasing volumes of remote sensing, model and other Earth science data available, and with the popularity of the Internet, scientists now face the challenge of publishing and finding interesting data sets effectively and efficiently.
Introduction The Earth-observing systems (EOS) satellite Terra alone adds more than half a terabyte of data each day. Metadata have been recognized as a key technology to ease the search and retrieval of Earth science data.
Metadata Data about data; more precisely, structured data about data.
EXAMPLE [XML sample; surviving values: A, 1a, 1b, 90/03/22, 90/05/20]
Metadata Metadata are in very diverse formats since different data providers and data users usually define their own metadata schema.
Example (From )
Metadata How to handle the metadata, therefore, becomes a challenge to the designers and developers of distributed information systems.
XML-Based Distributed Metadata Server (DIMES) In this paper, we discuss the Distributed MEtadata Server (DIMES) prototype system. Designed to be flexible yet simple, DIMES uses XML to represent, store, retrieve and interoperate metadata in a distributed environment.
XML & Metadata The Extensible Markup Language (XML) is ideal for describing ASCII-based data because both human users and computers can understand XML-encoded data. Most Earth science metadata are in ASCII format, and can therefore easily be migrated to XML.
DIMES Currently, most work on XML-based metadata focuses on defining XML structure (tags and relations) for specific scientific disciplines. Our XML-based software solution, on the other hand, supports a wide variety of metadata.
DIMES We have developed such software, based on the XML4J package, with document-type definitions (DTD).
DIMES Metadata model XML query engine Web-based prototype interface
Metadata Model A common weakness of many existing Earth science distributed information systems is the lack of metadata interoperability support. A naive way to integrate metadata from heterogeneous sources is to represent metadata from different sources in XML format.
Metadata Model There are two kinds of elements: 1. Node: an element with an ID attribute. 2. Nonnode: an element without an ID attribute. A node is uniquely identified by the value of its ID attribute.
Metadata Model A node, together with all its nonnode elements, forms a basic information block for describing objects (data or metadata), and is identified by the ID value. We assume the metadata provided is an XML document, and that it is in XML nugget form; that is, a separate XML document describes each data object.
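The node and nonnode definitions above can be sketched in Python with the standard-library xml.etree.ElementTree parser. This is a minimal illustration, not DIMES code: the sample nugget, its element names, and the helper names are hypothetical, and the sketch assumes that a nonnode belongs to its closest enclosing node.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML nugget; element names and ID values are illustrative only.
NUGGET = """
<dataset ID="ds1">
  <title>Sea surface temperature</title>
  <coverage ID="tc1">
    <start>90/03/22</start>
    <end>90/05/20</end>
  </coverage>
</dataset>
"""

def is_node(element):
    """A node is an element that carries an ID attribute."""
    return "ID" in element.attrib

def information_blocks(root):
    """Map each node's ID to the tags of the nonnode elements it owns.

    Assumption: a nonnode belongs to its closest enclosing node, so
    ownership switches whenever the walk reaches another node.
    """
    if not is_node(root):
        raise ValueError("the root of a nugget must carry an ID attribute")
    blocks = {root.get("ID"): []}

    def collect(element, owner_id):
        for child in element:
            if is_node(child):
                blocks[child.get("ID")] = []
                collect(child, child.get("ID"))
            else:
                blocks[owner_id].append(child.tag)
                collect(child, owner_id)

    collect(root, root.get("ID"))
    return blocks

blocks = information_blocks(ET.fromstring(NUGGET))
print(blocks)
```

Each key in the result is one information block, identified by its node's ID value.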
XML nugget [Diagram labels: Metadata; Node (element with an ID attribute); Nonnode; XML nugget]
USING DTD FOR Object identification Type information Node relationships
WHY DTD From an ease-of-use viewpoint, DTD is arguably the best of the six proposed schema languages: XML DTD, XML Schema, XDR, SOX, Schematron, and DSD. (D. Lee and W.W. Chu, Comparative Analysis of Six XML Schema Languages, SIGMOD Record, vol. 29, no. 3, 2000.)
Metadata Model Object identification Each XML nugget has a unique ID value, carried by an ID attribute on the root element of the nugget.
Metadata Model Type information Since many XML nuggets can describe similar objects, we introduce a new XML element, a type node, which is assigned an ID attribute for each object type, and make all XML nuggets that describe similar objects subelements of the type node. [Diagram labels: Type Node; Nonnode; XML nugget]
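A minimal Python sketch of grouping nuggets under a type node, assuming the standard-library xml.etree.ElementTree module. The tag name "type", the function name, and the sample nuggets are hypothetical; the source only states that a type node has an ID attribute and that similar nuggets become its subelements.

```python
import xml.etree.ElementTree as ET

def wrap_in_type_node(type_id, nuggets):
    """Return a type node whose subelements are the given nuggets.

    The tag name "type" is an assumption made for this sketch.
    """
    type_node = ET.Element("type", {"ID": type_id})
    seen = {type_id}
    for nugget in nuggets:
        nugget_id = nugget.get("ID")
        if nugget_id is None:
            raise ValueError("every nugget root needs an ID attribute")
        if nugget_id in seen:
            raise ValueError(f"duplicate ID: {nugget_id}")
        seen.add(nugget_id)
        type_node.append(nugget)
    return type_node

# Hypothetical nuggets describing similar objects.
nuggets = [
    ET.fromstring('<dataset ID="ds1"><title>Ocean color</title></dataset>'),
    ET.fromstring('<dataset ID="ds2"><title>Aerosol depth</title></dataset>'),
]
dataset_type = wrap_in_type_node("dataset_type", nuggets)
print(ET.tostring(dataset_type, encoding="unicode"))
```

The uniqueness check mirrors the object-identification rule: an ID value may identify only one node.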
Metadata Model Node relationships There are two ways to code node relationships in XML documents: subtrees and pointers.
Node relationships Subtrees When a node is a descendant of another node in the XML tree, the two nodes are related.
Subtrees: type–instance relationship The child–parent relationship between two nodes often reflects the type–instance relationship between concepts.
Node relationships Pointers When a node points to another node in the XML tree by an IDREFS attribute, the two nodes are related. IDREFS attributes are used for: node_type, type_instances, refer_to, and inline_types.
Node relationships There can be multiple types for a single instance, however, so it is desirable for a node to have multiple parents. [Diagram labels: TYPE Node; INSTANCE; TYPE Node]
Type information Unfortunately, the basic XML model does not support multiple parents for a single element. Hence, we introduce the attributes node_type to record a node's additional parents, and type_instances to record the reverse relationship.
Type information [Diagram: nodes ID=1, ID=2, ID=3, and ID=4, annotated with type_instances=3 and node_type=4]
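A minimal Python sketch of resolving multiple parents through node_type and type_instances, assuming the standard-library xml.etree.ElementTree module. The sample document and helper names are hypothetical, and the sketch simplifies by counting only an immediate parent that is itself a node as the tree parent.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: instance 3 sits under type 1 in the tree and
# names type 4 as an additional parent through node_type.
DOC = """
<catalog ID="0">
  <type ID="1">
    <instance ID="2"/>
    <instance ID="3" node_type="4"/>
  </type>
  <type ID="4" type_instances="3"/>
</catalog>
"""

def idrefs(element, attribute):
    """Split a whitespace-separated IDREFS attribute into a list of IDs."""
    return element.get(attribute, "").split()

def parents_of(root):
    """Map each node ID to its parents: tree parent first, then node_type."""
    parents = {n.get("ID"): [] for n in root.iter() if "ID" in n.attrib}
    for parent in root.iter():
        if "ID" not in parent.attrib:
            continue
        for child in parent:
            if "ID" in child.attrib:
                parents[child.get("ID")].append(parent.get("ID"))
    for node in root.iter():
        if "ID" in node.attrib:
            parents[node.get("ID")].extend(idrefs(node, "node_type"))
    return parents

def reverse_links_consistent(root, parents):
    """Check that every type_instances entry mirrors a parent link."""
    for node in root.iter():
        for instance_id in idrefs(node, "type_instances"):
            if node.get("ID") not in parents.get(instance_id, []):
                return False
    return True

root = ET.fromstring(DOC)
parents = parents_of(root)
print(parents["3"])
print(reverse_links_consistent(root, parents))
```

Keeping both attributes lets a query walk from an instance to its types and from a type to its instances without scanning the whole document.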
IDREFS attribute: refer_to For simplicity, we assume that the refer_to relationship is symmetric; that is, if node A refers to node B, B also refers back to A.
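The symmetry assumption can be sketched in Python as follows, using the standard-library xml.etree.ElementTree module. The sample document and the function name are hypothetical; the sketch simply adds the reverse link for every refer_to entry it reads.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; only A and C declare refer_to explicitly.
DOC = """
<catalog ID="root">
  <dataset ID="A" refer_to="B C"/>
  <dataset ID="B"/>
  <dataset ID="C" refer_to="A"/>
</catalog>
"""

def symmetric_refer_to(root):
    """Map each node ID to the set of IDs it is related to by refer_to,
    adding the reverse direction for every declared link."""
    links = {n.get("ID"): set() for n in root.iter() if "ID" in n.attrib}
    for node in root.iter():
        source = node.get("ID")
        if source is None:
            continue
        for target in node.get("refer_to", "").split():
            if target not in links:
                raise ValueError(f"refer_to points at unknown ID: {target}")
            links[source].add(target)
            links[target].add(source)
    return links

links = symmetric_refer_to(ET.fromstring(DOC))
print(sorted(links["B"]))
```

Node B declares no refer_to attribute, yet it ends up related to A because A refers to it.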
IDREFS attribute: inline_types Intuitively, a node represents a piece of identifiable metadata. In practice, many nodes share information.
IDREFS attribute: inline_types For example, many data sets have the same temporal coverage, so we represent temporal-coverage as a node. We can define the temporal-coverage node type as an inline node of dataset nodes by using the inline_types attribute.
Metadata Model This model requires: Well-formed XML. Do not use ID as an attribute name for any elements.
DIMES Metadata Model Summary Data providers could add new nodes, new node attributes, and new links to satisfy their metadata requirements. Additionally, having a flexible system implies that we can preserve much of the original metadata structure.
Basic queries The simplest query is finding a node by its ID. To answer basic queries that specify conditions, our XML-based search engine evaluates the conditions on each node, including inline nodes.
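A minimal Python sketch of the two kinds of basic query, assuming the standard-library xml.etree.ElementTree module. The sample catalog, element names, and function names are hypothetical, and the sketch does not model inline nodes.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog; element names and values are illustrative only.
DOC = """
<catalog ID="cat">
  <dataset ID="ds1"><instrument>MODIS</instrument><year>2000</year></dataset>
  <dataset ID="ds2"><instrument>MISR</instrument><year>2001</year></dataset>
  <dataset ID="ds3"><instrument>MODIS</instrument><year>2001</year></dataset>
</catalog>
"""

def find_by_id(root, node_id):
    """Return the node whose ID attribute equals node_id, or None."""
    for element in root.iter():
        if element.get("ID") == node_id:
            return element
    return None

def find_where(root, condition):
    """Evaluate a condition on every node and return the matching IDs."""
    return [e.get("ID") for e in root.iter()
            if "ID" in e.attrib and condition(e)]

root = ET.fromstring(DOC)
print(find_by_id(root, "ds2").findtext("instrument"))
modis = find_where(root, lambda n: n.findtext("instrument") == "MODIS")
print(modis)
```

The condition is an ordinary Python callable here; in DIMES the queries are expressed in XML and handled by the XML query engine.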
Nearest-neighbor search For a given node, its nearest-neighbor node from a given group is the one with the shortest distance. The distance between two nodes is the minimum number of relations (type–instance, parent–child, or refer_to) needed to connect them.
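With distance defined as a minimum number of relations, nearest-neighbor search can be sketched as a breadth-first search. The Python below is an illustration under assumptions, not the DIMES implementation: the relation list and node names are hypothetical, every relation is treated as traversable in both directions, and ties are all returned.

```python
from collections import deque

# Hypothetical relations; each pair stands for one type-instance,
# parent-child, or refer_to link between two nodes.
RELATIONS = [
    ("phenomenon1", "hurricane_type"),
    ("hurricane_type", "dataset_a"),
    ("phenomenon1", "region_x"),
    ("region_x", "dataset_b"),
    ("dataset_b", "dataset_c"),
]

def build_graph(relations):
    """Build an undirected adjacency map from relation pairs."""
    graph = {}
    for left, right in relations:
        graph.setdefault(left, set()).add(right)
        graph.setdefault(right, set()).add(left)
    return graph

def nearest_neighbors(graph, start, group):
    """Return (distance, group members at that distance), or (None, [])."""
    seen = {start}
    frontier = deque([(start, 0)])
    best = None
    found = []
    while frontier:
        node, distance = frontier.popleft()
        if best is not None and distance > best:
            break  # every remaining node is farther than the best match
        if node in group and node != start:
            best = distance
            found.append(node)
        for neighbor in sorted(graph.get(node, ())):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, distance + 1))
    return best, sorted(found)

graph = build_graph(RELATIONS)
result = nearest_neighbors(graph, "phenomenon1",
                           {"dataset_a", "dataset_b", "dataset_c"})
print(result)
```

Breadth-first order guarantees that the first group member reached is at the minimum distance, so the search can stop as soon as it moves past that level.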
EXAMPLE Phenomenon1: Nearest-neighbor 1, …, Nearest-neighbor n
Tree-expand query If we choose one node as a root and all its nearest neighbors as the first-level branches, and so on, we will get a tree presentation. In practice, we use the tree-expand query to present the metadata such that users can navigate it easily and understand its results quickly.
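The tree-expand idea can be sketched in Python as a level-by-level expansion from the chosen root. This is a hedged illustration: the adjacency map, node names, max_depth parameter, and the rule that each node appears only once (under the first node that reaches it) are assumptions made for the sketch.

```python
from collections import deque

# Hypothetical adjacency map of related nodes.
GRAPH = {
    "phenomenon1": ["dataset_a", "region_x"],
    "dataset_a": ["phenomenon1", "instrument_m"],
    "region_x": ["phenomenon1", "dataset_a"],
    "instrument_m": ["dataset_a"],
}

def tree_expand(graph, root, max_depth):
    """Expand breadth-first from root into a nested dict.

    Each node appears once, under the first node that reaches it.
    """
    tree = {root: {}}
    subtrees = {root: tree[root]}
    frontier = deque([(root, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue  # do not expand below the requested depth
        for neighbor in graph.get(node, []):
            if neighbor not in subtrees:
                subtrees[node][neighbor] = {}
                subtrees[neighbor] = subtrees[node][neighbor]
                frontier.append((neighbor, depth + 1))
    return tree

def render(tree, indent=0):
    """Turn the nested dict into indented lines for display."""
    lines = []
    for name, children in tree.items():
        lines.append("  " * indent + name)
        lines.extend(render(children, indent + 1))
    return lines

tree = tree_expand(GRAPH, "phenomenon1", max_depth=2)
print("\n".join(render(tree)))
```

The indented output is the kind of view a navigation interface could show, with the nearest neighbors of the root on the first level.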
Prototype Web Browsers A Web-based DIMES client usually includes a Web interface, an XML translator, and an XML-to-HTML mapper suite.
XML translator When a Web user submits a query, the client passes the query to a specific XML translator, which automatically translates the query into one or more predefined types of queries in XML format, and then sends them to the XML query engine.
XML-to-HTML mapper An XML-to-HTML mapper converts the output from XML into an HTML page, and returns the result to the user. We use Java servlets and XSL Transformations for the translator and mapper tools.
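The mapping step can be illustrated with a small Python stand-in. This is not XSLT and not the DIMES servlet code: it is a plain-Python sketch using the standard-library xml.etree.ElementTree and html modules, and the result format, element names, and function name are hypothetical.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical query-engine output.
RESULT = """
<results>
  <dataset ID="ds1"><title>Sea surface temperature</title></dataset>
  <dataset ID="ds2"><title>Clouds &amp; aerosols</title></dataset>
</results>
"""

def to_html(result_xml):
    """Render every node in the result as one HTML list item."""
    root = ET.fromstring(result_xml)
    items = []
    for node in root.iter():
        if "ID" in node.attrib:
            title = node.findtext("title", default="(untitled)")
            node_id = html.escape(node.get("ID"))
            items.append(f"<li>{node_id}: {html.escape(title)}</li>")
    return "<ul>\n" + "\n".join(items) + "\n</ul>"

page = to_html(RESULT)
print(page)
```

Escaping the text before it is placed in the page keeps characters such as & valid in the generated HTML; an XSL stylesheet would handle the same concern declaratively.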
Prototype Web Browsers We have developed two Web-based prototypes for exploring DIMES capabilities: regular search and metadata navigation.
DIMES Conclusion Our work is closely related to mediators in federated databases, with the goal of accommodating various metadata sources into a unified framework. Our long-term goal is to integrate software components with existing data servers to build the Scientific Data and Information Super Servers (SDISS), which are defined here as servers that support interactive access to metadata, data, and domain knowledge.