From Semistructured Data to XML: Migrating The Lore Data Model and Query Language Roy Goldman, Jason McHugh, Jennifer Widom Stanford University

http://www-db.stanford.edu/lore/

Introduction
Lore
–Originally a DBMS designed specifically for semistructured data
–Semistructured data models and XML share many similarities
–Migrating Lore to work with XML
    Modifications to data model
    Changes to query language
    Changes to DataGuides

OEM (Object Exchange Model)
Lore’s original data model
All entities are atomic or complex objects
Each object has a unique object identifier (oid)
Atomic objects contain a value from one of the atomic types (integer, real, string, etc.)
Complex objects are sets of <label, subobject> pairs
Can be thought of as a labeled directed graph
–objects are nodes
–complex objects have labeled outgoing edges
–atomic objects contain their value
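The OEM structure described above can be sketched in a few lines of Python. This is only an illustration: the class names `Atomic` and `Complex`, the oid counter, and the sample labels and values are assumptions made for the example, not Lore's implementation.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Union

_next_oid = count(1)  # hypothetical oid generator, for illustration only


@dataclass
class Atomic:
    """Atomic object: holds one value of an atomic type."""
    value: Union[int, float, str]
    oid: int = field(default_factory=lambda: next(_next_oid))


@dataclass
class Complex:
    """Complex object: a collection of (label, subobject) pairs.

    A list is used for storage, but OEM attaches no meaning to the order.
    """
    edges: list = field(default_factory=list)
    oid: int = field(default_factory=lambda: next(_next_oid))

    def add(self, label, child):
        """Add a labeled outgoing edge and return the child object."""
        self.edges.append((label, child))
        return child

    def children(self, label):
        """Follow all outgoing edges carrying the given label."""
        return [obj for lbl, obj in self.edges if lbl == label]


# A tiny labeled directed graph with made-up data.
db_group = Complex()
member = db_group.add("Member", Complex())
member.add("Name", Atomic("Smith"))
member.add("Office", Atomic(252))

names = [obj.value for obj in member.children("Name")]
```

Nodes are the objects, edges are the labeled pairs held by complex objects, and values live only in atomic objects, mirroring the three bullets above.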

Differences between XML and OEM
XML has attributes
XML is ordered, OEM is not
XML does not directly support graph structure
–Uses special attribute types to encode graph structure
–Example: Attribute Id is of type ID, Colleague is of type IDREF, and Author is of type IDREFS
[Example figure not recovered; surviving labels: Colleague, Author, Jennifer Widom, Jeff Ullman]

Literal vs. Semantic Data Model
Should an XML data model be a literal tree corresponding to XML’s text representation? (where IDREF(S) are nothing but string attributes)
Or should it be a graph that includes all the intended links? (preserving the semantic graph structure)
It should be... BOTH!
–Both literal and semantic modes should be supported
–The user or application can select between the two

Lore’s XML Data Model
An XML element is a pair <eid, value>
eid is a unique element identifier
value is either an atomic text string or a complex value containing the following four components:
–A string-valued tag corresponding to the XML tag for that element
–An ordered list of attribute-name/atomic-value pairs (attribute-name is a string, atomic-value has an atomic type)
–An ordered list of crosslink subelements of the form <label, eid>, where label is a string. Crosslink subelements are introduced via an attribute of type IDREF(S)
–An ordered list of normal subelements of the form <label, eid>, where label is a string. Normal subelements are introduced via lexical nesting within an XML document
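The four-component value described above maps naturally onto a small record type. The following Python sketch is an assumption-laden illustration: the names `Element`, `ComplexValue`, `store`, and `add`, and the sample data, are invented for the example and do not come from Lore.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass
class ComplexValue:
    tag: str                                                           # XML tag of the element
    attributes: List[Tuple[str, str]] = field(default_factory=list)   # ordered (name, atomic value)
    crosslinks: List[Tuple[str, int]] = field(default_factory=list)   # ordered (label, eid), from IDREF(S)
    subelements: List[Tuple[str, int]] = field(default_factory=list)  # ordered (label, eid), from nesting


@dataclass
class Element:
    eid: int                          # unique element identifier
    value: Union[str, ComplexValue]   # atomic text string or complex value


store = {}  # hypothetical eid -> Element table


def add(eid, value):
    store[eid] = Element(eid, value)
    return store[eid]


# <Member ID="m1"><Name>Widom</Name></Member>, with made-up eids 1..3
add(1, ComplexValue(tag="Member", attributes=[("ID", "m1")], subelements=[("Name", 2)]))
add(2, ComplexValue(tag="Name", subelements=[("Text", 3)]))
add(3, "Widom")
```

Subelements are referenced by eid rather than nested directly, so the same element can be the target of both a normal edge and any number of crosslink edges.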

XML Document/Graph Example
eids appear within nodes (&1, &2, etc.)
Attributes appear within brackets next to the nodes
Two types of edges:
–Normal subelement edges labeled with destination subelement’s tag (solid line)
–Crosslink edges labeled with the attribute name that introduced the link (dashed line)
Semantic vs. Literal:
–In semantic mode, omit attributes of type IDREF(S)
–In literal mode, omit crosslink edges
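The two modes can be illustrated with a short Python sketch built on the standard `xml.etree.ElementTree` module. Several things here are assumptions: the sample document, the function name `build_graph`, and the hand-written sets `ID_ATTRS` and `IDREF_ATTRS` (ElementTree does not read attribute types from a DTD, so they are declared manually). Text nodes are left out to keep the example small.

```python
import xml.etree.ElementTree as ET

# Made-up document; names A, B and title T are placeholders.
DOC = """
<DBGroup>
  <Member ID="m1" Colleague="m2"><Name>A</Name></Member>
  <Member ID="m2"><Name>B</Name></Member>
  <Publication Author="m1 m2"><Title>T</Title></Publication>
</DBGroup>
"""

ID_ATTRS = {"ID"}                      # attributes assumed to be of type ID
IDREF_ATTRS = {"Colleague", "Author"}  # attributes assumed to be IDREF / IDREFS


def build_graph(xml_text, semantic):
    """Return (nodes, normal_edges, crosslink_edges) for the chosen mode."""
    root = ET.fromstring(xml_text)
    elems = list(root.iter())                         # document order
    eid = {id(e): i + 1 for i, e in enumerate(elems)}  # assign eids 1, 2, ...
    by_id = {e.get(a): e for e in elems for a in ID_ATTRS if e.get(a) is not None}

    nodes, normal, cross = {}, [], []
    for e in elems:
        attrs = dict(e.attrib)
        for child in e:
            normal.append((eid[id(e)], child.tag, eid[id(child)]))
        if semantic:
            # Replace IDREF(S) attributes by crosslink edges.
            for name in list(attrs):
                if name in IDREF_ATTRS:
                    for ref in attrs.pop(name).split():
                        cross.append((eid[id(e)], name, eid[id(by_id[ref])]))
        nodes[eid[id(e)]] = (e.tag, attrs)
    return nodes, normal, cross


sem_nodes, sem_normal, sem_cross = build_graph(DOC, semantic=True)
lit_nodes, lit_normal, lit_cross = build_graph(DOC, semantic=False)
```

In semantic mode the `Colleague` and `Author` attributes disappear from the node and show up as crosslink edges; in literal mode they remain plain string attributes and no crosslink edges exist.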

Migrating Lorel (Lore’s query language)
Distinguishing between attributes and subelements
–Lorel uses path expressions
    A sequence of labels such as DBGroup.Member.Project.Title
    Can also contain wildcards and regular expressions
–Path expression qualifiers differentiate between attributes and subelements
    Placing a ‘>’ before a label matches subelements only
    Placing a ‘@’ before a label matches attributes only
    Absence of a qualifier means match both
–Examples:
    DBGroup.Member.>Name will match Name elements that are subelements of DBGroup.Member elements
    DBGroup.Member.@Name will match Name attributes of DBGroup.Member elements
    DBGroup.Member.Name will match both
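The qualifier rules above can be mimicked with a small Python evaluator over `xml.etree.ElementTree`. This is a sketch of the matching rule only, not of Lorel: the function names `match_component` and `evaluate` and the sample document are assumptions, and wildcards and regular expressions are not handled.

```python
import xml.etree.ElementTree as ET

# Made-up document in which Member has both a Name attribute and a Name subelement.
DOC = '<DBGroup><Member Name="attr-name"><Name>elem-name</Name></Member></DBGroup>'


def match_component(element, component):
    """Match one path component against an element's attributes and subelements."""
    if component.startswith(">"):
        label, want_sub, want_attr = component[1:], True, False
    elif component.startswith("@"):
        label, want_sub, want_attr = component[1:], False, True
    else:
        label, want_sub, want_attr = component, True, True

    found = []
    if want_attr and label in element.attrib:
        found.append(("attribute", element.attrib[label]))
    if want_sub:
        found.extend(("subelement", child) for child in element if child.tag == label)
    return found


def evaluate(root, path):
    """Evaluate a dotted path whose first label is assumed to name the root."""
    first, *rest = path.split(".")
    current = [("subelement", root)] if root.tag == first else []
    for component in rest:
        nxt = []
        for kind, node in current:
            if kind == "subelement":  # attributes have nothing beneath them
                nxt.extend(match_component(node, component))
        current = nxt
    return current


root = ET.fromstring(DOC)
only_sub = evaluate(root, "DBGroup.Member.>Name")
only_attr = evaluate(root, "DBGroup.Member.@Name")
both = evaluate(root, "DBGroup.Member.Name")
```

Here `only_sub` holds the `Name` subelement, `only_attr` holds the `Name` attribute value, and `both` holds the two together.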

Migrating Lorel (continued...)
Comparisons
–How do we compare two different things? (for example, comparing constants with attribute values)
    All XML components are treated as atomic values...
Functions that transform elements into strings:
–Flatten(e) : Ignoring all tags, recursively serialize all text values in the subtree rooted at element e
–Concatenate(e) : Concatenates all immediate text children of element e (subelements are ignored)
–Tag(e) : Returns the XML tag of element e
–Eid(e) : Returns the eid of element e as a string
–XML(e) : Transforms the graph, starting with element e, into an XML document
Default Semantics (when no functions are specified):
–atomic (Text) element : the text itself
–elements with no attributes and only one or more Text elements as children : concatenation of the children’s text values
–all others : the element’s eid represented as a string
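Rough Python analogues of several of these functions can be written against `xml.etree.ElementTree`. They are approximations under stated assumptions: the lowercase function names are invented, `to_xml` serializes a tree rather than Lore's graph, and `Eid` is omitted because ElementTree elements carry no eid.

```python
import xml.etree.ElementTree as ET


def flatten(e):
    """Ignore tags and join every piece of text in the subtree rooted at e."""
    return "".join(e.itertext())


def concatenate(e):
    """Join only the text directly inside e, skipping subelements."""
    parts = [e.text or ""]
    parts.extend(child.tail or "" for child in e)  # text following each child
    return "".join(parts)


def tag(e):
    """Return the XML tag of e."""
    return e.tag


def to_xml(e):
    """Serialize the subtree rooted at e back to XML text."""
    return ET.tostring(e, encoding="unicode")


# Made-up element mixing text and a nested subelement.
title = ET.fromstring("<Title>Lore <b>XML</b> model</Title>")

flat = flatten(title)         # text of the whole subtree
concat = concatenate(title)   # immediate text only
name = tag(title)
xml_text = to_xml(title)
```

The difference between the first two shows up as soon as an element has mixed content: `flatten` includes the text inside `<b>`, while `concatenate` drops it.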

Migrating Lorel (continued...)
Range qualifiers
–The expression [range] can be optionally applied to any path expression component or variable
    Example: select y from DBGroup.Member x, x.Office[1-2] y
    –returns the first two Office subelements of every group member
    Example: select y[1-2] from DBGroup.Member x, x.Office y
    –returns the first two Office subelements over ALL members
Order-by clause
–Query results are ordered lists of eids that identify the elements selected by the query (attributes are coerced into elements)
–order-by-document-order orders results based on original XML document
    Newly constructed elements are placed at the end of the document order with no specified order among them
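The difference between the two range examples above is whether the range is applied per binding of x or to the combined result. A minimal Python sketch, using made-up member and office identifiers and a hypothetical helper `take_range`:

```python
# Made-up data: each member maps to an ordered list of Office subelements.
members = {
    "m1": ["o1", "o2", "o3"],
    "m2": ["o4"],
    "m3": ["o5", "o6"],
}


def take_range(items, lo, hi):
    """Select positions lo..hi, 1-based and inclusive, like [lo-hi]."""
    return items[lo - 1:hi]


# select y from DBGroup.Member x, x.Office[1-2] y
# The range applies to each member's Office list separately.
per_member = [y for offices in members.values() for y in take_range(offices, 1, 2)]

# select y[1-2] from DBGroup.Member x, x.Office y
# The range applies to the variable y, i.e. to all Office bindings together.
all_offices = [y for offices in members.values() for y in offices]
overall = take_range(all_offices, 1, 2)
```

With this data `per_member` keeps up to two offices for each of the three members, whereas `overall` keeps only the first two offices in total.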

Migrating Lorel (continued...)
Transformations and structured results
–Using queries to restructure XML data
The with clause (added to the standard select-from-where construct)
–Query result will replicate all data selected by the select clause, along with all data reachable via a set of path expressions in the with clause
Skolem functions
–Allow more expressive data restructuring
–Accept a list of variables as arguments and produce one unique element for every binding of elements and/or attributes to the arguments
Updates
–Lorel supports an expressive update language
–Changes for XML model:
    ability to create both attributes and elements
    order-relevant updates
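The Skolem function behaviour described above (one unique element per distinct binding of the arguments) amounts to memoized element creation. A minimal Python sketch follows; the class name `Skolem`, the generated identifiers, and the publication data are all assumptions for illustration and say nothing about Lorel's actual syntax.

```python
from itertools import count


class Skolem:
    """Return the same new element for the same argument binding."""

    def __init__(self, name):
        self.name = name
        self._made = {}        # argument tuple -> created element
        self._ids = count(1)

    def __call__(self, *args):
        if args not in self._made:
            # First time this binding is seen: create a fresh element.
            self._made[args] = f"{self.name}#{next(self._ids)}"
        return self._made[args]


# Made-up (publication, year) bindings.
rows = [("p1", 1997), ("p2", 1999), ("p3", 1997)]

year_elem = Skolem("Year")
grouped = {}
for pub, year in rows:
    # Both 1997 publications land under the same Year element.
    grouped.setdefault(year_elem(year), []).append(pub)
```

Because repeated bindings return the existing element rather than a new one, a query can use a Skolem function to group scattered data under shared result elements.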

Migrating Lorel (continued...)
DataGuides
–Can be used when a DTD is not supplied
–A notion of order must be introduced
    Problem: could result in very large DataGuides
–When DTDs exist, DataGuides are built from those DTDs
–Combining DTDs and DataGuides
    DTDs available for specific portions of an XML database
    DataGuides can be used over portions not specified by DTDs

Conclusion
As of June 1999, the migration of Lore to an XML model is nearly complete
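As a rough illustration of the DataGuide idea mentioned above, a structural summary inferred from the data when no DTD is supplied, the Python sketch below records each label path of a tree-shaped document once. It is deliberately simplified and rests on assumptions: the function name `build_dataguide` and the sample document are invented, and order, attributes, and crosslinks are all ignored, so it sidesteps the ordering problem the slide raises.

```python
import xml.etree.ElementTree as ET

# Made-up document: two members with overlapping but different structure.
DOC = """
<DBGroup>
  <Member><Name>A</Name><Office>1</Office></Member>
  <Member><Name>B</Name></Member>
</DBGroup>
"""


def build_dataguide(element, guide=None):
    """Merge the label paths below `element` into a nested dict."""
    guide = {} if guide is None else guide
    for child in element:
        # setdefault reuses the entry if this label was already seen here,
        # so every label path appears only once in the summary.
        build_dataguide(child, guide.setdefault(child.tag, {}))
    return guide


root = ET.fromstring(DOC)
guide = {root.tag: build_dataguide(root)}
```

The summary merges both `Member` elements into one entry listing `Name` and `Office`, which is the kind of compact overview a user can browse in place of a DTD.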
