Published by Hester Richards. Modified over 9 years ago.
1
10/14/2001
Coping with Semantics in XML Document Management
Thomas Kudrass
Leipzig University of Applied Sciences
Department of Computer Science and Mathematics
2
Overview
Introduction
– Motivation
– XML: A Semantic Perspective
– XML Document Types
XML Semantic Problems
– XML: A Database Perspective
– Common Mapping Problems
RM-ODP Viewpoints on XML Documents
– Content View vs. Logical Layout View
– Example
Realization of XML Document Management: Nesting of Viewpoints
Conclusions
3
Motivation
Aim: XML Document Management using Database Systems
Problem: Map XML Documents to Databases
– different approaches
– no mapping rules
– many open issues
Reason: Semantics of XML not well understood
– XML: only syntax, no predefined semantics
4
XML - A Semantic Perspective
User-Defined Markup
– structure the character data of a document
– explain the document through the use of names
Naming
– RM-ODP: “A name is a term that refers to an entity in a given naming context.”
– XML namespaces: no solution
– possible improvement: shared ontologies
No Standard Behavior of Tags
– XSL processors: flexible presentation of the XML document
– XML processor: check well-formedness and validity of the XML document
– open issue: document object semantics
5
XML Document Types
Data-Centric Documents
– designed for machine consumption (XML for data transport)
– examples: sales orders, stock quotes, flight schedules
– fairly regular structure
– fine-grained data
Document-Centric Documents
– designed for human readers
– examples: books, journal articles, emails
– less regular structure
– coarse-grained data
Hybrid Documents
– composition of documents of different types
– example: medical documents = patient data + findings + prescriptions + procedures
Document Type Requirements to the Document Management System
6
XML - A Database Perspective
Round-Trip Problem
– store an XML document in a database and retrieve the “same” document back again
– vital to applications required by law to keep exact copies of documents
– less important for data-centric documents
  – focus on the document content
  – ignore the order of sibling elements
– many XML-to-DB algorithms don't preserve the whole document
  – CDATA sections
  – character entities
  – comments
  – processing instructions
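The round-trip problem can be observed even without a database. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` as a stand-in for an XML-to-DB mapping layer; the sample document is invented for illustration. Parsing and re-serializing keeps the content but loses the comment and the CDATA section.

```python
import xml.etree.ElementTree as ET

# Invented sample document with a comment and a CDATA section.
original = (
    "<order><!-- checked by legal -->"
    "<note><![CDATA[price < 100]]></note></order>"
)

# The default parser discards comments and merges CDATA into plain text.
root = ET.fromstring(original)
restored = ET.tostring(root, encoding="unicode")

print(restored)
print("identical:", restored == original)
print("same content:", root.findtext("note") == "price < 100")
```

The content survives, which is enough for a data-centric document, but an application that must keep exact copies would have to store the original text as well.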
7
Common Mapping Problems (1)
Attributes vs. Element Text
– where to store data of a document?
– both alternatives possible, influenced by the implementation
Meaning of Attributes
– ambiguities when interpreting attributes
– example: order of a customer has an attribute expiry date = “11/2001”
– different meanings:
  – “The order will expire in Nov. 2001”
  – “The information about the order can be thrown away in Nov. 2001”
  – “The expiry date is information about the credit card used for purchase”
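The attribute-versus-element choice can be illustrated with a short sketch. The element names, the attribute names, and the helper `to_record` are invented for this example; the point is only that both encodings can be flattened to the same record.

```python
import xml.etree.ElementTree as ET

# Two invented encodings of the same facts.
as_attribute = ET.fromstring('<order id="O1" expiry="11/2001"/>')
as_element = ET.fromstring(
    "<order><id>O1</id><expiry>11/2001</expiry></order>"
)

def to_record(elem):
    """Flatten attributes and leaf child elements into one dict (hypothetical helper)."""
    record = dict(elem.attrib)
    for child in elem:
        if len(child) == 0:  # leaf element: treat its text as a column value
            record[child.tag] = (child.text or "").strip()
    return record

print(to_record(as_attribute))
print(to_record(as_attribute) == to_record(as_element))
```

Neither encoding says what `expiry` means, so the ambiguity described above remains after the mapping.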
8
Common Mapping Problems (2)
Null Values
– different semantics of null values
– database null values have to be reflected in XML documents
– XML Schema:
  – null values in an element's text can be expressed
  – no concept of null for attributes
– DTD: optional elements and attributes
Comments, Processing Instructions
– not considered document content by many algorithms
Markup
– visible in the logical document layout (e.g., character entities)
– substituted in the physical representation of the document
– Example: <foo/> stored in a database
  – non-XML-aware databases don't recognize markup
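A minimal sketch of the null-value problem, assuming the standard XML Schema instance attribute `xsi:nil` is used to mark explicit nulls. The document and the helper `column_value` are invented for illustration.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

doc = ET.fromstring(
    f'<customer xmlns:xsi="{XSI}">'
    "<name>Smith</name>"
    '<phone xsi:nil="true"/>'  # explicit null
    "<fax></fax>"              # empty string, not null
    "</customer>"
)

def column_value(parent, tag):
    """Map a child element to a database-style value (hypothetical helper)."""
    child = parent.find(tag)
    if child is None:
        return None  # absent optional element: one possible reading of null
    if child.get(f"{{{XSI}}}nil") == "true":
        return None  # xsi:nil="true": explicit null
    return child.text or ""

row = {tag: column_value(doc, tag) for tag in ("name", "phone", "fax", "email")}
print(row)
```

The absent `email` element and the nilled `phone` element both become `None` here, so the mapping already collapses two different null semantics into one.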
9
Common Mapping Problems (3)
Links
– links originally designed for documents and document fragments
  – e.g., XPointers point to document subtrees using XPath
– not adequate to express semantic relationships among document elements
  – e.g., ID: identifier value - primary key; IDREF - foreign key
Behavioral Semantics?
– another language more appropriate to specify the invariants
Sibling Orders
– particularly important for document-centric documents
– can be arbitrary in data-centric documents
Other Invariants (e.g., identity constraints)
– specified on the level of instances - not schema
– first construct the set of all concerned objects (using XPath)
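The ID / IDREF analogy to primary and foreign keys can be sketched as an instance-level check. This is an illustration under assumptions: the attribute names `id` and `customer` are invented and are treated as ID and IDREF by convention, since no DTD is involved.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<shop>"
    '<customer id="C1"/><customer id="C2"/>'
    '<order id="O1" customer="C1"/>'
    '<order id="O2" customer="C9"/>'
    "</shop>"
)

# Primary-key side: collect the set of all customer identifiers first.
customer_ids = {c.get("id") for c in doc.iter("customer")}

# Foreign-key side: every reference must match a collected identifier.
dangling = [
    o.get("id") for o in doc.iter("order")
    if o.get("customer") not in customer_ids
]
print(dangling)
```

In a DTD, an IDREF only has to match some ID in the document, not the ID of a particular element type, which is one reason the slide calls these links inadequate for semantic relationships.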
10
RM-ODP Viewpoints on Documents
Physical Presentation View
– dependent on media, screen size / paper size
– document = composition of characters with attributes (font, size, style)
– XML character entities replaced
Logical Layout View
– composition of prose components (paragraphs, sections, lists, list items) and other objects (e.g., frames, code sections)
– mostly ordered composition in document-centric documents
– many possible physical presentation views
Content View
– composition of information objects (title, author, abstract, body, bibliography)
– can be organized in a hierarchical structure or can be flat
– mapped to several logical layouts
11
Content View vs. Logical Layout
Content View
– document-centric documents
  – information viewpoint in DTD or XML Schema
  – some constructs to specify structural constraints (e.g., cardinality constraints in XML Schema)
– data-centric documents
  – structure not very relevant
  – many invariants among content elements cannot be adequately expressed in DTD / XML Schema
  – possible abuse of XLink / XPointers to specify relationships among content elements
Logical Layout
– document-centric documents: may follow the structure of the content
– data-centric documents: often arbitrary
12
Data-Centric Documents: Content View
Example:
Integrity Constraints:
– The overall value of an order must exceed a certain minimum.
– A customer can submit at most 5 orders.
– If a customer is deleted, all of his orders have to be cancelled.
[Entity-relationship diagram: Customer, Order Header, Line Item, Product; cardinalities (1,1) and (1,N); further labels: CD, Rel]
How to Map to an XML Document? OR How to Map to the Logical Layout View?
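The first two integrity constraints can be checked on the instance level, outside DTD or XML Schema. The following is a sketch under assumptions: the document layout, the attribute names, and the threshold `MIN_ORDER_VALUE` are invented for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

MIN_ORDER_VALUE = 10.0        # assumed threshold, not given on the slide
MAX_ORDERS_PER_CUSTOMER = 5   # from the second constraint

doc = ET.fromstring(
    "<orders>"
    '<order id="O1" customer="C1"><item price="8.00" qty="2"/></order>'
    '<order id="O2" customer="C1"><item price="3.00" qty="1"/></order>'
    "</orders>"
)

def order_value(order):
    """Sum price * quantity over all line items of one order."""
    return sum(
        float(i.get("price")) * int(i.get("qty")) for i in order.iter("item")
    )

# Constraint 1: the overall value of an order must exceed the minimum.
too_small = [
    o.get("id") for o in doc.iter("order")
    if order_value(o) <= MIN_ORDER_VALUE
]

# Constraint 2: a customer can submit at most 5 orders.
per_customer = Counter(o.get("customer") for o in doc.iter("order"))
too_many = [c for c, n in per_customer.items() if n > MAX_ORDERS_PER_CUSTOMER]

print(too_small, too_many)
```

The third constraint describes behavior on deletion and cannot be expressed as a check on a single document state.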
13
Data-Centric Documents: Logical Layout View
Alternative 1: C1 ... O1 ... O2 ... C2 ... O3 ... O4 ...
Alternative 2: O1 ... C1 ... O2 ... C1 ... O3 ... C2 ... O4 ... C2 ...
Alternative 3: ... O1 ... C1 ... O2 ... C1 ... O3 ... C2 ...
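Different logical layouts can carry the same content. The sketch below assumes that Alternative 1 nests orders inside their customer and that Alternative 2 repeats the customer inside each order; the element and attribute names are invented, since the markup of the original slide is not preserved.

```python
import xml.etree.ElementTree as ET

# Assumed reading of Alternative 1: customers contain their orders.
alt1 = ET.fromstring(
    "<db>"
    '<customer id="C1"><order id="O1"/><order id="O2"/></customer>'
    '<customer id="C2"><order id="O3"/><order id="O4"/></customer>'
    "</db>"
)

# Assumed reading of Alternative 2: each order repeats its customer.
alt2 = ET.fromstring(
    "<db>"
    '<order id="O1"><customer id="C1"/></order>'
    '<order id="O2"><customer id="C1"/></order>'
    '<order id="O3"><customer id="C2"/></order>'
    '<order id="O4"><customer id="C2"/></order>'
    "</db>"
)

# Content view: the set of (customer, order) pairs, independent of nesting.
pairs1 = {
    (c.get("id"), o.get("id"))
    for c in alt1.iter("customer") for o in c.iter("order")
}
pairs2 = {
    (o.find("customer").get("id"), o.get("id")) for o in alt2.iter("order")
}
print(pairs1 == pairs2)
```

Both trees yield the same relation, so the choice between them is a layout decision rather than a content decision.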
14
Operations
Operations are viewpoint-specific
XML-APIs: DOM / XPath
– based on a tree model
– although powerful, not appropriate for set-oriented operations
Viewpoints vs. Operations
– content view: set-oriented operations
– logical layout view: navigating operations (on a tree)
Need another language to express operations in the content view of data-centric documents!
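The contrast between navigating and set-oriented operations can be sketched as follows. SQL on an in-memory SQLite database is used here only as one possible set-oriented language, not as the language the author proposes; the document and table layout are invented for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<db>"
    '<customer id="C1">'
    '<order id="O1" total="40"/><order id="O2" total="15"/>'
    "</customer>"
    '<customer id="C2"><order id="O3" total="25"/></customer>'
    "</db>"
)

# Navigating operation (logical layout view): follow a path in the tree.
first_order_of_c1 = doc.find("customer[@id='C1']/order").get("id")

# Set-oriented operation (content view): shred into a relation, then aggregate.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id TEXT, customer TEXT, total REAL)")
con.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (o.get("id"), c.get("id"), float(o.get("total")))
        for c in doc.iter("customer") for o in c.iter("order")
    ],
)
totals = dict(
    con.execute("SELECT customer, SUM(total) FROM orders GROUP BY customer")
)
print(first_order_of_c1, totals)
```

The path expression depends on the nesting of the document, while the aggregate query depends only on the content relation.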
15
Realization: Nesting of Viewpoints
[Diagram: each view of an XML document is expanded into the RM-ODP viewpoints Enterprise, Information, Computational, Engineering, Technology]
Content View: “Store”, “Retrieve”; Semantic Model; RDBMS, native XML-DB
Logical Layout View: “Store”, “Retrieve”; XML Schema, DTD; File (Template), Large Object
Presentation View: “Browse”, “Store”; SVG, PDF; Media: Screen, Paper
16
Conclusions
Analyze the requirements first before building an XML system
– data-centric vs. document-centric documents
– huge impact on the choice of technology (storage platform)
Think in viewpoints to understand the semantics
– mixed occurrence of content view and logical layout in XML documents
– expand viewpoints into the specification of a new system
Use generic relationships for constraint modelling
Beware of the difference between specification and realization