SchemaPath: a minimal extension to XML Schema for conditional constraints Paolo Marinelli Claudio Sacerdoti Coen Fabio Vitali University of Bologna (Italy)


1 SchemaPath: a minimal extension to XML Schema for conditional constraints Paolo Marinelli Claudio Sacerdoti Coen Fabio Vitali University of Bologna (Italy)

2 Validation
Validation means writing correctness rules for a class of XML documents and verifying that they hold for every document received. It is possible with a number of schema languages, roughly divided into two kinds:
- Grammar-based languages: DTD, XML Schema (XSD), Relax NG, etc. A whole generative grammar is created, and every document that can be built with this grammar is valid.
- Rule-based languages: Schematron, xlinkit, etc. Rules are defined to check for special conditions (required or rejected). Every document that does not violate any of these rules is valid.

3 Why validate XML documents?
[Figure: an XML doc enters a DOM parser, which rejects input that is not well-formed; a schema validator, driven by rules, rejects invalid DOM trees; the DOM tree + PSVI is passed to the downstream application.]
Usually, when receiving data from an unreliable source, programmers intersperse their application code with checks on data values, error handling, remedial procedures, etc. Validation does all checks before submitting the data to the downstream application, removing the need for most of the checks on data values.

4 The PSVI
XML Schema adds another concept to the validation of XML structures: the decoration of each structure of the XML document with additional information. This is called the Post-Schema Validation Infoset, or PSVI. It can be useful for the downstream application, which can activate specific code depending on the PSVI data available for each element. The most important contribution of the PSVI is without doubt the data type: validation code can assess that an element contains a valid date, a valid number, or a valid complex markup structure, so that the downstream application can skip any check on it and call the appropriate handling code.

5 Unfortunately…
… most schema languages cannot express all the structure and data constraints that document designers may need. For example:
- Mutual exclusion (“element x may have either the a attribute or the b attribute, but not both”)
- Deep exclusions (“element x cannot contain, at any level of its subtree, element y”)
- Structure-dependent structures (“if the item is gratis (the attribute gratis is present), then no price should be specified (the element price should be absent)”)
- Data-dependent structures (“if the address is a PO box, then the address must include a PO box number, otherwise it must include a street name and a street number”)
These kinds of constraints are known as co-constraints, or co-occurrence constraints. Most real-life XML document types have one or more of them.
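
To make these kinds of co-constraint concrete, here is a minimal Python sketch of the hand-written checks an application would need when the schema language cannot express them. It is only an illustration: the element names kind, poboxnumber, streetname and streetnumber are hypothetical, and the function names are invented for this example.

```python
import xml.etree.ElementTree as ET


def mutually_exclusive(elem, a="a", b="b"):
    """Mutual exclusion: elem may carry attribute a or attribute b, not both."""
    return not (a in elem.attrib and b in elem.attrib)


def gratis_has_no_price(item):
    """Structure-dependent rule: a gratis item must not have a price child."""
    return not ("gratis" in item.attrib and item.find("price") is not None)


def address_is_consistent(address):
    """Data-dependent rule: a PO box needs a number, other addresses a street.

    The child element names used here are assumptions made for this sketch.
    """
    if address.findtext("kind") == "pobox":
        return address.find("poboxnumber") is not None
    return (address.find("streetname") is not None
            and address.find("streetnumber") is not None)


both_attrs = ET.fromstring('<x a="1" b="2"/>')
free_with_price = ET.fromstring('<item gratis="yes"><price>3</price></item>')
po_box = ET.fromstring(
    "<address><kind>pobox</kind><poboxnumber>12</poboxnumber></address>"
)

print(mutually_exclusive(both_attrs))       # False: both a and b are present
print(gratis_has_no_price(free_with_price)) # False: gratis item has a price
print(address_is_consistent(po_box))        # True: PO box with its number
```

Each rule is a few lines, but every downstream application has to repeat them, which is the cost a schema-level solution aims to remove.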

6 Plenty of examples
- XHTML: “a elements cannot contain other a elements” (appendix B). Neither the normative DTD nor the non-normative XML Schema can fully express this requirement (they only express a weaker form: “a elements cannot directly contain other a elements”).
- XSLT: “In a template element at least one of the match and name attributes must be present”. Again, the DTD and XML Schema cannot express this requirement, and specify both attributes as optional.
- XML Schema itself: “An element definition must either contain a ref or a name attribute, but not both. Furthermore, if the name attribute is present, then the type attribute or one of the simpleType or complexType elements must be present, but not two.” The normative XML Schema can only specify all these elements and attributes as optional.
… and plenty more…

7 Who cares?
Documents could contain violations of these rules and still be considered valid according to the DTD or XML Schema. Three solutions:
- Cross your fingers and hope for the best
- Provide a default behavior (pick one option and ignore other structures)
- Provide validation code within the downstream application
[Figure: the same pipeline as before (XML doc, DOM parser, schema validator with rules, downstream application), now with question marks on the DOM tree + PSVI and an "incorrect" outcome at the downstream application.]

8 Schematron
Schematron could in fact express most of these requirements (but data- and structure-dependent structures only through hacks). Schematron lacks generative rules, which can be specified only with great pain, or by mixing Schematron rules with the grammar-based rules of another schema language. Suggestions to use XML Schema and Schematron together in one schema document exist in the literature. This is quite complex in practice, requires competence in both languages, and has problems with the PSVI.

9 Extending XML Schema
Our view is that the only practical solution is to extend XML Schema (or another grammar-based language). If the extension is minimal, then implementation costs, learning efforts, and impact on existing schemas are also minimal.

10 Our proposal: SchemaPath
SchemaPath is our proposal to minimally extend XML Schema to handle co-constraints of all kinds. The idea is to conditionally assign types to elements and attributes. Furthermore, a non-satisfiable type is added for specifying error conditions to avoid. SchemaPath:
- maintains the XML Schema syntax,
- adds only ONE construct and ONE predefined simple type,
- maintains important XML Schema properties (the validation theorem and the round-tripping and reverse round-tripping properties),
- does not impact the PSVI for valid documents.
Its simplest implementation is straightforward (~15 lines of code) in any language and architecture where an XSLT engine and an XML Schema engine already exist. The new construct is qualified under the namespace http://www.cs.unibo.it/SchemaPath/1.0, but the parser also accepts the plain XSD schema namespace.

11 SchemaPath syntax (in one slide!)
<xsd:alt>: expresses a condition in the type assignment of an element or an attribute. Its attributes are:
- cond: an optional XPath expression stating the condition that must hold for the type assignment to be performed. Several conditions may hold at once, in which case a priority mechanism is employed. An alt element without an explicit cond attribute implicitly has a low-priority, default, always-true condition.
- priority: an optional decimal number specifying the priority level of a condition, in case the default priority is unsatisfactory.
- type: a required XML Schema type name, which is assigned to the element or attribute if the condition holds and has the top priority.
xsd:error: a predefined unsatisfiable simple type. Assigning this type to an element or an attribute always causes a validation error.
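
The selection rule described above (among the alternatives whose condition holds, the one with the top priority assigns the type) can be sketched in Python. This is an illustration under stated assumptions, not the SchemaPath implementation: Python callables stand in for the XPath in cond, the numeric default priority of 0.0 and the tie-breaking by declaration order are assumptions, and the Alt class and assign_type function are names invented here.

```python
from dataclasses import dataclass
from typing import Callable, Optional
import xml.etree.ElementTree as ET


@dataclass
class Alt:
    """One alternative: a type, an optional condition, and a priority."""
    type: str
    cond: Optional[Callable[[ET.Element], bool]] = None  # stand-in for an XPath
    priority: float = 0.0  # assumed default; the slides do not state it


LOWEST = float("-inf")  # a cond-less alternative ranks below every other one


def assign_type(elem, alts):
    """Return the type of the highest-priority alternative that applies.

    Returns None when no alternative applies; the slides do not say what
    SchemaPath does in that case. Ties keep the earliest alternative,
    which is also an assumption.
    """
    best = None
    for alt in alts:
        if alt.cond is None:
            rank = LOWEST            # always true, lowest priority
        elif alt.cond(elem):
            rank = alt.priority
        else:
            continue                 # condition does not hold
        if best is None or rank > best[0]:
            best = (rank, alt.type)
    return None if best is None else best[1]


# Mutual exclusion on element x: both attributes together select xsd:error.
x_alts = [
    Alt(type="xsd:error", cond=lambda e: "a" in e.attrib and "b" in e.attrib),
    Alt(type="myType"),
]

print(assign_type(ET.fromstring('<x a="1"/>'), x_alts))        # myType
print(assign_type(ET.fromstring('<x a="1" b="2"/>'), x_alts))  # xsd:error
```

Pairing one error alternative with a cond-less default is enough to express a forbidden combination: the default type applies everywhere except where the error condition holds.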

12 A few examples
Mutual exclusion: “Element x may have either the a attribute or the b attribute, but not both”. Suppose we have defined a type myType with both the a and b attributes as optional. [Schema listing not preserved in this transcript.]
Data-dependent structures: “The element quantity must be an integer if the unit element is ‘items’, and it must be a decimal value if the unit element is ‘meters’”. Suppose we have already defined the data type for the unit element to contain only the values “meters” or “items”. [Schema listing not preserved in this transcript.]
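
The quantity/unit rule can be spelled out as a small Python check. This is a sketch of the constraint itself, not of SchemaPath syntax: the regular expressions only approximate the lexical forms of xsd:integer and xsd:decimal, and the wrapping element (called line in the usage below) is a hypothetical parent of unit and quantity.

```python
import re
import xml.etree.ElementTree as ET

# Approximate lexical forms of the two XSD types (assumption: whitespace is
# stripped first, and no further facets apply).
INTEGER = re.compile(r"[+-]?[0-9]+")
DECIMAL = re.compile(r"[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)")

# The value of the sibling unit element selects the pattern for quantity.
PATTERN_BY_UNIT = {"items": INTEGER, "meters": DECIMAL}


def quantity_is_valid(parent):
    """Check quantity against the type selected by its sibling unit."""
    pattern = PATTERN_BY_UNIT.get(parent.findtext("unit", default="").strip())
    if pattern is None:
        return False  # unit is missing or not one of the two allowed values
    text = parent.findtext("quantity", default="").strip()
    return pattern.fullmatch(text) is not None


meters = ET.fromstring("<line><unit>meters</unit><quantity>2.5</quantity></line>")
items = ET.fromstring("<line><unit>items</unit><quantity>2.5</quantity></line>")

print(quantity_is_valid(meters))  # True: 2.5 is a decimal
print(quantity_is_valid(items))   # False: 2.5 is not an integer
```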

13 Addressing co-constraints: XHTML
Deep exclusion of a elements within other a elements: “a elements cannot contain other a elements”. Suppose we have defined an inlineType to contain all inline elements that can go inside an a element, as well as inside other elements such as b, i, etc. [Schema listing not preserved in this transcript.]
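
The gap between the weaker, direct-child form of this rule and the full deep exclusion can be shown with a short Python sketch. It uses un-namespaced element names for brevity (real XHTML elements live in the XHTML namespace), and both function names are invented for this illustration.

```python
import xml.etree.ElementTree as ET


def directly_nested_anchors(root):
    """Weaker form: a elements that have another a as a direct child."""
    return [a for a in root.iter("a") if a.find("a") is not None]


def deeply_nested_anchors(root):
    """Full requirement: a elements with another a anywhere in their subtree."""
    return [a for a in root.iter("a")
            if any(d is not a for d in a.iter("a"))]


# The inner a is wrapped in a b element, so it is not a direct child.
doc = ET.fromstring(
    '<p><a href="#1">see <b><a href="#2">this</a></b></a></p>'
)

print(len(directly_nested_anchors(doc)))  # 0: the weaker rule is satisfied
print(len(deeply_nested_anchors(doc)))    # 1: the full rule is violated
```

The document passes the direct-child check yet still nests one link inside another, which is exactly the case the weaker form lets through.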

14 Addressing co-constraints: XSLT
Minimal presence: “In a template element at least one of the match and name attributes must be present”. Suppose we have already defined a templateType type with the match and name attributes both set as optional. [Schema listing not preserved in this transcript.]

15 Addressing co-constraints: XML Schema
Complex mutual exclusions: “An element definition must either contain a ref or a name attribute, but not both. Furthermore, if the name attribute is present, then either the type attribute or one of the simpleType or complexType elements must be present.” Suppose we have already defined an elementType with a choice of simpleType and complexType, and the type, ref and name attributes as optional. [Schema listing not preserved in this transcript.] The conditions could be made simpler by using different complex types.
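
Written out as explicit checks, the rule quoted on this slide looks as follows in Python. This is a sketch of the constraint as the slides word it, reading "either ... or" together with the earlier "but not two" as "exactly one"; it is not a statement of the XML Schema specification, and the function name is invented here.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"


def element_declaration_errors(decl):
    """Return the violations of the slide's rule for one xsd:element."""
    errors = []
    has_ref = "ref" in decl.attrib
    has_name = "name" in decl.attrib
    if has_ref == has_name:  # both present, or both absent
        errors.append("exactly one of ref and name must be present")
    if has_name:
        type_sources = [
            "type" in decl.attrib,
            decl.find(XSD + "simpleType") is not None,
            decl.find(XSD + "complexType") is not None,
        ]
        if sum(type_sources) != 1:
            errors.append(
                "a named element needs exactly one of type, "
                "simpleType, complexType"
            )
    return errors


NS = 'xmlns:xsd="http://www.w3.org/2001/XMLSchema"'
good = ET.fromstring(f'<xsd:element {NS} name="q" type="xsd:integer"/>')
bad = ET.fromstring(
    f'<xsd:element {NS} name="q" type="xsd:integer">'
    "<xsd:complexType/></xsd:element>"
)

print(element_declaration_errors(good))       # []
print(len(element_declaration_errors(bad)))   # 1
```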

16 Implementation: an XSD preprocessor
[Figure: document X passes through a DOM parser, which rejects input that is not well-formed; an XSD preprocessor turns the SP rules into XSD rules and X into X’; a schema validator checks X’ against the XSD rules, rejecting invalid documents and passing valid ones (ok) to the downstream application.]
SchemaPath validators can be implemented:
- From scratch (but they have a complexity on the order of an XML Schema validator)
- By modifying an existing XML Schema validator (breaking the evolution path of the selected validator)
- As an XSD preprocessor (i.e. an independent application feeding a plain XML Schema validator)
It can be proved that SP validates X iff XSD validates X’.

17 Our XSLT-based process
Our test preprocessor is implemented simply with two (rather convoluted) XSLT stylesheets and about 20 lines of real code. The whole process uses a stylesheet T’ to create an XSD schema out of the SchemaPath schema, and a meta-stylesheet MT to generate a stylesheet T’’ that transforms the XML document X. The whole scheme looks as follows:
[Figure: XSLT T’ turns the SP rules into XSD rules; XSLT MT generates XSLT T’’; XSLT T’’ turns X into X’.]
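
One plausible reading of the preprocessing idea, sketched in Python rather than XSLT: the document is rewritten so that each conditionally typed element is renamed after the alternative whose condition holds, and the schema declares one plainly typed element per alternative. The slides do not show the actual stylesheets, so the renaming scheme, the quantity_altN names and both functions are assumptions made for this illustration.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical alternatives for quantity: (suffix, condition on the parent,
# XSD type). The lambdas stand in for XPaths such as ../unit='items'.
ALTS = [
    ("alt1", lambda parent: parent.findtext("unit") == "items", "xsd:integer"),
    ("alt2", lambda parent: parent.findtext("unit") == "meters", "xsd:decimal"),
]


def rewrite_document(root):
    """Return a copy X' of X with each quantity renamed after its alternative."""
    out = copy.deepcopy(root)  # leave the original document X untouched
    for parent in out.iter():
        for child in parent:
            if child.tag != "quantity":
                continue
            for suffix, cond, _xsd_type in ALTS:
                if cond(parent):
                    child.tag = f"quantity_{suffix}"
                    break  # first matching alternative wins in this sketch
    return out


def rewritten_declarations():
    """Plain XSD declarations, one per alternative, for the renamed elements."""
    return [
        f'<xsd:element name="quantity_{suffix}" type="{xsd_type}"/>'
        for suffix, _cond, xsd_type in ALTS
    ]


x = ET.fromstring("<line><unit>meters</unit><quantity>2.5</quantity></line>")
x_prime = rewrite_document(x)

print(ET.tostring(x_prime, encoding="unicode"))
print("\n".join(rewritten_declarations()))
```

After the rewrite the conditions no longer appear in the schema: a plain XML Schema validator only has to check quantity_alt2 against xsd:decimal, which is what lets validation be delegated to an existing XSD engine.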

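18 An example of the final schema and XML doc
[Schema and document listings not preserved in this transcript; the surviving document fragment reads "meters 2.5".]
Annotations on the final schema: “This used to be the XPath ../unit=‘items’” and “This used to be the XPath ../unit=‘meters’”.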

19 Conclusions
Support for co-constraints is heavily needed in many situations.
- Many schemas and DTDs contain plain-language specifications of co-constraints
- Some document specifications even lament the lack of support for co-constraints in the schema language
The solution is to extend a schema language.
- One grammar, one validation, one schema document
- The implementation as a preprocessor is a great aid
Conditional type assignments are much cleaner than conditional types.
- The PSVI does not change
- Good validity properties are preserved
- Much simpler to implement

20 Thanks! Visit us at http://genesispc.cs.unibo.it:3333/schemapath.asp or http://tesi.fabio.web.cs.unibo.it/schemapath/

