9/25/08 IEEE ICWS 2008 High-Performance XML Parsing and Validation with Permutation Phrase Grammar Parsers Wei Zhang & Robert van Engelen Department of.


1 High-Performance XML Parsing and Validation with Permutation Phrase Grammar Parsers
Wei Zhang & Robert van Engelen, Department of Computer Science, Florida State University
IEEE ICWS 2008, 9/25/08

2 Presentation Overview
- Schema-specific Parsers
- Related Work
- PTDX: Table-Driven XML Parser with Permutation Phrase Grammar
- Performance
- Conclusion

3 Schema-specific parsers
Compile-time vs. run-time parsers
- Compile-time parsing and validation approaches use specialized compilation techniques to generate customized parsers from schemas.
- Run-time approaches use generic drivers (or engines) and a grammar-like representation of schemas.
Blocking vs. non-blocking parsers
- Blocking parsers may suspend the entire program until sufficient XML content has been received, e.g. recursive-descent parsers.
- Non-blocking parsers do not suspend the program; buffered data can be supplied incrementally.
Time-efficient vs. space-efficient parsers
- Time-efficient parsers encode many states.
- Space-efficient parsers rely on backtracking.
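The blocking vs. non-blocking distinction can be illustrated with a small push-style scanner. This is only a sketch under stated assumptions: the class name `PushScanner` and its `feed` method are hypothetical and are not taken from the talk. The caller hands over whatever bytes have arrived, gets back the tags that are complete so far, and keeps control of the program while the rest is still in flight.

```python
class PushScanner:
    """Hypothetical non-blocking scanner: the caller pushes data in,
    the scanner never waits for more input."""

    def __init__(self):
        self._pending = ""  # incomplete tag carried over between chunks

    def feed(self, chunk):
        """Consume one chunk and return every tag completed so far.

        Character data between tags is ignored in this sketch.
        """
        self._pending += chunk
        tags = []
        while True:
            start = self._pending.find("<")
            end = self._pending.find(">", start + 1) if start != -1 else -1
            if end == -1:
                # No complete tag left: keep only a partial tag, if any.
                self._pending = self._pending[start:] if start != -1 else ""
                break
            tags.append(self._pending[start:end + 1])
            self._pending = self._pending[end + 1:]
        return tags


scanner = PushScanner()
first = scanner.feed("<a><b")        # "<b" is incomplete and is held back
second = scanner.feed("></b></a>")   # the next chunk completes it
```

A blocking parser would instead call a read function from inside its recursive procedures and stall there until the missing `>` arrives.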

4 Related Work
- [Van Engelen, 2001]: the earliest work on a schema-specific LL(1) recursive descent parser with namespace support and validation
- [Van Engelen, 2004]: two-level DFA integrating parsing and validation
- [Chiu et al., 2004]: using nondeterministic generalized automata to merge all aspects of low-level parsing and validation
- [Reuter, 2003]: using a Cardinality-Constraint Automaton (CCA) to perform schema-aware validation

5 Related Work (Cont'd)
- [Kostoulas et al., 2006]: an efficient parser generator that translates an XML schema into a parser in either C or Java
- [Matsa, 2007]: schema-directed interpretive XML parser using special-purpose byte-codes
- [Zhang et al., 2006]: a table-driven approach that parses and validates in a single pass, with a generator that translates schemas into C

6 PTDX: Table-Driven XML Parser with Permutation Phrase
Table-driven grammar-based parser
- Extended LL(1) grammar with permutation phrase support
- Parsing table is constructed from the extended LL(1) permutation grammar
Run-time parser
- Generic parsing engine (2-stack PDA)
Both time and space efficient
- Predictive parsing
- Parsing and validation integrated into a single pass
- No buffering
- Operates on tokens
- Main stack size grows with the depth of the XML data
- Auxiliary stack size grows with the number of elements of <all>
Non-blocking parser

7 Constructing PTDX Tables
[Diagram labels: XML Schemas; Mapping Rules; Extended LL(1) Permutation Phrase Grammar; LL(1) Parsing Table; Token Table; Action Table]
Note: actions are generated from schemas to perform type-checking verification, although some validation constraints are incorporated in grammar productions.

8 Mapping Rules
- Define the translation from schema components to LL(1) grammar productions
- Preserve structural constraints
- Map free-ordered schema components to permutation grammar

9 Mapping Example
Schema fragment:
<element name="a" type="string" minOccurs="0"/>
<element name="b" type="string"/>
<element name="c" type="string"/>
Grammar productions:
T → << A B C >>
A → bA CD eA
A → ε
B → bB CD eB
C → bC CD eC
Note: bA and eA represent the tokens for the start and end tags of element "a", respectively; CD represents the CDATA token.
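The mapping in this example can be sketched as a small generator. Everything here is illustrative rather than the authors' implementation: the function name `map_all_group` is hypothetical, the `<< ... >>` spelling of the permutation phrase is an assumed notation (the slide's own notation did not survive extraction), and only string-typed child elements are handled.

```python
def map_all_group(elements):
    """Map a free-ordered group of string-typed elements to productions.

    elements: list of (name, min_occurs) pairs, e.g. [("a", 0), ("b", 1)].
    Returns the productions as strings. The start symbol "T" and the
    "<< ... >>" permutation notation are assumptions of this sketch.
    """
    nonterminals = [name.upper() for name, _ in elements]
    productions = ["T → << " + " ".join(nonterminals) + " >>"]
    for (name, min_occurs), nt in zip(elements, nonterminals):
        # start-tag token, character data, end-tag token
        productions.append(f"{nt} → b{nt} CD e{nt}")
        if min_occurs == 0:
            # an optional element may also derive the empty string
            productions.append(f"{nt} → ε")
    return productions


for production in map_all_group([("a", 0), ("b", 1), ("c", 1)]):
    print(production)
```

The optional element "a" is the only one that receives an ε-production, which is how the `minOccurs="0"` constraint ends up in the grammar rather than in a separate validation step.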

10 Permutation Phrase
A permutation phrase is a grammatical phrase that specifies a syntactic construct as any permutation of a set of constituent elements. E.g., the permutation phrase << a b c >> recognizes the language { abc, acb, bac, bca, cab, cba }.
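As a quick check of the definition, the language of a permutation phrase over single-letter constituents can be enumerated directly. The helper names below are hypothetical and the sketch assumes every constituent is required and occurs exactly once.

```python
from itertools import permutations


def permutation_language(constituents):
    """Return every sentence recognized by a permutation phrase
    over the given single-character constituents, sorted."""
    return sorted("".join(p) for p in permutations(constituents))


def accepts(constituents, sentence):
    """A sentence is accepted when it uses each constituent exactly once,
    in any order."""
    return sorted(sentence) == sorted(constituents)


print(permutation_language("abc"))
print(accepts("abc", "bca"), accepts("abc", "abb"))
```

Spelling the same language out as ordinary grammar alternatives takes n! right-hand sides for n constituents, which is why a dedicated permutation construct keeps the grammar compact.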

11 Two-stack PDA for Parsing Permutation Phrase
[Figure: snapshots 1-3 of the main stack, the auxiliary stack, and the remaining input while parsing the input b c a]

12 Two-stack PDA for Parsing Permutation Phrase (Cont'd)
[Figure: snapshots 4-6 of the main stack, the auxiliary stack, and the remaining input]
Note: all optional constituent elements are left on the auxiliary stack once all non-empty elements have been parsed.
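The idea behind the two stacks can be sketched as follows. This is a simplified recognizer written for illustration, not the PTDX engine: the function name, the token spelling (`bA`, `CD`, `eA`), and the restriction to one flat permutation phrase of string elements are all assumptions. The auxiliary stack holds the constituents not yet seen, and the main stack holds the symbols still expected for the constituent currently being parsed.

```python
def parse_permutation(constituents, optional, tokens):
    """Recognize one flat permutation phrase.

    constituents: nonterminal names, e.g. ["A", "B", "C"]
    optional:     names that may be omitted (minOccurs = 0)
    tokens:       token stream such as ["bB", "CD", "eB", ...]
    """
    aux = list(constituents)  # auxiliary stack: constituents still pending
    main = []                 # main stack: expected symbols, top at the end
    for token in tokens:
        if not main:
            # Predict: pick the pending constituent whose start token
            # matches the lookahead and move it to the main stack.
            match = next((c for c in aux if token == "b" + c), None)
            if match is None:
                return False  # unknown or repeated element
            aux.remove(match)
            main.extend(["e" + match, "CD", "b" + match])
        if main.pop() != token:
            return False      # token does not match the expected symbol
    # Accept when nothing is half-parsed and only optional
    # constituents remain on the auxiliary stack.
    return not main and all(c in optional for c in aux)


ok = parse_permutation(
    ["A", "B", "C"], {"A"},
    ["bC", "CD", "eC", "bB", "CD", "eB"],
)
```

Because a matched constituent is removed from the auxiliary stack, a repeated element is rejected without backtracking, and the auxiliary stack never holds more entries than the phrase has constituents.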

13 PTDX Architecture
[Figure: PTDX architecture; annotation: "Hot-swappable"]

14 Schema-directed Scanner
Optimized by schema
- E.g., scanning for a specific tag name is more efficient than scanning a generic string and then comparing it
Tokenizer
- Breaks the XML message into a token stream
Token
- Defined by element names, attribute names, and enumeration values
- Classified as starting tags and closing tags
- Normalized namespace binding
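A tokenizer driven by a schema-derived token table might look like the sketch below. It is a stand-in built on a regular expression, so it does not reproduce the scanning optimization described on the slide, and it ignores attributes, enumeration values, and namespaces. The function names and the `bX` / `eX` / `CD` token spelling are assumptions.

```python
import re


def build_token_table(element_names):
    """Derive start-tag and end-tag tokens from the schema's element names."""
    table = {}
    for name in element_names:
        table[f"<{name}>"] = "b" + name.upper()   # starting tag
        table[f"</{name}>"] = "e" + name.upper()  # closing tag
    return table


def tokenize(xml, table):
    """Break an XML message into a token stream using the token table."""
    tokens = []
    for piece in re.findall(r"<[^>]+>|[^<]+", xml):
        if piece.startswith("<"):
            if piece not in table:
                raise ValueError(f"tag not defined by the schema: {piece}")
            tokens.append(table[piece])
        elif piece.strip():
            tokens.append("CD")  # non-blank character data
    return tokens


table = build_token_table(["a", "b"])
stream = tokenize("<b>x</b><a>y</a>", table)
```

Rejecting tags that are missing from the table already performs part of the validation, since an element the schema does not define never reaches the parser.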

15 Experiment Settings
Test environment
- 3.0 GHz, 2GB RAM, Linux 2.6.20-1.2320, GCC 4.1.1 with option -O2
- Memory-resident message
- Randomly arranged free-ordered elements
Compared with
- Validating parsers: gSOAP 2.7, Xerces 2.7.0, pTDX flex based parser
- Non-validating parsers: Expat 2.0.1, DFA-based parser

16 Test Cases

17 Performance: comparison of validating and non-validating parsers
[Chart not reproduced; annotation: "Better performance"]

18 Performance: effect of the number of elements in <all> on the PTDX parser
[Chart not reproduced; annotation: "Better performance"]

19 Performance: runtime and compile-time memory usage comparison (32 elements)

20 Conclusion
- Free-ordered constraints can be parsed and validated efficiently using a 2-stack PDA
- The table-driven permutation phrase grammar parsing technique is time and space optimal
- The table-driven approach offers a flexible framework for dealing with schema evolution

