TDX: a High-Performance Table-Driven XML Parser
Wei Zhang, Robert van Engelen
Department of Computer Science, Florida State University

Slide 2: Outline
- Motivation
- Introduction
- Recent Work
- Table-Driven XML Parsing – TDX
- TDX Construction Toolkit
- Results and Preliminary Conclusion

Slide 3: Motivation
- Enhance performance for XML-based Web services
- Provide flexibility
- Offer high-level modularity

Slide 5: Introduction
Validating XML parsing has three stages:
- Well-formedness checking
- Validation
- Data conversion
Separating the stages introduces overhead and requires frequent access to the schema.
[Figure: well-formedness, validation, and data conversion as separate stages in front of the XML application]

Slide 6: Introduction (cont'd)
Schema-specific XML parsing (SSP):
- Merges well-formedness checking and validation
- Does not require frequent access to the schema
- Existing SSP implementations still keep data conversion as a separate stage
[Figure: merged well-formedness and validation, followed by a separate data-conversion stage]

Slide 8: Recent Work
Chiu: “A compiler-based approach to schema-specific XML parsing”
- Merges parsing and validation by constructing a pushdown automaton (PDA)
- No namespace support
- Converting the NFA to a DFA may require exponentially growing space
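A hedged illustration of the last point (not part of the cited work): the textbook subset construction applied to the classic NFA for "the n-th symbol from the end is a" over {a, b}. The NFA has n + 1 states, but every one of the 2^n subsets that contain state 0 becomes a reachable DFA state. The state numbering and helper names are assumptions made for this sketch.

```python
from collections import deque


def nfa_step(states, symbol, n):
    """One NFA move for 'the n-th symbol from the end is a' over {a, b}.

    State 0 loops on both symbols and, on 'a', also guesses by moving to 1;
    state i (1 <= i < n) advances to i + 1 on any symbol; state n accepts.
    """
    nxt = set()
    for q in states:
        if q == 0:
            nxt.add(0)
            if symbol == "a":
                nxt.add(1)
        elif q < n:
            nxt.add(q + 1)
    return frozenset(nxt)


def count_dfa_states(n):
    """Subset construction: count the DFA states reachable from {0}."""
    start = frozenset({0})
    seen = {start}
    work = deque([start])
    while work:
        current = work.popleft()
        for symbol in "ab":
            target = nfa_step(current, symbol, n)
            if target not in seen:
                seen.add(target)
                work.append(target)
    return len(seen)


for n in range(1, 9):
    print(f"n={n}: NFA states={n + 1}, DFA states={count_dfa_states(n)}")
```

The DFA column doubles with each step of n, which is the exponential space growth the slide warns about.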

Slide 9: Recent Work (cont'd)
van Engelen: “Constructing finite automata for high-performance web services”
- Integrates parsing and validation into one stage through parsing actions encoded in a DFA
- Cannot process cyclic XML schemas
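One way to see why a purely DFA-based approach struggles with cyclic schemas: if a hypothetical element `<section>` may contain another `<section>`, the nesting depth is unbounded, while a finite automaton can remember depth only up to a fixed bound. The sketch contrasts an automaton whose states are the depths 0..max_depth with a stack-based check; it illustrates the limitation and is not the cited construction.

```python
def dfa_accepts(tokens, max_depth):
    """Finite-state check: the only states are depths 0..max_depth."""
    depth = 0
    for tok in tokens:
        if tok == "<section>":
            if depth == max_depth:
                return False  # no state left to remember deeper nesting
            depth += 1
        elif tok == "</section>":
            if depth == 0:
                return False
            depth -= 1
        else:
            return False
    return depth == 0


def stack_accepts(tokens):
    """Pushdown-style check: the stack grows with the nesting depth."""
    stack = []
    for tok in tokens:
        if tok == "<section>":
            stack.append(tok)
        elif tok == "</section>" and stack:
            stack.pop()
        else:
            return False
    return not stack


def nested(depth):
    return ["<section>"] * depth + ["</section>"] * depth


print(dfa_accepts(nested(3), max_depth=4), stack_accepts(nested(3)))  # True True
print(dfa_accepts(nested(5), max_depth=4), stack_accepts(nested(5)))  # False True
```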

Slide 10: Recent Work (cont'd)
van Engelen: “The gSOAP toolkit for web services and peer-to-peer Computing Networks”
- Namespace support
- Merges parsing and validation
- Implements recursive-descent parsing
- Disadvantages of recursive descent: code size and function-call overhead
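For contrast with the table-driven approach, here is a minimal recursive-descent sketch for the grammar of the illustrating example (slide 13): one function per nonterminal, so code size grows with the schema and each element costs a function call. The tag names `<book>`, `<title>`, `<author>` and the token format are hypothetical, and this is not gSOAP's generated code.

```python
class ParseError(Exception):
    pass


class RecursiveDescentParser:
    """One method per nonterminal of s -> <book> t </book>, t -> t1 t2."""

    def __init__(self, tokens):
        self.tokens = tokens  # list of (terminal, value) pairs
        self.pos = 0
        self.values = []      # filled by the imp_s-style action

    def expect(self, kind):
        if self.pos >= len(self.tokens) or self.tokens[self.pos][0] != kind:
            raise ParseError(f"expected {kind} at token {self.pos}")
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def imp_s(self, text):
        """Hypothetical stand-in for the semantic action: store the string."""
        self.values.append(text)

    def parse_s(self):
        self.expect("<book>")
        self.parse_t()
        self.expect("</book>")
        if self.pos != len(self.tokens):
            raise ParseError("trailing input")

    def parse_t(self):
        self.parse_t1()
        self.parse_t2()

    def parse_t1(self):
        self.expect("<title>")
        self.imp_s(self.expect("DATA")[1])
        self.expect("</title>")

    def parse_t2(self):
        self.expect("<author>")
        self.imp_s(self.expect("DATA")[1])
        self.expect("</author>")


tokens = [("<book>", None), ("<title>", None), ("DATA", "XML Tech"),
          ("</title>", None), ("<author>", None), ("DATA", "Bob"),
          ("</author>", None), ("</book>", None)]
parser = RecursiveDescentParser(tokens)
parser.parse_s()
print(parser.values)  # ['XML Tech', 'Bob']
```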

Slide 12: Table-Driven XML Parsing (TDX)
- An LL(1) grammar can be derived from the schema
- XML documents can be parsed and validated using the LL(1) grammar
- Well-formedness (parsing) is verified through the grammar rules
- Validation is accomplished with semantic actions
- Application-specific events can also be encoded as semantic actions

Slide 13: Illustrating Example
LL(1) grammar:
s → ‘…’ t ‘…’
t → t1 t2
t1 → ‘…’ DATA //imp_s(s.val) ‘…’
t2 → ‘…’ DATA //imp_s(s.val) ‘…’

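Slide 14: Illustrating Example (cont'd)
(a) An XML instance whose character data are "XML Tech" and "Bob".
(b) Predictive parsing: s expands to t, and t to t1 t2; the DATA token under t1 triggers imp_s(“XML Tech”) and the DATA token under t2 triggers imp_s(“Bob”).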

Slide 15: Roadmap
- Recent Work
- Table-Driven XML Parsing – TDX
  - Illustrating example
  - Architecture
  - Token generation
  - Mapping schema to LL(1)
  - Parsing table
  - Parsing engine
  - Scanner/tokenizer
- TDX Construction Toolkit
- Experiment Results and Preliminary Conclusion

Slide 16: TDX Architecture
[Figure: components are the scanner/tokenizer (DFA), the parsing engine (TDX), the LL(1) parsing table, the LL(1) grammar productions and actions, and the application modules; the data exchanged are tokens, CDATA, events, and an error signal for invalid input]

Slide 18: Token Generation
Tokens are defined by:
- Element names (opening and closing tags)
- Attribute names
- Some data types, such as enumerations
Namespace binding: identical tag names under different namespaces are represented as different tokens.
Tokens are normalized.
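A minimal sketch of these token rules, assuming a hypothetical, simplified schema description (namespace-qualified element and attribute names plus enumeration literals), not the TDX input format. Tokens are keyed by namespace URI rather than by prefix, so `title` in two namespaces yields two different tokens.

```python
from itertools import count


def build_token_table(elements, attributes, enumerations):
    """Assign an integer token id to every schema-derived lexeme.

    elements / attributes: iterables of (namespace_uri, local_name)
    enumerations: iterable of enumeration literal strings
    """
    ids = count(1)
    table = {}
    for uri, name in elements:
        table[("open", uri, name)] = next(ids)   # opening tag token
        table[("close", uri, name)] = next(ids)  # closing tag token
    for uri, name in attributes:
        table[("attr", uri, name)] = next(ids)
    for literal in enumerations:
        table[("enum", None, literal)] = next(ids)
    return table


X, Y = "http://www.x.org", "http://www.y.org"
table = build_token_table(
    elements=[(X, "book"), (X, "title"), (X, "author"), (Y, "title")],
    attributes=[(None, "lang")],  # hypothetical attribute
    enumerations=["OFF", "ON"],
)

# Keys hold namespace URIs, not prefixes: a scanner resolves x:/y: to their
# URIs first, so x:title and y:title end up as distinct tokens.
print(table[("open", X, "title")], table[("open", Y, "title")])
```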

Slide 20: Mapping Schema to LL(1) Grammar
- Structural constraints are mapped to grammar rules
- Validation constraints are mapped to semantic actions
- Note that many types of validation constraints, such as occurrence and enumeration, are also mapped to rules

Slide 21: Mapping Example (1)
state → “OFF” | “ON”
value → DATA //imp_i(char *s)
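The two rules can be sketched in Python as follows. The token names `T_OFF`/`T_ON` and the helper functions are hypothetical, and `imp_i` is assumed, from its name, to convert the character data to an integer. The enumeration is enforced by the grammar, since only "OFF" or "ON" can start a `state`; the integer check lives in the semantic action attached to DATA.

```python
STATE_TOKENS = {"OFF": "T_OFF", "ON": "T_ON"}  # enumeration literals -> terminals


class ValidationError(Exception):
    pass


def parse_state(lexeme):
    """state -> 'OFF' | 'ON': any other lexeme has no production."""
    if lexeme not in STATE_TOKENS:
        raise ValidationError(f"no production for state on {lexeme!r}")
    return STATE_TOKENS[lexeme]


def imp_i(s):
    """Assumed semantics of the action: convert DATA to an integer."""
    try:
        return int(s.strip())
    except ValueError:
        raise ValidationError(f"not an integer: {s!r}") from None


def parse_value(data):
    """value -> DATA //imp_i(s): the grammar accepts any DATA; the action validates."""
    return imp_i(data)


print(parse_state("ON"), parse_value(" 42 "))  # T_ON 42
```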

Slide 22: Mapping Example (2)
<element name="id" type="id_type" minOccurs="0"/>
<element name="value" type="value_type" minOccurs="2" maxOccurs="unbounded"/>

example → c1 c2 (sequence) or example → c1 | c2 (choice)
c1 → ‘<id>’ id_type ‘</id>’ | ε
c2 → c2' c2' c2''
c2' → ‘<value>’ value_type ‘</value>’
c2'' → c2' c2'' | ε
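The occurrence mapping generalizes: minOccurs required copies, then either a right-recursive tail for maxOccurs="unbounded" or a chain of optional copies for a finite maxOccurs. A minimal sketch under those assumptions; the helper and the generated names (`c2''`, `x_opt1`, and so on) are illustrative, and `c1'`/`c2'` stand for the tag-wrapped productions of `id` and `value`.

```python
EPS = "ε"


def occurrence_rules(nt, item, min_occurs, max_occurs):
    """Productions (lhs, rhs) deriving `item` min_occurs..max_occurs times.

    max_occurs=None stands for maxOccurs="unbounded". Nonterminal names built
    here (nt + "''", nt + "_optK") are illustrative, not TDX's naming.
    """
    if max_occurs is not None and max_occurs < min_occurs:
        raise ValueError("maxOccurs must be >= minOccurs")
    if max_occurs is None:
        # required copies, then a right-recursive tail: tail -> item tail | EPS
        tail = nt + "''"
        return [(nt, [item] * min_occurs + [tail]),
                (tail, [item, tail]),
                (tail, [EPS])]
    optional = max_occurs - min_occurs
    # chain of optional copies: opt_k -> item opt_{k+1} | EPS
    chain = [f"{nt}_opt{k + 1}" for k in range(optional)]
    rules = []
    if min_occurs > 0 or optional == 0:
        rules.append((nt, ([item] * min_occurs + chain[:1]) or [EPS]))
    else:
        chain[0] = nt  # nothing required: nt itself starts the optional chain
    for k, lhs in enumerate(chain):
        rules.append((lhs, [item] + chain[k + 1:k + 2]))
        rules.append((lhs, [EPS]))
    return rules


# id: minOccurs="0" (maxOccurs defaults to 1); value: minOccurs="2", unbounded.
for lhs, rhs in occurrence_rules("c1", "c1'", 0, 1) + occurrence_rules("c2", "c2'", 2, None):
    print(lhs, "->", " ".join(rhs))
```

The ε alternatives keep the grammar LL(1) only when the element that follows in the content model starts with a different tag.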

Slide 24: LL(1) Parsing Table
- Constructed from the LL(1) grammar
- Indexed by nonterminals and terminals
- Each entry contains either the index of a grammar production or an error entry
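A sketch of the standard FIRST/FOLLOW construction that produces such a table, applied to the sequence form of mapping example (2). It assumes the simple types id_type and value_type are each scanned as one DATA token and writes the element tags as terminals; missing keys play the role of error entries. This is the textbook algorithm, not necessarily the construction inside the TDX toolkit.

```python
EPS = "ε"
END = "$"

# Production list; the list index is what a table entry stores.
PRODUCTIONS = [
    ("example", ["c1", "c2"]),
    ("c1", ["<id>", "DATA", "</id>"]),
    ("c1", [EPS]),
    ("c2", ["c2'", "c2'", "c2''"]),
    ("c2'", ["<value>", "DATA", "</value>"]),
    ("c2''", ["c2'", "c2''"]),
    ("c2''", [EPS]),
]
NONTERMINALS = {lhs for lhs, _ in PRODUCTIONS}


def first_of_seq(seq, first):
    """FIRST of a symbol sequence, including EPS if it can derive empty."""
    out = set()
    for sym in seq:
        if sym == EPS:
            continue
        f = first[sym] if sym in NONTERMINALS else {sym}
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)
    return out


def build_table(start):
    first = {nt: set() for nt in NONTERMINALS}
    follow = {nt: set() for nt in NONTERMINALS}
    follow[start].add(END)
    changed = True
    while changed:  # grow FIRST and FOLLOW until a fixed point
        changed = False
        for lhs, rhs in PRODUCTIONS:
            f = first_of_seq(rhs, first)
            if not f <= first[lhs]:
                first[lhs] |= f
                changed = True
            for i, sym in enumerate(rhs):
                if sym not in NONTERMINALS:
                    continue
                rest = first_of_seq(rhs[i + 1:], first)
                new = (rest - {EPS}) | (follow[lhs] if EPS in rest else set())
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True
    table = {}
    for index, (lhs, rhs) in enumerate(PRODUCTIONS):
        f = first_of_seq(rhs, first)
        lookaheads = (f - {EPS}) | (follow[lhs] if EPS in f else set())
        for term in lookaheads:
            if (lhs, term) in table:
                raise ValueError(f"not LL(1): conflict at ({lhs}, {term})")
            table[(lhs, term)] = index
    return table  # absent (nonterminal, terminal) keys are error entries


table = build_table("example")
for (nt, term), index in sorted(table.items()):
    print(f"M[{nt}, {term}] = {index}")
```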

Slide 26: Parsing Engine
- Schema-independent
- Maintains the parsing table, production table, action table, and stack
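A minimal sketch of a schema-independent engine in the spirit of this slide: the driver loop knows nothing about the schema, and everything schema-specific lives in the parsing table, the production table, and the action table. It runs the illustrating example grammar with hypothetical tag names (`<book>`, `<title>`, `<author>`) and encodes actions as `@`-prefixed symbols inside productions; both are assumptions, not TDX's actual table layout.

```python
EPS = "ε"


class ParseError(Exception):
    pass


def ll1_drive(tokens, start, productions, table, actions):
    """Schema-independent LL(1) driver loop.

    tokens:      list of (terminal, value) pairs ending with ("$", None)
    productions: production table, a list of right-hand sides; symbols that
                 start with "@" name action-table entries (assumed encoding)
    table:       parsing table, dict (nonterminal, terminal) -> production index
    actions:     action table, dict "@name" -> callable(value)
    """
    stack = ["$", start]
    pos = 0
    last_value = None  # value carried by the most recently matched terminal
    while stack:
        top = stack.pop()
        terminal, value = tokens[pos]
        if top.startswith("@"):
            actions[top](last_value)  # run the semantic action
        elif (top, terminal) in table:
            rhs = productions[table[(top, terminal)]]
            stack.extend(sym for sym in reversed(rhs) if sym != EPS)
        elif top == terminal:
            last_value = value
            pos += 1
        else:
            raise ParseError(f"unexpected {terminal!r} while expecting {top!r}")


PRODUCTIONS = [
    ["<book>", "t", "</book>"],                   # 0: s
    ["t1", "t2"],                                 # 1: t
    ["<title>", "DATA", "@imp_s", "</title>"],    # 2: t1
    ["<author>", "DATA", "@imp_s", "</author>"],  # 3: t2
]
TABLE = {("s", "<book>"): 0, ("t", "<title>"): 1,
         ("t1", "<title>"): 2, ("t2", "<author>"): 3}
values = []
ACTIONS = {"@imp_s": values.append}  # stand-in for imp_s(s.val)

tokens = [("<book>", None), ("<title>", None), ("DATA", "XML Tech"),
          ("</title>", None), ("<author>", None), ("DATA", "Bob"),
          ("</author>", None), ("</book>", None), ("$", None)]
ll1_drive(tokens, "s", PRODUCTIONS, TABLE, ACTIONS)
print(values)  # ['XML Tech', 'Bob']
```

Swapping in tables generated for a different schema changes the language accepted without touching `ll1_drive`.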

Slide 28: Scanner/Tokenizer
- Constructed from the schema
- The schema provides the information for the DFA states: element name, whether the element has attributes, and attribute names
- The root element needs special care

Slide 29: Scanner/Tokenizer Example
<book xmlns:x ="http://www.x.org" xmlns:y ="http://www.y.org" targetnamespace ="http://www.x.org">
[Figure: element content "XML Bible", "Bob", and "professor", scanned as DATA]
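A minimal scanner sketch for this example. It uses Python's `re` module in place of a generated DFA, reads the `xmlns:` bindings from the root element, and maps namespace-qualified names to token ids. The child elements `x:title`, `x:author`, and `y:title` are hypothetical guesses at how "XML Bible", "Bob", and "professor" are tagged; unprefixed names are assumed to have no namespace.

```python
import re

# A start tag, an end tag, or a run of character data.
TOKEN_RE = re.compile(
    r"<(?P<close>/)?(?P<name>[\w.-]+(?::[\w.-]+)?)(?P<attrs>[^>]*)>|(?P<text>[^<]+)")
ATTR_RE = re.compile(r'([\w:.-]+)\s*=\s*"([^"]*)"')


def scan(xml, token_table):
    """Turn an XML string into (token, value) pairs.

    Sketch only: self-closing tags, comments, processing instructions, and
    CDATA sections are not handled, and xmlns declarations are read from the
    root element only (an assumed reading of "root element needs special care").
    """
    bindings = {}
    tokens = []
    for m in TOKEN_RE.finditer(xml):
        if m.group("text") is not None:
            text = m.group("text").strip()
            if text:
                tokens.append(("DATA", text))
            continue
        if not tokens:  # first tag seen: the root element
            for attr, val in ATTR_RE.findall(m.group("attrs")):
                if attr.startswith("xmlns:"):
                    bindings[attr[len("xmlns:"):]] = val
        prefix, _, local = m.group("name").rpartition(":")
        uri = bindings.get(prefix) if prefix else None  # unprefixed: no namespace
        key = ("close" if m.group("close") else "open", uri, local)
        if key not in token_table:
            raise ValueError(f"no token for {key}")
        tokens.append((token_table[key], None))
    return tokens


X, Y = "http://www.x.org", "http://www.y.org"
token_table = {}
for uri, local in [(None, "book"), (X, "title"), (X, "author"), (Y, "title")]:
    for kind in ("open", "close"):
        token_table[(kind, uri, local)] = len(token_table) + 1

doc = ('<book xmlns:x ="http://www.x.org" xmlns:y ="http://www.y.org" '
       'targetnamespace ="http://www.x.org">'
       '<x:title>XML Bible</x:title><x:author>Bob</x:author>'
       '<y:title>professor</y:title></book>')
print(scan(doc, token_table))
```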

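Slide 31: TDX Construction Toolkit
[Figure: toolkit pipeline; wsdl2TDX processes Service.wsdl and flex is applied to generate the scanner; files involved: Service_flex.l, Service_TDX.h, Service_TDX.c, tab.yy.c]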

Slide 33: Experiment Setup
- Compared with a DFA-based parser, gSOAP 2.7, eXpat 1.2, and Xerces 2.7.0
- Memory-resident XML messages
- Elapsed real time measured with gettimeofday()
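The measurement idea can be mimicked with a small harness like the one below. It is a hedged sketch, not the authors' benchmark: Python's `time.perf_counter()` stands in for the wall-clock timer used in the talk, Python's bundled Expat binding (`xml.parsers.expat`) is the only parser timed, and the message and repetition count are made up. Keeping the message in memory excludes I/O from the measurement.

```python
import time
import xml.parsers.expat


def time_expat(message, repetitions=1000):
    """Average elapsed wall-clock seconds to parse a memory-resident message.

    A fresh parser is created per run because an Expat parser cannot be
    reused after a final Parse() call; creation cost is therefore included.
    """
    start = time.perf_counter()
    for _ in range(repetitions):
        parser = xml.parsers.expat.ParserCreate()
        parser.Parse(message, True)
    return (time.perf_counter() - start) / repetitions


message = b"<book><title>XML Bible</title><author>Bob</author></book>"
print(f"{time_expat(message) * 1e6:.2f} microseconds per parse")
```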

Slide 34: Parsing Performance (1)

Slide 35: Parsing Performance (2)

Slide 36: Conclusion
- Enhanced parsing speed
- Flexible framework
  - Encodes value-based validation and application-specific events as semantic rules
  - Combines structural, syntactic, and semantic constraints in one pass
- High level of modularity

