TDX: a High-Performance Table-Driven XML Parser. Wei Zhang and Robert van Engelen, Department of Computer Science, Florida State University.

2 Outline: Motivation; Introduction; Recent Work; Table-Driven XML Parsing (TDX); TDX Construction Toolkit; Results and Preliminary Conclusion

3 Motivation: Enhance the performance of XML-based Web services; provide flexibility; offer a high level of modularity.

4 Roadmap: Motivation; Introduction; Recent Work; Table-Driven XML Parsing (TDX); TDX Construction Toolkit; Experiment Results and Preliminary Conclusion

5 Introduction: Validating XML parsing proceeds in three stages: well-formedness checking, validation, and data conversion, after which the result is handed to the XML application. Separating the stages introduces overhead and requires frequent access to the schema.

6 Introduction (cont'd): Schema-specific XML parsing (SSP) merges well-formedness checking and validation into one stage, so the schema no longer has to be accessed frequently. In implemented SSP parsers, data conversion remains a separate stage.

8 Recent Work: Chiu, “A compiler-based approach to schema-specific XML parsing”. Merges parsing and validation by constructing a PDA. No namespace support. Conversion from NFA to DFA may cause the space requirement to grow exponentially.

9 Recent Work (cont'd): van Engelen, “Constructing finite automata for high-performance web services”. Integrates parsing and validation into one stage through parsing actions encoded in a DFA. Cannot process cyclic XML schemas.

10 Recent Work (cont'd): van Engelen, “The gSOAP toolkit for web services and peer-to-peer computing networks”. Supports namespaces, merges parsing and validation, and implements recursive-descent parsing. Disadvantages of recursive descent: code size and function-call overhead.
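
To make the recursive-descent drawback concrete, here is a minimal C sketch (not gSOAP's generated code) of a recursive-descent parser for a hypothetical element `<pair>` with two string children `<a>` and `<b>`. The token codes, the `Parser` struct, and all function names are assumptions for illustration. Each nonterminal becomes a function, so code size grows with the schema and every nesting level costs a call.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical token codes for <pair><a>...</a><b>...</b></pair>. */
typedef enum {
    T_PAIR_OPEN, T_PAIR_CLOSE, T_A_OPEN, T_A_CLOSE,
    T_B_OPEN, T_B_CLOSE, T_DATA, T_EOF
} Tok;

typedef struct {
    const Tok *toks;     /* token stream from a scanner, ending in T_EOF */
    const char **text;   /* DATA text, parallel to toks (NULL elsewhere) */
    int pos;             /* index of the current lookahead token */
} Parser;

static Tok peek(const Parser *p) { return p->toks[p->pos]; }

/* Consume token t if it is the lookahead; report failure otherwise. */
static int expect(Parser *p, Tok t) {
    if (peek(p) != t) return 0;
    p->pos++;
    return 1;
}

/* Nonterminal for a string-valued child element: open DATA close. */
static int parse_string(Parser *p, Tok open, Tok close, char *out, size_t n) {
    if (!expect(p, open) || peek(p) != T_DATA) return 0;
    snprintf(out, n, "%s", p->text[p->pos]);   /* data conversion */
    p->pos++;
    return expect(p, close);
}

/* Nonterminal for <pair>: one call per child, so deeper or wider schemas
   mean more functions (code size) and more calls (overhead). */
int parse_pair(Parser *p, char *a, char *b, size_t n) {
    return expect(p, T_PAIR_OPEN)
        && parse_string(p, T_A_OPEN, T_A_CLOSE, a, n)
        && parse_string(p, T_B_OPEN, T_B_CLOSE, b, n)
        && expect(p, T_PAIR_CLOSE)
        && peek(p) == T_EOF;
}
```

Given the token stream for `<pair><a>XML Tech</a><b>Bob</b></pair>`, `parse_pair` fills `a` and `b` and returns 1.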

12 Table-Driven XML Parsing (TDX): An LL(1) grammar can be derived from a schema, and XML documents can be parsed and validated with that grammar. Well-formedness (parsing) is verified through the grammar rules, validation is accomplished through semantic actions, and application-specific events can also be encoded as semantic actions.

13 Illustrating Example. LL(1) grammar:
s → ‘ ’ t ‘ ’
t → t1 t2
t1 → ‘ ’ DATA //imp_s(s.val) ‘ ’
t2 → ‘ ’ DATA //imp_s(s.val) ‘ ’

14 Illustrating Example (cont'd): (a) An XML instance containing the text “XML Tech” and “Bob”. (b) Predictive parsing: s expands to t, and t to t1 t2; t1 matches DATA and calls imp_s(“XML Tech”), and t2 matches DATA and calls imp_s(“Bob”).

15 Roadmap: Recent Work; Table-Driven XML Parsing (TDX): illustrating example, architecture, token generation, mapping schema to LL(1), parsing table, parsing engine, scanner/tokenizer; TDX Construction Toolkit; Experiment Results and Preliminary Conclusion

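16 TDX Architecture: A scanner/tokenizer (a DFA) turns the XML input into tokens and CDATA. The parsing engine (TDX) consumes the tokens, guided by the LL(1) parsing table and the LL(1) grammar productions and actions, and either delivers events to the application modules or reports an error for invalid input.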

18 Token Generation: Tokens are defined by element names (opening and closing tags), attribute names, and values of some data types, such as enumerations. Namespace binding: identical tag names under different namespaces are represented as different tokens. Tokens are normalized.
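
A minimal sketch of namespace-aware token lookup, assuming the scanner has already resolved each prefix to its namespace URI. The table contents, the `urn:example:*` URIs, the token numbers, and `lookup_token` are hypothetical, and a linear search stands in for the generated DFA. Keying on the URI rather than the prefix is what lets identical tag names in different namespaces become different tokens, while `x:title` and `y:title` bound to the same URI normalize to one token.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOK_ERROR (-1)

typedef struct {
    const char *ns;     /* namespace URI (after prefix resolution) */
    const char *name;   /* local name */
    int open_tok;       /* token for the start tag */
    int close_tok;      /* token for the end tag */
} TokenDef;

/* Hypothetical generated table: "title" in two namespaces gets two token pairs. */
static const TokenDef token_defs[] = {
    { "urn:example:a", "title",  1, 2 },
    { "urn:example:b", "title",  3, 4 },
    { "urn:example:a", "author", 5, 6 },
};

/* Map a resolved (namespace URI, local name) pair to its token.
   Prefixes never reach this function, so x:title and y:title bound to the
   same URI normalize to the same token. Undeclared names yield TOK_ERROR. */
int lookup_token(const char *ns, const char *name, int is_close) {
    size_t n = sizeof token_defs / sizeof token_defs[0];
    for (size_t i = 0; i < n; i++)
        if (strcmp(token_defs[i].ns, ns) == 0 && strcmp(token_defs[i].name, name) == 0)
            return is_close ? token_defs[i].close_tok : token_defs[i].open_tok;
    return TOK_ERROR;
}
```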

20 Mapping Schema to LL(1) Grammar: Structural constraints are mapped to grammar rules and validation constraints to semantic actions. Note that many types of validation constraints, such as occurrence constraints and enumerations, are mapped to rules as well.

21 Mapping Example (1):
state → “OFF” | “ON”
value → DATA //imp_i(char *s)
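
The two rules can be sketched in C as follows. Because enumeration values are tokenized (slide 18), the scanner side of `state → “OFF” | “ON”` only has to map text to one of two tokens, and a semantic action then converts the DATA of `value`. The token codes and the two-argument `imp_i(const char *s, int *out)` signature are assumptions; the slide shows only `imp_i(char *s)`.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical token codes for the enumeration values of `state`. */
enum { TOK_INVALID = -1, TOK_OFF = 10, TOK_ON = 11 };

/* Scanner side of state -> "OFF" | "ON": each value is its own token,
   so any other text is rejected before the grammar is consulted. */
int match_state(const char *s) {
    if (strcmp(s, "OFF") == 0) return TOK_OFF;
    if (strcmp(s, "ON") == 0) return TOK_ON;
    return TOK_INVALID;
}

/* Semantic action for value -> DATA: convert the DATA text to an int.
   Returns 1 on success, 0 for empty text, trailing characters, or overflow. */
int imp_i(const char *s, int *out) {
    char *end;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (end == s || *end != '\0' || errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return 0;
    *out = (int)v;
    return 1;
}
```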

22 Mapping Example (2): Schema fragment:
<element name="id" type="id_type" minOccurs="0"/>
<element name="value" type="value_type" minOccurs="2" maxOccurs="unbounded"/>
Grammar: example → c1 c2 (sequence) or example → c1 | c2 (choice), where
c1 → ‘<id>’ id_type ‘</id>’ | ε
c2 → c’2 c’2 c’’2
c’2 → ‘<value>’ value_type ‘</value>’
c’’2 → c’2 c’’2 | ε
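
The occurrence mapping in this example follows a mechanical pattern. The C sketch below emits productions in the same shape as the slide for a single element declaration. `emit_occurs` and its parameters are hypothetical, and it handles only `maxOccurs` of 1 or `unbounded` (passed as -1), since a finite bound above 1 would need nested optional rules that the slide does not show.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

typedef struct { char *buf; size_t cap, len; int ok; } Out;

/* Append formatted text; once something does not fit, mark the output failed. */
static void put(Out *o, const char *fmt, ...) {
    if (!o->ok) return;
    va_list ap;
    va_start(ap, fmt);
    int k = vsnprintf(o->buf + o->len, o->cap - o->len, fmt, ap);
    va_end(ap);
    if (k < 0 || (size_t)k >= o->cap - o->len) { o->ok = 0; return; }
    o->len += (size_t)k;
}

/* Emit LL(1) productions for <element name=NAME type=TYPE minOccurs=MIN
   maxOccurs=MAX/> using nonterminal NT. Returns 1 on success, 0 if the
   bounds are unsupported or the buffer is too small. */
int emit_occurs(char *buf, size_t cap, const char *nt, const char *name,
                const char *type, int min, int max) {
    Out o = { buf, cap, 0, 1 };
    if (cap == 0 || min < 0 || (max != -1 && !(max == 1 && min <= 1)))
        return 0;
    buf[0] = '\0';
    if (max == 1) {
        /* 0..1 or exactly 1: one rule, with an ε alternative when optional */
        put(&o, "%s -> '<%s>' %s '</%s>'%s\n", nt, name, type, name,
            min == 0 ? " | ε" : "");
        return o.ok;
    }
    /* MIN..unbounded: MIN mandatory copies of NT' followed by a recursive tail NT'' */
    put(&o, "%s ->", nt);
    for (int i = 0; i < min; i++) put(&o, " %s'", nt);
    put(&o, " %s''\n", nt);
    put(&o, "%s' -> '<%s>' %s '</%s>'\n", nt, name, type, name);
    put(&o, "%s'' -> %s' %s'' | ε\n", nt, nt, nt);
    return o.ok;
}
```

For example, `emit_occurs(buf, sizeof buf, "c2", "value", "value_type", 2, -1)` yields the three `c2` rules of the slide, written with `->` for → and `c2'` for c’2.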

24 LL(1) Parsing Table: Constructed from the LL(1) grammar and indexed by nonterminals and terminals; each entry contains either the index of a grammar production or an error entry.
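
As an illustration, the following C sketch hard-codes an LL(1) parsing table for the sequence form of Mapping Example (2), under two simplifying assumptions: the content of `id_type` and `value_type` is a single DATA token, and END stands for whatever token follows the content (for instance the parent's closing tag). The entries follow from FIRST/FOLLOW sets: `c1` predicts ε on `<value>` because FOLLOW(c1) = FIRST(c2), and `c''2` predicts ε only on END. All names are hypothetical.

```c
#include <assert.h>

/* Hypothetical terminal and nonterminal codes for Mapping Example (2). */
enum { ID_OPEN, ID_CLOSE, VALUE_OPEN, VALUE_CLOSE, DATA, END, N_TERMINALS };
enum { EXAMPLE, C1, C2, C2P, C2PP, N_NONTERMINALS };

/* Production indices:
   0 example -> c1 c2        1 c1 -> '<id>' DATA '</id>'   2 c1 -> ε
   3 c2 -> c2' c2' c2''      4 c2' -> '<value>' DATA '</value>'
   5 c2'' -> c2' c2''        6 c2'' -> ε
   Entries of -1 are error entries. */
static const signed char parse_table[N_NONTERMINALS][N_TERMINALS] = {
    /*            ID_OPEN ID_CLOSE VALUE_OPEN VALUE_CLOSE DATA END */
    /* example */ {  0,     -1,       0,        -1,       -1,  -1 },
    /* c1      */ {  1,     -1,       2,        -1,       -1,  -1 },
    /* c2      */ { -1,     -1,       3,        -1,       -1,  -1 },
    /* c2'     */ { -1,     -1,       4,        -1,       -1,  -1 },
    /* c2''    */ { -1,     -1,       5,        -1,       -1,   6 },
};

/* Return the production to expand for (nonterminal, lookahead), or -1. */
int predict(int nonterminal, int lookahead) {
    if (nonterminal < 0 || nonterminal >= N_NONTERMINALS ||
        lookahead < 0 || lookahead >= N_TERMINALS)
        return -1;
    return parse_table[nonterminal][lookahead];
}
```

Note how the occurrence constraint is enforced purely by the table: `c2` has no entry for END, so content with no `<value>` element at all hits an error entry.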

26 Parsing Engine: Schema-independent; it maintains a parsing table, a production table, an action table, and a stack.
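
A minimal C sketch of such a schema-independent driver, applied to the grammar of the illustrating example (slide 13). Everything schema-specific lives in the three tables; the loop only pops the stack, expands nonterminals through the parsing table, matches terminals, and runs action markers. The token codes `S_OPEN`, `T1_OPEN`, and so on are placeholders for the example's start and end tags, and the `App` struct and the -1-terminated production encoding are assumptions, not the authors' data layout.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Terminals (placeholder token codes), nonterminals, and action markers
   share one stack; the numeric ranges tell them apart. */
enum { S_OPEN, S_CLOSE, T1_OPEN, T1_CLOSE, T2_OPEN, T2_CLOSE, DATA, EOI, N_TERM };
enum { NT_S = 100, NT_T, NT_T1, NT_T2 };
enum { ACT_IMP_S = 200 };
#define NT_COUNT 4

/* Production table: right-hand sides, -1 terminated. The action marker sits
   right after DATA, so it runs once the DATA token has been matched. */
static const int productions[][6] = {
    { S_OPEN, NT_T, S_CLOSE, -1 },                /* 0: s  -> '<s>' t '</s>'   */
    { NT_T1, NT_T2, -1 },                         /* 1: t  -> t1 t2            */
    { T1_OPEN, DATA, ACT_IMP_S, T1_CLOSE, -1 },   /* 2: t1 -> '<t1>' DATA '</t1>' */
    { T2_OPEN, DATA, ACT_IMP_S, T2_CLOSE, -1 },   /* 3: t2 -> '<t2>' DATA '</t2>' */
};

/* Parsing table: [nonterminal][lookahead] -> production index, -1 = error. */
static const signed char parse_table[NT_COUNT][N_TERM] = {
    /* s  */ {  0, -1, -1, -1, -1, -1, -1, -1 },
    /* t  */ { -1, -1,  1, -1, -1, -1, -1, -1 },
    /* t1 */ { -1, -1,  2, -1, -1, -1, -1, -1 },
    /* t2 */ { -1, -1, -1, -1,  3, -1, -1, -1 },
};

/* Hypothetical application state filled in by the semantic action. */
typedef struct { char vals[2][32]; int n; } App;

static void imp_s(App *app, const char *text) {
    if (text && app->n < 2)
        snprintf(app->vals[app->n++], sizeof app->vals[0], "%s", text);
}
typedef void (*Action)(App *, const char *);
static const Action actions[] = { imp_s };   /* indexed by marker - ACT_IMP_S */

/* Table-driven predictive parse; returns 1 if valid, 0 on the first error. */
int tdx_parse(const int *toks, const char **text, App *app) {
    int stack[64], sp = 0, pos = 0;
    const char *last_data = NULL;
    stack[sp++] = EOI;
    stack[sp++] = NT_S;
    while (sp > 0) {
        int top = stack[--sp], look = toks[pos];
        if (look < 0 || look >= N_TERM) return 0;
        if (top >= ACT_IMP_S) {                  /* semantic action marker */
            actions[top - ACT_IMP_S](app, last_data);
        } else if (top >= NT_S) {                /* nonterminal: consult table */
            int p = parse_table[top - NT_S][look];
            if (p < 0) return 0;                 /* error entry */
            int len = 0;
            while (productions[p][len] != -1) len++;
            for (int i = len - 1; i >= 0; i--) { /* push RHS in reverse */
                assert(sp < 64);
                stack[sp++] = productions[p][i];
            }
        } else {                                 /* terminal: must match */
            if (top != look) return 0;
            if (look == DATA) last_data = text[pos];
            pos++;
        }
    }
    return 1;
}
```

Fed the token stream for the instance on slide 14, `tdx_parse` returns 1 after `imp_s` has stored “XML Tech” and “Bob”; a stream whose first child is a t2 element hits an error entry and returns 0.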

28 Scanner/Tokenizer: Constructed from the schema, which supplies the information for the DFA states: the element name, whether the element has attributes, and the attribute names. The root element needs special care.

29 Scanner/Tokenizer example <book xmlns:x =" xmlns:y =" targetnamespace =" XML Bible Bob professor DATA

30 Roadmap: Motivation; Introduction; Recent Work; Table-Driven XML Parsing (TDX); TDX Construction Toolkit; Experiment Results and Preliminary Conclusion

31 TDX Construction Toolkit: wsdl2TDX reads Service.wsdl and generates Service_flex.l, Service_TDX.h, and Service_TDX.c; flex then turns Service_flex.l into the scanner source tab.yy.c.

33 Experiment Setup: TDX was compared with a DFA-based parser, gSOAP 2.7, Expat 1.2, and Xerces. XML messages were memory-resident, and elapsed real time was measured with gettimeofday().
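
A minimal sketch of this kind of measurement in C, assuming a POSIX system. `time_parser` and the `parse` callback signature are hypothetical stand-ins for whichever parser is being compared; only `gettimeofday()` itself comes from the slide. Averaging over many runs of a memory-resident message keeps I/O out of the measurement and smooths the microsecond resolution of `struct timeval`.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Elapsed wall-clock seconds between two gettimeofday() samples. */
double elapsed_seconds(const struct timeval *start, const struct timeval *end) {
    return (double)(end->tv_sec - start->tv_sec)
         + (double)(end->tv_usec - start->tv_usec) / 1e6;
}

/* Hypothetical harness: time `runs` (> 0) parses of a memory-resident
   message and return the mean seconds per parse. */
double time_parser(int (*parse)(const char *, size_t),
                   const char *msg, size_t len, int runs) {
    struct timeval t0, t1;
    gettimeofday(&t0, NULL);
    for (int i = 0; i < runs; i++)
        parse(msg, len);
    gettimeofday(&t1, NULL);
    return elapsed_seconds(&t0, &t1) / runs;
}
```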

34 Parsing Performance (1)

35 Parsing Performance (2)

36 Conclusion: TDX enhances parsing speed. It is a flexible framework that encodes value-based validation and application-specific events as semantic rules, combining structural, syntactic, and semantic constraints in one pass, and it offers a high level of modularity.