9/25/08IEEE ICWS 2008 High-Performance XML Parsing and Validation with Permutation Phrase Grammar Parsers Wei Zhang & Robert van Engelen Department of.

Presentation transcript:

High-Performance XML Parsing and Validation with Permutation Phrase Grammar Parsers
Wei Zhang & Robert van Engelen
Department of Computer Science, Florida State University
IEEE ICWS 2008, 9/25/08

Presentation Overview
- Schema-specific Parsers
- Related Work
- PTDX: Table-Driven XML Parser with Permutation Phrase Grammar
- Performance
- Conclusion

Schema-specific Parsers
- Compile-time vs. run-time parsers
  - Compile-time parsing and validation approaches use specialized compilation techniques to generate customized parsers from schemas
  - Run-time approaches use generic drivers (or engines) and a grammar-like representation of schemas
- Blocking vs. non-blocking parsers
  - Blocking parsers may suspend the entire program until sufficient XML content has been received, e.g., recursion-based parsers
  - Non-blocking parsers do not suspend the program; buffered data can be supplied incrementally
- Time-efficient vs. space-efficient parsers
  - Time-efficient parsers encode many states
  - Space-efficient parsers rely on backtracking
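The blocking vs. non-blocking distinction can be illustrated with a small push-style scanner. This is a minimal sketch, not the PTDX implementation: the class name `PushScanner` and its `feed` method are hypothetical, and the scanner only splits input into tags and text. The point it shows is that when a chunk ends in the middle of a tag, `feed` returns what it has and keeps the remainder, rather than waiting for more input.

```python
class PushScanner:
    """Hypothetical non-blocking scanner: the caller pushes chunks as they arrive."""

    def __init__(self):
        self._pending = ""  # input received but not yet complete

    def feed(self, chunk):
        """Return the complete tags and text runs available so far."""
        self._pending += chunk
        pieces = []
        while self._pending:
            if self._pending[0] == "<":
                end = self._pending.find(">")
                if end < 0:
                    break  # incomplete tag: return control instead of waiting
                pieces.append(self._pending[:end + 1])
                self._pending = self._pending[end + 1:]
            else:
                start = self._pending.find("<")
                if start < 0:
                    break  # text may continue in the next chunk
                pieces.append(self._pending[:start])
                self._pending = self._pending[start:]
        return pieces


scanner = PushScanner()
for chunk in ["<a>he", "llo</", "a>"]:
    print(scanner.feed(chunk))
```

A blocking design would instead read from the input source inside the loop and stall the caller until the closing `>` arrives.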

Related Work
- [Van Engelen, 2001]: the earliest work on a schema-specific LL(1) recursive descent parser with namespace support and validation
- [Van Engelen, 2004]: two-level DFA integrating parsing and validation
- [Chiu et al., 2004]: nondeterministic generalized automata that merge all aspects of low-level parsing and validation
- [Reuter, 2003]: Cardinality-Constraint Automaton (CCA) for schema-aware validation

Related Work (Cont'd)
- [Kostoulas et al., 2006]: an efficient parser generator that translates an XML schema into a parser in either C or Java
- [Matsa, 2007]: schema-directed interpretive XML parser using special-purpose byte-codes
- [Zhang et al., 2006]: a table-driven approach that parses and validates in a single pass, with a generator that translates schemas into C

PTDX: Table-Driven XML Parser with Permutation Phrase
- Table-driven grammar-based parser
  - Extended LL(1) grammar with permutation phrase support
  - Parsing table is constructed from the extended LL(1) permutation grammar
- Run-time parser
  - Generic parsing engine (2-stack PDA)
- Both time and space efficient
  - Predictive parsing
  - Integrates parsing and validation into a single pass
  - No buffering
  - Operates on tokens
  - Main stack size grows with the depth of the XML data
  - Auxiliary stack size grows with the number of elements of
- Non-blocking parser

Constructing PTDX Tables
XML Schemas → Mapping Rules → Extended LL(1) Permutation Phrase Grammar → LL(1) Parsing Table, Token Table, Action Table
Note: Actions are generated from schemas to perform type-checking verification, although some validation constraints are incorporated in the grammar productions.

Mapping Rules
- Define the translation from schema components to LL(1) grammar productions
- Preserve structural constraints
- Map free-ordered schema components (, ) to permutation grammar

Mapping Example

Schema components:
  <element name="a" type="string" minOccurs="0"/>
  <element name="b" type="string"/>
  <element name="c" type="string"/>

Grammar productions:
  T → << A || B || C >>
  A → bA CD eA
  A → ε
  B → bB CD eB
  C → bC CD eC

Note: bA and eA represent the tokens for the start and end tags of element "a", respectively; CD represents the token for CDATA.
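The element-level part of this mapping can be sketched in a few lines. This is only an illustration under assumptions, not the PTDX generator: the function name `map_elements` is hypothetical, the declarations are wrapped in an `<all>` group purely so the snippet is well-formed XML, namespaces are omitted, and only string-typed elements with `minOccurs` of 0 or 1 are handled.

```python
import xml.etree.ElementTree as ET

# Assumption: an enclosing <all> group is added so the fragment parses as XML.
SCHEMA = """
<all>
  <element name="a" type="string" minOccurs="0"/>
  <element name="b" type="string"/>
  <element name="c" type="string"/>
</all>
"""


def map_elements(schema_text):
    """Emit one production per element, plus an epsilon rule if it is optional."""
    productions = []
    for decl in ET.fromstring(schema_text.strip()).findall("element"):
        nt = decl.get("name").upper()  # nonterminal named after the element
        productions.append(f"{nt} → b{nt} CD e{nt}")  # start tag, CDATA, end tag
        if decl.get("minOccurs", "1") == "0":
            productions.append(f"{nt} → ε")  # optional element may be absent
    return productions


for production in map_elements(SCHEMA):
    print(production)
```

The sketch reproduces the A, B, and C productions of the example; the top-level production for T, which combines them into a permutation phrase, is left out.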

Permutation Phrase
- A permutation phrase is a grammatical phrase that specifies a syntactic construct as any permutation of a set of constituent elements.
- E.g., the permutation phrase << a || b || c >> recognizes the language { abc, acb, bac, bca, cab, cba }
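To make the definition concrete, the language of a permutation phrase over single-token constituents can be enumerated directly. The helper name `permutation_language` is hypothetical; the sketch uses only the standard library.

```python
from itertools import permutations
from math import factorial


def permutation_language(constituents):
    """Return every string formed by concatenating the constituents in some order."""
    return sorted("".join(order) for order in permutations(constituents))


print(permutation_language(["a", "b", "c"]))
# → ['abc', 'acb', 'bac', 'bca', 'cab', 'cba']

# With n distinct constituents the language has n! strings, so listing every
# ordering as a separate grammar alternative grows factorially.
print(factorial(8))  # → 40320
```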

Two-stack PDA for Parsing Permutation Phrase
[Figure: steps 1-3 of the two-stack PDA, showing the main stack, the auxiliary stack, and the input a, b, c at each step]

Two-stack PDA for Parsing Permutation Phrase (Cont'd)
[Figure: steps 4-6 of the two-stack PDA, showing the main stack, the auxiliary stack, and the input a, b, c at each step]
Note: All optional constituent elements are left on the auxiliary stack once all non-empty elements have been parsed.
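The two-stack idea can be sketched as a small recognizer. This is a simplified illustration under assumptions, not the PTDX engine: the names `PRODUCTIONS`, `OPTIONAL`, and `parse_permutation` are hypothetical, the grammar is the A/B/C mapping example with "a" optional, and prediction uses only the first token of each production instead of a full LL(1) table. The auxiliary stack holds the constituents not yet seen; the main stack holds the symbols of the constituent currently being parsed.

```python
# Hypothetical grammar: each constituent is start tag, CDATA, end tag.
PRODUCTIONS = {
    "A": ["bA", "CD", "eA"],
    "B": ["bB", "CD", "eB"],
    "C": ["bC", "CD", "eC"],
}
OPTIONAL = {"A"}  # element "a" has minOccurs="0"


def parse_permutation(tokens, constituents=("A", "B", "C")):
    """Accept tokens that form the constituents in any order, each at most once."""
    main = []                 # main stack: symbols still to be matched
    aux = list(constituents)  # auxiliary stack: constituents not yet seen
    for tok in tokens:
        if not main:
            # Predict: move the unseen constituent that starts with this token.
            chosen = next((n for n in aux if PRODUCTIONS[n][0] == tok), None)
            if chosen is None:
                return False  # unexpected or repeated element
            aux.remove(chosen)
            main.extend(reversed(PRODUCTIONS[chosen]))
        if main.pop() != tok:
            return False      # token does not match the predicted symbol
    # Accept if nothing is half-parsed and only optional constituents remain.
    return not main and all(n in OPTIONAL for n in aux)


print(parse_permutation(["bC", "CD", "eC", "bB", "CD", "eB"]))  # "a" omitted
print(parse_permutation(["bB", "CD", "eB", "bB", "CD", "eB"]))  # "b" repeated
```

In this sketch the auxiliary stack never holds more than one entry per constituent, and an omitted optional constituent is simply left on it at the end, which mirrors the note above.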

PTDX Architecture
[Figure: PTDX architecture diagram, with a label reading "Hot-swappable"]

Schema-directed Scanner
- Optimized by schema
  - E.g., scanning for a specific tag name is more efficient than scanning a generic string and then comparing it
- Tokenizer
  - Breaks the XML message into a token stream
- Token
  - Defined by element names, attribute names, and enumeration values
  - Classified as starting tags and closing tags
  - Normalized namespace binding
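A rough sketch of a tokenizer driven by a schema-derived token table follows. The names `build_token_table` and `tokenize` are hypothetical, and the sketch ignores attributes, namespaces, entities, and enumeration values; it also uses a dictionary lookup where a real schema-directed scanner would match the expected tag names directly. It shows the output form: start tags, end tags, and character data become tokens such as bA, eA, and CD.

```python
import re

TAG_OR_TEXT = re.compile(r"<[^>]+>|[^<]+")  # one tag, or a run of text


def build_token_table(element_names):
    """Map each declared element's start and end tag to a token name."""
    table = {}
    for name in element_names:
        table[f"<{name}>"] = f"b{name.upper()}"
        table[f"</{name}>"] = f"e{name.upper()}"
    return table


def tokenize(message, table):
    """Break an XML message into tokens; reject tags the schema does not declare."""
    tokens = []
    for piece in TAG_OR_TEXT.findall(message):
        if piece.startswith("<"):
            if piece not in table:
                raise ValueError(f"tag not declared in schema: {piece}")
            tokens.append(table[piece])
        elif piece.strip():
            tokens.append("CD")  # non-blank character data
    return tokens


table = build_token_table(["a", "b", "c"])
print(tokenize("<b>x</b><a>y</a>", table))
```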

Experiment Settings
- Test environment
  - 3.0 GHz, 2GB RAM, Linux, GCC with option -O2
  - Memory-resident message
  - Randomly arranged free-ordered elements
- Compared with
  - Validating parsers: gSOAP 2.7, Xerces, PTDX, flex-based parser
  - Non-validating parsers: Expat, DFA-based parser

Test Cases

Performance: comparison of validating and non-validating parsers
[Figure: performance chart with a "Better performance" direction label]

Performance: effect of number of elements in of PTDX parser
[Figure: performance chart with a "Better performance" direction label]

Performance: runtime and compile-time memory usage comparison (32 elements)

Conclusion
- Free-ordered constraints can be parsed and validated efficiently using a 2-stack PDA
- The table-driven permutation phrase grammar parsing technique is time and space optimal
- The table-driven approach offers a flexible framework for dealing with schema evolution