A Compiler-Based Approach to Schema-Specific Parsing Kenneth Chiu Grid Computing Research Laboratory SUNY Binghamton Sponsored by NSF ANI-0330568.

Slides:



Advertisements
Similar presentations
Inside an XSLT Processor Michael Kay, ICL 19 May 2000.
Advertisements

1 CIS 461 Compiler Design and Construction Fall 2014 Instructor: Hugh McGuire slides derived from Tevfik Bultan, Keith Cooper, and Linda Torczon Lecture-Module.
Semantics Static semantics Dynamic semantics attribute grammars
COS 320 Compilers David Walker. Outline Last Week –Introduction to ML Today: –Lexical Analysis –Reading: Chapter 2 of Appel.
Compilation 2011 Static Analysis Johnni Winther Michael I. Schwartzbach Aarhus University.
CSI 3125, Preliminaries, page 1 Programming languages and the process of programming –Programming means more than coding. –Why study programming languages?
Program Representations. Representing programs Goals.
CS105 INTRODUCTION TO COMPUTER CONCEPTS INTRO TO PROGRAMMING Instructor: Cuong (Charlie) Pham.
Lecture 01 - Introduction Eran Yahav 1. 2 Who? Eran Yahav Taub 734 Tel: Monday 13:30-14:30
Greg MorrisettFall  Compilers.  (duh)  Translating one programming language into another.  Also interpreters.  Translating and running a language.
Representing programs Goals. Representing programs Primary goals –analysis is easy and effective just a few cases to handle directly link related things.
1 Intermediate representation Goals: –encode knowledge about the program –facilitate analysis –facilitate retargeting –facilitate optimization scanning.
ISBN Chapter 1 Preliminaries. Copyright © 2004 Pearson Addison-Wesley. All rights reserved.1-2 Chapter 1 Topics Motivation Programming Domains.
Data Flow Analysis Compiler Design October 5, 2004 These slides live on the Web. I obtained them from Jeff Foster and he said that he obtained.
BİL744 Derleyici Gerçekleştirimi (Compiler Design)1.
CS 153: Concepts of Compiler Design August 25 Class Meeting Department of Computer Science San Jose State University Fall 2014 Instructor: Ron Mak
Building An Interpreter After having done all of the analysis, it’s possible to run the program directly rather than compile it … and it may be worth it.
Introduction & Overview CS4533 from Cooper & Torczon.
1.3 Executing Programs. How is Computer Code Transformed into an Executable? Interpreters Compilers Hybrid systems.
1 Scanning Aaron Bloomfield CS 415 Fall Parsing & Scanning In real compilers the recognizer is split into two phases –Scanner: translate input.
Advances in Language Design
Efficient Evaluation of XQuery over Streaming Data Xiaogang Li Gagan Agrawal The Ohio State University.
COP4020 Programming Languages
Chapter 1. Introduction.
Chapter 1 Introduction Dr. Frank Lee. 1.1 Why Study Compiler? To write more efficient code in a high-level language To provide solid foundation in parsing.
Chapter 10: Compilers and Language Translation Invitation to Computer Science, Java Version, Third Edition.
DEPARTMENT OF COMPUTER SCIENCE & TECHNOLOGY FACULTY OF SCIENCE & TECHNOLOGY UNIVERSITY OF UWA WELLASSA 1 CST 221 OBJECT ORIENTED PROGRAMMING(OOP) ( 2 CREDITS.
Natural and programming languages v0.2 – initial draft, Pikaro Tarmo v0.3 – updated, Pikaro Tarmo.
Compiler course 1. Introduction. Outline Scope of the course Disciplines involved in it Abstract view for a compiler Front-end and back-end tasks Modules.
Copyright © 2006 Addison-Wesley. All rights reserved.1-1 ICS 410: Programming Languages.
RELATIONAL FAULT TOLERANT INTERFACE TO HETEROGENEOUS DISTRIBUTED DATABASES Prof. Osama Abulnaja Afraa Khalifah
1 COMP 3438 – Part II-Lecture 1: Overview of Compiler Design Dr. Zili Shao Department of Computing The Hong Kong Polytechnic Univ.
Copyright © 2007 Addison-Wesley. All rights reserved.1-1 Reasons for Studying Concepts of Programming Languages Increased ability to express ideas Improved.
CS 363 Comparative Programming Languages Semantics.
May 31, May 31, 2016May 31, 2016May 31, 2016 Azusa, CA Sheldon X. Liang Ph. D. Computer Science at Azusa Pacific University Azusa Pacific University,
Parse & Syntax Trees Syntax & Semantic Errors Mini-Lecture.
Lexical Analysis: Finite Automata CS 471 September 5, 2007.
Introduction to Compilers. Related Area Programming languages Machine architecture Language theory Algorithms Data structures Operating systems Software.
Parallel XML Parsing Using Meta-DFAs Yinfei Pan 1, Ying Zhang 1, Kenneth Chiu 1, Wei Lu 2 1 State University of New York (SUNY) Binghamton 2 Indiana University.
Programming Languages and Design Lecture 3 Semantic Specifications of Programming Languages Instructor: Li Ma Department of Computer Science Texas Southern.
Overview of Previous Lesson(s) Over View  A program must be translated into a form in which it can be executed by a computer.  The software systems.
1 Compiler Design (40-414)  Main Text Book: Compilers: Principles, Techniques & Tools, 2 nd ed., Aho, Lam, Sethi, and Ullman, 2007  Evaluation:  Midterm.
Joey Paquet, 2000, Lecture 2 Lexical Analysis.
1 Compiler Construction (CS-636) Muhammad Bilal Bashir UIIT, Rawalpindi.
Chapter# 6 Code generation.  The final phase in our compiler model is the code generator.  It takes as input the intermediate representation(IR) produced.
9/25/08IEEE ICWS 2008 High-Performance XML Parsing and Validation with Permutation Phrase Grammar Parsers Wei Zhang & Robert van Engelen Department of.
The Model of Compilation Natawut Nupairoj, Ph.D. Department of Computer Engineering Chulalongkorn University.
1 Compiler & its Phases Krishan Kumar Asstt. Prof. (CSE) BPRCE, Gohana.
TDX: a High-Performance Table-Driven XML Parser Wei Zhang Robert van Engelen Department of Computer Science Florida State University.
Compiler Construction CPCS302 Dr. Manal Abdulaziz.
CSC 4181 Compiler Construction
Compilers and Interpreters
CS 404Ahmed Ezzat 1 CS 404 Introduction to Compiler Design Lecture 1 Ahmed Ezzat.
Presented by : A best website designer company. Chapter 1 Introduction Prof Chung. 1.
Chapter 1: Preliminaries Lecture # 2. Chapter 1: Preliminaries Reasons for Studying Concepts of Programming Languages Programming Domains Language Evaluation.
CS416 Compiler Design1. 2 Course Information Instructor : Dr. Ilyas Cicekli –Office: EA504, –Phone: , – Course Web.
Welcome! Simone Campanoni
Compiler Design (40-414) Main Text Book:
Efficient Evaluation of XQuery over Streaming Data
PROGRAMMING LANGUAGES
Lecture 2 Lexical Analysis Joey Paquet, 2000, 2002, 2012.
CS416 Compiler Design lec00-outline September 19, 2018
Introduction CI612 Compiler Design CI612 Compiler Design.
CSE401 Introduction to Compiler Construction
Data Flow Analysis Compiler Design
CS416 Compiler Design lec00-outline February 23, 2019
Course Overview PART I: overview material PART II: inside a compiler
Lec00-outline May 18, 2019 Compiler Design CS416 Compiler Design.
Reasons To Study Programming Languages
1.3.7 High- and low-level languages and their translators
Presentation transcript:

A Compiler-Based Approach to Schema-Specific Parsing Kenneth Chiu Grid Computing Research Laboratory SUNY Binghamton Sponsored by NSF ANI

Motivation Schema provides additional information. Use it to speed up parsing. Generate code as efficient as hand-written. –From this: –Generate this: assure_3_chars_in_buf(); if (*c++ != ‘e’) goto error; if (*c++ != ‘l’) goto error; if (*c++ != ‘3’) goto error; if (++el3_count > 3) goto error2;

A Schema Compiler Sch 1IL 2 Machine Code Pass 1Pass 2Code Gen. Schema Data Structures Compile Engine Interpret

Prototype Architecture XML Schema Generalized Automata Small Memory C++ Front-Ends Back-Ends RELAX NG Fast C++ Java Bytecode

Generalized Automata A generalization of PDAs. –Each GA has a set of variables. Possibly unbounded in value. –Each transition is “guarded” by a predicate over the variables. –Each transition has a set of actions over the variables. Actions are executed when the transition is taken. Not a model for computation, since anything can happen in predicates and actions. –In theory can handle any kind of schema construct. Real question is whether it enables generation of optimized code for that construct.

Why Not CFGs? CFGs are very good for complex syntactic structures. –Very good at things like recreating an AST for an expression from a sequence of chars. XML structure is relatively simple. –Easy to recreate the tree structure from a sequence of chars. CFGs cannot model some things well, like occurrence constraints. Want something that permits a well-defined set of transforms, without being too restrictive.

Example 1 3 2

Predicates and Actions Predicates and actions are the instruction set of an abstract schema machine. –Transformed into executable code. –Definition not part of GA model. One set for all schema languages? –Regular tree language –Efficiency

Examples match ‘a’ –Current input character is ‘a’. occurrence ‘el3’ ‘<= 5’ –Element ‘el3’ has occurred no more 5 times. consume –Consume current input character. prefix_start –Beginning of namespace prefix. prefix_char ‘a’ –Encountered prefix character ‘a’. prefix_end –End of prefix.

Examples RELAX NG a b not char/ check_max ‘a’ 1 not char/ inc_count ‘b’ <

Content tag Type Content tag call return

NGA to DGA Easier to generate NGAs than DGAs. Conversion takes two steps. –Move compression Similar to epsilon closure. –Subset construction Each predicate has a readset. –Variables it reads to evaluate. Each action has a writeset. –Variables it changes.

Move Compression

Subset Construction ,

Performance Test Schema <element name="elem" type="elemType“ maxOccurs="N"/>

Results

Ratio to SSP

Conclusions Goal is to generate code as good as hand-written. Compile all the way down to low-level IL. Generalized automata seem to be an appropriate low-level IL. Preliminary results are encouraging, but not conclusive. Future work: –More schema features, namespaces. –Optimizations. Outlining, reverse partial evaluation Buffer precheck –Higher-level IL? Enables different optimizations? –Compiling to special architecture? –XSLT-like transforms? Given a transform that swaps two elements, can we generate code as efficient as can written by hand?