Bernd Fischer COMP2010: Compiler Engineering Abstract Syntax Trees.



Parse trees represent derivations.

Grammar:
    E  → F EC
    EC → + F EC | - F EC | ε
    F  → T FC
    FC → * T FC | / T FC | ε
    T  → ( E ) | ID | NUM

[Figure: parse tree for (a+1)*b under this grammar, with leaves ID(a), NUM(1), ID(b), the bracket tokens, and ε for each empty EC and FC expansion]

- each path corresponds to a possible call stack
- contains "punctuation" tokens: (, ), begin, ... ⇒ concrete syntax tree
- contains redundant non-terminal symbols ⇒ chain rules: E → F → T
⇒ too much detail!

[Figure: the desired tree for (a+1)*b: a * node whose children are a + node (with children ID(a) and NUM(1)) and ID(b)]

How do we get there?

Abstract syntax trees represent the essential structure of derivations.

Abstract syntax drops detail:
- punctuation tokens
- chain productions

Abstract syntax rules can be ambiguous:
- they only describe the structure of legal trees
- they are not meant for parsing
- they usually allow unparsing (text reconstruction)

⇒ the abstract syntax tree (AST) is a clean interface

Concrete syntax:
    E  → F EC
    EC → + F EC | - F EC | ε
    F  → T FC
    FC → * T FC | / T FC | ε
    T  → ( E ) | ID | NUM

Abstract syntax:
    E → E + E | E - E | E * E | E / E | ID | NUM

[Figure: AST for (a+1)*b: a * node whose children are a + node (with children ID(a) and NUM(1)) and ID(b)]
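As a rough illustration of the "clean interface" and "unparsing" points, the sketch below builds the AST for (a+1)*b by hand and reconstructs text from it. The class names (Expr, Id, Num, Binop, UnparseDemo) and the unparse method are assumptions made for this example and do not come from the slides. Because the AST carries no bracket tokens, this unparser simply parenthesizes every binary node.

```java
// Hypothetical node classes for the abstract grammar
// E -> E + E | E - E | E * E | E / E | ID | NUM
abstract class Expr {
    abstract String unparse();
}

final class Id extends Expr {
    final String name;
    Id(String name) { this.name = name; }
    @Override String unparse() { return name; }
}

final class Num extends Expr {
    final int val;
    Num(int val) { this.val = val; }
    @Override String unparse() { return Integer.toString(val); }
}

final class Binop extends Expr {
    final char op;
    final Expr left, right;
    Binop(char op, Expr left, Expr right) {
        this.op = op;
        this.left = left;
        this.right = right;
    }
    // Brackets are not stored in the tree, so they are re-inserted
    // around every binary node when text is reconstructed.
    @Override String unparse() {
        return "(" + left.unparse() + op + right.unparse() + ")";
    }
}

final class UnparseDemo {
    // The AST for (a+1)*b
    static Expr example() {
        return new Binop('*',
                new Binop('+', new Id("a"), new Num(1)),
                new Id("b"));
    }

    public static void main(String[] args) {
        System.out.println(example().unparse()); // prints ((a+1)*b)
    }
}
```

The reconstructed text differs from the input in its bracketing, but it parses back to the same tree.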

Manually building ASTs in Java

Design principle based on the abstract syntax grammar:
- one abstract class per non-terminal
- one concrete class per rule
  - one field per non-terminal on the rhs

    public abstract class Expr {}

    public class Num extends Expr {
        public int val;
        public Num(int v) { val = v; }
    }

    public class Sum extends Expr {
        public Expr left, right;
        public Sum(Expr l, Expr r) { left = l; right = r; }
    }

    public class Diff extends Expr { …

Alternatively, use a single class with an operator field:

    public class Binop extends Expr {
        public Expr left, right;
        public int op;
        public Binop(Expr l, Expr r, int o) { left = l; right = r; op = o; }
    }
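To make the one-class-per-rule design concrete, here is a minimal sketch of a client that walks such a tree. The evaluator (EvalDemo.eval) is an assumption added for illustration; the slides only define the node classes, and Diff is completed here by analogy with Sum.

```java
abstract class Expr {}

final class Num extends Expr {
    final int val;
    Num(int v) { val = v; }
}

final class Sum extends Expr {
    final Expr left, right;
    Sum(Expr l, Expr r) { left = l; right = r; }
}

// Assumed to mirror Sum; the slide elides the body of Diff.
final class Diff extends Expr {
    final Expr left, right;
    Diff(Expr l, Expr r) { left = l; right = r; }
}

final class EvalDemo {
    // One case per concrete class, i.e. one case per grammar rule.
    static int eval(Expr e) {
        if (e instanceof Num) {
            return ((Num) e).val;
        }
        if (e instanceof Sum) {
            Sum s = (Sum) e;
            return eval(s.left) + eval(s.right);
        }
        if (e instanceof Diff) {
            Diff d = (Diff) e;
            return eval(d.left) - eval(d.right);
        }
        throw new IllegalArgumentException("unknown node: " + e);
    }

    public static void main(String[] args) {
        // (3 - 2) + 1
        Expr e = new Sum(new Diff(new Num(3), new Num(2)), new Num(1));
        System.out.println(eval(e)); // prints 2
    }
}
```

With the Binop alternative, the Sum and Diff cases would collapse into a single case that switches on the op field.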

Manually building ASTs in Java (error reporting)

For error reporting, each node also records the source positions it spans:

    public abstract class Expr {
        public FilePos start, end;
    }

    public class Sum extends Expr {
        public Expr left, right;
        public Sum(Expr l, Expr r) {
            left = l; right = r;
            start = l.start; end = r.end;
        }
    }
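The position bookkeeping can be sketched as follows. The slides use FilePos without defining it, so the line/column version below is an assumption, as are the Num constructor that takes positions and the PosDemo helper.

```java
// Hypothetical position type: the slides do not show its definition.
final class FilePos {
    final int line, column;
    FilePos(int line, int column) {
        this.line = line;
        this.column = column;
    }
    @Override public String toString() { return line + ":" + column; }
}

abstract class Expr {
    FilePos start, end;
}

final class Num extends Expr {
    final int val;
    // Leaves get their positions from the scanner (passed in directly here).
    Num(int val, FilePos start, FilePos end) {
        this.val = val;
        this.start = start;
        this.end = end;
    }
}

final class Sum extends Expr {
    final Expr left, right;
    // Inner nodes span from the start of the left child
    // to the end of the right child.
    Sum(Expr l, Expr r) {
        left = l;
        right = r;
        start = l.start;
        end = r.end;
    }
}

final class PosDemo {
    static String describe(Expr e) {
        return "expression at " + e.start + "-" + e.end;
    }

    // "12 + 345" on line 1: 12 occupies columns 1..2, 345 columns 6..8
    static Expr example() {
        Expr l = new Num(12, new FilePos(1, 1), new FilePos(1, 2));
        Expr r = new Num(345, new FilePos(1, 6), new FilePos(1, 8));
        return new Sum(l, r);
    }

    public static void main(String[] args) {
        System.out.println(describe(example())); // prints expression at 1:1-1:8
    }
}
```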

Manually building ASTs in Java (II)

    /* T -> ( E ) | Num */
    public static Expr T() throws ParseException {
        Expr r;
        switch (token) {
            case '(':
                advance(); r = E(); eat(')');
                return r;
            case '0': case '1': case '2': case '3': case '4':
            case '5': case '6': case '7': case '8': case '9':
                return Num();
            default:
                throw new ParseException("in T");
        }
    }

Changes to the parsing functions:
- change the return value type from void
- add explicit returns
- add auxiliary variables for the results of recursive calls

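Problem: left-factorization moves left arguments upwards

Grammar:
    E  → T EC
    EC → + T EC | - T EC | ε
    T  → ( E ) | ID | NUM

[Figure: parse tree for 3+2-1: E has children T (NUM(3)) and EC; that EC expands to + T (NUM(2)) EC, the next EC to - T (NUM(1)) EC, and the last EC to ε. The left operand of each operator sits one level above the EC node that contains the operator.]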

Manually building ASTs in Java (III)

    /* E -> F EC */
    public static Expr E() throws ParseException {
        Expr left = F();
        return EC(left);
    }

    /* EC -> + F EC | - F EC | epsilon */
    public static Expr EC(Expr left) throws ParseException {
        Expr right;
        switch (token) {
            case ')': case '\n':
                return left;
            case '+':
                advance(); right = F();
                return EC(new Binop(left, right, PLUS));
            case '-':
                advance(); right = F();
                return EC(new Binop(left, right, MINUS));
            default: ...
        }
    }

- add the semantic value as an argument to the functions for left-factorized symbols
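Putting the three steps together, a self-contained version of such a parser might look like the sketch below. It makes several assumptions that are not in the slides: tokens are single characters, ID and NUM are one letter or digit, the operator is stored as a char instead of PLUS/MINUS constants, java.text.ParseException stands in for the slides' ParseException, and the epsilon case returns on any other token and leaves error detection to the caller.

```java
import java.text.ParseException;

abstract class Expr {}

final class Leaf extends Expr {
    final String text;
    Leaf(String text) { this.text = text; }
    @Override public String toString() { return text; }
}

final class Binop extends Expr {
    final Expr left, right;
    final char op;
    Binop(Expr left, Expr right, char op) {
        this.left = left;
        this.right = right;
        this.op = op;
    }
    @Override public String toString() {
        return "(" + left + " " + op + " " + right + ")";
    }
}

final class Parser {
    private final String input;
    private int pos = 0;

    Parser(String input) { this.input = input; }

    // Current token: one character, or '\n' at end of input.
    private char token() {
        return pos < input.length() ? input.charAt(pos) : '\n';
    }

    private void advance() { pos++; }

    private void eat(char expected) throws ParseException {
        if (token() != expected) {
            throw new ParseException("expected " + expected, pos);
        }
        advance();
    }

    static Expr parse(String s) throws ParseException {
        Parser p = new Parser(s);
        Expr e = p.E();
        if (p.pos < s.length()) {
            throw new ParseException("trailing input", p.pos);
        }
        return e;
    }

    /* E -> F EC */
    Expr E() throws ParseException { return EC(F()); }

    /* EC -> + F EC | - F EC | epsilon */
    Expr EC(Expr left) throws ParseException {
        char op = token();
        if (op == '+' || op == '-') {
            advance();
            Expr right = F();
            // The tree built so far becomes the left operand of the next step.
            return EC(new Binop(left, right, op));
        }
        return left; // epsilon
    }

    /* F -> T FC */
    Expr F() throws ParseException { return FC(T()); }

    /* FC -> * T FC | / T FC | epsilon */
    Expr FC(Expr left) throws ParseException {
        char op = token();
        if (op == '*' || op == '/') {
            advance();
            Expr right = T();
            return FC(new Binop(left, right, op));
        }
        return left; // epsilon
    }

    /* T -> ( E ) | ID | NUM */
    Expr T() throws ParseException {
        char c = token();
        if (c == '(') {
            advance();
            Expr r = E();
            eat(')');
            return r;
        }
        if (Character.isLetterOrDigit(c)) {
            advance();
            return new Leaf(String.valueOf(c));
        }
        throw new ParseException("in T", pos);
    }
}
```

With these definitions, Parser.parse("3-2+1") yields a tree that prints as ((3 - 2) + 1): threading the left operand through EC makes the operators associate to the left even though the grammar itself recurses on the right.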

ANTLR automates building ASTs.

Design principle: add tree building instructions to rules

    rule: rule-elems1 -> build-instr1
        | rule-elems2 -> build-instr2
        ...
        | rule-elemsn -> build-instrn
        ;

- build instructions are automatically executed when the rule is applied
- build instructions return an AST node or an AST node list
- use with options{output=AST;ASTLabelType=CommonTree;}

Basic AST building instructions

- reference: use the AST node from a parse element
      trm: '(' exp ')' -> exp;
  returns the exp AST, ignores the brackets
- named reference: resolve ambiguities
      add: l=exp '+' r=exp -> $l $r;
  returns a list with both exp ASTs
- node construction: build a tagged node
      ext: 'exit' exp -> ^('exit' exp);
  tag token 'exit', children: exp
      dcl: type ID -> ^(VARDCL ID type);
  virtual tag token VARDCL (must be defined in tokens), children: ID, type

Collecting and duplicating elements

- list elements can be collected into a single list:
      args: arg (',' arg)* -> arg+;
- individual elements can be copied into lists:
      dcl: type ID (',' ID)* -> ^(VARDCL type ID+);
  → one node: VARDCL with children type and [ID, ID, ID, ...]
  vs.
      dcl: type ID (',' ID)* -> ^(VARDCL type ID)+;
  → one node per identifier: VARDCL type ID, VARDCL type ID, ...
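The difference between the two dcl rewrites can be mimicked in plain Java. The sketch below does not use the ANTLR runtime; Node and DclDemo are hypothetical stand-ins whose only purpose is to show the shape of the result for a declaration such as int x, y, z.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical tagged tree node, printed in a LISP-like form.
final class Node {
    final String tag;
    final List<Node> children = new ArrayList<>();

    Node(String tag, Node... kids) {
        this.tag = tag;
        children.addAll(Arrays.asList(kids));
    }

    @Override public String toString() {
        if (children.isEmpty()) {
            return tag;
        }
        StringBuilder sb = new StringBuilder("(" + tag);
        for (Node c : children) {
            sb.append(' ').append(c);
        }
        return sb.append(')').toString();
    }
}

final class DclDemo {
    // Mirrors ^(VARDCL type ID+): one node, all identifiers as children.
    static Node collected(String type, List<String> ids) {
        Node n = new Node("VARDCL", new Node(type));
        for (String id : ids) {
            n.children.add(new Node(id));
        }
        return n;
    }

    // Mirrors ^(VARDCL type ID)+: one node per identifier,
    // with the type copied into each of them.
    static List<Node> duplicated(String type, List<String> ids) {
        List<Node> out = new ArrayList<>();
        for (String id : ids) {
            out.add(new Node("VARDCL", new Node(type), new Node(id)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("x", "y", "z");
        System.out.println(collected("int", ids));
        // prints (VARDCL int x y z)
        System.out.println(duplicated("int", ids));
        // prints [(VARDCL int x), (VARDCL int y), (VARDCL int z)]
    }
}
```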

Building alternative trees

- nodes can be null:
      init: exp? -> ^(INIT exp)?;
- nodes can be built for empty input:
      skip: -> ^SKIP;
- sub-trees can be added:
      for: 'for' '(' dcl? ';' c=exp? ';' i=exp? ')' stmts
           -> ^('for' dcl? ^(COND $c)? ^(ITER $i)? stmts);
- nodes can be built in rule alternatives:
      if: 'if' '(' expr ')' s1=stmt
          ( 'else' s2=stmt -> ^(IFELSE expr $s1 $s2)
          |                -> ^('if' expr $s1)
          );

[Figure: tree with root 'for' and children dcl, COND (child c), ITER (child i), stmts]

Updating trees

- nodes can be initialized in rule parts and updated:
      exp: (INT -> INT) ('+' i=INT -> ^('+' $exp $i))*;

Resulting trees:
    1:      INT(1)
    1+2:    ^('+' INT(1) INT(2))
    1+2+3:  ^('+' ^('+' INT(1) INT(2)) INT(3))
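The update pattern corresponds to a simple loop in which the tree built so far becomes the left child of each new node. The sketch below reproduces it in plain Java without the ANTLR runtime; Node and UpdateDemo are hypothetical names, and the integer array stands in for the INT tokens.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical tagged tree node, printed in a LISP-like form.
final class Node {
    final String tag;
    final List<Node> children = new ArrayList<>();

    Node(String tag, Node... kids) {
        this.tag = tag;
        children.addAll(Arrays.asList(kids));
    }

    @Override public String toString() {
        if (children.isEmpty()) {
            return tag;
        }
        StringBuilder sb = new StringBuilder("(" + tag);
        for (Node c : children) {
            sb.append(' ').append(c);
        }
        return sb.append(')').toString();
    }
}

final class UpdateDemo {
    // Expects at least one operand, like the rule's mandatory first INT.
    static Node build(int... ints) {
        // (INT -> INT): the result starts out as the first operand.
        Node exp = new Node(Integer.toString(ints[0]));
        for (int k = 1; k < ints.length; k++) {
            // ('+' i=INT -> ^('+' $exp $i)): the previous result
            // is wrapped as the left child of a new '+' node.
            exp = new Node("+", exp, new Node(Integer.toString(ints[k])));
        }
        return exp;
    }

    public static void main(String[] args) {
        System.out.println(build(1));       // prints 1
        System.out.println(build(1, 2));    // prints (+ 1 2)
        System.out.println(build(1, 2, 3)); // prints (+ (+ 1 2) 3)
    }
}
```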