Abstract Syntax Trees Compiler Baojian Hua

Presentation on theme: "Abstract Syntax Trees Compiler Baojian Hua"— Presentation transcript:

1 Abstract Syntax Trees Compiler Baojian Hua bjhua@ustc.edu.cn

2 Front End [Figure: front-end pipeline: source code -> lexical analyzer -> tokens -> parser -> abstract syntax tree -> semantic analyzer -> IR]

3 Recap Lexer: turns program source into a token sequence. Parser: takes the token sequence and answers yes or no (is the program syntactically valid?). Today's topic: abstract syntax trees.

4 Abstract Syntax Trees Parse trees encode the grammatical structure of the source program. However, they contain a lot of unnecessary information. What is essential here? [Figure: parse tree for 15 * (3 + 4)]

5 Abstract Syntax Trees For the compiler to understand an expression, it only needs to know the operators and operands; punctuation, parentheses, etc. are not needed. The same holds for statements, functions, etc. [Figure: parse tree for 15 * (3 + 4)]

6 Abstract Syntax Trees [Figure: the parse tree for 15 * (3 + 4) next to its abstract syntax tree: a * node with children 15 and +, where + has children 3 and 4]

7 Concrete vs Abstract Syntax Concrete syntax is needed for parsing: it includes punctuation symbols, reflects grammar factoring and the elimination of left recursion, and depends on the format of the input. Abstract syntax is a simpler, more convenient internal representation: a clean interface between the parser and the later phases of the compiler.

8 Concrete and Abstract Syntax
Concrete grammar:
    E ::= E + T | T
    T ::= T * F | F
    F ::= id | num | ( E )
Input: 2 + 3 * x
[Figure: parse tree for 2 + 3 * x under this grammar]

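9 Concrete and Abstract Syntax
Abstract grammar:
    E ::= id | num | E + E | E * E | ( E )
Input: 2 + 3 * x
[Figure: abstract syntax tree for 2 + 3 * x: a + node with children 2 and *, where * has children 3 and x]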

10 AST Data Structures In the compiler, abstract syntax uses the data structures of the implementation language to represent the grammatical structure. The design depends heavily on the target and implementation languages: more art than science.

11 AST in C

12 AST for "exp" in C
E -> id | num | E + E | E * E | ( E )
/* data structures */
typedef struct E *E;
enum kind {ID, INT, ADD, TIMES};
struct E {
  enum kind kind;          /* the tag: says which union member is valid */
  union {
    char *id;
    int num;
    struct {E e1; E e2;} add;
    struct {E e1; E e2;} times;
  } u;
};
/* This technique is called a tagged union. */

13 AST in C
E ::= id | num | E + E | E * E | ( E )
#include <stdlib.h>  /* for malloc */
/* sample program "2+3*x" */
E e1 = malloc (sizeof (*e1));
e1->kind = INT;
e1->u.num = 3;
E e2 = malloc (sizeof (*e2));
e2->kind = ID;
e2->u.id = "x";
E e3 = malloc (sizeof (*e3));
e3->kind = TIMES;
e3->u.times.e1 = e1;
e3->u.times.e2 = e2;  /* both operands are fields of the TIMES node e3 */
…
/* boring and error-prone :-( */

14 AST for "stm" in C
SS -> S | SS ; S
S -> id := E | print (E)
/* data structures */
typedef List SS;  /* List is assumed to be a list type defined elsewhere */
typedef struct S *S;
enum stm_kind {ASSIGN, PRINT};  /* named stm_kind so it does not clash with enum kind for E */
struct S {
  enum stm_kind kind;
  union {
    struct {char *id; E e;} assign;
    E print;
  } u;
};
/* to encode "x:=3; print(x)" */
prog = …; // leave to you…

15 Operations are tree walks
/* pretty printing */
void pp (E e) {
  switch (e->kind) {
  case INT: printf ("%d", e->u.num); return;
  case ID: printf ("%s", e->u.id); return;
  case ADD:
    printf ("("); pp (e->u.add.e1); printf (")");
    printf (" + ");
    printf ("("); pp (e->u.add.e2); printf (")");
    return;
  case TIMES: /* similar */
  default: error ("compiler bug");
  }
}

16 Operations are tree walks
/* number of nodes in an AST */
int numNodes (E e) {
  switch (e->kind) {
  case INT: return 1;
  case ID: return 1;
  case ADD:
    return 1 + numNodes (e->u.add.e1) + numNodes (e->u.add.e2);
  case TIMES:
    return 1 + numNodes (e->u.times.e1) + numNodes (e->u.times.e2);
  default:
    error ("compiler bug");
    return 0;  /* not reached if error () aborts */
  }
}
C compiler is stupid!

17 AST in F#

18 AST for "exp" in F#
E ::= id | num | E + E | E * E | ( E )
(* data structures *)
type E =
  | Int of int
  | Id of string
  | Add of E * E
  | Times of E * E
(* to encode "2+3*x" *)
let prog = Add (Int 2, Times (Int 3, Id "x"))
(* Easy and happy! *)

19 AST for "stm" in F#
SS -> S | SS ; S
S -> x := E | print (E)
(* data structures; E is the expression type from the previous slide *)
type S =
  | Assign of string * E
  | Print of E
type SS = S list
(* to encode "x:=3; print(x)" *)
let prog = [Assign ("x", Int 3); Print (Id "x")]

20 AST in F#
(* number of nodes *)
let rec numNodes e =
  match e with
  | Int _ -> 1
  | Id _ -> 1
  | Add (e1, e2) -> 1 + (numNodes e1) + (numNodes e2)
  | Times (e1, e2) -> 1 + (numNodes e1) + (numNodes e2)
(* Note this may be too inefficient, why? *)

21 AST in F#
(* accumulator-passing version: n carries the count so far, and the second recursive call is a tail call *)
let rec numNodes (e, n) =
  match e with
  | Int _ -> 1 + n
  | Id _ -> 1 + n
  | Add (e1, e2) ->
      let n' = numNodes (e1, n)
      numNodes (e2, 1 + n')
  | Times (e1, e2) -> … (* similar *)

22 AST in F#
(* yet another version using a reference cell *)
let nodes = ref 0
let bump (x : int ref) = x.Value <- x.Value + 1  (* plays the role of ++ *)
let rec numNodes e =
  match e with
  | Int _ -> bump nodes
  | Id _ -> bump nodes
  | Add (e1, e2) -> (numNodes e1; bump nodes; numNodes e2)
  | Times (e1, e2) -> … (* similar *)

23 AST in Java

24 AST for "exp" in Java
E ::= id | num | E + E | E * E | ( E )
/* data structures */
abstract class Exp {}
class IntExp extends Exp {
  int i;
  IntExp (int i) { this.i = i; }
}
// constructors omitted from the following classes
class IdExp extends Exp { String id; }
class AddExp extends Exp { Exp e1; Exp e2; }
class TimesExp extends Exp { Exp e1; Exp e2; }

25 Local Class Hierarchy
E ::= id | num | E + E | E * E | ( E )
/* to encode "2+3*x" */
Exp e = new AddExp (new IntExp (2),
                    new TimesExp (new IntExp (3), new IdExp ("x")));
/* Not as ugly as in C, but still boring */
[Figure: class hierarchy with Exp as the superclass of IntExp, IdExp, AddExp and TimesExp]
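The pretty printer shown earlier for C can also be written against this class hierarchy by giving every node class its own method. The following is a minimal sketch under that assumption; the outer class `ExpMethodsDemo`, the method `pp` and the helper `sample` are illustrative names that do not appear in the slides.

```java
// Sketch only: pp() is an illustrative addition, not part of the slides' classes.
final class ExpMethodsDemo {
    abstract static class Exp {
        // Pretty-print with explicit parentheses around every operand.
        abstract String pp();
    }

    static final class IntExp extends Exp {
        final int i;
        IntExp(int i) { this.i = i; }
        String pp() { return Integer.toString(i); }
    }

    static final class IdExp extends Exp {
        final String id;
        IdExp(String id) { this.id = id; }
        String pp() { return id; }
    }

    static final class AddExp extends Exp {
        final Exp e1, e2;
        AddExp(Exp e1, Exp e2) { this.e1 = e1; this.e2 = e2; }
        String pp() { return "(" + e1.pp() + ") + (" + e2.pp() + ")"; }
    }

    static final class TimesExp extends Exp {
        final Exp e1, e2;
        TimesExp(Exp e1, Exp e2) { this.e1 = e1; this.e2 = e2; }
        String pp() { return "(" + e1.pp() + ") * (" + e2.pp() + ")"; }
    }

    // The AST for "2+3*x".
    static Exp sample() {
        return new AddExp(new IntExp(2),
                          new TimesExp(new IntExp(3), new IdExp("x")));
    }

    public static void main(String[] args) {
        System.out.println(sample().pp());
    }
}
```

With this layout each tree walk is spread over the node classes, so adding a new operation means touching every class.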

26 AST for "stm" in Java
stms -> stm | stms ; stm
stm -> x := e | print (e)
/* data structures */
class Stms { LinkedList<Stm> stms; }
class Stm {}
class AssignStm extends Stm { String x; Exp e; }
class PrintStm extends Stm { Exp e; }
/* to encode "x:=3; print(x)" */
LinkedList<Stm> prog = new LinkedList<> ();
prog.add (new AssignStm (…));
prog.add (new PrintStm (…));

27 AST in Java
/* number of nodes again */
int numNodes (Exp e) {
  if (e instanceof IntExp) return 1;
  else if (e instanceof IdExp) return 1;
  else if (e instanceof AddExp) {
    AddExp f = (AddExp) e;
    return 1 + numNodes (f.e1) + numNodes (f.e2);
  }
  …
}
But this breaks the modularity of Java. A better way is to use the so-called visitor pattern. Read the Tiger book, chapter 4, and do lab 2.
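As a rough illustration of the visitor pattern mentioned on this slide, the sketch below counts nodes without any `instanceof` test. It is one common way to set the pattern up, not necessarily the one used in the Tiger book or in lab 2; `Visitor`, `accept`, `NodeCounter` and `VisitorDemo` are hypothetical names.

```java
// Sketch only: Visitor, accept and NodeCounter are illustrative names.
final class VisitorDemo {
    // One visit method per node class; R is the result type of the walk.
    interface Visitor<R> {
        R visitInt(IntExp e);
        R visitId(IdExp e);
        R visitAdd(AddExp e);
        R visitTimes(TimesExp e);
    }

    abstract static class Exp {
        // Each node dispatches to the visit method that matches its own class.
        abstract <R> R accept(Visitor<R> v);
    }

    static final class IntExp extends Exp {
        final int i;
        IntExp(int i) { this.i = i; }
        <R> R accept(Visitor<R> v) { return v.visitInt(this); }
    }

    static final class IdExp extends Exp {
        final String id;
        IdExp(String id) { this.id = id; }
        <R> R accept(Visitor<R> v) { return v.visitId(this); }
    }

    static final class AddExp extends Exp {
        final Exp e1, e2;
        AddExp(Exp e1, Exp e2) { this.e1 = e1; this.e2 = e2; }
        <R> R accept(Visitor<R> v) { return v.visitAdd(this); }
    }

    static final class TimesExp extends Exp {
        final Exp e1, e2;
        TimesExp(Exp e1, Exp e2) { this.e1 = e1; this.e2 = e2; }
        <R> R accept(Visitor<R> v) { return v.visitTimes(this); }
    }

    // The node-counting operation lives in one class, outside the AST classes.
    static final class NodeCounter implements Visitor<Integer> {
        public Integer visitInt(IntExp e) { return 1; }
        public Integer visitId(IdExp e) { return 1; }
        public Integer visitAdd(AddExp e) {
            return 1 + e.e1.accept(this) + e.e2.accept(this);
        }
        public Integer visitTimes(TimesExp e) {
            return 1 + e.e1.accept(this) + e.e2.accept(this);
        }
    }

    static int numNodes(Exp e) { return e.accept(new NodeCounter()); }

    // The AST for "2+3*x".
    static Exp sample() {
        return new AddExp(new IntExp(2),
                          new TimesExp(new IntExp(3), new IdExp("x")));
    }

    public static void main(String[] args) {
        System.out.println(numNodes(sample()));
    }
}
```

A second operation, such as pretty printing, would be another `Visitor` implementation and would leave the node classes unchanged.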

28 How to construct AST automatically?

29 AST Generation Attribute-grammar scheme: each nonterminal may have a semantic value v associated with it. When the parser recognizes a rule X -> s1 … sn, a semantic action is executed, which uses the semantic values of the symbols si. When parsing completes successfully, the parser returns the semantic value associated with the start symbol, usually an abstract syntax tree.

30 AST Generation in tools In a top-down parser, ASTs are returned (recursively) as values. Yacc-like tools encode this strategy in semantic actions.

31 AST generation in a recursive descent parser
SS -> S | S ; SS
S -> id := E | print (E)
E -> id | num | E + E | E * E
List parse_stms () =
  List list = new List ();
  S stm = parse_stm ();
  list.addLast (stm);
  while (current_token == ;) {
    eat (;);
    stm = parse_stm ();
    list.addLast (stm);
  }
  return list;

32 AST generation in a recursive descent parser
SS -> S | S ; SS
S -> id := E | print (E)
E -> id | num | E + E | E * E
S parse_stm () =
  switch (current_token) {
  case ID:
    String x = current_token; // remember the "x"
    eat (ID);                 // consume the identifier before the ":="
    eat (ASSIGN);
    E exp = parse_exp ();
    S stm = new AssignStm (x, exp);
    return stm;
  case PRINT:
    eat (PRINT);
    eat (LPAREN);
    E exp = parse_exp ();
    eat (RPAREN);
    S stm = new PrintStm (exp);
    return stm;
  }

33 AST generation in a recursive descent parser
SS -> S | S ; SS
S -> id := E | print (E)
E -> id | num | E + E | E * E
E parse_addexp () =
  E e1 = parse_timesexp ();
  while (current_token == +) {
    eat (+);
    E e2 = parse_timesexp ();
    e1 = new AddExp (e1, e2);   // fold each new operand into the tree built so far
  }
  return e1;
E parse_timesexp () =
  E e1 = parse_atom ();
  while (current_token == *) {
    eat (*);
    E e2 = parse_atom ();
    e1 = new TimesExp (e1, e2);
  }
  return e1;

34 AST generation in a recursive descent parser
SS -> S | S ; SS
S -> id := E | print (E)
E -> id | num | E + E | E * E
E parse_atom () =
  switch (current_token) {
  case ID:
    String x = current_token;
    eat (ID);                 // consume the token once its value is saved
    return new IdExp (x);
  case NUM:
    String n = current_token;
    eat (NUM);
    return new NumExp (n);
  default:
    error ("want ID or NUM, but got …");
  }
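The expression part of the recursive descent parser on these slides can be put together into a small runnable program. This is a sketch under several assumptions: the tiny hand-written `tokenize` lexer, the string-valued tokens, the `"<eof>"` marker and the `toString` renderings are all illustrative choices, and the class name `ParserDemo` is hypothetical. Only `+`, `*`, identifiers and numbers are handled, as in the grammar shown here.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: the lexer and the toString renderings are illustrative assumptions.
final class ParserDemo {
    abstract static class Exp {}
    static final class IntExp extends Exp {
        final int i;
        IntExp(int i) { this.i = i; }
        @Override public String toString() { return Integer.toString(i); }
    }
    static final class IdExp extends Exp {
        final String id;
        IdExp(String id) { this.id = id; }
        @Override public String toString() { return id; }
    }
    static final class AddExp extends Exp {
        final Exp e1, e2;
        AddExp(Exp e1, Exp e2) { this.e1 = e1; this.e2 = e2; }
        @Override public String toString() { return "Add(" + e1 + ", " + e2 + ")"; }
    }
    static final class TimesExp extends Exp {
        final Exp e1, e2;
        TimesExp(Exp e1, Exp e2) { this.e1 = e1; this.e2 = e2; }
        @Override public String toString() { return "Times(" + e1 + ", " + e2 + ")"; }
    }

    private final List<String> tokens;
    private int pos = 0;

    ParserDemo(String source) { this.tokens = tokenize(source); }

    // Splits the input into "+", "*" and runs of letters and digits.
    static List<String> tokenize(String s) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            if (c == '+' || c == '*') { out.add(String.valueOf(c)); i++; continue; }
            if (!Character.isLetterOrDigit(c))
                throw new IllegalArgumentException("bad character: " + c);
            int start = i;
            while (i < s.length() && Character.isLetterOrDigit(s.charAt(i))) i++;
            out.add(s.substring(start, i));
        }
        return out;
    }

    private String current() { return pos < tokens.size() ? tokens.get(pos) : "<eof>"; }

    private void eat(String expected) {
        if (!current().equals(expected))
            throw new IllegalStateException("want " + expected + ", but got " + current());
        pos++;
    }

    // addexp -> timesexp ('+' timesexp)*, building a left-leaning tree.
    Exp parseAddExp() {
        Exp left = parseTimesExp();
        while (current().equals("+")) {
            eat("+");
            left = new AddExp(left, parseTimesExp());
        }
        return left;
    }

    // timesexp -> atom ('*' atom)*
    Exp parseTimesExp() {
        Exp left = parseAtom();
        while (current().equals("*")) {
            eat("*");
            left = new TimesExp(left, parseAtom());
        }
        return left;
    }

    // atom -> ID | NUM
    Exp parseAtom() {
        String tok = current();
        if (tok.chars().allMatch(Character::isDigit)) {
            pos++;
            return new IntExp(Integer.parseInt(tok));
        }
        if (Character.isLetter(tok.charAt(0))) {
            pos++;
            return new IdExp(tok);
        }
        throw new IllegalStateException("want ID or NUM, but got " + tok);
    }

    // Parses a whole expression and rejects trailing tokens.
    static Exp parse(String source) {
        ParserDemo p = new ParserDemo(source);
        Exp e = p.parseAddExp();
        if (p.pos != p.tokens.size())
            throw new IllegalStateException("unexpected token " + p.current());
        return e;
    }

    public static void main(String[] args) {
        System.out.println(parse("2 + 3 * x"));
    }
}
```

Each parse function returns the tree for the phrase it recognized, so the tree for the whole input is simply the return value of the top-level call.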

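35 AST generation in LR parser [Figure: shift-reduce trace of an LR parser on the input 2 + 3 * 4, showing the stack, the remaining input and the tree attached to each stack symbol after every step; the final tree is a + node with children 2 and *, where * has children 3 and 4] Each nonterminal is associated with a tree.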

36 AST generation in LR parser
e -> e PLUS e    ($$ = Add ($1, $3))
   | e TIMES e   ($$ = Times ($1, $3))
   | ID          ($$ = Id ($1))
   | NUM         ($$ = Num ($1))
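One way to picture what `$1`, `$3` and `$$` refer to is a stack of semantic values kept alongside the parser's state stack. The sketch below is a simplification, not how any particular Yacc-like tool is implemented: it has no parse table, and the shift and reduce steps for `2 + 3 * 4` are written out by hand in the order an LR parser would take them if TIMES is given higher precedence than PLUS. `ValueStackDemo` and its method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch only: a hand-driven value stack, with no parse table or state stack.
final class ValueStackDemo {
    abstract static class Exp {}
    static final class Num extends Exp {
        final int n;
        Num(int n) { this.n = n; }
        @Override public String toString() { return Integer.toString(n); }
    }
    static final class Add extends Exp {
        final Exp l, r;
        Add(Exp l, Exp r) { this.l = l; this.r = r; }
        @Override public String toString() { return "Add(" + l + ", " + r + ")"; }
    }
    static final class Times extends Exp {
        final Exp l, r;
        Times(Exp l, Exp r) { this.l = l; this.r = r; }
        @Override public String toString() { return "Times(" + l + ", " + r + ")"; }
    }

    // Semantic values of the symbols currently on the parse stack.
    private final Deque<Object> values = new ArrayDeque<>();

    // Shifting a token pushes its value (a number, or the operator text).
    void shift(Object tokenValue) { values.push(tokenValue); }

    // e -> NUM            ($$ = Num ($1))
    void reduceNum() {
        int n = (Integer) values.pop();      // $1
        values.push(new Num(n));             // $$
    }

    // e -> e PLUS e       ($$ = Add ($1, $3))
    void reduceAdd() {
        Exp right = (Exp) values.pop();      // $3
        values.pop();                        // $2, the PLUS token
        Exp left = (Exp) values.pop();       // $1
        values.push(new Add(left, right));   // $$
    }

    // e -> e TIMES e      ($$ = Times ($1, $3))
    void reduceTimes() {
        Exp right = (Exp) values.pop();
        values.pop();
        Exp left = (Exp) values.pop();
        values.push(new Times(left, right));
    }

    Exp result() { return (Exp) values.peek(); }

    // Replays the parser's actions for the input 2 + 3 * 4.
    static Exp demo() {
        ValueStackDemo p = new ValueStackDemo();
        p.shift(2);   p.reduceNum();
        p.shift("+");
        p.shift(3);   p.reduceNum();
        p.shift("*");
        p.shift(4);   p.reduceNum();
        p.reduceTimes();                     // 3 * 4 is reduced first
        p.reduceAdd();                       // then 2 + (3 * 4)
        return p.result();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

When the last reduction finishes, the single value left on the stack is the semantic value of the start symbol, which here is the whole AST.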

37 Source Position In a one-pass compiler, errors are reported while the parser is still at the offending position, so error messages are precise; early compilers never had to worry about this. But in a multi-pass compiler, source positions must be stored in the AST for later use.
/* Example */
class AddExp {
  Exp left;
  Exp right;
  int lineNum;
  int columnNum;
}
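To illustrate why a later pass needs the stored positions, here is a small sketch of a check that runs after parsing and reports undeclared identifiers. The pass itself, the message format and the names `PositionDemo`, `check` and `demo` are assumptions made for this example; the positions are filled in by hand where a real parser would copy them from the tokens, and `instanceof` is used only to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Sketch only: a hypothetical later pass that uses positions stored in the AST.
final class PositionDemo {
    abstract static class Exp {
        final int lineNum, columnNum;
        Exp(int lineNum, int columnNum) {
            this.lineNum = lineNum;
            this.columnNum = columnNum;
        }
    }

    static final class IdExp extends Exp {
        final String id;
        IdExp(String id, int lineNum, int columnNum) {
            super(lineNum, columnNum);
            this.id = id;
        }
    }

    static final class AddExp extends Exp {
        final Exp left, right;
        AddExp(Exp left, Exp right, int lineNum, int columnNum) {
            super(lineNum, columnNum);
            this.left = left;
            this.right = right;
        }
    }

    // Walks the tree and records one message per identifier not in 'declared'.
    static void check(Exp e, Set<String> declared, List<String> errors) {
        if (e instanceof IdExp) {
            IdExp v = (IdExp) e;
            if (!declared.contains(v.id))
                errors.add(v.lineNum + ":" + v.columnNum
                           + ": undeclared identifier " + v.id);
        } else if (e instanceof AddExp) {
            AddExp a = (AddExp) e;
            check(a.left, declared, errors);
            check(a.right, declared, errors);
        }
    }

    // AST for "x + y" written on line 3, with only x declared.
    static List<String> demo() {
        Exp e = new AddExp(new IdExp("x", 3, 1), new IdExp("y", 3, 5), 3, 3);
        List<String> errors = new ArrayList<>();
        check(e, Collections.singleton("x"), errors);
        return errors;
    }

    public static void main(String[] args) {
        for (String msg : demo()) System.out.println(msg);
    }
}
```

Without the `lineNum` and `columnNum` fields this pass could still detect the error, but it could not tell the user where in the source it occurred.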

38 Summary Abstract syntax trees are the compiler's internal data structures for source programs: the interface between the front end and the later parts of the compiler. Abstract syntax tree design is language-dependent: more art than science.

