A SIMPLE COMPILER. Chuen-Liang Chen, Department of Computer Science and Information Engineering, National Taiwan University. © Chuen-Liang Chen, NTUCS&IE

1 A SIMPLE COMPILER
Chuen-Liang Chen
Department of Computer Science and Information Engineering
National Taiwan University
Taipei, TAIWAN

2 Structures of compilers (2/3)
calling tree (1 pass)
[Figure: calling tree of a one-pass compiler. Components: main, parser, scanner, semantic routines, optimizer, code generator, symbol table, attribute table. Data labels: source code, token, SS, machine code, pass 1. SS: syntactic structure (parse tree).]

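3 Language specification
grammar, in Backus-Naur form (BNF)
   1. <program>        → begin <statement list> end
   2. <statement list> → <statement> { <statement> }
   3. <statement>      → ID := <expression> ;
   4. <statement>      → read ( <id list> ) ;
   5. <statement>      → write ( <expr list> ) ;
   6. <id list>        → ID { , ID }
   7. <expr list>      → <expression> { , <expression> }
   8. <expression>     → <primary> { <add op> <primary> }
   9. <primary>        → ( <expression> )
  10. <primary>        → ID
  11. <primary>        → INTLITERAL
  12. <add op>         → +
  13. <add op>         → -
  14. <system goal>    → <program> SCANEOF
lexical rules
  ID         → letter { letter | digit | underline }*
  INTLITERAL → digit digit*
  comment    → -- anything EOL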
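
The two lexical rules for ID and INTLITERAL can be checked directly with the standard character classification functions. The following is a minimal sketch, not part of the slides; the function names is_id() and is_intliteral() are hypothetical, and "letter" is assumed to mean whatever isalpha() accepts in the current locale.

```c
#include <ctype.h>

/* ID -> letter { letter | digit | underline }*
   (is_id is a hypothetical helper, not taken from the slides) */
int is_id(const char *s)
{
    if (!isalpha((unsigned char)*s))
        return 0;                       /* must start with a letter */
    for (s++; *s != '\0'; s++)
        if (!isalnum((unsigned char)*s) && *s != '_')
            return 0;                   /* only letters, digits, underline */
    return 1;
}

/* INTLITERAL -> digit digit*
   (is_intliteral is a hypothetical helper, not taken from the slides) */
int is_intliteral(const char *s)
{
    if (!isdigit((unsigned char)*s))
        return 0;                       /* at least one digit is required */
    for (s++; *s != '\0'; s++)
        if (!isdigit((unsigned char)*s))
            return 0;
    return 1;
}
```

A real scanner does not test whole strings like this; it consumes characters one at a time and stops at the first character that cannot extend the token, which is why the scanner slides rely on ungetc().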

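4 Tokens
sequence of characters having a collective meaning
example (the tokens are the terminal symbols of the grammar)
   1. <program>        → begin <statement list> end
   2. <statement list> → <statement> { <statement> }
   3. <statement>      → ID := <expression> ;
   4. <statement>      → read ( <id list> ) ;
   5. <statement>      → write ( <expr list> ) ;
   6. <id list>        → ID { , ID }
   7. <expr list>      → <expression> { , <expression> }
   8. <expression>     → <primary> { <add op> <primary> }
   9. <primary>        → ( <expression> )
  10. <primary>        → ID
  11. <primary>        → INTLITERAL
  12. <add op>         → +
  13. <add op>         → -
  14. <system goal>    → <program> SCANEOF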

5 Scanner (1/3)
called by parser, usually
to group input characters into tokens
type of tokens -- begin end read write identifier integer ( ) ; , + - :=
  - excluding -- comment, blank, tab, ...
      QUIZ: benefit?
  - including -- End-Of-File
      QUIZ: if exclude EOF, then...?
key issues
  - do not read too many characters
  - how to distinguish different identifiers (integers)?
  - how to recognize begin end read write from identifiers?
comments
  - ungetc() -- for lookahead
  - buffer_char() -- save in_char into token buffer
  - check_reserved() -- check whether token in buffer is a reserved word and return BEGIN, END, READ, WRITE, or ID (token code)
      BEGIN, END, READ, WRITE and ID are integer constants, usually
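
The slides use clear_buffer(), buffer_char() and check_reserved() without showing their bodies. The following is a minimal sketch of how they might look. The buffer size MAXTOKENLEN, the token enum, and the choice of exact lowercase matching for reserved words are assumptions made for illustration, not taken from the slides.

```c
#include <string.h>

/* Hypothetical token codes; the slides only say they are integer constants. */
typedef enum { BEGIN, END, READ, WRITE, ID } token;

#define MAXTOKENLEN 64              /* assumed buffer size */
char token_buffer[MAXTOKENLEN];
static int token_len = 0;

/* Reset the token buffer before scanning a new token. */
void clear_buffer(void)
{
    token_len = 0;
    token_buffer[0] = '\0';
}

/* Append one character; characters that do not fit are dropped. */
void buffer_char(int c)
{
    if (token_len < MAXTOKENLEN - 1) {
        token_buffer[token_len++] = (char)c;
        token_buffer[token_len] = '\0';
    }
}

/* Return the reserved-word code if the buffer holds one, otherwise ID. */
token check_reserved(void)
{
    static const struct { const char *text; token code; } reserved[] = {
        { "begin", BEGIN }, { "end", END }, { "read", READ }, { "write", WRITE }
    };
    size_t i;

    for (i = 0; i < sizeof reserved / sizeof reserved[0]; i++)
        if (strcmp(token_buffer, reserved[i].text) == 0)
            return reserved[i].code;
    return ID;
}
```

Scanning reserved words as identifiers first and looking them up afterwards keeps the scanner loop simple: one rule covers both, and adding a reserved word only changes the table.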

6 Scanner (2/3)
    #include <stdio.h>
    /* character classification macros */
    #include <ctype.h>

    extern char token_buffer[];

    token scanner(void)
    {
        int in_char, c;

        clear_buffer();
        if (feof(stdin))
            return SCANEOF;

        while ((in_char = getchar()) != EOF) {
            if (isspace(in_char))
                continue;   /* do nothing */
            else if ( ??? ) {
                ???
            } else
                lexical_error(in_char);
        }
    }

the "else if ( ??? ) { ??? }" part, first case:

            else if (isalpha(in_char)) {
                /*
                 * ID ::= LETTER | ID LETTER
                 *               | ID DIGIT
                 *               | ID UNDERSCORE
                 */
                buffer_char(in_char);
                for (c = getchar(); isalnum(c) || c == '_'; c = getchar())
                    buffer_char(c);
                ungetc(c, stdin);
                return check_reserved();
            }

7 Scanner (3/3)
            else if (isdigit(in_char)) {
                /*
                 * INTLITERAL ::= DIGIT
                 *              | INTLITERAL DIGIT
                 */
                buffer_char(in_char);
                for (c = getchar(); isdigit(c); c = getchar())
                    buffer_char(c);
                ungetc(c, stdin);
                return INTLITERAL;
            } else if (in_char == '(')
                return LPAREN;
            else if (in_char == ')')
                return RPAREN;
            else if (in_char == ';')
                return SEMICOLON;
            else if (in_char == ',')
                return COMMA;
            else if (in_char == '+')
                return PLUSOP;
            else if (in_char == ':') {
                /* looking for ":=" */
                c = getchar();
                if (c == '=')
                    return ASSIGNOP;
                else {
                    ungetc(c, stdin);
                    lexical_error(in_char);
                }
            } else if (in_char == '-') {
                /* is it --, comment start */
                c = getchar();
                if (c == '-') {
                    /* skip to end of line; also stop at EOF so that an
                       unterminated comment cannot loop forever */
                    do
                        in_char = getchar();
                    while (in_char != '\n' && in_char != EOF);
                } else {
                    ungetc(c, stdin);
                    return MINUSOP;
                }
            }

8 Parser (1/5)
main program of a compiler (analysis part, at least)
to check structure by context-free grammar
recursive descent parsing
  - left-hand side
      one nonterminal → one routine
  - right-hand side
      one nonterminal → one routine call
      one terminal → one "match"
  - does not work for all context-free grammars
comments
  - match() -- call scanner; if the token matches: OK, skip this token; else error handling
  - next_token() -- just see the next token, do not skip it (lookahead)
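
The slides describe match() and next_token() only in words. One common way to implement them is a one-token lookahead buffer, sketched below. The stand-in scanner() that reads from a fixed array, the error_count variable, and the choice not to consume a mismatched token are assumptions made so that the sketch is self-contained; they are not the slides' implementation.

```c
#include <stdio.h>

typedef enum { BEGIN, END, ID, SCANEOF } token;   /* subset, for illustration */

/* Stand-in scanner: returns tokens from a fixed array instead of stdin. */
static const token input[] = { BEGIN, ID, END, SCANEOF };
static size_t input_pos = 0;

static token scanner(void)
{
    token t = input[input_pos];
    if (t != SCANEOF)
        input_pos++;                 /* SCANEOF is returned repeatedly */
    return t;
}

static token lookahead;              /* token scanned but not yet consumed */
static int have_lookahead = 0;
int error_count = 0;                 /* hypothetical error counter */

/* Peek at the next token without consuming it. */
token next_token(void)
{
    if (!have_lookahead) {
        lookahead = scanner();
        have_lookahead = 1;
    }
    return lookahead;
}

void syntax_error(token t)
{
    fprintf(stderr, "syntax error at token %d\n", (int)t);
    error_count++;
}

/* Consume the next token if it is the expected one; otherwise report an
   error and leave the token in place (a deliberately simple recovery). */
void match(token expected)
{
    token t = next_token();
    if (t == expected)
        have_lookahead = 0;          /* consumed: next call scans a new token */
    else
        syntax_error(t);
}
```

Because next_token() caches the token, calling it several times in a row, as the loops in the parser routines do, never reads more input than one token ahead.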

9 Parser (2/5)
    void system_goal(void)
    {
        /* <system goal> ::= <program> SCANEOF */
        program();
        match(SCANEOF);
    }

    void program(void)
    {
        /* <program> ::= BEGIN <statement list> END */
        match(BEGIN);
        statement_list();
        match(END);
    }

    void statement_list(void)
    {
        /* <statement list> ::= <statement> { <statement> } */
        statement();
        while (TRUE) {
            switch (next_token()) {
            case ID:
            case READ:
            case WRITE:
                statement();
                break;
            default:
                return;
            }
        }
    }

QUIZ: Why ID, READ, WRITE?

10 Parser (3/5)
    void statement(void)
    {
        token tok = next_token();

        switch (tok) {
        case ID:
            /* <statement> ::= ID := <expression> ; */
            match(ID); match(ASSIGNOP);
            expression(); match(SEMICOLON);
            break;
        case READ:
            /* <statement> ::= READ ( <id list> ) ; */
            match(READ); match(LPAREN);
            id_list(); match(RPAREN);
            match(SEMICOLON);
            break;
        case WRITE:
            /* <statement> ::= WRITE ( <expr list> ) ; */
            match(WRITE); match(LPAREN);
            expr_list(); match(RPAREN);
            match(SEMICOLON);
            break;
        default:
            syntax_error(tok);
            break;
        }
    }

11 Parser (4/5)
    void id_list(void)
    {
        /* <id list> ::= ID { , ID } */
        match(ID);
        while (next_token() == COMMA) {
            match(COMMA);
            match(ID);
        }
    }

    void expression(void)
    {
        /* <expression> ::= <primary> { <add op> <primary> } */
        token t;

        primary();
        for (t = next_token(); t == PLUSOP || t == MINUSOP; t = next_token()) {
            add_op();
            primary();
        }
    }

    void expr_list(void)
    {
        /* <expr list> ::= <expression> { , <expression> } */
        expression();
        while (next_token() == COMMA) {
            match(COMMA);
            expression();
        }
    }

    void add_op(void)
    {
        /* <add op> ::= PLUSOP | MINUSOP */
        token tok = next_token();

        if (tok == PLUSOP || tok == MINUSOP)
            match(tok);
        else
            syntax_error(tok);
    }

12 Parser (5/5)
    void primary(void)
    {
        token tok = next_token();

        switch (tok) {
        case LPAREN:
            /* <primary> ::= ( <expression> ) */
            match(LPAREN); expression();
            match(RPAREN);
            break;
        case ID:
            /* <primary> ::= ID */
            match(ID);
            break;
        case INTLITERAL:
            /* <primary> ::= INTLITERAL */
            match(INTLITERAL);
            break;
        default:
            syntax_error(tok);
            break;
        }
    }

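13 Action symbols
to determine when to call semantic routines
   1. <program>        → #start begin <statement list> end
   2. <statement list> → <statement> { <statement> }
   3. <statement>      → <ident> := <expression> #assign ;
   4. <statement>      → read ( <id list> ) ;
   5. <statement>      → write ( <expr list> ) ;
   6. <id list>        → <ident> #read_id { , <ident> #read_id }
   7. <expr list>      → <expression> #write_expr { , <expression> #write_expr }
   8. <expression>     → <primary> { <add op> <primary> #gen_infix }
   9. <primary>        → ( <expression> )
  10. <primary>        → <ident>
  11. <primary>        → INTLITERAL #process_literal
  12. <add op>         → + #process_op
  13. <add op>         → - #process_op
  14. <ident>          → ID #process_id
  15. <system goal>    → <program> SCANEOF #finish
possibly, with some modifications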
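
Each action symbol becomes a call to a semantic routine at the corresponding point in the parsing routine. As an illustration, rule 6 can be turned into id_list() as sketched below. The token array, the simplified expr_rec, the output buffer, and the reduced match() are stand-ins invented for this sketch so that it is self-contained; process_id() is folded into ident() for brevity.

```c
#include <string.h>

#define MAXIDLEN 33
#define NTOKS 4
typedef char string[MAXIDLEN];
typedef enum { ID, COMMA, RPAREN } token;        /* subset, for illustration */
typedef struct { string name; } expr_rec;        /* simplified semantic record */

/* Stand-in scanner output for the input "A, B)" (hypothetical test data). */
static const token toks[NTOKS]   = { ID, COMMA, ID, RPAREN };
static const char *lexeme[NTOKS] = { "A", ",", "B", ")" };
static int pos = 0;

char token_buffer[MAXIDLEN];
char output[256];                                /* collected generated code */

token next_token(void) { return toks[pos]; }

/* Reduced match(): remember the matched text and advance. */
void match(token expected)
{
    if (toks[pos] == expected) {
        strcpy(token_buffer, lexeme[pos]);
        if (pos < NTOKS - 1)
            pos++;
    }
}

/* <ident> -> ID #process_id */
void ident(expr_rec *result)
{
    match(ID);
    strcpy(result->name, token_buffer);          /* stands in for process_id() */
}

/* #read_id: emit one Read instruction into the output buffer. */
void read_id(expr_rec in_var)
{
    strcat(output, "Read ");
    strcat(output, in_var.name);
    strcat(output, ",Integer\n");
}

/* <id list> -> <ident> #read_id { , <ident> #read_id } */
void id_list(void)
{
    expr_rec id;

    ident(&id);
    read_id(id);                                 /* first #read_id */
    while (next_token() == COMMA) {
        match(COMMA);
        ident(&id);
        read_id(id);                             /* #read_id inside { } */
    }
}
```

The position of each read_id() call mirrors the position of #read_id in the rule: once after the first identifier and once per iteration of the repeated part.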

14 Semantic record
to keep semantic information associated with grammar symbol
    #define MAXIDLEN 33
    typedef char string[MAXIDLEN];

    /* for operators */
    typedef struct operator {
        enum op { PLUS, MINUS } operator;
    } op_rec;

    /* for <primary> and <expression> */
    enum expr { IDEXPR, LITERALEXPR, TEMPEXPR };

    typedef struct expression {
        enum expr kind;
        union {
            string name;   /* for IDEXPR, TEMPEXPR */
            int val;       /* for LITERALEXPR */
        };
    } expr_rec;

15 Temporary
using Temp&1, Temp&2, ...
    char *get_temp(void)
    {
        /* max temporary allocated so far */
        static int max_temp = 0;
        static char tempname[MAXIDLEN];

        max_temp++;
        sprintf(tempname, "Temp&%d", max_temp);
        check_id(tempname);
        return tempname;
    }

16 Parser + semantic routines
parser only:
    void expression(void)
    {
        token t;

        /* <expression> ::= <primary> { <add op> <primary> } */
        primary();
        for (t = next_token(); t == PLUSOP || t == MINUSOP; t = next_token()) {
            add_op();
            primary();
        }
    }

parser + semantic routines:
    void expression(expr_rec *result)
    {
        expr_rec left_operand, right_operand;
        op_rec op;

        /* <expression> ::= <primary> { <add op> <primary> #gen_infix } */
        primary(&left_operand);
        while (next_token() == PLUSOP || next_token() == MINUSOP) {
            add_op(&op);
            primary(&right_operand);
            left_operand = gen_infix(left_operand, op, right_operand);
        }
        *result = left_operand;
    }

QUIZ: where is the syntactic structure?

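17 Action symbols
to determine when to call semantic routines
   1. <program>        → #start begin <statement list> end
   2. <statement list> → <statement> { <statement> }
   3. <statement>      → <ident> := <expression> #assign ;
   4. <statement>      → read ( <id list> ) ;
   5. <statement>      → write ( <expr list> ) ;
   6. <id list>        → <ident> #read_id { , <ident> #read_id }
   7. <expr list>      → <expression> #write_expr { , <expression> #write_expr }
   8. <expression>     → <primary> { <add op> <primary> #gen_infix }
   9. <primary>        → ( <expression> )
  10. <primary>        → <ident>
  11. <primary>        → INTLITERAL #process_literal
  12. <add op>         → + #process_op
  13. <add op>         → - #process_op
  14. <ident>          → ID #process_id
  15. <system goal>    → <program> SCANEOF #finish
possibly, with some modifications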

18 Semantic routines (1/3)
to produce target language (quadruple intermediate file)
comments
  - generate() -- produce output
  - extract() -- get semantic information

    void start(void)
    {
        /* Semantic initializations, none needed. */
    }

    void finish(void)
    {
        /* Generate code to finish program. */
        generate("Halt", "", "", "");
    }
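
generate() and extract() are named above but not defined in the slides. The following is a hedged sketch of one possible implementation. The helper format_quad(), the split into extract_expr() and extract_op(), the C11 _Generic macro that lets extract() accept either record type, and the output format "Op a,b,c" (modelled on the generated code shown in the tracing example) are all assumptions for illustration.

```c
#include <stdio.h>
#include <string.h>

#define MAXIDLEN 33
typedef char string[MAXIDLEN];

/* Semantic records as declared on the "Semantic record" slide. */
typedef struct operator { enum op { PLUS, MINUS } operator; } op_rec;
enum expr { IDEXPR, LITERALEXPR, TEMPEXPR };
typedef struct expression {
    enum expr kind;
    union { string name; int val; };    /* anonymous union: C11 */
} expr_rec;

/* Text of an operand. A small ring of static buffers keeps two results
   alive at once, as needed by generate(extract(op), extract(e1), ...). */
const char *extract_expr(expr_rec e)
{
    static char bufs[4][MAXIDLEN];
    static int next = 0;
    char *buf = bufs[next];

    next = (next + 1) % 4;
    if (e.kind == LITERALEXPR)
        snprintf(buf, MAXIDLEN, "%d", e.val);
    else
        snprintf(buf, MAXIDLEN, "%s", e.name);
    return buf;
}

/* Opcode name for an operator record. */
const char *extract_op(op_rec o)
{
    return o.operator == PLUS ? "Add" : "Sub";
}

/* Hypothetical convenience: one name for both record types. */
#define extract(x) _Generic((x), expr_rec: extract_expr, op_rec: extract_op)(x)

/* Build "Op a,b,c" in dst, stopping at the first empty operand. */
void format_quad(char *dst, size_t n,
                 const char *op, const char *a, const char *b, const char *c)
{
    const char *args[3];
    int len = snprintf(dst, n, "%s", op);
    int i;

    args[0] = a; args[1] = b; args[2] = c;
    for (i = 0; i < 3 && args[i][0] != '\0' && len >= 0 && (size_t)len < n; i++)
        len += snprintf(dst + len, n - (size_t)len, "%s%s",
                        i == 0 ? " " : ",", args[i]);
}

/* Write one quadruple to standard output. */
void generate(const char *op, const char *a, const char *b, const char *c)
{
    char line[4 * MAXIDLEN];

    format_quad(line, sizeof line, op, a, b, c);
    puts(line);
}
```

With these definitions, generate("Halt", "", "", "") writes the single word Halt, and an infix operation on BB and 314 with result Temp&1 is written as Sub BB,314,Temp&1.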

19 Semantic routines (2/3)
    expr_rec process_id(void)
    {
        /* Declare ID and build a corresponding semantic record. */
        expr_rec t;

        check_id(token_buffer);
        t.kind = IDEXPR;
        strcpy(t.name, token_buffer);
        return t;
    }

    void read_id(expr_rec in_var)
    {
        /* Generate code for read. */
        generate("Read", in_var.name, "Integer", "");
    }

    expr_rec process_literal(void)
    {
        /* Convert literal to a numeric representation
           and build semantic record. */
        expr_rec t;

        t.kind = LITERALEXPR;
        (void) sscanf(token_buffer, "%d", &t.val);
        return t;
    }

    op_rec process_op(void)
    {
        /* Produce operator descriptor. */
        op_rec o;

        if (current_token == PLUSOP)
            o.operator = PLUS;
        else
            o.operator = MINUS;
        return o;
    }

20 Semantic routines (3/3)
    expr_rec gen_infix(expr_rec e1, op_rec op, expr_rec e2)
    {
        /*
         * Generate code for infix operation.
         * Get result temp and set up semantic
         * record for result.
         */
        expr_rec e_rec;

        /* An expr_rec with temp variant set. */
        e_rec.kind = TEMPEXPR;
        strcpy(e_rec.name, get_temp());
        generate(extract(op), extract(e1), extract(e2), e_rec.name);
        return e_rec;
    }

    void write_expr(expr_rec out_expr)
    {
        /* Generate code for write. */
        generate("Write", extract(out_expr), "Integer", "");
    }

    void assign(expr_rec target, expr_rec source)
    {
        /* Generate code for assignment. */
        generate("Store", extract(source), target.name, "");
    }

21 Symbol table
just for space allocation
    /* Is s in the symbol table? */
    extern int lookup(string s);

    /* Put s unconditionally into symbol table. */
    extern void enter(string s);

    void check_id(string s)
    {
        if (! lookup(s)) {
            enter(s);
            generate("Declare", s, "Integer", "");
        }
    }
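
lookup() and enter() are only declared extern on the slide. Since the table is used just for space allocation, a flat array with linear search is enough for small programs. The sketch below is one such implementation; the capacity MAXSYMBOLS and the decision to ignore insertions into a full table are assumptions.

```c
#include <string.h>

#define MAXIDLEN 33
#define MAXSYMBOLS 256               /* assumed capacity */
typedef char string[MAXIDLEN];

static string symbols[MAXSYMBOLS];
static int symbol_count = 0;

/* Is s in the symbol table? Returns 1 if found, 0 otherwise. */
int lookup(string s)
{
    int i;

    for (i = 0; i < symbol_count; i++)
        if (strcmp(symbols[i], s) == 0)
            return 1;
    return 0;
}

/* Put s unconditionally into the symbol table.
   A full table silently ignores the request (assumed behavior). */
void enter(string s)
{
    if (symbol_count < MAXSYMBOLS) {
        strncpy(symbols[symbol_count], s, MAXIDLEN - 1);
        symbols[symbol_count][MAXIDLEN - 1] = '\0';
        symbol_count++;
    }
}
```

Because check_id() calls lookup() before enter(), each name is declared once, which is why the second use of A in the tracing example generates no Declare instruction.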

22 Tracing example (1/2)
Input: begin A:=BB-314+A; end SCANEOF

Step  Parser Action                       Remaining Input                 Generated Code
(1)   Call system_goal()                  begin A:=BB-314+A; end SCANEOF
(2)   Call program()                      begin A:=BB-314+A; end SCANEOF
(3)   Semantic Action: start()            begin A:=BB-314+A; end SCANEOF
(4)   match(BEGIN)                        A:=BB-314+A; end SCANEOF
(5)   Call statement_list()               A:=BB-314+A; end SCANEOF
(6)   Call statement()                    A:=BB-314+A; end SCANEOF
(7)   Call ident()                        A:=BB-314+A; end SCANEOF
(8)   match(ID)                           :=BB-314+A; end SCANEOF
(9)   Semantic Action: process_id()       :=BB-314+A; end SCANEOF         Declare A,Integer
(10)  match(ASSIGNOP)                     BB-314+A; end SCANEOF
(11)  Call expression()                   BB-314+A; end SCANEOF
(12)  Call primary()                      BB-314+A; end SCANEOF
(13)  Call ident()                        BB-314+A; end SCANEOF
(14)  match(ID)                           -314+A; end SCANEOF
(15)  Semantic Action: process_id()       -314+A; end SCANEOF             Declare BB,Integer
(16)  Call add_op()                       -314+A; end SCANEOF
(17)  match(MINUSOP)                      314+A; end SCANEOF
(18)  Semantic Action: process_op()       314+A; end SCANEOF

23 Tracing example (2/2)
Step  Parser Action                       Remaining Input                 Generated Code
(19)  Call primary()                      314+A; end SCANEOF
(20)  match(INTLITERAL)                   +A; end SCANEOF
(21)  Semantic Action: process_literal()  +A; end SCANEOF
(22)  Semantic Action: gen_infix()        +A; end SCANEOF                 Declare Temp&1,Integer
                                                                          Sub BB,314,Temp&1
(23)  Call add_op()                       +A; end SCANEOF
(24)  match(PLUSOP)                       A; end SCANEOF
(25)  Semantic Action: process_op()       A; end SCANEOF
(26)  Call primary()                      A; end SCANEOF
(27)  Call ident()                        A; end SCANEOF
(28)  match(ID)                           ; end SCANEOF
(29)  Semantic Action: process_id()       ; end SCANEOF                   Declaration is unnecessary
(30)  Semantic Action: gen_infix()        ; end SCANEOF                   Declare Temp&2,Integer
                                                                          Add Temp&1,A,Temp&2
(31)  Semantic Action: assign()           ; end SCANEOF                   Store Temp&2,A
(32)  match(SEMICOLON)                    end SCANEOF
(33)  match(END)                          SCANEOF
(34)  match(SCANEOF)
(35)  Semantic Action: finish()                                           Halt

