
1 Study of a Simple Compiler
In this chapter we study a simple compiler and the different steps involved in building one. This chapter serves as an introduction to the rest of the course.

2 Arithmetic Expression Processing Using the Stack
The stack operations are:
- Push(x): puts the value x on top of the stack.
- Pop(): removes and returns the value on top of the stack.
Before we can use the stack to process an arithmetic expression, we have to translate the expression from infix form to postfix form.
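The two stack operations can be sketched in C as follows. This is a minimal illustration that assumes integer values and a fixed capacity. The names Stack and STACK_SIZE, and the error handling (print a message and exit), are hypothetical choices for this sketch and are not taken from the slides.

```c
#include <stdio.h>
#include <stdlib.h>

#define STACK_SIZE 64  /* hypothetical fixed capacity for this sketch */

/* A fixed-size integer stack; top counts the values currently stored. */
typedef struct {
    int items[STACK_SIZE];
    int top;
} Stack;

/* Push(x): put x on top of the stack. */
void push(Stack *s, int x) {
    if (s->top == STACK_SIZE) {
        fprintf(stderr, "stack overflow\n");
        exit(1);
    }
    s->items[s->top++] = x;
}

/* Pop(): remove and return the value on top of the stack. */
int pop(Stack *s) {
    if (s->top == 0) {
        fprintf(stderr, "stack underflow\n");
        exit(1);
    }
    return s->items[--s->top];
}
```

Values come back in the reverse of the order in which they were pushed. This reversal is why the order of operands matters when an operator is applied.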

3 Examples of Expression Translation
Infix          Postfix
1 + 5          1 5 +
1 + 5 * 2      1 5 2 * +
(1 + 5) * 2    1 5 + 2 *
9 - 5 + 2      9 5 - 2 +

4 Processing an Expression
To process an arithmetic expression (in postfix form) using the stack, follow these steps:
1) Read the expression from left to right.
2) When you read a number, push it onto the stack.
3) When you read an operator:
   - Pop the first number from the top of the stack (call it r1).
   - Pop the second number from the top of the stack (call it r2).
   - Apply the operator with the second number as the left operand: r2 op r1. The order does not matter for + and *, but it does for - and /.
   - Push the result onto the stack.
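The steps above can be written as a small C function. This sketch makes several assumptions: operands are single digits, the only operators are + - * /, spaces are skipped, and division is C integer division. The names eval_postfix, MAX_DEPTH and die are hypothetical.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_DEPTH 64  /* hypothetical stack capacity for this sketch */

static void die(const char *msg) {
    fprintf(stderr, "%s\n", msg);
    exit(1);
}

/* Evaluates a postfix expression of single-digit numbers and + - * /. */
int eval_postfix(const char *expr) {
    int stack[MAX_DEPTH];
    int top = 0;                                   /* values on the stack */

    for (const char *p = expr; *p != '\0'; p++) {  /* step 1: left to right */
        if (*p == ' ')
            continue;
        if (isdigit((unsigned char)*p)) {          /* step 2: push a number */
            if (top == MAX_DEPTH)
                die("stack overflow");
            stack[top++] = *p - '0';
            continue;
        }
        if (top < 2)                               /* step 3: an operator */
            die("missing operand");
        int r1 = stack[--top];                     /* first pop: right operand */
        int r2 = stack[--top];                     /* second pop: left operand */
        int result;
        switch (*p) {
        case '+': result = r2 + r1; break;
        case '-': result = r2 - r1; break;
        case '*': result = r2 * r1; break;
        case '/':
            if (r1 == 0)
                die("division by zero");
            result = r2 / r1;
            break;
        default:
            die("unexpected symbol");
            return 0;                              /* not reached */
        }
        stack[top++] = result;                     /* push the result */
    }
    if (top != 1)
        die("malformed expression");
    return stack[0];
}
```

For example, eval_postfix("1 5 2 * +") returns 11 and eval_postfix("9 5 - 2 +") returns 6.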

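5 Processing 1 + 5 * 2
The expression 1 + 5 * 2 is first translated to postfix (1 5 2 * +) and then processed as follows (top of the stack on the right):
Stack      Translation
1          push 1
1 5        push 5
1 5 2      push 2
1 10       pop r1; pop r2; mult r2,r1; push r2
11         pop r1; pop r2; add r2,r1; push r2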
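The translation column above (push for a number; pop r1, pop r2, an operation on r2,r1, push r2 for an operator) can be generated mechanically from the postfix form. Below is a C sketch that assumes single-digit operands. The function name postfix_to_stack_code is hypothetical, and so are the mnemonics sub and div; only add and mult appear on the slide.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Instruction name per operator; "sub" and "div" are hypothetical. */
static const char *mnemonic(char op) {
    static const char ops[] = "+-*/";
    static const char *const names[] = { "add", "sub", "mult", "div" };
    const char *hit = (op != '\0') ? strchr(ops, op) : NULL;
    return hit != NULL ? names[hit - ops] : NULL;
}

/* Writes stack-machine code for a postfix string of single digits into
   out (capacity n >= 1), one instruction per line.
   Returns 0 on success, -1 on an unknown symbol or if out is too small. */
int postfix_to_stack_code(const char *postfix, char *out, size_t n) {
    size_t len = 0;
    out[0] = '\0';
    for (const char *p = postfix; *p != '\0'; p++) {
        int written;
        if (*p == ' ')
            continue;
        if (isdigit((unsigned char)*p)) {
            written = snprintf(out + len, n - len, "push %c\n", *p);
        } else {
            const char *name = mnemonic(*p);
            if (name == NULL)
                return -1;
            /* right operand into r1, left into r2, result back in r2 */
            written = snprintf(out + len, n - len,
                               "pop r1\npop r2\n%s r2,r1\npush r2\n", name);
        }
        if (written < 0 || (size_t)written >= n - len)
            return -1;  /* output truncated */
        len += (size_t)written;
    }
    return 0;
}
```

For 1 5 2 * + this produces exactly the eleven instructions in the table, in order.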

6 Exercise
1) Process the other expressions in the table on slide 3 using the stack.
2) Complete the following table:
Infix              Postfix
1 - 5
1 + 5 - 2
9 - 3 / (1 + 2)
(9 - 3) / 1 + 2

7 Simple Compiler Structure
Character stream (infix representation) → Lexical analyzer → Token stream → Syntax-directed translator → Intermediate representation (postfix representation)

8 Grammar
A grammar (context-free grammar, CFG) consists of:
1) A set of tokens (called terminal symbols)
2) A set of nonterminals
3) A set of productions (rules), each consisting of a left side (a nonterminal), an arrow, and a right side (a sequence, or string, of tokens and/or nonterminals)
4) A start symbol (one of the nonterminals)

9 Example 1:
list → list + digit
list → list - digit
list → digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
This may be written as follows:
list → list + digit | list - digit | digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

10
- Terminal symbols (tokens): + - 0 1 2 3 4 5 6 7 8 9
- Nonterminals: digit, list
- Start symbol: list
- String of tokens: a sequence of zero or more tokens (terminal symbols). A string of zero tokens is called the empty string and is written ε.
- All token strings that can be derived from the start symbol form the language defined by the grammar.
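The list grammar builds strings only from digits joined by + or -. Its language is therefore exactly a digit followed by zero or more pairs, each pair being (+ or -) and then a digit. Below is a C membership test under that reading. It assumes the input has no spaces and uses the ASCII '-' for minus. The name is_list is hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>

/* True if s is in the language of the list grammar:
   digit, then zero or more of (+ digit) or (- digit). */
bool is_list(const char *s) {
    if (!isdigit((unsigned char)*s))
        return false;       /* a list must start with a digit */
    s++;
    while (*s == '+' || *s == '-') {
        s++;
        if (!isdigit((unsigned char)*s))
            return false;   /* every sign must be followed by a digit */
        s++;
    }
    return *s == '\0';      /* nothing may follow the last digit */
}
```

For example, is_list("9-5+2") is true, while is_list("9-") and is_list("95") are false.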

11 Exercise (Example 2)
1. Determine the nonterminal symbols and the terminal symbols of the following grammar.
2. Determine the start symbol.
3. Give three token strings derived from this grammar.
block → begin compound_stmts end
compound_stmts → stmt_list | ε
stmt_list → stmt_list ; stmt | stmt
stmt → a | c | b

12 Parse Tree
A parse tree shows how the start symbol of a grammar derives a string in the language. It is a tree with the following properties:
1) The root is labeled with the start symbol.
2) Each interior node is labeled with a nonterminal.
3) Each leaf is labeled with a token or ε.
4) If A is the label of an interior node, and X1, X2, ..., Xn (nonterminals or tokens) are the labels of its children from left to right, then the production A → X1 X2 ... Xn must exist.
[Figure: a node A with children X1, X2, ..., Xn.]

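13 Example: S → S S + | S S * | a
1) Derive the string aa+a*:
S ⇒ S S * ⇒ S a * ⇒ S S + a * ⇒ S a + a * ⇒ a a + a *
(Each ⇒ step replaces one nonterminal using a production. Here the rightmost nonterminal is replaced at every step.)
[Figure: the partial parse tree after each step of this derivation.]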
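The strings of this grammar are exactly the postfix expressions over the single operand a. Membership can therefore be checked with a counter instead of a full derivation: each a adds one complete S, and each + or * combines two of them into one. Here is a C sketch; the name is_S is hypothetical.

```c
#include <stdbool.h>

/* Recognizes S -> S S + | S S * | a. count is the number of complete
   S subtrees built so far; the string is accepted iff no operator lacks
   two operands and exactly one S remains at the end. */
bool is_S(const char *s) {
    int count = 0;
    for (; *s != '\0'; s++) {
        if (*s == 'a') {
            count++;                 /* S -> a */
        } else if (*s == '+' || *s == '*') {
            if (count < 2)
                return false;        /* operator without two S operands */
            count--;                 /* S -> S S op: two become one */
        } else {
            return false;            /* not a token of this grammar */
        }
    }
    return count == 1;
}
```

For example, is_S("aa+a*") is true and is_S("a+") is false.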

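14 2) Draw the parse tree of the derivation:
S ⇒ S S * ⇒ S a * ⇒ S S + a * ⇒ S a + a * ⇒ a a + a *
[Figure: the parse tree. The root S has children S, S, *. The left S has children S, S, +, each of which derives a. The right S derives a.]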
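The same parse tree can be built bottom-up with a stack of subtrees, which is the idea behind the expression-processing steps on slide 4. The C sketch below builds the tree and prints it in bracketed form. Node, build_tree, print_tree, free_tree and MAX_DEPTH are hypothetical names.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_DEPTH 64  /* hypothetical limit on pending subtrees */

/* Node for S -> a (a leaf) or S -> S S op (two S children). */
typedef struct Node {
    char op;                      /* '+' or '*'; 'a' for a leaf */
    struct Node *left, *right;    /* both NULL for S -> a */
} Node;

static Node *new_node(char op, Node *left, Node *right) {
    Node *n = malloc(sizeof *n);
    if (n == NULL) {
        perror("malloc");
        exit(1);
    }
    n->op = op;
    n->left = left;
    n->right = right;
    return n;
}

void free_tree(Node *n) {
    if (n == NULL)
        return;
    free_tree(n->left);
    free_tree(n->right);
    free(n);
}

/* 'a' pushes a leaf; '+' or '*' joins the top two subtrees.
   Returns NULL if s is not in the language (or nests too deeply). */
Node *build_tree(const char *s) {
    Node *stack[MAX_DEPTH];
    int top = 0;
    for (; *s != '\0'; s++) {
        if (*s == 'a' && top < MAX_DEPTH) {
            stack[top++] = new_node('a', NULL, NULL);
        } else if ((*s == '+' || *s == '*') && top >= 2) {
            Node *right = stack[--top];
            Node *left = stack[--top];
            stack[top++] = new_node(*s, left, right);
        } else {
            break;                /* bad symbol or missing operand */
        }
    }
    if (*s == '\0' && top == 1)
        return stack[0];
    while (top > 0)               /* reject: release partial subtrees */
        free_tree(stack[--top]);
    return NULL;
}

/* Appends a bracketed form of the tree to out, e.g. "S(S(a) S(a) +)". */
void print_tree(const Node *n, char *out, size_t size) {
    size_t len = strlen(out);
    if (n->left == NULL) {
        snprintf(out + len, size - len, "S(a)");
        return;
    }
    snprintf(out + len, size - len, "S(");
    print_tree(n->left, out, size);
    len = strlen(out);
    snprintf(out + len, size - len, " ");
    print_tree(n->right, out, size);
    len = strlen(out);
    snprintf(out + len, size - len, " %c)", n->op);
}
```

For aa+a* the printed form is S(S(S(a) S(a) +) S(a) *). Read this as a root S with children S, S, *, where the first child S itself has children S, S, +.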

15 Ambiguous Grammars
If some string has more than one parse tree, the grammar is said to be ambiguous. Ambiguity must be avoided for compilation, since such a string can have more than one meaning.
Example: lists of digits separated by plus or minus signs. This version merges the notions of digit and list into a single nonterminal, string:
string → string + string | string - string | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
The same strings are derivable as before, but some strings now have multiple parse trees (possible meanings).

16 Two Parse Trees: 9 - 5 + 2
[Figure: two parse trees for 9 - 5 + 2, one grouping it as (9 - 5) + 2 and the other as 9 - (5 + 2).]
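To see why two parse trees matter, one can enumerate every parse tree of a string under the ambiguous grammar and compute the value each tree gives. In that grammar, every operator can be the root of some tree. The C sketch below assumes well-formed input: single digits and + or -, with no spaces. The names parse_tree_values, tree_values and MAX_TREES are hypothetical.

```c
#include <string.h>

#define MAX_TREES 64  /* hypothetical cap; enough for up to 5 operators */

/* Stores in out[] the value of every parse tree of s[lo..hi] under
     string -> string + string | string - string | 0 | ... | 9
   and returns how many values were stored (at most max).
   Assumes digits at even indexes and + or - at odd indexes. */
static int tree_values(const char *s, int lo, int hi, int *out, int max) {
    int n = 0;
    if (lo == hi) {                          /* string -> digit: one tree */
        if (max > 0)
            out[n++] = s[lo] - '0';
        return n;
    }
    for (int k = lo + 1; k < hi; k += 2) {   /* operator s[k] at the root */
        int left[MAX_TREES], right[MAX_TREES];
        int nl = tree_values(s, lo, k - 1, left, MAX_TREES);
        int nr = tree_values(s, k + 1, hi, right, MAX_TREES);
        for (int i = 0; i < nl; i++)
            for (int j = 0; j < nr; j++)
                if (n < max)
                    out[n++] = (s[k] == '+') ? left[i] + right[j]
                                             : left[i] - right[j];
    }
    return n;
}

int parse_tree_values(const char *s, int *out, int max) {
    return tree_values(s, 0, (int)strlen(s) - 1, out, max);
}
```

For "9-5+2" the function finds two trees with different values. The tree for 9 - (5 + 2) gives 2, and the tree for (9 - 5) + 2 gives 6.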

17 Precedence and Associativity
Precedence
- Determines the order in which different operators are evaluated when they occur in the same expression
- Operators of higher precedence are applied before operators of lower precedence (for example, 1 + 5 * 2 means 1 + (5 * 2))
Associativity
- Determines the order in which operators of equal precedence are evaluated when they occur in the same expression
- Most operators have left-to-right associativity (9 - 5 - 2 means (9 - 5) - 2), but some have right-to-left associativity (in C, a = b = c means a = (b = c))

18 Precedence and Associativity Example: Arithmetic Expressions
We start with the lowest level in the grammar (highest priority):
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Then the next level up (lower priority):
factor → digit | ( expr )
Then the next level up (lower priority):
term → term * factor | term / factor | factor
Then the highest level (lowest priority):
expr → expr + term | expr - term | term
Because the productions for term and expr are left-recursive, the operators at each level are left-associative.
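This grammar can be turned into an evaluator with one C function per nonterminal. The sketch borrows a technique that the later "Code Optimization" slides develop: each left-recursive production (such as term → term * factor) is implemented as a loop, which keeps the operators left-associative. It assumes single-digit numbers, no spaces and integer division. The names evaluate, expr, term, factor, fail and pos are hypothetical.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

static const char *pos;  /* next unread input character */

static int expr(void);

static void fail(const char *msg) {
    fprintf(stderr, "error: %s\n", msg);
    exit(1);
}

/* factor -> digit | ( expr ) */
static int factor(void) {
    if (isdigit((unsigned char)*pos))
        return *pos++ - '0';
    if (*pos == '(') {
        pos++;
        int v = expr();
        if (*pos != ')')
            fail("expected )");
        pos++;
        return v;
    }
    fail("expected digit or (");
    return 0;  /* not reached */
}

/* term -> term * factor | term / factor | factor, as a loop */
static int term(void) {
    int v = factor();
    while (*pos == '*' || *pos == '/') {
        char op = *pos++;
        int rhs = factor();
        if (op == '*') {
            v *= rhs;
        } else {
            if (rhs == 0)
                fail("division by zero");
            v /= rhs;
        }
    }
    return v;
}

/* expr -> expr + term | expr - term | term, as a loop */
static int expr(void) {
    int v = term();
    while (*pos == '+' || *pos == '-') {
        char op = *pos++;
        int rhs = term();
        v = (op == '+') ? v + rhs : v - rhs;
    }
    return v;
}

int evaluate(const char *s) {
    pos = s;
    int v = expr();
    if (*pos != '\0')
        fail("unexpected trailing input");
    return v;
}
```

The grammar levels produce the expected grouping. For example, evaluate("1+5*2") returns 11, evaluate("(1+5)*2") returns 12 and evaluate("9-5-2") returns 2.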

19 Postfix Notation
Formal rules for translating an infix expression E into its postfix form E':
- If E is a variable or constant, then E' is E.
- If E is an expression of the form E1 op E2, where op is a binary operator, and E1' and E2' are the postfix forms of E1 and E2, then E' is E1' E2' op.
- If E is an expression of the form (E1), and E1' is the postfix form of E1, then E' is E1'.
Parentheses are not needed in postfix notation!
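These rules are recursive and can be applied directly to an expression tree. The C sketch below assumes that each operand is a single character. It also assumes that the grouping from any parentheses is already recorded in the shape of the tree, which is why the (E1) rule produces no output of its own. Expr, append and to_postfix are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Leaf: a variable or constant. Interior node: E1 op E2. */
typedef struct Expr {
    char sym;                 /* operand character, or the operator */
    const struct Expr *e1;    /* left operand; NULL for a leaf */
    const struct Expr *e2;    /* right operand; NULL for a leaf */
} Expr;

/* Appends one character to the C string in out. */
static void append(char *out, char c) {
    size_t n = strlen(out);
    out[n] = c;
    out[n + 1] = '\0';
}

/* Appends E', the postfix form of e, to out (caller provides room). */
void to_postfix(const Expr *e, char *out) {
    if (e->e1 == NULL) {      /* E is a variable or constant: E' = E */
        append(out, e->sym);
        return;
    }
    to_postfix(e->e1, out);   /* E = E1 op E2: E' = E1' E2' op */
    to_postfix(e->e2, out);
    append(out, e->sym);
}
```

The tree for (9 - 5) + 2 yields 95-2+, and the tree for 9 - (5 + 2) yields 952+-. The two groupings stay distinct without any parentheses.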

20 Translation Schemes
A translation scheme adds to a CFG "semantic actions" embedded within productions.
Example translation scheme:
expr → expr + term { print('+') }
expr → expr - term { print('-') }
expr → term
term → 0 { print('0') }
term → 1 { print('1') }
…
term → 9 { print('9') }

21 Equivalent Translation Scheme
expr → term rest
rest → + term { print('+') } rest
rest → - term { print('-') } rest
rest → ε
term → 0 { print('0') }
term → 1 { print('1') }
…
term → 9 { print('9') }

22 Parsing
Parsing is the process of determining whether a string of tokens can be generated by a grammar.

23 Top-down Parsing
Recursively apply the following steps:
- At node n with nonterminal A, select a production for A.
- Construct children at n for the symbols on the right side of the selected production.
- Find the next node for which a subtree needs to be constructed.
Top-down parsing uses a "lookahead" symbol (the current input token). Selecting a production may involve trial and error and backtracking.

24 Predictive Parsing
Recursive-descent parsing is a recursive, top-down approach to parsing. A procedure is associated with each nonterminal of the grammar.
Predictive parsing:
- Is a special case of recursive-descent parsing
- Uses the lookahead symbol to determine, without ambiguity, which production the procedure for each nonterminal applies, so no backtracking is needed

25 Procedures for Nonterminals
A production with right side α is used if the lookahead is in FIRST(α).
- FIRST(α) is the set of all tokens that can appear as the first symbol of a string derived from α.
- If the lookahead symbol is not in the FIRST set of any production, a production with right side ε can be used.
- If two or more productions are possible, this method cannot be used.
- If no production is possible, an error is declared.
Nonterminals on the right side of the selected production are expanded recursively.

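26 Left Recursion
Left-recursive productions can cause recursive-descent parsers to loop forever. For a production such as expr → expr + term, the procedure expr() would call expr() again before consuming any input.
Example: expr → expr + term
Left recursion can be eliminated by rewriting
A → A α | β
as
A → β R
R → α R | ε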
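The rewriting rule can be applied mechanically. This C sketch handles immediate left recursion for one nonterminal, with each alternative given as a string of space-separated symbols. The function name eliminate_left_recursion is hypothetical, and the fixed 512-byte work buffers are an assumption of the sketch.

```c
#include <stdio.h>
#include <string.h>

/* Rewrites  A -> A a1 | ... | A am | b1 | ... | bn
   into      A -> b1 R | ... | bn R
             R -> a1 R | ... | am R | ε
   r names the new nonterminal; the result goes into out (capacity size).
   Assumes the inputs fit in the 512-byte work buffers. */
void eliminate_left_recursion(const char *a, const char *alts[], int nalts,
                              const char *r, char *out, size_t size) {
    size_t alen = strlen(a);
    char betas[512] = "", alphas[512] = "";
    for (int i = 0; i < nalts; i++) {
        const char *alt = alts[i];
        if (strncmp(alt, a, alen) == 0 && alt[alen] == ' ') {
            /* A -> A alpha contributes R -> alpha R */
            size_t n = strlen(alphas);
            snprintf(alphas + n, sizeof alphas - n, "%s %s | ",
                     alt + alen + 1, r);
        } else {
            /* A -> beta contributes A -> beta R */
            size_t n = strlen(betas);
            snprintf(betas + n, sizeof betas - n, "%s%s %s",
                     n > 0 ? " | " : "", alt, r);
        }
    }
    snprintf(out, size, "%s -> %s\n%s -> %sε\n", a, betas, r, alphas);
}
```

Suppose the function is given expr with the alternatives "expr + term", "expr - term" and "term", and rest as the new nonterminal. It produces expr -> term rest and rest -> + term rest | - term rest | ε, which is the grammar of slide 27 without the actions.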

27 Eliminating Left Recursion
The translation scheme
expr → expr + term { print('+') }
expr → expr - term { print('-') }
expr → term
term → 0 { print('0') }
term → 1 { print('1') }
…
term → 9 { print('9') }
becomes
expr → term rest
rest → + term { print('+') } rest
rest → - term { print('-') } rest
rest → ε
term → 0 { print('0') }
term → 1 { print('1') }
…
term → 9 { print('9') }

28 Infix to Postfix Code: Part 1
#include <ctype.h>   /* isdigit */
#include <stdio.h>   /* getchar, putchar, printf */
#include <stdlib.h>  /* exit */

int lookahead;

void expr(void);
void rest(void);
void term(void);
void match(int);
void error(void);

int main(void)
{
    lookahead = getchar();
    expr();
    putchar('\n');  /* adds trailing newline character */
    return 0;
}
…

29 Infix to Postfix Code: Part 2
…
void expr(void)
{
    term();
    rest();
}

void term(void)
{
    if (isdigit(lookahead)) {
        putchar(lookahead);
        match(lookahead);
    }
    else
        error();
}
…

30 Infix to Postfix Code: Part 3
…
void rest(void)
{
    if (lookahead == '+') {
        match('+'); term(); putchar('+'); rest();
    }
    else if (lookahead == '-') {
        match('-'); term(); putchar('-'); rest();
    }
    else {
        /* rest → ε: do nothing */
    }
}
…

31 Infix to Postfix Code: Part 4
…
void match(int t)
{
    if (lookahead == t)
        lookahead = getchar();
    else
        error();
}

void error(void)
{
    printf("syntax error\n");  /* print error message */
    exit(1);                   /* then halt */
}

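32 Code Optimization 1
The recursive call to rest() is its last action (tail recursion), so it can be replaced by a jump back to the beginning of the procedure:
void rest(void)
{
REST:
    if (lookahead == '+') {
        match('+'); term(); putchar('+');
        goto REST;
    }
    else if (lookahead == '-') {
        match('-'); term(); putchar('-');
        goto REST;
    }
    /* otherwise rest → ε: return */
}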

33 Code Optimization 2
Once rest() is a loop, it can be merged into expr():
void expr(void)
{
    term();
    while (1) {
        if (lookahead == '+') {
            match('+'); term(); putchar('+');
        }
        else if (lookahead == '-') {
            match('-'); term(); putchar('-');
        }
        else
            break;
    }
}
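To exercise the optimized translator without standard input, the same structure can read from a string and write to a buffer. This is an adaptation, not the slides' code. Advancing a pointer replaces getchar(), and storing into a buffer replaces putchar(). The names infix_to_postfix, in, out and syntax_error are hypothetical.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static const char *in;  /* next unread input character: the lookahead */
static char *out;       /* next free position in the output buffer */

static void syntax_error(void) {
    fprintf(stderr, "syntax error\n");
    exit(1);
}

static void match(char t) {
    if (*in == t)
        in++;
    else
        syntax_error();
}

static void term(void) {
    if (isdigit((unsigned char)*in)) {
        *out++ = *in;   /* emit the digit, like putchar(lookahead) */
        match(*in);
    } else {
        syntax_error();
    }
}

/* expr with rest() folded into a loop, as in Code Optimization 2 */
static void expr(void) {
    term();
    while (1) {
        if (*in == '+') {
            match('+'); term(); *out++ = '+';
        } else if (*in == '-') {
            match('-'); term(); *out++ = '-';
        } else {
            break;
        }
    }
}

/* Translates an infix string such as "9-5+2" into postfix in buf.
   Each input character is emitted exactly once, so the output is as
   long as the input. Returns -1 if buf is too small, 0 on success;
   a syntax error ends the program, as in the original error(). */
int infix_to_postfix(const char *s, char *buf, size_t size) {
    if (strlen(s) >= size)
        return -1;
    in = s;
    out = buf;
    expr();
    if (*in != '\0')
        syntax_error();  /* trailing input that is not + or - */
    *out = '\0';
    return 0;
}
```

With this version, infix_to_postfix("9-5+2", buf, sizeof buf) stores "95-2+" in buf.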

34 Improvements Remaining
- Ignore whitespace
- Allow numbers
- Allow identifiers
- Allow additional operators (multiplication and division)
- Allow multiple expressions (separated by semicolons)

35 Lexical Analyzer
The lexical analyzer:
- Eliminates whitespace (and comments)
- Reads numbers (not just single digits)
- Reads identifiers and keywords

36 Implementing the Lexical Analyzer

37 Allowable Tokens
Expected tokens: +, -, *, /, DIV, MOD, (, ), ID, NUM, DONE
ID represents an identifier, NUM represents a number, and DONE represents end of file (EOF).

38 Tokens and Attributes
Lexeme                                  Token             Attribute value
white space                             ---               ---
sequence of digits                      NUM               numeric value of sequence
div                                     DIV               ---
mod                                     MOD               ---
letter followed by letters and digits   ID                index into symbol table
EOF                                     DONE              ---
any other character                     that character    NONE

39 A Simple Symbol Table
- Each record of the symbol table contains a token type and a string (lexeme or keyword).
- The symbol table has a fixed size.
- All lexemes are stored in an array of fixed size.
- The symbol table supports inserting and searching for tokens:
  - insert(s, t): creates an entry with string s and token t; returns the index into the symbol table
  - lookup(s): searches for the entry with string s; returns its index if found, 0 otherwise
- The keywords (div and mod) are inserted into the symbol table in advance, so they cannot be used as identifiers.

40 Updated Translation Scheme
start → list eof
list → expr ; list | ε
expr → expr + term { print('+') }
     | expr - term { print('-') }
     | term
term → term * factor { print('*') }
     | term / factor { print('/') }
     | term div factor { print('DIV') }
     | term mod factor { print('MOD') }
     | factor
factor → ( expr )
       | id { print(id.lexeme) }
       | num { print(num.value) }

41 After Eliminating Left Recursion
start → list eof
list → expr ; list | ε
expr → term moreterms
moreterms → + term { print('+') } moreterms
          | - term { print('-') } moreterms
          | ε
term → factor morefactors
morefactors → * factor { print('*') } morefactors
            | / factor { print('/') } morefactors
            | div factor { print('DIV') } morefactors
            | mod factor { print('MOD') } morefactors
            | ε
factor → ( expr )
       | id { print(id.lexeme) }
       | num { print(num.value) }

42 Final Code
About 250 lines of C (pretty sloppy; otherwise it would be longer).

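43
#include <ctype.h>    /* isdigit, isalpha, isalnum */
#include <stdio.h>    /* getchar, ungetc, scanf, printf */
#include <stdlib.h>   /* exit */
#include <string.h>   /* strcmp, strcpy, strlen */

#define BSIZE 128     /* buffer size */
#define NONE -1
#define EOS '\0'
#define NUM 256
#define DIV 257
#define MOD 258
#define ID 259
#define DONE 260

extern int tokenval;  /* value of token attribute (defined in lexer.c) */
extern int lineno;    /* current line number (defined in lexer.c) */

struct entry {        /* form of symbol table entry */
    char *lexptr;
    int token;
};
extern struct entry symtable[];  /* defined in symbol.c */

void init(void);
int lexan(void);
int lookup(char s[]);
int insert(char s[], int tok);
void emit(int t, int tval);
void parse(void);
void error(char *m);
********** file global.h *************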

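44
#include "global.h"

struct entry keywords[] = {
    { "div", DIV },
    { "mod", MOD },
    { 0, 0 }
};

void init(void)  /* loads keywords into symtable */
{
    struct entry *p;
    for (p = keywords; p->token; p++)
        insert(p->lexptr, p->token);
}
********** Init.c *************
[Figure: the symtable array, with fields lexptr and token, holding entries for div (DIV), mod (MOD), count (ID) and i (ID). Each lexptr points into the lexemes array, which stores div, mod, count and i, each followed by EOS.]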

45 The lexical analyzer:
- calls the lookup function to search for a symbol in the symbol table;
- calls the insert function to add a symbol to the symbol table;
- adds 1 to the line counter (lineno) when it reads an end-of-line character.

46
#include "global.h"

#define STRMAX 999  /* size of lexemes array */
#define SYMMAX 100  /* size of symtable */

char lexemes[STRMAX];
int lastchar = -1;              /* last used position in lexemes */
struct entry symtable[SYMMAX];
int lastentry = 0;              /* last used position in symtable */

int lookup(char s[])            /* returns position of entry for s, or 0 */
{
    int p;
    for (p = lastentry; p > 0; p = p - 1)
        if (strcmp(symtable[p].lexptr, s) == 0)
            return p;
    return 0;
}

int insert(char s[], int tok)   /* returns position of entry for s */
{
    int len;
    len = strlen(s);
    if (lastentry + 1 >= SYMMAX)
        error("symbol table full");
    if (lastchar + len + 1 >= STRMAX)
        error("lexemes array full");
    lastentry = lastentry + 1;
    symtable[lastentry].token = tok;
    symtable[lastentry].lexptr = &lexemes[lastchar + 1];
    lastchar = lastchar + len + 1;
    strcpy(symtable[lastentry].lexptr, s);
    return lastentry;
}
********** symbol.c *************

47
#include "global.h"

char lexbuf[BSIZE];
int lineno = 1;
int tokenval = NONE;

int lexan(void)                     /* lexical analyzer */
{
    int t;
    while (1) {
        t = getchar();
        if (t == ' ' || t == '\t')
            ;                       /* strip out white space */
        else if (t == '\n')
            lineno = lineno + 1;
        else if (isdigit(t)) {      /* t is a digit */
            ungetc(t, stdin);
            scanf("%d", &tokenval);
            return NUM;
        }
        else if (isalpha(t)) {      /* t is a letter */
            int p, b = 0;
            while (isalnum(t)) {    /* t is alphanumeric */
                lexbuf[b] = t;
                t = getchar();
                b = b + 1;
                if (b >= BSIZE)
                    error("compiler error");
            }
            lexbuf[b] = EOS;
            if (t != EOF)
                ungetc(t, stdin);
            p = lookup(lexbuf);
            if (p == 0)
                p = insert(lexbuf, ID);
            tokenval = p;
            return symtable[p].token;
        }
        else if (t == EOF)
            return DONE;
        else {
            tokenval = NONE;
            return t;
        }
    }
}
********** lexer.c *************

48
#include "global.h"

void emit(int t, int tval)          /* generates output */
{
    switch (t) {
    case '+': case '-': case '*': case '/':
        printf("%c", t);
        break;
    case DIV:
        printf(" DIV ");
        break;
    case MOD:
        printf(" MOD ");
        break;
    case NUM:
        printf(" %d ", tval);       /* spaces keep adjacent numbers apart */
        break;
    case ID:
        printf(" %s ", symtable[tval].lexptr);
        break;
    default:
        printf("token %d, tokenval %d\n", t, tval);
    }
}
********** emitter.c *************

49
#include "global.h"

int lookahead;

void expr(void);
void term(void);
void factor(void);
void match(int t);

void parse(void)                    /* parses and translates expression list */
{
    lookahead = lexan();
    while (lookahead != DONE) {
        expr();
        match(';');
    }
}

void expr(void)
{
    int t;
    term();
    while (1)
        switch (lookahead) {
        case '+': case '-':
            t = lookahead;
            match(lookahead);
            term();
            emit(t, NONE);
            continue;
        default:
            return;
        }
}

void term(void)
{
    int t;
    factor();
    while (1)
        switch (lookahead) {
        case '*': case '/': case DIV: case MOD:
            t = lookahead;
            match(lookahead);
            factor();
            emit(t, NONE);
            continue;
        default:
            return;
        }
}
********** parse.c *************

50
void factor(void)
{
    switch (lookahead) {
    case '(':
        match('(');
        expr();
        match(')');
        break;
    case NUM:
        emit(NUM, tokenval);
        match(NUM);
        break;
    case ID:
        emit(ID, tokenval);
        match(ID);
        break;
    default:
        error("syntax error");
    }
}

void match(int t)
{
    if (lookahead == t)
        lookahead = lexan();
    else
        error("syntax error");
}
********** parse.c (cont'd) **********

51
#include "global.h"

void error(char *m)                 /* generates all error messages */
{
    fprintf(stderr, "line %d: %s\n", lineno, m);
    exit(1);                        /* unsuccessful termination */
}
*** error.c ***

#include "global.h"

int main(void)
{
    init();
    parse();
    exit(0);                        /* successful termination */
}
*** main.c ***

