Presentation is loading. Please wait.

Presentation is loading. Please wait.

Short introduction to compilers

Similar presentations


Presentation on theme: "Short introduction to compilers"— Presentation transcript:

1 Short introduction to compilers
Prabhas Chongstitvatana This is a compilation from the slides used in Compiler course at Dept. of Math and CS, Chulalongkorn. It is generously granted to be used in this class.

2 Introduction Jaruloj Chongstitvatana
Department of Mathematics and Computer Science Chulalongkorn University Chapter 1 : Introduction

3 What is a Compiler? A compiler is a computer program that translates a program in a source language into an equivalent program in a target language. A source program/code is a program/code written in the source language, which is usually a high- level language. A target program/code is a program/code written in the target language, which often is a machine language or an intermediate code. compiler Source program Target program Error message Chapter 1 : Introduction

4 Process of Compiling Stream of characters Stream of tokens
scanner Stream of tokens parser Parse/syntax tree Semantic analyzer Annotated tree Intermediate code generator Intermediate code Code optimization Intermediate code Code generator Target code Code optimization Target code Chapter 1 : Introduction

5 Some Data Structures Symbol table Literal table Parse tree Chapter 1
: Introduction

6 Symbol Table Identifiers are names of variables, constants, functions, data types, etc. Store information associated with identifiers Information associated with different types of identifiers can be different Information associated with variables are name, type, address,size (for array), etc. Information associated with functions are name,type of return value, parameters, address, etc. Chapter 1 : Introduction

7 Symbol Table (cont’d) Accessed in every phase of compilers
The scanner, parser, and semantic analyzer put names of identifiers in symbol table. The semantic analyzer stores more information (e.g. data types) in the table. The intermediate code generator, code optimizer and code generator use information in symbol table to generate appropriate code. Mostly use hash table for efficiency. Chapter 1 : Introduction

8 Literal table Store constants and strings used in program
reduce the memory size by reusing constants and strings Can be combined with symbol table Chapter 1 : Introduction

9 Parse tree Dynamically-allocated, pointer-based structure
Information for different data types related to parse trees need to be stored somewhere. Nodes are variant records, storing information for different types of data Nodes store pointers to information stored in other data structure, e.g. symbol table Chapter 1 : Introduction

10 Scanner

11 2301373 Introduction to Compilers
A scanner, sometimes called a lexical analyzer A scanner : gets a stream of characters (source program) divides it into tokens Tokens are units that are meaningful in the source language. Lexemes are strings which match the patterns of tokens. Scanner Introduction to Compilers

12 2301373 Introduction to Compilers
Examples of Tokens in C Tokens Lexemes identifier Age, grade,Temp, zone, q1 number 3.1416, , string “A cat sat on a mat.”, “ ” open parentheses ( close parentheses ) Semicolon ; reserved word if IF, if, If, iF Scanner Introduction to Compilers

13 2301373 Introduction to Compilers
Scanning When a token is found: It is passed to the next phase of compiler. Sometimes values associated with the token, called attributes, need to be calculated. Some tokens, together with their attributes, must be stored in the symbol/literal table. it is necessary to check if the token is already in the table Examples of attributes Attributes of a variable are name, address, type, etc. An attribute of a numeric constant is its value. Scanner Introduction to Compilers

14 How to construct a scanner
Define tokens in the source language. Describe the patterns allowed for tokens. Write regular expressions describing the patterns. Construct an FA for each pattern. Combine all FA’s. Scanner Introduction to Compilers

15 2301373 Introduction to Compilers
Regular Expression a character or symbol in the alphabet : an empty string : an empty set if r and s are regular expressions r | s r s r * (r ) l f Scanner Introduction to Compilers

16 Extension of regular expr.
[a-z] any character in a range from a to z . any character r + one or more repetition r ? optional subexpression ~(a | b | c), [^abc] any single character NOT in the set Scanner Introduction to Compilers

17 2301373 Introduction to Compilers
Examples of Patterns (a | A) = the set {a, A} [0-9]+ = (0 |1 |...| 9) (0 |1 |...| 9)* (0-9)? = (0 | 1 |...| 9 | ) [A-Za-z] = (A |B |...| Z |a |b |...| z) A . = the string with A following by any one symbol ~[0-9] = [^ ] = any character which is not 0, 1, ..., 9 l Scanner Introduction to Compilers

18 Describing Patterns of Tokens
reservedIF = (IF| if| If| iF) = (I|i)(F|f) letter = [a-zA-Z] digit =[0-9] identifier = letter (letter|digit)* numeric = (+|-)? digit+ (. digit+)? (E (+|-)? digit+)? Comments { (~})* } /* ([^*]*[^/]*)* */ ;(~newline)* newline Scanner Introduction to Compilers

19 2301373 Introduction to Compilers
Disambiguating Rules IF is an identifier or a reserved word? A reserved word cannot be used as identifier. A keyword can also be identifier. <= is < and = or <=? Principle of longest substring When a string can be either a single token or a sequence of tokens, single-token interpretation is preferred. Scanner Introduction to Compilers

20 2301373 Introduction to Compilers
FA Recognizing Tokens Identifier Numeric Comment letter letter,digit E digit . +,-,e ~/ / * ~* Scanner Introduction to Compilers

21 2301373 Introduction to Compilers
Lookahead letter, digit I,i F,f Return ID [other] Return IF Scanner Introduction to Compilers

22 2301373 Introduction to Compilers
Nested IF switch (state) { case 0: { if isletter(nxt) state=1; elseif isdigit(nxt) state=2; else state=3; break; } case 1: { if isletVdig(nxt) else state=4; letter, digit other 1 4 letter digit 2 other 3 Scanner Introduction to Compilers

23 2301373 Introduction to Compilers
Transition table St ch 1 2 3 letter .. digit 4 letter, digit other 1 4 letter digit 2 other 3 Scanner Introduction to Compilers

24 2301373 Introduction to Compilers
Simulating a DFA initialize current_state=start while (not final(current_state)) { next_state=dfa(current_state, next) current_state=next_state; } Scanner Introduction to Compilers

25 Context-Free Grammars
Using grammars in parsers Jaruloj Chongstitvatana Department of Mathametics and Computer Science Chulalongkorn University Chapter 3 Context-free Grammar

26 Chapter 3 Context-free Grammar
Parsing Process Call the scanner to get tokens Build a parse tree from the stream of tokens A parse tree shows the syntactic structure of the source program. Add information about identifiers in the symbol table Report error, when found, and recover from thee error Chapter 3 Context-free Grammar

27 Chapter 3 Context-free Grammar
a quintuple (V, T, P, S) where V is a finite set of nonterminals, containing S, T is a finite set of terminals, P is a set of production rules in the form of aβ where  is in V and β is in (V UT )*, and S is the start symbol. Any string in (V U T)* is called a sentential form. Chapter 3 Context-free Grammar

28 Chapter 3 Context-free Grammar
Examples E  E O E E  (E) E  id O  + O  - O  * O  / S  SS S  (S)S S  () S   Chapter 3 Context-free Grammar

29 Backus-Naur Form (BNF)
Nonterminals are in < >. Terminals are any other symbols. ::= means . | means or. Examples: <E> ::= <E><O><E>| (<E>) | ID <O> ::= + | - | * | / <S> ::= <S><S> | (<S>)<S> | () |  Chapter 3 Context-free Grammar

30 Chapter 3 Context-free Grammar
Examples S  SS | (S)S | () | S  SS  (S)SS (S)S(S)S  (S)S(())S  ((S)S)S(())S  ((S)())S(())S  ((())())S(())S  ((())()) (())S  ((())())(()) E  E O E | (E) | id O  + | - | * | / E  E O E  (E) O E  (E O E) O E * ((E O E) O E) O E  ((id O E)) O E) O E  ((id + E)) O E) O E  ((id + id)) O E) O E  * ((id + id)) * id) + id Chapter 3 Context-free Grammar

31 Parsing Jaruloj Chongstitvatana
Department of Mathematics and Computer Science Chulalongkorn University

32 Introduction Parsing is a process that constructs a syntactic structure (i.e. parse tree) from the stream of tokens. We already learn how to describe the syntactic structure of a language using (context-free) grammar. So, a parser only need to do this? Stream of tokens Context-free grammar Parser Parse tree

33 Parse Trees and Derivations
+ E E  E + E  id + E  id + E * E  id + id * E  id + id * id  E + E * E  E + E * id  E + id * id E * id id id Top-down parsing E E E + E E id * id id Bottom-up parsing

34 Recursive-Descent Write one procedure for each set of productions with the same nonterminal in the LHS Each procedure recognizes a structure described by a nonterminal. A procedure calls other procedures if it need to recognize other structures. A procedure calls match procedure if it need to recognize a terminal.

35 Recursive-Descent: Example
E  E O F | F O  + | - F  ( E ) | id procedure F { switch token { case (: match(‘(‘); E; match(‘)’); case id: match(id); default: error; } For this grammar: We cannot decide which rule to use for E, and If we choose E  E O F, it leads to infinitely recursive loops. Rewrite the grammar into EBNF procedure E { F; while (token=+ or token=-) { O; F; } } E ::= F {O F} O ::= + | - F ::= ( E ) | id procedure E { E; O; F; }

36 Match procedure procedure match(expTok) { if (token==expTok)
then getToken else error } The token is not consumed until getToken is executed.

37 Problems in Recursive-Descent
Difficult to convert grammars into EBNF Cannot decide which production to use at each point Cannot decide when to use -production A 

38 LL(1) Parsing LL(1) Use stack to simulate leftmost derivation
Read input from (L) left to right Simulate (L) leftmost derivation 1 lookahead symbol Use stack to simulate leftmost derivation Part of sentential form produced in the leftmost derivation is stored in the stack. Top of stack is the leftmost nonterminal symbol in the fragment of sentential form.

39 Concept of LL(1) Parsing
Simulate leftmost derivation of the input. Keep part of sentential form in the stack. If the symbol on the top of stack is a terminal, try to match it with the next input token and pop it out of stack. If the symbol on the top of stack is a nonterminal X, replace it with Y if we have a production rule X  Y. Which production will be chosen, if there are both X  Y and X  Z ?

40 Example of LL(1) Parsing
E TX FNX (E)NX (TX)NX (FNX)NX (nNX)NX (nX)NX (nATX)NX (n+TX)NX (n+FNX)NX (n+(E)NX)NX (n+(TX)NX)NX (n+(FNX)NX)NX (n+(nNX)NX)NX (n+(nX)NX)NX (n+(n)NX)NX (n+(n)X)NX (n+(n))NX (n+(n))MFNX (n+(n))*FNX (n+(n))*nNX (n+(n))*nX (n+(n))*n ( T N ( n + ) * $ E X F + F A n ) E  T X X  A T X |  A  + | - T  F N N  M F N |  M  * F  ( E ) | n N T T ( N Finished X M E * X F ) F n N N T X E $


Download ppt "Short introduction to compilers"

Similar presentations


Ads by Google