Published by Nagendra Singh. Modified over 4 years ago.
1
Compiler Design Dr. Nagendra Pratap Singh
4
Single-pass Compiler
A compiler pass is one traversal of the entire program by the compiler. By number of passes, compilers are of two types: single-pass compilers, and two-pass (or multi-pass) compilers.
Single-pass compiler: when all the phases of the compiler are grouped into a single module, the result is known as a single-pass compiler.
5
Two-pass or Multi-pass Compiler
A two-pass or multi-pass compiler is a type of compiler that processes the source code or the abstract syntax tree of a program multiple times. In a two-pass compiler, the phases are divided into two passes.
6
Que-1: Design a compiler for five different programming languages that run on a single machine.
7
Que-2: Design a compiler for a programming language L1 that runs on five different machines.
8
Que-3: Design a compiler for five programming languages that run on three different machines.
9
Lexical Analysis
The lexical analysis phase of a compiler takes the modified source code produced by the language preprocessors, written in the form of sentences. The lexical analyzer breaks this text into a series of tokens, removing any whitespace and comments in the source code. If the lexical analyzer finds a token invalid, it generates an error.
10
Lexical Analysis
The lexical analyzer reads character streams from the source code, checks for legal tokens, and passes the data to the syntax analyzer on demand.
Token: a sequence of characters that can be treated as a unit in the grammar of the programming language.
Lexeme: the sequence of characters in the source program that forms an instance of a token.
There are predefined rules for every lexeme to be identified as a valid token. These rules are defined by the grammar by means of a pattern. A pattern describes what can be a token, and patterns are defined by means of regular expressions.
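As a rough illustration of patterns written as regular expressions, the sketch below tokenizes a tiny expression language with Python's `re` module. The token names (`NUMBER`, `IDENTIFIER`, `OPERATOR`) and the patterns themselves are assumptions made for this example, not a specification from the slides.

```python
import re

# Hypothetical token patterns for a tiny expression language (illustrative only).
TOKEN_SPEC = [
    ("NUMBER",     r"\d+"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("OPERATOR",   r"[-+*/=]"),
    ("SKIP",       r"\s+"),          # whitespace is matched but not reported
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Return (token_name, lexeme) pairs; raise if a character matches no pattern."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            # No pattern matches here, so the lexer reports an error.
            raise SyntaxError(f"invalid character {source[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

print(tokenize("count = count + 1"))
```

Each lexeme is paired with the name of the pattern it matched, and a character that fits no pattern (for example `$`) raises an error, mirroring the invalid-token case described above.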
11
Lexical Analysis
In a programming language, the following can be considered tokens: keywords, constants, identifiers, strings, numbers, operators, punctuation symbols, etc.
For example, in the C language, a single variable declaration line can contain 5 tokens.
12
Lexical Analysis
For each lexeme, the lexical analyzer produces an output token of the form ⟨token-name, attribute-value⟩. The first component, token-name, is an abstract symbol that is used during syntax analysis, and the second component, attribute-value, points to an entry in the symbol table for this token.
For example, suppose a source program contains the assignment statement
Alpha = Beta + Gamma * 5
13
Lexical Analysis
The characters in this assignment could be grouped into the following lexemes and mapped into the following tokens:
1. Alpha is a lexeme that would be mapped into the token ⟨id, 1⟩, where id is an abstract symbol standing for identifier and 1 points to the symbol-table entry for Alpha. The symbol-table entry for an identifier holds information about the identifier, such as its name and type.
2. = (the assignment symbol) is a lexeme that is mapped into the token ⟨=⟩. Since this token needs no attribute-value, the second component is omitted.
14
Lexical Analysis
3. Beta is a lexeme that is mapped into the token ⟨id, 2⟩, where 2 points to the symbol-table entry for Beta.
4. + is a lexeme that is mapped into the token ⟨+⟩.
5. Gamma is a lexeme that is mapped into the token ⟨id, 3⟩, where 3 points to the symbol-table entry for Gamma.
6. * is a lexeme that is mapped into the token ⟨*⟩.
7. 5 is a lexeme that is mapped into the token ⟨5⟩.
Blanks separating the lexemes are discarded by the lexical analyzer.
15
Lexical Analysis
The representation of the assignment statement after lexical analysis, as a sequence of tokens, is
⟨id, 1⟩ ⟨=⟩ ⟨id, 2⟩ ⟨+⟩ ⟨id, 3⟩ ⟨*⟩ ⟨5⟩
In this representation, the token names =, +, and * are abstract symbols for the assignment, addition, and multiplication operators, respectively.
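A minimal sketch of how a lexer could build this token sequence together with its symbol table. The function name `lex_assignment` and the dictionary used as a symbol table are hypothetical choices for this example. Identifiers become `("id", n)` pairs, while operators and the number carry no attribute.

```python
import re

def lex_assignment(source):
    """Return (tokens, symbol_table) for a simple assignment statement."""
    symbol_table = {}   # identifier name -> 1-based symbol-table entry number
    tokens = []
    # Identifiers, integer literals, or one of the operators = + *.
    # Blanks (and, in this simplified sketch, any other character) are skipped.
    for lexeme in re.findall(r"[A-Za-z_]\w*|\d+|[=+*]", source):
        if lexeme[0].isalpha() or lexeme[0] == "_":
            # Reuse the existing entry if the identifier was seen before.
            index = symbol_table.setdefault(lexeme, len(symbol_table) + 1)
            tokens.append(("id", index))
        else:
            tokens.append((lexeme,))   # no attribute-value needed
    return tokens, symbol_table

tokens, table = lex_assignment("Alpha = Beta + Gamma * 5")
print(tokens)  # [('id', 1), ('=',), ('id', 2), ('+',), ('id', 3), ('*',), ('5',)]
print(table)   # {'Alpha': 1, 'Beta': 2, 'Gamma': 3}
```

A real symbol-table entry would also hold information such as the identifier's type. Only the entry number is kept here.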
16
Example
Consider the following statement: x = a + b * 2
Lexemes: {x, =, a, +, b, *, 2}
Corresponding tokens: {⟨id, 1⟩, ⟨=⟩, ⟨id, 2⟩, ⟨+⟩, ⟨id, 3⟩, ⟨*⟩, ⟨2⟩}
17
Example
Consider the following statement: if (y <= t) y=y-3;
Lexemes: {if, (, y, <=, t, ), y, =, y, -, 3, ;}
Corresponding tokens: {⟨if⟩, ⟨(⟩, ⟨id, 1⟩, ⟨<=⟩, ⟨id, 2⟩, ⟨)⟩, ⟨id, 1⟩, ⟨=⟩, ⟨id, 1⟩, ⟨-⟩, ⟨3⟩, ⟨;⟩}
18
Example based on Lexical Analysis
Que-1: Find the number of tokens in the following program.
int main()
{
    // 2 variables
    int a, b;
    a = 10;
    return 0;
}
Ans: The total number of tokens is 18. The comment is discarded and produces no token.
'int' 'main' '(' ')' '{' 'int' 'a' ',' 'b' ';' 'a' '=' '10' ';' 'return' '0' ';' '}'
19
Example based on Lexical Analysis
Que-2: Find the number of tokens in the following program.
printf("Lexical Analysis");
Ans: The total number of tokens is 5.
'printf' '(' '"Lexical Analysis"' ')' ';'
20
Example based on Lexical Analysis
Que-3: Find the number of tokens in the following program.
int main()
{
    int a = 10, b = 20;
    printf("sum is :%d", a+b);
    return 0;
}
Ans: The total number of tokens is 27.
'int' 'main' '(' ')' '{' 'int' 'a' '=' '10' ',' 'b' '=' '20' ';' 'printf' '(' '"sum is :%d"' ',' 'a' '+' 'b' ')' ';' 'return' '0' ';' '}'
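The counts in Que-1 to Que-3 can be checked mechanically. The sketch below is a deliberately simplified token counter, not a full C lexer. The helper name `c_tokens` and the regular expressions are assumptions for this example, and they only cover the constructs used in these questions: comments, string literals, identifiers, integer constants, two-character relational operators and single-character symbols.

```python
import re

# Order matters: string literals first, then identifiers/keywords, integers,
# two-character relational operators, and finally any single non-space character.
TOKEN = re.compile(r'"[^"\n]*"|[A-Za-z_]\w*|\d+|[<>=!]=|\S')
# Comments are removed before tokenizing (this would misfire on "//" inside a string).
COMMENT = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)

def c_tokens(source):
    """Strip comments, then return the list of lexemes found in the source."""
    return TOKEN.findall(COMMENT.sub(" ", source))

QUE1 = """int main()
{
    // 2 variables
    int a, b;
    a = 10;
    return 0;
}"""
QUE2 = 'printf("Lexical Analysis");'
QUE3 = """int main()
{
    int a = 10, b = 20;
    printf("sum is :%d", a+b);
    return 0;
}"""

print(len(c_tokens(QUE1)))  # 18
print(len(c_tokens(QUE2)))  # 5
print(len(c_tokens(QUE3)))  # 27
```

Note that the whole string literal counts as one token, and whitespace and comments contribute nothing to the count.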
21
Example based on Lexical Analysis
Que-4: Find the number of different tokens in the following program.
program gcd (input, output);
var i, j : integer;
begin
    read (i, j);
    while i <> j do
        if i > j then i := i - j
        else j := j - i;
    writeln (i)
end.
Ans: The total number of different tokens is 27.
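The answer to Que-4 counts each distinct lexeme once. A small sketch, assuming a simplified Pascal token pattern written just for this program (the name `PASCAL_TOKEN` is hypothetical), shows the difference between the total and the distinct count.

```python
import re

# Identifiers/keywords, integers, the two-character symbols := <> <= >=,
# then any other single non-space character.
PASCAL_TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|:=|<>|<=|>=|\S")

PROGRAM = """
program gcd (input, output);
var i, j : integer;
begin
    read (i, j);
    while i <> j do
        if i > j then i := i - j
        else j := j - i;
    writeln (i)
end.
"""

lexemes = PASCAL_TOKEN.findall(PROGRAM)
print(len(lexemes))       # 51 tokens in total
print(len(set(lexemes)))  # 27 different tokens
```

Repeated lexemes such as `i`, `j`, `(` and `;` account for the gap between the two numbers.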
22
Summary
TOKEN - A meaningful collection of characters over the character set of the programming language, e.g. ID, Constant, Keywords, Operators, Punctuation, Literal String
PATTERN - The set of rules that defines a TOKEN
LEXEME - The sequence of characters matched by a PATTERN, forming the TOKEN
23
Lexical Analyzer Tool
Lexemes may differ from one programming language to another, so lexical analyzer tools are required: lex, flex, Quex, lexer, etc.
The basic requirement of a lexical analyzer tool is a lexical specification. The lexical specification for a programming language must contain the information needed to identify each and every token defined for it. Regular expressions (RE) are used to write the lexical specification.
24
Lexical Analyzer Tool
25
Regular Expression -> NFA with epsilon moves -> NFA without epsilon moves -> Deterministic Finite Automaton (DFA) -> Minimum-state DFA
The resulting DFA is used to decide whether a string of characters is accepted or not. If the string is accepted by the DFA, the input string is a valid token; otherwise it is invalid.
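One step of this chain, going from an NFA without epsilon moves to a DFA, can be sketched with the subset construction. The NFA below recognizes (a|b)*abb, that is, any string of a's and b's ending in abb. Its state numbers 0 to 3 and the function names are assumptions for this example.

```python
from collections import deque

# NFA without epsilon moves for (a|b)*abb (state numbering is an assumption).
NFA = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}
NFA_START, NFA_ACCEPTING = 0, {3}
ALPHABET = "ab"

def subset_construction(nfa, start, accepting, alphabet):
    """Build a DFA whose states are frozensets of NFA states."""
    dfa_start = frozenset({start})
    transitions = {}
    seen = {dfa_start}
    queue = deque([dfa_start])
    while queue:
        current = queue.popleft()
        for symbol in alphabet:
            # Union of the NFA moves from every state in the current subset.
            target = frozenset(s for state in current
                               for s in nfa.get((state, symbol), ()))
            transitions[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    dfa_accepting = {state for state in seen if state & accepting}
    return transitions, dfa_start, dfa_accepting

def accepts(transitions, start, accepting, text):
    """Run the DFA on text; text must use only symbols from the alphabet."""
    state = start
    for ch in text:
        state = transitions[(state, ch)]
    return state in accepting

DFA, DFA_START, DFA_ACCEPTING = subset_construction(NFA, NFA_START, NFA_ACCEPTING, ALPHABET)
print(len({state for state, _ in DFA}))                 # 4 DFA states
print(accepts(DFA, DFA_START, DFA_ACCEPTING, "aabb"))   # True
print(accepts(DFA, DFA_START, DFA_ACCEPTING, "abba"))   # False
```

For this particular expression the construction yields four DFA states, {0}, {0,1}, {0,2} and {0,3}, which is already the minimum-state DFA.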
26
Examples:-
27
Que: Design a DFA for the regular expression (a|b)*abb
28
Solution:-
33
Final DFA for the regular expression (a|b)*abb
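A four-state DFA for this expression can be written as a transition table and simulated directly. The sketch below assumes states numbered 0 to 3, with 0 as the start state and 3 as the only accepting state. The figure on the slide may label its states differently.

```python
# Transition table of a 4-state DFA accepting strings of a's and b's that end in abb.
# State numbers are an assumption, not taken from the slide's figure.
DFA = {
    0: {"a": 1, "b": 0},   # nothing useful seen yet
    1: {"a": 1, "b": 2},   # last symbol was "a"
    2: {"a": 1, "b": 3},   # last two symbols were "ab"
    3: {"a": 1, "b": 0},   # last three symbols were "abb" (accepting)
}
START, ACCEPTING = 0, {3}

def is_valid_token(text):
    """Return True if the DFA ends in an accepting state after reading text."""
    state = START
    for ch in text:
        if ch not in DFA[state]:
            return False   # symbol outside the alphabet {a, b}
        state = DFA[state][ch]
    return state in ACCEPTING

print(is_valid_token("ababb"))  # True
print(is_valid_token("abab"))   # False
```

A string is reported as a valid token only if the walk through the table ends in state 3.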
34
Next Lecture:- Brief Discussion on Phases of Compiler Continued