CPSC 325 - Compiler Tutorial 2 Scanner & Lex.

1 CPSC 325 - Compiler Tutorial 2 Scanner & Lex

2 Tokens
Token stream: each significant lexical chunk of the input program is represented by a token.
Operators & punctuation: { } ! + - = * ; : …
Keywords: if while return goto
Identifiers: id & actual name
Constants: kind & value; int, floating-point, character, string, …
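One way to picture "kind & value" and "id & actual name" is a tagged record. The following is a minimal sketch, not the tutorial's own code: the names TokenKind, Token, make_id, make_int and token_to_string are hypothetical, and only a handful of token kinds are listed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical token kinds, named after the categories on the slide. */
typedef enum {
    TOK_IF, TOK_WHILE, TOK_RETURN, TOK_GOTO,  /* keywords */
    TOK_LP, TOK_RP, TOK_SEMI, TOK_ASSIGN,     /* operators & punctuation */
    TOK_ID,                                   /* identifier: carries its name */
    TOK_INT                                   /* constant: carries its value */
} TokenKind;

typedef struct {
    TokenKind kind;
    char name[32];  /* meaningful only when kind == TOK_ID */
    long value;     /* meaningful only when kind == TOK_INT */
} Token;

/* Builds an identifier token; names longer than 31 characters are truncated. */
Token make_id(const char *name) {
    Token t = { TOK_ID, "", 0 };
    assert(name != NULL);
    strncpy(t.name, name, sizeof t.name - 1);
    return t;
}

/* Builds an integer-constant token. */
Token make_int(long value) {
    Token t = { TOK_INT, "", value };
    return t;
}

/* Writes a printable form such as "ID(x)" or "INT(10)" into buf. */
void token_to_string(const Token *t, char *buf, size_t n) {
    static const char *names[] = {
        "IF", "WHILE", "RETURN", "GOTO", "LP", "RP", "SEMI", "ASSIGN"
    };
    if (t->kind == TOK_ID)
        snprintf(buf, n, "ID(%s)", t->name);
    else if (t->kind == TOK_INT)
        snprintf(buf, n, "INT(%ld)", t->value);
    else
        snprintf(buf, n, "%s", names[t->kind]);
}
```

Calling token_to_string on make_id("x") gives ID(x), and on make_int(10) gives INT(10). Keywords and punctuation need only the kind field.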

3 Token – example 1
Input text: if( x >= y ) y = 10;
Token stream: IF LP ID(x) GEQ ID(y) RP ID(y) Assign INT(10) SEMI
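To make the mapping from input text to token stream concrete, here is a small hand-written scanner sketch. It is an illustration only: the function names emit and scan_to_string are hypothetical, it knows just the token kinds used in this example, and it prints tokens as text instead of building token records.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Appends one token's text to out, separating tokens with single spaces. */
static void emit(char *out, size_t n, const char *text) {
    size_t len = strlen(out);
    snprintf(out + len, n - len, "%s%s", len ? " " : "", text);
}

/* Writes the token stream for src into out.
   Returns 0 on success, -1 on a character this sketch does not handle. */
int scan_to_string(const char *src, char *out, size_t n) {
    char buf[64];
    assert(src != NULL && out != NULL && n > 0);
    out[0] = '\0';
    while (*src != '\0') {
        unsigned char c = (unsigned char)*src;
        if (isspace(c)) { src++; continue; }        /* white space is skipped */
        if (isalpha(c)) {                           /* keyword or identifier */
            char word[32];
            size_t k = 0;
            while (isalpha((unsigned char)*src)) {
                if (k < sizeof word - 1) word[k++] = *src;
                src++;
            }
            word[k] = '\0';
            if (strcmp(word, "if") == 0) {
                emit(out, n, "IF");
            } else {
                snprintf(buf, sizeof buf, "ID(%s)", word);
                emit(out, n, buf);
            }
        } else if (isdigit(c)) {                    /* integer constant */
            long v = 0;
            while (isdigit((unsigned char)*src)) v = v * 10 + (*src++ - '0');
            snprintf(buf, sizeof buf, "INT(%ld)", v);
            emit(out, n, buf);
        } else if (c == '>' && src[1] == '=') { emit(out, n, "GEQ"); src += 2; }
        else if (c == '(') { emit(out, n, "LP"); src++; }
        else if (c == ')') { emit(out, n, "RP"); src++; }
        else if (c == '=') { emit(out, n, "Assign"); src++; }
        else if (c == ';') { emit(out, n, "SEMI"); src++; }
        else return -1;
    }
    return 0;
}
```

Given the input text of this slide, scan_to_string fills out with the same token stream that the slide lists.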

4 Parser
Tokens: IF LP ID(x) GEQ ID(y) RP ID(y) Assign INT(10) SEMI
Tree built from the tokens:
IfStmt
  >=
    ID(x)
    ID(y)
  assign
    ID(y)
    INT(10)

5 Sample Grammar
program ::= statement | program statement
statement ::= assignStmt | ifStmt
assignStmt ::= id = expr ;
ifStmt ::= if ( expr ) statement
expr ::= id | int | expr + expr
id ::= a | b | … | y | z
int ::= 1 | 2 | … | 9 | 0
a, b, 1, 2, 0 are terminal symbols; program, statement, id are non-terminal symbols.

6 Why Separate the Scanner and Parser?
Simplicity & separation of concerns
- Scanner hides details from the parser (comments, whitespace, input files, etc.)
- Parser is easier to build; it has a simpler input stream
Efficiency
- Scanner can use a simpler, faster design
- (But it still often consumes a surprising amount of the compiler's total execution time)

7 Principle of Longest Match
In most languages, the scanner should pick the longest possible string to make up the next token if there is a choice.
Example: return apple != banana;
Should be recognized as 5 tokens: return ID(apple) NEQ ID(banana) SEMI
Not more (not parts of words or identifiers, and not ! and = as separate tokens).
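The longest-match rule can be sketched in a few lines of C. This is an illustration under simplifying assumptions, not a full scanner: longest_match and count_tokens are hypothetical names, words are letters only, and the only two-character operators considered are those ending in '='.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Length of the longest token starting at s (s must not point at white space).
   Returns 0 at end of input. */
size_t longest_match(const char *s) {
    size_t n = 0;
    assert(s != NULL);
    if (s[0] == '\0') return 0;
    if (isalpha((unsigned char)s[0])) {         /* take the whole word, not a prefix */
        while (isalpha((unsigned char)s[n])) n++;
        return n;
    }
    if (isdigit((unsigned char)s[0])) {         /* take the whole number */
        while (isdigit((unsigned char)s[n])) n++;
        return n;
    }
    /* Prefer the two-character operator when the next character is '='. */
    if ((s[0] == '!' || s[0] == '<' || s[0] == '>' || s[0] == '=') && s[1] == '=')
        return 2;
    return 1;                                   /* any other single character */
}

/* Counts the tokens in src by repeatedly taking the longest match. */
size_t count_tokens(const char *src) {
    size_t count = 0;
    assert(src != NULL);
    while (*src != '\0') {
        if (isspace((unsigned char)*src)) { src++; continue; }
        src += longest_match(src);
        count++;
    }
    return count;
}
```

With this sketch, count_tokens("return apple != banana;") yields 5, because != is taken as one token of length 2 and each word is consumed whole.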

8 Scanner DFA Example (1)
start: loop on white space or comments
start -- end of input --> 1: Accept EOF
start -- ( --> 2: Accept LP
start -- ) --> 3: Accept RP
start -- ; --> 4: Accept SEMI

9 Scanner DFA Example (2)
start: loop on white space or comments
start -- ! --> 5
5 -- = --> 6: Accept NEQ
5 -- other --> 7: Accept NOT
start -- < --> 8
8 -- = --> 9: Accept LEQ
8 -- other --> 10: Accept LESS
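The states of this DFA translate directly into a switch on a state variable. The sketch below is one possible encoding, not the tutorial's own code: scan_operator and the T_ constants are hypothetical names, state 0 stands for the start state, and the white-space loop on the start state is left out for brevity. Note that the "other" transitions accept without consuming the character that was looked at.

```c
#include <assert.h>
#include <stddef.h>

enum { T_NONE, T_NEQ, T_NOT, T_LEQ, T_LESS };

/* Runs the DFA on s. Returns the accepted token kind and stores the number
   of characters consumed; returns T_NONE (0 consumed) if s starts with
   neither '!' nor '<'. State numbers follow the slide. */
int scan_operator(const char *s, size_t *consumed) {
    int state = 0;
    size_t i = 0;
    assert(s != NULL && consumed != NULL);
    for (;;) {
        char c = s[i];
        switch (state) {
        case 0:                                  /* start state */
            if (c == '!') { state = 5; i++; }
            else if (c == '<') { state = 8; i++; }
            else { *consumed = 0; return T_NONE; }
            break;
        case 5:                                  /* seen '!' */
            if (c == '=') { state = 6; i++; }
            else state = 7;                      /* other: do not consume c */
            break;
        case 8:                                  /* seen '<' */
            if (c == '=') { state = 9; i++; }
            else state = 10;                     /* other: do not consume c */
            break;
        case 6:  *consumed = i; return T_NEQ;
        case 7:  *consumed = i; return T_NOT;
        case 9:  *consumed = i; return T_LEQ;
        case 10: *consumed = i; return T_LESS;
        default: *consumed = 0; return T_NONE;
        }
    }
}
```

For the input "!=x" this returns T_NEQ with 2 characters consumed; for "!x" it returns T_NOT with 1 character consumed, leaving x for the next token.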

10 Scanner DFA Example (3)
start: loop on white space or comments
start -- [0-9] --> 11
11 -- [0-9] --> 11
11 -- other --> 12: Accept INT
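A sketch of the integer DFA in C follows. The function name scan_int is hypothetical, state 0 stands for the start state, and overflow of long values is deliberately ignored to keep the example short.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Runs the INT DFA on s. Returns the number of characters consumed
   (0 if s does not start with a digit) and stores the value in *value. */
size_t scan_int(const char *s, long *value) {
    int state = 0;
    size_t i = 0;
    long v = 0;
    assert(s != NULL && value != NULL);
    while (state != 12) {
        unsigned char c = (unsigned char)s[i];
        if (isdigit(c)) {
            /* Transitions 0 -> 11 and 11 -> 11 both consume one digit. */
            v = v * 10 + (c - '0');
            i++;
            state = 11;
        } else if (state == 11) {
            state = 12;          /* "other": accept INT without consuming c */
        } else {
            return 0;            /* start state and no digit: not an INT */
        }
    }
    *value = v;
    return i;
}
```

On the input "10;" the function consumes 2 characters and stores 10, leaving the semicolon for the next token.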

11 Scanner DFA Example (4)
start: loop on white space or comments
start -- [a-zA-Z] --> 13
13 -- [a-zA-Z] --> 13
13 -- other --> 14: Accept ID or keyword
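State 14 accepts "ID or keyword", so something must decide which one it is. A common approach, assumed here and not stated on the slide, is to scan the whole word first and then look it up in a keyword table. The names WordKind and scan_word are hypothetical, and the keyword list is taken from the Tokens slide.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef enum { W_NONE, W_ID, W_IF, W_WHILE, W_RETURN, W_GOTO } WordKind;

/* Scans a run of letters from s into lexeme (at most n - 1 characters) and
   classifies it. Returns W_NONE if s does not start with a letter. */
WordKind scan_word(const char *s, char *lexeme, size_t n) {
    static const struct { const char *text; WordKind kind; } keywords[] = {
        { "if", W_IF }, { "while", W_WHILE },
        { "return", W_RETURN }, { "goto", W_GOTO }
    };
    size_t i = 0, k;
    assert(s != NULL && lexeme != NULL && n > 0);
    /* State 13: stay here while the next character is a letter. */
    while (isalpha((unsigned char)s[i]) && i < n - 1) {
        lexeme[i] = s[i];
        i++;
    }
    lexeme[i] = '\0';
    if (i == 0) return W_NONE;
    /* State 14: the word is complete; a table lookup separates keywords from IDs. */
    for (k = 0; k < sizeof keywords / sizeof keywords[0]; k++)
        if (strcmp(lexeme, keywords[k].text) == 0) return keywords[k].kind;
    return W_ID;
}
```

Because the lookup happens only after the whole word is scanned, "returned" is classified as an identifier and not as the keyword return followed by ed, which is consistent with the longest-match principle.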

12 Lex/Flex
Use Flex instead of Lex
Use Bison instead of yacc
When compiling, link to the library:
flex file.lex
gcc -o object lex.yy.c -ll
./object

13 Lex - Structure
Declarations/Definitions
%%
Rules/Productions
Each rule consists of:
- Lex expression
- white space
- C statement (optional)
%%
Additional Code/Subroutines

14 Lex – Basic operators
* - zero or more occurrences
. - any character (except newline)
.* - matches any sequence
| - separator (alternation)
+ - one or more occurrences (a+ :== aa*)
? - zero or one of something (b? :== (b | null))
[ ] - choice, so [12345] is equivalent to (1|2|3|4|5)
Note: [*+] represents a choice between star and plus; inside brackets they lose their special meaning.
[a-zA-Z] - a to z and A to Z, all the letters
\ - escape: \* matches *, and \. matches a period or decimal point
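These operators can be tried out without running Lex. The sketch below uses the POSIX regex.h interface (regcomp, regexec, regfree) with extended syntax, which agrees with Lex on the operators listed above; it is not Lex itself and assumes a POSIX system. The helper name matches is hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/* Returns 1 if the whole of text matches pattern, 0 if it does not,
   and -1 if the pattern is too long or does not compile. */
int matches(const char *pattern, const char *text) {
    char anchored[256];
    regex_t re;
    int rc;
    assert(pattern != NULL && text != NULL);
    /* Anchor both ends so the entire text must match, not just a substring. */
    if (snprintf(anchored, sizeof anchored, "^(%s)$", pattern)
            >= (int)sizeof anchored)
        return -1;
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

For example, matches("a+", "aaa") succeeds while matches("a+", "") does not, matches("[*+]", "*") succeeds because the star is an ordinary character inside brackets, and matches("\\*", "*") succeeds because the backslash escapes the star (the doubled backslash is C string syntax).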

