C Chuen-Liang Chen, NTUCS&IE / 35 SCANNING Chuen-Liang Chen Department of Computer Science and Information Engineering National Taiwan University Taipei,


1 c Chuen-Liang Chen, NTUCS&IE / 35 SCANNING Chuen-Liang Chen Department of Computer Science and Information Engineering National Taiwan University Taipei, TAIWAN

2 Scanner (lexical analyzer)
  primary function -- grouping input characters into tokens
  called by -- parser
  return -- 1. token code  2. attribute (optional)
  theoretical bases -- regular expression, finite automata
  implementation
   - dedicated program (hardwired)
   - table-driven
  construction
   - hand-coded
   - by generator, in order to limit the effort in building a scanner by specifying which tokens the scanner is to recognize
      – program [lex]
      – table + standard driver program [ScanGen]
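
The interface described on this slide (the parser calls the scanner and gets back a token code plus an optional attribute) might be sketched in C as follows. This is only an illustration: the names Token, TokenCode and next_token, the token classes, and the choice to read from a string cursor are all assumptions, not part of the slides.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token codes; a real scanner defines one per token class. */
typedef enum { TOK_EOF, TOK_ID, TOK_INTLIT, TOK_OTHER } TokenCode;

typedef struct {
    TokenCode code;      /* what the parser switches on */
    char      text[32];  /* optional attribute: the token's spelling */
} Token;

/* Return the next token starting at *cursor and advance the cursor past it. */
Token next_token(const char **cursor)
{
    Token t;
    const char *p;
    size_t n = 0;

    assert(cursor != NULL && *cursor != NULL);
    p = *cursor;

    while (isspace((unsigned char)*p))       /* white space only separates tokens */
        p++;

    if (*p == '\0') {
        t.code = TOK_EOF;
    } else if (isalpha((unsigned char)*p)) {
        t.code = TOK_ID;                     /* letter (letter | digit)* */
        for (; isalnum((unsigned char)*p); p++)
            if (n < sizeof t.text - 1) t.text[n++] = *p;
    } else if (isdigit((unsigned char)*p)) {
        t.code = TOK_INTLIT;                 /* digit+ */
        for (; isdigit((unsigned char)*p); p++)
            if (n < sizeof t.text - 1) t.text[n++] = *p;
    } else {
        t.code = TOK_OTHER;                  /* any other single character */
        t.text[n++] = *p++;
    }
    t.text[n] = '\0';
    *cursor = p;
    return t;
}
```

A parser would call next_token(&p) repeatedly until it receives TOK_EOF, switching on code and consulting text only where the attribute matters.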

3 Regular expression (1/2)
  being used to
   - specify simple set of strings (regular set)
   - specify tokens of programming language
   - program a scanner generator
  string -- catenation of characters in vocabulary, denoted V
  regular expression
   - meta-characters: ( ) ' * + ? |
      – have to be quoted when used as ordinary characters
   1. ∅      -- empty set
   2. λ      -- set of null string
   3. s      -- { string s }
   4. A | B  -- alternation of corresponding regular sets
   5. A B    -- catenation of corresponding regular sets
   6. A*     -- Kleene closure of corresponding regular set
      – repeating zero or more times

4 Regular expression (2/2)
  other notations
   - A+ = A A*
   - A? = A | λ
   - Not(A) = V - A, for set of characters A
   - Not(S) = V* - S, for set of strings S
      – may be infinite but still regular
   - A^k = A A ... A (k times)
  examples
   - -- anything
        EolComment = - - ( Not(Eol) )* Eol
   - fixed decimal literal
        Lit = D+ . D+
   - identifier: begin with letter, end with letter/digit, without consecutive underlines
        ID = L ( L | D )* ( _ ( L | D )+ )*
  being able to represent all finite sets and many but not all infinite sets
  QUIZ: counter example?
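
To make two of the example expressions concrete, here is a minimal hand-written sketch in C that checks a whole string against ID = L ( L | D )* ( _ ( L | D )+ )* and Lit = D+ . D+. The function names are hypothetical, and L and D are assumed to be the ASCII letters and digits as classified by isalpha and isdigit.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* ID = L ( L | D )* ( _ ( L | D )+ )* */
int is_identifier(const char *s)
{
    assert(s != NULL);
    if (!isalpha((unsigned char)*s))         /* must begin with a letter */
        return 0;
    s++;
    while (isalnum((unsigned char)*s))       /* ( L | D )* */
        s++;
    while (*s == '_') {                      /* ( _ ( L | D )+ )* */
        s++;
        if (!isalnum((unsigned char)*s))     /* an underline needs a letter/digit after it */
            return 0;
        while (isalnum((unsigned char)*s))
            s++;
    }
    return *s == '\0';                       /* the whole string must be consumed */
}

/* Lit = D+ . D+ */
int is_fixed_decimal(const char *s)
{
    assert(s != NULL);
    if (!isdigit((unsigned char)*s)) return 0;
    while (isdigit((unsigned char)*s)) s++;  /* D+ */
    if (*s != '.') return 0;                 /* the literal dot */
    s++;
    if (!isdigit((unsigned char)*s)) return 0;
    while (isdigit((unsigned char)*s)) s++;  /* D+ */
    return *s == '\0';
}
```

Requiring at least one letter or digit after every underline is what rules out both a trailing underline and two underlines in a row.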

5 Finite automata
  being used to recognize the tokens specified by a regular expression
  consisting of
   - a finite set of states
   - a set of transitions labeled with characters in V
   - a start state
   - a set of final states
  transition diagram
   [diagram: states 1, 2, 3, 4; transition labels -, -, Not(Eol), Eol]
  transition table
   - blank: error entry
  deterministic finite automata (DFA)
   - unique transition for a given state and character
   - otherwise, nondeterministic finite automata (NFA)
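
A transition table can be written down directly as a C array. The sketch below assumes the diagram on this slide is the automaton for EolComment = - - ( Not(Eol) )* Eol, with edges 1 to 2 and 2 to 3 on '-', a loop on state 3 for Not(Eol), and 3 to 4 on Eol. That edge layout is inferred from the regular expression, and the identifier names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum { ERR = 0, S1, S2, S3, S4, N_STATES };      /* ERR stands for a blank (error) entry */
enum { C_DASH, C_EOL, C_OTHER, N_CLASSES };      /* columns of the table */

static int char_class(char c)
{
    return c == '-' ? C_DASH : c == '\n' ? C_EOL : C_OTHER;
}

/* T[state][class] = next state; one transition per (state, character): a DFA. */
static const int T[N_STATES][N_CLASSES] = {
    /* ERR */ { ERR, ERR, ERR },
    /* 1   */ { S2,  ERR, ERR },
    /* 2   */ { S3,  ERR, ERR },
    /* 3   */ { S3,  S4,  S3  },   /* '-' is also in Not(Eol) */
    /* 4   */ { ERR, ERR, ERR },   /* final state, no outgoing transitions */
};

/* Nonzero if the whole string s is one end-of-line comment. */
int is_eol_comment(const char *s)
{
    int state = S1;                              /* start state */
    assert(s != NULL);
    for (; *s != '\0' && state != ERR; s++)
        state = T[state][char_class(*s)];
    return state == S4;                          /* must stop in the final state */
}
```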

6 From RE to NFA
  rules
   - λ
   - Kleene closure
   - vocabulary (a single character a)
   - catenation
   - alternation
  [diagrams: one NFA fragment per rule, built from "NFA for A" and "NFA for B"]
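
Since the diagrams for these rules did not survive in the transcript, the following C sketch uses the usual Thompson-style construction, which may differ in detail from the slide's drawings. Every fragment has one start and one accept state, and the rules glue fragments together with λ edges. All names (Frag, nfa_char, nfa_cat, nfa_alt, nfa_star, nfa_match) and the fixed-size global edge list are assumptions made for brevity.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_STATES 64              /* state sets fit in one 64-bit word */
#define MAX_EDGES  256
#define LAMBDA     '\0'            /* label used for λ transitions */

typedef struct { int from, to; char label; } Edge;
typedef struct { int start, accept; } Frag;

static Edge edges[MAX_EDGES];
static int  n_edges = 0, n_states = 0;

static int new_state(void) { assert(n_states < MAX_STATES); return n_states++; }

static void add_edge(int from, int to, char label)
{
    assert(n_edges < MAX_EDGES);
    edges[n_edges].from = from;
    edges[n_edges].to = to;
    edges[n_edges].label = label;
    n_edges++;
}

static Frag new_frag(void)
{
    Frag f;
    f.start = new_state();
    f.accept = new_state();
    return f;
}

Frag nfa_char(char a)              /* vocabulary: start --a--> accept */
{
    Frag f = new_frag();
    add_edge(f.start, f.accept, a);
    return f;
}

Frag nfa_cat(Frag a, Frag b)       /* A B: A's accept --λ--> B's start */
{
    Frag f;
    f.start = a.start;
    f.accept = b.accept;
    add_edge(a.accept, b.start, LAMBDA);
    return f;
}

Frag nfa_alt(Frag a, Frag b)       /* A | B: new start branches to both */
{
    Frag f = new_frag();
    add_edge(f.start, a.start, LAMBDA);
    add_edge(f.start, b.start, LAMBDA);
    add_edge(a.accept, f.accept, LAMBDA);
    add_edge(b.accept, f.accept, LAMBDA);
    return f;
}

Frag nfa_star(Frag a)              /* A*: skip A entirely, or loop back */
{
    Frag f = new_frag();
    add_edge(f.start, a.start, LAMBDA);
    add_edge(f.start, f.accept, LAMBDA);   /* zero repetitions */
    add_edge(a.accept, a.start, LAMBDA);   /* one more repetition */
    add_edge(a.accept, f.accept, LAMBDA);
    return f;
}

/* Add every state reachable through λ edges (bit i = state i). */
static unsigned long long closure(unsigned long long set)
{
    int changed = 1;
    while (changed) {
        changed = 0;
        for (int i = 0; i < n_edges; i++)
            if (edges[i].label == LAMBDA &&
                ((set >> edges[i].from) & 1) && !((set >> edges[i].to) & 1)) {
                set |= 1ULL << edges[i].to;
                changed = 1;
            }
    }
    return set;
}

/* Simulate the NFA; nonzero if the whole string s is accepted. */
int nfa_match(Frag f, const char *s)
{
    unsigned long long cur = closure(1ULL << f.start);
    for (; *s != '\0'; s++) {
        unsigned long long next = 0;
        for (int i = 0; i < n_edges; i++)
            if (edges[i].label == *s && ((cur >> edges[i].from) & 1))
                next |= 1ULL << edges[i].to;
        cur = closure(next);
    }
    return (int)((cur >> f.accept) & 1);
}
```

For example, nfa_cat(nfa_star(nfa_alt(nfa_char('a'), nfa_char('b'))), nfa_char('a')) builds an NFA for (a | b)* a, which nfa_match can then run directly without first converting it to a DFA.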

7 From NFA to DFA
  major operation: λ-closure
  example
   [diagrams: NFA with states 1-5 and transition labels a, b, a | b; resulting DFA with states {1,2}, {3,4,5}, {4,5}, {5} and transition labels a, b, a | b]
   1. λ-closure( 1 ) = 1, 2
   2. λ-closure( 3, 4, 5 ) = 3, 4, 5
   3. λ-closure( 4, 5 ) = 5
   4. λ-closure( 5 ) = 5
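
The subset construction built on λ-closure can be sketched in C as below. The NFA here is a small hypothetical one (states 0 to 4, accepting exactly "ab" and "aa"), not the NFA from the slide, whose diagram is not recoverable from the transcript. State sets are stored as bit masks, and all identifiers are assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define N_NFA   5                  /* NFA states 0..4, start state 0 */
#define N_SYM   2                  /* alphabet: 'a' -> 0, 'b' -> 1 */
#define MAX_DFA 32                 /* at most 2^5 subsets */

typedef unsigned set_t;            /* bit i set = NFA state i is in the subset */

/* Hypothetical NFA: 0 -λ-> 1, 0 -a-> 2, 1 -a-> 3, 2 -b-> 4, 3 -a-> 4. */
static const set_t nfa_lambda[N_NFA] = { 1u << 1, 0, 0, 0, 0 };
static const set_t nfa_move[N_NFA][N_SYM] = {
    { 1u << 2, 0 },
    { 1u << 3, 0 },
    { 0, 1u << 4 },
    { 1u << 4, 0 },
    { 0, 0 },
};
static const set_t nfa_final = 1u << 4;

/* Smallest superset of s that is closed under λ transitions. */
set_t lambda_closure(set_t s)
{
    set_t prev;
    do {
        prev = s;
        for (int i = 0; i < N_NFA; i++)
            if (s & (1u << i))
                s |= nfa_lambda[i];
    } while (s != prev);
    return s;
}

set_t dfa_state[MAX_DFA];          /* each DFA state is a set of NFA states */
int   dfa_next[MAX_DFA][N_SYM];    /* -1 = error entry */
int   n_dfa;

void subset_construction(void)
{
    n_dfa = 0;
    dfa_state[n_dfa++] = lambda_closure(1u << 0);
    for (int d = 0; d < n_dfa; d++) {            /* n_dfa grows as states are found */
        for (int c = 0; c < N_SYM; c++) {
            set_t target = 0;
            for (int i = 0; i < N_NFA; i++)      /* move(dfa_state[d], c) */
                if (dfa_state[d] & (1u << i))
                    target |= nfa_move[i][c];
            target = lambda_closure(target);
            dfa_next[d][c] = -1;
            if (target == 0)
                continue;                        /* no transition: leave the entry blank */
            int k = 0;
            while (k < n_dfa && dfa_state[k] != target)
                k++;
            if (k == n_dfa) {                    /* a subset not seen before */
                assert(n_dfa < MAX_DFA);
                dfa_state[n_dfa++] = target;
            }
            dfa_next[d][c] = k;
        }
    }
}

/* Run the constructed DFA; call subset_construction() first. */
int dfa_accepts(const char *s)
{
    int d = 0;
    for (; *s != '\0'; s++) {
        if (*s != 'a' && *s != 'b') return 0;
        d = dfa_next[d][*s - 'a'];
        if (d < 0) return 0;
    }
    return (dfa_state[d] & nfa_final) != 0;      /* final if it contains an NFA final state */
}
```

For this NFA the construction yields three DFA states: {0,1}, {2,3} and {4}.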

8 DFA optimization
  major operation: partition states into equivalent classes according to
   - final / non-final states
   - transition functions
  example
   ( A B C D E )
   -> ( A B C D ) ( E )
   -> ( A B C ) ( D ) ( E )
   -> ( A C ) ( B ) ( D ) ( E )
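
The partitioning step can be sketched in C as repeated refinement: two states stay in the same class only if they are currently in the same class and each input character takes them to the same class. The slide's DFA diagram is not preserved, so the transition table below is an assumption: it is the familiar textbook DFA for (a | b)* a b b, which happens to go through the same sequence of partitions as the example above. The function name minimize is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define N_STATES 5                 /* A, B, C, D, E = 0..4 */
#define N_SYM    2                 /* a = 0, b = 1 */

/* Assumed DFA (for (a | b)* a b b); every entry is filled, E is the only final state. */
static const int next_state[N_STATES][N_SYM] = {
    /* A */ { 1, 2 },
    /* B */ { 1, 3 },
    /* C */ { 1, 2 },
    /* D */ { 1, 4 },
    /* E */ { 1, 2 },
};
static const int is_final[N_STATES] = { 0, 0, 0, 0, 1 };

/* Store the class number of state i in cls[i]; return the number of classes. */
int minimize(int cls[N_STATES])
{
    int count = 0;

    assert(cls != NULL);
    for (int i = 0; i < N_STATES; i++)           /* first split: final / non-final */
        cls[i] = is_final[i];

    for (;;) {
        int new_cls[N_STATES], new_count = 0;
        for (int i = 0; i < N_STATES; i++) {
            int j;
            for (j = 0; j < i; j++) {            /* look for an earlier equivalent state */
                int same = (cls[i] == cls[j]);
                for (int c = 0; same && c < N_SYM; c++)
                    same = (cls[next_state[i][c]] == cls[next_state[j][c]]);
                if (same)
                    break;
            }
            new_cls[i] = (j < i) ? new_cls[j] : new_count++;
        }
        for (int i = 0; i < N_STATES; i++)
            cls[i] = new_cls[i];
        if (new_count == count)                  /* no class was split: done */
            break;
        count = new_count;
    }
    return count;
}
```

Because refinement can only split classes, an unchanged class count means the partition is stable. Here the result has four classes, with A and C merged.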

9 From DFA to scanner (1/3)
  dedicated program
   - example
       if (current_char == '-') {
           current_char = getchar();
           if (current_char == '-') {
               /* Inside the comment: skip to end of line
                  (stop at EOF too, so the loop always terminates). */
               do
                   current_char = getchar();
               while (current_char != '\n' && current_char != EOF);
           } else {
               /* A single '-' is not a comment: push the character back. */
               ungetc(current_char, stdin);
               lexical_error(current_char);
           }
       } else
           lexical_error(current_char);
       /* Return or process valid token. */
   - ungetc() -- lookahead
   [diagram: states 1, 2, 3, 4; transition labels -, -, Not(Eol), Eol]

10 From DFA to scanner (2/3)
  table-driven
   - transition table + return token code + character save/toss operation + process of valid token
   - example
       /*
        * Note: current_char is already set
        * to the current input character.
        * T must also have a column for EOF.
        */
       state = initial_state;
       while (TRUE) {
           next_state = T[state][current_char];
           if (next_state == ERROR)
               break;
           state = next_state;
           if (current_char == EOF)
               break;
           current_char = getchar();
       }
       if (is_final_state(state)) {
           /* Return or process valid token. */
       } else
           lexical_error(current_char);
  QUIZ: where is "lookahead"?
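
A runnable variant of this driver loop, scanning from a string instead of standard input, might look like the following sketch. The two-token DFA (identifiers and integer literals), the character classes, and all names are assumptions chosen for illustration. One way to read the quiz: the character that produces the ERROR entry has been inspected but not consumed, so it serves as the lookahead.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

enum { ERROR = -1, START, IN_ID, IN_INT, N_STATES };
enum { C_LETTER, C_DIGIT, C_OTHER, N_CLASSES };

/* Hypothetical DFA: IN_ID accepts letter (letter | digit)*, IN_INT accepts digit+. */
static const int T[N_STATES][N_CLASSES] = {
    /* START  */ { IN_ID, IN_INT, ERROR },
    /* IN_ID  */ { IN_ID, IN_ID,  ERROR },
    /* IN_INT */ { ERROR, IN_INT, ERROR },
};

static int char_class(unsigned char c)
{
    if (isalpha(c)) return C_LETTER;
    if (isdigit(c)) return C_DIGIT;
    return C_OTHER;                  /* includes the terminating '\0' */
}

static int is_final_state(int state)
{
    return state == IN_ID || state == IN_INT;
}

/*
 * Scan one token starting at *pp.  On success return the final state and
 * leave *pp at the first character that is not part of the token.
 * On failure return ERROR and leave *pp unchanged.
 */
int scan_token(const char **pp)
{
    const char *p;
    int state = START;

    assert(pp != NULL && *pp != NULL);
    p = *pp;
    for (;;) {
        int next_state = T[state][char_class((unsigned char)*p)];
        if (next_state == ERROR)
            break;                   /* *p was only looked at, never consumed */
        state = next_state;
        p++;
    }
    if (!is_final_state(state))
        return ERROR;
    *pp = p;
    return state;
}
```

Swapping in a different token set only means replacing T, char_class and is_final_state; the driver loop itself stays the same, which is the point of the table-driven approach.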

11 From DFA to scanner (3/3)
  toss operation
   - example -- ( " ( Not(") | " " )* " )
       """Hi"""  ->  "Hi"
   [diagram: transitions labeled ", T( " ), Not( " ); T marks a character that is tossed]
  QUIZ: how to program?
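
One possible way to program the save/toss decisions, sketched in C under the assumption that the scanner copies saved characters into a caller-supplied buffer: the delimiting quotes and the first quote of each doubled pair are tossed, and everything else is saved. The name scan_string and its signature are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Scan a string literal starting at src and store its value in out.
 * Returns the number of source characters consumed, or -1 if src does not
 * start with a quote or the closing quote is missing.
 */
int scan_string(const char *src, char *out, size_t out_size)
{
    const char *p;
    size_t n = 0;

    assert(src != NULL && out != NULL && out_size > 0);
    p = src;
    if (*p != '"')
        return -1;
    p++;                                     /* toss the opening quote */
    for (;;) {
        if (*p == '\0')
            return -1;                       /* input ended inside the string */
        if (*p == '"') {
            if (p[1] == '"') {
                p++;                         /* toss the first quote of a pair */
            } else {
                p++;                         /* toss the closing quote */
                break;
            }
        }
        if (n + 1 < out_size)
            out[n++] = *p;                   /* save */
        p++;
    }
    out[n] = '\0';
    return (int)(p - src);
}
```

Tossing while scanning avoids a second pass over the token text, in contrast to a clean-up routine that runs after the whole literal has been matched.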

12 Reserved words
  identifiers reserved for particular usage
  approach 1
   - one reserved word, one regular expression
  approach 2
   - exceptions to ordinary identifiers
   - approach used in our simple example
  QUIZ: comparison?
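
Approach 2 can be sketched in C as a table lookup performed after an ordinary identifier has been scanned. The reserved-word list, the token code values, and the name classify_identifier are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token codes. */
enum { TOK_ID = 1, TOK_BEGIN = 4, TOK_END = 5, TOK_READ = 6, TOK_WRITE = 7 };

/* The exception list: identifiers that are really reserved words. */
static const struct { const char *word; int code; } reserved[] = {
    { "begin", TOK_BEGIN },
    { "end",   TOK_END   },
    { "read",  TOK_READ  },
    { "write", TOK_WRITE },
};

/* Called with the text of a token that matched the identifier pattern. */
int classify_identifier(const char *text)
{
    assert(text != NULL);
    for (size_t i = 0; i < sizeof reserved / sizeof reserved[0]; i++)
        if (strcmp(text, reserved[i].word) == 0)
            return reserved[i].code;
    return TOK_ID;                   /* not an exception: ordinary identifier */
}
```

With this approach the automaton stays small because it only needs the identifier pattern; the cost is one lookup per identifier.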

13 Lexical error recovery
  strategies
   - delete the characters read so far
   - delete the first character
  handling of runaway string
   - QUIZ: why need special handling?
   - " ( Not("|Eol) | " " )* "
   - " ( Not("|Eol) | " " )* Eol
      – print out special error message
  handling of runaway comment
   - { Not({|})* }
   - { ( Not({|})* { Not({|})* )+ }
      – warning
   - { Not(})* Eof
      – error
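
The two string patterns above differ only in how they end, so a scanner can tell a legal string from a runaway one with a single pass. The C sketch below illustrates that idea; the names StrStatus and check_string are hypothetical, and treating end of input like Eol is an added assumption.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { STR_OK, STR_RUNAWAY, STR_NOT_STRING } StrStatus;

/*
 * Classify the text at s and store the number of characters matched in *len.
 *   STR_OK      matches  " ( Not("|Eol) | "" )* "
 *   STR_RUNAWAY matches  " ( Not("|Eol) | "" )* Eol   (or hits end of input)
 */
StrStatus check_string(const char *s, size_t *len)
{
    const char *p;

    assert(s != NULL && len != NULL);
    p = s;
    *len = 0;
    if (*p != '"')
        return STR_NOT_STRING;
    p++;
    for (;;) {
        if (*p == '\n' || *p == '\0') {      /* line ended before the closing quote */
            if (*p == '\n')
                p++;                         /* the Eol is part of the runaway token */
            *len = (size_t)(p - s);
            return STR_RUNAWAY;              /* caller prints the special message */
        }
        if (*p == '"') {
            if (p[1] == '"') {               /* doubled quote stays inside the string */
                p += 2;
                continue;
            }
            *len = (size_t)(p + 1 - s);
            return STR_OK;
        }
        p++;
    }
}
```

Matching the runaway form as a token of its own lets the scanner report one precise error and resume at the next line, rather than discarding characters one at a time.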

14 Lex (1/2)
  input file --
    E            [Ee]
    OtherLetter  [A-DF-Za-df-z]
    Digit        [0-9]
    Letter       {E}|{OtherLetter}
    IntLit       {Digit}+
    %%
    [ \t\n]+                                   { /* delete */ }
    [Bb][Ee][Gg][Ii][Nn]                       { minor=0; return(4); }
    [Ee][Nn][Dd]                               { minor=0; return(5); }
    [Rr][Ee][Aa][Dd]                           { minor=0; return(6); }
    [Ww][Rr][Ii][Tt][Ee]                       { minor=0; return(7); }
    {Letter}({Letter}|{Digit}|_)*              { minor=0; return(1); }
    {IntLit}                                   { minor=1; return(2); }
    ({IntLit}[.]{IntLit})({E}[+-]?{IntLit})?   { minor=2; return(2); }
    \"([^\"\n]|\"\")*\"                        { stripquotes(); minor=3; return(2); }
    \"([^\"\n]|\"\")*\n                        { stripquotes(); minor=0; return(3); }
    "("                                        { minor=0; return(8); }
    ")"                                        { minor=0; return(9); }
    ";"                                        { minor=0; return(10); }
    ","                                        { minor=0; return(11); }
    ":="                                       { minor=0; return(12); }
    "+"                                        { minor=0; return(13); }
    "-"                                        { minor=0; return(14); }
    %%
  annotations
   - regular expression (left column)
   - executed when RE is matched (action in braces)
   - precedence (order of the rules)
   - class -- to reduce table size

15 Lex (2/2)
  input file -- auxiliary routine(s)
    /* Strip unwanted quotes from string in yytext; adjust yyleng. */
    void stripquotes(void)
    {
        /* numquotes starts at 2 for the opening and closing delimiters. */
        int frompos, topos = 0, numquotes = 2;
        for (frompos = 1; frompos < yyleng; frompos++) {
            yytext[topos++] = yytext[frompos];
            /* A doubled quote stands for one quote: copy one, skip the other. */
            if (yytext[frompos] == '"' && yytext[frompos+1] == '"') {
                frompos++;
                numquotes++;
            }
        }
        yyleng -= numquotes;
        yytext[yyleng] = '\0';
    }
  output -- a program
  interface -- int yylex( );  char yytext[ ];  int yyleng;

