SCANNING
Chuen-Liang Chen
Department of Computer Science and Information Engineering
National Taiwan University
Taipei, TAIWAN

Presentation transcript:

© Chuen-Liang Chen, NTUCS&IE / 35

SCANNING

Chuen-Liang Chen
Department of Computer Science and Information Engineering
National Taiwan University
Taipei, TAIWAN

© Chuen-Liang Chen, NTUCS&IE / 36

Scanner (lexical analyzer)

primary function -- grouping input characters into tokens
called by -- parser
return --
  1. token code
  2. attribute (optional)
theoretical bases -- regular expression, finite automata
implementation
  * dedicated program (hardwired)
  * table-driven
construction
  * hand-coded
  * by generator, which limits the effort of building a scanner to specifying which tokens the scanner is to recognize
    - program [lex]
    - table + standard driver program [ScanGen]

© Chuen-Liang Chen, NTUCS&IE / 37

Regular expression (1/2)

being used to
  * specify simple set of strings (regular set)
  * specify tokens of programming language
  * program a scanner generator
string -- catenation of characters in vocabulary, denoted V
regular expression
  * meta-characters: ( ) ' * + ? |
    - have to be quoted when used as ordinary characters
  1. ∅ -- empty set
  2. λ -- set of null string
  3. s -- { string s }
  4. A | B -- alternation of corresponding regular sets
  5. A B -- catenation of corresponding regular sets
  6. A* -- Kleene closure of corresponding regular set
    - repeating zero or more times

© Chuen-Liang Chen, NTUCS&IE / 38

Regular expression (2/2)

other notations
  * A+ = A A*
  * A? = A | λ
  * Not(A) = V - A for set of characters A
  * Not(S) = V* - S for set of strings S
    - may be infinite but still regular
  * A^k = A A ... A (k times)
examples
  * comment of the form "-- anything"
      EolComment = - - ( Not(Eol) )* Eol
  * fixed decimal literal
      Lit = D+ . D+
  * identifier: begin with letter, end with letter/digit, without consecutive underlines
      ID = L ( L | D )* ( _ ( L | D )+ )*
being able to represent all finite sets and many but not all infinite sets
QUIZ: counter example?
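The identifier expression on this slide, ID = L ( L | D )* ( _ ( L | D )+ )*, can be read almost directly as a hand-coded recognizer. The following is only a sketch under assumptions: the function name `is_identifier` is hypothetical, L is taken to be what `isalpha` accepts, and D what `isdigit` accepts.

```c
#include <ctype.h>
#include <stdbool.h>

/* Returns true when s matches ID = L (L | D)* ( _ (L | D)+ )*.
 * is_identifier is a hypothetical name, not part of the slides. */
bool is_identifier(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;

    if (!isalpha(*p))            /* L: must begin with a letter */
        return false;
    p++;
    while (isalnum(*p))          /* (L | D)* */
        p++;
    while (*p == '_') {          /* ( _ (L | D)+ )* */
        p++;
        if (!isalnum(*p))        /* '_' must be followed by a letter/digit */
            return false;
        while (isalnum(*p))
            p++;
    }
    return *p == '\0';           /* the whole string must be consumed */
}
```

Each loop corresponds to one starred sub-expression. Requiring a letter or digit after every underline is what rules out both consecutive underlines and a trailing underline.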

© Chuen-Liang Chen, NTUCS&IE / 39

Finite automata

being used to recognize the tokens specified by a regular expression
consisting of
  * a finite set of states
  * a set of transitions labeled with characters in V
  * a start state
  * a set of final states
transition diagram
  [Figure: transition diagram; drawing not recoverable, surviving edge labels: Not(Eol), Eol]
transition table
  * blank: error entry
deterministic finite automata (DFA)
  * unique transition for a given state and character
  * otherwise, nondeterministic finite automata (NFA)

© Chuen-Liang Chen, NTUCS&IE / 40

From RE to NFA

rules
  * λ
  * Kleene closure
  * vocabulary (a)
  * catenation
  * alternation
[Figure: one NFA construction diagram per rule, built from boxes labeled "NFA for A" and "NFA for B"; drawings not recoverable]

© Chuen-Liang Chen, NTUCS&IE / 41

From NFA to DFA

major operation: λ-closure
example
  [Figure: NFA with transitions labeled a, b and a | b, and the DFA built from it step by step; drawings not recoverable. The DFA states shown are {1,2}, {4,5}, {3,4,5} and {5}.]
  1. λ-closure(1) = 1, 2
  2. λ-closure( 3, 4, 5 ) = 3, 4, 5
  3. λ-closure( 4, 5 ) = 4, 5
  4. λ-closure( 5 ) = 5
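The closure operation named on this slide can be sketched as a small worklist loop. Everything concrete here is an assumption made for illustration: NFA states are numbered 0..31 and held as bits of a `uint32_t`, `lambda_edges[s]` is a hypothetical array giving the states reachable from `s` by a single empty-string transition, and the function name is invented.

```c
#include <stdint.h>

#define MAX_STATES 32

/* Returns the closure of the state set `set`: every state reachable from
 * it using only empty-string (lambda) transitions. Bit i of a set stands
 * for state i. lambda_edges[s] is the set reachable from s in one lambda
 * step (hypothetical representation, not from the slides). */
uint32_t lambda_closure(uint32_t set, const uint32_t lambda_edges[MAX_STATES])
{
    uint32_t closure = set;
    uint32_t pending = set;      /* states whose edges are not yet followed */

    while (pending != 0) {
        int s = 0;
        while (((pending >> s) & 1u) == 0)   /* lowest pending state */
            s++;
        pending &= ~(UINT32_C(1) << s);

        uint32_t fresh = lambda_edges[s] & ~closure;  /* newly reached */
        closure |= fresh;
        pending |= fresh;
    }
    return closure;
}
```

In a subset construction, each DFA state would be the closure of a set of NFA states, so two DFA states are the same exactly when their bit sets are equal.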

© Chuen-Liang Chen, NTUCS&IE / 42

DFA optimization

major operation: partition states into equivalence classes according to
  * final / non-final states
  * transition functions
example
  [Figure: DFA with states A to E; drawing not recoverable]
  ( A B C D E )
  ( A B C D ) ( E )
  ( A B C ) ( D ) ( E )
  ( A C ) ( B ) ( D ) ( E )
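The partitioning step can be sketched as repeated refinement: start from final versus non-final states, then keep splitting any class whose members disagree on which class a transition leads to. The slide's own DFA did not survive in the transcript, so the usage note assumes the familiar five-state DFA for (a|b)*abb, which happens to produce the same final partition ( A C ) ( B ) ( D ) ( E ). The names `minimize` and `same_signature` and the fixed sizes are hypothetical, and the DFA is assumed to have no error entries.

```c
#include <stdbool.h>
#include <string.h>

#define NSTATES 5   /* states A..E, numbered 0..4 (assumed example size) */
#define NSYMS   2   /* input symbols a, b */

/* True when s and t are in the same class and, for every symbol, their
 * successors are in the same class as well. */
static bool same_signature(const int trans[NSTATES][NSYMS],
                           const int cls[NSTATES], int s, int t)
{
    if (cls[s] != cls[t])
        return false;
    for (int a = 0; a < NSYMS; a++)
        if (cls[trans[s][a]] != cls[trans[t][a]])
            return false;
    return true;
}

/* Fills cls[s] with the equivalence class of state s and returns the
 * number of classes. Assumes a complete DFA (no error entries). */
int minimize(const int trans[NSTATES][NSYMS], const bool final[NSTATES],
             int cls[NSTATES])
{
    int nclasses = 0;

    for (int s = 0; s < NSTATES; s++)      /* initial split: final or not */
        cls[s] = final[s] ? 1 : 0;

    for (;;) {
        int next[NSTATES];
        int count = 0;

        for (int s = 0; s < NSTATES; s++) {
            int t = 0;
            while (t < s && !same_signature(trans, cls, s, t))
                t++;
            next[s] = (t < s) ? next[t] : count++;  /* reuse or open a class */
        }
        memcpy(cls, next, sizeof next);
        if (count == nclasses)             /* no class was split: done */
            return nclasses;
        nclasses = count;
    }
}
```

With the transitions A: a→B, b→C; B: a→B, b→D; C: a→B, b→C; D: a→B, b→E; E: a→B, b→C and E as the only final state, the function reports four classes and puts A and C together. Each round only ever splits classes, so an unchanged class count means the partition is stable.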

© Chuen-Liang Chen, NTUCS&IE / 43

From DFA to scanner (1/3)

dedicated program
  * example (the EolComment automaton, with edges Not(Eol) and Eol)

    /* current_char must be an int so that it can hold EOF. */
    if (current_char == '-') {
        current_char = getchar();
        if (current_char == '-') {
            /* Skip the comment body up to the end of line (or end of input). */
            do
                current_char = getchar();
            while (current_char != '\n' && current_char != EOF);
        } else {
            ungetc(current_char, stdin);
            lexical_error(current_char);
        }
    } else
        lexical_error(current_char);
    /* Return or process valid token. */

  * ungetc() -- lookahead

© Chuen-Liang Chen, NTUCS&IE / 44

From DFA to scanner (2/3)

table-driven
  * transition table + return token code + character save/toss operation + process of valid token
  * example

    /*
     * Note: current_char is already set
     * to the current input character.
     */
    state = initial_state;
    while (TRUE) {
        next_state = T[state][current_char];
        if (next_state == ERROR)
            break;
        state = next_state;
        if (current_char == EOF)
            break;
        current_char = getchar();
    }
    if (is_final_state(state)) {
        /* Return or process valid token. */
    } else
        lexical_error(current_char);

QUIZ: where is "lookahead"?
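To make the driver loop concrete, here is a minimal sketch that fills in a transition table for the EolComment token, - - ( Not(Eol) )* Eol, and runs the same loop over a string instead of standard input. The state names, `build_table` and `scan_comment` are hypothetical, and reading from a string is a simplification chosen so the example is easy to try.

```c
#include <stddef.h>

/* Hypothetical state numbering for EolComment = - - ( Not(Eol) )* Eol. */
enum { S_START, S_DASH, S_BODY, S_DONE, NUM_STATES, ERROR = -1 };

static int T[NUM_STATES][256];       /* transition table, one column per byte */

void build_table(void)
{
    for (int s = 0; s < NUM_STATES; s++)
        for (int c = 0; c < 256; c++)
            T[s][c] = ERROR;         /* blank entry = error */
    T[S_START]['-'] = S_DASH;
    T[S_DASH]['-']  = S_BODY;
    for (int c = 0; c < 256; c++)
        T[S_BODY][c] = S_BODY;       /* Not(Eol) */
    T[S_BODY]['\n'] = S_DONE;        /* Eol */
}

/* Returns the length of the comment token at the start of input,
 * or 0 when the input does not begin with a complete comment. */
size_t scan_comment(const char *input)
{
    int state = S_START;
    size_t pos = 0;

    while (input[pos] != '\0') {
        int next_state = T[state][(unsigned char)input[pos]];
        if (next_state == ERROR)
            break;                   /* this character is not consumed */
        state = next_state;
        pos++;
    }
    return state == S_DONE ? pos : 0;
}
```

After `build_table()`, `scan_comment("-- hi\nx")` consumes the six characters up to and including the newline and leaves `x` for the next token. Only the table changes from one token set to another; the loop stays the same.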

© Chuen-Liang Chen, NTUCS&IE / 45

From DFA to scanner (3/3)

toss operation
  * example -- ( " ( Not(") | " " )* " )
      input """Hi""" is saved as "Hi"
  [Figure: transition diagram with edges labeled ", T("), Not("); drawing not recoverable]
QUIZ: how to program?
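One possible way to program the toss operation, offered as a sketch rather than the intended quiz answer: copy ("save") ordinary characters into a buffer, and skip ("toss") the two delimiting quotes plus the first quote of every doubled pair. The function name `scan_string` and its buffer interface are assumptions made for the example.

```c
#include <stddef.h>

/* Scans a quoted string at the start of `in`, where "" inside the string
 * stands for one quote character. Saved characters go to `out`; tossed
 * ones are skipped. Returns the number of input characters consumed, or
 * 0 if there is no complete string or `out` is too small.
 * scan_string is a hypothetical helper, not from the slides. */
size_t scan_string(const char *in, char *out, size_t outsize)
{
    size_t i = 0, o = 0;

    if (outsize == 0 || in[i] != '"')
        return 0;
    i++;                              /* toss the opening quote */
    for (;;) {
        if (in[i] == '\0')
            return 0;                 /* input ended before the closing quote */
        if (in[i] == '"') {
            if (in[i + 1] == '"') {
                i++;                  /* toss the first quote of a pair */
            } else {
                i++;                  /* toss the closing quote */
                break;
            }
        }
        if (o + 1 >= outsize)
            return 0;                 /* no room left for the terminator */
        out[o++] = in[i++];           /* save */
    }
    out[o] = '\0';
    return i;
}
```

Scanning `"""Hi"""` this way consumes all eight characters and leaves `"Hi"` in the buffer, which matches the example on the slide.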

© Chuen-Liang Chen, NTUCS&IE / 46

Reserved words

identifiers reserved for particular usage
approach 1
  * one regular expression per reserved word
approach 2
  * exceptions to ordinary identifiers
  * approach used in our simple example
QUIZ: comparison?
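Approach 2 can be sketched as a table lookup performed after an ordinary identifier has been scanned. The token codes below reuse the values returned in the Lex input file later in this deck (1 for identifiers, 4 to 7 for begin, end, read, write); the enum names, the table and `classify_identifier` are hypothetical, and case-insensitive matching is assumed because the Lex rules accept either case.

```c
#include <ctype.h>
#include <stddef.h>

/* Token codes taken from the Lex example; the names are hypothetical. */
enum { TOK_ID = 1, TOK_BEGIN = 4, TOK_END = 5, TOK_READ = 6, TOK_WRITE = 7 };

static const struct { const char *text; int code; } reserved[] = {
    { "begin", TOK_BEGIN }, { "end",   TOK_END   },
    { "read",  TOK_READ  }, { "write", TOK_WRITE },
};

/* Compares two strings while ignoring letter case. */
static int equal_ignoring_case(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;                  /* both strings must end together */
}

/* Called on a lexeme already recognized as an identifier: reserved words
 * are treated as exceptions, everything else stays an identifier. */
int classify_identifier(const char *lexeme)
{
    for (size_t i = 0; i < sizeof reserved / sizeof reserved[0]; i++)
        if (equal_ignoring_case(lexeme, reserved[i].text))
            return reserved[i].code;
    return TOK_ID;
}
```

The automaton only needs the single identifier pattern under this approach; the cost is one lookup per identifier, which a hash table or perfect hash could reduce for larger word lists.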

© Chuen-Liang Chen, NTUCS&IE / 47

Lexical error

recovery strategies
  * delete the characters read so far
  * delete the first character
handling of runaway string
  QUIZ: why need special handling?
  * " ( Not("|Eol) | " " )* "
  * " ( Not("|Eol) | " " )* Eol
    - print out special error message
handling of runaway comment
  * { Not({|})* }
  * { ( Not({|})* { Not({|})* )+ }
    - warning
  * { Not(})* Eof
    - error

© Chuen-Liang Chen, NTUCS&IE / 48

Lex (1/2)

input file --

    E            [Ee]
    OtherLetter  [A-DF-Za-df-z]
    Digit        [0-9]
    Letter       {E}|{OtherLetter}
    IntLit       {Digit}+
    %%
    [ \t\n]+                                  { /* delete */ }
    [Bb][Ee][Gg][Ii][Nn]                      { minor=0; return(4); }
    [Ee][Nn][Dd]                              { minor=0; return(5); }
    [Rr][Ee][Aa][Dd]                          { minor=0; return(6); }
    [Ww][Rr][Ii][Tt][Ee]                      { minor=0; return(7); }
    {Letter}({Letter}|{Digit}|_)*             { minor=0; return(1); }
    {IntLit}                                  { minor=1; return(2); }
    ({IntLit}[.]{IntLit})({E}[+-]?{IntLit})?  { minor=2; return(2); }
    \"([^\"\n]|\"\")*\"                       { stripquotes(); minor=3; return(2); }
    \"([^\"\n]|\"\")*\n                       { stripquotes(); minor=0; return(3); }
    "("                                       { minor=0; return(8); }
    ")"                                       { minor=0; return(9); }
    ";"                                       { minor=0; return(10); }
    ","                                       { minor=0; return(11); }
    ":="                                      { minor=0; return(12); }
    "+"                                       { minor=0; return(13); }
    "-"                                       { minor=0; return(14); }
    %%

notes
  * left column -- regular expression
  * braces on the right -- executed when RE is matched
  * order of the rules -- precedence (reserved words are listed before the identifier rule)
  * E, OtherLetter, Digit -- class to reduce table size

© Chuen-Liang Chen, NTUCS&IE / 49

Lex (2/2)

input file -- auxiliary routine(s)

    /*
     * Strip unwanted quotes from string in yytext;
     * adjust yyleng.
     */
    void stripquotes(void)
    {
        /* numquotes starts at 2: the opening quote and the final
           character (closing quote or newline) are always dropped. */
        int frompos, topos = 0, numquotes = 2;

        for (frompos = 1; frompos < yyleng; frompos++) {
            yytext[topos++] = yytext[frompos];
            /* A doubled quote stands for one quote: skip the second. */
            if (yytext[frompos] == '"' && yytext[frompos+1] == '"') {
                frompos++;
                numquotes++;
            }
        }
        yyleng -= numquotes;
        yytext[yyleng] = '\0';
    }

output -- a program
interface --
    int yylex( )
    char yytext[ ];
    int yyleng;