Spring 2012 61102 Compilers Software Eng. Dept. – Ort Braude Lexical Analyzer Lecturer: Esti Stein brd4.braude.ac.il/~esti2.


1 Spring 2012 61102 Compilers Software Eng. Dept. – Ort Braude Lexical Analyzer Lecturer: Esti Stein brd4.braude.ac.il/~esti2

2 winter 2010 61102 Compilers Software Eng. Dept. – Ort Braude
What is a lexical analyzer?
Read in characters and group them into tokens. [most of the compilation time is spent on lexical analysis].
Input: Source Program
Output: Stream of (token, value) pairs; Symbol table

3 Why use a lexical analyzer?
Modular design: partitioning the compiler into independent parts.
The parser deals with words (not characters).
Isolate character set dependencies:
– ASCII versus EBCDIC
Isolate representation of symbols:
– <> versus !=, { } versus begin..end

4 A token is:
A placeholder for a logical entity:
– keywords
– constants
– operators
– punctuation
– identifiers
White space and comments are not tokens.

5 Example of tokenizing
if( val1 + val2 >= 6.5) todo = false;
token   token#  value   comment
if      10              keyword
(       20              Left parenth.
val1    50      val1    identifier
+       1       +       Add op.
val2    50      val2    identifier
>=      2               Relational op.
6.5     51      6.5     Float const.
)       21              Right parenth.
todo    50      todo    identifier
=       3               Assign op.
false   50      false   identifier
;       4               separator

6 Example [program]:
token Getoken( )
{
    SkipWhiteSpace( );
    c = getchar( );
    /* the first character decides which scanning routine to call */
    if( isletter( c)) return( ScanForIdentifier( ));
    if( isdigit( c)) return( ScanForConstant( ));
    switch( c) {
        case '(': return( LEFT_PAREN);
        case ')': return( RIGHT_PAREN);
        case '+': return( ScanForAddOrIncrement( ));
        case '=': return( ScanForAssignOrEqual( ));
        case '/': return( ScanForCommentOrDivide( ));
        …
        default: return( ERROR);
    }
}
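The hand-written approach above can be tried out on the statement from slide 5. The following is only a minimal sketch in C: the function name next_token, the TOK_ names, and the choice to read from a string instead of standard input are assumptions made for illustration, while the numeric token codes are the ones from the slide 5 table.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Token codes follow the table on slide 5; the enum names are hypothetical. */
enum { TOK_ERROR = -1, TOK_EOF = 0,
       TOK_ADDOP = 1, TOK_RELOP = 2, TOK_ASSIGN = 3, TOK_SEP = 4,
       TOK_IF = 10, TOK_LPAREN = 20, TOK_RPAREN = 21,
       TOK_IDENT = 50, TOK_FLOAT = 51 };

/* Scan one token starting at *src, copy its text into lexeme (capacity cap),
   advance *src past it, and return the token code. */
int next_token(const char **src, char *lexeme, size_t cap)
{
    assert(src != NULL && *src != NULL && lexeme != NULL && cap > 0);
    const char *p = *src;
    int tok;

    while (isspace((unsigned char)*p)) p++;            /* skip white space */
    const char *start = p;

    if (*p == '\0') {
        tok = TOK_EOF;
    } else if (isalpha((unsigned char)*p)) {            /* identifier or keyword */
        while (isalnum((unsigned char)*p)) p++;
        tok = (p - start == 2 && strncmp(start, "if", 2) == 0) ? TOK_IF : TOK_IDENT;
    } else if (isdigit((unsigned char)*p)) {            /* digits, optional fraction */
        while (isdigit((unsigned char)*p)) p++;
        if (*p == '.' && isdigit((unsigned char)p[1])) {
            p++;
            while (isdigit((unsigned char)*p)) p++;
        }
        tok = TOK_FLOAT;   /* simplification: every numeric constant gets code 51 */
    } else {
        switch (*p++) {
        case '(': tok = TOK_LPAREN; break;
        case ')': tok = TOK_RPAREN; break;
        case '+': tok = TOK_ADDOP;  break;
        case '=': tok = TOK_ASSIGN; break;
        case ';': tok = TOK_SEP;    break;
        case '>': if (*p == '=') p++;                   /* ">" or ">=" */
                  tok = TOK_RELOP;  break;
        default:  tok = TOK_ERROR;  break;
        }
    }

    size_t n = (size_t)(p - start);                     /* copy the lexeme, truncating */
    if (n >= cap) n = cap - 1;
    memcpy(lexeme, start, n);
    lexeme[n] = '\0';
    *src = p;
    return tok;
}
```

Calling next_token repeatedly on "if( val1 + val2 >= 6.5) todo = false;" yields the codes in the token# column of slide 5. As on the slide, false comes back as an identifier because this sketch treats only if as a keyword.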

7 Automating:
Most tokens can easily be defined by a regular grammar:
– the user defines tokens in a form equivalent to a regular grammar
– the system converts the grammar into code.
A variety of tools exist: lex, flex..

8 Regular Expressions & Automata
See the “Technion” tutorial on automata.

9 Exercise 1:
A real number consists of two parts:
– The integer part, consisting of one or more digits. A number may not begin with a zero, unless the integer part is just zero.
– The decimal part, consisting of a decimal point followed by one or more digits.
Construct a regular expression for real numbers.
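One possible answer to the exercise, offered as a sketch and not as the official solution, is the expression (0|[1-9][0-9]*)\.[0-9]+. To make the two rules concrete, the C function below hand-codes a small automaton for that expression; the function name is_real and the state names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if the whole string s matches (0|[1-9][0-9]*)\.[0-9]+ */
bool is_real(const char *s)
{
    assert(s != NULL);
    /* START: nothing read yet      ZERO: integer part is exactly "0"
       INT:   nonzero integer part  POINT: just read the decimal point
       FRAC:  at least one fraction digit (the only accepting state) */
    enum { START, ZERO, INT, POINT, FRAC, DEAD } state = START;

    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char)*s;
        switch (state) {
        case START: state = (c == '0') ? ZERO : isdigit(c) ? INT : DEAD;   break;
        case ZERO:  state = (c == '.') ? POINT : DEAD;                     break;
        case INT:   state = isdigit(c) ? INT : (c == '.') ? POINT : DEAD;  break;
        case POINT: state = isdigit(c) ? FRAC : DEAD;                      break;
        case FRAC:  state = isdigit(c) ? FRAC : DEAD;                      break;
        default:    return false;          /* DEAD: no way back to acceptance */
        }
    }
    return state == FRAC;
}
```

The ZERO state is what enforces "may not begin with a zero": after a leading 0 only the decimal point is allowed, so 012.3 is rejected while 0.5 and 10.05 are accepted.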

10 Convert to DFA…
Converting an NDFA to a DFA
State   a       b
S       S,A     S
A       C       error
B               D,F
C       C       B,C
D       D,F

11 Converting an NDFA to a DFA [2]
State   a       b
S       SA      S
SA      SAC     S
SAC     SAC     SBC
SBC     SAC     SBCDF = F
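The conversion shown on slides 10 and 11 is the subset construction: each DFA state is a set of NDFA states, and a transition is the union of the transitions of its members. A minimal sketch in C follows, with each set stored as a bitmask. The names move and subset_construction are hypothetical, and two table rows are assumptions because the transcript does not give them clearly: D is taken to go to {D,F} on a, and F is taken to have no outgoing transitions. The first four DFA rows do not depend on either assumption.

```c
#include <assert.h>
#include <stddef.h>

/* NDFA states from slide 10, one bit each. */
enum { S = 1 << 0, A = 1 << 1, B = 1 << 2, C = 1 << 3, D = 1 << 4, F = 1 << 5 };
enum { NSTATES = 6, NSYMS = 2 };             /* symbol 0 is 'a', symbol 1 is 'b' */

/* nfa[i][sym]: successor set of the state with bit i; 0 means "error". */
static const unsigned nfa[NSTATES][NSYMS] = {
    /* S */ { S | A, S     },
    /* A */ { C,     0     },
    /* B */ { 0,     D | F },
    /* C */ { C,     B | C },
    /* D */ { D | F, 0     },   /* assumption: this row is unclear in the source */
    /* F */ { 0,     0     },   /* assumption: F has no outgoing transitions */
};

/* Union of the successors of every NDFA state contained in set. */
unsigned move(unsigned set, int sym)
{
    unsigned out = 0;
    for (int i = 0; i < NSTATES; i++)
        if (set & (1u << i))
            out |= nfa[i][sym];
    return out;
}

/* Fills dfa_sets[] with the reachable state sets (index 0 is the start set {S})
   and dfa_trans[i][sym] with the index of the target set, or -1 for error.
   Returns the number of DFA states found. */
int subset_construction(unsigned dfa_sets[], int dfa_trans[][NSYMS], int max)
{
    assert(dfa_sets != NULL && dfa_trans != NULL && max > 0);
    int count = 0;
    dfa_sets[count++] = S;
    for (int i = 0; i < count; i++) {            /* worklist: sets not yet expanded */
        for (int sym = 0; sym < NSYMS; sym++) {
            unsigned target = move(dfa_sets[i], sym);
            int j = -1;
            if (target != 0) {
                for (j = 0; j < count && dfa_sets[j] != target; j++)
                    ;                            /* look for an existing set */
                if (j == count) {                /* new set: append it */
                    assert(count < max);
                    dfa_sets[count++] = target;
                }
            }
            dfa_trans[i][sym] = j;
        }
    }
    return count;
}
```

Starting from {S}, the sets are discovered in the order S, SA, SAC, SBC, SBCDF, which matches the rows of slide 11. Any set containing F would be marked as final.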

12 The Code
S:   c = getchar( );
     if( c == 'a') goto SA;
     if( c == 'b') goto S;
     error( );
SA:  c = getchar( );
     if( c == 'a') goto SAC;
     if( c == 'b') goto S;
     error( );
SAC: c = getchar( );
     if( c == 'a') goto SAC;
     if( c == 'b') goto SBC;
     error( );
…

13 The Code [2]
token LexicalDriver( LexTable)
{
    state = laststate;
    for(;;) {
        c = NextChar( );
        /* one table lookup per input character */
        state = LexTable[ state][ c];
        if( state != error && state != finalstate) {
            AddToToken( c);
            AdvanceInput( );
        }
        else break;
    }
    if( state != finalstate) return( ERROR);
    else return( Token[ finalstate]);
}
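The table-driven driver can be made concrete with the DFA from slide 11. The sketch below is an illustration under assumptions, not the course's driver: the names scan_prefix, lex_table, and the ST_ and SYM_ constants are hypothetical, and the input comes from a string. Like the driver above, it stops as soon as the final state is reached, so it never needs a table row for F.

```c
#include <assert.h>
#include <stddef.h>

enum { ST_S, ST_SA, ST_SAC, ST_SBC, ST_F, ST_ERR };   /* DFA states of slide 11 */
enum { SYM_A, SYM_B, NSYM };

/* Transition table copied from slide 11; F and ERR have no rows because
   the driver stops when it reaches either of them. */
static const int lex_table[ST_F][NSYM] = {
    /* S   */ { ST_SA,  ST_S   },
    /* SA  */ { ST_SAC, ST_S   },
    /* SAC */ { ST_SAC, ST_SBC },
    /* SBC */ { ST_SAC, ST_F   },
};

/* Runs the DFA over input. Returns the number of characters consumed when
   the final state is reached, or -1 if an error occurs first. */
int scan_prefix(const char *input)
{
    assert(input != NULL);
    int state = ST_S;
    int len = 0;

    while (state != ST_F && state != ST_ERR) {
        char c = input[len];
        if (c == 'a')      state = lex_table[state][SYM_A];
        else if (c == 'b') state = lex_table[state][SYM_B];
        else               state = ST_ERR;     /* includes end of input */
        if (state != ST_ERR) len++;            /* accept the character */
    }
    return state == ST_F ? len : -1;
}
```

For example, scan_prefix("aabb") consumes all four characters (S, SA, SAC, SBC, F), while scan_prefix("ab") returns -1 because the input ends before F is reached. Compared with the goto version on slide 12, only the table changes when the token definitions change.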

14 Output Lexical Errors
A compiler produces a listing of the compiled program plus error messages, placed near the locations of the errors.
The errors are queued and printed once a new-line is reached.
Two ways to recover:
– Ignore the erroneous token and start a new token.
– Delete the first character read and start re-reading the input (complicated!).
Be careful not to propagate error messages!
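The queue-then-print behaviour and the first recovery strategy can be sketched in C as follows. Everything named here is an assumption for illustration: the functions is_legal, scan_line, and print_listing, the MAX_ERRORS limit, and in particular the set of legal punctuation characters, which a real language would define for itself. A run of consecutive illegal characters is reported once, which is one simple way to avoid propagating error messages.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAX_ERRORS 16   /* hypothetical limit on queued errors per line */

/* Assumed character set for this sketch only. */
static int is_legal(char c)
{
    return isalnum((unsigned char)c) || isspace((unsigned char)c) ||
           (c != '\0' && strchr("()+-*/=<>!;.{}", c) != NULL);
}

/* Scans one line, queues the column of each run of illegal characters in
   error_cols[], skips the run, and returns the number of queued errors. */
int scan_line(const char *line, int error_cols[], int max)
{
    assert(line != NULL && error_cols != NULL && max >= 0);
    int n = 0;
    for (int col = 0; line[col] != '\0' && line[col] != '\n'; col++) {
        if (is_legal(line[col]))
            continue;
        if (n < max)
            error_cols[n++] = col;               /* one message per run */
        while (line[col + 1] != '\0' && !is_legal(line[col + 1]))
            col++;                               /* ignore the rest of the run */
    }
    return n;
}

/* Prints the source line, then the queued messages, once the line is done. */
void print_listing(FILE *out, const char *line)
{
    int cols[MAX_ERRORS];
    int n = scan_line(line, cols, MAX_ERRORS);
    size_t len = strlen(line);

    fputs(line, out);
    if (len == 0 || line[len - 1] != '\n')
        fputc('\n', out);
    for (int i = 0; i < n; i++)
        fprintf(out, "%*s^ illegal character\n", cols[i], "");
}
```

With the input line "x = 5 @ y;" the listing shows the line followed by one caret under column 6. The input "a @#$ b $" produces two messages, not four, because the three adjacent illegal characters count as a single erroneous token.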

15 LEX – the Lexical Analyzer
See the “Technion” tutorial on Lex.

