Scanning & FLEX CPSC 388 Ellen Walker Hiram College.

Scanning (review)
Input: characters from the source code
Output: tokens
– Keywords: IF, THEN, ELSE, FOR, …
– Symbols: PLUS, LBRACE, SEMI, …
– Variable tokens: ID, NUM (augmented with a string or numeric value)

Token Class (partial)
class Token {
public:
    TokenType tokenval;
    string tokenchars;
    double numval;
};
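The class above uses a `TokenType` that the slide does not define. Below is a minimal compilable sketch. It assumes a hypothetical `TokenType` enumeration built from the token kinds on the review slide. The enumerator names, the constructor and the `tokenTypeName` helper are illustrative additions, not the course's code.

```cpp
#include <string>
#include <utility>

// Hypothetical token kinds, taken from the categories on the review slide.
enum class TokenType { IF, THEN, ELSE, FOR, PLUS, LBRACE, SEMI, ID, NUM };

// The slide's Token class, completed so that it compiles on its own.
class Token {
public:
    TokenType tokenval;       // which kind of token this is
    std::string tokenchars;   // the lexeme, e.g. "count" for an ID
    double numval;            // numeric value; meaningful only for NUM

    Token(TokenType type, std::string chars, double value = 0.0)
        : tokenval(type), tokenchars(std::move(chars)), numval(value) {}
};

// Hypothetical helper that names a token kind, e.g. for debugging output.
const char *tokenTypeName(TokenType t) {
    switch (t) {
        case TokenType::IF:     return "IF";
        case TokenType::THEN:   return "THEN";
        case TokenType::ELSE:   return "ELSE";
        case TokenType::FOR:    return "FOR";
        case TokenType::PLUS:   return "PLUS";
        case TokenType::LBRACE: return "LBRACE";
        case TokenType::SEMI:   return "SEMI";
        case TokenType::ID:     return "ID";
        case TokenType::NUM:    return "NUM";
    }
    return "UNKNOWN";
}
```

Only NUM tokens need `numval`. Other tokens can rely on the default argument.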

getToken(): A Scanning Function
Token *getToken(istream &sin)
– Reads characters from sin until a complete token is extracted, then returns the token
– Usually called by the parser
– Note: the version in the book uses global variables and returns only the token type

Using getToken (Review)
Token *myToken = getToken(cin);
while (myToken != NULL) {
    // process the token
    switch (myToken->tokenval) {
        // cases for each token type
    }
    myToken = getToken(cin);
}
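Here is a minimal sketch of what `getToken` might look like. It assumes a tiny hypothetical token set: identifiers, integers, `+`, `;` and `{`. It reads characters with `get()` and looks ahead with `peek()`, so the character that ends a token stays in the stream for the next call. `countTokens` is a hypothetical wrapper around the loop from the slide.

```cpp
#include <cctype>
#include <cstdio>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical token kinds for this sketch (a small subset).
enum class TokenType { ID, NUM, PLUS, SEMI, LBRACE, ERROR };

struct Token {
    TokenType tokenval;
    std::string tokenchars;
    double numval;
};

// Reads one token from sin; returns nullptr at end of input.
// The result is allocated with new, so the caller must delete it.
Token *getToken(std::istream &sin) {
    int c = sin.get();
    while (c != EOF && std::isspace(c)) c = sin.get();   // skip white space
    if (c == EOF) return nullptr;

    std::string text(1, static_cast<char>(c));
    if (std::isalpha(c)) {            // identifier: letter (letter|digit)*
        while (std::isalnum(sin.peek())) text += static_cast<char>(sin.get());
        return new Token{TokenType::ID, text, 0.0};
    }
    if (std::isdigit(c)) {            // integer: digit+
        while (std::isdigit(sin.peek())) text += static_cast<char>(sin.get());
        return new Token{TokenType::NUM, text, std::stod(text)};
    }
    switch (c) {                      // single-character symbols
        case '+': return new Token{TokenType::PLUS, text, 0.0};
        case ';': return new Token{TokenType::SEMI, text, 0.0};
        case '{': return new Token{TokenType::LBRACE, text, 0.0};
        default:  return new Token{TokenType::ERROR, text, 0.0};
    }
}

// The slide's loop, counting tokens instead of processing them.
int countTokens(const std::string &source) {
    std::istringstream in(source);
    int count = 0;
    Token *myToken = getToken(in);
    while (myToken != nullptr) {
        switch (myToken->tokenval) {
            default: ++count; break;   // real code would have a case per token type
        }
        delete myToken;                // getToken allocated it with new
        myToken = getToken(in);
    }
    return count;
}
```

Each call allocates a token with `new`, so the loop deletes each token after processing it. A production scanner might return tokens by value instead.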

Result of getToken (Review)

Regular Expressions for Common Tokens
Special characters: the characters themselves (escaped where they are regular-expression operators, e.g. \+, \*)
Identifier: [a-zA-Z][a-zA-Z0-9_]*
Numbers:
– Int: 0|[1-9][0-9]*
– Float: (0|[1-9][0-9]*)(ε|\.[0-9]*)
– Scientific: (0|[1-9][0-9]*)(ε|\.[0-9]*)(E|e)(+|-|ε)[1-9][0-9]*
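One way to check these expressions is C++'s `std::regex` with its default ECMAScript syntax. The sketch below assumes that each ε-alternative `(ε|X)` is written as an optional group `(X)?` and that the literal dot is escaped. `std::regex_match` succeeds only if the whole lexeme matches, which is what a token check needs. The variable and function names are hypothetical.

```cpp
#include <regex>
#include <string>

// ECMAScript versions of the slide's expressions; "(ε|X)" becomes "(X)?".
const std::regex identifierRe(R"([a-zA-Z][a-zA-Z0-9_]*)");
const std::regex intRe(R"(0|[1-9][0-9]*)");
const std::regex floatRe(R"((0|[1-9][0-9]*)(\.[0-9]*)?)");
const std::regex scientificRe(R"((0|[1-9][0-9]*)(\.[0-9]*)?[Ee][+-]?[1-9][0-9]*)");

// True only if the entire lexeme is in the language of re.
bool matches(const std::regex &re, const std::string &lexeme) {
    return std::regex_match(lexeme, re);
}
```

For example, `matches(intRe, "007")` is false, because a leading zero is only allowed for the number 0 itself.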

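Reg. Exp. for Comments
Comment to end of line
– //[^\n]*   (the last part means: any character except \n, zero or more times)
/* … */ comment (~x means "any character other than x")
– With delimiters ab … ba: ab(~b|b+~(a|b))*b+a
– In C syntax: /\*(~\*|\*+~(\*|/))*\*+/   <--- needs escapes!
– Does not require matching of "inner" /* */ pairs (comments do not nest)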
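The C comment expressions can be checked the same way. This sketch assumes `std::regex` with ECMAScript syntax, where `[^*]` plays the role of `~\*` and `[^*/]` plays the role of `~(\*|/)`. `isWholeComment` is a hypothetical helper name.

```cpp
#include <regex>
#include <string>

// Line comment: "//" then any characters except newline.
const std::regex lineCommentRe(R"(//[^\n]*)");

// Block comment: "/*", then non-'*' characters or runs of '*' followed by
// something other than '*' or '/', then one or more '*' and the final '/'.
const std::regex blockCommentRe(R"(/\*([^*]|\*+[^*/])*\*+/)");

// True if the whole text is exactly one comment of either kind.
bool isWholeComment(const std::string &text) {
    return std::regex_match(text, lineCommentRe) ||
           std::regex_match(text, blockCommentRe);
}
```

The runs of `*` matter: a comment ends at its first `*/`, so `isWholeComment("/* **/ x */")` is false.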

Comments in Practice
Often handled by "ad-hoc" methods: the scanner simply loops, ignoring characters from /* to */
– If the character is not '*', ignore it
– Else, if the next character is not '/', ignore the '*' (without consuming the next character, which may start "*/")
– Else, ignore the "*/" and return to scanning normally
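The ad-hoc loop translates directly into code. This sketch assumes the scanner has already consumed the opening `/*` from a `std::istream`. Both function names are hypothetical. Using `peek()` means a `*` that is not followed by `/` does not swallow the next character.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Precondition: the opening "/*" has already been consumed.
// Skips up to and including the closing "*/"; false if input runs out first.
bool skipBlockComment(std::istream &in) {
    int c;
    while ((c = in.get()) != std::char_traits<char>::eof()) {
        if (c != '*') continue;        // not '*': ignore it
        if (in.peek() == '/') {        // "*/": consume the '/' and stop
            in.get();
            return true;
        }
        // '*' not followed by '/': ignore only the '*'; the next character
        // is examined on the next iteration, since it may start "*/".
    }
    return false;                      // unterminated comment
}

// Hypothetical usage helper: skip a comment at the start of source and
// return the text after it ("" if the comment is unterminated).
std::string textAfterComment(const std::string &source) {
    std::istringstream in(source);
    if (in.get() != '/' || in.get() != '*') return source;  // no comment here
    if (!skipBlockComment(in)) return "";
    std::string rest;
    std::getline(in, rest, '\0');      // read everything that remains
    return rest;
}
```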

Delimiters and Ambiguity
Comments are not totally ignored!
– "fo/**/r" is not the keyword "for"!
Principle of longest substring ("maximal munch")
– "fork" is not "for" followed by "k"
Disallow keywords as identifiers
– Scan an identifier, then look it up in a keyword table, instead of writing the keywords explicitly into the regular expressions
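Maximal munch plus keyword lookup can be sketched as follows. The sketch assumes a hypothetical keyword table (here just `if`, `then`, `else`, `for`). `classifyWord` is a hypothetical function that also reports where the identifier ends.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical reserved-word table; a real scanner uses its language's list.
const std::unordered_set<std::string> keywords = {"if", "then", "else", "for"};

// Scans the longest identifier starting at pos (maximal munch), sets end to
// the index just past it, then looks the lexeme up in the keyword table.
// Returns "KEYWORD", "ID", or "" if no identifier starts at pos.
std::string classifyWord(const std::string &src, std::size_t pos, std::size_t &end) {
    end = pos;
    if (end >= src.size() || !std::isalpha(static_cast<unsigned char>(src[end])))
        return "";
    while (end < src.size() && std::isalnum(static_cast<unsigned char>(src[end])))
        ++end;                         // keep going while the identifier continues
    std::string word = src.substr(pos, end - pos);
    return keywords.count(word) ? "KEYWORD" : "ID";
}
```

With this sketch, "fork" is one ID ending at index 4. In "fo/**/r" the identifier stops at the '/', so the comment really does separate "fo" from "r".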

FORTRAN's Mistakes
White space was ignored (no delimiters)
– DO99I=1.2 means DO99I = 1.2 (an assignment), vs.
– DO99I=1,2 means DO 99 I = 1, 2 (a loop)
No reserved words
– IF(IF.EQ.0)THENTHEN=17
Result: arbitrary backtracking (or lookahead) is needed!

TINY Lexemes
Reserved words: if, then, else, end, repeat, until, read, write
Symbols: +, -, *, /, =, <, (, ), ;, :=
Other: number (integer only), identifier (letters only)
Comment: { … }
The principle of longest substring holds

TINY DFA

Using the TINY DFA
Implement the DFA directly or with a table
Each call to getToken() starts at the current point of the string and scans until no transition is possible
– If a final state is reached, return the token determined by the transition into that final state
– Otherwise, report an error
Characters on transitions marked [ ] are looked at but not consumed; they remain as the start of the next token

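DFA pseudocode (one call to getToken)
state = start_state
while (chars available) {
    last_state = state
    state = next_state(state, next_char)   // look at next_char; consume it only if a transition exists
    if (state == null) return final(last_state)
}
return final(state)
Here final(s) returns the token for state s, or reports an error if s is not a final state.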
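The pseudocode can be made concrete for a subset of TINY. This sketch uses a hand-coded transition function rather than the full TINY DFA. It covers letter-only identifiers, integers, `:=` and the single-character symbols, but not `{…}` comments or reserved-word lookup. The state names and functions are hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// States of a hypothetical DFA for part of TINY; DEAD = no transition.
enum State { START, IN_ID, IN_NUM, SAW_COLON, ASSIGN, SYMBOL, DEAD };

State nextState(State s, char c) {
    unsigned char u = static_cast<unsigned char>(c);
    switch (s) {
        case START:
            if (std::isalpha(u)) return IN_ID;
            if (std::isdigit(u)) return IN_NUM;
            if (c == ':') return SAW_COLON;
            if (std::string("+-*/=<();").find(c) != std::string::npos) return SYMBOL;
            return DEAD;
        case IN_ID:     return std::isalpha(u) ? IN_ID : DEAD;   // letters only in TINY
        case IN_NUM:    return std::isdigit(u) ? IN_NUM : DEAD;
        case SAW_COLON: return c == '=' ? ASSIGN : DEAD;
        default:        return DEAD;                              // ASSIGN, SYMBOL, DEAD
    }
}

// final(s) from the pseudocode: the token for s, or "ERROR" if not final.
std::string finalToken(State s) {
    switch (s) {
        case IN_ID:  return "ID";
        case IN_NUM: return "NUM";
        case ASSIGN: return "ASSIGN";
        case SYMBOL: return "SYMBOL";
        default:     return "ERROR";
    }
}

// One call = one token. The character with no transition is not consumed.
std::string scanToken(const std::string &src, std::size_t &pos) {
    State state = START;
    while (pos < src.size()) {
        State next = nextState(state, src[pos]);
        if (next == DEAD) break;
        state = next;
        ++pos;                         // transition taken: consume the char
    }
    if (state == START) {              // first char fits no token at all
        ++pos;                         // skip it so scanning can continue
        return "ERROR";
    }
    return finalToken(state);
}

// Repeatedly skips white space and scans one token.
std::vector<std::string> tokenize(const std::string &src) {
    std::vector<std::string> kinds;
    std::size_t pos = 0;
    while (true) {
        while (pos < src.size() && std::isspace(static_cast<unsigned char>(src[pos]))) ++pos;
        if (pos >= src.size()) break;
        kinds.push_back(scanToken(src, pos));
    }
    return kinds;
}
```

Because TINY identifiers are letters only, `tokenize("x1")` yields an ID followed by a NUM.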

LEX (FLEX)
FLEX generates a scanner automatically!
– Input: a regular expression describing each token, plus optional additional code
– Output: lex.yy.c, which includes the function yylex() for scanning (like getToken)

DFA Pseudocode (recognizing a whole string)
state = initial_state
while (chars in string) {
    c = next char from string
    state = next_state[state][c]
    if (state == error) return REJECT
}
if (final[state]) return ACCEPT else return REJECT
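Below is a table-driven version of this pseudocode, sketched for a hypothetical identifier DFA: a letter followed by letters or digits, with no underscore. To keep the table small, its columns are character classes rather than individual characters, and state 2 is an explicit error state. These are design assumptions, not part of the slide.

```cpp
#include <array>
#include <cctype>
#include <string>

// Columns of the transition table: character classes, not raw characters.
enum CharClass { LETTER = 0, DIGIT = 1, OTHER = 2 };

CharClass classify(char c) {
    unsigned char u = static_cast<unsigned char>(c);
    if (std::isalpha(u)) return LETTER;
    if (std::isdigit(u)) return DIGIT;
    return OTHER;
}

// States: 0 = start, 1 = inside an identifier, 2 = error (dead) state.
constexpr int kStart = 0, kError = 2;
constexpr std::array<std::array<int, 3>, 3> nextState = {{
    //  LETTER  DIGIT   OTHER
    {{1,      kError, kError}},   // start
    {{1,      1,      kError}},   // in identifier
    {{kError, kError, kError}}    // error
}};
constexpr std::array<bool, 3> isFinal = {false, true, false};

bool accepts(const std::string &s) {
    int state = kStart;
    for (char c : s) {
        state = nextState[state][classify(c)];
        if (state == kError) return false;   // error state cannot recover: REJECT
    }
    return isFinal[state];                   // ACCEPT only in a final state
}
```

Changing the recognized language only means changing the table and the `isFinal` array. The driver loop stays the same, which is the appeal of the table-driven form.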

Parts of a LEX File (sections separated by %%)
Definitions
– Code for the top of the file, and named expressions such as "digit"
– All code between %{ and %} is copied directly
Rules
– expression   {code to run when the expression is matched}
Auxiliary Routines
– Define additional functions here (including main)

Predefined Items
yylex(): the lex scanning routine (like getToken), generated by FLEX
yytext: the text of the current match (a C character string, not a C++ string class)
input(): get the next character from the flex input (named yyinput() when lex.yy.c is compiled as C++)
ECHO: print yytext to yyout

Example: Definitions
%{
/* add line numbers to text and print */
#include <iostream>
using namespace std;
int lineno = 1;
%}
line   .*\n
%%

Example: Rules & Aux. Code
{line}   { cout << lineno++ << " " << yytext; }
%%
int main() {
    yylex();
    return 0;
}

Using the Scanner
First, generate the scanner code
– flex test.lex
Next, compile the program
– g++ lex.yy.c -o test -lfl
Finally, scan the input file
– ./test < input_file