Introduction to Lexical Analysis and the Flex Tool. © Allan C. Milne Abertay University v14.6.18.

Agenda. Lexical analysis & tokens. What is Flex? Lex program structure. Regular expression patterns. Examples.

The context.
BNF defines the syntax of a language. A source program is written in that language, and the compiler processes it according to the rules of the BNF.
  – We do not want to do this processing in terms of the individual characters of the program.
The first step is therefore to identify the lexical elements (tokens) of the source program:
  – identify the groups of characters that form the tokens: keywords, punctuation and microsyntax.

The lexical analyser.
Also known as the scanner. Its role is:
  – to transform an input stream of characters
  – into tokens,
  – and to expose these tokens to the rest of the compiler.
Example: the character stream if (x==10) … becomes the output tokens "if" "(" x "==" 10 ")" …, where x is an identifier token and 10 is an integer literal token.

What are tokens?
Tokens are the compiler's internal representations of the terminals of a source program, as defined by the language BNF.
Simple terminals:
  – keywords; e.g. begin, for, if, …
  – single-character punctuation; e.g. {, =, …
  – multi-character punctuation; e.g. ==, …
Microsyntax terminals:
  – defined in the microsyntax of the BNF;
  – e.g. identifiers, literal constants.

Creating a scanner.
There are two approaches:
  – write a bespoke scanner, usually built around a finite state machine (FSM);
  – use a utility that processes a language specification and automatically generates a scanner.

What is Flex?
Lex was developed in the mid-1970s as a utility for generating lexical analysers in C.
Flex is a free, open-source Lex clone, originally written by Vern Paxson. It generates a scanner in C/C++ from a Lex program that specifies token patterns and their associated actions.

Processing with Flex.
flex -obase.yy.c base.l
  – base.l : the Lex program defining the patterns and actions.
  – base.yy.c : the generated C scanner.
cl /Febase.exe base.yy.c
  – base.exe : the executable scanner (compiled here with the Microsoft C compiler, cl).
base
  – executes the scanner against standard input (the keyboard).
base < file
  – executes the scanner with input redirected from the named file.

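Lex program structure.
A Lex program has three sections, separated by lines containing only %%:

%{
… C declarations …
%}
… Lex definitions …
%%
… Lex rules of the form:   pattern   { … C actions … }
%%
… C functions …

Example: count the occurrences of Allan, Alan, allan or alan in the input.

%{
#include <stdio.h>
int allanCount;
%}
%%
[aA]ll?an   { allanCount++; }
.           ;
%%
int yywrap () { return 1; }
int main () {
    yylex();
    printf ("Input contains %d Allan's.\n", allanCount);
    return 0;
}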

Generated scanner operation.
Input is matched character by character against the patterns in the rules section. The longest match then causes the associated action to be executed.
  – If nothing matches, the current character is copied to the output.
The defaults for input and output are stdin and stdout.
  – These can be changed by assigning to the predefined variables FILE *yyin and FILE *yyout before calling yylex().

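Simplest Lex program.

%%
%%
int yywrap () { return 1; }
int main () { yylex(); return 0; }

With no rules, nothing ever matches, so every character is copied to the output: the program copies its input to its output.
Note that some Lex implementations supply yywrap() and main() automatically; flex, for example, provides default versions in its library (linked with -lfl), and %option noyywrap removes the need for yywrap().
yywrap() indicates whether or not wrap-up is complete.
  – It is called by the scanner when the input is exhausted.
  – Almost always return 1 here, meaning there is no further input.
As usual, main() is the program entry point.
  – It calls yylex() to start the lexer.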

Rules section.
Each rule is a pattern/action pair:
  – the pattern must start in column 1,
  – followed by whitespace,
  – then an optional C statement or {…} C block.
Any text not starting in column 1 is copied verbatim to the generated C program;
  – e.g. comments.

Patterns.
A pattern is a regular expression composed of constant characters and meta-characters. Review the Lex meta-character document for a list of the meta-characters and some pattern examples.
Building patterns from combinations of meta-characters is the core of writing a Lex program.

Character, word and line counter.

%{
int lineCount, wordCount, charCount;
%}
%%
\n          lineCount++;
[^ \t\n]+   { wordCount++; charCount += yyleng; }
.           charCount++;
%%
int yywrap () { return 1; }
int main () {
    yylex();
    printf ("Input contains\n");
    printf ("%d lines\n%d words\n%d chars\n", lineCount, wordCount, charCount);
    return 0;
}

Pattern matching.
Input is matched character by character against the patterns in the rules section. A pattern match causes the associated action to be executed.
  – If more than one pattern matches, the longest match is used; if the matches are of equal length, the pattern listed first is used.
If no pattern matches, the input character is copied to the output.

Pattern definitions.
A pattern may be given a name in the definitions section; the name can then be used in a rule pattern by enclosing it in {…}.

LETTER  [a-zA-Z]
%{
int wc;
%}
%%
{LETTER}+   wc++;
.|\n        ;
%%
int yywrap() { return 1; }
int main() {
    yylex();
    printf ("%d words found.\n", wc);
    return 0;
}