Practical 1-LEX Implementation

Slides:



Advertisements
Similar presentations
Lexical Analysis Consider the program: #include main() { double value = 0.95; printf("value = %f\n", value); } How is this translated into meaningful machine.
Advertisements

Compiler construction in4020 – lecture 2 Koen Langendoen Delft University of Technology The Netherlands.
Lex -- a Lexical Analyzer Generator (by M.E. Lesk and Eric. Schmidt) –Given tokens specified as regular expressions, Lex automatically generates a routine.
 Lex helps to specify lexical analyzers by specifying regular expression  i/p notation for lex tool is lex language and the tool itself is refered to.
CS 445 Lecture #2 Lexical Analysis. Regular Expressions ε is a r.e. Any char in the alphabet is a r.e. If r and s are r.e.’s then r | s is a r.e. If r.
176 Formal Languages and Applications: We know that Pascal programming language is defined in terms of a CFG. All the other programming languages are context-free.
Lexical Analysis - Scanner- Contd Computer Science Rensselaer Polytechnic Compiler Design Lecture 4(01/26/98)
Tools for building compilers Clara Benac Earle. Tools to help building a compiler C –Lexical Analyzer generators: Lex, flex, –Syntax Analyzer generator:
Chapter 3 Chang Chi-Chung. The Structure of the Generated Analyzer lexeme Automaton simulator Transition Table Actions Lex compiler Lex Program lexemeBeginforward.
Scanning with Jflex.
Lecture 2: Lexical Analysis CS 540 George Mason University.
A brief [f]lex tutorial Saumya Debray The University of Arizona Tucson, AZ
CS 536 Spring Learning the Tools: JLex Lecture 6.
LEX and YACC work as a team
1 Flex. 2 Flex A Lexical Analyzer Generator  generates a scanner procedure directly, with regular expressions and user-written procedures Steps to using.
Compilers: lex/3 1 Compiler Structures Objectives – –describe lex – –give many examples of lex's use , Semester 1, Lex.
Lesson 10 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.
CS 403: Programming Languages Fall 2004 Department of Computer Science University of Alabama Joel Jones.
Review: Regular expression: –How do we define it? Given an alphabet, Base case: – is a regular expression that denote { }, the set that contains the empty.
PL&C Lab, DongGuk University Compiler Lecture Note, MiscellaneousPage 1 Miscellaneous 컴파일러 입문.
Lecture 2: Lexical Analysis
CPSC 388 – Compiler Design and Construction Scanners – JLex Scanner Generator.
COMP 3438 – Part II - Lecture 2: Lexical Analysis (I) Dr. Zili Shao Department of Computing The Hong Kong Polytechnic Univ. 1.
FLEX Fast Lexical Analyzer EECS Introduction Flex is a lexical analysis (scanner) generator. Flex is provided with a user input file or Standard.
Flex: A fast Lexical Analyzer Generator CSE470: Spring 2000 Updated by Prasad.
LEX (04CS1008) A tool widely used to specify lexical analyzers for a variety of languages We refer to the tool as Lex compiler, and to its input specification.
JLex Lecture 4 Mon, Jan 24, JLex JLex is a lexical analyzer generator in Java. It is based on the well-known lex, which is a lexical analyzer generator.
Introduction to Lex Ying-Hung Jiang
–Writing a parser with YACC (Yet Another Compiler Compiler). Automatically generate a parser for a context free grammar (LALR parser) –Allows syntax direct.
1 Using Lex. 2 Introduction When you write a lex specification, you create a set of patterns which lex matches against the input. Each time one of the.
IN LINE FUNCTION AND MACRO Macro is processed at precompilation time. An Inline function is processed at compilation time. Example : let us consider this.
1 Using Lex. Flex – Lexical Analyzer Generator A language for specifying lexical analyzers Flex compilerlex.yy.clang.l C compiler -lfl a.outlex.yy.c a.outtokenssource.
Introduction to Lex Fan Wu
Introduction to Lexical Analysis and the Flex Tool. © Allan C. Milne Abertay University v
Flex Fast LEX analyzer CMPS 450. Lexical analysis terms + A token is a group of characters having collective meaning. + A lexeme is an actual character.
1 Lex & Yacc. 2 Compilation Process Lexical Analyzer Source Code Syntax Analyzer Symbol Table Intermed. Code Gen. Code Generator Machine Code.
1 Using Yacc. 2 Introduction Grammar –CFG –Recursive Rules Shift/Reduce Parsing –See Figure 3-2. –LALR(1) –What Yacc Cannot Parse It cannot deal with.
Lex & Yacc By Hathal Alwageed & Ahmad Almadhor. References *Tom Niemann. “A Compact Guide to Lex & Yacc ”. Portland, Oregon. 18 April 2010 *Levine, John.
ICS312 LEX Set 25. LEX Lex is a program that generates lexical analyzers Converting the source code into the symbols (tokens) is the work of the C program.
Applications of Context-Free Grammars (CFG) Parsers. The YACC Parser-Generator. by: Saleh Al-shomrani.
COMMONWEALTH OF AUSTRALIA Copyright Regulations 1969 WARNING This material has been reproduced and communicated to you by or on behalf of Monash University.
PL&C Lab, DongGuk University Compiler Lecture Note, MiscellaneousPage 1 Yet Another Compiler-Compiler Stephen C. Johnson July 31, 1978 YACC.
1 Steps to use Flex Ravi Chotrani New York University Reviewed By Prof. Mohamed Zahran.
Lex & Yacc logoLex.l logoLex.h logoLex.c logoYacc.y logoYacc.h logoYacc.c.
Scanner Generation Using SLK and Flex++ Followed by a Demo Copyright © 2015 Curt Hill.
LECTURE 7 Lex and Intro to Parsing. LEX Last lecture, we learned a little bit about how we can take our regular expressions (which specify our valid tokens)
LECTURE 6 Scanning Part 2. FROM DFA TO SCANNER In the previous lectures, we discussed how one might specify valid tokens in a language using regular expressions.
More yacc. What is yacc – Tool to produce a parser given a grammar – YACC (Yet Another Compiler Compiler) is a program designed to compile a LALR(1) grammar.
ICS611 Lex Set 3. Lex and Yacc Lex is a program that generates lexical analyzers Converting the source code into the symbols (tokens) is the work of the.
9-December-2002cse Tools © 2002 University of Washington1 Lexical and Parser Tools CSE 413, Autumn 2002 Programming Languages
CS 404Ahmed Ezzat 1 CS 404 Introduction to Compiler Design Lecture Ahmed Ezzat.
Lexical Analysis.
NFAs, scanners, and flex.
Tutorial On Lex & Yacc.
Using SLK and Flex++ Followed by a Demo
Regular Languages.
TDDD55- Compilers and Interpreters Lesson 2
JLex Lecture 4 Mon, Jan 26, 2004.
Review: Compiler Phases:
Subject Name:Sysytem Software Subject Code: 10SCS52
Compiler Structures 3. Lex Objectives , Semester 2,
Appendix B.1 Lex Appendix B.1 -- Lex.
Compiler Lecture Note, Miscellaneous
Compiler Design Yacc Example "Yet Another Compiler Compiler"
More on flex.
Regular Expressions and Lexical Analysis
Systems Programming & Operating Systems Unit – III
Compiler Design 3. Lexical Analyzer, Flex
Lexical Analysis - Scanner-Contd
Lex Appendix B.1 -- Lex.
Presentation transcript:

Practical 1-LEX Implementation Lecture 1 Practical 1-LEX Implementation

Introduction The unix utility lex parses a file of characters. It uses regular expression matching; typically it is used to ‘tokenize’ the contents of the file. In that context, it is often used together with the yacc utility. However, there are many other applications possible.

Structure of a lex file A lex file looks like ...definitions... %% ...rules... ...code...

Here is a simple example: %{ int charcount=0,linecount=0; %} %% . charcount++; \n {linecount++; charcount++;} int main() { yylex(); printf("There were %d characters in %d lines\n", charcount,linecount); return 0; }

Cond……….. If you store this code in a file count.l, you can build an executable from it by lex -t count.l > count.c cc -c -o count.o count.l cc -o counter count.o –ll OR lex count.l > count.c gcc count.c –o counter –ll If you write only lex count.l Then by default it take lex.yy.c instead of given counter

In the example you just saw, all three sections are present: Definitions All code between %{ and %} is copied to the beginning of the resulting C file. Rules A number of combinations of pattern and action: if the action is more than a single command it needs to be in braces. Code This can be very elaborate, but the main ingredient is the call to yylex, the lexical analyzer. If the code segment is left out, a default main is used which only calls yylex.

Definitions section There are three things that can go in the definitions section: C code -Any indented code between %{ and %} is copied to the C file. This is typically used for defining file variables, and for prototypes of routines that are defined in the code segment. Definitions -A definition is very much like a #define cpp directive. For example letter [a-zA-Z] digit [0-9] punct [,.:;!?] nonblank [ˆ \t] These definitions can be used in the rules section: one could start a rule {letter}+ {... State definitions - If a rule depends on context, it’s possible to introduce states and incorporate those in the rules. A state definition looks like %s STATE, and by default a state INITIAL is already given.

Rules section The rules section has a number of pattern-action pairs. The patterns are regular expressions and the actions are either a single C command, or a sequence enclosed in braces. If more than one rule matches the input, the longer match is taken. If two matches are the same length, the earlier one in the list is taken.

Matched text When a rule matches part of the input, the matched text is available to the programmer as a variable char* yytext of length int yyleng. To extend the example from the introduction to be able to count words, we would write---------

Cond………. %{ int charcount=0,linecount=0,wordcount=0; %} letter [ˆ \t\n] %% {letter}+ {wordcount++; charcount+=yyleng;} . charcount++; \n {linecount++; charcount++;}

Regular expressions Many Unix utilities have regular expressions of some sort, but unfortunately they don’t all have the same power. Here are the basics: . Match any character except newlines. \n A newline character.

Cond…………. \t A tab character. ˆ The beginning of the line. $ The end of the line. <expr>* Zero or more occurences of the expression. <expr>+ One or more occurences of the expression. (<expr1>|<expr2>) One expression of another. [<set>] A set of character or ranges, such as [a-zA-Z]. [ˆ<set>] The complement of the set, for instance [ˆ \t].

User code section If the lex program is to be used on its own, this section will contain a main program. If you leave this section empty you will get the default main: int main() { yylex(); return 0; } where yylex is the parser that is built from the rules.