Flex Fast LEX analyzer CMPS 450. Lexical analysis terms + A token is a group of characters having collective meaning. + A lexeme is an actual character.

Slides:



Advertisements
Similar presentations
Lexical Analysis Consider the program: #include main() { double value = 0.95; printf("value = %f\n", value); } How is this translated into meaningful machine.
Advertisements

Lex -- a Lexical Analyzer Generator (by M.E. Lesk and Eric. Schmidt) –Given tokens specified as regular expressions, Lex automatically generates a routine.
CS252: Systems Programming Ninghui Li Topic 4: Regular Expressions and Lexical Analysis.
 Lex helps to specify lexical analyzers by specifying regular expression  i/p notation for lex tool is lex language and the tool itself is refered to.
Winter 2007SEG2101 Chapter 81 Chapter 8 Lexical Analysis.
Tools for building compilers Clara Benac Earle. Tools to help building a compiler C –Lexical Analyzer generators: Lex, flex, –Syntax Analyzer generator:
Chapter 3 Chang Chi-Chung. The Structure of the Generated Analyzer lexeme Automaton simulator Transition Table Actions Lex compiler Lex Program lexemeBeginforward.
COS 320 Compilers David Walker. Outline Last Week –Introduction to ML Today: –Lexical Analysis –Reading: Chapter 2 of Appel.
Scripting Languages Chapter 8 More About Regular Expressions.
Lecture 2: Lexical Analysis CS 540 George Mason University.
A brief [f]lex tutorial Saumya Debray The University of Arizona Tucson, AZ
CS 536 Spring Learning the Tools: JLex Lecture 6.
1 Flex. 2 Flex A Lexical Analyzer Generator  generates a scanner procedure directly, with regular expressions and user-written procedures Steps to using.
Compilers: lex/3 1 Compiler Structures Objectives – –describe lex – –give many examples of lex's use , Semester 1, Lex.
Compiler Phases: Source program Lexical analyzer Syntax analyzer Semantic analyzer Machine-independent code improvement Target code generation Machine-specific.
REGULAR EXPRESSIONS. Lexical Analysis Lexical analysers can be constructed by programs such as LEX These programs employ as input a description of the.
Review: Regular expression: –How do we define it? Given an alphabet, Base case: – is a regular expression that denote { }, the set that contains the empty.
CPSC 388 – Compiler Design and Construction Scanners – JLex Scanner Generator.
Other Issues - § 3.9 – Not Discussed More advanced algorithm construction – regular expression to DFA directly.
Lexical Analyzer (Checker)
Scanning & FLEX CPSC 388 Ellen Walker Hiram College.
FLEX Fast Lexical Analyzer EECS Introduction Flex is a lexical analysis (scanner) generator. Flex is provided with a user input file or Standard.
Flex: A fast Lexical Analyzer Generator CSE470: Spring 2000 Updated by Prasad.
LEX (04CS1008) A tool widely used to specify lexical analyzers for a variety of languages We refer to the tool as Lex compiler, and to its input specification.
Compiler Tools Lex/Yacc – Flex & Bison. Compiler Front End (from Engineering a Compiler) Scanner (Lexical Analyzer) Maps stream of characters into words.
REGEX. Problems Have big text file, want to extract data – Phone numbers (503)
Review: Compiler Phases: Source program Lexical analyzer Syntax analyzer Semantic analyzer Intermediate code generator Code optimizer Code generator Symbol.
When you read a sentence, your mind breaks it into tokens—individual words and punctuation marks that convey meaning. Compilers also perform tokenization.
JLex Lecture 4 Mon, Jan 24, JLex JLex is a lexical analyzer generator in Java. It is based on the well-known lex, which is a lexical analyzer generator.
By Neng-Fa Zhou Lexical Analysis 4 Why separate lexical and syntax analyses? –simpler design –efficiency –portability.
Introduction to Lex Ying-Hung Jiang
1 Using Lex. 2 Introduction When you write a lex specification, you create a set of patterns which lex matches against the input. Each time one of the.
IN LINE FUNCTION AND MACRO Macro is processed at precompilation time. An Inline function is processed at compilation time. Example : let us consider this.
1 Using Lex. Flex – Lexical Analyzer Generator A language for specifying lexical analyzers Flex compilerlex.yy.clang.l C compiler -lfl a.outlex.yy.c a.outtokenssource.
COMPILERS AND INTERPRETERS Lesson 3 – TDDD16 TDDB44 Compiler Construction 2010 Kristian Stavåker Department.
Introduction to Lex Fan Wu
Lex.
Lexical Analysis with lex(1) and flex(1) © 2014 Clinton Jeffery.
By Neng-Fa Zhou Programming language syntax 4 Three aspects of languages –Syntax How are sentences formed? –Semantics What does a sentence mean? –Pragmatics.
Practical 1-LEX Implementation
CSc 453 Lexical Analysis (Scanning)
1 Lex & Yacc. 2 Compilation Process Lexical Analyzer Source Code Syntax Analyzer Symbol Table Intermed. Code Gen. Code Generator Machine Code.
Syntactic Analysis Tools
ICS312 LEX Set 25. LEX Lex is a program that generates lexical analyzers Converting the source code into the symbols (tokens) is the work of the C program.
Overview of Previous Lesson(s) Over View  Symbol tables are data structures that are used by compilers to hold information about source-program constructs.
COMMONWEALTH OF AUSTRALIA Copyright Regulations 1969 WARNING This material has been reproduced and communicated to you by or on behalf of Monash University.
Lexical Analysis.
1st Phase Lexical Analysis
1 Steps to use Flex Ravi Chotrani New York University Reviewed By Prof. Mohamed Zahran.
An Introduction to Regular Expressions Specifying a Pattern that a String must meet.
ICS611 Lex Set 3. Lex and Yacc Lex is a program that generates lexical analyzers Converting the source code into the symbols (tokens) is the work of the.
9-December-2002cse Tools © 2002 University of Washington1 Lexical and Parser Tools CSE 413, Autumn 2002 Programming Languages
CS 404Ahmed Ezzat 1 CS 404 Introduction to Compiler Design Lecture Ahmed Ezzat.
Lexical Analysis.
NFAs, scanners, and flex.
Tutorial On Lex & Yacc.
CSc 453 Lexical Analysis (Scanning)
Regular Languages.
TDDD55- Compilers and Interpreters Lesson 2
Lexical Analysis Why separate lexical and syntax analyses?
Review: Compiler Phases:
Other Issues - § 3.9 – Not Discussed
Compiler Structures 3. Lex Objectives , Semester 2,
Appendix B.1 Lex Appendix B.1 -- Lex.
More on flex.
Regular Expressions and Lexical Analysis
Systems Programming & Operating Systems Unit – III
NFAs, scanners, and flex.
Compiler Design 3. Lexical Analyzer, Flex
Lex Appendix B.1 -- Lex.
Presentation transcript:

Flex Fast LEX analyzer CMPS 450

Lexical analysis terms + A token is a group of characters having collective meaning. + A lexeme is an actual character sequence forming a specific instance of a token, such as num. + A pattern is a rule expressed as a regular expression and describing how a particular token can be formed. For example, [A-Za-z][A-Za-z_0-9]* is a rule.

Characters between tokens are called whitespace; these include spaces, tabs, newlines, and formfeeds. Many people also count comments as whitespace, though since some tools such as lint/splint look at comments, this categorization is not perfect.

Attributes for tokens Tokens can have attributes that can be passed back to the calling function. Constants could have the value of the constant, for instance. Identifiers might have a pointer to a location where information is kept about the identifier.

Flex - our lexical analyzer generator Is linked with its library (libfl.a) using -lfl as a compile-time option. Can be called as yylex(). It is easy to interface with bison/yacc.

l file  lex  lex.yy.c lex.yy.c and  gcc  lexical analyzer other files input stream  lexical analyzer  actions taken when rules applied

Flex specifications Lex source: { definitions } % { rules } % { user subroutines }

Definitions + Declarations of ordinary C variables and constants. + flex definitions

Rules The form of rules are: regularexpression action The actions are C/C++ code.

s string s literally \c character c literally, where c would normally be a lex operator [s] character class ~ indicates beginning of line [~s] characters not in character class [s-t] range of characters s? s occurs zero or one time Flex regular expressions

. any character except newline s* zero or more occurrences of s s+ one or more occurrences of s r|s r or s (s) grouping $ end of line s/r s iff followed by r (not recommended) (r is*NOT*consumed) s{m,n} m through n occurrences of s Flex regular expressions, Cont.

a* zero or more a’s.* zero or more of any character except newline.+ one or more characters [a-z] a lowercase letter [a-zA-Z] any alphabetic letter [^a-zA-Z] any non-alphabetic character a.b a followed by any character followed by b rs|tu rs or tu Examples of regular expressions in flex

a(b|c)d abd or acd ^start beginning of line with then the literal characters start END$ the characters END followed by an end-of-line.

Actions are C source fragments. If it is compound, or takes more than one line, enclose with braces (’{’ ’}’). Example rules: [a-z]+ printf("found word\n"); [A-Z][a-z]* { printf("found capitalized word:\n"); printf(" ’%s’\n",yytext); } Flex actions

Flex definitions The form is simply name definition The name is just a word beginning with a letter (or an underscore, but I don’t recommend those for general use) followed by zero or more letters, underscore, or dash. The definition actually goes from the first non-whitespace character to the end of line. You can refer to it via {name}, which will expand to (definition). (cite: this is largely from “man flex”.) DIGIT [0-9] Now if you have a rule that looks like {DIGIT}*\.{DIGIT}+ that is the same as writing ([0-9])*\.([0-9])+

An example Flex program /* either indent or use %{ %} */ %{ int num_lines = 0; int num_chars = 0; %} % \n ++num_lines; ++num_chars;. ++num_chars; % int main(int argc, char **argv) { yylex(); printf("# of lines = %d, # of chars = %d\n", num_lines, num_chars ); }

Another example program digits [0-9] ltr [a-zA-Z] alphanum [a-zA-Z0-9] % (-|\+)*{digits}+ printf("found number: ’%s’\n", yytext); {ltr}(_|{alphanum})* printf("found identifer: ’%s’\n", yytext); ’.’ printf("found character: {%s}\n", yytext);. { /* absorb others */ } % int main(int argc, char **argv) { yylex(); }