2 Introduction to Lexical Analysis and the Flex Tool. © Allan C. Milne Abertay University v14.6.18

3 Agenda. Lexical analysis & tokens. What is Flex? Lex program structure. Regular expression patterns. Examples.

4 The context. BNF defines the syntax of a language. A source program is written in that language. The compiler processes the source program according to the rules of the BNF. –We do not want to do this processing in terms of the individual characters of the program. The first step is therefore to identify the lexical elements (tokens) of the source program; –that is, to identify the groups of characters that form the tokens: keywords, punctuation and microsyntax elements.

5 The lexical analyser. Also known as the scanner. Its role: –to transform an input stream of characters –into tokens, –and expose these tokens to the rest of the compiler. For example, the character stream if (x==10) … is exposed as the output tokens "if" "(" "x" "==" "10" ")" …

6 What are tokens? Tokens are the internal compiler representations of the terminals of a source program as defined by the language BNF. Simple terminals –keywords; e.g. begin, for, if, … –single character punctuation; e.g. {, =, … –multi-character punctuation; e.g. ==, … Microsyntax terminals –defined in the microsyntax of the BNF. –e.g. identifiers, literal constants.
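As a rough sketch (not taken from the slides), the rules below show how such terminals might be mapped onto integer token codes inside a Flex-generated scanner; the enum names and values are hypothetical.

%{
#include <stdio.h>
/* Hypothetical token codes; a real compiler would share these with its parser. */
enum { T_IF = 256, T_BEGIN, T_FOR, T_LBRACE, T_ASSIGN, T_EQEQ, T_IDENT, T_NUMBER };
%}
%%
"if"                    { return T_IF; }
"begin"                 { return T_BEGIN; }
"for"                   { return T_FOR; }
"{"                     { return T_LBRACE; }
"="                     { return T_ASSIGN; }
"=="                    { return T_EQEQ; }
[a-zA-Z_][a-zA-Z0-9_]*  { return T_IDENT; }
[0-9]+                  { return T_NUMBER; }
[ \t\n]+                ;   /* whitespace separates tokens but is not one */
.                       ;   /* ignore anything unrecognised in this sketch */
%%
int yywrap () { return 1; }
int main () {
    int token;
    while ((token = yylex ()) != 0)     /* yylex() returns 0 at end of input */
        printf ("token %d spelled \"%s\"\n", token, yytext);
    return 0;
}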

7 Creating a scanner. Two approaches: –write a bespoke scanner, usually based around a finite state machine (FSM); –use a utility that processes a language specification and automatically generates a scanner.
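For the first option, a minimal hand-written sketch might look like the following; it assumes a toy language of identifiers, integers and single-character punctuation, and none of it comes from the slides.

#include <stdio.h>
#include <ctype.h>

enum { T_IDENT = 1, T_NUMBER, T_OTHER };

static int next_token (FILE *in, char *lexeme, int max) {
    int c, len = 0;
    while ((c = getc (in)) != EOF && isspace (c))
        ;                                   /* start state: skip whitespace */
    if (c == EOF) return 0;                 /* 0 means end of input */
    if (isalpha (c)) {                      /* identifier state */
        do {
            if (len < max - 1) lexeme[len++] = c;
            c = getc (in);
        } while (c != EOF && isalnum (c));
        if (c != EOF) ungetc (c, in);       /* put back the character that ended the token */
        lexeme[len] = '\0';
        return T_IDENT;
    }
    if (isdigit (c)) {                      /* number state */
        do {
            if (len < max - 1) lexeme[len++] = c;
            c = getc (in);
        } while (c != EOF && isdigit (c));
        if (c != EOF) ungetc (c, in);
        lexeme[len] = '\0';
        return T_NUMBER;
    }
    lexeme[0] = c;                          /* anything else: a single-character token */
    lexeme[1] = '\0';
    return T_OTHER;
}

int main () {
    char lexeme[64];
    int kind;
    while ((kind = next_token (stdin, lexeme, (int) sizeof lexeme)) != 0)
        printf ("%d: %s\n", kind, lexeme);
    return 0;
}

Tools such as Flex generate exactly this kind of state-machine code from a much shorter declarative specification.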

8 What is Flex? Lex was developed in the mid-1970s as a utility that generates a lexical analyser in C. Flex is a free, open-source clone of Lex. It generates a scanner in C/C++ from a Lex program that specifies token patterns and associated actions.

9 Processing with Flex.
flex -obase.yy.c base.l
–base.l : Lex program defining patterns/actions.
–base.yy.c : generated C scanner.
cl /Febase.exe base.yy.c
–base.exe : the executable scanner.
base
–Execute the scanner against standard input (the keyboard).
base <file
–Execute the scanner, redirecting input from the named file.

10 Lex program structure.
%{
 … C declarations …
%}
 … Lex definitions …
%%
 … Lex rules of the form:
pattern   { … C actions … }
%%
 … C functions …

For example, a program that counts occurrences of the author's name:

%{
int allanCount;
%}
%%
[aA]ll?an   { allanCount++; }
.           ;
%%
int yywrap () { return 1; }
int main () {
    yylex ();
    printf ("Input contains %d Allan's.\n", allanCount);
    return 0;
}

11 Generated scanner operation. Input is matched character by character to the patterns in the rules section. The longest pattern match then causes the associated actions to be executed. –If no match then the character is copied to the output. Defaults for input and output are stdin and stdout. –These can be changed by assigning to the predefined variables FILE *yyin, *yyout
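As an illustration (a sketch, not from the slides), the user-code section of a Lex program, or a separate C file linked with the generated scanner, could open a file named on the command line and point yyin at it before calling yylex():

#include <stdio.h>

extern FILE *yyin;              /* predefined by the generated scanner */
extern int yylex (void);        /* the scanning routine generated by Flex */

int yywrap () { return 1; }

int main (int argc, char *argv[]) {
    if (argc > 1) {
        yyin = fopen (argv[1], "r");    /* scan the named file instead of stdin */
        if (yyin == NULL) {
            perror (argv[1]);
            return 1;
        }
    }
    yylex ();
    return 0;
}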

12 Simplest Lex program.
%%
%%
int yywrap () { return 1; }
int main () { yylex (); return 0; }

Copies the input to the output (the default action for unmatched input). Note that yywrap() and main() are generated automatically by some Lex implementations. yywrap() indicates whether or not wrap-up is complete. –Almost always return 1 here. –Called by Lex when input is exhausted. As usual main() is the program entry point. –Calls yylex() to initiate the lexer.

13 Rules section. Each rule is a pattern / action pair. –Patterns must start in column 1; –Followed by whitespace; –Then an optional C statement or {…} C block. Any text not starting in column 1 is copied verbatim to the generated C program; –e.g. comments.
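A small sketch (not from the slides) showing this layout: each pattern starts in column 1, the action is either a single C statement or a brace-enclosed block, and the indented comment is simply copied into the generated scanner.

%%
    /* This indented line is not a rule; it is copied verbatim. */
[0-9]+      printf ("number: %s\n", yytext);
[a-zA-Z]+   { printf ("word: %s\n", yytext); }
.|\n        ;
%%
int yywrap () { return 1; }
int main () { yylex (); return 0; }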

14 Patterns. A pattern is a regular expression composed of constant characters and meta-characters. Review the Lex meta-character document for a list of the meta-characters and some pattern examples. Creating patterns from combinations of meta-characters is the core of writing a Lex program.
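A few illustrative patterns (examples of the general idea, not taken from that document):

[0-9]+                  one or more digits, e.g. an integer literal
[a-zA-Z][a-zA-Z0-9]*    a letter followed by letters or digits, e.g. an identifier
"while"                 the literal keyword while
"=="                    the two-character operator ==
[ \t]+                  a run of spaces and tabs
\n                      a newline
.                       any single character except newline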

15 Character, word and line counter.
%{
int lineCount, wordCount, charCount;
%}
%%
\n          lineCount++;
[^ \t\n]+   { wordCount++; charCount += yyleng; }
.           charCount++;
%%
int yywrap () { return 1; }
int main () {
    yylex ();
    printf ("Input contains\n");
    printf ("%d lines\n%d words\n%d chars\n", lineCount, wordCount, charCount);
    return 0;
}

16 Pattern matching. Input is matched character by character to the patterns in the rules section. A pattern match causes the associated actions to be executed. –If two patterns match then the longest match is used; if matches are of equal length then the first pattern is used. If no match then the input character is copied to the output.
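A sketch (not from the slides) of how these tie-break rules play out: with the rules below, the input if matches both patterns at the same length, so the earlier rule wins and it is reported as a keyword; the input ifx is a longer match for the second pattern, so it is reported as an identifier.

%%
"if"                    printf ("keyword if\n");
[a-zA-Z][a-zA-Z0-9]*    printf ("identifier: %s\n", yytext);
[ \t\n]+                ;
%%
int yywrap () { return 1; }
int main () { yylex (); return 0; }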

17 Pattern definitions. Patterns may be given a name in the definitions section, and this name can then be used in a rule pattern by enclosing it in {…}.
LETTER [a-zA-Z]
%{
int wc;
%}
%%
{LETTER}+   wc++;
.|\n        ;
%%
int yywrap () { return 1; }
int main () {
    yylex ();
    printf ("%d words found.\n", wc);
    return 0;
}

