Lexical Analysis with lex(1) and flex(1) © 2014 Clinton Jeffery.

1 Lexical Analysis with lex(1) and flex(1) © 2014 Clinton Jeffery

2 Reading
– Read Sections 3-10 of Lexical Analysis with Flex
– Check out the class lecture notes
– Ask questions from either source

3 Traits of Scanners
– Function: convert from chars to tokens
– Identify and categorize kinds of tokens
– Detect boundaries between tokens
– Discard comments and whitespace
– Remember line/col #’s for error reporting
– Report lexical errors
– Run as fast as possible
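The traits above can be sketched as a small hand-written scanner in C. This is only an illustration, not what lex generates: the token categories, the `struct scanner` type, the `next_token()` function, and the choice of `#` as a comment marker are all assumptions made for this example.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token categories for this sketch. */
enum { TOK_EOF = 0, TOK_IDENT, TOK_INT, TOK_ERROR };

/* Scanner state: current position in the input and current line number. */
struct scanner {
    const char *pos;
    int line;
};

/* Return the category of the next token; *start and *len describe its lexeme. */
int next_token(struct scanner *s, const char **start, size_t *len)
{
    int category;

    for (;;) {
        /* Discard whitespace, counting newlines for error reporting. */
        while (isspace((unsigned char)*s->pos)) {
            if (*s->pos == '\n')
                s->line++;
            s->pos++;
        }
        /* Discard a '#' comment that runs to end of line (assumed syntax). */
        if (*s->pos == '#') {
            while (*s->pos != '\0' && *s->pos != '\n')
                s->pos++;
            continue;
        }
        break;
    }

    *start = s->pos;
    if (*s->pos == '\0') {
        *len = 0;
        return TOK_EOF;
    }
    if (isalpha((unsigned char)*s->pos)) {
        /* Token boundary: an identifier ends at the first non-alphanumeric. */
        while (isalnum((unsigned char)*s->pos))
            s->pos++;
        category = TOK_IDENT;
    } else if (isdigit((unsigned char)*s->pos)) {
        while (isdigit((unsigned char)*s->pos))
            s->pos++;
        category = TOK_INT;
    } else {
        s->pos++;                 /* lexical error: skip the offending char */
        category = TOK_ERROR;
    }
    *len = (size_t)(s->pos - *start);
    return category;
}
```

A caller would initialize `struct scanner s = { text, 1 };` and call `next_token()` until it returns `TOK_EOF`, using `s.line` when reporting a `TOK_ERROR`.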

4 Regular Expressions
– ε is a r.e.
– Any char in the alphabet is a r.e.
– If r and s are r.e.’s then r | s is a r.e.
– If r and s are r.e.’s then r s is a r.e.
– If r is a r.e. then r* is a r.e.
– If r is a r.e. then (r) is a r.e.

5 Common extensions to regular expression notation
– r+ is equivalent to rr*
– r? is equivalent to r|ε
– [abc] is equivalent to a|b|c
– [a-z] is equivalent to a|b|…|z
– [^abc] is equivalent to any single char other than a, b, or c
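These equivalences can be spot-checked in C. The sketch below uses the POSIX `<regex.h>` extended-regex engine as a stand-in (it is not the engine lex uses), and `full_match()` and `same_on_samples()` are hypothetical helper names. Agreement on a handful of sample strings is evidence for an equivalence, not a proof.

```c
#include <regex.h>
#include <stdio.h>

/* Return 1 if the whole of text matches the POSIX extended regex, else 0. */
int full_match(const char *pattern, const char *text)
{
    char anchored[256];
    regex_t re;
    int ok;

    /* Wrap in ^( )$ so the match must cover the entire string. */
    snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return 0;                       /* treat a bad pattern as "no match" */
    ok = (regexec(&re, text, 0, NULL, 0) == 0);
    regfree(&re);
    return ok;
}

/* Return 1 if both patterns give the same verdict on every sample string. */
int same_on_samples(const char *p, const char *q)
{
    static const char *samples[] = { "", "a", "aa", "b", "ab", "c", "d", "abc" };
    size_t i;

    for (i = 0; i < sizeof samples / sizeof samples[0]; i++)
        if (full_match(p, samples[i]) != full_match(q, samples[i]))
            return 0;
    return 1;
}
```

For instance, `same_on_samples("a+", "aa*")` reports agreement, while `same_on_samples("a+", "a*")` does not, because only `a*` accepts the empty string.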

6 Lex’s extended regular expressions
    \c        escapes for most operators
    "s"       match C string as-is (superescape)
    r{m,n}    match r between m and n times
    r/s       match r when s follows
    ^r        match r when at beginning of line
    r$        match r when at end of line

7 Lexical Attributes
– A lexical attribute is a piece of information about a token
– Compiler writer can define as needed
– Typically:
  – Category: integer code, used in parsing
  – Lexeme: actual string as it appears in source
  – Line, column: location in source code
  – Value: for literals, the binary value they represent

8 Meanings of the word “token”
– A single word from the source code
– An integer code that categorizes a word
– A set of lexical attributes that are computed from a single word of input
– An instance of a class (given by category)

9 Lex public interface
    FILE *yyin;      /* set before calling yylex() */
    int yylex();     /* call once per token; returns 0 at end of input */
    char yytext[];   /* chars matched by yylex(); char *yytext by default in flex */
    int yywrap();    /* end-of-file handler */
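A driver typically sets `yyin` and then calls `yylex()` until it returns 0. So that the sketch below compiles on its own, it supplies toy stand-ins for `yyin`, `yytext`, `yywrap()`, and `yylex()`; in a real build these come from the generated `lex.yy.c`. The 256-byte `yytext` buffer and the `count_tokens()` helper are assumptions made for this example.

```c
#include <ctype.h>
#include <stdio.h>

/* Stand-ins for what lex would generate (assumption: real ones live in lex.yy.c). */
FILE *yyin;
char yytext[256];

int yywrap(void) { return 1; }          /* 1 means "no more input files" */

/* Toy yylex(): every whitespace-separated word is one token of category 1. */
int yylex(void)
{
    int c;
    size_t n = 0;

    for (;;) {
        while ((c = getc(yyin)) != EOF && isspace(c))
            ;                           /* skip whitespace between tokens */
        if (c != EOF)
            break;
        if (yywrap())
            return 0;                   /* 0 signals end of input */
    }
    do {
        if (n < sizeof yytext - 1)      /* truncate overlong words */
            yytext[n++] = (char)c;
    } while ((c = getc(yyin)) != EOF && !isspace(c));
    yytext[n] = '\0';
    return 1;
}

/* Driver: set yyin, then call yylex() once per token until it returns 0. */
int count_tokens(FILE *in)
{
    int count = 0;

    yyin = in;
    while (yylex() != 0)
        count++;                        /* yytext holds the current lexeme here */
    return count;
}
```

A parser generated by yacc plays the role of `count_tokens()` in a compiler: it calls `yylex()` each time it needs the next token.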

10 .l file format
    header
    %%
    body
    %%
    helper functions

11 Lex header
– C code inside %{ … %}
  – prototypes for helper functions
  – #include’s that #define integer token categories
– Macro definitions, e.g.
    letter    [a-zA-Z]
    digit     [0-9]
    ident     {letter}({letter}|{digit})*
– Warning: macros are fraught with peril

12 Lex body
– Regular expressions with semantic actions
    " "        { /* discard */ }
    {ident}    { return IDENT; }
    "*"        { return ASTERISK; }
    "."        { return PERIOD; }
– Match the longest r.e. possible
– Break ties with whichever appears first
– If it fails to match: copy unmatched to stdout
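The two disambiguation rules (longest match, then earliest rule) can be sketched in C. Lex compiles all rules into a single automaton; the version below instead tries each rule in turn, which is slower but shows the same selection logic. The three rules and every name here are hypothetical.

```c
#include <ctype.h>
#include <string.h>

/* Each matcher returns how many leading chars of s its pattern matches (0 = none). */
static size_t match_if(const char *s)       /* "if" */
{
    return strncmp(s, "if", 2) == 0 ? 2 : 0;
}

static size_t match_ident(const char *s)    /* [a-zA-Z][a-zA-Z0-9]* */
{
    size_t n = 0;

    if (!isalpha((unsigned char)s[0]))
        return 0;
    while (isalnum((unsigned char)s[n]))
        n++;
    return n;
}

static size_t match_int(const char *s)      /* [0-9]+ */
{
    size_t n = 0;

    while (isdigit((unsigned char)s[n]))
        n++;
    return n;
}

/* Rule numbers follow the order in which the rules are listed. */
enum { RULE_IF, RULE_IDENT, RULE_INT, NUM_RULES, NO_RULE = -1 };

/* Pick the rule with the longest match; on a tie the earliest rule wins. */
int pick_rule(const char *s, size_t *len)
{
    static size_t (*const rules[NUM_RULES])(const char *) = {
        match_if, match_ident, match_int
    };
    int best = NO_RULE;
    int i;

    *len = 0;
    for (i = 0; i < NUM_RULES; i++) {
        size_t n = rules[i](s);
        if (n > *len) {         /* strictly longer only, so earlier rules keep ties */
            *len = n;
            best = i;
        }
    }
    return best;
}
```

On the input `if (x)` both the keyword rule and the identifier rule match two characters, so the keyword rule wins because it is listed first; on `iffy` the identifier rule matches four characters and wins on length. When `pick_rule()` returns `NO_RULE`, a lex scanner would fall back to its default action of echoing one character to stdout.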

13 Lex helper functions
– Follows rules of ordinary C code
– Compute lexical attributes
– Do stuff the regular expressions can’t do
– Write a yywrap() to switch files on EOF

14 struct token – typical compiler
    struct token {
        int category;
        char *text;
        int linenumber;
        int column;
        char *filename;
        union literal value;
    };
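A helper function can fill in such a struct from within a semantic action. The deck does not define `union literal`, so the layout below is an assumption, as are the category codes `INTLIT` and `REALLIT` and the helpers `make_token()` and `free_token()`.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout for "union literal"; the slides leave it undefined. */
union literal {
    long ival;
    double dval;
    char *sval;
};

/* Hypothetical category codes; real ones usually come from the parser's header. */
enum { INTLIT = 257, REALLIT = 258 };

struct token {
    int category;
    char *text;
    int linenumber;
    int column;
    char *filename;
    union literal value;
};

/* Copy a string into freshly allocated memory; NULL if allocation fails. */
static char *copy_string(const char *s)
{
    char *p = malloc(strlen(s) + 1);

    if (p != NULL)
        strcpy(p, s);
    return p;
}

/* Build a token from a lexeme and its location; NULL if allocation fails. */
struct token *make_token(int category, const char *lexeme,
                         int line, int column, const char *filename)
{
    struct token *t = calloc(1, sizeof *t);

    if (t == NULL)
        return NULL;
    t->category = category;
    t->text = copy_string(lexeme);
    t->filename = copy_string(filename);
    if (t->text == NULL || t->filename == NULL) {
        free(t->text);
        free(t->filename);
        free(t);
        return NULL;
    }
    t->linenumber = line;
    t->column = column;
    /* For literals, compute the binary value the lexeme represents. */
    if (category == INTLIT)
        t->value.ival = strtol(lexeme, NULL, 10);
    else if (category == REALLIT)
        t->value.dval = strtod(lexeme, NULL);
    return t;
}

/* Release a token created by make_token(). */
void free_token(struct token *t)
{
    if (t != NULL) {
        free(t->text);
        free(t->filename);
        free(t);
    }
}
```

In a lex action this might be called as `make_token(INTLIT, yytext, lineno, colno, fname)`, where the last three variables are whatever location tracking the scanner maintains.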

15 “string removal tool”
    %%
    "zap me"

16 whitespace trimmer
    %%
    [ \t]+     putchar(' ');
    [ \t]+$    /* drop entirely */

17 string replacement
    %%
    username    printf("%s", getlogin());

18 Line/char counter
    %{
    #include <stdio.h>
    int lines = 0, chars = 0;
    %}
    %%
    \n    { ++lines; ++chars; }
    .     { ++chars; }
    %%
    int main() {
        yylex();
        printf("lines: %d chars: %d\n", lines, chars);
        return 0;
    }

19 Example: C/C++ reals
– Allow .2 ?
– What about 2. ?
– Is it: [0-9]*\.[0-9]*
– Is it: ([0-9]+\.[0-9]*|[0-9]*\.[0-9]+)
– What about scientific notation? 3e4
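One way to explore these questions is to test a candidate pattern against the tricky cases. The pattern below is one possible answer, not necessarily the one intended by the slides: it requires at least one digit next to the decimal point, allows an optional exponent, and also accepts digits followed by a mandatory exponent. It ignores C suffixes such as `f` and hexadecimal floats. POSIX `<regex.h>` stands in for lex, and `is_real()` is a hypothetical name.

```c
#include <regex.h>
#include <stddef.h>

/* Candidate pattern (an assumption for this sketch). The dot is escaped
 * because a bare '.' would match any character, not just a decimal point. */
static const char *REAL_PATTERN =
    "^(([0-9]+\\.[0-9]*|[0-9]*\\.[0-9]+)([eE][+-]?[0-9]+)?"
    "|[0-9]+[eE][+-]?[0-9]+)$";

/* Return 1 if s is, in its entirety, a real literal under REAL_PATTERN. */
int is_real(const char *s)
{
    regex_t re;
    int ok;

    if (regcomp(&re, REAL_PATTERN, REG_EXTENDED | REG_NOSUB) != 0)
        return 0;
    ok = (regexec(&re, s, 0, NULL, 0) == 0);
    regfree(&re);
    return ok;
}
```

Under this pattern `.2`, `2.`, and `3e4` are all accepted, while a lone `.` and a plain integer such as `42` are rejected. The simpler `[0-9]*\.[0-9]*` would accept a lone `.`, which is why at least one side of the point must require a digit.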

20 Tweaking the Input Stream
– From within a semantic action after a match:
  – yymore(): append the next token onto yytext, instead of replacing it
  – yyless(n): keep only the first n characters of the match; return the rest to the input
  – unput(c): place c back into the input stream
  – input(): read the next char of input
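The pushback behavior behind `unput()`, `input()`, and `yyless()` can be modeled with a small stack in front of the input. The sketch below is not flex's implementation: `struct instream` and the `stream_*` functions are hypothetical stand-ins, and the 16-character pushback limit is arbitrary.

```c
#include <stdio.h>

#define PUSHBACK_MAX 16

/* Hypothetical input stream: a string plus a stack of pushed-back chars. */
struct instream {
    const char *src;
    char pushed[PUSHBACK_MAX];
    int npushed;
};

/* Like input(): return the next char, preferring pushed-back ones; EOF at end. */
int stream_input(struct instream *in)
{
    if (in->npushed > 0)
        return (unsigned char)in->pushed[--in->npushed];
    if (*in->src == '\0')
        return EOF;
    return (unsigned char)*in->src++;
}

/* Like unput(c): make c the next char to be read. Returns 0, or -1 if full. */
int stream_unput(struct instream *in, int c)
{
    if (in->npushed == PUSHBACK_MAX)
        return -1;
    in->pushed[in->npushed++] = (char)c;
    return 0;
}

/* Like yyless(n): keep the first n chars of the match, push the rest back. */
void stream_less(struct instream *in, char *text, int len, int n)
{
    int i;

    for (i = len - 1; i >= n; i--)      /* reverse order, so they re-read in order */
        stream_unput(in, text[i]);
    text[n] = '\0';
}
```

Because the pushback area is a stack, several characters must be pushed in reverse order to be re-read in their original order, which is what `stream_less()` does.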

