
1 Tools for building compilers Clara Benac Earle

2 Tools that help in building a compiler
C
– Lexical analyzer generators: Lex, flex
– Syntax analyzer generator: yacc
Java
– Lexical analyzer generators: JLex, JFlex
– Syntax analyzer generator: CUP
These tools and their documentation can be found on the internet.

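3 Lex: Lexical Analyzer Generator
example.l → Lex compiler → lex.yy.c → C compiler → a.exe
(The Lex compiler turns the specification example.l into the C file lex.yy.c, which the C compiler turns into the executable a.exe.)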

4 Description
A tool for generating scanners. The scanner is described as pairs of regular expressions and C code. Flex generates as output a C source file, lex.yy.c, which defines a routine yylex(). Compiling this file produces an executable. When the executable is run, it analyzes its input for occurrences of the regular expressions. Whenever it finds one, it executes the corresponding C code.

5 Format of the input file
The flex input file consists of three sections, separated by lines containing only %%:
Definitions
%%
Rules
%%
User code

6 Skeleton of a lex specification (.l file)
%{
   C code here is embedded into lex.yy.c
%}
[DEFINITION SECTION]   substitutions, code and start states
%%
[RULES SECTION]        defines how to scan and what action to take for each token
%%
[USER CODE]            any user code, copied into lex.yy.c; for example, a main function that calls the scanning function yylex()

7 The definition section
Contains name definitions and declarations of start conditions. Name definitions have the form:
name   definition
Examples:
DIGIT  [0-9]
ID     [a-z][a-z0-9]*
A defined name is used in later patterns as {name}, for example {DIGIT}+.

8 The rules section
Form:
%%
pattern   { action }
…
%%
Patterns are specified by regular expressions. Example:
%%
[A-Za-z]+   { printf("this is a word"); }
%%

9 Extended regular expressions
x        match the character "x"
.        any character except newline
[]       a character class
[xy]     match either an "x" or a "y"
[a-z]    match any letter from "a" to "z"
[^a-z]   any character except those in the class
r*       zero or more r's
r+       one or more r's
r?       zero or one r
{name}   the expansion of the name definition

10 Extended regular expressions
x|y      x or y
x/y      x, but only if followed by y (y is not removed from the input)
x{m,n}   m to n occurrences of x
^x       x, but only at the beginning of a line
x$       x, but only at the end of a line
"s"      exactly what is in the quotes (except for "\" and the character following it)
A regular expression ends at the first space, tab or newline that is not escaped, quoted or inside a character class.

11 Meta-characters
– Meta-characters do not match themselves, because they are used in the regular expressions above:
  ( ) [ ] { } + / , ^ * | . \ " $ ? - %
– To match a meta-character, prefix it with "\"
– To match a backslash, tab or newline, use \\, \t or \n

12 Regular Expression Examples
an integer: 12345                              [1-9][0-9]*
a word: cat                                    [a-zA-Z]+
a (possibly) signed integer: 12345 or -12345   [-+]?[1-9][0-9]*
a floating point number: 1.2345                [0-9]*"."[0-9]+
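POSIX extended regular expressions also have most of the operators above: character classes, *, +, ?, {m,n}, |, ^ and $. This means the examples can be tried from plain C with <regex.h>. The following is a sketch that assumes a POSIX system. The names full_match, INTEGER, WORD, SIGNED and FLOAT are made up for illustration. Lex-only forms have no ERE equivalent, including trailing context (x/y), quoted strings and {name} expansions. For that reason, the lex literal "." is written \. here.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

// Returns true if s matches the POSIX extended regex pattern.
// The patterns below are anchored with ^ and $, so the whole string must match.
bool full_match(const char *pattern, const char *s) {
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false;                  // an invalid pattern counts as "no match"
    bool ok = regexec(&re, s, 0, NULL, 0) == 0;
    regfree(&re);
    return ok;
}

// The slide's examples as anchored POSIX EREs (hypothetical names).
// The lex literal "." becomes \. in an ERE, written "\\." in a C string.
static const char *const INTEGER = "^[1-9][0-9]*$";
static const char *const WORD    = "^[a-zA-Z]+$";
static const char *const SIGNED  = "^[-+]?[1-9][0-9]*$";
static const char *const FLOAT   = "^[0-9]*\\.[0-9]+$";
```

Trying the patterns this way shows some edge cases. The integer pattern [1-9][0-9]* rejects "0" itself. The floating-point pattern accepts ".5" but not "1.". Whether that is acceptable depends on the language being scanned.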

13 Two Rules
1. lex always matches the longest possible token (measured in characters).
2. If two or more possible tokens have the same length, the token whose regular expression is defined first in the lex specification is chosen.
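The two rules can be imitated by hand: try every rule at the current position, keep the longest match, and let an earlier rule win a tie. This sketch is not how flex works internally, since flex compiles all patterns into a single DFA. The token names and matcher functions are hypothetical. The sketch only demonstrates the resulting behaviour when a keyword rule is listed before an identifier rule.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

// Hypothetical token codes for this sketch.
enum token { TOK_NONE = 0, TOK_IF, TOK_ID, TOK_NUM };

// Each "rule" returns how many characters at the start of s it matches (0 = none).
typedef size_t (*matcher)(const char *s);

static size_t match_if(const char *s) {   // "if"
    return strncmp(s, "if", 2) == 0 ? 2 : 0;
}

static size_t match_id(const char *s) {   // [a-z][a-z0-9]*
    if (!islower((unsigned char)s[0]))
        return 0;
    size_t n = 1;
    while (islower((unsigned char)s[n]) || isdigit((unsigned char)s[n]))
        n++;
    return n;
}

static size_t match_num(const char *s) {  // [0-9]+
    size_t n = 0;
    while (isdigit((unsigned char)s[n]))
        n++;
    return n;
}

// The rules in specification order: "if" comes before the identifier rule.
static const struct {
    matcher match;
    enum token tok;
} rules[] = {
    { match_if, TOK_IF },
    { match_id, TOK_ID },
    { match_num, TOK_NUM },
};

// Returns the token chosen at the start of s and stores its length in *len.
enum token next_token(const char *s, size_t *len) {
    enum token best = TOK_NONE;
    size_t best_len = 0;
    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        size_t n = rules[i].match(s);
        if (n > best_len) {  // rule 1: strictly longer wins; a tie keeps the earlier rule (rule 2)
            best = rules[i].tok;
            best_len = n;
        }
    }
    *len = best_len;
    return best;
}
```

Because "if" is listed before the identifier rule, the input if is returned as a keyword: both rules match two characters, and rule 2 breaks the tie. The input iffy is an identifier because the longer match wins under rule 1. Listing the identifier rule first would make the keyword rule unreachable.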

14 How the input is matched
Once the match is determined, the text corresponding to the match is made available in the global character pointer yytext, and its length in the global integer yyleng. The action corresponding to the matched pattern is then executed, and then the remaining input is scanned for another match.

15 Actions
– An action can be any arbitrary C statement; normally it is written between { }.
– If the action is empty, the matched input is simply discarded.
– The action "|" means "same as the action for the next rule".

16 Actions: examples
%%
[ \t\n]+   ;
":="       return ASIG;
"<"        return MINOR;
"if"       return IF;
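To see how these rules behave, here is a hand-written C stand-in for the yylex() that flex would generate from them. It is only a sketch under stated assumptions. The names scan, scan_init and the cursor variables are hypothetical, and so are the numeric token values; in a flex/yacc project the codes normally come from the parser's generated header. tok_text and tok_len merely play the roles of yytext and yyleng. The whitespace rule has an empty action, so it consumes input without returning a token.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical token codes; with yacc they would come from the generated y.tab.h.
enum { END = 0, ASIG = 257, MINOR, IF, ERROR };

static const char *cursor;    // input not yet scanned
static const char *tok_text;  // start of the last match (the role of yytext)
static size_t tok_len;        // length of the last match (the role of yyleng)

void scan_init(const char *input) {
    cursor = input;
}

static int is_blank(char c) {
    return c == ' ' || c == '\t' || c == '\n';
}

// Hand-written stand-in for the yylex() generated from:
//   [ \t\n]+  ;             (empty action: input is discarded)
//   ":="      return ASIG;
//   "<"       return MINOR;
//   "if"      return IF;
int scan(void) {
    for (;;) {
        tok_text = cursor;
        if (*cursor == '\0')
            return END;
        if (is_blank(*cursor)) {
            while (is_blank(*cursor))
                cursor++;
            continue;               // whitespace rule: no return, keep scanning
        }
        int tok;
        if (strncmp(cursor, ":=", 2) == 0) {
            tok = ASIG;  tok_len = 2;
        } else if (*cursor == '<') {
            tok = MINOR; tok_len = 1;
        } else if (strncmp(cursor, "if", 2) == 0) {
            tok = IF;    tok_len = 2;
        } else {
            tok = ERROR; tok_len = 1;  // flex's default rule would ECHO this character instead
        }
        cursor += tok_len;
        return tok;
    }
}
```

A caller would call scan_init once with the input string, then call scan() repeatedly until it returns END. This mirrors how a yacc parser calls yylex() each time it needs the next token.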

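17 Start conditions
A mechanism for conditionally activating rules. A rule prefixed with <name> is active only while the scanner is in start condition name. BEGIN name switches to that condition, and BEGIN 0 (equivalent to BEGIN INITIAL) returns to the initial state.
%s comment
%%
"/*"            { BEGIN comment; }
<comment>"*/"   { BEGIN 0; }
<comment>.      { }
%s declares an inclusive start condition (rules without a prefix stay active); %x declares an exclusive one. Since . does not match newline, multi-line comments also need a <comment>\n rule.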
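The effect of BEGIN is easiest to see as an explicit state variable. The hypothetical strip_comments below is a sketch, written directly in C, of what the comment rules do. It keeps one state per start condition. The "/*" and "*/" rules switch between the states, and characters inside a comment are discarded. Unlike the lex rule ., this sketch also discards newlines inside a comment.

```c
// One state per start condition (the names are hypothetical).
enum state { INITIAL_STATE, COMMENT_STATE };

// Copies in to out, dropping C-style comments.
// out must have room for at least strlen(in) + 1 characters.
void strip_comments(const char *in, char *out) {
    enum state st = INITIAL_STATE;
    while (*in != '\0') {
        if (st == INITIAL_STATE && in[0] == '/' && in[1] == '*') {
            st = COMMENT_STATE;      // like the rule whose action is BEGIN comment
            in += 2;
        } else if (st == COMMENT_STATE && in[0] == '*' && in[1] == '/') {
            st = INITIAL_STATE;      // like the rule whose action is BEGIN 0
            in += 2;
        } else if (st == COMMENT_STATE) {
            in++;                    // inside a comment: discard the character
        } else {
            *out++ = *in++;          // no rule matched: copy it, like flex's default ECHO
        }
    }
    *out = '\0';
}
```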

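18 Special Functions
– yytext: where the text most recently matched is stored
– yyleng: number of characters in the text most recently matched
– yylval: associated value of the current token (used to pass it to a yacc parser)
– yymore(): append the next string matched to the current contents of yytext
– yyless(n): keep only the first n characters of the current match in yytext; the rest are returned to the input and scanned again
– unput(c): put character c back into the input stream
– yywrap(): may be replaced by the user. The lexical analyzer calls yywrap whenever it reads an EOF as the first character when trying to match a regular expression. If yywrap returns 1, scanning stops. If it returns 0, the scanner assumes yywrap has arranged new input (for example by pointing yyin at another file) and continues.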
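Of these functions, yyless(n) is perhaps the least obvious. Conceptually it only moves the scanner's read position back into the current match. The rough C model below is a sketch; the struct and function names are hypothetical. It ignores details such as flex keeping yytext NUL-terminated.

```c
#include <stddef.h>

// Hypothetical model of a scanner's position in its input buffer.
struct scan_pos {
    const char *text;    // start of the current match (the role of yytext)
    size_t len;          // length of the current match (the role of yyleng)
    const char *cursor;  // where the next match will begin
};

// Record a match of n characters starting at the cursor.
void pos_match(struct scan_pos *p, size_t n) {
    p->text = p->cursor;
    p->len = n;
    p->cursor += n;
}

// Like yyless(n): keep the first n characters of the current match and
// return the rest to the input, so they are scanned again.
void pos_less(struct scan_pos *p, size_t n) {
    if (n > p->len)
        n = p->len;      // cannot keep more than was matched
    p->len = n;
    p->cursor = p->text + n;
}
```

For example, after a rule matches "foobar", calling the equivalent of yyless(3) leaves "foo" as the token. The next match then starts at "bar".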

19 Let us run a lex program

