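1 Lexical Analysis
Consider the program:
#include <stdio.h>
int main(void)
{
    double value = 0.95;
    printf("value = %f\n", value);
}
How is this translated into meaningful machine instructions? First, each separate entity must be recognised. For example, the 5th line is broken into the identifier printf, a left bracket, the string "value = %f\n", a comma, the identifier value, a right bracket and a semicolon. This process is known as lexical analysis.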
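To make the idea concrete, here is a minimal hand-written sketch in C of what a lexical analyser does: it splits a line of source text into separate entities. The token classes (TOK_IDENT and so on) and the function next_token are invented for illustration. The rules are deliberately simplified: keywords such as double are reported as identifiers, and a number is any run of digits and dots.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Token classes invented for this sketch; they are not part of any library. */
typedef enum { TOK_IDENT, TOK_NUMBER, TOK_STRING, TOK_PUNCT, TOK_END } TokenKind;

/* Scan the token that starts at src[*pos], skipping any blanks before it.
   The token text is copied into buf (cut short if it does not fit; bufsize
   must be at least 1), *pos is advanced past the token and its kind is
   returned. TOK_END is returned at the end of the string. */
TokenKind next_token(const char *src, size_t *pos, char *buf, size_t bufsize)
{
    size_t i = *pos;
    TokenKind kind;

    while (src[i] == ' ' || src[i] == '\t' || src[i] == '\n')
        i++;                                  /* blanks separate tokens */
    size_t start = i;

    if (src[i] == '\0') {
        kind = TOK_END;
    } else if (isalpha((unsigned char)src[i]) || src[i] == '_') {
        while (isalnum((unsigned char)src[i]) || src[i] == '_')
            i++;                              /* identifier (or keyword) */
        kind = TOK_IDENT;
    } else if (isdigit((unsigned char)src[i])) {
        while (isdigit((unsigned char)src[i]) || src[i] == '.')
            i++;                              /* e.g. 0.95 */
        kind = TOK_NUMBER;
    } else if (src[i] == '"') {
        i++;                                  /* opening quote */
        while (src[i] != '\0' && src[i] != '"') {
            if (src[i] == '\\' && src[i + 1] != '\0')
                i++;                          /* keep \n, \" etc. inside the string */
            i++;
        }
        if (src[i] == '"')
            i++;                              /* closing quote */
        kind = TOK_STRING;
    } else {
        i++;                                  /* one-character punctuation */
        kind = TOK_PUNCT;
    }

    size_t len = i - start;
    if (len >= bufsize)
        len = bufsize - 1;
    memcpy(buf, src + start, len);
    buf[len] = '\0';
    *pos = i;
    return kind;
}
```

Calling next_token repeatedly on the 5th line returns printf, the left bracket, the quoted string, the comma, value, the right bracket and the semicolon in turn, and then TOK_END. Lex generates this kind of code automatically from regular expressions instead of having it written by hand.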

2 Application: Lex
Lex is a program generator: from a series of regular expressions, lex produces a lexical analyser.
Lex input file:
... definitions ...
%%
... regular expression/action pairs ...
%%
... user-defined functions ...

3 Lex Regular Expressions
Meta-characters (these do not match themselves):
( ) [ ] { } < > + / ^ * | . \ " $ ? - %
Let c be a character, x and y regular expressions, s a string, m and n integers, and i an identifier. Regular expressions:
c          any character except the meta-characters
[...]      any one of the enclosed characters (may include a range such as a-z)
[^...]     any one character not enclosed
.          any ASCII character except newline
xy         concatenation of x and y
x/y        x, only if followed by y (y is not consumed)
x{m,n}     m to n occurrences of x
^x         x, only at the beginning of a line
x$         x, only at the end of a line
"s"        exactly what is in the quotes (except for "\" and the following character)
x*         zero or more occurrences of x
x+         one or more occurrences of x
x?         an optional x (zero or one occurrence)
x|y        x or y
{i}        the definition of i
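Much of this notation also appears in POSIX extended regular expressions, so a quick way to experiment with patterns before putting them in a lex file is the POSIX `<regex.h>` interface. This interface is available on Unix-like systems but is not part of ISO C. It shares [...], [^...], ., *, +, ?, |, {m,n}, ^ and $ with lex. It has no "s" quoting, no x/y trailing context and no {i} definitions. Unless REG_NEWLINE is given, its . also matches newline. The helper name full_match is hypothetical.

```c
#include <regex.h>
#include <stdbool.h>
#include <stdio.h>

/* Return true if the whole of text matches pattern, read as a POSIX
   extended regular expression. The pattern is wrapped in ^( )$ so that
   a match of only part of text does not count. */
bool full_match(const char *pattern, const char *text)
{
    char anchored[256];
    regex_t re;
    bool matched;
    int n = snprintf(anchored, sizeof anchored, "^(%s)$", pattern);

    if (n < 0 || (size_t)n >= sizeof anchored)
        return false;                 /* pattern too long for this sketch */
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return false;                 /* not a valid regular expression */
    matched = regexec(&re, text, 0, NULL, 0) == 0;
    regfree(&re);
    return matched;
}
```

For example, full_match("x{2,3}", "xxx") is true but full_match("x{2,3}", "xxxx") is false. full_match("[^0-9]", "7") is false.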

4 Lex Regular Expressions (cont.)
A meta-character is matched literally by preceding it with "\".
Regular expressions are terminated by a space or tab.
Backslash, tab and newline are represented by \\, \t and \n.
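These escaping rules are mechanical, so they can be captured in a small helper that turns ordinary text into a lex pattern matching it literally. The following is a minimal sketch with a hypothetical function lex_escape. It assumes the meta-character set ( ) [ ] { } < > + / ^ * | . \ " $ ? - %. Because a space would end the pattern, it also writes a space as "\ ". That follows flex's documented rule that a backslash before any other character makes it literal. Tab and newline become \t and \n.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Characters lex treats as operators (the meta-characters). */
static const char LEX_META[] = "()[]{}<>+/^*|.\\\"$?-%";

/* Rewrite text as a lex pattern that matches it literally: each
   meta-character and each space gets a preceding backslash, and tab and
   newline become \t and \n. Returns false if out (of outsize bytes) is too
   small for the result and its terminating '\0'. */
bool lex_escape(const char *text, char *out, size_t outsize)
{
    size_t j = 0;

    for (size_t i = 0; text[i] != '\0'; i++) {
        char c = text[i];
        char rep[3] = { '\\', c, '\0' };      /* default: backslash + c */

        if (c == '\t')
            rep[1] = 't';
        else if (c == '\n')
            rep[1] = 'n';
        else if (c != ' ' && strchr(LEX_META, c) == NULL) {
            rep[0] = c;                       /* ordinary character: copy as is */
            rep[1] = '\0';
        }

        size_t n = strlen(rep);
        if (j + n >= outsize)
            return false;
        memcpy(out + j, rep, n);
        j += n;
    }
    if (j >= outsize)
        return false;                         /* only possible when outsize == 0 */
    out[j] = '\0';
    return true;
}
```

For example, the text 3.14 becomes the pattern 3\.14, and := is left unchanged because neither character is a meta-character.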

5 Definitions
If a line of the form "identifier string" appears in the definition section, then string replaces {identifier} wherever it is used in a rule. For example,
L       [a-zA-Z]
%%
{L}+    ;
is the same as:
%%
[a-zA-Z]+    ;
Anything enclosed between %{ and %} in this section will be copied straight into lex.yy.c. Include and define statements, all variables, all function definitions, and any comments should be placed here, e.g.
%{
#include <stdio.h>
/* an example program */
%}
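One way to picture the substitution of {L} is as plain text replacement done before the patterns are compiled. The sketch below uses a hypothetical function, expand_definition, that replaces each {name} in a pattern with the definition's string. It wraps the replacement in parentheses so that an operator such as + after {name} applies to the whole definition. That is a choice made for this sketch. For the slide's example it gives ([a-zA-Z])+, which matches exactly what [a-zA-Z]+ matches. Real lex implementations handle the details in their own way.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Append n bytes of s to out at position *j, keeping out '\0'-terminated.
   Fails if the result would not fit in outsize bytes. */
static bool append(char *out, size_t outsize, size_t *j, const char *s, size_t n)
{
    if (*j + n >= outsize)
        return false;
    memcpy(out + *j, s, n);
    *j += n;
    out[*j] = '\0';
    return true;
}

/* Copy pattern into out, replacing every {name} with (value). */
bool expand_definition(const char *pattern, const char *name,
                       const char *value, char *out, size_t outsize)
{
    char ref[64];
    size_t j = 0;
    int reflen = snprintf(ref, sizeof ref, "{%s}", name);

    if (outsize == 0 || reflen < 0 || (size_t)reflen >= sizeof ref)
        return false;
    out[0] = '\0';
    for (const char *p = pattern; *p != '\0'; ) {
        if (strncmp(p, ref, (size_t)reflen) == 0) {
            /* Parenthesise so a following operator applies to the whole definition. */
            if (!append(out, outsize, &j, "(", 1) ||
                !append(out, outsize, &j, value, strlen(value)) ||
                !append(out, outsize, &j, ")", 1))
                return false;
            p += reflen;
        } else {
            if (!append(out, outsize, &j, p, 1))
                return false;
            p++;
        }
    }
    return true;
}
```

Only the named definition is expanded. In {L}({L}|{D})* with name L, each {L} is replaced and {D} is left for a later call with its own definition.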

6 Actions
An action is a C-language statement followed by ;. Example:
[0-9]+       printf("Integer\n");
[a-zA-Z]+    printf("String\n");
will output "Integer" after receiving a digit string, and "String" after receiving a character string. Characters that match no rule are copied to the output unchanged. The input
12+19=sum;
will result in:
Integer
+Integer
=String
;
Note: the text matched by a regular expression is held in the string yytext, and its length is held in the integer yyleng.
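The behaviour of these two rules can be reproduced by hand, which also shows where the "+", "=" and ";" in the output come from. The sketch below uses a hypothetical function, scan_actions. At each position it consumes the longest run of digits or letters and emits the corresponding action's text. Any other character is echoed, as lex does by default. The characters consumed in one pass of the loop correspond to what lex would place in yytext, and their count to yyleng. For easy checking, output goes into a buffer rather than stdout.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hand-written equivalent of the two rules
       [0-9]+       printf("Integer\n");
       [a-zA-Z]+    printf("String\n");
   A character matched by neither rule is copied through unchanged.
   The output is written to out, cut short to outsize - 1 characters. */
void scan_actions(const char *in, char *out, size_t outsize)
{
    size_t j = 0;

    for (size_t i = 0; in[i] != '\0'; ) {
        char echo[2] = { in[i], '\0' };
        const char *emit;

        if (isdigit((unsigned char)in[i])) {
            while (isdigit((unsigned char)in[i]))
                i++;                          /* longest digit string */
            emit = "Integer\n";
        } else if (isalpha((unsigned char)in[i])) {
            while (isalpha((unsigned char)in[i]))
                i++;                          /* longest letter string */
            emit = "String\n";
        } else {
            i++;
            emit = echo;                      /* unmatched: echo it */
        }

        size_t n = strlen(emit);
        for (size_t k = 0; k < n && j + 1 < outsize; k++)
            out[j++] = emit[k];
    }
    if (outsize > 0)
        out[j] = '\0';
}
```

With the input 12+19=sum; the buffer holds the four lines Integer, +Integer, =String and ;, matching the output on the slide.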

7 Running Lex
To run a lex program "example.l", type:
lex example.l
cc lex.yy.c -ll
./a.out
The -ll option links in the lex library. This library contains a main program, which calls yylex(). You can override this by defining your own main.

8 Example Lex Program
%{
/* simple word recognition */
%}
L       [a-zA-Z]
%%
[ \t]+                  ;   /* ignore whitespace */
is|are                  printf("verb: %s; ", yytext);
a|the                   printf("determiner: %s; ", yytext);
dog|cat|male|female     printf("noun: %s; ", yytext);
{L}+                    printf("unknown: %s; ", yytext);
.|\n                    ECHO;
%%
int main(void)
{
    yylex();
    return 0;
}
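The same classification can be written by hand in C, which makes two lex rules visible. First, the longest match wins, so "catdog" is one unknown word rather than "cat" followed by "dog". Second, when two rules match the same text, the one listed first wins, so "is" is a verb even though {L}+ matches it too. The functions classify_word and classify_line below are hypothetical names for this sketch. classify_line writes into a buffer instead of printing.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if w equals one of the n strings in list. */
static int in_list(const char *w, const char *const list[], size_t n)
{
    for (size_t k = 0; k < n; k++)
        if (strcmp(w, list[k]) == 0)
            return 1;
    return 0;
}

/* Category the example program prints for a word of letters. The keyword
   rules are listed before {L}+, so for "is" (matched equally long by both)
   the earlier rule wins; anything else falls through to "unknown". */
const char *classify_word(const char *w)
{
    static const char *const verbs[] = { "is", "are" };
    static const char *const dets[]  = { "a", "the" };
    static const char *const nouns[] = { "dog", "cat", "male", "female" };

    if (in_list(w, verbs, 2))
        return "verb";
    if (in_list(w, dets, 2))
        return "determiner";
    if (in_list(w, nouns, 4))
        return "noun";
    return "unknown";
}

/* Process one line as the example scanner would: blanks and tabs are
   skipped, each maximal run of letters is reported as "category: word; ",
   and any other character (including '\n') is echoed. The result is
   written to out, cut short if outsize is too small. */
void classify_line(const char *line, char *out, size_t outsize)
{
    char word[64];
    size_t j = 0;

    if (outsize == 0)
        return;
    out[0] = '\0';
    for (size_t i = 0; line[i] != '\0'; ) {
        if (line[i] == ' ' || line[i] == '\t') {
            i++;                              /* [ \t]+  ; */
        } else if (isalpha((unsigned char)line[i])) {
            size_t start = i, len;
            while (isalpha((unsigned char)line[i]))
                i++;                          /* longest run of letters */
            len = i - start;
            if (len >= sizeof word)
                len = sizeof word - 1;
            memcpy(word, line + start, len);
            word[len] = '\0';
            int n = snprintf(out + j, outsize - j, "%s: %s; ",
                             classify_word(word), word);
            if (n < 0 || (size_t)n >= outsize - j)
                return;                       /* output buffer full */
            j += (size_t)n;
        } else {
            if (j + 1 < outsize) {
                out[j++] = line[i];           /* .|\n  ECHO; */
                out[j] = '\0';
            }
            i++;
        }
    }
}
```

Given the line "catdog is male" followed by a newline, classify_line produces "unknown: catdog; verb: is; noun: male; " and then the echoed newline.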

9 Example Session
% word
the dog is a male
determiner: the; noun: dog; verb: is; determiner: a; noun: male;
female cat dog is
noun: female; noun: cat; noun: dog; verb: is;
catdog is male
unknown: catdog; verb: is; noun: male;
^D
%

10 Practical 1: Lexical Analysis
Aim: to write a lexical analyser in C using Lex, for the language L, defined below.
identifiers: a sequence of one or more letters; must be declared before use, as int or real.
integers: optional sign, one or more digits.
reals: optional sign, one or more digits, decimal point, one or more digits.
expressions: bracketed expressions using +, -, *, / and :=.
comments: start with ! and run to the end of the line.
print statements: either printi or printr, for printing integers and reals respectively, with one argument.
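The integer and real definitions are precise enough to check by hand, and the same function can perform the string-to-number conversion the practical asks for. This is a minimal sketch only. In the actual practical the patterns would be lex regular expressions, for instance [+-]?[0-9]+ and [+-]?[0-9]+\.[0-9]+. The names NumKind and classify_number are hypothetical and not from the supplied tokens.h. Overflow is not checked.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical result type for this sketch (not from tokens.h). */
typedef enum { NUM_NONE, NUM_INT, NUM_REAL } NumKind;

/* Decide whether the whole of s is an L integer (optional sign, one or
   more digits) or an L real (optional sign, one or more digits, '.', one
   or more digits). On success the converted value is stored through
   ival or rval; otherwise NUM_NONE is returned and nothing is stored. */
NumKind classify_number(const char *s, long *ival, double *rval)
{
    size_t i = 0;

    if (s[i] == '+' || s[i] == '-')
        i++;                                  /* optional sign */
    if (!isdigit((unsigned char)s[i]))
        return NUM_NONE;                      /* need at least one digit */
    while (isdigit((unsigned char)s[i]))
        i++;
    if (s[i] == '\0') {
        *ival = strtol(s, NULL, 10);          /* accepts the leading sign */
        return NUM_INT;
    }
    if (s[i] != '.')
        return NUM_NONE;
    i++;                                      /* decimal point */
    if (!isdigit((unsigned char)s[i]))
        return NUM_NONE;                      /* need a digit after '.' */
    while (isdigit((unsigned char)s[i]))
        i++;
    if (s[i] != '\0')
        return NUM_NONE;
    *rval = strtod(s, NULL);
    return NUM_REAL;
}
```

Under these definitions "3." and ".5" are neither integers nor reals, because a real needs digits on both sides of the decimal point.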

11 Example L Program
! example L program
real a;
real baboon;
int x y;
! end of declarations
x := 300;
printi(x);
y := 7 - x;
a := / 3 * 5 - 5;
baboon := a * y;
printi(5);
printr(baboon);

12 Required Structure
Output should be in the form of pairs, and every element of the program should be classified. Thus, output for the 9th line should be:
Numbers should be converted from strings to the appropriate form.
The input must be described by regular expressions. You must use Lex.
A "tokens.h" file will be supplied, defining all the different tokens to be used. You should output the token names and not the associated numbers.
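Printing names rather than numbers usually means keeping a table that maps each token code back to its name. The sketch below assumes the format is not preserved on the slides. The token codes are hypothetical stand-ins for whatever tokens.h actually defines. The <NAME, value> output format is also an assumption. The function format_pair is illustrative only. A main loop along the lines of "while ((t = yylex()) != 0) print the pair for t and yytext" could call it.

```c
#include <stdio.h>

/* Hypothetical token codes standing in for the supplied tokens.h, whose
   actual names and values are not shown on the slides. */
enum { IDENT = 1, INTNUM, REALNUM, ASSIGN, PLUS, MINUS, TIMES, DIVIDE,
       LBRACKET, RBRACKET, SEMICOLON, NTOKENS };

/* Name table indexed by token code, so the analyser can print names. */
static const char *const token_names[NTOKENS] = {
    [IDENT] = "IDENT",       [INTNUM] = "INTNUM",   [REALNUM] = "REALNUM",
    [ASSIGN] = "ASSIGN",     [PLUS] = "PLUS",       [MINUS] = "MINUS",
    [TIMES] = "TIMES",       [DIVIDE] = "DIVIDE",   [LBRACKET] = "LBRACKET",
    [RBRACKET] = "RBRACKET", [SEMICOLON] = "SEMICOLON",
};

/* Write one pair into buf in the assumed format <NAME, value>; unknown
   codes are shown as UNKNOWN. Returns what snprintf returns. */
int format_pair(char *buf, size_t size, int token, const char *value)
{
    const char *name = "UNKNOWN";

    if (token > 0 && token < NTOKENS && token_names[token] != NULL)
        name = token_names[token];
    return snprintf(buf, size, "<%s, %s>", name, value);
}
```

Keeping the names in one array indexed by the token code means that adding a token needs only one new enum value and one new table entry.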

