Yacc.

1 Yacc

2 Lexical vs. Syntactic Analysis
Phase    Input                    Output
Lexer    Sequence of characters   Sequence of tokens
Parser   Sequence of tokens       Parse tree
Lex is a tool for writing lexical analyzers. Yacc is a tool for constructing parsers.
Slide credit: Wes Weimer

3 The Functionality of the Parser
Input: sequence of tokens from the lexer, e.g., the output of the lex files you will write in the first phase of your course project.
Output: parse tree of the program, usually simplified into an abstract syntax tree.
Output: an error if the input is not valid, e.g., "parse error on line 3".
Slide credit: Wes Weimer

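4 Example
Cool program text: if x = y then 1 else 2 fi
Parser input (tokens): IF ID = ID THEN INT ELSE INT FI
Parser output (tree): an IF-THEN-ELSE node with three children: the comparison = (whose children are ID and ID), INT, and INT.
Slide credit: Wes Weimer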
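The lexer-to-parser hand-off on this slide can be sketched by hand in C. This is only an illustration: the function name `tokenize` and the keyword table are made up for this example, and in the course project the lexer would be generated by lex rather than written this way.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: writes the token names for `src` into `out`,
 * separated by single spaces. Returns 0 on success, -1 on an unknown
 * character or if `out` is too small. */
int tokenize(const char *src, char *out, size_t outsz)
{
    static const char *keywords[] = { "if", "then", "else", "fi" };
    size_t used = 0;

    assert(src != NULL && out != NULL && outsz > 0);
    out[0] = '\0';

    while (*src) {
        char tok[32];

        if (isspace((unsigned char)*src)) { src++; continue; }

        if (isdigit((unsigned char)*src)) {            /* integer literal */
            while (isdigit((unsigned char)*src)) src++;
            strcpy(tok, "INT");
        } else if (isalpha((unsigned char)*src)) {     /* keyword or identifier */
            char word[32];
            size_t n = 0;
            while (isalnum((unsigned char)*src)) {
                if (n + 1 < sizeof word) word[n++] = *src;
                src++;
            }
            word[n] = '\0';
            strcpy(tok, "ID");
            for (size_t k = 0; k < sizeof keywords / sizeof keywords[0]; k++) {
                if (strcmp(word, keywords[k]) == 0) {
                    /* a keyword's token name is the keyword in upper case */
                    for (size_t i = 0; i <= n; i++)
                        tok[i] = (char)toupper((unsigned char)word[i]);
                }
            }
        } else if (*src == '=') {                      /* literal character token */
            strcpy(tok, "=");
            src++;
        } else {
            return -1;                                 /* not part of this tiny language */
        }

        size_t need = strlen(tok) + (used ? 1 : 0);
        if (used + need + 1 > outsz) return -1;
        if (used) out[used++] = ' ';
        strcpy(out + used, tok);
        used += strlen(tok);
    }
    return 0;
}
```

Calling `tokenize("if x = y then 1 else 2 fi", buf, sizeof buf)` fills `buf` with the token sequence shown on the slide; the parser then works on those token names only, never on the characters.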

5 The Role of the Parser
Not all sequences of tokens are programs, e.g.:
    then x * / + 3 while x ; y z then
The parser must distinguish between valid and invalid sequences of tokens. To describe the valid ones we need context-free grammars.
Yacc stands for Yet Another Compiler-Compiler. It reads a specification file that codifies the grammar of a language and generates a parsing routine.

7 Yacc
A yacc specification describes a context-free grammar (CFG), from which yacc generates a parser. Elements of a CFG:
1. Terminals: tokens and literal characters
2. Variables (nonterminals): syntactic elements
3. Production rules
4. Start rule

8 Yacc
Format of a production rule:
    symbol: definition {action} ;
Example: A → Bc is written in yacc as
    a: b 'c';

9 Yacc Format
Format of a yacc specification file:
    declarations
    %%
    grammar rules and associated actions
    %%
    C programs

10 Declarations
Declarations define tokens and their characteristics:
%token: declare names of tokens
%left: define left-associative operators
%right: define right-associative operators
%nonassoc: define operators that may not associate with themselves
%type: declare the value type of variables (nonterminals)

11 Declarations
%union: declare multiple data types for semantic values
%start: declare the start symbol (default is the first variable in the rules)
%prec: assign a precedence to a rule (written inside the rule, in the rules section)
%{ ... %}: C declarations copied directly to the resulting C program (e.g., variables, types, macros)

12 A simple yacc specification to accept L = { a^n b^n | n >= 1 }
    /* anbn0.y */
    %token A B
    %%
    start: anbn '\n' {return 0;} ;
    anbn: A B
        | A anbn B
        ;
    %%
    #include "lex.yy.c"
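To see what the two alternatives of anbn accept, the same grammar can be hand-coded as a recursive-descent recognizer in C. Note that this is a different technique from the table-driven parser yacc generates, and the names `parse_anbn` and `accepts` are invented for this sketch; it reads the characters a and b directly instead of the tokens A and B.

```c
#include <assert.h>
#include <stddef.h>

/* Hand-written counterpart of:  anbn : A B | A anbn B ;
 * Consumes a matching prefix of s starting at *pos.
 * Returns 1 on success, 0 otherwise. */
static int parse_anbn(const char *s, size_t *pos)
{
    if (s[*pos] != 'a') return 0;    /* both alternatives start with A */
    (*pos)++;
    if (s[*pos] == 'a') {            /* second alternative: A anbn B */
        if (!parse_anbn(s, pos)) return 0;
    }
    if (s[*pos] != 'b') return 0;    /* both alternatives end with B */
    (*pos)++;
    return 1;
}

/* Counterpart of:  start : anbn '\n' ; */
int accepts(const char *line)
{
    size_t pos = 0;
    assert(line != NULL);
    return parse_anbn(line, &pos) && line[pos] == '\n';
}
```

With this sketch, `accepts("aabb\n")` succeeds while `accepts("\n")` and `accepts("aab\n")` fail, matching the language with n >= 1.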

13 Lex – yacc pair
    /* anbn0.l */
    %%
    a     return (A);
    b     return (B);
    .     return (yytext[0]);
    \n    return ('\n');

    /* anbn0.y */
    %token A B
    %%
    start: anbn '\n' {return 0;} ;
    anbn: A B
        | A anbn B
        ;
    %%
    #include "lex.yy.c"

14 Running yacc on Linux
On Linux the yacc library liby.a, which supplies default main() and yyerror() functions, is often not available, so add the following lines to the end of your yacc specification file:
    int yyerror(char *s) { printf("%s\n", s); return 0; }
    int main(void) { yyparse(); return 0; }
Then type:
    gcc -o exe_file y.tab.c -lfl

15 Printing messages
If the input stream does not match start, the default message "syntax error" is printed and the program terminates. However, customized error messages can be generated.
    /* anbn1.y */
    %token A B
    %%
    start: anbn '\n' {printf(" is in anbn\n"); return 0;} ;
    anbn: A B
        | A anbn B
        ;
    %%
    #include "lex.yy.c"
    int yyerror(char *s) { printf("%s, it is not in anbn\n", s); return 0; }

16 Example Output
    $ anbn
    aabb
     is in anbn
    acadbefbg
    Syntax error, it is not in anbn

17 A grammar to accept L = { a^n b^n | n >= 0 }
    /* anbn_0.y */
    %token A B
    %%
    start: anbn '\n' {printf(" is in anbn_0\n"); return 0;} ;
    anbn: empty
        | A anbn B
        ;
    empty: ;
    %%
    #include "lex.yy.c"
    int yyerror(char *s) { printf("%s, it is not in anbn_0\n", s); return 0; }

18 Positional assignment of values for items
$$: value of the left-hand side
$1: value of the first item on the right-hand side
$n: value of the nth item on the right-hand side
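One way to picture the positional names is as offsets into a stack of values that the parser keeps alongside its states. The C sketch below is a simplified model, not yacc's generated code: `vstack`, `push_value`, and `reduce_add` are hypothetical names, and the rule `expr : expr '+' expr { $$ = $1 + $3; }` is assumed only for illustration.

```c
#include <assert.h>

#define MAXDEPTH 64

/* Hypothetical miniature of the parser's value stack. */
int vstack[MAXDEPTH];
int vtop = 0;                        /* number of values currently on the stack */

void push_value(int v)
{
    assert(vtop < MAXDEPTH);
    vstack[vtop++] = v;
}

/* Reduce by a rule such as  expr : expr '+' expr { $$ = $1 + $3; }
 * whose right-hand side has three items. */
int reduce_add(void)
{
    assert(vtop >= 3);
    int *rhs = &vstack[vtop - 3];    /* rhs[0] is $1, rhs[1] is $2, rhs[2] is $3 */
    int result = rhs[0] + rhs[2];    /* $$ = $1 + $3 */
    vtop -= 3;                       /* pop the values of the right-hand side */
    push_value(result);              /* push $$ in their place */
    return result;
}
```

After pushing the values for `4`, `'+'`, and `5` and calling `reduce_add()`, the stack holds the single value 9, which becomes $1 or $3 of whatever rule is reduced next.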

19 Example: printing integers
    /* print-int.l */
    %%
    [0-9]+    {sscanf(yytext, "%d", &yylval); return(INTEGER); }
    \n        return(NEWLINE);
    .         return(yytext[0]);

    /* print-int.y */
    %token INTEGER NEWLINE
    %%
    lines: /* empty */
        | lines NEWLINE
        | lines line NEWLINE {printf("=%d\n", $2);}
        | error NEWLINE {yyerror("Reenter:"); yyerrok;}
        ;
    line: INTEGER {$$ = $1;}
        ;
    %%
    #include "lex.yy.c"

20 Example continued
    $ print-int
    7
    =7
    007
    zippy
    syntax error
    Reenter:
    _

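21 Recursive Rules
Although right-recursive rules can be used in yacc, left-recursive rules are preferred. With a left-recursive rule the parser can reduce after each item, so the stack stays small; with a right-recursive rule every item must be shifted onto the stack before the first reduction, so a long list can overflow the parser stack.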
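The difference can be made concrete by counting how many grammar symbols sit on the parser stack while a list of n items is parsed. The C sketch below only simulates the shift and reduce steps for two assumed rules, `list : ITEM | list ITEM ;` (left-recursive) and `list : ITEM | ITEM list ;` (right-recursive); the function names are invented, and the bottom-of-stack state is not counted.

```c
#include <assert.h>

/* Maximum stack depth for:  list : ITEM | list ITEM ;  (left-recursive) */
int max_depth_left(int n)
{
    int depth = 0, max = 0;
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        depth++;                     /* shift ITEM */
        if (depth > max) max = depth;
        depth = 1;                   /* reduce: ITEM or "list ITEM" becomes list */
    }
    return max;
}

/* Maximum stack depth for:  list : ITEM | ITEM list ;  (right-recursive) */
int max_depth_right(int n)
{
    int depth = 0, max = 0;
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        depth++;                     /* shift ITEM; nothing can be reduced yet */
        if (depth > max) max = depth;
    }
    /* At end of input the last ITEM is reduced to list (depth unchanged),
     * then "ITEM list" is reduced n-1 times, shrinking the stack each time. */
    while (depth > 1) depth--;
    return max;
}
```

For 1000 items the left-recursive rule never needs more than two symbols on the stack, whereas the right-recursive rule needs all 1000 at once.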

22 yylval
The yylex() function returns an integer, the token number, representing the kind of token read. If a value is associated with that token, yylex() should assign it to the external variable yylval before returning.

23 yylval
The type of yylval is int by default. To change it, define the macro YYSTYPE in the declarations section of the yacc specification file:
    %{
    #define YYSTYPE double
    %}
If token values have more than one data type, yylval is declared as a union.

24 yylval
Example with three possible types for yylval:
    %union {
        double real;      /* real value */
        int integer;      /* integer value */
        char str[30];     /* string value */
    }
Examples:
yytext = "0012": type of yylval is int (member integer), value of yylval is 12
yytext = "+1.70": type of yylval is double (member real), value of yylval is 1.7
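The two yytext examples can be reproduced in plain C. In the sketch below the union is written out by hand with the same members as the %union above (yacc would generate it as YYSTYPE); the helper `set_yylval` and the token numbers are assumptions made for this illustration, standing in for the actions a lex specification would contain.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Same members as the %union on the slide. */
typedef union {
    double real;       /* real value */
    int    integer;    /* integer value */
    char   str[30];    /* string value */
} YYSTYPE;

YYSTYPE yylval;

enum { INTEGER = 258, REAL, STRING };    /* hypothetical token numbers */

/* Hypothetical stand-in for lex actions: classify one lexeme, store its
 * value in the matching member of yylval, and return the token number. */
int set_yylval(const char *yytext)
{
    int consumed = 0;

    assert(yytext != NULL);

    /* %n records how many characters were consumed, so a lexeme only
     * counts as an integer if the whole of it was read by %d. */
    if (sscanf(yytext, "%d%n", &yylval.integer, &consumed) == 1
            && yytext[consumed] == '\0')
        return INTEGER;

    consumed = 0;
    if (sscanf(yytext, "%lf%n", &yylval.real, &consumed) == 1
            && yytext[consumed] == '\0')
        return REAL;

    /* Anything else is kept as a string, truncated to fit the member. */
    strncpy(yylval.str, yytext, sizeof yylval.str - 1);
    yylval.str[sizeof yylval.str - 1] = '\0';
    return STRING;
}
```

Only one member of the union is meaningful at a time; the returned token number tells the parser which one to read, which is the job %token <member> declarations do in a real specification.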

25 Token types
The type of the values associated with tokens can be specified by %token:
    %token <real> REAL
    %token <integer> INTEGER
    %token <str> IDENTIFIER STRING
The type of variables can be defined by %type (yacc names may not contain a hyphen, so underscores are used here):
    %type <real> real_expr
    %type <integer> integer_expr

26 To return values for tokens from a lexical analyzer:
    /* lexical-analyzer.l */
    alphabetic     [A-Za-z]
    digit          [0-9]
    alphanumeric   ({alphabetic}|{digit})
    %%
    [+-]?{digit}*(\.)?{digit}+      {sscanf(yytext, "%lf", &yylval.real); return REAL; }
    {alphabetic}{alphanumeric}*     {strcpy(yylval.str, yytext); return IDENTIFIER; }

27 Operator Precedence
All of the tokens on the same line are assumed to have the same precedence level and associativity. The lines are listed in order of increasing precedence (binding strength).
    %left '+' '-'
    %left '*' '/'
describes the precedence and associativity of the four arithmetic operators. Plus and minus are left associative and have lower precedence than star and slash, which are also left associative.
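The effect of these two declarations on how an expression is grouped can be checked with a small evaluator in C. The sketch uses precedence climbing, which is a different technique from the way yacc resolves conflicts in its tables, and it assumes single-digit operands with no spaces; `prec`, `eval`, and `evaluate` are names invented for the example.

```c
#include <assert.h>
#include <ctype.h>

/* Binding strength, mirroring the order of the declarations:
 *   %left '+' '-'   -> level 1 (listed first, binds weakest)
 *   %left '*' '/'   -> level 2 (listed later, binds tighter) */
static int prec(char op)
{
    switch (op) {
    case '+': case '-': return 1;
    case '*': case '/': return 2;
    default:            return 0;    /* not a binary operator */
    }
}

/* Evaluate operators whose level is at least min_prec (min_prec >= 1).
 * *p is advanced past everything that was consumed. */
static int eval(const char **p, int min_prec)
{
    assert(isdigit((unsigned char)**p));
    int lhs = *(*p)++ - '0';

    while (prec(**p) >= min_prec) {
        char op = *(*p)++;
        /* Left associativity: the right operand may only contain
         * operators that bind strictly tighter than op. */
        int rhs = eval(p, prec(op) + 1);
        switch (op) {
        case '+': lhs += rhs; break;
        case '-': lhs -= rhs; break;
        case '*': lhs *= rhs; break;
        case '/': assert(rhs != 0); lhs /= rhs; break;
        }
    }
    return lhs;
}

int evaluate(const char *s)
{
    return eval(&s, 1);
}
```

Here `evaluate("2+3*4")` groups as 2+(3*4) because star sits on the later line, and `evaluate("8-3-2")` groups as (8-3)-2 because minus is declared left associative.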

28 Example: simple calculator - lex
    integer   [0-9]+
    dreal     ([0-9]*\.[0-9]+)
    ereal     ([0-9]*\.[0-9]+[Ee][+-]?[0-9]+)
    real      {dreal}|{ereal}
    nl        \n
    %%
    [ \t]       ;
    {integer}   { sscanf(yytext, "%d", &yylval.integer); return INTEGER; }
    {real}      { sscanf(yytext, "%lf", &yylval.real); return REAL; }
    \+          { return PLUS;}
    \-          { return MINUS;}
    \*          { return TIMES;}
    \/          { return DIVIDE;}
    \(          { return LP;}
    \)          { return RP;}
    {nl}        { extern int lineno; lineno++; return NL; }
    .           { return yytext[0]; }
    %%
    int yywrap() { return 1; }

29 Example: simple calculator - yacc
    %{
    #include <stdio.h>
    int lineno = 0;    /* assumed definition; the lexer refers to it as extern */
    %}
    %union {
        double real;      /* real value */
        int integer;      /* integer value */
    }
    %token <real> REAL
    %token <integer> INTEGER
    %token PLUS MINUS TIMES DIVIDE LP RP NL
    %type <real> rexpr
    %type <integer> iexpr
    %left PLUS MINUS
    %left TIMES DIVIDE
    %left UMINUS

30 Example: simple calculator - yacc
    %%
    lines: /* nothing */
        | lines line
        ;
    line: NL
        | iexpr NL { printf("%d) %d\n", lineno, $1);}
        | rexpr NL { printf("%d) %15.8lf\n", lineno, $1);}
        ;
    iexpr: INTEGER
        | iexpr PLUS iexpr { $$ = $1 + $3;}
        | iexpr MINUS iexpr { $$ = $1 - $3;}
        | iexpr TIMES iexpr { $$ = $1 * $3;}
        | iexpr DIVIDE iexpr { if($3) $$ = $1 / $3; else { yyerror("divide by zero"); } }
        | MINUS iexpr %prec UMINUS { $$ = - $2;}
        | LP iexpr RP { $$ = $2;}
        ;

31 Example: simple calculator - yacc
    rexpr: REAL
        | rexpr PLUS rexpr { $$ = $1 + $3;}
        | rexpr MINUS rexpr { $$ = $1 - $3;}
        | rexpr TIMES rexpr { $$ = $1 * $3;}
        | rexpr DIVIDE rexpr { if($3) $$ = $1 / $3; else { yyerror("divide by zero"); } }
        | MINUS rexpr %prec UMINUS { $$ = - $2;}
        | LP rexpr RP { $$ = $2;}
        | iexpr PLUS rexpr { $$ = (double)$1 + $3;}
        | iexpr MINUS rexpr { $$ = (double)$1 - $3;}
        | iexpr TIMES rexpr { $$ = (double)$1 * $3;}
        | iexpr DIVIDE rexpr { if($3) $$ = (double)$1 / $3; else { yyerror("divide by zero"); } }
        | rexpr PLUS iexpr { $$ = $1 + (double)$3;}
        | rexpr MINUS iexpr { $$ = $1 - (double)$3;}
        | rexpr TIMES iexpr { $$ = $1 * (double)$3;}
        | rexpr DIVIDE iexpr { if($3) $$ = $1 / (double)$3; else { yyerror("divide by zero"); } }
        ;

32 Actions between Rule Elements
input: ab output:
input: aa output: syntax error
input: ba output: 14 syntax error

33 References • http://memphis.compilertools.net/interpreter.html
xcu/yacc.html

34 Yacc Parser and Conflicts

35 How the parser works
• Yacc turns the specification file into a C program which parses the input according to the specification given
• The parser produced by yacc consists of a finite state machine with a stack
• The parser is capable of reading and remembering the next input token (the lookahead token)
• The current state is always the one on top of the stack
• Initially the stack contains only the initial state (state 0) and no lookahead token has been read

36 How the parser works
The machine has only four actions: shift, reduce, accept, and error.
Based on the current state, the parser decides whether it needs a lookahead token to choose its action; if it needs one and does not have one, it calls yylex() to obtain the next token.
Using the current state, and the lookahead token if needed, the parser decides on its next action and carries it out. This may result in states being pushed onto or popped off the stack, and in the lookahead token being processed or left alone.

37 Shift action
<lookahead_token> shift <state>
e.g. IF shift 34
When the lookahead token is IF: the current state is pushed down on the stack, state 34 is put on top of the stack and becomes the current state, and the lookahead token is cleared.

38 Reduce Action
A reduce action is taken when the parser has seen the right-hand side of a grammar rule and is prepared to announce that it has seen an instance of the rule; it replaces the right-hand side by the left-hand side.
Example: . reduce 18 => reduce by grammar rule 18
Example: A : x y z; => pop the top three states (the number of symbols on the RHS) off the stack, then perform the goto on A
Example: A goto 20 => push state 20 onto the stack and make it the current state

39 Accept and Error
The accept action indicates that the entire input has been seen and that it matches the specification. It appears only when the lookahead token is the endmarker, and indicates that the parser has successfully done its job.
The error action represents a place where the parser can no longer continue parsing according to the specification: the input tokens it has seen, together with the lookahead token, cannot be followed by anything that would result in a legal input.

40 Example
    %token DING DONG DELL
    %%
    rhyme : sound place ;
    sound : DING DONG ;
    place : DELL ;
Running
    $ yacc -v filename.y
produces a file named y.output, a human-readable description of the parser.

41 Example
    state 0
        $accept : _rhyme $end
        DING shift 3
        . error
        rhyme goto 1
        sound goto 2
    state 1
        $accept : rhyme_$end
        $end accept
        . error
    state 2
        rhyme : sound_place
        DELL shift 5
        . error
        place goto 4
    state 3
        sound : DING_DONG
        DONG shift 6
        . error
    state 4
        rhyme : sound place_ (1)
        . reduce 1
    state 5
        place : DELL_ (3)
        . reduce 3
    state 6
        sound : DING DONG_ (2)
        . reduce 2

42 Example (for input DING DONG DELL)
Initially the current state is 0 (initial stack: 0).
The first token, DING, is read.
The action in state 0 on token DING is shift 3: push state 3 onto the stack, clear the lookahead symbol, and make state 3 the current state.
Current stack: 0 3
    state 0
        $accept : _rhyme $end
        DING shift 3
        . error
        rhyme goto 1
        sound goto 2

43 Example (for input DING DONG DELL)
The next token, DONG, is read and becomes the lookahead symbol.
The action in state 3 on token DONG is shift 6: push state 6 onto the stack and clear the lookahead token.
Current stack: 0 3 6
    state 3
        sound : DING_DONG
        DONG shift 6
        . error

44 Example (for input DING DONG DELL)
In state 6, without even consulting the lookahead, the parser reduces by rule 2, sound : DING DONG.
This rule has two symbols on the right-hand side, so two states, 6 and 3, are popped off the stack, uncovering state 0.
Current stack: 0
    state 6
        sound : DING DONG_ (2)
        . reduce 2

45 Example (for input DING DONG DELL)
In state 0, look for a goto on sound: push state 2 onto the stack; state 2 becomes the current state.
Current stack: 0 2
    state 0
        $accept : _rhyme $end
        DING shift 3
        . error
        rhyme goto 1
        sound goto 2

46 Example (for input DING DONG DELL)
The next token is DELL.
The action in state 2 on token DELL is shift 5: push state 5 onto the stack, make it the current state, and clear the lookahead symbol.
Current stack: 0 2 5
    state 2
        rhyme : sound_place
        DELL shift 5
        . error
        place goto 4

47 Example (for input DING DONG DELL)
In state 5 the only action is to reduce by rule 3.
The rule has one symbol on the right-hand side, so one state (state 5) is popped off the stack, and state 2 is uncovered.
Current stack: 0 2
    state 5
        place : DELL_ (3)
        . reduce 3

48 Example (for input DING DONG DELL)
In state 2, the goto on place results in state 4: push state 4 onto the stack.
Current stack: 0 2 4
    state 2
        rhyme : sound_place
        DELL shift 5
        . error
        place goto 4

49 Example (for input DING DONG DELL)
In state 4 the only action is reduce 1.
The rule has two symbols on the right-hand side, so two states are popped off the stack, uncovering state 0.
Current stack: 0
    state 4
        rhyme : sound place_ (1)
        . reduce 1

50 Example (for input DING DONG DELL)
In state 0, the goto on rhyme causes the parser to enter state 1: push state 1 onto the stack and make it the current state.
Current stack: 0 1
    state 0
        $accept : _rhyme $end
        DING shift 3
        . error
        rhyme goto 1
        sound goto 2

51 Example (for input DING DONG DELL)
In state 1 the input is read and the endmarker ($end) is obtained.
The action is accept, so the parser ends successfully.
    state 1
        $accept : rhyme_$end
        $end accept
        . error
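The whole walk-through can be replayed with a small hand-written automaton in C whose states and actions are copied from the y.output listing for the rhyme grammar. This is a sketch for study, not the code yacc emits: the function `parse_rhyme`, the `pushed` trace array, and the enum values are all assumptions of the example.

```c
#include <assert.h>
#include <stddef.h>

enum { DING, DONG, DELL, END };      /* terminals; END plays the role of $end */
enum { RHYME, SOUND, PLACE };        /* nonterminals */

/* Returns 1 if `tokens` (terminated by END) is accepted, 0 on error.
 * If `pushed` is not NULL it receives every state pushed onto the stack,
 * terminated by -1; it must have room for at least 7 ints. */
int parse_rhyme(const int *tokens, int *pushed)
{
    int stack[8] = { 0 };            /* state 0 is on the stack initially */
    int top = 0, n = 0;

    assert(tokens != NULL);
    if (pushed) pushed[0] = -1;

    for (;;) {
        int rhs_len = 0, lhs = -1, next = -1;

        switch (stack[top]) {
        case 0: if (*tokens != DING) return 0; next = 3; break;   /* DING shift 3 */
        case 1: return *tokens == END;                            /* $end accept  */
        case 2: if (*tokens != DELL) return 0; next = 5; break;   /* DELL shift 5 */
        case 3: if (*tokens != DONG) return 0; next = 6; break;   /* DONG shift 6 */
        case 4: lhs = RHYME; rhs_len = 2; break;                  /* reduce 1 */
        case 5: lhs = PLACE; rhs_len = 1; break;                  /* reduce 3 */
        case 6: lhs = SOUND; rhs_len = 2; break;                  /* reduce 2 */
        default: return 0;
        }

        if (rhs_len == 0) {
            tokens++;                /* shift: consume the lookahead token */
        } else {
            top -= rhs_len;          /* reduce: pop one state per RHS symbol */
            /* goto table: the uncovered state and the LHS pick the next state */
            if      (stack[top] == 0 && lhs == RHYME) next = 1;
            else if (stack[top] == 0 && lhs == SOUND) next = 2;
            else if (stack[top] == 2 && lhs == PLACE) next = 4;
            else return 0;
        }

        stack[++top] = next;         /* the new state becomes the current state */
        if (pushed) { pushed[n++] = next; pushed[n] = -1; }
    }
}
```

For the input DING DONG DELL the recorded pushes are 3, 6, 2, 5, 4, 1, which is the same order in which states appear on top of the stack in the slides above.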

52 Pointer Model
A pointer moves (to the right) along the RHS of a rule as input tokens and variables are processed.
When all elements on the RHS are processed (the pointer reaches the end of the rule), the rule is reduced.
When a rule reduces, the pointer returns to the rule from which it was called.
    %token A B C
    %%
    start : A B C ;
    /* after reading A (pointer shown as _): start : A _ B C */

53 Conflicts
There is a conflict if a rule is reduced when there is more than one pointer.
Conflicts are how yacc reports possible ambiguities.
Yacc detects conflicts while it is building the parser.
Yacc looks one token ahead to see whether the number of pointers reduces to one before declaring a conflict.

54 Conflicts
Example:
    %token A B C D E F
    %%
    start : x | y ;
    x : A B C D ;
    y : A B E F ;
After tokens A and B there are two pointers, and either one of them or both will disappear: if the next token is E the first (in x) disappears, if the next token is C the second (in y) disappears, and if the next token is anything other than C or E both disappear. Therefore, there is no conflict.

55 Conflicts
The other way for pointers to disappear is for them to merge in a common subrule.
Example:
    %token A B C D E F
    %%
    start : x | y ;
    x : A B z E ;
    y : A B z F ;
    z : C D ;

56 Conflicts
Initially there are two pointers. After reading A and B these two pointers remain. Then the two pointers merge in the z rule, so after reading token C there is a single pointer:
    z : C _ D
    %token A B C D E F
    %%
    start : x | y ;
    x : A B z E ;
    y : A B z F ;
    z : C D ;

57 Conflicts
However, after reading A B C D this pointer splits again into two pointers. Note that yacc looks one token ahead before declaring any conflict. Since one of the pointers will disappear depending on the next token (E or F), yacc does not declare a conflict.
    %token A B C D E F
    %%
    start : x | y ;
    x : A B z E ;
    y : A B z F ;
    z : C D ;

58 Reduce-Reduce Conflict
Conflict example:
    %token A B
    %%
    start : x B
        | y B
        ;
    x : A ;    /* reduce */
    y : A ;    /* reduce */
After A there are two pointers, and both rules (x and y) want to reduce at the same time. If the next token is B, there will still be two pointers. Such a conflict is called a reduce/reduce conflict; here yacc reports a reduce/reduce conflict on B.

59 Shift-Reduce Conflict
Another type of conflict occurs when one rule reduces while the other shifts. Such conflicts are called shift/reduce conflicts.
    %token A R
    %%
    start : x
        | y R
        ;
    x : A R ;    /* shift */
    y : A ;      /* reduce */
After A, the y rule reduces while the x rule shifts. The next token in both cases is R, so yacc reports a shift/reduce conflict on R.

60 Conflict Example
    %token A
    %%
    start : x
        | y
        ;
    x : A ;    /* reduce */
    y : A ;    /* reduce */
At the end of each string there is a $end token. Therefore yacc declares a reduce/reduce conflict on $end for the grammar above.

61 Conflicts
Empty rules:
    %token A B
    %%
    start : empty A A
        | A B
        ;
    empty : ;
Before any token is read there are two pointers: one before empty in the first alternative and one before A in the second. If the next token is A, the empty rule will reduce and the second rule (of start) will shift. Therefore yacc declares a shift/reduce conflict on A.

62 Debugging Yacc
    $ yacc -v filename.y
produces a file named y.output for debugging purposes.
Makefile:
parser: y.tab.c
	gcc -o parser y.tab.c -ly -ll
y.tab.c: parser.y lex.yy.c
	yacc parser.y
lex.yy.c: scanner.l
	lex scanner.l

63 Example y.output for this grammar starts with

