LEX & Yacc
Sung-Dong Kim, Dept. of Computer Engineering, Hansung University.

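LEX
Input: a Lex specification such as tiny.l (regular expressions plus actions)
Output: lex.yy.c (lexyy.c on some systems), C code containing the scanner procedure yylex
yylex is a table-driven implementation of a DFA, similar in role to the getToken procedure.
[Diagram: RE + action → Lex → scanner (C code)]
(2010-1) Compiler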
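To make "table-driven implementation of a DFA" concrete, here is a minimal hand-written sketch in C. It assumes only two token classes, numbers ([0-9]+) and TINY-style identifiers ([a-zA-Z]+); the names transition, accepting and scan_token are hypothetical, and the tables Lex really generates are larger and compressed. The loop remembers the last accepting state it passed through, so it returns the longest match.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token kinds for this sketch (not names Lex defines). */
enum { TK_NONE = 0, TK_NUM, TK_ID };

/* Character classes used as column indices of the transition table. */
enum { C_DIGIT, C_LETTER, C_OTHER, NCLASSES };

/* DFA states: start, inside a number, inside an identifier, dead. */
enum { S_START, S_NUM, S_ID, S_DEAD, NSTATES };

static const int transition[NSTATES][NCLASSES] = {
    /*             DIGIT   LETTER  OTHER  */
    /* START */  { S_NUM,  S_ID,   S_DEAD },
    /* NUM   */  { S_NUM,  S_DEAD, S_DEAD },
    /* ID    */  { S_DEAD, S_ID,   S_DEAD },
    /* DEAD  */  { S_DEAD, S_DEAD, S_DEAD },
};

/* Token recognised in each state (TK_NONE = not an accepting state). */
static const int accepting[NSTATES] = { TK_NONE, TK_NUM, TK_ID, TK_NONE };

static int char_class(int c) {
    if (isdigit(c)) return C_DIGIT;
    if (isalpha(c)) return C_LETTER;
    return C_OTHER;
}

/* Runs the DFA from the start of s. Returns the token kind of the longest
   accepted prefix and stores its length in *len (0 if nothing is accepted). */
int scan_token(const char *s, size_t *len) {
    int state = S_START;
    int last_kind = TK_NONE;
    size_t last_len = 0;
    for (size_t i = 0; s[i] != '\0'; i++) {
        state = transition[state][char_class((unsigned char)s[i])];
        if (state == S_DEAD) break;           /* no longer match possible */
        if (accepting[state] != TK_NONE) {    /* remember the latest accept */
            last_kind = accepting[state];
            last_len = i + 1;
        }
    }
    *len = last_len;
    return last_kind;
}
```

For example, on "123abc" the sketch stops after the three digits and reports a number of length 3.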

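LEX Convention (1)
Metacharacters and quotes: characters inside quotes stand for themselves.
- For characters that are not metacharacters, quotes are optional: "if" and if mean the same.
- For metacharacters, quotes remove the special meaning: "(" matches a literal left parenthesis.
- A backslash escapes a single metacharacter: \(\* is the same as "(*". It also introduces escapes such as \n (newline) and \t (tab).
- Example: (aa|bb)(a|b)*c? is equivalent to ("aa"|"bb")("a"|"b")*"c"?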

LEX Convention (2)
- [...] matches any one of the listed characters: [abxz] matches any one of a, b, x, z, so (aa|bb)(a|b)*c? can also be written (aa|bb)[ab]*c?
- A hyphen denotes a range of characters: [0-9] matches any digit.

LEX Convention (3)
- . (period) matches any character except a newline.
- ^ at the start of a bracket expression forms the complement: [^0-9abc] matches any character that is not a digit and is not one of the letters a, b, c.

LEX Convention (4)
Inside square brackets most metacharacters lose their special status.
- [-+] is the same as ("+"|"-"). Write it with the - first: in [+-] the - can be read as the start of a range beginning at +.
- [."?] matches any of the three characters . " ?
- [\^\\] matches ^ or \
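The bracket conventions in (2) to (4) can be illustrated with a small C sketch of a bracket-expression matcher. It assumes the text between [ and ] is passed in as a string; it supports ranges, a leading ^ for complement, and backslash escapes (including \n and \t). It treats a - at the start or end as a literal, which is the POSIX bracket-expression convention; classic Lex implementations may not behave this way, which is why writing [-+] is the safe choice. The names next_char and class_match are hypothetical.

```c
#include <stdbool.h>

/* Reads one character from *p, resolving a backslash escape, and advances *p. */
static int next_char(const char **p) {
    int c = (unsigned char)*(*p)++;
    if (c == '\\' && **p != '\0') {
        c = (unsigned char)*(*p)++;
        if (c == 'n') c = '\n';
        else if (c == 't') c = '\t';
    }
    return c;
}

/* Tests whether character c belongs to the bracket expression whose body
   (the text between '[' and ']') is `body`, e.g. "^0-9abc" or "-+". */
bool class_match(const char *body, int c) {
    bool negate = false;
    bool found = false;
    const char *p = body;
    if (*p == '^') {            /* leading ^: complement of the set */
        negate = true;
        p++;
    }
    while (*p != '\0') {
        int lo = next_char(&p);
        /* A '-' between two characters is a range; at the start or end it is literal. */
        if (*p == '-' && p[1] != '\0') {
            p++;                /* skip the '-' */
            int hi = next_char(&p);
            if (lo <= c && c <= hi) found = true;
        } else if (lo == c) {
            found = true;
        }
    }
    return negate ? !found : found;
}
```

Note that in a C string the slide's [\^\\] body must itself be escaped, as "\\^\\\\".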

LEX Convention (5)
Curly brackets refer to named regular expressions. In regular-expression notation:
nat = [0-9]+
signedNat = ("+"|"-")? nat
In a Lex definitions section:
nat        [0-9]+
signedNat  ("+"|"-")?{nat}
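A named definition acts like a macro: {nat} stands for the expression it names. A hand-written C recognizer (a hypothetical sketch, not Lex output) can mirror this by having match_signed_nat reuse match_nat, just as signedNat reuses {nat}. Each function returns the length of the match at the start of the string, or 0 if there is none.

```c
#include <ctype.h>
#include <stddef.h>

/* nat = [0-9]+ : length of the digit run at s (0 = no match). */
size_t match_nat(const char *s) {
    size_t i = 0;
    while (isdigit((unsigned char)s[i])) i++;
    return i;
}

/* signedNat = ("+"|"-")? {nat} : an optional sign followed by a nat. */
size_t match_signed_nat(const char *s) {
    size_t sign = (s[0] == '+' || s[0] == '-') ? 1 : 0;
    size_t digits = match_nat(s + sign);   /* reuse, like {nat} */
    return digits == 0 ? 0 : sign + digits;
}
```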

Format of LEX Input (1)
An input file consists of regular expressions plus C code, in three sections:
- Definitions: C code that must be inserted outside any function, enclosed in %{ and %}; names of regular expressions.
- Rules: regular expressions, each followed by C code (the action).
- Auxiliary routines (optional): C code, including a main program if needed.

Format of LEX Input (2)
Layout:
{definitions}
%%
{rules}
%%
{auxiliary routines}

Example 1: a scanner that adds line numbers to text
%{
/* a Lex program that adds line numbers
   to lines of text, printing the new text
   to the standard output */
#include <stdio.h>
int lineno = 1;
%}
line .*\n
%%
{line} { printf("%5d %s", lineno++, yytext); }
%%
int main(void)
{ yylex(); return 0; }

Example 2: converting decimal numbers to hexadecimal
%{
/* a Lex program that changes all numbers
   from decimal to hexadecimal notation,
   printing a summary statistic to stderr */
#include <stdio.h>
#include <stdlib.h>
int count = 0;
%}
digit [0-9]
number {digit}+
%%
{number} { int n = atoi(yytext);
           printf("%x", n);
           if (n > 9) count++; }
%%
int main(void)
{ yylex();
  fprintf(stderr, "number of replacements = %d\n", count);
  return 0;
}
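For comparison, the same transformation can be written by hand in C. This sketch works on a string rather than on stdin (an assumption made to keep it self-contained), uses strtoul instead of atoi, and copies every other character through unchanged, the way Lex's default action echoes unmatched input. decimal_to_hex is a hypothetical name.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Copies `in` to `out` (capacity `outsize`, at least 1), replacing each maximal
   run of decimal digits with its hexadecimal spelling. Returns how many of the
   numbers were greater than 9, mirroring `count` in the Lex program. */
int decimal_to_hex(const char *in, char *out, size_t outsize) {
    int count = 0;
    size_t used = 0;
    out[0] = '\0';
    while (*in != '\0') {
        char piece[32];
        int changed = 0;
        const char *next;
        if (isdigit((unsigned char)*in)) {
            char *end;
            /* Like the {number} rule: consume the whole digit run (longest match). */
            unsigned long n = strtoul(in, &end, 10);
            snprintf(piece, sizeof piece, "%lx", n);
            changed = n > 9;
            next = end;
        } else {
            /* Unmatched characters pass through, like Lex's default ECHO. */
            piece[0] = *in;
            piece[1] = '\0';
            next = in + 1;
        }
        size_t len = strlen(piece);
        if (used + len >= outsize) break;   /* stop rather than overflow `out` */
        memcpy(out + used, piece, len + 1);
        used += len;
        count += changed;
        in = next;
    }
    return count;
}
```

For the input "x=255, y=7" the sketch produces "x=ff, y=7" and returns 1.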

Example 3: selecting lines that begin or end with 'a'
%{
/* Selects only lines that end or
   begin with the letter 'a'.
   Deletes everything else. */
#include <stdio.h>
%}
ends_with_a .*a\n
begins_with_a a.*\n
%%
{ends_with_a} ECHO;
{begins_with_a} ECHO;
.*\n ;
%%
int main(void)
{ yylex(); return 0; }

Summary (1)
Ambiguity resolution:
- Longest substring: Lex always matches the longest possible substring.
- Equal-length matches: the rule listed first wins (first match, first served).
- No match: the next character is copied to the output and scanning continues.
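The first two principles can be sketched in C as follows. The three rules (the keyword "if", identifiers [a-zA-Z]+, numbers [0-9]+) are hypothetical, and each is a simple function returning its match length; a real Lex scanner makes the same choice inside its DFA rather than by trying rules one at a time.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical rules; each returns the length of its match at s (0 = no match). */
static size_t rule_if(const char *s)     { return strncmp(s, "if", 2) == 0 ? 2 : 0; }
static size_t rule_ident(const char *s)  { size_t i = 0; while (isalpha((unsigned char)s[i])) i++; return i; }
static size_t rule_number(const char *s) { size_t i = 0; while (isdigit((unsigned char)s[i])) i++; return i; }

/* Rules in the order they would appear in the rules section of a Lex file. */
static size_t (*const rules[])(const char *) = { rule_if, rule_ident, rule_number };
#define NRULES (sizeof rules / sizeof rules[0])

/* Chooses the rule that fires at s. The longest match wins; among matches of
   equal length the rule listed first wins, because only a strictly longer
   match replaces the current best. Returns the rule index, or -1 if no rule
   matches (Lex would then copy one character to yyout). *len gets the length. */
int pick_rule(const char *s, size_t *len) {
    int best = -1;
    size_t best_len = 0;
    for (size_t r = 0; r < NRULES; r++) {
        size_t n = rules[r](s);
        if (n > best_len) {
            best = (int)r;
            best_len = n;
        }
    }
    *len = best_len;
    return best;
}
```

With these rules "iffy" is an identifier, because the identifier rule matches a longer string, while "if" is the keyword, because the tie goes to the rule listed first. This is why a scanner such as TINY's lists its reserved words before {identifier}.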

Summary (2)
Insertion of C code:
- %{ ... %}: copied exactly.
- Auxiliary procedure section: copied exactly, at the end of the output.
- Code following a regular expression (the action): inserted at the appropriate place in yylex.

Lex Internal Names
lex.yy.c or lexyy.c   Lex output file name
yylex                 Lex scanning routine
yytext                string matched by the current action
yyin                  Lex input file (default: stdin)
yyout                 Lex output file (default: stdout)
input                 Lex buffered input routine
ECHO                  Lex default action (print yytext to yyout)

LEX for TINY
%{
#include "globals.h"
#include "util.h"
#include "scan.h"
/* lexeme of identifier or reserved word */
char tokenString[MAXTOKENLEN+1];
%}
digit       [0-9]
number      {digit}+
letter      [a-zA-Z]
identifier  {letter}+
newline     \n
whitespace  [ \t]
%%

"if"      { return IF; }
"then"    { return THEN; }
"else"    { return ELSE; }
"end"     { return END; }
"repeat"  { return REPEAT; }
"until"   { return UNTIL; }
"read"    { return READ; }
"write"   { return WRITE; }
":="      { return ASSIGN; }
"="       { return EQ; }
"<"       { return LT; }
"+"       { return PLUS; }
"-"       { return MINUS; }
"*"       { return TIMES; }
"/"       { return OVER; }
"("       { return LPAREN; }
")"       { return RPAREN; }
";"       { return SEMI; }

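{number}      { return NUM; }
{identifier}  { return ID; }
{newline}     { lineno++; }
{whitespace}  { /* skip whitespace */ }
"{"           { int c;  /* int, not char, so that end of input can be detected */
                do
                { c = input();
                  if (c == EOF || c == 0) break;  /* end of input inside a comment (EOF in flex, 0 in traditional lex) */
                  if (c == '\n') lineno++;
                } while (c != '}');
              }
.             { return ERROR; }
%%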
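The comment rule reads characters directly with input(). A stand-alone C sketch of the same loop, using a hypothetical string-backed next_input() in place of Lex's input() (it returns -1 where input() would report end of file), shows why the end-of-input test matters: without it, an unterminated comment would never reach '}' and the loop would not terminate.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical input source standing in for Lex's input(): text plus a cursor. */
struct source {
    const char *text;
    size_t pos;
    int lineno;
};

/* Returns the next character, or -1 at the end of the text. */
static int next_input(struct source *src) {
    char c = src->text[src->pos];
    if (c == '\0') return -1;
    src->pos++;
    return (unsigned char)c;
}

/* Skips a TINY comment whose opening '{' has already been consumed, counting
   newlines as the Lex action does. Returns false if input ends before '}'. */
bool skip_comment(struct source *src) {
    int c;
    do {
        c = next_input(src);
        if (c == -1) return false;   /* unterminated comment: stop, don't spin */
        if (c == '\n') src->lineno++;
    } while (c != '}');
    return true;
}
```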

TokenType getToken(void)
{ static int firstTime = TRUE;
  TokenType currentToken;
  if (firstTime)
  { firstTime = FALSE;
    lineno++;
    yyin = source;
    yyout = listing;
  }
  currentToken = yylex();
  strncpy(tokenString, yytext, MAXTOKENLEN);
  if (TraceScan) {
    fprintf(listing, "\t%d: ", lineno);
    printToken(currentToken, tokenString);
  }
  return currentToken;
}

YACC
An LALR(1) parser generator; the name stands for "Yet Another Compiler-Compiler".
[Diagram: syntax specification → Yacc (parser generator) → parser]

YACC Basics (1)
Input/output: yacc filename.y produces y.tab.c (ytab.c on some systems; Bison names it filename.tab.c).
Specification file format:
{definitions}
%%
{rules}
%%
{auxiliary routines}

YACC Basics (2)
- Definitions: information about tokens, data types and grammar rules; C code copied to the output file.
- Rules: grammar rules in a modified BNF format, with C code actions.
- Auxiliary routines: procedure and function declarations.
- Control flow: main() calls yyparse(), which calls yylex() each time it needs a token.

Example: a simple calculator
%{
#include <stdio.h>
#include <ctype.h>
int yylex(void);
void yyerror(char *s);
%}
%token NUMBER
%%
command : exp              { printf("%d\n", $1); }
        ;
exp     : exp '+' term     { $$ = $1 + $3; }
        | exp '-' term     { $$ = $1 - $3; }
        | term             { $$ = $1; }
        ;
term    : term '*' factor  { $$ = $1 * $3; }
        | factor           { $$ = $1; }
        ;
factor  : NUMBER           { $$ = $1; }
        | '(' exp ')'      { $$ = $2; }
        ;
%%

int main(void)
{ return yyparse();
}

int yylex(void)
{ int c;
  while ((c = getchar()) == ' ');        /* skip blanks */
  if (isdigit(c)) {
    ungetc(c, stdin);
    scanf("%d", &yylval);
    return NUMBER;
  }
  if (c == '\n' || c == EOF) return 0;   /* stop parsing */
  return c;
}

void yyerror(char *s)
{ fprintf(stderr, "%s\n", s);            /* print error message */
}

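YACC Options (1): -d
- Header file generation: yacc -d filename.y also writes y.tab.h (ytab.h on some systems; filename.tab.h with Bison), which defines the token codes.
- Other files, such as a separately written scanner, #include "y.tab.h" and supply the yylex() that the parser calls.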

YACC Options (2): -v
- Verbose option: yacc -v filename.y also writes y.output, a readable description of the parser's states and of any conflicts.

y.output for the calculator grammar:

state 0
    $accept : _command $end
    NUMBER  shift 5
    (  shift 6
    .  error
    command  goto 1
    exp  goto 2
    term  goto 3
    factor  goto 4

state 1
    $accept : command_$end
    $end  accept
    .  error

state 2
    command : exp_    (1)
    exp : exp_+ term
    exp : exp_- term
    +  shift 7
    -  shift 8
    .  reduce 1

state 3
    exp : term_    (4)
    term : term_* factor
    *  shift 9
    .  reduce 4

state 4
    term : factor_    (6)
    .  reduce 6

state 5
    factor : NUMBER_    (7)
    .  reduce 7

state 6
    factor : (_exp )
    NUMBER  shift 5
    (  shift 6
    .  error
    exp  goto 10
    term  goto 3
    factor  goto 4

state 7
    exp : exp +_term
    NUMBER  shift 5
    (  shift 6
    .  error
    term  goto 11
    factor  goto 4

state 8
    exp : exp -_term
    NUMBER  shift 5
    (  shift 6
    .  error
    term  goto 12
    factor  goto 4

state 9
    term : term *_factor
    NUMBER  shift 5
    (  shift 6
    .  error
    factor  goto 13

state 10
    exp : exp_+ term
    exp : exp_- term
    factor : ( exp_)
    +  shift 7
    -  shift 8
    )  shift 14
    .  error

state 11
    exp : exp + term_    (2)
    term : term_* factor
    *  shift 9
    .  reduce 2

state 12
    exp : exp - term_    (3)
    term : term_* factor
    *  shift 9
    .  reduce 3

state 13
    term : term * factor_    (5)
    .  reduce 5

state 14
    factor : ( exp )_    (8)
    .  reduce 8

8/127 terminals, 4/600 nonterminals
9/300 grammar rules, 15/1000 states
0 shift/reduce, 0 reduce/reduce conflicts reported
9/601 working sets used
memory: states, etc. 36/2000, parser 11/4000
9/601 distinct lookahead sets
6 extra closures
18 shift entries, 1 exceptions
8 goto entries
4 entries saved by goto default
Optimizer space used: input 50/2000, output 218/… table entries, 202 zero
maximum spread: 257, maximum offset: 43
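To see how these states drive parsing, the C sketch below hand-translates the y.output listing into an action table and a goto table and runs the standard LR shift/reduce loop over them, evaluating the semantic actions on a value stack. This is an illustration under assumptions, not the code yacc generates (yacc emits compressed tables and its own driver). The names action, go_to, next_token and lr_parse are hypothetical, and instead of printing the result as rule 1 does, the sketch returns it through *result.

```c
#include <ctype.h>
#include <string.h>

/* Terminal columns of the action table; T_END plays the role of $end. */
enum { T_NUM, T_PLUS, T_MINUS, T_TIMES, T_LP, T_RP, T_END, NTERMS };
/* Nonterminal columns of the goto table. */
enum { N_COMMAND, N_EXP, N_TERM, N_FACTOR, NNONTERMS };

#define ERR  0              /* ". error"     */
#define ACC  99             /* "$end accept" */
#define S(n) (100 + (n))    /* "shift n"     */
#define R(n) (-(n))         /* "reduce n" (rule numbers as in y.output) */
#define NSTATES 15
#define MAXSTACK 64

static const int action[NSTATES][NTERMS] = {
    /*          NUM   +     -     *     (     )      $end */
    /*  0 */ { S(5), ERR,  ERR,  ERR,  S(6), ERR,   ERR  },
    /*  1 */ { ERR,  ERR,  ERR,  ERR,  ERR,  ERR,   ACC  },
    /*  2 */ { R(1), S(7), S(8), R(1), R(1), R(1),  R(1) },
    /*  3 */ { R(4), R(4), R(4), S(9), R(4), R(4),  R(4) },
    /*  4 */ { R(6), R(6), R(6), R(6), R(6), R(6),  R(6) },
    /*  5 */ { R(7), R(7), R(7), R(7), R(7), R(7),  R(7) },
    /*  6 */ { S(5), ERR,  ERR,  ERR,  S(6), ERR,   ERR  },
    /*  7 */ { S(5), ERR,  ERR,  ERR,  S(6), ERR,   ERR  },
    /*  8 */ { S(5), ERR,  ERR,  ERR,  S(6), ERR,   ERR  },
    /*  9 */ { S(5), ERR,  ERR,  ERR,  S(6), ERR,   ERR  },
    /* 10 */ { ERR,  S(7), S(8), ERR,  ERR,  S(14), ERR  },
    /* 11 */ { R(2), R(2), R(2), S(9), R(2), R(2),  R(2) },
    /* 12 */ { R(3), R(3), R(3), S(9), R(3), R(3),  R(3) },
    /* 13 */ { R(5), R(5), R(5), R(5), R(5), R(5),  R(5) },
    /* 14 */ { R(8), R(8), R(8), R(8), R(8), R(8),  R(8) },
};

/* goto entries from y.output; 0 marks entries a correct parse never consults. */
static const int go_to[NSTATES][NNONTERMS] = {
    [0] = { 1, 2, 3, 4 },  [6] = { 0, 10, 3, 4 },  [7] = { 0, 0, 11, 4 },
    [8] = { 0, 0, 12, 4 }, [9] = { 0, 0, 0, 13 },
};

/* For rules 1..8: right-hand-side length and left-hand nonterminal. */
static const int rhs_len[9] = { 0, 1, 3, 3, 1, 3, 1, 1, 3 };
static const int lhs[9] = { 0, N_COMMAND, N_EXP, N_EXP, N_EXP, N_TERM, N_TERM, N_FACTOR, N_FACTOR };

/* Scanner in the spirit of the slide's yylex: skips blanks, returns T_NUM with
   its value in *lval, T_END at end of line/string, -1 for unknown characters. */
static int next_token(const char **p, int *lval) {
    while (**p == ' ') (*p)++;
    char c = **p;
    *lval = 0;
    if (isdigit((unsigned char)c)) {
        while (isdigit((unsigned char)**p)) *lval = *lval * 10 + (*(*p)++ - '0');
        return T_NUM;
    }
    if (c == '\0' || c == '\n') return T_END;
    (*p)++;
    const char *ops = "+-*()", *hit = strchr(ops, c);
    return hit ? T_PLUS + (int)(hit - ops) : -1;
}

/* Parses and evaluates s; returns 1 and stores the value on success, 0 on error. */
int lr_parse(const char *s, int *result) {
    int states[MAXSTACK], values[MAXSTACK], top = 0, lval;
    states[0] = 0;
    values[0] = 0;
    int tok = next_token(&s, &lval);
    for (;;) {
        if (tok < 0) return 0;                      /* character no rule covers */
        int act = action[states[top]][tok];
        if (act == ACC) { *result = values[top]; return 1; }
        if (act == ERR) return 0;
        if (act >= 100) {                           /* shift: push state and value */
            if (top + 1 >= MAXSTACK) return 0;
            top++;
            states[top] = act - 100;
            values[top] = lval;
            tok = next_token(&s, &lval);
        } else {                                    /* reduce by rule r */
            int r = -act;
            int *v = &values[top - rhs_len[r] + 1];  /* v[0] is $1 */
            int val;
            switch (r) {                            /* the semantic actions */
            case 2: val = v[0] + v[2]; break;
            case 3: val = v[0] - v[2]; break;
            case 5: val = v[0] * v[2]; break;
            case 8: val = v[1]; break;
            default: val = v[0]; break;             /* rules 1, 4, 6, 7: $$ = $1 */
            }
            top -= rhs_len[r];                      /* pop the right-hand side */
            states[top + 1] = go_to[states[top]][lhs[r]];
            values[++top] = val;                    /* push the goto state and $$ */
        }
    }
}
```

Tracing "2+3*4" through the sketch, state 11 shifts * instead of reducing by rule 2, so the multiplication is performed first and the result is 14; the default reductions (". reduce n") become full rows of R(n) entries.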