February 23, 2016, Azusa, CA. Sheldon X. Liang, Ph.D., Computer Science at Azusa Pacific University.

1 CS400 Compiler Construction
Sheldon X. Liang, Ph.D., Computer Science at Azusa Pacific University
February 23, 2016, Azusa, CA
Azusa Pacific University, Azusa, CA 91702, Tel: (800) 825-5278
Department of Computer Science, http://www.apu.edu/clas/computerscience/

2 Lexical Analysis & Lexical Analyzer Generators
Formalization
- Regular Expressions
- Finite Automata
- RE -> Conversion -> FA
- Lexer Design

3 Keep in mind the following questions
RegEx to FA
- Conversion
- Finite state processing
- Table-driven processing
Thompson Construction
- Compound of RE by NFA
- Rule out ambiguity
- Finite state / table-driven processing
RE -> DFA (Directly)
- Conversion without ambiguity
- Closure turns to finite
- Table-driven processing

4 Design of a Lexical Analyzer Generator: RE to NFA to DFA
Lex specification with regular expressions:
    p1 { action1 }
    p2 { action2 }
    ...
    pn { actionn }
Lex specification -> NFA -> (subset construction) -> DFA

5 From Regular Expression to NFA (Thompson's Construction)

6 From Regular Expression to DFA Directly
- The "important states" of an NFA are those with a non-ε out-transition: if move({s}, a) ≠ ∅ for some input symbol a, then s is an important state
- The subset construction algorithm uses only the important states when it determines ε-closure(move(T, a))
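The two operations named on this slide can be sketched in C. This is a minimal sketch, not the lecture's own code: it assumes a textbook-style 11-state NFA for (a|b)*abb (states 0..10, start state 0, accepting state 10), and the state numbering, the bitmask encoding, and the names `eps_closure`, `nfa_move`, and `important_states` are all illustrative assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define NSTATES 11
#define B(s) ((uint32_t)1 << (s))   /* singleton set {s} as a bitmask */

typedef uint32_t Set;               /* bit s set <=> NFA state s is in the set */

/* Assumed NFA for (a|b)*abb: states 0..10, start 0, accepting 10. */
static const Set eps[NSTATES] = {   /* epsilon-transitions */
    [0] = B(1) | B(7), [1] = B(2) | B(4), [3] = B(6),
    [5] = B(6),        [6] = B(1) | B(7)
};
static const Set on_a[NSTATES] = { [2] = B(3), [7] = B(8) };
static const Set on_b[NSTATES] = { [4] = B(5), [8] = B(9), [9] = B(10) };

/* epsilon-closure(T): keep adding epsilon-successors until nothing changes. */
Set eps_closure(Set t) {
    Set prev;
    do {
        prev = t;
        for (int s = 0; s < NSTATES; s++)
            if (t & B(s)) t |= eps[s];
    } while (t != prev);
    return t;
}

/* move(T, c): states reachable from T by exactly one transition on c. */
Set nfa_move(Set t, char c) {
    assert(c == 'a' || c == 'b');
    const Set *delta = (c == 'a') ? on_a : on_b;
    Set out = 0;
    for (int s = 0; s < NSTATES; s++)
        if (t & B(s)) out |= delta[s];
    return out;
}

/* Important states: those with at least one non-epsilon out-transition. */
Set important_states(void) {
    Set imp = 0;
    for (int s = 0; s < NSTATES; s++)
        if (on_a[s] | on_b[s]) imp |= B(s);
    return imp;
}
```

In this assumed NFA the important states come out as 2, 4, 7, 8 and 9. The accepting state 10 has no outgoing transition, so it is not important, which is the reason the direct method appends the end symbol #.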

7 From Regular Expression to DFA Directly (Algorithm)
- Augment the regular expression r with a special end symbol # to make accepting states important: the new expression is r#
- Construct a syntax tree for r#
- Traverse the tree to construct the functions nullable, firstpos, lastpos, and followpos

8 From Regular Expression to DFA Directly: Syntax Tree of (a|b)*abb#

9 From Regular Expression to DFA Directly: Annotating the Tree
- nullable(n): true if the subtree at node n generates a language that includes the empty string
- firstpos(n): the set of positions that can match the first symbol of a string generated by the subtree at node n
- lastpos(n): the set of positions that can match the last symbol of a string generated by the subtree at node n
- followpos(i): the set of positions that can follow position i in the tree
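The first three functions follow from the shape of each node, and a short C sketch may make that concrete. The rules used are the standard textbook ones for leaf, or, cat and star nodes. The `Node` type, the bitmask `Set`, and the helper `example_tree` (which builds the tree for (a|b)*abb# with leaf positions 1 to 6) are assumptions made for illustration and do not come from the slides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { LEAF, OR, CAT, STAR } Kind;
typedef struct Node {
    Kind kind;
    int pos;                    /* leaf position, 1-based; unused otherwise */
    struct Node *left, *right;  /* STAR uses only left */
} Node;
typedef uint32_t Set;           /* bit (p - 1) set <=> position p is in the set */

Set bit(int p) { assert(p >= 1 && p <= 32); return (Set)1 << (p - 1); }

bool nullable(const Node *n) {
    switch (n->kind) {
    case OR:   return nullable(n->left) || nullable(n->right);
    case CAT:  return nullable(n->left) && nullable(n->right);
    case STAR: return true;
    default:   return false;    /* LEAF: matches exactly one symbol */
    }
}

Set firstpos(const Node *n) {
    switch (n->kind) {
    case OR:   return firstpos(n->left) | firstpos(n->right);
    case CAT:  /* the right side can start the string only if the left is nullable */
               return firstpos(n->left) | (nullable(n->left) ? firstpos(n->right) : 0);
    case STAR: return firstpos(n->left);
    default:   return bit(n->pos);
    }
}

Set lastpos(const Node *n) {
    switch (n->kind) {
    case OR:   return lastpos(n->left) | lastpos(n->right);
    case CAT:  /* mirror image of firstpos */
               return lastpos(n->right) | (nullable(n->right) ? lastpos(n->left) : 0);
    case STAR: return lastpos(n->left);
    default:   return bit(n->pos);
    }
}

/* Hypothetical helper: tree for (a|b)*abb# with positions a=1 b=2 a=3 b=4 b=5 #=6. */
static Node pool[12];
static int used;
static Node *mk(Kind k, int pos, Node *l, Node *r) {
    Node *n = &pool[used++];
    n->kind = k; n->pos = pos; n->left = l; n->right = r;
    return n;
}
Node *example_tree(void) {
    used = 0;
    Node *t = mk(STAR, 0, mk(OR, 0, mk(LEAF, 1, NULL, NULL), mk(LEAF, 2, NULL, NULL)), NULL);
    for (int p = 3; p <= 6; p++)
        t = mk(CAT, 0, t, mk(LEAF, p, NULL, NULL));
    return t;
}
```

For the root of this tree, firstpos is {1, 2, 3} and lastpos is {6}. The root is not nullable because the end marker # must always be matched.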

10 From Regular Expression to DFA Directly: Annotating the Tree

11 From Regular Expression to DFA Directly: Syntax Tree of (a|b)*abb#

12 From Regular Expression to DFA Directly: followpos
    for each node n in the tree do
        if n is a cat-node with left child c1 and right child c2 then
            for each i in lastpos(c1) do
                followpos(i) := followpos(i) ∪ firstpos(c2)
            end do
        else if n is a star-node then
            for each i in lastpos(n) do
                followpos(i) := followpos(i) ∪ firstpos(n)
            end do
        end if
    end do
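The pseudocode above can be sketched as runnable C. This is an illustration under stated assumptions, not the lecture's implementation: the `Node` type, the bitmask `Set`, the helper `example_tree` for (a|b)*abb# (leaf positions 1 to 6), and the name `compute_followpos` are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAXPOS 32
typedef enum { LEAF, OR, CAT, STAR } Kind;
typedef struct Node { Kind kind; int pos; struct Node *left, *right; } Node;
typedef uint32_t Set;   /* bit (p - 1) set <=> position p is in the set */

Set bit(int p) { assert(p >= 1 && p <= MAXPOS); return (Set)1 << (p - 1); }

bool nullable(const Node *n) {
    switch (n->kind) {
    case OR:   return nullable(n->left) || nullable(n->right);
    case CAT:  return nullable(n->left) && nullable(n->right);
    case STAR: return true;
    default:   return false;            /* LEAF */
    }
}
Set firstpos(const Node *n) {
    switch (n->kind) {
    case OR:   return firstpos(n->left) | firstpos(n->right);
    case CAT:  return firstpos(n->left) | (nullable(n->left) ? firstpos(n->right) : 0);
    case STAR: return firstpos(n->left);
    default:   return bit(n->pos);      /* LEAF */
    }
}
Set lastpos(const Node *n) {
    switch (n->kind) {
    case OR:   return lastpos(n->left) | lastpos(n->right);
    case CAT:  return lastpos(n->right) | (nullable(n->right) ? lastpos(n->left) : 0);
    case STAR: return lastpos(n->left);
    default:   return bit(n->pos);      /* LEAF */
    }
}

/* Visit every node and apply the cat-node and star-node rules.
   follow[] is indexed 1..MAXPOS and must be zeroed by the caller. */
void compute_followpos(const Node *n, Set follow[MAXPOS + 1]) {
    if (n == NULL || n->kind == LEAF) return;
    compute_followpos(n->left, follow);
    compute_followpos(n->right, follow);
    Set from = 0, add = 0;
    if (n->kind == CAT)  { from = lastpos(n->left); add = firstpos(n->right); }
    if (n->kind == STAR) { from = lastpos(n);       add = firstpos(n); }
    for (int p = 1; p <= MAXPOS; p++)
        if (from & bit(p)) follow[p] |= add;
}

/* Hypothetical helper: tree for (a|b)*abb# with positions a=1 b=2 a=3 b=4 b=5 #=6. */
static Node pool[12];
static int used;
static Node *mk(Kind k, int pos, Node *l, Node *r) {
    Node *n = &pool[used++];
    n->kind = k; n->pos = pos; n->left = l; n->right = r;
    return n;
}
Node *example_tree(void) {
    used = 0;
    Node *t = mk(STAR, 0, mk(OR, 0, mk(LEAF, 1, NULL, NULL), mk(LEAF, 2, NULL, NULL)), NULL);
    for (int p = 3; p <= 6; p++)
        t = mk(CAT, 0, t, mk(LEAF, p, NULL, NULL));
    return t;
}
```

On the example tree this gives followpos(1) = followpos(2) = {1, 2, 3}, followpos(3) = {4}, followpos(4) = {5}, followpos(5) = {6}, and followpos(6) = ∅. Recomputing lastpos and firstpos at every node keeps the sketch short; a real implementation would more likely cache them on the nodes.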

13 From Regular Expression to DFA Directly: Algorithm
    s0 := firstpos(root) where root is the root of the syntax tree
    Dstates := {s0} and is unmarked
    while there is an unmarked state T in Dstates do
        mark T
        for each input symbol a ∈ Σ do
            let U be the set of positions that are in followpos(p) for some position p in T, such that the symbol at position p is a
            if U is not empty and not in Dstates then
                add U as an unmarked state to Dstates
            end if
            Dtran[T, a] := U
        end do
    end do
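A compact C sketch of this loop follows. To stay self-contained it hardcodes the followpos table and the symbol at each position for (a|b)*abb#; those values were worked out by hand for this example and are assumptions, as are the names `build_dfa`, `dstates`, `dtran`, and `is_accepting`. An empty U is recorded as -1 rather than as a state.

```c
#include <assert.h>
#include <stdint.h>

#define NPOS 6            /* positions 1..6 of (a|b)*abb# */
#define NSYM 2            /* input alphabet {a, b} */
#define MAXSTATES 64      /* at most 2^NPOS distinct position sets */

typedef uint32_t Set;     /* bit (p - 1) set <=> position p is in the set */

/* Assumed inputs for the example, indexed by position (index 0 unused). */
static const char symbol_at[NPOS + 1] = {0, 'a', 'b', 'a', 'b', 'b', '#'};
static const Set followpos[NPOS + 1]  = {0, 0x07, 0x07, 0x08, 0x10, 0x20, 0x00};
static const char alphabet[NSYM]      = {'a', 'b'};

Set dstates[MAXSTATES];          /* Dstates: each DFA state is a set of positions */
int dtran[MAXSTATES][NSYM];      /* Dtran: -1 means U was empty (no transition) */
int nstates;

/* Return the index of state s, adding it as a new (unmarked) state if needed. */
static int find_or_add(Set s) {
    for (int i = 0; i < nstates; i++)
        if (dstates[i] == s) return i;
    assert(nstates < MAXSTATES);
    dstates[nstates] = s;
    return nstates++;
}

/* Build the DFA from start = firstpos(root); returns the number of states. */
int build_dfa(Set start) {
    nstates = 0;
    find_or_add(start);
    /* States with index below t are "marked"; new states are appended unmarked. */
    for (int t = 0; t < nstates; t++) {
        for (int a = 0; a < NSYM; a++) {
            Set u = 0;
            for (int p = 1; p <= NPOS; p++)
                if (((dstates[t] >> (p - 1)) & 1u) && symbol_at[p] == alphabet[a])
                    u |= followpos[p];
            dtran[t][a] = u ? find_or_add(u) : -1;
        }
    }
    return nstates;
}

/* A state is accepting if it contains the position of the end marker #. */
int is_accepting(int state) {
    return (int)((dstates[state] >> (NPOS - 1)) & 1u);
}
```

Calling `build_dfa(0x07)`, where 0x07 encodes firstpos(root) = {1, 2, 3}, yields four states for this example: {1,2,3}, {1,2,3,4}, {1,2,3,5} and {1,2,3,6}. Only the last one is accepting.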

14 From Regular Expression to DFA Directly: Example

15 Time-Space Tradeoffs
    Automaton   Space (worst case)   Time (worst case)
    NFA         O(|r|)               O(|r| × |x|)
    DFA         O(2^|r|)             O(|x|)
(|r| is the length of the regular expression, |x| is the length of the input string)

16 Got it with the following questions
RegEx to FA
- Conversion
- Finite state processing
- Table-driven processing
Thompson Construction
- Compound of RE by NFA
- Rule out ambiguity
- Finite state / table-driven processing
RE -> DFA (Directly)
- Conversion without ambiguity
- Closure turns to finite
- Table-driven processing

17 Thank you very much! Questions?


19 Lexical Analysis & Lexical Analyzer Generators
Formalization
- Regular Expressions
- Finite Automata
- RE -> Conversion -> FA
- Lexer Design

20 Keep in mind the following questions
Design: from RE to FA
- From RegEx to NFA
- From NFA to DFA
- Lex Specification
Lex / Yacc Compiler
- Lex Specification
- Yet Another Compiler Compiler
- From stream to interesting bits
Design Organization
- Initial stage
- L/S/S analysis stage
- IR as outputs for further generation

21 Design of a Lexical Analyzer Generator
regular expressions -> NFA -> DFA
- Translate regular expressions to NFA
- Translate NFA to an efficient DFA (optional)
- Simulate NFA to recognize tokens, or simulate DFA to recognize tokens

22 Design of a Lexical Analyzer Generator: RE to NFA to DFA
Lex specification with regular expressions:
    p1 { action1 }
    p2 { action2 }
    ...
    pn { actionn }
Lex specification -> NFA -> (subset construction) -> DFA

23 The Lex and Flex Scanner Generators
- Lex and its newer cousin flex are scanner generators
- Systematically translate regular definitions into C source code for efficient scanning
- Generated code is easy to integrate in C applications

24 Creating a Lexical Analyzer with Lex and Flex
    lex source program (lex.l) -> lex or flex compiler -> lex.yy.c
    lex.yy.c -> C compiler -> a.out
    input stream -> a.out -> sequence of tokens

25 Lex/Flex Compiler
What is Lex
Lex is a lexical analyzer generator. The scanner it produces breaks up an input stream into more usable elements (tokens).
What is Yacc
Yacc is a parser generator. The parser it produces analyses the structure of the input stream and operates on the "big picture". Yacc stands for "Yet Another Compiler Compiler".
What are Flex and Bison
Lex and Yacc are standard Unix tools. Flex and GNU Bison are their enhanced, freely available counterparts. I'll keep referring to "Lex" and "Yacc", but you can use Flex and Bison as "drop-in" replacements in most cases. In fact, the additional features of Flex and Bison make them an irresistible choice.

26 Lex/Flex Compiler

27 Lex Specification
A lex specification consists of three parts, separated by %% lines:
    regular definitions, C declarations in %{ %}
    %%
    translation rules
    %%
    user-defined auxiliary procedures
The translation rules are of the form:
    p1 { action1 }
    p2 { action2 }
    ...
    pn { actionn }

28 Regular Expressions in Lex
    x          match the character x
    \.         match the character .
    "string"   match contents of string of characters
    .          match any character except newline
    ^          match beginning of a line
    $          match the end of a line
    [xyz]      match one character x, y, or z (use \ to escape -)
    [^xyz]     match any character except x, y, and z
    [a-z]      match one of a to z
    r*         closure (match zero or more occurrences)
    r+         positive closure (match one or more occurrences)
    r?         optional (match zero or one occurrence)
    r1r2       match r1 then r2 (concatenation)
    r1|r2      match r1 or r2 (union)
    (r)        grouping
    r1/r2      match r1 when followed by r2
    {d}        match the regular expression defined by d

29 Example Lex Specification 1
    %{
    #include <stdio.h>
    %}
    %%
    [0-9]+  { printf("%s\n", yytext); }
    .|\n    { }
    %%
    int main(void) { yylex(); return 0; }
The lines between the two %% markers are the translation rules. yytext contains the matching lexeme, and yylex() invokes the lexical analyzer.
Build and run:
    lex spec.l
    gcc lex.yy.c -ll
    ./a.out < spec.l

30 Example Lex Specification 2
    %{
    #include <stdio.h>
    int ch = 0, wd = 0, nl = 0;
    %}
    delim     [ \t]+
    %%
    \n        { ch++; wd++; nl++; }
    ^{delim}  { ch+=yyleng; }
    {delim}   { ch+=yyleng; wd++; }
    .         { ch++; }
    %%
    int main(void) { yylex(); printf("%8d%8d%8d\n", nl, wd, ch); return 0; }
delim is a regular definition. The lines between the two %% markers are the translation rules.

31 Example Lex Specification 3
    %{
    #include <stdio.h>
    %}
    digit   [0-9]
    letter  [A-Za-z]
    id      {letter}({letter}|{digit})*
    %%
    {digit}+  { printf("number: %s\n", yytext); }
    {id}      { printf("ident: %s\n", yytext); }
    .         { printf("other: %s\n", yytext); }
    %%
    int main(void) { yylex(); return 0; }
digit, letter and id are regular definitions. The lines between the two %% markers are the translation rules.
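To see what the three rules of this specification ask of a scanner, here is a hand-written C sketch of the same classification. It is not what lex generates (lex emits a table-driven DFA), and the names `TokKind` and `next_token` are hypothetical. It assumes the default "C" locale, where isalpha and isdigit match [A-Za-z] and [0-9], and unlike the lex rule for `.` it also reports a newline as "other".

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

typedef enum { TOK_EOF, TOK_NUMBER, TOK_IDENT, TOK_OTHER } TokKind;

/* Scan one token starting at s and store its length in *len.
   Each branch consumes the longest run that fits its pattern,
   mirroring lex's longest-match behavior for these three rules. */
TokKind next_token(const char *s, size_t *len) {
    assert(s != NULL && len != NULL);
    size_t n = 0;
    if (s[0] == '\0') {                      /* end of input */
        *len = 0;
        return TOK_EOF;
    }
    if (isdigit((unsigned char)s[0])) {      /* {digit}+ */
        while (isdigit((unsigned char)s[n])) n++;
        *len = n;
        return TOK_NUMBER;
    }
    if (isalpha((unsigned char)s[0])) {      /* {letter}({letter}|{digit})* */
        while (isalnum((unsigned char)s[n])) n++;
        *len = n;
        return TOK_IDENT;
    }
    *len = 1;                                /* any other single character */
    return TOK_OTHER;
}
```

A caller would invoke `next_token` repeatedly, advancing the input pointer by `*len` after each call until `TOK_EOF` is returned. The first character alone decides the branch here because no string can start both a number and an identifier.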

32 Organization

33 Got it with the following questions
Design: from RE to FA
- From RegEx to NFA
- From NFA to DFA
- Lex Specification
Lex / Yacc Compiler
- Lex Specification
- Yet Another Compiler Compiler
- From stream to interesting bits
Design Organization
- Initial stage
- L/S/S analysis stage
- IR as outputs for further generation

34 Thank you very much! Questions?

