
1 College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -1- Compiler Construction Principles & Implementation Techniques Dr. Ying JIN Associate Professor Sept. 2007

2 What we have already introduced
- What is this course about?
- What is a compiler?
- The ways to design and implement a compiler
- General functional components of a compiler
- The general translation process of a compiler

3 What will be introduced
Scanning
- The first phase of a compiler
- Functional requirements (input, output, functions)
- Data structures
- General techniques for developing a scanner
Two formal languages
- Regular expressions
- Finite automata (DFA, NFA)
Three algorithms
- From regular expression to NFA
- From NFA to DFA
- Minimizing a DFA
One implementation
- Implementing a DFA to get a scanner

4 Functional components of a Compiler
[Figure: pipeline of Lexical Analysis (scanning), Syntax Analysis (parsing), Semantic Analysis, Intermediate Code Generation, Intermediate Code Optimization, and Target Code Generation, grouped into analysis (front end) and synthesis (back end).]

5 Presentation Time
What have you read and learned so far?

6 Outline
2.1 Overview
    2.1.1 General Function of a Scanner
    2.1.2 Some Issues about Scanning
2.2 Finite Automata
    2.2.1 Definition and Implementation of DFA
    2.2.2 Non-Deterministic Finite Automata
    2.2.3 Transforming NFA into DFA
    2.2.4 Minimizing DFA
2.3 Regular Expressions
    2.3.1 Definition of Regular Expressions
    2.3.2 Regular Definition
    2.3.4 From Regular Expression to DFA
2.4 Design and Implementation of a Scanner
    2.4.1 Developing a Scanner from DFA
    2.4.2 A Scanner Generator - Lex

7 Knowledge Relation Graph
[Figure: developing a scanner. The lexical definition is based on regular expressions; a regular expression is transformed into an NFA, the NFA into a DFA, the DFA is minimized, and the minimized DFA is implemented as the scanner.]

9 2.1 Overview
General function of scanning
- Input
- Output
- Functional description
Some issues about scanning
- Tokens
- Blanks, tabs, returns, newlines, comments
- Lexical errors

10 General Function of Scanning
Input
- Source program
Output
- Sequence of words (tokens)
Functional description (similar to spelling)
- Read the source program;
- Recognize words one by one according to the lexical definition of the source language;
- Build the internal representation of words (tokens);
- Check for lexical errors;
- Output the sequence of tokens.

11 Some Issues about Scanning

12 Tokens
Source program
- Stream of characters
Token
- A sequence of characters that can be treated as a unit in the grammar of a programming language
- Token types
    Identifier: x, y1, ……
    Number: 12, 12.3, ……
    Keywords (reserved words): int, real, main, ……
    Operator: +, -, *, /, >, <, ……
    Delimiter: ;, {, }, ……

13 Tokens
The content of a token: <token-type, semantic information>
Semantic information:
- Identifier: the string
- Number: the value
- Keywords (reserved words): the number in the keyword table
- Operator: itself
- Delimiter: itself

14 Keywords
- Words that have a special meaning
- Cannot be used for other meanings
Reserved words
- Words that are reserved by a programming language for a special meaning
- Can be used for other meanings, overloading the previous meaning
Keyword table
- Records all keywords defined by the source programming language

15 Sample Source Program (△ marks a blank, ↵ marks a line break)
var △ int △ x, y; ↵
{ read(x); ↵
read(y); ↵
if △ x>y △ then △ write(1); ↵
else △ write(0); ↵
}

16 Sequence of Tokens (k: keyword, ide: identifier, num: number; ↵ marks a line break)
<var, k> <int, k> <x, ide> , <y, ide> ; ↵
{ <read, k> ( <x, ide> ) ; ↵
<read, k> ( <y, ide> ) ; ↵
<if, k> <x, ide> > <y, ide> <then, k> <write, k> ( <1, num> ) ; ↵
<else, k> <write, k> ( <0, num> ) ; ↵
}

17 Blank, tab, newline, comments
- Have no semantic meaning
- Exist only for readability
- Can be removed by the scanner
- Line numbers should still be counted while they are removed

18 The End of Scanning
Two possible situations
- The character that represents the end of a program has been read (PASCAL: '.')
- The end of the source program file has been reached

19 Lexical Errors
Only limited types of errors can be found during scanning
- Illegal characters: §, ←
- A wrong first character: "/abc"
Lexical error recovery
- Once a lexical error has been found, the scanner does not stop; it takes some measure to continue scanning
- Ignore the current character and start again from the next character
Example: if § a then x = 12.else ……

20 Scanner
Two forms
[Figure: Independent scanner: CharList → Scanner → TokenList → Syntax Analysis. Attached scanner: Syntax Analysis calls the scanner, which reads the CharList and returns one Token per call.]

21 To Develop a Scanner
Now we know what the function of a scanner is. How do we implement one?
- Basis: the lexical rules of the source programming language
    The set of allowed characters
    The kinds of tokens it has
    The structure of each token type
- The scanner is developed according to these lexical rules

22 How to define the lexical structure of a programming language?
1. Natural language (English, Chinese, ……)
   - easy to write, but ambiguous and hard to implement
2. Formal languages
   - need some special background knowledge
   - concise, easy to implement
   - support automation

23 Two formal languages for defining the lexical structure of a programming language
- Finite automata
    Non-deterministic finite automata (NFA)
    Deterministic finite automata (DFA)
- Regular expressions (RE)
- Both can formally describe the lexical structure, that is, the set of allowed words
- They are equivalent
- FA is easy to implement; RE is easy to use

25 2.2 Finite Automata
- Definition of DFA
- Implementation of DFA
- Non-Deterministic Finite Automata
- Transforming NFA into DFA
- Minimizing DFA

26 Definition of DFA
- formal definition
- two ways of representations
- examples
- some concepts

27 Formal Definition of DFA
A DFA is a five-tuple (Σ, SS, S0, δ, TS)
- Σ (alphabet): the set of allowed characters; each character is called an input symbol
- SS = {S0, S1, S2, ……}: a finite set; each element is called a state
- S0 ∈ SS: the start state
- δ: SS × Σ → SS ∪ {⊥}: the transforming (transition) function
- TS ⊆ SS: the set of terminal (accept) states
Note: δ is a function that accepts a state and a symbol and returns either one unique state or ⊥ (undefined).
One DFA defines a set of strings; each string is a sequence of characters in Σ. The start state gives the starting point for generating strings, the terminal states give the end points, and the transforming function gives the rules for generating strings.

28 Features of a DFA
- One start state
- For a state and a symbol, there is at most one edge
Functions of a DFA
- It defines a set of strings
- It can be used for defining the lexical structure of a programming language

29 Two ways of Representations
Table
- Convenient for implementation
Graph
- Easy to read and understand

30 Two ways of Representations (Table)
Transforming table
- Start state: S0
- Terminal state: marked with *
- Column headings: characters
- Row headings: states
- Cell: a state or ⊥

31 Example of Transforming Table
        a     b     c     d
S0      S1    ⊥     S2    S*
S1      ⊥     S1    ⊥     S2
S2      S*    ⊥     ⊥     ⊥
S*      ⊥     ⊥     S*    ⊥
Σ: {a, b, c, d}
SS: {S0, S1, S2, S*}
Start state: S0
Set of terminal states: {S*}
δ: {(S0, a) → S1, (S0, c) → S2, (S0, d) → S*, (S1, b) → S1, (S1, d) → S2, (S2, a) → S*, (S*, c) → S*}

32 Two ways of Representations (Graph)
[Figure: legend of the graph notation, showing how the start state, a terminal state, an ordinary state, and an edge labeled with symbols (for example a, b) are drawn.]

33 Example of Graphical DFA
Σ: {a, b, c, d}
SS: {S0, S1, S2, S*}
Start state: S0
Set of terminal states: {S*}
δ: {(S0, a) → S1, (S0, c) → S2, (S0, d) → S*, (S1, b) → S1, (S1, d) → S2, (S2, a) → S*, (S*, c) → S*}
[Figure: the transition graph of this DFA, with one labeled edge per entry of δ.]

34 Some Concepts
String acceptable by a DFA
- If A is a DFA and a1 a2 … an is a string, and there exists a sequence of states (S0, S1, …, Sn) such that S0 -a1→ S1, S1 -a2→ S2, ……, Sn-1 -an→ Sn, where S0 is the start state and Sn is one of the accept states, then the string a1 a2 … an is acceptable by the DFA A.
Set of strings defined by a DFA
- The set of all strings that are acceptable by a DFA A is called the set of strings defined by A, denoted L(A).

35 Some Concepts
Special case: if a DFA consists of a single state with no edges, and that state is both the start state and the accept state, the set of strings defined by the DFA is {ε}, the set containing only the empty string (which is different from the empty set ∅).
[Figure: a single state S.]

36 Relating DFA to Lexical Structure of a Programming Language
- Use a DFA to define the lexical structure of one word type in a programming language, for example an unsigned real number
- A DFA can be defined for the lexical structure of all the words in a programming language
- The set of strings defined by that DFA is the set of allowed words in the programming language
- The implementation of the DFA can be used as a scanner for the programming language
[Figure: DFA for unsigned real numbers, with edges labeled 1, 2, …, 9; 0, 1, 2, …, 9; "."; and 0, 1, 2, …, 9.]

37 Assignment
For the standard C programming language
- Find out the token types and their lexical structures
- Write down a DFA for each token type

38 Implementation of DFA

39 Implementation of DFA
Objective (the meaning of implementing a DFA)
- Given a DFA that defines the rules for a set of strings
- Develop a program that
    reads a string
    checks whether this string is accepted by the DFA
Two ways
- Based on the transforming table of the DFA
- Based on the graphical representation of the DFA

40 Transforming Table based Implementation
Main idea
- Input: a string
- Output: true if the string is acceptable, otherwise false
- Data structure: the transforming table (a two-dimensional array T)
- Two variables
    CurrentState: records the current state
    CurrentChar: records the current character read from the string

41 Transforming Table based Implementation
General algorithm
1. CurrentState = S0;
2. read the first character as CurrentChar;
3. if CurrentChar is not the end of the string and T(CurrentState, CurrentChar) ≠ error, then CurrentState = T(CurrentState, CurrentChar), read the next character of the string as CurrentChar, and go to 3;
4. if CurrentChar is the end of the string and CurrentState is one of the terminal states, return true; otherwise, return false.
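The general algorithm can be sketched in C roughly as follows. This is a minimal sketch, not the course's reference code: it assumes the example DFA over {a, b, c, d} from the transforming table slide, and the names `dfa_accepts`, `column`, `SF` (standing for S*), and `ERR` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

enum { S0, S1, S2, SF, NSTATES, ERR = -1 };   /* SF stands for S*, ERR for ⊥ */

/* Transforming table T[state][symbol]; the columns are a, b, c, d. */
static const int T[NSTATES][4] = {
    /* S0 */ { S1,  ERR, S2,  SF  },
    /* S1 */ { ERR, S1,  ERR, S2  },
    /* S2 */ { SF,  ERR, ERR, ERR },
    /* S* */ { ERR, ERR, SF,  ERR },
};

/* Map a character to its table column, or -1 if it is not in the alphabet. */
static int column(char ch) {
    return (ch >= 'a' && ch <= 'd') ? ch - 'a' : -1;
}

/* Return true if the DFA accepts the NUL-terminated string s. */
bool dfa_accepts(const char *s) {
    int current_state = S0;                          /* step 1 */
    for (size_t i = 0; s[i] != '\0'; i++) {          /* steps 2 and 3 */
        int col = column(s[i]);
        if (col < 0 || T[current_state][col] == ERR)
            return false;                            /* no transition defined */
        current_state = T[current_state][col];
    }
    return current_state == SF;                      /* step 4 */
}
```

With this table, `dfa_accepts("abbdacc")` holds, while `dfa_accepts("cab")` does not, because S* has no edge on b.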

42 Example
        a     b     c     d
S0      S1    ⊥     S2    S*
S1      ⊥     S1    ⊥     S2
S2      S*    ⊥     ⊥     ⊥
S*      ⊥     ⊥     S*    ⊥
1) abbacc: false, because T(S1, a) = ⊥ (by contrast, abbdacc is accepted: S0 -a→ S1 -b→ S1 -b→ S1 -d→ S2 -a→ S* -c→ S* -c→ S*)
2) cab: false, because T(S*, b) = ⊥

43 Transforming table for the DFA
- Variables: CurrentChar, CurrentState
- Read the string to be checked
- Checking process:

44 Graph based Implementation of DFA
- Each state corresponds to a case statement
- Each edge corresponds to a goto statement
- For an accept state, add one more branch: if the current character is the end of the string, then accept
[Figure: state i with an edge labeled a to state j and an edge labeled b to state k.]
For an ordinary state i:
    Li: case CurrentChar of
          a     : goto Lj
          b     : goto Lk
          other : Error( )
For an accept state i (# marks the end of the string):
    Li: case CurrentChar of
          a     : goto Lj
          b     : goto Lk
          #     : return true;
          other : Error( )

45 [Figure: the example DFA graph with states S0, S1, S2, S*.]
    LS0: read character to CurrentChar;
         case CurrentChar of
           a: goto LS1;
           c: goto LS2;
           d: goto LS3;
           other: return false;
    LS1: read character to CurrentChar;
         case CurrentChar of
           b: goto LS1;
           d: goto LS2;
           other: return false;

46 [Figure: the example DFA graph with states S0, S1, S2, S*.]
    LS2: read character to CurrentChar;
         case CurrentChar of
           a: goto LS3;
           other: return false;
    LS3: read character to CurrentChar;      (LS3 corresponds to S*)
         case CurrentChar of
           c: goto LS3;
           #: return true;
           other: return false;
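The same labels translate almost literally into C with `switch` and `goto`. The sketch below is only an illustration under stated assumptions: the function name `dfa_accepts_goto` is hypothetical, the terminating `'\0'` of a C string plays the role of `#`, and the edge (S*, c) → S* is taken from the transition list of the example DFA.

```c
#include <stdbool.h>

/* Graph-based recognizer for the example DFA: one switch per state,
   one goto per edge. The string terminator '\0' plays the role of '#'. */
bool dfa_accepts_goto(const char *s) {
    char current_char;

    /* LS0: start state S0 */
    current_char = *s++;
    switch (current_char) {
    case 'a': goto LS1;
    case 'c': goto LS2;
    case 'd': goto LS3;
    default:  return false;
    }
LS1:
    current_char = *s++;
    switch (current_char) {
    case 'b': goto LS1;
    case 'd': goto LS2;
    default:  return false;
    }
LS2:
    current_char = *s++;
    switch (current_char) {
    case 'a': goto LS3;
    default:  return false;
    }
LS3:   /* accept state S* */
    current_char = *s++;
    switch (current_char) {
    case 'c':  goto LS3;
    case '\0': return true;     /* end of string in an accept state */
    default:   return false;
    }
}
```

Compared with the table-driven version, the automaton is hard-coded in the control flow, so changing the DFA means rewriting the function rather than editing a table.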

47 Definition of NFA

48 Formal Definition
An NFA is a five-tuple (Σ, SS, SS0, δ, TS)
- Σ (alphabet): the set of allowed characters; each character is called an input symbol
- SS = {S0, S1, S2, ……}: a finite set; each element is called a state
- SS0 ⊆ SS: the set of start states
- δ: SS × (Σ ∪ {ε}) → (power set of SS) ∪ {⊥}: the transforming (transition) function
- TS ⊆ SS: the set of terminal (accept) states
Note: δ is a function that accepts a state and a symbol (or ε) and returns a set of states or ⊥ (undefined).

49 Differences between DFA & NFA
                  DFA               NFA
Start state       One start state   Set of start states
ε edges           Not allowed       Allowed
T(S, a)           S' or ⊥           {S1, …, Sn} or ⊥
Implementation    Easy              Non-deterministic

50 Example of NFA
Σ: {a, b, c, d}
SS: {S0, S1, S2, S*}
Set of start states: {S0, S1}
Set of terminal states: {S*}
δ: {(S0, a) → {S1, S*}, (S0, ε) → {S2}, (S1, b) → {S1}, (S1, ε) → {S2}, (S2, ε) → {S*}, (S*, c) → {S*}}
[Figure: the transition graph of this NFA, with one labeled edge per target state in δ.]

51 From NFA to DFA

52 Main Idea
Solve two problems
- ε-edges: ε-closure(SS)
- Merging edges with the same symbol: NextStates(SS, a)
Conversion of NFA to DFA
- Use a set of NFA states as one DFA state
- Ensure that the same set of strings is accepted

53 ε-closure
For a given NFA A and a set of states SS:
- Initially, ε-closure(SS) = SS;
- If there exists a state s in ε-closure(SS) that has an ε-edge to a state s' and s' ∉ ε-closure(SS), add s' to ε-closure(SS);
- Repeat until no state in ε-closure(SS) has an ε-edge to a state that is not in ε-closure(SS).

54 ε-closure -- Example
[Figure: the example NFA graph.]
ε-closure({S0, S1}) =
① {S0, S1}
② {S0, S1, S2}
③ {S0, S1, S2}
④ {S0, S1, S2, S*}
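One way to compute this closure is to represent a set of states as a bit mask and iterate until nothing new is added. The sketch below assumes the ε-edges of the example NFA ((S0, ε) → {S2}, (S1, ε) → {S2}, (S2, ε) → {S*}); the names `eps`, `eps_closure`, `BIT`, and `SF` (standing for S*) are hypothetical.

```c
#include <stdint.h>

enum { S0, S1, S2, SF, NSTATES };            /* SF stands for S* */
#define BIT(s) (1u << (s))                   /* a set of states is a bit mask */

/* eps[s] is the set of states reachable from s by one ε-edge. */
static const uint32_t eps[NSTATES] = {
    [S0] = BIT(S2),
    [S1] = BIT(S2),
    [S2] = BIT(SF),
    [SF] = 0,
};

/* ε-closure(SS): start from SS and keep adding ε-successors
   until a full pass adds no new state. */
uint32_t eps_closure(uint32_t ss) {
    uint32_t closure = ss, before;
    do {
        before = closure;
        for (int s = 0; s < NSTATES; s++)
            if (closure & BIT(s))
                closure |= eps[s];
    } while (closure != before);
    return closure;
}
```

For the start set {S0, S1}, `eps_closure(BIT(S0) | BIT(S1))` yields the mask for {S0, S1, S2, S*}, matching step ④ of the example.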

55 Moving States
For a given set of states SS and a symbol a in an NFA A:
- NextStates(SS, a) = {s | there is a state s1 ∈ SS and an edge s1 -a→ s in A}

56 Moving States
[Figure: the example NFA graph.]
NextStates({S0, S1}, a) = {S1, S*}
NextStates({S0, S1}, b) = {S1}, since the only b-edge leaving S0 or S1 is (S1, b) → {S1}

57 Algorithm
Given an NFA A = (Σ, SS, SS0, δ, TS), generate an equivalent DFA A' = (Σ, SS', S0', δ', TS')
Steps
- (1) S0' = ε-closure(SS0); add S0' to SS';
- (2) select one state s from SS'; for each symbol a ∈ Σ, let s' = ε-closure(NextStates(s, a)); if s' is not empty, add (s, a) → s' to δ', and if s' ∉ SS', add s' to SS';
- (3) repeat (2) until all states in SS' are handled;
- (4) for a state s in SS', s = {S1, …, Sn}: if there exists Si ∈ TS, then s is an accept state of A'; add s to TS'.
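The four steps can be sketched in C with bit masks as state sets. This is a sketch under stated assumptions, not the course's implementation: it hard-codes the example NFA over the symbols a, b, c, applies ε-closure to the result of NextStates so that every DFA state is a closed set, and uses hypothetical names (`nfa_to_dfa`, `dstates`, `dtrans`, `daccept`, `SF` for S*).

```c
#include <stdint.h>

enum { S0, S1, S2, SF, NSTATES };             /* NFA states; SF stands for S* */
enum { NSYMS = 3, MAXD = 16 };                /* symbols a, b, c; 2^4 subsets at most */
#define BIT(s) (1u << (s))

/* Example NFA: eps[s] and move[s][sym] are sets of target states. */
static const uint32_t eps[NSTATES] = { BIT(S2), BIT(S2), BIT(SF), 0 };
static const uint32_t move[NSTATES][NSYMS] = {
    /* S0 */ { BIT(S1) | BIT(SF), 0,       0       },
    /* S1 */ { 0,                 BIT(S1), 0       },
    /* S2 */ { 0,                 0,       0       },
    /* S* */ { 0,                 0,       BIT(SF) },
};

static uint32_t eps_closure(uint32_t ss) {
    uint32_t before;
    do {
        before = ss;
        for (int s = 0; s < NSTATES; s++)
            if (ss & BIT(s)) ss |= eps[s];
    } while (ss != before);
    return ss;
}

static uint32_t next_states(uint32_t ss, int sym) {
    uint32_t out = 0;
    for (int s = 0; s < NSTATES; s++)
        if (ss & BIT(s)) out |= move[s][sym];
    return out;
}

/* Result: DFA state i is the NFA state set dstates[i]; dtrans[i][sym] is
   the index of the target DFA state, or -1 for ⊥; daccept[i] marks TS'. */
uint32_t dstates[MAXD];
int dtrans[MAXD][NSYMS];
int daccept[MAXD];

/* Build the DFA from the start set ss0; returns the number of DFA states. */
int nfa_to_dfa(uint32_t ss0) {
    int n = 0;
    dstates[n++] = eps_closure(ss0);                      /* step (1) */
    for (int i = 0; i < n; i++) {                         /* steps (2)-(3) */
        for (int sym = 0; sym < NSYMS; sym++) {
            uint32_t t = eps_closure(next_states(dstates[i], sym));
            int j = -1;
            if (t != 0) {
                for (j = 0; j < n && dstates[j] != t; j++) {}
                if (j == n) dstates[n++] = t;             /* new DFA state */
            }
            dtrans[i][sym] = j;
        }
        daccept[i] = (dstates[i] & BIT(SF)) != 0;         /* step (4) */
    }
    return n;
}
```

The array `dstates` doubles as the work list: states appended during the loop are handled by later iterations, which is what step (3) asks for.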

58 Example
[Figure: the example NFA graph.]
Σ = {a, b, c}
S0' = ε-closure({S0, S1}) = {S0, S1, S2, S*}
[Table: transforming table of the resulting DFA with columns a, b, c; the recoverable entries are the state sets {S0, S1, S2, S*}, {S1, S2, S*}, and {S*}.]

59 Minimizing DFA

60 Problem
Equivalence of two DFAs
- Two DFAs are equivalent if they accept the same set of strings
Among the DFAs that accept the same set of strings, the minimal DFA is the one with the smallest number of states
How does this happen?

61 Equivalent DFAs
[Figure: two equivalent DFAs. The first has states S0, S1, S2, S3*, S4* and edges labeled a, b, d, d; the second has states S0, S1, S* and edges labeled a, b, d.]
There are states that accept the same set of strings!

62 Main Idea
Equivalent states
- For two states s1 and s2 in a DFA, if s1 and s2 accept the same set of strings when each is treated as the start state, then s1 and s2 are called equivalent states
Two ways of minimizing a DFA
- Merging equivalent states
- Splitting non-equivalent states

63 Algorithm
Given a DFA A = (Σ, SS, S0, δ, TS), generate an equivalent DFA A' = (Σ, SS', S0', δ', TS')
Splitting steps
- (1) start with two groups: {non-terminal states} and {terminal states};
- (2) select one group of states SSi = {Si1, …, Sin} and replace SSi with split(SSi);
- (3) repeat (2) until no group can be split any further;
- (4) SS' = the set of groups;
- (5) S0' is the group containing S0;
- (6) a group consisting of terminal states of A is a terminal state of A';
- (7) δ': SSi -a→ SSj if there is an edge Si -a→ Sj in A with Si ∈ SSi and Sj ∈ SSj.

64 Splitting a Set of States
Given
- a DFA A = (Σ, SS, S0, δ, TS);
- groups of states {SS1, …, SSm}, with SS1 ∪ …… ∪ SSm = SS;
- SSi = {Si1, …, Sin}; split(SSi) splits SSi into two groups G1 and G2:
For j = 1 to n:
    if, for every a ∈ Σ, either (Si1, a) → Sk and (Sij, a) → Sl with Sk and Sl in the same group SSp, or both (Si1, a) and (Sij, a) are undefined, add Sij to G1;
    otherwise, add Sij to G2.

65 Simple Example
[Figure: the DFA with states S0, S1, S2, S3*, S4* and edges labeled a, b, d, d.]
Initial groups: {S0, S1, S2}, {S3*, S4*}
After splitting: {S0}, {S1, S2}, {S3*, S4*}
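The splitting procedure can be sketched in C as below. The sketch rests on assumptions that should be checked against the original figure: the example DFA is taken to have the edges S0 -a→ S1, S0 -b→ S2, S1 -d→ S3*, S2 -d→ S4*; an undefined transition (⊥) is treated as a target distinct from every group; and the names `minimize`, `delta`, `target_group`, and `NONE` are hypothetical.

```c
enum { NSTATES = 5, NSYMS = 3, NONE = -1 };    /* symbols: a, b, d */

/* delta[s][sym] for the assumed example DFA; NONE means ⊥. */
static const int delta[NSTATES][NSYMS] = {
    /* S0 */ { 1,    2,    NONE },
    /* S1 */ { NONE, NONE, 3    },
    /* S2 */ { NONE, NONE, 4    },
    /* S3 */ { NONE, NONE, NONE },
    /* S4 */ { NONE, NONE, NONE },
};
static const int is_terminal[NSTATES] = { 0, 0, 0, 1, 1 };

/* Group of the state reached from s on sym, or NONE if undefined. */
static int target_group(const int group[], int s, int sym) {
    int t = delta[s][sym];
    return t == NONE ? NONE : group[t];
}

/* Fill group[s] with the final group number of each state;
   returns the number of groups (states of the minimized DFA). */
int minimize(int group[NSTATES]) {
    int ngroups = 2, changed = 1;
    for (int s = 0; s < NSTATES; s++)               /* step (1) */
        group[s] = is_terminal[s];
    while (changed) {                               /* steps (2)-(3) */
        int old[NSTATES], nold = ngroups;
        changed = 0;
        for (int s = 0; s < NSTATES; s++)           /* snapshot of the groups */
            old[s] = group[s];
        for (int g = 0; g < nold; g++) {            /* split(SSg) */
            int first = -1, moved = 0;
            for (int s = 0; s < NSTATES; s++) {
                if (old[s] != g) continue;
                if (first < 0) { first = s; continue; }   /* Si1 stays in G1 */
                for (int sym = 0; sym < NSYMS; sym++)
                    if (target_group(old, s, sym) != target_group(old, first, sym)) {
                        group[s] = ngroups;         /* s goes to G2 */
                        moved = 1;
                        break;
                    }
            }
            if (moved) { ngroups++; changed = 1; }
        }
    }
    return ngroups;
}
```

Comparing against a snapshot of the groups keeps each pass independent of the order in which states are visited. On the assumed example the result is three groups, {S0}, {S1, S2}, and {S3*, S4*}.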

67 2.3 Regular Expressions
- Definition of Regular Expressions
- Regular Definition
- From Regular Expression to DFA

68 Definition of Regular Expressions (RE)
- Some Concepts
- Formal Definition of RE
- Example
- Properties of RE
- Extensions to RE
- Limitations of RE
- Using RE to define Lexical Structure

69 Some Concepts
- Alphabet: a non-empty finite set of symbols, denoted Σ; each of its elements is called a symbol.
- String: a finite sequence of symbols; ε represents the empty string. The set {ε}, which contains only the empty string, is different from the empty set ∅.
- Length of a string: the number of symbols in the string; |α| denotes the length of the string α.
- Concatenation of strings: if α and β are strings, αβ denotes the concatenation of the two strings; in particular, εα = αε = α.

70 Some Operators on Sets of Strings
- Product of sets of strings: if A and B are two sets of strings, AB is called their product, AB = {αβ | α ∈ A, β ∈ B}; in particular, {ε}A = A{ε} = A, whereas ∅A = A∅ = ∅, where ∅ represents the empty set.
- Power of a set of strings: if A is a set of strings, A^i is called the i-th power of A, where i is a non-negative integer: A^0 = {ε}, A^1 = A, A^2 = AA, A^k = AA……A (k copies).
- Positive closure: A+ = A^1 ∪ A^2 ∪ A^3 ∪ ……
- Star closure: A* = A^0 ∪ A^1 ∪ A^2 ∪ A^3 ∪ ……
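As a small worked instance of these operators, take the illustrative sets A = {a, b} and B = {c} (chosen here only as an example, not taken from the slides):

```latex
\begin{align*}
A &= \{a, b\}, \quad B = \{c\} \\
AB &= \{ac, bc\} \\
A^0 &= \{\varepsilon\}, \quad A^1 = \{a, b\}, \quad A^2 = \{aa, ab, ba, bb\} \\
A^+ &= A^1 \cup A^2 \cup A^3 \cup \cdots \\
A^* &= A^0 \cup A^+ = \{\varepsilon\} \cup A^+ \\
\{\varepsilon\} A &= A, \qquad \emptyset A = \emptyset
\end{align*}
```

The last line shows why {ε} and ∅ must be kept apart: the first is the identity for the product, the second annihilates it.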

71 Formal Definition
For a given alphabet Σ, a regular expression for Σ defines a set of strings over Σ. We use R_Σ to represent a regular expression for Σ, and L(R_Σ) to represent the set of strings that R_Σ defines.

72 Formal Definition
■ ∅ is a regular expression, L(∅) = { }
■ ε is a regular expression, L(ε) = {ε}
■ for any c ∈ Σ, c is a regular expression, L(c) = {c}
■ if A and B are regular expressions, the following operators can be used
- parentheses: (A), L((A)) = L(A)
- choice among alternatives: A | B, L(A | B) = L(A) ∪ L(B)
- concatenation: A B, L(A B) = L(A)L(B)
- repetition: A*, L(A*) = L(A)*

73 Example
Σ = {a, b}
RE                Set of strings
1. ab*            {a, ab, abb, abbb, ……}
2. a(a|b)*        {a, aa, ab, aaa, aab, aba, abb, ……}

74 Comparing with DFA
- RE and DFA are equivalent in describing sets of strings
- They can be converted into each other
- DFA is convenient for implementation
- RE is convenient for defining and understanding
- Both can be used to define the lexical structure of programming languages

75 Properties
A | B = B | A                commutativity of |
A | (B | C) = (A | B) | C    associativity of |
A (B C) = (A B) C            associativity of concatenation
A (B | C) = A B | A C        distributivity of concatenation over |
(A | B) C = A C | B C        distributivity of concatenation over |
A** = A*                     idempotence of closure
εA = Aε = A                  ε is the identity element for concatenation

76 Extensions
Some extensions can be made to facilitate definitions
- one or more repetitions: A+
- any symbol: "."
- range: [0-9] [a-z] [A-Z]
- not in the range: ~(a|b|c)
- optional: r? = (ε | r)

77 Limitations
RE cannot define structures such as
- Pairing, for example matching ( and )
- Nesting
RE cannot describe structures that require the same string to be repeated, for example w c w, where w is a string of a's and b's: (a|b)* c (a|b)* cannot be used, because it cannot guarantee that the strings on both sides of c are always the same.

78 Regular Definition

79 Definition
It is inconvenient to define a set of long strings with a single RE, so another formal notation, called a "regular definition", is introduced. The main idea is to give names to some sub-expressions of the RE.
Example: (1|2|…|9)(0|1|2|…|9)* can be written as
NZ_digit = 1|2|…|9
digit = NZ_digit | 0
NZ_digit digit*

80 Defining Lexical Structure of ToyL
letter = a|…|z|A|…|Z
digit = 0|…|9
NZ-digit = 1|…|9
Reserved words: Reserved = var | if | then | else | while | read | write | int
Identifiers: identifier = letter (letter | digit)*
Constant: integer: int = NZ-digit digit* | 0
Other symbols: syms = + | - | * | / | > | < | = | : | ; | ( | ) | { | }
Lexical structure: lex = Reserved | identifier | int | syms

81 From RE to NFA

82 Rules
■ ∅ is a regular expression, L(∅) = { }
■ ε is a regular expression, L(ε) = {ε}
■ for any c ∈ Σ, c is a regular expression, L(c) = {c}
[Figure: the corresponding NFAs, each drawn with a start state S0 and a terminal state S; the NFA for c has an edge labeled c from S0 to S.]

83 Rules
■ (A), L((A)) = L(A): no change
■ A B, L(A B) = L(A)L(B)
[Figure: NFA(A) followed by NFA(B), connected by an ε-edge.]

84 Rules
■ (A), L((A)) = L(A): no change
■ A | B, L(A | B) = L(A) ∪ L(B)
[Figure: NFA(A) and NFA(B) placed in parallel and connected by four ε-edges.]

85 Rules
■ A*, L(A*) = L(A)*
[Figure: NFA(A) with three added ε-edges that implement repetition.]

86 Attention
- The rules introduced above apply to NFAs that have one start state and one terminal state
- Any NFA can be extended to meet this requirement
[Figure: an NFA extended with a new start state and a new terminal state, connected to the original start and terminal states by ε-edges.]

87 Example
(a | b)* a b b
[Figure: step-by-step construction of the NFA for (a | b)* a b b, starting from the NFA for (a | b).]

89 2.4 Design and Implementation of a Scanner
- Developing a Scanner Manually
- A Scanner Generator - Lex

90 Developing a Scanner Manually
[Figure: developing a scanner. Lexical definition in regular expressions → NFA → (transforming) DFA → (minimizing) minimized DFA → (implementing) scanner.]

91 Implement Scanner with DFA
Implementation of a DFA
- Just checks whether a string is acceptable by the DFA
Implementation of a scanner
- Not just checking
- But recognizing an acceptable string (word) and establishing its internal representation <token-type, semantic information>

93 DFA for ToyL
[Figure: DFA with states start, IntNum, ID, Assign and done. Edge labels: NZ-digit, digit, letter, ":", "=", other, and the single-character symbols + - * / > < = : ; ( ) { }.]
- "other" refers to symbols that are not allowed.
- Reserved words (keywords) are decided by looking up each identifier in the reserved-word table.

94 Developing a Scanner from DFA
Input: a sequence of symbols, with a special symbol EOF marking the end of the sequence.
Output: a sequence of tokens.

95 Developing a Scanner from DFA
Token type:
typedef enum {
    IDE, NUM, ASS,            // identifier, integer, assignment sign
    PLUS, MINUS, MUL,         // +, -, *
    DIV, GT, LT, EQ,          // /, >, <, =
    SEMI, LPAREN, RPAREN,     // ;, (, )
    LG, RG, COLON,            // {, }, :
    VAR, THEN, ELSE, INT, WHILE, READ, WRITE, IF   // keywords
} TkType;

96 Developing a Scanner from DFA
Data structure for a token:
typedef struct {
    TkType type;      // token type
    char   val[50];   // token string (semantic information)
} Token;

97 Developing a Scanner from DFA
Global variables:
- char str[50]; ----- stores the characters read so far for the current token
- int len = 0; ----- the length of str
- Token tk; ----- the current token
- Token TokenList[100]; ----- the sequence of tokens
- int total = 0; ----- the number of tokens generated
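The fixed-size arrays above can overflow on long lexemes or long inputs. As a hedged sketch, appending to the token list might be wrapped in a helper that checks both limits first. AddToken, MAX_LEXEME and MAX_TOKENS are hypothetical names introduced here, and the enum is shortened to keep the example small.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_LEXEME 50    /* size of Token.val, as on the slide */
#define MAX_TOKENS 100   /* size of TokenList, as on the slide */

typedef enum { IDE, NUM, ASS } TkType;   /* shortened for the sketch */

typedef struct { TkType type; char val[MAX_LEXEME]; } Token;

Token TokenList[MAX_TOKENS];
int total = 0;

/* Appends a token; returns false if the list is full or the lexeme
   (plus its terminating NUL) does not fit in val. */
bool AddToken(TkType type, const char *lexeme) {
    assert(lexeme != NULL);
    if (total >= MAX_TOKENS || strlen(lexeme) >= (size_t)MAX_LEXEME)
        return false;
    TokenList[total].type = type;
    strcpy(TokenList[total].val, lexeme);   /* safe: length checked above */
    total++;
    return true;
}
```

A caller would treat a false return as a scanner error rather than silently truncating the lexeme.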

98 Developing a Scanner from DFA
Predefined functions:
- ReadNext() --- reads the current symbol into CurrentChar; returns false if that symbol is EOF, otherwise returns true.
- IsKeyword(str) --- checks whether str is one of the keywords; if so, it returns the number of that keyword, otherwise it returns -1.
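The slides leave these two functions unimplemented. One possible C sketch follows, under two stated assumptions: the input is a NUL-terminated buffer whose terminating NUL plays the role of EOF, and "the number of the keyword" is its index in a lookup table. SetInput is a hypothetical helper that is not on the slides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed input source: a NUL-terminated buffer; the NUL acts as EOF. */
static const char *input = "";
static size_t pos = 0;
char CurrentChar = '\0';

/* Hypothetical helper: points the reader at a new input buffer. */
void SetInput(const char *text) {
    assert(text != NULL);
    input = text;
    pos = 0;
    CurrentChar = '\0';
}

/* Reads the current symbol into CurrentChar; returns false at EOF. */
bool ReadNext(void) {
    if (input[pos] == '\0') {
        CurrentChar = '\0';
        return false;
    }
    CurrentChar = input[pos++];
    return true;
}

/* Returns the index of str in the keyword table, or -1 if absent. */
int IsKeyword(const char *str) {
    static const char *const keywords[] = {
        "var", "if", "then", "else", "while", "read", "write", "int"};
    assert(str != NULL);
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strcmp(str, keywords[i]) == 0) return (int)i;
    return -1;
}
```

Because -1 is a nonzero value in C, callers need to compare the result against -1 explicitly rather than use it directly as a condition.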

99 Developing a Scanner from DFA
if (not ReadNext()) exit;
start:
    case CurrentChar of
        "1".."9":           str[len] = CurrentChar; len++; goto IntNum;
        "a".."z", "A".."Z": str[len] = CurrentChar; len++; goto ID;
        ":":                goto Assign;
        "+":                tk.type = PLUS; if (not ReadNext()) exit;
        …… ("-, *, /, >, <, =, :, ;, (, ), {, }")
        other:              error();

100 Developing a Scanner from DFA
IntNum:
    if (not ReadNext()) { if (len != 0) error(); exit; }
    case CurrentChar of
        "0".."9": str[len] = CurrentChar; len++; goto IntNum;
        other:    str[len] = '\0';   // terminate the string before copying
                  tk.type = NUM; strcpy(tk.val, str); goto Done;

101 Developing a Scanner from DFA
ID:
    if (not ReadNext()) { if (len != 0) error(); exit; }
    case CurrentChar of
        "0".."9":           str[len] = CurrentChar; len++; goto ID;
        "a".."z", "A".."Z": str[len] = CurrentChar; len++; goto ID;
        other:              str[len] = '\0';   // terminate the string before the lookup
                            if (IsKeyword(str) != -1) { tk.type = IsKeyword(str); }
                            else { tk.type = IDE; strcpy(tk.val, str); }
                            goto Done;

102 Developing a Scanner from DFA
Assign:
    if (not ReadNext()) { if (len != 0) error(); exit; }
    case CurrentChar of
        "=":   tk.type = ASS; if (not ReadNext()) exit; goto Done;
        other: error();

103 Developing a Scanner from DFA
Done:
    TokenList[total] = tk;   // add the new token to the token list
    total++;
    len = 0;                 // start storing a new token string
    strcpy(str, "");         // reset the token string
    goto start;              // start scanning a new token
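The labelled fragments on slides 99 to 103 can also be assembled into a single loop-based function. The C sketch below is one possible reading, not the slides' own implementation. It makes several assumptions: the input is a NUL-terminated string, blanks are skipped, a single "0" is accepted as a constant (as the regular definition allows), a token that ends at end of input is emitted rather than reported as an error, and the token string is stored for every token type. The names scan, is_keyword, symbol_type and MAX_LEXEME are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_LEXEME 50

typedef enum { IDE, NUM, ASS, PLUS, MINUS, MUL, DIV, GT, LT, EQ, SEMI,
               LPAREN, RPAREN, LG, RG, COLON,
               VAR, THEN, ELSE, INT, WHILE, READ, WRITE, IF } TkType;

typedef struct { TkType type; char val[MAX_LEXEME]; } Token;

/* Returns the keyword's token type, or -1 if word is not a keyword. */
static int is_keyword(const char *word) {
    static const struct { const char *text; TkType type; } table[] = {
        {"var", VAR}, {"then", THEN}, {"else", ELSE}, {"int", INT},
        {"while", WHILE}, {"read", READ}, {"write", WRITE}, {"if", IF}};
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(word, table[i].text) == 0) return (int)table[i].type;
    return -1;
}

/* Maps a one-character symbol to its token type, or -1 if not a symbol. */
static int symbol_type(char c) {
    static const char symbols[] = "+-*/><=;(){}";
    static const TkType types[] = {PLUS, MINUS, MUL, DIV, GT, LT, EQ, SEMI,
                                   LPAREN, RPAREN, LG, RG};
    const char *p = (c != '\0') ? strchr(symbols, c) : NULL;
    return p ? (int)types[p - symbols] : -1;
}

/* Scans src into out[0..max-1]; returns the token count, or -1 on a
   lexical error or when a buffer would overflow. */
int scan(const char *src, Token *out, int max) {
    assert(src != NULL && out != NULL);
    int total = 0;
    size_t i = 0;
    while (src[i] != '\0') {
        unsigned char c = (unsigned char)src[i];
        if (isspace(c)) { i++; continue; }       /* assumption: skip blanks */
        if (total >= max) return -1;
        Token *tk = &out[total];
        size_t len = 0;
        if (isdigit(c)) {                        /* state IntNum */
            tk->val[len++] = src[i++];
            if (c != '0')                        /* "0" stands alone */
                while (isdigit((unsigned char)src[i])) {
                    if (len >= MAX_LEXEME - 1) return -1;
                    tk->val[len++] = src[i++];
                }
            tk->type = NUM;
        } else if (isalpha(c)) {                 /* state ID */
            while (isalnum((unsigned char)src[i])) {
                if (len >= MAX_LEXEME - 1) return -1;
                tk->val[len++] = src[i++];
            }
            tk->val[len] = '\0';
            int kw = is_keyword(tk->val);
            tk->type = (kw != -1) ? (TkType)kw : IDE;
        } else if (c == ':') {                   /* state Assign */
            if (src[i + 1] != '=') return -1;    /* lone ':' is an error */
            tk->val[len++] = src[i++];
            tk->val[len++] = src[i++];
            tk->type = ASS;
        } else {                                 /* single-character symbols */
            int sym = symbol_type(src[i]);
            if (sym == -1) return -1;            /* "other": not allowed */
            tk->val[len++] = src[i++];
            tk->type = (TkType)sym;
        }
        tk->val[len] = '\0';
        total++;                                 /* state Done */
    }
    return total;
}
```

For example, scanning "x := 10 + y1;" into an array of 16 tokens yields six tokens: IDE, ASS, NUM, PLUS, IDE, SEMI. Replacing the goto labels with branches inside one loop keeps the same states while letting each token write directly into its slot in the output array.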

104 A Scanner Generator – Lex
- There are different versions of Lex.
- flex is distributed with the GNU compiler package produced by the Free Software Foundation and is freely available on the Internet.
[Figure: .l file (RE-like definition) -> flex -> lexyy.c (yylex())]

105 Summary for §2. Scanning

106 Summary
About finite automata
- Definition of DFA (Σ, start state, set of states, set of terminal states, f)
- Definition of NFA (Σ, set of start states, set of states, set of terminal states, f)
- Differences between NFA and DFA
    - Number of start states
    - An NFA allows more than one edge from a state for the same symbol

107 Summary
About finite automata
- From NFA to DFA: main idea; solving problems
- Minimizing DFA: main idea; solving problems
- Implementing DFA

108 Summary
About regular expressions
- Definition of regular expression
- Regular definition
- From regular expression to NFA: main idea; solving problems
- Defining lexical structure with regular expressions

109 Summary
About scanners
- Defining the lexical structure of the programming language with regular expressions
- Transforming regular expressions into an NFA
- Transforming the NFA into a DFA
- Minimizing the DFA
- Implementing the DFA

110 Summary
Original problem
- Develop a scanner: read the source program as a stream of characters and recognize tokens according to the lexical rules of the source language.
General techniques
- Use RE to define the lexical structure;
- RE -> NFA -> DFA -> minimized DFA -> implement
General problem
- Use RE/FA to define the structural rules
- Check whether the input meets the structural rules

111 Summary
Application in similar problems
- Use RE (DFA) to formally describe structures
    - Strings
    - Security policies
    - Interface specifications
- Check whether a string meets the structural rules
    - Whether a certain execution meets the security policies
    - Property checking


113 Reading Assignment
Topic: How to develop a parser (syntax analyzer)?
Objectives: get to know
- What is a parser? (input, output, functions)
- The syntactic structure of C programs
- Different parsing techniques and their main ideas
References
- Optional textbooks
Hand in a report in either English or Chinese; one group will be asked to give a presentation at the beginning of the next class.
Tips:
- Collect more information from textbooks and the Internet;
- Form your own opinion.

