
1 Summary of Lexical Analysis, Tian Cong (田聪)

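2 General method and steps for constructing a lexical analyzer
1. Description: describe each pattern with a regular expression.
2. Construct NFAs: build an NFA for each regular expression.
3. Determinization: convert the NFA into an equivalent DFA.
4. Minimization: optimize the DFA so that it has the fewest states.
5. Build the lexical analyzer from the DFA (table-driven, direct-coded, or generated with LEX).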

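3 Formal concepts involved
• Regular expressions
• NFA
• DFA
Example regular expression: Char(char|digit)*, …
Example strings: abb, A111, …
Regular expressions, NFAs, and DFAs all describe the regular languages (regular sets).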

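4 The language hierarchy: regular languages, context-free languages, context-sensitive languages, recursively enumerable languages (each class is contained in the next)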

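5 Regular languages
• Regular languages
  • Regular expressions
  • Finite-state automata
• Context-free languages
  • Context-free grammars
  • Nondeterministic pushdown automata
• Context-sensitive languages
  • Context-sensitive grammars
  • Linear bounded automata (a restricted kind of Turing machine)
• Recursively enumerable languages
  • Phrase-structure grammars
  • Turing machines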

6 Algebraic Laws for REs
• Union and concatenation behave like addition and multiplication.
  • + is commutative and associative: a+b = b+a, (a+b)+c = a+(b+c).
  • Concatenation is associative: (a.b).c = a.(b.c).
  • Concatenation distributes over +: a.(b+c) = a.b + a.c.
• Exception: concatenation is not commutative: a.b ≠ b.a.

7 Identities and Annihilators
• ∅ is the identity for +: R + ∅ = R.
• ε is the identity for concatenation: εR = Rε = R.
• ∅ is the annihilator for concatenation: ∅R = R∅ = ∅.
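The laws on the last two slides can be spot-checked on small finite languages. The sketch below models a language as a Python `frozenset` of strings; that modelling choice and the names `union`, `concat`, `EMPTY`, and `EPSILON` are illustrative assumptions, not part of the slides. It checks individual instances only and proves nothing in general.

```python
# Finite languages modelled as frozensets of strings (an illustrative assumption).
EMPTY = frozenset()          # the empty language, written ∅ on the slides
EPSILON = frozenset({""})    # the language {ε}

def union(r, s):
    """Language union, written + on the slides."""
    return frozenset(r | s)

def concat(r, s):
    """Language concatenation: every string of r followed by every string of s."""
    return frozenset(x + y for x in r for y in s)

A, B, C = frozenset({"a"}), frozenset({"b"}), frozenset({"c"})
R = frozenset({"ab", "c"})

commutative_union = union(A, B) == union(B, A)
distributive = concat(A, union(B, C)) == union(concat(A, B), concat(A, C))
concat_not_commutative = concat(A, B) != concat(B, A)
union_identity = union(R, EMPTY) == R
concat_identity = concat(EPSILON, R) == R == concat(R, EPSILON)
annihilator = concat(EMPTY, R) == EMPTY == concat(R, EMPTY)
```

All six flags evaluate to `True` for these sample languages.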

8 Regular languages: related results
• How do we prove that regular expressions and NFAs are equivalent?
  (1) We need to show that for every RE, there is an automaton that accepts the same language.
  (2) And for every automaton, there is an RE defining its language.

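9 RE to ε-NFA: Basis
• Symbol a: [Figure: a start state with one transition labeled a to a final state.]
• ε: [Figure: a start state with one transition labeled ε to a final state.]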

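10 RE to ε-NFA: Induction 1 – Union
[Figure: the automata for E1 and E2 are combined into one for E1 ∪ E2. A new start state has ε-transitions to the start states of both automata, and the final state of each has an ε-transition to a new final state.]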

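11 RE to ε-NFA: Induction 2 – Concatenation
[Figure: the automaton for E1E2 links the final state of the automaton for E1 to the start state of the automaton for E2 by an ε-transition.]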

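12 RE to ε-NFA: Induction 3 – Closure
[Figure: the automaton for E* adds a new start state and a new final state to the automaton for E, joined by four ε-transitions: new start to old start, old final to new final, old final back to old start, and new start directly to new final.]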
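The basis and the three induction steps can be sketched as a recursive construction. This is a minimal sketch under stated assumptions: regular expressions are given as nested tuples (`("sym", "a")`, `("eps",)`, `("union", e1, e2)`, `("concat", e1, e2)`, `("star", e)`), an ε-transition is stored with label `None`, and the names `build`, `eclose`, and `accepts` are hypothetical.

```python
from itertools import count

def build(expr, trans, fresh):
    """Add an ε-NFA fragment for expr to trans; return its (start, final) states."""
    kind = expr[0]
    if kind in ("sym", "eps"):                      # basis: one transition
        s, f = next(fresh), next(fresh)
        trans.append((s, expr[1] if kind == "sym" else None, f))
        return s, f
    if kind == "concat":                            # final of E1 --ε--> start of E2
        s1, f1 = build(expr[1], trans, fresh)
        s2, f2 = build(expr[2], trans, fresh)
        trans.append((f1, None, s2))
        return s1, f2
    if kind == "union":                             # new start and final, four ε-edges
        s1, f1 = build(expr[1], trans, fresh)
        s2, f2 = build(expr[2], trans, fresh)
        s, f = next(fresh), next(fresh)
        trans.extend([(s, None, s1), (s, None, s2), (f1, None, f), (f2, None, f)])
        return s, f
    if kind == "star":                              # enter, leave, loop back, skip
        s1, f1 = build(expr[1], trans, fresh)
        s, f = next(fresh), next(fresh)
        trans.extend([(s, None, s1), (f1, None, f), (f1, None, s1), (s, None, f)])
        return s, f
    raise ValueError(f"unknown node kind: {kind}")

def eclose(states, trans):
    """All states reachable from `states` using ε-transitions only."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for p, label, r in trans:
            if p == q and label is None and r not in seen:
                seen.add(r)
                stack.append(r)
    return seen

def accepts(expr, w):
    """Build the ε-NFA for expr and simulate it on the string w."""
    trans = []
    start, final = build(expr, trans, count())
    current = eclose({start}, trans)
    for ch in w:
        moved = {r for p, label, r in trans if p in current and label == ch}
        current = eclose(moved, trans)
    return final in current

# a(a|1)*: a letter followed by letters or digits, in the spirit of Char(char|digit)*
IDENT = ("concat", ("sym", "a"), ("star", ("union", ("sym", "a"), ("sym", "1"))))
```

With this encoding, `accepts(IDENT, "a11")` holds while `accepts(IDENT, "1a")` does not.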

13 Regular languages: related results
• How do we prove that regular expressions and NFAs are equivalent?
  (1) We need to show that for every RE, there is an automaton that accepts the same language.
  (2) And for every automaton, there is an RE defining its language.

14 From Automata to RE
• Arden's rule
  • For any sets of strings S and T, the equation X = SX + T has X = S*T as a solution. Moreover, this solution is unique if ε is not in S.

15 From Automata to RE
• Given an automaton A:
  • A has states {q0, …, qn}, with q0 being the start state.
  • Let Xi denote the set of strings accepted by A starting in state qi.
  • Thus, L(A) = X0.
• We can write an equation for each Xi, defining it in terms of the sets corresponding to its successor states.

16 From Automata to RE
[Figure: automaton A0 with states q0, q1, q2, q3 and transitions labeled a, b, c.]

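17 From Automata to RE
[Figure: automaton A0 with states q0, q1, q2, q3 and transitions labeled a, b, c.]
Equations for A0:
(0) X0 = aX1 + bX3 + cX3
(1) X1 = aX3 + bX2 + cX0 + ε
(2) X2 = aX3 + bX3 + cX0
(3) X3 = aX3 + bX3 + cX3
From (3), X3 = (a+b+c)X3 + ∅, so by Arden's rule X3 = (a+b+c)*∅ = ∅.
With X3 = ∅ the system simplifies to:
(0) X0 = aX1
(1) X1 = bX2 + cX0 + ε
(2) X2 = cX0
Substituting (0) and (2) into (1): X1 = bcaX1 + caX1 + ε = (bc+c)aX1 + ε, so X1 = ((bc+c)a)* by Arden's rule.
Hence X0 = a((bc+c)a)*.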
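The result X0 = a((bc+c)a)* can be sanity-checked by brute force. The sketch below assumes the transitions of A0 are exactly those implied by equations (0) to (3), with q1 as the only final state (it is the only equation with an ε term); the names `DELTA`, `accepted_by_a0`, and `agree_up_to` are hypothetical. Agreement on all short strings is evidence, not a proof.

```python
import re
from itertools import product

# Transitions read off equations (0)-(3); q1 is final because only X1 contains ε.
DELTA = {
    ("q0", "a"): "q1", ("q0", "b"): "q3", ("q0", "c"): "q3",
    ("q1", "a"): "q3", ("q1", "b"): "q2", ("q1", "c"): "q0",
    ("q2", "a"): "q3", ("q2", "b"): "q3", ("q2", "c"): "q0",
    ("q3", "a"): "q3", ("q3", "b"): "q3", ("q3", "c"): "q3",
}

def accepted_by_a0(w):
    """Simulate A0 on w, starting in q0."""
    state = "q0"
    for ch in w:
        state = DELTA[(state, ch)]
    return state == "q1"

# a((bc+c)a)* written in Python regex syntax, where | plays the role of +.
X0 = re.compile(r"a((bc|c)a)*")

def agree_up_to(max_len):
    """Compare the automaton and the expression on every string over {a,b,c} up to max_len."""
    for n in range(max_len + 1):
        for letters in product("abc", repeat=n):
            w = "".join(letters)
            if accepted_by_a0(w) != bool(X0.fullmatch(w)):
                return False
    return True
```

Calling `agree_up_to(7)` compares the two descriptions on every string of length at most 7.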

18 DFA-to-RE
• Another approach
  • Page 93, Theorem 3.4 (Formal Languages and Automata, 形式语言与自动机).
  • Induction on k-paths.

19 Decision Properties of Regular Languages

20 Properties of Language Classes
• A language class is a set of languages.
  • We have one example of a language class: the regular languages.
  • Every regular expression denotes a language; the languages denoted by all regular expressions form the class of regular languages.
• Language classes have two important kinds of properties:
  1. Decision properties.
  2. Closure properties.

21 Decision Properties
• A decision property for a class of languages is an algorithm that takes a formal description of a language (e.g., a DFA) and tells whether or not some property holds.
• Example: Is language L empty?
• Suppose the representation is a DFA.
• Can you tell if L(A) = ∅ for DFA A?

22 Why Decision Properties?
• When we talked about protocols represented as DFA's, we noted that important properties of a good protocol were related to the language of the DFA.
• Example: “Does the protocol terminate?” = “Is the language finite?”
• Example: “Can the protocol fail?” = “Is the language nonempty?”

23 Why Decision Properties – (2)
• We might want a “smallest” representation for a language, e.g., a minimum-state DFA or a shortest RE.
• If you can't decide “Are these two languages the same?” (i.e., do two DFA's define the same language?), you can't find the “smallest.”

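24 Closure Properties
• A closure property of a language class says that given languages in the class, an operator (e.g., union) produces another language in the same class.
• Example: the regular languages are obviously closed under union, concatenation, and (Kleene) closure, because these are the operators in the definition of regular expressions. (What about complement? Intersection?)
  • ε is a regular expression.
  • If a is a symbol in Σ, then a is a regular expression.
  • If r and s are regular expressions over Σ, then
    (a) r|s is a regular expression;
    (b) rs is a regular expression;
    (c) r* is a regular expression.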

25 The Membership Question
• Our first decision property is the question: “Is string w in regular language L?” (the membership problem)
• Assume L is represented by a DFA A.
• Simulate the action of A on the sequence of input symbols forming w.

26 Example: Testing Membership
[Figure: a DFA with states A, B, C over {0, 1} is simulated on the input 0 1 0 1 1; arrows mark the next symbol and the current state.]
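The simulation described for the membership question takes only a few lines. The sketch below assumes a DFA stored as a dictionary from (state, symbol) pairs to states. The DFA shown is a hypothetical one accepting binary strings with an even number of 1's, not the three-state DFA of the figure, whose transitions are not recoverable here.

```python
def accepts(delta, start, finals, w):
    """Run the DFA on w one symbol at a time and report whether it ends in a final state."""
    state = start
    for symbol in w:
        state = delta[(state, symbol)]
    return state in finals

# Hypothetical example DFA: binary strings containing an even number of 1's.
EVEN_ONES = {
    ("even", "0"): "even", ("even", "1"): "odd",
    ("odd", "0"): "odd",   ("odd", "1"): "even",
}

in_language = accepts(EVEN_ONES, "even", {"even"}, "0110")       # two 1's
not_in_language = accepts(EVEN_ONES, "even", {"even"}, "01011")  # three 1's
```

The running time is proportional to the length of w, since each symbol costs one table lookup.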

32 The Emptiness Problem
• Given a regular language, does the language contain any string at all? (The emptiness problem.)
• Assume the representation is a DFA.
• Construct the transition graph.
• Compute the set of states reachable from the start state.
• If any final state is reachable, the language is nonempty; otherwise it is empty.
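The reachability test can be sketched as a breadth-first search over the transition graph. The dictionary representation of the DFA and the names `reachable`, `is_empty`, and `DELTA` are assumptions made for illustration; the three-state DFA is hypothetical.

```python
from collections import deque

def reachable(delta, start):
    """Set of states reachable from start in the transition graph of the DFA."""
    seen, queue = {start}, deque([start])
    while queue:
        q = queue.popleft()
        for (p, _symbol), r in delta.items():
            if p == q and r not in seen:
                seen.add(r)
                queue.append(r)
    return seen

def is_empty(delta, start, finals):
    """The language is empty exactly when no final state is reachable."""
    return not (reachable(delta, start) & set(finals))

# Hypothetical DFA: state u can never be reached from the start state s.
DELTA = {
    ("s", "0"): "s", ("s", "1"): "t",
    ("t", "0"): "t", ("t", "1"): "t",
    ("u", "0"): "u", ("u", "1"): "u",
}
```

Here `is_empty(DELTA, "s", {"u"})` holds, while `is_empty(DELTA, "s", {"t"})` does not.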

33 The Infiniteness Problem
• Is a given regular language infinite?
• Start with a DFA for the language.
• Key idea: if the DFA has n states, and the language contains any string of length n or more, then the language is infinite.
• Otherwise, the language is surely finite.
  • It is limited to strings of length less than n.

34 Proof of Key Idea
• If an n-state DFA accepts a string w of length n or more, then there must be a state that appears twice on the path labeled w from the start state to a final state.
• Because there are at least n+1 states along the path, but only n distinct states in the DFA.

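35 Proof – (2)
[Figure: a path labeled w, with |w| = 5, in a DFA with states s0, s1, s2, s3, s4; the path visits six states, so some state repeats.]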

36 Finding Cycles
1. Eliminate states not reachable from the start state.
2. Eliminate states that do not reach a final state.
3. Test if the remaining transition graph has any cycles. The language is infinite iff a cycle remains.
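The three steps can be sketched as follows. The DFA representation (a dictionary from (state, symbol) to state) and the names `is_infinite`, `FINITE`, and `INFINITE` are illustrative assumptions. Cycle detection here repeatedly deletes states that have no remaining successor, so a state survives only if it lies on a cycle or can reach one.

```python
def is_infinite(delta, start, finals):
    """Decide whether the DFA accepts infinitely many strings."""
    edges, reverse = {}, {}
    for (p, _symbol), r in delta.items():
        edges.setdefault(p, set()).add(r)
        reverse.setdefault(r, set()).add(p)

    def closure(sources, graph):
        seen, stack = set(sources), list(sources)
        while stack:
            q = stack.pop()
            for r in graph.get(q, ()):
                if r not in seen:
                    seen.add(r)
                    stack.append(r)
        return seen

    # Steps 1 and 2: keep states reachable from the start that can also reach a final state.
    remaining = closure({start}, edges) & closure(set(finals), reverse)

    # Step 3: strip states with no successor among the remaining ones until nothing changes.
    changed = True
    while changed:
        changed = False
        for q in list(remaining):
            if not (edges.get(q, set()) & remaining):
                remaining.discard(q)
                changed = True
    return bool(remaining)

# Hypothetical DFAs over {a, b}: the first accepts only "ab", the second accepts a*b.
FINITE = {
    (0, "a"): 1, (0, "b"): "dead", (1, "a"): "dead", (1, "b"): 2,
    (2, "a"): "dead", (2, "b"): "dead", ("dead", "a"): "dead", ("dead", "b"): "dead",
}
INFINITE = {
    (0, "a"): 0, (0, "b"): 1, (1, "a"): "dead", (1, "b"): "dead",
    ("dead", "a"): "dead", ("dead", "b"): "dead",
}
```

The dead state has a self-loop in both DFAs, but it cannot reach a final state, so step 2 removes it before cycles are examined.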

37 The Pumping Lemma
• We have, almost accidentally, proved a statement that is quite useful for showing certain languages are not regular.
• Called the pumping lemma for regular languages.

38 Statement of the Pumping Lemma
For every regular language L
there is an integer n (the number of states of a DFA for L), such that
for every string w in L of length ≥ n
we can write w = xyz such that:
1. |xy| ≤ n.
2. |y| > 0 (y is the string of labels along the first cycle on the path labeled w).
3. For all i ≥ 0, xy^i z is in L.

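39 Example: Use of Pumping Lemma
• The pumping lemma is used to prove that a language is not regular (it is a necessary but not sufficient condition).
• Example: {0^k 1^k | k ≥ 1} is not a regular language.
  • Suppose it were. Then there would be an associated n for the pumping lemma.
  • Let w = 0^n 1^n. We can write w = xyz, where |xy| ≤ n, so x and y consist of 0's, and y ≠ ε.
  • But then xyyz would be in L, and this string has more 0's than 1's, a contradiction.
The pumping lemma is a necessary but not sufficient condition for regularity. Every regular language satisfies the pumping lemma, so a language that fails it is certainly not regular. A language that satisfies it is not necessarily regular.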
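The argument can be illustrated by exhaustively trying every decomposition of w = 0^n 1^n for a few small n. This is only a sketch: the function names are hypothetical, `max_i` bounds the exponents tried, and a finite search of this kind illustrates the proof rather than replacing it.

```python
def in_language(s):
    """Membership in {0^k 1^k | k >= 1}."""
    k = len(s) // 2
    return k >= 1 and s == "0" * k + "1" * k

def some_split_pumps(w, n, max_i=3):
    """True if some w = xyz with |xy| <= n and |y| > 0 keeps x y^i z in the language for i = 0..max_i."""
    for xy_len in range(1, n + 1):
        for x_len in range(xy_len):          # x_len < xy_len guarantees y is nonempty
            x, y, z = w[:x_len], w[x_len:xy_len], w[xy_len:]
            if all(in_language(x + y * i + z) for i in range(max_i + 1)):
                return True
    return False

# For each candidate n, the string 0^n 1^n has no valid decomposition.
no_split_works = all(
    not some_split_pumps("0" * n + "1" * n, n) for n in range(1, 9)
)
```

Every split fails because y consists only of 0's, so pumping changes the number of 0's without changing the number of 1's.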

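40 Pumping Lemma
• Some languages satisfy the pumping lemma but are not regular.
• Jeffrey Jaffe's (MIT) pumping lemma is a necessary and sufficient condition for regular languages.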

41 Decision Property: Equivalence
• Given regular languages L and M, is L = M?
• Algorithm involves constructing the product DFA from DFA's for L and M.
• Let these DFA's have sets of states Q and R, respectively.
• Product DFA has set of states Q × R.
  • I.e., pairs [q, r] with q in Q, r in R.

42 Product DFA – Continued
• Start state = [q0, r0] (the start states of the DFA's for L, M).
• Transitions: δ([q,r], a) = [δ_L(q,a), δ_M(r,a)]
  • δ_L, δ_M are the transition functions for the DFA's of L, M.
  • That is, we simulate the two DFA's in the two state components of the product DFA.

43 Example: Product DFA
[Figure: two DFA's over {0, 1}, one with states A and B and one with states C and D, and their product DFA with states [A,C], [A,D], [B,C], [B,D].]

44 Equivalence Algorithm
• Make the final states of the product DFA be those states [q, r] such that exactly one of q and r is a final state of its own DFA.
  • If the two DFA's are equivalent, then whenever one accepts a string, the other accepts it too.
• Thus, the product accepts w iff w is in exactly one of L and M.

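45 Example: Equivalence
[Figure: the two DFA's with states A, B and C, D, and their product DFA with states [A,C], [A,D], [B,C], [B,D], with final states chosen for the equivalence test.]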

46 Equivalence Algorithm – (2)
• The product DFA's language is empty iff L = M.
• We already have an algorithm to test whether the language of a DFA is empty.
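The equivalence test can be sketched by exploring the reachable part of the product DFA and stopping as soon as a pair [q, r] is found in which exactly one component is final. The representation of a DFA as a `(delta, start, finals)` triple and the example automata are assumptions for illustration.

```python
def equivalent(dfa_l, dfa_m, alphabet):
    """True iff the two complete DFAs, given as (delta, start, finals), accept the same language."""
    (dl, sl, fl), (dm, sm, fm) = dfa_l, dfa_m
    start = (sl, sm)
    seen, stack = {start}, [start]
    while stack:
        q, r = stack.pop()
        if (q in fl) != (r in fm):       # a reachable final state of the product
            return False
        for a in alphabet:
            nxt = (dl[(q, a)], dm[(r, a)])
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return True

# Hypothetical DFAs: both accept binary strings with an even number of 1's.
TWO_STATE = ({("even", "0"): "even", ("even", "1"): "odd",
              ("odd", "0"): "odd", ("odd", "1"): "even"}, "even", {"even"})
THREE_STATE = ({("p", "0"): "p2", ("p", "1"): "q",
                ("p2", "0"): "p", ("p2", "1"): "q",
                ("q", "0"): "q", ("q", "1"): "p"}, "p", {"p", "p2"})
# Same transitions as TWO_STATE but accepting an odd number of 1's.
ODD_ONES = (TWO_STATE[0], "even", {"odd"})
```

For these automata, `equivalent(TWO_STATE, THREE_STATE, "01")` holds and `equivalent(TWO_STATE, ODD_ONES, "01")` does not.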

47 Decision Property: Containment
• Given regular languages L and M, is L ⊆ M?
• Algorithm also uses the product automaton.
• How do you define the final states [q, r] of the product so its language is empty iff L ⊆ M?
Answer: q is final; r is not.

48 Example: Containment
[Figure: the two DFA's with states A, B and C, D, and their product DFA with states [A,C], [A,D], [B,C], [B,D].]
Note: the only final state is unreachable, so containment holds.

49 The Minimum-State DFA for a Regular Language
• In principle, since we can test for equivalence of DFA's we can, given a DFA A, find the DFA with the fewest states accepting L(A).
• Test all smaller DFA's for equivalence with A.
• But that's a terrible algorithm.

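50 Efficient State Minimization
• The table-filling method
  • Indistinguishable states
  • Find as many distinguishable pairs of states as possible
[Figure residue: a transition table with columns r and b; the surviving entries read A B C B B C.]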

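51 Efficient State Minimization
• Basis: if p is an accepting state and q is a non-accepting state, then {p, q} is distinguishable.
• Induction: for states p and q, if r = δ(p, a) and s = δ(q, a) are distinguishable for some input symbol a, then p and q are distinguishable.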

52 Example: State Minimization
Remember this DFA? It was constructed for the chessboard NFA by the subset construction (* marks a final state).

                  r            b
  {1}             {2,4}        {5}
  {2,4}           {2,4,6,8}    {1,3,5,7}
  {5}             {2,4,6,8}    {1,3,7,9}
  {2,4,6,8}       {2,4,6,8}    {1,3,5,7,9}
  {1,3,5,7}       {2,4,6,8}    {1,3,5,7,9}
* {1,3,7,9}       {2,4,6,8}    {5}
* {1,3,5,7,9}     {2,4,6,8}    {1,3,5,7,9}

Here it is with more convenient state names:

      r   b
  A   B   C
  B   D   E
  C   D   F
  D   D   G
  E   D   G
* F   D   C
* G   D   G

53–60 Efficient State Minimization
[Figure sequence: next to the transition table for states A to G, the triangular table of state pairs (rows B to G, columns A to F) is filled in step by step, with an X marking each pair found to be distinguishable.]

61 Example – Concluded
[Figure: the completed table of state pairs; every pair is marked X except {D, E}.]
Replace D and E by H. The result is the minimum-state DFA (* marks a final state):

      r   b
  A   B   C
  B   H   H
  C   H   F
  H   H   G
* F   H   C
* G   H   G
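The table-filling procedure for this example can be sketched as below. The transition table is the one with state names A to G; taking F and G as the final states is an assumption based on the two starred rows. The function name and data layout are hypothetical.

```python
from itertools import combinations

# Transition table of the renamed chessboard DFA (inputs r and b).
DELTA = {
    "A": {"r": "B", "b": "C"}, "B": {"r": "D", "b": "E"},
    "C": {"r": "D", "b": "F"}, "D": {"r": "D", "b": "G"},
    "E": {"r": "D", "b": "G"}, "F": {"r": "D", "b": "C"},
    "G": {"r": "D", "b": "G"},
}
FINALS = {"F", "G"}   # assumption: the starred rows of the table

def indistinguishable_pairs(delta, finals):
    """Mark distinguishable pairs until nothing changes; return the unmarked pairs."""
    states = sorted(delta)
    # Basis: a final and a non-final state are distinguishable.
    marked = {frozenset(pair) for pair in combinations(states, 2)
              if (pair[0] in finals) != (pair[1] in finals)}
    changed = True
    while changed:
        changed = False
        for p, q in combinations(states, 2):
            if frozenset((p, q)) in marked:
                continue
            # Induction: some symbol leads to an already distinguishable pair.
            for a in delta[p]:
                successors = frozenset((delta[p][a], delta[q][a]))
                if len(successors) == 2 and successors in marked:
                    marked.add(frozenset((p, q)))
                    changed = True
                    break
    return [(p, q) for p, q in combinations(states, 2)
            if frozenset((p, q)) not in marked]

merge_candidates = indistinguishable_pairs(DELTA, FINALS)
```

Under these assumptions the only unmarked pair is (D, E), which matches the merge of D and E into H.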

62 Eliminating Unreachable States
• Unfortunately, combining indistinguishable states could leave us with unreachable states in the “minimum-state” DFA.
• Thus, before or after, remove states that are not reachable from the start state.

63 Closure Under Union
• If L and M are regular languages, so is L ∪ M.
• Proof: Let L and M be the languages of regular expressions R and S, respectively.
• Then R+S is a regular expression whose language is L ∪ M.

64 Closure Under Concatenation and Kleene Closure
• Same idea:
  • RS is a regular expression whose language is LM.
  • R* is a regular expression whose language is L*.

65 Closure Under Intersection
• If L and M are regular languages, then so is L ∩ M.
• Proof: Let A and B be DFA's whose languages are L and M, respectively.
• Construct C, the product automaton of A and B.
• Make the final states of C be the pairs consisting of final states of both A and B.

66 Example: Product DFA for Intersection
[Figure: the two DFA's with states A, B and C, D, and their product DFA with states [A,C], [A,D], [B,C], [B,D], with final states chosen for intersection.]

67 Closure Under Difference
• If L and M are regular languages, then so is L – M = strings in L but not M.
• Proof: Let A and B be DFA's whose languages are L and M, respectively.
• Construct C, the product automaton of A and B.
• Make the final states of C be the pairs where the A-state is final but the B-state is not.

68 Example: Product DFA for Difference
[Figure: the two DFA's with states A, B and C, D, and their product DFA with states [A,C], [A,D], [B,C], [B,D], with final states chosen for difference.]
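Intersection and difference use the same product automaton and differ only in the choice of final states. The sketch below builds the reachable part of the product and takes the acceptance rule as a parameter. The `(delta, start, finals)` representation, the function names, and both example DFA's are hypothetical and unrelated to the DFA's in the figure.

```python
def product_dfa(dfa_a, dfa_b, alphabet, accept):
    """Reachable part of the product of two complete DFAs, each given as (delta, start, finals).

    accept(a_is_final, b_is_final) decides which pairs become final states.
    """
    (da, sa, fa), (db, sb, fb) = dfa_a, dfa_b
    start = (sa, sb)
    delta, finals = {}, set()
    seen, stack = {start}, [start]
    while stack:
        q, r = stack.pop()
        if accept(q in fa, r in fb):
            finals.add((q, r))
        for a in alphabet:
            nxt = (da[(q, a)], db[(r, a)])
            delta[((q, r), a)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return delta, start, finals

def run(dfa, w):
    """Simulate a DFA given as (delta, start, finals) on the string w."""
    delta, state, finals = dfa
    for a in w:
        state = delta[(state, a)]
    return state in finals

# Hypothetical DFAs: an even number of 1's, and strings ending in 0.
EVEN_ONES = ({("e", "0"): "e", ("e", "1"): "o",
              ("o", "0"): "o", ("o", "1"): "e"}, "e", {"e"})
ENDS_IN_0 = ({("n", "0"): "z", ("n", "1"): "n",
              ("z", "0"): "z", ("z", "1"): "n"}, "n", {"z"})

INTERSECTION = product_dfa(EVEN_ONES, ENDS_IN_0, "01", lambda a, b: a and b)
DIFFERENCE = product_dfa(EVEN_ONES, ENDS_IN_0, "01", lambda a, b: a and not b)
```

Passing `lambda a, b: a != b` instead would give the final states used by the equivalence test.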

69 Closure Under Complementation
• The complement of a language L (with respect to an alphabet Σ such that Σ* contains L) is Σ* – L.
• Since Σ* is surely regular, and the regular languages are closed under difference, the complement of a regular language is always regular.

70 Closure Under Reversal – (2)
• Given language L, L^R is the set of strings whose reversal is in L.
• Example: L = {0, 01, 100}; L^R = {0, 10, 001}.
• Proof: Let E be a regular expression for L.
• We show how to reverse E, to provide a regular expression E^R for L^R.

71 Reversal of a Regular Expression
• Basis: If E is a symbol a, ε, or ∅, then E^R = E.
• Induction: If E is
  • F+G, then E^R = F^R + G^R.
  • FG, then E^R = G^R F^R.
  • F*, then E^R = (F^R)*.

72 Example: Reversal of a RE
• Let E = 01* + 10*.
• E^R = (01* + 10*)^R = (01*)^R + (10*)^R
  = (1*)^R 0^R + (0*)^R 1^R
  = (1^R)*0 + (0^R)*1
  = 1*0 + 0*1.
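The basis and induction rules translate directly into a recursive function. The sketch assumes regular expressions encoded as nested tuples (`("sym", "0")`, `("eps",)`, `("empty",)`, `("union", F, G)`, `("concat", F, G)`, `("star", F)`); that encoding and the name `reverse` are illustrative choices.

```python
def reverse(e):
    """Return the regular expression E^R for the tuple-encoded expression e."""
    kind = e[0]
    if kind in ("sym", "eps", "empty"):      # basis: E^R = E
        return e
    if kind == "union":                      # (F+G)^R = F^R + G^R
        return ("union", reverse(e[1]), reverse(e[2]))
    if kind == "concat":                     # (FG)^R = G^R F^R
        return ("concat", reverse(e[2]), reverse(e[1]))
    if kind == "star":                       # (F*)^R = (F^R)*
        return ("star", reverse(e[1]))
    raise ValueError(f"unknown node kind: {kind}")

ZERO, ONE = ("sym", "0"), ("sym", "1")

# E = 01* + 10*
E = ("union", ("concat", ZERO, ("star", ONE)), ("concat", ONE, ("star", ZERO)))

# Expected E^R = 1*0 + 0*1
E_REVERSED = ("union", ("concat", ("star", ONE), ZERO), ("concat", ("star", ZERO), ONE))

matches_slide = reverse(E) == E_REVERSED
```

Reversing twice returns the original expression, since each rule is its own inverse.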

73 Homomorphisms
• A homomorphism on an alphabet is a function that gives a string for each symbol in that alphabet.
• Example: h(0) = ab; h(1) = ε.
• Extend to strings by h(a1 … an) = h(a1) … h(an).
• Example: h(01010) = ababab.

74 Closure Under Homomorphism
• If L is a regular language, and h is a homomorphism on its alphabet, then h(L) = {h(w) | w is in L} is also a regular language.
• Proof: Let E be a regular expression for L.
• Apply h to each symbol in E.
• Language of resulting RE is h(L).

75 Example: Closure under Homomorphism
• Let h(0) = ab; h(1) = ε.
• Let L be the language of regular expression 01* + 10*.
• Then h(L) is the language of regular expression abε* + ε(ab)*.

76 Example – Continued
• abε* + ε(ab)* can be simplified.
• ε* = ε, so abε* = abε.
• ε is the identity under concatenation.
  • That is, εE = Eε = E for any RE E.
• Thus, abε* + ε(ab)* = abε + ε(ab)* = ab + (ab)*.
• Finally, L(ab) is contained in L((ab)*), so a RE for h(L) is (ab)*.
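The homomorphism example can be checked on short strings. The sketch applies h symbol by symbol and collects the images of all strings of L = L(01* + 10*) up to a length bound. The names `apply_h` and `image_of_language` are hypothetical, and a bounded enumeration only illustrates the claim that h(L) = L((ab)*).

```python
import re
from itertools import product

H = {"0": "ab", "1": ""}          # h(0) = ab, h(1) = ε

def apply_h(h, w):
    """Extend h from symbols to strings: h(a1 ... an) = h(a1) ... h(an)."""
    return "".join(h[symbol] for symbol in w)

def image_of_language(max_len):
    """Images under h of all strings in L(01* + 10*) of length 1..max_len."""
    language = re.compile(r"01*|10*")
    images = set()
    for n in range(1, max_len + 1):
        for letters in product("01", repeat=n):
            w = "".join(letters)
            if language.fullmatch(w):
                images.add(apply_h(H, w))
    return images

slide_example = apply_h(H, "01010")
short_images = image_of_language(4)
```

Every string collected this way is a repetition of ab, including the empty string, which is the image of the string 1.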

77 Summary of Regular Languages, Tian Cong (田聪)

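78 Regular languages
• Deterministic finite automata (DFA)
• Nondeterministic finite automata (NFA)
• Nondeterministic finite automata with ε-transitions (ε-NFA)
• Regular expressions (RE)
L(RE) = L(ε-NFA) = L(NFA) = L(DFA) = the regular languages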

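79 Properties of regular languages
• Pumping lemma (a necessary but not sufficient condition)
  • Can be used to prove that a particular language is not regular
  • Cannot be used to prove that a particular language is regular
• Decidability
  • Is the language accepted by an automaton empty?
  • Is a string w accepted by a given automaton?
  • Are two automata equivalent?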

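80 Properties of regular languages: closure
Regular languages are closed under:
• Union
• Intersection
• Complement
• Difference
• Reversal
• (Kleene) closure
• Concatenation
• Homomorphism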
