Published by Phillip Sharp. Modified over 9 years ago.
1
1 Summary of Lexical Analysis. Tian Cong (田聪)
2
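2 General method and steps for constructing a lexical analyzer
1. Description: describe each token pattern with a regular expression.
2. Construct NFAs: build an NFA for each regular expression.
3. Determinize: convert the NFA into an equivalent DFA.
4. Minimize: optimize the DFA so that it has the fewest states.
5. Construct the lexical analyzer from the DFA (table-driven, direct-coded, or generated by LEX).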
3
3 Formal concepts involved
Regular expressions, NFA, DFA.
Regular expression, for example: Char(char|digit)*, …
Strings, for example: abb, A111, …
Regular languages (regular sets).
4
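4 Regular languages, context-free languages, context-sensitive languages, recursively enumerable languages (each class is contained in the next).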
5
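5 Language class: grammar or notation; recognizing automaton
Regular languages: regular expressions; finite automata
Context-free languages: context-free grammars; nondeterministic pushdown automata
Context-sensitive languages: context-sensitive grammars; linear bounded automata (a restricted kind of Turing machine)
Recursively enumerable languages: phrase-structure grammars; Turing machines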
6
6 Algebraic Laws for REs
Union and concatenation behave like addition and multiplication.
+ is commutative and associative: a+b = b+a, (a+b)+c = a+(b+c).
Concatenation is associative: (a.b).c = a.(b.c).
Concatenation distributes over +: a.(b+c) = a.b + a.c.
Exception: concatenation is not commutative. In general, a.b ≠ b.a.
7
7 Identities and Annihilators
∅ is the identity for +: R + ∅ = R.
ε is the identity for concatenation: εR = Rε = R.
∅ is the annihilator for concatenation: ∅R = R∅ = ∅.
8
8 Regular languages: how do we prove that regular expressions and NFAs are equivalent?
(1) We need to show that for every RE, there is an automaton that accepts the same language.
(2) And for every automaton, there is an RE defining its language.
9
9 RE to ε-NFA: Basis [Figure: the basis automata. For a symbol a, a single arc labeled a; for ε, a single arc labeled ε.]
10
10 RE to ε-NFA: Induction 1 – Union [Figure: the automaton for the union of E1 and E2, built from the automata for E1 and for E2 joined by four ε-arcs.]
11
11 RE to ε-NFA: Induction 2 – Concatenation [Figure: the automaton for E1E2, built from the automata for E1 and for E2 joined by one ε-arc.]
12
12 RE to ε-NFA: Induction 3 – Closure [Figure: the automaton for E*, built from the automaton for E with four ε-arcs added.]
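The basis and the three induction cases can be sketched in code. This is a minimal illustration, not the slides' own implementation: the names `Frag`, `symbol`, `union`, `concat`, `star` and `accepts` are hypothetical, and an ε-arc is represented by the label `None`.

```python
import itertools

_ids = itertools.count()  # fresh state numbers (hypothetical helper)

class Frag:
    """An ε-NFA fragment with exactly one start and one accepting state."""
    def __init__(self, start, accept, edges):
        self.start, self.accept, self.edges = start, accept, edges

def symbol(a):
    # Basis: a single arc labeled a.
    s, f = next(_ids), next(_ids)
    return Frag(s, f, [(s, a, f)])

def union(x, y):
    # Induction 1: new start and accept states, four ε-arcs.
    s, f = next(_ids), next(_ids)
    eps = [(s, None, x.start), (s, None, y.start),
           (x.accept, None, f), (y.accept, None, f)]
    return Frag(s, f, x.edges + y.edges + eps)

def concat(x, y):
    # Induction 2: one ε-arc from the accept state of x to the start of y.
    return Frag(x.start, y.accept, x.edges + y.edges + [(x.accept, None, y.start)])

def star(x):
    # Induction 3: four ε-arcs (enter, leave, skip, repeat).
    s, f = next(_ids), next(_ids)
    eps = [(s, None, x.start), (x.accept, None, f),
           (s, None, f), (x.accept, None, x.start)]
    return Frag(s, f, x.edges + eps)

def eps_closure(states, edges):
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for p, label, r in edges:
            if p == q and label is None and r not in seen:
                seen.add(r)
                stack.append(r)
    return seen

def accepts(frag, w):
    current = eps_closure({frag.start}, frag.edges)
    for c in w:
        moved = {r for p, label, r in frag.edges if p in current and label == c}
        current = eps_closure(moved, frag.edges)
    return frag.accept in current

# Example: the ε-NFA for (a + bc)*
nfa = star(union(symbol("a"), concat(symbol("b"), symbol("c"))))
```

Each constructor adds the same number of ε-arcs as the corresponding slide, so the fragment for an expression grows linearly with the size of the expression.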
13
13 Regular languages: how do we prove that regular expressions and NFAs are equivalent?
(1) We need to show that for every RE, there is an automaton that accepts the same language.
(2) And for every automaton, there is an RE defining its language.
14
14 From Automata to RE: Arden's rule
For any sets of strings S and T, the equation X = SX + T has X = S*T as a solution. Moreover, this solution is unique if ε is not in S.
15
15 From Automata to RE
Given an automaton A with states {q0, …, qn}, where q0 is the start state, let Xi denote the set of strings accepted by A starting in state qi. Thus L(A) = X0.
We can write an equation for each Xi, defining it in terms of the sets corresponding to its successor states.
16
16 From Automata to RE [Figure: automaton A0 with states q0, q1, q2, q3 and arcs labeled a, b, c.]
17
17 From Automata to RE [Figure: automaton A0 with states q0, q1, q2, q3 and arcs labeled a, b, c.]
The equations for A0:
(0) X0 = aX1 + bX3 + cX3
(1) X1 = aX3 + bX2 + cX0 + ε
(2) X2 = aX3 + bX3 + cX0
(3) X3 = aX3 + bX3 + cX3
From (3), X3 = (a+b+c)X3 + ∅, so by Arden's rule X3 = (a+b+c)*∅ = ∅. Dropping the X3 terms:
(0) X0 = aX1
(1) X1 = bX2 + cX0 + ε
(2) X2 = cX0
Substituting (0) and (2) in (1): X1 = (bc+c)aX1 + ε, so X1 = ((bc+c)a)* by Arden's rule.
Hence X0 = a((bc+c)a)*.
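The result can be sanity-checked by brute force. The sketch below reads the transitions of A0 off equations (0) to (3), with q1 as the only accepting state because only X1 contains ε, and compares the automaton with the regular expression on all short strings. The names `DELTA`, `dfa_accepts` and `agree_up_to` are illustrative, and agreement on short strings is evidence, not a proof.

```python
import re
from itertools import product

# Transitions of A0, read off equations (0)-(3).
DELTA = {
    ("q0", "a"): "q1", ("q0", "b"): "q3", ("q0", "c"): "q3",
    ("q1", "a"): "q3", ("q1", "b"): "q2", ("q1", "c"): "q0",
    ("q2", "a"): "q3", ("q2", "b"): "q3", ("q2", "c"): "q0",
    ("q3", "a"): "q3", ("q3", "b"): "q3", ("q3", "c"): "q3",
}
FINALS = {"q1"}

# a((bc+c)a)* written in Python regex syntax.
PATTERN = re.compile(r"a(?:(?:bc|c)a)*")

def dfa_accepts(w):
    state = "q0"
    for sym in w:
        state = DELTA[(state, sym)]
    return state in FINALS

def agree_up_to(max_len):
    """True if the DFA and the regex agree on every string of length <= max_len."""
    for n in range(max_len + 1):
        for letters in product("abc", repeat=n):
            w = "".join(letters)
            if dfa_accepts(w) != bool(PATTERN.fullmatch(w)):
                return False
    return True
```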
18
18 DFA-to-RE: another approach
See page 93, Theorem 3.4 of 形式语言与自动机 (Formal Languages and Automata): induction on k-paths.
19
19 Decision Properties of Regular Languages
20
20 Properties of Language Classes
A language class is a set of languages. So far we have one example of a language class: the regular languages. Every regular expression denotes a language, and the languages denoted by regular expressions together form the class of regular languages.
Language classes have two important kinds of properties:
1. Decision properties.
2. Closure properties.
21
21 Decision Properties
A decision property for a class of languages is an algorithm that takes a formal description of a language (e.g., a DFA) and tells whether or not some property holds.
Example: Is language L empty? Suppose the representation is a DFA. Can you tell if L(A) = ∅ for DFA A?
22
22 Why Decision Properties?
When we talked about protocols represented as DFAs, we noted that important properties of a good protocol were related to the language of the DFA.
Example: “Does the protocol terminate?” = “Is the language finite?”
Example: “Can the protocol fail?” = “Is the language nonempty?”
23
23 Why Decision Properties – (2)
We might want a “smallest” representation for a language, e.g., a minimum-state DFA or a shortest RE.
If you can't decide “Are these two languages the same?” (i.e., do two DFAs define the same language?), you can't find the “smallest.”
24
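24 Closure Properties
A closure property of a language class says that given languages in the class, an operator (e.g., union) produces another language in the same class.
Example: the regular languages are obviously closed under union, concatenation, and (Kleene) closure, because these are exactly the operators of regular expressions. (What about complement? Intersection?)
Definition of regular expressions over Σ:
ε is a regular expression.
If a is a symbol of Σ, then a is a regular expression.
If r and s are regular expressions over Σ, then (a) r|s, (b) rs, and (c) r* are regular expressions.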
25
25 The Membership Question
Our first decision property is the membership problem: “Is string w in regular language L?”
Assume L is represented by a DFA A. Simulate the action of A on the sequence of input symbols forming w; w is in L exactly when the simulation ends in a final state.
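The simulation is a single loop over the input. The sketch below assumes a complete DFA stored as a dictionary from (state, symbol) to state. The example automaton (states A, B, C, accepting binary strings with no two consecutive 1's) is an assumption chosen for illustration, not necessarily the one drawn on the following slides.

```python
def accepts(delta, start, finals, w):
    """Simulate a complete DFA on w and report whether it ends in a final state."""
    state = start
    for sym in w:
        state = delta[(state, sym)]
    return state in finals

# Hypothetical example DFA: binary strings without two consecutive 1's.
DELTA = {
    ("A", "0"): "A", ("A", "1"): "B",
    ("B", "0"): "A", ("B", "1"): "C",
    ("C", "0"): "C", ("C", "1"): "C",
}
START, FINALS = "A", {"A", "B"}
```

On input 01011 the run visits A, A, B, A, B, C and ends in the non-final state C, so the string is rejected. The test takes time linear in the length of w.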
26
26 Example: Testing Membership [Figure: a DFA with states A, B, C over {0, 1} reading the input 01011, with markers for the next symbol and the current state. Slides 26 to 31 advance the simulation one symbol at a time.]
32
32 The Emptiness Problem
Given a regular language, does the language contain any string at all? (This is the emptiness problem.)
Assume the representation is a DFA. Construct the transition graph. Compute the set of states reachable from the start state. If any final state is reachable, the language is nonempty; otherwise it is empty.
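The reachability test can be sketched as a breadth-first search. The DFA encoding (a dictionary from (state, symbol) to state) and the names `reachable_states` and `is_empty` are assumptions made for this example.

```python
from collections import deque

def reachable_states(delta, start):
    """Breadth-first search over the transition graph of a DFA."""
    seen, queue = {start}, deque([start])
    while queue:
        q = queue.popleft()
        for (p, _sym), r in delta.items():
            if p == q and r not in seen:
                seen.add(r)
                queue.append(r)
    return seen

def is_empty(delta, start, finals):
    """The language is empty iff no final state is reachable from the start."""
    return reachable_states(delta, start).isdisjoint(finals)

# Hypothetical example: the final state Q is cut off from the start state P.
CUT_OFF = {("P", "0"): "P", ("P", "1"): "P", ("Q", "0"): "Q", ("Q", "1"): "Q"}
```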
33
33 The Infiniteness Problem
Is a given regular language infinite? Start with a DFA for the language.
Key idea: if the DFA has n states, and the language contains any string of length n or more, then the language is infinite. Otherwise, the language is surely finite, since it is limited to strings of length less than n.
34
34 Proof of Key Idea If an n-state DFA accepts a string w of length n or more, then there must be a state that appears twice on the path labeled w from the start state to a final state. Because there are at least n+1 states along the path.
35
35 Proof – (2) [Figure: the path labeled w, with |w| = 5, through a DFA with states s0 to s4; some state must repeat along the path.]
36
36 Finding Cycles 1. Eliminate states not reachable from the start state. 2. Eliminate states that do not reach a final state. 3. Test if the remaining transition graph has any cycles.
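The three steps can be sketched as follows. The encoding of the DFA and the name `is_infinite` are assumptions of this example. The cycle test used here repeatedly deletes states that have no remaining successor, so that anything left over must lie on or lead into a cycle; other cycle-detection methods would work equally well.

```python
def is_infinite(delta, start, finals):
    """Return True if the DFA accepts infinitely many strings."""
    succ, pred = {}, {}
    for (p, _sym), r in delta.items():
        succ.setdefault(p, set()).add(r)
        pred.setdefault(r, set()).add(p)

    def reach(seeds, edges):
        seen, stack = set(seeds), list(seeds)
        while stack:
            q = stack.pop()
            for r in edges.get(q, ()):
                if r not in seen:
                    seen.add(r)
                    stack.append(r)
        return seen

    # Steps 1 and 2: keep states reachable from the start that can reach a final state.
    useful = reach({start}, succ) & reach(set(finals), pred)

    # Step 3: delete states with no remaining successor until nothing changes.
    graph = {q: succ.get(q, set()) & useful for q in useful}
    changed = True
    while changed:
        changed = False
        for q in list(graph):
            if not graph[q].intersection(graph):
                del graph[q]
                changed = True
    return bool(graph)

# Hypothetical examples.
NO_11 = {("A", "0"): "A", ("A", "1"): "B", ("B", "0"): "A",
         ("B", "1"): "C", ("C", "0"): "C", ("C", "1"): "C"}   # no two consecutive 1's
ONLY_AB = {(0, "a"): 1, (0, "b"): "d", (1, "a"): "d", (1, "b"): 2,
           (2, "a"): "d", (2, "b"): "d", ("d", "a"): "d", ("d", "b"): "d"}  # just "ab"
```

In `ONLY_AB` the dead state d has a self-loop, but it cannot reach a final state, so step 2 removes it before the cycle test.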
37
37 The Pumping Lemma We have, almost accidentally, proved a statement that is quite useful for showing certain languages are not regular. Called the pumping lemma for regular languages.
38
38 Statement of the Pumping Lemma
For every regular language L, there is an integer n (the number of states of a DFA for L) such that for every string w in L of length ≥ n, we can write w = xyz such that:
1. |xy| ≤ n.
2. |y| > 0.
3. For all i ≥ 0, xy^i z is in L.
Here y is the string of labels along the first cycle on the path labeled w.
39
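39 Example: Use of the Pumping Lemma
The pumping lemma is used to prove that a language is not regular (it is a necessary but not sufficient condition for regularity).
Example: {0^k 1^k | k ≥ 1} is not a regular language.
Suppose it were. Then there would be an associated n for the pumping lemma. Let w = 0^n 1^n. We can write w = xyz, where x and y consist of 0's (because |xy| ≤ n) and y ≠ ε. But then xyyz would be in L, and this string has more 0's than 1's, a contradiction.
Every regular language must satisfy the pumping lemma. If a language does not satisfy the pumping lemma, it is certainly not regular. If it does satisfy the pumping lemma, it is still not necessarily regular.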
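The argument can be illustrated for a concrete n by trying every split the lemma allows. This is a small sketch with hypothetical names (`in_language`, `every_split_fails`); checking one value of n illustrates the proof but does not replace it.

```python
def in_language(w):
    """Membership in {0^k 1^k | k >= 1}."""
    k = len(w) // 2
    return k >= 1 and w == "0" * k + "1" * k

def every_split_fails(n):
    """For w = 0^n 1^n, check that pumping y once leaves the language
    for every split w = xyz with |xy| <= n and |y| > 0."""
    w = "0" * n + "1" * n
    for i in range(n):                 # i = |x|
        for j in range(i + 1, n + 1):  # j = |xy|, so |y| = j - i > 0
            x, y, z = w[:i], w[i:j], w[j:]
            if in_language(x + y + y + z):
                return False
    return True
```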
40
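40 Pumping Lemma
Some languages satisfy the pumping lemma but are not regular.
The pumping lemma of Jeffrey Jaffe (MIT) gives a condition that is both necessary and sufficient for a language to be regular.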
41
41 Decision Property: Equivalence
Given regular languages L and M, is L = M?
The algorithm involves constructing the product DFA from DFAs for L and M. Let these DFAs have sets of states Q and R, respectively. The product DFA has set of states Q × R, i.e., pairs [q, r] with q in Q, r in R.
42
42 Product DFA – Continued
Start state = [q0, r0] (the start states of the DFAs for L, M).
Transitions: δ([q,r], a) = [δL(q,a), δM(r,a)], where δL, δM are the transition functions for the DFAs of L, M.
That is, we simulate the two DFAs in the two state components of the product DFA.
43
43 Example: Product DFA [Figure: a DFA with states A, B, a DFA with states C, D, and their product DFA with states [A,C], [A,D], [B,C], [B,D], all over inputs 0 and 1.]
44
44 Equivalence Algorithm
Make the final states of the product DFA be those states [q, r] such that exactly one of q and r is a final state of its own DFA. (If the DFAs are equivalent, then whenever one accepts, the other accepts too.)
Thus, the product accepts w iff w is in exactly one of L and M.
45
45 Example: Equivalence [Figure: the two DFAs with states A, B and C, D, and their product DFA with states [A,C], [A,D], [B,C], [B,D], with final states chosen for the equivalence test.]
46
46 Equivalence Algorithm – (2)
The product DFA's language is empty iff L = M. We already have an algorithm to test whether the language of a DFA is empty.
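The equivalence test can be sketched without materializing the whole product: explore the reachable pairs [q, r] and stop as soon as one of them is final in exactly one component. The DFA encoding (a triple of transition dictionary, start state, final states) and the name `equivalent` are assumptions of this example.

```python
def equivalent(dfa_l, dfa_m, alphabet):
    """True iff the two complete DFAs accept the same language."""
    (dl, sl, fl), (dm, sm, fm) = dfa_l, dfa_m
    seen, stack = {(sl, sm)}, [(sl, sm)]
    while stack:
        q, r = stack.pop()
        # A reachable product state that is final in exactly one component
        # witnesses a string in exactly one of the two languages.
        if (q in fl) != (r in fm):
            return False
        for a in alphabet:
            nxt = (dl[(q, a)], dm[(r, a)])
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return True

# Hypothetical examples over {0, 1}: an even number of 1's, in two different encodings.
EVEN = ({("E", "0"): "E", ("E", "1"): "O", ("O", "0"): "O", ("O", "1"): "E"}, "E", {"E"})
MOD4 = ({(i, c): (i + 1) % 4 if c == "1" else i for i in range(4) for c in "01"}, 0, {0, 2})
ODD = (EVEN[0], "E", {"O"})
```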
47
47 Decision Property: Containment
Given regular languages L and M, is L ⊆ M?
The algorithm also uses the product automaton. How do you define the final states [q, r] of the product so its language is empty iff L ⊆ M?
Answer: q is final; r is not.
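A minimal sketch of the containment test, under the same assumed DFA encoding (transition dictionary, start state, final states); `is_subset` is a hypothetical name. A reachable pair [q, r] with q final and r not final corresponds to a string in L that is missing from M.

```python
def is_subset(dfa_l, dfa_m, alphabet):
    """True iff L(dfa_l) is contained in L(dfa_m); both DFAs must be complete."""
    (dl, sl, fl), (dm, sm, fm) = dfa_l, dfa_m
    seen, stack = {(sl, sm)}, [(sl, sm)]
    while stack:
        q, r = stack.pop()
        if q in fl and r not in fm:
            return False  # some string is in L but not in M
        for a in alphabet:
            nxt = (dl[(q, a)], dm[(r, a)])
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return True

# Hypothetical examples: number of 1's divisible by 4, and number of 1's even.
DIV4 = ({(i, c): (i + 1) % 4 if c == "1" else i for i in range(4) for c in "01"}, 0, {0})
EVEN = ({("E", "0"): "E", ("E", "1"): "O", ("O", "0"): "O", ("O", "1"): "E"}, "E", {"E"})
```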
48
48 Example: Containment [Figure: the two DFAs with states A, B and C, D, and their product DFA with states [A,C], [A,D], [B,C], [B,D].]
Note: the only final state is unreachable, so containment holds.
49
49 The Minimum-State DFA for a Regular Language
In principle, since we can test for equivalence of DFAs, we can, given a DFA A, find the DFA with the fewest states accepting L(A): test all smaller DFAs for equivalence with A. But that's a terrible algorithm.
50
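50 Efficient State Minimization
The table-filling method. Goal: identify the indistinguishable states by doing our best to find every distinguishable pair of states.
     r  b
  A  B  C
  B  B  C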
51
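51 Efficient State Minimization
Basis: if p is an accepting state and q is a non-accepting state, then the pair {p, q} is distinguishable.
Induction: for states p and q, if r = δ(p,a) and s = δ(q,a) are distinguishable for some input symbol a, then p and q are distinguishable.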
52
52 Example: State Minimization
Remember this DFA? It was constructed for the chessboard NFA by the subset construction (* marks final states).
                   r            b
  {1}              {2,4}        {5}
  {2,4}            {2,4,6,8}    {1,3,5,7}
  {5}              {2,4,6,8}    {1,3,7,9}
  {2,4,6,8}        {2,4,6,8}    {1,3,5,7,9}
  {1,3,5,7}        {2,4,6,8}    {1,3,5,7,9}
* {1,3,7,9}        {2,4,6,8}    {5}
* {1,3,5,7,9}      {2,4,6,8}    {1,3,5,7,9}
Here it is with more convenient state names:
     r  b
  A  B  C
  B  D  E
  C  D  F
  D  D  G
  E  D  G
* F  D  C
* G  D  G
53
53 Efficient State Minimization [Figure: the transition table of the DFA with states A to G next to the triangular table of state pairs. Slides 53 to 60 fill the table step by step, marking each pair found to be distinguishable with X.]
61
61 Example – Concluded
[Figure: the completed table of state pairs. Every pair is marked X except {D, E}.]
The DFA before minimization:
     r  b
  A  B  C
  B  D  E
  C  D  F
  D  D  G
  E  D  G
* F  D  C
* G  D  G
Replace D and E by H. The result is the minimum-state DFA:
     r  b
  A  B  C
  B  H  H
  C  H  F
  H  H  G
* F  H  C
* G  H  G
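The table-filling method on this example can be sketched as follows. The function name `distinguishable_pairs` and the dictionary encoding of the transition table are assumptions of this sketch; the states, transitions and final states are the ones named A to G in the example.

```python
from itertools import combinations

def distinguishable_pairs(states, alphabet, delta, finals):
    """Table filling: return the set of distinguishable pairs as frozensets."""
    # Basis: one state accepting, the other not.
    marked = {frozenset((p, q)) for p, q in combinations(states, 2)
              if (p in finals) != (q in finals)}
    # Induction: mark {p, q} if some symbol leads to an already marked pair.
    changed = True
    while changed:
        changed = False
        for p, q in combinations(states, 2):
            pair = frozenset((p, q))
            if pair in marked:
                continue
            for a in alphabet:
                successors = frozenset((delta[(p, a)], delta[(q, a)]))
                if len(successors) == 2 and successors in marked:
                    marked.add(pair)
                    changed = True
                    break
    return marked

# The example DFA with the convenient state names; F and G are final.
STATES = "ABCDEFG"
ROWS = {"A": "BC", "B": "DE", "C": "DF", "D": "DG", "E": "DG", "F": "DC", "G": "DG"}
DELTA = {(s, a): ROWS[s][i] for s in STATES for i, a in enumerate("rb")}
FINALS = {"F", "G"}

marked = distinguishable_pairs(STATES, "rb", DELTA, FINALS)
unmarked = {frozenset(p) for p in combinations(STATES, 2)} - marked
```

Here `unmarked` contains only the pair {D, E}, which are the two states merged into H.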
62
62 Eliminating Unreachable States
Unfortunately, combining indistinguishable states could leave us with unreachable states in the “minimum-state” DFA. Thus, before or after, remove states that are not reachable from the start state.
63
63 Closure Under Union
If L and M are regular languages, so is L ∪ M.
Proof: Let L and M be the languages of regular expressions R and S, respectively. Then R+S is a regular expression whose language is L ∪ M.
64
64 Closure Under Concatenation and Kleene Closure Same idea: RS is a regular expression whose language is LM. R* is a regular expression whose language is L*.
65
65 Closure Under Intersection
If L and M are regular languages, then so is L ∩ M.
Proof: Let A and B be DFAs whose languages are L and M, respectively. Construct C, the product automaton of A and B. Make the final states of C be the pairs consisting of final states of both A and B.
66
66 Example: Product DFA for Intersection [Figure: the two DFAs with states A, B and C, D, and their product DFA with states [A,C], [A,D], [B,C], [B,D], with final states chosen for intersection.]
67
67 Closure Under Difference
If L and M are regular languages, then so is L – M = strings in L but not M.
Proof: Let A and B be DFAs whose languages are L and M, respectively. Construct C, the product automaton of A and B. Make the final states of C be the pairs where the A-state is final but the B-state is not.
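The intersection and difference proofs use the same product automaton and differ only in the choice of final states. A minimal sketch, assuming complete DFAs encoded as (transition dictionary, start state, final states); the names `product_dfa`, `accepts` and the `is_final` parameter are hypothetical.

```python
def product_dfa(dfa_a, dfa_b, alphabet, is_final):
    """Build the reachable part of the product DFA.
    is_final(a_is_final, b_is_final) decides which pairs are final."""
    (da, sa, fa), (db, sb, fb) = dfa_a, dfa_b
    start = (sa, sb)
    delta, finals = {}, set()
    seen, stack = {start}, [start]
    while stack:
        q, r = stack.pop()
        if is_final(q in fa, r in fb):
            finals.add((q, r))
        for a in alphabet:
            nxt = (da[(q, a)], db[(r, a)])
            delta[((q, r), a)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return delta, start, finals

def accepts(dfa, w):
    delta, state, finals = dfa
    for a in w:
        state = delta[(state, a)]
    return state in finals

# Hypothetical examples over {0, 1}.
EVEN_ZEROS = ({("e", "0"): "o", ("o", "0"): "e", ("e", "1"): "e", ("o", "1"): "o"}, "e", {"e"})
ENDS_IN_1 = ({("n", "0"): "n", ("y", "0"): "n", ("n", "1"): "y", ("y", "1"): "y"}, "n", {"y"})

both = product_dfa(EVEN_ZEROS, ENDS_IN_1, "01", lambda a, b: a and b)            # intersection
only_first = product_dfa(EVEN_ZEROS, ENDS_IN_1, "01", lambda a, b: a and not b)  # difference
```

Passing `lambda a, b: a != b` instead would give the automaton used by the equivalence test.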
68
68 Example: Product DFA for Difference [Figure: the two DFAs with states A, B and C, D, and their product DFA with states [A,C], [A,D], [B,C], [B,D], with final states chosen for difference.]
69
69 Closure Under Complementation
The complement of a language L (with respect to an alphabet Σ such that Σ* contains L) is Σ* – L.
Since Σ* is surely regular and the regular languages are closed under difference, the complement of a regular language is always regular.
70
70 Closure Under Reversal – (2)
Given language L, L^R is the set of strings whose reversal is in L.
Example: L = {0, 01, 100}; L^R = {0, 10, 001}.
Proof: Let E be a regular expression for L. We show how to reverse E, to provide a regular expression E^R for L^R.
71
71 Reversal of a Regular Expression
Basis: If E is a symbol a, ε, or ∅, then E^R = E.
Induction: If E is
F+G, then E^R = F^R + G^R.
FG, then E^R = G^R F^R.
F*, then E^R = (F^R)*.
72
72 Example: Reversal of a RE
Let E = 01* + 10*.
E^R = (01* + 10*)^R = (01*)^R + (10*)^R
= (1*)^R 0^R + (0*)^R 1^R
= (1^R)*0 + (0^R)*1
= 1*0 + 0*1.
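The basis and induction rules translate directly into a recursive function over a syntax tree. In this sketch the tuple encoding of regular expressions and the names `reverse` and `show` are assumptions made for illustration.

```python
# Regular expressions as nested tuples:
# ("sym", a), ("eps",), ("empty",), ("union", F, G), ("cat", F, G), ("star", F)

def reverse(e):
    kind = e[0]
    if kind in ("sym", "eps", "empty"):   # Basis: E^R = E
        return e
    if kind == "union":                   # (F+G)^R = F^R + G^R
        return ("union", reverse(e[1]), reverse(e[2]))
    if kind == "cat":                     # (FG)^R = G^R F^R
        return ("cat", reverse(e[2]), reverse(e[1]))
    if kind == "star":                    # (F*)^R = (F^R)*
        return ("star", reverse(e[1]))
    raise ValueError(f"unknown node: {kind}")

def show(e):
    """Print an expression with only the parentheses it needs."""
    kind = e[0]
    if kind == "sym":
        return e[1]
    if kind == "eps":
        return "ε"
    if kind == "empty":
        return "∅"
    if kind == "union":
        return show(e[1]) + "+" + show(e[2])
    if kind == "cat":
        return "".join("(" + show(p) + ")" if p[0] == "union" else show(p) for p in e[1:])
    if kind == "star":
        inner = show(e[1])
        return (inner if e[1][0] == "sym" else "(" + inner + ")") + "*"
    raise ValueError(f"unknown node: {kind}")

ZERO, ONE = ("sym", "0"), ("sym", "1")
E = ("union", ("cat", ZERO, ("star", ONE)), ("cat", ONE, ("star", ZERO)))  # 01* + 10*
```

Applying `show(reverse(E))` to the example gives 1*0+0*1, matching the derivation above.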
73
73 Homomorphisms
A homomorphism on an alphabet is a function that gives a string for each symbol in that alphabet.
Example: h(0) = ab; h(1) = ε.
Extend to strings by h(a1…an) = h(a1)…h(an).
Example: h(01010) = ababab.
74
74 Closure Under Homomorphism If L is a regular language, and h is a homomorphism on its alphabet, then h(L) = {h(w) | w is in L} is also a regular language. Proof: Let E be a regular expression for L. Apply h to each symbol in E. Language of resulting RE is h(L).
75
75 Example: Closure under Homomorphism
Let h(0) = ab; h(1) = ε. Let L be the language of regular expression 01* + 10*.
Then h(L) is the language of regular expression abε* + ε(ab)*.
76
76 Example – Continued
abε* + ε(ab)* can be simplified.
ε* = ε, so abε* = abε.
ε is the identity under concatenation. That is, εE = Eε = E for any RE E.
Thus, abε* + ε(ab)* = abε + ε(ab)* = ab + (ab)*.
Finally, L(ab) is contained in L((ab)*), so a RE for h(L) is (ab)*.
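The homomorphism from this example can be applied to strings directly, which gives a quick check that images of strings in L land in (ab)*. This is a sketch only: the names `H`, `h`, `L_PATTERN` and `IMAGE_PATTERN` are illustrative, and checking a handful of strings does not prove that h(L) equals the language of (ab)*.

```python
import re

H = {"0": "ab", "1": ""}   # h(0) = ab, h(1) = ε

def h(w):
    """Extend the homomorphism from symbols to strings."""
    return "".join(H[c] for c in w)

L_PATTERN = re.compile(r"01*|10*")      # the RE 01* + 10* for L
IMAGE_PATTERN = re.compile(r"(?:ab)*")  # the simplified RE (ab)* for h(L)

samples = ["0", "0111", "1", "10", "1000"]
```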
77
77 Summary of Regular Languages. Tian Cong (田聪)
78
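78 Regular languages
Deterministic finite automata (DFA), nondeterministic finite automata (NFA), nondeterministic finite automata with ε-transitions (ε-NFA), and regular expressions (RE).
L(RE) = L(ε-NFA) = L(NFA) = L(DFA) = the regular languages.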
79
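79 Properties of regular languages
Pumping lemma (a necessary but not sufficient condition): it can be used to prove that a particular language is not regular, but it cannot be used to prove that a particular language is regular.
Decision properties: whether the language accepted by an automaton is empty; whether a string w is accepted by a given automaton; whether two automata are equivalent.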
80
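80 Properties of regular languages
Closure properties: the regular languages are closed under union, intersection, complement, difference, reversal, (Kleene) closure, concatenation, and homomorphism.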