
Regular Expressions Prepared by Manuel E. Bermúdez, Ph.D. Associate Professor University of Florida Programming Language Translators.



2 Regular Expressions A compact, easy-to-read language description. Use operators to denote the language constructors described earlier, to build “complex” languages from simple “atomic” ones.

3 Regular Expressions
Definition: A regular expression over an alphabet Σ is recursively defined as follows:
1. ø denotes the language ø.
2. ε denotes the language {ε}.
3. a denotes the language {a}, for all a ∈ Σ.
4. (P + Q) denotes L(P) ∪ L(Q), where P and Q are r.e.’s.
5. (PQ) denotes L(P)·L(Q), where P and Q are r.e.’s.
6. P* denotes L(P)*, where P is an r.e.
To avoid excessive parentheses, we assume left associativity, with the following operator precedence, from most to least binding: *, ·, +.
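The six cases of the definition map directly onto a recursive data type. The sketch below is a minimal Python illustration, assuming hypothetical class names (`Empty`, `Eps`, `Sym`, `Alt`, `Cat`, `Star`) and a hypothetical helper `lang(r, n)` that lists the strings of L(r) of length at most n; since L(P*) is infinite in general, the enumeration is cut off at length n. Building trees directly sidesteps the precedence rules, which only matter when parsing written expressions.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Empty:            # ø
    pass

@dataclass(frozen=True)
class Eps:              # ε
    pass

@dataclass(frozen=True)
class Sym:              # a, for a in Σ
    a: str

@dataclass(frozen=True)
class Alt:              # (P + Q)
    p: RE
    q: RE

@dataclass(frozen=True)
class Cat:              # (PQ)
    p: RE
    q: RE

@dataclass(frozen=True)
class Star:             # P*
    p: RE

RE = Union[Empty, Eps, Sym, Alt, Cat, Star]

def lang(r: RE, n: int) -> frozenset:
    """The strings of L(r) whose length is at most n."""
    if isinstance(r, Empty):
        return frozenset()
    if isinstance(r, Eps):
        return frozenset({""})
    if isinstance(r, Sym):
        return frozenset({r.a}) if len(r.a) <= n else frozenset()
    if isinstance(r, Alt):                      # union
        return lang(r.p, n) | lang(r.q, n)
    if isinstance(r, Cat):                      # catenation, bounded by n
        left, right = lang(r.p, n), lang(r.q, n)
        return frozenset(x + y for x in left for y in right if len(x) + len(y) <= n)
    if isinstance(r, Star):                     # ε, P, PP, PPP, ... up to length n
        pieces = lang(r.p, n) - {""}
        result, frontier = {""}, {""}
        while frontier:
            frontier = {x + y for x in frontier for y in pieces if len(x) + len(y) <= n} - result
            result |= frontier
        return frozenset(result)
    raise TypeError(f"not a regular expression: {r!r}")

zero_one = Alt(Sym("0"), Sym("1"))
print(sorted(lang(Cat(Star(zero_one), Sym("1")), 2)))   # ['01', '1', '11']
```

Note that `lang(Star(Empty()), n)` yields only the empty string, matching ø* = {ε}.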

4 Regular Expressions
Examples:
(0 + 1)*: any string of 0’s and 1’s.
(0 + 1)*1: any string of 0’s and 1’s, ending with a 1.
1*01*: any string of 1’s with a single 0 inserted.
Letter (Letter + Digit)*: an identifier.
Digit Digit*: an integer.
Quote Char* Quote: a string. †
# Char* Eoln: a comment. †
{Char*}: another comment. †
† Assuming that Char does not contain quotes, eoln’s, or }.
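These examples can be tried with Python’s standard `re` module. Two notational differences matter: `re` writes union as `|` (in `re`, `+` means “one or more”), and `re.fullmatch` asks whether the whole string belongs to the language. The concrete character classes chosen below for Letter, Digit, Char, and Eoln, and the names `PATTERNS` and `in_language`, are assumptions for illustration.

```python
import re

# Assumptions: Letter and Digit are ASCII; Eoln is "\n"; Char is any character
# except a double quote, a newline, or "}" (following the footnote).
CHAR = r'[^"\n}]'

PATTERNS = {
    "(0 + 1)*1": r"(0|1)*1",
    "1*01*": r"1*01*",
    "identifier": r"[A-Za-z]([A-Za-z]|[0-9])*",   # Letter (Letter + Digit)*
    "integer": r"[0-9][0-9]*",                    # Digit Digit*
    "string": '"' + CHAR + '*"',                  # Quote Char* Quote
    "# comment": "#" + CHAR + r"*\n",             # # Char* Eoln
    "{ comment }": r"\{" + CHAR + r"*\}",         # {Char*}
}

def in_language(name, text):
    """fullmatch: the entire text must be in the language, not just a prefix."""
    return re.fullmatch(PATTERNS[name], text) is not None

print(in_language("(0 + 1)*1", "0101"), in_language("1*01*", "1001"))   # True False
```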

5 Regular Expressions
Conversion from right-linear grammars to regular expressions.
Example: S → aS, S → bR, S → ε, R → aS.
What does S → aS mean? L(S) ⊇ {a}·L(S).
S → bR means L(S) ⊇ {b}·L(R).
S → ε means L(S) ⊇ {ε}.

6 Regular Expressions
Together, they mean that L(S) = {a}·L(S) + {b}·L(R) + {ε}, or S = aS + bR + ε.
Similarly, R → aS means R = aS. Thus,
S = aS + bR + ε
R = aS
This is a system of simultaneous equations, in which the variables are nonterminals.

7 Regular Expressions
Solving systems of simultaneous equations:
S = aS + bR + ε
R = aS
Back-substitute R = aS: S = aS + baS + ε = (a + ba)S + ε.
Question: what do we do with equations of the form X = αX + β?

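8 Regular Expressions
Answer: β ⊆ L(X), so αβ ⊆ L(X), ααβ ⊆ L(X), αααβ ⊆ L(X), … Thus α*β ⊆ L(X). Since α*β itself satisfies the equation (α(α*β) + β = α*β) and is the smallest language that does, we take X = α*β. (This is Arden’s lemma; the solution is unique when ε ∉ L(α).)
In our case, S = (a + ba)S + ε = (a + ba)*ε = (a + ba)*.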
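A bounded sanity check of this result: enumerate every string of length at most 8 that the grammar S → aS | bR | ε, R → aS derives, and compare with the strings over {a, b} matched by (a + ba)*, written `(a|ba)*` in Python’s `re` syntax. The encoding of productions and the helper names `generate` and `regex_lang` are hypothetical, and agreement up to a fixed length is evidence, not a proof.

```python
import re
from itertools import product

# Right-linear grammar: nonterminal -> list of (terminal string, next nonterminal or None).
GRAMMAR = {
    "S": [("a", "S"), ("b", "R"), ("", None)],   # S -> aS | bR | ε
    "R": [("a", "S")],                           # R -> aS
}

def generate(grammar, start, n):
    """All strings of length <= n derivable from `start`."""
    result, seen, work = set(), set(), [("", start)]
    while work:
        item = work.pop()
        if item in seen:
            continue                  # already explored this (prefix, nonterminal)
        seen.add(item)
        prefix, nonterminal = item
        for terminals, nxt in grammar[nonterminal]:
            w = prefix + terminals
            if len(w) > n:
                continue              # too long: prune this derivation
            if nxt is None:
                result.add(w)         # derivation finished
            else:
                work.append((w, nxt))
    return result

def regex_lang(pattern, alphabet, n):
    """All strings over `alphabet` of length <= n that fully match `pattern`."""
    words = ("".join(t) for k in range(n + 1) for t in product(alphabet, repeat=k))
    return {w for w in words if re.fullmatch(pattern, w)}

print(generate(GRAMMAR, "S", 8) == regex_lang(r"(a|ba)*", "ab", 8))   # True
```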

9 Regular Expressions
Right-linear regular grammar ↓ regular expression
1. Write A = α₁ + α₂ + … + αₙ if A → α₁, A → α₂, …, A → αₙ are the productions for A.

10 Regular Expressions
2. If an equation has the form X = α, where X does not appear in α, replace every occurrence of X with α in all other equations, and delete the equation X = α.
If an equation has the form X = αX + β, where X occurs in neither α nor β, replace it with X = α*β.
Note: some algebraic manipulation may be needed to obtain the form X = αX + β.
Important: catenation is not commutative!
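Steps 1 and 2 can be mechanized. The sketch below uses a hypothetical `solve` function that assumes each equation is a list of terms (coefficient, variable): the coefficient is a regular expression in Python `re` syntax (`|` for +, the empty string for ε), and the variable is a nonterminal or `None` for a constant term. It eliminates the nonterminals one at a time, applying X = αX + β ⇒ X = α*β and back-substituting, always keeping coefficients to the left of variables because catenation is not commutative. The second system is the example on slides 11 and 12; `same_language` is a hypothetical bounded check, not a proof of equivalence.

```python
import re
from collections import defaultdict
from itertools import product

def alt(options):
    """Union of regex strings, parenthesized when there is more than one (P + P = P)."""
    opts = list(dict.fromkeys(options))
    return opts[0] if len(opts) == 1 else "(" + "|".join(opts) + ")"

def solve(equations, start):
    """Eliminate every nonterminal other than `start`; return a regex for L(start)."""
    eqs = {x: list(terms) for x, terms in equations.items()}
    order = [x for x in eqs if x != start] + [start]
    for x in order:
        grouped = defaultdict(list)                        # variable -> its coefficients
        for coef, var in eqs.pop(x):
            grouped[var].append(coef)
        loop = [c for c in grouped.pop(x, []) if c != ""]  # the α in X = αX + β
        star = "(" + "|".join(dict.fromkeys(loop)) + ")*" if loop else ""
        rhs = [(star + alt(cs), y) for y, cs in grouped.items()]   # X = α*β
        if x == start:
            return alt(c for c, _ in rhs) if rhs else None          # None: empty language
        for z, terms in eqs.items():                       # back-substitute X elsewhere
            eqs[z] = [(c + d, w) for c, y in terms
                      for d, w in (rhs if y == x else [("", y)])]

def same_language(r1, r2, alphabet, n):
    """Bounded check: do r1 and r2 agree on every string of length <= n?"""
    words = ("".join(t) for k in range(n + 1) for t in product(alphabet, repeat=k))
    return all((re.fullmatch(r1, w) is None) == (re.fullmatch(r2, w) is None) for w in words)

EX1 = {"S": [("a", "S"), ("b", "R"), ("", None)], "R": [("a", "S")]}
EX2 = {"S": [("a", None), ("b", "U"), ("b", "R")],
       "R": [("aba", "U"), ("", "U")],
       "U": [("a", "S"), ("b", None)]}

print(solve(EX1, "S"))   # (a|ba)*
print(solve(EX2, "S"))   # (ba|b(aba|)a)*(a|bb|b(aba|)b)
print(same_language(solve(EX2, "S"), "(ba|babaa)*(a|bb|babab)", "ab", 10))   # True
```

The solver does not distribute catenation over +, so its answer for the second system is written differently from the slides’ (ba + babaa)*(a + bb + babab), but it denotes the same language.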

11 Regular Expressions
Example: S → a, S → bU, S → bR, R → abaU, R → U, U → aS, U → b.
S = a + bU + bR
R = abaU + U = (aba + ε)U
U = aS + b
Back-substitute R:
S = a + bU + b(aba + ε)U
U = aS + b

12 Regular Expressions
Back-substitute U:
S = a + b(aS + b) + b(aba + ε)(aS + b)
  = a + baS + bb + babaaS + babab + baS + bb
  = (ba + babaa)S + (a + bb + babab)   (the repeated terms baS and bb are merged, since P + P = P)
Therefore S = (ba + babaa)*(a + bb + babab).

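13 Regular Expressions
Summarizing: [Figure: conversions among RG_R (right-linear grammars), RG_L (left-linear grammars), RE, NFA, DFA, and minimum DFA, with some arrows marked “Done” and others marked “Soon”.]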

14 Regular Expressions
Regular Expression ↓ NFA
Recursively build the FSA, mimicking the structure of the regular expression. Each FSA built has one start state and one final state.
Conversions:
if ø: a start state 1 and a final state 2, with no transitions.

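15 Regular Expressions
if ε: a single ε-transition from state 1 to state 2.
if a: a single transition on a from state 1 to state 2.
if P + Q: new states 1 and 2, with ε-transitions from 1 to the start states of P and Q, and from the final states of P and Q to 2.
if P·Q: an ε-transition from the final state of P to the start state of Q; or, alternatively, new states 1 and 2, with ε-transitions from 1 to P, from P to Q, and from Q to 2.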

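16 Regular Expressions
if P*: new states 1 and 2, connected to P and to each other by ε-transitions, so that P can be skipped or repeated any number of times.
Example: (b (aba + ε) a)*
[Figure: the first step builds one-transition machines for individual symbols, with states 1 to 6.]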

17 Regular Expressions
(b (aba + ε) a)*, continued. [Figure: the subexpression aba is built by chaining the machines for a, b, and a with ε-transitions.]

18 Regular Expressions
(b (aba + ε) a)*, continued. [Figure: the machine for aba + ε is built and joined to the machine for b.]

19 Regular Expressions
(b (aba + ε) a)*, continued. [Figure: the machine for the final a is attached.]

20 Regular Expressions
(b (aba + ε) a)*, completed. [Figure: the complete NFA, with states numbered 1 to 15.]
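The bottom-up construction of slides 14 to 20 (commonly known as Thompson’s construction) can be sketched recursively. The tuple encoding of regular expressions and the names `thompson` and `accepts` are assumptions for illustration; ε-transitions are labeled `None`. For P* the sketch uses the usual four ε-transitions (1 to P, P to 2, 1 to 2, and P’s final state back to P’s start state), which may differ in detail from the slides’ drawing, and its state numbers will not match the figures.

```python
from itertools import count

EPS = None  # label of an ε-transition

# Hypothetical encoding: ("empty",), ("eps",), ("sym", "a"),
# ("alt", P, Q), ("cat", P, Q), ("star", P)
def thompson(r, fresh=None, edges=None):
    """Return (start, final, edges) for r; each machine has one start and one final state."""
    fresh = count(1) if fresh is None else fresh
    edges = [] if edges is None else edges
    kind = r[0]
    if kind == "cat":                          # P·Q: P's final --ε--> Q's start
        ps, pf, _ = thompson(r[1], fresh, edges)
        qs, qf, _ = thompson(r[2], fresh, edges)
        edges.append((pf, EPS, qs))
        return ps, qf, edges
    s, f = next(fresh), next(fresh)            # new states "1" and "2"
    if kind == "eps":
        edges.append((s, EPS, f))
    elif kind == "sym":
        edges.append((s, r[1], f))
    elif kind == "alt":                        # P + Q
        for sub in r[1:]:
            ps, pf, _ = thompson(sub, fresh, edges)
            edges += [(s, EPS, ps), (pf, EPS, f)]
    elif kind == "star":                       # P*: enter, leave, skip, repeat
        ps, pf, _ = thompson(r[1], fresh, edges)
        edges += [(s, EPS, ps), (pf, EPS, f), (s, EPS, f), (pf, EPS, ps)]
    elif kind != "empty":                      # ø: no transitions at all
        raise ValueError(f"unknown node {kind!r}")
    return s, f, edges

def eps_closure(states, edges):
    closure, stack = set(states), list(states)
    while stack:
        p = stack.pop()
        for q, label, t in edges:
            if q == p and label is EPS and t not in closure:
                closure.add(t)
                stack.append(t)
    return closure

def accepts(nfa, w):
    start, final, edges = nfa
    current = eps_closure({start}, edges)
    for c in w:
        current = eps_closure({t for q, label, t in edges if q in current and label == c}, edges)
    return final in current

b, a = ("sym", "b"), ("sym", "a")
aba = ("cat", ("cat", a, b), a)
example = ("star", ("cat", ("cat", b, ("alt", aba, ("eps",))), a))   # (b (aba + ε) a)*
nfa = thompson(example)
print([w for w in ["", "ba", "baba", "babaa", "bab", "ab"] if accepts(nfa, w)])
# ['', 'ba', 'baba', 'babaa']
```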

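21 Regular Expressions
Regular Expression ↓ NFA, Algorithm 2
Start with a two-state machine: a start state and a final state, joined by a single edge labeled with the entire regular expression E.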

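22 Regular Expressions
Apply these rules to the edges:
a*: replace the edge by an ε-edge into a new state, a loop labeled a on that state, and an ε-edge out of it.
a + b: replace the edge by two parallel edges, labeled a and b.
ab: replace the edge by an edge labeled a into a new state, followed by an edge labeled b.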
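The top-down rules can be sketched as an edge-splitting loop: start from the single edge 1 --E--> 2 and keep replacing compound labels until only ε (`None`) and single symbols remain. The tuple encoding and the names `top_down` and `accepts` are assumptions for illustration. The star rule follows the slide’s picture (ε into a fresh state, a loop on it, ε out); the ø case, not shown on the slide, simply drops the edge.

```python
from itertools import count

EPS = None  # label of an ε-transition

def top_down(r):
    """Algorithm 2 sketch: split the edge 1 --r--> 2 until every label is ε or a symbol."""
    fresh = count(3)
    pending, edges = [(1, r, 2)], []
    while pending:
        p, e, q = pending.pop()
        kind = e[0]
        if kind == "eps":
            edges.append((p, EPS, q))
        elif kind == "sym":
            edges.append((p, e[1], q))
        elif kind == "alt":                # P + Q: two parallel edges
            pending += [(p, e[1], q), (p, e[2], q)]
        elif kind == "cat":                # PQ: P into a new state, then Q
            m = next(fresh)
            pending += [(p, e[1], m), (m, e[2], q)]
        elif kind == "star":               # P*: ε into a new state, P-loop on it, ε out
            m = next(fresh)
            edges += [(p, EPS, m), (m, EPS, q)]
            pending.append((m, e[1], m))
        elif kind != "empty":              # ø: the edge simply disappears
            raise ValueError(f"unknown node {kind!r}")
    return 1, 2, edges

def accepts(nfa, w):
    start, final, edges = nfa
    def closure(states):
        seen, stack = set(states), list(states)
        while stack:
            p = stack.pop()
            for q, label, t in edges:
                if q == p and label is EPS and t not in seen:
                    seen.add(t)
                    stack.append(t)
        return seen
    current = closure({start})
    for c in w:
        current = closure({t for q, label, t in edges if q in current and label == c})
    return final in current

a, b = ("sym", "a"), ("sym", "b")
ab_star = ("star", ("alt", a, b))                                                    # (a + b)*
ex24 = ("cat", ("cat", ab_star, ("alt", ("cat", a, a), ("cat", b, b))), ab_star)   # (a + b)*(aa + bb)(a + b)*
ex25 = ("cat", ("cat", ("cat", b, a), ab_star), ("cat", a, b))                      # ba(a + b)*ab
nfa24 = top_down(ex24)
print([w for w in ["abba", "aa", "abab", ""] if accepts(nfa24, w)])   # ['abba', 'aa']
```

Giving each starred subexpression its own fresh loop state keeps loops from different stars from interfering with one another.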

23 Regular Expressions
Algorithm 1: builds the FSA bottom up. Good for machines, bad for humans.
Algorithm 2: builds the FSA top down. Bad for machines, good for humans.
(Arguable.)

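24 Regular Expressions
Example (Algorithm 2): (a + b)*(aa + bb)(a + b)*
[Figure: successive expansions of the edges, through intermediate labels (a + b)*, aa + bb, aa, bb, and a + b, until only ε, a, and b edges remain.]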

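25 Regular Expressions
Example (Algorithm 2): ba(a + b)*ab
[Figure: resulting NFA: edges b and a, an ε-edge into a state with loops labeled a and b, an ε-edge out, then edges a and b to the final state.]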

26 Regular Expressions
Deterministic Finite-State Automata (DFA’s)
Definition: A deterministic FSA is defined just like an NFA, except that δ: Q × Σ → Q, rather than δ: Q × (Σ ∪ {ε}) → 2^Q.
Thus, both an ε-transition and two transitions on the same symbol a leaving one state are impossible.
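Because δ is a function Q × Σ → Q, running a DFA needs no set bookkeeping: one current state and one lookup per input symbol. A minimal sketch, assuming δ is stored as a Python dict keyed by (state, symbol), so each key has exactly one target and there is no ε key; a missing key is treated as rejection. The two-state DFA for (0 + 1)*1, with hypothetical states "A" and "B", is for illustration only.

```python
def run_dfa(delta, start, finals, w):
    """Accept w iff the unique path from start, reading w, ends in a final state."""
    state = start
    for c in w:
        if (state, c) not in delta:
            return False               # undefined transition: reject
        state = delta[(state, c)]      # exactly one next state; c is consumed
    return state in finals

# Hypothetical DFA for (0 + 1)*1: state B means "the last symbol read was 1".
DELTA = {("A", "0"): "A", ("A", "1"): "B",
         ("B", "0"): "A", ("B", "1"): "B"}

print([w for w in ["1", "0101", "10", ""] if run_dfa(DELTA, "A", {"B"}, w)])   # ['1', '0101']
```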

27 Regular Expressions
Every transition of a DFA consumes a symbol. Fortunately, DFA’s are just as powerful as NFA’s.
Theorem: For every NFA there exists an equivalent DFA, i.e., one accepting the same language.

28 Regular Expressions
Conversion from NFA’s to DFA’s: “simulate” all moves of the NFA with the DFA.
The start state of the DFA is the start state of the NFA (say, S), together with all states that are ε-reachable from S.
Each state in the DFA is a subset of the set of states of the NFA, capturing the notion of being in “any one of” a number of states.
New DFA states are constructed, beginning with the start state, by computing for each DFA state and each input symbol the set of NFA states reachable on that symbol, together with all states ε-reachable from them.
The final states of the DFA are those that contain any final state of the NFA.

29 Regular Expressions
Example: a*b + ba*
[Figure: NFA for a*b + ba*, with states 1 to 6.]

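30 Regular Expressions
DFA (--- means no transition):
State {1,2,3}: on a → {2,3}; on b → {4,5,6}
State {2,3}: on a → {2,3}; on b → {6}
State {4,5,6}: on a → {5,6}; on b → ---
State {6}: on a → ---; on b → ---
State {5,6}: on a → {5,6}; on b → ---
[Figure: state diagram of this DFA.]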
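The subset construction can be sketched as follows. The NFA transition table is an assumption: the slide’s NFA survives only as a figure, so this encoding (1 --ε--> 2, 2 --a--> 2, 2 --ε--> 3, 3 --b--> 6, 1 --b--> 4, 4 --ε--> 5, 5 --a--> 5, 5 --ε--> 6, with 6 final) was chosen to agree with the DFA table above. The function names are hypothetical; DFA states are Python frozensets of NFA states.

```python
EPS = None  # label of an ε-transition

def eps_closure(states, delta):
    """All NFA states reachable from `states` using ε-transitions only."""
    closure, stack = set(states), list(states)
    while stack:
        for t in delta.get((stack.pop(), EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def subset_construction(delta, start, finals, alphabet):
    """delta: (state, symbol or EPS) -> set of states. Returns a DFA over frozensets."""
    d_start = eps_closure({start}, delta)
    d_delta, seen, todo = {}, {d_start}, [d_start]
    while todo:
        S = todo.pop()
        for a in alphabet:
            moved = {t for s in S for t in delta.get((s, a), ())}
            if not moved:
                continue                         # no transition ("---" in the table)
            T = eps_closure(moved, delta)
            d_delta[(S, a)] = T
            if T not in seen:
                seen.add(T)
                todo.append(T)
    d_finals = {S for S in seen if S & finals}   # contains some NFA final state
    return d_start, d_delta, d_finals, seen

# Assumed NFA for a*b + ba*, reconstructed to agree with the DFA table.
NFA = {(1, EPS): {2}, (2, "a"): {2}, (2, EPS): {3}, (3, "b"): {6},
       (1, "b"): {4}, (4, EPS): {5}, (5, "a"): {5}, (5, EPS): {6}}

start, d_delta, d_finals, d_states = subset_construction(NFA, 1, {6}, "ab")
for (S, a), T in sorted(d_delta.items(), key=lambda kv: (sorted(kv[0][0]), kv[0][1])):
    print(sorted(S), a, "->", sorted(T))
```

Each printed line is one defined DFA transition; together they reproduce the rows of the table.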

31 Regular Expressions
In general, if the NFA has N states, the DFA can have as many as 2^N states.
Example: ba(a + b)*ab
[Figure: NFA for ba(a + b)*ab, with states 0 to 11.]

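32 Regular Expressions
DFA (--- means no transition):
State {0}: on a → ---; on b → {1}
State {1}: on a → {2,3,4,6,8,9}; on b → ---
State {2,3,4,6,8,9}: on a → {3,4,5,6,8,9,10}; on b → {3,4,6,7,8,9}
State {3,4,5,6,8,9,10}: on a → {3,4,5,6,8,9,10}; on b → {3,4,6,7,8,9,11}
State {3,4,6,7,8,9}: on a → {3,4,5,6,8,9,10}; on b → {3,4,6,7,8,9}
State {3,4,6,7,8,9,11}: on a → {3,4,5,6,8,9,10}; on b → {3,4,6,7,8,9}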

33 Regular Expressions
[Figure: state diagram of the DFA above, with states {0}, {1}, {2,3,4,6,8,9}, {3,4,6,7,8,9}, {3,4,5,6,8,9,10}, and {3,4,6,7,8,9,11}.]
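The example above stays small, but the exponential bound on slide 31 is not just theoretical. A standard family of examples (not from these slides) is the language of strings over {0, 1} whose n-th symbol from the end is 1: an NFA with n + 1 states recognizes it, yet the subset construction reaches 2^n distinct DFA states. The sketch below, with hypothetical function names, counts them; this NFA has no ε-transitions, so no closure step is needed.

```python
def nth_from_end_nfa(n):
    """NFA for 'the n-th symbol from the end is 1': states 0..n, start 0, final n."""
    delta = {(0, "0"): {0}, (0, "1"): {0, 1}}   # state 0 guesses where that 1 is
    for i in range(1, n):
        delta[(i, "0")] = {i + 1}
        delta[(i, "1")] = {i + 1}
    return delta

def reachable_subsets(delta, start, alphabet):
    """DFA states produced by the subset construction (no ε-transitions here)."""
    first = frozenset({start})
    seen, todo = {first}, [first]
    while todo:
        S = todo.pop()
        for a in alphabet:
            T = frozenset(t for s in S for t in delta.get((s, a), ()))
            if T not in seen:
                seen.add(T)
                todo.append(T)
    return seen

for n in range(1, 7):
    count = len(reachable_subsets(nth_from_end_nfa(n), 0, "01"))
    print(n + 1, "NFA states ->", count, "DFA states")
```

After reading a word, NFA state i (for 1 ≤ i ≤ n) is active exactly when the i-th symbol from the end is 1, so every combination of states 1 to n, always together with state 0, is reachable.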

34 Regular Expressions State Minimization Theorem: Given a DFA M, there exists an equivalent DFA M’ that is minimal, i.e. no other equivalent DFA exists with fewer states than M’. Definition: A partition of a set S is a set of subsets of S such that every element of S appears in exactly one of the subsets.

35 Regular Expressions
Example: S = {1, 2, 3, 4, 5}
Π1 = { {1, 2, 3, 4}, {5} }
Π2 = { {1, 2, 3}, {4}, {5} }
Π3 = { {1, 3}, {2}, {4}, {5} }
Note: Π2 is a refinement of Π1, and Π3 is a refinement of Π2.
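Both notions are easy to check mechanically. In the sketch below (hypothetical helper names), a partition is a list of Python sets, and Π′ refines Π when every block of Π′ lies inside some block of Π.

```python
def is_partition(blocks, universe):
    """Every element of `universe` appears in exactly one block, and nothing else appears."""
    elements = [x for block in blocks for x in block]
    return len(elements) == len(set(elements)) and set(elements) == set(universe)

def refines(finer, coarser):
    """True if every block of `finer` is a subset of some block of `coarser`."""
    return all(any(b <= c for c in coarser) for b in finer)

S = {1, 2, 3, 4, 5}
P1 = [{1, 2, 3, 4}, {5}]
P2 = [{1, 2, 3}, {4}, {5}]
P3 = [{1, 3}, {2}, {4}, {5}]
print(all(is_partition(p, S) for p in (P1, P2, P3)))      # True
print(refines(P2, P1), refines(P3, P2), refines(P1, P2))  # True True False
```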

36 Regular Expressions
Minimization Algorithm:
1. Remove all undefined transitions by introducing a TRAP state, i.e., a state from which no final state is reachable: every undefined transition goes to TRAP, and TRAP goes to itself on every symbol.
2. Partition all states into two groups (final states and non-final states).
3. Complete the “Next State” table for each group, by specifying transitions from group to group. Form the next partition: split groups in which Next State table entries differ. Repeat step 3 until no further splitting is possible.
4. Determine the start and final states: the start state is the group containing the original start state, and the final states are the groups containing original final states.

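37 Regular Expressions
Example: Π0 = { {1, 2, 3, 4}, {5} }
State 1: on a → {1,2,3,4}; on b → {1,2,3,4}
State 2: on a → {1,2,3,4}; on b → {1,2,3,4}
State 3: on a → {1,2,3,4}; on b → {1,2,3,4}
State 4: on a → {1,2,3,4}; on b → {5}
State 5: on a → {1,2,3,4}; on b → {1,2,3,4}
[Figure: the DFA to be minimized, with states 1 to 5.]
Split {4} from the group {1,2,3,4}.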

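38 Regular Expressions
Π1 = { {1, 2, 3}, {4}, {5} }
State 1: on a → {1,2,3}; on b → {1,2,3}
State 2: on a → {1,2,3}; on b → {4}
State 3: on a → {1,2,3}; on b → {1,2,3}
State 4: on a → {1,2,3}; on b → {5}
State 5: on a → {1,2,3}; on b → {1,2,3}
Split {2} from the group {1,2,3}.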

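39 Regular Expressions
Π2 = { {1, 3}, {2}, {4}, {5} }
State 1: on a → {2}; on b → {1,3}
State 3: on a → {2}; on b → {1,3}
State 2: on a → {2}; on b → {4}
State 4: on a → {2}; on b → {5}
State 5: on a → {2}; on b → {1,3}
No more splitting.
Minimal DFA: states {1,3}, {2}, {4}, {5}, with transitions {1,3} on a → {2}, {1,3} on b → {1,3}, {2} on a → {2}, {2} on b → {4}, {4} on a → {2}, {4} on b → {5}, {5} on a → {2}, {5} on b → {1,3}.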
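Steps 1 to 4 can be sketched as partition refinement in Python. Assumptions: the helper names `add_trap` and `minimize` are hypothetical; the example DFA’s exact transitions are reconstructed from the tables above, which show only target groups, so the choices 1 --b--> 3, 3 --b--> 3, and 5 --b--> 3 inside the group {1, 3} are assumed, as is 5 being the only final state. Unlike the slides, which split one group at a time, the sketch re-splits every group in each round; both approaches stop at the same final partition.

```python
def add_trap(states, alphabet, delta, trap="TRAP"):
    """Step 1: route every undefined transition to a new non-final TRAP state."""
    full = dict(delta)
    missing = [(s, a) for s in states for a in alphabet if (s, a) not in delta]
    if not missing:
        return list(states), full
    for s, a in missing:
        full[(s, a)] = trap
    for a in alphabet:
        full[(trap, a)] = trap
    return list(states) + [trap], full

def minimize(states, alphabet, delta, start, finals):
    """Steps 2 to 4 on a DFA whose delta is total."""
    finals = set(finals)
    partition = [b for b in (finals, set(states) - finals) if b]       # step 2
    while True:                                                         # step 3
        block_of = {s: i for i, b in enumerate(partition) for s in b}
        refined = []
        for b in partition:
            rows = {}   # "Next State" row -> the states of b having that row
            for s in b:
                row = tuple(block_of[delta[(s, a)]] for a in alphabet)
                rows.setdefault(row, set()).add(s)
            refined.extend(rows.values())
        if len(refined) == len(partition):    # no group was split
            break
        partition = refined
    blocks = [frozenset(b) for b in partition]                          # step 4
    block = {s: B for B in blocks for s in B}
    m_delta = {(B, a): block[delta[(next(iter(B)), a)]] for B in blocks for a in alphabet}
    return blocks, block[start], m_delta, {B for B in blocks if B & finals}

DELTA = {(1, "a"): 2, (1, "b"): 3, (2, "a"): 2, (2, "b"): 4, (3, "a"): 2,
         (3, "b"): 3, (4, "a"): 2, (4, "b"): 5, (5, "a"): 2, (5, "b"): 3}
states, delta = add_trap([1, 2, 3, 4, 5], "ab", DELTA)   # already total: no TRAP added
blocks, m_start, m_delta, m_finals = minimize(states, "ab", delta, 1, {5})
print(sorted(sorted(B) for B in blocks))   # [[1, 3], [2], [4], [5]]
```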

40 Regular Expressions
Summary of Regular Languages
Smallest class in the Chomsky hierarchy. Appropriate for lexical analysis.
Four representations: right-linear grammars (RG_R), left-linear grammars (RG_L), regular expressions (RE), and FSA’s. All four are equivalent; there are algorithms to perform transformations among them.
Each representation has advantages and disadvantages for the language designer, the implementor, and the user.
FSA’s can be made deterministic and minimal.

