Regular Expressions 15-211 Fundamental Data Structures and Algorithms Peter Lee March 13, 2003.

1 Regular Expressions 15-211 Fundamental Data Structures and Algorithms Peter Lee March 13, 2003

3 Announcements
• Homework #4 is due on Monday!
  ◦ Monday, March 17, 11:59pm
• Reading:
  ◦ Handout (from last time)

4 Recap: FSMs

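5 Finite State Machines (FSMs)
[Diagram: input string → M → {Yes, No}]
$M = (\Sigma, S, q_0, F, \delta)$, where
• $\Sigma$ is the input alphabet
• $S$ is the state set
• $q_0$ is the initial state
• $F$ is the set of final states
• $\delta$ is the transition function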

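6 Transition functions
• A deterministic finite automaton (DFA) has a transition function $\delta: S \times \Sigma \to S$.
• We can extend $\delta$ to strings, $\delta': S \times \Sigma^* \to S$. Inductively:
  $\delta'(q, \varepsilon) = q$
  $\delta'(q, aw) = \delta'(\delta(q, a), w)$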

7 DFA example
• Which strings of a's and b's are accepted?
• Transition function:
  $\{(q_0,a) \mapsto q_1,\ (q_0,b) \mapsto q_0,\ (q_1,a) \mapsto q_2,\ (q_1,b) \mapsto q_1,\ (q_2,a) \mapsto q_2,\ (q_2,b) \mapsto q_2\}$
[Diagram: states 0, 1, 2 with the transitions above]
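To make the inductive definition of $\delta'$ concrete, here is a minimal Python sketch of this DFA. The slide text does not say which state is final, so the sketch assumes $F = \{q_2\}$; the names `DELTA`, `delta_star`, and `accepts` are illustrative, not from the course.

```python
# DFA from the example: states 0, 1, 2 over the alphabet {a, b}.
DELTA = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 2, (1, "b"): 1,
    (2, "a"): 2, (2, "b"): 2,
}
START = 0
FINAL = {2}  # assumption: the final state is not recoverable from the slide text


def delta_star(q, w):
    """Extended transition function, following the inductive definition:
    delta'(q, "") = q and delta'(q, a + w) = delta'(delta(q, a), w)."""
    if w == "":
        return q
    return delta_star(DELTA[(q, w[0])], w[1:])


def accepts(w):
    """w is in L(M) iff delta'(q0, w) is a final state."""
    return delta_star(START, w) in FINAL


for w in ["", "b", "ab", "aa", "bab", "abba"]:
    print(repr(w), accepts(w))
```

Under that assumption the machine accepts exactly the strings with at least two a's: the first a moves $q_0$ to $q_1$, the second moves $q_1$ to $q_2$, and $q_2$ loops on both symbols. The recursion mirrors the definition; for long inputs a plain loop over the string is the more idiomatic Python choice.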

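8 Nondeterministic FSMs (NFAs)
• NFAs can transition to more than one state on any input: $\delta: S \times \Sigma \to \mathcal{P}(S)$
• As before, we can extend $\delta$ to strings, $\delta': S \times \Sigma^* \to \mathcal{P}(S)$. Inductively:
  $\delta'(q, \varepsilon) = \{q\}$
  $\delta'(q, aw) = \bigcup_{p \in \delta(q, a)} \delta'(p, w)$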

9 NFA example
[Diagram: states 0 and 1 with the transitions below]
Transition function:
$\{(q_0,a) \mapsto \{q_0,q_1\},\ (q_0,b) \mapsto \{q_1\},\ (q_1,a) \mapsto \emptyset,\ (q_1,b) \mapsto \{q_0,q_1\}\}$
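The same idea for the NFA, as a hedged Python sketch: `delta_star` (an illustrative name) returns a set of states and takes the union over every possible next state, exactly as in the inductive definition of $\delta'$ for NFAs. The slide gives no final states, so the sketch only computes which states the NFA can be in.

```python
# NFA example: delta maps (state, symbol) to a *set* of next states.
DELTA = {
    (0, "a"): {0, 1}, (0, "b"): {1},
    (1, "a"): set(), (1, "b"): {0, 1},
}


def delta_star(q, w):
    """Extended NFA transition function:
    delta'(q, "") = {q}
    delta'(q, a + w) = union of delta'(p, w) over all p in delta(q, a)."""
    if w == "":
        return {q}
    result = set()
    for p in DELTA[(q, w[0])]:
        result |= delta_star(p, w[1:])
    return result


print(delta_star(0, "a"))   # {0, 1}
print(delta_star(0, "ba"))  # set(): from q1 there is no move on a
print(delta_star(0, "bb"))  # {0, 1}
```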

10 Questions
1. Are there languages L that can be accepted by NFAs but not by DFAs? No! Today: the proof.
2. What practical uses are there for FSMs? After the proof…

11 The Idea
• An NFA can be in more than one state at a time.
• Define a DFA whose states correspond to all combinations of the NFA states.

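12 Another handy extension
• Extend $\delta: S \times \Sigma \to \mathcal{P}(S)$ to $\delta': S \times \Sigma^* \to \mathcal{P}(S)$, and then to $\delta'': \mathcal{P}(S) \times \Sigma^* \to \mathcal{P}(S)$:
  $\delta''(\{q_1, \ldots, q_n\}, w) = \bigcup_{1 \le i \le n} \delta'(q_i, w)$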

13 NFA into a DFA example
NFA: [Diagram: the NFA from slide 9]
• In the DFA, construct these states: $S' = \{[\,], [q_0], [q_1], [q_0,q_1]\}$
• Each state in the DFA represents a set of states in the NFA.

14 NFA into a DFA example
DFA: $S' = \{[\,], [q_0], [q_1], [q_0,q_1]\}$. What is $\delta$ for the DFA?
  $\delta([\,],a) = [\,]$ and $\delta([\,],b) = [\,]$
  $\delta([q_0],a) = [q_0,q_1]$
  $\delta([q_0],b) = [q_1]$
  $\delta([q_1],a) = [\,]$
  $\delta([q_1],b) = [q_0,q_1]$
  $\delta([q_0,q_1],a) = [q_0,q_1]$
  $\delta([q_0,q_1],b) = [q_0,q_1]$
[Diagram: the resulting DFA]
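The construction above can be sketched in Python as follows (helper names are illustrative). Each DFA state is a `frozenset` of NFA states, and `step` computes $\delta''$ for a single symbol by taking the union of the NFA moves; printing the table reproduces the eight entries on the slide.

```python
from itertools import chain, combinations

# The NFA from the example: (state, symbol) -> set of next states.
NFA_DELTA = {
    (0, "a"): {0, 1}, (0, "b"): {1},
    (1, "a"): set(), (1, "b"): {0, 1},
}
NFA_STATES = [0, 1]
SIGMA = ["a", "b"]


def subsets(states):
    """Every subset of the NFA states; each one becomes a DFA state."""
    return [frozenset(c) for r in range(len(states) + 1)
            for c in combinations(states, r)]


def step(P, a):
    """delta''(P, a): the union of delta(q, a) over all q in P."""
    return frozenset(chain.from_iterable(NFA_DELTA[(q, a)] for q in P))


def subset_construction():
    """DFA transition table over all subsets, as on the slide."""
    return {(P, a): step(P, a) for P in subsets(NFA_STATES) for a in SIGMA}


def show(P):
    """Format a DFA state the way the slide does, e.g. [q0,q1]."""
    return "[" + ",".join(f"q{q}" for q in sorted(P)) + "]"


DFA_DELTA = subset_construction()
for (P, a), Q in DFA_DELTA.items():
    print(f"delta({show(P)},{a}) = {show(Q)}")
```

Like the slide, this builds a DFA state for every one of the $2^{|S|}$ subsets. A common refinement is to start from $[q_0]$ and build only the subsets that are actually reached, which can be far fewer for larger NFAs.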

15 The theorem
• Thm: Let L be a language accepted by an NFA. Then there exists a DFA that also accepts L.
• Proof:
  ◦ Let's use the construction shown on the previous slides. We must prove that the DFA accepts the same language as the NFA.

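16 The proof
• More formally:
  ◦ Let $M = (\Sigma, S, q_0, F, \delta)$ be the NFA, and
  ◦ let $M' = (\Sigma, S', q_0', F', \delta')$ be the DFA.
• We want to prove that, for any input string $w$,
  $\delta'(q_0', w) = [q_i, q_j, \ldots, q_k]$ iff $\delta(q_0, w) = \{q_i, q_j, \ldots, q_k\}$,
  where both transition functions are extended to strings.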

17 By induction (of course!)
• Base case:
  ◦ Trivial for the empty input string.
• Induction hypothesis:
  ◦ Assume the claim is true for all input strings of length n or less.

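18 By induction…
• Let $wa$ be a string of length n+1. Then
  $\delta'(q_0', wa) = \delta'(\delta'(q_0', w), a)$
• By the IH,
  $\delta'(q_0', w) = [q_i, q_j, \ldots, q_k]$ iff $\delta(q_0, w) = \{q_i, q_j, \ldots, q_k\}$
• And by the definition of $\delta'$,
  $\delta'([q_i, q_j, \ldots, q_k], a) = [q_a, q_b, \ldots, q_c]$ iff $\delta(\{q_i, q_j, \ldots, q_k\}, a) = \{q_a, q_b, \ldots, q_c\}$
• Thus,
  $\delta'(q_0', wa) = [q_a, q_b, \ldots, q_c]$ iff $\delta(q_0, wa) = \{q_a, q_b, \ldots, q_c\}$ ∎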
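The induction proves the claim for every string. As a sanity check (not a substitute for the proof), the Python sketch below compares both sides of the claim on the running example for every string over {a, b} up to a fixed length. It assumes the DFA's start state is $q_0' = [q_0]$, as in the construction; the function names are illustrative.

```python
from itertools import product

# NFA from the running example.
NFA_DELTA = {
    (0, "a"): {0, 1}, (0, "b"): {1},
    (1, "a"): set(), (1, "b"): {0, 1},
}


def nfa_delta_star(q, w):
    """delta'(q, w) for the NFA, by the inductive definition."""
    if w == "":
        return {q}
    return set().union(*(nfa_delta_star(p, w[1:]) for p in NFA_DELTA[(q, w[0])]))


def dfa_delta_star(P, w):
    """Run the subset-construction DFA one symbol at a time from DFA state P."""
    for a in w:
        P = frozenset(p for q in P for p in NFA_DELTA[(q, a)])
    return P


def check(max_len):
    """Compare delta'(q0', w) with delta(q0, w) for every string over {a, b}
    of length <= max_len. Returns a counterexample, or None if none is found."""
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            w = "".join(letters)
            if dfa_delta_star(frozenset({0}), w) != frozenset(nfa_delta_star(0, w)):
                return w
    return None


print(check(8))  # None: no counterexample up to length 8
```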

19 Regular Languages

20 Regular languages
• The language accepted by M: $L(M) = \{w \mid \delta'(q_0, w) \in F\}$
• We can also say:
  ◦ the language recognized by M
  ◦ the language decided by M
• When M is an FSM, we say that the language is regular.

21 Another question
• Is the complement of a regular language also regular? $L' = \Sigma^* - L$
  ◦ Hint 1: Is there a way to construct a complement machine?
  ◦ Hint 2: Consider the final states…
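Following Hint 2, one answer is to keep the machine and swap its final and non-final states. A minimal Python sketch, assuming a DFA whose transition function is total (every state has a move on every symbol); otherwise a missing move would reject in both the original and the swapped machine. Swapping final states does not complement the language of an NFA in general, which is one reason to convert to a DFA first. The example machine is the slide 7 DFA with $F = \{q_2\}$ assumed, and all names are illustrative.

```python
def complement(states, sigma, delta, start, final):
    """Complement a DFA by swapping final and non-final states.
    Only valid when delta is total: every (state, symbol) pair has a move."""
    for q in states:
        for a in sigma:
            if (q, a) not in delta:
                raise ValueError(f"delta is not total: no move for ({q!r}, {a!r})")
    return states, sigma, delta, start, set(states) - set(final)


def accepts(dfa, w):
    """Run the DFA on w and report whether it ends in a final state."""
    states, sigma, delta, start, final = dfa
    q = start
    for a in w:
        q = delta[(q, a)]
    return q in final


# Hypothetical example: the slide 7 DFA, assuming q2 is its only final state.
M = ({0, 1, 2}, {"a", "b"},
     {(0, "a"): 1, (0, "b"): 0, (1, "a"): 2, (1, "b"): 1, (2, "a"): 2, (2, "b"): 2},
     0, {2})
M_bar = complement(*M)
print(accepts(M, "aba"), accepts(M_bar, "aba"))  # True False
print(accepts(M, "bab"), accepts(M_bar, "bab"))  # False True
```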

22 Closure properties
• What about union?
• Intersection?
• Product?
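For union and intersection, one standard answer is the product construction: run two DFAs side by side, with states that are pairs $(p, q)$, and choose which pairs are final. This Python sketch assumes total DFAs over the same alphabet; the two example machines (an even number of a's, and ending in b) are hypothetical, as are the function names.

```python
from itertools import product


def product_dfa(m1, m2, sigma, combine):
    """Product construction: run two total DFAs in lockstep on pairs of states.
    combine(in1, in2) decides whether a pair (p, q) is final."""
    states1, delta1, start1, final1 = m1
    states2, delta2, start2, final2 = m2
    states = set(product(states1, states2))
    delta = {((p, q), a): (delta1[(p, a)], delta2[(q, a)])
             for (p, q) in states for a in sigma}
    final = {(p, q) for (p, q) in states if combine(p in final1, q in final2)}
    return states, delta, (start1, start2), final


def accepts(m, w):
    """Run a DFA given as (states, delta, start, final) on w."""
    states, delta, q, final = m
    for a in w:
        q = delta[(q, a)]
    return q in final


SIGMA = ["a", "b"]
# Two small hypothetical DFAs over {a, b}.
EVEN_AS = ({0, 1}, {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}, 0, {0})
ENDS_B = ({0, 1}, {(0, "a"): 0, (1, "a"): 0, (0, "b"): 1, (1, "b"): 1}, 0, {1})

both = product_dfa(EVEN_AS, ENDS_B, SIGMA, lambda x, y: x and y)   # intersection
either = product_dfa(EVEN_AS, ENDS_B, SIGMA, lambda x, y: x or y)  # union
print(accepts(both, "aab"), accepts(both, "ab"))    # True False
print(accepts(either, "ab"), accepts(either, "a"))  # True False
```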

23 A Digression

24 Cheating vs Collaboration

25 A scenario
• Alice and Bob are excellent students.
• There is virtually no doubt that they can easily do "A" work in 15-211.
• But even so, 15-211 is a lot of work.
• And the time required might be better spent in another course, which is harder, and possibly more important.

26 A scenario, cont'd
• So, to save time, Alice and Bob decide to work together on the 15-211 homeworks.
• They work together and hand in essentially the same programs.
• Alice writes a comment into her version of the code, explaining that she has collaborated with Bob.
• Bob does not do this.

27 A scenario, cont'd
• Did Alice cheat?
• What about Bob?

28 A second scenario
• Bob works very hard on his 15-211 assignment.
• He gets everything working and hands it in 3 days early.
• He then discusses his solution with Alice.
• After discussing it with Alice, Bob realizes that his solution is $O(n^2)$, whereas the best solutions are $O(n \log n)$.

29 Second scenario, cont'd
• Bob uses this new knowledge, rewrites his assignment so that it runs in $O(n \log n)$ time, and hands it in.
• Later, after further discussion with Alice, he realizes that his code, while acceptably fast, is still written poorly.

30 Second scenario, cont'd
• Bob has learned a lot already, but is concerned that his grade will not reflect his state of knowledge.
• Bob thus copies Alice's code, makes some minor modifications, and hands it in.
• What has happened here?

31 Regular Expressions

32
• A regular language can always be described using a regular expression.
• Examples:
  ◦ (01)*
  ◦ 00
  ◦ (a|b)*ab
  ◦ this|that|theother
  ◦ 0*1*2*
  ◦ 01*|0 = 01*
  ◦ 00*11*22* = 0⁺1⁺2⁺
  ◦ (1|0)*00(0|1)*
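Equivalences such as 01*|0 = 01* can be spot-checked mechanically. The sketch below uses Python's `re` module, where `|` is alternation and a postfix `+` means "one or more" (so $0^+1^+2^+$ is written `0+1+2+`), and `re.fullmatch` requires the whole string to match. Checking all strings up to a length is evidence, not a proof; `agree_up_to` is an illustrative name.

```python
import re
from itertools import product


def agree_up_to(r1, r2, alphabet, max_len):
    """Compare two Python regexes on every string over `alphabet` of length
    <= max_len. Returns a string they disagree on, or None.
    Agreement on finitely many strings is evidence, not a proof."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            w = "".join(letters)
            if bool(re.fullmatch(r1, w)) != bool(re.fullmatch(r2, w)):
                return w
    return None


print(agree_up_to("01*|0", "01*", "01", 8))          # None
print(agree_up_to("00*11*22*", "0+1+2+", "012", 6))  # None
print(agree_up_to("(01)*", "0*1*", "01", 4))         # 0  ("0" is in 0*1* but not in (01)*)
```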

33 More examples
• [.?!][\]\"')]*($|\t|  )[ \t\n]*
• Emacs regexp:
  ◦ any of . ? ! followed by
  ◦ zero or more of ] " ' ) followed by
  ◦ any of end-of-line, tab, or two spaces, followed by
  ◦ zero or more of space, tab, newline
• [Demo of emacs, sed, grep…]
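The same pattern can be tried out with Python's `re` module. This is a translation into Python syntax, which is an assumption: in Emacs the pattern is written inside a Lisp string, with groups written as \( \) and alternation as \|. The sample text is made up.

```python
import re

# Python rendering of the slide's sentence-end pattern:
# a . ? or !, optional closing ] " ' ), then end-of-string, a tab, or two spaces,
# then any trailing spaces, tabs, or newlines.
SENTENCE_END = re.compile(r"[.?!][\]\"')]*($|\t|  )[ \t\n]*")

text = 'He said "Stop!"  Then he left.\tThe end.'
print([m.group(0) for m in SENTENCE_END.finditer(text)])  # ['!"  ', '.\t', '.']
```

Requiring two spaces (or a tab, or end of line) is what keeps abbreviations such as "e.g. this", which has only one space after each period, from being treated as sentence ends.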

34-46 Regular expressions
Inductive definition. Let $\Sigma = \{a, b\}$.
• $\emptyset$ is a regular expression; $L_\emptyset = \{\}$
• $\varepsilon$ is a regular expression; $L_\varepsilon = \{\varepsilon\}$
• $a$ is a regular expression; $L_a = \{a\}$
• $R+S$ is a regular expression if $R$ and $S$ are; $L_{R+S} = L_R \cup L_S$
• $RS$ is a regular expression if $R$ and $S$ are; $L_{RS} = \{uv \mid u \in L_R \text{ and } v \in L_S\}$
• $R^*$ is a regular expression if $R$ is; $L_{R^*} = \bigcup_{i \ge 0} L_R^i$
[Diagrams: an FSM for each case; the machines for $R+S$, $RS$, and $R^*$ are built from the machines for $R$ and $S$ using ε transitions]
Invariant: every machine must have exactly one final state. Add a new final state with ε transitions from the old final states if necessary.
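The inductive definition translates directly into code that builds an NFA for each case while keeping the invariant that every machine has exactly one final state. The Python sketch below is one way to do this, a variant of what is usually called Thompson's construction: it adds a fresh start and final state in every case, then simulates the NFA by tracking the set of current states together with their ε-closure. All names are illustrative.

```python
import itertools

_fresh = itertools.count()  # supplies new state numbers


class NFA:
    """An NFA fragment with one start state and exactly one final state.
    Edges are (source, label, target); label None is an epsilon move.
    Each fragment should be used at most once when building a larger one."""

    def __init__(self, edges=()):
        self.start = next(_fresh)
        self.final = next(_fresh)
        self.edges = list(edges)


def empty():              # the empty language: no path from start to final
    return NFA()


def epsilon():            # {""}
    m = NFA()
    m.edges.append((m.start, None, m.final))
    return m


def sym(a):               # {a}
    m = NFA()
    m.edges.append((m.start, a, m.final))
    return m


def union(r, s):          # R+S
    m = NFA(r.edges + s.edges)
    m.edges += [(m.start, None, r.start), (m.start, None, s.start),
                (r.final, None, m.final), (s.final, None, m.final)]
    return m


def concat(r, s):         # RS
    m = NFA(r.edges + s.edges)
    m.edges += [(m.start, None, r.start), (r.final, None, s.start),
                (s.final, None, m.final)]
    return m


def star(r):              # R*: skip R entirely, or loop back through it
    m = NFA(r.edges)
    m.edges += [(m.start, None, r.start), (r.final, None, r.start),
                (r.final, None, m.final), (m.start, None, m.final)]
    return m


def closure(m, states):
    """All states reachable from `states` using only epsilon moves."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for src, label, dst in m.edges:
            if src == q and label is None and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def matches(m, w):
    """Simulate the NFA on w, tracking the set of current states."""
    current = closure(m, {m.start})
    for a in w:
        current = closure(m, {dst for src, label, dst in m.edges
                              if src in current and label == a})
    return m.final in current


# (a+b)*ab: strings over {a, b} that end in "ab"
ends_ab = concat(concat(star(union(sym("a"), sym("b"))), sym("a")), sym("b"))
print([w for w in ["", "ab", "bab", "aab", "aba", "b"] if matches(ends_ab, w)])
# ['ab', 'bab', 'aab']
```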

47 Regular expressions
• The language described by a regular expression can be accepted by an FSM: RE → NFA, then NFA → DFA.
• A regular language can always be described using a regular expression: DFA → RE.

48 Regular expressions
• Membership in a regular language can be tested in time linear in the size of the input string.

49 Building FSMs
• An FSM is a directed graph.
• How large is the input alphabet?
• How many states?
• How fast must it run?
• How to get the lowest constant factor?
• How to minimize space?
• Representations:
  ◦ Matrix
  ◦ Array of lists
  ◦ Hashtable
  ◦ Overlapping hashtable
  ◦ Switch statement
Example transition matrix:
  state  a  b
  0      1  1
  1      2  3
  2      1  0
  3      2  3
  4      1  4
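To compare two of the representations listed above, here is a hedged Python sketch using the example transition table as read from the slide (rows are states, columns are a and b; that reading and the start state 0 are assumptions, and no final states are given). In both versions each input symbol costs one lookup, which is also why membership can be tested in time linear in the length of the input.

```python
# Transition table as read from the slide: row = state, column = symbol.
COLUMN = {"a": 0, "b": 1}
MATRIX = [
    [1, 1],  # state 0
    [2, 3],  # state 1
    [1, 0],  # state 2
    [2, 3],  # state 3
    [1, 4],  # state 4
]

# Hashtable representation: a dict keyed by (state, symbol).
TABLE = {(q, a): MATRIX[q][col]
         for q in range(len(MATRIX)) for a, col in COLUMN.items()}


def run_matrix(q, w):
    """One array lookup per input symbol: O(len(w)) time."""
    for a in w:
        q = MATRIX[q][COLUMN[a]]
    return q


def run_table(q, w):
    """One hash lookup per input symbol: also O(len(w)), typically with a larger constant."""
    for a in w:
        q = TABLE[(q, a)]
    return q


print(run_matrix(0, "abba"), run_table(0, "abba"))  # 2 2
```

The matrix is fast and simple but stores every (state, symbol) pair; a hashtable can store only the transitions that exist, which matters when the alphabet is large and most entries would be empty.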

50 Manipulating FSMs
• Eliminate unreachable states
• Transform NFA into DFA
• Transform ε-NFA into NFA
• Minimize DFA
• Create FSM from regular expression
• Create regular expression from FSM
• Test equivalence of FSMs
• Test emptiness of FSM language
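Two of these manipulations reduce to graph search: a state is unreachable if no path leads to it from the start state, and the language is empty exactly when no final state is reachable. A minimal Python sketch, reusing the example table from slide 49 (reading rows as states and columns as a and b, which is an assumption) with start state 0 and hypothetical final-state sets; it shows that state 4 in that table cannot be reached from state 0.

```python
from collections import deque

# Hypothetical example: the slide 49 transition table, start state 0.
DELTA = {
    (0, "a"): 1, (0, "b"): 1,
    (1, "a"): 2, (1, "b"): 3,
    (2, "a"): 1, (2, "b"): 0,
    (3, "a"): 2, (3, "b"): 3,
    (4, "a"): 1, (4, "b"): 4,
}


def reachable(delta, start):
    """Breadth-first search over the FSM viewed as a directed graph."""
    seen, queue = {start}, deque([start])
    while queue:
        q = queue.popleft()
        for (src, _symbol), dst in delta.items():
            if src == q and dst not in seen:
                seen.add(dst)
                queue.append(dst)
    return seen


def remove_unreachable(delta, start, final):
    """Keep only the states (and their transitions) reachable from start."""
    keep = reachable(delta, start)
    return {k: v for k, v in delta.items() if k[0] in keep}, final & keep


def is_empty(delta, start, final):
    """L(M) is empty iff no final state is reachable from the start state."""
    return not (reachable(delta, start) & final)


print(sorted(reachable(DELTA, 0)))                       # [0, 1, 2, 3]
print(is_empty(DELTA, 0, {4}), is_empty(DELTA, 0, {3}))  # True False
```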

