1 Foundations of Software Design Lecture 23: Finite Automata and Context-Free Grammars Marti Hearst Fall 2002.



2 Outline
Adapted from lecture by Ras Bodik, http://www.cs.wisc.edu/~bodik/cs536.html
– Finite State Automata
– Why regular expressions are not enough
– Context Free Grammars
– Next Time: Relationship to Programming Languages and Compilers

3 Three Equivalent Representations
Adapted from Jurafsky & Martin 2000
– Finite automata
– Regular expressions
– Regular languages
Each can describe the others.
Theorem: For every regular expression, there is a deterministic finite-state automaton that defines the same language, and vice versa.

4 Finite Automata
An FA is similar to a compiler in that:
– A compiler recognizes legal programs in some (source) language.
– A finite-state machine recognizes legal strings in some language.
Example: Pascal identifiers
– Sequences of one or more letters or digits, starting with a letter.
[Diagram: start state S moves to accepting state A on a letter; A loops to itself on letter | digit]

5 Finite-Automata State Graphs
[Diagram legend: a state; the start state; an accepting state; a transition labeled a]

6 Finite Automata
Transition: s1 --a--> s2
Is read: in state s1 on input “a” go to state s2.
If end of input:
– If in an accepting state => accept
– Otherwise => reject
If no transition is possible (got stuck) => reject
FSA = Finite State Automaton

7 Language defined by FSA
The language defined by an FSA is the set of strings accepted by the FSA.
– In the language of the FSM shown below: x, tmp2, XyZzy, position27.
– Not in the language of the FSM shown below: 123, a?, 13apples.
[Diagram: start state S moves to accepting state A on a letter; A loops to itself on letter | digit]
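The two-state identifier machine can be simulated directly. The following is a minimal sketch, assuming "letter" and "digit" mean ASCII letters and digits; the function name accepts_identifier is made up for illustration.

```python
def accepts_identifier(text: str) -> bool:
    """Simulate the identifier automaton: S is the start state, A is accepting."""
    state = "S"
    for ch in text:
        # Assumption: only ASCII letters and digits count.
        is_letter = ch.isascii() and ch.isalpha()
        is_digit = ch.isascii() and ch.isdigit()
        if state == "S" and is_letter:
            state = "A"                      # edge S --letter--> A
        elif state == "A" and (is_letter or is_digit):
            state = "A"                      # loop A --letter|digit--> A
        else:
            return False                     # no transition possible: stuck
    return state == "A"                      # accept only in the accepting state


for word in ["x", "tmp2", "XyZzy", "position27", "123", "a?", "13apples"]:
    print(word, accepts_identifier(word))
```

The first four strings from the slide are accepted; the last three get stuck, either on the first character (a digit) or on the "?".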

8 Example: Integer Literals
FSA that accepts integer literals with an optional + or - sign:
/(\+|-)?[0-9]+/
Note: two different edges from S to A.
[Diagram: states S, A, and B, with edges labeled +, -, and digit]
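One way to encode such a machine is as a dictionary of transitions. This is only a sketch: the exact edges of the slide's diagram are not preserved, so the layout below (a sign moves S to A, a digit moves S or A to B, B loops on digits and is the only accepting state) is an assumption that matches the regular expression. The names TRANSITIONS, char_class, and accepts_integer are made up for illustration.

```python
# Assumed edges: S --sign--> A, S --digit--> B, A --digit--> B, B --digit--> B.
TRANSITIONS = {
    ("S", "sign"): "A",
    ("S", "digit"): "B",
    ("A", "digit"): "B",
    ("B", "digit"): "B",
}
ACCEPTING = {"B"}


def char_class(ch: str):
    """Map a character to the edge label it can match, or None."""
    if ch in "+-":
        return "sign"
    if ch in "0123456789":
        return "digit"
    return None


def accepts_integer(text: str) -> bool:
    state = "S"
    for ch in text:
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:          # missing entry: the machine is stuck
            return False
    return state in ACCEPTING


print(accepts_integer("-120"), accepts_integer("+"), accepts_integer("4-2"))
```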

9 Example
FSA that accepts three-letter English words that begin with p and end with d or t. Here I use the convenient notation of making the state name match the input that has to be on the edge leading to that state.
[Diagram: states named p, a, i, o, u, d, and t]

10 Practice
Write an automaton that accepts Java identifiers:
– One or more letters, digits, or underscores, starting with a letter or an underscore.
– Start with the regexp.
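A possible starting regexp for the exercise, checked with Python's re module. This sketch follows the simplified definition on the slide (ASCII letters, digits, underscore); real Java identifiers also allow characters such as $ and many Unicode letters. The names JAVA_IDENTIFIER and is_identifier are made up for illustration.

```python
import re

# First character: letter or underscore. Remaining characters: letter, digit, or underscore.
JAVA_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_identifier(text: str) -> bool:
    # fullmatch requires the whole string to match, not just a prefix.
    return JAVA_IDENTIFIER.fullmatch(text) is not None


print(is_identifier("_tmp"), is_identifier("MAX_VALUE"), is_identifier("1x"))
```

The regexp maps onto a two-state automaton of the same shape as the Pascal identifier machine, with underscore added to both edge labels.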

11 Formal Definition
A finite automaton is a 5-tuple (Σ, Q, δ, q, F) where:
– Σ is an input alphabet
– Q is a set of states
– q is a start state
– F ⊆ Q is a set of accepting states
– δ is the state transition function: Q × Σ → Q (i.e., it encodes transitions state --input--> state)
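The 5-tuple translates almost literally into a data structure. The sketch below is illustrative only: the class name, the field names, and the example machine (binary strings with an even number of 1s) are assumptions, not part of the lecture.

```python
from typing import NamedTuple


class FiniteAutomaton(NamedTuple):
    alphabet: frozenset   # Σ
    states: frozenset     # Q
    delta: dict           # δ, stored as {(state, symbol): next_state}
    start: str            # q
    accepting: frozenset  # F


def is_well_formed(fa: FiniteAutomaton) -> bool:
    """Check q ∈ Q, F ⊆ Q, and that δ is a total function Q × Σ → Q."""
    return (
        fa.start in fa.states
        and fa.accepting <= fa.states
        and all(fa.delta.get((q, a)) in fa.states
                for q in fa.states for a in fa.alphabet)
    )


def accepts(fa: FiniteAutomaton, text: str) -> bool:
    state = fa.start
    for symbol in text:
        if symbol not in fa.alphabet:
            return False
        state = fa.delta[(state, symbol)]
    return state in fa.accepting


# Hypothetical example: strings over {0, 1} containing an even number of 1s.
EVEN_ONES = FiniteAutomaton(
    alphabet=frozenset("01"),
    states=frozenset({"even", "odd"}),
    delta={("even", "0"): "even", ("even", "1"): "odd",
           ("odd", "0"): "odd", ("odd", "1"): "even"},
    start="even",
    accepting=frozenset({"even"}),
)

print(is_well_formed(EVEN_ONES), accepts(EVEN_ONES, "1001"), accepts(EVEN_ONES, "10"))
```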

12 How to Implement an FSA
A table-driven approach:
– The table has one row for each state in the machine, and one column for each possible character.
– Table[j][k] says which state to go to from state j on character k.
– An empty entry corresponds to the machine getting stuck.

13 The table-driven program for a Deterministic FSA
    state = S                          // S is the start state
    repeat {
        k = next character from the input
        if (k == EOF)                  // the end of input
            if state is a final state then accept else reject
        state = T[state, k]
        if state == empty then reject  // got stuck
    }
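The pseudocode carries over to Python with little change. The sketch below fills the table for the identifier automaton from the earlier slides; grouping characters into three columns (letter, digit, other) instead of one column per character is a simplification, and the names T, FINAL, column, and run are made up for illustration.

```python
# Column indices: characters are grouped into classes to keep the table small.
LETTER, DIGIT, OTHER = 0, 1, 2
# Row indices: one row per state.
S, A = 0, 1

# T[state][column] is the next state; None is an empty entry (machine gets stuck).
T = [
    [A, None, None],   # row for S: only a letter is allowed
    [A, A, None],      # row for A: letters and digits loop back to A
]
FINAL = {A}


def column(ch: str) -> int:
    if ch.isascii() and ch.isalpha():
        return LETTER
    if ch.isascii() and ch.isdigit():
        return DIGIT
    return OTHER


def run(text: str) -> bool:
    state = S
    for ch in text:                 # running out of characters plays the role of EOF
        state = T[state][column(ch)]
        if state is None:           # empty entry: reject
            return False
    return state in FINAL           # at end of input, accept only in a final state


print(run("tmp2"), run("13apples"))
```

Swapping in a different table and final-state set gives a recognizer for a different regular language without touching run.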

14 Regular expressions are not enough.
You can write an automaton that accepts the specific strings:
– “a”, “(a)”, “((a))”, and “(((a)))”
But you can’t write one for this (in the general case):
– “a”, “(a)”, “((a))”, “(((a)))”, … “(^k a )^k”, i.e., k opening parentheses, then a, then k closing parentheses, for any k
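What the general case needs is an unbounded counter, which a fixed set of states cannot provide. A minimal sketch of a recognizer that counts instead (the function name accepts_nested_a is made up for illustration):

```python
def accepts_nested_a(text: str) -> bool:
    """Accept k opening parentheses, then 'a', then k closing parentheses."""
    i = 0
    opens = 0
    while i < len(text) and text[i] == "(":
        opens += 1                  # the count can grow without bound
        i += 1
    if i >= len(text) or text[i] != "a":
        return False
    i += 1
    closes = 0
    while i < len(text) and text[i] == ")":
        closes += 1
        i += 1
    # All input must be consumed and the two counts must agree.
    return i == len(text) and opens == closes


print(accepts_nested_a("((a))"), accepts_nested_a("((a)"))
```

The integer variable opens is exactly the piece of memory a finite automaton lacks: a machine with n states cannot distinguish more than n different nesting depths.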

15 Regular expressions are not enough.
What programs are generated by?
digit+ ( ( “+” | “-” | “*” | “/” ) digit+ )*
[Note: the Perl-style replacement operators such as \1 are not part of the definition of regular expressions.]
What important properties does this regular expression fail to express?
– Regexes are not good at showing:
  – Precedence
  – Nesting
  – Recursion
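To experiment with the question, the expression can be written in Python's regex syntax. This is a sketch; the names FLAT_ARITHMETIC and is_flat_arithmetic are made up for illustration.

```python
import re

# digit+ ( ( "+" | "-" | "*" | "/" ) digit+ )*
FLAT_ARITHMETIC = re.compile(r"[0-9]+(?:[-+*/][0-9]+)*")


def is_flat_arithmetic(text: str) -> bool:
    return FLAT_ARITHMETIC.fullmatch(text) is not None


print(is_flat_arithmetic("1+2*3"))    # a flat chain of operators matches
print(is_flat_arithmetic("(1+2)*3"))  # parentheses (nesting) do not
```

The match is a yes/no answer over a flat sequence: nothing in it records that * should bind tighter than + in 1+2*3, and parenthesized sub-expressions cannot be described at all.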

16 Context-Free Grammars

17 Derivation
1. Begin with a string consisting of the start symbol “S”
2. Replace any non-terminal in the string by the right-hand side of some production
3. Repeat (2) until there are no non-terminals in the string

18 Example
Define a language to recognize strings of balanced parentheses:
The grammar: [grammar not preserved in the transcript]
Recognizes these strings: () (()) (((()))) …
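The derivation procedure can be traced in code on a parenthesis grammar. Since the slide's grammar is not preserved, the sketch below assumes S → ( S ) | ε, which generates the nested strings listed on the slide; it may differ from the grammar actually shown. The names GRAMMAR and derive are made up for illustration, and the function always rewrites the left-most non-terminal, which is one allowed choice among "any non-terminal".

```python
# Assumed grammar: S -> ( S ) | ε   (the empty list stands for ε)
GRAMMAR = {"S": [["(", "S", ")"], []]}


def derive(choices, start="S"):
    """Apply the chosen productions in order; return every intermediate string."""
    form = [start]                                   # step 1: begin with the start symbol
    steps = [" ".join(form)]
    for choice in choices:
        # step 2: find a non-terminal (here the left-most) and replace it
        idx = next(i for i, sym in enumerate(form) if sym in GRAMMAR)
        form = form[:idx] + GRAMMAR[form[idx]][choice] + form[idx + 1:]
        steps.append(" ".join(form) or "ε")
    # step 3: the derivation is finished only when no non-terminals remain
    if any(sym in GRAMMAR for sym in form):
        raise ValueError("derivation is incomplete: non-terminals remain")
    return steps


print(derive([0, 0, 1]))   # production 0 twice, then production 1
```

With choices [0, 0, 1] the intermediate strings are S, ( S ), ( ( S ) ), and finally ( ( ) ).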

19 Derivation Rules
[rule] is the same as [rule] (both rules were images and are not preserved in the transcript)

20 Arithmetic Example
Simple arithmetic expressions: [grammar not preserved in the transcript]
Some valid strings in the language: [examples not preserved in the transcript]


22 Derivations and Parse Trees
A derivation is a sequence of productions.
A derivation can be drawn as a tree:
– The start symbol is the tree’s root
– For each production applied, add the right-hand-side symbols as children of the node for its left-hand side
– Stop when all leaves are terminals

23 Right-most Derivation Example
Input: id * id + id
[Parse-tree diagram over the non-terminal E with leaves id, *, id, +, id]

24 Left-most and Right-most Derivations
The example is a right-most derivation
– At each step, replace the right-most non-terminal
There is an equivalent notion of a left-most derivation

25 A formal definition of CFGs
A CFG consists of
– A set of terminals T
– A set of non-terminals N
– A start symbol S (a non-terminal)
– A set of productions, each with a single non-terminal on the left-hand side and a string of terminals and non-terminals on the right-hand side

26 Derivations and Parse Trees
Note that right-most and left-most derivations have the same parse tree
The difference is the order in which branches are added

27 Example
The program: x * y + z
Input to parser: ID TIMES ID PLUS ID
– we’ll write tokens as follows: id * id + id
Output of parser: the parse tree
[Parse-tree diagram over the non-terminal E with leaves id, *, id, +, id]

28 Ambiguity
Grammar: [not preserved in the transcript]
String: [not preserved in the transcript]

29 Ambiguity (Cont.)
This string has two parse trees
[Two parse-tree diagrams over the non-terminal E, built from id, *, and +]

30 Ambiguity (Cont.)
A grammar is ambiguous if, for some string, any of the following three equivalent conditions holds:
– it has more than one parse tree
– it has more than one right-most derivation
– it has more than one left-most derivation
Ambiguity is BAD
– It makes the meaning of some programs ill-defined
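The first condition can be checked by brute force for a small grammar. The sketch below assumes the ambiguous grammar E → E + E | E * E | id, a common textbook example; the grammar on the preceding slides is not preserved, so this is an assumption. The function name count_parse_trees is made up for illustration.

```python
from functools import lru_cache


def count_parse_trees(tokens) -> int:
    """Count parse trees of the token list under E -> E + E | E * E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo: int, hi: int) -> int:
        # Number of ways tokens[lo:hi] can be derived from E.
        if hi - lo == 1:
            return 1 if tokens[lo] == "id" else 0
        total = 0
        # Try every operator as the root of the tree for this span.
        for mid in range(lo + 1, hi - 1):
            if tokens[mid] in ("+", "*"):
                total += count(lo, mid) * count(mid + 1, hi)
        return total

    return count(0, len(tokens))


print(count_parse_trees("id * id + id".split()))
```

For id * id + id the count is 2: one tree has + at the root, the other has * at the root, so the string witnesses the ambiguity.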

31 Dealing with Ambiguity
There are several ways to handle ambiguity
Most direct method is to rewrite the grammar unambiguously
[Rewritten grammar not preserved in the transcript]
– Enforces precedence of * over +
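One common way to get this effect is to layer the grammar so that * is handled at a lower level than +. The sketch below assumes the layering E → T { + T }, T → F { * F }, F → id and parses it by recursive descent; this is a standard textbook rewrite and not necessarily the grammar shown on the slide. The name parse_expression and the tuple tree format are made up for illustration.

```python
def parse_expression(tokens):
    """Return a nested-tuple parse tree in which * binds tighter than +."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r} at position {pos}")
        pos += 1

    def factor():               # F -> id
        expect("id")
        return "id"

    def term():                 # T -> F { * F }
        node = factor()
        while peek() == "*":
            expect("*")
            node = ("*", node, factor())
        return node

    def expr():                 # E -> T { + T }
        node = term()
        while peek() == "+":
            expect("+")
            node = ("+", node, term())
        return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r} at position {pos}")
    return tree


print(parse_expression("id * id + id".split()))
print(parse_expression("id + id * id".split()))
```

In both outputs the * node sits below the + node, regardless of the order in which the operators appear, so each string has exactly one tree.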

32 Properties of CFGs
Membership in a language is “yes” or “no”
Form of the grammar is important
– Different grammars can generate the same language
Need an “implementation” of CFGs,
– i.e. the parser
– we’ll create the parser using a parser generator
– available generators: yacc, javacc

33 CFGs vs. Regexes
CFGs are better at expressing recursive structure
– They are often described using trees …
– … which we know have a recursive structure
Example:
– if E then S1 else S2
– if E1 then if E2 then S1 else S2 else S3
– if E1 then (if E2 then S1 else S2) else S3

34 Regular Languages vs CFGs
Every regex can be expressed as a CFG
The converse is not true
– BUT regexes cover much of what you need
When to use which?
– According to Aho, Sethi, and Ullman ’86:
  – Regexes are more concise and easier to understand
  – More efficient analyzers can be constructed from regexes
  – It is often useful to separate the structure of a language into lexical and nonlexical parts
    – Lexical parts are processed with regexes
    – The structure is processed with CFGs
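The lexical half of that split can be sketched with one regular expression per token kind. The token names ID, TIMES, and PLUS come from the earlier example (x * y + z becomes ID TIMES ID PLUS ID); the identifier pattern, the SKIP rule for whitespace, and the names TOKEN_SPEC and tokenize are assumptions made for illustration.

```python
import re

# One regex per token kind; SKIP matches whitespace, which is discarded.
TOKEN_SPEC = [
    ("ID", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("PLUS", r"\+"),
    ("TIMES", r"\*"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source: str):
    """Turn program text into a list of token names for the parser."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append(match.lastgroup)
        pos = match.end()
    return tokens


print(tokenize("x * y + z"))
```

The resulting token list is what a CFG-based parser would consume; the parser never sees individual characters.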

