CSCI 3130: Formal Languages and Automata Theory Tutorial 5

Hung Chun Ho
Office: SHB 1026
Department of Computer Science & Engineering

Agenda
Cocke-Younger-Kasami (CYK) algorithm: parsing CFGs in normal form
Pushdown Automata (PDA): design

CYK Algorithm
Bottom-up parsing for normal form

Cocke-Younger-Kasami Algorithm
Used to parse context-free grammars in Chomsky normal form (or simply normal form).
Every production is of one of these types:
  X → YZ
  X → a
  S → ε
Normal form example:
  S → AB
  A → CC | a | c
  B → BC | b
  C → CB | BA | c

CYK Algorithm - Idea (= Algorithm 2 in Lecture Note 10L8.pdf)
Idea: bottom-up parsing.
Algorithm: given a string s of length N,
  For k = 1 to N
    For every substring of length k
      Determine which variable(s) can derive it

CYK Algorithm - Example
Parse abbc with the following CFG:
  S → AB
  A → CC | a | c
  B → BC | b
  C → CB | BA | c

CYK Algorithm – Idea (1)
Idea: we parse the substrings of abbc in this order:
  Length-1 substrings: a, b, b, c
  Length-2 substrings: ab, bb, bc
  Length-3 substrings: abb, bbc
  Length-4 substring: abbc
Done!

CYK Algorithm – Idea (2)
Idea: parsing of longer substrings depends on parsing of shorter substrings.
Example: abb may be decomposed as ab + b or as a + bb.
If we know how to parse ab and b (or a and bb), then we know how to parse abb.

CYK Algorithm – Substring
Denote sub(i, j) := the substring with start index i and end index j (1-based, inclusive).
Example: for abbc, sub(2,4) = bbc.
This notation is not meant to complicate things; it is just for convenience in the following discussion.
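The notation and the bottom-up order can be sketched in a few lines of Python. The function names `sub`, `bottom_up_order`, and `splits` are illustrative choices, and indices are 1-based and inclusive to match the example sub(2,4) = bbc.

```python
def sub(s, i, j):
    """Substring of s from index i to index j, 1-based and inclusive."""
    return s[i - 1:j]


def bottom_up_order(s):
    """All (i, j) index pairs, shortest substrings first."""
    n = len(s)
    return [(i, i + k - 1) for k in range(1, n + 1) for i in range(1, n - k + 2)]


def splits(s, i, j):
    """Every way to cut sub(i, j) into two non-empty parts."""
    return [(sub(s, i, k), sub(s, k + 1, j)) for k in range(i, j)]


# Visit the substrings of abbc in the order CYK fills the table.
for i, j in bottom_up_order("abbc"):
    print(i, j, sub("abbc", i, j))
```

Calling `splits("abbc", 1, 3)` gives the two decompositions of abb, namely a + bb and ab + b.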

CYK Algorithm – Table
Each cell corresponds to a substring and stores the variables deriving that substring.
Rows are indexed by the length of the substring, columns by the start index of the substring (over the string a b b c).
Example: the cell for length = 3 and start index = 2 corresponds to sub(2,4) = bbc.

CYK Algorithm – Simulation
Base case: length = 1.
The possible choices of variable(s) can be found by scanning through each production.
Cells for length 1: a: A; b: B; b: B; c: A, C

CYK Algorithm – Simulation
Loop: length = 2.
For each substring of length 2:
  Decompose it into shorter substrings
  Check the cells below it
Let’s parse the substring ab first.

CYK Algorithm – Simulation
For sub(1,2) = ab, it can be decomposed as ab = a + b = sub(1,1) + sub(2,2).
Possible choices: AB.
Scanning the rules finds S → AB, so the cell for sub(1,2) holds S.

CYK Algorithm – Simulation
For sub(2,3) = bb, it can be decomposed as bb = b + b = sub(2,2) + sub(3,3).
Possible choices: BB.
Scanning the rules finds no suitable rule, so the CFG cannot derive this substring: the cell for sub(2,3) is ∅.

CYK Algorithm – Simulation
For sub(3,4) = bc, it can be decomposed as bc = b + c = sub(3,3) + sub(4,4).
Possible choices: BA, BC.
Scanning the rules finds C → BA and B → BC, so the cell for sub(3,4) holds B, C.
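The "possible choices, then scan rules" step used for ab, bb, and bc can be sketched as one small function. The dictionary encoding of the grammar and the name `combine` are assumptions for illustration.

```python
# Assumed encoding of the example grammar: variable -> list of right-hand sides.
GRAMMAR = {
    "S": ["AB"],
    "A": ["CC", "a", "c"],
    "B": ["BC", "b"],
    "C": ["CB", "BA", "c"],
}


def combine(left_cell, right_cell, grammar=GRAMMAR):
    """Variables X with a rule X -> YZ, where Y is in left_cell and Z is in right_cell."""
    return {
        head
        for head, bodies in grammar.items()
        for body in bodies
        if len(body) == 2 and body[0] in left_cell and body[1] in right_cell
    }


print(combine({"A"}, {"B"}))       # cell for ab
print(combine({"B"}, {"B"}))       # cell for bb
print(combine({"B"}, {"A", "C"}))  # cell for bc
```

For longer substrings the same function is applied once per decomposition, and the results are united.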

CYK Algorithm – Simulation
For sub(1,3) = abb: abb = ab + b = sub(1,2) + sub(3,3).
Possible choices: SB.
Scanning the rules finds no suitable variable yet: ∅.
But there is another way to decompose the string.

CYK Algorithm – Simulation
For sub(1,3) = abb: abb = a + bb = sub(1,1) + sub(2,3).
Possible choices: ∅.
The smaller substring bb cannot be parsed, so this decomposition cannot parse the string, and there is no need to scan the rules.

CYK Algorithm – Simulation
For sub(1,3) = abb:
  abb = sub(1,1) + sub(2,3) gives no valid parsing
  abb = sub(1,2) + sub(3,3) gives no valid parsing
So abb cannot be parsed: the cell for sub(1,3) is ∅.

CYK Algorithm – Simulation
For sub(2,4) = bbc:
  bbc = sub(2,2) + sub(3,4). Possible choices: BB, BC
  bbc = sub(2,3) + sub(4,4). Possible choices: ∅
Scanning the rules finds B → BC, so the variable is B.

CYK Algorithm – Simulation
Finally, for sub(1,4) = abbc:
  Possible choices: AB, SB, SC
  Variables: S (from S → AB)
This cell represents the original string, and it contains S, so abbc is in the language.

CYK Algorithm – Parse Tree
abbc is in the language! How do we obtain the parse tree?
Trace back the derivations:
  sub(1,4) is derived using S → AB from sub(1,1) and sub(2,4)
  sub(1,1) is derived using A → a
  sub(2,4) is derived using B → BC from sub(2,2) and sub(3,4)
So, also record the derivations used!

CYK Algorithm – Parse Tree
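Obtained from the table (row = substring length, column = start index):
  Length 4:  S
  Length 3:  ∅      B
  Length 2:  S      ∅      B, C
  Length 1:  A      B      B      A, C
  String:    a      b      b      c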
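One way to "record the used derivations" is to store, next to each variable in a cell, the split point and the two child variables that produced it. The sketch below assumes a dictionary encoding of the example grammar; the names `cyk_with_backpointers` and `build_tree`, and the nested-tuple tree format, are illustrative choices.

```python
GRAMMAR = {
    "S": ["AB"],
    "A": ["CC", "a", "c"],
    "B": ["BC", "b"],
    "C": ["CB", "BA", "c"],
}


def cyk_with_backpointers(word, grammar=GRAMMAR):
    """Return back[(i, j)][X]: how variable X derives sub(i, j) (1-based, inclusive)."""
    n = len(word)
    back = {}
    # Length 1: remember the terminal that the variable derives.
    for i in range(1, n + 1):
        back[(i, i)] = {
            var: word[i - 1]
            for var, bodies in grammar.items()
            if word[i - 1] in bodies
        }
    # Longer substrings: remember the split point k and the two child variables.
    for length in range(2, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            cell = {}
            for k in range(i, j):
                for var, bodies in grammar.items():
                    for body in bodies:
                        if (len(body) == 2
                                and body[0] in back[(i, k)]
                                and body[1] in back[(k + 1, j)]):
                            # Keep the first derivation found for each variable.
                            cell.setdefault(var, (k, body[0], body[1]))
            back[(i, j)] = cell
    return back


def build_tree(back, var, i, j):
    """Rebuild a parse tree as nested tuples by following the recorded derivations."""
    entry = back[(i, j)][var]
    if isinstance(entry, str):
        return (var, entry)
    k, left, right = entry
    return (var, build_tree(back, left, i, k), build_tree(back, right, k + 1, j))


back = cyk_with_backpointers("abbc")
print(build_tree(back, "S", 1, 4))
```

The root of the resulting tree uses S → AB over sub(1,1) and sub(2,4), matching the trace above.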

CYK Algorithm – Conclusion
A bottom-up parsing algorithm.
Dynamic programming: the solution of a subproblem (parsing a substring) depends on those of smaller subproblems.
Before employing the CYK algorithm, convert the grammar into normal form:
  Remove ε-productions
  Remove unit productions

CYK Algorithm – Detailed
D = “On input w = w_1 w_2 … w_n:
  If w = ε and S → ε is a rule, accept.
  For i = 1 to n:
    For each variable A:
      Test whether A → b is a rule, where b = w_i.
      If so, place A in table(i, i).
  For l = 2 to n:
    For i = 1 to n – l + 1:
      Let j = i + l – 1.
      For k = i to j – 1:
        For each rule A → BC:
          If table(i, k) contains B and table(k+1, j) contains C, put A in table(i, j).
  If S is in table(1, n), accept. Otherwise, reject.”
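The detailed algorithm translates almost line by line into Python. This is a minimal sketch, assuming a dictionary encoding of the grammar with single-character symbols; the name `cyk_accepts` is an illustrative choice, and table(i, j) is stored under the key `(i, j)`.

```python
GRAMMAR = {
    "S": ["AB"],
    "A": ["CC", "a", "c"],
    "B": ["BC", "b"],
    "C": ["CB", "BA", "c"],
}


def cyk_accepts(word, grammar=GRAMMAR, start="S"):
    """Decide whether a grammar in normal form derives word."""
    n = len(word)
    if n == 0:
        # Accept ε only if S -> ε is a rule ("" encodes ε).
        return "" in grammar.get(start, [])
    table = {}
    # Base case: table(i, i) holds every A with a rule A -> w_i.
    for i in range(1, n + 1):
        table[(i, i)] = {
            head for head, bodies in grammar.items() if word[i - 1] in bodies
        }
    # Substrings of length 2..n, each tried at every split point k.
    for length in range(2, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            table[(i, j)] = set()
            for k in range(i, j):
                for head, bodies in grammar.items():
                    for body in bodies:
                        if (len(body) == 2
                                and body[0] in table[(i, k)]
                                and body[1] in table[(k + 1, j)]):
                            table[(i, j)].add(head)
    return start in table[(1, n)]


print(cyk_accepts("abbc"))
print(cyk_accepts("abb"))
```

The three nested loops over length, start index, and split point give a running time cubic in the length of the word, times the number of rules.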

Pushdown Automata
NFA with infinite memory/states

Pushdown Automata
PDA ≈ NFA, with a stack as memory.
Transition:
  NFA – depends on the input
  PDA – depends on the input and the top of the stack
    Push a symbol onto the stack
    Pop a symbol from the stack
    Read a terminal from the string
Transitions are non-deterministic (possibly ε).

Pushdown Automata and NFA
Accept:
  NFA – go to an accept state
  PDA – go to an accept state

PDA – Example 1
Given the following language, design a PDA for it:
L = {0^i 1^j : i ≤ j ≤ 2i, i = 0, 1, …}, Σ = {0, 1}

PDA – Example 1 - Idea
Idea: the input has two sections.
  First half: all ‘0’s
  Second half: all ‘1’s
#‘1’ depends on #‘0’: #‘0’ ≤ #‘1’ ≤ #‘0’ × 2

PDA – Example 1 – Solution
[State diagram: states q0, q1, q2, q3; transition labels ε,ε/$; 0,ε/X; ε,ε/ε; ε,$/ε; 1,X/ε; 1,X/X]
L = {0^i 1^j : i ≤ j ≤ 2i, i = 0, 1, …}, Σ = {0, 1}

PDA – Example 1 – Explain
Solution: let’s try some string, e.g. w = 00111 (see the whiteboard for the simulation).
Pushing $ (ε,ε/$) indicates the start of parsing.
The part reading ‘0’ (0,ε/X) saves information about #‘0’: #‘X’ in the stack = #‘0’.
The part reading ‘1’ accounts for #‘1’, so that #‘0’ ≤ #‘1’ ≤ #‘0’ × 2. For each ‘X’ it either
  consumes one ‘X’ and eats one ‘1’, or
  consumes one ‘X’ and eats two ‘1’s.
Popping $ (ε,$/ε) indicates the end of parsing.
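To try strings such as 00111 without a whiteboard, the PDA can be simulated by a breadth-first search over configurations (state, input position, stack). The wiring of the transitions below is an assumption: it is one plausible reading of the transition labels in the solution, and the helper state `q2b` (used to eat the second ‘1’ for a single X) is hypothetical, not one of the states named in the slides.

```python
from collections import deque

# (state, input symbol or "", popped symbol or "", next state, pushed string)
# Pushed strings are written bottom-to-top: the last character is the new top.
# ASSUMED wiring; "q2b" is a hypothetical helper state.
TRANSITIONS = [
    ("q0", "", "", "q1", "$"),     # ε,ε/$  mark the bottom of the stack
    ("q1", "0", "", "q1", "X"),    # 0,ε/X  push one X per 0
    ("q1", "", "", "q2", ""),      # ε,ε/ε  switch to reading 1s
    ("q2", "1", "X", "q2", ""),    # 1,X/ε  one 1 for this X
    ("q2", "1", "X", "q2b", "X"),  # 1,X/X  first of two 1s, keep the X
    ("q2b", "1", "X", "q2", ""),   # 1,X/ε  second 1, now pop the X
    ("q2", "", "$", "q3", ""),     # ε,$/ε  stack is empty again
]


def accepts(word, transitions=TRANSITIONS, start="q0", accepting=("q3",)):
    """Nondeterministic PDA simulation: accept state reached with all input read."""
    seen = set()
    queue = deque([(start, 0, "")])
    while queue:
        config = queue.popleft()
        if config in seen:
            continue
        seen.add(config)
        state, pos, stack = config
        if state in accepting and pos == len(word):
            return True
        for src, read, pop, dst, push in transitions:
            if src != state:
                continue
            if read and word[pos:pos + 1] != read:
                continue
            if pop and not stack.endswith(pop):
                continue
            new_stack = stack[:len(stack) - len(pop)] + push
            queue.append((dst, pos + len(read), new_stack))
    return False


for w in ["00111", "0011", "001111", "001", "0111"]:
    print(w, accepts(w))
```

The search terminates here because no ε-transition can be repeated indefinitely; a general PDA with pushing ε-loops would need a bound on the stack height.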

PDA – Example 2
Given the following language, design a PDA for it:
L = { a^i b^j c^k d^l : i, j, k, l = 0, 1, …; i + k = j + l }, where the alphabet Σ = {a, b, c, d}

PDA – Example 2 – Idea
Idea:
  Sequentially read (multiple) ‘a’, ‘b’, ‘c’ and ‘d’
  Maintain: #‘a’ + #‘c’ and #‘b’ + #‘d’
  If these numbers are equal, accept

PDA – Example 2 – Solution
[State diagram: states q1, q2, q3, q4, q5; transition labels ε,ε/$; a,ε/X; ε,ε/ε; b,$/$Y; b,X/ε; b,Y/YY; c,X/XX; c,$/$X; c,Y/ε; d,X/ε; d,$/$Y; d,Y/YY; ε,$/ε]
L = { a^i b^j c^k d^l : i, j, k, l = 0, 1, …; i + k = j + l }, where the alphabet Σ = {a, b, c, d}

PDA – Example 2 – Explain
Solution: the parts of the PDA handle, in order: start, a, b, c, d, end.
Each X in the stack = an extra a or c.
Each Y in the stack = an extra b or d.
X and Y ‘cancel’ each other, so the stack contains only X’s or only Y’s.
No X’s and no Y’s means #a + #c = #b + #d, so accept.
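The cancelling behaviour of X and Y can be checked with a small simulation. The wiring below is an assumption based on the transition labels and the start, a, b, c, d, end parts of the solution; the state names `start`, `qa`, `qb`, `qc`, `qd`, and `end` are illustrative and do not correspond to the q-numbers in the diagram.

```python
from collections import deque

# (state, input symbol or "", popped symbol or "", next state, pushed string)
# Pushed strings are written bottom-to-top: "$Y" puts Y on top of $.
# ASSUMED wiring and state names, for illustration only.
TRANSITIONS = [
    ("start", "", "", "qa", "$"),
    ("qa", "a", "", "qa", "X"),
    ("qa", "", "", "qb", ""),
    ("qb", "b", "$", "qb", "$Y"),
    ("qb", "b", "X", "qb", ""),
    ("qb", "b", "Y", "qb", "YY"),
    ("qb", "", "", "qc", ""),
    ("qc", "c", "X", "qc", "XX"),
    ("qc", "c", "$", "qc", "$X"),
    ("qc", "c", "Y", "qc", ""),
    ("qc", "", "", "qd", ""),
    ("qd", "d", "X", "qd", ""),
    ("qd", "d", "$", "qd", "$Y"),
    ("qd", "d", "Y", "qd", "YY"),
    ("qd", "", "$", "end", ""),
]


def accepts(word, transitions=TRANSITIONS, start="start", accepting=("end",)):
    """Breadth-first search over configurations (state, input position, stack)."""
    seen = set()
    queue = deque([(start, 0, "")])
    while queue:
        config = queue.popleft()
        if config in seen:
            continue
        seen.add(config)
        state, pos, stack = config
        if state in accepting and pos == len(word):
            return True
        for src, read, pop, dst, push in transitions:
            if src != state:
                continue
            if read and word[pos:pos + 1] != read:
                continue
            if pop and not stack.endswith(pop):
                continue
            new_stack = stack[:len(stack) - len(pop)] + push
            queue.append((dst, pos + len(read), new_stack))
    return False


for w in ["abcd", "bc", "aabd", "ac", "bd"]:
    print(w, accepts(w))
```

Because every b, c, or d either pops the opposite symbol or pushes its own, the stack above $ never mixes X’s and Y’s, and $ is exposed at the end exactly when the two counts agree.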