CREATING PARSERS
Assume you have a GRAMMAR G and a string S of Tokens of G.
How can you tell if G can generate S?
How can you extract the structure of S if it can be generated by G?
Example G, Tokens a, b, c, d:
  1. Z ::= d
  2. Z ::= XYZ
  3. Y ::= b
  4. Y ::= c
  5. X ::= Y
  6. X ::= aZ
Example string: ‘bbd’

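As a concrete illustration of "G can generate S", the sketch below replays a leftmost derivation of the example string from the rule numbers of G. The names RULES and leftmost_derivation are hypothetical helpers introduced for this illustration, and the single-character encoding (upper case for non-terminals, lower case for tokens) is an assumption that happens to fit G.

```python
# Grammar G from the slide: rule number -> (left-hand side, right-hand side).
# Assumption: upper-case letters are non-terminals, lower-case letters are tokens.
RULES = {
    1: ("Z", "d"),
    2: ("Z", "XYZ"),
    3: ("Y", "b"),
    4: ("Y", "c"),
    5: ("X", "Y"),
    6: ("X", "aZ"),
}


def leftmost_derivation(rule_numbers, start="Z"):
    """Apply each rule to the leftmost non-terminal; return every sentential form."""
    form = start
    steps = [form]
    for number in rule_numbers:
        lhs, rhs = RULES[number]
        # Position of the leftmost non-terminal in the current sentential form.
        position = next(i for i, symbol in enumerate(form) if symbol.isupper())
        if form[position] != lhs:
            raise ValueError(f"rule {number} does not apply to {form!r}")
        form = form[:position] + rhs + form[position + 1:]
        steps.append(form)
    return steps


print(" => ".join(leftmost_derivation([2, 5, 3, 3, 1])))
# prints: Z => XYZ => YYZ => bYZ => bbZ => bbd
```

The rule sequence 2, 5, 3, 3, 1 is exactly the "structure" of ‘bbd’ that a parser for G is expected to recover.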
CREATING “RECURSIVE DESCENT” PARSERS
Method Outline:
(1) Each non-terminal N becomes a Method/Procedure N.
(2) The set of productions for a non-terminal N is used to define N’s body:
    - the RHS of a production forms a statement sequence in which non-terminals are method calls and tokens are matched against the head of the input list
    - alternative productions for the same non-terminal form if/case statements
(3) Each Method/Procedure ‘consumes’ tokens: it inputs a list and outputs a smaller list.

RECURSIVE DESCENT PARSER FOR G
(NB - This is rough - e.g. it does not handle failure!)

Method Z(in: S: list, out: S3: list)
  if head S == 'd' then
    record(1); S3 = tail S;
  else
    record(2); call X(S,S1); call Y(S1,S2); call Z(S2,S3)
END

Method Y(in: S: list, out: S1: list)
  if head S == 'b' then
    record(3); S1 = tail S;
  else if head S == 'c' then
    record(4); S1 = tail S;
END

Method X(in: S: list, out: S1: list)
  if head S == 'a' then
    record(6); call Z(tail S,S1);
  else
    record(5); call Y(S,S1)
END

PARSER_for_G(S) = call Z(S, OUT); if OUT == "" then SUCCESS.

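The pseudocode above can be sketched in Python as follows. This is one possible rendering, not the course's reference code: the function names, the ParseError exception and the list-of-rule-numbers result are assumptions made for the illustration. Unlike the rough version, it reports failure (by returning None) instead of leaving the output undefined.

```python
class ParseError(Exception):
    """Raised when no production of the current non-terminal fits the next token."""


def parse_z(tokens, used):
    if tokens[:1] == ["d"]:           # rule 1: Z ::= d
        used.append(1)
        return tokens[1:]
    used.append(2)                    # rule 2: Z ::= XYZ
    rest = parse_x(tokens, used)
    rest = parse_y(rest, used)
    return parse_z(rest, used)


def parse_y(tokens, used):
    if tokens[:1] == ["b"]:           # rule 3: Y ::= b
        used.append(3)
        return tokens[1:]
    if tokens[:1] == ["c"]:           # rule 4: Y ::= c
        used.append(4)
        return tokens[1:]
    raise ParseError(f"expected b or c, found {tokens[:1]}")


def parse_x(tokens, used):
    if tokens[:1] == ["a"]:           # rule 6: X ::= aZ
        used.append(6)
        return parse_z(tokens[1:], used)
    used.append(5)                    # rule 5: X ::= Y
    return parse_y(tokens, used)


def parse(string):
    """Return the list of rules used, or None if G cannot generate the string."""
    used = []
    try:
        rest = parse_z(list(string), used)
    except ParseError:
        return None
    # Success only if every token has been consumed.
    return used if not rest else None


print(parse("bbd"))    # [2, 5, 3, 3, 1]
print(parse("adbd"))   # [2, 6, 1, 3, 1]
print(parse("bb"))     # None
```

Each function takes the remaining token list and returns the smaller list left after its non-terminal has been recognised, mirroring the in/out parameters of the pseudocode.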
RECURSIVE DESCENT PARSER FOR G in PROLOG
(NB - it does not handle failure!)

z([HS|TS], TS) :- HS = 'd', write('rule 1 used'), nl.
z([HS|TS], TS0) :- write('rule 2 used'), nl, x([HS|TS],S1), y(S1,S2), z(S2,TS0).
y([HS|TS], TS) :- HS = 'b', write('rule 3 used'), nl.
y([HS|TS], TS) :- HS = 'c', write('rule 4 used'), nl.
x([HS|TS], TS0) :- HS = 'a', write('rule 6 used'), nl, z(TS,TS0).
x([HS|TS], TS0) :- write('rule 5 used'), nl, y([HS|TS],TS0).

Eg
% assumes "adbd" denotes the list [a,d,b,d] (e.g. the double_quotes flag is set to chars)
| ?- z("adbd",C).
rule 2 used
rule 6 used
rule 1 used
rule 3 used
rule 1 used
true ?

Definition of ‘LL’ Parser:
A Recursive Descent Parser is termed LL(1) because
-- It parses the input from Left to right;
-- It calls the methods corresponding to non-terminals in a Left to right fashion (so it builds a leftmost derivation);
-- It looks at the next 1 Token in the string to decide which branch of the parser to take.

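PROBLEMS with the RD PARSER
BUT the RD parser won’t work unless G is LL(1), which requires that
(i) G is NOT ambiguous
(ii) G is NOT left recursive
(iii) for EVERY two of G’s productions of the form X ::= W1, X ::= W2, it is the case that First(W1) and First(W2) have no common element
[ e.g. add an extra rule 7: Z ::= b. Then First(XYZ) = {a, b, c} and First(b) = {b} share the token b, so Z cannot choose between rules 2 and 7 by looking at one token. ]
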
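Condition (iii) can be checked mechanically. The sketch below computes First sets and reports every pair of productions for the same non-terminal whose First sets overlap. It assumes that no production is nullable (true of G, but not of grammars in general), and first_sets, first_of_rhs and conflicts are hypothetical names chosen for this example.

```python
from itertools import combinations

# Grammar G: rule number -> (left-hand side, right-hand side).
G = {1: ("Z", "d"), 2: ("Z", "XYZ"), 3: ("Y", "b"),
     4: ("Y", "c"), 5: ("X", "Y"), 6: ("X", "aZ")}


def first_sets(rules):
    """First set of every non-terminal, ASSUMING no production is nullable."""
    first = {lhs: set() for lhs, _ in rules.values()}
    changed = True
    while changed:                      # iterate until nothing new is added
        changed = False
        for lhs, rhs in rules.values():
            head = rhs[0]               # only the first symbol matters here
            new = first[head] if head in first else {head}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def first_of_rhs(rhs, first):
    head = rhs[0]
    return first[head] if head in first else {head}


def conflicts(rules):
    """Pairs of rules for one non-terminal whose First sets share a token."""
    first = first_sets(rules)
    found = []
    for (n1, (lhs1, rhs1)), (n2, (lhs2, rhs2)) in combinations(sorted(rules.items()), 2):
        if lhs1 == lhs2:
            common = first_of_rhs(rhs1, first) & first_of_rhs(rhs2, first)
            if common:
                found.append((n1, n2, sorted(common)))
    return found


print(conflicts(G))                          # []
print(conflicts({**G, 7: ("Z", "b")}))       # [(2, 7, ['b'])]
```

For G itself no pair overlaps; with the extra rule 7 the check reports that rules 2 and 7 both start with the token b.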
TABLE DRIVEN PARSERS
Rather than translating a grammar straight into a program, it’s much better to translate it into a “machine” or “table”, forming a “table-driven parser”. Translating a grammar into a table is a good form of analysis as well as one step towards a parser:
GRAMMAR => TABLE => PARSER

TO AUTOMATICALLY CONSTRUCT AN LL(1) PARSING TABLE
The Table is of size i x j: it has i columns corresponding to G’s i Tokens, and j rows corresponding to G’s j non-terminals. Each entry in the table holds zero or more production rules; if G is LL(1), no entry holds more than one.
METHOD:
1. Enter the rule N ::= w in Row N Column m for each m in First(w)
2. For each rule N ::= w, find if w is nullable. If w is nullable, enter N ::= w in Row N Column m for each m in the set Follow(N)

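The two-step METHOD above might be sketched as follows. The code computes the nullable, First and Follow information with a simple fixed-point loop and then fills the table. Everything named here (build_ll1_table, the END marker "$" used as an extra end-of-input column, the empty string standing for an empty RHS) is an assumption made for the illustration rather than part of the slides.

```python
from collections import defaultdict

END = "$"   # assumed end-of-input marker; it is not one of G's tokens

G = {1: ("Z", "d"), 2: ("Z", "XYZ"), 3: ("Y", "b"),
     4: ("Y", "c"), 5: ("X", "Y"), 6: ("X", "aZ")}


def build_ll1_table(rules, start):
    """Map (non-terminal, token) -> list of rule numbers entered in that cell."""
    nonterminals = {lhs for lhs, _ in rules.values()}
    nullable = set()
    first = {n: set() for n in nonterminals}
    follow = {n: set() for n in nonterminals}
    follow[start].add(END)

    def first_of(symbols):
        """Return (First set, is-nullable) for a string of grammar symbols."""
        result = set()
        for symbol in symbols:
            if symbol not in nonterminals:      # a token ends the scan
                result.add(symbol)
                return result, False
            result |= first[symbol]
            if symbol not in nullable:
                return result, False
        return result, True                     # every symbol was nullable

    changed = True
    while changed:                              # repeat until nothing grows
        changed = False
        for lhs, rhs in rules.values():
            rhs_first, rhs_nullable = first_of(rhs)
            if rhs_nullable and lhs not in nullable:
                nullable.add(lhs)
                changed = True
            if not rhs_first <= first[lhs]:
                first[lhs] |= rhs_first
                changed = True
            for i, symbol in enumerate(rhs):
                if symbol in nonterminals:
                    tail_first, tail_nullable = first_of(rhs[i + 1:])
                    new = tail_first | (follow[lhs] if tail_nullable else set())
                    if not new <= follow[symbol]:
                        follow[symbol] |= new
                        changed = True

    table = defaultdict(list)
    for number, (lhs, rhs) in sorted(rules.items()):
        rhs_first, rhs_nullable = first_of(rhs)
        # Step 1: columns in First(w); step 2: add Follow(N) if w is nullable.
        columns = rhs_first | (follow[lhs] if rhs_nullable else set())
        for token in columns:
            table[lhs, token].append(number)
    return dict(table)


for (row, column), numbers in sorted(build_ll1_table(G, "Z").items()):
    print(f"Row {row}, Column {column}: rules {numbers}")
```

For G every filled cell contains exactly one rule, which is what makes the table usable by a deterministic parser. Since G has no nullable productions, step 2 never fires for it; adding rule 7 (Z ::= b) would put both rule 2 and rule 7 in Row Z, Column b.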
TO RUN AN LL(1) PARSING TABLE ...
... on a string in its language (Exercise: adjust it to make it execute on any string)

Program Run-parsing-table:
  % assume input string is consumed a token at a time
  M = special symbol;
  a = get_token;
  call Loop(M,a);
end

Loop(in: M, in/out: a)
  1. Apply the production P in Row M Column a;
  2. For each of the symbols w in the RHS of P
       if w is a Non-Terminal: call Loop(w,a);
       else if w is a Token: a = get_token
end

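A minimal Python sketch of this program, with the table for G written out by hand, is given below. It also takes one possible approach to the exercise by returning None for strings outside the language. The names TABLE, END, run_table and loop, and the "$" end-of-input marker, are assumptions introduced for the example.

```python
RULES = {1: ("Z", "d"), 2: ("Z", "XYZ"), 3: ("Y", "b"),
         4: ("Y", "c"), 5: ("X", "Y"), 6: ("X", "aZ")}

# LL(1) table for G: (row non-terminal, column token) -> rule number.
TABLE = {("Z", "d"): 1, ("Z", "a"): 2, ("Z", "b"): 2, ("Z", "c"): 2,
         ("Y", "b"): 3, ("Y", "c"): 4,
         ("X", "b"): 5, ("X", "c"): 5, ("X", "a"): 6}

END = "$"   # assumed end-of-input marker


def run_table(string, start="Z"):
    """Return the rules applied, or None if the string is not in the language."""
    tokens = iter(list(string) + [END])
    a = next(tokens)                  # current lookahead token
    used = []

    def loop(m):
        nonlocal a
        number = TABLE.get((m, a))    # step 1: production in Row m, Column a
        if number is None:
            raise ValueError(f"no rule for {m} on {a!r}")
        used.append(number)
        for w in RULES[number][1]:    # step 2: walk the RHS left to right
            if w.isupper():           # non-terminal: recurse
                loop(w)
            elif w == a:              # token: must match, then advance
                a = next(tokens)
            else:
                raise ValueError(f"expected {w!r}, found {a!r}")

    try:
        loop(start)
    except ValueError:
        return None
    # The whole input must have been consumed.
    return used if a == END else None


print(run_table("bbd"))    # [2, 5, 3, 3, 1]
print(run_table("adbd"))   # [2, 6, 1, 3, 1]
print(run_table("dd"))     # None
```

The lookahead a is shared by all recursive calls, which plays the role of the in/out parameter in the pseudocode.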
SUMMARY
LL parsers are top-down: they start at the special symbol and try to parse a string from left to right.
We have seen how to construct two kinds of LL(1) parser: the recursive descent method and the table-driven method.
NEXT WEEK - LR table-driven parsing - how JavaCup constructs its parsers