# Recursive Descent Parsing (with combinators) Greg Morrisett.


Last Time

We saw how to use combinators to build not just a lexer, but a parser. The only difference is that parsers are generally recursive, and that recursion can get us into trouble.

For Example

Suppose we have a grammar that looks like this:

    intlist -> INT intlist | ε

Using our Combinators

Grammar: intlist -> INT intlist | ε

    let int_p (ts:token list) =
      match ts with
      | (INT i)::rest -> [(i,rest)]
      | _ -> []

    let rec intlist_p = fun ts ->
      ((int_p $ intlist_p) % cons ++ eps) ts
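The slide relies on `$`, `%`, `++`, `eps` and `cons` from the previous lecture without repeating their definitions. The following is a minimal sketch of what they might look like, so that the right-recursive parser can actually be run. The `token` type, the `parse_fn` type alias, and the choice to have `eps` yield an empty list are assumptions made for illustration, not definitions taken from the slides.

```ocaml
(* Assumed token type: the slides only show the INT, PLUS and TIMES constructors. *)
type token = INT of int | PLUS | TIMES

(* A parser returns every (value, remaining tokens) pair it can produce. *)
type 'a parse_fn = token list -> ('a * token list) list

(* Sequencing: run p1, then run p2 on whatever p1 left over. *)
let ( $ ) (p1 : 'a parse_fn) (p2 : 'b parse_fn) : ('a * 'b) parse_fn = fun ts ->
  List.concat
    (List.map
       (fun (a, ts1) -> List.map (fun (b, ts2) -> ((a, b), ts2)) (p2 ts1))
       (p1 ts))

(* Mapping: apply f to the value of every result. *)
let ( % ) (p : 'a parse_fn) (f : 'a -> 'b) : 'b parse_fn = fun ts ->
  List.map (fun (a, rest) -> (f a, rest)) (p ts)

(* Alternation: collect the results of both parsers. *)
let ( ++ ) (p1 : 'a parse_fn) (p2 : 'a parse_fn) : 'a parse_fn = fun ts ->
  p1 ts @ p2 ts

(* Empty alternative (assumed form): consumes nothing, yields the empty list. *)
let eps : 'a list parse_fn = fun ts -> [ ([], ts) ]

let cons (x, xs) = x :: xs

let int_p : int parse_fn = function
  | INT i :: rest -> [ (i, rest) ]
  | _ -> []

(* intlist -> INT intlist | ε *)
let rec intlist_p : int list parse_fn = fun ts ->
  ((int_p $ intlist_p) % cons ++ eps) ts

let results = intlist_p [ INT 1; INT 2 ]
```

With these definitions, `results` contains one entry per way of stopping: the full list `[1; 2]` with no tokens left, `[1]` with `INT 2` left over, and `[]` with both tokens left over. OCaml's built-in operator precedences happen to make `%` bind tighter than `++`, which binds tighter than `$`, so the slide's expressions group as intended.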

A Manual Parser

Grammar: intlist -> INT intlist | ε

    let rec intlist_p ts =
      match ts with
      | (INT i)::rest ->
          let (ints,ts') = intlist_p rest in
          (i::ints, ts')
      | _ -> ([], ts)

For Example

But what if we instead wrote:

    intlist -> intlist INT | ε

Now the grammar is left-recursive: in one alternative we reach the non-terminal intlist before we have seen any terminal.

Using our Combinators

Grammar: intlist -> intlist INT | ε

    let int_p (ts:token list) =
      match ts with
      | (INT i)::rest -> [(i,rest)]
      | _ -> []

    let rec intlist_p = fun ts ->
      ((intlist_p $ int_p) % cons_end ++ eps) ts

A Manual Parser

Grammar: intlist -> intlist INT | ε

    let rec intlist_p ts =
      let (ints, ts') = intlist_p ts in
      match ts' with
      | (INT i)::rest -> (ints @ [i], rest)
      | _ -> ([], ts)

Oops! That's definitely going to loop forever: intlist_p calls itself on the same ts before it has consumed any token. So we want to avoid writing grammars that are left-recursive.
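For contrast, here is a sketch of a manual parser that accepts the same token sequences as the left-recursive rule but terminates, because it consumes an `INT` before every recursive call. The helper name `loop` and the `token` type are assumptions for illustration; this is one possible workaround, not something shown on the slides.

```ocaml
(* Assumed token type: the slides only show the INT, PLUS and TIMES constructors. *)
type token = INT of int | PLUS | TIMES

(* Accepts a (possibly empty) run of INT tokens, accumulating left to right.
   Each recursive call to loop receives a strictly shorter token list. *)
let intlist_p (ts : token list) : int list * token list =
  let rec loop acc ts =
    match ts with
    | INT i :: rest -> loop (i :: acc) rest
    | _ -> (List.rev acc, ts)
  in
  loop [] ts

let parsed = intlist_p [ INT 1; INT 2; INT 3; PLUS ]
```

Here `parsed` is the pair of the integers read so far and the unconsumed tokens, starting at `PLUS`.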

Another Example

Grammar: exp -> INT | exp '+' exp

    let rec exp_p ts =
      (int_p ++
       (exp_p $ tok PLUS $ exp_p) % (function ((i,_),j) -> i+j)) ts

Inlining "++"

    let rec exp_p ts =
      (int_p ts) @
      (((exp_p $ tok PLUS $ exp_p) % (function ((i,_),j) -> i+j)) ts)

Inlining "$" and "%"

    let rec exp_p ts =
      (int_p ts) @
      let s1 = exp_p ts in
      fold_right (function (i,ts1) a ->...)

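Note: infinite loop!

    let rec exp_p ts =
      (int_p ts) @
      let s1 = exp_p ts in
      fold_right (function (i,ts1) a ->...)

The binding of s1 calls exp_p on the same token list ts, with nothing consumed, so exp_p recurses forever.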

Refactoring the Grammar

Original: exp -> INT | exp '+' exp

Refactored: exp -> INT | INT '+' exp

This accepts the same strings, but is no longer left-recursive.

With our Combinators

Grammar: exp -> INT | INT '+' exp

    let rec exp_p ts =
      (int_p ++
       (int_p $ tok PLUS $ exp_p) % (function ((i,_),j) -> i+j)) ts
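To see that the refactored parser terminates, it can be run on a short input. The sketch below repeats compact versions of the combinators so that it is self-contained; the `token` type, the `parse_fn` alias and the definition of `tok` are assumptions for illustration rather than the course's actual library.

```ocaml
(* Assumed token type and parser type. *)
type token = INT of int | PLUS | TIMES
type 'a parse_fn = token list -> ('a * token list) list

(* Sequencing, mapping and alternation, in the same spirit as the slides. *)
let ( $ ) p1 p2 = fun ts ->
  List.concat
    (List.map
       (fun (a, ts1) -> List.map (fun (b, ts2) -> ((a, b), ts2)) (p2 ts1))
       (p1 ts))
let ( % ) p f = fun ts -> List.map (fun (a, rest) -> (f a, rest)) (p ts)
let ( ++ ) p1 p2 = fun ts -> p1 ts @ p2 ts

let int_p : int parse_fn = function
  | INT i :: rest -> [ (i, rest) ]
  | _ -> []

(* Matches exactly the token t (assumed definition of tok). *)
let tok (t : token) : token parse_fn = function
  | t' :: rest when t' = t -> [ (t', rest) ]
  | _ -> []

(* exp -> INT | INT '+' exp *)
let rec exp_p ts =
  (int_p ++ (int_p $ tok PLUS $ exp_p) % (function ((i, _), j) -> i + j)) ts

let results = exp_p [ INT 3; PLUS; INT 4 ]
```

On the tokens for "3 + 4" this yields two results: the value 3 with `PLUS` and `INT 4` left over, and the value 7 with no tokens left. Every recursive call to `exp_p` happens only after an `INT` and a `PLUS` have been consumed.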

Unwinding the definitions

    let rec exp_p ts =
      (int_p ts) @
      let s1 = int_p ts in
      fold_right (function (i,ts2) ->
        match ts2 with
        | PLUS::ts3 -> let s2 = exp_p ts3 in...

By the time we do the recursive call, the list of tokens is smaller: ts3 is ts with the leading INT and PLUS removed.

Let's Scale Up

Grammar: exp -> INT | exp '+' exp | exp '*' exp

In addition to the problem with left-recursion, the grammar is ambiguous: we'll get multiple parse results for an expression like "3 + 2 * 6".
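The two parse results for "3 + 2 * 6" can be written out by hand as syntax trees to see why the ambiguity matters. The `exp` datatype and `eval` function below are hypothetical helpers introduced only for this illustration; the slides themselves compute integers directly.

```ocaml
(* Hypothetical syntax tree type, used only to illustrate the ambiguity. *)
type exp =
  | Int of int
  | Add of exp * exp
  | Mul of exp * exp

let rec eval = function
  | Int i -> i
  | Add (a, b) -> eval a + eval b
  | Mul (a, b) -> eval a * eval b

(* Reading 1: 3 + (2 * 6), where '*' groups first. *)
let parse_a = Add (Int 3, Mul (Int 2, Int 6))

(* Reading 2: (3 + 2) * 6, where '+' groups first. *)
let parse_b = Mul (Add (Int 3, Int 2), Int 6)

let value_a = eval parse_a
let value_b = eval parse_b
```

The first tree evaluates to 15 and the second to 30, so a parser that returns both has not pinned down the meaning of the input.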

Getting Rid of Left Recursion

Grammar: exp -> INT | INT '+' exp | INT '*' exp

    let rec exp_p ts =
      (int_p ++
       (int_p $ tok PLUS $ exp_p) %... ++
       (int_p $ tok TIMES $ exp_p) %...) ts

Grouping

    exp -> term | term '+' exp
    term -> INT | INT '*' term

Grouping

Grammar:

    exp -> term | term '+' exp
    term -> INT | INT '*' term

Parser:

    let rec term ts =
      (int_p ++ (int_p $ tok TIMES $ term) %...) ts
    and exp ts =
      (term ++ (term $ tok PLUS $ exp) %...) ts
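The grouped grammar can be tried end to end with the sketch below. The semantic actions elided on the slide are filled in here with integer addition and multiplication; that choice, along with the `token` type, `tok`, and the `complete` helper that keeps only parses consuming all input, is an assumption for illustration.

```ocaml
(* Assumed token type and parser type. *)
type token = INT of int | PLUS | TIMES
type 'a parse_fn = token list -> ('a * token list) list

let ( $ ) p1 p2 = fun ts ->
  List.concat
    (List.map
       (fun (a, ts1) -> List.map (fun (b, ts2) -> ((a, b), ts2)) (p2 ts1))
       (p1 ts))
let ( % ) p f = fun ts -> List.map (fun (a, rest) -> (f a, rest)) (p ts)
let ( ++ ) p1 p2 = fun ts -> p1 ts @ p2 ts

let int_p : int parse_fn = function
  | INT i :: rest -> [ (i, rest) ]
  | _ -> []

let tok (t : token) : token parse_fn = function
  | t' :: rest when t' = t -> [ (t', rest) ]
  | _ -> []

(* term -> INT | INT '*' term      (action assumed: multiply)
   exp  -> term | term '+' exp     (action assumed: add) *)
let rec term ts =
  (int_p ++ (int_p $ tok TIMES $ term) % (function ((i, _), j) -> i * j)) ts
and exp ts =
  (term ++ (term $ tok PLUS $ exp) % (function ((i, _), j) -> i + j)) ts

(* Hypothetical helper: keep only the values of parses that consumed every token. *)
let complete results =
  List.map fst (List.filter (fun (_, rest) -> rest = []) results)

let v1 = complete (exp [ INT 3; PLUS; INT 2; TIMES; INT 6 ])
let v2 = complete (exp [ INT 3; TIMES; INT 2; PLUS; INT 6 ])
```

Under these assumptions each input has exactly one complete parse: "3 + 2 * 6" gives 15 and "3 * 2 + 6" gives 12, so multiplication binds tighter than addition in both orders.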