# Layered Combinator Parsers with a Unique State Pieter Koopman Rinus Plasmeijer Nijmegen, The Netherlands.

## Presentation on theme: "Layered Combinator Parsers with a Unique State Pieter Koopman Rinus Plasmeijer Nijmegen, The Netherlands."— Presentation transcript:

Layered Combinator Parsers with a Unique State Pieter Koopman Rinus Plasmeijer Nijmegen, The Netherlands

Parser CombinatorsPieter Koopman2 Overview conventional parser combinators requirements new combinators system-architecture new parser combinators separate scanner and parser error handling

Parser CombinatorsPieter Koopman3 parser combinators Non deterministic, list of results :: Parser s r :== [s] -> [ ParseResult s r ] :: ParseResult s r :== ([s],r) fail & yield fail = \ss = [] yield r = \ss = [(ss,r)] recognize symbol satisfy :: (s->Bool) -> Parser s s satisfy f = p where p [s:ss] | f s = [(ss,s)] p _ = [] symbol sym :== satisfy ((==) sym)

Parser CombinatorsPieter Koopman4 parser combinators 2 sequence-combinators ( ) infixr 6::(Parser s r)(r->Parser s t)->Parser s t ( ) p1 p2 = \ss1 = [ tuple \\ (ss2,r1) <- p1 ss1, tuple <- p2 r1 ss2 ] ( )infixl 6::(Parser s(r->t))(Parser s r)->Parser s t ( ) p1 p2 = \ss1 = [ (ss3,f r) \\ (ss2,f) <- p1 ss1, (ss3,r) <- p2 ss2 ] choose-combinator ( ) infixr 4::(Parser s r) (Parser s r)->Parser s r ( ) p1 p2 = \ss = p1 ss ++ p2 ss

Parser CombinatorsPieter Koopman5 parser combinators 3 some useful abbreviations (@>) infixr 7 (@>) f p :== yield f p ( ) infixl 6 ( ) p1 p2 :== (\h t=[h:t]) @> p1 p2

Parser CombinatorsPieter Koopman6 parser combinators 4 Kleene star star p = p star p yield [] plus p = p star p parsing an identifier identifier :: Parser Char String identifier = toString @> satisfy isAlpha star (satisfy isAlphanum)

Parser CombinatorsPieter Koopman7 parser combinators 5 context sensitive parsers twice the same character doubleChar = satisfy isAlpha \c -> symbol c arbitrary look ahead lookAhead = symbol 'a' +> symbol 'b' symbol 'a' +> symbol 'c'

Parser CombinatorsPieter Koopman8 parser combinators 5 context sensitive parsers twice the same character doubleChar = satisfy isAlpha \c -> symbol c arbitrary look ahead lookAhead = symbol 'a' +> symbol 'b' symbol 'a' +> symbol 'c' star (satisfy isSpace) +> symbol 'a' symbol 'x'

Parser CombinatorsPieter Koopman9 properties of combinators + concise and clear parsers + full power of fpl available + context sensitive + arbitrary look-ahead + can be efficient, continuations IFL '98 - no error handling (messages & recovery) - no unique symbol tables - separate scanner yields problems scan entire file before parser starts

Parser CombinatorsPieter Koopman10 Requirements parse state with error file notion of position user-defined extension e.g. symbol table possibility to add separate scanner efficient implementation, continuations for programming languages we want a single result (deterministic grammar)

Parser CombinatorsPieter Koopman11 Uniqueness files and windows that should be single-threaded are unique in Clean fwritec :: Char *File -> *File data-structures can be updated destructively when they are unique only unique arrays can be changed

Parser CombinatorsPieter Koopman12 System-architecture replace the list of symbols by a structure containing actual input position error administration user defined part of the state use a type constructor class to allow multiple levels

Parser CombinatorsPieter Koopman13 Type constructor class Reading a symbol class PSread ps s st :: (*ps s *st)->(s, *ps s *st) Copying the state is not allowed, use functions to manipulate the input class PSsplit ps s st :: (s, *ps s *st)->(s, *ps s *st) class PSback ps s st :: (s, *ps s *st)->(s, *ps s *st) class PSclear ps s st :: (s, *ps s *st)->(s, *ps s *st) Minimal parser state requires Clean 2.0 class ParserState ps symbol state | PSread, PSsplit, PSback, PSclear ps symbol state

Parser CombinatorsPieter Koopman14 New parser combinators Parsers have three arguments 1. success-continuation determines action upon success SuccCont :== Item failCont State -> (Result, State) 2. fail-continuation specifies what to do if parser fails FailCont :== State -> (Result, State) 3. current input state State :== (Symbol, ParserState)

Parser CombinatorsPieter Koopman15 New parser combinators 2 yield and fail, apply appropriate continuation yield r = \succ fail tuple = succ r fail tuple failComb = \succ fail tuple = fail tuple sequence of parsers, change continuation p1 p2 = \sc fc t -> p1 (\a _ -> p2 a sc fc) fc t choice, change continuations ( ) p1 p2 = \succ fail tuple = p1 (\r f t = succ r fail (PSclear t)) (\t2 = p2 succ fail (PSback t2)) (PSsplit tuple)

Parser CombinatorsPieter Koopman16 string input a very simple instance of ParserState :: *StringInput symbol state = { si_string :: String // string holds input, si_pos :: Int // index of current char, si_hist :: [Int] // to remember old positions, si_state :: state // user-defined extension, si_error :: ErrorState } instance PSread StringInput Char state where PSread si=:{si_string,si_pos} = (si_string.[si_pos],{si & si_pos = si_pos+1}) instance PSsplit StringInput Char state where PSsplit (c,si=:{si_pos,si_hist}) = (c,{si & si_hist = [si_pos:si_hist]}) instance PSback StringInput Char state where PSback (_,si=:{si_string,si_hist=[h:t]}) = (si_string.[h-1],{si & si_pos = h, si_hist = t})

Parser CombinatorsPieter Koopman17 Separate scanner and parser sometimes it is convenient to have a separate scanner e.g. to implement the offside rule task of scanner and parser is similar. So, use the same combinators due to the type constructor class we can nest parser states

Parser CombinatorsPieter Koopman18 a simple scanner use of combinators doesn’t change produces tokens (algebraic datatype) scanner = skipSpace +> ( generateOffsideToken satisfy isAlpha star (satisfy isAlphanum) plus (satisfy isDigit) symbol '=' symbol '(' symbol ')' <@ K CloseToken )

Parser CombinatorsPieter Koopman19 generating offside tokens use an ordinary parse function generateOffsideToken = pAcc getCol \col -> // get current coloumn pAcc getOffside \os_col -> // get offside position handleOS col os_col where handleOS col os_col | EndGroupGenerated os_col | col < os_col = pApp popOffside (yield EndOfGroupToken) = pApp ClearEndGroup failComb | col <= os_col = pApp SetEndGroup (yield EndOfDefToken) = failComb

Parser CombinatorsPieter Koopman20 Parser state for nesting parser state contains scanner and its state :: *NestedInput token state = E..ps sym scanState: { ni_scanSt :: (ps sym scanState), ni_scanner :: (ps sym scanState) -> *(token, ps sym scanState)), ni_buffer :: [token], ni_history :: [[token]], ni_state :: state } can be nested to any depth we can, but doesn’t have to, use this

Parser CombinatorsPieter Koopman21 Parser state for nesting 2 NestedInput *File *ErrorState *OffsideState ScanState scanner *HashTable

Parser CombinatorsPieter Koopman22 Parser state for nesting 3 apply scanner to read token instance PSread NestedState token state where PSread ns=:{ns_scanner, ns_scanSt} # (tok, state) = ns_scanner ns_scanSt = (tok, {ns & ns_scanSt = state}) here, we ignored the buffer define instances for other functions in class ParserState

Parser CombinatorsPieter Koopman23 error handling general error correction is difficult correct simple errors skip to new definition otherwise Good error messages: location:position in file what are we parsing:stack of contexts Error [t.icl,20,[caseAlt,Expression]]: ) expected instead of =

Parser CombinatorsPieter Koopman24 error handling 2 basic error generation parseError expected val = \succ fail (t,ps) = let msg = toString expected +++ " expected instead of " +++ toString t in succ val fail (PSerror msg (PSread ps)) useful primitives wantSymbol sym = symbol sym parseError sym sym want p msg value = p parseError msg value skipToSymbol sym = symbol sym parseError sym sym +> star (satisfy ((<>) sym)) +> symbol sym

Parser CombinatorsPieter Koopman25 Parser Parsing expressions pExpression = "Expression" ::> BV @> match mBasicValue pIdentifier symbol CaseToken +> pDeter Case @> pCompoundExpression star pCaseAlt <+ skipToSymbol EndOfGroupToken symbol OpenToken +> pCompoundExpression <+ wantSymbol CloseToken

Parser CombinatorsPieter Koopman26 identifiers in hashtable use a parse-function hashtable is user defined state in ParserState pIdentifier = match mIdentToken \ident = pAccSt (putNameInHashTable ident) <@ \name={app_symb=UnknownSymbol name, app_args=[]} the function pAccSt applies a function to the user defined state

Parser CombinatorsPieter Koopman27 limitations of this approach syntax specified by parse functions grammar is not a datastructure no detection of left recursion runtime error instead of nice message no automatic left-factoring do it by hand, or runtime overhead p1 = p q1 p q2 p2 = p (q1 q2)

Parser CombinatorsPieter Koopman28 discussion old advantages concise, fpl-power, arbitrary look ahead, context sensitve new advantages unique and extendable parser state one or more layers decent error handling, simple error correction can be added still efficient, overhead < 2 non-determinism only when needed

Download ppt "Layered Combinator Parsers with a Unique State Pieter Koopman Rinus Plasmeijer Nijmegen, The Netherlands."

Similar presentations