CSE 425: Syntax II


1 Context Free Grammars and BNF
In context free grammars (CFGs), structures are independent of the other structures surrounding them
Backus-Naur form (BNF) notation describes CFGs
–Symbols are either tokens or nonterminal symbols
–Productions are of the form nonterminal → definition, where definition defines the structure of a nonterminal
–Rules may be recursive, with a nonterminal symbol appearing both on the left side of a production and in its own definition
–Metasymbols separate the parts of a production (arrow) and the alternative definitions of a nonterminal (vertical bar)
–Next time we’ll extend the metasymbols to cover repeated (braces) or optional (square brackets) structure in a definition (EBNF)

2 Parse Trees and Abstract Syntax Trees
Parse trees show derivation of a structure from BNF
–E.g., number → DIGIT | DIGIT number
Abstract syntax trees (ASTs) encapsulate the details
–Very useful for converting between structurally similar forms
[Figure: a parse tree (node labels: number, DIGIT, number, DIGIT, 4, 2, 5) beside an abstract syntax tree (node labels: hornclause, body, head, predicate, …)]
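The production number → DIGIT | DIGIT number can be turned into a small tree-building function. The following is a minimal sketch, not code from the course: the names NumberNode, parseNumber, depth, and toValue are illustrative assumptions, and input is read as plain characters rather than as tokens from a scanner. It builds the nested parse tree one DIGIT per level, then collapses it into the single integer value that an AST leaf might carry.

```cpp
#include <cctype>
#include <cstddef>
#include <memory>
#include <string>

// Parse-tree node for the production  number -> DIGIT | DIGIT number.
// (NumberNode and its members are hypothetical names for illustration.)
struct NumberNode {
    char digit = '0';                   // the DIGIT at this level of the tree
    std::unique_ptr<NumberNode> rest;   // nested number, or null for the last digit
};

// Builds the parse tree for the digits starting at pos.
// Returns null if text[pos] is not a DIGIT; otherwise advances pos
// past every digit that was consumed.
std::unique_ptr<NumberNode> parseNumber(const std::string& text, std::size_t& pos) {
    if (pos >= text.size() ||
        !std::isdigit(static_cast<unsigned char>(text[pos]))) {
        return nullptr;
    }
    auto node = std::make_unique<NumberNode>();
    node->digit = text[pos++];
    node->rest = parseNumber(text, pos);  // stays null when no digit follows
    return node;
}

// Counts the nested number nodes: one level per DIGIT in the input.
int depth(const NumberNode* node) {
    int levels = 0;
    for (; node != nullptr; node = node->rest.get()) {
        ++levels;
    }
    return levels;
}

// Collapses the parse tree into the one integer an AST leaf might hold,
// hiding the digit-by-digit derivation.
int toValue(const NumberNode* node) {
    int value = 0;
    for (; node != nullptr; node = node->rest.get()) {
        value = value * 10 + (node->digit - '0');
    }
    return value;
}
```

For the input 425, the parse tree has three nested number nodes, while the abstract form is just the value 425.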

3 Ambiguity, Associativity, Precedence
If any statement in the language has more than one distinct parse tree, the grammar is ambiguous
–Ambiguity can be removed implicitly, as in always replacing the leftmost remaining nonterminal (an implementation hack)
Recursive production structure can also disambiguate
–E.g., adding another production to the grammar to establish precedence (operators lower in the parse tree get higher precedence)
–E.g., replacing exp → exp + exp with one of the alternative productions exp → exp + term or exp → term + exp
Recursive productions also define associativity
–I.e., the left-recursive form exp → exp + term is left-associative, and the right-recursive form exp → term + exp is right-associative
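The effect of the two recursive forms is easier to see with an operator for which grouping matters. The sketch below is an illustration under stated assumptions, not course code: it swaps + for - (so the two groupings give different answers), treats term as a single digit, and assumes well-formed input such as 8-3-2 with no spaces. The function names are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Left-recursive form  exp -> exp - term | term  (term is one digit here).
// Evaluates s[0, end): the LAST term is the right operand and everything
// before the final '-' is the nested exp, so grouping leans to the left.
int evalLeft(const std::string& s, std::size_t end) {
    int right = s[end - 1] - '0';
    if (end >= 3 && s[end - 2] == '-') {
        return evalLeft(s, end - 2) - right;
    }
    return right;
}

// Right-recursive form  exp -> term - exp | term.
// The FIRST term is the left operand and everything after the first '-'
// is the nested exp, so grouping leans to the right.
int evalRight(const std::string& s, std::size_t& pos) {
    int left = s[pos++] - '0';
    if (pos < s.size() && s[pos] == '-') {
        ++pos;  // consume '-'
        return left - evalRight(s, pos);
    }
    return left;
}

// Convenience wrappers; both assume non-empty, well-formed input.
int leftAssoc(const std::string& s) { return evalLeft(s, s.size()); }

int rightAssoc(const std::string& s) {
    std::size_t pos = 0;
    return evalRight(s, pos);
}
```

With the input 8-3-2, leftAssoc computes (8-3)-2 = 3 and rightAssoc computes 8-(3-2) = 7, which is why the choice of recursive form matters for non-associative operators.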

4 Extended Backus-Naur Form (EBNF)
Optional/repeated structure is common in programs
–E.g., whether or not there are any arguments to a function
–E.g., if there are arguments, how many there are
We can extend BNF with metasymbols
–E.g., square brackets indicate optional elements, as in the production function → name ‘(’ [args] ‘)’
–E.g., curly braces indicate zero or more repetitions of elements, as in the production args → arg {‘,’ arg}
–Doesn’t change the expressive power of the grammar
A limitation of EBNF is that it obscures associativity
–Better to use standard BNF to generate parse/syntax trees

5 Recursive-Descent Parsing
Shift-reduce (bottom-up) parsing techniques are powerful, but complex to design/implement manually
–Further details about them are in another course (CSE 431)
–You will still want to understand how they work and be able to use the techniques
Recursive-descent (top-down) parsing is often more straightforward, and can be used in many cases
–We’ll focus on these techniques somewhat in this course
Key idea is to design (potentially recursive) parsing functions based on the productions’ right-hand sides
–Then, work through a grammar from more general rules to more specific ones, consuming input tokens upon a match
–EBNF helps with left recursion removal (making a loop) and left factoring (making the remainder of a parse function optional)
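As a minimal sketch of this idea, the parser below has one function per nonterminal and uses a loop wherever the EBNF has braces. The grammar is an assumption chosen for illustration (exp → term {‘+’ term}, term → factor {‘*’ factor}, factor → DIGIT | ‘(’ exp ‘)’), as are the class name ExprParser and its helpers; it reads single characters instead of scanner tokens and does not skip whitespace.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Recursive-descent evaluator for an assumed EBNF grammar:
//   exp    -> term   { '+' term }
//   term   -> factor { '*' factor }
//   factor -> DIGIT | '(' exp ')'
class ExprParser {
public:
    explicit ExprParser(std::string input) : input_(std::move(input)) {}

    // Parses the whole input; throws std::runtime_error on a syntax error.
    int parse() {
        int value = exp();
        if (pos_ != input_.size()) {
            throw std::runtime_error("unexpected trailing input");
        }
        return value;
    }

private:
    // Current token (a single character), or '\0' at end of input.
    char peek() const { return pos_ < input_.size() ? input_[pos_] : '\0'; }

    // The loop replaces the left-recursive production exp -> exp + term.
    int exp() {
        int value = term();
        while (peek() == '+') {
            ++pos_;              // consume '+'
            value += term();
        }
        return value;
    }

    // term sits below exp in the grammar, so '*' binds tighter than '+'.
    int term() {
        int value = factor();
        while (peek() == '*') {
            ++pos_;              // consume '*'
            value *= factor();
        }
        return value;
    }

    int factor() {
        if (peek() == '(') {
            ++pos_;              // consume '('
            int value = exp();
            if (peek() != ')') {
                throw std::runtime_error("expected ')'");
            }
            ++pos_;              // consume ')'
            return value;
        }
        if (std::isdigit(static_cast<unsigned char>(peek()))) {
            return input_[pos_++] - '0';
        }
        throw std::runtime_error("expected a digit or '('");
    }

    std::string input_;
    std::size_t pos_ = 0;
};
```

Calling ExprParser("2+3*4").parse() works from the most general rule (exp) down to the most specific (factor), and each loop accumulates its result left to right, so the loop form keeps left associativity even though the recursion is gone.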

6 Lookahead with First and Follow Sets
Recursive descent parsing functions are easiest to write if they only have to consider the current token
–I.e., the head of a stream or list of input tokens
Optional and repeated elements complicate this a bit
–E.g., function → name ‘(’ [args] ‘)’ and arg → 0 |…| 9 and args → arg {‘,’ arg}, with ‘(’, ‘)’, ‘,’ and 0 |…| 9 as terminal symbols
But, EBNF structure helps in handling these two cases
–First set: the set of tokens that can be first in a valid sequence, e.g., each digit in 0 |…| 9 is in the first set for arg (and for args)
–Follow set: the set of tokens that can follow a valid sequence of tokens, e.g., ‘)’ is in the follow set for args
–A token from the first set gives a parse function permission to start, while one from the follow set directs it to end
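A rough sketch of how the first and follow sets drive the parse functions for this grammar appears below. Several details are assumptions made for illustration: name is taken to be one or more letters (the slide does not define it), tokens are single characters with no whitespace, and CallParser, inFirstOfArg, inFollowOfArgs, and countArgs are hypothetical names.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// First set of arg (and of args): the digits 0..9.
bool inFirstOfArg(char c) {
    return std::isdigit(static_cast<unsigned char>(c)) != 0;
}

// Follow set of args in  function -> name '(' [args] ')'.
bool inFollowOfArgs(char c) { return c == ')'; }

// One-token-lookahead recognizer for:
//   function -> name '(' [args] ')'
//   args     -> arg { ',' arg }
//   arg      -> 0 | ... | 9
// ASSUMPTION: name is one or more letters.
struct CallParser {
    std::string input;
    std::size_t pos = 0;
    int argCount = 0;

    // Current token, or '\0' at end of input.
    char peek() const { return pos < input.size() ? input[pos] : '\0'; }

    bool match(char c) {
        if (peek() != c) return false;
        ++pos;
        return true;
    }

    bool name() {
        std::size_t start = pos;
        while (std::isalpha(static_cast<unsigned char>(peek()))) ++pos;
        return pos > start;
    }

    bool arg() {
        if (!inFirstOfArg(peek())) return false;  // no permission to start
        ++pos;
        ++argCount;
        return true;
    }

    bool args() {
        if (!arg()) return false;
        while (peek() == ',') {                   // repetition: { ',' arg }
            ++pos;
            if (!arg()) return false;
        }
        return inFollowOfArgs(peek());            // follow token directs args to end
    }

    bool function() {
        if (!name() || !match('(')) return false;
        if (inFirstOfArg(peek()) && !args()) {    // optional part: [args]
            return false;
        }
        return match(')') && pos == input.size();
    }
};

// Returns the number of arguments in a valid call, or -1 if text is invalid.
int countArgs(const std::string& text) {
    CallParser parser;
    parser.input = text;
    return parser.function() ? parser.argCount : -1;
}
```

For example, countArgs("max(4,2,5)") sees a digit after ‘(’, so args is entered and three arguments are counted; countArgs("f()") sees ‘)’ instead, so the optional args is skipped and the count is 0.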

7 Today’s Studio Exercises
We’ll code up ideas from Scott Chapter 2.3
–Looking at more ideas and mechanisms for parsing, especially ones that are relevant to the lab assignment
Today’s exercises are again all in C++
–Please take advantage of the on-line tutorial and reference manual pages that are linked on the course web site
–As always, please ask us for help as needed
When done, email your answers to the course account with “Syntax Studio II” in the subject line

