5. Context-Free Grammars and Languages


1 5. Context-Free Grammars and Languages
CIS Automata and Formal Languages – Pei Wang

2 Languages and grammars
Regular expression: constants and operators.
Grammar: variables and rewriting rules.
Difference: whether a pattern is given a name.
Example: binary palindromes do not form a regular language, but they can be specified as
P → ε | 0 | 1 | 0P0 | 1P1
where ‘P’ is a variable, ‘→’ is the production symbol, and ‘|’ separates alternatives.
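
The palindrome grammar can be read directly as a recursive membership test. The following is a minimal sketch, not part of the original slides; the function name is_binary_palindrome is an assumption made for illustration.

```python
def is_binary_palindrome(s: str) -> bool:
    """Membership test that mirrors P -> ε | 0 | 1 | 0P0 | 1P1."""
    # Only the terminals 0 and 1 may appear.
    if any(ch not in "01" for ch in s):
        return False
    # Base cases: P -> ε, P -> 0, P -> 1
    if len(s) <= 1:
        return True
    # Recursive cases: P -> 0P0 and P -> 1P1 (matching outer symbols)
    return s[0] == s[-1] and is_binary_palindrome(s[1:-1])


print(is_binary_palindrome("0110"))
print(is_binary_palindrome("011"))
```

Each branch of the function corresponds to one alternative of the rule, which is why the grammar needs the named variable P: the recursion refers back to the pattern itself.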

3 Context-free grammar
A Context-Free Grammar (CFG) G is defined as G = (V, T, P, S):
V: the set of variables (non-terminals, or syntactic categories), each of which stands for a language
T: the set of terminal symbols (the alphabet)
P: the set of productions (rules), each consisting of a variable (the head) and a string of variables and terminals (the body)
S: the start symbol, the variable that stands for the whole language
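
One way to hold the four components in a program is a small record type. This is only a sketch under assumed names (the class CFG and its field names are hypothetical, chosen to mirror V, T, P, S); it encodes the binary-palindrome grammar as an instance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CFG:
    variables: frozenset   # V
    terminals: frozenset   # T
    productions: tuple     # P: (head, body) pairs; body is a tuple of symbols
    start: str             # S

    def is_well_formed(self) -> bool:
        """Check the side conditions implied by the definition."""
        symbols = self.variables | self.terminals
        return (
            self.variables.isdisjoint(self.terminals)
            and self.start in self.variables
            and all(
                head in self.variables and all(x in symbols for x in body)
                for head, body in self.productions
            )
        )


# P -> ε | 0 | 1 | 0P0 | 1P1, written as five separate productions.
palindromes = CFG(
    variables=frozenset({"P"}),
    terminals=frozenset({"0", "1"}),
    productions=(
        ("P", ()),
        ("P", ("0",)),
        ("P", ("1",)),
        ("P", ("0", "P", "0")),
        ("P", ("1", "P", "1")),
    ),
    start="P",
)
print(palindromes.is_well_formed())
```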

4 Example of CFG
A simple arithmetic expression consists of identifiers connected by the ‘+’ and ‘*’ operators:
E → I | E + E | E * E | (E)
I → a | b | Ia | Ib | I0 | I1
Formally, each alternative is a separate production; ‘|’ is only shorthand for listing them together.
In E → E + E, the three E’s can represent different strings.
The effect of the star (closure) operator is achieved by recursion, as in I → Ia.

5 Derivation using a CFG
A CFG defines a language consisting of the strings of terminals that can be derived from the start symbol using the production rules.
Derivation: from the start symbol down to a string of terminals (using rules from head to body).
Recursive inference: from strings of terminals up to the start symbol (using rules from body to head).
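
Recursive inference can be sketched as a fixed-point computation: start with no known strings and keep applying rules from body to head until nothing new appears. The code below is an illustrative sketch, not the slides' own procedure; it uses the expression grammar from the previous slide, assumes every symbol is a single character, and bounds string length with a hypothetical max_len parameter so that the loop terminates.

```python
from itertools import product

# Expression grammar; each alternative is a separate production.
PRODUCTIONS = {
    "E": ["I", "E+E", "E*E", "(E)"],
    "I": ["a", "b", "Ia", "Ib", "I0", "I1"],
}


def infer(max_len: int) -> dict:
    """Grow each variable's set of terminal strings until a fixed point."""
    known = {var: set() for var in PRODUCTIONS}
    changed = True
    while changed:
        changed = False
        for head, bodies in PRODUCTIONS.items():
            for body in bodies:
                # A variable contributes its known strings, a terminal itself.
                choices = [known[s] if s in PRODUCTIONS else {s} for s in body]
                for parts in product(*choices):
                    word = "".join(parts)
                    if len(word) <= max_len and word not in known[head]:
                        known[head].add(word)
                        changed = True
    return known


language = infer(3)
print(sorted(language["E"] - language["I"]))
```

With max_len = 3, the strings inferred for E that are not plain identifiers are the two parenthesized identifiers and the eight sums and products of a and b.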

6 Example of recursive inference

7 Example of derivation
Here ‘⇒’ means “derives in one step”. With a ‘*’ above it, it means “derives in any number of steps” (including zero); with a ‘G’ below it, it means “derives by grammar G”.

8 Leftmost/rightmost derivation
Leftmost and rightmost derivations restrict which variable is replaced at each step: always the leftmost one, or always the rightmost one.
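
A leftmost derivation can be replayed mechanically: at each step, find the leftmost variable and replace it with the chosen production body. The sketch below assumes the expression grammar E → I | E+E | E*E | (E), I → a | b | Ia | Ib | I0 | I1 with single-character symbols; the helper names and the target string a*(a+b00) are illustrative choices.

```python
PRODUCTIONS = {
    "E": ["I", "E+E", "E*E", "(E)"],
    "I": ["a", "b", "Ia", "Ib", "I0", "I1"],
}


def leftmost_step(form: str, body: str) -> str:
    """Replace the leftmost variable in `form` with `body`."""
    for i, symbol in enumerate(form):
        if symbol in PRODUCTIONS:
            if body not in PRODUCTIONS[symbol]:
                raise ValueError(f"{symbol} -> {body} is not a production")
            return form[:i] + body + form[i + 1:]
    raise ValueError("no variable left to replace")


def leftmost_derivation(bodies, start="E"):
    """Return every sentential form produced by applying `bodies` in order."""
    forms = [start]
    for body in bodies:
        forms.append(leftmost_step(forms[-1], body))
    return forms


steps = leftmost_derivation(
    ["E*E", "I", "a", "(E)", "E+E", "I", "a", "I", "I0", "I0", "b"]
)
print(" => ".join(steps))
```

Every intermediate string in the output is a left-sentential form; a rightmost derivation of the same string would apply the same productions in a different order.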

9 Context-free language
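L(G), the set of terminal strings derivable from the start symbol S, is called a context-free language (CFL) because G is a context-free grammar.
Any string of variables and terminals derived from S is a “sentential form”; it is a “left” (or “right”) sentential form if it is produced by a leftmost (or rightmost) derivation.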

10 CFG and regular language
A CFG specifies a regular language if it is in one of the following two forms, where a is a terminal and P, Q are variables:
Right-linear: all of its rules have the form P → ε, P → a, or P → aQ
Left-linear: all of its rules have the form P → ε, P → a, or P → Qa
A right-linear grammar maps directly to an ε-NFA; reversing the rule bodies of a left-linear grammar gives a right-linear grammar for the reversed language.
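
The mapping from a right-linear grammar to an NFA can be sketched by treating each variable as a state and adding one extra accepting state. The grammar below (S → 0S | 1S | 1, binary strings ending in 1) is a hypothetical example, not one from the slides, and the state name "#" is likewise an assumption.

```python
# Hypothetical right-linear grammar: S -> 0S | 1S | 1
RULES = {"S": ["0S", "1S", "1"]}
ACCEPT = "#"  # extra accepting state; not a grammar symbol


def accepts(word: str, start: str = "S") -> bool:
    """Simulate the NFA whose states are the variables plus ACCEPT."""
    states = {start}
    for ch in word:
        nxt = set()
        for p in states:
            for body in RULES.get(p, []):
                if body[:1] == ch:
                    # P -> aQ moves to Q; P -> a moves to ACCEPT.
                    nxt.add(body[1] if len(body) == 2 else ACCEPT)
        states = nxt
    # A rule P -> ε (empty body) makes state P accepting as well.
    return ACCEPT in states or any("" in RULES.get(p, []) for p in states)


print(accepts("0101"))
print(accepts("10"))
```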

11 Exercises for Section 5.1
5.1.1(a): define a CFG for { 0^n 1^n | n ≥ 1 }
5.1.1(b): define a CFG for { a^i b^j c^k | i ≠ j or j ≠ k }
Solutions:
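
The slide's own solutions are not present in this transcript. As an illustration only, one grammar that generates the language of 5.1.1(a) is S → 01 | 0S1; the sketch below (function names are assumptions) derives strings with it and tests membership by running the rules backwards.

```python
def generate(n: int) -> str:
    """Derive 0^n 1^n: apply S -> 0S1 (n - 1) times, then S -> 01."""
    if n < 1:
        raise ValueError("n must be at least 1")
    form = "S"
    for _ in range(n - 1):
        form = form.replace("S", "0S1")
    return form.replace("S", "01")


def in_language(s: str) -> bool:
    """Recursive inference for S -> 01 | 0S1."""
    if s == "01":  # S -> 01
        return True
    # S -> 0S1: strip one 0 from the front and one 1 from the back.
    return len(s) > 2 and s[0] == "0" and s[-1] == "1" and in_language(s[1:-1])


print(generate(3))
print(in_language("0011"), in_language("0101"))
```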

12 Parse trees
A derivation can be expressed as a parse tree.

13 Equivalent statements about CFG
The sequence of leaves of a parse tree, read from left to right, is the yield of the tree. When the root is the start symbol and every leaf is a terminal (or ε), the yield is a terminal string derived from the start symbol.
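
Computing the yield is a left-to-right traversal of the leaves. The sketch below represents a parse tree node as a (label, children) pair, which is an assumed encoding chosen for brevity, and builds the tree for a+b under E → E+E, E → I, I → a | b.

```python
def leaf(symbol: str):
    """A leaf node: a terminal with no children."""
    return (symbol, [])


def tree_yield(node) -> str:
    """Concatenate the leaf labels from left to right."""
    label, children = node
    if not children:
        return label
    return "".join(tree_yield(child) for child in children)


# Parse tree for a+b: E -> E+E, each inner E -> I, I -> a or I -> b.
tree = (
    "E",
    [
        ("E", [("I", [leaf("a")])]),
        leaf("+"),
        ("E", [("I", [leaf("b")])]),
    ],
)
print(tree_yield(tree))
```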

14 Parsers
Parsing, or syntactic analysis, is the process of analyzing a string of symbols according to the rules of a formal grammar.
A parser is a program that generates parse trees from input strings according to a given grammar.
In UNIX, the yacc command takes a CFG as input and outputs C code for a parser that can generate a parse tree.

15 Ambiguity in CFG
A CFG is “ambiguous” if some string is the yield of two or more different parse trees.
For example, the grammar of arithmetic expressions allows E + E * E to be parsed in two ways, corresponding to the two possible orders of applying the operators.
The mere existence of different derivations does not imply ambiguity; only different parse trees do.
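
Ambiguity can be made concrete by counting parse trees. The sketch below uses a simplified version of the expression grammar, E → a | E+E | E*E | (E), where the single identifier a stands in for the variable I (an assumption made to keep the code short). It counts, for each substring, the number of parse trees rooted at E.

```python
from functools import lru_cache


def count_parse_trees(s: str) -> int:
    """Count parse trees of s under E -> a | E+E | E*E | (E)."""

    @lru_cache(maxsize=None)
    def count(i: int, j: int) -> int:  # trees whose yield is s[i:j]
        if j - i == 1:
            return 1 if s[i] == "a" else 0  # E -> a
        total = 0
        if j - i >= 3 and s[i] == "(" and s[j - 1] == ")":
            total += count(i + 1, j - 1)  # E -> (E)
        for k in range(i + 1, j - 1):  # E -> E+E or E -> E*E, operator at k
            if s[k] in "+*":
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(s))


print(count_parse_trees("a+a*a"))
```

A count greater than one for any string shows that the grammar is ambiguous; here a+a*a has one tree with + at the root and one with * at the root.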

16 Removing ambiguity
There is no algorithm that can decide whether an arbitrary CFG is ambiguous, and none that can remove ambiguity from every CFG.
Some ambiguity can be removed by revising the CFG, for example by giving + and * separate precedence levels in expressions.
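
The revised grammar itself did not survive in this transcript. A common textbook way to separate the two operators is E → T | E+T, T → F | T*F, F → a | (E), with a standing in for identifiers; this is assumed here and may differ from the slide. The sketch counts parse trees under that layered grammar.

```python
from functools import lru_cache


def count_parse_trees(s: str) -> int:
    """Count parse trees under E -> T | E+T, T -> F | T*F, F -> a | (E)."""

    @lru_cache(maxsize=None)
    def E(i: int, j: int) -> int:
        total = T(i, j)  # E -> T
        for k in range(i + 1, j - 1):
            if s[k] == "+":
                total += E(i, k) * T(k + 1, j)  # E -> E+T
        return total

    @lru_cache(maxsize=None)
    def T(i: int, j: int) -> int:
        total = F(i, j)  # T -> F
        for k in range(i + 1, j - 1):
            if s[k] == "*":
                total += T(i, k) * F(k + 1, j)  # T -> T*F
        return total

    @lru_cache(maxsize=None)
    def F(i: int, j: int) -> int:
        if j - i == 1:
            return 1 if s[i] == "a" else 0  # F -> a
        if j - i >= 3 and s[i] == "(" and s[j - 1] == ")":
            return E(i + 1, j - 1)  # F -> (E)
        return 0

    return E(0, len(s))


print(count_parse_trees("a+a*a"))
```

Because a product can only appear inside a term and a sum only at the expression level, a+a*a now has a single parse tree, the one in which * binds more tightly than +.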

17 Unique derivation
In an unambiguous grammar, the leftmost derivation of each string is unique, and so is its rightmost derivation.
Therefore, although a variable can have more than one production, only one of them can be applied at each step of the leftmost derivation of a given string.
For a given CFG, a string has two distinct parse trees if and only if it has two distinct leftmost derivations from the start symbol.

18 Inherent ambiguity
A CFL is “inherently ambiguous” if all of its grammars are ambiguous.
Example: L = { a^n b^n c^m d^m } ∪ { a^n b^m c^m d^n }, where m and n are positive integers.
It is easy to write a CFG that generates the two types of strings separately, but it will give the string “aabbccdd” two leftmost derivations, as well as two parse trees.
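
The source of the ambiguity is the overlap between the two halves of the union. The sketch below (helper names are assumptions) tests membership in each half separately and shows that “aabbccdd” belongs to both, so a grammar built as the union of two sub-grammars derives it in two ways.

```python
import re


def block_lengths(s: str):
    """Lengths of the a, b, c, d blocks, or None if s is not a+b+c+d+."""
    m = re.fullmatch(r"(a+)(b+)(c+)(d+)", s)
    return [len(g) for g in m.groups()] if m else None


def in_first(s: str) -> bool:
    """Membership in { a^n b^n c^m d^m }."""
    b = block_lengths(s)
    return b is not None and b[0] == b[1] and b[2] == b[3]


def in_second(s: str) -> bool:
    """Membership in { a^n b^m c^m d^n }."""
    b = block_lengths(s)
    return b is not None and b[0] == b[3] and b[1] == b[2]


for word in ["aabbccdd", "abccdd", "abbccd"]:
    print(word, in_first(word), in_second(word))
```

This check only illustrates the overlap; it does not prove that every grammar for L is ambiguous.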

19 Inherent ambiguity: example

20 Exercises for Section 5.4
Exercise 5.4.3: Find an unambiguous grammar for the above language.
Solutions:

21 Applications of CFG
Examples: mathematical language, logical language, markup language, programming language, natural language

