Chapter 5 Context-Free Grammars

Slides:



Advertisements
Similar presentations
Lecture # 8 Chapter # 4: Syntax Analysis. Practice Context Free Grammars a) CFG generating alternating sequence of 0’s and 1’s b) CFG in which no consecutive.
Advertisements

1 Parsing The scanner recognizes words The parser recognizes syntactic units Parser operations: Check and verify syntax based on specified syntax rules.
Grammars, constituency and order A grammar describes the legal strings of a language in terms of constituency and order. For example, a grammar for a fragment.
CS5371 Theory of Computation
Transparency No. P2C4-1 Formal Language and Automata Theory Part II Chapter 4 Parse Trees and Parsing.
Discussion #31/20 Discussion #3 Grammar Formalization & Parse-Tree Construction.
Specifying Languages CS 480/680 – Comparative Languages.
Lecture 9UofH - COSC Dr. Verma 1 COSC 3340: Introduction to Theory of Computation University of Houston Dr. Verma Lecture 9.
COP4020 Programming Languages
Context-Free Grammars Chapter 3. 2 Context-Free Grammars and Languages n Defn A context-free grammar is a quadruple (V, , P, S), where  V is.
1 Chapter 3 Context-Free Grammars and Parsing. 2 Parsing: Syntax Analysis decides which part of the incoming token stream should be grouped together.
EECS 6083 Intro to Parsing Context Free Grammars
Problem of the DAY Create a regular context-free grammar that generates L= {w  {a,b}* : the number of a’s in w is not divisible by 3} Hint: start by designing.
1 CD5560 FABER Formal Languages, Automata and Models of Computation Lecture 7 Mälardalen University 2010.
Chapter 4 Context-Free Languages Copyright © 2011 The McGraw-Hill Companies, Inc. Permission required for reproduction or display. 1.
Formal Grammars Denning, Sections 3.3 to 3.6. Formal Grammar, Defined A formal grammar G is a four-tuple G = (N,T,P,  ), where N is a finite nonempty.
Lecture 16 Oct 18 Context-Free Languages (CFL) - basic definitions Examples.
CMSC 330: Organization of Programming Languages
BİL 744 Derleyici Gerçekleştirimi (Compiler Design)1 Syntax Analyzer Syntax Analyzer creates the syntactic structure of the given source program. This.
Context-free Grammars Example : S   Shortened notation : S  aSaS   | aSa | bSb S  bSb Which strings can be generated from S ? [Section 6.1]
Syntax Analysis The recognition problem: given a grammar G and a string w, is w  L(G)? The parsing problem: if G is a grammar and w  L(G), how can w.
A sentence (S) is composed of a noun phrase (NP) and a verb phrase (VP). A noun phrase may be composed of a determiner (D/DET) and a noun (N). A noun phrase.
1 Context-Free Languages Not all languages are regular. L 1 = {a n b n | n  0} is not regular. L 2 = {(), (()), ((())),...} is not regular.  some properties.
Classification of grammars Definition: A grammar G is said to be 1)Right-linear if each production in P is of the form A  xB or A  x where A and B are.
Context Free Grammars CIS 361. Introduction Finite Automata accept all regular languages and only regular languages Many simple languages are non regular:
Context-Free Grammars
Lecture # 19. Example Consider the following CFG ∑ = {a, b} Consider the following CFG ∑ = {a, b} 1. S  aSa | bSb | a | b | Λ The above CFG generates.
Grammars CPSC 5135.
Languages & Grammars. Grammars  A set of rules which govern the structure of a language Fritz Fritz The dog The dog ate ate left left.
Lecture 15 Naveen Z Quazilbash Ambiguity. Overview S-Grammars Ambiguity in Grammars Ambiguous grammars and Unambiguous Grammars.
Lecture # 9 Chap 4: Ambiguous Grammar. 2 Chomsky Hierarchy: Language Classification A grammar G is said to be – Regular if it is right linear where each.
Dept. of Computer Science & IT, FUUAST Automata Theory 2 Automata Theory V Context-Free Grammars andLanguages.
CMSC 330: Organization of Programming Languages Context-Free Grammars.
Parsing Introduction Syntactic Analysis I. Parsing Introduction 2 The Role of the Parser The Syntactic Analyzer, or Parser, is the heart of the front.
CFG1 CSC 4181Compiler Construction Context-Free Grammars Using grammars in parsers.
Context Free Grammars.
CMSC 330: Organization of Programming Languages
1 CD5560 FABER Formal Languages, Automata and Models of Computation Lecture 11 Midterm Exam 2 -Context-Free Languages Mälardalen University 2005.
CS412/413 Introduction to Compilers and Translators Spring ’99 Lecture 3: Introduction to Syntactic Analysis.
CS 3240 – Chapter 5. LanguageMachineGrammar RegularFinite AutomatonRegular Expression, Regular Grammar Context-FreePushdown AutomatonContext-Free Grammar.
Chapter 7 Pushdown Automata
Re-enter Chomsky More about grammars. 2 Parse trees S  A B A  aA | a B  bB | b Consider L = { a m b n | m, n > 0 } (one/more a ’s followed by one/more.
Grammars Hopcroft, Motawi, Ullman, Chap 5. Grammars Describes underlying rules (syntax) of programming languages Compilers (parsers) are based on such.
Grammars CS 130: Theory of Computation HMU textbook, Chap 5.
Unit-3 Parsing Theory (Syntax Analyzer) PREPARED BY: PROF. HARISH I RATHOD COMPUTER ENGINEERING DEPARTMENT GUJARAT POWER ENGINEERING & RESEARCH INSTITUTE.
1 Chapter 6 Simplification of CFGs and Normal Forms.
Chapter 3 Context-Free Grammars Dr. Frank Lee. 3.1 CFG Definition The next phase of compilation after lexical analysis is syntax analysis. This phase.
Introduction Finite Automata accept all regular languages and only regular languages Even very simple languages are non regular (  = {a,b}): - {a n b.
Chapter 5 Context-free Languages
Syntax Analyzer (Parser)
Context-Free Languages
Chapter 4: Syntax analysis Syntax analysis is done by the parser. –Detects whether the program is written following the grammar rules and reports syntax.
1 A well-parenthesized string is a string with the same number of (‘s as )’s which has the property that every prefix of the string has at least as many.
Compiler Construction Lecture Five: Parsing - Part Two CSC 2103: Compiler Construction Lecture Five: Parsing - Part Two Joyce Nakatumba-Nabende 1.
Theory of Languages and Automata By: Mojtaba Khezrian.
Syntax Analysis By Noor Dhia Syntax analysis:- Syntax analysis or parsing is the most important phase of a compiler. The syntax analyzer considers.
Compiler Chapter 5. Context-free Grammar Dept. of Computer Engineering, Hansung University, Sung-Dong Kim.
1 Context-Free Languages & Grammars (CFLs & CFGs) Reading: Chapter 5.
CONTEXT-FREE LANGUAGES
Formal Language & Automata Theory
CS510 Compiler Lecture 4.
Fall Compiler Principles Context-free Grammars Refresher
Recap lecture 33 Example of trees, Polish Notation, examples, Ambiguous CFG, example,
Syntax Specification and Analysis
Compiler Construction
PARSE TREES.
Context-Free Languages
Finite Automata and Formal Languages
Fall Compiler Principles Context-free Grammars Refresher
Presentation transcript:

Chapter 5 Context-Free Grammars

Grammar (Review) A grammar is a 4-tuple G = (NT, T, S, P), where NT: a finite set of non-terminal symbols T: a finite set of terminal (alphabet) symbols S: is the starting symbol S  NT P: a finite set of production rules: x → y x  NT y  (NT  T)* Derivation of w  T*: S  w1 ···  wn  w (written as S * w) The language generated by the grammar is L(G) = {w  T* : S * w}

5.1: Context-Free Grammars (1) Context-Free Grammars (CFG) are language-description mechanisms used to generate the strings of a language A grammar is said to be context-free if every rule has a single non-terminal on the left-hand side A   A  NT: single non-terminal on the left hand side   (NT  T)*: string of terminals and non-terminals A language L is called context-free language (CFL) if and only if there is a context-free grammar G (CFG) that generates it (i.e., L = L(G))

5.1: Context-Free Grammars (3) Example 5.1 G = ({S}, {a, b}, S, {S → aSa | bSb | }) For aabbaa, typical derivation might be: S  aSa  aaSaa  aabSbaa  aabbaa Grammar generates language L(G)={wwR : w  {a, b}*}

5.1: Context-Free Grammars (5) What is the language of this grammar? G = ({S}, {a, b}, S, {S → aSb, S → ab}) L(G) = {anbn | n  1} G = ({S}, {a, b}, S, {S → aSb, S → e}) L(G) = {anbn | n  0}

5.1: Context-Free Grammars (6) What is the language of this grammar? G = ({S, A}, {a, b}, S, {S → aSa | aAa A → bA | b } L(G) = {anbman | n  1, m  1} G1 = ({S, A, B}, {a, b}, S, {S → AB A → aA | a B → bB | e } G2 = ({S, B}, {a, b}, S, {S → aS | aB L(G) = {a+b*} = {anbm | n  1, m  0}

5.1: Context-Free Grammars (7) What is the language of this grammar? G = ({S, B}, {a, b, c}, S, {S → abScB | e B → bB | b } L = {(ab)n(cbm)n | n  0, m  1} What is the grammar of this language? L = {anbmcmdn | n  1, m  1} G = ({S, A}, {a, b, c, d}, S, { S → aSd | aAd A → bAc | bc } L = {anbmcmd2n | n  0, m  1} G = ({S, A}, {a, b, c, d}, S, { S → aSdd | A

5.1: Context-Free Grammars (8) What is the grammar of this language? L = {a*ba*ba*} G1 = ({S, A}, {a, b}, S, {S → AbAbA A → aA | e } G2 = ({S, A, C}, {a, b}, S, {S → aS | bA A → aA | bC } C → aC | e

5.1: Context-Free Grammars (9) What is the grammar of this language? A language over {a, b} with at least 2 b’s G1 = ({S, A}, {a, b}, S, {S → AbAbA A → aA | bA | e } G2 = ({S, A, C}, {a, b}, S, {S → aS | bA A → aA | bC C → aC | bC | e }

5.1: Context-Free Grammars (10) What is the grammar of this language? A language over {a, b} of even-length strings G = ({S, O}, {a, b}, S, {S → aO | bO | e O → aS | bS } A language over {a, b} of an even no. of b’s G = ({S, B, C}, {a, b}, S, { S → aS | bA | e A → aA | bS }

5.1: Context-Free Grammars (11) What is the grammar of this language? A language over {a, b, c} that do not contain the substring abc G = ({S, B, C}, {a, b, c}, S, {S → bS | cS | aA | e A → aA | bB | cS | e B → aA | bS | e}

5.1: Context-Free Grammars (12) Write a CFG for the following Languages L1 = {wcwR | w  {a, b}*} S  aSa | bSb | c L2 = {anbncmdm | n  1, m  1} S  XY X  aXb | ab Y  cYd | cd

5.1: Context-Free Grammars (13) The following languages are NOT CFLs L1 = {wcw | w  {a, b}*} L2 = {anbmcndm | n  1, m  1} L3 = {anbncn | n  0} G = ({S}, {a, b}, S, {S → aSb | bSa | SS | }) is this grammar context-free? yes, there is a single variable on the left hand side

5.1: Context-Free Grammars (14) Are regular languages context-free? Why? yes, SEE NEXT SLIDE But, not all context-free grammars are regular. Regular languages are a proper subset of the class of context-free languages. Regular grammars are a proper subset of context-free grammars. Non-regular languages can be generated by context-free grammars, so the term context-free generally includes non-regular languages and regular languages.

5.1: Context-Free Grammars (14) Constructing (right linear) CFG from DFA Each state of the DFA will be represented by a nonterminal The initial state will correspond to the start nonterminal For each transition d(qi, a) = qj , add a rule qi → aqj For each accepting state qf , add a rule qf → e

5.1: Context-Free Grammars (15) Leftmost & Rightmost Derivations Given the grammar S → aAB, A → bBb, B → A |  String abbbb can be derived in different ways: Left-most derivation: always replace the leftmost NT in each sentential form S  aAB  abBbB  abAbB  abbBbbB  abbbbB  abbbb Right-most derivation: always replace the rightmost NT in each sentential form S  aAB  aA  abBb  abAb  abbBbb  abbbb

5.1: Context-Free Grammars (16) DEFINITION  derives in one step  derives in  one step  indicates that the derivation utilizes the rules of grammar G + * G

5.1: Context-Free Grammars (17) Derivation Order Leftmost: Replace the leftmost non-terminal symbol Rightmost: Replace the rightmost non-terminal symbol lm lm Important Notes: A  y If xAz xyz , what’s true about x ? terminals If xAz xyz , what’s true about z ? terminals  lm  rm

5.1: Context-Free Grammars (18) Example S  S + E | E E  number | ( S ) Left-most derivation S  S+E  E+E  (S)+E  (S+E)+E (S+E+E)+E (E+E+E)+E (1+E+E)+E (1+2+E)+E (1+2+(S))+E (1+2+(S+E))+E (1+2+(E+E))+E (1+2+(3+E))+E (1+2+(3+4))+E (1+2+(3+4))+5 Right-most derivation S  S+E  E+5  (S)+5  (S+E)+5 (S+(S))+5 (S+(S+E))+5 (S+(S+4))+5 (S+(E+4))+5 (S+(3+4))+5 (S+E+(3+4))+5 (S+2+(3+4))+5 (E+2+(3+4))+5 (1+2+(3+4))+5

5.1: Context-Free Grammars (19) Derivation (Parsing) Trees Given the grammar S → aaSB | , B → bB | b A leftmost derivation S  aaSB  aaB  aab A rightmost derivation S  aaSB  aaSb  aab Both derivations correspond to the parse (or derivation) tree above.

5.1: Context-Free Grammars (20) In a parse tree, the nodes labeled with NTs correspond to the left side of a production rule and the children of that node correspond to the right side of the rule, e.g., S → aaSB. The tree structure shows the rule that is applied to each NT (for a specific derivation), without showing the order of rule application. Each internal node of the tree corresponds to a NT, and the leaves of the derivation tree represent the string of terminals. Note the tree applies to a specific derivation and may not include all rules, e.g., B → bB is not shown above.

5.1: Context-Free Grammars (21) Definition 5.3: Let G = (NT, T, S, P) be a context-free grammar. An ordered tree is a derivation tree for G if and only if it has the following properties: 1. the root is labeled S 2. every leaf has a label from T  {} 3. every non-leaf (interior) vertex has a label from NT. 4. if a vertex has label A  NT, and its children are labeled (left to right) a1, a2, ..., an, then P contain a production of the form A → a1a2…an 5. a leaf labeled  has no siblings; that is, a vertex with a child labeled  can have no other children Partial derivation tree: a tree having properties 3–5, but for which property 1 may not hold, and for which property 2 is replaced with: 2a. every leaf has a label from NT  T  {}

5.1: Context-Free Grammars (22) Derivation  Parse Tree S Parse Tree S  S + E | E E  number | ( S ) S + E E 5 ( S ) S + E S + E ( S ) E S + E 2 4 1 E 3 Derivation S  S +E  E+E  (S)+E  (S+E)+E (S+E+E)+E (E+E+E)+E (1+E+E)+E (1+2+E)+E … (1+2+(3+4))+E (1+2+(3+4))+5

5.1: Context-Free Grammars (23) op E  E op E E op E  E op E | ( E ) | - E | id op  + | - | * | / |   id op E id E op id *  id * E E op id *  id * id

5.1: Context-Free Grammars (24) A derivation tree for grammar G yields a sentence of the language L(G) A partial derivation tree yields a sentential form The yield of a parse tree is the string of leaf symbols obtained by reading the tree left-to-right the order they are encountered when the tree is traversed in a depth-first manner, always taking the leftmost unexplored branch

5.1: Context-Free Grammars (25) Given the grammar S → aAB A → bBb B → A | , the yield of the partial derivation tree is the sentential form abBbB

5.1: Context-Free Grammars (26) Given the grammar S → aAB A → bBb B → A | , the yield of the derivation tree is the sentence abbbb  L(G).

5.1: Context-Free Grammars (27) Theorem 5.1 (Relationship between Sentential Forms & Derivation Trees) Let G = (NT, T, S, P) be a CFG. Then for every w  L(G), there exists a derivation tree of G whose yield is w. the yield, w, of any derivation tree of G is such that w  L(G). if tG is any partial derivation tree for G whose root is labeled S, then the yield of tG is a sentential form of G. As a side note, any w  L(G) has a leftmost and a rightmost derivation.

5.2: Parsing and Ambiguity (1) In practical applications, it is usually not enough to decide whether a string belongs to a language. It is also important to know how to derive the string from the language. Parsing uncovers the syntactical structure of a string, which is represented by a parse tree. The syntactical structure is important for assigning semantics to the string -- for example, if it is a computer program.

5.2: Parsing and Ambiguity (2) Application of parsing (Compilers) Let G be a context-free grammar for the C language. Let the string w be a C program. One thing a compiler does - in particular, the part of the compiler called the “parser” - is determine whether w is a syntactically correct C program. It also constructs a parse tree for the program that is used in code generation. There are many sophisticated and efficient algorithms for parsing. You may study them in more advanced classes (for example, on compilers).

5.2: Parsing and Ambiguity (3) S-grammars Definition 5.4: A context-free grammar G = (NT, T, S, P) is said to be a simple grammar or s-grammar if all of its productions are of the form A → ax where A  NT, a  T, x  NT*, and any pair (A, a) occurs at most once in P. Examples: S → aS | bSS | c is an s-grammar. S → aS | bSS | aSB | c is not an s-grammar. because pair (S, a) occurs in two productions: S → aS, and S → aSB

5.2: Parsing and Ambiguity (4) Example: given s-grammar G with production S → aS | bSS | c show the derivation of the string w = abcc Since G is an s-grammar, the only way to get the a up front is via rule S → aS the only way to get the b is via rule S → bSS the only way to get each c is via rule S → c Thus, we have parsed the string in 4 steps as S  aS  abSS  abcS  abcc

5.2: Parsing and Ambiguity (5) Find an s-grammar for aaa*b | b We need to start with an a or have only the b. S → aA S → b A → aB B → aB B → b

5.2: Parsing and Ambiguity (6) A terminal string may be generated by a number of different derivations Let G be a CFG. A string w is in L(G), iff there is a leftmost derivation of w from S. (S  w) Question Is there a unique leftmost derivation of every sentence (string) in the language of a grammar? NO * lm

5.2: Parsing and Ambiguity (7) Consider the expression grammar: E  E+E | E*E | (E) | -E | id Two different leftmost derivations of id + id * id E  E + E  id + E  id + E * E  id + id * E  id + id * id E * + id

5.2: Parsing and Ambiguity (8) Consider the expression grammar: E  E+E | E*E | (E) | -E | id Two different leftmost derivations of id + id * id E  E * E  E + E * E  id + E * E  id + id * E  id + id * id E + * id

5.2: Parsing and Ambiguity (9) Ambiguity Definition 5.5: a grammar G is ambiguous if there is a string with at least two parse trees, two or more leftmost or rightmost derivations. Example: CFG G with productions S → aSb | SS |  is ambiguous because there are two parse tree for w = aabb as shown below.

5.2: Parsing and Ambiguity (10) A CFG is ambiguous if there is a string w  L(G) that can be derived by two distinct leftmost derivations A grammar G is ambiguous if there exists a sentence in G with more than one derivation (parsing) tree A grammar that is not ambiguous is called unambiguous If G is ambiguous then L(G) is not necessarily ambiguous A language L is inherently ambiguous if there is no unambiguous grammar that generates it

5.2: Parsing and Ambiguity (11) Let G be S → aS | Sa | a G is ambiguous since the string aa has 2 distinct leftmost derivation S  aS  aa S  Sa  aa L(G) = a+ This language is also generated by the unambiguous grammar S → aS | a L(G) is not ambiguous. Why?

5.2: Parsing and Ambiguity (12) Let G be S → bS | Sb | a L(G) = b*ab* G is ambiguous since the string bab has 2 distinct leftmost derivation S  bS  bSb  bab S  Sb  bSb  bab The ability to generate the b’s in either order must be eliminated to obtain an unambiguous grammar This language is also generated by the unambiguous grammars G2: S → bS | A A → Ab | a G1: S → bS | aA A → bA | e

5.2: Parsing and Ambiguity (13) Eliminating Ambiguity Consider the following grammar E  E + E | E * E | num Consider the sentence 1 + 2 * 3 Leftmost Derivation 1 E  E + E  1 + E  1 + E * E  1 + 2 * E  1 + 2 * 3 Leftmost Derivation 2 E  E * E  E + E * E E * + 1 3 2 E + * 3 1 2

5.2: Parsing and Ambiguity (14) Different parse trees correspond to different evaluations (meaning) LMD-1 LMD-2 * + + * 1 2 3 1 2 3 = 9 = 7

5.2: Parsing and Ambiguity (15) Ambiguous: E → E + E | E * E | number Both + and * have the same precedence To remove ambiguity, you have to give + and * different precedence Let us say * has higher precedence than + E  E + T | T T  T * num | num

5.2: Parsing and Ambiguity (16) Eliminating Ambiguity E  E + T | T T  T * num | num Consider the sentence 1 + 2 * 3 Leftmost Derivation (only one LMD) E  E + T  T + T  num + T  1 + T  1 + T * num  1 + num * num  1 + 2 * num  1 + 2 * 3 num T * E + 1 3 2 = 7

5.2: Parsing and Ambiguity (17) Let us say + has higher precedence than * E  E * T | T T  T + num | num Consider the sentence 1 + 2 * 3 Leftmost Derivation (only one LMD) E  E * T  T * T  T + num * T  num + num * T  1 + num * T  1 + 2 * T  1 + 2 * num  1 + 2 * 3 E T * T num 3 num T + 2 num = 9 1

5.2: Parsing and Ambiguity (18) A derivation tree… Corresponds to exactly one leftmost derivation. Corresponds to exactly one rightmost derivation. CFG ambiguous  any of following equivalent statements:  string w with multiple derivation trees.  string w with multiple leftmost derivations.  string w with multiple rightmost derivations. Note: Defining grammar, not language, ambiguity.

5.3: CFGs & Programming Languages (1) Programming languages are context-free, but not regular Programming languages have the following features that require infinite “stack memory” matching parentheses in algebraic expressions nested if .. then .. else statements nested loops block structure

5.3: CFGs & Programming Languages (2) <unsigned constant>  <unsigned number> <constant>  <unsigned number> | <sign> <unsigned number> <unsigned number>  <unsigned integer> | <unsigned real> <unsigned integer>  <digit> <unsigned integer> | <digit> <unsigned real>  <unsigned integer> . <unsigned integer> | <unsigned integer> . <unsigned integer> E <exp> | <unsigned integer> E <exp> <exp>  <unsigned integer> | <sign> <unsigned integer> <digit>  0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 <letter>  a | b | c | … | y | z | A | B | C | … | Y | Z <sign>  + | - <identifier>  <letter> <identifier tail> <identifier tail>  <letter> <identifier tail> | <digit> <identifier tail>

5.3: CFGs & Programming Languages (3) <expression>  <simple expression> <simple expression>  <term> | <sign term> | <simple expression> <adding operator> <term> <term>  <factor> | <term> <multiplying operator> <factor> <factor>  <variable> | <unsigned constant> | (<expression>) <adding operator>  + | - <multiplying operator>  * | / | div | mod