Published by Людмила Малявко. Modified over 5 years ago.
1
Grammars
Chuck Cusack
Based partly on Chapter 11 of “Discrete Mathematics and Its Applications,” 5th edition, by Kenneth Rosen
2
Alphabets and Languages (Review)
Definition: A vocabulary (or alphabet) V is a finite, nonempty set of symbols.
Definition: A word or sentence over V is a finite string of symbols from V.
Definition: The empty string or null string, denoted by λ, is the string containing no symbols.
Definition: The set of all words over V is denoted by V*.
Definition: A language over V is a subset of V*.
3
Language Examples (Review)
Let V={0,1}.
00110, 11111, 00, and 11 are words over V.
012, a234, and 222 are not words over V.
V*={λ,0,1,00,01,10,11,000,…}. In other words, V* is the set of all binary strings, including the empty string.
The set of strings consisting of only 0s is a language over V.
{1,10,100,1000,10000,…} is a language over V.
4
Concatenation (Review)
Definition: Let V be a vocabulary, and A and B be subsets of V*. The concatenation of A and B, denoted by AB, is the set of all strings of the form xy, where x∈A and y∈B.
Example: Let A={0, 10}, and B={1, 12}. Then
AB={01, 012, 101, 1012}
BA={10, 110, 120, 1210}
AA={00, 010, 100, 1010}
AAA=A(AA)={000, 0010, 0100, 01010, 1000, 10010, 10100, 101010}
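The definition of concatenation maps directly onto a set comprehension. The following is a minimal Python sketch, assuming finite languages are modeled as sets of strings; the function name `concat` is our own choice, not something from the slides.

```python
def concat(A, B):
    """Return AB = {xy | x in A, y in B} for two finite languages A and B."""
    return {x + y for x in A for y in B}


A = {"0", "10"}
B = {"1", "12"}

print(sorted(concat(A, B)))  # ['01', '012', '101', '1012']
print(sorted(concat(B, A)))  # ['10', '110', '120', '1210']
```

Note that AB and BA differ, so concatenation of languages is not commutative.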
5
Concatenation: A^n (Review)
Definition: Let V be a vocabulary, and A a subset of V*. Then A^0={λ}, and for n>0, we define A^n=A^(n-1)A.
Example: Let A={0, 10}. Then
A^0={λ}
A^1=A^0A={λ}A=A={0, 10}
A^2=A^1A={00, 010, 100, 1010}
A^3=A^2A={000, 0010, 0100, 01010, 1000, 10010, 10100, 101010}
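The recursive definition of the n-th power of a language can be sketched as a simple loop. This is an illustrative Python version, assuming languages are finite sets of strings and that the empty string is written `""`; the name `power` is hypothetical.

```python
def power(A, n):
    """Return the n-th power of A: power(A, 0) is {""} and each step appends A."""
    result = {""}                      # the zeroth power contains only the empty string
    for _ in range(n):
        result = {x + y for x in result for y in A}
    return result


A = {"0", "10"}
print(sorted(power(A, 2)))  # ['00', '010', '100', '1010']
print(len(power(A, 3)))     # 8
```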
6
Kleene Closure (Review)
Definition: Let V be a vocabulary, and A a subset of V*. The Kleene closure of A, denoted by A*, is the set consisting of concatenations of an arbitrary number of strings from A. That is, A* = A^0 ∪ A^1 ∪ A^2 ∪ …
Definition: A+ is the set of nonempty strings over A. In other words, A+ = A^1 ∪ A^2 ∪ A^3 ∪ …
7
Kleene Closure Example (Review)
Example: Let A={0, 1}. Then
A^0={λ}
A^1={0, 1}
A^2={00, 01, 10, 11}
A^3={000, 001, 010, 011, 100, 101, 110, 111}
A*={0,1}*={All binary strings}
Example: Let B={111}. Then B^0={λ}, B^1={111}, B^2={111111}, B^3={111111111}.
B* is the set of strings with 3n 1s, for every n≥0.
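Because a Kleene closure is usually infinite, code can only list a finite part of it. The sketch below, in Python, enumerates the strings of the closure up to a chosen length; the function name `kleene_upto` and the length bound are our assumptions for illustration.

```python
def kleene_upto(A, max_len):
    """Return all strings in the Kleene closure of A with length <= max_len."""
    pieces = {x for x in A if x}       # the empty string adds nothing new
    closure = {""}                     # the zeroth power: just the empty string
    frontier = {""}
    while frontier:
        # Extend the newest strings by one more piece, keeping only short ones.
        frontier = {x + y for x in frontier for y in pieces
                    if len(x + y) <= max_len} - closure
        closure |= frontier
    return closure


print(sorted(kleene_upto({"111"}, 9), key=len))
# ['', '111', '111111', '111111111']
```

With A={0,1} and a bound of 3 this yields all 15 binary strings of length at most 3.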
8
Grammars and Languages
Many languages can be defined by grammars. We are particularly interested in phrase-structure grammars. Before we can define phrase-structure grammars, we need to define a few more terms.
9
Special Symbols
Definition: A nonterminal symbol (or just nonterminal) is a symbol which can be replaced by other symbols.
Definition: A terminal symbol (or just terminal) is a symbol which cannot be replaced by other symbols.
Definition: The start symbol is a special symbol, usually denoted by S.
The set of terminals is denoted by T, and the set of nonterminals by N. S is a nonterminal.
10
Productions
Definition: A production (or substitution rule) is a rule which tells how to replace one string from V* with another string.
Productions are denoted by α → β, which denotes that α can be replaced by β.
Example: Let S → A0, A → A1, and A → 0 be productions.
Then I can replace S with A0.
Since I can replace A with A1, A0 can become A10.
Since I can replace A with 0, A10 can become 010.
Thus, I can replace S with 010.
11
Phrase-Structure Grammars
Definition: A phrase-structure grammar is a 4-tuple G=(V,T,S,P), where
V is a vocabulary
T⊆V is a set of terminals
S∈V is a start symbol
P is a set of productions
N=V-T is the set of nonterminals.
Each production contains at least one nonterminal on its left side. We will always use S as the start symbol.
12
Direct Derivations
Let G=(V,T,S,P) be a phrase-structure grammar.
Let A=lαr and B=lβr, where l, α, β, r ∈ V*. Let α → β be a production. Then we can derive B from A in one step. Thus we say that B is directly derivable from A. We write this as A ⇒ B.
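A direct derivation replaces one occurrence of a production's left side inside a string and leaves everything to its left and right untouched. The following Python sketch lists every string reachable in one such step. It assumes each symbol is a single character and that productions are given as (left, right) pairs; the function name is ours.

```python
def directly_derivable(string, productions):
    """Return every string obtainable from `string` by applying one production once."""
    results = set()
    for lhs, rhs in productions:
        pos = string.find(lhs)
        while pos != -1:
            # string = left + lhs + right, so left + rhs + right is one step away.
            results.add(string[:pos] + rhs + string[pos + len(lhs):])
            pos = string.find(lhs, pos + 1)
    return results


P = [("S", "A0"), ("A", "A1"), ("A", "0")]
print(sorted(directly_derivable("A0", P)))  # ['00', 'A10']
```

A string made only of terminals yields the empty set here, since no left side occurs in it.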
13
Derivations
Let G=(V,T,S,P) be a phrase-structure grammar.
Let A1, A2,…,An ∈ V* be such that A1 ⇒ A2 ⇒ … ⇒ An. Then we say that An is derivable from A1. We write A1 ⇒* An. The sequence of productions used is called a derivation.
14
Generating Languages
Let G=(V,T,S,P) be a grammar.
Definition: The language generated by G, denoted L(G), is the set of all strings of terminals that are derivable from S. Put another way, L(G)={w ∈ T* | S ⇒* w}.
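For small grammars, the words of L(G) up to a given length can be found by a breadth-first search over everything derivable from S. The Python sketch below is one way to do this, under two assumptions that are ours and not part of the definition: every symbol is one character, and no production makes a string shorter (so long intermediate strings can be discarded). The name `generate` and its parameters are hypothetical.

```python
from collections import deque


def generate(productions, terminals, start="S", max_len=4):
    """Return the words of L(G) with length <= max_len.

    Assumes one-character symbols and no length-decreasing productions.
    """
    seen = {start}
    queue = deque([start])
    words = set()
    while queue:
        form = queue.popleft()
        if all(ch in terminals for ch in form):
            words.add(form)            # a string of terminals derivable from S
            continue
        for lhs, rhs in productions:
            pos = form.find(lhs)
            while pos != -1:
                new = form[:pos] + rhs + form[pos + len(lhs):]
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
                pos = form.find(lhs, pos + 1)
    return words


# A small grammar with productions S -> S0 and S -> 0.
print(sorted(generate([("S", "S0"), ("S", "0")], {"0", "1"}, max_len=4)))
# ['0', '00', '000', '0000']
```

Listing a few short words this way is a quick check on a guess about what language a grammar generates; it is not a proof.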
15
Example 1
Let G be the grammar with V={S,0,1}, T={0,1}, and P={S → S0, S → 0}.
Clearly S ⇒ 0, so 0 ∈ L(G).
Also, S ⇒ S0 ⇒ 00, so 00 ∈ L(G).
And, S ⇒ S0 ⇒ S00 ⇒ 000, so 000 ∈ L(G).
It is not hard to see that L(G) is the language consisting of all strings with 1 or more 0s.
16
Example 2
Let G be the grammar with V={S,0,1}, T={0,1}, and P={S → SS, S → 1, S → 0}.
Clearly S ⇒ 0, so 0 ∈ L(G).
Also, S ⇒ 1, so 1 ∈ L(G).
Since S ⇒ SS ⇒ S1 ⇒ 01, we have 01 ∈ L(G).
In general, we can get a sequence of Ss, and replace each with either 0 or 1. Given this fact, it is easy to see that L(G)={0,1}+, the set of all non-empty binary strings.
17
Example 3
Let G be the grammar with V={S,A,B,0,1}, T={0,1}, and P={S → AB, B → BB, A → AA, A → 0, B → 1}.
Clearly S ⇒ AB ⇒ 0B ⇒ 01, so 01 ∈ L(G).
Also, S ⇒ AB ⇒ AAB ⇒ 0AB ⇒ 00B ⇒ 001, so 001 ∈ L(G).
Similarly, we can get 011, 0011, 0001, etc.
In general, we can get a sequence of n 0s followed by m 1s, where n>0, m>0. Thus L(G)={0^n 1^m | m and n are positive integers}.
18
Type 0 Grammars
Type 0 grammars have no restrictions on the types of productions that are allowed. Thus type 0 grammars are just phrase-structure grammars. This is not too exciting, so we will move on to type 1 grammars.
19
Type 1 Grammars
In a type 1 grammar, productions are of the form αXβ → αγβ, where X∈N and α,β,γ∈V* with γ≠λ (or S → λ, but ignore this for now).
Thus, a production can only be applied if the symbol X is surrounded by α and β. In other words, the production can only be applied in a certain context. This is why type 1 grammars are also called context-sensitive grammars.
20
Type 2 Grammars
Productions are of the form X → α, where X∈N and α∈V*.
Thus, if X is in a string, we can replace X with α no matter what surrounds X. In other words, the context in which X appears does not matter. This is why type 2 grammars are called context-free grammars. Context-free grammars produce context-free languages.
21
Type 3 Grammars
Productions are of the form
X → a, where X∈N and a∈T
X → aY, where X,Y∈N and a∈T
S → λ
Type 3 grammars are called regular grammars. Regular grammars produce regular languages. It is easy to see that a type 3 grammar is a type 2 grammar.
22
Types of Grammars
Type  Productions allowed
0     Almost any kind allowed
1     αXβ → αγβ, where X∈N, α,β,γ∈V*, γ≠λ; S → λ
2     X → α, where X∈N and α∈V*
3     X → a, where X∈N and a∈T; X → aY, where X,Y∈N and a∈T
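The production forms for each type can be checked mechanically. The Python sketch below returns the most restrictive type whose form every production of a grammar satisfies. It is only an illustration under our own assumptions: symbols are single characters, nonterminals and terminals are passed as sets, λ is written as the empty string, and S → λ is accepted for types 1 and 3. The function name `grammar_type` is hypothetical, and the code does not check that right-hand sides use only symbols from V.

```python
def grammar_type(productions, nonterminals, terminals, start="S"):
    """Return the largest k in {3, 2, 1, 0} whose production form fits every rule."""

    def is_type3(lhs, rhs):
        if lhs not in nonterminals:
            return False
        if (lhs, rhs) == (start, ""):
            return True                                   # S -> empty string
        if len(rhs) == 1:
            return rhs in terminals                       # X -> a
        return (len(rhs) == 2 and rhs[0] in terminals
                and rhs[1] in nonterminals)               # X -> aY

    def is_type2(lhs, rhs):
        return lhs in nonterminals                        # X -> any string

    def is_type1(lhs, rhs):
        if (lhs, rhs) == (start, ""):
            return True
        for i, symbol in enumerate(lhs):
            if symbol in nonterminals:
                left, right = lhs[:i], lhs[i + 1:]        # context around X
                if (len(rhs) >= len(lhs)                  # replacement is nonempty
                        and rhs.startswith(left) and rhs.endswith(right)):
                    return True
        return False

    for k, check in ((3, is_type3), (2, is_type2), (1, is_type1)):
        if all(check(lhs, rhs) for lhs, rhs in productions):
            return k
    return 0


T = {"0", "1"}
print(grammar_type([("S", "0A"), ("A", "0A"), ("A", "1A"), ("A", "1")], {"S", "A"}, T))  # 3
print(grammar_type([("S", "AB"), ("A", "AA"), ("B", "BB"), ("A", "0"), ("B", "1")],
                   {"S", "A", "B"}, T))                                                  # 2
```

The result describes the grammar as written, not the language: a language generated by a type 2 grammar may still have a type 3 grammar.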
23
Types of Grammars
The following summarizes the relationships between the types of grammars:
Type 0: phrase-structure
Type 1: context-sensitive
Type 2: context-free
Type 3: regular
24
Regular Grammar Example
Let G be the grammar with V={S,A,0,1}, T={0,1}, and P={S → 0A, A → 0A, A → 1A, A → 1}.
We can determine what the language is by constructing a few words.
S ⇒ 0A ⇒ 01
S ⇒ 0A ⇒ 00A ⇒ 001
S ⇒ 0A ⇒ 01A ⇒ 011
S ⇒ 0A ⇒ 00A ⇒ 000A ⇒ 0001
S ⇒ 0A ⇒ 00A ⇒ 001A ⇒ 0011
S ⇒ 0A ⇒ 01A ⇒ 010A ⇒ 0101
S ⇒ 0A ⇒ 01A ⇒ 011A ⇒ 0111
We can see that in general, L(G) is the set of binary strings beginning with 0 and ending with 1.
25
Limitations
Problem: Find a regular grammar that generates the following language: L={0^n 1^n | n=0,1,2,…}
Solution: It cannot be done.
Proof: We will see this later.
Can you describe L with a regular expression?
Can you give a finite-state automaton that recognizes L?
Can you give any grammar that generates L?
26
Regular Languages and Sets
Theorem: Let A be a subset of V*. Then A is a regular language if and only if A is a regular set.
In other words, a language defined by a regular grammar can also be defined by a regular expression, and vice-versa.
Example: We just saw that the grammar with V={S,A,0,1}, T={0,1}, and P={S → 0A, A → 0A, A → 1A, A → 1} generates the set of binary strings beginning with 0 and ending with 1. Recall that the regular set defined by 0(0∪1)*1 is also the set of all binary strings beginning with 0 and ending with 1.
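The claim about the regular expression can be spot-checked by brute force on short strings. The Python sketch below uses the standard `re` module, where union is written `|`, and compares the pattern against the plain-language description for every binary string up to a chosen length. The helper names are ours, and a finite check like this is evidence, not a proof.

```python
import re
from itertools import product

# Python spelling of the regular expression "0, then any mix of 0s and 1s, then 1".
PATTERN = re.compile(r"0(0|1)*1")


def starts_0_ends_1(word):
    """The description in words: begins with 0 and ends with 1."""
    return word.startswith("0") and word.endswith("1")


def agrees_up_to(max_len):
    """Check that the pattern and the description agree on all short binary strings."""
    for n in range(max_len + 1):
        for letters in product("01", repeat=n):
            word = "".join(letters)
            if bool(PATTERN.fullmatch(word)) != starts_0_ends_1(word):
                return False
    return True


print(agrees_up_to(8))  # True
```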
27
Grammars, Expressions, and Automata
Consider the set A={binary strings which start with 0 and end with 1}.
We have seen previously that A is recognized by a finite-state automaton.
We just saw that A is generated by the grammar with V={S,A,0,1}, T={0,1}, and P={S → 0A, A → 0A, A → 1A, A → 1}.
We also saw that A is defined by the regular expression 0(0∪1)*1.
This is no coincidence, as we will see next.
28
Grammars, Expressions, and Automata
Theorem: Let L be a language. The following three statements are equivalent:
L is a regular set (that is, L is generated by a regular expression)
L is a regular language (that is, L is generated by a regular grammar)
L is recognized by a finite-state automaton
Put another way, L is a regular set if and only if L is a regular language if and only if L is recognized by a finite-state automaton.
In other words, regular sets, regular languages, and languages recognized by finite-state automata are all the same thing.
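To make the automaton side of the theorem concrete, here is a small Python sketch of a deterministic finite-state automaton for the binary strings that start with 0 and end with 1. The automaton from the earlier slides is not reproduced in this transcript, so the states and their names below are our own construction, given as one possible machine for that language.

```python
# Transition table: (current state, input symbol) -> next state.
TRANSITIONS = {
    ("start", "0"): "mid",    ("start", "1"): "dead",    # must begin with 0
    ("mid", "0"): "mid",      ("mid", "1"): "accept",    # last symbol read was 0
    ("accept", "0"): "mid",   ("accept", "1"): "accept", # last symbol read was 1
    ("dead", "0"): "dead",    ("dead", "1"): "dead",     # began with 1: reject
}


def accepts(word):
    """Run the automaton on a binary string and report whether it ends in 'accept'."""
    state = "start"
    for symbol in word:
        state = TRANSITIONS[(state, symbol)]
    return state == "accept"


print(accepts("0011"))  # True
print(accepts("010"))   # False
```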
29
Grammar Applications
Context-free grammars are used to define the syntax of most programming languages.
Regular grammars are used in several applications, including the following:
Searching text for patterns
Lexical analysis (during program compilation)
Efficient algorithms exist to determine if a string is in a context-free or regular language. This is important for tasks like determining whether or not a program is syntactically valid.
30
Backus-Naur Form
Backus-Naur form (BNF) is a more compact representation of productions in a type 2 grammar.
All productions with the same left hand side are combined into one production.
The symbol → is replaced with ::=
All nonterminals are enclosed in < and >
The right hand sides of the various productions are combined, and separated by |
31
Backus-Naur Form Example
Consider the set of productions
S → AB
B → BB
A → AA
A → 0
B → 1
In BNF, they are represented by
<S> ::= <A><B>
<B> ::= <B><B> | 1
<A> ::= <A><A> | 0
32
Backus-Naur Form Example 2
The Backus-Naur form for the production of a signed integer is
<signed integer> ::= <sign><integer>
<sign> ::= + | -
<integer> ::= <digit> | <digit><integer>
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
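Each BNF rule for a signed integer can be turned into one small recognizer function, with the recursion in the integer rule becoming a recursive call. The Python sketch below is one such transcription; the function names are ours. Note that this grammar requires a sign, so an unsigned string such as 123 is rejected.

```python
def is_digit(s):
    # <digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
    return len(s) == 1 and s in "0123456789"


def is_integer(s):
    # <integer> ::= <digit> | <digit><integer>
    if is_digit(s):
        return True
    return len(s) > 1 and is_digit(s[0]) and is_integer(s[1:])


def is_sign(s):
    # <sign> ::= + | -
    return s in ("+", "-")


def is_signed_integer(s):
    # <signed integer> ::= <sign><integer>
    return len(s) >= 2 and is_sign(s[0]) and is_integer(s[1:])


print(is_signed_integer("+123"))  # True
print(is_signed_integer("123"))   # False
```

The recursion mirrors the grammar for clarity; for very long inputs a loop over the characters would avoid Python's recursion limit.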
33
Backus-Naur Form Applications
Specifying the syntax for programming languages, including
  Java
  LISP
Specifying database languages
  SQL
Specifying markup languages
  XML