Presentation is loading. Please wait.

Presentation is loading. Please wait.

Parsing. Language A set of strings from an alphabet which may be empty, finite or infinite. A language is defined by a grammar and we may also say that.

Similar presentations


Presentation on theme: "Parsing. Language A set of strings from an alphabet which may be empty, finite or infinite. A language is defined by a grammar and we may also say that."— Presentation transcript:

1 Parsing

2 Language A set of strings from an alphabet which may be empty, finite or infinite. A language is defined by a grammar and we may also say that a grammer generates a language. Basic Definitions Symbol A symbol is an entity that has no meaning by itself (A character). Alphabet An alphabet is a finite set of symbols. Examples: B = {0, 1} C = {a, b, c} E = {a,b,…,z,A,B,…,Z,1,2,…,9,…} String (Sentence) A finite sequence of symbols from an alphabet. Examples: 0, 1, 111, 10110101 abc, cbba, bbc, aaabbbccc English, Mary, school, John plays football.

3 Grammar A grammar G is defined as a four tuple (V, T, P, S), where: V={S} T={a,b} P: S -> aSb S -> ab This grammer generates strings containing an equal number of a’s and b’s starting with a’s at the beginning of the string and b’s following the a’s. To Generate the string aaaaabbbbb: S => aaSbb => aaaSbbb => aaaaSbbbb => aaaaabbbbb V is a set of symbols called variables (S, A, B, …) T is a set of symbols called terminal ( 0, 1, a, b, …) S is the start variable (an element of V) P is a set of grammer rules An example grammer:

4 Scanning (Lexical Analysis) Phases of compilation Parsing (Syntax Analysis) Intermediate Code Gen. (Semantics Analysis) Optimization Code Generation Tokens ([x, +, ‘(’, y, *, x, ‘)’] ) Parse tree Intermediate code Optimized code Object Code x+(y*x)

5 A parser is a program which embodies the rules of a grammar. Parsing is the task of showing whether or not a string of symbols conforms to the syntactic rules (grammer) of a language. If the string conforms to the rules we say that the grammar generates the string. Scanner while A > = B Do [while, A, > =, B, Do] Parser Yes, this conforms to the syntactic rules

6 A simple grammar for expressions expr ::= term | term addop expr term ::= factor | factor multop term factor ::= ‘x’ | ‘y’ | lbr expr rbr addop ::= ‘+’ | ‘-’ multop ::= ‘*’ | ‘/’ lbr ::= ‘(’ rbr ::= ‘)’ E -> T E -> T A E T -> F T -> F M T … Parsing Expressions Arithmetic Expressions x x+y x+(y*x)

7 Representing this grammar in Prolog We should write a relation expr, such that expr(E) succeeds if E can be parsed as an expression. Suppose we represent an expression as a list of symbols as follows: [x, +, ‘(’, y, *, x, ‘)’] How can we parse this as an expression? The grammar tells us that there are 2 ways that a list of symbols can represent an expression. expr ::= term | term addop expr The first clause is simple: expr(E) :- term(E). % E is an expression if it is a term

8 The second clause says that a list of symbols is an expression if it starts with a term followed by an addop and ends with an expression. So we must write code to find the first part of the list which can be parsed as a term. We will use append(L1, L2, L3) to split the list up and find all the possible first parts of the list: ?- append([a,b],[c,d],L). L = [a, b, c, d] ?- append(L1,L2,[a,b,c,d]). L1 = [] L2 = [a, b, c, d] ; L1 = [a] L2 = [b, c, d] ; L1 = [a, b] L2 = [c, d] ; L1 = [a, b, c] L2 = [d] ; L1 = [a, b, c, d] L2 = [] ; No

9 append(Start, Finish, [x, +, '(', y, *, x, ')']) Start = [], Finish = [x, +, '(', y, *, x, ')']; Start = [x], Finish = [+, '(', y, *, x, ')']; Start = [x, +], Finish = ['(', y, *, x, ')']; Start = [x, +, '('], Finish = [y, *, x, ')']; Start = [x, +, '(', y], Finish = [*, x, ')']; Start = [x, +, '(', y, *], Finish = [x, ')']; Start = [x, +, '(', y, *, x], Finish = [')']; Start = [x, +, '(', y, *, x, ')'], Finish = []; No [x, +, ‘(’, y, *, x, ‘)’]

10 expr(L) :- append(First, Rest, L), % Split the list up term(First), % find a term at the start append(L1, L2, Rest), %split the Rest up addop(L1), %find an addop expr(L2). %find an expr expr ::= term | term addop expr

11 term(L) :- append(First, Rest, L), % Split the list up factor(First), % find a factor at the start append(L1, L2, Rest), %split the Rest up multop(L1), %find a multop term(L2). %find a term Term term ::= factor | factor multop term Clause 1 term(L):- factor(L). Clause 2

12 factor([x]). factor([y]). factor(L) :- append(First, Rest, L), lbr(First), append(L1, L2, Rest), expr(L1), rbr(L2). Factors factor ::= ‘x’ | ‘y’ | lbr expr rbr A list of symbols is a factor if it is ‘x’ or ‘y’

13 Parsing operators and brackets addop ::= ‘+’ | ‘-’ multop ::= ‘*’ | ‘/’ lbr ::= ‘(’ rbr ::= ‘)’ Defining operators and the brackets addop([+]). addop([-]). multop([*]). multop([-]). lbr(['(']). rbr([')']).

14 ?- expr([x, +, '(', y, *, x, ')']). Yes ?- expr([y, -, '(', x, +, y, ')']). Yes ?- expr([y, -, '(', x, +, y]). No ?- expr([y,/, '(', x, +, y,')']). Yes ?- expr([y,/, (, x, +, y,)]). ERROR: Syntax error: Operator expected ERROR: expr([y,/, (, ERROR: ** here ** ERROR: x, +, y,)]). ?- Now we can user expr to parse strings of symbols as expressions: x+(y*x) x-(x+y) y-(x+y y/(x+y)

15 It is possible with most Prolog implementations to write a grammar directly in Prolog and use a special predicate to parse strings. we have seen that we can construct a parser in Prolog from a grammar. It is even possible to write a Prolog program which takes a grammar and produces a parser. Such a program is called a parser generators.

16 expr --> term. expr --> term, addop, expr. term --> factor. term --> factor, multop, term. factor --> [x]. factor --> [y]. factor --> lbr, expr, rbr. addop --> ['+']. addop --> ['-']. multop --> ['*']. multop --> ['/']. lbr --> ['(']. rbr --> [')']. Here is the grammar that we presented earlier rewritten using Prolog grammer rules:

17 The Prolog grammar rules are just a bunch of facts, like any others we might have in our database. To use them for parsing we need to give Prolog the query: phrase(P, List) ?- phrase(expr, [y, '*', '(', x, '+', x, ')']) Yes ?- phrase(factor, [y]) Yes ?- phrase(rbr, [')']) Yes ?- phrase(factor, [y, '*', x]) No ?- phrase(expr, [y, '*', x]). Yes ?- phrase(factor, ['(',y, '*', x,')']). Yes ?- ?- phrase(expr, L). L = [x] ; L = [y] ; L = ['(', x, ')'] ; L = ['(', y, ')'] ;

18 A Grammar for a very small fragment of English sentence --> noun_phrase, verb_phrase. noun_phrase --> determiner, noun. noun_phrase --> proper_noun. determiner --> [the]. determiner --> [a]. proper_noun --> [pedro]. noun --> [man]. noun --> [apple]. verb_phrase --> verb, noun_phrase. verb_phrase --> verb. verb --> [eats]. verb --> [sings].

19 ?- phrase(sentence, [the, man, eats]). yes ?- phrase(sentence, [the, man, eats, the, apple]). yes ?- phrase(sentence, [the, apple, eats, a, man]). yes ?- phrase(sentence, [pedro, sings, the, pedro]). no ?- phrase(sentence,[eats, apple, man]). no ?- phrase(sentence,L).

20 L = [the, man, eats, the, man] ; L = [the, man, eats, the, apple] ; L = [the, man, eats, a, man] ; L = [the, man, eats, a, apple] ; L = [the, man, eats, pedro] ; L = [the, man, sings, the, man] ; L = [the, man, sings, the, apple] ; L = [the, man, sings, a, man] ; L = [the, man, sings, a, apple] ; L = [the, man, sings, pedro] ; L = [the, man, eats] ; L = [the, man, sings] ; L = [the, apple, eats, the, man] ; L = [the, apple, eats, the, apple] ; L = [the, apple, eats, a, man] ; L = [the, apple, eats, a, apple] ; L = [the, apple, eats, pedro] ; L = [the, apple, sings, the, man] ; L = [the, apple, sings, the, apple] ; L = [the, apple, sings, a, man] ;

21 L = [the, apple, sings, a, apple] ; L = [the, apple, sings, pedro] ; L = [the, man, sings, pedro] ; L = [the, man, eats] ; L = [the, man, sings] ; L = [the, apple, eats, the, man] ; L = [the, apple, eats, the, apple] ; L = [the, apple, eats, a, man] ; L = [the, apple, eats, a, apple] ; L = [the, apple, eats, pedro] ; L = [the, apple, sings, the, man] ; L = [the, apple, sings, the, apple] ; L = [the, apple, sings, a, man] ; L = [the, apple, sings, a, apple] ; L = [the, apple, sings, pedro] ; L = [a, apple, eats, the, man] ; L = [a, apple, eats, the, apple] ; L = [a, apple, eats, a, man] ; L = [a, apple, eats, a, apple] ; L = [a, apple, eats, pedro] ; L = [a, apple, sings, the, man] ;

22 L = [a, apple, sings, the, apple] ; L = [a, apple, sings, a, man] ; L = [a, apple, sings, a, apple] ; L = [a, apple, sings, pedro] ; L = [a, apple, eats] ; L = [a, apple, sings] ; L = [pedro, eats, the, man] ; L = [pedro, eats, the, apple] ; L = [pedro, eats, a, man] ; L = [pedro, eats, a, apple] ; L = [pedro, eats, pedro] ; L = [pedro, sings, the, man] ; L = [pedro, sings, the, apple] ; L = [pedro, sings, a, man] ; L = [pedro, sings, a, apple] ; L = [pedro, sings, pedro] ; L = [pedro, eats] ; L = [pedro, sings] ; No ?-


Download ppt "Parsing. Language A set of strings from an alphabet which may be empty, finite or infinite. A language is defined by a grammar and we may also say that."

Similar presentations


Ads by Google