1 CS 363 Comparative Programming Languages
Lecture 3: Syntax & Notation

2 Topics
The General Problem of Describing Syntax
Formal Methods of Describing Syntax
  Context-Free Grammars, BNF
  Parse Trees
CS 363 Spring 2005 GMU

3 Introduction
Who must use language definitions:
  Language designers
  Implementors
  Programmers (the users of the language)
Syntax (our focus today): the form or structure of the expressions, statements, and program units
Semantics: the meaning of the expressions, statements, and program units

4 What is a language?
Alphabet (Σ): a finite set of basic syntactic elements (characters, tokens). The Σ of C++ includes {while, for, +, identifiers, integers, …}
Sentence: a finite sequence of elements of Σ. It can be λ, the empty string (some texts use ε for the empty string). A legal C++ program is a single sentence in that language.
Language: a possibly infinite set of sentences over some alphabet. It can be { }, the empty language. The set of all legal C++ programs defines the C++ language.

5 Suppose Σ = {a,b,c}. Some languages over Σ could be:
{aa,ab,ac,bb,bc,cc}
{ab,abc,abcc,abccc, . . .}
{ λ }, where λ (ε) is the empty string (length = 0)
{ }
{a,b,c,λ}
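The example languages above can be modelled directly as sets of strings. The following is a minimal Python sketch, not part of the original slides: finite languages become `set` objects, and the infinite language {ab, abc, abcc, ...} becomes a membership predicate. The names `SIGMA`, `sentences_up_to`, and `in_ab_c_star` are made up for illustration, and the empty Python string stands in for λ.

```python
from itertools import product

SIGMA = ("a", "b", "c")  # the alphabet from the slide


def sentences_up_to(max_len):
    """Yield every sentence over SIGMA of length 0..max_len ("" plays the role of λ)."""
    for n in range(max_len + 1):
        for combo in product(SIGMA, repeat=n):
            yield "".join(combo)


# Finite languages can be written out as plain sets.
pairs = {"aa", "ab", "ac", "bb", "bc", "cc"}
only_empty_string = {""}   # { λ }: exactly one sentence, of length 0
empty_language = set()     # { }: no sentences at all


def in_ab_c_star(sentence):
    """Membership test for the infinite language {ab, abc, abcc, ...}."""
    return sentence.startswith("ab") and set(sentence[2:]) <= {"c"}


# Members of the infinite language among all sentences of length <= 4.
short_members = [s for s in sentences_up_to(4) if in_ab_c_star(s)]
print(short_members)
```

Note the difference between `only_empty_string` and `empty_language`: the first contains one sentence, the second contains none.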

6 Recognizing Languages
Typically the task of a compiler:
  Find tokens (elements of Σ) in the input
  See if the tokens are in an appropriate order
  Determine what that token ordering means
All of this must be formally specified.

7 A Typical Compiler Architecture
[Figure: compiler pipeline. Source language → Scanner (lexical analysis) → tokens → Parser (syntax analysis) → syntactic structure → Semantic Analysis (IC generator) → syntactic/semantic structure → Code Optimizer → Code Generator, with a Symbol Table alongside the phases.]

8 Token
lexeme: an indivisible string in an input language, e.g. while, (, main, …
token: a (possibly infinite) set of lexemes defining an atomic element with a defined meaning
  while_token = {"while"}
  identifier_token = {"main", "x", … }
Tokens are often describable using a pattern. The language of tokens is regular.

9 Lexical Analysis
Break the input string of characters into tokens:
  while (a < limit) { a=a + 1; }
Remove white space and comments.
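One way to see lexical analysis in action is a small regular-expression scanner. The sketch below is an illustration only: the token names and patterns in `TOKEN_SPEC` are assumptions chosen for this example, not the course's token set, and `tokenize` is a hypothetical helper. It relies on the fact from the previous slide that token languages are regular.

```python
import re

# Hypothetical token classes and patterns, chosen only for this sketch.
TOKEN_SPEC = [
    ("COMMENT", r"/\*.*?\*/|//[^\n]*"),
    ("WS", r"\s+"),
    ("WHILE", r"while\b"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("INT", r"\d+"),
    ("OP", r"[<>=+\-*/]"),
    ("PUNCT", r"[(){};]"),
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
    re.DOTALL,
)


def tokenize(text):
    """Return (token, lexeme) pairs, dropping white space and comments."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if m.lastgroup not in ("WS", "COMMENT"):
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


for token, lexeme in tokenize("while (a < limit) { a=a + 1; }"):
    print(token, lexeme)
```

Listing `WHILE` before `IDENT` is what makes the keyword win over the identifier pattern; the `\b` keeps a name such as `whilex` an identifier.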

10 Describing Language Syntax
Enumeration: what are all the possible legal token orderings?
Formal approaches to describing syntax:
  Recognizers (used in compilers): "Is the given sentence in the language?"
  Generators: generate the sentences of a language

11 Metalanguages for Describing Syntax
A metalanguage is a language used to describe another language.
Abstractions are used to represent classes of syntactic structures; they act like syntactic variables (also called nonterminal symbols).
The following metalanguages define a class of languages called context-free languages:
  Context-Free Grammars (Noam Chomsky, mid-1950s)
  Backus-Naur Form or BNF (invented in 1959 by John Backus to describe Algol 58)

12 Backus-Naur Form (BNF)
<while_stmt> → while ( <logic_expr> ) <stmt>
This is a rule describing the structure of a while statement.
Non-terminals are placeholders for other rules: <while_stmt>, <logic_expr>, <stmt>
Tokens (terminal symbols) are part of the language alphabet.

13 BNF Examples
Vt = {+,-,0..9}, Vn = {<L>,<D>}, s = {<L>}
  <L> → <L> + <D> | <L> - <D> | <D>
  <D> → 0 | … | 9
Vt = {(,)}, Vn = {<L>}, s = {<L>}
  <L> → ( <L> ) <L>
  <L> → λ
Note the recursion: <L> appears on the right-hand side of its own rules.

14 BNF Examples
Vt = {a,b,c,d,;,=,+,-,const}, Vn = {<program>, <stmts>, <stmt>, <var>, <expr>, <term>}
  <program> → <stmts>
  <stmts> → <stmt> | <stmt> ; <stmts>
  <stmt> → <var> = <expr>
  <var> → a | b | c | d
  <expr> → <term> + <term> | <term> - <term>
  <term> → <var> | const

15 Applying BNF rules
Definition: Given a string α A β and a production A → γ, we can replace A with γ:
  α A β ⇒ α γ β is a single-step derivation.
(α, β, and γ are strings of zero or more terminals/non-terminals.)
Examples:
  <L> + <D> ⇒ <L> - <D> + <D>, using <L> → <L> - <D>
  ( <L> ) ( <L> ) ⇒ ( ( <L> ) <L> ) ( <L> ), using <L> → ( <L> ) <L>
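A single derivation step is just a list splice. The sketch below is illustrative only: it represents a sentential form as a Python list of symbols and a production as a `(lhs, rhs)` pair, and `derive_step` is a hypothetical helper name. It reproduces the two examples from this slide.

```python
def derive_step(form, index, production):
    """Apply production (lhs, rhs) to the non-terminal at form[index]."""
    lhs, rhs = production
    if form[index] != lhs:
        raise ValueError(f"symbol at position {index} is {form[index]!r}, not {lhs!r}")
    # alpha A beta  =>  alpha gamma beta
    return form[:index] + list(rhs) + form[index + 1:]


MINUS = ("<L>", ["<L>", "-", "<D>"])     # <L> -> <L> - <D>
NEST = ("<L>", ["(", "<L>", ")", "<L>"])  # <L> -> ( <L> ) <L>
EMPTY = ("<L>", [])                       # <L> -> λ (empty right-hand side)

print(" ".join(derive_step(["<L>", "+", "<D>"], 0, MINUS)))
print(" ".join(derive_step(["(", "<L>", ")", "(", "<L>", ")"], 1, NEST)))
```

A λ-production is simply an empty right-hand side, so applying `EMPTY` deletes the non-terminal from the form.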

16 Derivations
Definition: A sequence of rule applications
  w0 ⇒ w1 ⇒ … ⇒ wn
is a derivation of wn from w0 (written w0 ⇒* wn).
Example:
  <L>
  ⇒ ( <L> ) <L>    production <L> → ( <L> ) <L>
  ⇒ ( ) <L>        production <L> → λ
  ⇒ ( )            production <L> → λ
so <L> ⇒* ().
If wi has non-terminal symbols, it is referred to as a sentential form.

17 Derivation
A sentence is a sentential form that has only terminal symbols.
A leftmost derivation is one in which the leftmost nonterminal in each sentential form is the one that is expanded.
A derivation may be neither leftmost nor rightmost.

18 Derivation of (())()
Grammar: <L> → (<L>)<L>, <L> → λ
  <L>
  ⇒ (<L>)<L>
  ⇒ (<L>)(<L>)<L>
  ⇒ (<L>)(<L>)
  ⇒ ((<L>)<L>)(<L>)
  ⇒ (()<L>)(<L>)
  ⇒ (()<L>)()
  ⇒ (())()
Each step uses either <L> → (<L>)<L> or <L> → λ, so <L> ⇒* (())().

19 Same String, Leftmost Derivation
Grammar: <L> → (<L>)<L>, <L> → λ
  <L>
  ⇒ (<L>)<L>        production <L> → (<L>)<L>
  ⇒ ((<L>)<L>)<L>   production <L> → (<L>)<L>
  ⇒ (()<L>)<L>      production <L> → λ
  ⇒ (())<L>
  ⇒ (())(<L>)<L>
  ⇒ (())()<L>
  ⇒ (())()
<L> ⇒* (())()

20 Same String, Rightmost Derivation
Grammar: <L> → (<L>)<L>, <L> → λ
  <L>
  ⇒ (<L>)<L>        production <L> → (<L>)<L>
  ⇒ (<L>)(<L>)<L>   production <L> → (<L>)<L>
  ⇒ (<L>)(<L>)      production <L> → λ
  ⇒ (<L>)()
  ⇒ ((<L>)<L>)()
  ⇒ ((<L>))()
  ⇒ (())()
<L> ⇒* (())()
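The leftmost and rightmost derivations of (())() can be replayed mechanically. The following sketch is an illustration under simplifying assumptions: sentential forms are plain strings, the single character `L` stands for the non-terminal <L>, and `PRODUCTIONS` and `derive` are made-up names. For this particular string the same sequence of production choices happens to work in both orders; only the intermediate sentential forms differ.

```python
PRODUCTIONS = {1: "(L)L", 2: ""}  # 1: L -> ( L ) L,  2: L -> λ


def derive(choices, leftmost=True):
    """Apply the chosen productions, always expanding the leftmost (or rightmost) L."""
    form = "L"
    forms = [form]
    for choice in choices:
        pos = form.find("L") if leftmost else form.rfind("L")
        if pos < 0:
            raise ValueError("no non-terminal left to expand")
        form = form[:pos] + PRODUCTIONS[choice] + form[pos + 1:]
        forms.append(form)
    return forms


CHOICES = [1, 1, 2, 2, 1, 2, 2]
print(" => ".join(derive(CHOICES, leftmost=True)))
print(" => ".join(derive(CHOICES, leftmost=False)))
```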

21 L(G), the language generated by grammar G, is {w in Vt* : s ⇒* w for start symbol s}
Both () and (())() are in L(G) for the previous grammar.

22 Parse Trees
The parse tree for some string in some language is defined by the grammar G as follows:
  The root is the start symbol of G.
  The leaves are terminals or λ. When visited from left to right, the leaves form the input string.
  The interior nodes are non-terminals of G.
  For every non-terminal A in the tree with children B1 … Bk, there is some production A → B1 … Bk.
If a string is in the given language, a parse tree must exist.
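The parse-tree conditions above can be checked in a few lines. This is a sketch under stated assumptions: a tree node is a `(symbol, children)` tuple, the grammar is the parenthesis grammar <L> → ( <L> ) <L> | λ with `L` standing for <L>, and a λ leaf is modelled as an `L` node with no children. The helper names `node`, `leaf_string`, and `conforms` are invented for this example.

```python
GRAMMAR = {"L": [["(", "L", ")", "L"], []]}  # [] is the λ production


def node(symbol, *children):
    return (symbol, list(children))


def leaf_string(tree):
    """Read the leaves left to right; an L with no children contributes λ, i.e. ""."""
    symbol, children = tree
    if symbol in GRAMMAR:
        return "".join(leaf_string(child) for child in children)
    return symbol


def conforms(tree):
    """Every interior node A with children B1..Bk must match a production A -> B1..Bk."""
    symbol, children = tree
    if symbol not in GRAMMAR:
        return not children  # terminals must be leaves
    rhs = [child[0] for child in children]
    return rhs in GRAMMAR[symbol] and all(conforms(child) for child in children)


EMPTY = node("L")  # L -> λ
inner = node("L", node("("), EMPTY, node(")"), EMPTY)
tree = node("L", node("("), inner, node(")"),
            node("L", node("("), EMPTY, node(")"), EMPTY))

print(leaf_string(tree), conforms(tree))
```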

23 Parse Tree for (())()
[Figure: parse tree for (())() under the grammar <L> → ( <L> ) <L> | λ. The root is <L>, three interior nodes use <L> → ( <L> ) <L>, and four leaves are λ.]

24 Ambiguity
A grammar is ambiguous if there are at least two parse trees (or leftmost derivations) for some string in the language.
  E → E + E
  E → E * E
  E → 0 | … | 9
[Figure: two parse trees for 2 + 3 * 4, one grouping the string as 2 + (3 * 4) and the other as (2 + 3) * 4.]
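The ambiguity can be demonstrated by brute force: enumerate every parse tree of 2 + 3 * 4 under the grammar E → E + E | E * E | digit. The sketch below is illustrative, with made-up helper names `parses` and `evaluate`; trees are nested tuples `(left, operator, right)` and digits are leaves.

```python
def parses(tokens):
    """Yield every parse tree of tokens under E -> E + E | E * E | digit."""
    if len(tokens) == 1 and tokens[0].isdigit():
        yield tokens[0]
    # Try every operator position as the root of the tree.
    for i in range(1, len(tokens) - 1):
        if tokens[i] in ("+", "*"):
            for left in parses(tokens[:i]):
                for right in parses(tokens[i + 1:]):
                    yield (left, tokens[i], right)


def evaluate(tree):
    """Compute the value that a given tree shape implies."""
    if isinstance(tree, str):
        return int(tree)
    left, op, right = tree
    if op == "+":
        return evaluate(left) + evaluate(right)
    return evaluate(left) * evaluate(right)


trees = list(parses(list("2+3*4")))
for t in trees:
    print(t, "=", evaluate(t))
```

The two trees evaluate to different numbers, which is why an ambiguous grammar is a practical problem and not only a formal one.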

25 An Unambiguous Expression Grammar
Grammars can be written that enforce precedence:
  <expr> → <expr> + <term> | <term>
  <term> → <term> * <c> | <c>
  <c> → 0 | 1 | … | 9
[Figure: parse tree for 2 + 3 * 4 under this grammar. 3 * 4 forms a <term> subtree below the + node, so multiplication binds tighter than addition.]

26 Formal Methods of Describing Syntax
Operator associativity can also be indicated by a grammar:
  <expr> → <expr> + <expr> | const   (ambiguous)
  <expr> → <expr> + const | const    (unambiguous)
[Figure: parse tree for const + const + const under the unambiguous rule. The tree grows to the left, which makes + left-associative.]

27 EBNF
Extended BNF: shorthand for BNF
Optional parts are placed in brackets ([ ]):
  <proc_call> → ident [ ( <expr_list> ) ]
Put alternative parts of RHSs in parentheses and separate them with vertical bars:
  <term> → <term> (+ | -) const
Put repetitions (0 or more) in braces ({ }):
  <ident> → letter {letter | digit}

28 BNF and EBNF
BNF:
  <expr> → <expr> + <term> | <expr> - <term> | <term>
  <term> → <term> * <factor> | <term> / <factor> | <factor>
EBNF:
  <expr> → <term> {(+ | -) <term>}
  <term> → <factor> {(* | /) <factor>}
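The EBNF form maps naturally onto loops: each `{ ... }` becomes a `while`. The following is a minimal sketch, not the course's code. The slides do not define <factor>, so this example assumes it is an unsigned integer literal; tokens are assumed to be separated by spaces, and all function names are invented here. Because each loop folds operands in from the left, the operators come out left-associative, and because <term> is parsed below <expr>, * and / bind tighter than + and -.

```python
def parse_expr(tokens, pos=0):
    """<expr> -> <term> {(+ | -) <term>}"""
    value, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        op = tokens[pos]
        rhs, pos = parse_term(tokens, pos + 1)
        value = value + rhs if op == "+" else value - rhs
    return value, pos


def parse_term(tokens, pos):
    """<term> -> <factor> {(* | /) <factor>}"""
    value, pos = parse_factor(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("*", "/"):
        op = tokens[pos]
        rhs, pos = parse_factor(tokens, pos + 1)
        value = value * rhs if op == "*" else value / rhs
    return value, pos


def parse_factor(tokens, pos):
    # Assumption for this sketch: <factor> is an unsigned integer literal.
    if pos < len(tokens) and tokens[pos].isdigit():
        return int(tokens[pos]), pos + 1
    raise SyntaxError(f"expected a number at token {pos}")


def evaluate(text):
    tokens = text.split()
    value, pos = parse_expr(tokens)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return value


print(evaluate("2 + 3 * 4"))
print(evaluate("8 - 3 - 2"))
```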

29 Lexical and Syntax Analysis
If a string is in a language, a parse tree can be derived for that string.
Problem: we need to go from a string of characters (input file) to a legal parse tree to show that the string is in the language.
From the introduction: compilers, interpreters, hybrid approaches.
Our focus: top-down parsing.

30 Parsing
Take a sequence of tokens and produce a parse tree.
Two general algorithms (methods): top-down, bottom-up.
The algorithms are derived from the CFG.
Note: we can't always derive an algorithm from a CFG.

31-38 Top Down
Grammar: <L> → (<L>)<L>, <L> → λ. String: (())()
[Figure sequence: the parse tree grows downward from the start symbol, one production per slide.]
  31: start symbol <L>
  32: expand with <L> → (<L>)<L>
  33: expand with <L> → (<L>)<L>
  34: expand with <L> → λ
  35: expand with <L> → λ
  36: expand with <L> → (<L>)<L>
  37: expand with <L> → λ
  38: expand with <L> → λ; the leaves now read (())()

39 Writing a recursive descent parser
Write a procedure for each non-terminal. Use the next token (lookahead) to choose which production for that non-terminal to 'mimic':
  for each non-terminal X, call procedure X()
  for each terminal X, call match(X)
match(symbol) {
  if (symbol == lookahead) lookahead = next_token();
  else error();
}
Function next_token() gets the next token from the lexical analyzer. It must be called once before the first parsing procedure is called, to get the first lookahead.

40 Simplified RDP Example
L → ( L ) L | λ
L() {
  if (lookahead == '(') {   /* L → ( L ) L */
    match('('); L(); match(')'); L();
  }
  else return;              /* L → λ */
}
main() {
  lookahead = next_token();
  L();
}
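For readers who want to run the parser, here is one possible Python rendering of the pseudocode above. It is a sketch, not the course's implementation: the `Parser` class and `accepts` wrapper are names introduced here, tokens are assumed to be single characters, `None` is used as an end-of-input marker, and the final end-of-input check in `accepts` is an addition that the slide's `main()` does not show.

```python
class Parser:
    def __init__(self, text):
        self.tokens = iter(text)
        self.lookahead = self.next_token()  # prime the lookahead before parsing

    def next_token(self):
        return next(self.tokens, None)  # None marks end of input

    def match(self, symbol):
        if self.lookahead == symbol:
            self.lookahead = self.next_token()
        else:
            raise SyntaxError(f"expected {symbol!r}, found {self.lookahead!r}")

    def L(self):
        if self.lookahead == "(":   # L -> ( L ) L
            self.match("(")
            self.L()
            self.match(")")
            self.L()
        # otherwise: L -> λ, consume nothing


def accepts(text):
    parser = Parser(text)
    try:
        parser.L()
    except SyntaxError:
        return False
    return parser.lookahead is None  # added check: all input must be consumed


for s in ["(())()", "", "(()", ")("]:
    print(repr(s), accepts(s))
```

Without the end-of-input check, a string such as `)(` would be "accepted" because `L()` can return immediately via the λ production.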

41-53 Tracing the Recursive Descent Parse
String: ( ( ) ) ( )
[Figure sequence: the lookahead pointer advances through the string one token at a time while the chain of active L() calls grows and shrinks. Calls that take the L → λ branch return immediately (shown as "call L() - return"). The finished parse tree has four λ leaves.]

54 Simplified RDP Example
L → ( L ) L | λ
L() {
  if (lookahead == '(') {   /* L → ( L ) L */
    match('('); L(); match(')'); L();
  }
  else return;              /* L → λ */
}
main() {
  lookahead = next_token();
  L();
}
The body of the function for a given non-terminal 'mimics' the productions.

55 Another Grammar
A → a B
A → b
A → c B B
B → a B
B → b A
A() {
  if (lookahead == 'a') { lookahead = next_token(); B(); }
  else if (lookahead == 'b') { lookahead = next_token(); }
  else if (lookahead == 'c') { lookahead = next_token(); B(); B(); }
  else error();
}
B() {
  if (lookahead == 'a') { lookahead = next_token(); B(); }
  else if (lookahead == 'b') { lookahead = next_token(); A(); }
  else error();
}
Key: finding the set of symbols (lookahead) that indicate which production to use!

56 How do we find the lookaheads?
Lookahead sets can be computed for some grammars from FIRST() sets:
  lookahead(A → α) = FIRST(α)
For this to work for a given grammar, the lookahead sets of the productions for any given non-terminal must be disjoint.

57 FIRST Sets
FIRST(α) is the set of all terminal symbols that can begin some sentential form derived from α:
  FIRST(α) = { a in Vt | α ⇒* a β } ∪ { λ } if α ⇒* λ
Example:
  <stmt> → simple | begin <stmts> end
  FIRST(<stmt>) = {simple, begin}
Remember, α is a string of zero or more terminals and nonterminals.

58 Computing FIRST sets
Initially FIRST(A) is empty.
For productions A → a β, where a in Vt:
  add { a } to FIRST(A)
For productions A → λ:
  add { λ } to FIRST(A)
For productions A → α B β, where α ⇒* λ and NOT (B ⇒* λ):
  add FIRST(αB) to FIRST(A)
For productions A → α, where α ⇒* λ:
  add FIRST(α) and { λ } to FIRST(A)

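59 To compute FIRST across strings of terminals and non-terminals:
FIRST(λ) = { λ }
FIRST(Aα) =
  { A }                          if A is a terminal
  (FIRST(A) − { λ }) ∪ FIRST(α)   if A ⇒* λ
  FIRST(A)                       otherwise
In the middle case λ is removed from FIRST(A) because Aα derives λ only if α does as well; FIRST(α) contributes λ in that situation.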

60 Example 1
S → a S e
S → B
B → b B e
B → C
C → c C e
C → d
FIRST(C) =
FIRST(B) =
FIRST(S) =

61 Example 1 (same grammar, solution)
FIRST(C) = {c,d}
FIRST(B) = {b,c,d}
FIRST(S) = {a,b,c,d}

62 Example 2
P → i | c | n T S
Q → P | a S | b S c S T
R → b | λ
S → c | R n | λ
T → R S q
FIRST(P) =
FIRST(Q) =
FIRST(R) =
FIRST(S) =
FIRST(T) =

63 Example 2 (same grammar, solution)
FIRST(P) = {i,c,n}
FIRST(Q) = {i,c,n,a,b}
FIRST(R) = {b,λ}
FIRST(S) = {c,b,n,λ}
FIRST(T) = {b,c,n,q}

64 Example 3
S → a S e | S T S
T → R S e | Q
R → r S r | λ
Q → S T | λ
FIRST(S) =
FIRST(R) =
FIRST(T) =
FIRST(Q) =

65 Example 3 (same grammar, solution)
FIRST(S) = {a}
FIRST(R) = {r,λ}
FIRST(T) = {r,a,λ}
FIRST(Q) = {a,λ}
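The three examples can be checked with a short fixed-point computation. The sketch below is one common way to compute FIRST sets, not necessarily the procedure used in the course: it repeatedly applies the string rule for FIRST until no set changes, removing λ from a nullable symbol's set before moving on to the next symbol. The grammar encoding (a dict from non-terminal to a list of right-hand sides, with `[]` for a λ production) and the names `first_of_string` and `compute_first` are assumptions made for this example.

```python
LAMBDA = "λ"


def first_of_string(symbols, first, nonterminals):
    """FIRST of a string of symbols, given the current FIRST sets of the non-terminals."""
    result = set()
    for symbol in symbols:
        if symbol not in nonterminals:
            result.add(symbol)       # a terminal begins the string
            return result
        result |= first[symbol] - {LAMBDA}
        if LAMBDA not in first[symbol]:
            return result            # this non-terminal cannot vanish
    result.add(LAMBDA)               # every symbol (or none at all) can vanish
    return result


def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:                   # iterate until a fixed point is reached
        changed = False
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of_string(rhs, first, grammar)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


EXAMPLE_1 = {
    "S": [["a", "S", "e"], ["B"]],
    "B": [["b", "B", "e"], ["C"]],
    "C": [["c", "C", "e"], ["d"]],
}
EXAMPLE_2 = {
    "P": [["i"], ["c"], ["n", "T", "S"]],
    "Q": [["P"], ["a", "S"], ["b", "S", "c", "S", "T"]],
    "R": [["b"], []],
    "S": [["c"], ["R", "n"], []],
    "T": [["R", "S", "q"]],
}
EXAMPLE_3 = {
    "S": [["a", "S", "e"], ["S", "T", "S"]],
    "T": [["R", "S", "e"], ["Q"]],
    "R": [["r", "S", "r"], []],
    "Q": [["S", "T"], []],
}

for name, sets in compute_first(EXAMPLE_3).items():
    print(name, sorted(sets))
```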

66 Bottom up Parsing (shift/reduce, LR)
Less intuitive but more efficient than top down.
Two actions:
  Shift: move one token from the input to the parse tree forest
  Reduce: merge 0 or more parse trees under a single parent

67-80 Bottom Up
Grammar: <L> → (<L>)<L>, <L> → λ. String: (())()
[Figure sequence: a forest of parse trees is built from the leaves upward, one action per slide.]
  67: start with an empty forest
  68: Shift (
  69: Shift (
  70: Reduce L → λ
  71: Shift )
  72: Reduce L → λ
  73: Reduce L → ( L ) L
  74: Shift )
  75: Shift (
  76: Reduce L → λ
  77: Shift )
  78: Reduce L → λ
  79: Reduce L → ( L ) L
  80: Reduce L → ( L ) L; a single tree rooted at L remains
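The action sequence above can be reproduced with a small hand-written shift/reduce recognizer. This is a sketch for this one grammar only, with the shift/reduce decisions coded by hand from the lookahead and the top of the stack; a real LR parser would take these decisions from generated tables. The function name `shift_reduce` and the action strings are invented for this example, and `None` is assumed as the end-of-input marker.

```python
def shift_reduce(text):
    """Return the list of actions that parses text, or raise SyntaxError."""
    stack, actions, pos = [], [], 0
    while True:
        lookahead = text[pos] if pos < len(text) else None
        if lookahead == "(":
            stack.append("(")
            pos += 1
            actions.append("shift (")
        elif lookahead not in (")", None):
            raise SyntaxError(f"unexpected character {lookahead!r}")
        elif not stack or stack[-1] in ("(", ")"):
            stack.append("L")            # an L is needed here and must be empty
            actions.append("reduce L -> λ")
        elif stack[-4:] == ["(", "L", ")", "L"]:
            del stack[-4:]
            stack.append("L")            # merge four trees under one parent
            actions.append("reduce L -> ( L ) L")
        elif lookahead == ")" and stack[-2:] == ["(", "L"]:
            stack.append(")")
            pos += 1
            actions.append("shift )")
        elif lookahead is None and stack == ["L"]:
            return actions               # accept: one tree, no input left
        else:
            raise SyntaxError("input is not in the language")


for step, action in enumerate(shift_reduce("(())()"), start=1):
    print(step, action)
```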

