Presentation is loading. Please wait.

Presentation is loading. Please wait.

Syntax and Semantics Structure of programming languages.

Similar presentations


Presentation on theme: "Syntax and Semantics Structure of programming languages."— Presentation transcript:

1 Syntax and Semantics Structure of programming languages

2 Introduction Syntax: the form or structure of the expressions, statements, and program units Semantics: the meaning of the expressions, statements, and program units

3 Introduction Syntax and semantics provide a language’s definition – Users of a language definition Other language designers Implementers Programmers (the users of the language)

4 Syntax of a Languages Motivation for using a subset of C: Grammar Language (pages) Reference Pascal5Jensen & Wirth C 6Kernighan & Richie C++22Stroustrup Java14Gosling, et. al. The grammar fits on one page, so it’s a far better tool for studying language design.

5 C Grammar: Statements Program  int main ( ) { Declarations Statements } Declarations  { Declaration } Declaration  Type Identifier [ [ Integer ] ] {, Identifier [ [ Integer ] ] } ; Type  int | bool | float | char Statements  { Statement } Statement  ; | Block | Assignment | IfStatement | WhileStatement Block  { Statements } Assignment  Identifier [ [ Expression ] ] = Expression ; IfStatement  if ( Expression ) Statement [ else Statement ] WhileStatement  while ( Expression ) Statement

6 C Grammar: Expressions Expression  Conjunction { || Conjunction } Conjunction  Equality { && Equality } Equality  Relation [ EquOp Relation ] EquOp  == | != Relation  Addition [ RelOp Addition ] RelOp  | >= Addition  Term { AddOp Term } AddOp  + | - Term  Factor { MulOp Factor } MulOp  * | / | % Factor  [ UnaryOp ] Primary UnaryOp  - | ! Primary  Identifier [ [ Expression ] ] | Literal | ( Expression ) | Type ( Expression )

7 Describing Syntax: Terminology A sentence is a string of characters over some alphabet A language is a set of sentences A lexeme is the lowest level syntactic unit of a language (e.g., *, sum, begin ) A token is a category of lexemes (e.g., identifier)

8 Formal Definition of Languages Recognizers –A recognition device reads input strings of the language and decides whether the input strings belong to the language –Example: syntax analysis part of a compiler Generators –A device that generates sentences of a language –One can determine if the syntax of a particular sentence is correct by comparing it to the structure of the generator

9 Lexical Analysis The process of converting a character stream into a corresponding sequence of meaningful symbols (called tokens or lexemes) is called tokenizing, lexing or lexical analysis. A program that performs this process is called a tokenizer, lexer, or scanner. In Scheme, we tokenize (set! x (+ x 1)) as ( set! x ( + x 1 ) ) ‏ Similarly, in Java, we tokenize System.out.println("Hello World!"); as System. out. println ( "Hello World!" ) ;

10 Lexical Analysis Lexical analyzer splits it into tokens –Token = sequence of characters (symbolic name) representing a single terminal symbol Identifiers: myVariable … Literals: 123 5.67 true … Keywords: char sizeof … Operators: + - * / … Punctuation: ;, } { … Discards whitespace and comments

11 Examples of Tokens in C TokensLexemes identifierAge, grade,Temp, zone, q1 number3.1416, -498127,987.76412097 string“A cat sat on a mat.”, “90183654” open parentheses( close parentheses) Semicolon; reserved word ifIF, if, If, iF

12 Whitespace Whitespace is any space, tab, end-of-line character (or characters), or character sequence inside a comment No token may contain embedded whitespace (unless it is a character or string literal) Example: >= one token > = two tokens

13 program confusing; const true = false; begin if (a<b) = true then f(a) else … Redefining Identifiers can be dangerous

14 14 Parsing Process Call the scanner to get tokens Build a parse tree from the stream of tokens –A parse tree shows the syntactic structure of the source program. Add information about identifiers in the symbol table Report error, when found, and recover from the error

15 1-15 Describing Syntax Backus-Naur Form and Context-Free Grammars (BNF) –Most widely known method for describing programming language syntax Extended BNF –Improves readability and writability of BNF

16 Backus-Naur Form (BNF) Backus-Naur Form (1959) –Invented by John Backus to describe Algol 58 –BNF is equivalent to context-free grammars –BNF is a meta-language used to describe another language –In BNF, abstractions are used to represent classes of syntactic structures--they act like syntactic variables (also called non-terminal symbols)

17 Example: Binary Digits Consider the grammar: binaryDigit  0 binaryDigit  1 or equivalently: binaryDigit  0 | 1 Here, | is a metacharacter that separates alternatives.

18 Example: Decimal Numbers Grammar for unsigned decimal integers –Terminal symbols: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 –Non-terminal symbols: Digit, Integer –Production rules: Integer  Digit | Integer Digit Digit  0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 –Start symbol: Integer Can derive any unsigned integer using this grammar –Language = set of all unsigned decimal integers

19 Integer  Integer Digit  Integer 2  Integer Digit 2  Integer 5 2  Digit 5 2  3 5 2 Rightmost derivation At each step, the rightmost non-terminal is replaced Derivation of 352 as an Integer Production rules: Integer  Digit | Integer Digit Digit  0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

20 Parse Tree for 352 as an Integer

21 Arithmetic Expression Grammar The following grammar defines the language of arithmetic expressions with 1-digit integers, addition, and subtraction. Expr  Expr + Term | Expr – Term | Term Term  0 |... | 9 | ( Expr )

22 Parse of the String 5-4+3

23 1-23 BNF Fundamentals Non-terminals: BNF abstractions Terminals: lexemes and tokens Grammar: a collection of rules –Examples of BNF rules: → identifier | identifier, → if then

24 Regular Expressions xcharacter x \xescaped character, e.g., \n { name }reference to a name M | NM or N M NM followed by N M*0 or more occurrences of M M+1 or more occurrences of M [x 1 … x n ]One of x 1 … x n –Example: [aeiou] – vowels, [0-9] - digits

25 An Example Grammar   | ;  =  a | b | c | d  + | -  | const

26 Derivation A derivation is a repeated application of rules, starting with the start symbol and ending with a sentence (all terminal symbols), e.g., => => = => a = => a = + => a = b + => a = b + const

27 Parse Tree A hierarchical representation of a derivation const a = b +

28 CFG For Floating Point Numbers ::= stands for production rule; are non-terminals; | represents alternatives for the right-hand side of a production rule Sample parse tree:

29 Associativity and Precedence A grammar can be used to define associativity and precedence among the operators in an expression. E.g., + and - are left-associative operators in mathematics; * and / have higher precedence than + and -. Consider the grammar G 1 : Expr -> Expr + Term | Expr – Term | Term Term -> Term * Factor | Term / Factor | Term % Factor | Factor Factor -> Primary ** Factor | Primary Primary -> 0 |... | 9 | ( Expr )

30 30 Precedence E  E + E E  E - E E  E * E E  E / E E  id E  E + E E  E - E E  F F  F * F F  F / F F  id + E E E E * E id3id2 id1 * E E id2 E E + E id1 id3 E + E E *F id2 id1 F F F

31 PrecedenceAssociativityOperators 3right ** 2 left * / % \ 1 left + - Note: These relationships are shown by the structure of the parse tree: highest precedence at the bottom, and left-associativity on the left at each level. Associativity and Precedence for Grammar

32 1-32 Associativity of Operators Operator associativity can also be indicated by a grammar -> + | const (ambiguous) -> + const | const (unambiguous) const + +

33 33 Associativity Left-associative operators E  E + F | E - F | F F  F * X | F / X | X X  ( E ) | id (id1 + id2) * id3 / id4 = (((id1 + id2) * id3) / id4) / F X + E id3 X E E F )( * F X id1 X F id4 X F id2

34 Ambiguity in Grammars A grammar is ambiguous if and only if it generates a sentential form that has two or more distinct parse trees

35 Example of Dangling Else With which ‘if’ does the following ‘else’ associate? if (x < 0) if (y < 0) y = y - 1; else y = 0; Answer: either one!

36 1-36 An Ambiguous Expression Grammar  | const  / | - const --//

37 The Dangling Else Ambiguity Figure 2.5

38 Solving the dangling else ambiguity 1.Algol 60, C, C++: associate each else with closest if ; use {} or begin…end to override. 2.Algol 68, Modula, Ada: use explicit delimiter to end every conditional (e.g., if…fi ) 3.Java: rewrite the grammar to limit what can appear in a conditional: IfThenStatement -> if ( Expression ) Statement IfThenElseStatement -> if ( Expression ) StatementNoShortIf else Statement The category StatementNoShortIf includes all except IfThenStatement.

39 39 Ambiguity in Expressions Which operation is to be done first? –solved by precedence An operator with higher precedence is done before one with lower precedence. An operator with higher precedence is placed in a rule (logically) further from the start symbol. –solved by associativity If an operator is right-associative (or left- associative), an operand in between 2 operators is associated to the operator to the right (left). Right-associated : W + (X + (Y + Z)) Left-associated : ((W + X) + Y) + Z

40 An Unambiguous Expression Grammar If we use the parse tree to indicate precedence levels of the operators, we cannot have ambiguity  - |  / const| const const / -

41 Extended BNF Optional parts are placed in brackets [ ] -> ident [( )] Alternative parts of RHSs are placed inside parentheses and separated via vertical bars → (+|-) const Repetitions (0 or more) are placed inside braces { } → letter {letter|digit}

42 BNF and EBNF BNF  + | - |  * | / | EBNF  {(+ | -) }  {(* | /) }

43 Finite State Automata Set of states –Usually represented as graph nodes Input alphabet + unique “end of program” symbol State transition function –Usually represented as directed graph edges (arcs) –Automaton is deterministic if, for each state and each input symbol, there is at most one outgoing arc from the state labeled with the input symbol Unique start state One or more final (accepting) states

44 DFA for C Identifiers

45 Syntax Diagram for Expressions with Addition


Download ppt "Syntax and Semantics Structure of programming languages."

Similar presentations


Ads by Google