Programming Languages Third Edition Chapter 6 Syntax.


1 Programming Languages Third Edition Chapter 6 Syntax

2 Objectives
Understand the lexical structure of programming languages
Understand context-free grammars and BNFs
Become familiar with parse trees
Understand ambiguity, associativity, and precedence
Read Sections 6.1 – 6.4, pp. 204-220
Programming Languages, Third Edition

3 Introduction
Syntax is the structure of a language
Syntax rules are analogous to the grammar rules of a natural language
John Backus and Peter Naur developed a notational system for describing these grammars, now called Backus-Naur forms, or BNFs
–First used to describe the syntax of Algol 60
Every modern computer scientist needs to know how to read, interpret, and apply BNF descriptions of language syntax

4 Flowchart for Compilation
Source Code (your program) -> Compiler -> Object Code (machine language)

5 Flowchart for Compilation - Details
Source Code (your program = char stream) -> Scanner (lexical analysis) -> Lexical items / Tokens -> Parser (syntactic analysis) -> Parse tree -> Semantic analysis (analyses meaning) -> Intermediate Code -> Optimization -> Object Code (machine language)

6 Lexical Structure of Programming Languages
Lexical structure: the structure of the tokens, or words, of a language
–Related to, but different from, the syntactic structure
Scanning phase: the phase in which a translator collects sequences of characters from the input program and forms them into tokens
Parsing phase: the phase in which the translator processes the tokens, determining the program’s syntactic structure

7 Lexical Structure of Programming Languages (cont’d.)
Tokens generally fall into several categories:
–Reserved words (or keywords)
–Literals or constants
–Special symbols, such as ";", "<=", "+"
–Identifiers

8 Lexical Structure of Programming Languages (cont’d.)
Token delimiters (or white space): formatting that affects the way tokens are recognized
Indentation can be used to determine structure
Free-format language: one in which format has no effect on program structure other than satisfying the principle of longest substring
Fixed-format language: one in which all tokens must occur in prespecified locations on the page
Tokens can be formally described by regular expressions
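The scanning phase and the token categories above can be sketched in a few lines of Python. This is only an illustration: the reserved words, the set of special symbols, and the names `TOKEN_SPEC` and `scan` are assumptions, not taken from the slides. Note that Python's `re` tries alternatives in order rather than choosing the longest match, so the sketch emulates the principle of longest substring by listing "<=" before "<" and by requiring a word boundary after a reserved word.

```python
import re

# Hypothetical token categories for a tiny C-like language (not from the slides).
TOKEN_SPEC = [
    ("RESERVED",   r"\b(?:if|else|while|return)\b"),
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("LITERAL",    r"[0-9]+"),
    ("SPECIAL",    r"<=|>=|==|[;+\-*/=<>(){}]"),  # two-character symbols first
    ("SKIP",       r"\s+"),                       # white space only separates tokens
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def scan(source):
    """Collect characters from `source` into (category, text) tokens."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(scan("if x1 <= 42;"))
print(scan("ifx"))  # one identifier, not the reserved word "if" followed by "x"
```

A generated scanner (for example from lex) would normally pick the longest match automatically; the ordering tricks here are specific to this sketch.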

9 Scanning: Regular Expressions
Metalanguage for describing patterns for strings of characters; the metasymbols are:
| means choice
* means zero or more occurrences
+ means one or more occurrences
? means one optional occurrence
[ ] means choose one of the list of characters in brackets; a range can be used
. (period) means any one character
( ) can be used for grouping
\ can precede a metasymbol to use that metasymbol literally in a string
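Each metasymbol can be tried out with a short Python check. This is a sketch using Python's `re` module, whose syntax agrees with the metasymbols listed above for these simple cases; the helper name `matches` and the sample strings are illustrative only.

```python
import re


def matches(pattern, text):
    """True when the whole of `text` fits `pattern`."""
    return re.fullmatch(pattern, text) is not None


# (pattern, text, expected result), one row per metasymbol
EXAMPLES = [
    ("a|b",   "b",    True),   # | choice
    ("ab*",   "a",    True),   # * zero or more b's
    ("ab+",   "a",    False),  # + needs at least one b
    ("ab?",   "abb",  False),  # ? allows at most one b
    ("[a-c]", "b",    True),   # [ ] one character from a range
    ("a.c",   "axc",  True),   # . any one character
    ("(ab)+", "abab", True),   # ( ) grouping, repeated as a unit
    (r"a\.c", "axc",  False),  # \. is a literal period
]

for pattern, text, expected in EXAMPLES:
    print(f"{pattern!r:10} {text!r:8} -> {matches(pattern, text)} (expected {expected})")
```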

10 Regular Expressions (cont’d.)
Most modern text editors use regular expressions in text searches
Utilities such as lex can automatically turn a regular expression description of a language’s tokens into a scanner

11 Regular Expressions (cont’d.)
Examples:
(a|b)*c
[ab]*c
[aeiou]
[aeiouAEIOU]
[aeiouAEIOU]+
[aeiouAEIOU]*
[A-Z][a-z]*
[A-Z]+[a-z]
[A-Za-z]*

12 Regular Expressions (cont’d.)
Examples:
[0-9]+
[0-9]+(\.[0-9]+)
Can test by making a text file on Unix and using egrep -x "pattern" filename
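The examples from the last two slides can also be checked without egrep. The sketch below uses Python's `re.fullmatch` as a stand-in for `egrep -x` (both require the whole string or line to match); the helper name `whole_line_match` and the test strings are assumptions chosen for illustration.

```python
import re


def whole_line_match(pattern, line):
    """Rough Python equivalent of `egrep -x` applied to one line."""
    return re.fullmatch(pattern, line) is not None


CHECKS = [
    ("(a|b)*c",           "abbac", True),
    ("(a|b)*c",           "abca",  False),  # nothing may follow the final c
    ("[ab]*c",            "c",     True),   # same language as (a|b)*c
    ("[A-Z][a-z]*",       "Hello", True),
    ("[A-Z][a-z]*",       "HeLLo", False),
    ("[A-Z]+[a-z]",       "ABc",   True),
    ("[A-Z]+[a-z]",       "Abc",   False),  # exactly one lowercase letter allowed
    ("[0-9]+",            "2024",  True),
    (r"[0-9]+(\.[0-9]+)", "3.14",  True),
    (r"[0-9]+(\.[0-9]+)", "42",    False),  # as written, the fraction is required
]

for pattern, line, expected in CHECKS:
    assert whole_line_match(pattern, line) == expected
print("all checks passed")
```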

13 Regular Expressions (cont’d.)
Let’s try writing some:
–Signed integers, sign not optional
–Signed integers, sign optional
–Signed integers, sign optional, no signed zero
–Signed integers, allow leading zeros, but no signed zero
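One possible set of answers, offered as a sketch rather than the solutions worked in class. It assumes that "no signed zero" means strings such as +0 and -0 are rejected, and that the third exercise also rejects leading zeros (since only the fourth mentions allowing them). The dictionary keys and the helper `accepts` are names made up for this example.

```python
import re

# Candidate patterns; the interpretations are assumptions (see the note above).
SIGNED_INTEGER_PATTERNS = {
    "sign required": r"[+-][0-9]+",
    "sign optional": r"[+-]?[0-9]+",
    "sign optional, no signed zero": r"0|[+-]?[1-9][0-9]*",
    "leading zeros allowed, no signed zero": r"0+|[+-]?0*[1-9][0-9]*",
}


def accepts(name, text):
    """True when the whole of `text` matches the pattern called `name`."""
    return re.fullmatch(SIGNED_INTEGER_PATTERNS[name], text) is not None


for name in SIGNED_INTEGER_PATTERNS:
    results = {text: accepts(name, text) for text in ["12", "-12", "0", "-0", "+007", "000"]}
    print(name, results)
```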

14 Regular Expressions (cont’d.)
Let’s try writing some for license plates:
–Start with VA, followed by zero or more digits
–Start with VA, followed by one or more digits
–Start with VA, followed by 2 digits, followed by zero or more lower case letters
–Start with V or A, followed by -, followed by 2-4 digits
–Start with VA, any case, followed by 2-3 digits or 2-3 letters (tried in class)
–Start with VA, any case, followed by 2-3 digits, followed by 2-3 letters (meant to try)
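Again as a hedged sketch, here is one way the license-plate exercises might be answered using only the metasymbols introduced earlier (a bounded count such as "2-4 digits" is spelled out with optional characters). The short keys and the helper `plate_ok` are invented for this example, and "letters" in the last two exercises is assumed to mean letters of either case.

```python
import re

PLATE_PATTERNS = {
    "VA, zero or more digits": r"VA[0-9]*",
    "VA, one or more digits": r"VA[0-9]+",
    "VA, 2 digits, lowercase letters": r"VA[0-9][0-9][a-z]*",
    "V or A, dash, 2-4 digits": r"[VA]-[0-9][0-9][0-9]?[0-9]?",
    "VA any case, 2-3 digits or 2-3 letters":
        r"[Vv][Aa]([0-9][0-9][0-9]?|[A-Za-z][A-Za-z][A-Za-z]?)",
    "VA any case, 2-3 digits then 2-3 letters":
        r"[Vv][Aa][0-9][0-9][0-9]?[A-Za-z][A-Za-z][A-Za-z]?",
}


def plate_ok(name, plate):
    """True when the whole of `plate` matches the pattern called `name`."""
    return re.fullmatch(PLATE_PATTERNS[name], plate) is not None


print(plate_ok("V or A, dash, 2-4 digits", "V-123"))
print(plate_ok("VA any case, 2-3 digits then 2-3 letters", "va12AB"))
```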

15 Parsing: Context-Free Grammars and BNFs
Context-free grammar: consists of
–A series of grammar rules (productions). Each rule has a single phrase structure name on the left, then a "::=" metasymbol, followed by a sequence of symbols or other phrase structure names on the right
–Nonterminals: names for phrase structures, since they are broken down into further phrase structures
–Start symbol: one of the nonterminals
–Terminals: words or token symbols that cannot be broken down further

16 Example 1: Unsigned Integers
<number> ::= <number> <digit> | <digit>
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Terminals: 0, 1, …, 9
Nonterminals: <number>, <digit>
Start Symbol: <number>
Productions: there are 12
Metasymbols: "::=", "|"

17 Example 1 (cont’d)
Derivation: the process of building a string in a language by beginning with the start symbol and replacing left-hand sides by choices of right-hand sides in the rules
Let’s derive the number 123 (on board)
Parse tree: graphical depiction of the replacement process in a derivation
Let’s draw parse tree for 123 (on board)
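Since the board work is not part of the transcript, here is a small sketch of what a derivation of 123 looks like. It assumes the Example 1 grammar is `<number> ::= <number> <digit> | <digit>` with `<digit>` producing a single digit; the nonterminal names and the function `derive` are assumptions. Each step replaces the leftmost nonterminal.

```python
def derive(digits):
    """Return the sentential forms of a leftmost derivation of `digits`."""
    if not digits or any(ch not in "0123456789" for ch in digits):
        raise ValueError("expected a non-empty string of digits")
    form = ["<number>"]
    steps = [form[:]]
    # Apply <number> ::= <number> <digit> once for every digit after the first.
    for _ in range(len(digits) - 1):
        form = ["<number>", "<digit>"] + form[1:]
        steps.append(form[:])
    # Apply <number> ::= <digit> to end the recursion.
    form = ["<digit>"] + form[1:]
    steps.append(form[:])
    # Replace each <digit>, left to right, with the matching terminal.
    for i, ch in enumerate(digits):
        form = form[:i] + [ch] + form[i + 1:]
        steps.append(form[:])
    return [" ".join(step) for step in steps]


for line in derive("123"):
    print(line)
# <number>
# <number> <digit>
# <number> <digit> <digit>
# <digit> <digit> <digit>
# 1 <digit> <digit>
# 1 2 <digit>
# 1 2 3
```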

18 Example 1 (cont’d)
Notice recursion in one of the rules
Notice the recursive symbol is on the left
This is a left-recursive grammar
This is a left-associative grammar
Notice how the parse tree cascades to the left

19 Example 2: Unsigned Integers
<number> ::= <digit> <number> | <digit>
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Only made one change, so now the grammar is
–Right-recursive
–Right-associative
Let’s draw parse tree for 123
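The two tree shapes can be compared side by side with nested tuples. This sketch assumes the left-recursive rule is `<number> ::= <number> <digit> | <digit>` and the right-recursive rule is `<number> ::= <digit> <number> | <digit>`; the nonterminal names and the functions `left_tree` and `right_tree` are illustrative, not from the slides.

```python
def left_tree(digits):
    """Parse tree under <number> ::= <number> <digit> | <digit>."""
    tree = ("number", ("digit", digits[0]))
    for ch in digits[1:]:
        # The existing tree becomes the LEFT child, so the tree cascades left.
        tree = ("number", tree, ("digit", ch))
    return tree


def right_tree(digits):
    """Parse tree under <number> ::= <digit> <number> | <digit>."""
    if len(digits) == 1:
        return ("number", ("digit", digits))
    # The rest of the string hangs off the RIGHT child.
    return ("number", ("digit", digits[0]), right_tree(digits[1:]))


print(left_tree("123"))
# ('number', ('number', ('number', ('digit', '1')), ('digit', '2')), ('digit', '3'))
print(right_tree("123"))
# ('number', ('digit', '1'), ('number', ('digit', '2'), ('number', ('digit', '3'))))
```

Both grammars generate the same strings; only the shape of the tree differs.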

20 Ex 3: Simple Expression Grammar
<expr> ::= <expr> + <expr> | <expr> * <expr> | ( <expr> ) | <number>
<number> ::= <number> <digit> | <digit>
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Let’s derive parse tree for: 3 + 4 + 5

21 Ex 3 (cont’d)
<expr> ::= <expr> + <expr> | <expr> * <expr> | ( <expr> ) | <number>
<number> ::= <number> <digit> | <digit>
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Is there another parse tree for: 3 + 4 + 5?

22 Ex 3 (cont’d)
<expr> ::= <expr> + <expr> | <expr> * <expr> | ( <expr> ) | <number>
<number> ::= <number> <digit> | <digit>
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
A grammar is ambiguous if there are two parse trees for the same string

23 Ex 3 (cont’d)
<expr> ::= <expr> + <expr> | <expr> * <expr> | ( <expr> ) | <number>
<number> ::= <number> <digit> | <digit>
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Ambiguity is undesirable (but sometimes unavoidable)
Let’s see why it’s undesirable: derive parse trees for 3 + 4 * 5

24 Ex 3 (cont’d)
<expr> ::= <expr> + <expr> | <expr> * <expr> | ( <expr> ) | <number>
<number> ::= <number> <digit> | <digit>
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
So what was the problem?
Which tree provides the correct arithmetic interpretation?
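The problem can be made concrete by enumerating every parse tree and evaluating it. The sketch below assumes the ambiguous rule has the form `<expr> ::= <expr> + <expr> | <expr> * <expr> | <number>` (parentheses are left out to keep it short) and takes the input as a pre-split token list; the function names `all_trees` and `evaluate` are made up for this example.

```python
def all_trees(tokens):
    """Every parse tree of `tokens` under expr ::= expr + expr | expr * expr | number."""
    if len(tokens) == 1:
        return [int(tokens[0])]
    trees = []
    # Any operator (odd positions) may be the root of the tree.
    for i in range(1, len(tokens), 2):
        for left in all_trees(tokens[:i]):
            for right in all_trees(tokens[i + 1:]):
                trees.append((tokens[i], left, right))
    return trees


def evaluate(tree):
    """Compute the value implied by the shape of a parse tree."""
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    a, b = evaluate(left), evaluate(right)
    return a + b if op == "+" else a * b


for tree in all_trees(["3", "+", "4", "*", "5"]):
    print(tree, "=", evaluate(tree))
# ('+', 3, ('*', 4, 5)) = 23
# ('*', ('+', 3, 4), 5) = 35
```

For 3 + 4 + 5 the two trees happen to give the same value (12), which is why the ambiguity only becomes visible once + and * are mixed: only the tree that evaluates to 23 matches the usual arithmetic interpretation.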

25 Ex 3 (cont’d)
Can we modify the grammar to “fix” the problem? YES!
Add more levels of productions:
<expr> ::= <expr> + <term> | <term>
<term> ::= <term> * <factor> | <factor>
<factor> ::= ( <expr> ) | <number>
<number> ::= <number> <digit> | <digit>
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

26 Ex 3 (cont’d)
<expr> ::= <expr> + <term> | <term>
<term> ::= <term> * <factor> | <factor>
<factor> ::= ( <expr> ) | <number>
<number> ::= <number> <digit> | <digit>
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Redraw parse trees for 3 + 4 + 5 and 3 + 4 * 5
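With the layered grammar, each string has exactly one tree. The sketch below is a hand-written recursive-descent parser, which is a different technique from the generated parsers mentioned elsewhere in the slides: it assumes the nonterminals are named expr, term, and factor, and it replaces the left-recursive rules with loops that build the same left-leaning trees. The functions `tokenize` and `parse` are illustrative names.

```python
import re


def tokenize(text):
    """Split into numbers and the symbols + * ( ); other characters are ignored."""
    return re.findall(r"[0-9]+|[+*()]", text)


def parse(text):
    """Return the single parse tree for `text` as nested (op, left, right) tuples."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def expr():  # expr ::= expr + term | term, written as a loop
        tree = term()
        while peek() == "+":
            advance()
            tree = ("+", tree, term())  # earlier operands nest on the left
        return tree

    def term():  # term ::= term * factor | factor, written as a loop
        tree = factor()
        while peek() == "*":
            advance()
            tree = ("*", tree, factor())
        return tree

    def factor():  # factor ::= ( expr ) | number
        token = peek()
        if token == "(":
            advance()
            tree = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return tree
        if token is not None and token.isdigit():
            advance()
            return int(token)
        raise SyntaxError(f"unexpected token {token!r}")

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse("3 + 4 + 5"))  # ('+', ('+', 3, 4), 5)
print(parse("3 + 4 * 5"))  # ('+', 3, ('*', 4, 5))
```

Because `*` is handled one level deeper than `+`, it binds more tightly, and the loops make both operators left-associative.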

27 Chapter 6 Final Thoughts
A grammar is context-free when nonterminals appear singly on the left sides of productions
–There is no context under which only certain replacements can occur
Anything not expressible using context-free grammars is a semantic, not a syntactic, issue
BNF form of language syntax makes it easier to write translators
Parsing stage can be automated

28 Chapter 6 Final Thoughts
Syntax establishes structure, not meaning
–But meaning is related to syntax
Syntax-directed semantics: process of associating the semantics of a construct to its syntactic structure
–Must construct the syntax so that it reflects the semantics to be attached later

