Syntax The Structure of a Language. Lexical Structure The structure of the tokens of a programming language The scanner takes a sequence of characters.


1 Syntax The Structure of a Language

2 Lexical Structure The structure of the tokens of a programming language The scanner takes a sequence of characters and collects them into tokens

3 Tokens Reserved words (keywords) –if while Literals or constants –3.14 “Fred” Special symbols –+ = Identifiers

4 Principle of Longest Substring At each point, the longest possible string is collected into a single token Natural token separators –Token separators ; + = –White space Spaces and tabs Newlines Comments
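The longest-substring rule can be sketched as a tiny scanner. This is only an illustration: the token classes, the `TOKEN_SPEC` table, and the `scan` function are hypothetical names, not part of the slides.

```python
import re

# Hypothetical token classes chosen for illustration only.
TOKEN_SPEC = [
    ("NUMBER", r"[0-9]+(?:\.[0-9]+)?"),
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("SYMBOL", r"<=|>=|==|[-+*/=<>;()]"),
]
KEYWORDS = {"if", "while"}


def scan(text):
    """Return (kind, lexeme) pairs, always taking the longest possible match."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():  # white space only separates tokens
            pos += 1
            continue
        best_kind, best_lexeme = None, ""
        for kind, pattern in TOKEN_SPEC:
            match = re.compile(pattern).match(text, pos)
            if match and len(match.group()) > len(best_lexeme):
                best_kind, best_lexeme = kind, match.group()
        if best_kind is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if best_kind == "IDENT" and best_lexeme in KEYWORDS:
            best_kind = "KEYWORD"  # reserved words look like identifiers
        tokens.append((best_kind, best_lexeme))
        pos += len(best_lexeme)
    return tokens


print(scan("ifx <= 10"))
# [('IDENT', 'ifx'), ('SYMBOL', '<='), ('NUMBER', '10')]
```

Because the longest string wins, `ifx` is one identifier rather than the keyword `if` followed by `x`, and `<=` is one symbol rather than `<` followed by `=`.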

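5 FORTRAN violates these rules Blanks are not significant in FORTRAN, so the scanner cannot rely on white space to separate tokens DO 99 I = 1.10 –Assigns 1.10 to the variable DO99I DO 99 I = 1,10 –Sets up a loop with loop counter I going from 1 to 10 FORTRAN has no reserved words at all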

6 C token conventions Six classes of tokens –Identifiers –Keywords –Constants –String literals –Operators –Other separators White space characters are ignored except as they separate tokens Adheres to the principle of longest substring

7 Regular Expressions Regular expressions were invented by Stephen Kleene and appeared in a Rand Corporation report in about 1950 Regular expressions represent a form of language definition Each regular expression E denotes a language L(E) defined over the alphabet of the language

8 Rules defining REs Empty –ε (the empty string) is an RE Atom –Any symbol from the alphabet is an RE Alternation –If a and b are REs then so is a|b –All strings identified by a and all those identified by b Concatenation –If a and b are REs then so is ab –All strings formed by concatenating a string identified by b to the end of one identified by a

9 More rules for REs Kleene Closure –If a is an RE then so is a* –All strings formed by concatenating zero or more strings identified by a Positive Closure –If a is an RE then so is a+ –All strings formed by concatenating one or more strings identified by a

10 Examples of REs (a|b)c –Recognizes ac and bc but no others (a|b)*c –Recognizes c, ac, bc, aac, abc, abac, and so on (a|b)+c –Recognizes all of the strings above except c
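These three expressions can be checked with Python's `re` module, whose syntax for alternation, `*`, and `+` matches the notation used here. The helper `accepted` and the candidate list are made up for this sketch.

```python
import re

# Sample strings to test, including two that none of the patterns accept.
candidates = ["c", "ac", "bc", "aac", "abc", "abac", "ab", "cc"]


def accepted(pattern, strings):
    """Return the strings that the whole pattern matches."""
    return [s for s in strings if re.fullmatch(pattern, s)]


print(accepted(r"(a|b)c", candidates))   # ['ac', 'bc']
print(accepted(r"(a|b)*c", candidates))  # ['c', 'ac', 'bc', 'aac', 'abc', 'abac']
print(accepted(r"(a|b)+c", candidates))  # ['ac', 'bc', 'aac', 'abc', 'abac']
```

`re.fullmatch` is used instead of `re.match` so that a pattern only counts as recognizing a string when it covers the entire string.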

11 Extensions [] – any one of a set of characters –[A-Z] – any capital letter –[0123456789] – any digit ? – an optional item (0 or 1 of these) –[A-Z][0-9]? – a single capital letter, or a single capital letter followed by a single digit . (period) – any character

12 More Examples [0-9]+ –Simple integer constants [0-9]+(\.[0-9]+)? –Simple floating-point constants
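The two patterns can be tried out with Python's `re` module. This is a small sketch: the helper names are invented for illustration, and the floating-point pattern assumes one or more digits after the decimal point.

```python
import re

INTEGER = r"[0-9]+"
# Assumption: the fractional part, when present, has one or more digits.
FLOAT = r"[0-9]+(\.[0-9]+)?"


def is_integer(text):
    """True if the whole string is a simple integer constant."""
    return re.fullmatch(INTEGER, text) is not None


def is_float(text):
    """True if the whole string is a simple floating-point constant."""
    return re.fullmatch(FLOAT, text) is not None


print(is_integer("42"), is_integer("4.2"))  # True False
print(is_float("3.14"), is_float("3."))     # True False
```

Note that the period must be escaped as `\.`, since an unescaped period matches any character.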

13 Context-Free Grammars (CFGs) Context-free grammars were developed by Noam Chomsky as a way to specify languages Rules are generally specified in Backus-Naur Form (BNF) or in Extended BNF (EBNF)

14 What makes up a CFG? A set N of non-terminal symbols A set T of terminal symbols A set P of production rules A special non-terminal symbol S called the start symbol (or goal symbol)

15 Sample CFG
sentence → noun-phrase verb-phrase .
noun-phrase → article noun
article → a | the
noun → girl | dog
verb-phrase → verb noun-phrase
verb → sees | pets

16 Parts of the grammar Non-terminal symbols: {sentence, noun-phrase, article, noun, verb-phrase, verb} Terminal symbols: {., a, the, girl, dog, sees, pets} Production rules: the previous slide provides these Start symbol: sentence

17 Notes on CFG Non-terminal symbols are those that appear on the left-hand side (lhs) of the production rules Terminal symbols are those that appear only on the right-hand side (rhs) of the production rules → and | are meta-symbols

18 (Left-Most) Derivation
sentence ⇒ noun-phrase verb-phrase .
⇒ article noun verb-phrase .
⇒ the noun verb-phrase .
⇒ the girl verb-phrase .
⇒ the girl verb noun-phrase .
⇒ the girl sees noun-phrase .
⇒ the girl sees article noun .
⇒ the girl sees a noun .
⇒ the girl sees a dog .
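The derivation above can be replayed mechanically. In this sketch the dictionary `GRAMMAR` encodes the sample CFG, and `leftmost_derivation` is a hypothetical helper that, at each step, replaces the leftmost non-terminal with the alternative whose index is supplied by the caller.

```python
# The sample CFG: each non-terminal maps to its list of alternatives.
GRAMMAR = {
    "sentence": [["noun-phrase", "verb-phrase", "."]],
    "noun-phrase": [["article", "noun"]],
    "article": [["a"], ["the"]],
    "noun": [["girl"], ["dog"]],
    "verb-phrase": [["verb", "noun-phrase"]],
    "verb": [["sees"], ["pets"]],
}


def leftmost_derivation(start, choices):
    """Return every sentential form produced by expanding the leftmost
    non-terminal, using choices[k] as the alternative index at step k."""
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        # Position of the leftmost symbol that is still a non-terminal.
        i = next(k for k, sym in enumerate(form) if sym in GRAMMAR)
        form[i:i + 1] = GRAMMAR[form[i]][choice]
        steps.append(" ".join(form))
    return steps


for line in leftmost_derivation("sentence", [0, 0, 1, 0, 0, 0, 0, 0, 1]):
    print(line)
```

The last line printed is `the girl sees a dog .`, reached after nine expansion steps, one per arrow in the derivation.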

19 Corresponding Parse Tree
sentence
  noun-phrase
    article: the
    noun: girl
  verb-phrase
    verb: sees
    noun-phrase
      article: a
      noun: dog
  .

20 Ambiguous Grammars A grammar is ambiguous if some sentence has two distinct leftmost derivations or, equivalently, two distinct parse trees

21 Grammar for expressions
expr → expr + expr | expr * expr | ( expr ) | number
number → number digit | digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

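22 Parse trees for 3 + 5 * 7
Tree with + at the root, grouping as 3 + (5 * 7):
expr
  expr: number: digit: 3
  +
  expr
    expr: number: digit: 5
    *
    expr: number: digit: 7
Tree with * at the root, grouping as (3 + 5) * 7:
expr
  expr
    expr: number: digit: 3
    +
    expr: number: digit: 5
  *
  expr: number: digit: 7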
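The ambiguity can be made concrete by enumerating every parse tree. The sketch below assumes a simplified form of the expression grammar (single-token numbers, no parentheses); `parse_trees` and `value_of` are names invented for illustration.

```python
def parse_trees(tokens):
    """Return every parse tree for tokens under
    expr -> expr + expr | expr * expr | number.
    A tree is either a number token or a tuple (operator, left, right)."""
    if len(tokens) == 1:
        return [tokens[0]]
    trees = []
    for i in range(1, len(tokens), 2):  # operators sit at odd positions
        for left in parse_trees(tokens[:i]):
            for right in parse_trees(tokens[i + 1:]):
                trees.append((tokens[i], left, right))
    return trees


def value_of(tree):
    """Evaluate a tree produced by parse_trees."""
    if isinstance(tree, str):
        return int(tree)
    op, left, right = tree
    a, b = value_of(left), value_of(right)
    return a + b if op == "+" else a * b


for tree in parse_trees(["3", "+", "5", "*", "7"]):
    print(tree, "=", value_of(tree))
# ('+', '3', ('*', '5', '7')) = 38
# ('*', ('+', '3', '5'), '7') = 56
```

The two trees give different values, which is why the grammar has to be revised so that only the tree matching the usual precedence rules remains.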

23 Handling Ambiguity The grammar rules for expressions can be modified to eliminate the ambiguity that operator precedence is meant to resolve Introduce a new non-terminal that forces the higher-precedence operator lower in the parse tree

24 Precedence handled
expr → expr + expr | term
term → term * term | ( expr ) | number
number → number digit | digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

25 Associativity This grammar is still ambiguous There are two parse trees for 5 + 7 + 9 This may be OK for addition & multiplication, but not for subtraction & division, which are left-associative

26 Revised Grammar (not ambiguous)
expr → expr + term | term
term → term * factor | factor
factor → ( expr ) | number
number → number digit | digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

27 EBNFs Extended BNF adds more metasymbols { } – a repeated item (0 or more times) [ ] – an optional item (0 or 1 time)

28 Expression Grammar in EBNF
expr → term { + term }
term → factor { * factor }
factor → ( expr ) | number
number → digit { digit }
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

29 EBNF for if-statement
if-statement → if ( expression ) statement [ else statement ]

30 Syntax Diagrams Syntax diagrams are an alternative to EBNF Study the diagrams on pp 99-101 and observe the direct relationship of each to the EBNF grammar rules for expressions

31 Parsers The simplest parser is a recognizer Accepts or rejects strings according to whether they are legal strings in the language More general parsers Build parse trees (or abstract syntax trees) May calculate values of expressions, etc.

32 Bottom-up Parsers Attempt to match the input with the RHSs of the grammar rules When a match occurs, the RHS is replaced by the non-terminal on the LHS of the rule (called a reduce) Sometimes called shift-reduce parsing
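The shift and reduce steps can be sketched for the sample sentence grammar from earlier in the deck. This is a deliberately naive recognizer that reduces whenever the top of the stack matches a right-hand side; that greedy strategy happens to work for this small grammar, but real bottom-up parsers use parsing tables and lookahead to decide when to reduce. `RULES` and `recognize` are names made up for this example.

```python
# Each rule is (LHS, RHS) taken from the sample sentence grammar.
RULES = [
    ("sentence", ["noun-phrase", "verb-phrase", "."]),
    ("noun-phrase", ["article", "noun"]),
    ("verb-phrase", ["verb", "noun-phrase"]),
    ("article", ["a"]), ("article", ["the"]),
    ("noun", ["girl"]), ("noun", ["dog"]),
    ("verb", ["sees"]), ("verb", ["pets"]),
]


def recognize(tokens):
    """Return (accepted, trace) for a list of tokens."""
    stack, trace = [], []
    for token in tokens:
        stack.append(token)  # shift the next input token
        trace.append(("shift", token))
        reduced = True
        while reduced:  # reduce for as long as some RHS is on top
            reduced = False
            for lhs, rhs in RULES:
                if stack[-len(rhs):] == rhs:
                    stack[-len(rhs):] = [lhs]  # replace RHS by LHS
                    trace.append(("reduce", lhs))
                    reduced = True
                    break
    return stack == ["sentence"], trace


ok, trace = recognize("the girl sees a dog .".split())
print(ok)  # True
```

Reading the reduce entries of `trace` in reverse order gives the steps of a derivation of the sentence, which is the sense in which the parser works bottom-up.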

33 Top-down Parsers Non-terminals are expanded to match incoming tokens and the parser directly constructs a derivation

34 Recursive-Descent Parsing A program made up of a collection of mutually recursive procedures, one for each non-terminal.
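A recursive-descent parser for the EBNF expression grammar might look like the sketch below, with one method per non-terminal and a `while` loop for each `{ }` repetition. The class and function names are hypothetical, numbers are scanned as whole tokens rather than digit by digit, and the parser computes the value of the expression instead of building a tree.

```python
import re


class Parser:
    def __init__(self, text):
        # A run of digits is one token; any other non-space character is too.
        self.tokens = re.findall(r"[0-9]+|\S", text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, token):
        if self.peek() != token:
            raise SyntaxError(f"expected {token!r}, found {self.peek()!r}")
        self.pos += 1

    def expr(self):  # expr -> term { + term }
        value = self.term()
        while self.peek() == "+":
            self.pos += 1
            value += self.term()
        return value

    def term(self):  # term -> factor { * factor }
        value = self.factor()
        while self.peek() == "*":
            self.pos += 1
            value *= self.factor()
        return value

    def factor(self):  # factor -> ( expr ) | number
        token = self.peek()
        if token == "(":
            self.pos += 1
            value = self.expr()
            self.expect(")")
            return value
        if token is not None and re.fullmatch(r"[0-9]+", token):
            self.pos += 1
            return int(token)
        raise SyntaxError(f"unexpected token {token!r}")


def evaluate(text):
    """Parse the whole input as an expr and return its value."""
    parser = Parser(text)
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value


print(evaluate("3 + 5 * 7"))    # 38
print(evaluate("(3 + 5) * 7"))  # 56
```

The EBNF form is used rather than the left-recursive BNF rules because a procedure for `expr → expr + term` would call itself immediately without consuming any input.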

