
 Fall 2013. Chart 2  Sub-phases of Syntactic Analysis  Grammars Revisited  Parsing  Abstract Syntax Trees  Scanning  Case Study: Syntactic Analysis.


1  Fall 2013

2 Chart 2  Sub-phases of Syntactic Analysis  Grammars Revisited  Parsing  Abstract Syntax Trees  Scanning  Case Study: Syntactic Analysis in the Triangle Compiler

3 Chart 3 Compiler phases (figure): Source code → Lexical Analyzer → tokens → Parser & Semantic Analyzer → parse tree → Intermediate Code Generation → intermediate representation → Optimization → Assembly Code Generation → Assembly code. The Symbol Table is shown alongside the phases.

4 Chart 4  Main function o Parse source program to discover its phrase structure o Recursive-descent parsing o Constructing an AST o Scanning to group characters into tokens

5 Chart 5  Scanning (or lexical analysis) o Source program transformed to a stream of tokens: identifiers, literals, operators, keywords, punctuation o Comments and blank spaces discarded  Parsing o Determines the source program's phrase structure o Source program is input as a stream of tokens (from the Scanner) o Treats each token as a terminal symbol  Representation of phrase structure o AST

6 Chart 6  Scan the file character by character, group characters into words and punctuation (tokens), and remove white space and comments  Example source: let var y: Integer in !new year y := y+1  Tokens for this example: let var y : Integer in y := y + 1  Note: the comment !new year does not appear in the list of tokens. Comments are removed along with white space.
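The scanning step on this slide can be sketched as a small Java class. This is a minimal illustration only, not the Triangle compiler's scanner: the class name MiniScanner and its scan method are hypothetical, tokens are returned as plain strings, and a comment is assumed to run from "!" to the end of the line.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mini-scanner for illustration; not the Triangle Scanner class.
final class MiniScanner {
    static List<String> scan(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {            // discard blanks and line breaks
                i++;
            } else if (c == '!') {                      // discard a comment up to end of line
                while (i < src.length() && src.charAt(i) != '\n') i++;
            } else if (Character.isLetter(c)) {         // identifier or keyword
                int start = i;
                while (i < src.length() && Character.isLetterOrDigit(src.charAt(i))) i++;
                tokens.add(src.substring(start, i));
            } else if (Character.isDigit(c)) {          // integer literal
                int start = i;
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
                tokens.add(src.substring(start, i));
            } else if (c == ':' && i + 1 < src.length() && src.charAt(i + 1) == '=') {
                tokens.add(":=");                       // two-character token
                i += 2;
            } else {                                    // any other single character
                tokens.add(String.valueOf(c));
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [let, var, y, :, Integer, in, y, :=, y, +, 1]
        System.out.println(scan("let var y: Integer in !new year\n  y := y+1"));
    }
}
```

The comment and the white space leave no trace in the result, which is why the parser never has to deal with them.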

7 Chart 7 (Figure) Source text: let var y: Integer in !new year y := y+1. The Input Converter fills a Buffer with the character string (spaces shown explicitly), and the Scanner turns it into tokens: let (let), var (var), y (Ident.), : (colon), Integer (Ident.), in (in), y (Ident.), := (becomes), y (Ident.), + (op.), 1 (Intlit.), eot.

8 Chart 8 // literals, identifiers, operators... INTLITERAL = 0, "<int>", CHARLITERAL = 1, "<char>", IDENTIFIER = 2, "<identifier>", OPERATOR = 3, "<operator>", // reserved words - must be in alphabetical order... ARRAY = 4, "array", BEGIN = 5, "begin", CONST = 6, "const", DO = 7, "do", ELSE = 8, "else", END = 9, "end", FUNC = 10, "func", IF = 11, "if", IN = 12, "in", LET = 13, "let", OF = 14, "of", PROC = 15, "proc", RECORD = 16, "record", THEN = 17, "then", TYPE = 18, "type", VAR = 19, "var", WHILE = 20, "while", // punctuation... DOT = 21, ".", COLON = 22, ":", SEMICOLON = 23, ";", COMMA = 24, ",", BECOMES = 25, ":=", IS = 26, "~", // brackets... LPAREN = 27, "(", RPAREN = 28, ")", LBRACKET = 29, "[", RBRACKET = 30, "]", LCURLY = 31, "{", RCURLY = 32, "}", // special tokens... EOT = 33, "", ERROR = 34, "<error>"

9 Chart 9  Context free grammars o Generates a set of sentences o Each sentence is a string of terminal symbols o An unambiguous sentence has a unique phrase structure embodied in its syntax tree  Develop parsers from context-free grammars

10 Chart 10  A regular expression (RE) is a convenient notation for expressing a set of strings of terminal symbols  Main features o '|' separates alternatives o '*' indicates that the previous item may be repeated zero or more times o '(' and ')' are grouping parentheses o ε is the empty string, a special string of length 0

11 Chart 11  Algebraic Properties o | is commutative and associative: r|s = s|r, r|(s|t) = (r|s)|t o Concatenation is associative: (rs)t = r(st) o Concatenation distributes over |: r(s|t) = rs|rt, (s|t)r = sr|tr o ε is the identity for concatenation: εr = r, rε = r o * is idempotent: r** = r*, r* = (r|ε)*

12 Chart 12  Common Extensions o r+ : one or more of expression r, same as rr* o r^k : k repetitions of r, e.g. r^3 = rrr o ~r : the characters not in the expression r, e.g. ~[\t\n] o r-z : range of characters, e.g. [0-9a-z] o r? : zero or one copy of expression r (used for fields of an expression that are optional)

13 Chart 13  Regular Expression for Representing Months o Examples of legal inputs: January represented as 1 or 01; October represented as 10 o First Try: (0|1|ε)[0-9], i.e. 0, 1, or ε followed by a digit between 0 and 9. Matches all legal inputs? Yes: 1, 2, 3, …, 10, 11, 12, 01, 02, …, 09. Matches any illegal inputs? Yes: 0, 00, 18

14 Chart 14  Regular Expression for Representing Months o Examples of legal inputs January represented as 1 or 01 October represented as 10 o Second Try: [1-9]|(0[1-9])|(1[0-2]) Any number between 1 and 9 or 0 followed by any number between 1 and 9 or 1 followed by any number between 0 and 2 Matches all legal inputs? Yes 1, 2, 3, …, 10, 11, 12, 01, 02, …, 09 Matches any illegal inputs? No
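Both attempts can be checked with java.util.regex. This is a small sketch: the class name MonthRegex and its two helper methods are made up for the example, and the first try is written as (0|1)?[0-9], which is how "0, 1, or nothing, followed by a digit" is expressed in Java's regex syntax.

```java
import java.util.regex.Pattern;

// Hypothetical helper class comparing the two month expressions.
final class MonthRegex {
    // First try: 0, 1, or nothing, followed by any digit.
    static final Pattern FIRST = Pattern.compile("(0|1)?[0-9]");
    // Second try, copied from the slide.
    static final Pattern SECOND = Pattern.compile("[1-9]|(0[1-9])|(1[0-2])");

    static boolean first(String s)  { return FIRST.matcher(s).matches(); }
    static boolean second(String s) { return SECOND.matcher(s).matches(); }

    public static void main(String[] args) {
        for (String s : new String[] {"1", "01", "10", "12", "0", "00", "18"}) {
            System.out.println(s + ": first try=" + first(s) + ", second try=" + second(s));
        }
    }
}
```

Running it shows the first expression accepting 0, 00 and 18, while the second rejects them and still accepts every legal month.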

15 Chart 15  Regular Expression for Floating Point Numbers o Examples of legal inputs: 1.0, 0.2, 3.14159, -1.0, 2.7e8, 1.0E-6, -2.5e+5. Assume that a 0 is required before numbers less than 1 and that extra leading zeros are not prevented, so numbers such as 0011 or 0003.14159 are legal o Building the regular expression: Assume digit → 0|1|2|3|4|5|6|7|8|9. Handle simple decimals such as 1.0, 0.2, 3.14159: digit+.digit+ (1 or more digits, followed by a literal '.', followed by 1 or more digits). Add an optional sign (only minus, no plus): (-|ε)digit+.digit+ or -?digit+.digit+

16 Chart 16  Regular Expression for Floating Point Numbers (cont.) o Building the regular expression (cont.): Format for the exponent: (E|e)(+|-)?(digit+). Adding it as an optional expression to the decimal part: (-|ε)digit+.digit+((E|e)(+|-)?(digit+))?
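The finished expression translates almost directly into a Java pattern. In this sketch (the class name FloatRegex is an assumption), digit becomes [0-9], the optional sign becomes -?, and the decimal point and the plus sign are escaped because they are metacharacters in Java regex syntax.

```java
import java.util.regex.Pattern;

// Hypothetical helper class wrapping the floating point expression from the slide.
final class FloatRegex {
    // -? digit+ . digit+ ( (E|e) (+|-)? digit+ )?
    static final Pattern FLOAT =
        Pattern.compile("-?[0-9]+\\.[0-9]+((E|e)(\\+|-)?([0-9]+))?");

    static boolean isFloat(String s) { return FLOAT.matcher(s).matches(); }

    public static void main(String[] args) {
        String[] samples = {"1.0", "0.2", "3.14159", "-1.0", "2.7e8", "1.0E-6", "-2.5e+5", ".5", "+1.0"};
        for (String s : samples) {
            System.out.println(s + " -> " + isFloat(s));
        }
    }
}
```

As on the slide, a leading plus sign and a missing digit before the decimal point are both rejected, while extra leading zeros such as 0003.14159 are accepted.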

17 Chart 17  Extended BNF (EBNF) o Combination of BNF and RE o N::=X, where N is a nonterminal symbol and X is an extended RE, i.e., an RE constructed from both terminal and nonterminal symbols o EBNF: the right-hand side may use |, *, (, ); the right-hand side may contain both terminal and nonterminal symbols

18 Chart 18 Expression::=primary-Expression (Operator primary-Expression)* primary-Expression::=Identifier |( Expression ) Identifier::=a|b|c|d|e Operator::=+|-|*|/ Generates e a + b a – b – c a + (b * c) a + (b + c) / d a – (b – (c – (d – e)))

19 Chart 19  Left Factorization: XY | XZ is equivalent to X(Y | Z). Before: single-Command ::= V-name := Expression | if Expression then single-Command | if Expression then single-Command else single-Command. After: single-Command ::= V-name := Expression | if Expression then single-Command (ε | else single-Command)

20 Chart 20  Elimination of left recursion N::= X | NY is equivalent to N::=X(Y)* Identifier::= Letter |Identifier Letter |Identifier Digit Identifier::=Letter |Identifier (Letter | Digit) Identifier::=Letter(Letter | Digit)*

21 Chart 21  Substitution of nonterminal symbols: given N::=X, we can substitute each occurrence of N with X iff N::=X is nonrecursive and is the only production rule for N. Before: single-Command ::= for Control-Variable := Expression To-or-Downto Expression do single-Command | … Control-Variable ::= Identifier To-or-Downto ::= to | downto. After: single-Command ::= for Identifier := Expression (to | downto) Expression do single-Command | …

22 Chart 22  Starter set of an RE X o starters[[X]] o Set of terminal symbols that can start a string generated by X  Examples o starters[[his | her | its]] = {h, i} o starters[[(re)* set]] = {r, s}

23 Chart 23  Precise and complete definition of starters: starters[[ε]] = {}; starters[[t]] = {t}, where t is a terminal symbol; starters[[X Y]] = starters[[X]] ∪ starters[[Y]] if X generates ε; starters[[X Y]] = starters[[X]] if X does not generate ε; starters[[X | Y]] = starters[[X]] ∪ starters[[Y]]; starters[[X*]] = starters[[X]]  To generalize for a starter set of an extended RE, add o starters[[N]] = starters[[X]], where N is a nonterminal symbol defined by production rule N ::= X

24 Chart 24 Expression ::= primary-Expression (Operator primary-Expression)* primary-Expression ::= Identifier | ( Expression ) Identifier ::= a|b|c|d|e Operator ::= +|-|*|/ starters[[Expression]] = starters[[primary-Expression (Operator primary-Expression)*]] = starters[[primary-Expression]] = starters[[Identifier]] ∪ starters[[ ( Expression ) ]] = starters[[a | b | c | d | e]] ∪ { ( } = {a, b, c, d, e, (}
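The starter-set rules can be turned into a short recursive computation. The sketch below is one possible encoding, not code from the Triangle compiler: the class Starters, its node classes (T, N, Seq, Alt, Star) and the helper methods are all hypothetical names. It assumes the grammar is not left-recursive, since a left-recursive rule would make the recursion loop forever.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical encoding of extended REs and the starters[[...]] rules.
final class Starters {
    interface Re {}
    static final class T implements Re { final String sym; T(String sym) { this.sym = sym; } }
    static final class N implements Re { final String name; N(String name) { this.name = name; } }
    static final class Seq implements Re { final Re x, y; Seq(Re x, Re y) { this.x = x; this.y = y; } }
    static final class Alt implements Re { final Re x, y; Alt(Re x, Re y) { this.x = x; this.y = y; } }
    static final class Star implements Re { final Re x; Star(Re x) { this.x = x; } }

    static Re t(String s) { return new T(s); }
    static Re n(String s) { return new N(s); }
    static Re seq(Re... rs) { Re r = rs[0]; for (int i = 1; i < rs.length; i++) r = new Seq(r, rs[i]); return r; }
    static Re alt(Re... rs) { Re r = rs[0]; for (int i = 1; i < rs.length; i++) r = new Alt(r, rs[i]); return r; }

    final Map<String, Re> rules;   // one production rule N ::= X per nonterminal
    Starters(Map<String, Re> rules) { this.rules = rules; }

    // Does r generate the empty string?
    boolean generatesEmpty(Re r) {
        if (r instanceof T) return false;
        if (r instanceof N) return generatesEmpty(rules.get(((N) r).name));
        if (r instanceof Seq) return generatesEmpty(((Seq) r).x) && generatesEmpty(((Seq) r).y);
        if (r instanceof Alt) return generatesEmpty(((Alt) r).x) || generatesEmpty(((Alt) r).y);
        return true;               // X* always generates the empty string
    }

    Set<String> starters(Re r) {
        Set<String> out = new TreeSet<>();
        if (r instanceof T) {
            out.add(((T) r).sym);                               // starters[[t]] = {t}
        } else if (r instanceof N) {
            out.addAll(starters(rules.get(((N) r).name)));      // starters[[N]] = starters[[X]]
        } else if (r instanceof Seq) {
            Seq s = (Seq) r;
            out.addAll(starters(s.x));
            if (generatesEmpty(s.x)) out.addAll(starters(s.y)); // only if X generates ε
        } else if (r instanceof Alt) {
            out.addAll(starters(((Alt) r).x));
            out.addAll(starters(((Alt) r).y));
        } else {
            out.addAll(starters(((Star) r).x));                 // starters[[X*]] = starters[[X]]
        }
        return out;
    }

    // The Expression grammar from the slide.
    static Starters example() {
        Map<String, Re> g = new HashMap<>();
        g.put("Expression", seq(n("primary-Expression"),
                new Star(seq(n("Operator"), n("primary-Expression")))));
        g.put("primary-Expression", alt(n("Identifier"), seq(t("("), n("Expression"), t(")"))));
        g.put("Identifier", alt(t("a"), t("b"), t("c"), t("d"), t("e")));
        g.put("Operator", alt(t("+"), t("-"), t("*"), t("/")));
        return new Starters(g);
    }

    public static void main(String[] args) {
        // prints [(, a, b, c, d, e]
        System.out.println(example().starters(n("Expression")));
    }
}
```

The result for Expression is the same set derived by hand on the slide: the five identifiers plus the left parenthesis.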

25 Chart 25  The purpose of scanning is to recognize tokens in the source program. Or, to group input characters (the source program text) into tokens.  Difference between parsing and scanning: o Parsing groups terminal symbols, which are tokens, into larger phrases such as expressions and commands and analyzes the tokens for correctness and structure o Scanning groups individual characters into tokens

26 Chart 26 Lexical Analyzer Parser & Semantic Analyzer Intermediate Code Generation Optimization Assembly Code Generation Symbol Table Source code Assembly code tokens parse tree intermediate representation

27 Chart 27 (Figure) Source text: let var y: Integer in !new year y := y+1. The Input Converter fills a Buffer with the character string (spaces shown explicitly), and the Scanner turns it into tokens: let (let), var (var), y (Ident.), : (colon), Integer (Ident.), in (in), y (Ident.), := (becomes), y (Ident.), + (op.), 1 (Intlit.), eot.

28 Chart 28  Handle keywords (reserved words) o Recognizes identifiers and keywords o Match explicitly: write a regular expression for each keyword; an identifier is any alphanumeric string which is not a keyword o Match as an identifier, perform lookup: no special regular expressions for keywords; when an identifier is found, perform a lookup into a preloaded keyword table  How does Triangle handle keywords? Discuss in terms of efficiency and ease of coding.
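The second strategy (scan as an identifier, then look the spelling up) takes very little code. The sketch below only illustrates the idea and makes no claim about how Triangle does it: the class KeywordLookup and the method classify are hypothetical, and the keyword set is copied from the reserved words in the token table on Chart 8.

```java
import java.util.Set;

// Hypothetical illustration of "match as an identifier, perform lookup".
final class KeywordLookup {
    // Preloaded keyword table (the reserved words listed on Chart 8).
    static final Set<String> KEYWORDS = Set.of(
        "array", "begin", "const", "do", "else", "end", "func", "if", "in",
        "let", "of", "proc", "record", "then", "type", "var", "while");

    // Called after the scanner has collected an alphanumeric spelling.
    static String classify(String spelling) {
        return KEYWORDS.contains(spelling) ? "KEYWORD" : "IDENTIFIER";
    }

    public static void main(String[] args) {
        for (String s : new String[] {"let", "var", "y", "Integer", "in"}) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```

With this approach the scanner needs only one rule for alphanumeric strings, at the cost of one table lookup per identifier.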

29 Chart 29  Remove white space o Tabs, spaces, new lines  Remove comments o Single line -- Ada comment o Multi-line, start and end delimiters { Pascal comment } /* c comment */ o Nested o Runaway comments Nonterminated comments can’t be detected till end of file

30 Chart 30  Perform look ahead o Multi-character tokens 1..10 vs. 1.10 &, && <, <= etc  Challenging input languages o FORTRAN Keywords not reserved Blanks are not a delimiter Example (comma vs. decimal) DO10I=1,5 start of a do loop (equivalent to a C for loop) DO10I=1.5 an assignment statement, assignment to variable DO10I

31 Chart 31  Challenging input languages (cont.) o PL/I, keywords not reserved IF THEN THEN THEN = ELSE; ELSE ELSE = THEN;

32 Chart 32  Error Handling o Error token passed to the parser, which reports the error o Recovery: delete the characters of the current token that have been read so far and restart scanning at the next unread character; or delete the first character of the current lexeme and resume scanning from the next character o Examples of lexical errors: 3.25e (bad format for a constant); Var#1 (illegal character) o Some errors that are not lexical errors: mistyped keywords (Begim); mismatched parentheses; undeclared variables

33 Chart 33  Issues o Simpler design – parser doesn’t have to worry about white space, etc. o Improve compiler efficiency – allows the construction of a specialized and potentially more efficient processor o Compiler portability is enhanced – input alphabet peculiarities and other device-specific anomalies can be restricted to the scanner

34 Chart 34  What are the keywords in Triangle?  How are keywords and identifiers implemented in Triangle?  Is look ahead implemented in Triangle? o If so, how?

35 Chart 35 Lexical Analyzer Parser Intermediate Code Generation Optimization Assembly Code Generation Symbol Table Source code Assembly code tokens parse tree intermediate representation Semantic Analyzer

36 Chart 36  Given an unambiguous, context free grammar, parsing is o Recognition of an input string, i.e., deciding whether or not the input string is a sentence of the grammar o Parsing of an input string, i.e., recognition of the input string plus determination of its phrase structure. The phrase structure can be represented by a syntax tree, or otherwise. Unambiguous is necessary so that every sentence of the grammar will form exactly one syntax tree.

37 Chart 37  The syntax of programming language constructs is described by context-free grammars.  Advantages of unambiguous, context-free grammars o A precise, yet easy-to-understand, syntactic specification of the programming language o For certain classes of grammars we can automatically construct an efficient parser that determines if a source program is syntactically well formed. o Imparts a structure to a programming language that is useful for the translation of source programs into correct object code and for the detection of errors. o Easier to add new constructs to the language if the implementation is based on a grammatical description of the language

38 Chart 38  Check the syntax (structure) of a program and create a tree representation of the program  Programming languages have non-regular constructs o Nesting o Recursion  Context-free grammars are used to express the syntax for programming languages sequence of tokens parser syntax tree

39 Chart 39  A context-free grammar consists of o A set of tokens or terminal symbols o A set of non-terminal symbols o A set of rules or productions which express the legal relationships between symbols o A start or goal symbol  Example: 1. expr → expr - digit 2. expr → expr + digit 3. expr → digit 4. digit → 0|1|2|…|9  Tokens: -, +, 0, 1, 2, …, 9  Non-terminals: expr, digit  Start symbol: expr

40 Chart 40 1. expr → expr - digit 2. expr → expr + digit 3. expr → digit 4. digit → 0|1|2|…|9 Example input: 3 + 8 - 2 (Figure: syntax tree for 3 + 8 - 2, rooted at expr.)

41 Chart 41  Given a grammar for a language and a program, how do you know if the syntax of the program is legal?  A legal program can be derived from the start symbol of the grammar Grammar must be unambiguous and context-free

42 Chart 42  The derivation begins with the start symbol  At each step of a derivation the right hand side of a grammar rule is used to replace a non-terminal symbol  Continue replacing non-terminals until only terminal symbols remain 1. expr → expr - digit 2. expr → expr + digit 3. expr → digit 4. digit → 0|1|2|…|9 Example input: 3 + 8 - 2 Derivation: expr ⇒ expr - digit (Rule 1) ⇒ expr - 2 (Rule 4) ⇒ expr + digit - 2 (Rule 2) ⇒ expr + 8 - 2 (Rule 4) ⇒ digit + 8 - 2 (Rule 3) ⇒ 3 + 8 - 2 (Rule 4)

43 Chart 43  The rightmost non-terminal is replaced in each step 1. expr → expr - digit 2. expr → expr + digit 3. expr → digit 4. digit → 0|1|2|…|9 Example input: 3 + 8 - 2 Rightmost derivation: expr ⇒ expr - digit (Rule 1) ⇒ expr - 2 (Rule 4) ⇒ expr + digit - 2 (Rule 2) ⇒ expr + 8 - 2 (Rule 4) ⇒ digit + 8 - 2 (Rule 3) ⇒ 3 + 8 - 2 (Rule 4)

44 Chart 44  The leftmost non-terminal is replaced in each step 1. expr → expr - digit 2. expr → expr + digit 3. expr → digit 4. digit → 0|1|2|…|9 Example input: 3 + 8 - 2 Leftmost derivation: expr ⇒ expr - digit (Rule 1) ⇒ expr + digit - digit (Rule 2) ⇒ digit + digit - digit (Rule 3) ⇒ 3 + digit - digit (Rule 4) ⇒ 3 + 8 - digit (Rule 4) ⇒ 3 + 8 - 2 (Rule 4)

45 Chart 45  The leftmost non-terminal is replaced in each step. Leftmost derivation: expr ⇒ expr - digit (Rule 1) ⇒ expr + digit - digit (Rule 2) ⇒ digit + digit - digit (Rule 3) ⇒ 3 + digit - digit (Rule 4) ⇒ 3 + 8 - digit (Rule 4) ⇒ 3 + 8 - 2 (Rule 4) (Figure: syntax tree for 3 + 8 - 2 with the derivation steps numbered 1 to 6.)

46 Chart 46  Parser examines terminal symbols of the input string, in order from left to right  Reconstructs the syntax tree from the bottom (terminal nodes) up (toward the root node)  Bottom-up parsing reduces a string w to the start symbol of the grammar. o At each reduction step a particular sub-string matching the right side of a production is replaced by the symbol on the left of that production, and if the sub-string is chosen correctly at each step, a rightmost derivation is traced out in reverse.

47 Chart 47  Types of bottom-up parsing algorithms o Shift-reduce parsing At each reduction step a particular sub-string matching the right side of a production is replaced by the symbol on the left of that production, and if the sub-string is chosen correctly at each step, a rightmost derivation is traced out in reverse. o LR(k) parsing L is for left-to-right scanning of the input, the R is for constructing a right-most derivation in reverse, and the k is for the number of input symbols of look-ahead that are used in making parsing decisions.

48 Chart 48 1. expr  expr – digit 2. expr  expr + digit 3. expr  digit 4. digit  0|1|2|…|9 Example input: 3 + 8 - 2 3 + 8 - 2 3 + 8 - 2 digit 3 + 8 - 2 3 + 8 - 2 expr

49 Chart 49 3 + 8 - 2 digit expr 3 + 8 - 2 digit expr digit expr 3 + 8 - 2 digit expr digit

50 Chart 50 1. S → aABe 2. A → Abc | b 3. B → d Example input: abbcde Reduction step: abbcde reduces to aAbcde (using A → b on the first b)

51 Chart 51 1. S → aABe 2. A → Abc | b 3. B → d Example input: abbcde Reduction step: aAbcde reduces to aAde (using A → Abc)

52 Chart 52 1. S → aABe 2. A → Abc | b 3. B → d Example input: abbcde Reduction step: aAde reduces to aABe (using B → d)

53 Chart 53 1. S → aABe 2. A → Abc | b 3. B → d Example input: abbcde Reduction step: aABe reduces to S (using S → aABe)
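The reduction sequence for abbcde can be reproduced with a brute-force search. This is deliberately not an LR or shift-reduce parser: the hypothetical class ReduceDemo simply tries every substring that matches a right-hand side, backtracks from dead ends, and records the sentential forms along the first successful path. A real bottom-up parser uses a stack and parsing tables to pick the right substring without search.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical brute-force reducer for the grammar S -> aABe, A -> Abc | b, B -> d.
final class ReduceDemo {
    // Each rule is {left-hand side, right-hand side}.
    static final String[][] RULES = {
        {"S", "aABe"}, {"A", "Abc"}, {"A", "b"}, {"B", "d"}
    };

    // Depth-first search for a sequence of reductions that turns form into "S".
    static boolean reduce(String form, List<String> trace) {
        trace.add(form);
        if (form.equals("S")) return true;
        for (int pos = 0; pos < form.length(); pos++) {
            for (String[] rule : RULES) {
                if (form.startsWith(rule[1], pos)) {
                    String next = form.substring(0, pos) + rule[0]
                                + form.substring(pos + rule[1].length());
                    if (reduce(next, trace)) return true;
                }
            }
        }
        trace.remove(trace.size() - 1);   // dead end: undo this form and try another choice
        return false;
    }

    // Returns the successful sequence of sentential forms, or an empty list.
    static List<String> trace(String input) {
        List<String> t = new ArrayList<>();
        reduce(input, t);
        return t;
    }

    public static void main(String[] args) {
        // prints [abbcde, aAbcde, aAde, aABe, S]
        System.out.println(trace("abbcde"));
    }
}
```

The forms printed are the ones shown on Charts 50 to 53; an input such as abcde yields an empty list because no choice of reductions reaches S.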

54 Chart 54 1. Sentence  Subject Verb Object. 2. Subject  I | a Noun | the Noun 3. Object  me | a Noun | the Noun 4. Noun  cat | mat | rat 5. Verb  like | is | see | sees Example input: the cat sees a rat thecat seesarat. the catseesarat Noun the cat sees a rat.  the Noun sees a rat.. the catseesarat Noun the Noun sees a rat..

55 Chart 55 1. Sentence  Subject Verb Object. 2. Subject  I | a Noun | the Noun 3. Object  me | a Noun | the Noun 4. Noun  cat | mat | rat 5. Verb  like | is | see | sees Example input: the cat sees a rat thecat sees arat Noun Subject sees a rat. Subject. thecatseesarat Noun the Noun sees a rat.  Subject sees a rat. Subject.

56 Chart 56 1. Sentence  Subject Verb Object. 2. Subject  I | a Noun | the Noun 3. Object  me | a Noun | the Noun 4. Noun  cat | mat | rat 5. Verb  like | is | see | sees Example input: the cat sees a rat thecatseesarat Noun Subject sees a rat.  Subject Verb a rat. Subject Verb. thecatsees arat Noun Subject Verb a rat. Subject Verb.

57 Chart 57 1. Sentence  Subject Verb Object. 2. Subject  I | a Noun | the Noun 3. Object  me | a Noun | the Noun 4. Noun  cat | mat | rat 5. Verb  like | is | see | sees Example input: the cat sees a rat Subject Verb a rat.  Subject Verb a Noun. thecatsees a rat Noun Subject Verb. Noun thecatsees a rat Noun Subject Verb. Noun Subject Verb a Noun.

58 Chart 58 1. Sentence → Subject Verb Object . 2. Subject → I | a Noun | the Noun 3. Object → me | a Noun | the Noun 4. Noun → cat | mat | rat 5. Verb → like | is | see | sees Example input: the cat sees a rat . Reduction step: Subject Verb a Noun . reduces to Subject Verb Object . (using Object → a Noun) What would happen if we chose 'Subject → a Noun' instead of 'Object → a Noun'?

59 Chart 59 1. Sentence  Subject Verb Object. 2. Subject  I | a Noun | the Noun 3. Object  me | a Noun | the Noun 4. Noun  cat | mat | rat 5. Verb  like | is | see | sees Example input: the cat sees a rat thecatseesarat Noun Subject Verb. Noun Subject Verb Object. Object Sentence

60 Chart 60  The parser examines the terminal symbols of the input string, in order from left to right.  The parser reconstructs its syntax tree from the top (root node) down (towards the terminal nodes). An attempt to find the leftmost derivation for an input string

61 Chart 61  General rules for top-down parsers o Start with just a stub for the root node o At each step the parser takes the left most stub o If the stub is labeled by terminal symbol t, the parser connects it to the next input terminal symbol, which must be t. (If not, the parser has detected a syntactic error.) o If the stub is labeled by nonterminal symbol N, the parser chooses one of the production rules N::= X 1 …X n, and grows branches from the node labeled by N to new stubs labeled X 1,…, X n (in order from left to right). o Parsing succeeds when and if the whole input string is connected up to the syntax tree.

62 Chart 62  Two forms o Backtracking parsers Guesses which rule to apply, back up, and changes choices if it can not proceed o Predictive Parsers Predicts which rule to apply by using look-ahead tokens Backtracking parsers are not very efficient. We will cover Predictive parsers

63 Chart 63  Many types o LL(1) parsing: the first L is for scanning the input from left to right; the second L is for producing a leftmost derivation; the 1 is for using one input symbol of look-ahead. Table driven, with an explicit stack to maintain the parse tree o Recursive-descent parsing: uses recursive subroutines to traverse the parse tree

64 Chart 64  Lookahead in predictive parsing o The lookahead token (next token in the input) is used to determine which rule should be used next o For example: 1. term → num term' 2. term' → '+' num term' | '-' num term' | ε 3. num → '0'|'1'|'2'|…|'9' Example input: 7 + 3 - 2 (Figure: the parse tree growing from term as 7 and + are matched.)

65 Chart 65 1. term  num term’ 2. term’  ‘+’ num term’ | ‘-’ num term’ |  – num  ‘0’|’1’|’2’|…|’9’ Example input: 7 + 3 - 2 term’num 7 + term num term’ 3 num 7 + term num term’ 3 - num term’

66 Chart 66 1. term  num term’ 2. term’  ‘+’ num term’ | ‘-’ num term’ |  – num  ‘0’|’1’|’2’|…|’9’ Example input: 7 + 3 - 2 term’ num 7 + term num term’ 3 - num term’ 2 num 7 + term num term’ 3 - num term’ 2 
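The lookahead decisions traced on these charts fit in a few lines of Java. This is a minimal sketch with hypothetical names (TermParser, accepts): each nonterminal gets one method, and the method for term' looks at the next character to choose between the '+' alternative, the '-' alternative, and the empty alternative. Blanks are stripped first, which is a simplification rather than real scanning.

```java
// Hypothetical predictive parser for:
//   term  -> num term'
//   term' -> '+' num term' | '-' num term' | (empty)
//   num   -> '0' | '1' | ... | '9'
final class TermParser {
    private final String input;
    private int pos = 0;

    TermParser(String input) { this.input = input.replace(" ", ""); }

    // One symbol of lookahead; '\0' stands for end of input.
    private char lookahead() { return pos < input.length() ? input.charAt(pos) : '\0'; }

    private void parseTerm() { parseNum(); parseTermRest(); }

    private void parseTermRest() {
        char la = lookahead();
        if (la == '+' || la == '-') {   // lookahead selects one of the two operator rules
            pos++;
            parseNum();
            parseTermRest();
        }
        // otherwise choose the empty alternative and consume nothing
    }

    private void parseNum() {
        if (!Character.isDigit(lookahead())) {
            throw new IllegalStateException("digit expected at position " + pos);
        }
        pos++;
    }

    // True if the whole input is a term.
    static boolean accepts(String s) {
        TermParser p = new TermParser(s);
        try {
            p.parseTerm();
            return p.pos == p.input.length();
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts("7 + 3 - 2"));   // prints true
        System.out.println(accepts("7 +"));         // prints false
    }
}
```

No backtracking is needed because the three alternatives of term' can always be told apart by a single lookahead symbol.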

67 Chart 67  Top-down parsing algorithm o Consists of a group of methods (programs) parseN, one for each nonterminal symbol N of the grammar. o The task of each method parseN is to parse a single N-phrase o These parsing methods cooperate to parse complete sentences

68 Chart 68 1. Sentence  Subject Verb Object. 2. Subject  I | a Noun | the Noun 3. Object  me | a Noun | the Noun 4. Noun  cat | mat | rat 5. Verb  like | is | see | sees Example input: the cat sees a rat Sentence Subject Verb Object. thecatseesarat. a.Decide which production rule to apply. Only one, #1. This step created four stubs.

69 Chart 69 1. Sentence  Subject Verb Object. 2. Subject  I | a Noun | the Noun 3. Object  me | a Noun | the Noun 4. Noun  cat | mat | rat 5. Verb  like | is | see | sees Example input: the cat sees a rat Sentence Subject Verb Object. the catseesarat Noun

70 Chart 70 1. Sentence  Subject Verb Object. 2. Subject  I | a Noun | the Noun 3. Object  me | a Noun | the Noun 4. Noun  cat | mat | rat 5. Verb  like | is | see | sees Example input: the cat sees a rat Sentence Subject Verb Object. the cat seesarat Noun

71 Chart 71 1. Sentence  Subject Verb Object. 2. Subject  I | a Noun | the Noun 3. Object  me | a Noun | the Noun 4. Noun  cat | mat | rat 5. Verb  like | is | see | sees Example input: the cat sees a rat Sentence Subject Verb Object. the catsees arat Noun

72 Chart 72 1. Sentence  Subject Verb Object. 2. Subject  I | a Noun | the Noun 3. Object  me | a Noun | the Noun 4. Noun  cat | mat | rat 5. Verb  like | is | see | sees Example input: the cat sees a rat Sentence Subject Verb Object. the catseesa rat Noun

73 Chart 73 1. Sentence  Subject Verb Object. 2. Subject  I | a Noun | the Noun 3. Object  me | a Noun | the Noun 4. Noun  cat | mat | rat 5. Verb  like | is | see | sees Example input: the cat sees a rat Sentence Subject Verb Object. the catseesarat Noun

74 Chart 74 1. Sentence  Subject Verb Object. 2. Subject  I | a Noun | the Noun 3. Object  me | a Noun | the Noun 4. Noun  cat | mat | rat 5. Verb  like | is | see | sees Example input: the cat sees a rat Sentence Subject Verb Object. the catseesarat Noun

75 Chart 75 ParseSentence ParseSubject ParseObject ParseVerb ParseNoun 1. Sentence  Subject Verb Object. 2. Subject  I | a Noun | the Noun 3. Object  me | a Noun | the Noun 4. Noun  cat | mat | rat 5. Verb  like | is | see | sees

76 Chart 76 ParseSentence parseSubject parseVerb parseObject parseEnd 1. Sentence  Subject Verb Object. 2. Subject  I | a Noun | the Noun 3. Object  me | a Noun | the Noun 4. Noun  cat | mat | rat 5. Verb  like | is | see | sees Sentence  Subject Verb Object.

77 Chart 77 ParseSubject: if input = "I" accept else if input = "a" accept parseNoun else if input = "the" accept parseNoun else error 1. Sentence → Subject Verb Object . 2. Subject → I | a Noun | the Noun 3. Object → me | a Noun | the Noun 4. Noun → cat | mat | rat 5. Verb → like | is | see | sees Rule being parsed: Subject → I | (a | the) Noun

78 Chart 78 ParseNoun if input = “cat” accept else if input =“mat” accept else if input = “rat” accept else error 1. Sentence  Subject Verb Object. 2. Subject  I | a Noun | the Noun 3. Object  me | a Noun | the Noun 4. Noun  cat | mat | rat 5. Verb  like | is | see | sees Noun  cat |mat |rat

79 Chart 79 ParseObject: if input = "me" accept else if input = "a" accept parseNoun else if input = "the" accept parseNoun else error 1. Sentence → Subject Verb Object . 2. Subject → I | a Noun | the Noun 3. Object → me | a Noun | the Noun 4. Noun → cat | mat | rat 5. Verb → like | is | see | sees Rule being parsed: Object → me | (a | the) Noun

80 Chart 80 ParseVerb if input = “like” accept else if input =“is” accept else if input = “see” accept else if input = “sees” accept else error 1. Sentence  Subject Verb Object. 2. Subject  I | a Noun | the Noun 3. Object  me | a Noun | the Noun 4. Noun  cat | mat | rat 5. Verb  like | is | see | sees Verb  like |is |see |sees

81 Chart 81 ParseEnd if input = “.” accept else error 1. Sentence  Subject Verb Object. 2. Subject  I | a Noun | the Noun 3. Object  me | a Noun | the Noun 4. Noun  cat | mat | rat 5. Verb  like | is | see | sees.
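The parse routines on Charts 75 to 81 can be collected into one small recursive-descent recognizer. The following is a sketch under a few assumptions: the class name MicroEnglishParser and the helpers accept, acceptOneOf and isSentence are invented for the example, tokens are taken to be separated by blanks (so the final "." must stand alone), and the parser only recognizes sentences without building a tree.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical recursive-descent recognizer for the five-rule sentence grammar.
final class MicroEnglishParser {
    private final List<String> tokens;
    private int pos = 0;

    MicroEnglishParser(String text) {
        tokens = Arrays.asList(text.trim().split("\\s+"));   // blank-separated tokens
    }

    private String current() { return pos < tokens.size() ? tokens.get(pos) : "<eot>"; }

    private void accept(String expected) {
        if (!current().equals(expected)) {
            throw new IllegalStateException("expected " + expected + " but found " + current());
        }
        pos++;
    }

    private void acceptOneOf(String... options) {
        if (!Arrays.asList(options).contains(current())) {
            throw new IllegalStateException("unexpected token " + current());
        }
        pos++;
    }

    // Sentence -> Subject Verb Object .
    private void parseSentence() { parseSubject(); parseVerb(); parseObject(); accept("."); }

    // Subject -> I | (a | the) Noun
    private void parseSubject() {
        if (current().equals("I")) accept("I");
        else { acceptOneOf("a", "the"); parseNoun(); }
    }

    // Object -> me | (a | the) Noun
    private void parseObject() {
        if (current().equals("me")) accept("me");
        else { acceptOneOf("a", "the"); parseNoun(); }
    }

    // Noun -> cat | mat | rat
    private void parseNoun() { acceptOneOf("cat", "mat", "rat"); }

    // Verb -> like | is | see | sees
    private void parseVerb() { acceptOneOf("like", "is", "see", "sees"); }

    static boolean isSentence(String text) {
        MicroEnglishParser p = new MicroEnglishParser(text);
        try {
            p.parseSentence();
            return p.pos == p.tokens.size();   // all input must be consumed
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isSentence("the cat sees a rat ."));   // prints true
        System.out.println(isSentence("me like I ."));            // prints false
    }
}
```

Each method mirrors one production rule, and the current token alone decides which alternative is taken, which is the pattern the slides build up step by step.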

82 Chart 82  Given a (suitable) context-free grammar o Express the grammar in EBNF, with a single production rule for each nonterminal symbol, and perform any necessary grammar transformations: always eliminate left recursion; always left-factorize whenever possible o Transcribe each EBNF production rule N::=X to a parsing method parseN, whose body is determined by X o Make the parser consist of: a private variable currentToken; the private parsing methods developed in the previous step; private auxiliary methods accept and acceptIt, both of which call the scanner; a public parse method that calls parseS (where S is the start symbol of the grammar), having first called the scanner to store the first input token in currentToken

83 Chart 83  “C makes it easy to shoot yourself in the foot; C++ makes it harder, but when you do, it blows away your whole leg.” o Bjarne Stroustrup

84 Chart 84 Did you really say that? Dr. Bjarne Stroustrup: Yes, I did say something along the lines of C makes it easy to shoot yourself in the foot; C++ makes it harder, but when you do, it blows your whole leg off. What people tend to miss is that what I said about C++ is to a varying extent true for all powerful languages. As you protect people from simple dangers, they get themselves into new and less obvious problems. Someone who avoids the simple problems may simply be heading for a not-so-simple one. One problem with very supporting and protective environments is that the hard problems may be discovered too late or be too hard to remedy once discovered. Also, a rare problem is harder to find than a frequent one because you don't suspect it. I also said, "Within C++, there is a much smaller and cleaner language struggling to get out." For example, that quote can be found on page 207 of The Design and Evolution of C++. And no, that smaller and cleaner language is not Java or C#. The quote occurs in a section entitled "Beyond Files and Syntax". I was pointing out that the C++ semantics is much cleaner than its syntax. I was thinking of programming styles, libraries and programming environments that emphasized the cleaner and more effective practices over archaic uses focused on the low-level aspects of C.

85 Chart 85  For production rule N::=X o Convert the production rule to a parsing method named parseN: private void parseN () { parse X } o Refine parse ε to a dummy statement o Refine parse t (where t is a terminal symbol) to accept(t) or acceptIt() o Refine parse N (where N is a nonterminal symbol) to a call of the corresponding parsing method parseN() o Refine parse X Y to { parse X parse Y } o Refine parse X|Y to switch (currentToken.kind) { cases in starters[[X]]: parse X break; cases in starters[[Y]]: parse Y break; default: report a syntax error }

86 Chart 86  For X | Y o Choose parse X only if the current token is one that can start an X-phrase o Choose parse Y only if the current token is one that can start a Y-phrase o starters[[X]] and starters[[Y]] must be disjoint  For X* o Choose while (currentToken.kind is in starters[[X]]) o starters[[X]] must be disjoint from the set of tokens that can follow X* in this particular context

87 Chart 87  A grammar that satisfies both these conditions is called an LL(1) grammar  Recursive-descent parsing is suitable only for LL(1) grammars

88 Chart 88  Good programming languages are designed with a relatively large “distance” between syntactically correct programs, to increase the likelihood that conceptual mistakes are caught on syntactic errors.  Error repair usually occurs at two levels: o Local: repairs mistakes with little global import, such as missing semicolons and undeclared variables. o Scope: repairs the program text so that scopes are correct. Errors of this kind include unbalanced parentheses and begin/end blocks.

89 Chart 89  Repair actions can be divided into insertions and deletions. Typically the compiler will use some look ahead and backtracking in attempting to make progress in the parse. There is great variation among compilers, though some languages (PL/C) carry a tradition of good error repair. Goals of error repair are: o No input should cause the compiler to collapse o Illegal constructs are flagged o Frequently occurring errors are repaired gracefully o Minimal stuttering or cascading of errors. LL-Style parsing lends itself well to error repair, since the compiler uses the grammar’s rules to predict what should occur next in the input

90 Chart 90 single-Command ::= ε | V-name := Expression | Identifier ( Actual-Parameter-Sequence ) | begin Command end | let Declaration in single-Command | if Expression then single-Command else single-Command | while Expression do single-Command V-name ::= Identifier | V-name . Identifier | V-name [ Expression ] Identifier ::= Letter (Letter | Digit)* Letter ::= a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z |A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z Digit ::= 0|1|2|3|4|5|6|7|8|9

91 Chart 91  Starter Set for RE o starters[[X]] is the set of terminal symbols that can start a string generated by X  Example: starters[[single-Command]] = {Identifier, begin, let, if, while}  What about V-name vs. Identifier? Both the assignment and the call alternatives start with an Identifier, so when an Identifier is encountered, use the lookahead to look for := or (.

92 Chart 92 Mini-Triangle productions, each with its AST label and rule number:
Program ::= Command                                        Program (1.14)
Command ::= V-name := Expression                           AssignCommand (1.15a)
  | Identifier ( Expression )                              CallCommand (1.15b)
  | Command ; Command                                      SequentialCommand (1.15c)
  | if Expression then Command else Command                IfCommand (1.15d)
  | while Expression do Command                            WhileCommand (1.15e)
  | let Declaration in Command                             LetCommand (1.15f)
Expression ::= Integer-Literal                             IntegerExpression (1.16a)
  | V-name                                                 VnameExpression (1.16b)
  | Operator Expression                                    UnaryExpression (1.16c)
  | Expression Operator Expression                         BinaryExpression (1.16d)
V-name ::= Identifier                                      SimpleVname (1.17)
Declaration ::= const Identifier ~ Expression              ConstDeclaration (1.18a)
  | var Identifier : Type-denoter                          VarDeclaration (1.18b)
  | Declaration ; Declaration                              SequentialDeclaration (1.18c)
Type-denoter ::= Identifier                                SimpleTypeDenoter (1.19)

93 Chart 93  An explicit representation of the source program’s phrase structure  AST for Mini-Triangle

94 Chart 94  Program ASTs (P): Program with one subtree C, for Program ::= Command (1.14)  Command ASTs (C): AssignCommand with subtrees V and E (1.15a); CallCommand with subtrees Identifier (a leaf carrying its spelling) and E (1.15b); SequentialCommand with subtrees C1 and C2 (1.15c)  Productions: Command ::= V-name := Expression AssignCommand (1.15a) | Identifier ( Expression ) CallCommand (1.15b) | Command ; Command SequentialCommand (1.15c)

95 Chart 95 Command ASTs (C), continued: IfCommand with subtrees E, C1 and C2 (1.15d); WhileCommand with subtrees E and C (1.15e); LetCommand with subtrees D and C (1.15f)  Productions: Command ::= | if Expression then Command else Command IfCommand (1.15d) | while Expression do Command WhileCommand (1.15e) | let Declaration in Command LetCommand (1.15f)
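One common way to represent such ASTs in Java is a small class hierarchy with one class per label. The sketch below is an illustration only and covers a subset of the labels: the class names follow the slide, but the wrapper class MiniTriangleAst, the field names, and the show method are assumptions, not the Triangle compiler's actual AST classes.

```java
// Hypothetical AST classes for a few Mini-Triangle labels.
final class MiniTriangleAst {
    abstract static class Ast { abstract String show(); }
    abstract static class Command extends Ast {}
    abstract static class Expression extends Ast {}

    static final class SimpleVname extends Ast {
        final String identifier;
        SimpleVname(String identifier) { this.identifier = identifier; }
        String show() { return "SimpleVname(" + identifier + ")"; }
    }

    static final class IntegerExpression extends Expression {
        final String literal;
        IntegerExpression(String literal) { this.literal = literal; }
        String show() { return "IntegerExpression(" + literal + ")"; }
    }

    static final class VnameExpression extends Expression {
        final SimpleVname v;
        VnameExpression(SimpleVname v) { this.v = v; }
        String show() { return "VnameExpression(" + v.show() + ")"; }
    }

    static final class BinaryExpression extends Expression {
        final Expression e1; final String operator; final Expression e2;
        BinaryExpression(Expression e1, String operator, Expression e2) {
            this.e1 = e1; this.operator = operator; this.e2 = e2;
        }
        String show() { return "BinaryExpression(" + e1.show() + ", " + operator + ", " + e2.show() + ")"; }
    }

    static final class AssignCommand extends Command {
        final SimpleVname v; final Expression e;
        AssignCommand(SimpleVname v, Expression e) { this.v = v; this.e = e; }
        String show() { return "AssignCommand(" + v.show() + ", " + e.show() + ")"; }
    }

    static final class WhileCommand extends Command {
        final Expression e; final Command c;
        WhileCommand(Expression e, Command c) { this.e = e; this.c = c; }
        String show() { return "WhileCommand(" + e.show() + ", " + c.show() + ")"; }
    }

    // AST for the command  y := y + 1
    static Command example() {
        return new AssignCommand(
            new SimpleVname("y"),
            new BinaryExpression(new VnameExpression(new SimpleVname("y")), "+",
                                 new IntegerExpression("1")));
    }

    public static void main(String[] args) {
        System.out.println(example().show());
    }
}
```

A recursive-descent parser can build these nodes by having each parse method return the AST of the phrase it has just parsed, instead of returning nothing.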

