Syntax Analysis (Chapter 4)


1 Syntax Analysis (Chapter 4): Course Overview
PART I: overview material
    1 Introduction
    2 Language processors (tombstone diagrams, bootstrapping)
    3 Architecture of a compiler
PART II: inside a compiler
    4 Syntax analysis
    5 Contextual analysis
    6 Runtime organization
    7 Code generation
PART III: conclusion
    8 Interpretation
    9 Review

2 Abstract Syntax Trees
So far we have talked about how to build a recursive descent parser that recognizes a given language described by an LL(1) EBNF grammar. Now we will look at
- how to represent ASTs as data structures.
- how to modify the parser to construct an AST data structure.
We make heavy use of object-oriented programming (classes, inheritance, dynamic method binding).

3 AST Representation: Possible Tree Shapes
The possible forms of AST structures are determined by an AST grammar (as described earlier in Chapter 1).
Example: remember the Mini-Triangle abstract syntax
    Command ::= V-name := Expression                          AssignCmd
              | Identifier ( Expression )                     CallCmd
              | if Expression then Command else Command       IfCmd
              | while Expression do Command                   WhileCmd
              | let Declaration in Command                    LetCmd
              | Command ; Command                             SequentialCmd

4 AST Representation: Possible Tree Shapes
Example: remember the Mini-Triangle abstract syntax (excerpt below)
    Command ::= V-name := Expression                          AssignCmd
              | ...
[Tree diagram: an AssignCommand node with two children, V and E.]

5 AST Representation: Possible Tree Shapes
Example: remember the Mini-Triangle abstract syntax (excerpt below)
    Command ::= ...
              | Identifier ( Expression )                     CallCmd
              | ...
[Tree diagram: a CallCommand node with two children, Identifier and E; the Identifier node has a Spelling leaf.]

6 AST Representation: Possible Tree Shapes
Example: remember the Mini-Triangle abstract syntax (excerpt below)
    Command ::= ...
              | if Expression then Command else Command       IfCmd
              | ...
[Tree diagram: an IfCommand node with three children, E, C1 and C2.]

7 AST Representation: Java (or C++) Data Structures
Example: Java classes to represent Mini-Triangle ASTs
1) A common (abstract) superclass for all AST nodes
    public abstract class AST { ... }
2) A Java class for each "type" of node: abstract as well as concrete node types
    LHS ::= ...    Tag1
          | ...    Tag2
[Class diagram: AST (abstract) is the superclass of LHS (abstract), which is the superclass of Tag1, Tag2, … (concrete).]

8 Example: Mini-Triangle Command ASTs
    Command ::= V-name := Expression                          AssignCmd
              | Identifier ( Expression )                     CallCmd
              | if Expression then Command else Command       IfCmd
              | while Expression do Command                   WhileCmd
              | let Declaration in Command                    LetCmd
              | Command ; Command                             SequentialCmd

    public abstract class Command extends AST { ... }
    public class AssignCommand extends Command { ... }
    public class CallCommand extends Command { ... }
    public class IfCommand extends Command { ... }
    etc.

9 Example: Mini-Triangle Command ASTs
    Command ::= V-name := Expression                          AssignCmd
              | Identifier ( Expression )                     CallCmd
              | ...

    public class AssignCommand extends Command {
        public Vname V;         // variable on left side of :=
        public Expression E;    // expression on right side of :=
        ...
    }
    public class CallCommand extends Command {
        public Identifier I;    // procedure name
        public Expression E;    // actual parameter
        ...
    }
    ...

10 AST Terminal Nodes
    public abstract class Terminal extends AST {
        public String spelling;
        ...
    }
    public class Identifier extends Terminal { ... }
    public class IntegerLiteral extends Terminal { ... }
    public class Operator extends Terminal { ... }

11 AST Construction
Of course, every concrete AST class needs a constructor. Examples:
    public class AssignCommand extends Command {
        public Vname V;         // left side variable
        public Expression E;    // right side expression
        public AssignCommand (Vname V, Expression E) {
            this.V = V; this.E = E;
        }
        ...
    }
    public class Identifier extends Terminal {
        public Identifier (String spelling) {    // a constructor, so no "class" keyword
            this.spelling = spelling;
        }
        ...
    }
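
The class layout above can be tried out with a small self-contained sketch. It is only an illustration under stated assumptions: `SimpleVname`, `IntegerExpression` and `AstDemo` are hypothetical names that do not appear on the slides, introduced so that a complete tree for `x := 42` can be built by hand. The classes are package-private so the sketch fits in one file.

```java
// Minimal sketch of the AST class hierarchy from the slides.
abstract class AST { }

abstract class Terminal extends AST {
    public final String spelling;
    Terminal(String spelling) { this.spelling = spelling; }
}

class Identifier extends Terminal {
    Identifier(String spelling) { super(spelling); }
}

class IntegerLiteral extends Terminal {
    IntegerLiteral(String spelling) { super(spelling); }
}

abstract class Expression extends AST { }

// Hypothetical: wraps an integer literal so it can serve as an Expression.
class IntegerExpression extends Expression {
    public final IntegerLiteral IL;
    IntegerExpression(IntegerLiteral IL) { this.IL = IL; }
}

abstract class Vname extends AST { }

// Hypothetical: a variable name that consists of a single identifier.
class SimpleVname extends Vname {
    public final Identifier I;
    SimpleVname(Identifier I) { this.I = I; }
}

abstract class Command extends AST { }

class AssignCommand extends Command {
    public final Vname V;        // variable on the left side of :=
    public final Expression E;   // expression on the right side of :=
    AssignCommand(Vname V, Expression E) { this.V = V; this.E = E; }
}

class AstDemo {
    // Builds the tree for "x := 42" by calling the constructors directly.
    static AssignCommand build() {
        return new AssignCommand(
            new SimpleVname(new Identifier("x")),
            new IntegerExpression(new IntegerLiteral("42")));
    }

    public static void main(String[] args) {
        AssignCommand cmd = build();
        System.out.println(((SimpleVname) cmd.V).I.spelling + " := "
            + ((IntegerExpression) cmd.E).IL.spelling);
    }
}
```

The parser developed on the following slides makes exactly these constructor calls, but driven by the token stream instead of by hand.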

12 AST Construction
We will now show how to refine our recursive descent parser to actually construct an AST.
    N ::= X

    private N parseN( ) {    // note that the return type is N
        N theAST;
        parse X and simultaneously construct theAST
        return theAST;
    }

13 Example: Constructing Mini-Triangle ASTs
    Command ::= single-Command ( ; single-Command )*

    // old (recognizing only) version:
    private void parseCommand( ) {
        parseSingleCommand( );
        while (currentToken.kind == Token.SEMICOLON) {
            acceptIt( );
            parseSingleCommand( );
        }
    }

    // AST-generating version
    private Command parseCommand( ) {
        Command theAST;
        theAST = parseSingleCommand( );
        while (currentToken.kind == Token.SEMICOLON) {
            acceptIt( );
            Command extraCmd = parseSingleCommand( );
            theAST = new SequentialCommand (theAST, extraCmd);
        }
        return theAST;
    }
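
A runnable sketch may help show what tree the loop in the AST-generating version produces. It makes simplifying assumptions: tokens are plain strings, every single-Command is just one word, and the names `Cmd`, `SimpleCmd`, `SequentialCmd` and `SeqParser` are hypothetical stand-ins rather than the classes used in the course.

```java
import java.util.List;

abstract class Cmd { }

// Hypothetical stand-in for any single-Command; it only carries a name.
class SimpleCmd extends Cmd {
    final String name;
    SimpleCmd(String name) { this.name = name; }
    @Override public String toString() { return name; }
}

class SequentialCmd extends Cmd {
    final Cmd C1, C2;
    SequentialCmd(Cmd C1, Cmd C2) { this.C1 = C1; this.C2 = C2; }
    @Override public String toString() { return "Seq(" + C1 + ", " + C2 + ")"; }
}

class SeqParser {
    private final List<String> tokens;
    private int pos = 0;

    SeqParser(List<String> tokens) { this.tokens = tokens; }

    // Current token, or a pseudo end-of-text token once the input is used up.
    private String current() {
        return pos < tokens.size() ? tokens.get(pos) : "<eot>";
    }

    private Cmd parseSingleCommand() {
        String t = current();
        if (t.equals(";") || t.equals("<eot>"))
            throw new IllegalStateException("syntax error: command expected at token " + pos);
        pos++;                       // plays the role of acceptIt( )
        return new SimpleCmd(t);
    }

    // Command ::= single-Command ( ; single-Command )*
    Cmd parseCommand() {
        Cmd theAST = parseSingleCommand();
        while (current().equals(";")) {
            pos++;                   // acceptIt( ) for the semicolon
            Cmd extraCmd = parseSingleCommand();
            theAST = new SequentialCmd(theAST, extraCmd);
        }
        return theAST;
    }

    public static void main(String[] args) {
        Cmd ast = new SeqParser(List.of("a", ";", "b", ";", "c")).parseCommand();
        System.out.println(ast);     // prints Seq(Seq(a, b), c)
    }
}
```

Because each iteration wraps the tree built so far as the left child, a sequence of commands becomes a left-nested chain of sequential nodes.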

14 Example: Constructing Mini-Triangle ASTs
    single-Command ::= Identifier ( := Expression | ( Expression ) )
                     | if Expression then single-Command else single-Command
                     | while Expression do single-Command
                     | let Declaration in single-Command
                     | begin Command end

    private Command parseSingleCommand( ) {
        Command comAST;
        parse it and construct AST
        return comAST;
    }

15 Example: Constructing Mini-Triangle ASTs
    private Command parseSingleCommand( ) {
        Command comAST;
        switch (currentToken.kind) {
        case Token.IDENTIFIER:
            parse Identifier ( := Expression | ( Expression ) )
        case Token.IF:
            parse if Expression then single-Command else single-Command
        case Token.WHILE:
            parse while Expression do single-Command
        case Token.LET:
            parse let Declaration in single-Command
        case Token.BEGIN:
            parse begin Command end
        default:
            report syntax error
        }
        return comAST;
    }

16 Example: Constructing Mini-Triangle ASTs
    ...
    case Token.IDENTIFIER:
        // parse Identifier ( := Expression | ( Expression ) )
        Identifier idAST = parseIdentifier( );
        switch (currentToken.kind) {
        case Token.BECOMES: {    // braces give each case its own expAST
            acceptIt( );
            Expression expAST = parseExpression( );
            // AssignCommand (slide 11) takes a Vname, so idAST must first
            // be wrapped in a Vname node unless a matching constructor exists
            comAST = new AssignCommand (idAST, expAST);
            break;
        }
        case Token.LPAREN: {
            acceptIt( );
            Expression expAST = parseExpression( );
            comAST = new CallCommand (idAST, expAST);
            accept(Token.RPAREN);
            break;
        }
        default:
            report syntax error
        }
        break;
    ...

17 Example: Constructing Mini-Triangle ASTs
    ...
        break;
    case Token.IF:
        // parse if Expression then single-Command
        //       else single-Command
        acceptIt( );
        Expression expAST = parseExpression( );
        accept(Token.THEN);
        Command thenAST = parseSingleCommand( );
        accept(Token.ELSE);
        Command elseAST = parseSingleCommand( );
        comAST = new IfCommand (expAST, thenAST, elseAST);
        break;
    case Token.WHILE:
    ...

18 Example: Constructing Mini-Triangle ASTs
    ...
        break;
    case Token.BEGIN:
        // parse begin Command end
        acceptIt( );
        comAST = parseCommand( );
        accept(Token.END);
        break;
    default:
        report syntax error
    }
    return comAST;
    }

19 Syntax Analysis: Scanner
[Dataflow chart: Source Program (stream of characters) -> Scanner -> stream of "tokens" -> Parser -> Abstract Syntax Tree. Both the Scanner and the Parser produce error reports.]

20 Scanner
    public class Parser {
        private Token currentToken;

        private void accept (byte expectedKind) {
            if (currentToken.kind == expectedKind)
                currentToken = scanner.scan( );
            else
                report syntax error
        }

        private void acceptIt( ) {
            currentToken = scanner.scan( );
        }

        public void parse( ) { ... }
        ...
    }
Remember: the parser relies on scanner.scan( ), which we have not yet implemented.
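
To experiment with accept and acceptIt before a real scanner exists, the scanner can be replaced by a pre-built sequence of token kinds. The sketch below does that under assumptions of its own: `AcceptDemo`, its token-kind constants, and the tiny rule `Identifier ( Identifier )` are hypothetical and chosen only to keep the example short.

```java
import java.util.Iterator;
import java.util.List;

class AcceptDemo {
    static final byte IDENTIFIER = 0, LPAREN = 1, RPAREN = 2, EOT = 3;

    private final Iterator<Byte> scanner;  // stand-in for the real scanner
    private byte currentKind;              // kind of the current token

    AcceptDemo(List<Byte> kinds) {
        scanner = kinds.iterator();
        currentKind = next();
    }

    // Plays the role of scanner.scan( ); yields EOT once the input is used up.
    private byte next() {
        return scanner.hasNext() ? scanner.next() : EOT;
    }

    // Checks the current token against the expected kind, then advances.
    private void accept(byte expectedKind) {
        if (currentKind == expectedKind)
            currentKind = next();
        else
            throw new IllegalStateException(
                "syntax error: expected kind " + expectedKind + ", found " + currentKind);
    }

    // Advances without checking; use only when the kind is already known.
    private void acceptIt() {
        currentKind = next();
    }

    // Recognizes: Identifier ( Identifier ) eot
    boolean parseCall() {
        if (currentKind != IDENTIFIER) return false;
        acceptIt();                        // kind was just tested, so no check needed
        try {
            accept(LPAREN);
            accept(IDENTIFIER);
            accept(RPAREN);
            accept(EOT);
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        List<Byte> ok = List.of(IDENTIFIER, LPAREN, IDENTIFIER, RPAREN);
        System.out.println(new AcceptDemo(ok).parseCall());
    }
}
```

The design point is the split between the two methods: acceptIt is safe only right after the caller has inspected the token kind, while accept performs the check itself.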

21 Steps for Developing a Scanner
1) Express the "lexical" grammar in EBNF (do necessary transformations)
2) Implement the scanner based on this grammar (details explained later)
3) Modify the scanner to keep track of the spelling and kind of the currently scanned token
To save some time we'll do steps 2 and 3 together.

22 Developing a Scanner
Express the "lexical" grammar in EBNF
    Token           ::= Identifier | Integer-Literal | Operator
                      | ; | : | := | ~ | ( | ) | eot
    Identifier      ::= Letter (Letter | Digit)*
    Integer-Literal ::= Digit Digit*
    Operator        ::= + | - | * | / | < | > | =
    Separator       ::= Comment | space | eol
    Comment         ::= ! Graphic* eol
Next perform substitution and left factorization...
    Token     ::= Letter (Letter | Digit)*
                | Digit Digit*
                | + | - | * | / | < | > | =
                | ; | : (= | ε) | ~ | ( | ) | eot
    Separator ::= ! Graphic* eol | space | eol

23 Developing a Scanner
Now implement the scanner
    public class Scanner {
        private char currentChar;
        private StringBuffer currentSpelling;
        private byte currentKind;

        private void take (char expectedChar) { ... }   // analogous to accept
        private void takeIt( ) { ... }                  // analogous to acceptIt

        // other private auxiliary methods and scanning methods go here

        public Token scan( ) { ... }
    }

24 Developing a Scanner
The scanner will return instances of Token
    public class Token {
        byte kind;
        String spelling;

        final static byte
            IDENTIFIER = 0, INTLITERAL = 1, OPERATOR = 2,
            BEGIN = 3, CONST = 4, ... ...
        // in C++ can improve this by using an enum type

        public Token (byte kind, String spelling) {
            this.kind = kind;
            this.spelling = spelling;
            if spelling matches a keyword then change kind automatically
                (e.g. "begin" => 3, "const" => 4, ...)
        }
        ...
    }
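
One possible way to realize the "change kind automatically" step is a lookup table from keyword spellings to token kinds. The following is a sketch, not the course's implementation: the `KEYWORDS` map is an assumption, and it lists only the two keywords named on the slide.

```java
import java.util.Map;

class Token {
    static final byte IDENTIFIER = 0, INTLITERAL = 1, OPERATOR = 2,
                      BEGIN = 3, CONST = 4;

    // Hypothetical keyword table; a full scanner would list every keyword.
    private static final Map<String, Byte> KEYWORDS =
        Map.of("begin", BEGIN, "const", CONST);

    final byte kind;
    final String spelling;

    Token(byte kind, String spelling) {
        // Anything scanned as an identifier may turn out to be a keyword.
        if (kind == IDENTIFIER && KEYWORDS.containsKey(spelling))
            kind = KEYWORDS.get(spelling);
        this.kind = kind;
        this.spelling = spelling;
    }

    public static void main(String[] args) {
        System.out.println(new Token(IDENTIFIER, "begin").kind);  // prints 3
        System.out.println(new Token(IDENTIFIER, "x").kind);      // prints 0
    }
}
```

Doing the check in the constructor keeps the scanner simple: it scans keywords exactly as it scans identifiers and never needs to know the keyword list.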

25 Developing a Scanner
    public class Scanner {
        private char currentChar = get first source char;
        private StringBuffer currentSpelling;
        private byte currentKind;

        private void take (char expectedChar) {    // analogous to accept
            if (currentChar == expectedChar) {
                currentSpelling.append (currentChar);
                currentChar = get next source char;
            }
            else
                report lexical error
        }

        private void takeIt( ) {                   // analogous to acceptIt
            currentSpelling.append (currentChar);
            currentChar = get next source char;
        }
        ...

26 Developing a Scanner
        ...
        public Token scan( ) {
            // get rid of potential separators before scanning a token
            while ( (currentChar == '!')
                    || (currentChar == ' ')
                    || (currentChar == '\n') )
                scanSeparator( );
            currentSpelling = new StringBuffer( );
            currentKind = scanToken( );
            return new Token (currentKind, currentSpelling.toString( ));
        }

        private void scanSeparator( ) { ... }
        private byte scanToken( ) { ... }
        ...
scanSeparator and scanToken are developed in much the same way as the parsing methods.

27 Developing a Scanner
    Token ::= Letter (Letter | Digit)*
            | Digit Digit*
            | + | - | * | / | < | > | =
            | ; | : (= | ε) | ~ | ( | ) | eot

    private byte scanToken( ) {
        switch (currentChar) {
        case 'a': case 'b': ... case 'z':
        case 'A': case 'B': ... case 'Z':
            scan Letter (Letter | Digit)*
            return Token.IDENTIFIER;
        case '0': ... case '9':
            scan Digit Digit*
            return Token.INTLITERAL;
        case '+': case '-': ... : case '=':
            takeIt( );
            return Token.OPERATOR;
        ...etc...
    }

28 Developing a Scanner
Look at the identifier case in more detail. The pseudocode is refined step by step; only the body of the identifier case changes.
    ...
        return ...
    case 'a': case 'b': ... case 'z':
    case 'A': case 'B': ... case 'Z':
        scan Letter (Letter | Digit)*
        return Token.IDENTIFIER;
    case '0': ... case '9':
    ...
Step 1: split the sequence into its two parts.
        scan Letter
        scan (Letter | Digit)*
        return Token.IDENTIFIER;
Step 2: the first character is already known to be a letter, so just take it.
        takeIt( );
        scan (Letter | Digit)*
        return Token.IDENTIFIER;
Step 3: turn the repetition into a while loop.
        takeIt( );
        while (isLetter(currentChar) || isDigit(currentChar) )
            scan (Letter | Digit)
        return Token.IDENTIFIER;
Step 4: inside the loop the character is known to be a letter or digit, so just take it.
        takeIt( );
        while (isLetter(currentChar) || isDigit(currentChar) )
            takeIt( );
        return Token.IDENTIFIER;
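
Putting the pieces together, the refined identifier case can be run inside a small scanner sketch. Several simplifications are assumptions of this sketch and not part of the slides: the source is a `String`, the end of text is marked by a NUL character, `scan()` returns a `"kind:spelling"` string instead of a `Token`, comments starting with `!` are not handled, and only a handful of operators are recognized. The class name `MiniScanner` is hypothetical.

```java
class MiniScanner {
    static final byte IDENTIFIER = 0, INTLITERAL = 1, OPERATOR = 2, EOT = 3;
    private static final char EOT_CHAR = '\u0000';  // hypothetical end-of-text marker

    private final String source;
    private int pos = 0;
    private char currentChar;
    private StringBuilder currentSpelling;

    MiniScanner(String source) {
        this.source = source;
        currentChar = source.isEmpty() ? EOT_CHAR : source.charAt(0);
    }

    // Moves to the next source character without recording the current one.
    private void advance() {
        pos++;
        currentChar = pos < source.length() ? source.charAt(pos) : EOT_CHAR;
    }

    // Analogous to acceptIt: record the current character, then move on.
    private void takeIt() {
        currentSpelling.append(currentChar);
        advance();
    }

    private static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    private byte scanToken() {
        if (isLetter(currentChar)) {           // Letter (Letter | Digit)*
            takeIt();
            while (isLetter(currentChar) || isDigit(currentChar))
                takeIt();
            return IDENTIFIER;
        }
        if (isDigit(currentChar)) {            // Digit Digit*
            takeIt();
            while (isDigit(currentChar))
                takeIt();
            return INTLITERAL;
        }
        switch (currentChar) {
            case '+': case '-': case '*': case '/': case '=':
                takeIt();
                return OPERATOR;
            case EOT_CHAR:
                return EOT;
            default:
                throw new IllegalStateException("lexical error at position " + pos);
        }
    }

    // Skips spaces and newlines, then scans one token as "kind:spelling".
    String scan() {
        while (currentChar == ' ' || currentChar == '\n')
            advance();
        currentSpelling = new StringBuilder();
        byte kind = scanToken();
        return kind + ":" + currentSpelling;
    }

    public static void main(String[] args) {
        MiniScanner s = new MiniScanner("x1 + 42");
        System.out.println(s.scan());  // prints 0:x1
        System.out.println(s.scan());  // prints 2:+
        System.out.println(s.scan());  // prints 1:42
    }
}
```

The `if` tests on `isLetter` and `isDigit` replace the long lists of `case` labels on the slide; the behaviour for ASCII letters and digits is the same.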

