Published by Barrie Fields. Modified over 8 years ago.
0
CSC 415: Translators and Compilers
Dr. Chuck Lillie
1
Course Outline Translators and Compilers Major Programming Project
Language Processors Compilation Syntactic Analysis Contextual Analysis Run-Time Organization Code Generation Interpretation Major Programming Project Project Definition and Planning Implementation Weekly Status Reports Project Presentation
2
Project Implement a Compiler for the Programming Language Triangle
Appendix B: Informal Specification of the Programming Language Triangle Appendix D: Class Diagrams for the Triangle Compiler Present Project Plan What and How Weekly Status Reports Work accomplished during the reporting period Deliverable progress, as a percentage of completion Problem areas Planned activities for the next reporting period
3
Chapter 1: Introduction to Programming Languages
Programming Language: A formal notation for expressing algorithms. Programming Language Processors: Tools to enter, edit, translate, and interpret programs on machines. Machine Code: Basic machine instructions Keep track of exact address of each data item and each instruction Encode each instruction as a bit string Assembly Language: Symbolic names for operations, registers, and addresses.
4
Programming Languages
High Level Languages: Notation similar to familiar mathematical notation Expressions: +, -, *, / Data Types: truth variables, characters, integers, records, arrays Control Structures: if, case, while, for Declarations: constant values, variables, procedures, functions, types Abstraction: separates what is to be performed from how it is to be performed Encapsulation (or data abstraction): group together related declarations and selectively hide some
5
Programming Languages
Any system that manipulates programs expressed in some particular programming language Editors: enter, modify, and save program text Translators and Compilers: Translates text from one language to another. Compiler translates a program from a high-level language to a low-level language, preparing it to be run on a machine Checks program for syntactic and contextual errors Interpreters: Runs program without compilation Command languages Database query languages
6
Programming Languages Specifications
Syntax Form of the program Defines symbols How phrases are composed Contextual constraints Scope: determine scope of each declaration Type: Semantics Meaning of the program
7
Representation Syntax Backus-Naur Form (BNF): context-free grammar
Terminal symbols (>=, while, ;) Non-terminal symbols (Program, Command, Expression, Declaration) Start symbol (Program) Production rules (defines how phrases are composed from terminals and sub-phrases) N::=a|b|…. Syntax Tree Used to define language in terms of strings and terminal symbols
8
Representation Semantics Abstract Syntax Abstract Syntax Tree
Concentrate on phrase structure alone Abstract Syntax Tree
9
Contextual Constraints
Scope Binding Static: determined by language processor Dynamic: determined at run-time Type Statically: language processor can detect all errors Dynamically: type errors cannot be detected until run-time Will assume static binding and statically typed
10
Semantics Concerned with meaning of program
Behavior when run Usually specified informally Declarative sentences Could include side effects Correspond to production rules
11
Chapter 2: Language Processors
Translators and Compilers Interpreters Real and Abstract Machines Interpretive Compilers Portable Compilers Bootstrapping Case Study: The Triangle Language Processor
12
Translators & Compilers
Translator: a program that accepts any text expressed in one language (the translator’s source language), and generates a semantically-equivalent text expressed in another language (its target language) Chinese-into-English Java-into-C Java-into-x86 X86 assembler
13
Translators & Compilers
Assembler: translates from an assembly language into the corresponding machine code Generates one machine code instruction per source instruction Compiler: translates from a high-level language into a low-level language Generates several machine-code instructions per source command.
14
Translators & Compilers
Disassembler: translates a machine code into the corresponding assembly language Decompiler: translates a low-level language into a high-level language Question: Why would you want a disassembler or decompiler?
15
Translators & Compilers
Source Program: the source language text
Object Program: the target language text
Compiler stages: Source Program, then Syntax Check, then Context Constraints (Semantic Analysis), then Generate Object Code, yielding the Object Program
The object program is semantically equivalent to the source program, if the source program is well-formed
16
Translators & Compilers
Why would you want to do: Java-into-C translator C-into-Java translator Assembly-language-into-Pascal decompiler
17
Translators & Compilers
Program tombstone: P = Program Name, L = Implementation Language
Machine tombstone: M = Target Machine
Running program P (expressed in L) on machine M: for this to work, L must equal M, that is, the implementation language must be the same as the machine language
Translator tombstone: S = Source Language, T = Target Language, L = Translator’s Implementation Language
An S-into-T translator is itself a program that runs on machine L
18
Translators & Compilers
Translating a source program P Expressed in language T, Using an S-into-T translator Running on machine M
19
Translators & Compilers
Translating a source program sort, expressed in language Java, using a Java-into-x86 translator running on an x86 machine. The object program is running on the same machine as the compiler
20
Translators & Compilers
Translating a source program sort, expressed in language Java, using a Java-into-PPC translator running on an x86 machine, then downloaded to a PPC machine. Cross Compiler: The object program is running on a different machine than the compiler
21
Translators & Compilers
Translating a source program sort, expressed in language Java, using a Java-into-C translator running on an x86 machine, then translating the C program, using a C-into-x86 compiler running on an x86 machine, into an x86 object program. Two-stage Compiler: The source program is translated to another language before being translated into the object program
22
Translators & Compilers
Translator Rules Can run on machine M only if it is expressed in machine code M Source program must be expressed in translator’s source language S Object program is expressed in the translator’s target language T Object program is semantically equivalent to the source program
23
Interpreters Accepts any program (source program) expressed in a particular language (source language) and runs that source program immediately Does not translate the source program into object code prior to execution
24
Interpreters
Interpreter cycle on the Source Program: Fetch Instruction, Analyze Instruction, Execute Instruction, repeated until Program Complete. Source program starts to run as soon as the first instruction is analyzed
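The fetch, analyze, execute cycle can be sketched as a tiny interpreter. This is only an illustration under stated assumptions: the three-instruction stack language (`push`, `add`, `print`) and the function name `run` are hypothetical and do not come from the course material.

```python
def run(source):
    """Interpret a toy stack language one instruction at a time.

    The instruction set (push N, add, print) is a hypothetical example.
    Returns the list of printed values instead of writing to stdout.
    """
    stack, output = [], []
    lines = source.splitlines()
    pc = 0                                  # index of the next instruction
    while pc < len(lines):                  # program complete when none remain
        instruction = lines[pc]             # fetch
        pc += 1
        parts = instruction.split()         # analyze
        if not parts:
            continue                        # blank line: nothing to execute
        op, args = parts[0], parts[1:]
        if op == "push":                    # execute
            stack.append(int(args[0]))
        elif op == "add":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "print":
            output.append(stack[-1])
        else:
            raise ValueError(f"unknown instruction: {instruction!r}")
    return output
```

Note that no object program is ever produced: each instruction is analyzed again every time it is reached, which is where the speed penalty of interpretation comes from.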
25
Interpreters When to Use Interpretation Disadvantages
Interactive mode – want to see results of instruction before entering next instruction Only use program once Each instruction expected to be executed only once Instructions have simple formats Disadvantages Slow: up to 100 times slower than in machine code
26
Interpreters Examples Basic Lisp Unix Command Language (shell) SQL
27
Interpreters S interpreter expressed in language L
Program P expressed in language S, using an S interpreter, running on machine M. Example: program graph written in Basic, running on a Basic interpreter executed on an x86 machine
28
Real and Abstract Machines
Hardware emulation: Using software to execute one set of machine code on another machine Can measure everything about the new machine except its speed Abstract machine: emulator Real machine: actual hardware An abstract machine is functionally equivalent to a real machine if they both implement the same language L
29
Real and Abstract Machines
A New Machine Instruction (nmi) interpreter is written in C. The nmi interpreter is translated into machine code M using the C compiler (a compiler that translates a C program into M machine code, running on M). The result is an nmi interpreter expressed in machine code M, which can run a program P expressed in nmi code on machine M
30
Interpretive Compilers
Combination of compiler and interpreter Translate source program into an intermediate language It is intermediate in level between the source language and ordinary machine code Its instructions have simple formats, and therefore can be analyzed easily and quickly Translation from the source language into the intermediate language is easy and fast An interpretive compiler combines fast compilation with tolerable running speed
31
Interpretive Compilers
Java-into-JVM translator running on machine M. JVM-code interpreter running on machine M. A Java program P is first translated into JVM-code, and then the JVM-code object program is interpreted
32
Portable Compilers A program is portable if it can be compiled and run on any machine, without change A portable program is more valuable than an unportable one, because its development cost can be spread over more copies Portability is measured by the proportion of code that remains unchanged when it is moved to a dissimilar machine Language affects portability Assembly language: 0% portable High level language: approaches 100% portability
33
Portable Compilers Language Processors
Valuable and widely used programs Typically written in high-level language Pascal, C, Java Part of language processor is machine dependent Code generation part Language processor is only about 50% portable Compiler that generates intermediate code is more portable than a compiler that generates machine code
34
Portable Compilers Rewrite interpreter in C
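Portable compiler kit: a Java-into-JVM translator expressed in Java, the same Java-into-JVM translator expressed in JVM code, and a JVM interpreter expressed in Java. Rewrite the JVM interpreter in C and compile it with the C-into-M compiler to obtain a JVM interpreter expressed in M. The Java-into-JVM translator (expressed in JVM code) then runs on that interpreter on machine M, and a Java program P is translated into JVM code and interpreted on M. Note: a C-into-M compiler exists; rewrite the JVM interpreter from Java to C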
35
Bootstrapping The language processor is used to process itself
Implementation language is the source language Bootstrapping a portable compiler A portable compiler can be bootstrapped to make a true compiler – one that generates machine code – by writing an intermediate-language-into-machine-code translator Full bootstrap Writing the compiler in itself Using the latest version to upgrade the next version Half bootstrap Compiler expressed in itself but targeted for another machine Bootstrapping to improve efficiency Upgrade the compiler to optimize code generation as well as to improve compile efficiency
36
Bootstrapping Bootstrap an interpretive compiler to generate machine code
First, write a JVM-code-into-M translator in Java. Next, compile the translator using the existing interpretive compiler. Use the translator to translate itself, giving a JVM-code-into-M translator expressed in M. Translate the Java-into-JVM-code translator into machine code. The result is a two-stage Java-into-M compiler
37
Bootstrapping Full bootstrap
v1: Write Ada-S compiler (Ada-S into M) in C. v2: Convert the C version of Ada-S into Ada-S version of Ada-S, and compile it with v1. v3: Extend Ada-S compiler to (full) Ada compiler (Ada into M, expressed in Ada-S), and compile it with v2
38
Bootstrapping Half bootstrap
[Tombstone diagram: an Ada compiler on machine HM is used to obtain an Ada compiler for machine TM, on which program P then runs]
39
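Bootstrapping Bootstrap to improve efficiency
[Tombstone diagram: version v1 of the Ada compiler generates code Ms; version v2, written in Ada and generating code Mf, is compiled first with v1 and then with itself, and program P is then compiled with v2]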
40
Chapter 3: Compilation Phases Passes Case Study: The Triangle Compiler
Syntactic Analysis Contextual Analysis Code Generation Passes Multi-pass Compilation One-pass Compilation Compiler Design Issues Case Study: The Triangle Compiler
41
Phases Syntactic Analysis Contextual Analysis Code Generation
The source program is parsed to check whether it conforms to the source language’s syntax, and to determine its phrase structure Contextual Analysis The parsed program is analyzed to check whether it conforms to the source language's contextual constraints Code Generation The checked program is translated to an object program, in accordance with the semantics of the source and target languages
42
Phases Syntactic Analysis Contextual Analysis Code Generation
Source Program, then Syntactic Analysis (may produce an Error Report), yielding the AST, then Contextual Analysis (may produce an Error Report), yielding the Decorated AST, then Code Generation, yielding the Object Program
43
Syntactic Analysis To determine the source program’s phrase structure
Parsing Contextual analysis and code generation must know how the program is composed Commands, expressions, declarations, … Check for conformance to the source language’s syntax Construct suitable representation of its phrase structure (AST) AST Terminal nodes corresponding to identifiers, literals, and operators Sub trees representing the phrases of the source program Blanks and comments not in AST (no meaning) Punctuation and brackets not in AST (only separate and enclose)
44
Contextual Analysis Analyzes the parsed program Produces decorated AST
Scope rules Type rules Produces decorated AST AST with information gathered during contextual analysis Each applied occurrence of an identifier is linked to the corresponding declaration Each expression is decorated by its type T
45
Code Generation The final translation of the checked program to an object program After syntactic and contextual analysis is completed Treatment of identifiers Constants Binds identifier to value Replace each occurrence of identifier with value Variables Binds identifier to some memory address Replace each occurrence of identifier by address Target language Assembly language Machine code
46
Passes Multi-pass compilation One-pass compilation
Traverses the program or AST several times Compiler Driver Syntactic Analyzer Contextual Analyzer Code Generator One-pass compilation Single traverse of program Contextual analysis and code generation are performed ‘on the fly’ during syntactic analysis Compiler Driver Syntactic Analyzer Contextual Analyzer Code Generator
47
Compiler Design Issues
Speed Compiler run time Space Storage: size of compiler + files generated Modularity Multi-pass compiler more modular than one-pass compiler Flexibility Multi-pass compiler is more flexible because it generates an AST that can be traversed in any order by the other phases Semantics-preserving transformations To optimize code – must have multi-pass compiler Source language properties May restrict compiler choice – some language constructs may require multi-pass compilers
48
Chapter 4: Syntactic Analysis
Sub-phases of Syntactic Analysis Grammars Revisited Parsing Abstract Syntax Trees Scanning Case Study: Syntactic Analysis in the Triangle Compiler
49
Structure of a Compiler
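Source code, then Lexical Analyzer, yielding tokens, then Parser & Semantic Analyzer, yielding a parse tree, then Intermediate Code Generation, yielding an intermediate representation, then Optimization, yielding an intermediate representation, then Assembly Code Generation, yielding Assembly code. The Symbol Table is shared by the phases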
50
Syntactic Analysis Main function
Parse source program to discover its phrase structure Recursive-descent parsing Constructing an AST Scanning to group characters into tokens
51
Sub-phases of Syntactic Analysis
Scanning (or lexical analysis) Source program transformed to a stream of tokens Identifiers Literals Operators Keywords Punctuation Comments and blank spaces discarded Parsing To determine the source program’s phrase structure Source program is input as a stream of tokens (from the Scanner) Treats each token as a terminal symbol Representation of phrase structure AST
52
Lexical Analysis – A Simple Example
Scan the file character by character and group characters into words and punctuation (tokens), remove white space and comments
Some tokens for this example: main ( ) { int a , b , c ;
main() {
    int a, b, c;
    char number[5];
    /* get user inputs */
    a = atoi(gets(number));
    b = atoi(gets(number));
    /* calculate value for c */
    c = 2*(a+b) + a*(a+b);
    /* print results */
    printf("%d", c);
}
53
Creating Tokens – Mini-Triangle Example
Source text: let var y: Integer in !new year y := y+1
Buffer, then Input Converter, yielding a character string (S = space): l e t S v a r S y : S I n t e g e r S i n …
Scanner, yielding tokens: let, var, Ident. (y), colon (:), Ident. (Integer), in, Ident. (y), becomes (:=), Ident. (y), op. (+), Intlit. (1), eot
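The scanner step in this example can be sketched in a few lines. This is a minimal illustration, not the Triangle compiler's scanner: the function name `scan`, the token kind strings, and the reduced keyword set are assumptions made for the example, and only the characters needed for the Mini-Triangle fragment above are handled.

```python
KEYWORDS = {"let", "var", "in"}   # assumed subset, enough for the example

def scan(text):
    """Group characters into (kind, spelling) tokens; drop blanks and comments."""
    tokens, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch == "!":                       # comment runs to end of line
            while i < len(text) and text[i] != "\n":
                i += 1
        elif ch.isalpha():                    # identifier or keyword
            start = i
            while i < len(text) and text[i].isalnum():
                i += 1
            word = text[start:i]
            tokens.append((word if word in KEYWORDS else "Ident.", word))
        elif ch.isdigit():                    # integer literal
            start = i
            while i < len(text) and text[i].isdigit():
                i += 1
            tokens.append(("Intlit.", text[start:i]))
        elif text.startswith(":=", i):        # try the two-character token first
            tokens.append(("becomes", ":="))
            i += 2
        elif ch == ":":
            tokens.append(("colon", ":"))
            i += 1
        elif ch in "+-*/":
            tokens.append(("op.", ch))
            i += 1
        else:
            raise ValueError(f"illegal character {ch!r}")
    tokens.append(("eot", ""))                # end-of-text marker
    return tokens
```

Calling `scan("let var y: Integer in !new year\n y := y+1")` yields the same sequence of token kinds as the slide: let, var, Ident., colon, Ident., in, Ident., becomes, Ident., op., Intlit., eot.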
54
Tokens in Triangle // literals, identifiers, operators...
INTLITERAL = 0, "<int>", CHARLITERAL = 1, "<char>", IDENTIFIER = 2, "<identifier>", OPERATOR = 3, "<operator>",
// reserved words - must be in alphabetical order...
ARRAY = 4, "array", BEGIN = 5, "begin", CONST = 6, "const", DO = 7, "do", ELSE = 8, "else", END = 9, "end", FUNC = 10, "func", IF = 11, "if", IN = 12, "in", LET = 13, "let", OF = 14, "of", PROC = 15, "proc", RECORD = 16, "record", THEN = 17, "then", TYPE = 18, "type", VAR = 19, "var", WHILE = 20, "while",
// punctuation...
DOT = 21, ".", COLON = 22, ":", SEMICOLON = 23, ";", COMMA = 24, ",", BECOMES = 25, ":=", IS = 26, "~",
// brackets...
LPAREN = 27, "(", RPAREN = 28, ")", LBRACKET = 29, "[", RBRACKET = 30, "]", LCURLY = 31, "{", RCURLY = 32, "}",
// special tokens...
EOT = 33, "", ERROR = 34, "<error>"
55
Grammars Revisited Context free grammars
Generates a set of sentences Each sentence is a string of terminal symbols An unambiguous sentence has a unique phrase structure embodied in its syntax tree Develop parsers from context-free grammars
56
Regular Expressions A regular expression (RE) is a convenient notation for expressing a set of strings of terminal symbols Main features ‘|’ separates alternatives ‘*’ indicates that the previous item may be represented zero or more times ‘(‘ and ‘)’ are grouping parentheses
57
Regular Expression Basics
ε The empty string: a special string of length 0
Regular expression operations
| separates alternatives
* indicates that the previous item may be represented zero or more times (repetition)
( and ) are grouping parentheses
58
Regular Expression Basics
Algebraic Properties
| is commutative and associative: r|s = s|r and r|(s|t) = (r|s)|t
Concatenation is associative: (rs)t = r(st)
Concatenation distributes over |: r(s|t) = rs|rt and (s|t)r = sr|tr
ε is the identity for concatenation: εr = r and rε = r
* is idempotent: r** = r*
r* = (r|ε)*
59
Regular Expression Basics
Common Extensions
r+ : one or more of expression r, same as rr*
r^k : k repetitions of r, for example r^3 = rrr
~r : the characters not in the expression r, for example ~[\t\n]
r-z : range of characters, for example [0-9a-z]
r? : zero or one copy of expression r (used for fields of an expression that are optional)
60
Regular Expression Example
Regular Expression for Representing Months Examples of legal inputs January represented as 1 or 01 October represented as 10 First Try: [0|1|ε][0-9] Matches all legal inputs? Yes 1, 2, 3, …, 10, 11, 12, 01, 02, …, 09 Matches any illegal inputs? Yes 0, 00, 18
61
Regular Expression Example
Regular Expression for Representing Months Examples of legal inputs January represented as 1 or 01 October represented as 10 Second Try: [1-9]|(0[1-9])|(1[0-2]) Matches all legal inputs? Yes 1, 2, 3, …, 10, 11, 12, 01, 02, …, 09 Matches any illegal inputs? No
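Both attempts at the month expression can be checked mechanically. The sketch below uses Python's standard `re` module; the slide notation is translated into Python regex syntax (an optional `[01]` stands in for `[0|1|ε]`), and the names `FIRST_TRY`, `SECOND_TRY`, and `matches` are made up for this example.

```python
import re

# First try, in Python syntax: an optional 0 or 1, then any digit.
FIRST_TRY = re.compile(r"[01]?[0-9]")
# Second try: 1-9, or 01-09, or 10-12.
SECOND_TRY = re.compile(r"[1-9]|0[1-9]|1[0-2]")

def matches(pattern, text):
    """True only if the whole of text matches the pattern."""
    return pattern.fullmatch(text) is not None

# All legal spellings: 1..12 and 01..09.
legal = [str(m) for m in range(1, 13)] + [f"{m:02d}" for m in range(1, 10)]
# The illegal inputs listed on the slide.
illegal = ["0", "00", "18"]

first_try_accepts_illegal = [s for s in illegal if matches(FIRST_TRY, s)]
second_try_accepts_illegal = [s for s in illegal if matches(SECOND_TRY, s)]
```

Using `fullmatch` rather than `match` matters here: the scanner question is whether the entire input is a month, not whether it merely starts with one.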
62
Regular Expression Example
Regular Expression for Floating Point Numbers
Examples of legal inputs 1.0, 0.2, , -1.0, 2.7e8, 1.0E-6
Assume that a 0 is required before numbers less than 1, and that extra leading zeros are not prevented, so numbers such as 0011 or are legal
Building the regular expression
Assume digit ::= 0|1|2|3|4|5|6|7|8|9
Handle simple decimals such as 1.0, 0.2, : digit+.digit+
Add an optional sign (only minus, no plus): (-|ε)digit+.digit+ or -?digit+.digit+
63
Regular Expression Example
Regular Expression for Floating Point Numbers (cont.)
Building the regular expression (cont.)
Format for the exponent: (E|e)(+|-)?(digit+)
Adding it as an optional expression to the decimal part: (-|ε)digit+.digit+((E|e)(+|-)?(digit+))?
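The finished expression translates almost symbol for symbol into Python's `re` syntax. In this sketch the names `DIGITS`, `FLOAT`, and `is_float` are illustrative; the only changes from the slide notation are that the literal decimal point is escaped and the alternations of single characters are written as character classes.

```python
import re

DIGITS = r"[0-9]+"                     # digit+
# -? digit+ . digit+ ( (E|e) (+|-)? digit+ )?
FLOAT = re.compile(rf"-?{DIGITS}\.{DIGITS}([Ee][+-]?{DIGITS})?")

def is_float(text):
    """True if the whole of text is a floating point number per the slide's rules."""
    return FLOAT.fullmatch(text) is not None
```

As the slide assumes, a digit is required on both sides of the decimal point, a leading plus sign is rejected, and extra leading zeros are accepted.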
64
Extended BNF Extended BNF (EBNF) Combination of BNF and RE
N::=X, where N is a nonterminal symbol and X is an extended RE, i.e., an RE constructed from both terminal and nonterminal symbols EBNF Right hand side may use |, *, (, ) Right hand side may contain both terminal and nonterminal symbols
65
Example EBNF
Expression ::= primary-Expression (Operator primary-Expression)*
primary-Expression ::= Identifier | ( Expression )
Identifier ::= a|b|c|d|e
Operator ::= +|-|*|/
Generates:
e
a + b
a - b - c
a + (b * c)
a + (b + c) / d
a - (b - (c - (d - e)))
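An EBNF grammar like this one maps directly onto code: each nonterminal becomes a function, `|` becomes a choice on the next token, and `( ... )*` becomes a loop. The recognizer below is a sketch under simplifying assumptions: every token is a single character, and the function names (`recognize`, `parse_expression`, `parse_primary`) are invented for the example.

```python
IDENTIFIERS = set("abcde")     # Identifier ::= a|b|c|d|e
OPERATORS = set("+-*/")        # Operator ::= +|-|*|/

def recognize(text):
    """Return True if text is a sentence generated by Expression."""
    tokens = [c for c in text if not c.isspace()]
    pos = 0

    def parse_primary():
        # primary-Expression ::= Identifier | ( Expression )
        nonlocal pos
        if pos < len(tokens) and tokens[pos] in IDENTIFIERS:
            pos += 1
        elif pos < len(tokens) and tokens[pos] == "(":
            pos += 1
            parse_expression()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise SyntaxError("expected ')'")
            pos += 1
        else:
            raise SyntaxError("expected identifier or '('")

    def parse_expression():
        # Expression ::= primary-Expression (Operator primary-Expression)*
        nonlocal pos
        parse_primary()
        while pos < len(tokens) and tokens[pos] in OPERATORS:   # the ( ... )* part
            pos += 1
            parse_primary()

    try:
        parse_expression()
    except SyntaxError:
        return False
    return pos == len(tokens)          # all input must be consumed
```

This only answers the recognition question (is the string a sentence?); it builds no syntax tree.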
66
Grammar Transformations
Left Factorization
XY | XZ is equivalent to X(Y | Z)
single-Command ::= V-name := Expression | if Expression then single-Command | if Expression then single-Command else single-Command
becomes
single-Command ::= V-name := Expression | if Expression then single-Command (ε | else single-Command)
67
Grammar Transformations
Elimination of left recursion N::= X | NY is equivalent to N::=X(Y)* Identifier ::= Letter | Identifier Letter | Identifier Digit Identifier ::= Letter | Identifier (Letter | Digit) Identifier ::= Letter(Letter | Digit)*
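One reason to eliminate left recursion is that the iterative form can be coded as a plain loop, whereas a function for the left-recursive rule would call itself before consuming any input. A minimal sketch of the transformed Identifier rule follows; the function name `is_identifier` is an assumption, and Python's `str.isalpha` and `str.isdigit` stand in for Letter and Digit.

```python
def is_identifier(text):
    """Recognize Identifier ::= Letter (Letter | Digit)* with a simple loop."""
    if not text or not text[0].isalpha():
        return False                       # must begin with a Letter
    for ch in text[1:]:                    # (Letter | Digit)*
        if not (ch.isalpha() or ch.isdigit()):
            return False
    return True
```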
68
Grammar Transformations
Substitution of nonterminal symbols
Given N::=X, we can substitute each occurrence of N with X iff N::=X is nonrecursive and is the only production rule for N
single-Command ::= for Control-Variable := Expression To-or-Downto Expression do single-Command | …
Control-Variable ::= Identifier
To-or-Downto ::= to | downto
becomes
single-Command ::= for Identifier := Expression (to|downto) Expression do single-Command | …
69
Scanning (Lexical Analysis)
The purpose of scanning is to recognize tokens in the source program. Or, to group input characters (the source program text) into tokens. Difference between parsing and scanning: Parsing groups terminal symbols, which are tokens, into larger phrases such as expressions and commands and analyzes the tokens for correctness and structure Scanning groups individual characters into tokens
72
What Does a Scanner Do? Handle keywords (reserved words)
Recognizes identifiers and keywords Match explicitly Write regular expression for each keyword Identifier is any alpha numeric string which is not a keyword Match as an identifier, perform lookup No special regular expressions for keywords When an identifier is found, perform lookup into preloaded keyword table How does Triangle handle keywords? Discuss in terms of efficiency and ease to code.
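The two keyword strategies can be compared side by side. In this sketch the keyword list is an illustrative subset (not Triangle's), and the names `classify_explicit` and `classify_lookup` are invented; both functions classify a word that has already been isolated by the scanner.

```python
import re

KEYWORDS = ["if", "then", "else", "while", "do"]   # illustrative subset
IDENT_RE = re.compile(r"[A-Za-z][A-Za-z0-9]*")

# Strategy 1: match explicitly, one regular expression alternative per keyword.
KEYWORD_RE = re.compile("|".join(re.escape(k) for k in KEYWORDS))

def classify_explicit(word):
    if KEYWORD_RE.fullmatch(word):
        return "keyword"
    if IDENT_RE.fullmatch(word):
        return "identifier"
    raise ValueError(f"not a word: {word!r}")

# Strategy 2: match as an identifier, then look it up in a preloaded table.
KEYWORD_TABLE = frozenset(KEYWORDS)

def classify_lookup(word):
    if not IDENT_RE.fullmatch(word):
        raise ValueError(f"not a word: {word!r}")
    return "keyword" if word in KEYWORD_TABLE else "identifier"
```

The lookup version keeps the scanner's patterns small and makes adding a keyword a one-line table change, at the cost of one table probe per identifier.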
73
What Does a Scanner Do? Remove white space Remove comments
Tabs, spaces, new lines Remove comments Single line -- Ada comment Multi-line, start and end delimiters { Pascal comment } /* c comment */ Nested Runaway comments Nonterminated comments can’t be detected till end of file
74
What Does a Scanner Do? Perform look ahead Challenging input languages
Multi-character tokens 1..10 vs. 1.10 &, && <, <= etc Challenging input languages FORTRAN Keywords not reserved Blanks are not a delimiter Example (comma vs. decimal) DO10I=1,5 start of a do loop (equivalent to a C for loop) DO10I=1.5 an assignment statement, assignment to variable DO10I
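The `1..10` versus `1.10` case shows why a scanner sometimes has to look past the current character. After reading the digits `1` and seeing a `.`, the decision depends on the character after the dot. The sketch below handles only integers, reals, and the `..` token; the function name and token kind names are assumptions for illustration.

```python
def scan_numbers(text):
    """Tokenize digits, reals and '..', using one extra character of look ahead."""
    tokens, i = [], 0
    while i < len(text):
        if text[i].isdigit():
            start = i
            while i < len(text) and text[i].isdigit():
                i += 1
            # A '.' continues the number only if a digit follows it,
            # so we must peek at text[i + 1] before consuming the '.'.
            if i + 1 < len(text) and text[i] == "." and text[i + 1].isdigit():
                i += 1
                while i < len(text) and text[i].isdigit():
                    i += 1
                tokens.append(("real", text[start:i]))
            else:
                tokens.append(("int", text[start:i]))
        elif text.startswith("..", i):
            tokens.append(("dotdot", ".."))
            i += 2
        else:
            raise ValueError(f"unexpected character {text[i]!r}")
    return tokens
```

The FORTRAN `DO10I=1,5` case is harder still: the scanner cannot classify `DO` until it has seen the comma or period much later in the statement.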
75
What Does a Scanner Do? Challenging input languages (cont.)
PL/I, keywords not reserved IF THEN THEN THEN = ELSE; ELSE ELSE = THEN;
76
What Does a Scanner Do? Error Handling
Error token passed to parser which reports the error Recovery Delete characters from current token which have been read so far, restart scanning at next unread character Delete the first character of the current lexeme and resume scanning from the next character. Examples of lexical errors: 3.25e bad format for a constant Var#1 illegal character Some errors that are not lexical errors Mistyped keywords Begim Mismatched parentheses Undeclared variables
77
Scanner Implementation
Issues Simpler design – parser doesn’t have to worry about white space, etc. Improve compiler efficiency – allows the construction of a specialized and potentially more efficient processor Compiler portability is enhanced – input alphabet peculiarities and other device-specific anomalies can be restricted to the scanner
78
Scanner Implementation
What are the keywords in Triangle? How are keywords and identifiers implemented in Triangle? Is look ahead implemented in Triangle? If so, how?
79
Structure of a Compiler
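Source code, then Lexical Analyzer, yielding tokens, then Parser and Semantic Analyzer, yielding a parse tree, then Intermediate Code Generation, yielding an intermediate representation, then Optimization, yielding an intermediate representation, then Assembly Code Generation, yielding Assembly code. The Symbol Table is shared by the phases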
80
Parsing Given an unambiguous, context free grammar, parsing is
Recognition of an input string, i.e., deciding whether or not the input string is a sentence of the grammar Parsing of an input string, i.e., recognition of the input string plus determination of its phrase structure. The phrase structure can be represented by a syntax tree, or otherwise. Unambiguous is necessary so that every sentence of the grammar will form exactly one syntax tree.
81
Parsing The syntax of programming language constructs is described by context-free grammars. Advantages of unambiguous, context-free grammars A precise, yet easy-to-understand, syntactic specification of the programming language For certain classes of grammars we can automatically construct an efficient parser that determines if a source program is syntactically well formed. Imparts a structure to a programming language that is useful for the translation of source programs into correct object code and for the detection of errors. Easier to add new constructs to the language if the implementation is based on a grammatical description of the language
82
Parsing sequence of tokens parser syntax tree Check the syntax (structure) of a program and create a tree representation of the program Programming languages have non-regular constructs Nesting Recursion Context-free grammars are used to express the syntax for programming languages
83
Context-Free Grammars
Comprised of A set of tokens or terminal symbols A set of non-terminal symbols A set of rules or productions which express the legal relationships between symbols A start or goal symbol
Example:
expr ::= expr - digit
expr ::= expr + digit
expr ::= digit
digit ::= 0|1|2|…|9
Tokens: -,+,0,1,2,…,9
Non-terminals: expr, digit
Start symbol: expr
84
Context-Free Grammars
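expr ::= expr - digit
expr ::= expr + digit
expr ::= digit
digit ::= 0|1|2|…|9
Example input: 3 + 8 - 2
[Figure: syntax tree for 3 + 8 - 2, with expr and digit nodes above the terminals 3, +, 8, -, 2]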
85
Checking for Correct Syntax
Given a grammar for a language and a program, how do you know if the syntax of the program is legal? A legal program can be derived from the start symbol of the grammar Grammar must be unambiguous and context-free
86
Deriving a String The derivation begins with the start symbol
At each step of a derivation the right hand side of a grammar rule is used to replace a non-terminal symbol Continue replacing non-terminals until only terminal symbols remain
Rule 1: expr ::= expr - digit
Rule 2: expr ::= expr + digit
Rule 3: expr ::= digit
Rule 4: digit ::= 0|1|2|…|9
Example input: 3 + 8 - 2
expr ⇒ expr - digit (Rule 1) ⇒ expr - 2 (Rule 4) ⇒ expr + digit - 2 (Rule 2) ⇒ expr + 8 - 2 (Rule 4) ⇒ digit + 8 - 2 (Rule 3) ⇒ 3 + 8 - 2 (Rule 4)
87
Rightmost Derivation The rightmost non-terminal is replaced in each step
Rule 1: expr ::= expr - digit
Rule 2: expr ::= expr + digit
Rule 3: expr ::= digit
Rule 4: digit ::= 0|1|2|…|9
Example input: 3 + 8 - 2
Rule 1: expr ⇒ expr - digit
Rule 4: expr - digit ⇒ expr - 2
Rule 2: expr - 2 ⇒ expr + digit - 2
Rule 4: expr + digit - 2 ⇒ expr + 8 - 2
Rule 3: expr + 8 - 2 ⇒ digit + 8 - 2
Rule 4: digit + 8 - 2 ⇒ 3 + 8 - 2
88
Leftmost Derivation The leftmost non-terminal is replaced in each step
Rule 1: expr ::= expr - digit
Rule 2: expr ::= expr + digit
Rule 3: expr ::= digit
Rule 4: digit ::= 0|1|2|…|9
Example input: 3 + 8 - 2
Rule 1: expr ⇒ expr - digit
Rule 2: expr - digit ⇒ expr + digit - digit
Rule 3: expr + digit - digit ⇒ digit + digit - digit
Rule 4: digit + digit - digit ⇒ 3 + digit - digit
Rule 4: 3 + digit - digit ⇒ 3 + 8 - digit
Rule 4: 3 + 8 - digit ⇒ 3 + 8 - 2
89
Leftmost Derivation The leftmost non-terminal is replaced in each step
Step 1 (Rule 1): expr ⇒ expr - digit
Step 2 (Rule 2): expr - digit ⇒ expr + digit - digit
Step 3 (Rule 3): expr + digit - digit ⇒ digit + digit - digit
Step 4 (Rule 4): digit + digit - digit ⇒ 3 + digit - digit
Step 5 (Rule 4): 3 + digit - digit ⇒ 3 + 8 - digit
Step 6 (Rule 4): 3 + 8 - digit ⇒ 3 + 8 - 2
[Figure: syntax tree for 3 + 8 - 2, with each node labeled by the number of the step that created it]
90
Bottom-Up Parsing Parser examines terminal symbols of the input string, in order from left to right Reconstructs the syntax tree from the bottom (terminal nodes) up (toward the root node) Bottom-up parsing reduces a string w to the start symbol of the grammar. At each reduction step a particular sub-string matching the right side of a production is replaced by the symbol on the left of that production, and if the sub-string is chosen correctly at each step, a rightmost derivation is traced out in reverse.
91
Bottom-Up Parsing Types of bottom-up parsing algorithms
Shift-reduce parsing At each reduction step a particular sub-string matching the right side of a production is replaced by the symbol on the left of that production, and if the sub-string is chosen correctly at each step, a rightmost derivation is traced out in reverse. LR(k) parsing L is for left-to-right scanning of the input, the R is for constructing a right-most derivation in reverse, and the k is for the number of input symbols of look-ahead that are used in making parsing decisions.
92
Bottom-Up Parsing Example 3+8-2
expr ::= expr - digit
expr ::= expr + digit
expr ::= digit
digit ::= 0|1|2|…|9
Example input: 3 + 8 - 2
3 + 8 - 2: reduce 3 to digit, giving digit + 8 - 2
digit + 8 - 2: reduce digit to expr, giving expr + 8 - 2
93
Bottom-Up Parsing Example 3+8-2
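expr + 8 - 2: reduce 8 to digit, giving expr + digit - 2
expr + digit - 2: reduce expr + digit to expr, giving expr - 2
expr - 2: reduce 2 to digit, giving expr - digit
expr - digit: reduce expr - digit to expr, giving expr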
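The reductions in the 3 + 8 - 2 example can be reproduced with a small shift-reduce loop. This is a sketch for this one grammar only: the choice of what to reduce is hard-coded rather than driven by LR tables, tokens are single characters, and the function name `shift_reduce` is made up for the example.

```python
def shift_reduce(text):
    """Recognize expr ::= expr - digit | expr + digit | digit bottom-up.

    Returns (accepted, trace), where trace lists the stack after each reduction.
    """
    tokens = [c for c in text if not c.isspace()]
    stack, trace = [], []
    for tok in tokens:
        stack.append(tok)                            # shift
        reduced = True
        while reduced:                               # reduce while possible
            reduced = False
            if stack[-1].isdigit():
                stack[-1] = "digit"                  # digit ::= 0|1|...|9
                reduced = True
            elif stack[-3:] in (["expr", "+", "digit"], ["expr", "-", "digit"]):
                stack[-3:] = ["expr"]                # expr ::= expr +/- digit
                reduced = True
            elif stack == ["digit"]:
                stack[:] = ["expr"]                  # expr ::= digit (leftmost only)
                reduced = True
            if reduced:
                trace.append(" ".join(stack))
    return stack == ["expr"], trace
```

Reading the reductions from last to first gives the rightmost derivation of the input, which is the sense in which bottom-up parsing traces a rightmost derivation in reverse.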
94
Bottom-Up Parsing Example abbcde
S ::= aABe
A ::= Abc | b
B ::= d
Example input: abbcde
abbcde: reduce the first b to A (A ::= b), giving aAbcde
95
Bottom-Up Parsing Example abbcde
S ::= aABe
A ::= Abc | b
B ::= d
Example input: abbcde
aAbcde: reduce Abc to A (A ::= Abc), giving aAde
96
Bottom-Up Parsing Example abbcde
S ::= aABe
A ::= Abc | b
B ::= d
Example input: abbcde
aAde: reduce d to B (B ::= d), giving aABe
97
Bottom-Up Parsing Example abbcde
S ::= aABe
A ::= Abc | b
B ::= d
Example input: abbcde
aABe: reduce aABe to S (S ::= aABe), giving S
98
Bottom-Up Parsing Example the cat sees a rat.
Sentence ::= Subject Verb Object .
Subject ::= I | a Noun | the Noun
Object ::= me | a Noun | the Noun
Noun ::= cat | mat | rat
Verb ::= like | is | see | sees
Example input: the cat sees a rat .
the cat sees a rat . : reduce cat to Noun, giving the Noun sees a rat .
99
Bottom-Up Parsing Example the cat sees a rat.
Sentence ::= Subject Verb Object .
Subject ::= I | a Noun | the Noun
Object ::= me | a Noun | the Noun
Noun ::= cat | mat | rat
Verb ::= like | is | see | sees
Example input: the cat sees a rat .
the Noun sees a rat . : reduce the Noun to Subject, giving Subject sees a rat .
100
Bottom-Up Parsing Example the cat sees a rat.
Sentence ::= Subject Verb Object .
Subject ::= I | a Noun | the Noun
Object ::= me | a Noun | the Noun
Noun ::= cat | mat | rat
Verb ::= like | is | see | sees
Example input: the cat sees a rat .
Subject sees a rat . : reduce sees to Verb, giving Subject Verb a rat .
101
Bottom-Up Parsing Example the cat sees a rat.
Sentence ::= Subject Verb Object .
Subject ::= I | a Noun | the Noun
Object ::= me | a Noun | the Noun
Noun ::= cat | mat | rat
Verb ::= like | is | see | sees
Example input: the cat sees a rat .
Subject Verb a rat . : reduce rat to Noun, giving Subject Verb a Noun .
102
Bottom-Up Parsing Example the cat sees a rat.
Sentence ::= Subject Verb Object .
Subject ::= I | a Noun | the Noun
Object ::= me | a Noun | the Noun
Noun ::= cat | mat | rat
Verb ::= like | is | see | sees
Example input: the cat sees a rat .
Subject Verb a Noun . : reduce a Noun to Object, giving Subject Verb Object .
What would happen if we chose ‘Subject ::= a Noun’ instead of ‘Object ::= a Noun’?
103
Bottom-Up Parsing Example the cat sees a rat.
Example input: the cat sees a rat .
Reduce Subject Verb Object . to Sentence (Sentence ::= Subject Verb Object .): the whole input is now reduced to the start symbol.
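The reduction sequence walked through on these slides can be replayed mechanically. The sketch below is only an illustration: it does not decide which reduction to apply (a real bottom-up parser must), it simply applies the slide's choices in order. The names `reduce_at`, `REDUCTIONS`, and `replay` are made up for this example.

```python
def reduce_at(form, start, rhs, lhs):
    """Replace the phrase rhs found at index start of the sentential form by lhs."""
    if form[start:start + len(rhs)] != rhs:
        raise ValueError(f"{rhs} does not occur at position {start} of {form}")
    return form[:start] + [lhs] + form[start + len(rhs):]


# (position, right-hand side, left-hand side) for each reduction, in slide order.
REDUCTIONS = [
    (1, ["cat"], "Noun"),
    (0, ["the", "Noun"], "Subject"),
    (1, ["sees"], "Verb"),
    (3, ["rat"], "Noun"),
    (2, ["a", "Noun"], "Object"),
    (0, ["Subject", "Verb", "Object", "."], "Sentence"),
]


def replay(sentence):
    """Return every sentential form from the input down to the start symbol."""
    form = sentence.split()
    history = [form]
    for start, rhs, lhs in REDUCTIONS:
        form = reduce_at(form, start, rhs, lhs)
        history.append(form)
    return history


if __name__ == "__main__":
    for form in replay("the cat sees a rat ."):
        print(" ".join(form))
```

Swapping the fifth entry for `(2, ["a", "Noun"], "Subject")` leaves `Subject Verb Subject .`, which matches no right-hand side, so the last reduction raises `ValueError`. That is one way to explore the question posed on the slide.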
104
Top-Down Parsing The parser examines the terminal symbols of the input string, in order from left to right. The parser reconstructs its syntax tree from the top (root node) down (towards the terminal nodes). An attempt to find the leftmost derivation for an input string
105
Top-Down Parsers General rules for top-down parsers
Start with just a stub for the root node.
At each step the parser takes the leftmost stub.
If the stub is labeled by terminal symbol t, the parser connects it to the next input terminal symbol, which must be t. (If not, the parser has detected a syntactic error.)
If the stub is labeled by nonterminal symbol N, the parser chooses one of the production rules N ::= X1…Xn, and grows branches from the node labeled by N to new stubs labeled X1,…, Xn (in order from left to right).
Parsing succeeds when and if the whole input string is connected up to the syntax tree.
106
Top-Down Parsing Two forms
Backtracking parsers: guess which rule to apply, then back up and change the choice if they cannot proceed. Predictive parsers: predict which rule to apply by using look-ahead tokens. Backtracking parsers are not very efficient. We will cover predictive parsers.
107
Predictive Parsers
Many types. LL(1) parsing: the first L is for scanning the input from left to right; the second L is for producing a leftmost derivation; 1 is for using one input symbol of look-ahead. Table driven, with an explicit stack to maintain the parse tree. Recursive-descent parsing: uses recursive subroutines to traverse the parse tree.
108
Predictive Parsers (Lookahead)
Lookahead in predictive parsing: the lookahead token (the next token in the input) is used to determine which rule should be used next.
For example:
term ::= num term'
term' ::= '+' num term' | '-' num term' | ε
num ::= '0' | '1' | '2' | … | '9'
Example input: 7 + 3 - 2
With lookahead 7, apply term ::= num term' and match num. With lookahead +, apply term' ::= '+' num term'.
109
Predictive Parsers (Lookahead)
Example input: 7 + 3 - 2
With lookahead 3, match num. With lookahead -, apply term' ::= '-' num term'.
110
Predictive Parsers (Lookahead)
Example input: 7 + 3 - 2
With lookahead 2, match num. At the end of the input, apply term' ::= ε; the parse tree is complete.
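The lookahead decisions above can be sketched as a small predictive recognizer. This is an illustrative sketch, not code from the course: tokens are assumed to be single characters, `$` is an assumed end-of-input marker, and `parse_term` is a made-up name. It returns the production rules in the order they are applied.

```python
DIGITS = set("0123456789")


def parse_term(text):
    """Recognize `term` with one token of lookahead; return the rules applied."""
    tokens = [ch for ch in text if not ch.isspace()] + ["$"]  # "$" marks end of input
    pos = 0
    applied = []

    def accept(allowed, description):
        nonlocal pos
        if tokens[pos] not in allowed:
            raise SyntaxError(f"expected {description}, found {tokens[pos]!r}")
        pos += 1

    def term():
        applied.append("term ::= num term'")
        accept(DIGITS, "a digit")
        term_rest()

    def term_rest():
        lookahead = tokens[pos]  # the single lookahead token selects the rule
        if lookahead == "+":
            applied.append("term' ::= '+' num term'")
            accept({"+"}, "'+'")
            accept(DIGITS, "a digit")
            term_rest()
        elif lookahead == "-":
            applied.append("term' ::= '-' num term'")
            accept({"-"}, "'-'")
            accept(DIGITS, "a digit")
            term_rest()
        else:
            applied.append("term' ::= ε")  # nothing consumed

    term()
    if tokens[pos] != "$":
        raise SyntaxError(f"unexpected {tokens[pos]!r} after the term")
    return applied


if __name__ == "__main__":
    for rule in parse_term("7 + 3 - 2"):
        print(rule)
```

No backtracking is needed because the three alternatives for term' start with different tokens.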
111
Recursive-Descent Parsing
Top-down parsing algorithm Consists of a group of methods (programs) parseN, one for each nonterminal symbol N of the grammar. The task of each method parseN is to parse a single N-phrase These parsing methods cooperate to parse complete sentences
112
Recursive-Descent Parsing
Sentence ::= Subject Verb Object .
Subject ::= I | a Noun | the Noun
Object ::= me | a Noun | the Noun
Noun ::= cat | mat | rat
Verb ::= like | is | see | sees
Example input: the cat sees a rat .
Decide which production rule to apply. There is only one, rule #1 (Sentence ::= Subject Verb Object .). This step creates four stubs: Subject, Verb, Object, and '.'.
113
Recursive-Descent Parsing
Example input: the cat sees a rat .
[Figure: syntax tree under construction. Sentence has children Subject, Verb, Object, and '.'; Subject has been expanded to the Noun.]
116
Recursive-Descent Parsing
Example input: the cat sees a rat .
[Figure: syntax tree under construction. Sentence has children Subject, Verb, Object, and '.'; Subject has been expanded to the Noun and Object to a Noun.]
119
Recursive-Descent Parser for Micro-English
Sentence ::= Subject Verb Object .
Subject ::= I | a Noun | the Noun
Object ::= me | a Noun | the Noun
Noun ::= cat | mat | rat
Verb ::= like | is | see | sees
Parsing methods: parseSentence, parseSubject, parseObject, parseVerb, parseNoun
120
Recursive-Descent Parser for Micro-English
Sentence ::= Subject Verb Object .
parseSentence:
  parseSubject
  parseVerb
  parseObject
  parseEnd
121
Recursive-Descent Parser for Micro-English
Subject ::= I | a Noun | the Noun
parseSubject:
  if input = "I" then accept
  else if input = "a" then accept, parseNoun
  else if input = "the" then accept, parseNoun
  else error
122
Recursive-Descent Parser for Micro-English
Noun ::= cat | mat | rat
parseNoun:
  if input = "cat" then accept
  else if input = "mat" then accept
  else if input = "rat" then accept
  else error
123
Recursive-Descent Parser for Micro-English
Object ::= me | a Noun | the Noun
parseObject:
  if input = "me" then accept
  else if input = "a" then accept, parseNoun
  else if input = "the" then accept, parseNoun
  else error
124
Recursive-Descent Parser for Micro-English
Verb ::= like | is | see | sees
parseVerb:
  if input = "like" then accept
  else if input = "is" then accept
  else if input = "see" then accept
  else if input = "sees" then accept
  else error
125
Recursive-Descent Parser for Micro-English
parseEnd (the final '.' of Sentence ::= Subject Verb Object .):
  if input = "." then accept
  else error
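Put together, the parsing methods on the preceding slides form a complete recognizer for Micro-English. The sketch below is one possible rendering under some assumptions: words are separated by spaces, `<eot>` is an assumed end-of-text marker, and the class name `MicroEnglishParser` and helper `recognizes` are invented for the example.

```python
class MicroEnglishParser:
    """Recursive-descent recognizer with one parse method per nonterminal."""

    NOUNS = {"cat", "mat", "rat"}
    VERBS = {"like", "is", "see", "sees"}

    def __init__(self, text):
        # Treat the full stop as a token of its own; "<eot>" marks end of text.
        self.tokens = text.replace(".", " . ").split() + ["<eot>"]
        self.pos = 0

    @property
    def current(self):
        return self.tokens[self.pos]

    def accept(self, expected):
        """Consume the current token only if it is the expected one."""
        if self.current != expected:
            raise SyntaxError(f"expected {expected!r}, found {self.current!r}")
        self.pos += 1

    def accept_it(self):
        """Consume the current token unconditionally (it was already inspected)."""
        self.pos += 1

    def parse_sentence(self):  # Sentence ::= Subject Verb Object .
        self.parse_subject()
        self.parse_verb()
        self.parse_object()
        self.accept(".")

    def parse_subject(self):  # Subject ::= I | a Noun | the Noun
        if self.current == "I":
            self.accept_it()
        elif self.current in ("a", "the"):
            self.accept_it()
            self.parse_noun()
        else:
            raise SyntaxError(f"{self.current!r} cannot start a Subject")

    def parse_object(self):  # Object ::= me | a Noun | the Noun
        if self.current == "me":
            self.accept_it()
        elif self.current in ("a", "the"):
            self.accept_it()
            self.parse_noun()
        else:
            raise SyntaxError(f"{self.current!r} cannot start an Object")

    def parse_noun(self):  # Noun ::= cat | mat | rat
        if self.current not in self.NOUNS:
            raise SyntaxError(f"{self.current!r} is not a Noun")
        self.accept_it()

    def parse_verb(self):  # Verb ::= like | is | see | sees
        if self.current not in self.VERBS:
            raise SyntaxError(f"{self.current!r} is not a Verb")
        self.accept_it()

    def parse(self):
        self.parse_sentence()
        self.accept("<eot>")  # nothing may follow the sentence


def recognizes(text):
    """Return True if text is a Micro-English sentence."""
    try:
        MicroEnglishParser(text).parse()
        return True
    except SyntaxError:
        return False
```

For instance, `recognizes("the cat sees a rat.")` succeeds, while `recognizes("me like the cat.")` fails in `parse_subject` because "me" cannot start a Subject.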
126
Systematic Development of a Recursive-Descent Parser
Given a (suitable) context-free grammar:
Express the grammar in EBNF, with a single production rule for each nonterminal symbol, and perform any necessary grammar transformations. Always eliminate left recursion. Always left-factorize whenever possible.
Transcribe each EBNF production rule N ::= X to a parsing method parseN, whose body is determined by X.
Make the parser consist of: a private variable currentToken; the private parsing methods developed in the previous step; private auxiliary methods accept and acceptIt, both of which call the scanner; and a public parse method that calls parseS (where S is the start symbol of the grammar), having first called the scanner to store the first input token in currentToken.
127
Quote of the Week “C makes it easy to shoot yourself in the foot; C++ makes it harder, but when you do, it blows away your whole leg.” Bjarne Stroustrup
128
Quote of the Week Did you really say that?
Dr. Bjarne Stroustrup: Yes, I did say something along the lines of C makes it easy to shoot yourself in the foot; C++ makes it harder, but when you do, it blows your whole leg off. What people tend to miss is that what I said about C++ is to a varying extent true for all powerful languages. As you protect people from simple dangers, they get themselves into new and less obvious problems. Someone who avoids the simple problems may simply be heading for a not-so-simple one. One problem with very supporting and protective environments is that the hard problems may be discovered too late or be too hard to remedy once discovered. Also, a rare problem is harder to find than a frequent one because you don't suspect it. I also said, "Within C++, there is a much smaller and cleaner language struggling to get out." For example, that quote can be found on page 207 of The Design and Evolution of C++. And no, that smaller and cleaner language is not Java or C#. The quote occurs in a section entitled "Beyond Files and Syntax". I was pointing out that the C++ semantics is much cleaner than its syntax. I was thinking of programming styles, libraries and programming environments that emphasized the cleaner and more effective practices over archaic uses focused on the low-level aspects of C.
129
Converting EBNF Production Rules to Parsing Methods
For production rule N ::= X, convert the production rule to a parsing method named parseN:
private void parseN () { parse X }
Refine parse ε to a dummy statement.
Refine parse t (where t is a terminal symbol) to accept(t) or acceptIt().
Refine parse N (where N is a nonterminal symbol) to a call of the corresponding parsing method parseN().
Refine parse X Y to { parse X parse Y }.
Refine parse X | Y to:
switch (currentToken.kind) { cases in starters[[X]]: parse X break; cases in starters[[Y]]: parse Y break; default: report a syntax error }
130
Converting EBNF Production Rules to Parsing Methods
For X | Y: choose parse X only if the current token is one that can start an X-phrase; choose parse Y only if the current token is one that can start a Y-phrase. starters[[X]] and starters[[Y]] must be disjoint.
For X*: refine to while (currentToken.kind is in starters[[X]]) parse X. starters[[X]] must be disjoint from the set of tokens that can follow X* in this particular context.
131
Converting EBNF Production Rules to Parsing Methods
A grammar that satisfies both these conditions is called an LL(1) grammar Recursive-descent parsing is suitable only for LL(1) grammars
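The first of these conditions (disjoint starter sets for the alternatives of X | Y) can be checked mechanically. The sketch below is a simplified illustration: it assumes a plain BNF grammar with no ε-alternatives, represents the grammar as a dictionary from nonterminal to a list of alternatives, and does not check the second condition (for X*). The names `starters` and `clashes` are chosen for this example.

```python
def starters(symbols, grammar, _visiting=frozenset()):
    """Terminals that can start a phrase derived from the symbol sequence.

    Assumes no alternative derives the empty string, so only the first
    symbol of the sequence matters.
    """
    first = symbols[0]
    if first not in grammar:  # a terminal starts only itself
        return {first}
    if first in _visiting:  # left recursion: contributes nothing new
        return set()
    result = set()
    for alternative in grammar[first]:
        result |= starters(alternative, grammar, _visiting | {first})
    return result


def clashes(grammar):
    """List (nonterminal, i, j, shared) for alternatives i < j that overlap."""
    found = []
    for nonterminal, alternatives in grammar.items():
        for i in range(len(alternatives)):
            for j in range(i + 1, len(alternatives)):
                shared = starters(alternatives[i], grammar) & starters(alternatives[j], grammar)
                if shared:
                    found.append((nonterminal, i, j, shared))
    return found


MICRO_ENGLISH = {
    "Sentence": [["Subject", "Verb", "Object", "."]],
    "Subject": [["I"], ["a", "Noun"], ["the", "Noun"]],
    "Object": [["me"], ["a", "Noun"], ["the", "Noun"]],
    "Noun": [["cat"], ["mat"], ["rat"]],
    "Verb": [["like"], ["is"], ["see"], ["sees"]],
}

# Two Mini-Triangle command forms that both start with an Identifier token.
COMMAND_FRAGMENT = {
    "Command": [["V-name", ":=", "Expression"], ["Identifier", "(", "Expression", ")"]],
    "V-name": [["Identifier"]],
}

if __name__ == "__main__":
    print(clashes(MICRO_ENGLISH))
    print(clashes(COMMAND_FRAGMENT))
```

Micro-English produces no clashes, while the command fragment reports that both alternatives can start with Identifier, which is the situation left factorization is meant to remove.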
132
Error Repair Good programming languages are designed with a relatively large “distance” between syntactically correct programs, to increase the likelihood that conceptual mistakes are caught as syntactic errors. Error repair usually occurs at two levels: Local: repairs mistakes with little global import, such as missing semicolons and undeclared variables. Scope: repairs the program text so that scopes are correct. Errors of this kind include unbalanced parentheses and begin/end blocks.
133
Error Repair Repair actions can be divided into insertions and deletions. Typically the compiler will use some look ahead and backtracking in attempting to make progress in the parse. There is great variation among compilers, though some languages (PL/C) carry a tradition of good error repair. Goals of error repair are: No input should cause the compiler to collapse Illegal constructs are flagged Frequently occurring errors are repaired gracefully Minimal stuttering or cascading of errors. LL-Style parsing lends itself well to error repair, since the compiler uses the grammar’s rules to predict what should occur next in the input
134
Mini-Triangle Production Rules
Program ::= Command    Program (1.14)
Command ::= V-name := Expression    AssignCommand (1.15a)
  | Identifier ( Expression )    CallCommand (1.15b)
  | Command ; Command    SequentialCommand (1.15c)
  | if Expression then Command else Command    IfCommand (1.15d)
  | while Expression do Command    WhileCommand (1.15e)
  | let Declaration in Command    LetCommand (1.15f)
Expression ::= Integer-Literal    IntegerExpression (1.16a)
  | V-name    VnameExpression (1.16b)
  | Operator Expression    UnaryExpression (1.16c)
  | Expression Operator Expression    BinaryExpression (1.16d)
V-name ::= Identifier    SimpleVname (1.17)
Declaration ::= const Identifier ~ Expression    ConstDeclaration (1.18a)
  | var Identifier : Type-denoter    VarDeclaration (1.18b)
  | Declaration ; Declaration    SequentialDeclaration (1.18c)
Type-denoter ::= Identifier    SimpleTypeDenoter (1.19)
135
Abstract Syntax Trees An explicit representation of the source program’s phrase structure AST for Mini-Triangle
136
Abstract Syntax Trees
Program ASTs (P): Program with one subtree C (1.14), from Program ::= Command.
Command ASTs (C):
AssignCommand with subtrees V and E (1.15a), from Command ::= V-name := Expression
CallCommand with subtrees Identifier (with its spelling) and E (1.15b), from Identifier ( Expression )
SequentialCommand with subtrees C1 and C2 (1.15c), from Command ; Command
137
Abstract Syntax Trees
Command ASTs (C), continued:
IfCommand with subtrees E, C1, and C2 (1.15d), from if Expression then Command else Command
WhileCommand with subtrees E and C (1.15e), from while Expression do Command
LetCommand with subtrees D and C (1.15f), from let Declaration in Command
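One way to picture these AST forms is as one small record type per production rule, with one field per subtree. The sketch below covers only a few of the Mini-Triangle rules and is an illustration, not the Triangle compiler's actual class design; the field names and the helper `count_nodes` are assumptions.

```python
from dataclasses import dataclass, fields, is_dataclass


@dataclass
class SimpleVname:  # V-name ::= Identifier (1.17)
    identifier: str


@dataclass
class IntegerExpression:  # (1.16a)
    literal: str


@dataclass
class VnameExpression:  # (1.16b)
    vname: SimpleVname


@dataclass
class BinaryExpression:  # (1.16d)
    left: object
    operator: str
    right: object


@dataclass
class AssignCommand:  # (1.15a)
    vname: SimpleVname
    expression: object


@dataclass
class WhileCommand:  # (1.15e)
    expression: object
    command: object


def count_nodes(ast):
    """Count AST nodes; spellings (plain strings) are leaves and are not counted."""
    if not is_dataclass(ast):
        return 0
    return 1 + sum(count_nodes(getattr(ast, field.name)) for field in fields(ast))


# AST for the Mini-Triangle command: while n < 10 do n := n + 1
LOOP = WhileCommand(
    BinaryExpression(VnameExpression(SimpleVname("n")), "<", IntegerExpression("10")),
    AssignCommand(
        SimpleVname("n"),
        BinaryExpression(VnameExpression(SimpleVname("n")), "+", IntegerExpression("1")),
    ),
)
```

Each node carries only the subphrases that matter; keywords such as while and do leave no trace in the tree.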
138
Midterm Review: Chapter 1
Context-free Grammar A finite set of terminal symbols A finite set of non-terminal symbols A start symbol A finite set of production rules Aspects of a programming language that need to be specified Syntax: form of programs Contextual constraints: scope rules and type rules Semantics: meaning of programs
139
Midterm Review: Chapter 1
Language specification Informal: written in English Formal: precise notation (BNF, EBNF) Unambiguous Consistent Complete Context-free language Syntax tree Phrase Sentence
140
Midterm Review: Chapter 1
Syntax tree Terminal node labeled by terminal symbol Non-terminal nodes labeled by non-terminal symbol Abstract Syntax Tree (AST) Each non-terminal node is labeled by production rule Each non-terminal node has exactly one subtree for each subphrase Does not generate sentences
141
Midterm Review: Chapter 2
Translator Accepts any text expressed in one language (source language) and generates a semantically-equivalent text expressed in another language (target language) Compiler Translates from high-level language into low-level language Interpreter A program that accepts any program (source program) expressed in a particular language (source language) and runs that source program immediately
142
Midterm Review: Chapter 2
Interpretive compiler Combination of compiler and interpreter Some of the advantages of each Portable compiler Compiled and run on any machine, without change Portability measured by proportion of code that remains unchanged Portability is an economic issue Bootstrapping Using the language processor to process itself Tombstone diagrams
143
Midterm Review: Chapter 3
Three phases of compilation Syntactic analysis Contextual analysis Code generation Single pass compilers Multi-pass compilers Compiler design issues Speed Space Modularity Flexibility Semantic preserving transformations Source language properties
144
Midterm Review: Chapter 4
Sub-phases of syntactic analysis Scanning (lexical analysis) Source program transformed to stream of tokens Comments and blank spaces between tokens are discarded Parsing Source program in form of stream of tokens parsed to determine phrase structure Parser treats each token as a terminal symbol Representation of the phrase structure A data structure representing the source program’s phrase structure Typically an abstract syntax tree (AST)
145
Midterm Review: Chapter 4
Tokens An atomic symbol of the source program May consist of several characters Classified according to kind All tokens of the same kind can be freely interchanged without affecting the program’s phrase structure Each token completely described by its kind and spelling Token represented by tuple Only kind of each token examined by parser Spelling examined by contextual analyzer and/or code generator
146
Midterm Review: Chapter 4
Grammars Regular expressions “|” separates alternatives “*” indicates that the previous item may be repeated zero or more times “(“ and “)” are grouping parentheses ε is the empty string, a special string of length 0 Algebraic properties Common extensions Grammar transformations Left factorization Elimination of left recursion Substitution of non-terminal symbols
147
Midterm Review: Chapter 4
Structure of compiler Source code Lexical analyzer Parser & semantic analyzer Intermediate code generation Optimization Assembly code generation Assembly code
148
Midterm Review: Chapter 4
Scanning (lexical analysis) What does it do? Handles keywords (reserved words) Removes white space (tabs, spaces, new lines) Removes comments Performs look-ahead Error handling Issues Simpler design Improve compiler efficiency Enhance compiler portability
149
Midterm Review: Chapter 4
Parsing Given an unambiguous, context-free grammar Recognition of input string – sentence in grammar Parsing an input string – determines its phrase structure Why is unambiguous important? Advantages of unambiguous, context-free grammars (see chart 81) How do you know the syntax of a language is legal? A legal program can be derived from the start symbol of the grammar
150
Midterm Review: Chapter 4
Parsing Rightmost (replace the rightmost non-terminal in each step) and leftmost (replace the leftmost non-terminal in each step) derivations Bottom-up (reconstructs syntax tree from terminal nodes up toward the root node) and top-down (reconstructs syntax tree from the root node down towards the terminal nodes) Predictive parsers LL(1) Recursive descent
151
Midterm Review: Chapter 4
Parsing Converting EBNF production rules to parsing methods Error repair
152
Chapter 5: Contextual Analysis
Identification Monolithic Block Structure Flat Block Structure Nested Block Structure Attributes Standard Environment Type Checking A Contextual Analysis Algorithm Case Study: Contextual Analysis in the Triangle Compiler
153
Contextual Analysis Given a parsed program, the purpose of contextual analysis is to check that the program conforms to the source language’s contextual constraints. Scope rules: rules governing declarations and applied occurrences of identifiers Type rules: rules that allow us to infer the types of expressions, and to decide whether each expression has a valid type Analysis of the program to determine correctness with respect to the language definition (beyond structure)
154
Contextual Analysis Contextual analysis consists of two sub-phases:
Identification: applying the source language’s scope rules to relate each applied occurrence of an identifier to its declaration (if any). Type checking: applying the source language's type rules to infer the type of each expression, and compare that type with the expected type.
155
Structure of a Compiler
[Diagram: Source code → Lexical Analyzer → tokens → Parser → parse tree → Semantic Analyzer (Identification, Type checking) → Intermediate Code Generation → intermediate representation → Optimization → intermediate representation → Assembly Code Generation → Assembly code. The Symbol Table sits alongside these phases.]
156
Identification Relate each applied occurrence of an identifier in the source program to the corresponding declaration Ill-formed program if no corresponding declaration – generate error Identification could cause compiler efficiency problems Inefficient to use the AST
157
Identification Table Also known as symbol table
Associates identifiers with their attributes Basic operation Make the identification table empty Add an entry associating a given identifier with a given attribute Retrieve the attribute (if any) associated with a given identifier Attribute Consists of information relevant to contextual analysis Obtained from the identifier’s declaration
158
Identification Table Each declaration in a program has a defined scope
Portion of program over which the declaration takes effect Block: any program phrase that delimits the scope of declarations within it Example: the Triangle block command let D in C Scope of each declaration in D extends over the subcommand C
159
Identification Table: Structure/Implementation
Maintain scope An identifier should be found in the table only when valid If an identifier is defined in multiple scopes, then a lookup in the table must provide the appropriate meaning for the use Efficiency How fast is lookup? How fast to enter/exit a scope? What is the overall table size?
160
Identification Table: Structure/Implementation
Different implementations Organized for efficient retrieval Binary search tree Hash table
161
Identification Table: Functionality
A mapping of identifiers to their meanings Information Name Type Location Operations Create Insert Lookup Delete Update entry Entering a new scope Leaving a scope
162
Block Structures
Monolithic block structure: Basic and Cobol. Flat block structure: Fortran. Nested block structure: Pascal, Ada, C, and Java.
163
Monolithic Block Structure
The only block is the entire program All declarations are global Simple rules No identifier may be declared more than once For every applied occurrence of an identifier I, there must be a corresponding declaration of I No identifier may be used unless declared The identification table should contain entries for all declarations in the source program At most, one entry for each identifier The table contains an identifier I and the associated attribute A
164
Monolithic Block Structure
program
  (1) integer b = 10
  (2) integer n
  (3) char c
begin
  …
  n = n * b
  write c
end
Identification table (identifier, attribute): b (1); n (2); c (3)
Create new table: create command
At a declaration of identifier I, make a table entry: insert command
At an applied occurrence of identifier I, retrieve information from the table: lookup command
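For a monolithic block structure the identification table needs nothing more than a single mapping. The following sketch illustrates the create, insert, and lookup commands and the two rules (no redeclaration, no use without declaration); the class and function names are invented for the example, and attributes are just the declaration numbers used on the slide.

```python
class IdentificationTable:
    """Identification table for a monolithic block structure."""

    def __init__(self):  # "create": start with an empty table
        self._entries = {}

    def enter(self, identifier, attribute):  # "insert"
        if identifier in self._entries:
            raise ValueError(f"{identifier} is declared more than once")
        self._entries[identifier] = attribute

    def retrieve(self, identifier):  # "lookup"; None means "not declared"
        return self._entries.get(identifier)


def check_program(declarations, applied_occurrences):
    """Return the list of identification errors found in a program.

    declarations: (identifier, attribute) pairs in source order.
    applied_occurrences: identifiers used in the program body.
    """
    table = IdentificationTable()
    errors = []
    for identifier, attribute in declarations:
        try:
            table.enter(identifier, attribute)
        except ValueError as error:
            errors.append(str(error))
    for identifier in applied_occurrences:
        if table.retrieve(identifier) is None:
            errors.append(f"{identifier} is not declared")
    return errors
```

For the program on the slide, `check_program([("b", 1), ("n", 2), ("c", 3)], ["n", "n", "b", "c"])` reports no errors.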
165
Flat Block Structure Program partitioned into several disjoint blocks
Two scope levels Some declarations are local in scope Identifiers restricted to particular block Other declarations are global in scope Identifiers allowed anywhere in the program – the program as a whole is a block Less simple rules No global declared identifier may be redeclared globally But same identifier may also be declared locally No locally declared identifier may be redeclared in the same block Same identifier may be declared locally in several different blocks For every applied occurrence of an identifier I in a block B, there must be a corresponding declaration of I Either global declaration of I or a declaration of I local to B Minor complication is to distinguish global and local declaration entries
166
Flat Block Structure
program
  (1) procedure Q
        (2) real r
        (3) real pi = 3.14
  (4) procedure R
        (5) integer c
        begin … end
  (6) integer i
  (7) boolean b
  (8) char c
  … call R …
Create new table: create command
At the start of a block: enter new scope command
At the end of a block: leave scope command, delete command
At a declaration of identifier I, make a table entry: insert command
At an applied occurrence of identifier I, retrieve information from the table: lookup command
Identification table (identifier, attribute, level) while analyzing Q: Q (1) global; r (2) local; pi (3) local
While analyzing R: Q (1) global; R (4) global; c (5) local
While analyzing the main program: Q (1) global; R (4) global; i (6) local; b (7) local; c (8) local
167
Nested Block Structure
Blocks may be nested one within another Many scope levels Declarations in the outermost block are global in scope. The outermost block is at scope level 1 Declarations inside an inner block are local to that block Every inner block is completely enclosed by another block Next to outermost block is at scope level 2 If enclosed by a level-n, the block is at scope level n+1
168
Nested Block Structure
More complex rules No identifier may be declared more than once in the same block Same identifier may be declared in different blocks, even if they are nested For every applied occurrence of an identifier I in a block B, there must be a corresponding declaration of I Must be in B itself Or in the block B’ immediately enclosing B Or in B’’ immediately enclosing B’ Etc. In smallest enclosing block that contains any declaration of I
169
Nested Block Structure
Create new table: create command
At the start of a block: enter new scope command
At the end of a block: leave scope command, delete command
At a declaration of identifier I, make a table entry: insert command. The level number is determined by the number of calls to enter new scope.
At an applied occurrence of identifier I, retrieve information from the table using the highest level for I: lookup command
let (1) var a: Integer; (2) var b: Boolean
in begin
  …;
  let (3) var b: Integer; (4) var c: Boolean
  in begin
    …;
    let (5) var d: Integer in …;
    …
  end;
  …
  let (6) var d: Boolean; (7) var e: Integer in …;
  …
end
Identification table (identifier, attribute, level) in the outer block: a (1) 1; b (2) 1
In the first inner block: a (1) 1; b (2) 1; b (3) 2; c (4) 2
In the innermost block: a (1) 1; b (2) 1; b (3) 2; c (4) 2; d (5) 3
In the second inner block: a (1) 1; b (2) 1; d (6) 2; e (7) 2
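The table behaviour traced above can be sketched with a stack of per-level mappings. This is only one possible implementation (the slides also mention binary search trees and hash tables); the class and method names are assumptions made for the example.

```python
class ScopedIdentificationTable:
    """Identification table for a nested block structure."""

    def __init__(self):
        self._scopes = [{}]  # index 0 holds scope level 1, the outermost block

    @property
    def level(self):
        return len(self._scopes)

    def open_scope(self):  # "enter new scope"
        self._scopes.append({})

    def close_scope(self):  # "leave scope": drops every entry of that level
        if len(self._scopes) == 1:
            raise RuntimeError("cannot leave the outermost scope")
        self._scopes.pop()

    def enter(self, identifier, attribute):  # "insert" at the current level
        if identifier in self._scopes[-1]:
            raise ValueError(f"{identifier} is already declared in this block")
        self._scopes[-1][identifier] = attribute

    def retrieve(self, identifier):
        """Return (level, attribute) of the innermost declaration, or None."""
        for level in range(len(self._scopes), 0, -1):
            scope = self._scopes[level - 1]
            if identifier in scope:
                return level, scope[identifier]
        return None
```

Replaying the slide's program: after entering (1) to (4), `retrieve("b")` yields `(2, 3)`, the inner declaration; once both inner scopes are closed it yields `(1, 2)` again, and `c` is no longer found.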
170
Attributes
Examples. Kind: constant, variable, procedure, function. Type: boolean, character, integer, record, array.
171
Attributes Information to be extracted from declaration
Constant, variable, procedure, function, type Procedure or function declaration includes a list of formal parameters that may be a constant, variable, procedural, or functional parameter Language provides whole families of record and array types How to manage attribute information Extract type information from declarations and store in information table Could be complex for a realistic programming language Could require tedious programming Use the AST Pointers in information table pointing to location in AST with that identifier
172
Attributes
[Figure: AST of a program whose outer LetCommand declares a (1) and b (2) and whose inner LetCommand declares d (6) and e (7), each by a VarDeclaration. The identification table holds a and b at level 1 and d and e at level 2; each attribute is a pointer to the corresponding VarDeclaration node in the AST.]
173
Standard Environment Predefined constants, variables, types, procedures, and functions. These are loaded into the identification table. Scope rules for the standard environment are one of: a scope enclosing the entire program (level 0); or the same scope level as global declarations (an example is C).
174
Structure of a Compiler
[Diagram: Source code → Lexical Analyzer → tokens → Parser → parse tree → Semantic Analyzer (Identification, Type checking) → Intermediate Code Generation → intermediate representation → Optimization → intermediate representation → Assembly Code Generation → Assembly code. The Symbol Table sits alongside these phases.]
175
Type Checking The second task of the contextual analyzer is to ensure that the source program contains no type errors. Once an applied occurrence of an identifier has been identified, the contextual analyzer will check that the identifier is used in a way consistent with its declaration.
176
Type Checking In a statically typed language, the compiler can detect any type errors without actually running the program. For every expression E in the language, the compiler can infer either that E has some type T or that E is ill-typed. If E does have type T, then E will always yield a value of type T. If a value of type T’ is expected, then the compiler checks that T’ is equivalent to T.
177
Type Checking Infers the type of each expression bottom-up
Starting with literals and identifiers, and working up through larger and larger subexpressions.
Literal: the type of a literal is immediately known.
Identifier: the type of an applied occurrence of identifier I is obtained from the corresponding declaration of I.
Unary operator application: consider “O E”, where O is a unary operator of type T1 → T2. The type checker ensures that E’s type is equivalent to T1, and infers that the type of “O E” is T2. Otherwise there is a type error.
Binary operator application: consider “E1 O E2”, where O is a binary operator of type T1 × T2 → T3. The type checker ensures that E1’s type is equivalent to T1 and that E2’s type is equivalent to T2, and infers that “E1 O E2” is of type T3. Otherwise there is a type error.
178
Type Checking Type of a nontrivial expression is inferred from the types of its sub-expressions, using the appropriate type rules Must be able to test if two given types T and T’ are equivalent
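The bottom-up inference described on these slides can be sketched as a recursive function over a tiny expression representation. Everything here is an assumption made for illustration: expressions are nested tuples, types are the strings "int" and "bool", type equivalence is plain equality, and the operator tables list only a handful of operators.

```python
# Operator signatures: unary T1 -> T2 and binary T1 x T2 -> T3.
UNARY_OPS = {"-": ("int", "int"), "not": ("bool", "bool")}
BINARY_OPS = {
    "+": ("int", "int", "int"),
    "<": ("int", "int", "bool"),
    "and": ("bool", "bool", "bool"),
}


def infer_type(expr, declared):
    """Infer the type of expr bottom-up, or raise TypeError if it is ill-typed.

    expr is one of ("lit", value), ("id", name), ("unary", op, operand),
    or ("binary", left, op, right); declared maps identifiers to their types.
    """
    kind = expr[0]
    if kind == "lit":  # the type of a literal is immediately known
        return "bool" if isinstance(expr[1], bool) else "int"
    if kind == "id":  # taken from the identifier's declaration
        if expr[1] not in declared:
            raise NameError(f"{expr[1]} is not declared")
        return declared[expr[1]]
    if kind == "unary":
        _, op, operand = expr
        t1, t2 = UNARY_OPS[op]
        if infer_type(operand, declared) != t1:
            raise TypeError(f"operand of {op!r} must have type {t1}")
        return t2
    if kind == "binary":
        _, left, op, right = expr
        t1, t2, t3 = BINARY_OPS[op]
        if infer_type(left, declared) != t1:
            raise TypeError(f"left operand of {op!r} must have type {t1}")
        if infer_type(right, declared) != t2:
            raise TypeError(f"right operand of {op!r} must have type {t2}")
        return t3
    raise ValueError(f"unknown expression kind {kind!r}")
```

With `n` declared as an integer, `infer_type(("binary", ("id", "n"), "<", ("lit", 10)), {"n": "int"})` infers "bool", whereas adding an integer to a truth value raises `TypeError`.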
179
Type Checking – Constant or Variable Identifier
[Figure: an applied occurrence of x (a SimpleVname) is linked to its ConstDeclaration; the declaration’s expression has type T, so the SimpleVname is decorated with :T.]
180
Type Checking – Variable Declaration
[Figure: an applied occurrence of x (a SimpleVname) is linked to its VarDeclaration of x with type-denoter T, so the SimpleVname is decorated with :T.]
181
Type Checking – Binary Operator
[Figure: a BinaryExpression with operator < whose two operand expressions are decorated :int; the BinaryExpression itself is decorated :bool.] < is of type int × int → bool
182
Type Checking Each applied occurrence of an identifier must be identified before type checking can proceed
183
Chapter 6: Run-time Organization
Marshal the resources of the target machine (instructions, storage, and system software) in order to implement the source language
184
Chapter 6: Run-time Organization
Data Representation: How should we represent the values of each source-language type in the target machine?
Expression Evaluation: How should we organize the evaluation of expressions, taking care of intermediate results?
Static Storage Allocation, Stack Storage Allocation, Heap Storage Allocation: How should we organize storage for variables, taking into account the different lifetimes of global, local, and heap variables?
Routines: How should we implement procedures, functions, and parameters, in terms of low-level routines?
Run-time Organization for Object-oriented Languages: How should we represent objects and methods?
Case Study: The Abstract Machine TAM
185
Data Representation
How should we represent the values of each source-language type in the target machine? High-Level Data Types Truth values Integers Characters Records Arrays Operations over these types Machine Data Types Bits Bytes Words Double-words Low-level arithmetic and logical operations Need to bridge the semantic gap between high-level types and machine level types
186
Data Representation -- Fundamental Principles
Non-confusion: different values of a given type should have different representations. If two different values are confused, i.e., have the same representation, then comparison of these values will incorrectly treat the values as equal. Example: approximate representation of real numbers. Real numbers that are slightly different mathematically might have the same approximate representation. Difficult to avoid; need to take care during compiler design. Must avoid confusion in the representations of discrete types such as truth values, characters, and integers. For statically typed languages we need only be concerned with values of the same type: 00…00₂ may represent false, the integer 0, or the real number 0.0. Compile-time type checks will distinguish the values of different types.
187
Data Representation -- Fundamental Principles
Uniqueness: each value should always have the same representation. Example of non-uniqueness: the ones-complement representation of integers, in which zero is represented both by 00…00₂ and by 11…11₂ (+0 and –0). A simple bit-string comparison would incorrectly treat these values as unequal. A more specialized integer comparison must be used. The alternative twos-complement representation gives us unique representations of integers.
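The difference between the two integer representations is easy to check by decoding every bit pattern of a small word. The sketch below uses 4-bit strings purely for illustration; the function names are invented for the example.

```python
def decode_ones_complement(bits):
    """Value of a ones-complement bit string such as "1110"."""
    raw = int(bits, 2)
    if bits[0] == "1":  # negative: invert every bit to get the magnitude
        mask = (1 << len(bits)) - 1
        return -(~raw & mask)
    return raw


def decode_twos_complement(bits):
    """Value of a twos-complement bit string such as "1110"."""
    raw = int(bits, 2)
    if bits[0] == "1":  # negative: subtract 2^n
        return raw - (1 << len(bits))
    return raw


def all_patterns(width):
    """Every bit string of the given width, in counting order."""
    return [format(i, f"0{width}b") for i in range(1 << width)]


if __name__ == "__main__":
    for bits in all_patterns(4):
        print(bits, decode_ones_complement(bits), decode_twos_complement(bits))
```

In ones-complement both "0000" and "1111" decode to zero, so the 16 patterns cover only 15 distinct values; in twos-complement each of the 16 patterns decodes to a different value, so comparing bit strings is enough to compare integers.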
188
Data Representation – Pragmatic Issues
Constant-size representation: The representations of all values of a given type should occupy the same amount of space. This makes it possible for the compiler to plan the allocation of storage: knowing the type of a variable but not its actual value, the compiler will know exactly how much storage space the variable will occupy.
189
Data Representation – Pragmatic Issues
Direct representation vs. indirect representation Should the values of a given type be represented directly, or indirectly through pointers? Direct representation Just the binary representation of the value consisting of one or more bits, bytes, words Indirect representation A handle that points to the storage area which has the binary representation of the value Essential for types whose values vary greatly in size List or dynamic array
190
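Direct representation vs. indirect representation
[Figure: with direct representation, x and y each hold the binary representation of the value itself; with indirect representation, x and y each hold a handle pointing to a separate storage area, where y's value has the same type as x's but requires more space.]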
191
Notation
#T: cardinality of type T, i.e., the number of distinct values of type T. Example: #[[Boolean]] = 2
size T: amount of space (in bits, bytes, or words) occupied by each value of type T. For indirect representation only the handle is counted.
For a direct representation of type T: size T ≥ log2(#T), or equivalently 2^(size T) ≥ #T, where size T is expressed in bits.
In n bits we can represent at most 2^n distinct values if we are to avoid confusion (the non-confusion requirement).
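The inequality 2^(size T) ≥ #T can be checked with a small Java sketch. The helper name minBits is hypothetical. It returns the smallest number of bits that avoids confusion for a type of a given cardinality.

```java
public class TypeSize {
    /** Smallest n such that 2^n >= cardinality, i.e. the ceiling of log2(cardinality). */
    static int minBits(long cardinality) {
        if (cardinality < 1 || cardinality > (1L << 62)) {
            throw new IllegalArgumentException("cardinality out of supported range");
        }
        int bits = 0;
        while ((1L << bits) < cardinality) {
            bits++;
        }
        return bits;
    }

    public static void main(String[] args) {
        System.out.println(minBits(2));      // Boolean: two values
        System.out.println(minBits(256));    // a 2^8-character set
        System.out.println(minBits(65536));  // a 2^16-character set
    }
}
```

A real representation usually rounds this minimum up to a byte or a word, which is why size[[Boolean]] is often larger than one bit.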
192
Primitive Types Cannot be decomposed into simpler values
Most programming languages provide these primitive types Boolean, Char, Integer Also provide elementary logical and arithmetic operations Machines typically support the above primitive types, so choice of representation is straightforward
193
Primitive Types Representation
Boolean: true and false. Since #[[Boolean]] = 2, size[[Boolean]] ≥ 1 bit. Can represent Boolean with one bit, one byte, or one word. For a single bit: 0 for false and 1 for true. For a byte or word: 00…00₂ for false and either 00…01₂ or 11…11₂ for true. Negation, conjunction, and disjunction are implemented by NOT, AND, and OR.
194
Primitive Types Representation
Char: The source language can specify the character set. Ada: ISO-Latin1 character set (2^8 distinct characters). Java: Unicode character set (2^16 distinct characters). Most languages do not, which allows compiler writers to choose the machine’s native character set (2^7 or 2^8 distinct characters). ISO defines the character representation for “A” to be 01000001₂. Can represent a character by one byte or one word.
195
Primitive Types Representation
Integer: Denotes an implementation-defined bounded range of integers, defined by the individual language processor. The binary representation is determined by the target machine’s arithmetic unit and almost always occupies one word. The language’s integer operations can be implemented with the machine's integer operations.
Pascal and Triangle: -maxint, …, -1, 0, +1, …, +maxint, where maxint is implementation defined
#[[Integer]] = 2 × maxint + 1
2^(size[[Integer]]) ≥ 2 × maxint + 1
For a word size of w bits: size[[Integer]] = w, maxint = 2^(w-1) - 1
Java: int denotes -2^31, …, -1, 0, +1, …, +2^31 - 1
#[[int]] = 2^32
196
Record Type: Consists of several fields, each of which has an identifier. All records of a particular type have fields with the same identifiers and types. The fundamental operation on records is field selection: use one field identifier to access the corresponding field. Simple representation: juxtapose the fields to make them occupy consecutive positions in storage. This allows us to predict the total size of each record and the position of each field relative to the base of the record.
197
Record Type
Consider the following:
  type T = record I1: T1, …, In: Tn end;
  var r: T
size T = size T1 + … + size Tn
If size T1, …, and size Tn are all constant, then size T is also constant.
Implementation of field selection: address[[r.Ii]] = address r + (size T1 + … + size T(i-1))
Some machines have alignment restrictions, which force unused space to be left between record fields; these equations cannot be used on such machines.
[Figure: storage layout of r in consecutive positions: r.I1 holds a value of type T1, r.I2 a value of type T2, …, r.In a value of type Tn.]
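The field-offset equation can be sketched in Java as follows. The class and method names are hypothetical, sizes are measured in words, and no alignment padding is assumed.

```java
public class RecordLayout {
    /** offsets[i] is the sum of the sizes of all fields that precede field i. */
    static int[] fieldOffsets(int[] fieldSizes) {
        int[] offsets = new int[fieldSizes.length];
        int next = 0;
        for (int i = 0; i < fieldSizes.length; i++) {
            offsets[i] = next;      // address[[r.Ii]] - address r
            next += fieldSizes[i];
        }
        return offsets;
    }

    /** size T = size T1 + ... + size Tn. */
    static int recordSize(int[] fieldSizes) {
        int total = 0;
        for (int size : fieldSizes) {
            total += size;
        }
        return total;
    }

    public static void main(String[] args) {
        int[] sizes = {1, 2, 1};  // three fields occupying 1, 2, and 1 words
        System.out.println(java.util.Arrays.toString(fieldOffsets(sizes)));
        System.out.println(recordSize(sizes));
    }
}
```

Because every offset is computed from compile-time sizes, field selection needs no run-time arithmetic beyond adding a constant to the record's base address.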
198
Disjoint Unions
A disjoint union has a tag and a variant part; the value of the tag determines the type of the variant part.
T = T1 + … + Tn
In each value of type T, the variant part is a value chosen from one of the types T1, …, or Tn; the tag indicates which one.
size T = size Ttag + max(size T1, …, size Tn)
address[[u.Itag]] = address u + 0
address[[u.Ii]] = address u + size Ttag
A variant smaller than max(size T1, …, size Tn) leaves wasted space.
[Figure: alternative layouts of u: the tag u.Itag (a value of type Ttag) followed by u.I1 (a value of type T1), or u.I2 (a value of type T2), or … or u.In (a value of type Tn); the variant part always occupies max(size T1, …, size Tn), with any remainder wasted.]
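A short Java sketch of the disjoint-union equations, with hypothetical names and sizes in words. It computes the total size, the address of the variant part, and the space wasted when a smaller variant is stored.

```java
public class UnionLayout {
    /** Largest of the variant sizes. */
    static int maxVariantSize(int[] variantSizes) {
        int max = 0;
        for (int size : variantSizes) {
            max = Math.max(max, size);
        }
        return max;
    }

    /** size T = size Ttag + max(size T1, ..., size Tn). */
    static int unionSize(int tagSize, int[] variantSizes) {
        return tagSize + maxVariantSize(variantSizes);
    }

    /** address[[u.Ii]] = address u + size Ttag, the same for every variant. */
    static int variantAddress(int baseAddress, int tagSize) {
        return baseAddress + tagSize;
    }

    /** Words left unused when variant i is the one currently stored. */
    static int wastedSpace(int[] variantSizes, int i) {
        return maxVariantSize(variantSizes) - variantSizes[i];
    }
}
```

For example, with a one-word tag and variants of 1, 3, and 2 words, every value of the union occupies 4 words, and storing the first variant wastes 2 of them.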
199
Static Arrays Consists of several elements, all of the same type
Bounded range of indices – usually integers Each index has exactly one element Fundamental operation on arrays is indexing Access an individual element by giving its index Index evaluated at run-time Static Array Index bounds are known at compile-time Direct representation is to juxtapose the array elements, in order of increasing indices. Implemented by run-time address computation
200
Static Arrays (lower index bound is 0)
Consider the following example:
  type T = array n of Telem;
  var a: T
size T = n × size Telem
The number of elements n is constant, so if size Telem is constant, then size T is also constant.
address[[a[i]]] = address a + (i × size Telem)
Since i is known only at run-time, an array indexing implies a run-time address computation.
[Figure: elements a[0], a[1], a[2], …, a[n-1], each a value of type Telem, in consecutive positions.]
201
Static Arrays (programmer chooses lower and upper array bounds)
Consider the following example:
  type T = array [l..u] of Telem;
  var a: T
size T = (u - l + 1) × size Telem
The number of elements (u - l + 1) is constant, so if size Telem is constant, then size T is also constant.
address[[a[i]]] = address a + ((i - l) × size Telem)
                = address a - (l × size Telem) + (i × size Telem)
address[[a[0]]] = address a - (l × size Telem)
address[[a[i]]] = address[[a[0]]] + (i × size Telem)
Since i is known only at run-time, an array indexing implies a run-time address computation.
The index check must ensure that l ≤ i ≤ u.
[Figure: elements a[l], a[l+1], a[l+2], …, a[u], each a value of type Telem, in consecutive positions.]
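The address computation with programmer-chosen bounds can be sketched in Java. The names are hypothetical and addresses are plain integers. The origin address[[a[0]]] is a compile-time constant, so only the multiplication, the addition, and the index check remain at run-time.

```java
public class StaticArray {
    /** address[[a[0]]] = address a - (l * size Telem); may lie outside the array itself. */
    static int origin(int base, int lower, int elemSize) {
        return base - lower * elemSize;
    }

    /** address[[a[i]]] = address[[a[0]]] + (i * size Telem), after checking l <= i <= u. */
    static int elementAddress(int base, int lower, int upper, int elemSize, int i) {
        if (i < lower || i > upper) {
            throw new IndexOutOfBoundsException("index " + i + " not in " + lower + ".." + upper);
        }
        return origin(base, lower, elemSize) + i * elemSize;
    }

    public static void main(String[] args) {
        // array [3..7] of a 2-word element type, stored at address 100
        System.out.println(elementAddress(100, 3, 7, 2, 5));
    }
}
```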
202
Dynamic Arrays An array whose index bounds are not known until run-time
Different dynamic arrays of the same type may have different index bounds, and therefore different numbers of elements Need to satisfy constant-size requirement Create array descriptor or handle Pointer to the array’s elements Index bounds Handle has constant size
203
Dynamic Arrays
Ada example:
  type T is array (Integer range <>) of Telem;
  a: T (E1 .. E2);
size T = address:size + 2 × size[[Integer]]
address:size is the amount of space required to store an address, usually one word. This satisfies the constant-size requirement.
Declaration of array variable a:
  E1 and E2 are evaluated to yield a’s index bounds (say l and u)
  Space is allocated for (u - l + 1) elements, juxtaposed and separate from a’s handle
  address[[a(0)]] = address[[a(l)]] - (l × size Telem)
  Values for address[[a(0)]], l, and u are stored in a’s handle
The element with index i will be addressed as follows:
  address[[a(i)]] = address[[a(0)]] + (i × size Telem)
                  = content(address[[a]]) + (i × size Telem)
The index check is l ≤ i ≤ u, where l = content(address[[a]] + address:size) and u = content(address[[a]] + address:size + size[[Integer]])
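A Java sketch of the handle layout described above, modelling the data store as an int array with one-word addresses and integers. The class, constants, and method names are assumptions for illustration. The handle holds the origin address[[a(0)]], then l, then u.

```java
public class DynamicArray {
    static final int ADDRESS_SIZE = 1;  // assumed: an address occupies one word
    static final int INTEGER_SIZE = 1;  // assumed: an integer occupies one word

    /** Fills in the handle at handleAddr once the bounds and element storage are known. */
    static void initHandle(int[] store, int handleAddr, int elemsAddr,
                           int lower, int upper, int elemSize) {
        store[handleAddr] = elemsAddr - lower * elemSize;             // origin
        store[handleAddr + ADDRESS_SIZE] = lower;                     // lower bound l
        store[handleAddr + ADDRESS_SIZE + INTEGER_SIZE] = upper;      // upper bound u
    }

    /** address[[a(i)]] = content(address[[a]]) + (i * size Telem), with the index check. */
    static int elementAddress(int[] store, int handleAddr, int elemSize, int i) {
        int lower = store[handleAddr + ADDRESS_SIZE];
        int upper = store[handleAddr + ADDRESS_SIZE + INTEGER_SIZE];
        if (i < lower || i > upper) {
            throw new IndexOutOfBoundsException("index " + i + " not in " + lower + ".." + upper);
        }
        return store[handleAddr] + i * elemSize;
    }
}
```

The handle always occupies ADDRESS_SIZE + 2 × INTEGER_SIZE words regardless of the bounds, which is what satisfies the constant-size requirement.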
204
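Dynamic Arrays
[Figure: the handle of a holds the origin address[[a[0]]], the lower bound l, and the upper bound u; the origin refers to separately allocated elements a[l], a[l+1], a[l+2], …, a[u] of type Telem.]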
205
Status Chapter 6: Run-time Organization Data Representations
Primitive types Record types Disjoint unions Static arrays Dynamic arrays Recursive types Expression Evaluation Register machine Stack machine Static Storage Allocation Global variables Stack Storage Allocation Local variables
206
Recursive Types Defined in terms of itself
Values of recursive type T have components that are themselves of type T Examples List with tail being itself a list Tree with the sub-trees themselves being trees
207
Recursive Types Consider the Pascal declaration
  type IntList = ^IntNode;
       IntNode = record head: Integer; tail: IntList end;
  var primes: IntList
size[[IntList]] = address:size (usually 1 word)
Always use pointers to represent values of the recursive type.
[Figure: primes holds a handle that points to the nodes of the list.]
208
Expression Evaluation Register Machine
How should we organize the evaluation of expressions The problem is the need to keep intermediate results somewhere Consider the expression a * b + (1 – (c * 2)) Will have intermediate results for a * b, c * 2, and 1 – (c * 2) For a register based machine (non-stack machine) Use the registers to store intermediate results Problem arises when there are not enough registers for all intermediate results
209
Expression Evaluation Example a * b + (1 – (c * 2))
  LOAD R1 a
  MULT R1 b
  LOAD R2 #1
  LOAD R3 c
  MULT R3 #2
  SUB  R2 R3
  ADD  R1 R2
a, b, c are the memory addresses of the values of a, b, c
210
Expression Evaluation Stack Machine
The machine provides a stack for holding intermediate results. For the expression a * b + (1 – (c * 2)):
  LOAD a
  LOAD b
  MULT
  LOADL 1
  LOAD c
  LOADL 2
  MULT
  SUB
  ADD
211
Expression Evaluation Stack Machine Example a * b + (1 – (c * 2))
Stack contents, listed from bottom to top (unused space lies above the top):
(1) After LOAD a:  value of a
(2) After LOAD b:  value of a, value of b
(3) After MULT:    value of a*b
(4) After LOADL 1: value of a*b, 1
(5) After LOAD c:  value of a*b, 1, value of c
(6) After LOADL 2: value of a*b, 1, value of c, 2
(7) After MULT:    value of a*b, 1, value of c*2
(8) After SUB:     value of a*b, value of 1-(c*2)
(9) After ADD:     value of (a*b)+(1-(c*2))
Operands of different types (and therefore different sizes) can be evaluated in just the same way, e.g., AND, OR, function calls, etc. Each operation takes its operands from the top of the stack and places its result on the top of the stack.
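The nine steps can be replayed with a small Java sketch that uses a java.util.ArrayDeque as the evaluation stack. The class and helper names are hypothetical. Each helper pops its right operand first, then its left operand, and pushes the result.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class StackEval {
    static void mult(Deque<Integer> s) { int r = s.pop(), l = s.pop(); s.push(l * r); }
    static void sub(Deque<Integer> s)  { int r = s.pop(), l = s.pop(); s.push(l - r); }
    static void add(Deque<Integer> s)  { int r = s.pop(), l = s.pop(); s.push(l + r); }

    /** Evaluates a * b + (1 - (c * 2)) in the same order as the stack-machine code. */
    static int evaluate(int a, int b, int c) {
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(a);   // LOAD a
        stack.push(b);   // LOAD b
        mult(stack);     // MULT
        stack.push(1);   // LOADL 1
        stack.push(c);   // LOAD c
        stack.push(2);   // LOADL 2
        mult(stack);     // MULT
        sub(stack);      // SUB
        add(stack);      // ADD
        return stack.pop();
    }

    public static void main(String[] args) {
        System.out.println(evaluate(3, 4, 5));  // 3*4 + (1 - 5*2)
    }
}
```

No register allocation is needed: the depth of the stack grows and shrinks with the nesting of the expression.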
212
Static Storage Allocation Global Variables
Each variable in source program requires enough storage to contain any value that might be assigned to it As a consequence of constant-size representation, the compiler knows how much storage needs to be allocated to variable, based on type of variable (size T) Global variables Variables that exist and take up storage throughout the program’s run-time. Static storage allocation: Compiler locates these variables at some fixed positions in storage (decides each global variable’s address relative to the base of the storage region in which global variables are located)
213
Static Storage Allocation Global Variables: Example
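let
  type Date = record y: Integer, m: Integer, d: Integer end;
  var a: array 3 of Integer;
  var b: Boolean;
  var c: Char;
  var t: Date
in . . .
[Figure: storage layout of the global variables in consecutive locations: a[0], a[1], a[2] (the array a), then b, then c, then t.y, t.m, t.d (the record t), followed by unused space.]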
214
Stack Storage Allocation Local Variables
A local variable v is one that is declared inside a procedure (or function). Lifetime of v: the variable v exists (occupies storage) only during an activation of that procedure If same procedure is activated several times v will have several lifetimes Each activation creates a distinct variable
215
Stack Storage Allocation Local Variables: An Example
let
  var a: array 3 of Integer;
  var b: Boolean;
  var c: Char;
  proc Y () ~
    let
      var d: Integer;
      var e: record c: Char, n: Integer end
    in . . . ;
  proc Z () ~
    let
      var f: Integer
    in
      begin …; Y(); … end
in
  begin …; Y(); …; Z(); … end
216
Stack Storage Allocation Local Variables: An Example
[Figure: timeline of the program's run: program calls Y, return from Y, program calls Z, Z calls Y, return from Y, return from Z, program stops. Bars show the lifetime of the global variables across the whole run and the lifetimes of the variables local to Y and to Z during each activation.]
Observations:
Global variables are the only ones that exist throughout the program’s run-time, so use static allocation for global variables.
Lifetimes of local variables are properly nested, so use a stack for local variables.
217
Stack Storage Allocation Stack Frames: An Example
Stack contents, listed from SB upward:
(1) After program starts: globals
(2) After program calls Y: globals, frame for Y
(3) After return from Y: globals
(4) After program calls Z: globals, frame for Z
(5) After Z calls Y: globals, frame for Z, frame for Y
(6) After return from Y: globals, frame for Z
(7) After return from Z: globals
[Figure: in each snapshot SB marks the base of the globals, LB the base of the topmost frame, and ST the stack top; dynamic links connect each frame to the one beneath it.]
Registers:
SB: Stack Base, the location of the global variables
LB: Local Base, the local variables of the currently running procedure
ST: Stack Top, the very top of the stack
218
Stack Storage Allocation
The stack varies in size. For example, the frames for Y’s two activations are at two different locations. The position of a frame within the stack cannot be predicted in advance, so registers are dedicated to pointing to the frames, and the addresses of variables are found relative to these registers.
SB: stack base. Fixed, pointing to the base of the stack. This is where the global variables are located.
LB: local base. Points to the base of the topmost frame in the stack. This frame always contains the variables of the currently running procedure.
ST: stack top. Points to the very top of the stack. ST keeps track of the frame boundary as expressions are evaluated and the top of the stack expands and contracts.
219
Stack Storage Allocation
Frame contents:
Space for local variables
Link data:
  Return address: the code address to which control will be returned at the end of the procedure activation. It is the address of the instruction following the call instruction that activated the procedure in the first place.
  Dynamic link: the pointer to the base of the underlying frame in the stack. It is the old content of LB and will be restored at the end of the procedure activation.
Since there are two words of link data, local variable addresses are offset by 2.
This only considers access to local or global variables, not nested variables.
[Figure: frame layout: link data (dynamic link, return address) followed by local data.]
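The frame discipline can be simulated with a short Java sketch. It assumes the two-word link data described above (dynamic link first, then return address), a data store modelled as an int array, and SB fixed at address 0. The class and method names are hypothetical.

```java
public class FrameStack {
    final int[] store = new int[256];  // simulated data store
    int LB = 0;                        // local base (equals SB until a routine is called)
    int ST;                            // stack top: first unused word

    FrameStack(int globalsSize) {
        ST = globalsSize;              // globals occupy addresses 0 .. globalsSize-1
    }

    /** Pushes a frame: dynamic link, return address, then space for the locals. */
    void call(int returnAddress, int localsSize) {
        int newBase = ST;
        store[newBase] = LB;                 // dynamic link: old content of LB
        store[newBase + 1] = returnAddress;  // where to resume in the caller
        LB = newBase;
        ST = newBase + 2 + localsSize;
    }

    /** Pops the topmost frame, restores LB, and yields the return address. */
    int ret() {
        int returnAddress = store[LB + 1];
        ST = LB;
        LB = store[LB];
        return returnAddress;
    }

    /** Address of a local variable: offset by the 2 words of link data. */
    int localAddress(int offset) {
        return LB + 2 + offset;
    }
}
```

Calling Z and then Y from Z pushes two frames; returning from Y restores LB to the base of Z's frame because that value was saved as Y's dynamic link.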
220
Chapter 7: Code Generation
Code Selection
A Code Generation Algorithm
Constants and Variables
Procedures and Functions
Case Study: Code Generation in the Triangle Compiler
221
Code Generation Translation of the source program to object code
Dependent on source language and target machine Target Machines Registers, or stack, or both for intermediate results Instructions with zero, one, two, or three operands, or a mixture Single addressing mode, or many
222
Code Generation Major Subproblems
Code selection: which sequence of target machine instructions will be the object code for each phrase. Write code templates: a code template is a general rule specifying the object code of all phrases of a particular form (e.g., all assignment commands), but there are usually lots of special cases.
Storage allocation: deciding the storage address of each variable in the source program. Addresses are exact for global variables, but only relative for local variables.
Register allocation: deciding which registers should be used to hold intermediate results during expression evaluation. Complex expressions may need more registers than are available.
Since code generation for a stack machine is much simpler than for a register machine, we will only generate code for a stack machine.
223
Code Generation Code Selection
Deciding which sequence of instructions to generate for each case.
Code template: specifies the object code to which a phrase is translated, in terms of the object code to which its subphrases are translated.
Object code: the sequence of instructions to which the source-language phrase will be translated.
Code specification: a collection of code functions and code templates; it must cover the entire source language.
224
Abstract Machine TAM
Suitable for executing programs compiled from a block-structured language such as Triangle. All evaluation takes place on a stack. Primitive arithmetic, logical, and other operations are treated uniformly with programmed functions and procedures.
Two separate stores:
Code Store: 32-bit instruction words (read only)
Data Store: 16-bit data words (read-write)
225
Abstract Machine TAM Code and Data Stores
Code Store: fixed while the program is running.
Code segment: contains the program’s instructions. CB points to the base of the code segment, CT points to the top of the code segment, and CP points to the next instruction to be executed. CP is initialized to CB (the program's first instruction is at the base of the code segment).
Primitive segment: contains ‘microcode’ for elementary arithmetic, logical, input-output, heap, and general-purpose operations. PB points to the base of the primitive segment and PT points to the top of the primitive segment.
226
Abstract Machine TAM Code and Data Stores
Data Store: while the program is running, the segments of the data store may vary in size.
Stack: grows from the low-address end of the Data Store. SB points to the base of the stack and ST points to the top of the stack. ST is initialized to SB.
Heap: grows from the high-address end of the Data Store. HB points to the base of the heap and HT points to the top of the heap. HT is initialized to HB.
227
Abstract Machine TAM Code and Data Stores
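[Figure: Code Store, consisting of the code segment (from CB to CT, with CP pointing to the next instruction), unused space, and the primitive segment (from PB to PT). Data Store, consisting of the stack (global segment at SB, then frames, with LB at the base of the topmost frame and ST at the stack top), unused space, and the heap segment (from HT to HB).]
Stack and heap can expand and contract. The global segment is always at the base of the stack. The stack can contain any number of other segments, known as frames, containing data local to an activation of some routine. LB points to the base of the topmost frame.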
228
Code Functions
run P         Run the program P and then halt, starting and finishing with an empty stack
execute C     Execute the command C, possibly updating variables, but neither expanding nor contracting the stack
evaluate E    Evaluate the expression E, pushing its result on to the stack top, but having no other effect
fetch V       Push the value of the constant or variable named V on to the stack top
assign V      Pop a value from the stack top, and store it in the variable named V
elaborate D   Elaborate the declaration D, expanding the stack to make space for any constants and variables declared therein
229
Abstract Machine TAM Instructions
LOAD(n) d[r]     Fetch an n-word object from the data address (d+register r), and push it on to the stack
LOADA d[r]       Push the data address (d+register r) on to the stack
LOADI(n)         Pop a data address from the stack, fetch an n-word object from that address, and push it on to the stack
LOADL d          Push the 1-word literal value d on to the stack
STORE(n) d[r]    Pop an n-word object from the stack, and store it at the data address (d+register r)
STOREI(n)        Pop an address from the stack, then pop an n-word object from the stack and store it at that address
CALL(n) d[r]     Call the routine at code address (d+register r), using the address in register n as the static link
CALLI            Pop a closure (static link and code address) from the stack, then call the routine at that code address
RETURN(n) d      Return from the current routine: pop an n-word result from the stack, then pop the topmost frame, then pop d words of arguments, then push the result back on to the stack
PUSH d           Push d words (uninitialized) on to the stack
POP(n) d         Pop an n-word result from the stack, then pop d more words, then push the result back on to the stack
JUMP d[r]        Jump to code address (d+register r)
JUMPI            Pop a code address from the stack, then jump to that address
JUMPIF(n) d[r]   Pop a 1-word value from the stack, then jump to code address (d+register r) if and only if that value equals n
HALT             Stop execution of the program
230
While Command
execute [[while E do C]] =
      JUMP h
  g:  execute C
  h:  evaluate E
      JUMPIF(1) g
231
While Command
execute [[while i > 0 do i := i – 2]]
      30: JUMP 35         // JUMP h
  g:  31: LOAD i          // execute [[i := i – 2]]
      32: LOADL 2
      33: CALL sub
      34: STORE i
  h:  35: LOAD i          // evaluate [[i > 0]]
      36: LOADL 0
      37: CALL gt
      38: JUMPIF(1) 31    // JUMPIF(1) g
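To see the object code above run, here is a minimal Java sketch of a stack interpreter that supports only the instructions used in this example. It is not the TAM interpreter: the opcode names, the single variable i held in a Java local, and the code addresses starting at 0 instead of 30 are all simplifying assumptions.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class WhileLoopDemo {
    enum Op { JUMP, LOAD, LOADL, SUB, GT, STORE, JUMPIF, HALT }

    static final class Instr {
        final Op op; final int n; final int d;
        Instr(Op op, int n, int d) { this.op = op; this.n = n; this.d = d; }
    }

    /** Runs the object code for: while i > 0 do i := i - 2, and returns the final i. */
    static int run(int initialI) {
        Instr[] code = {
            new Instr(Op.JUMP, 0, 5),    // 0:    JUMP h
            new Instr(Op.LOAD, 0, 0),    // 1: g: LOAD i
            new Instr(Op.LOADL, 0, 2),   // 2:    LOADL 2
            new Instr(Op.SUB, 0, 0),     // 3:    CALL sub
            new Instr(Op.STORE, 0, 0),   // 4:    STORE i
            new Instr(Op.LOAD, 0, 0),    // 5: h: LOAD i
            new Instr(Op.LOADL, 0, 0),   // 6:    LOADL 0
            new Instr(Op.GT, 0, 0),      // 7:    CALL gt
            new Instr(Op.JUMPIF, 1, 1),  // 8:    JUMPIF(1) g
            new Instr(Op.HALT, 0, 0)     // 9:    HALT
        };
        int i = initialI;                        // the only variable
        Deque<Integer> stack = new ArrayDeque<>();
        int cp = 0;                              // code pointer
        while (true) {
            Instr in = code[cp++];
            switch (in.op) {
                case JUMP:   cp = in.d; break;
                case LOAD:   stack.push(i); break;
                case LOADL:  stack.push(in.d); break;
                case SUB:  { int r = stack.pop(), l = stack.pop(); stack.push(l - r); break; }
                case GT:   { int r = stack.pop(), l = stack.pop(); stack.push(l > r ? 1 : 0); break; }
                case STORE:  i = stack.pop(); break;
                case JUMPIF: if (stack.pop() == in.n) cp = in.d; break;
                case HALT:   return i;
            }
        }
    }
}
```

Starting with i = 5 the body runs three times (5, 3, 1) and the loop exits with i = -1; starting with i = 0 the initial JUMP h makes the test fail immediately, so the body never runs.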
232
While Command
public Object visitWhileCommand(WhileCommand ast, Object o) {
    Frame frame = (Frame) o;
    int jumpAddr, loopAddr;

    jumpAddr = nextInstrAddr;                  // address of the JUMP h instruction, patched below
    emit(Machine.JUMPop, 0, Machine.CBr, 0);   // emit JUMP h with a placeholder target
    loopAddr = nextInstrAddr;                  // this is address g:
    ast.C.visit(this, frame);                  // generate code for C
    patch(jumpAddr, nextInstrAddr);            // address h: is now known, so fill it into JUMP h
    ast.E.visit(this, frame);                  // generate code for E
    emit(Machine.JUMPIFop, Machine.trueRep, Machine.CBr, loopAddr);  // JUMPIF(1) g: back to g if E is true
    return null;
}
233
Repeat Command
execute [[repeat C until E]] =
  g:  execute C
      evaluate E
      JUMPIF(0) g
234
Repeat Command
execute [[repeat i := i – 2 until i < 0]]
  g:  31: LOAD i          // execute [[i := i – 2]]
      32: LOADL 2
      33: CALL sub
      34: STORE i
      35: LOAD i          // evaluate [[i < 0]]
      36: LOADL 0
      37: CALL lt
      38: JUMPIF(0) 31    // JUMPIF(0) g
235
Repeat Command
public Object visitRepeatCommand(RepeatCommand ast, Object o) {
    Frame frame = (Frame) o;
    int loopAddr;

    // Unlike the while command, no initial JUMP is emitted and nothing needs patching.
    loopAddr = nextInstrAddr;                  // this is address g:
    ast.C.visit(this, frame);                  // generate code for C
    ast.E.visit(this, frame);                  // generate code for E
    emit(Machine.JUMPIFop, Machine.falseRep, Machine.CBr, loopAddr);  // JUMPIF(0) g: back to g if E is false
    return null;
}
236
Abstract Machine TAM Routines
237
Abstract Machine TAM Primitive Routines
238
Extend Mini-Triangle V1 , V2 := E1 , E2
This is a simultaneous assignment: both E1 and E2 are to be evaluated, and then their values assigned to the variables V1 and V2, respectively.
      evaluate E1     (result pushed on to top of stack)
      evaluate E2     (result pushed on to top of stack)
      assign V2       (top of stack stored in variable V2)
      assign V1       (top of stack stored in variable V1)
[Figure: stack snapshots showing ST after each step: the result of E1 is pushed, the result of E2 is pushed above it, the result of E2 is popped into V2, and the result of E1 is popped into V1.]
239
Extend Mini-Triangle C1 , C2
This is a collateral command: the subcommands C1 and C2 are to be executed in any order chosen by the implementer.
      execute C1      (top of stack unchanged)
      execute C2      (top of stack unchanged)
240
Extend Mini-Triangle if E then C
This is a conditional command: if E evaluates to true, C is executed, otherwise nothing.
      evaluate E      (result pushed on to top of stack)
      JUMPIF(0) g     (jump to g if E evaluates to false)
      execute C       (top of stack unchanged)
  g:                  (jump location)
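A code generator for this template has to emit JUMPIF(0) g before the address g is known, then fill it in afterwards. The Java sketch below shows that backpatching step in isolation. It is not the Triangle compiler's encoder: the Instr class, the string opcodes, and the use of Runnable callbacks to stand in for visiting E and C are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class IfCodeGen {
    /** One instruction: opcode, field n, and a patchable operand d. */
    static final class Instr {
        final String op; final int n; int d;
        Instr(String op, int n, int d) { this.op = op; this.n = n; this.d = d; }
    }

    final List<Instr> code = new ArrayList<>();

    /** Appends an instruction and returns its address. */
    int emit(String op, int n, int d) {
        code.add(new Instr(op, n, d));
        return code.size() - 1;
    }

    /** Fills in the operand of a previously emitted jump. */
    void patch(int addr, int target) {
        code.get(addr).d = target;
    }

    /** execute [[if E then C]]: evaluate E; JUMPIF(0) g; execute C; g: */
    void genIf(Runnable evaluateE, Runnable executeC) {
        evaluateE.run();
        int jumpifAddr = emit("JUMPIF", 0, 0);  // target g not yet known
        executeC.run();
        patch(jumpifAddr, code.size());         // g is the address just after C
    }

    /** Generates code for: if b then i := 7, with b and i at hypothetical addresses 0 and 1. */
    static int demo() {
        IfCodeGen gen = new IfCodeGen();
        gen.genIf(
            () -> gen.emit("LOAD", 1, 0),
            () -> { gen.emit("LOADL", 0, 7); gen.emit("STORE", 1, 1); });
        return gen.code.get(1).d;               // the patched jump target
    }
}
```

In demo the JUMPIF sits at address 1 and the body occupies addresses 2 and 3, so the patched target is 4, the first address after the command.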
241
Extend Mini-Triangle repeat C until E
This is a loop command: E is evaluated at the end of each iteration (after executing C), and the loop terminates if its value is true.
  g:  execute C       (top of stack unchanged)
      evaluate E      (result pushed on to top of stack)
      JUMPIF(0) g     (jump to g if E evaluates to false)
242
Extend Mini-Triangle repeat C1 while E do C2
This is a loop command: E is evaluated in the middle of each iteration (after executing C1 but before executing C2), and the loop terminates if its value is false.
      JUMP h
  g:  execute C2      (top of stack unchanged)
  h:  execute C1      (top of stack unchanged)
      evaluate E      (result pushed on to top of stack)
      JUMPIF(1) g     (jump to g if E evaluates to true)
243
Extend Mini-Triangle if E1 then E2 else E3
This is a conditional expression: if E1 evaluates to true, E2 is evaluated, otherwise E3 is evaluated (E2 and E3 must be of the same type).
      evaluate E1     (result pushed on to top of stack)
      JUMPIF(0) g     (jump to g if E1 evaluates to false)
      evaluate E2     (result pushed on to top of stack)
      JUMP h
  g:  evaluate E3     (result pushed on to top of stack)
  h:                  (jump location)
244
Extend Mini-Triangle let D in E
This is a block expression: the declaration D is elaborated, and the resultant bindings are used in the evaluation of E.
      elaborate D     (expand stack for variables or constants)
      evaluate E      (result pushed on to top of stack)
      POP(n) s        (emitted if s > 0: pop an n-word result from the stack, pop s more words, then push the n-word result back)
where s = amount of storage allocated by D and n = size (type of E)
245
Extend Mini-Triangle begin C; yield E end
Here the command C is executed (making side effects), and then E is evaluated.
      execute C       (top of stack unchanged)
      evaluate E      (result pushed on to top of stack)
246
Extend Mini-Triangle for I from E1 to E2 do C
First the expressions E1 and E2 are evaluated, yielding the integers m and n, respectively. Then the subcommand C is executed repeatedly, with I bound to the integers m, m+1, …, n in successive iterations. If m > n, C is not executed at all. The scope of I is C, which may fetch I but may not assign to it.
247
Extend Mini-Triangle for I from E1 to E2 do C
      evaluate E2     (compute final value)
      evaluate E1     (compute initial value of I)
      JUMP h
  g:  execute C       (top of stack unchanged)
      CALL succ       (increment current value of I)
  h:  LOAD –1[ST]     (fetch current value of I)
      LOAD –3[ST]     (fetch final value)
      CALL le         (test current value <= final value)
      JUMPIF(1) g     (if so, repeat)
      POP(0) 2        (discard current and final values)
At g and at h, the current value of I is at the stack top (at address –1[ST]), and the final value is immediately underlying (at address –2[ST]).
248
Chapter 8: Interpretation
Iterative Interpretation
  Iterative Interpretation of Machine Code
  Iterative Interpretation of Command Languages
  Iterative Interpretation of Simple Programming Languages
Recursive Interpretation
Case Study: The TAM Interpreter
249
Chapter 9: Conclusion The Programming Language Life Cycle
Design Specification Prototype Compilers Error Reporting Compile-time Error Reporting Run-time Error Reporting Efficiency Compile-time Efficiency Run-time Efficiency
250
Structure of a Compiler
[Figure: compiler pipeline. Source code → Lexical Analyzer → tokens → Parser → parse tree → Semantic Analyzer (identification, type checking) → Intermediate Code Generation → intermediate representation → Optimization → intermediate representation → Assembly Code Generation → assembly code. The Symbol Table is shared by the phases.]
251
Programming Language Lifecycle: Concepts
Concepts: Values, Types, Storage, Bindings, Abstractions
Advanced Concepts: Encapsulation, Polymorphism, Exceptions, Concurrency
252
Programming Language Lifecycle: Simplicity & Regularity
Strive for simplicity and regularity Simplicity: support only the concepts essential to the applications for which language is intended Regularity: should combine those concepts in a systematic way, avoiding restrictions that might surprise programmers or make their task more difficult
253
Design Principles
Type completeness: no operation should be arbitrarily restricted in the types of its operands. Operations like assignment and parameter passing should, ideally, be applicable to all types.
Abstraction: for each phrase that specifies some kind of computation, there should be a way to abstract that phrase and parameterize it. It should be possible to abstract any expression and make it a function.
Correspondence: for each form of declaration there should be a corresponding parameter mechanism. Take a block with a constant definition and transform it into a procedure (or function) with a constant parameter.
254
Programming Language Lifecycle
Design Specification Prototype Compilers Manuals, textbooks
255
Specification
A precise specification of the language’s syntax and semantics must be written. It may be informal, formal, or a hybrid.
             Informal          Formal
Syntax       English phrases   BNF, EBNF
Semantics    English phrases   Axiomatic method (based on mathematical logic)
256
Prototypes Cheap, low quality implementation
Highlights features of the language that are hard to implement. Lets users try out the language. An interpreter might be a good prototype. An interpretive compiler translates from source to abstract machine code.
257
Compile-Time Error Reporting
Rejecting ill-formed programs. Report the location of each error with some explanation. Distinguish between the major categories of compile-time errors:
Syntactic error: missing or unexpected characters or tokens. Indicate what characters or tokens were expected.
Scope error: a violation of the language’s scope rules. Indicate which identifier was declared twice, or used without declaration.
Type error: a violation of the language’s type rules. Indicate which type rule was violated and/or what type was expected.
258
Run-Time Error Reporting
Common run-time errors Arithmetic overflow Division by zero Out-of-range array indexing Can be detected only at run-time, because they depend on values computed at run-time
259
Final Exam Review
The final exam is comprehensive in that:
  Essay questions will cover Chapters 5, 6, 7, 9
  Problem-oriented questions require knowledge from the entire semester
Exam structure: four questions
  Two essay questions: discuss, describe, compare & contrast, evaluate
  Two problems: develop a code template for a new language construct, determine the identification table for a given program, calculate size and address for given type(s)
260
Final Exam Review Chapter 5 – Contextual Analysis
Contextual analysis checks that the program conforms to the source language’s contextual constraints Scope rules Type rules Block Structure Monolithic Flat Nested Type Checking Literal Identifier Unary operator application Binary operator application Standard Environment
261
Final Exam Review Chapter 6 – Run-Time Organization
Key Issues Data representation Expression evaluation Storage allocation Routines Fundamental Principles of Data Representation Non-confusion: different values of a given type should have different representation Uniqueness: each value should always have same representation
262
Final Exam Review Chapter 6 – Run-Time Organization
Types Primitive types: cannot be decomposed Boolean Character Integer Records Disjoint unions Static arrays Dynamic arrays Recursive types For various types be able to determine size (storage required) and address (how to locate)
263
Final Exam Review Chapter 6 – Run-Time Organization
Expression Evaluation Stack machine Register machine Static storage allocation Global variables Stack storage allocation Local variables
264
Final Exam Review Chapter 7 – Code Generation
Translation of the source program to object code Dependent on source language and target machine Target Machines Registers, or stack, or both for intermediate results Instructions with zero, one, two, or three operands, or a mixture Single addressing mode, or many
265
Final Exam Review Chapter 7 – Code Generation
Code selection: which sequence of target machine instructions will be the object code for each phrase
Storage allocation: deciding the storage address of each variable in the source program
Register allocation: deciding which registers should be used to hold intermediate results during expression evaluation
266
Final Exam Review Chapter 9 – Programming Language Life-Cycle
Design Specification Prototype Compilers Manuals, textbooks
267
Final Exam Review Chapter 9 – Programming Language Life-Cycle
Strive for simplicity and regularity
Design Principles
  Type completeness: no operation should be arbitrarily restricted in the types of its operands
  Abstraction: for each phrase that specifies some kind of computation, there should be a way to abstract that phrase and parameterize it
  Correspondence: for each form of declaration there should be a corresponding parameter mechanism
Specifications
Prototype
Error Reporting: compile-time, run-time
268
Final Exam Review Structure of a Compiler
[Figure: compiler pipeline. Source code → Lexical Analyzer → tokens → Parser → parse tree → Semantic Analyzer (identification, type checking) → Intermediate Code Generation → intermediate representation → Optimization → intermediate representation → Assembly Code Generation → assembly code. The Symbol Table is shared by the phases.]