Presentation is loading. Please wait.

Presentation is loading. Please wait.

國立台灣大學 資訊工程學系 薛智文 98 Spring Compiler TH 234, DTH 103.

Similar presentations


Presentation on theme: "國立台灣大學 資訊工程學系 薛智文 98 Spring Compiler TH 234, DTH 103."— Presentation transcript:

1 國立台灣大學 資訊工程學系 薛智文 cwhsueh@csie.ntu.edu.tw http://www.csie.ntu.edu.tw/~cwhsueh/ 98 Spring Compiler TH 234, DTH 103

2 資工系網媒所 NEWS 實驗室 08:37 /271

3 資工系網媒所 NEWS 實驗室 08:37 /272 Why Study Compilers? 1.Excellent software-engineering example --- theory meets practice. 2.Essential software tool. 3.Influences hardware design, e.g., RISC, VLIW. 4.Tools (mostly “ optimization ” ) for enhancing software reliability and security.

4 資工系網媒所 NEWS 實驗室 08:37 /273 Compilers & Architecture Modern architectures have very complex structures, especially opportunities for parallel execution. Sequential programs can only make effective use of these features via an optimizing compiler. Hardware question: If we implemented this, could a compiler use it?

5 資工系網媒所 NEWS 實驗室 08:37 /274 Software Reliability Optimization technology (data-flow analysis) used in: Lock/unlock errors. Buffers not range-checked. Memory Leaks. SQL injection bugs..

6 資工系網媒所 NEWS 實驗室 08:37 /275 What this Course Offers? Compiler methodology for both compiler implementation and related applications. Theoretical framework. Key algorithms. Hands-on experience. Nongoal: build a complete optimizing compiler.

7 資工系網媒所 NEWS 實驗室 08:37 /276 Course Outline Part 1 --- Introduction. Part 2 --- Scanner. Part 3 --- Parser. Part 4 --- Syntax-Directed Translation. Part 5 --- Symbol Table. Part 6 --- Intermediate Code Generation. Part 7 --- Run Time Storage Organization. Part 8 --- Optimization. Part 9 --- How to Write a Compiler. Part 10 --- A Simple Code Generation (PSEUDO) Example.

8 資工系網媒所 NEWS 實驗室 08:37 /277 Introduction Compiler is one of language processors. input Compiler source program target program output

9 資工系網媒所 NEWS 實驗室 08:37 /278 What is a Compiler? Definitions: The software system translates description of computations into a program executable by a computer. Source and target must be equivalent! Compiler writing spans: programming languages; machine architecture; language theory; algorithms and data structures; software engineering. History: 1950: the first FORTRAN compiler took 18 man-years; now: using software tools, can be done in a few months as a student’s project. input Compiler source program target program output

10 資工系網媒所 NEWS 實驗室 08:37 /279 An Interpreter input Interpreter source program output

11 資工系網媒所 NEWS 實驗室 08:37 /2710 A Hybrid Compiler Translator source program input Virtual Machine intermediate program output

12 資工系網媒所 NEWS 實驗室 08:37 /2711 A Language-Processing System source program library files relocatable object files Preprocessor modified source program Compiler target assembly program Assembler relocatable machine code Linker/Loader target machine code

13 資工系網媒所 NEWS 實驗室 08:37 /2712 Applications Computer language compilers. Translator: from one format to another. query interpreter text formatter silicon compiler infix notation  postfix notation: pretty printers · · · Software productivity tools. 3 + 5 – 6 * 63 5 + 6 6 * –

14 資工系網媒所 NEWS 實驗室 08:37 /2713 Relations with Computational Theory Computational theory: a set of grammar rules ≡ the definition of a particular machine. also equivalent to a set of languages recognized by this machine. a type of machines: a family of machines with a given set of operations, or capabilities; power of a type of machines ≡ the set of languages that can be recognized by this type of machines.

15 資工系網媒所 NEWS 實驗室 08:37 /2714 A Language-Processing System source program library files relocatable object files Preprocessor modified source program Compiler target assembly program Assembler relocatable machine code Linker/Loader target machine code

16 資工系網媒所 NEWS 實驗室 08:37 /2715 Phases of a Compiler character stream Lexical Analyzer (scanner) token stream Machine-Independent Code Optimizer optimized intermediate representation Code Generator relocatable machine code Machine-Dependent Code Optimizer target-machine code Semantic Analyzer annotated abstract-syntax tree Syntax Analyzer (parser) abstract-syntax tree Intermediate Code Generator intermediate representation Symbol Table Error Handling front-end back-end

17 資工系網媒所 NEWS 實驗室 08:37 /2716 Lexical Analyzer (Scanner) Actions: Reads characters from the source program; lexemes pattern Groups characters into lexemes, i.e., sequences of characters that “go together”, following a given pattern ; token Each lexeme corresponds to a token. the scanner returns the next token, plus maybe some additional information, to the parser; The scanner may also discover lexical errors, i.e., erroneous characters. lexemetokenbad character The definitions of what a lexeme, token or bad character is depend on the definition of the source language.

18 資工系網媒所 NEWS 實驗室 08:37 /2717 Scanner Example for C Lexeme: C statement position = initial + rate * 60; (Lexeme) position = initial + rate * 60 ; (Token) ID ASSIGN ID PLUS ID TIME INT SEMI-COL Arbitrary number of blanks (white spaces) between lexemes. Erroneous sequence of characters, that are not parts of comments, for the C language: control characters @ 2abc 1position… 2initial… 3rate… Symbol Table

19 資工系網媒所 NEWS 實驗室 08:37 /2718 Syntax Analyzer (Parser) Actions: grammatical phrases Group tokens into grammatical phrases, to discover the underlying structure of the source syntax errors Find syntax errors, e.g., the following C source line: (Lexeme) index = 12 * ; (Token) ID ASSIGN INT TIMES SEMI-COL Every token is legal, but the sequence is erroneous! static semantic errors May find some static semantic errors, e.g., use of undeclared variables or multiple declared variables. May generate code, or build some intermediate representation of the source program, such as an abstract-syntax tree.

20 資工系網媒所 NEWS 實驗室 08:37 /2719 Source code: position = initial + rate * 60 Abstract-syntax tree: interior nodes of the tree are OPERATORS; a node’s children are its OPERANDS; logical unit each subtree forms a logical unit. the subtree with * at its root shows that * has higher precedence than +, the operation “rate * 60” must be performed as a unit, not “initial + rate”. Where is ”;”? Parser Example for C = + * 60 1position… 2initial… 3rate… Symbol Table

21 資工系網媒所 NEWS 實驗室 08:37 /2720 Semantic Analyzer Actions: type errors Check for more static semantic errors, e.g., type errors. May annotate and/or change the abstract syntax tree. = + * 60 = + * int_to_float 60 1position… 2initial… 3rate… Symbol Table

22 資工系網媒所 NEWS 實驗室 08:37 /2721 Intermediate Code Generator Actions: translate from abstract-syntax trees to intermediate codes. One choice for intermediate code is 3-address code : Each statement contains at most 3 operands; in addition to “=”, i.e., assignment, at most one operator. An ”easy” and “universal” format that can be translated into most assembly languages. t 1 = int_to_float(60) t 2 = id 3 * t 1 t 3 = id 2 + t 2 id 1 = t 3 = + * int_to_float 60

23 資工系網媒所 NEWS 實驗室 08:37 /2722 Optimizer Improve the efficiency of intermediate code. faster Goal may be to make code run faster, and/or to use the least number of registers · · · Current trends: to obtain smaller, but maybe slower, equivalent code for embedded systems; to reduce power consumption. t 1 = id 3 * 60.0 id 1 = id 2 + t 1 t 1 = int_to_float(60) t 2 = id 3 * t 1 t 3 = id 2 + t 2 id 1 = t 3

24 資工系網媒所 NEWS 實驗室 08:37 /2723 Code Generation A compiler may generate which is rare pure machine codes (machine dependent assembly language) directly, which is rare now ; virtual machine code. Example: PASCAL  compiler  P-code  interpreter  execution Speed is roughly 4 times slower than running directly generated machine codes. Advantages: simplify the job of a compiler; 1/3 for P-code decrease the size of the generated code: 1/3 for P-code ; can be run easily on a variety of platforms P-machine is an ideal general machine whose interpreter can be written easily; divide and conquer; recent example: JAVA and Byte-code.

25 資工系網媒所 NEWS 實驗室 08:37 /2724 Code Generation Example LDF R 2, id 3 MULF R 2, R 2, #60.0 LDF R 1, id 2 ADDF R 1, R 1, R 2 STF id 1, R 1 t 1 = id 3 * 60.0 id 1 = id 2 + t 1

26 資工系網媒所 NEWS 實驗室 08:37 /2725 Practical Considerations (1/2) Preprocessing phase: macro substitution: #define MAXC 10 rational preprocessing: add new features for old languages. BASIC C  C ++ compiler directives: #include non-standard language extensions. adding parallel primitives

27 資工系網媒所 NEWS 實驗室 08:37 /2726 Practical Considerations (2/2) Passes of compiling A pass reads an input file and writes an output file. First pass reads the text file once. May need to read the text one more time for any forward addressed objects, i.e., anything that is used before its declaration. Example: C language goto error_handling; · · · error_handling: · · ·

28 資工系網媒所 NEWS 實驗室 08:37 /2727 Reduce Number of Passes Each pass takes I/O time. Back-patching Back-patching : leave a blank slot for missing information, and fill in the empty slot when the information becomes available. Example: C language when a label is used trace if it is not defined before, save a trace into the to-be-processed table label name corresponds to LABEL TABLE[i] code generated: GOTO LABEL TABLE[i] when a label is defined check known labels for redefined labels if it is not used before, save a trace into the to-be-processed table if it is used before, then find its trace and fill the current address into the trace Time and space trade-off !


Download ppt "國立台灣大學 資訊工程學系 薛智文 98 Spring Compiler TH 234, DTH 103."

Similar presentations


Ads by Google