Introduction CPSC 388 Ellen Walker Hiram College.


1 Introduction CPSC 388 Ellen Walker Hiram College

2 Why Learn About Compilers? Practical application of important computer science theory Ties together computer architecture and programming Useful tools for developing language interpreters –Not just programming languages!

3 Computer Languages Machine language –Binary numbers stored in memory –Bits correspond directly to machine actions Assembly language –A “symbolic face” for machine language –Line-for-line translation High-level language (our goal!) –Closer to human expressions of problems, e.g. mathematical notation

4 Assembler vs. HLL
Assembler:
  Ldi $r1, 2   -- put the value 2 in R1
  Sto $r1, x   -- store that value in X
HLL:
  X = 2;

5 Characteristics of HLL’s Easier to learn (and remember) Machine independent –No knowledge of architecture needed –… as long as there is a compiler for that machine!

6 Early Milestones FORTRAN (Formula Translation) –IBM (John Backus) 1954-1957 –First High-level language, and first compiler Chomsky Hierarchy (1950’s) –Formal description of natural language structure –Ranks languages according to the complexity of their grammar

7 Chomsky Hierarchy Type 3: Regular languages –Too simple for programming languages –Good for tokens, e.g. numbers Type 2: Context Free languages –Standard representation of programming languages Type 1: Context Sensitive Languages Type 0: Unrestricted

8 Another View of the Hierarchy (figure: nested regions, RL inside CFL inside CSL)

9 Formal Language & Automata Theory Machines to recognize each language class –Turing Machine (computable languages) –Push-down Automaton (context-free languages) –Finite Automaton (regular languages) Use machines to prove that a given language belongs to a class Formally prove that a given language does not belong to a class
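As a small illustration of a finite automaton recognizing a regular language, the sketch below hand-codes a DFA for unsigned numbers such as 42 or 3.14. The state names, the transition table, and the function name `accepts_number` are assumptions made for this example and do not come from the slides.

```python
# Hypothetical DFA for numbers written as "digits" or "digits.digits".
# States: "start", "int" (accepting), "dot", "frac" (accepting).
TRANSITIONS = {
    ("start", "digit"): "int",
    ("int", "digit"): "int",
    ("int", "dot"): "dot",
    ("dot", "digit"): "frac",
    ("frac", "digit"): "frac",
}
ACCEPTING = {"int", "frac"}


def classify(ch):
    """Map a character to the input class used by the transition table."""
    if ch in "0123456789":
        return "digit"
    if ch == ".":
        return "dot"
    return "other"


def accepts_number(text):
    """Run the DFA over text; accept only if it ends in an accepting state."""
    state = "start"
    for ch in text:
        state = TRANSITIONS.get((state, classify(ch)))
        if state is None:  # no transition defined for this input: reject
            return False
    return state in ACCEPTING
```

The machine needs no memory beyond its current state, which is why this class of language suits tokens but not nested program structure.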

10 Practical Applications of Theory Translate from grammar to formal machine description Implement the formal machine to parse the language Tools: –Scanner Generator (RL / FA): LEX, FLEX –Parser Generator (CFL / PDA): YACC, Bison

11 Beyond Parsing Code generation Optimization –Techniques to “mindlessly” improve code –Usually after code generation –Rarely “optimal”, simply better

12 Phases of a Compiler
  Scanner -> tokens
  Parser -> syntax tree
  Semantic Analyzer -> annotated tree
  Source code optimizer -> intermediate code
  Code generator -> target code
  Target code optimizer -> better target code

13 Additional Tables Symbol table –Tracks all variable names and other symbols that will have to be mapped to addresses later Literal table –Tracks literals (such as numbers and strings) that will have to be stored along with the eventual program
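One possible shape for these two tables is sketched below. The class names, method names, and the attribute-record layout are assumptions chosen for illustration; a real compiler would typically also track scopes and storage addresses.

```python
class SymbolTable:
    """Maps each name to an attribute record, numbered in order of first use."""

    def __init__(self):
        self._entries = {}

    def insert(self, name, **attributes):
        # Create the record the first time a name is seen, then merge in
        # whatever attributes later phases discover (kind, type, ...).
        record = self._entries.setdefault(name, {"index": len(self._entries)})
        record.update(attributes)
        return record

    def lookup(self, name):
        """Return the record for name, or None if it was never inserted."""
        return self._entries.get(name)


class LiteralTable:
    """Stores each distinct literal once and hands back its slot number."""

    def __init__(self):
        self._slots = {}

    def intern(self, value):
        # Key on the type as well, so that 1 and 1.0 get separate slots.
        key = (type(value).__name__, value)
        return self._slots.setdefault(key, len(self._slots))
```

Storing each literal only once keeps the data emitted alongside the program small, and the slot number gives later phases a stable handle for it.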

14 Scanner Read a stream of characters Perform lexical analysis to generate tokens Update symbol and literal tables as needed
Example:
  Input: a[j] = 4 + 2
  Tokens: ID Lbrack ID Rbrack EQL NUM PLUS NUM
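A minimal scanner for statements of this shape can be sketched with Python's `re` module. The token names follow the slide; the regular expressions, the `SKIP` pseudo-token, and the function name `scan` are assumptions made for this example.

```python
import re

# Token names follow the slide; the patterns themselves are assumptions.
TOKEN_SPEC = [
    ("NUM", r"\d+"),
    ("ID", r"[A-Za-z_]\w*"),
    ("Lbrack", r"\["),
    ("Rbrack", r"\]"),
    ("EQL", r"="),
    ("PLUS", r"\+"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def scan(source):
    """Return a list of (token_kind, lexeme) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

The token kinds do not depend on which identifiers or numeric values appear, only on their lexical class. A fuller scanner would also enter each ID and NUM lexeme into the symbol and literal tables.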

15 Parser Performs syntax analysis Relates the sequence of tokens to the grammar Builds a tree that represents this relationship, the parse tree

16 Partial Grammar
  assign-expr -> expr = expr
  array-expr -> ID [ expr ]
  expr -> array-expr
  expr -> expr + expr
  expr -> ID
  expr -> NUM

17 Example Parse (figure: parse tree for the example statement; node labels: assign-expression, expression, array-expression, add-expression, ID, [, ], =, NUM, +)

18 Abstract Syntax Tree (figure: abstract syntax tree for the example statement; node labels: assign-expression, expression, array-expression, add-expression, ID, NUM)
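To show how a parser might turn the token sequence into an abstract syntax tree, here is a small recursive-descent sketch. Two assumptions are worth flagging: the partial grammar is left-recursive, so the sketch uses a restructured form (a loop over `+`) that accepts the same statements, and the nested-tuple node layout (`"assign"`, `"index"`, `"add"`, `"id"`, `"num"`) is invented for this example.

```python
def parse_assign(tokens):
    """Parse a list of (kind, lexeme) tokens into a nested-tuple AST."""
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else "EOF"

    def expect(kind):
        nonlocal pos
        if peek() != kind:
            raise SyntaxError(f"expected {kind}, found {peek()}")
        lexeme = tokens[pos][1]
        pos += 1
        return lexeme

    def term():
        if peek() == "NUM":
            return ("num", int(expect("NUM")))
        name = ("id", expect("ID"))
        if peek() == "Lbrack":              # array-expr -> ID [ expr ]
            expect("Lbrack")
            index = expr()
            expect("Rbrack")
            return ("index", name, index)
        return name

    def expr():
        node = term()
        while peek() == "PLUS":             # left-associative addition
            expect("PLUS")
            node = ("add", node, term())
        return node

    target = expr()
    expect("EQL")
    value = expr()
    if peek() != "EOF":
        raise SyntaxError(f"unexpected token {peek()} after statement")
    return ("assign", target, value)
```

Unlike a full parse tree, the result keeps no nodes for brackets or operator symbols; the node kind alone records that structure.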

19 Semantic Analyzer Determine the meaning (not structure) of the program This is “compile-time” or static semantics only Example: a[j] = 4 + 2 –a refers to an array location –a contains integers –j is an integer –j is in the range of the array (not checked in C) Parse or Syntax tree is “decorated” with this information
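The static checks listed above can be sketched as a recursive type computation over a syntax tree. The nested-tuple tree layout, the `symbols` dictionary format, and the function name `type_of` are all assumptions for this example. As in C, the sketch does not check that the subscript is within the array's bounds.

```python
def type_of(node, symbols):
    """Return the static type of an AST node, raising TypeError on misuse."""
    kind = node[0]
    if kind == "num":
        return "int"
    if kind == "id":
        if node[1] not in symbols:
            raise TypeError(f"undeclared name {node[1]!r}")
        return symbols[node[1]]
    if kind == "index":
        array_type = type_of(node[1], symbols)
        if not (isinstance(array_type, tuple) and array_type[0] == "array"):
            raise TypeError("subscripted value is not an array")
        if type_of(node[2], symbols) != "int":
            raise TypeError("array subscript must be an integer")
        return array_type[1]                # element type of the array
    if kind == "add":
        if type_of(node[1], symbols) != "int" or type_of(node[2], symbols) != "int":
            raise TypeError("operands of + must be integers")
        return "int"
    if kind == "assign":
        left, right = type_of(node[1], symbols), type_of(node[2], symbols)
        if left != right:
            raise TypeError(f"cannot assign {right} to {left}")
        return left
    raise ValueError(f"unknown node kind {kind!r}")
```

Here `symbols` would map `a` to `("array", "int")` and `j` to `"int"`. A fuller analyzer would attach each computed type to its node rather than only returning it, which is the "decoration" the slide describes.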

20 Source Code Optimizer Simplify and improve the source code by applying rules –Constant folding: replace “4+2” by 6 –Combine common sub-expressions –Reordering expressions (often prior to constant folding) –Etc. Result: modified, decorated syntax tree or Intermediate Representation
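Constant folding, the first rule above, can be sketched as a bottom-up rewrite of the syntax tree. The nested-tuple tree layout and the function name `fold_constants` are assumptions for this example, and only addition is handled.

```python
def fold_constants(node):
    """Return a copy of the tree with constant additions evaluated."""
    if node[0] in ("num", "id"):
        return node                         # leaves are already as simple as possible
    # Fold the children first so nested constants collapse bottom-up.
    children = tuple(fold_constants(child) for child in node[1:])
    if node[0] == "add" and all(child[0] == "num" for child in children):
        return ("num", children[0][1] + children[1][1])
    return (node[0],) + children
```

Applied to a tree whose right-hand side is the sum of the constants 4 and 2, the sketch replaces that subtree with the single constant 6 and leaves the array reference untouched.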

21 Code Generator Generates code for the target machine
Example:
  MOV R0, j     -- value of j into R0
  MUL R0, 2     -- 2*j in R0 (int = 2 wds)
  MOV R1, &a    -- value of a in R1
  ADD R1, R0    -- a+2*j in R1 (addr of a[j])
  MOV *R1, 6    -- 6 into address in R1
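A minimal sketch of how a code generator might emit this instruction sequence from an already-folded syntax tree follows. The tree layout, the function name `generate`, and the `words_per_element` parameter are assumptions; the sketch handles only the single statement shape `array[id] = constant`.

```python
def generate(tree, words_per_element=2):
    """Emit slide-style pseudo-assembly for a tree of the form array[id] = constant."""
    kind, target, value = tree
    if kind != "assign" or target[0] != "index" or value[0] != "num":
        raise NotImplementedError("sketch handles only array[id] = constant")
    _, (_, array), (index_kind, index) = target
    if index_kind != "id":
        raise NotImplementedError("sketch handles only a simple variable subscript")
    return [
        f"MOV R0, {index}",                 # value of the index variable into R0
        f"MUL R0, {words_per_element}",     # scale the index by the element size
        f"MOV R1, &{array}",                # base address of the array into R1
        "ADD R1, R0",                       # R1 now holds the address of array[index]
        f"MOV *R1, {value[1]}",             # store the constant at that address
    ]
```

A real generator would walk arbitrary trees and allocate registers as it goes; fixing R0 and R1 keeps the correspondence with the slide's listing easy to see.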

22 Target Code Optimizer Apply rules to improve machine code
Example:
  MOV R0, j
  SHL R0            (shift to multiply by 2)
  MOV &a[R0], 6     (use a more complex machine instruction to replace simpler ones)
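The first of these rules, replacing a multiplication by 2 with a shift, can be sketched as a peephole pass over a list of instruction strings. The instruction format and the function name `peephole` are assumptions for this example. The second rule, folding the address arithmetic into an indexed store, is not attempted here.

```python
def peephole(instructions):
    """Replace each 'MUL Rn, 2' with the cheaper shift 'SHL Rn'."""
    optimized = []
    for instruction in instructions:
        opcode, _, operands = instruction.partition(" ")
        parts = [part.strip() for part in operands.split(",")]
        if opcode == "MUL" and len(parts) == 2 and parts[1] == "2":
            optimized.append(f"SHL {parts[0]}")
        else:
            optimized.append(instruction)   # leave everything else unchanged
    return optimized
```

Each rewrite looks at one instruction in isolation, which is what makes this kind of optimization mechanical: it improves the code locally without any claim that the result is optimal.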

23 Major Data Structures Tokens Syntax Tree Symbol Table Literal Table Intermediate Code Temporary files

24 Structuring a Compiler Analysis vs. Synthesis –Analysis = understanding the source code –Synthesis = generating the target code Front end vs. Back end –Front end: parsing & intermediate code generation (target machine-independent) –Back end: target code generation Optimization included in both parts

25 Multiple Passes Each pass processes the source code once –One pass per phase –One pass for several phases –One pass for entire compilation Language definition can preclude one-pass compilation

26 Runtime Environments Static (e.g. FORTRAN) –No pointers, no dynamic allocation, no recursion –All memory allocation done prior to execution Stack-based (e.g. C family) –Stack for nested allocation (call/return) –Heap for random allocation (new) Fully dynamic (LISP) –Allocation is automatic (not in source code) –Garbage collection required

27 Error Handling Each phase finds and handles its own types of errors –Scanning: errors like: 1o1 (invalid ID) –Parsing: syntax errors –Semantic Analysis: type errors Runtime errors handled by the runtime environment –Exception handling by programmer often allowed

28 Compiling the Compiler Using machine language –Immediately executable, hard to write –Necessary for the first (FORTRAN) compiler Using a language with an existing compiler and the same target machine Using the language to be compiled (bootstrapping)

29 Bootstrapping Write a “quick & dirty” compiler for a subset of the language (using machine language or another available HLL) Write a complete compiler in the language subset Compile the complete compiler using the “quick & dirty” compiler

