Compiler construction 2002


Compiler construction in4020 – course 2001/2002, week 1
Koen Langendoen, Delft University of Technology, The Netherlands

Goals
- understand the structure of a compiler
- understand how the components operate
- understand the tools involved: scanner generator, parser generator, etc.
- understanding means
  - [theory] being able to read source code
  - [practice] being able to adapt/write source code

Format: “werkcollege” (interactive lecture) + practicum (lab assignment)
- 14 x 2 hours of interactive lectures (1 sp)
  - book: “Modern Compiler Design”
  - schedule: see blackboard
  - handouts: see blackboard
- assignment (2 sp)
  - groups of 2 students
  - modify reference compiler
- oral exam (1 sp)

Homework
- find a partner for the “practicum”
- register your group: send e-mail to koen@pds.twi.tudelft.nl

What is a compiler?
[Diagram: program in some source language → compiler → executable code for target machine]
Speaker note: ask the audience first.

What is a compiler?
[Diagram: program in some source language → front-end (analysis) → semantic representation → back-end (synthesis) → executable code for target machine; front-end and back-end together form the compiler]

Why study compiler construction?
- curiosity
- better understanding of programming language concepts
- wide applicability: transforming “data” is very common
- many useful data structures and algorithms
- practical application of “theory”
Speaker note: ask the audience first.

Overview of lecture 1
- [introduction]
- compiler structure
- exercise
- 15 min. break
- lexical analysis
- exercise

Compiler structure
[Diagram: several front-ends (analysis) and several back-ends (synthesis) connected through one shared semantic representation; each front-end reads a program in some source language, each back-end produces executable code for a target machine]
- L front-ends + M back-ends (L+M modules) = L x M compilers
Speaker note: ask the audience about disadvantages before the next slide.

Limitations of modular approach
- performance
  - generic vs specific
  - loss of information
- variations must be small
  - same programming paradigm
  - similar processor architecture
[Diagram: program in some source language → front-end (analysis) → semantic representation → back-end (synthesis) → executable code for target machine]

Semantic representation
[Diagram: program in some source language → front-end (analysis) → semantic representation → back-end (synthesis) → executable code for target machine]
- heart of the compiler
- intermediate code
  - linked lists of pseudo instructions
  - abstract syntax tree (AST)

AST example
expression grammar:
    expression → expression ‘+’ term | expression ‘-’ term | term
    term → term ‘*’ factor | term ‘/’ factor | factor
    factor → identifier | constant | ‘(’ expression ‘)’
example expression: b*b - 4*a*c

parse tree: b*b - 4*a*c
[Figure: parse tree for b*b - 4*a*c, with expression, term, factor, identifier and constant nodes above the leaves ‘b’, ‘b’, ‘4’, ‘a’ and ‘c’]
Speaker note: ask if they notice anything redundant in the parse tree (what strikes you?).

AST: b*b - 4*a*c
[Figure: AST with root ‘-’; its left child is ‘*’ over ‘b’ and ‘b’; its right child is ‘*’ over (‘*’ over ‘4’ and ‘a’) and ‘c’]
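The AST above can be sketched as a small C data structure. This is only an illustration: the Node type and the helper names (new_leaf, new_op, to_prefix, build_example) are assumptions made for this sketch and do not come from the course's reference compiler.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical AST node: an inner node carries an operator,
 * a leaf carries the lexeme of an identifier or constant. */
typedef struct Node {
    char op;                    /* '+', '-', '*', '/', or 0 for a leaf */
    const char *leaf;           /* lexeme, only used when op == 0 */
    struct Node *left, *right;
} Node;

static Node *new_node(void) {
    Node *n = calloc(1, sizeof *n);
    if (n == NULL) abort();     /* out of memory: give up in this sketch */
    return n;
}

static Node *new_leaf(const char *text) {
    Node *n = new_node();
    n->leaf = text;
    return n;
}

static Node *new_op(char op, Node *left, Node *right) {
    Node *n = new_node();
    n->op = op;
    n->left = left;
    n->right = right;
    return n;
}

/* Append the prefix form of the tree to buf, e.g. "(- a b)". */
static void to_prefix(const Node *n, char *buf, size_t size) {
    size_t used = strlen(buf);
    if (n->op == 0) {
        snprintf(buf + used, size - used, "%s", n->leaf);
        return;
    }
    snprintf(buf + used, size - used, "(%c ", n->op);
    to_prefix(n->left, buf, size);
    used = strlen(buf);
    snprintf(buf + used, size - used, " ");
    to_prefix(n->right, buf, size);
    used = strlen(buf);
    snprintf(buf + used, size - used, ")");
}

/* Build the AST for b*b - 4*a*c by hand; '*' is left-associative,
 * so 4*a*c becomes (4*a)*c. */
static Node *build_example(void) {
    Node *left  = new_op('*', new_leaf("b"), new_leaf("b"));
    Node *right = new_op('*', new_op('*', new_leaf("4"), new_leaf("a")),
                         new_leaf("c"));
    return new_op('-', left, right);
}
```

Calling to_prefix(build_example(), buf, sizeof buf) on an empty buffer yields (- (* b b) (* (* 4 a) c)). The grammar symbols expression, term and factor no longer appear; only the operators and operands survive in the AST.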

annotated AST: b*b - 4*a*c
[Figure: the AST with a type and location annotation on every node; colors denote the kinds of the nodes: identifier, constant, term, expression]
- ‘-’ (root): type: real, loc: reg1
- left ‘*’: type: real, loc: reg1
  - ‘b’: type: real, loc: sp+16
  - ‘b’: type: real, loc: sp+16
- right ‘*’: type: real, loc: reg2
  - inner ‘*’: type: real, loc: reg2
    - ‘4’: type: real, loc: const
    - ‘a’: type: real, loc: sp+8
  - ‘c’: type: real, loc: sp+24

AST exercise (5 min.)
expression grammar:
    expression → expression ‘+’ term | expression ‘-’ term | term
    term → term ‘*’ factor | term ‘/’ factor | factor
    factor → identifier | constant | ‘(’ expression ‘)’
example expression: b*b - (4*a*c)
draw parse tree and AST

Answers

answer parse tree: b*b - 4*a*c
[Figure: parse tree for b*b - 4*a*c, with expression, term, factor, identifier and constant nodes above the leaves ‘b’, ‘b’, ‘4’, ‘a’ and ‘c’]

answer parse tree: b*b - (4*a*c)
[Figure: parse tree for b*b - (4*a*c); the term to the right of ‘-’ derives a factor of the form ‘(’ expression ‘)’, whose inner expression is abbreviated as ‘4*a*c’]

Break

front-end: from program text to AST
[Diagram: program text → lexical analysis → tokens → syntax analysis → AST → context handling → annotated AST; the three phases together form the front-end]

front-end: from program text to AST
[Diagram: program text → lexical analysis → tokens → syntax analysis → AST → context handling → annotated AST; a scanner generator produces the lexical analysis from a token description, and a parser generator produces the syntax analysis from a language grammar]

Lexical analysis
- convert stream of characters to stream of tokens
- what is a token? a sequence of characters with a semantic notion; see the language definition
- rule of thumb: two characters belong to the same token if inserting white space between them changes the meaning.
    digit = *ptr++ - '0';
    digit = *ptr+ + - '0';
- lex-i-cal: of or relating to words or the vocabulary of a language as distinguished from its grammar and construction (Webster’s Dictionary)

Tokens
attributes:
- type
- lexeme / value
- file position

    typedef struct {
        int class;
        char *repr;
        file_pos position;
    } Token_Type;

examples:
    type          lexeme
    IDENTIFIER    foo, t3, ptr
    NUMBER        15, 082, 666
    REAL          1.2, .002, 1e6
    IF            if
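A minimal sketch of how such a token record might be filled in and reported follows. The slide does not define file_pos or the numeric class codes, so the layout of file_pos, the enum values, and the helpers class_name and describe_token are all assumptions made for illustration.

```c
#include <stdio.h>

/* Assumed class codes. Starting at 256 leaves the values 0..255 free,
 * so a single-character token can use its character code as its class. */
enum { IDENTIFIER = 256, NUMBER, REAL, IF };

/* Assumed layout; the slide leaves file_pos unspecified. */
typedef struct {
    const char *file_name;
    int line_number;
    int char_number;
} file_pos;

typedef struct {
    int class;          /* token type (a valid name in C, a keyword in C++) */
    char *repr;         /* the lexeme */
    file_pos position;  /* where the token starts */
} Token_Type;

/* Map a class code to a printable name. */
static const char *class_name(int class) {
    switch (class) {
    case IDENTIFIER: return "IDENTIFIER";
    case NUMBER:     return "NUMBER";
    case REAL:       return "REAL";
    case IF:         return "IF";
    default:         return "UNKNOWN";
    }
}

/* Format a token as file:line:column: CLASS "lexeme". */
static int describe_token(const Token_Type *t, char *buf, size_t size) {
    return snprintf(buf, size, "%s:%d:%d: %s \"%s\"",
                    t->position.file_name, t->position.line_number,
                    t->position.char_number, class_name(t->class), t->repr);
}
```

For an IDENTIFIER token with lexeme ptr at line 3, column 9 of a file named lex.c, describe_token produces lex.c:3:9: IDENTIFIER "ptr". Keeping the position in every token is what lets later phases report errors at the right place.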

Non-tokens
- white space: spaces, tabs, newlines
- comments:
    /* a C-style comment */
    // a C++ comment
- preprocessor directives:
    #include "lex.h"
    #define is_digit(d) ('0' <= (d) && (d) <= '9')
Q: what is special about the newline character?
A: its representation depends on the operating system!
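Skipping these non-tokens could look roughly like the sketch below. It is an assumption-laden illustration, not the reference compiler's routine: the signature (a text pointer plus a position) is invented here, and preprocessor directives are deliberately not handled.

```c
#include <stddef.h>

/* Return the index of the first character at or after pos that is
 * neither layout nor part of a comment. Hypothetical signature. */
static size_t skip_layout_and_comment(const char *text, size_t pos) {
    for (;;) {
        unsigned char ch = (unsigned char)text[pos];
        if (ch != '\0' && ch <= ' ') {
            pos++;                              /* space, tab, newline, ... */
        } else if (ch == '/' && text[pos + 1] == '*') {
            pos += 2;                           /* inside a C-style comment */
            while (text[pos] != '\0' &&
                   !(text[pos] == '*' && text[pos + 1] == '/')) {
                pos++;
            }
            if (text[pos] != '\0') pos += 2;    /* step over the closing marker */
        } else if (ch == '/' && text[pos + 1] == '/') {
            while (text[pos] != '\0' && text[pos] != '\n') {
                pos++;                          /* comment runs to end of line */
            }
        } else {
            return pos;                         /* start of a token, or end of input */
        }
    }
}
```

An unterminated C-style comment simply runs to the end of the input in this sketch; a real scanner would report it as an error.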

Regular expressions
    Basic patterns           Matching
    x                        the character x
    .                        any character, usually except a newline
    [abcA-Z]                 any of the characters a, b, c and the range A-Z

    Repetition operators
    R?                       an R or nothing (= optionally an R)
    R*                       zero or more occurrences of R
    R+                       one or more occurrences of R

    Composition operators
    R1 R2                    an R1 followed by an R2
    R1 | R2                  either an R1 or an R2

    Grouping
    ( R )                    R itself

Examples of regular expressions
- an integer is a sequence of digits: [0-9]+
- an identifier is a sequence of letters and digits; the first character must be a letter: [a-z][a-z0-9]*

Regular descriptions
- structuring regular expressions by introducing named sub-expressions
    letter → [a-zA-Z]
    digit → [0-9]
    letter_or_digit → letter | digit
    identifier → letter letter_or_digit*
- define before use

Exercise (5 min.)
write down regular descriptions for the following:
- an integral number is a non-empty sequence of digits optionally followed by a letter denoting the base class (b for binary and o for octal).
- a fixed-point number is an (optional) sequence of digits followed by a dot (‘.’) followed by a sequence of digits.
- an identifier is a sequence of letters and digits; the first character must be a letter. The underscore _ counts as a letter, but may not be used as the first or last character.

Answers
    digit → [0-9]
    base → [bo]
    integral_number → digit+ base?

    dot → \.
    fixed_point_number → digit* dot digit+

    letter → [a-zA-Z]
    underscore → _
    letter_or_digit → letter | digit
    letter_or_digit_or_und → letter_or_digit | underscore
    identifier → letter (letter_or_digit_or_und* letter_or_digit+)?

Gotcha: the dot character must be escaped by a backslash.
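One way to try these answers out is with POSIX extended regular expressions (regcomp and regexec from regex.h), which is a swapped-in tool here rather than anything the lecture prescribes. The named sub-expressions become string macros that are glued together by C string-literal concatenation. The macro names and the matches helper are assumptions made for this sketch.

```c
#include <regex.h>
#include <stddef.h>
#include <stdio.h>

/* Named sub-expressions, mirroring the regular description. */
#define DIGIT_RE                  "[0-9]"
#define LETTER_RE                 "[a-zA-Z]"
#define LETTER_OR_DIGIT_RE        "[a-zA-Z0-9]"
#define LETTER_OR_DIGIT_OR_UND_RE "[a-zA-Z0-9_]"

/* Adjacent string literals are concatenated by the compiler. */
#define INTEGRAL_NUMBER_RE    DIGIT_RE "+[bo]?"
#define FIXED_POINT_NUMBER_RE DIGIT_RE "*\\." DIGIT_RE "+"   /* escaped dot */
#define IDENTIFIER_RE \
    LETTER_RE "(" LETTER_OR_DIGIT_OR_UND_RE "*" LETTER_OR_DIGIT_RE "+)?"

/* Return 1 if the whole of text matches pattern, 0 otherwise. */
static int matches(const char *pattern, const char *text) {
    char anchored[256];
    regex_t re;
    int ok;

    /* Anchor the pattern so that partial matches do not count. */
    snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0) {
        return 0;               /* treat a bad pattern as "no match" */
    }
    ok = (regexec(&re, text, 0, NULL, 0) == 0);
    regfree(&re);
    return ok;
}
```

With these definitions, matches(IDENTIFIER_RE, "a_b") succeeds while matches(IDENTIFIER_RE, "a_") and matches(IDENTIFIER_RE, "_a") fail, which is the underscore rule from the exercise. Note the doubled backslash: the C compiler consumes one level of escaping before the regular expression library sees the pattern.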

Lexical analysis
- convert stream of characters to stream of tokens
- tokens are defined by a regular description
- tokens are demanded one-by-one by the syntax analyzer: get_next_token()
[Diagram: program text → lexical analyzer → tokens → syntax analyzer → AST; the syntax analyzer calls get_next_token()]

interface

    extern Token_Type Token;
    /* Global variable that holds the current token. */

    void start_lex(void);
    /* Must be called before the first call to
     * get_next_token().
     */

    void get_next_token(void);
    /* Load the next token into the global
     * variable Token.
     */

Q: why a global variable?
A: the syntax analyzer tries multiple alternatives + backwards compatibility (C could not return structs)

lexical analysis by hand
- read complete program text into memory
  - for simplicity
  - avoids buffering and arbitrary limits (variable-length tokens)
- get_next_token() dispatches on the next character
[Diagram: the pointer dot marks the current position in the input: main() { printf("hello world\n");}]

    void get_next_token(void) {
        int start_dot;

        skip_layout_and_comment();
        /* now we are at the start of a token or at end-of-file, so: */
        note_token_position();

        /* split on first character of the token */
        start_dot = dot;
        if (is_end_of_input(input_char)) {
            Token.class = EoF; Token.repr = "<EoF>"; return;
        }
        if (is_letter(input_char)) {recognize_identifier();}
        else if (is_digit(input_char)) {recognize_integer();}
        else if (is_operator(input_char) || is_separator(input_char)) {
            Token.class = input_char; next_char();
        }
        else {Token.class = ERRONEOUS; next_char();}

        Token.repr = input_to_zstring(start_dot, dot-start_dot);
    }

Q: why must next_char() be invoked on an erroneous token?
A: to avoid an endless loop.

Character classification & token recognition

    #define is_end_of_input(ch)    ((ch) == '\0')
    #define is_layout(ch)          (!is_end_of_input(ch) && (ch) <= ' ')

    #define is_uc_letter(ch)       ('A' <= (ch) && (ch) <= 'Z')
    #define is_lc_letter(ch)       ('a' <= (ch) && (ch) <= 'z')
    #define is_letter(ch)          (is_uc_letter(ch) || is_lc_letter(ch))
    #define is_digit(ch)           ('0' <= (ch) && (ch) <= '9')
    #define is_letter_or_digit(ch) (is_letter(ch) || is_digit(ch))
    #define is_underscore(ch)      ((ch) == '_')

    #define is_operator(ch)        (strchr("+-*/", (ch)) != NULL)
    #define is_separator(ch)       (strchr(";,(){}", (ch)) != NULL)

    void recognize_integer(void) {
        Token.class = INTEGER; next_char();
        while (is_digit(input_char)) {next_char();}
    }
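The fragments on the last few slides can be assembled into one compilable sketch. Several pieces are assumptions filled in for illustration and are not taken from the slides: the numeric class codes, start_lex taking the program text as a string instead of reading a file, a skip_layout that ignores comments, the body of recognize_identifier, and the body of input_to_zstring. Token positions are left out.

```c
#include <stdlib.h>
#include <string.h>

/* Assumed class codes, above the range of single characters. */
#define EoF        256
#define IDENTIFIER 257
#define INTEGER    258
#define ERRONEOUS  259

typedef struct { int class; char *repr; } Token_Type;  /* position omitted */
Token_Type Token;               /* the current token */

static const char *input;       /* the complete program text */
static int dot;                 /* index of the next character to inspect */
#define input_char  (input[dot])
#define next_char() (dot++)

#define is_end_of_input(ch) ((ch) == '\0')
#define is_layout(ch)       (!is_end_of_input(ch) && (ch) <= ' ')
#define is_letter(ch)       (('A' <= (ch) && (ch) <= 'Z') || \
                             ('a' <= (ch) && (ch) <= 'z'))
#define is_digit(ch)        ('0' <= (ch) && (ch) <= '9')
#define is_operator(ch)     (strchr("+-*/", (ch)) != NULL)
#define is_separator(ch)    (strchr(";,(){}", (ch)) != NULL)

/* Assumption: the text is handed over as a string, not read from a file. */
void start_lex(const char *text) { input = text; dot = 0; }

/* Simplified: skips layout only, comments are not recognized. */
static void skip_layout(void) {
    while (is_layout(input_char)) next_char();
}

static void recognize_identifier(void) {
    Token.class = IDENTIFIER; next_char();
    while (is_letter(input_char) || is_digit(input_char)) next_char();
}

static void recognize_integer(void) {
    Token.class = INTEGER; next_char();
    while (is_digit(input_char)) next_char();
}

/* Copy length characters from the input into a fresh C string.
 * The caller owns the result; this sketch never frees it. */
static char *input_to_zstring(int start, int length) {
    char *s = malloc((size_t)length + 1);
    if (s == NULL) abort();
    memcpy(s, input + start, (size_t)length);
    s[length] = '\0';
    return s;
}

void get_next_token(void) {
    int start_dot;

    skip_layout();
    start_dot = dot;
    /* End of input must be tested first: strchr() also finds the
     * terminating '\0', so is_operator('\0') would be true. */
    if (is_end_of_input(input_char)) {
        Token.class = EoF; Token.repr = "<EoF>"; return;
    }
    if (is_letter(input_char)) recognize_identifier();
    else if (is_digit(input_char)) recognize_integer();
    else if (is_operator(input_char) || is_separator(input_char)) {
        Token.class = input_char; next_char();
    } else {
        Token.class = ERRONEOUS; next_char();   /* always make progress */
    }
    Token.repr = input_to_zstring(start_dot, dot - start_dot);
}
```

After start_lex("x1 = 42 ;"), successive calls to get_next_token() deliver an IDENTIFIER x1, an ERRONEOUS token for = (it is in neither the operator nor the separator set), an INTEGER 42, the separator ;, and finally EoF.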

Summary
- compiler is a structured toolbox
  - front-end: program text → annotated AST
  - back-end: annotated AST → executable code
- lexical analysis: program text → tokens
  - token specifications
  - implementation by hand
- exercises
  - AST
  - regular descriptions

Next week
- Generating a lexical analyzer
  - generic methods
  - specific tool: lex
[Diagram: program text → lexical analysis → tokens → syntax analysis → AST → context handling → annotated AST; a scanner generator produces the lexical analysis from a token description]

Homework
- find a partner for the “practicum”
- register your group: send e-mail to koen@pds.twi.tudelft.nl
- print handout lecture 2 [blackboard]