Presentation is loading. Please wait.

Presentation is loading. Please wait.

Compiler construction 2002

Similar presentations


Presentation on theme: "Compiler construction 2002"— Presentation transcript:

0 Compiler construction in4020 – course 2001/2002
week 1 Compiler construction in4020 – course 2001/2002 Koen Langendoen Delft University of Technology The Netherlands

1 Compiler construction 2002
week 1 Goals understand the structure of a compiler understand how the components operate understand the tools involved scanner generator, parser generator, etc. understanding means [theory] be able to read source code [practice] be able to adapt/write source code

2 Format: “werkcollege” + practicum
Compiler construction 2002 week 1 Format: “werkcollege” + practicum 14 x 2 hours of interactive lectures sp book “Modern Compiler Design” schedule: see blackboard handouts: see blackboard assignment sp groups of 2 students modify reference compiler oral exam sp

3 Compiler construction 2002
week 1 Homework find a partner for the “practicum” register your group send to

4 Compiler construction 2002
week 1 What is a compiler? program in some source language executable code for target machine compiler Ask audience first.

5 Compiler construction 2002
week 1 What is a compiler? program in some source language front-end analysis semantic represen- tation back-end synthesis compiler executable code for target machine

6 Why study compilerconstruction?
week 1 Why study compilerconstruction? curiosity better understanding of programming language concepts wide applicability transforming “data” is very common many useful data structures and algorithms practical application of “theory” Ask audience first.

7 Compiler construction 2002
week 1 Overview lecture 1 [introduction] compiler structure exercise min. break lexical analysis excercise

8 Compiler construction 2002
week 1 Compiler structure program in some source language front-end analysis executable code for target machine back-end synthesis L+M modules = LxM compilers program in some source language front-end analysis semantic represen- tation executable code for target machine back-end synthesis compiler Ask audience about disadvantages BEFORE next slide. executable code for target machine back-end synthesis

9 Limitations of modular approach
Compiler construction 2002 week 1 Limitations of modular approach performance generic vs specific loss of information variations must be small same programming paradigm similar processor architecture program in some source language front-end analysis semantic represen- tation executable code for target machine back-end synthesis compiler

10 Semantic representation
Compiler construction 2002 week 1 Semantic representation program in some source language executable code for target machine semantic represen- tation front-end analysis back-end synthesis compiler heart of the compiler intermediate code linked lists of pseudo instructions abstract syntax tree (AST)

11 Compiler construction 2002
week 1 AST example expression grammar expression  expression ‘+’ term | expression ‘-’ term | term term  term ‘*’ factor | term ‘/’ factor | factor factor  identifier | constant | ‘(‘ expression ‘)’ example expression b*b – 4*a*c

12 Compiler construction 2002
week 1 parse tree: b*b – 4*a*c expression expression ‘-’ term term term factor ‘*’ term factor term factor identifier ‘*’ ‘*’ Ask if they notice anything redundant in the parse tree. (wat valt je op?) factor identifier factor identifier ‘c’ identifier ‘b’ constant ‘a’ ‘b’ ‘4’

13 Compiler construction 2002
week 1 AST: b*b – 4*a*c ‘-’ ‘*’ ‘*’ ‘b’ ‘b’ ‘*’ ‘c’ ‘4’ ‘a’

14 annotated AST: b*b – 4*a*c
Compiler construction 2002 week 1 annotated AST: b*b – 4*a*c type: real loc: reg1 ‘-’ type: real loc: reg1 type: real loc: reg2 ‘*’ ‘*’ type: real loc: sp+16 type: real loc: sp+16 type: real loc: reg2 type: real loc: sp+24 ‘b’ Colors denote types of the nodes. ‘b’ ‘*’ ‘c’ identifier constant term expression type: real loc: const type: real loc: sp+8 ‘4’ ‘a’

15 Compiler construction 2002
week 1 AST exercise (5 min.) expression grammar expression  expression ‘+’ term | expression ‘-’ term | term term  term ‘*’ factor | term ‘/’ factor | factor factor  identifier | constant | ‘(‘ expression ‘)’ example expression b*b – (4*a*c) draw parse tree and AST

16 Compiler construction 2002
week 1 Answers

17 answer parse tree: b*b – 4*a*c
Compiler construction 2002 week 1 answer parse tree: b*b – 4*a*c expression expression ‘-’ term term term factor ‘*’ term factor term factor identifier ‘*’ ‘*’ factor identifier factor identifier ‘c’ identifier ‘b’ constant ‘a’ ‘b’ ‘4’

18 answer parse tree: b*b – (4*a*c)
Compiler construction 2002 week 1 answer parse tree: b*b – (4*a*c) expression expression ‘-’ term term factor term factor ‘*’ ‘(’ expression ‘)’ factor identifier identifier ‘b’ ‘4*a*c’ ‘b’

19 Compiler construction 2002
week 1 Break

20 front-end: from program text to AST
Compiler construction 2002 week 1 front-end: from program text to AST program text lexical analysis syntax analysis context handling annotated AST tokens AST front-end

21 front-end: from program text to AST
Compiler construction 2002 week 1 front-end: from program text to AST program text lexical analysis syntax analysis context handling annotated AST tokens AST scanner generator token description parser generator language grammar

22 Compiler construction 2002
week 1 Lexical analysis covert stream of characters to stream of tokens what is a token? sequence of characters with a semantic notion, see language definition rule of thumb: two characters belong to the same token if inserting white space changes the meaning. digit = *ptr++ - ’0’; digit = *ptr+ + - ’0’; lex-i-cal: of or relating to words or the vocabulary of a language as distinguished from its grammar and construction Webster’s Dictionary

23 Compiler construction 2002
week 1 Lexical analysis covert stream of characters to stream of tokens what is a token? sequence of characters with a semantic notion, see language definition rule of thumb: two characters belong to the same token if inserting white space changes the meaning. digit = *ptr++ - ’0’; digit = *ptr+ + - ’0’;

24 Compiler construction 2002
week 1 Tokens attributes type lexeme value file position examples typedef struct { int class; char *repr; file_pos position; } Token_Type; type lexeme IDENTIFIER foo, t3, ptr NUMBER 15, 082, 666 REAL 1.2, .002, 1e6 IF if

25 Compiler construction 2002
week 1 Non-tokens white spaces spaces, tabs, newlines comments /* a C-style comment */ // a C++ comment preprocessor directives #include “lex.h” #define is_digit(d) (’0’ <= (d) && (d) <= ’9’) Q: what is special about the newline character? A: its representation depends on the operating system!

26 Compiler construction 2002
week 1 Regular expressions Basic patterns Matching x the character x . any character, usually except a newline [abcA-Z] any of the characters a,b,c and the range A-Z Repetition operators R? an R or nothing (= optionally an R) R* zero or more occurrences of R R+ one or more occurrences of R Composition operators R1 R2 an R1 followed by an R2 R1 | R2 either an R1 or an R2 Grouping ( R ) R itself

27 Examples of regular expressions
Compiler construction 2002 week 1 Examples of regular expressions an integer is a sequence of digits: [0-9]+ an identifier is a sequence of letters and digits; the first character must be a letter: [a-z][a-z0-9]*

28 Compiler construction 2002
week 1 Regular descriptions structuring regular expressions by introducing named sub expressions letter  [a-zA-Z] digit  [0-9] letter_or_digit  letter | digit identifier  letter letter_or_digit* define before use

29 Compiler construction 2002
week 1 Exercise (5 min.) write down regular descriptions for the following descriptions: an integral number is a non-zero sequence of digits optionally followed by a letter denoting the base class (b for binary and o for octal). a fixed-point number is an (optional) sequence of digits followed by a dot (’.’) followed by a sequence of digits. an identifier is a sequence of letters and digits; the first character must be a letter. The underscore _ counts as a letter, but may not be used as the first or last character.

30 Compiler construction 2002
week 1 Answers

31 Compiler construction 2002
week 1 Answers base  [bo] integral_number  digit+ base? dot  \. fixed_point_number  digit* dot digit+ letter  [a-zA-Z] digit  [0-9] underscore  _ letter_or_digit  letter | digit letter_or_digit_or_und  letter_or_digit | underscore identifier  letter (letter_or_digit_or_und* letter_or_digit+)? Gotcha: the dot character must be escaped by a backslash.

32 Compiler construction 2002
week 1 Lexical analysis covert stream of characters to stream of tokens tokens are defined by a regular description tokens are demanded one-by-one by the syntax analyzer get_next_token() program text lexical analyzer syntax analyzer AST tokens

33 Compiler construction 2002
week 1 interface extern Token_Type Token; /* Global variable that holds the current token. */ void start_lex(void); /* Must be called before the first call to * get_next_token(). void get_next_token(void); /* Load the next token into the global * variable Token. Q: why a global variable? A: syntax analyzer tries multiple alternatives + backwards compatibility (C could not return structs)

34 lexical analysis by hand
Compiler construction 2002 week 1 lexical analysis by hand read complete program text into memory for simplicity avoids buffering and arbitrary limits variable length tokens get_next_token() dispatches on the next character dot input: main() { printf( ”hello world\n”);}

35 Compiler construction 2002
week 1 void get_next_token(void) { int start_dot; skip_layout_and_comment(); /* now we are at the start of a token or at end-of-file, so: */ note_token_position(); /* split on first character of the token */ start_dot = dot; if (is_end_of_input(input_char)) { Token.class = EoF; Token.repr = "<EoF>"; return; } if (is_letter(input_char)) {recognize_identifier();} else if (is_digit(input_char)) {recognize_integer();} if (is_operator(input_char) || is_separator(input_char)) { Token.class = input_char; next_char(); else {Token.class = ERRONEOUS; next_char();} Token.repr = input_to_zstring(start_dot, dot-start_dot); Q: why must next_char() be invoked on an erroneous token? A: to avoid an endless loop.

36 Character classification & token recognition
Compiler construction 2002 week 1 Character classification & token recognition #define is_end_of_input(ch) ((ch) == '\0') #define is_layout(ch) (!is_end_of_input(ch) && (ch) <= ' ') #define is_uc_letter(ch) ('A' <= (ch) && (ch) <= 'Z') #define is_lc_letter(ch) ('a' <= (ch) && (ch) <= 'z') #define is_letter(ch) (is_uc_letter(ch) || is_lc_letter(ch)) #define is_digit(ch) ('0' <= (ch) && (ch) <= '9') #define is_letter_or_digit(ch) (is_letter(ch) || is_digit(ch)) #define is_underscore(ch) ((ch) == '_') #define is_operator(ch) (strchr("+-*/", (ch)) != NULL) #define is_separator(ch) (strchr(";,(){}", (ch)) != NULL) void recognize_integer(void) { Token.class = INTEGER; next_char(); while (is_digit(input_char)) {next_char();} }

37 Compiler construction 2002
week 1 Summary compiler is a structured toolbox front-end: program text  annotated AST back-end: annotated AST  executable code lexical analysis: program text  tokens token specifications implementation by hand exercises AST regular descriptions

38 Compiler construction 2002
week 1 Next week Generating a lexical analyzer generic methods specific tool lex program text lexical analysis syntax analysis context handling annotated AST tokens AST scanner generator token description

39 Compiler construction 2002
week 1 Homework find a partner for the “practicum” register your group send to print handout lecture 2 [blackboard]


Download ppt "Compiler construction 2002"

Similar presentations


Ads by Google