Introduction to Compilers Professor Yihjia Tsai 2006 Spring Tamkang University.


2 What is a compiler?
A compiler translates source code into target code.
– Source code is typically written in a high-level programming language (Java, C++, etc.), but it does not have to be.
– Target code is often a low-level language such as assembly or machine code, but it does not have to be.
Can you think of other compilers you have used, according to this definition?

3 Before we begin
A-Z, a-z, 0-9
" double quote
# hash
$ dollar sign
% percent
& ampersand
' single quote
( left parenthesis
) right parenthesis
* star
+ plus
, comma
- hyphen, minus
/ slash
: colon
; semicolon
< less than
= equal

4 Symbols
> greater than
? question mark
@ at sign
[ left (open) square bracket
\ backslash
] right (close) square bracket
^ caret, power
_ underscore
` back quote
{ open brace
| or (vertical bar)
} close brace
~ tilde
. period, dot
• bullet

5 Greek symbols
α alpha, β beta, γ gamma, δ delta, ε epsilon, φ phi, ζ zeta, θ theta, ι iota, κ kappa, λ lambda, μ mu, ν nu, ξ xi, π pi, ρ rho, σ sigma, τ tau, χ chi, ψ psi, η eta, ω omega

6 Other Compilers
– Javadoc -> HTML
– XML -> HTML
– SQL query -> output table
– PostScript -> PDF
– High-level description of a circuit -> machine instructions to fabricate the circuit

7 The Compilation Process

8 The Analysis Stage
The analysis stage is broken up into four phases:
– Lexical analysis (also called scanning or tokenization)
– Parsing
– Semantic analysis
– Intermediate code generation

9 Lexing Example
Source:
    double d1;
    double d2;
    d2 = d1 * 2.0;
Lexemes and the tokens produced for them:
    Lexeme   Token             Attribute
    double   TOK_DOUBLE        reserved word
    d1       TOK_ID            variable name
    ;        TOK_PUNCT         has value of ";"
    double   TOK_DOUBLE        reserved word
    d2       TOK_ID            variable name
    ;        TOK_PUNCT         has value of ";"
    d2       TOK_ID            variable name
    =        TOK_OPER          has value of "="
    d1       TOK_ID            variable name
    *        TOK_OPER          has value of "*"
    2.0      TOK_FLOAT_CONST   has value of 2.0
    ;        TOK_PUNCT         has value of ";"
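A minimal sketch of a scanner that produces the token stream for this example, assuming a regular-expression-driven design. The token names come from the slide; the `Token` class, the `tokenize` function, and the rule that only `double` is reserved are assumptions for illustration, and only the token classes that appear in the example are covered.

```python
import re
from typing import NamedTuple


class Token(NamedTuple):
    kind: str    # token class, e.g. "TOK_ID"
    lexeme: str  # the exact source text that was matched


# Alternatives are tried left to right, so the float pattern must precede identifiers.
TOKEN_SPEC = [
    ("TOK_FLOAT_CONST", r"\d+\.\d+"),
    ("TOK_ID", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("TOK_OPER", r"[=*+\-/]"),
    ("TOK_PUNCT", r";"),
    ("SKIP", r"\s+"),
]
# Hypothetical keyword table: only "double" is reserved in this sketch.
KEYWORDS = {"double": "TOK_DOUBLE"}
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source: str) -> list[Token]:
    """Split source into (kind, lexeme) tokens, discarding whitespace."""
    tokens: list[Token] = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        kind, lexeme = match.lastgroup, match.group()
        pos = match.end()
        if kind == "SKIP":
            continue
        if kind == "TOK_ID" and lexeme in KEYWORDS:
            kind = KEYWORDS[lexeme]  # reserved words match the identifier pattern first
        tokens.append(Token(kind, lexeme))
    return tokens


for tok in tokenize("double d1; double d2; d2 = d1 * 2.0;"):
    print(f"{tok.lexeme:<8}{tok.kind}")
```

Checking reserved words after matching the identifier pattern keeps the regular expressions simple: a keyword is just an identifier whose spelling appears in a table.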

10 Syntax and Semantics
– Syntax: the form or structure of expressions, i.e., whether an expression is well formed
– Semantics: the meaning of an expression

11 Syntactic Structure
Syntax is almost always expressed using some variant of a notation called a context-free grammar (CFG), or simply a grammar:
– BNF
– EBNF

12 A CFG Has 4 Parts
– A set of tokens (lexemes), known as terminal symbols
– A set of non-terminal symbols
– A set of rules (productions), where each production consists of a left-hand side (LHS) and a right-hand side (RHS). The LHS is a non-terminal and the RHS is a sequence of terminal and/or non-terminal symbols.
– A special non-terminal symbol designated as the start symbol
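The four parts map directly onto a data structure. A minimal sketch, assuming a hypothetical `Grammar` dataclass that stores productions as (LHS, RHS) pairs, with an empty RHS standing for ε; `validate` checks the constraints listed above. The digit-sequence grammar used to exercise it is illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    terminals: frozenset[str]
    nonterminals: frozenset[str]
    productions: tuple[tuple[str, tuple[str, ...]], ...]  # (LHS, RHS) pairs
    start: str

    def validate(self) -> None:
        """Raise ValueError if any of the four parts is inconsistent."""
        if self.terminals & self.nonterminals:
            raise ValueError("terminals and non-terminals must be disjoint")
        if self.start not in self.nonterminals:
            raise ValueError("the start symbol must be a non-terminal")
        symbols = self.terminals | self.nonterminals
        for lhs, rhs in self.productions:
            if lhs not in self.nonterminals:
                raise ValueError(f"LHS {lhs!r} is not a non-terminal")
            for sym in rhs:  # an empty RHS (epsilon) passes trivially
                if sym not in symbols:
                    raise ValueError(f"unknown symbol {sym!r} in RHS of {lhs!r}")


# Illustrative grammar for digit sequences such as 7, 42, 2006.
digits = Grammar(
    terminals=frozenset("0123456789"),
    nonterminals=frozenset({"<digit-seq>", "<digit>"}),
    productions=(
        ("<digit-seq>", ("<digit>",)),
        ("<digit-seq>", ("<digit>", "<digit-seq>")),
        *(("<digit>", (d,)) for d in "0123456789"),
    ),
    start="<digit-seq>",
)
digits.validate()  # passes: every production obeys the rules
```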

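13 An Example of BNF Syntax for Real Numbers
<real-number> ::= <digit-sequence> . <digit-sequence>
<digit-sequence> ::= <digit> | <digit> <digit-sequence>
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
– < > encloses non-terminal symbols
– ::= means 'is', 'is made up of', or 'derives' (sometimes denoted with an arrow ->)
– | means 'or'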
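A recursive-descent recognizer is one way to check strings against the real-number grammar: a digit sequence, a '.', and another digit sequence, where a digit sequence is one digit optionally followed by another digit sequence. A minimal sketch; the function names are hypothetical and mirror the non-terminals one-to-one.

```python
DIGITS = set("0123456789")


def parse_digit(s: str, i: int) -> int:
    """<digit> ::= 0 | 1 | ... | 9; returns the index just after the digit."""
    if i < len(s) and s[i] in DIGITS:
        return i + 1
    raise SyntaxError(f"expected a digit at position {i}")


def parse_digit_seq(s: str, i: int) -> int:
    """<digit-seq> ::= <digit> | <digit> <digit-seq> (recursion builds the list)."""
    i = parse_digit(s, i)
    if i < len(s) and s[i] in DIGITS:
        return parse_digit_seq(s, i)  # take the recursive alternative
    return i


def is_real_number(s: str) -> bool:
    """<real-number> ::= <digit-seq> . <digit-seq>, consuming the whole string."""
    try:
        i = parse_digit_seq(s, 0)
        if i >= len(s) or s[i] != ".":
            return False
        return parse_digit_seq(s, i + 1) == len(s)
    except SyntaxError:
        return False


for text in ["2.0", "3.14159", "42", ".5", "1.2.3"]:
    print(text, is_real_number(text))
```

Only the first two inputs are accepted: "42" has no '.', ".5" lacks a leading digit, and "1.2.3" leaves input unconsumed.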

14 Example
For the example on the previous slide:
– What are the tokens?
– What are the lexemes?
– What are the non-terminals?
– What are the productions?

15 Token vs. Lexeme
to·ken (general usage): One that represents a group, as an employee whose presence is used to deflect criticism or accusations of discrimination from the employer.
to·ken (computing): A basic, grammatically indivisible unit of a language, such as a keyword, operator, or identifier.
lexeme (linguistics): A minimal unit (as a word or stem) in the lexicon of a language; 'go', 'went', 'gone', and 'going' are all members of the English lexeme 'go'.
lexeme (computing): A minimal lexical unit of a language. Lexical analysis converts strings in a language into a list of lexemes. For a programming language, these word-like pieces include keywords, identifiers, literals, and punctuation. The lexemes are then passed to the parser for syntactic analysis.

16 BNF Points
– A non-terminal can have more than one RHS, or the alternatives can be combined with OR (|).
– Lists or sequences are expressed via recursion.
– A derivation is just a repeated sequence of production (rule) applications.
Examples follow.

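17 Example Grammar
<program> -> <stmts>
<stmts> -> <stmt> | <stmt> ; <stmts>
<stmt> -> <var> = <expr>
<var> -> a | b | c | d
<expr> -> <term> + <term> | <term> - <term>
<term> -> <var> | const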

18 Example Derivation
<program>
=> <stmts>
=> <stmt>
=> <var> = <expr>
=> a = <expr>
=> a = <term> + <term>
=> a = <var> + <term>
=> a = b + <term>
=> a = b + const
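Because a derivation is just repeated production applications, it can be replayed mechanically: at each step, replace the leftmost non-terminal with one of its alternatives. A minimal sketch, assuming the example grammar (<program>, <stmts>, <stmt>, <var>, <expr>, <term>) is encoded as a dictionary; the names `GRAMMAR` and `derive` are hypothetical, and each step picks an alternative by index.

```python
# The example grammar: each non-terminal maps to its list of alternatives (RHS lists).
GRAMMAR = {
    "<program>": [["<stmts>"]],
    "<stmts>": [["<stmt>"], ["<stmt>", ";", "<stmts>"]],
    "<stmt>": [["<var>", "=", "<expr>"]],
    "<var>": [["a"], ["b"], ["c"], ["d"]],
    "<expr>": [["<term>", "+", "<term>"], ["<term>", "-", "<term>"]],
    "<term>": [["<var>"], ["const"]],
}


def derive(choices: list[int], start: str = "<program>") -> list[str]:
    """Leftmost derivation: each choice picks an alternative for the leftmost non-terminal."""
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        pos = next(i for i, sym in enumerate(form) if sym in GRAMMAR)
        form[pos:pos + 1] = GRAMMAR[form[pos]][choice]  # one production application
        steps.append(" ".join(form))
    return steps


# <program>-><stmts>, <stmts>-><stmt>, <stmt>-><var> = <expr>, <var>->a,
# <expr>-><term> + <term>, <term>-><var>, <var>->b, <term>->const
print("\n=> ".join(derive([0, 0, 0, 0, 0, 0, 1, 1])))
```

Each printed line is one sentential form, ending with the sentence a = b + const.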

19 Parse Trees
A parse tree is an alternative representation of a derivation. Example parse tree for the previous example (a = b + const):
<program>
  <stmts>
    <stmt>
      <var>
        a
      =
      <expr>
        <term>
          <var>
            b
        +
        <term>
          const
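A recursive-descent parser for the example grammar (<program>, <stmts>, <stmt>, <var>, <expr>, <term>) can build the parse tree directly, one method per non-terminal. A minimal sketch under stated assumptions: the input is already split into tokens, trees are (label, children) tuples, and the class and method names are hypothetical. Where alternatives share a prefix, the parser reads the shared part first and then looks at the next token to pick the alternative.

```python
from typing import Optional

Tree = tuple  # (label, children); a leaf is (token, [])


class Parser:
    """Recursive descent for <program>, <stmts>, <stmt>, <var>, <expr>, <term>."""

    def __init__(self, tokens: list[str]) -> None:
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, allowed: set[str]) -> Tree:
        tok = self.peek()
        if tok not in allowed:
            raise SyntaxError(f"expected one of {sorted(allowed)}, got {tok!r}")
        self.pos += 1
        return (tok, [])

    def program(self) -> Tree:
        tree = ("<program>", [self.stmts()])
        if self.peek() is not None:
            raise SyntaxError(f"unexpected trailing token {self.peek()!r}")
        return tree

    def stmts(self) -> Tree:
        # <stmts> -> <stmt> | <stmt> ; <stmts>: parse <stmt>, then let ';' decide
        children = [self.stmt()]
        if self.peek() == ";":
            children += [self.expect({";"}), self.stmts()]
        return ("<stmts>", children)

    def stmt(self) -> Tree:
        # <stmt> -> <var> = <expr>
        return ("<stmt>", [self.var(), self.expect({"="}), self.expr()])

    def var(self) -> Tree:
        # <var> -> a | b | c | d
        return ("<var>", [self.expect({"a", "b", "c", "d"})])

    def expr(self) -> Tree:
        # <expr> -> <term> + <term> | <term> - <term>
        return ("<expr>", [self.term(), self.expect({"+", "-"}), self.term()])

    def term(self) -> Tree:
        # <term> -> <var> | const
        if self.peek() == "const":
            return ("<term>", [self.expect({"const"})])
        return ("<term>", [self.var()])


def show(tree: Tree, depth: int = 0) -> None:
    """Print the tree with two spaces of indentation per level."""
    label, children = tree
    print("  " * depth + label)
    for child in children:
        show(child, depth + 1)


show(Parser("a = b + const".split()).program())
```

Reading the leaves of the printed tree from left to right gives back the sentence a = b + const.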

20 Another Example
Expression -> Expression + Expression
            | Expression - Expression
            | ...
            | Variable
            | Constant
            | ...
Variable -> T_IDENTIFIER
Constant -> T_INTCONSTANT | T_DOUBLECONSTANT

21 The Parse
Derivation of a + 2:
Expression
-> Expression + Expression
-> Variable + Expression
-> T_IDENTIFIER + Expression
-> T_IDENTIFIER + Constant
-> T_IDENTIFIER + T_INTCONSTANT

22 Parse Trees
PS -> P | P PS
P -> ε | '(' PS ')' | '<' PS '>' | '[' PS ']'
What's the parse tree for this statement? … ] >
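A recognizer for the PS/P grammar, as a minimal sketch. Assumptions: the bracket alternatives of P are the pairs ( ), < >, and [ ] (the middle pair is an assumption), whitespace in the input is ignored, and the function names are hypothetical. Because P can derive ε, PS amounts to "zero or more non-empty P", so a loop suffices; the grammar is ambiguous, but a recognizer only needs a yes/no answer.

```python
PAIRS = {"(": ")", "<": ">", "[": "]"}  # assumed bracket pairs for the P alternatives


def parse_ps(s: str, i: int) -> int:
    """PS -> P | P PS: parse P repeatedly while a non-empty P can start here."""
    while i < len(s) and s[i] in PAIRS:
        i = parse_p(s, i)
    return i  # P -> epsilon lets PS stop here


def parse_p(s: str, i: int) -> int:
    """P -> '(' PS ')' | '<' PS '>' | '[' PS ']' (the epsilon case is left to parse_ps)."""
    close = PAIRS[s[i]]
    i = parse_ps(s, i + 1)
    if i >= len(s) or s[i] != close:
        raise SyntaxError(f"expected {close!r} at position {i}")
    return i + 1


def is_balanced(text: str) -> bool:
    s = "".join(text.split())  # assumption: whitespace is not significant
    try:
        return parse_ps(s, 0) == len(s)
    except SyntaxError:
        return False


for text in ["( [ < > ] )", "[ ] ( )", "( ]", "< [ > ]"]:
    print(text, is_balanced(text))
```

The first two strings are properly nested and are accepted; the last two close a bracket with the wrong partner and are rejected.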

23 EBNF: Extended BNF
Like BNF, except that:
– Non-terminals start with an uppercase letter
– Parentheses ( ) are used for grouping terminals
– Braces { } represent zero or more occurrences (iteration)
– Brackets [ ] represent an optional construct, that is, a construct that appears either once or not at all

24 EBNF Example
Exp -> Term { ('+' | '-') Term }
Term -> Factor { ('*' | '/') Factor }
Factor -> '(' Exp ')' | variable | constant
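EBNF maps naturally onto recursive descent: each { ... } becomes a loop and each | becomes a branch on the next token. A minimal sketch that evaluates expressions with the Exp/Term/Factor grammar, assuming integer constants, variables looked up in a dictionary, and a hypothetical `evaluate` function; the loops apply operators left to right, which makes - and / left-associative.

```python
import re
from typing import Optional


def evaluate(text: str, env: dict[str, float]) -> float:
    """Evaluate an arithmetic expression by recursive descent over the EBNF rules."""
    # Integers, identifiers, and single-character operators; other characters are skipped.
    tokens = re.findall(r"\d+|[A-Za-z_]\w*|[-+*/()]", text)
    pos = 0

    def peek() -> Optional[str]:
        return tokens[pos] if pos < len(tokens) else None

    def advance() -> str:
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def exp() -> float:
        # Exp -> Term { ('+' | '-') Term }: the braces become a loop
        value = term()
        while peek() in ("+", "-"):
            op = advance()
            rhs = term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term() -> float:
        # Term -> Factor { ('*' | '/') Factor }
        value = factor()
        while peek() in ("*", "/"):
            op = advance()
            rhs = factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor() -> float:
        # Factor -> '(' Exp ')' | variable | constant
        tok = peek()
        if tok == "(":
            advance()
            value = exp()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return value
        if tok is not None and tok.isdigit():
            advance()
            return float(tok)
        if tok is not None and tok.isidentifier():
            advance()
            return float(env[tok])
        raise SyntaxError(f"unexpected token {tok!r}")

    result = exp()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return result


print(evaluate("a - b - c", {"a": 10, "b": 3, "c": 2}))  # 5.0
print(evaluate("2 * (x + 3)", {"x": 4}))                # 14.0
```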

25 EBNF/BNF
EBNF and BNF are equivalent: any grammar written in one can be rewritten in the other.
– How can { } be expressed in BNF?
– How can ( ) be expressed?
– How can [ ] be expressed?

26 Semantic Analysis
The syntactically correct parse tree (or derivation) is checked for semantic errors: constructs that, while syntactically valid, do not obey the semantic rules of the source language. Examples:
– Use of an undeclared or uninitialized variable
– A function called with improper arguments
– Incompatible operands and type mismatches

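27 Examples
Most semantic analysis pertains to the checking of types.
    int i;
    int j;
    i = i + 2;        // i is read before it has been given a value

    int arr[2], c;
    c = arr * 10;     // an array is not a valid operand of *

    void fun1(int i);
    double d;
    d = fun1(2.1);    // fun1 returns void, so there is no value to assign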
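Checks like these are usually driven by a symbol table that records each name's declared type. A minimal sketch, assuming a tiny hypothetical AST made of tuples rather than output from a real parser, and assuming no implicit conversions between argument and parameter types (stricter than C). The `Checker` class and its methods are illustrative names only.

```python
ARITHMETIC = {"int", "double"}


class Checker:
    """Hypothetical checker over tuple expressions such as ("+", ("var", "i"), ("const", 2))."""

    def __init__(self) -> None:
        self.vars: dict[str, str] = {}                     # name -> declared type
        self.funcs: dict[str, tuple[str, list[str]]] = {}  # name -> (return type, parameter types)
        self.assigned: set[str] = set()                    # variables known to hold a value
        self.errors: list[str] = []

    def declare(self, name: str, type_: str) -> None:
        self.vars[name] = type_
        if type_.endswith("[]"):
            self.assigned.add(name)  # arrays are checked by type, not by initialization

    def expr_type(self, expr: tuple) -> str:
        kind = expr[0]
        if kind == "const":
            return "double" if isinstance(expr[1], float) else "int"
        if kind == "var":
            name = expr[1]
            if name not in self.vars:
                self.errors.append(f"{name} is not declared")
                return "error"
            if name not in self.assigned:
                self.errors.append(f"{name} is used before it is initialized")
            return self.vars[name]
        if kind == "call":
            ret, params = self.funcs[expr[1]]
            args = [self.expr_type(arg) for arg in expr[2]]
            if args != params:  # assumption: no implicit conversions
                self.errors.append(f"{expr[1]} called with {args}, expected {params}")
            return ret
        # Otherwise a binary arithmetic operator: (op, left, right)
        left, right = self.expr_type(expr[1]), self.expr_type(expr[2])
        if "error" in (left, right):
            return "error"  # already reported; avoid cascading messages
        if left not in ARITHMETIC or right not in ARITHMETIC:
            self.errors.append(f"operator {kind} cannot be applied to {left} and {right}")
            return "error"
        return "double" if "double" in (left, right) else "int"

    def assign(self, name: str, expr: tuple) -> None:
        if name not in self.vars:
            self.errors.append(f"{name} is not declared")
        if self.expr_type(expr) == "void":
            self.errors.append(f"void value assigned to {name}")
        self.assigned.add(name)


checker = Checker()
checker.declare("i", "int")
checker.declare("j", "int")
checker.assign("i", ("+", ("var", "i"), ("const", 2)))     # i = i + 2;

checker.declare("arr", "int[]")
checker.declare("c", "int")
checker.assign("c", ("*", ("var", "arr"), ("const", 10)))  # c = arr * 10;

checker.funcs["fun1"] = ("void", ["int"])                   # void fun1(int i);
checker.declare("d", "double")
checker.assign("d", ("call", "fun1", [("const", 2.1)]))    # d = fun1(2.1);

for error in checker.errors:
    print(error)
```

Each example produces at least one error; the third produces two (the argument type and the void result).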

28 Intermediate Code Generation
This is where the intermediate representation of the source program is created. The representation can take a variety of forms, but a common one is three-address code (TAC). Like assembly, TAC is a sequence of simple instructions, each of which has at most three operands.

29 Example
Source:
    a = b * c + b * d
TAC:
    _t1 = b * c
    _t2 = b * d
    _t3 = _t1 + _t2
    a = _t3
Note the temporaries (_t1, _t2, _t3).
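Producing TAC like this is a post-order walk of the expression tree: both operands are reduced to addresses first, then one instruction stores the operator's result in a fresh temporary. A minimal sketch, assuming expressions are nested (op, left, right) tuples with variable names as leaves (a hypothetical representation) and temporaries named _t1, _t2, ... as in the example.

```python
from itertools import count


def gen_tac(target: str, expr) -> list[str]:
    """Return TAC instructions for `target = expr`, one operator per instruction."""
    code: list[str] = []
    temps = count(1)

    def walk(node) -> str:
        if isinstance(node, str):  # a variable name is already an address
            return node
        op, left, right = node
        l_addr, r_addr = walk(left), walk(right)  # operands first (post-order)
        temp = f"_t{next(temps)}"
        code.append(f"{temp} = {l_addr} {op} {r_addr}")
        return temp

    code.append(f"{target} = {walk(expr)}")  # walk runs before this line is appended
    return code


# a = b * c + b * d
for line in gen_tac("a", ("+", ("*", "b", "c"), ("*", "b", "d"))):
    print(line)
```

This naive scheme recomputes nothing but also shares nothing; spotting that b appears in both products is left to later optimization.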

30 Another Example
Source:
    if (a <= b)
        a = a - c;
    c = b * c;
TAC:
        _t1 = a > b
        if _t1 goto L0
        _t2 = a - c
        a = _t2
    L0: _t3 = b * c
        c = _t3
Note the temporaries and the symbolic addresses (labels such as L0).
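For the if statement, the generator evaluates the negated condition into a temporary and jumps over the then-branch when it holds, targeting a fresh symbolic label. A minimal sketch that reproduces this example; the `TacGen` class, its statement encoding, and the negation table are assumptions, and each statement is limited to a single binary operator on names for brevity.

```python
from itertools import count
from typing import Optional

# Negation table: the jump skips the then-branch when the condition is false.
NEGATE = {"<=": ">", "<": ">=", ">": "<=", ">=": "<", "==": "!=", "!=": "=="}


class TacGen:
    """Hypothetical TAC emitter with fresh temporaries and symbolic labels."""

    def __init__(self) -> None:
        self.code: list[str] = []
        self.temps = count(1)   # _t1, _t2, ...
        self.labels = count(0)  # L0, L1, ...
        self.pending_label: Optional[str] = None

    def emit(self, instr: str) -> None:
        # Attach a pending label to the next instruction, as in "L0: _t3 = b * c".
        if self.pending_label is not None:
            instr = f"{self.pending_label}: {instr}"
            self.pending_label = None
        self.code.append(instr)

    def assign(self, target: str, op: str, left: str, right: str) -> None:
        """target = left op right, computed into a temporary first."""
        temp = f"_t{next(self.temps)}"
        self.emit(f"{temp} = {left} {op} {right}")
        self.emit(f"{target} = {temp}")

    def if_then(self, cond: tuple, then_assign: tuple) -> None:
        """if (left op right) then_assign; jumps past the branch when the condition fails."""
        op, left, right = cond
        temp = f"_t{next(self.temps)}"
        skip = f"L{next(self.labels)}"
        self.emit(f"{temp} = {left} {NEGATE[op]} {right}")
        self.emit(f"if {temp} goto {skip}")
        self.assign(*then_assign)
        self.pending_label = skip  # the label marks whatever is emitted next

    def finish(self) -> list[str]:
        if self.pending_label is not None:  # a label with no instruction after it
            self.code.append(f"{self.pending_label}:")
            self.pending_label = None
        return self.code


gen = TacGen()
gen.if_then(("<=", "a", "b"), ("a", "-", "a", "c"))  # if (a <= b) a = a - c;
gen.assign("c", "*", "b", "c")                        # c = b * c;
for line in gen.finish():
    print(line)
```

Deferring the label until the next instruction is emitted lets the generator name a jump target before it knows what code will sit there.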

31 Next Time
– Finish the introduction to the compilation stages.
– Read Appel Chapters 1 and 2 if you have not already done so.
– What is a splay tree?

32 Selected References
– Appel, A., Modern Compiler Implementation in Java (2nd ed.), Cambridge University Press, 2002. ISBN 052182060X.
– Aho, A.V., R. Sethi, and J.D. Ullman, Compilers: Principles, Techniques, and Tools, Addison-Wesley, 1988. ISBN 0-201-10088-6.
– Muchnick, S., Advanced Compiler Design and Implementation, Morgan Kaufmann, 1998. ISBN 1-55860-320-4.

