UMBC Introduction to Compilers CMSC 431 Shon Vick 01/28/02.

1 UMBC Introduction to Compilers CMSC 431 Shon Vick 01/28/02

2 What is a compiler?
Translates source code to target code
 –Source code is typically a high-level programming language (Java, C++, etc.) but does not have to be
 –Target code is often a low-level language like assembly or machine code but does not have to be
Can you think of other compilers that you have used, according to this definition?

3 Other Compilers
Javadoc -> HTML
SQL Query output -> Table
PostScript -> PDF
High-level description of a circuit -> machine instructions to fabricate circuit

4 The Compilation Process

5 The Analysis Stage
Broken up into four phases
 –Lexical Analysis (also called scanning or tokenization)
 –Parsing
 –Semantic Analysis
 –Intermediate Code Generation

6 Lexing Example
Source:
  double d1;
  double d2;
  d2 = d1 * 2.0;
Lexeme   Token             Notes
double   TOK_DOUBLE        reserved word
d1       TOK_ID            variable name
;        TOK_PUNCT         has value of “;”
double   TOK_DOUBLE        reserved word
d2       TOK_ID            variable name
;        TOK_PUNCT         has value of “;”
d2       TOK_ID            variable name
=        TOK_OPER          has value of “=”
d1       TOK_ID            variable name
*        TOK_OPER          has value of “*”
2.0      TOK_FLOAT_CONST   has value of 2.0
;        TOK_PUNCT         has value of “;”
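
The lexing step on this slide can be sketched in a few lines of Python. This is only an illustration: the token names come from the slide, but the regular expressions and the `tokenize` helper are assumptions made for the example, not part of the course material.

```python
import re

# Token names follow the slide; the patterns themselves are assumptions.
# Order matters: the reserved word must be tried before the identifier rule.
TOKEN_SPEC = [
    ("TOK_FLOAT_CONST", r"\d+\.\d+"),
    ("TOK_DOUBLE", r"\bdouble\b"),
    ("TOK_ID", r"[A-Za-z_]\w*"),
    ("TOK_OPER", r"[=*]"),
    ("TOK_PUNCT", r";"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (token, lexeme) pairs for the given source text."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":  # Whitespace separates lexemes but is not a token.
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

Calling `tokenize("double d1; double d2; d2 = d1 * 2.0;")` yields twelve pairs, beginning with `("TOK_DOUBLE", "double")` and `("TOK_ID", "d1")`, in the same order as the slide's list.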

7 Syntax and Semantics
Syntax: the form or structure of the expressions, that is, whether an expression is well formed
Semantics: the meaning of an expression

8 Syntactic Structure
Syntax is almost always expressed using some variant of a notation called a context-free grammar (CFG), or simply a grammar
 –BNF
 –EBNF

9 A CFG has 4 parts
A set of tokens (lexemes), known as terminal symbols
A set of non-terminals
A set of rules (productions), where each production consists of a left-hand side (LHS) and a right-hand side (RHS). The LHS is a non-terminal and the RHS is a sequence of terminal and/or non-terminal symbols.
A special non-terminal symbol designated as the start symbol

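10 An example of BNF syntax for real numbers
<real-number> ::= <digit-sequence> . <digit-sequence>
<digit-sequence> ::= <digit> | <digit> <digit-sequence>
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
< > encloses non-terminal symbols
::= means 'is' or 'is made up of' or 'derives' (sometimes denoted with an arrow ->)
| means 'or'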

11 Example
On the example from the previous slide:
 –What are the tokens?
 –What are the lexemes?
 –What are the non-terminals?
 –What are the productions?

12 BNF Points
A non-terminal can have more than one RHS, or an OR (|) can be used
Lists or sequences are expressed via recursion
A derivation is just a repeated sequence of production (rule) applications
Examples

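13 Example Grammar
<program> -> <stmts>
<stmts> -> <stmt> | <stmt> ; <stmts>
<stmt> -> <var> = <expr>
<var> -> a | b | c | d
<expr> -> <term> + <term> | <term> - <term>
<term> -> <var> | const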

14 Example Derivation
<program> => <stmts>
          => <stmt>
          => <var> = <expr>
          => a = <expr>
          => a = <term> + <term>
          => a = <var> + <term>
          => a = b + <term>
          => a = b + const
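
A derivation like this one can be replayed mechanically. The sketch below is a hedged illustration, not the course's code: it stores the example grammar as a Python dictionary and rewrites the leftmost non-terminal at each step. The non-terminal names `stmts`, `stmt`, `var`, `expr` and `term` appear in the parse tree on the next slide; the start symbol name `program` and the helper `leftmost_derivation` are assumptions.

```python
# Grammar as a dict: non-terminal -> list of alternatives (each a list of symbols).
# The name "program" for the start symbol is an assumption.
GRAMMAR = {
    "program": [["stmts"]],
    "stmts": [["stmt"], ["stmt", ";", "stmts"]],
    "stmt": [["var", "=", "expr"]],
    "var": [["a"], ["b"], ["c"], ["d"]],
    "expr": [["term", "+", "term"], ["term", "-", "term"]],
    "term": [["var"], ["const"]],
}


def leftmost_derivation(start, choices):
    """Apply one production per choice to the leftmost non-terminal.

    `choices` lists, step by step, which alternative (0-based) to use.
    Returns every sentential form, beginning with the start symbol.
    """
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        # Find the leftmost symbol that is still a non-terminal.
        index = next(i for i, sym in enumerate(form) if sym in GRAMMAR)
        form[index:index + 1] = GRAMMAR[form[index]][choice]
        steps.append(" ".join(form))
    return steps
```

With `leftmost_derivation("program", [0, 0, 0, 0, 0, 0, 1, 1])` the last sentential form is `a = b + const`, reached after eight production applications.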

15 Parse Trees
Alternative representation for a derivation
Example parse tree for the previous example:
stmts
  stmt
    var
      a
    =
    expr
      term
        var
          b
      +
      term
        const

16 Another Example
Expression -> Expression + Expression
            | Expression - Expression
            | ...
            | Variable
            | Constant
            | ...
Variable -> T_IDENTIFIER
Constant -> T_INTCONSTANT | T_DOUBLECONSTANT

17 The Parse
Expression -> Expression + Expression
           -> Variable + Expression
           -> T_IDENTIFIER + Expression
           -> T_IDENTIFIER + Constant
           -> T_IDENTIFIER + T_INTCONSTANT
a + 2

18 Parse Trees
PS -> P | P PS
P ->  | '(' PS ')' | '<' PS '>' | '[' PS ']'
What’s the parse tree for this statement? ] >

19 EBNF - Extended BNF
Like BNF except that
 –Non-terminals start with an uppercase letter
 –Parentheses are used for grouping terminals
 –Braces {} represent zero or more occurrences (iteration)
 –Brackets [] represent an optional construct, that is, a construct that appears either once or not at all

20 EBNF example
Exp -> Term { ('+' | '-') Term }
Term -> Factor { ('*' | '/') Factor }
Factor -> '(' Exp ')' | variable | constant
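
EBNF of this shape maps almost directly onto a recursive-descent parser: each non-terminal becomes a function, and each `{ ... }` becomes a loop. The following sketch is one possible rendering under stated assumptions: the input is already a list of string tokens, a variable is any identifier, a constant is a digit string with at most one decimal point, and the tree is returned as nested tuples. The function name `parse_expression` is hypothetical.

```python
def parse_expression(tokens):
    """Parse a token list with the EBNF grammar and return a nested-tuple tree."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def exp():
        # Exp -> Term { ('+' | '-') Term }
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    def term():
        # Term -> Factor { ('*' | '/') Factor }
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def factor():
        # Factor -> '(' Exp ')' | variable | constant
        token = peek()
        if token == "(":
            advance()
            node = exp()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        if token is not None and (token.isidentifier() or token.replace(".", "", 1).isdigit()):
            return advance()
        raise SyntaxError(f"unexpected token {token!r}")

    tree = exp()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree
```

For example, `parse_expression(["a", "+", "b", "*", "2"])` returns `("+", "a", ("*", "b", "2"))`: because `Term` sits below `Exp` in the grammar, multiplication binds tighter than addition without any extra precedence rules.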

21 EBNF/BNF
EBNF and BNF are equivalent
 –How can {} be expressed in BNF?
 –How can ( ) be expressed?
 –How can [ ] be expressed?

22 Semantic Analysis
The syntactically correct parse tree (or derivation) is checked for semantic errors
Check for constructs that, while syntactically valid, do not obey the semantic rules of the source language
Examples:
 –Use of an undeclared/uninitialized variable
 –Function called with improper arguments
 –Incompatible operands and type mismatches

23 Examples
  int i;
  int j;
  i = i + 2;
  int arr[2], c;
  c = arr * 10;
  void fun1(int i);
  double d;
  d = fun1(2.1);
Most semantic analysis pertains to the checking of types.
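
To make the kind of check on this slide concrete, here is a small hedged sketch of a type check for an arithmetic operand pair. Everything in it is an assumption for illustration: types are plain strings held in a symbol table, numeric literals are Python numbers, and mixing `int` with `double` is taken to yield `double`. A real checker would walk the parse tree instead.

```python
def operand_type(symbols, operand):
    """Look up the type of a literal or a declared variable."""
    if isinstance(operand, int):
        return "int"
    if isinstance(operand, float):
        return "double"
    if operand not in symbols:
        raise NameError(f"undeclared variable {operand!r}")
    return symbols[operand]


def check_arithmetic(symbols, left, right):
    """Return the result type of `left <op> right`, or raise TypeError."""
    types = (operand_type(symbols, left), operand_type(symbols, right))
    for type_name in types:
        if type_name not in ("int", "double"):
            raise TypeError(f"operand of type {type_name} is not valid in arithmetic")
    # Assumed promotion rule: any double operand makes the result double.
    return "double" if "double" in types else "int"
```

With the symbol table `{"i": "int", "arr": "int[2]", "c": "int"}`, `check_arithmetic(symbols, "i", 2)` returns `"int"`, while `check_arithmetic(symbols, "arr", 10)` raises `TypeError`, mirroring the `c = arr * 10` example.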

24 Intermediate Code Generation
This phase creates the intermediate representation of the source program
The representation can take a variety of forms, but a common one is three-address code (TAC)
Like assembly, TAC is a sequence of simple instructions, each of which can have at most three operands

25 Example
Source:
  a = b * c + b * d
TAC:
  _t1 = b * c
  _t2 = b * d
  _t3 = _t1 + _t2
  a = _t3
Note the temporaries (_t1, _t2, _t3)
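
One way to produce TAC of this form is a post-order walk over the expression tree that allocates a fresh temporary for every operator. The sketch below assumes the expression arrives as nested tuples `(op, left, right)` with operand names as strings; `generate_tac` is a hypothetical helper written for this illustration, and the `_tN` naming simply copies the slide.

```python
import itertools


def generate_tac(target, expr):
    """Emit three-address code that assigns the value of `expr` to `target`.

    `expr` is either an operand name (str) or a tuple (op, left, right).
    """
    code = []
    counter = itertools.count(1)

    def emit(node):
        if isinstance(node, str):
            return node  # Plain operands need no temporary.
        op, left, right = node
        left_name = emit(left)    # Children first (post-order),
        right_name = emit(right)  # so their temporaries get lower numbers.
        temp = f"_t{next(counter)}"
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    result = emit(expr)
    code.append(f"{target} = {result}")
    return code
```

`generate_tac("a", ("+", ("*", "b", "c"), ("*", "b", "d")))` returns `["_t1 = b * c", "_t2 = b * d", "_t3 = _t1 + _t2", "a = _t3"]`, the same four instructions as on the slide.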

26 Another Example
Source:
  if (a <= b)
    a = a - c;
  c = b * c;
TAC:
  _t1 = a > b
  if _t1 goto L0
  _t2 = a - c
  a = _t2
  L0: _t3 = b * c
  c = _t3
Note the temporaries (_t1, _t2, _t3) and the symbolic address (L0)

27 Next Time
Finish introduction to compilation stages
Read Aho/Sethi/Ullman Chapter 1

28 Selected References
Compilers: Principles, Techniques, and Tools, Aho, Sethi, and Ullman
http://www.stanford.edu/class/cs143/

