Presentation is loading. Please wait.

Presentation is loading. Please wait.

Chapter 9 Compilers and Language Translation. The Compilation Process Phase I: Lexical analysis Phase I: Lexical analysis Phase II: Parsing Phase II:

Similar presentations


Presentation on theme: "Chapter 9 Compilers and Language Translation. The Compilation Process Phase I: Lexical analysis Phase I: Lexical analysis Phase II: Parsing Phase II:"— Presentation transcript:

1 Chapter 9 Compilers and Language Translation

2 The Compilation Process Phase I: Lexical analysis Phase I: Lexical analysis Phase II: Parsing Phase II: Parsing Phase III: Semantics and code generation Phase III: Semantics and code generation Phase IV: Code Optimization Phase IV: Code Optimization

3 Introduction High-level languages are more difficult to “ translate ” than assembly languages. High-level languages are more difficult to “ translate ” than assembly languages. Assembly language and machine language are related 1-to-1. Assembly language and machine language are related 1-to-1. The relationship between a high-level language and machine language is 1- to-many. The relationship between a high-level language and machine language is 1- to-many.

4 Compiler The piece of software that translates high-level programming language codes into machine language codes. The piece of software that translates high-level programming language codes into machine language codes. Two distinct goals of compiler: Two distinct goals of compiler: CorrectnessCorrectness Efficient and concise Example: 2x 0 +2x 1 + … +2x 50000Efficient and concise Example: 2x 0 +2x 1 + … +2x 50000

5 Object file The Compilation Process ScannerParser Code Generator Optimizer

6 Lexical Analysis The compiler examines the individual characters in the source program and groups them into syntactical units, called tokens, that will be analyzed in succeeding stages. The compiler examines the individual characters in the source program and groups them into syntactical units, called tokens, that will be analyzed in succeeding stages. Analogous to grouping letters into words prior to analyzing text. Analogous to grouping letters into words prior to analyzing text.

7 Parsing During this stage the sequence of tokens formed by the scanner is checked to see whether it is syntactically correct according to the rules of the programming language. During this stage the sequence of tokens formed by the scanner is checked to see whether it is syntactically correct according to the rules of the programming language. Equivalent to checking whether the words in the text form grammatically correct sentences. Equivalent to checking whether the words in the text form grammatically correct sentences.

8 Semantic Analysis and Code Generation If the high-level language statement is structurally correct, then the compiler analyzes its meaning and generates the proper sequence of machine language instructions to carry out these actions. If the high-level language statement is structurally correct, then the compiler analyzes its meaning and generates the proper sequence of machine language instructions to carry out these actions.

9 Code Optimization The compiler takes the generated code and see whether it can be made more efficient, either by making it run faster, or having it occupy less memory. The compiler takes the generated code and see whether it can be made more efficient, either by making it run faster, or having it occupy less memory.

10 Phase I: Lexical Analysis Scanner, or lexical analyzer, groups input characters into tokens. Scanner, or lexical analyzer, groups input characters into tokens. Example: a = b + 319 - delta; Example: a = b + 319 - delta; The scanner discards nonessential characters, such as blanks and tabs, and the group the remaining characters into high-level syntactic symbols such as symbols, numbers, and operators. The scanner discards nonessential characters, such as blanks and tabs, and the group the remaining characters into high-level syntactic symbols such as symbols, numbers, and operators.

11 Token Classifications Token typeClassification number Token typeClassification number symbol1 number2 Others: =(3),+(4),-(5),;(6); ==(7), if(8), else (9), ( 10, ) 11 Others: =(3),+(4),-(5),;(6); ==(7), if(8), else (9), ( 10, ) 11

12 Phase II: Parsing During the parsing phase, a compiler determines whether the tokens recognized by the scanner fit together in a grammatically meaningful way. During the parsing phase, a compiler determines whether the tokens recognized by the scanner fit together in a grammatically meaningful way. Analogous to the operation of “ diagramming a sentence ”. Analogous to the operation of “ diagramming a sentence ”.

13 Example To prove the sequence of words: To prove the sequence of words: The man bit the dog is a correctly formed sentence. is a correctly formed sentence.

14 Another Example The man bit the

15 Programming Language Example Statement: a = b + c Statement: a = b + c

16 Parse Tree The structure shown in the previous example is called a parse tree. The structure shown in the previous example is called a parse tree. It starts from the individual tokens a,=,b,+,c and show how these tokens can be grouped together into predefined grammatical categories such as, and until the desired goal is reached. (in this case, ) It starts from the individual tokens a,=,b,+,c and show how these tokens can be grouped together into predefined grammatical categories such as, and until the desired goal is reached. (in this case, )

17 Grammars, Languages and BNF How does a parser know how to construct the parse tree? How does a parser know how to construct the parse tree? The parser must be given a formal description of the syntax, the grammatical structure, of the language that it is going to analyze. The parser must be given a formal description of the syntax, the grammatical structure, of the language that it is going to analyze. Most widely used notation for representing the syntax of programming language is called BNF, an acronym for Backus-Naur form. Most widely used notation for representing the syntax of programming language is called BNF, an acronym for Backus-Naur form.

18 BNF The syntax of a language is specified as a set of rules, also called productions. The syntax of a language is specified as a set of rules, also called productions. The entire collection of rules is called a grammar. The entire collection of rules is called a grammar. BRN rule: left-hand side::=“definition” BRN rule: left-hand side::=“definition”

19 BNF Example ::= = ::= = The rule says that the syntactical construct called is defined as a followed by the token = followed by the syntactical construct called The rule says that the syntactical construct called is defined as a followed by the token = followed by the syntactical construct called

20 Terminal/Nonterminals BNF uses two types of objects on the right hand side of a productions: BNF uses two types of objects on the right hand side of a productions: Terminals: actual tokens of the language recognized and returned by a scanner.Terminals: actual tokens of the language recognized and returned by a scanner. Nonterminals: an intermediate grammatical category used to help explain and organize the language.Nonterminals: an intermediate grammatical category used to help explain and organize the language.

21 Goal Symbol The goal symbol is the highest-level nonterminal. The goal symbol is the highest-level nonterminal. When goal symbol has been produced, the parser has finished building the tree, and the statements have been successfully parsed. When goal symbol has been produced, the parser has finished building the tree, and the statements have been successfully parsed. The collection of all statements that can be successfully parsed is called the language defined by a grammar. The collection of all statements that can be successfully parsed is called the language defined by a grammar.

22 Meta-symbols Meta-symbol: used to describe the characteristics of another language. Meta-symbol: used to describe the characteristics of another language. BNF has five meta-symbols: BNF has five meta-symbols:<>::= | :OR, Ex: :=0|1|2|3|4|5|6|7|8|9  : null string Ex: := := +|-|

23 Fundamental Rule of Parsing If, by repeated applications of the rules of the grammar, a parser can convert the sequence of input tokens into the goal symbol, then that sequence of tokens is a syntactically valid statement of the language. If, by repeated applications of the rules of the grammar, a parser can convert the sequence of input tokens into the goal symbol, then that sequence of tokens is a syntactically valid statement of the language.

24 Example A three-rule grammar A three-rule grammar 1. ::= 1. ::= 2. ::= bees|dogs 3. ::=buzz|bite Example 1: Dogs bite. Example 1: Dogs bite. Example 2: Bees dogs. Example 2: Bees dogs.

25 Another Example Grammar for a simplified assignment statement Grammar for a simplified assignment statement 1. ::= = 1. ::= = 2. ::= | + 2. ::= | + 3. ::= x|y|z

26 Generated Parse Tree

27 Wrong Path

28 How to parse? The process of parser is a complex sequence of applying rules, building grammatical constructs, seeing whether things are moving toward the correct answer (the goal symbol). If not, “ undo ” the rule just applied and try another. The process of parser is a complex sequence of applying rules, building grammatical constructs, seeing whether things are moving toward the correct answer (the goal symbol). If not, “ undo ” the rule just applied and try another. Look-ahead parsing algorithm: “ looking down the road ” a few tokens to see what would happen if a certain choice were made. Look-ahead parsing algorithm: “ looking down the road ” a few tokens to see what would happen if a certain choice were made.

29 Example Not possible to build a parse tree with the grammar.

30 Major Challenge Design a grammar that: Design a grammar that: Includes every valid statement that we want to be in the languageIncludes every valid statement that we want to be in the language Excludes every invalid statement that we do not want to be in the languageExcludes every invalid statement that we do not want to be in the language

31 Assignment Statement (2 nd try) 1. ::= = 1. ::= = 2. ::= | + (recursive definition) 3. ::= x|y|z

32 Resulting Parse Tree

33 Using Recursive Definition

34 Validity vs. Ambiguity It is possible to construct two parse trees of x=x+y+z using the 2 nd grammar.  Two different meanings. It is possible to construct two parse trees of x=x+y+z using the 2 nd grammar.  Two different meanings. X=(x+y)+zx=x+(y+z) X=(x+y)+zx=x+(y+z)

35 If-else grammar

36 Parse Tree

37 Phase III: Semantics and Code Generation 1. ::= 1. ::= 2. ::= bees|dogs 3. ::=buzz|bite Possible combinations: Possible combinations: Dogs bite.Dogs bite. Dogs bark.Dogs bark. Bees bite.Bees bite. Bees bark.Bees bark. Not all combinations make sense. Not all combinations make sense.

38 Semantics and Code Generation A compiler examines the semantics of a programming language statement. It analyzes the meaning of the tokens and tries to understand the actions they perform. A compiler examines the semantics of a programming language statement. It analyzes the meaning of the tokens and tries to understand the actions they perform. If the statement is meaningless, it is semantically rejected. Otherwise it is translated into machine language. If the statement is meaningless, it is semantically rejected. Otherwise it is translated into machine language.

39 Example The statement sum=a+b; The statement sum=a+b; is syntactically correct. But what if the variables are defined as follows: But what if the variables are defined as follows: char a; double b; int sum;

40 Semantic Records Each nonterminal symbol is associated with a semantic record, a data structure that stores information about a nonterminal, such as the actual name of the object and its data type. Each nonterminal symbol is associated with a semantic record, a data structure that stores information about a nonterminal, such as the actual name of the object and its data type.

41 Semantic Records (II) Grows gradually. Grows gradually.

42 Another Situation

43 Two-Stage Process Semantic analysis: a pass over the parse tree to determine whether all branches of the tree are semantically valid. Semantic analysis: a pass over the parse tree to determine whether all branches of the tree are semantically valid. Code generation: the compiler makes a 2 nd pass over the parse tree to produce the translated code. Code generation: the compiler makes a 2 nd pass over the parse tree to produce the translated code.

44 Example

45 Example (cont’d)

46

47

48

49 Code Optimization To make the code more efficient: To make the code more efficient: Local optimizationLocal optimization Global optimizationGlobal optimization Different from programmer optimization with compiler tools such as: Different from programmer optimization with compiler tools such as: Visual development environmentsVisual development environments On-line debuggersOn-line debuggers Reusable code librariesReusable code libraries

50 Local Optimization Look at a very small block of instructions and try to improve it. Look at a very small block of instructions and try to improve it. Possible approaches Possible approaches Constant evaluation: x=1+1;Constant evaluation: x=1+1; Strength reduction: x=x*2;Strength reduction: x=x*2; Eliminating unnecessary operationsEliminating unnecessary operations

51 Global Optimization Look at large segments of program and decide how to improve performance. Look at large segments of program and decide how to improve performance. A much harder problem. A much harder problem.


Download ppt "Chapter 9 Compilers and Language Translation. The Compilation Process Phase I: Lexical analysis Phase I: Lexical analysis Phase II: Parsing Phase II:"

Similar presentations


Ads by Google