Compilers.


A compiler is a program that takes a program written in one language (the source program) as input and translates it into an equivalent program in another language (the target program). If errors are encountered during this translation, the compiler reports them as error messages. Typically, a compiler takes programs written in languages such as C, Pascal, or FORTRAN and converts them into lower-level languages such as assembly language.

Analysis and synthesis phases
Analysis: the source program is read and broken into its constituent pieces, and an intermediate code is created. Synthesis: the target program is generated from the intermediate code.

Phases of a compiler
1) Lexical analysis
2) Syntax analysis
3) Semantic analysis
4) Intermediate code generation
5) Code optimization
6) Code generation
Symbol table management and error detection and handling interact with all of these phases.

Lexical analysis
Also called scanning. The complete source code is scanned and broken up into groups of characters called tokens, such as identifiers, operators, and constants.
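A scanner of this kind can be sketched with Python regular expressions. The token categories (NUM, ID, ASSIGN, OP, LPAREN, RPAREN) are assumptions made for this illustration; a real language defines its own token set.

```python
import re

# Hypothetical token categories for a tiny expression language.
TOKEN_SPEC = [
    ("NUM",    r"\d+(?:\.\d+)?"),   # integer or real literal
    ("ID",     r"[A-Za-z_]\w*"),    # identifier
    ("ASSIGN", r":="),              # assignment operator
    ("OP",     r"[+\-*/]"),         # arithmetic operators
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP",   r"\s+"),             # whitespace is discarded
    ("ERROR",  r"."),               # any other character is a lexical error
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (kind, lexeme) pairs for the source string."""
    tokens = []
    for m in MASTER.finditer(source):
        kind, lexeme = m.lastgroup, m.group()
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            raise SyntaxError(f"unexpected character {lexeme!r} at {m.start()}")
        tokens.append((kind, lexeme))
    return tokens


print(tokenize("x := -a * b + -a * b"))
```

The order of TOKEN_SPEC matters: alternatives are tried left to right, so the catch-all ERROR pattern must come last.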

Syntax analysis
Also called parsing. The tokens are grouped together to form a hierarchical structure that determines the structure of the source string. The hierarchical structure generated is called a parse tree or syntax tree.
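The grouping of tokens into a tree can be illustrated with a small recursive-descent parser. This is a minimal sketch that assumes a grammar with assignment, + and -, * and /, unary minus binding tighter than *, and parentheses; the tree is returned as nested tuples (operator, children...). The grammar and the tuple format are assumptions for this example.

```python
import re

IDENT = re.compile(r"[A-Za-z_]\w*")
NUM = re.compile(r"\d+")


def tokenize(src):
    # Minimal scanner; characters that match no pattern are skipped in this sketch.
    return re.findall(r"[A-Za-z_]\w*|\d+|:=|[-+*/()]", src)


class Parser:
    """Recursive-descent parser for the assumed grammar:
    stmt    -> ID ':=' expr
    expr    -> term {('+' | '-') term}
    term    -> unary {('*' | '/') unary}
    unary   -> '-' unary | primary
    primary -> ID | NUM | '(' expr ')'
    """

    def __init__(self, src):
        self.toks = tokenize(src)
        self.pos = 0

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, found {self.peek()!r}")
        self.advance()

    def stmt(self):
        target = self.advance()
        if target is None or not IDENT.fullmatch(target):
            raise SyntaxError("assignment must start with an identifier")
        self.expect(":=")
        tree = (":=", target, self.expr())
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return tree

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = (op, node, self.term())   # left-associative
        return node

    def term(self):
        node = self.unary()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = (op, node, self.unary())
        return node

    def unary(self):
        if self.peek() == "-":
            self.advance()
            return ("uminus", self.unary())
        return self.primary()

    def primary(self):
        tok = self.advance()
        if tok == "(":
            node = self.expr()
            self.expect(")")
            return node
        if tok is None or not (IDENT.fullmatch(tok) or NUM.fullmatch(tok)):
            raise SyntaxError(f"unexpected token {tok!r}")
        return tok


print(Parser("x := -a * b + -a * b").stmt())
```

Each grammar rule becomes one method, and operator precedence follows from which method calls which.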

Semantic analysis
Determines the meaning of the source string, for example matching of parentheses, matching of if…else statements, checking that arithmetic operations are performed on type-compatible operands, or checking the scope of names.

Intermediate code generation
Intermediate code is code that can easily be converted into target code. It may take the form of three-address code.

Code optimization
Improves the intermediate code so that the target program executes faster or consumes less memory.

Code generation
The target code, a sequence of machine instructions, is generated.
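As a rough illustration of code generation, the sketch below translates three-address instructions, written as tuples (result, op, arg1, arg2), into instructions for a hypothetical single-register machine. The mnemonics LOAD, STORE, ADD, SUB, MUL, DIV, and NEG are invented for this example and do not belong to any real instruction set.

```python
# Hypothetical target mnemonics for the binary operators.
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}


def gen_code(tac):
    """Naive code generation: each three-address instruction
    (result, op, arg1, arg2) becomes LOAD / operate / STORE on register R0."""
    asm = []
    for result, op, arg1, arg2 in tac:
        asm.append(f"LOAD R0, {arg1}")
        if op == "uminus":
            asm.append("NEG R0")
        elif op == "copy":
            pass  # plain assignment result = arg1: nothing to compute
        else:
            asm.append(f"{OPCODES[op]} R0, {arg2}")
        asm.append(f"STORE R0, {result}")
    return asm


tac = [("t1", "uminus", "a", None),   # t1 = -a
       ("t2", "*", "t1", "b"),        # t2 = t1 * b
       ("x", "copy", "t2", None)]     # x = t2
for line in gen_code(tac):
    print(line)
```

This scheme loads and stores every value; a practical code generator would keep values in registers across instructions to avoid redundant LOAD/STORE pairs.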

Semantic gap
The semantic gap is the difference between the semantics of two domains. For a compiler, the two domains are the domain of the source language and the execution domain. The semantics of these two domains are very different, so a gap exists between them. The following four features (data types, data structures, scope, and control structures) show how the compiler bridges the semantic gap.

Causes of semantic gap: data types
A data type is a specification of (1) the values that entities of the type may have and (2) the operations that may be performed on entities of the type. We refer to these values and operations as legal values and legal operations. The compiler must check that variables of a particular data type are assigned only legal values and that variables and values of a type are manipulated only through legal operations, and it must issue error messages when these requirements are not met. Type conversion is used to convert a value of one type into another type; only conversions between certain types are possible. For example, an int value is converted to the real data type to perform arithmetic together with a real value.
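The checks on legal operations and the int-to-real conversion can be sketched as a recursive type checker over expression trees written as nested tuples. The declared types in SYMBOLS and the name inttoreal for the conversion node are assumptions for this example; only binary arithmetic operators are handled.

```python
# Declared types of variables (assumed for this example).
SYMBOLS = {"i": "int", "n": "int", "x": "real", "flag": "bool"}
ARITH = {"+", "-", "*", "/"}
NUMERIC = ("int", "real")


def check(node):
    """Return (typed_node, type) for a binary expression tree.
    Arithmetic is legal only on int/real operands; when int and real are
    mixed, the int operand is wrapped in an ('inttoreal', ...) node."""
    if isinstance(node, str):
        if node not in SYMBOLS:
            raise NameError(f"undeclared variable {node!r}")
        return node, SYMBOLS[node]
    op, left, right = node
    lnode, ltype = check(left)
    rnode, rtype = check(right)
    if op not in ARITH or ltype not in NUMERIC or rtype not in NUMERIC:
        raise TypeError(f"illegal operation {op!r} on {ltype} and {rtype}")
    if ltype == rtype:
        return (op, lnode, rnode), ltype
    # Mixed int/real: convert the int operand to real.
    if ltype == "int":
        lnode = ("inttoreal", lnode)
    else:
        rnode = ("inttoreal", rnode)
    return (op, lnode, rnode), "real"


print(check(("*", "x", "i")))
```

Rejecting flag + i illustrates an illegal operation, while x * i is accepted after an explicit conversion node is inserted for i.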

Data structures
A program may use data structures such as arrays, stacks, records, or lists. To generate code for a reference to a specific element of a data structure, the compiler must develop a memory mapping that finds the memory words corresponding to the required element. A record (or structure), which is a heterogeneous data structure, requires a more complex memory mapping.
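The memory mappings can be sketched as address computations. This assumes row-major storage for arrays and records whose fields are laid out one after another without alignment padding; real compilers may also insert padding.

```python
def array_address(base, indices, lower, upper, elem_size):
    """Row-major address of A[i1, ..., ik], where dimension d has
    bounds lower[d]..upper[d] and each element occupies elem_size bytes."""
    offset = 0
    for d, i in enumerate(indices):
        if not lower[d] <= i <= upper[d]:
            raise IndexError(f"subscript {i} out of bounds in dimension {d}")
        extent = upper[d] - lower[d] + 1
        offset = offset * extent + (i - lower[d])
    return base + offset * elem_size


def record_layout(fields):
    """Offsets of record fields placed one after another
    (alignment padding is ignored in this sketch)."""
    offsets, offset = {}, 0
    for name, size in fields:
        offsets[name] = offset
        offset += size
    return offsets, offset  # field offsets and total record size


# A[1..10, 1..20] of 4-byte elements starting at address 1000.
print(array_address(1000, (2, 3), (1, 1), (10, 20), 4))
print(record_layout([("id", 4), ("name", 20), ("salary", 8)]))
```

For A[2, 3] the mapping computes 1000 + ((2 - 1) * 20 + (3 - 1)) * 4 = 1088. For the record, each field's offset is the total size of the fields before it.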

Scope
The scope of a program entity (e.g. a data item) is the part of a program in which the entity is accessible. Scope rules determine whether a variable is accessible at a specific place in the program. Generally, the scope of a data item is restricted to the program block in which it is declared.
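A common way to implement these scope rules is a symbol table with one dictionary per open block, searched from the innermost block outward. This sketch assumes the usual rule that an inner declaration hides an outer declaration of the same name; the class and method names are hypothetical.

```python
class SymbolTable:
    """Symbol table for a block-structured language."""

    def __init__(self):
        self.scopes = [{}]            # outermost (global) block

    def enter_block(self):
        self.scopes.append({})

    def exit_block(self):
        self.scopes.pop()             # names declared in the block vanish

    def declare(self, name, attrs):
        if name in self.scopes[-1]:
            raise NameError(f"{name!r} already declared in this block")
        self.scopes[-1][name] = attrs

    def lookup(self, name):
        # Innermost block first, so inner declarations hide outer ones.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise NameError(f"{name!r} is not accessible here")


st = SymbolTable()
st.declare("x", {"type": "int"})
st.enter_block()
st.declare("x", {"type": "real"})     # hides the outer x
st.declare("y", {"type": "int"})
print(st.lookup("x"))                 # the inner x
st.exit_block()
print(st.lookup("x"))                 # the outer x again; y is out of scope
```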

Control structures
The control structure of a language is the collection of language features that can be used to alter the flow of control during execution of a program. It includes unconditional and conditional transfer of control, iteration control, and procedure calls. A compiler must ensure that a source program does not violate the semantics of a control structure.
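As an example of how iteration control maps onto transfers of control, this sketch translates a while loop into three-address code that uses only a conditional jump (ifFalse ... goto) and an unconditional jump (goto). The label names and the textual instruction format are assumptions for illustration.

```python
def gen_while(cond, body, begin, end):
    """Translate 'while (cond) body' into three-address code.
    cond is a condition string such as 'i <= n'; body is a list of
    three-address statements; begin and end are fresh label names."""
    code = [f"{begin}: ifFalse {cond} goto {end}"]   # exit when cond is false
    code += ["    " + stmt for stmt in body]
    code += [f"    goto {begin}", f"{end}:"]          # jump back to the test
    return code


for line in gen_while("i <= n", ["sum = sum + i", "i = i + 1"], "L1", "L2"):
    print(line)
```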

Binding and binding times
Each program entity pe in a program P has a set of attributes. If pe is an identifier, it has an attribute kind whose value indicates whether it is a variable, a procedure, or a reserved identifier (keyword). A variable has attributes such as type, dimensionality, scope, and memory address. Note: an attribute of one program entity may itself be another program entity. For example, a type is an attribute of a variable, and the type has its own attributes, such as its size in number of memory bytes.

Binding
A binding is the association of an attribute of a program entity with a value. For example, when the compiler processes the declaration my_type alpha; it binds the type attribute of the variable alpha to my_type. To allocate memory for alpha, the size of my_type must be known, so the size attribute of my_type must have been bound at some earlier time.
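The ordering requirement in this example can be sketched as follows: processing a declaration of the form `T v;` binds the type attribute of v and allocates memory using the size attribute of T, which fails if that size has not been bound yet. The class, the type sizes, and the offset-based allocation are all hypothetical.

```python
class DeclarationProcessor:
    """Hypothetical sketch of processing declarations 'T v;'."""

    def __init__(self):
        # Size attributes (in bytes) of built-in types; values are assumptions.
        self.type_size = {"int": 4, "real": 8}
        self.symbols = {}        # variable name -> bound attributes
        self.next_offset = 0     # next free byte in the data area

    def define_type(self, name, size):
        """Bind the size attribute of a user-defined type."""
        self.type_size[name] = size

    def declare(self, type_name, var):
        """Bind var's type attribute and allocate memory for it."""
        if type_name not in self.type_size:
            raise TypeError(f"size of {type_name!r} has not been bound yet")
        self.symbols[var] = {"type": type_name, "offset": self.next_offset}
        self.next_offset += self.type_size[type_name]


proc = DeclarationProcessor()
proc.define_type("my_type", 12)   # size of my_type is bound first
proc.declare("int", "count")
proc.declare("my_type", "alpha")  # my_type alpha;
print(proc.symbols["alpha"])
```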

Binding times
1) Language definition time of a programming language L: the time at which the features of the language are specified.
2) Language implementation time of L: the time at which the design of a language translator for L is finalized.
3) Compilation time of a program P.
4) Execution init time of a procedure proc.
5) Execution time of a procedure proc.

Language specification of L
The specification of a language L may specify binding times for the attributes of various entities of programs. For example, the specification of a block-structured language may state that the local variables of a procedure are bound at the execution init time of the procedure.

Static and dynamic binding
A static binding is a binding performed before the execution of a program begins. A dynamic binding is a binding performed after the execution of a program has begun. Static binding leads to more efficient execution of a program than dynamic binding.

Data structures used in a compiler
Two data structures are used to manage run-time storage: the stack and the heap. The stack holds activation records: following LIFO order, activation records and data objects are pushed onto the stack when a procedure is called and popped when it returns. The heap is used for dynamic memory allocation and deallocation, where blocks can be allocated and freed in any order.

Fields of an activation record
An activation record is a block of memory used to manage the information needed by a single execution of a procedure. Its fields are:
Return value: stores the result of a function call.
Actual parameters: information about the actual parameters.
Control link (optional): points to the activation record of the calling procedure.
Access link (optional): used to refer to non-local data held in other activation records.
Saved machine status: the status of the machine just before the procedure was called.
Local variables: data that is local to the execution of the procedure.
Temporaries: temporary values, such as those arising in the evaluation of expressions.
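The run-time stack of activation records can be simulated in Python. This is a minimal sketch, assuming a recursive fact procedure as the example; the access link and saved machine status fields are kept only to mirror the list of fields and are not used by the simulation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ActivationRecord:
    procedure: str
    actual_params: dict
    control_link: Optional["ActivationRecord"] = None   # caller's record
    access_link: Optional["ActivationRecord"] = None    # non-local data (unused here)
    saved_status: dict = field(default_factory=dict)    # machine status (unused here)
    local_vars: dict = field(default_factory=dict)
    temporaries: dict = field(default_factory=dict)
    return_value: object = None


class RuntimeStack:
    def __init__(self):
        self.records = []
        self.max_depth = 0

    def push(self, ar):
        self.records.append(ar)
        self.max_depth = max(self.max_depth, len(self.records))

    def pop(self):
        return self.records.pop()

    def top(self):
        return self.records[-1] if self.records else None


def fact(n, rt):
    """Simulated call of fact(n): push a record, run the body, pop it."""
    ar = ActivationRecord("fact", {"n": n}, control_link=rt.top())
    rt.push(ar)                                      # callee's record on top
    if n <= 1:
        ar.return_value = 1
    else:
        ar.temporaries["t1"] = fact(n - 1, rt)       # result of the inner call
        ar.return_value = n * ar.temporaries["t1"]
    rt.pop()                                         # LIFO: record discarded on return
    return ar.return_value


rt = RuntimeStack()
print(fact(5, rt), rt.max_depth)
```

The stack depth reaches 5 for fact(5), one record per active call, and it is empty again once the outermost call returns.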

Heap data structure
The heap is used for the allocation and deallocation of objects. When an object is created, the required amount of memory is allocated for it from the heap. When the object is no longer used, its memory can be freed and returned to the heap. In C, the malloc function allocates memory and the free function deallocates it.

Due to frequent allocation and deallocation, small free areas (holes) are created in memory. Hence memory management techniques are required to collect all such free areas and reuse them effectively. Two popular memory management techniques are reference counting and garbage collection.

Reference count
In this technique, the system associates a reference count with each memory area to indicate how many users or programs are currently using it. The count is incremented when a user gains access to the area and decremented when a user frees it. When the reference count reaches zero, the memory area is free.
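Reference counting can be sketched with a counter per memory area. The classes and method names here are hypothetical, and the free list simply collects areas whose count has dropped to zero.

```python
class MemoryArea:
    """A heap area with a reference count (hypothetical sketch)."""

    def __init__(self, name, size):
        self.name = name
        self.size = size
        self.count = 0          # number of users currently referencing the area


class RefCountingHeap:
    def __init__(self):
        self.free_list = []     # areas whose count dropped to zero

    def acquire(self, area):
        area.count += 1         # a user gains access to the area

    def release(self, area):
        area.count -= 1         # a user frees the area
        if area.count == 0:
            self.free_list.append(area)   # no users left: the area can be reused


heap = RefCountingHeap()
buf = MemoryArea("buf", 64)
heap.acquire(buf)     # first user
heap.acquire(buf)     # second user
heap.release(buf)     # count 1: still in use
heap.release(buf)     # count 0: area is free
print([area.name for area in heap.free_list])
```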

Garbage collector
A garbage collector makes two passes over memory to identify unused areas. In the first pass, it traverses all pointers that point to allocated areas and marks the areas that are in use. In the second pass, it finds all areas that are unmarked and declares them unused, or free. Garbage collection is also known as automatic memory management.
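The two passes can be sketched as a mark-and-sweep collector over a heap modelled as a dictionary from each allocated area to the areas it points to. The roots, the areas referenced directly by the program (for example from the stack), are an assumption of this model.

```python
def mark_and_sweep(heap, roots):
    """heap maps each allocated area to the list of areas it points to;
    roots are the areas the program references directly.
    Frees unreachable areas in place and returns them as a set."""
    marked = set()
    worklist = list(roots)
    # Pass 1 (mark): follow pointers from the roots, marking areas in use.
    while worklist:
        area = worklist.pop()
        if area in marked:
            continue
        marked.add(area)
        worklist.extend(heap[area])
    # Pass 2 (sweep): every unmarked area is unused, so it is freed.
    garbage = {area for area in heap if area not in marked}
    for area in garbage:
        del heap[area]
    return garbage


heap = {"A": ["B"], "B": [], "C": ["D"], "D": ["C"]}
freed = mark_and_sweep(heap, roots=["A"])
print(sorted(freed), sorted(heap))
```

Areas C and D point to each other but are unreachable from the roots. Mark-and-sweep frees them, whereas their reference counts would never drop to zero.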

Memory allocation strategies
A program in the operating system called the memory manager handles memory management by allocating the required amount of primary memory to processes. Memory allocation strategies include first-fit and best-fit.

First-fit
Suppose there are many free blocks (holes) in memory. First-fit allocates the first hole that satisfies the memory requirement of the process. Example: a process requires 15 kb of memory, and the memory manager has a list of unallocated blocks of 10 kb, 18 kb, 16 kb, 25 kb, and 19 kb. First-fit allocates the 18 kb block.

Best-fit
Best-fit allocates the hole that best suits the process, i.e. the smallest hole that is large enough. From the same list, the memory manager allocates the 16 kb block to the process, leaving a 1 kb free area. The very small free memory areas that remain after allocation cause fragmentation. This problem can be resolved by memory compaction techniques.
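Both strategies can be sketched as searches over a list of hole sizes, using the 10, 18, 16, 25, and 19 kb holes from the example. The allocate helper, which shrinks the chosen hole and leaves the remainder free, is a simplification for illustration.

```python
def first_fit(holes, request):
    """Index of the first hole large enough for the request, or None."""
    for i, size in enumerate(holes):
        if size >= request:
            return i
    return None


def best_fit(holes, request):
    """Index of the smallest hole large enough for the request, or None."""
    candidates = [(size, i) for i, size in enumerate(holes) if size >= request]
    return min(candidates)[1] if candidates else None


def allocate(holes, request, strategy):
    """Take request kb from the hole chosen by strategy; the unused
    remainder of that hole stays free. Returns the chosen index."""
    i = strategy(holes, request)
    if i is None:
        raise MemoryError(f"no hole of {request} kb or more")
    holes[i] -= request
    return i


holes = [10, 18, 16, 25, 19]     # free blocks in kb, from the example
print(holes[first_fit(holes, 15)], holes[best_fit(holes, 15)])   # 18 16
```

After allocating with best-fit, the 16 kb hole shrinks to a 1 kb fragment, which is the kind of small free area that compaction later merges.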

Compilation of expressions
Expressions contain arithmetic operators and operands, and they can be converted into an intermediate code form. The various forms of intermediate code are:
1. Abstract syntax tree
2. Polish notation
3. Three-address code

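Abstract syntax tree
Consider the string x := -a * b + -a * b. A syntax tree represents the natural hierarchical structure of this string: the root is the assignment :=, with x as its left child and a + node as its right child; each operand of + is a * node whose children are the unary minus applied to a and the leaf b.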

Polish notation
Polish notation is postfix or prefix notation. For example, for x = -a * b + -a * b the postfix form is
x a - b * a - b * + =
where - denotes unary minus.
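Postfix notation is produced by a post-order traversal of the syntax tree. In this sketch the tree is written as nested tuples and unary minus is spelled uminus to keep it distinct from binary minus; both choices are assumptions of the example.

```python
def postfix(node):
    """Post-order traversal: operands first, then the operator.
    Leaves are variable names; interior nodes are (op, child, ...)."""
    if isinstance(node, str):
        return [node]
    op, *children = node
    out = []
    for child in children:
        out += postfix(child)
    return out + [op]


tree = ("=", "x", ("+", ("*", ("uminus", "a"), "b"),
                        ("*", ("uminus", "a"), "b")))
print(" ".join(postfix(tree)))   # x a uminus b * a uminus b * + =
```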

Three-address code: quadruple representation
In the quadruple representation, each instruction has four fields: op, arg1, arg2, and result.

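Triples
In the triple representation, each instruction has three fields: op, arg1, and arg2. The result field is dropped; a temporary value is referred to by the position of the triple that computes it.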

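Indirect triples
Indirect triples use a separate list of pointers to triples to specify the order of execution, so that instructions can be reordered without renumbering the triples themselves.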
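The three representations can be sketched for x = -a * b + -a * b. The quadruple generator walks the syntax tree (nested tuples, with uminus for unary minus), the triples are derived by replacing temporaries with triple positions, and the execution-order list passed to run plays the role of the indirect-triple pointer table. All names are hypothetical; run evaluates the triples only to show that reordering the pointer list preserves the result.

```python
import operator


class QuadrupleGenerator:
    """Emit quadruples (op, arg1, arg2, result) from a nested-tuple tree."""

    def __init__(self):
        self.quads = []
        self.ntemps = 0

    def new_temp(self):
        self.ntemps += 1
        return f"t{self.ntemps}"

    def gen(self, node):
        """Emit code for node and return the name holding its value."""
        if isinstance(node, str):
            return node
        op, *args = node
        if op == "=":
            target, expr = args
            self.quads.append(("=", self.gen(expr), None, target))
            return target
        operands = [self.gen(arg) for arg in args]
        temp = self.new_temp()
        arg2 = operands[1] if len(operands) > 1 else None
        self.quads.append((op, operands[0], arg2, temp))
        return temp


def to_triples(quads):
    """Drop the result field: a temporary becomes '(k)', the position
    of the triple that computes it."""
    position = {}

    def ref(arg):
        return f"({position[arg]})" if arg in position else arg

    triples = []
    for k, (op, arg1, arg2, result) in enumerate(quads):
        if op == "=":
            triples.append(("=", result, ref(arg1)))   # store into a variable
        else:
            triples.append((op, ref(arg1), ref(arg2)))
            position[result] = k
    return triples


ARITH = {"+": operator.add, "*": operator.mul}


def run(order, triples, env):
    """Execute triples in the order given by the pointer list `order`."""
    value = {}

    def val(arg):
        return value[int(arg[1:-1])] if arg.startswith("(") else env[arg]

    for k in order:
        op, arg1, arg2 = triples[k]
        if op == "uminus":
            value[k] = -val(arg1)
        elif op == "=":
            env[arg1] = val(arg2)
        else:
            value[k] = ARITH[op](val(arg1), val(arg2))
    return env


tree = ("=", "x", ("+", ("*", ("uminus", "a"), "b"),
                        ("*", ("uminus", "a"), "b")))
qgen = QuadrupleGenerator()
qgen.gen(tree)
triples = to_triples(qgen.quads)
for k, (quad, triple) in enumerate(zip(qgen.quads, triples)):
    print(k, quad, triple)
```

Running the pointer list [0, 2, 1, 3, 4, 5] instead of [0, 1, 2, 3, 4, 5] reorders execution without touching the triple table, which is the point of the indirect form.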

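Code optimization techniques
1) Common subexpression elimination
Example (before):
t1 = 4 * i
t2 = a[t1]
t3 = 4 * j
t4 = 4 * i
t5 = n
t6 = b[t4] + t5
Since t4 = 4 * i recomputes the value already held in t1, the statement t4 = 4 * i is eliminated and the last statement becomes:
t6 = b[t1] + t5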
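Local common subexpression elimination over one basic block can be sketched as follows. Instructions are tuples (result, op, arg1, arg2), with [] standing for array indexing. The sketch assumes that each temporary is assigned only once in the block, as in the example; in strict three-address form, t6 = b[t4] + t5 is split into two instructions.

```python
def local_cse(block):
    """Remove recomputations of expressions already available in the block.
    Assumes each temporary is assigned only once in the block."""
    available = {}   # (op, arg1, arg2) -> name already holding that value
    rename = {}      # eliminated temporary -> name to use instead
    out = []
    for result, op, arg1, arg2 in block:
        arg1, arg2 = rename.get(arg1, arg1), rename.get(arg2, arg2)
        key = (op, arg1, arg2)
        if key in available:
            rename[result] = available[key]   # reuse the earlier value
            continue
        # result is (re)defined: expressions that use it or are held in it die
        available = {k: v for k, v in available.items()
                     if result not in (k[1], k[2], v)}
        out.append((result, op, arg1, arg2))
        if result not in (arg1, arg2):        # e.g. i = i + 1 is not reusable
            available[key] = result
    return out


block = [("t1", "*", "4", "i"), ("t2", "[]", "a", "t1"),
         ("t3", "*", "4", "j"), ("t4", "*", "4", "i"),
         ("t5", "copy", "n", None), ("t6", "[]", "b", "t4"),
         ("t7", "+", "t6", "t5")]
for ins in local_cse(block):
    print(ins)
```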

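Code motion
Move loop-invariant code out of the loop, so that it is executed once before the loop starts.
Example (before):
while (i <= max - 1) { sum = sum + a[i]; i++; }
After: the value max - 1 does not change inside the loop, so it is computed once before the loop:
n = max - 1;
while (i <= n) { sum = sum + a[i]; i++; }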
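The search for loop-invariant code can be sketched over a loop body in three-address form, including the instruction that computes the loop bound. This sketch assumes straight-line loop bodies with no conditional code, and it does not check whether a hoisted result is used before its definition in the loop; a real compiler must verify both before moving code.

```python
from collections import Counter


def hoist_invariants(loop_body):
    """Split a loop body into (hoisted, remaining) instruction lists.
    Each instruction is (result, op, arg1, arg2). An instruction is hoisted
    when its result is assigned only once in the loop and none of its
    operands is assigned by an instruction still inside the loop."""
    assign_count = Counter(ins[0] for ins in loop_body)
    hoisted, remaining = [], list(loop_body)
    changed = True
    while changed:                     # hoisting one may make others invariant
        changed = False
        defined_in_loop = {ins[0] for ins in remaining}
        for ins in remaining:
            result, _op, arg1, arg2 = ins
            if (assign_count[result] == 1
                    and arg1 not in defined_in_loop
                    and arg2 not in defined_in_loop):
                hoisted.append(ins)
                remaining.remove(ins)
                changed = True
                break
    return hoisted, remaining


# while (i <= max - 1) { sum = sum + a[i]; i++; }
loop_body = [
    ("t1", "-", "max", "1"),     # loop bound, evaluated every iteration
    ("t2", "[]", "a", "i"),      # t2 = a[i]
    ("sum", "+", "sum", "t2"),
    ("i", "+", "i", "1"),
]
hoisted, remaining = hoist_invariants(loop_body)
print("before loop:", hoisted)
print("in loop:", remaining)
```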

Strength reduction
Replace expensive operations by cheaper ones, e.g. multiplication by addition. Example:
for (i = 0; i <= 50; i++) { count = i * 7; }
can be written as
temp = 0;
for (i = 0; i <= 50; i++) { count = temp; temp = temp + 7; }
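The equivalence of the two loops can be checked by running both. Note that temp must start at 0, the value of i * 7 when i = 0. The function names are hypothetical; the benefit of the cheaper addition appears at the machine-instruction level, not in Python itself.

```python
def with_multiplication(limit):
    """for (i = 0; i <= limit; i++) { count = i * 7; }"""
    counts = []
    for i in range(limit + 1):
        count = i * 7
        counts.append(count)          # record each value assigned to count
    return counts


def with_addition(limit):
    """temp = 0; for (...) { count = temp; temp = temp + 7; }"""
    counts = []
    temp = 0                          # equals i * 7 for the current i
    for _ in range(limit + 1):
        count = temp
        counts.append(count)
        temp = temp + 7               # addition replaces the multiplication
    return counts


print(with_multiplication(50) == with_addition(50))
```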

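Dead code elimination
A variable is dead if its value is not used anywhere in the program, and code whose results are never used or that can never be executed is dead code. In the example below, the assignment A = x + 5 is dead code because the condition i == 1 can never be true:
i = 0;
if (i == 1) { A = x + 5; }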
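Removing assignments whose values are never used can be sketched with a backward pass over straight-line three-address code that tracks which variables are still needed (live). This handles the dead-variable case from the definition; detecting that the branch i == 1 is unreachable would additionally require constant propagation. The sketch assumes instructions have no side effects such as calls or pointer stores, and integers stand for constants.

```python
def eliminate_dead(block, live_out):
    """Remove assignments whose result is never used afterwards.
    block: list of (result, op, arg1, arg2) in straight-line code;
    live_out: variables still needed after the block."""
    live = set(live_out)
    kept = []
    for result, op, arg1, arg2 in reversed(block):
        if result not in live:
            continue                            # value never used: dead code
        live.discard(result)                    # this definition supplies the use
        live.update(a for a in (arg1, arg2) if isinstance(a, str))
        kept.append((result, op, arg1, arg2))
    kept.reverse()
    return kept


block = [("i", "copy", 0, None),      # i = 0   (never used afterwards)
         ("t1", "+", "x", 5),         # t1 = x + 5
         ("A", "copy", "t1", None),   # A = t1
         ("y", "*", "x", 2)]          # y = x * 2 (never used afterwards)
print(eliminate_dead(block, live_out={"A"}))
```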

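Copy propagation
Copy propagation means using one variable instead of another. For example, after the copy x = pi, a later use of x can be replaced by pi:
x = pi;
…
area = x * r * r;   becomes   area = pi * r * r;
Here the variable x is eliminated.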
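Copy propagation within a basic block can be sketched by remembering copies x = y and substituting y for later uses of x until either variable is reassigned. The instruction tuples and the copy operator name are assumptions of this example.

```python
def propagate_copies(block):
    """After a copy 'x = y', replace later uses of x by y until x or y
    is reassigned. Instructions are (result, op, arg1, arg2)."""
    copies = {}   # x -> y for copies that are still valid
    out = []
    for result, op, arg1, arg2 in block:
        arg1, arg2 = copies.get(arg1, arg1), copies.get(arg2, arg2)
        # result is being redefined: forget every copy that involves it
        copies = {x: y for x, y in copies.items() if result not in (x, y)}
        if op == "copy":
            copies[result] = arg1
        out.append((result, op, arg1, arg2))
    return out


block = [("x", "copy", "pi", None),    # x = pi
         ("t1", "*", "x", "r"),        # t1 = x * r
         ("area", "*", "t1", "r")]     # area = t1 * r
for ins in propagate_copies(block):
    print(ins)
```

After propagation the copy x = pi is no longer used, so dead code elimination can remove it if x is not needed later.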

Loop optimization techniques
1) Code motion
2) Induction variables and strength reduction
3) Loop unrolling
4) Loop fusion

Induction variables and reduction in strength
A variable x is called an induction variable of a loop L if its value changes every time the loop executes, being incremented or decremented by some constant. For example:
B1: i = i + 1
t1 = 4 * i
t2 = a[t1]
if t2 < 10 goto B1
Here i and t1 are induction variables. When a loop has two or more induction variables, it may be possible to get rid of all but one.
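The effect of eliminating i can be checked by simulating both versions of the loop. This sketch assumes that a is indexed by byte offset with 4-byte elements, models it as a dictionary from offsets to values, and assumes that i is not needed after the loop; if it were, it could be recovered as t1 // 4.

```python
def original_loop(a, i):
    """B1: i = i + 1; t1 = 4 * i; t2 = a[t1]; if t2 < 10 goto B1"""
    offsets = []
    while True:
        i = i + 1
        t1 = 4 * i
        t2 = a[t1]
        offsets.append(t1)          # record which element was read
        if t2 >= 10:                # leave the loop once t2 < 10 is false
            return offsets


def reduced_loop(a, i):
    """Same loop with i eliminated: t1 is initialised once and stepped by 4."""
    offsets = []
    t1 = 4 * i
    while True:
        t1 = t1 + 4                 # replaces i = i + 1 and t1 = 4 * i
        t2 = a[t1]
        offsets.append(t1)
        if t2 >= 10:
            return offsets


memory = {4: 3, 8: 7, 12: 15}       # byte offset -> element value
print(original_loop(memory, 0), reduced_loop(memory, 0))
```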

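Loop unrolling
In this method, the number of jumps and tests is reduced by writing the loop body twice. For example,
int i = 1;
while (i <= 100) { a[i] = b[i]; i++; }
can be written as
int i = 1;
while (i <= 100) { a[i] = b[i]; i++; a[i] = b[i]; i++; }
so the loop now iterates 50 times instead of 100.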
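Counting loop tests in both versions shows the saving. This sketch mirrors the 1-based arrays with an unused index 0 and assumes the trip count (100) is even, since the body is duplicated exactly twice.

```python
N = 100


def copy_rolled(b):
    a = [0] * (N + 1)          # index 0 unused, mirroring a[1..100]
    i, tests = 1, 0
    while True:
        tests += 1             # one evaluation of the loop test
        if i > N:
            break
        a[i] = b[i]
        i += 1
    return a, tests


def copy_unrolled(b):
    a = [0] * (N + 1)
    i, tests = 1, 0
    while True:
        tests += 1
        if i > N:
            break
        a[i] = b[i]            # body written twice per iteration
        i += 1
        a[i] = b[i]
        i += 1
    return a, tests


b = list(range(N + 1))
print(copy_rolled(b)[1], copy_unrolled(b)[1])   # 101 51
```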

Loop fusion
In loop fusion, several loops are merged into one loop. For example:
for i = 1 to n do
  for j = 1 to m do
    A[i,j] = 10
can be written as
for i = 1 to n*m do
  A[i] = 10
where A is treated as a one-dimensional array of n*m elements stored contiguously.
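The transformation can be checked with a flat list standing for A stored in row-major order. Many texts call this particular transformation loop collapsing and reserve the term loop fusion for merging adjacent loops over the same range. The function names are hypothetical.

```python
def nested(n, m):
    A = [0] * (n * m)                 # A[i,j] stored at (i - 1) * m + (j - 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            A[(i - 1) * m + (j - 1)] = 10
    return A


def fused(n, m):
    A = [0] * (n * m)
    for k in range(1, n * m + 1):     # one loop over all n * m elements
        A[k - 1] = 10
    return A


print(nested(3, 4) == fused(3, 4))
```

The single loop is valid here because A is contiguous and every element receives the same value; a body that depended on i and j separately would need them recomputed from k.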
