Microprocessors Frame Pointers and the use of the –fomit-frame-pointer switch Feb 25th, 2002.

Slides:



Advertisements
Similar presentations
Calling sequence ESP.
Advertisements

The University of Adelaide, School of Computer Science
COMP3221 lec16-function-II.1 Saeid Nooshabadi COMP 3221 Microprocessors and Embedded Systems Lectures 16 : Functions in C/ Assembly - II
University of Washington Procedures and Stacks II The Hardware/Software Interface CSE351 Winter 2013.
Procedures II (1) Fall 2005 Lecture 07: Procedure Calls (Part 2)
The University of Adelaide, School of Computer Science
CSCI-365 Computer Organization Lecture Note: Some slides and/or pictures in the following are adapted from: Computer Organization and Design, Patterson.
Procedures and Stacks. Outline Stack organization PUSH and POP instructions Defining and Calling procedures.
Informationsteknologi Saturday, September 29, 2007 Computer Architecture I - Class 41 Today’s class More assembly language programming.
1 Storage Registers vs. memory Access to registers is much faster than access to memory Goal: store as much data as possible in registers Limitations/considerations:
1 Chapter 7: Runtime Environments. int * larger (int a, int b) { if (a > b) return &a; //wrong else return &b; //wrong } int * larger (int *a, int *b)
Memory Image of Running Programs Executable file on disk, running program in memory, activation record, C-style and Pascal-style parameter passing.
1 Function Calls Professor Jennifer Rexford COS 217 Reading: Chapter 4 of “Programming From the Ground Up” (available online from the course Web site)
CS 536 Spring Code generation I Lecture 20.
Honors Compilers Addressing of Local Variables Mar 19 th, 2002.
1 Homework Reading –PAL, pp , Machine Projects –Finish mp2warmup Questions? –Start mp2 as soon as possible Labs –Continue labs with your.
Semantics of Calls and Returns
Assembly תרגול 8 פונקציות והתקפת buffer.. Procedures (Functions) A procedure call involves passing both data and control from one part of the code to.
Run-time Environment and Program Organization
INVOKE Directive The INVOKE directive is a powerful replacement for Intel’s CALL instruction that lets you pass multiple arguments Syntax: INVOKE procedureName.
Stacks and Frames Demystified CSCI 3753 Operating Systems Spring 2005 Prof. Rick Han.
Chapter 8 :: Subroutines and Control Abstraction
Universal Concepts of Programming Creating and Initializing local variables on the stack Variable Scope and Lifetime Stack Parameters Stack Frames Passing.
Computer Architecture and Operating Systems CS 3230 :Assembly Section Lecture 7 Department of Computer Science and Software Engineering University of Wisconsin-Platteville.
Fall 2008CS 334: Computer SecuritySlide #1 Smashing The Stack A detailed look at buffer overflows as described in Smashing the Stack for Fun and Profit.
CS 11 C track: lecture 5 Last week: pointers This week: Pointer arithmetic Arrays and pointers Dynamic memory allocation The stack and the heap.
Carnegie Mellon Introduction to Computer Systems /18-243, spring 2009 Recitation, Jan. 14 th.
Today’s topics Parameter passing on the system stack Parameter passing on the system stack Register indirect and base-indexed addressing modes Register.
Programming Language Principles Lecture 24 Prepared by Manuel E. Bermúdez, Ph.D. Associate Professor University of Florida Subroutines.
CSc 453 Runtime Environments Saumya Debray The University of Arizona Tucson.
Functions and Procedures. Function or Procedure u A separate piece of code u Possibly separately compiled u Located at some address in the memory used.
Assembly Language for Intel-Based Computers, 6 th Edition Chapter 8: Advanced Procedures (c) Pearson Education, All rights reserved. You may.
Copyright © 2005 Elsevier Chapter 8 :: Subroutines and Control Abstraction Programming Language Pragmatics Michael L. Scott.
Lecture 19: 11/7/2002CS170 Fall CS170 Computer Organization and Architecture I Ayman Abdel-Hamid Department of Computer Science Old Dominion University.
Procedures – Generating the Code Lecture 21 Mon, Apr 4, 2005.
Implementing Subprograms What actions must take place when subprograms are called and when they terminate? –calling a subprogram has several associated.
Prof. Fateman CS 164 Lecture 281 Virtual Machine Structure Lecture 28.
Microprocessors The ia32 User Instruction Set Jan 31st, 2002.
ITCS 3181 Logic and Computer Systems 2015 B. Wilkinson Slides4-2.ppt Modification date: March 23, Procedures Essential ingredient of high level.
CS412/413 Introduction to Compilers and Translators Spring ’99 Lecture 11: Functions and stack frames.
Functions/Methods in Assembly
Compiler Construction Code Generation Activation Records
University of Amsterdam Computer Systems – the instruction set architecture Arnoud Visser 1 Computer Systems The instruction set architecture.
ARM-7 Assembly: Example Programs 1 CSE 2312 Computer Organization and Assembly Language Programming Vassilis Athitsos University of Texas at Arlington.
1 Assembly Language: Function Calls Jennifer Rexford.
Calling Procedures C calling conventions. Outline Procedures Procedure call mechanism Passing parameters Local variable storage C-Style procedures Recursion.
ICS51 Introductory Computer Organization Accessing parameters from the stack and calling functions.
Computer Organization CS345 David Monismith Based upon notes by Dr. Bill Siever and notes from the Patterson and Hennessy Text.
Section 5: Procedures & Stacks
Programs – Calling Conventions
Instructions for test_function
Storage Classes There are three places in memory where data may be placed: In Data section declared with .data in assembly language in C - Static) On the.
C function call conventions and the stack
Computer Architecture and Assembly Language
143A: Principles of Operating Systems Lecture 4: Calling conventions
Introduction to Compilers Tim Teitelbaum
Functions and Procedures
Chapter 7 Subroutines Dr. A.P. Preethy
Chapter 9 :: Subroutines and Control Abstraction
Chap. 8 :: Subroutines and Control Abstraction
Chap. 8 :: Subroutines and Control Abstraction
Stack Frames and Advanced Procedures
Stack Frame Linkage.
Procedures – Overview Lecture 19 Mon, Mar 28, 2005.
Assembly Language Programming II: C Compiler Calling Sequences
MIPS Instructions.
Understanding Program Address Space
Topic 3-a Calling Convention 1/10/2019.
Presentation transcript:

Microprocessors Frame Pointers and the use of the –fomit-frame-pointer switch Feb 25th, 2002

General Outline Usually a function uses a frame pointer to address the local variables and parameters Usually a function uses a frame pointer to address the local variables and parameters It is possible in some limited circumstances to avoid the use of the frame pointer, and use the stack pointer instead. It is possible in some limited circumstances to avoid the use of the frame pointer, and use the stack pointer instead. The -fomit-frame-pointer switch of gcc triggers this switch. This set of slides describes the effect of this feature. The -fomit-frame-pointer switch of gcc triggers this switch. This set of slides describes the effect of this feature.

-fomit-frame-pointer Consider this example Consider this example Int q (int a, int b) { int c; int d; c = a + 4; d = isqrt (b); return c + d; } Int q (int a, int b) { int c; int d; c = a + 4; d = isqrt (b); return c + d; }

Calling the function The caller does something like The caller does something like push second-arg (b) push first-arg (a) call q add esp, 8 push second-arg (b) push first-arg (a) call q add esp, 8

Stack at function entry Stack contents (top of memory first) Stack contents (top of memory first) Argument b Argument a return point  ESP Argument b Argument a return point  ESP

Code of q itself The prolog push ebp mov ebp,esp sub esp, 8 The prolog push ebp mov ebp,esp sub esp, 8

Stack after the prolog Immediately after the sub of esp second argument (b) first argument (a) return point old EBP  EBP value of c value of d  ESP Immediately after the sub of esp second argument (b) first argument (a) return point old EBP  EBP value of c value of d  ESP

Addressing using Frame Pointer The local variables and arguments are addressed by using fixed offsets from the frame pointer (ESP is not referenced) The local variables and arguments are addressed by using fixed offsets from the frame pointer (ESP is not referenced) A is at [EBP+8] A is at [EBP+8] B is at [EBP+12] B is at [EBP+12] C is at [EBP-4] C is at [EBP-4] D is at [EBP-8] D is at [EBP-8]

Code for q Code after the prolog MOV EAX, [EBP+8]; A ADD EAX,4 MOV [EBP-4], EAX; C PUSH [EBP+12]; B CALL ISQRT ADD ESP, 4 MOV [EBP-8], EAX; D MOV EAX, [EBP-4]; C ADD EAX, [EBP-8]; D Code after the prolog MOV EAX, [EBP+8]; A ADD EAX,4 MOV [EBP-4], EAX; C PUSH [EBP+12]; B CALL ISQRT ADD ESP, 4 MOV [EBP-8], EAX; D MOV EAX, [EBP-4]; C ADD EAX, [EBP-8]; D

Optimizing use of ESP We don’t really need to readjust ESP after a CALL, just so long as we do not leave junk on the stack permanently. We don’t really need to readjust ESP after a CALL, just so long as we do not leave junk on the stack permanently. The epilog will clean the entire frame anyway. The epilog will clean the entire frame anyway. Let’s use this to improve the code Let’s use this to improve the code

Code with ESP optimization Code after the prolog MOV EAX, [EBP+8]; A ADD EAX,4 MOV [EBP-4], EAX; C PUSH [EBP+12]; B CALL ISQRT MOV [EBP-8], EAX; D MOV EAX, [EBP-4]; C ADD EAX, [EBP-8]; D Code after the prolog MOV EAX, [EBP+8]; A ADD EAX,4 MOV [EBP-4], EAX; C PUSH [EBP+12]; B CALL ISQRT MOV [EBP-8], EAX; D MOV EAX, [EBP-4]; C ADD EAX, [EBP-8]; D We omitted the ADD after the CALL, not needed We omitted the ADD after the CALL, not needed

Epilog Clean up and return Clean up and return MOV ESP, EBP POP EBP RET RETOr LEAVE RET

-fomit-frame-pointer Now we will look at the effect of omitting the frame pointer on the same example, that is we will compile this with the –fomit-frame-pointer switch set. Now we will look at the effect of omitting the frame pointer on the same example, that is we will compile this with the –fomit-frame-pointer switch set. Int q (int a, int b) { int c; int d; c = a + 4; d = isqrt (b); return c + d; } Int q (int a, int b) { int c; int d; c = a + 4; d = isqrt (b); return c + d; }

Calling the function The caller does something like The caller does something like push second-arg (b) push first-arg (a) call q add esp, 8 push second-arg (b) push first-arg (a) call q add esp, 8 This is exactly the same as before, the switch affects only the called function, not the caller This is exactly the same as before, the switch affects only the called function, not the caller

Stack at function entry Stack contents (top of memory first) Stack contents (top of memory first) Argument b Argument a return point  ESP Argument b Argument a return point  ESP This is the same as before This is the same as before

Code of q itself The prolog sub esp, 8 The prolog sub esp, 8 That’s quite different, we have saved some instructions by neither saving nor setting the frame pointer That’s quite different, we have saved some instructions by neither saving nor setting the frame pointer

Stack after the prolog Immediately after the sub of esp second argument (b) first argument (a) return point value of c value of d  ESP Immediately after the sub of esp second argument (b) first argument (a) return point value of c value of d  ESP

Addressing using Stack Pointer The local variables and arguments are addressed by using fixed offsets from the stack pointer The local variables and arguments are addressed by using fixed offsets from the stack pointer A is at [ESP+12] A is at [ESP+12] B is at [ESP+16] B is at [ESP+16] C is at [ESP+4] C is at [ESP+4] D is at [ESP] D is at [ESP]

Code for q Code after the prolog MOV EAX, [ESP+12]; A ADD EAX,4 MOV [ESP+4], EAX; C PUSH [ESP+16]; B CALL ISQRT ADD ESP, 4 MOV [ESP], EAX; D MOV EAX, [ESP+4]; C ADD EAX, [ESP]; D Code after the prolog MOV EAX, [ESP+12]; A ADD EAX,4 MOV [ESP+4], EAX; C PUSH [ESP+16]; B CALL ISQRT ADD ESP, 4 MOV [ESP], EAX; D MOV EAX, [ESP+4]; C ADD EAX, [ESP]; D

Epilog for –fomit-frame-pointer We must remove the 8 bytes of local parameters from the stack, so that ESP is properly set for the RET instruction We must remove the 8 bytes of local parameters from the stack, so that ESP is properly set for the RET instruction ADD ESP,8 RET

Why not always use ESP? Problems with debugging Problems with debugging Debugger relies on hopping back frames using saved frame pointers (which form a linked list of frames) to do back traces etc. Debugger relies on hopping back frames using saved frame pointers (which form a linked list of frames) to do back traces etc. If code causes ESP to move then there are difficulties If code causes ESP to move then there are difficulties Push of parameters Push of parameters Dynamic arrays Dynamic arrays Use of alloca Use of alloca

Pushing Parameters Pushing parameters modifies ESP Pushing parameters modifies ESP Sometimes no problem, as in our example here, since we undo the modification immediately after the call. Sometimes no problem, as in our example here, since we undo the modification immediately after the call. But suppose we had called FUNC(B,B) But suppose we had called FUNC(B,B) We could not do We could not do PUSH [ESP+16] PUSH [ESP+16] PUSH [ESP+16] PUSH [ESP+16] Since ESP is moved by the first PUSH Since ESP is moved by the first PUSH

More on ESP handling Once again Once again PUSH [ESP+16] PUSH [ESP+16] PUSH [ESP+16] PUSH [ESP+16] Would not work, but we can keep track of the fact that ESP has moved and do Would not work, but we can keep track of the fact that ESP has moved and do PUSH [ESP+16]; Push B PUSH [ESP+20]; Push B again PUSH [ESP+16]; Push B PUSH [ESP+20]; Push B again And that works fine And that works fine

More on ESP optimization In the case of using the frame pointer, we were able to optimize to remove the add of ESP. In the case of using the frame pointer, we were able to optimize to remove the add of ESP. Can we still do that? Can we still do that? Answer yes, but we have to keep track of the fact that there is an extra word on the stack, so ESP is 4 “off”. Answer yes, but we have to keep track of the fact that there is an extra word on the stack, so ESP is 4 “off”.

Code with ESP optimization Code after the prolog MOV EAX, [ESP+12]; A ADD EAX,4 MOV [ESP+4], EAX; C PUSH [ESP+16]; B CALL ISQRT MOV [ESP+4], EAX; D MOV EAX, [ESP+8]; C ADD EAX, [ESP+4]; D Code after the prolog MOV EAX, [ESP+12]; A ADD EAX,4 MOV [ESP+4], EAX; C PUSH [ESP+16]; B CALL ISQRT MOV [ESP+4], EAX; D MOV EAX, [ESP+8]; C ADD EAX, [ESP+4]; D Last three references had to be modified Last three references had to be modified

Epilog for Optimized code We also have to modify the epilog in this case, since now there are 12 bytes on the stack at the exit, 8 from the local parameters, and 4 from the push we did. We also have to modify the epilog in this case, since now there are 12 bytes on the stack at the exit, 8 from the local parameters, and 4 from the push we did. Epilog becomes Epilog becomes ADD ESP,12 RET But no instructions were added But no instructions were added

Other cases of ESP moving Dynamic arrays allocated on the local stack, whose size is not known Dynamic arrays allocated on the local stack, whose size is not known Explicit call to alloca Explicit call to alloca How alloca works How alloca works Subtract given value from ESP Subtract given value from ESP Return ESP value as pointer to new area Return ESP value as pointer to new area These cases are fatal These cases are fatal MUST use a frame pointer in these cases MUST use a frame pointer in these cases

Even better, More optimization Let’s recall our example: Let’s recall our example: Int q (int a, int b) { int c; int d; c = a + 4; d = isqrt (b); return c + d; } Int q (int a, int b) { int c; int d; c = a + 4; d = isqrt (b); return c + d; } We can rewrite this to avoid the use of the local parameters c and d completely, and the compiler can do the same thing. We can rewrite this to avoid the use of the local parameters c and d completely, and the compiler can do the same thing.

Optimized Version With some optimization, we can write With some optimization, we can write Int q (int a, int b) { return isqrt (b) + a + 4; } Int q (int a, int b) { return isqrt (b) + a + 4; } We are not suggesting that the user have to rewrite the code this way, we want the compiler to do it automatically We are not suggesting that the user have to rewrite the code this way, we want the compiler to do it automatically

Optimizations We Used Commutative Optimization Commutative Optimization A + B = B + A A + B = B + A Associative Optimization Associative Optimization A + (B + C) = (A + B) + C A + (B + C) = (A + B) + C For integer operands, these optimizations are certainly valid (well see fine point on next slide) For integer operands, these optimizations are certainly valid (well see fine point on next slide) Floating-point is another matter! Floating-point is another matter!

A fine Point The transformation of The transformation of (A + B) + C to A + (B + C) (A + B) + C to A + (B + C) Works fine in 2’s complement integer arithmetic with no overflow, which is the code the compiler will generate Works fine in 2’s complement integer arithmetic with no overflow, which is the code the compiler will generate But strictly at the C source level, B+C might overflow, so at the source level this transformation is not technically correct But strictly at the C source level, B+C might overflow, so at the source level this transformation is not technically correct But we are really talking about compiler optimizations anyway, so this does not matter. But we are really talking about compiler optimizations anyway, so this does not matter.

The optimized code Still omitting the frame pointer, we now have the following modified code for the optimized function Still omitting the frame pointer, we now have the following modified code for the optimized function

The prolog (this slide intentionally blank ) (this slide intentionally blank ) No prolog code is necessary, we can use the stack exactly as it came to us: No prolog code is necessary, we can use the stack exactly as it came to us: second argument (b) first argument (a) return point  ESP second argument (b) first argument (a) return point  ESP And address parameters off unchanged ESP And address parameters off unchanged ESP A is at [ESP+4] A is at [ESP+4] B is at [ESP+8] B is at [ESP+8]

The body of the function Code after the (empty) prolog PUSH[ESP+8]; B CALLISQRT ADDEAX, [ESP+8]; A ADD EAX, 4 Code after the (empty) prolog PUSH[ESP+8]; B CALLISQRT ADDEAX, [ESP+8]; A ADD EAX, 4 Note that the reference to A was adjusted to account for the extra 4 bytes pushed on to the stack before the call to ISQRT. Note that the reference to A was adjusted to account for the extra 4 bytes pushed on to the stack before the call to ISQRT.

The epilog We pushed 4 bytes extra on to the stack, so we need to pop them off We pushed 4 bytes extra on to the stack, so we need to pop them off ADDESP,4 RET ADDESP,4 RET And that’s it, only 6 instructions in all. And that’s it, only 6 instructions in all. Removing the frame pointer really helped here, since it saved 3 instructions and two memory references Removing the frame pointer really helped here, since it saved 3 instructions and two memory references

Other advantages of omitting FP If we omit the frame pointer then we have an extra register If we omit the frame pointer then we have an extra register For the x86, going from 6 to 7 available registers can make a real difference For the x86, going from 6 to 7 available registers can make a real difference Of course we have to save and restore EBP to use it freely Of course we have to save and restore EBP to use it freely But that may well be worth while in a long function, anything to keep things in registers and save memory references is a GOOD THING! But that may well be worth while in a long function, anything to keep things in registers and save memory references is a GOOD THING!

Summary Now you know what this gcc switch does Now you know what this gcc switch does But more importantly, if you understand what it does, you understand all about frame pointers and addressing of data in local frames. But more importantly, if you understand what it does, you understand all about frame pointers and addressing of data in local frames.