Machine-Level Programming: X86-64 Topics Registers Stack Function Calls Local Storage X86-64.ppt CS 105 Tour of Black Holes of Computing.


1 Machine-Level Programming: X86-64
CS 105: Tour of Black Holes of Computing
Topics
- Registers
- Stack
- Function Calls
- Local Storage
X86-64.ppt

2 Interesting Features of x86-64
- Pointers and long integers are 64 bits
- Arithmetic ops support 8-, 16-, 32-, and 64-bit ints
- 16 general-purpose registers
- Full compatibility with 32-bit x86 (IA32) code
- Conditional ops are implemented using conditional move instructions when possible, which is often faster than a branch
- Much of the program state is held in registers
  - Up to 6 arguments passed via registers (not the stack)
  - Some procedures do not need to access the stack at all
- Floating-point ops implemented using register-oriented instructions rather than stack-based ones
- With the move to x86-64, gcc was updated to take advantage of many architecture changes

3 Data Representations: IA32 + x86-64
Sizes of C Objects (in Bytes)

C Data Type    Typical 32-bit   Intel IA32   x86-64
unsigned       4                4            4
int            4                4            4
long int       4                4            8
char           1                1            1
short          2                2            2
float          4                4            4
double         8                8            8
long double    8                10/12        16
char *         4                4            8
(char * stands for any pointer type)

x86-64: 64-bit long ints and 64-bit pointers
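The sizes in the table can be checked on a given machine with sizeof. The sketch below is illustrative only: the function names print_type_sizes and is_lp64 are made up for this example, and the printed values depend on the compiler and ABI (the x86-64 column assumes a typical Linux/gcc setup).

```c
#include <stdio.h>

/* Prints the size in bytes of each C type listed in the table.
   Output depends on the compiler and target ABI. */
void print_type_sizes(void)
{
    printf("unsigned     %zu\n", sizeof(unsigned));
    printf("int          %zu\n", sizeof(int));
    printf("long int     %zu\n", sizeof(long int));
    printf("char         %zu\n", sizeof(char));
    printf("short        %zu\n", sizeof(short));
    printf("float        %zu\n", sizeof(float));
    printf("double       %zu\n", sizeof(double));
    printf("long double  %zu\n", sizeof(long double));
    printf("char *       %zu\n", sizeof(char *));
}

/* Returns 1 when long and pointers are both 8 bytes, as in the
   x86-64 column of the table, and 0 otherwise. */
int is_lp64(void)
{
    return sizeof(long) == 8 && sizeof(void *) == 8;
}
```

On a 32-bit build the same code would be expected to report 4 for long int and char *.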

4 x86-64 Integer Registers - 16
- Extend the existing registers to 64 bits and add 8 new ones
- Word (16-bit) and byte (8-bit) registers are still available; the two low-order bytes are individually addressable in %rax, %rbx, %rcx, %rdx
- %ebp / %rbp becomes general purpose
- Note the 32-bit names

64-bit   32-bit      64-bit   32-bit
%rax     %eax        %r8      %r8d
%rbx     %ebx        %r9      %r9d
%rcx     %ecx        %r10     %r10d
%rdx     %edx        %r11     %r11d
%rsi     %esi        %r12     %r12d
%rdi     %edi        %r13     %r13d
%rsp     %esp        %r14     %r14d
%rbp     %ebp        %r15     %r15d

5 Instructions
- Long word l (4 bytes) ↔ quad word q (8 bytes)
- New instructions:
  - movl → movq
  - addl → addq
  - sall → salq
  - etc.
- 32-bit instructions that generate 32-bit results
  - Set the higher-order bits of the destination register to 0
  - Example: addl
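The zeroing rule for 32-bit results can be modeled in plain C. This is only a sketch of the semantics, not how the hardware is implemented; addl_model and addq_model are hypothetical names, and a uint64_t stands in for a 64-bit register.

```c
#include <stdint.h>

/* Models "addl src, dest" acting on a 64-bit register: the sum is
   computed in 32 bits, and the upper 32 bits of the result are zero. */
uint64_t addl_model(uint64_t dest, uint32_t src)
{
    uint32_t low = (uint32_t)dest + src;   /* 32-bit add, wraps modulo 2^32 */
    return (uint64_t)low;                  /* zero-extend to 64 bits */
}

/* Models "addq src, dest": a full 64-bit add. */
uint64_t addq_model(uint64_t dest, uint64_t src)
{
    return dest + src;
}
```

For example, addl_model(0xFFFFFFFF00000001, 2) yields 3, because the old upper half of the register is discarded.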

6 Swap in 32-bit Mode

    void swap(int *xp, int *yp)
    {
        int t0 = *xp;
        int t1 = *yp;
        *xp = t1;
        *yp = t0;
    }

    swap:
        pushl %ebp              # Setup
        movl  %esp,%ebp
        pushl %ebx

        movl  12(%ebp),%ecx     # Body
        movl  8(%ebp),%edx
        movl  (%ecx),%eax
        movl  (%edx),%ebx
        movl  %eax,(%edx)
        movl  %ebx,(%ecx)

        movl  -4(%ebp),%ebx     # Finish
        movl  %ebp,%esp
        popl  %ebp
        ret

7 Swap in 64-bit Mode
- Operands passed in registers (why useful?)
  - First (xp) in %rdi, second (yp) in %rsi
  - 64-bit pointers
- No stack operations required; in fact, no stack frame
- 32-bit data
  - Data held in registers %eax and %edx
  - movl operation

    void swap(int *xp, int *yp)
    {
        int t0 = *xp;
        int t1 = *yp;
        *xp = t1;
        *yp = t0;
    }

    swap:
        movl (%rdi), %edx
        movl (%rsi), %eax
        movl %eax, (%rdi)
        movl %edx, (%rsi)
        retq

8 x86-64 Integer Registers, 6 used for arguments

%rax   Return value        %r8    Argument #5
%rbx   Callee saved        %r9    Argument #6
%rcx   Argument #4         %r10   Caller saved
%rdx   Argument #3         %r11   Used for linking
%rsi   Argument #2         %r12   Callee saved
%rdi   Argument #1         %r13   Callee saved
%rsp   Stack pointer       %r14   Callee saved
%rbp   Callee saved        %r15   Callee saved

9 Interesting Features of Stack Frame
- Allocate entire frame at once
  - All stack accesses can be relative to %rsp
  - Do so by decrementing the stack pointer
  - Can delay allocation, since it is safe to temporarily use the red zone (the 128 bytes below the stack pointer)
- Simple deallocation
  - Increment the stack pointer
- No base/frame pointer needed

10 x86-64 Registers
- Arguments passed to functions via registers
  - If more than 6 integral parameters, then pass the rest on the stack
  - These registers can be used as caller-saved as well
- All references to the stack frame are made via the stack pointer
  - Eliminates the need to update %ebp/%rbp
- Other registers
  - %r12-%r15, %rbx, %rbp are callee saved
  - 2 or 3 have special uses
  - %rax holds the return value for a function
  - %rsp is the stack pointer
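A function with more than six integer parameters shows where the register/stack split falls. The function sum8 below is a made-up example; the register assignments in the comment follow the argument order given on the register slide and assume a compiler that uses this calling convention.

```c
/* Eight integer arguments. Under the convention described above:
     a -> %rdi, b -> %rsi, c -> %rdx, d -> %rcx, e -> %r8, f -> %r9
     g, h -> passed on the stack
   The return value comes back in %rax. */
long sum8(long a, long b, long c, long d,
          long e, long f, long g, long h)
{
    return a + b + c + d + e + f + g + h;
}
```

Compiling this with gcc -S and reading the output is one way to confirm which operands are fetched from the stack.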

11-12 Reading Condition Codes: x86-64

    int gt (long x, long y)
    {
        return x > y;
    }

    long lgt (long x, long y)
    {
        return x > y;
    }

Body (same for both):

    xorl %eax, %eax    # eax = 0
    cmpq %rsi, %rdi    # Compare x and y
    setg %al           # al = x > y

- SetX instructions
  - Set a single byte based on a combination of condition codes
  - Do not alter the remaining 7 bytes of the register
- Is %rax zero? Yes: 32-bit instructions set the high-order 32 bits to 0!
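Why the xorl is needed before setg can be seen in a small C model of the register. This is a sketch of the semantics only; set_low_byte and gt_model are hypothetical names, and a uint64_t stands in for %rax.

```c
#include <stdint.h>

/* Models a SetX instruction writing to %al: only the low byte of the
   64-bit register changes; the upper 56 bits keep their old value. */
uint64_t set_low_byte(uint64_t rax, uint8_t value)
{
    return (rax & ~(uint64_t)0xFF) | value;
}

/* Models the three-instruction body: xorl clears all of %rax
   (a 32-bit write zeroes the upper half), then setg stores the
   result of the comparison in the low byte. */
uint64_t gt_model(long x, long y)
{
    uint64_t rax = 0;                   /* xorl %eax, %eax */
    return set_low_byte(rax, x > y);    /* cmpq %rsi, %rdi ; setg %al */
}
```

Without the clearing step, whatever was left in the upper bytes of the register would be returned along with the 0 or 1.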

13-14 Conditionals: x86-64

    int absdiff(int x, int y)
    {
        int result;
        if (x > y) {
            result = x-y;
        } else {
            result = y-x;
        }
        return result;
    }

    absdiff:                    # x in %edi, y in %esi
        movl   %edi, %eax       # eax = x
        movl   %esi, %edx       # edx = y
        subl   %esi, %eax       # eax = x-y
        subl   %edi, %edx       # edx = y-x
        cmpl   %esi, %edi       # x:y
        cmovle %edx, %eax       # eax = edx if x <= y
        ret

- Conditional move instruction
  - cmovC src, dest
  - Move value from src to dest if condition C holds
  - More efficient than conditional branching (simple control flow)
  - But overhead: both branches are evaluated
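Because both branches are evaluated, a conditional move only suits code where that is safe and cheap. The two functions below are an illustrative contrast; the names absdiff_expr and read_or_zero are made up, and whether a compiler actually emits cmov for either one depends on the compiler and its flags.

```c
#include <stddef.h>

/* Both arms are cheap and free of side effects, so a compiler may
   compute both and select one with a conditional move. */
int absdiff_expr(int x, int y)
{
    return x > y ? x - y : y - x;
}

/* Here one arm dereferences p. Computing both arms up front would
   read through a null pointer, so the load must stay guarded by the
   test rather than being evaluated unconditionally. */
long read_or_zero(const long *p)
{
    return p ? *p : 0;
}
```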

15 x86-64 Procedure Summary
- Heavy use of registers
  - Parameter passing
  - More temporaries since more registers
- Minimal use of stack
  - Sometimes none
  - Allocate/deallocate entire block
- Many tricky optimizations
  - What kind of stack frame to use
  - Calling with jump
  - Various allocation techniques
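"Calling with jump" is read here as the tail-call optimization; that reading is an assumption, since the slide does not spell it out. A minimal sketch, with the hypothetical functions scale and scale_next:

```c
/* Hypothetical helper; stands in for any callee. */
long scale(long x)
{
    return 3 * x;
}

/* The call to scale() is the last action and its result is returned
   unchanged, so an optimizing compiler may emit "jmp scale" instead
   of "call scale" followed by "ret". No stack frame is needed in
   scale_next for that to work. */
long scale_next(long x)
{
    return scale(x + 1);
}
```

Whether the jump is actually generated depends on the compiler and optimization level.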

16 IA32 Example Addresses
(address range ~2^32)

$esp             0xffffbcd0
p3               0x65586008
p1               0x55585008
p4               0x1904a110
p2               0x1904a008
&p2              0x18049760
beyond           0x08049744
big_array        0x18049780
huge_array       0x08049760
main()           0x080483c6
useless()        0x08049744
final malloc()   0x006be166

malloc() is dynamically linked; its address is determined at runtime.

[Figure: memory layout from 00 to FF, not drawn to scale, showing Text, Data, Heap, and Stack with markers at 08 and 80]

17 x86-64 Example Addresses
(address range ~2^47)

$rsp             0x7ffffff8d1f8
p3               0x2aaabaadd010
p1               0x2aaaaaadc010
p4               0x000011501120
p2               0x000011501010
&p2              0x000010500a60
beyond           0x000000500a44
big_array        0x000010500a80
huge_array       0x000000500a50
main()           0x000000400510
useless()        0x000000400500
final malloc()   0x00386ae6a170

malloc() is dynamically linked; its address is determined at runtime.

[Figure: memory layout from 000000 to 00007F, not drawn to scale, showing Text, Data, Heap, and Stack with a marker at 000030]
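A listing like the one above can be produced by printing one address from each region. The sketch below is not the program behind the slide: print_region_addresses, the array size, and the allocation size are all assumptions, and the addresses printed will differ from run to run and system to system.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

static char big_array[1 << 24];   /* static data; size chosen arbitrarily */

/* Prints one address from each memory region.
   Returns 0 on success, -1 if the heap allocation fails. */
int print_region_addresses(void)
{
    int local = 0;                /* lives on the stack */
    void *p = malloc(1 << 8);     /* lives on the heap */
    if (p == NULL)
        return -1;

    printf("stack  %p\n", (void *)&local);
    printf("heap   %p\n", p);
    printf("data   %p\n", (void *)big_array);
    /* Function pointers are printed via an integer conversion, whose
       result is implementation-defined. */
    printf("text   0x%llx\n",
           (unsigned long long)(uintptr_t)&print_region_addresses);

    free(p);
    return 0;
}
```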

18 x86-64 Summary
- Pointers and long integers are 64 bits
- 16 registers, with 64-, 32-, 16-, and 8-bit uses
- Minimal use of stack
  - Parameters passed in registers
  - Allocate/deallocate entire block for a procedure
  - Can use the 128 bytes past (beyond) the stack pointer (red zone), so the stack can be used without adjusting the stack pointer
  - No frame pointer
- Many tricky optimizations
  - What kind of stack frame to use
  - Calling with jump
  - Various allocation techniques
- Compilers need to be smarter

19 x86-64 Locals in the Red Zone
- Avoiding stack pointer change
  - Can hold all information within a small window beyond the stack pointer

    /* Swap, using local array */
    void swap_a(long *xp, long *yp)
    {
        volatile long loc[2];
        loc[0] = *xp;
        loc[1] = *yp;
        *xp = loc[1];
        *yp = loc[0];
    }

    swap_a:
        movq (%rdi), %rax
        movq %rax, -24(%rsp)
        movq (%rsi), %rax
        movq %rax, -16(%rsp)
        movq -16(%rsp), %rax
        movq %rax, (%rdi)
        movq -24(%rsp), %rax
        movq %rax, (%rsi)
        ret

Stack layout (offsets from %rsp):
      0   rtn Ptr   <- %rsp
     -8   unused
    -16   loc[1]
    -24   loc[0]

20 x86-64 Stack Frame Example
- Keeps values of a and i in callee-saved registers
- Must set up a stack frame to save these registers

    long sum = 0;
    /* Swap a[i] & a[i+1] */
    void swap_ele_su (long a[], int i)
    {
        swap(&a[i], &a[i+1]);
        sum += a[i];
    }

    swap_ele_su:
        movq   %rbx, -16(%rsp)
        movslq %esi,%rbx
        movq   %r12, -8(%rsp)
        movq   %rdi, %r12
        leaq   (%rdi,%rbx,8), %rdi
        subq   $16, %rsp
        leaq   8(%rdi), %rsi
        call   swap
        movq   (%r12,%rbx,8), %rax
        addq   %rax, sum(%rip)
        movq   (%rsp), %rbx
        movq   8(%rsp), %r12
        addq   $16, %rsp
        ret

21-22 Understanding x86-64 Stack Frame

    swap_ele_su:
        movq   %rbx, -16(%rsp)       # Save %rbx
        movslq %esi,%rbx             # Extend & save i
        movq   %r12, -8(%rsp)        # Save %r12
        movq   %rdi, %r12            # Save a (array pointer)
        leaq   (%rdi,%rbx,8), %rdi   # Create &a[i]
        subq   $16, %rsp             # Allocate stack frame
        leaq   8(%rdi), %rsi         # &a[i+1]
        call   swap                  # swap()
        movq   (%r12,%rbx,8), %rax   # a[i]
        addq   %rax, sum(%rip)       # sum += a[i]
        movq   (%rsp), %rbx          # Restore %rbx
        movq   8(%rsp), %r12         # Restore %r12
        addq   $16, %rsp             # Deallocate stack frame
        ret

Stack before "subq $16, %rsp" (offsets from %rsp):
      0   rtn addr   <- %rsp
     -8   %r12
    -16   %rbx

Stack after "subq $16, %rsp" (offsets from %rsp):
    +16   rtn addr
     +8   %r12
      0   %rbx       <- %rsp
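The C source for this example calls swap() on long elements, but the deck only shows an int version of swap. A self-contained sketch, assuming a long variant of swap with the same body, makes the example compilable:

```c
long sum = 0;

/* Assumed long version of the swap() shown earlier for int. */
void swap(long *xp, long *yp)
{
    long t0 = *xp;
    long t1 = *yp;
    *xp = t1;
    *yp = t0;
}

/* Swap a[i] and a[i+1], then add the new a[i] to the global sum.
   Both a and i are needed again after the call to swap(), which is
   why the compiled code keeps them in callee-saved registers. */
void swap_ele_su(long a[], int i)
{
    swap(&a[i], &a[i + 1]);
    sum += a[i];
}
```

Calling swap_ele_su(a, 1) on the array {10, 20, 30} leaves it as {10, 30, 20} and adds 30 to sum.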

23 Interesting Features of Stack Frame
- Allocate entire frame at once
  - All stack accesses can be relative to %rsp
  - Do so by decrementing the stack pointer
  - Can delay allocation, since it is safe to temporarily use the red zone
- Simple deallocation
  - Increment the stack pointer
- No base/frame pointer needed

