Machine-Level Programming IV: Machine Data 9th Lecture

Instructor: David Trammell
Slides modified from those provided by textbook authors Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective (3rd Edition)

Today
Arrays: one-dimensional, multi-dimensional (nested), multi-level
Structures: allocation, access, alignment
Floating point

Machine Memory is a Byte Array
How do we allocate arrays of multi-byte types?
Given an array of data type T and length L, we contiguously allocate a region of L * sizeof(T) bytes.
  char s[12];       12 elements of 1 byte: x to x + 12
  int arr[5];       5 elements of 4 bytes: x, x + 4, x + 8, x + 12, x + 16 (ends at x + 20)
  double data[3];   3 elements of 8 bytes: x, x + 8, x + 16 (ends at x + 24)
  char *cpa[3];     3 elements of 8 bytes: x, x + 8, x + 16 (ends at x + 24)
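The L * sizeof(T) rule can be checked against what the compiler reports for the four declarations above. This is a minimal sketch: the helper name array_bytes is hypothetical, and the element sizes listed on the slide (4-byte int, 8-byte double and pointer) assume x86-64.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed for an array of `len` elements of `elem_size` bytes each:
   the L * sizeof(T) rule. array_bytes is a hypothetical helper name. */
size_t array_bytes(size_t len, size_t elem_size) {
    assert(elem_size > 0);
    return len * elem_size;
}

/* The compiler applies the same rule to the declarations on the slide. */
size_t sizeof_s(void)    { return sizeof(char[12]); }   /* 12 * sizeof(char) */
size_t sizeof_arr(void)  { return sizeof(int[5]); }     /* 5 * sizeof(int) */
size_t sizeof_data(void) { return sizeof(double[3]); }  /* 3 * sizeof(double) */
size_t sizeof_cpa(void)  { return sizeof(char *[3]); }  /* 3 * sizeof(char *) */
```

For example, sizeof_arr() and array_bytes(5, sizeof(int)) give the same number on any platform, whatever the size of int happens to be.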

Array Access in Byte Addressable Memory
Basic principle: T A[L] declares an array of data type T and length L.
Identifier A can be used as a pointer of type (T *) to array element 0.
int arr[5]; holds 1 5 2 1 3 at addresses x, x + 4, x + 8, x + 12, x + 16 (ends at x + 20)
  Reference   Type    Value
  arr[4]      int     3            element 4
  arr         int *   x            pointer to arr[0]
  arr+1       int *   x + 4        pointer to arr[1]
  &arr[2]     int *   x + 8        pointer to arr[2]
  arr[5]      int     (garbage)    out of bounds
  *(arr+1)    int     5            the value at M[x+4]
  &arr[i]     int *   ?
  arr + i     int *   ?

Array Access in Byte Addressable Memory
Basic principle: T A[L] declares an array of data type T and length L (let type T have a size of K bytes).
Identifier A can be used as a pointer of type (T *) to array element 0.
int arr[5]; holds 1 5 2 1 3 at addresses x, x + 4, x + 8, x + 12, x + 16 (ends at x + 20)
  Reference   Type    Value
  arr[4]      int     3            element 4
  arr         int *   x            pointer to arr[0]
  arr+1       int *   x + 4        pointer to arr[1]
  &arr[2]     int *   x + 8        pointer to arr[2]
  arr[5]      int     (garbage)    out of bounds
  *(arr+1)    int     5            the value at M[x+4]
  &arr[i]     int *   x + 4*i      the address of arr[i], not its value
  arr + i     int *   x + K*i      the same address (K = 4 for int)
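The last two rows of the table say that &arr[i] and arr + i are the same address, x + K*i. A small sketch that checks this for an int array; the function names same_address and same_value are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if &arr[i] and arr + i name the same address, and that
   address lies i * sizeof(int) bytes past the start of the array. */
int same_address(const int *arr, size_t i) {
    assert(arr != NULL);
    const char *base      = (const char *)arr;
    const char *via_index = (const char *)&arr[i];
    const char *via_add   = (const char *)(arr + i);
    return via_index == via_add && via_index == base + i * sizeof(int);
}

/* Returns 1 if *(arr + i) and arr[i] read the same element. */
int same_value(const int *arr, size_t i) {
    assert(arr != NULL);
    return *(arr + i) == arr[i];
}
```

Note that the compiler scales the index by the element size: adding i to an int * moves the address by i * sizeof(int) bytes, not by i bytes.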

Array Example
  #define M 5
  int a[M] = { 1, 5, 2, 1, 3 };
  int b[M] = { 0, 2, 1, 3, 9 };
  int c[M] = { 9, 4, 7, 2, 0 };
  int a[]: 1 5 2 1 3 at addresses 16, 20, 24, 28, 32 (ends at 36)
  int b[]: 0 2 1 3 9 at addresses 36, 40, 44, 48, 52 (ends at 56)
  int c[]: 9 4 7 2 0 at addresses 56, 60, 64, 68, 72 (ends at 76)
Array elements are guaranteed to be contiguous in memory.
Separately declared arrays a, b and c might NOT be contiguous with one another (although they are in this example).

Array Accessing Example
int a[]: 1 5 2 1 3 at addresses 16, 20, 24, 28, 32 (ends at 36)
  int get_digit(int *z, int digit) {
      return z[digit];
  }
Register %rdi contains the starting address of the array.
Register %rsi contains the array index.
The desired digit is at %rdi + 4*%rsi, so use the memory reference (%rdi,%rsi,4).
x86-64:
  # %rdi = z (1st arg)
  # %rsi = digit (2nd arg)
  movl (%rdi,%rsi,4), %eax   # z[digit]

Array Loop Example
  void inc_z(int *z) {
      size_t i;
      for (i = 0; i < 5; i++)
          z[i]++;
  }
  # %rdi = z (1st arg)
    movl $0, %eax              # i = 0
    jmp  .L3                   # goto .L3
  .L4:                         # loop start
    addl $1, (%rdi,%rax,4)     # z[i]++
    addq $1, %rax              # i++
  .L3:                         # loop test
    cmpq $4, %rax              # compare i to 4
    jbe  .L4                   # if i <= 4 (unsigned) goto .L4
    rep; ret

Multidimensional (Nested) Arrays
Declaration: T A[R][C];
  2D array of data type T
  R rows, C columns
  Each element of type T requires K bytes
Array size: R * C * K bytes
Arrangement in memory: row-major ordering (all of row 0, then all of row 1, and so on)
int A[R][C]; is laid out as
  A[0][0] … A[0][C-1], A[1][0] … A[1][C-1], …, A[R-1][0] … A[R-1][C-1]
  for a total of 4*R*C bytes
Board: show a 3D example, a[2][3][2], to illustrate row-major order as enumerating indices from right to left.

Nested Array Example
  int MAT[4][5] = {{1, 5, 2, 0, 6},
                   {1, 5, 2, 1, 3},
                   {1, 5, 2, 1, 7},
                   {1, 5, 2, 2, 1}};
  MAT[0]: 1 5 2 0 6 starting at address 76
  MAT[1]: 1 5 2 1 3 starting at address 96
  MAT[2]: 1 5 2 1 7 starting at address 116
  MAT[3]: 1 5 2 2 1 starting at address 136 (ends at 156)
MAT[] is an array of 4 elements, allocated contiguously.
Each element of MAT[] is an array of 5 ints, also allocated contiguously.
This is “row-major” ordering in memory.

Nested Array Row Access
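Address of the start of each row
A[i] is an array of C elements, each of type T requiring K bytes.
Its starting address is A + i * (C * K).
For int MAT[4][5] starting at address 76 (C = 5, K = 4), the rows start at 76, 96, 116 and 136, and the array ends at 156.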

Nested Array, Row Access: MAT[4][5]
  int *get_row(int index, int MAT[][5]) {
      return MAT[index];
  }
  # %rdi = index
  # %rsi = MAT (pointer to start of matrix)
  leaq (%rdi,%rdi,4), %rax   # index + index*4 == 5*index
  leaq (%rsi,%rax,4), %rax   # MAT + 4*(5*index) == MAT + 20*index
Row vector: MAT[index] is an array of 5 ints, with starting address MAT + 20*index.
Machine code: the 1st leaq is used for arithmetic, the 2nd computes an address. How can we tell?

Nested Array Element Access
Array elements: A[i][j] is an element of type T, which requires K bytes.
&A[i][j] is A + i * (C * K) + j * K == A + K * (C*i + j)
  int get_elem(int index, int digit, int MAT[][5]) {
      return MAT[index][digit];
  }
  # %rdi = index
  # %rsi = digit
  # %rdx = MAT (pointer to start of matrix)
  leaq (%rdi,%rdi,4), %rax   # rax = 5*index
  addq %rax, %rsi            # rsi = 5*index + digit
  movl (%rdx,%rsi,4), %eax   # Mem[MAT + 4*(5*index + digit)]
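The formula A + K * (C*i + j) can be applied by hand and compared with what the compiler does for A[i][j]. The following is a sketch only: the names elem_offset and get_elem_manual are hypothetical, and R and C are fixed at 4 and 5 to match the MAT example.

```c
#include <assert.h>
#include <stddef.h>

enum { R = 4, C = 5 };  /* dimensions of the MAT example */

/* Byte offset of A[i][j] from the start of int A[R][C]: K * (C*i + j). */
size_t elem_offset(size_t i, size_t j) {
    assert(i < R && j < C);
    return sizeof(int) * (C * i + j);
}

/* Reads A[i][j] by adding the manual byte offset to the base address,
   instead of letting the compiler index the array. */
int get_elem_manual(int A[R][C], size_t i, size_t j) {
    const char *base = (const char *)A;
    return *(const int *)(base + elem_offset(i, j));
}
```

With 4-byte ints, elem_offset(1, 4) is 4 * (5*1 + 4) = 36, so the element sits 36 bytes past the start of the array.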

Multi-Level Array Example
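  int a[5] = { 1, 5, 2, 1, 3 };
  int b[5] = { 0, 2, 1, 3, 9 };
  int c[5] = { 9, 4, 7, 2, 0 };
  int *ap[3] = {b, a, c};   // pointers!
Variable ap denotes an array of 3 elements.
Each element is a pointer (8 bytes).
Each pointer points to an array of ints.
  ap:  ap[0] = 36 at address 160, ap[1] = 16 at address 168, ap[2] = 56 at address 176
  a[]: 1 5 2 1 3 at addresses 16, 20, 24, 28, 32 (ends at 36)
  b[]: 0 2 1 3 9 at addresses 36, 40, 44, 48, 52 (ends at 56)
  c[]: 9 4 7 2 0 at addresses 56, 60, 64, 68, 72 (ends at 76)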

Element Access in Multi-Level Array
  int get_ap_digit(size_t index, size_t digit, int *ap[]) {
      return ap[index][digit];
  }
  # %rdi = index, %rsi = digit, %rdx = ap
  salq $2, %rsi              # 4*digit
  addq (%rdx,%rdi,8), %rsi   # rsi = ap[index] + 4*digit
  movl (%rsi), %eax          # return *p
  ret
Computation: element access is Mem[Mem[ap + 8*index] + 4*digit].
Must do two memory reads: first get the pointer to the row array, then access the element within that array.

Array Element Accesses
Nested array:
  int get_MAT_digit(size_t index, size_t digit, int MAT[][5]) {
      return MAT[index][digit];
  }
  Mem[MAT + 20*index + 4*digit]
Multi-level array:
  int get_ap_digit(size_t index, size_t digit, int *ap[]) {
      return ap[index][digit];
  }
  Mem[Mem[ap + 8*index] + 4*digit]
The accesses look similar in C, but the address computations are very different.
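To make the difference concrete, here is a sketch that spells out the two memory reads of the multi-level case next to the single read of the nested case. The function names get_nested and get_multilevel are illustrative, and the byte scales in the comments (8-byte pointers, 4-byte ints) assume x86-64.

```c
#include <assert.h>
#include <stddef.h>

/* Nested array: one contiguous block, so one memory read per access. */
int get_nested(int mat[][5], size_t index, size_t digit) {
    return mat[index][digit];   /* Mem[mat + 20*index + 4*digit] */
}

/* Multi-level array: read the row pointer, then read the element. */
int get_multilevel(int *ap[], size_t index, size_t digit) {
    int *row = ap[index];       /* first read:  Mem[ap + 8*index] */
    assert(row != NULL);
    return row[digit];          /* second read: Mem[row + 4*digit] */
}
```

The rows of a multi-level array need not be adjacent in memory or even the same length, which is why the row address has to be loaded rather than computed.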

Next
Arrays: one-dimensional, multi-dimensional (nested), multi-level
Structures: allocation, access, alignment
Unions
Floating point

Structure Representation
  struct rec {
      int a[4];
      size_t i;
      struct rec *next;
  };
Layout: a at offset 0, i at offset 16, next at offset 24; total size 32 bytes.
The structure is represented as a block of memory big enough to hold all of the fields.
Fields are ordered according to the declaration, even if another ordering could yield a more compact representation.
The compiler determines the overall size and the positions of the fields.

Generating Pointer to Structure Member
  struct rec {
      int a[4];
      size_t i;
      struct rec *next;
  };
Layout: a at offset 0, i at offset 16, next at offset 24; total size 32 bytes. Element a[idx] is at r + 4*idx.
  int *get_ap(struct rec *r, size_t idx) {
      return &r->a[idx];
  }
Generating a pointer to an array element: the offset of each structure member is determined at compile time. Because a is at offset 0, the address is computed as r + 4*idx.
  # r in %rdi
  # idx in %rsi
  leaq (%rdi,%rsi,4), %rax
  ret
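The compile-time offsets can be inspected with the standard offsetof macro, and the address computation r + 4*idx can be reproduced by hand. A minimal sketch; get_ap_manual and the rec_offset_* helpers are hypothetical names, and the specific offsets 16 and 24 are what x86-64 is expected to produce.

```c
#include <assert.h>
#include <stddef.h>

struct rec {
    int a[4];
    size_t i;
    struct rec *next;
};

/* Offsets chosen by the compiler (expected: 16 and 24 on x86-64). */
size_t rec_offset_i(void)    { return offsetof(struct rec, i); }
size_t rec_offset_next(void) { return offsetof(struct rec, next); }

/* Same result as &r->a[idx], computed from the base address, the
   offset of member a, and the element size. */
int *get_ap_manual(struct rec *r, size_t idx) {
    assert(r != NULL && idx < 4);
    return (int *)((char *)r + offsetof(struct rec, a) + sizeof(int) * idx);
}
```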

Structures & Alignment
  struct S1 {
      char c;
      int i[2];
      double v;
  } *p;
Unaligned data: c at p, i[0] at p+1, i[1] at p+5, v at p+9; ends at p+17.
Aligned data: if a primitive data type requires K bytes, its address must be a multiple of K.
  c at p+0, then 3 bytes of padding
  i[0] at p+4 (multiple of 4)
  i[1] at p+8, then 4 bytes of padding
  v at p+16 (multiple of 8)
  ends at p+24 (multiple of 8)

Alignment Principles
Aligned data
  If a primitive data type requires K bytes, then its address must be a multiple of K.
  This is required on some machines; x86-64 does not require it, but it is advised.
Why align?
  Memory is accessed in chunks for efficiency, and the chunks are aligned on 4- or 8-byte boundaries.
  It is inefficient to access a datum that spans chunk boundaries.
The compiler puts gaps in structures to ensure alignment.

Specific Cases of Alignment (x86-64)
1 byte: char, … no restrictions on address
2 bytes: short, … lowest 1 bit of address must be 0 (binary)
4 bytes: int, float, … lowest 2 bits of address must be 00 (binary)
8 bytes: double, long, pointers, … lowest 3 bits of address must be 000 (binary)
16 bytes: long double (GCC on Linux): lowest 4 bits of address must be 0000 (binary), that is, the lowest hex digit must be 0, since a hex digit is 4 bits
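The "lowest bits must be zero" rule translates into a one-line mask test. A sketch under the assumption that K is a power of two; is_aligned is a hypothetical helper, not a library function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if address p is a multiple of k, where k is a power of two.
   Masking with k - 1 keeps exactly the low log2(k) bits of the address,
   which must all be zero for an aligned address. */
int is_aligned(const void *p, size_t k) {
    assert(k != 0 && (k & (k - 1)) == 0);  /* k must be a power of two */
    return ((uintptr_t)p & (k - 1)) == 0;
}
```

For K = 8 the mask is 7 (binary 111), matching the "lowest 3 bits" case in the list above.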

Satisfying Alignment with Structures
Within structure: must satisfy each element’s alignment requirement.
Overall structure placement:
  Each structure has an alignment requirement K, where K = largest alignment of any element.
  The initial address and the structure length must be multiples of K.
Example: K = 8, due to the double element
  struct S1 {
      char c;
      int i[2];
      double v;
  } *p;
  c at p+0, then 3 bytes of padding
  i[0] at p+4 (multiple of 4)
  i[1] at p+8, then 4 bytes of padding
  v at p+16 (multiple of 8)
  ends at p+24 (multiple of 8)

Meeting Overall Alignment Requirement
For largest alignment requirement K, the overall structure size must be a multiple of K.
  struct S2 {
      double v;
      int i[2];
      char c;
  } *p;
  v at p+0
  i[0] at p+8, i[1] at p+12
  c at p+16, then 7 bytes of padding
  ends at p+24 (multiple of K = 8)
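The padding in S1 and S2 can be measured as the structure size minus the sum of the field sizes. This sketch uses the C11 _Alignof operator; the helper names are invented for illustration, and the expected figure of 7 padding bytes in each structure assumes the x86-64 layout shown above.

```c
#include <assert.h>
#include <stddef.h>

struct S1 { char c; int i[2]; double v; };
struct S2 { double v; int i[2]; char c; };

/* Bytes actually occupied by the fields of either structure. */
size_t field_bytes(void) {
    return sizeof(char) + 2 * sizeof(int) + sizeof(double);
}

/* Padding = total size minus field bytes (expected 7 on x86-64). */
size_t s1_padding(void) { return sizeof(struct S1) - field_bytes(); }
size_t s2_padding(void) { return sizeof(struct S2) - field_bytes(); }

/* Returns 1 if each structure's size is a multiple of its alignment K. */
int size_is_multiple_of_k(void) {
    return sizeof(struct S1) % _Alignof(struct S1) == 0 &&
           sizeof(struct S2) % _Alignof(struct S2) == 0;
}
```

In S1 the padding is internal (after c and after i[1]); in S2 it is all at the end, after c, so that the next structure in an array starts on a multiple of K.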

Arrays of Structures
  struct S2 {
      double v;
      int i[2];
      char c;
  } a[10];
The overall structure length is a multiple of K, which satisfies the alignment requirement for every element.
  a[0] at a+0, a[1] at a+24, a[2] at a+48 (ends at a+72), and so on
  Within a[1]: v at a+24, i[0] at a+32, i[1] at a+36, c at a+40, then 7 bytes of padding; ends at a+48

Saving Space in a Structure
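Put the largest data types first (a rule of thumb that reduces padding).
  struct S4 {
      char c;
      int i;
      char d;
  } *p;
  struct S5 {
      int i;
      char c;
      char d;
  } *p;
Effect (K = 4):
  S4: c, 3 bytes of padding, i, d, 3 bytes of padding (12 bytes)
  S5: i, c, d, 2 bytes of padding (8 bytes)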
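The effect of reordering can be confirmed with sizeof. A short sketch; bytes_saved is a hypothetical helper, and the expected result of 4 assumes a 4-byte int with 4-byte alignment.

```c
#include <assert.h>
#include <stddef.h>

struct S4 { char c; int i; char d; };  /* int in the middle */
struct S5 { int i; char c; char d; };  /* largest type first */

/* Bytes saved by moving the int to the front (expected 12 - 8 = 4). */
size_t bytes_saved(void) {
    assert(sizeof(struct S4) >= sizeof(struct S5));
    return sizeof(struct S4) - sizeof(struct S5);
}
```

Both structures hold the same 6 bytes of data; only the amount of padding differs.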

Union Allocation
Allocate according to the largest element.
  union U1 {
      char c;
      int i[2];
      double v;
  } *up;
  c, i[0] and v all start at up+0; i[1] is at up+4; the union ends at up+8.
  struct S1 {
      char c;
      int i[2];
      double v;
  } *sp;
  c at sp+0, 3 bytes of padding, i[0] at sp+4, i[1] at sp+8, 4 bytes of padding, v at sp+16; the structure ends at sp+24.
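A union's members all start at the same address, and its size is set by the largest member. The sketch below checks both properties for U1; the function names are made up, and the size of 8 bytes is what x86-64 is expected to give.

```c
#include <assert.h>
#include <stddef.h>

union U1 {
    char c;
    int i[2];
    double v;
};

/* Returns 1 if every member of the union starts at the union's own address. */
int members_overlap(void) {
    union U1 u;
    const void *base = &u;
    return (const void *)&u.c == base &&
           (const void *)u.i == base &&
           (const void *)&u.v == base;
}

/* Size of the union (expected 8 on x86-64, versus 24 for struct S1). */
size_t union_bytes(void) {
    return sizeof(union U1);
}
```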

Using Union to Determine Endianness
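  typedef union {
      char c;
      int i;
  } char_int_u;
Layout: c shares the first byte of i; the remaining 3 bytes belong to i only.
  int isLittleEndian() {
      char_int_u test;
      test.i = 1;
      return test.c;
  }
On a little-endian machine the least significant byte of i (value 1) is stored at the lowest address, so test.c is 1; on a big-endian machine it is 0.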

Using Union to Access Bit Patterns
  union uf_union {
      unsigned u;
      float f;
  };
Layout: u and f share the same 4 bytes.
  union uf_union uf;
  unsigned s, exp, frac;
  uf.f = -1.5;
  s = uf.u >> 31;
  exp = (uf.u >> 23) & 0xFF;
  frac = uf.u & 0x007FFFFF;
  printf("Float %f\n", uf.f);
  printf("sign: 0x%X\n", s);
  printf("exp: 0x%X\n", exp);
  printf("frac: 0x%X\n", frac);
Not the same as casting with (unsigned)uf.f! The cast converts the numeric value, while uf.u reinterprets the stored bits.
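The same field extraction can be packaged as a function. This sketch swaps the union for memcpy, which is another common way to reinterpret bits; the struct fp_fields type and the decode_float name are hypothetical, and the code assumes float is a 32-bit IEEE 754 single-precision value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the three fields of a single-precision float. */
struct fp_fields {
    unsigned s;     /* 1 sign bit */
    unsigned exp;   /* 8 exponent bits (biased by 127) */
    unsigned frac;  /* 23 fraction bits */
};

/* Copies the bytes of f into an integer, then splits out the fields.
   Assumes float is 32-bit IEEE 754. */
struct fp_fields decode_float(float f) {
    uint32_t u;
    assert(sizeof u == sizeof f);
    memcpy(&u, &f, sizeof u);

    struct fp_fields out;
    out.s    = u >> 31;
    out.exp  = (u >> 23) & 0xFF;
    out.frac = u & 0x007FFFFF;
    return out;
}
```

For -1.5 the bit pattern is 0xBFC00000, so the fields are s = 1, exp = 0x7F and frac = 0x400000: the value is -1.1 (binary) times 2 to the power 0.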

Byte Ordering Example
  union {
      unsigned char c[8];
      unsigned short s[4];
      unsigned int i[2];
      unsigned long l[1];
  } example;
x86 (32-bit):
  c[0] through c[7] are the 8 bytes
  s[0] through s[3] each cover 2 bytes
  i[0] and i[1] each cover 4 bytes
  l[0] covers only the first 4 bytes, c[0] through c[3]
x86_64:
  c, s and i are laid out as on x86
  l[0] covers all 8 bytes, c[0] through c[7]

Byte Ordering on IA32
Little endian: within each short, int and long, the least significant byte (LSB) is at the lowest address and the most significant byte (MSB) at the highest.
Bytes c[0] through c[7] hold f0 f1 f2 f3 f4 f5 f6 f7.
Output on IA32 (x86_32):
  Characters 0-7 == [0xf0,0xf1,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7]
  Shorts == [0xf1f0,0xf3f2,0xf5f4,0xf7f6]
  Ints == [0xf3f2f1f0,0xf7f6f5f4]
  Long == [0xf3f2f1f0]

Byte Ordering on Sun SPARC
Big endian: within each short, int and long, the most significant byte (MSB) is at the lowest address and the least significant byte (LSB) at the highest.
Bytes c[0] through c[7] hold f0 f1 f2 f3 f4 f5 f6 f7.
Output on Sun:
  Characters 0-7 == [0xf0,0xf1,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7]
  Shorts == [0xf0f1,0xf2f3,0xf4f5,0xf6f7]
  Ints == [0xf0f1f2f3,0xf4f5f6f7]
  Long == [0xf0f1f2f3]

Byte Ordering on x86-64
Little endian: the least significant byte (LSB) is at the lowest address and the most significant byte (MSB) at the highest.
Bytes c[0] through c[7] hold f0 f1 f2 f3 f4 f5 f6 f7.
Output on x86-64:
  Characters 0-7 == [0xf0,0xf1,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7]
  Shorts == [0xf1f0,0xf3f2,0xf5f4,0xf7f6]
  Ints == [0xf3f2f1f0,0xf7f6f5f4]
  Long == [0xf7f6f5f4f3f2f1f0]
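The outputs above come from filling the bytes of the union and reading them back through the wider members. A sketch of that experiment; the union tag example_u and the function names are invented here, and the code assumes a 4-byte unsigned int.

```c
#include <assert.h>
#include <stddef.h>

union example_u {
    unsigned char c[8];
    unsigned short s[4];
    unsigned int i[2];
    unsigned long l[1];
};

/* Returns 1 on a little-endian machine, 0 on a big-endian one. */
int is_little_endian(void) {
    unsigned int one = 1;
    return *(unsigned char *)&one == 1;
}

/* Stores f0..f7 into the bytes, then reads the first int back.
   Little endian yields 0xf3f2f1f0, big endian yields 0xf0f1f2f3. */
unsigned int first_int(void) {
    union example_u ex;
    assert(sizeof(unsigned int) == 4);
    for (size_t j = 0; j < 8; j++)
        ex.c[j] = (unsigned char)(0xf0 + j);
    return ex.i[0];
}
```

The byte array always reads back in the order it was written; only the multi-byte views depend on the machine's byte order.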

Last …
Arrays: one-dimensional, multi-dimensional (nested), multi-level
Structures: allocation, access, alignment
Floating point in x86-64

XMM Registers: 16 registers (each holding 16 bytes)
Each register can hold:
  Sixteen 8-bit integers
  Eight 16-bit integers
  Four 32-bit integers
  Four single-precision floats
  Two double-precision floats
  One single-precision float
  One double-precision float

Scalar or SIMD (single instruction, multiple data)
Scalar operations, single precision: addss %xmm0,%xmm1 (adds only the lowest float of each register)
SIMD operations, single precision: addps %xmm0,%xmm1 (adds all four pairs of floats at once)
Scalar operations, double precision: addsd %xmm0,%xmm1 (adds only the lowest double of each register)

FP Basics
Arguments are passed in %xmm0, %xmm1, ...
The result is returned in %xmm0.
All XMM registers are caller-saved (there are no callee-saved XMM registers).
addss means add scalar single-precision; addsd means add scalar double-precision.
  float fadd(float x, float y) {
      return x + y;
  }
  # x in %xmm0, y in %xmm1
  addss %xmm1, %xmm0
  ret
  double dadd(double x, double y) {
      return x + y;
  }
  # x in %xmm0, y in %xmm1
  addsd %xmm1, %xmm0
  ret

FP Memory Referencing
Integer (and pointer) arguments are passed in regular registers.
FP values are passed in XMM registers.
Different mov instructions move between XMM registers, and between memory and XMM registers.
  double dincr(double *p, double v) {
      double x = *p;
      *p = x + v;
      return x;
  }
  # p in %rdi, v in %xmm0
  movapd %xmm0, %xmm1    # copy v
  movsd  (%rdi), %xmm0   # x = *p
  addsd  %xmm0, %xmm1    # t = x + v
  movsd  %xmm1, (%rdi)   # *p = t
  ret

Other Aspects of FP Code
Lots of instructions
Floating-point comparisons
  Instructions ucomiss and ucomisd
  Set condition codes CF, ZF, and PF (parity flag)
FP constants: stored in memory, like a string
  Set an XMM register to 0.0 with xorpd %xmm0,%xmm0
  addsd .LC0(%rip), %xmm0
  addsd .LC1(%rip), %xmm0
  …
  .LC0: .long .long
  .LC1: .long .long

Summary
Arrays
  Elements packed into a contiguous region of memory
  Use index arithmetic to locate individual elements
Structures
  Elements packed into a single region of memory
  Access using offsets determined by the compiler
  Padding may be required to ensure alignment
Floating point
  Data held and operated on in XMM registers
  Similar to integer machine code, but with more complex instructions

Reading
Read: 3.8 and 3.9 (pages 255  )
Skip or skim 3.10
Skim 3.11 (floating point)
This Thursday: mid-term exam review
Mid-term is Tuesday, March 6th
