CSC 221 Computer Organization and Assembly Language

Name: CSC 221 Computer Organization and Assembly Language
Uploaded: 2017-08-15T14:31:28+00:00
Duration: PTM35S22
Channel: Rosalind Evans
Description: CSC 221 Computer Organization and Assembly Language

CSC 221 Computer Organization and Assembly Language
Lecture 32: Intel x86 Instruction Encoding

Lecture Outline: Encoding Real x86 Instructions
- x86 Instruction Format Reference
- x86 Opcode Sizes
- x86 ADD Instruction Opcode
- Encoding x86 Instruction Operands, MOD-REG-R/M Byte
- REG Field of the MOD-REG-R/M Byte
- MOD R/M Byte and Addressing Modes
- SIB (Scaled Index Byte) Layout
- Scaled Indexed Addressing Mode

Lecture Outline: Encoding ADD Instruction Example
- Encoding ADD CL, AL Instruction
- Encoding ADD ECX, EAX Instruction
- Encoding ADD EDX, DISPLACEMENT Instruction
- Encoding ADD EDI, [EBX] Instruction
- Encoding ADD EAX, [ ESI + disp8 ] Instruction
- Encoding ADD EBX, [ EBP + disp32 ] Instruction
- Encoding ADD EBP, [ disp32 + EAX*1 ] Instruction
- Encoding ADD ECX, [ EBX + EDI*4 ] Instruction
- Encoding ADD Immediate Instruction

Encoding Real x86 Instructions
It is time to take a look at the actual machine instruction format of the x86 CPU family. They don't call the x86 CPU a Complex Instruction Set Computer (CISC) for nothing! Although more complex instruction encodings exist, no one is going to challenge that the x86 has a complex instruction encoding. In order of appearance, an instruction consists of:

- Prefix bytes: 0 to 4 special prefix values that affect the operation of the instruction.
- Opcode: a one- or two-byte instruction opcode (two bytes if the special 0Fh opcode expansion prefix is present).
- “mod-reg-r/m” byte: specifies the addressing mode and instruction operand size. This byte is only required if the instruction supports register or memory operands.
- Scaled index byte (SIB): optional, present if the instruction uses a scaled index memory addressing mode.
- Displacement: a 0-, 1-, 2-, or 4-byte value that specifies a memory address displacement for the instruction.
- Immediate (constant) data: a 0-, 1-, 2-, or 4-byte constant value if the instruction has an immediate operand.

Although the diagram seems to imply that instructions can be up to 16 bytes long, in actuality the x86 will not allow instructions greater than 15 bytes in length. The prefix bytes are not the opcode expansion prefix discussed earlier - they are special bytes to modify the behavior of existing instructions.
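As a rough illustration of the field order and the length limit described above, here is a minimal Python sketch. The function name `assemble` and its keyword arguments are hypothetical helpers introduced for this example; they are not part of any assembler API.

```python
MAX_INSTRUCTION_LENGTH = 15  # the x86 rejects anything longer


def assemble(opcode: bytes, prefixes: bytes = b"", modrm: bytes = b"",
             sib: bytes = b"", disp: bytes = b"", imm: bytes = b"") -> bytes:
    """Concatenate the instruction fields in their encoding order."""
    if len(prefixes) > 4:
        raise ValueError("at most 4 prefix bytes")
    if len(opcode) not in (1, 2):
        raise ValueError("opcode is one or two bytes")
    if len(modrm) > 1 or len(sib) > 1:
        raise ValueError("mod-reg-r/m and SIB are at most one byte each")
    if len(disp) not in (0, 1, 2, 4) or len(imm) not in (0, 1, 2, 4):
        raise ValueError("displacement and immediate are 0, 1, 2, or 4 bytes")
    code = prefixes + opcode + modrm + sib + disp + imm
    if len(code) > MAX_INSTRUCTION_LENGTH:
        raise ValueError("instruction longer than 15 bytes")
    return code


# opcode 03h, mod-reg-r/m 0Ch, SIB BBh (no prefixes, displacement, or immediate)
print(assemble(b"\x03", modrm=b"\x0c", sib=b"\xbb").hex(" "))
```

Filling every field to its maximum size gives 4 + 2 + 1 + 1 + 4 + 4 = 16 bytes, which this sketch rejects, matching the 15-byte limit noted above.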

x86 Instruction Format Reference
Another view of the x86 instruction format:

(a) Optional Instruction Prefix

Field            Instruction Prefix   Address-Size Prefix   Operand-Size Prefix   Segment Override
Number of bytes  0 or 1               0 or 1                0 or 1                0 or 1

(b) General Instruction Format

Field            OpCode    Mod-R/M   SIB      Displacement    Immediate
Number of bytes  1 or 2    0 or 1    0 or 1   0, 1, 2 or 4    0, 1, 2 or 4

Mod-R/M byte:  Mod (bits 7-6), Reg/OpCode (bits 5-3), R/M (bits 2-0)
SIB byte:      Scale (bits 7-6), Index (bits 5-3), Base (bits 2-0)

Instructions have some combination of the following fields (but no instruction has all parts):
- instruction prefix: sets certain options
- opcode: specifies the operation to perform
- Mod R/M: specifies addressing mode/operands
- SIB (scale index base): used for array index addressing
- address displacement: used for addressing memory
- immediate value: holds the value of a constant operand

Displacement
We are really talking about an address offset within a segment (usually given as a named variable or a label in code):
- it could be a relative address, like the 8-bit value used for jumping forward or backward from the current location in the code segment
- or it could be the location of a variable in the data segment
- or it could be a FAR reference to code or data in another segment

Displacement Examples
- jmp next: where next is a label in the current code segment
- add eax, var1: where var1 is a 32-bit variable in the current data segment
- sub bx, var2[ecx]: where var2 is a 16-bit variable in the current code segment and ecx is an index register

Immediate Values
These are usually constants used directly in the operation (available immediately), for example:
- ADD EAX, 7: where 7 is the immediate value; there is no variable name and no source register
- CMP AL, [EDX+10]: where 10 is a constant that will be added to the contents of the EDX register to get the operand’s location
- SHL AX, 4: where the constant 4 is the number of bit positions to shift the AX register

Instruction Prefix
Used to specify options for instruction execution, for example:
- when executing string operations (MOVS, SCAS, etc.) the prefix is used to indicate that the operation should be repeated
  - for REP and REPE, the prefix is set to F3h
  - for REPNE, the prefix is F2h
- some values indicate the memory segment that should be used (instead of the default)
  - for ES the prefix is set to 26h, for FS it is 64h, etc.

Instruction Prefix
Used to specify ways that the instruction should be executed, for example:
- to change the default data size for an instruction (from 32-bit to 16-bit or vice versa), the prefix is set to 66h
- similarly, to change the default address size for an instruction, set it to 67h
- to lock shared memory so that only this instruction has access, set it to F0h
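The prefix values quoted on these two slides can be collected in a small lookup table. The sketch below is only an illustration: the dictionary keys and the helper `with_prefixes` are hypothetical names, and the two-byte sequence 01 C1 is the add ecx, eax encoding derived later in this lecture.

```python
# Prefix byte values quoted in the slides above.
PREFIXES = {
    "rep": 0xF3,       # REP / REPE
    "repne": 0xF2,     # REPNE
    "es": 0x26,        # segment override: ES
    "fs": 0x64,        # segment override: FS
    "opsize": 0x66,    # operand-size override
    "addrsize": 0x67,  # address-size override
    "lock": 0xF0,      # LOCK
}


def with_prefixes(instruction: bytes, *names: str) -> bytes:
    """Prepend the named prefix bytes to an already-encoded instruction."""
    if len(names) > 4:
        raise ValueError("at most 4 prefix bytes")
    return bytes(PREFIXES[name] for name in names) + instruction


add_ecx_eax = bytes([0x01, 0xC1])  # add ecx, eax (32-bit operands)
# With the operand-size prefix, the same bytes operate on 16-bit registers
# in 32-bit code, i.e. add cx, ax.
print(with_prefixes(add_ecx_eax, "opsize").hex(" "))  # 66 01 c1
```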

x86 Opcode Sizes
The x86 CPU supports two basic opcode sizes:
- a standard one-byte opcode
- a two-byte opcode consisting of a 0Fh opcode expansion prefix byte; the second byte then specifies the actual instruction.
This provides for up to 512 different instruction classes, although the x86 does not yet use them all.

x86 ADD Instruction Opcode
x86 ADD instruction opcode (bit 7 down to bit 0):

  0 0 0 0 0 0 d s

Bit number zero, marked s, specifies the size of the operands the ADD instruction operates upon:
- If s = 0, the operands are 8-bit registers and memory locations.
- If s = 1, the operands are either 16 bits or 32 bits. Under 32-bit operating systems the default is 32-bit operands if s = 1. To specify a 16-bit operand (under Windows or Linux) you must insert a special operand-size prefix byte in front of the instruction.

Bit number one, marked d, specifies the direction of the data transfer:
- If d = 0, the destination operand is a memory location (adding from register to memory), e.g. add [ebx], al
- If d = 1, the destination operand is a register (adding from memory to register), e.g. add al, [ebx]
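The d and s bits described above can be combined into the opcode byte with two shifts. This is a minimal Python sketch; `add_opcode` is a hypothetical helper name chosen for illustration.

```python
def add_opcode(d: int, s: int) -> int:
    """Return the one-byte ADD opcode 000000ds."""
    if d not in (0, 1) or s not in (0, 1):
        raise ValueError("d and s must each be 0 or 1")
    return (d << 1) | s  # the upper six bits are all zero for ADD


for d in (0, 1):
    for s in (0, 1):
        print(f"d={d} s={s} -> opcode {add_opcode(d, s):02X}h")
```

The four results are 00h, 01h, 02h, and 03h, one for each combination of direction and size.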

Encoding x86 Instruction Operands, MOD-REG-R/M Byte (1/4)
The MOD-REG-R/M byte specifies instruction operands and their addressing mode (*):

  Mod (bits 7-6) | Reg/OpCode (bits 5-3) | R/M (bits 2-0)

The R/M field, combined with MOD, specifies either the second operand in a two-operand instruction, or the only operand in a single-operand instruction like NOT or NEG.
The d bit in the opcode determines which operand is the source, and which is the destination:
- d = 0: MOD R/M <- REG, REG is the source
- d = 1: REG <- MOD R/M, REG is the destination

(*) Technically, registers do not have an address, but we apply the term addressing mode to registers nonetheless.

The MOD field specifies the x86 addressing mode:

MOD  Meaning
00   Register indirect addressing mode, or SIB with no displacement (when R/M = 100), or displacement-only addressing mode (when R/M = 101).
01   One-byte signed displacement follows the addressing mode byte(s).
10   Four-byte signed displacement follows the addressing mode byte(s).
11   Register addressing mode.

The REG field specifies the source or destination register:
- Depending on the instruction, this can be either the source or the destination operand.
- Many instructions have the d (direction) field in their opcode to choose the role of the REG operand:
  - If d = 0, REG is the source: MOD R/M <- REG.
  - If d = 1, REG is the destination: REG <- MOD R/M.
- For certain (often single-operand or immediate-operand) instructions, the REG field may contain an opcode extension rather than the register bits. The R/M field will specify the operand in such a case.

The REG field specifies the source or destination register:

REG Value   Register if data size is 8 bits   Register if data size is 16 bits   Register if data size is 32 bits
000         al                                ax                                 eax
001         cl                                cx                                 ecx
010         dl                                dx                                 edx
011         bl                                bx                                 ebx
100         ah                                sp                                 esp
101         ch                                bp                                 ebp
110         dh                                si                                 esi
111         bh                                di                                 edi
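To make the three-field layout concrete, the following Python sketch packs and unpacks a MOD-REG-R/M byte and looks up register names from the table above. The helper names `pack_modrm` and `unpack_modrm` are hypothetical, introduced only for this example.

```python
# Register names indexed by the 3-bit REG value, keyed by operand size in bits.
REG_NAMES = {
    8:  ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"],
    16: ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"],
    32: ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"],
}


def pack_modrm(mod: int, reg: int, rm: int) -> int:
    """Build the byte: MOD in bits 7-6, REG in bits 5-3, R/M in bits 2-0."""
    if not (0 <= mod <= 3 and 0 <= reg <= 7 and 0 <= rm <= 7):
        raise ValueError("field out of range")
    return (mod << 6) | (reg << 3) | rm


def unpack_modrm(byte: int) -> tuple:
    """Split a MOD-REG-R/M byte back into (mod, reg, rm)."""
    return (byte >> 6) & 0b11, (byte >> 3) & 0b111, byte & 0b111


mod, reg, rm = unpack_modrm(0xC1)
print(f"{mod:02b} {reg:03b} {rm:03b}")           # 11 000 001
print(REG_NAMES[32][reg], REG_NAMES[32][rm])      # eax ecx
```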

MOD R/M Byte and Addressing Modes
MOD R/M  Addressing Mode
=== ===  ========================
00  000  [ eax ]
01  000  [ eax + disp8 ] (1)
10  000  [ eax + disp32 ]
11  000  register (al / ax / eax) (2)
00  001  [ ecx ]
01  001  [ ecx + disp8 ]
10  001  [ ecx + disp32 ]
11  001  register (cl / cx / ecx)
00  010  [ edx ]
01  010  [ edx + disp8 ]
10  010  [ edx + disp32 ]
11  010  register (dl / dx / edx)
00  011  [ ebx ]
01  011  [ ebx + disp8 ]
10  011  [ ebx + disp32 ]
11  011  register (bl / bx / ebx)
00  100  SIB Mode (3)
01  100  SIB + disp8 Mode
10  100  SIB + disp32 Mode
11  100  register (ah / sp / esp)

(1) Addressing modes with an 8-bit displacement cover the range -128..+127 and require only a single-byte displacement after the opcode (Faster!)
(2) The size bit in the opcode specifies 8-bit or 32-bit register size. Selecting a 16-bit register requires a prefix byte.
(3) The so-called scaled indexed addressing modes; SIB = scaled index byte mode.

MOD R/M Byte and Addressing Modes
MOD R/M  Addressing Mode
=== ===  ========================
00  101  32-bit Disp-Only Mode (4)
01  101  [ ebp + disp8 ]
10  101  [ ebp + disp32 ]
11  101  register (ch / bp / ebp)
00  110  [ esi ]
01  110  [ esi + disp8 ]
10  110  [ esi + disp32 ]
11  110  register (dh / si / esi)
00  111  [ edi ]
01  111  [ edi + disp8 ]
10  111  [ edi + disp32 ]
11  111  register (bh / di / edi)

(4) Note that there is no [ ebp ] addressing mode. Its slot is occupied by the 32-bit displacement-only addressing mode. Intel decided that programmers can use the [ ebp + disp8 ] addressing mode instead, with its 8-bit displacement set equal to zero (the instruction is a little longer, though).
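The MOD and R/M table can be read mechanically. The Python sketch below decodes a (MOD, R/M) pair into a description of the 32-bit addressing mode; `describe_rm` is a hypothetical helper, and it assumes 32-bit operands (s = 1, no operand-size prefix) when naming registers.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def describe_rm(mod: int, rm: int) -> str:
    """Describe the operand selected by the MOD and R/M fields."""
    if mod == 0b11:
        return f"register {REG32[rm]}"      # register addressing mode
    suffix = {0b00: "", 0b01: " + disp8", 0b10: " + disp32"}[mod]
    if rm == 0b100:
        return "SIB" + suffix               # scaled index byte follows
    if mod == 0b00 and rm == 0b101:
        return "[ disp32 ]"                 # displacement-only, no [ ebp ]
    return f"[ {REG32[rm]}{suffix} ]"


for mod, rm in [(0b00, 0b011), (0b01, 0b110), (0b00, 0b101), (0b10, 0b101)]:
    print(f"MOD={mod:02b} R/M={rm:03b}: {describe_rm(mod, rm)}")
```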

SIB (Scaled Index Byte) Layout
Scaled indexed addressing mode uses a second byte (namely, the SIB byte) that follows the MOD-REG-R/M byte in the instruction format. The MOD field still specifies the displacement size of zero, one, or four bytes.
The MOD-REG-R/M and SIB bytes are complex, because Intel reused 16-bit addressing circuitry in the 32-bit mode, rather than simply abandoning the 16-bit format in the 32-bit mode. There are good hardware reasons for this, but the end result is a complex scheme for specifying addressing modes in the opcodes.
Scaled index byte layout:

  Scale (bits 7-6) | Index (bits 5-3) | Base (bits 2-0)

SIB (Scaled Index Byte) Layout
Scale Value   Index * Scale Value
00            Index * 1
01            Index * 2
10            Index * 4
11            Index * 8

Index   Index Register
000     EAX
001     ECX
010     EDX
011     EBX
100     Illegal
101     EBP
110     ESI
111     EDI

Base    Base Register
000     EAX
001     ECX
010     EDX
011     EBX
100     ESP
101     Displacement only if MOD = 00, EBP if MOD = 01 or 10
110     ESI
111     EDI
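A short Python sketch of packing the SIB byte from its three fields, using scale factors of 1, 2, 4, and 8 for the two-bit scale field. The name `pack_sib` is a hypothetical helper introduced for this example.

```python
# Two-bit scale field for each legal scale factor.
SCALE_BITS = {1: 0b00, 2: 0b01, 4: 0b10, 8: 0b11}


def pack_sib(scale: int, index: int, base: int) -> int:
    """Build the SIB byte: scale in bits 7-6, index in bits 5-3, base in bits 2-0."""
    if scale not in SCALE_BITS:
        raise ValueError("scale factor must be 1, 2, 4, or 8")
    if index == 0b100:
        raise ValueError("index field 100 is illegal (ESP cannot be an index)")
    if not (0 <= index <= 7 and 0 <= base <= 7):
        raise ValueError("field out of range")
    return (SCALE_BITS[scale] << 6) | (index << 3) | base


# [ ebx + edi*4 ]: scale 4, index EDI (111), base EBX (011)
print(f"{pack_sib(4, 0b111, 0b011):02X}")  # BB
```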

Scaled Indexed Addressing Mode
MOD = 00
  [ reg32 + eax*n ]   [ reg32 + ebx*n ]   [ reg32 + ecx*n ]   [ reg32 + edx*n ]
  [ reg32 + ebp*n ]   [ reg32 + esi*n ]   [ reg32 + edi*n ]

MOD = 01
  [ disp8 + reg32 + eax*n ]   [ disp8 + reg32 + ebx*n ]   [ disp8 + reg32 + ecx*n ]   [ disp8 + reg32 + edx*n ]
  [ disp8 + reg32 + ebp*n ]   [ disp8 + reg32 + esi*n ]   [ disp8 + reg32 + edi*n ]

MOD = 10
  [ disp32 + reg32 + eax*n ]   [ disp32 + reg32 + ebx*n ]   [ disp32 + reg32 + ecx*n ]   [ disp32 + reg32 + edx*n ]
  [ disp32 + reg32 + ebp*n ]   [ disp32 + reg32 + esi*n ]   [ disp32 + reg32 + edi*n ]

MOD = 00, BASE = 101
  [ disp32 + eax*n ]   [ disp32 + ebx*n ]   [ disp32 + ecx*n ]   [ disp32 + edx*n ]
  [ disp32 + ebp*n ]   [ disp32 + esi*n ]   [ disp32 + edi*n ]

Note: n = 1, 2, 4, or 8.

In each scaled indexed addressing mode the MOD field in the MOD-REG-R/M byte specifies the size of the displacement. It can be zero, one, or four bytes:

MOD R/M  Addressing Mode
00  100  SIB
01  100  SIB + disp8
10  100  SIB + disp32

The Base and Index fields of the SIB byte select the base and index registers, respectively.

Example: Encoding ADD Instruction
The ADD opcode can be decimal 0, 1, 2, or 3, depending on the direction and size bits in the opcode:

  0 0 0 0 0 0 d s    ADD opcode

- d = 0 if adding from register to memory.
- d = 1 if adding from memory to register.
- s = 0 if adding eight-bit operands.
- s = 1 if adding 16/32-bit operands.

How could we encode various forms of the ADD instruction using different addressing modes?

Encoding ADD CL, AL Instruction
  Opcode:       0 0 0 0 0 0 d s  =  000000 0 0  =  00h
  MOD-REG-R/M:  MOD REG R/M      =  11 000 001  =  C1h

- 000000 indicates the ADD instruction.
- d = 0 indicates that we are adding the REG field to the R/M field.
- s = 0 indicates that we are adding eight-bit values together.
- MOD = 11 indicates that the R/M field is a register.
- REG = 000: this field, along with the d bit in the opcode, indicates that the source field is the AL register.
- R/M = 001: this field, along with the d bit in the opcode, indicates that the destination field is the CL register.

ADD cl, al = 00 C1

Encoding ADD CL, AL Instruction
Interesting side effect of the direction bit and the MOD-REG-R/M byte organization: some instructions can have two different opcodes, and both are legal! For example, the encoding of add cl, al could be 00 C1 (if d = 0), or 02 C8 (if the d bit is set to 1):

  d = 0:  000000 0 0   11 000 001  =  00 C1   (REG = AL is the source, R/M = CL is the destination)
  d = 1:  000000 1 0   11 001 000  =  02 C8   (REG = CL is the destination, R/M = AL is the source)

This opcode duality applies to all instructions with two register operands.
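The two legal encodings can be generated side by side. In this Python sketch, `encode_add_reg_reg` is a hypothetical helper that takes 3-bit register numbers from the REG table; it only covers the register-to-register form (MOD = 11).

```python
AL, CL = 0b000, 0b001      # 8-bit register numbers from the REG table
EAX, ECX = 0b000, 0b001    # the same numbers select 32-bit registers when s = 1


def encode_add_reg_reg(dest: int, src: int, s: int, d: int) -> bytes:
    """Encode ADD dest, src between two registers using direction bit d."""
    opcode = (d << 1) | s                       # 000000ds
    # d = 0: REG is the source, R/M the destination; d = 1: the reverse.
    reg, rm = (src, dest) if d == 0 else (dest, src)
    modrm = (0b11 << 6) | (reg << 3) | rm       # MOD = 11: R/M is a register
    return bytes([opcode, modrm])


for d in (0, 1):
    print("add cl, al  :", encode_add_reg_reg(CL, AL, s=0, d=d).hex(" "))
    print("add ecx, eax:", encode_add_reg_reg(ECX, EAX, s=1, d=d).hex(" "))
```

Both byte sequences for each instruction describe the same operation; an assembler is free to emit either one.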

Encoding ADD ECX, EAX Instruction
  Opcode:       0 0 0 0 0 0 d s  =  000000 0 1  =  01h
  MOD-REG-R/M:  MOD REG R/M      =  11 000 001  =  C1h

- 000000 indicates the ADD instruction.
- d = 0 indicates that we are adding the REG field to the R/M field.
- s = 1 indicates that we are adding 32-bit values together.
- MOD = 11 indicates that the R/M field is a register.
- REG = 000: this field, along with the d bit in the opcode, indicates that the source field is the EAX register.
- R/M = 001: this field, along with the d bit in the opcode, indicates that the destination field is the ECX register.

ADD ecx, eax = 01 C1 = 03 C8

Encoding ADD EDX, DISP Instruction
  Opcode:       0 0 0 0 0 0 d s  =  000000 1 1  =  03h
  MOD-REG-R/M:  MOD REG R/M      =  00 010 101  =  15h
  DISP32:       ww xx yy zz (L.O. byte first, H.O. byte last)

- 000000 indicates the ADD instruction.
- d = 1 indicates that we are adding the R/M field to the REG field.
- s = 1 indicates that we are adding 32-bit values together.
- The combination of MOD = 00 and R/M = 101 indicates that this is the displacement-only addressing mode.
- REG = 010: this field, along with the d bit in the opcode, indicates that the destination field is the EDX register.
- A 32-bit displacement follows the instruction.

ADD edx, disp = 03, 15, ww, xx, yy, zz
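The displacement-only form can be checked with a few lines of Python. With REG = 010 for EDX, the MOD-REG-R/M byte works out to 00 010 101 = 15h. The helper `encode_add_r32_disp32` and the sample address 12345678h are assumptions made for illustration.

```python
import struct

EDX = 0b010  # register number from the REG table


def encode_add_r32_disp32(dest: int, disp32: int) -> bytes:
    """Encode ADD dest, [disp32] using the displacement-only addressing mode."""
    opcode = 0x03                                  # 000000ds with d = 1, s = 1
    modrm = (0b00 << 6) | (dest << 3) | 0b101      # MOD = 00, R/M = 101
    # The displacement is stored low-order byte first (little-endian).
    return bytes([opcode, modrm]) + struct.pack("<I", disp32)


print(encode_add_r32_disp32(EDX, 0x12345678).hex(" "))  # 03 15 78 56 34 12
```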

Encoding ADD EDI, [EBX] Instruction
  Opcode:       0 0 0 0 0 0 d s  =  000000 1 1  =  03h
  MOD-REG-R/M:  MOD REG R/M      =  00 111 011  =  3Bh

- 000000 indicates the ADD instruction.
- d = 1 indicates that we are adding the R/M field to the REG field.
- s = 1 indicates that we are adding 32-bit values together.
- MOD = 00 indicates a zero-byte displacement.
- REG = 111: this field, along with the d bit in the opcode, indicates that the destination field is the EDI register.
- R/M = 011 indicates the use of the [EBX] addressing mode.

ADD edi, [ebx] = 03, 3B

Encoding ADD EAX, [ESI + disp8] Instruction
  Opcode:       0 0 0 0 0 0 d s  =  000000 1 1  =  03h
  MOD-REG-R/M:  MOD REG R/M      =  01 000 110  =  46h
  DISP8:        xx

- 000000 indicates the ADD instruction.
- d = 1 indicates that we are adding the R/M field to the REG field.
- s = 1 indicates that we are adding 32-bit values together.
- MOD = 01 indicates a one-byte displacement.
- REG = 000: this field, along with the d bit in the opcode, indicates that the destination field is the EAX register.
- R/M = 110 indicates the use of the [ESI] addressing mode.
- An 8-bit displacement follows the MOD-REG-R/M byte.

ADD eax, [esi + disp8] = 03, 46, xx

Encoding ADD EBX, [EBP + disp32] Instruction
  Opcode:       0 0 0 0 0 0 d s  =  000000 1 1  =  03h
  MOD-REG-R/M:  MOD REG R/M      =  10 011 101  =  9Dh
  DISP32:       ww xx yy zz

- 000000 indicates the ADD instruction.
- d = 1 indicates that we are adding the R/M field to the REG field.
- s = 1 indicates that we are adding 32-bit values together.
- MOD = 10 indicates the use of a 32-bit displacement.
- REG = 011: this field, along with the d bit in the opcode, indicates that the destination field is the EBX register.
- R/M = 101 indicates the use of [EBP].
- A 32-bit displacement follows the instruction.

ADD ebx, [ebp + disp32] = 03, 9D, ww, xx, yy, zz
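The three register-indirect examples ([EBX], [ESI + disp8], [EBP + disp32]) differ only in the MOD field and the displacement bytes. The Python sketch below picks MOD from the displacement; `encode_add_r32_mem` and the sample displacement values are hypothetical, and choosing the shortest displacement that fits is an assumption about what an assembler would typically do.

```python
import struct

EAX, EBX, EBP, ESI, EDI = 0b000, 0b011, 0b101, 0b110, 0b111


def encode_add_r32_mem(dest: int, base: int, disp=None) -> bytes:
    """Encode ADD dest, [base] or ADD dest, [base + disp] (no SIB byte)."""
    if base == 0b100:
        raise ValueError("R/M = 100 selects SIB mode, not [esp]")
    if disp is None and base == EBP:
        disp = 0          # there is no [ebp] slot: use [ebp + disp8] with 0
    if disp is None:
        mod, tail = 0b00, b""
    elif -128 <= disp <= 127:
        mod, tail = 0b01, struct.pack("<b", disp)    # one signed byte
    else:
        mod, tail = 0b10, struct.pack("<i", disp)    # four bytes, little-endian
    modrm = (mod << 6) | (dest << 3) | base
    return bytes([0x03, modrm]) + tail               # 03h: d = 1, s = 1


print(encode_add_r32_mem(EDI, EBX).hex(" "))           # 03 3b
print(encode_add_r32_mem(EAX, ESI, 0x10).hex(" "))     # 03 46 10
print(encode_add_r32_mem(EBX, EBP, 0x1000).hex(" "))   # 03 9d 00 10 00 00
```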

What is Scale Index Mode?
Mainly used to work with arrays:
- for example, base could hold the location of an array and index the offset of an element
- or, the displacement field is the name of the array and the index is the offset of an element
- multiply the index register by a scaling factor of 1, 2, 4 or 8 to adjust for the size of operands of different types
  - an array of WORD requires a factor of 2
  - an array of DWORD requires a factor of 4

Scale Index Addressing
- Can use a displacement (variable name) with an index (offset) and a scaling factor (n):
    displacement[index*n]
- or, a base register (pointer to a variable) with an index (offset) and a scaling factor (n):
    [base][index*n]
- or, all of the above:
    displacement[base][index*n]

Effective Address Calculation
Effective Address = Base + (Index*Scale) + Displacement
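As a quick numeric check of this formula, here is a minimal Python sketch. The function name and the sample values are hypothetical; wrapping the result to 32 bits is an assumption that matches 32-bit addressing.

```python
def effective_address(base: int = 0, index: int = 0, scale: int = 1,
                      disp: int = 0) -> int:
    """Effective Address = Base + (Index * Scale) + Displacement."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    return (base + index * scale + disp) & 0xFFFFFFFF  # keep 32 bits


# Element 3 of a DWORD array that starts 8 bytes past the address in the base
# register: 1000h + 3*4 + 8 = 1014h
print(hex(effective_address(base=0x1000, index=3, scale=4, disp=8)))  # 0x1014
```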

Encoding ADD EBP, [disp32 + EAX*1] Instruction
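  Opcode:       0 0 0 0 0 0 d s   =  000000 1 1  =  03h
  MOD-REG-R/M:  MOD REG R/M       =  00 101 100  =  2Ch
  SIB:          Scale Index Base  =  00 000 101  =  05h
  DISP32:       ww xx yy zz

- 000000 indicates the ADD instruction.
- d = 1 indicates that we are adding the R/M field to the REG field.
- s = 1 indicates that we are adding 32-bit values together.
- MOD = 00 and R/M = 100 select SIB mode; together with Base = 101 this gives the disp32 + reg*1 mode.
- REG = 101: EBP is the destination register.
- Scale = 00 and Index = 000: these two fields select the EAX*1 scaled index mode.
- Base = 101 (with MOD = 00) means displacement-only addressing: there is no base register and a 32-bit displacement follows.

ADD ebp, [disp32 + eax*1] = 03, 2C, 05, ww, xx, yy, zz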

Encoding ADD ECX, [EBX + EDI*4] Instruction
  Opcode:       0 0 0 0 0 0 d s   =  000000 1 1  =  03h
  MOD-REG-R/M:  MOD REG R/M       =  00 001 100  =  0Ch
  SIB:          Scale Index Base  =  10 111 011  =  BBh

- 000000 indicates the ADD instruction.
- d = 1 indicates that we are adding the R/M field to the REG field.
- s = 1 indicates that we are adding 32-bit values together.
- MOD = 00 and R/M = 100 mean SIB mode.
- REG = 001: ECX is the destination register.
- Scale = 10 and Index = 111: these two fields select the EDI*4 scaled index mode.
- Base = 011 = EBX.

ADD ecx, [ebx + edi * 4] = 03, 0C, BB
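Both scaled-index examples can be reproduced with one small encoder. In this Python sketch, `encode_add_r32_sib` is a hypothetical helper that handles only MOD = 00 (either a base register with no displacement, or no base register with a 32-bit displacement); the sample displacement 11223344h is made up for illustration.

```python
import struct

EAX, ECX, EBX, EBP, EDI = 0b000, 0b001, 0b011, 0b101, 0b111
SCALE_BITS = {1: 0b00, 2: 0b01, 4: 0b10, 8: 0b11}


def encode_add_r32_sib(dest: int, index: int, scale: int,
                       base=None, disp32: int = 0) -> bytes:
    """Encode ADD dest, [base + index*scale] or ADD dest, [disp32 + index*scale]."""
    if index == 0b100:
        raise ValueError("index field 100 is illegal")
    modrm = (0b00 << 6) | (dest << 3) | 0b100        # R/M = 100: SIB follows
    if base is None:
        # Base = 101 with MOD = 00: no base register, disp32 follows the SIB.
        sib = (SCALE_BITS[scale] << 6) | (index << 3) | 0b101
        tail = struct.pack("<I", disp32 & 0xFFFFFFFF)
    else:
        if base == 0b101:
            raise ValueError("[ebp + index*n] needs MOD = 01 or 10")
        sib = (SCALE_BITS[scale] << 6) | (index << 3) | base
        tail = b""
    return bytes([0x03, modrm, sib]) + tail           # 03h: d = 1, s = 1


print(encode_add_r32_sib(ECX, EDI, 4, base=EBX).hex(" "))           # 03 0c bb
print(encode_add_r32_sib(EBP, EAX, 1, disp32=0x11223344).hex(" "))  # 03 2c 05 44 33 22 11
```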

Encoding ADD Immediate Instruction
  Opcode:       1 0 0 0 0 0 x s
  MOD-REG-R/M:  MOD 000 R/M
  followed by an optional displacement, then the constant

- The leading 1 in the opcode indicates that this is an immediate mode instruction.
- x = 0: the constant is the same size as specified by s. x = 1: the constant is a one-byte operand that is sign extended to the size of the operand.
- s = 0: 8-bit operands. s = 1: 32-bit operands.
- MOD and R/M specify the destination operand.
- The REG field holds an opcode extension: 000 for ADD immediate.
- Optional one- or four-byte displacement (as specified by the MOD-R/M field).
- An 8-, 16-, or 32-bit constant follows the instruction.
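The x bit makes it possible to store small constants in a single byte. The Python sketch below encodes ADD with a 32-bit register destination and an immediate; `encode_add_r32_imm` is a hypothetical helper, and preferring the sign-extended one-byte form whenever the constant fits is an assumption about assembler behavior rather than something the format requires.

```python
import struct

ECX = 0b001  # register number from the REG table


def encode_add_r32_imm(dest: int, imm: int) -> bytes:
    """Encode ADD dest, imm for a 32-bit register destination (MOD = 11)."""
    s = 1                                        # 32-bit operands
    x = 1 if -128 <= imm <= 127 else 0           # 1: sign-extended byte constant
    opcode = 0b10000000 | (x << 1) | s           # 100000xs
    modrm = (0b11 << 6) | (0b000 << 3) | dest    # REG = 000: ADD opcode extension
    if x:
        constant = struct.pack("<b", imm)                 # one signed byte
    else:
        constant = struct.pack("<I", imm & 0xFFFFFFFF)    # four bytes, low byte first
    return bytes([opcode, modrm]) + constant


print(encode_add_r32_imm(ECX, 5).hex(" "))     # 83 c1 05
print(encode_add_r32_imm(ECX, 1000).hex(" "))  # 81 c1 e8 03 00 00
```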

Summary: Encoding Real x86 Instructions
- x86 Instruction Format Reference
- x86 Opcode Sizes
- x86 ADD Instruction Opcode
- Encoding x86 Instruction Operands, MOD-REG-R/M Byte
- REG Field of the MOD-REG-R/M Byte
- MOD R/M Byte and Addressing Modes
- SIB (Scaled Index Byte) Layout
- Scaled Indexed Addressing Mode

Summary: Encoding ADD Instruction Example
- Encoding ADD CL, AL Instruction
- Encoding ADD ECX, EAX Instruction
- Encoding ADD EDX, DISPLACEMENT Instruction
- Encoding ADD EDI, [EBX] Instruction
- Encoding ADD EAX, [ ESI + disp8 ] Instruction
- Encoding ADD EBX, [ EBP + disp32 ] Instruction
- Encoding ADD EBP, [ disp32 + EAX*1 ] Instruction
- Encoding ADD ECX, [ EBX + EDI*4 ] Instruction
- Encoding ADD Immediate Instruction

Reference Instruction Format Design Encoding Real x86 Instructions
Encoding Real x86 Instructions
