Chapter 3-1 ARM ISA ARM Instruction Set Architecture ARM Instruction Set Architecture Next Lecture Next Lecture  ARM program examples.

Chapter 3-1 ARM ISA ARM Instruction Set Architecture ARM Instruction Set Architecture Next Lecture Next Lecture  ARM program examples

2 ARM processors Used in low-power and low-cost embedded applications Used in low-power and low-cost embedded applications  Cell phones, PDAs, modems Various simulation models available for embedded system design as well as low-power design Various simulation models available for embedded system design as well as low-power design Support both Big-endian and Little-endian Support both Big-endian and Little-endian All arithmetic and logic instructions operate only on data in processor registers All arithmetic and logic instructions operate only on data in processor registers Pipelining: 3 or 5 stages Pipelining: 3 or 5 stages  Instruction Fetch (IF), Decode (ID), Execute (EX), Memory Access (Mem) and Write-back (WB) http://www.heyrick.co.uk/assembler/

3 339v10 The ARM Architecture Data Sizes and Instruction Sets The ARM is a 32-bit RISC architecture. When used in relation to the ARM: Byte means 8 bits Halfword means 16 bits (two bytes) Word means 32 bits (four bytes) Most ARMs implement two instruction sets 32-bit ARM Instruction Set 16-bit Thumb Instruction Set for tiny systems Jazelle cores can also execute Java bytecode

4 439v10 The ARM Architecture r0 r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r13 (sp) r14 (lr) r15 (pc) cpsr r13 (sp) r14 (lr) spsr r13 (sp) r14 (lr) spsr r13 (sp) r14 (lr) spsr r13 (sp) r14 (lr) spsr r8 r9 r10 r11 r12 r13 (sp) r14 (lr) spsr FIQIRQSVCUndefAbort User Mode r0 r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r13 (sp) r14 (lr) r15 (pc) cpsr r13 (sp) r14 (lr) spsr r13 (sp) r14 (lr) spsr r13 (sp) r14 (lr) spsr r13 (sp) r14 (lr) spsr r8 r9 r10 r11 r12 r13 (sp) r14 (lr) spsr Current Visible Registers Banked out Registers FIQIRQSVCUndefAbort r0 r1 r2 r3 r4 r5 r6 r7 r15 (pc) cpsr r13 (sp) r14 (lr) spsr r13 (sp) r14 (lr) spsr r13 (sp) r14 (lr) spsr r13 (sp) r14 (lr) spsr r8 r9 r10 r11 r12 r13 (sp) r14 (lr) spsr Current Visible Registers Banked out Registers UserIRQSVCUndefAbort r8 r9 r10 r11 r12 r13 (sp) r14 (lr) FIQ ModeIRQ Mode r0 r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r15 (pc) cpsr r13 (sp) r14 (lr) spsr r13 (sp) r14 (lr) spsr r13 (sp) r14 (lr) spsr r13 (sp) r14 (lr) spsr r8 r9 r10 r11 r12 r13 (sp) r14 (lr) spsr Current Visible Registers Banked out Registers UserFIQSVCUndefAbort r13 (sp) r14 (lr) Undef Mode r0 r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r15 (pc) cpsr r13 (sp) r14 (lr) spsr r13 (sp) r14 (lr) spsr r13 (sp) r14 (lr) spsr r13 (sp) r14 (lr) spsr r8 r9 r10 r11 r12 r13 (sp) r14 (lr) spsr Current Visible Registers Banked out Registers UserFIQIRQSVCAbort r13 (sp) r14 (lr) SVC Mode r0 r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r15 (pc) cpsr r13 (sp) r14 (lr) spsr r13 (sp) r14 (lr) spsr r13 (sp) r14 (lr) spsr r13 (sp) r14 (lr) spsr r8 r9 r10 r11 r12 r13 (sp) r14 (lr) spsr Current Visible Registers Banked out Registers UserFIQIRQUndefAbort r13 (sp) r14 (lr) Abort Mode r0 r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r15 (pc) cpsr r13 (sp) r14 (lr) spsr r13 (sp) r14 (lr) spsr r13 (sp) r14 (lr) spsr r13 (sp) r14 (lr) spsr r8 r9 r10 r11 r12 r13 (sp) r14 (lr) spsr Current Visible Registers Banked out Registers UserFIQIRQSVCUndef r13 (sp) r14 (lr) The ARM Register Set

5 ARM Instruction Set Registers Registers  15 general purpose registers (R0-R14), 32 bit wide  R15 is Program Counter (PC)  R14 is used as Link Register (LR), R13 is Stack Pointer (SP)  Status Register (CPSR) holds the condition flags (N,Z,C, and V), the interrupt disable bits and processor mode bits  There are 15 additional general purpose registers called the banked registers, which are used when the processor switches into Supervisor or Interrupt modes CPSR 31283029 N Z CV 76 4 0 Processor mode bits Interrupt disable bits

6 Each instruction is encoded into 32 bits Each instruction is encoded into 32 bits Access to memory is through load and store only Access to memory is through load and store only In a load, the operand is transferred into the register named in the Rd field In a store, the operand is transferred from Rd into memory If the operand is a byte, it is always located in the lower order byte position of the register and on a load the higher order bytes are filled with zeros. ARM Instructions ConditionOP codeRnRdOther infoRm 31 28 27 20 19 16 15 12 11 4 3 08bits 4 bits 4448bits

7 All instructions are conditionally executed All instructions are conditionally executed The instruction is executed only if the current state of the processor condition code flags equal the condition specified in bits b31 – b28 The instruction is executed only if the current state of the processor condition code flags equal the condition specified in bits b31 – b28 One of the conditions is used to indicate that the instruction is always executed One of the conditions is used to indicate that the instruction is always executed Conditional Executions of Instructions ConditionOP codeRnRdOther infoRm 31 28 27 20 19 16 15 12 11 4 3 08bits 4 bits 4448bits CPSR N Z CV 31283029

8 CMP Rn, Rm CMP Rn, Rm  Performs the operation [Rn]-[Rm] and sets the condition codes based on the result of the operation The arithmetic and logic instructions affect the condition code flags only if explicitly specified in the Opcode field The arithmetic and logic instructions affect the condition code flags only if explicitly specified in the Opcode field Example Example  ADDS R0, R1, R2; sets the condition code flags  ADD R0, R1, R2; does not Setting Condition Code

9 939v10 The ARM Architecture ARM instructions can be made to execute conditionally by postfixing them with the appropriate condition code field. This improves code density and performance by reducing the number of forward branch instructions. CMP r3,#0 CMP r3,#0 BEQ skip ADDNE r0,r1,r2 ADD r0,r1,r2 skip By default, data processing instructions do not affect the condition code flags but the flags can be optionally set by using “S”. CMP does not need “S”. loop … SUBS r1,r1,#1 BNE loop if Z flag clear then branch decrement r1 and set flags Conditional Execution and Flags

10 39v10 The ARM Architecture Condition Codes Not equal Unsigned higher or same Unsigned lower Minus Equal Overflow No overflow Unsigned higher Unsigned lower or same Positive or Zero Less than Greater than Less than or equal Always Greater or equal EQ NE CS/HS CC/LO PL VS HI LS GE LT GT LE AL MI VC SuffixDescription Z=0 C=1 C=0 Z=1 Flags tested N=1 N=0 V=1 V=0 C=1 & Z=0 C=0 or Z=1 N=V N!=V Z=0 & N=V Z=1 or N=!V The possible condition codes are listed below: Note: AL is the default and does not need to be specified

11 39v10 The ARM Architecture Examples of conditional execution Use a sequence of several conditional instructions if (a==0) func(1); CMP r0,#0 MOVEQ r0,#1 BLEQ func Set the flags, then use various condition codes if (a==0) x=0; if (a>0) x=1;(else if) CMP r0,#0 MOVEQ r1,#0 MOVGT r1,#1 Use conditional compare instructions if (a==4 || a==10) x=0; Pop Quiz? Bonus 1pt on test

12 Basic load instruction: LDR Rd, [Rn, #offset] Basic load instruction: LDR Rd, [Rn, #offset]  Offset: a signed number in the immediate mode  EA = a signed offset + the contents of register Rn  Operation: Rd  [[Rn]+offset]  The destination register listed first The magnitude of the offset is a 12 bit immediate value contained in the lower 12 bits of the instruction The magnitude of the offset is a 12 bit immediate value contained in the lower 12 bits of the instruction LDR Rd, [Rn,Rm] performs Rd  [[Rn]+[Rm]] LDR Rd, [Rn,Rm] performs Rd  [[Rn]+[Rm]]  The magnitude is the content of a third register Rm LDR Rd, [Rn] performs Rd  [[Rn]] LDR Rd, [Rn] performs Rd  [[Rn]] Basic Addressing Modes ConditionOP codeRnRdOther infoRm 31 28 27 20 19 16 15 12 11 4 3 08bits 4 bits 4448bits offset

13 STR Rd, [Rn] performs [[Rn]]  [Rd] STR Rd, [Rn] performs [[Rn]]  [Rd]  i.e., transfers a word into the memory The STRB instruction transfers the byte contained in the low-order end of Rd The STRB instruction transfers the byte contained in the low-order end of Rd Note the order of operands Note the order of operands Addressing Modes: Store

14 [Rn, #offset] or [Rn, ±Rm, shift] [Rn, #offset] or [Rn, ±Rm, shift]  EA = [Rn] + offset, or EA = [Rn] ± [Rm] shifted Calculate operand address first, and then perform operation Calculate operand address first, and then perform operation Addressing Modes: Pre-indexed 1000 STR R3, [R5,R6] Operand 1100 Offset 100 Word (4 bytes) 1000 R5 100 R6 Base register Offset register

15 STR R3, [R5, R10, LSL #2] STR R3, [R5, R10, LSL #2]  EA = [R5] + [R10 * 4] Addressing Modes: Pre-indexed example 1000 STR R3, [R5, R10, LSL #2] Operand 1100 Offset 100 Word (4 bytes) 1000 R5 25 R10 Base register Offset register C[ ] int C[100]; … for (i=0; i++; i<N) C[i] = A[i] + B[i] i C[0]

16 [Rn, #offset]! or [Rn, ±Rm, shift]! [Rn, #offset]! or [Rn, ±Rm, shift]!  EA = [Rn] + offset  EA = [Rn] ± [Rm] shifted  Then, EA is written back into Rn Example: Example:  STR R0, [Rbase, Rindex]!  Store R0 at Rbase + Rindex, and write back new address Rbase + Rindex to Rbase. ! in the Pre-indexed mode means that a write back is to be performed ! in the Pre-indexed mode means that a write back is to be performed Pre-indexed with Write Back Q?

17 2008 27 2012 R5 27 R0 After execution of the Push instruction Base register (Stack pointer) Push instruction: STR R0, [R5, #-4]!  –  –  EA = R5 – 4, i.e., R5  R5 – 4   Perform operation Store R5 is used as the stack pointer R5 initially contains the address 2012 of the current TOS The immediate offset -4 is added to the content (2012) of R5 and written back into R5 This new TOS location is used as the EA (2008) to store the contents of R0, 27 Pre-indexed Addressing with write-back

18 The EA of the operand is the contents of Rn The EA of the operand is the contents of Rn Perform operation first with the operand Perform operation first with the operand Then add the offset to Rn (i.e., the result is written back into Rn) Then add the offset to Rn (i.e., the result is written back into Rn) The post-indexed mode always involves a write back The post-indexed mode always involves a write back The pre-indexed and post-indexed are distinguished by the way the square brackets are used. The pre-indexed and post-indexed are distinguished by the way the square brackets are used.  [Rn, #offset] vs. [Rn], #offset The offset may be given as an immediate value (range +/- 4095) or as the contents of the third register Rm The offset may be given as an immediate value (range +/- 4095) or as the contents of the third register Rm Addressing Modes: Post-indexed

19 10006 321 1200 1000R2 25 R10 Base register Word (4 bytes)/element Offset register -17 LDR R1, [R2], R10, LSL #2 100 = 25x4 Post-indexed Example: used to access a column of elements of a 25x25 matrix (1,1)(1,2)…(1,25) (2,1)… (3,1)… … 1000 1100 1200 1..00 1100 for (i=1; i++; i≤N) sum += D[i,1];

20 39v10 The ARM Architecture 0x5 r1 0x200 Base Register 0x200 r0 0x5 Source Register for STR Offset 12 0x20c r1 0x200 Original Base Register 0x200 r0 0x5 Source Register for STR Offset 12 0x20c r1 0x20c Updated Base Register Write-back (auto-update) form: STR r0,[r1,#12]! Pre or Post Indexed Addressing? Pre-indexed: STR r0,[r1,#12] Post-indexed: STR r0,[r1],#12 int *ptr; x = *ptr++;

21 LDR R0, [R1, -R2]! LDR R0, [R1, -R2]!  R0  [[R1] – [R2]]; R1  [R1] – [R2] When the offset is given in a register, it may be scaled by a power of 2 by shifting to the right or to the left. When the offset is given in a register, it may be scaled by a power of 2 by shifting to the right or to the left.  This is indicated with either LSL or LSR and the shift amount  The shift amount is in the range 0 to 31 LDR R0, [R1, -R2, LSL #4]! LDR R0, [R1, -R2, LSL #4]!  R0  [[R1] – 16 x [R2]]; R1  [R1] – 16 x [R2] The PC may be used as the base register Rn. The assembler determines the immediate offset as the signed distance between the address of the operand and the contents of the PC. (relative addressing mode) The PC may be used as the base register Rn. The assembler determines the immediate offset as the signed distance between the address of the operand and the contents of the PC. (relative addressing mode) Recap: Pre, Post-indexed Modes

22 Word (4 bytes) Memory Address 1000 LDR R1, ITEM 1004 1008 Operand ITEM= 1060 Updated [PC] = 1008 52 = offset The offset calculated by the assembler is 52 because the updated PC = 1008 EA = 1060 = 1008 + 52 Relative Addressing Mode When the effective address is calculated at instruction execution time, the contents of the PC will have been updated to the address two words (8 bytes) forward from the current instruction When the effective address is calculated at instruction execution time, the contents of the PC will have been updated to the address two words (8 bytes) forward from the current instruction Why?

23 Multiple Load and Store The ARM can also load multiple operands   Called block transfer   LDM: load multiple   STM: store multiple   The offset is always 4 ; thus it is not specified explicitly Assume R10 is the base register and it contains 1000 Assume R10 is the base register and it contains 1000 LDMIA R10!, {R0,R1,R6,R7} LDMIA R10!, {R0,R1,R6,R7} transfers the words from locations 1000, 1004, 1008, 1012 into registers R0, R1, R6 and R7 transfers the words from locations 1000, 1004, 1008, 1012 into registers R0, R1, R6 and R7 The suffix IA indicates increment after The suffix IA indicates increment after  IB: Increment Before, DA: Decrement After, DB: Decrement Before

24 39v10 The ARM Architecture LDM / STM operation Syntax: { } Rb{!}, 4 addressing modes: LDMIA / STMIA increment after LDMIB / STMIB increment before LDMDA / STMDA decrement after LDMDB / STMDB decrement before DA r1 Decreasing Address r4 r0 r1 r4 r0 r1 r0 r4r1 r0 r4 r10 DBIAIB LDMxx r10, {r0,r1,r4} STMxx r10, {r0,r1,r4} Base Register (Rb) Pop Quiz?

25 Move Instructions MOV Rd, Rm MOV Rd, Rm  Rd  [Rm] MOV R0, #76 MOV R0, #76  R0  #76

26 39v10 The ARM Architecture Branch : B{ } label Branch with Link : BL{ } subroutine_label The processor core shifts the offset field left by 2 positions, sign-extends it and adds it to the PC ± 32 Mbyte range How to perform longer branches? 2831240 Cond 1 0 1 L Offset Condition field Link bit 0 = Branch 1 = Branch with link 232527 Branch instructions

27 Conditional branch instructions contain 2 ’ s complement 24 bit offset, whihc is first left-shifted by 2 and then added to the updated contents of the PC to generate the branch target. Conditional branch instructions contain 2 ’ s complement 24 bit offset, whihc is first left-shifted by 2 and then added to the updated contents of the PC to generate the branch target. condition OPcode offset 31 28 27 24 23 0 BEQ LOCATION Updated [PC] =1008 1000 1004 LOCATION = 1100 Offset = 92 At the time the branch target address is calculated, the content of the PC has been updated to contain the address of the instruction that is two words beyond the branch instruction. Branch Target Conditional Branch Instructions

28 39v10 The ARM Architecture ARM Branches and Subroutines B PC relative. ±32 Mbyte range. BL Stores return address in LR Returning implemented by restoring the PC from LR For non-leaf functions, LR will have to be stacked STMFD sp!,{regs,lr} : BL func2 : LDMFD sp!,{regs,pc} func1func2 : BL func1 : MOV pc, lr

29 39v10 The ARM Architecture Data processing Instructions Consist of : Arithmetic: ADDADCSUBSBCRSBRSC Logical: ANDORREORBIC Comparisons: CMPCMNTSTTEQ Data movement: MOVMVN These instructions only work on registers, NOT memory. Syntax: { }{S} Rd, Rn, Operand2 Comparisons set flags only - they do not change Rd Data movement does not change Rn Second operand is sent to the ALU via barrel shifter.

30 Opcode Rd, Rn, Rm Opcode Rd, Rn, Rm ADD R0, R2, R4 ADD R0, R2, R4  R0  [R2] + [R4] SUB R0, R6, R5 SUB R0, R6, R5  R0  [R6] – [R5] ADD R0, R3, #17 ADD R0, R3, #17  R0  [R3] + 17 ADD R0, R1, R5, LSL #4 ADD R0, R1, R5, LSL #4  R0  [R1] + 16 x [R5] MUL R0, R1, R2 MUL R0, R1, R2  R0  [R1] x [R2]  Places the low-order 32 bits of the product in a third register  High order bits of the product are discarded MLA R0, R1, R2, R3 MLA R0, R1, R2, R3  R0  [R1] x [R2] + [R3]; multiply accumulate Arithmetic Instructions

31 AND Rd, Rn, Rm AND Rd, Rn, Rm  Rd  [Rn] AND [Rm] ; logical bitwise AND  Example ; R0  02FA62CA and R1  0000FFFF  AND R0, R0, R1;R0  000062CA BIC Rd, Rn, Rm BIC Rd, Rn, Rm  Bit clear, complements each bit in Rm and then performs AND with the bits in Rn  Example ; R0  02FA62CA and R1  0000FFFF  BIC R0, R0, R1;R0  02FA0000 MVN complements the bits of the source operand and places the result in Rd MVN complements the bits of the source operand and places the result in Rd  R3  0F0F0F0F  MVN R0, R3;R0  F0F0F0F0 Logic Instructions

32 39v10 The ARM Architecture Destination CF 0 Destination CF LSL : Logical Shift Left ASR: Arithmetic Shift Right Multiplication by a power of 2Division by a power of 2, preserving the sign bit Destination CF...0 Destination CF LSR : Logical Shift RightROR: Rotate Right Division by a power of 2Bit rotate with wrap around from LSB to MSB Destination RRX: Rotate Right Extended Single bit rotate with wrap around from CF to MSB CF Shift Operations A barrel shifter is a hardware device that can shift a data word left or right by any number of bits in a single operationshiftdata wordbits

33 39v10 The ARM Architecture A Barrel Shifter

34 39v10 The ARM Architecture Register, optionally with shift operation Shift value can be either be: 5 bit unsigned integer Specified in bottom byte of another register. Used for multiplication by constant Immediate value 8 bit number, with a range of 0- 255. Rotated right through even number of positions Allows increased range of 32-bit constants to be loaded directly into registers Result Operand 1 Barrel Shifter Operand 2 ALU Using the Barrel Shifter: The Second Operand

35 39v10 The ARM Architecture No ARM instruction can contain a 32 bit immediate constant All ARM instructions are fixed as 32 bits long The data processing instruction format has 12 bits available for operand2 4 bit rotate value (0-15) is multiplied by two to give range 0- 30 in steps of 2 Rule to remember is “8-bits shifted by an even number of bit positions”. 07118 immed_8 Shifter ROR rot x2 Quick Quiz: MOV r0,#255,8 Immediate Constants (1)

36 39v10 The ARM Architecture Examples: The assembler converts immediate values to the rotate form: MOV r0,#4096; uses 0x40 ror 26 ADD r1,r2,#0xFF0000; uses 0xFF ror 16 The bitwise complements can also be formed using MVN: MOV r0, #0xFFFFFFFF ; assembles to MVN r0,#0 Values that cannot be generated in this way will cause an error. 031 ror #0 range 0-0xff000000 step 0x01000000 ror #8 range 0-0x000000ff step 0x00000001 range 0-0x000003fc step 0x00000004 ror #30 000000000000000000000000 000000000000000000000000 000000000000000000000000 Immediate Constants (2)

37 39v10 The ARM Architecture To allow larger constants to be loaded, the assembler offers a pseudo-instruction: LDR rd, =const This will either: Produce a MOV or MVN instruction to generate the value (if possible). or Generate a LDR instruction with a PC-relative address to read the constant from a literal pool (Constant data area embedded in the code). For example LDR r0,=0xFF => MOV r0,#0xFF LDR r0,=0x55555555 => LDR r0,[PC,#Imm12] … … DCD 0x55555555 This is the recommended way of loading constants into a register Loading 32 Bit Constants

38 39v10 The ARM Architecture Multiply Syntax: MUL{ }{S} Rd, Rm, RsRd = Rm * Rs MLA{ }{S} Rd,Rm,Rs,RnRd = (Rm * Rs) + Rn [U|S]MULL{ }{S} RdLo, RdHi, Rm, RsRdHi,RdLo := Rm*Rs [U|S]MLAL{ }{S} RdLo, RdHi, Rm, Rs RdHi,RdLo := (Rm*Rs)+RdHi,RdLo Cycle time Basic MUL instruction 2-5 cycles on ARM7TDMI 1-3 cycles on StrongARM/XScale 2 cycles on ARM9E/ARM102xE +1 cycle for ARM9TDMI (over ARM7TDMI) +1 cycle for accumulate (not on 9E though result delay is one cycle longer) +1 cycle for “long” Above are “general rules” - refer to the TRM for the core you are using for the exact details

39 39v10 The ARM Architecture Single register data transfer LDRSTR Word LDRBSTRB Byte LDRHSTRH Halfword LDRSB Signed byte load LDRSH Signed halfword load Memory system must support all access sizes Syntax: LDR { }{ } Rd, STR { }{ } Rd, e.g. LDREQB

40 39v10 The ARM Architecture Address accessed by LDR/STR is specified by a base register plus an offset For word and unsigned byte accesses, offset can be An unsigned 12-bit immediate value (ie 0 - 4095 bytes). LDR r0,[r1,#8] A register, optionally shifted by an immediate value LDR r0,[r1,r2] LDR r0,[r1,r2,LSL#2] This can be either added or subtracted from the base register: LDR r0,[r1,#-8] LDR r0,[r1,-r2] LDR r0,[r1,-r2,LSL#2] For halfword and signed halfword / byte, offset can be: An unsigned 8 bit immediate value (ie 0-255 bytes). A register (unshifted). Choice of pre-indexed or post-indexed addressing Address Accessed

Chapter 3-1 ARM ISA ARM Instruction Set Architecture ARM Instruction Set Architecture Next Lecture Next Lecture  ARM program examples.

Similar presentations

Presentation on theme: "Chapter 3-1 ARM ISA ARM Instruction Set Architecture ARM Instruction Set Architecture Next Lecture Next Lecture  ARM program examples."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Chapter 3-1 ARM ISA ARM Instruction Set Architecture ARM Instruction Set Architecture Next Lecture Next Lecture  ARM program examples.

Similar presentations

Presentation on theme: "Chapter 3-1 ARM ISA ARM Instruction Set Architecture ARM Instruction Set Architecture Next Lecture Next Lecture  ARM program examples."— Presentation transcript:

Similar presentations

About project

Feedback