Presentation is loading. Please wait.

Presentation is loading. Please wait.

11 Architecture Revisions 199820002002 2004 time version ARMv5 ARMv6 199419962006 V4 StrongARM ® ARM926EJ-S™ XScale TM ARM102xE ARM1026EJ-S™ ARM9x6E ARM92xT.

Similar presentations


Presentation on theme: "11 Architecture Revisions 199820002002 2004 time version ARMv5 ARMv6 199419962006 V4 StrongARM ® ARM926EJ-S™ XScale TM ARM102xE ARM1026EJ-S™ ARM9x6E ARM92xT."— Presentation transcript:

1 11 Architecture Revisions 199820002002 2004 time version ARMv5 ARMv6 199419962006 V4 StrongARM ® ARM926EJ-S™ XScale TM ARM102xE ARM1026EJ-S™ ARM9x6E ARM92xT ARM1136JF-S™ ARM7TDMI-S™ ARM720T™ XScale is a trademark of Intel Corporation ARMv7 SC100™ SC200™ ARM1176JZF-S™ ARM1156T2F-S™

2 22 Data Sizes and Instruction Sets  The ARM is a 32-bit architecture.  When used in relation to the ARM:  Byte means 8 bits  Halfword means 16 bits (two bytes)  Word means 32 bits (four bytes)  Most ARM’s implement two instruction sets  32-bit ARM Instruction Set  16-bit Thumb Instruction Set  Jazelle cores can also execute Java bytecode

3 33 ARM States  ARM architecture define a 16-bit instruction set called the Thumb instruction set. The functionality of the Thumb instruction set is a subset of the functionality of the 32-bit ARM instruction set.  A processor that is executing Thumb instructions is said to be operating in Thumb state. A Thumb-capable processor that is executing ARM instructions is said to be operating in ARM state.  ARM processors always start in ARM state. You must explicitly change to Thumb state using a BX (Branch and exchange instruction set) instruction.

4 44 Processor Modes  The ARM has seven basic operating modes:  User : unprivileged mode under which most tasks run  FIQ : entered when a high priority (fast) interrupt is raised  IRQ : entered when a low priority (normal) interrupt is raised  Supervisor : entered on reset and when a Software Interrupt instruction is executed  Abort : used to handle memory access violations  Undef : used to handle undefined instructions  System : privileged mode using the same registers as user mode

5 55 r0 r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r13 (sp) r14 (lr) r15 (pc) cpsr r13 (sp) r14 (lr) spsr r13 (sp) r14 (lr) spsr r13 (sp) r14 (lr) spsr r13 (sp) r14 (lr) spsr r8 r9 r10 r11 r12 r13 (sp) r14 (lr) spsr FIQIRQSVCUndefAbort User Mode r0 r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r13 (sp) r14 (lr) r15 (pc) cpsr r13 (sp) r14 (lr) spsr r13 (sp) r14 (lr) spsr r13 (sp) r14 (lr) spsr r13 (sp) r14 (lr) spsr r8 r9 r10 r11 r12 r13 (sp) r14 (lr) spsr Current Visible Registers Banked out Registers FIQIRQSVCUndefAbort r0 r1 r2 r3 r4 r5 r6 r7 r15 (pc) cpsr r13 (sp) r14 (lr) spsr r13 (sp) r14 (lr) spsr r13 (sp) r14 (lr) spsr r13 (sp) r14 (lr) spsr r8 r9 r10 r11 r12 r13 (sp) r14 (lr) spsr Current Visible Registers Banked out Registers UserIRQSVCUndefAbort r8 r9 r10 r11 r12 r13 (sp) r14 (lr) FIQ ModeIRQ Mode r0 r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r15 (pc) cpsr r13 (sp) r14 (lr) spsr r13 (sp) r14 (lr) spsr r13 (sp) r14 (lr) spsr r13 (sp) r14 (lr) spsr r8 r9 r10 r11 r12 r13 (sp) r14 (lr) spsr Current Visible Registers Banked out Registers UserFIQSVCUndefAbort r13 (sp) r14 (lr) Undef Mode r0 r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r15 (pc) cpsr r13 (sp) r14 (lr) spsr r13 (sp) r14 (lr) spsr r13 (sp) r14 (lr) spsr r13 (sp) r14 (lr) spsr r8 r9 r10 r11 r12 r13 (sp) r14 (lr) spsr Current Visible Registers Banked out Registers UserFIQIRQSVCAbort r13 (sp) r14 (lr) SVC Mode r0 r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r15 (pc) cpsr r13 (sp) r14 (lr) spsr r13 (sp) r14 (lr) spsr r13 (sp) r14 (lr) spsr r13 (sp) r14 (lr) spsr r8 r9 r10 r11 r12 r13 (sp) r14 (lr) spsr Current Visible Registers Banked out Registers UserFIQIRQUndefAbort r13 (sp) r14 (lr) Abort Mode r0 r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r15 (pc) cpsr r13 (sp) r14 (lr) spsr r13 (sp) r14 (lr) spsr r13 (sp) r14 (lr) spsr r13 (sp) r14 (lr) spsr r8 r9 r10 r11 r12 r13 (sp) r14 (lr) spsr Current Visible Registers Banked out Registers UserFIQIRQSVCUndef r13 (sp) r14 (lr) The ARM Register Set

6 66 Registers and ARM States  In ARM state, 16 general registers and one or two status registers are accessible at any one time.  In Thumb state, eight general registers, the Program Counter (PC), Stack Pointer (SP), Link Register (LR), and Current Program Status Register (CPSR) are accessible.

7 77 Registers and ARM States  The limitations of Thumbs are alleviated with Thumb2 while retaining very dense coding  Thumb-2 introduced variable length instructions to the original Thumb  Now instructions can be a mixture of 16-bit and 32-bit. That means you retain the size advantage of the original Thumb in everyday code, but now have access to almost the full ARM feature-set in more complex code,  Aside from the aforementioned access to the full register set from all register operations,

8 88 Cortex-M4 Devices Processor mode and privilege levels for software execution  The processor modes are:  Thread mode  Used to execute application software. The processor enters Thread mode when it comes out of reset.  Handler mode  Used to handle exceptions. The processor returns to Thread mode when it has finished all exception processing.

9 99 Cortex-M4 Devices Processor mode and privilege levels for software execution  The privilege levels for software execution are:  Unprivileged  The software:  has limited access to the MSR and MRS instructions, and cannot use the CPS instruction  cannot access the system timer, NVIC, or system control block  might have restricted access to memory or peripherals.  Unprivileged software executes at the unprivileged level.  Privileged  The software can use all the instructions and has access to all resources.  Privileged software executes at the privileged level.

10 10 Vector Table Exception Handling  When an exception occurs, the ARM:  Copies CPSR into SPSR_  Sets appropriate CPSR bits  Change to ARM state  Change to exception mode  Disable interrupts (if appropriate)  Stores the return address in LR_  Sets PC to vector address  To return, exception handler needs to:  Restore CPSR from SPSR_  Restore PC from LR_ This can only be done in ARM state. Vector table can be at 0xFFFF0000 on ARM720T and on ARM9/10 family devices FIQ IRQ (Reserved) Data Abort Prefetch Abort Software Interrupt Undefined Instruction Reset 0x1C 0x18 0x14 0x10 0x0C 0x08 0x04 0x00

11 11 039v12 Program Status Registers  Condition code flags  N = Negative result from ALU  Z = Zero result from ALU  C = ALU operation Carried out  V = ALU operation oVerflowed  Sticky Overflow flag - Q flag  Architecture 5TE/J only  Indicates if saturation has occurred  J bit  Architecture 5TEJ only  J = 1: Processor in Jazelle state  Interrupt Disable bits.  I = 1: Disables the IRQ.  F = 1: Disables the FIQ.  T Bit  Architecture xT only  T = 0: Processor in ARM state  T = 1: Processor in Thumb state  Mode bits  Specify the processor mode 2731 N Z C V Q 2867 I F T mode 1623 815 54024 fsxc U n d e f i n e dJ

12 12 039v12 Cortex-M4 Registers Related Details

13 13 039v12  Run the Assemly_1 project:  Understand the overall structure of the code  Main debug operations  What is the special role of R15 and R14 ?  Understand how to call a function or subroutine  What is the relation between the back link address and R14 ? Why ?  Go to unprivileged mode  Run the Blink_NEW project:  Understand the overall structure of the code  Main debug operations  Compare counter++ and counter = counter + 2 assembly conversions  Where is counter variable stored ?  Create another function, call it from main  How call is translated in assembly? Basic Understanding of Environment

14 14  ARM instructions can be made to execute conditionally by postfixing them with the appropriate condition code field.  This improves code density and performance by reducing the number of forward branch instructions. CMP r3,#0 CMP r3,#0 BEQ skip ADDNE r0,r1,r2 ADD r0,r1,r2 skip  By default, data processing instructions do not affect the condition code flags but the flags can be optionally set by using “S”. CMP does not need “S”. loop … SUBS r1,r1,#1 BNE loop if Z flag clear then branch decrement r1 and set flags Conditional Execution and Flags

15 15 Condition Codes Not equal Unsigned higher or same Unsigned lower Minus Equal Overflow No overflow Unsigned higher Unsigned lower or same Positive or Zero Less than Greater than Less than or equal Always Greater or equal EQ NE CS/HS CC/LO PL VS HI LS GE LT GT LE AL MI VC SuffixDescription Z=0 C=1 C=0 Z=1 Flags tested N=1 N=0 V=1 V=0 C=1 & Z=0 C=0 or Z=1 N=V N!=V Z=0 & N=V Z=1 or N=!V  The possible condition codes are listed below  Note AL is the default and does not need to be specified

16 16 Conditional execution examples if (r0 == 0) { r1 = r1 + 1; } else { r2 = r2 + 1; } C source code CMP r0, #0 BNE else ADD r1, r1, #1 B end else ADD r2, r2, #1 end... ARM instructions unconditional CMP r0, #0 ADDEQ r1, r1, #1 ADDNE r2, r2, #1... conditional Compile all three cases and measure Instruction count Byte count Cycle count

17 17 Data processing Instructions  Largest family of ARM instructions, all sharing the same instruction format.  Contains:  Arithmetic operations  Comparisons (no results ‐ just set condition codes)  Logical operations  Data movement between registers  Remember, this is a load / store architecture  These instruction only work on registers, NOT memory.  They each perform a specific operation on one or two operands.  First operand always a register ‐ Rn  Second operand sent to the ALU via barrel shifter.  We will examine the barrel shifter shortly.

18 18 Arithmetic Operations  Operations are:ADDoperand1 + operand2; Add  ADC operand1 + operand2 + carry; Add with carry  SUB operand1 ‐ operand2; Subtract  SBC operand1 ‐ operand2 + carry ‐ 1 ; Subtract with carry  RSB operand2 ‐ operand1; Reverse subtract  RSC operand2 ‐ operand1 + carry ‐ 1; Reverse subtract with carry  Syntax: { }{S} Rd, Rn, Operand2  Examples  ADD r0, r1, r2  SUBGT r3, r3, #1  RSBLES r4, r5, #5

19 19 Comparisons  The only effect of the comparisons is to update the condition flags. Thus no need to set S bit.  Operations are:  CMP operand1 ‐ operand2; Compare  CMN operand1 + operand2; Compare negative  TST operand1 AND operand2; Test  TEQ operand1 EOR operand2; Test equivalence  Syntax:  { } Rn, Operand2  Examples:  CMPr0, r1  TSTEQr2, #5

20 20 Logical Operations  Operations are:  ANDoperand1 AND operand2  EORoperand1 EOR operand2  ORRoperand1 OR operand2  ORN operand1 NOR operand2  BIC operand1 AND NOT operand2 [ie bit clear]  Syntax:  { }{S} Rd, Rn, Operand2  Examples:  AND r0, r1, r2  BICE Qr2, r3, #7  EORS r1,r3,r0

21 21 Data Movement  Operations are:  MOVoperand2  MVNNOT operand2  Note that these make no use of operand1.  Syntax:  { }{S} Rd, Operand2  Examples:  MOV r0, r1  MOVS r2, #10  MVNEQ r1,#0

22 22 The Barrel Shifter  The ARM doesn’t have actual shift instructions.  Instead it has a barrel shifter which provides a mechanism to carry out shifts as part of other instructions.  So what operations does the barrel shifter support?

23 23 The Barrel Shifter  Barrel Shifter ‐ Left Shift  Shifts left by the specified amount (multiplies by powers of two) e.g.  LSL #5 => multiply by 32

24 24 The Barrel Shifter  Barrel Shifter ‐ Rotations

25 25 Register, optionally with shift operation  Shift value can be either be:  5 bit unsigned integer  Specified in bottom byte of another register.  Used for multiplication by constant Immediate value  8 bit number, with a range of 0- 255.  Rotated right through even number of positions  Allows increased range of 32-bit constants to be loaded directly into registers Result Operand 1 Barrel Shifter Operand 2 ALU Using a Barrel Shifter:The 2nd Operand

26 26 Second Operand : Shifted Register  The amount by which the register is to be shifted is contained in either:  the immediate 5 ‐ bit field in the instruction  NO OVERHEAD  Shift is done for free ‐ executes in single cycle.  the bottom byte of a register (not PC)  Then takes extra cycle to execute  ARM doesn’t have enough read ports to read 3 registers at once.  Then same as on other processors where shift is separate instruction.  If no shift is specified then a default shift is applied: LSL #0  i.e. barrel shifter has no effect on value in register.

27 27 Second Operand: Using a Shifted Register  Using a multiplication instruction to multiply by a constant means first loading the constant into a register and then waiting a number of internal cycles for the instruction to complete.  A more optimum solution can often be found by using some combination of MOVs, ADDs, SUBs and RSBs with shifts.  Multiplications by a constant equal to a ((power of 2) ±1) can be done in one cycle.  MOV R2, R0, LSL #2; Shift R0 left by 2, write to R2, (R2=R0x4)  ADD R9, R5, R5, LSL #3 ; R9 = R5 + R5 x 8 or R9 = R5 x 9  RSB R9, R5, R5, LSL #3 ; R9 = R5 x 8 ‐ R5 or R9 = R5 x 7  SUB R10, R9, R8, LSR #4 ; R10 = R9 ‐ R8 / 16  MOV R12, R4, ROR R3 ; R12 = R4 rotated right by value of R3

28 28 Data Processing Exercise 1. How would you load the two’s complement representation of -1 into Register 3 using one instruction? 2. Implement an ABS (absolute value) function for a registered value using only two instructions. 3. Multiply a number by 35, guaranteeing that it executes in 2 core clock cycles.

29 29 Data Processing Solutions 1. MVNr3, #0 2. MOVSr7,r7; set the flags RSBMIr7,r7,#0; if neg, r7=0-r7 3. ADDr9,r8,r8,LSL #2; r9=r8*5 RSBr10,r9,r9,LSL #3; r10=r9*7

30 30  No ARM instruction can contain a 32 bit immediate constant  All ARM instructions are fixed as 32 bits long  The data processing instruction format has 12 bits available for operand2  4 bit rotate value (0-15) is multiplied by two to give range 0- 30 in steps of 2  Rule to remember is “8-bits rotated right by an even number of bit positions” 07118 immed_8 Shifter ROR rot x2 Quick Quiz: 0xe3a004ff MOV r0, #??? Immediate constants

31 31 Second Operand: Immediate Value (1)  There is no single instruction which will load a 32 bit immediate constant into a register without performing a data load from memory.  All ARM instructions are 32 bits long  The data processing instruction format has 12 bits available for operand2  If used directly this would only give a range of 4096.  Instead it is used to store 8 bit constants, giving a range of 0 ‐ 255.  These 8 bits can then be rotated right through an even number of positions (ie RORs by 0, 2, 4,..30).  This gives a much larger range of constants that can be directly loaded, though some constants will still need to be loaded from memory.

32 32 Second Operand: Immediate Value (2)  This gives us:  0 ‐ 255[0 ‐ 0xff]  256,260,264,..,1020[0x100 ‐ 0x3fc, step 4, 0x40 ‐ 0xff ror30]  1024,1040,1056,..,4080[0x400 ‐ 0xff0, step 16, 0x40 ‐ 0xff ror28]  4096,4160, 4224,..,16320[0x1000 ‐ 0x3fc0, step 64, 0x40 ‐ 0xff ror26]  These can be loaded using, for example:  MOV r0, #0x40, 26; => MOV r0, #0x1000 (ie 4096)  To make this easier, the assembler will convert to this form for us if simply given the required constant:  MOV r0, #4096; => MOV r0, #0x1000 (ie 0x40 ror 26)  The bitwise complements can also be formed using MVN:  MOV r0, #0xFFFFFFFF ; assembles to MVN r0, #0  If the required constant cannot be generated, an error will be reported.

33 33  To allow larger constants to be loaded, the assembler offers a pseudo- instruction:  LDR rd, =const  This will either:  Produce a MOV or MVN instruction to generate the value (if possible). or  Generate a LDR instruction with a PC-relative address to read the constant from a literal pool (Constant data area embedded in the code).  For example  LDR r0,=0xFF => MOV r0,#0xFF  LDR r0,=0x55555555 => LDR r0,[PC,#Imm12] … … DCD 0x55555555  This is the recommended way of loading constants into a register Loading 32 bit constants

34 34 Single register data transfer LDRSTR Word LDRBSTRB Byte LDRHSTRH Halfword LDRSB Signed byte load LDRSH Signed halfword load  Memory system must support all access sizes  Syntax:  LDR { }{ } Rd,  STR { }{ } Rd, e.g. LDREQB

35 35 Address accessed  Address accessed by LDR/STR is specified by a base register with an offset  For word and unsigned byte accesses, offset can be:  An unsigned 12-bit immediate value (i.e. 0 - 4095 bytes) LDR r0, [r1, #8]  A register, optionally shifted by an immediate value LDR r0, [r1, r2] LDR r0, [r1, r2, LSL#2]  This can be either added or subtracted from the base register: LDR r0, [r1, #-8] LDR r0, [r1, -r2, LSL#2]  For halfword and signed halfword / byte, offset can be:  An unsigned 8 bit immediate value (i.e. 0 - 255 bytes)  A register (unshifted)  Choice of pre-indexed or post-indexed addressing  Choice of whether to update the base pointer (pre-indexed only) LDR r0, [r1, #-8]!

36 36 Load/Store Exercise Assume an array of 25 words. A compiler associates y with r1. Assume that the base address for the array is located in r2. Translate this C statement/assignment using just three instructions: array[10] = array[5] + y;

37 37 Load/Store Exercise Solution array[10] = array[5] + y; LDR r3, [r2, #5] ; r3 = array[5] ADD r3, r3, r1 ; r3 = array[5] + y STR r3, [r2, #10] ; array[5] + y = array[10]

38 38 Load and Store Multiples  Syntax:  { } Rb{!},  4 addressing modes:  LDMIA / STMIA increment after  LDMIB / STMIB increment before  LDMDA / STMDA decrement after  LDMDB / STMDB decrement before IA r1 Increasing Address r4 r0 r1 r4 r0 r1 r4 r0r1 r4 r0 r10 IBDADB LDMxx r10, {r0,r1,r4} STMxx r10, {r0,r1,r4} Base Register (Rb)

39 39  There are 2 classes of multiply - producing 32-bit and 64-bit results  32-bit versions on an ARM7TDMI will execute in 2 - 5 cycles  MUL r0, r1, r2; r0 = r1 * r2  MLA r0, r1, r2, r3; r0 = (r1 * r2) + r3  64-bit multiply instructions offer both signed and unsigned versions  For these instruction there are 2 destination registers  [U|S]MULL r4, r5, r2, r3; r5:r4 = r2 * r3  [U|S]MLAL r4, r5, r2, r3; r5:r4 = (r2 * r3) + r5:r4  Most ARM cores do not offer integer divide instructions  Division operations will be performed by C library routines or inline shifts Multiply and Divide

40 40  Branch : B{ } label  Branch with Link : BL{ } subroutine_label  The processor core shifts the offset field left by 2 positions, sign-extends it and adds it to the PC  ± 32 Mbyte range  How to perform longer branches? 2831240 Cond 1 0 1 L Offset Condition field Link bit 0 = Branch 1 = Branch with link 232527 Branch instructions

41 41 Register Usage r8 r9/sb r10/sl r11 r12 r13/sp r14/lr r15/pc r0 r1 r2 r3 r4 r5 r6 r7 Register variables Must be preserved Arguments into function Result(s) from function otherwise corruptible ( Additional parameters passed on stack) Scratch register (corruptible) Stack Pointer Link Register Program Counter The compiler has a set of rules known as a Procedure Call Standard that determine how to pass parameters to a function (see AAPCS) CPSR flags may be corrupted by function call. Assembler code which links with compiled code must follow the AAPCS at external interfaces The AAPCS is part of the new ABI for the ARM Architecture Register - Stack base - Stack limit if software stack checking selected - R14 can be used as a temporary once value stacked - SP should always be 8-byte (2 word) aligned

42 42 039v12 ARM Branches and Subroutines  B  PC relative. ±32 Mbyte range.  BL  Stores return address in LR  Returning implemented by restoring the PC from LR  For non-leaf functions, LR will have to be stacked STMFD sp!,{regs,lr} : BL func2 : LDMFD sp!,{regs,pc} func1func2 : BL func1 : MOV pc, lr

43 43 PSR access  MRS and MSR allow contents of CPSR / SPSR to be transferred to / from a general purpose register or take an immediate value  MSR allows the whole status register, or just parts of it to be updated  Interrupts can be enable/disabled and modes changed, by writing to the CPSR  Typically a read/modify/write strategy should be used: MRS r0,CPSR; read CPSR into r0 BIC r0,r0,#0x80; clear bit 7 to enable IRQ MSR CPSR_c,r0; write modified value to ‘c’ byte only  In User Mode, all bits can be read but only the condition flags (_f) can be modified fsxc 2731 N Z C V Q 2867 I F T mode 1623 15 54024 J 10 8 9 19 GE[3:0] E A IT cond_abc de

44 44 Agenda Introduction to ARM Ltd Fundamentals, Programmer’s Model, and Instructions  Core Family Pipelines AMBA

45 45 Pipeline changes for ARM9TDMI Instruction Fetch Shift + ALU Memory Access Reg Write Reg Read Reg Decode FETCHDECODEEXECUTEMEMORYWRITE ARM9TDMI ARM or Thumb Inst Decode Reg Select Reg Read ShiftALU Reg Write Thumb  ARM decompress ARM decode Instruction Fetch FETCHDECODEEXECUTE ARM7TDMI

46 46 ARM10 vs. ARM11 Pipelines ARM11 Fetch 1 Fetch 2 DecodeIssueShiftALU Saturate Write back MAC 1 MAC 2 MAC 3 Address Data Cache 1 Data Cache 2 Shift + ALU Memory Access Reg Write FETCH DECODE EXECUTE MEMORY WRITE Reg Read Multiply Branch Prediction Instruction Fetch ISSUE ARM or Thumb Instruction Decode Multiply Add ARM10

47 47 Agenda Introduction to ARM Ltd Fundamentals, Programmer’s Model, and Instructions Core Family Pipelines  AMBA

48 48 039v12 Example ARM-based System 16 bit RAM 8 bit ROM 32 bit RAM ARM Core I/O Peripherals Interrupt Controller nFIQnIRQ

49 49 039v12 High Performance ARM processor High-bandwidth on-chip RAM High Bandwidth External Memory Interface DMA Bus Master APB Bridge Timer Keypad UART PIO AHB APB High Performance Pipelined Burst Support Multiple Bus Masters Low Power Non-pipelined Simple Interface An Example AMBA System

50 50 039v12 HWDATA Arbiter Decoder Master #1 Master #3 Master #2 Slave #1 Slave #4 Slave #3 Slave #2 Address/Control Write Data Read Data HADDR HWDATA HRDATA HADDR HRDATA AHB Structure


Download ppt "11 Architecture Revisions 199820002002 2004 time version ARMv5 ARMv6 199419962006 V4 StrongARM ® ARM926EJ-S™ XScale TM ARM102xE ARM1026EJ-S™ ARM9x6E ARM92xT."

Similar presentations


Ads by Google