Presentation is loading. Please wait.

Presentation is loading. Please wait.

CENG 336 ARM core 1. 2 ARM Ltd Founded in November 1990 – Spun out of Acorn Computers Designs the ARM range of RISC processor cores Licenses ARM core.

Similar presentations


Presentation on theme: "CENG 336 ARM core 1. 2 ARM Ltd Founded in November 1990 – Spun out of Acorn Computers Designs the ARM range of RISC processor cores Licenses ARM core."— Presentation transcript:

1 CENG 336 ARM core 1

2 2

3 ARM Ltd Founded in November 1990 – Spun out of Acorn Computers Designs the ARM range of RISC processor cores Licenses ARM core designs to semiconductor partners who fabricate and sell to their customers. – ARM does not fabricate silicon itself Also develop technologies to assist with the design-in of the ARM architecture – Software tools, boards, debug hardware, application software, bus architectures, peripherals etc 3

4 Intoduction 4 Leading provider of 32-bit embedded RISC microprocessors, 75% of market High performance Low power consumption Low system cost Solutions for Embedded real-time systems for mass storage, automotive, industrial and networking applications Secure applications - smartcards and SIMs Open platforms running complex operating systems

5 ARM Powered Applications (1)

6 ARM Powered Applications (2) Automotive - ComRoad, empeg, Raytheon, Marine, SENA Consumer Multimedia - Sega, Sharp, Sony, Toshiba, Pace Embedded Control - Conexant, Gemplus, IBM, Olivetti

7 ARM Powered Applications (3) Handheld Computing - Apple, Ericsson, Hewlett Packard, Psion Internet Appliances - Daewoo, Oracle, RCA, Samsung Networking - 3Com, Ericsson, Virata, VLSI Portable Telephony - Ericsson, Hitachi, Nokia, Philips, Qualcomm

8 Why ARM here? ARM is one of the most licensed and thus widespread processor cores in the world Used especially in portable devices due to low power consumption and reasonable performance (MIPS / watt) Several interesting extensions available or in development like – Thumb instruction set and – Jazelle Java machine 8

9 9 ARMv1 First version of ARM processor 26-bit addressing, no multiply / coprocessor ARMv2 ARM2, First commercial chip Included 32-bit result multiply instructions / coprocessor support 2/3

10 10 ARMv2a ARM3 chip with on-chip cache Added load and store cache management ARMv3 ARM6, 32 bit addressing, virtual memory support 3/3

11 ARM Microprocessors Roadmap

12 ARM Microprocessor

13 ARM architecture 32-bit RISC-processor core (32-bit instructions) 37 pieces of 32-bit integer registers (16 available) Pipelined (ARM7: 3 stages) Cached (depending on the implementation) Von Neuman-type bus structure (ARM7), Harvard (ARM9) 8/16/32 -bit data types 7 modes of operation (usr, fiq, irq, svc, abt, sys, und) Simple structure -> reasonably good speed/power consumption ratio 13

14 Processor Core Comparison

15 15

16 ARM 7 Family

17 ARM 7 core ARM7 small, high-performance, low-power, integratable 32-bit RISC processor core. Originally developed for portable communications. ARM7TDMI the companys most widely licensed product. It combines an ARM7 instruction set with the Thumb extension to reduce memory size and system cost, Embedded ICE debug to ease system design and a DSP enhanced multiply extension to improve performance. Typical applications are digital cellular phones and hard disc drives.

18 ARM 9 Family

19 ARM 9E Family

20 ARM 9 core ARM9TDMI 5-stage pipeline integer core with Thumb extension, Debug, and Harvard buses. It provides more than twice the performance of the ARM7TDMI core on the same manufacturing process. Typical applications are networking and set-top- boxes. ARM9E Based on the ARM9TDMI TM core and includes new enhanced signal processing extensions to the ARM instruction set. It has been developed to address the rapidly growing range of applications that require both control and DSP capability. These applications include hard disk drives (HDDs), digital video disks (DVDs), mobile telephony, automotive and industrial control solutions.

21 ARM 10 Family

22 ARM 10 core ARM10TDMI Designed as a higher performance complement to the ARM9TDMI core, offering nearly twice the performance of the ARM9TDMI core. The core features a 6-stage pipeline, Harvard buses, Thumb extension and full debug access to all programmers model states. Target markets include next generation hand-held communications products and digital consumer and multimedia applications.

23 Strong ARM. Very high-performance. General purpose microprocessor. Made by Intel. SA110 - general embedded standard processor. SA1110 - palm sized devices, SA1111 companion chip.

24 PROGRAMMERS MODEL 24

25 Data Sizes and Instruction Sets The ARM is a 32-bit architecture. When used in relation to the ARM: – Byte means 8 bits – Halfword means 16 bits (two bytes) – Word means 32 bits (four bytes) Most ARM’s implement two instruction sets – 32-bit ARM Instruction Set – 16-bit Thumb Instruction Set Jazelle cores can also execute Java bytecode

26 ARM Processor Core 26 Current low-end ARM core for applications like digital mobile phones TDMI T: Thumb, 16-bit instruction set D: on-chip Debug support, enabling the processor to halt in response to a debug request M: enhanced Multiplier, yield a full 64-bit result, high performance I: EmbeddedICE hardware-on chip debug Von Neumann architecture 3-stage pipeline

27 Architecture ( ARM7TDMI ) Address register Address increment Register bank(file) Booth ’ s Multiplier Barrel Shifter ALU Read/write data register Control (decoding) Fetch/prefetch block

28 ARM Core Diagram 28

29 The Registers ARM has 37 registers all of which are 32-bits long. – 1 dedicated program counter – 1 dedicated current program status register – 5 dedicated saved program status registers – 30 general purpose registers The current processor mode governs which of several banks is accessible. Each mode can access – a particular set of r0-r12 registers – a particular r13 (the stack pointer, sp) and r14 (the link register) – the program counter, r15 (pc) – the current program status register, cpsr Privileged modes (except System) can also access – a particular spsr (saved program status register) 29

30 Instructions The ARM processor supports 25 different instructions. These instructions can be grouped into 4 categories: – Data processing instructions (18 instructions) – Memory instructions ( 4 instructions ) – Branch instructions ( 2 instructions ) – Software interrupts ( 1 instruction ) 30

31 Conditional Instructions All ARM instructions are conditional. The four most significant bits of the instruction word are used to indicate one of the following 16 conditions: 31 EQ (EQual) 0000 NE (Not Equal) 0001 CS (Carry Set) 0010 CC (Carry Clear) 0011 MI (MInus) 0100 PL (PLus) 0101 VS (oVerflow Set) 0110 VC (oVerflow Clear) 0111 HI (Higher) 1000 LS (Lower or Same) 1001 GE (Greater or Equal) 1010 LT (Less Than) 1011 GT (Greater Than) 1100 LE (Less than or Equal) 1101 AL (Always) 1110 NV (NeVer) 1111 If no condition is appended to the opcode menmonic, the condition AL (always) is assumed. Appending the condition NV (never) to any instruction effectively turns it into a ‘no operation’ instruction.

32 The Stack The ARM architecture offers extensive support for memory stack by allowing programmers to chose one of four stack format/orientation. Empty or Full: – Empty: Stack Pointer points to the next free space on stack – Full: Stack Pointer points to the last item on the stack Ascending or Descending: – Ascending: Grows from low memory to high memory – Descending: Grows from high memory to low memory I386, Sparc and PowerPC all use a “Full, Descending” stack format. 32

33 Addressing Modes The ARM architecture has a large variety of addressing modes: – Register Indirect Addressing – Pre-Indexed Addressing – Write back – Post-Indexed Addressing – Program Counter Relative Addressing 33

34 Thumb Instruction Set: – Enables Software to be coded using short 16-bit instructions. – Provides typical memory savings of 35%, while retaining the benefits of a 32- bit system. – There is zero overhead for moving between Thumb and normal Arm state. DSP Instruction Set: – Provides up to 70% for audio DSP applications. Jazelle Instruction Set: – Introduces technological infrastructure for running Java code (faster than software-based Java Virtual Machine) – Java execution is accelerated by 8x and power consumption is reduced by 80%. SIMD Extensions: – Increase the processing capability of ARM solutions in high-performance applications – Lowers power consumption required by portable, battery- powered devices 34

35 When the processor is executing in ARM state: – All instructions are 32 bits wide – All instructions must be word aligned When the processor is executing in Thumb state: – All instructions are 16 bits wide – All instructions must be halfword aligned When the processor is executing in Jazelle state: – All instructions are 8 bits wide – Processor performs a word access to read 4 instructions at once DIFFERENT STATES 35

36 Thumb is a 16-bit instruction set – Optimised for code density from C code (~65% of ARM code size) – Improved performance from narrow memory – Subset of the functionality of the ARM instruction set Core has additional execution state - Thumb – Switch between ARM and Thumb using BX instruction 01515 310 ADDS r2,r2,#1 ADD r2,#1 32-bit ARM Instruction 16-bit Thumb Instruction For most instructions generated by compiler: Conditional execution is not used Source and destination registers identical Only Low registers used Constants are of limited size Inline barrel shifter not used Thumb 36

37 ARM Interface Signals (1/4) 37

38 ARM Interface Signals (2/4) Clock control – All state change within the processor are controlled by mclk, the memory clock – Internal clock = mclk AND \wait – eclk clock output reflects the clock used by the core Memory interface – 32-bit address A[31:0], bidirectional data bus D[31:0], separate data out Dout[31:0], data in Din[31:0] – seq indicates that the memory address will be sequential to that used in the previous cycle 38

39 ARM Interface Signals (3/4) Interrupt – \fiq, fast interrupt request, higher priority – \irq, normal interrupt request – isync, allow the interrupt synchronizer to be passed Initialization – \reset, starts the processor from a known state, executing from address 00000000 16 ARM characteristics 39

40 Memory Access The ARM is a Von Neumann, load/store architecture, i.e., – Only 32 bit data bus for both inst. And data. – Only the load/store inst. (and SWP) access memory. Memory is addressed as a 32 bit address space Data type can be 8 bit bytes, 16 bit half- words or 32 bit words, and may be seen as a byte line folded into 4-byte words 40

41 Processor Core Vs CPU Core 41  Processor Core –The engine that fetches instructions and execute them –E.g.: ARM7TDMI, ARM9TDMI, ARM9E-S  CPU Core –Consists of the ARM processor core and some tightly coupled function blocks –Cache and memory management blocks –E.g.: ARM710T, ARM720T, ARM74T, ARM920T, ARM922T, ARM940T, ARM946E-S, and ARM966E-S ARM710T

42 Processor Modes The ARM has seven basic operating modes: – User : unprivileged mode under which most tasks run (normal program execution mode) – FIQ : entered when a high priority (fast) interrupt is raised (data transfer state) – IRQ : entered when a low priority (normal) interrupt is raised (Used for general interrupt services) – Supervisor : entered on reset and when a Software Interrupt (instruction is executed Protected mode for operating system support ) – Abort : used to handle memory access violations – Undef : used to handle undefined instructions – System : privileged mode using the same registers as user mode (Operating system ‘privilege’-mode for user)

43 r0 r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r13 (sp) r14 (lr) r15 (pc) cpsr r13 (sp) r14 (lr) spsr r13 (sp) r14 (lr) spsr r13 (sp) r14 (lr) spsr r13 (sp) r14 (lr) spsr r8 r9 r10 r11 r12 r13 (sp) r14 (lr) spsr FIQIRQSVCUndefAbort User Mode r0 r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r13 (sp) r14 (lr) r15 (pc) cpsr r13 (sp) r14 (lr) spsr r13 (sp) r14 (lr) spsr r13 (sp) r14 (lr) spsr r13 (sp) r14 (lr) spsr r8 r9 r10 r11 r12 r13 (sp) r14 (lr) spsr Current Visible Registers Banked out Registers FIQIRQSVCUndefAbort r0 r1 r2 r3 r4 r5 r6 r7 r15 (pc) cpsr r13 (sp) r14 (lr) spsr r13 (sp) r14 (lr) spsr r13 (sp) r14 (lr) spsr r13 (sp) r14 (lr) spsr r8 r9 r10 r11 r12 r13 (sp) r14 (lr) spsr Current Visible Registers Banked out Registers UserIRQSVCUndefAbort r8 r9 r10 r11 r12 r13 (sp) r14 (lr) FIQ ModeIRQ Mode r0 r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r15 (pc) cpsr r13 (sp) r14 (lr) spsr r13 (sp) r14 (lr) spsr r13 (sp) r14 (lr) spsr r13 (sp) r14 (lr) spsr r8 r9 r10 r11 r12 r13 (sp) r14 (lr) spsr Current Visible Registers Banked out Registers UserFIQSVCUndefAbort r13 (sp) r14 (lr) Undef Mode r0 r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r15 (pc) cpsr r13 (sp) r14 (lr) spsr r13 (sp) r14 (lr) spsr r13 (sp) r14 (lr) spsr r13 (sp) r14 (lr) spsr r8 r9 r10 r11 r12 r13 (sp) r14 (lr) spsr Current Visible Registers Banked out Registers UserFIQIRQSVCAbort r13 (sp) r14 (lr) SVC Mode r0 r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r15 (pc) cpsr r13 (sp) r14 (lr) spsr r13 (sp) r14 (lr) spsr r13 (sp) r14 (lr) spsr r13 (sp) r14 (lr) spsr r8 r9 r10 r11 r12 r13 (sp) r14 (lr) spsr Current Visible Registers Banked out Registers UserFIQIRQUndefAbort r13 (sp) r14 (lr) Abort Mode r0 r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r15 (pc) cpsr r13 (sp) r14 (lr) spsr r13 (sp) r14 (lr) spsr r13 (sp) r14 (lr) spsr r13 (sp) r14 (lr) spsr r8 r9 r10 r11 r12 r13 (sp) r14 (lr) spsr Current Visible Registers Banked out Registers UserFIQIRQSVCUndef r13 (sp) r14 (lr) The ARM Register Set

44 Register Organization Summary User mode r0-r7, r15, and cpsr r8 r9 r10 r11 r12 r13 (sp) r14 (lr) spsr FIQ r8 r9 r10 r11 r12 r13 (sp) r14 (lr) r15 (pc) cpsr r0 r1 r2 r3 r4 r5 r6 r7 User r13 (sp) r14 (lr) spsr IRQ User mode r0-r12, r15, and cpsr r13 (sp) r14 (lr) spsr Undef User mode r0-r12, r15, and cpsr r13 (sp) r14 (lr) spsr SVC User mode r0-r12, r15, and cpsr r13 (sp) r14 (lr) spsr Abort User mode r0-r12, r15, and cpsr Thumb state Low registers Thumb state High registers Note: System mode uses the User mode register set

45 Program Status Registers Condition code flags – N = Negative result from ALU – Z = Zero result from ALU – C = ALU operation Carried out – V = ALU operation oVerflowed Sticky Overflow flag - Q flag – Architecture 5TE/J only – Indicates if saturation has occurred J bit – Architecture 5TEJ only – J = 1: Processor in Jazelle state Interrupt Disable bits. – I = 1: Disables the IRQ. – F = 1: Disables the FIQ. T Bit – Architecture xT only – T = 0: Processor in ARM state – T = 1: Processor in Thumb state Mode bits – Specify the processor mode 2731 N Z C V Q 2867 I F T mode 1623 815 54024 fsxc U n d e f i n e dJ

46 When the processor is executing in ARM state: – All instructions are 32 bits wide – All instructions must be word aligned – Therefore the pc value is stored in bits [31:2] with bits [1:0] undefined (as instruction cannot be halfword or byte aligned). When the processor is executing in Thumb state: – All instructions are 16 bits wide – All instructions must be halfword aligned – Therefore the pc value is stored in bits [31:1] with bit [0] undefined (as instruction cannot be byte aligned). When the processor is executing in Jazelle state: – All instructions are 8 bits wide – Processor performs a word access to read 4 instructions at once Program Counter (r15)

47 Vector Table Exception Handling When an exception occurs, the ARM: – Copies CPSR into SPSR_ – Sets appropriate CPSR bits Change to ARM state Change to exception mode Disable interrupts (if appropriate) – Stores the return address in LR_ – Sets PC to vector address To return, exception handler needs to: – Restore CPSR from SPSR_ – Restore PC from LR_ This can only be done in ARM state. Vector table can be at 0xFFFF0000 on ARM720T and on ARM9/10 family devices FIQ IRQ (Reserved) Data Abort Prefetch Abort Software Interrupt Undefined Instruction Reset 0x1C 0x18 0x14 0x10 0x0C 0x08 0x04 0x00

48 Pipelining 48 ARM uses a 3-stage instruction pipeline Fetch: fetch instruction code from memory into the instruction pipeline Decode: instruction decoded to obtain control signals for the datapath ready for the next stage Execute: instruction “owns” the datapath - register read; shifting; ALU results generated and write-back

49 Pipelining 49 At any time, 3 different instructions may occupy each of the the 3- stages of pipeline It may take three cycles to complete a single-cycle instruction. This is said to have a three cycle latency Once a pipeline fills, the processor completes a single-cycle instruction every clock cycle. Therefore the throughput is one instruction per cycle.

50 Development of the ARM Architecture


Download ppt "CENG 336 ARM core 1. 2 ARM Ltd Founded in November 1990 – Spun out of Acorn Computers Designs the ARM range of RISC processor cores Licenses ARM core."

Similar presentations


Ads by Google