ARM Politecnico di Torino Dipartimento di Automatica e Informatica M. Sonza Reorda – M. Rebaudengo.

1 ARM Politecnico di Torino Dipartimento di Automatica e Informatica M. Sonza Reorda – M. Rebaudengo

2 M. Sonza Reorda – a.a. 2006/07 2 Outline
• Introduction
• The instruction set
• The ARM architecture
• ARM systems

3 Introduction
The ARM processor was first developed between 1983 and 1985 by Acorn Computers Ltd., based in Cambridge (UK). ARM designers were heavily influenced by the Berkeley RISC I. In 1990, ARM Ltd. was founded by Acorn, Apple and VLSI. Several versions of ARM processors were designed in the following years. Today, ARM cores are widely popular among SoC designers, mainly because they offer a very good trade-off between performance and power consumption.

4 ARM processors
They are mainly sold as cores, to be integrated in Systems on Chip (SoCs). Cores can be
• Hard cores: ARM provides a physical layout, implemented in a given technology
• Soft cores: ARM provides a high-level description that the designer can then synthesize to any technology.
In a few cases, ARM processors have been delivered as stand-alone devices.

5 ARM processors (cont.)
Hard cores are generally more efficient (in terms of area, speed and power), but require significant implementation work to be mapped onto a new technology.

6 ARM processors (cont.)
Soft cores are generally less efficient, but moving to a new technology is easier and can be performed by the designer, which gives ARM customers a higher return on investment.

7 Characteristics
• Very simple design
• Load-store architecture
• Fixed-length 32-bit instructions
• 3-address instruction formats.

8 Programmer's model
Registers usable in user mode: r0 to r12, r13, r14, r15 (PC), CPSR.
Banked registers, available in system modes only:
  fiq mode:        r8_fiq to r12_fiq, r13_fiq, r14_fiq, SPSR_fiq
  svc mode:        r13_svc, r14_svc, SPSR_svc
  abort mode:      r13_abt, r14_abt, SPSR_abt
  irq mode:        r13_irq, r14_irq, SPSR_irq
  undefined mode:  r13_und, r14_und, SPSR_und

9 CPSR
Bit layout: N (bit 31), Z (30), C (29), V (28), unused (27 to 8), I (7), F (6), T (5), mode (4 to 0).
CPSR stands for Current Program Status Register.

10 CPSR
Bit layout: N (bit 31), Z (30), C (29), V (28), unused (27 to 8), I (7), F (6), T (5), mode (4 to 0).
• N, Z, C and V are the condition codes: Negative, Zero, Carry, Overflow
• The mode field shows the processor operation mode
• I, F and T affect some processor features.
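The field positions above can be illustrated with a minimal Python sketch. The function name `decode_cpsr` and the sample value are hypothetical and chosen only for illustration; the bit positions are the ones given on the slide.

```python
def decode_cpsr(cpsr):
    """Split a 32-bit CPSR value into the fields named on the slide."""
    return {
        "N": (cpsr >> 31) & 1,   # Negative
        "Z": (cpsr >> 30) & 1,   # Zero
        "C": (cpsr >> 29) & 1,   # Carry
        "V": (cpsr >> 28) & 1,   # Overflow
        "I": (cpsr >> 7) & 1,    # bit 7
        "F": (cpsr >> 6) & 1,    # bit 6
        "T": (cpsr >> 5) & 1,    # bit 5 (Thumb state)
        "mode": cpsr & 0x1F,     # bottom five bits
    }


# Illustrative value: Z and C set, I and F set, mode field 0b10011.
fields = decode_cpsr(0x600000D3)
```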

11 Memory Organization
Data items may be:
• 8-bit bytes
• 16-bit half-words (aligned on even byte boundaries)
• 32-bit words (aligned on 4-byte boundaries).

12 Load-store architecture
The instruction set only processes values which are in registers (or specified directly within the instruction itself), and places the results of such processing into a register. The only operations which apply to memory state are ones which copy memory values into registers (load instructions) or copy register values into memory (store instructions).

13 The ARM Assembly Language
The ARM instruction set is composed of the following types of instructions:
• Data processing instructions
• Data transfer instructions
• Control flow instructions.

14 Data processing instructions
The following rules apply:
• All operands are 32 bits wide
• They may be either registers or immediates
• The result is always 32 bits wide and is placed in a register
• The two operands and the result are independently specified in the instruction.

15 Examples
    ADD  r0, r1, r2    ; r0 := r1 + r2
    ADC  r0, r1, r2    ; r0 := r1 + r2 + C
    AND  r0, r1, r2    ; r0 := r1 and r2
    MOV  r0, r2        ; r0 := r2
    CMP  r1, r2        ; set cc on r1 - r2
    ADD  r3, r3, #1    ; r3 := r3 + 1

16 Shifted operands
The second source operand of a data processing instruction can be shifted before being used.
Example
    ADD  r3, r2, r1, LSL #3    ; r3 := r2 + 8 × r1

17 Available shift operations
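The figure for this slide is not part of the transcript. The ARM shift operations are LSL, LSR, ASR, ROR and RRX; the Python sketch below models them on 32-bit values, assuming shift amounts in the range 0 to 31. The function names are hypothetical, and `add_lsl` mirrors the ADD example with a shifted second operand.

```python
MASK32 = 0xFFFFFFFF


def lsl(x, n):
    """Logical shift left: zeros enter from the right."""
    return (x << n) & MASK32


def lsr(x, n):
    """Logical shift right: zeros enter from the left."""
    return (x & MASK32) >> n


def asr(x, n):
    """Arithmetic shift right: the sign bit is replicated from the left."""
    x &= MASK32
    result = x >> n
    if x >> 31:
        result |= (MASK32 << (32 - n)) & MASK32
    return result


def ror(x, n):
    """Rotate right: bits leaving on the right re-enter on the left."""
    x &= MASK32
    n %= 32
    return ((x >> n) | (x << (32 - n))) & MASK32


def rrx(x, carry):
    """Rotate right extended: one-bit rotate through the carry flag."""
    return ((carry & 1) << 31) | ((x & MASK32) >> 1)


def add_lsl(r2, r1, shift):
    """Model of ADD r3, r2, r1, LSL #shift."""
    return (r2 + lsl(r1, shift)) & MASK32
```

With a shift of 3, `add_lsl(r2, r1, 3)` computes r2 + 8 × r1, as in the slide example.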

18 Condition codes
Any data processing instruction may (or may not) set the condition codes (N, Z, C and V), according to the programmer's wish.
Example
    ADDS r1, r2, r3    ; sets the cc
    ADD  r1, r2, r3    ; does not set the cc
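As an illustration of what "setting the cc" means for an addition, here is a minimal Python sketch of ADDS on 32-bit values. The function name `adds` is hypothetical; the flag definitions are the usual ones for a 32-bit add.

```python
MASK32 = 0xFFFFFFFF


def adds(a, b):
    """Return (result, flags) for a 32-bit add that updates N, Z, C and V."""
    a &= MASK32
    b &= MASK32
    full = a + b
    result = full & MASK32
    flags = {
        "N": result >> 31,                 # bit 31 of the result
        "Z": int(result == 0),             # result is zero
        "C": int(full > MASK32),           # unsigned carry out of bit 31
        # Signed overflow: both operands differ in sign from the result.
        "V": (((a ^ result) & (b ^ result)) >> 31) & 1,
    }
    return result, flags
```

For instance, adding 1 to 0x7FFFFFFF sets N and V (signed overflow), while adding 1 to 0xFFFFFFFF sets Z and C.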

19 Data transfer instructions
There are three groups of these instructions:
• Single register load and store
• Multiple register load and store
• Single register swap.

20 Addressing modes
Register-indirect addressing
Example
    LDR  r0, [r1]        ; r0 := mem32[r1]
Pre-indexed addressing
Example
    LDR  r0, [r1, #4]    ; r0 := mem32[r1 + 4]

21 Addressing modes (II)
Auto-indexing
Example
    LDR  r0, [r1, #4]!   ; r0 := mem32[r1 + 4]
                         ; r1 := r1 + 4
Post-indexed addressing
Example
    LDR  r0, [r1], #4    ; r0 := mem32[r1]
                         ; r1 := r1 + 4
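The four load forms on these two slides differ only in which address is used and whether the base register is updated. The Python sketch below models this with dictionaries for memory and registers; the function `ldr` and its `mode` strings are hypothetical names used only for illustration.

```python
def ldr(mem, regs, rd, rn, offset=0, mode="offset"):
    """Model of LDR with the addressing modes shown on the slides.

    mode "offset":        LDR rd, [rn, #offset]   (offset 0 = register-indirect)
    mode "pre_writeback": LDR rd, [rn, #offset]!  (auto-indexing)
    mode "post":          LDR rd, [rn], #offset   (post-indexed)
    """
    base = regs[rn]
    # Post-indexed uses the unmodified base as the address.
    address = base if mode == "post" else base + offset
    regs[rd] = mem[address]
    # Auto-indexing and post-indexing both write the new address back.
    if mode in ("pre_writeback", "post"):
        regs[rn] = base + offset


mem = {0x100: 11, 0x104: 22}
regs = {"r0": 0, "r1": 0x100}
ldr(mem, regs, "r0", "r1", 4, "post")   # r0 := mem32[r1]; r1 := r1 + 4
```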

22 Multiple register data transfer
When considerable quantities of data are to be transferred, it is preferable to move several registers at a time.
Example: LoaD Multiple, Increment After
    LDMIA r1, {r0, r2, r5}    ; r0 := mem32[r1]
                              ; r2 := mem32[r1 + 4]
                              ; r5 := mem32[r1 + 8]

23 Multiple register data transfer (cont.)
Effect of the four store-multiple variants, with r9 = 100c₁₆ before the transfer (r9' is the value of r9 after the transfer):
Instruction               r0 stored at   r1 stored at   r5 stored at   r9'
STMIA r9!, {r0,r1,r5}     100c₁₆         1010₁₆         1014₁₆         1018₁₆
STMIB r9!, {r0,r1,r5}     1010₁₆         1014₁₆         1018₁₆         1018₁₆
STMDA r9!, {r0,r1,r5}     1004₁₆         1008₁₆         100c₁₆         1000₁₆
STMDB r9!, {r0,r1,r5}     1000₁₆         1004₁₆         1008₁₆         1000₁₆
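The address rules for the four variants (Increment/Decrement, After/Before) can be sketched in Python as follows. The function `stm` is a hypothetical model, it assumes the register list is given in ascending register order, and it always applies the base write-back (the `!` form).

```python
def stm(mode, base, reg_names, regs, mem):
    """Model of STM<mode> base!, {reg_names}; returns the updated base.

    The lowest-numbered register always goes to the lowest address.
    """
    size = 4 * len(reg_names)
    if mode == "IA":        # increment after
        start, new_base = base, base + size
    elif mode == "IB":      # increment before
        start, new_base = base + 4, base + size
    elif mode == "DA":      # decrement after
        start, new_base = base - size + 4, base - size
    elif mode == "DB":      # decrement before
        start, new_base = base - size, base - size
    else:
        raise ValueError("unknown mode: " + mode)
    for i, name in enumerate(reg_names):
        mem[start + 4 * i] = regs[name]
    return new_base


regs = {"r0": 10, "r1": 11, "r5": 15}
mem = {}
new_r9 = stm("IA", 0x100C, ["r0", "r1", "r5"], regs, mem)
```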

24 Stack addressing
A stack is a form of LIFO store which supports simple dynamic memory allocation. A stack is implemented as a linear data structure which grows up (an ascending stack) or down (a descending stack) as data is added to it and shrinks back as data is removed. A stack pointer holds the address of the current top of the stack, either by pointing to the last valid data item pushed onto the stack (a full stack) or by pointing to the vacant slot where the next data item will be placed (an empty stack).

25 Stack addressing (cont.)
There are 4 variations on a stack:
• full ascending (suffix FA): the stack grows up and the base register points to the highest address containing a valid item
• empty ascending (suffix EA): the stack grows up and the base register points to the first empty location above the stack
• empty descending (suffix ED): the stack grows down and the base register points to the first empty location below the stack
• full descending (suffix FD): the stack grows down and the base register points to the lowest address containing a valid item.

26 Stack addressing (cont.)
Example:
    STMFD r13!, {r2-r9}    ; save regs onto stack
    LDMFD r13!, {r2-r9}    ; restore regs from stack
Note that the same stack model is used for both the store and the load, ensuring that the correct values will be collected.
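The stack suffixes are alternative names for the increment/decrement, after/before forms of the multiple-transfer instructions. The Python sketch below lists the correspondence and models a full descending push and pop; the helper names `push_fd` and `pop_fd` are hypothetical.

```python
# Stack-oriented mnemonic -> equivalent block-transfer mnemonic.
STACK_ALIASES = {
    "STMFD": "STMDB", "LDMFD": "LDMIA",   # full descending
    "STMFA": "STMIB", "LDMFA": "LDMDA",   # full ascending
    "STMED": "STMDA", "LDMED": "LDMIB",   # empty descending
    "STMEA": "STMIA", "LDMEA": "LDMDB",   # empty ascending
}


def push_fd(sp, values, mem):
    """Model of STMFD sp!, {...}: decrement first, then store upwards."""
    sp -= 4 * len(values)
    for i, value in enumerate(values):
        mem[sp + 4 * i] = value
    return sp


def pop_fd(sp, count, mem):
    """Model of LDMFD sp!, {...}: load upwards, then increment."""
    values = [mem[sp + 4 * i] for i in range(count)]
    return sp + 4 * count, values


mem = {}
sp = push_fd(0x2000, [2, 3, 4], mem)    # save three registers
sp_after, restored = pop_fd(sp, 3, mem)  # restore them
```

Because the push and the pop use the same stack model, the values come back in the same register order and the stack pointer returns to its starting value.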

27 Single register swap
The swap instruction allows a value in a register to be exchanged with a value in memory, doing both a load and a store operation in one instruction. The principal use is to implement semaphores to ensure mutual exclusion on accesses to shared data structures in multi-processor systems.
    SWP  Rd, Rm, [Rn]    ; Rd := mem32[Rn]
                         ; mem32[Rn] := Rm
Rd and Rm may be the same register: in this case memory and register are exchanged.
Example
    SWP  r1, r1, [r0]
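A simple lock built on swap can be sketched as follows. This is a sequential Python model only: the atomicity that makes SWP usable for semaphores comes from the hardware and is not reproduced here. The names `swp`, `try_lock`, `unlock` and the encoding 0 = free, 1 = locked are assumptions made for illustration.

```python
FREE, LOCKED = 0, 1


def swp(mem, address, new_value):
    """Model of SWP: return the old memory value and store the new one."""
    old_value = mem[address]
    mem[address] = new_value
    return old_value


def try_lock(mem, address):
    """Write LOCKED and report whether the lock was free beforehand."""
    return swp(mem, address, LOCKED) == FREE


def unlock(mem, address):
    """Release the lock with an ordinary store."""
    mem[address] = FREE


mem = {0x40: FREE}
first = try_lock(mem, 0x40)    # lock was free, so this attempt succeeds
second = try_lock(mem, 0x40)   # lock is already held, so this one fails
```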

28 Control flow instructions
They include
• Branch instructions (unconditional and conditional)
• Branch and link instructions (to activate subroutines).

29 Branch instruction
It performs an unconditional branch.
Example
        B     LABEL
        …
LABEL   …

30 Conditional branches
The branch is taken or not depending on the value of the condition codes.

31 Conditional execution
All ARM instructions can be executed conditionally.
Example
        CMP   r0, #5
        BEQ   BYPASS
        ADD   r1, r1, r0
        SUB   r1, r1, r2
BYPASS  …
is equivalent to
        CMP   r0, #5
        ADDNE r1, r1, r0
        SUBNE r1, r1, r2
        …

32 Conditional execution (cont.)
        ; if ( (a == b) && (c == d) ) e++;
        CMP   r0, r1
        CMPEQ r2, r3
        ADDEQ r4, r4, #1
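The way a condition suffix is evaluated against the flags can be sketched in Python. The table below uses the standard ARM condition mnemonics; the functions `cmp_flags` and `increment_if_both_equal` are hypothetical helpers that model the CMP / CMPEQ / ADDEQ sequence above, with a, b, c, d, e held in r0 to r4.

```python
MASK32 = 0xFFFFFFFF

# Condition suffix -> test on the N, Z, C, V flags.
CONDITIONS = {
    "EQ": lambda f: f["Z"] == 1,
    "NE": lambda f: f["Z"] == 0,
    "CS": lambda f: f["C"] == 1,
    "CC": lambda f: f["C"] == 0,
    "MI": lambda f: f["N"] == 1,
    "PL": lambda f: f["N"] == 0,
    "VS": lambda f: f["V"] == 1,
    "VC": lambda f: f["V"] == 0,
    "HI": lambda f: f["C"] == 1 and f["Z"] == 0,
    "LS": lambda f: f["C"] == 0 or f["Z"] == 1,
    "GE": lambda f: f["N"] == f["V"],
    "LT": lambda f: f["N"] != f["V"],
    "GT": lambda f: f["Z"] == 0 and f["N"] == f["V"],
    "LE": lambda f: f["Z"] == 1 or f["N"] != f["V"],
    "AL": lambda f: True,
}


def cmp_flags(a, b):
    """Flags produced by CMP a, b (a 32-bit subtraction a - b)."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    return {
        "N": result >> 31,
        "Z": int(result == 0),
        "C": int(a >= b),                          # no borrow
        "V": (((a ^ b) & (a ^ result)) >> 31) & 1,  # signed overflow
    }


def increment_if_both_equal(r0, r1, r2, r3, r4):
    flags = cmp_flags(r0, r1)            # CMP   r0, r1
    if CONDITIONS["EQ"](flags):
        flags = cmp_flags(r2, r3)        # CMPEQ r2, r3 (skipped if r0 != r1)
    if CONDITIONS["EQ"](flags):
        r4 = (r4 + 1) & MASK32           # ADDEQ r4, r4, #1
    return r4
```

When the first comparison fails, CMPEQ is skipped and the flags still say "not equal", so ADDEQ is skipped as well; this is what makes the two-instruction test equivalent to the `&&` in the comment.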

33 Branch and link
Supports the call to a subroutine. The address of the following instruction is saved in the link register r14. Therefore, the return operation can be performed by a simple MOV instruction.

34 Branch and link
Example
        BL    SUBR         ; branch to SUBR
        …
SUBR    …
        MOV   pc, r14      ; return: copy r14 into pc
Note that since the return address is held in a register, the subroutine should not call a further, nested, subroutine without first saving r14. However, a subroutine that does not call another subroutine (a leaf subroutine) need not save r14, since it will not be overwritten.

35 Nested calls
When a nested procedure is called, r14 is pushed onto a stack in memory. Since the subroutine will often also require some work registers, the old values in these registers can be saved at the same time using a store multiple instruction.
        BL    SUB1
        …
SUB1    STMFD r13!, {r0-r2, r14}    ; save work regs and link
        BL    SUB2
        …
        LDMFD r13!, {r0-r2, pc}     ; restore work regs and return

36 The ARM architecture
Several ARM processors have been developed and sold.
Core                                     Architecture
ARM1                                     v1
ARM2                                     v2
ARM2aS, ARM3                             v2a
ARM6, ARM600, ARM610                     v3
ARM7, ARM700, ARM710                     v3
ARM7TDMI, ARM710T, ARM720T, ARM740T      v4T
StrongARM, ARM8, ARM810                  v4
ARM9TDMI, ARM920T, ARM940T               v4T
ARM9ES                                   v5TE
ARM10TDMI, ARM1020E                      v5TE

37 3-stage ARM
This architecture was employed up to ARM7. The 3 stages are
• Fetch
• Decode
• Execute.
Some instructions (e.g., those accessing the memory) require more than 3 clock cycles to be executed. Memory is accessed at most once per clock cycle. Branch instructions flush and refill the pipeline.

38 Pipeline behavior

39 Architecture

40 5-stage ARM
The new architecture was adopted starting from ARM9. It uses separate data and code memories (i.e., caches). The 5 stages are
• Fetch
• Decode
• Execute
• Buffer/data
• Write-back.
The higher number of stages allows for a faster clock.

41 The Thumb Instruction Set
Some of the ARM processors (those with a T in the acronym) support the Thumb instruction set (together with the standard ARM instruction set). In the Thumb instruction set
• Instructions are encoded on 16 bits
• Instructions are less powerful
• There are fewer instructions.
As a result, encoding an algorithm in Thumb instructions
• Requires more instructions, but less code memory
• Results in slower execution, but requires less power.
Thumb instructions are therefore used for low-cost, low-performance applications.

42 The T bit
The mechanism to switch to/from Thumb instructions is driven by the T bit in the CPSR:
• If T=1, the processor interprets the fetched code as a sequence of Thumb instructions
• If T=0, the processor interprets the fetched code as a sequence of usual ARM instructions.
The value of T can be changed via software (for example, with the BX, Branch and eXchange, instruction).

43 Thumb implementation
The Thumb instruction set requires some additional logic to translate Thumb instructions into ARM instructions. This operation is performed in the decode stage, without significant effects on performance.

44 Operating modes
The ARM processor may work in several modes:
• The user mode is the usual one
• Privileged modes are used to handle exceptions and supervisor calls.
The current operating mode is defined by the bottom five bits of the CPSR.
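Decoding the mode from the bottom five bits of the CPSR can be sketched as below. The encodings are the standard ARM mode values; the function names `operating_mode` and `is_privileged` are hypothetical.

```python
# CPSR[4:0] -> operating mode.
MODES = {
    0b10000: "user",
    0b10001: "fiq",
    0b10010: "irq",
    0b10011: "svc",
    0b10111: "abort",
    0b11011: "undefined",
    0b11111: "system",
}


def operating_mode(cpsr):
    """Return the mode name for a CPSR value, or 'reserved' if unknown."""
    return MODES.get(cpsr & 0x1F, "reserved")


def is_privileged(cpsr):
    """Every defined mode other than user mode is privileged."""
    mode = operating_mode(cpsr)
    return mode != "user" and mode != "reserved"
```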

45 SPSR
Each privileged mode (except system mode) has associated with it a Saved Program Status Register (SPSR). This register is used to save the state of the CPSR when the privileged mode is entered. In this way the user state can be fully restored when the user process is resumed.

46 Operating modes (II)

47 I/O
Peripherals are accessed as memory-mapped devices.

48 Exceptions
Exceptions include interrupts (from the outside), traps and supervisor calls. They may be categorized in 3 groups:
• Exceptions that are a direct effect of an instruction:
  • Software interrupts
  • Undefined instructions
  • Prefetch abort (i.e., memory fault during fetch)
• Exceptions that are a side-effect of an instruction:
  • Data aborts (i.e., memory fault during a load/store data access)
• Exceptions generated externally:
  • Reset
  • IRQ
  • FIQ.

49 Exception priorities
If multiple exceptions arise at the same time, the following priorities are used:
• Reset (highest priority)
• Data abort
• FIQ
• IRQ
• Prefetch abort
• SWI and undefined instruction.

50 Exceptions management
When an exception is served
• The PC and the CPSR are saved in the r14 and SPSR registers of the exception mode
• The operating mode is changed to the appropriate exception mode
• The PC is forced to a value between 00₁₆ and 1C₁₆, depending on the exception type.
Locations from 00₁₆ to 1C₁₆ are called vector addresses, and usually contain branches to exception handlers.
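The steps above, together with the priority order from the previous slide, can be sketched in Python. The vector addresses and entered modes are the standard ARM ones; the CPU is modeled as a plain dictionary, the mode is tracked as a string rather than by rewriting the CPSR mode bits, and the exact return-address adjustment and interrupt masking are left out. All function and key names are hypothetical.

```python
# Exception -> (vector address, mode entered, priority; 1 is highest).
EXCEPTIONS = {
    "reset":          (0x00, "svc",       1),
    "undefined":      (0x04, "undefined", 6),
    "swi":            (0x08, "svc",       6),
    "prefetch_abort": (0x0C, "abort",     5),
    "data_abort":     (0x10, "abort",     2),
    "irq":            (0x18, "irq",       4),
    "fiq":            (0x1C, "fiq",       3),
}


def highest_priority(pending):
    """Pick the exception to serve first among those pending."""
    return min(pending, key=lambda name: EXCEPTIONS[name][2])


def enter_exception(cpu, name, return_address):
    """Apply the three entry steps listed on the slide to the cpu dict."""
    vector, mode, _ = EXCEPTIONS[name]
    cpu["r14_" + mode] = return_address   # save the PC in the banked r14
    cpu["spsr_" + mode] = cpu["cpsr"]     # save the CPSR in the banked SPSR
    cpu["mode"] = mode                    # switch operating mode
    cpu["pc"] = vector                    # force the PC to the vector


cpu = {"pc": 0x8000, "cpsr": 0x60000010, "mode": "user"}
chosen = highest_priority(["irq", "data_abort", "swi"])
enter_exception(cpu, chosen, return_address=0x8004)
```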

51 ARM system development
In order to support the development of systems based on ARM cores, the following features have been developed:
• A memory interface
• A bus architecture
• A reference peripheral specification
• A debugging mechanism.

52 Memory interface
The memory bus interface signals include:
• A 32-bit address bus
• A 32-bit bidirectional data bus
• Some control signals: mreq, seq, r/w, b/w, wait, etc.

53 Bus architecture
ARM released a standard bus architecture (named AMBA, or Advanced Microcontroller Bus Architecture) to be used by developers of cores that are to be connected to ARM processors. The AMBA specification includes 3 busses:
• The Advanced High-performance Bus (AHB): it is used to connect high-performance modules. It supports burst mode data transfers and split transactions. All timing is referenced to a single clock edge.

54 Bus architecture (II)
• The Advanced System Bus (ASB): it is an old specification, to be replaced by AHB
• The Advanced Peripheral Bus (APB): it offers a simpler interface for low-performance peripherals. APB is generally used as a local secondary bus which appears as a slave module on the AHB.

55 Typical AMBA-based system

56 Bus arbitration
Arbitration is performed in a centralized way, using one pair of signals AREQx/AGNTx for each module connected on the AHB. The policy implemented by the arbiter is not specified by the standard.
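Since the arbitration policy is left open, the following Python sketch picks one possible policy, fixed priority with the lowest module index winning, purely as an assumption for illustration. The function name `arbitrate` and the list-based signal model are hypothetical.

```python
def arbitrate(requests):
    """Centralized arbiter model.

    requests[x] is the AREQx signal of module x; the returned list holds
    the AGNTx signals, with at most one grant asserted. The fixed-priority
    policy (lowest index wins) is an assumption, not part of the standard.
    """
    grants = [False] * len(requests)
    for index, requesting in enumerate(requests):
        if requesting:
            grants[index] = True
            break
    return grants


grants = arbitrate([False, True, True])   # modules 1 and 2 both request
```

A round-robin or other policy could replace the loop without changing the request/grant interface.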

57 AMBA reference peripheral specification
A system developer who wishes to build a system that can more easily support an existing operating system should follow the ARM reference peripheral specification, which defines the following components:
• A memory map
• An interrupt controller
• A counter timer
• A reset controller.

58 Debugging mechanism
Debugging a SoC is particularly difficult, since the developer has no access to internal signals and the code is often stored in a ROM. ARM provides a debug solution based on
• An EmbeddedICE module, which can be programmed to halt the processor when a given instruction is executed
• The JTAG port, used for programming the EmbeddedICE module and accessing internal core elements
• An embedded trace macrocell, which allows tracing the values passing on the busses.

59 Real-time debug system organization

60 ARM CPU cores
In many cases, designers need not just a processor core, but a whole CPU, including caches, Memory Management Units, bus interface, etc. Therefore, ARM delivers not only processor cores, but also CPU cores.
Example
The ARM710T CPU core is based on the ARM7TDMI processor core. It also includes an 8 Kbyte code/data cache, an AMBA bus master unit, a write buffer and an MMU.

61 ARM710T

62 Examples of ARM-based SoCs
ARM is very popular among SoC designers.

63 Ruby II
It is a chip to be used in portable communication devices. It is produced by VLSI Technology, Inc. and delivered in 144- or 176-pin thin quad flat packs.

64 Ruby II architecture

65 Bibliography
Steve Furber, ARM System-on-Chip Architecture, Addison-Wesley, 2000.

