Andes Embedded Processors

Andes Embedded Processors ANDES Confidential

[Figure: system bus block diagram. AHB bus masters: CPU core, MAC, LCD controller, DMA controller. An APB bridge connects to the APB bus and its slaves. Side band: customer design, NCORE, INCTRL. Annotations: "All devices on AHB are slaves except N1213"; "SRAM/SDRAM share the I/O pins for address and data".]
Notes: The basic bus structure is AMBA v2.0. The AHB bus serves devices that demand high speed and high bandwidth. The AHB masters are the N1213, MAC, LCD controller, and DMA controller.

N903: Low-power Cost-efficient Embedded Controller
Features:
- Harvard architecture, 5-stage pipeline
- 16 general-purpose registers
- Static branch prediction
- Fast MAC
- Hardware divider
- Fully clock-gated pipeline
- 2-level nested interrupt
- External instruction/data local memory interface
- Instruction/data cache
- APB/AHB/AHB-Lite/AMI bus interface
- Power management instructions
- 45K ~ 110K gate count
- 250MHz @ 130nm
Applications: MCU, storage, automotive control, toys
[Figure: N903 block diagram with the N9 uCore, instruction and data caches, instruction and data LM interfaces, JTAG/EDM, and the external bus interface (APB/AHB/AHB-Lite/AMI).]

N903 Competition
Core's features, listed as N903 | ARM7TDMI | Cortex-M3 (rows with fewer values appear as in the source slide):
- Architecture: Harvard | Von Neumann
- Pipeline stages: 5 | 3
- Instruction set: 16-/32-bit mixable | Thumb/ARM
- General-purpose registers: 16
- Branch prediction: Static | None
- Interrupt latency (cycles): 10 | 24-42 | 12
- Data endian support: Big and Little
- Bus: APB/AHB/AMI | 1 AHB | 3 AHB Lite
- Sleep mode: Yes | No
- Vectored interrupt support: Yes (internal/external) | Yes (external)
- DMIPS/MHz: 1.38 | 0.95 | 1.25
- Core area (mm2, TSMC 0.13G): *0.42 | 0.26 | 0.43/0.21
- Core power (mW/MHz, TSMC 0.13G): *0.06 | 0.06 | 0.165/0.084
- Max frequency (MHz, TSMC 0.13G): *204 | 133 | 135/50
- DMIPS (TSMC 0.13G): 281.52 | 126.35 | 168.75/62.5
- Cost performance (DMIPS/mm2): 670.3 | 486 | 393/298
* TSMC free library with max speed synthesis constraint
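The derived rows of the comparison can be reproduced from the measured ones. The short sketch below recomputes the N903 figures, assuming DMIPS is DMIPS/MHz times the maximum frequency and cost performance is DMIPS divided by core area; the function names are illustrative only.

```python
# Figures copied from the N903 column of the comparison table.
N903_DMIPS_PER_MHZ = 1.38
N903_MAX_FREQ_MHZ = 204
N903_CORE_AREA_MM2 = 0.42


def peak_dmips(dmips_per_mhz, max_freq_mhz):
    """Peak DMIPS at the maximum clock frequency."""
    return dmips_per_mhz * max_freq_mhz


def cost_performance(dmips, core_area_mm2):
    """DMIPS delivered per square millimetre of core area."""
    return dmips / core_area_mm2


n903_dmips = peak_dmips(N903_DMIPS_PER_MHZ, N903_MAX_FREQ_MHZ)
n903_cost_perf = cost_performance(n903_dmips, N903_CORE_AREA_MM2)
print(round(n903_dmips, 2), round(n903_cost_perf, 1))
```

The results agree with the 281.52 DMIPS and 670.3 DMIPS/mm2 entries in the table.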

N1033A: Low-power Cost-efficient Application Processor
Features:
- Harvard architecture, 5-stage pipeline
- 32 general-purpose registers
- Dynamic branch prediction
- Fast MAC
- Hardware divider
- Audio acceleration instructions
- Fully clock-gated pipeline
- 3-level nested interrupt
- Instruction/data local memory
- Instruction/data cache
- DMA support for 1-D and 2-D transfer
- AHB/AHB-Lite/APB bus
- MMU/MPU
- Power management instructions
Applications: portable audio/media player, DVB/DMB baseband, DVD, DSC, toys, games

N1033A Competition
Core's features, listed as N1033A | ARM926EJ (rows with one value appear as in the source slide):
- Pipeline stages: 5
- Instruction set: 16-/32-bit mixable | 16 or 32
- General-purpose registers: 32 | 16
- Dynamic branch prediction: 32/64-entry BTB | No
- DMA support: 1D and 2D
- Cache tag index: Physical tag (for 20% faster context switching) | Virtual tag
- Vectored interrupt support: Yes (64 addresses)
- Nested interruption level: 3
- Bus: AHB/2AHB/AHB-Lite/APB | 2 AHB
- Audio DSP instructions: > 40 dedicated | Few general DSP
- Max frequency (MHz, TSMC 0.13G): *280 | 276/238
- Performance (DMIPS/MHz): 1.6 | 1.1
- Core power (mW/MHz, TSMC 0.13G): *0.12 | 0.36
- Core area (mm2, TSMC 0.13G): *1.4 | 1.61/1.45
- DMIPS (TSMC 0.13G): 448 | 303.6
- Cost performance (DMIPS/mm2): 320 | 189
* TSMC free library with max speed synthesis constraint

N1213 – High Performance Application Processor
Features:
- Harvard architecture, 8-stage pipeline
- 32 general-purpose registers
- Dynamic branch prediction
- Multiply-add and multiply-subtract instructions
- Divide instructions
- Instruction/data local memory
- Instruction/data cache
- MMU
- AHB or HSMP (AXI-like) bus
- Power management instructions
Applications: portable media player, MFP, networking, gateway/router, home entertainment, smartphone/mobile phone
[Figure: N1213 block diagram with the N12 execution core, instruction and data LM, instruction and data caches, MMU with ITLB and DTLB, DMA, JTAG/EDM, EPT I/F, HSMP, and the external bus interface (AHB).]
Note: the N1213U130 hardcore does not support the EPT I/F and HSMP.

N1213 Competition
Core's features, listed as N1213 | ARM1176 | MIPS 24K (rows with fewer values appear as in the source slide):
- Instruction set: 16-/32-bit mixable | 16 or 32
- General-purpose registers: 32 | 16
- Page table support for MMU: HW and SW | HW only | SW only
- Interrupt stack level: 3 | 2 | 1
- Unaligned memory access: ld/st multiple | mode bit | ld/st left/right
- Uncached read burst: use ld multiple | none
- DMA support: 1D/2D | 1D | No
- Core die size (mm2, TSMC 90G): *1.38 | 1.95/1.00 | *1.44
- Frequency (MHz, TSMC 90G): *580 | 620/320 | *520
- Core power (mW/MHz, TSMC 90G): *0.27 | 0.37/0.18 | *0.40
- Performance (DMIPS/MHz): 1.37 | 1.22 | 1.55
- DMIPS (TSMC 90G): *795 | 756/390 | *748
- Cost performance (DMIPS/mm2): 576.1 | 387.7 | 519.4
* TSMC free library with max speed synthesis constraint

N1213-S Block diagram

TBD - Core Competence Positioning

AndesCore™ N1213-S
CPU core:
- 32-bit CPU
- 8-stage pipeline
- AndeStar™ ISA with 16-/32-bit intermixable instructions to reduce code size
- Dynamic branch prediction to reduce branch penalties
  - 32/64/128/256-entry BTB
Configurability for customers:
- Configuration options for power, performance, and area requirements

AndesCore™ N1213-S (cont.)
MMU:
- Fully-associative iTLB/dTLB: 4 or 8 entries
- 4-way set-associative main TLB: 32/64/128 entries
- Locking support for TLB
I & D cache:
- Virtual index and physical tag (for faster context switching)
- Cache size: 8KB/16KB/32KB/64KB
- Cache line size: 16B/32B
- 2/4-way set associative
- I cache locking support

AndesCore™ N1213-S (cont.)
I & D local memory:
- Wide range support for internal/external local memory: 4KB~1024KB
- Fixed access latencies for internal local memory
- Double buffer mode for D local memory
- Optional external local memory interface
Bus:
- Synchronous/asynchronous AHB, 1 or 2 port configuration
- Synchronous HSMP (AXI-like)

AndesCore™ N1213-S (cont.)
For performance:
- Improved memory accesses: 1D/2D DMA, load/store multiple
- Efficient synchronization without locking the whole bus
  - Load lock, store conditional instructions
- Vectored interrupt to improve real-time performance
  - 6 interrupt signals
- MMU
  - Optional HW page table walker
  - TLB management instructions
For flexibility:
- Memory-mapped IO space
- JTAG-based debug support
- Optional embedded program trace interface
- Performance monitors for performance tuning
- Bi-endian modes to support flexible data input

AndesCore™ N1213-S (cont.)
For power management:
- Clock-gated
- Low-power mode support instructions
- Redundant memory access reduction
- Support for many CPU/bus frequency ratios

Computer architecture taxonomy: von Neumann architecture

Computer architecture taxonomy: von Neumann architecture
Features:
- Execution in multiple cycles
- Serial fetch of instructions and data
- Single memory structure
- Data and program can get mixed
- Data and instructions are the same size
Examples: PCs (Intel 80x86/Pentium), Motorola 68000, Motorola 68xx uC families

Computer architecture taxonomy (cont.): Harvard architecture
[Figure: Harvard architecture. The CPU has separate address and data connections to the data memory and to the program memory, which is addressed by the PC.]

Computer architecture taxonomy (cont.): Harvard architecture
Features:
- Execution in 1 cycle
- Parallel fetch of instructions and data
- More complex hardware
- Instructions and data always separate
- Different code/data path widths (e.g. 14-bit instructions, 8-bit data)
Examples: 8051, Microchip PIC families, Atmel AVR, AndesCore

Architectures: CISC vs. RISC
CISC - Complex Instruction Set Computers:
- Emphasis on hardware
- Includes multi-clock complex instructions
- Memory-to-memory
- Sophisticated arithmetic (multiply, divide, trigonometry, etc.)
- Special instructions are added to optimize performance with particular compilers

Architectures: CISC vs. RISC (cont.)
RISC - Reduced Instruction Set Computers:
- A very small set of primitive instructions
- Fixed instruction format
- Emphasis on software
- All instructions execute in one cycle (fast!)
- Register to register (except load/store instructions)
- Pipeline architecture

Pipeline Overview

AndesCore 8-stage pipeline

Instruction Fetch Stage
F1 – Instruction Fetch First
- Instruction tag/data arrays
- ITLB address translation
- Branch target buffer prediction
F2 – Instruction Fetch Second
- Instruction cache hit detection
- Cache way selection
- Instruction alignment
[Figure: pipeline diagram with stages IF1, IF2, ID, RF, AG, DA1, DA2, WB, EX, MAC1, MAC2.]

Instruction Issue Stage
I1 – Instruction Issue First / Instruction Decode
- 32/16-bit instruction decode
- Return address stack prediction
I2 – Instruction Issue Second / Register File Access
- Instruction issue logic
- Register file access
[Figure: pipeline diagram with stages IF1, IF2, ID, RF, AG, DA1, DA2, WB, EX, MAC1, MAC2.]

Execution Stage
E1 – Instruction Execute First / Address Generation / MAC First
- Data access address generation
- Multiply operation (if the MAC is present)
E2 – Instruction Execute Second / Data Access First / MAC Second / ALU Execute
- ALU
- Branch/jump/return resolution
- Data tag/data arrays
- DTLB address translation
- Accumulation operation (if the MAC is present)
E3 – Instruction Execute Third / Data Access Second
- Data cache hit detection
- Cache way selection
- Data alignment
[Figure: pipeline diagram with stages IF1, IF2, ID, RF, AG, DA1, DA2, WB, EX, MAC1, MAC2.]

Write Back Stage
E4 – Instruction Execute Fourth / Write Back
- Interruption resolution
- Instruction retire
- Register file write back
[Figure: pipeline diagram with stages IF1, IF2, ID, RF, AG, DA1, DA2, WB, EX, MAC1, MAC2.]

Branch Prediction Overview
Why is branch prediction required?
- A deep pipeline is required for high speed.
- Increasing the number of stages between fetch and branch resolution increases the taken-branch penalty.
- Prediction allows the penalty to be avoided in the majority of cases.
Why dynamic branch prediction?
- Static branch prediction requires knowledge of the type of branch and the target address before a prediction can be made.
- This information is not available before the decode stage, so static prediction would still increase the penalty for all branches.
- Dynamic branch prediction is performed at the instruction fetch stage based purely on fetch addresses; no knowledge of the incoming instructions is required.

Branch Prediction Unit
Branch Target Buffer (BTB):
- 128 entries of 2-bit saturating counters
  - Strongly-taken, Weakly-taken, Weakly-not-taken, Strongly-not-taken
- 128 entries, each with a 32-bit predicted PC and a 26-bit address tag
- Call-return and alignment flags
Return Address Stack (RAS):
- Four entries
The BTB and RAS are updated by committing branches/jumps.
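The four counter states listed for the BTB can be modelled in a few lines. This is a minimal sketch of a generic 2-bit saturating counter, not the N12 implementation; the class name, the state encoding (0 to 3), and the initial state are assumptions made for illustration.

```python
class TwoBitCounter:
    """Generic 2-bit saturating counter (encoding 0..3 is an assumption)."""

    STATES = ("strongly-not-taken", "weakly-not-taken",
              "weakly-taken", "strongly-taken")

    def __init__(self, state=1):
        self.state = state  # hypothetical reset value: weakly-not-taken

    def predict_taken(self):
        # The upper two states predict "taken".
        return self.state >= 2

    def update(self, taken):
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(self.state + 1, 3)
        else:
            self.state = max(self.state - 1, 0)


def run_predictor(outcomes, counter=None):
    """Return (number of correct predictions, final counter state)."""
    if counter is None:
        counter = TwoBitCounter()
    correct = 0
    for taken in outcomes:
        if counter.predict_taken() == taken:
            correct += 1
        counter.update(taken)
    return correct, counter.state


# A loop branch that is taken four times and then falls through.
result = run_predictor([True, True, True, True, False])
print(result, TwoBitCounter.STATES[result[1]])
```

Because the counter saturates, the single not-taken outcome at loop exit only moves it to weakly-taken, so the next execution of the loop is still predicted as taken.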

BTB Instruction Prediction
BTB predictions are based on the previous PC rather than on actual instruction decoding information, so the BTB may make the following two mistakes:
- It wrongly predicts a non-branch/jump instruction as a branch/jump instruction.
- It wrongly predicts the instruction boundary (32-bit -> 16-bit).
If either case is detected, the IFU triggers a BTB instruction misprediction in the I1 stage and restarts the program sequence from the recovered PC. This introduces a 2-cycle penalty.

RAS Prediction
When return instructions are present in the instruction sequence, RAS predictions are performed and the fetch sequence is changed to the predicted PC. Since the RAS prediction is performed in the I1 stage, return instructions incur a 2-cycle penalty because the sequential fetches in between are not used.

Branch Misprediction
In the N12 processor core, branch/return instructions are resolved by the ALU in the E2 stage, and the result is used by the IFU in the next (F1) stage. In this case, the misprediction penalty is 5 cycles.

Cache

Cache and CPU
[Figure: the CPU exchanges addresses and data with a cache controller, which sits between the cache and main memory.]

Multiple levels of cache
[Figure: CPU, L1 cache, L2 cache.]

Cache data flow
[Figure: data flow between the CPU, the I-cache, the D-cache, and external memory. Labelled paths: instruction fetches, I-cache refill, uncached instruction/data, load and store, D-cache refill, write back, uncached write/write-through.]

Cache operation
- Many main memory locations are mapped onto one cache entry.
- May have caches for: instructions; data; data + instructions (unified).

Replacement policy
Replacement policy: the strategy for choosing which cache entry to throw out to make room for a new memory location.
Two popular strategies:
- Random
- Least-recently used (LRU)
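The LRU strategy can be illustrated with a small model of one cache set. This is a sketch of true LRU only; the class and method names are invented for the example.

```python
from collections import OrderedDict


class LruSet:
    """One cache set with true LRU replacement (illustrative model)."""

    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()  # tag -> None, least recently used first

    def access(self, tag):
        """Access a tag and return (hit, evicted_tag)."""
        if tag in self.lines:
            self.lines.move_to_end(tag)  # mark as most recently used
            return True, None
        evicted = None
        if len(self.lines) == self.ways:
            # Set is full: throw out the least recently used line.
            evicted, _ = self.lines.popitem(last=False)
        self.lines[tag] = None
        return False, evicted


two_way = LruSet(ways=2)
trace = [two_way.access(tag) for tag in ["A", "B", "A", "C", "B"]]
print(trace)
```

In the trace, the access to C evicts B rather than A because A was touched more recently.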

Write operations
- Write-through: immediately copy the write to main memory.
- Write-back: write to main memory only when the location is removed from the cache.

Improving Cache Performance
Goal: reduce the Average Memory Access Time (AMAT)
  AMAT = Hit Time + Miss Rate * Miss Penalty
Approaches:
- Reduce Hit Time
- Reduce Miss Penalty
- Reduce Miss Rate
Notes:
- There may be conflicting goals.
- Keep track of clock cycle time, area, and power consumption.
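The AMAT formula is easy to experiment with. The sketch below evaluates it for made-up numbers (a 1-cycle hit, a 5% miss rate, and a 20-cycle miss penalty are assumptions, not Andes figures) and shows the effect of halving the miss rate.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time: hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty


# Hypothetical cache parameters, in clock cycles.
baseline = amat(hit_time=1, miss_rate=0.05, miss_penalty=20)
lower_miss_rate = amat(hit_time=1, miss_rate=0.025, miss_penalty=20)
print(baseline, lower_miss_rate)
```

With these numbers, halving the miss rate removes half of the miss contribution but leaves the hit time untouched, which is why the three approaches are usually weighed against each other.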

Tuning Cache Parameters
Size:
- Must be large enough to fit the working set (temporal locality)
- If too big, then hit time degrades
Associativity:
- Needs to be large to avoid conflicts, but 4-8 way is as good as fully associative (FA)
Block:
- Needs to be large to exploit spatial locality and reduce tag overhead
- If too large, few blocks ⇒ higher misses and miss penalty
A configurable architecture allows designers to make the best performance/cost trade-offs.

Cache configuration
- Cache lines per way: 128/256/512/1024
- Cache ways
- Cache line size: 16B/32B
- Cache size combinations: 8KB/16KB/32KB/64KB
- Replacement policy:
  - Pseudo LRU (default): 3 bits per cache line
  - Random: 2 bits per cache line
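The configuration parameters are related by simple arithmetic: total size is lines per way times ways times line size, and the line count and line size fix how an address splits into tag, index, and offset. The sketch below illustrates this with hypothetical helper names; it assumes power-of-two parameters and does not model the virtual-index/physical-tag distinction.

```python
def cache_size_bytes(lines_per_way, ways, line_size):
    """Total cache capacity in bytes."""
    return lines_per_way * ways * line_size


def split_address(addr, lines_per_way, line_size):
    """Split an address into (tag, index, offset) for one way.

    Assumes lines_per_way and line_size are powers of two.
    """
    offset_bits = line_size.bit_length() - 1
    index_bits = lines_per_way.bit_length() - 1
    offset = addr & (line_size - 1)
    index = (addr >> offset_bits) & (lines_per_way - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


# 128 lines per way, 2 ways, 32-byte lines gives an 8KB cache.
size = cache_size_bytes(lines_per_way=128, ways=2, line_size=32)
fields = split_address(0x12345678, lines_per_way=128, line_size=32)
print(size, [hex(f) for f in fields])
```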

Cache control: CCTL instruction
I cache control:
- Fill and lock
- Unlock
- Invalidate
- Read/write tag
- Read/write word data
D cache control:
- Fill and lock
- Unlock
- Invalidate
- Read/write tag
- Read/write word data
- Write back

Memory Management Units (MMU)

MMU Functionality
The memory management unit (MMU) translates addresses.
[Figure: the CPU issues a logical address; the memory management unit translates it into a physical address.]

MMU Architecture
[Figure: the IFU with a 4/8-entry I-uTLB and the LSU with a 4/8-entry D-uTLB, connected through the M-TLB arbiter to the 32x4 M-TLB (tag and data), the HPTWK, and the bus interface unit.]
M-TLB: N (=32) sets x K (=4) ways = 128 entries.
M-TLB entry index: way number in bits [log2(N*K)-1 : log2(N)] (bits 6:5), set number in bits [log2(N)-1 : 0] (bits 4:0).
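The M-TLB entry index layout on this slide (set number in the low log2(N) bits, way number above it) can be sketched as a pack/unpack pair. The function names are hypothetical, and the code assumes N is a power of two.

```python
def mtlb_entry_index(set_number, way_number, n_sets=32, k_ways=4):
    """Pack a set and way number into an M-TLB entry index."""
    if not (0 <= set_number < n_sets and 0 <= way_number < k_ways):
        raise ValueError("set or way number out of range")
    set_bits = n_sets.bit_length() - 1  # log2(N), assuming N is a power of two
    return (way_number << set_bits) | set_number


def split_mtlb_entry_index(index, n_sets=32):
    """Unpack an M-TLB entry index into (set_number, way_number)."""
    set_bits = n_sets.bit_length() - 1
    return index & (n_sets - 1), index >> set_bits


index = mtlb_entry_index(set_number=5, way_number=3)
print(index, split_mtlb_entry_index(index))
```

For the 32-set, 4-way configuration the index is 7 bits wide, so the largest value is 127, matching the 128-entry total.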

Hardware page table walker (HPTW)

MMU Functionality
- Virtual memory addressing
  - Better memory allocation, less fragmentation
  - Allows shared memory
  - Dynamic loading
- Memory protection (read/write/execute)
  - Different permission flags for kernel/user mode
  - The OS typically runs in kernel mode
  - Applications run in user mode
- Cache control (cached/uncached)
  - Accesses to peripherals and other processors need to be uncached
Notes:
Virtual memory addressing: Less fragmentation occurs because the address range of the application is independent of the physical addresses of the memories. Shared memory reduces the memory footprint. Example of shared memory: the C runtime library (libc.so) is shared in Linux.
Protection: Why does kernel mode have permission flags? The OS is allowed to do anything in kernel mode, so it can easily set these flags if they are cleared. The reason is to protect the kernel from programming mistakes rather than malicious intent. This feature is rarely used, e.g. Linux never uses these flags as it always allows the kernel to have full access.

Multi-Level Page Tables
1st level page table:
- A page table for page tables.
2nd level page table:
- Allocated only if one of its entries corresponds to allocated data.
Advantages:
- Page table space is proportional to allocated memory.
- Can page the page tables.
Disadvantage:
- Complexity, especially if TLB misses are handled in hardware.
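A two-level lookup can be sketched with nested dictionaries, which also shows why second-level tables only exist for allocated regions. The 10/10/12 bit split and the table representation are assumptions for illustration and are not taken from the Andes page table format.

```python
PAGE_SHIFT = 12  # 4KB pages
L2_BITS = 10     # assumed width of the second-level index


def walk(l1_table, vaddr):
    """Translate vaddr through a two-level table; return None on a miss.

    l1_table maps first-level index -> second-level table (a dict),
    and each second-level table maps its index -> physical frame number.
    """
    l1_index = vaddr >> (PAGE_SHIFT + L2_BITS)
    l2_index = (vaddr >> PAGE_SHIFT) & ((1 << L2_BITS) - 1)
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)

    l2_table = l1_table.get(l1_index)
    if l2_table is None:
        return None  # no second-level table was ever allocated here
    frame = l2_table.get(l2_index)
    if frame is None:
        return None  # page not mapped
    return (frame << PAGE_SHIFT) | offset


# Only one second-level table exists, holding a single mapping.
tables = {1: {2: 0x80}}
mapped = walk(tables, 0x00402034)
unmapped = walk(tables, 0x00000034)
print(hex(mapped), unmapped)
```

The sparse outer dictionary plays the role of the first-level table: regions with no allocated data cost no second-level storage at all.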

uITLB/uDTLB Specifications
- 4/8 entries, fully associative
- Subset of MTLB contents
- Context ID checking support
- Pseudo-LRU replacement policy
- D-type flip-flop PTE storage
- Hit or miss checked in 1T

MTLB (Main TLB) Specifications
- 32/64/128 entries, 4-way set-associative
- Supports 4K/8K/1M page VA sizes
- TLB locking support
- Pseudo-LRU replacement policy
- SRAM-based PTE storage

Address Space Attribute
Defines various properties:
- Cacheability/bufferability requirement/hint
- Access permissions
- Possibly an ordering requirement
Where the attributes are defined:
- Translated: defined in the Page Table Entry
- Non-translated: defined in the MMU control register

Hardware Page Table Walker – 4KB

Hardware Page Table Walker
- Responsible for reading the TLB entry located in system memory under a Main TLB miss condition.
- Less flexible than software: handles only one or maybe two page table formats. But it speeds up the TLB refill time.
- 2-level address for looking up the PTE in external memory
- Uses physical addresses to access memory

Local Memory (LM)

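Local Memory
[Figure: local memory block diagram with the IFU, uITLB, I-cache, ILM, and IBPA on the instruction side; the LSU, uDTLB, D-cache, DLM, and DBPA on the data side; and the LDMA and BIU. Addresses are translated from VA to PA.]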

Data Local Memory Access Modes
Normal access mode:
- The processor core and the DMA engine see the same DLM address space, and they can access the same DLM location at the same time using the same address.
Double-buffer access mode:
- Only the DLM is divided into two banks.
- The processor pipeline is directed to access one bank.
- The DMA engine is directed to access the other bank.

Local memory constraint
- The base physical address has to be aligned on a 1MB boundary for any local memory size smaller than or equal to 1MB.
- Any access outside of the local memory but within the allocated 1MB region will cause a "Nonexistent local memory address" exception.
- The local memory needs to be mapped onto an uncacheable region; otherwise, UNPREDICTABLE behavior may happen to the local memory content.

Direct Memory Access (DMA)

DMA overview
- Two channels
- One active channel
- Programmed using physical addressing
- For both instruction and data local memory
- External address can be incremented with stride
- Optional 2-D Element Transfer (2DET) feature, which provides an easy way to transfer two-dimensional blocks from external memory
[Figure: Local Memory, DMA Controller, Ext. Memory.]

LMDMA Double Buffer Mode
[Figure: the core pipeline performs computation on one local memory bank (Bank 0 or Bank 1) while the DMA engine moves data between the other bank and external memory; the banks are switched between the core and the DMA engine.]
Width byte stride (in the DMA Setup register) = 1

Bus Interface Unit (BIU)

BIU introduction
The Bus Interface Unit is responsible for off-CPU memory access, which includes:
- System memory access
- Instruction/data local memory access
- Memory-mapped register access in devices

Bus Interface
- Compliance with AHB/AHB-Lite/APB
- High Speed Memory Port
- Andes Memory Interface
- External LM Interface

HSMP – High speed memory port
- The N12 also provides a high speed memory port interface, which has higher bus protocol efficiency and can run at a higher frequency to connect to a memory controller.
- The high speed memory port will be AMBA 3.0 (AXI) protocol compliant, but with reduced I/O requirements.

AndesCore Instruction Set Architecture

Data Types
- Bit (1-bit, b)
- Byte (8-bit, B)
- Halfword (16-bit, H)
- Word (32-bit, W)
- Double Word (64-bit, D)

Andes Registers – GPR
The Andes ISA has 32 32-bit GPRs.
Columns: ISA | # of GPR | Reg. index bits in 32-bit ISA | Reg. index bits in 16-bit ISA (rows with fewer values appear as in the source slide):
- ANDES: 32 | 5 | 5/4/3
- MIPS: 3
- ARM: 16 | 4

General purpose registers
Register | Name | Convention
- r0-r5 | a0-a5 | function arguments
- r6-r14 | s0-s8 | callee saved
- r15 | ta | assembler reserved
- r16-r25 | t0-t9 | caller saved
- r26-r27 | p0-p1 | operating system reserved
- r28 | s9/fp | frame pointer
- r29 | gp | global pointer
- r30 | lp | return address
- r31 | sp | stack pointer

32-Bit Baseline Instructions
- Data-processing instructions
- Load and store instructions
- Jump and branch instructions
- Miscellaneous instructions

Data-Processing Instructions
ALU instructions with immediate:
- OP rt_5, ra_5, imm_15
  - ADDI, SUBRI, ANDI, ORI, XORI
  - SLTI, SLTSI: set rt_5 if ra_5 < imm_15 (unsigned or signed comparison)
- OP rt_5, imm_20
  - MOVI, SETHI: set the low or high 20 bits of rt_5 (the remaining bits are set to 0)
ALU instructions without immediate:
- OP rt_5, ra_5, rb_5
  - ADD, SUB, AND, NOR, OR, XOR, SLT, SLTS
  - SVA, SVS: set if overflow on add/sub
- OP rt_5, ra_5
  - SEB, SEH, ZEB, WSBH: sign- or zero-extend byte/half, word-swap bytes
Shift and rotate instructions:
- OP rt_5, ra_5, imm_5
  - SLLI, SRLI, SRAI, ROTRI
  - SLL, SRL, SRA, ROTR

Data-Processing Instructions (cont.)
Multiplication-related instructions:
- OP rt_5, ra_5, rb_5: 32-bit result of ra_5 x rb_5 to rt_5
  - MUL
- OP d_1, ra_5, rb_5: 32-bit or 64-bit result to the "d" registers
  - MULTS64, MULT64, MADDS64, MADD64, MSUBS64, MSUB64, MULT32, MADD32, MSUB32
- OP rt_5, d_1.{hi, lo}
  - MFUSR, MTUSR: move-from or move-to a USR register
Example:
  mult64 d0, r0, r1
  mfusr r2, d0.hi
  mfusr r3, d0.lo
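The mult64/mfusr example can be mirrored in a short model that splits a 64-bit product into its high and low words. The sketch assumes that MULT64 treats its operands as unsigned 32-bit values (with MULTS64 as the signed variant); the Python function name is illustrative.

```python
MASK32 = 0xFFFFFFFF


def mult64(ra, rb):
    """Model of 'mult64 d0, ra, rb': return (d0.hi, d0.lo).

    Assumes an unsigned 32 x 32 -> 64-bit multiply.
    """
    product = (ra & MASK32) * (rb & MASK32)
    return (product >> 32) & MASK32, product & MASK32


# mult64 d0, r0, r1 ; mfusr r2, d0.hi ; mfusr r3, d0.lo
r0, r1 = 0xFFFFFFFF, 2
r2, r3 = mult64(r0, r1)
print(hex(r2), hex(r3))
```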

Load/Store Instructions
Load/store single with immediate offset (the immediate value is in units of the access size):
- OP rt_5, [ra_5+imm_15]
  - LWI, LHI, LHSI, LBI, LBSI, SWI, SHI, SBI
- OP rt_5, [ra_5], imm_15 (with post update)
  - LWI.bi, LHI.bi, LHSI.bi, LBI.bi, LBSI.bi, SWI.bi, SHI.bi, SBI.bi
Load/store single with register index (the index register is left-shifted by si = 0, 1, 2, or 3 bits):
- OP rt_5, [ra_5+rb_5<<si]
  - LW, LH, LHS, LB, LBS, SW, SH, SB
- OP rt_5, [ra_5], rb_5<<si
  - LW.bi, LH.bi, LHS.bi, LB.bi, LBS.bi, SW.bi, SH.bi, SB.bi
Final addresses must be aligned to the access size.
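Scaling the immediate by the access size is what lets a 15-bit field reach further for wider accesses. The sketch below models the effective address of the immediate-offset form and the resulting reach; it assumes the immediate is a signed field, and the function names are invented for the example.

```python
def effective_address(base, imm, access_size):
    """Address for 'OP rt, [ra + imm]' with imm in units of access_size."""
    addr = (base + imm * access_size) & 0xFFFFFFFF
    if addr % access_size != 0:
        # The slide requires final addresses aligned to the access size.
        raise ValueError("misaligned address: %#x" % addr)
    return addr


def reach_bytes(imm_bits, access_size):
    """Largest negative displacement, assuming a signed immediate field."""
    return (1 << (imm_bits - 1)) * access_size


word_addr = effective_address(base=0x1000, imm=3, access_size=4)  # LWI-style
half_addr = effective_address(base=0x1000, imm=3, access_size=2)  # LHI-style
print(hex(word_addr), hex(half_addr), reach_bytes(15, 4), reach_bytes(15, 1))
```

Under these assumptions a word access reaches four times as far as a byte access with the same encoding.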

Jump and Branch Instructions
Jump instructions:
- OP imm_24
  - J: unconditional direct branch
  - JAL: direct function call
- OP rb_5
  - JR: unconditional indirect branch
  - RET: return
  - JRAL: indirect function call
Branch instructions:
- OP rt_5, ra_5, imm_14
  - BEQ, BNE
- OP rt_5, imm_16
  - BEQZ, BNEZ, BGEZ, BLTZ, BGTZ, BLEZ

Miscellaneous Instructions
Conditional move:
- CMOVZ rt_5, ra_5, rb_5: rt_5 = ra_5 if (rb_5 == 0)
- CMOVN rt_5, ra_5, rb_5: rt_5 = ra_5 if (rb_5 != 0)
Example: C code to assembly
  C code:
    if (r0==0) r3 = r1; else r3 = r2;
  Assembly:
    move r3, r2
    cmovz r3, r1, r0
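The branch-free sequence in the example can be checked with a tiny model of the two conditional moves. The Python functions below mirror the semantics stated on the slide; returning the new value of rt is a modelling choice.

```python
def cmovz(rt, ra, rb):
    """CMOVZ rt, ra, rb: rt takes ra if rb == 0, otherwise rt is unchanged."""
    return ra if rb == 0 else rt


def cmovn(rt, ra, rb):
    """CMOVN rt, ra, rb: rt takes ra if rb != 0, otherwise rt is unchanged."""
    return ra if rb != 0 else rt


def select(r0, r1, r2):
    """if (r0 == 0) r3 = r1; else r3 = r2; without a branch."""
    r3 = r2                 # move  r3, r2
    r3 = cmovz(r3, r1, r0)  # cmovz r3, r1, r0
    return r3


print(select(0, 10, 20), select(7, 10, 20))
```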

Miscellaneous Instructions (cont.)
- NOP instruction
  - No operation
  - Never needed for correctness; useful for code alignment in some implementations for better performance
- Breakpoint and system call instructions
- Trap and return-from-exception instructions
- System register access instructions

Strength of Andes ISA
- Mixed-length ISA with flexible 16-bit instructions
- Efficient constant-setting instructions (up to 20 bits)
- PC-relative jumps for position-independent code
- Bi-endian modes to support flexible data input
- Performance instruction extensions for greater performance
- Immediate values of load/store word/halfword are in units of the corresponding access size, to address a wider range
- Load/store with post-update mode
- Plenty of space for custom extensions

Interruption

Interruption Introduction
- An interruption is a control flow change of normal instruction execution generated by an interrupt or an exception.
- An interrupt is a control flow change event generated by an asynchronous internal or external source.
- An exception is a control flow change event generated as a by-product of instruction execution.
- Two interruption stack level configurations: 2-level or 3-level.

Interruption Stack Level
- There are four interruption stack levels (ISL); ISL0 means no interruption.
- Hardware updates the interruption states on an Interruption Stack Level Transition (ISLT).
- ISLT01 and ISLT12 update the Interruption Register Stack.
- ISLT23 updates limited states for a severe error condition.
- The stack level can only go up to level 2 in the 2-level interruption configuration.

Interruption Stack
Updates performed on interruption level transitions 0/1 and 1/2:
  New value -> PC -> IPC -> P_IPC
  New value -> PSW -> IPSW -> P_IPSW
  VA -> EVA -> P_EVA
  Interruption -> ITYPE -> P_ITYPE
  P0 -> P_P0
  P1 -> P_P1
Updates performed on interruption level transition 2/3:
  New value -> PC -> O_IPC

Interruption Behavior
- Transition to Superuser mode
- Disable interrupts
- Disable I/D address translation
- Use the default endian
ISLT01/ISLT12 achieve this behavior by updating the corresponding PSW states, while ISLT23 achieves it by assumption.

Types of Interruption
- Reset/NMI: Cold Reset, Warm Reset, Non-Maskable Interrupt (NMI)
- Interrupt: External, Performance counter, Software interrupt
- Debug exception: Instruction address break, Data address & value break, Other debug exceptions
- MMU related exception:
  - TLB fill exception (I/D)
  - Non-Leaf PTE not present (I/D), Leaf PTE not present (I/D)
  - Read protection violation (D), Write protection violation (D)
  - Page modified (D), Non-executable page (I), Access bit (I/D)
- Syscall exception
- General exception:
  - Trap, Arithmetic, Reserved instruction/value exception
  - Privileged instruction, Misalignment (I/D), Bus error (I/D)
  - Nonexistent local memory address (I/D), MPZIU control
  - Coprocessor N not-usable exception, Coprocessor N-related exception
- Machine error exception: Cache/TLB errors
Priority table

Vectored Entry Point
- The IVB register defines the base address and the size of the offset.
- 2 interrupt controller modes (configurable):
  - Internal VIC mode
  - External VIC mode
- The interruption sub-type in the ITYPE register is defined based on the individual entry point.
- The base should be 64KB aligned.
- The offset can be 4 bytes to 256 bytes.

Entry Point for IVIC/EVIC
Internal VIC mode: 16 entry points (9 exceptions + 7 interrupts), 4-bit index
Offset | Entry point
- 0 | Reset/NMI
- 1 | TLB fill
- 2 | PTE not present
- 3 | TLB misc
- 4 | Reserved
- 5 | Machine Error
- 6 | Debug related
- 7 | General exception
- 8 | Syscall
- 9-14 | HW 0-5
- 15 | SW 0
External VIC mode: 73 entry points (9 exceptions + 64 interrupts), 7-bit index
Offset | Entry point
- 0 | Reset/NMI
- 1 | TLB fill
- 2 | PTE not present
- 3 | TLB misc
- 4 | Reserved
- 5 | Machine Error
- 6 | Debug related
- 7 | General exception
- 8 | Syscall
- 9-72 | VEP 0-63
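Given a 64KB-aligned base and a per-entry offset size, the address of each vectored entry point follows from the entry index. The sketch below assumes that entries are laid out contiguously as base + index * entry size; that layout, the function name, and the example base address are assumptions, not statements from the slides.

```python
def entry_point_address(ivb_base, entry_index, entry_size, external_vic=False):
    """Hypothetical entry address: base + index * entry size."""
    max_entries = 73 if external_vic else 16  # entry counts from the slide
    if ivb_base % 0x10000 != 0:
        raise ValueError("base should be 64KB aligned")
    if not 4 <= entry_size <= 256:
        raise ValueError("entry size outside the 4 to 256 byte range")
    if not 0 <= entry_index < max_entries:
        raise ValueError("entry index out of range for this VIC mode")
    return ivb_base + entry_index * entry_size


# Syscall is entry 8 in both modes; base and entry size are made-up values.
syscall_entry = entry_point_address(0x00100000, entry_index=8, entry_size=16)
last_evic_entry = entry_point_address(0x00100000, 72, 16, external_vic=True)
print(hex(syscall_entry), hex(last_evic_entry))
```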

Thank You!!!