ARM for Wireless Applications ARM11 Microarchitecture On the ARMv6 Connie Wang.

Advanced RISC Machines
>75% of market for 32-bit RISC microprocessors
ARM11 design led by Ian Devereux

Demands of Wireless Applications
High performance
Low power
Small size
Cost

RISC for Wireless
Strengths:
– Clock rate
– Pipelining
Weaknesses:
– Low code density
– Power consumption

ARM11 for Wireless
Strengths enhanced:
– Clock rate
  Optimized interrupt and exception handling
  Minimized context switch cost
  Instruction set for media
– Pipelining
  Decoupled for high bandwidth
  Out-of-order completion: instructions can retire ahead of slower, earlier ones
Weaknesses reduced:
– Code density
  ISA extensions
  Optional application-specific and/or VFP coprocessors
– Power consumption
  Architecture and instructions reduce the clock rate required
  Clock gate control

ARM11 Microarchitecture
First implementation of the ARMv6 architecture
8-stage pipeline
64-bit datapaths
Frequency: up to 750 MHz; 350 – 500+ MHz worst case
400 – 1,200 Dhrystone MIPS
Power: 0.4 mW/MHz worst case (0.13 µm, 1.2 V)
Will be released to licensees in Q4 2002
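As a rough worked example of the power figure above, the sketch below multiplies the quoted 0.4 mW/MHz by the worst-case clock range. It assumes power scales linearly with frequency, which the slide implies but does not state; the function name is illustrative only.

```python
# Worst-case figure quoted on the slide: 0.4 mW/MHz, held in microwatts
# per MHz so the arithmetic stays in integers until the final division.
POWER_UW_PER_MHZ = 400


def worst_case_power_mw(freq_mhz: int) -> float:
    """Estimate core power in mW, ASSUMING linear scaling with clock rate."""
    return POWER_UW_PER_MHZ * freq_mhz / 1000


if __name__ == "__main__":
    # Ends of the worst-case frequency range given on the slide.
    for freq in (350, 500):
        print(f"{freq} MHz -> {worst_case_power_mw(freq):.0f} mW")
```

Under that assumption the core stays in the low hundreds of milliwatts across the quoted worst-case range.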

ARMv6
Media support: SIMD extensions
Improved interrupt latency
ISA extensions: THUMB, DSP, Jazelle
100% backwards compatibility with ARMv5

THUMB Instruction Set
32-bit performance for 16-bit systems
32-bit instructions re-coded to 16-bit opcodes
32-bit ROM stores 2 THUMB instructions per word
Decompressed in pipeline to ARM instruction equivalents
Improves code density by 35%

DSP Instruction Set
Application accelerator for Digital Signal Processor performance
Can load/store registers by pairs
16x16 or 32x16 MAC in one cycle
Utilized in MAC pipeline
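To make the 16x16 multiply-accumulate concrete, here is a minimal software model of one MAC step and a dot product built from it. The names `mac16` and `dot_product` are hypothetical. The model wraps the accumulator to 32 bits and ignores any overflow flag, so it is only a sketch of the arithmetic, not of the hardware.

```python
def sign_extend16(x: int) -> int:
    """Interpret the low 16 bits of x as a signed halfword."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def mac16(acc: int, a: int, b: int) -> int:
    """Return acc + a[15:0] * b[15:0], wrapped to a signed 32-bit value."""
    result = (acc + sign_extend16(a) * sign_extend16(b)) & 0xFFFFFFFF
    return result - 0x100000000 if result & 0x80000000 else result


def dot_product(xs, ys) -> int:
    """Accumulate one 16x16 MAC per sample pair, as a DSP kernel would."""
    acc = 0
    for x, y in zip(xs, ys):
        acc = mac16(acc, x, y)
    return acc


if __name__ == "__main__":
    print(dot_product([1, 2, 3], [4, 5, 6]))
```

Each loop iteration corresponds to one MAC operation, which is the step the slide says the core can perform in a single cycle.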

Jazelle Instruction Set
Support for entering/exiting Java applications
Fetches/decodes Java bytecodes, maintains a Java operand stack
Creates a state that imitates a Java processor
OS controls low-cost switch between Java and ARM/THUMB states

SIMD Instruction Set
Parallel processing of 2x16-bit or 4x8-bit operands
Four new Greater-than-or-Equal status bits (GE[3:0]) record per-lane results of parallel add/subtract
Eliminates need for very high clock frequencies and hardware accelerators
2 – 4 x performance improvement for multimedia applications
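The 4x8-bit case can be sketched in software as below. It models the unsigned byte-wise add and subtract (UADD8, USUB8) together with the GE flags they produce, and the SEL instruction that consumes those flags. The Python helper names are illustrative, and this is a behavioural sketch rather than a cycle-accurate model.

```python
def lanes8(word: int):
    """Split a 32-bit word into four byte lanes, least significant first."""
    return [(word >> (8 * i)) & 0xFF for i in range(4)]


def pack8(lanes) -> int:
    """Pack four lane values back into a word, truncating each to 8 bits."""
    return sum((lane & 0xFF) << (8 * i) for i, lane in enumerate(lanes))


def uadd8(a: int, b: int):
    """Byte-wise unsigned add; GE[i] is set when lane i carries out."""
    sums = [x + y for x, y in zip(lanes8(a), lanes8(b))]
    return pack8(sums), [s >= 0x100 for s in sums]


def usub8(a: int, b: int):
    """Byte-wise unsigned subtract; GE[i] is set when lane i does not borrow."""
    diffs = [x - y for x, y in zip(lanes8(a), lanes8(b))]
    return pack8(diffs), [d >= 0 for d in diffs]


def sel(a: int, b: int, ge) -> int:
    """Pick each byte from a where GE[i] is set, otherwise from b."""
    return pack8([x if g else y for x, y, g in zip(lanes8(a), lanes8(b), ge)])


def umax8(a: int, b: int) -> int:
    """Per-byte maximum built from a USUB8 followed by a SEL."""
    _, ge = usub8(a, b)
    return sel(a, b, ge)


if __name__ == "__main__":
    print(hex(umax8(0x10203040, 0x40302010)))
```

The `umax8` helper shows why the GE bits matter: two instructions compute four byte-wise maxima with no branches.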

Synchronization and Sharing Data
Load-/Store-Exclusive instructions (LDREX/STREX) support semaphores
– Supersede the old Swap (SWP) instruction for implementing semaphores
Virtual Memory System Architecture v6 ID’s separate caches
– Cache hierarchy and ordering rules
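The retry loop that LDREX/STREX enables can be illustrated with a toy model of an exclusive monitor. The class and method names are hypothetical, and the monitor here is simplified to one tagged address per CPU. The one detail taken from the real instruction is that the store returns 0 on success and 1 on failure.

```python
class ExclusiveMemory:
    """Toy shared memory with a simplified exclusive monitor per CPU."""

    def __init__(self):
        self.mem = {}      # address -> value
        self.monitor = {}  # cpu id -> address tagged by its last ldrex

    def ldrex(self, cpu: int, addr: int) -> int:
        """Load a value and tag the address for exclusive access."""
        self.monitor[cpu] = addr
        return self.mem.get(addr, 0)

    def strex(self, cpu: int, addr: int, value: int) -> int:
        """Store only if the tag survived; return 0 on success, 1 on failure."""
        if self.monitor.get(cpu) != addr:
            return 1
        self.mem[addr] = value
        # A successful store clears every reservation on this address.
        for other in [c for c, a in self.monitor.items() if a == addr]:
            del self.monitor[other]
        return 0


def atomic_increment(memory: ExclusiveMemory, cpu: int, addr: int) -> int:
    """Retry the load/modify/store sequence until the exclusive store succeeds."""
    while True:
        old = memory.ldrex(cpu, addr)
        if memory.strex(cpu, addr, old + 1) == 0:
            return old + 1


if __name__ == "__main__":
    m = ExclusiveMemory()
    print(atomic_increment(m, cpu=0, addr=0x100))
```

Unlike a swap, a failed exclusive store leaves memory untouched, so a semaphore built this way never writes a stale value.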

Bit/Byte Order Support
E-bit for current endian setting of core
– Set/cleared with SETEND instruction
REV* instructions reverse bytes for unaligned data support
– REV – reverses the bytes in a word
– REV16 – reverses the bytes in each halfword
– REVSH – reverses the bytes in the low-order halfword and sign-extends the result
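The three byte-reversal instructions can be modelled in a few lines. The sketch assumes the ARMv6 definitions: REV swaps all four bytes, REV16 swaps the bytes within each halfword, and REVSH swaps the bytes of the low halfword and sign-extends it to 32 bits. The Python function names simply mirror the mnemonics.

```python
def rev(x: int) -> int:
    """REV: reverse the byte order of a 32-bit word."""
    return int.from_bytes((x & 0xFFFFFFFF).to_bytes(4, "little"), "big")


def rev16(x: int) -> int:
    """REV16: reverse the two bytes inside each 16-bit halfword."""
    x &= 0xFFFFFFFF
    return ((x & 0x00FF00FF) << 8) | ((x & 0xFF00FF00) >> 8)


def revsh(x: int) -> int:
    """REVSH: byte-reverse the low halfword, then sign-extend to 32 bits."""
    half = ((x & 0xFF) << 8) | ((x >> 8) & 0xFF)
    return (half | 0xFFFF0000) if half & 0x8000 else half


if __name__ == "__main__":
    for fn in (rev, rev16, revsh):
        print(fn.__name__, hex(fn(0x12345680)))
```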

Exception and Interrupt Improvement
Imperative for real-time tasks wherein low latency is critical
FI bit in CP15 register 1 designates:
– 0: Max performance mode, or
– 1: Low interrupt latency mode to allow interrupts
VE bit enables vectored interrupts to core
– Direct vs. external -> system -> vector address
A-bit aborts all unaligned accesses
U-bit (with clear A-bit) allows unaligned hardware access
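A small decoder makes the interaction between these control bits explicit. The bit positions used here (A = 1, FI = 21, U = 22, VE = 24) are an assumption based on the ARMv6 CP15 control register layout, with FI being the fast-interrupt configuration bit. They are not given on the slide and should be checked against the core's reference manual.

```python
# ASSUMED bit positions in the CP15 c1 control register (not from the slide).
CONTROL_BITS = {"A": 1, "FI": 21, "U": 22, "VE": 24}


def decode_control(value: int) -> dict:
    """Return the state of each modelled control bit as a boolean."""
    return {name: bool((value >> bit) & 1) for name, bit in CONTROL_BITS.items()}


def unaligned_access_allowed(value: int) -> bool:
    """Unaligned hardware access needs U set and A clear, per the slide."""
    flags = decode_control(value)
    return flags["U"] and not flags["A"]


if __name__ == "__main__":
    low_latency_vectored = (1 << 21) | (1 << 24)
    print(decode_control(low_latency_vectored))
```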

Mode Changing and Stack Improvements
CPSID/CPSIE instructions allow changing between modes with interrupt disable/enable
Save Return State (SRS) saves the return address and saved status (LR and SPSR) of the current mode onto the stack of the target mode
Return From Exception (RFE) loads the PC and CPSR back from that stack
Reduces exception handling overhead

8-Stage Pipeline
Single-issue
Dynamic branch prediction uses a 64-entry direct-mapped BTB
64-bit data paths: read 2 registers in 1 clock
Loads/stores done in background
Out-of-order completion: instructions can retire before earlier, slower ones (such as loads) finish
ALU processed in parallel with data cache access
MAC processed in lock-step with ALU
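A direct-mapped branch target buffer of the size quoted above can be sketched as follows. The sketch assumes 4-byte ARM instructions and indexes the table with the low bits of the word address. It stores only a tag and a target, whereas a real predictor would also keep history bits. The class and method names are hypothetical.

```python
class BranchTargetBuffer:
    """Toy 64-entry direct-mapped BTB holding (tag, target) pairs."""

    ENTRIES = 64

    def __init__(self):
        self.table = [None] * self.ENTRIES

    def _split(self, pc: int):
        """Derive (index, tag) from the word address of a branch."""
        word = pc >> 2  # ASSUMES 4-byte instructions
        return word % self.ENTRIES, word // self.ENTRIES

    def predict(self, pc: int):
        """Return the cached target on a tag hit, or None on a miss."""
        index, tag = self._split(pc)
        entry = self.table[index]
        if entry is not None and entry[0] == tag:
            return entry[1]
        return None

    def update(self, pc: int, target: int) -> None:
        """Record a taken branch, evicting whatever shared the slot."""
        index, tag = self._split(pc)
        self.table[index] = (tag, target)


if __name__ == "__main__":
    btb = BranchTargetBuffer()
    btb.update(0x1000, 0x2000)
    print(hex(btb.predict(0x1000)))
```

Direct mapping keeps lookup to a single table read, at the cost that two branches whose addresses differ by a multiple of 64 words evict each other.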

Prefetch
L1 memory access requires 2 cycles

Decode
Decode instruction bits and allocate stack

Issue Instruction
Load operands from registers

ALU and MAC
ALU pipeline
– Shift bits
– Arithmetic and logical operations
– Save state and registers
3-stage MAC
– Can issue a 16x16 operation per cycle
– Processed with ALU pipeline

Data Cache Access
Map memory address
Data cache load/store requires 2 cycles

Writeback
Write results of instructions to designated memory, cache, or register

8-Stage Pipeline
Diagram by Devereux: 7

Power-saving features
>95% of registers clock gated
WFI (wait for interrupt) instruction: can disable entire clock network
Reduced clock cycles and use of transistors

Conclusions
ARM11 will be implemented as a family of cores
– Designed for maximum performance in wireless multimedia
– A new standard in efficiency and power for embedded applications