Intel Itanium 2 Processor Intel’s Server Solution Raymond Ball April 2, 2004.



Presentation Overview
- Why Intel Itanium 2 in a DSP class?
- General specifications and features
- Instruction set
- DSP in Itanium 2
- Itanium 2 vs. TigerSHARC (?)

Why Itanium 2
- The Itanium 2 is designed for heavily loaded, number-crunching servers, a workload that has some similarities to DSP
- It’s always a good idea to see what other solutions are available
- Over time, designs tend to borrow ideas from other fields, which may give insight
- To see whether the power of the processor is really worth the cost
- Because I was interested

Specifications (April 2004)
- Clock GHz
- L3 cache up to 6MB
- 64 bit
- 128 bit bus (400 MHz)
- Price: $3k - $5k ea.
- IA-32 “compatible”
- Considered RISC
- Pipeline 8 deep
- 6 instructions / cycle in 2 bundles of 3
- Power consumption: 110W (130W max)
- registers
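The bus figures above can be checked against the 6.4GB/s quoted on the Anti-DSP slide. This is a small worked calculation, assuming that "400 MHz" means 400 million transfers per second and that every transfer uses the full 128-bit width; the function name is illustrative only.

```python
# Peak bus bandwidth from the slide's figures.
# Assumption: "400 MHz" = 400 million full-width transfers per second.
BUS_WIDTH_BITS = 128
TRANSFERS_PER_SECOND = 400e6


def peak_bandwidth_gb_per_s(width_bits, transfers_per_second):
    """Return peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = width_bits / 8
    return bytes_per_transfer * transfers_per_second / 1e9


print(peak_bandwidth_gb_per_s(BUS_WIDTH_BITS, TRANSFERS_PER_SECOND))
```

16 bytes per transfer times 400 million transfers per second gives the 6.4GB/s figure. This is a peak number; sustained throughput would be lower.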

Register Stack Engine (RSE)
- First 32 registers are global (static)
  - GR0 is hardwired to 0
  - Seen this in SHARC, because an immediate will kill the pipeline
- GR32 – GR63 local procedure registers
- The remaining 96 registers are used to store stacked register frames
- If more room is needed, the registers are pushed onto memory
- Transparently maintains the illusion of an infinite number of registers
- Only for the GRs (other registers are all global)
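The spill-and-fill behaviour described above can be sketched as a toy model. This is not how the hardware is implemented: it assumes whole frames are spilled at once (the real engine works at register granularity), and the class and method names are hypothetical. The only number taken from the slide is the 96 stacked registers.

```python
class RegisterStackModel:
    """Toy model of a register stack with a memory backing store.

    Assumption: frames are spilled and filled whole, oldest first.
    """

    def __init__(self, physical=96):
        self.physical = physical  # stacked registers available
        self.frames = []          # frame sizes held in registers, oldest first
        self.spilled = []         # frame sizes pushed out to memory, oldest first

    def in_use(self):
        return sum(self.frames)

    def call(self, size):
        """Allocate a new frame, spilling the oldest frames if needed."""
        if size > self.physical:
            raise ValueError("frame larger than the register file")
        while self.in_use() + size > self.physical:
            self.spilled.append(self.frames.pop(0))
        self.frames.append(size)

    def ret(self):
        """Drop the current frame; refill the caller's frame if it was spilled."""
        self.frames.pop()
        if not self.frames and self.spilled:
            self.frames.append(self.spilled.pop())


rse = RegisterStackModel()
for _ in range(3):
    rse.call(40)                  # the third call no longer fits in 96 registers
print(rse.frames, rse.spilled)
```

The program never sees the spill: each procedure still gets its own frame, which is the "illusion of an infinite number of registers" from the slide.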

Instruction set
- Instructions come in bundles of 3 operations, and 2 bundles are pulled in once a cycle
- Uses a special Explicitly Parallel Instruction Computing (EPIC) format
- The format moves the responsibility of resource management onto the compiler
- The template value dictates which execution unit each operation is sent to
- Bundle layout, bit 127 down to bit 0: Slot 2 (starts at bit 87) | Slot 1 (starts at bit 46) | Slot 0 (starts at bit 5) | Template (starts at bit 0)
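The bit positions on this slide imply a 5-bit template followed by three 41-bit slots (5 + 3 x 41 = 128). A minimal sketch of packing and unpacking such a bundle follows; the function names and the field values in the example are arbitrary placeholders, not real encodings.

```python
TEMPLATE_BITS = 5
SLOT_BITS = 41
TEMPLATE_MASK = (1 << TEMPLATE_BITS) - 1
SLOT_MASK = (1 << SLOT_BITS) - 1
SLOT_SHIFTS = (5, 46, 87)  # start bit of slot 0, slot 1, slot 2


def pack_bundle(template, slot0, slot1, slot2):
    """Pack a template and three slots into one 128-bit integer."""
    if not 0 <= template <= TEMPLATE_MASK:
        raise ValueError("template must fit in 5 bits")
    bundle = template
    for shift, slot in zip(SLOT_SHIFTS, (slot0, slot1, slot2)):
        if not 0 <= slot <= SLOT_MASK:
            raise ValueError("slot must fit in 41 bits")
        bundle |= slot << shift
    return bundle


def unpack_bundle(bundle):
    """Return (template, slot0, slot1, slot2) from a 128-bit integer."""
    slots = tuple((bundle >> shift) & SLOT_MASK for shift in SLOT_SHIFTS)
    return (bundle & TEMPLATE_MASK,) + slots


# Arbitrary placeholder values, not real instruction encodings.
bundle = pack_bundle(0x02, 0x123, 0x456, 0x789)
print(hex(bundle), unpack_bundle(bundle))
```

Because the template travels with the three operations, the hardware can route each slot to an execution unit without working that out itself, which is the sense in which the scheduling work moves to the compiler.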

Bundled Code Example

    { .mii
      add  r1 = r2, r3
      sub  r4 = r5, r6 ;;       // stop: end of cycle 0
      shr  r7 = r8, r9
    }
    { .mfi
      ld4  r14 = [r56]
      fadd f10 = f12, f13
      add  r16 = r18, r19 ;;    // stop: end of cycle 1, st4 below uses r16
    }
    { .mmi
      st4  [r16] = r67 ;;       // stop: end of cycle 2
      add  r24 = r56, r57
      add  r28 = r58, r59
    }

- Cycle 0 – Start of a Memory-Integer-Integer bundle
- Cycle 1 – The rest of that bundle plus a Memory-Float-Integer bundle, all done in this cycle
- Cycle 2 – A single operation
- Cycle 3 – Last two operations in the snippet
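The way stops (`;;`) carve the snippet into cycles can be sketched in a few lines. This assumes each group between stops issues in a single cycle, as the slide describes, and it assumes a stop after the add to r16, which the cycle annotations imply because the following st4 uses r16. The function name is hypothetical.

```python
def issue_groups(ops):
    """Split (text, stop_after) pairs into groups that end at each stop."""
    groups, current = [], []
    for text, stop_after in ops:
        current.append(text)
        if stop_after:
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups


SNIPPET = [
    ("add r1 = r2, r3", False),
    ("sub r4 = r5, r6", True),
    ("shr r7 = r8, r9", False),
    ("ld4 r14 = [r56]", False),
    ("fadd f10 = f12, f13", False),
    ("add r16 = r18, r19", True),   # assumed stop: st4 reads r16
    ("st4 [r16] = r67", True),
    ("add r24 = r56, r57", False),
    ("add r28 = r58, r59", False),
]

for cycle, group in enumerate(issue_groups(SNIPPET)):
    print(f"Cycle {cycle}: " + "; ".join(group))
```

Note that groups cut across bundle boundaries: cycle 1 takes the last operation of the first bundle together with the whole second bundle.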

Save me compiler!
- The instruction set and pipeline are so difficult to handle that you won’t do much better than the compiler
- With the EPIC architecture, more resource management is put on the compiler, which means extra work for human compilers
- The most efficient DSP algorithms tend to come from human compilers
  - Difficult to utilize all of the system resources like a hand made DSP algorithm
- What’s wrong with r1 = r2 + r3?

DSP Relation
- How does the instruction set compare to a DSP processor?
  - RISC type instruction set
  - For example, no mem-to-mem move
- Itanium 2 could easily be used to efficiently do a DSP algorithm
- The Itanium 2 basically includes every trick in the book thus far, which includes borrowing ideas from DSP

Pro-DSP
- Many single cycle instructions
- Instructions are designed for a heavily pipelined environment
- Processor has ways of accessing the data in a SIMD fashion (8x8-bit, 4x16-bit, 2x32-bit, 1x64-bit)
- High precision registers (82-bit floating-point accumulator)
  - People wonder whether 64-bit processing is necessary; well, THIS is where it’s necessary
- High number of registers for fast access
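The 4x16-bit case of SIMD access can be illustrated with a sketch that treats one 64-bit word as four independent 16-bit lanes. It models the idea in plain Python rather than any particular instruction; the function names are hypothetical, and wrap-around on overflow is an assumption (a saturating variant would clamp instead).

```python
LANE_BITS = 16
LANES = 4
LANE_MASK = (1 << LANE_BITS) - 1


def pack_lanes(values):
    """Pack four 16-bit values into one 64-bit word, lane 0 in the low bits."""
    word = 0
    for i, value in enumerate(values):
        word |= (value & LANE_MASK) << (i * LANE_BITS)
    return word


def unpack_lanes(word):
    """Split a 64-bit word into its four 16-bit lanes."""
    return [(word >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]


def packed_add_16(a, b):
    """Add lane by lane; each lane wraps on overflow without touching its neighbour."""
    sums = [x + y for x, y in zip(unpack_lanes(a), unpack_lanes(b))]
    return pack_lanes(sums)


a = pack_lanes([1, 2, 3, 0xFFFF])
b = pack_lanes([1, 1, 1, 1])
print(unpack_lanes(packed_add_16(a, b)))
```

The point for DSP work is that four 16-bit samples are processed per operation, and an overflow in one lane does not carry into the next.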

Anti-DSP
- No hardware loops
- No hardware circular buffers
- Only a single bus (although fast: 6.4GB/s)
- High power usage
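To show what the missing hardware circular buffer costs, here is a sketch of a FIR filter step with the index wrap done in software. A DSP with circular addressing would do the wrap as part of the memory access; here it is an explicit modulo on every access. The class and function names are hypothetical.

```python
class CircularDelayLine:
    """Fixed-length delay line with the index wrap written out explicitly."""

    def __init__(self, length):
        self.samples = [0.0] * length
        self.head = 0  # index of the newest sample

    def push(self, x):
        # The wrap below is what circular addressing hardware would do for free.
        self.head = (self.head - 1) % len(self.samples)
        self.samples[self.head] = x

    def tap(self, k):
        """Return the sample pushed k steps ago (k = 0 is the newest)."""
        return self.samples[(self.head + k) % len(self.samples)]


def fir_step(line, coeffs, x):
    """Push one input sample and return one filtered output sample."""
    line.push(x)
    return sum(c * line.tap(k) for k, c in enumerate(coeffs))


line = CircularDelayLine(2)
coeffs = [0.5, 0.5]  # two-tap moving average
print([fir_step(line, coeffs, x) for x in [1.0, 1.0, 3.0]])
```

The loop over the taps is likewise ordinary code with its own counter and branch, which is the overhead the "no hardware loops" bullet points at.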

TigerSHARC vs. Itanium 2
- COST! ($0.3k vs. $3k)
- Both heavily pipelined
- Both very hard to code by hand
- There really is no comparison
  - Processors were made for two different intentions
  - The framework that is typically built around the chips makes it even harder to compare

Conclusion
- You get what you pay for… or maybe a little less
  - The Itanium 2 is considered to be a high-end server processor
  - Anything high-end tends to be very overpriced (rack mount equipment)
- Sure, it’s a DSP processor, but for that price it should make you toast in the morning too
