-- Satya P. Vedula Intel © – Itanium TM Architecture.

Intel – Itanium Architecture
Agenda
1. History
2. Introduction
3. Block Diagram
4. Pipeline
5. Register Set
6. Instruction Set
7. EPIC
8. x86 Compatibility
9. Database on Itanium
10. Security & Itanium
11. Itanium and Java
12. Itanium and Win64

History
- Generation: [...] / [...] DX/SX, [...] SX/DX, Pentium, 1997 Pentium MMX
- Transistors: [...]k, 134k, 275k, 1.2M, 3.1M, 4.5M
- FPU: None / built-in, built-in
- Cache: 8k L1, 16k L1, 32k L1

History contd..
- Generation: 1995 Pentium Pro, 1997 Pentium II, 2001 Itanium, 1999 Pentium III, 2001 Pentium Mobile, Pentium M
- Transistors: [...] – 7.5M, 27.4M, 9.3M, 42M, 25M
- Cache: 8[...], 16k L1, 512k L2, 32k L1 / 96k L2 / 4M L3, 32k L1

Introduction - Itanium
The Intel® Itanium™ processor is the first in a family of processors based on the new Itanium architecture.
Product Highlights
- Explicitly Parallel Instruction Computing (EPIC) technology enables up to 20 operations/clock.
- Three levels of cache reduce memory latency: 2MB or 4MB Level 3 cache, 96K Level 2 cache, and 32K Level 1 cache.
- Operating frequencies of 733MHz and 800MHz.
- 266MHz data bus enables fast system bus transactions with 2.1 GB/sec bandwidth.
- Advanced error detection, correction and containment provided by Machine Check Architecture (MCA), comprehensive error logging, and Error Correcting Code (ECC) on caches and the system bus.
- IA-32 instruction binary compatibility in hardware.
- 6.4 gigaflops at peak performance.

Block Diagram
[Figures: complex block diagram; simple block diagram]

Pipeline
- 10-stage in-order pipeline
Comparison with others
- Itanium: 10 stages
- Pentium III: 12 stages
- Alpha: 8 stages
- Pentium: [...] stages
- Athlon: 10 stages

Register Set
- 128 general-purpose integer registers (each 64 bits wide)
- 128 floating-point registers (each 82 bits wide)
- 64 1-bit predicate registers
- 8 branch registers
Each task can have its own set of registers.

Instruction Set
- Instructions are 41 bits long.
  - It takes 7 bits to specify one of 128 GPRs.
  - 2 source-operand fields and a destination field = 21 bits
  - Predication = 6 bits (64 combinations)
- 1 bundle = 128 bits (instructions are given in bundles)
  - Three 41-bit instructions (making 123 bits), plus one 5-bit template
- Instruction categories = 4: integer, load/store, floating-point, and branch operations.
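The bit budget above can be checked with a small sketch. It packs one 5-bit template and three 41-bit instruction slots into a single 128-bit integer and unpacks them again. The field widths come from the slide; placing the template in the lowest bits and the helper names `pack_bundle` and `unpack_bundle` are assumptions made for illustration, not a statement of the exact hardware encoding.

```python
# Field widths taken from the slide; bit positions are an illustrative assumption.
TEMPLATE_BITS = 5
SLOT_BITS = 41
REGISTER_FIELD_BITS = 7      # enough to name one of 128 GPRs
PREDICATE_FIELD_BITS = 6     # enough to name one of 64 predicate registers

TEMPLATE_MASK = (1 << TEMPLATE_BITS) - 1
SLOT_MASK = (1 << SLOT_BITS) - 1

# 5 + 3 * 41 = 128 bits per bundle
BUNDLE_BITS = TEMPLATE_BITS + 3 * SLOT_BITS

# Bits left in one instruction after two sources, one destination and the predicate
REMAINING_BITS = SLOT_BITS - 3 * REGISTER_FIELD_BITS - PREDICATE_FIELD_BITS


def pack_bundle(template, slots):
    """Pack a 5-bit template and three 41-bit slots into one integer (hypothetical helper)."""
    if len(slots) != 3:
        raise ValueError("a bundle holds exactly three instructions")
    if not 0 <= template <= TEMPLATE_MASK:
        raise ValueError("template must fit in 5 bits")
    bundle = template
    for index, slot in enumerate(slots):
        if not 0 <= slot <= SLOT_MASK:
            raise ValueError("each instruction must fit in 41 bits")
        bundle |= slot << (TEMPLATE_BITS + index * SLOT_BITS)
    return bundle


def unpack_bundle(bundle):
    """Split a packed bundle back into (template, [slot0, slot1, slot2])."""
    template = bundle & TEMPLATE_MASK
    slots = [
        (bundle >> (TEMPLATE_BITS + index * SLOT_BITS)) & SLOT_MASK
        for index in range(3)
    ]
    return template, slots
```

With these widths, 14 bits of each 41-bit instruction remain for the opcode and other fields once the three register fields and the predicate field are accounted for.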

EPIC
EPIC: Explicitly Parallel Instruction Computing. It combines features from RISC and VLIW.
Advantages
- Conditional (predicated) execution
- Hinted and speculative loads (LD.A, Load Advanced, uses a special buffer, the ALAT)
- 64 free-form predicate bits (earlier chips have Z (zero), V (overflow), S (sign), and N (negative) flags)
- One conditional branch with 64 predicate bits
- VLIW features
- Groups of independent instructions
- Simple hardware
- Exploits Instruction Level Parallelism (ILP) with the compiler
Disadvantages
- Large increase in code size
- Blocking caches

EPIC – Power to Compilers
C source code:
    if (x == 4) z = 9; else z = 0;
32-bit compiled code (compiled on Pentium):
    1. Compare x to 4
    2. If not equal go to line 5
    3. z = 9
    4. go to line 6
    5. z = 0
    6. // Program continues from here
64-bit compiled code (compiled on Itanium):
    1. Compare x to 4 and store result in a predicate bit (we'll call it A)
    2. If A==1; z = 9
    3. If A==0; z = 0
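The two compiled forms can be modeled side by side. The sketch below is only a software analogy: `branching` follows the jump-based sequence, while `predicated` issues both assignments and lets a predicate decide which one takes effect. The helper `commit` is a hypothetical stand-in for a predicated register write and is not part of any real instruction set description.

```python
def branching(x):
    """Jump-based form: the compare decides which assignment is reached."""
    if x == 4:
        z = 9
    else:
        z = 0
    return z


def commit(guard, old, new):
    """Model a predicated write: the new value lands only when the guard is true."""
    return new if guard else old


def predicated(x):
    """Predicated form: one compare sets predicate A, both assignments are issued."""
    a = (x == 4)             # 1. compare x to 4, store the result in predicate A
    z = None
    z = commit(a, z, 9)      # 2. if A == 1: z = 9
    z = commit(not a, z, 0)  # 3. if A == 0: z = 0
    return z
```

Both functions return the same value for every input; the difference the slide points at is that the predicated form contains no jump for the processor to predict.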

EPIC Features
- Data speculation: a sequence of instructions consisting of an advanced load, zero or more instructions dependent on the value of that load, and a check instruction.
- Code speculation: a compiler concept. An instruction or a sequence of instructions is executed before it is known that the dynamic control flow of the program will actually reach the point in the program where the sequence of instructions is needed.
- Preprocessing: 1) register use, 2) loop optimization, 3) instruction execution order, and 4) logical program layout.
- Prediction: branch prediction is now given to programmers, for dynamic runtime branch prediction.
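The advanced load / check sequence described above can be illustrated with a toy model. The class `Alat` below is a simplified, hypothetical stand-in for the advanced load address table: it remembers which address each advanced load read, drops the entry when a later store hits that address, and lets the check either keep the speculative result or redo the dependent work. Real hardware tracks more detail; this is a sketch under those assumptions.

```python
class Alat:
    """Toy model of an advanced load address table (illustrative only)."""

    def __init__(self):
        self.entries = {}  # register name -> address read by the advanced load

    def advanced_load(self, memory, reg, addr):
        """Load early and remember which address the register depends on."""
        self.entries[reg] = addr
        return memory[addr]

    def store(self, memory, addr, value):
        """Write memory and invalidate any advanced load from the same address."""
        memory[addr] = value
        for reg in [r for r, a in self.entries.items() if a == addr]:
            del self.entries[reg]

    def check(self, memory, reg, addr, speculative_result, recompute):
        """Keep the speculative result if the entry survived, otherwise redo the work."""
        if self.entries.get(reg) == addr:
            return speculative_result
        return recompute(memory[addr])


def run(conflicting_store):
    """Advanced load, dependent work, an intervening store, then the check."""
    memory = {0x100: 5, 0x200: 7}
    alat = Alat()
    double = lambda value: value * 2

    loaded = alat.advanced_load(memory, "r1", 0x100)   # load hoisted above the store
    speculative = double(loaded)                        # instruction dependent on the load
    target = 0x100 if conflicting_store else 0x200
    alat.store(memory, target, 8)                       # the store the load was moved above
    return alat.check(memory, "r1", 0x100, speculative, double)
```

Calling `run(False)` keeps the speculative result because the store went elsewhere, while `run(True)` recomputes from the freshly stored value.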

EPIC Features contd..
Compiler advantages
- Complexity shifts to compilers
- Methods to express compile-time information
- Optimized FPUs for multimedia applications
- Reliability and performance on the server side

x86 compatibility
- Supports all x86 instructions including MMX and SSE (not SSE2), plus Protected, Virtual 8086, and Real mode features
- Run the entire OS in x86 mode, or run the applications under a new IA-64 OS
- x86-compatible registers: AR24 through AR31
- JMPE: instruction to switch between x86 mode and the new mode
[Figure: x86 – register compatibility]

What does it look like?
Transistors: 325 million
- Processor chip: 25 million (including L1 and L2 caches)
- Each of the four L3 caches: 75 million
- For comparison: Pentium III, 24 million; Pentium 4, 42 million
Itanium code:
- 2x Pentium (estimated)
- 30% more than other RISC

Itanium - anatomy

Other 64 bit processors
[Figures: photograph of Alpha Slot B module; UltraSPARC-III chips; MIPS 20K processor; IBM Power4 module]

Overview of the processors

Itanium Code names
- Merced
- McKinley
- Madison
- Deerfield
It's just beginning.

Databases
A quantum leap

Databases – Storage needs contd..
The Coming Content Big Bang
[Figure: timeline of stored content in gigabytes, running from cave paintings and bone tools (40,000 BCE) through writing, paper, printing, electricity and the telephone, the transistor, computing, the Internet (DARPA, late 1960s) and the web, up to 1999]
Source: IBM Informix Conference, 2001 Las Vegas

Databases – Storage – Requirements
Data Explosion!
- We are in the midst of a data explosion: the Big Bang!
- Terabytes of data
  - Common corporate expression
  - Petabytes (10^15) and Exabytes (10^18) are fast approaching
- 2-3 Exabytes = total volume of all information generated worldwide annually
- Storage capacities are growing
  - 72 GB Hard Drive (HD) becoming industry standard
  - 180 GB High Density HD in production
Source: IBM Informix Conference, 2001 Las Vegas

Databases contd..
The Need for Speed
- Memory access speeds desired (long term)
  - Memory latency averaging nanoseconds
  - Max = 256 GB of RAM
  - 64 bit => 20 Exabytes addressing capabilities
- Disk access speeds are the reality (near term)
  - Disk latency averaging 3-4 milliseconds
  - 4 orders of magnitude slower
- DW tables contain billions of rows
- Light table scan – 100 byte 1 GB/s
  - ~ 9 million rows/sec
  - ~ 540 million rows/minute
  - 5.4 billion rows (500GB) ~ 10 minutes
Source: IBM Informix Conference, 2001 Las Vegas
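The scan figures above can be reproduced with a few lines of arithmetic. The sketch assumes that "100 byte 1 GB/s" means 100-byte rows scanned at a decimal 1 GB/s, and it takes the slide's rounded rate of 9 million rows/sec as given; all variable names are illustrative.

```python
ROW_BYTES = 100                       # assumed row size, read from "100 byte"
SCAN_BYTES_PER_SEC = 1_000_000_000    # assumed decimal 1 GB/s scan rate

# Upper bound with no overhead: 1 GB/s divided by 100-byte rows
ideal_rows_per_sec = SCAN_BYTES_PER_SEC // ROW_BYTES

# The slide quotes a slightly lower sustained rate
slide_rows_per_sec = 9_000_000
rows_per_minute = slide_rows_per_sec * 60
rows_in_ten_minutes = rows_per_minute * 10
bytes_in_ten_minutes = rows_in_ten_minutes * ROW_BYTES

# A 64-bit address reaches 2**64 bytes, expressed here in decimal exabytes
address_space_exabytes = 2**64 / 10**18
```

Under these assumptions the ideal rate is 10 million rows/sec, so the slide's ~9 million leaves room for overhead, and ten minutes at that rate covers 5.4 billion rows, or about 540 GB, close to the 500GB quoted. The 64-bit address space works out to roughly 18.4 decimal exabytes, which the slide rounds to 20.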

Databases – Itanium advantages
- 64-bit addressing: tens of Gigabytes to thousands of Terabytes stored in nanosecond-access main memory eliminates millisecond disk access times, thus improving application response time.
- Large number of registers and innovative register model: data and intermediate calculations stored in on-chip registers reduce the repetitive load and store of intermediate data values, thus improving the response time of an application's database request.
- Instruction set parallelism: the ability to execute instructions in parallel allows quick simultaneous access to, and manipulation of, data derived from multiple rows and columns of a large in-memory database table or tables.
- Predication: predication allows the conditional execution of instructions before it is known whether the execution is needed. Predication allows more code to execute in parallel, the performance penalty of branch-dependent code is lower, and applications with heavy branching speed up.

Databases – Itanium advantages contd..
- Control/Data Speculation: control speculation allows certain load instructions to be scheduled before conditional branch instructions, rather than after. Data speculation is similar to control speculation but allows loads to be scheduled above stores. Both reduce the CPU wait states generated by branch-intensive code with high-latency RAM accesses, thus speeding application performance.
- Instruction/Data Prefetch: instruction prefetches can be signaled on branch instructions. Data can be prefetched with explicit prefetch instructions. Both prefetches speed application performance by reducing wait states.
Advantages
Big databases such as:
- Data warehousing
- Decision Support
- Web-Enabled ERP

Security

Security
- Common encryption algorithms run 3-5 times faster
- EPIC parallelism with register rotation makes algorithms faster
- Performance boost to CAD/CAE applications due to increased floating point registers
- Performance boost to 3D applications
- 82-bit floating-point unit offers high precision
- RSA computations are 512 bits to 1024 bits in length
  - New Multiply-Add instruction comes to aid
  - Parallelism comes to aid (two 128-bit computations are performed in parallel)
  - Predication eliminates branches (if) from RSA computations
- RSA, AES, SHA-1 algorithms are improved, as they use only counted loops utilizing Register Rotation
- Vast number of registers
- Large physical memory for security cache: Directory Services can be stored in memory
- Network traffic can be encrypted
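To see why a multiply-add instruction matters for 512-bit to 1024-bit RSA operands, the sketch below multiplies two large numbers as arrays of 64-bit limbs, where every inner step is one multiply plus an accumulate. It is a plain schoolbook method written for illustration; the helper names are hypothetical, and it does not claim to match how any particular Itanium crypto library is written.

```python
LIMB_BITS = 64
LIMB_MASK = (1 << LIMB_BITS) - 1


def to_limbs(number, count):
    """Split a non-negative integer into `count` 64-bit limbs, least significant first."""
    return [(number >> (LIMB_BITS * i)) & LIMB_MASK for i in range(count)]


def from_limbs(limbs):
    """Rebuild the integer from its 64-bit limbs."""
    return sum(limb << (LIMB_BITS * i) for i, limb in enumerate(limbs))


def multiply_add(a, b, c):
    """Model one multiply-add on limbs: return (low, high) 64-bit halves of a*b + c."""
    total = a * b + c
    return total & LIMB_MASK, total >> LIMB_BITS


def mp_multiply(x_limbs, y_limbs):
    """Schoolbook multi-precision multiply built from multiply-add steps."""
    result = [0] * (len(x_limbs) + len(y_limbs))
    for i, xi in enumerate(x_limbs):
        carry = 0
        for j, yj in enumerate(y_limbs):
            # xi*yj + previous partial sum + carry always fits in 128 bits
            low, high = multiply_add(xi, yj, result[i + j] + carry)
            result[i + j] = low
            carry = high
        result[i + len(y_limbs)] = carry
    return result
```

A 512-bit operand occupies eight limbs, so one full product takes 64 multiply-add steps, and the steps within a row are largely independent apart from the carry, which is the kind of work the slide expects the parallel hardware to absorb.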

Security contd..
Performance statistics – Encryption algorithms
Columns: RSA, ECC, AES, DES, RC6, SHA
- Multi-precision arithmetic: XXXX
- Multi-precision logical operation: X XXXX
- Fixed data rotate: X X
- Variable data rotate: X XX X
- Integer multiplication: X XX X
- Sbox lookup: XX X
- Logical operation: XX

Java

Java
Common Java Limitations (J2SE 1.3)
- Garbage Collection
- Object-oriented programming (OOP)
- Byte code vs. native machine code
- Variability of performance because of interpretation
- Multithreaded applications
- Java Native Interface vs. Native Method Interface
- Network performance
- Limitations with current architectures
  - EJB involves frequent invocation of method calls
  - Java needs dynamic bounds checking, null checking, exception handling
  - Java has a 64-bit integer data type, long
  - Java Object Handles (ObjId) are 64-bit

Java Contd..
Advantages using IBM Java2
- Streamlined Garbage Collection reduces pause time
- OOP: IBM Java uses Thread Local Heaps, allowing variable-sized thread-local heaps
- Just-In-Time compiler translates to optimized native code
- Mixed Mode Interpreter does Selective Compilation
- Multi-threading now has lightweight and full-power modes
- JNI enhanced and NMI removed in Java 2
- Network performance: Java Socket API overhead removed

Java Contd..
Advantages using Itanium
- Predication: the branching caused by Java technology's bounds checking benefits
- Speculation: multiway branching allows address locations and data needed for Java's bounds and null checks to be prefetched, increasing performance
- Instruction Parallelism: multiple execution units run instructions concurrently, increasing performance
- Register Set: smaller methods need not contend for registers, as more registers are available

Win64

Win64
Win64 data types
Type name                         What it is
LONG32, INT32                     32-bit signed
LONG64, INT64                     64-bit signed
ULONG32, UINT32, DWORD32          32-bit unsigned
ULONG64, UINT64, DWORD64          64-bit unsigned
INT_PTR, LONG_PTR                 Signed int, pointer precision
UINT_PTR, ULONG_PTR, DWORD_PTR    Unsigned int, pointer precision
SIZE_T                            Unsigned count, pointer precision
SSIZE_T                           Signed count, pointer precision

Win64 Contd..
Win64 Issues
- LLP64 issues
- Porting issues (32-bit to 64-bit)
  - Polymorphic data usage
  - Pointer/length combinations
- RPC and COM
  - Supports RPC between IA-32 and IA-64
  - Supports LocalServer-style (out-of-proc) COM between IA-32 and IA-64 processes
  - An IA-32 DLL cannot be loaded into a 64-bit process
  - An IA-64 DLL can't be loaded into a 32-bit process
  - Use COM as out-of-proc (solves the previous two problems)
- PnP should be RPC-enabled
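A common LLP64 porting problem is a pointer stored in a 32-bit integer type. The sketch below models it with fixed-width `ctypes` integers: `LONG` and `LONG_PTR` here are local stand-ins chosen to mirror a 32-bit LONG and a pointer-precision LONG_PTR on a 64-bit target, the sample address is made up, and `store_pointer` is a hypothetical helper. It is an analogy in Python rather than Win64 code.

```python
import ctypes

# Assumed stand-ins: under LLP64, LONG stays 32 bits while pointers grow to 64 bits.
LONG = ctypes.c_int32       # models the 32-bit LONG
LONG_PTR = ctypes.c_int64   # models LONG_PTR on a 64-bit target


def store_pointer(integer_type, address):
    """Store an address in the given integer type and return the bits that survive."""
    width_bits = 8 * ctypes.sizeof(integer_type)
    stored = integer_type(address).value   # ctypes silently drops the high bits
    return stored & ((1 << width_bits) - 1)


SAMPLE_ADDRESS = 0x00007FF612345678        # made-up 64-bit address

truncated = store_pointer(LONG, SAMPLE_ADDRESS)
preserved = store_pointer(LONG_PTR, SAMPLE_ADDRESS)
```

Here `truncated` keeps only the low 32 bits (0x12345678) while `preserved` equals the original address, which is why pointer-sized values belong in the pointer-precision types listed in the data type table.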

Questions?