IA-64 Architecture Muammer YÜZÜGÜLDÜ CMPE 511 16/12/2004.


1 IA-64 Architecture Muammer YÜZÜGÜLDÜ CMPE 511 16/12/2004

2 IA-64 Architecture: Agenda
- Architecture Basics
- Predication
- Speculation
- Software Pipelining

3 Architectural Basics
- A full 64-bit address space
- Large, directly accessible register files
- Enough instruction bits to communicate information from the compiler to the hardware
- The ability to express arbitrarily large amounts of instruction-level parallelism (ILP)

4 Register Resources
- 128 65-bit general registers (64 data bits plus one NaT bit used for deferred exceptions)
- 128 82-bit floating-point registers
- Space for up to 128 64-bit special-purpose application registers
- 8 64-bit branch registers for function call linkage and return
- 64 one-bit predicate registers that hold the results of conditional expression evaluation

5 Register Resources

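6 Instruction Encoding
- In the common three-register format, each instruction is 41 bits wide:
  - 14 bits for the opcode
  - 7 bits for each of three register specifiers (enough to address 128 registers)
  - 6 bits for the qualifying predicate (enough to address 64 predicate registers)
- Three instructions are packed into a 128-bit bundle together with a 5-bit template. The template helps decode and route the instructions and indicates the location of stops, which mark the end of a group of instructions that can execute in parallel.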
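
The bundle arithmetic above (5 template bits plus three 41-bit slots gives 128 bits) can be checked with a small sketch. The function names and the bit positions used here (template in the low five bits, slots packed above it in order) are assumptions of this illustration, not something the slides specify.

```python
TEMPLATE_BITS = 5
SLOT_BITS = 41
SLOT_MASK = (1 << SLOT_BITS) - 1


def pack_bundle(template, slots):
    """Pack a 5-bit template and three 41-bit slots into one 128-bit integer.

    Assumed layout: template in bits 0-4, then slot 0, slot 1, slot 2.
    """
    if not 0 <= template < (1 << TEMPLATE_BITS):
        raise ValueError("template must fit in 5 bits")
    if len(slots) != 3:
        raise ValueError("a bundle holds exactly three instruction slots")
    bundle = template
    for i, slot in enumerate(slots):
        if not 0 <= slot <= SLOT_MASK:
            raise ValueError("each slot must fit in 41 bits")
        bundle |= slot << (TEMPLATE_BITS + i * SLOT_BITS)
    return bundle


def unpack_bundle(bundle):
    """Split a 128-bit bundle back into (template, [slot0, slot1, slot2])."""
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    slots = [
        (bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & SLOT_MASK
        for i in range(3)
    ]
    return template, slots
```

Packing and then unpacking any valid template and slot values should return the same values, and the packed integer always fits in 128 bits.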

7 Instruction Encoding

8 IA-64 Virtual Memory Model
- Can map 16 million Gbytes of virtual space.
- Bits 63-61 of a virtual address index into eight region registers that contain 24-bit region identifiers (RIDs).
- The 24-bit RID is concatenated with the virtual page number (VPN) to form a unique lookup into the translation lookaside buffer (TLB).

9 IA-64 Virtual Memory Model
- The TLB lookup generates two main items: the physical page number and the access privileges.
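
The translation steps described on the two slides above can be sketched as follows. This is only an illustration: the 16 KB page size, the dictionary used as a TLB, and the names `translate`, `region_registers`, and `tlb` are assumptions, and a TLB miss is modelled simply as a `KeyError`.

```python
PAGE_BITS = 14      # assumed 16 KB pages, chosen only for this sketch
REGION_SHIFT = 61   # bits 63-61 select one of eight region registers
RID_MASK = 0xFFFFFF  # region identifiers are 24 bits wide


def translate(vaddr, region_registers, tlb):
    """Return (physical_address, access_rights) for a 64-bit virtual address.

    tlb is a dict keyed by (rid, vpn) whose values are (ppn, rights).
    """
    region = (vaddr >> REGION_SHIFT) & 0x7
    rid = region_registers[region] & RID_MASK
    # The VPN is taken from the address bits below the region selector.
    vpn = (vaddr & ((1 << REGION_SHIFT) - 1)) >> PAGE_BITS
    page_offset = vaddr & ((1 << PAGE_BITS) - 1)
    # RID and VPN together form the lookup key.
    ppn, rights = tlb[(rid, vpn)]
    return (ppn << PAGE_BITS) | page_offset, rights


region_registers = [0] * 8
region_registers[5] = 0xABCDEF
tlb = {(0xABCDEF, 0x42): (0x7, "rw")}
vaddr = (5 << REGION_SHIFT) | (0x42 << PAGE_BITS) | 0x123
paddr, rights = translate(vaddr, region_registers, tlb)
```

Because the RID is part of the key, two processes can use the same virtual page number in the same region without their TLB entries colliding, as long as their region registers hold different RIDs.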

10 IA-64 Virtual Memory Model

11 Predication
- Removes branches by converting them to predicated execution
  - Executes multiple paths simultaneously
- Increases performance by exposing parallelism and reducing the critical path
  - Better utilization of wider machines
  - Reduces mispredicted branches
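
A minimal sketch of the idea, simulated in Python: a branch is replaced by one compare that sets a predicate and its complement, both arms are issued, and each result is committed only if its predicate is true. The helper `commit` and the pseudo-assembly in the comments are illustrative assumptions, not actual compiler output.

```python
def commit(pred, old, new):
    """Model a predicated write: keep the new value only if pred is true."""
    return new if pred else old


def branchy_abs_diff(a, b):
    """Reference version with a conditional branch."""
    if a > b:
        return a - b
    return b - a


def predicated_abs_diff(a, b):
    """If-converted version: no branch, both arms are issued."""
    # cmp.gt p1, p2 = a, b   (one compare sets a predicate and its complement)
    p1 = a > b
    p2 = not p1
    x = 0
    # (p1) sub x = a, b
    x = commit(p1, x, a - b)
    # (p2) sub x = b, a
    x = commit(p2, x, b - a)
    return x
```

The two subtractions do not depend on each other, so a wide machine can execute them in the same cycle; the cost is that the work of the untaken path is still issued.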

12 Predication

13 Predication Benefits
- Reduces branches and mispredict penalties
  - 50% fewer branches and 37% faster code*
- Parallel compares further reduce critical paths
- Greatly improves code with hard-to-predict branches
  - Large server apps: capacity limited
  - Sorting, data mining: large database apps
  - Data compression
- Traditional architectures' "bolt-on" approach can't efficiently approximate predication
  - Cmove: 39% more instructions, 23% slower performance*
  - Instructions must all be speculative

14 Data Speculation
- The compiler can issue a load ahead of a preceding, possibly conflicting store

15 Architectural Support for Data Speculation
- Instructions
  - ld.a: advanced load
  - ld.c: check load
  - chk.a: advanced load check
- Speculative advanced load (ld.sa): an advanced load with deferral
- ALAT (Advanced Load Address Table): hardware structure containing the outstanding advanced loads
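
The interplay between an advanced load, an intervening store, and the check load can be sketched with a toy model. The `Machine` class, its method names, and the `reloads` counter are hypothetical; the real ALAT is a finite hardware table, and `chk.a` (which branches to recovery code) is not modelled here.

```python
class Machine:
    """Toy model of data speculation with an ALAT (illustrative only)."""

    def __init__(self, memory):
        self.mem = dict(memory)
        self.regs = {}
        self.alat = {}     # target register -> address of its advanced load
        self.reloads = 0   # how many check loads had to redo the load

    def ld_a(self, reg, addr):
        """Advanced load: load early and record the load in the ALAT."""
        self.regs[reg] = self.mem[addr]
        self.alat[reg] = addr

    def st(self, addr, value):
        """Store: write memory and drop ALAT entries for the same address."""
        self.mem[addr] = value
        for reg in [r for r, a in self.alat.items() if a == addr]:
            del self.alat[reg]

    def ld_c(self, reg, addr):
        """Check load: reload only if the ALAT entry is gone."""
        if reg not in self.alat:
            self.regs[reg] = self.mem[addr]
            self.reloads += 1
        return self.regs[reg]


# No conflict: the store goes elsewhere, so the early value is still valid.
ok = Machine({100: 1, 200: 2})
ok.ld_a("r1", 100)
ok.st(200, 9)
ok_value = ok.ld_c("r1", 100)

# Conflict: the store hits the loaded address, so the check load reloads.
bad = Machine({100: 1})
bad.ld_a("r1", 100)
bad.st(100, 5)
bad_value = bad.ld_c("r1", 100)
```

In the conflict-free case the check costs nothing extra, which is the case the compiler is betting on when it hoists the load above the store.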

16 Speculation Benefits
- Reduces impact of memory latency
  - Study demonstrates performance improvement of 79% when combined with predication*
- Greatest improvement to code with many cache accesses
  - Large databases
  - Operating systems
- Scheduling flexibility enables new levels of performance headroom

17 Speculation Drawbacks
- Drawbacks even if speculation is correct
  - Registers used speculatively must be kept alive until the check (increases register pressure)
  - Each speculation needs recovery code, which increases code size
- Drawbacks if speculation is incorrect
  - Recovery code has to be executed, costing additional cycles
  - Recovery code may not be in the cache, adding a loading delay

18 Software Pipelining
- Overlaps the execution of different loop iterations
- Completes more iterations in the same amount of time

19 Software Pipelining
- IA-64 features that make this possible
  - Full predication
  - Special branch handling features
  - Register rotation: removes loop copy overhead
  - Predicate rotation: removes prologue and epilogue
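
How rotation lets a single kernel loop serve as prologue, steady state, and epilogue can be sketched with a three-stage loop (load, add, store) simulated in Python. The function name, the use of `deque.rotate` to stand in for register and predicate rotation, and the extra `idx` queue that tracks element indices are all assumptions of this sketch.

```python
from collections import deque


def pipelined_add_one(src):
    """Compute [x + 1 for x in src] with a 3-stage software-pipelined kernel."""
    n = len(src)
    stages = 3                       # stage 0: load, stage 1: add, stage 2: store
    dst = [None] * n
    preds = deque([False] * stages)  # rotating stage predicates
    regs = deque([None] * stages)    # rotating data registers
    idx = deque([None] * stages)     # element index travelling with each value
    remaining = n
    # Kernel only: the predicates switch stages on and off, so no separate
    # prologue or epilogue code is needed.
    for _ in range(n + stages - 1):
        # Rotation: stage k now sees what stage k-1 produced one iteration ago,
        # without any explicit register-to-register copies.
        preds.rotate(1)
        regs.rotate(1)
        idx.rotate(1)
        preds[0] = remaining > 0     # start a new iteration while input remains
        if preds[0]:                 # (p0) load
            idx[0] = n - remaining
            regs[0] = src[idx[0]]
            remaining -= 1
        if preds[1]:                 # (p1) add
            regs[1] = regs[1] + 1
        if preds[2]:                 # (p2) store
            dst[idx[2]] = regs[2]
    return dst
```

During the first two kernel iterations only the early stages are enabled (filling the pipeline), and during the last two only the late stages are enabled (draining it). On real hardware the three stages of one kernel iteration would issue in parallel; here they run one after another, which gives the same result because each stage touches a different register.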

20 Software Pipelining Benefits
- Loop pipelining maximizes performance and minimizes overhead
  - Avoids code expansion of unrolling and code explosion of prologue and epilogue
  - Smaller code means fewer cache misses
  - Greater performance improvements in higher-latency conditions
- Reduced overhead allows software pipelining of small loops with unknown trip counts
  - Typical of integer scalar codes

21 Performance
- Backward compatible with previous instruction sets (RISC, IA-32) through emulation, although such code performs poorly
- IA-64 code (EPIC instruction set) will run on any member of the Itanium family
- For optimum performance, code must be recompiled with processor-specific information (different numbers of functional units, pipeline changes)
- Itanium 2 is two times faster than Itanium

22 Performance

23 Target Market
- High-end servers
- Database machines
- Development shops
- NOT suitable for home PCs

24 Questions?

