
1
Adam Kunk, Anil John, Pete Bohman

2
• Released by IBM in 2010 (~ February)
• Successor of the POWER6
• Implements the IBM Power Architecture v2.06
• Clock Rate: 2.4 GHz - 4.25 GHz
• Feature size: 45 nm
• ISA: Power ISA v2.06 (RISC)
• Cores: 4, 6, 8
• Cache: L1, L2, L3, all on chip
References: [1], [5]

3
• PERCS: Productive, Easy-to-use, Reliable Computing System
• DARPA-funded contract that IBM won to develop the POWER7 ($244 million contract, 2006)
  ▪ The contract was to develop a petascale supercomputer architecture before 2011 under the HPCS (High Productivity Computing Systems) project.
• IBM, Cray, and Sun Microsystems received HPCS grants for Phase II.
• IBM was chosen for Phase III in 2006.
References: [1], [2]

4
• Side note:
  ▪ The Blue Waters system was meant to be the first supercomputer using PERCS technology.
  ▪ However, the contract was cancelled (cost and complexity).

5
POWER4/4+ (2001): First Dual Core in Industry
• Dual Core
• Chip Multi Processing
• Distributed Switch
• Shared L2
• Dynamic LPARs (32)
• 180nm
POWER5/5+ (2004): Hardware Virtualization for Unix & Linux
• Dual Core & Quad Core Md
• Enhanced Scaling
• 2 Thread SMT
• Distributed Switch +
• Core Parallelism +
• FP Performance +
• Memory bandwidth +
• 130nm, 90nm
POWER6/6+ (2007): Fastest Processor In Industry
• Dual Core
• High Frequencies
• Virtualization +
• Memory Subsystem +
• Altivec
• Instruction Retry
• Dyn Energy Mgmt
• 2 Thread SMT +
• Protection Keys
• 65nm
POWER7/7+ (2010): Most POWERful & Scalable Processor in Industry
• 4,6,8 Core
• 32MB On-Chip eDRAM
• Power Optimized Cores
• Mem Subsystem ++
• 4 Thread SMT++
• Reliability +
• VSM & VSX
• Protection Keys+
• 45nm, 32nm
POWER8: Future
References: [3]

6
• IBM POWER7 Demo

7
Cores:
• 8 Intelligent Cores / chip (socket)
• 4 and 6 Intelligent Cores available on some models
• 12 execution units per core
• Out of order execution
• 4 Way SMT per core
• 32 threads per chip
• L1: 32 KB I Cache / 32 KB D Cache per core
• L2: 256 KB per core
Chip:
• 32MB Intelligent L3 Cache on chip
[Figure: chip layout showing eight cores, each with its own L2, around the shared eDRAM L3 cache, together with the memory interface, GX bus, SMP fabric, and POWER bus.]
References: [3]
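The per-core cache sizes on this slide can be added up to get chip-wide totals. The following is a minimal sketch of that arithmetic only; the function name `on_chip_cache_bytes` is made up for illustration, and the 32 MB L3 figure is assumed to apply to the 8-core chip the slide describes.

```python
# Per-core sizes as quoted on the slide; names are illustrative only.
KIB = 1024
MIB = 1024 * KIB

L1I_PER_CORE = 32 * KIB   # L1 instruction cache per core
L1D_PER_CORE = 32 * KIB   # L1 data cache per core
L2_PER_CORE = 256 * KIB   # private L2 per core
L3_PER_CHIP = 32 * MIB    # shared on-chip L3 (8-core chip, per the slide)


def on_chip_cache_bytes(cores):
    """Return (total L1, total L2, shared L3) in bytes for `cores` cores."""
    total_l1 = cores * (L1I_PER_CORE + L1D_PER_CORE)
    total_l2 = cores * L2_PER_CORE
    return total_l1, total_l2, L3_PER_CHIP


if __name__ == "__main__":
    l1, l2, l3 = on_chip_cache_bytes(8)
    print(f"L1: {l1 // KIB} KB, L2: {l2 // KIB} KB, L3: {l3 // MIB} MB")
```

For the 8-core chip this works out to 512 KB of L1 and 2 MB of L2 in addition to the 32 MB L3.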

8

9
• Each core implements “aggressive” out-of-order (OoO) instruction execution
• The processor has an Instruction Sequence Unit capable of dispatching up to six instructions per cycle to a set of queues
• Up to eight instructions per cycle can be issued to the Instruction Execution units
References: [4]

10

11
• 8 instructions fetched from L2 to L1 I-cache or fetch buffer
• Balanced instruction rates across active threads
• Instruction grouping
  ▪ Instructions belonging to a group are issued together
  ▪ Groups contain independent instructions

12
• Branch Prediction

13
• Each POWER7 core has 12 execution units:
  ▪ 2 fixed point units
  ▪ 2 load store units
  ▪ 4 double precision floating point units (2x POWER6)
  ▪ 1 vector unit
  ▪ 1 branch unit
  ▪ 1 condition register unit
  ▪ 1 decimal floating point unit
References: [4]

14

15

16
• Simultaneous Multithreading
  ▪ SMT1: Single instruction execution thread per core
  ▪ SMT2: Two instruction execution threads per core
  ▪ SMT4: Four instruction execution threads per core
• This means that an 8-core POWER7 can execute 32 threads simultaneously
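The thread count follows directly from cores multiplied by threads per core. Below is a small sketch of that calculation for the core counts and SMT modes named in these slides; the names `SMT_MODES` and `hardware_threads` are hypothetical and not taken from any IBM tool.

```python
# Threads per core for each SMT mode listed on the slide.
SMT_MODES = {"SMT1": 1, "SMT2": 2, "SMT4": 4}


def hardware_threads(cores, mode):
    """Number of simultaneously executing threads for a chip."""
    return cores * SMT_MODES[mode]


if __name__ == "__main__":
    for cores in (4, 6, 8):  # core counts offered for POWER7
        counts = {mode: hardware_threads(cores, mode) for mode in SMT_MODES}
        print(cores, "cores:", counts)
```

With 8 cores in SMT4 mode, `hardware_threads(8, "SMT4")` gives the 32 threads quoted above.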

17
[Figure: execution unit utilization (FX0, FX1, FP0, FP1, LS0, LS1, BRX, CRL) over time for four designs: single thread out of order, S80 HW multi-thread, POWER5 2-way SMT, and POWER7 4-way SMT. Legend: Thread 0 executing, Thread 1 executing, Thread 2 executing, Thread 3 executing, no thread executing.]
References: [3]

18

19

20
• (Look at section 2.1.4 in http://www.redbooks.ibm.com/redpapers/pdfs/redp4639.pdf)

21
Parameter     | L1                           | L2         | L3 (Local)     | L3 (Global)
Size          | 64 KB (32 I, 32 D)           | 256 KB     | 4 MB           | 32 MB
Access Time   | 0.5 ns                       | 2 ns       | 6 ns           | 30 ns
Associativity | 4-way I-cache, 8-way D-cache | 8-way      | 8-way          | 8-way
Write Policy  | Write Through                | Write Back | Partial Victim | Adaptive
Line size     | 128 B                        | 128 B      | 128 B          | 128 B
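Access times in nanoseconds are easier to compare with pipeline behaviour when expressed in clock cycles (latency in ns multiplied by clock rate in GHz). The sketch below does that conversion for the table's values. The 4.0 GHz clock is an assumed example picked from within the 2.4 to 4.25 GHz range quoted earlier, and the helper name `latency_cycles` is illustrative.

```python
# Access times (ns) as listed in the cache parameter table.
ACCESS_TIME_NS = {
    "L1": 0.5,
    "L2": 2.0,
    "L3 (Local)": 6.0,
    "L3 (Global)": 30.0,
}


def latency_cycles(latency_ns, clock_ghz):
    """Convert a latency in ns to clock cycles: ns * (cycles per ns)."""
    return latency_ns * clock_ghz


if __name__ == "__main__":
    clock_ghz = 4.0  # assumed example clock, not a measured configuration
    for level, ns in ACCESS_TIME_NS.items():
        print(f"{level}: {latency_cycles(ns, clock_ghz):.0f} cycles")
```

At the assumed 4.0 GHz, the 0.5 ns L1 access corresponds to 2 cycles and the 30 ns global L3 access to 120 cycles.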

22
• 2 read ports, 1 write port
• Write has higher priority over a read
• Write-Through
• No L1 cast-outs required
• B-Tree LRU replacement
• Way prediction bits reduce hit latency
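To illustrate the replacement bullet, here is a minimal sketch of a binary-tree pseudo-LRU policy for one 8-way set. It assumes that "B-Tree LRU" on the slide refers to this common tree-based scheme; the class `TreePLRU` and its methods are hypothetical and make no claim about how the POWER7 hardware encodes its bits.

```python
class TreePLRU:
    """Binary-tree pseudo-LRU for one cache set (ways must be a power of 2)."""

    def __init__(self, ways=8):
        assert ways >= 2 and ways & (ways - 1) == 0
        self.ways = ways
        self.levels = ways.bit_length() - 1  # tree depth, 3 for 8 ways
        # One bit per internal node; 0 = victim is on the left, 1 = right.
        self.bits = [0] * (ways - 1)

    def touch(self, way):
        """Record an access: make every node on the path point away from `way`."""
        node = 0
        for level in reversed(range(self.levels)):
            direction = (way >> level) & 1    # 0 = left child, 1 = right child
            self.bits[node] = 1 - direction   # point at the other subtree
            node = 2 * node + 1 + direction   # descend toward `way`

    def victim(self):
        """Follow the node bits from the root to pick the way to replace."""
        node = 0
        way = 0
        for _ in range(self.levels):
            direction = self.bits[node]
            way = (way << 1) | direction
            node = 2 * node + 1 + direction
        return way


if __name__ == "__main__":
    plru = TreePLRU(8)
    for w in (0, 1, 2, 3):
        plru.touch(w)
    print("victim after touching ways 0-3:", plru.victim())
```

The scheme needs only 7 bits per 8-way set, compared with the ordering state a true LRU would keep, at the cost of approximating the least recently used way.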

23
• Inclusive of L1
• L3 partial victim relationship

24
• Details of the L3 Cache ... (leads up to eDRAM)

25
• eDRAM: embedded dynamic random-access memory
• This means the L3 cache (shared 32 MB) is on-chip
• Faster access, since the cache is physically closer to the cores than an off-chip cache
• Less area and less power; on-chip interconnects provide each core with 32-byte buses to and from the L3 cache
• Side note: eDRAM is also used in several game consoles (PS2, GameCube, Wii, etc.)
References: [5], [6]

26
• eDRAM in the POWER7 provides 1/6 the latency and twice the bandwidth (compared with off-chip eDRAM), and 1/5 the standby power in 1/3 the area (compared with SRAM)
References: [5]

27

28

29
1. http://en.wikipedia.org/wiki/POWER7
2. http://en.wikipedia.org/wiki/PERCS
3. Central PA PUG POWER7 review.ppt, http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&ved=0CCEQFjAA&url=http%3A%2F%2Fwww.ibm.com%2Fdeveloperworks%2Fwikis%2Fdownload%2Fattachments%2F135430247%2FCentral%2BPA%2BPUG%2BPOWER7%2Breview.ppt&ei=3El3T6ejOI-40QGil-GnDQ&usg=AFQjCNFESXDZMpcC2z8y8NkjE-v3S_5t3A

30
4. http://www.redbooks.ibm.com/redpapers/pdfs/redp4639.pdf
5. http://www.serc.iisc.ernet.in/~govind/243/Power7.pdf
6. http://en.wikipedia.org/wiki/EDRAM

