TILEmpower-Gx36 - Architecture overview & performance benchmarks – Presented by Younghyun Jo 2013/12/18.


1 TILEmpower-Gx36 - Architecture overview & performance benchmarks – Presented by Younghyun Jo 2013/12/18

2 Computer Systems and Platforms Lab. Outline: architecture overview (motivation, specification of TILE-Gx8036 processors); performance evaluations (computational performance evaluation, memory performance evaluation); conclusion.

3 Motivation of Tilera architectures

4 Motivation. Dr. Anant Agarwal: a founder of Tilera Corp.; computer architecture researcher and professor of EECS at MIT; led the Alewife project and the Raw architecture project. MIT Alewife project (1990 ~ 1999): Alewife is a large-scale multiprocessor that combines cache-coherent distributed shared memory and user-level message passing in a single integrated hardware framework. Raw processor (1997 ~ 2007): tiled multicore architecture; wire-efficient multicore architecture (interconnection between tiles); highly parallel VLSI; the compiler knows low-level details of the hardware. Figure label on the slide: 2002.

5 Motivation. Scalar Operand Networks [IEEE TPDS]: challenges in the design of scalable scalar operand networks, and how to overcome them: frequency scalability; bandwidth scalability; deadlock and starvation; handling exceptional events; efficient operation-operand matching. Tiled multicore: distributed everything plus routed interconnection; replace long wires with routed interconnect; from a centralized clump of CPUs to distributed ALUs and a routed bypass network; from a large centralized cache to a distributed shared cache.

6 Specification of TILE-Gx8036 processors

7 TILE-Gx8036: 36 cores; DDR3 DRAM; Rshim: boot controls, diagnostics; TRIO: transactional I/O with DMA; mPIPE: packet management; MiCA: hardware accelerators for crypto and compression.

8 TILE-Gx8036, each core. Processor: 1.2 GHz; 64-bit addressing mode; 3-way VLIW CPU. Storage: 32 KB L1I / L1D cache; 256 KB L2 cache; 9 MB coherent L3 cache (Dynamic Distributed Cache).
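One way to read the 9 MB figure on this slide: if the Dynamic Distributed Cache is taken to be the aggregate of the 36 per-tile 256 KB L2 caches (an assumption, since the slide does not spell this out), the arithmetic works out to 36 × 256 KB = 9216 KB = 9 MB. A minimal C sketch of that calculation follows; the function name is hypothetical.

```c
#include <assert.h>

/* Aggregate L2 capacity in KB for a chip with `tiles` cores that each
 * have `l2_kb` KB of L2 cache. Treating this sum as the "L3" size is an
 * assumption about how the slide's 9 MB figure is derived. */
unsigned aggregate_l2_kb(unsigned tiles, unsigned l2_kb)
{
    assert(tiles > 0 && l2_kb > 0);
    return tiles * l2_kb;
}
```

Calling `aggregate_l2_kb(36, 256)` gives the capacity in KB; dividing by 1024 converts it to MB.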

9 Processor pipelines. The pipeline consists of 6 main stages: Fetch, Branch Predict, Decode, Execute 0, Execute 1, and Write Back.

10 Processor pipelines: pipeline latencies

11 Switch interfaces. IDN: I/O dynamic network; UDN: user dynamic network; RDN: memory response network; QDN: memory request network; SDN: shared dynamic network.

12 Operating system / process isolation. Hardwall: prevents unwanted communication between user applications running on adjacent tiles, using a programmable protection bit on each output port of the UDN or STN. Hardwall also provides a powerful virtualization tool.

13 Network arbitration. Packets requiring the same output port are blocked until the current packet has finished routing. Arbitration is basically round robin, in one of two forms: round robin, or network priority round robin. Routing algorithm: the X dimension is checked first, then the Y dimension.
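The routing and arbitration rules on this slide can be sketched in a few lines of C. This is an illustration only, not Tilera's switch logic: the port names, the coordinate convention (x grows eastward, y grows southward), and both function names are assumptions. Only plain round robin is shown; the network priority variant is not modelled.

```c
#include <assert.h>

/* Hypothetical port names for one tile's switch; not Tilera identifiers. */
typedef enum { PORT_LOCAL, PORT_EAST, PORT_WEST, PORT_NORTH, PORT_SOUTH } port_t;

/* Dimension-ordered (XY) routing: resolve the X offset first, then Y.
 * Assumes x grows eastward and y grows southward (illustrative only). */
port_t xy_next_port(int cur_x, int cur_y, int dst_x, int dst_y)
{
    if (dst_x > cur_x) return PORT_EAST;
    if (dst_x < cur_x) return PORT_WEST;
    if (dst_y > cur_y) return PORT_SOUTH;
    if (dst_y < cur_y) return PORT_NORTH;
    return PORT_LOCAL;               /* packet has arrived */
}

/* Plain round-robin arbitration for one output port. Bit i of `requests`
 * is set when input i wants the port. The search starts just after
 * `last_grant`, so every requester is eventually served.
 * Returns the granted input index, or -1 if nobody is requesting. */
int round_robin_grant(unsigned requests, int n, int last_grant)
{
    assert(n > 0 && n <= 32);
    for (int i = 1; i <= n; i++) {
        int candidate = (last_grant + i) % n;
        if (requests & (1u << candidate))
            return candidate;
    }
    return -1;
}
```

A packet at tile (0, 0) headed for (2, 3) keeps leaving through the east port until its X coordinate matches, and only then turns south.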

14 System software stack: Tile processor hardware; hypervisor; supervisor (Tile Linux); applications / user. 4 different modes for tiles. Standard: SMP Tile Linux (2.6.38). Dataplane: Zero Overhead Linux. Bare metal environment: user-created run-time environment. Dedicated: tile for debugging.

15 Bare metal environment: a run-time environment that allows users to run applications that require direct access to the hardware. Abilities: full access to all hardware resources; install interrupt vectors; virtual/physical memory allocator; I/O device setup; UDN/IDN (can also communicate with SMP Linux); libc utilities that do not depend on OS system services.

16 Power management. Dynamic voltage and frequency scaling (DVS, DFS) are available; configurable I/O and accelerator shutdowns; hardware-initiated zero-latency tile sleep; software-initiated low-power tile NAP mode.

17 Multicore Development Environment (MDE). TILEmpower-Gx development environment: x86 host machine (bern.snu.ac.kr) with MDE 4.1/4.2 (RPM): operating systems, multicore profiler/debugger, evaluation platforms, KVM, IDE, gcc, and so on. Command shown on the slide: $ tile-monitor -flags

18 Computational performance evaluation

19 Computational performance evaluation. Benchmark scenario: matrix multiplication with OpenMP, C (1000 by 1000) = A (1000 by 1000) × B (1000 by 1000). Performance
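The benchmark source is not included in the transcript. As a rough idea of what an OpenMP matrix multiplication of this kind looks like, here is a minimal C sketch; the function name, the row-major layout, and the loop order are assumptions, not the presenter's code.

```c
#include <assert.h>
#include <stddef.h>

/* C = A x B for square n-by-n matrices stored row-major.
 * The pragma splits the outer loop across threads when the file is built
 * with OpenMP support (for example -fopenmp with gcc); without it the
 * pragma is ignored and the loop runs serially. */
void matmul(size_t n, const double *a, const double *b, double *c)
{
    assert(n > 0 && a && b && c);
    #pragma omp parallel for
    for (long i = 0; i < (long)n; i++) {
        for (size_t j = 0; j < n; j++)
            c[i * n + j] = 0.0;
        /* i-k-j order walks B and C along rows, which is cache friendly. */
        for (size_t k = 0; k < n; k++) {
            double aik = a[i * n + k];
            for (size_t j = 0; j < n; j++)
                c[i * n + j] += aik * b[k * n + j];
        }
    }
}
```

For the scenario on the slide, `n` would be 1000. Each thread writes a disjoint set of rows of C, so no locking is needed.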

20 Memory performance evaluation

21 Memory performance for each core. Memory access cycles for each core on ZOL (Zero Overhead Linux). Each entry reads: cycles to load buffer 0 in memory node 0 (blue on the slide) / cycles to load buffer 1 in memory node 1 (green on the slide). Legend: the number of cycles. The slide also carries a "Faster row" label.

Tile  0: 104 / 114    Tile  1: 106 / 112    Tile  2: 108 / 109    Tile  3: 109 / 108    Tile  4: 112 / 106    Tile  5: 114 / 104
Tile  6: 100 / 109    Tile  7: 102 / 107    Tile  8: 104 / 105    Tile  9: 106 / 103    Tile 10: 108 / 100    Tile 11: 109 / 100
Tile 12: 104 / 114    Tile 13: 106 / 112    Tile 14: 108 / 109    Tile 15: 109 / 108    Tile 16: 112 / 106    Tile 17: 114 / 104
Tile 18: 104 / 114    Tile 19: 106 / 112    Tile 20: 108 / 109    Tile 21: 109 / 108    Tile 22: 112 / 106    Tile 23: 114 / 104
Tile 24: 100 / 109    Tile 25: 102 / 107    Tile 26: 104 / 105    Tile 27: 106 / 103    Tile 28: 108 / 100    Tile 29: 109 / 100
Tile 30: 104 / 114    Tile 31: 106 / 112    Tile 32: 108 / 109    Tile 33: 109 / 108    Tile 34: 112 / 106    Tile 35: ***
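The ZOL numbers suggest that latency grows with distance from the memory node. A small C sketch makes that concrete, under two assumptions: tiles 0 to 5 form one row of the mesh, and the first value listed for each tile is the buffer 0 load. The function name is hypothetical.

```c
#include <assert.h>

/* Average increase in cycles per tile across one row of measurements,
 * computed from the first and last entries only. */
double cycles_per_hop(const int *cycles, int count)
{
    assert(cycles && count >= 2);
    return (double)(cycles[count - 1] - cycles[0]) / (count - 1);
}
```

Feeding in the buffer 0 values for tiles 0 to 5 (104, 106, 108, 109, 112, 114) gives (114 - 104) / 5 = 2 cycles per tile.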

22 Memory performance for each core. Memory access cycles for each core on BME (Bare Metal Environment). Each entry reads: cycles to load buffer 0 in memory node 0 (blue on the slide) / cycles to load buffer 1 in memory node 1 (green on the slide). Legend: the number of cycles. The slide also carries a "Faster row" label.

Tile  0: 103 / 113    Tile  1: 105 / 111    Tile  2: 107 / 108    Tile  3: 108 / 107    Tile  4: 111 / 105    Tile  5: 113 / 103
Tile  6: 100 / 108    Tile  7: 100 / 106    Tile  8: 103 / 104    Tile  9: 105 / 102    Tile 10: 107 / 100    Tile 11: 109 / 98
Tile 12: 103 / 113    Tile 13: 105 / 111    Tile 14: 107 / 108    Tile 15: 108 / 107    Tile 16: 111 / 105    Tile 17: 113 / 103
Tile 18: 103 / 113    Tile 19: 105 / 111    Tile 20: 107 / 108    Tile 21: 108 / 107    Tile 22: 111 / 105    Tile 23: 113 / 103
Tile 24: 100 / 108    Tile 25: 100 / 106    Tile 26: 103 / 104    Tile 27: 105 / 102    Tile 28: 107 / 100    Tile 29: 109 / 98
Tile 30: 100 / 113    Tile 31: 105 / 111    Tile 32: 107 / 108    Tile 33: 108 / 107    Tile 34: 111 / 105    Tile 35: 113 / 103
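To compare the two environments, the cycle counts for the same tiles can be subtracted pairwise. The C sketch below does this for any two equal-length series; the function name is hypothetical, and treating the first value listed for each tile as the buffer 0 load is an assumption.

```c
#include <assert.h>

/* Mean of a[i] - b[i] over `count` paired measurements. */
double mean_difference(const int *a, const int *b, int count)
{
    assert(a && b && count > 0);
    long sum = 0;
    for (int i = 0; i < count; i++)
        sum += a[i] - b[i];
    return (double)sum / count;
}
```

For tiles 0 to 5, the buffer 0 loads are 104, 106, 108, 109, 112, 114 on ZOL and 103, 105, 107, 108, 111, 113 on BME, so BME is one cycle faster on each of these tiles.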

23 Memory controller: memory controller block diagram

24 Thank you

