Heterogeneous Computing: New Directions for Efficient and Scalable High-Performance Computing Dr. Jason D. Bakos.


1 Heterogeneous Computing: New Directions for Efficient and Scalable High-Performance Computing Dr. Jason D. Bakos

2 Logic Synthesis (CSCE 190: Computing in the Modern World)
Behavior:
– C = A + B
– Assume A is 2 bits, B is 2 bits, C is 3 bits

A       B       C
00 (0)  00 (0)  000 (0)
00 (0)  01 (1)  001 (1)
00 (0)  10 (2)  010 (2)
00 (0)  11 (3)  011 (3)
01 (1)  00 (0)  001 (1)
01 (1)  01 (1)  010 (2)
01 (1)  10 (2)  011 (3)
01 (1)  11 (3)  100 (4)
10 (2)  00 (0)  010 (2)
10 (2)  01 (1)  011 (3)
10 (2)  10 (2)  100 (4)
10 (2)  11 (3)  101 (5)
11 (3)  00 (0)  011 (3)
11 (3)  01 (1)  100 (4)
11 (3)  10 (2)  101 (5)
11 (3)  11 (3)  110 (6)
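The sum column of the truth table above can be reproduced from gates alone. The following is a minimal sketch, assuming a ripple-carry structure built from two one-bit full adders; the slide does not say which circuit the synthesis tool would actually produce, and the names `full_adder`, `add_2bit`, and `truth_table` are illustrative only.

```python
from itertools import product


def full_adder(a, b, carry_in):
    """One-bit full adder built only from XOR, AND, and OR operations."""
    partial = a ^ b
    total = partial ^ carry_in
    carry_out = (a & b) | (partial & carry_in)
    return total, carry_out


def add_2bit(a, b):
    """Add two 2-bit values bit by bit and return the 3-bit sum as an int."""
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1
    s0, c0 = full_adder(a0, b0, 0)   # low bit, no carry in
    s1, c1 = full_adder(a1, b1, c0)  # high bit, carry from the low bit
    return (c1 << 2) | (s1 << 1) | s0


def truth_table():
    """All 16 rows as (A, B, sum) bit strings, in the same order as the slide."""
    return [
        (format(a, "02b"), format(b, "02b"), format(add_2bit(a, b), "03b"))
        for a, b in product(range(4), repeat=2)
    ]


if __name__ == "__main__":
    for row in truth_table():
        print(*row)
```

Running the script prints one row per input pair, which can be compared line by line against the table.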

3 Logic Gates: inv, NAND2, NAND3, NOR2

4 Layout: 3-input NAND

5 Minimum Feature Size (CSCE 791, April 2, 2010)

Year  Processor       Speed                           Transistors   Process
1982  i286            6 - 25 MHz                      ~134,000      1.5 µm
1986  i386            16 - 40 MHz                     ~270,000      1 µm
1989  i486            16 - 133 MHz                    ~1 million    0.8 µm
1993  Pentium         60 - 300 MHz                    ~3 million    0.6 µm
1995  Pentium Pro     150 - 200 MHz                   ~4 million    0.5 µm
1997  Pentium II      233 - 450 MHz                   ~5 million    0.35 µm
1999  Pentium III     450 - 1400 MHz                  ~10 million   0.25 µm
2000  Pentium 4       1.3 - 3.8 GHz                   ~50 million   0.18 µm
2005  Pentium D       2 cores/package                 ~200 million  0.09 µm
2006  Core 2          2 cores/die                     ~300 million  0.065 µm
2008  Core i7         4 cores/die, 8 threads/die      ~800 million  0.045 µm
2010  "Sandy Bridge"  8 cores/die, 16 threads/die ??  ??            0.032 µm

6 Computer Architecture Trends
Multi-core architecture:
– Individual cores are large and heavyweight, designed to force performance out of generalized code
– Programmer utilizes multiple cores using OpenMP
Figure labels: CPU, L2 cache (~50% of chip), memory

7 Co-Processors
Special-purpose (not general) processor
Accelerates CPU

8 IBM Cell/B.E. Architecture
1 PPE, 8 SPEs
Programmer must manually manage the 256 KB memory and thread invocation on each SPE
Each SPE includes a vector unit like the one on current Intel processors
– 128 bits wide

9 High-Performance Reconfigurable Computing
Heterogeneous computing with reconfigurable logic, i.e. FPGAs

10 Programming FPGAs

11 Heterogeneous Computing
Application profile (figure labels):
– initialization: 0.5% of run time
– "hot" loop: 99% of run time, offloaded to the co-processor
– clean up: 0.5% of run time
– code share labels: 49% of code, 1% of code

Kernel speedup  Application speedup  Execution time
50              34                   5.0 hours
100             50                   3.3 hours
200             67                   2.5 hours
500             83                   2.0 hours
1000            91                   1.8 hours

Example:
– Application requires a week of CPU time
– Offloaded computation consumes 99% of execution time
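The speedup table above follows from Amdahl's law: only the offloaded 99% of run time gets faster, while the remaining 1% still runs on the CPU at its original speed. The sketch below is one way to reproduce the figures, assuming the week of CPU time is 7 × 24 = 168 hours and ignoring data-transfer overhead; the function names are illustrative and do not come from the slides.

```python
def application_speedup(kernel_speedup, offload_fraction=0.99):
    """Amdahl's law: only the offloaded fraction of run time is accelerated."""
    serial_fraction = 1.0 - offload_fraction
    return 1.0 / (serial_fraction + offload_fraction / kernel_speedup)


def execution_hours(kernel_speedup, baseline_hours=7 * 24, offload_fraction=0.99):
    """Run time after offloading, starting from one week (168 hours) on the CPU."""
    return baseline_hours / application_speedup(kernel_speedup, offload_fraction)


if __name__ == "__main__":
    print("kernel  application  time")
    for k in (50, 100, 200, 500, 1000):
        print(f"{k:>6}  {application_speedup(k):>11.0f}  {execution_hours(k):.1f} hours")
```

However large the kernel speedup becomes, the application speedup stays below 1 / 0.01 = 100, because the initialization and clean-up phases are not accelerated.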

12 Heterogeneous Computing with FPGAs
Annapolis Micro Systems WILDSTAR 2 PRO
GiDEL PROCSTAR III

13 Heterogeneous Computing with FPGAs
Convey HC-1

14 Heterogeneous Computing with GPUs
NVIDIA Tesla S1070

15 Heterogeneous Computing now Mainstream: IBM Roadrunner
Los Alamos, second fastest computer in the world
6,480 AMD Opteron (dual core) CPUs
12,960 PowerXCell 8i processors
Each blade contains 2 Opterons and 4 Cells
296 racks
First ever petaflop machine (2008)
1.71 petaflops peak (1.7 billion million fp operations per second)
2.35 MW (not including cooling)
– Lake Murray hydroelectric plant produces ~150 MW (peak)
– Lake Murray coal plant (McMeekin Station) produces ~300 MW (peak)
– Catawba Nuclear Station near Rock Hill produces 2258 MW
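The Roadrunner figures above can be cross-checked with a few lines of arithmetic. This is a rough sketch that takes the slide's numbers at face value; the derived quantities (blade count, peak MFLOPS per watt, share of a power plant's output) are not stated in the presentation, and the constant and function names are made up for illustration.

```python
# Figures copied from the slide.
OPTERONS = 6_480
CELLS = 12_960
OPTERONS_PER_BLADE = 2
CELLS_PER_BLADE = 4
PEAK_FLOPS = 1.71e15   # 1.71 petaflops peak
POWER_WATTS = 2.35e6   # 2.35 MW, not including cooling


def blade_count():
    """Blades implied by the CPU counts; both processor types must agree."""
    by_opteron = OPTERONS // OPTERONS_PER_BLADE
    by_cell = CELLS // CELLS_PER_BLADE
    assert by_opteron == by_cell, "slide figures are inconsistent"
    return by_opteron


def peak_mflops_per_watt():
    """Peak floating-point operations per second per watt, in MFLOPS/W."""
    return PEAK_FLOPS / POWER_WATTS / 1e6


def share_of_plant(plant_mw):
    """Fraction of a plant's output (given in MW) that the machine would draw."""
    return POWER_WATTS / (plant_mw * 1e6)


if __name__ == "__main__":
    print("blades:", blade_count())
    print(f"peak efficiency: {peak_mflops_per_watt():.0f} MFLOPS/W")
    for name, mw in (("hydro", 150), ("coal", 300), ("nuclear", 2258)):
        print(f"{name}: {share_of_plant(mw):.2%} of output")
```

Both processor counts imply the same number of blades, which suggests the per-blade figures on the slide are internally consistent.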

16 "Traditional" Parallel/Multi-Processing
Large-scale parallel platforms:
– Individual computers connected with a high-speed interconnect
Upper bound for speedup is n, where n = # processors
– How much parallelism in program?
– System, network overheads?
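The two questions on this slide can be made concrete with a small model. The sketch below assumes a fixed parallel fraction of the program and treats system and network overhead as a constant share of single-processor run time; both are simplifying assumptions introduced here, not a model given in the presentation, and `parallel_speedup` is a hypothetical name.

```python
def parallel_speedup(n, parallel_fraction, overhead_fraction=0.0):
    """Estimated speedup on n processors.

    parallel_fraction: share of single-processor run time that can be split up.
    overhead_fraction: extra time, as a share of single-processor run time,
        lost to system and network overhead (a hypothetical constant).
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n + overhead_fraction)


if __name__ == "__main__":
    n = 16
    print(f"fully parallel, no overhead: {parallel_speedup(n, 1.0):.1f}x")
    print(f"90% parallel, no overhead:   {parallel_speedup(n, 0.9):.1f}x")
    print(f"90% parallel, 5% overhead:   {parallel_speedup(n, 0.9, 0.05):.1f}x")
```

Only a fully parallel program with no overhead reaches the upper bound of n; any serial portion or overhead pulls the result below it.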

17 Acknowledgement
Heterogeneous and Reconfigurable Computing Group
http://herc.cse.sc.edu
Zheming Jin, Tiffany Mintz, Krishna Nagar, Jason Bakos, Yan Zhang

