Carlo del Mundo, Department of Electrical and Computer Engineering. Ubiquitous Parallelism: Are You Equipped To Code For Multi- and Many-Core Platforms?


1 Carlo del Mundo, Department of Electrical and Computer Engineering. Ubiquitous Parallelism: Are You Equipped To Code For Multi- and Many-Core Platforms?

2 Agenda Introduction/Motivation Why Parallelism? Why now? Survey of Parallel Hardware CPUs vs. GPUs Conclusion How Can I Start?

3 Talk Goal: Encourage undergraduates to answer the call to the era of parallelism. Education, Software Engineering

4 Why Parallelism? Why now? You’ve already been exposed to parallelism: Bit Level Parallelism, Instruction Level Parallelism, Thread Level Parallelism

5 Why Parallelism? Why now? Single-threaded performance has plateaued. Silicon Trends, Power Consumption, Heat Dissipation

6 Why Parallelism? Why now?

7 Power Chart: $P = CV^2F$

8 Heat Chart (Feature Size)

9 Why Parallelism? Why now? Issue: Power & Heat. Good: Cheaper to have more cores, but slower. Bad: Breaks hardware/software contract
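
The power relation $P = CV^2F$ (capacitance times voltage squared times frequency) gives one way to read the "more cores, but slower" trade-off. The following is a rough sketch, not a result from the talk. It assumes that dynamic power dominates, that supply voltage can be lowered in proportion to clock frequency (the scaling factor $s$ is introduced here for illustration), that every core has the same capacitance $C$, and that the workload parallelizes perfectly:

```latex
\begin{align*}
P_{\text{one core}} &= C V^2 F \\
P_{\text{scaled core}} &= C\,(sV)^2\,(sF) = s^3\, C V^2 F \\
\text{with } s = 0.8:\quad P_{\text{two scaled cores}} &= 2 \times 0.8^3\, C V^2 F = 1.024\, C V^2 F \\
\text{aggregate clock rate} &= 2 \times 0.8\,F = 1.6\,F
\end{align*}
```

Under those assumptions, two slower cores offer about 1.6 times the aggregate clock rate for roughly the same power as one fast core. Real chips scale less cleanly, and the gain only appears if software actually uses both cores.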

10 Why Parallelism? Why now? Hardware/Software Contract: Maintain backwards-compatibility with existing codes

11 Why Parallelism? Why now?

12 Agenda Introduction/Motivation Why Parallelism? Why now? Survey of Parallel Hardware CPUs vs. GPUs Conclusion How Can I Start?

13 Personal Mobile Device Space: iPhone 5, Galaxy S3

14 Personal Mobile Device Space: iPhone 5 (2 CPU cores / 3 GPU cores), Galaxy S3

15 Personal Mobile Device Space: iPhone 5 (2 CPU cores / 3 GPU cores), Galaxy S3 (4 CPU cores / 4 GPU cores)

16 Desktop Space

17 Desktop Space: AMD Opteron 6272, 16 CPU cores. Rare To Have “Single Core” CPU. Clock Speeds < 3.0 GHz: Power Wall, Heat Dissipation

18 Desktop Space: AMD Radeon 7970, 2048 GPU Cores. General Purpose, Power Efficient, High Performance. Not All Problems Can Be Done on GPU

19 Warehouse Space (HokieSpeed). Each node: 2x Intel Xeon 5645 (6 cores each), 2x NVIDIA C2050 (448 GPU cores each)

20 Warehouse Space (HokieSpeed). Each node: 2x Intel Xeon 5645 (6 cores each), 2x NVIDIA C2050 (448 GPU cores each). 209 nodes

21 Warehouse Space (HokieSpeed). Each node: 2x Intel Xeon 5645 (6 cores each), 2x NVIDIA C2050 (448 GPU cores each). 209 nodes ★ 2508 CPU cores ★ 187264 GPU cores
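
The cluster totals follow directly from the per-node figures. A small sketch of that arithmetic, using only the numbers on the HokieSpeed slides (the function and constant names are illustrative):

```python
# Per-node figures and node count as listed on the HokieSpeed slides.
NODES = 209
CPUS_PER_NODE = 2
CORES_PER_CPU = 6      # Intel Xeon 5645
GPUS_PER_NODE = 2
CORES_PER_GPU = 448    # NVIDIA C2050


def total_cores(nodes, devices_per_node, cores_per_device):
    """Total cores across the cluster for one device type."""
    return nodes * devices_per_node * cores_per_device


cpu_cores = total_cores(NODES, CPUS_PER_NODE, CORES_PER_CPU)
gpu_cores = total_cores(NODES, GPUS_PER_NODE, CORES_PER_GPU)

print(cpu_cores)  # 2508
print(gpu_cores)  # 187264
```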

22 All Spaces

23 Convergence in Computing. Three Classes: Warehouse, Desktop, Personal Mobile Device. Main Criteria: Power, Performance, Programmability

24 Agenda Introduction/Motivation Why Parallelism? Why now? Survey of Parallel Hardware CPUs vs. GPUs Conclusion How Can I Start?

25 What is a CPU? CPU = SR71 Jet: Capacity 2 passengers, Top Speed 2200 mph

26 What is the GPU? GPU = Boeing 747: Capacity 605 passengers, Top Speed 570 mph

27 CPU vs. GPU
Vehicle | Capacity (passengers) | Speed (mph) | Throughput (passengers * mph)
“CPU” Fighter Jet | 2 | 2200 | 4400
“GPU” 747 | 452 | 555 | 250,860
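
The throughput column is capacity multiplied by speed. A minimal sketch of the comparison, using the table's figures; the 1000-mile trip distance is a made-up value added here to show the latency side of the analogy:

```python
def throughput(capacity_passengers, speed_mph):
    """Passenger-miles moved per hour: the table's throughput measure."""
    return capacity_passengers * speed_mph


def trip_hours(distance_miles, speed_mph):
    """Latency of a single trip, ignoring takeoff and landing."""
    return distance_miles / speed_mph


fighter_jet = throughput(2, 2200)
boeing_747 = throughput(452, 555)
print(fighter_jet)  # 4400
print(boeing_747)   # 250860

# Hypothetical 1000-mile trip: the "CPU" finishes one trip sooner,
# while the "GPU" moves far more passengers per hour.
DISTANCE_MILES = 1000
print(trip_hours(DISTANCE_MILES, 2200) < trip_hours(DISTANCE_MILES, 555))  # True
```

The jet wins on latency for a single trip and the 747 wins on throughput, which is the distinction the CPU and GPU architecture slides build on.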

28 CPU Architecture: Latency Oriented (Speculation)

29 GPU Architecture

30 APU = CPU + GPU. Accelerated Processing Unit: Both CPU + GPU on the same die

31 CPUs, GPUs, APUs. How to handle parallelism? How to extract performance? Can I just throw processors at a problem?

32 CPUs, GPUs, APUs. Multi-threading (2-16 threads). Massive multi-threading (100,000+). Depends on Your Problem

33 Agenda Introduction/Motivation Why Parallelism? Why now? Survey of Parallel Hardware CPUs vs. GPUs Conclusion How Can I Start?

34 How Can I start? CUDA Programming: You most likely have a CUDA-enabled GPU if you have a recent NVIDIA card

35 How Can I start? CPU or GPU Programming: Use OpenCL (your laptop could potentially run it)

36 How Can I start? Undergraduate research. Senior/Grad Courses: CS 4234 – Parallel Computation, CS 5510 – Multiprocessor Programming, ECE 4504/5504 – Computer Architecture, CS 5984 – Advanced Computer Graphics

37 In Summary … Parallelism is here to stay. How does this affect you? How fast is fast enough? Are we content with current computer performance?

38 Thank you! Carlo del Mundo, Senior, Computer Engineering. Website: http://filebox.vt.edu/users/cdel/ E-mail: cdel@vt.edu Previous Internships @

39 Appendix

40 Programming Models: pthreads, MPI, CUDA, OpenCL

41 pthreads: A UNIX API to create and destroy threads
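
The create-then-join lifecycle that pthreads exposes in C can be sketched in Python, whose standard `threading` module is layered on pthreads on UNIX-like systems. This is an analogy, not pthreads code; the worker function and the thread count are illustrative assumptions, not from the talk:

```python
import threading

NUM_THREADS = 4  # illustrative; the talk cites 2-16 threads for CPUs
results = [None] * NUM_THREADS


def worker(thread_id):
    """Each thread writes only to its own slot, so no lock is needed."""
    results[thread_id] = thread_id * thread_id


# Create and start the threads (the role pthread_create plays in C).
threads = [threading.Thread(target=worker, args=(i,)) for i in range(NUM_THREADS)]
for t in threads:
    t.start()

# Wait for every thread to finish (the role pthread_join plays in C).
for t in threads:
    t.join()

print(results)  # [0, 1, 4, 9]
```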

42 MPI: A communications protocol. “Send and Receive” messages between nodes
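
The send-and-receive pattern can be imitated in plain Python. The sketch below is not MPI: it runs inside one process, with `queue.Queue` mailboxes standing in for the transport between nodes, and the `send` and `recv` helpers are hypothetical names chosen here for illustration:

```python
import queue
import threading

# One mailbox per "node". In real MPI the library moves messages between
# separate processes, often across a network; queues stand in for that here.
mailboxes = {0: queue.Queue(), 1: queue.Queue()}


def send(dest, message):
    """Deliver a message to the destination node's mailbox."""
    mailboxes[dest].put(message)


def recv(rank):
    """Block until a message arrives for this node, then return it."""
    return mailboxes[rank].get()


def node1():
    """Node 1 receives a list of numbers and sends back their sum."""
    numbers = recv(1)
    send(0, sum(numbers))


t = threading.Thread(target=node1)
t.start()

send(1, [1, 2, 3, 4])  # node 0 sends work to node 1
reply = recv(0)        # node 0 waits for the answer
t.join()

print(reply)  # 10
```

The point of the pattern is that the two sides share no variables: all cooperation happens through explicit messages.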

43 CUDA: Massive multi-threading (100,000+), Thread-level parallelism

44 OpenCL: Heterogeneous programming model that caters to several devices (CPUs, GPUs, APUs)

45 Comparisons
 | pthreads | MPI | CUDA | OpenCL
Number of Threads | 2-16 | -- | 100,000+ | 2 – 100,000+
Platform | CPU only | Any Platform | NVIDIA Only | Any Platform
Productivity † | Easy | Medium | Hard
Parallelism through | Threads | Messages | Threads
† Productivity is subjective and draws from my experiences

46 Parallel Applications: Vector Add, Matrix Multiplication

47 Vector Add

48 Vector Add. Serial: Loop N times, N cycles †. Parallel: Assume you have N cores, 1 cycle †. † Assume 1 add = 1 cycle
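
The serial and parallel versions can be sketched as follows. The chunking scheme and the default worker count are assumptions made for illustration. In CPython the global interpreter lock keeps these threads from adding at the same time, so the sketch shows how the work is divided rather than a real speedup:

```python
from concurrent.futures import ThreadPoolExecutor


def vector_add_serial(a, b):
    """One loop, N iterations: N add steps in sequence."""
    return [a[i] + b[i] for i in range(len(a))]


def vector_add_parallel(a, b, workers=4):
    """Split the index range into chunks and add each chunk on its own worker."""
    n = len(a)
    chunk = max(1, (n + workers - 1) // workers)  # ceiling division, at least 1
    ranges = [(lo, min(lo + chunk, n)) for lo in range(0, n, chunk)]

    def add_chunk(bounds):
        lo, hi = bounds
        return [a[i] + b[i] for i in range(lo, hi)]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(add_chunk, ranges))  # results keep chunk order
    return [x for part in parts for x in part]


a = list(range(8))
b = [10 * x for x in a]
print(vector_add_serial(a, b))    # [0, 11, 22, 33, 44, 55, 66, 77]
print(vector_add_parallel(a, b))  # same result as the serial version
```

No element depends on any other, so with N workers each could handle exactly one add, which is the 1-cycle case on the slide.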

49 Matrix Multiplication

50 Matrix Multiplication

51 Matrix Multiplication

52 Matrix Multiplication: Embarrassingly Parallel. Let L be the length of each side: L^2 elements, each element requires L multiplies and L adds
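
The operation count can be checked with a naive implementation. This is a minimal sketch for square matrices stored as lists of lists; the function names are illustrative, and the accumulator starts at zero so that each element takes exactly L adds, matching the slide's count:

```python
def matmul_element(A, B, row, col):
    """Compute one output element and count the arithmetic it needs."""
    L = len(A)
    total = 0
    multiplies = adds = 0
    for k in range(L):
        total += A[row][k] * B[k][col]  # one multiply and one add
        multiplies += 1
        adds += 1
    return total, multiplies, adds


def matmul(A, B):
    """Multiply two L x L matrices, returning the product and the op count."""
    L = len(A)
    C = [[0] * L for _ in range(L)]
    ops = 0
    for row in range(L):
        for col in range(L):
            # Each element reads A and B but no other element of C,
            # so all L*L calls could run at the same time.
            C[row][col], m, a = matmul_element(A, B, row, col)
            ops += m + a
    return C, ops


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C, ops = matmul(A, B)
print(C)    # [[19, 22], [43, 50]]
print(ops)  # 16, which is 2 * L**3 for L = 2
```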

53 Performance: Operations/Second (FLOPS), Power (W), Throughput (# things/unit time), FLOPS/W

54 Puss In Boots: “Renders that took hours now take minutes” - Ken Museth, Effects R&D Supervisor, DreamWorks Animation

55 Computational Finance: Black-Scholes – A PDE that governs the price of an option, essentially “eliminating” risk
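
For a European call option, the Black-Scholes PDE has a well-known closed-form solution. The sketch below prices a small batch of options with it. The option parameters are made-up values, and this is an illustration of why the workload parallelizes, not the method used in the talk:

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(spot, strike, rate, volatility, years):
    """Closed-form Black-Scholes price of a European call option."""
    d1 = (math.log(spot / strike) + (rate + 0.5 * volatility ** 2) * years) / (
        volatility * math.sqrt(years)
    )
    d2 = d1 - volatility * math.sqrt(years)
    return spot * norm_cdf(d1) - strike * math.exp(-rate * years) * norm_cdf(d2)


# Hypothetical batch: (spot, strike, rate, volatility, years) per option.
options = [
    (100.0, 100.0, 0.05, 0.2, 1.0),
    (100.0, 110.0, 0.05, 0.2, 1.0),
]

# Each price depends only on its own inputs, so every option could be
# priced on a separate thread or GPU core.
prices = [black_scholes_call(*opt) for opt in options]
print([round(p, 2) for p in prices])
```

Because a real portfolio can hold a very large number of independent options, this kind of calculation maps naturally onto massively multi-threaded hardware.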

56 Genome Sequencing: Knowledge of the human genome can provide insights into new medicine and biotechnology, e.g., genetic engineering, hybridization

57 Applications

58 Why Should You Care? Trends: CPU Core Counts Double Every 2 Years (2006 – 2 cores, AMD Athlon 64 X2; 2010 – 8-12 cores, AMD Magny-Cours). Power Wall
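
The doubling trend can be checked against the two data points on the slide. A minimal sketch, assuming a smooth doubling every two years from the 2006 figure (the function name and its default parameters are illustrative):

```python
def projected_cores(year, base_year=2006, base_cores=2, doubling_years=2):
    """Core count if cores double every `doubling_years` from the base year."""
    return base_cores * 2 ** ((year - base_year) / doubling_years)


print(projected_cores(2006))  # 2.0
print(projected_cores(2010))  # 8.0
```

The 2010 projection of 8 cores sits at the low end of the 8-12 range quoted for that year.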

59 Then And Now: Today’s state-of-the-art hardware is yesterday’s supercomputer. 1998 – Intel TFLOPS supercomputer: 1.8 trillion floating point ops / sec (1.8 TFLOPS). 2008 – AMD Radeon 4870 X2 GPU: 2.4 trillion floating point ops / sec (2.4 TFLOPS)

