
1 Debunking the 100X GPU vs CPU Myth: An Evaluation of Throughput Computing on CPU and GPU Victor W. Lee, et al. Intel Corporation ISCA ’10 June 19-23, 2010, Saint-Malo, France

2 Mythbusters view on the topic CPU vs GPU: http://videosift.com/video/MythBusters-CPU-vs-GPU-or-Paintball-Cannons-are-Cool Full movie: http://www.nvidia.com/object/nvision08_gpu_v_cpu.html

3 The Initial Claim Over the past 4 years NVIDIA has made a great many claims regarding how porting various types of applications to run on GPUs instead of CPUs can tremendously improve performance by anywhere from 10x to 500x. But it actually began much earlier (SIGGRAPH 2004): http://pl887.pairlitesite.com/talks/2004-08-08-GP2-CPU-vs-GPU-BillMark.pdf

4 Intel’s Response? Intel, unsurprisingly, sees the situation differently, but has remained relatively quiet on the issue, possibly because Larrabee was going to be positioned as a discrete GPU.

5 Intel’s Response? The recent announcement that Larrabee has been repurposed as an HPC/scientific computing solution may therefore be partially responsible for Intel ramping up an offensive against NVIDIA's claims regarding GPU computing. At the International Symposium on Computer Architecture (ISCA) this June, a team from Intel presented a paper purporting to investigate the real-world performance delta between CPUs and GPUs.

6 But before that…. December 16, 2009 (one month after ISCA’s final papers were due): the Federal Trade Commission filed an antitrust-related lawsuit against Intel Wednesday, accusing the chip maker of deliberately attempting to hurt its competition and ultimately consumers. The Federal Trade Commission's complaint against Intel for alleged anticompetitive practices has a new twist: graphics chips.

7 2009 was expensive for Intel The European Commission fined Intel nearly 1.5 billion USD, the US Federal Trade Commission sued Intel on antitrust grounds, and Intel settled with AMD for another 1.25 billion USD. If nothing else it was an expensive year, and while Intel settling with AMD was a significant milestone for the company, it was not the end of their troubles.

8 Finally the settlement(s) The EU fine is still under appeal ($1.45B). 8/4/2010: Intel settles with the FTC. Then there is the whole Dell issue….

9 So back to the paper, what did Intel say? Throughput computing. Kernels. What is a kernel? Here, a small compute-intensive routine that dominates the running time of a larger throughput-computing application. Kernels selected: SGEMM, MC, Conv, FFT, SAXPY, LBM, Solv, SpMV, GJK, Sort, RC, Search, Hist, Bilat. (SAXPY is sketched below.)
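
To make "kernel" concrete, here is a minimal CUDA sketch of SAXPY (y = a*x + y), the simplest kernel on the list. It is an illustrative sketch of what such a kernel looks like, not the code Intel or NVIDIA actually benchmarked.

// saxpy.cu: minimal SAXPY sketch, for illustration only;
// NOT the implementation benchmarked in the paper.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;    // one element per thread
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);
    float *hx = new float[n], *hy = new float[n];     // host copies
    for (int i = 0; i < n; ++i) { hx[i] = 1.0f; hy[i] = 2.0f; }

    float *dx, *dy;                                   // device buffers
    cudaMalloc(&dx, bytes); cudaMalloc(&dy, bytes);
    cudaMemcpy(dx, hx, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy, bytes, cudaMemcpyHostToDevice);

    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, dx, dy); // 256 threads per block
    cudaMemcpy(hy, dy, bytes, cudaMemcpyDeviceToHost);

    printf("y[0] = %f\n", hy[0]);                     // expect 4.0 = 2*1 + 2
    cudaFree(dx); cudaFree(dy); delete[] hx; delete[] hy;
    return 0;
}

Note that SAXPY performs one multiply-add per two floats read and one written, so on both architectures it is bound by memory bandwidth rather than arithmetic; several of the other kernels share that character.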

10 The Hardware selected CPU: 3.2GHz Core i7-960, 6GB RAM. GPU: 1.3GHz eVGA GeForce GTX280 w/ 1GB.
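
For a rough sense of the raw hardware gap (common spec-sheet figures, not numbers taken from the slides): the Core i7-960 peaks at about 4 cores × 3.2 GHz × 8 single-precision FLOPs per cycle (4-wide SSE multiply plus 4-wide add) ≈ 102 GFLOP/s, while the GTX 280 peaks at about 240 scalar cores × 1.3 GHz × 3 FLOPs per cycle (dual-issued MAD plus MUL) ≈ 933 GFLOP/s. Even on paper, then, the single-precision gap is roughly 9x, not 100x, before memory behavior enters the picture.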

11 Optimizations: CPU: multithreading, cache blocking (sketched below), and reorganization of memory accesses for SIMDification. GPU: minimizing global synchronization, and using local shared buffers.
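
To illustrate cache blocking, here is a generic tiled matrix multiply in host-side C++ (a textbook sketch, not Intel's optimized SGEMM; the tile size of 64 is a hypothetical tuning choice):

// blocked_sgemm.cpp: generic cache-blocking sketch; a textbook illustration,
// NOT Intel's optimized SGEMM. Computes C += A * B, n x n, row-major.
#include <vector>

void sgemm_blocked(int n, const float *A, const float *B, float *C) {
    const int BS = 64;   // tile size: pick so three BSxBS tiles fit in cache
    for (int ii = 0; ii < n; ii += BS)
        for (int kk = 0; kk < n; kk += BS)
            for (int jj = 0; jj < n; jj += BS)
                // one tile at a time, so it stays resident in cache
                for (int i = ii; i < ii + BS && i < n; ++i)
                    for (int k = kk; k < kk + BS && k < n; ++k) {
                        float a = A[i * n + k];            // reused across j
                        for (int j = jj; j < jj + BS && j < n; ++j)
                            C[i * n + j] += a * B[k * n + j];  // unit stride
                    }
}

int main() {
    const int n = 256;
    std::vector<float> A(n * n, 1.0f), B(n * n, 1.0f), C(n * n, 0.0f);
    sgemm_blocked(n, A.data(), B.data(), C.data());
    return C[0] == float(n) ? 0 : 1;   // every entry of C should equal n
}

The unit-stride inner loop is also what enables "reorganization of memory accesses for SIMDification": a vectorizing compiler (or SSE intrinsics) can process the contiguous j updates four floats at a time.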

12 This even made Slashdot Hardware: Intel, NVIDIA Take Shots At CPU vs. GPU Performance

13 And PCWorld Intel: 2-year-old Nvidia GPU Outperforms 3.2GHz Core i7. Intel researchers have published the results of a performance comparison between their latest quad-core Core i7 processor and a two-year-old Nvidia graphics card, and found that the Intel processor can't match the graphics chip's parallel processing performance. http://www.pcworld.com/article/199758/intel_2yearold_nvidia_gpu_outperforms_32ghz_core_i7.html

14 From the paper's abstract: In the past few years there have been many studies claiming GPUs deliver substantial speedups...over multi-core CPUs...[W]e perform a rigorous performance analysis and find that after applying optimizations appropriate for both CPUs and GPUs the performance gap between an Nvidia GTX280 processor and the Intel Core i7 960 processor narrows to only 2.5x on average. Do you have a problem with this statement?

15 Intel's own paper indirectly raises a question when it notes: The previously reported LBM number on GPUs claims 114X speedup over CPUs. However, we found that with careful multithreading, reorganization of memory access patterns, and SIMD optimizations, the performance on both CPUs and GPUs is limited by memory bandwidth and the gap is reduced to only 5X.
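
A quick sanity check on that 5X, using commonly quoted peak specifications rather than measurements from the paper: the GTX 280's memory bandwidth is about 141.7 GB/s (512-bit bus of ~2.2 GT/s GDDR3), while the Core i7-960's three channels of DDR3-1066 peak at about 25.6 GB/s. For a kernel that is bandwidth-bound on both machines, the speedup ceiling is therefore roughly 141.7 / 25.6 ≈ 5.5x, right in the neighborhood of the reported 5X.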

16 What is important about the context? The International Symposium on Computer Architecture (ISCA) in Saint-Malo, France, interestingly enough, is the same event where NVIDIA’s Chief Scientist Bill Dally received the prestigious 2010 Eckert-Mauchly Award for his pioneering work in architecture for parallel computing.

17 NVIDIA Blog Response: It’s a rare day in the world of technology when a company you compete with stands up at an important conference and declares that your technology is *only* up to 14 times faster than theirs. http://blogs.nvidia.com/blog/2010/06/23/gpus-are-only-up-to-14-times-faster-than-cpus-says-intel/

18 NVIDIA Blog Response: (cont) The real myth here is that multi-core CPUs are easy for any developer to use and see performance improvements.

19 Undergraduate students learning parallel programming at M.I.T. disputed this when they looked at the performance increase they could get from different processor types and compared this with the amount of time they needed to spend in re-writing their code. According to them, for the same investment of time as coding for a CPU, they could get more than 35x the performance from a GPU.

20 Despite substantial investments in parallel computing tools and libraries, efficient multi-core optimization remains in the realm of experts like those Intel recruited for its analysis. In contrast, the CUDA parallel computing architecture from NVIDIA is a little over 3 years old and already hundreds of consumer, professional and scientific applications are seeing speedups ranging from 10 to 100x using NVIDIA GPUs.

21 Questions Where did the 2.5x, 5x, and 14x come from? How big were the problems that Intel used for comparisons? [compare w/ cache size] How were they selected? What optimizations were done?

22 Fermi cards were almost certainly unavailable when Intel commenced its project, but it's still worth noting that some of the GF100's architectural advances partially address (or at least alleviate) certain performance-limiting handicaps Intel points to when comparing Nehalem to a GT200 processor.

23 Bottom Line Parallelization is hard, whether you're working with a quad-core x86 CPU or a 240-core GPU; each architecture has strengths and weaknesses that make it better or worse at handling certain kinds of workloads.

24 Other Reading On the Limits of GPU Acceleration: http://www.usenix.org/event/hotpar10/tech/full_papers/Vuduc.pdf

