
1 ICAL: Capabilities and Limitations of Distributed Computing Provided by the GPU Architecture

2 Outline
– Parallel computing with GPU
– NVIDIA CUDA
– SVD matrix computation
– Conclusion

3 Parallel computing with GPU
– Parallel computing
– Flynn's Taxonomy
– Algorithm decomposition
– Amdahl's Law
– Correctness concepts

4 Parallel computing
Parallel computing is a form of computation in which many calculations are carried out simultaneously.
Parallel computer hardware:
– Single machine: multi-core CPU, GPU
– Multiple machines: clusters, MPPs, grids

5 Parallel computing (cont.)
There are several kinds of parallelism, such as:
– Bit-level
– Instruction-level
– Data decomposition
– Task decomposition
Parallel computing has a speedup limit.

6 Algorithm decomposition
– Task decomposition: to prepare the dinner, different subtasks (purchasing, cooking, cleaning the table) are assigned to different workers; Mary goes shopping while John cleans the table.
– Data decomposition: a single task, washing the dishes, is split over its data; John and Mary each wash part of the dishes.
– Goal: enjoy the dinner.

7 Flynn's Taxonomy
                          Single data    Multiple data
  Single instruction      SISD           SIMD
  Multiple instruction    MISD           MIMD

8 Amdahl's Law
Amdahl's law is a model for the expected speedup from a partial improvement:
    Speedup = 1 / ((1 - P) + P/S)
– P: parallel portion of the program
– S: speedup of the parallel portion
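
To make the speedup limit concrete, here is a minimal host-side sketch (not part of the original slides; the parallel fractions and core count are hypothetical) that evaluates the formula:

```cuda
// Illustrative only: evaluate Amdahl's Law for a parallel fraction P and a
// parallel-portion speedup S.  Even with S = 96 (one thread per core of a
// GeForce 9600 GSO), a 5% serial part caps the overall speedup near 17x.
#include <cstdio>

static double amdahl_speedup(double p, double s) {
    return 1.0 / ((1.0 - p) + p / s);   // Speedup = 1 / ((1-P) + P/S)
}

int main() {
    printf("P=0.95, S=96 -> %.2fx\n", amdahl_speedup(0.95, 96.0));  // ~16.7x
    printf("P=0.50, S=96 -> %.2fx\n", amdahl_speedup(0.50, 96.0));  // ~1.98x
    return 0;
}
```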

9 Correctness concepts
– Race condition: two threads update a shared variable without synchronization; e.g. both read a = 19 and each writes back a = a + 1, so the final value depends on the interleaving (a may end up 20 instead of 21). ERROR!
– Deadlock: threads each wait for a resource held by another, so none can proceed.
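
A minimal CUDA sketch of the race condition above (not from the slides): many threads increment one shared counter, first without synchronization and then with atomicAdd, which serializes the read-modify-write.

```cuda
// Illustrative sketch: a data race on a shared counter, and the atomic fix.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void racy_increment(int *counter) {
    // Unsynchronized read-modify-write: many threads read the same old
    // value, so most increments are lost.
    *counter = *counter + 1;
}

__global__ void atomic_increment(int *counter) {
    // atomicAdd makes the read-modify-write indivisible; no updates are lost.
    atomicAdd(counter, 1);
}

int main() {
    int *d_counter, h_counter = 0;
    cudaMalloc(&d_counter, sizeof(int));

    cudaMemset(d_counter, 0, sizeof(int));
    racy_increment<<<64, 256>>>(d_counter);
    cudaMemcpy(&h_counter, d_counter, sizeof(int), cudaMemcpyDeviceToHost);
    printf("racy   : %d (expected 16384)\n", h_counter);  // usually far less

    cudaMemset(d_counter, 0, sizeof(int));
    atomic_increment<<<64, 256>>>(d_counter);
    cudaMemcpy(&h_counter, d_counter, sizeof(int), cudaMemcpyDeviceToHost);
    printf("atomic : %d (expected 16384)\n", h_counter);  // 16384

    cudaFree(d_counter);
    return 0;
}
```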

10 NVIDIA CUDA
– Historical Trends
– CUDA
– Programming Languages
– Reported Speedup

11 Historical Trends

12 CUDA
CUDA (Compute Unified Device Architecture) is a parallel computing engine built into NVIDIA GPUs (graphics processing units).
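
As a minimal illustration of the CUDA programming model (a sketch, not code from the talk): a kernel runs the same code across many threads, one data element per thread. Note that cudaMallocManaged is a newer convenience API that postdates this talk; explicit host/device copies are shown on the "Physical Reality" slide below.

```cuda
// Minimal CUDA kernel: element-wise vector addition (SIMD-style data
// decomposition, one element per thread).
#include <cstdio>
#include <cuda_runtime.h>

__global__ void vector_add(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *a, *b, *c;
    cudaMallocManaged(&a, bytes);   // managed memory keeps the sketch short
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vector_add<<<blocks, threads>>>(a, b, c, n);
    cudaDeviceSynchronize();

    printf("c[0] = %.1f\n", c[0]);  // 3.0
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```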

13 Programming Languages
– Application layer: C/C++, Fortran, OpenCL, ...
– Hardware layer: NVIDIA GPU with the CUDA Parallel Computing Architecture

14 Reported Speedup

15 CUDA Architecture
– Physical Reality behind CUDA
– CUDA Architectures
– Introducing the "Fermi" Architecture
– SM Architecture
– CUDA Core Architecture

16 Physical Reality behind CUDA
– CPU (host) with main memory
– GPU (device) with its own device memory
– Data moves between the two only through explicit copies (see the sketch below).
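
A hedged sketch (not from the slides) of this host/device split: data is allocated in main memory on the CPU side and in device memory on the GPU side, and crosses between them with explicit cudaMemcpy calls.

```cuda
// Host memory vs. device memory, and the explicit copies between them.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

int main() {
    const int n = 1024 * 1024;
    size_t bytes = n * sizeof(float);

    float *h_data = (float *)malloc(bytes);     // host (main memory)
    for (int i = 0; i < n; ++i) h_data[i] = (float)i;

    float *d_data = nullptr;
    cudaMalloc(&d_data, bytes);                 // device (GPU memory)

    // Each transfer crosses the bus; this is the data-passing overhead
    // mentioned in the conclusion slide.
    cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice);
    // ... launch kernels that operate on d_data here ...
    cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost);

    printf("h_data[42] = %.1f\n", h_data[42]);
    cudaFree(d_data);
    free(h_data);
    return 0;
}
```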

17 CUDA Architectures
– G80: first CUDA-capable processor
– G8x, G9x: global memory
– GT200: double precision, shared memory, larger register file, relaxed memory coalescing rules
Basic CUDA architecture

18 "Fermi" Architecture
– 3 billion transistors
– Over 2x the cores (512 total)
– 8x the peak double-precision performance
– L1 and L2 caches
– ~2x memory bandwidth
– Up to 1 terabyte of GPU memory

19 SM Architecture
– 32 CUDA cores per SM (Streaming Multiprocessor)
– 8x peak double-precision floating-point performance
– Dual thread scheduler
– 64 KB of RAM for shared memory and L1 cache

20 CUDA Core Architecture
– New IEEE 754-2008 floating-point standard
– Fused multiply-add (FMA) instruction for both single and double precision
– Newly designed integer ALU optimized for 64-bit and extended-precision operations
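
A small sketch (not from the slides) of the FMA instruction in use: fmaf() computes a*b + c with a single rounding step, which is both faster and more accurate than a separate multiply and add.

```cuda
// SAXPY written with an explicit fused multiply-add.
#include <cuda_runtime.h>

__global__ void axpy_fma(int n, float alpha, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = fmaf(alpha, x[i], y[i]);   // y = alpha*x + y as one FMA
}
```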

21 SVD matrix computation
– SVD
– SVD matrix computation
– Experiment Datasets
– Experiment Environment
– Experiment Results

22 SVD
The singular value decomposition (SVD) is an important matrix factorization, with many applications in signal processing and statistics. Suppose M is an m-by-n matrix; then there exists a factorization of the form
    M = U Σ V*
where U is an m-by-m unitary matrix, Σ is an m-by-n diagonal matrix with nonnegative singular values on its diagonal, and V* is the conjugate transpose of an n-by-n unitary matrix V.
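
The slides do not say which GPU SVD implementation was used. As an illustration only, here is a hedged sketch using cusolverDnSgesvd from NVIDIA's cuSOLVER library, which postdates this talk; the matrix values are hypothetical and gesvd requires m >= n.

```cuda
// Single-precision SVD of a small column-major matrix on the GPU (cuSOLVER).
#include <cstdio>
#include <cuda_runtime.h>
#include <cusolverDn.h>

int main() {
    const int m = 4, n = 3, lda = m;
    float h_A[lda * n] = {1, 2, 3, 4,  5, 6, 7, 8,  9, 10, 11, 12};
    float h_S[n];                                    // singular values

    float *d_A, *d_S, *d_U, *d_VT, *d_work;
    int *d_info;
    cudaMalloc(&d_A,  sizeof(float) * lda * n);
    cudaMalloc(&d_S,  sizeof(float) * n);
    cudaMalloc(&d_U,  sizeof(float) * m * m);
    cudaMalloc(&d_VT, sizeof(float) * n * n);
    cudaMalloc(&d_info, sizeof(int));
    cudaMemcpy(d_A, h_A, sizeof(float) * lda * n, cudaMemcpyHostToDevice);

    cusolverDnHandle_t handle;
    cusolverDnCreate(&handle);

    int lwork = 0;
    cusolverDnSgesvd_bufferSize(handle, m, n, &lwork);
    cudaMalloc(&d_work, sizeof(float) * lwork);

    // M = U * Sigma * V^T ; 'A' requests the full U and V^T factors.
    cusolverDnSgesvd(handle, 'A', 'A', m, n,
                     d_A, lda, d_S, d_U, m, d_VT, n,
                     d_work, lwork, nullptr, d_info);

    cudaMemcpy(h_S, d_S, sizeof(float) * n, cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i)
        printf("sigma[%d] = %f\n", i, h_S[i]);

    cusolverDnDestroy(handle);
    cudaFree(d_A); cudaFree(d_S); cudaFree(d_U); cudaFree(d_VT);
    cudaFree(d_work); cudaFree(d_info);
    return 0;
}
```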

23 SVD matrix computation
Pipeline: image → RGB pixel matrix → SVD → factor matrices

24 Experiment Datasets
– 3 test images
– RGB full color
– 1024 x 1024 (1,048,576 pixels total)

25 Experiment Environment

GPU device: NVIDIA GeForce 9600 GSO
– Cores: 96
– Processor clock: 1375 MHz
– Standard memory: 384 MB
– Memory bandwidth: 38.4 GB/sec

CPU device: Intel Core 2 Quad Q9300
– Cores: 4
– Processor clock: 2.5 GHz
– FSB speed: 1333 MHz
– L2 cache: 6 MB

26 Experiment Results

27 Conclusion
– Using the GPU to improve program speed is feasible.
– NVIDIA CUDA works well for SIMD-style parallel computing.
– However, there is an additional cost for passing data between main memory and GPU memory.
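
A hedged sketch (not from the slides) of measuring that data-passing cost with CUDA events, sized roughly like one 1024x1024 RGB image stored as floats:

```cuda
// Time a host-to-device copy so the transfer overhead can be weighed
// against the kernel speedup.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 3UL * 1024 * 1024 * sizeof(float);  // ~12 MB
    float *h_buf = (float *)malloc(bytes);
    float *d_buf;
    cudaMalloc(&d_buf, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("host -> device copy: %.3f ms (%.2f GB/s)\n",
           ms, (bytes / 1e9) / (ms / 1e3));

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_buf);
    free(h_buf);
    return 0;
}
```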

