Download presentation

Presentation is loading. Please wait.

Published byYadiel Gardiner Modified over 2 years ago

1
Daniel Cederman and Philippas Tsigas Distributed Computing and Systems Chalmers University of Technology

2
System Model The Algorithm Experiments

3
System Model The Algorithm Experiments

4
Normal processorsGraphics processors Multi-core Large cache Few threads Speculative execution Many-core Small cache Wide and fast memory bus Thousands of threads hides memory latency

5
Designed for general purpose computation Extensions to C/C++

6
Global Memory

7
Multiprocessor 0 Multiprocessor 1 SharedMemorySharedMemory

8
Global Memory Multiprocessor 0 Multiprocessor 1 SharedMemorySharedMemory Thread Block 0 Thread Block 1 Thread Block x Thread Block y

9
Minimal, manual cache 32-word SIMD instruction Coalesced memory access No block synchronization Expensive synchronization primitives

10
System Model The Algorithm Experiments

11

12
Data Parallel!

13
NOT Data Parallel! NOT Data Parallel!

14
Phase One Sequences divided Block synchronization on the CPU Phase One Sequences divided Block synchronization on the CPU Thread Block 1Thread Block 2

15
Phase Two Each block is assigned own sequence Run entirely on GPU Phase Two Each block is assigned own sequence Run entirely on GPU Thread Block 1Thread Block 2

16

17
Assign Thread Blocks

18
Uses an auxiliary array In-place too expensive

19
Fetch-And-AddonLeftPointerFetch-And-AddonLeftPointer LeftPointer = 0 Thread Block ONE Thread Block TWO 1 0

20
Fetch-And-AddonLeftPointerFetch-And-AddonLeftPointer LeftPointer = 2 Thread Block ONE Thread Block TWO

21
Fetch-And-AddonLeftPointerFetch-And-AddonLeftPointer LeftPointer = 4 Thread Block ONE Thread Block TWO

22
LeftPointer =

23
Three way partition

24
< > = > <<< > < > < = < > < >>> < > = < >> ===

25
Recurse on smallest sequence ◦ Max depth log(n) Explicit stack Alternative sorting ◦ Bitonic sort

26
System Model The Algorithm Experiments

27
GPU-Quicksort GPU-QuicksortCederman and Tsigas, ESA08 GPUSort GPUSortGovindaraju et. al., SIGMOD05 Radix-Merge Radix-MergeHarris et. al., GPU Gems 3 ’07 Global radix Global radixSengupta et. al., GH07 Hybrid HybridSintorn and Assarsson, GPGPU07 Introsort Introsort David Musser, Software: Practice and Experience

28
Uniform Gaussian Bucket Sorted Zero Stanford Models

29
Uniform Gaussian Bucket Sorted Zero Stanford Models

30
Uniform Gaussian Bucket Sorted Zero Stanford Models

31
Uniform Gaussian Bucket Sorted Zero Stanford Models

32
Uniform Gaussian Bucket Sorted Zero Stanford Models

33
Uniform Gaussian Bucket Sorted Zero Stanford Models P x1x1x1x1 x2x2x2x2 x3x3x3x3 x4x4x4x4

34
First Phase ◦ Average of maximum and minimum element Second Phase ◦ Median of first, middle and last element

35
8800GTX ◦ 16 multiprocessors (128 cores) ◦ 86.4 GB/s bandwidth 8600GTS ◦ 4 multiprocessors (32 cores) ◦ 32 GB/s bandwidth CPU-Reference ◦ AMD Dual-Core Opteron 265 / 1.8 GHz

36

37

38

39
CPU-Reference ~4s CPU-Reference ~4s

40
GPUSort and Radix-Merge ~1.5s GPUSort and Radix-Merge ~1.5s

41
GPU-Quicksort ~380ms Global Radix ~510ms GPU-Quicksort ~380ms Global Radix ~510ms

42

43

44

45
Reference is faster than GPUSort and Radix-Merge

46
GPU-Quicksort takes less than 200ms GPU-Quicksort takes less than 200ms

47

48

49

50
Bitonic is unaffected by distribution

51
Three-way partition

52

53

54

55
Reference equal to Radix-Merge

56
Quicksort and hybrid ~4 times faster

57

58

59

60
Hybrid needs randomization

61
Reference better

62

63
Similar to uniform distribution

64
Sequence needs to be a power of two in length

65
Minimal, manual cache ◦ Used only for prefix sum and bitonic 32-word SIMD instruction ◦ Main part executes same instructions Coalesced memory access ◦ All reads coalesced No block synchronization ◦ Only required in first phase Expensive synchronization primitives ◦ Two passes amortizes cost

66
Quicksort is a viable sorting method for graphics processors and can be implemented in a data parallel way It is competitive

67

Similar presentations

© 2016 SlidePlayer.com Inc.

All rights reserved.

Ads by Google