A Comparison-Free Sorting Algorithm on CPUs
Saleh Abdel-hafeez, Jordan (JUST); Ann Gordon-Ross, USA (UF); Samer AbuBaker, Jordan (JUST)
Highlights
- Principle and example
- Potential key factors
- CPU simulation
  - Single-threaded (no parallelism): C code (memory locality), execution-time simulations
  - Multi-threaded (parallelism): C code (atomics and semaphores vs. memory)
- Conclusions
Principle and Example
Potential Key Factors
- Two representations of each key: binary and one-hot, with N = 2^K
  - Idea: fewer computations
- Transpose memory mapping, with N = 2^K
  - Idea: fewer computations and less memory
  - Reduce the size of the one-hot matrix from N x N to N x 1
  - Improve locality (spatial and temporal)
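The two key factors can be illustrated with a small C sketch. It assumes that the transpose memory mapping means reading the one-hot matrix column by column, so that the N x N matrix collapses into an N x 1 vector of per-value counts; under that assumption the method behaves like a counting sort. K = 3, N = 8 and the function names `to_onehot`, `column_counts`, `direct_counts` and `emit_sorted` are hypothetical and are not taken from the authors' C code.

```c
#include <assert.h>

/* Illustrative sizes (hypothetical): K-bit keys give N = 2^K possible values;
   this sketch also uses exactly N input keys, so the one-hot matrix is N x N. */
#define K 3
#define N (1u << K)

/* Binary -> one-hot: row i holds a single 1, in the column equal to key[i]. */
static void to_onehot(const unsigned key[N], unsigned char onehot[N][N]) {
    for (unsigned i = 0; i < N; i++) {
        assert(key[i] < N);
        for (unsigned v = 0; v < N; v++)
            onehot[i][v] = 0;
        onehot[i][key[i]] = 1;
    }
}

/* Transposed (column-wise) read: summing column v over all rows gives
   count[v], so the N x N matrix collapses into an N x 1 vector. */
static void column_counts(unsigned char onehot[N][N], unsigned count[N]) {
    for (unsigned v = 0; v < N; v++) {
        count[v] = 0;
        for (unsigned i = 0; i < N; i++)
            count[v] += onehot[i][v];
    }
}

/* The same N x 1 vector built directly: one increment per key and no
   N x N matrix, i.e. N entries of memory instead of N * N. */
static void direct_counts(const unsigned key[N], unsigned count[N]) {
    for (unsigned v = 0; v < N; v++)
        count[v] = 0;
    for (unsigned i = 0; i < N; i++) {
        assert(key[i] < N);
        count[key[i]]++;
    }
}

/* Emit each value v count[v] times in index order (counts must sum to N):
   the keys come out sorted without any key-to-key comparison. */
static void emit_sorted(const unsigned count[N], unsigned out[N]) {
    unsigned k = 0;
    for (unsigned v = 0; v < N; v++)
        for (unsigned c = 0; c < count[v]; c++)
            out[k++] = v;
}
```

For the keys {5, 1, 7, 1, 0, 6, 3, 5}, both `column_counts` and `direct_counts` give {1, 2, 0, 1, 0, 2, 1, 1}, and `emit_sorted` then yields {0, 1, 1, 3, 5, 5, 6, 7}. Building the vector directly touches N entries instead of N x N, which is one reading of the memory saving listed above.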
CPU Single Thread
Loop 1 Time vs. Loop 2 Time (Memory Locality)
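The loops behind this comparison are not spelled out here. A hedged C sketch, assuming Loop 1 is the pass that scatters keys into the N x 1 vector and Loop 2 is the sequential read-out, times each loop separately with the standard `clock()` function; `cf_sort_timed` and its parameters are hypothetical names.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical two-loop comparison-free sort of n keys, each below 2^k_bits.
   Loop 1 scatters the keys into an N x 1 count vector (N = 2^k_bits);
   Loop 2 reads that vector sequentially and writes the sorted output.
   The CPU time of each loop is stored in *t1 and *t2 (seconds, via clock()).
   Returns 0 on success, -1 if the count vector cannot be allocated. */
int cf_sort_timed(const unsigned *key, size_t n, unsigned k_bits,
                  unsigned *out, double *t1, double *t2) {
    assert(k_bits < sizeof(size_t) * CHAR_BIT);
    size_t nvals = (size_t)1 << k_bits;
    size_t *count = calloc(nvals, sizeof *count);
    if (count == NULL)
        return -1;

    clock_t c0 = clock();
    /* Loop 1: the address touched depends on the key value, so for random
       input the accesses jump around the vector (weak spatial locality). */
    for (size_t i = 0; i < n; i++)
        count[key[i]]++;
    clock_t c1 = clock();

    /* Loop 2: count and out are both walked in increasing address order
       (strong spatial locality). */
    size_t pos = 0;
    for (size_t v = 0; v < nvals; v++)
        for (size_t c = 0; c < count[v]; c++)
            out[pos++] = (unsigned)v;
    clock_t c2 = clock();

    *t1 = (double)(c1 - c0) / CLOCKS_PER_SEC;
    *t2 = (double)(c2 - c1) / CLOCKS_PER_SEC;
    free(count);
    return 0;
}
```

Running it on, for example, 2^20 random keys with `k_bits = 20` and comparing `*t1` with `*t2` is one way to see how access order affects each loop; the actual split will depend on cache sizes and on the input distribution.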
Less Dependent on Input Distribution
CPU Single Thread (Execution-Time Simulation)
CPU Single Thread
- Significant: the fastest
- Minor effect of data-type distribution
- One-dimensional memory
- Fewer computations
- Easy to work with
- Less energy and power

Execution time vs. data size N = 2^K
K     Comparison-free    Quicksort
7     6                  15
8     41                 30
10    145                140
12    584                602
14    2317               2673
16    6839               11409
18    31414              47064
20    69519              148004
22    418684             456904
24    1828644            1842128
26    7654605            7859271
28    16689404           33662489
29    -                  68942299
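The table compares the comparison-free sort with quicksort. A hedged harness for a similar check on another machine times a counting-style comparison-free pass against the C library's `qsort` and verifies that both produce the same output. The C standard does not require `qsort` to be implemented as quicksort, and `cf_sort`, `cmp_unsigned` and `compare_with_qsort` are hypothetical names.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Comparison-free (counting-style) sort of n keys, each below 2^k_bits.
   Returns 0 on success, -1 if the count vector cannot be allocated. */
static int cf_sort(const unsigned *key, size_t n, unsigned k_bits, unsigned *out) {
    assert(k_bits < sizeof(size_t) * CHAR_BIT);
    size_t nvals = (size_t)1 << k_bits;
    size_t *count = calloc(nvals, sizeof *count);
    if (count == NULL)
        return -1;
    for (size_t i = 0; i < n; i++)
        count[key[i]]++;
    size_t pos = 0;
    for (size_t v = 0; v < nvals; v++)
        for (size_t c = count[v]; c > 0; c--)
            out[pos++] = (unsigned)v;
    free(count);
    return 0;
}

/* qsort comparator for unsigned keys; avoids the overflow of x - y. */
static int cmp_unsigned(const void *a, const void *b) {
    unsigned x = *(const unsigned *)a, y = *(const unsigned *)b;
    return (x > y) - (x < y);
}

/* Sorts the input both ways, stores CPU seconds in *t_cf and *t_qs, and
   returns 1 if the outputs match, 0 if they differ, -1 on allocation failure. */
int compare_with_qsort(const unsigned *key, size_t n, unsigned k_bits,
                       double *t_cf, double *t_qs) {
    size_t bytes = (n > 0 ? n : 1) * sizeof(unsigned);
    unsigned *a = malloc(bytes), *b = malloc(bytes);
    int result = -1;
    if (a != NULL && b != NULL) {
        memcpy(b, key, n * sizeof *b);   /* qsort sorts this copy in place */
        clock_t c0 = clock();
        int rc = cf_sort(key, n, k_bits, a);
        clock_t c1 = clock();
        qsort(b, n, sizeof *b, cmp_unsigned);
        clock_t c2 = clock();
        if (rc == 0) {
            *t_cf = (double)(c1 - c0) / CLOCKS_PER_SEC;
            *t_qs = (double)(c2 - c1) / CLOCKS_PER_SEC;
            result = memcmp(a, b, n * sizeof *a) == 0;
        }
    }
    free(a);
    free(b);
    return result;
}
```

Timings from such a harness depend on the compiler, the C library's `qsort` and the machine, so they are a sanity check rather than a reproduction of the figures above.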
CPU Multiple Threads (8 Threads, 4 Cores)
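With several threads, one way to avoid mutual exclusion is to give each thread its own N x 1 count vector and add the vectors together after all threads finish. The POSIX-threads sketch below assumes 8 threads to match the setup above; the struct and function names are hypothetical, and it is not presented as the authors' implementation. It trades memory (8 count vectors instead of one) for lock-free counting; compile with `-pthread`.

```c
#include <assert.h>
#include <limits.h>
#include <pthread.h>
#include <stdlib.h>

#define NTHREADS 8  /* 8 worker threads, as in the setup above */

/* Work for one thread: a half-open slice of the input and a pointer to
   that thread's own N x 1 count vector. */
struct cf_task {
    const unsigned *key;
    size_t begin, end;
    size_t *count;
};

/* Counts one slice into a private vector: no two threads write the same
   memory, so no mutex or semaphore is needed. */
static void *count_slice(void *arg) {
    struct cf_task *t = arg;
    for (size_t i = t->begin; i < t->end; i++)
        t->count[t->key[i]]++;
    return NULL;
}

/* Sorts n keys, each below 2^k_bits, into out. Memory cost: NTHREADS
   count vectors instead of one. Returns 0 on success, -1 on failure. */
int cf_sort_private(const unsigned *key, size_t n, unsigned k_bits, unsigned *out) {
    assert(k_bits < sizeof(size_t) * CHAR_BIT);
    size_t nvals = (size_t)1 << k_bits;
    size_t *counts = calloc((size_t)NTHREADS * nvals, sizeof *counts);
    if (counts == NULL)
        return -1;

    pthread_t tid[NTHREADS];
    struct cf_task task[NTHREADS];
    int started = 0, rc = 0;
    for (size_t t = 0; t < NTHREADS; t++) {
        task[t] = (struct cf_task){ .key = key,
                                    .begin = n * t / NTHREADS,
                                    .end = n * (t + 1) / NTHREADS,
                                    .count = counts + t * nvals };
        if (pthread_create(&tid[t], NULL, count_slice, &task[t]) != 0) {
            rc = -1;
            break;
        }
        started++;
    }
    for (int t = 0; t < started; t++)
        pthread_join(tid[t], NULL);

    if (rc == 0) {
        /* Merge (serial here): add the private vectors per value, then emit. */
        size_t pos = 0;
        for (size_t v = 0; v < nvals; v++) {
            size_t c = 0;
            for (size_t t = 0; t < NTHREADS; t++)
                c += counts[t * nvals + v];
            for (; c > 0; c--)
                out[pos++] = (unsigned)v;
        }
    }
    free(counts);
    return rc;
}
```

The merge step in this sketch is serial; it could itself be split across threads by value range.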
CPU Multi-Threaded (Execution Time)
Execution time vs. data size N = 2^K
K     8-thread     Non-threaded
7     345          6
8     333          41
10    363          145
12    386          584
14    1070         2317
16    2085         6839
18    7658         31414
20    17309        69519
22    58822        418684
24    234639       1828644
26    1084792      7654605
28    4411107      60934070
30    11969863     2.22E+08
32    32481103     8.12E+08
34    88139858     -
Memory Usage
Comparison with Parallel Sorting Algorithms
- Avoid mutual exclusion (blocked memory access)
- Use more memory for the threaded version
- Use atomics for less memory

Execution time (seconds) vs. data size N = 2^K
Algorithm                            K=14              K=20     K=24    K=26
Comparison-free                      0.00107/0.0005    0.002    0.235   1.08
[1] 2011, Bitonic sort (CPU & GPU)   0.0012            0.076    1.97    2.23
[2] 2010, Intel radix sort (CPU)     0.0075            0.025    0.081   0.33
[3] 2009, NVIDIA radix sort (GPU)    0.008             0.031    0.12    0.27
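The lower-memory alternative listed above keeps a single shared count vector and makes each increment atomic, so no thread ever blocks on a lock. The slides mention atomics and semaphores; this sketch uses C11 `<stdatomic.h>` with POSIX threads, and its names and thread count are hypothetical. Compile with `-pthread`.

```c
#include <assert.h>
#include <limits.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

#define NTHREADS 8  /* illustrative thread count */

struct cf_atomic_task {
    const unsigned *key;
    size_t begin, end;        /* half-open slice of the input */
    atomic_size_t *count;     /* the single N x 1 vector shared by all threads */
};

/* Threads may hit the same entry, so each increment is an atomic
   read-modify-write instead of a mutex-protected update. */
static void *count_slice_atomic(void *arg) {
    struct cf_atomic_task *t = arg;
    for (size_t i = t->begin; i < t->end; i++)
        atomic_fetch_add_explicit(&t->count[t->key[i]], 1, memory_order_relaxed);
    return NULL;
}

/* Sorts n keys, each below 2^k_bits, into out using one shared count
   vector. Returns 0 on success, -1 on failure. */
int cf_sort_atomic(const unsigned *key, size_t n, unsigned k_bits, unsigned *out) {
    assert(k_bits < sizeof(size_t) * CHAR_BIT);
    size_t nvals = (size_t)1 << k_bits;
    atomic_size_t *count = malloc(nvals * sizeof *count);
    if (count == NULL)
        return -1;
    for (size_t v = 0; v < nvals; v++)
        atomic_init(&count[v], 0);

    pthread_t tid[NTHREADS];
    struct cf_atomic_task task[NTHREADS];
    int started = 0, rc = 0;
    for (size_t t = 0; t < NTHREADS; t++) {
        task[t] = (struct cf_atomic_task){ .key = key,
                                           .begin = n * t / NTHREADS,
                                           .end = n * (t + 1) / NTHREADS,
                                           .count = count };
        if (pthread_create(&tid[t], NULL, count_slice_atomic, &task[t]) != 0) {
            rc = -1;
            break;
        }
        started++;
    }
    /* pthread_join synchronizes memory, so every thread's increments are
       visible after it returns; relaxed ordering above is sufficient. */
    for (int t = 0; t < started; t++)
        pthread_join(tid[t], NULL);

    if (rc == 0) {
        size_t pos = 0;
        for (size_t v = 0; v < nvals; v++)
            for (size_t c = atomic_load_explicit(&count[v], memory_order_relaxed);
                 c > 0; c--)
                out[pos++] = (unsigned)v;
    }
    free(count);
    return rc;
}
```

Compared with per-thread vectors, this keeps memory at one N x 1 vector, but threads can still contend on the same entries when many keys share a value.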
Conclusion
- The design is novel and is not an incremental variant of other hybrid sorting algorithms (future work); the C code is clear and is available.
- Comparison-free, single-threaded: the fastest for data sizes < 2^16.
- Comparison-free, multi-threaded:
  - CPU (simple 4-core): the fastest at data size 2^20
  - CPU (advanced multi-core): needs investigation
  - GPU (simple and advanced): needs investigation
- Uses less memory, so lower energy consumption is expected.