XMT-GPU A PRAM Architecture for Graphics Computation Tom DuBois, Bryant Lee, Yi Wang, Marc Olano and Uzi Vishkin.


1 XMT-GPU A PRAM Architecture for Graphics Computation Tom DuBois, Bryant Lee, Yi Wang, Marc Olano and Uzi Vishkin

2 Bifurcation Between CPU and GPU. CPUs: general purpose, serial. GPUs: special purpose, parallel. CPUs are becoming more parallel: dual and quad cores today, and roadmaps predict many-cores, but it is unclear how to build or program these many-cores. GPUs are becoming more general: gaining momentum for more general apps.

3 Is Unification Possible? Can a single general purpose, many-core processor replace a CPU + GPU? It must be: (1) standalone, with no coprocessor needed; (2) easy and flexible to program; (3) competitive with CPUs on anything; (4) competitive with GPUs on graphics. We choose a unification candidate (XMT) in part because it satisfies the first 3. During the Q&A session we welcome your thoughts on what else could be used.

4 Main Experiment and Results. Can XMT satisfy the 4th requirement? Simulate surface shading (a common graphics app) on GP and GPU representatives. Mixed results: XMT slightly faster on some GPU tasks; GPUs significantly faster on others. Unification unlikely, but momentum may shift towards GP.

5 CPU History Overview. Serial random access machine programming model: great success story, dominant model for decades; popular for both theory and practice; relies on faster serial hardware for performance gains, which is no longer sufficient. Multi-cores available (2-4 cores/chip). Many-cores on the horizon (100s or 1000s), but how will they look?

6 What Will Future CPUs Look Like? Future from major vendors unclear: proposals try to look like serial RAM to programmers; long-term software spiral broken. PRAM (parallel random access machine): model preferred by programming community (natural extension of serial, scalable, included in major algorithm textbooks); discounted because of difficulty building one; recently building one has become feasible.

7 XMT: eXplicit Multi-Threading. PRAM-on-chip vision under development at the University of Maryland since 1997: targeting ~1000 cores on chip; PRAM-like programmability; on-chip shared L1 cache provides memory bandwidth necessary for PRAM; previous work has established XMT's performance on a variety of applications; simulator and FPGA implementations available. www.umiacs.umd.edu/users/vishkin/XMT

8 Programming XMT. XMTC: single-program multiple-data (SPMD) extension of standard C which resembles CRCW PRAM. Spawning creates lightweight, asynchronous threads; serial execution resumes once all threads complete.
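
The spawn-then-resume behaviour described on this slide can be imitated in plain Python. This is only a sketch under stated assumptions, not XMTC itself: `spawn`, `Counter`, and `compact` are hypothetical names chosen for illustration, Python threads are far heavier than XMT threads, and the atomic fetch-and-add counter is an assumption of this sketch used to let concurrent threads claim distinct output slots.

```python
from concurrent.futures import ThreadPoolExecutor
import threading


def spawn(low, high, body):
    """Run body(tid) for every tid in [low, high], then wait for all of them.

    Hypothetical helper: serial execution resumes only after every
    thread body has finished, mirroring the spawn/join pattern.
    """
    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = [pool.submit(body, tid) for tid in range(low, high + 1)]
        for future in futures:
            future.result()  # propagate any exception raised in a thread


class Counter:
    """Atomic fetch-and-add counter (an assumption of this sketch)."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def fetch_add(self, increment):
        with self._lock:
            old = self._value
            self._value += increment
            return old


def compact(values):
    """Copy the non-zero entries of values into a dense list, in any order."""
    out = [None] * len(values)
    counter = Counter()

    def body(tid):
        # Each thread handles exactly one element, SPMD style.
        if values[tid] != 0:
            slot = counter.fetch_add(1)
            out[slot] = values[tid]

    spawn(0, len(values) - 1, body)   # parallel section
    return out[:counter.fetch_add(0)]  # back in serial code
```

Because the threads are asynchronous, the order of the compacted elements is not fixed; callers that need a particular order would have to sort the result.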

9 XMT FPGA In Use. 64-processor, 75 MHz prototype. Used in undergrad theory class (also: non-major freshmen and 35 high-school students): 6 significant projects; no architecture discussion, minimal XMTC discussion.

10 GPU History Overview. Stream programming model: streams and kernels are simple and easily exploit locality for some tasks; handles irregular, fine-grained, and serial code poorly. Originally very inflexible: "programming" meant setting bits for muxes. Modern GPUs are much more flexible: C-like languages; GPGPU; still tied to stream model.

11 Very High Level GPU Pipeline. Old pipelined architecture: Vertex Processing, Fragment Processing, Pixel Processing. New virtual pipeline architecture: computation units capable of vertex, fragment, pixel, and other processing, plus flow control.

12 More Detailed Modern GPU. From NVIDIA GeForce 8800 GPU Architecture Overview. Virtual pipelined.

13 GeForce and XMT Similarities

14 Clusters of processors, functional units

15 GeForce and XMT Similarities On chip memory and access network

16 GeForce and XMT Similarities Control Logic

17 Does XMT Meet Unification Requirements? Unification requirements: ability to stand alone; easy and flexible to program; must perform GP tasks well; competitive with modern GPUs. Only graphics performance is unknown. GPUs are a unique competitor: successful, commodity, parallel hardware.

18 Our Experiment. Simulated XMT system vs. several real GPUs on fragment shading. Compute shading: memory light, general. Texture shading: memory heavy, specialized. Only the fragment shading stage is compared.
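
To make the memory-light versus memory-heavy distinction concrete, here is a small hypothetical pair of per-fragment functions. They are not the shaders used in the experiment, which the slides do not list: `compute_shade` is an arithmetic-only checkerboard, while `texture_shade` is a bilinear lookup that reads four texels per fragment. Both assume texture coordinates `u` and `v` in the range 0 to 1, and the texture is assumed to be a list of rows of floats.

```python
import math


def compute_shade(u, v):
    """Procedural 8x8 checkerboard: arithmetic only, no texture reads."""
    cell = math.floor(u * 8) + math.floor(v * 8)
    return 1.0 if cell % 2 == 0 else 0.0


def texture_shade(texture, u, v):
    """Bilinear filtered lookup: four texel reads for every fragment."""
    height = len(texture)
    width = len(texture[0])
    x = u * (width - 1)
    y = v * (height - 1)
    x0 = int(math.floor(x))
    y0 = int(math.floor(y))
    x1 = min(x0 + 1, width - 1)   # clamp at the right edge
    y1 = min(y0 + 1, height - 1)  # clamp at the bottom edge
    fx = x - x0
    fy = y - y0
    # The four memory reads that make this shader memory heavy.
    top = texture[y0][x0] * (1 - fx) + texture[y0][x1] * fx
    bottom = texture[y1][x0] * (1 - fx) + texture[y1][x1] * fx
    return top * (1 - fy) + bottom * fy
```

The first function's cost is dominated by arithmetic, so it scales with FLOPS; the second is bounded by how quickly texels can be fetched, which is where specialized texture hardware would be expected to help.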

19 Simulating Fragment Shading. Simulated XMT used in place of the fragment shading step in software. Diagram labels: Application; Mesa OpenGL (Vertex Processing, Rasterization, Fragment Shading, Other Fragment Ops); Display; XMT Fragment Shading Program; XMT Simulator.

20 The Competitors
ATI x700: released mid 2004
NVidia GeForce 7900: released mid 2006
NVidia GeForce 8800: released late 2006
Simulated XMT*
*Scaled for the same FLOPS level as GeForce 8800

21 XMT Variants. We used 3 variants of XMT. Version 1: unmodified. Version 2: with graphics ISA (floor, fraction, linear interpolation). Version 3: graphics ISA and 4-way vector computations on 8-bit arithmetic.
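
As a rough illustration of what the added operations compute, the sketch below gives scalar floor, fraction, and linear interpolation, plus a 4-way interpolation over 8-bit channels. The slides do not define the instructions' exact semantics, so the definitions here are assumptions modeled on the usual shading-language meanings, and the names `floor_op`, `fract`, `lerp`, and `lerp_rgba8` are hypothetical.

```python
import math


def floor_op(x):
    """Largest integer not greater than x."""
    return math.floor(x)


def fract(x):
    """Fractional part, assumed here to be x - floor(x), so always in [0, 1)."""
    return x - math.floor(x)


def lerp(a, b, t):
    """Linear interpolation: a when t == 0, b when t == 1."""
    return a + (b - a) * t


def lerp_rgba8(c0, c1, t8):
    """4-way lerp on 8-bit channels; t8 is an 8-bit weight in 0..255.

    Uses integer arithmetic with rounding, as a stand-in for a
    4-wide vector operation on packed RGBA bytes.
    """
    return tuple((a * (255 - t8) + b * t8 + 127) // 255 for a, b in zip(c0, c1))
```

For example, `lerp_rgba8((0, 0, 0, 255), (255, 255, 255, 255), 128)` blends black and white at roughly the halfway point while leaving the opaque alpha channel unchanged.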

22 Compute Shader Results
XMT version 1: 1423 FPS
XMT version 2: 3222 FPS
GeForce 8800: 2760 FPS
GeForce 7900: 1917 FPS
x700: 354 FPS

23 Texture Shader Results
XMT version 1: 197 FPS
XMT version 2: 275 FPS
XMT version 3: 487 FPS
GeForce 8800: 3179 FPS
GeForce 7900: 8632 FPS
x700: 1846 FPS

24 Analysis. XMT compute shades faster. XMT texture shades much slower: acceptable for some apps, not all. The GeForce GPUs follow the same trend: sacrificing speed on most used apps for greater flexibility on others.
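
The size of these gaps can be checked directly from the frame rates reported on slides 22 and 23. The short script below only restates those published numbers and divides them; the dictionary layout and the `speedup` helper are illustrative choices, not part of the original study.

```python
# Frames per second as reported on slides 22 (compute) and 23 (texture).
compute_fps = {
    "XMT v1": 1423,
    "XMT v2": 3222,
    "GeForce 8800": 2760,
    "GeForce 7900": 1917,
    "x700": 354,
}
texture_fps = {
    "XMT v1": 197,
    "XMT v2": 275,
    "XMT v3": 487,
    "GeForce 8800": 3179,
    "GeForce 7900": 8632,
    "x700": 1846,
}


def speedup(table, fast, slow):
    """How many times faster `fast` is than `slow` in the given table."""
    return table[fast] / table[slow]


# Compute shading: the best XMT variant edges out the GeForce 8800.
print(round(speedup(compute_fps, "XMT v2", "GeForce 8800"), 2))  # 1.17

# Texture shading: the GPUs lead the best XMT variant by a wide margin.
print(round(speedup(texture_fps, "GeForce 8800", "XMT v3"), 2))  # 6.53
print(round(speedup(texture_fps, "GeForce 7900", "XMT v3"), 2))  # 17.72
```

The same tables also show the GeForce trend mentioned above: the newer 8800 is faster than the 7900 on compute shading (2760 vs. 1917 FPS) but slower on texture shading (3179 vs. 8632 FPS).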

25 Summary. Divide between CPU/GPU is blurring. A unified system gives: ease of programming; good GP performance; good graphics performance. Combination systems needed for truly high performance apps: an XMT + GPU system could provide the best of both worlds. How to partition between them? Can they cooperate? www.umiacs.umd.edu/users/vishkin/XMT

