Many-Core Programming with GRAMPS Jeremy Sugerman Stanford University September 12, 2008.


1 Many-Core Programming with GRAMPS
Jeremy Sugerman
Stanford University
September 12, 2008

2 Background, Outline
 Stanford Graphics / Architecture Research
–Collaborators: Kayvon Fatahalian, Solomon Boulos, Kurt Akeley, Pat Hanrahan
 To appear in ACM Transactions on Graphics
 CPU, GPU trends… and collision?
 Two research areas:
–HW/SW Interface, Programming Model
–Future Graphics API

3 Problem Statement
 Drive efficient development and execution on many-/multi-core systems.
 Support homogeneous and heterogeneous cores.
 Inform future hardware.
Status Quo:
 GPU pipeline (good for GL, otherwise hard)
 CPU (no guidance; fast is hard)

4 GRAMPS
 Software-defined graphs
 Producer-consumer, data-parallelism
 Initial focus on rendering
[Diagram: two example graphs. Rasterization Pipeline: Rasterize → Shade → FB Blend into the frame buffer, linked by input and output fragment queues. Ray Tracing Graph: Camera → Intersect → Shade → FB Blend into the frame buffer, linked by ray, ray hit, and fragment queues. Legend: thread stage, shader stage, fixed-func stage, queue, stage output.]
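The ray tracing graph on this slide can be pictured as a producer-consumer pipeline. Below is a minimal, sequential Python sketch of that data flow — the stage and queue names come from the slide, but the functions and their bodies are invented for illustration and are not the GRAMPS API:

```python
from collections import deque

# Queues connect stages; each stage drains its input and fills its output.
ray_queue, hit_queue, frag_queue, frame_buffer = deque(), deque(), [], []

def camera(n):            # thread stage: generates camera rays
    for i in range(n):
        ray_queue.append(("ray", i))

def intersect():          # shader stage: rays -> ray hits
    while ray_queue:
        _, i = ray_queue.popleft()
        hit_queue.append(("hit", i))

def shade():              # shader stage: ray hits -> fragments
    while hit_queue:
        _, i = hit_queue.popleft()
        frag_queue.append(("frag", i))

def fb_blend():           # fixed-func stage: fragments -> frame buffer
    frame_buffer.extend(frag_queue)
    frag_queue.clear()

camera(4); intersect(); shade(); fb_blend()
print(len(frame_buffer))  # all 4 rays end up as blended fragments
```

In the real system the stages run concurrently and the scheduler decides when each one drains its queue; running them to completion in sequence here just makes the producer-consumer structure explicit.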

5 As a Graphics Evolution
 Not (too) radical for ‘graphics’
 Like fixed → programmable shading
–Pipeline undergoing massive shake-up
–Diversity of new parameters and use cases
 Bigger picture than ‘graphics’
–Rendering is more than GL/D3D
–Compute is more than rendering
–Some ‘GPUs’ are losing their innate pipeline

6 As a Compute Evolution (1)
 Sounds like streaming: execution graphs, kernels, data-parallelism
 Streaming: “squeeze out every FLOP”
–Goals: bulk transfer, arithmetic intensity
–Intensive static analysis, custom chips (mostly)
–Bounded space, data access, execution time

7 As a Compute Evolution (2)
 GRAMPS: “interesting apps are irregular”
–Goals: dynamic, data-dependent code
–Aggregate work at run-time
–Heterogeneous commodity platforms
 Naturally allows streaming when applicable

8 GRAMPS’ Role
 A ‘graphics pipeline’ is now an app!
 GRAMPS models parallel state machines.
 Compared to the status quo:
–More flexible than a GPU pipeline
–More guidance than bare metal
–Portability in between
–Not domain-specific

9 GRAMPS Interfaces
 Host/Setup: Create execution graph
 Thread: Stateful, singleton
 Shader: Data-parallel, auto-instanced
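A hypothetical host/setup sequence matching these three interfaces might look like the sketch below. The class and method names (`Graph`, `thread_stage`, `shader_stage`, `queue`) are illustrative inventions, not GRAMPS' actual interface:

```python
class Graph:
    """Illustrative stand-in for the host/setup interface: the app
    declares stages and the queues connecting them before execution."""
    def __init__(self):
        self.stages, self.queues = {}, []

    def thread_stage(self, name):    # stateful, singleton
        self.stages[name] = ("thread", name)
        return name

    def shader_stage(self, name):    # data-parallel, auto-instanced
        self.stages[name] = ("shader", name)
        return name

    def queue(self, src, dst, capacity=1024):
        self.queues.append((src, dst, capacity))

# Declare the ray tracing graph from slide 4.
g = Graph()
cam = g.thread_stage("Camera")
isect = g.shader_stage("Intersect")
shade = g.shader_stage("Shade")
blend = g.thread_stage("FBBlend")
for src, dst in [(cam, isect), (isect, shade), (shade, blend)]:
    g.queue(src, dst)
print(len(g.stages), len(g.queues))  # 4 stages, 3 queues
```

The point of the split is that the host only builds the graph; the runtime then instances shader stages automatically while each thread stage runs as a single stateful instance.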

10 GRAMPS Entities (1)
 Accessed via windows
 Queues: Connect stages, dynamically sized
–Ordered or unordered
–Fixed max capacity or spill to memory
 Buffers: Random access, pre-allocated
–RO, RW Private, RW Shared (not supported)
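The "fixed max capacity or spill to memory" behavior can be modeled with a small sketch. Everything here (`GrampsQueue`, `reserve_and_commit`) is a hypothetical illustration of the idea, not the real window API:

```python
from collections import deque

class GrampsQueue:
    """Illustrative queue: up to `capacity` packets live in the
    fast in-core queue; anything beyond that spills to a memory list."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.packets = deque()
        self.spill = []              # overflow backed by memory

    def reserve_and_commit(self, packet):
        if len(self.packets) < self.capacity:
            self.packets.append(packet)
        else:
            self.spill.append(packet)

    def pop(self):
        # Refill from spill before draining, preserving order.
        if not self.packets and self.spill:
            self.packets.append(self.spill.pop(0))
        return self.packets.popleft()

q = GrampsQueue(capacity=2)
for p in range(3):
    q.reserve_and_commit(p)
print(len(q.packets), len(q.spill))  # 2 in the queue, 1 spilled
```

A fixed-capacity variant would instead block (or fail) the producer when full, which is what lets the scheduler bound queue memory.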

11 GRAMPS Entities (2)
 Queue Sets: Independent sub-queues
–Instanced parallelism plus mutual exclusion
–Hard to fake with just multiple queues
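A queue set can be pictured as a map of independent sub-queues, where a consumer instance claims a whole sub-queue at a time — giving instanced parallelism across keys with mutual exclusion within each key. The keying and the `claim` protocol below are assumptions for illustration only:

```python
from collections import defaultdict, deque

class QueueSet:
    """Illustrative queue set: pushes route to a sub-queue by key;
    a consumer claims an entire sub-queue, so no two instances
    ever process the same key concurrently."""
    def __init__(self):
        self.subqueues = defaultdict(deque)
        self.claimed = set()

    def push(self, key, packet):
        self.subqueues[key].append(packet)

    def claim(self, key):            # exclusive access, or None if taken
        if key in self.claimed:
            return None
        self.claimed.add(key)
        return self.subqueues[key]

qs = QueueSet()
qs.push(0, "fragA"); qs.push(0, "fragB"); qs.push(1, "fragC")
sub = qs.claim(0)
print(list(sub), qs.claim(0))        # ['fragA', 'fragB'] None
```

With plain multiple queues, the producer would have to know the number of consumers up front and the runtime could not rebalance keys across instances — which is roughly why the slide calls this "hard to fake".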

12 What We’ve Built (System)

13 GRAMPS Scheduler
 Tiered scheduler
 ‘Fat’ cores: per-thread, per-core
 ‘Micro’ cores: shared hw scheduler
 Top level: tier N
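One way to picture the tier split: the top-level scheduler routes stateful thread stages to per-core slots on the ‘fat’ cores and data-parallel shader work to the shared scheduler for the ‘micro’ cores. The routing rule below is a guess at the spirit of the design, not the actual policy:

```python
def schedule(runnable):
    """Illustrative tier split: thread stages -> per-core 'fat' slots,
    shader stages -> shared 'micro' pool; the top tier only routes."""
    fat, micro = [], []
    for stage_type, name in runnable:
        (fat if stage_type == "thread" else micro).append(name)
    return fat, micro

fat, micro = schedule([("thread", "Camera"), ("shader", "Intersect"),
                       ("shader", "Shade"), ("thread", "FBBlend")])
print(fat, micro)   # ['Camera', 'FBBlend'] ['Intersect', 'Shade']
```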

14 What We’ve Built (Apps)
[Diagram: two application graphs. Direct3D pipeline (with ray-tracing extension): vertex buffers feed IA 1…IA N → VS 1…VS N through input vertex queues and primitive queues into RO → Rast → PS → OM and the frame buffer, using a sample queue set and a fragment queue; the ray-tracing extension adds Trace and PS2 stages connected by ray and ray hit queues. Ray-tracing graph: the stages Tiler, Sampler, Camera, Intersect, Shade, and FB Blend, connected by tile, sample, ray, ray hit, and fragment queues into the frame buffer. Legend: thread stage, shader stage, fixed-func, queue, stage output, push output.]

15 Initial Results
 Queues are small, utilization is good

16 GRAMPS Visualization

17 GRAMPS Visualization

18 GRAMPS Portability
 Portability really means performance.
 Less portable than GL/D3D
–GRAMPS graph is (more) hardware-sensitive
 More portable than bare metal
–Enforces modularity
–Best case: just works
–Worst case: saves boilerplate

19 High-level Challenges
 Is GRAMPS a suitable GPU evolution?
–Can it enable a pipeline competitive with bare metal?
–Can it enable innovation: advanced / alternative methods?
 Is GRAMPS a good parallel compute model?
–Does it map well to hardware and hardware trends?
–Does it support important apps?
–Do its concepts influence developers?

20 What’s Next: Implementation
 Better scheduling
–Less bursty, better slot filling
–Dynamic priorities
–Handle graphs with loops better
 More detailed costs
–Bill for scheduling decisions
–Bill for (internal) synchronization
 More statistics

21 What’s Next: Programming Model
 Yes: Graph modification (state change)
 Probably: Data sharing / ref-counting
 Maybe: Blocking inter-stage calls (join)
 Maybe: Intra-/inter-stage synchronization primitives

22 What’s Next: Possible Workloads
 REYES, hybrid graphics pipelines
 Image / video processing
 Game physics
–Collision detection or particles
 Physics and scientific simulation
 AI, finance, sort, search or database query, …
 Heavy dynamic data manipulation
–k-D tree / octree / BVH build
–Lazy/adaptive/procedural tree or geometry

