Many-Core Programming with GRAMPS Jeremy Sugerman Stanford University September 12, 2008.


2 Background, Outline
• Stanford Graphics / Architecture Research
  – Collaborators: Kayvon Fatahalian, Solomon Boulos, Kurt Akeley, Pat Hanrahan
• To appear in ACM Transactions on Graphics
• CPU, GPU trends… and collision?
• Two research areas:
  – HW/SW Interface, Programming Model
  – Future Graphics API

3 Problem Statement
• Drive efficient development and execution in many-/multi-core systems.
• Support homogeneous, heterogeneous cores.
• Inform future hardware
Status Quo:
• GPU Pipeline (Good for GL, otherwise hard)
• CPU (No guidance, fast is hard)

4 GRAMPS
• Software defined graphs
• Producer-consumer, data-parallelism
• Initial focus on rendering
[Figure: example GRAMPS graphs. A rasterization pipeline (Rasterize, Shade, and FB Blend stages, with an Input Fragment Queue, an Output Fragment Queue, and the Frame Buffer) and a ray tracing graph (Camera, Intersect, Shade, and FB Blend stages, with a Ray Queue, Ray Hit Queue, Fragment Queue, and the Frame Buffer). Legend: thread stage, shader stage, fixed-function stage, queue, stage output.]
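
The slide does not show GRAMPS source code. As an illustration only, here is a minimal Python sketch of a software-defined graph: stages are connected by queues, and a stage runs whenever its input queue holds work (producer-consumer). The class names `StageQueue`, `Stage`, and `Graph`, the toy kernels, and the run-until-drained loop are hypothetical and are not the GRAMPS API; the host seeding the Ray queue stands in for the Camera stage.

```python
from collections import deque


class StageQueue:
    """Hypothetical queue connecting a producer stage to a consumer stage."""

    def __init__(self, name):
        self.name = name
        self.items = deque()


class Stage:
    """Hypothetical stage: runs `kernel` on one input item, pushes its outputs."""

    def __init__(self, name, kernel, inp, out=None):
        self.name = name
        self.kernel = kernel  # callable: item -> list of output items
        self.inp = inp
        self.out = out  # None for a sink stage (e.g. one that writes a buffer)


class Graph:
    """The host builds the graph once, then runs it until all queues drain."""

    def __init__(self):
        self.stages = []

    def add(self, stage):
        self.stages.append(stage)
        return stage

    def run(self):
        progress = True
        while progress:
            progress = False
            for stage in self.stages:
                if stage.inp.items:
                    outputs = stage.kernel(stage.inp.items.popleft())
                    if stage.out is not None:
                        stage.out.items.extend(outputs)
                    progress = True


# Toy ray tracing graph: Ray Queue -> Intersect -> Ray Hit Queue -> Shade
# -> Fragment Queue -> FB Blend -> frame buffer.
frame_buffer = {}
rays, hits, frags = StageQueue("Ray"), StageQueue("Ray Hit"), StageQueue("Fragment")


def fb_blend(fragment):
    pixel, color = fragment
    frame_buffer[pixel] = frame_buffer.get(pixel, 0.0) + color
    return []  # sink: writes the frame buffer, emits nothing


graph = Graph()
# Toy "hit" is (pixel, object id); toy shading scales the object id.
graph.add(Stage("Intersect", lambda ray: [(ray, ray % 3)], rays, hits))
graph.add(Stage("Shade", lambda hit: [(hit[0], 0.5 * hit[1])], hits, frags))
graph.add(Stage("FB Blend", fb_blend, frags))

rays.items.extend(range(6))  # the host stands in for the Camera stage here
graph.run()
print(frame_buffer)  # {0: 0.0, 1: 0.5, 2: 1.0, 3: 0.0, 4: 0.5, 5: 1.0}
```

Because each stage only sees its own input and output queues, the same stages could be rewired into a different graph without changing their code.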

5 As a Graphics Evolution
• Not (too) radical for ‘graphics’
• Like fixed → programmable shading
  – Pipeline undergoing massive shake-up
  – Diversity of new parameters and use cases
• Bigger picture than ‘graphics’
  – Rendering is more than GL/D3D
  – Compute is more than rendering
  – Some ‘GPUs’ are losing their innate pipeline

6 As a Compute Evolution (1)
• Sounds like streaming: Execution graphs, kernels, data-parallelism
• Streaming: “squeeze out every FLOP”
  – Goals: bulk transfer, arithmetic intensity
  – Intensive static analysis, custom chips (mostly)
  – Bounded space, data access, execution time

7 As a Compute Evolution (2)
• GRAMPS: “interesting apps are irregular”
  – Goals: Dynamic, data-dependent code
  – Aggregate work at run-time
  – Heterogeneous commodity platforms
• Naturally allows streaming when applicable

8 GRAMPS’ Role
• A ‘graphics pipeline’ is now an app!
• GRAMPS models parallel state machines.
• Compared to status quo:
  – More flexible than a GPU pipeline
  – More guidance than bare metal
  – Portability in between
  – Not domain specific

9 GRAMPS Interfaces
• Host/Setup: Create execution graph
• Thread: Stateful, singleton
• Shader: Data-parallel, auto-instanced
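
To make the thread/shader distinction concrete, here is a hypothetical Python sketch (not GRAMPS code). A shader-style stage is written as a stateless per-element function that a runtime can instance automatically; a thread-style stage is a single object that carries state across items. `ThreadPoolExecutor` is only a stand-in for whatever instancing a real runtime performs, and Python threads illustrate the programming model rather than real parallel speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def shade(fragment):
    """Shader-style stage: a pure function of one input element.

    It keeps no state between invocations, so the runtime may launch as
    many instances as it likes, in any order, without changing the result.
    """
    x, y = fragment
    return (x, y, (x + y) % 256)


class StatsThread:
    """Thread-style stage: one stateful instance that sees every item in turn."""

    def __init__(self):
        self.count = 0
        self.total = 0

    def consume(self, shaded):
        self.count += 1
        self.total += shaded[2]


def run_shader_stage(kernel, items, workers=4):
    # Auto-instancing stand-in: the runtime, not the stage author, picks the
    # degree of parallelism. Executor.map returns results in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(kernel, items))


fragments = [(x, y) for x in range(4) for y in range(4)]
shaded = run_shader_stage(shade, fragments)

stats = StatsThread()  # singleton: exactly one instance for the whole run
for item in shaded:
    stats.consume(item)
print(stats.count, stats.total)  # 16 48
```

The split matters for the runtime: only the stateless shader stage can be replicated freely, while the stateful thread stage must stay a single instance.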

10 GRAMPS Entities (1)
• Accessed via windows
• Queues: Connect stages, Dynamically sized
  – Ordered or unordered
  – Fixed max capacity or spill to memory
• Buffers: Random access, Pre-allocated
  – RO, RW Private, RW Shared (Not Supported)
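
The "fixed max capacity or spill to memory" option can be illustrated with a small sketch. The `BoundedQueue` class below is hypothetical: it models a queue with a fixed capacity that either refuses pushes when full (so the producer would have to stall) or, with `spill=True`, overflows into a second buffer standing in for memory while preserving FIFO order. It models only an ordered queue; unordered queues, windows, and buffers are not shown.

```python
from collections import deque


class BoundedQueue:
    """Hypothetical ordered queue with a fixed capacity and optional spilling."""

    def __init__(self, capacity, spill=False):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.spill = spill
        self.fast = deque()      # stands in for fixed, pre-allocated storage
        self.overflow = deque()  # stands in for spill space in memory

    def __len__(self):
        return len(self.fast) + len(self.overflow)

    def try_push(self, item):
        """Return False when full and spilling is disabled (producer stalls)."""
        if len(self.fast) < self.capacity and not self.overflow:
            self.fast.append(item)
            return True
        if self.spill:
            self.overflow.append(item)  # keep arrival order behind `fast`
            return True
        return False

    def pop(self):
        """Remove the oldest item; raises IndexError if the queue is empty."""
        item = self.fast.popleft()
        if self.overflow:
            # Refill fixed storage from the spill area so FIFO order holds.
            self.fast.append(self.overflow.popleft())
        return item


fixed = BoundedQueue(capacity=2)
print([fixed.try_push(i) for i in range(3)])  # [True, True, False]

spilling = BoundedQueue(capacity=2, spill=True)
for i in range(5):
    spilling.try_push(i)
print([spilling.pop() for _ in range(5)])  # [0, 1, 2, 3, 4]
```

A fixed-capacity queue applies back-pressure to its producer; a spilling queue trades that back-pressure for extra memory traffic.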

11 GRAMPS Entities (2)
• Queue Sets: Independent sub-queues
  – Instanced parallelism plus mutual exclusion
  – Hard to fake with just multiple queues
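
One way to picture a queue set is a framebuffer split into tiles, with one sub-queue of fragments per tile. The hypothetical `QueueSet` sketch below (not the GRAMPS API) lets different sub-queues drain in parallel while a per-sub-queue lock guarantees that only one consumer owns a given tile at a time, so the blend step needs no per-pixel locking. The tile width and fragment values are made up for illustration.

```python
import threading
from collections import deque
from concurrent.futures import ThreadPoolExecutor


class QueueSet:
    """Hypothetical queue set: independent sub-queues selected by a key."""

    def __init__(self):
        self.subqueues = {}
        self.locks = {}

    def push(self, key, item):
        if key not in self.subqueues:  # single producer here, so no race
            self.subqueues[key] = deque()
            self.locks[key] = threading.Lock()
        self.subqueues[key].append(item)

    def drain(self, key, consume):
        # Mutual exclusion per sub-queue: one consumer owns this key at a time.
        # Different keys have different locks, so they can drain concurrently.
        with self.locks[key]:
            subqueue = self.subqueues[key]
            while subqueue:
                consume(subqueue.popleft())


TILE_WIDTH = 4  # hypothetical: pixels per tile
frame_buffer = [0] * 8
fragments = QueueSet()
for pixel, value in [(0, 1), (5, 2), (0, 3), (7, 4), (5, 5)]:
    fragments.push(pixel // TILE_WIDTH, (pixel, value))  # one sub-queue per tile


def blend(fragment):
    pixel, value = fragment
    frame_buffer[pixel] += value  # only this tile's owner touches these pixels


with ThreadPoolExecutor(max_workers=2) as pool:
    list(pool.map(lambda tile: fragments.drain(tile, blend), list(fragments.subqueues)))

print(frame_buffer)  # [4, 0, 0, 0, 0, 7, 0, 4]
```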

12 What We’ve Built (System)

13 GRAMPS Scheduler
• Tiered Scheduler
• ‘Fat’ cores: per-thread, per-core
• ‘Micro’ cores: shared hw scheduler
• Top level: tier N
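
The slide describes the scheduler's tiers but not its policy. As an assumption for illustration only, the sketch below shows one simple policy a single tier might use: among stages with queued input, run the one furthest downstream. Draining downstream queues first tends to keep total queue occupancy low. The `schedule` function and the toy pipeline are hypothetical, and the tiered structure itself (per-core, shared hardware, and top-level scheduling) is not modeled.

```python
from collections import deque


def schedule(stages, inputs, downstream_first=True):
    """Run a linear pipeline one item at a time (hypothetical policy sketch).

    stages: list of (name, kernel) in pipeline order; kernel(item) -> outputs.
    inputs: initial items for the first stage's queue.
    Returns (trace of stage names run, peak total number of queued items).
    """
    queues = [deque(inputs)] + [deque() for _ in stages[1:]]
    trace = []
    peak = len(queues[0])
    while True:
        runnable = [i for i, q in enumerate(queues) if q]
        if not runnable:
            return trace, peak
        # Priority: furthest-downstream runnable stage (or upstream, to compare).
        i = max(runnable) if downstream_first else min(runnable)
        name, kernel = stages[i]
        outputs = kernel(queues[i].popleft())
        if i + 1 < len(queues):
            queues[i + 1].extend(outputs)  # the last stage's outputs are dropped
        trace.append(name)
        peak = max(peak, sum(len(q) for q in queues))


pipeline = [
    ("Rasterize", lambda tri: [tri, tri]),  # toy: two fragments per triangle
    ("Shade", lambda frag: [frag]),
    ("Blend", lambda frag: []),
]
trace, peak = schedule(pipeline, [0, 1, 2])
print(trace[:6], peak)  # ['Rasterize', 'Shade', 'Blend', 'Shade', 'Blend', 'Rasterize'] 4
print(schedule(pipeline, [0, 1, 2], downstream_first=False)[1])  # 6
```

With the same work, the upstream-first order rasterizes everything before shading and so holds more items in flight at once; a real scheduler would also have to weigh core utilization, which this sketch ignores.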

14 What We’ve Built (Apps)
[Figure: two application graphs. Direct3D Pipeline (with Ray-tracing Extension): stages IA 1…N, VS 1…N, RO, Rast, PS, and OM, with Vertex Buffers, Input Vertex Queues 1…N, Primitive Queues 1…N, a Fragment Queue, and the Frame Buffer; the ray-tracing extension adds a Sample Queue Set, Ray Queue, Trace, Ray Hit Queue, and PS2. Ray-tracing Graph: stages Tiler, Sampler, Camera, Intersect, Shade, and FB Blend, with Tile, Sample, Ray, Ray Hit, and Fragment queues and the Frame Buffer. Legend: thread stage, shader stage, fixed-function stage, queue, stage output, push output.]

15 Initial Results
• Queues are small, utilization is good

16 GRAMPS Visualization

17 GRAMPS Visualization

18 GRAMPS Portability
• Portability really means performance.
• Less portable than GL/D3D
  – GRAMPS graph is (more) hardware sensitive
• More portable than bare metal
  – Enforces modularity
  – Best case, just works
  – Worst case, saves boilerplate

19 High-level Challenges
• Is GRAMPS a suitable GPU evolution?
  – Enable pipeline competitive with bare metal?
  – Enable innovation: advanced / alternative methods?
• Is GRAMPS a good parallel compute model?
  – Map well to hardware, hardware trends?
  – Support important apps?
  – Concepts influence developers?

20 What’s Next: Implementation
• Better scheduling
  – Less bursty, better slot filling
  – Dynamic priorities
  – Handle graphs with loops better
• More detailed costs
  – Bill for scheduling decisions
  – Bill for (internal) synchronization
• More statistics

21 What’s Next: Programming Model
• Yes: Graph modification (state change)
• Probably: Data sharing / ref-counting
• Maybe: Blocking inter-stage calls (join)
• Maybe: Intra/inter-stage synchronization primitives

22 What’s Next: Possible Workloads
• REYES, hybrid graphics pipelines
• Image / video processing
• Game Physics
  – Collision detection or particles
• Physics and scientific simulation
• AI, finance, sort, search or database query, …
• Heavy dynamic data manipulation
  – k-D tree / octree / BVH build
  – lazy/adaptive/procedural tree or geometry