Graphics on Key by Eyal Sarfati and Eran Gilat. Supervised by Prof. Shmuel Wimer, Amnon Stanislavsky and Mike Sumszyk.


1 Graphics on Key by Eyal Sarfati and Eran Gilat. Supervised by Prof. Shmuel Wimer, Amnon Stanislavsky and Mike Sumszyk.

2 Overview
- Motivation
- Algorithm Improvements
- Software simulation
- GPU VLSI Design
- GoK system design
- Challenges and contributions
- Summary
- Demo

3 Motivation
- A GPU (Graphics Processing Unit) is the key to high performance in graphics applications (games, flight simulations, virtual worlds, etc.)
- Mobile systems (e.g. cellphones, handheld devices) lack a suitable GPU
- GoK: an external GPU with a standard interface can significantly enhance the graphics performance of systems with limited computing resources

4 Project Goal
Develop a low-cost prototype that performs 3D animation and displays it on a 2D RGB screen.
- Standard interface for data input/output
- Provides real-time graphics processing to systems with limited computing resources
[Diagram labels: Host, USB, GoK, VGA]

5 Project Stages
Software Design
- Implementing the algorithm in MATLAB
- Simulation and analysis
- Adaptation of the algorithm to hardware
ASIC Design
- Architectural design
- Implementation in VHDL
- Synthesis and layout
System Design
- Implementation of system blocks, including SW and HW interfaces
- System integration
- System performance enhancement

6 Graphic Animation
Elementary operations:
- Translation
- Rotation
- Scaling
3D data representation: a series of triangles. Each triangle is represented by:
- 3 vertices
- 3 RGB vectors
- 1 normal vector
[Figure labels: α, β, γ]
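The triangle record and the three elementary operations can be sketched as follows. This is a minimal illustration, not the project's MATLAB or VHDL code: the names `Triangle`, `translation`, `scaling`, `rotation_z`, `compose`, `apply` and `transform_triangle` are hypothetical, the use of homogeneous 4x4 matrices is an assumption, and scaling is kept uniform so that the normal keeps its direction.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Mat4 = List[List[float]]


@dataclass
class Triangle:
    vertices: Tuple[Vec3, Vec3, Vec3]  # 3 vertices
    colors: Tuple[Vec3, Vec3, Vec3]    # 3 RGB vectors (assumed: one per vertex)
    normal: Vec3                       # 1 normal vector


def translation(tx: float, ty: float, tz: float) -> Mat4:
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]


def scaling(s: float) -> Mat4:
    # Uniform scaling only (assumption), so normals keep their direction.
    return [[s, 0, 0, 0], [0, s, 0, 0], [0, 0, s, 0], [0, 0, 0, 1]]


def rotation_z(angle: float) -> Mat4:
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]


def compose(a: Mat4, b: Mat4) -> Mat4:
    """Return the product a * b, i.e. apply b first and then a."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def apply(m: Mat4, v: Vec3, w: float) -> Vec3:
    """Multiply m by (x, y, z, w): w=1 for points, w=0 for directions."""
    h = (v[0], v[1], v[2], w)
    return tuple(sum(m[i][k] * h[k] for k in range(4)) for i in range(3))


def transform_triangle(m: Mat4, t: Triangle) -> Triangle:
    # Three multiplications for the vertices, one for the normal.
    vertices = tuple(apply(m, v, 1.0) for v in t.vertices)
    n = apply(m, t.normal, 0.0)
    length = math.sqrt(sum(c * c for c in n))
    return Triangle(vertices, t.colors, tuple(c / length for c in n))
```

Composing the matrices once per frame and applying the product to every triangle keeps the per-triangle work at four matrix-vector multiplications.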

7 Rendering Algorithm Stages [Wimer]
Elementary transformations
- Four transformations are executed for every triangle:
  - Three matrix multiplications for the vertex coordinates
  - One matrix multiplication for the normal vector
Projection of triangles on the viewing plane
- Composed of 2 stages:
  - Transformation from 3D to 2D (projection)
  - Transformation from real coordinates to screen coordinates
Determine potential triangle visibility
- Hidden triangles are discarded on the basis of their normal direction
- This detection reduces the processed data by 50%
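The visibility test and the two projection stages might look like the sketch below. The slides do not give the camera model, so several things here are assumptions: the eye sits at the origin looking along +z, the projection is a simple perspective divide, and the viewing window spans `[-half_width, half_width]` by `[-half_height, half_height]`. All function and parameter names are hypothetical.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def is_potentially_visible(normal: Vec3, vertex: Vec3,
                           eye: Vec3 = (0.0, 0.0, 0.0)) -> bool:
    """Keep a triangle only if its normal points back toward the eye."""
    to_triangle = tuple(v - e for v, e in zip(vertex, eye))
    return sum(n * t for n, t in zip(normal, to_triangle)) < 0.0


def project(point: Vec3, plane_distance: float = 1.0) -> Tuple[float, float]:
    """Stage 1: 3D to 2D, perspective projection onto z = plane_distance."""
    x, y, z = point
    return (plane_distance * x / z, plane_distance * y / z)


def to_screen(p: Tuple[float, float], width: int = 160, height: int = 120,
              half_width: float = 1.0,
              half_height: float = 1.0) -> Tuple[int, int]:
    """Stage 2: real coordinates to pixel coordinates (y grows downward)."""
    col = int((p[0] / half_width + 1.0) * 0.5 * width)
    row = int((1.0 - p[1] / half_height) * 0.5 * height)
    # Clamp so that points on the right/bottom border stay inside the frame.
    return (min(width - 1, max(0, col)), min(height - 1, max(0, row)))
```

For a closed object roughly half of the triangles face away from the viewer, which is consistent with the 50% reduction quoted on the slide.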

8 Algorithm Details
Determine the projected triangle's visibility
- Scan all points and compare their depth with the depth of previously saved points
- Scan in 3D space using the inverse transformation
Color of visible points
- Compute the pixel color from the RGB vector and the current lighting vector
- Use a mathematical average for all the pixels inside a triangle rather than linear interpolation
To increase efficiency:
- Split triangles
- Increase parallelism
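One way to read the color step is that each triangle gets a single color: the average of its three RGB vectors, scaled by how directly the surface faces the light. The sketch below follows that reading. The Lambert-style factor `max(0, normal · light)`, the convention that `light` is a unit vector pointing from the surface toward the light source, and the name `flat_color` are assumptions, not details taken from the slides.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def flat_color(colors: Tuple[Vec3, Vec3, Vec3], normal: Vec3,
               light: Vec3) -> Tuple[int, int, int]:
    """Return one RGB color used for every pixel inside the triangle."""
    # Average of the three RGB vectors, channel by channel.
    average = tuple(sum(c[i] for c in colors) / 3.0 for i in range(3))
    # Assumed lighting model: intensity is the clamped dot product.
    intensity = max(0.0, sum(n * l for n, l in zip(normal, light)))
    return tuple(min(255, int(a * intensity)) for a in average)
```

Because the color is computed once per triangle, the per-pixel work in the rasterization stage reduces to the depth test and two memory writes.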

9 MATLAB Simulation
MATLAB implementation of the rendering algorithm [Wimer]
- Run time on ARM-based processor: 16 seconds
- Run time on MATLAB-based software: 1 hour

10 System Overview
[Figures: GoK concept and GoK prototype. Labels: Host, USB, GoK, VGA]

11 GPU Architecture Design Principles
Design goal: maximize throughput
- Use a parallel architecture to overcome bottlenecks
- Minimize expensive memory accesses
- Optimize accuracy for fast calculations

12 GPU Architecture
[Block diagram. Blocks: Prefetch & Visibility Detection Unit, 3D Transformation Unit, Triangle pre-processor, FIFO task queue, Scheduler Unit, Rasterization units (labelled 0, 1 and 10), Z-Buffer Arbiter Snooping Cache, RGB Arbiter Snooping Cache. Memories: Triangles, RGB Frame, Z-Buffer]

13 Triangle Transformation and Pre-processor
[Block diagram of the 3D Transformation Unit and the Triangle pre-processor. Block labels: Vertex / Normal Transform, Project, RGB Color Set, Sort coordinates according to y axis, Triangle slopes calculation, Create 2 half triangles, D calculation, -1 / C, FIFO]
Note: Early elimination of invisible triangles reduces load by 50%!
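The pre-processor labels (sort by y, slopes, two half triangles, D, -1/C) suggest the computation sketched below. This is an interpretation rather than the documented design: it assumes the triangle is split at its middle vertex, that slopes are stored as dx/dy, and that D and -1/C come from the plane equation Ax + By + Cz + D = 0 with (A, B, C) the triangle normal, so that depth can later be evaluated as z = (Ax + By + D) · (-1/C). The names `HalfTriangle`, `split`, `plane_terms` and `depth_at` are hypothetical.

```python
from typing import List, NamedTuple, Tuple

Vec3 = Tuple[float, float, float]


class HalfTriangle(NamedTuple):
    y_start: float
    y_end: float
    x_a: float      # x of the long edge at y_start
    slope_a: float  # dx/dy of the long edge
    x_b: float      # x of the short edge at y_start
    slope_b: float  # dx/dy of the short edge


def slope(p: Vec3, q: Vec3) -> float:
    return (q[0] - p[0]) / (q[1] - p[1])


def split(v0: Vec3, v1: Vec3, v2: Vec3) -> List[HalfTriangle]:
    """Sort vertices by y and cut the triangle at the middle vertex."""
    top, mid, bot = sorted((v0, v1, v2), key=lambda v: v[1])
    halves: List[HalfTriangle] = []
    if bot[1] == top[1]:
        return halves  # zero height: nothing to rasterize
    long_slope = slope(top, bot)
    x_long_at_mid = top[0] + long_slope * (mid[1] - top[1])
    if mid[1] > top[1]:  # skipped for a flat-top triangle
        halves.append(HalfTriangle(top[1], mid[1], top[0], long_slope,
                                   top[0], slope(top, mid)))
    if bot[1] > mid[1]:  # skipped for a flat-bottom triangle
        halves.append(HalfTriangle(mid[1], bot[1], x_long_at_mid, long_slope,
                                   mid[0], slope(mid, bot)))
    return halves


def plane_terms(vertex: Vec3, normal: Vec3) -> Tuple[float, float]:
    """Return (D, -1/C); requires C != 0, i.e. the triangle is not edge-on."""
    a, b, c = normal
    d = -(a * vertex[0] + b * vertex[1] + c * vertex[2])
    return d, -1.0 / c


def depth_at(x: float, y: float, normal: Vec3, d: float,
             neg_inv_c: float) -> float:
    return (normal[0] * x + normal[1] * y + d) * neg_inv_c
```

Precomputing the slopes and the reciprocal once per triangle means the rasterization units need only additions and multiplications per point, with no division in the inner loop.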

14 GPU Architecture
[Block diagram repeated from slide 12]

15 FIFO Task Queue and Scheduler Unit
FIFO task queue
- Stalls the input stream to prevent overflow by means of a backward communication protocol
- The backward communication propagates to the Prefetch and Visibility Detection Unit
Scheduler Unit
- Target: maximize throughput
- Minimize idle time of the rasterization units
- Immediately issue the next half triangle for processing when a unit completes the previous one
[Diagram labels: Triangle pre-processor, FIFO task queue, Scheduler Unit, Rasterization 0, Rasterization 1, Rasterization 10]
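The interplay between the bounded queue and the scheduler can be illustrated with a small cycle-level simulation. It is a behavioural sketch only: the queue capacity, the one-task-per-cycle producer, the modelling of each half triangle as a fixed number of cycles, and the names `TaskQueue` and `simulate` are all assumptions made for the example.

```python
from collections import deque
from typing import List, Optional, Tuple


class TaskQueue:
    """Bounded FIFO; a refused push models the backward stall signal."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.items: deque = deque()

    def push(self, task: int) -> bool:
        if len(self.items) >= self.capacity:
            return False  # queue full: the producer must stall this cycle
        self.items.append(task)
        return True

    def pop(self) -> Optional[int]:
        return self.items.popleft() if self.items else None


def simulate(task_costs: List[int], num_units: int,
             capacity: int) -> Tuple[int, int]:
    """Return (total cycles, producer stall cycles)."""
    queue = TaskQueue(capacity)
    pending = deque(task_costs)  # half triangles not yet queued
    busy = [0] * num_units       # remaining cycles per rasterization unit
    cycles = stalls = 0
    while pending or queue.items or any(busy):
        # Producer: try to enqueue one task per cycle.
        if pending:
            if queue.push(pending[0]):
                pending.popleft()
            else:
                stalls += 1
        # Scheduler: hand a task to every idle unit immediately.
        for unit in range(num_units):
            if busy[unit] == 0 and queue.items:
                busy[unit] = queue.pop()
        busy = [max(0, b - 1) for b in busy]
        cycles += 1
    return cycles, stalls
```

With a single unit and four tasks of 3 cycles each, `simulate([3, 3, 3, 3], 1, 1)` finishes in 12 cycles, so the unit is never idle while the producer absorbs the stalls; with four units the same workload produces no stalls at all.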

16 GPU Architecture
[Block diagram repeated from slide 12]

17 Rasterization Units
For each point of each half triangle:
1. Calculate the new Z value
2. Read the stored Z value and compare it with the calculated one
3. Update both the Z-Buffer and the RGB Frame Buffer accordingly
[Diagram labels: Scheduler Unit, Rasterization 0, Rasterization 1, Rasterization 10, Z-Buffer Arbiter Snooping Cache, RGB Arbiter Snooping Cache]
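The three steps map directly onto a classic Z-buffer loop. The sketch below processes one horizontal span of a half triangle. It assumes that a smaller Z means closer to the viewer and that depth is supplied by a caller-provided function; `rasterize_span` and `depth_fn` are hypothetical names.

```python
from typing import Callable, List, Tuple

Color = Tuple[int, int, int]


def rasterize_span(y: int, x_start: int, x_end: int,
                   depth_fn: Callable[[int, int], float], color: Color,
                   z_buffer: List[List[float]],
                   frame: List[List[Color]]) -> int:
    """Process the pixels x_start..x_end of row y; return pixels written."""
    written = 0
    for x in range(x_start, x_end + 1):
        z = depth_fn(x, y)           # 1. calculate the new Z value
        if z < z_buffer[y][x]:       # 2. read the stored Z and compare
            z_buffer[y][x] = z       # 3. update the Z-Buffer ...
            frame[y][x] = color      #    ... and the RGB frame buffer
            written += 1
    return written
```

A typical setup initialises every Z-buffer entry to `float("inf")` so that the first triangle drawn at a pixel always passes the test.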

18 Multi Core Architecture Problem
A multi-core architecture with shared memory must cope with:
1. Efficient management of multiple requests to the shared memory
2. Guaranteeing data coherency
Solution: Arbiter Snooping Multi Cache
[Diagram labels: Rasterization 0, Rasterization 1, Rasterization 10, RGB Frame, Z-Buffer]

19 Arbiter Snooping Multi Cache (ASMC)
- Reduce memory access time → cache memory
- Simultaneous multiple memory access requests → arbiter for efficient memory access management
- Data coherency → add a snooping mechanism to the cache to guarantee data coherency
Deadlock
- Using snooping mechanism
- Using watchdog mechanism
[Diagram labels: Rasterization 0, Rasterization 1, Rasterization 10, Snooping Multi-Cache, Arbiter, Shared Memory]
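The three ASMC ingredients (cache, arbiter, snooping) can be modelled in a few lines. The slides do not specify the cache policy or the arbitration scheme, so this sketch assumes write-through caches, invalidate-on-snoop and a round-robin arbiter; deadlock handling and the watchdog are not modelled. The class name `SharedMemorySystem` and its methods are hypothetical.

```python
from typing import Dict, List, Optional, Set


class SharedMemorySystem:
    def __init__(self, num_units: int, size: int) -> None:
        self.memory: List[int] = [0] * size
        self.caches: List[Dict[int, int]] = [dict() for _ in range(num_units)]
        self.next_grant = 0  # round-robin pointer of the arbiter

    def arbitrate(self, requesting: Set[int]) -> Optional[int]:
        """Grant the shared memory to one requesting unit per call."""
        n = len(self.caches)
        for offset in range(n):
            unit = (self.next_grant + offset) % n
            if unit in requesting:
                self.next_grant = (unit + 1) % n
                return unit
        return None

    def read(self, unit: int, addr: int) -> int:
        cache = self.caches[unit]
        if addr not in cache:  # miss: fetch the line from shared memory
            cache[addr] = self.memory[addr]
        return cache[addr]

    def write(self, unit: int, addr: int, value: int) -> None:
        self.memory[addr] = value  # write-through (assumed policy)
        self.caches[unit][addr] = value
        # Snooping: every other cache drops its now-stale copy.
        for other, cache in enumerate(self.caches):
            if other != unit:
                cache.pop(addr, None)
```

Without the invalidation in `write`, a unit that had cached an old Z value could wrongly overwrite a closer pixel written by another unit, which is the coherency problem the slide refers to.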

20 GPU ASIC Implementation
- Technology: 65 nm CMOS, 8LM
- Clock frequency: 300 MHz
- Core area: 2.25 mm²
- Power consumption: approx. 130 mW @ 300 MHz
- USB host can supply up to 400 mW

21 GoK System Requirements
Input: the data is sent by the host to the GoK in two stages:
1. Initialization: a list of triangles is sent to the GoK
2. Animation: a transformation for all triangles is sent to the GoK every 40 msec (25 FPS)
Output: real-time object animation at:
1. 160x120 pixels resolution
2. 120,000 triangles/sec
3. 25 frames/sec
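The requirements translate into a per-frame budget with a little arithmetic. The figures below are derived only from the numbers on the slide; the variable names are illustrative.

```python
FRAME_RATE = 25              # frames/sec
TRIANGLES_PER_SEC = 120_000
WIDTH, HEIGHT = 160, 120     # pixels

frame_period_ms = 1000 / FRAME_RATE                    # 40.0 ms per frame
triangles_per_frame = TRIANGLES_PER_SEC // FRAME_RATE  # 4800 triangles
pixels_per_frame = WIDTH * HEIGHT                      # 19200 pixels
pixels_per_sec = pixels_per_frame * FRAME_RATE         # 480000 pixels/sec
```

So each 40 ms animation step has to transform and rasterize up to 4,800 triangles into a 19,200-pixel frame.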

22 System Overview - SoPC
[Block diagram. Labels: Host, USB, FPGA, System Controller, Communication Bus, USB Controller, Memory Controller, VGA Controller, ASMC, Processor, GPU]

23 Summary

24 Challenges
- MATLAB implementation and simulation for detailed investigation and evaluation of the algorithm
- VLSI design and implementation of an efficient architecture (with maximum parallelism) for the GPU algorithm
- Real-time embedded system design on FPGA: NIOS II, USB 1.1, DDR2, VGA, Avalon Bus, software drivers and code
- GPU integration in the system
- Modification of the USB 1.1 driver for acceptable reliability of data transfer
- Modification of the standard VGA interface core to enable the 100 MHz GPU core to interface with the 50 MHz VGA unit

25 Main Contributions
- Enhancement of the algorithm for increased performance
  - Early elimination of invisible triangles: 50% computation reduction
  - Splitting of triangles to reduce computation complexity and increase parallelism
  - Simplification of pixel color computation
- Pre-processing of the triangle data for fast rasterization computation
- Efficient scheduling of half triangles to rasterization units
- Design and implementation of the arbiter snooping multi cache
  - Shared memory management, cache memory, data coherency
- Double memory buffer for continuous motion of animation

26 The Bottom Line
- Implementation of a "Graphics on Key" that enhances the graphics performance of low-power, low-cost gadgets
- The device performs the required computations and displays the animation on screen
- Required specification: 120,000 triangles/sec @ 160x120 resolution
- Achieved performance: 1,000,000 triangles/sec @ 640x480 resolution
- Approx. 25 mW @ 50 MHz
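For a sense of scale, the achieved figures can be compared against the required ones. The snippet only divides the numbers quoted on the slide; the variable names are illustrative.

```python
required_tps, achieved_tps = 120_000, 1_000_000
required_px = 160 * 120   # 19200 pixels
achieved_px = 640 * 480   # 307200 pixels

throughput_gain = achieved_tps / required_tps  # about 8.3 times
pixel_gain = achieved_px // required_px        # 16 times as many pixels
```

In other words, the prototype exceeds the triangle-rate requirement by roughly a factor of eight while driving a frame with sixteen times as many pixels.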

27 Demonstration

