High Dynamic Range Emeka Ezekwe M11 Christopher Thayer M12 Shabnam Aggarwal M13 Charles Fan M14 Manager: Matthew Russo 6/26/2015 1.


1 High Dynamic Range Emeka Ezekwe M11 Christopher Thayer M12 Shabnam Aggarwal M13 Charles Fan M14 Manager: Matthew Russo 6/26/2015 1

2 Agenda
• Project Description: Charles
• Marketing: Shabnam
• Behavioral Description: Emeka
• Design Process: Chris
• Floorplan Evolution: Shabnam
• Design Specifications: Chris
• Layout: Charles
• Conclusion: Emeka

3 Charles Fan Project Description 3

4
• High Dynamic Range?
  • Bright colors are BRIGHT
  • Dark colors are DARK
  • Details are seen CLEARLY
• Otherwise…
  • Colors and lights look distorted and bland
• The floating-point (FP) HDR format requires 48 bits per pixel
  • Problem: too much storage space and memory bandwidth!
  • Solution: HDR encoding yields 6:1 compression
• OUR GOAL: Implement efficient HDR decoding in hardware
  • 6:1 pixel compression
  • Increases usable storage space sixfold
  • Decreases memory bandwidth requirements sixfold
  • Effectively increases performance
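To make the storage claim concrete, the following is a minimal sketch of the arithmetic behind the 6:1 figure. The 48 bits per pixel and the 6:1 ratio come from the slide; the 1024×1024 texture size and the function names are hypothetical, chosen only for illustration.

```python
UNCOMPRESSED_BPP = 48   # from the slide: FP HDR format, bits per pixel
COMPRESSION_RATIO = 6   # from the slide: 6:1 compression


def compressed_bpp() -> int:
    """Bits per pixel after 6:1 compression (48 / 6 = 8)."""
    return UNCOMPRESSED_BPP // COMPRESSION_RATIO


def texture_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Storage needed for a width x height texture, in bytes."""
    return width * height * bits_per_pixel // 8


if __name__ == "__main__":
    w = h = 1024  # hypothetical texture size, not from the slides
    raw = texture_bytes(w, h, UNCOMPRESSED_BPP)
    packed = texture_bytes(w, h, compressed_bpp())
    print(f"uncompressed: {raw} bytes, compressed: {packed} bytes")
```

Under these assumptions the same texture needs one sixth of the memory, and each texture fetch moves one sixth of the data.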

5

6 Shabnam Aggarwal Marketing 6

7
• AMD's ATI Mobility Radeon X1900
  • 48-bit floating-point HDR
  • HDR compression is currently NOT supported
  • Performance hit deters developers
• Windows Vista now requires a high-end GPU to realize its full graphics potential
• Laptops and portable devices are using dedicated processors for graphics
• OLED (Organic Light Emitting Diode) displays are being developed by Sony
  • Contrast ratio: 1,000,000:1

8

9 Marketing
• Our decoder is designed to interface between specially encoded textures stored in the GPU's memory and one of the GPU's texture caches that feed into the shader processor.
• Each ROP on (**ATI) is capable of processing 4 pixels per clock cycle. We plan for our hardware to decode the texture information for 4 pixels during each clock cycle.
• This decoder will allow smaller textures to be stored in the GPU's memory, which will allow graphics cards to provide the same functions with less memory.
• Ultimately, this decoder can provide savings in cost, power consumption, heat dissipation, and size in current graphics cards.
Our HDR Decoder!!

10 Marketing
• Our HDR Decoder:
  • Smaller textures stored in the GPU's memory
  • Same functions… less memory
• Savings in:
  • Cost
  • Power consumption
  • Heat dissipation
  • Size
• HDR is the next generation of display technology

11 Emeka Ezekwe Behavioral & Algorithmic Description 11

12 Algorithmic Description
• Encoding
  • Break the texture into 4×4 pixel blocks.
  • Extract the luminance value of each pixel.
  • Normalize the red and blue values and average them over each 2×2 block. Green can be recalculated while decoding.
  • Allocate more bits to the luminance values.
  • After encoding, a 4×4 block of pixels is compressed from 48 bpp to 8 bpp.
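The encoding steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the team's encoder: the slides do not give the luminance formula, so the weights (1/4, 1/2, 1/4) are an assumption, as is the normalization that makes the three color shares sum to 1 (which is what lets green be recomputed from red and blue). Quantization and bit allocation are left out, and all function names are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float]  # (R, G, B), linear floating-point values


def luminance(p: Pixel) -> float:
    # ASSUMPTION: illustrative weights; the slides do not state the formula.
    r, g, b = p
    return 0.25 * r + 0.5 * g + 0.25 * b


def encode_block(block: List[List[Pixel]]):
    """Split one 4x4 block into per-pixel luminance and 2x2-averaged chroma.

    Returns (lum, chroma): lum is 4x4 luminance values, chroma is a 2x2 grid
    of (red_share, blue_share) pairs, one pair per 2x2 sub-block.
    """
    assert len(block) == 4 and all(len(row) == 4 for row in block)
    lum = [[luminance(p) for p in row] for row in block]

    chroma = []
    for by in range(0, 4, 2):
        chroma_row = []
        for bx in range(0, 4, 2):
            red_sum = blue_sum = 0.0
            for y in range(by, by + 2):
                for x in range(bx, bx + 2):
                    r, _, b = block[y][x]
                    lp = lum[y][x]
                    # Normalized shares: red + green + blue shares sum to 1,
                    # so the green share never needs to be stored.
                    red_sum += 0.25 * r / lp if lp > 0 else 0.0
                    blue_sum += 0.25 * b / lp if lp > 0 else 0.0
            chroma_row.append((red_sum / 4, blue_sum / 4))
        chroma.append(chroma_row)
    return lum, chroma
```

With this layout a block carries 16 luminance values but only 4 red/blue pairs, which is why more of the 8 bpp budget can go to luminance.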

13 Algorithmic Description
• Decoding (luminance values)
  • Reconstruct Lp: 1 logical shift, 1 integer addition
  • Calculate GQ: 1 integer addition
  • Calculate final pixel values: 3 floating-point multiplications
  • Total calculations: 1 logical shift + 2 integer additions + 3 floating-point multiplications
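As a rough tally of what these totals imply for the hardware, the sketch below scales the operation counts to a 4×4 block and to the 4 pixels decoded per clock cycle. It assumes the totals on the slide are per decoded pixel, which the slide does not state explicitly; the names are hypothetical.

```python
# Per-pixel decode cost, taken from the slide's "Total calculations" line.
# ASSUMPTION: the totals are per decoded pixel.
PER_PIXEL_OPS = {
    "logical shift": 1,
    "integer addition": 2,
    "fp multiplication": 3,
}


def ops_for(pixels: int) -> dict:
    """Scale the per-pixel operation counts to a group of pixels."""
    return {op: count * pixels for op, count in PER_PIXEL_OPS.items()}


if __name__ == "__main__":
    print("per 4x4 block:", ops_for(16))
    print("per clock cycle (4 pixels):", ops_for(4))
```

At 4 pixels per cycle this works out to 12 floating-point multiplications per cycle, which would be consistent with the twelve 16-bit registers shown on the data flow slide.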

14 Data Flow
[Figure: data-flow block diagram. Labels: Find G; Reg; bit widths 7, 7, 4, 4, 4, 4, 8; Compute 1 pixel (×4); Int to FP; Reg 16 (×12); Serialize output (×4)]

15 Chris Thayer Design Process 15

16 Design Process
• Goal: Speed
  • 400 MHz
  • 4 pixels per cycle, 4 cycles per block
• Architectural decisions
  • No denormal support in the floating-point multiplier
  • Pipelined design
  • Storing input values
  • Integer multiplication
    • Wallace trees
    • Booth encoding
  • Critical adders
    • Carry select
  • Integer-to-floating-point conversion
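For readers unfamiliar with the carry-select adders named above, here is a small behavioral sketch of the general technique. It is a generic textbook model, not the team's circuit: the 16-bit width, the 4-bit block size, and the function name are assumptions made for illustration.

```python
def carry_select_add(a: int, b: int, width: int = 16, block: int = 4):
    """Add two unsigned width-bit integers block by block.

    Each block forms two candidate sums, one for carry-in 0 and one for
    carry-in 1. In hardware both are computed in parallel, and the real
    carry from the previous block only drives a multiplexer, which shortens
    the carry chain. Returns (sum, carry_out).
    """
    assert width % block == 0, "width must be a multiple of the block size"
    mask = (1 << block) - 1
    result = 0
    carry = 0
    for shift in range(0, width, block):
        a_blk = (a >> shift) & mask
        b_blk = (b >> shift) & mask
        sum_if_0 = a_blk + b_blk        # candidate assuming carry-in = 0
        sum_if_1 = a_blk + b_blk + 1    # candidate assuming carry-in = 1
        chosen = sum_if_1 if carry else sum_if_0  # the multiplexer
        result |= (chosen & mask) << shift
        carry = chosen >> block
    return result, carry
```

The trade-off is area for speed: each block needs two adders plus a multiplexer, which is why the technique is usually reserved for adders on the critical path.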

17 Design Process
• Circuit-level decisions
  • Mirror full adders (FAs) to reduce carry-chain delay
  • Two different half adders (HAs)
  • AOI/OAI gates
  • Gate sizing along critical paths
  • Utilize Q and ~Q outputs from registers
  • Clock buffers built into register blocks
  • Double/triple-strapped VDD and GND
  • Repeaters to break up long wires
  • Balanced clock tree
  • Device folding

18 Verification Process
• C implementation
• Structural Verilog
• Gate-level schematic
• Layout
• Major modules
• Pipeline stages
• Global signals

19 Shabnam Aggarwal Floorplan Evolution 19

20 Floorplan Evolution

21 Chris Thayer Design Specifications 21

22 Design Specifications
• Delays
  • Stage one pipeline: 1.8 ns
  • Stage two pipeline: 1.53 ns
  • Stage three pipeline: 2.479 ns
• Skew
  • Stage one: x
  • Stage two: x
  • Stage three: x
• Resulting clock speed: 500 MHz
  • 2 BILLION pixels per second
• Size: 442 × 453 microns
• Aspect ratio: 1:1.024
• Transistors: 42,772
• Density: 0.21 T/micron^2
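The figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers from the slides (stage delays, 4 pixels per cycle, die size, transistor count); the function names are hypothetical, and the clock estimate ignores skew and register overhead because the slide leaves the skew values as "x".

```python
STAGE_DELAYS_NS = {"stage one": 1.8, "stage two": 1.53, "stage three": 2.479}
PIXELS_PER_CYCLE = 4                 # from the Design Process slide
DIE_WIDTH_UM, DIE_HEIGHT_UM = 442, 453
TRANSISTORS = 42_772


def max_clock_mhz(delays_ns: dict) -> float:
    """Clock limit set by the slowest pipeline stage (skew ignored)."""
    return 1000.0 / max(delays_ns.values())


def pixels_per_second(clock_mhz: float) -> float:
    """Throughput at 4 decoded pixels per clock cycle."""
    return clock_mhz * 1e6 * PIXELS_PER_CYCLE


def density_per_um2() -> float:
    """Transistors per square micron."""
    return TRANSISTORS / (DIE_WIDTH_UM * DIE_HEIGHT_UM)


if __name__ == "__main__":
    print(f"slowest-stage limit: {max_clock_mhz(STAGE_DELAYS_NS):.1f} MHz")
    print(f"throughput at 500 MHz: {pixels_per_second(500):.2e} pixels/s")
    print(f"density: {density_per_um2():.2f} T/um^2")
```

Taken at face value, the 2.479 ns stage three delay corresponds to roughly 403 MHz, close to the 400 MHz design goal, whereas the 2 billion pixels per second figure follows from the stated 500 MHz clock. The slides do not explain the difference between the two numbers.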

23 Charles Fan Layout 23

24 Floating Point Multiplier Layout 24 Pretty beautiful

25 Floating Point Multiplier Data Flow

26 Poly Layer 26

27 Metal One Layer 27

28 Metal Two Layer 28

29 Metal Three Layer 29

30 Metal Four Layer 30

31 Questions?

