HDR- Design Presentation Team M1: Emeka Ezekwe (M11) Chris Thayer (M12) Shabnam Aggarwal (M13) Charles Fan (M14) Team M1 Manager: Matthew Russo.

Status
Complete: Specification definition, Block Diagram, C Implementation, Verilog (Structural almost done, not yet tested)
Incomplete: Schematic, Layout, Testing

HDR in the G80 GPU
Our decoder is designed to interface between specially encoded textures stored in the GPU’s memory and one of the GPU’s ROPs (Render Output Units).
– Each ROP on nVidia’s G80 is capable of processing 4 pixels per clock cycle. We plan for our hardware to decode the texture information for 4 pixels during each clock cycle.
This decoder will allow smaller textures to be stored in the GPU’s memory, which will allow graphics cards to provide the same functions with less memory. Ultimately, this decoder can provide savings in cost, power consumption, heat dissipation, and size in current graphics cards.

Design objectives and constraints
Shooting for 400 MHz (2 or 3 pipeline stages)
Speed is clearly our goal, but power and size are also important.
– minimize these after maximizing speed
4 pixels per cycle, 4 cycles per block
– no wasted cycles like before when storing special luminance values
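The targets above imply a simple throughput budget. The sketch below works through that arithmetic; the function and constant names are illustrative only, and the 16-pixel block size is derived from the stated 4 pixels per cycle and 4 cycles per block rather than taken from the slides.

```python
# Throughput implied by the stated design targets (illustrative names).
CLOCK_HZ = 400e6        # target clock frequency from the slide
PIXELS_PER_CYCLE = 4    # pixels decoded per clock cycle
CYCLES_PER_BLOCK = 4    # clock cycles spent on one texture block


def throughput(clock_hz=CLOCK_HZ):
    """Return the cycle time and decode rates implied by the targets."""
    return {
        "cycle_time_ns": 1e9 / clock_hz,
        "pixels_per_block": PIXELS_PER_CYCLE * CYCLES_PER_BLOCK,
        "pixels_per_second": clock_hz * PIXELS_PER_CYCLE,
        "blocks_per_second": clock_hz / CYCLES_PER_BLOCK,
    }


if __name__ == "__main__":
    for name, value in throughput().items():
        print(f"{name}: {value:g}")
```

At 400 MHz each pipeline stage has a 2.5 ns budget, which is why the number of pipeline stages is still an open choice.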

Design Decisions
Removed the module that stored Nzeros and Lbias
– This has increased our input count from 97 to 104
Removed denormal support in the floating point multipliers.
Integer multiplication is done by Wallace trees and Booth recoding.
– still a maybe. Need to see how they lay out
Critical adders are going to be carry select.
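For readers unfamiliar with carry-select adders, here is a minimal bit-level model of the technique. It is a behavioral sketch, not the team's Verilog: the 11-bit default width and the 4-bit block size are assumptions chosen for illustration, as are the function names.

```python
def ripple_add(a, b, carry_in, width):
    """Add two width-bit values bit by bit, like a ripple-carry chain."""
    total = 0
    carry = carry_in
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        total |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return total, carry


def carry_select_add(a, b, width=11, block=4):
    """Carry-select addition: each block is computed for carry-in 0 and 1,
    and the real incoming carry selects which result to keep (the mux)."""
    result = 0
    carry = 0
    for lo in range(0, width, block):
        w = min(block, width - lo)      # the last block may be narrower
        mask = (1 << w) - 1
        a_blk = (a >> lo) & mask
        b_blk = (b >> lo) & mask
        sum0, cout0 = ripple_add(a_blk, b_blk, 0, w)  # assume carry-in = 0
        sum1, cout1 = ripple_add(a_blk, b_blk, 1, w)  # assume carry-in = 1
        if carry:
            blk_sum, carry = sum1, cout1
        else:
            blk_sum, carry = sum0, cout0
        result |= blk_sum << lo
    return result, carry
```

In hardware both candidate sums of every block are computed in parallel, so the critical path is one block's ripple delay plus one mux per block. The cost is roughly double the adder area, which is relevant given that size is a secondary goal here.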

Updated Block Diagram
[Figure: block diagram. Blocks shown: input Reg, four "Compute 1 pixel" blocks, Find G, Int to FP, intermediate Reg, four "Serialize output" blocks, output Reg. Bus widths labeled: 7, 7, 4, 4, 4, 4, 16, 8.]

Compute 1 pixel
[Figure: datapath for one pixel. Labels shown: Shift >> 4, Lum p (7), Nzeros (5), L bias (6), 11-bit integer adder, 11-bit FP MULT, inputs G FP, R FP, B FP, outputs Rp, Gp, Bp.]

FP multiplier
[Figure: FP multiplier schematic. Labels shown: I-*, I-+, nor, ovrflw, >> 14, mux (0’s), mux (1’s), or, out, Vdd. Bus widths labeled: 5, 6, 11, 15, 10, 9, 7, 4, 1.]

I-FP
[Figure: integer to FP converter schematic. Labels shown: 0 Counter, <<, ~ +7, I [5:0], m (6), 3-bit 2’s Comp Exp [2:0], 2CE, [4:3], [10:6], 11-bit FP output.]

Find G
[Figure: Find G schematic. Labels shown: inputs R (7) and B (7), 7-bit Adder, Bitwise inv, +2, outputs B (7), R (7), G (7).]

Transistor count

Block                  Transistors
Registers              1,224
Find G                 434
Int to FP              4,224
Compute one pixel*     23,032 (5,758 per block)
Serializing Output     1,248 (312 per block)
Total                  30,162

*this is assuming 49 FA, which is an upper bound
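The figures above can be cross-checked with a few lines of arithmetic. The sketch assumes that the parenthesized numbers are per-instance counts and that "Compute one pixel" and "Serializing Output" are each instantiated four times, which matches the four copies in the block diagram; the dictionary layout and names are illustrative.

```python
# (transistors per instance, number of instances) for each block.
# The instance counts of 4 are an assumption based on the block diagram.
BLOCKS = {
    "Registers": (1224, 1),
    "Find G": (434, 1),
    "Int to FP": (4224, 1),
    "Compute one pixel": (5758, 4),
    "Serializing Output": (312, 4),
}


def total_transistors(blocks=BLOCKS):
    """Sum per-instance counts times instance counts over all blocks."""
    return sum(per_instance * count for per_instance, count in blocks.values())


if __name__ == "__main__":
    print(total_transistors())
```

Under these assumptions the sum reproduces the stated total, and the four pixel datapaths account for roughly three quarters of it.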

Initial Floorplan
[Figure: floorplan. Blocks shown: input Reg, four "Compute 1 pixel" blocks, Find G, Int to FP, intermediate Reg, four "Serial output" blocks, output Reg. Bus widths labeled: 7, 7, 4, 4, 4, 4, 16, 8.]

Problems and Questions
– Pipelining: it looks like one pipeline stage inside the FP multiplier and another just before it will do well. Need to make sure.
– Alternate designs for I-FP: it looks like a ROM is the way. Faster, and we only need 3 of them (or one triple-ported ROM) instead of 12 like we thought.
– How well do Wallace trees lay out? Carry-save multipliers are known to lay out very well; may simplify pipelining.
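To illustrate why a ROM is attractive for the integer to FP step, the sketch below precomputes the conversion for every possible input so that the hardware reduces to a single lookup. The number format here is hypothetical: a 7-bit unsigned input, a 3-bit stored exponent (0 reserved for zero), and a 6-bit mantissa with an implicit leading one. The team's actual 11-bit FP format is not recoverable from the slides and will differ.

```python
MANT_BITS = 6  # hypothetical mantissa width (implicit leading one)


def encode(value):
    """Convert a 7-bit unsigned integer to (stored_exponent, mantissa)."""
    if value == 0:
        return (0, 0)                       # exponent 0 is reserved for zero
    exp = value.bit_length() - 1            # position of the leading one, 0..6
    mant = (value << (MANT_BITS - exp)) & ((1 << MANT_BITS) - 1)
    return (exp + 1, mant)                  # stored exponent is biased by 1


# The "ROM": one precomputed entry per 7-bit input, 128 entries in total.
ROM = [encode(v) for v in range(128)]


def int_to_fp(value):
    """Lookup only; no shifting or counting at conversion time."""
    return ROM[value]


def fp_to_int(stored_exp, mant):
    """Inverse mapping, used here only to check the table."""
    if stored_exp == 0:
        return 0
    return ((1 << MANT_BITS) | mant) >> (MANT_BITS - (stored_exp - 1))
```

A table of this size replaces the counter and shifter in the I-FP diagram with a single read, and since R, G, and B are converted at the same time, three copies or one triple-ported ROM would serve all three channels.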
