MAD MAC 525: Final Presentation, 1st May, 2006. Farhan Mohamed Ali, Jigar Vora, Sonali Kapoor, Avni Jhunjhunwala. Design Manager: Zack Menegakis.

2 MAD MAC 525: Final Presentation
Farhan Mohamed Ali, Jigar Vora, Sonali Kapoor, Avni Jhunjhunwala
1st May, 2006
Design Manager: Zack Menegakis
Design a crucial part of a GPU called the Multiply Accumulate Unit (MAC), which is revolutionizing graphics

3 Agenda
- Marketing – Jigar
- Project and Algorithm Description – Farhan
- Implementation Part I – Farhan
- Implementation Part II – Sonali
- Floorplan – Sonali
- Layout – Avni
- Verification – Avni
- Design Specifications – Avni
- Conclusion – Jigar

4 Marketing – Jigar

5 Purpose
- MAD MAC 525 accelerates FP16 blending to enable true HDR graphics
- Huh??

7 Beauty of High Dynamic Range
- With HDR rendering, pixel intensity can extend beyond the range of traditional graphics
- Nature doesn’t have a limited pixel intensity and neither should Computer Graphics
- In other words:
  - Bright things can be really bright
  - Dark things can be really dark
  - And the details can be seen in both

8 Applications of HDR

9 Target Market
Target Market Segment
- Graphics chip manufacturers
- High speed DSP manufacturers
- CPU co-processors
Potential Customers

10 Design Comparison
- Top 180nm graphics chip is the NVIDIA NV16
  - Highest speed only 250MHz
  - 9-bit integer precision
- As games become more advanced, they need faster graphics chips
- Conclusion: Market Needs a FAST MAD MAC

11 Description and Implementation I – Farhan

12 Project Description: Multiply Accumulate unit (MAC)
- Executes the function AB+C on 16-bit floating-point inputs
- Format: 1-bit sign, 5-bit exponent and 10-bit significand
- Multiply and add proceed in parallel to greatly speed up the operation
- Rounding is performed only once, so accuracy is greater than with individual multiply and add functions
- Also known as:
  - Fused Multiply Add (FMA)
  - Multiply Add (MAD/MADD) in graphics shader programs
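
The format and the single-rounding claim can be illustrated with a minimal sketch. It assumes Python's `struct` half-precision codec (`"e"`) as a software stand-in for the hardware rounder; the helper names and the operand values are hypothetical and chosen only so that the product needs more than 10 fraction bits.

```python
import struct

def to_half(x):
    """Round a Python float to the nearest FP16 value (round-half-to-even)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def fields(x):
    """Return (sign, exponent field, significand field) of x encoded as FP16."""
    bits = struct.unpack("<H", struct.pack("<e", x))[0]
    return bits >> 15, (bits >> 10) & 0x1F, bits & 0x3FF

# Hypothetical operands (not from the slides), all exactly representable in FP16.
a = 1 + 1 / 1024   # 1.0009765625
b = 1 + 3 / 1024   # 1.0029296875
c = -1.0

# Separate multiply and add: round after each operation.
separate = to_half(to_half(a * b) + c)
# Fused: float64 holds a*b+c exactly for these operands, so this rounds once.
fused = to_half(a * b + c)

print(fields(a))        # 1-bit sign, 5-bit exponent, 10-bit significand fields
print(separate, fused)  # the two results differ in the last FP16 bit
```

With these operands the separate version loses the low bits of the product before the subtraction, while the fused version keeps them until the single final rounding.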

13 Algorithm
FP Multiply (A*B)
- Multiply significands
- Add exponents
- Normalize
- Round
FP Add (A+B)
- Align smaller number to larger number
- Add significands
- Normalize
- Round

14 Algorithm
FP Multiply-Add (AB+C)
- Align sig C based on exp A+B-C
- Multiply significands A and B
- Add sig A*B result to aligned sig C
- Normalize
- Round
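
The multiply-add steps above can be sketched in software as follows. This is a simplified model, not the team's datapath: it assumes positive normal operands only, uses exact Python integers instead of a fixed-width aligner with sticky bits, ignores overflow and subnormals, and all function names are hypothetical.

```python
import struct

def to_half(x):
    """Reference rounding to FP16 (round-half-to-even) via the struct codec."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def decode(x):
    """Split a positive normal FP16 value into (11-bit significand, exponent of its LSB)."""
    bits = struct.unpack("<H", struct.pack("<e", x))[0]
    exp_field, frac = (bits >> 10) & 0x1F, bits & 0x3FF
    assert bits >> 15 == 0 and 0 < exp_field < 31, "sketch handles positive normals only"
    return frac | 0x400, exp_field - 15 - 10   # restore hidden 1, remove bias

def fma_sketch(a, b, c):
    sa, ea = decode(a)
    sb, eb = decode(b)
    sc, ec = decode(c)
    # Multiply significands A and B; the product exponent is exp A + exp B.
    sp, ep = sa * sb, ea + eb
    # Align sig C against the product using the exponent difference.
    shift = ec - ep
    if shift >= 0:
        total, e = sp + (sc << shift), ep
    else:
        total, e = (sp << -shift) + sc, ec
    # Normalize to 11 significant bits and round once (round-half-to-even).
    drop = total.bit_length() - 11
    kept, rem = total >> drop, total & ((1 << drop) - 1)
    half = 1 << (drop - 1)
    if rem > half or (rem == half and kept & 1):
        kept += 1
    if kept == 1 << 11:          # rounding carried out of the significand
        kept, drop = kept >> 1, drop + 1
    return kept * 2.0 ** (e + drop)

print(fma_sketch(1.5, 1.5, 0.5))   # 1.5 * 1.5 + 0.5
```

For operands of moderate magnitude the result can be checked against `to_half(a * b + c)`, since float64 holds the intermediate sum exactly in that range.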

15 Block Diagram
[Figure: block diagram with inputs A, B, C; blocks Multiplier, Exp Calc, Align, Adder, Leading 0 Anticipator, Normalize, Round, Ovf Checker; output Y]

16 Implementation
- Design target: 300MHz
- Speed is the design goal
- Ambitious target?
- How we planned to achieve this:
  - Fast logic: parallelize ops as much as possible
  - Pipelining

17 Implementation: Adder
- Carry Select vs Carry Lookahead tree

18 Implementation: Adder
- Han-Carlson based carry lookahead adder
- 6 lookahead logic stages for 32-bit adder
- Less logic than a Kogge-Stone adder
- Less wiring than a Brent-Kung adder
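
A bit-level model may help show where the 6 stages come from. The sketch below follows the textbook Han-Carlson structure (a Kogge-Stone prefix tree on the odd bit positions, plus one final stage for the even positions); it is an assumption that the team's adder matches this exact arrangement, and the function name is hypothetical.

```python
def han_carlson_add(a, b, width=32):
    """Add two unsigned integers with a Han-Carlson prefix network (carry-in = 0).

    Returns (sum mod 2**width, carry out, number of prefix stages).
    """
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]     # bit generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]   # bit propagate
    G, P = g[:], p[:]
    stages = 0

    # Stage 1: each odd position absorbs its even neighbour below.
    for i in range(1, width, 2):
        G[i], P[i] = g[i] | (p[i] & g[i - 1]), p[i] & p[i - 1]
    stages += 1

    # Kogge-Stone stages on odd positions only, spans 2, 4, 8, ...
    d = 2
    while d < width:
        Gp, Pp = G[:], P[:]          # all cells in a stage read the previous stage
        for i in range(1, width, 2):
            if i - d >= 0:
                G[i] = Gp[i] | (Pp[i] & Gp[i - d])
                P[i] = Pp[i] & Pp[i - d]
        stages += 1
        d *= 2

    # Final stage: even positions take the completed carry from the odd bit below.
    for i in range(2, width, 2):
        G[i] = g[i] | (p[i] & G[i - 1])
    stages += 1

    carries = [0] + G[:-1]           # carry into bit i is the group generate of bits 0..i-1
    total = sum((p[i] ^ carries[i]) << i for i in range(width))
    return total, G[width - 1], stages

print(han_carlson_add(0xFFFFFFFF, 1))   # wraps to 0 with carry out, in 6 stages
```

Only half the bit positions carry prefix cells through the middle stages, which is the source of the logic saving relative to Kogge-Stone, at the cost of the one extra final stage.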

19 Implementation: Multiplier
- Carry-Save Multiplier
  - Avoids having ripple carry in every stage
  - Enables regular and compact layout
  - Easy to pipeline
- Final 10-bit add stage using carry lookahead adder
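
The carry-save idea can be sketched with integers standing in for the rows of the array. This is an illustrative model only: the 11-bit default width assumes the 10-bit significand plus an implicit leading 1, and the final `s + c` stands in for the carry lookahead add stage without modelling its width.

```python
def carry_save_multiply(a, b, width=11):
    """Multiply two unsigned `width`-bit integers using carry-save accumulation."""
    s, c = 0, 0   # running sum vector and carry vector
    for i in range(width):
        pp = (a << i) if (b >> i) & 1 else 0   # partial product for bit i of b
        # 3:2 compression applied bitwise: each column produces a sum bit and a
        # carry bit for the next column, so no carry ripples within a row.
        s, c = s ^ c ^ pp, ((s & c) | (s & pp) | (c & pp)) << 1
    return s + c   # the only carry-propagate addition

print(carry_save_multiply(1025, 1027))
```

Each loop iteration keeps the invariant that `s + c` equals the sum of the partial products seen so far, which is why a single carry-propagate add at the end is enough.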

20 Implementation: Leading Zero Anticipator
- Predicts the number of shifts to perform in normalize
- Normalize begins with zero delay
- Operates in parallel with the adder, so normalize shifts can be predicted to within 1 shift to the left or right

21 Implementation: Latches
- Pulse Latches
  - Practically eliminates setup time
  - 16 transistors per pulse generator
  - Simplified version of those used in a certain high speed CPU
[Figure: Clock pulse generator]

22 Implementation II and Floorplan – Sonali

23 Design Decision: Pass Logic
- Extensive use of Pass Logic
  - Reduces transistor count
  - Reduces area
- Transistor count reduced from 20,200 to 12,800
- Example
  - Normalize: 3400 -> 942
  - Align: 1500 -> 530
- Ensure all pass logic is buffered

24 Design Decision: Pipelining
- Initially planned 6 pipeline stages
- Reduced to 4 pipeline stages
  - Adder – Fast Carry Lookahead architecture
  - Multiplier – Ripple Carry to Carry Lookahead

25 Pipeline Stages
[Figure: pipeline diagram showing Reg A, Reg B, Reg C, Multiplier, Exp Calc, Align C, Adder, Ld Zero, Normalize, Round, Output]

26 Schematics: Multiplier
[Figure: multiplier schematic with labels: Inputs, Pipeline, Outputs]

27 Schematic: Adder
[Figure: adder schematic with labels: Inputs, Look Ahead Logic, Sum Logic, Outputs]

28 Floorplan Evolution
[Figure: Initial Floorplan, showing Reg A, Reg B, Reg C, Multiplier, Align C, Exp Calc, Pipeline Reg, Adder, Ld Zero, Pipeline Reg, Normalize, Round, Overflow checker, Reg Y]

29 Floorplan Evolution
[Figure: Final Floorplan, showing Reg A, Reg B, Reg C, Exponents, Align, Multiplier, Adder, Ld zero, Normalize, Round, Ovf, Output]

30 Layout, Verification & Specification – Avni

31 Layout Decisions
- 3 cell heights – 6.03, 5.04 and 3.55
- Uniform width vdd and ground rails
- Wider vdd and ground rails in power hungry modules
- Max of 8 latches per clock pulse generator
- Uniform metal directionality within each block

32 Final Layout

33 Final Layout
[Figure: final layout with the MULTIPLIER labeled]

34 Multiplier
- Height: 191.6
- Width: 206.38
- Area: 20,388
[Figure: multiplier layout with labels: In, Pipeline Reg, Bitslice, Output]

35 Final Layout
[Figure: final layout with the MULTIPLIER and ADDER labeled]

36 Adder
- Height: 122.9
- Width: 110.2
- Area: 13,202
[Figure: adder layout with labels: Adder, Incrementer]

37 Final Layout
[Figure: final layout with labels: Input, Exponents, Align, Ld zero, Adder, Multiplier, Normalize, Round, Ovf, Out]

38 Layer Masks
Active: 14.04%

39 Layer Masks
Poly: 9.25%

40 Layer Masks
Metal 1: 34.08%

41 Layer Masks
Metal 2: 18.00%

42 Layer Masks
Metal 3: 14.99%

43 Layer Masks
Metal 4: 6.23%

44 Verification Of Design
- Behavioral and Structural Verilog
- Extensive Testing – Unable to find C or Matlab Code
- Schematic and Layout testing
- Analog Simulations – Compare Output with Behavioral
- Full Chip Verification

45 Design Specifications
- Critical path delay = 2.25 ns
- Clock speed = 400 MHz
- Pipeline stages = 4
- Height by width = 195.26 um x 303.255 um
- Area = 59,214 um^2
- Aspect ratio = 1:1.55
- Transistor density = 0.22
- Total Pin Count = 67
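
The specification figures are mutually consistent, as a short arithmetic check shows. Two values here are not on this slide and are assumptions of the sketch: the transistor total (12,820) is taken from the deck's area table, and the latency figure assumes one clock cycle per pipeline stage.

```python
critical_path_ns = 2.25
clock_mhz = 400
stages = 4
height_um, width_um = 195.26, 303.255
transistors = 12_820   # total from the deck's area/transistor table

max_clock_mhz = 1e3 / critical_path_ns   # fastest clock the critical path allows
latency_ns = stages * 1e3 / clock_mhz    # assumes one cycle per pipeline stage
area_um2 = height_um * width_um
aspect = width_um / height_um
density = transistors / area_um2         # transistors per um^2

print(f"max clock  {max_clock_mhz:.0f} MHz (spec: {clock_mhz} MHz)")
print(f"latency    {latency_ns:.1f} ns for {stages} stages")
print(f"area       {area_um2:.0f} um^2, aspect 1:{aspect:.2f}")
print(f"density    {density:.2f} transistors/um^2")
```

The 2.25 ns critical path would permit a clock slightly above the quoted 400 MHz, so the quoted clock leaves some margin.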

46 Power (mW)
Block                    Schematic, 400 MHz  Layout, 400 MHz  Schematic, 100 MHz  Layout, 100 MHz
Multiplier w/ pipeline   2.281               2.354            0.6168              0.6297
Exponents                0.3514              0.4094           0.0875              0.1029
Align                    0.0782              0.0926           0.0278              0.0324
Adder                    4.471               4.896            1.118               1.232
Leading 0                0.1313              0.1722           0.033               0.0433
Normalize                0.5865              0.6238           0.1481              0.1692
Round                    0.6339              0.6782           0.1593              0.1709
OvfCheck                 0.1632              0.1666           0.0408              0.04165
Total                    12.25               13.008           3.065               3.297
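
One way to read the power table is to compare each block at 400 MHz and 100 MHz. Dynamic CMOS power scales roughly linearly with clock frequency, so a ratio near 4 would be expected; the sketch below simply computes the ratios from the figures in the table (the dictionary layout is an illustrative choice, not from the slides).

```python
# block: (schematic @400 MHz, layout @400 MHz, schematic @100 MHz, layout @100 MHz), in mW
power_mw = {
    "Multiplier": (2.281, 2.354, 0.6168, 0.6297),
    "Exponents": (0.3514, 0.4094, 0.0875, 0.1029),
    "Align": (0.0782, 0.0926, 0.0278, 0.0324),
    "Adder": (4.471, 4.896, 1.118, 1.232),
    "Leading 0": (0.1313, 0.1722, 0.033, 0.0433),
    "Normalize": (0.5865, 0.6238, 0.1481, 0.1692),
    "Round": (0.6339, 0.6782, 0.1593, 0.1709),
    "OvfCheck": (0.1632, 0.1666, 0.0408, 0.04165),
    "Total": (12.25, 13.008, 3.065, 3.297),
}

ratios = {
    name: (s400 / s100, l400 / l100)
    for name, (s400, l400, s100, l100) in power_mw.items()
}

for name, (schematic, layout) in ratios.items():
    print(f"{name:10s} schematic x{schematic:.2f}  layout x{layout:.2f}")
```

The chip totals come out close to 4x for a 4x change in clock, while individual blocks vary around that figure.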

47 Area, Transistor Count and Delay
Block             Area (um^2)  Transistor count  Transistor density  Schematic delay (ns)    Layout delay (ns)
Multiplier        20,388       4,496             0.22                3.38 (1.9 w/ pipeline)  N/A (2.25 w/ pipeline)
Exponents         5,163        738               0.14                1.01                    1.2
Align             3,995        500               0.13                0.480                   0.637
Adder             13,202       3,174             0.24                1.34                    1.7
Leading 0         1,253        364               0.29                0.506                   0.551
Normalize         3,190        942               0.3                 0.407                   0.437
Round             1,802        494               0.28                0.864                   0.986
OvfCheck          200          70                0.35                0.453                   0.475
Registers, etc.   N/A          2,038             N/A                 0.179                   0.193
Total             59,214       12,820            0.22                -                       -

48 Conclusion – Jigar

49 Everyone Needs a MAD MAC
- Graphics – HDR Rendering, Blending and Shader ops
- Fastest 180nm GPU: 250 MHz (9-bit Int)
- MAD MAC 525: 400 MHz (16-bit FP)

50 Everyone Needs a MAD MAC
- DSPs – Computing Vector Dot-Products in Digital Filters
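
A digital-filter dot product maps directly onto a chain of multiply-accumulate operations, one per tap. The sketch below models each MAC as a*b+c rounded once to FP16; the helper names, coefficients and samples are hypothetical, and the model assumes values small enough that float64 holds each intermediate a*b+c exactly.

```python
import struct

def to_half(x):
    """Round a Python float to the nearest FP16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def mac(a, b, c):
    """One multiply-accumulate: a*b+c with a single rounding to FP16."""
    return to_half(a * b + c)

def fir_output(coeffs, samples):
    """One FIR filter output: the dot product of coefficients and samples."""
    acc = 0.0
    for h, x in zip(coeffs, samples):
        acc = mac(h, x, acc)   # the accumulator feeds back as the C input
    return acc

coeffs = [0.25, 0.5, 0.25]   # hypothetical 3-tap smoothing filter
window = [1.0, 2.0, 3.0]     # hypothetical input samples
print(fir_output(coeffs, window))
```

An N-tap filter needs N such operations per output sample, which is why MAC throughput matters for DSP workloads.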

51 Everyone Needs a MAD MAC
- Enables Fast Division, Square Root
- Eliminates extra hardware to handle such computation
- Available in many new CPUs such as STI’s Cell
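
The slide does not say how division is built from multiply-adds; one common approach is Newton-Raphson iteration on the reciprocal, sketched here as an assumption. The `fma` function is a plain float64 stand-in (multiply then add, two roundings) rather than a true fused operation, and the linear starting estimate is a standard textbook choice, not taken from the deck.

```python
import math

def fma(a, b, c):
    """Stand-in for a hardware multiply-add; NOT fused in this software model."""
    return a * b + c

def reciprocal(d, iterations=4):
    """Approximate 1/d for d > 0 using only multiply-add steps."""
    m, e = math.frexp(d)               # d = m * 2**e with 0.5 <= m < 1
    x = fma(-32 / 17, m, 48 / 17)      # linear first estimate of 1/m
    for _ in range(iterations):
        err = fma(-m, x, 1.0)          # err = 1 - m*x
        x = fma(x, err, x)             # x = x + x*err; the error roughly squares each step
    return math.ldexp(x, -e)           # undo the scaling: 1/d = (1/m) * 2**-e

print(reciprocal(3.0))
```

Each iteration costs two multiply-add operations, so a unit like this one could serve division without a dedicated divider; square root can be handled with a similar iteration.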

52 Future Enhancements
- 16 to 32 Bits
- Newer process technology
- Possible modifications for low power apps

53 Everyone Wants A MAD MAC 525

