Random Number Generator Dmitriy Solmonov W1-1 David Levitt W1-2 Jesse Guss W1-3 Sirisha Pillalamarri W1-4 Matt Russo W1-5 Design Manager – Thiago Hersan.


1 Random Number Generator
  Dmitriy Solmonov (W1-1), David Levitt (W1-2), Jesse Guss (W1-3), Sirisha Pillalamarri (W1-4), Matt Russo (W1-5)
  Design Manager – Thiago Hersan
  May 1, 2006

2 Why Random Numbers?
  Real-Time Simulations
  Encryption
  Gambling

3 Encryption
  Need random numbers for authentication
  Key generation
  Software vs. Hardware
    –Less power/time per number
    –Portable
Gambling
  ePoker Rooms
  SoC Deck Generation
  Other future casino games

4 Business Plan
  Potential markets
    –Defense and Intelligence Organizations
    –E-Gambling / Casinos
    –Game Consoles
    –Mobile Communication
  License the IP
    –Our design will be part of a larger ASIC or GPP design

5 IBAA Algorithm
  Similar in design to the RC4 encryption algorithm
    –Cryptographically secure
    –Deterministic
  1024-bit number generated
  Internally Updated Seed
    –not user visible = secure

6 The IBAA Algorithm
#define ALPHA (8)
#define SIZE (1<<ALPHA)                    /* 256 loop iterations */
#define ind(x) ((x)&(0x1F))                /* index into the 32-word memories */
#define barrel(a) (((a)<<19)^((a)>>13))    /* mix of a left and a right shift */
uint32 A, B, Y, X;
uint32 M[32], R[32];                       /* M: internal state, R: results */
…
for ( i=0; i<SIZE; i++ ) {
  X = M[ind(i)];
  A = barrel(A) + M[ind(i +16)];
  M[ind(i)] = Y = M[ind(X)] + A + B;
  R[ind(i)] = B = M[ind(Y>>ALPHA)] + X;
}
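A minimal, self-contained C sketch of one generation pass, assuming the loop on the slide above with the barrel shift read as (a<<19)^(a>>13). The ibaa_state struct, ibaa_seed, and the seeding rule are hypothetical choices made for illustration only; the slides do not show how M is initialised.

```c
#include <stdint.h>
#include <string.h>

#define ALPHA 8
#define SIZE (1 << ALPHA)                 /* 256 loop iterations per pass */
#define WORDS 32                          /* 32 x 32 bits = 1024-bit result */
#define ind(x) ((x) & (WORDS - 1))        /* same mask as the slide's 0x1F */
#define barrel(a) (((a) << 19) ^ ((a) >> 13))

/* Hypothetical container for the generator state (not from the slides). */
typedef struct {
    uint32_t M[WORDS];   /* internal memory, i.e. the internally updated seed */
    uint32_t R[WORDS];   /* result words */
    uint32_t A, B;       /* accumulator and previous result */
} ibaa_state;

/* Assumed seeding rule for this example only: spread one word over M. */
void ibaa_seed(ibaa_state *s, uint32_t seed) {
    memset(s, 0, sizeof *s);
    for (uint32_t i = 0; i < WORDS; i++)
        s->M[i] = seed * (i + 1u);
}

/* One pass of the loop from the slide: updates M in place and fills R. */
void ibaa_round(ibaa_state *s) {
    uint32_t X, Y;
    for (uint32_t i = 0; i < SIZE; i++) {
        X = s->M[ind(i)];
        s->A = barrel(s->A) + s->M[ind(i + 16)];
        s->M[ind(i)] = Y = s->M[ind(X)] + s->A + s->B;
        s->R[ind(i)] = s->B = s->M[ind(Y >> ALPHA)] + X;
    }
}
```

Two states seeded identically produce identical R arrays after ibaa_round, which is the "deterministic" property listed on the IBAA slide. In this sketch an all-zero state stays all zero, so whatever supplies the seed would need to avoid it.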

7 Architecture

8 IBAA Algorithm to Architecture
for ( i=0; i<SIZE; i++ ) {
  X = M[ind(i)];
  A = barrel(A) + M[ind(i +16)];
  M[ind(i)] = Y = M[ind(X)] + A + B;
  R[ind(i)] = B = M[ind(Y>>ALPHA)] + X;
}
  Per iteration: 4 reads from M, 1 write to M, 1 write to R
  Dependencies, feedback, and RAW hazards

9 Algorithm to Architecture
  Hardware Limits
    –Max. of 2 simultaneous reads from memory
  Can’t do better than two stages
  Each stage must take multiple cycles to complete

10 Algorithm to Architecture
  Chosen Timing
    –Addition = 1 cycle
    –Memory Read = 0.5 cycles
    –Memory is clocked half a period out of phase
    –Set address and receive data in 1 cycle
  When forwarding is applied, need 4 cycles per stage

11 Architecture block diagram
  Diagram labels: SRAM (M), SRAM (R), FSM, Counter, Control Logic, adders, and registers X, B, Y, Y1, A, M1, M2, M3, M4
  Stage 1
    M1 = M[i+16]
    X = M[i] | A = M1 + barrel(A)
    M3 = M[X] | C1 = (X == i-1)
    Y1 = A + (C1 ? Y : M3)
  Stage 2
    Y = B + Y1
    M4 = M[Yaddr] | C2 = (i == Yaddr)
    B = X + (C2 ? Y : M4)
    M[i] = Y | R[i] = B
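The forwarding conditions C1 and C2 in the schedule above can be checked in software. The following is a sketch under stated assumptions, not the team's Verilog: it models the overlap by letting stage 1 of iteration i read M before the write from iteration i-1 has landed. The function names and the i > 0 guard on C1 are hypothetical additions for this example.

```c
#include <stdint.h>

#define ALPHA 8
#define SIZE (1 << ALPHA)
#define WORDS 32
#define ind(x) ((x) & (WORDS - 1))
#define barrel(a) (((a) << 19) ^ ((a) >> 13))

/* Reference: the sequential loop from the algorithm slide. */
void ibaa_reference(uint32_t M[WORDS], uint32_t R[WORDS],
                    uint32_t *A, uint32_t *B) {
    uint32_t X, Y;
    for (uint32_t i = 0; i < SIZE; i++) {
        X = M[ind(i)];
        *A = barrel(*A) + M[ind(i + 16)];
        M[ind(i)] = Y = M[ind(X)] + *A + *B;
        R[ind(i)] = *B = M[ind(Y >> ALPHA)] + X;
    }
}

/* Two-stage model with forwarding, following the stage schedule. */
void ibaa_two_stage(uint32_t M[WORDS], uint32_t R[WORDS],
                    uint32_t *A, uint32_t *B) {
    uint32_t Y = 0;  /* Y of the previous iteration, not yet written to M */
    for (uint32_t i = 0; i < SIZE; i++) {
        /* Stage 1: reads see M without the pending write to M[i-1]. */
        uint32_t M1 = M[ind(i + 16)];
        uint32_t X  = M[ind(i)];
        *A = M1 + barrel(*A);
        uint32_t M3 = M[ind(X)];                      /* possibly stale */
        int C1 = (i > 0) && (ind(X) == ind(i - 1));   /* guard is assumed */
        uint32_t Y1 = *A + (C1 ? Y : M3);
        if (i > 0)
            M[ind(i - 1)] = Y;                        /* pending write lands */
        /* Stage 2: M[i] has not been written with the new Y yet. */
        Y = *B + Y1;
        uint32_t Yaddr = ind(Y >> ALPHA);
        uint32_t M4 = M[Yaddr];
        int C2 = (ind(i) == Yaddr);
        *B = X + (C2 ? Y : M4);
        R[ind(i)] = *B;
    }
    M[ind(SIZE - 1)] = Y;                             /* last pending write */
}
```

Running both functions from the same starting M, A, and B should leave identical M, R, A, and B, which is the same kind of comparison the Structural Verilog vs. C check describes.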

12 Design For Manufacture Regular Fabrics

16 Why DFM?
  Ability to print on smaller processes
  Robust Manufacturability
  Sacrifice area, speed and metal layers for a regular design

17 Sample Layout: Regular Fabrics

18 Lithography Simulations

19 Hardware

20 Adder
  Four adders execute 256 times.
  Hybrid adder
  Fast and low power.
  Block diagram, MSB to LSB:
    –CS4: A[31:28], B[31:28] → S[31:28]
    –CS18: A[27:10], B[27:10] → S[27:10]
    –CS6: A[9:4], B[9:4] → S[9:4]
    –CS4: A[3:0], B[3:0] → S[3:0]
  Carries between blocks: C’[4], C[10], C’[28], C[32]
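The 4/6/18/4-bit partition above can be illustrated with a small C model. This is only a sketch of how the block boundaries and inter-block carries fit together, assuming plain ripple of the carry from block to block; it does not model the carry logic inside the CS blocks or the inverted carries C’[4] and C’[28]. The names add_result and block_add32 are hypothetical.

```c
#include <stdint.h>

/* Sum and final carry out (C[32]) of a 32-bit addition. */
typedef struct {
    uint32_t sum;
    unsigned carry_out;
} add_result;

/* Add a and b block by block, LSB first: [3:0], [9:4], [27:10], [31:28]. */
add_result block_add32(uint32_t a, uint32_t b) {
    const unsigned width[4] = {4, 6, 18, 4};
    add_result r = {0u, 0u};
    unsigned lo = 0;        /* bit position where the current block starts */
    uint32_t carry = 0;     /* carry into the current block */

    for (int k = 0; k < 4; k++) {
        uint32_t mask = (1u << width[k]) - 1u;
        uint32_t t = ((a >> lo) & mask) + ((b >> lo) & mask) + carry;
        r.sum |= (t & mask) << lo;        /* block sum bits S[hi:lo] */
        carry = (t >> width[k]) & 1u;     /* carries at bits 4, 10, 28, 32 */
        lo += width[k];
    }
    r.carry_out = (unsigned)carry;
    return r;
}
```

For any inputs, block_add32(a, b).sum equals the native 32-bit sum a + b, and carry_out is set exactly when that sum wraps around.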

21 32-Bit Adder: First 4 Bits

22 32-Bit Adder: CS6 Block

23 32-Bit Adder: CS18 Block

24 32-Bit Fast Adder

25 Adder Performance
  Delay: 1.56 ns
  Energy Consumption
    –(worst case switching): 12.4 pJ
  Power Dissipation
    –(estimating with our switch factor): 148 μW

26 SRAM
  Single Bus Cell
  Double Bus Cell

27 SRAM

28 Functional Verification
  Structural Verilog vs. C Code:
    –Generate numbers under equal load conditions
    –Compare Numbers
  Schematic vs. Structural Verilog
    –Under equal inputs, check if port outputs match
  LVS

29 Verification
  Schematic and Extracted Parasitic SPICE simulations of major blocks
    –Check for clean signals
    –Check delays and rise/fall times
  Extracted Parasitic simulation of critical Register-Register Path
    –Signals are clean
    –Delay = 2.1 ns
  Extracted Parasitic simulation of chip clock distribution

30 Critical Delay

31 Final Layout

32 Poly Density 7.52%, Metal1 Density 20.85%

33 Metal2 Density 19.89%, Metal3 Density 18.76%

34 Metal5 Density 6.8%, Metal4 Density 9.36%

35 Analysis

36 Specifications
  Pins
    –36 input pins: 32-bit seed input, gen, read, rst, clk
    –34 output pins: 32-bit random output, rdy, done
    –2 input/output pins: vdd, gnd
  475 MHz chip speed
  436 kHz throughput
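The clock speed and throughput figures can be related to the loop structure with a little arithmetic. The sketch below assumes that the 436 kHz figure counts complete 256-iteration passes and that, with the two stages overlapped, each iteration costs 4 cycles in steady state; neither assumption is stated on the slides. The function names are hypothetical.

```c
#include <stdint.h>

/* Whole clock cycles available per result, given both rates in Hz. */
uint32_t cycles_per_result(uint32_t clock_hz, uint32_t results_per_second) {
    return clock_hz / results_per_second;
}

/* Cycles spent in the main loop: iterations times cycles per iteration. */
uint32_t loop_cycles(uint32_t iterations, uint32_t cycles_per_iteration) {
    return iterations * cycles_per_iteration;
}
```

With the slide's numbers, cycles_per_result(475000000, 436000) gives 1089 cycles per result, and loop_cycles(256, 4) gives 1024 cycles for the loop itself. The slides do not break down the remaining cycles.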

37 Putting it All Together
  Paired values are Schematic / ExtractRC.
  Part | Trans Count | Area (μm²) | Density | Prop Delay (ns) | Power (1x) at 500 MHz (mW) | Power (Avg) at 475 MHz (mW)
  Adders (4) | 5,856 (1,464 ea.) | 25,200 (6,300 ea.) | 0.232 | 1.45 / 1.56 | 0.60 / 0.62 | 0.14 / 0.148
  SRAM (M&R) | 17,736 (M=10,458 R=7,278) | 51,000 (M=35,000 R=16,000) | 0.348 (M=0.293 R=0.456) | 0.735 / 0.845 | W: 0.51 / 3.25, R: 0.19 / 1.40 | 0.27 / 1.86
  Regs (10) | 6,400 (640 ea.) | 38,400 (3,840 ea.) | 0.167 | 0.220 / 0.275 | 0.53 / 0.59 | 0.13 / 0.145
  Total | 33,371 | 182,000 | 0.194 | 2.1 (475 MHz) | ----- | 4.1
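The adder and register rows suggest what switch factor was used for the average power. The sketch below reads "Power (1x)" as power with the block active every cycle and assumes power scales linearly with clock frequency and activity; both readings are assumptions, and the function name is hypothetical.

```c
/* Activity factor implied by a full-activity power figure and an average
   power figure, each quoted at its own clock frequency.
   Assumes power is proportional to frequency times activity. */
double implied_activity(double p_full_mw, double f_full_mhz,
                        double p_avg_mw, double f_avg_mhz) {
    double p_full_at_avg_clock = p_full_mw * (f_avg_mhz / f_full_mhz);
    return p_avg_mw / p_full_at_avg_clock;
}
```

For the extracted adder figures, implied_activity(0.62, 500.0, 0.148, 475.0) comes out close to 0.25, and the register figures (0.59 and 0.145) give a similar value. That would be consistent with each block doing useful work in about one of the four cycles per stage, though the slides do not say so.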

38 Performance Comparison
  Operation time for ~4,000,000 runs
  Platform | Clock | Process | Time (ms)
  Intel P4 | 3.20 GHz | 90 nm | 5000
  W1-2006 | 475 MHz | 180 nm | 9000
  AMD Opteron Blade | 1.005 GHz | () | 14000
  ARM Intel XScale | 700 MHz | () | 125000
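Dividing out the clock rates turns the comparison into clock cycles per run, which separates architectural efficiency from raw frequency. This is a rough sketch: the run count is approximate on the slide and the times look rounded, so the results are estimates only. The function name is hypothetical.

```c
/* Approximate clock cycles spent per run, given the total time in ms,
   the clock in MHz, and the number of runs. */
double cycles_per_run(double time_ms, double clock_mhz, double runs) {
    double seconds = time_ms * 1e-3;
    double hertz = clock_mhz * 1e6;
    return seconds * hertz / runs;
}
```

With the slide's figures this gives roughly 1,070 cycles per run for the W1-2006 design, about 4,000 for the Intel P4, about 3,500 for the AMD Opteron, and about 21,900 for the XScale. The W1-2006 value is in line with the 475 MHz clock and 436 kHz throughput quoted in the specifications.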

39 Where to Now?
  ERC, tapeout, etc.
  Thermal noise unit to use as input seed
  On-Chip Bus Interface
  HyperTransport™ Interface

40 References
  Jenkins, Robert J. “ISAAC”. http://burtleburtle.net/bob/rand/isaac.html
  Chirca, Schulte, Glossner, et al. “A Static Low-Power, High-Performance 32-bit Carry Skip Adder”. http://mesa.ece.wisc.edu/publications/cp_2004-12.pdf
  “CLA and Ling Adders”. http://umunhum.stanford.edu/~farland/notes.html

41 Questions

