1 Lucas-Lehmer Primality Tester Team: W-4 Nathan Stohs W4-1 Brian Johnson W4-2 Joe Hurley W4-3 Marques Johnson W4-4 Design Manager: Prateek Goenka.


1 1 Lucas-Lehmer Primality Tester Team: W-4 Nathan Stohs W4-1 Brian Johnson W4-2 Joe Hurley W4-3 Marques Johnson W4-4 Design Manager: Prateek Goenka

2 2 Agenda Background (Marques) Project Description (Marques) Algorithmic Description (Joe) Data Flow/Block Diagram (Joe) Design Process (Nathan) Simulations (Nathan) Floorplan/Layout (Brian) Conclusions (Brian)

3 3 History of 2^P - 1 In the 16th century it was believed that 2^P - 1 was prime for all prime P. In 1536 Hudalricus Regius proved that 2^11 - 1 was not prime. The French monk Marin Mersenne published Cogitata Physica-Mathematica, in which he stated that 2^P - 1 was prime for P = 2, 3, 5, 7, 13, 17, 19, 31, 67, 127 and 257.

4 4 Lucas-Lehmer François Édouard Anatole Lucas: in 1876 he proved that the number 2^127 - 1 is prime using his own methods. Derrick Lehmer: in 1930 he refined Lucas's method.

5 5 Make History December 2005: 43rd Known Mersenne Prime Found!! Dr. Curtis Cooper and Dr. Steven Boone, professors at Central Missouri State University: 2^30,402,457 - 1

6 6 Prime Number Competitions Electronic Frontier Foundation awards to the first individual or group who discovers a prime number with at least:
- 1,000,000 decimal digits: $50,000 (awarded Apr. 6, 2000)
- 10,000,000 decimal digits: $100,000
- 100,000,000 decimal digits: $150,000
- 1,000,000,000 decimal digits: $250,000

7 7
rank  prime                    digits   who  when  reference
1     2^30402457 - 1           9152052  G9   2005  Mersenne 43
2     2^25964951 - 1           7816230  G8   2005  Mersenne 42
3     2^24036583 - 1           7235733  G7   2004  Mersenne 41
4     2^20996011 - 1           6320430  G6   2003  Mersenne 40
5     2^13466917 - 1           4053946  G5   2001  Mersenne 39
6     27653 * 2^9167433 + 1    2759677  SB8  2005
7     28433 * 2^7830457 + 1    2357207  SB7  2004
8     2^6972593 - 1            2098960  G4   1999  Mersenne 38
9     5359 * 2^5054502 + 1     1521561  SB6  2003
10    4847 * 2^3321063 + 1     999744   SB9  2005

8 8 Mersenne Prime Algorithm Only used for numbers of the form 2^P - 1. For P > 2, 2^P - 1 is prime if and only if S_{P-2} is zero in this sequence: S_0 = 4, S_N = (S_{N-1}^2 - 2) mod (2^P - 1)
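
The recurrence above can be sketched directly in C. This is a minimal software illustration, not the team's program: the function name lucas_lehmer and the limit p <= 31 (chosen so that a squared residue fits in 64 bits) are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if 2^p - 1 passes the Lucas-Lehmer test,
   0 otherwise. Assumes p is an odd prime with 3 <= p <= 31, so that the
   square of a residue fits in 64 bits. */
int lucas_lehmer(unsigned p)
{
    assert(p >= 3 && p <= 31);
    uint64_t m = (UINT64_C(1) << p) - 1;   /* the Mersenne number 2^p - 1 */
    uint64_t s = 4;                        /* S_0 */
    for (unsigned n = 1; n <= p - 2; n++)
        s = (s * s + m - 2) % m;           /* S_n; adding m avoids unsigned underflow */
    return s == 0;                         /* prime iff S_{p-2} is zero */
}
```

Calling lucas_lehmer(7) walks through S_1 to S_5 modulo 127, and lucas_lehmer(11) walks through S_1 to S_9 modulo 2047. The % operator here is the brute-force division that the hardware design avoids.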

9 9 Example to Show 2^7 - 1 is Prime
2^7 - 1 = 127
S_0 = 4
S_1 = (4 * 4 - 2) mod 127 = 14
S_2 = (14 * 14 - 2) mod 127 = 67
S_3 = (67 * 67 - 2) mod 127 = 42
S_4 = (42 * 42 - 2) mod 127 = 111
S_5 = (111 * 111 - 2) mod 127 = 0

10 10 Algorithmic description We knew the necessary computations, but how to translate that to gates? Computations needed: - Squaring (not a problem…) - Add/Subtract (not a problem…) - Modulo (2^n - 1) multiplication (?)

11 11 Mechanisms behind the math If done with brute force, modulo 2^n - 1 could have been ugly: we would need to square and then find the remainder via division. Luckily, for that specific computation, math is on our side: the 2^n - 1 constraint saves us from division, as will be seen. A quick search on www.ieee.org produced inspiration: Reto Zimmermann. Efficient VLSI Implementation of Modulo (2^n ± 1) Addition and Multiplication. Computer Arithmetic, 1999; p. 158-167.
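
Why the 2^n - 1 constraint removes the division: 2^n is congruent to 1 modulo 2^n - 1, so the bits above position n can simply be shifted down and added to the low n bits. The sketch below illustrates that identity in C. The name reduce_mersenne and the limit n <= 31 are assumptions for this example and do not come from the slides.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: reduces x modulo 2^n - 1 using only shifts, ANDs
   and adds. Assumes 1 <= n <= 31. */
uint32_t reduce_mersenne(uint64_t x, unsigned n)
{
    assert(n >= 1 && n <= 31);
    uint64_t m = (UINT64_C(1) << n) - 1;
    while (x > m)
        x = (x & m) + (x >> n);          /* fold the high part onto the low part */
    return (uint32_t)(x == m ? 0 : x);   /* all ones is a second encoding of zero */
}
```

For example, 67 * 67 = 4489 folds to 9 + 35 = 44 for n = 7, and subtracting 2 gives 42, the value of S_3 in the p = 7 example.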

12 12 Useful Math: Multiplication Just like any other multiplication, a modulo multiplication can be computed by (modulo) summing the partial products. So modulo multiplication is multiplication using a modulo adder. (From the Zimmermann paper.)
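
A minimal C sketch of this idea follows. Each partial product y * 2^i modulo 2^n - 1 is y rotated left by i positions within n bits, and the partial products are accumulated with a modulo adder. The names mod_add_sketch and mod_mul_sketch and the limit n <= 16 are assumptions for this example. It illustrates the technique and is not the code from the paper or from the team.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: adds two residues modulo m = 2^n - 1.
   Both inputs must be below m. */
static uint32_t mod_add_sketch(uint32_t a, uint32_t b, uint32_t m)
{
    uint32_t s = a + b;              /* at most 2m - 2, cannot overflow for n <= 16 */
    return s >= m ? s - m : s;
}

/* Hypothetical helper: computes (x * y) mod (2^n - 1) by modulo-summing
   partial products. Assumes 2 <= n <= 16 and x, y below 2^n - 1. */
uint32_t mod_mul_sketch(uint32_t x, uint32_t y, unsigned n)
{
    assert(n >= 2 && n <= 16);
    uint32_t m = (UINT32_C(1) << n) - 1;
    assert(x < m && y < m);
    uint32_t acc = 0;
    for (unsigned i = 0; i < n; i++) {
        if ((x >> i) & 1) {
            /* y * 2^i mod (2^n - 1): rotate y left by i within n bits */
            uint32_t pp = ((y << i) & m) | (y >> (n - i));
            acc = mod_add_sketch(acc, pp, m);
        }
    }
    return acc;
}
```

No product wider than n bits is ever formed, which is what makes the approach attractive for a gate-level design.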

13 13 Block Diagram [Figure: block diagram. Blocks: FSM (start in, done out), Mod Calc, Counter, Count, Next Partial Product, Mod add, Subtract 2, Compare, two 16-bit Registers. Input P, 16-bit Out.] Inner loop x16 (multiplier), outer loop x(P-2): S1 = (4 * 4) mod 127 - 2 = 14, S2 = (14 * 14) mod 127 - 2 = 67, ... S5 = (111 * 111 - 2) mod 127 = 0

14 14 Design Process The process so far: - Found mathematical means (core algorithm) - Found computational means (modulo multiplier, adder) From the above, a high-level C program was written in a manner that would translate easily to Verilog and gates, or at least to more standard operations:
    /* Returns (value * value + offset) mod (2^p - 1) using only shifts,
       ANDs, adds and one conditional subtract per step.
       Assumes 0 <= value, offset < 2^p - 1, and p small enough that
       value << i fits in an int. */
    int mod_square_minus(int value, int p, int offset)
    {
        int mod = (1 << p) - 1;
        int acc = offset;
        int i;
        for (i = 0; i < p; i++) {      /* bits at or above p are zero */
            if ((value >> i) & 1) {
                /* value * 2^i mod (2^p - 1): the bits shifted past
                   position p wrap around to the bottom */
                int high = value >> (p - i);
                int low = (value << i) & mod;
                acc = acc + high + low;
            }
            if (acc >= mod)
                acc = acc - mod;
        }
        return acc;
    }
This translated easily into behavioral Verilog, and was readily turned into a gate-level implementation. Essentially it was written in a more low-level manner.

15 15 Design Process The rest of the design can simply be thought of as a wrapper for the modulo multiplier. The following slides contain Verilog code that was taken directly from the C code on the previous slide. Top level of multiplier:
    module mod_mult(out, itrCount, x, y, mod, p, reset, en, clk);
      input [15:0] x, y, mod, p;
      output [15:0] out;
      input reset, en, clk;
      wire [15:0] pp, ma0, temp;
      output [3:0] itrCount;
      counter mycount(itrCount, reset, en, clk);
      partial_product ppg(pp, x, y, itrCount, mod, p);
      mod_add modAdder(out, pp, temp, mod);
      dff_16_lp partial(clk, out, temp, reset, en);
    endmodule

16 16 Partial Product Unit w/ modulo reduction:
    module partial_product(out, x, y, i, mod, p);
      output [15:0] out;
      input [15:0] x, y, mod, p;
      input [3:0] i;
      wire [15:0] diff1, diff2, added, result, corrected, final;
      wire [15:0] high, low, shifted, toadd;
      wire cout1, cout2, ithbit, toobig;
      sub_16 difference1(diff1, cout1, {12'b0, i}, p);
      sub_16 difference2(diff2, cout2, p, {12'b0, i});
      shift_left shiftL(high, y, diff1[3:0]);
      shift_right shiftR(low, y, diff2[3:0]);
      mux16 choose(high, low, shifted, cout1);
      shift_left shiftL2(toadd, y, i);
      and16 bigand(added, toadd, mod);
      fulladder_16 addhighlow(.out(result), .xin(added), .yin(shifted), .cin({1'b0}), .cout(nowhere));
      sub_16 correct(.out(corrected), .cout(toobig), .xin(mod), .yin(result));
      mux16 correctionMux(.out(final), .high(corrected), .low(result), .sel(toobig));
      shift_right ibit({15'b0, ithbit}, x, i);
      select16 checkfor0(.out(out), .x(result), .sel(ithbit));
    endmodule

17 17 Modulo Adder:
    module mod_add(out, x, y, mod);
      input [15:0] x, y, mod;
      output [15:0] out;
      wire cout, isDouble, cin;
      wire [15:0] plus, lowbits, done, mod_bar, check;
      fulladder_16 add(.out(plus), .xin(x), .yin(y), .cin(cin), .cout());
      invert_16 inverter(mod_bar, mod);
      and16 hihnbits(check, plus, mod_bar);
      and16 lownbits(done, plus, mod);
      or8 (cin, check[0], check[1], check[2], check[3], check[4], check[5], check[6], check[7], check[8], check[9], check[10], check[11], check[12], check[13], check[14], check[15]);
      compare_16 checkfordouble(isDouble, done, 16'b1111_1111_1111_1111);
      mux16 fixdouble(.out(out), .high(16'b0), .low(done), .sel(isDouble));
    endmodule
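
One way to read the modulo adder is as an end-around-carry adder: any bit of the sum that lands above the mask 2^n - 1 is fed back in as a carry of 1, and an all-ones result is mapped to zero. The C function below is a software model of that reading only. The name mod_add_eac is an assumption for this example, and the model is not gate-accurate or timing-accurate with respect to the Verilog module.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical software model of an end-around-carry adder modulo
   mod = 2^n - 1, with n <= 16. Inputs are expected to be at most mod. */
uint16_t mod_add_eac(uint16_t x, uint16_t y, uint16_t mod)
{
    uint32_t plus = (uint32_t)x + y;                  /* raw sum, up to n + 1 bits */
    uint32_t carry = (plus & ~(uint32_t)mod) != 0;    /* any bit above the mask? */
    uint32_t done = (plus + carry) & mod;             /* wrap the carry around */
    return (uint16_t)(done == mod ? 0 : done);        /* all ones also means zero */
}
```

With x = 128, y = 68 and mod = 127 the model returns 69, the value shown in the schematic test of the modular adder later in the deck.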

18 18 Final Design Process Notes Lessons learned: Never tweak the schematics without retesting the Verilog first. Timing issues can be subtle; Verilog is better than schematics for catching them and for quickly fixing and retesting. Of the total time spent during this phase, roughly half went to the “core” and the FSM, and the rest to the “wrapper”.

19 19 Road to verification: C Examples of the high-level C implementation's output:
    Tyrion:~/Desktop/15525 nstohs$ ./prime4 7
    round 1: (4 * 4 - 2) mod 127 = 14
    round 2: (14 * 14 - 2) mod 127 = 67
    round 3: (67 * 67 - 2) mod 127 = 42
    round 4: (42 * 42 - 2) mod 127 = 111
    round 5: (111 * 111 - 2) mod 127 = 0
    2^7 - 1 is prime
    Tyrion:~/Desktop/15525 nstohs$ ./prime4 11
    round 1: (4 * 4 - 2) mod 2047 = 14
    round 2: (14 * 14 - 2) mod 2047 = 194
    round 3: (194 * 194 - 2) mod 2047 = 788
    round 4: (788 * 788 - 2) mod 2047 = 701
    round 5: (701 * 701 - 2) mod 2047 = 119
    round 6: (119 * 119 - 2) mod 2047 = 1877
    round 7: (1877 * 1877 - 2) mod 2047 = 240
    round 8: (240 * 240 - 2) mod 2047 = 282
    round 9: (282 * 282 - 2) mod 2047 = 1736
    2^11 - 1 is not prime

20 20 Road to verification: Verilog Samples of Verilog verification output. Tests were either specific tests on important units such as Partial_Product, or top-level tests. Note that these are the same results generated from the C code.
Partial Product Unit, p = 7:
    380 ppOut= 56, x= 14, y= 14, i= 2, mod= 127, p= 7
    400 ppOut= 112, x= 14, y= 14, i= 3, mod= 127, p= 7
    420 ppOut= 0, x= 14, y= 14, i= 4, mod= 127, p= 7
    440 ppOut= 0, x= 14, y= 14, i= 5, mod= 127, p= 7
Top Level, p = 7:
    itrOut= x
    itrOut= 4
    itrOut= 14
    itrOut= 67
    itrOut= 42
    itrOut= 111
    itrOut= 0
Top Level, p = 11:
    itrOut= x
    itrOut= 4
    itrOut= 14
    itrOut= 194
    itrOut= 788
    itrOut= 701
    itrOut= 119
    itrOut= 1877
    …

21 21 Road to verification: Schematic I Schematic Test of our modular adder. 128 + 68 Mod 127 = 69

22 22 Road to verification: Schematic II Plot of the top level output after a single iteration, p=7 Output after a single iteration is 14, the expected value.

23 23 Road to verification: Schematic III 4 14 67 42 111

24 24 Road to verification: Intermission Disk Space required for a full-length schematic test of p=7 : 6 GB Time required for a full-length schematic test of p=7 : 5 hours Disk Space required for a full-length extractedRC test of p=7 : 20 GB Time required for a full-length extractedRC test of p=7 : 8 hours Simulations become lengthy due to tests needing to be “deep” to be useful.

25 25 Layout: ExtractedRC – Full Run 4 14 67 42 111

26 26 Timing To determine the bounds of our clock, Pathmill was used once major portions of the schematic were complete. The critical path through our design is one loop through the modular multiplier, which runs through the modular adder and the partial products module. The Pathmill delay was 9 ns through the modular adder and 5.2 ns through the partial products module. This puts our total delay at 14.2 ns, limiting the schematic clock to about 70 MHz. For extractedRC, due in part to simulation issues, a conservative 50 MHz was chosen as the final clock.
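
The clock bound follows from the two Pathmill delays by simple arithmetic, worked out below. The symbol names are chosen for this illustration. The margin in the second line is relative to the schematic delay only, since the slides give no extractedRC delay.

```latex
\[
t_{\text{crit}} = t_{\text{mod\_add}} + t_{\text{partial\_product}}
              = 9\,\text{ns} + 5.2\,\text{ns} = 14.2\,\text{ns},
\qquad
f_{\max} = \frac{1}{t_{\text{crit}}} = \frac{1}{14.2\,\text{ns}} \approx 70.4\,\text{MHz}
\]
\[
T_{50\,\text{MHz}} = \frac{1}{50\,\text{MHz}} = 20\,\text{ns},
\qquad
T_{50\,\text{MHz}} - t_{\text{crit}} = 5.8\,\text{ns}
\]
```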

27 27 Issues
- extractedRC of partial_product module
- Registers switch: custom design to DFFs with muxes
- Switching from parallel calculations to series: transistor count vs. clock cycles
- Syncing up design between people: transferring files, different design styles
- LONG simulation times
- Floorplanning: too much emphasis on aspect ratios and not enough on wiring; couldn't decide on one set floorplan

28 28 Floorplan v1.0

29 29 Floorplan v2.0

30 30 Final Floorplan

31 31 Pin Specifications
Pin     Type     # of Pins
Vdd!    In/Out   1
Gnd!    In/Out   1
p       In       16
clk     In       1
start   In       1
Done    Out      1
out     Out      1
Total   -        22

32 32 Initial Module Specifications
Module           Transistor Count   Area (µm²)   Transistor Density
FSM              300                900          0.33
mod_p            2,440              7,000        0.35
mod_add          1,282              9,000        0.14
partial_product  8,676              65,000       0.13
count            1,656              6,000        0.27
sub_16           704                3,500        0.20
Registers        1,848              6,000        0.30
compare          36                 300          0.12
Total            16,942             97,700       0.17

33 33 Final Module Specifications
Module           Transistor Count   Area (µm²)   Transistor Density   Aspect Ratio
FSM              152                1,200        0.13                 2.45
mod_p            1,280              8,603        0.15                 0.79
mod_add          1,168              5,603        0.21                 2.40
partial_product  7,520              54,680       0.14                 1.16
count            1,424              8,701        0.16                 6.88
sub_16           576                2,934        0.20                 4.49
Registers        896                6,028        0.15                 4.76
compare          56                 201          0.28                 4.41
Total            13,702             86,621       0.16                 1.01

34 34 Chip Specifications Transistor Count: 13,702 Size: 296.51µm x 292.13µm Area: 86,621µm² Aspect Ratio: 1.01:1 Density: 0.16 transistors/µm²

35 35 Final Floorplan

36 36 Final Floorplan

37 37 Partial Product [Layout figure. Labeled blocks: shift_right, shift_left, shift_right, shift_left, adder, 16-bit and, Select 16, Sub_16, mux]

38 38 Poly Layer Density: 7.14%

39 39 Active Layer Density: 8.76%

40 40 Metal1 Layer Density: 23.86%

41 41 Metal2 Layer Density: 19.97%

42 42 Metal3 Layer Density: 11.30%

43 43 Metal4 Layer Density: 10.34%

44 44 Conclusions Plan for buffers -Will be hard to put them in after the fact Your design will change dramatically from start to finish so be flexible Communication is key Do layout in parallel

