1 Lucas-Lehmer Primality Tester Team: W-4 Nathan Stohs W4-1 Brian Johnson W4-2 Joe Hurley W4-3 Marques Johnson W4-4 Design Manager: Prateek Goenka.


1 1 Lucas-Lehmer Primality Tester Team: W-4 Nathan Stohs W4-1 Brian Johnson W4-2 Joe Hurley W4-3 Marques Johnson W4-4 Design Manager: Prateek Goenka

2 2 Ancient Greeks 300 BC Euclid’s Proof Proved that there are an infinite number of prime numbers, which are irregularly spaced

3 3 How to find Prime Numbers The method used for smaller numbers is called the Sieve of Eratosthenes, from 240 BC. Trial Division is another method for smaller numbers.

4 4 43rd Known Mersenne Prime Found!! December 2005. Dr. Curtis Cooper and Dr. Steven Boone, Professors at Central Missouri State University: 2^30,402,457 - 1

5 5
rank  prime                   digits   who  when  reference
1     2^30402457 - 1          9152052  G9   2005  Mersenne 43
2     2^25964951 - 1          7816230  G8   2005  Mersenne 42
3     2^24036583 - 1          7235733  G7   2004  Mersenne 41
4     2^20996011 - 1          6320430  G6   2003  Mersenne 40
5     2^13466917 - 1          4053946  G5   2001  Mersenne 39
6     27653 * 2^9167433 + 1   2759677  SB8  2005
7     28433 * 2^7830457 + 1   2357207  SB7  2004
8     2^6972593 - 1           2098960  G4   1999  Mersenne 38
9     5359 * 2^5054502 + 1    1521561  SB6  2003
10    4847 * 2^3321063 + 1    999744   SB9  2005

6 6 Prime Number Competitions Electronic Frontier Foundation
- $50,000 to the first individual or group who discovers a prime number with at least 1,000,000 decimal digits (awarded Apr. 6, 2000)
- $100,000 to the first individual or group who discovers a prime number with at least 10,000,000 decimal digits
- $150,000 to the first individual or group who discovers a prime number with at least 100,000,000 decimal digits
- $250,000 to the first individual or group who discovers a prime number with at least 1,000,000,000 decimal digits

7 7 Mersenne Prime Algorithm For P > 2, 2^P - 1 is prime if and only if S_{P-2} is zero in this sequence: S_0 = 4, S_N = (S_{N-1}^2 - 2) mod (2^P - 1)
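The recurrence on this slide can be sketched directly in C. This is only an illustration of the math, not the team's program: the function name `lucas_lehmer` is hypothetical, it uses the `%` operator rather than the division-free hardware approach described later, and it assumes p is an odd prime below 32 so that every intermediate value fits in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if 2^p - 1 is prime, 0 otherwise.
   Assumes p is an odd prime with 2 < p < 32.
   S_0 = 4, S_N = (S_{N-1}^2 - 2) mod (2^p - 1); prime iff S_{p-2} == 0. */
int lucas_lehmer(unsigned p)
{
    assert(p > 2 && p < 32);
    uint64_t m = ((uint64_t)1 << p) - 1;   /* the Mersenne number 2^p - 1 */
    uint64_t s = 4 % m;                    /* S_0 */
    for (unsigned n = 1; n <= p - 2; n++) {
        /* add m before subtracting 2 so the value never goes negative */
        s = (s * s + m - 2) % m;
    }
    return s == 0;
}
```

For example, `lucas_lehmer(7)` walks through the same five iterations as the worked 2^7 - 1 example in this deck.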

8 8 Example to Show 2^7 - 1 is Prime
2^7 - 1 = 127
S_0 = 4
S_1 = (4 * 4 - 2) mod 127 = 14
S_2 = (14 * 14 - 2) mod 127 = 67
S_3 = (67 * 67 - 2) mod 127 = 42
S_4 = (42 * 42 - 2) mod 127 = 111
S_5 = (111 * 111 - 2) mod 127 = 0

9 9 Algorithmic description We knew the computations needed, but how to translate that to gates? Computations needed:
- Squaring (not a problem…)
- Add/Subtract (not a problem…)
- Modulo (2^n - 1) multiplication (?)

10 10 Mechanisms behind the math If done with brute force, modulo 2^n-1 could have been ugly. –Would need to square and find the remainder via division. Luckily, for that specific computation, math is on our side: the 2^n-1 constraint saves us from division, as will be seen. A quick search on www.ieee.com produced inspiration. Taken from “Efficient VLSI Implementation of Modulo (2^n ± 1) Addition and Multiplication”, Reto Zimmermann, Swiss Federal Institute of Technology (ETH)

11 11 Useful Math: Multiplication Just like any other multiplication, a modulo multiplication can be computed by (modulo) summing the partial products. So modulo multiplication is multiplication using a modulo adder.
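As a rough sketch of "multiplication using a modulo adder", the C below sums one partial product per set bit of x. It relies on the fact that multiplying by 2^i modulo 2^n - 1 is an n-bit left rotation, so no division is needed. The names `add_mod` and `mod_mul_mersenne` are hypothetical, and the code assumes 1 < n <= 16 with both operands at most 2^n - 1; it is not the team's implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical modulo adder: (a + b) mod (2^n - 1) for a, b <= 2^n - 1. */
static uint32_t add_mod(uint32_t a, uint32_t b, unsigned n)
{
    uint32_t mask = ((uint32_t)1 << n) - 1;
    uint32_t sum = a + b;
    sum = (sum & mask) + (sum >> n);    /* fold the carry-out back in */
    return (sum == mask) ? 0 : sum;     /* all ones is the second form of zero */
}

/* Hypothetical multiplier: (x * y) mod (2^n - 1) by summing partial products.
   Assumes 1 < n <= 16 and x, y <= 2^n - 1. */
uint32_t mod_mul_mersenne(uint32_t x, uint32_t y, unsigned n)
{
    assert(n > 1 && n <= 16);
    uint32_t mask = ((uint32_t)1 << n) - 1;
    assert(x <= mask && y <= mask);
    uint32_t acc = 0;
    for (unsigned i = 0; i < n; i++) {
        if ((x >> i) & 1) {
            /* y * 2^i mod (2^n - 1): rotate y left by i within n bits */
            uint32_t pp = ((y << i) & mask) | (y >> (n - i));
            acc = add_mod(acc, pp, n);  /* modulo-sum the partial products */
        }
    }
    return acc;
}
```

With x = y = 14 and n = 7, the nonzero partial products are 28, 56, and 112, and their modulo sum is 69, which is 196 mod 127.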

12 12 Useful Math: The Modulo Adder The more logic driven math that is the basis of our modulo adder.
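The formula from this slide did not survive in the transcript. One common way to build a modulo 2^n - 1 adder, sketched here as an assumption rather than as the slide's exact formulation, is an end-around carry: the carry out of bit n is added back in at bit 0, and an all-ones result is mapped to zero. The name `mod_add_mersenne` is hypothetical, and inputs are assumed to be at most 2^n - 1.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: (a + b) mod (2^n - 1) with an end-around carry.
   Assumes 0 < n < 32 and a, b <= 2^n - 1. */
uint32_t mod_add_mersenne(uint32_t a, uint32_t b, unsigned n)
{
    assert(n > 0 && n < 32);
    uint32_t mask = ((uint32_t)1 << n) - 1;
    assert(a <= mask && b <= mask);
    uint32_t sum = a + b;               /* at most n + 1 bits wide */
    sum = (sum & mask) + (sum >> n);    /* carry-out re-enters as carry-in */
    return (sum == mask) ? 0 : sum;     /* 2^n - 1 is congruent to 0 */
}
```

For instance, with n = 7 the call `mod_add_mersenne(100, 96, 7)` folds 196 down to 69 without any division.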

13 13 Last Bits: Modulo Reduction At various points, such as when finding the partial product, the result has to be reduced. There is a nifty way to do that as well.
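The slide's own figure is missing from the transcript, so the following is a sketch of the standard trick rather than a reproduction of it. Because 2^n is congruent to 1 modulo 2^n - 1, the bits above position n can simply be added onto the low n bits, and repeating that fold replaces division. The name `reduce_mod_mersenne` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: x mod (2^n - 1) without division, for 0 < n < 32. */
uint32_t reduce_mod_mersenne(uint64_t x, unsigned n)
{
    assert(n > 0 && n < 32);
    uint64_t mask = ((uint64_t)1 << n) - 1;
    while (x > mask) {
        /* 2^n == 1 (mod 2^n - 1), so the high part folds onto the low part */
        x = (x & mask) + (x >> n);
    }
    return (x == mask) ? 0 : (uint32_t)x;   /* 2^n - 1 itself reduces to 0 */
}
```

As a check against the worked example in this deck, 14 * 14 - 2 = 194 splits into a high part of 1 and a low part of 66, giving 67.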

14 14 Block Diagram [Figure: top-level block diagram. Blocks: Mod Calc, Mod Multiply, Count, Subtract 2, Register (r4), Compare, FSM. Signals: P, Out, start, done, clk; data buses are 16 bits wide.]

15 15 Mod Multiply Block Diagram [Figure: modulo multiplier block diagram. Blocks: Mod add, Register, Counter, Next Partial Product, FSM. Signals: P, 2^p - 1, clk, "to sub 2", "from register"; data buses are 16 bits wide.]

16 16 Block Diagram [Figure: top-level block diagram. Blocks: Mod Calc, Mod Multiply, Count, Subtract 2, Register (r2), Compare, FSM. Signals: P, Out, start, done, clk; data buses are 16 bits wide.]

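17 17 Design Process The process so far:
- Found mathematical means (core algorithm)
- Found computational means (modulo multiplier, adder)
From the above, a high-level C program was written in a manner that would easily translate to Verilog and gates, or at least to more standard operations:

    /* Returns (value * value + offset) mod (2^p - 1) using only shifts, ANDs, and adds.
       Assumes 0 < p <= 16 and value, offset < 2^p - 1.
       An offset of (2^p - 1) - 2 gives the "square minus 2" step. */
    unsigned mod_square_minus(unsigned value, int p, unsigned offset) {
      unsigned acc;
      int i;
      unsigned mod = (1u << p) - 1;
      for (acc = offset, i = 0; i < (int)(sizeof(int) * 8 - 1); i++) {
        unsigned a = (value >> i) & 1;      /* i-th bit of the multiplier */
        unsigned temp;
        if (a) {
          /* bits of (value << i) that spill past bit p wrap around to the low end */
          if (i - p > 0) temp = value << (i - p);
          else           temp = value >> (p - i);
          acc = acc + temp + ((value << i) & mod);
        }
        if (acc >= mod) acc = acc - mod;    /* one conditional subtract keeps acc < mod */
      }
      return acc;
    }

This translated easily into behavioral Verilog, and was readily turned into a gate-level implementation. Essentially, it was written in a more low-level manner.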

18 18 Design Process The rest of the design can simply be thought of as a wrapper for the modulo multiplier. The following slides contain Verilog code that was derived directly from the C code shown earlier.

    module mod_mult(out, itrCount, x, y, mod, p, reset, en, clk);
      input  [15:0] x, y, mod, p;
      output [15:0] out;
      input         reset, en, clk;
      wire   [15:0] pp, ma0, temp;
      output [3:0]  itrCount;
      counter         mycount(itrCount, reset, en, clk);
      partial_product ppg(pp, x, y, itrCount, mod, p);
      mod_add         modAdder(out, pp, temp, mod);
      dff_16_lp       partial(clk, out, temp, reset, en);
    endmodule

Top level of multiplier

19 19

    module partial_product(out, x, y, i, mod, p);
      output [15:0] out;
      input  [15:0] x, y, mod, p;
      input  [3:0]  i;
      wire   [15:0] diff1, diff2, added, result, corrected, final;
      wire   [15:0] high, low, shifted, toadd;
      wire          cout1, cout2, ithbit, toobig;
      sub_16       difference1(diff1, cout1, {12'b0, i}, p);
      sub_16       difference2(diff2, cout2, p, {12'b0, i});
      shift_left   shiftL(high, y, diff1[3:0]);
      shift_right  shiftR(low, y, diff2[3:0]);
      mux16        choose(high, low, shifted, cout1);
      shift_left   shiftL2(toadd, y, i);
      and16        bigand(added, toadd, mod);
      fulladder_16 addhighlow(.out(result), .xin(added), .yin(shifted), .cin({1'b0}), .cout(nowhere));
      sub_16       correct(.out(corrected), .cout(toobig), .xin(mod), .yin(result));
      mux16        correctionMux(.out(final), .high(corrected), .low(result), .sel(toobig));
      shift_right  ibit({15'b0, ithbit}, x, i);
      select16     checkfor0(.out(out), .x(result), .sel(ithbit));
    endmodule

Partial Product Unit w/ modulo reduction

20 20

    module mod_add(out, x, y, mod);
      input  [15:0] x, y, mod;
      output [15:0] out;
      wire          cout, isDouble, cin;
      wire   [15:0] plus, lowbits, done, mod_bar, check;
      fulladder_16 add(.out(plus), .xin(x), .yin(y), .cin(cin), .cout());
      invert_16    inverter(mod_bar, mod);
      and16        hihnbits(check, plus, mod_bar);
      and16        lownbits(done, plus, mod);
      or8 (cin, check[0], check[1], check[2], check[3], check[4], check[5], check[6], check[7],
           check[8], check[9], check[10], check[11], check[12], check[13], check[14], check[15]);
      compare_16   checkfordouble(isDouble, done, 16'b1111_1111_1111_1111);
      mux16        fixdouble(.out(out), .high(16'b0), .low(done), .sel(isDouble));
    endmodule

Modulo Adder

21 21 Final Design Process Notes Lessons learned: Never tweak the schematics without retesting the Verilog first. Of the total time spent during this phase, roughly half went to the “core” and the FSM, and the rest to the “wrapper”.

22 22 Road to verification: C 2 examples of the high-level C implementations:

    Tyrion:~/Desktop/15525 nstohs$ ./prime4 7
    round 1: (4 * 4 - 2) mod 127 = 14
    round 2: (14 * 14 - 2) mod 127 = 67
    round 3: (67 * 67 - 2) mod 127 = 42
    round 4: (42 * 42 - 2) mod 127 = 111
    round 5: (111 * 111 - 2) mod 127 = 0
    2^7-1 is prime

    Tyrion:~/Desktop/15525 nstohs$ ./prime4 11
    round 1: (4 * 4 - 2) mod 2047 = 14
    round 2: (14 * 14 - 2) mod 2047 = 194
    round 3: (194 * 194 - 2) mod 2047 = 788
    round 4: (788 * 788 - 2) mod 2047 = 701
    round 5: (701 * 701 - 2) mod 2047 = 119
    round 6: (119 * 119 - 2) mod 2047 = 1877
    round 7: (1877 * 1877 - 2) mod 2047 = 240
    round 8: (240 * 240 - 2) mod 2047 = 282
    round 9: (282 * 282 - 2) mod 2047 = 1736
    2^11-1 is not prime

23 23 Road to verification: Verilog Samples of Verilog verification output:

    Partial Product Unit, p = 7
    380 ppOut= 56, x= 14, y= 14, i= 2, mod= 127, p= 7
    400 ppOut= 112, x= 14, y= 14, i= 3, mod= 127, p= 7
    420 ppOut= 0, x= 14, y= 14, i= 4, mod= 127, p= 7
    440 ppOut= 0, x= 14, y= 14, i= 5, mod= 127, p= 7

    Top Level, p = 7
    itrOut= x
    itrOut= 4
    itrOut= 14
    itrOut= 67
    itrOut= 42
    itrOut= 111
    itrOut= 0

    Top Level, p = 11
    itrOut= x
    itrOut= 4
    itrOut= 14
    itrOut= 194
    itrOut= 788
    itrOut= 701
    itrOut= 119
    itrOut= 1877
    …

Tests were either specific tests on important units such as Partial_Product …our top level tests. Note that these are the same results generated from the C code.

24 24 Road to verification: Schematic I Schematic test of our modular adder: (128 + 68) mod 127 = 69

25 25 Road to verification: Schematic II Plot of the top level output after a single iteration, p=7 Output after a single iteration is 14, the expected value.

26 26 Road to verification: Schematic III The simulation outputs after a full run, showing the results of all iterations. Simulations start taking a long time. More on that later.

27 27 Road to verification: Intermission
- Disk space required for a full-length schematic test of p=7: 6 GB
- Time required for a full-length schematic test of p=7: 4 hours
- Disk space required for a full-length extracted test of p=7: more
- Time required for a full-length extracted test of p=7: longer
- Disk space required for a full-length extractedRC test of p=7: 1 iPod
- Time required for a full-length extractedRC test of p=7: T_T
Simulations become very demanding and lengthy due to tests needing to be “deep” to be useful. To meet such demands, be sure to use Genuine AMD© CPUs.

28 28 Road to verification: Layout I 3 words: “the net-lists match” Of course, there is far more to be concerned about. Due to simulator issues, layout simulations were delayed on some major modules. Partial Product Sims In Progress (I Hope)

29 29 Road to verification: Layout II Top Level layout Sims in Progress

30 30 Road to verification: Timing Layout Timing Sims in progress Pathmill was useful to help us gauge our critical path, which is one cycle through our modulo multiplier. When run on the top level, a critical path of 12.703ns was found. This was in the ballpark relative to our research.

31 31 Issues
- extractedRC of partial_product module
- Registers switch
- Switching from parallel calculations to series
  –Transistor count vs. clock cycles
- Syncing up design between people
  –Transferring files
  –Different design styles
- LONG simulation times
- Floorplanning
  –Too much emphasis on aspect ratios and not enough on wiring
  –Couldn’t decide on one set floorplan

32 32 Floorplan v1.0 Prime Logic Mod Multiplier Mod Adder FSM Memory

33 33 Floorplan v2.0

34 34 Floorplan v3.0

35 35 Floorplan v4.0

36 36 Floorplan v5.0

37 37 Final Floorplan

38 38 Pin Specs
Pin     Type     # of Pins
Vdd!    In/Out   1
Gnd!    In/Out   1
p       In       16
clk     In       1
start   In       1
Done    Out      1
out     Out      1
Total   -        22

39 39 Initial Part Specs
Module            Transistor Count   Area (µm²)   Transistor Density (per µm²)
FSM               300                900          0.33
mod_p             2,440              7,000        0.35
mod_add           1,282              9,000        0.14
partial_product   8,676              65,000       0.13
count             1,656              6,000        0.27
sub_16            704                3,500        0.20
Registers         1,848              6,000        0.30
compare           36                 300          0.12
Total             16,942             97,700       0.17

40 40 Final Part Specs
Module            Transistor Count   Area (µm²)   Transistor Density (per µm²)   Aspect Ratio
FSM               152                1,200        0.13                           2.45
mod_p             1,280              8,603        0.15                           0.79
mod_add           1,168              5,603        0.21                           2.40
partial_product   7,520              54,680       0.14                           1.16
count             1,424              8,701        0.16                           6.88
sub_16            576                2,934        0.20                           4.49
Registers         896                6,028        0.15                           4.76
compare           56                 201          0.28                           4.41
Total             13,702             86,621       0.16                           1.01

41 41 Chip Specs Transistor Count: 13,702 Size: 296.51µm x 292.13µm Area: 86,621µm² Aspect Ratio: 1.01:1 Density: 0.16 transistors/µm²

42 42 Final Floorplan

43 43 Final Floorplan

44 44 Poly Layer Density: 7.14%

45 45 Active Layer Density: 8.76%

46 46 Metal1 Layer Density: 23.86%

47 47 Metal2 Layer Density: 19.97%

48 48 Metal3 Layer Density: 11.30%

49 49 Metal4 Layer Density: 10.34%

50 50 Conclusions
- Plan for buffers
  –Can’t put them in after the fact
- Your design will change dramatically from start to finish, so be flexible
- Communication is key
- Do layout in parallel

