Reconfigurable Computing S. Reda, Brown University Reconfigurable Computing (EN2911X, Fall07) Lab 2 presentations Prof. Sherief Reda Division of Engineering,


1 Reconfigurable Computing S. Reda, Brown University Reconfigurable Computing (EN2911X, Fall07) Lab 2 presentations Prof. Sherief Reda Division of Engineering, Brown University http://ic.engin.brown.edu

2 Runtimes by different teams: 4.2 seconds, 14 seconds, 33 seconds, 300 seconds, 305 seconds, 320 seconds

3 Cesare Ferri, Rotor Le: Palindrome Checker

4 Part I: Verilog Module
always @(posedge CLOCK_50) begin
    not_palindrome = 1'd0; len = 0; tmp = number; // reset
    // DECOMPOSE THE NUMBER INTO DIGITS (room for loop unrolling here)
    for (i = 0; i < 9; i = i + 4'd1) begin
        if (tmp > 0) begin
            modulo = tmp % 4'd10;
            tmp = tmp / 10;
            vector[len % 9] = modulo;
            len = len + 1;
        end
    end
    th = (len >> 1);
    for (j = 0; j < th; j = j + 4'd1) begin
        tmp2 = (len - 1) - j;
        tmp3 = vector[j]; tmp4 = vector[tmp2];
        if (tmp3 != tmp4) not_palindrome = 1'b1;
    end
    result = ~(not_palindrome);
end

5 Part I: Verilog Module (same listing as slide 4)
Callout on the second for loop: COMPARE THE DIGITS STORED IN THE VECTOR. Loop unrolling applies here again.

6 Optimized Verilog Code
Do loop unrolling to compare digits (shown for a 4-digit number):
    if (digits[0] != digits[3] || digits[1] != digits[2]) not_palindrome = 1'b1;

7 Unsolved things
Our running time now depends on how we extract the digits from the number.
Some ideas to improve:
–Using a shift register
–Using non-blocking assignments

8 Palindrome Homework Summary, ENGN2911X: Aaron Mandle, Bryant Mairs

9 Setup
–Two-cycle, fixed-length custom instruction
–Operates on 20 numbers at a time
–Returns the total number of palindromes in that 20-number block

10 Process
–Combinational conversion from binary to BCD
–Check number of digits
–Compare digits based on length
–Total up number of valid palindromes

11 Binary to BCD Conversion
Built using blocks of conditional add-3 modules and shifts
Add-3 modules:
–4-bit input
–Adds 3 if input was 5 or greater
Based on adding 6 to numbers > 9
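The shift-and-add-3 scheme can be modeled in software. The sketch below is only an illustration: the function name bin_to_bcd is hypothetical, and the hardware described on the slide would unroll these loops into combinational add-3 blocks rather than iterate.

```c
#include <stdint.h>

/* Hypothetical software model of the shift-and-add-3 scheme.
   Returns n in packed BCD, one decimal digit per 4-bit nibble.
   A 64-bit result holds 16 digits, more than any 32-bit input needs. */
uint64_t bin_to_bcd(uint32_t n)
{
    uint64_t bcd = 0;
    for (int bit = 31; bit >= 0; --bit) {
        /* Add-3 step: a nibble holding 5 or more gets 3 added before the
           shift. Doubling then yields the same digit and carry as adding 6
           to a doubled value that exceeded 9. */
        for (int nib = 0; nib < 16; ++nib) {
            uint64_t digit = (bcd >> (4 * nib)) & 0xF;
            if (digit >= 5)
                bcd += (uint64_t)3 << (4 * nib);
        }
        /* Shift left by one and bring in the next input bit, MSB first. */
        bcd = (bcd << 1) | ((n >> bit) & 1u);
    }
    return bcd;
}
```

For example, bin_to_bcd(4765) yields 0x4765, so each decimal digit can be read directly from one nibble.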

12 module checkPalindrome(data, result);
    input  [31:0] data;
    output [31:0] result;

    wire [3:0] digits [10:0];
    wire [3:0] digCount;

    // Combinational binary-to-BCD conversion (instance name added; Verilog requires one)
    bin2bcd conv({digits[9], digits[8], digits[7], digits[6], digits[5],
                  digits[4], digits[3], digits[2], digits[1], digits[0]}, data);

    // Number of decimal digits: position of the highest nonzero digit
    assign digCount = digits[9] != 0 ? 10 :
                      digits[8] != 0 ? 9 :
                      digits[7] != 0 ? 8 :
                      digits[6] != 0 ? 7 :
                      digits[5] != 0 ? 6 :
                      digits[4] != 0 ? 5 :
                      digits[3] != 0 ? 4 :
                      digits[2] != 0 ? 3 :
                      digits[1] != 0 ? 2 : 1;

    // Compare mirrored digit pairs for the detected length (lengths 1 to 9 only)
    assign result = digCount == 1 ||
        digCount == 2 && (digits[0] == digits[1]) ||
        digCount == 3 && (digits[0] == digits[2]) ||
        digCount == 4 && (digits[0] == digits[3] && digits[1] == digits[2]) ||
        digCount == 5 && (digits[0] == digits[4] && digits[1] == digits[3]) ||
        digCount == 6 && (digits[0] == digits[5] && digits[1] == digits[4] && digits[2] == digits[3]) ||
        digCount == 7 && (digits[0] == digits[6] && digits[1] == digits[5] && digits[2] == digits[4]) ||
        digCount == 8 && (digits[0] == digits[7] && digits[1] == digits[6] && digits[2] == digits[5] && digits[3] == digits[4]) ||
        digCount == 9 && (digits[0] == digits[8] && digits[1] == digits[7] && digits[2] == digits[6] && digits[3] == digits[5]);
endmodule

13 Yossi

14 For all solutions
Finding the length of the decimal representation (# digits) by:
    typedef unsigned long UINT;

    /* Returns the 0-based index of the most significant digit (length - 1),
       for numbers of up to 9 digits. */
    inline UINT GetMSDFIndx(UINT n)
    {
        return (n >= 100000000 ? 8 :
               (n >= 10000000  ? 7 :
               (n >= 1000000   ? 6 :
               (n >= 100000    ? 5 :
               (n >= 10000     ? 4 :
               (n >= 1000      ? 3 :
               (n >= 100       ? 2 :
               (n >= 10        ? 1 : 0))))))));
    }

15 Software Only Solutions
Times:
–On laptop (Intel 2333 MHz): 8 secs.
–On NIOS (100 MHz): 3500 secs.
Inherently sequential
–Early false detection: quit the computation as soon as we find two digits that do not match. This brings the expected number of divide operations down to less than 2.2.

16 Software Only Solutions
Observations:
1. Detect whether the MSD is a given number without division
–MSD test: d is the MSD of a number n of length L if and only if $d \cdot 10^{L-1} \le n < (d+1) \cdot 10^{L-1}$
E.g. $4 \cdot 10^3 \le 4765 < 5 \cdot 10^3$
2. “Cut out” the MSD: $4765 - 4 \cdot 10^3 = 765$, and continue.
Algorithm: find one LSD after another, compare with MSDs, quit early if not a palindrome.
Runs in 8 seconds on laptop
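A minimal sketch of this algorithm in C follows. It is not the presenter's code: the names POW10, digit_count and is_palindrome_msd_test are made up for illustration, and 64-bit arithmetic is assumed so that (d+1) times a power of ten cannot overflow.

```c
#include <stdint.h>

/* Powers of ten up to 10^9 (assumed lookup table). */
static const uint64_t POW10[10] = {
    1ULL, 10ULL, 100ULL, 1000ULL, 10000ULL, 100000ULL,
    1000000ULL, 10000000ULL, 100000000ULL, 1000000000ULL
};

/* Decimal length of n, found with comparisons only. */
static int digit_count(uint32_t n)
{
    int len = 1;
    while (len < 10 && n >= POW10[len])
        ++len;
    return len;
}

/* Returns 1 if n is a decimal palindrome, 0 otherwise. */
int is_palindrome_msd_test(uint32_t n)
{
    uint64_t rest = n;
    int len = digit_count(n);
    while (len > 1) {
        uint64_t p = POW10[len - 1];
        uint64_t d = rest % 10;              /* the LSD still costs one division */
        /* MSD test: d is the MSD iff d*10^(L-1) <= rest < (d+1)*10^(L-1). */
        if (!(d * p <= rest && rest < (d + 1) * p))
            return 0;                        /* early false detection */
        rest -= d * p;                       /* cut out the MSD */
        rest /= 10;                          /* drop the LSD */
        len -= 2;                            /* tracking len keeps inner zeros */
    }
    return 1;
}
```

With the slide's example, 4765 fails on the first comparison because 5000 <= 4765 does not hold, so only one division is performed.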

17 Software Only Solutions
On NIOS, division is really expensive
Division-free algorithm: don’t test the MSD, find it with binary search

18 Software Only Solutions
On NIOS, division is really expensive
Algorithm:
–Start from left
–Find half of the digits
–Compute the palindrome whose left half matches these digits
–Compare to the tested number
Lose the early false detection, but still better than division.
Runs in 3500 secs on NIOS 100 MHz.
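The steps above can be sketched in C as follows. This is one possible reading of the slide, not the original program: TEN_POW, decimal_length, leading_digit and is_palindrome_no_div are hypothetical names, and the sketch assumes multiplication is cheap enough to use inside the binary search.

```c
#include <stdint.h>

/* Powers of ten up to 10^9 (assumed lookup table). */
static const uint64_t TEN_POW[10] = {
    1ULL, 10ULL, 100ULL, 1000ULL, 10000ULL, 100000ULL,
    1000000ULL, 10000000ULL, 100000000ULL, 1000000000ULL
};

/* Decimal length of n, found with comparisons only. */
static int decimal_length(uint32_t n)
{
    int len = 1;
    while (len < 10 && n >= TEN_POW[len])
        ++len;
    return len;
}

/* Binary search for the digit d in 0..9 with d*p <= rest < (d+1)*p.
   Requires rest < 10*p. No division or modulo is used. */
static unsigned leading_digit(uint64_t rest, uint64_t p)
{
    unsigned lo = 0, hi = 9;
    while (lo < hi) {
        unsigned mid = (lo + hi + 1) >> 1;
        if ((uint64_t)mid * p <= rest)
            lo = mid;
        else
            hi = mid - 1;
    }
    return lo;
}

/* Returns 1 if n is a decimal palindrome, 0 otherwise. */
int is_palindrome_no_div(uint32_t n)
{
    uint64_t rest = n;   /* digits not yet extracted */
    uint64_t pal = 0;    /* palindrome built from the left half */
    int hi = decimal_length(n) - 1;
    int lo = 0;
    while (hi >= lo) {
        uint64_t p = TEN_POW[hi];
        uint64_t d = leading_digit(rest, p);
        rest -= d * p;
        pal += d * p;                 /* digit in its own position */
        if (hi != lo)
            pal += d * TEN_POW[lo];   /* mirrored position */
        --hi;
        ++lo;
    }
    return pal == n;
}
```

For 4765 the left half is 47, the palindrome built from it is 4774, and the final comparison fails.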

19 Using the Hardware
A general trick to divide by a constant without using division.
Based on a trick I read in “Hacker’s Delight” for dividing by 3.
Demonstrated on divide by 10:
–Given: number $n < 2^{30}$
–Needed: $\lfloor n/10 \rfloor$
–Algorithm: multiply n by $(2^{31}+2)/10$ = 0xCCCCCCD, and then shift right 31 positions.

20 Division Free divide by 10
Algorithm: multiply $n < 2^{30}$ by $(2^{31}+2)/10$ = 0xCCCCCCD, and then shift right 31 positions.
Proof: the above algorithm outputs
$\left\lfloor n \cdot \frac{2^{31}+2}{10} \cdot \frac{1}{2^{31}} \right\rfloor = \left\lfloor \frac{n}{10} + \frac{2n}{10 \cdot 2^{31}} \right\rfloor$
$n < 2^{30}$ implies $2n < 2^{31}$, hence $\frac{2n}{10 \cdot 2^{31}} < \frac{1}{10}$
$\lfloor n/10 \rfloor \le \frac{n}{10} \le \lfloor n/10 \rfloor + \frac{9}{10}$
Therefore $\lfloor n/10 \rfloor \le \frac{n}{10} + \frac{2n}{10 \cdot 2^{31}} < \lfloor n/10 \rfloor + 1$, so the output equals $\lfloor n/10 \rfloor$.
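The multiply-and-shift can be written directly in C. In this sketch the function name div10 is hypothetical, and a 64-bit intermediate product is assumed because n times 0xCCCCCCD can need up to 58 bits.

```c
#include <stdint.h>

/* Division-free floor(n / 10), valid for n < 2^30 as in the proof above.
   0xCCCCCCD is (2^31 + 2) / 10. */
uint32_t div10(uint32_t n)
{
    return (uint32_t)(((uint64_t)n * 0xCCCCCCDu) >> 31);
}
```

For instance div10(4765) gives 476, and the largest allowed input, 2^30 - 1 = 1073741823, gives 107374182.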

21 Divide by Constant
Similarly, to divide n by a constant C, we need to find P and R such that:
–$2^P + R \equiv 0 \pmod{C}$
–$R \cdot n < 2^P$
And then multiply n by $(2^P + R)/C$, and shift right P positions.
Found the constants for all powers of 10 needed.
Algorithm worst register-to-register delay: 25 ns.
Run time: 33 secs.
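The constants can be found by a small offline search. The sketch below is an assumption about how that might be done, not the presenter's tool: magic_t, find_magic and magic_div are hypothetical names, n_max is the largest input that will be divided, and the search itself uses ordinary division because it runs once at design time.

```c
#include <stdint.h>

typedef struct {
    uint64_t mult;   /* (2^P + R) / C, or 0 if no constant was found */
    unsigned shift;  /* P */
} magic_t;

/* Finds the smallest P, with R = (-2^P) mod C, such that R * n_max < 2^P. */
magic_t find_magic(uint32_t c, uint32_t n_max)
{
    magic_t none = {0, 0};
    if (c == 0 || n_max == 0)
        return none;
    for (unsigned p = 0; p < 63; ++p) {
        uint64_t pow2 = (uint64_t)1 << p;
        uint64_t r = (c - pow2 % c) % c;     /* makes 2^P + R divisible by C */
        if (r * n_max < pow2) {
            magic_t m;
            m.mult = (pow2 + r) / c;
            m.shift = p;
            /* The product n * mult must fit in 64 bits. */
            if (m.mult > UINT64_MAX / n_max)
                return none;
            return m;
        }
    }
    return none;
}

/* floor(n / C) for n <= n_max, using only a multiply and a shift. */
uint32_t magic_div(uint32_t n, magic_t m)
{
    return (uint32_t)(((uint64_t)n * m.mult) >> m.shift);
}
```

Calling find_magic(10, (1u << 30) - 1) reproduces the slide's constants, P = 31 and multiplier 0xCCCCCCD. The same call with C = 100 or C = 1000 gives constants for the other powers of 10.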

22 EN2911X Lab 2: Palindromes, Brian Reggiannini and Chris Erway

23 Checking a palindrome
All combinational logic!
Step 1: Convert 30-bit integer to 37-bit binary-coded decimal (BCD) format
Step 2: Detect the length of the decimal number
Step 3: Compare pairs of digits with XOR
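Steps 2 and 3 can be modeled in C on a packed BCD value, one digit per nibble. This is only a software illustration under that assumed packing; the function name bcd_is_palindrome is hypothetical, and the hardware would evaluate all comparisons in parallel instead of looping.

```c
#include <stdint.h>

/* Returns 1 if the packed BCD value reads the same in both directions. */
int bcd_is_palindrome(uint64_t bcd)
{
    /* Step 2: the length is the position of the highest nonzero digit. */
    int len = 1;
    for (int i = 15; i > 0; --i) {
        if ((bcd >> (4 * i)) & 0xF) {
            len = i + 1;
            break;
        }
    }
    /* Step 3: XOR each mirrored pair; any set bit marks a mismatch. */
    uint64_t diff = 0;
    for (int i = 0; i < len / 2; ++i) {
        uint64_t low  = (bcd >> (4 * i)) & 0xF;
        uint64_t high = (bcd >> (4 * (len - 1 - i))) & 0xF;
        diff |= low ^ high;
    }
    return diff == 0;
}
```

For example, the BCD value 0x12321 is accepted and 0x4765 is rejected.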

24 Binary to BCD converter

25 Binary to BCD converter

26 Integration with Nios II
Worst-case propagation delay: 43 ns, 5 cycles
Don’t want to wait!
Use 32-bit PIO interface
Array of 25 palindrome-checking units
Write out 32-bit start value…
–Read back # of total palindromes found (from next 25)
–While Nios is waiting: increment loop counter

27 Nios Software

28 Results
Original C program: 49.59 s/billion
Unoptimized Nios C program: 7842 s/100 million
Final result: 4.2 s/billion (420000036 cycles @ 100 MHz)
–Total logic elements: 23,039 / 33,216 (69%)

