Presentation is loading. Please wait.

Presentation is loading. Please wait.

Advanced Topics on FPGA Applications Screen A Wu, Jinyuan Fermilab IEEE NSS 2007 Refresher Course Supplemental Materials Oct, 2007.

Similar presentations


Presentation on theme: "Advanced Topics on FPGA Applications Screen A Wu, Jinyuan Fermilab IEEE NSS 2007 Refresher Course Supplemental Materials Oct, 2007."— Presentation transcript:

1 Advanced Topics on FPGA Applications Screen A Wu, Jinyuan Fermilab IEEE NSS 2007 Refresher Course Supplemental Materials Oct, 2007

2 Oct. 2007, Wu Jinyuan, Fermilab IEEE NSS Refresher Course, Supplemental Materials 2 Doublet Matching, Hash Sorter

3 Oct. 2007, Wu Jinyuan, Fermilab IEEE NSS Refresher Course, Supplemental Materials 3 Hit Matching SoftwareFPGA Typical FPGA Resource Saving Approaches O(n 2 ) for(){ for(){…} } O(n)*O(N) Comparator Array Hash Sorter O(n)*O(N): in RAM O(n 3 ) for(){ for(){…} } O(n)*O(N 2 ) CAM, Hugh Trans. Tiny Triplet Finder O(n)*O(N*logN) O(n 4 ) for(){ for(){ for() {…} }}}

4 Oct. 2007, Wu Jinyuan, Fermilab IEEE NSS Refresher Course, Supplemental Materials 4 Hash Sorter K K D K D Pass 1:  Data in Group 1 are stored in the hash sorter bins based on key number K. Pass 2:  Data in Group 2 are fetched though and paired up with corresponding Group 1 data with same key number K. Group 1 Group 2

5 Oct. 2007, Wu Jinyuan, Fermilab IEEE NSS Refresher Course, Supplemental Materials 5 Hash Sorter K

6 Oct. 2007, Wu Jinyuan, Fermilab IEEE NSS Refresher Course, Supplemental Materials 6 Hash Sorter Implementation Pipelined structure: Single clock cycle push or pop Single clock cycle fast reset

7 Oct. 2007, Wu Jinyuan, Fermilab IEEE NSS Refresher Course, Supplemental Materials 7 An Example of Track Recognition: Event We explain the track recognition process using this 20-track example.

8 Oct. 2007, Wu Jinyuan, Fermilab IEEE NSS Refresher Course, Supplemental Materials 8 Tangent Angle Measurements There are various techniques to measure the tangent angle of the track segment (or “doublet”, or “cluster”). Sometimes extra “ghost” segments may exist. The ghost segments may be resolved in track recognition process later. 

9 Oct. 2007, Wu Jinyuan, Fermilab IEEE NSS Refresher Course, Supplemental Materials 9 A Large Curvature Track A soft track hits large  region.  A global algorithm is better suited. The “high-p T ” approximation is not valid globally.  Exact track equation is needed. R r 00   Parameter: Initial angle Measure the tangent angle.. Parameter: Radius of curvature

10 Oct. 2007, Wu Jinyuan, Fermilab IEEE NSS Refresher Course, Supplemental Materials 10 An Example of Track Recognition: Clustering 00 c0c0 clustering The doublets in clusters are grouped together. The “ghost” doublets are gone. The 9-bin schemeThe 4-bin scheme For doublets on the seeding super layer in this bin… search for coincident in these 9 bins. For doublets on the seeding super layer in this bin… search for coincident in these 4 bins.

11 Oct. 2007, Wu Jinyuan, Fermilab IEEE NSS Refresher Course, Supplemental Materials 11 FPGA Block Diagram Hash sorters for  0 Hash sorters for c 0

12 Oct. 2007, Wu Jinyuan, Fermilab IEEE NSS Refresher Course, Supplemental Materials 12 Without Full Track Recognition Two track parameters can be calculated for each doublet. Useful trigger primitives can be found without full track recognition. For example…

13 Oct. 2007, Wu Jinyuan, Fermilab IEEE NSS Refresher Course, Supplemental Materials 13 Triplet Finding, Tiny Triplet Finder

14 Oct. 2007, Wu Jinyuan, Fermilab IEEE NSS Refresher Course, Supplemental Materials 14 Hit Matching SoftwareFPGA Typical FPGA Resource Saving Approaches O(n 2 ) for(){ for(){…} } O(n)*O(N) Comparator Array Hash Sorter O(n)*O(N): in RAM O(n 3 ) for(){ for(){…} } O(n)*O(N 2 ) AM, CAM, Hugh Trans. Tiny Triplet Finder O(n)*O(N*logN) O(n 4 ) for(){ for(){ for() {…} }}}

15 Oct. 2007, Wu Jinyuan, Fermilab IEEE NSS Refresher Course, Supplemental Materials 15 Hits, Hit Data & Triplets Hit data come out of the detector planes in random order. Hit data from 3 planes generated by same particle tracks are organized together to form triplets.

16 Oct. 2007, Wu Jinyuan, Fermilab IEEE NSS Refresher Course, Supplemental Materials 16 TTF Operations Phase I: Filling Bit Arrays Note: Flipped Bit Order Physical Planes Bit Array/Shifters For any hit… Fill a corresponding logic cell. x A + x C = 2 x B x A = - x C + constant

17 Oct. 2007, Wu Jinyuan, Fermilab IEEE NSS Refresher Course, Supplemental Materials 17 TTF Operations Phase II: Making Match For any center plane hit… Logically shift the bit array. Perform bit- wise AND in this range. Triplet is found. Physical Planes Bit Array/Shifters

18 Oct. 2007, Wu Jinyuan, Fermilab IEEE NSS Refresher Course, Supplemental Materials 18 Tiny Triplet Finder Reuse Coincident Logic via Shifting Hit Patterns C1 C2 C3 One set of coincident logic is implemented. For an arbitrary hit on C3, rotate, i.e., shift the hit patterns for C1 and C2 to search for coincidence.

19 Oct. 2007, Wu Jinyuan, Fermilab IEEE NSS Refresher Course, Supplemental Materials 19 Tiny Triplet Finder for Circular Tracks *R1/R3 *R2/R3 Triplet Map Output To Decoder Bit Array Shifter Bit Array Shifter Bit-wise Coincident Logic 1.Fill the C1 and C2 bit arrays. (n1 clock cycles) 2.Loop over C3 hits, shift bit arrays and check for coincidence. (n3 clock cycles) Also works with more than 3 layers

20 Oct. 2007, Wu Jinyuan, Fermilab IEEE NSS Refresher Course, Supplemental Materials 20 Tiny? Yes, Tiny! – Logic Cell Usage: AM, CAM, Hough Transform etc., O(N 2 ) Tiny Triplet Finder O(N*logN)

21 Oct. 2007, Wu Jinyuan, Fermilab IEEE NSS Refresher Course, Supplemental Materials 21 Complex Triplet Fining Problems

22 Oct. 2007, Wu Jinyuan, Fermilab IEEE NSS Refresher Course, Supplemental Materials 22 Options of Sequence Control

23 Oct. 2007, Wu Jinyuan, Fermilab IEEE NSS Refresher Course, Supplemental Materials 23 Micro-computing vs. Reconfigurable Computing In microprocessor, the users specify program on fixed logic circuits. In FPGA, the users specify logic circuits (as well as program). The FPGA computing needs not to follow microprocessor architectures. (But useful experiences can be borrowed.) The usefulness of FPGA reconfigurable computing is still to be fully appreciated. (100+3-4)*5+7 =? 100 3 4 5 7 Control: Data: 100,3,4,5,7 LD(-)(+)(*)(+) CPU FPGA Data Program Configuration Data Program

24 Oct. 2007, Wu Jinyuan, Fermilab IEEE NSS Refresher Course, Supplemental Materials 24 ELMS– Enclosed Loop Micro-Sequencer Loop & Return Logic + Stack Conditional Branch Logic Program Counter ROM 128x 36bits A Reset CLK Control Signals PCControl SignalsOpration 00000000000000000 01001000100011010LDR1, #n 02000010001000000LDR2, #addr_a 03000000000000100LDR3, #addr_X 04000000010001000LDR7, #0 05000000000100001BckA1LDR4, (R2) 06000100000010000INCR2 07000001000100000LDR5, (R3) 08000100010000001INCR3 09001001000100000MULR6, R4, R5 0a000000010001000EndA1ADDR7, R7, R6 0b000010000010000DECR1 0c000000100000100BRNZBckA1 Special in ELMS Supports FOR loops at machine code level PC+ROM is a good sequencer in FPGA. Adding Conditional Branch Logic allows the program to loop back. Loop & Return Logic + Stack is a special feature in ELMS that supports FOR loops at machine code level. Allows jump back as in microprocessors

25 Oct. 2007, Wu Jinyuan, Fermilab IEEE NSS Refresher Course, Supplemental Materials 25 Software: Using Spread Sheet as Compiler

26 Oct. 2007, Wu Jinyuan, Fermilab IEEE NSS Refresher Course, Supplemental Materials 26 What’s Good about ELMS No ALU => Small Resource Usage Program DATA Memory Princeton Architecture Harvard Architecture Fermilab Architecture(?) Program Control ALU Program Memory Program Control ALU DATA Memory Program Memory Sequencer (ELMS) Data Processor DATA Memory The Princeton Architecture is more suitable at system level while Harvard Architecture is better suited at micro-structure level. Regular microprocessors cannot run looped program without an ALU. The ALU takes large amount of resource while may not be efficiently utilized for data processing tasks in FPGA. The ELMS can run nested loop program without an ALU. Further separation of Program and data is therefore possible. The ELMS is kept small.

27 Oct. 2007, Wu Jinyuan, Fermilab IEEE NSS Refresher Course, Supplemental Materials 27 Recursive Structure

28 Oct. 2007, Wu Jinyuan, Fermilab IEEE NSS Refresher Course, Supplemental Materials 28 The Digitizer Card for the Fermilab Beam Loss Monitor System Beam loss input signals from ion chambers are integrated and digitized. Sliding sums are accumulated and compared with pre-loaded thresholds. Over threshold in several places causes beam abort based on pre-defined setting. Beam loss signals are filtered and “de-rippled” for display purposes. Sequence is controlled by “Seq128” block. ADC 21  s/sample RAM Fast Sliding Sum A>B Slow Sliding Sum Very Slow Sliding Sum Immediate Sliding Sum Threshold I Abort Logic A>B Threshold F A>B Threshold S A>B Threshold V CIC Sums De-ripple Process Ion Chamber Input Seq128

29 Oct. 2007, Wu Jinyuan, Fermilab IEEE NSS Refresher Course, Supplemental Materials 29 Filter Functions Sliding Sum Cascaded Integrator Comb (CIC) Sum of 2nd Order The CIC sum is a sliding sum of sliding sums. The frequency response of CIC sum is a sinc 2 (x) function that has 2nd order zeros and better stop band suppression. First Zero @ 360 Hz Frequency 21  s/sample 124 samples

30 Oct. 2007, Wu Jinyuan, Fermilab IEEE NSS Refresher Course, Supplemental Materials 30 Filter Implementation Recursive Implementation Recursive != IIR Non-Recursive Implementation Finite Impulse Respond (FIR) Infinite Impulse Respond (IIR) Possible Yes NO Resource Friendly  x[n] s[n] + -x[n-K] x[n] The non-recursive implementation needs: 124 memory fetches, 124 additions and more ops for longer sum lengths. The recursive implementation needs: 1 memory fetch, 2 add/sub operations regardless sum length. Sliding Sum

31 Oct. 2007, Wu Jinyuan, Fermilab IEEE NSS Refresher Course, Supplemental Materials 31 BLM DC Process Sequencing The processes of calculating sliding sums and CIC sums are fully sequenced. The de-ripple processor is flat for the process path. But it operates sequentially for 4 channels. Fully Sequencing Partially Flat

32 Oct. 2007, Wu Jinyuan, Fermilab IEEE NSS Refresher Course, Supplemental Materials 32 The End Thanks

33 Oct. 2007, Wu Jinyuan, Fermilab IEEE NSS Refresher Course, Supplemental Materials 33 Resource Saving Tricks Loop Reduction Tricks: The number of computations in a given task is reduced by (1) using fewer iterations in loops or/and (2) using fewer operations in each iteration. Non-Loop Reduction Tricks: The number of computations in a given task is unchanged. The FPGA resource is saved by (1) reusing the resources multiple times via sequencing or/and (2) using transistor-saving resources such as RAM.

34 Oct. 2007, Wu Jinyuan, Fermilab IEEE NSS Refresher Course, Supplemental Materials 34 Resource Saving Tricks Loop-Reduction Multiplier-less (ML) Approaches Recursive Implementation of FIR Filter FFT: O(n)*O(log(N)) Tiny Triplet Finder: O(n)*O(N*log(N)) + s[n] -x[n-K] x[n] + y[n] -s[n-K]  x[n] y[n] *h1 *h2 *h[K] X   << +/- *R1/R3 *R2/R3 Bit Array Shifter Bit Array ShifterBit-wise Coincident Logic

35 Oct. 2007, Wu Jinyuan, Fermilab IEEE NSS Refresher Course, Supplemental Materials 35 Resource Saving Tricks Non-Loop-Reduction Sequencing:Using RAM: Hash Sorter/Histogram OP1 Initialization OP2OP3OP4 OP1OP2OP3OP4 OP1OP2OP3OP4 OP1OP2OP3OP4 Initialization 1 Initialization 2 Initialization 3 OP1 OP2 OP3 OP4 OP1 OP2 OP3 OP4 OP1 OP2 OP3 OP4 OP1 OP2 OP3 OP4


Download ppt "Advanced Topics on FPGA Applications Screen A Wu, Jinyuan Fermilab IEEE NSS 2007 Refresher Course Supplemental Materials Oct, 2007."

Similar presentations


Ads by Google