By: Daniel Barsky, Natalie Pistunovich. Supervisors: Rolf Hilgendorf, Inna Rivkin. 10/06/2010.

1 By: Daniel Barsky, Natalie Pistunovich. Supervisors: Rolf Hilgendorf, Inna Rivkin. 10/06/2010

2
• Overview
• Project objectives
• Hardware
• Introduction to CAD tools
• Detailed Data Flow through the system
• Hardware utilization summary
• Latencies summary
• Points of possible improvement:
  • Implementation
  • Architecture
  • Algorithm
• Selected improvement to be implemented
• Timeline
• Future plans
• Project status
• Gantt Chart

3
• In the short run, optimize the algorithm to use minimal hardware so that it fits on 2 FPGA chips while maintaining minimum latency
• In the long run, determine an optimal architecture to be implemented on chip (ASIC)

4
• GIDEL PROCStar III card
  • 4 x Altera Stratix III FPGA
  • 1 GB DDR DRAM
• Altera Stratix-III EP3SE260
  • 255K Logic Elements
  • Maximum 768 18x18 bit multipliers *
  • Max. frequency: ~300MHz
* In FIR mode

5
• To get acquainted with the different CAD tools in use, we built a model design and ran it through the entire flow, up to programming it onto the card and running it there

6 Data flow through the system (blocks: Expander, CTF, OMP, DSP, Pseudo Inverse, Memory)
• Incoming samples arrive at 60MHz.
• Expander: samples are filtered and decimated to 12 channels of 20MHz, then sent to the Memory and the Q-Frame.
• Q-Frame: collects 70 samples, calculates the Q-Frame and sends it to the OMP.
• OMP: calculates the support from the Q-Frame, then sends it to the Pseudo Inverse.
• Memory: stores samples for later reconstruction, each with the appropriate support index.
• Pseudo-Inverse: recovers the columns of the support from matrix A, constructs their pseudo-inverse and sends it to the Reconstruction.
• Reconstruction & Support Change Detection: Reconstruction reconstructs data from the input samples using the pseudo-inverse. SCD checks for a significant change in the support and, if one is detected, initiates the calculation of a new one.
• Iteration Mode: samples are further filtered and decimated to 12 channels of 2MHz each iteration and sent to the CTF. Samples are also sent to the SCD to check for a change in the support. A Q-Frame is constructed, and a support is calculated and accumulated, for each iteration.

7 [Figure: Expander block diagram]
• Analog System + A/D: three inputs, each 60 MHz, 12 bit
• Figure labels: input bands 30 MHz to -30 MHz, cos/sin multipliers, LPF, output slices 10 MHz to -10 MHz numbered 1 to 3
• The expander (master) sends 12 20MHz slices to the CTF (slave) each cycle (12 samples, 20MHz each)
• The expander sends new 20MHz slices to the Memory and to the DSP each cycle (20 MHz sample)

8 [Figure: Expander block diagram, 2MHz slices]
• Analog System + A/D: three inputs, each 60 MHz, 12 bit
• Figure labels: input bands 30 MHz to -30 MHz, cos/sin multipliers, LPF, intermediate slices 10 MHz to -10 MHz numbered 1 to 3, output slices 1 MHz to -1 MHz
• Figure annotations: 3x80x(20/180), 10x40x(2/180)
• The expander (master) sends 12 2MHz slices to the CTF (slave) each cycle (12 samples, 2MHz each)
• Once the CTF requests new slices, the expander changes and sends new ones
• The expander sends new 20MHz slices to the Memory and to the DSP each cycle (20 MHz sample)

9
• Normal:
• Iterations:

Multipliers   120 MHz   180 MHz
Normal        496       336
Iteration     672       459

10
• Normal:
• Iterations:

Cycles        Normal   Iterations
@60MHz        81       204
@120MHz       162      408
@180MHz       243      612
Latency, us   1.35     3.4
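The cycle counts and the microsecond figures in this table are related by a simple division: cycles divided by the clock frequency in MHz gives the latency in microseconds. The sketch below checks that relation for the listed values. The function name `latency_us` and the dictionary layout are illustrative choices, not part of the original design.

```python
def latency_us(cycles: int, clock_mhz: float) -> float:
    """Latency in microseconds of `cycles` clock cycles at `clock_mhz` MHz."""
    return cycles / clock_mhz


# Cycle counts copied from the table: clock in MHz -> (normal, iterations).
cycle_table = {60: (81, 204), 120: (162, 408), 180: (243, 612)}

for mhz, (normal, iterations) in cycle_table.items():
    # Every row describes the same wall-clock latency at a different clock.
    print(f"{mhz} MHz: normal {latency_us(normal, mhz):.2f} us, "
          f"iterations {latency_us(iterations, mhz):.2f} us")
```

The cycle count scales with the clock while the latency in microseconds stays the same in every row.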

11 Q-Frame Block
• Constructs the Q-Frame for the support calculation and sends it to the OMP
[Figure: Q-Frame block diagram]
• Blocks: Conversion To Complex, Vector Multiplier, Q-Frame Memory (Mem A 5kbit, Mem B 5kbit), Support Accumulator, Controller
• Inputs: Input Channels From Expander, Support Vector From OMP
• Outputs: Q-Frame entries To OMP, Support Length Vector To DSP, Support Indices To DSP
• Bus width labels: 3x2x18 bit, 12x12x18 bit complex, 7x12 bit, 4 bit, 12 bit

12 Vector Multiplier
• Receives a vector of 12 18-bit complex samples from the Expander (Y[1..12])
• Calculates $Y Y^H$ in 2 clock cycles, where $Y = (Y_1, Y_2, \dots, Y_{12})^T$ and $Y^H = (Y_1^H, Y_2^H, \dots, Y_{12}^H)$

13 Vector Multiplier
• On the 1st cycle, calculates and stores the first 3 columns
• Requires: 33 18×18 complex multipliers
[Figure: products $Q_{i,j} = y_i y_j^H$ for columns $j = 1, 2, 3$ ($Q_{1,1}$ through $Q_{12,3}$), written to the Memory Bank (Column 1 to Column 12). The entries $Q_{2,1}$/$Q_{1,2}$, $Q_{3,1}$/$Q_{1,3}$ and $Q_{3,2}$/$Q_{2,3}$ are shown as pairs.]

14 Vector Multiplier
• On the 2nd cycle, calculates and stores the last 9 columns
• Requires: 45 18×18 complex multipliers
[Figure: products $Q_{i,j} = y_i y_j^H$ for rows $y_1, \dots, y_{12}$ and columns $y_1^H, \dots, y_{12}^H$ ($Q_{1,1}$ through $Q_{12,12}$), written to the Memory Bank (Column 1 to Column 12). The entries $Q_{5,4}$/$Q_{4,5}$, $Q_{6,5}$/$Q_{5,6}$, $Q_{6,4}$/$Q_{4,6}$, $Q_{12,4}$/$Q_{4,12}$, $Q_{12,5}$/$Q_{5,12}$ and $Q_{12,6}$/$Q_{6,12}$ are shown as pairs.]
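The multiplier counts on the two Vector Multiplier slides (33, then 45) match what one gets by computing only the entries $Q_{i,j}$ with $i \ge j$ and obtaining the rest by conjugation, since $Y Y^H$ is Hermitian. The following is a minimal sketch of that counting argument and of the mirrored construction. It assumes this is how the products are split between the cycles; the slides do not state it explicitly. The names `products_for_columns` and `outer_hermitian` are illustrative.

```python
N = 12  # number of channels in one sample vector Y


def products_for_columns(first: int, last: int, n: int = N) -> int:
    """Count products y_i * conj(y_j) with i >= j for columns first..last (1-based)."""
    return sum(n - j + 1 for j in range(first, last + 1))


def outer_hermitian(y):
    """Build Q = y y^H, multiplying only for i >= j and mirroring the conjugate."""
    n = len(y)
    q = [[0j] * n for _ in range(n)]
    for j in range(n):
        for i in range(j, n):
            q[i][j] = y[i] * y[j].conjugate()
            q[j][i] = q[i][j].conjugate()  # Q[j][i] is the conjugate of Q[i][j]
    return q


cycle1 = products_for_columns(1, 3)   # 12 + 11 + 10
cycle2 = products_for_columns(4, 12)  # 9 + 8 + ... + 1
print(cycle1, cycle2, cycle1 + cycle2)
```

Under this assumption the two cycles together cover the 78 distinct products of a 12x12 Hermitian matrix, and the busier cycle sets the number of multipliers that must exist in hardware.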

15
• Uses 45 18x18 bit Complex Multipliers (45 DSP Half-Blocks)
• Latency:
  • Normal Mode: 3.5 usec (70 samples collected at 20MHz)
  • Iteration Mode: 35 usec per iteration (70 samples collected at 2MHz)
  • Independent of system clock frequency!

16 OMP
• Calculates the signal’s support from the Q-Frame over several iterations of the Orthogonal Matching Pursuit algorithm
[Figure: OMP block diagram. Blocks: Support Calculation, Support Merge, A Matrix Memory, Matrix Multiplier. Input: Q-Frame entries from Q-Frame, 12x12x18 bit complex. Output: Support Vector to Q-Frame, 12 bit.]

17 Initialization: Q-frame is loaded into residual matrix
• 1 cycle
[Figure: Q-Frame → Residual Matrix]

18 Phase 1: Projection
• 101 cycles
• 144 18x18 Complex multipliers
[Figure labels: Residual, $A$, $A^H$, $Z$]

19 Phase 2: Energy Calculation, Find maximum energy & Update Support
• 101 cycles
• 12 18x18 Complex multipliers
[Figure labels: $Z$ with entries $Z_1, Z_2, \dots, Z_{101}$; $Z_1 Z_1^H = |Z_1|^2$; $|Z|^2$; Maximum Energy; Current Support]

20 Phase 4: Vector Orthogonalization
• Number of cycles depends on the iteration (2i cycles on the i-th iteration)
• 12 18x18 Complex Multipliers
[Figure labels: Current Support, $A$, $V_{support}$, Previous Orthogonal Vectors, $W_j$]

21 Phase 5: Vector Normalization
• 2 cycles + (square root calculation time)
• 12 18x18 Complex Multipliers
[Figure labels: $V_{support}$, $V_{support}^H$, Previous Orthogonal Vectors, $W_{support}$]

22 Phase 6: Residual Matrix Update
• 14 cycles
• 144 18x18 Complex Multipliers
[Figure labels: $W_{support}$, $W_{support}^H$, Residual]

23 Phase 7: Residual Matrix Energy Calculation & Stopping Condition Check
• 13 cycles
• 12 18x18 Complex Multipliers
[Figure: Residual → Calculate Column Energy → Calculate Overall Energy]
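The OMP phases on slides 17 to 23 can be followed in software with a short loop. The sketch below is a plain floating-point illustration of those phases, not the fixed-point hardware implementation. The function name `omp_support`, the `max_size` limit and the `tol` threshold are assumptions introduced for the example, and the orthogonalization is written as a Gram-Schmidt step against the previously stored vectors.

```python
import math


def omp_support(A, R, max_size, tol=1e-9):
    """Select column indices of A (m x n) that explain the residual matrix R (m x k)."""
    m, n, k = len(A), len(A[0]), len(R[0])
    R = [row[:] for row in R]                       # initialization: load the residual
    cols = [[A[i][j] for i in range(m)] for j in range(n)]
    support, W = [], []                             # chosen indices, orthonormal vectors
    while len(support) < min(max_size, n):
        # Projection: Z = A^H R, one row of Z per column of A.
        Z = [[sum(c[i].conjugate() * R[i][t] for i in range(m)) for t in range(k)]
             for c in cols]
        # Energy per row of Z, pick the maximum, update the support.
        energy = [sum(abs(z) ** 2 for z in row) for row in Z]
        j = max((idx for idx in range(n) if idx not in support),
                key=lambda idx: energy[idx])
        support.append(j)
        # Orthogonalize the chosen column against the previous vectors.
        v = cols[j][:]
        for w in W:
            coeff = sum(wi.conjugate() * vi for wi, vi in zip(w, v))
            v = [vi - coeff * wi for vi, wi in zip(v, w)]
        # Normalize (this is where the square root is needed).
        norm = math.sqrt(sum(abs(vi) ** 2 for vi in v))
        if norm < tol:                              # column adds nothing new
            support.pop()
            break
        w = [vi / norm for vi in v]
        W.append(w)
        # Residual update: R <- R - w (w^H R).
        proj = [sum(w[i].conjugate() * R[i][t] for i in range(m)) for t in range(k)]
        R = [[R[i][t] - w[i] * proj[t] for t in range(k)] for i in range(m)]
        # Stopping check on the overall residual energy.
        if sum(abs(x) ** 2 for row in R for x in row) < tol:
            break
    return sorted(support)
```

For example, with a 4x4 identity dictionary and a residual whose only nonzero rows are 1 and 3, the loop returns `[1, 3]` and stops once the residual energy reaches zero.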

24
• Uses 144 18x18 bit Complex Multipliers (144 DSP Half-Blocks)
• Latency:
  • Normal Mode: ~1100 Clock Cycles
    • 6.1 usec at 180MHz
    • 9.1 usec at 120MHz
  • Iteration Mode: ~2560 Clock Cycles per iteration
    • 14.2 usec per iteration at 180MHz
    • 21.3 usec per iteration at 120MHz
    (this latency is covered by the Q-Frame construction latency for the next iteration, which is 35 usec per iteration)

25 [Figure: DSP block diagram. Blocks: Memory, Expander, CTF, DSP, Pseudo Inverse, Support Change detector, External memory. Signals: Samples Y, Support, Matrix A, Reconstructed signal.]

26 QR Decomposition
• 261 cycles
• 51 multipliers, 1 sqrt
[Figure: Matrix A and the Support give $A_{m \times n}$, which the QR Decomposition turns into $Q_{m \times m}$]

27 $R = Q^T \times A$ (with $Q$ from the QR Decomposition)
• 12 cycles
• 144 multipliers

28 $R \rightarrow R^{-1}$
• 156 cycles
• 1 multiplier, 1 divide

29 $A^\dagger = R^{-1} \times Q^T$
• 12 cycles
• 144 multipliers

30 $Z = A^\dagger \times Y$
• 1 cycle @ 20MHz
• 144 multipliers
[Figure: Y samples are read from the Memory]
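The pseudo-inverse steps on slides 26 to 30 (QR decomposition, $R$ from $Q$ and $A$, inversion of $R$, $A^\dagger = R^{-1} Q^T$, then $Z = A^\dagger Y$) can be sketched as follows. This is an illustration under stated assumptions, not the hardware design: it uses a thin QR factorization computed by modified Gram-Schmidt, which the slides name only as a candidate algorithm, it uses the conjugate transpose so that complex data is handled, and it assumes the selected columns of $A$ are linearly independent. All function names are hypothetical.

```python
import math


def matmul(X, Y):
    """Plain matrix product of two lists of rows."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def conj_transpose(X):
    return [[X[i][j].conjugate() for i in range(len(X))] for j in range(len(X[0]))]


def qr_mgs(A):
    """Thin QR via modified Gram-Schmidt: Q is m x n, R is n x n."""
    m, n = len(A), len(A[0])
    Ac = [[complex(x) for x in row] for row in A]
    q_cols = []
    for j in range(n):
        v = [Ac[i][j] for i in range(m)]
        for q in q_cols:
            c = sum(qi.conjugate() * vi for qi, vi in zip(q, v))
            v = [vi - c * qi for vi, qi in zip(v, q)]
        norm = math.sqrt(sum(abs(vi) ** 2 for vi in v))
        q_cols.append([vi / norm for vi in v])
    Q = [[q_cols[j][i] for j in range(n)] for i in range(m)]
    R = matmul(conj_transpose(Q), Ac)               # R = Q^H A
    return Q, R


def invert_upper(R):
    """Invert an upper-triangular matrix by back-substitution, column by column."""
    n = len(R)
    inv = [[0j] * n for _ in range(n)]
    for j in range(n):
        for i in range(j, -1, -1):
            rhs = (1 if i == j else 0) - sum(R[i][t] * inv[t][j] for t in range(i + 1, j + 1))
            inv[i][j] = rhs / R[i][i]
    return inv


def pseudo_inverse(A):
    """A_dagger = R^-1 Q^H for a matrix with independent columns."""
    Q, R = qr_mgs(A)
    return matmul(invert_upper(R), conj_transpose(Q))
```

With `A_dagger = pseudo_inverse(A)`, reconstruction is then a single product `matmul(A_dagger, Y)`, where `Y` holds one sample vector per column.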

31 $Z = A^\dagger \times Y$
[Figure labels: DSP, Support Change detector, Pseudo Inverse, Support changed]

32
Multipliers: 144 (@120MHz and @180MHz)

          Cycles   us
@120MHz   441      3.6
@180MHz   441      2.45
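The 441-cycle total matches the sum of the four stage latencies listed on slides 26 to 29. The short check below adds them up and converts the total to microseconds. The stage labels are paraphrased for the example, and the reading that the total is exactly this sum is an inference from the numbers rather than something the slides state.

```python
# Stage cycle counts as listed on slides 26 to 29 (labels paraphrased).
stage_cycles = {
    "QR decomposition": 261,
    "R = Q^T x A": 12,
    "R inversion": 156,
    "A_dagger = R^-1 x Q^T": 12,
}

total_cycles = sum(stage_cycles.values())
# Cycles divided by the clock in MHz gives microseconds.
latency_by_clock = {mhz: total_cycles / mhz for mhz in (120, 180)}

print(total_cycles, latency_by_clock)
```

At 180MHz this gives 2.45 usec, as in the table. At 120MHz the exact quotient is 3.675 usec, which the table lists as 3.6.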

33
[Figure: system blocks across FPGA 1, FPGA 2 and FPGA 3, with utilization labels 73%, 98%, 75%. Block labels: Memory Controller, $A^\dagger$, CTF, Expander, DSP, Support Change Detector, Q-Frame, OMP.]
Timeline:
• New Incoming Sample
• Expander Delay: 1.3usec
• Q-Frame Delay: 3.5usec
• OMP Delay: 6usec
• Pseudo-Inverse Delay: 2.4usec
• Reconstruction Delay
• Sample ready for reconstruction

34
[Figure: system block diagram with improvement annotations. Block labels: Memory Controller, CTF, Expander, DSP, Support Change Detector, Q-Frame, OMP.]
• Use Matrix Multiplication Unit (DSP, Q-Frame, OMP)
• Extend Q-Frame Calculation
• Reconstruction using Matrix Multiplication Unit
• Support Change Detector
• 1 divide

35
[Figure: architecture split across FPGA#1 and FPGA#2]
• FPGA#1 - Expander: Expander
• FPGA#2 - Matrix Multiplication: Matrix multiplication unit
• Other block labels: Memory, DSP (Reconstruction & Support Change Detection, Pseudo-Inverse), CTF (Q-Frame, OMP), Controller

36
• Consider rank-1 updates for a change in the support
• Consider changing the QR decomposition algorithm in the DSP (Householder, modified Gram-Schmidt)
• Consider another decomposition instead of QR: LQ, SVD, etc.
• Consider another MP algorithm instead of OMP: BMP, Convex Optimization, etc.

37
• System Analysis (DONE):
  • Studying the system’s algorithm
  • Understanding algorithm implementation
  • Analyzing hardware usage & latency
  • Locating points of possible optimization
• Current System Simulation (PRESENT):
  • Creating Entire Current System test environment
  • Simulating entire current system
• System Optimization (FUTURE):
  • Selecting optimizations to be implemented
  • Implementing optimizations
  • Simulating optimized system

38

39
• The Memory Block consists of 2 Memory Banks, each with 12 columns that are 12x18x2 bits wide
• Each column can be written completely in 1 cycle, independently of the other columns
[Figure: Memory Bank with Column 1 to Column 12, each column holding Element 1 to Element 12 (18 bit complex each). Input: 12x12x18x2 bit vector, Data from Vector Multiplier. Output: 12x12x18x2 bit vector, Q-Frame to OMP.]

40
• In iteration mode, the Support Accumulator accumulates the supports extracted from each iteration
• When the iterations are finished, it sends the complete support to the DSP block
[Figure: Support Accumulator. Input: 12x1 binary vector, Iteration support From OMP. D-FF registers and a Support Format Conversion block. Outputs: 4 bit Support Length Vector To DSP, 7×12 bit Support Indices To DSP.]
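The accumulation described on this slide can be pictured as combining one 12-bit binary vector per iteration and then converting the result into a length and a list of indices. The sketch below assumes that accumulation is a bitwise OR and that bit i of the vector flags channel i. Neither detail is stated on the slide, and the function names are illustrative.

```python
def accumulate_supports(iteration_supports):
    """OR together the 12-bit binary support vectors, one per iteration."""
    acc = 0
    for support_bits in iteration_supports:
        acc |= support_bits
    return acc


def to_length_and_indices(acc, width=12):
    """Convert the accumulated binary vector into (support length, support indices)."""
    indices = [i for i in range(width) if (acc >> i) & 1]
    return len(indices), indices


# Two iterations: the first flags channels 0 and 2, the second channels 2 and 4.
acc = accumulate_supports([0b000000000101, 0b000000010100])
length, indices = to_length_and_indices(acc)
print(length, indices)
```

A channel flagged in more than one iteration appears only once in the accumulated support, which is why the OR is a natural fit for this sketch.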

