FIR Tap Filter Optimization
CE222 Final Project, Spring 2003
Soleste Hilberg, Nicole Starr
FIR Tap Filter: Finite Impulse Response Filter
FIR filters are one of the two primary types of digital filters used in Digital Signal Processing (DSP) applications.
To implement the filter:
1. Put the input sample into the delay line.
2. Multiply each sample in the delay line by the corresponding coefficient and accumulate the result.
3. Shift the delay line by one sample to make room for the next input sample.
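The three steps above can be sketched in software as follows. This is a minimal illustration, not the project's Verilog: the function name `fir_filter`, the use of plain Python integers, and the assumption that the delay line starts cleared are all choices made for this example.

```python
def fir_filter(samples, coeffs):
    """Run a FIR filter over `samples` using an explicit delay line.

    Assumption: the delay line starts out filled with zeros.
    """
    delay_line = [0] * len(coeffs)
    outputs = []
    for sample in samples:
        # Step 1: put the input sample into the delay line
        # (slot 0 was freed by the previous shift).
        delay_line[0] = sample

        # Step 2: multiply each stored sample by its coefficient
        # and accumulate the result.
        acc = 0
        for stored, coeff in zip(delay_line, coeffs):
            acc += stored * coeff
        outputs.append(acc)

        # Step 3: shift the delay line by one sample to make room
        # for the next input sample.
        delay_line = [0] + delay_line[:-1]
    return outputs


print(fir_filter([4, 5, 6], [1, 2, 3]))  # prints [4, 13, 28]
```

The first two outputs are smaller because the delay line still holds zeros until three samples have arrived.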
FIR Tap Filter
A FIR filter produces an output, y(n), that is the weighted sum of the current and past inputs, x(n).
A 3-tap filter is based on the current input and the 2 previous inputs:
$$y(n) = b_0 x(n) + b_1 x(n-1) + b_2 x(n-2) = \sum_{i=0}^{2} b_i x(n-i)$$
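As a worked instance of the 3-tap weighted sum, take illustrative coefficients and inputs (these numbers are chosen for the example and do not come from the project's testbench): b_0 = 1, b_1 = 2, b_2 = 3, with x(0) = 4, x(1) = 5, x(2) = 6.

```latex
\begin{aligned}
y(2) &= b_0\,x(2) + b_1\,x(1) + b_2\,x(0) \\
     &= 1 \cdot 6 + 2 \cdot 5 + 3 \cdot 4 \\
     &= 6 + 10 + 12 = 28
\end{aligned}
```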
FIR Tap Filter
Input: Sample [23:0]
Output: Result [47:0]
Clock frequency: 20 MHz
Block diagram signals: Sample (24 bits), Input_valid, rst, clk, Result (48 bits), output_ready
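The 24-bit input and 48-bit output widths can be related with a quick calculation. The sketch below assumes unsigned values and 24-bit coefficients; the slides do not state the coefficient width or signedness, so `COEFF_BITS` is a hypothetical parameter.

```python
SAMPLE_BITS = 24  # from the interface: Sample [23:0]
RESULT_BITS = 48  # from the interface: Result [47:0]
COEFF_BITS = 24   # assumption: coefficient width is not given in the slides

max_sample = (1 << SAMPLE_BITS) - 1
max_coeff = (1 << COEFF_BITS) - 1

# Largest single product of one sample and one coefficient.
max_product = max_sample * max_coeff
# Largest possible sum of the three products in a 3-tap filter.
max_sum = 3 * max_product

print(max_product.bit_length())  # prints 48
print(max_sum.bit_length())      # prints 50
```

Under these assumptions a single product fits exactly in the 48-bit result, while a worst-case sum of three products would need 50 bits. How the original design handles that case is not stated in the slides.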
FIR Tap Filter
Executes processes in the following order:
(Load) Load input data to rin
(Calc1) Multiply rin with coefficient c0, store to acc
(Calc2) Multiply rs0 with coefficient c1, add to acc, store in acc
(Calc3) Multiply rs1 with coefficient c2, add to acc, store in acc
(Shift) Store acc in result, move rs0 to rs1, move rin to rs0
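The Load, Calc1, Calc2, Calc3, Shift sequence can be modeled behaviorally as below. The register names (`rin`, `rs0`, `rs1`, `acc`, `result`) and coefficient names (`c0`, `c1`, `c2`) follow the slide; the class name `FirTapModel`, the zero reset values, and the use of unbounded Python integers (no 48-bit wraparound) are assumptions of this sketch.

```python
class FirTapModel:
    """Behavioral model of the 3-tap datapath, one input per call."""

    def __init__(self, c0, c1, c2):
        self.c0, self.c1, self.c2 = c0, c1, c2
        # Assumption: all registers reset to zero.
        self.rin = self.rs0 = self.rs1 = 0
        self.acc = 0
        self.result = 0

    def process(self, sample):
        self.rin = sample                          # Load
        self.acc = self.rin * self.c0              # Calc1
        self.acc = self.acc + self.rs0 * self.c1   # Calc2
        self.acc = self.acc + self.rs1 * self.c2   # Calc3
        self.result = self.acc                     # Shift: store acc in result,
        self.rs1 = self.rs0                        #        move rs0 to rs1,
        self.rs0 = self.rin                        #        move rin to rs0
        return self.result


model = FirTapModel(1, 2, 3)
print([model.process(x) for x in [4, 5, 6]])  # prints [4, 13, 28]
```

Note that rs0 must be copied to rs1 before rin overwrites rs0; in hardware with non-blocking assignments both moves happen on the same clock edge.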
Testbench Input / Output
INPUT OUTPUT: a f a054af e0546e d ec a053ab e0536a a a 00000b6052e8
INPUT OUTPUT: bbffdb b1fa dfa8c fa fa fa98a dfa9cb faa0c faa4d faa8e
First Attempt: 5 cycle latency
3 calculation states
State diagram: wait, load, calc1, calc2, calc3, shift
Transition labels: Input_valid = 1, Input_valid = 0, output_ready = 0
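A small software model shows where the 5 cycles per input come from. The state names are taken from the slide, but the transition conditions are assumptions: the sketch guesses that the machine stays in wait while Input_valid = 0, and that after shift it returns directly to load when another input is already valid. The helper names `next_state` and `cycles_per_input` are hypothetical.

```python
# Fixed successors for the states that do not depend on Input_valid.
NEXT_STATE = {
    "load": "calc1",
    "calc1": "calc2",
    "calc2": "calc3",
    "calc3": "shift",
}


def next_state(state, input_valid):
    """Assumed transition function for the 5-cycle state machine."""
    if state in ("wait", "shift"):
        return "load" if input_valid else "wait"
    return NEXT_STATE[state]


def cycles_per_input(num_inputs):
    """Count clock cycles for back-to-back inputs, starting from wait."""
    state = "wait"
    cycles = 0
    done = 0
    while done < num_inputs:
        state = next_state(state, input_valid=True)
        cycles += 1
        if state == "shift":
            done += 1  # one result is produced in each shift state
    return cycles / num_inputs


print(cycles_per_input(20))  # prints 5.0
```

Each input occupies load, calc1, calc2, calc3, and shift in turn, so no two inputs are ever in flight at once.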
Results: 5 cycle latency
Verilog Simulation Results: Testbench simulation complete
Required time to complete:
Number of inputs processed: 20
Cycles to complete:
Time-to-input ratio: 5.125
Second Attempt: 3 cycle latency
Condense 3 calculation states into 1 state
State diagram: wait, load, calc1, shift
Transition labels: Input_valid = 1, Input_valid = 0, output_ready = 0
Results: 3 cycle latency
Verilog Simulation Results: Testbench simulation complete
Required time to complete: 6450
Number of inputs processed: 20
Cycles to complete: 64.5
Time-to-input ratio: 3.225
Third Attempt: 2 stage pipeline
Stage 1: Shift registers, load new input value
Stage 2: Calculate results
Pipeline diagram (instruction vs. time): each instruction passes through STAGE 1 and then STAGE 2, staggered in time
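The overlap between the two stages can be illustrated with an idealized schedule. This is a generic pipelining sketch, not the project's Verilog; the function name `pipeline_schedule` is hypothetical, and it assumes one new input enters Stage 1 on every clock cycle with no stalls.

```python
def pipeline_schedule(num_inputs, num_stages=2):
    """Return one row per clock cycle listing which input index
    occupies each stage (None means the stage is empty)."""
    total_cycles = num_inputs + num_stages - 1
    schedule = []
    for cycle in range(total_cycles):
        row = []
        for stage in range(num_stages):
            idx = cycle - stage  # input that entered `stage` cycles ago
            row.append(idx if 0 <= idx < num_inputs else None)
        schedule.append(row)
    return schedule


for cycle, (stage1, stage2) in enumerate(pipeline_schedule(3)):
    print(cycle, stage1, stage2)
# prints:
# 0 0 None
# 1 1 0
# 2 2 1
# 3 None 2
```

In this idealized model, N inputs take N + 1 cycles, so 20 inputs would need 21 cycles, or about 1.05 cycles per input once the pipeline is full.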
Results: 2 stage pipeline
Verilog Simulation Results: Testbench simulation complete
Required time to complete: 2150
Number of inputs processed: 20
Cycles to complete: 21.5
Time-to-input ratio: 1.075
Speedup: Pipeline over State Machine
3 stage state machine: 37% faster than 5 stage state machine
2 stage pipeline machine: 67% faster than 3 stage state machine, 79% faster than 5 stage state machine
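The percentages above appear to be the relative reduction in time-to-input ratio between designs. That reading is an inference, but it reproduces all three reported numbers from the ratios 5.125, 3.225, and 1.075; `percent_faster` is a hypothetical helper name.

```python
def percent_faster(old_ratio, new_ratio):
    """Relative reduction in time-to-input ratio, in percent."""
    return (1 - new_ratio / old_ratio) * 100


FIVE_CYCLE = 5.125   # time-to-input ratio, 5 cycle state machine
THREE_CYCLE = 3.225  # time-to-input ratio, 3 cycle state machine
PIPELINE = 1.075     # time-to-input ratio, 2 stage pipeline

print(round(percent_faster(FIVE_CYCLE, THREE_CYCLE)))  # prints 37
print(round(percent_faster(THREE_CYCLE, PIPELINE)))    # prints 67
print(round(percent_faster(FIVE_CYCLE, PIPELINE)))     # prints 79
```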
Results Comparison: Area
5 state machine
Combinational area:
Noncombinational area:
Total cell area:
2 stage pipeline
Combinational area:
Noncombinational area:
Total cell area:
Results Comparison: Timing
5 state machine
data required time: 9.77
data arrival time:
slack (MET): 0.00
2 stage pipeline
data required time: 9.00
data arrival time:
slack (MET): 0.84