# Chapter 14 Finite Impulse Response (FIR) Filter Implementation


Outline
- Digital Filters and FIR filters
- Implementation of FIR Filters on C54x
- Implementation of FIR Filters on C55x
- Comparison of C54x and C55x

Outline of FIR Filters
- Generalities on Digital Filters
- FIR Filters with Matlab
- Implementation of FIR Filters

Digital Filters
The choice of the sampling frequency fS depends on the maximum frequency fmax present in the signal. The Shannon sampling theorem requires fS > 2 fmax.

Linear, Time-Invariant Digital Systems
Linearity Time Invariance

Impulse Response

Input-Output Relationship, Convolution

Input-Output Relationship, Convolution
Using linearity and time invariance:
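
For an LTI system, linearity and time invariance lead to the convolution sum y(n) = sum over k of h(k) x(n-k). The sketch below illustrates it for finite sequences that start at n = 0; the function name `convolve` is an illustrative choice, not something defined in the slides.

```python
def convolve(h, x):
    """Return y with y[n] = sum_k h[k] * x[n-k] for finite sequences starting at n = 0."""
    y = [0.0] * (len(h) + len(x) - 1)
    for n in range(len(y)):
        for k, hk in enumerate(h):
            if 0 <= n - k < len(x):      # samples outside the sequence count as zero
                y[n] += hk * x[n - k]
    return y


h = [0.5, 0.5]           # two-point averaging impulse response
x = [1.0, 2.0, 3.0]
print(convolve(h, x))    # [0.5, 1.5, 2.5, 1.5]
```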

Output for a Single Frequency Input
Single frequency input → single frequency output

Frequency Transfer Function
For a digital filter, the frequency transfer function is periodic, with period fS. It is described by its amplitude, its phase and its group delay.

Relationship Between Fourier Transforms of Input and Output

Z Transfer Function

Basic Relationships of a Digital Filter

Rational z Transfer Function
Linear difference equation with constant coefficients. Rational z transfer functions are easy to implement.

IIR and FIR Filters
- IIR = Infinite Impulse Response
- FIR = Finite Impulse Response

FIR and IIR
- FIR: the output yn is a linear combination of a finite number of input samples.
- IIR: the output yn is a linear combination of a finite number of input samples and of past output samples (recursive form).

Causality and Stability
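- A filter is causal if hn = 0 for n < 0.
- A filter is stable if the output is bounded for any bounded input.
- Condition for stability: the impulse response is absolutely summable (the sum of |hn| is finite). Or, for a causal filter: all the poles of H(z) are inside the unit circle.
- FIR filters are always stable.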

Representation of Poles and Zeroes of H(z) in the Complex Plane

Some Useful Matlab Functions
Example for a FIR filter:
- Enter the filter coefficients vector b: b=[ ]; a=1;
- Calculate the transfer function Hf, its amplitude and its phase on 256 samples, with the sampling frequency set to fs=1: [Hf,f]=freqz(b,a,256,1); HfA=abs(Hf); Hfphi=angle(Hf);

Some Useful Matlab Functions
- Plot the impulse response: stem(b)
- Plot the amplitude and the phase of the transfer function: plot(f,HfA) and plot(f,Hfphi)

Some Useful Matlab Functions
- Generate a test signal as a sum of cosines: x=cos(2*pi*[0:99]*0.25)+2*cos(2*pi*[0:99]*0.1);
- Apply the filter to x to obtain the output y: y=filter(b,a,x);
- Plot the results: plot(x); plot(y)

x is the sum of 2 frequencies: 0.25 and 0.1. The filter cancels the frequency 0.25, so y contains only the frequency 0.1. Notice the transient response at the start of y.
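
The same experiment can be sketched in Python. The coefficient vector is not given in the slides, so the sketch assumes a hypothetical filter b = [0.5, 0, 0.5], whose transfer function has a zero at the normalized frequency 0.25 and a gain of cos(2π·0.1) with a delay of 1 sample at frequency 0.1. The helper `fir_filter` is an illustrative stand-in for Matlab's filter(b,1,x).

```python
import math


def fir_filter(b, x):
    """Direct-form FIR (a = 1) with zero initial conditions."""
    return [sum(b[i] * x[n - i] for i in range(len(b)) if n - i >= 0)
            for n in range(len(x))]


b = [0.5, 0.0, 0.5]  # hypothetical coefficients, NOT the ones from the slides
x = [math.cos(2 * math.pi * 0.25 * n) + 2 * math.cos(2 * math.pi * 0.1 * n)
     for n in range(100)]
y = fir_filter(b, x)

# Steady-state prediction for this b: only the 0.1 component survives,
# scaled by |H(0.1)| = cos(2*pi*0.1) and delayed by one sample.
gain = math.cos(2 * math.pi * 0.1)
expected = [2 * gain * math.cos(2 * math.pi * 0.1 * (n - 1)) for n in range(100)]

steady_state_error = max(abs(y[n] - expected[n]) for n in range(2, 100))
transient_error = abs(y[0] - expected[0])
print(steady_state_error, transient_error)
```

The first len(b)-1 output samples differ from the steady-state prediction, which is the transient mentioned above.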

Calculation of a FIR using Matlab
For given attenuation and frequency response characteristics, the transfer function can be calculated using different methods:
- Least mean square error: Matlab function firls.
- Minimax (Chebyshev): Matlab function remez (named firpm in recent Matlab releases).
- Empirical window method: Matlab functions fir1 and fir2.

Example using Matlab
Design a low pass filter:
- Sampling frequency = 9600 Hz
- Maximum attenuation in the passband = 0.1 dB
- Minimum attenuation in the stopband = 50 dB
- Limit frequencies of the passband and of the stopband = 1200 Hz and 2600 Hz

Example using Matlab
- Vector of limit frequencies (normalized), F
- Vector of required amplitudes: A=[ ];
- Least square calculation of the filter: Bls=firls(23,F,A);
- Minimax calculation of the filter: Bre=remez(21,F,A);
- Window method (Hamming): Bwin=fir1(25,( )/9600);

Results of Matlab Example
The minimum orders that satisfy the constraints are 23 for the least square method, 21 for minimax and 25 for the window method. The order of a FIR filter is equal to the number of coefficients minus 1.
[Figure: frequency responses for the least square, minimax and window methods]

Results of Matlab Example
[Figure: impulse response hn versus n]

FIR Filters with Constant Group Delay or Linear Phase
For many applications, it is desirable to use a filter with a constant group delay (independent of the frequency). The phase is then linear or affine. There are 2 possible cases, a symmetrical or an antisymmetrical FIR, where N is the number of coefficients:
- Symmetrical: h(n) = h(N-1-n)
- Antisymmetrical: h(n) = -h(N-1-n)
- Constant group delay = TS (N-1)/2
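
As a minimal sketch, the two symmetry conditions and the resulting group delay can be checked numerically. The function names `linear_phase_type` and `group_delay` are illustrative and do not come from the slides.

```python
def linear_phase_type(h, tol=1e-12):
    """Classify an impulse response as symmetrical, antisymmetrical or neither (None)."""
    n = len(h)
    if all(abs(h[i] - h[n - 1 - i]) <= tol for i in range(n)):
        return "symmetrical"       # h(n) = h(N-1-n)
    if all(abs(h[i] + h[n - 1 - i]) <= tol for i in range(n)):
        return "antisymmetrical"   # h(n) = -h(N-1-n)
    return None


def group_delay(h, ts=1.0):
    """Constant group delay TS*(N-1)/2, meaningful only for the two cases above."""
    return ts * (len(h) - 1) / 2


print(linear_phase_type([1, 2, 2, 1]), group_delay([1, 2, 2, 1]))  # symmetrical 1.5
```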

FIR filters with Constant Group Delay or Linear Phase
Symmetrical case: linear phase. Antisymmetrical case: affine phase (linear phase plus a constant offset of π/2).

Fixed Point Implementation of FIR Filters Numerical Issues
- 16 bits for data and coefficients; accumulators have a size of 40 bits.
- Fixed point representation of data: size B = 16 bits, format Qk with k fractional bits.
- Quantization of coefficients:
  - Maximum magnitude coefficient = hmax
  - Number of bits of the integer part of the coefficients: Bi = log2(hmax)
  - Coefficients in Qk' with k' = 16-Bi
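
The choice of format can be sketched as follows. This sketch makes its own bookkeeping assumptions, which may differ from the slide's definition of Bi: one bit is reserved for the sign, the integer part uses ceil(log2(hmax)) bits (never fewer than 0), and values are saturated to the 16-bit range. The function names are illustrative.

```python
import math


def choose_fractional_bits(hmax, word_bits=16):
    """Fractional bits k, assuming 1 sign bit and ceil(log2(hmax)) integer bits."""
    int_bits = max(0, math.ceil(math.log2(hmax)))
    return word_bits - 1 - int_bits


def quantize(coeffs, k, word_bits=16):
    """Round each coefficient to Qk and saturate to the signed word range."""
    lo = -(1 << (word_bits - 1))
    hi = (1 << (word_bits - 1)) - 1
    return [min(hi, max(lo, round(c * (1 << k)))) for c in coeffs]


k = choose_fractional_bits(0.3)        # coefficients below 1 in magnitude: Q15
print(k, quantize([0.5, -0.25], k))    # 15 [16384, -8192]
```

With these assumptions, a coefficient set whose largest magnitude is 1.5 would be stored in Q14, leaving one integer bit.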

Matlab Example
The coefficients Bre can be quantized using 16-bit fixed point with 15 fractional bits: Ba=round(Bre*2^15);
To store the result in a text file for CCS: fp=fopen('coef.asm','wt'); for i=1:22, fprintf(fp,' .word %d \n',Ba(i)); end; fclose(fp);

Matlab Example File coef.asm Can be edited to be used with CCS.

FIR Implementation, Numerical issues, FRCT bit
Common case:
- Data and coefficients are in Q15 format.
- The product h(i)x(n-i) is in Q30 format (2 sign bits).
- By shifting the products 1 bit left, the products are in Q31 format with only 1 sign bit.
- If the FRCT (fractional mode) bit is set to 1, products are automatically shifted 1 bit left.
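
The effect of the 1-bit left shift can be illustrated with plain integer arithmetic. This is a sketch of the number formats only, not a model of the DSP multiplier; `q15_multiply` and its `frct` parameter are illustrative names.

```python
def q15_multiply(a, b, frct=True):
    """Multiply two Q15 integers; the result is Q31 if frct is True, else Q30."""
    product = a * b                      # Q15 x Q15 = Q30, with 2 sign bits
    return product << 1 if frct else product


half = 0x4000                            # 0.5 in Q15
p31 = q15_multiply(half, half)           # Q31 result
p30 = q15_multiply(half, half, frct=False)
print(p31 / 2**31, p30 / 2**30)          # 0.25 0.25
```

Both results represent 0.25, but only the Q31 value has its binary point aligned with the Q15 data once the high 16 bits are kept.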

Structures for FIR Implementation
Common structures for FIR filters:
- Transversal structures, using linear buffers or circular buffers.
- Lattice structure, useful in some adaptive situations.
- Special case for symmetrical or antisymmetrical FIRs.

Transversal Structures of FIR
Structure with a delay line Transposed structure

Implementation of a FIR with a Delay Line
Most common structure used in DSP. The delay line can be implemented using a linear or a circular buffer. Basic operations:
- Read a new data value x(n) every TS.
- ACCU=0
- For i=0 to N-1: multiply h(i) by x(n-i) and add the product to the accumulator.
- Output y(n).

Implementation of FIR Filters on C54x
- Implementation of general transversal FIR filters, using linear buffers or circular buffers
- Implementation of symmetrical FIR filters

Operations using a Linear Buffer for a FIR with N Coefficients
- Length of the delay line = N samples.
- Read a new sample x(n) and store it in the first position of the delay line.
- ACCU=0
- For i=0 to N-1: read h(i) and x(n-i), multiply h(i) by x(n-i) and add the product to ACCU.
- Output y(n).
- Perform N-1 shifts in the delay line.

Linear Buffer, MACD Mode
Instead of shifting N-1 samples at the end, do the shifts in the loop, one by one:
- Read a new sample x(n) and store it in the first position of the delay line.
- ACCU=0
- For i=N-1 down to 0: read h(i) and x(n-i), multiply h(i) by x(n-i), add the product to ACCU, then shift x(n-i) in the delay line.
- Output y(n).
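
The loop with the shift folded into each iteration can be sketched in Python as below. This only models the order of operations; the list `delay` stands for the delay line with delay[i] = x(n-i), and the function name `fir_macd_step` is illustrative. The move of the oldest sample is skipped because that sample is no longer needed.

```python
def fir_macd_step(delay, h, x_new):
    """Compute one output sample and update the delay line in place."""
    n_taps = len(h)
    delay[0] = x_new                     # new sample in the first position
    acc = 0
    for i in range(n_taps - 1, -1, -1):  # from the oldest sample to the newest
        acc += h[i] * delay[i]           # multiply-accumulate
        if i + 1 < n_taps:
            delay[i + 1] = delay[i]      # delay move: x(n-i) becomes x(n+1-(i+1))
    return acc


h = [1, 2, 3]
delay = [0, 0, 0]
outputs = [fir_macd_step(delay, h, v) for v in [1, 0, 0, 0]]
print(outputs)                           # [1, 2, 3, 0]
```

Processing from the oldest sample towards the newest is what makes the in-loop shift safe: each value is read before it is overwritten.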

MACD Instruction
MACD: Multiply Accumulate and Delay move.
- Syntax: MACD Smem, pmad, src
- Operation: src=src+Smem*pmad; T=Smem; (Smem+1)=Smem
- If MACD is used in a loop with RPT, the program memory address (pmad) is automatically incremented.
- MACD alone takes 3 cycles; in a RPT loop it takes 1 cycle.

Implementing a FIR with MACD
Memory organization of data and coefficients

Initialization of Registers
- STM stores #value to the MMR early in the pipeline to avoid latencies (2 words, 2 cycles).
- Initialization of the FRCT bit (fractional mode): instructions SSBX (Set Status Bit) and RSBX (Reset Status Bit).
- Initialization of ACCU: using RPTZ (repeat after initializing ACCU to 0) or via LD #0,A.

RPT, RPTZ Instructions
- RPT #n: repeats the next instruction n+1 times. The repetition counter is set to n and decreases until 0. 1 or 2 cycles, not interruptible.
- RPTZ src, #n: same as RPT, except that the accumulator src is cleared to zero before the repeat. 2 cycles, not interruptible.
- Some instructions execute faster when in repeat mode (pipeline).

Implementing a FIR Filter with MACD
Test with CCS Filter with N=32 coefficients all equal to 1/32 Create a file fircoef.asm, address of coefficients in program mem = adr_coef

Implementing a FIR Filter with MACD
File containing coefficients fircoef.asm

Implementing a FIR Filter with MACD
File firmacd.asm with the program 2 files to compile and link: fircoef.asm and firmacd.asm Test by associating files on the ports DRR0 and DXR0 File infir.dat attached to DRR0 File outfir.dat attached to DXR0

Implementing a FIR Filter with MACD
Program file firmacd.asm: initializations

Implementing a FIR Filter with MACD
Program file firmacd.asm: endless loop See files firmacd.asm and fircoef.asm for the test in directory tutorial.

FIR with MACD, Test with CCS
Create the project, create the command file, compile and link. To test the impulse response:
- Create a file infir.dat with a value 0.5 (0x4000) followed by zeros (at least 40).
- Set 2 probe points: one at the reading of DRR (LDM DRR) and one at the end of the loop (B debut).
- Attach files to the probe points: infir.dat at the first probe point (the value read is stored at address 0x20, DRR) and outfir.dat at the second probe point (the data at address 0x21, DXR, is stored in the file).

Results Let program run until end of file infir.dat
Load file outfir.dat at some address in the DSP data memory (File-Data-Load) Plot the content of this memory area (View-Graph-Time/Frequency). Plot a time graph (Single Time) Plot a frequency graph (FFT: Magnitude and Phase)

Results for the impulse response and its FFT

Second Test New test with a sine input.
Replace infir.dat by file insinus.dat containing 80 samples of a sine with 40 samples per period of sine. Name outsine.dat the result file. Repeat the same operations as in the preceding test.

Second test
Observe that the output is attenuated and phase shifted by the values given by H(f) at f = fS/40.

Implementation using a Circular Buffer
A circular buffer of length N is a block of contiguous memory words addressed by a pointer using a modulo N addressing mode. The 2 extreme words of the memory block are considered as contiguous. Characteristics of a circular buffer: Instead of moving the N data in memory, just modify the pointers. When a new data x(n) arrives, the pointer is incremented and the new data is written in place of the oldest one.

Trace of Memory and Pointer in a Circular Buffer of Length 3

FIR with Circular Buffers
1 for data 1 for coefficients

Operation of FIR with Circular Buffer
- Read a new input sample x(n) and store it at the address held in pnt_data.
- ACCU=0
- For i=0 to N-1: multiply the data pointed to by pnt_data by the coefficient pointed to by pnt_coef, add the product to ACCU, then decrement the pointers pnt_data and pnt_coef.
- Output y(n) from ACCU.
- Increment pnt_data by 1.
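
A minimal Python sketch of a FIR on a circular data buffer follows. It uses one possible pointer convention (coefficient index counting up, data index counting down modulo N), which may differ from the slide's; the class name `CircularFir` is illustrative, while `pnt_data` follows the naming above.

```python
class CircularFir:
    """FIR filter whose delay line is a circular buffer of length N."""

    def __init__(self, h):
        self.h = list(h)
        self.n = len(self.h)
        self.buf = [0] * self.n
        self.pnt_data = 0                      # where the next sample is written

    def step(self, x_new):
        self.buf[self.pnt_data] = x_new        # overwrite the oldest sample
        acc = 0
        idx = self.pnt_data
        for i in range(self.n):
            acc += self.h[i] * self.buf[idx]   # h(i) * x(n-i)
            idx = (idx - 1) % self.n           # modulo-N decrement
        self.pnt_data = (self.pnt_data + 1) % self.n
        return acc


fir = CircularFir([1, 2, 3])
print([fir.step(v) for v in [1, 0, 0, 0]])     # [1, 2, 3, 0]
```

No sample is ever moved in memory: only the write pointer advances, which is the point of the circular buffer.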

Instruction MAC with 2 operands in Indirect Addressing Mode
MAC: Multiply and Accumulate
- Syntax: MAC Xmem, Ymem, src[, dst]
- Operation: dst=src+Xmem*Ymem; T=Xmem
- Can be executed in 1 cycle.
- Dual operand indirect addressing is restricted to the registers AR2, AR3, AR4, AR5 and to the modifiers none, +, -, +0%.

Circular Buffer with C54x
- Circular indirect addressing modes: *ARi-%, *ARi+%, *ARi-0%, *ARi+0%, *ARi(lk)%
- In dual operand mode (Xmem, Ymem), *ARi+0% is the only valid circular mode. To perform a decrement, store a negative value in AR0.
- BK register: stores the size N of the circular buffer. It must be initialized before use.
- There may be several circular buffers at different addresses at the same time, but they all have the same length.

Limitations on Start Addresses of Circular Buffers
If N is written on nb bits in binary, the start address must have its nb LSBs at 0. Examples:
- for N=32, the 6 LSBs of the start address are 0
- for N=30, the 5 LSBs of the start address are 0

To access a circular buffer:
- Initialize BK with N (nb bits).
- Choose 1 ARi as a pointer.
- The effective start address of the buffer is the value in ARi with its nb LSBs at 0.
- The end address = start address + N-1.
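
The address rule can be checked with a few lines of Python. This is a sketch of the arithmetic only; the function names are illustrative and the addresses are arbitrary examples.

```python
def circular_buffer_bounds(ari, n):
    """Return (start, end) of the circular buffer of length n that contains ari."""
    nb = n.bit_length()                  # number of bits needed to write N
    start = ari & ~((1 << nb) - 1)       # ARi with its nb LSBs cleared
    return start, start + n - 1


def is_valid_start(addr, n):
    """True if addr has its nb LSBs at 0, as required for a buffer of length n."""
    return addr & ((1 << n.bit_length()) - 1) == 0


print([hex(a) for a in circular_buffer_bounds(0x0105, 32)])  # ['0x100', '0x11f']
```

For N=32 the mask covers 6 bits, so valid start addresses are multiples of 64; for N=30 it covers 5 bits, giving multiples of 32.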

Circular buffer on C54x

Implementation of FIR Filter with 2 Circular Buffers
- Same filter as in the preceding example, with N=32 and the coefficients in section .coef (in program memory) in file fircoef.asm.
- 2 buffers are allocated in data memory, for the coefficients and for the data of the filter. Their start addresses must be multiples of 64.
- First step of the program after initialization: transfer the coefficients from program memory to data memory, from adr_coef to adr_debut_coef.

Copy values from program to data memory In RPT mode pmad is automatically incremented.

Implementation of FIR with 2 Circular Buffers, Initializations

Implementation of FIR with 2 Circular Buffers, Program
See files fircirc.asm and fircoef.asm for the test.

Command File for Circular Buffer Addressing Constraint
The addresses adr_debut_dat and adr_debut_coef have to be aligned on a multiple of 64 in the example.
- adr_debut_dat is the start address of the uninitialized section buf_data.
- adr_debut_coef is the start address of the uninitialized section buf_coef.

To align the 2 sections on a multiple of 64, add align(64) after the name of each section in the SECTIONS directive of the command file, for example: buf_data align(64) > DATA page 1

Implementation of a Symmetrical FIR filter
The symmetry of the coefficients, b(n)=b(N-1-n), is used to decrease the computational load:
- A general FIR filter with N coefficients takes N cycles (in good conditions).
- A symmetrical FIR filter takes N/2 cycles.
- Use of the specific instruction FIRS.

FIRS Instruction to Work with RPT(Z)
FIRS Xmem, Ymem, pmad
- Xmem and Ymem correspond to x(n-i) and x(n-N+1+i). The coefficients are in program memory at pmad.
- 3 operands: 2 in data memory, 1 in program memory. Uses the 2 accumulators A and B, and both the ALU and the MAC units.

Operations of FIRS:
- pmad → PAR
- While RC ≠ 0: B = B + A(32:16) x Pmem addressed by PAR; A = (Xmem+Ymem)<<16; PAR=PAR+1; RC=RC-1

In words:
- Multiplies the 17 MSBs of A, where x(n-i)+x(n-N+1+i) is stored, by a value in program memory (the coefficient) and adds the result to B.
- Then adds Xmem (a new value of x(n-i)) to Ymem (a new value of x(n-N+1+i)) and stores the result in the high part of accumulator A.
- Increments the program memory address to point to a new coefficient.
- Decrements the repetition counter.
- 1 cycle in repeated mode when the pipeline is filled.
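
The folding that FIRS exploits can be sketched in Python: each coefficient is multiplied once by the sum of the two samples that share it. This illustrates the arithmetic only, not the accumulator or pipeline behaviour; `symmetric_fir_output` is an illustrative name and N is assumed even.

```python
def symmetric_fir_output(h_half, x_recent):
    """One output of a symmetrical FIR with N = 2*len(h_half) coefficients.

    h_half holds the first N/2 coefficients; x_recent[i] = x(n-i) for i = 0..N-1.
    """
    n = 2 * len(h_half)
    acc = 0
    for i, coeff in enumerate(h_half):
        acc += coeff * (x_recent[i] + x_recent[n - 1 - i])   # add first, then 1 multiply
    return acc


h = [1, 2, 2, 1]                          # symmetrical coefficients
x_recent = [4, 3, 2, 1]
direct = sum(hi * xi for hi, xi in zip(h, x_recent))
print(symmetric_fir_output(h[:2], x_recent), direct)   # 15 15
```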

Using FIRS for a Symmetrical FIR Filter
3 arrays: N/2 first coefficients, N/2 newest data and N/2 oldest data.

Using FIRS for a Symmetrical FIR Filter
BK = N/2 At the beginning AR2 and AR3 point to: the newest data x(n) and the oldest data x(n-N+1)

Using FIRS for a Symmetrical FIR Filter
- FIRS is repeated N/2 times. The first sum x(n)+x(n-N+1) is done before entering the loop.
- N/2 iterations (AR2 and AR3 incremented by 1): at the first iteration, AR2 points to x(n-1) and AR3 to x(n-N+2).
- After N/2 iterations: AR2 is decremented by 2 and AR3 by 1. The oldest sample x(n-N/2+1) of the 1st buffer is stored in the 2nd buffer in place of x(n-N+1). Then AR is incremented by 1.
- The new sample x(n+1) is stored in place of x(n).

Symmetrical FIR Implementation with FIRS, Initializations

Symmetrical FIR Implementation using FIRS, Program
See files firsym.asm and fircoef.asm for the test.

Tutorial
The listing files for the present examples can be found in directory tutorial: Tutorial > Dsk5416 > Chapter 14 > Labs_fir

Implementation of FIR Filters on C55x
Implementation of block filters Implementation of symmetrical or asymmetrical FIR filters

Implementation of FIR Filters using C55x
2 MAC units accessed using the 3 data buses D, B and C make it possible to:
- Calculate 2 output samples y at a time using the same set of coefficients and different data x.
- Calculate 2 output samples y at a time using the same input data x but 2 sets of coefficients.

Using the 2 MAC Units Use of block filtering in order to calculate 2 output samples at a time.

Block Filter
Calculate a block of M output samples:
- Avoids interrupts sample by sample.
- Allows calculation of 2 samples at a time.
- M+N-1 inputs are necessary to calculate M output samples, because of the N-1 initial conditions.
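
The input/output count of a block filter can be illustrated with a short Python sketch. It assumes the input block already starts with the N-1 history samples; `block_fir` is an illustrative name.

```python
def block_fir(h, x_block):
    """Return the M = len(x_block) - N + 1 outputs computable from a block of inputs."""
    n = len(h)
    m = len(x_block) - n + 1
    if m < 1:
        raise ValueError("need at least N input samples")
    # Output j uses the N samples x_block[j] .. x_block[j+N-1].
    return [sum(h[i] * x_block[j + n - 1 - i] for i in range(n)) for j in range(m)]


# N = 4 coefficients and M + N - 1 = 6 inputs give M = 3 outputs.
print(block_fir([1, 1, 1, 1], [1, 2, 3, 4, 5, 6]))   # [10, 14, 18]
```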

Block Filter, example N=4, M=3

Block Filter Example
Double loop: on the coefficients and on m.
Coefficients accessed by CDP:
- CDP (Cmem) modifications are limited to: *CDP, *CDP+, *CDP-, *(CDP+T0).
- CDP uses the B bus only for dual-MAC. Because the B bus is internal only, the coefficients must also be in internal memory.
- Place data operands carefully to avoid memory conflicts (SARAM/DARAM).

Using Dual MAC
The products bi xn-i and bi xn-i-1 are done in parallel. As they use the same coefficient bi, 3 buses are sufficient: the common coefficient and the 2 data samples are read in parallel. The register CDP points to the coefficient bi and the 2 ARi (here AR2 and AR3) point to the data xn-i and xn-i-1.
The 2 products are done at the same time in the 2 MAC units and are accumulated in the separate accumulators AC0 and AC1. In the example instruction, the 2 elementary MAC instructions executed in parallel are joined by the :: symbol.
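
The sharing of one coefficient between two accumulations can be sketched in Python. The sketch is sequential and only models the data flow: the names ac0 and ac1 echo the accumulators above, `block_fir_dual_mac` is an illustrative name, and the number of outputs M is assumed to be even.

```python
def block_fir_dual_mac(b, x):
    """Compute the block outputs two at a time, one coefficient fetch per pair of products."""
    n = len(b)
    m = len(x) - n + 1                       # number of outputs, assumed even here
    y = []
    for j in range(0, m - 1, 2):             # one pass per pair of outputs
        ac0 = ac1 = 0
        for i in range(n):
            coeff = b[i]                     # single coefficient read (role of CDP)
            ac0 += coeff * x[j + n - 1 - i]  # first MAC unit
            ac1 += coeff * x[j + n - i]      # second MAC unit, next data sample
        y += [ac0, ac1]
    return y


print(block_fir_dual_mac([1, 2], [1, 2, 3, 4, 5]))   # [4, 7, 10, 13]
```

Each inner iteration reads 3 values (1 coefficient and 2 data samples), which matches the 3 buses mentioned above.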

Initialization of Pointers
- Use AMOV to do transfers during the “AD” pipeline phase.
- Init AR2 to point to the 1st value of the input data (x).
- Init AR3 to point to the 2nd value of the input data (x+1).
- Init CDP to point to the coefficient array (a).

Inner Loop on Coefficients
Pointers at the end of the repeat instruction: Reinitialization of pointers for next output sample:

- Initialize the size of the circular buffer: BK
- Set up the buffer start address: BSA and Xeven
- Set up ARi or CDP
- No memory alignment constraint

The even XARn (i.e. 0,2,4,6) determines the 64K Page

Selecting Circular or Linear Addressing Mode
Use the LSB of Status word ST2_55 Set or reset status bits: If BK=0 or BK=FFFF, linear = circular.

Circular Buffer Exercise
Use AR4 as a circular pointer to x{5}: Results are cumulative

Circular Buffer for Coefficients
Table of coefficients b0 … b3: circular buffer addressed by CDP.
- Initialize XCDP: 7 MSBs.
- Initialize CDP to 0: offset in the buffer.
- Set up CDP in circular addressing mode.

Store Results, 32-bit Moves
- Assuming fractional mode, the 2 results are in the high parts of AC0 and AC1.
- AC0 and AC1 can be saved separately:
- AC0 and AC1 can be saved at the same time:
- Accumulator pairs: (AC0,AC1), (AC2,AC3).
- ARi is incremented by 2.
- y must be even aligned.

Block Filter Inner Loop

Outer Loop Using RPTB or RPTBlocal
Use the RPTB (Repeat Block) instruction. We must specify:
- The start address of the block: the next instruction.
- The end address: a label specifies the last instruction.
- The number of repetitions: BRC0, the loop counter, is initialized with count-1 (minimum count = 2).

RPTBlocal executes from the IBU: 56 bytes maximum (if > 56 bytes, use RPTB). It reduces power consumption.

The RPTBLOCAL limitation is 56 bytes, but the IBU can contain up to 64 bytes. Where did the other 8 go? Instruction/IBU alignment. The fetch hardware always accesses an aligned 4-byte packet. So, if the first instruction in the RPTBLOCAL is the 4th byte of a packet, you lose 3 bytes. Also, if the last instruction is the first byte of a 4-byte packet, you lose another 3. The other two are taken by the RPTBLOCAL instruction itself, for a total of 8 bytes. The restriction of 56 bytes is not a hard number: it is actually 56 bytes plus or minus 3 bytes, depending on what is inside the loop.

Another key point is that the RPTBLOCAL and RPTB instructions use the address of the last instruction in the block, regardless of how many bytes that instruction is. This is slightly different from the 54x, where the “end-1” concept was used.

Outer Loop on m: Calculate M yn-m
Please note that it looks like we are loading N into BRC0 when it should be N minus 1. However, remember how the #outputs is determined: samples-taps+1. So, samps-taps+1 (-1) = samps-taps. And the value is divided by 2 because we calculate 2 samples at a time.
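
The loop count arithmetic described above can be written out as a small Python sketch. The function name `brc0_for_block_fir` is illustrative, and the rounding up for an odd number of outputs is an assumption of this sketch.

```python
def brc0_for_block_fir(num_samples, num_taps):
    """Value to load in BRC0 when each pass of the outer loop produces 2 outputs."""
    num_outputs = num_samples - num_taps + 1
    num_pairs = (num_outputs + 1) // 2       # 2 outputs per pass, rounded up
    return num_pairs - 1                     # BRC0 holds the repeat count minus 1


# 200 input samples and 16 taps: same result as (samples - taps) / 2.
print(brc0_for_block_fir(200, 16), (200 - 16) // 2)   # 92 92
```

When the number of outputs is odd, as in this example (185), the last pass of the loop still computes two values.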

More Nested loops ? Nesting RPTB or RPTBlocal:
- 2 levels are supported, using BRC0 (outer) and BRC1/BRS1 (inner).
- No saving of registers is required for nested block repeats.

BRC0 is always used for the outer loop and BRC1/BRS1 for the inner loop. BRC1 is loaded once, outside both loops. When BRC1 is initialized, BRS1 is automatically loaded with the same value. Then, each time the inner repeat instruction is executed, BRC1 is reloaded with the contents of BRS1. BRC1 decrements normally inside the inner loop. This allows you to load both repeat counters outside the loops. The inner and outer loops have their own start and end addresses (RSA0/1, REA0/1), so no saving of registers is required.

Block Repeat vs. Local Repeat: RPTBLOCAL and RPTB can be used interchangeably, so you can use two locals, two blocks, or a mix. Typically, if they are mixed, a RPTB is used as the outer loop and a RPTBLOCAL as the inner loop, because RPTBLOCAL loops execute out of the IBU. However, if the total size of both blocks is still less than 56 bytes, you can use two RPTBLOCALs.

Laboratory on Block Filter
Implement a block FIR with 16 coefficients and input block size = 200. Implement subroutine

Using the Stack and Subroutines
Subroutines require call and ret. During a call, the return address is stored on the stack, addressed by SP. Let us call fir the subroutine: call fir

Initialize the Stack
- Declare an uninitialized section (.usect) of appropriate length to reserve space.
- Initialize the stack pointer to point to the top of stack +1.
- Recommendation: place the stack in internal memory and align it on a 4-byte boundary (ALIGN= specifies bytes).

The System Stack SSP
- When a call occurs, PC[15:0] is pushed on the stack.
- The upper 8 bits PC[23:16] are pushed on the system stack, accessed by SSP (System Stack Pointer).
- CFCT is used to store the active loop context.
- XSP and XSSP share the same upper 7 bits.
- Place SP and SSP with care to avoid dual-access delays.

Regarding the potential delays: they can occur if SP and SSP end up in the same block of SARAM. Remember, the values are written to the SP and SSP stacks at the same time: two writes (and then two reads later on). Given that SP and SSP have to be on the same 64 Kword data page (because they share the same upper 7 bits), it is "easier" to put them near each other.

CFCT: there is a bit that indicates that "repeat single is active". This allows a repeat single to be interruptible.

The stack modes will be clarified in the module on interrupts, where the reset vector is covered. The default stack mode is the 32-bit mode, where RETA is not used. So, when the students do their labs, they won't know the difference. However, in the interrupts lab, they will investigate the different stack modes and the material will discuss exactly what happens in each mode.

Data Types
- Byte: 8 bits
- Word: 16 bits
- Long: 32 bits

Long access assumes that the address points to the MSW; the LSW is read from the same address with its LSB toggled:
- Ptr=100h: MSW=100h, LSW=101h
- Ptr=101h: MSW=101h, LSW=100h

To ensure proper alignment:
- Constants (int, long) are automatically aligned on type boundaries.
- Variables: for 16 bits, no problem; for 32 bits, use the even-align flag: .usect “vars”,Nwords,,1
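
The address rule for long accesses amounts to flipping the least significant bit of the pointer, which a short Python sketch makes explicit. The function name `long_word_addresses` is illustrative.

```python
def long_word_addresses(ptr):
    """Return (MSW address, LSW address) for a 32-bit access at word address ptr."""
    return ptr, ptr ^ 1                  # LSW: same address with the LSB toggled


for ptr in (0x100, 0x101):
    msw, lsw = long_word_addresses(ptr)
    print(hex(ptr), hex(msw), hex(lsw))  # 0x100 0x100 0x101, then 0x101 0x101 0x100
```

With an even pointer the MSW sits at the lower address, which is why 32-bit variables are declared with the even-align flag.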

Solution: Declarations
            .sect "indata"
    x0      .copy in7.dat
            .def start
            .cpl_off
            .arms_off
            .c54cm_off
    stklen  .set 100
    a0      .usect "coeffs",16,1,1
    y0      .usect "results",200,1,1
    BOS     .usect "STK",stklen,1,1
    BOSS    .usect "SSTK",stklen,1,1
            .sect "init"
    table   .int 7FCh, 7FDh, 7FEh, 7FFh
            .int 800h, 801h, 802h, 803h
            .int 803h, 802h, 801h, 800h
            .int 7FFh, 7FEh, 7FDh, 7FCh

Solution: Code

            .sect "code"
            .DP a0
    start:  AMOV #BOS+stklen,XSP     ;set up Stack +
            MOV #BOSS+stklen,SSP     ;System Stack Ptrs
            CALL copy                ;copy coeffs
            BSET FRCT                ;turn on mult. shift
            BSET M40                 ;turn on 40 bit math
            BSET SXMD                ;turn on sign exten.
            CALL fir                 ;perform fir
            nop
    here:   B here                   ;stop

Solution: Subroutine copy
    copy:   AMOV #table,XAR2             ;load pointers
            AMOV #a0,XAR3
            RPT #7
            MOV dbl(*AR2+),dbl(*AR3+)    ;move from table to a0
            RET

Solution: Subroutine fir
    fir:    MOV #92,BRC0                 ;block repeat count
            AMOV #x0,XAR2                ;initialize pointers
            AMOV #x0+1,XAR3              ;for data,
            AMOV #y0,XAR4                ;results
            AMOV #a0,XCDP                ;and coefficients
            MOV #a0,BSAC                 ;buffer start address
            MOV #16,BKC                  ;buffer size
            MOV #0,CDP                   ;index
            BSET CDPLC                   ;turn on circ adr CDP
            RPTBlocal end
            MPYM *AR2+,*CDP,AC0          ;AC0 1st product
            MPYM *AR3+,*CDP+,AC1         ;AC1 gets 2nd prd
            RPT #14
            MAC *AR2+,*CDP+,AC0          ;form results
            :: MAC *AR3+,*CDP+,AC1
            MOV pair(hi(AC0)),dbl(*AR4+) ;store AC0/AC1
            ASUB #14,AR2                 ;wrap data pointers
    end     ASUB #14,AR3                 ;next calculation
            RET

Implementation of Symmetrical and Anti-symmetrical FIR filters on ‘C55x
These filters may be “folded” and performed with N adds and N/2 MACs Filters need to be designed as even length

- FIRSADD Xmem,Ymem,coef,Acx,Acy: Acy = Acy + (Acx x (*CDP)) || Acx = Xmem + Ymem. For symmetrical FIR.
- FIRSSUB Xmem,Ymem,coef,Acx,Acy: Acy = Acy + (Acx x (*CDP)) || Acx = Xmem - Ymem. For anti-symmetrical FIR.
- If performing a block FIR, dual MAC has better performance than FIRS.

A design consideration for migration from ‘C54x: on the 54x, FIRS was faster, but on the 55x the dual-MAC basically obsoletes the FIRS instruction for block FIRs. For migration from the 54x (which has the same benchmark, 2 taps/cycle), the advantage of FIRSADD is lower power.

Comparison of C54x and C55x
- 2 MACs in C55x versus 1 in C54x: well suited for block filtering, with 2 taps per cycle instead of 1 (for large N).
- Circular addressing modes: 3 BK registers in C55x instead of 1 in C54x, which allows several simultaneous circular buffers with different sizes.
- In C54x, the circular addressing mode is specified in the instructions by the indirect addressing type %. In C55x, the mode (linear or circular) is set for each register in the status register ST2_55.
- No memory alignment constraint in C55x.

Comparison of C54x and C55x, Symmetrical and Anti-symmetrical FIR Filters
- In C54x, instruction FIRS: allows 2 taps/cycle for a symmetrical FIR.
- In C55x, instructions FIRSADD and FIRSSUB: allow us to efficiently implement symmetrical and anti-symmetrical FIRs. Despite the 2 MACs, as there is only 1 ALU, this is again 2 taps/cycle for symmetrical or anti-symmetrical FIRs.

Follow On Activities on 5416 DSK
- Laboratory 3 for TMS320C5416 DSK: to determine by practical experiment the best FIR window functions for audio.
- Laboratory 4 for TMS320C5416 DSK: to determine by experiment how many FIR coefficients are required for acceptable audio quality.
- Application 4 for TMS320C5416 DSK: electronic crossover for a multiple loudspeaker system. Divides the audio signal into treble and bass at 16 different selectable frequencies using FIR filters.

Follow on activities on 5510 DSK
Application “delays and echo” for TMS320C5510 DSK Simulates delays in communications networks and reflection of sound heard in a canyon. Introduces circular buffers and the configuration used for a Finite Impulse Response (FIR) filter.