Common DSP Tasks… Modulation-demodulation, error correction; noise reduction, equalization, echo cancellation; audio compression; vector and matrix calculations; control algorithms.
DSPs Need to Do… Efficient repetitive numerical calculations; maintain numeric fidelity; provide high memory bandwidth; streaming data; real-time processing.
DSPs Need to Minimize… Real-time execution unpredictability; memory use; power consumption; cost; development time.
What Do DSPs Have? Specialized memory architecture (Harvard); specialized parallel execution units; specialized addressing modes; specialized instruction sets for parallel execution; specialized peripherals.
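FIR Filtering… [Figure: tapped delay line. Input x passes through a chain of unit delays D; the taps are weighted by coefficients h0, h1, …, hn and summed to form output y.] Each tap requires: 1. Two data fetches, 2. Multiply operation, 3. Accumulate operation, 4. Input vector shifting.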
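The four per-tap steps listed above can be sketched in plain C. This is a minimal illustration, assuming 16-bit samples and coefficients and a 64-bit accumulator; the name `fir_step` and its parameters are hypothetical and not taken from any DSP library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One output sample of an ntaps-tap FIR filter (direct form).
 * delay[0] holds the newest input sample, delay[ntaps-1] the oldest.
 * Names and data widths are assumptions made for illustration. */
int64_t fir_step(int16_t x, int16_t *delay, const int16_t *h, size_t ntaps)
{
    assert(ntaps > 0);

    /* Step 4: shift the input vector by one position, oldest sample drops out. */
    for (size_t k = ntaps - 1; k > 0; k--) {
        delay[k] = delay[k - 1];
    }
    delay[0] = x;

    int64_t acc = 0;
    for (size_t k = 0; k < ntaps; k++) {
        /* Steps 1-3: fetch delay[k] and h[k], multiply, accumulate. */
        acc += (int32_t)delay[k] * (int32_t)h[k];
    }
    return acc;
}
```

Feeding a unit impulse (1 followed by zeros) into a zeroed delay line returns the coefficients h[0], h[1], … one per call, which is a quick sanity check. On a general-purpose processor each pass of the inner loop costs several instructions; the hardware features on the following slides aim to collapse it into one.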
Multiply-Accumulate (MAC) Multiplication in a single cycle. Execution time ~ 200 ns. [Figure: MAC datapath with register, multiplier, ALU and accumulator blocks.]
Special Hardware Units… Hardware shifter. Hardware circular buffers. Special h/w for zero overhead looping. Special address generation units.
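To see what a hardware circular buffer saves, here is a software sketch of the same idea: the delay line is never shifted, only an index wraps around. The filter length `NTAPS`, the type `circ_delay_t` and the function name are assumptions chosen for the example. On a DSP the wrap-around would be done by the address generation hardware rather than by explicit code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NTAPS 4  /* hypothetical filter length */

typedef struct {
    int16_t buf[NTAPS];
    size_t  head;      /* index of the newest sample */
} circ_delay_t;

/* One FIR output sample using a circular delay line (no data shifting). */
int64_t fir_step_circular(circ_delay_t *d, int16_t x, const int16_t h[NTAPS])
{
    assert(d != NULL && d->head < NTAPS);

    /* Advance the write position with wrap-around; this overwrites
     * the oldest sample with the newest one. */
    d->head = (d->head + 1) % NTAPS;
    d->buf[d->head] = x;

    int64_t acc = 0;
    size_t idx = d->head;
    for (size_t k = 0; k < NTAPS; k++) {
        acc += (int32_t)d->buf[idx] * (int32_t)h[k];
        /* Step back one sample in time, wrapping at the start of the buffer. */
        idx = (idx == 0) ? NTAPS - 1 : idx - 1;
    }
    return acc;
}
```

Starting from a zero-initialized `circ_delay_t`, an impulse input again reproduces the coefficients in order. The modulo and the conditional wrap are the overhead that circular addressing hardware removes.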
Address Generation Units… Work in parallel with the DSP core execution unit. Generate new addresses without stalling the core to calculate them. Take advantage of the predictable data access patterns of DSP algorithms through special addressing modes, e.g. register-indirect with post-increment, circular (modulo) addressing, and bit-reversed addressing, all in hardware.
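Bit-reversed addressing is the least obvious of these modes; it is the access order needed by radix-2 FFT algorithms. The sketch below computes it in software, assuming a buffer of 2^bits elements; the function names are hypothetical. An address generation unit produces the same sequence with no instructions spent on it.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the low `bits` bits of index i. */
uint32_t bit_reverse(uint32_t i, unsigned bits)
{
    assert(bits <= 32);
    uint32_t r = 0;
    for (unsigned b = 0; b < bits; b++) {
        r = (r << 1) | (i & 1u);  /* move the lowest bit of i into r */
        i >>= 1;
    }
    return r;
}

/* Copy src into dst in bit-reversed order.
 * Both arrays are assumed to hold 1 << bits elements. */
void bit_reverse_copy(const int16_t *src, int16_t *dst, unsigned bits)
{
    assert(bits < 32);
    uint32_t n = 1u << bits;
    for (uint32_t i = 0; i < n; i++) {
        dst[bit_reverse(i, bits)] = src[i];
    }
}
```

For an 8-point buffer (bits = 3), index 1 (001) maps to 4 (100) and index 3 (011) maps to 6 (110).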
Von Neumann Architecture… [Figure: processor core connected to a single memory (code + data) by one address bus and one data bus.] 1. Fetch MAC instruction 2. Read value of ‘x’ 3. Read value of ‘h’ 4. Multiply x, h and accumulate 5. Write result to memory. Total: 4 memory access operations for one multiplication.
Harvard Architecture… [Figure: processor core connected to Memory A and Memory B through address buses AB1, AB2 and data buses DB1, DB2.] 1. Data and code in separate memory segments 2. Multiple address and data buses 3. Double memory bandwidth 4. Simultaneous code and data fetch
Caches in DSP and GPP… 1. GPPs normally contain two on-chip caches, one for data and the other for instructions. 2. This allows full-speed retrieval of instructions and data without accessing slower off-chip memory. 3. DSPs typically contain a very small instruction cache and no data cache. 4. GPPs use control logic to determine what code and data go into the cache, while in DSPs that decision is the programmer’s job.
Fixed-Point Arithmetic… Most DSPs use fixed-point rather than floating-point arithmetic. Faster. Cheaper. Hardware support for saturation arithmetic, rounding and shifting.
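As a rough illustration of what saturation, rounding and shifting do, the sketch below emulates them in C for the common Q15 format (16-bit signed values representing fractions in [-1, 1)). The choice of Q15 and all function names are assumptions for the example. A DSP performs these steps in hardware as part of the arithmetic instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp a 32-bit value into the 16-bit range instead of letting it wrap. */
int16_t sat16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Saturating Q15 addition. */
int16_t q15_add_sat(int16_t a, int16_t b)
{
    return sat16((int32_t)a + (int32_t)b);
}

/* Q15 x Q15 -> Q15 multiply with rounding.
 * The 32-bit product is in Q30; adding 1 << 14 rounds to nearest before
 * the shift back to Q15. Assumes >> on a negative int32_t is an
 * arithmetic shift, which holds on mainstream compilers. */
int16_t q15_mul_round(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * (int32_t)b;  /* Q30 product */
    p += 1 << 14;                         /* rounding constant */
    return sat16(p >> 15);                /* shift to Q15, then saturate */
}
```

With these helpers, 0.5 × 0.5 (16384 × 16384) yields 8192, i.e. 0.25, and the one overflowing case, -1 × -1, saturates to 32767 instead of wrapping to -32768.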
Special Instructions Why special instructions? Multiple operations per instruction cycle. Minimize program memory space. Specify several parallel operations in a single instruction. These instructions permit restricted access to registers and do not allow arbitrary operation combinations.
Special Instructions… MAC X0,Y0,A X:(R0)+,X0 Y:(R4)+N4,Y0 Multiply contents of X0 and Y0. Add result to accumulator A. Load register X0 from X memory location pointed to by R0. Load register Y0 from Y memory location pointed to by R4. Post-increment R0 by 1. Post-increment R4 by the contents of register N4. This instruction calculates one tap of the FIR filter in one instruction cycle.
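The effect of this single instruction can be modelled in C as below. It is a simplified integer model under stated assumptions: it ignores the fractional number format and the real register widths, and the struct `dsp_regs_t` and function name are hypothetical. The multiply is assumed to use the values X0 and Y0 held before the instruction, with the parallel moves loading the operands for the next tap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified register file; widths are illustrative only. */
typedef struct {
    int32_t x0, y0;   /* multiplier input registers */
    int64_t a;        /* accumulator A */
    size_t  r0, r4;   /* address registers, used here as array indices */
    size_t  n4;       /* offset register paired with R4 */
} dsp_regs_t;

/* Model of: MAC X0,Y0,A  X:(R0)+,X0  Y:(R4)+N4,Y0 */
void mac_with_parallel_moves(dsp_regs_t *r,
                             const int32_t *xmem, const int32_t *ymem)
{
    assert(r != NULL && xmem != NULL && ymem != NULL);

    r->a += (int64_t)r->x0 * r->y0;  /* multiply old X0, Y0; add to A   */
    r->x0 = xmem[r->r0];             /* load X0 from X memory at R0     */
    r->r0 += 1;                      /* post-increment R0 by 1          */
    r->y0 = ymem[r->r4];             /* load Y0 from Y memory at R4     */
    r->r4 += r->n4;                  /* post-increment R4 by N4         */
}
```

Five C statements, each of which would be at least one instruction on a simple processor, correspond to one DSP instruction; calling the function once per tap walks through the whole filter.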
Execution Time Predictability… Non-DSP applications usually only have to meet a maximum average response time (firm real time). DSP applications are hard real time. It is important to be able to calculate exactly the processing time required, or at least the worst-case time. GPPs do not have good execution time predictability. Lack of execution time predictability also hampers code optimization.
Execution Time Predictability… GPPs use complicated algorithms for branch prediction and caching. Code is executed speculatively depending on branch prediction. The programmer does not know which instructions and data will go into the cache, or when. Worst-case execution time may be an order of magnitude greater than the actual execution time.
Execution Time Predictability… DSPs do not use branch prediction algorithms. The programmer decides which instructions go into the cache. No data cache in most DSPs.
Other features of DSPs and GPPs VLIW (Very Long Instruction Word). Combines a number of different instructions in one long instruction word, e.g. a 256-bit word holding 8 instructions of 32 bits each. More MACs, ALUs and other execution units. GPPs use SIMD.
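One way to picture why extra MAC units help is to write the dot product so that two multiply-accumulates per iteration are independent of each other. The sketch below does this with two accumulators; the function name is hypothetical, and whether a given compiler actually schedules the two MACs in parallel on a VLIW or SIMD target is an assumption that depends on the toolchain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Dot product unrolled by two. acc0 and acc1 do not depend on each
 * other, so a machine with two MAC units could update both at once. */
int64_t dot_two_accumulators(const int16_t *x, const int16_t *h, size_t n)
{
    assert(n % 2 == 0);  /* kept simple: even length only */

    int64_t acc0 = 0, acc1 = 0;
    for (size_t k = 0; k < n; k += 2) {
        acc0 += (int32_t)x[k]     * (int32_t)h[k];      /* first MAC unit  */
        acc1 += (int32_t)x[k + 1] * (int32_t)h[k + 1];  /* second MAC unit */
    }
    return acc0 + acc1;  /* combine the partial sums */
}
```

With a single accumulator every MAC has to wait for the previous one; splitting the sum removes that dependency at the cost of one extra addition at the end.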