DSPs Need to Do…
Efficient repetitive numerical calculations
Maintain numeric fidelity
Provide high memory bandwidth
Streaming data
Real-time processing
DSPs Need to Minimize…
Real-time execution unpredictability
Memory use
Power consumption
Cost
Development time
What Do DSPs Have?
Specialized memory architecture (Harvard)
Specialized parallel execution units
Specialized addressing modes
Specialized instruction sets for parallel execution
Specialized peripherals
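FIR Filtering…
[Figure: FIR filter structure: input x passes through a chain of delay elements D; the taps, weighted by coefficients h0, h1, …, hn, are summed to form the output y]
DSP algorithms are the basis for most of the DSP special hardware units. An FIR filter needs:
Two data fetches
Multiply operation
Accumulate operation
Input vector shifting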
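The following Python sketch shows where each operation occurs. For a filter whose last coefficient is written hN (to avoid a clash with the time index n), each output is y[n] = h0·x[n] + h1·x[n-1] + … + hN·x[n-N]. The function name `fir_step` and the list-based delay line are illustrative assumptions, not a specific DSP's implementation. The comments mark the two data fetches, the multiply-accumulate, and the input vector shift.

```python
def fir_step(delay_line, coeffs, new_sample):
    """Compute one FIR output sample; updates delay_line in place.

    delay_line[k] holds x[n-k] (newest sample first) and must have the
    same length as coeffs.
    """
    # Input vector shifting: every stored sample moves one delay element along.
    for k in range(len(delay_line) - 1, 0, -1):
        delay_line[k] = delay_line[k - 1]
    delay_line[0] = new_sample

    acc = 0
    for k in range(len(coeffs)):
        h = coeffs[k]        # data fetch 1: coefficient hk
        x = delay_line[k]    # data fetch 2: sample x[n-k]
        acc += h * x         # multiply, then accumulate
    return acc


h = [0.25, 0.5, 0.25]        # example 3-tap smoothing filter
state = [0.0] * len(h)
print([fir_step(state, h, s) for s in [1.0, 0.0, 0.0, 0.0]])
# [0.25, 0.5, 0.25, 0.0]  (the impulse response equals the coefficients)
```

On a general-purpose processor, each line of the inner loop costs separate instructions. The DSP hardware units listed on the following slides exist to collapse that loop body into a single cycle.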
Multiply-Accumulate (MAC)
Multiplication in a single cycle
Execution time ~ 200 ns
[Figure: MAC unit built from a register, a multiplier, an ALU and an accumulator]
Special Hardware Units…
Hardware shifter
Hardware circular buffers
Special hardware for zero-overhead looping
Special address generation units
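A hardware circular buffer removes the input-vector shift. Instead of moving every stored sample, only a pointer moves, wrapping modulo the buffer length. The Python sketch below emulates this behavior. The class name `CircularFIR` is hypothetical, and the explicit `%` operations stand in for what the address hardware would do at no cost. On a DSP, a zero-overhead loop instruction would also run the tap loop without explicit counter and branch instructions. Python cannot show that, so the loop here is an ordinary `for` loop.

```python
class CircularFIR:
    """FIR filter whose delay line is a circular buffer (emulated in software)."""

    def __init__(self, coeffs):
        self.coeffs = list(coeffs)
        self.buf = [0] * len(self.coeffs)
        self.pos = 0  # index of the newest sample

    def step(self, new_sample):
        n = len(self.buf)
        # Move the write pointer back one slot, wrapping modulo n.
        # This overwrites the oldest sample, so nothing is shifted.
        self.pos = (self.pos - 1) % n
        self.buf[self.pos] = new_sample

        acc = 0
        idx = self.pos
        for h in self.coeffs:
            acc += h * self.buf[idx]  # buf[idx] holds x[n-k]
            idx = (idx + 1) % n       # post-increment with modulo wrap
        return acc


f = CircularFIR([1, 2, 3])
print([f.step(s) for s in [1, 0, 0, 0]])  # [1, 2, 3, 0]
```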
Address Generation Units…
Work in parallel with the DSP core execution unit.
Access new addresses without pausing to calculate them.
Exploit the predictable data access patterns of DSP algorithms through special addressing modes, e.g. register-indirect with post-increment, circular (modulo), and bit-reversed addressing, all computed in hardware.
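Bit-reversed addressing is mainly used by radix-2 FFTs, which read or write their data in bit-reversed index order. The Python sketch below computes that order in software, which is the work an address generation unit performs in hardware as it steps through the buffer. The function names are illustrative.

```python
def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of index (e.g. 0b001 -> 0b100 for bits=3)."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)  # shift the lowest bit of index in
        index >>= 1
    return result


def bit_reversed_order(n):
    """Indices 0..n-1 in bit-reversed order; n must be a power of two."""
    if n <= 0 or n & (n - 1):
        raise ValueError("n must be a power of two")
    bits = n.bit_length() - 1
    return [bit_reverse(i, bits) for i in range(n)]


print(bit_reversed_order(8))  # [0, 4, 2, 6, 1, 5, 3, 7]
```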
Von Neumann Architecture…
Fetch MAC instruction
Read value of ‘x’
Read value of ‘h’
Multiply x and h and accumulate
Write result to memory
Total: 4 memory access operations and one multiplication
[Figure: processor core connected by a single address bus and a single data bus to one memory holding both code and data]
Harvard Architecture…
Data and code in separate memory segments
Multiple address and data buses
Double memory bandwidth
Simultaneous code and data fetch
[Figure: processor core connected to Memory A through address bus AB1 and data bus DB1, and to Memory B through AB2 and DB2]
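The Python sketch below uses a deliberately simple model to show why the Von Neumann design is bus-bound for the four memory accesses of one tap. The model assumes each memory has its own address/data bus pair and serves one access per cycle, with all memories working in parallel. Under these assumptions the busiest memory sets the minimum number of memory cycles per tap. The Harvard placement shown, with the coefficients h stored in Memory A next to the code, is also an assumption. Some Harvard DSPs allow data in program memory, but the exact split varies by chip.

```python
from collections import Counter

# The four memory accesses one FIR tap needs in the MAC example.
ACCESSES = ("fetch MAC instruction", "read x", "read h", "write result")


def memory_cycles_per_tap(placement):
    """Lower bound on memory cycles per tap under the simple model above.

    placement maps each access to the name of the memory that serves it.
    """
    missing = set(ACCESSES) - set(placement)
    if missing:
        raise ValueError(f"no memory assigned for: {sorted(missing)}")
    per_memory = Counter(placement[a] for a in ACCESSES)
    return max(per_memory.values())  # the busiest memory sets the pace


# Von Neumann: code and data share one memory and one bus.
von_neumann = {a: "Memory(Code+Data)" for a in ACCESSES}

# Harvard: two memories (assumed split: code and h in A, x and result in B).
harvard = {
    "fetch MAC instruction": "Memory A",
    "read h": "Memory A",
    "read x": "Memory B",
    "write result": "Memory B",
}

print(memory_cycles_per_tap(von_neumann))  # 4
print(memory_cycles_per_tap(harvard))      # 2
```

Under this model, a placement that puts the instruction alone in one memory and all three data accesses in the other would need 3 cycles. How the accesses are distributed across the buses therefore matters as much as the number of buses.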
Caches in DSP and GPP…
GPPs normally contain two on-chip caches, one for data and one for instructions.
This allows full-speed retrieval of instructions and data without accessing slower off-chip memory.
DSPs typically contain a very small instruction cache and no data cache.
GPPs use control logic to decide which code and data go into the cache, while in DSPs this decision is the programmer's job.
Fixed-Point Arithmetic…
Most DSPs use fixed-point rather than floating-point arithmetic:
Faster
Cheaper
Hardware support for saturation arithmetic, rounding and shifting
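The Python sketch below emulates in software the behavior that saturation and rounding hardware provides. It assumes the common 16-bit Q15 format, in which a value q represents q / 32768. The slide does not name a format, and all function names here are illustrative. The key points are that overflow clamps to the largest or smallest representable value instead of wrapping around, and that the double-width product is rounded rather than truncated when it is shifted back to 16 bits.

```python
Q15_MAX = 0x7FFF    # largest Q15 value, just below +1.0
Q15_MIN = -0x8000   # -1.0 in Q15


def saturate16(value):
    """Clamp to the signed 16-bit range instead of wrapping around."""
    return max(Q15_MIN, min(Q15_MAX, value))


def q15_mul(a, b):
    """Multiply two Q15 values with rounding and saturation.

    The exact product is Q30; adding 2**14 before the arithmetic shift
    right by 15 rounds to nearest (ties toward +infinity).
    """
    product = a * b
    rounded = (product + (1 << 14)) >> 15
    return saturate16(rounded)


def q15_add(a, b):
    """Saturating Q15 addition."""
    return saturate16(a + b)


def to_q15(x):
    """Convert a float to the nearest Q15 value, saturating at the ends."""
    return saturate16(round(x * 32768))


print(q15_mul(to_q15(0.5), to_q15(0.5)))  # 8192, i.e. 0.25
print(q15_mul(Q15_MIN, Q15_MIN))          # 32767: (-1) * (-1) saturates below +1
print(q15_add(30000, 10000))              # 32767, not the wrapped -25536
```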
Special Instructions
Why special instructions?
Multiple operations per instruction cycle
Minimize program memory space
Specify several parallel operations in a single instruction
These instructions permit only restricted access to registers and do not allow arbitrary combinations of operations.
Special Instructions…
MAC X0,Y0,A  X:(R0)+,X0  Y:(R4)+N4,Y0
Multiply the contents of X0 and Y0
Add the result to accumulator A
Load register X0 from the X memory location pointed to by R0
Load register Y0 from the Y memory location pointed to by R4
Post-increment R0 by 1
Post-increment R4 by the contents of register N4
This instruction calculates one tap of the FIR filter in one instruction cycle.
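The instruction syntax appears to be that of the Motorola DSP56000 family. The Python sketch below models what that single instruction does, using a tiny hypothetical register file; the class and function names are not a real API. All operands are read before any register is written, so the multiply uses the X0 and Y0 values loaded by the previous instruction while the next pair is being fetched. For this reason, the helper `fir_tap_loop` preloads the first operands and handles the last tap without any further loads. `N4` defaults to 1 here purely for the example.

```python
from dataclasses import dataclass


@dataclass
class Dsp56kLikeState:
    """Hypothetical model of the registers and memories the instruction touches."""
    x_mem: list
    y_mem: list
    X0: int = 0
    Y0: int = 0
    A: int = 0
    R0: int = 0
    R4: int = 0
    N4: int = 1


def mac_with_parallel_moves(s):
    """Emulate MAC X0,Y0,A  X:(R0)+,X0  Y:(R4)+N4,Y0 on state s."""
    old_x0, old_y0 = s.X0, s.Y0
    new_x0 = s.x_mem[s.R0]      # X:(R0) -> X0
    new_y0 = s.y_mem[s.R4]      # Y:(R4) -> Y0
    s.A += old_x0 * old_y0      # MAC X0,Y0,A uses the old register values
    s.X0, s.Y0 = new_x0, new_y0
    s.R0 += 1                   # (R0)+   : post-increment by 1
    s.R4 += s.N4                # (R4)+N4 : post-increment by N4


def fir_tap_loop(x, h):
    """Dot product of x and h, one emulated MAC instruction per tap."""
    s = Dsp56kLikeState(x_mem=x, y_mem=h)
    s.X0, s.Y0 = x[0], h[0]     # preload the first operand pair
    s.R0, s.R4 = 1, 1
    for _ in range(len(x) - 1):
        mac_with_parallel_moves(s)
    s.A += s.X0 * s.Y0          # last tap: nothing left to load
    return s.A


print(fir_tap_loop([1, 2, 3, 4], [10, 20, 30, 40]))  # 300
```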
Execution Time Predictability…
Non-DSP applications are constrained by a maximum average response time (firm real time).
DSP applications are hard real time.
It is important to be able to calculate exactly the processing time required, or at least the worst-case execution time.
GPPs do not have good execution time predictability.
A lack of execution time predictability affects code optimization.
Execution Time Predictability…
GPPs use complicated algorithms for branch prediction and caching.
Speculative code execution depends on branch prediction.
The programmer does not know which instructions and data will go into the cache, or when.
The worst-case execution time may be an order of magnitude greater than the typical execution time.
Execution Time Predictability…
DSPs do not use branch prediction algorithms.
The programmer decides which instructions go into the cache.
Most DSPs have no data cache.
Other features of DSPs and GPPs
VLIW (Very Long Instruction Word): combines a number of different instructions in one long instruction word, e.g. a 256-bit word holding 8 instructions.
More MACs, ALUs and other execution units.
GPPs use SIMD.
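SIMD applies one operation to several data elements packed into a single register. The Python sketch below emulates this by packing two signed 16-bit lanes into a 32-bit word and performing a dual multiply-accumulate, which is similar in spirit to instructions such as ARM's SMLAD. The word layout (low lane in bits 0-15) and all function names are assumptions made for illustration.

```python
def pack2x16(lo, hi):
    """Pack two signed 16-bit values into one 32-bit word (lo in bits 0-15)."""
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)


def unpack2x16(word):
    """Split a 32-bit word back into its two signed 16-bit lanes (lo, hi)."""
    def to_signed(v):
        return v - 0x10000 if v & 0x8000 else v
    return to_signed(word & 0xFFFF), to_signed((word >> 16) & 0xFFFF)


def dual_mac(acc, xword, hword):
    """One SIMD-style step: multiply both lanes and add both products to acc."""
    x_lo, x_hi = unpack2x16(xword)
    h_lo, h_hi = unpack2x16(hword)
    return acc + x_lo * h_lo + x_hi * h_hi


def dot_simd(x, h):
    """Dot product processing two 16-bit elements per step (even length only)."""
    if len(x) != len(h) or len(x) % 2:
        raise ValueError("x and h must have the same even length")
    acc = 0
    for i in range(0, len(x), 2):
        acc = dual_mac(acc, pack2x16(x[i], x[i + 1]), pack2x16(h[i], h[i + 1]))
    return acc


print(dot_simd([1, 2, 3, 4], [10, 20, 30, 40]))  # 300, in two steps instead of four
```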