6/3/2015 ENCM515 Comparison of Integer and Floating Point DSP Processors M. Smith, Electrical and Computer Engineering, University of Calgary, Canada.


1 6/3/2015 ENCM515 Comparison of Integer and Floating Point DSP Processors M. Smith, Electrical and Computer Engineering, University of Calgary, Canada smithmr@ucalgary.ca

2 6/3/2015 ENCM515 -- Comparing Floating Point and Integer Processors Copyright smithmr@ucalgary.ca 2 / 58
Requirements for “perfect” DSP architecture
  Fast instruction cycle -- not clock speed
  Fast hardware multiplier
  Floating point for easier design -- avoids scaling and overflow
  High precision wide buses for registers, memory, processing units
  Fast loop operation

3 “Perfect” DSP architecture -- II
  Several data buses available to reduce memory bus conflict/transfer overhead
  Harvard architecture and/or instruction caches to avoid instruction and data-fetch clashes
  Duplicate resources for parallel computation
  Dedicated address calculation hardware
  Extensive temporary registers to avoid unnecessary fetches of continually used data
  Architecture allows easy parallel operation in multiprocessor systems -- NEW
  Cycle time adjustable by instruction -- UNCOMMON
  Duplicate resources for parallel computation of real and imaginary components -- UNCOMMON -- SIMD?

4 Integer DSP processors remain popular
  Around a long time, so much code already developed
  Many designs available
  Some complications
    Overflow with addition and multiplication operations -- 16 bit x 16 bit means 32 bit result where only certain portions are useful
    Overcome with Fractional Format
    Overcome with special architecture features

5 Consider 12 bit A/D
  Double-sided -- -15V to nearly +15V
    0x800 -- -15V -- negative full scale
    0xA00 -- -11.25V -- three quarter negative scale
    0xC00 -- -7.5V -- half negative full scale
    0xE00 -- -3.75V -- quarter negative full scale
    0x000 -- 0V
    0x200 -- 3.75V -- quarter positive full scale
    0x400 -- 7.5V -- half positive full scale
  Connect so that negative sign (bit 11) on A/D matches negative sign (bit 31) on 21061

6 Consider 12 bit A/D connected to 32 bit 21061
  Double-sided -- -15V to nearly +15V
    0x80000000 -- -15V -- negative full scale
    0xA0000000 -- -11.25V -- three quarters negative
    0xC0000000 -- -7.5V -- half negative full scale
    0xE0000000 -- -3.75V -- quarter negative
    0x00000000 -- 0V
    0x20000000 -- 3.75V -- quarter positive
    0x40000000 -- 7.5V -- half positive full scale
  Connected so that negative sign (bit 11) on A/D matches negative sign (bit 31) on 21061
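The sign alignment on slides 5 and 6 amounts to placing the 12-bit code in the top 12 bits of the 32-bit word. A minimal sketch of that mapping, assuming a plain left shift by 20 bits and a 15 V full scale; the function names are hypothetical and are not taken from the course material.

```python
FULL_SCALE_VOLTS = 15.0  # from the slides: -15 V to nearly +15 V

def adc12_to_word32(code12):
    """Place a 12-bit two's-complement A/D code in the top 12 bits of a 32-bit word."""
    return (code12 & 0xFFF) << 20

def word32_to_volts(word32):
    """Read the 32-bit word as a signed fraction of full scale and convert to volts."""
    signed = word32 - (1 << 32) if word32 & 0x80000000 else word32
    return FULL_SCALE_VOLTS * signed / 2**31

# Reproduce the table rows: A/D code, 32-bit word, voltage
for code in (0x800, 0xA00, 0xC00, 0xE00, 0x000, 0x200, 0x400):
    word = adc12_to_word32(code)
    print(f"0x{code:03X} -> 0x{word:08X} -> {word32_to_volts(word):+.2f} V")
```

Because bit 11 of the A/D lands on bit 31, the processor sees the sample as a signed fraction of full scale without any extra sign extension.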

7 Examples of integer problems
  SIMPLE SMOOTHING
    Let’s sum up a couple of values around -7.5V and calculate an average
    0xA1000000 + 0xA1000002 + …… Overflow
  VERY SIMPLE FIR FILTER
    Result = V1 * H1 + V2 * H2
    Let V1 = 0xA1000000 (32 bits)
    Let H1 = 0x8 (3 bits)
    Need 35 bits to keep result
    What do the 35 bits mean? -- need to scale
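The smoothing overflow can be reproduced by emulating a 32-bit two's-complement adder. This is only a sketch: it assumes both operands are full 32-bit words (0xA1000000 and 0xA1000002) and that the adder wraps silently; `to_signed32` is a helper introduced here for illustration.

```python
def to_signed32(x):
    """Interpret the low 32 bits of x as a two's-complement integer."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x

a = to_signed32(0xA1000000)   # large negative sample
b = to_signed32(0xA1000002)   # large negative sample

true_sum = a + b                  # needs 33 bits
wrapped = to_signed32(true_sum)   # what a 32-bit register would hold

print(f"true sum    = {true_sum}")
print(f"wrapped sum = {wrapped} (0x{wrapped & 0xFFFFFFFF:08X})")
```

Two negative inputs produce a positive 32-bit result, so any average computed from it has the wrong sign.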

8 Fractional values -- automatic handling of multiplication shifts
  Normally 0xf0000000 * 0xf0000000 would result in 64 bit values which would then need scaling
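A rough illustration of what "automatic handling of multiplication shifts" means, assuming the common Q31 interpretation (32-bit word = signed fraction in the range -1.0 to just under +1.0). Whether the 21061 hardware truncates or rounds is not stated on the slide; this sketch truncates, and the function names are hypothetical.

```python
def to_signed32(x):
    """Interpret the low 32 bits of x as a two's-complement integer."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x

def q31_to_float(word32):
    """Value of a 32-bit word read as a signed fraction of full scale."""
    return to_signed32(word32) / 2**31

def q31_multiply(a32, b32):
    """Multiply two Q31 fractions and return the Q31 result (truncating)."""
    product = to_signed32(a32) * to_signed32(b32)   # up to 64 bits, Q62 format
    # Shifting right by 31 rescales Q62 back to Q31.
    # Note: -1 * -1 is the one product that does not fit in Q31.
    return (product >> 31) & 0xFFFFFFFF

r = q31_multiply(0xF0000000, 0xF0000000)
print(f"{q31_to_float(0xF0000000)} * {q31_to_float(0xF0000000)} = "
      f"{q31_to_float(r)} (0x{r:08X})")
```

In fractional mode the rescaling shift is folded into the multiply, so the programmer never handles the raw 64-bit product.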

9 Fractional values -- Not all problems removed -- Overflow
  Understand “fractional” as “fractional full scale”
  Okay when multiply (R7) but look at R6 = -1 + -1

10 This is the standard overflow
  -1 = 0x80000000 (32 bits)
  -1 + -1 =
      0x 80000000
    + 0x 80000000
    -----------------
     0x100000000 (33 bits)
  Can expect to overflow in the middle of integer FIR filter, although final result should be in range -1.0 to +1.0 if filter gain is less than 1.
  Must handle intermediate results overflowing

11 MR registers -- 80 bits wide
  MR2 | MR1 | MR0 -- bits 79-64 | 63-32 | 31-0
  MR1 -- acts just the same as an R register in fractional mode
  MR2 -- OVERFLOW -- looks after the problems of -1 * -1
  MR0 -- UNDERFLOW -- looks after problems of -1 / 65000
  Works until values have to be taken out of MR -- okay in FIR (important stuff in MR1)

12 Set the MR register to 0

13 Now subtract ( -1 * -1 ) from MR -- MR = MR - ( -1 * -1 )
  MR2 -- extra sign bits?
  MR1 -- looks like R0

14 Subtract another ( -1 * -1 ) (get -2 as 80 bits fractional)
  MR2 -- extra sign bits?
  MR1 -- looks like R6
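The MR walk-through on slides 11 to 14 can be mimicked with an 80-bit accumulator held in a Python integer. This is a sketch under stated assumptions: the fields are taken as MR2 = bits 79-64, MR1 = bits 63-32, MR0 = bits 31-0, and a fractional product is aligned so that 1.0 corresponds to 2^63, which is the alignment that makes MR1 read like an ordinary fractional R register. All names are hypothetical.

```python
MR_BITS = 80
MR_MASK = (1 << MR_BITS) - 1
MINUS_ONE = 0x80000000  # fractional -1.0 as a 32-bit word

def to_signed32(x):
    """Interpret the low 32 bits of x as a two's-complement integer."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x

def frac_product(a32, b32):
    """Q31 x Q31 product aligned so that 1.0 == 2**63 (assumed alignment)."""
    return (to_signed32(a32) * to_signed32(b32)) << 1

def split_mr(mr):
    """Split an 80-bit accumulator value into its (MR2, MR1, MR0) fields."""
    mr &= MR_MASK
    return (mr >> 64) & 0xFFFF, (mr >> 32) & 0xFFFFFFFF, mr & 0xFFFFFFFF

mr = 0                                   # slide 12: clear MR
steps = [split_mr(mr)]
for _ in range(2):                       # slides 13 and 14
    mr -= frac_product(MINUS_ONE, MINUS_ONE)
    steps.append(split_mr(mr))

for mr2, mr1, mr0 in steps:
    print(f"MR2=0x{mr2:04X}  MR1=0x{mr1:08X}  MR0=0x{mr0:08X}")
```

After the first subtraction MR1 holds 0x80000000 (the pattern for -1), and after the second it holds 0x00000000 (the same pattern the overflowed sum -1 + -1 leaves in a 32-bit register), while MR2 keeps the sign bits that distinguish -2 from 0.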

15 Need to look at a variety of processors
  TI32010 -- Very early integer DSP
  TI32C240 -- Later integer DSP
  Motorola 56000 -- Popular integer DSP
  AMD 29050 series (RISC with some DSP)
  Analog SHARC 2106X
  TI C6701 -- VLIW
  Analog TigerSHARC -- VLIW

16 TI32010 Block Diagram

17 TI32010 -- Details

18 More advanced TMS320C4X

19 TI240 -- Block Diagram

20 TI C2XXX Block Diagram 1

21 TI C2XXX -- Block Diagram 2

22 Motorola 56000 Core

23 Motorola 56300 Integer Processor

24 Problems with Integer Implementations
  Use 8-bit examples for simplicity -- 16 bit will have same problem
  8 bit A/D for real time operations
  8 bit processor
  Average 4 values
    1, 2, 3, 4 -- sum = 10 fits in 8 bits -- answer will be correct = 2 (2.5 truncated)
    127, 2, 3, 4 -- sum = 136 wraps to -120 in 8 bits -- answer incorrect = -30
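A sketch of the 8-bit averaging problem, assuming a two's-complement accumulator that wraps silently and an average formed by dividing the sum by the number of values. Under those assumptions the first set averages to 2 and the second to -30; `to_int8` and `average_int8` are names invented for this illustration.

```python
def to_int8(x):
    """Interpret the low 8 bits of x as a two's-complement integer."""
    x &= 0xFF
    return x - 256 if x & 0x80 else x

def average_int8(values):
    """Average with an 8-bit accumulator that wraps on overflow."""
    total = 0
    for v in values:
        total = to_int8(total + v)    # each partial sum is cut back to 8 bits
    return total // len(values)       # floor division, like an arithmetic shift for 4 values

print(average_int8([1, 2, 3, 4]))      # sum 10 fits in 8 bits
print(average_int8([127, 2, 3, 4]))    # sum 136 wraps to -120
```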

25 Solution -- Scaling
  Must prescale all incoming numbers by a value that guarantees that no overflow occurs. Do the processing, then rescale
  Add 2 numbers -- ASR 1 -- scale by 2
  Add 4 numbers -- ASR 2 -- scale by 4
  Average 4 values
    1, 2, 3, 4 -- scaled by 4 -- 0, 0, 0, 1 -- average = 0 -- accurate answer to 2 bits
    127, 2, 3, 4 -- scaled by 4 -- 31, 0, 0, 1 -- answer = 32 -- accurate answer to 2 bits
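One reading of the prescale, process, rescale sequence that reproduces the slide's answers of 0 and 32 is sketched below. It assumes the "process" step is a divide-by-4 average of the prescaled values and that the number of values equals 2**shift; the function name is hypothetical.

```python
def prescale_average_rescale(values, shift=2):
    """Prescale by ASR `shift`, average, then rescale by the same shift.

    Assumes len(values) == 2**shift, so the same shift serves as the
    divide in the average.
    """
    scaled = [v >> shift for v in values]    # prescale: arithmetic shift right
    total = sum(scaled)
    assert -128 <= total <= 127              # the sum now fits in 8 bits
    average = total >> shift                 # process: divide by 4
    return scaled, average << shift          # rescale back to input units

for data in ([1, 2, 3, 4], [127, 2, 3, 4]):
    scaled, answer = prescale_average_rescale(data)
    print(data, "->", scaled, "->", answer, "(true average", sum(data) / 4, ")")
```

Overflow is gone, but the bottom two bits of every sample are discarded before the sum, which is the precision loss the slide refers to.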

26 Guard Bits -- above and below
  Need to do 8 bit algorithm in 16 bit processor
  Use 4 guard bits below and 4 above
  Still need to prescale, but not by as much
  Example
    Adding 4 numbers -- no prescale
    Adding 16 numbers -- no prescale?
    Adding 32 numbers -- prescale

27 Example of Guard Bits
  Store with guard bits
    127 -- 0x7F -- stored as 0x07F0
    2 -- 0x02 -- stored as 0x0020
    3 -- 0x03 -- stored as 0x0030
    4 -- 0x04 -- stored as 0x0040
  Sum stored as 0x0880
  Average stored as 0x0220 = 34
  FIR type sum may involve 128 terms
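The guard-bit example can be checked with a short sketch, assuming non-negative 8-bit samples stored in a 16-bit word with 4 guard bits below (fraction) and 4 above (headroom). The helper names are made up for this illustration.

```python
GUARD_BELOW = 4   # low-order guard bits keep fractional precision

def store_with_guard(value8):
    """Store an 8-bit sample in a 16-bit word, shifted up past the low guard bits."""
    return (value8 << GUARD_BELOW) & 0xFFFF

def guarded_average4(values):
    """Sum four guarded samples in 16 bits, then divide by 4 with a shift."""
    stored = [store_with_guard(v) for v in values]
    total = sum(stored) & 0xFFFF      # the upper guard bits absorb the carry
    average = total >> 2
    return stored, total, average

stored, total, average = guarded_average4([127, 2, 3, 4])
print([f"0x{s:04X}" for s in stored], f"sum=0x{total:04X}", f"avg=0x{average:04X}")
print("average value:", average / 2**GUARD_BELOW)

# The low guard bits also keep the fraction that plain integers lose
_, _, small_average = guarded_average4([1, 2, 3, 4])
print("average value:", small_average / 2**GUARD_BELOW)
```

Four guard bits above give room for 16 full-scale terms; a 128-term FIR sum would need 7, which is why some prescaling may still be required.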

28 Reference Source
  Following diagrams from Digital Signal Processing: Principles, Algorithms and Applications -- 2nd edition, Proakis and Manolakis, Macmillan Publishing, 1992

29 Quantization Error
  Suppose you want to develop band-pass or low-pass IIR filter

30 Two pole IIR filter

31 Allowable pole-positions
  OK band-pass
  BAD low-pass

32 Coupled form IIR filter

33 Allowed pole positions
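The diagrams for slides 30 to 33 are not reproduced here, but the effect they illustrate can be sketched numerically. The code below assumes the textbook two-pole direct form 1 / (1 - c1 z^-1 + c2 z^-2) with c1 = 2 r cos(theta) and c2 = r^2, and a coupled form whose coefficients are r cos(theta) and r sin(theta). The quantization step of 1/8 and the two target poles are illustrative choices, not values taken from the slides.

```python
import math

STEP = 1 / 8   # coefficient quantization step (illustrative: 3 fractional bits)

def direct_form_poles(step=STEP):
    """Upper-half-plane poles reachable when c1 and c2 lie on the coefficient grid."""
    poles = []
    n = round(1 / step)
    for k in range(-2 * n, 2 * n + 1):        # c1 = 2 r cos(theta) in [-2, 2]
        for m in range(1, n + 1):             # c2 = r^2 in (0, 1]
            x = k * step / 2                  # real part of the pole
            y_squared = m * step - x * x
            if y_squared >= 0:                # keep complex-conjugate (or double) poles
                poles.append(complex(x, math.sqrt(y_squared)))
    return poles

def coupled_form_poles(step=STEP):
    """Poles reachable when r cos(theta) and r sin(theta) are quantized directly."""
    n = round(1 / step)
    return [complex(i * step, j * step)
            for i in range(-n, n + 1) for j in range(0, n + 1)
            if (i * step) ** 2 + (j * step) ** 2 <= 1]

def nearest_error(poles, radius, angle_degrees):
    """Distance from the wanted pole to the closest realizable pole."""
    angle = math.radians(angle_degrees)
    target = complex(radius * math.cos(angle), radius * math.sin(angle))
    return min(abs(p - target) for p in poles)

direct, coupled = direct_form_poles(), coupled_form_poles()
for name, angle in (("low-pass (pole near z = 1)", 10), ("band-pass (pole near z = j)", 90)):
    print(f"{name}: direct {nearest_error(direct, 0.9, angle):.3f}, "
          f"coupled {nearest_error(coupled, 0.9, angle):.3f}")
```

In the direct form the realizable poles thin out near the real axis, so a low-pass pole is placed much less accurately than a band-pass pole, whereas the coupled form yields a uniform grid.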

34 Floating Point Chips
  Only scale as necessary
  Scale automatically
  Many other advantages
  Many formats of floats
    Some are high precision and slow
    Some are low precision and fast
    Some are as high precision as possible given the speed
    Round up, round down etc

35 Floating point formats on 21K
  Three kinds available
  IEEE Single Precision -- normal operations -- 32-bit format
  Also extended 40-bit format
  Short Word Floating Point -- special 21K feature -- 16-bit format
    Used to create IIR delay lines as use less memory
    Special memory location for storage
    Special instructions -- Fpack and Funpack

36 What are the allowed numbers?
  32 bit integer
    Minimum value is -2^31
    Maximum value is +2^31 - 1
    Smallest nonzero magnitude is 1
    Granularity of 1
  32 bit floating point
    Maximum value is just under +2^+128
    Minimum value is just above -2^+128
    Smallest normalized magnitude is 2^-126
    Granularity -- changes -- fine for small numbers, coarse for large
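The changing granularity of 32-bit floats can be measured directly. This sketch assumes the standard IEEE single-precision layout and uses Python's `struct` module to step to the next representable value; `float32_spacing` is a helper written for this illustration and only handles positive finite inputs.

```python
import struct

def float32_spacing(x):
    """Gap between x and the next larger 32-bit float (x positive and finite)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    here = struct.unpack(">f", struct.pack(">I", bits))[0]
    next_up = struct.unpack(">f", struct.pack(">I", bits + 1))[0]
    return next_up - here

print("32-bit integer spacing is always 1")
for x in (1.0, 1024.0, 2.0**30):
    print(f"float spacing near {x:g} is {float32_spacing(x):g}")
```

Near 1.0 the float grid is far finer than the integer grid, while near 2^30 adjacent floats are 128 apart and the 32-bit integer is the more precise format.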

37 Normal 21k FP example
  Ordinary Decimal 178.125
  Best Integer Approximation 178
  Scientific Decimal 1.78125 * 10^2
  Scientific Binary 1.0110010001 * 2^111
  1 bit -- sign
  8 bits -- for unsigned magnitude biased exponent
  24 bits -- for fractional part
  Total 33 bits of storage

38 Normal 21k FP Example continued
  Scientific Binary 1.0110010001 * 2^111
  1 bit -- sign
  8 bits -- for unsigned magnitude biased exponent (+127)
  24 bits -- for fractional part -- the 1. is “James Bonded” -- “remembered not stored” -- need 23 bits
  Total 33 bits of storage reduces to 32 bits once the leading 1. is not stored
  Biased exponent (1).0110010001 * 2^10000110
    sign = 0
    biased exponent = 10000110
    fractional part = 0110010001
    hidden 1. for normalized numbers
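The field values for 178.125 can be confirmed by unpacking the bit pattern, assuming the standard IEEE single-precision layout (1 sign bit, 8 exponent bits biased by 127, 23 stored fraction bits). `float32_fields` is a helper name introduced for this sketch.

```python
import struct

def float32_fields(x):
    """Return (sign, biased exponent, stored fraction) of x as a 32-bit float."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    biased_exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF          # the leading 1. is implied, not stored
    return sign, biased_exponent, fraction

sign, exponent, fraction = float32_fields(178.125)
print(f"sign            = {sign}")
print(f"biased exponent = {exponent:08b} ({exponent} = 7 + 127)")
print(f"fraction        = {fraction:023b}")
```

The stored fraction is the slide's 0110010001 padded with zeros on the right to 23 bits.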

39 Packed 21k Float
  See Appendix C
  Short Float type supports gradual underflow
  Sacrifices precision for dynamic range
  Largest number 2 ^ 135
  Smallest “Accurate” Number 2 ^ 120
  Smallest “Non-zero” number 2 ^ 110
  Must scale numbers appropriately
  1 bit -- sign
  4 bit -- (binary exponent - 120)
  11 bit -- rounded upper 11 bits of source OR
  11 bit represents non-normalized form of the source when exponent stored as 0

40 Addition in Floating point
  10 + 11 -- stored as (1).frac * 2^N
    (1).010 * 2^3 + (1).011 * 2^3
    = 10.101 * 2^3
    = (1).0101 * 2^4 -- must renormalize
  10 + 20
    (1).010 * 2^3 + (1).010 * 2^4
    = 0.1010 * 2^4 + 1.010 * 2^4 -- denormalize
    = (1).1110 * 2^4
  Remember that (1) is “magic” or remembered and is not stored
  Can all be done using integer instructions -- around 280 instructions per FOP
  Problems with co-processor -- data moves
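The denormalize, add, renormalize steps can be traced with integer operations only. This is a toy sketch, not the 21K or 29K algorithm: it handles positive values, keeps the leading 1 explicit, uses an assumed 4 fractional mantissa bits, truncates instead of rounding, and all function names are hypothetical.

```python
FRAC_BITS = 4   # fractional mantissa bits in this toy format (assumption)

def encode(value):
    """Positive integer -> (mantissa, exponent), mantissa in [1.0, 2.0) scaled by 2**FRAC_BITS."""
    exponent = value.bit_length() - 1
    mantissa = (value << FRAC_BITS) >> exponent
    return mantissa, exponent

def fp_add(a, b):
    """Add two positive toy floats using only integer shifts and adds."""
    (ma, ea), (mb, eb) = a, b
    if ea < eb:                          # make a the operand with the larger exponent
        (ma, ea), (mb, eb) = (mb, eb), (ma, ea)
    mb >>= ea - eb                       # denormalize the smaller operand (truncates)
    mantissa, exponent = ma + mb, ea
    if mantissa >= (2 << FRAC_BITS):     # sum reached 2.0 or more: renormalize
        mantissa >>= 1
        exponent += 1
    return mantissa, exponent

def decode(fp):
    mantissa, exponent = fp
    return mantissa * 2.0 ** (exponent - FRAC_BITS)

for x, y in ((10, 11), (10, 20)):
    m, e = fp_add(encode(x), encode(y))
    print(f"{x} + {y} = {m >> FRAC_BITS}.{m & (2**FRAC_BITS - 1):0{FRAC_BITS}b} * 2^{e} = {decode((m, e))}")
```

Even this stripped-down version needs a compare, a swap, two shifts and a conditional fix-up per add, which hints at why a full software implementation runs to hundreds of integer instructions.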

41 AMD29050 FP pipeline

42 AMD29050 FP pipeline latency

43 FP pipeline latency -- software solution

44 FP pipeline latency -- Hardware Solution

45 21K -- Computational Unit

46 29K and 21K Comparison
  29K is “general” not DSP
  29K and 21K are both super-scalar structurally
  21K is super-scalar instructionally
  29K has two important “superscalar features” in terms of instructions
    FMAC which is 2 instructions on 21K (1 in integer)
    192 registers on 29K -- no need to do dm( ) and pm( ) access since already in registers!
  FMAC gives 29K tremendous speed advantage

47 29K and 21K Comparison
  Both 29K and 21K can complete new FADD every cycle -- BUT
  29K FADD 7-stage pipeline at 50 MHz is
    FETCH, DECODE, Denormalize, Add, Perhaps Renormalize, Round, WRITEBACK
  21K FADD 3-stage pipeline at 40 MHz
    FETCH, DECODE, EXECUTE/WRITEBACK

48 FP versus Int processors
  Trade algorithm stability and speed/ease of development against cost
  Cost is rapidly changing
  FP has “less baggage” in terms of legacy code
  Now VLIW (true) on DSP and VLIW (effective) on standard Intel and AMD stuff

49 New Trends in DSP -- VLIW

53 More comments on TIC67XX VLIW

54 Tiger SHARC -- Comparison

55 Tiger SHARC -- Block

56 Tiger SHARC comments

57 Tiger SHARC comments -- 2

58 Looked at a variety of processors
  TI32010 -- Very early integer DSP
  TI32C240 -- Later integer DSP
  Motorola 56000 -- Popular integer DSP
  AMD 29050 series (RISC with some DSP)
  Analog SHARC 2106X
  TI C6701 -- VLIW
  Analog TigerSHARC -- VLIW

