Intro to the “c6x” VLIW processor ● Texas Instruments TMS320C6000 series ● TMS320C6700 subseries – includes floating point ● VLIW = Very Long Instruction Word.


1 Intro to the “c6x” VLIW processor ● Texas Instruments TMS320C6000 series ● TMS320C6700 subseries – includes floating point ● VLIW = Very Long Instruction Word

2 Operations in Parallel [Diagram labels: registers, function units]

3 Operations in Parallel [Diagram labels: registers, function units, bypassing]

4 Non-orthogonal registers [Diagram labels: function units, registers, bypass]

5 Non-orthogonal registers [Diagram labels: register files A and B; function units L1, S1, M1, D1 and L2, S2, M2, D2; bypass] *** See TI's picture ***

6 Specialized Function Units ● L units: arithmetic, compare, and logical ops ● S units: arithmetic, logical, branches, constant generation ● M units: multiplies ● D units: address generation / memory accesses

7 Complicated hardware [Diagram label: registers]

8 Explicit parallelism [Diagram label: registers]

9 Simple VLIW encoding ● Slots that cannot be utilized are filled with no-ops ● Bad for code density, cache utilization, energy,...
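To put a number on the code-density cost described above, here is a minimal sketch. The 8-slot width matches the c6x, but the schedule (useful operations per cycle) and the function names are made up for illustration.

```python
def fixed_width_slots(ops_per_cycle, width=8):
    """Slots stored when every cycle is padded to `width` with no-ops."""
    for n in ops_per_cycle:
        if not 0 <= n <= width:
            raise ValueError("each cycle holds between 0 and `width` operations")
    return width * len(ops_per_cycle)


def density(ops_per_cycle, width=8):
    """Fraction of stored slots that hold a useful operation."""
    total = fixed_width_slots(ops_per_cycle, width)
    return sum(ops_per_cycle) / total if total else 0.0


# Hypothetical schedule: useful operations issued in each of four cycles.
schedule = [2, 1, 6, 1]
print(fixed_width_slots(schedule))  # 32 slots stored for 10 useful operations
print(density(schedule))            # 0.3125
```

In this made-up schedule, more than two thirds of the stored instruction words are no-ops, which is the waste that motivates a packed encoding.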

10 C6X: Packets ● One bit of each instruction indicates whether next instruction can be executed in parallel (0 = “EOP”) ● Any slot can go to any function unit 0 1 0 1 1 1 1 1

11 C6X: Packets ● One bit of each instruction indicates whether next instruction can be executed in parallel ● Any slot can go to any function unit 0 1 0 1 1 1 1 1

12 C6X: Packets ● One bit of each instruction indicates whether next instruction can be executed in parallel ● Any slot can go to any function unit 0 1 0 1 1 1 1 1 0 1 0 1 1 1 1 1 ● Packet cannot cross an 8-word boundary ● Resources constrain which instructions can be combined in the same packet ● You can branch into the middle of a packet!
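The packet rules above can be sketched as a small decoder. This is an illustration, not TI's hardware logic. It assumes the parallel bits are listed in program order (the slide's drawing may use a different order) and that an execute packet always ends at the 8-word boundary. The function names and the example bit pattern are hypothetical.

```python
def split_execute_packets(p_bits):
    """Group the 8 instructions of a fetch packet into execute packets.

    p_bits[i] == 1 means instruction i+1 issues in the same cycle as
    instruction i; 0 marks the end of an execute packet.
    """
    if len(p_bits) != 8:
        raise ValueError("a fetch packet holds exactly 8 instructions")
    packets, current = [], []
    for i, p in enumerate(p_bits):
        current.append(i)
        # A packet cannot cross the 8-word boundary, so the last
        # instruction always closes the current packet.
        if p == 0 or i == len(p_bits) - 1:
            packets.append(current)
            current = []
    return packets


def packet_is_legal(units):
    """Resource check: each function unit is used at most once per packet."""
    return len(set(units)) == len(units)


# Hypothetical parallel bits, in program order.
print(split_execute_packets([1, 1, 0, 1, 0, 0, 1, 0]))
# [[0, 1, 2], [3, 4], [5], [6, 7]]
print(packet_is_legal(["L1", "S1", "M2"]))  # True
print(packet_is_legal(["S1", "S1"]))        # False
```

Here eight stored instructions take four cycles to issue, with no no-op padding needed for the unused slots.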

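13 Explicit scheduling Delay slots must be respected – no HW interlocks or scoreboarding. Multiply – 1 delay slot. Load – 4 delay slots. Branch – 5 delay slots.
Right: B5 := B3 * B2, then one cycle that does not read B5, then B7 := B5 + B1
Wrong: B5 := B3 * B2 immediately followed by B7 := B5 + B1 (the add reads B5 before the product has been written)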
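The effect of ignoring a delay slot can be shown with a toy simulator. This is a sketch under simplifying assumptions: operands are read in the issue cycle, a result becomes visible one cycle plus the delay slots later, and there are no interlocks. The `run` function, the program encoding, and the initial register values are all hypothetical.

```python
import operator

DELAY_SLOTS = {"mul": 1, "add": 0}   # values from the slide
OPS = {"mul": operator.mul, "add": operator.add}


def run(program, regs):
    """Execute `program`, a list of cycles; each cycle is a list of
    (op, dst, src1, src2) tuples, and an empty list acts as a NOP."""
    regs = dict(regs)
    pending = []                      # (cycle_visible, dst, value)
    cycle = 0
    while cycle < len(program) or pending:
        # Commit results whose delay slots have elapsed.
        for _, dst, value in [p for p in pending if p[0] == cycle]:
            regs[dst] = value
        pending = [p for p in pending if p[0] > cycle]
        if cycle < len(program):
            for op, dst, a, b in program[cycle]:
                value = OPS[op](regs[a], regs[b])   # reads see current state
                pending.append((cycle + 1 + DELAY_SLOTS[op], dst, value))
        cycle += 1
    return regs


init = {"B1": 10, "B2": 3, "B3": 4, "B5": 100, "B7": 0}
right = [[("mul", "B5", "B3", "B2")], [], [("add", "B7", "B5", "B1")]]
wrong = [[("mul", "B5", "B3", "B2")], [("add", "B7", "B5", "B1")]]

print(run(right, init)["B7"])  # 22: the add sees the product 12
print(run(wrong, init)["B7"])  # 110: the add sees the stale B5 = 100
```

Both programs finish with B5 = 12. Only the value the add observed differs, which is why the scheduler, not the hardware, has to get this right.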

14 Predicated execution Why? To get rid of branches (5 delay slots * 8 wide....) Basic idea: a comparison result is stored to a condition register; this register is then used as an operand of other instructions, and its value causes those operations to be selectively enabled or squashed. [Condition registers: A1, A2, B0, B1, B2] Example: If (B3 ...

15 Predicated execution
With branches:
        cmp B3, B4
        bge L2
        B3 := B3+1
        b DONE
L2:     B4 := B4+1
DONE:
With predicates:
        cmplt B3, B4, B0
  [B0]  B3 := B3+1
  [!B0] B4 := B4+1
...and the last two can be issued in parallel! Control dependency has been converted to data dependency...
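The equivalence of the two versions on this slide can be checked with a short sketch. It models the predicate as a plain integer and treats a squashed operation as one whose result is simply not committed. The function names are hypothetical.

```python
def with_branches(b3, b4):
    """Control-flow version: exactly one of the two updates runs."""
    if b3 < b4:
        b3 = b3 + 1
    else:
        b4 = b4 + 1
    return b3, b4


def with_predicates(b3, b4):
    """If-converted version: both updates are issued, the predicate
    decides which result is committed."""
    b0 = int(b3 < b4)                 # cmplt B3, B4, B0
    new_b3 = b3 + 1                   # candidate result of [B0]  B3 := B3+1
    new_b4 = b4 + 1                   # candidate result of [!B0] B4 := B4+1
    b3 = new_b3 if b0 else b3         # committed only when B0 is nonzero
    b4 = new_b4 if not b0 else b4     # committed only when B0 is zero
    return b3, b4


print(with_branches(2, 5), with_predicates(2, 5))  # (3, 5) (3, 5)
print(with_branches(5, 2), with_predicates(5, 2))  # (5, 3) (5, 3)
```

The two predicated updates depend only on B0, not on each other, so they can share an execute packet.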

16 Assembly details
         .text
         .align 32
         .global proc
proc:
         mvk 4, b3
         mvk 5, b4
         cmpgt b3, b4, b0
   [ b0] mvk.S2 9, b5
|| [!b0] mvk.S1 8, a5
         stw a5, *-a15[4]
         .....

17 Fetch/execute pipeline
PG  generate program address
PS  program address send
PW  program memory access
PR  fetch reaches CPU boundary
DP  instruction dispatch
DC  instruction decode
E1  execute 1
E2  execute 2
E3  execute 3
E4  execute 4
E5  execute 5
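The stage list lines up with the delay-slot counts given earlier in the deck. The sketch below rests on three assumptions that the slides do not state: a multiply result is produced in E2, a load result in E5, and a branch target enters PG in the cycle the branch is in E1. The function names are hypothetical.

```python
STAGES = ["PG", "PS", "PW", "PR", "DP", "DC",
          "E1", "E2", "E3", "E4", "E5"]


def stage_at(entered_pg, cycle):
    """Stage occupied at `cycle` by a packet that entered PG at
    `entered_pg`, assuming no stalls; None if it is not in the pipeline."""
    idx = cycle - entered_pg
    if 0 <= idx < len(STAGES):
        return STAGES[idx]
    return None


def cycles_between(first, second):
    """Number of cycles from stage `first` to stage `second`."""
    return STAGES.index(second) - STAGES.index(first)


# Assumption: multiply writes its result in E2, a load in E5.
print(cycles_between("E1", "E2"))      # 1 delay slot
print(cycles_between("E1", "E5"))      # 4 delay slots
# Assumption: the target is in PG while the branch is in E1, so the
# target must still pass PS, PW, PR, DP, DC before it reaches E1.
print(cycles_between("PG", "E1") - 1)  # 5 delay slots
```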

18 Addressing Modes
Assembly          C equivalent
*R                (*R)
*+R[ucst5]        (R[ucst5])
*-R[ucst5]        (R[-ucst5])
*+R[offsetR]      (R[offsetR])
*-R[offsetR]      (R[-offsetR])
Special case: 15-bit offsets: *+B15[ucst15], *+B14[ucst15]

19 Addressing Modes Pre/post increment/decrement
*++R, *R++
*++R[ucst5], *R++[ucst5]
*--R[ucst5], *R--[ucst5]
*++R[offsetR], *R++[offsetR]
*--R[offsetR], *R--[offsetR]
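The difference between the plain-offset, pre-modify, and post-modify forms comes down to which address is used and whether the base register changes. The sketch below assumes the bracketed offset is scaled by the access size, as C array indexing would be. The `effective_address` helper and its mode strings are hypothetical.

```python
def effective_address(mode, base, offset, size=4):
    """Return (address_used, new_base) for one memory access.

    `offset` is the bracketed value, counted in elements; `size` is the
    element size in bytes (4 for a word access).
    """
    step = offset * size
    table = {
        "*R":   (base,        base),          # no offset
        "*+R":  (base + step, base),          # offset, base unchanged
        "*-R":  (base - step, base),
        "*++R": (base + step, base + step),   # pre-increment
        "*--R": (base - step, base - step),   # pre-decrement
        "*R++": (base,        base + step),   # post-increment
        "*R--": (base,        base - step),   # post-decrement
    }
    if mode not in table:
        raise ValueError(f"unknown addressing mode: {mode}")
    return table[mode]


print(effective_address("*+R", 0x100, 3))   # (268, 256): uses 0x10C, base stays 0x100
print(effective_address("*R++", 0x100, 2))  # (256, 264): uses 0x100, base becomes 0x108
print(effective_address("*--R", 0x100, 1))  # (252, 252): base becomes 0xFC, then used
```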

20 Resources http://www.cs.cmu.edu/~tcal/15745/

