Introduction to Silicon Programming in the Tangram/Haste language Material adapted from lectures by: Prof.dr.ir Kees van Berkel [Dr. Johan Lukkien] [Dr.ir.


1 Introduction to Silicon Programming in the Tangram/Haste language Material adapted from lectures by: Prof.dr.ir Kees van Berkel [Dr. Johan Lukkien] [Dr.ir. Ad Peeters] at the Technical University of Eindhoven, the Netherlands

2 Philips Research, Kees van Berkel, Ad Peeters, 2002-09-10, TU/e
VLSI programming for …
Low costs: introduce resource sharing.
Low delay (high throughput): introduce parallelism.
Low energy (low power): reduce activity; …

3 VLSI programming for high performance
Keep it simple!
Make the analysis; focus on bottlenecks.
Introduce parallelism: expressions, commands, loops, pipelining.
Enable parallelism by reducing dependencies, such as resource sharing.

4 Expression-level parallelism
Examples:
- balancing: (v+w)+(x+y) is faster than v+w+x+y
- substitution: z:= g(f(x)) is faster than y:= f(x) ; z:= g(y)
- carry-select adder
- carry-save multiplier
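To make the balancing example concrete, here is a minimal Python sketch that counts the adder levels on the critical path of a sum. It assumes every two-input adder has the same unit delay; the nested-tuple encoding and the `depth` helper are illustrative names, not part of Tangram.

```python
def depth(expr):
    """Number of adder levels on the longest path of a nested-tuple sum."""
    if isinstance(expr, str):        # a leaf variable adds no delay
        return 0
    left, right = expr               # one two-input adder
    return 1 + max(depth(left), depth(right))

chain = ((("v", "w"), "x"), "y")     # v+w+x+y, evaluated left to right
balanced = (("v", "w"), ("x", "y"))  # (v+w)+(x+y)

print(depth(chain), depth(balanced))
```

Under this unit-delay assumption the chained sum has three adders in series, while the balanced form has two.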

5 Command-level parallelism
If S2 does not depend on the outcome of S1, then S1 ; S2 can be transformed into S1 || S2 (dependencies: data, sharing, synchronization).
This reduces the computation time τ, unless ordering is enforced through external synchronization:
τ(S1 ; S2) = τ(;) + τ(S1) + τ(S2)
τ(S1 || S2) = τ(||) + max(τ(S1), τ(S2))
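The two timing rules on this slide can be sketched as a pair of Python functions. The function names and the `overhead` parameter (standing for the cost of the `;` or `||` operator itself) are assumptions made for illustration.

```python
def t_seq(t1, t2, overhead=0.0):
    """Time of S1 ; S2: operator overhead plus both commands in series."""
    return overhead + t1 + t2

def t_par(t1, t2, overhead=0.0):
    """Time of S1 || S2: operator overhead plus the slower command."""
    return overhead + max(t1, t2)

# Hypothetical delays: S1 takes 3 units, S2 takes 5, each operator costs 1.
print(t_seq(3, 5, overhead=1), t_par(3, 5, overhead=1))
```

The gain from parallel composition is largest when the two commands take about equally long, and it shrinks as one of them dominates.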

6 Exposure of command-level parallelism
Let *[S] be shorthand for: forever do S od.
Assume S0 must precede S1, and S1 must precede S2. How to speed up *[ S0 ; S1 ; S2 ]?
  *[ S0 ; S1 ; S2 ]
= { loop unfolding }
  S0 ; *[ S1 ; S2 ; S0 ]
= { S0 does not depend on S1 }
  S0 ; *[ S1 ; (S2 || S0) ]

7 Wagging
  *[ a?x ; b!f(x) ]
= { loop unrolling, renaming }
  *[ a?x ; b!f(x) ; a?y ; b!f(y) ]
= { loop folding }
  a?x ; *[ b!f(x) ; a?y ; b!f(y) ; a?x ]
  { increases slack by 1 }
  a?x ; *[ (b!f(x) || a?y) ; (b!f(y) || a?x) ]
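A small Python sketch can check that the wagged program produces the same output sequence on b as the original loop. It runs everything sequentially, so it shows functional equivalence only, not the speed gain; the function names and the `END` sentinel (needed because the Python input is finite) are illustrative assumptions.

```python
END = object()  # sentinel marking the end of the finite test input

def original(inputs, f):
    """*[ a?x ; b!f(x) ] over a finite input list."""
    return [f(x) for x in inputs]

def wagged(inputs, f):
    """a?x ; *[ (b!f(x) || a?y) ; (b!f(y) || a?x) ], executed sequentially."""
    out = []
    it = iter(inputs)
    x = next(it, END)            # a?x before the loop
    while x is not END:
        y = next(it, END)        # a?y, paired with b!f(x)
        out.append(f(x))
        if y is END:
            break
        x = next(it, END)        # a?x, paired with b!f(y)
        out.append(f(y))
    return out

double = lambda v: 2 * v
print(original([1, 2, 3], double), wagged([1, 2, 3], double))
```

Note that `wagged` always reads one input ahead of the value it is outputting, which is the extra unit of slack mentioned on the slide.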

8 Parallel reads from a register file
Let RF be a register file. Then x:= RF[i] ; y:= RF[j] cannot be parallelized, because register files have a single read port.
Parallel read actions can be realized by doubling the register file: > := > { write } and > := > { read }
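The idea of doubling the register file can be sketched in Python as follows. The class and method names are hypothetical; the sketch only assumes what the slide states, namely that every write goes to both copies so that two reads can each use their own copy.

```python
class DoubledRegisterFile:
    """Two identical copies of a register file, each with one read port."""

    def __init__(self, size):
        self.copy_a = [0] * size
        self.copy_b = [0] * size

    def write(self, k, value):
        # A write updates both copies so they never diverge.
        self.copy_a[k] = value
        self.copy_b[k] = value

    def read_pair(self, i, j):
        # The two reads use different copies, so they do not share a port.
        return self.copy_a[i], self.copy_b[j]

rf = DoubledRegisterFile(32)
rf.write(1, 12)
rf.write(2, 18)
print(rf.read_pair(1, 2))
```

The price is the duplicated storage and the doubled write activity, which is the usual trade of area and energy for delay.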

9 Pipelining in Tangram
Compare three programs:
P0: *[ a?x0 ; b!f2(f1(f0(x0))) ]
P1: *[ a?x0 ; x1:= f0(x0) ; x2:= f1(x1) ; b!f2(x2) ]
P2: *[ a?x0 ; a1!f0(x0) ] || *[ a1?x1 ; a2!f1(x1) ] || *[ a2?x2 ; b!f2(x2) ]

10 Pipelining in Tangram (continued)
The output sequence on b is identical for P0, P1, and P2.
P0 and P1 have the same communication behavior; P1 is larger, slower, and warmer.
P2 vs. P1: similar in size, energy, and latency, but up to 3 times higher throughput, depending on the (relative) complexity of f0, f1, and f2.
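The "up to 3 times" figure can be illustrated with a rough Python model. It assumes that the cycle time of P1 is the sum of the three stage delays and that the cycle time of P2 is set by its slowest stage, ignoring communication overhead; the function name and the delay values are hypothetical.

```python
def speedup(stage_delays):
    """Throughput of the pipelined P2 relative to the sequential P1."""
    sequential_cycle = sum(stage_delays)   # P1: f0, f1, f2 one after another
    pipelined_cycle = max(stage_delays)    # P2: limited by the slowest stage
    return sequential_cycle / pipelined_cycle

print(speedup([1, 1, 1]))   # balanced stages
print(speedup([1, 1, 4]))   # one dominant stage
```

With three equally complex functions the model gives the full factor of 3; when one function dominates, the gain drops towards 1.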

11 A processor example: DLX (“Deluxe”)
(AMD 29K + DECstation 3100 + HP850 + IBM801 + Intel i860 + MIPS M/120A + MIPS M/1000 + Motorola 88K + RISC I + SGI 4D/60 + SPARCstation-1 + Sun 4/110 + Sun-4/260) / 13 = DLX
Other RISC examples include: Cray-1,2,3, AMD2900, DEC Alpha, ARM.

12 DLX instruction formats
Bit positions: 31-26, 25-21, 20-16, 15-11, 10-0
I-type (loads, stores, conditional branch, ..): Opcode [31-26] | rs1 [25-21] | rd [20-16] | Immediate [15-0]
R-type (reg-reg ALU operations): Opcode [31-26] | rs1 [25-21] | rs2 [20-16] | rd [15-11] | function [10-0]
J-type (jump, jump and link, trap, return from exception): Opcode [31-26] | offset [25-0]

13 Example instructions

14 GCD in DLX assembler
pre:   LW    R1,4(R0)      R1:=Mem[4+0]
       LW    R2,8(R0)      R2:=Mem[8+0]
loop:  SUB   R3,R1,R2      R3:=R1-R2
       BEQZ  R3,"exit"     if (R3=0) then PC:="exit"
       SLT   R4,R1,R2      R4:=(R1<R2)
       BEQZ  R4,"pos2"     if (R4=0) then PC:="pos2"
pos1:  SUB   R2,R2,R1      R2:=R2-R1
       J     "loop"        PC:="loop"
pos2:  SUB   R1,R1,R2      R1:=R1-R2
       J     "loop"        PC:="loop"
exit:  SW    20(R0),R1     Mem[20+0]:=R1
       HLT
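For readers less used to assembler, the same subtraction-based GCD can be written as a short Python function that follows the listing step by step. The function name is illustrative, and the sketch assumes both inputs are positive integers, since the loop does not terminate otherwise.

```python
def gcd_by_subtraction(r1, r2):
    """Mirror of the DLX listing: r1, r2 play the roles of R1, R2."""
    while True:
        r3 = r1 - r2          # loop: SUB R3,R1,R2
        if r3 == 0:           # BEQZ R3,"exit"
            return r1         # exit: SW 20(R0),R1
        if r1 < r2:           # SLT R4,R1,R2 ; BEQZ R4,"pos2"
            r2 = r2 - r1      # pos1: SUB R2,R2,R1
        else:
            r1 = r1 - r2      # pos2: SUB R1,R1,R2

print(gcd_by_subtraction(12, 18))
```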

15 DLX interface, state
[Figure: the DLX CPU with its state (pc and the register file Reg: r0, r1, r2, …, r31) and its interfaces: instruction memory (address, instruction), data memory Mem (address, data, r/w), clock, and interrupt.]

16 DLX: “Moore machine” (ignoring interrupts)
⟨Reg[0], pc⟩ := ⟨0, 0⟩ ;
do ⟨Mem[Reg[rs1]+immediate], pc, Reg[rd]⟩ :=
   ⟨ if SW → Reg[rd] fi,
     if J → pc+4+offset
     [] BEQZ → if Reg[rs]=0 → pc+4+immediate
               [] Reg[rs]#0 → pc+4
               fi
     [] else → pc+4
     fi,
     if LW → Mem[rs1+immediate]
     [] ADD → ALU(add, Reg[rs1], Reg[rs2])
     fi ⟩
od

17 DLX: 5-step sequential execution
[Figure: datapath with Instr. mem, Reg, and Mem; registers A, B, Imm, ir, npc, pc, aluo, cond, lmd; a zero test (0?) and a constant 4; steps IF, ID, EX, MM, WB.]

18 DLX: pipelined execution
[Figure: pipeline diagram. Successive instructions each pass through IF, ID, EX, MM, WB, overlapping in time. Horizontal axis: time in clock cycles (1 2 3 4 5 6 7 8 ...). Vertical axis: program execution in instructions.]
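The benefit shown in the pipeline diagram can be estimated with a small Python sketch. It assumes an ideal five-stage pipeline in which a new instruction starts every clock cycle, with no stalls or hazards; the function names are illustrative.

```python
def cycles_sequential(n, stages=5):
    """Each instruction finishes all stages before the next one starts."""
    return n * stages

def cycles_pipelined(n, stages=5):
    """Ideal pipeline: the first instruction fills it, then one completes per cycle."""
    return stages + n - 1 if n > 0 else 0

for n in (1, 6, 100):
    print(n, cycles_sequential(n), cycles_pipelined(n))
```

For long instruction sequences the ratio approaches the number of stages, which is the same limit as in the Tangram pipelining example.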

19 DLX: pipelined execution
[Figure: pipelined datapath with pc, Instr. mem, Reg, and Mem; a zero test (0?) and a constant 4; stages Instruction Fetch, Instruction Decode, EXecute, Memory, Write Back.]

20 DLX system organization
[Figure: system_dlx(…) contains dlx(…), rom(…), and ram(…). Channels: ROMaddr, ROMdata, RAMaddr, datatoRAM, datafromRAM. Across the system boundary: file gcd.bin, and files RAMin and RAMout.]

21 dlx0.ht
#include types.ht
& dlx0 : export proc (
    ROMaddr!chan adtype
  & ROMdata?chan word
  & RAMaddr!chan rwadtype
  & datatoRAM!chan S30
  & datafromRAM?chan S30 ).
begin
  …
  RF: ram array U5 of S30
end

22 system_dlx0.ht
#include "dlx0.ht"
& dlx0 : proc (
    ROMaddr!chan adtype
  & ROMdata?chan word
  & RAMaddr!chan rwadtype
  & datatoRAM!chan S30
  & datafromRAM?chan S30 ). import
& env_dlx4 : main proc (
  & ROMfile? chan word
  & RAMinfile? chan S30
  & RAMfile! chan S30 /* > */ ).
begin
  next slide
end

23 system_dlx0.ht : main body
begin
  & ROMaddr : chan adtype
  & ROMdata : chan word
  & RAMaddr : chan rwadtype
  & datatoRAM : chan S30
  & datafromRAM: chan S30
  …
  & ROMinterface : proc(). begin .. end
  & RAMinterface : proc(). begin .. end
| initialise() ;
  ROMinterface() || RAMinterface() ||
  dlx0( ROMaddr, ROMdata, RAMaddr, datatoRAM, datafromRAM )
end

24 script
htcomp system_dlx0
htsim -limit 1000 system_dlx0 RAMin RAMout
htview system_dlx0
htmap system_dlx0

25 DLX0: instruction loop
do -halted then
  ROMaddr!PC ; ROMdata?ir ; PC:=PC+4 {auxPC:=PC+4 ; PC:=PCaux}
  ; case (ir cast Itype.0) is
       > then LW()
    or > then SW()
    or > then if (ir cast Rtype.4 = 1) then SLT() fi
    or > then BEQZ()
    or > then J()
    or > then halted:=true
    si
od
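The fetch, increment, and dispatch structure of this loop can be mimicked in Python. The sketch below is not the Haste program: it assumes instructions are stored as symbolic tuples instead of 32-bit words, resolves branch targets through a hypothetical `labels` table of absolute addresses, and adds SUB so that the GCD listing from the earlier slide can run. The names `run`, `program`, `labels`, and `max_steps` are illustrative.

```python
def run(program, labels, mem, max_steps=1000):
    """Symbolic DLX-like loop: fetch, PC:=PC+4, then dispatch on the opcode."""
    reg = [0] * 32
    pc = 0
    halted = False
    steps = 0
    while not halted and steps < max_steps:
        op, *args = program[pc // 4]      # ROMaddr!PC ; ROMdata?ir
        pc += 4                           # PC:=PC+4
        if op == "LW":
            rd, offset, rs1 = args
            reg[rd] = mem.get(reg[rs1] + offset, 0)
        elif op == "SW":
            offset, rs1, rd = args
            mem[reg[rs1] + offset] = reg[rd]
        elif op == "SUB":
            rd, rs1, rs2 = args
            reg[rd] = reg[rs1] - reg[rs2]
        elif op == "SLT":
            rd, rs1, rs2 = args
            reg[rd] = int(reg[rs1] < reg[rs2])
        elif op == "BEQZ":
            rs, target = args
            if reg[rs] == 0:
                pc = labels[target]
        elif op == "J":
            pc = labels[args[0]]
        elif op == "HLT":
            halted = True
        reg[0] = 0                        # R0 always reads as zero
        steps += 1
    return mem

# The GCD program, one tuple per instruction (4 bytes each).
program = [
    ("LW", 1, 4, 0), ("LW", 2, 8, 0),
    ("SUB", 3, 1, 2), ("BEQZ", 3, "exit"),
    ("SLT", 4, 1, 2), ("BEQZ", 4, "pos2"),
    ("SUB", 2, 2, 1), ("J", "loop"),
    ("SUB", 1, 1, 2), ("J", "loop"),
    ("SW", 20, 0, 1), ("HLT",),
]
labels = {"loop": 8, "pos1": 24, "pos2": 32, "exit": 40}

result = run(program, labels, {4: 12, 8: 18})
print(result[20])
```

Instruction and data memory are kept separate here, matching the DLX interface slide, so address 4 in `mem` does not collide with the instruction at address 4.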

