Introduction to VLSI Programming Lecture 9: High Performance DLX


1 Introduction to VLSI Programming Lecture 9: High Performance DLX
(course 2IN30) Prof. dr. ir. Kees van Berkel

2 Time table 2005
date      class | lab   subject
Aug. 30   2 | 0 hours   intro; VLSI
Sep. 6    3 | 0 hours   handshake circuits
Sep. 13                 handshake circuits assignment
Sep. 20                 Tangram
Sep. 27                 no lecture
Oct. 4
Oct. 11   1 | 2 hours   demo, fifos, registers | deadline assignment
Oct. 18                 design cases;
Oct. 25                 DLX introduction
Nov. 1                  low-cost DLX
Nov. 8                  high-speed DLX
Dec. 13                 deadline final report
11/15/2018 Kees van Berkel

3 Lecture 9
Recapitulation of Lecture 8
3-stage DLX, using branch-delay slots
Lab work: 3-stage DLX in Tangram: > 80 MIPS
Industrial applications of asynchronous technology
Conclusion of course 2IN30

4 Pipelining in Tangram
Compare three programs:
P0: *[ a?x0 ; b!f2(f1(f0(x0))) ]
P1: *[ a?x0 ; x1:= f0(x0) ; x2:= f1(x1) ; b!f2(x2) ]
P2: *[ a?x0 ; a1!f0(x0) ] || *[ a1?x1 ; a2!f1(x1) ] || *[ a2?x2 ; b!f2(x2) ]

5 Pipelining in Tangram (cont'd)
The output sequence on b is identical for P0, P1, and P2.
P0 and P1 have the same communication behavior; P1 is larger, slower, and warmer (it dissipates more energy).
P2 vs. P1: similar in size, energy, and latency, but up to 3 times higher throughput, depending on the (relative) complexity of f0, f1, and f2.
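The comparison above can be illustrated with a small Python model. This is only a sketch, not Tangram: the stage functions, the delay numbers, and all helper names are assumptions made for illustration, and handshake overhead between stages is ignored. It shows that the composed program and the staged pipeline produce the same output sequence, and that the steady-state speedup of the pipelined form is bounded by the number of stages.

```python
def stage(f, source):
    """One pipeline stage: read a value, apply f, pass the result on."""
    for x in source:
        yield f(x)


def p0(inputs, f0, f1, f2):
    """P0/P1 style: one process computes f2(f1(f0(x))) per input."""
    return [f2(f1(f0(x))) for x in inputs]


def p2(inputs, f0, f1, f2):
    """P2 style: three stages connected by (modelled) channels a1 and a2."""
    return list(stage(f2, stage(f1, stage(f0, inputs))))


def speedup(stage_delays):
    """Sequential cycle time (sum) over pipelined cycle time (slowest stage)."""
    return sum(stage_delays) / max(stage_delays)


# Hypothetical stage functions, chosen only to make the outputs easy to check.
f0 = lambda x: x + 1
f1 = lambda x: x * 2
f2 = lambda x: x - 3

print(p0(range(4), f0, f1, f2) == p2(range(4), f0, f1, f2))  # True
print(speedup([10.0, 10.0, 10.0]))  # balanced stages: 3.0
print(speedup([25.0, 2.5, 2.5]))    # one dominant stage: 1.2
```

The second `speedup` call shows why the gain depends on the relative complexity of f0, f1, and f2: when one stage dominates, pipelining hardly helps.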

6 DLX0: instruction loop
do -halted then
  ROMaddr!PC ; ROMdata?ir
  ; PC:=PC+4 {auxPC:=PC+4 ; PC:=auxPC}
  ; case (ir cast Itype.0) is
       <<t,f,f,f,f,f>> then LW()
    or <<t,f,f,f,f,t>> then SW()
    or <<f,f,f,f,f,f>> then if (ir cast Rtype.4 = 1) then SLT() fi
    or <<f,t,f,f,f,f>> then BEQZ()
    or <<f,t,f,f,f,t>> then J()
    or <<f,f,t,f,f,f>> then halted:=true
    si
od

7 DLX0: instruction loop
Each instruction cycle:
4 sequential commands for each instruction type
+ 1-3 sequential commands for specific instructions
= 5-7 sequential commands each cycle
Pipelining: split these 5-7 commands over 2 stages, in a (more or less) balanced way.
… this is simple when the instruction does not affect PC, but more difficult for jump and branch instructions.

8 2-stage DLX: example template
[Block diagram. Blocks: Program ROM, Instr. Fetch/Decode, Instruction Execute, Data RAM. Channels: pc, ir, address, data.]

9 DLX: 3-stage pipelined execution
[Pipeline diagram. Axes: time [instruction cycles] and program execution [instructions]. Stages: IF, ID, EX.]
Stage EX includes memory access and writeback.
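A diagram of this kind can be regenerated with a few lines of Python. The sketch below assumes an ideal pipeline: one instruction enters per cycle and there are no stalls or branches. The function names and the `i0`, `i1`, ... labels are made up for illustration.

```python
STAGES = ["IF", "ID", "EX"]  # EX includes memory access and writeback


def total_cycles(n_instructions, depth=len(STAGES)):
    """Cycles needed to finish n instructions in an ideal pipeline."""
    return n_instructions + depth - 1


def pipeline_diagram(n_instructions, stages=STAGES):
    """Return one text row per instruction; columns are instruction cycles."""
    rows = []
    width = total_cycles(n_instructions, len(stages))
    for i in range(n_instructions):
        cells = ["  "] * width
        for s, name in enumerate(stages):
            cells[i + s] = name  # instruction i is in stage s during cycle i + s
        rows.append(f"i{i}: " + " ".join(cells).rstrip())
    return rows


for row in pipeline_diagram(4):
    print(row)
```

Each row is shifted one cycle to the right of the previous one, so in steady state all three stages are busy with different instructions.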

10 3-stage DLX: example template
[Block diagram. Blocks: Program ROM, Instruction Fetch, Instruction Decode, Instruction Execute, Data RAM. Channels: pc, ir, address, data, and one channel marked "?".]

11 Reducing pipeline branch penalties
Problem: which instruction to fetch after a branch instruction?
Strategies:
wait until the branch address is computed (DLX0)
predict branch not taken
predict branch taken
introduce branch-delay slots (today's assignment)
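A rough way to compare the four strategies is the textbook cost model CPI = 1 + (branch fraction) × (average penalty per branch). The Python sketch below applies it. All numbers (branch fraction, taken fraction, one-cycle penalty, fraction of delay slots filled with NOP) are illustrative assumptions, not measurements from the DLX lab. The "predict taken" entry additionally assumes the branch target is known early enough to fetch from it.

```python
def effective_cpi(branch_fraction, avg_penalty_cycles):
    """Cycles per instruction for an otherwise ideal (CPI = 1) pipeline."""
    return 1.0 + branch_fraction * avg_penalty_cycles


def compare_strategies(branch_fraction, taken_fraction, penalty, nop_fraction):
    """Average penalty per branch under each strategy (simplified model)."""
    return {
        # every branch waits for the branch address
        "wait": effective_cpi(branch_fraction, penalty),
        # penalty only when the prediction 'not taken' is wrong
        "predict not taken": effective_cpi(branch_fraction, taken_fraction * penalty),
        # penalty only when the prediction 'taken' is wrong
        "predict taken": effective_cpi(branch_fraction, (1.0 - taken_fraction) * penalty),
        # a cycle is lost only when the delay slot holds a NOP
        "delay slot": effective_cpi(branch_fraction, nop_fraction * penalty),
    }


# Assumed workload: 20% branches, 60% of them taken, 30% of slots filled with NOP.
for name, cpi in compare_strategies(0.2, 0.6, 1.0, 0.3).items():
    print(f"{name:18s} CPI = {cpi:.2f}")
```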

12 Branch delay slots
Single branch delay slot:
branch instruction
branch-delay instruction
branch target (if taken)
Branch-delay instruction, various possibilities:
e.g. the instruction preceding the branch instruction (if the branch condition does not depend on its outcome);
... or an instruction succeeding the branch, if …
NOP instruction if no productive alternative is available.
This constitutes a change in the ISA!
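The effect of a single branch delay slot on program semantics can be shown with a toy interpreter. The Python sketch below is an illustration only: the tuple-based instruction format, the register names, and the `run` function are hypothetical and unrelated to the DLX encoding used in the lab. A branch inside a delay slot is not treated specially here.

```python
def run(program, regs, max_steps=1000):
    """Execute a toy program in which every branch has one delay slot."""
    pc = 0
    pending_target = None  # target of a branch whose delay slot is next
    trace = []
    for _ in range(max_steps):
        if pc >= len(program):
            break
        op, *args = program[pc]
        trace.append(pc)
        next_pc = pc + 1
        if pending_target is not None:  # this instruction is a delay slot
            next_pc = pending_target
            pending_target = None
        if op == "addi":
            rd, rs, imm = args
            regs[rd] = regs[rs] + imm
        elif op == "beqz":
            rs, target = args
            if regs[rs] == 0:
                pending_target = target  # takes effect after the delay slot
        elif op == "j":
            pending_target = args[0]
        elif op == "halt":
            break
        pc = next_pc  # "nop" falls through to here
    return trace


program = [
    ("addi", "r1", "r0", 3),   # 0: r1 := 3
    ("addi", "r1", "r1", -1),  # 1: loop: r1 := r1 - 1
    ("beqz", "r1", 6),         # 2: leave the loop when r1 = 0
    ("addi", "r2", "r2", 1),   # 3: delay slot of beqz, always executed
    ("j", 1),                  # 4: back to loop
    ("nop",),                  # 5: delay slot of j, nothing useful to do
    ("halt",),                 # 6
]
regs = {"r0": 0, "r1": 0, "r2": 0}
trace = run(program, regs)
print(regs["r2"], trace[-3:])  # 3 [2, 3, 6]
```

Instruction 3 runs on every iteration, including the last one where the branch is taken, which is why moving work into the slot requires a rewritten assembler text. Instruction 5 is the NOP case: a slot with no productive alternative.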

13 Final assignment
3-stage DLX, with an instruction rate exceeding 80 MIPS when executing GCD (measured over several GCD cycles).
NB1: exploit branch delay slots. This requires a different version of the assembler text!
NB2: this can be achieved using command-level parallelism and pipelining. (Expression-level parallelism may yield a bonus.)
NB3: speed up the environment (RAM, ROM) when necessary.

14 VLSI programming of asynchronous circuits
[Design-flow diagram. Labels: Tangram program, compiler, Handshake circuit, expander, Asynchronous circuit (netlist of gates), simulator, feedback (behavior, area, time, energy, test coverage).]

15 Demonstrator ICs
ImageNet IC: 2 weeks from start to tape-out
all first-time-right, except: Mozart: (aC)

16 Added value
1985: modularity, ease of design (no value added to product!)
1990: low power (ESPRIT project )
1992: low noise, low EME (Electro-Magnetic Emission)
2000: ...

17 Added value: low power DCC Error Corrector

18 A sync-async “arms race”

19 Synchronous 80C51 - Asynchronous 80C51
Added value: Low Power
[Figure. Labels: Synchronous 80C51, Asynchronous 80C51.]

20 Added value: Low EM Emission

21 Roadblock: circuit size
the 80C51 learning curve
[Figure. Labels: 1995/6, 1999/4.]

22 “Just in time” processing
[Block diagram. Labels: in, fifo, Asynchronous DSP circuit, fifo, out, DC/DC, Vdd, Vdd’.]

23 ADPCM

24 ADPCM

25 ADPCM

26 Industrialization of the Technology
Philips Semiconductors Zürich (Dec. 1994): “We want to set a world record in low power, by using asynchronous technology.”
Their choice of vehicle: the 80C51 micro-controller (used in many consumer products).
Result: 4× less power, minimal EME.
Follow-up: pager baseband ICs, …
In parallel: transfer and upgrade of tools + design flow.
Finally we had a group that, given our low-power claim, wanted to invest x man-years in trying to exploit handshake technology.

27 Pager Baseband Controller ICs
Myna pager:
FLEX™ protocol
32 alphanumeric messages
a single AAA battery (1V)
up to 25 weeks battery life
Pager baseband controller ICs: PCA5007, PCA 5010
com/pip/PCA5007 async.html
Industrialization is one thing, commercialization another.

28 Sep. 1998: the PCA 5007

29 A new generation of pagers: a common platform for all standards
[Block diagram. Labels: PCA 5007 Baseband Controller, LCD, Memory, Receiver, I, Q, I2C, EMI, 25 µV.]

30 EMI: a critical design factor (Electro-Magnetic Interference)
The antenna signal may be as small as 25 µV.
Clock harmonics of synchronous micro-controllers interfere with RF (X00 MHz).
With the asynchronous 80C51: signal decoding by means of (standard-specific) software. (This also enables upgrading/downloading!)
Furthermore: no shielding is required between controller and RF receiver.
Asynchronous technology: excellent EME performance.

31 PCA5007 block diagram

32 Contactless smartcard IC (ESPRIT project DESCALE)
Blocks: power regulator, 80C51 micro-controller, DES engine, UART, RAM, ROM, EEPROM
Radio link:
13.56 MHz clock
power (a few mW)
bi-directional communication (106 kbit/s)

33 Contactless smartcard IC
Properties:
a) low average power
b) lower peak power
c) speed adaptation
Merits:
Maximum speed for received power (a,c)
Robust operation against voltage drops (c)
Smaller buffer capacitor (b,c)

34 Conclusion
First asynchronous VLSI circuits on the market (high volume sales).
Prospects for more async products look good.
Added value: low power, EME performance.
Added costs: test, IC area, being different.
Asynchronous VLSI technology: there is room for it in market niches, …
but it may contribute to main-stream VLSI.

35 Bibliography
Computer Architecture: A Quantitative Approach (3rd Ed.); John L. Hennessy & David A. Patterson; Morgan Kaufmann Publishers Inc., 1996.
ARM System Architecture; Steve Furber; Addison Wesley, 1996.
DSP Processor Fundamentals: Architectures and Features; Phil Lapsley et al. (Berkeley Design Technology Inc.); IEEE, 1996.
newscenter/archive/2004/handshake.html

36 Lab-work and report
You are allowed to team up with a colleague. (Not mandatory.)
Report: more than a listing of functional Tangram programs:
analyze the specifications and requirements;
present design options, alternatives, trade-offs;
motivate your design choices;
explain the functional correctness of your Tangram programs;
analyze & explain {area, time, energy} of your programs.

37 Lab work
Assignment 6:
create a 3-stage pipelined dlx3.tg
design a reduced-cost version dlx3s.tg
Kees van Berkel
Attn: Cecile Brouwers, HG 5.06, Wisk & Informatica
Good luck! … and have fun!

