An Instruction Buffer for a Low Power DSP. Mike Lewis, AMULET group.

1 An Instruction Buffer for a Low Power DSP
Mike Lewis, AMULET group

2 A low-power DSP architecture
- Targeted for digital mobile phones
  - Microprocessor + DSP combination
- Multi-level power reduction strategy…
  - Asynchronous
  - Large register file
  - Parallel structure
  - Parallel instructions cached

3 A low-power DSP architecture
- Fetch unit: autonomous instruction fetch
[Block diagram, repeated on slides 3 to 8: Fetch, Buffer, Decode, Index reg., VLIW mem, Register Bank (2x128x16 bit), Load-store unit, ALU; labels: Opcode, Operand, Index register values, X/Y mem, P mem, int0, int1, nmi]

4 A low-power DSP architecture
- Instruction buffer: 32-entry FIFO

5 A low-power DSP architecture
- Decode instruction, read VLIW operand

6 A low-power DSP architecture
- Substitute and update index registers

7 A low-power DSP architecture
- Read registers and VLIW opcode

8 A low-power DSP architecture
- Perform operation

9 The instruction buffer
- Stores pre-fetched instructions
- Performs hardware-based loops
  - Instructions read from memory into buffer
  - Subsequent iterations use stored copies
  - Buffer manages loop counter
  - 32 instructions, with up to 16 nested loops
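The hardware-loop behaviour on this slide can be illustrated with a small behavioural sketch. It is not the AMULET design: it assumes a single, non-nested loop, and the names `run_with_loop_buffer`, `memory`, `loop_start`, and `loop_end` are hypothetical. It only shows that each instruction in the loop body is read from memory once, while later iterations are served from the stored copies.

```python
from collections import Counter


def run_with_loop_buffer(memory, loop_start, loop_end, count):
    """Issue memory[loop_start..loop_end] `count` times.

    Returns the issued instruction stream and a per-address count of
    memory fetches. Single-level loop only (an assumption of this sketch).
    """
    fetches = Counter()   # how often each address was read from memory
    buffer = {}           # stored copies held by the buffer
    issued = []
    for _ in range(count):                      # loop counter kept by the buffer
        for addr in range(loop_start, loop_end + 1):
            if addr not in buffer:              # first iteration: fetch from memory
                buffer[addr] = memory[addr]
                fetches[addr] += 1
            issued.append(buffer[addr])         # later iterations: stored copy
    return issued, fetches


program = ["load", "mac", "store", "halt"]
issued, fetches = run_with_loop_buffer(program, 1, 2, 3)
```

Here the two-instruction loop body is issued three times but fetched from memory only twice in total, once per instruction.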

10 Requirements
- Low power consumption
- Minimise latency
- Low cycle time: 25ns max

11 Asynchronous buffer designs
- Micropipeline
  - Very good cycle time
  - Poor latency and power consumption
[Diagram: chain of latch stages, each with Rin, Ain, Rout, Aout and En signals]

12 Asynchronous buffer designs
- Word-slice FIFO
  - Latches arranged in parallel
[Diagram: four parallel tristate latch stages (En, OE, Full, wr, rd, Rd_req) sharing Data in and Data out, with a Write token and a Read token; control signals: Write request, Write disable, Read request, Read acknowledge]

13 Asynchronous buffer designs
- Writes disabled by ANDing full indications
- Read requested by ORing all read requests
[Diagram: word-slice FIFO as on slide 12]
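The two control rules on this slide can be sketched as a behavioural model. This is a software illustration only, not the circuit: the class `WordSliceFifo` and its method names are hypothetical, and it assumes that a stage raises its read request when it is full and holds the read token, which the slide does not state explicitly.

```python
class WordSliceFifo:
    """Behavioural sketch of a word-slice FIFO with circulating tokens."""

    def __init__(self, depth):
        self.data = [None] * depth
        self.full = [False] * depth
        self.wr = 0   # stage currently holding the write token
        self.rd = 0   # stage currently holding the read token

    @property
    def write_disable(self):
        # Writes are disabled by ANDing the full indications of all stages.
        return all(self.full)

    @property
    def read_request(self):
        # Read is requested by ORing the per-stage read requests.
        # Assumption: a stage requests a read when it is full and holds the read token.
        return any(f and i == self.rd for i, f in enumerate(self.full))

    def write(self, word):
        if self.write_disable:
            raise RuntimeError("FIFO full: write disabled")
        self.data[self.wr] = word
        self.full[self.wr] = True
        self.wr = (self.wr + 1) % len(self.data)   # pass the write token on

    def read(self):
        if not self.read_request:
            raise RuntimeError("FIFO empty: no read request")
        word = self.data[self.rd]
        self.full[self.rd] = False                 # stage is emptied by the read
        self.rd = (self.rd + 1) % len(self.data)   # pass the read token on
        return word


fifo = WordSliceFifo(4)
for word in (1, 2, 3, 4):
    fifo.write(word)
blocked_when_full = fifo.write_disable
first = fifo.read()
```

Because every stage sees the shared data bus, a word is written straight into the stage holding the write token and read straight out of the stage holding the read token, rather than rippling through the other stages.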

14 Word-slice FIFO operation
[Diagram sequence: word-slice FIFO stages (Tristate Latch, Full, wr, rd, Rd_req) with Write token, Read token, Write request, Write disable, Read request, Read acknowledge, Data in and Data out]

15 Looping behaviour
- Loops require
  - Changing the flow of the read token
  - Preventing stages from being emptied
    - but making sure that they appear to be empty
[Diagram: Read token and Write token, with Loop start, Loop end, End of loop and Full labels]
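One way to picture the changed flow of the read token is the sketch below. It is an interpretation under stated assumptions, not the actual control logic: the function `read_token_path` is hypothetical, it handles a single loop, and it models "preventing stages from being emptied" simply by releasing loop stages only on the final iteration.

```python
def read_token_path(n_stages, loop_start, loop_end, iterations):
    """Trace the read token through a buffer containing one loop.

    Returns (path, released): the stage indices visited by the read token,
    and the order in which stages are actually emptied.
    """
    path = []
    released = []
    pos = 0
    remaining = iterations
    while pos < n_stages:
        path.append(pos)
        in_loop = loop_start <= pos <= loop_end
        # Loop stages keep their contents until the last iteration.
        if not (in_loop and remaining > 1):
            released.append(pos)
        if pos == loop_end and remaining > 1:
            remaining -= 1
            pos = loop_start      # read token redirected to the loop start
        else:
            pos += 1              # normal flow: token moves to the next stage
    return path, released


path, released = read_token_path(5, 1, 2, 3)
```

With a loop over stages 1 and 2 repeated three times, the token visits those stages three times each, but every stage is emptied exactly once.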

16 Evaluation
- Power efficiency, latency, cycle-time
  - What defines ‘good’ performance?
- Compare with a known design
  - 32-entry micropipeline FIFO chosen
  - Compare operation in non-looping mode

17 Evaluation
- Powermill used to gather results
  - Test harness feeds identical random instructions in both tests, at various speeds
    - and also ensures correct outputs
  - Energy per transfer measured
    - at maximum throughput for each design
    - at a rate much less than the maximum

18 Results
- Cycle time
  - 6.0ns (167MHz) for instruction buffer
  - 2.0ns (488MHz) for micropipeline FIFO
    - The expected result: micropipeline FIFO is known to have good cycle time
  - Instruction buffer well within 25ns target
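The cycle times and rates quoted on this slide are related by simple reciprocals, which the short sketch below works through. The helper names `freq_mhz` and `cycle_ns` are illustrative only.

```python
def freq_mhz(cycle_time_ns):
    """Transfer rate in MHz for a given cycle time in nanoseconds."""
    return 1000.0 / cycle_time_ns


def cycle_ns(frequency_mhz):
    """Cycle time in nanoseconds for a given rate in MHz."""
    return 1000.0 / frequency_mhz


buffer_rate = freq_mhz(6.0)     # about 166.7 MHz, quoted as 167MHz
target_rate = freq_mhz(25.0)    # the 25ns requirement corresponds to 40 MHz
fifo_cycle = cycle_ns(488.0)    # about 2.05 ns
```

On these figures, 488MHz corresponds to a cycle time of about 2.05ns, so the quoted 2.0ns appears to be a rounded value. The 6.0ns instruction buffer is roughly four times faster than the 25ns requirement.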

19 Results
- Latency
  - 2.7ns for instruction buffer
  - 26ns for micropipeline FIFO
    - Big benefit from parallel structure
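A rough reading of these latency figures is sketched below. It assumes that the micropipeline latency comes from a word passing through all 32 stages in series with equal delay per stage; that assumption is not stated on the slide.

```python
STAGES = 32                        # depth of the micropipeline FIFO used for comparison
MICROPIPELINE_LATENCY_NS = 26.0
BUFFER_LATENCY_NS = 2.7

# Assumed uniform per-stage delay in the micropipeline.
per_stage_ns = MICROPIPELINE_LATENCY_NS / STAGES

# How many times lower the instruction buffer latency is.
latency_ratio = MICROPIPELINE_LATENCY_NS / BUFFER_LATENCY_NS
```

Under that assumption each micropipeline stage contributes about 0.8ns, and the parallel structure gives a latency nearly ten times lower because a word does not have to ripple through every stage.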

20 Results
- Energy consumption per transfer
  - Maximum speed
    - 0.32nJ for instruction buffer
    - 0.67nJ for micropipeline FIFO
  - 50MHz (well below maximum)
    - 0.48nJ for instruction buffer
    - 0.77nJ for micropipeline FIFO
  - Instruction buffer consumes 48%-62% of the energy of the simpler micropipeline
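The 48%-62% range follows directly from the four energy figures on this slide, as the sketch below shows. The power estimate at the end is an added illustration that assumes one transfer per cycle at 50MHz; it is not a figure from the presentation.

```python
# Energy per transfer in nanojoules, taken from the slide.
energy_nj = {
    "max_speed": {"buffer": 0.32, "micropipeline": 0.67},
    "50MHz": {"buffer": 0.48, "micropipeline": 0.77},
}

# Instruction buffer energy as a fraction of micropipeline energy.
ratio = {
    mode: values["buffer"] / values["micropipeline"]
    for mode, values in energy_nj.items()
}

# Assumption: one transfer per cycle at 50 MHz, so power = energy * rate.
RATE_HZ = 50e6
buffer_power_mw = energy_nj["50MHz"]["buffer"] * 1e-9 * RATE_HZ * 1e3
fifo_power_mw = energy_nj["50MHz"]["micropipeline"] * 1e-9 * RATE_HZ * 1e3
```

The ratios come out at about 0.48 at maximum speed and 0.62 at 50MHz, matching the quoted range. Under the stated assumption, the 50MHz case corresponds to roughly 24mW for the instruction buffer against 38.5mW for the micropipeline FIFO.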

21 Conclusions
- Cycle time well within specification
- Good latency achieved
- Low power consumption
  - Outperforms much simpler FIFO design
    - Study on full extracted layout suggests word-slice FIFO still better with wiring added [13]

