VLIW Digital Signal Processor Michael Chang. Alison Chen. Candace Hobson. Bill Hodges.

VLIW Digital Signal Processor Michael Chang. Alison Chen. Candace Hobson. Bill Hodges

Introduction Functionality Functionality ISA ISA Implementation Implementation Functional blocks Functional blocks Circuit analysis Circuit analysis Testing Testing Off Chip Memory Off Chip Memory Status Status

Things to look for Design Tradeoffs Design Tradeoffs Register file size Register file size Multiple word sizes Multiple word sizes Instruction set and implementation Instruction set and implementation Data forwarding Data forwarding Software-controlled on chip cache Software-controlled on chip cache Shared Address/Data bus for off-chip data memory Shared Address/Data bus for off-chip data memory

Instruction Set Architecture 24-bit instruction words pack 3 sub-instructions: 24-bit instruction words pack 3 sub-instructions: Ex: SUB R5 R3, LDM R3 R6, BNEZ R1 Ex: SUB R5 R3, LDM R3 R6, BNEZ R1 Register file - 8 registers Register file - 8 registers 3 bit encoding * 5 Reg. IDs = 15 bits per IW 3 bit encoding * 5 Reg. IDs = 15 bits per IW Simple but useful Instruction Set Simple but useful Instruction Set Multiply, Add/Subtract, Branch, Jump, Load Memory, Load intermediate, Load CCM Multiply, Add/Subtract, Branch, Jump, Load Memory, Load intermediate, Load CCM 2 Branch delay slots 2 Branch delay slots

Microarchitecture In order, 4 stage pipeline In order, 4 stage pipeline IF, ID, EX, WB IF, ID, EX, WB 3 cycle pipeline stage 3 cycle pipeline stage Data forwarding Data forwarding Eliminate RAW hazards (ELEC 320, 425) Eliminate RAW hazards (ELEC 320, 425) 5 forwarding paths 5 forwarding paths Control Logic Control Logic PLA controls pipeline PLA controls pipeline Initialize pipeline, reset Program Counter Initialize pipeline, reset Program Counter Cycle through three cycles of pipeline stage Cycle through three cycles of pipeline stage

Implementation Double Wide Silicon Floorplan

ALU Design Array Multiplier Array Multiplier Ripple-Carry Adder Ripple-Carry Adder Longest Paths: Longest Paths: Add/Subtract: 10.74 ns through MSB Add/Subtract: 10.74 ns through MSB Multiply: 15.87 ns through 10 th product term Multiply: 15.87 ns through 10 th product term

Compiler Controlled Memory (CCM) Small on chip software controlled cache Small on chip software controlled cache Similar to Commercial DSPs Similar to Commercial DSPs Predictable access time in real time Predictable access time in real time Benefits over off chip memory: Benefits over off chip memory: Double bandwidth Double bandwidth Software configurability Software configurability Reduced register “spill”/ “fill” pressure Reduced register “spill”/ “fill” pressure Easily extendable Easily extendable

Implementation of CCM 4 12-bit lines of memory on chip (8 words) 4 12-bit lines of memory on chip (8 words) Two registers, R6 and R7, for loading and storing Two registers, R6 and R7, for loading and storing Two instructions, LDC and STC Two instructions, LDC and STC 9-bit instruction 9-bit instruction Three bit opcode Three bit opcode Five bit word line Five bit word line Single bit determines single/double access Single bit determines single/double access Example instruction: LDC 1 00001 Example instruction: LDC 1 00001 (Reads CCM Line 1 into R6 and R7)

ORCA Test Vector Generation Process Goal: Greater accuracy and shorter time to verify chip functionality Goal: Greater accuracy and shorter time to verify chip functionality Assembly Code Binary Code IRSIM vectors assembler Vector translator

ORCA Vector Suite Goal: Create functional vectors to isolate specific chip cells to aid in post-silicon debug. Goal: Create functional vectors to isolate specific chip cells to aid in post-silicon debug. Register File Register File Compiler Controlled Memory Compiler Controlled Memory ALU ALU Branch Branch Data Forwarding Data Forwarding Pipeline Pipeline

ORCA Obsbus State Machine Goal: Increase internal test signals to the IO’s by implementing a MUX. The MUX is controlled by output signals generated from a state machine. Goal: Increase internal test signals to the IO’s by implementing a MUX. The MUX is controlled by output signals generated from a state machine. 16:1 MUX, 6 output observability pins, 1 input observability pin 16:1 MUX, 6 output observability pins, 1 input observability pin Allows observation of up to 96 internal signals using 7 pins. Allows observation of up to 96 internal signals using 7 pins. The state machine changes state on each toggle of the input pin. The state machine changes state on each toggle of the input pin.

Obsbus Signals Goal: Track an instruction execution through each of the pipeline stages. Goal: Track an instruction execution through each of the pipeline stages. FetchDecode Write Back Execute Program Counter Branch Address OpCodes RegF output Forwarding signals OpCodes ALU input/output CCM input/output OpCodes RegF inputs

Off Chip Memory Instruction memory Instruction memory Regular static RAM (used previously in 422) Regular static RAM (used previously in 422) 8 bit addressing, 8 bit data reads 8 bit addressing, 8 bit data reads 2 8 = 256 words possible = 85 VLIW instructions 2 8 = 256 words possible = 85 VLIW instructions 70 ns read time 70 ns read time One read every cycle One read every cycle Output address on clock A, latch data on clock B Output address on clock A, latch data on clock B One read/cycle * 8bits * 3 cycles/pipeline state = 24 bit VLIW One read/cycle * 8bits * 3 cycles/pipeline state = 24 bit VLIW

Off Chip Memory, continued Data memory (DS1609) Data memory (DS1609) Shared Address/Data bus Shared Address/Data bus PLA carefully designed to control memory PLA carefully designed to control memory Uses worst case propagation delays Uses worst case propagation delays Timed signals using two out of phase clocks Timed signals using two out of phase clocks Default PLA output latching on clock B Default PLA output latching on clock B External latching on clock A to properly time signals External latching on clock A to properly time signals 50 ns read time 50 ns read time

Current Status Functionality of major blocks tested Functionality of major blocks tested Instruction Fetch in final stages Instruction Fetch in final stages ALU instructions implemented and working, including data forwarding ALU instructions implemented and working, including data forwarding Memory instructions just need to be routed Memory instructions just need to be routed Crystal and HSPICE analysis fifty percent complete Crystal and HSPICE analysis fifty percent complete Global power, clock, and pin routing allocated in floorplan Global power, clock, and pin routing allocated in floorplan

Conclusion Solid fundamental ISA gives a nice “baby DSP” Solid fundamental ISA gives a nice “baby DSP” Modular implementation of fundamental blocks Modular implementation of fundamental blocks Design Decisions are well justified Design Decisions are well justified Register file size Register file size Instruction word length Instruction word length Implementation balances timing and space Implementation balances timing and space Access to off chip memory Access to off chip memory

Questions?

VLIW Digital Signal Processor Michael Chang. Alison Chen. Candace Hobson. Bill Hodges.

Similar presentations

Presentation on theme: "VLIW Digital Signal Processor Michael Chang. Alison Chen. Candace Hobson. Bill Hodges."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

VLIW Digital Signal Processor Michael Chang. Alison Chen. Candace Hobson. Bill Hodges.

Similar presentations

Presentation on theme: "VLIW Digital Signal Processor Michael Chang. Alison Chen. Candace Hobson. Bill Hodges."— Presentation transcript:

Similar presentations

About project

Feedback