AIE Processor Concept
Sequential Processor Stages: Fetch, Decode, Execute, Mem, WB
Pipelining Processor Stages: Fetch, Decode, Execute, Mem, WB [figure: "Pipeline" blocks separate the stages]
Transaction Table Five Stages Pipeline
Pipelining Design as a Queue
- Problems:
  High circuit complexity.
  If the queue in a stage is full, the previous stage must halt until the queue releases an item, so there is no great benefit.
- Implementation:
  Shift register circuit & registers [wastes cycles]
  Counter & registers [saves cycles]
Shift Register Circuit & Registers
Counter & Registers Pipeline
Pipeline Optimal Designs: Sync Pipeline
- All pipeline modules are attached to the same cycle controller.
- Cycle time = max stage clock.
- Problems:
  Some clock time is wasted, but not too much.
  No stage is aware of the status of the previous stage.
Pipeline Optimal Designs: Async Pipeline
- Every stage is aware of the status of the previous stage through internal handshaking signals (Ready and Acknowledge).
- Advantages:
  No clock time is wasted, thanks to the handshaking signals.
  There is no max cycle clock; every instruction takes only the clocks it needs to perform its operation.
- Disadvantages:
  The control unit must specify the timing of every instruction in every stage of the pipelined processor.
Pipeline Optimal Designs: Sync Pipeline & Async Pipeline
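The difference between the two designs can be illustrated with a small timing model. This is only a sketch: the per-stage latencies are hypothetical numbers chosen for illustration, and the async model is idealized (a stage starts as soon as the instruction has left the previous stage and the stage itself is free, ignoring back-pressure from later stages).

```python
def sync_total_time(latencies):
    """Sync pipeline: every stage slot lasts as long as the slowest stage."""
    stages = len(latencies[0])
    cycle_time = max(max(row) for row in latencies)  # Cycle time = max stage clock
    return (stages + len(latencies) - 1) * cycle_time


def async_total_time(latencies):
    """Async pipeline (idealized): each stage takes only the ticks it needs."""
    stages = len(latencies[0])
    prev_finish = [0] * stages  # when the previous instruction left each stage
    for row in latencies:
        finish = []
        t = 0  # time at which this instruction left the previous stage
        for stage, latency in enumerate(row):
            start = max(t, prev_finish[stage])  # wait until the stage is free
            t = start + latency
            finish.append(t)
        prev_finish = finish
    return prev_finish[-1]


# Hypothetical latencies in clock ticks; columns are IF, ID, EX, MEM, WB.
latencies = [
    [2, 1, 3, 4, 1],
    [2, 1, 1, 1, 1],
    [2, 1, 2, 4, 1],
]
print(sync_total_time(latencies))   # 28
print(async_total_time(latencies))  # 16
```

With these made-up numbers the sync design pays the 4-tick worst case in every slot, while the async design pays it only where an instruction really needs it.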
Sync Pipeline Implementation
Key Features of AIE Processor
- 32-bit pipelined processor
- Supports 48 instructions
- Interfaces with interleaved memory
- Interfaces with an LCD terminal using instructions
- Has its own assembly interpreter
Instructions
IR Register
IR register: 8 bit
INSTRUCTIONS ROM: address bus 8 bit, data bus 32 bit
Remaining figure labels: 32 bit, Modes Select, 24 bit
ROM CONTROLS
11-CMP reg1,reg2 -3e
IPNT immediate -4c
RPNT reg -2c
STOWE address,reg -4c
STOWO address,reg -4c
STODW address,reg -4c
LODWE reg,address -4c
LODWO reg,address -4c1d
LODDW reg,address -4c
a-JG address -8c
b-JE address -8c
c-JL address -8cc
d-JC address -8d
e-JNG address -8c
f-JNE address -8c
JNL address -8cc
JNC address -8d
JMP address -8d
NOP
MOV reg,immediate -8c
ADD d.reg,s1.reg,s2.reg
ADC d.reg,s1.reg,s2.reg
SUB d.reg,s1.reg,s2.reg
SUW d.reg,s1.reg,s2.reg
MUL d.reg,s1.reg,s2.reg
DIV d.reg,s1.reg,s2.reg -2a
TRSA d.reg,s1.reg -2c
TRSB d.reg,s2.reg -2e
a-AND d.reg,s1.reg,s2.reg
b-OR d.reg,s1.reg,s2.reg
c-NAND d.reg,s1.reg,s2.reg
d-NOR d.reg,s1.reg,s2.reg
e-XOR d.reg,s1.reg,s2.reg
f-XNOR d.reg,s1.reg,s2.reg -3a
NOT d.reg,s1.reg -3c007000
Main Modes
- IMMEDIATE MODE
- REGISTER, REGISTER MODE
- MEMORY MODE
IMMEDIATE MODE
IR Register: ROM-address 8 bit | REG-address 5 bit + 3 bit | IMMEDIATE 16 bit
Instructions:
*MOV reg,immediate -8c
*JG address -8c
*JE address -8c
*JL address -8cc00000
*JC address -8d
*JNG address -8c
*JNE address -8c
*JNL address -8cc00800
*JNC address -8d
*JMP address -8d400000
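As a rough illustration of the immediate-mode IR layout, the sketch below packs and unpacks a 32-bit instruction word. It assumes the fields are laid out from the most significant bit downward in the order listed (ROM address in bits 31..24, register address in bits 23..19, three unlabeled bits in 18..16, immediate in 15..0); the slide does not state the bit positions, and the function names and the example ROM address 0x12 are hypothetical.

```python
def pack_immediate(rom_address, reg, immediate):
    """Pack an immediate-mode instruction word (assumed MSB-first layout)."""
    if not 0 <= rom_address < (1 << 8):
        raise ValueError("ROM address must fit in 8 bits")
    if not 0 <= reg < (1 << 5):
        raise ValueError("register address must fit in 5 bits")
    if not 0 <= immediate < (1 << 16):
        raise ValueError("immediate must fit in 16 bits")
    # Bits 18..16 (the unlabeled 3-bit field) are left at zero.
    return (rom_address << 24) | (reg << 19) | immediate


def unpack_immediate(word):
    """Return (rom_address, reg, immediate) from an instruction word."""
    return (word >> 24) & 0xFF, (word >> 19) & 0x1F, word & 0xFFFF


# Hypothetical ROM address 0x12, register 1, immediate 05h.
word = pack_immediate(0x12, 1, 0x05)
print(hex(word))               # 0x12080005
print(unpack_immediate(word))  # (18, 1, 5)
```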
REGISTER, REGISTER MODE
IR Register: ROM-address 8 bit | Source_REG2 5 bit + 3 bit | Source_REG1 5 bit + 3 bit | Destination_REG 5 bit + 3 bit
Instructions:
*ADD d.reg,s1.reg,s2.reg
*ADC d.reg,s1.reg,s2.reg
*SUB d.reg,s1.reg,s2.reg
*SUW d.reg,s1.reg,s2.reg
*MUL d.reg,s1.reg,s2.reg
*DIV d.reg,s1.reg,s2.reg -2a
*TRSA d.reg,s1.reg -2c
*TRSB d.reg,s2.reg -2e
*AND d.reg,s1.reg,s2.reg
*OR d.reg,s1.reg,s2.reg
*NAND d.reg,s1.reg,s2.reg
*NOR d.reg,s1.reg,s2.reg
*XOR d.reg,s1.reg,s2.reg
*XNOR d.reg,s1.reg,s2.reg -3a
*NOT d.reg,s1.reg -3c
*CMP reg1,reg2 -3e002000
Indirect Addressing MODE
IR Register: ROM-address 8 bit | Source_REG2 5 bit + 3 bit | Source_REG1 5 bit + 3 bit | Destination_REG 5 bit + 3 bit
Instructions:
*IDSTOWE address -2c
*IDSTOWO address -2c
*IDSTODW address -2c
*IDLODWE address -2c
*IDLODWO address -2c1c7000
*IDLODDW address -2c
MEMORY MODE
IR Register: ROM-address 8 bit | IMMEDIATE 16 bit | REG-address 3 bit + 5 bit
Instructions:
*STOWE address,reg -4c
*STOWO address,reg -4c
*STODW address,reg -4c
*LODWE reg,address -4c
*LODWO reg,address -4c1d5000
*INC reg,immediate
*DEC reg,immediate
*LODDW reg,address -4c
*IPNT immediate -4c
*PUSHWE reg -4c
*PUSHWO reg -4c
*PUSHDW reg -4c
*POPWE reg -4c
*POPWO reg -4c1d5600
*POPDW reg -4c395600
INSTRUCTION set
Bit fields:
B31, B30, B29 (1)
B28, B27, B26, B25 (2)
B24, B23, B22 (3)
B21, B20, B19, B18 (4)
B17 (5)
B16 (6)
B15 (7)
B14, B13, B12 (8)
B11 (9)
1) Select Mode: {B31: Immediate Mode, B30: Memory Mode, B29: Register-Register Mode}
2) Execution Control
3) Execution Conditional Control
4) Memory Control: {B21: BHE, B20: Select Memory, B19: Memory R/W, B18: Memory Even/Odd}
5) Select Write Back Block or TTY Block
6) Select the input of the Write Back Block from ALU result or memory output
7) No Operation
8) Register File Control: {B14: Write Register, B13: OE Register, B12: Enable Write Select Register}
9) Invert Condition
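The bit list above can be turned into a small decoder. The sketch below assumes that the eight-digit hexadecimal values printed beside some instructions (for example 8cc00000 next to JL and 8cc00800 next to JNL) are complete 32-bit control words; that reading is an inference from the listings, and the function and key names are hypothetical.

```python
# Field layout taken from the bit list (B31 down to B11).
MODES = {0b100: "immediate", 0b010: "memory", 0b001: "register-register"}


def bits(word, high, low):
    """Return bits high..low (inclusive) of a 32-bit word as an integer."""
    return (word >> low) & ((1 << (high - low + 1)) - 1)


def decode_control_word(word):
    """Split a control word into the nine numbered fields."""
    return {
        "mode": MODES.get(bits(word, 31, 29), "unknown"),  # (1)
        "execution_control": bits(word, 28, 25),           # (2)
        "condition_control": bits(word, 24, 22),           # (3)
        "bhe": bits(word, 21, 21),                         # (4)
        "select_memory": bits(word, 20, 20),
        "memory_rw": bits(word, 19, 19),
        "memory_even_odd": bits(word, 18, 18),
        "wb_or_tty_select": bits(word, 17, 17),            # (5)
        "wb_input_select": bits(word, 16, 16),             # (6)
        "no_operation": bits(word, 15, 15),                # (7)
        "write_register": bits(word, 14, 14),              # (8)
        "oe_register": bits(word, 13, 13),
        "enable_write_select": bits(word, 12, 12),
        "invert_condition": bits(word, 11, 11),            # (9)
    }


jl = decode_control_word(0x8CC00000)   # value printed next to JL
jnl = decode_control_word(0x8CC00800)  # value printed next to JNL
print(jl["mode"], jl["invert_condition"], jnl["invert_condition"])
# immediate 0 1
```

Under that reading, the JL and JNL words differ only in B11, the Invert Condition bit, which fits JNL being the negation of JL.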
Tracing Some Instructions
MIPS Architecture based
For example, executing these two instructions sequentially:
I1: R1=R2+R3
I2: R4=R2 AND R1
I1: Fetching. I2: Still in memory.
I1: Decoding & RegFetch (R2, R3). I2: Fetching.
I1: Execute (R2 + R3). I2: Decoding & RegFetch (R2, R1).
I1: MEM [no operation] (R2 + R3). I2: Execute (R2 AND R1).
I1: Write Back, R1=(R2 + R3), data stored in R1. I2: MEM [no operation] (R2 AND R1). I2 fetched R1 before I1 wrote it back, so I2 computed with the old value of R1.
Solution: insert a NOP between the two instructions.
I1: R1=R2+R3
NOP
I2: R4=R2 AND R1
I1: Fetching. I2: Still in memory.
I1: Decoding & RegFetch (R2, R3). I2: Still in memory (NOP in the pipeline).
I1: Execute (R2 + R3). I2: Still in memory (NOP in the pipeline).
I1: MEM [no operation] (R2 + R3). I2: Still in memory (NOP in the pipeline).
I1: Write Back, R1=(R2 + R3), data stored in R1. I2: Fetching.
I1: Terminated. I2: Decoding & RegFetch (R2, R1).
I1: Terminated. I2: Execute (R2 AND R1).
I1: Terminated. I2: MEM [no operation] (R2 AND R1).
I1: Terminated. I2: Terminated.
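The delay in the trace above can be reproduced with a small scheduling sketch. It assumes a five-stage pipeline with one cycle per stage and the rule visible in the trace: an instruction that reads a register may not be fetched before the cycle in which the producer of that register writes back. The class and function names are hypothetical.

```python
from dataclasses import dataclass

STAGES = ["IF", "ID", "EX", "MEM", "WB"]


@dataclass
class Instr:
    name: str
    dest: str
    sources: tuple


def schedule(program):
    """Return the fetch cycle of each instruction (the first cycle is 1)."""
    fetch_cycles = []
    write_back_cycle = {}  # register -> cycle in which its producer is in WB
    next_free = 1
    for ins in program:
        fetch = next_free
        for reg in ins.sources:
            if reg in write_back_cycle:
                # Register fetch (ID = fetch + 1) must come after the producer's WB.
                fetch = max(fetch, write_back_cycle[reg])
        fetch_cycles.append(fetch)
        write_back_cycle[ins.dest] = fetch + len(STAGES) - 1
        next_free = fetch + 1
    return fetch_cycles


program = [
    Instr("I1", dest="R1", sources=("R2", "R3")),  # R1 = R2 + R3
    Instr("I2", dest="R4", sources=("R2", "R1")),  # R4 = R2 AND R1
]
print(schedule(program))  # [1, 5]
```

I2 is fetched in cycle 5, the cycle in which I1 writes back, instead of cycle 2, so the dependency costs three idle cycles in this model.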
Statistics & Comparisons
CISC vs RISC
CISC:
-Richer instruction set but a very complex circuit.
-Instructions generally take more than 1 clock to execute.
-Instructions of variable size.
RISC:
-Instructions execute in one clock cycle.
-Uniform-length instructions and a fixed instruction format.
-Simple instructions and circuit.
Speed:
With pipelining:
Each stage takes 4 clock cycles; there are 5 stages (IF, ID, EX, MEM, WB).
If the clock rate is 5 MHz, the time an instruction spends in each pipeline stage is 0.8 µsec.
Without pipelining:
If the clock rate is 5 MHz, the time to perform an instruction is 4 µsec.
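The two timing figures follow directly from the clock rate, the cycles per stage and the stage count. A minimal sketch of the arithmetic (the function names are hypothetical):

```python
def stage_time_us(clock_hz, cycles_per_stage):
    """Time an instruction spends in one stage, in microseconds."""
    return cycles_per_stage / clock_hz * 1e6


def unpipelined_time_us(clock_hz, cycles_per_stage, stages):
    """Time for one instruction to pass through all stages back to back."""
    return stages * stage_time_us(clock_hz, cycles_per_stage)


# 5 MHz clock, 4 clock cycles per stage, 5 stages (IF, ID, EX, MEM, WB).
print(stage_time_us(5_000_000, 4))           # about 0.8 µsec per stage
print(unpipelined_time_us(5_000_000, 4, 5))  # about 4 µsec per instruction
```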
Example program:
MOV r1,05h
MOV r2,04h
ADD r3,r1,r2
STODW r3,1234h
[Figure: pipelining timing diagram of the four instructions through IF, ID, EX, MEM and WB, with NOP slots]
Average number of stall cycles per instruction: 0.75
Speedup: 2.85
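These two figures are consistent with the common textbook estimate speedup = pipeline depth / (1 + stall cycles per instruction). The slides do not state the formula, so the sketch below is an assumption, as is the count of 3 stall cycles over the 4 instructions of the example program (3 / 4 = 0.75).

```python
def average_stalls(stall_cycles, instructions):
    """Average number of stall cycles per instruction."""
    return stall_cycles / instructions


def pipeline_speedup(depth, stalls_per_instruction):
    """Ideal-pipeline estimate: depth / (1 + stall cycles per instruction)."""
    return depth / (1 + stalls_per_instruction)


stalls = average_stalls(3, 4)  # assumed: 3 stall cycles, 4 instructions
print(stalls, round(pipeline_speedup(5, stalls), 3))  # 0.75 2.857
```

The result, about 2.857, matches the quoted speedup of 2.85 when truncated to two decimals.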
Thank you