Advanced Processor Architectures for Embedded Systems Witawas Srisa-an CSCE 496: Embedded Systems Design and Implementation.


1 Advanced Processor Architectures for Embedded Systems Witawas Srisa-an CSCE 496: Embedded Systems Design and Implementation

2 (R)evolution of Processors: Rock Hard, Ice Hard, Play-dough Hard

3 (R)evolution of Processors: Rock Hard, Ice Hard, Play-dough Hard. Hardwired general-purpose processors (GPPs) perform well in most conditions but not in extreme conditions.

4 (R)evolution of Processors: Rock Hard, Ice Hard, Play-dough Hard. GPP with FPGAs: custom designs perform well in some extreme conditions, but they require extensive knowledge of hardware design.

5 (R)evolution of Processors: Rock Hard, Ice Hard, Play-dough Hard. GPP with embedded programmable logic: reconfiguration is triggered by software.

6 (R)evolution of Processors: Ice Hard –Contains ASIC (application-specific IC) designs –Increases time-to-market –Takes time to reconfigure

7 Software Hotspots In DSP: –80% of the processing load is spent in 20% of the code. This is hand-tuned assembly that can take thousands of cycles to execute and is less portable. –The remaining 80% of the code implements complex system functions that run well on most GPPs.
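
The 80/20 split above bounds how much an accelerator can help. A minimal sketch using Amdahl's law, assuming the hotspot accounts for a fraction `hot` of the run time and is sped up by a factor `k`; the function name `overall_speedup` is illustrative and does not come from the slides:

```c
#include <assert.h>

/* Overall speedup when a fraction `hot` (0..1) of the run time is
 * accelerated by a factor `k` and the rest is left unchanged
 * (Amdahl's law). */
double overall_speedup(double hot, double k)
{
    assert(hot >= 0.0 && hot <= 1.0 && k > 0.0);
    return 1.0 / ((1.0 - hot) + hot / k);
}
```

With `hot = 0.8`, a 10x faster hotspot gives an overall speedup of about 3.57, and no accelerator can push the total past 5x, because the remaining 20% of the run time still executes on the GPP.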

8 Software Hotspots Example: a 16-QAM modem (19.2 kbaud) implemented entirely in software –Takes 177,000 instruction cycles to execute on a TI C6711 –Takes a few cycles on an FPGA co-processor

9 Solving Hotspots [Diagram of approaches: processor + FPGA; multiple DSPs; DSP-enabled processors; RISC processor with programmable logic]

10 An Example of a Configurable Processor (Stretch S5000)
[Block diagram. Labeled blocks: ALU, FPU, control, 32-bit RF, 128-bit WRF, ISEF (Instruction-Set Extension Fabric), MMU, I-cache 32KB, D-cache 32KB, data RAM 32KB, SRAM 256KB, I/O + DMA]
–S5 engine: common to all S5000 processors
–300 MHz Xtensa-V 32-bit RISC processor
–I/O subsystem: tailored to markets and applications
–ISEF: programmable logic data path inside the RISC processor
–32 x 128b wide registers + flexible wide load/store instructions

11 Programmable Logic Architecture [Diagram: RISC DP, Instruction Set Extension Fabric (ISEF), WR and AR register files, and memory, connected by 128-bit and 32-bit paths]

12 ISEF Resources
An ISEF includes:
–Computation resources
–Routing resources
–Pipeline resources
–State register resources
2 types of computation resources:
–4096 arithmetic units (AUs) for arithmetic and logic operations
–8192 multiplier units (MUs) for multiply and shift operations
Example: a single ISEF may implement
–32 16x16 multipliers
–128 32-bit ALUs

13 Wide Register
The wide register (WR) file holds WR data
–32 WR registers (128 bits each)
–Divided into 2 banks of 16 registers (WRA and WRB)
The WRA/WRB types associate a variable with WR bank A/B
–WRA v1, v2, v3;
–WRB w1, w2, w3;
The WR type defaults to WRA
–Use WRA/WRB to avoid unnecessary register moves between the two WR banks

14 Extension Instructions (EIs)
The power of the Software Configurable Processor (SCP) architecture comes from the ability to define new, complex instructions that operate on very wide data.
Working with an Extension Instruction takes 3 steps:
1. EI definition: write a Stretch-C function
2. EI compilation: compile the Stretch-C function
3. EI use: call the EI through its intrinsic in the application code (C/C++)

15 Extension Instructions
1. Define an Extension Instruction (writing Stretch-C, here in vector.xc):
    #include <stretch.h>
    SE_FUNC void V_AND8(WR v1, WR vMask, WR *vOut)
    {
        *vOut = v1 & vMask;
    }
2. Compile and link the EI (Stretch-C source file: *.xc)
3. Use the EI in C/C++ application code (calling intrinsics):
    #include "vector.h"
    WR v1, vMask, vOut;
    …
    WRL128I(&v1, (WR*) memSrc1Ptr, 0);
    V_AND8(v1, vMask, &vOut);
    WRS128I(vOut, (WR*) memDstPtr, 0);
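
The load, AND, store sequence in step 3 can be tried on an ordinary host compiler with a small model. This is only a sketch of the data flow in plain C, not Stretch-C: `wr128_t`, `wr_load`, `wr_store`, and `v_and` are hypothetical stand-ins for `WR`, `WRL128I`, `WRS128I`, and `V_AND8`, and the third argument of the real load/store intrinsics is not modeled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical host-side stand-in for one 128-bit wide register (WR). */
typedef struct { uint8_t b[16]; } wr128_t;

/* Copy 16 bytes from memory into the model register. */
void wr_load(wr128_t *dst, const void *src)
{
    assert(dst && src);
    memcpy(dst->b, src, sizeof dst->b);
}

/* Copy the model register back out to 16 bytes of memory. */
void wr_store(const wr128_t *src, void *dst)
{
    assert(src && dst);
    memcpy(dst, src->b, sizeof src->b);
}

/* Bitwise AND across all 128 bits, mirroring *vOut = v1 & vMask. */
void v_and(const wr128_t *v1, const wr128_t *mask, wr128_t *out)
{
    assert(v1 && mask && out);
    for (size_t i = 0; i < sizeof out->b; i++)
        out->b[i] = v1->b[i] & mask->b[i];
}
```

On the host this loop touches 16 bytes one at a time; the point of the EI is that the same operation on a full 128-bit register is issued as a single instruction.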

16 Extension Instructions
Extension Instructions:
–Are issued by the Xtensa processor
–Read source operands from the 128-bit WR and/or 32-bit AR register files
–Execute out of the ISEF
–Write destination operands to WR
Once the ISEF is configured with the new instruction, it may be
–Called as an intrinsic from application C code
–Used as an assembly instruction in an assembly source file

17 Writing Stretch-C Functions
    #include <stretch.h>
    SE_FUNC void V_AND128(WR v1, WR v2, WR *vOut)
    {
        *vOut = v1 & v2;
    }
–#include the stretch.h header file
–Stretch-C functions are identified by the keywords SE_FUNC void
–The EI name is the Stretch-C function name (for single-instruction functions)
–EI source and destination operands are defined by the Stretch-C function parameters
–The EI operation is defined by the statements in the Stretch-C function body

18 Extension Instruction Parameters 1
Extension Instructions are user-defined assembly instructions that use input and output operands
An Extension Instruction can specify up to 3 parameters
–0, 1, 2, or 3 inputs
–0, 1, or 2 outputs
Input and output parameters reside in register files
–Inputs come from the WR or AR register files
–Outputs may only be written to the WR register file
Assembly:
    # result = a + b
    ADD result, a, b
Stretch-C:
    // RESULT = A + B
    V_ADD4(A, B, &RESULT);

19 Extension Instruction Parameters 2
EI source operands (inputs) may include
–Up to 3 WR inputs (use WR, WRA or WRB)
–Up to 2 AR inputs (use int, short, etc.)
    SE_FUNC void FOO(int c1, WR v1, WRB *vOut){ }
EI destination operands (outputs) may include
–Up to 2 WR outputs, each writing a separate WR bank
–Use the C pointer notation for outputs
    SE_FUNC void FOO(WR v1, WRA *vOut1, WRB *vOut2){ }
A single WR parameter may be used as both an input and an output operand
    SE_FUNC void FOO(WR v1, WRA *vInOut1, WRB *vOut2){ }

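20 Example of Stretch-C: RGB2YCrCb
Luma and unscaled color differences (R - Y and B - Y):
Y = 0.299 R + 0.587 G + 0.114 B
Cr = 0.701 R - 0.587 G - 0.114 B
Cb = -0.299 R - 0.587 G + 0.886 B
Or, in 8-bit fixed point (coefficients scaled by 256, chroma scaled to the 8-bit range and offset by 128):
Y = (77R + 150G + 29B) >> 8
Cb = (-43R - 85G + 128B + 32768) >> 8
Cr = (128R - 107G - 21B + 32768) >> 8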

21 RGB2YCC
    SE_FUNC void rgb2ycc(WR A, WR *B)
    {
        se_sint r[5], g[5], b[5];
        se_sint y[5], cb[5], cr[5];
        int i, j;

        /* unpack A to RGB data, does not use any ISEF logic */
        for (i = 0; i < 5; i++) {
            j = i * 3 * 8;
            r[i] = A(j+7, j);
            g[i] = A(j+15, j+8);
            b[i] = A(j+23, j+16);
        }

        /* converting 5 pixels */
        for (i = 0; i < 5; i++) {
            y[i]  = ( 77*r[i] + 150*g[i] +  29*b[i]) >> 8;
            cb[i] = (-43*r[i] -  85*g[i] + 128*b[i] + 32768) >> 8;
            cr[i] = (128*r[i] - 107*g[i] -  21*b[i] + 32768) >> 8;
        }

        /* pack YCbCr to B */
        *B = (cr[4],cb[4],y[4],cr[3],cb[3],y[3],cr[2],cb[2],y[2],
              cr[1],cb[1],y[1],cr[0],cb[0],y[0]);
    }
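
To check the output of an EI like rgb2ycc, a plain-C reference version of the same arithmetic is useful. The sketch below is host code, not Stretch-C; the names `rgb_to_ycc` and `rgb_to_ycc_x5` are illustrative, and the byte layout (byte 3*i holds R or Y, 3*i+1 holds G or Cb, 3*i+2 holds B or Cr) is an assumption that mirrors the bit ranges unpacked above.

```c
#include <assert.h>
#include <stdint.h>

/* Convert one pixel with the integer coefficients from the slides.
 * Every intermediate sum stays in 0..65535, so the shift is well defined
 * and the result always fits in 8 bits. */
void rgb_to_ycc(uint8_t r, uint8_t g, uint8_t b,
                uint8_t *y, uint8_t *cb, uint8_t *cr)
{
    assert(y && cb && cr);
    *y  = (uint8_t)(( 77 * r + 150 * g +  29 * b) >> 8);
    *cb = (uint8_t)((-43 * r -  85 * g + 128 * b + 32768) >> 8);
    *cr = (uint8_t)((128 * r - 107 * g -  21 * b + 32768) >> 8);
}

/* Convert five packed pixels (15 bytes in, 15 bytes out), the amount of
 * data one 128-bit register carries in the rgb2ycc example. */
void rgb_to_ycc_x5(const uint8_t in[15], uint8_t out[15])
{
    for (int i = 0; i < 5; i++)
        rgb_to_ycc(in[3 * i], in[3 * i + 1], in[3 * i + 2],
                   &out[3 * i], &out[3 * i + 1], &out[3 * i + 2]);
}
```

For example, white (255, 255, 255) maps to Y = 255, Cb = 128, Cr = 128, and pure red (255, 0, 0) maps to Y = 76, Cb = 85, Cr = 255.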

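22 Stretch Compiler
[Build-flow diagram]
–Stretch compile: rgb2ycc.xc goes through scc to give libei.h and libei.a
–Compile: rgb2ycc.c goes through scc to give rgb2ycc.o
–Link: scc links these into rgb2ycc.exe
–Run: rgb2ycc.exe runs on the target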

23 Compiler Option S5000

24 Summary
Software Configurable Processor
–Describe hardware using C/C++, but this is not trivial: a basic understanding of the architecture is needed
–Reconfiguration can take place in 150 microseconds
2 ISEFs per chip
–Can ping-pong between them
Configuration files are stored in SDRAM
–Use DMA to preload information
The ISEF is proprietary and is not an FPGA
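
To put the 150-microsecond reconfiguration time in perspective, it helps to convert it to clock cycles. The sketch below assumes the 300 MHz clock quoted for the S5 engine and uses a deliberately simplified ping-pong model in which a configuration load on one ISEF overlaps completely with a kernel running on the other; the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Clock cycles that elapse during `micros` microseconds at `mhz` MHz
 * (one MHz is one cycle per microsecond). */
uint64_t cycles_in(uint64_t micros, uint64_t mhz)
{
    assert(mhz > 0);
    return micros * mhz;
}

/* Simplified ping-pong model: the next configuration loads into one ISEF
 * while a kernel runs on the other. The load is fully hidden only if the
 * running kernel lasts at least as long as the reconfiguration. */
int reconfig_hidden(uint64_t kernel_cycles, uint64_t reconfig_cycles)
{
    return kernel_cycles >= reconfig_cycles;
}
```

Here `cycles_in(150, 300)` is 45,000 cycles, so under this model a kernel needs to run for at least that long on one ISEF to fully hide a reconfiguration of the other.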
