1 Codesign Extended Applications
Brian Grattan, Greg Stitt, Frank Vahid*
Dept of Computer Science & Engineering, University of California, Riverside
*Also with the Center for Embedded Computer Systems at UC Irvine
This work was supported in part by the National Science Foundation and by NEC C&C Research Labs

2 CODES’02 – Codesign Extended Applications Brian Grattan, Greg Stitt, Frank Vahid, Univ. of California, Riverside
Outline
  Introduction: Hardware/Software Partitioning
    And the common assumption of a single specification
  Different Algorithms in Hardware/Software
  Codesign Extended Applications
  Experiments
  Future Work and Conclusions

3 Introduction – Hw/Sw Partitioning
Hw/sw partitioning can speed up software
  Shown by numerous researchers
    E.g., Balboni, Fornaciari, Sciuto CODES’96; Eles, Peng, Kuchcinski, Doboli DAES’97; Gajski, Vahid, Narayan, Gong Prentice-Hall 1997; Grode, Knudsen, Madsen DATE’98; many others
  1.5x to 10x speedups are common
  Some examples, like image processing, get 100-800x speedup
    E.g., Cameron project, FCCM’02
Can reduce energy too
  E.g., Henkel, Li CODES’98; Wan, Ichikawa, Lidsky, Rabaey CICC’98; Stitt, Grattan, Villarreal, Vahid FCCM’02
  60-80% energy savings measured on real single-chip uP/FPGA devices

4 Hw/Sw Partitioning on Single-Chip Platforms
Numerous single-chip commercial devices with uP and FPGA
  Triscend E5 (shown)
  Triscend A7
  Atmel FPSLIC
  Xilinx Virtex II Pro
  Altera Excalibur
  More sure to come…
Make hw/sw partitioning even more attractive
Figure labels (Triscend E5): uP and peripherals; Cache/memory; Configurable logic

5 Hw/Sw Partitioning – Commercial Tools Evolving
Commercial products evolving
  Synopsys’ Nimble compiler (2000) attempt
  Proceler
    Microprocessor Report’s 2001 Technology of the Year Award
  Others coming…

6 Hw/Sw Partitioning – Single-Spec Assumption
Assumption – Start from a single specification
  Typically sw source
Partitioning
  Find critical sw kernels, map some to hw
This assumption is made in most research efforts as well as commercial tools
Figure labels: Specification; Hw/sw partitioner; Sw, Hw; Compilation, Synthesis; Binaries, Netlists

7 Digital Camera Example
Developed with intent of exploring hw/sw tradeoffs
  Captures images, compresses, uploads to PC
Soon found that a single specification wasn’t reasonable
  Two key functions had different hw/sw algorithms
    CRC
    DCT
Figure labels: Controller; Communications; DCT; CCD Pre-Processor; Huffman Encoder; CRC calculation

8 Digital Camera Example
Partitioning the single specification results in a weak hw design
  We would have written CRC and DCT differently had we known they’d be mapped to hw
  Yet, we’d keep the original algorithms if they ended up in software
Figure labels: Spec: DCT, Huffman, CRC, CCD, Ctrl; Hw/sw partitioner; Sw: Huff., CCD, Ctrl; Hw: CRC, DCT; Compilation, Synthesis; Binaries, Netlists; annotation: Weak

9 Different Algorithms in Hw vs. Sw
The single-specification assumption doesn’t always hold
Key observation
  Designers often use very different algorithms if a behavior is mapped to hardware versus if that behavior is mapped to software
  Widely known by designers
  In textbooks
  Also known in parallel processing – sequential and parallel algorithms

10 Different Algorithms – Sorting Example
Suppose desired behavior fills a buffer, sorts the buffer, and transmits the sorted list
  Fill()
  Sort()
  Transmit()
Sort() in software – Quicksort
  Simple and fast in sw
  Poor in hw, can’t be parallelized well
Sort() in hardware – Parallel Mergesort
  Very fast in hardware
  Slow in sw (if sequential) due to overhead
Derive one from the other?
Figure labels: Quicksort; MS; …
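To make the sorting contrast concrete, here is a minimal sketch of a bottom-up mergesort in C. It is not the authors' code; the names `merge_runs` and `merge_sort_passes` are assumptions made for illustration. The point it shows is structural: within one pass, every merge works on its own slice of the buffer, which is what lets a hardware design run the merges side by side, while a sequential processor pays for each merge and for the copy back.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Merge two adjacent sorted runs src[lo..mid) and src[mid..hi) into dst. */
static void merge_runs(const int *src, int *dst, size_t lo, size_t mid, size_t hi)
{
    size_t i = lo, j = mid, k = lo;
    while (i < mid && j < hi)
        dst[k++] = (src[j] < src[i]) ? src[j++] : src[i++];
    while (i < mid)
        dst[k++] = src[i++];
    while (j < hi)
        dst[k++] = src[j++];
}

/* Bottom-up mergesort: sorts buf in place, using tmp (same length) as scratch. */
void merge_sort_passes(int *buf, int *tmp, size_t n)
{
    assert((buf != NULL && tmp != NULL) || n == 0);
    for (size_t width = 1; width < n; width *= 2) {
        /* Every iteration of this inner loop touches a disjoint slice, so
           hardware could give each slice its own merger and run them all
           in the same pass. Software executes them one after another. */
        for (size_t lo = 0; lo < n; lo += 2 * width) {
            size_t mid = (lo + width < n) ? lo + width : n;
            size_t hi = (lo + 2 * width < n) ? lo + 2 * width : n;
            merge_runs(buf, tmp, lo, mid, hi);
        }
        memcpy(buf, tmp, n * sizeof buf[0]);
    }
}
```

Calling `merge_sort_passes(a, scratch, n)` on an `int` array takes about log2(n) passes, and the number of independent merges per pass is the parallelism a hardware version could exploit.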

11 Different Algorithms – CRC Example
CRC – Cyclic Redundancy Check
  Used for error checking during communication, stronger than parity
  Mathematically, divides a constant into the data and saves the remainder
Main function … calls crc() with parameters:
  init_crc – initial value
  *data – pointer to data
  len – length of data
  jinit – initializing options
crc() returns: value of CRC for given data
Figure labels: crc / data / data / data
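The "divide a constant into the data and save the remainder" description can be sketched as a bit-serial loop. This is an illustration under stated assumptions, not the presentation's code: the generator 0x1021 (x^16 + x^12 + x^5 + 1), the most-significant-bit-first order, and the name `crc16_bitwise` are all assumed here, and the buffer is indexed from zero rather than from one as on the slides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CRC16_POLY 0x1021u /* assumed generator: x^16 + x^12 + x^5 + 1 */

/* Bit-serial CRC-16: each shift is one step of the long division of the
   message by the generator; what stays in the register is the remainder. */
uint16_t crc16_bitwise(uint16_t crc, const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(data[i] << 8); /* bring the next byte into the top */
        for (int b = 0; b < 8; b++) {
            if (crc & 0x8000u)
                crc = (uint16_t)((crc << 1) ^ CRC16_POLY); /* subtract generator */
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}
```

With these assumptions, the nine ASCII bytes "123456789" and an initial value of 0 give 0x31C3, the published check value for this generator and bit order.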

12 Different Algorithms – CRC in Hardware

    unsigned short crc_hw(…)   /* returns the 16-bit CRC */
    {
      unsigned short j, crc_value = init_crc;
      unsigned short new_crc_value;
      if (jinit >= 0) crc_value = ((uchar) jinit) | (((uchar) jinit) << 8);
      for (j = 1; j <= len; j++) {
        new_crc_value = bit(4,data[j]) ^ bit(0,data[j]) ^ bit(8,crc_value) ^ bit(12,crc_value);  // bit 0
        new_crc_value = new_crc_value | (bit(5,data[j]) ^ bit(1,data[j]) ^ bit(9,crc_value) ^ bit(13,crc_value)) << 1;
        new_crc_value = new_crc_value | (bit(6,data[j]) ^ bit(2,data[j]) ^ bit(10,crc_value) ^ bit(14,crc_value)) << 2;
        … continue for bits 3 through 7 …
      }
      return (new_crc_value);
    }

Hardware Version
  Knowing the generator polynomial, one can calculate the XORs for each individual bit
  Each CRC value is the result of bit-wise XORs with the data and the previous CRC value
  Synthesizes to hw very nicely, but getting bits and shifting are inefficient in sw
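The claim that the per-bit XORs follow from the generator polynomial can be checked mechanically. The sketch below assumes the generator is 0x1021 (x^16 + x^12 + x^5 + 1) processed most significant bit first; that is an assumption, though the three equations shown on the slide are consistent with it. It advances the CRC by one byte with a serial loop and compares the low three result bits against the slide's equations. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Extract bit n of v, mirroring the bit() helper used on the slide. */
static unsigned bit(unsigned n, unsigned v)
{
    assert(n < 16);
    return (v >> n) & 1u;
}

/* Reference: advance a 16-bit CRC by one data byte, one bit at a time
   (assumed generator 0x1021, most significant bit first). */
uint16_t crc16_step_serial(uint16_t crc, uint8_t d)
{
    crc ^= (uint16_t)(d << 8);
    for (int b = 0; b < 8; b++)
        crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x1021u)
                              : (uint16_t)(crc << 1);
    return crc;
}

/* Bits 0..2 of the next CRC, computed with the slide's XOR equations. */
unsigned crc16_low3_parallel(uint16_t crc, uint8_t d)
{
    unsigned b0 = bit(4, d) ^ bit(0, d) ^ bit(8, crc) ^ bit(12, crc);
    unsigned b1 = bit(5, d) ^ bit(1, d) ^ bit(9, crc) ^ bit(13, crc);
    unsigned b2 = bit(6, d) ^ bit(2, d) ^ bit(10, crc) ^ bit(14, crc);
    return b0 | (b1 << 1) | (b2 << 2);
}

/* Returns 1 if the equations match the serial reference for every
   (crc, byte) pair, 0 otherwise. */
int crc16_equations_agree(void)
{
    for (unsigned crc = 0; crc <= 0xFFFFu; crc++)
        for (unsigned d = 0; d <= 0xFFu; d++)
            if ((crc16_step_serial((uint16_t)crc, (uint8_t)d) & 7u) !=
                crc16_low3_parallel((uint16_t)crc, (uint8_t)d))
                return 0;
    return 1;
}
```

In hardware each equation becomes a small XOR tree evaluated in parallel. In software each `bit()` call costs a shift and a mask, which is the inefficiency the slide points out.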

13 Different Algorithms – CRC in Software
Software Version
  Before doing any calculations, create an initialization table that holds the CRC for each individual character
  Use data as index into initialization table and execute two XORs
  Requires lookups, but faster for a sequential calculation

    unsigned short crc_sw(…)   // Source: Numerical Recipes in C
    {
      unsigned short initialize_table(unsigned short crc, unsigned char one_char);
      static unsigned short icrctb[256];
      static unsigned short init = 0;   /* table is built on the first call only */
      unsigned short tmp1, j, crc_value = init_crc;
      if (!init) {
        init = 1;
        for (j = 0; j <= 255; j++) {
          icrctb[j] = initialize_table(j << 8, (uchar)0);
        }
      }
      if (jinit >= 0) crc_value = ((uchar) jinit) | (((uchar) jinit) << 8);
      for (j = 1; j <= len; j++) {
        tmp1 = data[j] ^ HIBYTE(crc_value);
        crc_value = icrctb[tmp1] ^ (LOBYTE(crc_value) << 8);
      }
      return (crc_value);
    }
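For readers who want a version that compiles on its own, here is a minimal table-driven sketch of the lookup approach. It is not the Numerical Recipes routine: the names `crc16_table` and `crc16_one_byte` are hypothetical, the generator 0x1021 with most-significant-bit-first order is assumed, and the buffer is indexed from zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint16_t crc_table[256];
static int crc_table_ready = 0;

/* CRC of one byte placed in the high half of a cleared register
   (assumed generator 0x1021, most significant bit first). */
static uint16_t crc16_one_byte(uint8_t b)
{
    uint16_t crc = (uint16_t)(b << 8);
    for (int i = 0; i < 8; i++)
        crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x1021u)
                              : (uint16_t)(crc << 1);
    return crc;
}

/* Table-driven CRC-16: one lookup and two XORs per input byte. */
uint16_t crc16_table(uint16_t crc, const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0);
    if (!crc_table_ready) { /* build the 256-entry table on first use */
        for (unsigned b = 0; b < 256; b++)
            crc_table[b] = crc16_one_byte((uint8_t)b);
        crc_table_ready = 1;
    }
    for (size_t i = 0; i < len; i++) {
        uint8_t idx = (uint8_t)(data[i] ^ (crc >> 8)); /* data XOR high byte */
        crc = (uint16_t)(crc_table[idx] ^ (crc << 8)); /* entry XOR low byte << 8 */
    }
    return crc;
}
```

The table costs 512 bytes of memory, which is cheap on a microprocessor with a cache but expensive as FPGA logic. That fits the slides' point that this version suits software rather than hardware.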

14 Different Algorithms – DCT
DCT – Discrete Cosine Transform
  Computationally intensive, numerous matrix multiplies
  Accounts for perhaps 70% of JPEG encoding time
Dozens of possible algorithms
  Best algorithm depends largely on computational resources
  Certainly different for sw and hw
Doing multiplications in floating-point vs. fixed-point
  Multiplication by a constant can be efficiently mapped to hardware, but accuracy will be lost by not using floating-point
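The floating-point versus fixed-point tradeoff can be illustrated with a single constant multiplication. The sketch below is an assumption-laden example, not taken from the authors' DCT: it scales an 8-bit sample by cos(pi/4), once in floating point and once with the constant quantized to 8 fractional bits (181/256). All names and the choice of 8 fractional bits are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define FRAC_BITS   8                  /* assumed fixed-point precision */
#define COS_PI_4    0.7071067811865476 /* cos(pi/4), a typical DCT coefficient */
#define COS_PI_4_Q8 181u               /* round(COS_PI_4 * 2^FRAC_BITS) */

/* Floating-point reference result. */
double scale_float(uint8_t x)
{
    return x * COS_PI_4;
}

/* Fixed-point result: an integer multiply by a constant, then a rounding
   shift. In hardware the constant multiply reduces to shifts and adds. */
uint32_t scale_fixed(uint8_t x)
{
    uint32_t product = (uint32_t)x * COS_PI_4_Q8;
    return (product + (1u << (FRAC_BITS - 1))) >> FRAC_BITS;
}

/* Absolute difference between the two results for one sample. */
double scale_error(uint8_t x)
{
    double e = (double)scale_fixed(x) - scale_float(x);
    return e < 0.0 ? -e : e;
}

/* Self-check: for every 8-bit sample the fixed-point result stays within
   one unit of the floating-point result. */
void check_error_bound(void)
{
    for (unsigned x = 0; x <= 255u; x++)
        assert(scale_error((uint8_t)x) < 1.0);
}
```

The error comes from two places: the quantized constant and the rounding of the result to an integer. Adding fractional bits shrinks the first at the cost of wider hardware multipliers.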

15 Codesign Extended Applications (CEAs)
Basic idea: Write two versions of certain functions
  Only the critical functions, and
  Only those with different sw and hw algorithms
  Typically only a handful of these
    Most time is spent in just a few critical functions
Include both function versions in the specification
  But use compiler flags to include either sw or hw version

    main() { … crc(); … }

    unsigned short crc(…)
    {
    #ifdef cea_crc_hw
      return crc_hw(…);   /* hardware-oriented algorithm */
    #else
      return crc_sw(…);   /* software-oriented algorithm */
    #endif
    }

    % gcc -Dcea_crc_hw main.c

16 CEAs when using C/C++ and VHDL
C code

    crc_hw(…inputs…)
    /* Hardware crc... */
    for (j = 1; j <= len; j++) {
      TSHORT(to_hw) = data[j];   /* send the next character to the hw block */
      TBYTE(enable) = 1;         /* pulse enable so the hw consumes it */
      TBYTE(enable) = 0;
    }
    crc_value = TSHORT(result);  /* read the finished CRC back */
    return (crc_value);

VHDL code

    if (rst = '1') then
      crc <= "0000000000000000";
      done <= '0';
    elsif (clk'event and clk = '1') then
      if (enable = '1') then
        if done = '0' then
          crc <= nextCRC16_D8(input, crc);
          done <= '1';
        end if;
      else
        done <= '0';
        output <= crc;
      end if;
    end if;

17 CEAs Enable Hw/Sw Partitioning Tool
Traditional hw/sw partitioner
  Compiler, estimators, search heuristics, technology files, etc.
  Drawback: heavy impact on tool flow
CEAs plus platforms result in simple partitioner
  Script uses existing compiler, synthesis, and evaluation (simulation or physical measurement)
  Drawbacks: must write two versions of critical functions, script may use simpler search function
Different partitioners for different domains
Figure labels (traditional flow): Specification; Hw/sw partitioner; Sw, Hw; Compilation, Synthesis; Binaries, Netlists. Essentially a compiler, search heuristic, and estimator. Heavy-duty tool.
Figure labels (CEA flow): CEA; Script; Sw, Hw; Compilation, Synthesis; Binaries, Netlists; Evaluator. Search heuristic and tool control. Lightweight tool.
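The search part of such a lightweight script can be sketched as an exhaustive loop over hw/sw assignments. This is a hypothetical illustration, not the authors' script. The names `cea_search`, `cea_evaluator`, and `toy_evaluator` are invented, and the toy cost and area numbers are made up solely so the example runs. In a real flow the evaluator would rebuild the application with the matching -D flags, run synthesis, and then simulate or measure.

```c
#include <assert.h>
#include <stddef.h>

/* Evaluator callback: returns a cost (for example energy) for a partition,
   or a negative value if the partition is infeasible (for example, the hw
   part does not fit). Bit i of mask set means function i uses its hw version. */
typedef double (*cea_evaluator)(unsigned mask, void *ctx);

/* Try every hw/sw assignment of n CEA functions and return the mask with
   the lowest cost. *best_cost is left at -1.0 if nothing was feasible. */
unsigned cea_search(unsigned n, cea_evaluator eval, void *ctx, double *best_cost)
{
    assert(n < 16 && eval != NULL && best_cost != NULL);
    unsigned best_mask = 0;
    *best_cost = -1.0;
    for (unsigned mask = 0; mask < (1u << n); mask++) {
        double cost = eval(mask, ctx);
        if (cost < 0.0)
            continue; /* infeasible partition, skip it */
        if (*best_cost < 0.0 || cost < *best_cost) {
            *best_cost = cost;
            best_mask = mask;
        }
    }
    return best_mask;
}

/* Hypothetical stand-in for "compile, synthesize, measure": three functions
   with made-up savings and hw areas, and a made-up capacity of 4 area units. */
double toy_evaluator(unsigned mask, void *ctx)
{
    static const double saving[3] = {4.0, 3.0, 2.0};
    static const unsigned area[3] = {3, 2, 2};
    const unsigned capacity = 4;
    double cost = 12.0;
    unsigned used = 0;
    (void)ctx;
    for (unsigned i = 0; i < 3; i++) {
        if (mask & (1u << i)) {
            cost -= saving[i];
            used += area[i];
        }
    }
    return (used > capacity) ? -1.0 : cost;
}
```

Exhaustive search only works because CEAs keep the number of dual-version functions small. With more than a handful, the script would need a smarter search heuristic, which is one of the drawbacks the slide names.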

18 Experiments
Compared hw and sw CRC algorithms
  Synthesized to FPGA
  Compiled to MIPS uP
Demonstrates need for different algorithms

Sw and hw CRC algorithms in FPGA. Columns: Size (Blocks), Delay (clock cycles/character)
  Hardware CRC algorithm  191
  Software CRC algorithm  443

Sw and hw CRC algorithms on a microprocessor. Columns: Size (Assembly Lines), Clock Cycles
  Software CRC algorithm  1061180,000
  Hardware CRC algorithm  1298814,000

19 Experiments
Wrote small signal processing example as CEA
  Wrote sw and hw versions of core functions
  In this case, algorithms were similar
Set up power measurement for two real platforms
  XS40 (board with microcontroller chip and Xilinx FPGA chip)
  E5 (single chip with microcontroller and FPGA)
Partitioning script automatically partitioned and measured power and cycles (overnight, due to place & route time)
Demonstrates how CEAs enable simple yet practical hw/sw partitioning
  Easily migrates to different platforms, different chips

Partitioning and energy on E5 device
  Multiply  Sum  Bit-Share  Energy (Joules)
  SW        SW   SW         12.4
  SW        SW   HW         8.6
  SW        HW   SW         8.8
  HW        SW   SW         8.0
  SW        HW   HW         4.8
  HW        SW   HW         Does not route
  HW        HW   SW         Does not route
  HW        HW   HW         Does not route

20 Issues and Future Work
Issues
  What if hw versions are not used after partitioning? Wasted effort?
  Verification of all possible combinations?
  Must use wisely or problem grows unwieldy
Future work
  More examples, more platforms
  Several versions of the same function
    One hardware area-conscious
    One hardware speed-conscious
    One software code-size-conscious
    One software speed-conscious
    …more…
  Experimenting with communication between hardware and software
    DMA transfer, wide-access memories, …

21 Conclusions
Basic hw/sw partitioning assumption of a single specification doesn’t always hold
Codesign Extended Applications help support different algorithms
CEAs enable hw/sw partitioning in existing tool flows
  Utilizes existing compilation, synthesis, mapping, evaluation tools, and platforms
Simple yet effective approach to hw/sw partitioning

