Hardware accelerator for PPC microprocessor Final presentation By: Instructor: Kopitman Reem Fiksman Evgeny Stolberg Dmitri.

1 Hardware accelerator for PPC microprocessor Final presentation By: Instructor: Kopitman Reem Fiksman Evgeny Stolberg Dmitri

2 Agenda Ways to implement an algorithm Starting with ASC HW architecture SW architecture System optimization Generic module (iDCT) Timing results

3 Abstract Problem: There are complex functions (e.g. FFT) which take a lot of CPU resources. Consider the ways of implementing such functions and choose the best solution according to specified constraints. Solutions: Pure SW implementation; Pure HW implementation; Combined HW + SW (ASC technology)

4 Abstract SW: low cost, low performance. HW: high cost, high performance. Combined HW + SW.

5 Project Goals Study of ASC (A Stream Compiler). Study of the functions in the PamDC library. Implementation of an interface between a generic module and the CPU using ASC. Implementation of a specific module to test the interface. Implementation of the same module in SW, and conclusions about performance.

6 ASC - A Stream Compiler Combined (SW/HW) code. Familiar C++ writing. Generates flexible HW. Standard netlist output (EDIF). Supported by standard CAD tools. Provides HW optimization. UNIX oriented.

7 ASC – code example

    #include "asc.h"
    main(int argc, char **argv) {
      printf("Hello World\n");
      STREAM_START;            // ASC code start
      // Hardware Variable Declarations
      HWint in(IN);
      HWint out(OUT);
      HWint tmp(TMP);
      STREAM_LOOP(16);
      tmp = (in << 1) + 55;
      out = tmp;
      STREAM_END;              // ASC code end
    }

Software output: Hello World
Hardware output: 55 57 59 61 63 65 67 69 71 73 75 77 79 81 83 85 87
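The hardware output listed on slide 7 can be reproduced with a plain C++ software model of the stream body. This is only a sketch: it assumes the input stream is the sequence 0, 1, ..., 16 (the transcript does not say what was fed in), and the names `stream_body` and `run_stream` are hypothetical and not part of ASC.

```cpp
#include <cassert>
#include <vector>

// Software model of the stream body from the slide:
//   tmp = (in << 1) + 55; out = tmp;
int stream_body(int in) {
    int tmp = (in << 1) + 55;
    return tmp;
}

// Feed the assumed input stream 0, 1, ..., last_input through the model
// and collect the outputs in order.
std::vector<int> run_stream(int last_input) {
    assert(last_input >= 0);
    std::vector<int> out;
    for (int in = 0; in <= last_input; ++in) {
        out.push_back(stream_body(in));
    }
    return out;
}
```

Calling `run_stream(16)` yields 17 odd values starting at 55 and ending at 87, the same sequence the slide lists under "Hardware".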

8 System components Memec evaluation board: Xilinx Virtex II Pro FPGA with PPC405; JTAG; LCD and serial port for debug. SW tools: Xilinx EDK, Xilinx Platform Studio, ChipScope

9 Design Approach - general [Block diagram. Labels: FPGA module; PPC405 Processor; Memory EDAC; Memory DRAM; ASC Peripheral module; Monitor module; other peripheral; System Bus (PLB)]

10 ASC interface (General view) [Block diagram. Blocks: DMA engine; DMA Buffer; Serdes; Generic Module; Interrupt controller; FIFO_in; FIFO_out; PLB bus. Signals: Data, Addr, CTRL, Fifo_full, Data_in, Data_out]

11 SW review – main algorithm [Flowchart] Start/reset; System blocks initialization (FIFO, DMA, GPIO, LCD); DMA busy? (Yes/No); Write data packets to ASC application; Calculation complete? (Yes/No); Read data packets from ASC application
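One plausible reading of the flowchart on slide 11 is a pair of polling loops: wait while the DMA is busy, write the data packets, wait until the calculation completes, then read the results. The sketch below follows that reading. It runs against a simulated device, and every name in it (`SimulatedAccelerator`, `dma_busy`, `write_packet`, `calculation_complete`, `read_packet`, `run_transfer`) is hypothetical; none of them come from the Xilinx drivers or from the original project code.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical stand-in for the ASC peripheral behind the DMA and FIFOs.
struct SimulatedAccelerator {
    std::deque<std::uint32_t> fifo_in;
    std::deque<std::uint32_t> fifo_out;
    int busy_polls = 2;  // pretend the DMA reports busy for the first two polls

    bool dma_busy() {
        if (busy_polls > 0) { --busy_polls; return true; }
        return false;
    }
    void write_packet(std::uint32_t word) { fifo_in.push_back(word); }

    // Simulated "calculation": each poll moves one word from fifo_in to
    // fifo_out unchanged, and reports completion once fifo_in is drained.
    bool calculation_complete() {
        if (fifo_in.empty()) return true;
        fifo_out.push_back(fifo_in.front());
        fifo_in.pop_front();
        return fifo_in.empty();
    }
    std::uint32_t read_packet() {
        assert(!fifo_out.empty());
        std::uint32_t word = fifo_out.front();
        fifo_out.pop_front();
        return word;
    }
};

// Main loop under the assumed branch directions of the flowchart.
std::vector<std::uint32_t> run_transfer(SimulatedAccelerator& dev,
                                        const std::vector<std::uint32_t>& data) {
    while (dev.dma_busy()) {}                        // DMA busy: keep polling
    for (std::uint32_t word : data) dev.write_packet(word);
    while (!dev.calculation_complete()) {}           // not complete: keep polling
    std::vector<std::uint32_t> result;
    while (!dev.fifo_out.empty()) result.push_back(dev.read_packet());
    return result;
}
```

On real hardware the two busy-wait loops would more likely be replaced by the interrupt controller shown on slide 10; polling is used here only to keep the sketch short.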

12 SW review – C code fundamentals DMA: control and data TX/RX functions. LCD: setup and data TX functions. Data size manipulation. Timer control functions. MASK definitions (user-friendly orientation).

13 iDCT abstract Reconstructs an image or audio block from its discrete cosine transform. Why iDCT? It is a complex iterative algorithm which takes a lot of CPU resources.

14 ASC design – iDCT module Discrete Cosine Transform: this transform is utilized in the current standards for still image (JPEG) and video (MPEG) compression. The principle: Xm - matrix of discrete samples (iDCT samples); Tm - cosine coefficient matrix; Fm - DCT matrix
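The formula relating the three matrices on slide 14 did not survive in the transcript. In the textbook matrix form of the transform, with an orthonormal coefficient matrix T, the forward transform is F = T * X * T^T and the inverse is X = T^T * F * T. The sketch below implements that textbook form for 8x8 blocks; the block size, the orthonormal scaling, and all function names are assumptions and may differ from what the project's hardware module actually used.

```cpp
#include <array>
#include <cmath>

constexpr int N = 8;  // assumed block size (8x8, as in JPEG/MPEG)
using Matrix = std::array<std::array<double, N>, N>;

// Orthonormal DCT-II coefficient matrix T (textbook definition).
Matrix make_T() {
    const double pi = std::acos(-1.0);
    Matrix T{};
    for (int k = 0; k < N; ++k) {
        const double scale = (k == 0) ? std::sqrt(1.0 / N) : std::sqrt(2.0 / N);
        for (int n = 0; n < N; ++n) {
            T[k][n] = scale * std::cos((2 * n + 1) * k * pi / (2.0 * N));
        }
    }
    return T;
}

Matrix multiply(const Matrix& A, const Matrix& B) {
    Matrix C{};
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            for (int k = 0; k < N; ++k)
                C[i][j] += A[i][k] * B[k][j];
    return C;
}

Matrix transpose(const Matrix& A) {
    Matrix At{};
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            At[j][i] = A[i][j];
    return At;
}

// Forward transform: F = T * X * T^T
Matrix dct(const Matrix& X) {
    const Matrix T = make_T();
    return multiply(multiply(T, X), transpose(T));
}

// Inverse transform: X = T^T * F * T
Matrix idct(const Matrix& F) {
    const Matrix T = make_T();
    return multiply(multiply(transpose(T), F), T);
}
```

Because T is orthonormal, `idct(dct(X))` returns X up to floating-point rounding. The two 8x8 matrix products per block are the multiply-accumulate workload that makes the iDCT a natural candidate for hardware offload.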

15 ASC design – Optimization (1) ASC supports optimization for: latency, throughput, area. For large amounts of data: throughput (calculation time is optimized).

16 ASC design – Optimization (2) Optimization… Throughput, Area, Latency?

    Optimization    Area               Throughput         Latency
    Max latency     3 stream cycles    9 stream cycles    1 stream cycle
    Stream cycle    28 clk8 clk

17 ASC design – Optimization (3) Optimization – Area consumption. Absolute values refer to Xilinx Virtex II Pro XC2VP7 FPGA. Percentages are relative to the total number available (9,856).

    Optimization                               Latency        Throughput     Area
    FF used in total design                    2,905          4,440          3,883
    4 input LUTs in total design               7,422 (75%)    7,850 (79%)    5,915 (60%)
    FF used for ASC                            264 (3%)       1,799 (18%)    1,242 (13%)
    4 input LUTs for ASC                       3,509 (36%)    3,937 (40%)    2,002 (20%)
    Total equivalent gate count for design     1,387,906      1,442,767      1,370,908
    Total gate count comparing to empty        44,345         99,206         27,347

18 ASC design – Optimization (4) Optimization – Area Consumption Optimization by latency is the choice. Best throughput and latency, with average area consumption

19 Clock calculations [Flowchart] Get time 1; Set DMA control; Tx / Rx data packet complete? (Yes/No); Get time 2; Calc_time = time2 – time1; LCD write: data + calculation time
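The measurement on slide 19 is the usual pattern of taking a timestamp before and after the transfer and subtracting. A minimal host-side sketch of that pattern is shown below. It uses `std::chrono::steady_clock` as a stand-in; the original project presumably read a timer on the PPC405 instead (slide 12 mentions timer control functions), and the name `measure_us` is hypothetical.

```cpp
#include <chrono>
#include <cstdint>

// Runs `work` once and returns the elapsed time in microseconds,
// mirroring: Get time 1 -> do the transfer -> Get time 2 -> time2 - time1.
template <typename Work>
std::int64_t measure_us(Work&& work) {
    const auto time1 = std::chrono::steady_clock::now();  // Get time 1
    work();                                               // Tx / Rx until complete
    const auto time2 = std::chrono::steady_clock::now();  // Get time 2
    return std::chrono::duration_cast<std::chrono::microseconds>(time2 - time1)
        .count();                                         // Calc_time
}
```

A monotonic clock is used so that the difference cannot go negative if the wall-clock time is adjusted during the measurement.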

20 iDCT running results – SW (1) Linear calculation time growth vs. data packet length as expected in iDCT Basic packet size is 32 bytes. Packet length scale is in num. of basic packets

21 iDCT running results – SW (2) Exponential growth of the calculation time as the data length increases exponentially. [Plot "Exponential Data increase". Axes: log (Packet length) (x*32) and log (Calculation time [us])]

22 iDCT running results – HW (1) FIFO size influence (512 bytes) High calculation time vs. writing new data to FIFO

23 iDCT running results – HW (2) FIFO size influence (512 bytes) High calculation time vs. writing new data to FIFO Basic packet size is 32 bytes. Packet length scale is in num. of basic packets

24 iDCT running results – SW vs. HW

25 Innovations Make this generic interface hard coded and include it as part of the FPGA (IP) development package. Development is then reduced to C++ coding only. The interconnection between the PPC and the Generic Module becomes transparent. Make the current design faster by using separate DMA channels for read and write.

