© 2000 Morgan Kaufmann. Overheads for Computers as Components: Accelerators


1

2 Accelerators
- Accelerated systems.
- System design:
  - performance analysis;
  - scheduling and allocation.

3 Accelerated systems
- Use an additional computational unit dedicated to some functions?
  - Hardwired logic.
  - Extra CPU.
- Hardware/software co-design: joint design of hardware and software architectures.

4 Accelerated system architecture
[Figure: block diagram of CPU, accelerator, memory, and I/O; connections labeled request, data, result, data.]

5 Accelerator vs. co-processor
- A co-processor executes instructions.
  - Instructions are dispatched by the CPU.
- An accelerator appears as a device on the bus.
  - The accelerator is controlled by registers.

6 Accelerator implementations
- Application-specific integrated circuit.
- Field-programmable gate array (FPGA).
- Standard component.
  - Example: graphics processor.

7 System design tasks
- Design a heterogeneous multiprocessor architecture.
  - Processing element (PE): CPU, accelerator, etc.
- Program the system.

8 Why accelerators?
- Better cost/performance.
  - Custom logic may be able to perform an operation faster than a CPU of equivalent cost.
  - CPU cost is a non-linear function of performance.
[Figure: cost plotted against performance.]

9 Why accelerators? cont'd.
- Better real-time performance.
  - Put time-critical functions on less-loaded processing elements.
  - Remember RMS utilization: extra CPU cycles must be reserved to meet deadlines.
[Figure: cost plotted against performance, marking the deadline and the deadline with RMS overhead.]
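The RMS remark can be made concrete with the classic Liu and Layland utilization bound, n(2^(1/n) - 1), which the slide does not state but which is the standard sufficient test for rate-monotonic scheduling. The sketch below is illustrative only: the function names and the task sets are assumptions, not taken from the slides.

```python
def rms_utilization_bound(n: int) -> float:
    """Least upper bound on total CPU utilization for n periodic tasks
    under rate-monotonic scheduling (Liu and Layland)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return n * (2 ** (1.0 / n) - 1)


def fits_under_rms(tasks) -> bool:
    """tasks is a list of (execution_time, period) pairs.

    This is a sufficient test, not a necessary one: a task set that
    fails it may still be schedulable.
    """
    utilization = sum(c / p for c, p in tasks)
    return utilization <= rms_utilization_bound(len(tasks))


if __name__ == "__main__":
    # Hypothetical task sets, for illustration only.
    print(rms_utilization_bound(2))            # about 0.828
    print(fits_under_rms([(1, 4), (2, 8)]))    # utilization 0.5
    print(fits_under_rms([(3, 4), (2, 8)]))    # utilization 1.0
```

Because the bound falls below 1 as soon as there is more than one task, a CPU running several periodic tasks cannot be loaded to 100% with a guarantee; moving a time-critical function to an accelerator removes its load from that budget.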

10 Why accelerators? cont'd.
- Good for processing I/O in real-time.
- May consume less energy.
- May be better at streaming data.
- May not be able to do all the work on even the largest single CPU.

11 Accelerated system design
- First, determine that the system really needs to be accelerated.
  - How much faster is the accelerator on the core function?
  - How much data transfer overhead?
- Design the accelerator itself.
- Design the CPU interface to the accelerator.

12 Performance analysis
- Critical parameter is speedup: how much faster is the system with the accelerator?
- Must take into account:
  - Accelerator execution time.
  - Data transfer time.
  - Synchronization with the master CPU.

13 Accelerator execution time
- Total accelerator execution time:
  - $t_{accel} = t_{in} + t_x + t_{out}$
  - $t_{in}$: data input; $t_x$: accelerated computation; $t_{out}$: data output.

14 Data input/output times
- Bus transactions include:
  - flushing register/cache values to main memory;
  - time required for CPU to set up transaction;
  - overhead of data transfers by bus packets, handshaking, etc.

15 Accelerator speedup
- Assume loop is executed $n$ times.
- Compare accelerated system to non-accelerated system:
  - $S = n(t_{CPU} - t_{accel})$
  - $\;\;\, = n[t_{CPU} - (t_{in} + t_x + t_{out})]$
  - $t_{CPU}$: execution time on CPU.
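The two formulas can be evaluated directly. The sketch below is a minimal illustration; the function names and the timing numbers are hypothetical. Note that S, as defined on the slide, is a time difference (total time saved over n iterations), not a ratio, so a negative value means the accelerator makes the system slower.

```python
def accel_time(t_in: float, t_x: float, t_out: float) -> float:
    """t_accel = t_in + t_x + t_out."""
    return t_in + t_x + t_out


def speedup(n: int, t_cpu: float, t_in: float, t_x: float, t_out: float) -> float:
    """S = n * (t_CPU - t_accel): time saved over n loop iterations."""
    return n * (t_cpu - accel_time(t_in, t_x, t_out))


if __name__ == "__main__":
    # Hypothetical numbers: the computation is 5x faster on the accelerator.
    print(speedup(100, t_cpu=50, t_in=5, t_x=10, t_out=5))    # 3000
    # Same accelerator, but data transfer dominates: a net loss.
    print(speedup(100, t_cpu=50, t_in=30, t_x=10, t_out=30))  # -2000
```

The second call shows why data transfer overhead has to be estimated before committing to an accelerator: a fast core can still lose once input and output time are counted.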

16 Single- vs. multi-threaded
- One critical factor is available parallelism:
  - single-threaded/blocking: CPU waits for accelerator;
  - multithreaded/non-blocking: CPU continues to execute along with accelerator.
- To multithread, CPU must have useful work to do.
  - But software must also support multithreading.

17 Total execution time
[Figure: two execution graphs over nodes P1, P2, A1, P3, P4, one for the single-threaded case and one for the multi-threaded case.]

18 Execution time analysis
- Single-threaded:
  - Count execution time of all component processes.
- Multi-threaded:
  - Find longest path through execution.

19 Sources of parallelism
- Overlap I/O and accelerator computation.
  - Perform operations in batches: read in the second batch of data while computing on the first batch.
- Find other work to do on the CPU.
  - May reschedule operations to move work after accelerator initiation.
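The benefit of batching can be estimated with a simple two-stage pipeline model. This is a sketch under stated assumptions, not something given in the slides: every batch has the same input time and compute time, two buffers are available, and output transfer is ignored. The function names and numbers are hypothetical.

```python
def serial_batches(n: int, t_in: float, t_x: float) -> float:
    """No overlap: each batch is read in, then computed."""
    return n * (t_in + t_x)


def overlapped_batches(n: int, t_in: float, t_x: float) -> float:
    """Overlap: batch i+1 is read in while batch i is being computed.

    After the first read, the slower of the two stages sets the pace;
    the last batch still needs its full compute time.
    """
    if n <= 0:
        return 0
    return t_in + (n - 1) * max(t_in, t_x) + t_x


if __name__ == "__main__":
    # Hypothetical: 4 batches, 3 time units to read, 5 to compute.
    print(serial_batches(4, 3, 5))      # 32
    print(overlapped_batches(4, 3, 5))  # 23
```

Once the pipeline is full, only the slower stage appears in the total, so the gain is largest when input time and compute time are roughly balanced.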

20 Accelerated systems
- Several off-the-shelf boards are available for acceleration in PCs:
  - FPGA-based core;
  - PC bus interface.

21 Accelerator/CPU interface
- Accelerator registers provide control registers for the CPU.
- Data registers can be used for small data objects.
- Accelerator may include special-purpose read/write logic.
  - Especially valuable for large data transfers.
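A register-controlled interface can be pictured with a small software model. Everything in the sketch below is hypothetical: the register names (`CTRL`, `STATUS`, `DATA_IN`, `DATA_OUT`), the bit assignments, and the squaring operation are stand-ins chosen for illustration, not the interface of any real device. On real hardware the reads and writes would be memory-mapped bus accesses rather than dictionary lookups.

```python
class SimulatedAccelerator:
    """Toy model of an accelerator seen as a set of registers on the bus."""

    CTRL_START = 0x1   # hypothetical "go" bit in the control register
    STATUS_DONE = 0x1  # hypothetical "done" bit in the status register

    def __init__(self):
        self.regs = {"CTRL": 0, "STATUS": 0, "DATA_IN": 0, "DATA_OUT": 0}

    def write(self, name, value):
        self.regs[name] = value
        if name == "CTRL" and (value & self.CTRL_START):
            self._run()

    def read(self, name):
        return self.regs[name]

    def _run(self):
        # Stand-in computation: square the operand held in the data register.
        self.regs["STATUS"] &= ~self.STATUS_DONE
        self.regs["DATA_OUT"] = self.regs["DATA_IN"] ** 2
        self.regs["STATUS"] |= self.STATUS_DONE


def run_on_accelerator(dev, operand):
    """Driver: load a small data object, start the device, poll, read back."""
    dev.write("DATA_IN", operand)
    dev.write("CTRL", dev.CTRL_START)
    while not (dev.read("STATUS") & dev.STATUS_DONE):
        pass  # blocking (single-threaded) wait
    return dev.read("DATA_OUT")


if __name__ == "__main__":
    print(run_on_accelerator(SimulatedAccelerator(), 7))  # 49
```

The polling loop corresponds to the blocking style; a non-blocking driver would return after the start write and check the status register later. Passing the operand through a data register suits small objects only, which is why the slide mentions dedicated read/write logic for large transfers.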

22 Caching problems
- Main memory provides the primary data transfer mechanism to the accelerator.
- Programs must ensure that caching does not invalidate main memory data.
  - CPU reads location S.
  - Accelerator writes location S.
  - CPU writes location S. (BAD)
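One way such a hazard can surface is a CPU access that hits a cached copy of S after the accelerator has changed S in main memory. The toy model below assumes a cache that is not kept coherent with accelerator writes; the class and method names are hypothetical, and it illustrates the stale-copy effect with a read rather than reproducing the exact sequence on the slide.

```python
class ToyCachedCPU:
    """CPU with a private cache that does not observe accelerator writes."""

    def __init__(self, memory):
        self.memory = memory  # shared main memory (a dict)
        self.cache = {}

    def read(self, addr):
        if addr not in self.cache:           # miss: fetch from main memory
            self.cache[addr] = self.memory[addr]
        return self.cache[addr]              # hit: may be stale

    def invalidate(self, addr):
        self.cache.pop(addr, None)           # force the next read to miss


memory = {"S": 1}
cpu = ToyCachedCPU(memory)

first = cpu.read("S")   # CPU reads S; the value is now cached
memory["S"] = 99        # accelerator writes S directly to main memory
stale = cpu.read("S")   # cache hit: the CPU still sees the old value
cpu.invalidate("S")     # software invalidates the cached copy
fresh = cpu.read("S")   # miss: the CPU now sees the accelerator's result

if __name__ == "__main__":
    print(first, stale, fresh)  # 1 1 99
```

In practice the remedy is the same idea: invalidate or bypass the cache for locations shared with the accelerator before the CPU touches them again.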

23 Synchronization
- As with cache, main memory writes to shared memory may cause invalidation:
  - CPU reads S.
  - Accelerator writes S.
  - CPU reads S.

24 Partitioning
- Divide functional specification into units.
  - Map units onto PEs.
  - Units may become processes.
- Determine proper level of parallelism.
[Figure: f3(f1(),f2()) as a single unit vs. f1(), f2(), and f3() as separate units.]

25 Partitioning methodology
- Divide CDFG into pieces, shuffle functions between pieces.
- Hierarchically decompose CDFG to identify possible partitions.

26 Partitioning example
[Figure: CDFG with Block 1, Block 2, Block 3 and conditions cond 1, cond 2, partitioned into P1, P2, P3, P4, P5.]

27 Scheduling and allocation
- Must:
  - schedule operations in time;
  - allocate computations to processing elements.
- Scheduling and allocation interact, but separating them helps.
  - Alternatively allocate, then schedule.

28 Example: scheduling and allocation
[Figure: task graph with processes P1, P2, P3 and communication edges d1, d2; hardware platform with processing elements M1 and M2.]

29 Example process execution times

30 Example communication model
- Assume communication within a PE is free.
- Cost of communication from P1 to P3 is d1 = 2; cost of P2 -> P3 communication is d2 = 4.

31 First design
- Allocate P1, P2 -> M1; P3 -> M2.
[Figure: schedule chart with rows M1, M2, and network over a time axis marked 5, 10, 15, 20, showing P1, P2, d2, and P3.]
- Time = 19
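The schedule length for a given allocation can be computed mechanically. The sketch below is a minimal list-scheduling model under assumptions: the execution times (5 units each for P1 and P2 on M1 and for P3 on M2) are hypothetical values chosen to be consistent with the Time = 19 result, because the execution-time table is not preserved in this transcript. The communication costs d1 = 2 and d2 = 4 come from the communication model slide. Network contention is not modeled, and the function name is illustrative.

```python
def makespan(exec_time, alloc, order, comm):
    """Schedule length for a fixed allocation.

    exec_time[(process, pe)]: run time of a process on a PE
    alloc[process]: PE the process is assigned to
    order: processes in a dependency-respecting dispatch order
    comm[(src, dst)]: transfer cost, charged only across PEs
    """
    pe_free = {}   # time at which each PE becomes idle
    finish = {}    # finish time of each process
    for p in order:
        pe = alloc[p]
        ready = 0
        for (src, dst), cost in comm.items():
            if dst == p:
                delay = 0 if alloc[src] == pe else cost
                ready = max(ready, finish[src] + delay)
        start = max(ready, pe_free.get(pe, 0))
        finish[p] = start + exec_time[(p, pe)]
        pe_free[pe] = finish[p]
    return max(finish.values())


# Assumed execution times (not from the transcript); d1, d2 from the slides.
exec_time = {("P1", "M1"): 5, ("P2", "M1"): 5, ("P3", "M2"): 5}
comm = {("P1", "P3"): 2, ("P2", "P3"): 4}
first_design = {"P1": "M1", "P2": "M1", "P3": "M2"}

if __name__ == "__main__":
    print(makespan(exec_time, first_design, ["P1", "P2", "P3"], comm))  # 19
```

Under these assumptions P1 and P2 run back to back on M1 (finishing at 10), the d2 transfer delivers P2's data at 14, and P3 finishes on M2 at 19.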

32 Second design
- Allocate P1 -> M1; P2, P3 -> M2.
[Figure: schedule chart with rows M1, M2, and network over a time axis marked 5, 10, 15, 20, showing P1, P2, d2, and P3.]
- Time = 18

33 System integration and debugging
- Try to debug the CPU/accelerator interface separately from the accelerator core.
- Build scaffolding to test the accelerator.
- Hardware/software co-simulation can be useful.

