Harmony: A Run-Time for Managing Accelerators Sponsor: LogicBlox Inc. Gregory Diamos and Sudhakar Yalamanchili.


1 Harmony: A Run-Time for Managing Accelerators Sponsor: LogicBlox Inc. Gregory Diamos and Sudhakar Yalamanchili

2 SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY CERCS
Software Challenges of Heterogeneity
- Programming Model
- Execution Model
- Portability
- Performance

3 Pooled Accelerator Execution Model Instance
- Heterogeneous multiprocessor systems are viewed as a pool of processors, each potentially with a unique ISA and system interface.
- Applications that make full use of these systems must include binaries compatible with each accelerator ISA.

4 Execution Model
- Configuration of the machine model: the architecture description specifies the configuration of accelerators and processors and communicates QoS requirements.
[Figure labels: Kernel; Stream Elements; Control Thread; Stream; ACC; …; Local Memory; DMA; Cache; FIFO; Multicore processor 1; Accelerator 1; Memory]
Programming Model
- Accelerator-based code segment: compiled for a specific device/driver combination.
[Figure labels: System Architecture Description; Source Program; Compilation Environment; HVM]

5 Goals of Harmony
- Low overhead
  - Comparable to or better than hand-tuned applications
- System configuration agnostic
  - Correct execution on a system with any number and type of heterogeneous architectures
  - No code modification required
- Scalable
  - EP application performance should scale with the number of devices
- Familiar
  - Do not require any more than the current programming model of threaded applications for homogeneous architectures
[Section: Harmony]

6 Key Idea
- Accelerator kernel deployment is based on static and dynamic inter-kernel dependencies.
- Inspired by ILP scheduling techniques.
- Kernels are “issued” to accelerators and their execution is “committed” to release dependent kernels.
[Figure labels: op; Dependence resolution; Ready Buffer; Issue; From Application]
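The issue/commit idea on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Harmony's implementation: the `Kernel` class, the read/write sets used to derive dependencies, and the `schedule` function are all hypothetical names, and kernel execution itself is elided so that a kernel commits as soon as it is issued.

```python
from collections import deque


class Kernel:
    """Hypothetical kernel descriptor: a name plus the variables it reads and writes."""

    def __init__(self, name, reads=(), writes=()):
        self.name = name
        self.reads = set(reads)
        self.writes = set(writes)


def build_dependencies(kernels):
    """Map each kernel index to the earlier kernel indices it must wait for."""
    deps = {i: set() for i in range(len(kernels))}
    for i, later in enumerate(kernels):
        for j in range(i):
            earlier = kernels[j]
            raw = earlier.writes & later.reads   # read after write
            war = earlier.reads & later.writes   # write after read
            waw = earlier.writes & later.writes  # write after write
            if raw or war or waw:
                deps[i].add(j)
    return deps


def schedule(kernels):
    """Issue kernels from a ready buffer; each commit releases dependent kernels."""
    deps = build_dependencies(kernels)
    ready = deque(i for i in deps if not deps[i])
    committed = []
    while ready:
        i = ready.popleft()                # "issue" kernel i to an accelerator
        committed.append(kernels[i].name)  # "commit" (execution elided in this sketch)
        for j in deps:
            if i in deps[j]:
                deps[j].discard(i)         # dependence on kernel i is resolved
                if not deps[j]:
                    ready.append(j)        # kernel j enters the ready buffer
    return committed


program = [
    Kernel("A", writes=["x"]),
    Kernel("B", reads=["x"], writes=["y"]),
    Kernel("C", writes=["q"]),
]
order = schedule(program)
```

In this example, C does not depend on A or B, so it sits in the ready buffer from the start and is issued ahead of B, which must wait for A to commit.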

7 Harmony Architecture & Operation

8 Harmony Runtime Operation
- Accelerator kernels are mapped to specific architectures based on:
  - Architectures in the system
  - Available implementations
  - Performance
- Results are forwarded to waiting functions.
- Can support speculation.
- Results are committed in order.
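One way to picture the mapping step is a selection over the three listed inputs. The sketch below is an assumption-laden illustration rather than the runtime's actual policy: `pick_device`, the `system` and `implementations` tables, the per-architecture time estimates, and the earliest-finish heuristic are all hypothetical.

```python
def pick_device(kernel, system, implementations, busy_until):
    """Choose the device expected to finish `kernel` soonest.

    system:          {device name: architecture}         (architectures in the system)
    implementations: {(kernel, architecture): seconds}   (available implementations
                                                          with assumed time estimates)
    busy_until:      {device name: time it becomes free} (updated in place)
    """
    best_device, best_finish = None, None
    for device, arch in system.items():
        cost = implementations.get((kernel, arch))
        if cost is None:
            continue  # no implementation of this kernel for this architecture
        finish = busy_until.get(device, 0.0) + cost
        if best_finish is None or finish < best_finish:
            best_device, best_finish = device, finish
    if best_device is None:
        raise LookupError(f"no implementation of {kernel!r} for any device")
    busy_until[best_device] = best_finish
    return best_device


# Hypothetical system and estimates, for illustration only.
system = {"cpu0": "x86", "gpu0": "cuda"}
implementations = {
    ("matmul", "x86"): 4.0,
    ("matmul", "cuda"): 1.0,
    ("sort", "x86"): 2.0,
}
busy_until = {}
first = pick_device("matmul", system, implementations, busy_until)
second = pick_device("sort", system, implementations, busy_until)
```

Here `matmul` lands on `gpu0` because its estimated finish time is lower there, while `sort` lands on `cpu0` because that is the only architecture with an implementation.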

9 Application Development
- Programmer-supplied (Harmony) checks on entry/exit to accelerator kernels.
- Marshalling of operands when an accelerator kernel is invoked.
- May employ multiple (static) implementations corresponding to multiple accelerators.
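The idea of several static implementations behind one kernel name, with operands marshalled at invocation, might look roughly like the following. Everything here is hypothetical and is not Harmony's API: the `KernelRegistry` class, its decorator, the architecture labels, and the use of a plain copy as a stand-in for marshalling are assumptions made for illustration.

```python
class KernelRegistry:
    """Hypothetical registry: kernel name -> {architecture: implementation}."""

    def __init__(self):
        self._impls = {}

    def implementation(self, kernel, arch):
        """Decorator that registers one static implementation of a kernel."""
        def register(fn):
            self._impls.setdefault(kernel, {})[arch] = fn
            return fn
        return register

    def architectures(self, kernel):
        """List the architectures that have an implementation of `kernel`."""
        return sorted(self._impls.get(kernel, {}))

    def invoke(self, kernel, arch, *operands):
        fn = self._impls[kernel][arch]
        # Stand-in for marshalling: give the kernel private copies of sequence
        # operands, as if they were being packed for transfer to a device.
        marshalled = [list(op) if isinstance(op, (list, tuple)) else op
                      for op in operands]
        return fn(*marshalled)


registry = KernelRegistry()


@registry.implementation("saxpy", "cpu")
def saxpy_cpu(a, x, y):
    return [a * xi + yi for xi, yi in zip(x, y)]


@registry.implementation("saxpy", "simulated-accelerator")
def saxpy_chunked(a, x, y):
    # Processes two elements at a time to mimic a device with a fixed width.
    out = []
    for start in range(0, len(x), 2):
        chunk = zip(x[start:start + 2], y[start:start + 2])
        out.extend(a * xi + yi for xi, yi in chunk)
    return out


cpu_result = registry.invoke("saxpy", "cpu", 2, [1, 2, 3], [10, 20, 30])
acc_result = registry.invoke("saxpy", "simulated-accelerator", 2, [1, 2, 3], [10, 20, 30])
```

Both implementations produce the same result, which is the property a runtime needs in order to choose freely among them.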

10 Preliminary Performance Evaluation
[Figure labels: 3.1% Overhead; 3.8% Overhead; Matrix Multiplication]

11 Scheduling Overhead

12 Extensions to FPGAs
- Maintain the base Harmony deployment model
- Accelerator pools
- Associate a Harmony thread with each FPGA-based accelerator
- Virtualize the FPGA fabric
- Demand-driven vs. static configuration of the fabric
- Adapt existing register-allocation-based scheduling techniques
- Example: Virtualized Packet Schedulers (Sponsor: RNET Technologies)
- Poster Session
[Section: Extensions to FPGAs]

13 FPGA-Based Accelerator Architecture
[Figure labels: Volatile (DRAM); Nonvolatile (FLASH); PCIe/Hypertransport/CSI Interface; PowerPC; Encrypt; Decrypt; FFT; Memory Controller; Switch; NI]

14 Accelerator Configuration
- Address translation in the NI allows isolated paths between accelerators and memory.
[Figure labels: Volatile (DRAM); Nonvolatile (FLASH); PCIe/Hypertransport/CSI Interface; PowerPC; Memory Controller; Switch; NI; Host Driver; Host (DRAM); Encrypt; Decrypt; Switch; NI; Harmony Thread; FFT; NI; Harmony Thread]
[Section: Future]

15 Heterogeneous Virtual Machines
- PIs: A. Gavrilovska, K. Schwan, S. Yalamanchili
- Virtualization of accelerator resources
- Consolidation and sharing of accelerators
[Figure labels: Virtual Machine Monitor; User Software; Guest OS; Local Memory; Cache; ACC; DMA; FIFO; Network; SW Resources; HW Resources; CPU; isolation; security; legacy systems]
[Section: Looking Ahead]

16 Questions?

