A Pre-RTL, Power-Performance Accelerator Simulator Enabling Large Design Space Exploration of Customized Architectures Yakun Sophia Shao, Brandon Reagen,


1 A Pre-RTL, Power-Performance Accelerator Simulator Enabling Large Design Space Exploration of Customized Architectures. Yakun Sophia Shao, Brandon Reagen, Gu-Yeon Wei, David Brooks, Harvard University

2 Beyond Homogeneous Parallelism. Figure: general-purpose cores (CPU), programmable accelerators (DSP, GPU), and application-specific accelerators (ASIP, ASIC), compared on flexibility, programmability, energy efficiency, and design cost.

3 Today's SoC. Figure: OMAP 4 SoC.

4 Today's SoC. Figure: OMAP 4 SoC block diagram, with ARM cores, GPU, DSP, a system bus, secondary buses, a tertiary bus, DMA, SD, USB, and audio, video, face, and imaging blocks.

5 Today's SoC. Figures: Apple A7; Harvard VLSI-ARCH Group SoC tapeout.

6 Today's SoC. Figure: an SoC with CPUs, a GPU/DSP, accelerators (Acc), buses, and a memory interface.

7 Future Accelerator-Centric Architectures. How do we decompose an application into accelerators? How do we rapidly design lots of accelerators? How do we design and manage the shared resources? Figure: big cores, small cores, a GPU/DSP, and a sea of fine-grained accelerators around shared resources and a memory interface, compared on flexibility, programmability, and design cost.

8 Aladdin: A Pre-RTL, Power-Performance Accelerator Simulator. From unmodified C code and accelerator design parameters (e.g., number of FUs, memory bandwidth), Aladdin estimates the power/area and performance of an accelerator-specific datapath with a private L1/scratchpad, alongside shared memory/interconnect models. As an "accelerator simulator", it helps design accelerator-rich SoC fabrics and memory systems; as a "design assistant", it helps designers understand the algorithmic-hardware design space before RTL.
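Slide 8 lists accelerator design parameters such as the number of FUs and memory bandwidth. As a rough illustration of why the design space gets large, the Python sketch below enumerates every combination of a few such parameters. The `AccDesign` class, its field names, and the value ranges are hypothetical, not Aladdin's actual configuration format.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class AccDesign:
    """Hypothetical design point; the field names are illustrative only."""
    num_adders: int
    num_multipliers: int
    mem_bw: int  # assumed meaning: memory operations issued per cycle


def design_space(adders, multipliers, mem_bws):
    """Enumerate every combination of the given parameter values."""
    return [AccDesign(a, m, bw) for a, m, bw in product(adders, multipliers, mem_bws)]


space = design_space(adders=[1, 2, 4, 8], multipliers=[1, 2, 4], mem_bws=[1, 2, 4, 8])
print(len(space))  # 4 * 3 * 4 = 48 designs
```

Each added parameter multiplies the number of design points, and every point needs its own power/performance estimate, which is the cost a fast pre-RTL model is meant to keep low.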

9 Future Accelerator-Centric Architecture. Figure: big cores, small cores, a GPU/DSP, and a sea of fine-grained accelerators around shared resources and a memory interface.

10 Future Accelerator-Centric Architecture (same figure). Aladdin can rapidly evaluate a large design space of accelerator-centric architectures.

11 Aladdin Overview. Inputs: C code and accelerator design parameters. Optimization phase: C code -> optimistic IR -> initial dynamic data dependence graph (DDDG) -> idealistic DDDG. Realization phase: idealistic DDDG -> program-constrained DDDG -> resource-constrained DDDG (using the design parameters), which together with power/area models yields power/area, performance, and activity estimates.

12 Aladdin Overview (same flow as slide 11: optimization phase, then realization phase).

13 From C to Design Space. C code: for(i=0; i

14 From C to Design Space: IR dynamic trace. C code: for(i=0; i

15 From C to Design Space: Initial DDDG. C code: for(i=0; i. Figure: the initial DDDG in trace order (0. i=0; 1. ld a; 2. ld b; ...; st c; 5. i++; 6. ld a; 7. ld b; ...; st c; 10. i++; ...; 12. ld b; ...; st c).

16 From C to Design Space. C code: for(i=0; i. Figure: the same DDDG redrawn to show the loop-index chain (0. i=0, 5. i++, 10. i++) and each iteration's ld a, ld b, and st c nodes.
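Slides 14 to 16 go from C code to an IR dynamic trace and then to the initial DDDG. A minimal sketch of that last step follows, assuming the loop body is the vector addition `c[i] = a[i] + b[i]` (the C code on the slides is truncated) and a made-up trace record (`Op`, with opcodes `mov`, `ld`, `add`, `st`, `inc`). Each node gets an edge from whichever earlier node last wrote a register it reads, or last stored to an address it loads. Node numbers agree with those visible on the slides (0. i=0, 5. i++, 6. ld a, ...); this is not Aladdin's trace format or implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Op:
    """One dynamic instruction in a hypothetical IR trace."""
    node_id: int
    opcode: str                 # e.g. "mov", "ld", "add", "st", "inc"
    dests: tuple = ()           # registers written
    srcs: tuple = ()            # registers read
    addr: Optional[str] = None  # memory location touched by ld/st


def build_dddg(trace):
    """Map each node to the set of nodes it truly depends on (read-after-write)."""
    last_writer = {}  # register -> node that most recently wrote it
    last_store = {}   # address -> node that most recently stored to it
    parents = {}
    for op in trace:
        deps = {last_writer[r] for r in op.srcs if r in last_writer}
        if op.opcode == "ld" and op.addr in last_store:
            deps.add(last_store[op.addr])  # memory read-after-write dependence
        parents[op.node_id] = deps
        for r in op.dests:
            last_writer[r] = op.node_id
        if op.opcode == "st":
            last_store[op.addr] = op.node_id
    return parents


def vector_add_trace(n):
    """Dynamic trace of the assumed loop body c[i] = a[i] + b[i] for n iterations."""
    trace = [Op(0, "mov", dests=("i",))]  # 0. i = 0
    for k in range(n):
        base = 1 + 5 * k
        trace += [
            Op(base, "ld", dests=("ra",), srcs=("i",), addr=f"a[{k}]"),
            Op(base + 1, "ld", dests=("rb",), srcs=("i",), addr=f"b[{k}]"),
            Op(base + 2, "add", dests=("rc",), srcs=("ra", "rb")),
            Op(base + 3, "st", srcs=("rc", "i"), addr=f"c[{k}]"),
            Op(base + 4, "inc", dests=("i",), srcs=("i",)),  # i++
        ]
    return trace


dddg = build_dddg(vector_add_trace(3))
print(dddg[6])  # {5}: iteration 1's "ld a" waits for the first i++
```

Because the trace is dynamic, only true (read-after-write) dependences are recorded; reusing the register names `ra`, `rb`, `rc` across iterations creates no extra edges.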

17 From C to Design Space. Optimization Phase: C -> IR -> DDDG
Include application-specific customization strategies:
Node-level:
 – Bit-width analysis
 – Strength reduction
 – Tree-height reduction
Loop-level:
 – Remove dependences between loop index variables
Memory optimization:
 – Memory-to-register conversion
 – Store-load forwarding
 – Store buffer
Extensible:
 – e.g., model a CAM accelerator by matching nodes in the DDDG
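The loop-level optimization on slide 17 removes dependences between loop index variables. The sketch below, again assuming the vector-add loop and one cycle per node, shows the effect on the DDDG's critical path. With the `i++` chain intact, iterations are serialized and the path grows with the trip count; treating each iteration's index as directly computable (an assumption of this sketch, modeled by giving each `i++` node no parent) makes the path length constant. The function names are hypothetical.

```python
def vector_add_dddg(n, remove_index_deps=False):
    """DDDG of the assumed loop c[i] = a[i] + b[i] over n iterations.

    Node 0 is i = 0; iteration k uses ids 1+5k .. 5+5k for
    (ld a, ld b, add, st c, i++), consistent with the slide numbering.
    """
    parents = {0: set()}
    index_node = 0  # node producing the value of i for the current iteration
    for k in range(n):
        ld_a, ld_b, add, st_c, inc = range(1 + 5 * k, 6 + 5 * k)
        parents[ld_a] = {index_node}
        parents[ld_b] = {index_node}
        parents[add] = {ld_a, ld_b}
        parents[st_c] = {add, index_node}
        # Normally i++ reads the previous i, chaining all iterations together.
        # The optimization is assumed to compute i directly from k instead.
        parents[inc] = set() if remove_index_deps else {index_node}
        index_node = inc
    return parents


def critical_path(parents):
    """Longest dependence chain, counting one cycle per node."""
    depth = {}
    for node in sorted(parents):  # parents always have smaller ids
        depth[node] = 1 + max((depth[p] for p in parents[node]), default=0)
    return max(depth.values())


for n in (4, 100):
    print(n, critical_path(vector_add_dddg(n)),
          critical_path(vector_add_dddg(n, remove_index_deps=True)))
```

For 4 iterations the critical path drops from 7 to 4 nodes; for 100 iterations, from 103 to 4.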

18 From C to Design Space: One Design. Accelerator design parameters: memory bandwidth <= 2, 1 adder. Figure: the idealistic DDDG (0. i=0; 5. i++; 1. ld a; 2. ld b; st c; 6. ld a; 7. ld b; st c; ...) scheduled cycle by cycle under these constraints, with the resulting memory (MEM) resource activity.

19 From C to Design Space: Another Design. Accelerator design parameters: memory bandwidth <= 4, 2 adders. Figure: the same idealistic DDDG scheduled cycle by cycle under these constraints, with the resulting memory (MEM) resource activity.
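Slides 18 and 19 schedule the same idealistic DDDG under two sets of design parameters. The sketch below imitates this with greedy list scheduling under per-cycle resource limits. It assumes unit latency for every node, that "memory bandwidth" means loads plus stores issued per cycle, and that loop-index nodes cost nothing (so they are left out). These assumptions and all names are illustrative; Aladdin's own scheduler may differ.

```python
def vector_add_ideal_dddg(n):
    """Idealistic DDDG of the assumed loop c[i] = a[i] + b[i]; index ops omitted."""
    parents, kind = {}, {}
    for k in range(n):
        ld_a, ld_b, add, st_c = range(4 * k, 4 * k + 4)
        parents.update({ld_a: set(), ld_b: set(), add: {ld_a, ld_b}, st_c: {add}})
        kind.update({ld_a: "mem", ld_b: "mem", add: "adder", st_c: "mem"})
    return parents, kind


def schedule(parents, kind, limits):
    """Greedy list scheduling with unit latency; returns the number of cycles.

    limits maps each resource kind to how many ops of that kind may issue
    per cycle (every value must be at least 1).
    """
    done = {}  # node -> cycle in which it executed
    cycle = 0
    while len(done) < len(parents):
        cycle += 1
        used = dict.fromkeys(limits, 0)
        for node in sorted(parents):  # lower id (earlier iteration) goes first
            if node in done:
                continue
            # A node is ready once every parent finished in an earlier cycle.
            ready = all(p in done and done[p] < cycle for p in parents[node])
            res = kind[node]
            if ready and used[res] < limits[res]:
                done[node] = cycle
                used[res] += 1
    return cycle


dddg, kind = vector_add_ideal_dddg(4)
print(schedule(dddg, kind, {"mem": 2, "adder": 1}))  # memory BW <= 2, 1 adder
print(schedule(dddg, kind, {"mem": 4, "adder": 2}))  # memory BW <= 4, 2 adders
```

Under these assumptions, four iterations take 7 cycles on the narrower design and 4 on the wider one; the wider design buys speed with more memory ports and adders, which is the trade-off slides 21 and 22 plot against power.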

20 From C to Design Space. Realization Phase: DDDG -> Estimates
Constrain the DDDG with program and user-defined resource constraints:
Program constraints:
 – Control dependence
 – Memory ambiguation
Resource constraints:
 – Loop-level parallelism
 – Loop pipelining
 – Memory ports
 – Number of FUs (e.g., adders, multipliers)
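One resource constraint on slide 20 is loop-level parallelism. A simple way to model it, assumed here rather than taken from Aladdin, is to let only `unroll` iterations execute together and make every node of the next group of iterations wait for the whole previous group; loop pipelining, which would let groups overlap, is not modeled. With unlimited functional units, the critical path of the (assumed) vector-add DDDG then shrinks as the unroll factor grows. All names are hypothetical.

```python
def vector_add_ideal_dddg(n):
    """Idealistic DDDG of the assumed loop c[i] = a[i] + b[i] (index ops omitted).

    Returns (parents, iterations), where iterations[k] lists iteration k's nodes.
    """
    parents, iterations = {}, []
    for k in range(n):
        ld_a, ld_b, add, st_c = range(4 * k, 4 * k + 4)
        parents.update({ld_a: set(), ld_b: set(), add: {ld_a, ld_b}, st_c: {add}})
        iterations.append([ld_a, ld_b, add, st_c])
    return parents, iterations


def limit_loop_parallelism(parents, iterations, unroll):
    """Let only `unroll` consecutive iterations run concurrently (no pipelining)."""
    constrained = {node: set(ps) for node, ps in parents.items()}
    for k in range(unroll, len(iterations)):
        prev = (k // unroll - 1) * unroll  # first iteration of the previous group
        prev_nodes = {nid for it in iterations[prev:prev + unroll] for nid in it}
        for node in iterations[k]:
            constrained[node] |= prev_nodes  # wait for the whole previous group
    return constrained


def critical_path(parents):
    """Longest dependence chain, counting one cycle per node."""
    depth = {}
    for node in sorted(parents):  # parents always have smaller ids
        depth[node] = 1 + max((depth[p] for p in parents[node]), default=0)
    return max(depth.values())


dddg, iterations = vector_add_ideal_dddg(8)
for unroll in (1, 2, 4, 8):
    print(unroll, critical_path(limit_loop_parallelism(dddg, iterations, unroll)))
```

For eight iterations this gives critical paths of 24, 12, 6, and 3 for unroll factors 1, 2, 4, and 8; combining it with per-cycle resource limits would then trade these against the number of FUs and memory ports.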

21 From C to Design Space: Power-Performance per Design. Figure: power vs. cycles for the two designs (memory bandwidth <= 2 with 1 adder; memory bandwidth <= 4 with 2 adders).

22 From C to Design Space: Design Space of an Algorithm. Figure: power vs. cycles across the design space of one algorithm.
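Slide 22 plots cycles against power for many designs of one algorithm. Picking the interesting points from such a plot usually means keeping the Pareto-optimal designs: those that no other design beats on both axes. A minimal sketch follows; `DesignPoint` is a hypothetical record type and the numbers are placeholders, not results from the talk.

```python
from typing import NamedTuple


class DesignPoint(NamedTuple):
    name: str
    cycles: int   # performance estimate (lower is better)
    power: float  # power estimate (lower is better)


def pareto_frontier(points):
    """Return the designs that no other design beats on both cycles and power."""
    frontier = []
    best_power = float("inf")
    # Sweep from fastest to slowest; a slower design is only worth keeping
    # if it uses strictly less power than every faster (or equally fast) one.
    for p in sorted(points, key=lambda d: (d.cycles, d.power)):
        if p.power < best_power:
            frontier.append(p)
            best_power = p.power
    return frontier


# Placeholder numbers for illustration only, not Aladdin results.
designs = [
    DesignPoint("A", cycles=7, power=1.0),
    DesignPoint("B", cycles=4, power=2.0),
    DesignPoint("C", cycles=5, power=2.5),
    DesignPoint("D", cycles=10, power=0.9),
]
print([d.name for d in pareto_frontier(designs)])
```

Here C is dropped because B is both faster and lower power; the remaining designs trade cycles for power.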

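23 Aladdin Validation. Figure: Aladdin's power/area and performance estimates from C code, compared against a Verilog implementation simulated in ModelSim (performance and activity) and synthesized with Design Compiler (power/area).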

24 Aladdin Validation. Figure: the same comparison, where the Verilog comes either from an RTL designer or from Vivado HLS after HLS C tuning, and is evaluated with ModelSim and Design Compiler.

25 Aladdin Validation

26 Aladdin Validation

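27 Aladdin enables rapid design space exploration for accelerators. Figure: the validation flow, in which Aladdin works directly from C code, while the reference Verilog comes from an RTL designer or from Vivado HLS (after HLS C tuning) and is evaluated with ModelSim and Design Compiler.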

28 Aladdin enables pre-RTL simulation of accelerators with the rest of the SoC. Figure: the future accelerator-centric architecture, with existing simulators for the other components: GPGPU-Sim (GPU), MARSSx86 and XIOSim (big and small cores), Cacti/Orion2 (shared resources), and DRAMSim2 (memory interface).

29 Modeling Accelerators in a SoC-like Environment. Figure: cores and accelerators (Acc), each with caches, sharing memory.

30 Aladdin: A Pre-RTL, Power-Performance Accelerator Simulator. Architectures with 1000s of accelerators will be radically different, and new design tools are needed. Aladdin enables rapid design space exploration of future accelerator-centric platforms. You can find Aladdin at

