Further Improvements in Interconnect-Driven High-Level Synthesis of DFGs Using 2-Level Graph Isomorphism

Presentation transcript:

Michele Santoro: michele.santoro@dresd.org
Advisor (Relatore): Donatella Sciuto
Co-advisor (Correlatore): Marco D. Santambrogio

Motivation
Problem statement: interconnects have a great impact on circuit design:
- on area: the size of the interconnect itself
- on latency: signal propagation
- on power consumption: parasitic capacitance and intrinsic resistance
Solution: decrease the number of interconnects, focusing on the individual steps or tasks of the HLS process.
[Figure: HLS flow with the blocks Behavioral Description, Operation Scheduling, Resource Allocation and Binding, Controller Synthesis, Datapath, Placement, Floorplanning]

Innovation
The innovative contribution of this thesis lies in the scheduling and allocation phases. It combines different techniques in a new way to reduce synthesis cost:
- Coloring
- Resource Sharing
- Interconnection Sharing
The analysis of the scheduling and allocation problem is divided into two phases: static and dynamic.

Outline
- Introduction
  - Scheduling and Allocation problem definition
- State of the Art
- Implementations
  - Coloring Aware MR-LCS
  - Pushdown algorithm
  - Best Resource
- Results
  - Benchmark
  - Random DFGs
- Conclusion and Future Work

Introduction
The VLSI design flow turns a high-level specification into an actual device. High-Level Synthesis is part of that flow and consists of several steps, such as scheduling, allocation, placement, and floorplanning.
Scheduling:
- Selects the control step for each operation.
- Determines the number and type of resources to allocate.
Allocation:
- Maps operations to functional units.
- Determines the total number of resources of every kind, including multiplexers and registers.
Speaker note: mention the various types of scheduling (exact and heuristic; resource-constrained and latency-constrained).
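As an illustration of what "selecting a control step for each operation" means, here is a minimal sketch of plain ALAP (as late as possible) scheduling, the baseline that MR-LCS is said to generalize. It is not the thesis algorithm: it assumes unit-delay operations, ignores resource constraints, and the function name, the DFG, and the latency bound are hypothetical.

```python
from collections import defaultdict

def alap_schedule(nodes, edges, latency):
    """Return {node: control step}, placing each node as late as possible.

    Assumes unit-delay operations and control steps numbered 1..latency.
    """
    successors = defaultdict(list)
    for src, dst in edges:
        successors[src].append(dst)

    schedule = {}

    def step_of(node):
        if node not in schedule:
            succ = successors[node]
            # A sink goes in the last step; any other node goes one step
            # before its earliest-scheduled successor.
            schedule[node] = latency if not succ else min(step_of(s) for s in succ) - 1
        return schedule[node]

    for node in nodes:
        step_of(node)
    if schedule and min(schedule.values()) < 1:
        raise ValueError("latency bound too small for this DFG")
    return schedule

# Hypothetical DFG: two multiplies feed an add; the add and a third
# multiply feed a subtract.
nodes = ["m1", "m2", "a1", "m3", "s1"]
edges = [("m1", "a1"), ("m2", "a1"), ("a1", "s1"), ("m3", "s1")]
schedule = alap_schedule(nodes, edges, latency=3)
print(schedule)
```

With a latency bound of 3, the subtract lands in step 3, the add and the third multiply in step 2, and the first two multiplies in step 1.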

Coloring Aware MR-LCS
Starting point: MR-LCS (a generalization of ALAP).
Improvement in two phases:
- Coloring: pre-processing phase that identifies isomorphic 2-level sub-graphs (join patterns, split patterns, linear patterns)
- Scheduling: performed together with allocation and binding
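The coloring pre-processing can be pictured with the sketch below. The slides do not define the three patterns precisely, so this assumes that a join is a node with two or more predecessors, a split is a node with two or more successors, and a linear pattern is an edge whose source has a single successor and whose target has a single predecessor. Two sub-graphs are treated as isomorphic when they have the same shape and the same operation types. All names are hypothetical.

```python
from collections import defaultdict

def two_level_patterns(op_type, edges):
    """Group 2-level sub-graphs by shape and operation types.

    op_type maps node -> operation name; edges is a list of (src, dst).
    Only groups with at least two members are returned, since a pattern
    that occurs once offers nothing to share.
    """
    preds, succs = defaultdict(list), defaultdict(list)
    for src, dst in edges:
        preds[dst].append(src)
        succs[src].append(dst)

    groups = defaultdict(list)
    for node in op_type:
        if len(preds[node]) >= 2:  # join: several producers feed one node
            key = ("join", tuple(sorted(op_type[p] for p in preds[node])), op_type[node])
            groups[key].append((tuple(sorted(preds[node])), node))
        if len(succs[node]) >= 2:  # split: one node feeds several consumers
            key = ("split", op_type[node], tuple(sorted(op_type[s] for s in succs[node])))
            groups[key].append((node, tuple(sorted(succs[node]))))
    for src, dst in edges:
        if len(succs[src]) == 1 and len(preds[dst]) == 1:  # linear: simple chain
            groups[("linear", op_type[src], op_type[dst])].append((src, dst))
    return {key: members for key, members in groups.items() if len(members) >= 2}

# Hypothetical DFG with two identical "mul, mul -> add" joins.
op_type = {"m1": "mul", "m2": "mul", "a1": "add",
           "m3": "mul", "m4": "mul", "a2": "add", "s1": "sub"}
edges = [("m1", "a1"), ("m2", "a1"), ("m3", "a2"), ("m4", "a2"),
         ("a1", "s1"), ("a2", "s1")]
patterns = two_level_patterns(op_type, edges)
print(patterns)
```

In this example the two "mul, mul -> add" joins form one group and would receive the same color, while the single "add, add -> sub" join is discarded.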

Estimated Available Time
Scheduling priority is given to colored sub-graphs, so the availability of nodes must be estimated.
[Figure: resources R1 to R3, patterns P1 and P2, node n with operands x and y; EAT = 6, ST = 5. P2 generates an overlap. Two cases are shown: in one, EAT(n) = 8; in the other, EAT(n) = Alap(n).]

Pushdown algorithm
The Pushdown algorithm exploits the Safe Range to find the best solution when patterns overlap. It also manages resource utilization better.
[Figure: example on nodes n1 to n5 in four stages: initial situation; start backward from R_L; schedule the operations; final situation.]

Best Resource algorithm
The algorithm takes advantage of the current state of the schedule by keeping a record of all existing interconnections.
[Figure: resources R1 to R6, patterns P1 and P2, node n, and labels C3 and C4.]
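One way to read "keeping a record of all existing interconnections" is sketched below: when an operation must be bound, prefer the candidate resource that needs the fewest new wires from the resources producing its operands. This is only an assumed selection rule, not the thesis algorithm; it ignores whether a resource is free in the current control step, and every name in it is hypothetical.

```python
def best_resource(candidates, operand_sources, wires):
    """Pick the candidate that requires the fewest new interconnections.

    wires is the set of existing (source resource, destination resource)
    pairs; operand_sources lists the resources producing the operands.
    Ties are resolved in favor of the earlier candidate.
    """
    def new_wires(resource):
        return sum((src, resource) not in wires for src in operand_sources)
    return min(candidates, key=new_wires)

def bind(resource, operand_sources, wires):
    """Record the interconnections used by the chosen binding."""
    wires.update((src, resource) for src in operand_sources)

# Hypothetical state: R3 is already wired to both operand sources,
# R4 only to R1.
wires = {("R1", "R3"), ("R2", "R3"), ("R1", "R4")}
chosen = best_resource(["R4", "R3"], ["R1", "R2"], wires)
bind(chosen, ["R1", "R2"], wires)
print(chosen, len(wires))
```

Here R3 is chosen because both of its input wires already exist, so the binding adds no interconnection.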

Results
To validate the results, the algorithms were applied to the Media Benchmark and to randomly generated DFGs. The captured costs are divided into:
- Direct costs: number of interconnections, number of resources
- Indirect costs: number of registers, number of multiplexers, max fan-out
- Derived costs: wire length, total area

Benchmark Results
Benchmarks:
- fft: fast discrete Fourier transform
- convolve: convolution of two functions
- jdmerge: used in reconstructing JPEG images
- getblk: a kernel service that manages buffers

Column headers: Benchmark, Nodes, Edges, Wires (MR-LCS, CA, CA_PD_BR, BR), Resources (MR-LCS, CA, CA_PD_BR, BR)
Rows as they appear in the transcript (several are incomplete):
fft2 11 9 6 7 5 4
fft1 17 12 13 8
convolve2 18 10
convolve1 23 14 16
getblk 33 29 22 20 21
convolve0 49 41 30 31 15
jdmerge 79 65 60 54 44 32 19
Avg 32.86 26.29 20.86 21.14 18.43 19.43 12.71 11.00 10.00 10.43
Improv. (Wires: CA, CA_PD_BR, BR) -1.37% 11.64% 6.85%
Improv. (Resources: CA, CA_PD_BR, BR) 13.48% 21.35% 17.98%

Random DFGs: Direct Costs
Column headers for both tables: #Nodes, MR-LCS, CA, CA_PD_BR, BR; sub-headers: α=0.0, α=0.5, α=1.0
Rows as they appear in the transcript (several are incomplete).

Wires
50 32 35
300 263 240 217 216 225
550 512 460 432 431 433 444
800 759 670 639 635 662
1050 1010 881 847 845 842 884
1300 1260 1091 1066 1054 1103
1550 1505 1297 1246 1234 1323
1800 1761 1519 1500 1478 1476 1548
Total 7102 6194 5993 5937 5923 6221
Improv. 12.8% 15.6% 16.4% 16.6% 12.4%

Resource
50 15 10 9
300 101 54 53 55 62
550 179 99 98 100 117
800 263 139 134 169
1050 333 168 172 217
1300 412 207 206 212 257
1550 488 233 228 226 296
1800 579 299 273 277 353
Total 2368 1209 1170 1190 1480
Improv. 22.9% 60.7% 61.9% 61.3% 51.8%

The tables show an improvement of about 17% for wire sharing and of about 60% for resource sharing.
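The improvement row of the Wires table can be reproduced from its totals. The sketch below assumes that the first total (7102) is the MR-LCS baseline and that each improvement is the relative reduction with respect to it; the function name is hypothetical.

```python
def improvement(baseline, value):
    """Percentage reduction of value relative to baseline, to one decimal."""
    return round(100 * (baseline - value) / baseline, 1)

# Totals row of the Wires table; the first entry is assumed to be the
# MR-LCS baseline, the rest are the other columns in their original order.
wire_totals = [7102, 6194, 5993, 5937, 5923, 6221]
baseline, others = wire_totals[0], wire_totals[1:]
improvements = [improvement(baseline, v) for v in others]
print(improvements)  # [12.8, 15.6, 16.4, 16.6, 12.4]
```

The computed values match the 12.8%, 15.6%, 16.4%, 16.6% and 12.4% reported for wires, which supports reading the first column as the baseline.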

Random DFGs: Indirect Costs
Column headers for all three tables: #Nodes, MR-LCS, CA, CA_PD_BR, BR; sub-headers: α=0.0, α=0.5, α=1.0
Rows as they appear in the transcript (several are incomplete).

Registers
50 29 30 27
300 172 163 133 132
550 303 288 232 234 228 236
800 438 419 323 325 320 332
1050 580 544 407 411 398 423
1300 721 679 512 510 497 533
1550 841 798 588 605 581 628
1800 986 920 684 695 673 726
Total 4069 3841 2907 2941 2856 3039
Improv. 5.6% 28.6% 27.7% 29.8% 25.3%

Multiplexers
50 6 7
300 38 40 26 27
550 68 67 45 48
800 101 93 61 60 59 65
1050 134 123 74 72 73 82
1300 170 155 92 88 104
1550 194 180 106 102 118
1800 233 209 122 119 142
Total 943 873 531 530 519 593
Improv. 7.5% 43.6% 43.8% 44.9% 37%

Max Fan-out
50 9 8 7
300 43 29 25 24 27
550 48 32 31 33
800 56 37 36 40
1050 62 42 45
1300 65 44
1550 73 47 49 46 53
1800 71 51 55
Total 426 286 276 281 307
Improv. 33% 32.9% 35.3% 34.2% 28%

Random DFGs: Derived Costs
Column headers for both tables: #Nodes, MR-LCS, CA, CA_PD_BR, BR; sub-headers: α=0.0, α=0.5, α=1.0
Rows as they appear in the transcript (some Total Area rows are incomplete).

Wire Length
50 327 476 258 256 264 328
300 7225 4919 4859 5061 4425 5530
550 19195 16434 11289 11785 13346 13547
800 33625 23352 18485 18398 18506 24278
1050 53617 44911 30835 30872 29861 39128
1300 94362 47793 35643 34661 35451 41891
1550 101569 75465 53907 52420 53373 69657
1800 146467 90954 66617 64530 63814 83573
Total 456387 304304 221893 217983 219040 277932
Improv. 33.3% 51.4% 52.2% 52.0% 39.1%

Total Area
50 1344 1360 816 864 832 928
300 11680 8624 5376 5760 6912
550 22672 15232 10752 10240 10816 12672
800 33792 23296 13376 14976 13728 17920
1050 47728 31520 20672 19712 18816 24960
1300 57792 33600 20800
1550 67936 46144 26496 28288 25920 33408
1800 81312 55040 32000 28800 34800
Total 324256 214816 128304 127968 124384 152400
Improv. 33.8% 60.4% 60.5% 61.6% 53.0%

The tables show a reduction of about 52% in total wire length and of about 60% in total area.

Conclusions and Future Work
This Master's thesis project relies on simple considerations. The proposed algorithms have been shown to perform better than standard MR-LCS, achieving:
- up to 17% improvement in interconnection sharing
- around 68% improvement in resource sharing
- a reduction of around 64% in overall cost
Future work:
- Recognize and exploit different topological patterns.
- Multi-coloring pre-processing.
- Reiterate the solution through the algorithm: this allows further improvements, because the algorithm will be aware of the upper bound of the solution.

Questions?