A Dictionary Construction Technique for Code Compression Systems with Echo Instructions Embedded and Reconfigurable Systems Lab Computer Science Department.

Presentation transcript:

A Dictionary Construction Technique for Code Compression Systems with Echo Instructions
Philip Brisk, Jamie Macbeth, Ani Nahapetian, Majid Sarrafzadeh
Embedded and Reconfigurable Systems Lab, Computer Science Department, University of California, Los Angeles
{philip, macbeth, ani,
LCTES '05, June 16, Chicago, IL

Outline
- Introduction: Code Compression
- Dictionary Compression
- Dictionary Construction
- Overview of the Algorithm
- Experimental Methodology and Results
- Summary

Introduction: Code Compression For Embedded Systems
Why Reduce Program Size?
- Reduces Memory Requirements
- Silicon Cost of Program Storage in on-chip ROMs
- As Embedded Systems Become More Complex, Ever-More Functionality Will Migrate to Software
Costs of Runtime Decompression
- Performance Overhead
- Area of the Decoder Circuitry

Dictionary Compression
1. Find Repeated Code Sequences
2. Place Each Sequence Into a Dictionary
3. Replace Each Sequence in the Program with a Codeword that Accesses the Dictionary
[Figure: a Program and its Dictionary]
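The three steps above can be sketched in a few lines of Python. This is a deliberately naive illustration, not the algorithm of the talk: it assumes fixed-length windows over already-scheduled code, and the names `compress`, `CODEWORD`, and the sample opcodes are hypothetical.

```python
from collections import Counter

def compress(program, length):
    """Naive dictionary compression over fixed-length instruction windows."""
    # Step 1: count every window of `length` consecutive instructions.
    windows = Counter(tuple(program[i:i + length])
                      for i in range(len(program) - length + 1))
    # Step 2: windows seen more than once become dictionary entries.
    dictionary = [seq for seq, count in windows.items() if count > 1]
    index = {seq: k for k, seq in enumerate(dictionary)}
    # Step 3: scan left to right and emit a codeword for each match.
    out, i = [], 0
    while i < len(program):
        seq = tuple(program[i:i + length])
        if seq in index:
            out.append(("CODEWORD", index[seq]))
            i += length
        else:
            out.append(program[i])
            i += 1
    return out, dictionary

program = ["add", "mul", "sub", "ld", "add", "mul", "sub", "st"]
compressed, dictionary = compress(program, 3)
print(compressed)
print(dictionary)
```

Overlapping windows are counted individually, so a real implementation would need a more careful selection step; the sketch only shows the find, store, replace structure.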

CALD and Echo Instructions
CALD Instructions
- Place each sequence in a dictionary
- All Codewords Point to the Dictionary
Echo Instructions
- Leave one Instance of the Sequence Inline
- All Codewords Point to the Sequence
[Figure: Program with a separate Dictionary (CALD); Program only (Echo)]
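A minimal sketch of the echo idea, under simplifying assumptions: instructions are plain strings, one known sequence is compressed, and an echo is modeled as a hypothetical tuple `("ECHO", address, length)` that points at the inline instance. The real instruction encoding is not given in the slides.

```python
def echo_compress(program, seq):
    """Keep the first instance of `seq` inline; later ones become echoes."""
    seq = list(seq)
    n = len(seq)
    out, first, i = [], None, 0
    while i < len(program):
        if program[i:i + n] == seq:
            if first is None:
                first = len(out)              # address of the inline instance
                out.extend(seq)
            else:
                out.append(("ECHO", first, n))
            i += n
        else:
            out.append(program[i])
            i += 1
    return out

def echo_expand(compressed):
    """Replay each echo by copying the instructions it points at."""
    out = []
    for item in compressed:
        if isinstance(item, tuple) and item[0] == "ECHO":
            _, addr, n = item
            out.extend(compressed[addr:addr + n])
        else:
            out.append(item)
    return out

program = ["a", "b", "c", "x", "a", "b", "c"]
compressed = echo_compress(program, ["a", "b", "c"])
print(compressed)
print(echo_expand(compressed) == program)
```

No separate dictionary is stored: the program itself holds the only copy of the sequence, which is the difference from the CALD scheme.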

Compression Algorithms
The Traditional Approach: Compression Performed at Link Time
- Substring Matching [Fraser et al., 1984]
- + Register Renaming [Cooper and McIntosh, 1999] [Debray et al., 2000]
- + Instruction Rescheduling [De Sutter et al., 2002]
Our Approach is Somewhat Different…
- Identify Repeated Isomorphic Patterns that Occur within the Intermediate Representation PRIOR TO Register Allocation [Brisk et al., 2004]

Dictionary Construction
- DAG 1: A: R1 ← R2 + R3; B: R4 ← R5 + R6; C: R7 ← R1 + R4
- DAG 2: A: R1 ← R2 + R3; C: R7 ← R1 + R4
- DAG 2 is isomorphic to a subgraph of DAG 1
- 2 Schedules Exist for DAG 1: A B C and B A C
- Dictionary 1 (5 instructions): Sequence 1 = A B C, Sequence 2 = A C
- Dictionary 2 (3 instructions): Sequence 1 = B A C, which already contains Sequence 2 = A C
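The claim that DAG 1 has exactly two schedules can be checked by enumerating topological orders. The sketch below assumes the dependence edges A → C and B → C implied by the register names; `schedules` is a hypothetical helper, and brute-force enumeration is only practical for small DAGs.

```python
def schedules(nodes, edges):
    """Yield every topological order (schedule) of a small DAG."""
    preds = {v: {u for u, w in edges if w == v} for v in nodes}

    def extend(order, remaining):
        if not remaining:
            yield tuple(order)
            return
        for v in sorted(remaining):
            # v may be scheduled once all of its predecessors are placed.
            if preds[v] <= set(order):
                yield from extend(order + [v], remaining - {v})

    yield from extend([], frozenset(nodes))

dag1 = list(schedules("ABC", [("A", "C"), ("B", "C")]))
dag2 = list(schedules("AC", [("A", "C")]))
print(dag1)
print(dag2)
```

Only the schedule B A C ends with A C, so only that choice lets DAG 2 reuse part of the dictionary entry for DAG 1.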

Isomorphic Pattern Generation
- Edge Contraction
  - Add an Operation to a Pattern
  - Combine 2 Patterns into a Larger One
- Build a Subgraph Hierarchy (SH)

An SH Grammar
- The SH is also a DAG
- Generate a pattern T_k from sub-patterns T_i and T_j: Contract edge (T_i, T_j)
- Create a Production: T_k → T_i T_j
[Figure: SH over patterns T_1, T_2, T_3, T_4, labeled with the productions T_2 → x T_1 x and T_4 → T_3 T_2 x]

Derivations and Scheduling
[Figure: DAG on operations a, b, c, d, e, f, g with sub-patterns G_1 through G_7, and two derivation trees]
Grammar
- G_1 → G_2 G_3
- G_2 → G_4 b
- G_3 → G_5 g
- G_4 → a c
- G_5 → G_6 f
- G_5 → G_7 e
- G_6 → d e
- G_7 → d f
Derivations
- a c b d e f g (using G_5 → G_6 f)
- a c b d f e g (using G_5 → G_7 e)
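The correspondence between derivations and schedules can be reproduced mechanically. The sketch below encodes the grammar from this slide as a Python dictionary (the representation and the name `derive` are illustrative choices) and expands every derivation of G1 into its terminal string.

```python
# Productions from the slide; lowercase letters are operations (terminals).
GRAMMAR = {
    "G1": [["G2", "G3"]],
    "G2": [["G4", "b"]],
    "G3": [["G5", "g"]],
    "G4": [["a", "c"]],
    "G5": [["G6", "f"], ["G7", "e"]],
    "G6": [["d", "e"]],
    "G7": [["d", "f"]],
}

def derive(symbol):
    """Return every terminal string derivable from `symbol`."""
    if symbol not in GRAMMAR:            # a terminal: a single operation
        return [symbol]
    results = []
    for rhs in GRAMMAR[symbol]:
        partial = [""]
        for part in rhs:
            partial = [p + s for p in partial for s in derive(part)]
        results.extend(partial)
    return results

print(derive("G1"))
```

Each string is one schedule of the DAG; the two results differ only in which production is chosen for G5.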

Compatibility
- T_i, T_j: patterns; S_i, S_j: schedules for T_i, T_j
- Assume T_i is a Subgraph of T_j
- We want T_i and T_j to Share the Same Dictionary Entry
- Then S_i must be a Contiguous Subsequence of S_j
Example: with A: R1 ← R2 + R3, B: R4 ← R5 + R6, C: R7 ← R1 + R4, the schedule AC is a Contiguous Subsequence of BAC but not of ABC
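The compatibility condition is a plain contiguous-subsequence test on schedules. A small sketch, with the hypothetical helper name `is_contiguous_subsequence` and schedules written as strings of operation labels:

```python
def is_contiguous_subsequence(small, big):
    """True if schedule `small` appears as an unbroken run inside `big`."""
    n = len(small)
    return any(tuple(big[i:i + n]) == tuple(small)
               for i in range(len(big) - n + 1))

print(is_contiguous_subsequence("AC", "BAC"))
print(is_contiguous_subsequence("AC", "ABC"))
```

In ABC the operation B sits between A and C, so a codeword for the smaller pattern cannot point into that dictionary entry.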

Convex Cuts in DAGs
- Let G = (V, E) be a DAG
- A Cut is a Partition of V
- A Convex Cut cannot have edges that cross the boundary of the cut in BOTH directions
- SH Construction Ensures Convex Cuts
[Figure: a DAG, a Non-Convex Cut, and a Convex Cut / Scheduling]
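The convexity condition, as stated on this slide, can be checked directly on the edge list. The sketch assumes a two-way cut given by one side of the partition; `is_convex_cut` and the three-node chain are illustrative, not taken from the talk.

```python
def is_convex_cut(edges, part):
    """A cut (part, rest) is convex if edges cross it in one direction only."""
    part = set(part)
    forward = any(u in part and v not in part for u, v in edges)
    backward = any(u not in part and v in part for u, v in edges)
    return not (forward and backward)

chain = [("a", "b"), ("b", "c")]          # a -> b -> c
print(is_convex_cut(chain, {"a", "b"}))   # edges only leave the part
print(is_convex_cut(chain, {"a", "c"}))   # a -> b leaves, b -> c re-enters
```

When a cut is convex, one side can be scheduled entirely before the other, which is why the slide ties convexity to scheduling.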

Convex Cuts and Compatibility
- Productions: G_1 → G_2 G_3 and G_1 → G_4 G_5
- [Figure: DAG G_1 on operations a through g, shown with the cut (G_2, G_3) as G_1→(2,3), with the cut (G_4, G_5) as G_1→(4,5), and with both cuts as G_1→(2,3),(4,5)]
- G_1→(2,3),(4,5) contains a CYCLE!

Generalized Compatibility
Given a Set of Productions with G_1 on the LHS:
- G_1 → G_2 G_3
- G_1 → G_4 G_5
- …
- G_1 → G_{2k} G_{2k+1}
How can we Tell if they are Compatible?
Three Criteria Equivalent to Compatibility
1. G_1→(2,3),(4,5),…,(2k,2k+1) is Acyclic
2. G_2 ⊆ G_4 ⊆ … ⊆ G_{2k}
3. G_{2k+1} ⊆ … ⊆ G_5 ⊆ G_3
The Pragmatic Question: If all Productions are NOT Compatible, what is the Largest Compatible Subset?

The Subset/Subgraph View of Compatibility and Scheduling
[Figure: G_i nested inside G_j, with the remainder G_j − G_i; schedule S_i followed by S_{j-i}]
1. Construct a Schedule S_i for G_i
2. Construct a Schedule S_{j-i} for G_j − G_i
3. Construct a Schedule S_j = S_i S_{j-i} for G_j
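The three steps extend naturally to a chain of nested patterns: schedule the innermost pattern, then append a schedule for each successive difference. The sketch below uses a hypothetical four-operation DAG and assumes that no edge runs from a later difference back into an earlier pattern, which is what makes simple concatenation a valid schedule. The names `topo` and `nested_schedule` are illustrative.

```python
def topo(nodes, edges):
    """Kahn-style topological order of the subgraph induced by `nodes`."""
    nodes = set(nodes)
    preds = {v: {u for u, w in edges if w == v and u in nodes} for v in nodes}
    order = []
    while len(order) < len(nodes):
        ready = sorted(v for v in nodes
                       if v not in order and preds[v] <= set(order))
        if not ready:
            raise ValueError("cycle in induced subgraph")
        order.append(ready[0])            # alphabetical tie-break
    return order

def nested_schedule(chain, edges):
    """chain: nested vertex sets, smallest first. Returns S for the largest."""
    schedule, done = [], set()
    for g in chain:
        schedule += topo(set(g) - done, edges)   # schedule the difference
        done |= set(g)
    return schedule

edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
chain = [{"a"}, {"a", "b", "c"}, {"a", "b", "c", "d"}]
print(nested_schedule(chain, edges))
```

Every pattern in the chain ends up as a contiguous prefix of the final schedule, so all of them can share one dictionary entry.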

A Production Compatibility Graph
- Represent the Subgraph Relation as a DAG called the Production Compatibility Graph (PCG)
- Productions G_1 → G_i … and G_1 → G_j … create vertices G_i and G_j
- Add an Edge (G_i, G_j) to the PCG if
  1. G_i ⊂ G_j
  2. There is no G_k such that G_j ⊃ G_k ⊃ G_i
- Any PATH in the PCG Corresponds to a Subset of Patterns that can be Scheduled Contiguously within a Dictionary entry for G_1
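One way to picture the PCG is as the covering relation of proper-subset inclusion among the left-hand patterns. The sketch below treats each pattern as a set of operations and reads the edge rule as "G_i is a proper subset of G_j with nothing strictly in between"; the pattern contents are made up for illustration and `build_pcg` is a hypothetical name.

```python
def build_pcg(patterns):
    """patterns: name -> frozenset of operations. Returns PCG edges (i, j)."""
    edges = set()
    for i, gi in patterns.items():
        for j, gj in patterns.items():
            # Edge only if gi is strictly inside gj and no pattern lies between.
            if gi < gj and not any(gi < gk < gj for gk in patterns.values()):
                edges.add((i, j))
    return edges

patterns = {
    "G2": frozenset("a"),
    "G4": frozenset("ab"),
    "G6": frozenset("abc"),
    "G8": frozenset("d"),
}
print(sorted(build_pcg(patterns)))
```

G2 and G6 are not joined directly because G4 lies between them; the path G2, G4, G6 is the chain that can share a dictionary entry.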

PCG Example
[Figure: DAG G_1 on operations a through g, cut five ways into (G_2, G_3), (G_4, G_5), (G_6, G_7), (G_8, G_9), and (G_10, G_11); the resulting PCG has vertices G_8, G_2, G_4, G_6, G_10]

Algorithm Overview
- Recall that the Subgraph Hierarchy is a DAG
- Process SH Entries in Topological Order
  - All Sub-Patterns Processed Before Each Pattern
- Construct a PCG for each SH Entry
  - Assign Vertex Weights to Each Pattern based on the Number of Sub-Patterns in the Dictionary Entry
  - Find Max Vertex-Weighted Path in the PCG
- Determine the Maximum Gain Pattern in the SH
- Remove the Max Gain Pattern, and all Sub-Patterns Selected for its Dictionary Entry
- Repeat until the SH is Empty
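The "Max Vertex-Weighted Path" step is a standard dynamic program over a DAG in topological order. The sketch below is a generic version of that step only, with hypothetical weights; how the talk computes the weights and the gain is not reproduced here.

```python
def max_weight_path(weights, edges):
    """Return (path, total) for the heaviest vertex-weighted path in a DAG."""
    succs = {v: [] for v in weights}
    indeg = {v: 0 for v in weights}
    for u, v in edges:
        succs[u].append(v)
        indeg[v] += 1
    # Topological order via Kahn's algorithm.
    order, ready = [], [v for v in weights if indeg[v] == 0]
    while ready:
        u = ready.pop()
        order.append(u)
        for v in succs[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    # best[v]: weight of the heaviest path ending at v.
    best = dict(weights)
    prev = {v: None for v in weights}
    for u in order:
        for v in succs[u]:
            if best[u] + weights[v] > best[v]:
                best[v] = best[u] + weights[v]
                prev[v] = u
    # Walk back from the best endpoint to recover the path.
    node = max(best, key=best.get)
    total, path = best[node], []
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1], total

weights = {"G2": 1, "G4": 2, "G6": 3, "G8": 5}
edges = [("G2", "G4"), ("G4", "G6")]
print(max_weight_path(weights, edges))
```

The running time is linear in the number of PCG vertices and edges, so repeating it for each SH entry stays inexpensive.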

Experimental Framework
Algorithm Built into the Machine SUIF Compiler
1. Consolidate Each Application using link_suif Pass
   - All Unrolled Loops Manually Re-rolled
2. Standard Front End Compilation Script
   - One Round of Constant Folding/DCE
3. Instruction Selection for Alpha Architecture
   - ARM Back End Recently Released…
4. Detect Recurring Isomorphic Patterns in the IR
   - Analysis described in [Brisk et al., 2004]
5. Dictionary Construction as Described Here

Experimental Methodology
- Cannot Compare with Substring Matching
  - Many Schedules Exist for Each DAG
  - Substring Matching Assumes Scheduled Code
  - How to Determine the Best Schedule for Each DAG?
  - Our Algorithm Determines a Schedule for the Entire Set of DAGs to Maximize Pattern Overlap
- Naïve Approach: Each Pattern Gets Its Own Dictionary Entry
- Our Approach: Isomorphism/Scheduling

Experimental Results Applications Taken from MediaBench [Lee et al., 1997]

Compilation Time

Benchmark     Total (sec)   Dictionary (sec)   (%)
Epic                                           %
G.721                                          7.23%
GSM                                            2.44%
JPEG                                           4.45%
MPEG2 Dec                                      4.06%
MPEG2 Enc                                      3.06%
Pegwit                                         3.37%
PGP                                            2.85%
PGP (RSA)                                      5.74%
Rasta                                          4.81%

Conclusion
- Algorithm Given for Dictionary Construction
  - What Is Built is Actually an Intermediate Representation of a Dictionary
- Combination of 3 Classically Hard Problems
  - Graph/Subgraph Isomorphism
  - Scheduling
  - Dictionary Construction/Compression
- Future Work: Register Allocation and Assignment
  - Make a Best Effort to Assign Registers So that Isomorphic Patterns have Identical Register Usage

References
1. Brisk, P., Nahapetian, A., and Sarrafzadeh, M. Instruction Selection for Compilers that Target Architectures with Echo Instructions. SCOPES, 2004.
2. Fraser, C. W., Myers, E., and Wendt, A. Analyzing and Compressing Assembly Code. Symposium on Compiler Construction, 1984.
3. Cooper, K. D., and McIntosh, N. Enhanced Code Compression for Embedded RISC Processors. PLDI, 1999.
4. De Sutter, B., De Bus, B., and De Bosschere, K. Sifting out the Mud: Low-Level C++ Code Reuse. OOPSLA, 2002.
5. Debray, S., Evans, W., Muth, R., and De Sutter, B. Compiler Techniques for Code Compaction. TOPLAS, 2000.
6. Lee, C., Potkonjak, M., and Mangione-Smith, W. H. MediaBench: A Tool for Evaluating and Synthesizing Multimedia and Communications Systems. MICRO-30, 1997.

Questions?