Optimal Polynomial-Time Interprocedural Register Allocation for High-Level Synthesis Using SSA Form Philip Brisk Ajay K. Verma Paolo Ienne csda.

1 Optimal Polynomial-Time Interprocedural Register Allocation for High-Level Synthesis Using SSA Form
Philip Brisk Ajay K. Verma Paolo Ienne csda

2 Outline
Register allocation overview; interprocedural register allocation; related work; SSA form with launch and landing pads; optimal solution; experimental results; conclusion.

3 Modeling Register Allocation
For each procedure $P_i$, build the interference graph $G_i = (V_i, E_i)$. $V_i$ has one vertex for each variable, and $E_i$ has an edge between each pair of interfering variables. Two variables interfere if their lifetimes overlap. Compute the chromatic number $\chi(G_i)$; a color assignment is a register assignment. Computing the chromatic number is NP-complete in general.
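A minimal sketch of this model follows. It assumes that each variable's lifetime is a single half-open interval of instruction indices. That holds for straight-line code; with branches, a lifetime is a set of program points instead. The function name and the interval encoding are illustrative and are not taken from the paper.

```python
from itertools import combinations

# Assumption: a lifetime is one half-open interval [first_def, last_use)
# of instruction indices, as in straight-line code.
Lifetimes = dict[str, tuple[int, int]]


def build_interference_graph(lifetimes: Lifetimes):
    """Return (V, E): one vertex per variable, one edge per overlapping pair."""
    vertices = set(lifetimes)
    edges = set()
    for a, b in combinations(sorted(lifetimes), 2):
        (s1, e1), (s2, e2) = lifetimes[a], lifetimes[b]
        if s1 < e2 and s2 < e1:  # the two intervals overlap
            edges.add(frozenset((a, b)))
    return vertices, edges


# Hypothetical example: x spans 0..4, y spans 1..3, z spans 3..6.
V, E = build_interference_graph({"x": (0, 4), "y": (1, 3), "z": (3, 6)})
print(sorted(tuple(sorted(e)) for e in E))  # [('x', 'y'), ('x', 'z')]
```

Because the intervals are half-open, y (last used at 3) and z (defined at 3) do not interfere and could share a register.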

4 Local Interferences
Local interferences occur within a single procedure and are caused by overlapping lifetimes. In Static Single Assignment (SSA) form, the interference graph is chordal.
[Figure: example code over variables X, Y, Z and its interference graph]
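Chordal graphs can be colored optimally in polynomial time. A standard technique for this is maximum cardinality search (MCS) followed by greedy coloring in visit order. The slides do not say whether this is the technique used in this work. The sketch uses a simple quadratic MCS for brevity; a bucket-based MCS runs in O(|V| + |E|). The example graph is hypothetical.

```python
# Hypothetical adjacency-set representation: vertex -> set of neighbours.
Graph = dict[str, set[str]]


def mcs_order(adj: Graph) -> list[str]:
    """Maximum cardinality search: repeatedly visit the unvisited vertex with
    the most already-visited neighbours. For a chordal graph, the reverse of
    this order is a perfect elimination ordering."""
    weight = {v: 0 for v in adj}
    unvisited = set(adj)
    order = []
    while unvisited:
        v = max(unvisited, key=lambda u: (weight[u], u))  # ties broken by name
        order.append(v)
        unvisited.remove(v)
        for n in adj[v]:
            if n in unvisited:
                weight[n] += 1
    return order


def color_chordal(adj: Graph) -> dict[str, int]:
    """Greedy coloring in MCS order. For a chordal graph, each vertex's
    already-colored neighbours form a clique, so at most omega(G) = chi(G)
    colors are used."""
    color: dict[str, int] = {}
    for v in mcs_order(adj):
        used = {color[n] for n in adj[v] if n in color}
        color[v] = next(c for c in range(len(adj)) if c not in used)
    return color


# Hypothetical chordal graph: triangle X-Y-Z plus W attached to Z.
adj = {"X": {"Y", "Z"}, "Y": {"X", "Z"}, "Z": {"X", "Y", "W"}, "W": {"Z"}}
colors = color_chordal(adj)
print(len(set(colors.values())))  # 3
```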

5 Global Interferences
If variable V is live across a call to procedure P, then V interferes with every local variable in P and with all variables in all procedures reachable from P. All paths through the call graph must be considered.
Example: Main defines V, calls P, and then uses V; P calls Q. V is therefore live across the call to P and interferes with every variable in P and Q (call graph: Main → P → Q).
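The rule above can be sketched as follows. The sketch assumes that the call graph is a dict mapping each procedure to its callees and that each procedure's local variables are already known. It lists the global interferences of one variable that is live across one call. The helper names and the local variable names are hypothetical.

```python
CallGraph = dict[str, list[str]]


def reachable(call_graph: CallGraph, start: str) -> set[str]:
    """All procedures reachable from start in the call graph, start included."""
    seen: set[str] = set()
    stack = [start]
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(call_graph.get(p, []))
    return seen


def global_interferences(v: str, callee: str, call_graph: CallGraph,
                         locals_of: dict[str, set[str]]) -> set[tuple[str, str]]:
    """v is live across a call to callee, so it interferes with every
    variable of every procedure reachable from callee."""
    return {(v, w) for p in reachable(call_graph, callee) for w in locals_of[p]}


# Call graph Main -> P -> Q from the example; local names are hypothetical.
call_graph = {"Main": ["P"], "P": ["Q"], "Q": []}
locals_of = {"Main": {"V"}, "P": {"a"}, "Q": {"b"}}
print(sorted(global_interferences("V", "P", call_graph, locals_of)))
# [('V', 'a'), ('V', 'b')]
```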

6 Global Interferences and Recursion
Fact: no register can hold a local variable across a recursive function call; the runtime stack is required. There are some exceptions (e.g. static local variables), which are ignored here.
Call graph: compute the strongly connected components (SCCs) and collapse each SCC into a single node. The resulting "Augmented Component Graph" is acyclic.
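Collapsing SCCs is ordinary graph condensation. The sketch below uses Tarjan's algorithm; the slides do not name an algorithm. The example call graph is hypothetical, with A and B mutually recursive. Calls inside an SCC are recursive, so the variables live across them are handled by the runtime stack. The sketch only performs the collapse and does not model that.

```python
Graph = dict[str, list[str]]


def tarjan_sccs(graph: Graph) -> list[frozenset]:
    """Tarjan's algorithm; every vertex must appear as a key of graph."""
    index: dict[str, int] = {}
    low: dict[str, int] = {}
    stack: list[str] = []
    on_stack: set[str] = set()
    comps: list[frozenset] = []

    def visit(v: str) -> None:
        index[v] = low[v] = len(index)  # next unused DFS number
        stack.append(v)
        on_stack.add(v)
        for w in graph[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of an SCC
            comp = set()
            while True:
                w = stack.pop()
                on_stack.remove(w)
                comp.add(w)
                if w == v:
                    break
            comps.append(frozenset(comp))

    for v in graph:
        if v not in index:
            visit(v)
    return comps


def component_graph(graph: Graph) -> dict[frozenset, set[frozenset]]:
    """Collapse each SCC into one node; the resulting graph is acyclic."""
    comps = tarjan_sccs(graph)
    comp_of = {v: c for c in comps for v in c}
    dag: dict[frozenset, set[frozenset]] = {c: set() for c in comps}
    for v, callees in graph.items():
        for w in callees:
            if comp_of[v] != comp_of[w]:  # drop edges inside an SCC
                dag[comp_of[v]].add(comp_of[w])
    return dag


# Hypothetical call graph: A and B are mutually recursive.
cg = {"main": ["A"], "A": ["B"], "B": ["A", "C"], "C": []}
dag = component_graph(cg)
```

The recursive `visit` is fine for small graphs. Very deep call graphs would need an iterative version to stay within Python's recursion limit.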

7 Interprocedural Register Allocation
The Interprocedural Interference Graph (IIG) is an undirected graph $G = (V, E)$. $V$ contains all variables in all procedures, and $E$ contains both local and global interferences. The goal is to compute its chromatic number $\chi(G)$.

8 Related Work: Interprocedural Register Allocation in HLS
Color the IIG with a heuristic [Vemuri et al., TODAES ’02]: the IIG is large, so even polynomial-time heuristics are slow.
Scalable approach [Beidas and Zhu, ASP-DAC ’05]: color each procedure individually, using any heuristic and any intermediate representation, and propagate global interferences at call points. The IIG is never built.

9 Contribution: Interprocedural Register Allocation
An optimal, polynomial-time algorithm. It is scalable because the IIG is never built (if it were built, it would be chordal) and each procedure is colored individually; in SSA form, each procedure's interference graph is chordal.
It is a special case of [Beidas and Zhu, ASP-DAC ’05] that combines top-down color propagation, a novel SSA-based intermediate representation, and chordal color assignment with an offset.

10 Preallocation of Global Registers
Global registers hold variables that are live across procedure calls. How many do we need?
Notation: $P$ is the set of procedures in the application, and $P_i$ is a procedure. A call point $c_k$ in $P_i$ that calls $P_j$ is written "$c_k$: Call $P_j$". $L(c_k)$ is the set of variables live across $c_k$.

11 Preallocation of Global Registers
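Compute $\delta$, the number of variables live at the entry of a procedure or across a call point, top-down over the acyclic call graph.
At the entry of a procedure $P_i$ that is called from call points $c_1, \dots, c_m$ (i.e. over all points that call $P_i$): $\delta_i = \max_{1 \le k \le m} \delta_k$.
Across a call point $c_k$ located in $P_i$, where $\delta_i$ is already known: $\delta_k = \delta_i + |L(c_k)|$.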

12 Example
Call points are numbered $c_7$ to $c_{14}$ so that their $\delta$ values do not clash with those of procedures $P_1$ to $P_6$. Each entry below gives caller → callee and $|L(c_k)|$: $c_7$: $P_1 \to P_2$, 1; $c_8$: $P_1 \to P_2$, 2; $c_9$: $P_1 \to P_3$, 3; $c_{10}$: $P_1 \to P_4$, 2; $c_{11}$: $P_1 \to P_6$, 5; $c_{12}$: $P_2 \to P_5$, 3; $c_{13}$: $P_3 \to P_5$, 3; $c_{14}$: $P_4 \to P_6$, 2.
$\delta_1 = 0$
$\delta_7 = |L(c_7)| + \delta_1 = 1$, $\delta_8 = |L(c_8)| + \delta_1 = 2$, $\delta_9 = |L(c_9)| + \delta_1 = 3$, $\delta_{10} = |L(c_{10})| + \delta_1 = 2$, $\delta_{11} = |L(c_{11})| + \delta_1 = 5$
$\delta_2 = \max\{\delta_7, \delta_8\} = \max\{1, 2\} = 2$
$\delta_3 = \max\{\delta_9\} = \max\{3\} = 3$
$\delta_4 = \max\{\delta_{10}\} = \max\{2\} = 2$
$\delta_{12} = |L(c_{12})| + \delta_2 = 3 + 2 = 5$
$\delta_{13} = |L(c_{13})| + \delta_3 = 3 + 3 = 6$
$\delta_{14} = |L(c_{14})| + \delta_4 = 2 + 2 = 4$
$\delta_5 = \max\{\delta_{12}, \delta_{13}\} = \max\{5, 6\} = 6$
$\delta_6 = \max\{\delta_{11}, \delta_{14}\} = \max\{5, 4\} = 5$

13 Preallocation of Global Registers
$N = \max_{P_i \in P} \delta_i$ is the number of global registers allocated, $T = \{T_1, \dots, T_N\}$.
When procedure $P_i$ is called, at most $\delta_i$ variables are live across the calls leading to $P_i$. This holds for every path in the call graph.
How do we ensure that all variables live across calls leading to $P_i$ are assigned to the right register?
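The $\delta$ recurrence and $N$ can be sketched as follows, assuming the call graph has already been made acyclic. The call points and $|L(c_k)|$ values below are read off the slide-12 equations. Each $\delta_k = |L(c_k)| + \delta_i$ names the caller of $c_k$, and each $\delta_j = \max\{\dots\}$ names the call points that call $P_j$. `compute_deltas` and the tuple encoding are hypothetical. The sketch uses Python's `graphlib` module, which requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# A call point c_k: (name, caller P_i, callee P_j, |L(c_k)|).
CallPoint = tuple[str, str, str, int]


def compute_deltas(procs: list[str], calls: list[CallPoint]):
    """Top-down over an acyclic call graph (SCCs already collapsed):
         delta_k = delta_i + |L(c_k)|   for call point c_k located in P_i
         delta_j = max_k delta_k        over all call points that call P_j
       A procedure with no callers (e.g. main) gets delta = 0."""
    callers: dict[str, set[str]] = {p: set() for p in procs}
    for _, caller, callee, _ in calls:
        callers[callee].add(caller)
    delta: dict[str, int] = {}
    delta_call: dict[str, int] = {}
    # static_order() yields every caller before its callees.
    for p in TopologicalSorter(callers).static_order():
        incoming = [c for c in calls if c[2] == p]
        for name, caller, _, live in incoming:
            delta_call[name] = delta[caller] + live
        delta[p] = max((delta_call[c[0]] for c in incoming), default=0)
    return delta, delta_call


procs = ["P1", "P2", "P3", "P4", "P5", "P6"]
calls = [("c7", "P1", "P2", 1), ("c8", "P1", "P2", 2), ("c9", "P1", "P3", 3),
         ("c10", "P1", "P4", 2), ("c11", "P1", "P6", 5), ("c12", "P2", "P5", 3),
         ("c13", "P3", "P5", 3), ("c14", "P4", "P6", 2)]
delta, delta_call = compute_deltas(procs, calls)
N = max(delta.values())  # number of global registers T_1..T_N
print(N)  # 6
```

Scanning the whole call list for each procedure keeps the sketch short. An adjacency index from callee to call points would make it linear in the size of the call graph.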

14 Launch and Landing Pads
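Suppose procedure $P_i$ calls $P_j$, and let $m = \delta_i$. The variables live across calls leading to $P_i$ are assigned to $T_1, \dots, T_m$. Let $c_k$ be the call point and $n = |L(c_k)|$.
Launch pad: a parallel copy placed before the call, $(T_{m+1}, \dots, T_{m+n}) \leftarrow \psi(L(c_k))$.
Landing pad: a parallel copy placed after the call that copies the values back, $L(c_k) \leftarrow \psi(T_{m+1}, \dots, T_{m+n})$.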
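The launch/landing-pad transformation can be sketched over a hypothetical tuple-based IR, in which a parallel copy is a single `("pcopy", destinations, sources)` instruction. The usage mirrors the first call in procedure A of slide 17. It assumes $\delta = 0$ for A, which is consistent with V being copied into $T_1$ there.

```python
# Hypothetical IR: ("def", v), ("use", v), ("call", p), ("pcopy", dsts, srcs).
Instr = tuple


def insert_pads(block: list[Instr], call_index: int,
                live_across: list[str], m: int) -> list[Instr]:
    """Wrap the call at block[call_index] with a launch pad and a landing pad.
    live_across is L(c_k) in a fixed order (n = len(live_across)); m = delta_i
    of the calling procedure, so T_1..T_m are already in use."""
    n = len(live_across)
    regs = tuple(f"T{m + j}" for j in range(1, n + 1))  # T_{m+1} .. T_{m+n}
    launch = ("pcopy", regs, tuple(live_across))   # (T_{m+1}..T_{m+n}) <- psi(L(c_k))
    landing = ("pcopy", tuple(live_across), regs)  # L(c_k) <- psi(T_{m+1}..T_{m+n})
    return (block[:call_index] + [launch, block[call_index], landing]
            + block[call_index + 1:])


block = [("def", "V"), ("call", "B"), ("use", "V")]
print(insert_pads(block, 1, ["V"], 0))
# [('def', 'V'), ('pcopy', ('T1',), ('V',)), ('call', 'B'), ('pcopy', ('V',), ('T1',)), ('use', 'V')]
```

The landing pad reuses the original variable names, as slide 17 does. A strict SSA implementation would give the landing-pad definitions fresh names.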

15 Theoretical Consequences of Launch and Landing Pads
Theorem: every global interference involves at least one global register.
Corollary: local variables in distinct procedures do not interfere, and no local variable in main has a global interference.
Every variable defined locally in $P_i$ (with $m = \delta_i$) interferes with global registers $T_1, \dots, T_m$ but does not interfere with global registers $T_{m+1}, \dots, T_N$. Therefore, local variables in $P_i$ can be assigned to global registers $T_{m+1}, \dots, T_N$.

16 Reducing the Chromatic Number
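Procedure A:
    V ← …
    Call B
    W ← …
    … ← V
    X ← …
    … ← W
    Y ← …
    … ← X
    Call B
    … ← Y
Procedure B:
    Z ← …
    … ← Z
[Figure: interference graph over V, W, X, Y, Z]
V and Y are live across calls to B, so each interferes with Z. Together with the local interferences V-W, W-X and X-Y, this forms an odd cycle of length 5, which is not chordal.
Chromatic Number = 3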

17 Reducing the Chromatic Number
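Procedure A:
    V ← …
    T1 ← Ψ(V)
    Call B
    V ← Ψ⁻¹(T1)
    W ← …
    … ← V
    X ← …
    … ← W
    Y ← …
    … ← X
    T1 ← Ψ(Y)
    Call B
    Y ← Ψ⁻¹(T1)
    … ← Y
Procedure B:
    Z ← …
    … ← Z
[Figure: interference graph over V, W, X, Y, Z and T1]
Only T1 is now live across the calls to B, so Z interferes with T1 instead of with V and Y, and the cycle is broken.
Chromatic Number = 2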

18 Characterizing the IIG
Theorem: $T$ is a clique in the IIG, and the IIG is chordal. The chromatic number of the IIG is
$R = \max_{P_i \in P} \{\delta_i + \chi(G_i)\}$

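19 Example
$\delta_1 = 0$, $\delta_2 = 2$, $\delta_3 = 3$, $\delta_4 = 2$, $\delta_5 = 6$, $\delta_6 = 5$
[Figure: the global registers $T$ form a clique; each global register $T_j$ with $j \le \delta_i$ interferes with every local variable in $G_i$]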

20 Coloring Algorithm
Use SSA form with launch and landing pads (SSA+LLP), but do not build the IIG. For procedure $P_i$, colors $1, \dots, \delta_i$ are unavailable. Color the local (chordal) interference graph $G_i$ of $P_i$ in $O(|V_i| + |E_i|)$ time. Then, for each vertex in $P_i$, replace color $c$ with $c + \delta_i$ in $O(|V_i|)$ time.
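The offset step and the register count $R = \max_{P_i \in P} \{\delta_i + \chi(G_i)\}$ from slide 18 can be sketched as follows. The sketch assumes each procedure's local graph has already been colored optimally with colors $1, \dots, \chi(G_i)$, for example by a chordal coloring. The $\delta$ values are taken from the slide-19 example; the local colorings are hypothetical.

```python
def offset_colors(local_colors: dict[str, dict[str, int]],
                  delta: dict[str, int]) -> dict[tuple[str, str], int]:
    """local_colors[p] is an optimal coloring of G_p using colors 1..chi(G_p).
    Shift each color c to c + delta[p], so colors 1..delta[p], which belong to
    the global registers T_1..T_delta[p], are never used by p's locals."""
    return {(p, v): c + delta[p]
            for p, coloring in local_colors.items()
            for v, c in coloring.items()}


def registers_needed(local_colors: dict[str, dict[str, int]],
                     delta: dict[str, int]) -> int:
    """R = max over procedures P_i of delta_i + chi(G_i); a procedure with
    no locals contributes delta_i alone."""
    return max(delta[p] + max(local_colors.get(p, {}).values(), default=0)
               for p in delta)


# Hypothetical local colorings; delta values from the slide-19 example.
delta = {"P1": 0, "P2": 2}
local_colors = {"P1": {"a": 1, "b": 2}, "P2": {"x": 1, "y": 2, "z": 1}}
print(offset_colors(local_colors, delta))
print(registers_needed(local_colors, delta))  # 4
```

Reading $\chi(G_i)$ as the largest color used is only valid under the stated assumption that each local coloring is optimal and uses the contiguous range $1, \dots, \chi(G_i)$.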

21 Experiments
Applications were taken from MediaBench and MiBench, written in C, and compiled using Machine SUIF.
The optimal color assignment is compared to two heuristics: color palette propagation, top-down and bottom-up [Beidas and Zhu, ASP-DAC ’05], and heuristic color assignment [Matula and Beck, JACM ’83].

22 Registers Allocated (Normalized to Optimal)

23 Runtime (Normalized to Optimal)

24 Runtime of Pegwit (Normalized to Optimal)

25 Limitations
Global variables: interfere with all variables in the program; their lifetimes can still be analyzed.
Static local variables: initialized on first access; hold their values across function calls.
Function pointers: resolution is NP-complete.

26 Conclusion
Interprocedural register allocation in HLS: an optimal, polynomial-time algorithm that uses SSA form with launch/landing pads. The IIG is a chordal graph. The approach is scalable, since there is no need to build the IIG, and it is significantly faster than sub-optimal heuristics.
A few limitations remain: global variables, static local variables, and function pointers (whose resolution is NP-complete).

27 Related Work: Register Allocation in HLS
Clique partitioning/coloring problem [Tseng and Siewiorek, ’86].
Scheduled DFGs: interval graphs [Kurdahi and Parker, ’87].
Scheduled cyclic DFGs: circular-arc graphs (NP-complete) [Stok, ’92].
Restrictions on variable lifetimes: chordal graphs [Springer and Thomas, ’94].
Static single assignment form: chordal graphs [Brisk et al. 2005/6], [Hack and Goos, 2005/6], [Bouchez et al. 2005].

