Temporal Placement.

Slides:

Advertisements

Similar presentations

Mathematical Preliminaries

Advertisements

Variations of the Turing Machine

Introduction to Algorithms

EE384y: Packet Switch Architectures

1 Concurrency: Deadlock and Starvation Chapter 6.

Constraint Satisfaction Problems

1 Vorlesung Informatik 2 Algorithmen und Datenstrukturen (Parallel Algorithms) Robin Pomplun.

Cognitive Radio Communications and Networks: Principles and Practice By A. M. Wyglinski, M. Nekovee, Y. T. Hou (Elsevier, December 2009) 1 Chapter 12 Cross-Layer.

UNITED NATIONS Shipment Details Report – January 2006.

Dynamic Programming Introduction Prof. Muhammad Saeed.

Introduction to Algorithms

and 6.855J Spanning Tree Algorithms. 2 The Greedy Algorithm in Action

1 Chapter 12 File Management Patricia Roy Manatee Community College, Venice, FL ©2008, Prentice Hall Operating Systems: Internals and Design Principles,

Polygon Scan Conversion – 11b

Data Visualization Lecture 4 Two Dimensional Scalar Visualization

GR2 Advanced Computer Graphics AGR

The National Certificate in Adult Numeracy

1 Outline relationship among topics secrets LP with upper bounds by Simplex method basic feasible solution (BFS) by Simplex method for bounded variables.

Robust Window-based Multi-node Technology- Independent Logic Minimization Jeff L.Cobb Kanupriya Gulati Sunil P. Khatri Texas Instruments, Inc. Dept. of.

Discrete time Markov Chain

THERMAL-AWARE BUS-DRIVEN FLOORPLANNING PO-HSUN WU & TSUNG-YI HO Department of Computer Science and Information Engineering, National Cheng Kung University.

Chapter 4: Informed Heuristic Search

Chapter 5 : Memory Management

Data Structures ADT List

Data Structures Using C++

Chapter 10: Virtual Memory

Outline Minimum Spanning Tree Maximal Flow Algorithm LP formulation 1.

การทดสอบโปรแกรม กระบวนการในการทดสอบ

Name Convolutional codes Tomashevich Victor. Name- 2 - Introduction Convolutional codes map information to code bits sequentially by convolving a sequence.

David Luebke 1 8/25/2014 CS 332: Algorithms Red-Black Trees.

Routing and Congestion Problems in General Networks Presented by Jun Zou CAS 744.

1 Modeling and Simulation: Exploring Dynamic System Behaviour Chapter9 Optimization.

CSE 4101/5101 Prof. Andy Mirzaian. Lists Move-to-Front Search Trees Binary Search Trees Multi-Way Search Trees B-trees Splay Trees Trees Red-Black.

Scalable and Dynamic Quorum Systems Moni Naor & Udi Wieder The Weizmann Institute of Science.

1 Motion and Manipulation Configuration Space. Outline Motion Planning Configuration Space and Free Space Free Space Structure and Complexity.

6.4 Best Approximation; Least Squares

1 On c-Vertex Ranking of Graphs Yung-Ling Lai & Yi-Ming Chen National Chiayi University Taiwan.

Chapter 2 Entity-Relationship Data Modeling: Tools and Techniques

Chapter 10: The Traditional Approach to Design

10 -1 Chapter 10 Amortized Analysis A sequence of operations: OP 1, OP 2, … OP m OP i : several pops (from the stack) and one push (into the stack)

Systems Analysis and Design in a Changing World, Fifth Edition

©Brooks/Cole, 2001 Chapter 12 Derived Types-- Enumerated, Structure and Union.

Local Search Jim Little UBC CS 322 – CSP October 3, 2014 Textbook §4.8

CSE Lecture 17 – Balanced trees

PSSA Preparation.

Foundations of Data Structures Practical Session #7 AVL Trees 2.

Mani Srivastava UCLA - EE Department Room: 6731-H Boelter Hall Tel: WWW: Copyright 2003.

Simple Linear Regression Analysis

Introduction into Simulation Basic Simulation Modeling.

Instructor: Shengyu Zhang 1. Content Two problems  Minimum Spanning Tree  Huffman encoding One approach: greedy algorithms 2.

CPSC 322, Lecture 5Slide 1 Uninformed Search Computer Science cpsc322, Lecture 5 (Textbook Chpt 3.5) Sept, 14, 2012.

Bart Jansen 1.  Problem definition  Instance: Connected graph G, positive integer k  Question: Is there a spanning tree for G with at least k leaves?

1 Decidability continued…. 2 Theorem: For a recursively enumerable language it is undecidable to determine whether is finite Proof: We will reduce the.

The Pumping Lemma for CFL’s

Minimum Vertex Cover in Rectangle Graphs

Probabilistic Reasoning over Time

Section 3.4 The Traveling Salesperson Problem Tucker Applied Combinatorics By Aaron Desrochers and Ben Epstein.

Optimal Rectangle Packing: A Meta-CSP Approach Chris Reeson Advanced Constraint Processing Fall 2009 By Michael D. Moffitt and Martha E. Pollack, AAAI.

Chapter 7 Memory Management Operating Systems: Internals and Design Principles, 6/E William Stallings Dave Bremer Otago Polytechnic, N.Z. ©2009, Prentice.

Chapter 7 Memory Management

Dynamic NoC. 2 Limitations of Fixed NoC Communication NoC for reconfigurable devices:  NOC: a viable infrastructure for communication among task dynamically.

K. Bazargan R. KastnerM. Sarrafzadeh Physical Design for Reconfigurable Computing Systems using Firm Templates Department of Electrical & Computer Engineering.

Chapter 7 Memory Management

 Optimal Packing of High- Precision Rectangles By Eric Huang & Richard E. Korf 25 th AAAI Conference, 2011 Florida Institute of Technology CSE 5694 Robotics.

Chapter 7 Memory Management

Partial Reconfigurable Designs

Chapter 2 Memory and process management

Presentation transcript:

Temporal Placement

Wasted Resources Wasted resource wr(vi) of a node vi: Unused area occupied by the node vi during the computation of a partition. wr(vi) = (t(Pi)−ti)×ai, t(Pi): run-time of partition Pi. ti: run-time of the component vi ai: area of vi Wasted resource avoidance: A component is placed on the chip only when its computation is required and remains on the device only for time it is active.  Idle components can be replaced by new ones. Assumptions: There is partial reconfiguration support. Blocks are hard For soft library modules, temporal partitioning + 2d placement may suffice.

Temporal Placement A horizontal cut: RFU configuration at a given time Time-dependent placement Management of task execution at run-time Graphical Representation: Z axis: life time of a configuration. Some overlaps (resource sharing) in 2-D is allowed: If not at the same time. A horizontal cut: RFU configuration at a given time

Temporal Placement Off-line (compile time) On-line (during execution) Pre-defined placement sequence In DSP applications, program flow can be predicted. On-line (during execution) Computation sequence is not known in advance.  Dynamically at run time. Needs to solve computational intensive problems in a fraction of millisecond: Efficient management of the free space Selection of the best site for a new component Management of communication

Off-Line Temporal Placement Given a DFG G = (V,E) and a reconfigurable device H (with length Hx and width Hy), a temporal placement is a 3-dimensional vector function: p = (px, py, pt) : V → R3, pt defines a feasible schedule (pt(vi): Starting time of vi.) px(vi), py(vi): Coordinates of vi on H,

Off-Line Temporal Placement [Bazargan]

Off-Line Temporal Placement Algorithms: First fit Best fit Packing Simulated annealing (KAMER)

First/Best Fit Algorithm First fit: Select a node among ready nodes. i.e. nodes whose predecessors have been placed Place it in the first free location. Advantages: Fast (linear w.r.t. number of free locations) Disadvantages: Unused resources very high Fragmentation

First/Best Fit Algorithm Select a node among ready nodes. Place it in the best free location. Advantages: Better area utilization Disadvantages: Much slower: Complete list of free locations must be searched. Fragmentation Simplifying the problem: Restricting the modules to columns (Virtex II) or part of a column (Virtex 4 / Virtex 5 / Virtex 6). Frames CLBs

First Fit Algorithm Algorithm: First-fit temporal placement of clusters 1. While all the clusters are not placed do 1.1. Select one cluster Cact from the list of ready-to-run clusters 1.2. From the clusters already placed, select the cluster Ctop with the smallest finishing-time such that Ctop ≤ Cact 1.3. Place the cluster Cact on top of Ctop 2. end while

First Fit Algorithm Configuration sequence produced: {0, 1, 2, 3, 4} {5, 1, 2, 3, 4} {5, 1 , 6, 7, 4} {5, 1 , 6, 7, 8} {5, 9 , 6, 7, 8} {10, 9 , 6, 7, 8}

ILP ILP: Can construct a constraint set An objective function Advantage: Exact solution Limitation: Only for small size problems.

Packing Approach Variations of Packing Problem: Base Minimization Problem (BMP): Given a set of boxes B and a height H, find a container with minimal size (x, y, H) that can accommodate the set of boxes B i.e. in temporal placement: Find the device with minimum size (x, y) on which a set of components can be implemented given an overall run-time t = H. Strip Packing Problem (SPP): Given a set of boxes and a base (X, Y), find the minimum height h container with size (X, Y, h) that can hold all the boxes. i.e. in temporal Find the minimum run-time t = h for a set of components given a reconfigurable device with size (X, Y). The device is usually fixed  SPP is more common.

KAMER

KAMER Model of an RCS Larger picture: An RFU operation ri can be either accepted or rejected based on availability of RFU real-estate If rejected, run on CPU  running time penalty Assumption: No communications among RFUs After running an operation on RFU, results are saved in CPU registers

On-line temporal placement KAMER Model of an RCS On-line temporal placement RFUOPS = {r1, …, rn| ri = (wi, hi, si, ei)} ei - si : time span the operation is resident in the system Placement of RFUOPS: Modeled as 3D template placement Empty Rectangle (ER): A rectangle that does not overlap a placed module on the chip Maximum Empty Rectangle (MER): An ER not included in any other rectangle than the device bounding box

Example Non-ER: (E,F) ER: (E,D): MER (A,D): MER (E,C): not MER KAMER method: keeping all maximum empty rectangles: Permanently keeps track of all MERs Whenever a request for placing a component v arrives, the list of MERs is searched for a rectangle that can accommodate v. Possible to have many MERs in which v can fit  Strategies: first-fit or best-fit

KAMER Once a rectangle is chosen: Problem: Candidate points are those that do not allow an overlap with the external part of the rectangle. Problem: Number of empty rectangles does not grow linear with the number of components included

KAMER Run-time of free rectangle placement: O(n2) Heuristic: Keep only the non-overlapping empty rectangles  Linear time, lower quality Problem 1: Several non-overlapping rectangle representations

Keeping Non-Overlapping ERs Problem 2: Non-overlapping empty rectangles are not necessarily maximal  A module may exists that could fit onto the device, but cannot be placed Can’t fit (bad decomposition) Can fit

Keeping Non-Overlapping ERs Problem 2: Can’t fit Can fit

Keeping Non-Overlapping ERs Incremental update: Whenever a new component v1 is placed in a rectangle, two possibilities: Horizontal splitting Vertical splitting Negative impact on next components

Keeping Non-Overlapping ERs Horizontal Can’t fit Vertical Solution: Delaying the split decision for a number of steps later Can fit

Keeping Non-Overlapping ERs KAMER always finds a rectangle to place a new component, if one exists. But position to place the component within the rectangle must be selected from a set of points whose number is the area of the rectangle in worst case In [Bazargan00]: Bottom-left position Note: No communication is assumed between the rectangle to be placed and those already placed. If communication exists: An optimal algorithm should consider any single point Solution: [Ahmadinia04]

KAMER Off-Line Placement Uses simulated annealing in a 3D space: Places X% largest modules X is a parameter of the algorithm Idea: Remove small modules  Open space for larger ones  Small modules fragment more Initial placement: from KAMER- best fit Perturb: Can displace a module on the RFU No change in start or end time Can move an operation from CPU to RFU and vice versa Use zero-temperature SA for fine tuning Place as many (100-X)% modules as it can

Packing Classes Interval Graph: Given a DFG G = (V,E), a reconfigurable device H and a packing of V into H, (i.e. a 3-D placement of the components of V on the device H), an interval graph of G is a graph Gi = (V,Ei), i  {1, 2, 3} such that: (vk, vl)  Ei  the projections of the nodes vk and vl overlap in the i-th dimension. Complement Graph: Complement of interval graph: Gi = (V,Ei) The comparability graph of a partially ordered set P = (X, <=) is the graph with vertex set X for which vertices x and y are adjacent iff either x<=y or y<=x in P . In graph theory, a comparability graph is an undirected graph that connects pairs of elements that are related to each other in a partial order. Comparability graphs have also been called transitively orientable graphs, partially orderable graphs, and containment graphs.[1] If (S,<) is a partial order, one can define an undirected graph in which the vertices are the elements of S and in which there exists an undirected edge connecting any pair of vertices (x,y) such that either x < y or y < x. Such a graph is called a comparability graph. Equivalently, a comparability graph is a graph that has a transitive orientation.[2] Such an orientation consists of an assignment of a direction to each edge of the graph such that the resulting directed graph satisfies a transitive law: whenever there exist directed edges (x,y) and (y,z), there must exist an edge (x,z).

Packing Class Packing class: A d-tuple of interval graphs with the following properties: C1: Each independent set S  Gi is i-feasible i {1,2,3} (wi(S) = vS wi(v)  hi) (each set of boxes must fit in the container in the i-th direction) Independent Set S: v,w  S, v overlaps with w in dimension (3-i), i{1,2} C2: i=1...d Ei = . There must be at least one dimension in which the boxes do not overlap. The Gi are called component graphs of E = {E1, E2, E3}. wi = length of vi in i-th direction

Packing Class

Packing Class Invalid two-dimensional packing:

Packing Class Theorem: A d-tuple of graphs Gi(Vi,Ei) corresponds to a feasible packing, if and only if it is a packing class. Comparability Graph: Given a DFG G = (V,E), a reconfigurable device H and a packing of V into H, The comparability graph of an interval graph Gi = (V,Ei), i  {1, 2} is the directed graph Gi = (V,Ei), where (vk, vl)  Ei  vl is ‘placed after’ vk in 3-rd dimension, i.e. (pt(vk) + tk ≤ pt(vl)). The relation ‘place after’ defined by the comparability graph is a transitive relation also known as transitive orientation that can be used for orienting the packing classes.

Packing Class Orientation Given G(V, E), H and packing of V on H, orientation of the packing class corresponding to the packing is defined by constructing the comparability graph of the interval graph in the time dimension.

Algorithm Can check if a given arrangement of nodes forms a valid packing with some precedence constraints fulfilled. Build arrangements on which we can apply the formulas to check whether the arrangement is a solution. If we have a set of available solutions, we need to extract the optimal one. Arrangements can be constructed by arbitrarily fixing/changing the edges of the different graphs as well as their orientations. solution space: The set of all possible arrangements that can be built. Can be extremely large.

Algorithm Incremental approach: Constructs a tree from the root to the leaf: Root consists of the set of available nodes. Inserts edges to the already existing graph Two different insertions of edges on the same node  two different branches in the tree. In each step: A new edge is included into the graphs, The validity of the resulting placement is checked. If the resulting placement not valid, then all possible arrangement in which the introduced edges exist are discarded from the search. Otherwise, a new edge is added to the constructed graph. Whenever a branch does not lead to valid leaf, the complete subtree resulting from the introduction of a new branch is discarded.

References [Bobda07] C. Bobda, “Introduction to Reconfigurable Computing: Architectures, Algorithms and Applications,” Springer, 2007. [Hauck08] S. Hauck, A. DeHon, "Reconfigurable Computing: The Theory and Practice of FPGA-Based Computation" Morgan-Kaufmann, 2007 [Mehdipour06] F. Mehdipour*, M. Saheb Zamani, M. Sedighi, “An integrated temporal partitioning and physical design framework for static compilation of reconfigurable computing systems,” Journal of Microprocessors and Microsystems, Elsevier, v30, 2006, pp. 52–62. [Bazargan00] K. Bazargan, R. Kastner and M. Sarrafzadeh, "Fast Template Placement for Reconfigurable Computing Systems," IEEE Design and Test - Special Issue on Reconfigurable Computing, January-March 2000. [Bazargan] Bazargan, “Fast Placement Methods for Reconfigurable Computing Systems,” http://www.ece.umn.edu/~kia/Research/Pre2000/RCSPlace/ [Ahmadinia04] A. Ahmadinia, C. Bobda, M. Bednara, and J. Teich, “A new approach for on-line placement on reconfigurable devices,” in Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS) / Reconfigurable Architectures Workshop (RAW), 2004.