Redundancy-Aware, Fault-Tolerant Clustering


1 Redundancy-Aware, Fault-Tolerant Clustering
Jason Cong and Brian Tagiku
VLSI CAD Lab, Computer Science Department
University of California, Los Angeles

2 Overview of IC-DFN Efforts at UCLA
Synthesis for higher level of abstraction
Architecture and synthesis for nanoFPGAs (jointly with Prof. Tim Cheng, Evelyn Hu, and Kang Wang)
Synthesis for error-resilient designs
11/30/2018 UCLA VLSICAD LAB

3 xPilot: Platform-Based Synthesis System
[Figure: xPilot flow. Inputs: SystemC/C/MMM; Platform Description & Constraints. Stages: xPilot Front End; Profiling, Analysis, Mapping on SSDM (System-Level Synthesis Data Model); Processor & Architecture Synthesis; Interface Synthesis; Behavioral Synthesis. Outputs: Custom Logic; Processor Cores + Executables; Drivers + Glue Logic. Target: FPSoC.]
Recent Progress on xPilot
  Refined MMM-to-SSDM translation
  Efficient and versatile scheduling engine based on a system of difference constraints (DAC’06)
  Communication-centric binding based on a distributed register file microarchitecture (ICCAD’06)
  Behavior-and-communication co-optimization for interface synthesis (DAC’06)
Design drivers
  Motion-JPEG
  MPEG4 simple profile video decoder
  Hybrid approach on Xilinx XUP board: Microblaze (or PowerPC) + HW synthesized blocks
Uniqueness of xPilot
  Platform-based synthesis and optimization
  Communication-centric synthesis

4 MPEG-4 Simple Profile Decoder: Synthesis Results
Complexity of synthesized RTLs

Module Name      Orig. C Source File (line#)    VHDL line#
Copy Controller  copyControl.c (287)            2815
Motion Comp.     Motion-compensation.c (312)    4681
Parser/VLD       bitstream.c (439)              6093
                 motion_decode.c (492)          10934
                 parser.c (1095)                12036
                 texture_vld.c (504)            6089
Texture/IDCT     texture_idct.c (1819)          11537
Texture Update   textureUpdate.c (220)          2736
Total            5168                           56921

Setting: 30fps, Device: v2p30

Module                          Slices  BRAMs  Period (ns)
Parser/VLD                      2693    16     8.0
Motion Comp.                    899
Texture/IDCT                    2032
Texture Update & Copy Control   1407

5 Updated Results on Motion-JPEG Example
Platform: Xilinx XUP Board
Input: RAW Images. Output: Encoded JPEG Images.
Model #1: 5 Microblazes, FSL-based communication
  Blocks: Preprocess, DCT, Quant, Huffman, Table Modification
Model #2: 4 Microblazes + DCT on FPGA fabrics
  Blocks: Preprocess, HW-DCT, Quant, Huffman, Table Modification

System    Cycle#   Fmax (MHz)  Exe Time (ms)  Area (Slice#)
Model #1  23812    126         0.189          4306
Model #2  (-38%)               0.117          6345

FSL-based communication is a major performance overhead
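
As a quick consistency check on the Model #1 row, the reported execution time matches the cycle count divided by the clock frequency. The sketch below assumes Exe Time = Cycle# / Fmax, which the slide does not state explicitly; the function names are illustrative only.

```python
def exe_time_ms(cycles: int, fmax_mhz: float) -> float:
    """Execution time in milliseconds, assuming time = cycles / frequency."""
    return cycles / (fmax_mhz * 1e6) * 1e3


def relative_change_percent(new: float, old: float) -> float:
    """Signed percentage change from old to new."""
    return (new - old) / old * 100.0


# Model #1 figures taken from the slide's table.
model1_ms = exe_time_ms(23812, 126)
# Model #2 execution time relative to Model #1.
change = relative_change_percent(0.117, 0.189)
print(f"{model1_ms:.3f} ms, {change:.0f}%")  # prints "0.189 ms, -38%"
```

Under this assumption the 38% reduction applies to execution time, and it comes with roughly 47% more slices (6345 versus 4306).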

6 Overview of IC-DFN Efforts at UCLA
Synthesis for higher level of abstraction
Architecture and synthesis for nanoFPGAs (jointly with Prof. Tim Cheng, Evelyn Hu, and Kang Wang)
Synthesis for error-resilient designs
  Redundancy-aware, fault-tolerant clustering

7 Hierarchical FPGAs
2-level, hierarchical circuit logic
  Level 1 – LUTs
  Level 2 – Clusters of LUTs
  Higher levels (clusters of clusters) also possible
Uses locality of interconnections to improve circuit performance

8 Redundancy in FPGAs
LUTs can fail with some probability
Allocate extra components (e.g. LUTs) into the system
Re-route inputs and outputs to a spare LUT
Ideally, the spare LUT should be close to the failure so that delay does not increase

13 Motivational Example
4 LUTs (each of delay 1)
2 clusters of 3 LUTs
Inter-cluster edges have delay 3
Target delay 6
LUTs fail with probability 0.1
[Figure: two clusterings of LUTs A, B, C, D; node labels read A B C D and A C B D]

14 Motivational Example
First clustering
  Circuit Delay   Probability
  6               0.89
  9               0.09
  ---             0.02
Second clustering
  Circuit Delay   Probability
  6               0.97
  9               0.01
  ---             0.02
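
One plausible reading of the "---" rows is that the circuit cannot be realized at all once more LUTs fail than there are spares. The sketch below is a check under that assumption, not something stated on the slide: it treats LUT failures as independent and counts the cases where 3 or more of the 6 physical LUTs fail, leaving fewer than the 4 working LUTs the circuit needs. The function name is illustrative.

```python
from math import comb


def prob_more_than_k_failures(n: int, k: int, p: float) -> float:
    """P(more than k of n independent components fail), each with probability p."""
    at_most_k = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))
    return 1.0 - at_most_k


# 2 clusters x 3 LUTs = 6 physical LUTs; 4 are needed, so 2 are spares.
p_unusable = prob_more_than_k_failures(n=6, k=2, p=0.1)
print(f"{p_unusable:.3f}")  # prints "0.016"
```

The result rounds to 0.02, which agrees with the last row of both tables and explains why that row is the same for either clustering: under this reading it depends only on the number of spares, not on where they are placed.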

15 The Problem
Inputs
  A network G of n LUTs (acyclic)
  An FPGA with C clusters, each with M LUTs
  Inter-cluster interconnect delay d
  Target circuit delay D
  Probability p of LUT failure
Objective
  Cluster G using no more than C clusters such that the probability of the circuit achieving delay D or faster is maximized.
  LUT duplication is allowed, but at the cost of a spare LUT.
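
To make the delay model concrete, here is a minimal sketch of how the delay of one fixed clustering could be evaluated when no LUT fails. It assumes the settings used elsewhere in the slides (LUT delay 1, intra-cluster edge delay 0, inter-cluster edge delay d); the function name, the dictionary encoding, and the four-LUT chain are hypothetical and not taken from the talk.

```python
from graphlib import TopologicalSorter


def circuit_delay(preds, cluster_of, d, lut_delay=1):
    """Longest arrival time over all LUTs for a given clustering.

    preds:      dict mapping each LUT to the list of LUTs feeding it
    cluster_of: dict mapping each LUT to its cluster id
    d:          delay of an edge that crosses a cluster boundary
    """
    arrival = {}
    # static_order() yields every node after all of its predecessors.
    for v in TopologicalSorter(preds).static_order():
        latest_input = 0
        for u in preds.get(v, []):
            # Edges inside a cluster are free; crossing edges cost d.
            edge = 0 if cluster_of[u] == cluster_of[v] else d
            latest_input = max(latest_input, arrival[u] + edge)
        arrival[v] = latest_input + lut_delay
    return max(arrival.values())


# Hypothetical chain a -> b -> c -> d split into two clusters.
chain = {"a": [], "b": ["a"], "c": ["b"], "d": ["c"]}
two_clusters = {"a": 0, "b": 0, "c": 1, "d": 1}
print(circuit_delay(chain, two_clusters, d=3))  # prints 7
```

The objective then asks for the clustering whose delay stays at or below D with the highest probability once failures and spare reassignment are taken into account, which this failure-free sketch deliberately leaves out.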

16 Dynamic Programming Heuristic
Use a dynamic programming matrix A
A is an n × n × D matrix
Each entry A[i,j,k] stores a clustering solution of LUT i and its predecessors such that
  Exactly j clusters are used
  The minimum arrival time at the output of i is k
  The probability of the circuit achieving delay k is maximized

17 Dynamic Programming Heuristic
Filling out the matrix
Traverse graph in topological order
For each primary input (PI), form its own cluster
For all others
  Select subset of parents
  Select clusters of parents and merge
  Place resulting clustering in A if probability of achieving k is largest so far
  Repeat for all possible subsets of parents and clusterings
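
The "place in A if the probability is largest so far" step can be sketched as a keep-the-best update on a table keyed by (i, j, k). This is only an illustration of that one rule: the class names, the tuple encoding of a clustering, and the probability values are assumptions, and the sketch does not show how candidates are generated or how their probabilities are estimated.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    probability: float  # estimated chance of meeting arrival time k
    clustering: tuple   # clusters as tuples of LUT names (illustrative encoding)


@dataclass
class DPTable:
    """Sparse stand-in for the matrix A, keyed by (i, j, k)."""
    entries: dict = field(default_factory=dict)

    def offer(self, i, j, k, probability, clustering):
        """Store the candidate only if it beats the current entry for (i, j, k)."""
        best = self.entries.get((i, j, k))
        if best is None or probability > best.probability:
            self.entries[(i, j, k)] = Entry(probability, clustering)
            return True
        return False


table = DPTable()
# A primary input forms its own cluster: j = 1 cluster, arrival time k = 1.
# The probability 0.9 is an illustrative value, not a figure from the talk.
table.offer("a", 1, 1, 0.9, (("a",),))
```

A dictionary is used here instead of a dense n × n × D array because many (i, j, k) combinations are never reached; a dense array would work equally well for small instances.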

21 DP Heuristic Performance
All LUTs weight 1
10% failure rate
Intracluster edge delay 0
Intercluster edge delay 3
8 clusters each of 3 LUTs
Target delay of 7

22 DP Heuristic Performance
Min-delay clustering
  Achieves delay 7 with probability ≈ 0.28
DP clustering
  Achieves delay 7 with probability ≈ 0.39

23 Difficulties
Best known algorithm for calculating probability distribution of delays is exponential
  Reconvergent fan-out introduces dependencies in probabilities
  Can’t use exact probabilities to guide algorithms/heuristics
  Hard to evaluate the performance of algorithms/heuristics
Difficult to assess quality of a sub-clustering of a node and its fan-in cone
  Global knowledge (e.g. placement of spares) of the clustering is needed
  Makes dynamic programming a harder approach
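
To illustrate why exact probabilities are expensive, the sketch below computes a delay distribution by brute force over every failure pattern. It is a plain exhaustive enumeration, not the algorithm the slide refers to, and it assumes independent LUT failures; `delay_of` is a hypothetical caller-supplied function that would have to encode the clustering and the spare-reassignment policy.

```python
from collections import defaultdict
from itertools import product


def delay_distribution(n, p, delay_of):
    """Exact delay distribution by enumerating all 2**n failure patterns.

    delay_of(failed) maps a tuple of n booleans (True means that LUT failed)
    to the resulting circuit delay, or None if the circuit cannot be realized.
    """
    dist = defaultdict(float)
    for failed in product((False, True), repeat=n):
        k = sum(failed)  # number of failed LUTs in this pattern
        dist[delay_of(failed)] += p**k * (1 - p) ** (n - k)
    return dict(dist)


# Toy model (an assumption for illustration): 6 LUTs, the circuit meets
# delay 6 with at most 2 failures and cannot be realized otherwise.
toy = delay_distribution(6, 0.1, lambda f: 6 if sum(f) <= 2 else None)
```

The loop body runs 2**n times, so even a few dozen LUTs put this out of reach, which is consistent with the slide's point that exact probabilities cannot be used to guide a heuristic.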

24 Future Work
Study the tractability of the problem
Propose exact or approximation algorithms or better heuristics
Generalize the interconnect delays so the problem addresses LUT placement
Study the problem of assigning failures to spares so as to minimize delay

