1
Redundancy-Aware, Fault-Tolerant Clustering
Jason Cong and Brian Tagiku
VLSI CAD Lab, Computer Science Department
University of California, Los Angeles
2
Overview of IC-DFN Efforts at UCLA
- Synthesis for higher level of abstraction
- Architecture and synthesis for nanoFPGAs (jointly with Prof. Tim Cheng, Evelyn Hu, and Kang Wang)
- Synthesis for error-resilient designs
11/30/2018 UCLA VLSICAD LAB
3
xPilot: Platform-Based Synthesis System
Inputs: SystemC/C/MMM; Platform Description & Constraints

Recent Progress on xPilot
- Refined MMM-to-SSDM translation
- Efficient & versatile scheduling engine based on a system of difference constraints (DAC’06)
- Communication-centric binding based on a distributed register file μ-arch (ICCAD’06)
- Behavior-and-communication co-optimization for interface synthesis (DAC’06)

Design drivers
- Motion-JPEG
- MPEG4 simple profile video decoder
- Hybrid approach on Xilinx XUP board: Microblaze (or PowerPC) + HW synthesized blocks

[Flow diagram: xPilot Front End; Profiling; SSDM (System-Level Synthesis Data Model); Analysis; Mapping; Processor & Architecture Synthesis; Interface Synthesis; Behavioral Synthesis; Custom Logic; Processor Cores + Executables; Drivers + Glue Logic; FPSoC]

Uniqueness of xPilot
- Platform-based synthesis and optimization
- Communication-centric synthesis
4
MPEG-4 Simple Profile Decoder: Synthesis Results
Complexity of synthesized RTLs

Module Name      Orig. C Source File (+ line#)   VHDL line#
Copy Controller  copyControl.c (287)             2815
Motion Comp.     Motion-compensation.c (312)     4681
Parser/VLD       bitstream.c (439)               6093
                 motion_decode.c (492)           10934
                 parser.c (1095)                 12036
                 texture_vld.c (504)             6089
Texture/IDCT     texture_idct.c (1819)           11537
Texture Update   textureUpdate.c (220)           2736
Total            5168                            56921

Setting: 30fps, Device: v2p30

Module                         Slices  BRAMs  Period (ns)
Parser/VLD                     2693    16     8.0
Motion Comp.                   899
Texture/IDCT                   2032
Texture Update & Copy Control  1407
5
Updated Results on Motion-JPEG Example
Platform: Xilinx XUP Board
Input: RAW Images. Output: Encoded JPEG Images.

Model #1: 5 Microblazes, FSL-based communication
  Stages: Preprocess, DCT, Quant, Huffman, Table Modification
Model #2: 4 Microblazes + DCT on FPGA fabrics
  Stages: Preprocess, HW-DCT, Quant, Huffman, Table Modification

System    Cycle#   Fmax (MHz)  Exe Time (ms)  Area (Slice#)
Model #1  23812    126         0.189          4306
Model #2  (-38%)               0.117          6345

FSL-based communication is a major performance overhead.
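The Model #1 row can be cross-checked with a few lines of arithmetic. The sketch below assumes that execution time equals cycle count divided by clock frequency, a relation the slide does not state explicitly; `exe_time_ms` is a hypothetical helper name.

```python
def exe_time_ms(cycles, fmax_mhz):
    """Execution time in ms, assuming time = cycles / frequency."""
    return cycles / (fmax_mhz * 1e6) * 1e3

# Model #1 values taken from the table.
model1_ms = exe_time_ms(23812, 126)

# Relative change in execution time from Model #1 (0.189 ms) to Model #2 (0.117 ms).
reduction = 1 - 0.117 / 0.189

print(f"Model #1: {model1_ms:.3f} ms, reduction: {reduction:.0%}")
```

Under that assumption the computed Model #1 time rounds to the tabulated 0.189 ms, and the drop to 0.117 ms corresponds to the (-38%) annotation.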
6
Overview of IC-DFN Efforts at UCLA
- Synthesis for higher level of abstraction
- Architecture and synthesis for nanoFPGAs (jointly with Prof. Tim Cheng, Evelyn Hu, and Kang Wang)
- Synthesis for error-resilient designs
  - Redundancy-aware, fault-tolerant clustering
7
Hierarchical FPGAs
- 2-level, hierarchical circuit logic
  - Level 1 – LUTs
  - Level 2 – Clusters of LUTs
- Higher levels (clusters of clusters) also possible
- Uses locality of interconnections to improve circuit performance
8
Redundancy in FPGAs
- LUTs can fail with some probability
- Allocate extra components (e.g. LUTs) into the system
- Re-route inputs and outputs to a spare LUT
- Ideally, want the spare LUT to be close to the failure so that delay does not increase
13
Motivational Example
- 4 LUTs (each of delay 1)
- 2 clusters of 3 LUTs
- Inter-cluster edges have delay 3
- Target delay 6
- LUTs fail with probability 0.1
[Figure: LUTs A, B, C, D, shown in two arrangements (A B C D and A C B D)]
14
Motivational Example

Circuit Delay  Probability
6              0.89
9              0.09
---            0.02

Circuit Delay  Probability
6              0.97
9              0.01
---            0.02
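One plausible reading of the "---" row, offered as an assumption rather than something the slides state, is that the circuit cannot be mapped at all once more than two of the six physical LUTs (2 clusters of 3) have failed, since only two spares exist for the four logic LUTs. The binomial tail below is a sketch of that reading; `prob_more_than` is a hypothetical helper.

```python
from math import comb

def prob_more_than(k, n, p):
    """P(more than k of n independent LUTs fail), each failing with probability p."""
    at_most_k = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))
    return 1.0 - at_most_k

# Assumption: 6 physical LUTs, 4 needed, so 3 or more failures leave too few working LUTs.
p_unmappable = prob_more_than(2, 6, 0.1)
print(round(p_unmappable, 2))
```

Under this reading the value rounds to 0.02 and is the same for both clusterings, because it depends only on how many LUTs fail and not on where the logic is placed.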
15
The Problem

Inputs
- A network G of n LUTs (acyclic)
- An FPGA with C clusters, each with M LUTs
- Inter-cluster interconnect delay d
- Target circuit delay D
- Probability p of LUT failure

Objective
- Cluster G using no more than C clusters such that the probability of the circuit achieving delay D or faster is maximized.
- LUT duplication allowed, but at the cost of a spare LUT.
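The inputs above can be captured in a small data structure, together with the fault-free delay of a given clustering. This is a minimal sketch: the names `Instance` and `circuit_delay`, the unit LUT delay, the zero intra-cluster delay, and the four-LUT network in the usage lines are illustrative assumptions, not the authors' code or their example circuit.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter

@dataclass
class Instance:
    preds: dict        # G: LUT -> set of parent LUTs (acyclic)
    num_clusters: int  # C
    cluster_size: int  # M
    inter_delay: int   # d
    target_delay: int  # D
    p_fail: float      # p

def circuit_delay(inst, cluster_of, lut_delay=1, intra_delay=0):
    """Longest-path delay of a clustering when no LUT has failed."""
    arrival = {}
    # static_order() yields every LUT after all of its parents.
    for v in TopologicalSorter(inst.preds).static_order():
        latest_input = 0
        for u in inst.preds.get(v, ()):
            same = cluster_of[u] == cluster_of[v]
            edge = intra_delay if same else inst.inter_delay
            latest_input = max(latest_input, arrival[u] + edge)
        arrival[v] = latest_input + lut_delay
    return max(arrival.values())

# Hypothetical network: A and B feed C, and C feeds D.
inst = Instance({"C": {"A", "B"}, "D": {"C"}}, 2, 3, 3, 6, 0.1)
one_crossing = circuit_delay(inst, {"A": 0, "B": 0, "C": 0, "D": 1})
two_crossings = circuit_delay(inst, {"A": 1, "B": 0, "C": 0, "D": 1})
print(one_crossing, two_crossings)
```

With these assumed parameters, a critical path that crosses clusters once has delay 6 and one that crosses twice has delay 9. The objective then asks how likely the clustering is to stay at or below D once failed LUTs are remapped to spares, which this sketch does not model.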
16
Dynamic Programming Heuristic
- Use a dynamic programming matrix A
- A is an n × n × D matrix
- Each entry A[i,j,k] stores a clustering solution of LUT i and its predecessors such that
  - Exactly j clusters are used
  - The minimum arrival time at the output of i is k
  - The probability of the circuit achieving delay k is maximized
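The indexing scheme of A can be sketched as a table that keeps, per (i, j, k), only the candidate with the highest probability. The class and method names below are hypothetical, and a dictionary is used in place of a dense n × n × D array purely for brevity; how a candidate's probability is computed is left out.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    prob: float      # probability of achieving arrival time k
    clusters: tuple  # tuple of frozensets of LUT ids

class DPTable:
    """Sparse stand-in for A[i, j, k]; absent keys mean no solution yet."""

    def __init__(self):
        self._cells = {}

    def offer(self, i, j, k, prob, clusters):
        """Store the candidate if it beats the current entry; report whether it did."""
        key = (i, j, k)
        best = self._cells.get(key)
        if best is None or prob > best.prob:
            self._cells[key] = Entry(prob, clusters)
            return True
        return False

    def get(self, i, j, k):
        return self._cells.get((i, j, k))

table = DPTable()
table.offer(0, 1, 1, 0.9, (frozenset({0}),))
kept = table.offer(0, 1, 1, 0.8, (frozenset({0}),))  # worse candidate, rejected
```

Here `kept` is `False` and the stored entry still has probability 0.9, which mirrors the "largest so far" rule for each cell.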
17
Dynamic Programming Heuristic
Filling out the matrix
- Traverse graph in topological order
- For PI, form its own cluster
- For all others
  - Select subset of parents
  - Select clusters of parents and merge
  - Place resulting clustering in A if probability of achieving k is largest so far
  - Repeat for all possible subsets of parents and clusterings
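The control flow of these steps, without the merging and probability evaluation, can be sketched as follows. `traversal_plan` and `parent_subsets` are hypothetical names, and the sketch only enumerates which parent subsets would be tried for each LUT; it is not the heuristic itself.

```python
from graphlib import TopologicalSorter
from itertools import chain, combinations

def parent_subsets(parents):
    """All subsets of a node's parents, from the empty set up to the full set."""
    ps = sorted(parents)
    return chain.from_iterable(combinations(ps, r) for r in range(len(ps) + 1))

def traversal_plan(preds):
    """preds maps each LUT to the set of its parents (fan-ins)."""
    plan = []
    for node in TopologicalSorter(preds).static_order():
        parents = preds.get(node, ())
        if not parents:
            # Primary input: it forms its own cluster.
            plan.append((node, "own-cluster", [()]))
        else:
            # Every subset is a candidate set of parents whose clusters get merged.
            plan.append((node, "merge", list(parent_subsets(parents))))
    return plan

plan = traversal_plan({"C": {"A", "B"}, "D": {"C"}})
```

A node with f parents yields 2^f subsets, and each subset is further combined with the stored clusterings of those parents, which suggests why the slides call the method a heuristic rather than an exact algorithm.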
21
DP Heuristic Performance
- All LUTs weight 1
- 10% failure rate
- Intracluster edge delay 0
- Intercluster edge delay 3
- 8 clusters each of 3 LUTs
- Target delay of 7
22
DP Heuristic Performance
- Min-delay clustering
  - Achieves delay 7 with probability ≈ 0.28
- DP clustering
  - Achieves delay 7 with probability ≈ 0.39
23
Difficulties
- Best known algorithm for calculating probability distribution of delays is exponential
  - Reconvergent fan-out introduces dependencies in probabilities
  - Can’t use exact probabilities to guide algorithms/heuristics
  - Hard to evaluate the performance of algorithms/heuristics
- Difficult to assess quality of a sub-clustering of a node and its fan-in cone
  - Global knowledge (e.g. placement of spares) of the clustering is needed
  - Makes dynamic programming a harder approach
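The dependency introduced by reconvergent fan-out can be seen on a toy case. The sketch below ignores spares entirely and uses a hypothetical four-LUT network in which A fans out to B and C, which reconverge at D; it enumerates every failure pattern (exponential in the number of LUTs) and compares the result with the product of per-path probabilities.

```python
from itertools import product

def prob_all_paths_ok(luts, paths, p_fail):
    """Exact probability that no LUT on any path fails, by full enumeration."""
    total = 0.0
    for pattern in product([False, True], repeat=len(luts)):
        failed = {lut for lut, f in zip(luts, pattern) if f}
        weight = 1.0
        for f in pattern:
            weight *= p_fail if f else 1 - p_fail
        if all(failed.isdisjoint(path) for path in paths):
            total += weight
    return total

luts = ["A", "B", "C", "D"]
paths = [("A", "B", "D"), ("A", "C", "D")]

exact = prob_all_paths_ok(luts, paths, 0.1)
# Treating the two paths as independent counts A and D twice.
naive = 1.0
for path in paths:
    naive *= (1 - 0.1) ** len(path)
print(exact, naive)
```

The exact value is 0.9^4 while the independence assumption gives 0.9^6, so per-path probabilities cannot simply be multiplied when paths share LUTs.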
24
Future Work
- Study the tractability of the problem
- Propose exact or approximation algorithms or better heuristics
- Generalize the interconnect delays so the problem addresses LUT placement
- Study the problem of assigning failures to spares so as to minimize delay