Enabling System-Level Modeling of Variation-Induced Faults in Networks-on-Chips Konstantinos Aisopos (Princeton, MIT) Chia-Hsin Owen Chen (MIT) Li-Shiuan.

Presentation transcript:

Enabling System-Level Modeling of Variation-Induced Faults in Networks-on-Chips Konstantinos Aisopos (Princeton, MIT) Chia-Hsin Owen Chen (MIT) Li-Shiuan Peh (MIT)

The Tale of Resilient NoCs
Silicon technologies move into the nanometer regime
→ Devices become unreliable due to Process Variation (PV)
→ System designers propose resilient NoC architectures
From 1994 to 2011…
→ Dally's Reliable Router (1994)
→ RoCo (ISCA'06)
→ BulletProof (HPCA'06)
→ Vicis (DAC'09)
What fault model are these proposals evaluated with?
A uniform random fault distribution across gates, with >50% inaccuracy in capturing fault locations.
Can we do better?

Methodology for Accurate Fault Modeling
What is the golden reference of the expected PV maps?
The SPICE models of the Standard Cells of the technology.
How do we use them to capture variation-induced faults?
[Figure: flow diagram. router RTL → synthesis → synthesized netlist (list of standard cells and their interconnections). Layouts of Standard Cells → extraction → SPICE models of Standard Cells. Synthesized netlist + SPICE models of Standard Cells → SPICE model netlist → Monte Carlo simulations.]

Methodology for Accurate Fault Modeling
Challenge: duration of simulation
Solution: hybrid timing / circuit-level simulation
Step 1. Find the critical paths and the inputs that result in their longest delays (with Static Timing Analysis)
Step 2. Perform Monte Carlo circuit-level simulations only for these paths / input permutations to capture variation-induced timing violations
Step 3. Map circuit-level timing violations back to system-level faults
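Step 2 can be illustrated with a minimal Python sketch. It is not the authors' flow: the slides run SPICE-level Monte Carlo on the extracted netlist, whereas this sketch assumes a simple stand-in in which each stage delay of a critical path is drawn from a Gaussian around its nominal value. The path names, delays, sigma, and clock period are all hypothetical.

```python
import random

def violation_probability(stage_delays_ps, sigma_fraction, clock_period_ps,
                          trials=1000, seed=0):
    """Fraction of Monte Carlo trials in which the path misses the clock period.

    Assumption: every stage delay is Gaussian with mean equal to its nominal
    value and standard deviation sigma_fraction * nominal. A real flow would
    get delays from circuit-level simulation instead.
    """
    rng = random.Random(seed)
    violations = 0
    for _ in range(trials):
        path_delay = sum(rng.gauss(d, sigma_fraction * d) for d in stage_delays_ps)
        if path_delay > clock_period_ps:
            violations += 1
    return violations / trials

# Hypothetical critical paths, as if reported by static timing analysis (Step 1).
critical_paths = {
    "path1": [80.0, 95.0, 70.0, 90.0],  # nominal sum 335 ps, close to the clock
    "path2": [60.0, 60.0, 60.0, 60.0],  # nominal sum 240 ps, large slack
}
clock_period_ps = 350.0

estimates = {
    name: violation_probability(delays, 0.05, clock_period_ps)
    for name, delays in critical_paths.items()
}
for name, p in estimates.items():
    print(f"{name}: estimated P(timing violation) = {p:.3f}")
```

Restricting the trials to the paths found in Step 1 is what keeps the simulation time manageable: paths with large slack, like the hypothetical path2, contribute essentially no violations.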

Methodology for Accurate Fault Modeling
Step 3: mapping circuit-level violations → system-level faults
Each Verilog signal piggybacks a vector of system-level faults
[Figure: critical path 1 and critical path 2, each marked (X) with the system-level fault types it can cause: unfair arbitration, data corruption, packet loss. # timing violations in 100 Monte Carlo simulations: 1/100 for critical path 1, 3/100 for critical path 2.]
P(fault type = packet loss) = (1/100) ∪ (3/100)
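As a worked example of the union above: the slide does not say how the two per-path probabilities are combined, so this sketch assumes that timing violations on different critical paths are independent events. Under that assumption the union is one minus the probability that no path violates timing. The function name is hypothetical.

```python
def union_probability(path_probabilities):
    """P(at least one path violates timing), assuming independent paths."""
    no_violation = 1.0
    for p in path_probabilities:
        no_violation *= (1.0 - p)
    return 1.0 - no_violation

# Per-path violation rates from the slide: 1/100 and 3/100.
p_packet_loss = union_probability([1 / 100, 3 / 100])
print(round(p_packet_loss, 4))  # 0.0397
```

If the paths were not independent, the result would differ. The simple sum, 0.04, is an upper bound in every case.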

Probability / System Impact of Faults?
(1) for fixed configuration and fixed runtime conditions
configuration: 5-input / 5-output router, 4-stage pipeline, 4 private VCs, 3 buffers/VC, 64-bit wires
runtime conditions: 2.8GHz, 27°C
[Figure: chart of probability of occurrence per fault type: data corruption, packet loss, misrouting, credit generation, credit loss, erroneous allocation, unfair arbitration, packet duplication. Group labels: packet conservation, flow control.]

Probability / System Impact of Faults?
(2) for dynamic runtime conditions

router configuration
  data vnet: num VCs 2, num buff/VC 3
  control vnet: num VCs 2, num buff/VC 1
  channel width (bits): 64
  num inputs: 5 (4 directions, network interface)
  num outputs: 5 (4 directions, network interface)
  frequency: 75% synthesis frequency (2.85GHz)
  temperature: not fixed (input argument)

system and network configuration
  core power: 1 watt
  topology: 8x8 mesh, 4 memory controllers at corners
  floorplan: 256mm², 2mmx2mm cores, 0.2mmx0.2mm routers
  L1 cache: 32KB/node, private unified, 2W, MESI
  L2 cache: 1MB/node, shared distributed, 16W
  workload: uniform random traffic, PARSEC suite

[Figure: tool flow connecting the GEMS multi-core simulator, the Garnet network simulator, the Orion 2.0 power model, the Hotspot 5.0 thermal model, and the Fault Model. Labels: floorplan, power, temperature, temperature = fixed °C. Fault Model inputs: temperature and process parameters (threshold voltage, transistor width, transistor length, oxide thickness, each given as (μ,σ)). Fault Model output: probability of faults.]

Probability / System Impact of Faults?
(2) for dynamic runtime conditions
8%-10% fault probabilities for high traffic
Up to 1% fault probabilities for real workloads
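To show how per-fault-type probabilities like these could drive a system-level simulation, here is a minimal Python sketch. It does not reproduce the tool's interface, which the slides do not show: the function name, the fault-type keys, and the probability values are hypothetical, and the sketch assumes that each fault type is an independent Bernoulli trial per simulated event.

```python
import random

def inject_faults(fault_probabilities, num_events, seed=0):
    """Count how often each fault type fires over num_events simulated events.

    Assumption: each fault type is an independent Bernoulli trial per event,
    with the given probability.
    """
    rng = random.Random(seed)
    counts = {fault: 0 for fault in fault_probabilities}
    for _ in range(num_events):
        for fault, p in fault_probabilities.items():
            if rng.random() < p:
                counts[fault] += 1
    return counts

# Hypothetical probabilities, chosen only for illustration.
fault_probabilities = {"packet_loss": 0.01, "data_corruption": 0.005}
counts = inject_faults(fault_probabilities, num_events=10_000)
for fault, n in counts.items():
    print(f"{fault}: {n} faults in 10000 events")
```

In a real simulator the probabilities would come from the fault model for the current temperature and router configuration rather than from a fixed table.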

Conclusions
Presented a fault modeling tool for system-level simulators
Accurate + easy to integrate into any network simulator (already available in GEMS and GARNET)
Do you need a fault model to accurately evaluate…
→ …a resilient coherence protocol (tolerating lost messages)?
→ …a resilient routing algorithm (tolerating misrouted packets)?
→ …an Error Correction Code (protecting data bits)?
…then consider integrating our tool into your simulator to accurately model faults!
Download here: