FAULTSIM: A FAST, CONFIGURABLE MEMORY-RESILIENCE SIMULATOR DAVID A. ROBERTS, AMD RESEARCH PRASHANT J. NAIR, GEORGIA INSTITUTE OF TECHNOLOGY

Presentation transcript:

FAULTSIM: A FAST, CONFIGURABLE MEMORY-RESILIENCE SIMULATOR
DAVID A. ROBERTS, AMD RESEARCH
PRASHANT J. NAIR, GEORGIA INSTITUTE OF TECHNOLOGY
JUNE 14TH, 2014

DISCLAIMER & ATTRIBUTION
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.
AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
ATTRIBUTION
© 2014 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. Other names are for informational purposes only and may be trademarks of their respective owners.

MOTIVATION
• Multi-granularity DRAM faults are common*
  ‒ Bit, column, row, bank or rank
• 3D die-stacking introduces through-silicon vias (TSVs) as new points of failure
• ECC needs to be customized to the memory
  ‒ e.g. ECC-DIMM, ChipKill, RAID etc.
• Complex to model analytically
  ‒ Including scrubbing & dynamic repair
[Figure: Real-world memory failures]
FaultSim allows quick & easy memory resilience design space exploration
* V. Sridharan and D. Liberty, “A study of DRAM failures in the field,” in High Performance Computing, Networking, Storage and Analysis (SC), 2012 International Conference for, pp. 1–11, 2012.

SIMULATOR
• Memory chips (Fault Domains, FDs) are organized into ranks (Domain Groups)
• Monte Carlo randomized fault injection according to field-study failure rates
  ‒ Divide chip lifetime into fixed intervals (e.g. 7-year lifetime with 3-hour intervals)
• At each time step, Fault Ranges (FRs) are randomly inserted into a list within each FD according to fault probability
  ‒ Evaluate ECC against recorded fault patterns
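The injection loop described on this slide can be sketched roughly as follows. This is a minimal illustration, not FaultSim's code: the function names, the FIT-rate dictionary, and the conversion from FIT (failures per 10^9 device-hours) to a per-interval probability are assumptions made for the example, and the rates passed in are placeholders rather than field-study data.

```python
import random

HOURS_PER_YEAR = 24 * 365  # assumption: ignore leap days


def num_intervals(years=7, interval_hours=3):
    """Number of fixed time steps in one simulated chip lifetime."""
    return years * HOURS_PER_YEAR // interval_hours


def simulate_chip(fit_rates, years=7, interval_hours=3, rng=None):
    """Return a list of (time_step, fault_kind) events for one chip.

    fit_rates maps a fault granularity (e.g. "bit", "row") to a
    hypothetical FIT rate, i.e. failures per 1e9 device-hours.
    """
    rng = rng or random.Random()
    events = []
    for step in range(num_intervals(years, interval_hours)):
        for kind, fit in fit_rates.items():
            # Probability that this fault kind appears in one interval.
            p = fit * interval_hours / 1e9
            if rng.random() < p:
                events.append((step, kind))
    return events


if __name__ == "__main__":
    # Placeholder rates chosen only to make the demo produce output.
    demo_rates = {"bit": 5.0e4, "row": 1.0e4}
    faults = simulate_chip(demo_rates, rng=random.Random(0))
    print(num_intervals(), "intervals,", len(faults), "faults injected")
```

A full simulation would repeat this for every chip in the rank over many trials and run the ECC check on the accumulated fault list after each step.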

FAULT REPRESENTATION
• Example memory with 8 rows and 8 bits per row
  ‒ 6-bit addresses
  ‒ Fault ranges A, B and C (A and B intersect)
  ‒ Mask field: Mask_i == 1 indicates that fault address bit i can be 0 or 1 (covers both values)
  ‒ Address field: indicates specific address bit values where Mask_i == 0
[Table: FR | Mask | Address, with rows for fault ranges A, B and C]
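To make the mask/address encoding concrete, here is a small sketch for the 64-location example memory. The class name, the helper methods, and the split of the 6-bit address into 3 row bits (upper) and 3 column bits (lower) are assumptions for illustration; the slide does not specify the bit layout, and the three fault ranges below are not the slide's A, B and C.

```python
from dataclasses import dataclass

ADDR_BITS = 6  # 8 rows x 8 bits per row = 64 locations


@dataclass(frozen=True)
class FaultRange:
    addr: int  # address bit values, meaningful only where the mask bit is 0
    mask: int  # mask bit 1 = "this address bit can be 0 or 1"

    def covers(self, address):
        """True if the given address falls inside this fault range."""
        care = ~self.mask & ((1 << ADDR_BITS) - 1)  # bits that must match
        return (address & care) == (self.addr & care)

    def addresses(self):
        """Enumerate every address covered (fine for a 64-entry memory)."""
        return [a for a in range(1 << ADDR_BITS) if self.covers(a)]


# Assumed layout: address = (row << 3) | column.
single_bit = FaultRange(addr=0b011010, mask=0b000000)    # row 3, column 2
row_fault = FaultRange(addr=0b011000, mask=0b000111)     # all of row 3
column_fault = FaultRange(addr=0b000010, mask=0b111000)  # column 2, every row

if __name__ == "__main__":
    for name, fr in [("bit", single_bit), ("row", row_fault), ("column", column_fault)]:
        print(name, fr.addresses())
```

One (mask, address) pair describes a single bit, a whole row, or a whole column without storing each faulty location individually.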

FAULT RANGE INTERSECTION
• Identifying intersection of FRs is a fundamental operation of the simulator
  ‒ Allows detection of faults across chips in the same codeword(s)
  ‒ Fast O(1) boolean function
  ‒ FRs X and Y intersect if, for all address bit positions i:
    ‒ Either one of the masks is 1 (fault covers 0 and 1 values), OR
    ‒ The specific address bits match
[Table: X | Y | Intersects?, for the pairs (A, B), (A, C) and (B, C). Examples for potentially intersecting Fault Range combinations X and Y]
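The intersection rule reduces to a few bitwise operations. The sketch below is one way to write it, assuming 6-bit addresses as in the earlier example; the `FaultRange` tuple and the sample ranges are hypothetical and are not the slide's A, B and C.

```python
from collections import namedtuple

FaultRange = namedtuple("FaultRange", "addr mask")
ADDR_BITS = 6


def intersects(x, y, addr_bits=ADDR_BITS):
    """True if fault ranges x and y share at least one address.

    For every bit position, either one of the masks is 1 or the two
    address bits are equal.
    """
    full = (1 << addr_bits) - 1
    either_masked = x.mask | y.mask
    bits_match = ~(x.addr ^ y.addr)  # 1 wherever the address bits agree
    return ((either_masked | bits_match) & full) == full


if __name__ == "__main__":
    row3 = FaultRange(addr=0b011000, mask=0b000111)
    row5 = FaultRange(addr=0b101000, mask=0b000111)
    col2 = FaultRange(addr=0b000010, mask=0b111000)
    print(intersects(row3, col2))  # a row and a column cross at one location
    print(intersects(row3, row5))  # two different rows never overlap
```

The cost does not depend on how many locations either range covers, which is what makes the check O(1).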

ECC EVALUATION ALGORITHM
• We validate the simulator using conventional ECC-DIMM and ChipKill codes
  ‒ One DRAM rank composed of 18 4-bit-wide (x4) DRAM chips
  ‒ Simulated results compared with approximate analytical model
• FaultSim results for SECDED & ChipKill are within 2% of the approximate analytical model
• Example: ChipKill ECC
  ‒ Count the maximum number of faulty symbols in any one codeword
  ‒ Assume 8-bit symbol size in following example
  ‒ Record a failure if faulty symbol count per codeword > 1

CHIPKILL ECC ALGORITHM EXAMPLE
• Fault Domain (chip) states at end of time step
[Figure: rank of 18 chips; CHIP 0 and CHIP 1 shown with Fault Range A and Fault Range B]

CHIPKILL ECC ALGORITHM EXAMPLE
• Copy the starting FR (FR_0) to a temporary FR
    FR_0 = A
    FR_temp = FR_0
  n_intersect: 0
[Figure: rank of 18 chips; CHIP 0 and CHIP 1 shown with FR_temp and Fault Range B]

CHIPKILL ECC ALGORITHM EXAMPLE
• Broaden FR_temp to cover the symbol width of 8 bits
• Consider all FRs (including A) for intersection with symbol
• Increment n_intersect when true
    FR_0 = A
    FR_temp = FR_0
    FR_temp.mask |= 0x7
    FR_1 = A
    if( intersects( FR_temp, FR_1 ) ) n_intersect++
  n_intersect: 0 → 1
[Figure: rank of 18 chips; CHIP 0 and CHIP 1 shown with FR_temp and Fault Range B]

CHIPKILL ECC ALGORITHM EXAMPLE
• Broaden FR_temp to cover the symbol width of 8 bits
• Consider all FRs (including A) for intersection with symbol
• Increment n_intersect when true
    FR_0 = A
    FR_temp = FR_0
    FR_temp.mask |= 0x7
    FR_1 = A
    if( intersects( FR_temp, FR_1 ) ) n_intersect++
    FR_1 = B
    if( intersects( FR_temp, FR_1 ) ) n_intersect++
  n_intersect exceeds correctable errors: stop simulation
[Figure: rank of 18 chips; CHIP 0 and CHIP 1 shown with FR_temp and Fault Range B]

CHIPKILL ECC ALGORITHM EXAMPLE
• Continue algorithm from FR_0 = B if n_intersect <= 1
• Reset n_intersect = 0
• Two loops are necessary because you may not have counted FR_1's that span more symbols*
    FR_0 = B
    FR_temp = FR_0
    FR_temp.mask |= 0x7
    FR_1 = A
    if( intersects( FR_temp, FR_1 ) ) n_intersect++
    FR_1 = B
    if( intersects( FR_temp, FR_1 ) ) n_intersect++
[Figure: rank of 18 chips; CHIP 0 and CHIP 1 shown with Fault Range B and Fault Range A]
* See backup slide
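Put together, the walkthrough on these slides amounts to a pair of nested loops over the rank's fault ranges. The following is a sketch under stated assumptions, not FaultSim's implementation: the `FaultRange` tuple, the `chip` field, the 6-bit address space, and the sample faults are hypothetical. Like the slides, it counts intersecting fault ranges rather than distinct chips.

```python
from collections import namedtuple

FaultRange = namedtuple("FaultRange", "chip addr mask")
ADDR_BITS = 6
SYMBOL_MASK = 0x7  # low 3 address bits = one 8-bit symbol


def intersects(x, y):
    """Per bit: either mask is 1, or the address bits match."""
    full = (1 << ADDR_BITS) - 1
    return (((x.mask | y.mask) | ~(x.addr ^ y.addr)) & full) == full


def chipkill_fails(fault_ranges, correctable=1):
    """Return True if any starting FR sees more overlaps than ECC can fix."""
    for fr0 in fault_ranges:                       # outer loop: every FR as FR_0
        fr_temp = fr0._replace(mask=fr0.mask | SYMBOL_MASK)  # broaden to symbol
        n_intersect = 0                            # reset for each FR_0
        for fr1 in fault_ranges:                   # inner loop: includes FR_0
            if intersects(fr_temp, fr1):
                n_intersect += 1
        if n_intersect > correctable:
            return True                            # uncorrectable: stop
    return False


if __name__ == "__main__":
    a = FaultRange(chip=0, addr=0b011010, mask=0b000000)  # single-bit fault
    b = FaultRange(chip=1, addr=0b011000, mask=0b000111)  # same symbol address
    c = FaultRange(chip=1, addr=0b101010, mask=0b000000)  # unrelated address
    print(chipkill_fails([a, b]))  # two chips faulty at one symbol address
    print(chipkill_fails([a, c]))  # faults land in different codewords
```

Raising `correctable` would model a stronger symbol-correcting code; the value 1 matches the failure criterion stated earlier in the deck (faulty symbol count per codeword > 1).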

RESULTS AND FUTURE WORK
• Simulated failure probability (BCH, ChipKill) within 2% of analytical model
• Used FaultSim for evaluation in “Citadel” 3D-stacked DRAM ECC paper
• We are continuing to develop the tool for new fault models, memory types and improved accuracy (real ECC evaluation and data patterns)
• Intention to release an open-source version

QUESTIONS?

BACKUP: EXPLANATION FOR USE OF TWO FOR LOOPS
• Add a third chip (CHIP 2)
• Broadening FR B and FR C into FR_temp (symbol width) does not change their size
• Starting from FR_0 = C, you will see 2 intersections (Chips 2 and 1)
• Starting from FR_0 = A, you will see 3 intersections (Chips 1, 2 and 0)
• Therefore every FR needs to be considered as FR_0 to find greatest number of overlapping symbols in the rank
[Figure: CHIP 0, CHIP 1 and CHIP 2 shown with Fault Range B, Fault Range A and Fault Range C]
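The effect described on this slide can be reproduced with a small, hypothetical configuration. The addresses, masks and chip assignments below are assumptions chosen so that A spans the symbols of both B and C while B and C do not overlap each other; they are not taken from the slide's figure.

```python
from collections import namedtuple

FaultRange = namedtuple("FaultRange", "name chip addr mask")
ADDR_BITS = 6
SYMBOL_MASK = 0x7  # 8-bit symbol = low 3 address bits


def intersects(x, y):
    """Per bit: either mask is 1, or the address bits match."""
    full = (1 << ADDR_BITS) - 1
    return (((x.mask | y.mask) | ~(x.addr ^ y.addr)) & full) == full


def overlaps_per_start(fault_ranges):
    """Count intersections seen when each FR in turn is used as FR_0."""
    counts = {}
    for fr0 in fault_ranges:
        fr_temp = fr0._replace(mask=fr0.mask | SYMBOL_MASK)
        counts[fr0.name] = sum(intersects(fr_temp, fr1) for fr1 in fault_ranges)
    return counts


# Assumed faults: B and C are each one symbol wide, at different addresses;
# A has an extra mask bit set, so it covers both of those symbols.
a = FaultRange("A", chip=1, addr=0b001000, mask=0b100111)
b = FaultRange("B", chip=0, addr=0b001000, mask=0b000111)
c = FaultRange("C", chip=2, addr=0b101000, mask=0b000111)
counts = overlaps_per_start([a, b, c])

if __name__ == "__main__":
    print(counts)
```

With these inputs, starting from B or C yields a lower count than starting from A, so a single pass from one starting FR could miss the largest overlap.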