WTM’13, Prague, April 14, 2013. Post-Silicon Debugging of Transactional Memory Tests. Carla Ferreira, João Lourenço {carla.ferreira,

Presentation transcript:

Post-Silicon Debugging of Transactional Memory Tests
Carla Ferreira, João Lourenço {carla.ferreira, (Universidade Nova de Lisboa)
Ophir Friedler, Wisam Kadry, Amir Nahir, Vitali Sokhin {ophirf, wisamk, nahir, (IBM Research)

Post Silicon
Post-silicon validation elements:
1. Stimulating the design under test
2. Detecting erroneous behavior
3. Localizing the root cause of the problem
4. Providing a fix

Stimulation
1. Test generation
2. Execution
3. Consistency checking
4. Repeat… Forever!
[Figure: Exerciser Image (Threadmill). Labels: Test Template, Topology, Architectural Model; Generation, Execution, Checking, OS services; Silicon, Accelerator]
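The generate, execute, check, repeat loop listed on this slide can be sketched roughly as follows. This is only an illustration, not the Threadmill exerciser: `generate_test`, `execute`, and `exercise` are hypothetical names, the "test-case" is a toy list of register updates, and the loop is bounded instead of running forever.

```python
import random

def generate_test(seed, length=4):
    """Hypothetical generator: a test-case is a list of (register, value) updates."""
    rng = random.Random(seed)
    return [(f"R{rng.randrange(4)}", rng.randrange(256)) for _ in range(length)]

def execute(test, initial_state):
    """Stand-in for execution on silicon: apply the updates to a copy of the state."""
    state = dict(initial_state)
    for reg, value in test:
        state[reg] = (state[reg] + value) & 0xFF
    return state

def exercise(iterations, passes=2):
    """Generate a test, run it several times, and record seeds whose runs disagree."""
    initial = {f"R{i}": i for i in range(4)}
    failures = []
    for seed in range(iterations):
        test = generate_test(seed)
        finals = [execute(test, initial) for _ in range(passes)]
        if any(final != finals[0] for final in finals[1:]):
            failures.append(seed)
    return failures
```

Because this toy `execute` is deterministic, `exercise` reports no failures; on real hardware a non-empty result would point at test-cases worth debugging.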

Detection
Consistency checking: run the same test-case from the same initial architectural state and expect the same final architectural state.

Test-case:
    ori r10,r0,170
    stb r10,0(r6)
    lbz r11,0(r6)
    ...
Initial State: R0 = 0x1, R1 = 0x2 …
Final State: R0 = 0xA, R1 = 0xB …

Micro-architectural state varies: caches, page misses, pre-fetching, thread priorities.

Detection
And what if two different final states are manifested?

Run 1:
    ori r10,r0,170
    stb r10,0(r6)
    lbz r11,0(r6)
    ...
Initial State: R0 = 0x1, R1 = 0x2 …
Final State: R0 = 0xA, R1 = 0xB …

Run 2:
    ori r10,r0,170
    stb r10,0(r6)
    lbz r11,0(r6)
    ...
Initial State: R0 = 0x1, R1 = 0x2 …
Final State: R0 = 0xC, R1 = 0xB …

MIS-COMPARE
Final State (run 1): R0 = 0xA, R1 = 0xB …
Final State (run 2): R0 = 0xC, R1 = 0xB …
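A minimal sketch of the comparison behind a mis-compare, assuming each final architectural state is available as a dictionary from resource name to value and that all runs report the same resources. The function name `find_miscompares` is hypothetical; the register values are the ones shown on the slide.

```python
def find_miscompares(final_states):
    """Return {resource: sorted distinct values} for resources that differ across runs."""
    diffs = {}
    for resource in sorted(final_states[0]):
        values = {state[resource] for state in final_states}
        if len(values) > 1:
            diffs[resource] = sorted(values)
    return diffs

run_1 = {"R0": 0xA, "R1": 0xB}
run_2 = {"R0": 0xC, "R1": 0xB}
print(find_miscompares([run_1, run_2]))  # {'R0': [10, 12]}
```

Only R0 is reported, which is the resource a localization step would start from.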

Localization approach
1. A test-case that produces a mis-compare is found
2. Fast-forward to that test-case on a software simulator (a.k.a. reference model)
3. Execute the test-case on the reference model instruction by instruction and extract information

Localization
- Reduce the number of resources and instructions that might be the root cause of the mis-compare.
- Study the effect of transactions in the test-case on the final state.
- Justification: force erroneous behaviour on the reference model and re-create the mis-compare results.
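One common way to narrow down the instructions that can influence a mis-compared resource is a backward slice over read/write dependencies. The sketch below is a generic stand-in under stated assumptions, not the analysis used in the talk: each instruction is a hypothetical `(text, writes, reads)` tuple, memory is modelled as a single coarse resource `"mem"`, and the fourth instruction is an invented extra that should be filtered out.

```python
def suspicious_instructions(program, miscompared):
    """Walk backwards, keeping instructions that feed a mis-compared resource."""
    wanted = set(miscompared)
    suspects = []
    for index in range(len(program) - 1, -1, -1):
        _text, writes, reads = program[index]
        if wanted & set(writes):
            suspects.append(index)
            # Coarse assumption: a write fully redefines the resource,
            # so earlier writers of it no longer matter; its inputs now do.
            wanted -= set(writes)
            wanted |= set(reads)
    return sorted(suspects)

program = [
    ("ori r10,r0,170", ["r10"], ["r0"]),
    ("stb r10,0(r6)", ["mem"], ["r10", "r6"]),
    ("lbz r11,0(r6)", ["r11"], ["mem", "r6"]),
    ("ori r12,r0,1", ["r12"], ["r0"]),  # hypothetical, unrelated to r11
]
print(suspicious_instructions(program, ["r11"]))  # [0, 1, 2]
```

The unrelated instruction at index 3 drops out, leaving the three instructions that can affect r11 as the suspicious subset.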

Localization
[Figure: resources R1, R2, R3, R4; legend marks the suspicious instruction subset]

Concluding remarks
- Debug automation effectively reduces the debugging effort.
- Graph analysis has the potential to automate the localization of suspicious resources and instructions.
Future work:
- Study the impact of escaped stores in transaction aborts
- Experiment with larger (real-world) cases

Questions