PathExpander: Architectural Support for Increasing the Path Coverage of Dynamic Bug Detection S. Lu, P. Zhou, W. Liu, Y. Zhou, J. Torrellas University.

PathExpander: Architectural Support for Increasing the Path Coverage of Dynamic Bug Detection
S. Lu, P. Zhou, W. Liu, Y. Zhou, J. Torrellas
University of Illinois at Urbana-Champaign

2 Motivation
We have plenty of methods to detect software bugs
– Assertions
– Software (CCured, Purify)
– Hardware (iWatcher)
However, these tools only check part of the total code
– Only the part that is actually exercised

3 Motivation
Checking the entire code is really hard
– Impossible to check all paths, or even all branches
– Too difficult to create complete test cases
– State depends on external factors
  – File system, system load, date, etc.
– Too much time required for the tools to run
We must find a way to check as much of the code as possible, with as few runs as possible

4 PathExpander Idea
When arriving at a branch, execute BOTH paths
– The Non-Taken (NT) path will not commit
A dynamic checker (NOT PathExpander) will monitor the program execution and detect bugs in either of the two paths
This can happen in software or hardware
– We will mainly talk about the hardware

5 PathExpander Operation
All memory operations of the NT path are sandboxed
– The NT path must not permanently alter the program's state
The NT path may execute branches
– But within the NT path, only the taken direction of each branch is followed
– Otherwise, the number of paths can grow exponentially
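
The sandboxing described on this slide can be pictured with a small software model. This is only a sketch under assumptions: the class name SandboxedMemory and the dictionary-based memory are hypothetical stand-ins, not the hardware mechanism of the paper. NT-path stores go to a private overlay, NT-path loads see the overlay first, and squashing drops the overlay.

```python
class SandboxedMemory:
    """Hypothetical overlay that buffers NT-path writes away from committed state."""

    def __init__(self, committed):
        self.committed = committed   # committed program state: address -> value
        self.volatile = {}           # values written by the NT path only

    def load(self, addr):
        # The NT path must see its own writes first, then the committed state.
        if addr in self.volatile:
            return self.volatile[addr]
        return self.committed[addr]

    def store(self, addr, value):
        # NT-path writes never reach committed memory.
        self.volatile[addr] = value

    def squash(self):
        # Killing the NT path discards everything it wrote.
        self.volatile.clear()


memory = {0x10: 1, 0x14: 2}
nt = SandboxedMemory(memory)
nt.store(0x10, 99)
seen_by_nt = nt.load(0x10)   # the NT path reads back its own write
nt.squash()                  # after the squash the write is gone
```

After the squash, the committed dictionary is unchanged and a load of 0x10 returns the original value again.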

6 PathExpander Operation
The NT path will be killed
– After it has executed a set number of instructions normally, to limit resource contention
– If an unsafe event occurs
  – I/O, which cannot be sandboxed
– If it raises an exception
  – Load from NULL, etc.
The NT path can be executed on an idle core if we have a CMP
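
The three kill conditions above can be sketched as a loop over an instruction trace. Everything named here is an assumption for illustration: the Event enum, the function nt_path_outcome, and the budget of 1000 instructions (borrowed from the latency slide later in the deck, not stated as the actual limit).

```python
from enum import Enum


class Event(Enum):
    NORMAL = "normal"        # instruction that executes without incident
    IO = "io"                # unsafe event: cannot be sandboxed
    EXCEPTION = "exception"  # e.g. a load from NULL


MAX_NT_INSTRUCTIONS = 1000   # assumed budget, not taken from the slides


def nt_path_outcome(events, budget=MAX_NT_INSTRUCTIONS):
    """Return (reason the NT path stopped, instructions executed before that)."""
    executed = 0
    for event in events:
        if event is Event.IO:
            return ("unsafe event", executed)
        if event is Event.EXCEPTION:
            return ("exception", executed)
        executed += 1
        if executed >= budget:
            return ("instruction budget reached", executed)
    return ("trace ended", executed)


io_case = nt_path_outcome([Event.NORMAL] * 3 + [Event.IO])
budget_case = nt_path_outcome([Event.NORMAL] * 5, budget=2)
```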

7 Architecture Overview

8 Basic Architecture Overview
Two 4-bit exercise counters in the BTB, one for each of the two edges of a branch
– An NT path will be spawned if that edge's exercise counter is below a threshold
– Counters are periodically reset to zero, to support long-running programs
1-bit VTag (volatile tag) per L1 cache line
– All NT path writes set the VTag
– When the NT path is squashed, these lines are invalidated
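
A minimal sketch of the exercise-counter policy follows. The threshold value, the names BranchCounters and on_branch, and the choice to count an edge explored by an NT path as exercised are all assumptions; the slide only states that the counters are 4 bits wide, compared against a threshold, and periodically reset.

```python
COUNTER_MAX = 15       # largest value a 4-bit counter can hold
SPAWN_THRESHOLD = 1    # assumed value; the slides do not give the threshold


class BranchCounters:
    """Exercise counters for the two edges of one branch."""

    def __init__(self):
        self.count = {"taken": 0, "not_taken": 0}

    def record(self, edge):
        # Saturating increment: the counter sticks at COUNTER_MAX.
        self.count[edge] = min(self.count[edge] + 1, COUNTER_MAX)

    def reset(self):
        # Periodic reset so long-running programs can re-explore edges.
        self.count = {"taken": 0, "not_taken": 0}


def on_branch(counters, followed_edge):
    """Update counters for one dynamic branch; return True if an NT path is spawned."""
    other_edge = "not_taken" if followed_edge == "taken" else "taken"
    spawn = counters.count[other_edge] < SPAWN_THRESHOLD
    counters.record(followed_edge)
    if spawn:
        # Assumption: an edge explored by an NT path also counts as exercised.
        counters.record(other_edge)
    return spawn


branch = BranchCounters()
first = on_branch(branch, "taken")    # other edge never exercised, so spawn
second = on_branch(branch, "taken")   # other edge already exercised once
```

With a threshold of 1, each rarely exercised edge is explored once per reset period; a larger threshold would explore it more often at a higher cost.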

9 Basic Architecture Overview
Monitor_Memory_Area: pointer to NON-volatile memory that will remain after the NT path is killed
– Used by the monitoring tool
Predicate register
– Used for consistency fixes (in a few slides…)

10 CMP Architecture Overview
NT paths are spawned on idle cores, to hide the delay
The VTag now becomes 8 bits wide
– It holds the ID of the path that wrote the line
– This allows memory versioning
A fast register copy mechanism is required

11 CMP PathExpander Data Dependencies

12 CMP PathExpander Data Dependencies
A path must read data that it or its parents wrote
A taken path cannot commit before
– Its parent commits
– Its siblings have been killed
If the taken path must commit in order to displace lines to the L2, kill its siblings immediately to limit the delay
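
The read and commit rules on this slide can be sketched in software. This is an illustrative model only: the data layout (a per-address map from writer path ID to value, and a parent_of map) and the function names are assumptions, and "siblings" is taken to mean live paths with the same parent.

```python
def ancestors(path_id, parent_of):
    """Return the path itself followed by its parent, grandparent, and so on."""
    chain = [path_id]
    while parent_of.get(chain[-1]) is not None:
        chain.append(parent_of[chain[-1]])
    return chain


def read_line(addr, path_id, versions, parent_of, committed):
    """Read the version written by the path or its closest ancestor, else committed data."""
    for pid in ancestors(path_id, parent_of):
        if pid in versions.get(addr, {}):
            return versions[addr][pid]
    return committed[addr]


def can_commit(path_id, parent_of, committed_paths, live_paths):
    """A path commits only after its parent committed and no sibling is still alive."""
    parent = parent_of.get(path_id)
    parent_done = parent is None or parent in committed_paths
    siblings = [p for p in live_paths
                if p != path_id and parent_of.get(p) == parent]
    return parent_done and not siblings


parent_of = {0: None, 1: 0, 2: 0}             # paths 1 and 2 are children of path 0
versions = {0x20: {0: "root", 1: "child1"}}   # addr -> {writer path ID: value}
committed = {0x20: "old", 0x24: "clean"}

own_write = read_line(0x20, 1, versions, parent_of, committed)
parent_write = read_line(0x20, 2, versions, parent_of, committed)
blocked = can_commit(1, parent_of, committed_paths={0}, live_paths={1, 2})
allowed = can_commit(1, parent_of, committed_paths={0}, live_paths={1})
```

Path 2 never sees the value written by its sibling, path 1; it falls back to the parent's version.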

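13 State Inconsistency
Example of the state inconsistency problem:
If (x >= 2) { use x; assume we have a lot of data }
Else { use x; assume we have little data }
On an NT path, x still holds the value that sent the real execution the other way, so the code runs with a value it could never see on that path
The predicate register tells us whether this is a T or an NT path
– If we are on an NT path, we must somehow 'fix' these inconsistencies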

14 State Inconsistency fixing

15 State Inconsistency fixing
The compiler will insert predicated fix instructions
If a pointer must be fixed, the compiler will provide a dummy variable of the same type for the pointer to point at
– Will that always be correct? What will happen if we have unions?
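
For a scalar condition such as x >= 2, a predicated fix might look like the sketch below. The policy of moving x to the nearest boundary value that satisfies the branch condition is an assumption made for illustration, as are the function names; the slides only say that fix instructions are predicated on being in an NT path.

```python
def fix_for_ge(x, bound, on_nt_path):
    """Fix x on entry to the `x >= bound` side of a branch (hypothetical helper)."""
    if on_nt_path and not (x >= bound):
        return bound          # assumed policy: smallest value satisfying x >= bound
    return x                  # taken paths are never modified


def fix_for_lt(x, bound, on_nt_path):
    """Fix x on entry to the `x < bound` side of a branch (hypothetical helper)."""
    if on_nt_path and not (x < bound):
        return bound - 1      # assumed policy: largest integer satisfying x < bound
    return x


forced_if_side = fix_for_ge(0, 2, on_nt_path=True)      # x = 0 would contradict x >= 2
untouched_taken = fix_for_ge(0, 2, on_nt_path=False)    # predicate off: no change
forced_else_side = fix_for_lt(7, 2, on_nt_path=True)    # x = 7 would contradict x < 2
```

A fix of this kind only repairs the variable named in the condition, which is why the slide's questions about pointers and unions remain open.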

16 PathExpander Implementation
Software
– In PIN
Hardware
– With and without CMP
– 4 cores for the CMP case
Dynamic checkers
– CCured (dynamic software)
– iWatcher (dynamic hardware-assisted)
– Assertions
NON bug-triggering inputs

17 Simulation Parameters

18 Applications Tested

19 Experimental Results: Bug Detection
Bugs detected with NON-triggering inputs
21 detected to 17 missed (false negatives)
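
Reading the 21:17 figure as counts (21 bugs detected, 17 missed), the detection rate works out as in this short calculation. The variable names are illustrative.

```python
detected = 21                 # bugs found with non-triggering inputs
missed = 17                   # false negatives
total = detected + missed     # 38 bugs tested in all
rate = detected / total

summary = f"{detected} of {total} bugs detected ({rate:.0%})"
print(summary)   # prints "21 of 38 bugs detected (55%)"
```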

20 Experimental Results: Effect of Consistency Fixing on False Positives
Fixing reduces the ratio of false to true positives from 13:0.75 to 4:1

21 Experimental Results: Coverage Improvement
Branch coverage increases from 40% to 65%

22 Experimental Results: Cumulative Coverage Improvement
Random inputs
The last 5 inputs are given by programmers

23 Crash- and Unsafe-Event-Latency
65% to 99% of spawned NT paths execute for 1000 instructions without crashing or causing unsafe events

24 Evaluation: Software Implementation Overhead
DANGER: do not attempt this at home (or during this lifetime anyway)

25 Evaluation: Hardware Implementation Overhead
CMP overhead < 10%
If the SEQ overhead is already small, the thread spawning and squashing of the CMP version will worsen performance

26 Conclusion
PathExpander is NOT a dynamic checker
– It simply improves coverage by checking both paths
– And increases the false positives (4:1 ratio)
The software implementation is waaaaaaaaaaay toooooooooooooooo sloooooooooooooow

27 Conclusion
A lot of things remain untouched
– Where's the tech report???
– What do they do with speculative L1 lines under eviction?
– Fixing is really naïve. A lot of interesting things happen in queues, lists, etc. that cannot be fixed by the compiler
– What kind of ISA? Is it worth it?
Thank you! Questions?