HARDWARE SOFTWARE PARTITIONING AND CO-DESIGN PRINCIPLES MADHUMITA RAMESH BABU SUDHI PROCH 1/37.

Automated Derivation of Application-Aware Error Detectors Using Static Analysis: The Trusted Illiac Approach. Karthik Pattabiraman, Member, IEEE, Zbigniew T. Kalbarczyk, Member, IEEE, and Ravishankar K. Iyer, Fellow, IEEE 2/37

INTRODUCTION 3/37

OVERVIEW A data error is defined as a divergence in the data values used in a program from an error-free run of the program for the same input. The paper describes an approach to derive runtime error detectors using static analysis of the application. The detectors can be implemented in hardware or software. This paper focuses on the software implementation, but hardware is employed in the Reliability and Security Engine. 4/37

TERMS USED IN PAPER Backward program slice -- the set of program statements that can affect the value of a variable at a given program location. Critical variable -- a variable that is highly sensitive to random data errors. Checking expression -- an expression computed from the backward slice of a critical variable. Detector -- the set of all checking expressions for a critical variable. 5/37

STEPS IN DETECTOR DERIVATION (1) IDENTIFICATION OF CRITICAL VARIABLES: variables with the highest dynamic fan-out are chosen; each function is considered separately. (2) COMPUTATION OF THE BACKWARD SLICE OF CRITICAL VARIABLES: the program is traversed backward until the computation of the variable; all possible dependences are considered. (3) CHECK DERIVATION, INSERTION, INSTRUMENTATION: checks are derived by backtracking and inserted just after the computation of the critical variable; control paths are tracked at runtime. (4) RUNTIME CHECKING IN HARDWARE AND SOFTWARE: path tracking is implemented in hardware; checking is also moved to hardware. 6/37

EXAMPLE CODE FRAGMENT WITH DETECTORS.
Original code:
    if (a == 0) { b = a + c; d = b - e; f = d + b; }   /* Path 1 */
    else        { c = a - d; b = d + e; f = b + c; }   /* Path 2 */
Inserted detector for f:
    if (path == 1) f2 = 2*c - e;   /* expression for a == 0 */
    else           f2 = a + e;     /* expression for a != 0 */
    if (f2 == f) { use f; rest of code }
    else         { declare error in f along path and exit }
7/37
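The detector on this slide can be sketched in C as follows. This is only an illustration under stated assumptions: the wrapper compute_f, the helper check_f, and the explicit path variable are hypothetical names introduced here, not taken from the paper. The sketch shows why the two checking expressions are valid: on Path 1 (a == 0) the slice reduces to f = 2*c - e, and on Path 2 it reduces to f = a + e.

```c
#include <stdbool.h>

/* Hypothetical wrapper around the slide's fragment: computes f and
   records which control path was taken (1 or 2). */
static int compute_f(int a, int c, int d, int e, int *path)
{
    int b, f;
    if (a == 0) {            /* Path 1 */
        b = a + c;
        d = b - e;
        f = d + b;
        *path = 1;
    } else {                 /* Path 2 */
        c = a - d;
        b = d + e;
        f = b + c;
        *path = 2;
    }
    return f;
}

/* Detector (illustrative): recompute f from the slice inputs using the
   simplified, path-specific checking expression.
   Path 1 (a == 0): b = c and d = c - e, so f = 2*c - e.
   Path 2 (a != 0): c' = a - d and b = d + e, so f = a + e.
   Returns true when the recomputed value matches f. */
static bool check_f(int f, int path, int a, int c, int e)
{
    int f2 = (path == 1) ? 2 * c - e : a + e;
    return f2 == f;
}
```

A caller would invoke check_f right after compute_f and treat a false result as a detected data error in f.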

SOFTWARE ERRORS COVERED MEMORY CORRUPTION ERRORS: i) Can write to the heap or stack. ii) Static analysis assumes objects are infinitely far apart in memory. iii) Thus, backtracking examines all dependences for the critical variable. RACE CONDITIONS AND SYNCHRONIZATION ERRORS: i) Occur in concurrent programs due to a lack of synchronized accesses. ii) Static analysis does not account for asynchronous modifications. iii) Thus, the backward slice contains the values of shared variables under synchronous conditions. 8/37

SOFTWARE ERRORS COVERED MEMORY CORRUPTION ERRORS:
    int foo(int buf[]) {
        int sum[buflen];
        int max = 0;
        int maxIndex = 0;
        sum[0] = 0;
        for (int i = 0; i < buflen; i++) {
            sum[i+1] = sum[i] + buf[i];   /* memory overflow: writes sum[buflen] when i == buflen-1 */
            if (max < buf[i]) { max = buf[i]; maxIndex = i; }
        }
        if (max > threshold) return sum[maxIndex];
        return sum[buflen];
    }
9/37

SOFTWARE ERRORS COVERED RACE CONDITIONS AND SYNCHRONIZATION ERRORS:
    void foo(int *a, mutex *alock, int n, int c) {
        int i = 0;
        int sum = 0;
        for (i = 0; i < n; i++) {
            acquire_mutex(alock[i]);
            int old_a = a[i];
            a[i] = a[i] + c;
            check(a[i] == old_a + c);   /* CHECK */
            release_mutex(alock[i]);
        }
    }
A thread modifying the contents of a may be in another module; the precise analysis this would require is unscalable. 10/37

HARDWARE ERRORS COVERED Hardware transient errors that result in corruption of architectural state are considered in the fault model: INSTRUCTION FETCH AND DECODE ERRORS; EXECUTE AND MEMORY UNIT ERRORS; CACHE/MEMORY/REGISTER FILE ERRORS. 11/37

STATIC ANALYSIS A new compiler pass, the VALUE RECOMPUTATION PASS (VRP), is introduced in the LLVM architecture. Static Single Assignment (SSA) form is used as the intermediate code representation: each variable is defined once and given a unique name, and a special static construct, the "phi" instruction, is used whenever there is a merge. 12/37
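To make the SSA idea concrete, here is a small C illustration. It is an analogy only: real SSA lives in the compiler's intermediate representation, and the function names original and ssa_style are hypothetical. In ssa_style every definition has its own name (x1, x2), and x3 plays the role of the phi instruction at the merge point. Both definitions are evaluated eagerly here, which is safe only because they have no side effects; in real SSA each is computed only on its own path.

```c
/* Original form: x is assigned on two different control paths. */
static int original(int a, int b)
{
    int x;
    if (a > b)
        x = a - b;
    else
        x = b - a;
    return x * 2;
}

/* SSA-style rewrite (illustrative): each definition gets a unique name,
   and the merge of the two paths is expressed as x3 = phi(x1, x2). */
static int ssa_style(int a, int b)
{
    int took_then = a > b;
    int x1 = a - b;                   /* definition on the "then" path */
    int x2 = b - a;                   /* definition on the "else" path */
    int x3 = took_then ? x1 : x2;     /* stands in for phi(x1, x2)     */
    return x3 * 2;
}
```

Because every value has exactly one definition, a backward slice can follow operands directly to their unique defining instruction.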

PATH SPECIFIC SLICING ALGORITHM The backward traversal starts from the critical instruction and terminates whenever one of these conditions is met: (1) The beginning of the current function is reached, e.g. void bubble(int srtElements, int *sortList). (2) A basic block is revisited in a loop: if the data dependence is within a loop, one detector is placed on the critical variable and another on the value after the critical variable in the loop. (3) A dependence across loop iterations is encountered: the detectors are split. (4) A memory operand is encountered: variables are usually stored in virtual registers, but in cases such as pointer references the memory load is duplicated. 13/37

ALGORITHM Function computeslices(criticalInstruction): takes the critical instruction and returns PathList and SliceList (the backward slice). Function visit(seedInstruction, pathID, parent): takes the starting instruction, the ID of the corresponding flow path, and the index of the parent path; it visits each operand, adding it to the slice list, and returns Terminal. Only terminal paths are added to the final list of paths. Certain instructions, such as mallocs and frees, cannot be recomputed, but this does not have any impact on performance. 14/37
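The core of the traversal can be sketched in C as below. This is a deliberately reduced sketch, not the paper's algorithm: it handles a single straight-line path, omits the path IDs, parent indices and loop handling, and all names (struct instr, visit, compute_slice, the K_* kinds) are hypothetical. It keeps two of the termination rules from the previous slide: the walk stops at function arguments (beginning of the function) and at memory loads, which are included in the slice but not traversed further.

```c
enum kind { K_ARG, K_LOAD, K_ARITH };

struct instr {
    enum kind kind;
    int ops[2];      /* indices of the defining instructions, -1 if unused */
};

/* Visit instruction `id` and, recursively, the instructions that define
   its operands. Arguments and loads are terminal: they are added to the
   slice but their own operands are not followed. */
static void visit(const struct instr *prog, int id, int *in_slice)
{
    if (id < 0 || in_slice[id])
        return;                          /* unused operand or already seen */
    in_slice[id] = 1;
    if (prog[id].kind != K_ARITH)
        return;                          /* terminal instruction */
    for (int k = 0; k < 2; k++)
        visit(prog, prog[id].ops[k], in_slice);
}

/* Mark the backward slice of `critical` in in_slice[0..n-1] and return
   the number of instructions in the slice. */
static int compute_slice(const struct instr *prog, int n, int critical,
                         int *in_slice)
{
    int count = 0;
    for (int i = 0; i < n; i++)
        in_slice[i] = 0;
    visit(prog, critical, in_slice);
    for (int i = 0; i < n; i++)
        count += in_slice[i];
    return count;
}
```

The instructions marked in in_slice are the ones a checking expression would have to recompute for the critical value.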

SCALABILITY AND COVERAGE Number of control paths; size of checking expression; number of detectors. 15/37

STATE MACHINE GENERATION [Figure: state machine starting at START over nodes A to G, with events LOOPENTRY, LOOPEXIT, THEN, NO_EXIT and ENDIF; transition labels shown: (LOOPENTRY, LOOPEXIT), (ENDIF, NO_EXIT), (LOOPENTRY, NO_EXIT), (THEN, ENDIF), (NO_EXIT, ENDIF).] 16/37

EXPERIMENTAL RESULTS PERFORMANCE OVERHEADS: the checking overhead of VRP is 25%, and code modification adds 8%. DETECTION COVERAGE 17/37

DISCUSSIONS AND FUTURE WORK Discussion: 77% coverage for errors that propagate and cause crashes. FDV can provide 100% coverage, albeit at extremely high cost. If we neglect redundant detections, 90% of errors are detected. Future work: deriving detectors at lower levels of compilation; migration of checking functionality to reconfigurable hardware. 18/37

Hardware/Software Optimization of Error Detection Implementation for Real time Embedded systems Adrian Lifa, Petru Eles, Zebo Peng, Viacheslav Izosimov International Conference on Hardware/Software Codesign and System Synthesis, /37

Agenda: Motivation and Background; Example of Error Detection Implementation (EDI); Optimization Challenge, with examples; EDI Algorithm for Static and PDR FPGA H/W; Experimental Results; Conclusion and Improvements. 20/37

Motivation and Background Reliable system operation is required for safety-critical systems, e.g. adaptive cruise control and nuclear power plants. Error detection and recovery is very important. Implementation involves a cost: time overhead. Early optimization of the scheme is most beneficial. 21/37

EDI - Example Error detection and recovery code. Two main sources of performance overhead: variable checking and path tracking. 22/37

Optimization Challenge A SW-only approach has overhead as high as 400%. A HW-only implementation increases cost (logic area). The other choice is a mixed HW/SW approach. Optimization variables: time criticality of tasks; amount and cost of HW; nature of HW (static or partially reconfigurable). 23/37

Optimization Challenge Processes modeled as acyclic graphs – Connections show dependence 24/37

Optimization Challenge Optimization Objective – Optimal fault tolerant worst case schedule length (WCSL), given overheads and mapping of tasks “Re-execution of task on fault” model used for recovery 25/37
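As a rough feel for how re-execution inflates the schedule, the C sketch below computes a worst-case length for a very reduced model. Everything about the model is an assumption made here for illustration and is not the paper's WCSL computation: the processes run back to back on a single node, at most k transient faults occur (k is a hypothetical parameter), a faulty process is simply re-executed, and detection and recovery overheads are folded into the WCET values. Under those assumptions the worst case is that all k faults hit the longest process.

```c
/* Worst-case length of a sequential chain of n processes when up to k
   faults each trigger one re-execution of the affected process.
   wcet[i] is the worst-case execution time of process i, including its
   error-detection overhead. Simplified illustrative model only. */
static int wcsl_chain(const int *wcet, int n, int k)
{
    int total = 0;
    int longest = 0;
    for (int i = 0; i < n; i++) {
        total += wcet[i];
        if (wcet[i] > longest)
            longest = wcet[i];
    }
    return total + k * longest;   /* all k faults hit the longest process */
}
```

In this model, moving error detection for the longest process into hardware shortens both the fault-free schedule and every re-execution, which is why the choice of implementation per process matters.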

Optimization Challenge - Example WCET_U: baseline worst-case execution time. WCET_i: worst-case execution time for an implementation. h_i: H/W cost/area for a particular process. P_i: reconfiguration time for a particular task. 26/37

Optimization Challenge - Example Implementation options considered: SW only (path tracking and variable checking in SW, as interleaved code); HW only (path tracking and variable checking in HW); mixed HW/SW (path tracking in HW, variable checking in SW). 27/37

Optimization Challenge - Example Configurations shown: SW-only implementation. HW-only implementation, unconstrained area. P1 Mixed, P2 SW, P3 Mixed, P4 SW. P1 Mixed, P2 SW, P3 SW, P4 Mixed. PDR: P1 Mixed, P2 Mixed, P3 SW, P4 Mixed. 28/37

EDI Algorithm A combined mapping and scheduling problem. Because the problem is NP-complete, an optimal solution is feasible only for very small sets of tasks and nodes. Heuristics are used instead: the Tabu Search algorithm. 29/37

EDI Algorithm – Static FPGA 30/37

EDI Algorithm – Static FPGA Important aspects: Start from a random initial solution. Search the neighborhood by performing moves: simple moves and swap moves. Swap moves replace tasks on one resource. Avoid local minima by accepting non-improving moves. Tabu moves are used to avoid cycling back to local minima. Diversification is used to broaden the search: wait counters are kept for processes, and long-waiting processes are used. The search is restricted to critical-path moves as a constraint. 31/37
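A minimal tabu search over implementation choices can be sketched in C as follows. It is a sketch under loud assumptions, not the authors' algorithm: the cost is a plain sum of per-process times rather than a fault-tolerant schedule length, only simple moves are used (no swap moves, no diversification, no critical-path restriction), the start is the all-SW solution instead of a random one so the run is deterministic, and the SW option is assumed to need zero area so that start is feasible. The names struct problem, tabu_search, TENURE and ITERS are hypothetical.

```c
#include <limits.h>

#define NPROC  4
#define NOPT   3     /* 0 = SW only, 1 = mixed HW/SW, 2 = HW only */
#define TENURE 2     /* iterations a moved process stays tabu      */
#define ITERS  50

struct problem {
    int time[NPROC][NOPT];   /* execution time per process and option */
    int area[NPROC][NOPT];   /* HW area per process and option        */
    int area_budget;
};

static int total_time(const struct problem *p, const int *sol)
{
    int t = 0;
    for (int i = 0; i < NPROC; i++)
        t += p->time[i][sol[i]];
    return t;
}

static int total_area(const struct problem *p, const int *sol)
{
    int a = 0;
    for (int i = 0; i < NPROC; i++)
        a += p->area[i][sol[i]];
    return a;
}

/* Returns the best cost found and writes the matching options to best[]. */
static int tabu_search(const struct problem *p, int *best)
{
    int cur[NPROC] = { 0 };          /* start: everything in SW */
    int tabu_until[NPROC] = { 0 };
    int best_cost = total_time(p, cur);

    for (int i = 0; i < NPROC; i++)
        best[i] = cur[i];

    for (int it = 1; it <= ITERS; it++) {
        int mv_proc = -1, mv_opt = -1, mv_cost = INT_MAX;

        /* Explore the neighborhood: change one process's option. */
        for (int i = 0; i < NPROC; i++) {
            int old = cur[i];
            for (int o = 0; o < NOPT; o++) {
                if (o == old)
                    continue;
                cur[i] = o;
                int feasible = total_area(p, cur) <= p->area_budget;
                int cost = total_time(p, cur);
                int is_tabu = tabu_until[i] >= it;
                /* Aspiration: a tabu move is allowed if it beats the best. */
                if (feasible && (!is_tabu || cost < best_cost) &&
                    cost < mv_cost) {
                    mv_proc = i;
                    mv_opt = o;
                    mv_cost = cost;
                }
            }
            cur[i] = old;
        }
        if (mv_proc < 0)
            continue;                /* wait for tabu tenures to expire */

        /* Take the best admissible move, even if it worsens the cost. */
        cur[mv_proc] = mv_opt;
        tabu_until[mv_proc] = it + TENURE;
        if (mv_cost < best_cost) {
            best_cost = mv_cost;
            for (int i = 0; i < NPROC; i++)
                best[i] = cur[i];
        }
    }
    return best_cost;
}
```

Accepting the best admissible neighbor even when it is worse is what lets the search leave a local minimum, and the tabu tenure keeps it from immediately undoing that move.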

EDI Algorithm – PDR FPGA Additional complexities: calculate the reconfiguration schedule for EDI, which is a function of earliest start time, worst-case execution time, HW area, and critical-path dependency. Moves exploration for a process. 32/37

Experimental Results Process graphs: 6 types with 15 graphs each. Types of random data: 2. FPGA HW variation: 12 types (as % of max area). Total evaluation settings = 2 * 6 * 15 * 12 = 2160. 33/37

Experimental Results Possible only for 20 process graphs and up to 40% HW area. Error: 1% max (testcase1), 2.5% max (testcase2). 34/37

Experimental Results – Static FPGA 15% HW area gives >50% improvement (testcase1). 40% HW area gives >50% improvement (testcase2). The improvement saturates after a point. 35/37

Experimental Results – PDR FPGA 5% HW area gives >36% improvement (testcase1). 25% HW area gives >34% improvement (testcase2). The improvements are over and beyond the static HW case. 36/37

Conclusion and Improvements Conclusions: An optimization scheme for EDI was presented. Fault tolerance and real-time constraints make the problem challenging. A heuristic-based algorithm (Tabu search) was used. The PDR HW option gives the best results. Improvements: The work assumes a fixed mapping of tasks to each of the computational nodes. It could have been compared with another heuristic algorithm, such as simulated annealing. 37/37