Dancing With Uncertainty Saša Misailović Stelios Sidiroglou Martin Rinard MIT CSAIL.

Presentation transcript:

Example: Water
Simulates a system of water molecules
[Figure: water molecules, each drawn as one O atom with two H atoms]

Dubstep
Explores the effects of selectively removing synchronization

Dubstep Highlights
1. Removing locks and using opportunistic barriers trades accuracy for performance
2. Automatically explores the tradeoff space induced by candidate transformations
3. Uses statistical analysis to characterize the impact of transformations on accuracy

Dubstep Workflow
Prepare, Find, Transform, Analyze, Navigate

Dubstep Workflow: Prepare
1. Prepare representative inputs
2. Prepare accuracy model
 – Output abstraction (important parts of output)
 – Accuracy bound (amount of tolerable error)

Dubstep Workflow: Find
Loops with parallel constructs
Profiling: performance & memory
 – Interf (56.4%)
 – Poteng (43.4%)

Dubstep Workflow: Transform (removing synchronization)
    void scratchPad::updateForces(double R[3][3]) {
        mutex_lock(this->lock);          // lock guards the three force updates
        this->H1force.vecAdd(R[0]);
        this->Oforce.vecAdd(R[1]);
        this->H2force.vecAdd(R[2]);
        mutex_unlock(this->lock);
    }

Dubstep Workflow: Transform (removing synchronization)
    void scratchPad::updateForces(double R[3][3]) {
        // mutex_lock / mutex_unlock calls removed; concurrent updates may now race
        this->H1force.vecAdd(R[0]);
        this->Oforce.vecAdd(R[1]);
        this->H2force.vecAdd(R[2]);
    }

Dubstep Workflow: Transform (opportunistic barriers)
    void ensemble::interf() {
        parallel_for(interf_body, 0, NumMol-1);
    }
parallel_for:
1. Schedule threads
2. Execute interf_body in parallel
3. Wait for all threads to complete

Dubstep Workflow: Transform (opportunistic barriers)
    void ensemble::interf() {
        parallel_for*(interf_body, 0, NumMol-1);
    }
parallel_for*:
1. Schedule threads
2. Execute interf_body in parallel
3. Wait for half of the threads to complete
4. Instruct the remaining threads to stop
[Rinard, OOPSLA 2007]
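The four steps of the opportunistic barrier can be sketched in standard C++ as follows. This is a minimal illustration and not the implementation from the talk: the function name `opportunisticParallelFor`, the static chunking of iterations, the stop flag polled between iterations, and the rounding of "half" upward are all assumptions made for the example.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for parallel_for*: runs body(i) for i in [begin, end]
// (inclusive, as in the slide's call) on numThreads threads. Once half of the
// threads have finished their chunk, the others are told to stop.
// Returns the number of iterations that were actually executed.
int opportunisticParallelFor(const std::function<void(int)>& body,
                             int begin, int end, int numThreads) {
    assert(numThreads > 0 && begin <= end);
    const int total = end - begin + 1;
    const int chunk = (total + numThreads - 1) / numThreads;  // static chunking (assumption)
    const int quorum = (numThreads + 1) / 2;                  // "half of the threads", rounded up
    std::atomic<int> finished{0};
    std::atomic<int> executed{0};
    std::atomic<bool> stop{false};

    std::vector<std::thread> workers;
    for (int t = 0; t < numThreads; ++t) {
        workers.emplace_back([&, t] {
            const int lo = begin + t * chunk;
            const int hi = std::min(end + 1, lo + chunk);
            for (int i = lo; i < hi; ++i) {
                if (stop.load()) return;  // instructed to stop: skip the remaining iterations
                body(i);
                executed.fetch_add(1);
            }
            // This thread completed its chunk; the thread that completes the quorum raises the flag.
            if (finished.fetch_add(1) + 1 >= quorum) stop.store(true);
        });
    }
    // Joining is quick: stopped threads exit after their current iteration.
    for (auto& w : workers) w.join();
    return executed.load();
}
```

With 8 threads and 800 iterations, at least 400 iterations always run (four complete chunks), and the rest depend on timing. The caller has to tolerate the skipped iterations, which is where the accuracy loss comes from.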

Dubstep Workflow: Analyze
Analyze the transformed program:
 – Criticality: memory safety, integrity
 – Performance: speedup comparison
 – Accuracy: statistical analysis

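Dubstep Workflow: Analyze
[Diagram: the same input is given to the original program and to the transformed program; an application-specific output abstraction is applied to each output, and the difference between the two abstractions is compared against the bound δ (difference < δ).]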
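The comparison of abstracted outputs against the bound δ can be illustrated with a short C++ sketch. The talk does not specify the distortion metric, so the mean relative difference used here is an assumption, as are the names `Abstraction`, `distortion`, and `withinBound`.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical output abstraction: the numbers the user cares about,
// already extracted from a program's raw output.
using Abstraction = std::vector<double>;

// Assumed metric: mean relative difference between corresponding
// components of the original and transformed abstractions.
double distortion(const Abstraction& original, const Abstraction& transformed) {
    assert(!original.empty() && original.size() == transformed.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < original.size(); ++i) {
        assert(original[i] != 0.0);  // the relative difference needs a nonzero reference value
        sum += std::fabs((original[i] - transformed[i]) / original[i]);
    }
    return sum / static_cast<double>(original.size());
}

// A transformed run is acceptable when its distortion is below the bound delta.
bool withinBound(const Abstraction& original, const Abstraction& transformed, double delta) {
    return distortion(original, transformed) < delta;
}
```

For example, abstractions {100, 200} and {101, 198} differ by 1% in each component, so the run passes a bound of δ = 0.05.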

Dubstep Workflow: Navigate
Navigate the tradeoff space:
1. Transform and analyze one location at a time
 – 3 locations in Water
2. Transform multiple locations in the same candidate program
 – Guided by the results of the previous step
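One way to read the two navigation steps is sketched below in C++. The talk does not say how the first step guides the second; this sketch assumes that only locations that were acceptable on their own are combined. The names `navigate` and `acceptable` are hypothetical, and `acceptable` stands in for the whole analyze step.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// A candidate program is identified by the set of locations it transforms.
using Candidate = std::vector<std::string>;

// Returns every candidate that the (hypothetical) acceptable() predicate accepts.
std::vector<Candidate> navigate(const std::vector<std::string>& locations,
                                const std::function<bool(const Candidate&)>& acceptable) {
    assert(locations.size() < 20);  // exhaustive enumeration only makes sense for a few locations

    // Step 1: transform and analyze one location at a time.
    std::vector<std::string> good;
    std::vector<Candidate> accepted;
    for (const auto& loc : locations) {
        if (acceptable(Candidate{loc})) {
            good.push_back(loc);
            accepted.push_back(Candidate{loc});
        }
    }

    // Step 2: combine only the locations that survived step 1 (assumption).
    const std::size_t n = good.size();
    for (std::size_t mask = 1; mask < (std::size_t{1} << n); ++mask) {
        Candidate c;
        for (std::size_t i = 0; i < n; ++i) {
            if (mask & (std::size_t{1} << i)) c.push_back(good[i]);
        }
        if (c.size() >= 2 && acceptable(c)) accepted.push_back(c);
    }
    return accepted;
}
```

With the three Water locations this evaluates at most 3 single-location candidates and 4 combinations.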

Search Space Exploration
[Plot with axes Relative Speedup and Accuracy Loss; points: LI, BI, BP, LI+BI, LI+BP, BI+BP, LI+BI+BP]
Legend: LI – Synchronization Interf; BI – Barrier Interf; BP – Barrier Poteng
Baseline: the original parallel program runs 6.2 times faster than the sequential version on 8 cores
How confident can we be about these observations?

Execution Reliability
The probability p that the transformed program, run on the given input, produces a result with error less than the bound δ.
While we cannot model p, we can specify a minimum acceptable reliability r.

Execution Reliability
Determine if the program's reliability p > r
How to pick N, the number of runs?

How Many Runs Are Enough?
Procedure that determines whether p > r:
 – Returns the correct result most of the time
   – Wrong decision rate
   – Tolerance region
 – Quickly identifies extreme (very good or very bad) transformations

Statistical Analysis
Sequential Probability Ratio Test

Bound (δ)    Best Transformation
0.01         LI
0.05         LI
0.10         LI+BI+BP
0.15         LI+BI+BP
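A sequential probability ratio test for deciding whether p > r can be sketched as below, using Wald's standard formulation for Bernoulli trials. The exact parameterization used in the talk is not given, so the hypotheses (p ≥ r + eps against p ≤ r − eps), the parameter names `alpha`, `beta`, and `eps`, and the callback `runOnce` are assumptions for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

enum class Verdict { Reliable, Unreliable, Undecided };

struct SprtResult {
    Verdict verdict;
    int runs;  // number of program runs consumed before deciding
};

// Wald's SPRT for H0: p >= r + eps against H1: p <= r - eps.
// runOnce() executes the transformed program once and returns true when
// the output error was below the bound delta.
// alpha: chance of answering Unreliable although H0 holds.
// beta:  chance of answering Reliable although H1 holds.
SprtResult sprt(const std::function<bool()>& runOnce, double r, double eps,
                double alpha, double beta, int maxRuns) {
    const double p0 = r + eps;
    const double p1 = r - eps;
    assert(0.0 < p1 && p0 < 1.0);
    assert(0.0 < alpha && alpha < 1.0 && 0.0 < beta && beta < 1.0);

    const double acceptH1 = std::log((1.0 - beta) / alpha);   // upper threshold
    const double acceptH0 = std::log(beta / (1.0 - alpha));   // lower threshold
    const double onSuccess = std::log(p1 / p0);               // negative step
    const double onFailure = std::log((1.0 - p1) / (1.0 - p0));  // positive step

    double llr = 0.0;  // log of P(observations | H1) / P(observations | H0)
    for (int n = 1; n <= maxRuns; ++n) {
        llr += runOnce() ? onSuccess : onFailure;
        if (llr <= acceptH0) return {Verdict::Reliable, n};
        if (llr >= acceptH1) return {Verdict::Unreliable, n};
    }
    return {Verdict::Undecided, maxRuns};
}
```

The number of runs is not fixed in advance: a transformation that always succeeds or almost always fails crosses a threshold after a handful of runs, while one whose reliability lies inside the tolerance region around r may need many more.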

Exploring Tradeoff Space: Quickstep [MIT-TR , TECS/PEC 2012]
Start: Sequential program with for loops
Transformations:
 – Parallel loop introduction
 – Synchronization, Replication

Exploring Tradeoff Space: Loop Perforation [ICSE 2010, ONWARD 2010, SAS 2011, FSE 2011]
Start: Program with for loops
Transformations:
 – Skip loop iterations (multiple forms)
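As a small illustration of skipping loop iterations, the C++ sketch below computes a mean while visiting only every stride-th element. It shows one simple form of perforation chosen for the example; the function name `perforatedMean` and the stride parameter are hypothetical and not taken from the cited papers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Mean of xs computed from every stride-th element only.
// stride == 1 is the original loop; larger strides skip iterations,
// trading accuracy for less work.
double perforatedMean(const std::vector<double>& xs, std::size_t stride) {
    assert(!xs.empty() && stride >= 1);
    double sum = 0.0;
    std::size_t visited = 0;
    for (std::size_t i = 0; i < xs.size(); i += stride) {
        sum += xs[i];
        ++visited;
    }
    // Divide by the number of visited elements so the result stays an estimate of the mean.
    return sum / static_cast<double>(visited);
}
```

On the values 1 through 8, the full loop gives 4.5 and a stride of 2 gives 4.0 with half the iterations.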

Exploring Tradeoff Space: Dynamic Knobs [ASPLOS 2011]
Start: Program with command line parameters
Transformations:
 – Alternate function versions activated by command line parameters

Exploring Tradeoff Space: NapRed [POPL 2012]
Start: Program is a tree of Map-Reduce type tasks
Transformations:
 – Function substitution
 – Reduction sampling

Exploring Tradeoff Space: Dubstep [Today: RACES 2012]
Start: Parallel program with for loops
Transformations:
 – Removing locks
 – Opportunistic barriers

Reasoning About Accuracy
Exploring levels of accuracy guarantees:
 – Logic-based
 – Probabilistic
 – Statistical
 – Empirical