Using JetBench to Evaluate the Efficiency of Multiprocessor Support for Parallel Processing
HaiTao Mei and Andy Wellings
Department of Computer Science, University of York, UK

Goal of the Work
- To compare how efficiently the concurrency models of various programming languages are implemented on multiprocessor SMP systems
- Given a program running on Linux/Windows, does it really matter what language you use?
- Evaluate:
  - Ada, C (+ OMP), RTSJ (Jamaica): compiled implementations
  - C#, Java, Java with Thread Pools, Java with JOMP: JIT implementations

Approach
- Use JetBench:
  - an application benchmark written in C, used in conjunction with OpenMP
  - contains real-time jet engine thermodynamic calculations
  - calculations are inspired by a sequential application named NASA EngineSim
  - a simple parallel program
- Rewrite JetBench in:
  - Ada, C (+ OMP), RTSJ (Jamaica)
  - C#, Java, Java with Thread Pools, Java with JOMP

Approach continued
- Use Linux:
  - on a physical machine (1-8 cores)
  - via the Simics simulator (1-128 cores)
- Measure response times and speed-ups (speed-up is defined in the note below)
- Use statistical techniques to evaluate the significance of the results
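
(For reference, and assuming the paper follows the standard definition: speed-up on p cores is speed-up(p) = T(1) / T(p), where T(k) is the measured response time of the same implementation running on k cores.)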

JetBench
- Goals:
  - to be a benchmark that can execute in parallel on multiprocessor platforms
  - to provide a tool to analyze the real-time performance of a real-time operating system, including thread scheduling, execution efficiency and memory management capabilities
- 3-step execution:
  1. initialization
  2. create threads with the help of OpenMP and carry out the calculations in parallel
  3. print out results

JetBench: Step 1
- Initializes parameters and opens a file that contains all the input data needed
- The input data consists of the values of three sensors: altitude, air speed, and throttle
- A fourth input value gives a contrived deadline that represents a time constraint on the calculation of the engine's performance figures
- The deadline is fixed at a value of 0.05 seconds:
  - This has been chosen to be approximately 2-3 times the value of the execution time of the raw C code calculations
  - Hence, the required utilization is less than 50%
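
Each line of the input file can be thought of as a record like the following Java sketch. The field names are assumptions for illustration only; they are not the benchmark's actual identifiers.

final class InputSample {
    // One input sample: three sensor readings plus the contrived deadline
    // (fixed at 0.05 s, roughly 2-3x the raw calculation time).
    final double altitude;
    final double airSpeed;
    final double throttle;
    final double deadlineSeconds;

    InputSample(double altitude, double airSpeed, double throttle, double deadlineSeconds) {
        this.altitude = altitude;
        this.airSpeed = airSpeed;
        this.throttle = throttle;
        this.deadlineSeconds = deadlineSeconds;
    }
}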

JetBench: Step 2
- Creates and starts one worker thread for each processing core
  - All the threads perform the same operations
- Each thread calculates π
- Then reads input data and carries out thermodynamic, geometry and engine performance calculations
- The times taken to perform the calculations are recorded
- When all the data is processed, the threads terminate
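
A minimal sketch, in Java, of the structure this step describes: one worker per core in a fixed-size thread pool, each recording the response time of every calculation. The class name, the SAMPLES constant and the calculation bodies are placeholders, not the authors' actual rewrite.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class JetBenchStep2 {
    static final int SAMPLES = 1000;   // assumed number of input samples per thread

    public static void main(String[] args) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);   // one worker per core
        List<Future<long[]>> workers = new ArrayList<>();

        for (int i = 0; i < cores; i++) {
            workers.add(pool.submit(() -> {
                computePi();                          // every worker performs the same operations
                long[] responseTimes = new long[SAMPLES];
                for (int s = 0; s < SAMPLES; s++) {
                    long start = System.nanoTime();
                    thermodynamics(s);                // placeholders for the real calculations
                    geometry(s);
                    enginePerformance(s);
                    responseTimes[s] = System.nanoTime() - start;   // per-sample response time
                }
                return responseTimes;
            }));
        }

        long worst = 0;
        for (Future<long[]> w : workers) {
            for (long t : w.get()) {                  // wait for each worker and collect its times
                worst = Math.max(worst, t);
            }
        }
        System.out.println("worst-case response time (ns): " + worst);
        pool.shutdown();
    }

    static void computePi() { /* placeholder */ }
    static void thermodynamics(int s) { /* placeholder */ }
    static void geometry(int s) { /* placeholder */ }
    static void enginePerformance(int s) { /* placeholder */ }
}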

JetBench: Step 3
- During the last step, the results are collected from the second step and are printed

Issues with JetBench
- Code is littered with needless access to shared variables and never-used variables
- Race conditions: no use of synchronization when shared variables are needed
- Confuses response times with execution time
- Ignores thread creation and termination overheads
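
One way such races can be removed is sketched below (a hedged illustration with invented variable names, not the authors' revised code): working variables become per-thread, and anything genuinely shared is updated through an atomic accumulator.

import java.util.concurrent.atomic.DoubleAdder;

class ThermoState {
    // Per-worker working variables: each thread gets its own instance,
    // so no synchronization is needed for these.
    double thrust, fuelFlow, nozzleTemperature;   // names invented for illustration

    // A value that genuinely must be shared across workers needs protection;
    // DoubleAdder gives a race-free accumulator without taking a lock on every update.
    static final DoubleAdder totalThrust = new DoubleAdder();

    void publish() {
        totalThrust.add(thrust);
    }
}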

Revised structure

Languages
- Ada: AdaCore GNAT GPL 4.6
- C used with OMP: gcc and OpenMP 3.1
- Java 8: Java version 1.8.0_05 (build 1.8.0_05-b13)
- Java using OpenMP: JOMP 1.0b
- RTSJ: Jamaica Builder 6.2 Release 4 (build 8016)
- C#: Mono JIT compiler

Results: I

Results: II

Results: III

Analysis of Results
- Analysis of variance (ANOVA) is a general statistical technique for separating the total variation in a set of measurements into:
  - the variation due to measurement noise, and
  - the variation due to real differences among the alternatives being compared

Two-way ANOVA
- Examines the influence of two different independent variables on one dependent variable
- It determines both the main effect of each independent variable and whether there is an interaction effect between them
- The analysis computes an F value which describes this relationship
- It is an appropriate technique for the analysis of the measurements of execution/response times
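
(In the standard two-way formulation, each factor and the interaction gets its own F value: F_languages = MS_languages / MS_error, F_cores = MS_cores / MS_error, and F_interaction = MS_interaction / MS_error, where each mean square MS is the corresponding sum of squares divided by its degrees of freedom. The larger F is relative to the critical value at the chosen significance level, the less plausible it is that the observed differences are just measurement noise.)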

Goals of analysis
- To show that both the programming language and the number of cores have an impact on the benchmark's response times,
  - and also that there is significant interaction between them,
  - i.e. different programming languages have different efficiency impacts in multiprocessor parallel processing
- The null hypothesis is that both factors (programming language and number of cores) have no effect on the benchmark's response times, i.e. on its efficiency

ANOVA Analysis
- Calculations performed by Matlab
- [Table: Source | F | Probability of null hypothesis | F value for a 0.01 probability of null hypothesis; rows for Cores, Languages and Interaction. Only one value survives the transcript: 914.45, attached to the Interaction row.]
- Of course, this doesn't tell us much. It could be one bad implementation, e.g. Java JOMP!

Tukey HSD Analysis
- Compares the means of every language with the means of every other language to find significant differences
- A >> B indicates the means are significantly different and A is larger than B
- A > B indicates the means are NOT significantly different but A is larger than B
- Matlab used to perform the calculations
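
(For reference, Tukey's HSD declares two means significantly different when their difference exceeds HSD = q(alpha, k, df_error) * sqrt(MS_error / n), where q is the studentized range statistic for k groups and df_error error degrees of freedom, MS_error is the error mean square from the ANOVA, and n is the number of measurements per group, assuming equal group sizes. The >> relation on the following slides corresponds to differences that exceed this threshold.)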

Response Times: 1 and 2 Cores
- 1 core: Java JOMP >> C# >> Java > Java ForkJoin >> RTSJ > Ada > C + OpenMP
- 2 cores: Java JOMP >> C# >> Java ForkJoin > Java >> RTSJ >> C + OpenMP > Ada

Response Times: 4 and 8 Cores

Speed-Up: 2 and 4 Cores
- 2 cores: RTSJ > C+OpenMP > C# > Ada > Java ForkJoin > Java >> Java JOMP
- 4 cores: C+OpenMP > C# > RTSJ > Ada >> Java > Java ForkJoin >> Java JOMP

Speed-Up: 6 and 8 Cores
- Impact of hyperthreading?

Response Times using Simics

Speed-Up using Simics

Conclusions
- It is now taken for granted that real-time and embedded platforms will be multicore
- A plethora of programming languages can be used
- We chose some languages targeting real-time and others (that tend to be JITted)
- Even allowing a warm-up phase, JIT could not match compiled code
- Thread creation cost becomes more significant if not processing lots of input data