Authors: Cedric Augonnet, Samuel Thibault, and Raymond Namyst (INRIA Bordeaux, LaBRI, University of Bordeaux). Workshop on Highly Parallel Processing on a Chip (HPPC 2009).

1) Introduction
2) What is StarPU?
3) How to define and build performance models?
4) Building history-based performance models dynamically
5) Experimental validation
6) Conclusion

 Multi-core architectures featuring specialized accelerators
◦ These are getting an increasing amount of attention.
◦ Their success will probably influence the design of future High Performance Computing hardware.
 Homogeneous multi-core systems → heterogeneous multi-core systems
 Static prediction → dynamic prediction

 An auto-tuning performance prediction approach
◦ Based on performance history tables built dynamically while the application runs.

 A runtime system for task scheduling on heterogeneous multi-core architectures.
 The design of StarPU is organized around three main components:
◦ A unified execution model.
◦ A data management library.
◦ A scheduling framework.

 Defining a performance model:
◦ We need to decide which parameters the model should depend on.
◦ We then need to find the relationship between these parameters and the task's execution time.

 Building a performance model:
◦ It is common to use specific pre-calibration programs to build such models.
◦ It is however possible to design a model based on the amount of computation per task, and to calibrate its parameters by means of regression.
◦ StarPU can therefore calibrate parametric models automatically, either at runtime using linear regression models or offline in the case of non-linear models.

 Regression analysis creates a model relating a dependent variable to one or more independent variables.
 With this model, we can predict the value of the dependent variable from the independent variables.
 The most common cases are linear regression and non-linear regression.

 Measuring tasks' durations.
 Identifying task kinds.
 Feeding and looking up the model.

 Each computational kernel is associated with one hash table per architecture.
 Steps:
1. A task is submitted to StarPU.
2. StarPU computes the task's hash.
3. It consults the hash table corresponding to the proper kernel-architecture pair to retrieve the average execution time previously measured for this kind of task.
4. After execution, it updates the hash table and saves it to a file, so that performance models persist between runs.

 Environment:
◦ These automatic model calibration mechanisms have been implemented in StarPU.
◦ Platforms: multi-core CPUs, GPUs, and the Cell processor (SPUs).

Performance feedback tools

 We have proposed a generic approach to seamlessly build history-based performance models.
 It has been implemented within the StarPU runtime system with the support of its integrated data management library, and we have shown how StarPU's performance feedback tools help the programmer analyze whether the resulting performance predictions are relevant.