*time Optimization Heiko, Diego, Thomas, Kevin, Andreas, Jens.

Presentation transcript:


Question
How can design time help to reduce the complexity of runtime optimization?
– Curse of dimensionality
– During design time the possible space for optimization is restricted
– Currently:
  – Start from a sequential solution
  – Stepwise increase
– Narrowing down the search space in each phase
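To make the curse of dimensionality concrete, here is a minimal sketch of how fixing variation points in an earlier phase shrinks the space left for the next one. The variation-point names echo the slides, but the number of options per point is a made-up assumption, and `space_size` is a hypothetical helper.

```python
from math import prod

# Hypothetical number of options per variation point (illustrative values only).
variation_points = {
    "allocation": 6,   # decided at design time
    "cores": 4,        # decided at design time
    "scheduler": 3,    # decided at design time
    "tile_size": 5,    # decided at implementation time
    "threads": 8,      # left open until runtime
    "block_size": 4,   # left open until runtime
}

def space_size(points, fixed=()):
    """Number of configurations that remain once the `fixed` points are decided."""
    return prod(n for name, n in points.items() if name not in fixed)

design_time = ("allocation", "cores", "scheduler")
full = space_size(variation_points)
after_design = space_size(variation_points, fixed=design_time)
after_implementation = space_size(variation_points, fixed=design_time + ("tile_size",))

print(full, after_design, after_implementation)
```

Under these assumed numbers, the runtime optimizer only has to search the configurations that the two earlier phases left open, instead of the full product of all options.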

Different Phases
– Design time
– Implementation time
– Runtime

Design Time
Architecture Design Optimization
Variation Points
– Component allocation, replication, implementation
– Hardware: #cores, CPU speed, scheduling algorithm
– Component configuration
Goal:
– Find a good solution w.r.t. different criteria
Result:
– A set of solutions (Pareto front) that are optimal w.r.t. different criteria
– Performance, reliability, cost, energy efficiency (interesting since it is hard to optimize)
– Only valid for the assumptions made by the optimization
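The Pareto front named as the design-time result can be sketched as a simple dominance filter. This is only an illustration: the candidate architectures, the criterion names and all numbers are invented, and every criterion is assumed to be "lower is better".

```python
from typing import Dict, List

Candidate = Dict[str, float]

def dominates(a: Candidate, b: Candidate, criteria: List[str]) -> bool:
    """True if a is at least as good as b on every criterion and strictly better on one."""
    return (all(a[c] <= b[c] for c in criteria)
            and any(a[c] < b[c] for c in criteria))

def pareto_front(candidates: List[Candidate], criteria: List[str]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(o, c, criteria) for o in candidates if o is not c)]

# Hypothetical criteria and candidates (illustrative values only).
CRITERIA = ["response_time_ms", "cost", "energy_j"]
candidates = [
    {"cores": 2, "response_time_ms": 40.0, "cost": 10.0, "energy_j": 5.0},
    {"cores": 4, "response_time_ms": 25.0, "cost": 18.0, "energy_j": 7.0},
    {"cores": 4, "response_time_ms": 30.0, "cost": 20.0, "energy_j": 8.0},  # dominated by the entry above
    {"cores": 8, "response_time_ms": 15.0, "cost": 35.0, "energy_j": 12.0},
]

front = pareto_front(candidates, CRITERIA)
for c in front:
    print(c)
```

The filter compares all pairs, which is fine for a handful of candidates; a real design-space exploration would more likely use an evolutionary or other search-based optimizer to generate the candidates in the first place.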

Implementation / Compilation Time
Input Parameters
– HW properties: #cores, cache size
Variation Points
– Tiling for loop iterations
– Reordering of instructions
Goal
– Minimize execution time or the amount of communication
Means:
– Smart compilation
– Manual
– Measurement
Result:
– A program with several parameters that can be changed at runtime (#threads, block size, …)
– Parameterized schedule
– Model that describes the effect of parameters on execution time
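As a small illustration of the "tiling for loop iterations" variation point, the sketch below tiles a matrix multiplication and leaves the block size as a parameter that can still be chosen at runtime. The function names are hypothetical, and pure Python is used only to show the loop structure; the cache benefit of tiling would only show up in a compiled language.

```python
from typing import List

Matrix = List[List[float]]

def matmul_naive(a: Matrix, b: Matrix) -> Matrix:
    """Reference triple loop without tiling."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            for j in range(p):
                c[i][j] += a[i][k] * b[k][j]
    return c

def matmul_tiled(a: Matrix, b: Matrix, block_size: int) -> Matrix:
    """Same computation, with each loop split into blocks of `block_size` iterations."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for ii in range(0, n, block_size):
        for kk in range(0, m, block_size):
            for jj in range(0, p, block_size):
                # Work on one tile; min() handles sizes not divisible by block_size.
                for i in range(ii, min(ii + block_size, n)):
                    for k in range(kk, min(kk + block_size, m)):
                        aik = a[i][k]
                        for j in range(jj, min(jj + block_size, p)):
                            c[i][j] += aik * b[k][j]
    return c

# Small integer-valued inputs so results compare exactly.
a = [[float(i + j) for j in range(5)] for i in range(4)]
b = [[float(i * j + 1) for j in range(3)] for i in range(5)]
reference = matmul_naive(a, b)
results = {bs: matmul_tiled(a, b, bs) for bs in (1, 2, 3, 8)}
```

Every block size yields the same result, so the parameter affects only the execution order, which is what makes it safe to leave open for the runtime phase.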

Runtime
Variation Points
– Number of used threads
– Memory footprint
– Application-specific parameters
Means
– Search-based optimization
– Combine with prediction to assess the effect of parameter changes
Result
– Parameter values
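One possible reading of "search-based optimization combined with prediction" is a local search that asks a performance model, rather than a real measurement, how a parameter change would affect execution time. The sketch below assumes a made-up cost model (`predict_time`) and a simple hill climber that starts from the sequential configuration; neither is taken from the slides.

```python
from typing import Callable, List, Tuple

Config = Tuple[int, int]  # (threads, block_size)

def predict_time(cfg: Config) -> float:
    """Hypothetical model: parallel work + per-thread overhead + block-size penalty."""
    threads, block = cfg
    work = 100.0 / threads
    overhead = 2.0 * threads
    cache_penalty = abs(block - 32) / 8.0  # assumes 32 is the best block size
    return work + overhead + cache_penalty

def neighbours(cfg: Config, max_threads: int, block_sizes: List[int]) -> List[Config]:
    """Configurations reachable by changing one parameter by one step."""
    threads, block = cfg
    idx = block_sizes.index(block)
    out: List[Config] = []
    if threads > 1:
        out.append((threads - 1, block))
    if threads < max_threads:
        out.append((threads + 1, block))
    if idx > 0:
        out.append((threads, block_sizes[idx - 1]))
    if idx < len(block_sizes) - 1:
        out.append((threads, block_sizes[idx + 1]))
    return out

def hill_climb(start: Config, cost: Callable[[Config], float],
               max_threads: int, block_sizes: List[int]) -> Config:
    """Move to the best predicted neighbour until no neighbour improves."""
    current = start
    while True:
        best = min(neighbours(current, max_threads, block_sizes), key=cost)
        if cost(best) >= cost(current):
            return current
        current = best

BLOCK_SIZES = [8, 16, 32, 64]
# Start from the sequential solution (one thread) and increase stepwise.
best_cfg = hill_climb((1, 8), predict_time, max_threads=8, block_sizes=BLOCK_SIZES)
print(best_cfg, predict_time(best_cfg))
```

Here `max_threads` stands in for the core count fixed at design time, which is one way the earlier phases bound the runtime search. With a real, noisy system the model would typically be refreshed from measurements, and hill climbing may stop in a local optimum.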

Interaction of Phases
– Feedback cycles
– #cores & #threads used
– How do these interactions affect the prediction & optimization process?

That’s it.

Expectation
– How to use model prediction for runtime optimization?
  – Block size → HW-dependent?
– How to do good task scheduling?
  – Adaptive translation & execution
– Good starting point for search?
  – Preliminary measurement
– Multicriteria optimization at runtime?
– Extending design-time optimization towards runtime?
– Parameter-space exploration
– How can design time help to reduce the complexity of runtime optimization?
  – Curse of dimensionality
  – During design time the possible space for optimization is restricted
  – Currently:
    – Start from a sequential solution
    – Stepwise increase
  – Narrowing down the search space in each phase