Gedae, Inc. www.gedae.com Gedae: Auto Coding to a Virtual Machine Authors: William I. Lundgren, Kerry B. Barnes, James W. Steed HPEC 2004.

Slides:



Advertisements
Similar presentations
Network II.5 simulator ..
Advertisements

A LEGO-like Lightweight Component Architecture for Organic Computing Thomas Schöbel-Theuer, Universität Stuttgart
Simulation executable (simv)
Implementation of 2-D FFT on the Cell Broadband Engine Architecture William Lundgren Gedae), Kerry Barnes (Gedae), James Steed (Gedae)
Overview Motivations Basic static and dynamic optimization methods ADAPT Dynamo.
Master/Slave Architecture Pattern Source: Pattern-Oriented Software Architecture, Vol. 1, Buschmann, et al.
Chorus and other Microkernels Presented by: Jonathan Tanner and Brian Doyle Articles By: Jon Udell Peter D. Varhol Dick Pountain.
1 Architectural Complexity: Opening the Black Box Methods for Exposing Internal Functionality of Complex Single and Multiple Processor Systems EECC-756.
Feature requests for Case Manager By Spar Nord Bank A/S IBM Insight 2014 Spar Nord Bank A/S1.
MotoHawk Training Model-Based Design of Embedded Systems.
Chapter 4 Threads, SMP, and Microkernels Patricia Roy Manatee Community College, Venice, FL ©2008, Prentice Hall Operating Systems: Internals and Design.
Trace-Based Automatic Parallelization in the Jikes RVM Borys Bradel University of Toronto.
Computer Systems/Operating Systems - Class 8
Lecturer: Sebastian Coope Ashton Building, Room G.18 COMP 201 web-page: Lecture.
Concurrency: Mutual Exclusion and Synchronization Why we need Mutual Exclusion? Classical examples: Bank Transactions:Read Account (A); Compute A = A +
Distributed Computations
SCORE - Stream Computations Organized for Reconfigurable Execution Eylon Caspi, Michael Chu, Randy Huang, Joseph Yeh, Yury Markovskiy Andre DeHon, John.
CS 584. A Parallel Programming Model We need abstractions to make it simple. The programming model needs to fit our parallel machine model. Abstractions.
Mahapatra-Texas A&M-Fall'001 cosynthesis Introduction to cosynthesis Rabi Mahapatra CPSC498.
©Ian Sommerville 2004Software Engineering, 7th edition. Chapter 11 Slide 1 Architectural Design.
I/O Hardware n Incredible variety of I/O devices n Common concepts: – Port – connection point to the computer – Bus (daisy chain or shared direct access)
Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Parallel Programming in C with MPI and OpenMP Michael J. Quinn.
WSN Simulation Template for OMNeT++
Object Based Operating Systems1 Learning Objectives Object Orientation and its benefits Controversy over object based operating systems Object based operating.
0 HPEC 2010 Automated Software Cache Management.
Course Instructor: Aisha Azeem
Programming the Cell Multiprocessor Işıl ÖZ. Outline Cell processor – Objectives – Design and architecture Programming the cell – Programming models CellSs.
SEC(R) 2008 Intel® Concurrent Collections for C++ - a model for parallel programming Nikolay Kurtov Software and Services.
Research on cloud computing application in the peer-to-peer based video-on-demand systems Speaker : 吳靖緯 MA0G rd International Workshop.
UNIX System Administration OS Kernal Copyright 2002, Dr. Ken Hoganson All rights reserved. OS Kernel Concept Kernel or MicroKernel Concept: An OS architecture-design.
© 2008, Renesas Technology America, Inc., All Rights Reserved 1 Purpose  This training course describes how to configure the the C/C++ compiler options.
Gedae, Inc. Implementing Modal Software in Data Flow for Heterogeneous Architectures James Steed, Kerry Barnes, William Lundgren Gedae, Inc.
ICOM 5995: Performance Instrumentation and Visualization for High Performance Computer Systems Lecture 7 October 16, 2002 Nayda G. Santiago.
Architectural Design portions ©Ian Sommerville 1995 Establishing the overall structure of a software system.
Eric Keller, Evan Green Princeton University PRESTO /22/08 Virtualizing the Data Plane Through Source Code Merging.
Gedae Portability: From Simulation to DSPs to the Cell Broadband Engine James Steed, William Lundgren, Kerry Barnes Gedae, Inc
Prepared by: Sanaz Helmi Hoda Akbari Zahra Ahmadi Sharif University of Tech. Summer 2006 An Introduction to.
Smith’s Aerospace © P. Bailey & K. Vander Linden, 2005 Architecture: Component and Deployment Diagrams Patrick Bailey Keith Vander Linden Calvin College.
Programming Examples that Expose Efficiency Issues for the Cell Broadband Engine Architecture William Lundgren Gedae), Rick Pancoast.
CHAPTER TEN AUTHORING.
©Ian Sommerville 2000 Software Engineering, 6th edition. Chapter 10Slide 1 Architectural Design l Establishing the overall structure of a software system.
MATRIX MULTIPLY WITH DRYAD B649 Course Project Introduction.
MapReduce Kristof Bamps Wouter Deroey. Outline Problem overview MapReduce o overview o implementation o refinements o conclusion.
Issues Autonomic operation (fault tolerance) Minimize interference to applications Hardware support for new operating systems Resource management (global.
May 16-18, Skeletons and Asynchronous RPC for Embedded Data- and Task Parallel Image Processing IAPR Conference on Machine Vision Applications Wouter.
The Mach System Abraham Silberschatz, Peter Baer Galvin, Greg Gagne Presentation By: Agnimitra Roy.
CS533 - Concepts of Operating Systems 1 The Mach System Presented by Catherine Vilhauer.
Lecture 4 TTH 03:30AM-04:45PM Dr. Jianjun Hu CSCE569 Parallel Computing University of South Carolina Department of.
Introduction to VHDL Simulation … Synthesis …. The digital design process… Initial specification Block diagram Final product Circuit equations Logic design.
Abstract A Structured Approach for Modular Design: A Plug and Play Middleware for Sensory Modules, Actuation Platforms, Task Descriptions and Implementations.
M. Accetta, R. Baron, W. Bolosky, D. Golub, R. Rashid, A. Tevanian, and M. Young MACH: A New Kernel Foundation for UNIX Development Presenter: Wei-Lwun.
Review of Parnas’ Criteria for Decomposing Systems into Modules Zheng Wang, Yuan Zhang Michigan State University 04/19/2002.
Lecture 4 Mechanisms & Kernel for NOSs. Mechanisms for Network Operating Systems  Network operating systems provide three basic mechanisms that support.
Flexible Filters for High Performance Embedded Computing Rebecca Collins and Luca Carloni Department of Computer Science Columbia University.
Teaching Digital Logic courses with Altera Technology
Software Systems Division (TEC-SW) ASSERT process & toolchain Maxime Perrotin, ESA.
KERRY BARNES WILLIAM LUNDGREN JAMES STEED
Slide 1 Chapter 8 Architectural Design. Slide 2 Topics covered l System structuring l Control models l Modular decomposition l Domain-specific architectures.
9/30/2016 Distributed System Exploration Mirabilis Design Inc.
Gedae, Inc. Implementing Modal Software in Data Flow for Heterogeneous Architectures James Steed, Kerry Barnes, William Lundgren Gedae, Inc.
Adapted from Krste Asanovic
Module 12: I/O Systems I/O hardware Application I/O Interface
Support for Program Analysis as a First-Class Design Constraint in Legion Michael Bauer 02/22/17.
The Mach System Sri Ramkrishna.
Parallel Programming By J. H. Wang May 2, 2017.
Software Engineering with Reusable Components
Chapter 13: I/O Systems I/O Hardware Application I/O Interface
Parallel Programming in C with MPI and OpenMP
A Virtual Machine Monitor for Utilizing Non-dedicated Clusters
Chapter 13: I/O Systems.
Presentation transcript:

Gedae, Inc. Gedae: Auto Coding to a Virtual Machine Authors: William I. Lundgren, Kerry B. Barnes, James W. Steed HPEC 2004

Gedae, Inc. Gedae is a block diagram language... Branch Mode1 Merge State Machine Software A Mode3F Mode2D E B C Express signal and data processing algorithms, parallelism, load balancing, fault tolerance and mode control What is Gedae?

Gedae, Inc. Gedae transforms under user control... Branch Mode1 Merge State Machine Transformed Graph A Mode3F Mode2D E B C q srs q sr q sr q sr r sr sr sr sr sr sr q q q q User can set optimization parameters that are independent of the graph to guide transformation

Gedae, Inc. operate efficiently on a virtual machine. Branch Merge State Machine F D E C q s q s q s q s r r r r r q q q q Mode1 rs s Mode3 Mode2 r r s s B rs PE1PE2PE3PE4PE5 XBUS BSP RTK Complete systems can be developed independent of the target system without losing runtime efficiency Mode2 rs A rs

Gedae, Inc. Gedae Language Gedae provides application information through –modules with well-defined behavior –ports with well-defined characteristics –and manifest connectivity with explicit sequential and parallel execution paths This information is implicit in most languages Gedae makes the information explicit –over 50 different information expression features x_fird in C out mult outa Information provided by language allows Gedae to analyze and efficiently implement algorithms b stream of tokens stream complex in[N](D); tokens contain complex data tokens are a vector with N elements box requires D input tokens. D

Gedae, Inc. Gedae Transformations The block diagram is transformed using over 100 algorithms. The transformations establish the: –Order of execution –Queue sizes –Granularities –Memory layout –Dynamic schedule parameters –Data transfer types and parameters –Mode control Functional Specification Heterogeneous HW Virtual Machine Detailed Model Transformations Implementation Specification Generation Deployable Application User Gedae Vendor Key The Gedae transformations build a detailed model of the deployed application. Gedae uses that information to provide visibility

Gedae, Inc. Gedae Virtual Machine (VM) Gedae provides the following components: –Command handler –Dynamic scheduler –Segmentation Support –Primitive Support –Visibility Support The vendor provides –Inter-processor communications –Optimized vector libraries –Other basic services The Gedae virtual machine makes applications processor independent Vendor Components Gedae Library Developer Library Multiprocessor Hardware Static Memory and Execution Schedules Managed by the Gedae Dynamic Scheduler Gedae Components Virtual Machine Generated Application

Gedae, Inc. Three Examples Real-Time Space-Time Adaptive Processing (RT-STAP) –Miter benchmark graph –Illustrates efficient parallel execution of large graph Multilevel Mode Graph –Illustrates nested mode control with distributed state –Dynamic data application Sonar Graph –Illustrates large data reduction during processing Each example illustrates features of the language, transformations, and virtual machine

Gedae, Inc. RT-STAP: Language Families permit replicating box and data elements

Gedae, Inc. Instantiation constants control the size of the graph Routing boxes allow equation based connectivity RT-STAP: Language

Gedae, Inc. User maps primitives to physical processors Gedae transforms graph by inserting send/receive primitives to communicate between partitions Gedae automatically creates executables to run on each processor Different mappings can be tried without modifying the graph – the needed transformation happens automatically RT-STAP: Transformations

Gedae, Inc. RT-STAP Transformations User can set transfer properties on send/recv pairs with Transfer Table Transformations automatically set parameters to send/recv pairs to communicate these properties to running application User can guide transformations to optimize implementation

Gedae, Inc. RT-STAP: Running on VM Send/Recv webs show interprocessor communication and uncover synchronization problems

Gedae, Inc. Memory Map Structure Display RT-STAP: Running on VM Preplanned use of memory allows distributed runtime debugging

Gedae, Inc. Mode Control: Language Branch boxes make mode changes and mark segment boundaries “Exclusive” branch outputs show where resources can be shared State shared between modes is explicitly declared in the graph The Gedae primitive language directly supports segmented data processing, sharing of resources, and distribution of state

Gedae, Inc. Mode Control: Language Branch box copies input data stream to one of a family of outputs based on a control stream. Output is: Segmented - the box will add segment boundaries to the output Dynamic - the box will state how much data is produced on the output at runtime. Exclusive - only one of the family of F outputs gets data on any firing. Allows sharing of resources and state. Name: cp_branchf_e Input: stream ControlParamRec in; Input: stream int c; Local: int last; Output: exclusive segmented dynamic stream ControlParamRec [F]out; Reset: { last = -1; } Apply: { int g,i; int prdc = 0; for (g=0; g<granularity; g++) { int j = c[g]; if (last != j) { if (0<=last && last<F) { produce(out[last],prdc); prdc = 0; segment(out[last],SEGMENT_END); } last = j; } if (0<=j && j<F) { *out++ = *in; prdc++; } in++; } produce(out[last],prdc); } The Gedae extensible language has no “built-in” primitives delivered primitives. Users can add custom primitives

Gedae, Inc. Mode Graph: Transformation User can set partitioning, mapping, data transfer methods, granularity, priority, queue sizes and schedule properties from the group control dialog Partition and Subschedule Set Data Transfer Methods Set Static Schedule Properties (Tasks) Set Granualrity and Priority Map Set Queue Size and Properties

Gedae, Inc. Mode Graph: Running on VM Each mode requires a different number of processors Branch boxes at one level are responsible for the dynamic distribution VM runtime kernel enforces dynamic data driven execution. Send and receive primitives and state transfer primitives use BSP of virtual machine to transfer data

Gedae, Inc. Primitives to send and receive state are automatically added by transformations Messages generated by Virtual Machine at mode change boundaries efficiently coordinate state transfers Result is efficient transparent use of shared state on distributed processing system Mode Graph: Running on VM

Gedae, Inc. Sonar: Language Sonar Graph creates low bandwidth output from high bandwidth input data

Gedae, Inc. Sonar: Language Connectivity + Port Descriptions gives information needed to schedule graph mx_vx produces R=120 tokens out for every 1 token in vx_multV box must fire 120 times for each firing of the mx_vx box. vx_fft box fires one time for each firing of vx_multV box Simple predetermined schedule generated from graph and info embedded in primitives inplace stream complex out[C](R) = in; mx_adjoin mx_vx vx_multV vx_fft Static Schedule Timeline Fire at granularity 120, 1 time Can create a multirate graph that has boxes firing at different granularities

Gedae, Inc. mx_adjoin mx_vx vx_multV vx_fft Static Schedule TimelineSubscheduled Timeline Fire at granularity 1, 120 times Fire at granularity 120, 1 time Multirate graphs can be implemented using subscheduling to improve speed and reduce memory usage User can place boxes in subschedules to strip-mine the vector processing Allows use of fast memory Can reduce memory usage Sonar: Transformation

Gedae, Inc. Auto-subscheduling has reduced memory needed by graph from 250 Mbytes to about 2.5 Mbytes - 100x improvement User can put boxes into named subschedules manually – but can be difficult Auto-Subscheduling Tool puts boxes in subschedules automatically Finds nested sets of connected boxes running at common granularities. Automatically sets subscheduling levels Auto-Subscheduling Tool Schedule Information Dialog Sonar: Transformation

Gedae, Inc. Multiple levels of subscheduling evident on Trace Table Sonar: Running on VM

Gedae, Inc. Conclusion Gedae Block Diagram Language allows simple expression of a wide range of algorithms User optimization information can be added without modifying block diagram 100+ transformations create efficient executable application from language and user information Application runs efficiently on Virtual Machine VM provides portability and visibility