Telescoping Languages: A Framework for Generating High-Performance Problem-Solving Systems. Ken Kennedy, Center for High Performance Software, Rice University.

Presentation transcript:

Telescoping Languages: A Framework for Generating High-Performance Problem-Solving Systems
Ken Kennedy, Center for High Performance Software, Rice University
Center for High Performance Software Research

Collaborators: Bradley Broom, Arun Chauhan, Keith Cooper, Jack Dongarra, Rob Fowler, Lennart Johnsson, Chuck Koelbel, Cheryl McCosh, John Mellor-Crummey, Linda Torczon

Philosophy
Compiler Technology = Off-Line Processing
- Goals: improved performance and language usability
  - Making it practical to use the full power of the language
- Trade-off: preprocessing time versus execution time
- Rule: performance of both compiler and application must be acceptable to the end user
Examples
- Macro expansion
  - PL/I interpretive macro facility
  - Fixed macros can be compiled
    - 10x improvement with compilation
- Transmeta "Code Morphing"
  - Dynamic compilation of machine code

Making Languages Usable
It was our belief that if FORTRAN, during its first months, were to translate any reasonable “scientific” source program into an object program only half as fast as its hand-coded counterpart, then acceptance of our system would be in serious danger... I believe that had we failed to produce efficient programs, the widespread use of languages like FORTRAN would have been seriously delayed.
- John Backus

A Java Experiment
Scientific Programming in Java
- Goal: make it possible to use the full object-oriented power for scientific applications
  - Many scientific implementations mimic Fortran style
OwlPack Benchmark Suite
- Three versions of LINPACK in Java
  - Fortran style
  - Lite object-oriented style
  - Full polymorphism
    - No differences for type
Experiment
- Compare running times for different styles on same Java VM
- Evaluate potential for compiler optimization

Performance Results
Results using the JDK 1.2 JIT on a Sun Ultra 5 [chart not preserved in the transcript]

Programming Productivity
Challenges
- Programming is hard
- Professional programmers are in short supply
- High performance will continue to be important
One Strategy: Make the End User a Programmer
- Professional programmers develop components
- Users integrate components using:
  - Problem-solving environments (PSEs) based on scripting languages (possibly graphical)
    - Examples: Visual Basic, Tcl/Tk, AVS, Khoros
Compilation for High Performance
- Translate scripts and components to common intermediate language
- Optimize the resulting program using interprocedural methods

Script-Based Programming
[Diagram, built up over six slides; boxes: Component Library, User Library, Script, Translator, Intermediate Code, Global Optimizer, Code Generator]
Problem: long compilation times, even for short scripts!
Problem: expert knowledge on specialization lost

Telescoping Languages
[Diagram, built up over three slides; boxes: L1 Class Library, Compiler Generator, L1 Compiler, Script, Script Translator, Vendor Compiler, Optimized Application; annotations: "Could run for hours", "understands library calls as primitives"]

Telescoping Languages: Advantages
Compile times can be reasonable
- More compilation time can be spent on libraries
- Script compilations can be fast
  - Components reused from scripts may be included in libraries
High-level optimizations can be included
- Based on specifications of the library designer
  - Properties often cannot be determined by compilers
  - Properties may be hidden after low-level code generation
User retains substantive control over language performance
- Mature code can be built into a library and incorporated into language
Reliability can be improved
- Specialization by compilation framework, not user

Applications
Matlab Compiler
- Automatically generated from LAPACK or ScaLAPACK
  - With help via annotations from the designer
Generator for ARPACK
- Library developer maintains code in Matlab
- Currently recodes in Fortran by hand; could be automated
Flexible Data Distributions
- Failing of HPF: inflexible distributions
- Data distribution == collection of interfaces that meet specs
- Compiler applies standard transformations
Generator for Grid Computations
- GrADS: automatic generation of NetSolve

Application: Matlab for Signal Processing
Automatically generated from LAPACK or ScaLAPACK
- With help via annotations from the designer
Special project: Signal Processing
Applications written in Matlab
- Users want simplicity and performance
- Matlab currently gives them the first but not the second
  - Codes rewritten in C for communications devices
- Run signal processing procedures through the generator
  - Many code modules reused

Application: POOMA
Procedure library for computational hydrodynamics
- Distributed data structures
  - Vectors, arrays, tensors
- Coded in C++
- Context optimizations coded into template expansion mechanism
  - 20-line program compiles for over an hour on 32 processors
- Enhanced reliability
Telescoping languages
- Generate POOMA from simpler libraries for Fortran and Java

Requirements of Script Compilation
Scripts must generate efficient programs
- Comparable to those generated from standard interprocedural methods
- Avoid need to recode in standard language
Script compile times should be proportional to length of script
- Not a function of the complexity of the library
- Principle of “least astonishment”

Telescoping Languages
[Diagram repeated; boxes: Script, L1 Compiler, Script Translator, Vendor Compiler, Optimized Application; annotation: "understands library calls as primitives"]

Script Compilation Algorithm
Propagate variable property information throughout the program
- Use jump functions to propagate through calls to library
Apply high-level transformations
- Driven by information about properties
- Ensure that process applies to expanded code
Select and substitute specialized variants for library calls
- At each call site, determine the best approximation to parameter properties that is reflected by a specialized fragment in the code database
  - Use a method similar to “unification”
- Substitute fragment from database for call
  - This could contain a call to a lower-level library routine
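The selection step can be sketched in Python as follows. This is a minimal illustration and not the implementation described in the talk: the `Variant` class, the property strings, the variant names, and the rule "the most specific applicable variant wins" are all assumptions standing in for the unification-like matching the slide mentions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """A specialized code fragment and the parameter properties it assumes."""
    name: str
    requires: frozenset


def select_variant(variants, known_properties):
    """Pick the most specific variant whose assumptions all hold at the call site."""
    known = frozenset(known_properties)
    applicable = [v for v in variants if v.requires <= known]
    if not applicable:
        raise LookupError("code database has no general fallback variant")
    # More required properties means a closer match to what is known.
    return max(applicable, key=lambda v: len(v.requires))


# Hypothetical code database for one library routine.
SOLVE_VARIANTS = [
    Variant("solve_general", frozenset()),
    Variant("solve_triangular", frozenset({"triangular"})),
    Variant("solve_small_triangular", frozenset({"triangular", "small"})),
]

choice = select_variant(SOLVE_VARIANTS, {"triangular", "real"})
print(choice.name)  # solve_triangular
```

Keeping a general variant with no requirements in the database means every call site has a fallback, which mirrors the idea that a substituted fragment may simply call a lower-level library routine.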

Telescoping Languages
[Diagram repeated; boxes: L1 Class Library, Compiler Generator, L1 Compiler; annotation: "Could run for hours"]

Library Analysis and Preparation
Discovery of Critical Properties and Propagator Construction
Analysis of Transformation Specifications
- Construction of a specification-driven translator for use in compiling scripts
Code Specialization for Different Sets of Parameter Properties

Library Analysis and Preparation
Discovery of Critical Properties and Propagator Construction
- Which properties of parameters affect optimization
  - Examples: value, type, rank and size of matrix

Discovery of Critical Properties
From specifications by the library designer
- If the matrix is triangular, then…
From examining the code itself
- Look at a promising optimization point
- Determine conditions under which we can make significant optimizations
- See if any of these conditions can be mapped back to parameter properties
From sample calling programs provided by the designer
    call average(shift(A,-1), shift(A,+1))
  - Can save on memory accesses

Examining the Code
Example from LAPACK

    subroutine VMP(C, A, B, m, n, s)
      integer m, n, s
      real A(n), B(n), C(m)
      i = 1
      do j = 1, n
        C(i) = C(i) + A(j)*B(j)
        i = i + s
      enddo
    end subroutine VMP

Vectorizable if s != 0

Library Analysis and Preparation
Discovery of Critical Properties and Propagator Construction
- Which properties of parameters affect optimization
  - Examples: value, type, rank and size of matrix
- Construction of jump functions for the library calls
  - With respect to critical properties
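As a rough illustration of a jump function with respect to critical properties, the Python sketch below maps the known properties of a call's arguments to the properties of its result without looking inside the routine. The routine names (`transpose`, `add`), the property strings, and the script representation are all hypothetical and chosen only for this example.

```python
def jump_transpose(arg_props):
    """Transposition swaps upper and lower triangularity and keeps the rest."""
    (a,) = arg_props
    out = set(a) - {"upper_triangular", "lower_triangular"}
    if "upper_triangular" in a:
        out.add("lower_triangular")
    if "lower_triangular" in a:
        out.add("upper_triangular")
    return frozenset(out)


def jump_add(arg_props):
    """For the structural properties used here (realness, triangularity),
    a property shared by both operands also holds for the sum."""
    a, b = arg_props
    return frozenset(a) & frozenset(b)


JUMP_FUNCTIONS = {"transpose": jump_transpose, "add": jump_add}


def propagate(script, initial):
    """Propagate property sets through a straight-line script of library calls.

    `script` is a list of (result, routine, argument_names) triples.
    """
    props = {name: frozenset(p) for name, p in initial.items()}
    for result, routine, args in script:
        props[result] = JUMP_FUNCTIONS[routine]([props[a] for a in args])
    return props


script = [("T", "transpose", ["L"]), ("S", "add", ["T", "U"])]
props = propagate(script, {
    "L": {"lower_triangular", "real"},
    "U": {"upper_triangular", "real"},
})
print(sorted(props["S"]))  # ['real', 'upper_triangular']
```

Because each jump function is a small table-like mapping prepared ahead of time, propagation through a script costs time proportional to the number of calls, not to the size of the library code behind them.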

Library Analysis and Preparation
Discovery of Critical Properties and Propagator Construction
- Which properties of parameters affect optimization
  - Examples: value, type, rank and size of matrix
- Construction of jump functions for the library calls
  - With respect to critical properties
Analysis of Transformation Specifications
- Construction of a specification-driven translator for use in compiling scripts

High-level Identities
Often library developer knows high-level identities
- Difficult for the compiler to discern
- Optimization should be performed on sequences of calls rather than code remaining after expansion
Example: Push and Pop
- Designer: Push(x) followed by y = Pop() becomes y = x
  - Ignore possibility of overflow in Push
Example: Trigonometric Functions
- Sin and Cos used in same loop, both computed using expensive calls to the trig library
- Recognize that cos(x) and sin(x) can be computed by a single call to sincos(x,s,c) in a little more than the time required for sin(x)
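The Push/Pop identity can be applied as a rewrite over a sequence of library calls before any expansion takes place. The Python sketch below is only an illustration under assumed conventions: the tuple encoding of calls and the function name `apply_push_pop_identity` are hypothetical.

```python
def apply_push_pop_identity(calls):
    """Rewrite Push(x) immediately followed by y = Pop() into y = x.

    `calls` is a list of tuples: ("push", src), ("pop", dst), or
    ("assign", dst, src). As on the slide, stack overflow is ignored.
    """
    out = []
    for call in calls:
        if call[0] == "pop" and out and out[-1][0] == "push":
            _, src = out.pop()  # drop the matching push
            out.append(("assign", call[1], src))
        else:
            out.append(call)
    return out


calls = [("push", "x"), ("pop", "y"), ("push", "a"), ("push", "b"), ("pop", "c")]
print(apply_push_pop_identity(calls))
# [('assign', 'y', 'x'), ('push', 'a'), ('assign', 'c', 'b')]
```

The rewrite works on call sequences, so it needs no knowledge of how the stack is implemented; after inlining, the same pattern would be much harder to recognize.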

Contextual Expansions
Out of Core Arrays
- Operations Get(I,J) and GetRow(I,Lo,N)
Get in a loop

    Do I
      Do J
        … Get(I,J)
      Enddo
    Enddo

When can we vectorize?
- Turn into GetRow
- Answer: if Get is not involved in a recurrence
  - How can we know?
Vector versions of library routines can often be constructed
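The effect of turning a loop of Get calls into one GetRow call can be seen in a toy Python model. Everything here is an assumption made for illustration: the `OutOfCoreArray` class, the access counter, zero-based indexing, and the reading of GetRow(I,Lo,N) as "N elements of row I starting at column Lo".

```python
class OutOfCoreArray:
    """Toy stand-in for an out-of-core array that counts simulated disk accesses."""

    def __init__(self, rows):
        self._rows = [list(r) for r in rows]
        self.accesses = 0

    def get(self, i, j):
        self.accesses += 1
        return self._rows[i][j]

    def get_row(self, i, lo, n):
        self.accesses += 1
        return self._rows[i][lo:lo + n]


def row_sums_scalar(arr, n_rows, n_cols):
    """Original form: one Get call per element."""
    return [sum(arr.get(i, j) for j in range(n_cols)) for i in range(n_rows)]


def row_sums_vectorized(arr, n_rows, n_cols):
    """Vectorized form: the inner loop over J becomes a single GetRow call."""
    return [sum(arr.get_row(i, 0, n_cols)) for i in range(n_rows)]


data = [[1, 2, 3], [4, 5, 6]]
a, b = OutOfCoreArray(data), OutOfCoreArray(data)
print(row_sums_scalar(a, 2, 3), a.accesses)      # [6, 15] 6
print(row_sums_vectorized(b, 2, 3), b.accesses)  # [6, 15] 2
```

The substitution is safe in this example because no Get depends on a value produced by an earlier iteration, which is the "not involved in a recurrence" condition from the slide.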

Library Analysis and Preparation
Discovery of Critical Properties and Propagator Construction
- Which properties of parameters affect optimization
  - Examples: value, type, rank and size of matrix
- Construction of jump functions for the library calls
  - With respect to critical properties
Analysis of Transformation Specifications
- Construction of a specification-driven translator for use in compiling scripts
Code Specialization for Different Sets of Parameter Properties
- For each set, assume and optimize to produce specialized code

Code Selection Example
Library compiler develops inlining tables

    subroutine VMP(C, A, B, m, n, s)
      integer m, n, s
      real A(n), B(n), C(m)
      i = 1
      do j = 1, n
        C(i) = C(i) + A(j)*B(j)
        i = i + s
      enddo
    end subroutine VMP

    case on s:
      ==0:     C(1) = C(1) + sum(A(1:n)*B(1:n))
      !=0:     C(1:1+(n-1)*s:s) = C(1:1+(n-1)*s:s) + A(1:n)*B(1:n)
      default: call VMP(C,A,B,m,n,s)

Inlining Table: vector
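The case analysis on s can be checked against the original loop with a short Python sketch. It uses zero-based lists instead of Fortran arrays, and the function names `vmp_reference` and `vmp_specialized` are hypothetical. The strided case is written for positive s only, which is an assumption of this sketch; other strides fall back to the general routine.

```python
def vmp_reference(C, A, B, n, s):
    """Direct transcription of the VMP loop, using zero-based indexing."""
    i = 0
    for j in range(n):
        C[i] += A[j] * B[j]
        i += s
    return C


def vmp_specialized(C, A, B, n, s):
    """Dispatch on the stride s, mirroring the inlining table."""
    if n == 0:
        return C
    if s == 0:
        # Every iteration updates C[0]: the loop is a dot-product reduction.
        C[0] += sum(a * b for a, b in zip(A[:n], B[:n]))
    elif s > 0:
        # Each iteration touches a distinct element: a strided vector update.
        stop = (n - 1) * s + 1
        C[0:stop:s] = [c + a * b for c, a, b in zip(C[0:stop:s], A[:n], B[:n])]
    else:
        # Default case: call the general routine.
        return vmp_reference(C, A, B, n, s)
    return C


A, B = [1, 2, 3], [4, 5, 6]
print(vmp_specialized([1, 2, 3, 4, 5, 6, 7], A, B, 3, 2))  # [5, 2, 13, 4, 23, 6, 7]
print(vmp_specialized([1, 2, 3], A, B, 3, 0))              # [33, 2, 3]
```

With floating-point data the s == 0 variant may round differently from the loop, because the products are summed before being added to C[0].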

Application: Matlab for Signal Processing
Signal processing users want simplicity, programming power, and performance
- Currently over 500,000 Matlab licenses
Matlab gives them simplicity and power but not performance
- Codes prototyped in Matlab
- Codes rewritten in C for communications devices
  - Users would rather not do this
Telescoping Languages:
- Many signal processing code modules reused over and over
- Run these procedures through the language generator
  - Produce Matlab SP, a high-level domain-specific environment

Matlab SP: Preliminary Findings
Optimizations That Pay Off
- Vectorization
  - Wins because of hand-coded vector/matrix primitives
- Elimination of common array subexpressions
- Optimization of array allocation and reshape operations
New Optimizations
- Procedure vectorization
  - Interchange call and loop after distribution
- Procedure strength reduction
  - Subdivide procedure into variant and invariant components
  - Use invariant component only once

Procedure Strength Reduction
Procedure called in loop

    for i = 1:N
      x = f(c1, c2, i, c3)
    end

Becomes (the subscripts on the two derived procedures were lost in extraction; f_inv and f_var are stand-ins for the invariant and variant components)

    f_inv(c1, c2, c3)
    for i = 1:N
      x = f_var(i)
    end

Further improvements possible
- Use code differentiation to compute differences
  - ADIFOR
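A small Python sketch of the same split follows. The body of `f` is made up for illustration, and the names `f_inv` and `make_f_var` are hypothetical; the point is only that work depending on c1, c2, and c3 is done once, outside the loop.

```python
import math


def f(c1, c2, i, c3):
    """Original procedure: recomputes the loop-invariant part on every call."""
    scale = math.sqrt(c1 * c1 + c2 * c2) + c3  # depends only on c1, c2, c3
    return scale * i                           # depends on i


def f_inv(c1, c2, c3):
    """Invariant component: computed once, before the loop."""
    return math.sqrt(c1 * c1 + c2 * c2) + c3


def make_f_var(scale):
    """Variant component: a cheap per-iteration procedure using the saved state."""
    def f_var(i):
        return scale * i
    return f_var


def loop_original(n, c1, c2, c3):
    return [f(c1, c2, i, c3) for i in range(1, n + 1)]


def loop_reduced(n, c1, c2, c3):
    f_var = make_f_var(f_inv(c1, c2, c3))
    return [f_var(i) for i in range(1, n + 1)]


print(loop_reduced(5, 3.0, 4.0, 1.0))  # [6.0, 12.0, 18.0, 24.0, 30.0]
```

In this sketch the invariant state is passed through a closure; a compiler could equally keep it in saved variables shared by the two generated procedures.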

Procedure Strength Reduction Performance
[Chart not preserved in the transcript]

Summary
Optimization enables language power
- Principle: encourage rather than discourage use of powerful features
  - Good programming practice should be rewarded
Programming support is challenging
- Particularly with application and platform complexity on the rise
  - Compounded by the shortage of IT professionals
Strategy: make end users into application developers
- Telescoping languages: framework for generating high-level problem-solving systems
- Must produce high-quality code
  - Avoid the need to recode by hand

Summary
PITAC: Focus on long-term, high-risk research
The scalable infrastructure should be a scalable problem-solver
- Access to information is not enough
- Linked computation is not enough
Programming support is still relatively primitive
- Application and platform complexity increasing
- Compounded by the shortage of IT professionals
Strategy: make end users into application developers
- Professional programmers focus on components
- End users build applications in scripting systems
Telescoping languages:
- Framework for generation of high-level problem-solving systems

Software Support for High-Performance Problem Solving (With Application to Grid Programming)
Ken Kennedy, Center for High Performance Software, Rice University

Collaborators: Bradley Broom, Arun Chauhan, Keith Cooper, Jack Dongarra, Rob Fowler, Dennis Gannon, Lennart Johnsson, John Mellor-Crummey, John Reynders, Linda Torczon

Lessons from PITAC
Findings
- Research funding increasingly focused on short term
- Universities weakened
  - Impact on workforce
- Industry cannot fill the gap
  - Return on investment: 24 percent versus 66 percent
Refocus Research on Long-Term, High-Risk Problems
- Requires an expansion of the base
Invest in Key Areas
- Software
- Scalable Information Infrastructure
- High Performance Computing
- Social, Economic, and Workforce Issues (Education)

Two IT Grand Challenges
The Internet as Problem-Solving Engine
- Challenge: How do we develop applications and manage their execution?
  - Reliable performance under varying load
  - Accessibility to ordinary scientists and engineers
- GrADS Project
Software Productivity
- Challenge: How do we increase the nation’s productivity in software development?
  - Too much software to be written, too few developers
  - Application and platform complexity increasing
- Idea: make it possible for end users to be application developers

Grids are “Hot”
[Figure; labels: Computational, Data, Information, Access, Knowledge, DISCOM, SinRG, APGrid, TeraGrid]

National Distributed Problem Solving
[Figure, built up over six slides; labels: Supercomputer, Database]

Today: Globus
Developed by Ian Foster and Carl Kesselman
- Grew from the I-Way (SC-95)
Basic Services for distributed computing
- Resource discovery and information services
- User authentication and access control
- Job initiation
- Communication services (Nexus and MPI)
Applications are programmed by hand
- Many applications
- User responsible for resource mapping and all communication
  - Existing users acknowledge how hard this is

GrADSoft Architecture
Goal: reliable performance on dynamically changing resources
[Diagram, shown five times; the later slides are labeled Execution Environment (twice), Program Preparation System, and Problem-Solving Environments. Labels: Source Application, Whole-Program Compiler, Libraries, Software Components, Configurable Object Program, Binder, Resource Negotiator, Scheduler, Grid Runtime System, Real-time Performance Monitor, Performance Problem, Performance Feedback, Negotiation]
