Vineet Kumar and Laurie Hendren, McGill University. MiX10: Compiling MATLAB for High Performance Computing.

Presentation transcript:

Vineet Kumar and Laurie Hendren, McGill University. MiX10: Compiling MATLAB for High Performance Computing

1. Why? Why not! Motivation and challenges

Why MATLAB to X10? I wish my program could run faster! I wish I could make better use of that supercomputer! I wish I had time to learn that cool new language I read about! What do I do about all the programs that are already written?

Captain MiX10 comes to the rescue: Keep programming in MATLAB and translate your MATLAB programs to X10. Run your programs faster. No need to learn X10. Make good use of your supercomputing resources.

Why do we care about MATLAB? Over 1 million MATLAB users in 2004, with numbers doubling every 1.5 to 2 years. Even more MATLAB users who use the free systems Octave or Scilab. […] million monthly Google searches for “MATLAB”. Users from disciplines in science, engineering and economics, in academia as well as industry.

Why X10 as target? Award-winning next-generation parallel programming language by IBM. Designed for “Performance and Productivity at scale”. Provides C++ and Java backends.

The job’s not easy. MATLAB: no formal language specification; dynamically typed; flexible syntax; unconventional semantics; everything is a matrix; huge builtin library. X10: object-oriented; statically typed; parallel programming.

2. What? Under the hood: technical challenges and solutions

The McLab project overview (recap)

[Diagram: structure of MiX10. Labels: Tamer IR, Callgraph, Analyses; MiX10 IR generator; MiX10 IR; Code transformations; Transformed IR; Code printer; Builtin handler; builtins.xml; Mix10.x10; .x10]

Compiling for High Performance: Major Challenges

Key technical challenges: Efficient compilation of arrays. Handling existing MATLAB concurrency controls and introducing new concurrency controls. An efficient framework for handling builtin methods.

X10 Arrays. Rail: intrinsic fixed-size one-dimensional array, indexed by a Long value starting at 0; similar to a C/C++ array. Abstractions for multi-dimensional arrays: region arrays (dynamic and flexible, but slow) and simple arrays (static and restricted, but fast).

X10 Arrays – Simple arrays: Dense rectangular arrays with zero-based indexing. Support for up to three dimensions only. Require the shape to be defined at compile time. Internally, elements are stored on a rail in row-major order. These restrictions allow efficient optimization of array indexing.

X10 Arrays – Simple arrays. Default: row-major order. Our enhancement: column-major order.
[Figure: the 2D array below, shown laid out on a rail in each of the two orders.]
(0,0):1   (0,1):7   (0,2):8   (0,3):4
(1,0):-3  (1,1):6   (1,2):9   (1,3):5
(2,0):0   (2,1):-4  (2,2):-2  (2,3):-3
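To make the two layouts concrete, here is a small Python sketch (not MiX10 or X10 code) that flattens the 3x4 array from this slide onto a one-dimensional rail in each order. The function names are illustrative assumptions. MATLAB itself stores arrays in column-major order, which is presumably why a column-major simple array is a useful enhancement.

```python
# Sketch: flatten a 2D array onto a 1D "rail" in row-major vs column-major order.
# The 3x4 values come from the slide; all function names are illustrative only.

def row_major_index(i, j, nrows, ncols):
    """Offset of element (i, j) when each row is stored contiguously."""
    return i * ncols + j

def col_major_index(i, j, nrows, ncols):
    """Offset of element (i, j) when each column is stored contiguously."""
    return j * nrows + i

def to_rail(matrix, index_fn):
    """Copy a list-of-rows matrix onto a flat list using the given index function."""
    nrows, ncols = len(matrix), len(matrix[0])
    rail = [None] * (nrows * ncols)
    for i in range(nrows):
        for j in range(ncols):
            rail[index_fn(i, j, nrows, ncols)] = matrix[i][j]
    return rail

A = [[1, 7, 8, 4],
     [-3, 6, 9, 5],
     [0, -4, -2, -3]]

row_rail = to_rail(A, row_major_index)
col_rail = to_rail(A, col_major_index)
print("row-major rail:   ", row_rail)
print("column-major rail:", col_rail)
```

With a zero-based, dense, fixed-rank array, either index function is a single multiply-add, which is the kind of indexing the restrictions on simple arrays make cheap.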

X10 Arrays – Region arrays: A collection of data elements mapped to a collection (region) of indices (points). A point is an indexing unit of the array, represented by a tuple of integers. A region is a collection of points of the same rank.
[Figure: a region (a collection of points) and an array (values mapped to the region).]
Region:
(3,1)  (2,2)  (-3,1)  (4,0)
(2,5)  (3,2)  (4,7)   (-5,2)
(3,0)  (4,3)  (5,1)   (6,3)
Array:
(3,1):1   (2,2):7   (-3,1):8  (4,0):4
(2,5):-3  (3,2):6   (4,7):9   (-5,2):5
(3,0):0   (4,3):-4  (5,1):-2  (6,3):-3
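As a rough mental model only, a region array can be pictured as a mapping from points (integer tuples of one fixed rank) to values. The Python class below is a hypothetical illustration using the points and values from this slide; it is not the X10 API.

```python
# Sketch: a region array modelled as a mapping from points to values.
# `RegionArray` is a hypothetical illustration, not an X10 class.

class RegionArray:
    def __init__(self, rank):
        self.rank = rank      # every point in the region must have this rank
        self.data = {}        # point (tuple of ints) -> value

    def __setitem__(self, point, value):
        if len(point) != self.rank:
            raise ValueError(f"expected a point of rank {self.rank}, got {point}")
        self.data[point] = value

    def __getitem__(self, point):
        return self.data[point]

    def region(self):
        """The region is simply the set of points that have a value."""
        return set(self.data)

# Points and values as listed on the slide (rank 2, arbitrary integer indices).
slide_values = {
    (3, 1): 1,  (2, 2): 7,  (-3, 1): 8, (4, 0): 4,
    (2, 5): -3, (3, 2): 6,  (4, 7): 9,  (-5, 2): 5,
    (3, 0): 0,  (4, 3): -4, (5, 1): -2, (6, 3): -3,
}

a = RegionArray(rank=2)
for point, value in slide_values.items():
    a[point] = value

print(a[-3, 1], len(a.region()))
```

The lookup goes through a general mapping rather than a multiply-add offset, which hints at why this flexibility costs performance.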

X10 Arrays – Region arrays: Flexibility of shape and indexing. A rich set of API methods. No need to declare the shape statically. Flexibility comes at a cost in performance.

An example. [Figure: example with arrays labelled 4-D and 3-D.]

Our Strategy: Gather as much static information as possible. Use simple arrays if the shapes of all arrays (a) are known statically, (b) remain the same at all points in the program, and (c) are supported by X10 simple arrays; otherwise, use region arrays. If using region arrays, statically specify the arrays’ ranks, if known. Extend the X10 compiler to provide helper methods for easy sub-array access (use of the ‘:’ operator).
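The selection rule above can be sketched as a small decision function. This is a hedged Python illustration, not the MiX10 implementation: `ArrayInfo`, its fields and the returned labels are assumptions made for the example, and the three-dimension limit is the one stated for simple arrays earlier in the talk.

```python
# Sketch of the array-representation strategy. All names here are hypothetical.
from dataclasses import dataclass
from typing import Optional, Tuple

MAX_SIMPLE_RANK = 3  # simple arrays support only up to three dimensions

@dataclass
class ArrayInfo:
    name: str
    shape: Optional[Tuple[int, ...]]   # None when the shape is not known statically
    shape_is_stable: bool              # True if the shape is the same at all program points
    rank: Optional[int] = None         # rank may be known even when the shape is not

def simple_array_ok(info):
    """An array qualifies only if all three conditions from the slide hold."""
    return (info.shape is not None
            and info.shape_is_stable
            and len(info.shape) <= MAX_SIMPLE_RANK)

def choose_representation(arrays):
    """Use simple arrays only if every array qualifies; otherwise region arrays."""
    if all(simple_array_ok(a) for a in arrays):
        return {a.name: "simple" for a in arrays}
    plan = {}
    for a in arrays:
        rank = a.rank
        if rank is None and a.shape is not None:
            rank = len(a.shape)
        # Still record the rank statically when it is known.
        plan[a.name] = f"region(rank={rank})" if rank is not None else "region"
    return plan

all_static = [ArrayInfo("p", (480, 640), True), ArrayInfo("filter", (480, 640), True)]
mixed = [ArrayInfo("p", None, False, rank=2), ArrayInfo("x", (480, 1), True)]
print(choose_representation(all_static))
print(choose_representation(mixed))
```

Note that the choice is program-wide in this sketch: a single array with an unknown or changing shape pushes every array to the region representation, as the slide's wording suggests.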

X10 Concurrency constructs. Fine-grained concurrency: async S creates asynchronous concurrent activities. Sequencing: finish S waits for all concurrent activities transitively created by S. Place-shifting operations: at (P) S executes S at processing unit P. Atomicity: atomic S and when (cond) S execute S atomically with respect to other atomic statements.

Introducing Concurrency constructs in MATLAB

MATLAB parfor: Introduce finish and async constructs to control the concurrent flow of execution. Any variable that is not defined outside the for loop is declared local to the async block. For any variable that is defined outside the loop and is not a reduction variable, a local copy is made for each concurrent iteration. All reduction statements are made atomic.
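The shape of this translation can be imitated in Python as an analogy. This is not generated MiX10 output: the function, the loop body and the use of a thread pool are assumptions chosen to mirror the four rules above, with the `with` block standing in for finish, `submit` for async, and a lock for atomic.

```python
# Sketch: a parfor-style loop with a sum reduction, imitating finish/async/atomic.
import threading
from concurrent.futures import ThreadPoolExecutor

def parfor_scaled_sum(values, scale):
    total = 0.0                      # reduction variable, defined outside the loop
    total_lock = threading.Lock()    # stands in for X10's `atomic`

    def body(i, scale_copy):
        nonlocal total
        # `term` is first defined inside the loop body, so it is local to this activity.
        term = values[i] * scale_copy
        with total_lock:             # the reduction statement is made atomic
            total += term

    # Leaving the `with` block waits for every submitted task, like `finish`.
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Each submit starts one concurrent iteration, like `async`. `scale` is
        # passed by value, so each iteration works on its own copy.
        futures = [pool.submit(body, i, scale) for i in range(len(values))]
    for f in futures:
        f.result()                   # re-raise any exception from an iteration
    return total

print(parfor_scaled_sum([1.0, 2.0, 3.0, 4.0], 2.0))
```

One activity per iteration keeps the sketch close to the rules, but it also shows why many tiny activities can cost more than they save.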

Parallelizing vector instructions: Loop vectorization is a suggested optimization technique in MATLAB that replaces loop-based, scalar-oriented code with a vector operation. Depending on the type of operation and the size of the vector, we can benefit from breaking the operation down into a set of concurrent operations on parts of the vector.

Parallelizing vector instructions: We implement concurrent versions of relevant operations by extending our builtin framework. Users can use the -vec_par_length switch to specify, for all or for specific builtins, a threshold size of the input vector beyond which the concurrent version of the builtin is invoked. Example: -vec_par_length all=500 sin=1000 cos=1000 invokes the concurrent versions of sin and cos only if the size of the input vector is >1000. For all other builtins, the concurrent version is invoked for input vectors of size >500.
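The threshold rule in this example can be written out as a short Python sketch. Only the switch syntax and the numbers come from the slide; the function names and the parsing are assumptions and do not describe how MiX10 itself reads the option.

```python
# Sketch: interpreting a -vec_par_length specification such as
# "all=500 sin=1000 cos=1000". Function names are hypothetical.

def parse_vec_par_length(spec):
    """Turn 'all=500 sin=1000' into {'all': 500, 'sin': 1000}."""
    thresholds = {}
    for item in spec.split():
        name, _, value = item.partition("=")
        thresholds[name] = int(value)
    return thresholds

def use_concurrent_version(builtin, vector_length, thresholds):
    """A builtin-specific threshold wins; otherwise fall back to 'all'."""
    limit = thresholds.get(builtin, thresholds.get("all"))
    # Without any applicable threshold, stay with the sequential version.
    return limit is not None and vector_length > limit

thresholds = parse_vec_par_length("all=500 sin=1000 cos=1000")
for builtin, n in [("sin", 800), ("sin", 1001), ("tan", 500), ("tan", 501)]:
    print(builtin, n, use_concurrent_version(builtin, n, thresholds))
```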

Builtin framework – design goals: Provide an easy way to extend the framework with specialized implementations of builtin methods (specialization for concurrency, specialization for column vector operations). Provide an easy way for programmers to make custom modifications to builtin implementations. Keep readability in mind.

Builtin methods overloading: Four overloaded methods for real values on region arrays alone. Another four for complex numbers.

Builtin methods overloading: Different implementations for simple arrays. Even more implementations for specialized versions.

Should we have all the possible overloaded methods for every builtin used in the generated code?

That’s what the Builtin handler solves! Template-based specialization framework. Easily extensible for specializations. Generates only the required overloaded versions, based on the kind of arrays, the specialization used and the types of the input arguments. Creates a separate class (Mix10.x10). Improves readability. Allows custom modifications.
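The idea of generating only the overloads that are actually needed can be illustrated with a toy template expander in Python. Everything here is hypothetical: the stub template, the tuple layout and the generated text are placeholders and do not reflect the real builtins.xml format or the contents of Mix10.x10.

```python
# Sketch: emit one stub per distinct (builtin, array kind, element type,
# specialization) combination that appears in the program. All names and the
# generated text are placeholders for illustration.
from string import Template

STUB = Template("def ${name}_${kind}_${elem}_${spec}(a): ...  # placeholder body")

def required_overloads(call_sites):
    """Deduplicate the combinations found at call sites."""
    return sorted(set(call_sites))

def generate_builtin_class(call_sites):
    """Expand the template once per required combination, into one separate unit."""
    lines = ["# generated builtins (placeholder output)"]
    for name, kind, elem, spec in required_overloads(call_sites):
        lines.append(STUB.substitute(name=name, kind=kind, elem=elem, spec=spec))
    return "\n".join(lines)

call_sites = [
    ("sin", "simple", "real", "concurrent"),
    ("sin", "simple", "real", "concurrent"),   # repeated use, emitted only once
    ("plus", "region", "complex", "default"),
]
generated = generate_builtin_class(call_sites)
print(generated)
```

Keeping the expanded stubs in their own unit mirrors the slide's point about a separate class: the main generated code stays readable, and a programmer can edit one specialization without touching the rest.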

3. Wow! Some preliminary results

Compilation flow. [Diagram: MATLAB is translated by MiX10 to X10. Managed backend: x10c, then javac (Java). Native backend: x10c++, then g++ (C++). Flags shown: -O, -NO_CHECKS.]

Native backend (C++): We achieved over 20% performance improvement with sequential X10 code compared to MATLAB. Builtin specialization is important (nb1d_a). Complex number operations are really efficient with the C++ backend (mbrt).

Managed backend (Java): Simple arrays perform much better than region arrays for 2-D arrays not involving stencil operations. Providing static information with region arrays improves performance by around 30%. (Again) builtin specialization is important to improve performance. Note capr and dich with the -O flag turned on …

“The JIT is very unhappy”: The optimizer triggered code inlining. The resulting code was too big for the JIT compiler to compile, so it switched to the interpreter. Static rank declaration eliminated runtime rank checks. This reduced the code size for capr enough for the JIT compiler to compile it, but dich was still too large. (Again) static rank declaration gave significant performance improvements for other benchmarks (up to 30%, depending on the number of array accesses).

Concurrency performance results: Speedups from over 2 times to 20 times for C++ and from 5 to 16 times for Java compared to sequential MATLAB, and around 2 times compared to parallel MATLAB.

Concurrency performance results: Speedups from over 30 times to 90 times for C++ and from 10 to 70 times for Java compared to sequential MATLAB; compared to parallel MATLAB, around 10 times for C++ and 6 times for Java. Slowdowns for a large number of parallel activities, each with a small computation.

Thank You

Acknowledgements: NSERC, for supporting this research in part; David Grove, for helping us validate and understand some results; Anton Dubrau, for his help in using Tamer; Xu Li, for providing valuable suggestions.

McSAF: Static Analysis Framework. A low-level IR; kind analysis.

First version (x = i; active, loop commented out):

    function[x,y] = rgbFilter(h,w,c,p,g)
    % h : height of the image
    % w : width of the image
    % c : colour of the filter
    % p : pixel matrix of the image
    % g : gradient factor
    filter=ones(h,w);
    filter=filter*c;
    %applyFilter=filter;
    x = i;
    % for i=1:w
    %x = p(:,i);
    %x = x+gradient(w,g);
    %end
    x = p;
    y = applyFilter(p,filter);
    end

Second version (loop active):

    function[x,y] = rgbFilter(h,w,c,p,g)
    % h : height of the image
    % w : width of the image
    % c : colour of the filter
    % p : pixel matrix of the image
    % g : gradient factor
    filter=ones(h,w);
    filter=filter*c;
    %applyFilter=filter;
    %x = i;
    for i=1:w
        x = p(:,i);
        x = x+gradient(w,g);
    end
    x = p;
    y = applyFilter(p,filter);
    end

What happens if we uncomment %applyFilter=filter; ? Then y = applyFilter(p,filter); becomes an array access.
Is this (x = i; with the loop commented out) an error? No, it’s a call to the builtin i().

Tamer. Very low-level IR; callgraph; type analysis; shape analysis; IsComplex analysis.

    function[x,y] = rgbFilter(h,w,c,p,g)
    % h : height of the image
    % w : width of the image
    % c : colour of the filter
    % p : pixel matrix of the image
    % g : gradient factor
    filter=ones(h,w);
    filter=filter*c;
    %applyFilter=filter;
    %x = i;
    for i=1:w
        x = p(:,i);
        x = x+gradient(w,g);
    end
    x = p;
    y = applyFilter(p,filter);
    end

X10 as a target language Nice X10 features

The type ‘Any’

    function[x,y] = rgbFilter(h,w,c,p,g)
    % h : height of the image
    % w : width of the image
    % c : colour of the filter
    % p : pixel matrix of the image
    % g : gradient factor
    filter=ones(h,w);
    filter=filter.* c;
    %applyFilter=filter;
    %x = i;
    for i=1:w
        x = p(:,i);
        x = x+gradient(w,g);
    end
    x = p;
    y = applyFilter(p,filter);
    end

Generated X10 for the return values:

    return [x as Any, y as Any];

The same idea is also used for cell arrays.

Point and Region API

    function[x,y] = rgbFilter(h,w,c,p,g)
    % h : height of the image
    % w : width of the image
    % c : colour of the filter
    % p : pixel matrix of the image
    % g : gradient factor
    filter=ones(h,w);
    filter=filter.* c;
    %applyFilter=filter;
    %x = i;
    for i=1:w
        x = p(:,i);
        x = x+gradient(w,g);
    end
    x = p;
    y = applyFilter(p,filter);
    end

Generated X10 for x = p(:,i);

    mix10_pt_p = Point.make(0, 1-(i as Int));
    mix10_ptOff_p = p;
    x = new Array[Double](
        ((p.region.min(0))..(p.region.max(0)))*(1..1),
        (pt:Point(2)) => mix10_ptOff_p(pt.operator-(mix10_pt_p)));

Works even when the shape is unknown at compile time.

Native backend

Managed backend

The McLab project overview. [Diagram labels: MATLAB, Aspect MATLAB; Frontend; McIR; McSAF; McSAF IR, Kind analysis; Tamer; Tame IR, Callgraph, Analyses; X10 backend; Fortran backend; Dynamic analysis; McVM.]

Source code: apply a gradient and a filter to an image. Annotated steps: create a filter, apply gradient, apply filter.

    function[x,y] = rgbFilter(h,w,c,p,g)
    % h : height of the image
    % w : width of the image
    % c : colour of the filter
    % p : pixel matrix of the image
    % g : gradient factor
    filter=ones(h,w);
    filter=filter*c;
    %applyFilter=filter;
    %x = i;
    for i=1:w
        x = p(:,i);
        x = x+gradient(w,g);
    end
    x = p;
    y = applyFilter(p,filter);
    end

McSAF: Static Analysis Framework. Low-level IR; kind analysis: is a parameterized expression an array access or a function call?

First version (x = i; active, loop commented out):

    function[x,y] = rgbFilter(h,w,c,p,g)
    % h : height of the image
    % w : width of the image
    % c : colour of the filter
    % p : pixel matrix of the image
    % g : gradient factor
    filter=ones(h,w);
    filter=filter*c;
    %applyFilter=filter;
    x = i;
    % for i=1:w
    %x = p(:,i);
    %x = x+gradient(w,g);
    %end
    x = p;
    y = applyFilter(p,filter);
    end

Second version (loop active):

    function[x,y] = rgbFilter(h,w,c,p,g)
    % h : height of the image
    % w : width of the image
    % c : colour of the filter
    % p : pixel matrix of the image
    % g : gradient factor
    filter=ones(h,w);
    filter=filter*c;
    %applyFilter=filter;
    %x = i;
    for i=1:w
        x = p(:,i);
        x = x+gradient(w,g);
    end
    x = p;
    y = applyFilter(p,filter);
    end

Tamer

    function[x,y] = rgbFilter(h,w,c,p,g)
    % h : height of the image
    % w : width of the image
    % c : colour of the filter
    % p : pixel matrix of the image
    % g : gradient factor
    filter=ones(h,w);
    filter=filter*c;
    %applyFilter=filter;
    %x = i;
    for i=1:w
        x = p(:,i);
        x = x+gradient(w,g);
    end
    x = p;
    y = applyFilter(p,filter);
    end

Questions Tamer answers: What are the types of h, w, c, p and g? Is ones(h,w) builtin or user-defined? What is the shape of filter? Is g real or complex? What are the types of x and y?

X10 Arrays. “Rail” is the intrinsic fixed-size one-dimensional array, indexed by a Long value starting at 0.
Region arrays: A region is a collection of points of the same rank. A point is an indexing unit of the array. A region array is a set of elements mapped to points in the underlying region. Flexibility of shape and indexing. Flexibility comes at a cost in performance.
Simple arrays: Dense rectangular arrays with zero-based indexing. Support for up to three dimensions only. Internally, elements are stored on a rail in row-major order. These restrictions allow efficient optimization of array indexing.

Benchmarks
bubble - bubble sort
capr - computes the capacitance per unit length of a coaxial pair of rectangles
dich - finds the Dirichlet solution to Laplace’s equation
fiff - a finite difference solution to a wave equation
mbrt - computes a Mandelbrot set
nb1d - simulates the gravitational movement of a set of objects in 1 dimension

In this talk: 1. WHY? 2. WHAT? 3. WOW!