Comparison of Array Operation Synthesis and Straightforward Compilation


Comparison of Array Operation Synthesis and Straightforward Compilation

Code by straightforward compilation:

    FORALL (I=1:N:1; J=1:N:1)
      IF (1<=I<=N-1) and (1<=J<=N) THEN
        T1(I,J)=A(I+1,J)
      ELSE
        T1(I,J)=0
    ENDFORALL
    FORALL (I=1:N:1; J=1:N:1)
      T2(I,J)=T1(J,I)
    ENDFORALL
    FORALL (I=1:N:1; J=1:N:1)
      IF (1<=I<=N-1) and (1<=J<=N) THEN
        B(I,J)=T2(I+1,J)
      ELSE
        B(I,J)=T2(I-N+1,J)
    ENDFORALL

Code with Array Operation Synthesis:

    FORALL (I=1:N-1:1; J=1:N-1:1) B(I,J)=A(J+1,I+1)   ENDFORALL
    FORALL (I=1:N-1:1; J=N:N:1)   B(I,J)=0            ENDFORALL
    FORALL (I=N:N:1; J=1:N-1:1)   B(I,J)=A(J+1,I-N+1) ENDFORALL
    FORALL (I=N:N:1; J=N:N:1)     B(I,J)=0            ENDFORALL
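The equivalence of the two versions can be checked with a small Python model. This is only a sketch, not the authors' tool. It assumes the wrap-around row of the last loop reads row 1 of T2 (index I-N+1) and that the zero-fill loops cover column J=N. The helper names `make`, `straightforward`, and `synthesized` are hypothetical. Plain nested loops stand in for FORALL, which is safe here because no loop reads the array it writes.

```python
def make(n, fill=0):
    # (n+1) x (n+1) table so that indices 1..n match the slide's notation.
    return [[fill] * (n + 1) for _ in range(n + 1)]


def straightforward(a, n):
    """Three loop nests with the temporaries T1 and T2."""
    t1, t2, b = make(n), make(n), make(n)
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            # Shift rows up by one, filling the last row with 0.
            t1[i][j] = a[i + 1][j] if i <= n - 1 else 0
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            # Transpose.
            t2[i][j] = t1[j][i]
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            # Circular shift of rows by one (row N wraps to row 1).
            b[i][j] = t2[i + 1][j] if i <= n - 1 else t2[i - n + 1][j]
    return b


def synthesized(a, n):
    """Four loop nests that write B directly from A, with no temporaries."""
    b = make(n)
    for i in range(1, n):
        for j in range(1, n):
            b[i][j] = a[j + 1][i + 1]
    for i in range(1, n):
        b[i][n] = 0
    for j in range(1, n):
        b[n][j] = a[j + 1][1]  # I-N+1 is 1 when I = N
    b[n][n] = 0
    return b


if __name__ == "__main__":
    n = 5
    a = make(n)
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            a[i][j] = 10 * i + j
    print(straightforward(a, n) == synthesized(a, n))
```

The synthesized version touches each element of B once and never materializes T1 or T2, which is the saving the slide illustrates.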

Synthesis Anomaly

SPREAD: one-to-many data movement function

    REAL A(N),B(N),C(N),T(N,M),D(N,M)
    A=SIN(SQRT(B+0.5)+COS(C))        ! Statement S1
    T=SPREAD(A,DIM=2,NCOPIES=M)+D    ! Statement S2

Synthesizing array expressions S1 and S2 separately:

    FORALL (I=1:N:1)
      A(I)=SIN(SQRT(B(I)+0.5)+COS(C(I)))
    END FORALL
    FORALL (I=1:N:1,J=1:M:1)
      T(I,J)=A(I)+D(I,J)
    END FORALL

    Cost: N SIN, SQRT, COS evaluations; 2*N+N*M additions; N+N*M assignments

Synthesizing array expressions S1 and S2 collectively:

    FORALL (I=1:N:1,J=1:M:1)
      T(I,J)=SIN(SQRT(B(I)+0.5)+COS(C(I)))+D(I,J)
    END FORALL

    Cost: N*M SIN, SQRT, COS evaluations; 3*N*M additions; N*M assignments
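The anomaly can be reproduced with a small Python model that counts calls to SIN, SQRT, and COS in each variant. This is a sketch under assumptions: arrays are plain 0-based lists, the function names `run_separate` and `run_collective` are hypothetical, and the collective loop is taken to assign T(I,J).

```python
import math


def run_separate(b, c, d):
    """S1 and S2 as two loop nests; returns (T, number of SIN/SQRT/COS calls)."""
    n, m = len(b), len(d[0])
    calls = 0
    a = []
    for i in range(n):
        a.append(math.sin(math.sqrt(b[i] + 0.5) + math.cos(c[i])))
        calls += 3  # one SIN, one SQRT, one COS per element of A
    t = [[a[i] + d[i][j] for j in range(m)] for i in range(n)]
    return t, calls


def run_collective(b, c, d):
    """S1 folded into S2; the expression is re-evaluated for every J."""
    n, m = len(b), len(d[0])
    calls = 0
    t = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            t[i][j] = math.sin(math.sqrt(b[i] + 0.5) + math.cos(c[i])) + d[i][j]
            calls += 3  # one SIN, one SQRT, one COS per element of T
    return t, calls


if __name__ == "__main__":
    n, m = 4, 3
    b = [float(i) for i in range(n)]
    c = [0.1 * i for i in range(n)]
    d = [[float(i + j) for j in range(m)] for i in range(n)]
    t_sep, calls_sep = run_separate(b, c, d)
    t_col, calls_col = run_collective(b, c, d)
    print(t_sep == t_col, calls_sep, calls_col)
```

Both variants produce the same T, but the collective form evaluates the expensive functions 3*N*M times instead of 3*N, so synthesizing across a one-to-many function such as SPREAD can cost more than it saves.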

Synthesis Anomaly (Cont’d)
- We propose a polynomial-time Synthesis Anomaly Prevention algorithm
- Loop Interchange for more Synthesis

Analysis of Array Operation Synthesis
We prove that Array Operation Synthesis:
- reduces the number of stores,
- reduces the number of loads,
- does not increase the required computations.
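As a rough illustration of the load and store claims (not the authors' proof), the sketch below counts array element reads and writes for the transpose-and-shift example from the first slide of this deck. It assumes every array reference in a loop body is one memory access and ignores caches and registers. The function names are hypothetical.

```python
def count_straightforward(n):
    """Element loads and stores for the three loop nests using T1 and T2."""
    loads = stores = 0
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            if i <= n - 1:
                loads += 1   # read A(I+1,J); the ELSE branch stores a constant
            stores += 1      # write T1(I,J)
    loads += n * n           # T2(I,J)=T1(J,I): one read per element
    stores += n * n          # ... and one write per element
    loads += n * n           # B(I,J)=T2(...): one read per element
    stores += n * n          # ... and one write per element
    return loads, stores


def count_synthesized(n):
    """Element loads and stores when B is written directly from A."""
    loads = stores = 0
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            if j <= n - 1:
                loads += 1   # read A(J+1,...); column N is filled with 0
            stores += 1      # write B(I,J)
    return loads, stores


if __name__ == "__main__":
    for n in (4, 256):
        print(n, count_straightforward(n), count_synthesized(n))
```

Under these assumptions the straightforward code performs 3*N*N stores and 3*N*N-N loads, while the synthesized code performs N*N stores and N*N-N loads.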

Advanced Techniques
- Optimization for Segmentation Descriptors with Coupled Index Functions
- Synthesis of Array Reduction and Location intrinsic operations
- Synthesis of WHERE CONSTRUCT

Contributions
The first scheme that can synthesize the following Fortran 90 intrinsic array operations:
- Array Section Movement, SPREAD, TRANSPOSE, EOSHIFT, CSHIFT, MERGE
- WHERE CONSTRUCT, Array Reduction Functions (ALL, COUNT, MAXVAL)
- Array Location Functions (MAXLOC, MINLOC)

SYNTOOL
An implementation of array operation synthesis as a web-based tool
- Kernel implemented in C
- A Web page + CGI program
Performs source-to-source Array Operation Synthesis and returns a Fortran 90 or HPF program
Available on WWW at

Synthesis on Shared-Memory Systems
SYNTOOL test beds:
- Sequent S27 with 10 identical processors
- SGI Power Challenge with 10 identical processors
Seven Fortran 90 test suites are used
- The last four program fragments are from real application codes
[Figure: four CPUs, each with its own cache, connected through a shared bus to main memory]

Experimental Results on Sequent (N=256)
- Code Fragment 1 (CSHIFT, TRANSPOSE, ADDITION, RESHAPE)
- Code Fragment 2 (WHERE construct)

Experimental Results on Sequent (N=256)
- Code Fragment 3 (EOSHIFT, MERGE, RESHAPE, ADDITION)
- Code Fragment 4 (Purdue-set Problem 9)

Experimental Results on Sequent (N=256)
- Code Fragment 5 (APULE routine, electromagnetic scattering problem)
- Code Fragment 6 (Sandia Wave)

Experimental Results on Sequent (N=256)
- Code Fragment 7 (Linear Equation Solve)

Experimental Results on SGI Power Challenge (N=512)
- Code Fragment 4 (Purdue-set Problem 9)
- Code Fragment 5 (APULE routine, electromagnetic scattering problem)

Experimental Results on SGI Power Challenge (N=512)
- Code Fragment 6 (Sandia Wave)
- Code Fragment 7 (Linear Equation Solve)