CS 201 Compiler Construction


CS 201 Compiler Construction Array Dependence Analysis & Loop Parallelization

Loop Parallelization
Goal: Identify loops whose iterations can be executed in parallel on different processors of a shared-memory multiprocessor system.
Matrix Multiplication:
  for I = 1 to n do      -- parallel
    for J = 1 to n do    -- parallel
      for K = 1 to n do  -- not parallel
        C[I,J] = C[I,J] + A[I,K]*B[K,J]
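The claim that the I and J loops are parallel can be sanity-checked by permuting their iteration order and comparing results; a small Python sketch (hypothetical 4x4 matrices, not from the slides — the K loop is kept in order per element, since it is the sequential reduction):

```python
import random

n = 4
A = [[random.random() for _ in range(n)] for _ in range(n)]
B = [[random.random() for _ in range(n)] for _ in range(n)]

def matmul(order):
    # order: sequence of (i, j) pairs giving the I/J iteration order
    C = [[0.0] * n for _ in range(n)]
    for i, j in order:
        for k in range(n):   # K loop: per-element reduction, kept sequential
            C[i][j] += A[i][k] * B[k][j]
    return C

ref = matmul([(i, j) for i in range(n) for j in range(n)])
shuffled = [(i, j) for i in range(n) for j in range(n)]
random.shuffle(shuffled)
assert matmul(shuffled) == ref   # any I/J order yields the same C
```

Because each (i, j) element's K-loop runs identically no matter when that element is visited, the results match exactly, which is what makes the two outer loops safe to distribute across processors.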

Data Dependences
Flow Dependence (S1 δf S2, read-after-write):
  S1: X = …
  S2: … = X
Anti Dependence (S1 δa S2, write-after-read):
  S1: … = X
  S2: X = …
Output Dependence (S1 δo S2, write-after-write):
  S1: X = …
  S2: X = …
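All three kinds can be detected mechanically from the read and write sets of the two statements; a minimal sketch (function and variable names are illustrative, not from the slides):

```python
def classify(writes1, reads1, writes2, reads2):
    """Dependences from statement S1 (earlier) to S2 (later),
    given each statement's sets of written and read variables."""
    deps = set()
    if writes1 & reads2:
        deps.add("flow")    # read-after-write
    if reads1 & writes2:
        deps.add("anti")    # write-after-read
    if writes1 & writes2:
        deps.add("output")  # write-after-write
    return deps

assert classify({"X"}, set(), set(), {"X"}) == {"flow"}
assert classify(set(), {"X"}, {"X"}, set()) == {"anti"}
assert classify({"X"}, set(), {"X"}, set()) == {"output"}
```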

Example: Data Dependences
Flow dependence S1 δf S2:
  do I = 1, 40
    S1: A(I+1) = …
    S2: … = A(I-1)
  enddo
Anti dependence S2 δa S1:
  do I = 1, 40
    S1: A(I-1) = …
    S2: … = A(I+1)
  enddo
Output dependence S1 δo S2:
  do I = 1, 40
    S1: A(I+1) = …
    S2: A(I-1) = …
  enddo

Sets of Dependences
  do I = 1, 100
    S: A(I) = B(I+2) + 1
    T: B(I) = A(I-1) - 1
  enddo
S δf T due to A():
  S(1):   A(1) = B(3) + 1
  T(1):   B(1) = A(0) - 1
  S(2):   A(2) = B(4) + 1
  T(2):   B(2) = A(1) - 1
  S(3):   A(3) = B(5) + 1
  T(3):   B(3) = A(2) - 1
  …
  S(100): A(100) = B(102) + 1
  T(100): B(100) = A(99) - 1
Set of iteration pairs associated with this dependence: {(i,j): j=i+1, 1<=i<=99}
Dependence distance: j-i = 1, constant in this case.
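The iteration-pair set can be confirmed by brute force, enumerating which write of A() each read touches; a small sketch:

```python
# S(i) writes A(i); T(j) reads A(j-1). A pair (i, j) belongs to the
# flow dependence S δf T exactly when i == j - 1.
pairs = []
for i in range(1, 101):        # iterations of S
    for j in range(1, 101):    # iterations of T
        if i == j - 1:
            pairs.append((i, j))

assert pairs == [(i, i + 1) for i in range(1, 100)]  # {(i,j): j=i+1, 1<=i<=99}
assert all(j - i == 1 for i, j in pairs)             # constant distance 1
```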

Nested Loops
level 1: do I1 = 1, 100
level 2:   do I2 = 1, 50
      S:     A(I1,I2) = A(I1,I2-1) + B(I1,I2)
           enddo
         enddo
The value computed by S in iteration (i1,i2) is the same as the value used in iteration (j1,j2): A(i1,i2) = A(j1,j2-1) iff i1=j1 and i2=j2-1.
S is flow dependent on itself at level 2 (corresponding to the inner loop): for a fixed value of I1, the dependence exists between different iterations of the inner loop.
Iteration pairs: {((i1,i2),(j1,j2)): j1=i1, j2=i2+1, 1<=i1<=100, 1<=i2<=49}
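A scaled-down brute-force search (bounds 10 and 5 instead of 100 and 50, purely to keep the search small) recovers the same condition and distance vector:

```python
N1, N2 = 10, 5   # scaled-down loop bounds (the slides use 100 and 50)
dist = set()
for i1 in range(1, N1 + 1):
    for i2 in range(1, N2 + 1):
        for j1 in range(1, N1 + 1):
            for j2 in range(1, N2 + 1):
                # write A(i1,i2) in iteration (i1,i2);
                # read  A(j1,j2-1) in iteration (j1,j2)
                if i1 == j1 and i2 == j2 - 1:
                    dist.add((j1 - i1, j2 - i2))

assert dist == {(0, 1)}   # dependence carried at level 2 only
```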

Nested Loops Contd..
Iteration pairs: {((i1,i2),(j1,j2)): j1=i1, j2=i2+1, 1<=i1<=100, 1<=i2<=49}
Dependence distance vector: (j1-i1, j2-i2) = (0,1)
There is no dependence at level 1.

Computing Dependences - Formulation
level 1: do I1 = 1, 8
level 2:   do I2 = max(I1-3,1), min(I1,5)
             A(I1+1,I2+1) = A(I1,I2) + B(I1,I2)
           enddo
         enddo
Can we find iterations (i1,i2) and (j1,j2) such that
  i1+1 = j1 and i2+1 = j2
subject to
  1 <= i1 <= 8        1 <= j1 <= 8
  i1-3 <= i2 <= i1    j1-3 <= j2 <= j1
  i1-3 <= i2 <= 5     j1-3 <= j2 <= 5
  1 <= i2 <= i1       1 <= j2 <= j1
  1 <= i2 <= 5        1 <= j2 <= 5
where i1, i2, j1, j2 are integers?
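Since the bounds here are tiny, the constraint system can simply be searched exhaustively; a sketch:

```python
solutions = []
for i1 in range(1, 9):
    for i2 in range(max(i1 - 3, 1), min(i1, 5) + 1):
        # the write A(i1+1, i2+1) matches the read A(j1, j2) iff:
        j1, j2 = i1 + 1, i2 + 1
        if 1 <= j1 <= 8 and max(j1 - 3, 1) <= j2 <= min(j1, 5):
            solutions.append(((i1, i2), (j1, j2)))

assert solutions   # integer solutions exist, so a dependence exists
```

Every solution found this way has distance vector (j1-i1, j2-i2) = (1,1), which is what the equations i1+1=j1, i2+1=j2 force.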

Computing Dependences Contd..
Dependence testing is an integer programming problem, which is NP-complete. Practical solutions trade off the speed and the accuracy of the solver.
False positives: an imprecise test may report dependences that do not actually exist. A conservative test may report false positives but must never miss a real dependence.
Dependence tests: extreme value test; GCD test; generalized GCD test; Lambda test; Delta test; Power test; Omega test; etc.
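As one concrete example from this list, the GCD test declares a linear equation a1*x1 + … + an*xn = c unsolvable over the integers whenever gcd(a1, …, an) does not divide c; a minimal sketch (loop bounds are deliberately ignored, which is why the test is conservative):

```python
from functools import reduce
from math import gcd

def gcd_test(coeffs, c):
    """GCD test for a1*x1 + ... + an*xn = c over the integers.
    False means no integer solution can exist (independence proven);
    True means a dependence is still possible."""
    g = reduce(gcd, (abs(a) for a in coeffs))
    return c % g == 0 if g else c == 0

assert gcd_test([2, 4], 6) is True    # gcd 2 divides 6: maybe dependent
assert gcd_test([2, 4], 3) is False   # gcd 2 does not divide 3: independent
```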

Extreme Value Test
An approximate test: it can guarantee that a solution exists, but the solution may be a real solution rather than an integer one.
Let f: R^n → R be bounded on a set S ⊆ R^n.
Let b be a lower bound and B an upper bound of f on S: b ≤ f(ā) ≤ B for every ā ∈ S.
For a real number c, the equation f(ā) = c has solutions in S iff b ≤ c ≤ B.

Extreme Value Test Contd..
Example: f: R^2 → R; f(x,y) = 2x + 3y
S = {(x,y): 0<=x<=1 and 0<=y<=1} ⊆ R^2
lower bound b = 0; upper bound B = 5
1. Is there a real solution to the equation 2x + 3y = 4? Yes: (0.5,1) is a solution, and there are many others.
2. Is there an integer solution in S? No: f(0,0)=0; f(0,1)=3; f(1,0)=2; f(1,1)=5. For none of the integer points in S is f(x,y)=4.
If there are no real solutions, there are no integer solutions. If there are real solutions, integer solutions may or may not exist.
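The example can be checked directly; a short sketch:

```python
from itertools import product

# f(x, y) = 2x + 3y on S = [0,1] x [0,1]
b, B = 0, 5                    # extreme values of f over S
c = 4
assert b <= c <= B             # so a real solution exists, e.g. (0.5, 1)

# ...but no integer point of S satisfies f = 4:
assert all(2*x + 3*y != c for x, y in product((0, 1), repeat=2))
```

This is exactly the gap the slide describes: the extreme value test certifies the real solution, while the integer question needs the exhaustive check.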

Extreme Value Test Contd..
Example:
  DO I = 1, 10
    DO J = 1, 10
      A[10*I+J-5] = … A[10*I+J-10] …
A dependence requires 10*I1+J1-5 = 10*I2+J2-10, i.e.
  10*I1 - 10*I2 + J1 - J2 = -5
f: R^4 → R; f(I1,I2,J1,J2) = 10*I1 - 10*I2 + J1 - J2
Bounds for 1 <= I1,I2,J1,J2 <= 10: lower bound b = -99; upper bound B = +99.
Since -99 <= -5 <= +99, the test reports a dependence.
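A brute-force pass over this (tiny) iteration space confirms both the bounds and, in this case, that an integer solution really exists; a sketch:

```python
from itertools import product

vals = [10*i1 - 10*i2 + j1 - j2
        for i1, i2, j1, j2 in product(range(1, 11), repeat=4)]

b, B = min(vals), max(vals)
assert (b, B) == (-99, 99)
assert b <= -5 <= B   # extreme value test: a dependence may exist
assert -5 in vals     # and here an integer solution actually does exist
```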


Nested Loops – Multidimensional Arrays

Nested Loops Contd..