A Multiple Associative Model to Support Branches in Data Parallel Applications Wittaya Chantamas and Johnnie W. Baker Department of Computer Science Kent.

A Multiple Associative Model to Support Branches in Data Parallel Applications Wittaya Chantamas and Johnnie W. Baker Department of Computer Science Kent State University, Kent, OHIO USA Telephone: (330) Fax: (330) and

MASC - Spring 2007

Outline
- SIMD and Branches
  - Single Instruction Multiple Data (SIMD)
  - A data parallel program contains branches
- MASC Computational Model
  - Multiple Associative Computing (MASC) Model
  - MASC model with manager-worker paradigm
  - The power of MASC model
    - With variations of MASC
    - With other models
  - ASC language compiler support for the MASC model
- MASC Algorithm
  - Shapes algorithm
  - Modified ASC Quick Hull algorithm for the MASC model with manager-worker paradigm

SIMD and Branches
- Most SIMD computers support masking of PEs: in a parallel IF-THEN-ELSE statement, the mask determines whether or not each PE participates in an operation
- The THEN and ELSE parts have to be executed sequentially

SIMD and Branches: a traditional SIMD computer
  IF (a parallel condition)
  THEN Statement Block A
  ELSE Statement Block B
Suppose we have 14 PEs
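The following is a minimal Python sketch of how a single instruction stream might handle this branch, assuming 14 PEs that each hold one value. The condition (value is even), the two stand-in blocks, and the name `simd_if_then_else` are hypothetical and do not come from the slides. The point it illustrates is that the THEN and ELSE blocks take two sequential passes, each under a mask.

```python
def simd_if_then_else(values, condition, block_a, block_b):
    """Simulate one instruction stream executing a parallel IF-THEN-ELSE."""
    mask = [condition(v) for v in values]  # every PE evaluates the condition
    result = list(values)
    passes = 0

    # Pass 1: PEs whose mask is True execute Block A; the others sit idle.
    for pe, active in enumerate(mask):
        if active:
            result[pe] = block_a(result[pe])
    passes += 1

    # Pass 2: the mask is inverted and the remaining PEs execute Block B.
    for pe, active in enumerate(mask):
        if not active:
            result[pe] = block_b(result[pe])
    passes += 1

    return result, passes


values = list(range(14))  # 14 PEs, one value each
result, passes = simd_if_then_else(
    values,
    condition=lambda v: v % 2 == 0,  # hypothetical parallel condition
    block_a=lambda v: v * 10,        # stand-in for Statement Block A
    block_b=lambda v: v + 1,         # stand-in for Statement Block B
)
```

Whatever the split between the two groups, the single instruction stream needs both passes, and in each pass some PEs are idle.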

SIMD and Branches
- Question: How can we improve the execution of branches if more than one instruction stream is available?
- One possible answer: Execute the parts of a branch simultaneously, one on each instruction stream

SIMD and Branches: the MASC computational model
  IF (a parallel condition)
  THEN Statement Block A
  ELSE Statement Block B
Suppose we have 14 PEs
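As a rough illustration of the benefit, the Python sketch below counts time steps for a branch under one and under two instruction streams. The block lengths (5 and 3 instructions) and the function name `branch_time_steps` are assumptions made for this example only, and the greedy assignment of blocks to streams is a simplification rather than the model's actual scheduling.

```python
def branch_time_steps(block_lengths, num_streams):
    """Time steps needed to finish all branch blocks with num_streams ISs.

    block_lengths: instruction count of each branch block (assumed values).
    Blocks are handed to streams greedily, longest first. With one stream
    this reduces to running the blocks one after another.
    """
    loads = [0] * num_streams
    for length in sorted(block_lengths, reverse=True):
        idx = loads.index(min(loads))  # least-loaded instruction stream
        loads[idx] += length
    return max(loads)


then_len, else_len = 5, 3  # hypothetical lengths of Blocks A and B
simd_steps = branch_time_steps([then_len, else_len], num_streams=1)
masc_steps = branch_time_steps([then_len, else_len], num_streams=2)
```

Under these assumed lengths the single stream needs the sum of both blocks, while two streams need only the length of the longer block.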

MASC Computational Model
- An extension of the associative computing (ASC) model
- The ASC model was created to capture the style of programming used on the ASPRO and STARAN
- The associative properties
  - Broadcast data in constant time
  - Constant-time global reduction of
    - Boolean values using AND/OR
    - Integer values using MAX/MIN
  - Constant-time data search
    - Provides content-addressable data
    - Eliminates the need for sorting and indexing
  - Pick one responder in constant time
- Supported in hardware by the broadcast and reduction network
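The associative properties can be mimicked in a small Python class. This is only a software sketch: the class `AssociativeArray`, its method names, and the sample records are invented for illustration, and each operation here loops over the records, whereas the model assumes the hardware performs it in constant time.

```python
class AssociativeArray:
    """Toy model of an associative cell array: one record (dict) per PE."""

    def __init__(self, records):
        self.records = list(records)

    def broadcast(self, field, value):
        """Broadcast: every PE receives the same value."""
        for record in self.records:
            record[field] = value

    def search(self, predicate):
        """Associative search: each PE tests its own record by content."""
        return [predicate(record) for record in self.records]

    def any_responder(self, responders):
        """Global OR reduction over the responder flags."""
        return any(responders)

    def global_max(self, field, responders):
        """MAX reduction over the responding PEs (None if there are none)."""
        values = [rec[field] for rec, on in zip(self.records, responders) if on]
        return max(values) if values else None

    def pick_one(self, responders):
        """Pick one responder: here, the lowest-numbered responding PE."""
        for pe, on in enumerate(responders):
            if on:
                return pe
        return None


cells = AssociativeArray([
    {"kind": "circle", "size": 3},
    {"kind": "square", "size": 7},
    {"kind": "circle", "size": 5},
])
responders = cells.search(lambda rec: rec["kind"] == "circle")
largest_circle = cells.global_max("size", responders)
first_responder = cells.pick_one(responders)
```

Because records are located by content rather than by position, no sorting or index structure is involved in the lookup.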

MASC Computational Model
- The basic components of the model
  - A set of instruction streams (ISs)
  - An array of cells (PE + memory)
  - IS-to-cell broadcast and reduction networks (preferably one for each IS)
  - An IS-to-IS network
  - A simple cell network

MASC Using Manager-Worker Paradigm
- A variation of the MASC model
- Two types of ISs
  - A manager-IS (ID 0)
    - Manages the work pool of tasks
    - Coordinates and assigns subtasks (using a FORK operation)
    - Combines finished tasks from worker-ISs (using a JOIN operation)
  - Identical worker-ISs (ID 1 to ID m)
    - Execute tasks in an associative (e.g., data parallel, SIMD) fashion using the PEs currently assigned to them

MASC Using Manager-Worker Paradigm
- The IS network
  - An IS-to-IS broadcast/reduction network
  - The manager-IS can use the network to perform a MIN/MAX or logical reduction on the ISs in constant time (e.g., to pick one idle worker-IS)
- The cell network is optional
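One way to picture the reduction on the IS network is a MIN over the IDs of the idle worker-ISs. The Python sketch below assumes that convention; the function `pick_idle_worker` and the busy-flag dictionary are hypothetical, and the loop stands in for what the model treats as a constant-time network operation.

```python
def pick_idle_worker(worker_busy):
    """Return the smallest ID among idle worker-ISs, or None if all are busy.

    worker_busy maps a worker-IS ID (1..m) to True while that IS is
    executing a task. The min() call plays the role of the MIN reduction
    the manager-IS would perform over the IS-to-IS network.
    """
    idle_ids = [is_id for is_id, busy in worker_busy.items() if not busy]
    return min(idle_ids) if idle_ids else None


workers = {1: True, 2: False, 3: False}  # worker-IS 1 is busy
chosen = pick_idle_worker(workers)
```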

MASC Using Manager-Worker Paradigm
- A cell is a simple ALU plus its local memory
- IS-Selector Register (a (lg m)-bit register if there are m ISs)
  - Holds the ID of the instruction stream to which the PE is currently listening
  - Can be set or reset by the instruction stream that the PE is listening to, or by the data tested in that PE
- Task-History Stack (uses the memory in each cell)
  - Holds task IDs
  - Is empty by default
  - The top of the stack always shows the task the PE is currently executing, the task that has just finished, or a new task that has not yet been assigned to a worker-IS
- At any point in time, each PE listens to exactly one IS
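A cell's per-PE state can be sketched as a small Python data class. The names `Cell`, `is_selector`, `task_history`, and `is_selector_bits` are illustrative choices, and rounding lg m up to a whole number of bits is an assumption for values of m that are not powers of two.

```python
from dataclasses import dataclass, field


def is_selector_bits(m):
    """Bits needed to name one of m ISs: ceil(lg m), in integer arithmetic."""
    return (m - 1).bit_length()


@dataclass
class Cell:
    value: float                  # stand-in for the cell's local memory
    is_selector: int = 0          # ID of the IS this PE is listening to
    task_history: list = field(default_factory=list)  # empty by default

    def current_task(self):
        """Top of the Task-History Stack, or None while the stack is empty."""
        return self.task_history[-1] if self.task_history else None


cell = Cell(value=3.5)
cell.task_history.append("task 0")  # a new task ID is pushed
cell.is_selector = 2                # the PE now listens to worker-IS 2
```

Since `is_selector` holds a single ID, the sketch also reflects the rule that a PE listens to exactly one IS at any time.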

MASC Using Manager-Worker Paradigm
- A task is broken down into subtasks
  - Subtasks do not interact during their execution
- A FORK operation
  - Generates one or more subtasks from a branch by partitioning the PEs into groups based on the parallel condition
  - Pushes the new task ID onto the Task-History Stack
  - Assigns the subtasks to worker-ISs for concurrent execution by setting the IS-Selector Register of the PEs in each group to the ID of the corresponding worker-IS
- A JOIN operation
  - Recombines subtasks into the original parent task (i.e., the one existing prior to the fork) after they have executed successfully, by popping the top of the Task-History Stack
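The effect of FORK and JOIN on the per-PE state can be sketched in Python as follows. Cells are plain dictionaries, and the task names, worker IDs, and the functions `fork` and `join` are hypothetical. Resetting the IS-Selector to the manager-IS (ID 0) on a join is an assumption made here; the slides only state that the join pops the Task-History Stack.

```python
def fork(cells, condition, assignments):
    """Partition PEs by a parallel condition and hand each group to a worker-IS.

    assignments maps the condition result (True/False) to a pair
    (subtask_id, worker_is_id). Both values are hypothetical labels.
    """
    for cell in cells:
        task_id, worker_id = assignments[condition(cell)]
        cell["task_history"].append(task_id)  # push the new task ID
        cell["is_selector"] = worker_id       # PE now listens to that worker-IS


def join(cells, manager_id=0):
    """Recombine subtasks by popping each PE's Task-History Stack."""
    for cell in cells:
        cell["task_history"].pop()
        cell["is_selector"] = manager_id  # assumption: PEs return to the manager-IS


cells = [
    {"value": v, "is_selector": 0, "task_history": ["parent"]}
    for v in (1, 5, 3, 8)
]
fork(cells, lambda c: c["value"] < 4, {True: ("then", 1), False: ("else", 2)})
selectors_after_fork = [c["is_selector"] for c in cells]
tasks_after_fork = [c["task_history"][-1] for c in cells]
join(cells)
```

After the join, every PE's stack again shows the parent task on top, which matches the description of returning to the task that existed before the fork.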

MASC Using Manager-Worker Paradigm
- A work pool (WP-Q)
  - Contains tasks ready to be executed
[Figure: work pool and worker-ISs]

Fork Operation
[Figure: fork operation. Wittaya Chantamas, 08/24/2004]

Join Operation
[Figure: join operation. Wittaya Chantamas, 08/24/2004]

The Power of MASC Model
- Among the variations of the MASC model, the original MASC model with a simple cell network (1-d, 2-d, or hypercube) has the same power as
  - A MASC model without any cell network
    - A 1-d cell network can be simulated on a MASC without any cell network in O(1) time with a polynomial blow-up in size (PEs and ISs)
    - The proof for the 2-d and hypercube networks is similar to the 1-d case
  - A MASC model with manager-worker paradigm (conjectured; a proof is still needed)

The Power of MASC Model
- Compared to other models, the MASC model
  - has the same power as
    - Basic and Segmenting Reconfigurable Multiple Bus Machine (RMBM)
    - CRCW-PRAM
    - A restricted version of RM
    - A Mesh with Multiple Broadcasting (MMB)
  - is less powerful than
    - Fused and Extended RMBM
    - Reconfigurable Mesh (RM)
    - Linear Mesh (LM)

ASC Language Compiler Support for the MASC Model
- The MASC model needs multiple-IS support from ASC
- An extension of the ASC language compiler for the MASC model
- A MASC directive
  - Different paths in a branch can be executed concurrently, in data parallel fashion, by using the directive /*.masc fork */
  - The user has tight control
    - Not all paths in all branches are executed concurrently
    - Only those in branches marked with the directive are
  - The ASC compiler treats the directive as a comment (it appears in the .lst file but not in the .iob file)
    - No new ASC compiler is needed to run an ASC program on a MASC system
  - Another extension is needed to support a parallel case statement

A parallel IF-THEN-ELSE statement in the ASC language
  IF condition expression THEN
    statement block A
  ELSE
    statement block B
  ENDIF;

An ASC program without the directive
  main test
  int parallel b[$], c[$], d[$];
  logical parallel BCD[$];
  associate b[$], c[$], d[$] with BCD[$];
  read b[$] c[$] d[$] in BCD[$];
  b[$] = c[$] + 2;
  c[$] = d[$] - 3;
  /* will be no fork here */
  if (b[$] .lt. c[$]) then
    b[$] = c[$];
    d[$] = 4;
  else
    c[$] = b[$];
    b[$] = d[$];
  endif;
  c[$] = d[$];
  d[$] = c[$];
  end;
[Figure: the program annotated with M/W labels alongside its structure code, which runs from .MI_BEGIN to .MI_END and includes beg_of_stmt 1c, beg_read 5a00 SYSOT BCD B,C,D,… and mvpa_ 4812 C D]

A parallel IF-THEN-ELSE statement in the ASC language, with the MASC directive
  /*.MASC fork */
  IF condition expression THEN
    statement block A
  ELSE
    statement block B
  ENDIF;

An ASC program with the directive
  main test
  int parallel b[$], c[$], d[$];
  logical parallel BCD[$];
  associate b[$], c[$], d[$] with BCD[$];
  read b[$] c[$] d[$] in BCD[$];
  b[$] = c[$] + 2;
  c[$] = d[$] - 3;
  /*.MASC FORK */
  if (b[$] .lt. c[$]) then
    b[$] = c[$];
    d[$] = 4;
  else
    c[$] = b[$];
    b[$] = d[$];
  endif;
  c[$] = d[$];
  d[$] = c[$];
  end;
[Figure: the program annotated with M/W labels (W111, X100, M111, X110) alongside its structure code, which runs from .MI_BEGIN to .MI_END and includes beg_of_stmt 1c, mvpa_ 4812 B C and mvpa_ 4812 D B]

Shape Problem
- The test problem
  - Compute the areas of basic shapes in a database
- The MASC model can be used to solve this problem
  - Each type of shape requires a different equation to compute its area
  - The areas for all shape types can be computed simultaneously by partitioning the PEs into groups (triangle, rectangle, or circle) and using one IS to compute the areas for each group
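The partitioning idea can be sketched in Python. The record layout (`type`, `base`, `height`, `width`, `radius`) and the function `compute_areas` are assumptions for this example, since the slides do not describe the database schema. The groups are processed one after another here; on a MASC machine each group would be handled by its own instruction stream at the same time.

```python
import math

# One area formula per shape type, i.e. one branch per group of PEs.
AREA_FORMULAS = {
    "triangle": lambda s: 0.5 * s["base"] * s["height"],
    "rectangle": lambda s: s["width"] * s["height"],
    "circle": lambda s: math.pi * s["radius"] ** 2,
}


def compute_areas(shapes):
    """Partition PEs by shape type, then apply one formula per group."""
    groups = {}
    for pe, shape in enumerate(shapes):
        groups.setdefault(shape["type"], []).append(pe)

    areas = [None] * len(shapes)
    for shape_type, pes in groups.items():  # one IS per group in MASC
        formula = AREA_FORMULAS[shape_type]
        for pe in pes:
            areas[pe] = formula(shapes[pe])
    return areas, groups


shapes = [
    {"type": "triangle", "base": 4, "height": 3},
    {"type": "rectangle", "width": 2, "height": 5},
    {"type": "circle", "radius": 1},
    {"type": "rectangle", "width": 3, "height": 3},
]
areas, groups = compute_areas(shapes)
```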

MASC Quick Hull Algorithm
- The convex hull problem
  - The convex hull of a set of points S is the smallest convex set containing S. In particular, each point of S is either on the boundary or in the interior of the convex hull
- A modified ASC Quick Hull algorithm for the MASC model with a limited number of ISs, using the manager-worker paradigm with a work pool

MASC Quick Hull Algorithm
Algorithm MASC Quick Hull (for the upper hull)
Input: A set of points S given as (x, y) coordinates; each PE holds one point in S
Output: The vertices of the upper convex hull
1. The manager assigns the initialization task (i.e., task 0) to a worker-IS to find the two extreme points, the X-min point (w) and the X-max point (e)
   - Two points (w and e) in the convex hull are identified
2. The manager creates task we and places it in the work pool. The PEs associated with this task are the ones whose points lie above segment we
3. The manager assigns each task pq in the work pool to a worker-IS to find another point in the convex hull using the PEs assigned to this task
   - Another point (r) in the convex hull is identified
4. The manager places task pr and task rq in the work pool. The PEs associated with each task are the ones whose points lie above the corresponding line segment
5. The manager repeats steps 3 and 4 until there are no active tasks and no tasks remain in the work pool, and then terminates the algorithm
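The work-pool structure of the algorithm can be sketched sequentially in Python. This is a simulation under stated assumptions, not the MASC implementation: tasks are taken from the pool one at a time instead of being handed to idle worker-ISs, the new hull point r is chosen as the point farthest from segment pq (the usual Quick Hull rule, which the slides do not spell out), and the names `cross` and `upper_hull` are illustrative.

```python
from collections import deque


def cross(p, q, r):
    """Twice the signed area of triangle pqr; positive when r is left of p->q."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def upper_hull(points):
    """Return the upper-hull vertices of a list of (x, y) tuples, sorted by x."""
    # Task 0: find the X-min point (w) and the X-max point (e).
    w = min(points, key=lambda p: (p[0], -p[1]))
    e = max(points, key=lambda p: (p[0], p[1]))
    hull = {w, e}

    # Task we: the points lying above segment we.
    pool = deque([(w, e, [r for r in points if cross(w, e, r) > 0])])

    while pool:  # stop when no tasks remain in the work pool
        p, q, candidates = pool.popleft()
        if not candidates:
            continue
        # Assumed rule: the candidate farthest from segment pq is on the hull.
        r = max(candidates, key=lambda c: cross(p, q, c))
        hull.add(r)
        # Tasks pr and rq keep only the points above their own segment.
        pool.append((p, r, [c for c in candidates if cross(p, r, c) > 0]))
        pool.append((r, q, [c for c in candidates if cross(r, q, c) > 0]))

    return sorted(hull)


points = [(0, 0), (1, 3), (2, 1), (3, 4), (4, 0), (2, -2), (1, 1)]
hull = upper_hull(points)
```

Tasks pr and rq share no candidate points, which is why they can be executed by different worker-ISs without interacting.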

MASC Quick Hull Algorithm
[Figure: fork (F) and join (J) diagram for tasks T0, Twe, Tpr, and Trq]
- M-IS: Fork Task 0
- W-IS: Execute Task 0
- M-IS: Join Task 0
- M-IS: Fork Task WE
- W-IS: Execute Task WE
- M-IS: Join Task WE
- M-IS: Fork Task PR and Task RQ
- W-IS: Execute Task RQ
- W-IS: Execute Task PR
- M-IS: Join Task PR
- M-IS: Join Task RQ

MASC Quick Hull Algorithm: Timing
- n is the number of points in S and m is the number of instruction streams
- The worst case is still O(n)
- If we assume that the number of convex hull points is O(lg n) on average, the average-case running time is O((lg lg n)(lg n)/m)
- This gives a constant speedup of approximately m over the 1-IS version of the same algorithm in the average case
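The speedup claim follows from the stated average-case bound. As a worked step, write T_m(n) for the average-case running time with m instruction streams; this symbol is introduced here for illustration, and the constant factors are assumed to be the same for both versions.

```latex
\[
T_m(n) = O\!\left(\frac{(\lg\lg n)(\lg n)}{m}\right), \qquad
T_1(n) = O\big((\lg\lg n)(\lg n)\big), \qquad
\frac{T_1(n)}{T_m(n)} \approx m .
\]
```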

Conclusion
- Traditional SIMD executes the parts of a branch in a data parallel program sequentially
- MASC can execute most or all parts of a branch simultaneously if there are enough instruction streams
- The original MASC model with a simple cell network is as powerful as a model without any cell network or one with the manager-worker paradigm
- The MASC model is as powerful as many computational models, such as PRAM and some versions of RMBM
- An extension of the ASC compiler is required to benefit from having multiple ISs
- Some problems can take advantage of having more than one instruction stream; some cannot