A SPMD Model for OCR. Sanjay Chatterjee, 2/9/2015. Intel Confidential.



OCR SPMD model
- A SPMD context in OCR is a collection of individual logical execution units called ranks.
- A rank has a unique id within a SPMD context and can be viewed as a sequential chain of SPMD EDTs.
- A SPMD context includes two kinds of SPMD EDT templates: compute and sync.
- A SPMD rank starts by launching a compute EDT instance.
- SPMD ranks collectively synchronize by individually calling a SYNC operation.
- SPMD ranks collectively transition from a synchronization phase to a computation phase by individually calling COMPUTE.
- A SPMD EDT restarts itself by calling NEXT.
[Figure: a SPMD context with four ranks, each drawn as a chain of EDTs across a compute phase, a collective sync phase, and a second compute phase. Rank 0: C00 S00 S01 S02 C01. Rank 1: C10 C11 C12 S10 C13. Rank 2: C20 C21 C22 S20 S21 C23. Rank 3: C30 C31 C32 S30 C33. Edge labels: RANK MESSAGE, NEXT, SYNC, COMPUTE.]

SPMD EDTs vs Regular EDTs
- SPMD EDTs are of two kinds: COMPUTE and SYNC.
- SPMD EDTs are anonymized, i.e., they do not have a guid.
- A SPMD EDT only lives within a SPMD context and is associated with a rank.
- Returning from a SPMD EDT will exit the rank from the SPMD context.
- A SPMD EDT can restart itself by calling NEXT.
- A compute SPMD EDT can call SYNC to exit itself and start a new sync EDT on the same rank.
- A sync SPMD EDT can call COMPUTE to exit itself and start a new compute EDT on the same rank.
- A compute EDT calling COMPUTE or a sync EDT calling SYNC is an error.
- A SPMD EDT in one rank can communicate with another rank using rank messages.
- A SPMD EDT can add a self dependence with a new API, ocrAddSelfDependence, i.e., the EDT can make its next instance dependent on another event or DB.
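The transition rules above can be read as a small state machine per rank. The following is a minimal sketch of that reading, not runtime code: the type and function names (`edt_kind_t`, `spmd_op_t`, `spmd_step`) are hypothetical and do not appear in the slides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names: the slides state the rules, not these types. */
typedef enum { EDT_COMPUTE, EDT_SYNC, EDT_EXITED } edt_kind_t;
typedef enum { OP_NEXT, OP_SYNC, OP_COMPUTE, OP_RETURN } spmd_op_t;

/* Applies one operation to the kind of EDT currently running on a rank.
 * Returns false and leaves *kind unchanged for the error cases. */
static bool spmd_step(edt_kind_t *kind, spmd_op_t op) {
    assert(kind != NULL);
    if (*kind == EDT_EXITED) return false;        /* rank already left the context */
    switch (op) {
    case OP_NEXT:
        return true;                              /* restart the same kind of EDT */
    case OP_SYNC:
        if (*kind != EDT_COMPUTE) return false;   /* sync EDT calling SYNC is an error */
        *kind = EDT_SYNC;
        return true;
    case OP_COMPUTE:
        if (*kind != EDT_SYNC) return false;      /* compute EDT calling COMPUTE is an error */
        *kind = EDT_COMPUTE;
        return true;
    case OP_RETURN:
        *kind = EDT_EXITED;                       /* returning exits the rank */
        return true;
    }
    return false;
}
```

A rank starts in `EDT_COMPUTE`; the sketch ignores the initialization function, which the deck also allows to call COMPUTE.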

Creating and launching a SPMD Context

    u8 ocrSpmdLaunch( u64 numRanks, ocrGuid_t computeTemplate, u32 paramc, u64* paramv, u32 spmdDepc, ocrSpmdDep_t *spmdDepv, ocrGuid_t collector, ocrGuid_t affinity, ocrGuid_t outputEvent );

- [in] numRanks: Number of ranks in the SPMD context
- [in] computeTemplate: Guid of the compute EDT template
- [in] paramc: Number of SPMD params
- [in] paramv: Params for the compute EDTs. They are copied to every rank.
- [in] spmdDepc: Number of ocrSpmdDep_t inputs for the compute EDT
- [in] spmdDepv: The ocrSpmdDep_t inputs for the compute EDT
- [in] collector: Guid of a library of synchronization algorithms
- [in] affinity: Affinity guid of the SPMD EDT
- [in] outputEvent: SPMD output event

    u8 ocrSpmdDepCreate( ocrSpmdDep_t *dep, ocrGuid_t db, ocrDbAccessMode_t mode, SpmdDepType_t type, u64 index, size_t elSize );

Creates a SPMD dependence to provide as input to the SPMD context.
- [out] dep: The SPMD EDT dep variable
- [in] db: Guid of the DB used as input to the SPMD context
- [in] mode: Access mode on the DB
- [in] type: Type of SPMD dependence. Can be either “REGULAR” or “INDEXED”.
  - REGULAR: The DB used in a “regular” dep is copied to the compute EDTs on every rank. Each rank gets a new GUID for the copied DB.
  - INDEXED: The DB used in an “indexed” dep is read in slices on every rank. Each rank gets a new DB for the slice it uses.
    - The DB should be an array containing elements of size “elSize”.
    - The array length should be at least “index” + SPMD numRanks.
    - Each rank “i” gets a DB of size “elSize” starting at offset ((index + i) * elSize) of the source DB.
- [in] index: Used for “INDEXED” deps only. It is the index at which rank 0 starts its slice.
- [in] elSize: Used for “INDEXED” deps only. Size of an element in the input DB.
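The INDEXED slicing rule is simple arithmetic. This sketch spells it out in plain C, assuming byte offsets and ignoring integer overflow; the helper names `indexed_slice_offset` and `indexed_dep_fits` are illustrative and not part of the proposed API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offset of rank i's slice in the source DB: (index + i) * elSize. */
static size_t indexed_slice_offset(uint64_t index, uint64_t rank, size_t elSize) {
    assert(elSize > 0);
    return (size_t)(index + rank) * elSize;
}

/* The source array must hold at least index + numRanks elements,
 * so that the last rank's slice is still inside the array. */
static bool indexed_dep_fits(uint64_t arrayLen, uint64_t index, uint64_t numRanks) {
    return arrayLen >= index + numRanks;
}
```

For example, with `index = 0` and `elSize = sizeof(u64)`, rank 3 reads the 8 bytes starting at byte offset 24.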

SPMD Rank Messages
- SPMD rank messages support point-to-point communication between ranks.
- Messages can be communicated only between the same kind of SPMD EDT templates:
  - Compute SPMD EDTs on one rank can only send/receive messages to/from compute SPMD EDTs on other ranks.
  - Sync SPMD EDTs on one rank can only send/receive messages to/from sync SPMD EDTs on other ranks.
- Message ordering at the source rank is guaranteed to be maintained at the destination rank's depv slot.

    u8 ocrSend(u64 dstRank, u64 dstSlot, ocrGuid_t db);

- [in] dstRank: rank id of the message destination rank
- [in] dstSlot: slot id at the destination rank
- [in] db: Guid of the datablock communicated
- Called by the message source.
- The message send is guaranteed to be complete after NEXT is called.
- Another send to the same location and slot is permitted only after calling NEXT.

    u8 ocrRecv(u64 srcRank, u64 dstSlot);

- [in] srcRank: rank id of the message source rank
- [in] dstSlot: slot id in the current rank where the message will be received
- Called by the message destination.
- The DB at the destination can be accessed in the slot after calling NEXT.
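The rule that a second send to the same destination rank and slot is only allowed after NEXT can be pictured as per-sender bookkeeping. The sketch below is one possible model under assumed limits (`MAX_RANKS`, `MAX_SLOTS`); `send_tracker_t` and its functions are hypothetical and say nothing about how the runtime would actually enforce the rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_RANKS 8   /* assumed limit, for illustration only */
#define MAX_SLOTS 4   /* assumed limit, for illustration only */

/* Hypothetical bookkeeping kept by one sending rank. */
typedef struct {
    bool sent[MAX_RANKS][MAX_SLOTS];  /* sends issued since the last NEXT */
} send_tracker_t;

static void tracker_init(send_tracker_t *t) {
    assert(t != NULL);
    memset(t, 0, sizeof *t);
}

/* Records a send. Returns false if (dstRank, dstSlot) is out of range
 * or was already used since the last NEXT. */
static bool tracker_send(send_tracker_t *t, unsigned dstRank, unsigned dstSlot) {
    assert(t != NULL);
    if (dstRank >= MAX_RANKS || dstSlot >= MAX_SLOTS) return false;
    if (t->sent[dstRank][dstSlot]) return false;
    t->sent[dstRank][dstSlot] = true;
    return true;
}

/* NEXT completes all outstanding sends, so every pair becomes usable again. */
static void tracker_next(send_tracker_t *t) {
    assert(t != NULL);
    memset(t, 0, sizeof *t);
}
```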

API for adding a dependence in a SPMD EDT

    ocrAddSelfDependence(ocrGuid_t source, u32 slot, ocrDbAccessMode_t mode);

- [in] source: Source of the dependence edge. May be an event or a DB.
- [in] slot: Slot in the current SPMD EDT that will be satisfied by the dependence
- [in] mode: The access mode on the DB attached to the slot
- Adds a dependence on an event or DB source.
- Allows a SPMD EDT to wait for an event.
- NEXT has to be called to complete the wait on the satisfaction of the dependence.
- The data from the source is visible only after calling NEXT.

API for NEXT

    void ocrNext();

- Exits and restarts the current SPMD EDT.
- All sends and receives called before ocrNext are guaranteed to be complete before the EDT restarts.
- After restart, the depv slots that receive messages are updated with the new DB. All other depv slots and params maintain their state from the previous ocrNext.
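The restart behavior can be illustrated with a toy model of the depv slots. This is a sketch under stated assumptions: `slot_t` and `apply_next` are hypothetical, and a `uint64_t` stands in for a DB guid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one depv slot; the slides do not show runtime state. */
typedef struct {
    uint64_t current;       /* DB visible to the running EDT instance */
    uint64_t incoming;      /* DB delivered by a rank message, not yet visible */
    bool     has_incoming;  /* true once a message has arrived for this slot */
} slot_t;

/* Models the restart performed by NEXT: slots with a delivered message
 * switch to the new DB, every other slot keeps its previous value.
 * Returns the number of slots that were updated. */
static size_t apply_next(slot_t *slots, size_t n) {
    assert(slots != NULL || n == 0);
    size_t updated = 0;
    for (size_t i = 0; i < n; i++) {
        if (slots[i].has_incoming) {
            slots[i].current = slots[i].incoming;
            slots[i].has_incoming = false;
            updated++;
        }
    }
    return updated;
}
```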

API for COMPUTE

    void ocrCompute();

- Creates and launches a new SPMD EDT in the current rank from the compute template of the SPMD context.
- Can be called from either the initialization function or a sync SPMD EDT.
- All compute EDTs in a rank share the same paramv and depv.
- State set up during the initialization function can be updated during the lifetime of the rank.

OCR Collectors
- OCR collectors are libraries that consist of various synchronization algorithms.
- These algorithms are to be written as SYNC EDTs.

    u8 ocrCollectorCreate( ocrGuid_t *collectorGuid );

Creates a collector object.
- [out] collectorGuid: Guid of the collector object

    u8 ocrCollectorRegister( ocrGuid_t collector, ocrCollective_t colType, ocrGuid_t syncTemplate );

Registers a SYNC EDT template with this collector object.
- [in] collector: Guid of the collector object
- [in] colType: Type of the collective synchronization operation
- [in] syncTemplate: Guid of the sync EDT template that implements the collective operation
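Conceptually, a collector maps a collective type to the sync template that implements it. The following sketch shows that mapping as a fixed table; it is an illustration only, with `col_type_t` standing in for ocrCollective_t, `uint64_t` standing in for ocrGuid_t, and 0 assumed to mean "no template registered".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for ocrCollective_t values named in the slides. */
typedef enum { COL_BARRIER, COL_SUM_REDUCTION, COL_TYPE_COUNT } col_type_t;

typedef struct {
    uint64_t sync_template[COL_TYPE_COUNT];  /* 0 means nothing registered */
} collector_t;

static void collector_create(collector_t *c) {
    assert(c != NULL);
    for (size_t i = 0; i < (size_t)COL_TYPE_COUNT; i++) c->sync_template[i] = 0;
}

/* Returns 0 on success, 1 if the type is out of range or the template is 0. */
static int collector_register(collector_t *c, col_type_t type, uint64_t tmpl) {
    assert(c != NULL);
    if ((int)type < 0 || (int)type >= (int)COL_TYPE_COUNT || tmpl == 0) return 1;
    c->sync_template[type] = tmpl;
    return 0;
}

/* Returns the registered template, or 0 if none is registered for this type. */
static uint64_t collector_lookup(const collector_t *c, col_type_t type) {
    assert(c != NULL);
    if ((int)type < 0 || (int)type >= (int)COL_TYPE_COUNT) return 0;
    return c->sync_template[type];
}
```

In this reading, a SYNC call with a given colType would look up the template and start a sync EDT from it on the calling rank.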

API for SYNC

    void ocrSync(ocrCollective_t colType, ocrGuid_t db, bool reqResult, u32 paramc, u64 *paramv);

- Creates and launches a new sync SPMD EDT in the current rank.
- Called from a compute SPMD EDT (this call will exit the compute EDT).
- [in] colType: Type of collective synchronization to be performed, e.g., sum-reduction, barrier, etc.
- [in] db: Datablock passed to the synchronization operation as the rank's input element. The DB can be accessed in slot 0 of the sync EDT.
- [in] reqResult: Boolean to indicate whether the current rank needs the result of the collective
- [in] paramc: Number of params passed to the sync EDT
- [in] paramv: Params passed to the sync EDT

    u8 ocrSyncResultPut(ocrGuid_t db);

- Called by the sync EDT that holds the final result of the synchronization op.
- [in] db: Result DB from the collective synchronization op

    u8 ocrSyncResultGet(ocrEdtDep_t *result);

- [out] result: The DB guid and pointer of the collective operation result
- Can be called only from a compute EDT.
- The call will result in an error if the previous ocrSync was called with reqResult as FALSE.

Other API supported inside a SPMD context
- u64 ocrGetRank(): returns the current rank
- u64 ocrGetRanks(): returns the total number of ranks in the current SPMD context

Example: Sum-Reduction

    {
        …
        ocrEdtTemplateCreate( &sumRedTempl, sumRedFunc, 2, 2 );
        ocrCollectorCreate( &collectorGuid );
        ocrCollectorRegister( collectorGuid, SUM_REDUCTION, sumRedTempl );
        ocrEdtTemplateCreate( &computeRedTempl, computeFunc, 1, 1 );
        ocrEventCreate( &outputRed, OCR_EVENT_STICKY_T, TRUE );
        u64 phase = 0;
        // Each rank gets one u64 element of elArrayDb, starting at index 0.
        ocrSpmdDepCreate( &spmdDep, elArrayDb, DB_MODE_RO, INDEXED, 0, sizeof(u64) );
        ocrSpmdLaunch( NUM_RANKS, computeRedTempl, 1, &phase, 1, &spmdDep, collectorGuid, NULL_GUID, outputRed );
    }

    ocrGuid_t computeFunc( u32 paramc, u64* paramv, u32 depc, ocrEdtDep_t depv[] ) {
        if (paramv[0] == 0) {
            // Phase 0: contribute this rank's element to the sum-reduction.
            u64 sync_paramv[2];
            sync_paramv[0] = ocrGetRank();  // position in the reduction tree
            sync_paramv[1] = 1;             // initial stride
            // Compute EDTs on a rank share paramv, so the next one sees phase 1.
            paramv[0] = 1;
            ocrSync( SUM_REDUCTION, depv[0].guid, (ocrGetRank() == 0 ? TRUE : FALSE), 2, sync_paramv );
        } else if (ocrGetRank() == 0) {
            // Phase 1: only rank 0 requested the result.
            ocrEdtDep_t result;
            ocrSyncResultGet( &result );
            return result.guid;
        }
        return NULL_GUID;
    }

    ocrGuid_t sumRedFunc( u32 paramc, u64* paramv, u32 depc, ocrEdtDep_t depv[] ) {
        u64 myRank = ocrGetRank(), numRanks = ocrGetRanks();
        if (paramv[1] != 1) {
            // reduce: depv[0] = depv[0] + depv[1], using the partial sum received
            // in slot 1. Done before the parity test so a rank that is about to
            // send also forwards what it has received.
            *(u64 *)depv[0].ptr += *(u64 *)depv[1].ptr;
        }
        if ((paramv[0] % 2) == 0) {
            u64 srcRank = myRank + paramv[1];
            if (srcRank < numRanks) {
                ocrRecv( srcRank, 1 );
                paramv[0] /= 2;
                paramv[1] *= 2;
                ocrNext();
            }
            // No partner left at this stride: this rank holds the final result.
        } else {
            u64 dstRank = myRank - paramv[1];
            // ASSERT(dstRank < numRanks);
            ocrSend( dstRank, 1, depv[0].guid );
            ocrCompute();
        }
        ocrSyncResultPut( depv[0].guid );
        ocrCompute();  // return to the compute phase so the result can be fetched
        return NULL_GUID;
    }
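The stride-doubling exchange in the sum-reduction example can be checked without any runtime by replaying it sequentially. The sketch below is plain C, not OCR code: `tree_sum_reduce` is a hypothetical helper, and each array element stands in for the one-element DB held by a rank.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sequentially replays the stride-doubling pattern. In the round with
 * stride s, every rank r that is a multiple of 2*s receives from rank
 * r + s (if that rank exists) and adds the value to its own.
 * vals is overwritten with partial sums; rank 0 ends up with the total. */
static uint64_t tree_sum_reduce(uint64_t *vals, size_t numRanks) {
    assert(vals != NULL || numRanks == 0);
    if (numRanks == 0) return 0;
    for (size_t stride = 1; stride < numRanks; stride *= 2) {
        for (size_t r = 0; r + stride < numRanks; r += 2 * stride) {
            /* models a send from rank r + stride and a receive at rank r */
            vals[r] += vals[r + stride];
        }
    }
    return vals[0];
}
```

With four ranks holding 1, 2, 3 and 4, the first round folds rank 1 into rank 0 and rank 3 into rank 2, and the second round folds rank 2 into rank 0, which then holds 10. The loop bounds also cover rank counts that are not a power of two, a case the slide example does not address.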