A Concurrent Matrix Transpose Algorithm, The Implementation. Presented by Pourya Jafari.

1 A Concurrent Matrix Transpose Algorithm, The Implementation. Presented by Pourya Jafari

2 Review: Algorithm Steps
- Pre-process inside each thread
  - Shift rows
- Intra-process/thread communication
  - Shift columns
- Post-process inside each thread
  - Shift rows again
Example matrix:
00 01 02 03
10 11 12 13
20 21 22 23
30 31 32 33

3 Review: Shift values?
- Set shifts based on row index: range 0 to N-1
- Now arrange the rows so that the column shifts get us to i
  - Preprocess shifting: i’ = i - L
  - After the intra-process shift, the column index should equal the original row index i
  - i’ + j = i
  - i - L + j = i, so L = j
- So we shift each column j cells up

4 Review: Last step?
- 1 → 2: Column shift, j cells up
- 2 → 3: Row shift based on row indices
- 3 → 4: ?
- Change of indices so far
  - (i - j, j) → (i - j, i - j + j) = (i - j, i) = (m, n)
- One operation to change the row index to j
  - n - m = i - (i - j) = j
Figure: matrix states, panels labelled (1), (2-a), (2-b), (3), (4); matrices shown in extraction order:
00 01 02 03    00 11 22 33    00 01 02 03    00 11 22 33    00 10 20 30
10 11 12 13    10 21 32 03    10 11 12 13    03 10 21 32    01 11 21 31
20 21 22 23    20 31 02 13    20 21 22 23    02 13 20 31    02 12 22 32
30 31 32 33    30 01 12 23    30 31 32 33    01 12 23 30    03 13 23 33
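
The index bookkeeping on slides 3 and 4 can be checked with a small sequential sketch. It is not the concurrent implementation from the talk: the class and method names are hypothetical, all shifts are taken to be cyclic (modulo N), and the last step is written as the per-column move from row m to row n - m that the slide derives.

```java
import java.util.Arrays;

public class TransposeSketch {
    // Transposes a square matrix with the three steps reviewed on the slides.
    public static int[][] transpose(int[][] a) {
        int n = a.length;

        // Step 1 -> 2: shift column j up by j cells, so (i, j) moves to (i - j, j).
        int[][] s2 = new int[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                s2[Math.floorMod(i - j, n)][j] = a[i][j];

        // Step 2 -> 3: shift row m right by m cells, so (m, j) moves to (m, j + m).
        int[][] s3 = new int[n][n];
        for (int m = 0; m < n; m++)
            for (int j = 0; j < n; j++)
                s3[m][(j + m) % n] = s2[m][j];

        // Step 3 -> 4: inside column c, move the item in row m to row c - m.
        int[][] s4 = new int[n][n];
        for (int m = 0; m < n; m++)
            for (int c = 0; c < n; c++)
                s4[Math.floorMod(c - m, n)][c] = s3[m][c];

        return s4;
    }

    public static void main(String[] args) {
        // The 4x4 example matrix from the slides, written as two-digit "row column" labels.
        int[][] a = {{0, 1, 2, 3}, {10, 11, 12, 13}, {20, 21, 22, 23}, {30, 31, 32, 33}};
        System.out.println(Arrays.deepToString(transpose(a)));
    }
}
```

The first and last steps touch only one column each, which is why a column process can run them without communication; only the middle step moves items between columns.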

5 Review: Radix
- Using radix representation, we can group row shifts
- We use radix 2 for simplicity
  - Digits are the bits of the row index; in pass k, shift all rows whose index has its k-th bit on
Figure: the shift for each row (0 1 2 3) written as the sum of the k=0 and k=1 passes.
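
A minimal sequential sketch of the radix-2 grouping follows. It assumes that pass k shifts every selected row by 2^k cells, so that the passes add up to a shift of r cells for row r; the class and method names are hypothetical.

```java
public class RadixShiftSketch {
    // Cyclically shifts one row to the right by `amount` cells.
    static void shiftRowRight(int[][] m, int row, int amount) {
        int n = m[row].length;
        int[] shifted = new int[n];
        for (int c = 0; c < n; c++) {
            shifted[(c + amount) % n] = m[row][c];
        }
        m[row] = shifted;
    }

    // Shifts row r right by r cells in total, using one pass per bit of the row index.
    public static void shiftRowsByIndex(int[][] m) {
        int n = m.length;
        for (int k = 0; (1 << k) < n; k++) {
            for (int r = 0; r < n; r++) {
                // Pass k moves only the rows whose index has its k-th bit on.
                if ((r & (1 << k)) != 0) {
                    shiftRowRight(m, r, 1 << k);
                }
            }
        }
    }
}
```

With this grouping the number of passes grows with the number of bits in N - 1 rather than with N, and all rows selected in a pass move by the same distance.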

6 The concurrency picture
- Each thread can do pre/post processing independently
- Processes must synchronize
  - after each phase
  - after each step of the intra-process shift
  - during intra-process communications

7 Communication package (1)
- We need a means of communication that
  - Facilitates synchronized communication
  - Provides unbuffered communication to save memory
- JCSP: based on the algebra of Communicating Sequential Processes (CSP)
  - Has a strong theoretical background
  - Object oriented

8 Communication package (2)
JCSP provides
- One2OneChannel: a single sender can send and a single receiver can receive
- One2AnyChannel: a single sender and many receivers can communicate, but only one receiver at a time
- Any2OneChannel: multiple senders and one receiver

9 Classes (1)
CProcess: Column process
- Has a PID; knows N; has an array to save its items
- One2OneChannel to each other process for the intra-process shift operation
- One2AnyChannel to MProcess to receive start/resume calls
- Any2OneChannel to MProcess to signal that this CProcess has finished the current step

10 Classes (2)
MProcess: Master Process
- One2AnyChannel and Any2OneChannel to every CProcess
- Synchronizes the phases and the intra-process communication by waiting for all CProcesses to finish the current phase and then resuming them for the next phase
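
The start/finish handshake between the master and the column processes can be sketched with plain JDK classes. This is not JCSP code: `java.util.concurrent.SynchronousQueue` stands in for the unbuffered One2AnyChannel and Any2OneChannel, and the class name, method name and log format are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.SynchronousQueue;

public class PhaseSyncSketch {
    // Runs `workers` column-like threads through `phases` phases and returns their log.
    public static List<String> run(int workers, int phases) {
        // Stand-in for the One2AnyChannel: the master writes, any one worker reads.
        SynchronousQueue<Integer> start = new SynchronousQueue<>();
        // Stand-in for the Any2OneChannel: any worker writes, the master reads.
        SynchronousQueue<Integer> done = new SynchronousQueue<>();
        List<String> log = Collections.synchronizedList(new ArrayList<>());

        Thread[] threads = new Thread[workers];
        for (int p = 0; p < workers; p++) {
            final int pid = p;
            threads[p] = new Thread(() -> {
                try {
                    for (int i = 0; i < phases; i++) {
                        int phase = start.take();              // wait for start/resume
                        log.add("phase " + phase + " pid " + pid); // the phase's work goes here
                        done.put(pid);                         // signal "finished current step"
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[p].start();
        }

        try {
            for (int phase = 0; phase < phases; phase++) {
                for (int i = 0; i < workers; i++) start.put(phase); // resume every worker
                for (int i = 0; i < workers; i++) done.take();      // wait for all of them
            }
            for (Thread t : threads) t.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return log;
    }
}
```

Because both queues are unbuffered, a worker that finishes early blocks on its finish signal until the master has resumed everyone, so no worker can start the next phase ahead of the others.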

11 Classes (3)
Launcher: Threads driver
- Creates the channels
- Creates one MProcess and the CProcesses
- Runs them in parallel

12 Intra-process communication in CProcess
Might send/receive multiple items
- Determines the indices that need to be shifted
- Packs them in the form of a message
- Sends the message to the next CProcess and receives from the previous process in the shift chain
- Unpacks the received message
- Assigns the items inside to the same indices determined in the first step
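
The pack, send, unpack sequence above can be sketched as follows. The exchange is simulated sequentially instead of over channels, and the sketch assumes the radix-2 scheme from the review slides: in pass k a process packs the cells whose row index has its k-th bit on and sends them 2^k processes ahead. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class ShiftMessageSketch {
    // Row indices whose k-th bit is on: the cells a column process shifts in pass k.
    public static List<Integer> indicesFor(int n, int k) {
        List<Integer> idx = new ArrayList<>();
        for (int r = 0; r < n; r++) {
            if ((r & (1 << k)) != 0) idx.add(r);
        }
        return idx;
    }

    // Packs the selected cells of one column into a message.
    public static int[] pack(int[] column, List<Integer> idx) {
        int[] message = new int[idx.size()];
        for (int i = 0; i < message.length; i++) message[i] = column[idx.get(i)];
        return message;
    }

    // Writes a received message back into the same indices.
    public static void unpack(int[] column, List<Integer> idx, int[] message) {
        for (int i = 0; i < message.length; i++) column[idx.get(i)] = message[i];
    }

    // One pass, simulated sequentially; columns[p] is the array held by process p.
    public static void pass(int[][] columns, int k) {
        int n = columns.length;
        int h = 1 << k;                        // distance to the next process in the chain
        List<Integer> idx = indicesFor(n, k);
        int[][] messages = new int[n][];
        for (int p = 0; p < n; p++) messages[p] = pack(columns[p], idx);
        for (int p = 0; p < n; p++) unpack(columns[(p + h) % n], idx, messages[p]);
    }
}
```

Every process computes the same index list for a given k, so the receiver can place the items without any index information travelling in the message.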

13 UML Diagram

14 The Intraprocess Shift
Synchronized send and then receive
- A cycle might form
  - All CProcesses will go to the send state and wait for the next CProcess to receive
  - None of the CProcesses receives -> Deadlock

15 The Shift Cycle (1)
- One CProcess in the cycle should receive first to break the cycle
- But it would lose the value it still has to send, so it
  - Receives and buffers the incoming value
  - Sends, and then assigns the buffered value to the relevant array cell

16 The Shift Cycle (2)
- Cycles happen when the interleaving value h divides N
- We do a buffered read for all numbers less than h

17 The Shift Cycle (3)
- Even after this, the program runs into deadlock again
- Cycles form when gcd(h, N) is greater than 1
- Must buffer values less than or equal to gcd(h, N)
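
The deadlock and the buffered-receive fix can be reproduced with a small threaded sketch. It uses `java.util.concurrent.SynchronousQueue` as a stand-in for an unbuffered One2OneChannel, not the JCSP API, and shifts one value per process by h positions with 0 < h < N. As an assumption of this sketch, the processes that receive first are those with PID strictly below gcd(h, N), which puts exactly one such process in each cycle; the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.SynchronousQueue;

public class ShiftCycleSketch {
    static int gcd(int a, int b) { return b == 0 ? a : gcd(b, a % b); }

    // Shifts values[p] to position (p + h) % n using one thread per process.
    // Returns the shifted values, or null if the threads were still blocked after one second.
    public static int[] shift(int[] values, int h, boolean breakCycles) {
        int n = values.length;
        int g = gcd(h, n);
        int[] result = values.clone();
        List<SynchronousQueue<Integer>> inbox = new ArrayList<>();
        for (int p = 0; p < n; p++) inbox.add(new SynchronousQueue<>());

        Thread[] threads = new Thread[n];
        for (int p = 0; p < n; p++) {
            final int pid = p;
            threads[p] = new Thread(() -> {
                SynchronousQueue<Integer> out = inbox.get((pid + h) % n); // to the next process
                SynchronousQueue<Integer> in = inbox.get(pid);            // from the previous one
                try {
                    if (breakCycles && pid < g) {
                        int buffered = in.take();  // receive first and buffer the value
                        out.put(values[pid]);      // then send our own value
                        result[pid] = buffered;    // finally store the buffered value
                    } else {
                        out.put(values[pid]);      // synchronized send ...
                        result[pid] = in.take();   // ... and then receive
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[p].setDaemon(true);
            threads[p].start();
        }

        boolean finished = true;
        long deadline = System.currentTimeMillis() + 1000;
        try {
            for (Thread t : threads) {
                long left = deadline - System.currentTimeMillis();
                if (left > 0) t.join(left);
                if (t.isAlive()) finished = false;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            finished = false;
        }
        for (Thread t : threads) t.interrupt(); // release any thread that is still blocked
        return finished ? result : null;
    }
}
```

Calling `shift(values, h, false)` makes every thread block in its send, which is the deadlock described on slide 14; with `breakCycles` set, each cycle unwinds backwards from its single receive-first process.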

18 Results

