An Associative Program for the MST Problem Part 2 of Associative Computing.


2 Overview In this set of slides, we will explore an alternate associative algorithm for the minimal spanning tree (MST) problem. Only slides with “light blue” titles will be covered in class. –The other slides are reference slides so that students can obtain an overview of the ASC language. As mentioned earlier, Professor Potter developed an associative programming language called ASC and a simulator for this language. –ASC has also been implemented on 3-4 SIMD computers. We will treat the ASC code for the MST included here as a detailed pseudocode description of this algorithm. The goal of this set of slides is to prepare students to write a Cn (ClearSpeed) program for this algorithm.

3 Content Covered in Light Blue References The MST example and background Variables and Data Types Operator Notation Input and Output Mask Control Statements Loop Control Statements Accessing Values in Parallel Variables Performance Monitor Subroutines and other topics Basic Program Structure Software Location & Execution Procedures An Online ASC Program & Data File to Execute ASC Code for the MST algorithm for a directed graph The Shortest Path homework problem

4 References “ASC Primer” by Professor Jerry Potter is the primary reference for basic ASC –A copy is posted on lab website under “software” Lab website is at www.cs.kent.edu/~parallel/ “Associative Computing” book by Jerry Potter has a lot of additional information about the ASC language. Both references use a directed-graph version of the Minimal Spanning Tree as an important example.

5 Features of Potter’s MST Algorithm Both versions of MST are based on Prim’s sequential MST algorithm –In most algorithm books (e.g., see Baase et al. in references) A drawback of Potter’s version is that it requires 1 PE for each graph edge, which in the worst case is $n^2 - n = \Theta(n^2)$ –Unlike the earlier MST algorithm, this is not an “optimal cost” parallel algorithm An advantage is that it works for directed graphs –The earlier MST algorithm covered might possibly be extended to work for directed graphs. Uses less memory for most graphs than the earlier algorithm –True especially for sparse graphs –Often will require a total of only $O(n)$ memory locations, since the memory required for each PE is a small constant. –In the worst case, at most $O(n^2)$ memory locations are needed. –The earlier algorithm always requires $\Theta(n^2)$ memory locations, as each PE stores a row of the adjacency matrix.

6 Representing the Graph in a Procedural Language We need to find edges that are incident to a node of the graph. What kind of data structure could be used to make this easy? Typically there are two choices: –An adjacency matrix Label the rows and columns with the node names. Put the weight w in row i and column j if node i is adjacent to node j via an edge of weight w. Doing this, we would use the representation of the graph in the problem as follows...

7 Graph Example for MST [Figure: weighted graph on the nodes A through I, with a weight drawn on each of its 15 edges.]

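8 Adjacency Matrix For Preceding Graph
    A  B  C  D  E  F  G  H  I
A   .  2  .  .  .  7  3  .  .
B   2  .  4  .  .  .  6  .  .
C   .  4  .  2  .  .  .  2  .
D   .  .  2  .  1  .  .  8  .
E   .  .  .  1  .  6  .  .  2
F   7  .  .  .  6  .  .  .  5
G   3  6  .  .  .  .  .  3  1
H   .  .  2  8  .  .  3  .  4
I   .  .  .  .  2  5  1  4  .
(A dot marks a pair of nodes with no edge.)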
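As a rough illustration of the matrix representation, the following Python sketch builds the same matrix from an edge list. The names `EDGES`, `NODES`, and `adjacency_matrix` are hypothetical, and using 0 to mean "no edge" is an assumption of this sketch, not something the slides specify.

```python
# Undirected weighted edges of the example graph (weights as in the matrix on this slide).
EDGES = [
    ("A", "B", 2), ("A", "F", 7), ("A", "G", 3), ("B", "C", 4), ("B", "G", 6),
    ("C", "D", 2), ("C", "H", 2), ("D", "E", 1), ("D", "H", 8), ("E", "F", 6),
    ("E", "I", 2), ("F", "I", 5), ("G", "H", 3), ("G", "I", 1), ("H", "I", 4),
]
NODES = "ABCDEFGHI"


def adjacency_matrix(nodes, edges):
    """Return an n x n weight matrix; 0 marks 'no edge' (assumption of this sketch)."""
    index = {name: k for k, name in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for u, v, w in edges:
        matrix[index[u]][index[v]] = w
        matrix[index[v]][index[u]] = w  # undirected: store both directions
    return matrix


matrix = adjacency_matrix(NODES, EDGES)
print(matrix[6])  # row G: [3, 6, 0, 0, 0, 0, 0, 3, 1]
```

The matrix always takes n x n entries regardless of how many edges exist, which is the space cost the later slides contrast with adjacency lists.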

9 An Alternative Useful Representation for Sequential Algorithms Another possibility is to use adjacency lists, which can allow some additional flexibility for this problem in representing the rest of the data – namely the sets v1, v2, and v3. It is in these types of representations that we see pointers or references play a role. We link off of each node all of the nodes which are adjacent to it, keeping them in increasing order by label.

10 Adjacency Lists for the Graph in the Problem
A → B2 → F7 → G3
B → A2 → C4 → G6
ETC.....
G, H, and I will have 4 entries; all others have 3. In each list, the nodes are in increasing order by node label. Note: if the node label ordering is clear, the A, B,... need not be stored.

11 Adding the Other Information Needed While Finding the Solution Consider one of the states during the run, right after the segment AG is selected: V1 A B G V2 C F I H How will this data be maintained?

12 Cont. We need to know: –Which set each node is in. –What is each node’s parent in the tree formed below by both collections. –A list of the candidate nodes. V2 V1 A B G C F I H

13 A Typical Data Structure Used For This Problem Is Shown [Figure: table with one row per node A through I, holding the node’s set number, parent, edge weight, and V2 link; V2lnk points to I.] V2 elements are linked via yellow entries with V2lnk the head and an end marker as the tail: I → H → C → F. Light blue boxes appeared in earlier states, but are no longer in use. Red entries say what set the node is in. Green entries give parent of node and orange entries give edge weights. The adjacency lists are not shown, but are linked off to right.

14 I Is Now Selected and We Update [Figure: the same table with I’s set value changed to 1; V2lnk still points to I.] I is now in V1 so change its set value to 1. Look at nodes adjacent to I: E, F, G, H and add them to V2 if they are in V3: E is added...

15 E was Just Added to V2 [Figure: the same table with E’s link set to H and V2lnk pointing to E.] Store I’s link H in E’s position and E in V2lnk. This makes I’s entry unreachable. So V2 is now E → H → C → F. Now we have to add relevant edges for I to any node in V2.

16 Add Relevant Edges For I to V2 Nodes [Figure: the same table with E’s entry now I, 2 and F’s entry changed from A, 7 to I, 5.] Walk I’s adjacency list: E → F → G → H. E was just added so select EI with weight 2. wgt(FI) = 5 < wgt(FA) = 7, so drop FA and add FI (see black blocks). G is in V1 so don’t add GI. wgt(HI) = 4 > wgt(HG) = 3, so no change. This is now ready for next round.

17 Complexity Analysis (Time and Space) for Prim’s Sequential Algorithm Assume –the preceding data structure is used. –The number of nodes is n –The number of edges is m Space used is 4n plus the space for adjacency lists. –The adjacency lists are $\Theta(m)$, which in the worst case is $\Theta(n^2)$ –This data structure sacrifices space for time. Time is $\Theta(n^2)$ in the worst case. –The adjacency list of each node is traversed only once, when the node is added to the tree. The total work of comparing weights and updating the chart during all of these traversals is $\Theta(m)$. –There are n-1 rounds, as one tree node is selected each round. –Walking the V2 list to find the minimum could require n-1 steps the first round, n-2 the second, etc., for a maximum of $\Theta(n^2)$ steps.
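The rounds analyzed above can be sketched in Python as follows. This is a simplified illustration, not the linked-list structure from the slides: dictionaries stand in for the set, parent, and weight columns, V2 is found by scanning all nodes, and the graph is assumed to be connected. All names (`EDGES`, `build_adjacency`, `prim`) are hypothetical.

```python
from collections import defaultdict

EDGES = [
    ("A", "B", 2), ("A", "F", 7), ("A", "G", 3), ("B", "C", 4), ("B", "G", 6),
    ("C", "D", 2), ("C", "H", 2), ("D", "E", 1), ("D", "H", 8), ("E", "F", 6),
    ("E", "I", 2), ("F", "I", 5), ("G", "H", 3), ("G", "I", 1), ("H", "I", 4),
]


def build_adjacency(edges):
    """Adjacency lists: node -> list of (neighbor, weight)."""
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    return adj


def prim(adj, start):
    """Return the tree edges (parent, node, weight) in the order they are selected."""
    state = {v: 3 for v in adj}    # 1 = in tree (V1), 2 = candidate (V2), 3 = unseen (V3)
    parent = {v: None for v in adj}
    best = {v: None for v in adj}  # weight of the cheapest known edge into v
    state[start] = 1
    current = start
    tree = []
    for _ in range(len(adj) - 1):            # n-1 rounds
        for nbr, w in adj[current]:          # each adjacency list is walked once
            if state[nbr] == 3 or (state[nbr] == 2 and w < best[nbr]):
                state[nbr], parent[nbr], best[nbr] = 2, current, w
        candidates = [v for v in adj if state[v] == 2]
        current = min(candidates, key=lambda v: best[v])  # linear scan for the minimum
        state[current] = 1
        tree.append((parent[current], current, best[current]))
    return tree


tree = prim(build_adjacency(EDGES), "A")
print(tree[:3])                      # [('A', 'B', 2), ('A', 'G', 3), ('G', 'I', 1)]
print(sum(w for _, _, w in tree))    # 18
```

The first three selections (AB, AG, GI) match the states walked through on the earlier slides; the linear scan inside each of the n-1 rounds is where the quadratic worst-case time comes from.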

18 Alternate ASC Implementation of Prim’s Algorithm using this Approach After setting up a data structure for the problem, we now need to code it by manipulating each state as we did on the preceding slides. The ASC model provides an easier approach. Recall that ASC does NOT support pointers or references. The associative searching replaces the need for these. Recall, we collectively think of the PE processor memories as a rectangular structure consisting of multiple records. We will next introduce basic features of the ASC language in order to implement this algorithm.

19 Structuring the MST Data for ASC There are 15 bidirectional edges in the graph, or 30 directed edges in total. Each directed edge will have a head and a tail. So, the bidirectional edge AB will be represented twice: once as having head A and tail B and once as having head B and tail A. We will use 30 processors, and in each PE’s memory we will store an edge representation with the fields: head | tail | weight | state. State is 0, 1, 2, or 3 and will be explained shortly.
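To make the one-record-per-PE layout concrete, here is a small Python sketch that expands the 15 bidirectional edges into 30 directed edge records. The `EdgeRecord` class and the initial state value of 0 are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class EdgeRecord:
    """One record per PE (hypothetical stand-in for the parallel fields)."""
    tail: str
    head: str
    weight: int
    state: int = 0  # assumed initial value; the algorithm assigns states later


UNDIRECTED = [
    ("A", "B", 2), ("A", "F", 7), ("A", "G", 3), ("B", "C", 4), ("B", "G", 6),
    ("C", "D", 2), ("C", "H", 2), ("D", "E", 1), ("D", "H", 8), ("E", "F", 6),
    ("E", "I", 2), ("F", "I", 5), ("G", "H", 3), ("G", "I", 1), ("H", "I", 4),
]

records = []
for u, v, w in UNDIRECTED:
    records.append(EdgeRecord(u, v, w))  # stored once in each direction
    records.append(EdgeRecord(v, u, w))

print(len(records))  # 30
```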

20 ASC Data Types and Variables ASC has nine data types… –int (i.e., integer), real, hex (i.e., base 16), oct (i.e., base 8), bin (i.e., binary), card (i.e., cardinal), char (i.e., character), logical, index. –Card is used for unsigned integer data. Variables can either be scalar or parallel.

21 ASC Parallel Variables Parallel variables reside in the memory of individual processors. Consequently, tail, head, weight, and state will be parallel variables. In ASC, parallel variables are declared using an array-like notation, with $ in index: char parallel tail[$], head[$]; int parallel weight[$], state[$];

22 ASC Scalar and Index Variables Scalar variables in ASC reside in the IS (i.e., the front end computer), not in the PE’s memories. They are declared as char scalar node; Index variables in ASC are used to manipulate the index (i.e. choice of an individual processor) of a field. For example, graph[xx] They are declared as: index parallel xx[$]; They occupy 1 bit of space per processor

23 Logical Variables and Constants Logical variables in ASC are boolean variables. They can be scalar or parallel. –ASC does not formally distinguish between the index parallel and logical parallel variables –The correct type should be selected, based on usage. If you prefer to work with the words TRUE and FALSE, you can define logical constants by deflog (TRUE, 1); deflog (FALSE, 0); Constant scalars can be defined by define (identifier, value);

24 Logical Parallel Variables needed for MST These are defined as follows: logical parallel nxtnod[$], graph[$], result[$]; The use of these will become clear in later slides. For the moment, recognize they are just bit variables, one for each PE.

25 Array Dimensions A parallel variable can have up to 3 dimensions –First dimension is “$”, the parallel dimension The array numbering is zero-based, so the declaration int parallel A[$,2] creates the following one-dimensional variables: A[$,0], A[$,1], A[$,2]

26 Mixed Mode Operations Mixed mode operations are supported and their result has the “natural” mode. For example, given declarations int scalar a, b, c; int parallel p[$], q[$], r[$], t[$,4]; index parallel x[$], y[$]; then c = a + b is a scalar integer; q[$] = a + p[$] is a parallel integer variable; a + p[x] is an integer value; r[$] = t[x,2] + 3*p[$] is a parallel integer variable; x[$] = p[$] .eq. r[$] is an index parallel variable. More examples are given on pages 9-10 of the ASC Primer.

27 The Memory Layout for MST As with most programming languages, the order of the declarations determines the order in which the variables are identified in memory. To illustrate, suppose we declare for MST char parallel tail[$], head[$]; int parallel weight[$], state[$]; int scalar node; index parallel xx[$]; logical parallel nxtnod[$], graph[$], result[$]; The layout in the memories is given on next slide. Integers default to the word size of the machine so ours would be 32 bits.

28 The Memory Layout for MST PE 0 1 2 3 4 p-1 p tail head weight state xx nxt gr res Last 4 are bit fields. The last 3 are named: nxtnod graph result

29 Operator Notation
Relational and Logical Operators: the original syntax came from FORTRAN, and the examples in the ASC Primer use that syntax. However, the more modern syntax is also supported:
.lt.   <        .not.  !
.gt.   >        .or.   ||
.le.   <=       .and.  &&
.ge.   >=       .xor.  --
.eq.   ==
.ne.   !=
Arithmetic Operators: addition +, multiplication *, division /

30 Parallel Input in ASC Input for parallel variables can be interactive or from a data file in ASC. We will run in a command window so file input will be handled by redirection If you are not familiar with command window handling or Linux (Unix), this will be shown. In either case, the data is entered in columns just like it will appear in the read command. –Do not use tabs. –THE LAST LINE MUST BE A BLANK LINE!

31 Parallel read and Associate Command The format of the parallel read statement is read parvar1, parvar2,... in <logical parallel variable> The command only works with parallel variables, not scalars. Input variables must be associated with a logical parallel variable before the read statement. The logical variable is used to indicate which PEs were used on input. After the read statement, the logical parallel variable will be true (i.e., 1) for all processors holding input values.

32 Parallel Input in ASC The associate command and the read command for MST would be: associate head[$], tail[$], weight[$], state[$] with graph[$]; read tail[$], head[$], weight[$] in graph[$]; Blanks can be used rather than commas, as indicated by the MST example on pg 35 of the Primer. Commenting Code: /* This is the way to comment code in ASC */

33 Input of Graph
Suppose we were just entering the data for AB, AG, AF, BA, BC, and BG. Order is not important, but the data file would look like:
A B 2
A G 5
A F 9
B A 2
B C 4
B G 6
(blank line)
and memory would look like:
tail  head  weight  graph
A     B     2       1
A     G     5       1
A     F     9       1
B     A     2       1
B     C     4       1
B     G     6       1
0     0     0       0

34 Scalar variable input Static input can be handled in the code. Also, define or deflog statements can be used to handle static input. Dynamic input is currently not supported directly, but can be accomplished as follows: –Reserve a parallel variable dummy (of desired type) for input. –Reserve a logical parallel variable used and a parallel index variable x. –Values to be stored in scalar variables are first read into dummy using a parallel read and then transferred using get or next to the appropriate scalar variable. –Example: read dummy[$] in used[$]; get x in used[$] scalar-variable = dummy[x]; endget x;

35 Input Summary Scalar input is not directly supported. Scalars can be set as constants or can be set during execution using various commands. –We will see this shortly We will be able to output scalar variables –This will also be handy for debugging purposes. The main problem on input is to remember to include the blank line at the end. I suggest always printing your input data initially so you see it is going in properly.

36 Parallel Variable Output The format of the parallel print statement is print parvar1, parvar2,... in <logical parallel variable> Again, variables to be displayed must be associated with a logical parallel variable first. You can use the same association as for the read command: associate tail[$], head[$], weight[$] with graph[$]; read tail[$], head[$], weight[$] in graph[$]; print tail[$], head[$], weight[$] in graph[$]; You can use a logical parallel variable that has been set with another statement, like an IF statement, to control which PEs will output data.

37 MST Example Suppose state[$] holds information about whether a node is in V1, V2, etc. Then, you could set up an association by if (state[$] == 1) then result[$] = TRUE; endif; You can print with this association as follows: print tail[$], head[$], weight[$] in result[$]; –Only those records where state[$] == 1 would be printed.

38 Output Using “msg” The msg command –Used to display user text messages. –Used to display values of scalar variables. –Used to display a dump of the parallel variables. The entire parallel variable contents printed Status of active responders or association variables ignored Format: msg “string” list; msg “The answers are” max BB[X] B[$]; See Page 13-14 of ASC Primer

39 Assignment Statements Assignment can be made with compatible expressions using the equal sign with –scalar variables –parallel variables –logical parallel variables The data types normally have to be the same on both sides of the assignment symbol – i.e. don’t mix scalar and parallel variables. A few special cases are covered on the next slide

40 Some Assignment Statement Special Cases Declarations for Examples: int scalar k; int parallel aa[$], b[$]; index parallel xx[$]; If xx is an index variable with a 1 in at least one of its components, then the following is valid: k = aa[xx] + 5; –Here, the component of aa used is one where xx is 1. –While selection is arbitrary (e.g., pick-one), this implementation selects the smallest index where xx[$] is 1. The assignment of integer arithmetic expressions to integer parallel variables is supported. b[xx] = 3 + 5; –This statement assigns an 8 to the “xx component” of b. –The component selected is identified by the first “1” in xx. See pg 9-10 of Primer for more examples.

41 Example
aa[$] = b[$] + c[$];
Before:
mask  aa[$]  b[$]  c[$]
1     2      3     4
1     3      5     3
0     2      4     -3
0     6      4     1
1     2      -3    -6
After:
mask  aa[$]  b[$]  c[$]
1     7      3     4
1     8      5     3
0     2      4     -3
0     6      4     1
1     -9     -3    -6
Note: As an article, “a” is a reserved word in ASC and so it can’t be used as a variable name. (see ASC Primer, pgs 29-30 and 39)
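The masked parallel assignment in this example can be mimicked sequentially. The Python sketch below is only a simulation (lists stand in for parallel fields, and `masked_add` is a hypothetical helper): each position plays the role of one PE, and only PEs whose mask bit is 1 store the result.

```python
def masked_add(mask, aa, b, c):
    """Return aa after aa[$] = b[$] + c[$], applied only where mask is 1."""
    return [bi + ci if m else ai for m, ai, bi, ci in zip(mask, aa, b, c)]


mask = [1, 1, 0, 0, 1]
aa = [2, 3, 2, 6, 2]
b = [3, 5, 4, 4, -3]
c = [4, 3, -3, 1, -6]

aa = masked_add(mask, aa, b, c)
print(aa)  # [7, 8, 2, 6, -9]
```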

42 Setscope Mask Control Statement Format: setscope <logical parallel expression> body endsetscope; Resets the parallel mask register –setscope jumps out of current mask setting to the new mask given by its logical parallel variable. –One use is to reactivate currently inactive processors. –Also allows an immediate return to a previously calculated mask, such as an association. –Is an unstructured command, like go-to, and jumps from the current environment to a new environment. Use sparingly –endsetscope resets mask to preceding setting.

43 Example
logical parallel used[$];
...
used[$] = aa[$] == 5;
setscope used[$]
    tail[$] = 100;
endsetscope;
Before setscope:
mask  aa  used  tail
1     5   1     7
1     22  0     6
1     5   1     9
0     41  0     7
After setscope (mask = used):
aa  mask  tail
5   1     100
22  0     6
5   1     100
41  0     7
After endsetscope:
aa  mask  tail
5   1     100
22  1     6
5   1     100
41  0     7
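A sequential Python sketch of the same setscope example follows. It is a simulation under the assumption that the mask can be modeled as a plain list that is saved and restored; the variable names mirror the slide but the code is not ASC.

```python
aa = [5, 22, 5, 41]
tail = [7, 6, 9, 7]
mask = [1, 1, 1, 0]

used = [int(v == 5) for v in aa]   # used[$] = aa[$] == 5

saved_mask = mask                  # setscope: remember the current mask...
mask = used                        # ...and jump to the new one
tail = [100 if m else t for m, t in zip(mask, tail)]   # tail[$] = 100 under the new mask
mask = saved_mask                  # endsetscope: restore the preceding mask

print(tail)  # [100, 6, 100, 7]
print(mask)  # [1, 1, 1, 0]
```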

44 The Scalar IF Statement Scalar IF – similar to what you have used before – i.e. a branching statement – with the else part optional. Example: int scalar k;... if k == 5 then sum =0; else b = sum; endif;

45 The Parallel IF Mask Control Statement Looks like scalar IF except instead of a scalar logical expression, a parallel logical expression is encountered. Format: if <logical parallel expression> then <body> [ else <body> ] endif Although it looks similar, the execution is considerably different. –The parallel version normally executes both “bodies”, each for the appropriate processors Useful as a parallel search control statement

46 Operation Steps of Parallel IF 1) Save the mask bit of processors that are currently active. 2) Broadcast code to the active processors to calculate the IF boolean expression. 3) If the boolean expression is true for an active processor, set its individual cell mask bit to TRUE; otherwise set its mask bit to FALSE. 4) Broadcast code for the “then” portion of the IF statement and execute it on the (TRUE) responders. 5) Complement the mask bits for the processors that were active at step 1. Ones originally FALSE remain FALSE. 6) Broadcast code for the “else” portion of the IF statement and execute it on the active processors. 7) Reset the mask to the original mask at Step 1.
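The seven steps above can be traced in a short Python simulation. This is a sketch under simplifying assumptions: a single field is updated, the two bodies are plain constant assignments, and `parallel_if` is a hypothetical helper. The data values are illustrative.

```python
def parallel_if(mask, values, test, then_value, else_value):
    """Simulate a parallel IF that assigns a constant in each body."""
    saved = list(mask)                                                     # step 1
    then_mask = [bool(m) and test(v) for m, v in zip(saved, values)]       # steps 2-3
    values = [then_value if t else v for t, v in zip(then_mask, values)]   # step 4
    else_mask = [bool(m) and not t for m, t in zip(saved, then_mask)]      # step 5
    values = [else_value if e else v for e, v in zip(else_mask, values)]   # step 6
    return values                      # step 7: the caller's mask was never modified


b = parallel_if([1, 1, 1, 0], [1, 7, 1, 1], lambda v: v == 1, 2, -1)
print(b)  # [2, -1, 2, 1]
```

Note that the else mask is derived by complementing the then mask, not by re-evaluating the condition; otherwise the PEs just set to 2 would also run the else body.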

47 Example Before: b mask 1 1 7 1 2 1 1 1 1 0 After: b then mask else mask 2 10 -1 01 2 10 1 00 if (b[$] == 1) then b[$] = 2; else b[$] = -1; endif;

48 IF – (ELSE-NOT-ANY) Format: if <logical parallel expression> then body of if elsenany body of “elsenany” endif; Note this is an “if” statement with an embedded ELSENANY clause. Either responders to “if” execute “if-body” or else all active processors execute “elsenany-body”. While this extension is occasionally useful, one could get by with just the “any” command –The “any” command is covered in the next construct.

49 The IF-ELSENANY Mask Control Statement Only one part of this IF statement is executed. Useful as a parallel search control statement Steps 1) Evaluate the conditional statement. 2) If there are one or more active responders, execute the “then” block. 3) If there are no active responders, the ELSE-NOT-ANY (ELSENANY) block is executed. 4) When executing the ELSENANY part, the original mask is used – i.e. the one prior to the IF-ELSENANY statement.

50 Example (recall: the “then” part uses the mask set by the IF)
if aa[$] > 1 && aa[$] < 4 then      /* sets mask */
    if b[$] == 12 then c[$] = 1;    /* search for b == 12 */
    elsenany c[$] = 9;              /* action if no b is 12 */
    endif;
endif;
Before:
aa  b   c
1   17  0
2   13  0
2   8   0
3   12  0
2   9   0
4   67  0
0   0   0
0   12  0
After:
mask1  mask2  aa  b   c
0      0      1   17  0
1      0      2   13  0
1      0      2   8   0
1      1      3   12  1
1      0      2   9   0
0      0      4   67  0
0      0      0   0   0
0      0      0   12  0

51 Example (recall: the ELSENANY part uses the original mask)
if aa[$] > 1 && aa[$] < 4 then      /* sets mask */
    if b[$] == 12 then c[$] = 1;    /* search for b == 12 */
    elsenany c[$] = 9;              /* action if no b is 12 */
    endif;
endif;
Before:
aa  b   c
1   17  0
2   13  0
2   8   0
3   4   0
2   9   0
4   67  0
0   0   0
0   12  0
After:
mask1  mask2  aa  b   c
0      0      1   17  0
1      0      2   13  9
1      0      2   8   9
1      0      3   4   9
1      0      2   9   9
0      0      4   67  0
0      0      0   0   0
0      0      0   12  0
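Both IF-ELSENANY examples can be reproduced with a small Python simulation. The helper `if_elsenany` is hypothetical and handles only this shape of statement (an equality search on one field and a constant assignment in each body); it is a sketch, not a general model of ASC.

```python
def if_elsenany(mask, b, c, target, then_value, else_value):
    """Assign then_value at responders, or else_value at every active PE if none respond."""
    responders = [bool(m) and bv == target for m, bv in zip(mask, b)]
    if any(responders):
        return [then_value if r else cv for r, cv in zip(responders, c)]
    return [else_value if m else cv for m, cv in zip(mask, c)]  # original mask is used


aa = [1, 2, 2, 3, 2, 4, 0, 0]
mask1 = [int(1 < v < 4) for v in aa]           # outer IF sets the mask

b_first = [17, 13, 8, 12, 9, 67, 0, 12]        # one active PE holds 12
b_second = [17, 13, 8, 4, 9, 67, 0, 12]        # the only 12 is in an inactive PE

c_first = if_elsenany(mask1, b_first, [0] * 8, 12, 1, 9)
c_second = if_elsenany(mask1, b_second, [0] * 8, 12, 1, 9)
print(c_first)   # [0, 0, 0, 1, 0, 0, 0, 0]
print(c_second)  # [0, 9, 9, 9, 9, 0, 0, 0]
```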

52 The ANY Mask Control Statement Format: any <logical parallel expression> body [elsenany body] endany; “ANY” is the primary construct used in ASC to support the “AnyResponders” associative property –The body of “ANY” is executed by all active processors if any data item satisfies the conditional statement. The “ELSENANY” provides a sometimes useful but non-essential extension of the “ANY” command.

53 The ANY Statement Used to search for data items that satisfy the conditional expression. There must be at least one responder for the body statement to be performed. If there are no responders, the ANY statement does nothing unless an ELSENANY is used. The mask used to execute the ANY body is the original mask prior to the ANY statement. Consequently, all active processors are affected if the conditional expression of the ANY evaluates to TRUE. If there are no responders, then the body of ELSENANY is executed by all active processors.

54 Example
if aa[$] > 7 then        /* set mask */
    any aa[$] == 10
        b[$] = 11;
    endany;
endif;
Before:
mask  aa  b
1     3   0
0     9   0
1     16  0
1     10  0
1     8   0
0     0   0
1     0   0
After:
mask  aa  b
0     3   0
0     9   0
1     16  11
1     10  11
1     8   11
0     0   0
0     0   0
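The ANY example can be checked with a few lines of Python. As before this is a sequential simulation with hypothetical names (`any_stmt`), restricted to an equality test and a constant assignment.

```python
def any_stmt(mask, aa, b, target, value):
    """If any active PE has aa == target, every active PE executes b = value."""
    if any(m and a == target for m, a in zip(mask, aa)):
        return [value if m else bv for m, bv in zip(mask, b)]
    return list(b)


aa = [3, 9, 16, 10, 8, 0, 0]
outer = [1, 0, 1, 1, 1, 0, 1]
mask = [int(m and a > 7) for m, a in zip(outer, aa)]   # mask set by the enclosing IF

b = any_stmt(mask, aa, [0] * 7, 10, 11)
print(mask)  # [0, 0, 1, 1, 1, 0, 0]
print(b)     # [0, 0, 11, 11, 11, 0, 0]
```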

55 The Loop Control Statements Loop controlled by either a scalar test or a parallel test –LOOP-UNTIL statement Conditional is evaluated every iteration Loop controlled by a parallel test –Parallel FOR-Loop Conditional is evaluated only once –Parallel While-Loop Conditional is evaluated every iteration The FOR and WHILE loop statements are the ones normally used. –LOOP-UNTIL is included mostly for completeness.

56 The LOOP-UNTIL Statement Similar to REPEAT UNTIL loops in other languages. However, it is more flexible since the UNTIL conditional test can appear anywhere in the body of the loop. Format: first initialization loop body1 until (logical scalar expression) or (logical parallel expression) or (NANY logical parallel expression) body 2 endloop; Parallel exit conditions –The UNTIL exits when responder(s) are detected –With NANY, the UNTIL exits when a no-responder condition occurs “body 2” represents statements executed if UNTIL not satisfied.

57-60 Example
first
    i = 0;
loop
    if aa[$] == i then b[$] = b[$] + 2; endif;
    i = i + 1;
    until i > 4;
endloop;
In the pass with i = 0, the IF mask selects the two active processors with aa == 0. No processor has aa == 2, so that pass changes nothing and is not shown.
mask  aa  b before  after i=0  after i=1  after i=3  after i=4
1     0   3         5          5          5          5
1     3   4         4          4          6          6
1     0   1         3          3          3          3
0     1   3         3          3          3          3
1     1   5         5          7          7          7
1     4   6         6          6          6          8
1     5   2         2          2          2          2
Note: The example is to illustrate only; it could be done easier.

61 The Parallel FOR-LOOP FOR is used for looping and retrieving Used when a process must be repeated for each cell that satisfies a certain condition. It is similar to the sequential FOR, but the conditional logical expression must be a parallel one. Initially, the conditional expression is evaluated and the active responders are stored in an index variable.

62 The Parallel FOR-LOOP (cont) The top responder is processed during each pass through the FOR-loop until no responders remain. The contents of the index variable are updated at the bottom of the loop (i.e., the top “1” is changed to “0”) The index variable is used to walk through the responders and to retrieve each responder’s records. The conditional expression is never re-evaluated.

63 Example
sum = 0;
for xx in tail[$] != 999     /* evaluates and stores in xx */
    sum = sum + value[xx];
endfor xx;
tail  xx  value
3     1   10      1st time: sum = sum + 10 = 10
5     1   20      2nd time: sum = sum + 20 = 30
999   0   30
6     1   40      3rd time: sum = sum + 40 = 70
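The FOR-loop behaviour (evaluate the condition once, then clear the top responder after each pass) can be sketched in Python as below. The list `xx` stands in for the index variable; this is a simulation, not ASC.

```python
tail = [3, 5, 999, 6]
value = [10, 20, 30, 40]

xx = [t != 999 for t in tail]   # condition evaluated once and stored in xx
total = 0
while any(xx):
    top = xx.index(True)        # the top responder
    total += value[top]
    xx[top] = False             # bottom of loop: top "1" is changed to "0"

print(total)  # 70
```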

64 The Parallel WHILE Loop Similar to the FOR loop except it re-evaluates the conditional expression before each iteration. Format: while <index variable> in <logical parallel expression> body endwhile <index variable> The iteration terminates when there are no responders to the parallel logical expression. Note the number of responders can increase, decrease, or remain the same during a run. Unlike the FOR loop, this loop can be infinite.

65 The Parallel WHILE Loop Unlike the FOR statement, this construct re- evaluates the logical conditional statement prior to each execution of the body of the while. The bit array resulting from the evaluation of the conditional statement is assigned to the index parallel variable on each pass. The index parallel array is available for use within the body for each loop and can be changed within the body. The iteration is terminated when the conditional statement is tested and there are no responders. –That is, all zeros in the index parallel variable. See ASC Primer pg 21-22 for more information

66 Example
sumit = 0;
while xx in (aa[$] == 2)
    sumit = sumit + b[xx];
    if (c[xx] == 1) then
        if (aa[$] == 2) then aa[$] = 5; endif;
    else
        aa[xx] = 7;
    endif;
    msg "In loop, sumit is " sumit;
    print aa[$], c[$] in active[$];
endwhile xx;
Before:
aa  b   c
1   17  0
2   13  0
2   8   1
3   11  1
2   9   0
4   67  0
After 1st loop:
In loop, sumit is 13
DUMP OF ASSOCIATION ACTIVE FOLLOWS:
AA,C,
1 0
7 0
2 1
3 1
2 0
4 0
After 2nd loop:
In loop, sumit is 21
DUMP OF ASSOCIATION ACTIVE FOLLOWS:
AA,C,
1 0
7 0
5 1
3 1
5 0
4 0
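The WHILE example can be traced with the Python sketch below. It assumes, as stated on the earlier assignment slide, that a reference such as b[xx] uses the first responder in xx; the `trace` list is a hypothetical addition that records the state printed on each pass.

```python
aa = [1, 2, 2, 3, 2, 4]
b = [17, 13, 8, 11, 9, 67]
c = [0, 0, 1, 1, 0, 0]

sumit = 0
trace = []
while True:
    xx = [v == 2 for v in aa]       # condition re-evaluated before every pass
    if not any(xx):
        break                       # no responders: the loop ends
    top = xx.index(True)            # b[xx], c[xx], aa[xx] refer to the first responder
    sumit += b[top]
    if c[top] == 1:
        aa = [5 if v == 2 else v for v in aa]   # parallel: every aa == 2 becomes 5
    else:
        aa[top] = 7
    trace.append((sumit, list(aa)))

for entry in trace:
    print(entry)
# (13, [1, 7, 2, 3, 2, 4])
# (21, [1, 7, 5, 3, 5, 4])
```

The second pass removes two responders at once, which is why the loop ends after two passes even though three PEs initially held aa == 2.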

67 When is Conditional Tested in Loops? UNTIL loops evaluate the test condition each time the UNTIL statement is encountered. WHILE loops have the test condition reevaluated before each iteration. The FOR loop evaluates the conditional expression initially and stores the resulting active responders in an index variable. This index variable is then used to retrieve items successively.

68 Special Commands to Obtain Parallel Variable Values Special Commands –Get Statement –Next Statement –Minimum and maximum values These commands are needed to implement some of the associative functions. –In particular, “get” and “next” allow the programmer to select an active responder for further processing. “get” & “next” implement the “PickOne” property.

69 GET Statement Used to access a specific field in the memory of an active processor. Format: get <index variable> in <logical parallel expression> body [elsenany body] endget <index variable> The parallel logical expression is evaluated and its value assigned to the parallel index variable. The parallel index variable will identify the first active responder (if one exists) that satisfies the conditional test –first active responder executes the commands in the GET body. If there are no responders, the GET body is not executed. If GET contains an ELSENANY statement, its body is executed by all active processors when GET has no responders.

70 Example
get xx in tail[$] == 1
    val[xx] = 0;
endget xx;
Before:
tail  val
10    100
1     90
2     77
1     83
After:
tail  val
10    100
1     0
2     77
1     83

71 The NEXT Statement Similar to GET statement, except NEXT deactivates the responder accessed each time it is called. Format: next <index variable> in <logical parallel variable> body [elsenany body] endnext <index variable> Unlike GET, two successive calls to NEXT are expected to select two distinct PEs and association records. NEXT is almost always used within a looping statement to walk through the selected PEs to do something in each.

72 Example
int parallel aa[$], b[$];
logical parallel used[$];
index parallel xx[$];
used[$] = aa[$] == 4;
next xx in used[$]
    b[xx] = -1;
endnext xx;
Before:
aa  used  b
1   0     2
4   1     2
19  0     2
4   1     2
After:
aa  used  b
1   0     2
4   0     -1
19  0     2
4   1     2
Caution: xx in aa[$] == 4 is not allowed. A logical variable “used” must be involved and its top “1” is changed.
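The difference between GET and NEXT can be illustrated with two small Python helpers. Both are hypothetical stand-ins: `get` returns the position of the first responder without changing the flags, while `next_` also clears that responder's bit, as in the example above.

```python
def get(flags):
    """Return the index of the first responder, or None if there is none."""
    return flags.index(1) if 1 in flags else None


def next_(flags):
    """Like get, but also clear the selected responder's bit (PickOne with removal)."""
    pe = get(flags)
    if pe is not None:
        flags[pe] = 0
    return pe


aa = [1, 4, 19, 4]
b = [2, 2, 2, 2]
used = [int(v == 4) for v in aa]   # used[$] = aa[$] == 4

pe = next_(used)
if pe is not None:
    b[pe] = -1                     # b[xx] = -1

print(b)     # [2, -1, 2, 2]
print(used)  # [0, 0, 0, 1]
```

Calling `next_(used)` a second time would select the last PE, whereas repeated calls to `get` would keep returning the same one.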

73 Example – see next slide for results
main tryout
int scalar k;
int parallel aa[$], b[$], c[$];
logical parallel used[$], active[$];
index parallel xx[$];
associate aa[$], b[$], c[$] with active[$];
read aa[$], b[$], c[$] in active[$];
print aa[$], b[$], c[$] in active[$];   /* to see input */
/* Tryout of assignment statements */
b[$] = aa[$] + 5;
c[$] = 3 + 5;
used[$] = aa[$] == 5;   /* selects all processors with 5 in aa field */
next xx in used[$]      /* selects the top processor in “used” */
    k = b[xx] + 2;      /* could do this & next line in one line */
    c[xx] = k;          /* done this way to show a scalar can be set */
endnext xx;
print aa[$], b[$], c[$] in active[$];
end;

74 Results
b[$] = aa[$] + 5; c[$] = 3 + 5; used[$] = aa[$] == 5;
next xx in used[$] k = b[xx] + 2; c[xx] = k; endnext xx;
Before: DUMP OF ASSOCIATION ACTIVE FOLLOWS:
AA,B,C,
1 2 3
2 3 4
5 6 7
8 9 10
11 12 13
5 1 2
5 2 1
After: DUMP OF ASSOCIATION ACTIVE FOLLOWS:
AA,B,C,
1 6 8
2 7 8
5 10 12     (in used; xx is the first PE)
8 13 8
11 16 8
5 10 8      (in used)
5 10 8      (in used)

75 Printing Scalars, Text Messages, and Dumping the Entire Parallel Array for a Field
Format: msg “string” list;
Example: msg "The values are " aa[$], k;
The values are
PE 0: 0 1 2 5 8 11 5 5 0 0 0 0 0 0 0 0
PE 16: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PE 32: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PE 48: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
...
PE288: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PE304: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
12

76 MAXVAL and MINVAL Functions and Other Functions MAXVAL(MINVAL) returns the maximum (minimum) value among active responders. If (tail[$] != 1) then k = maxval(weight[$]); endif; MAXDEX (MINDEX) returns the index of an entry where the maximum (minimum) value of the specified item occurs among the active responders. Recall – With an associative SIMD, above are constant time functions as they are supported in hardware. With a SIMD that is not associative, they can still be performed, but they are not constant time functions and their timings depend upon the interconnection network. There are several variations of above functions in ASC Primer – i.e. finding the nth smallest value. The function COUNT() returns the number of active responders. It can be useful in debugging.

77 Dynamic Storage Allocation allocate is used to identify a processor whose association record is currently unused. –Will be used to store a new association record –Creates a parallel index that points to the processor selected release is used to de-allocate storage of specified records in an association –Can release a single record or multiple records simultaneously. Example: char parallel node[$], parent[$]; logical parallel tree[$]; index parallel x[$]; associate node[$], level[$], parent[$] with tree[$]; ...... allocate x in tree[$] node[x] = ‘B’; endallocate x; release parent[$] .eq. ‘A’ from tree[$];

78 Performance Monitor Keeps track of number of scalar and parallel operations. It is turned on and off using the PERFORM statement –perform = 1; –perform = 0; The number of scalar and parallel operations can be printed using the MSG command –MSG “Number of parallel and scalar operations are” PA_PERFORM SC_PERFORM; The ASC Monitor is important for evaluation and comparison of various ASC algorithms and software. –It can also be used to determine or estimate running time. See Pg 30-31 of ASC Primer for more information

79 Additional Features Restricted subroutine capability is currently available –See call and include on pg 25-7 of ASC Primer. –ASC has a rather simplistic subroutine capability. –While not difficult, the subroutine details will not be covered in slides. –Assignment will not require use of subroutines. Use of personal pronouns and articles in ASC make code easier to read and shorter. –See page 29 of ASC Primer. –Again, the details are not covered in slides.

80 Basic Program Structure Main program_name –Constants; –Variables; –Associations; –Body; End;

81 Software Compiler and Emulator –DOS/Windows, UNIX (Linux) –WaveTracer –Connection Machine See http://www.cs.kent.edu/~parallel/ and look under “software”. Use any text editor. –Careful on moving files between DOS and UNIX! [Diagram: Anyprog.asc is the input to the ASC Compiler, which produces Anyprog.iob for the ASC Emulator; the emulator uses standard I/O or file I/O. Both the compiler and the emulator are shown with the options -e, -wt, and -cm.]

82 Simple ASC Program Example: –Consider an ASC Program that computes the area of various simple shapes (circle, rectangle, triangle). Here is an example: shapes.asc Here is the data: shapes.dat Here is the output: shapes.out

83 Software To compile the previous program –% asc1.exe -e shapes.asc To execute your program… –% asc2.exe -e shapes.iob –% asc2.exe -e shapes.iob < shapes.dat –% asc2.exe -e shapes.iob shapes.out Commands are executed in Windows from a command window. –See the CMD command-line Environment document at http://www.cs.kent.edu/~jbaker/PDC-F07/references/CMD_Commands.doc You can execute UNIX (Linux) commands from the command-line prompt. –Don’t forget to make the compiler and emulator executable using the chmod command.

MST Program Example in ASC Primer View the ASC code as pseudocode and consider how to create equivalent Cn code for the ClearSpeed board.

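85 The Graph and Its Data File
[Figure: weighted undirected graph on the nine nodes A through I, with 15 edges whose weights range from 1 to 8.]
Data file (each entry is tail, head, weight; entries are listed in file order):
1 2 2
1 6 7
1 7 3
2 1 2
2 3 4
2 7 6
3 2 4
3 8 2
3 4 2
4 5 1
4 8 8
4 3 2
5 4 1
5 9 2
5 6 6
6 1 7
6 9 5
6 5 6
7 2 6
7 1 3
7 9 1
7 8 3
8 7 3
8 9 4
8 4 8
8 3 2
9 7 1
9 6 5
9 8 4
9 5 2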

86 Header and declarations:
    /* The ASC Minimum Spanning Tree - with slight modifications from ASC PRIMER */
    main mst
    /* Note: Vertices were encoded as integers */
    deflog (TRUE, 1);
    deflog (FALSE, 0);
    char parallel tail[$], head[$];
    int parallel weight[$], state[$];
    char scalar node;
    index parallel xx[$];
    logical parallel nxtnod[$], graph[$], result[$];

87 Obtain input:
    associate head[$], tail[$], weight[$], state[$] with graph[$];
    read tail[$], head[$], weight[$] in graph[$];
Mark the active PEs for the next command (otherwise the zeros in the fields where data wasn’t read in would be used). Find a tail whose weight is minimal.
    setscope graph[$]
        node = tail[mindex(weight[$])];
    endsetscope;
Because of the layout of the data file, the first PE containing the minimal weight (which is 1) is the PE holding 4 5 1. So node would be set to 4.
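The selection of the starting node can be checked with a short sequential sketch. It is written in Python rather than ASC, and the helper names `load_edges` and `mindex` are stand-ins chosen here; the Python `mindex` only imitates the behavior the slide relies on, namely picking the first active entry that holds the minimum.

```python
# Data file from the slides: tail, head, weight triples in file order.
DATA = """
1 2 2  1 6 7  1 7 3  2 1 2  2 3 4  2 7 6
3 2 4  3 8 2  3 4 2  4 5 1  4 8 8  4 3 2
5 4 1  5 9 2  5 6 6  6 1 7  6 9 5  6 5 6
7 2 6  7 1 3  7 9 1  7 8 3  8 7 3  8 9 4
8 4 8  8 3 2  9 7 1  9 6 5  9 8 4  9 5 2
"""

def load_edges(text):
    """Split the flat number stream into (tail, head, weight) triples."""
    nums = [int(tok) for tok in text.split()]
    return [tuple(nums[i:i + 3]) for i in range(0, len(nums), 3)]

def mindex(values, active):
    """Index of the first active entry holding the minimum value."""
    candidates = [i for i, on in enumerate(active) if on]
    return min(candidates, key=lambda i: values[i])  # min keeps the first tie

edges = load_edges(DATA)
tail = [e[0] for e in edges]
weight = [e[2] for e in edges]
graph = [True] * len(edges)      # every PE that received data is active

first = mindex(weight, graph)
node = tail[first]
```

With this data, `edges[first]` is the triple 4 5 1 and `node` is 4, matching the slide.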

89 Continued
Mark as being in set V2 all edges that have tails equal to node, i.e. 4:
    if (node == tail[$]) then state[$] = 2; else state[$] = 3; endif;
This would mark the following edges as having a state of 2, i.e. they are in V2:
4 5 1
4 8 8
4 3 2
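A sequential sketch of this marking step, in Python rather than ASC. The list comprehension plays the role of the parallel if/else, which assigns a state in every PE at once; the variable names mirror the ASC fields, and `in_v2` is a name introduced here.

```python
DATA = """
1 2 2  1 6 7  1 7 3  2 1 2  2 3 4  2 7 6
3 2 4  3 8 2  3 4 2  4 5 1  4 8 8  4 3 2
5 4 1  5 9 2  5 6 6  6 1 7  6 9 5  6 5 6
7 2 6  7 1 3  7 9 1  7 8 3  8 7 3  8 9 4
8 4 8  8 3 2  9 7 1  9 6 5  9 8 4  9 5 2
"""
nums = [int(tok) for tok in DATA.split()]
edges = [tuple(nums[i:i + 3]) for i in range(0, len(nums), 3)]

node = 4                                    # starting node found on the previous slide
# state 2: candidate edge leaving the tree; state 3: not yet reachable
state = [2 if t == node else 3 for (t, _h, _w) in edges]

in_v2 = [e for e, s in zip(edges, state) if s == 2]
```

Here `in_v2` lists the three edges 4 5 1, 4 8 8 and 4 3 2, and every other edge has state 3.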

91 Continued
    while xx in (state[$] == 2)
        if (state[$] == 2) then nxtnod[$] = mindex(weight[$]); endif;
        node = head[nxtnod[$]];
In loop 0: The only edges with a state of 2 are 4 5 1, 4 8 8 and 4 3 2, so the first one is selected and node is set to 5.
        state[nxtnod[$]] = 1;
The edge 4 5 receives a state of 1.

93 Continued
    if (head[$] == node && state[$] != 1) then state[$] = 0; endif;
We no longer want edges with a head of 5, so we throw them out of consideration by setting their states to 0. These are edges 6 5 and 9 5 in the data file.

94 The Graph and Its Data File
[Figure and data file repeated from slide 85, with the entries for the thrown-out edges highlighted in green.]
Green entries are edges thrown out.

95 Continued
    if (state[$] == 3 && node == tail[$]) then state[$] = 2; endif;
The edges turned to a state of 2 are then: 5 4, 5 9 and 5 6. Recall that these are possible candidates for the next round. Do we want 5 4? Isn’t 4 5 already in? Solving the problem by using a picture didn’t run into this issue, because once 4 5 was in, 5 4 was eliminated from consideration automatically. So we need to correct this. When an edge X Y is included, we need to set the state of Y X to 0 to keep it out of further consideration. Is anything else needed?
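The first pass through the loop, exactly as coded on the slides and without any correction, can be traced with a sequential Python sketch. The two for-loops stand in for the parallel if statements, and `edges_in_state` is a helper introduced here only for reporting.

```python
DATA = """
1 2 2  1 6 7  1 7 3  2 1 2  2 3 4  2 7 6
3 2 4  3 8 2  3 4 2  4 5 1  4 8 8  4 3 2
5 4 1  5 9 2  5 6 6  6 1 7  6 9 5  6 5 6
7 2 6  7 1 3  7 9 1  7 8 3  8 7 3  8 9 4
8 4 8  8 3 2  9 7 1  9 6 5  9 8 4  9 5 2
"""
nums = [int(tok) for tok in DATA.split()]
edges = [tuple(nums[i:i + 3]) for i in range(0, len(nums), 3)]
tail = [e[0] for e in edges]
head = [e[1] for e in edges]
weight = [e[2] for e in edges]

node = 4
state = [2 if t == node else 3 for t in tail]

# Loop 0: pick the first minimal-weight candidate and add it to the tree.
candidates = [i for i, s in enumerate(state) if s == 2]
nxtnod = min(candidates, key=lambda i: weight[i])
node = head[nxtnod]
state[nxtnod] = 1

# Throw out the other edges that lead into the newly added node.
for i in range(len(edges)):
    if head[i] == node and state[i] != 1:
        state[i] = 0

# Edges leaving the new node become candidates for the next round.
for i in range(len(edges)):
    if state[i] == 3 and tail[i] == node:
        state[i] = 2

def edges_in_state(s):
    """(tail, head) pairs currently holding state s, in file order."""
    return [(tail[i], head[i]) for i in range(len(edges)) if state[i] == s]
```

After this pass, `edges_in_state(2)` contains 5 4 alongside 4 8, 4 3, 5 9 and 5 6, which is the reverse-edge issue raised on this slide.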

96 Correct & Implement MST Algorithm in Cn
The algorithm as coded selects D first, while we selected A. The code at the beginning, which selects a minimal-weight edge and uses one of its nodes as the starting point, was there to avoid the need to assign a character to a variable. Since we are using integer nodes, we could eliminate
    setscope graph[$] node = tail[mindex(weight[$])]; endsetscope;
and just set node to 1, i.e.
    node = 1;
This might help you see what is going on. Try to trace the MST algorithm with this change and correct it. (Homework)

97 Shortest Path Problem for Graphs The minimal spanning tree algorithm by Prim is called a greedy algorithm. Greedy algorithms are usually applied to optimization problems – i.e. a set of configurations is searched to find one that minimizes or maximizes some objective function defined on these configurations. The approach is to proceed with a sequence of choices. –The sequence starts from some well-understood starting configuration. –Then we iteratively make choices that are locally best from among those currently possible. This approach does not always lead to an optimal solution; when it always does, the problem is said to possess the greedy-choice property.

98 The Greedy-choice Property. This property says a globally optimal configuration can be reached by a series of locally optimal choices – i.e. choices that are best from among the possibilities available at the time. This allows us to avoid the exponential running time that would result if, for example, we had to generate all spanning trees in a graph and then find the minimal one. Many other problems are known to have the greedy-choice property. However, you need to be careful. Sometimes just a slight change in the wording of the problem turns it into a problem that doesn’t have the greedy-choice property. In fact, a slight change can produce an NP-complete problem.

99 Some Problems Known to Have the Greedy-choice Property (Minimal Spanning Tree) – just discussed. (Shortest Path) Find the shortest path between two nodes on a connected, weighted graph where the weights are positive and represent distances. (Fractional Knapsack) Given a set of n items such that each item i has a positive value b_i and a positive weight w_i, find a maximum-value subset that does not exceed a given weight W, provided we can take fractional amounts of the items. –Think of this as a knapsack being filled to not exceed the weight you can carry. Each item has a benefit to you, but it can be split up into fractional parts, as is possible with granola bars, popcorn, water, etc.
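One standard greedy rule for the fractional knapsack problem is to take items in decreasing order of value per unit weight. The slides do not spell out a rule, so the Python sketch below is an illustration under that assumption; the function name and the sample items are hypothetical.

```python
def fractional_knapsack(items, capacity):
    """Greedy by value density. items: (value, weight) pairs, weights > 0."""
    total = 0.0
    remaining = capacity
    by_density = sorted(items, key=lambda it: it[0] / it[1], reverse=True)
    for value, weight in by_density:
        if remaining <= 0:
            break
        take = min(weight, remaining)        # whole item, or the part that fits
        total += value * take / weight
        remaining -= take
    return total

# Hypothetical items as (b_i, w_i) pairs and a weight limit W = 50.
sample = [(60, 10), (100, 20), (120, 30)]
best = fractional_knapsack(sample, 50)
```

For these sample items the first two are taken whole and two thirds of the third is taken.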

100 However, the Wording Is Delicate The Fractional Knapsack Problem is one that must be carefully stated. If each of the n items may only be taken whole or rejected, you have the 0-1 Knapsack Problem, which is NP-hard (its decision version is NP-complete) and does not have the greedy-choice property. It does have a pseudo-polynomial algorithm, i.e. one that runs in O(nW) time, where W is the weight limit. So the running time depends not just on the number of items n, but also on the numeric value of W given in the problem statement. In fact, if W = 2^n, then the pseudo-polynomial algorithm for this problem is as bad as the brute-force method of trying all combinations.
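Both points can be seen on a small instance. The Python sketch below uses the textbook dynamic program over capacities as the O(nW) algorithm (the slides do not give its details, so this choice is an assumption) and compares it with a density-ordered greedy that is restricted to whole items. Function names and sample items are hypothetical.

```python
def knapsack_01(items, capacity):
    """O(n*W) dynamic program. items: (value, weight) pairs, integer weights."""
    best = [0] * (capacity + 1)              # best[w] = max value within weight w
    for value, weight in items:
        # Go downward so each item is used at most once.
        for w in range(capacity, weight - 1, -1):
            best[w] = max(best[w], best[w - weight] + value)
    return best[capacity]

def greedy_01(items, capacity):
    """Density-ordered greedy restricted to whole items (not optimal in general)."""
    total = 0
    remaining = capacity
    for value, weight in sorted(items, key=lambda it: it[0] / it[1], reverse=True):
        if weight <= remaining:
            total += value
            remaining -= weight
    return total

sample = [(60, 10), (100, 20), (120, 30)]
optimal = knapsack_01(sample, 50)
greedy = greedy_01(sample, 50)
```

On this instance the greedy choice falls short of the optimum, which is what the loss of the greedy-choice property means in practice.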

101 Some Problems with the Greedy-choice Property (Task Scheduling Problem) We are given a set T of n tasks such that each task i has a start time s_i and a finish time f_i, where s_i < f_i. –Task i must start at time s_i and it is guaranteed to be finished by time f_i. –Each task has to be performed on a machine and each machine can execute only one task at a time. –Two tasks i and j are non-conflicting if f_i ≤ s_j or f_j ≤ s_i. –Two tasks can be scheduled to be executed on the same machine only if they are non-conflicting. What is the minimum number of machines needed to schedule all the tasks?
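A common greedy approach to this problem, sketched here in Python as an illustration rather than as the method the course intends, processes tasks in order of start time and reuses a machine whenever one has already finished. The function name and the sample tasks are hypothetical.

```python
import heapq

def min_machines(tasks):
    """tasks: (start, finish) pairs with start < finish."""
    finish_heap = []                         # finish time of the last task per machine
    for start, finish in sorted(tasks):      # earliest start first
        if finish_heap and finish_heap[0] <= start:
            # The machine that frees up earliest is non-conflicting: reuse it.
            heapq.heapreplace(finish_heap, finish)
        else:
            # Every machine is still busy: open a new one.
            heapq.heappush(finish_heap, finish)
    return len(finish_heap)

sample = [(1, 3), (1, 4), (2, 5), (3, 7), (4, 7), (6, 9), (7, 8)]
machines = min_machines(sample)
```

The comparison `finish_heap[0] <= start` mirrors the slide's non-conflict condition f_i ≤ s_j.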

102 A Greedy-choice Algorithm for the Shortest Path Problem Given a connected graph with positive weights and two nodes, s, the start node, and d, the destination node, find a shortest path from s to d. A greedy-choice algorithm for this problem is due to Dijkstra. Unlike the MST algorithm, more must be considered than just the minimum-weight edge leading out of a node. It is easy to find examples where that approach won’t work for this problem. Try to find one. (Exercise)

103 Dijkstra’s Sequential Algorithm for the Shortest Path Problem Let S be the set of nodes already explored and V the set of all nodes in the graph. For each u in S, we store a distance value d(u), which is defined below. Initially, only s, the starting point, is in S and d(s) = 0. While S doesn’t include dp, the destination point, –Select a node v not in S with at least one edge from S for which the following is minimal: d’(v) = min{ d(u) + wgt(u,v) } –Here, the min is taken over all edges e = (u,v) with u ∈ S and v ∉ S, and wgt(u,v) is the weight of edge e. Add v to S and define d(v) = d’(v). Stop when dp, the destination point, is placed in S.
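The steps above can be sketched in Python. One substitution should be named plainly: instead of recomputing the minimum over all edges leaving S on every pass, this version keeps the candidate values d'(v) in a priority queue, which selects the same node. The function name, the dictionary-based graph format and the sample graph are assumptions made for this sketch.

```python
import heapq

def shortest_distance(graph, s, dp):
    """graph: dict node -> list of (neighbor, weight); weights are positive."""
    d = {}                                   # d(u) for every node u already in S
    frontier = [(0, s)]                      # candidate (d'(v), v) pairs
    while frontier:
        dist, v = heapq.heappop(frontier)    # node with the minimal d'(v)
        if v in d:
            continue                         # stale entry: v was added to S earlier
        d[v] = dist                          # add v to S with d(v) = d'(v)
        if v == dp:
            return dist                      # stop once the destination is in S
        for u, wgt in graph.get(v, []):
            if u not in d:
                heapq.heappush(frontier, (dist + wgt, u))
    return None                              # dp is not reachable from s

# Hypothetical directed example graph.
sample = {
    "s": [("a", 4), ("b", 1)],
    "b": [("a", 2), ("t", 6)],
    "a": [("t", 1)],
    "t": [],
}
answer = shortest_distance(sample, "s", "t")
```

In the sample, the direct-looking route s, b, t costs 7, while the route s, b, a, t costs 4.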

104 Example of the Greedy-choice – only part of the graph is shown
[Figure: partial weighted graph with Set S = {s, a, b} and the neighboring nodes c, x and e outside S.]
d(s) = 0, d(a) = 1, d(b) = 2
Choose the minimal from:
d’(c) = d(a) + 3 = 4
d’(x) = min{ d(a) + 2, d(s) + 4, d(b) + 2 } = 3
d’(e) = d(b) + 3 = 5
Therefore, let d(x) = 3 and put x in S.
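The same selection can be reproduced mechanically. In the Python sketch below, the edge list contains only the edges whose weights can be read off the three expressions on this slide; since only part of the graph is shown, the list is an assumption and not the full example.

```python
# Distances already fixed for the nodes in S.
d = {"s": 0, "a": 1, "b": 2}

# (u, v, wgt) for edges from S to nodes outside S, read off the d' formulas.
crossing = [("a", "c", 3), ("a", "x", 2), ("s", "x", 4), ("b", "x", 2), ("b", "e", 3)]

d_prime = {}
for u, v, wgt in crossing:
    if u in d and v not in d:
        d_prime[v] = min(d_prime.get(v, float("inf")), d[u] + wgt)

chosen = min(d_prime, key=d_prime.get)       # node with the minimal d'(v)
d[chosen] = d_prime[chosen]                  # put it in S
```

This yields the candidate values 4, 3 and 5 for c, x and e, so x is added to S with d(x) = 3.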

105 Shortest Path Homework More information about this assignment will be posted in the homework section of the course webpage. You should first complete your homework for the MST.
