Published by Heidi Butrum. Modified over 2 years ago.

1
CISC 879 Parallel Computation
High Performance Fortran (HPF)
Ibrahim Halil Saruhan, saruhan@cis.udel.edu
"Although the [Fortran] group broke new ground … they never lost sight of their main objective, namely to produce a product that would be acceptable to practical users with real problems to solve. Fortran … is still by far the most popular language for numerical computation." Maurice V. Wilkes

2
Outline
- Introduction
- Brief History of Fortran and HPF
- HPF Directives, Syntax
- Data Mapping
- Data Parallelism
- Putting It all Together
- Intrinsic Procedures
- Extrinsic Procedures
- References and Further Information

3
Introduction
HPF is:
- a language that combines the full Fortran 90 language with special user annotations dealing with data distribution.
- intended to be a standard programming language for computationally intensive applications on many types of machines.
- a set of extensions to Fortran expressing parallel execution at a high level.
- designed to provide a portable extension to Fortran 90 for writing data-parallel applications.
HPFF (the High Performance Fortran Forum) is the group of people who developed HPF.
Since its introduction almost four decades ago, Fortran has been the language of choice for scientific and engineering programming, and HPF is the latest set of extensions to this venerable language.

4
Brief History of Fortran and HPF
Early 1950s: The first programming language to be called Fortran was developed by IBM.
1957: Fortran became popular after the first compiler was delivered to a customer.
1966: ANSI published the first formal standard for Fortran, including features like integer, real, double precision, DO loops, IF conditionals, subroutines, functions, the Hollerith data type (later replaced by the character type) and global variables. This standard is called Fortran 66.
1978: ANSI and ISO published a new standard (Fortran 77), including features like IF / THEN / ELSE IF / END IF conditional statements, the complex data type, complex constants, complex numbers, the character type, and formatted, unformatted and direct-access file input and output.

5
Brief History of Fortran and HPF
To support efficient programming on the new generation of parallel machines, Fortran needed extensions, and that need led to the beginning of HPF.
1991: The desire to revise the FORTRAN 77 standard led to work on a new Fortran under the title Fortran 8X. ISO published the result in 1991, and it was renamed Fortran 90. Its goal was to modernize Fortran so that it could continue its long history as a scientific and engineering programming language.
The first group to discuss standardization of parallel Fortran features was the Parallel Computing Forum (PCF). The original goals of the group were to standardize the language features for task-oriented parallelism and shared-memory machines.
1991 Nov: Digital Equipment Corporation organized a meeting at the Supercomputing ’91 conference in Albuquerque, New Mexico, to discuss HPF.

6
Brief History of Fortran and HPF
1992 Jan: Kickoff meeting for HPFF in Houston, Texas, hosted by the Center for Research on Parallel Computation at Rice University. Over 130 people attended. Because the meeting was larger than expected, a series of smaller “working group” meetings was scheduled to create the language draft.
1992 Mar: The HPFF working group, nearly 40 people, met for the first time in Dallas, Texas. Eight further meetings were held.
1993 May: The HPFF working group produced the HPF Language Specification, version 1.0.

7
HPF Directives and Their Syntax
The form of an hpf-directive-line (H201) is:
    directive-origin hpf-directive
where a directive-origin (H202) is one of
    !HPF$
    CHPF$
    *HPF$
Fortran 90 allows comments to begin with “C” and “*” as well as “!” in fixed source form, but allows only “!” to begin a comment in free source form.
There are two forms of directive in HPF:
- specification-directive (H204): must be in the specification part of the program unit.
- executable-directive (H205): appears with the other Fortran 90 executable constructs in the program unit.
Examples: ALIGN, DISTRIBUTE, PROCESSORS …

8
HPF-conforming or not?

    !HPF$ DISTRIBUTE (CYCLIC) :: PERIODIC_TABLE …
RIGHT

    REAL PERIODIC_TABLE (103); !HPF$ DISTRIBUTE PERIODIC_TABLE (CYCLIC)
WRONG

    REAL PERIODIC_TABLE (103)
    !HPF$ DISTRIBUTE PERIODIC_TABLE (CYCLIC)
RIGHT

    !HPF$ DISTRIBUTE PERIODIC_TABLE (CYCLIC); DISTRIBUTE LOG_TABLE (BLOCK)
WRONG

    !HPF$ DISTRIBUTE PERIODIC_TABLE (CYCLIC)
    !HPF$ DISTRIBUTE LOG_TABLE (BLOCK)
RIGHT

9
Programming Model of HPF
Programming model: Communication, Parallelism
- FORALL
- DO INDEPENDENT
- INTRINSIC and STANDARD LIBRARY FUNCTIONS
- EXTRINSIC FUNCTIONS

10
Data Mapping
HPF describes data-to-processor mapping by using two kinds of operations:
Distribute: a directive that describes how an array is divided into even-sized pieces and distributed to processors in a regular way.
There are 4 processors in this example.
    REAL A (100,100)                      ! array declaration
    !HPF$ DISTRIBUTE A (BLOCK, BLOCK)
Result: each processor receives a 50×50 block of A; for example, P1 gets A(1:50,1:50).
    !HPF$ DISTRIBUTE A (CYCLIC, *)
Result: each processor receives every 4th row of A; for example, P1 gets A(1,1:100), A(5,1:100), A(9,1:100), …
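
The two results above can be checked with a small ownership model. This is only a sketch in Python, not HPF: the helper names `block_owner` and `cyclic_owner` are made up for illustration, processors are numbered from 0 (so P1 is processor 0), and the (BLOCK, BLOCK) case assumes the 4 processors are arranged as a 2 x 2 grid.

```python
import math

def block_owner(i, n, p):
    """Owner (0-based) of 1-based index i when n elements are split BLOCK-wise over p processors."""
    block = math.ceil(n / p)  # BLOCK uses contiguous blocks of size ceil(n / p)
    return (i - 1) // block

def cyclic_owner(i, p):
    """Owner (0-based) of 1-based index i under CYCLIC: indices are dealt out round-robin."""
    return (i - 1) % p

# (BLOCK, BLOCK) on an assumed 2 x 2 processor grid: rows and columns held by P1
rows_p1 = [i for i in range(1, 101) if block_owner(i, 100, 2) == 0]
cols_p1 = [j for j in range(1, 101) if block_owner(j, 100, 2) == 0]

# (CYCLIC, *) on 4 processors: rows are dealt out, columns are not distributed
cyclic_rows_p1 = [i for i in range(1, 101) if cyclic_owner(i, 4) == 0]

print(rows_p1[0], rows_p1[-1], cols_p1[0], cols_p1[-1])
print(cyclic_rows_p1[:3])
```

Under these assumptions P1 holds rows 1 to 50 and columns 1 to 50 in the first case, and rows 1, 5, 9, … in the second.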

11
Data Mapping
Align: a directive that describes how two arrays “line up” together.
    !HPF$ ALIGN X(I) WITH Y(I)
Result: X and Y are always distributed the same way.
    !HPF$ ALIGN X(I) WITH Y(2*I-1)
Result: elements of X correspond to the odd-numbered elements of Y (X can have at most half as many elements as Y).
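
A rough way to see what the second alignment implies: X(I) is stored wherever Y(2*I-1) is stored. The Python sketch below assumes, purely for illustration, that Y has 16 elements distributed BLOCK over 4 processors; neither the size nor the distribution comes from the slide, and the function names are hypothetical.

```python
import math

N_Y, P = 16, 4  # assumed for illustration: Y(1:16), BLOCK over 4 processors

def y_owner(j):
    """Owner (0-based) of Y(j) under the assumed BLOCK distribution."""
    return (j - 1) // math.ceil(N_Y / P)

def x_owner(i):
    """X(i) is aligned with Y(2*i-1), so it lives wherever that element of Y lives."""
    return y_owner(2 * i - 1)

max_x = (N_Y + 1) // 2  # largest i for which 2*i-1 is still a valid index of Y
x_placement = {i: x_owner(i) for i in range(1, max_x + 1)}

print(max_x, x_placement)
```

With these assumed sizes X can have at most 8 elements, and each processor ends up holding two of them.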

12
Data Mapping Example
There are 4 processors in this example.

    REAL DECK_OF_CARDS (52)
    !HPF$ DISTRIBUTE DECK_OF_CARDS (CYCLIC)

P1: DECK_OF_CARDS (1:49:4)
P2: DECK_OF_CARDS (2:50:4)
P3: DECK_OF_CARDS (3:51:4)
P4: DECK_OF_CARDS (4:52:4)

    REAL DECK_OF_CARDS (52)
    !HPF$ DISTRIBUTE DECK_OF_CARDS (CYCLIC(5))

P1: DECK_OF_CARDS (1:5) and DECK_OF_CARDS (21:25) and DECK_OF_CARDS (41:45)
P2: DECK_OF_CARDS (6:10) and DECK_OF_CARDS (26:30) and DECK_OF_CARDS (46:50)
P3: DECK_OF_CARDS (11:15) and DECK_OF_CARDS (31:35) and DECK_OF_CARDS (51:52)
P4: DECK_OF_CARDS (16:20) and DECK_OF_CARDS (36:40)
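
The dealing pattern for CYCLIC(k) can be written as a one-line owner formula. The following Python sketch is a model of the mapping, not HPF code; `block_cyclic_owner` and `owned` are illustrative names, and processors are numbered from 0, so index 0 stands for P1.

```python
def block_cyclic_owner(i, k, p):
    """Owner (0-based) of 1-based index i under CYCLIC(k) on p processors.

    Indices are grouped into chunks of k, and the chunks are dealt out round-robin.
    Plain CYCLIC is the special case k = 1.
    """
    return ((i - 1) // k) % p

def owned(n, k, p, proc):
    """All 1-based indices of an n-element array that land on processor proc."""
    return [i for i in range(1, n + 1) if block_cyclic_owner(i, k, p) == proc]

deck_cyclic = [owned(52, 1, 4, q) for q in range(4)]   # DISTRIBUTE (CYCLIC)
deck_cyclic5 = [owned(52, 5, 4, q) for q in range(4)]  # DISTRIBUTE (CYCLIC(5))

print(deck_cyclic[0])
print(deck_cyclic5[0])
```

Note that with CYCLIC(5) the last chunk is short: cards 51 and 52 go to P3, and P4 receives only two chunks.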

13
HPF Data Mapping Model
Arrays or other objects
    | ALIGN (static) or REALIGN (dynamic)
    v
Group of aligned objects
    | DISTRIBUTE (static) or REDISTRIBUTE (dynamic)
    v
Abstract processors as a user-declared Cartesian mesh
    | Optional implementation-dependent directive
    v
Physical processors

14
Data Mapping Example
    REAL, DIMENSION (16) :: A, B, C
    REAL, DIMENSION (32) :: D
    REAL, DIMENSION (8) :: X
    REAL, DIMENSION (0:9) :: Y
    INTEGER, DIMENSION (16) :: INX
    !HPF$ PROCESSORS, DIMENSION(4) :: PROCS
    !HPF$ DISTRIBUTE (BLOCK) ONTO PROCS :: A, B, D, INX
    !HPF$ DISTRIBUTE (CYCLIC) ONTO PROCS :: C
    !HPF$ ALIGN (I) WITH Y(I+1) :: X

15
HPF Data Mapping Declaration
              a        b        inx      d        c
PROCS (1)     1:4      1:4      1:4      1:8      1, 5, 9, 13
PROCS (2)     5:8      5:8      5:8      9:16     2, 6, 10, 14
PROCS (3)     9:12     9:12     9:12     17:24    3, 7, 11, 15
PROCS (4)     13:16    13:16    13:16    25:32    4, 8, 12, 16

x and y: x(1:8) lines up with y(2:9); y(0) and y(1) have no matching element of x.

16
HPF Data Mapping Example 1
    FORALL (I=1:16) A(I) = B(I)
              b        a
PROCS (1)     1:4      1:4
PROCS (2)     5:8      5:8
PROCS (3)     9:12     9:12
PROCS (4)     13:16    13:16
No communication

17
HPF Data Mapping Example 2
    FORALL (I=1:16) A(I) = C(I)
              c                a
PROCS (1)     1, 5, 9, 13      1:4
PROCS (2)     2, 6, 10, 14     5:8
PROCS (3)     3, 7, 11, 15     9:12
PROCS (4)     4, 8, 12, 16     13:16
Total communication is 12 elements

18
HPF Data Mapping Example 3
    FORALL (I=1:15) A(I) = B(I+1)
              b        a
PROCS (1)     1:4      1:4
PROCS (2)     5:8      5:8
PROCS (3)     9:12     9:12
PROCS (4)     13:16    13:16
Total communication is 3 elements
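
The communication totals in Examples 1 to 3 can be reproduced by counting the elements whose source and destination sit on different processors. The Python sketch below assumes the mapping from the declarations (A and B are BLOCK, C is CYCLIC, 4 processors) and an owner-computes rule, where the processor holding A(I) performs the assignment. The helper names are illustrative, not part of HPF.

```python
import math

P = 4  # number of processors, as in the examples

def block_owner(i, n=16):
    """Owner (0-based) of element i of a 16-element BLOCK-distributed array."""
    return (i - 1) // math.ceil(n / P)

def cyclic_owner(i):
    """Owner (0-based) of element i of a CYCLIC-distributed array."""
    return (i - 1) % P

def count_remote(indices, lhs_owner, rhs_owner, rhs_index=lambda i: i):
    """Count iterations whose right-hand-side element lives on a different
    processor than the left-hand-side element it is assigned to."""
    return sum(1 for i in indices if lhs_owner(i) != rhs_owner(rhs_index(i)))

ex1 = count_remote(range(1, 17), block_owner, block_owner)                   # A(I) = B(I)
ex2 = count_remote(range(1, 17), block_owner, cyclic_owner)                  # A(I) = C(I)
ex3 = count_remote(range(1, 16), block_owner, block_owner, lambda i: i + 1)  # A(I) = B(I+1)

print(ex1, ex2, ex3)
```

In Example 2 exactly one element per processor happens to be local already (C(1), C(6), C(11), C(16)), which is why the count is 12 rather than 16. In Example 3 only the elements at block boundaries (I = 4, 8, 12) cross processors.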

19
Data Parallelism
The most important features used by HPF for parallelism are FORALL and INDEPENDENT.
FORALL generalizes the Fortran 90 array assignment to handle new shapes of arrays. FORALL is not a loop, nor is it a parallel loop as defined in some languages. FORALL doesn’t iterate in any well-defined order.
The INDEPENDENT directive gives the compiler more information about a DO loop or FORALL statement. It tells the compiler that a DO loop doesn’t make any data accesses that would force the loop to be run sequentially.
    !HPF$ INDEPENDENT
    DO I=1,N
      X(INDX(I)) = Y(I)
    END DO
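
For the loop above, the INDEPENDENT assertion is only true if no two iterations store to the same element of X, that is, if INDX contains no repeated values. The Python sketch below illustrates why: it runs the same scatter in two different iteration orders and compares the results. The function names and sample data are invented for the illustration.

```python
def has_no_write_conflicts(indx):
    """True if no two iterations write the same element of X (no duplicates in indx)."""
    return len(set(indx)) == len(indx)

def scatter(n, y, indx, order):
    """Execute X(INDX(I)) = Y(I) for the iterations in the given order.

    indx holds 1-based positions into X; 'order' lists 0-based iteration numbers.
    """
    x = [0] * n
    for i in order:
        x[indx[i] - 1] = y[i]
    return x

y = [10, 20, 30, 40]
forward, backward = [0, 1, 2, 3], [3, 2, 1, 0]

indx_ok = [3, 1, 4, 2]   # a permutation: every iteration writes a different element
indx_bad = [1, 1, 2, 3]  # iterations 1 and 2 both write X(1)

same_when_ok = scatter(4, y, indx_ok, forward) == scatter(4, y, indx_ok, backward)
same_when_bad = scatter(4, y, indx_bad, forward) == scatter(4, y, indx_bad, backward)

print(has_no_write_conflicts(indx_ok), same_when_ok)
print(has_no_write_conflicts(indx_bad), same_when_bad)
```

When INDX has duplicates the result depends on the iteration order, so the loop must not be marked INDEPENDENT. The compiler generally cannot check this for a runtime index array, which is why the programmer supplies the assertion.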

20
Data Parallelism
There are two kinds of FORALL statements: the single-statement and the multi-statement.
A single-statement FORALL:
    FORALL (I = 2:5) A(I,I) = A(I-1,I-1)
Before:
    11 12 13 14 15
    21 22 23 24 25
    31 32 33 34 35
    41 42 43 44 45
    51 52 53 54 55
After:
    11 12 13 14 15
    21 11 23 24 25
    31 32 22 34 35
    41 42 43 33 45
    51 52 53 54 44
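
The "after" matrix follows from the FORALL rule that every right-hand side is evaluated before any element is stored. The Python sketch below contrasts that with an ordinary sequential DO loop over the same statement; it is a model of the semantics with illustrative function names, not generated code.

```python
def initial_matrix():
    """A(i,j) = 10*i + j for i, j in 1..5, stored with 0-based Python indices."""
    return [[10 * i + j for j in range(1, 6)] for i in range(1, 6)]

def forall_shift(a):
    """FORALL (I = 2:5) A(I,I) = A(I-1,I-1): read all right-hand sides, then store."""
    rhs = {i: a[i - 2][i - 2] for i in range(2, 6)}  # values of A(I-1,I-1) before any store
    for i, value in rhs.items():
        a[i - 1][i - 1] = value
    return a

def do_loop_shift(a):
    """Sequential DO I = 2,5: each iteration sees the store made by the previous one."""
    for i in range(2, 6):
        a[i - 1][i - 1] = a[i - 2][i - 2]
    return a

forall_diag = [row[k] for k, row in enumerate(forall_shift(initial_matrix()))]
do_diag = [row[k] for k, row in enumerate(do_loop_shift(initial_matrix()))]

print(forall_diag)
print(do_diag)
```

The FORALL shifts the diagonal down by one position, while the sequential loop would smear the value 11 along the whole diagonal.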

21
Data Parallelism
A multi-statement FORALL:
    FORALL (I = 1:8)
      A(I,I) = SQRT(A(I,I))
      FORALL (J = I-3:I+3, J/=I .AND. J>=1 .AND. J<=8) A(I,J) = A(I,I) * A(J,J)
    END FORALL
Before:
     1  0  0  0  0  0  0  0
     0  4  0  0  0  0  0  0
     0  0  9  0  0  0  0  0
     0  0  0 16  0  0  0  0
     0  0  0  0 25  0  0  0
     0  0  0  0  0 36  0  0
     0  0  0  0  0  0 49  0
     0  0  0  0  0  0  0 64
After:
     1  2  3  4  0  0  0  0
     2  2  6  8 10  0  0  0
     3  6  3 12 15 18  0  0
     4  8 12  4 20 24 28  0
     0 10 15 20  5 30 35 40
     0  0 18 24 30  6 42 48
     0  0  0 28 35 42  7 56
     0  0  0  0 40 48 56  8
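
In a multi-statement FORALL, each statement is completed for all index values before the next statement starts. The Python sketch below models that rule for the 8 x 8 example: the square roots are all stored first, and only then are the off-diagonal products computed. The function name is illustrative.

```python
import math

def multi_statement_forall(n=8):
    # Starting matrix: squares on the diagonal, zeros elsewhere
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = float((i + 1) ** 2)

    # Statement 1, for every I: A(I,I) = SQRT(A(I,I))
    new_diag = [math.sqrt(a[i][i]) for i in range(n)]
    for i in range(n):
        a[i][i] = new_diag[i]

    # Statement 2, the nested FORALL: evaluate every right-hand side, then store.
    # J runs over I-3 .. I+3, masked to J /= I and 1 <= J <= 8 (0-based here).
    updates = {}
    for i in range(n):
        for j in range(max(0, i - 3), min(n - 1, i + 3) + 1):
            if j != i:
                updates[(i, j)] = a[i][i] * a[j][j]
    for (i, j), value in updates.items():
        a[i][j] = value
    return a

result = multi_statement_forall()
for row in result:
    print(" ".join(f"{int(v):2d}" for v in row))
```

Because the second statement only reads diagonal elements and only writes off-diagonal ones, it sees the square roots from the first statement, which gives entries such as A(5,2) = 5 * 2 = 10.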

22
Data Parallelism

27
Putting It all Together
The total performance of an HPF program is the combination of parallelism and communication. The performance of an HPF program will depend on the programming model, compiler design, target machine characteristics, and other factors.
A simple model for the total computation time of a parallel program is:
    Ttotal = Tpar / Pactive + Tserial + Tcomm
where:
- Ttotal is the total execution time.
- Tpar is the total work that can be executed in parallel.
- Pactive is the number of (physical) processors that are active, executing the work in Tpar.
- Tserial is the total work that is done serially.
- Tcomm is the cost of communications.
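
As a quick numerical illustration of the model, the Python sketch below evaluates it for two processor counts. The workload numbers are made up for the example and do not come from the slides, and Tcomm is held constant, which a real program would not guarantee.

```python
def total_time(t_par, p_active, t_serial, t_comm):
    """Ttotal = Tpar / Pactive + Tserial + Tcomm (all in the same time unit)."""
    return t_par / p_active + t_serial + t_comm

# Hypothetical workload: 1000 units of parallel work, 10 serial, 40 communication
t4 = total_time(1000, 4, 10, 40)
t8 = total_time(1000, 8, 10, 40)
speedup = t4 / t8

print(t4, t8, round(speedup, 2))
```

Doubling the processors halves only the Tpar / Pactive term, so the overall gain is less than a factor of two whenever Tserial or Tcomm is nonzero.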

28
Example
    REAL, DIMENSION(16,16) :: X, Y
    …..
    FORALL (J=2:15, K=2:15)
      Y(J,K) = (X(J,K) + X(J-1,K) + X(J+1,K) + X(J,K-1) + X(J,K+1))/5.0
    END FORALL
Various distributions of a 16×16 array onto four processors:
    DISTRIBUTE X (*, BLOCK): four blocks of four columns each, P1 | P2 | P3 | P4
    DISTRIBUTE X (BLOCK, BLOCK): four 8×8 blocks, P1 P2 on top and P3 P4 below
    DISTRIBUTE X (*, CYCLIC): columns dealt out in turn, P1 P2 P3 P4 P1 P2 P3 P4 P1 P2 P3 P4 P1 P2 P3 P4

29
Example
DISTRIBUTE X (*, BLOCK), processors laid out as P1 | P2 | P3 | P4:
- P2 and P3 each must compute 56 elements of Y (a 14×4 subarray of Y).
- P1 and P4 each must compute 42 elements of Y (a 14×3 subarray of Y).
- P2 must exchange 14 elements of X with P1 and another 14 elements of X with P3. P3 has the same computation as P2. P1 and P4 have less work to do.
- Overall completion time: Tpar/Pactive is 56 element-computations and the communication overhead (Tcomm) is 28 element-exchanges.
DISTRIBUTE X (BLOCK, BLOCK), processors laid out as P1 P2 on top and P3 P4 below:
- Each processor holds an 8×8 subarray.
- Each processor must compute 49 elements of Y. P1 must compute Y(2:8,2:8) …
- Each processor can compute 36 elements of Y without requiring communication. For the remaining 13 elements of Y it must obtain 7 elements of X from each of two other processors.
- Tpar/Pactive is 49 element-computations and Tcomm is 14 element-exchanges.
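
These counts can be reproduced by walking over the stencil and recording, for each processor, how many elements of Y it computes and how many distinct elements of X it must fetch from elsewhere. The Python sketch below assumes an owner-computes rule, that Y is distributed the same way as X, and the processor layouts shown on the slide; the function names are illustrative.

```python
def analyze(owner, n=16, procs=4):
    """Return (elements of Y computed, distinct remote X elements needed) per processor."""
    compute = [0] * procs
    remote = [set() for _ in range(procs)]
    for j in range(2, n):          # J = 2 .. 15
        for k in range(2, n):      # K = 2 .. 15
            p = owner(j, k)        # the processor owning Y(J,K) computes it
            compute[p] += 1
            for jj, kk in [(j, k), (j - 1, k), (j + 1, k), (j, k - 1), (j, k + 1)]:
                if owner(jj, kk) != p:
                    remote[p].add((jj, kk))
    return compute, [len(s) for s in remote]

def star_block(j, k):
    """(*, BLOCK): columns in four contiguous blocks of four."""
    return (k - 1) // 4

def block_block(j, k):
    """(BLOCK, BLOCK): 2 x 2 grid of 8 x 8 blocks, P1 P2 on top and P3 P4 below."""
    return 2 * ((j - 1) // 8) + (k - 1) // 8

sb_compute, sb_remote = analyze(star_block)
bb_compute, bb_remote = analyze(block_block)

print(sb_compute, sb_remote)
print(bb_compute, bb_remote)
```

The busiest processor sets the pace: 56 computations and 28 exchanges for (*, BLOCK), against 49 and 14 for (BLOCK, BLOCK), which matches the figures above.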

30
Intrinsic and Library Procedures
- System Inquiry Functions (like NUMBER_OF_PROCESSORS, PROCESSORS_SHAPE, SIZE)
- Mapping Inquiry Subroutines (like HPF_ALIGNMENT, HPF_TEMPLATE, HPF_DISTRIBUTION)
- Computational Functions
    - Bit Manipulation Functions (like ILEN, LEADZ, POPCNT, POPPAR)
    - Array Location Functions (like MAXLOC, MINLOC)
    - Array Reduction Functions (like IALL, IANY, IPARITY, PARITY)
    - Array Combining Scatter Functions (like SUM_SCATTER, ALL_SCATTER, ANY_SCATTER)
    - Array Prefix and Suffix Functions (like SUM_PREFIX, SUM_SUFFIX)
    - Array Sorting Functions (like GRADE_DOWN)

31
Extrinsic Procedures
HPF provides a mechanism by which HPF programs may call procedures written in other parallel programming languages. Because such procedures are outside of HPF, they are called extrinsic procedures.
For instance:
    INTERFACE
      EXTRINSIC (COBOL) SUBROUTINE PRINT_REPORT(DATA_ARRAY)
        REAL DATA_ARRAY(:,:)
      END SUBROUTINE PRINT_REPORT
    END INTERFACE

32
References and Further Information
- The High Performance Fortran Handbook, by Charles H. Koelbel, David B. Loveman, Robert S. Schreiber. The MIT Press, 1994 (in the library).
- Designing and Building Parallel Programs, by Ian Foster. http://www-unix.mcs.anl.gov/dbpp/text/node82.html#SECTION03400000000000000000
- Rice University: http://www.crpc.rice.edu/HPFF/
- Syracuse University: http://www.npac.syr.edu/hpfa/

33
Questions ?


© 2017 SlidePlayer.com Inc.

All rights reserved.
