HPC 01 Communication Models, Speedup and Scalability Schoenauer sec 8.2,8.4.

Message Passing Time
- To send $l$ bytes:

$$t_{\text{comm}}(l) = t_{\text{startup}} + (h-1)\,t_{\text{start-hop}} + (l + l_0)\,t_{\text{send}} + t_{\text{block}}$$

  - $t_{\text{startup}}$: total time to set up the communication
  - $t_{\text{start-hop}}$: time for switching each "hop" in wormhole routing
  - $h$: number of hops
  - $l$: number of bytes to transfer
  - $l_0$: extra header bytes that are also moved
  - $t_{\text{send}}$: time to actually transfer 1 byte
  - $t_{\text{block}}$: time lost when the message is blocked en route
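
The message-time model can be written down directly as a function. This is a minimal sketch: the function name `comm_time` and all numeric machine parameters are assumptions chosen for illustration, not values from the slides.

```python
def comm_time(l, h, t_startup, t_start_hop, t_send, l0=0, t_block=0.0):
    """Time to send l payload bytes over h hops, following the slide's model."""
    return t_startup + (h - 1) * t_start_hop + (l + l0) * t_send + t_block


# Hypothetical machine parameters (illustrative only, not measured values).
params = dict(t_startup=50e-6, t_start_hop=1e-6, t_send=10e-9, l0=32)

t_small = comm_time(8, h=3, **params)          # one 8-byte message
t_large = comm_time(1_000_000, h=3, **params)  # one 1 MB message

print(f"8 B message:  {t_small * 1e6:.2f} microseconds")
print(f"1 MB message: {t_large * 1e6:.2f} microseconds")
```

With parameters like these, the small message is dominated by the startup term, while the large message is dominated by the per-byte term.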

Communication Model
- Speed = $l / t_{\text{comm}}$. The speed actually achieved is far below the advertised theoretical hardware limit.
- Consequences:
  - Send messages in blocks; avoid small single messages.
  - Arrange data distributions to get nearest-neighbor communication, e.g. use a ring shift with direct neighbors.
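
To see why blocking messages pays off, the sketch below compares many small messages against one block under a simplified version of the message-time model (startup and per-byte terms only; hop, header, and blocking terms are dropped). The constants `T_STARTUP` and `T_SEND` are assumed values, not figures from the slides.

```python
T_STARTUP = 50e-6  # assumed startup time per message, in seconds
T_SEND = 10e-9     # assumed transfer time per byte, in seconds


def t_comm(l):
    """Simplified message time: startup cost plus per-byte cost."""
    return T_STARTUP + l * T_SEND


def effective_speed(l):
    """Achieved transfer speed in bytes per second for an l-byte message."""
    return l / t_comm(l)


peak = 1.0 / T_SEND  # theoretical limit: startup cost ignored

n, size = 1000, 8
t_many = n * t_comm(size)    # 1000 separate 8-byte messages
t_block = t_comm(n * size)   # one 8000-byte block

print(f"fraction of peak at 8 B: {effective_speed(8) / peak:.4f}")
print(f"many small: {t_many * 1e3:.3f} ms, one block: {t_block * 1e3:.3f} ms")
```

Under these assumptions the single block pays the startup cost once instead of 1000 times.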

Communication Model
- Program with logical processor numbers.
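
As an illustration of programming with logical processor numbers, the sketch below computes ring neighbors from a logical rank alone and simulates one ring shift in plain Python. The helper names `ring_neighbors` and `ring_shift` are hypothetical, and no real message passing takes place; how logical ranks map onto physical nodes is left to the system.

```python
def ring_neighbors(rank, p):
    """Left and right neighbors of a logical rank on a ring of p processes."""
    return (rank - 1) % p, (rank + 1) % p


def ring_shift(values):
    """Simulate one ring shift: each rank passes its value to its right neighbor."""
    p = len(values)
    shifted = [None] * p
    for rank, value in enumerate(values):
        _, right = ring_neighbors(rank, p)
        shifted[right] = value
    return shifted


print(ring_neighbors(0, 4))
print(ring_shift(["a", "b", "c", "d"]))
```

Because the code only ever refers to logical ranks, it does not change when the physical placement of processes changes.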

Communication Model
- Latency hiding: use asynchronous messaging to overlap communication and computation (MPI_ISEND, MPI_IRECV).
  - Domain decomposition in solving grid problems: compute the boundary values first, then communicate those while computing the interior.
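
The benefit of overlapping can be estimated with a small cost model. This is a sketch under stated assumptions: it takes the slide to mean that boundary values are computed first and sent while the interior is computed, it assumes perfect overlap, and the function names and timings are hypothetical. It does not call MPI.

```python
def step_time_blocking(t_boundary, t_interior, t_comm):
    """Compute everything, then communicate: the three phases add up."""
    return t_boundary + t_interior + t_comm


def step_time_overlapped(t_boundary, t_interior, t_comm):
    """Compute the boundary, start the transfer, compute the interior meanwhile.

    With perfect overlap the step waits only for the slower of the two.
    """
    return t_boundary + max(t_interior, t_comm)


# Hypothetical timings in milliseconds.
print(step_time_blocking(1.0, 10.0, 4.0))
print(step_time_overlapped(1.0, 10.0, 4.0))
```

In this model communication is hidden completely whenever the interior computation takes at least as long as the transfer.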

Amdahl's Law
- Consider the execution of a program on $p$ processors, and let the fraction $q$ ($0 < q < 1$) of the operations be parallelized. The maximum speedup is

$$sp_{\text{false}} = \frac{t_1}{t_p} = \frac{1}{q/p + (1-q)}$$

  - This indicates a rapid loss of speedup as $p$ increases if the parallel fraction is not high enough.
  - To get 50% efficiency, i.e. a speedup of 256 on 512 processors, $q \approx 0.998$ is required.
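
The 512-processor figure can be checked numerically. The sketch below evaluates the speedup formula and inverts it for the parallel fraction needed to reach a given efficiency; the function names are illustrative.

```python
def amdahl_speedup(q, p):
    """Amdahl speedup for parallel fraction q on p processors."""
    return 1.0 / (q / p + (1.0 - q))


def required_fraction(p, efficiency):
    """Parallel fraction q needed so that speedup = efficiency * p.

    Obtained by solving 1 / (q/p + 1 - q) = efficiency * p for q.
    """
    target = efficiency * p
    return (1.0 - 1.0 / target) / (1.0 - 1.0 / p)


q_needed = required_fraction(512, 0.5)
print(f"q needed for 50% efficiency on 512 processors: {q_needed:.6f}")
print(f"speedup with q = 0.998 on 512 processors: {amdahl_speedup(0.998, 512):.1f}")
```

The exact requirement is q = 510/511, which rounds to 0.998; with q = 0.998 exactly, the speedup comes out slightly below 256.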

Amdahl’s Law

- Why "false" speedup?
  - It assumes that the number of operations is the same for the sequential and the parallel program; usually the algorithms and data structures differ.
  - It does not account for the cost of parallelization: communication and synchronization!
  - It assumes that performance is the same for sequential and parallel code (different vector lengths, ...).

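Honest Speedup
- Definition (complex form, difficult to use):

$$sp_{\text{hon}} = \frac{t_1 \text{ for the best sequential algorithm}}{t_p \text{ for the real parallel algorithm}} = \frac{[t_1 \ldots]}{[\ldots + h_{\text{bas}} + p\,h_p]}$$

- $h_p$: communication time that depends on $p$
  - As $p \to \infty$, $sp_{\text{hon}} \to 0$.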

Scalability
- There is an optimal number of processors for each problem.
- Keeping the problem size fixed while increasing the number of processors is a poor use of a parallel machine.
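
The existence of an optimal processor count can be illustrated with a toy model. Everything here is an assumption made for the sketch: the parallel time is taken as t1/p + h_bas + p*h_p, which is only loosely patterned on the terms named in the slides and is not the slides' full formula, and the numeric values are invented.

```python
def toy_speedup(p, t1, h_bas, h_p):
    """Speedup under the ASSUMED toy model t_p = t1/p + h_bas + p*h_p."""
    return t1 / (t1 / p + h_bas + p * h_p)


def best_p(t1, h_bas, h_p, p_max):
    """Processor count in 1..p_max that maximizes the toy speedup."""
    return max(range(1, p_max + 1), key=lambda p: toy_speedup(p, t1, h_bas, h_p))


# Hypothetical values: fixed problem size, communication cost growing with p.
t1, h_bas, h_p = 100.0, 0.1, 0.01

p_opt = best_p(t1, h_bas, h_p, p_max=1000)
print(p_opt, toy_speedup(p_opt, t1, h_bas, h_p))
print(toy_speedup(10**6, t1, h_bas, h_p))
```

In this toy model the speedup peaks near p = sqrt(t1 / h_p) and then decays toward zero, since the communication term grows with p while the work per processor shrinks.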

Scalability
- Increasing the problem size along with the number of processors makes better use of a parallel machine.

Scalability
- Now let the problem size $m \to \infty$ as $p \to \infty$.

Scalability
- Thus scalability, not speedup, is the desired measure of a parallel algorithm/code!
- Scalability is achieved if the quantity $h_p \cdot p / m$ is constant or increases very slowly as $p$ increases.
