1 Parallel Programming on the SGI Origin2000 With thanks to Moshe Goldberg, TCC and Igor Zacharov SGI Taub Computer Center Technion Mar 2005 Anne Weill-Zrahia

2 Parallel Programming on the SGI Origin2000
1) Parallelization Concepts
2) SGI Computer Design
3) Efficient Scalar Design
4) Parallel Programming - OpenMP
5) Parallel Programming - MPI

3 Academic Press 2001 ISBN 1-55860-671-8

4 1)Parallelization Concepts

5 Introduction to Parallel Computing
Parallel computer: a set of processors that work cooperatively to solve a computational problem.
Distributed computing: a number of processors communicating over a network.
Metacomputing: use of several parallel computers.

6 Parallel classification
Parallel architectures: Shared Memory / Distributed Memory
Programming paradigms: Data parallel / Message passing

7 Why parallel computing
Single-processor performance is limited by physics.
Multiple processors break the problem down into simple tasks or domains.
Plus: obtain the same results as the sequential program, faster.
Minus: the code must be rewritten.

8 Three HPC Architectures
Shared memory
Cluster
Vector processor

9 Shared Memory
Each processor can access any part of the memory.
Access times are uniform (in principle).
Easier to program (no explicit message passing).
Bottleneck when several tasks access the same location.

10 Symmetric Multiprocessors
[Diagram: several CPUs connected to a single shared memory over a memory bus]
Examples: SGI Power Challenge, Cray J90/T90

11 Data-parallel programming
Single program defining operations
Single memory
Loosely synchronous (synchronization at completion of loop)
Parallel operations on array elements

12 Distributed Parallel Computing
[Diagram: CPUs, each with its own private memory, connected by a network]
Examples: SP2, Beowulf clusters

13 Message Passing Programming
Separate program on each processor
Local memory only
Control over distribution and transfer of data
Additional debugging complexity due to communications

14 Distributed Memory
A processor can access only its local memory.
Access times depend on location.
Processors must communicate via explicit message passing.

15 Message Passing or Shared Memory?

Message Passing:
- Takes longer to implement
- More details to worry about
- Increases source lines
- Complex to debug and time
- Increase in total memory used
- Scalability limited by: communications overhead, process synchronization
- Parallelism is visible

Shared Memory:
- Easier to implement
- System handles many details
- Little increase in source
- Easier to debug and time
- Efficient memory use
- Scalability limited by: serial portion of code, process synchronization
- Compiler-based parallelism

16 Performance issues
Concurrency: the ability to perform actions simultaneously
Scalability: performance is not impaired by an increasing number of processors
Locality: a high ratio of local to remote memory accesses (i.e. low communication)

17 Objectives of HPC in the Technion
Maintain a leading position in science/engineering
Production: sophisticated calculations, requiring high speed and large memory
Teach techniques of parallel computing, in research projects and as part of courses

18 HPC in the Technion
SGI Origin2000: 22 CPUs (R10000, 250 MHz), total memory 9 GB
SGI Origin2000: 32 CPUs (R12000, 300 MHz), total memory 9 GB
PC cluster (Linux Red Hat 9.0): 6 CPUs (Pentium II, 866 MHz), 500 MB memory per CPU
PC cluster (Linux Red Hat 9.0): 16 CPUs (Pentium III, 800 MHz), 500 MB memory per CPU

19 Origin2000 (SGI) 128 processors

20 Origin2000 (SGI) 22 processors

21 PC clusters (Intel) 6 processors 16 processors

22 (image-only slide; content not transcribed)

23 Data Grids for High Energy Physics (image courtesy Harvey Newman, Caltech)
Tier 0: CERN Computer Centre, offline processor farm (~20 TIPS); fed by the online system at ~100 MBytes/sec (physics data cache ~PBytes/sec)
Tier 1: regional centres, e.g. FermiLab (~4 TIPS), France, Italy, Germany; linked at ~622 Mbits/sec, or air freight (deprecated)
Tier 2: centres of ~1 TIPS each, e.g. Caltech; linked at ~622 Mbits/sec
Tier 4: institutes (~0.25 TIPS) and physicist workstations, connected at ~1 MBytes/sec
There is a "bunch crossing" every 25 nsec and there are 100 "triggers" per second; each triggered event is ~1 MByte in size.
Physicists work on analysis "channels": each institute will have ~10 physicists working on one or more channels, and data for these channels should be cached by the institute server.
1 TIPS is approximately 25,000 SpecInt95 equivalents.

24 GRIDS: Globus Toolkit
Grid Security Infrastructure (GSI)
Globus Resource Allocation Manager (GRAM)
Monitoring and Discovery Service (MDS)
Global Access to Secondary Storage (GASS)

25 November 2004

26 A Recent Example: Matrix multiply

27 Profile -- original

28 Profile – optimized code

29 – 35 (image-only slides; content not transcribed)

