Grid Challenge - programming competition on the Grid - Kento Aida, Tokyo Institute of Technology, 22nd APAN Meeting

1 Grid Challenge - programming competition on the Grid -
Kento Aida, Tokyo Institute of Technology
22nd APAN Meeting in Singapore

2 What is Grid Challenge?
- A programming competition to develop high-performance programs on the Grid
  - The organizer operates a Grid testbed.
  - Participants develop and run programs on the testbed.
- A special event at the Annual Symposium on Advanced Computing Systems and Infrastructures (SACSIS)
- History
  - 1st Grid Challenge at SACSIS 2005
  - 2nd Grid Challenge at SACSIS 2006

3 Categories
- Compulsory
  - A programming competition on the Grid testbed: solve the problem provided by the organizer (the graph partitioning problem)
  - Open to students (university and high school)
- Free
  - Gives participants the opportunity to perform experiments on the Grid, with presentations during the conference
  - Open to students, engineers and researchers

4 Compulsory
- Graph partitioning problem
  - For a given undirected graph G(V, E) with |V| = 2n, let L and R be disjoint partitions obtained by dividing V equally, so that |L| = |R| = n.
  - Find the partition that minimizes the number of edges with one endpoint in L and the other in R, i.e. |{(u, v) ∈ E : u ∈ L, v ∈ R}|.
[Figure: example graph with vertices 1-6 divided into L and R]
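The objective above can be checked mechanically. The following is a minimal Python sketch, assuming vertices are plain integers and the graph is given as an edge list; the function names and the 6-vertex example graph are hypothetical and are not the organizer's actual input format.

```python
from typing import Iterable, Set, Tuple

Edge = Tuple[int, int]


def is_bisection(vertices: Set[int], left: Set[int]) -> bool:
    """True if `left` and its complement V - left split V into equal halves."""
    return left <= vertices and 2 * len(left) == len(vertices)


def cut_size(edges: Iterable[Edge], left: Set[int]) -> int:
    """Number of edges with exactly one endpoint in L (every other vertex is in R)."""
    return sum((u in left) != (v in left) for u, v in edges)


# Hypothetical 6-vertex example: two triangles joined by the edge (3, 4).
V = {1, 2, 3, 4, 5, 6}
E = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]

for L in ({1, 2, 3}, {1, 2, 4}):
    assert is_bisection(V, L)
    print(sorted(L), cut_size(E, L))
```

Splitting along the two triangles cuts only the bridging edge (3, 4), while the partition {1, 2, 4} cuts five edges.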

5 Compulsory (cont’d)
- Qualifying runs (3 weeks): Solve early!
  - Find a solution within a given threshold.
  - Shared resources
  - Problem size: |V| = 500-1,500
- Final runs (2 weeks): Solve fast!
  - Dedicated time slots for finalists (2.5 h per team)
  - Find a solution within a given period (10 min).
  - The finalist with the best solution wins!
  - Problem size: |V| = 30,000-35,000
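Because the final runs reward the best cut found within a fixed period, an anytime local search fits the setting: it always holds a valid bisection and only improves it. The sketch below is a simple randomized pair-swap hill climber (the swap move and gain formula follow Kernighan-Lin, but this is not the full KL algorithm), assuming vertices 0..n-1 with n even; `timed_bisection`, `swap_delta` and their parameters are hypothetical names, not anything used by a Grid Challenge team.

```python
import random
import time
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]


def swap_delta(adj: Dict[int, Set[int]], side: Dict[int, int], u: int, v: int) -> int:
    """Change in cut size if u and v (on opposite sides) exchange sides."""

    def ext_minus_int(x: int) -> int:
        # D(x) in Kernighan-Lin terms: external degree minus internal degree.
        return sum(1 if side[y] != side[x] else -1 for y in adj[x])

    shared = 1 if v in adj[u] else 0
    # The KL gain D(u) + D(v) - 2*c(u, v) is the reduction in cut size.
    return -(ext_minus_int(u) + ext_minus_int(v) - 2 * shared)


def timed_bisection(n: int, edges: List[Edge], time_limit_s: float, seed: int = 0) -> Tuple[int, Set[int]]:
    """Improve a random bisection of vertices 0..n-1 until the time limit expires."""
    rng = random.Random(seed)
    adj: Dict[int, Set[int]] = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)

    order = list(range(n))
    rng.shuffle(order)
    left, right = order[: n // 2], order[n // 2 :]
    side = {x: 0 for x in left}
    side.update({x: 1 for x in right})
    cut = sum(side[a] != side[b] for a, b in edges)

    deadline = time.monotonic() + time_limit_s
    while time.monotonic() < deadline:
        i, j = rng.randrange(len(left)), rng.randrange(len(right))
        u, v = left[i], right[j]
        delta = swap_delta(adj, side, u, v)
        if delta <= 0:  # accept improving and sideways moves; a swap keeps |L| = |R|
            side[u], side[v] = 1, 0
            left[i], right[j] = v, u
            cut += delta
    # The cut never increases, so the final state is also the best one seen.
    return cut, set(left)


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the edge (2, 3); the optimal cut is 1.
demo_edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
print(timed_bisection(6, demo_edges, time_limit_s=0.2))
```

For a 10-minute final run one would pass a `time_limit_s` somewhat below 600 to leave time for writing the answer; a serious entry would also need restarts or perturbation to escape local minima on graphs with 30,000+ vertices.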

6 Free
- Experiments for research projects (1 month) on shared resources
- Projects
  - Tools: a monitoring tool, a message passing system, a programming tool, volunteer computing
  - Applications: physics simulation, bioinformatics, diesel engine simulation, optimization problems

7 Participants
[Figure: participants by category. Compulsory: D 2, M 12, U 6, H 1. Free: D 2, M 5, U 1.]

8 Testbed
- Grid Challenge Federation: AIST, Tokyo Institute of Technology, The University of Tokyo, Doshisha University
- More than 1,200 CPUs

9 Resources
- A collection of PC clusters
- Spec of a PC cluster
  - A gateway node: gateway, compiling
  - Computing nodes: computation
  - Global/private IP addresses
  - NFS: “/home” is shared among the nodes

10 Resources (cont’d)
Name        Site                  Computing node                                         #Nodes (#CPUs)
F32         AIST (Tsukuba)        Xeon 3GHz x2, 4GB mem., 1000BASE-T                     128 (256)
SAKURA      AIST (Tsukuba)        Opteron 1.8GHz x2, 3GB mem., 1000BASE-T                16 (32)
DIS         TITECH (Yokohama)     Athlon MP 2000+ 1.6GHz x2, 512MB mem., 100BASE-TX      50 (100)
PrestoIII   TITECH (Tokyo)        Opteron 246/242 2/1.6GHz x2, 4/3/2GB mem., 1000BASE-T  103 (206)
Tau         U. Tokyo (Tokyo)      Xeon 2.4/2.8GHz x2, 2GB mem., 1000BASE-T               175 (350)
Chikayama   U. Tokyo (Chiba)      Xeon 2.4GHz x2, 2GB mem., 1000BASE-T                   64 (128)
Xenia       Doshisha U. (Kyoto)   Xeon 2.4GHz x2, 1GB mem., 100BASE-TX                   63 (126)

11 Internet Connection
[Figure: map of the network connections among the clusters F32, SAKURA, PrestoIII, Chikayama, Tau, DIS and Xenia; labeled networks: Tsukuba WAN, SINET, WIDE]

12 Software
- Grid middleware: Globus Toolkit 2.4
- Batch queueing systems: Sun Grid Engine, PBS
- Remote process invocation: SSH, GXP
- Monitoring: Ganglia
- Programming: MPICH 1.2.7, Ninf-G 2.4

13 GXP
- A shell for distributed multi-cluster environments
  - Fast simultaneous command submission
  - Parallel job pipes
  - Interactive selection of the nodes on which to execute commands
- No cumbersome per-node operations for:
  - installation and deployment
  - invocation of parallel processes
  - monitoring, trouble diagnosis, debugging
  - clean-up of dead processes
- http://www.logos.ic.i.u-tokyo.ac.jp/phoenix/gxp_quick_man.shtml
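GXP's own command syntax and implementation are not reproduced here. Purely to illustrate what "fast simultaneous command submission" buys over per-node operations, the hypothetical Python sketch below fans one command out to many nodes over SSH in parallel threads and collects each node's exit status and output. `run_everywhere`, the launcher hook and any node names are assumptions; it presumes key-based SSH login (as on the User Management slide).

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence, Tuple

# A launcher turns (node, command) into an argv list for subprocess.
Launcher = Callable[[str, str], List[str]]


def ssh_launcher(node: str, command: str) -> List[str]:
    # BatchMode=yes makes ssh fail instead of prompting, so one bad node cannot block the run.
    return ["ssh", "-o", "BatchMode=yes", node, command]


def run_everywhere(
    nodes: Sequence[str],
    command: str,
    launcher: Launcher = ssh_launcher,
    max_workers: int = 32,
    timeout_s: float = 60.0,
) -> Dict[str, Tuple[int, str]]:
    """Run `command` on all nodes concurrently; map node -> (exit status, stdout)."""

    def run_one(node: str) -> Tuple[str, int, str]:
        try:
            proc = subprocess.run(
                launcher(node, command), capture_output=True, text=True, timeout=timeout_s
            )
            return node, proc.returncode, proc.stdout
        except (subprocess.TimeoutExpired, OSError):
            # Report a hung or unreachable node instead of aborting the whole run.
            return node, -1, ""

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return {node: (rc, out) for node, rc, out in pool.map(run_one, nodes)}
```

Usage would look like `run_everywhere(["node01", "node02"], "uptime")` with real host names; the `launcher` parameter exists so the same fan-out logic can be tried locally or switched to another remote-invocation tool.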

14 Ninf-G
- A reference implementation of GridRPC
- GridRPC: a simple RPC-based programming model for the Grid
  - A client invokes remote libraries installed on remote servers on the Grid.
  - Exploits task parallelism.
- http://ninf.apgrid.org/
[Figure: a client program calls grpc_call(…), sending data to server programs (server libraries) on remote servers and receiving the results]
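The GridRPC model can be illustrated without a Grid. The sketch below is a local, hypothetical Python stand-in: a thread pool plays the role of the remote servers, `call` mirrors the blocking grpc_call(…) on the slide, and `call_async`/`wait_all` loosely mirror the asynchronous calls of the GridRPC API (grpc_call_async, grpc_wait_all). It is not Ninf-G's C API, and the registered `sum_squares` "library" is an invented example.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Dict, List


class ToyGridRPCClient:
    """Local stand-in for the GridRPC model; threads play the remote servers."""

    def __init__(self, libraries: Dict[str, Callable[..., Any]], servers: int = 4) -> None:
        self._libraries = libraries  # name -> function "installed on the servers"
        self._pool = ThreadPoolExecutor(max_workers=servers)

    def call(self, name: str, *args: Any) -> Any:
        """Blocking call, analogous to grpc_call: send data, wait for the result."""
        return self._libraries[name](*args)

    def call_async(self, name: str, *args: Any) -> Future:
        """Non-blocking call, analogous to grpc_call_async."""
        return self._pool.submit(self._libraries[name], *args)

    @staticmethod
    def wait_all(futures: List[Future]) -> List[Any]:
        """Collect every result in submission order, analogous to grpc_wait_all."""
        return [f.result() for f in futures]

    def close(self) -> None:
        self._pool.shutdown(wait=True)


def sum_squares(lo: int, hi: int) -> int:
    """Hypothetical 'remote library' routine: sum of i*i for lo <= i < hi."""
    return sum(i * i for i in range(lo, hi))


client = ToyGridRPCClient({"sum_squares": sum_squares})
# Task parallelism: split [0, 1000) into 4 independent chunks and call them asynchronously.
chunks = [(k, k + 250) for k in range(0, 1000, 250)]
partials = client.wait_all([client.call_async("sum_squares", lo, hi) for lo, hi in chunks])
client.close()
print(sum(partials))  # 332833500
```

Threads keep the sketch self-contained; in real GridRPC each asynchronous call runs on a separate remote machine, which is where the task-parallel speedup comes from.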

15 Ganglia
- A distributed monitoring tool for high-performance computing systems such as PC clusters and Grids
- Monitors CPU load, memory usage and network traffic
- http://ganglia.sourceforge.net/
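Ganglia's gmond daemon can report its collected metrics as an XML document over TCP; port 8649 is assumed below as the default and should be checked against the local gmond.conf. The sketch parses such a dump and flags heavily loaded hosts, which is one way to spot misbehaving computing nodes. The metric name `load_one` is from gmond's standard metric set; the sample document, threshold and function names are hypothetical.

```python
import socket
import xml.etree.ElementTree as ET
from typing import Dict, List, Union


def fetch_gmond_xml(host: str, port: int = 8649, timeout_s: float = 5.0) -> bytes:
    """Read the XML that gmond sends to a client on connect (assumed default port)."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout_s) as sock:
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)  # bytes, so the parser honors the XML encoding declaration


def host_metric(xml_doc: Union[str, bytes], metric: str) -> Dict[str, float]:
    """Map HOST NAME -> numeric VAL of the given METRIC."""
    values: Dict[str, float] = {}
    for host in ET.fromstring(xml_doc).iter("HOST"):
        for m in host.iter("METRIC"):
            if m.get("NAME") == metric:
                values[host.get("NAME")] = float(m.get("VAL"))
    return values


def overloaded(loads: Dict[str, float], threshold: float) -> List[str]:
    """Hosts whose value exceeds the threshold, sorted by name."""
    return sorted(h for h, v in loads.items() if v > threshold)


SAMPLE = """<GANGLIA_XML VERSION="3.0" SOURCE="gmond">
  <CLUSTER NAME="demo">
    <HOST NAME="node01" IP="10.0.0.1">
      <METRIC NAME="load_one" VAL="0.50" TYPE="float" UNITS=" "/>
    </HOST>
    <HOST NAME="node02" IP="10.0.0.2">
      <METRIC NAME="load_one" VAL="7.25" TYPE="float" UNITS=" "/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>"""

loads = host_metric(SAMPLE, "load_one")
print(loads, overloaded(loads, threshold=4.0))
```

Against a live cluster one would call `host_metric(fetch_gmond_xml("gateway-host"), "load_one")`, where the host name is a placeholder.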

16 Operation
- The testbed is operated by volunteers: researchers, technical staff and students.
- What we need to do:
  - installation, including training students to carry it out
  - user management
  - job management

17 User Management
- Local accounts
  - The same UID and login name for each user on all sites
  - Remote login via SSH (public key authentication)
- Globus accounts
  - A temporary CA for the Grid Challenge
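Consistent numeric UIDs matter wherever files are shared, since NFS (used for the shared “/home”) identifies file owners by numeric UID; across sites, one UID per login also keeps account administration uniform. A small consistency check over each site's passwd data can catch drift. The sketch below assumes the standard colon-separated /etc/passwd format, that each site's file has already been copied locally, and that only the competition accounts are checked (system accounts legitimately differ); site names and accounts are hypothetical.

```python
from typing import Dict, Iterable, Optional


def parse_passwd(text: str) -> Dict[str, int]:
    """Map login name -> UID from /etc/passwd-style lines (name:pw:uid:gid:gecos:home:shell)."""
    users: Dict[str, int] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        users[fields[0]] = int(fields[2])
    return users


def uid_mismatches(
    sites: Dict[str, Dict[str, int]], logins: Iterable[str]
) -> Dict[str, Dict[str, Optional[int]]]:
    """Report each login whose UID differs between sites or is missing on some site."""
    report: Dict[str, Dict[str, Optional[int]]] = {}
    for login in logins:
        per_site = {site: users.get(login) for site, users in sites.items()}
        if None in per_site.values() or len(set(per_site.values())) > 1:
            report[login] = per_site
    return report


site_a = parse_passwd("alice:x:1001:100::/home/alice:/bin/bash\nbob:x:1002:100::/home/bob:/bin/bash")
site_b = parse_passwd("alice:x:1001:100::/home/alice:/bin/bash\nbob:x:1003:100::/home/bob:/bin/bash")
site_c = parse_passwd("alice:x:1001:100::/home/alice:/bin/bash")
print(uid_mismatches({"A": site_a, "B": site_b, "C": site_c}, ["alice", "bob"]))
# {'bob': {'A': 1002, 'B': 1003, 'C': None}}
```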

18 Job Management
- Interactive or batch: all sites provide both environments for job execution.
- Dedicated slots: finalists are assigned dedicated slots for their application runs, under a gentlemen’s agreement.
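Dedicated slots that rely on a gentlemen’s agreement still need a shared, conflict-free schedule that everyone can consult. A minimal sketch of such a booking table follows, assuming 2.5-hour slots (the per-team length on the Compulsory (cont’d) slide) and half-open intervals [start, end), so back-to-back slots do not conflict. The class names, team names and date are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

SLOT_LENGTH = timedelta(hours=2, minutes=30)  # 2.5 h per finalist team


@dataclass(frozen=True)
class Slot:
    team: str
    start: datetime

    @property
    def end(self) -> datetime:
        return self.start + SLOT_LENGTH


class SlotBook:
    """Reservation table for dedicated slots; slots are half-open [start, end)."""

    def __init__(self) -> None:
        self._slots: List[Slot] = []

    def reserve(self, team: str, start: datetime) -> Slot:
        new = Slot(team, start)
        for s in self._slots:
            if new.start < s.end and s.start < new.end:  # the two intervals overlap
                raise ValueError(f"slot for {team} overlaps the slot of {s.team}")
        self._slots.append(new)
        return new

    def holder_at(self, t: datetime) -> Optional[str]:
        """Team holding the dedicated slot at time t, or None if the testbed is shared."""
        for s in self._slots:
            if s.start <= t < s.end:
                return s.team
        return None


book = SlotBook()
day = datetime(2000, 1, 1)  # hypothetical date
book.reserve("team-a", day.replace(hour=9))
book.reserve("team-b", day.replace(hour=11, minute=30))  # starts exactly when team-a's slot ends
try:
    book.reserve("team-c", day.replace(hour=10))
except ValueError as err:
    print(err)  # slot for team-c overlaps the slot of team-a
```

The same `holder_at` lookup is what a process monitor would need in order to tell whether a job is running inside or outside its owner's slot.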

19 Troubles …
- Computing nodes: OS hang-ups, hard disk drive failures
- Power supply: failures in balancing the power supply
- Servers: problems with NFS and the batch queueing systems
- Monitoring: trouble collecting monitoring data with Ganglia

20 Troubles … (cont’d)
- Jobs out of control: runaway jobs waste CPU/memory resources.
- Dedicated slots: jobs run beyond their slots.

21 Operational Issues
- Trouble on computing nodes
  - Monitoring tools to identify faulty computing nodes
- Power supply: a critical problem for small groups, e.g., a university lab
  - Tools for power monitoring
  - Low-power processors
  - Server redundancy

22 Operational Issues (cont’d)
- User/process management
  - Tools to control user processes
    - monitoring user processes
    - detecting unusual behavior
    - suspending/killing jobs that are out of control
  - Tools for reservation
    - reserving dedicated slots for users
    - controlling user jobs
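The process-control tools listed above could start from a periodic check like the following sketch. It parses the output of `ps -eo pid=,user=,rss=` (procps on Linux; the trailing `=` suppresses the header, and RSS is reported in KiB) and flags processes that exceed a memory limit or that belong to someone other than the current dedicated-slot holder. The policy, limit and user names are hypothetical; actually suspending a job would be a separate, deliberate step, for example sending SIGSTOP with `os.kill`.

```python
from dataclasses import dataclass
from typing import AbstractSet, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Proc:
    pid: int
    user: str
    rss_kib: int


def parse_ps(text: str) -> List[Proc]:
    """Parse the output of `ps -eo pid=,user=,rss=` (one process per line, no header)."""
    procs = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:
            procs.append(Proc(int(parts[0]), parts[1], int(parts[2])))
    return procs


def flag_runaways(
    procs: Iterable[Proc],
    slot_holder: Optional[str],
    mem_limit_kib: int,
    exempt: AbstractSet[str] = frozenset({"root"}),
) -> List[Tuple[int, str]]:
    """Return (pid, reason) for processes that break the (hypothetical) usage policy."""
    flagged = []
    for p in procs:
        if p.user in exempt:
            continue
        if p.rss_kib > mem_limit_kib:
            flagged.append((p.pid, "memory limit exceeded"))
        elif slot_holder is not None and p.user != slot_holder:
            flagged.append((p.pid, f"running during {slot_holder}'s dedicated slot"))
    return flagged


SAMPLE_PS = """    1 root      9000
  101 team-a  500000
  102 team-b   12000
  103 team-a 9000000
"""
print(flag_runaways(parse_ps(SAMPLE_PS), slot_holder="team-a", mem_limit_kib=4_000_000))
```

Reporting reasons rather than acting immediately keeps a human in the loop, which suits a testbed run by volunteers under a gentlemen’s agreement.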

23 Snapshots
[Figure: snapshots of the qualifying runs and the final runs]

24 Snapshots (cont’d)

25 Conclusions
- Grid Challenge is a programming competition to develop high-performance programs on the Grid.
  - Compulsory and free categories
- Grid testbed for Grid Challenge
  - 6 sites, 7 PC clusters, >1,200 CPUs
  - Globus, SGE, PBS, GXP, Ganglia, Ninf-G, MPICH, …
- Discussion of operational issues
  - Tools for monitoring, power supply, and user/process management

26 Acknowledgements
- Information Processing Society of Japan
- Sun Microsystems
- Soum Corporation
- Grid Consortium Japan

27 Thank you.

