Experiences through Grid Challenge Event
Yoshio Tanaka
National Institute of Advanced Industrial Science and Technology

Grid Challenge
- A competition for programming on a Grid
- Main objectives
  - For participants: to provide an opportunity to use a real Grid
  - For us:
    - To understand the obstacles/problems in making a Grid production level (1000 CPUs shared by many users)
    - To have an opportunity to encourage participants to use our software (e.g. Ninf-G, GXP)
- 30 students/graduates participated in this event
- A 960-CPU testbed was provided for participants
- Schedule
  - Preliminary: Feb. 1 ~ Feb. 28
  - Final round: Mar. 5 ~ Mar. 20

Grid Challenge
- Two categories
  - Regular routine
    - A problem is provided: graphic image analysis (count the number of objects)
    - Ranked by performance, i.e. which program is the fastest?
  - Free routine
    - Participants can do anything interesting
    - Participants gain experience running their own software on a real Grid

Software
- Software provided by the organizer
  - ssh
  - GXP
  - GT2 & batch & jobmanager
  - MPICH (p4)
  - Ninf-G2
- Other software can be installed by participants

Contributed resources

Site               #nodes/#cpus  IP addresses  Administered by
TITECH / Matsuoka  100/200       Public        Tanaka-san
TITECH / Aida      30/60         Private       Prof. Aida + students
Tokushima U.       50/100        Private       Prof. Ono + students
U. Tsukuba         20/40         Public        students
UEC                50/100        Private       AIST Support
U. Tokyo           40/40         Public        AIST Support
U. Tokyo           63/126        Private       AIST Support
U. Tokyo           107/214       Public        AIST Support
AIST               40/80         Public
Total              500/960

Preparation (~ Feb. 1)
- Administrators installed the software at every site
- Participants sent their ssh public keys
- Administrators created accounts for all participants
- Participants tested each cluster
  - login
  - compile
  - test run
- Participants obtained Globus certificates from the AIST GTRC CA (if necessary)
- Participants sent their Subject DNs, and administrators added the corresponding entries to the grid-mapfile
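
The last preparation step, adding each participant's Subject DN to the grid-mapfile, could be scripted roughly as follows. This is a minimal sketch, assuming the usual grid-mapfile layout of one quoted subject DN followed by a local account name per line. The helper names (`gridmap_entry`, `add_entries`), the DN, and the account name are hypothetical and are not taken from the event.

```python
def gridmap_entry(subject_dn: str, local_user: str) -> str:
    """Format one grid-mapfile line: quoted subject DN, then the local account."""
    if '"' in subject_dn:
        raise ValueError("subject DN must not contain double quotes")
    if not local_user or any(c.isspace() for c in local_user):
        raise ValueError("local account name must be a single token")
    return f'"{subject_dn}" {local_user}'


def add_entries(existing_lines, new_pairs):
    """Return the existing lines plus entries for DNs that are not mapped yet."""
    # The DN is the text between the first pair of double quotes on each line.
    known = {line.split('"')[1] for line in existing_lines if line.startswith('"')}
    result = list(existing_lines)
    for dn, user in new_pairs:
        if dn not in known:
            result.append(gridmap_entry(dn, user))
            known.add(dn)
    return result


if __name__ == "__main__":
    # Hypothetical DN and account name, for illustration only.
    current = []
    updated = add_entries(current, [("/C=JP/O=Example/CN=Participant One", "gc01")])
    print("\n".join(updated))
```

Skipping DNs that are already present keeps the file free of duplicate mappings when the same participant list is applied at several sites.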

Preparation (~ Feb. 1) (cont'd)
- AIST provided
  - A document for obtaining a Globus certificate
  - A test script for Globus
  - A how-to document and sample programs for Ninf-G2
    - How to develop Ninf-G apps step by step
      - Obtain a certificate
      - Test Globus
      - Develop and run Ninf-G apps
  - A client configuration file for the Grid Challenge environment

Problems
- 30 participants shared 960 CPUs for one month
  - Some used ssh for process invocation
  - Some used GXP for process invocation
  - Some used Ninf-G2 for process invocation
- We had to take care of (a lot of) troubleshooting
  - Some nodes went down
  - The PBS daemon died
  - Students usually ran their experiments around midnight
- Interactive use of backend nodes (via ssh/GXP) was allowed
  - F32 prohibits interactive use
  - AIST could not provide F32

Problems (cont'd)
- Participants expected that all processes would be launched immediately (co-allocation)
  - ssh/GXP makes this possible
  - With Ninf-G2, this could not be expected
- To keep things fair, we decided to change the configuration of the batch queuing system
  - For each processor, set the max number of processes per user to 1
  - Increased the max number of processes per processor to the number of participants (30)
  - This is an unusual configuration!!
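
The effect of the two limits described on this slide can be illustrated with a toy admission model. This is only a sketch under simplifying assumptions: `Processor`, `try_start`, and `launch_everywhere` are hypothetical names, not part of PBS or any other batch system, and the model ignores queues, priorities, and job completion.

```python
from collections import Counter

PARTICIPANTS = 30  # number of participants stated on the slide


class Processor:
    """Toy model of one processor's admission limits (not a real batch-system API)."""

    def __init__(self, max_per_user=1, max_total=PARTICIPANTS):
        self.max_per_user = max_per_user
        self.max_total = max_total
        self.running = Counter()  # user -> number of running processes

    def try_start(self, user):
        """Start one process for `user` if both limits allow it."""
        if self.running[user] >= self.max_per_user:
            return False  # per-user limit reached on this processor
        if sum(self.running.values()) >= self.max_total:
            return False  # processor-wide limit reached
        self.running[user] += 1
        return True


def launch_everywhere(processors, user):
    """Count on how many processors `user` could start a process right away."""
    return sum(p.try_start(user) for p in processors)


if __name__ == "__main__":
    procs = [Processor() for _ in range(4)]
    for u in range(PARTICIPANTS):
        started = launch_everywhere(procs, f"user{u}")
        print(f"user{u}: started on {started} of {len(procs)} processors")
```

In this model every participant can start one process on every processor immediately, because 30 users with one process each never exceed the per-processor limit of 30. The price is that processors are time-shared rather than dedicated to one job. Setting `max_total=1` instead makes a second user's request fail until the first process finishes.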

Insights valuable for PRAGMA
- Mixing batch and interactive use introduces a problem
  - Batch is expected to provide a dedicated environment and load balancing
  - Interactive use (via ssh) may disturb batch jobs
  - But some middleware/apps require interactive use
- Co-allocation / grid-level scheduling is a hard problem to solve
  - (Basically) applications should not expect all resources to be available
  - Application developers need extra work for this feature
- Possible solutions
  - Make applications capable of using only the available resources (as-is strategy)
  - Implement co-allocation based on reservation
    - No grid-level reservation system yet
    - Should be done manually
- Do we have the same problem in PRAGMA routine-basis experiments?
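
The as-is strategy listed under possible solutions could look roughly like the sketch below: the application probes its candidate workers, keeps the ones that respond, and spreads the tasks over those only. All names here (`available_workers`, `assign_tasks`, the `probe` callable, the site names) are hypothetical. A real application would replace `probe` with an actual check, for example a test connection or a trial job submission.

```python
def available_workers(candidates, probe):
    """Keep only the workers whose probe succeeds; failures are skipped, not fatal."""
    usable = []
    for worker in candidates:
        try:
            if probe(worker):
                usable.append(worker)
        except OSError:
            # An unreachable worker is treated as unavailable.
            pass
    return usable


def assign_tasks(tasks, workers):
    """Distribute tasks round-robin over whatever workers are available."""
    if not workers:
        raise RuntimeError("no workers available")
    plan = {worker: [] for worker in workers}
    for index, task in enumerate(tasks):
        plan[workers[index % len(workers)]].append(task)
    return plan


if __name__ == "__main__":
    # Hypothetical site names; the probe pretends that "siteB" is down.
    sites = ["siteA", "siteB", "siteC"]
    up = available_workers(sites, probe=lambda name: name != "siteB")
    print(assign_tasks(list(range(5)), up))
```

The point of the design is that a missing resource only reduces parallelism. The application does not wait for all resources to become free at once, so it does not depend on co-allocation.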