Evolution of a High Performance Computing and Monitoring system onto the GRID for High Energy Experiments T.L. Hsieh, S. Hou, P.K. Teng Academia Sinica,

Evolution of a High Performance Computing and Monitoring system onto the GRID for High Energy Experiments
T.L. Hsieh, S. Hou, P.K. Teng
Academia Sinica, Taipei, Taiwan
The 9th International Conference on High Performance Computing, Grid and e-Science in Asia Pacific, September 9-12, 2007, Seoul, Korea

A distributed computing facility has been developed at Academia Sinica for remote data processing of high energy physics experiments. We first developed a dCAF (de-centralized CDF Analysis Farm) for the Collider Detector at Fermilab experiment. It provides a "portal" to users, with a single submission prototype to access a dozen dCAFs in Pacific Asia, Europe and North America.

[Figure: dCAF Computing Model. Labels: CDF user, job, submitter, monitor, Condor, worker nodes, CDFSoft2. Note: software distribution by NFS.]

The dCAF service includes
Submitter: accepts a user's job, which contains a task archive (tarball), and submits it to the local batch system (Condor is in use);
Monitor: offers limited access to the job scratch area, and web browser services for job information.
Users can
1. get job status, list user jobs, and show the processes of a job
2. remove, hold, release jobs
3. display the files of a job.

The customized dCAF has been upgraded to become the PacificCAF, which is capable of allocating resources on LCG (LHC Computing GRID) and OSG (Open Science GRID). It is a regional resource collector that has
a Glide-in Condor pool with CPUs of joined GRID sites in the Pacific Asia region;
Generic Connection Brokering (GCB) to nodes in GRID sites protected by firewalls;
a PARROT HTTP service for distribution of dedicated CDF software.

[Figure: CDF GRID Computing. Labels: CDF users, job, PacificCAF, submitter, monitor, scheduler, Glide-ins, Condor, HTTP proxy, parrot, CDFSoft2, fcp, firewall, CE, Grid nodes, IPAS_OSG, Taiwan-IPAS-LCG2. Notes: CAF is also a portal for users; resources are shared with others; acquire resources from grid if needed; jobs are self-contained.]

We are also developing Tier services for applications of the ATLAS experiment on the LCG. A coherent computing service model is constructed to share GRID resources among users of high energy experiments. ATLAS users submit jobs to a Condor-G platform with a dispatcher server, which acts as a resource collector for a Condor pool of joined LCG sites.

[Figure: ATLAS GRID Computing. Labels: ATLAS users, ATLAS job, Dispatcher, AtlasSoft, submitter, Glide-ins, firewall, CE, SE, Grid nodes, Taiwan-IPAS-LCG2, Taiwan-LCG2. Note: resources are shared with others.]

In the migration to GRID, we have integrated computing clusters into a Condor system of over 250 CPUs for local users and for GRID access to LCG and OSG. The Condor system has multiple submission nodes. Some act as User Interface (UI) nodes and some as Computing Element (CE) gatekeepers. We have a 2 Gb link to the Taiwan LHC Tier-1 GRID site, several Gb links to sites in the Pacific Asia region, and a total of 10 Gb to the US and Europe.

[Figure: Integrated resources and monitoring. Labels: CDF UI, ATLAS UI, other UIs, Condor, WNs, middlewares, Conglia, web browser, Pacific CAF, IPAS_OSG, ASGC_OSG, Taiwan-IPAS-LCG2.]

A monitoring system, Conglia, has been developed as an interface to Condor. It is a web-browsing system that illustrates job status, and it is particularly useful for debugging jobs and system errors by tracing the progress of the jobs. Commands in a running section are printed, and the CPU history is shown in graphs.
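
The user operations listed for the dCAF service (job status, job listing, remove, hold, release, file display) can be illustrated with a small state model. What follows is only a minimal sketch in Python, not the dCAF or Conglia implementation: the class and method names (ToyCAF, JobState, start, and so on) are hypothetical, the jobs live in memory rather than in Condor, and the state names only loosely mirror the idle, running, held and removed states that Condor reports for jobs.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class JobState(Enum):
    # Assumed states, loosely modelled on Condor job states.
    IDLE = "idle"
    RUNNING = "running"
    HELD = "held"
    REMOVED = "removed"


@dataclass
class Job:
    job_id: int
    owner: str
    tarball: str  # name of the user's task archive
    state: JobState = JobState.IDLE
    files: List[str] = field(default_factory=list)


class ToyCAF:
    """Hypothetical in-memory stand-in for a submitter plus monitor."""

    def __init__(self) -> None:
        self._jobs: Dict[int, Job] = {}
        self._next_id = 1

    # Submitter role: accept a job that carries a tarball.
    def submit(self, owner: str, tarball: str) -> int:
        job = Job(self._next_id, owner, tarball, files=[tarball])
        self._jobs[job.job_id] = job
        self._next_id += 1
        return job.job_id

    # Stand-in for the batch scheduler starting an idle job.
    def start(self, job_id: int) -> None:
        self._move(job_id, {JobState.IDLE}, JobState.RUNNING)

    # Monitor role, item 1: job status and per-user job list.
    def status(self, job_id: int) -> JobState:
        return self._jobs[job_id].state

    def list_jobs(self, owner: str) -> List[int]:
        return [j.job_id for j in self._jobs.values()
                if j.owner == owner and j.state is not JobState.REMOVED]

    # Monitor role, item 2: remove, hold, release.
    def hold(self, job_id: int) -> None:
        self._move(job_id, {JobState.IDLE, JobState.RUNNING}, JobState.HELD)

    def release(self, job_id: int) -> None:
        self._move(job_id, {JobState.HELD}, JobState.IDLE)

    def remove(self, job_id: int) -> None:
        allowed = {JobState.IDLE, JobState.RUNNING, JobState.HELD}
        self._move(job_id, allowed, JobState.REMOVED)

    # Monitor role, item 3: display the files of a job.
    def files(self, job_id: int) -> List[str]:
        return list(self._jobs[job_id].files)

    def _move(self, job_id: int, allowed: set, target: JobState) -> None:
        job = self._jobs[job_id]
        if job.state not in allowed:
            raise ValueError(f"job {job_id} is {job.state.value}; "
                             f"cannot become {target.value}")
        job.state = target


caf = ToyCAF()
jid = caf.submit("cdfuser", "analysis.tgz")
caf.hold(jid)
caf.release(jid)
print(jid, caf.status(jid).value, caf.files(jid))
```

Keeping the allowed transitions in one helper makes invalid requests, such as releasing a job that is not held, fail loudly instead of silently changing state. In a real deployment these requests would be forwarded to the batch system rather than handled in memory.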

