
1 GridX1: A Canadian Computational Grid for HEP Applications
Ian Gable, University of Victoria/HEPnet Canada
A. Agarwal, P. Armstrong, M. Ahmed, B.L. Caron, A. Charbonneau, R. Desmarais, I. Gable, L.S. Groer, R. Haria, R. Impey, L. Klektau, C. Lindsay, G. Mateescu, Q. Matthews, A. Norton, W. Podaima, S. Popov, D. Quesnel, S. Ramage, R. Simmonds, R.J. Sobie, B. St. Arnaud, D.C. Vanderster, M. Vetterli, R. Walker
CANARIE Inc., Ottawa, Ontario, Canada; Institute of Particle Physics of Canada; National Research Council, Ottawa, Ontario, Canada; TRIUMF, Vancouver, British Columbia, Canada; University of Alberta, Edmonton, Canada; University of Calgary, Calgary, Canada; Simon Fraser University, Burnaby, British Columbia, Canada; University of Toronto, Toronto, Ontario, Canada; University of Victoria, Victoria, British Columbia, Canada

2 Overview
Motivation
The GridX1 framework
– Middleware, metascheduling, monitoring
User applications
– BaBar and ATLAS
Web services for GridX1

3 Motivation
Particle physics (HEP) simulations are "embarrassingly parallel": many independent instances of serial (integer) jobs.
We want to exploit unused cycles at non-HEP sites:
– Support dedicated and shared facilities
– Each shared facility may have unique configuration requirements
– Minimal software demands on sites
We want to develop a general grid:
– Open to other applications (serial, integer)
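Because each simulation job is independent, throughput scales with the number of CPUs and no inter-job communication is needed. A minimal sketch of this pattern in plain Python (the `run_job` function is a hypothetical stand-in for one serial simulation job, not GridX1 code):

```python
from multiprocessing import Pool

def run_job(seed):
    # Hypothetical stand-in for one serial (integer) simulation job;
    # each instance runs independently with its own seed.
    return sum((seed * i) % 7 for i in range(100))

if __name__ == "__main__":
    # "Embarrassingly parallel": jobs never communicate, so they can
    # be farmed out to any pool of workers (here, local processes).
    with Pool(4) as pool:
        results = pool.map(run_job, range(8))
    print(len(results))
```

On a grid, the pool of local processes is replaced by batch slots at remote sites, but the independence property is the same.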

4 The GridX1 Resources
GridX1 has used 8 shared clusters: Alberta (2), NRC Ottawa, WestGrid, Victoria (2), McGill, Toronto.
Total resources exceed 2500 CPUs, 100 TB disk, and 400 TB tape.
Site requirements:
– OS: Red Hat Enterprise Linux, Scientific Linux, CentOS, SUSE Linux
– LRMS: PBS or Condor batch system
– Network: external network access needed for worker nodes; most sites have 1 Gbit/s connectivity

5 The GridX1 Infrastructure
Grid middleware:
– Virtual Data Toolkit (VDT): packaged version of Globus Toolkit 2.4; VDT is more stable than vanilla GT2
– We are evaluating GT4 and web services (more on this later)
Security and user management:
– GridX1 hosts require an X.509 certificate issued by the Grid Canada Certificate Authority
– User certificates from trusted CAs around the world are accepted
– Authorization is managed at the site level in a grid-mapfile: user certificates are mapped to local unix accounts
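A grid-mapfile is a plain-text file whose lines pair a quoted certificate DN with a local account name. A minimal parser sketch (the DNs and account names below are invented examples, not real GridX1 entries):

```python
def parse_gridmap(text):
    """Parse grid-mapfile lines of the form: "/DN with spaces" account."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The DN is quoted because it contains spaces;
        # the local account name follows after the last space.
        dn, _, account = line.rpartition(" ")
        mapping[dn.strip('"')] = account
    return mapping

sample = '''
# Example entries (invented DNs)
"/C=CA/O=Grid/O=UVic/CN=Alice Example" atlas001
"/C=CA/O=Grid/O=NRC/CN=Bob Example" babar002
'''
users = parse_gridmap(sample)
print(users["/C=CA/O=Grid/O=UVic/CN=Alice Example"])
```

At job submission time, the gatekeeper looks up the authenticated certificate's DN in this mapping and runs the job under the corresponding unix account.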

6 GridX1 Resource Brokering
We use Condor-G for resource brokering; it is flexible and scalable:
– Collector: accepts resource advertisements from clusters
– Scheduler: queues jobs and submits them to resources
– Negotiator: performs matchmaking between tasks and resources
Jobs specify Rank and Requirements, e.g. Rank = -EstimatedWaitTime; Requirements: OS == Linux
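The matchmaking idea can be illustrated with a toy sketch in plain Python (this is not the Condor Negotiator, and the site names and attributes are invented): Requirements act as a boolean filter over advertised resources, and Rank orders the survivors, so `Rank = -EstimatedWaitTime` prefers the shortest queue.

```python
def matchmake(job, resources):
    # Keep only resources satisfying the job's Requirements,
    # then pick the one with the highest Rank.
    eligible = [r for r in resources if job["requirements"](r)]
    if not eligible:
        return None
    return max(eligible, key=job["rank"])

resources = [
    {"name": "siteA", "os": "Linux", "wait": 120},
    {"name": "siteB", "os": "Linux", "wait": 30},
    {"name": "siteC", "os": "Solaris", "wait": 5},
]
job = {
    # Requirements: OS == Linux (excludes siteC despite its short queue)
    "requirements": lambda r: r["os"] == "Linux",
    # Rank = -EstimatedWaitTime (higher rank = shorter wait)
    "rank": lambda r: -r["wait"],
}
best = matchmake(job, resources)
print(best["name"])
```

Note that siteC is never considered even though its queue is shortest: Requirements are a hard constraint, while Rank is only a preference among eligible sites.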

7 Condor-G: A Scalable Metascheduler
The system scales: to increase job throughput, we add a Condor scheduler.
(Diagrams: Condor-G system for BaBar; Condor-G system for ATLAS)

8 Condor-G Adapted for ATLAS
We have had success with Condor-G on GridX1. These techniques were applied to build a Condor-G executor to submit jobs to ATLAS LCG sites:
1. Site information is extracted from the BDII and converted to ClassAds
2. The Condor-G executor running at UVic extracts jobs from the ATLAS Prodsys DB and submits them to Condor-G
3. Condor matchmaking matches jobs to ATLAS and Canadian sites
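Step 1 above can be sketched as follows: site attributes pulled from an information system are rendered as a ClassAd-style text block that the matchmaker can consume. This is an illustrative sketch only; the attribute names below are invented and do not follow the actual GLUE schema.

```python
def site_to_classad(site):
    # Render one site's attributes as a ClassAd-style text block:
    # strings are quoted, numbers are emitted bare.
    lines = ["["]
    for key, value in site.items():
        if isinstance(value, str):
            lines.append(f'  {key} = "{value}";')
        else:
            lines.append(f"  {key} = {value};")
    lines.append("]")
    return "\n".join(lines)

# Invented example attributes for one site.
site = {"Name": "UVic", "OpSys": "Linux", "FreeCPUs": 42}
ad = site_to_classad(site)
print(ad)
```

In the real system this translation runs periodically against the BDII, so the advertised ClassAds track the current state of each site.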

9 GridX1 Monitoring
GridX1 is monitored using a Google Maps mashup.

10 GridX1 Monitoring
A web-based dynamic resource monitor that employs Web 2.0/AJAX techniques.
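A common shape for such a monitor is a server-side endpoint that serializes per-site status as JSON, which the AJAX front end polls and plots on the map. A minimal sketch of the server side (site names, coordinates, and job counts are invented, and this is not the actual GridX1 monitor code):

```python
import json

def build_status(sites):
    # Serialize per-site status as a JSON document; a map mashup
    # front end could poll this periodically via XMLHttpRequest.
    return json.dumps({"sites": sites}, indent=2)

# Invented example data for two sites.
sites = [
    {"name": "UVic", "lat": 48.46, "lon": -123.31, "running": 120},
    {"name": "TRIUMF", "lat": 49.25, "lon": -123.23, "running": 85},
]
doc = build_status(sites)
print(len(json.loads(doc)["sites"]))
```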

11 Applications: ATLAS Status 2004–2005
GridX1 was used by the ATLAS experiment via the LCG-TRIUMF gateway.
Over 20,000 ATLAS jobs were successfully completed.
The job success rate was similar to LCG's (50%).

12 Applications: ATLAS Status 2006
Currently many GridX1 sites receive jobs directly from the ATLAS LCG Condor-G executor:
– HEP clusters are being commissioned as ATLAS Tier-2 sites and are linking directly to the LCG
– Non-HEP clusters will be connected using an interface
An ATLAS Tier-1 centre is being built at TRIUMF:
– A 10G lightpath link from SURFnet, connecting CERN to the Tier-1 centre at TRIUMF, is to be handed over to CANARIE on November 1
– 1G lightpaths are currently being established from the University of Toronto and UVic to TRIUMF

13 Applications: ATLAS Future Plans
Effort will be focused on recommissioning a GridX1 interface to facilitate the addition of non-HEP sites:
– Non-LCG resources are integrated into the LCG without all of the LCG middleware
– This greatly simplifies the management of shared resources
Virtual machines such as Xen can be used to simplify the requirements at non-HEP sites.
CHEP 2006 paper: "Evaluation of Virtual Machines for HEP Grids"
– We showed that the ATLAS kit validation suffered a negligible performance penalty when run on a Xen virtual machine
– We plan to research deploying pre-packaged ATLAS and BaBar images to GridX1 sites

14 Applications: BaBar Status
(Monthly successful job output is plotted at bottom.)
– GridX1 production has peaked at 30,000 jobs per month
GridX1 provides ~50% of total Canadian BaBar production:
– ~15% of global production
– We plan to move all Canadian BaBar production to GridX1

15 Current Development: Exploring SOA Grid
Investigating service-oriented grid middleware:
– Targeted metascheduler and registry services
Deployed a GT4 testbed at UVic and NRC:
– Metascheduler service, based on Condor-G
– Registry service, WS-MDS

16 A Metascheduler Service Based on Condor-G
(Architecture diagram: Condor-G job manager; GT4 Condor-G JobManager; MDS; ClassAd extraction tool information provider; GLUE CE schema with required Condor-G extensions)

17 Summary
GridX1 is built upon proven technologies: VDT and Condor-G.
GridX1 allows us to exploit unused resources at HEP and non-HEP sites.
A dynamic grid monitor is available at http://monitor.gridx1.ca/
GridX1 usage by ATLAS and BaBar applications has been successful:
– Used for ATLAS DC2 during July 2004 – June 2005
– Receiving jobs from the ATLAS executor in 2006
– ~1000 BaBar jobs run daily
We are moving towards a web-services-based architecture.

