Grid Canada Testbed using HEP applications


Grid Canada Testbed using HEP applications
Randall Sobie
A. Agarwal, J. Allan, M. Benning, G. Hicks, R. Impey, R. Kowalewski, G. Mateescu, D. Quesnel, G. Smecher, D. Vanderster, I. Zwiers
Institute for Particle Physics, University of Victoria / National Research Council of Canada / CANARIE / BC Ministry for Management Services

Outline: Introduction, Grid Canada Testbed, HEP Applications, Results, Conclusions

Introduction
- Learn to establish and maintain an operating Grid in Canada
- Learn how to run our particle physics applications on the Grid:
  - BaBar simulation
  - ATLAS data challenge simulation
- Significant computational resources are being installed in Canada on the condition that 20% of the capacity is shared
- Exploit the computational resources available at both HEP and non-HEP sites without installing application-specific software at each site

Grid Canada
- Grid Canada was established to foster Grid research in Canada
- Sponsored by CANARIE, the C3.ca Association and the National Research Council of Canada
- Activities:
  - Operates the Canadian Certificate Authority
  - HPC Grid testbed for parallel applications
  - Linux Grid testbed
  - High-speed network projects (TRIUMF-CERN 1 TB file transfer demo at iGrid)

Grid Canada Linux Testbed
- 12 sites across Canada (plus 1 in Colorado)
- 1-8 nodes per site (a mixture of single machines and clusters)
- Network connectivity of 10-100 Mbps from each site to Victoria
- Servers run Globus 2.0 or 2.2 and OpenAFS
- AFS and Objectivity servers located in Victoria
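The testbed middleware is just Globus 2.x plus OpenAFS, so checking that a site is usable largely comes down to verifying that an authenticated job can be started through its gatekeeper. Below is a minimal sketch of such a check, assuming the Globus Toolkit 2.x client tools and a valid grid proxy; the hostnames are placeholders, not the actual Grid Canada sites.

```python
#!/usr/bin/env python
# Rough sketch (not from the talk): probe each testbed gatekeeper to confirm
# authentication and remote job start-up via the Globus 2.x "fork" job manager.
# Hostnames below are placeholders, not the real Grid Canada sites.
import subprocess

GATEKEEPERS = [
    "gc-node.example-uvic.ca",      # Victoria (placeholder)
    "gc-node.example-triumf.ca",    # TRIUMF (placeholder)
    "gc-node.example-nrc.ca",       # NRC (placeholder)
]

def probe(gatekeeper):
    """Run a trivial command remotely; assumes a valid grid proxy
    (grid-proxy-init) and Globus Toolkit 2.x client tools on the PATH."""
    result = subprocess.run(
        ["globus-job-run", gatekeeper, "/bin/hostname"],
        capture_output=True, text=True, timeout=120,
    )
    status = "OK" if result.returncode == 0 else "FAILED"
    print(f"{gatekeeper:35s} {status}  {result.stdout.strip()}")

if __name__ == "__main__":
    for gk in GATEKEEPERS:
        probe(gk)
```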

HEP Simulation Applications
- Processing chain: event generation → detector simulation → reconstruction → background event injection
- Simulation of event data is done similarly in all HEP experiments; each step is generally a separate job
- BaBar uses an Objectivity DB for the event store
- The ATLAS data challenge (DC1) uses Zebra
- Neither application is optimized for a wide-area Grid
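Since each stage of the chain runs as a separate job that consumes the previous stage's output file, a driver for the chain can be as simple as the sketch below; the executable names, options and file names are placeholders rather than the real BaBar or ATLAS binaries.

```python
#!/usr/bin/env python
# Illustrative sketch only: the four simulation stages run as separate jobs,
# each consuming the previous stage's output file.  Executable and file names
# are placeholders, not the actual experiment software.
import subprocess

STAGES = [
    ("generate",       ["evtgen.exe", "--nevents", "500", "--out", "gen.dat"]),
    ("simulate",       ["detsim.exe", "--in", "gen.dat", "--out", "sim.dat"]),
    ("reconstruct",    ["recon.exe",  "--in", "sim.dat", "--out", "rec.dat"]),
    ("add-background", ["mixbkg.exe", "--in", "rec.dat", "--out", "final.dat"]),
]

for name, cmd in STAGES:
    print(f"running stage: {name}")
    subprocess.run(cmd, check=True)   # in production each stage is its own grid job
```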

Objectivity DB Application
- Three parts to the job: event generation, detector simulation and reconstruction
- 4 hours for 500 events on a 450 MHz CPU
- 1-day tests consisted of 90-100 jobs (~50,000 events) using 1000 SI95
- Processing nodes access the AFS server (software, logs) and the Objectivity server (data, conditions)
- Network latencies ~100 ms, with ~100 Objectivity contacts per event

Results
- A series of 1-day tests of the entire testbed using 8-10 sites
- 80-90% success rate for jobs
- Objectivity DB lock problems: container creation requires a global lock; used a "lock-monitor" and a "lock-cleaner" at the end of the job
- AFS worked well, but network demand was high during startup (some crashes due to inability to access AFS)

- Efficiency was low at distant sites: frequent DB access for reading/writing data over ~80 ms latencies
- Next steps?
  - Fix the application so it accesses the DB less frequently
  - Install multiple Objectivity servers at different sites
  - However, HEP appears to be moving away from Objectivity
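A back-of-envelope estimate using the numbers quoted on the earlier slides suggests why the efficiency drops at distant sites: if each of the ~100 Objectivity contacts per event costs one ~80 ms round trip and none of it overlaps with computation, the latency alone adds about 8 s to the ~29 s of CPU time per event. This is my own rough estimate, not a figure from the talk.

```python
# Back-of-envelope estimate (my own, using the numbers quoted on the
# previous slides; treat it as illustrative only).
cpu_per_event_s    = 4 * 3600 / 500   # 4 hours for 500 events -> ~28.8 s/event
contacts_per_event = 100              # Objectivity contacts per event
rtt_s              = 0.080            # ~80 ms latency to a distant site

latency_per_event_s = contacts_per_event * rtt_s            # ~8 s/event
efficiency = cpu_per_event_s / (cpu_per_event_s + latency_per_event_s)
print(f"estimated CPU efficiency at a distant site: {efficiency:.0%}")  # ~78%
```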

Typical HEP Application
- Input events and output are read/written as standard files (e.g. Zebra, ROOT)
- Software is accessed via AFS from the Victoria server; no application-dependent software at the host sites
- We explored three operating scenarios:
  1. AFS for reading and writing data
  2. GridFTP input data to the site, then write output via AFS
  3. GridFTP both input and output data

Scenario 1: AFS for reading and writing data
- AFS is the easiest way to run the application over the Grid, but its performance was poor, as noted by many groups
- In particular, frequent reading of input data via AFS was slow: remote CPU utilization < 5%

Scenario 2: GridFTP input data to the site, write output via AFS
- AFS caches the output on local disk and then transfers it to the server
- AFS transfer speeds were close to single-stream FTP
- Neither scenario was considered optimal for production over the Grid

Scenario 3: GridFTP both input and output data (software via AFS)
- AFS used to access the static executable (400 MB) and for log files
- GridFTP used for tarred and compressed input and output files:
  - input 2.7 GB (1.2 GB compressed)
  - output 2.1 GB (0.8 GB compressed)
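A per-job wrapper for this scenario would stage the compressed input in with globus-url-copy, run the AFS-resident executable, and stage the compressed output back out. The sketch below is a reconstruction under those assumptions, with placeholder hostnames and paths rather than the actual production script.

```python
#!/usr/bin/env python
# Sketch of a per-job wrapper for scenario 3 (my reconstruction, not the
# actual production script).  Hostnames and paths are placeholders; assumes
# a valid grid proxy and the Globus 2.x globus-url-copy client.
import subprocess
import tarfile

DATA_SERVER = "gsiftp://data.example-uvic.ca/gc"     # placeholder GridFTP server
AFS_EXE     = "/afs/example.ca/hep/dc1/atlsim.exe"   # placeholder AFS path to the 400 MB executable

def gridftp(src, dst):
    subprocess.run(["globus-url-copy", src, dst], check=True)

# 1. stage in the compressed input (~1.2 GB) and unpack it
gridftp(f"{DATA_SERVER}/input/job001.tar.gz", "file:///tmp/job001.tar.gz")
with tarfile.open("/tmp/job001.tar.gz") as tar:
    tar.extractall("/tmp/job001")

# 2. run the executable directly from AFS (logs are also written back via AFS)
subprocess.run([AFS_EXE, "--input", "/tmp/job001", "--output", "/tmp/job001.out"], check=True)

# 3. pack and stage out the compressed output (~0.8 GB)
with tarfile.open("/tmp/job001.out.tar.gz", "w:gz") as tar:
    tar.add("/tmp/job001.out")
gridftp("file:///tmp/job001.out.tar.gz", f"{DATA_SERVER}/output/job001.out.tar.gz")
```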

Results
- We have run this application over a subset of the Grid Canada testbed, with machines located locally, 1500 km away and 3000 km away
- We use a single application that executes quickly (ideal for Grid tests)
- Typical times for running the application at a site 3000 km away were measured

Network and local CPU utilization
- Network traffic on the GridFTP machine for a single application: typical transfer rates ~30 Mbit/s
- Network traffic on the AFS server: little demand on AFS
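At ~30 Mbit/s, moving the ~2 GB of compressed input and output for one job takes on the order of ten minutes, which is small compared with the hours of CPU time per job. The arithmetic below is my own check, not a number from the slides.

```python
# Rough transfer-time check (my own arithmetic, not from the slides).
rate_mbit_s = 30
input_gb, output_gb = 1.2, 0.8            # compressed sizes from the earlier slide

total_s = (input_gb + output_gb) * 8 * 1000 / rate_mbit_s   # GB -> Mbit, then seconds
print(f"~{total_s/60:.0f} minutes of transfer per job")     # roughly 9 minutes
```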

- The plan is to run multiple jobs at all sites on the GC testbed
- Jobs are staggered to reduce the initial I/O demand; normally jobs would read different input files
- We do not see any degradation in CPU utilization due to AFS; it may become an issue with more machines (we are running 2 AFS servers)
- We could improve AFS utilization by running a mirrored remote site
- We may become network-limited as the number of applications increases

Success?
- This is a mode of operation that could work
- CPU efficiency at remote sites appears to be 80-100% (not limited by AFS)
- The data transfer rate is (obviously) limited by the network capacity
- We can run our HEP applications with nothing more than Linux, Globus and an AFS client
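One simple way to stagger the submissions is to insert a fixed delay between them, as in the sketch below. It assumes the Globus 2.x batch submitter globus-job-submit; the site names, wrapper script and delay are placeholders.

```python
#!/usr/bin/env python
# Sketch only (placeholder hosts/paths): stagger job submissions across the
# testbed sites so the initial GridFTP/AFS stage-in does not hit the network
# all at once.
import subprocess
import time

SITES = ["gc-a.example.ca", "gc-b.example.ca", "gc-c.example.ca"]   # placeholders
WRAPPER = "/afs/example.ca/hep/bin/run_job.sh"                      # placeholder job wrapper
STAGGER_S = 300   # wait 5 minutes between submissions to spread out stage-in traffic

contacts = []
for i, site in enumerate(SITES):
    result = subprocess.run(
        ["globus-job-submit", site, WRAPPER, f"job{i:03d}"],
        capture_output=True, text=True, check=True,
    )
    contacts.append(result.stdout.strip())   # job contact string for later status checks
    print(f"submitted job{i:03d} to {site}")
    time.sleep(STAGGER_S)

print("job contacts:", contacts)
```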

Next Steps
- We have been installing large new computational and storage facilities, both shared and dedicated to HEP, as well as a new high-speed network
- We believe we understand the basic issues in running a Grid, but there is lots to do:
  - we do not run a resource broker
  - error and fault detection is minimal or non-existent
  - our applications could be better tuned to run over the Grid testbed
- The next step will likely involve fewer sites but more CPUs, with the goal of building a more production-type facility

Summary
- The Grid Canada testbed has been used to run HEP applications at non-HEP sites
  - Requires only Globus and an AFS client on the remote Linux CPUs
  - Input/output data transferred via GridFTP; software accessed via AFS
- Continuing to test our applications at a large number of widely distributed sites
- Scaling issues have not been a problem so far, but we are still using relatively few resources (10-20 CPUs)
- Plan to utilize the new computational and storage resources, together with the new CANARIE network, to develop a production Grid
- Thanks to the many people who have established and worked on the GC testbed and/or provided access to their resources