SAM-Grid Status
- Core SAM development
- SAM-Grid architecture
- Progress
- Future work

Core SAM development

SAM is a production system:
- 300 active users
- 60,000 file replicas
- 5,000 files/day cache turnover (1 TB cache)

A fine-tuning example, the Friday afternoon opportunity:
- Many users submit several projects for the weekend.
- The station has a project limit, O(Ncpus).
- Projects can be queued, but then how do we keep the required data in cache?
- Parallelisation and re-education: N processes per project, not N projects with 1 process each.

Physicists always cheat… SAM helps

Multi-Process projects

[Diagram: one Project Manager serving N Processes.]

Together, the processes see each file once. Each process is simple (a sketch of the loop follows):
- Asks: "Give me a file"
- Responds to: "Here's the path" / "Hang on" / "None left"
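
A minimal sketch of the consumer loop, in Python; the project-manager interface (get_next_file and the three reply codes) is a hypothetical stand-in for the real SAM station protocol described above.

    import time

    # hypothetical reply codes from the project manager
    HERES_THE_PATH = "path"   # a file path is returned
    HANG_ON        = "wait"   # not staged yet, ask again later
    NONE_LEFT      = "done"   # every file in the project has been handed out

    def consumer_loop(project_manager, process_file):
        """One worker process: keep asking for files until the project is exhausted."""
        while True:
            status, payload = project_manager.get_next_file()   # "Give me a file"
            if status == HERES_THE_PATH:
                process_file(payload)      # run the analysis on this file
            elif status == HANG_ON:
                time.sleep(30)             # not in cache yet: wait and ask again
            elif status == NONE_LEFT:
                break                      # together, the N processes saw each file once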

SAM-Grid Architecture

[Architecture diagram: principal components, service implementations/libraries and information flows -- Request Broker, Job Scheduler, Job Client, Condor MMS, Condor-G, GRAM, Site Gatekeeper, Batch System, Compute Element Resource, Grid sensors, Job Status Updates, Logging and Bookkeeping, Info Processor and Converter, Replica Catalog (Grid RC), Resource Info, DH Resource Management, Data Delivery and Caching, GSI, AAA, MDS-2, Condor ClassAds.]

- Job Definition and Management: based on the Match Making Service of Condor®, through collaboration with the University of Wisconsin CS group.
- Monitoring and Information Services: provide a view of the status and history of the system, as well as the information relevant for job and data management.
- Data Handling: the existing SAM system, developed at Fermilab to accommodate high-volume data management, plays a principal role in providing Data Handling services to the Job Management infrastructure.

Job Definition and Management

[Diagram: Job Client, Request Broker, Job Scheduler, Condor MMS, Condor-G, GRAM, Site Gatekeeper, Batch System, Compute Element Resource, Grid sensors, Job Status Updates.]

Job Management:
- Globus GRAM for inter-operability
- Condor-G for remote submission
- Condor MMS for resource brokerage

Condor is the Resource Broker:
- Collaboration with the Condor group; Condor members attend the weekly SAM-Grid meetings.
- CVS branch of v6_3_2 with our requested functionality:
  - ability to choose the globus-scheduler (sketch below)
  - external function calls allowed in the MMS, so the matchmaker can query the SAM DB
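
A minimal sketch of how a chosen globus-scheduler ends up in a Condor-G submit description, assuming the classic submit keywords (universe = globus, globusscheduler = ...); the gatekeeper contact string and executable name are made up for illustration, and in SAM-Grid the broker sets this via the MMS rather than a hand-written file.

    def condor_g_submit_description(executable, gatekeeper_contact):
        """Build a Condor-G submit description; the globusscheduler line is what
        the 'ability to choose globus-scheduler' functionality lets the broker
        set per job."""
        return "\n".join([
            "universe        = globus",
            "globusscheduler = " + gatekeeper_contact,  # <gatekeeper-host>/jobmanager-<batch>
            "executable      = " + executable,
            "output          = job.out",
            "error           = job.err",
            "log             = job.log",
            "queue",
        ])

    # e.g. route the job to a (hypothetical) PBS farm behind a site gatekeeper:
    print(condor_g_submit_description("d0_reco.sh", "gridpp.ic.ac.uk/jobmanager-pbs"))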

Monitoring and Information

[Diagram: Grid RC (replica catalog), Logging and Bookkeeping, Info Processor and Converter, Resource Info, GSI, AAA, MDS-2, Condor ClassAds.]

A package of information providers to interrogate (a toy provider is sketched below):
- SAM Station: project progress, disk caches
- Replica Catalogue: file location, size
- Batch Systems: free CPUs
- Resources: OS, code releases present, memory, disk space, …
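
A toy information provider in Python, gathering a few of the attributes above and printing them as ClassAd-style "name = value" lines; the probe functions and all values are placeholders for the real station, batch-system and resource queries.

    import platform

    def probe_station():
        # would ask the SAM station for project progress and cache occupancy
        return {"CacheUsedGB": 870, "CacheSizeGB": 1000, "ActiveProjects": 12}

    def probe_batch_system():
        # would ask the local batch system for free and total CPUs
        return {"FreeCpus": 37, "TotalCpus": 160}

    def probe_resources():
        # static host information (OS name, installed code release, ...)
        return {"OpSys": platform.system(), "CodeRelease": "example-release-01"}

    def classad_text():
        ad = {}
        for probe in (probe_station, probe_batch_system, probe_resources):
            ad.update(probe())
        return "\n".join("%-14s = %r" % (k, v) for k, v in sorted(ad.items()))

    print(classad_text())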

Monitoring & Information

SAM Data Handling

[Diagram: DH Resource Management, Data Delivery and Caching.]

- Existing SAM system.
- Added GridFTP as a transfer protocol (kerb-rcp and bbftp also available); see the sketch below.
  - Server certificates issued by the FNAL Kerberized CA are used.
  - Delegation of the user proxy is not (yet) done (accounting, security).
  - The server runs as an unprivileged user.
- Globus support experience: report a bug, receive a patch, apply it, re-build on Linux and Ultrix, repackage, … i.e. very poor support. Globus bundles are packaged as UPD products.
- During testing we re-discovered a globus-url-copy bug that is STILL in the downloadable Globus release. Repeat the above procedure? No: take the EDG special globus-url-copy binary.
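
A minimal sketch of driving a GridFTP transfer with globus-url-copy from Python (basic usage: globus-url-copy <source-URL> <dest-URL>); the host names and paths are made up, and a valid proxy or the server certificate described above is assumed to be in place.

    import subprocess

    def gridftp_copy(source_url, dest_url):
        # shell out to globus-url-copy; returns the command's exit code
        return subprocess.call(["globus-url-copy", source_url, dest_url])

    rc = gridftp_copy(
        "gsiftp://some-station.example.org/cache/raw_0001.root",  # hypothetical source
        "file:///scratch/sam-cache/raw_0001.root",                # local cache area
    )
    print("transfer exit code:", rc)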

Future Work

- n-th order brokering:
  - 0th order: submit to the site where most of the data is replicated; trivial with the Condor additions.
  - 1st order: sense grid connectivity, using the WP7 tools as a plugin to Condor.
- Inter-site parallelisation: split datasets, move jobs to the data (sketch below).
- Dynamic station installation, to use non-dedicated resources and clean up afterwards:
  - UPD has almost no dependencies on native packages.
  - The auto-tailoring forced by CDF makes this possible.
- Further MC production/SAM integration.
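
A sketch of the "split datasets, move jobs to the data" idea: partition a dataset's files by the station that already holds a replica, so each chunk can be submitted where its data sits. The station names are taken from the example on the next slide; the file names and replica assignments are made up.

    from collections import defaultdict

    def split_by_location(file_locations):
        """file_locations: {filename: station holding the preferred replica}."""
        chunks = defaultdict(list)
        for filename, station in file_locations.items():
            chunks[station].append(filename)
        return dict(chunks)

    # illustrative replica map (not a real dataset)
    locations = {
        "raw_0001.root": "imperial-test",
        "raw_0002.root": "imperial-test",
        "raw_0003.root": "central-analysis",
        "raw_0004.root": "wuppertal",
    }

    for station, files in sorted(split_by_location(locations).items()):
        print("submit sub-job to %s for %d file(s)" % (station, len(files)))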

0th order brokering

Example dataset: File Count: 99; Average File Size: …; Total File Size: …; Total Event Count: …; … known domains and 3 stations.

At wuppertal:
- 4719 Mb (7%) from fnal.gov at 0.5 Mb/s
- … Mb (73%) from ic.ac.uk at 2.0 Mb/s
- … Mb (20%) from pnfs (enstore tape) at 0.5 Mb/s
- Transfer time = 18.0 hrs, plus 2 tape mounts.

At imperial-test:
- 4719 Mb (7%) from fnal.gov at 0.5 Mb/s
- … Mb (73%) from ic.ac.uk at 10.0 Mb/s
- … Mb (20%) from pnfs (enstore tape) at 0.5 Mb/s
- Transfer time = 11.8 hrs, plus 2 tape mounts.

At central-analysis:
- … Mb (78%) from pnfs (enstore tape) at 10.0 Mb/s
- … Mb (22%) from fnal.gov at 100.0 Mb/s
- Transfer time = 1.5 hrs, plus 2 tape mounts.

… but no free cpu!
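
The transfer-time estimate behind each line above is just size over link rate, summed over the sources a station would pull from. A sketch with made-up sizes and rates (not the slide's elided values):

    def transfer_hours(sources):
        """sources: list of (size_Mb, rate_Mb_per_s); transfers assumed sequential,
        as in the per-station totals quoted above (tape-mount latency not included)."""
        return sum(size / rate for size, rate in sources) / 3600.0

    # hypothetical breakdown for one candidate station: (Mb to move, link rate in Mb/s)
    at_some_station = [
        (5000.0, 0.5),    # from fnal.gov
        (50000.0, 2.0),   # from ic.ac.uk
        (13000.0, 0.5),   # from pnfs (enstore tape)
    ]

    print("estimated transfer time: %.1f hrs" % transfer_hours(at_some_station))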

Conclusions

- SAM production system:
  - heavy and increasing D0 use; fine tuning continues.
  - CDF deployment: no show-stoppers.
- SAM-Grid is taking shape:
  - Monitoring & Information prototype available.
  - GridFTP pre-deployment tests (the system failed me).
  - Remote job submission works.
  - Condor-G enhancements allow site matching in the MMS by querying the SAM replica catalogue.
- Outreach: SAM offers a unique, working example of a particle-physics grid:
  - already some interest in PP data access patterns;
  - expect more interest in real data handling and optimisation.
- The wise learn from other people's mistakes.