Instrumentation of the SAM-Grid
Gabriele Garzoglio
CSC 426 Research Proposal

Overview
→ Characteristics of the High Energy Physics Community
  The SAM-Grid: enabling fully distributed analysis job processing
  The Proposed Instrumentation

Characteristics of the work in High Energy Physics
High Energy Physics studies the fundamental interactions of Nature. A few laboratories around the world each provide unique facilities (accelerators) to study particular aspects of the field: the collaborations are geographically distributed. Experiments become more challenging and expensive with every decade: the collaborations are large groups of people. The phenomena studied are statistical in nature and the interesting events are very rare: a lot of data/statistics is needed.

The Fermi National Accelerator Laboratory

The Nature of the Data

An example: the D0 Experiment
Detector Data
– 1,000,000 channels
– Event size 250 KB
– Event rate ~50 Hz
– On-line data rate 12 MB/s
– Est. 2-year totals (incl. processing and analysis): 1 x 10^9 events, ~0.5 PB
Monte Carlo Data (simulations)
– 5 remote processing centers
– Estimate ~300 TB in 2 years
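As a sanity check on those figures, here is a minimal Python sketch; the inputs are the numbers quoted on the slide, while the unit conversions themselves are only illustrative.

```python
# Reproduce the D0 rate estimates quoted above from the per-event figures.
EVENT_SIZE_KB = 250        # event size from the slide
EVENT_RATE_HZ = 50         # on-line event rate from the slide
TWO_YEAR_EVENTS = 1e9      # estimated 2-year event total from the slide

online_rate_mb_s = EVENT_SIZE_KB * EVENT_RATE_HZ / 1000.0
raw_detector_tb = TWO_YEAR_EVENTS * EVENT_SIZE_KB / 1e9

print(f"on-line data rate: {online_rate_mb_s:.1f} MB/s (slide quotes ~12 MB/s)")
print(f"raw detector data: {raw_detector_tb:.0f} TB over 2 years")
print("processing and analysis copies bring the slide's estimate to ~0.5 PB")
```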

The D0 Collaboration: ~500 physicists, 72 institutions, 18 countries

How can all of them work together? Using large distributed-system middleware: the Grid.

Overview
  Characteristics of the High Energy Physics Community
→ The SAM-Grid: enabling fully distributed analysis job processing
  The Proposed Instrumentation

The SAM-Grid Project
Mission: enable fully distributed computing for DZero and CDF
Strategy: enhance the distributed data handling system of the experiments (SAM), incorporating standard Grid tools and protocols, and developing new solutions for Grid computing (JIM)
Funds: the Particle Physics Data Grid (US) and GridPP (UK)
People: computer scientists and physicists from Fermilab and the collaborating universities
History: SAM from 1997, JIM from the end of 2001
Schedule: CDF and DZero are running now! A prototype is running, scheduled for production in Spring 03; long-term deliverables in 2 years

The Logistics

[Architecture diagram: a job enters through the Submission Client / User Interface and the Job Management layer; a Resource Selector with a Match Making Service and an Information Collector matches it to one of the Execution Sites #1…#n, each running a Queuing System, Grid Sensors, Computing Elements, Storage Elements, and the Data Handling System.]
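To make the match-making step concrete, the sketch below matches a job's requirements against site attributes of the kind the Grid Sensors publish through the Information Collector. The attribute names and the ranking rule are hypothetical; they only illustrate the idea, not the actual SAM-Grid matchmaker.

```python
# Illustrative matchmaker: pick the site that caches the requested dataset
# and has enough free CPUs, preferring the least loaded one.
from typing import Optional

def select_site(job: dict, sites: list) -> Optional[dict]:
    candidates = [
        s for s in sites
        if job["dataset"] in s["cached_datasets"]
        and s["free_cpus"] >= job["min_cpus"]
    ]
    # Rank by free CPUs; return None if no site matches.
    return max(candidates, key=lambda s: s["free_cpus"], default=None)

sites = [
    {"name": "execution-site-1", "free_cpus": 40, "cached_datasets": {"d0-reco-v5"}},
    {"name": "execution-site-n", "free_cpus": 10, "cached_datasets": {"d0-mc-v3"}},
]
job = {"dataset": "d0-reco-v5", "min_cpus": 8}
match = select_site(job, sites)
print(match["name"] if match else "no suitable site")   # -> execution-site-1
```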

Overview
  Characteristics of the High Energy Physics Community
  The SAM-Grid: enabling fully distributed analysis job processing
→ The Proposed Instrumentation

Why is this useful?
The SAM-Grid is a complex system: the instrumentation is of critical importance to
Troubleshoot the system
– Production systems are maintained 24x7
– Ease user support
– Find anomalies/bugs
Gather statistics
– User data access patterns
– Resource utilization
– Global parameter optimization

Why is this challenging?
The SAM-Grid is composed of hundreds of servers, widely geographically distributed: what is a suitable architecture?
Servers have very diverse functionalities: is it possible to enable some form of uniform data access?

Current instrumentation…
The SAM system uses a global log service: every SAM server records free-format events/messages.
JIM v1 is under intense development: the current instrumentation is insufficient.

…and its limitations
The current log server is centralized: for the SAM system alone it records 1 GB every few days. This does not scale.
Message transport is UDP-based: this scales in the number of reporting servers, but delivery of the data is not guaranteed.
The messages are not structured: data mining and presentation are non-trivial.
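In essence, the current scheme amounts to a best-effort, free-format send over UDP, along the lines of the sketch below; the collector address, port, and message text are made up for illustration.

```python
# Free-format, fire-and-forget logging over UDP: sendto() returns as soon
# as the datagram is handed to the network, so a dropped packet is simply lost.
import socket

LOG_COLLECTOR = ("127.0.0.1", 9099)   # hypothetical central log server

def log_event(text: str) -> None:
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(text.encode(), LOG_COLLECTOR)

log_event("sam station d0-fnal: project started by user jdoe")
```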

The direction 1
The Coda distributed file system is a good example of a successful distributed architecture for instrumentation.
[Diagram: Client → Server → Data Collector → Data Log → Reaper → Database → Off-Line Analyses]
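As a rough illustration of that pipeline, the sketch below has a data collector appending structured events to a local data log and a reaper draining the log into a database for off-line analysis. The field names, file layout, and SQLite backend are assumptions for illustration, not the proposed design.

```python
# Collector -> data log -> reaper -> database, in miniature.
import json, sqlite3

LOG_FILE = "collector.log"

def collect(event: dict) -> None:
    """Data Collector: append one structured event to the local data log."""
    with open(LOG_FILE, "a") as log:
        log.write(json.dumps(event) + "\n")

def reap(db_path: str = "events.db") -> int:
    """Log Reaper: load the logged events into a database for off-line analysis."""
    db = sqlite3.connect(db_path)
    db.execute("CREATE TABLE IF NOT EXISTS events (server TEXT, severity INTEGER, body TEXT)")
    n = 0
    with open(LOG_FILE) as log:
        for line in log:
            e = json.loads(line)
            db.execute("INSERT INTO events VALUES (?, ?, ?)",
                       (e["server"], e["severity"], e["body"]))
            n += 1
    db.commit()
    db.close()
    return n

collect({"server": "sam-station-d0-fnal", "severity": 3, "body": "file transfer retried"})
print(reap(), "events loaded")
```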

The direction 2
The structure of the message should include:
– the name of the client/server
– the types of the client/server: various groupings may be meaningful, e.g. logistical, functional, logical, etc.
– the location of the client/server
– a global time stamp
– an id code, related to the severity of the message
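A minimal sketch of what such a structured message could look like, assuming a JSON encoding; the field values are made up, and the actual schema would be fixed during the design phase estimated below.

```python
# One structured instrumentation message carrying the fields listed above.
import json, time
from dataclasses import dataclass, asdict

@dataclass
class GridMessage:
    name: str         # name of the reporting client/server
    types: list       # groupings: logistical, functional, logical, ...
    location: str     # where the client/server runs
    timestamp: float  # global (UTC) time stamp
    severity: int     # id code related to the severity of the message
    body: str         # the event itself

msg = GridMessage(
    name="sam-station-d0-fnal",
    types=["functional:data-handling", "logical:station"],
    location="fnal.gov",
    timestamp=time.time(),
    severity=4,
    body="cache miss, staging file from mass storage",
)
print(json.dumps(asdict(msg)))
```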

Rough time estimate
– 1 FTE-month to design the architecture and the message structure
– 1 FTE-month to implement basic messaging
– 1 FTE-month to study initial results
– 1 FTE-month to feed changes back into the message structure and implementation