
LHCb Software Meeting, Glenn Patrick, Slide 1
First Ideas on Distributed Analysis for LHCb
LHCb Software Week, CERN, 28th March 2001
Glenn Patrick (RAL)

LHCb Software Meeting, Glenn Patrick, Slide 2: Analysis and the Grid?
Monte Carlo production maps readily onto a Grid architecture because:
- It is a well-defined problem using the same executable.
- It already requires distributed resources (mainly CPU) in large centres (e.g. Lyon, RAL, Liverpool...).
- Few people are involved.
Analysis is much more inventive/chaotic and will involve far more people across a wide range of institutes. How easily it is perceived to map onto the Grid depends on where we sit on the Hype Cycle...

LHCb Software Meeting, Glenn Patrick, Slide 3: Hype Cycle of Emerging Technology
[Figure: the Gartner Group hype cycle, hype versus time, running through the Technology Trigger, Peak of Inflated Expectations, Trough of Disillusionment, Slope of Enlightenment and Plateau of Productivity. Courtesy of Gartner Group.]

LHCb Software Meeting, Glenn Patrick, Slide 4: Issues
There are two basic issues:
1. What is the data model for the experiment? Most work on this was done BG (before Grid). Is it still relevant?
   - Do we move analysis jobs to the data, or the data to the jobs?
   - What is the minimum dataset required for analysis (AOD, ESD)?
   - Are we accessing objects or files?
   - Interactive versus batch computing.
2. What services and interfaces have to be provided to grid-enable the LHCb analysis software? Difficult until a working Grid architecture emerges. Have to make a start and gradually evolve?

LHCb Software Meeting, Glenn Patrick, Slide 5: Data Model - Networking Evolution
In the UK, WorldCom is providing the national backbone for SuperJanet4 from March 2001.
  SuperJanet3         155 Mbit/s
  SuperJanet4 (1Q)    2.5 Gbit/s  (~16 x SuperJanet3)
  SuperJanet4 (4Q)    10 Gbit/s   (~64 x SuperJanet3)
  SuperJanet4 (2Q)    20 Gbit/s   (~128 x SuperJanet3)
A few years ago, most bulk data was moved by tape. Now, almost all data from RAL is moved over the network -> more scope for moving data to the application?

Glenn Patrick, Slide 6: SuperJanet4 UK Backbone, March 2001
[Map: WorldCom core nodes (London, Reading, Leeds, Warrington, Portsmouth, Glasgow, Edinburgh) linking the regional networks (NorMAN, YHMAN, EMMAN, EastNet, LMN, Kentish MAN, LeNSE, SWAN & BWEMAN, South Wales MAN, TVN, MidMAN, Northern Ireland, North Wales MAN, NNW, C&NL MAN, Scotland via Edinburgh and Glasgow) plus external links; interfaces range from 155 Mbit/s and 622 Mbit/s single fibre to 2.5 Gbit/s single and dual fibre, with a 2.5 Gbit/s development network.]

LHCb Software Meeting, Glenn Patrick, Slide 7: Data Model - Last Mile Problem?
Having a fast backbone is not much use if local bottlenecks exist (typically 100 Mbit/s). Need to do point-to-point tests using realistic datasets.
  Connection              Rate         Tape (750 MB)
  RAL CSF -> RAL PPD      1600 kB/s    8 minutes
  RAL CSF -> CERN         360 kB/s     35 minutes
  RAL CSF -> Liverpool    ~90 kB/s     2.3 hours
Very crude tests done on a bad day. Need to perform a spectrum of tests with realistic datasets, new tools, etc. Parallel Grid-FTP (multiple streams): ~1 MB/s RAL -> CERN. But data flow increases down the analysis chain...
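The timing column above is just dataset size divided by rate; the sketch below (illustrative only, not from the original slides) reproduces that arithmetic for a 750 MB tape-sized dataset at the quoted rates.

```python
# Rough transfer-time estimates for a 750 MB dataset at the rates measured
# in the slide above. Purely illustrative arithmetic; these are the crude
# single-stream figures quoted, not guaranteed throughput.

DATASET_MB = 750.0

rates_kb_s = {
    "RAL CSF -> RAL PPD": 1600.0,
    "RAL CSF -> CERN": 360.0,
    "RAL CSF -> Liverpool": 90.0,
    "RAL -> CERN (parallel Grid-FTP, multiple streams)": 1000.0,
}

for link, rate in rates_kb_s.items():
    seconds = DATASET_MB * 1000.0 / rate  # 1 MB taken as 1000 kB
    if seconds < 3600:
        print(f"{link}: {seconds / 60:.0f} minutes")
    else:
        print(f"{link}: {seconds / 3600:.1f} hours")
```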

Slide 8: Analysis data flow (Ref: Tony Doyle, WP2/ATLAS)
[Diagram: ESD from data or Monte Carlo passes through event tags and event selection to Analysis Object Data (AOD), calibration data, analysis/skims and physics objects. Raw data sits at Tier 0/1 (collaboration wide), analysis groups at Tier 2, physicists at Tiers 3/4, with an arrow marking increasing data flow down the chain.]

LHCb Software Meeting, Glenn Patrick, Slide 9: Which datasets are really needed for analysis?
[Diagram: the analysis cycle for each physicist - AOD and Group Analysis Tags feed physics analysis on an analysis workstation, producing private data (e.g. ntuples) and physics results.]
- AOD and Group Analysis Tags: for events with "interesting" group analysis tags.
- Calibration Data: few physicists, and for very few events.
- Raw Data and ESD: some physicists, for a small sample of events.
- Generator Data: for Monte Carlo events.
Likely to be different requirements at startup.

LHCb Software Meeting, Glenn Patrick, Slide 10: Datasets (Hoffman review)
                       ALICE(pp)   ATLAS     CMS       LHCb
  RAW per event        1 MB        1 MB      1 MB      0.125 MB
  ESD per event        0.1 MB      0.5 MB    0.5 MB    0.1 MB
  AOD per event        10 kB       10 kB     10 kB     20 kB
  TAG per event        1 kB        0.1 kB    1 kB      1 kB
  Real data storage    1.2 PB      2 PB      1.7 PB    0.45 PB
  Simulation storage   0.1 PB      1.5 PB    1.2 PB    0.36 PB
  Calibration storage  0.0         0.4 PB    0.01 PB   0.01 PB

LHCb Software Meeting, Glenn Patrick, Slide 11: Physics Use-Cases
The baseline model assumes:
- The Production Centre stores all phases of data (RAW, ESD, AOD and TAG).
- CERN is the production centre for real data.
- TAG and AOD datasets are shipped to Regional Centres.
- Only 10% of ESD data is moved to outside centres.
LHCb has smaller dataset sizes (but perhaps more specialised requirements) -> more options available? Even with 2 x 10^9 events/year, the total AOD sample is only 40 TB/year.
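As a cross-check of that 40 TB/year figure, here is a minimal sketch of the arithmetic using the LHCb AOD size from the slide 10 table (not part of the original slides):

```python
# Cross-check of the AOD volume quoted on slide 11, using the LHCb
# per-event AOD size from the Hoffman numbers on slide 10.
events_per_year = 2e9      # 2 x 10^9 events/year
aod_size_kb = 20.0         # LHCb AOD per event

total_tb = events_per_year * aod_size_kb * 1e3 / 1e12  # kB -> bytes -> TB
print(f"AOD sample: {total_tb:.0f} TB/year")            # ~40 TB/year
```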

LHCb Software Meeting, Glenn Patrick, Slide 12: Analysis Interface - Gaudi meets the Grid?
[Diagram: Gaudi services (Application Manager, Job Options Service, Detector Description, Event Data Service, Histogram Service, Message Service, Particle Property Service, GaudiLab Service) and the logical data stores (event, detector, histogram, ntuple) connect through standard interfaces and protocols to Grid services (information services, scheduling, security, monitoring, data management, service discovery, database service?), exchanging meta-data and data.]
Most Grid services are producers or consumers of meta-data.

LHCb Software Meeting, Glenn Patrick, Slide 13: High Level Interfaces
Need to define the high-level Grid interfaces essential to Gaudi, especially those relating to data access. For example: Data Query, Data Locator, Data Mover and Data Replication at the high level, layered over medium- and low-level services down to the mass storage systems (CASTOR, HPSS, other MSS).
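To make the layering concrete, here is a minimal sketch of what such high-level data-access interfaces might look like from the application side. The class names and signatures are hypothetical, not an existing Gaudi or Grid API.

```python
# Hypothetical high-level data-access interfaces of the kind slide 13 asks
# for. Names and signatures are illustrative only; they are not part of
# Gaudi or of any Grid middleware.
from abc import ABC, abstractmethod


class DataQuery(ABC):
    """Select event datasets by metadata (e.g. TAG attributes)."""
    @abstractmethod
    def select(self, query: str) -> list[str]:
        """Return logical dataset names matching a query string."""


class DataLocator(ABC):
    """Map a logical dataset name to its physical replicas."""
    @abstractmethod
    def locate(self, logical_name: str) -> list[str]:
        """Return candidate physical locations (site/URL) for the dataset."""


class DataMover(ABC):
    """Stage or replicate a dataset to where the analysis job runs."""
    @abstractmethod
    def move(self, source: str, destination: str) -> None:
        """Copy the dataset; underneath this might sit GridFTP, CASTOR, HPSS..."""
```

The point of the split is that Gaudi's event data service could be pointed at whichever replica the locator returns, without knowing whether the bytes live in CASTOR, HPSS or another MSS.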

LHCb Software Meeting, Glenn Patrick, Slide 14: Analysis and the Grid
In the Grid, analysis appears to be seen as a series of hierarchical queries (cuts) on databases/datasets, e.g. (PTRACK < 150.0) AND (RICHpid = pion).
Architectures are based on multi-agent technology. An intelligent agent is a software entity with some degree of autonomy that can carry out operations on behalf of a user or program.
Need to define "globally unique" LHCb namespace(s). The ATF proposes using URI syntax... e.g.
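A toy example of that query view, applying the quoted cut to a handful of invented event-tag records (the record layout is an assumption made for the example, not an LHCb format):

```python
# Toy illustration of the "hierarchical query" view of analysis: apply the
# cut (PTRACK < 150.0) AND (RICHpid = pion) to a set of event-tag records.
event_tags = [
    {"event": 1, "PTRACK": 120.0, "RICHpid": "pion"},
    {"event": 2, "PTRACK": 180.0, "RICHpid": "pion"},
    {"event": 3, "PTRACK": 95.0,  "RICHpid": "kaon"},
]

selected = [
    t["event"]
    for t in event_tags
    if t["PTRACK"] < 150.0 and t["RICHpid"] == "pion"
]
print(selected)  # -> [1]; only these events need to be pulled from AOD/ESD
```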

LHCb Software Meeting, Glenn Patrick, Slide 15: Agent Architecture (Serafini et al)
[Diagram: users 1..n talk to an agent-based query facilitator (query execution strategies, caching strategies) backed by an index, which sits in front of mass storage systems MSS 1..k, each with cache/disk and tape robotics.]
Contains a variety of agents: user agents, index agents, MSS agents.
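The sketch below caricatures the three agent roles (user, index, MSS) cooperating to resolve a dataset request. All class names and the dataset layout are invented for illustration; this is not the Serafini et al implementation.

```python
# Toy sketch of the agent roles on slide 15: user, index and MSS agents
# cooperating through a simple lookup. Invented for illustration only.

class MSSAgent:
    """Fronts one mass storage system (cache/disk + tape robotics)."""
    def __init__(self, name: str, datasets: dict[str, str]):
        self.name = name
        self.datasets = datasets            # logical name -> fake content

    def fetch(self, logical_name: str) -> str:
        return self.datasets[logical_name]


class IndexAgent:
    """Knows which MSS holds which logical dataset."""
    def __init__(self, mss_agents: list[MSSAgent]):
        self.index = {name: agent
                      for agent in mss_agents
                      for name in agent.datasets}

    def locate(self, logical_name: str) -> MSSAgent:
        return self.index[logical_name]


class UserAgent:
    """Acts on behalf of a physicist: resolves and retrieves a dataset."""
    def __init__(self, index: IndexAgent):
        self.index = index

    def query(self, logical_name: str) -> str:
        return self.index.locate(logical_name).fetch(logical_name)


mss1 = MSSAgent("MSS 1", {"lhcb/aod/run42": "AOD events..."})
mss2 = MSSAgent("MSS 2", {"lhcb/esd/run42": "ESD events..."})
user = UserAgent(IndexAgent([mss1, mss2]))
print(user.query("lhcb/aod/run42"))
```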

LHCb Software Meeting, Glenn Patrick, Slide 16: Evolving LHCb Analysis Testbeds?
[Diagram: RAL CSF (236 Linux CPUs, IBM 3494 tape robot) and the RAL DataGrid Testbed linked to CERN, Liverpool MAP (300 Linux CPUs), a Glasgow/Edinburgh "Proto-Tier 2", and institutes (RAL PPD, Bristol, Imperial College, Oxford, Cambridge), plus France and Italy.]

LHCb Software Meeting, Glenn Patrick, Slide 17: Conclusions
1. Need a better understanding of how the Data Model will really work for analysis. Objects versus files?
2. Pragmatic study of the performance/topology/limitations of national (and international) networks -> feed back into 1.
3. Require definition of high-level Grid services which can be exploited by Gaudi. Agent technology?
4. Need some realistic "physics" use-cases -> feed back into 1 and 3.
5. Accumulate experience of running Gaudi in a distributed environment (e.g. CERN -> UK).