Hardware Resource Needs of the CMS Experiment (start-up 2005)
Ian Willers
Information: CMS participation in MONARC and RD45
Slides: Paolo Capiluppi, Irwin Gaines, Harvey Newman, Les Robertson, Jamie Shiers, Lucas Taylor

2 Contents
- Why is LHC computing different?
- MONARC Project and proposed architecture
- An LHC offline computing facility at CERN
- A Regional Centre
- LHC data management
- The Particle Physics Data Grid
- Summary

3 CMS Structure showing sub-detectors

4 Not covered: CMS Software Professionals
- Professional software personnel ramping up to ~33 FTEs (by 2003)
- Engineers support a much larger number of physicist developers (~4 times as many)
- Shortfall: 10 FTEs (1999)

5 Not covered: Cost of Hardware
- ~120 MCHF total computing cost to 2006, roughly consistent with the canonical 1/3 : 2/3 rule:
  - ~40 MCHF (Tier 0, central systems at CERN)
  - ~40 MCHF (Tier 1, ~5 Regional Centres, each ~20% of the central systems)
  - ~40 MCHF (?) (universities, Tier 2 centres, MC production, etc.)
- Figures being revised

6 LHC Computing: Different from Previous Experiment Generations
- Geographical dispersion: of people and resources (1800 physicists, 150 institutes, 32 countries)
- Complexity: the detector and the LHC environment
- Scale: petabytes per year of data
Major challenges associated with:
- Coordinated use of distributed computing resources
- Remote software development and physics analysis
- Communication and collaboration at a distance
R&D: new forms of distributed systems

7 Comparisons with an LHC-sized experiment in 2006: CMS at CERN [*]
[*] Total CPU: CMS or ATLAS ~ MSi95
Estimates for the disk/tape ratio will change (technology evolution)

8 CPU needs for the baseline analysis process (100% efficiency, no AMS overhead)

Activity | Performed by | Frequency | Response time/pass
Reconstruction | Experiment | Once/year | days
Re-processing | Experiment | 3 times/year | ~1 month
Re-definition (AOD & TAG) | Experiment | Once/month | days
Selection | Groups (20) | Once/month | ~1 day
Analysis (ESD, 1%) | Individuals (500) | 4 times/day | 4 hours (disk I/O ~52k MB/sec)
Analysis (AOD, TAG & DPD) | Individuals (500) | 4 times/day | hours
Simulation + Reconstruction | Experiment/Group | ~10^6 events/day | ~300 days (disk I/O ~10 MB/sec)

Totals: total CPU power ~1050k SI95 utilized, ~1400k-1700k SI95 installed; total disk storage ~580+x TB utilized, ~700+y TB installed.
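The per-activity CPU figures in a table like this follow from a simple sizing relation: required power is roughly (events per pass x CPU cost per event in SI95*s) / (allowed response time x efficiency). A minimal illustrative sketch in Python; the numerical inputs below are placeholders, not values taken from the slide:

```python
# Illustrative MONARC-style CPU sizing; the inputs are placeholders,
# not the values behind the table on this slide.

def cpu_power_si95(events_per_pass, si95_sec_per_event, response_time_sec, efficiency=1.0):
    """CPU power (SI95) needed to process events_per_pass events, each costing
    si95_sec_per_event SI95*seconds, within response_time_sec seconds."""
    return events_per_pass * si95_sec_per_event / (response_time_sec * efficiency)

# Hypothetical example: a once-per-year reconstruction pass over 1e9 events
# at 250 SI95*s per event, to be completed within ~60 days.
print(f"{cpu_power_si95(1e9, 250, 60 * 86400):,.0f} SI95")  # ~48,000 SI95
```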

9 Major activities foreseen at CERN: reality check (Les Robertson, Jan. '99)
Based on 520,000 SI95; the present estimate from Les is 600,000 SI95, plus ...,000 SI95/year.

10 MONARC: Common Project
Models Of Networked Analysis At Regional Centers
Caltech, CERN, Columbia, FNAL, Heidelberg, Helsinki, INFN, IN2P3, KEK, Marseilles, MPI Munich, Orsay, Oxford, Tufts
Project goals:
- Develop "Baseline Models"
- Specify the main parameters characterizing the Model's performance: throughputs, latencies
- Verify resource requirement baselines (computing, data handling, networks)
Technical goals:
- Define the Analysis Process
- Define RC architectures and services
- Provide guidelines for the final Models
- Provide a simulation toolset for further Model studies
[Diagram, model circa 2005: CERN (520k SI95, ... TB disk, robot) linked at 622 Mbit/s and N x 622 Mbit/s to Tier 1 centres such as FNAL/BNL (100k SI95, ... TB disk, robot), Tier 2 centres (20k SI95, 20 TB disk, robot) and universities (Univ 1, Univ 2, ..., Univ M)]

11 CMS Analysis Model (based on MONARC and ORCA): "Typical" Tier 1 RC
- CPU power: ~100 kSI95
- Disk space: ~200 TB
- Tape capacity: 600 TB, 100 MB/sec
- Link speed to Tier 2: 10 MB/sec (1/2 of 155 Mbps)
- Raw data: 5%, 50 TB/year
- ESD data: 100%, 200 TB/year
- Selected ESD: 25%, 10 TB/year [*]
- Revised ESD: 25%, 20 TB/year [*]
- AOD data: 100%, 2 TB/year [**]
- Revised AOD: 100%, 4 TB/year [**]
- TAG/DPD: 100%, 200 GB/year
- Simulated data: 25%, 25 TB/year
[*] Covering five analysis groups, each selecting ~1% of the annual ESD or AOD data for a typical analysis
[**] Covering all analysis groups
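As a rough consistency check, the per-category yearly volumes listed above can simply be summed; a minimal sketch using only the figures from this slide:

```python
# Yearly data volumes at a "typical" Tier 1 RC, in TB/year, as listed above.
tier1_tb_per_year = {
    "raw (5%)": 50,
    "ESD (100%)": 200,
    "selected ESD (25%)": 10,
    "revised ESD (25%)": 20,
    "AOD (100%)": 2,
    "revised AOD (100%)": 4,
    "TAG/DPD (100%)": 0.2,
    "simulated (25%)": 25,
}
total = sum(tier1_tb_per_year.values())
print(f"total ~{total:.0f} TB/year")  # ~311 TB/year, against ~200 TB of disk plus 600 TB of tape
```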

12 MONARC Data Hierarchy
- Detector to Online System at ~PB/sec (one bunch crossing per 25 ns; 100 triggers per second; event size ~1 MB)
- Online System to Offline Farm (~20 TIPS) at ~100 MB/sec
- Tier 0: CERN Computer Centre, >20 TIPS
- Tier 1: Regional Centres (Fermilab ~4 TIPS; France, Italy, Germany), linked at ~2.4 Gbit/s / ~622 Mbit/s, or by air freight
- Tier 2: Tier 2 Centres, ~1 TIPS each, linked at ~622 Mbit/s
- Tier 3: Institutes, ~0.25 TIPS, with a physics data cache; physicists work on analysis "channels", each institute has ~10 physicists working on one or more channels, and data for these channels should be cached by the institute server
- Tier 4: Workstations (... Mbit/s)
(1 TIPS = 25,000 SpecInt95; a 1999 PC is ~15 SpecInt95)
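The headline numbers in this hierarchy hang together arithmetically: 100 triggers/s at ~1 MB per event gives the ~100 MB/s into the offline farm, and 20 TIPS at 25,000 SpecInt95 per TIPS is ~500k SI95, in line with the ~520k SI95 CERN figure used elsewhere in the talk. A small sketch (the 10^7-second running year is an assumption, not stated on the slide):

```python
TRIGGER_RATE_HZ = 100          # triggers per second
EVENT_SIZE_MB = 1.0            # ~1 MByte per event
SI95_PER_TIPS = 25_000         # 1 TIPS = 25,000 SpecInt95
SECONDS_PER_YEAR = 1e7         # assumed effective running time per year

daq_rate_mb_s = TRIGGER_RATE_HZ * EVENT_SIZE_MB
raw_data_pb_per_year = daq_rate_mb_s * SECONDS_PER_YEAR / 1e9

print(f"DAQ output: ~{daq_rate_mb_s:.0f} MB/s, ~{raw_data_pb_per_year:.0f} PB/year of raw data")
print(f"CERN offline farm (~20 TIPS) = ~{20 * SI95_PER_TIPS:,} SI95")
```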

13 MONARC Analysis Process Example
[Diagram: the analysis chain starting from DAQ/RAW data and Slow Control/Calibration data]
- Huge number of "small" jobs per day: chaotic activity
- ~20 large jobs per month: coordinated activity
- ~4 times per year? (per experiment)

14 A Regional Centre
[Diagram of a Regional Centre:]
- Data in/out: network from CERN, network from Tier 2 & simulation centres, tapes
- Storage: tape mass storage & disk servers, database servers
- Services: physics software development, R&D systems and testbeds, info servers, code servers, web servers, telepresence servers, training, consulting, help desk
- Production reconstruction (Raw/Sim -> Rec objs): scheduled, predictable; experiment/physics groups
- Production analysis (selection: Rec objs -> AOD & TAG): scheduled; physics groups
- Individual analysis (selection: TAG -> plots): chaotic; physicists
- Connected to: desktops, Tier 2 centres, local institutes, CERN, tapes

15 Offline Computing Facility for CMS at CERN
Purpose of the study:
- Investigate the feasibility of building LHC computing facilities using current cluster architectures and conservative assumptions about technology evolution:
  - scale & performance
  - technology
  - power
  - footprint
  - cost
  - reliability
  - manageability

16 Background & assumptions
Sizing:
- Data volumes are estimates from the experiments
- MONARC analysis group working papers and presentations
Architecture:
- CERN is the Tier 0 centre and also acts as a Tier 1 centre ...
- CERN distributed architecture (in the same room and across the site):
  - simplest components (hyper-sensitive to cost, aversion to complication)
  - throughput (before performance)
  - resilience (mostly up all of the time)
  - a computing fabric for flexibility and scalability: avoid special-purpose components; everything can do anything (which does not mean that parts are not dedicated to specific applications, periods, ...)

18 Components (1)
Processors:
- the then-current low-end PC server (equivalent of the dual-CPU boards of 1999)
- 4 CPUs, each >100 SI95
- creation of AOD and analysis may need better (more expensive) machines
Assembled into clusters and sub-farms according to practical considerations such as:
- throughput of the first-level LAN switch
- rack capacity
- power & cooling, ...
- each cluster comes with a suitable chunk of I/O capacity
LAN:
- no issue: since the computers are high-volume components, the computer-LAN interface is standard (then-current Ethernet!)
- higher layers need higher throughput, but only about 1 Tbps

Processor cluster
- basic box: four 100 SI95 processors, standard network connection (~2 Gbps); 15% of systems configured as I/O servers (disk server, disk-tape mover, Objy AMS, ...) with an additional connection to the storage network
- cluster: 9 basic boxes with a network switch (<10 Gbps)
- sub-farm: 4 clusters with a second-level network switch (<50 Gbps); one sub-farm fits in one rack (36 boxes, 144 CPUs, 5 m2)
- cluster and sub-farm sizing adjusted to fit conveniently the capabilities of network switch, racking and power distribution components
(diagram: boxes configured as I/O servers attach to both the farm network and the storage network, with links labelled 3 Gbps and 1.5 Gbps)
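The cluster and sub-farm figures multiply out as follows; a minimal sketch using the numbers on this slide, with the 520k SI95 CERN capacity target taken from slide 9:

```python
SI95_PER_CPU = 100
CPUS_PER_BOX = 4
BOXES_PER_CLUSTER = 9
CLUSTERS_PER_SUBFARM = 4

boxes_per_subfarm = BOXES_PER_CLUSTER * CLUSTERS_PER_SUBFARM        # 36 boxes
si95_per_subfarm = boxes_per_subfarm * CPUS_PER_BOX * SI95_PER_CPU  # 14,400 SI95

target_si95 = 520_000                      # CERN capacity estimate from slide 9
subfarms_needed = -(-target_si95 // si95_per_subfarm)  # ceiling division

print(f"{boxes_per_subfarm} boxes, {si95_per_subfarm:,} SI95 per sub-farm")
print(f"~{subfarms_needed} sub-farms (racks) for {target_si95:,} SI95")
```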

20 Components (2)
Disks:
- inexpensive RAID arrays
- capacity limited to ensure a sufficient number of independent accessors (say ~100 GB with the current size of the disk farm)
SAN (Storage Area Network):
- if this market develops into high-volume, low-cost (?) products, hopefully using the standard network medium
- otherwise use the current model: LAN-connected storage servers instead of special-purpose SAN-connected storage controllers

Disk sub-system
- array: two RAID controllers, dual-attached disks; the controllers connect to the storage network; sizing of the array subject to the components available
- rack: an integral number of arrays, with first-level network switches. In the main model, half-height 3.5" disks are assumed, 16 per shelf of a 19" rack. With space for 18 shelves in the rack (two-sided), half of the shelves are populated with disks, the remainder housing controllers, network switches and power distribution.
- 19" rack, 1 m deep, 1.1 m2 with space for doors, 14 TB capacity
- disk size restricted to give a disk count which matches the number of processors (and thus the number of active processes)
(diagram: links to the storage network labelled 0.8 Gbps and 5 Gbps)
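The 14 TB rack capacity and the ~100 GB accessor-limited disk size of slide 20 are mutually consistent; a small sketch (the per-disk capacity is derived here, not quoted on the slide):

```python
DISKS_PER_SHELF = 16
SHELVES_PER_RACK = 18        # two-sided 19" rack
POPULATED_FRACTION = 0.5     # half the shelves hold disks; the rest hold controllers, switches, power
RACK_CAPACITY_TB = 14

disks_per_rack = int(DISKS_PER_SHELF * SHELVES_PER_RACK * POPULATED_FRACTION)  # 144 disks
gb_per_disk = RACK_CAPACITY_TB * 1000 / disks_per_rack

print(f"{disks_per_rack} disks per rack, ~{gb_per_disk:.0f} GB per disk")  # ~100 GB, matching slide 20
```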

22 I/O models to be investigated
This is a fabric, so it should support any model.
1. The I/O server, or Objectivity AMS, model:
   - all I/O requests must pass through an intelligent processor
   - data passes across the SAN to the I/O server and then across the LAN to the application server
2. As above, but the SAN and the LAN are the same, or the SAN is accessed via the LAN:
   - all I/O requests pass twice across the LAN; double the network rates in the drawing
3. The global shared file system:
   - no I/O servers or database servers; all data is accessed directly from all application servers
   - the LAN is the SAN
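The practical difference between the three I/O models is how many times each byte of application I/O crosses the general-purpose LAN. A minimal bookkeeping sketch; the traffic multipliers and the 3 Gbps example figure are illustrative assumptions, not slide values:

```python
# LAN traffic multiplier per byte of application I/O, for the three models above.
io_models = {
    "1: separate SAN + I/O servers (Objectivity AMS)": 1,  # SAN hop, then one LAN hop to the application
    "2: SAN carried over the LAN":                     2,  # every request crosses the LAN twice
    "3: global shared file system (LAN is the SAN)":   1,  # direct access, no intermediate server
}

app_io_gbps = 3.0  # per-sub-farm application I/O (hypothetical figure)
for model, multiplier in io_models.items():
    print(f"model {model}: ~{app_io_gbps * multiplier:.0f} Gbps on the LAN")
```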

23 Components (3)
Tapes (unpopular in computer centres; new technology by 2004?):
- conservative assumptions: 100 GB per cartridge; 20 MB/sec per drive, with 25% of that achievable (robot, load/unload, position/rewind, retry, ...)
- let's hope that all of the active data can be held on disk
- tape needed as an archive and for shipping
[Photo: part of the magnetic tape vault at CERN's computer centre; an STK robot holding 6000 cartridges x 50 GB = 300 TB]
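The tape figures imply the following orders of magnitude; a small sketch using the robot capacity from the photo caption and the conservative drive assumptions above:

```python
# STK robot shown: 6000 cartridges x 50 GB.
robot_capacity_tb = 6000 * 50 / 1000
print(f"robot capacity: {robot_capacity_tb:.0f} TB")  # 300 TB

# Conservative 2004 assumption: 100 GB/cartridge, 20 MB/s drives at 25% achievable throughput.
effective_mb_s = 20 * 0.25
hours_per_cartridge = 100_000 / effective_mb_s / 3600
print(f"effective drive rate ~{effective_mb_s:.0f} MB/s, "
      f"~{hours_per_cartridge:.1f} h to read one 100 GB cartridge")
```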

24 Problems?
- We hope the local area network will not be a problem:
  - the CPU-to-I/O requirement is modest
  - a few Gbps at each computer node
  - suitable switches should be available in a couple of years
- The disk system is probably not an issue: buy more disk than we currently predict, to have enough accessors
- Tapes: already talked about those
- Space: OK, thanks to the visionaries of 1970
- Power & cooling: not a problem, but a new cost

25 The real problem
Management:
- installation
- monitoring
- fault determination
- re-configuration
- integration
All of this must be fully automated while retaining simplicity and flexibility.
Make sure the full cost of ownership is considered: the current industry cost of ownership of a PC is 10,000 CHF/year, against a 3,000 CHF purchase price.
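A back-of-the-envelope comparison makes the cost-of-ownership point concrete. The 3,000 CHF purchase price and 10,000 CHF/year industry ownership figure are from the slide; the farm size and box lifetime are hypothetical:

```python
PURCHASE_CHF = 3_000
INDUSTRY_OWNERSHIP_CHF_PER_YEAR = 10_000
LIFETIME_YEARS = 3       # assumed box lifetime
N_BOXES = 1_300          # order of magnitude for a ~520k SI95 farm (hypothetical)

purchase_total = N_BOXES * PURCHASE_CHF
ownership_total = N_BOXES * INDUSTRY_OWNERSHIP_CHF_PER_YEAR * LIFETIME_YEARS

print(f"purchase: {purchase_total/1e6:.1f} MCHF, "
      f"unautomated ownership over {LIFETIME_YEARS} years: {ownership_total/1e6:.1f} MCHF")
# Management costs dominate unless installation, monitoring and recovery are fully automated.
```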

26 A Regional Centre (same architecture diagram as slide 14)

27 Regional Centre Dataflow
[Diagram:]
- Data import / data export: network from CERN, network from Tier 2 and simulation centres, tapes
- Robotic mass storage -> central disk cache -> local disk caches, served by the mass storage & disk servers and database servers
- Production reconstruction (Raw/Sim -> Rec objs): scheduled, predictable; experiment/physics groups
- Production analysis (selection: Rec objs -> TAG): scheduled; physics groups
- Individual analysis (selection: TAG -> plots): chaotic; physicists; desktops
- Support services: physics software development, R&D systems and testbeds, info servers, code servers, web servers, telepresence servers, training, consulting, help desk
- Data to/from Tier 2 centres, local institutes and CERN

28 LHC Data Management
Four experiments mean:
- >10 PB/year in total
- 100 MB/s - 1.5 GB/s
- ~20 years of running
- ~5000 physicists, ~250 institutes
- ~500 concurrent analysis jobs, 24x7
Solutions must work at CERN and outside:
- scalable from 1-10 users to ...
- support everything from laptops to large servers with 100 GB-1 TB and HSM
- from MB/GB/TB? (private data) to many PB

29 Objectivity/Database Architecture: CMS baseline solution
[Diagram: an application with an Objectivity client on an application host; an application & data server running an Objectivity server; a data server with HSM running an Objectivity server plus HSM client and HSM server, managing the database files; an Objectivity lock server, which can run on any host]

30 Objectivity/Database Architecture: CMS baseline solution
Object ID (OID): 8 bytes (Federation -> Database -> Container -> Page -> Object)
- 64K databases (files), on any host in the network
- 32K containers per database
- 64K logical pages per container (4 GB per container with a 64 KB page size; 0.5 GB with an 8 KB page size)
- 64K object slots per page
Theoretical limit: 10,000 PB (= 10 EB), assuming database files of 128 TB.
Maximum practical file size ~10 GB (time to stage: seconds; tape capacity).
Pending architectural changes for VLDBs: multi-file DBs (e.g. map containers to files).
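The OID arithmetic reproduces the limits quoted above; a short sketch:

```python
# Objectivity/DB 8-byte object identifier: database, container, page and slot fields.
N_DATABASES = 2**16          # 64K database files
N_CONTAINERS = 2**15         # 32K containers per database
N_PAGES = 2**16              # 64K logical pages per container
PAGE_SIZE = 64 * 1024        # 64 KB pages (8 KB also possible)

container_bytes = N_PAGES * PAGE_SIZE
database_bytes = N_CONTAINERS * container_bytes
federation_bytes = N_DATABASES * database_bytes

print(f"container: {container_bytes / 2**30:.1f} GB")                 # 4 GB (0.5 GB with 8 KB pages)
print(f"database (theoretical): {database_bytes / 2**40:.0f} TB")     # 128 TB
print(f"federation (theoretical): {federation_bytes / 2**50:.0f} PB") # 8192 PiB ~ 9.2e18 bytes,
                                                                      # the "10,000 PB (= 10 EB)" above
```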

31 Particle Physics Data Grid (PPDG)
DOE/NGI Next Generation Internet project: ANL, BNL, Caltech, FNAL, JLAB, LBNL, SDSC, SLAC, U.Wisc/CS
First-year goal: optimized cached read access to 1-10 GBytes, drawn from a total data set of order one petabyte.
[Diagram: a Site-to-Site Data Replication Service at 100 MB/s between a primary site (data acquisition, CPU, disk, tape robot) and a secondary site (CPU, disk, tape robot); a Multi-Site Cached File Access Service linking a primary site (DAQ, tape, CPU, disk, robot), satellite sites (tape, CPU, disk, robot) and universities (CPU, disk, users)]
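The first-year goal translates into simple transfer-time arithmetic; a small sketch using the 100 MB/s replication rate from the diagram (the 1 PB archive example is an extrapolation, not a slide figure):

```python
LINK_MB_S = 100                      # site-to-site replication service rate

for dataset_gb in (1, 10):           # optimized cached read access target
    seconds = dataset_gb * 1000 / LINK_MB_S
    print(f"{dataset_gb} GB at {LINK_MB_S} MB/s: ~{seconds:.0f} s")

# Replicating a 1 PB archive at the same rate would take roughly four months of continuous transfer.
print(f"~{1e9 / LINK_MB_S / 86400:.0f} days for 1 PB")
```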

32 PPDG: Architecture for Reliable High-Speed Data Delivery
- Object-based and file-based application services
- File access service and cache manager
- Matchmaking service and cost estimation
- File fetching service and file replication index
- File mover and resource management
- Mass storage manager
- End-to-end network services
- Site boundary / security domain
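To make the relationships between these services more concrete, here is a structural sketch in Python; all class and method names are hypothetical illustrations of the layering, not PPDG APIs:

```python
from dataclasses import dataclass

# Hypothetical sketch of the service layering listed above; names are illustrative only.

@dataclass
class ReplicaLocation:
    site: str
    path: str
    estimated_cost: float   # e.g. seconds to deliver, from the cost-estimation service


class FileReplicationIndex:
    """Knows where replicas of a logical file live (cf. the file replication index)."""
    def __init__(self):
        self.replicas: dict[str, list[ReplicaLocation]] = {}

    def lookup(self, logical_name: str) -> list[ReplicaLocation]:
        return self.replicas.get(logical_name, [])


class FileAccessService:
    """Front door used by object/file-based applications; this sketch only covers
    replica lookup plus a cost-based choice, standing in for matchmaking."""
    def __init__(self, index: FileReplicationIndex):
        self.index = index

    def open(self, logical_name: str) -> ReplicaLocation:
        candidates = self.index.lookup(logical_name)
        if not candidates:
            raise FileNotFoundError(logical_name)
        # Matchmaking: pick the replica with the lowest estimated delivery cost.
        return min(candidates, key=lambda r: r.estimated_cost)
```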

33 Distributed Data Delivery and LHC Software Architecture
Architectural flexibility: the GRID will allow resources to be used efficiently:
- I/O requests up front; data driven; respond to an ensemble of changing cost estimates
- code movement as well as data movement
- loosely coupled and dynamic, e.g. an agent-based implementation

34 Summary - Data Issues
- Development of a robust PB-scale networked data access and analysis system is mission-critical
- An effective partnership exists, HENP-wide, through many R&D projects
- An aggressive R&D program is required to develop the necessary systems for reliable data access, processing and analysis across a hierarchy of networks
- Solutions could be widely applicable, by LHC start-up, to data problems in other scientific fields and in industry

35 Conclusions
- CMS has a first-order estimate of the needed resources and costs
- CMS has identified key issues concerning the needed resources
- CMS is doing a lot of focused R&D work to refine the estimates
- A lot of integration is needed for the software and hardware architecture
- We have positive feedback from different institutions for regional centres and development