1 Dr. Ian Bird, LHC Computing Grid Project Leader – Göttingen Tier 2 Inauguration, 13th May 2008 – Challenges and Opportunities

2 The scales

3 High Energy Physics machines and detectors. [Detector schematic labels: muon chambers, calorimeter.] pp at √s = 14 TeV, luminosities L = 10³⁴ /cm²/s and L = 2×10³² /cm²/s. 2.5 million collisions per second → LVL1: 10 kHz, LVL3: 50–100 Hz → 25 MB/sec digitized recording. 40 million collisions per second → LVL1: 1 kHz, LVL3: 100 Hz → 0.1 to 1 GB/sec digitized recording.
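To make the data-rate figures above concrete, here is a small back-of-the-envelope sketch in Python; the assumed raw event size of 1.5 MB is illustrative, not an official number.

```python
# Back-of-the-envelope only: the rates are those quoted on the slide,
# the event size is an assumed, illustrative value.
collision_rate_hz = 40.0e6   # bunch crossings per second
lvl3_output_hz = 100.0       # events kept per second after the final trigger level
event_size_mb = 1.5          # assumed raw event size (MB)

recording_rate_mb_s = lvl3_output_hz * event_size_mb
rejection_factor = collision_rate_hz / lvl3_output_hz

print(f"recording rate ≈ {recording_rate_mb_s:.0f} MB/s")
print(f"only ~1 in {rejection_factor:.0e} crossings is recorded")
```

With these assumptions the recording rate comes out around 0.15 GB/s, inside the 0.1–1 GB/s range quoted on the slide.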

4 LHC: 4 experiments … ready! First physics expected in autumn 2008.

5 The LHC Computing Challenge. Signal/noise: 10⁻⁹. Data volume: high rate × large number of channels × 4 experiments → 15 PetaBytes of new data each year. Compute power: event complexity × number of events × thousands of users → 100k of (today's) fastest CPUs. Worldwide analysis & funding: computing is funded locally in the major regions & countries, and efficient analysis must be possible everywhere → GRID technology.
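A minimal sketch of where a "~15 PetaBytes per year" figure can come from, assuming roughly 10⁷ live seconds of running per year (a common rule of thumb) and illustrative per-experiment recording rates; the factor for derived and simulated data is also an assumption.

```python
# All inputs below are assumptions for illustration, not official experiment numbers.
live_seconds_per_year = 1.0e7                     # typical "accelerator year"
raw_rate_mb_s = {"ALICE": 100, "ATLAS": 320, "CMS": 225, "LHCb": 50}  # assumed averages
derived_factor = 2.0                              # raw + reconstructed/simulated data

raw_pb = sum(raw_rate_mb_s.values()) * live_seconds_per_year / 1.0e9  # MB -> PB
total_pb = raw_pb * derived_factor
print(f"raw ≈ {raw_pb:.0f} PB/year, raw + derived ≈ {total_pb:.0f} PB/year")
```

With these invented inputs the total lands near the ~15 PB/year figure on the slide.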

6 A collision at LHC: luminosity 10³⁴ cm⁻²s⁻¹, bunch crossings at 40 MHz (every 25 ns), ~20 overlapping events per crossing.
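The "~20 overlapping events" figure can be reproduced with a one-line calculation: mean pile-up is the inelastic interaction rate (luminosity times cross-section) divided by the bunch-crossing rate. The ~80 mb inelastic cross-section used here is an assumed round number.

```python
luminosity = 1.0e34              # cm^-2 s^-1, as quoted on the slide
sigma_inelastic_cm2 = 80.0e-27   # ~80 mb, assumed
crossing_rate_hz = 40.0e6        # one crossing every 25 ns

interactions_per_second = luminosity * sigma_inelastic_cm2   # ~8e8 /s
mean_pileup = interactions_per_second / crossing_rate_hz
print(f"≈ {mean_pileup:.0f} pp interactions per bunch crossing")
```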

7 The Data Acquisition

8 Tier 0 at CERN: acquisition, first-pass reconstruction, storage & distribution. 1.25 GB/sec (ions).

9 Tier 0 – Tier 1 – Tier 2. Tier-0 (CERN): data recording, first-pass reconstruction, data distribution. Tier-1 (11 centres): permanent storage, re-processing, analysis. Tier-2 (>200 centres): simulation, end-user analysis.
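The division of labour between the tiers can be summarised in a small data structure; this is purely illustrative (the counts are the ones on the slide, the code is not part of any WLCG tool).

```python
from dataclasses import dataclass

@dataclass
class Tier:
    name: str
    centres: int      # approximate number of centres
    roles: tuple

HIERARCHY = (
    Tier("Tier-0 (CERN)", 1, ("data recording", "first-pass reconstruction", "data distribution")),
    Tier("Tier-1", 11, ("permanent storage", "re-processing", "analysis")),
    Tier("Tier-2", 200, ("simulation", "end-user analysis")),
)

for tier in HIERARCHY:
    print(f"{tier.name}: ~{tier.centres} centre(s) - {', '.join(tier.roles)}")
```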

10 Evolution of requirements (timeline 1994–2008): LHC approved (1994); ATLAS & CMS approved; ALICE approved; LHCb approved; ATLAS & CMS CTP estimate: 10⁷ MIPS, 100 TB disk; “Hoffmann” Review: 7×10⁷ MIPS, 1,900 TB disk (ATLAS or CMS requirements for the first year at design luminosity); Computing TDRs: 55×10⁷ MIPS, 70,000 TB disk (140 MSi2K); LHC start (2008).

11 Evolution of CPU Capacity at CERN – accelerator eras marked on the chart: SC (0.6 GeV), PS (28 GeV), ISR (300 GeV), SPS (400 GeV), ppbar (540 GeV), LEP (100 GeV), LEP II (200 GeV), LHC (14 TeV). Costs (2007 Swiss Francs) include infrastructure (computer centre, power, cooling, ...) and physics tapes. Tape & disk requirements: more than 10 times what CERN alone can provide.

12 Evolution of Grids (timeline 1994–2008): EU DataGrid; GriPhyN, iVDGL, PPDG; LCG 1; GRID 3; LCG 2; EGEE 1; OSG; EGEE 2; WLCG; EGEE 3 – with Data Challenges, Service Challenges, Cosmics and First physics as milestones.

13 The Worldwide LHC Computing Grid. Purpose: develop, build and maintain a distributed computing environment for the storage and analysis of data from the four LHC experiments; ensure the computing service … and common application libraries and tools. Phase I (2002–05): development & planning. Phase II (2006–2008): deployment & commissioning of the initial services.

14 WLCG Collaboration. The Collaboration: 4 LHC experiments; ~250 computing centres; 12 large centres (Tier-0, Tier-1); 56 federations of smaller “Tier-2” centres; growing to ~40 countries; grids: EGEE, OSG, Nordugrid. Technical Design Reports (WLCG, 4 experiments): June 2005. Memorandum of Understanding: agreed in October 2005. Resources: 5-year forward look. MoU signing status – Tier 1: all have now signed. Tier 2: Australia, Belgium, Canada*, China, Czech Rep.*, Denmark, Estonia, Finland, France, Germany (*), Hungary*, Italy, India, Israel, Japan, JINR, Korea, Netherlands, Norway*, Pakistan, Poland, Portugal, Romania, Russia, Slovenia, Spain, Sweden*, Switzerland, Taipei, Turkey*, UK, Ukraine, USA. Still to sign: Austria, Brazil (under discussion). * Recent additions.

15 WLCG Service Hierarchy. Tier-0 (the accelerator centre): data acquisition & initial processing, long-term data curation, distribution of data to the Tier-1 centres. Tier-1 (“online” to the data acquisition process, high availability): managed mass storage (grid-enabled data service), data-heavy analysis, national and regional support. Tier-1 sites: Canada – TRIUMF (Vancouver); France – IN2P3 (Lyon); Germany – Forschungszentrum Karlsruhe; Italy – CNAF (Bologna); Netherlands – NIKHEF/SARA (Amsterdam); Nordic countries – distributed Tier-1; Spain – PIC (Barcelona); Taiwan – Academia Sinica (Taipei); UK – CLRC (Oxford); US – FermiLab (Illinois) and Brookhaven (NY). Tier-2 (~130 centres in ~35 countries): end-user (physicist, research group) analysis – where the discoveries are made – and simulation.

16 Recent grid use (across all grid infrastructures – EGEE, OSG, Nordugrid): Tier 2: 54%, Tier 1: 35%, CERN: 11%. The grid concept really works – all contributions, large and small, are essential!

17 Recent grid activity. WLCG ran ~44 M jobs in 2007, and the workload has continued to increase: 29 M jobs so far in 2008, now at >300k jobs/day (up from ~230k/day). These workloads (reported across all WLCG centres) are at the level anticipated for 2008 data taking. The distribution of work across Tier-0/Tier-1/Tier-2 really illustrates the importance of the grid system: the Tier-2 contribution is around 50%, and >85% is external to CERN.
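As a quick sanity check on these workload numbers (a sketch; the ~135 days elapsed in 2008 by mid-May is an assumption):

```python
jobs_2007 = 44e6
jobs_2008_to_date = 29e6
days_elapsed_2008 = 135     # 1 January to ~mid-May 2008, assumed

print(f"2007 average: {jobs_2007 / 365:,.0f} jobs/day")
print(f"2008 average so far: {jobs_2008_to_date / days_elapsed_2008:,.0f} jobs/day, "
      "with peaks above 300k/day")
```

The 2007 and 2008 averages come out near 120k/day and 215k/day, consistent with the 230k/day and 300k/day levels quoted above.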

18 LHCOPN Architecture

19 Data transfer out of Tier-0. Target for 2008/2009: 1.3 GB/s.

20 Production Grids. WLCG relies on a production-quality infrastructure: it requires standards of availability/reliability, performance and manageability, and will be used 365 days a year (as it has been for several years already). Tier 1s must store the data for at least the lifetime of the LHC, ~20 years – not passively, but with active migration to newer media. It is vital that we build a fault-tolerant and reliable system that can deal with individual sites being down and recover (a minimal retry sketch follows).
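As a loose illustration of the fault tolerance described above, here is a sketch of retrying a file fetch across alternative replica sites with exponential back-off. The transfer() function and the site names are hypothetical placeholders, not a real FTS or SRM interface.

```python
import random
import time

def transfer(site, filename):
    """Hypothetical transfer attempt; fails randomly here purely for illustration."""
    return random.random() > 0.3

def fetch_with_failover(filename, replica_sites, max_attempts=5):
    delay = 1.0
    for attempt in range(max_attempts):
        site = replica_sites[attempt % len(replica_sites)]   # rotate over replicas
        if transfer(site, filename):
            print(f"fetched {filename} from {site}")
            return True
        print(f"attempt {attempt + 1} via {site} failed; retrying in {delay:.0f} s")
        time.sleep(delay)
        delay *= 2                                           # exponential back-off
    return False

fetch_with_failover("/grid/data/run1234/file.root",
                    ["T1-Karlsruhe", "T1-Lyon", "T1-Bologna"])  # illustrative names
```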

21 The EGEE Production Infrastructure. Test-beds & services: production service; pre-production service; certification test-beds (SA3); training infrastructure (NA4); training activities (NA3). Support structures & processes: Operations Coordination Centre; Regional Operations Centres; Global Grid User Support; EGEE Network Operations Centre (SA2); Operations Advisory Group (+NA4). Security & policy groups: Operational Security Coordination Team; Joint Security Policy Group; EuGridPMA (& IGTF); Grid Security Vulnerability Group.

22 Site Reliability, monthly averages for all sites – Sep 07: 89%, Oct 07: 86%, Nov 07: 92%, Dec 07: 87%, Jan 08: 89%, Feb 08: 84%. The 8 best sites averaged 93%, 95% and 96% over the period. Sites above target (plus those above 90% of target) per month: 7+2, 5+4, 9+2, 6+4, 7+3.
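A minimal sketch of how a monthly figure like these can be computed, assuming reliability is defined as time passing the availability tests divided by time outside scheduled downtime (the hourly sample data here are invented):

```python
def site_reliability(hourly_ok, scheduled_down):
    """hourly_ok[i]: site passed its tests in hour i; scheduled_down[i]: announced downtime."""
    up = sum(ok and not sched for ok, sched in zip(hourly_ok, scheduled_down))
    expected = sum(not sched for sched in scheduled_down)
    return up / expected if expected else 0.0

# one week of toy data: a 12 h outage, of which the last 6 h were scheduled downtime
hours = 7 * 24
ok = [not (48 <= h < 60) for h in range(hours)]
sched = [54 <= h < 60 for h in range(hours)]
print(f"reliability ≈ {site_reliability(ok, sched):.1%}")   # ≈ 96.3%
```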

23 Improving Reliability: monitoring, metrics, workshops, data challenges, experience, systematic problem analysis, priority from software developers.

24 Gridmap

25 Middleware: Baseline Services – the basic baseline services from the TDR (2005). Storage Elements: Castor, dCache, DPM (StoRM added in 2007); SRM 2.2 deployed in production in Dec 2007. Basic transfer tools (GridFTP, ...). File Transfer Service (FTS). LCG File Catalog (LFC). LCG data management tools (lcg-utils). Posix I/O: Grid File Access Library (GFAL). Synchronised databases T0 → T1s: 3D project. Information System: scalability improvements. Compute Elements: Globus/Condor-C, improvements to the LCG-CE for scale/reliability, web services (CREAM), support for multi-user pilot jobs (glexec, SCAS). gLite Workload Management in production. VO Management System (VOMS). VO Boxes. Application software installation. Job monitoring tools. The focus is now on continuing evolution of reliability, performance, functionality and requirements. For a production grid the middleware must allow us to build fault-tolerant and scalable services: this is more important than sophisticated functionality.
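To illustrate conceptually what the catalogue layer of the data management stack provides, here is a toy replica catalogue mapping logical file names to physical replicas; it is an illustration only, not the LFC or lcg-utils API.

```python
from collections import defaultdict

class ToyReplicaCatalogue:
    """Maps a logical file name (LFN) to the storage URLs of its replicas."""

    def __init__(self):
        self._replicas = defaultdict(list)

    def register(self, lfn, surl):
        self._replicas[lfn].append(surl)

    def replicas(self, lfn):
        return list(self._replicas[lfn])

cat = ToyReplicaCatalogue()
cat.register("/grid/atlas/run1234/file.root", "srm://se.example-t0.ch/path/file.root")
cat.register("/grid/atlas/run1234/file.root", "srm://se.example-t1.de/path/file.root")
print(cat.replicas("/grid/atlas/run1234/file.root"))   # both replicas
```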

26 Database replication – in full production: several GB/day of user data can be sustained to all Tier 1s. ~100 DB nodes at CERN and several tens of nodes at Tier 1 sites – a very large distributed database deployment. Used for several applications: experiment calibration data; replicating (central, read-only) file catalogues.
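A toy sketch of the replication idea (changes made once at Tier-0 and replayed in order at each Tier-1 copy); the real 3D-based deployment mentioned above is far more involved and is not modelled here.

```python
master = {}                                       # conditions data at Tier-0
replicas = {"T1-A": {}, "T1-B": {}, "T1-C": {}}   # read-only Tier-1 copies (illustrative)
change_log = []                                   # ordered log of updates still to ship

def update_master(key, value):
    master[key] = value
    change_log.append((key, value))

def propagate():
    """Replay all pending changes, in order, at every replica."""
    while change_log:
        key, value = change_log.pop(0)
        for copy in replicas.values():
            copy[key] = value

update_master("calib/pixel_alignment", "v3")
propagate()
print(replicas["T1-B"])                           # {'calib/pixel_alignment': 'v3'}
```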

27 LCG depends on two major science grid infrastructures: EGEE (Enabling Grids for E-sciencE) and OSG (US Open Science Grid). Interoperability and interoperation are vital; significant effort has gone into building the procedures to support them.

28 Enabling Grids for E-sciencE (EGEE-II, INFSO-RI-031688): 240 sites, 45 countries, 45,000 CPUs, 12 PetaBytes, >5000 users, >100 VOs, >100,000 jobs/day. Application areas: archeology, astronomy, astrophysics, civil protection, computational chemistry, earth sciences, finance, fusion, geophysics, high energy physics, life sciences, multimedia, material sciences, ... A grid infrastructure project co-funded by the European Commission, now in its 2nd phase with 91 partners in 32 countries.

29 EGEE: Increasing workloads ⅓ non-LHC

30 Grid Applications: medical, seismology, chemistry, astronomy, fusion, particle physics.

31 Share of EGEE resources, 5/07 – 4/08: 45 million jobs (HEP).

32 HEP use of EGEE: May 07 – Apr 08

33 The next step

34 Sustainability: Beyond EGEE-II. Need to prepare a permanent, common grid infrastructure. Ensure the long-term sustainability of the European e-infrastructure, independent of short project funding cycles. Coordinate the integration and interaction between National Grid Infrastructures (NGIs). Operate the European level of the production grid infrastructure for a wide range of scientific disciplines, to link the NGIs.

35 EGI – European Grid Initiative www.eu-egi.org EGI Design Study proposal to the European Commission (started Sept 07) Supported by 37 National Grid Initiatives (NGIs) 2 year project to prepare the setup and operation of a new organizational model for a sustainable pan-European grid infrastructure after the end of EGEE-3

36 Summary. We have an operating production-quality grid infrastructure that: is in continuous use by all 4 experiments (and many other applications); is still growing in size – sites, resources (and still to finish the ramp-up for LHC start-up); demonstrates interoperability (and interoperation!) between 3 different grid infrastructures (EGEE, OSG, Nordugrid); is becoming more and more reliable; and is ready for LHC start-up. For the future we must: learn how to reduce the effort required for operation; tackle upcoming infrastructure issues (e.g. power, cooling); manage the migration of the underlying infrastructures to longer-term models; and be ready to adapt the WLCG service to new ways of doing distributed computing.


