High Energy Physics – A big data use case. Bob Jones, Head of openlab IT dept, CERN.

1 High Energy Physics – A big data use case
Bob Jones, Head of openlab IT dept, CERN
Franco-British Workshop on Big Data in Science, London, 6-7 November 2012
This document, produced by members of the Helix Nebula consortium, is licensed under a Creative Commons Attribution 3.0 Unported License. Permissions beyond the scope of this license may be available at http://helix-nebula.eu/. The Helix Nebula project is co-funded by the European Community Seventh Framework Programme (FP7/2007-2013) under Grant Agreement no 312301.

2 Accelerating Science and Innovation

3 [Diagram: LHC data flow – experiment readout rates of 200-400 MB/sec, 1.25 GB/sec and 1-2 GB/sec; data flow to permanent storage: 4-6 GB/sec]
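The peak rates above are much higher than the yearly averages implied by the archive totals quoted later in the deck (23 PB written in 2011), because the accelerator does not take data continuously. A minimal sketch of that arithmetic, assuming SI units (1 PB = 10^15 bytes):

```python
# Back-of-envelope check: sustained rate vs. annual archive volume.
# Assumption (not from the slides): SI prefixes, 1 PB = 1e15 bytes.
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000

def annual_volume_pb(rate_gb_per_s: float) -> float:
    """PB written in a year if `rate_gb_per_s` were sustained non-stop."""
    return rate_gb_per_s * 1e9 * SECONDS_PER_YEAR / 1e15

# 6 GB/s sustained all year would be ~189 PB ...
peak_volume = annual_volume_pb(6.0)
# ... so the ~23 PB actually written in 2011 corresponds to an
# average rate well below the heavy-ion peak:
avg_rate_gb_s = 23e15 / SECONDS_PER_YEAR / 1e9  # ~0.73 GB/s
print(f"{peak_volume:.0f} PB/yr at 6 GB/s; 23 PB/yr is ~{avg_rate_gb_s:.2f} GB/s average")
```

The gap between ~0.73 GB/s average and the >6 GB/s heavy-ion peaks is why the tape system is sized for burst rates rather than the annual mean.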

4 WLCG – what and why?
A distributed computing infrastructure to provide the production and analysis environments for the LHC experiments
Managed and operated by a worldwide collaboration between the experiments and the participating computer centres
The resources are distributed – for funding and sociological reasons
Our task was to make use of the resources available to us – no matter where they are located
Secure access via X509 certificates issued by a network of national authorities – the International Grid Trust Federation (IGTF), http://www.igtf.net/
Tier-0 (CERN): data recording, initial data reconstruction, data distribution
Tier-1 (11 centres): permanent storage, re-processing, analysis
Tier-2 (~130 centres): simulation, end-user analysis

5 WLCG: Data Taking
Castor service at Tier 0 well adapted to the load:
– Heavy ions: more than 6 GB/s to tape (tests show that Castor can easily support >12 GB/s); the actual limit now is the network from the experiment to the computer centre
– Major improvements in tape efficiencies – tape writing at ~native drive speeds, so fewer drives are needed
– ALICE had x3 compression for raw data in heavy-ion runs
Heavy ions: ALICE data into Castor > 4 GB/s; overall rates to tape > 6 GB/s
23 PB of data written in 2011; 16 PB already in 2012!

6 Overall use of WLCG
10^9 HEPSPEC-hours/month (~150k CPUs in continuous use)
1.5M jobs/day
Usage continues to grow, even over the end-of-year technical stop, in both jobs/day and CPU usage
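The "10^9 HEPSPEC-hours/month ≈ 150k CPUs" equivalence can be checked with a quick sketch. The benchmark score per core is an assumption here (the slide does not give one); a value of roughly 10 HEPSPEC06 per core, typical of hardware of that era, makes the numbers line up:

```python
# Consistency check for "10^9 HEPSPEC-hours/month ~ 150k CPUs continuous".
# Assumption (not from the slide): ~10 HEPSPEC06 per core.
HS06_PER_CORE = 10.0
HOURS_PER_MONTH = 730.0  # ~ 365 * 24 / 12

hepspec_hours_per_month = 1e9
core_hours = hepspec_hours_per_month / HS06_PER_CORE
cores_continuous = core_hours / HOURS_PER_MONTH
print(f"~{cores_continuous:,.0f} cores running continuously")
```

The result is of order 140k cores, consistent with the quoted ~150k given the spread in per-core benchmark scores across the grid.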

7 Significant use of Tier 2s for analysis (CPU, 11.2010–10.2011)

8 Broader Impact of the LHC Computing Grid
WLCG has been leveraged on both sides of the Atlantic, to benefit the wider scientific community:
– Europe (EC FP7): Enabling Grids for E-sciencE (EGEE) 2004-2010; European Grid Infrastructure (EGI) 2010-
– USA (NSF): Open Science Grid (OSG) 2006-2012 (+ extension?)
Many scientific applications: archeology, astronomy, astrophysics, civil protection, computational chemistry, earth sciences, finance, fusion, geophysics, high energy physics, life sciences, multimedia, material sciences, …

9 EGEE – What do we deliver?
(Enabling Grids for E-sciencE, EGEE-III INFSO-RI-222667, May 2008)
Infrastructure operation
– Sites distributed across many countries: a large quantity of CPUs and storage; continuous monitoring of grid services & automated site configuration/management; support for multiple Virtual Organisations from diverse research disciplines
Middleware
– Production-quality middleware distributed under a business-friendly open source licence: implements a service-oriented architecture that virtualises resources; adheres to recommendations on web service inter-operability and is evolving towards emerging standards
User support
– A managed process from first contact through to production usage: training, expertise in grid-enabling applications, an online helpdesk, networking events (User Forum, conferences, etc.)

10 Sample of Business Applications
(Enabling Grids for E-sciencE, EGEE-III INFSO-RI-222667, May 2008)
SMEs
– NICE (Italy) & GridWisetech (Poland): develop services on open source middleware for deployment on customers' in-house IT infrastructure
– OpenPlast project (France): develop and deploy a Grid platform for the plastics industry
– Imense Ltd (UK): ported a gLite application to GridPP sites
Energy
– TOTAL (UK): ported an application using the GILDA testbed
– CGGVeritas (France): manages in-house IT infrastructures and sells services to the petrochemical industry
Automotive
– DataMat (Italy): provides grid services to the automotive industry

11

12 CERN openlab in a nutshell
A science–industry partnership to drive R&D and innovation, with over a decade of success
– Evaluate state-of-the-art technologies in a challenging environment and improve them
– Test in a research environment today what will be used in many business sectors tomorrow
– Train the next generation of engineers/employees
– Disseminate results and reach out to new audiences
Bob Jones – CERN openlab 2012

13 Virtuous Cycle
A public–private partnership between the research community and industry:
CERN requirements push the limit → apply new techniques and technologies → joint development in rapid cycles → test prototypes in the CERN environment → produce advanced products and services → (back to CERN requirements)
Bob Jones – CERN openlab 2012

14 openlab III (2009-2011)
Inter-partner collaborations: 2; Fellows: 4; Summer Students: 6; Publications: 37; Presentations: 41; Reference Activities: over 15; Product enhancements: on 8 product lines
CERN openlab Board of Sponsors 2012

15 ICE-DIP
A Marie Curie proposal submitted to the EC in January 2012 and accepted for funding (total 1.25M€ from the EC). ICE-DIP, the Intel-CERN European Doctorate Industrial Program, is an EID scheme hosted by CERN and Intel Labs Europe. ICE-DIP will engage 5 Early Stage Researchers (ESRs); each ESR will be hired by CERN for 3 years and will spend 50% of their time at Intel. Academic rigour and training quality are ensured by the associate partners, National University of Ireland Maynooth and Dublin City University, where the ESRs will be enrolled in doctorate programmes.
Research themes: usage of many-core processors for data acquisition, future optical interconnect technologies, reconfigurable logic, and data acquisition networks. The focus is the LHC experiments' trigger and data acquisition systems.

16 How to evolve WLCG?
A distributed computing infrastructure to provide the production and analysis environments for the LHC experiments:
– Collaboration: the resources are distributed and provided "in-kind"
– Service: managed and operated by a worldwide collaboration between the experiments and the participating computer centres
– Implementation: today, general grid technology with high-energy-physics-specific higher-level services
Evolve the implementation while preserving the collaboration & service

17 CERN-ATLAS flagship configuration
Monte Carlo jobs (lighter I/O): 10s of MB in/out, ~6-12 hours/job; ran ~40,000 CPU days
Difficulties overcome: different visions of clouds, different APIs, networking aspects
Ramón Medrano Llamas, Fernando Barreiro, Dan van der Ster (CERN IT), Rodney Walker (LMU Munich)
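The figures on this slide imply the scale of the job campaign. Assuming single-core jobs at the quoted 6-12 hours each (the slide gives the range but not the job count), ~40,000 CPU-days corresponds to roughly 80,000-160,000 completed jobs:

```python
# Rough estimate of the number of Monte Carlo jobs covered by
# ~40,000 CPU-days, using the slide's 6-12 hours/job range.
# Assumption (not from the slide): one core per job.
CPU_DAYS = 40_000
cpu_hours = CPU_DAYS * 24  # 960,000 CPU-hours

for hours_per_job in (6, 12):
    jobs = cpu_hours / hours_per_job
    print(f"at {hours_per_job} h/job -> ~{jobs:,.0f} jobs")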

18 Conclusions
– The physics community took the concept of a grid and turned it into a global production-quality service, aggregating massive resources to meet the needs of the LHC collaborations
– The results of this development serve a wide range of research communities; have helped industry understand how it can use distributed computing; have launched a number of start-up companies; and have provided the IT service industry with new tools to support its customers
– Open source licences encourage the uptake of the technology by other research communities and industry, while ensuring the research community's contribution is acknowledged
– Providing industry and research communities with access to computing infrastructures for prototyping purposes reduces the investment and risk in adopting new technologies
October 2012 – The LHC Computing Grid – Bob Jones

19 Conclusions
– Many research communities and business sectors are now facing an unprecedented data deluge; the physics community, with its LHC programme, has unique experience in handling data at this scale
– The on-going work to evolve the LHC computing infrastructure to make use of cloud computing technology can serve as an excellent test ground for the adoption of cloud computing in many research communities, business sectors and government agencies
– The Helix Nebula initiative is driving the physics community's exploration of how commercial cloud services can serve the research infrastructures of the future and provide new markets for European industry
October 2012 – The LHC Computing Grid – Bob Jones
