Big Data for Big Discoveries: How the LHC Looks for Needles by Burning Haystacks
Alberto Di Meglio, CERN openlab Head
DOI: 10.5281/zenodo.45449, CC-BY-SA


1 Big Data for Big Discoveries: How the LHC Looks for Needles by Burning Haystacks. Alberto Di Meglio, CERN openlab Head. DOI: 10.5281/zenodo.45449, CC-BY-SA, images courtesy of CERN

2 The Large Hadron Collider (LHC)
Alberto Di Meglio – CERN LHC - HPC & Big Data 2016

3 Burning Haystacks
The detectors: 7000-ton “microscopes” with 150 million sensors; data generated 40 million times per second (petabytes per second)
Low-level filters and triggers: 100,000 selections per second (terabytes per second)
High-level filters (HLT): 100 selections per second (gigabytes per second)
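The selection rates on this slide imply a large rejection factor at each filtering stage. The following is a minimal sketch of that arithmetic, using only the three rates quoted on the slide; the names `STAGES`, `rejection_factors` and `overall_fraction_kept` are illustrative and do not come from any CERN software.

```python
# Rates quoted on the slide, in events (or selections) per second.
STAGES = [
    ("detector readout", 40_000_000),
    ("low-level filters and triggers", 100_000),
    ("high-level filters (HLT)", 100),
]


def rejection_factors(stages):
    """Return (stage name, reduction factor relative to the previous stage)."""
    factors = []
    for (_, rate_in), (name, rate_out) in zip(stages, stages[1:]):
        factors.append((name, rate_in / rate_out))
    return factors


def overall_fraction_kept(stages):
    """Fraction of the initial rate that survives every stage."""
    return stages[-1][1] / stages[0][1]


if __name__ == "__main__":
    for name, factor in rejection_factors(STAGES):
        print(f"{name}: keeps about 1 in {factor:,.0f}")
    print(f"overall fraction kept: {overall_fraction_kept(STAGES):.1e}")
```

With these inputs the low-level stage keeps about 1 in 400, the HLT keeps about 1 in 1,000 of what remains, and roughly 1 in 400,000 of the initial rate survives overall.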

4 Storage, Reconstruction, Simulation, Distribution: 6 GB/s

5 Worldwide LHC Computing Grid
Tier-0 (CERN): data recording, initial data reconstruction, data distribution
Tier-1 (12 centres): permanent storage, re-processing, analysis
Tier-2 (68 federations, ~140 centres): simulation, end-user analysis
525,000 cores, 450 PB

6 1 PB/s of data generated by the detectors
Up to 30 PB/year of stored data
A distributed computing infrastructure of half a million cores working 24/7
An average of 40M jobs/month
A continuous data transfer rate of 4-6 GB/s across the Worldwide LHC Computing Grid (WLCG)
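To put these figures on a common scale, the sketch below converts them to per-second and per-day quantities. It is only back-of-the-envelope arithmetic and assumes decimal units (1 PB = 10^15 bytes), a 365-day year and a 30-day month, none of which the slide states; the function names are illustrative.

```python
SECONDS_PER_DAY = 86_400
GB = 10**9   # assumption: decimal units
TB = 10**12
PB = 10**15


def average_rate_gb_per_s(total_bytes, days):
    """Average rate in GB/s needed to accumulate total_bytes over the given days."""
    return total_bytes / (days * SECONDS_PER_DAY) / GB


def daily_volume_tb(rate_gb_per_s):
    """Volume in TB moved in one day at a constant rate in GB/s."""
    return rate_gb_per_s * GB * SECONDS_PER_DAY / TB


def jobs_per_second(jobs_per_month, days_per_month=30):
    """Average job rate, assuming a 30-day month by default."""
    return jobs_per_month / (days_per_month * SECONDS_PER_DAY)


if __name__ == "__main__":
    print(f"30 PB/year stored ~ {average_rate_gb_per_s(30 * PB, 365):.2f} GB/s on average")
    for rate in (4, 6):
        print(f"{rate} GB/s transfer ~ {daily_volume_tb(rate):.1f} TB/day")
    print(f"40M jobs/month ~ {jobs_per_second(40_000_000):.1f} jobs/s")
```

Under these assumptions, 30 PB/year averages just under 1 GB/s of stored data, a 4-6 GB/s transfer rate moves roughly 350 to 520 TB per day, and 40M jobs/month is about 15 jobs per second.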

7 The Higgs boson completes the Standard Model, but the Model explains only about 5% of our Universe. What is the other 95% of the Universe made of? How does gravity really work? Why is there no antimatter in nature? Exotic particles? Dark matter? Extra dimensions?

8 LHC Schedule
Phases: First run, LS1, Second run, LS2, Third run, LS3, HL-LHC
Timeline axis: 2009 through 2024, then 2030? …
LHC startup: 900 GeV
7 TeV, L = 6 x 10^33 cm^-2 s^-1, bunch spacing = 50 ns
Phase-0 Upgrade (design energy, nominal luminosity): 14 TeV, L = 1 x 10^34 cm^-2 s^-1, bunch spacing = 25 ns
Phase-1 Upgrade (design energy, design luminosity): 14 TeV, L = 2 x 10^34 cm^-2 s^-1, bunch spacing = 25 ns
Phase-2 Upgrade (High Luminosity): 14 TeV, L = 1 x 10^35 cm^-2 s^-1, spacing = 12.5 ns
FCC?
50 times more data than today in the next 10 years: 50 PB/s out of the detectors, 5 PB/day to be stored
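The scale of these steps is easier to see as ratios. The sketch below compares each luminosity on the schedule with the first-run value and converts the projected 5 PB/day into a sustained rate. It assumes decimal units (1 PB = 10^15 bytes) and data stored evenly over 24 hours, which the slide does not state; the names are illustrative.

```python
SECONDS_PER_DAY = 86_400
GB = 10**9   # assumption: decimal units
PB = 10**15

# Luminosities listed on the schedule, in cm^-2 s^-1.
LUMINOSITY = {
    "First run (7 TeV)": 6e33,
    "Phase-0 Upgrade": 1e34,
    "Phase-1 Upgrade": 2e34,
    "Phase-2 Upgrade (High Luminosity)": 1e35,
}


def relative_to_first(luminosity):
    """Ratio of each luminosity to the first entry in the mapping."""
    baseline = next(iter(luminosity.values()))
    return {name: value / baseline for name, value in luminosity.items()}


def sustained_gb_per_s(pb_per_day):
    """Constant rate in GB/s equivalent to storing pb_per_day evenly over a day."""
    return pb_per_day * PB / SECONDS_PER_DAY / GB


if __name__ == "__main__":
    for name, ratio in relative_to_first(LUMINOSITY).items():
        print(f"{name}: {ratio:.1f}x first-run luminosity")
    print(f"5 PB/day ~ {sustained_gb_per_s(5):.1f} GB/s sustained")
```

Under these assumptions the High Luminosity phase is roughly 17 times the first-run luminosity, and 5 PB/day corresponds to about 58 GB/s sustained.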

9 Information Technology Research Areas
Data acquisition and filtering
Computing platforms, data analysis, simulation
Data storage and long-term data preservation
Compute provisioning (cloud)
Data analytics
Networks

10 Flat Budget: How Can We Address the Challenges?

11 Technology Evolution: Computing
- More performance, less cost
- Redesign of DAQ: less custom hardware, more off-the-shelf CPUs, fast interconnects, optimized SW
- Redesign of SW to exploit modern CPU/GPU platforms (vectorization, parallelization)
- New architectures (e.g. automata)

12 Technology Evolution: Storage
- Reduce data to be stored: move computation from offline to online
- More tapes, fewer disks: dynamic data placement
- New media: object disks, shingled disks
- Persistent metadata, in-memory operations: Flash/SSD, NVRAM, NVDIMM

13 Technology Evolution: Infrastructure
- Economies of scale
- Large-scale, joint procurement of services and equipment
- Hybrid infrastructure models: allow commercial resource procurement; large-scale cloud procurement models are not yet clear

14 Technology Evolution: Data Analytics
- Reduce analysis cost, decrease time
- More efficient operations of the LHC systems: log analysis, proactive maintenance
- Infrastructure monitoring, optimization
- Data reduction, event filtering and identification on large data sets (~10 PB)

15 Collaborations
Alberto Di Meglio – CERN openlab
Industrial Collaboration Frameworks
HEP Community
Joint Infrastructure Initiatives
Laboratories, academia, research

16 New educational requirements
Software & Computing Engineers: multicore CPU programming, graphics processors (GPU), multithreaded software
Data Scientists: data analysis technologies, tools, data storage, visualization, monitoring, security, etc.
Multidisciplinary applications: applications of physics to other domains (hadron therapy, etc.), knowledge transfer

17 Take-Away Messages
- LHC is a long-term project: need to adapt and evolve over time
- Look for cost-effective solutions: continuous technology tracking; aim for “just as good as necessary” based on concrete scientific objectives
- Collaboration and education: people are the most important asset to make this endeavour sustainable over time

18 Contacts
EXECUTIVE CONTACT: Alberto Di Meglio, CERN openlab Head, alberto.di.meglio@cern.ch
TECHNICAL CONTACT: Maria Girone, CERN openlab CTO, Maria.girone@cern.ch; Fons Rademakers, CERN openlab CRO, fons.rademakers@cern.ch
COMMUNICATION CONTACT: Andrew Purcell, CERN openlab Communication Officer, Andrew.purcell@cern.ch; Mélissa Gaillard, CERN IT Communication Officer, melissa.gaillard@cern.ch
ADMIN CONTACT: Kristina Gunne, CERN openlab Administration Officer, kristina.gunne@cern.ch
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. It includes photos, models and videos courtesy of CERN and uses contents provided by CERN and CERN openlab.

