
1 A Large Hadron Collider Case Study - Where HPC and Big Data Converge. Frank Würthwein, Professor of Physics, University of California San Diego. November 15th, 2013.

2 Outline: The Science; Software & Computing Challenges; Present Solutions; Future Solutions.

3 The Science

4 The Universe is a strange place! About 67% of its energy content is “dark energy” (we have no clue what this is), and about 29% is “dark matter” (we have some ideas but no proof of what it is). All of what we know makes up only about 4% of the universe.

5 To study Dark Matter we need to create it in the laboratory. [Aerial view of the LHC ring between Lake Geneva and Mont Blanc, marking the four detectors: ALICE, ATLAS, LHCb, and CMS.]

6

7 “Big bang” in the laboratory. We gain insight by colliding particles at the highest energies possible to measure production rates, masses & lifetimes, and decay rates. From this we derive the “spectroscopy” as well as the “dynamics” of elementary particles. Progress is made by going to higher energies and brighter beams.

8 Explore Nature over 15 orders of magnitude: perfect agreement between theory & experiment. Dark Matter is expected somewhere below this line.

9 And for the Sci-Fi buffs ... Imagine our 3D world to be confined to a 3D surface in a 4D universe. Imagine this surface to be curved such that the 4th-D distance is short for locations light years away in 3D. Imagine space travel by tunneling through the 4th D. The LHC is searching for evidence of a 4th dimension of space.

10 Recap so far ... The beams cross in the ATLAS and CMS detectors at a rate of 20 MHz. Each crossing contains ~10 collisions. We are looking for rare events that are expected to occur in roughly 1 in 10^13 collisions, or less.
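
A quick back-of-envelope, using only the numbers quoted on this slide, shows why this is a needle-in-a-haystack search. The Python below is purely illustrative; the ~14-hour figure it prints is my arithmetic, not a number from the talk.

```python
# Back-of-envelope rate estimate using the numbers on the slide.
crossing_rate_hz = 20e6          # beam crossings per second in ATLAS/CMS
collisions_per_crossing = 10     # ~10 proton-proton collisions per crossing
rarity = 1e-13                   # a rare process occurring in ~1 of 10^13 collisions

collision_rate_hz = crossing_rate_hz * collisions_per_crossing   # ~2e8 collisions/s
rare_events_per_second = collision_rate_hz * rarity              # ~2e-5 per second
seconds_per_rare_event = 1.0 / rare_events_per_second            # ~5e4 s

print(f"collisions per second:      {collision_rate_hz:.1e}")
print(f"hours between rare events:  {seconds_per_rare_event / 3600:.1f}")  # ~14 hours
```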

11 Software & Computing Challenges

12 The CMS Experiment

13 The CMS detector: 12,500 tons, 21 m long, 16 m in diameter.
80 million electronic channels x 4 bytes x 40 MHz
~ 10 Petabytes/sec of information
x 1/1000 (zero-suppression)
x 1/100,000 (online event filtering)
~ 100-1000 Megabytes/sec of raw data to tape.
1 to 10 Petabytes of raw data per year are written to tape, not counting simulations.
2000 scientists (1200 with a Ph.D. in physics) from ~180 institutions in ~40 countries.
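
The reduction chain above can be checked with a few lines of arithmetic. The sketch below reproduces it; the assumed ~10^7 live seconds of data taking per year is a conventional rough figure I am supplying, not one stated on the slide.

```python
# Reproduce the slide's data-reduction chain (order-of-magnitude arithmetic).
channels = 80e6            # electronic channels
bytes_per_channel = 4
crossing_rate_hz = 40e6    # 40 MHz readout

raw_rate = channels * bytes_per_channel * crossing_rate_hz   # bytes/sec before reduction
after_zero_suppression = raw_rate / 1_000                    # keep only non-empty channels
after_online_filter = after_zero_suppression / 100_000       # keep ~1 in 100,000 events

print(f"raw:     {raw_rate / 1e15:.1f} PB/s")                  # ~12.8 PB/s (slide: ~10 PB/s)
print(f"to tape: {after_online_filter / 1e6:.0f} MB/s")        # ~128 MB/s (slide: 100-1000 MB/s)

# A year at this rate, assuming ~1e7 live seconds of data taking (my assumption):
print(f"per year: {after_online_filter * 1e7 / 1e15:.1f} PB")  # ~1.3 PB (slide: 1-10 PB)
```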

14 Example of an interesting event: a Higgs to γγ candidate.

15 Zoomed-in R-Z view of a busy event. Yellow dots indicate individual collisions, all during the same beam crossing.

16 Active scientists in CMS: 5-40% of the scientific members are actively doing large-scale data analysis in any given week. About 1/4 of the collaboration, scientists and engineers, contributed to the common source code of ~3.6M C++ SLOC.

17 Evolution of the LHC science program: the event rate written to tape grows from 150 Hz to 1000 Hz to 10000 Hz across successive stages of the program.

18 The Challenge: How do we organize the processing of tens to thousands of Petabytes of data by a globally distributed community of scientists, and do so with manageable “change costs” for the next 20 years? Guiding principles for solutions: choose technical solutions that allow computing resources to be as distributed as the human resources; support distributed ownership and control within a global single sign-on security context; design for heterogeneity and adaptability.

19 Present Solutions

20 Federation of national infrastructures. In the U.S.A.: the Open Science Grid.

21 Among the top 500 supercomputers there are only two that are bigger, when measured by power consumption.

22 Tier-3 centers: locally controlled resources not pledged to any of the 4 collaborations, ranging from large, time-shared clusters at major research universities to small clusters inside departments and individual research groups. This requires the global sign-on system to be open to dynamically added resources, with APIs that are easy to support and easy to work around when unsupported.

23 Me -- My friends -- The grid/cloud: O(10^4) users, O(10^1-2) VOs, O(10^2-3) sites. The user (“Me”) runs a thin client; the domain-science-specific VO middleware & support (“My friends”) is thick; behind it, a thin “Grid API” connects to the anonymous grid or cloud, which is common to all sciences and industry.

24 “My Friends” services: dynamic resource provisioning; workload management (schedule resources, establish the runtime environment, execute the workload, handle results, clean up); data distribution and access (input, output, and relevant metadata); file catalogue. A toy sketch of the workload lifecycle follows below.
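
As a toy illustration of the workload-management lifecycle listed above, here is a minimal Python sketch. The function and its behavior are hypothetical stand-ins, not the actual CMS services.

```python
# Hypothetical sketch of the workload-management lifecycle from the slide.
# None of these functions correspond to real CMS service APIs.
import os, shutil, subprocess, tempfile

def run_workload(job_script, input_files, output_dir):
    workdir = tempfile.mkdtemp(prefix="job_")          # establish runtime environment
    try:
        for f in input_files:                          # stage in input data
            shutil.copy(f, workdir)
        result = subprocess.run(                       # execute the workload
            ["bash", os.path.abspath(job_script)],
            cwd=workdir, capture_output=True, text=True)
        shutil.copytree(workdir, output_dir,           # handle results
                        dirs_exist_ok=True)
        return result.returncode
    finally:
        shutil.rmtree(workdir, ignore_errors=True)     # clean up
```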

25 Optimize the data structure for partial reads.
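
To make the idea concrete, here is a minimal sketch of a column-oriented layout in which a partial read touches only the needed quantities. It uses HDF5 via h5py purely as a stand-in; CMS actually stores events in ROOT files, and all dataset names below are invented.

```python
# Minimal illustration of a column-oriented layout that supports partial reads.
import h5py
import numpy as np

n_events = 1_000_000
with h5py.File("events.h5", "w") as f:
    f.create_dataset("muon_pt",  data=np.random.exponential(30.0, n_events))
    f.create_dataset("muon_eta", data=np.random.uniform(-2.4, 2.4, n_events))
    f.create_dataset("jet_pt",   data=np.random.exponential(50.0, n_events))

# An analysis that only needs muon_pt reads just that column,
# leaving the other datasets on disk untouched.
with h5py.File("events.h5", "r") as f:
    muon_pt = f["muon_pt"][:]
    print("mean muon pT:", muon_pt.mean())
```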

26 [Histogram of the number of files read vs. the fraction of each file that is read, with a marker at 20% and an overflow bin.] For the vast majority of files, less than 20% of the file is read. Average: 20-35%; median: 3-7% (depending on the type of file).

27 Future Solutions

28 From present to future. Initially, we operated a largely static system: data was placed quasi-statically before it could be analyzed; analysis centers had contractual agreements with the collaboration; all reconstruction was done at centers with custodial archives. Increasingly, we have too much data to afford this, leading to: dynamic data placement (data is placed at T2s based on the job backlog in global queues; a toy sketch follows below); WAN access, “Any Data, Anytime, Anywhere” (jobs are started on the same continent as the data instead of on the same cluster attached to the data); and dynamic creation of data processing centers (Tier-1 hardware is bought to satisfy steady-state needs instead of peak needs: primary processing as data comes off the detector is steady state, while the annual reprocessing of accumulated data creates peak needs).
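
The dynamic-data-placement point can be illustrated with a toy policy: replicate the datasets with the largest job backlog to the sites where those jobs are waiting. The sketch below is hypothetical; it is not the real CMS placement logic, and the site names are just examples.

```python
# Hypothetical sketch of backlog-driven data placement (not the real CMS policy).
from collections import Counter

def choose_replications(queued_jobs, replicas, max_new_replicas=5):
    """queued_jobs: list of (dataset, site_where_job_waits);
    replicas: dataset -> set of sites that already host it."""
    backlog = Counter(dataset for dataset, _ in queued_jobs)
    plans = []
    for dataset, n_waiting in backlog.most_common():
        for site in {site for ds, site in queued_jobs if ds == dataset}:
            if site not in replicas.get(dataset, set()):
                plans.append((dataset, site, n_waiting))   # replicate here
    return plans[:max_new_replicas]

# Example: dataset "B" has a backlog at T2_US_UCSD but no replica there yet.
queue = [("A", "T2_DE_DESY"), ("B", "T2_US_UCSD"), ("B", "T2_US_UCSD")]
print(choose_replications(queue, {"A": {"T2_DE_DESY"}, "B": {"T1_US_FNAL"}}))
```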

29 Any Data, Anytime, Anywhere: a global redirection system unifies all CMS data into one globally accessible namespace. It is made possible by paying careful attention to the IO layer to avoid inefficiencies due to IO-related latencies.
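
A sketch of the redirection idea: the client opens a file by its logical name, a redirector looks up which sites hold a replica, and the client is pointed at one of them, preferring nearby sites. In CMS this is realized with an XRootD federation; the code below is only a conceptual illustration with an invented catalogue.

```python
# Hypothetical sketch of global redirection; not the real XRootD protocol.
SITE_CATALOGS = {                       # which sites hold which files
    "T2_US_UCSD": {"/store/data/run2012/evts_001.root"},
    "T1_US_FNAL": {"/store/data/run2012/evts_001.root",
                   "/store/data/run2012/evts_002.root"},
}

def redirect(logical_name, preferred_region="US"):
    """Return a physical URL for the file, preferring sites in the client's region."""
    holders = [site for site, files in SITE_CATALOGS.items() if logical_name in files]
    if not holders:
        raise FileNotFoundError(logical_name)
    holders.sort(key=lambda s: preferred_region not in s)   # same-region sites first
    return f"root://{holders[0].lower()}.example.org/{logical_name}"

print(redirect("/store/data/run2012/evts_002.root"))
```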

30 Vision going forward: implemented this vision for the 1st time in Spring 2013, using the Gordon supercomputer at SDSC.

31

32 The CMS “My Friends” stack.
Job environment:
CMSSW release environment, NFS-exported from the Gordon IO nodes (future: CernVM-FS via Squid caches; J. Blomer et al., 2012 J. Phys.: Conf. Ser. 396 052013).
Security context (CA certs, CRLs) via the OSG worker node client.
CMS calibration data access via FroNTier (B. Blumenfeld et al., 2008 J. Phys.: Conf. Ser. 119 072007), with Squid caches installed on the Gordon IO nodes.
Data and job handling:
glideinWMS (I. Sfiligoi et al., doi:10.1109/CSIE.2009.950) implements “late binding” provisioning of CPU and job scheduling, and submits pilots to Gordon via BOSCO (GSI-SSH).
WMAgent to manage CMS workloads.
PhEDEx data transfer management, using SRM and GridFTP.
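
The “late binding” that glideinWMS provides can be pictured as follows: a pilot lands on a worker node first, validates the local environment, and only then pulls a job from the central queue. The sketch below is a toy model of that idea with invented names, not the glideinWMS API.

```python
# Toy sketch of "late binding": a job is bound to a slot only after the pilot
# is already running there and has checked the environment.
import queue

central_queue = queue.Queue()                 # jobs held centrally, unbound to any site
for i in range(3):
    central_queue.put({"job_id": i, "needs_cmssw": True})

def pilot(site_name, has_cmssw):
    """Runs on a worker node; claims a job only if this slot qualifies."""
    if not has_cmssw:
        return None                           # unusable slot: no job ever gets bound to it
    try:
        job = central_queue.get_nowait()      # late binding happens here, at runtime
    except queue.Empty:
        return None
    print(f"{site_name}: running job {job['job_id']}")
    return job

pilot("Gordon", has_cmssw=True)
pilot("SomeOtherSite", has_cmssw=False)
```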

33 (The same stack as on the previous slide.) This is clearly mighty complex! So let’s focus only on the parts that are specific to incorporating Gordon as a dynamic data processing center.

34 The parts deployed or modified to incorporate Gordon (shown in red on the slide): BOSCO; a minor modification of the PhEDEx config file; deploying Squid; exporting CMSSW & the worker node client.

35 Gordon results. Work completed in February/March 2013 as a result of a “lunch conversation” between SDSC and US-CMS management, dynamically responding to an opportunity. 400 million RAW events were processed: 125 TB in and ~150 TB out, using ~2 million core-hours of processing. Extremely useful both for science results and as a proof of principle in software & computing.
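
Dividing the slide's totals gives rough per-event numbers, useful as a sanity check; the figures printed below are simple arithmetic on the quoted totals, nothing more.

```python
# Rough per-event numbers implied by the slide's totals.
events = 400e6              # RAW events processed
core_hours = 2e6            # total processing
tb_in, tb_out = 125, 150    # data moved in/out of Gordon

print(f"CPU per event:    {core_hours * 3600 / events:.0f} core-seconds")  # ~18 s
print(f"input per event:  {tb_in * 1e12 / events / 1e3:.0f} kB")           # ~310 kB
print(f"output per event: {tb_out * 1e12 / events / 1e3:.0f} kB")          # ~375 kB
```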

36 Summary & Conclusions. Guided by the principles of supporting distributed ownership and control in a global single sign-on security context, and of designing for heterogeneity and adaptability, the LHC experiments very successfully developed and implemented a set of new concepts to deal with BigData.

37 Outlook (I). The LHC experiments had to largely invent an island of BigData technologies, with limited interactions with industry and other domain sciences. Is it worth building bridges to other islands? The IO stack and HDF5? MapReduce? What else? Is there a mainland emerging that is not just another island?

38 Outlook (II). Problem: with increasing brightness of the beams, the number of simultaneous collisions increases from ~10 to ~140. The resulting increase in the number of hits in the detector leads to an exponential growth in the CPU time needed to do the pattern recognition at the core of our reconstruction software: O(10^4) by 2023. Hoped-for solution: O(10^4) ~ O(10) x O(10) x O(10) x O(10), from Moore’s law, new hardware architectures, new algorithms, and building a better detector.
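
As a back-of-envelope, one can ask how much of the O(10^4) Moore's law alone might supply over the decade, under an assumed doubling period, and how much would have to come from new architectures, new algorithms, and a better detector. The numbers below are my illustrative assumptions, not projections from the talk.

```python
# Back-of-envelope split of the O(10^4) CPU gap (assumed doubling period; illustrative only).
needed_speedup = 1e4
years = 2023 - 2013
doubling_period_years = 2.0                            # assumed Moore's-law doubling time

moore_factor = 2 ** (years / doubling_period_years)    # ~32x over a decade
remaining = needed_speedup / moore_factor              # ~300x from everything else

print(f"Moore's law alone: ~{moore_factor:.0f}x")
print(f"still needed from architectures, algorithms, detector: ~{remaining:.0f}x")
```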

