

1 Data Analysis in High Energy Physics, Weird or Wonderful?
Richard P. Mount, Director: Scientific Computing and Computing Services, Stanford Linear Accelerator Center
ADASS 2007, September 24, 2007

2 Overview
SLAC history and transition
Why experimental HEP is so weird
Modern experiment design
HEP data and data analysis
Surviving with painful amounts of data
Computing hardware for HEP
Challenges
Summary

3 History and Transition
1962: SLAC founded to construct and exploit a 2-mile linear accelerator
1968-1978: Nucleon structure (quarks) discovered
1973: SSRP(L) founded to exploit synchrotron radiation from SLAC’s SPEAR storage ring
1974: ψ discovered at SPEAR
1977: τ lepton discovered at SPEAR
2003: KIPAC (Kavli Institute for Particle Astrophysics and Cosmology) founded at SLAC/Stanford
2005: LCLS (Linac Coherent Light Source) high-energy X-ray laser construction approved
2008: Last year of BaBar data taking; last year of operation of SLAC accelerators for experimental HEP
2009: BES (Photon Science funding) takes ownership of SLAC accelerators; BES becomes the largest funding source for SLAC; LCLS operation begins
The future:
–Vigorous Photon Science program
–Vigorous Astrophysics and Cosmology program
–Potentially vigorous HEP program

4 Fundamental Physics at the Quantum Level
Uncovering truth requires measuring probabilities:
–Open production of new physics (normally a “needle in a haystack”): electron-positron collisions at M_Z (91.2 GeV)
–Small perturbations due to new physics (extreme precision): electron-positron collisions at low energy (<< M_Z)
Both approaches require a LOT of collisions (especially if they are proton-proton collisions)
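The arithmetic behind "a LOT of collisions" can be made concrete. A minimal sketch with hypothetical numbers (the cross-section and luminosity below are illustrative, not from the talk): the expected yield of a process is its cross-section times the integrated luminosity.

```python
def expected_signal_events(sigma_pb: float, lumi_fb_inv: float) -> float:
    """Expected event count N = sigma * L_int; 1 fb^-1 = 1000 pb^-1."""
    return sigma_pb * lumi_fb_inv * 1000.0

# Hypothetical rare process: 1 pb cross-section, 100 fb^-1 of recorded data.
n_signal = expected_signal_events(1.0, 100.0)
print(n_signal)  # 100000.0 expected signal events
```

With total collision cross-sections many orders of magnitude larger than a rare signal, those expected events are buried in an enormous number of recorded collisions, which is why the collider must deliver so many of them.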

5 Hydrogen Bubble Chamber Photograph, 1970
Low event rate, ~4π acceptance, physics selected by scanners
CERN Photo

6 Sam Ting’s Experiment at BNL, 1974
Discovery of the J/ψ, Nobel Prize 1976
High interaction rate: 10^12 protons/s
Extremely small acceptance
Physics selected by constructing a highly specific detector

7 L3 Experiment, CERN, 1989-1999
Huge 4π detector!

8 (image-only slide)

9 Modern Experiment Design
Discoveries most likely by observing high center-of-mass energy collisions:
–High energy means a vast range of (largely boring) final states, each with low probability
–High-energy production of new physics will manifest itself in many final states, each with excruciatingly low probability
Hence:
–Build high-luminosity colliders
–Measure the final states with huge 4π detectors
–Record and analyze painfully large data samples

10 LHC Experiments
1 Petabyte/s of analog information

11 Characteristics of HENP Experiments
Large, complex detectors:
–Large (approaching worldwide) collaborations: 500-2000 physicists
–Long (10-20 year) timescales
God plays dice:
–Simulation is essential
–Data volumes typically well above the pain threshold: 10,000 tapes or more

12 What does a painful amount of data look like?
(figure panels: early 1990s; 2000s)

13 Simulation
A series of non-invertible transformations:
–Fundamental physics → probability distributions of final-state particles
–Final-state particles → interactions/showers in the detector
–Charged particles → ionization in gas/crystals/silicon
–Ionization → charge in charge-measuring devices
–“Trigger” decisions made to very selectively read out the devices
–Complex and imperfect software reconstructs final-state particles from the read-out data
Simulation on a massive, often worldwide scale is essential to relate fundamental physics to what is observed.
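A toy version of this chain of non-invertible transformations (all distributions, smearing factors, and the trigger threshold below are invented for illustration, not from any real detector): generate final-state particles, smear them through a detector response, and let a trigger decide which events get read out.

```python
import random

def generate_event(rng):
    """Fundamental physics -> final-state particles (toy: a few track momenta in GeV)."""
    n = rng.randint(2, 10)
    return [rng.uniform(0.1, 50.0) for _ in range(n)]

def detector_response(tracks, rng):
    """Particles -> smeared measurements; the smearing cannot be inverted."""
    return [p * rng.gauss(1.0, 0.05) for p in tracks]

def trigger(hits, threshold=20.0):
    """Read out only events with at least one high-momentum candidate."""
    return max(hits) > threshold

rng = random.Random(42)
events = [detector_response(generate_event(rng), rng) for _ in range(1000)]
kept = [e for e in events if trigger(e)]
print(f"recorded {len(kept)} of {len(events)} simulated events")
```

Only by running many such simulated events through the same trigger and reconstruction as real data can one relate the recorded sample back to the underlying physics.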

14 HEP Data Models
HEP data models are complex!
–Typically hundreds of structure types (classes)
–Many relations between them
–Different access patterns
HEP moved to OO (C++) in the mid 1990s:
–OO applications deal with networks of objects
–Pointers (or references) are used to describe relations
(diagram: an Event linked to a TrackList and a HitList; Tracker and Calorimeter Tracks reference Hits. Dirk Düllmann/CERN)
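A toy rendering of such a networked object model (the class names echo the slide's diagram; this is not any experiment's actual schema): an event owns lists of tracks and hits, and tracks refer to the same hit objects rather than copying them.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Hit:
    detector: str    # e.g. "Tracker" or "Calorimeter"
    charge: float

@dataclass
class Track:
    momentum: float
    hits: List[Hit] = field(default_factory=list)   # references, not copies

@dataclass
class Event:
    event_id: int
    tracks: List[Track] = field(default_factory=list)
    hits: List[Hit] = field(default_factory=list)

# Build one event: the hits reconstructed into a track are the very same
# objects that sit in the event's hit list.
hits = [Hit("Tracker", 1.2), Hit("Tracker", 0.8), Hit("Calorimeter", 3.4)]
trk = Track(momentum=12.5, hits=hits[:2])
evt = Event(event_id=1, tracks=[trk], hits=hits)

print(evt.tracks[0].hits[0] is evt.hits[0])  # True: related by reference
```

Persisting such networks of inter-referencing objects, rather than flat rows, is what drove HEP toward object databases and object-aware I/O libraries.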

15 Characteristics of HENP Data Analysis
Data consists of billions of independent “events”
Events have an internal information content of hundreds or thousands of small objects
“Queries” are performed by writing code (every HEP physicist is a C++ programmer)
Queries typically need 1% of the events, and a few of the objects within the events
Each query may have a large overlap with an earlier query or may be largely orthogonal to earlier queries
It’s really nice when queries take minutes and not months
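A "query" in this sense is simply code looped over events. A sketch with made-up event fields and cuts (the field names, thresholds, and synthetic dataset below are all hypothetical): select a few percent of the events and pull out only the handful of quantities the analysis needs.

```python
def query(events, min_energy=95.0, min_tracks=4):
    """Loop over all events, keep the few that pass the cuts,
    and extract only a couple of values from each."""
    selected = []
    for evt in events:
        if evt["total_energy"] > min_energy and evt["n_tracks"] >= min_tracks:
            selected.append((evt["id"], evt["total_energy"]))
    return selected

# Synthetic dataset standing in for billions of real events.
events = [{"id": i, "total_energy": float(i % 100), "n_tracks": i % 8}
          for i in range(10000)]
passed = query(events)
print(f"{len(passed)} of {len(events)} events pass")
```

Successive analyses re-run loops like this with slightly different cuts, which is why overlap between queries (and the access pattern it implies) matters so much for storage design.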

16 Easing the pain – Tape-based analysis circa 1991
Data handling supporting physics analysis for the L3 HEP experiment
A few terabytes in total

17 And then:
Disks became cheap enough to cache the entire working set for analysis
And even though our data doubled every year or so…
Rising disk capacity kept up
So we all rejoiced at our liberation from serial-access tape
And we constructed magnificent object database systems capable of accessing any object on demand
But disk access rates stayed static
Accessing data on disk became cripplingly inefficient unless the access was serial
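The imbalance described here can be quantified with back-of-envelope numbers (the drive parameters below are assumed, roughly late-2000s figures, purely for illustration): capacity and streaming rate grew, but every random read still pays a mechanical seek.

```python
# Assumed, illustrative drive parameters.
capacity_bytes = 500e9        # 500 GB drive
streaming_mb_s = 60.0         # sequential transfer rate, MB/s
seek_ms = 8.0                 # average seek + rotational latency
object_kb = 4.0               # typical small-object read

def time_to_read_all_streaming():
    """Seconds to read the whole drive serially."""
    return capacity_bytes / (streaming_mb_s * 1e6)

def time_to_read_all_random():
    """Seconds to read the whole drive as independent small-object reads:
    each read pays the seek latency plus a tiny transfer."""
    n_objects = capacity_bytes / (object_kb * 1e3)
    per_read = seek_ms / 1e3 + (object_kb * 1e3) / (streaming_mb_s * 1e6)
    return n_objects * per_read

streaming_h = time_to_read_all_streaming() / 3600
random_days = time_to_read_all_random() / 86400
print(f"streaming: {streaming_h:.1f} h, random 4 kB reads: {random_days:.0f} days")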

18 So:
We dusted off the tape-era streaming/filtering ideas
And called them new names like “skimming”
Allowing, for example, BaBar at SLAC to survive – even prosper – with a painfully large multi-petabyte dataset subject to intense analysis by hundreds of physicists and thousands of CPU cores
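A skim in this spirit (the one-JSON-object-per-line event format and the selection below are a toy stand-in, not BaBar's actual machinery): stream once over the full dataset and write the events a given analysis cares about to a much smaller file that later passes can read serially.

```python
import json, os, tempfile

def skim(input_path, output_path, predicate):
    """One serial pass over the full dataset; copy matching events out."""
    kept = 0
    with open(input_path) as src, open(output_path, "w") as dst:
        for line in src:                 # one JSON event per line
            evt = json.loads(line)
            if predicate(evt):
                dst.write(line)
                kept += 1
    return kept

# Build a toy dataset and skim it for "two-track" events.
workdir = tempfile.mkdtemp()
full = os.path.join(workdir, "full.jsonl")
mini = os.path.join(workdir, "skim.jsonl")
with open(full, "w") as f:
    for i in range(1000):
        f.write(json.dumps({"id": i, "n_tracks": i % 10}) + "\n")

n = skim(full, mini, lambda e: e["n_tracks"] == 2)
print(f"skim kept {n} of 1000 events")
```

The payoff is that hundreds of subsequent analysis jobs iterate over the small skim serially, instead of hammering the full dataset with random reads.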

19 The BaBar Detector

20 SLAC-BaBar Computing Fabric
Client: >4000 cores, dual/quad-CPU Linux; HEP-specific ROOT software (Xrootd) + Objectivity/DB object database, some NFS
IP Network: Cisco
Disk Server: 120 dual/quad-CPU Sun/Solaris; ~400 TB FibreChannel RAID arrays + ~400 TB SATA; SLAC enhancements to ROOT and Objectivity server code
Tape Server: 25 dual-CPU Sun/Solaris; 40 STK 9940B + 6 STK 9840A drives; 6 STK Powderhorn silos; 3 PB of data; HPSS

21 Price/Performance Evolution: My Experience
(chart series: CPU; Disk Capacity; WAN; Disk Random Access; Disk Streaming Access)

22 Distributed Data Analysis and the Grid: CERN/LHC High Energy Physics Data, 2008 Onwards
(diagram: the CMS detector at the CERN LHC, 12,500 tons, $700M, feeds the online system and Tier 0+1 event reconstruction at CERN; data flows at ~100 MBps and 2.5-40 Gbps to Tier 1 centers such as FermiLab USA, France, Germany, and Italy, then over ~0.6-2.5 Gbps and 100-1000 Mbps links to Tier 2/3/4 institutes (~0.25 TIPS) for analysis and event simulation; physics data cache ~PBps)
2000 physicists in 31 countries are involved in this 20-year experiment, in which DOE is a major player
Grid infrastructure spread over the US and Europe coordinates the data analysis

23 A New Dream of Random Access

24 History and Transition
1962: SLAC founded to construct and exploit a 2-mile linear accelerator
1968-1978: Nucleon structure (quarks) discovered
1973: SSRP(L) founded to exploit synchrotron radiation from SLAC’s SPEAR storage ring
1974: ψ discovered at SPEAR
1977: τ lepton discovered at SPEAR
2003: KIPAC (Kavli Institute for Particle Astrophysics and Cosmology) founded at SLAC/Stanford
2005: LCLS (Linac Coherent Light Source) high-energy X-ray laser construction approved
2008: Last year of BaBar data taking; last year of operation of SLAC accelerators for experimental HEP
2009: BES (Photon Science funding) takes ownership of SLAC accelerators; BES becomes the largest funding source for SLAC; LCLS operation begins
The future:
–Vigorous Photon Science program
–Vigorous Astrophysics and Cosmology program
–Potentially vigorous HEP program

25 Cultural Evolution in SLAC Computing
Experimental HEP:
–Large, organized collaborations
–Computing recognized as vital
–Detailed planning for computing
–Expectation of highly professional computing
Astrophysics and Cosmology (Theory):
–Individual PIs or small collaborations
–Computing recognized as vital
–Desire for agility
–“Professional” approach viewed as ponderous and costly
Astronomy:
–Increasingly large collaborations and HEP-like timescales
–Computing recognized as vital
–Detailed (8 years in advance of data) planning for computing
–Expectation of highly professional computing
Photon Science:
–Light sources are costly facilities
–Photon science has been “small science” up to now (“turn up for 3 days and take away a DVD”)
–No large, organized collaborations
–Computing not considered a problem by most scientists
–Detailed planning would be a waste of time
–“Wait for the crisis and then we will get the resources to fix the problems”

26 Technical Evolution in SLAC Computing
Experimental HEP:
–Challenging data-management needs (predominantly object-oriented?)
–Throughput-oriented data processing, trivial parallelism
–Commodity CPU boxes
–Ethernet switches ideal
–Experiment-managed bulk disk space
Astrophysics and Cosmology (Theory):
–Visualization
–Parallel computing: shared-memory SMP
–Parallel computing: InfiniBand MPI clusters
–Lustre and other “fancy” file systems
–Macs and Linux boxes
Astronomy:
–Visualization
–Challenging data-management needs (predominantly relational?)
–Throughput-oriented data processing with real-time requirements
–Lustre and other “fancy” file systems
–Macs and Linux boxes
Photon Science:
–Computer-science challenges (extract information from a million noisy images)
–MPI clusters needed now
–Significant needs for bandwidth to storage
–Other needs unclear

27 In Conclusion
Computing in HEP and Astronomy
–Strong similarities, e.g.:
high volumes of granular, information-rich data
serious approach to planning computing facilities and analysis software
–Intriguing differences, e.g.:
visualization
relational versus object data model?

