Data Analysis in High Energy Physics, Weird or Wonderful? Richard P. Mount, Director: Scientific Computing and Computing Services, Stanford Linear Accelerator Center


Data Analysis in High Energy Physics, Weird or Wonderful? Richard P. Mount, Director: Scientific Computing and Computing Services, Stanford Linear Accelerator Center. ADASS 2007, September 24, 2007

Overview
SLAC history and transition
Why experimental HEP is so weird
Modern experiment design
HEP data and data analysis
Surviving with painful amounts of data
Computing hardware for HEP
Challenges
Summary

History and Transition
1962: SLAC founded to construct and exploit a 2-mile linear accelerator
1968–1978: Nucleon structure (quarks) discovered
1973: SSRP(L) founded to exploit synchrotron radiation from SLAC's SPEAR storage ring
1974: ψ discovered at SPEAR
1977: τ lepton discovered at SPEAR
2003: KIPAC (Kavli Institute for Particle Astrophysics and Cosmology) founded at SLAC/Stanford
2005: LCLS (Linac Coherent Light Source) high-energy X-ray laser construction approved
2008: Last year of BaBar data taking; last year of operation of SLAC accelerators for experimental HEP
2009: BES (Photon Science funding) takes ownership of SLAC accelerators; BES becomes the largest funding source for SLAC; LCLS operation begins
The future:
–Vigorous Photon Science program
–Vigorous Astrophysics and Cosmology program
–Potentially vigorous HEP program

Fundamental Physics at the Quantum Level
Uncovering truth requires measuring probabilities:
–Open production of new physics (normally "needle in haystack"): electron-positron collisions at M_Z (91.2 GeV)
–Small perturbations due to new physics (extreme precision): electron-positron collisions at low energy (<< M_Z)
Both approaches require a LOT of collisions (especially if they are proton-proton collisions)

Hydrogen Bubble Chamber Photograph, 1970
Low event rate, ~4π acceptance, physics selected by scanners (CERN photo)

Sam Ting's Experiment at BNL, 1974
Discovery of the J/ψ, Nobel Prize 1976
Very high interaction rate of incident protons per second; extremely small acceptance
Physics selected by constructing a highly specific detector

L3 Experiment, CERN: Huge 4π Detector!


Modern Experiment Design
Discoveries most likely by observing high center-of-mass energy collisions:
–High energy means a vast range of (largely boring) final states, each with low probability
–High-energy production of new physics will manifest itself in many final states, each with excruciatingly low probability
Hence:
–Build high-luminosity colliders
–Measure the final states with huge 4π detectors
–Record and analyze painfully large data samples

LHC Experiments
1 Petabyte/s of analog information

Characteristics of HENP Experiments
Large, complex detectors:
–Large (approaching worldwide) collaborations: 500–2000 physicists
–Long (10–20 year) timescales
God plays dice:
–Simulation is essential
–Data volumes typically well above the pain threshold: 10,000 tapes or more

What does a painful amount of data look like?
(Early 1990s vs. 2000s)

Simulation
A series of non-invertible transformations:
–Fundamental physics → probability distributions of final-state particles
–Final-state particles → interactions/showers in the detector
–Charged particles → ionization in gas/crystals/silicon
–Ionization → charge in charge-measuring devices
–"Trigger" decisions made to very selectively read out the devices
–Complex and imperfect software reconstructs final-state particles from the read-out data
Simulation on a massive, often worldwide scale is essential to relate fundamental physics to what is observed.
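To make the chain concrete, here is a toy C++ sketch of these successive, non-invertible transformations; every type, function, and number in it is an illustrative assumption rather than any real simulation package's API, and the final reconstruction step (the imperfect software inverse) is only indicated in a comment.

```cpp
#include <vector>

struct Particle   { double px, py, pz, e; };      // final-state particle
struct EnergyDep  { double x, y, z, dE; };        // ionization in gas/crystal/silicon
struct RawChannel { int id; double adcCounts; };  // charge seen by a readout device

// Physics model -> probability distributions -> sampled final-state particles
std::vector<Particle> generateFinalState() {
    return { {1.0, 0.0, 2.0, 5.0} };              // toy stand-in for an event generator
}

// Final-state particles -> interactions/showers -> ionization in the detector
std::vector<EnergyDep> propagate(const std::vector<Particle>& particles) {
    std::vector<EnergyDep> deps;
    for (const auto& p : particles) deps.push_back({0.0, 0.0, 0.0, 0.1 * p.e});
    return deps;
}

// Ionization -> charge in charge-measuring devices
std::vector<RawChannel> digitize(const std::vector<EnergyDep>& deps) {
    std::vector<RawChannel> raw;
    for (const auto& d : deps) raw.push_back({0, 1000.0 * d.dE});
    return raw;
}

// "Trigger": decide very selectively whether to read the devices out at all
bool trigger(const std::vector<RawChannel>& raw) {
    return !raw.empty() && raw.front().adcCounts > 50.0;
}
// Reconstruction (the complex, imperfect software inverse) would follow
// only for events that pass the trigger.
```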

HEP Data Models
HEP data models are complex:
–Typically hundreds of structure types (classes)
–Many relations between them
–Different access patterns
HEP moved to OO (C++) in the mid 1990s:
–OO applications deal with networks of objects
–Pointers (or references) are used to describe relations
(Diagram: Event → TrackList (Tracker, Calorimeter) → Track → HitList → Hit. Dirk Düllmann/CERN)
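A minimal C++ sketch of the kind of object network shown in the diagram follows; the class names echo the slide (Event, Track, Hit), while the members and the use of plain pointers for the relations are illustrative assumptions, not any experiment's actual event model.

```cpp
#include <vector>

struct Hit {                       // a single measurement in the tracker or calorimeter
    float x, y, z;                 // position
    float charge;                  // deposited charge
};

struct Track {                     // a reconstructed charged-particle trajectory
    std::vector<const Hit*> hits;  // relation to hits expressed as pointers/references
    float momentum;
};

struct Event {                     // the unit of data: one beam crossing / collision
    std::vector<Hit>   hits;       // owned storage (the "HitList")
    std::vector<Track> tracks;     // the "TrackList": each Track points into hits
    // A real experiment has hundreds of such classes, with many cross-relations
    // and very different access patterns for different analyses.
};
```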

Characteristics of HENP Data Analysis
Data consists of billions of independent "events"
Events have an internal information content of hundreds or thousands of small objects
"Queries" are performed by writing code (every HEP physicist is a C++ programmer)
Queries typically need 1% of the events, and a few of the objects within the events
Each query may have a large overlap with an earlier query or may be largely orthogonal to earlier queries
It's really nice when queries take minutes and not months.
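As a sketch of what such a code-as-query looks like, consider the following toy C++ event loop; the Event layout, the selection cuts, and the roughly 1% acceptance are illustrative assumptions, not a real experiment's framework.

```cpp
#include <cstdio>
#include <vector>

struct Particle { double px, py, pz, e; };
struct Event    { std::vector<Particle> particles; /* ...hundreds of other objects... */ };

// A typical physics selection keeps of order 1% of the events.
bool passesSelection(const Event& evt) {
    return evt.particles.size() >= 2 && evt.particles[0].e > 10.0;
}

void runQuery(const std::vector<Event>& events) {
    std::size_t selected = 0;
    for (const Event& evt : events) {          // loop over billions of independent events
        if (!passesSelection(evt)) continue;   // most events are rejected outright
        ++selected;                            // ...then read a few objects, fill histograms
    }
    std::printf("selected %zu of %zu events\n", selected, events.size());
}
```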

Easing the Pain: Tape-Based Analysis circa 1991
(Diagram: data handling supporting physics analysis for the L3 HEP experiment; a few terabytes in total)

And then:
Disks became cheap enough to cache the entire working set for analysis
And even though our data doubled every year or so, rising disk capacity kept up
So we all rejoiced at our liberation from serial-access tape
And we constructed magnificent object database systems capable of accessing any object on demand
But disk access rates stayed static
Accessing data on disk became cripplingly inefficient unless the access was serial (the rough arithmetic is sketched below)
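A back-of-the-envelope calculation illustrates why random access hurts; the seek time, object size, and streaming rate below are assumed round numbers for disks of that era, not measured values.

```cpp
#include <cstdio>

int main() {
    const double seekSeconds = 0.010;   // assumed average seek + rotation per small object
    const double objectBytes = 10e3;    // assumed ~10 kB object read per seek
    const double streamMBps  = 60.0;    // assumed sustained streaming rate of one disk

    const double transferSeconds = objectBytes / (streamMBps * 1e6);
    const double randomMBps      = (objectBytes / 1e6) / (seekSeconds + transferSeconds);

    // With these numbers, random access delivers roughly 1 MB/s versus ~60 MB/s streaming:
    // the disk spends almost all its time seeking rather than transferring data.
    std::printf("random: %.2f MB/s   streaming: %.0f MB/s\n", randomMBps, streamMBps);
    return 0;
}
```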

So:
We dusted off the tape-era streaming/filtering ideas
And called them new names like "skimming"
Allowing, for example, BaBar at SLAC to survive, even prosper, with a painfully large multi-petabyte dataset subject to intense analysis by hundreds of physicists and thousands of CPU cores.
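A minimal sketch of the skimming idea, assuming a toy fixed-size event record and illustrative file names: stream once, sequentially, through the full dataset and write the small selected subset to its own file, so that later analysis passes read only the skim.

```cpp
#include <fstream>

struct Event { double mass; int nTracks; };   // toy fixed-size event record

// Analysis-specific selection that defines the skim (illustrative cut values).
bool belongsInSkim(const Event& evt) {
    return evt.nTracks >= 2 && evt.mass > 5.0;
}

void skim(const char* inputPath, const char* skimPath) {
    std::ifstream in(inputPath, std::ios::binary);
    std::ofstream out(skimPath, std::ios::binary);
    Event evt;
    // One sequential pass over the whole input: friendly to tape and to disk streaming.
    while (in.read(reinterpret_cast<char*>(&evt), sizeof(evt))) {
        if (belongsInSkim(evt))
            out.write(reinterpret_cast<const char*>(&evt), sizeof(evt));
    }
}
```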

The BaBar Detector

SLAC-BaBar Computing Fabric
Client tier: >4000 cores, dual/quad-CPU Linux boxes
Disk server tier: 120 dual/quad-CPU Sun/Solaris servers, ~400 TB FibreChannel RAID arrays + ~400 TB SATA
Tape server tier: 25 dual-CPU Sun/Solaris servers, 40 STK 9940B and 6 STK 9840A drives, 6 STK Powderhorn silos, 3 PB of data
Interconnect: IP network (Cisco)
Software: HEP-specific ROOT software (Xrootd) + Objectivity/DB object database, some NFS; HPSS; SLAC enhancements to ROOT and Objectivity server code

Price/Performance Evolution: My Experience
(Chart comparing trends in CPU, disk capacity, WAN, disk random access, and disk streaming access)

Distributed Data Analysis and the Grid: CERN / LHC High Energy Physics Data, 2008 onwards
(Diagram: the CMS detector at the CERN LHC, 12,500 tons, $700M, feeds the online system and event reconstruction at Tier 0+1 at CERN; event simulation and analysis are distributed over Tier 1 centers such as FermiLab (USA), France, Germany and Italy, and on down to Tier 3/4 institutes with their physics data caches, over links ranging from hundreds of Mbps to multi-Gbps)
Physicists in 31 countries are involved in this 20-year experiment, in which DOE is a major player. Grid infrastructure spread over the US and Europe coordinates the data analysis.

A New Dream of Random Access

History and Transition
1962: SLAC founded to construct and exploit a 2-mile linear accelerator
1968–1978: Nucleon structure (quarks) discovered
1973: SSRP(L) founded to exploit synchrotron radiation from SLAC's SPEAR storage ring
1974: ψ discovered at SPEAR
1977: τ lepton discovered at SPEAR
2003: KIPAC (Kavli Institute for Particle Astrophysics and Cosmology) founded at SLAC/Stanford
2005: LCLS (Linac Coherent Light Source) high-energy X-ray laser construction approved
2008: Last year of BaBar data taking; last year of operation of SLAC accelerators for experimental HEP
2009: BES (Photon Science funding) takes ownership of SLAC accelerators; BES becomes the largest funding source for SLAC; LCLS operation begins
The future:
–Vigorous Photon Science program
–Vigorous Astrophysics and Cosmology program
–Potentially vigorous HEP program

Cultural Evolution in SLAC Computing
Experimental HEP:
–Large, organized collaborations
–Computing recognized as vital
–Detailed planning for computing
–Expectation of highly professional computing
Astrophysics and Cosmology (Theory):
–Individual PIs or small collaborations
–Computing recognized as vital
–Desire for agility
–"Professional" approach viewed as ponderous and costly
Astronomy:
–Increasingly large collaborations and HEP-like timescales
–Computing recognized as vital
–Detailed (8 years in advance of data) planning for computing
–Expectation of highly professional computing
Photon Science:
–Light sources are costly facilities
–Photon science has been "small science" up to now ("turn up for 3 days and take away a DVD")
–No large, organized collaborations
–Computing not considered a problem by most scientists
–Detailed planning would be a waste of time
–"Wait for the crisis and then we will get the resources to fix the problems"

Technical Evolution in SLAC Computing
Experimental HEP:
–Challenging data-management needs (predominantly object-oriented?)
–Throughput-oriented data processing, trivial parallelism
–Commodity CPU boxes
–Ethernet switches ideal
–Experiment-managed bulk disk space
Astrophysics and Cosmology (Theory):
–Visualization
–Parallel computing: shared-memory SMP
–Parallel computing: InfiniBand MPI clusters
–Lustre and other "fancy" file systems
–Macs and Linux boxes
Astronomy:
–Visualization
–Challenging data-management needs (predominantly relational?)
–Throughput-oriented data processing with real-time requirements
–Lustre and other "fancy" file systems
–Macs and Linux boxes
Photon Science:
–Computer science challenges (extract information from a million noisy images)
–MPI clusters needed now
–Significant needs for bandwidth to storage
–Other needs unclear

In Conclusion
Computing in HEP and Astronomy:
–Strong similarities, e.g. high volumes of granular, information-rich data, and a serious approach to planning computing facilities and analysis software
–Intriguing differences, e.g. visualization, and relational versus object data model?