Scientific Data Management: An Incomplete Experimental HENP Perspective
D. Olson, LBNL
SDM-ISIC Meeting, 26 March 2002

Presentation transcript:

Slide 1: Scientific Data Management: An Incomplete Experimental HENP Perspective
D. Olson, LBNL
SDM-ISIC Meeting, Gatlinburg, 26 March 2002

Slide 2: Particle Physics Data Grid
- PIs: Mount, Livny, Newman
- Coordinators: Pordes, Olson

Slide 3: Contents
- Quick overview of HENP data
  - Generic data flow
  - Sizes, timescales
  - Average physicist view
- What's hard
  - Making technology work in production
  - A clear view for the average physicist
  - Analysis of large datasets
  - Other things as well
- Today, many issues are wrapped in hopes for the "Data Grid"

Slide 4: Experimental HENP event data
- Basic character of the data is the "event"
  - May be few particles

Slide 5: BaBar event (event display figure)

Slide 6: Experimental HENP event data
- Basic character of the data is the "event"
  - May be few particles
  - May be MANY particles

Slide 7: STAR event, Au + Au (event display figure)

Slide 8: Experimental HENP event data
- Basic character of the data is the "event"
  - May be few tracks
  - May be MANY tracks
- Detector characteristics, beam types, and triggers affect the type of events recorded
- Physics analysis is a statistical analysis of many (1000's, M's, B's, T's) independent events (see the event-loop sketch below)
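To make the "statistical analysis of independent events" concrete, here is a minimal, self-contained sketch (not from the slides): a physics analysis is essentially an event loop that applies selection cuts and fills histograms, and because events are independent, partial histograms from different files or nodes simply add. The "events" below are synthetic stand-ins, and the cut values are hypothetical.

```python
# Minimal sketch of an event-loop analysis over independent events.
import numpy as np

rng = np.random.default_rng(0)

def read_events(n):
    """Stand-in for reading n reconstructed events from a DST/microDST file."""
    # each event: a reconstructed mass and a track multiplicity
    return [{"mass": rng.normal(3.1, 0.1), "ntracks": rng.poisson(20)}
            for _ in range(n)]

def analyze(events, edges):
    """Apply a selection cut and fill a histogram; returns bin counts."""
    masses = [e["mass"] for e in events if e["ntracks"] > 10]   # a cut
    counts, _ = np.histogram(masses, bins=edges)
    return counts

edges = np.linspace(2.5, 3.7, 61)
# two independent "files"; their histograms merge by simple addition,
# which is why the analysis is embarrassingly parallel
total = analyze(read_events(50_000), edges) + analyze(read_events(50_000), edges)
print("events passing cuts:", total.sum())
```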

Slide 9: Generic data flow in HENP (data-flow diagram)
- "Skims", "microDST production", ...
- Filtering chosen to make this a convenient size (a filtering sketch follows below)
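A minimal sketch of what a "skim" or microDST production step does, under assumed file names and a toy newline-delimited JSON format (not from the talk): one pass over a large sample keeps only events matching a filter, so the output is small enough for repeated interactive analysis.

```python
# Toy skim: copy only events passing a filter into a smaller output file.
import json

def skim(in_path, out_path, keep):
    """Copy events from in_path to out_path if keep(event) is true."""
    n_in = n_out = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            event = json.loads(line)
            n_in += 1
            if keep(event):
                dst.write(line)
                n_out += 1
    print(f"kept {n_out}/{n_in} events ({n_out / max(n_in, 1):.1%})")

# hypothetical usage: keep high-multiplicity (e.g. "central") events
# skim("dst_run1234.json", "microdst_central.json",
#      keep=lambda e: e.get("ntracks", 0) > 200)
```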

Slide 10: A collaboration of people
- $100M, 10 yr, 100 people
- Free?, 10 yr, 20 people
- Free?, 1 yr, 10 people, 5x/yr
- Free?, 1 mo, 1 person, 50x/yr
- ("Typical" example today; LHC is larger)

Slide 11: Example: CMS Tiers

Slide 12: List of major accelerator-based HENP experiments

  Experiment     Location      # physicists   Time scale
  BaBar          SLAC
  STAR           BNL / RHIC
  PHENIX         BNL / RHIC
  Jlab/CLAS      JLAB
  CDF            FNAL
  D0             FNAL
  ATLAS          CERN
  CMS            CERN
  ALICE          CERN
  Jlab Hall D    JLAB

Slide 13: Size / frequency of basic activities

  Item                    Size (TB) / Frequency (/yr)
                          Typical today                  LHC era (>5 yr)
  Raw data                100 TB / yr                    1,000 TB / yr
  Event reconstruction    3 / yr                         2 / yr
  DST data                1 > DST/raw >                  > DST/raw > 0.02
  microDST production     0.1 > microDST/DST > .001      ?
  Physics analysis        * #physicists / year           ?
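As a back-of-the-envelope illustration of the reduction chain implied by the table, the sketch below uses the "typical today" raw volume together with hypothetical reduction ratios chosen inside the quoted ranges (the ratios are assumptions, not numbers from the talk).

```python
# Back-of-the-envelope reduction chain: raw -> DST -> microDST.
raw_tb_per_year = 100          # "typical today" raw data volume from the table
dst_over_raw = 0.5             # assumed, inside 1 > DST/raw
microdst_over_dst = 0.01       # assumed, inside 0.1 > microDST/DST > 0.001

dst_tb = raw_tb_per_year * dst_over_raw
microdst_tb = dst_tb * microdst_over_dst
print(f"DST: {dst_tb:.0f} TB/yr, microDST: {microdst_tb:.1f} TB/yr")
# With these assumed ratios, the microDST (~0.5 TB/yr) is the piece that fits
# on disk and gets re-read many times per year by individual physicists.
```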

Slide 14: Average physicist view
- Mythology, culture, and terminology vary a lot from one experiment to another.
- BaBar
  - Object view of the primary event store (Objectivity)
  - Event collection objects give the primary access points to data
    - An event collection has a list of references to all event components of interest
    - With 100,000 collections, how to organize them?
  - Ntuples & PAW for the final data format and analysis tool
- STAR (first-year data, getting started)
  - A "production, trigger" is all reconstructed events for a trigger type with a certain version of the code, e.g. (P00hg, central)
  - The access point is a list of directory paths below which all data are stored on disk (see the sketch below)
  - WZ will be setting up STACS
  - ROOT for the data format and analysis tool
- ...
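A minimal sketch of the STAR-style "list of directory paths" access point: given the directories for a production tag and trigger class, simply enumerate the files stored below them. The path layout and file names are hypothetical, for illustration only.

```python
# Enumerate every data file below a set of base directory paths.
from pathlib import Path

def dataset_files(base_dirs, suffix=".root"):
    """Return all data files found (recursively) under the listed paths."""
    files = []
    for base in base_dirs:
        files.extend(sorted(Path(base).rglob(f"*{suffix}")))
    return files

# hypothetical access point for production P00hg, trigger "central"
# paths = dataset_files(["/star/data/P00hg/central"])
# print(len(paths), "files in the dataset")
```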

Slide 15: What's hard I, living with technology
- Typical computer center today
  - A couple of STK Powderhorn tape silos, HPSS or a home-grown MSS
  - 1000 Linux processors
  - Assortment of 100/1000 Mbps networking
  - 50 TB of disk (1000 spindles)
  - Network s/w for I/O (NFS, Objy AMS, RFIO, ...)
  - AFS for distributed collaboration
- Can make large RAID filesystems w/ network access
  - Faults can affect many nodes
    - stale NFS file handles
    - AFS faults affect nodes across the country, at work
  - Large RAID is $$$
- Desire to reduce the effect of faults
  - Fewer faults
  - More tolerance
- ...

Slide 16: What's hard II, a clear view for the average physicist
- What's going on in this box?

Slide 17: What's hard II, a clear view for the average physicist
- What data is available?
  - "data" means
    - A list of files? (like STAR)
    - A collection object w/ pointers to all events? (like BaBar)
  - "available" means
    - On disk? Where? Exists?
    - Does it really have the filters and calibrations I need?
    - Is it the "official" version of the data?
- ... (a minimal metadata-lookup sketch follows below)
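A toy sketch of the kind of metadata lookup that answers "what data is available?" (this is not any experiment's actual catalog; the dataset names, locations, and fields are hypothetical): for each dataset, record where it lives, which calibration pass it carries, and whether it is the blessed version, then query on those attributes.

```python
# Toy dataset catalog and availability query.
from dataclasses import dataclass

@dataclass
class DatasetRecord:
    name: str
    location: str        # e.g. "disk:/star/data/...", "tape:HPSS"
    calibration: str     # calibration pass used in reconstruction
    official: bool       # is this the blessed version?

catalog = [
    DatasetRecord("P00hg_central_microDST", "disk:/star/data/P00hg/central",
                  "calib_v3", True),
    DatasetRecord("P00hg_central_microDST_test", "tape:HPSS",
                  "calib_v2", False),
]

def find(catalog, name_contains, on_disk_only=True, official_only=True):
    """Answer 'available' as: on disk, and the official version."""
    return [d for d in catalog
            if name_contains in d.name
            and (not on_disk_only or d.location.startswith("disk:"))
            and (not official_only or d.official)]

print(find(catalog, "central"))   # -> only the official, disk-resident dataset
```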

Slide 18: What's hard III, analysis of large datasets
- The dataset does not fit on disk, or requires parallel processing, or is a large enough operation that the chance of a fault is high

Slide 19: What's hard III, analysis of large datasets
- The dataset does not fit on disk
  - Needs access s/w coupled with the processing (SAM, STACS)
  - Does performance meet demand? (a staging/processing coordination sketch follows below)
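Here is a toy sketch of the idea behind coupling data access with processing, which is what systems like SAM and STACS provide in production (this is a generic producer/consumer illustration, not either system's actual interface): files are staged from the mass store in the background while the analysis consumes whichever file is ready next, so CPUs do not sit idle waiting for tape.

```python
# Toy coordination of file staging (producer) with processing (consumer).
import queue, threading, time

def stager(to_stage, ready):
    """Pretend to copy each file from tape to disk, then hand it over."""
    for name in to_stage:
        time.sleep(0.1)               # stand-in for a tape mount + copy
        ready.put(name)
    ready.put(None)                   # signal: nothing more to stage

def process(ready):
    """Consume files in whatever order they become available on disk."""
    while (name := ready.get()) is not None:
        print("analyzing", name)      # stand-in for the event loop

files = [f"raw_{i:04d}.dat" for i in range(5)]
ready = queue.Queue(maxsize=2)        # bounded "disk cache" between the two
threading.Thread(target=stager, args=(files, ready), daemon=True).start()
process(ready)
```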

Slide 20: SAM (Sequential data Access via Meta-data)

Slide 21: STACS

Slide 22: What's hard III, analysis of large datasets
- The dataset does not fit on disk
  - Needs access s/w coupled with the processing (SAM, STACS)
  - Does performance meet demand?
- Needs parallel processing (not very hard)
  - Cannot do the analysis on a private/personal machine
  - Schedule access to shared resources (CPU and disk)
- The operation for a single analysis is large enough that faults occur
  - Need exception handling
  - Need workflow management to complete failed tasks or, at least, accurately report status (see the retry sketch below)
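A minimal sketch of the workflow behaviour the slide asks for (this is an assumed illustration, not any experiment's workflow system): run many per-file analysis tasks, retry the ones that fail, and report an accurate final status instead of silently losing jobs.

```python
# Run per-file tasks with retries and an honest completion report.
import random

def run_task(name):
    """Stand-in for one per-file analysis job; fails ~20% of the time."""
    if random.random() < 0.2:
        raise RuntimeError(f"{name}: stale NFS handle")
    return f"{name}: ok"

def run_with_retries(tasks, max_attempts=3):
    done, failed = [], []
    for name in tasks:
        for attempt in range(1, max_attempts + 1):
            try:
                done.append(run_task(name))
                break
            except RuntimeError as err:
                if attempt == max_attempts:    # exhausted: record, don't hide
                    failed.append((name, str(err)))
    print(f"completed {len(done)}/{len(tasks)}; failed: {failed or 'none'}")
    return done, failed

run_with_retries([f"file_{i:03d}" for i in range(20)])
```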

Slide 23: Example shared-nothing cluster

Slide 24: PPDG (Particle Physics Data Grid)

Slide 25: Summary
- Faulty technology sets the boundary conditions
  - Fault tolerance will expand the boundaries of capabilities
- Data management is coupled with processing
  - Visualization (access w/o processing) is minor in HENP
  - Need access to data when & where it is needed for processing
- Working on the data grid as the context for data management
- PPDG has the SDM ISIC as one of its technology base projects