Slide 1
Distributed Computing at the Tevatron: D0 Computing and Event Model
Michael Diesburg, Fermilab, for the D0 Collaboration
HCP 2005, Les Diablerets, Switzerland

Slide 2: Outline
– Computing Infrastructure: a little background on the infrastructure (it's old news…)
– Data Model and Formats: the evolution of the data model and formats (maybe some valuable hints as to where you could go wrong…)
– Large Scale Global Processing: real-world experience with large-scale remote production (What are the hard parts? What's it really going to cost you?)

Slide 3: Computing Infrastructure
Hardware configuration is the canonical setup for a large experiment:
– Central analysis farms of dual-processor Linux nodes
– Linux disk cache servers (4 TB each)
– High-speed tape storage (STK 9940, 9940B; ADIC LTO, LTO2)
– Large central event/configuration/calibration database (Oracle, Sun)
– Linux Red Hat desktops
– Central production farms of dual-processor Linux nodes (separate from the analysis farms, but identical)
– Gb Ethernet interconnections on the central systems, 100 Mb to the desktop
– Similar, smaller-scale installations at collaborating institutions worldwide
Current capacities:
– ~1 PB of data on tape, 1.4G raw events
– ~40 TB/week tape I/O
– ~120 TB disk cache
– ~1500 GHz of production farm CPU
– ~1500 GHz of analysis farm CPU
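To put those capacity figures in perspective, here is a back-of-envelope calculation of the sustained rates they imply. This is an editorial illustration in Python, not material from the talk, and it assumes decimal TB/PB units and round numbers.

```python
# Rough averages implied by the capacity list above (decimal units assumed).

TB = 1e12
PB = 1e15
WEEK_SECONDS = 7 * 24 * 3600

tape_io_per_week = 40 * TB
avg_tape_rate = tape_io_per_week / WEEK_SECONDS               # sustained average
print(f"average tape I/O: {avg_tape_rate / 1e6:.0f} MB/s")    # ~66 MB/s

data_on_tape = 1 * PB
raw_events = 1.4e9
print(f"bytes on tape per raw event: {data_on_tape / raw_events / 1e6:.2f} MB")  # ~0.71 MB
# Note: the tape store holds raw data plus derived formats, so this is an
# average over everything archived, not the raw event size itself.
```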

Slide 4: Computing Infrastructure
Two key pieces of software infrastructure: Enstore and SAM.
Enstore is a storage management system based on software originally developed at DESY:
– Provides all access to tape-based storage
– No direct user interaction with Enstore
SAM (Sequential Access via Metadata) is the glue tying everything together:
– Provides transparent global access to data
– Disk caching and file replication
– Local and wide-area data transport
– Comprehensive metadata catalog for both real and MC data
– Provides dataset definition capabilities and processing histories
– User interfaces via command line, web, and Python API
– Supports operation under the PBS, Condor, LSF, and FBS batch systems (others via a standard batch adapter)
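To make SAM's dataset-definition and bookkeeping role concrete, here is a small runnable toy in Python. It is purely illustrative: the class, field, and method names are invented for this sketch and are not the real SAM command-line or Python API.

```python
# A toy stand-in for the kind of bookkeeping SAM provides (not the real SAM API):
# a metadata catalog where a "dataset" is a named query over file metadata, and
# file delivery is recorded so processing histories can be reconstructed later.

class ToyCatalog:
    def __init__(self):
        self.files = []        # each entry: {"name", "tier", "trigger", "run"}
        self.datasets = {}     # dataset name -> predicate over a file's metadata
        self.history = []      # (dataset, file) pairs actually delivered

    def declare_file(self, name, tier, trigger, run):
        self.files.append({"name": name, "tier": tier, "trigger": trigger, "run": run})

    def define_dataset(self, name, predicate):
        # A dataset is a reusable metadata query, not a frozen file list.
        self.datasets[name] = predicate

    def deliver(self, dataset):
        # Yield matching files and record the delivery as processing history.
        for f in self.files:
            if self.datasets[dataset](f):
                self.history.append((dataset, f["name"]))
                yield f["name"]

catalog = ToyCatalog()
catalog.declare_file("run180001_001", tier="TMB", trigger="MUON_2M", run=180001)
catalog.declare_file("run170500_004", tier="DST", trigger="JET_HT", run=170500)

catalog.define_dataset(
    "muon_tmb_recent",
    lambda f: f["tier"] == "TMB" and f["trigger"].startswith("MUON") and f["run"] >= 180000,
)

for filename in catalog.deliver("muon_tmb_recent"):
    print("processing", filename)    # user analysis would read the file here
```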

Slide 5: Data Model (Historical Perspective)
As initially envisioned, the normal processing chain would be:
– Raw data processed by the reconstruction program to produce three tiers (sketched in code after this slide):
  STA = raw data + all reconstructed info (too big; for a subset of the data only)
  DST = reconstructed objects plus enough info to redo reconstruction
  TMB = compact format of selected reconstructed objects
– All catalogued and accessible via SAM
– Above formats supported by a standard C++ framework
– Analysis groups would produce and maintain specific tuples for their own use
Life doesn't always turn out the way you want it to:
– STA was never implemented
– TMB wasn't ready when data started coming in
– DST was ready, but initially people wanted the extra info in the raw data
– ROOT tuple output intended for debugging was available, and many began using it for analysis (but it was too big and too slow to produce for all data)
– The threshold for using the standard framework and SAM was high (complex, and documentation was inadequate)
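The intended tier hierarchy can be pictured schematically in code. This is an editorial sketch with invented class names, not D0's actual event classes; it only illustrates that each successive format carries less information.

```python
# Editorial sketch only (invented names, not D0's event classes): the intended
# tiers, from largest to smallest. Reconstruction turns raw data into DST-level
# objects; TMB keeps a compact selection of them; STA would have kept everything.

from dataclasses import dataclass
from typing import List

@dataclass
class RawEvent:
    detector_readout: bytes        # full digitized readout from the detector

@dataclass
class DSTEvent:
    reco_objects: List[str]        # tracks, vertices, jets, leptons, ...
    reco_inputs: List[str]         # enough information to redo reconstruction

@dataclass
class TMBEvent:
    selected_objects: List[str]    # compact subset of reconstructed objects

@dataclass
class STAEvent:
    raw: RawEvent                  # raw data plus all reconstructed information;
    dst: DSTEvent                  # too large to keep for more than a subset
```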

Slide 6: Data Model (Historical Perspective)
TMB was pushed to finalization ~8 months after data taking began:
– Late enough that many were already wedded to the debugging ROOT tuple
– Specification is an ugly process; someone is always left unhappy
– The result was great for physical data access (everything disk resident) and for algorithm development
– Slow for analysis (unpacking times were large; changes required slow relinks)
Divergence between those using the standard framework and those using ROOT tuples:
– Led to incompatibilities and complications, most notably in standard object IDs
– The need for a common format was universally recognized
An effort was made to introduce a common analysis format in the form of TMBTree:
– Compatibility issues and inertia prevented most ROOT tuple users from embracing it
– It did not have a clear support model
– It never caught on

Slide 7: Data Model
Recently the TMB has been expanded to TMB+:
– Includes tracking hits, allowing reconstruction to be redone from it
– The DST has been dropped
Another attempt to introduce a common format via ROOT:
– Better thought out, with broader support
– Had we done this 5 years ago…
– It might have a better chance now, but the inertia may be too great
Lessons:
– You need a well-thought-out, common analysis format that meets the needs of all users and that all agree on from day one
– It must support trivial access operations with minimal impedance to the user
– Documentation has to be ready at the start

Slide 8: Remote Processing
The first attempt at remote production processing of real data was made in Fall '03:
– Goal was to reprocess ~500M events
– Some compromises were made to simplify the remote jobs (sketched after this slide):
  Reprocess from DSTs (no calibration DB needed)
  No storage of DSTs, only TMBs (less to transport back to FNAL)
  Merging of TMBs and final storage to tape done at FNAL
– Final scope of the remote activity:
  ~100M events processed at 5 sites (UK, WestGrid, IN2P3, GridKa, NIKHEF)
  ~25 TB of data transported
  ~2 months of setup required for ~6 weeks of actual processing
Very manpower intensive, but successful; the ROI appeared to be positive.
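The simplified flow used in that first pass can be written as a small runnable toy. The function names below are editorial stand-ins, not D0's actual tools: DSTs go out, only TMBs come back, and merging plus the final store to tape stay at FNAL.

```python
# Schematic toy of the simplified remote-reprocessing flow (stand-in functions).

def reconstruct_from_dst(dst_name):
    # Stand-in for the reconstruction executable; reprocessing from DST means
    # no calibration-DB access is needed at the remote site.
    return dst_name.replace(".dst", ".tmb")

def remote_site_pass(dst_files):
    # Only the compact TMBs are transported back to FNAL.
    return [reconstruct_from_dst(dst) for dst in dst_files]

def fnal_finalize(tmb_files):
    merged = f"merged_{len(tmb_files)}_files.tmb"      # stand-in for TMB merging
    print("storing to tape at FNAL:", merged)          # final archival step at FNAL

dsts = ["run170001_000.dst", "run170001_001.dst"]
fnal_finalize(remote_site_pass(dsts))
```

Transport in both directions is what made prestaging and network bandwidth the limiting factors noted on the next slide.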

Slide 9: Remote Processing
Lessons learned from the first pass:
– The endeavor is very manpower intensive
– The ROI could be considered positive, but it is a hard call to make:
  Compare estimated manpower costs to the cost of expanding the FNAL farm to handle the load
  Equipment stays bought
– The network is a limiting factor; prestaging of files is necessary
– Well-defined plans finalized early are critical to success:
  Last-minute changes in priorities or processing modes cause ripple effects across the entire system
  Files must be restaged
  Sites must be recertified
– Certification requires a major effort (see the sketch after this slide):
  What constitutes an acceptable discrepancy? Who decides?
  If a discrepancy is found, who is "right"? (Maybe neither.)
  Hard to do high-statistics certification: total resources are large, but individual sites may not be able to process much for certification
  When is recertification necessary?
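One way to make the certification question concrete is a tolerance-based comparison of binned distributions between a remote site's output and a reference processing of the same events. The sketch below is an editorial illustration, not D0's actual certification procedure, and the 2% tolerance is an arbitrary placeholder; with low statistics at a single site, the tolerance has to be loose, which is exactly the difficulty raised above.

```python
# Illustrative site certification: compare binned counts per quantity against a
# reference processing and require agreement within a relative tolerance.

def certify(reference_counts, site_counts, rel_tolerance=0.02):
    """Return (passed, (worst_relative_difference, quantity))."""
    worst = (0.0, None)
    for quantity, ref_bins in reference_counts.items():
        site_bins = site_counts[quantity]
        for ref, site in zip(ref_bins, site_bins):
            if ref == 0:
                continue                      # skip empty reference bins
            rel_diff = abs(site - ref) / ref
            if rel_diff > worst[0]:
                worst = (rel_diff, quantity)
    return worst[0] <= rel_tolerance, worst

reference = {"jet_pt": [1000, 800, 300], "muon_pt": [500, 200, 50]}
remote    = {"jet_pt": [1005, 792, 303], "muon_pt": [498, 204, 49]}
ok, (diff, qty) = certify(reference, remote)
print("certified" if ok else "failed", f"(worst: {diff:.1%} in {qty})")
```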

Slide 10: Remote Processing
Let's do it again, but get serious this time:
– ~1G events to process remotely
– ~250 TB of raw data to move
– Do reconstruction from raw data, with normal configuration and calibration DB accesses
– All steps done remotely, including TMB merging and initiation of the final store to tape at FNAL
– All bookkeeping done with SAM
– Use common, Grid-enabled software everywhere
– Expect ~6 months of processing time and several months of setup time
– Will be manpower intensive; expect ~1 FTE at each major site for the duration of the project
How efficient do you need to be? Except for the prestaging of files, this is no different from normal first-pass reconstruction on the FNAL farm.
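As a rough answer to the efficiency question, the plan's own numbers imply the following sustained rates. This arithmetic is an editorial addition, assuming ~6 months of wall-clock time and round decimal figures.

```python
# Sustained rates implied by ~1G events and ~250 TB of raw data over ~6 months.

TB = 1e12
events = 1e9
raw_volume = 250 * TB
duration = 6 * 30 * 24 * 3600     # ~6 months of wall-clock time (~1.56e7 s)

print(f"aggregate event rate: {events / duration:.0f} events/s")            # ~64 events/s across all sites
print(f"average inbound transfer: {raw_volume / duration / 1e6:.0f} MB/s")  # ~16 MB/s sustained
# Any downtime or inefficiency raises the required peak rates accordingly,
# which is why prestaging of files and early planning matter so much.
```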

Slide 11: Remote Production
Using SAMGrid software for the operation of this pass:
– Includes Job Information Monitoring (JIM)
– Grid job submission
– Execution control package (RunJob)
– Uses VDT for the underlying services
– Requires each site to provide a head node for the installation
– Installation is very time and manpower intensive; access to experts is necessary
– The head node is the choke point in the system
– Much effort has gone into alleviating load problems on the head node
– Scalability issues remain to be resolved; functionality needs to be distributable across multiple nodes

Slide 12: Remote Production
In spite of the effort required and the problems, it's working:
– Started production of the 1G-event sample in early March
– Currently ~450M events processed, merged, and stored to tape at FNAL
– Should finish in early October
– The next step is Grid-enabled user analysis
Lessons:
– Large-scale production of real data in a Grid environment is practical
– Could even do first-pass reconstruction remotely if necessary
– Cheap hardware-wise (at least for the first user…)
– Expensive manpower-wise
What color is your money?

