Snapshot of the D0 Computing and Operations Planning Process
Amber Boehnlein
For the D0 Computing Planning Board


D0 CPB and Friends
Lee Lueking and Chip Brock, Ruth & Wyatt & Vicky, Don & Jon & Bonnie
Mike Diesburg, Iain Bertram, Jae Yu, Heidi
Harry Melanson, Serban Protopopescu, Qizhong Li, Dugan O'Neil
Alan Jonckheere
Stu Fuess
Dane Skow, Dave Fagan
Nick Hadley, Jianming Qian, ASB

Data size and Storage assumptions

D0 Institution Contributions
All Monte Carlo production takes place offsite at Remote Centers
Expect some analysis to occur remotely
Investigating compute-intensive operations, in addition to MC generation, for remote centers
CLueD0 desktop cluster, administered by D0 collaboration members, with contributions by institutions
Institutions can provide project disk on D0mino
Anticipated that institutions will contribute to CLuBs, the CLueD0 back end

Access and Analysis Patterns
Current access and analysis patterns:
– Much primary analysis is done on a high-level data tier, currently the reco-based root tuple
– Physics-group-coordinated efforts to generate derived data sets by skimming through the root tuple or reco output
– Picked event samples of raw data for re-reco studies
– Specialized reprocessing of small data sets

Extrapolation of Analysis Patterns
Assume that physics- or analysis-group generation of derived data sets continues:
– Skim of thumbnails (TMB) for desktop or CLuBs analysis
– Skim of DSTs for studies for which the TMB contains inadequate information, and for re-reco
– Pick-events of raw data samples, small DST samples
– Supply a "freight train" to regulate fast DST access
Assume that the bulk of the users will do analysis on the TMB, either on the D0mino back end, CLuBs, or remote CPUs
A smaller group does more time-intensive analysis as a service, running over larger data sets on the D0mino back end

Currently difficult to estimate analysis CPU usage:
– Assume generation of derived sets is relatively quick per event, but happens often
– Most DST access implies time-intensive operations; estimate it is at least of the order of farm processing. This would support 3 simultaneous users, each using ¼ of the farm processing time over 3 months on 1/3 of the data, for the initial data set
– Reco time per event is expected to increase dramatically with the number of multiple interactions
– Make an overall estimate of 75 seconds/event (500 MHz) for reco, analysis, and re-reco (collaboration to weigh the relative balance), available in 2005, staged in
– Institutions can contribute CPU to CLuBs; assume an FNAL contribution of $50K yearly
– Remote-center reprocessing, or DST-level reprocessing, is under evaluation
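As a rough illustration of what the 75 s/event (500 MHz) figure implies, a minimal back-of-envelope sketch is given below. The aggregate farm capacity and event count in it are placeholder assumptions, not numbers from the plan.

# Back-of-envelope sketch of the analysis-CPU scaling implied by the
# 75 s/event (500 MHz) estimate. Farm capacity and dataset size are
# placeholders chosen only to show how the estimate scales.

SEC_PER_EVENT_500MHZ = 75.0                      # reco + analysis + re-reco, per slide
GHZ_SEC_PER_EVENT = SEC_PER_EVENT_500MHZ * 0.5   # 37.5 GHz*s per event

def event_rate_hz(total_farm_ghz):
    """Sustainable event rate for a farm with the given aggregate clock speed."""
    return total_farm_ghz / GHZ_SEC_PER_EVENT

def days_to_process(n_events, total_farm_ghz):
    """Wall-clock days to work through n_events at 100% CPU efficiency."""
    return n_events / event_rate_hz(total_farm_ghz) / 86400.0

if __name__ == "__main__":
    farm_ghz = 200.0      # placeholder aggregate analysis capacity
    n_events = 1.0e8      # placeholder number of events to process
    print("rate: %.1f Hz" % event_rate_hz(farm_ghz))
    print("days: %.0f" % days_to_process(n_events, farm_ghz))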

Farm Processing
Farm processing capacity in Summer '02: ~50 Hz
D0mino back end: 16 nodes, 1 GHz
80 nodes, 2 GHz, summer '02

Roles of D0mino
D0mino provides a centralized, stable, and uniform work environment
– Interactive and batch services for onsite and offsite users
High I/O provided by 8 Gigabit Ethernet connections—the Central Analysis SAM station
– Interactive
– To/from robotic storage
– To/from secondary analysis systems
– To/from the back end
Access to project disks and disk cache (30 TB)
Analysis CPU provided by Linux back-end nodes
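For scale, a simple sketch of what the 8 Gigabit Ethernet links and the 30 TB cache imply for aggregate throughput, assuming (optimistically) that the full nominal line rate can be sustained; real throughput would be lower.

# Rough I/O sketch using the two numbers on this slide.

GIGABIT_BYTES_PER_SEC = 1.0e9 / 8.0   # one GigE link at nominal line rate
TB = 1.0e12

n_links = 8                            # per slide
cache_bytes = 30 * TB                  # per slide

aggregate_bps = n_links * GIGABIT_BYTES_PER_SEC          # ~1 GB/s nominal
hours_to_cycle_cache = cache_bytes / aggregate_bps / 3600.0

print("aggregate I/O : %.0f MB/s (nominal)" % (aggregate_bps / 1.0e6))
print("cache turnover: %.1f hours at line rate" % hours_to_cycle_cache)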

Upgrading D0mino
Replacing D0mino requires identifying which parts of the design can be better served by more cost-effective solutions
– Linux back end to supply compute power for access to large data samples
– Seeding CLuBs (the CLueD0 back end) as the solution for intermediate (1 TB) samples
– Continue to evaluate I/O performance
Upgrading D0mino (O3000) would cost about $2M and would have to be phased in. To stay within nominal guidance, this would cut analysis/farm capacity by about ½
Fortunately, SAM gives D0 a lot of flexibility.

Backup Facility
Two primary consumers:
– Project disk archive
– User-driven backups of small samples
Clearly a need, but not clear how best to accomplish it.

Phase 1, Robotic Storage Plan
D0 has 1 STK silo with drives
– Writing raw data and reco output to Mezzosilo 1
We have an option on a second silo
AML2 with 6 LTO drives
– Writing MC data
– Plan to start writing reco output as a test on May 7
– If the test is successful: will purchase more LTO drives and a few more 9940x
– Else: will purchase as many 9940x as is feasible
– Likely need to purchase a few more drives of each type to get us to the decision point

The overall estimated need for robotic storage for phase 1 can be accommodated by Mezzosilos 1 & 2 and the AML2, even with the current generation of drives/media. Current capacity is approximately 2 × 300 TB—compare to roughly 1 PB needed.
Or: the overall estimated need for robotic storage for phase 1 can be accommodated by Mezzosilos 1 & 2 with 9940Bs
Assume the purchase of Mezzosilo 2 in 2003
Assume purchase of drives for Mezzosilo 2 in 2003
Assume additional purchase of drives in 2004
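A minimal capacity-bookkeeping sketch for the comparison above. The ~300 TB per-silo figure and the ~1 PB requirement are taken from the slide; which silos count toward phase 1, the AML2 capacity, and the media-upgrade factor are placeholder assumptions to be adjusted as the plan firms up.

# Capacity bookkeeping sketch; per-silo numbers are approximate placeholders.

TB = 1.0

silos_tb = {
    "Mezzosilo 1": 300 * TB,   # approximate, per slide
    "Mezzosilo 2": 300 * TB,   # approximate, per slide
    "AML2":        300 * TB,   # placeholder: not quoted explicitly on the slide
}

phase1_need_tb = 1000 * TB     # roughly 1 PB, per slide

total_tb = sum(silos_tb.values())
print("total capacity : %.0f TB" % total_tb)
print("phase-1 need   : %.0f TB" % phase1_need_tb)
print("margin         : %+.0f TB" % (total_tb - phase1_need_tb))

# With higher-density media (e.g. 9940B cartridges) the same slots scale up
# by the cartridge-capacity ratio; the ratio used here is a placeholder.
density_gain = 3.0
print("with upgraded media: %.0f TB" % (total_tb * density_gain))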

Drive Estimates
Support online operations plus buffer drain
– 3 × 10 MB/s drives
Support farm operations
– 3 × 10 MB/s drives
Support incoming MC
– 2 drives
Support central analysis
– Freight train for spooling through the DST sample in a short interval (3 months) would consume 8 drives
– MC and other
– Pick-events could consume an arbitrarily large number of drives
Buy LTO and STK in 2002—distribution to vary
20 drives for new Mezzosilo
2004: expect to add drives
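To make the freight-train drive count explicit, a small sketch of the spooling arithmetic follows. The 10 MB/s per-drive rate and the 3-month interval are from the slide; the DST sample size and the 100% drive duty factor are assumptions made only to show the scaling.

# Freight-train drive arithmetic sketch.

import math

MB = 1.0e6
TB = 1.0e12
DAY = 86400.0

drive_rate = 10 * MB      # bytes/s per drive, per slide
interval = 90 * DAY       # ~3 months, per slide

def drives_needed(sample_bytes):
    """Drives required to spool sample_bytes once within the interval."""
    return math.ceil(sample_bytes / (drive_rate * interval))

# One drive moves ~78 TB in 90 days at full duty, so the 8 drives quoted on
# the slide correspond to a DST sample of very roughly 600 TB (placeholder).
print(drives_needed(600 * TB))   # -> 8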

Fixed Infrastructure Costs
Database
– Machines
– Database disks and controllers (assumed cost 10× COTS)
– DB mirrors
– Software
Networking
– Expand links between buildings, FCC
– Additional switches for DAB, farms
– D0-to-FCC upgrade to 10 Gb backbone in '06
– Rewiring D0 for Gb to the desktop in '07
Linux build machines and disk
Additional SAM servers

Disk Estimates
Aim for sufficient cache and TMB storage on D0mino
– All 2002 D0mino project disk additions supplied by the institutions
– Assume that model continues for project space
– Supply an additional 18 TB of cache per year

Summary of infrastructure costs:

Rate Assumptions
Average rate assumes accelerator and experiment duty factors applied to a peak rate of 50 Hz
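A minimal sketch of the rate arithmetic above. The 50 Hz peak rate is from the slide; the duty factors and event size are placeholders (the real values belong to the "Data size and Storage assumptions" table, which is not reproduced in this transcript).

# Rate and raw-data-volume sketch with placeholder duty factors and event size.

SECONDS_PER_YEAR = 3.15e7

peak_rate_hz = 50.0      # per slide
accel_duty = 0.5         # placeholder accelerator duty factor
expt_duty = 0.9          # placeholder experiment duty factor

average_rate_hz = peak_rate_hz * accel_duty * expt_duty
events_per_year = average_rate_hz * SECONDS_PER_YEAR

event_size_bytes = 250.0e3   # placeholder raw-event size
raw_tb_per_year = events_per_year * event_size_bytes / 1.0e12

print("average rate : %.1f Hz" % average_rate_hz)
print("events/year  : %.2e" % events_per_year)
print("raw data/year: %.0f TB" % raw_tb_per_year)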

Full Cost Estimate, No I/O replacement

Questions to D0
Is it possible to make a better estimate of analysis CPU needs?
Role of D0mino
Relative weighting of TMB and DST analysis—better to have a larger TMB as a trade-off for less DST usage?
Role of remote centers

Questions to CD
How should we cost mover nodes?
Relative role of disk vs. robotic storage as time goes on
Where will we put the phase 2 robots?
Interaction between networking and remote centers
Suggestions for backup facility