CMS Computing Model Simulation
Stephen Gowdy, FNAL
30th April 2015

Overview
- Want to look at different computing models for the HL-LHC
  - Use of caching (e.g. CDN, NDN)
  - Where to place caches
  - How large they need to be
- In discussion with others about possible collaboration
- Writing a basic Python simulation
  - Could switch to C++ if better performance is needed

Simulation
- Time-driven discrete simulation
  - 100-second time slices currently used
- Takes account of job slots at sites
- Allows for transfers between sites
- Code is in
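To make the time-driven approach concrete, below is a minimal, self-contained sketch of such a loop with 100-second slices and per-site slot limits; all names and the job/slot representation are illustrative, not taken from the actual simulation code.

```python
# Minimal sketch of a time-driven discrete simulation (illustrative only):
# jobs with a start time and a CPU requirement run in a limited number of
# slots per site, advanced in fixed 100 s slices.

TIME_SLICE = 100  # seconds per simulation step

def simulate(jobs, slots_per_site, end_time=10**7):
    """jobs: list of dicts {'site': str, 'start': s, 'cpu': s}.  Returns (site, finish_time) pairs."""
    pending = sorted(jobs, key=lambda j: j["start"])
    running = {}            # site -> list of remaining CPU seconds
    finished = []
    now = 0
    while now < end_time and (pending or any(running.values())):
        # Start jobs whose start time has arrived, if their site has a free slot.
        still_pending = []
        for job in pending:
            slots = running.setdefault(job["site"], [])
            if job["start"] <= now and len(slots) < slots_per_site.get(job["site"], 0):
                slots.append(job["cpu"])
            else:
                still_pending.append(job)
        pending = still_pending
        # Advance every running job by one time slice.
        for site, slots in running.items():
            remaining = []
            for cpu_left in slots:
                cpu_left -= TIME_SLICE
                if cpu_left <= 0:
                    finished.append((site, now + TIME_SLICE))
                else:
                    remaining.append(cpu_left)
            running[site] = remaining
        now += TIME_SLICE
    return finished
```

For example, simulate([{'site': 'T2_US_X', 'start': 0, 'cpu': 3600}], {'T2_US_X': 100}) finishes that single job at 3600 s (the site name is a placeholder).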

Methodology
- Flat files are read to load site, network, job and file information
  - Set up sites and links
  - Then set up the catalogue of data
  - Read in simulation parameters for CPU efficiency, remote-read penalty and file transfer rates
- Start processing jobs in sequence
  - Use the list of jobs from the dashboard to feed the simulation
  - See how the simulation performs when processing the current jobs

Simulation Parameters
- CPU efficiency derived from actual jobs
- Latency between sites is guessed at the moment
- CPU efficiency penalty when reading remotely:
  - 0 ms: 0; >=1 ms: 5%; >=50 ms: 20%
- Single-file transfer maximum speed:
  - 0 ms: 10 Gbps; >=1 ms: 1 Gbps; >=50 ms: 100 Mbps; >=100 ms: 50 Mbps
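The latency-binned penalties and rates above can be written as simple threshold lookups. Here is a minimal sketch using the values from this slide; the helper names are illustrative, not from the actual code.

```python
# Latency-binned parameters from the slide, as (min_latency_ms, value) pairs
# checked from the largest threshold down.

REMOTE_READ_PENALTY = [(50, 0.20), (1, 0.05), (0, 0.0)]                # CPU-efficiency penalty
MAX_TRANSFER_RATE = [(100, 50e6), (50, 100e6), (1, 1e9), (0, 10e9)]    # bits per second

def lookup(bins, latency_ms):
    """Return the value of the first bin whose threshold the latency reaches."""
    for threshold, value in bins:
        if latency_ms >= threshold:
            return value
    return bins[-1][1]

# Example: a 60 ms link gives a 20% CPU penalty and a 100 Mbps cap per file.
penalty = lookup(REMOTE_READ_PENALTY, 60)   # 0.20
rate = lookup(MAX_TRANSFER_RATE, 60)        # 1e8 (100 Mbps)
```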

Main classes in the simulation and their attributes:
- Job: site, cpuTime, inputData, fractionRead, start, end, runTime, dataReadTime, dataReadCPUHit, theStore
- Site: name, disk, bandwidth, network [[site, bandwidth, quality, latency] …], batch
- Batch: qjobs [Job], rjobs [Job], djobs [Job], cores, bandwidth
- EventStore: catalogue {lfn: [site…]}, files {lfn: size, …}
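These structures can be rendered roughly as Python dataclasses; this is only an illustration based on the attribute names in the diagram, with guessed types, not the actual simulation code.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class EventStore:
    catalogue: Dict[str, List[str]] = field(default_factory=dict)  # lfn -> list of site names
    files: Dict[str, int] = field(default_factory=dict)            # lfn -> size in bytes

@dataclass
class Job:
    site: str
    cpuTime: float
    inputData: List[str]            # LFNs read by the job
    fractionRead: float
    start: float
    end: float = 0.0
    runTime: float = 0.0
    dataReadTime: float = 0.0
    dataReadCPUHit: float = 0.0
    theStore: EventStore = None     # event store used to locate input files

@dataclass
class Batch:
    cores: int
    bandwidth: float
    qjobs: List[Job] = field(default_factory=list)  # queued jobs
    rjobs: List[Job] = field(default_factory=list)  # running jobs
    djobs: List[Job] = field(default_factory=list)  # done jobs

@dataclass
class Site:
    name: str
    disk: float
    bandwidth: float
    network: List[list] = field(default_factory=list)  # [[site, bandwidth, quality, latency], ...]
    batch: Batch = None
```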

Site Information
- Extracted from the SiteDB pledge database
  - Use information for 2014, the most recent update
  - If a site has no pledge, just assume 10 TB and 100 slots
    - The Tier-2 default is larger, so this should probably be updated
  - No internal bandwidth information, so assume 20 GB/s at all sites
- Recently only considering US Tier-1 and Tier-2 sites
  - Sizes taken by hand from REBUS (could probably be automated too)
  - Vanderbilt assumed to be the same as the others
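A minimal sketch of how those pledge defaults might be applied when building a site record; the record layout and function name are hypothetical, not taken from the real SiteDB extraction.

```python
# Apply the default assumptions from the slide when a site has no pledge.

DEFAULT_DISK_TB = 10              # assumed disk if no pledge
DEFAULT_SLOTS = 100               # assumed job slots if no pledge
INTERNAL_BANDWIDTH_GBPS = 20 * 8  # 20 GB/s assumed at every site

def make_site(name, pledge=None):
    """pledge: optional dict like {'disk_tb': ..., 'slots': ...} (illustrative layout)."""
    return {
        "name": name,
        "disk_tb": pledge["disk_tb"] if pledge else DEFAULT_DISK_TB,
        "slots": pledge["slots"] if pledge else DEFAULT_SLOTS,
        "internal_bw_gbps": INTERNAL_BANDWIDTH_GBPS,
    }
```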

Each Tier-2 has:
- 1000 TB disk
- 1224 cores
FNAL has:
- 17690 TB disk
- cores

Job Information
- Site, start time, wall clock, CPU time, files read
- Extracted job information from the dashboard
  - A week, from the 15th to the 22nd of February
  - About 5% of jobs have no site information (discarded)
  - About 33% have no CPU time (derived from the wall clock)
  - <<1% have no start time (use the CPU time before the end time)
  - <<1% have no input file defined (discarded)
- Will compare the wall clock in the simulation with the actual value as a check of simulation quality
- Compare the overall simulated wall clock time to compare different scenarios
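A minimal sketch of the cleaning rules listed above; the field names and the efficiency factor used to derive a missing CPU time are assumptions, not taken from the real code.

```python
# Clean a list of dashboard job records according to the rules on the slide.

ASSUMED_CPU_EFFICIENCY = 0.8  # assumed factor, used only to derive a missing CPU time

def clean_jobs(raw_jobs):
    jobs = []
    for rec in raw_jobs:
        if not rec.get("site"):          # ~5%: no site information -> discard
            continue
        if not rec.get("input_files"):   # <<1%: no input file defined -> discard
            continue
        cpu = rec.get("cpu_time")
        if not cpu:                      # ~33%: derive CPU time from the wall clock
            cpu = rec["wall_clock"] * ASSUMED_CPU_EFFICIENCY
        start = rec.get("start_time")
        if start is None:                # <<1%: place the CPU time just before the end time
            start = rec["end_time"] - cpu
        jobs.append({**rec, "cpu_time": cpu, "start_time": start})
    return jobs
```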

File Information
- Extract the network mesh from PhEDEx
  - Using the links interface
  - Also get reliability information; if not present, 99% is assumed
  - No actual transfer rate information is available for links
    - Use what is available to get a number between 1 GB/s and 10 GB/s (not at all accurate); default 1 GB/s
- Extract file location and size information from PhEDEx
  - No historical information is available
  - When updating the job information, the file locations also need to be updated
  - Only get information on files used by jobs
  - Some jobs read a file from outside the US; a copy is placed at FNAL so the job can work when considering only US sites
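A sketch of how a link record might be normalised with the defaults mentioned above; the field names are assumptions and do not reflect the actual PhEDEx data-service schema.

```python
# Normalise a network-link record with the defaults from the slide.

DEFAULT_RELIABILITY = 0.99   # assumed when no quality information is present
DEFAULT_RATE_GBPS = 1.0      # GB/s, used when no better estimate exists

def normalise_link(record):
    reliability = record.get("quality")
    if reliability is None:
        reliability = DEFAULT_RELIABILITY
    # Clamp whatever rate estimate exists into the 1-10 GB/s range.
    rate = record.get("rate_estimate_gbps", DEFAULT_RATE_GBPS)
    rate = min(max(rate, 1.0), 10.0)
    return {
        "from": record["from_site"],
        "to": record["to_site"],
        "reliability": reliability,
        "rate_gbps": rate,
    }
```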

Startup output when only using US T1 and T2 sites:
$ python python/Simulation.py
Read in 9 sites.
Read in 72 network links.
Read in files.
Read in locations.
Read in 3 latency bins.
Read in 4 transfer bins.
Read in 10 job efficiency slots.
About to read and simulate jobs...
…

Caching
- Different caching strategies need to be added later
  - Cache hierarchy
  - Including cache cleaning when a cache gets full
- Currently the simulation allows no transfers, or transfers; it can also discard transfers
  - Won't transfer if there is no space available at a site
- Implement different models
  - With the new version of xrootd, a file can be read while it is still being transferred
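As an illustration of cache cleaning when a cache gets full, here is a minimal least-recently-used eviction sketch; LRU is only one possible policy and is assumed here for illustration, not a strategy stated in the slides.

```python
from collections import OrderedDict

class SiteCache:
    """Toy disk cache with LRU eviction when space runs out (illustrative only)."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.files = OrderedDict()   # lfn -> size, least recently used first

    def access(self, lfn, size):
        """Return True on a cache hit; otherwise fetch the file, evicting as needed."""
        if lfn in self.files:
            self.files.move_to_end(lfn)      # mark as recently used
            return True
        # Evict least-recently-used files until the new file fits.
        while self.used + size > self.capacity and self.files:
            _, evicted_size = self.files.popitem(last=False)
            self.used -= evicted_size
        if self.used + size <= self.capacity:
            self.files[lfn] = size
            self.used += size
        return False                          # cache miss: the file had to be transferred
```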

Scenarios Considered
- Run the standard set of US jobs
  - Each job is run twice to spread the load across all Tier-2 sites more evenly
1. Run with a situation similar to today
   - The vast majority of data is already placed at the execution site
   - A small number of jobs will transfer data from another site (usually FNAL)
2. Only use a local cache, with data initially at FNAL
3. Only read data from FNAL, with no local copy (no local disk needed)

Vary Input Parameters
- Total wall clock time used, in billions of seconds
- Each cell has three values: Preplaced Data / Transfer File / Remote Read
- There is a very small difference in the Transfer File time with the change in transfer speed

                     Half CPU Hit      Normal CPU Hit    Double CPU Hit
Half Tran. Speed     2.77/3.32/        /3.32/            /3.32/4.25
Normal Tran. Speed   2.77/3.32/        /3.32/            /3.32/4.25
Double Tran. Speed   2.77/3.32/        /3.32/            /3.32/4.25

Plots from Running with Different Parameters
- Grid of three-by-three graphs, similar to the previous table
- Left to right, vary the Remote Read penalty:
  - Half: 0 ms: 0; >=1 ms: 2.5%; >=50 ms: 10%
  - Normal: 0 ms: 0; >=1 ms: 5%; >=50 ms: 20%
  - Double: 0 ms: 0; >=1 ms: 10%; >=50 ms: 40%
- Top to bottom, vary the maximum single-file transfer rate:
  - Half: 0 ms: 5 Gbps; >=1 ms: 500 Mbps; >=50 ms: 50 Mbps; >=100 ms: 25 Mbps
  - Normal: 0 ms: 10 Gbps; >=1 ms: 1 Gbps; >=50 ms: 100 Mbps; >=100 ms: 50 Mbps
  - Double: 0 ms: 20 Gbps; >=1 ms: 2 Gbps; >=50 ms: 200 Mbps; >=100 ms: 100 Mbps

CPU Efficiency when Data Read from Fermilab

CPU Efficiency when data copied from Fermilab

CPU Efficiency when data mostly preplaced

Rate from FNAL when data read from FNAL

Rate from FNAL when data copied from FNAL

Rate from FNAL when data mostly preplaced

Job States

Summary
- Can simulate the CMS computing system
- Concentrating on the US infrastructure: a simpler system to understand, and perhaps to experiment on
  - E.g. turn off local disk access for a short amount of time
- Can use the current infrastructure to better determine the input parameters
- Scale up job throughput to the capacity of the system

BACKUP SLIDES

Jobs running when data read from FNAL

Jobs running when data copied from FNAL

Jobs running when data mostly preplaced

Inter-Tier2 rate when data read from FNAL

Inter-Tier2 rate when data copied from FNAL

Inter-Tier2 rate when data mostly preplaced