Presentation on theme: "Introduction to CMS computing CMS for summer students 7/7/09 Oliver Gutsche, Fermilab."— Presentation transcript:
Introduction to CMS computing CMS for summer students 7/7/09 Oliver Gutsche, Fermilab
7/7/09Oliver Gutsche - Introduction to CMS Computing 2 Basic of computing for HEP Particle physics is an event based processing problem The detector records collisions and stores the detector signals per collision This is what we call an event After the event recording, a lot of computing is needed to Store the data permanently Process the data to reconstruct event properties Access different forms of data to analyze events and extract physics results Monte Carlo simulation (MC) is also event based In addition to the processing, a lot of computing is needed to simulate a collision in the first place Processing, simulation and analysis can be performed with very high grade of parallelism In general, every event can be handled in parallel
7/7/09Oliver Gutsche - Introduction to CMS Computing 3 CMS in a nutshell (current planning basis) Data recording rate: 300 Hz Size per event: RAW: 1.5 MB (RAW+SIM 2 MB) RECO: 0.5 MB AOD: 0.1 MB Processing power per event: Simulation (including reconstruction) : 1000 HS HS06 = 1 event simulated and reconstructed in 100s on 3 GHz core Reconstruction: 100 HS HS06 = 1 event reconstructed in 10s on 3 GHz core DATA
7/7/09Oliver Gutsche - Introduction to CMS Computing /2010 We all are eagerly awaiting the start of data taking in 2009 Following estimates for the data taking year have been determined: “2009-run”: Oct’09 - Mar’10: 726M events “2010-run”: Apr’10 - Sep’10: 1555M events Translates to: 3.42 PB (yes, PetaByte) RAW data, 1.14 PB RECO data One single 3 HGz core would take ~100 years to reconstruct all of it This only refers to collision data, MC is not included
7/7/09Oliver Gutsche - Introduction to CMS Computing 5 Parallel processing We exploit the parallel nature of event processing by using multiple standard computers in parallel Processes only need access to input data and produce output data No communication between different computers during processing of events needed For the computing needs of CMS, we need many 10,000 computers (cores) Very interesting and challenging computational problem which governs the way we are doing computing
7/7/09Oliver Gutsche - Introduction to CMS Computing 6 From Single COmputer to the GRID 1. User is running processes on individual machines Has to log in to every machine, start the processes/jobs by hand Does scale for 1-3 users, but is very work intensive 2. Batch system: Scheduling system with access to many cores (farm) User adds his jobs to the queue Scheduling system starts jobs on one of the cores System can prioritize job execution and can impose fair share mechanism to support many users ➞ scales to many users Limited by farm size Batch system
7/7/09Oliver Gutsche - Introduction to CMS Computing 7 From Single COmputer to the GRID 3. The GRID, an interconnected network of batch farms Why not a single huge batch farm for CMS: Leverage local facilities, national/institutional computing resources and expertise Computing infrastructure for science is publicly funded, tax payers money stays in their respective country Reuse existing infrastructure and share with other disciplines Follow the international nature of the CMS collaboration Too big to put in one place even if we wanted to Several smaller farms are much easier to manage than one large one
7/7/09Oliver Gutsche - Introduction to CMS Computing 8 The GRID Transparent access to all distributed resources (like a very big batch system) Authentication via GRID certificates: User accounts of all users cannot be distributed to all GRID sites GRID certificates and derived short-living GRID proxies authenticate a user globally At the site, the GRID proxy is mapped to a generic local user account CMS uses the Virtual Organization Membership Service (VOMS) on top of GRID proxies to have a more fine grained role based user prioritization (e.g. “production role” gets priority) CMS uses following GRIDs: EGEE (European GRID), OSG (American GRID), NorduGRID Important for users in CMS: get your GRID certificate!
7/7/09Oliver Gutsche - Introduction to CMS Computing 9 Storage CMS will produce an enormously large volume of data Cheapest and most reliable technique to store these volumes of data: tape Simple technology with very high reliability (tape robot feeding tapes stored in large shelf systems to tape drives) Slow in writing and access (remember the old cassette players?) Storage capacity: 10 to 100 of PB Approach to overcome slow access: disk caches Access to data is not handled directly, disk cache systems are sitting between the processing farm and the tape system Disk cache system organize large amounts of disks into larger entities and provide access to the cached files, take care of deleting not or very rarely used files Systems have > 1000 individual disks Some disk cache systems you might come across: Castor, dCache, DPM, Storm, Hadoop Storage capacity: ~1 to 10 PB
7/7/09Oliver Gutsche - Introduction to CMS Computing 10 Computing for CMS How does CMS organize its computing to provide sufficient computing power to handle all data and MC?
7/7/09Oliver Gutsche - Introduction to CMS Computing 12 CMS Computing model Tier 0 (T0) at CERN (20% of all CMS computing resources) Record and prompt reconstruct collision data Calculate condition and alignment constants Store data on tape (only archival copy, no access) Only central processing, no user access Tier 1 (T1): regional centers in 7 countries (40% of all CMS computing resources) Store data fraction on tape (served copy) Every T1 site gets a fraction of the data according to its respective size Archive fraction of produced MC on tape Skim data to reduce data size and make data more easily handleable Rereconstruct data with newer software and conditions/alignment constants Only central processing, no user access Tier 2 (T2): local computing centers at Universities and Laboratories (40% of all CMS computing resources) Simulate MC events User access to data for analysis
7/7/09Oliver Gutsche - Introduction to CMS Computing 13 CMS distributed computing MODEL “Data driven” computing model Data and MC samples are distributed centrally Jobs (processing, analysis) “go” to the data Requires very fast network connections between the different centers: T0 ➞ T1: handled via the LHC-OPN (Optical Private Network) consisting of dedicated 10 Gbit/s network links Distributes the recorded data for storage on tape at T1 sites T1 ➞ T1: also handled via the OPN Redistribute parts of the data produced during rereconstruction T1 ➞ T2: handled via national high speed network links Transfer datasets for analysis to T2 sites T2 ➞ T1: handled via national high speed network links Transfer produced MC to T1 for storage on tape
7/7/09Oliver Gutsche - Introduction to CMS Computing 14 T0 Online system of the detector records event data and stores it in binary files (streamer files) There are 2 processing paths in the T0: Prompt: latency of several days The T0 repacks the binary files into ROOT files and splits the events into Primary Datasets according to trigger selections and stores them on tape This is called the RAW data tier The T0 then reconstructs the RAW data for the first time (Prompt Reconstruction) and stores the output on tape as well This is called the RECO data tier During reconstruction, a subset of information sufficient to perform 90% of all physics analysis is extracted and stored separately This is called the AOD (Analysis Object Data) data tier Also special alignment and calibration datasets are produced and copied directly to the CERN Analysis Facility (CAF) All RAW, RECO and AOD data is then transferred to the T1 sites for storage on tape Express: latency of 1 hour All the steps above combined into a single process run on 10% of all events selected from all the recorded data Output is copied to the CAF for express alignment and calibration workflows and prompt feedback by physics analysis
7/7/09Oliver Gutsche - Introduction to CMS Computing 15 T1 All T1 centers store a fraction of the recorded data on tape This is called the custodial data fraction MC produced at the T2 sites is archived on tape here as well The T1 processing resources are used for Skimming of data to reduce the data size by physics pre-selections to make the datasets more easily handleable by the physicists Rereconstruction of the data with new software versions and updated alignment and calibrations The T1 sites serve data to the T2 sites for analysis Users, physics groups or also central operations can request transfers of datasets Currently up to 10,000 jobs in parallel on the T1 sites
7/7/09Oliver Gutsche - Introduction to CMS Computing 16 T T2 sites serve 2000 physicists and provide access to data and MC samples for analysis Each CMS physicist can access data at all the T2 sites Still there is some regional association between physicist and T2 site to support local resources for storage of user files There is also association between physics groups (top, higgs, …) and T2 sites to organize data and MC sample placement A typical T2 sites has ~800 batch slots and 300 TB of disk space T2 sites don’t have tape Half of the resources of all T2 sites are reserved for MC production which is handled centrally
7/7/09Oliver Gutsche - Introduction to CMS Computing 17 USER How does the user interact with the CMS computing infrastructure?
7/7/09Oliver Gutsche - Introduction to CMS Computing 18 Transfer tool: phedex PhEDEx is CMS’ tool to request and manage data transfers Every user can request the transfer of a data sample to a T2 site for analysis Every T2 site (also the T1 sites and the T0) have data managers which approve or disapprove transfer requests according to global policies and available storage space
7/7/09Oliver Gutsche - Introduction to CMS Computing 19 Dataset Bookkeeping system (DBS) DBS handles to bookkeeping of datasets https://cmsweb.cern.ch/dbs_discovery A dataset name is composed of: / / / Primary dataset name: specifies the physics content of the sample Processed dataset name: specifies the processing conditions and data taking or MC production period Data tier: specifies the format of content of the files (RAW, RECO, AOD, … ) Primary tool to look up and discovery datasets and their location on the T2 level for your analysis
7/7/09Oliver Gutsche - Introduction to CMS Computing 20 GRID submission tool: CRAB CMS Remote Analysis Builder https://twiki.cern.ch/twiki/bin/view/CMS/SWGuideCrab Enables every user to send her/his analysis code to the T2 sites to process stored data and MC samples Represents a wrapper to the GRID tools used to execute jobs on the GRID
7/7/09Oliver Gutsche - Introduction to CMS Computing 21 Summary CMS computing needs are significant CMS expects to record in 2009/2010 alone More than 6 PetaByte of collision data All this data has to be processed and accessed via the distributed tiered computing infrastructure Especially for the average user: Documentation: https://twiki.cern.ch/twiki/bin/view/CMS/WorkBookhttps://twiki.cern.ch/twiki/bin/view/CMS/WorkBook Getting help via Hypernews: https://hypernews.cern.ch/https://hypernews.cern.ch look for most appropriate forum, send a fully detailed report (too much is better than too little). Dont be shy to ask questions!