31/03/00 CMS(UK) Glenn Patrick
What is the CMS(UK) Data Model?
Assume that CMS software is available at every UK institute, connected by some infrastructure (i.e. the Grid). The problem then reduces to:
- What datasets are required?
- Where are they required?
- Why are they required?
- Who is going to generate and distribute them?
- What are the formats, sizes and access patterns?
Event data tiers (smallest to largest):
- Event Tag Data
- Physics Objects
- Reconstructed Data
- Raw Data
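The tier list above can be sketched as a data structure: each tier is a progressively smaller summary that keeps a reference to the fuller tier below it, so an analysis can drill down from a compact tag all the way to the raw readout. This is a minimal illustrative sketch; the class and field names are assumptions, not CMS software.

```python
# Hypothetical sketch of the event-data hierarchy: Tag -> Physics Objects
# -> Reconstructed Data -> Raw Data, each level linking to the next.
from dataclasses import dataclass
from typing import Optional

@dataclass
class RawData:            # full detector readout
    payload: bytes

@dataclass
class ReconstructedData:  # reconstructed tracks, clusters, ...
    tracks: list
    raw: Optional[RawData] = None

@dataclass
class PhysicsObjects:     # physics-level objects used in analysis
    electrons: list
    muons: list
    reco: Optional[ReconstructedData] = None

@dataclass
class EventTag:           # small summary record used for fast selection
    run: int
    event: int
    n_muons: int
    objects: Optional[PhysicsObjects] = None

# Drill down from a tag to the raw payload of the same event.
raw = RawData(payload=b"...")
reco = ReconstructedData(tracks=["trk1", "trk2"], raw=raw)
objs = PhysicsObjects(electrons=[], muons=["mu1"], reco=reco)
tag = EventTag(run=1, event=42, n_muons=1, objects=objs)
assert tag.objects.reco.raw is raw
```

The point of the hierarchy is that most queries touch only the small tag tier; the reference chain is followed only for the few events that pass a selection.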
Regional centre services and dataflow:
- Data Import / Data Export: network from CERN; network from Tier 2 and simulation centres; tapes
- Mass Storage & Disk Servers; Database Servers; Tapes
- Physics Software Development; R&D Systems and Testbeds
- Info servers; Code servers; Web Servers; Telepresence Servers
- Support Services: Training; Consulting; Help Desk
- Production Reconstruction (Raw/Sim --> ESD): scheduled, predictable; experiment/physics groups
- Production Analysis (ESD --> AOD, AOD --> DPD): scheduled; physics groups
- Individual Analysis (AOD --> DPD and plots): chaotic; individual physicists
- Serving: desktops, Tier 2 centres, local institutes, CERN
Offline Data and Computation for Physics Analysis:
- detector --> event filter (selection & reconstruction) --> raw data
- raw data --> event reconstruction --> event summary data
- event simulation --> raw data
- processed data --> batch physics analysis --> analysis objects (extracted by physics topic)
LHCb computing model:
- Production Centre: generate raw data; reconstruction; production analysis; user analysis. CPU for production; mass storage for RAW, ESD, AOD and TAG.
- Regional Centres: user analysis. CPU for analysis; mass storage for AOD, TAG. Inbound AOD, TAG -- real: 80 TB/yr; sim: 120 TB/yr.
- Institutes: selected user analyses. CPU and data servers. Inbound AOD, TAG: 8-12 TB/yr.
LHCb dataflow: Production Centre (x1) --> Regional Centres (~x5, e.g. RAL, Lyon, ...) --> Institutes (~x50).

Real data:
- Production Centre (CERN): data collection, triggering, reconstruction, final-state reconstruction.
- WAN output to each RC: AOD and TAG datasets, 20 TB x 4 times/yr = 80 TB/yr.
- Regional Centre: user analysis.
- WAN output to each institute: AOD and TAG for samples, 1 TB x 10 times/yr = 10 TB/yr.
- Institute: selected user analysis.

Simulated data:
- Production Centre: event generation, GEANT tracking, reconstruction, final-state reconstruction.
- WAN output to each RC: AOD, generator and TAG datasets, 30 TB x 4 times/yr = 120 TB/yr.
- Regional Centre: user analysis.
- WAN output to each institute: AOD and TAG for samples, 3 TB x 10 times/yr = 30 TB/yr.
Dataflow Model:
- DAQ system / L2/L3 trigger --> RAW data + RAW tags (L3 YES; sample of L2/L3 NO)
- Detector --> calibration data
- Reconstruction: RAW --> Event Summary Data (ESD) + reconstruction tags
- First-pass analysis: ESD --> Analysis Object Data (AOD) + physics tags
- Physics analysis: AOD (with access back to ESD, RAW) --> private data
- Analysis workstation --> physics results
Need to answer questions like... How will a physicist in Bristol/Brunel/IC/RAL:
- Select events for a given physics channel from a year's worth of data taking?
- Transfer/replicate the selection for further analysis?
- Generate and process a large sample of simulated events?
- Run his/her batch job on existing samples of Monte Carlo events (e.g. at Tier 1/Tier 2)?
Where do you want the data? What sort of data do you need: Tag, AOD, ESD or Raw?
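The first two questions above (select events from a year of data, then replicate the selection) can be sketched with the tag tier: scan compact event tags, keep matching events, and record which fuller files must be replicated to the physicist's site. All names, fields and cuts here are hypothetical illustrations, not CMS software or real selection criteria.

```python
# Hypothetical TAG-based event selection: scan small tag records, apply a
# physics-channel cut, and collect the AOD files to replicate for analysis.
from dataclasses import dataclass

@dataclass
class Tag:
    run: int
    event: int
    n_muons: int
    missing_et: float   # GeV (illustrative quantity)
    aod_file: str       # where the fuller AOD record lives

def select(tags, min_muons=2, min_met=30.0):
    """Return tags passing a simple (made-up) physics-channel selection."""
    return [t for t in tags if t.n_muons >= min_muons and t.missing_et >= min_met]

tags = [
    Tag(1, 1, 2, 45.0, "aod_001.db"),
    Tag(1, 2, 0, 80.0, "aod_001.db"),
    Tag(2, 7, 3, 31.5, "aod_002.db"),
]
selected = select(tags)
files_to_replicate = sorted({t.aod_file for t in selected})
print(len(selected), files_to_replicate)  # 2 ['aod_001.db', 'aod_002.db']
```

The design point is that only the small tag dataset is scanned in full; bulky AOD/ESD/RAW data moves over the WAN only for the events (or files) that pass.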
How to Go Forward?
- Need to identify a critical mass of people, drawn from all of the institutes, who will start to study, develop and exploit CMS(UK) facilities now.
- Require expertise in OO databases, specifically Objectivity (BaBar estimate: 1 FTE).
- Each institute needs to start identifying its data requirements for simulation/physics/trigger studies.
- Need to understand how best to distribute, replicate and centralise the database and associated resources.
- Need good organisation, with regular meetings, etc.