1
CERN/IT/DB Multi-PB Distributed Databases
Jamie Shiers, IT Division, DB Group, CERN, Geneva, Switzerland
February 2001
2
CERN - The European Organisation for Nuclear Research
The European Laboratory for Particle Physics
- Fundamental research in particle physics
- Designs, builds & operates large accelerators
- Financed by 20 European countries
- SFR 950M budget - operation + new accelerators
- 3000 staff + 6000 users (researchers) from all over the world
- Experiments conducted by a small number of large collaborations:
  - LEP (current) experiment: 500 physicists, 50 universities, 20 countries, apparatus cost SFR 100M
  - LHC (future, ~2006) experiment: 2000 physicists, 150 universities, apparatus costing SFR 500M
3
CERN/IT/DB [Aerial view of the CERN site: airport, computing centre (Centre de Calcul), Geneva, the 27 km ring]
4
CERN/IT/DB The LEP tunnel
5
CERN/IT/DB The LHC machine in the LEP tunnel
- Two counter-circulating proton beams
- Collision energy 7 + 7 TeV
- 27 km of magnets with a field of 8.4 Tesla
- Superfluid helium cooled to 1.9 K
- The world's largest superconducting structure
6
CERN/IT/DB The LHC Detectors: CMS, ATLAS, LHCb
7
CERN/IT/DB The ATLAS detector - the size of a 6-floor building!
8
CERN/IT/DB Online system
- Multi-level trigger: filters out background, reduces the data volume from 40 TB/s to 100 MB/s
  - Collision rate into level 1: 40 MHz (40 TB/sec)
  - After level 1 (special hardware): 75 kHz (75 GB/sec)
  - After level 2 (embedded processors): 5 kHz (5 GB/sec)
  - After level 3 (PCs): 100 Hz (100 MB/sec)
- Data recording & offline analysis
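As a rough cross-check of the chain above, the sketch below simply multiplies each stage's event rate by the ~1 MB event size quoted elsewhere in the talk to recover the quoted data rates; the structure and labels are illustrative only.

```cpp
// Illustrative only: rates and the ~1 MB event size are the slide's figures.
#include <cstdio>

int main() {
    const double eventSizeMB = 1.0;  // ~1 MB per event (from the slides)
    struct Stage { const char* name; double rateHz; };
    const Stage chain[] = {
        {"into level 1",  40e6},   // 40 MHz  -> ~40 TB/sec
        {"after level 1", 75e3},   // 75 kHz  -> ~75 GB/sec
        {"after level 2",  5e3},   //  5 kHz  -> ~ 5 GB/sec
        {"after level 3", 100.0},  // 100 Hz  -> ~100 MB/sec
    };
    for (const Stage& s : chain) {
        // data rate = event rate x event size
        std::printf("%-15s %12.0f Hz -> %12.0f MB/sec\n",
                    s.name, s.rateHz, s.rateHz * eventSizeMB);
    }
    return 0;
}
```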
9
CERN/IT/DB Event Filter & Reconstruction (figures are for one experiment)
- Data flows from the detector through an event-builder switch and high-speed network
  - input: 5-100 GB/sec
- Computer farm
  - capacity: 50K SI95 (~4K 1999 PCs)
- Tape and disk servers
  - recording rate: 100 MB/sec (ALICE - 1 GB/sec)
  - raw data: +1-1.25 PetaByte/year
  - summary data: +1-500 TB/year
  - 20,000 Redwood cartridges every year (+ copy)
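A back-of-envelope check of the ~1 PB/year raw-data figure, assuming (my assumption, not stated on the slide) roughly 10^7 effective seconds of data taking per year:

```cpp
// Back-of-envelope check of the ~1 PB/year raw-data figure.
// Assumption (not from the slides): ~10^7 seconds of data taking per year,
// a common rule of thumb for an accelerator year.
#include <cstdio>

int main() {
    const double recordingRateMBs  = 100.0;  // 100 MB/sec (ATLAS/CMS, from the slide)
    const double liveSecondsPerYear = 1e7;   // assumed effective running time
    const double petabytes = recordingRateMBs * liveSecondsPerYear / 1e9;  // MB -> PB
    std::printf("Raw data per year: ~%.1f PB\n", petabytes);               // ~1 PB
    return 0;
}
```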
10
CERN/IT/DB Analysis Model
- Reconstruction: experiment-wide activity (10^9 events); new detector calibrations or understanding; re-processing 3 per year; 1000 SI95 sec/event, ~3 jobs per year
- Selection: ~20 groups' activity (10^9 -> 10^7 events); trigger-based and physics-based refinements; iterative selection, once per month; 25 SI95 sec/event, ~20 jobs per month
- Analysis: ~25 individuals per group activity (10^6 - 10^8 events); algorithms applied to data to get results; different physics cuts & MC comparison, ~4 times per day; 3 SI95 sec/event, ~2000 jobs per day
- Monte Carlo: 5000 SI95 sec/event
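To see the scale of the reconstruction line, the sketch below multiplies the slide's figures (10^9 events, 1000 SI95 sec/event, 3 passes per year); spreading the work evenly over a calendar year is my own simplifying assumption.

```cpp
// Rough sizing sketch for the reconstruction activity quoted above.
#include <cstdio>

int main() {
    const double events          = 1e9;     // experiment-wide sample (slide)
    const double si95SecPerEvent = 1000.0;  // reconstruction cost (slide)
    const double passesPerYear   = 3.0;     // re-processing frequency (slide)
    const double secondsPerYear  = 3.15e7;  // assumption: spread over a calendar year

    const double totalSI95Sec = events * si95SecPerEvent * passesPerYear;
    std::printf("Reconstruction: %.1e SI95-sec/year, ~%.0f SI95 sustained\n",
                totalSI95Sec, totalSI95Sec / secondsPerYear);
    return 0;
}
```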
11
CERN/IT/DB LHC Data Volumes: references within but not between events (~1 MB each)
12
CERN/IT/DB Data Types: RAW
- Accessed sequentially, ~once per year, plus occasional "random" access
- 100% at CERN site / 20% at ~5 regional centres
- "Simple" data model - largely dictated by detector hardware - typically large arrays of packed numbers
- Possible schema changes ~annually
- Could (easily?) be partitioned by time interval (see the sketch after this list)
- Main problem is sheer volume - cannot assume that all data can reside on-line
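A minimal sketch of what time-interval partitioning of RAW data could look like, assuming a hypothetical one-partition-per-week naming scheme (not any experiment's actual layout):

```cpp
// Hypothetical time-interval partitioning for RAW data: events from the same
// week of data taking land in the same partition, so whole partitions can be
// staged to or from tape as a unit.
#include <cstdio>
#include <ctime>
#include <string>

// Map an event timestamp to the partition (file/container) that holds it.
std::string rawPartitionFor(std::time_t eventTime) {
    const std::tm* utc = std::gmtime(&eventTime);
    char name[64];
    std::snprintf(name, sizeof(name), "RAW_%04d_week%02d",
                  utc->tm_year + 1900, utc->tm_yday / 7 + 1);
    return name;
}

int main() {
    const std::time_t now = std::time(nullptr);
    std::printf("event taken now -> partition %s\n", rawPartitionFor(now).c_str());
    return 0;
}
```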
13
CERN/IT/DB Data Types: ESD
- "Summary data" - output of the reconstruction process
- Used for event display; re-reconstruction; generation of AOD
- Complex data model - several hundred classes; variable-length data structures; many references
14
CERN/IT/DB Data Types: AOD
- "Physics" quantities - particle tracks, energy clusters, etc.
- Relatively simple(?) data model but subject to frequent schema changes
- Accessed randomly by large numbers of users worldwide
15
CERN/IT/DB Data Types: TAG
- Used for pre-selection - the tag stores key characteristics of each event, some 10-100 words per event (see the sketch after this list)
- Updated ~monthly; perhaps much more often
- Accessed in "chaotic" fashion, worldwide, by 100-1000 users
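A hedged sketch of the idea: a compact tag record per event plus a pre-selection scan that returns only the event IDs worth fetching from the bulk stores. All field names and cuts are hypothetical.

```cpp
// Hypothetical TAG record and pre-selection pass: scan only the compact tags,
// return candidate event IDs to be read from the much larger ESD/AOD stores.
#include <cstdint>
#include <cstdio>
#include <vector>

struct EventTag {             // ~10-100 words per event, per the slide
    std::uint64_t eventId;    // reference back to the full event data
    float missingEt;          // a few key physics quantities (illustrative)
    float leadingLeptonPt;
    std::uint32_t nJets;
};

std::vector<std::uint64_t> preselect(const std::vector<EventTag>& tags) {
    std::vector<std::uint64_t> selected;
    for (const EventTag& t : tags)
        if (t.leadingLeptonPt > 20.0f && t.nJets >= 2)  // example cuts
            selected.push_back(t.eventId);
    return selected;
}

int main() {
    const std::vector<EventTag> tags = {{1, 35.0f, 25.0f, 3}, {2, 5.0f, 8.0f, 1}};
    std::printf("%zu of %zu events pass pre-selection\n",
                preselect(tags).size(), tags.size());
    return 0;
}
```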
16
CERN/IT/DB Data Processing Tasks
- Reconstruction:
  - support multiple writers (100-500 filter-farm CPUs)
  - support aggregate data rates sufficient to keep up with acquisition: 100 MB/s for ATLAS/CMS; 1.5 GB/s for ALICE
- Simulation:
  - very CPU-intensive process but no "special" requirements
- Analysis:
  - support multiple readers (~150 concurrent users)
  - data volumes and rates hard to predict at this time; assume aggregate data rates comparable to reconstruction
18
HEP Data Models
- HEP data models are complex!
  - Typically hundreds (500-1000) of structure types (classes)
  - Many relations between them
  - Different access patterns
- LHC experiments rely on OO technology
  - OO applications deal with networks of objects
  - Pointers (or references) are used to describe relations
- [Diagram: Event -> TrackList, Tracker, Calor.; TrackList -> Track(s); Track -> HitList -> Hit(s)]
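A minimal C++ sketch of such an object network, following the Event / TrackList / Track / HitList / Hit diagram above; the class layout is illustrative, not any experiment's real schema.

```cpp
// Illustrative object network: an Event owns its hits and tracks, and each
// Track holds references (pointers) to the hits it was built from.
#include <cstdio>
#include <vector>

struct Hit   { float x, y, z; };
struct Track {
    std::vector<const Hit*> hits;   // relation Track -> Hit via pointers
    float pt = 0.f;
};
struct Event {
    std::vector<Hit>   hitList;     // owned hits
    std::vector<Track> trackList;   // owned tracks, referencing the hits
};

int main() {
    Event ev;
    ev.hitList = {{0.f, 0.f, 0.f}, {1.f, 2.f, 3.f}};

    Track t;
    t.pt = 42.f;
    for (const Hit& h : ev.hitList) t.hits.push_back(&h);  // build relations
    ev.trackList.push_back(t);

    std::printf("event has %zu tracks; first track has %zu hits\n",
                ev.trackList.size(), ev.trackList[0].hits.size());
    return 0;
}
```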
19
CERN/IT/DB World-wide collaboration, distributed computing & storage capacity
- CMS: 1800 physicists, 150 institutes, 32 countries
20
CERN/IT/DB Why Regional Centres?
- Exploit established computing expertise & infrastructure in national labs, universities
- Reduce dependence on links to CERN: full ESD available nearby - through a fat, fast, reliable network link
- Tap funding sources not otherwise available to HEP
- Devolve control over resource allocation: national interests? regional interests? at the expense of physics interests?
21
CERN/IT/DB GriPhyN: focus on university-based Tier2 centers
- Experiment: bunch crossing every 25 nsecs; 100 triggers per second; event is ~1 MByte in size; ~PBytes/sec off the detector into the online system
- Tier 0+1: offline farm, CERN Computer Center (> 20 TIPS); fed from the online system at ~100 MBytes/sec
- Tier 1: regional centres (France Centre, FNAL Center, Italy Center, UK Center); linked to CERN at ~2.4 Gbits/sec (~0.6-2.5 Gbits/sec + air freight)
- Tier 2: Tier2 centers; ~622 Mbits/sec links
- Tier 3: institutes (~0.25 TIPS, physics data cache); 100-1000 Mbits/sec links
- Tier 4: physicists' workstations
- Physicists work on analysis "channels"; each institute has ~10 physicists working on one or more channels
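To see why "+ air freight" appears next to the network links, the sketch below estimates how long ~1 PB (a year of raw data, from the earlier slides) would take to move over the quoted WAN bandwidths, assuming fully saturated links (my assumption).

```cpp
// Back-of-envelope transfer times for ~1 PB over the tier-diagram links.
#include <cstdio>

int main() {
    const double datasetBytes = 1e15;  // ~1 PB/year of raw data (earlier slides)
    struct Link { const char* name; double gbps; };
    const Link links[] = {
        {"CERN -> Tier 1  (~2.4 Gbits/sec)",  2.4},
        {"Tier 1 -> Tier 2 (~622 Mbits/sec)", 0.622},
    };
    for (const Link& l : links) {
        const double seconds = datasetBytes * 8.0 / (l.gbps * 1e9);
        std::printf("%-35s ~%.0f days at full, uninterrupted rate\n",
                    l.name, seconds / 86400.0);
    }
    return 0;
}
```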
22
CERN/IT/DB Physics Data Overview
- Access patterns, users, schema, and replication policy differ significantly between the data types
- Why not store them in different databases? (see the sketch after this list)
  - Raw data databases + raw schema
  - Tag data databases + tag schema
  - ...
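A hedged illustration of this split: one database (and schema, and replication policy) per data type. RAW's replication figures come from the earlier slide; the ESD/AOD/TAG policies and the database names are assumptions for illustration only.

```cpp
// Hypothetical per-data-type database catalogue, summarising the idea of
// separate databases and schemas tuned to each type's access pattern.
#include <cstdio>

struct DataClass {
    const char* name;
    const char* database;     // separate database + schema per type (illustrative names)
    const char* replication;  // where copies live (RAW from the slides; others assumed)
    const char* accessPattern;
};

int main() {
    const DataClass classes[] = {
        {"RAW", "raw_db", "100% at CERN, ~20% at ~5 regional centres", "sequential, ~once/year"},
        {"ESD", "esd_db", "regional centres (assumed)",                "re-reconstruction, AOD production"},
        {"AOD", "aod_db", "widely replicated (assumed)",               "random, many users"},
        {"TAG", "tag_db", "replicated everywhere (assumed)",           "chaotic, 100-1000 users"},
    };
    for (const DataClass& c : classes)
        std::printf("%-4s -> %-7s | replication: %-42s | access: %s\n",
                    c.name, c.database, c.replication, c.accessPattern);
    return 0;
}
```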