1 CERN/IT/DB Multi-PB Distributed Databases Jamie Shiers IT Division, DB Group, CERN, Geneva, Switzerland February 2001

2 CERN - The European Organisation for Nuclear Research
The European Laboratory for Particle Physics
- Fundamental research in particle physics
- Designs, builds & operates large accelerators
- Financed by 20 European countries
- SFR 950M budget - operation + new accelerators
- 3000 staff + 6000 users (researchers) from all over the world
- Experiments conducted by a small number of large collaborations:
  - LEP (current) experiment: 500 physicists, 50 universities, 20 countries, apparatus costing SFR 100M
  - LHC (future, ~2006) experiment: 2000 physicists, 150 universities, apparatus costing SFR 500M

3 CERN/IT/DB [Aerial view: Geneva, the airport, the CERN Computer Centre (Centre de Calcul) and the 27 km ring]

4 CERN/IT/DB The LEP tunnel

5 CERN/IT/DB The LHC machine in the LEP tunnel
- Two counter-circulating proton beams
- Collision energy 7 + 7 TeV
- 27 km of magnets with a field of 8.4 Tesla
- Superfluid helium cooled to 1.9 K
- The world's largest superconducting structure

6 CERN/IT/DB The LHC Detectors CMS ATLAS LHCb

7 CERN/IT/DB The ATLAS detector – the size of a six-floor building!

8 CERN/IT/DB Online system: a multi-level trigger filters out background and reduces the data volume from 40 TB/s to 100 MB/s
- level 1 - special hardware
- level 2 - embedded processors
- level 3 - PCs
Rates through the chain: 40 MHz (40 TB/sec) → 75 kHz (75 GB/sec) → 5 kHz (5 GB/sec) → 100 Hz (100 MB/sec) → data recording & offline analysis
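
As a rough cross-check of the figures above, each rate in the chain is consistent with the nominal event size of ~1 MB quoted later in this talk; the sketch below simply multiplies rate by event size and is not part of the original presentation.

```cpp
#include <cstdio>

// Rough cross-check of the trigger-chain figures: data rate = event rate x event size.
// The ~1 MB average event size is an assumption (consistent with the Tier-model slide).
int main() {
    const double event_size_MB = 1.0;  // assumed average event size
    const struct { const char* stage; double rate_hz; } chain[] = {
        {"level 1 input  ", 40e6},   // 40 MHz  -> ~40 TB/s
        {"level 2 input  ", 75e3},   // 75 kHz  -> ~75 GB/s
        {"level 3 input  ", 5e3},    //  5 kHz  -> ~ 5 GB/s
        {"data recording ", 100}     // 100 Hz  -> ~100 MB/s
    };
    for (const auto& c : chain)
        std::printf("%s %12.0f Hz  ->  %g MB/s\n", c.stage, c.rate_hz, c.rate_hz * event_size_MB);
    return 0;
}
```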

9 CERN/IT/DB Event Filter & Reconstruction (figures are for one experiment)
Data flow: detector → event builder (switch) → high-speed network → computer farm → tape and disk servers
- input: 5-100 GB/sec
- farm capacity: 50K SI95 (~4K 1999 PCs)
- recording rate: 100 MB/sec (ALICE - 1 GB/sec)
- raw data: 1-1.25 PetaBytes/year (20,000 Redwood cartridges every year, + copy)
- summary data: 1-500 TB/year

10 CERN/IT/DB Analysis Model
- Reconstruction: re-processing 3 per year; new detector calibrations or understanding; experiment-wide activity (10^9 events); 1000 SI95sec/event; ~3 jobs per year
- Selection: iterative selection, once per month; trigger-based and physics-based refinements; ~20 groups' activity (10^9 → 10^7 events); 25 SI95sec/event; ~20 jobs per month
- Analysis: different physics cuts & MC comparison, ~4 times per day; algorithms applied to data to get results; ~25 individuals per group activity (10^6-10^8 events); 3 SI95sec/event; ~2000 jobs per day
- Monte Carlo: 5000 SI95sec/event
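
A back-of-the-envelope check of the reconstruction row, assuming (my assumption, not a figure from the talk) that the ~50K SI95 farm from the Event Filter slide is fully devoted to the task:

```cpp
#include <cstdio>

// Back-of-the-envelope check of one reconstruction pass, using the figures on
// this slide and the ~50K SI95 farm capacity quoted on the Event Filter slide.
// Assumption (mine): the whole farm works on reconstruction.
int main() {
    const double events          = 1e9;     // experiment-wide activity
    const double si95s_per_event = 1000.0;  // reconstruction cost per event
    const double farm_si95       = 50e3;    // farm capacity in SI95 units
    const double seconds = events * si95s_per_event / farm_si95;
    std::printf("one pass: %.2e s  (~%.0f days)\n", seconds, seconds / 86400.0);
    return 0;
}
```

One pass over 10^9 events then takes roughly 2x10^7 seconds, i.e. on the order of 230 days of farm time, which is why reconstruction is treated as an experiment-wide, roughly continuous activity rather than an occasional job.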

11 CERN/IT/DB LHC Data Volumes – references exist within but not between events (an event is ~1 MB)

12 CERN/IT/DB Data Types: RAW
- Accessed sequentially, ~once per year + occasionally "random" access
- 100% at CERN site / 20% at ~5 regional centres
- "Simple" data model - largely dictated by detector hardware - typically large arrays of packed numbers
- Possible schema changes ~annually
- Could (easily?) be partitioned by time interval (see the sketch below)
- Main problem is sheer volume - cannot assume that all data can reside on-line
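
To make the time-interval partitioning idea concrete, here is a minimal hypothetical sketch: the function name and the one-partition-per-month scheme are invented for illustration and are not taken from the talk.

```cpp
#include <cstdio>
#include <ctime>
#include <string>

// Hypothetical illustration of partitioning RAW data by time interval: derive a
// partition/database name from the event timestamp so that whole intervals can
// be staged on- or off-line as a unit. The naming scheme is illustrative only.
std::string raw_partition_for(std::time_t event_time) {
    const std::tm* t = std::gmtime(&event_time);
    char name[32];
    // one partition per calendar month of data taking (illustrative choice)
    std::snprintf(name, sizeof(name), "RAW_%04d_%02d", t->tm_year + 1900, t->tm_mon + 1);
    return name;
}

int main() {
    std::time_t now = std::time(nullptr);
    std::printf("event would be written to partition %s\n", raw_partition_for(now).c_str());
    return 0;
}
```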

13 CERN/IT/DB Data Types: ESD
- "Summary data" - output of the reconstruction process
- Used for event display, re-reconstruction and generation of AOD
- Complex data model - several hundred classes; variable-length data structures; many references

14 CERN/IT/DB Data Types: AOD
- "Physics" quantities - particle tracks, energy clusters etc.
- Relatively simple(?) data model but subject to frequent schema changes
- Accessed randomly by large numbers of users worldwide

15 CERN/IT/DB Data Types: TAG
- Used for pre-selection - the tag stores key characteristics of each event, some 10-100 words per event (see the sketch below)
- Updated ~monthly; perhaps much more often
- Accessed in "chaotic" fashion, world-wide, by 100-1000 users
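
A minimal sketch of what a TAG record and a pre-selection loop over it might look like; the field names and the cuts are invented for illustration, not taken from the experiments' actual tag definitions.

```cpp
#include <cstdint>
#include <cstdio>
#include <utility>
#include <vector>

// Hypothetical TAG record: a few tens of words summarising one event, used to
// decide which events are worth fetching from the much larger ESD/AOD stores.
struct EventTag {
    std::uint32_t run;
    std::uint32_t event;
    float         missing_et;    // GeV
    float         leading_pt;    // GeV
    std::uint16_t n_jets;
    std::uint16_t n_leptons;
    std::uint32_t trigger_bits;
};

// Pre-selection: scan the small TAG collection and keep the identifiers of
// events passing the cuts; only those events are then read from ESD/AOD.
std::vector<std::pair<std::uint32_t, std::uint32_t>>
preselect(const std::vector<EventTag>& tags) {
    std::vector<std::pair<std::uint32_t, std::uint32_t>> selected;
    for (const auto& t : tags)
        if (t.n_leptons >= 2 && t.missing_et > 50.0f)   // illustrative cut
            selected.push_back({t.run, t.event});
    return selected;
}

int main() {
    std::vector<EventTag> tags = {
        {1, 1, 60.0f, 80.0f, 2, 2, 0x1},   // passes
        {1, 2, 10.0f, 25.0f, 1, 0, 0x2},   // fails
    };
    std::printf("%zu of %zu events selected\n", preselect(tags).size(), tags.size());
    return 0;
}
```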

16 CERN/IT/DB Data Processing Tasks
- Reconstruction:
  - support multiple writers (100-500 filter farm CPUs)
  - support aggregate data rates sufficient to keep up with acquisition: 100 MB/s for ATLAS/CMS; 1.5 GB/s for ALICE (a rough per-writer estimate follows below)
- Simulation:
  - very CPU-intensive process but no "special" requirements
- Analysis:
  - support multiple readers (~150 concurrent users)
  - data volumes and rates hard to predict at this time
  - assume aggregate data rates comparable to those of reconstruction
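
For a rough feel of what "multiple writers" implies, the sketch below divides the 100 MB/s aggregate rate evenly across the quoted 100-500 filter-farm CPUs; the even split is my assumption, not a figure from the talk.

```cpp
#include <cstdio>

// Rough feel for the reconstruction writing load: the aggregate ATLAS/CMS
// rate on this slide divided across the quoted number of filter-farm CPUs.
// Assumption (mine): the load is shared evenly across writers.
int main() {
    const double aggregate_MBps = 100.0;   // ATLAS/CMS figure from this slide
    for (int writers : {100, 500})
        std::printf("%3d writers -> %.1f MB/s each\n", writers, aggregate_MBps / writers);
    return 0;
}
```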

17 CERN/IT/DB

18 HEP Data Models
- HEP data models are complex!
  - Typically hundreds (500-1000) of structure types (classes)
  - Many relations between them
  - Different access patterns
- LHC experiments rely on OO technology
  - OO applications deal with networks of objects
  - Pointers (or references) are used to describe relations (a minimal sketch follows below)
[Object-network diagram: an Event references a TrackList and tracker/calorimeter structures; each Track references a HitList of Hits]
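
A minimal C++ sketch of the kind of object network shown in the diagram: the class names follow the diagram (Event, Track, Hit), everything else is illustrative; in a persistent object store the raw pointers below would become object references that survive across processes.

```cpp
#include <memory>
#include <vector>

// Illustrative sketch of the object network in the diagram: an Event owns a
// TrackList and a HitList; each Track refers to the Hits it was built from.
struct Hit {
    float x, y, z;                   // measured position
};

struct Track {
    std::vector<const Hit*> hits;    // references into the event's hit list
    float pt;                        // reconstructed transverse momentum
};

struct Event {
    std::vector<Hit>   hits;         // "HitList"
    std::vector<Track> tracks;       // "TrackList"
};

int main() {
    Event ev;
    ev.hits.push_back({1.0f, 2.0f, 3.0f});
    ev.hits.push_back({1.1f, 2.1f, 3.1f});

    Track t;
    t.hits = {&ev.hits[0], &ev.hits[1]};   // relations expressed as references
    t.pt = 12.5f;
    ev.tracks.push_back(t);
    return 0;
}
```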

19 CERN/IT/DB CMS: 1800 physicists, 150 institutes, 32 countries. World-wide collaboration → distributed computing & storage capacity

20 CERN/IT/DB Why Regional Centres?
- Exploit established computing expertise & infrastructure in national labs and universities
- Reduce dependence on links to CERN: full ESD available nearby - through a fat, fast, reliable network link
- Tap funding sources not otherwise available to HEP
- Devolve control over resource allocation: national interests? regional interests? at the expense of physics interests?

21 CERN/IT/DB GriPhyN: focus on university-based Tier2 centers (tier model for one experiment)
[Tier-model diagram, roughly as follows]
- Experiment → Online System at ~PBytes/sec (bunch crossing every 25 nsecs, 100 triggers per second, events ~1 MByte in size)
- Online System → Offline Farm / CERN Computer Center, Tier 0+1 (> 20 TIPS) at ~100 MBytes/sec
- Tier 1: regional centres (France, FNAL, Italy, UK); links of ~2.4 Gbits/sec and ~0.6-2.5 Gbits/sec (+ air freight) to/from CERN
- Tier 2: Tier2 centers, ~622 Mbits/sec links
- Tier 3: institutes (~0.25 TIPS, physics data cache), 100-1000 Mbits/sec links
- Tier 4: workstations
- Physicists work on analysis "channels"; each institute has ~10 physicists working on one or more channels

22 CERN/IT/DB Physics Data Overview
- Access patterns, users, schema and replication policy differ significantly between the data types
- Why not store them in different databases? Raw data databases + raw schema; tag data databases + tag schema; ... (a minimal sketch follows below)
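
A minimal sketch of the "one database per data type" idea; the database names and the routing function are invented for illustration and do not refer to any particular product.

```cpp
#include <cstdio>
#include <map>
#include <stdexcept>
#include <string>

// Illustration of the idea on this slide: each data type gets its own database,
// and with it its own schema, replication policy and access pattern.
enum class DataType { RAW, ESD, AOD, TAG };

std::string database_for(DataType t) {
    static const std::map<DataType, std::string> dbs = {
        {DataType::RAW, "raw_db"},   // huge, sequential access, mostly at CERN
        {DataType::ESD, "esd_db"},   // complex schema, re-reconstruction input
        {DataType::AOD, "aod_db"},   // simpler schema, frequent schema changes
        {DataType::TAG, "tag_db"},   // tiny, "chaotic" world-wide access
    };
    const auto it = dbs.find(t);
    if (it == dbs.end()) throw std::logic_error("unknown data type");
    return it->second;
}

int main() {
    std::printf("TAG data goes to %s\n", database_for(DataType::TAG).c_str());
    return 0;
}
```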

