CERN/IT/DB Multi-PB Distributed Databases Jamie Shiers IT Division, DB Group, CERN, Geneva, Switzerland February 2001

CERN - The European Organisation for Nuclear Research
The European Laboratory for Particle Physics
- Fundamental research in particle physics
- Designs, builds & operates large accelerators
- Financed by 20 European countries
- SFR 950M budget - operation + new accelerators
- 3000 staff, plus users (researchers) from all over the world
- Experiments conducted by a small number of large collaborations:
  - LEP (current) experiment: 500 physicists, 50 universities, 20 countries, apparatus cost SFR 100M
  - LHC (future, ~2006) experiment: 2000 physicists, 150 universities, apparatus costing SFR 500M

CERN/IT/DB [Aerial view of the CERN site: the 27 km ring between Geneva (Genève) and the airport (aéroport), with the Computer Centre (Centre de Calcul) marked]

CERN/IT/DB The LEP tunnel

CERN/IT/DB The LHC machine in the LEP tunnel
- Two counter-circulating proton beams
- Collision energy of 14 TeV
- 27 km of magnets with a field of 8.4 Tesla
- Super-fluid helium cooled to 1.9 K
- The world's largest superconducting structure

CERN/IT/DB The LHC Detectors CMS ATLAS LHCb

CERN/IT/DB The ATLAS detector – the size of a six-floor building!

CERN/IT/DB Online system: multi-level trigger
- Filters out background, reducing the data volume from 40 TB/s to 100 MB/s
- Level 1 - special hardware: 40 MHz (40 TB/sec)
- Level 2 - embedded processors: 75 kHz (75 GB/sec)
- Level 3 - PCs: 5 kHz (5 GB/sec)
- 100 Hz (100 MB/sec) to data recording & offline analysis
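
The reduction factors implied by these figures can be cross-checked with simple arithmetic. The sketch below (Python, purely illustrative) multiplies the rate at each stage by an assumed ~1 MB event size to recover the quoted data rates; the rates themselves come from the slide, the event size is an assumption.

    # Simple cross-check of the data rates quoted above, assuming an event
    # size of ~1 MB (an assumption; the stage rates are from the slide).
    EVENT_SIZE_MB = 1.0

    stages = [
        ("input to level 1 (special hardware)", 40_000_000),  # 40 MHz off the detector
        ("input to level 2 (embedded processors)", 75_000),   # 75 kHz
        ("input to level 3 (PC farm)", 5_000),                # 5 kHz
        ("data recording & offline analysis", 100),           # 100 Hz
    ]

    for name, rate_hz in stages:
        mb_per_sec = rate_hz * EVENT_SIZE_MB
        print(f"{name:40s} {rate_hz:>12,} Hz  ->  {mb_per_sec:>12,.0f} MB/s")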

CERN/IT/DB Event Filter & Reconstruction (figures are for one experiment)
- Data flow: data from detector -> event builder (switch) -> high-speed network -> computer farm -> tape and disk servers (raw data + summary data)
- Input: GB/sec
- Capacity: 50K SI95 (~4K 1999 PCs)
- Recording rate: 100 MB/sec (ALICE - 1 GB/sec)
- PetaByte/year (raw data) and TB/year (summary data)
- 20,000 Redwood cartridges every year (+ copy)
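
A back-of-the-envelope check of the annual volume, assuming (these are assumptions, not slide figures) roughly 1e7 seconds of effective data taking per year and ~50 GB per Redwood cartridge:

    # Rough check of the yearly data volume implied by the 100 MB/s recording rate.
    RECORDING_RATE_MB_S = 100          # quoted recording rate (ATLAS/CMS)
    SECONDS_PER_YEAR = 1e7             # assumed effective running time per year
    CARTRIDGE_GB = 50                  # assumed Redwood cartridge capacity

    raw_pb_per_year = RECORDING_RATE_MB_S * SECONDS_PER_YEAR / 1e9   # MB -> PB
    cartridges = raw_pb_per_year * 1e6 / CARTRIDGE_GB                # PB -> GB -> cartridges

    print(f"raw data per year : ~{raw_pb_per_year:.1f} PB")
    print(f"Redwood cartridges: ~{cartridges:,.0f} per year (before the copy)")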

CERN/IT/DB Analysis Model
- Reconstruction (re-processing 3 per year): experiment-wide activity (10^9 events); driven by new detector calibrations or understanding; 1000 SI95 sec/event; 1 job/year (first pass) plus 3 jobs per year (re-processing)
- Selection (iterative selection, once per month): ~20 groups' activity (10^9 -> 10^7 events); trigger-based and physics-based refinements; 25 SI95 sec/event; ~20 jobs per month
- Analysis (different physics cuts & MC comparison, ~4 times per day): ~25 individuals per group activity (10^6–10^8 events); algorithms applied to data to get results; 3 SI95 sec/event; ~2000 jobs per day
- Monte Carlo: 5000 SI95 sec/event
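
The per-event costs translate directly into CPU demand. The sketch below is a straight multiplication of the figures quoted above, nothing more; the Monte Carlo sample size (1e9 events per year) is an assumption added for illustration.

    # Annual CPU demand implied by the quoted per-event costs and frequencies.
    SECONDS_PER_YEAR = 3.15e7

    activities = [
        # (name, events per pass, SI95-sec per event, passes per year)
        ("re-processing", 1e9, 1000, 3),
        ("monte carlo",   1e9, 5000, 1),   # assumed: one full simulated sample per year
    ]

    for name, events, cost, passes in activities:
        total = events * cost * passes          # SI95-seconds per year
        sustained = total / SECONDS_PER_YEAR    # equivalent SI95 power running flat out
        print(f"{name:14s} ~{total:.1e} SI95-sec/year  (~{sustained:,.0f} SI95 sustained)")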

CERN/IT/DB LHC Data Volumes - references within, but not between, events (~1 MB each)

CERN/IT/DB Data Types: RAW
- Accessed sequentially, ~once per year, plus occasional "random" access
- 100% at the CERN site / 20% at ~5 regional centres
- "Simple" data model - largely dictated by detector hardware - typically large arrays of packed numbers
- Possible schema changes ~annually
- Could (easily?) be partitioned by time interval (see the sketch below)
- Main problem is sheer volume - cannot assume that all data can reside on-line
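
One way to picture the "partitioned by time interval" point: each (day, run) pair maps to its own partition, so a yearly sequential pass streams one partition after another and only the partitions currently being read need to be staged on-line. A minimal sketch, with invented names, nothing experiment-specific:

    import datetime

    def raw_partition(run_number: int, timestamp: datetime.datetime) -> str:
        """Return the (hypothetical) partition holding RAW events for this run and day."""
        return f"raw_{timestamp:%Y%m%d}_run{run_number:06d}"

    # An event taken during run 1234 on 1 February 2001:
    print(raw_partition(1234, datetime.datetime(2001, 2, 1, 12, 30)))
    # -> raw_20010201_run001234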

CERN/IT/DB Data Types: ESD
- "Summary data" - output of the reconstruction process
- Used for event display, re-reconstruction and generation of AOD
- Complex data model - several hundred classes; variable-length data structures; many references

CERN/IT/DB Data Types: AOD
- "Physics" quantities - particle tracks, energy clusters etc.
- Relatively simple(?) data model, but subject to frequent schema changes
- Accessed randomly by large numbers of users worldwide
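
Frequent schema changes usually mean that readers must cope with several stored layouts at once. A hypothetical sketch of one common approach, with invented field names and version numbers:

    # Each stored record carries a schema version; the reader upgrades old
    # layouts on the fly so analysis code only ever sees the current one.
    def read_aod(record: dict) -> dict:
        """Return an AOD record in the current (v2) layout, whatever was stored."""
        if record.get("schema_version", 1) == 1:
            # v1 stored only 'pt'; v2 adds 'eta', unknown for old records.
            record = {"schema_version": 2, "pt": record["pt"], "eta": None}
        return record

    print(read_aod({"schema_version": 1, "pt": 45.3}))
    # -> {'schema_version': 2, 'pt': 45.3, 'eta': None}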

CERN/IT/DB Data Types: TAG
- Used for pre-selection - the tag stores key characteristics of each event, some words per event
- Updated ~monthly, perhaps much more often
- Accessed in "chaotic" fashion, world-wide, by users
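
The pre-selection idea: analysis jobs scan only the small tag records to decide which events are worth fetching in full from ESD/AOD. A minimal sketch, with invented attribute names rather than any real tag schema:

    from dataclasses import dataclass

    @dataclass
    class EventTag:
        event_id: int
        n_muons: int        # illustrative tag attributes only
        missing_et: float   # GeV

    tags = [
        EventTag(1, 0,  12.0),
        EventTag(2, 2,  55.0),
        EventTag(3, 1, 140.0),
    ]

    # A "cut" expressed against tag attributes alone.
    selected = [t.event_id for t in tags if t.n_muons >= 1 and t.missing_et > 50.0]
    print(selected)   # only these events would be read from ESD/AOD storage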

CERN/IT/DB Data Processing Tasks
- Reconstruction:
  - support multiple writers (filter farm CPUs)
  - support aggregate data rates sufficient to keep up with acquisition: 100 MB/s for ATLAS/CMS; 1.5 GB/s for ALICE
- Simulation:
  - very CPU-intensive process, but no "special" requirements
- Analysis:
  - support multiple readers (~150 concurrent users)
  - data volumes and rates hard to predict at this time
  - assume aggregate data rates comparable to reconstruction
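
A rough sizing of the writer side of this requirement, assuming (hypothetically) ~5 MB/s sustained per writer stream; the aggregate rates are the ones quoted above:

    PER_WRITER_MB_S = 5.0   # assumed sustained rate of a single writer stream

    for experiment, aggregate_mb_s in [("ATLAS/CMS", 100.0), ("ALICE", 1500.0)]:
        writers = aggregate_mb_s / PER_WRITER_MB_S
        print(f"{experiment:10s} {aggregate_mb_s:>7.0f} MB/s  ->  ~{writers:.0f} concurrent writers")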


HEP Data Models
- HEP data models are complex!
- Typically hundreds of structure types (classes)
- Many relations between them
- Different access patterns
- LHC experiments rely on OO technology
- OO applications deal with networks of objects
- Pointers (or references) are used to describe relations
[Diagram: an Event references Tracker and Calor. data and a TrackList; the TrackList references Track objects; each Track references a HitList of Hit objects]
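
The object network in the diagram can be sketched directly in code. This is a minimal illustration of the idea, not any experiment's actual schema; class and attribute names are invented:

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Hit:
        detector: str      # e.g. "Tracker" or "Calor."
        amplitude: float

    @dataclass
    class Track:
        hits: List[Hit] = field(default_factory=list)      # Track -> HitList -> Hit

    @dataclass
    class Event:
        event_id: int
        tracks: List[Track] = field(default_factory=list)  # Event -> TrackList -> Track

    # Navigation follows object references rather than relational-style joins:
    event = Event(42, [Track([Hit("Tracker", 1.2), Hit("Calor.", 3.4)])])
    n_hits = sum(len(t.hits) for t in event.tracks)
    print(n_hits)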

CERN/IT/DB World Wide Collaboration => distributed computing & storage capacity
CMS: 1800 physicists, 150 institutes, 32 countries

CERN/IT/DB Why Regional Centres?
- Exploit established computing expertise & infrastructure in national labs and universities
- Reduce dependence on links to CERN: full ESD available nearby, through a fat, fast, reliable network link
- Tap funding sources not otherwise available to HEP
- Devolve control over resource allocation: national interests? regional interests? at the expense of physics interests?

CERN/IT/DB GriPhyN: focus on university-based Tier 2 centers
Tiered computing model (one experiment):
- Experiment / Online System: one bunch crossing per 25 nsec; 100 triggers per second; each event is ~1 MByte in size; ~PBytes/sec off the detector, ~100 MBytes/sec to the offline farm
- Tier 0+1: Offline Farm and CERN Computer Center, >20 TIPS
- Tier 1: regional centres (FNAL Center, France Centre, Italy Center, UK Center), linked to CERN at ~2.4 Gbits/sec (+ air freight)
- Tier 2: Tier 2 centers, linked at ~622 Mbits/sec
- Tier 3: institutes (~0.25 TIPS) holding a physics data cache, linked at Mbits/sec speeds
- Tier 4: workstations
- Physicists work on analysis "channels"; each institute has ~10 physicists working on one or more channels
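
To get a feel for the quoted link speeds, the sketch below computes how long a dataset would take to move over an ideal, fully dedicated link; the 10 TB dataset size is an arbitrary example and real transfers would of course be slower:

    DATASET_TB = 10.0   # hypothetical dataset size

    links = [
        ("CERN -> Tier 1",   2.4e9),   # ~2.4 Gbits/sec
        ("Tier 1 -> Tier 2", 622e6),   # ~622 Mbits/sec
    ]

    for name, bits_per_sec in links:
        seconds = DATASET_TB * 1e12 * 8 / bits_per_sec   # TB -> bits -> seconds
        print(f"{name:18s} ~{seconds / 3600:.1f} hours for {DATASET_TB:.0f} TB")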

CERN/IT/DB Physics Data Overview
- Access patterns, users, schema and replication policy differ significantly from one data type to another
- Why not store them in different databases?
  - Raw data databases + raw schema
  - Tag data databases + tag schema
  - ...
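
One way to picture the "different databases per data type" idea is a simple routing table: each data type gets its own set of databases and its own schema, and a request is directed accordingly. The names below are invented for illustration only:

    DATABASES = {
        "RAW": {"schema": "raw_schema_v1", "databases": ["raw_2006a", "raw_2006b"]},
        "ESD": {"schema": "esd_schema_v3", "databases": ["esd_pass1", "esd_pass2"]},
        "AOD": {"schema": "aod_schema_v7", "databases": ["aod_current"]},
        "TAG": {"schema": "tag_schema_v2", "databases": ["tag_current"]},
    }

    def route(data_type: str) -> dict:
        """Return the schema and database set serving the given data type."""
        return DATABASES[data_type]

    print(route("TAG"))   # analysis pre-selection only ever touches the TAG databases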