
LHC experimental data: From today's Data Challenges to the promise of tomorrow
B. Panzer (CERN/IT), F. Rademakers (CERN/EP), P. Vande Vyvre (CERN/EP)
CERN Academic Training

Computing Infrastructure and Technology (Day 2)
Academic Training, CERN, May 2003
Bernd Panzer-Steindel, CERN-IT

Outline
– tasks, requirements, boundary conditions
– component technologies
– building farms and the fabric
– into the future

Questions
Before building a computing infrastructure, some questions need to be answered:
– what are the tasks?
– what is the dataflow?
– what are the requirements?
– what are the boundary conditions?

Experiment dataflow (diagram): Data Acquisition feeds the High Level Trigger (selection, reconstruction), which delivers the Raw Data; Event reconstruction turns Raw Data into Event Summary Data and Processed Data; Event Simulation provides additional input; interactive physics analysis works on the processed output.

Tasks (overview): online data processing (detector channel digitization, Level 1 and Level 2 trigger, event building, High Level Trigger), data storage, data calibration, offline data reprocessing, offline data analysis, interactive data analysis and visualization, and simulated data production (Monte Carlo), all leading to the physics result.

Dataflow Examples (diagram): data flows between the DAQ, CPU servers, disk servers and tape servers for Central Data Recording, online processing, online filtering, re-processing, MC production + pile-up, and analysis; the quoted rates are 1 GB/s, 2 GB/s, 5 GB/s, 50 GB/s and 100 GB/s (CPU-intensive scenario for 2008).
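
To give a feeling for what such rates mean at the network level, here is a minimal sketch converting an aggregate stream into a number of Gigabit Ethernet links; the usable throughput per link (~100 MB/s) and the assignment of rates to streams are assumptions for illustration, not figures from the diagram.

import math

USABLE_MB_PER_GIGE_LINK = 100.0  # assumed usable payload rate per Gigabit Ethernet link, MB/s

def gige_links_needed(rate_gb_per_s: float) -> int:
    """Number of Gigabit Ethernet links needed to carry an aggregate stream."""
    rate_mb_per_s = rate_gb_per_s * 1000.0
    return math.ceil(rate_mb_per_s / USABLE_MB_PER_GIGE_LINK)

# Rates from the 2008 CPU-intensive scenario (mapping to streams is illustrative only):
for label, rate in [("DAQ/recording", 1), ("online filtering", 2),
                    ("re-processing", 5), ("disk traffic", 50), ("analysis", 100)]:
    print(f"{label:>16}: {rate:>4} GB/s -> {gige_links_needed(rate)} GigE links")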

Requirements and Boundaries (I)
– the HEP applications require mainly integer processor performance and less floating-point performance → choice of processor type, benchmark reference
– a large amount of processing and storage is needed, but the optimization is for aggregate performance, not for the single tasks, and the events are independent units → many components, moderate demands on each single component, coarse-grain parallelism

Requirements and Boundaries (II)
– the major boundary condition is cost: staying within the budget envelope while obtaining the maximum amount of resources → commodity equipment, best price/performance values ≠ cheapest! Reliability, functionality and performance have to be taken into account together == total cost of ownership
– basic infrastructure and environment: availability of space, cooling and electricity
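
A minimal sketch of the total-cost-of-ownership argument, with purely illustrative numbers (purchase price, power draw, tariff, failure rate and delivered SI2000 are assumptions, not figures from the talk): the cheapest box per unit is not necessarily the cheapest per delivered SI2000 over its lifetime.

def cost_per_si2000(purchase_chf, watts, si2000, annual_failure_rate,
                    years=3, chf_per_kwh=0.10, repair_chf=300):
    """Rough lifetime cost per SI2000: purchase + electricity + expected repairs."""
    electricity = watts / 1000.0 * 24 * 365 * years * chf_per_kwh
    repairs = annual_failure_rate * years * repair_chf
    return (purchase_chf + electricity + repairs) / si2000

# Two hypothetical offers: a cheap box versus a dearer but faster, more reliable one.
cheap  = cost_per_si2000(purchase_chf=2000, watts=250, si2000=1500, annual_failure_rate=0.15)
better = cost_per_si2000(purchase_chf=2400, watts=220, si2000=1800, annual_failure_rate=0.05)
print(f"cheap box : {cheap:.2f} CHF per SI2000 over 3 years")
print(f"better box: {better:.2f} CHF per SI2000 over 3 years")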

Component technologies
– processor
– disk
– tape
– network
– and packaging issues

Coupling of building blocks (diagram): increasing levels of physical and logical coupling, from CPU and disk via the PC (motherboard, backplane, bus, integrating devices such as memory, power supply, controller) and storage trays / NAS servers / SAN elements, through networks (Ethernet, Fibre Channel, Myrinet; hubs, switches, routers) to the cluster, and via the wide-area network to the world-wide cluster. On the software side the coupling goes from the operating system and drivers, through batch systems, load balancing, control software and hierarchical storage systems, up to Grid middleware.

Processors
– focus on integer price/performance (SI2000)
– PC mass market: INTEL and AMD
– the price/performance optimum changes frequently between the two; weak points of AMD: heat protection, heat production
– the current CERN strategy is to use INTEL processors

Price/performance evolution (chart)

Industry now tries to fulfill Moore's Law (chart)

Processor packaging
– the best price/performance per node comes today with dual-processor, desk-side cases; the processors are only 25-30% of the box cost (mainboard, memory, power supply, case, disk)
– today a typical configuration is: 2 x 2.4 GHz Pentium 4 processors, 1 GB memory, 80 GB disk, Fast Ethernet
– this is about two 'versions' behind the top == 2.8 GHz and 3 GHz are available but do not give a good price/performance value
– one has to add about 10% of the box cost for infrastructure (racks, cabling, network, control system)
– 1U rack-mounted case versus desk-side case: thin units can be up to 30% more expensive → cooling and space
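
To make the cost split concrete, a small worked sketch using the fractions quoted above (processors ~25-30% of the box, ~10% extra for racks, cabling, network and controls, 1U units up to 30% dearer); the absolute processor price is an illustrative assumption.

cpu_cost = 2 * 250.0           # assumed price of the two processors, CHF (illustrative)
cpu_fraction = 0.275           # processors are ~25-30% of the box cost (slide figure)

box_cost = cpu_cost / cpu_fraction     # full desk-side box: mainboard, memory, PSU, case, disk
with_infra = box_cost * 1.10           # add ~10% for racks, cabling, network, control system
as_1u = box_cost * 1.30 * 1.10         # thin 1U unit: up to 30% more per box, plus infrastructure

print(f"desk-side box            : {box_cost:7.0f} CHF")
print(f"  + infrastructure (10%) : {with_infra:7.0f} CHF")
print(f"1U variant + infra       : {as_1u:7.0f} CHF")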

SPACE (photos: the computer center and an experiment control room)

Problems
– we are seeing effects of market saturation for desktops and a move towards laptops; we currently use "desktop+" machines → it is more expensive to use server CPUs
– Moore's Second Law: the cost of a fabrication facility increases at an even greater rate than the transistor density (doubling every 18 months); current fabrication plants cost ~2.5 billion $ (INTEL profit in 2002: 3.2 billion $)
– heat dissipation: currently heat production increases linearly with performance
– terahertz transistors (reduced leakage currents) and power-saving processors are coming, BUT be careful when comparing effective performance: measures for mobile computing do not help in the case of 100% CPU utilization in 24*7 operation

Processor power consumption and heat production (chart)

Basic infrastructure
– electricity and cooling require large investments and a long planning and implementation period
– we use today about 700 kW in the center; an upgrade to 2.5 MW has started, i.e. 2.5 MW of electricity plus the power needed for cooling; extra buildings are needed, it will take several years and cost up to 8 million SFr
– this infrastructure does not evolve linearly but in larger step functions
– it is much more complicated for the experimental areas with their space and access limitations
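
A back-of-the-envelope power budget in the same spirit; the per-node power draw and the cooling overhead factor are assumptions, not numbers from the slide.

nodes = 3000                 # a 2008-scale farm of dual-CPU nodes
watts_per_node = 300.0       # assumed electrical draw per node, including disks and fans
cooling_overhead = 1.0       # assume cooling needs roughly as much power again as the IT load

it_load_mw = nodes * watts_per_node / 1e6
total_mw = it_load_mw * (1 + cooling_overhead)
print(f"IT load     : {it_load_mw:.2f} MW")
print(f"with cooling: {total_mw:.2f} MW")   # to be compared with the planned 2.5 MW upgrade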

Disk storage
– density is improving every year (doubling every ~14 months)
– single-stream speed (sequential I/O) is increasing considerably (up to 100 MB/s)
– transactions per second (random I/O, access time) improve very little (a factor 2 in 4 years, from 8 ms to 4 ms)
– data rates drop considerably when moving from sequential to random I/O; online/offline processing works with sequential streams, while analysis uses random access patterns, and multiple parallel sequential streams behave roughly like random access
– disks come in different 'flavours' (connection type to the host): the same hardware with different electronics (SCSI, IDE, Fibre Channel) and different quality selection criteria → MTBF (Mean Time Between Failures); mass market == lower values
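
The gap between sequential and random I/O follows directly from the access time: every random request pays the positioning penalty before any data moves. A minimal model (the 64 KB block size is an illustrative choice):

def random_io_rate(access_time_s, seq_rate_mb_s, block_kb):
    """Effective MB/s when every block needs a full random access before the transfer."""
    block_mb = block_kb / 1024.0
    time_per_block = access_time_s + block_mb / seq_rate_mb_s
    return block_mb / time_per_block

seq_rate = 100.0   # MB/s, sequential streaming rate quoted above
for access_ms in (8.0, 4.0):   # older vs current access times from the slide
    rate = random_io_rate(access_ms / 1000.0, seq_rate, block_kb=64)
    print(f"{access_ms:.0f} ms access, 64 KB blocks: ~{rate:.1f} MB/s instead of {seq_rate:.0f} MB/s")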

Disk performance (chart)

Price/performance evolution (chart)

Storage density evolution (chart)

Storage packaging
– NAS (Network Attached Storage): IDE disks attached to a RAID controller inside a modified PC with a larger housing, connected to the network with Gigabit Ethernet; good experience with this approach, current practice
– alternatives: SAN (Storage Area Network), based on disks directly attached to a Fibre Channel network, and iSCSI (SCSI commands via IP), disk trays with an iSCSI controller attached to Ethernet → R&D, evaluations
– are there advantages of SAN versus NAS that would justify the higher cost (a factor 2-4)? Not only the 'pure' cost per GB of storage counts, but also throughput, reliability, manageability and redundancy

For disk servers the coupling of disks, processor, memory and network (plus Linux) defines the performance; the internal bus is a key factor: PCI offers 120-500 MB/s, PCI-X 1-8 GB/s.
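
The point that the coupling defines the performance can be expressed as a simple minimum over the components; the per-component numbers below are illustrative assumptions, not measured server figures.

def disk_server_throughput(n_disks, mb_s_per_disk, bus_mb_s, network_mb_s, cpu_limit_mb_s):
    """The slowest element in the chain sets the usable throughput of the server."""
    contenders = {
        "disks (aggregate)": n_disks * mb_s_per_disk,
        "PCI/PCI-X bus": bus_mb_s,
        "network interface": network_mb_s,
        "CPU/memory (protocol + filesystem)": cpu_limit_mb_s,
    }
    bottleneck = min(contenders, key=contenders.get)
    return contenders[bottleneck], bottleneck

rate, limit = disk_server_throughput(n_disks=12, mb_s_per_disk=40,
                                     bus_mb_s=500, network_mb_s=100, cpu_limit_mb_s=250)
print(f"usable throughput ~{rate:.0f} MB/s, limited by the {limit}")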

Tape storage
– not a mass market; aimed at backup (write once, read never), whereas we need high throughput and reliability under constant read/write stress
– we need automated, reliable access to a large amount of data → large robotic installations; the major players are IBM and StorageTek (STK)
– improvements are slow, not comparable with processor or disk trends; the current generation offers 30 MB/s tape drives with 200 GB cartridges
– disk and tape storage prices are getting closer: a factor 2-3 difference
– two types of read/write technology: helical scan ("video recorder", complicated mechanics) and linear scan ("audio recorder", simpler, lower density); linear is preferred, we had some bad experience with helical scan
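
With the drive generation quoted above, a few numbers follow directly; the mount/positioning overhead is an assumed ballpark, the rest is simple division.

drive_mb_s = 30.0          # current-generation drive rate (slide figure)
cartridge_gb = 200.0       # cartridge capacity (slide figure)
mount_position_s = 120.0   # assumed time to mount a cartridge and position the tape

fill_time_h = cartridge_gb * 1000.0 / drive_mb_s / 3600.0
print(f"streaming a full cartridge: ~{fill_time_h:.1f} hours")

# Recalling a single 1 GB file from a random spot is dominated by mount/positioning:
file_gb = 1.0
effective_mb_s = file_gb * 1000.0 / (mount_position_s + file_gb * 1000.0 / drive_mb_s)
print(f"1 GB random recall: effective rate ~{effective_mb_s:.0f} MB/s")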

Network
– commodity Ethernet (10 / 100 / 1000 Mbit/s) is sufficient in the offline world and even partly in the online world (HLT); Level 1 triggers need lower latency
– special networks, cluster interconnects: Myrinet (1, 2, 10 Gbit/s), GSN (6.4 Gbit/s), Infiniband (2.5 Gbit/s x 4 or x 12)
– storage networks: Fibre Channel (1 Gbit/s, 2 Gbit/s)
– these offer very high performance with low latency and a small processor 'footprint', but serve a small market and are expensive

"Exotic" technology trends
– nano technology (carbon nanotubes)
– molecular computing (kilohertz plastic processors, single-molecule switches)
– biological computing (DNA computing)
– quantum computing (quantum dots, ion traps, few qubits only)
– very interesting and fast progress in the last years, but far away from any commodity production
– less fancy: game machines (X-Box, GameCube, PlayStation 2); advantage: large market (>10 billion $), cheap high-power nodes; disadvantage: little memory, limited networking capabilities
– graphics cards have several times the raw power of normal CPUs, but are not easy to use in our environment

Technology evolution: exponential growth rates everywhere (chart)

Building farms and the fabric

Building the Farm
Starting from processors:
– a "desktop+" node == CPU server
– CPU server + larger case + 6*2 disks == disk server
– CPU server + Fibre Channel interface + tape drive == tape server

Software 'glue'
– management of the basic hardware and software: installation, configuration and monitoring system (from the European DataGrid project)
– management of the processor computing resources: batch system (LSF from Platform Computing)
– management of the storage (disk and tape): CASTOR (the CERN-developed Hierarchical Storage Management system)
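
As an illustration of how these pieces are used together, a hedged sketch of submitting a job to LSF that stages its input from CASTOR with rfcp; the queue name, CASTOR path and executable are hypothetical placeholders, and it assumes the LSF client (bsub) and the CASTOR/RFIO tools are installed.

import subprocess

# Hypothetical job: stage a raw file from CASTOR, reconstruct it, write the output back.
job = """
rfcp /castor/cern.ch/user/e/example/run1234.raw .
./reconstruct run1234.raw run1234.esd
rfcp run1234.esd /castor/cern.ch/user/e/example/run1234.esd
"""

# bsub submits to the LSF batch system: -q selects a queue, -o collects the job output.
subprocess.run(["bsub", "-q", "1nd", "-o", "run1234.log", "/bin/sh", "-c", job], check=True)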

Generic model of a Fabric (diagram): application servers, disk servers and tape servers coupled by a network, with a connection to the external network.

Today's schematic network topology (diagram): CPU servers attach with Fast Ethernet (100 Mbit/s), disk and tape servers with Gigabit Ethernet (1000 Mbit/s); the backbone uses multiple Gigabit Ethernet (20 * 1000 Mbit/s) and connects to the WAN via Gigabit Ethernet.

LCG Testbed Structure (diagram): 100 CPU servers on Gigabit Ethernet and 300 on Fast Ethernet (200 FE + 100 FE), 100 disk servers on Gigabit Ethernet (~50 TB, in groups of 64 and 36) and 20 tape servers on Gigabit Ethernet, connected to the backbone routers via 1 GB, 3 GB and 8 GB lines.

Computer center today
– benchmark, performance and testbed clusters (LCG prototype resources): computing data challenges, technology challenges, online tests, EDG testbeds, preparations for the LCG-1 production system, complexity tests; 500 CPU servers, 100 disk servers, ~ SI2000, ~50 TB
– main fabric cluster (Lxbatch/Lxplus resources): physics production for all experiments, requests are made in units of SI2000; 1000 CPU servers, 160 disk servers, ~ SI2000, ~100 TB; 50 tape drives (30 MB/s, 200 GB cartridges), 10 silos with 6000 slots each == 12 PB capacity
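
The tape figures are easy to cross-check (pure arithmetic on the numbers quoted above):

silos, slots_per_silo, cartridge_gb = 10, 6000, 200
drives, drive_mb_s = 50, 30

capacity_pb = silos * slots_per_silo * cartridge_gb / 1e6      # GB -> PB (decimal units)
aggregate_gb_s = drives * drive_mb_s / 1000.0                  # MB/s -> GB/s
print(f"library capacity : {capacity_pb:.0f} PB")              # matches the 12 PB on the slide
print(f"drive bandwidth  : {aggregate_gb_s:.1f} GB/s aggregate")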

General Fabric Layout (diagram): the main fabric cluster (2-3 hardware generations, 2-3 OS/software versions, 4 experiment environments) is surrounded by a certification cluster (the main cluster 'en miniature'), an R&D cluster (new architecture and hardware), a benchmark and performance cluster (current architecture and hardware), and development clusters / GRID testbeds; new software and new hardware (purchases) move from 'new' through 'current' to 'old'; service control and management (e.g. stager, HSM, LSF master, repositories, GRID services, CA, etc.) spans the clusters.

View of the different Fabric areas (diagram): infrastructure (electricity, cooling, space), network, batch system (LSF, CPU servers), storage system (AFS, CASTOR, disk servers), purchase / hardware selection / resource planning, installation / configuration + monitoring / fault tolerance, prototypes and testbeds, benchmarks / R&D / architecture, automation / operation / control, with the components coupled through hardware and software, and GRID services (!?) on top.

Into the future

Considerations
– the current state of performance, functionality and reliability is good, and technology developments still look promising
– more of the same for the future?! How can we be sure that we are following the right path? How do we adapt to changes?

Strategy
– continue and expand the current system, BUT in parallel:
– R&D activities: SAN versus NAS, iSCSI, IA64 processors, ...
– technology evaluations: Infiniband clusters, new filesystem technologies, ...
– Data Challenges to test scalability on larger scales, "bring the system to its limit and beyond"; we are already very successful with this approach, especially with the "beyond" part → Friday's talk
– watch the market trends carefully

CERN computer center 2008
– hierarchical Ethernet network, tree topology (280 GB/s)
– ~8000 mirrored disks (4 PB)
– ~3000 dual-CPU nodes (20 million SI2000)
– ~170 tape drives (4 GB/s)
– ~25 PB tape storage
– the CMS High Level Trigger will consist of about 1000 nodes with 10 million SI2000!
– all numbers hold only IF the exponential growth rates continue!
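
The per-device figures implied by these totals can be backed out by simple division of the quoted numbers; reading them as per-disk and per-drive averages is my interpretation.

disks, disk_pb = 8000, 4.0            # mirrored disks and usable capacity
nodes, total_si2000 = 3000, 20e6      # dual-CPU nodes and aggregate capacity
tape_drives, tape_gb_s = 170, 4.0     # drives and aggregate tape bandwidth

print(f"usable capacity per mirrored pair: {disk_pb * 1e6 / (disks / 2):,.0f} GB")
print(f"SI2000 per dual-CPU node         : {total_si2000 / nodes:,.0f}")
print(f"average rate per tape drive      : {tape_gb_s * 1000 / tape_drives:.0f} MB/s")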

Tomorrow's schematic network topology (diagram): CPU, disk and tape servers attach with Gigabit Ethernet (1000 Mbit/s); the backbone uses multiple 10 Gigabit Ethernet (200 * 10000 Mbit/s) and connects to the WAN via 10 Gigabit Ethernet (10000 Mbit/s).

Summary
– quite confident in the technological evolution
– quite confident in the current architecture
– LHC computing is not a question of pure technology, but of the efficient coupling of components, hardware and software
– commodity is a must for cost efficiency
– boundary conditions are important; market developments can have large effects

Tomorrow
Day 1 (Pierre VANDE VYVRE)
– Outline, main concepts
– Requirements of LHC experiments
– Data Challenges
Day 2 (Bernd PANZER)
– Computing infrastructure
– Technology trends
Day 3 (Pierre VANDE VYVRE)
– Data acquisition
Day 4 (Fons RADEMAKERS)
– Simulation, reconstruction and analysis
Day 5 (Bernd PANZER)
– Computing Data Challenges
– Physics Data Challenges
– Evolution