1  Fabric Management for CERN Experiments: Past, Present, and Future
   Tim Smith, CERN/IT
   HEPiX @ JLab, 2000/11/03

2  Contents
   - The Fabric of CERN today
   - The new challenges of LHC computing
   - What has this got to do with the GRID?
   - Fabric Management solutions of tomorrow?
   - The DataGRID Project

3  Fabric Elements
   Functionalities:
   - Batch and interactive services
   - Disk servers
   - Tape servers + devices
   - Stage servers
   - Home directory servers
   - Application servers
   - Backup service
   Infrastructure:
   - Job scheduler
   - Authentication
   - Authorisation
   - Monitoring
   - Alarms
   - Console managers
   - Networks

4  Fabric Technology at CERN
   [Chart: multiplicity of systems (1 to 10,000) per year, 1989-2005, showing the evolution
   from mainframes (IBM, Cray) and SMPs (SGI, DEC, HP, SUN), through RISC workstations and
   scalable systems (SP2, CS2), to PC farms.]

5  Architecture Considerations
   - Physics applications have ideal data parallelism: a mass of independent problems
   - No message passing
   - Throughput rather than performance
   - Resilience rather than ultimate reliability
   - Can build hierarchies of mass market components
   - High Throughput Computing
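
A minimal sketch of the point made on the slide above (not from the talk; the event fields and worker count are illustrative): because each event is an independent problem, throughput scales by adding more workers, with no message passing between them.

```python
# Hypothetical sketch: event-level parallelism with no communication between
# workers -- each event is processed independently, so adding boxes adds throughput.
from multiprocessing import Pool

def reconstruct(event):
    """Placeholder for an event-reconstruction kernel; events are independent."""
    return {"id": event["id"], "tracks": len(event["hits"])}

if __name__ == "__main__":
    events = [{"id": i, "hits": list(range(i % 50))} for i in range(10_000)]
    with Pool(processes=8) as pool:            # e.g. one worker per CPU in a farm node
        results = pool.map(reconstruct, events, chunksize=100)
    print(len(results), "events processed")
```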

6  Component Architecture
   [Diagram: CPU nodes on 100/1000baseT switches, and disk, tape and application servers on
   1000baseT switches, all interconnected through a high-capacity backbone switch.]

7  Analysis Chain: Farms
   [Diagram of the data flow: the detector feeds the event filter (selection & reconstruction),
   which writes raw data; event reconstruction (together with event simulation) turns this into
   event summary data; batch physics analysis extracts analysis objects by physics topic from
   the processed data, which are then used for interactive physics analysis.]
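
A toy illustration of the chain sketched above (hypothetical field names and placeholder stage bodies, not the experiments' software): each stage consumes the previous stage's output, ending with analysis objects selected by physics topic.

```python
# Hypothetical staged pipeline mirroring the slide's data flow.
def event_filter(raw_stream):
    for event in raw_stream:
        if event.get("trigger"):                 # selection & reconstruction at the filter
            yield event

def reconstruct(filtered):
    for event in filtered:
        yield {**event, "summary": True}         # raw data -> event summary data

def batch_analysis(summaries, topic):
    return [e for e in summaries if e["topic"] == topic]   # analysis objects by topic

raw = [{"trigger": i % 3 == 0, "topic": "dimuon" if i % 2 else "jets"} for i in range(100)]
objects = batch_analysis(reconstruct(event_filter(raw)), topic="dimuon")
print(len(objects), "analysis objects ready for interactive analysis")
```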

8  Multiplication!
   [Chart: number of CPUs in CERN farms, Jul-97 to Jan-00, growing from near zero to over 1000,
   broken down by farm: alice, atlas, ccf, cms, eff, ion, l3c, lhcb, lxbatch, lxplus, mta,
   na45, na48, na49, nomad, pcsf, tapes, tomog.]

9  PC Farms

10  Shared Facilities

11  LHC Computing Challenge
    The scale will be different:
    - CPU:  10k SI95  ->  1M SI95
    - Disk: 30 TB     ->  3 PB
    - Tape: 600 TB    ->  9 PB
    The model will be different:
    - There are compelling reasons why some of the farms and some of the capacity
      will not be located at CERN
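
Reading these as today-versus-LHC figures, the implied scale-up factors are roughly 1M / 10k = 100x in CPU, 3 PB / 30 TB = 100x in disk, and 9 PB / 600 TB = 15x in tape.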

12  [Charts: estimated CPU capacity at CERN (~10k SI95, 1200 processors today) and estimated
    disk storage capacity at CERN, split into non-LHC and LHC shares, compared against
    Moore's-law growth.]
    Bad news, I/O:
    - 1996: 4 GB disks @ 10 MB/s  ->  1 TB delivers ~2500 MB/s aggregate
    - 2000: 50 GB disks @ 20 MB/s ->  1 TB delivers ~400 MB/s aggregate
    Bad news, tapes:
    - Less than a factor-2 reduction in 8 years
    - A significant fraction of cost
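
A quick back-of-the-envelope check of the I/O numbers above (a sketch, assuming 1 TB = 1000 GB): as disks grow faster in capacity than in speed, the aggregate bandwidth behind a terabyte of data drops.

```python
# Aggregate streaming rate available from the disks needed to hold a given volume.
def aggregate_mb_per_s(disk_gb, disk_mb_per_s, total_tb=1):
    disks_needed = (total_tb * 1000) / disk_gb   # disks required to hold total_tb
    return disks_needed * disk_mb_per_s          # combined streaming rate in MB/s

print(aggregate_mb_per_s(4, 10))    # 1996: ~2500 MB/s per TB
print(aggregate_mb_per_s(50, 20))   # 2000: ~400 MB/s per TB
```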

13  Regional Centres: a Multi-Tier Model (MONARC, http://cern.ch/MONARC)
    [Diagram: CERN as Tier 0, connected to Tier 1 centres (FNAL, RAL, IN2P3), which connect
    onward to Tier 2 sites (Lab a, Uni b, Lab c, ..., Uni n), departments and desktops; link
    bandwidths range from 155 Mbps and 622 Mbps at the lower tiers up to 2.5 Gbps near CERN.]

14  More Realistically: a Grid Topology (DataGRID, http://cern.ch/grid)
    [Diagram: the same tiers - CERN Tier 0, Tier 1 centres (FNAL, RAL, IN2P3), Tier 2 sites
    (Lab a, Uni b, Lab c, ..., Uni n), departments and desktops - interconnected as a grid
    rather than a strict hierarchy, with the same 155 Mbps to 2.5 Gbps links.]

15  Can We Build LHC Farms?
    Positive predictions:
    - CPU and disk price/performance trends suggest that the raw processing and disk storage
      capacities will be affordable
    - Raw data rates and volumes look manageable (perhaps not today for ALICE)
      - 1999: CDR @ 45 MB/s for NA48!
      - 2000: CDR @ 90 MB/s for ALICE!
    - Space, power and cooling issues?
    So probably yes... but can we manage them?
    - Understand costs: 1 PC is cheap, but managing 10,000 is not!
    - Building and managing coherent systems from such large numbers of boxes will be a challenge

16  Management Tasks I
    Supporting adaptability:
    - Configuration management
      - Machine / service hierarchy
      - Automated registration / insertion / removal
      - Dynamic reassignment
    - Automatic software installation and management (OS and applications)
      - Version management
      - Application dependencies
      - Controlled (re)deployment
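
A hypothetical illustration of the first point above (not the WP4 design; class and node names are invented): a minimal machine/service hierarchy supporting automated registration, removal, and dynamic reassignment of nodes between services.

```python
# Illustrative sketch of a machine/service hierarchy with registration and reassignment.
from dataclasses import dataclass, field

@dataclass
class Service:
    name: str                                  # e.g. "batch", "disk", "interactive"
    nodes: set = field(default_factory=set)

class Fabric:
    def __init__(self, services):
        self.services = {s: Service(s) for s in services}

    def register(self, node, service):         # automated registration / insertion
        self.services[service].nodes.add(node)

    def remove(self, node):                    # removal, e.g. on hardware failure
        for s in self.services.values():
            s.nodes.discard(node)

    def reassign(self, node, new_service):     # dynamic reassignment between services
        self.remove(node)
        self.register(node, new_service)

fabric = Fabric(["batch", "disk", "interactive"])
fabric.register("pc-001", "batch")
fabric.reassign("pc-001", "interactive")
print({s.name: sorted(s.nodes) for s in fabric.services.values()})
```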

17  Management Tasks II
    Controlling quality of service:
    - System monitoring
      - Oriented to the service, NOT the machine
      - Uniform access to diverse fabric elements
      - Integrated with configuration (change) management
    - Problem management
      - Identification of root causes (faults + performance)
      - Correlate network / system / application data
      - Highly automated
      - Adaptive: integrated with configuration management
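
A small sketch of what "oriented to the service, not the machine" can mean in practice (hypothetical metric fields and threshold, not a real monitoring system): per-node data is rolled up so that alarms are raised on the service's health rather than on individual boxes.

```python
# Hypothetical service-oriented monitoring roll-up.
node_metrics = {
    "pc-001": {"service": "batch", "up": True,  "load": 0.9},
    "pc-002": {"service": "batch", "up": False, "load": 0.0},
    "pc-003": {"service": "disk",  "up": True,  "load": 0.4},
}

def service_health(metrics, service, min_up_fraction=0.8):
    nodes = [m for m in metrics.values() if m["service"] == service]
    up = sum(m["up"] for m in nodes)
    ok = bool(nodes) and up / len(nodes) >= min_up_fraction
    return {"service": service, "nodes": len(nodes), "up": up, "ok": ok}

print(service_health(node_metrics, "batch"))   # flags the batch *service*, not pc-002 alone
```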

18  Relevance to the GRID?
    - Scalable solutions are needed even in the absence of the GRID!
    - For the GRID to work it must be presented with information and opportunities:
      - Coordinated and efficiently run centres
      - Presentable as a guaranteed-quality resource
    - 'GRID'ification: the interfaces

19  Management Tasks: a GRID Centre
    GRID-enable the fabric:
    - Support external requests: services
    - Publication of resource information
      - Coordinated and 'map'able
    - Security: authentication / authorisation
    - Policies: allocation / priorities / estimation / cost
    - Scheduling
    - Reservation
    - Change management
    - Guarantees
      - Resource availability / QoS
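
A hypothetical example of the "publication" item above (the field names are illustrative, not a DataGRID schema): the kind of resource record a centre might publish so that grid schedulers can make allocation, reservation and cost decisions; the capacity figures echo the "today" numbers from slide 11, the QoS and cost values are invented.

```python
# Illustrative resource-information record published to a grid information service.
import json

site_record = {
    "site": "CERN-Tier0",
    "services": {
        "batch": {"cpu_si95": 10_000, "free_slots": 420},
        "disk":  {"capacity_tb": 30, "free_tb": 7.5},
        "tape":  {"capacity_tb": 600},
    },
    "qos": {"availability": 0.98, "max_reservation_hours": 48},
    "policy": {"priority_groups": ["alice", "atlas", "cms", "lhcb"],
               "cost_per_si95_hour": 0.01},
}

print(json.dumps(site_record, indent=2))
```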

20  Existing Solutions?
    The world outside is moving fast!!
    - Dissimilar problems:
      - Virtual supercomputers (~200 nodes)
      - MPI, latency, interconnect topology and bandwidth
      - Roadrunner, LosLobos, Cplant, Beowulf
    - Similar problems:
      - ISPs / ASPs (~200 nodes)
      - Clustering: high availability / mission critical
    - The DataGRID: Fabric Management WP4

21  WP4 Partners
    - CERN (CH): Tim Smith
    - ZIB (D): Alexander Reinefeld
    - KIP (D): Volker Lindenstruth
    - NIKHEF (NL): Kors Bos
    - INFN (I): Michele Michelotto
    - RAL (UK): Andrew Sansum
    - IN2P3 (FR): Denis Linglin

22  Concluding Remarks
    - Years of experience in exploiting inexpensive mass market components
    - But we need to marry these with inexpensive, highly scalable management tools
    - Build the components back together as a resource for the GRID

