
1 Physics Computing at CERN Helge Meinhard CERN, IT Department OpenLab Student Lecture 23 July 2013

2 Outline: CERN's computing facilities and hardware; service categories, tasks; infrastructure, networking, databases; HEP analysis: techniques and data flows; network, lxplus, lxbatch, storage; between HW and services: Agile Infrastructure; references

3 Outline: CERN's computing facilities and hardware; service categories, tasks; infrastructure, networking, databases; HEP analysis: techniques and data flows; network, lxplus, lxbatch, storage; between HW and services: Agile Infrastructure; references

4 Location (1): Building 513 (opposite restaurant no. 2)

5 Building 513: Large building with 2700 m² of floor space for computing equipment, with capacity for 3.5 MW of electricity and 3.5 MW of air and water cooling. Chillers and transformers.

6 Building 513 – Latest Upgrade
Scope of the upgrade:
- Increase critical UPS power to 600 kW (with new critical UPS room) and overall power to 3.5 MW (from 2.9 MW)
- New dedicated room for critical equipment, new electrical rooms and critical ventilation systems in the Barn
- Dedicated cooling infrastructure for critical equipment (decoupled from physics)
- New building for the cooling system (new HVAC building)
- Critical equipment which cannot be moved to the new rooms to get new dedicated cooling
- Networking area and telecoms rooms
- Restore N+1 redundancy for all UPS systems
New critical room operational since January 2013; critical services are gradually being migrated into this room, with migration expected to be completed early 2014. Last finishing touches to be completed soon.

7 Other facilities (1)
- Building 613: small machine room for tape libraries (about 200 m from Building 513)
- Hosting centre about 15 km from CERN: 35 m², about 100 kW, critical equipment

8 Other facilities – Wigner (1)
- Additional resources for future needs: 8000 servers now, 15000 estimated by end 2017
- Studies in 2008 into building a new computer centre on the CERN Prevessin site: too costly
- In 2011, a tender was run across CERN member states for remote hosting; 16 bids received
- In March 2012, the Wigner Institute in Budapest, Hungary was selected; contract signed in May 2012

9 Other facilities – Wigner (2)
Timescales, data centre:
- Construction started in May 2012
- First room available January 2013
- Inauguration with the Hungarian Prime Minister on 18 June 2013
- Construction finished at the end of June 2013
Timescales, services:
- First deliveries (network and servers) in Q1 2013
- 2x100 Gbps links in operation since February 2013; round-trip latency ~25 ms
- Servers installed, tested and ready for use in May 2013
- Expected to be put into production end of July 2013
- Large ramp-up foreseen during 2014/2015
This will be a hands-off facility for CERN: Wigner manages the infrastructure and the hands-on work; we do everything else remotely.

10 Other facilities – Wigner (3) (photos)

11 Other facilities – Wigner (4) (photos)

12 Computing Building Blocks
- CPU server or worker node: dual CPU, six or eight cores, 3...4 GB of memory per core
- Disk server = CPU server + RAID or disk controller + ~16...36 internal or external SATA disks
- Tape server = CPU server + Fibre Channel connection + tape drive
- Commodity market components: not cheap, but cost effective! Simple components, but many of them
- Market trends more important than technology trends; always watch TCO: Total Cost of Ownership (a rough sketch of such a comparison follows below)
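Since the slide stresses TCO rather than raw purchase price, here is a minimal, hedged sketch of how such a comparison might look. All numbers (prices, power draw, electricity tariff, PUE, lifetime) are illustrative assumptions, not CERN procurement figures.

```python
# Toy TCO comparison for two hypothetical CPU server models.
# All figures are illustrative assumptions, not CERN data.

def total_cost_of_ownership(purchase_chf, power_watts, lifetime_years,
                            chf_per_kwh=0.15, pue=1.5):
    """Purchase price plus electricity over the server lifetime.

    pue: power usage effectiveness, i.e. facility overhead (cooling, UPS, ...).
    """
    hours = lifetime_years * 365 * 24
    energy_kwh = power_watts / 1000.0 * hours * pue
    return purchase_chf + energy_kwh * chf_per_kwh

# Hypothetical offers: a cheaper but power-hungry box vs. a pricier, efficient one.
cheap_box = total_cost_of_ownership(purchase_chf=2000, power_watts=350, lifetime_years=4)
efficient_box = total_cost_of_ownership(purchase_chf=2400, power_watts=250, lifetime_years=4)

print(f"cheap box TCO:     {cheap_box:8.0f} CHF")
print(f"efficient box TCO: {efficient_box:8.0f} CHF")
```

With these assumed numbers the more expensive but more efficient server wins over four years, which is exactly why the slide warns against looking at purchase price alone.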

13 Operations
- 24x7 operator support and system administration services to support 24x7 operation of all IT services
- Hardware installation, retirement and repair
- Per year: ~4000 hardware movements; ~2700 hardware interventions/repairs
CERN CC currently (preliminary, probably incomplete, to be checked):
- Number of 10 Gbit NICs: 2782
- Number of 1 Gbit NICs: 18051
- Number of cores: 88014
- Number of disks: 79167
- Number of memory modules: 63028
- Number of processors: 17196
- Number of boxes: 10035
- Total disk space (TiB): 123975.41
- Total memory capacity (TiB): 296.12

14 Outline: CERN's computing facilities and hardware; service categories, tasks; infrastructure, networking, databases; HEP analysis: techniques and data flows; network, lxplus, lxbatch, storage; between HW and services: Agile Infrastructure; references

15 Computing Service Categories
Two coarse-grained computing categories:
- Computing infrastructure and administrative computing
- Physics data flow and data processing

16 Task overview
- Communication tools: mail, Web, TWiki, GSM, …
- Productivity tools: office software, software development, compilers, visualization tools, engineering software, …
- Computing capacity: CPU processing, data repositories, personal storage, software repositories, metadata repositories, …
All of this needs underlying infrastructure:
- Network and telecom equipment
- Computing equipment for processing, storage and databases
- Management and monitoring software
- Maintenance and operations
- Authentication and security

17 Outline: CERN's computing facilities and hardware; service categories, tasks; infrastructure, networking, databases; HEP analysis: techniques and data flows; network, lxplus, lxbatch, storage; between HW and services: Agile Infrastructure; references

18 Infrastructure Services
Software environment and productivity tools:
- Mail: 230000 emails/day, 64% spam; 31000 mailboxes
- Web services: 12500 web sites
- Tool accessibility: Windows, Office, CadCam, …
- User registration and authentication: 35600 registered users, 48300 accounts
- Home directories (DFS, AFS): ~550 TB; backup service with ~3 billion files
- PC management: software and patch installations
Infrastructure needed: > 700 servers

19 Network Overview (diagram): central, high-speed network backbone linking the experiments (e.g. ATLAS), all CERN buildings (12000 active users), the computer centre processing clusters, the Wigner facility, and world-wide Grid centres.

20 Bookkeeping: Database Services
Numerous Oracle database instances, in total > 370 TB of data (not counting backups):
- Bookkeeping of physics events for the experiments
- Metadata for the physics events (e.g. detector conditions)
- Management of data processing
- Highly compressed and filtered event data
- LHC machine parameters, monitoring data
- Human resources information
- Financial bookkeeping
- Material bookkeeping and material flow control
- LHC and detector construction details
- …

21 Outline: CERN's computing facilities and hardware; service categories, tasks; infrastructure, networking, databases; HEP analysis: techniques and data flows; network, lxplus, lxbatch, storage; between HW and services: Agile Infrastructure; references

22 HEP analyses
- Statistical quantities over many collisions: histograms
- One event doesn't prove anything
- Comparison of statistics from real data with expectations from simulations
- Simulations based on known models
- Statistically significant deviations show that the known models are not sufficient
- Need more simulated data than real data: in order to cover various models, and in order to be dominated by the statistical error of the real data, not of the simulation
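A minimal sketch of the comparison described above: fill a histogram from (toy) real data, compare bin by bin with the expectation from (toy) simulated events, and flag statistically significant deviations. The values, binning and 3-sigma threshold are invented for illustration; real analyses use far more sophisticated statistical treatments.

```python
import math
import random

random.seed(42)

# Toy "real data" and "simulation": invariant-mass-like values in GeV (invented numbers).
data = [random.gauss(91.0, 5.0) for _ in range(10000)]
simulation = [random.gauss(91.0, 5.0) for _ in range(100000)]  # more simulated than real events

def histogram(values, nbins=20, lo=70.0, hi=110.0):
    counts = [0] * nbins
    width = (hi - lo) / nbins
    for v in values:
        if lo <= v < hi:
            counts[int((v - lo) / width)] += 1
    return counts

h_data = histogram(data)
h_sim = histogram(simulation)
scale = sum(h_data) / float(sum(h_sim))   # normalise the simulation to the data

for i, (n_obs, n_sim) in enumerate(zip(h_data, h_sim)):
    expected = n_sim * scale
    # Simple significance: deviation in units of the statistical uncertainty.
    sigma = (n_obs - expected) / math.sqrt(expected) if expected > 0 else 0.0
    if abs(sigma) > 3.0:
        print(f"bin {i}: observed {n_obs}, expected {expected:.1f}, deviation {sigma:+.1f} sigma")
```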

23 Data Handling and Computation for Physics Analyses (diagram, les.robertson@cern.ch): detector → event filter (selection & reconstruction) → raw data → reconstruction → event summary data; event reprocessing and event simulation feed into the same chain; batch and interactive physics analysis work on processed data and analysis objects (extracted by physics topic).

24 Data Flow – online
- Detector: 150 million electronics channels, ~1 PB/s
- Level-1 filter and selection: fast response electronics, FPGAs, embedded processors, very close to the detector; output ~150 GB/s
- High-level filter and selection: O(1000) servers for processing, Gbit Ethernet network; output ~0.6 GB/s
- N x 10 Gbit links to the CERN computer centre
- Constraints: budget, physics objectives, downstream data-flow pressure
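Taking the rates quoted on the slide at face value, a few lines of arithmetic show the reduction factors achieved by the two trigger levels. This is a rough back-of-the-envelope sketch; real trigger rates vary by experiment and running conditions.

```python
# Reduction factors along the online data path, using the slide's round numbers.
detector_rate = 1e15        # ~1 PB/s off the detector front-end
after_level1 = 150e9        # ~150 GB/s after Level-1 filter and selection
after_hlt = 0.6e9           # ~0.6 GB/s after the high-level filter, sent to the computer centre

print(f"Level-1 reduction factor:    {detector_rate / after_level1:,.0f}x")
print(f"High-level filter reduction: {after_level1 / after_hlt:,.0f}x")
print(f"Overall reduction:           {detector_rate / after_hlt:,.0f}x")
```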

25 Data Flow – offline (diagram): LHC, 4 detectors, 1000 million events/s → filter and first selection → 1…4 GB/s per experiment (about 10 GB/s in total) → store on disk and tape → export copies, create sub-samples (~3 GB/s) for world-wide analysis → physics: explanation of nature.

26 SI Prefixes (table, source: wikipedia.org)

27 Data Volumes at CERN
- Original estimate: 15 petabytes/year; tower of CDs: which height? (a quick calculation follows below)
- Stored cumulatively over LHC running
- Only real data and derivatives; simulated data not included (the total of simulated data is even larger)
- Written in 2011: 22 PB; written in 2012: 30 PB
- Compare with (numbers from mid 2010): Library of Congress: 200 TB; e-mail (w/o spam): 30 PB (30 trillion mails at 1 kB each); photos: 1 EB (500 billion photos at 2 MB each, 50 PB on Facebook); Web: 1 EB; telephone calls: 50 EB; … growing exponentially …
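The "tower of CDs" question lends itself to a quick back-of-the-envelope calculation. The CD capacity and thickness used here are common round figures (assumptions), and 15 PB/year is the slide's original estimate.

```python
# How tall is a stack of CDs holding one year of LHC data?
annual_data_bytes = 15e15        # 15 PB/year (original estimate from the slide)
cd_capacity_bytes = 700e6        # ~700 MB per CD (assumed round figure)
cd_thickness_mm = 1.2            # ~1.2 mm per disc (assumed round figure)

n_cds = annual_data_bytes / cd_capacity_bytes
height_km = n_cds * cd_thickness_mm / 1e6

print(f"{n_cds:,.0f} CDs, stack about {height_km:.0f} km tall")
```

With these assumptions one year of data corresponds to a stack of CDs roughly 25 km high, several times the cruising altitude of an airliner.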

28 Functional Units
- Detectors: data import and export
- Disk storage
- Tape storage: active archive and backup
- Databases: metadata storage
- CPU servers: event processing capacity

29 Job Data and Control Flow (1)
User request: "Here is my program and I want to analyse the ATLAS data from the special run on June 16th 14:45h, or all data with detector signature X."
- Batch system: decides where free computing time is available
- Data management system: knows where the data is and how to transfer it to the program
- Database system: translates the user request into physical locations and provides metadata (e.g. calibration data) to the program
- Involved components: processing nodes (CPU servers), disk storage, management software
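The interplay described above can be caricatured in a few lines of Python: the batch system picks a free node, the database turns the logical request into file locations and calibration metadata, and the data management system stages the files before the program runs. Every function and object name here is invented for illustration; the real systems (LSF, CASTOR/EOS, the experiments' bookkeeping databases) are far more involved.

```python
# Toy sketch of the job/data/control flow; every name here is hypothetical.

def run_user_analysis(program, dataset_query, batch_system, data_catalog, metadata_db):
    # 1. Batch system: where is free computing time?
    node = batch_system.pick_free_node()

    # 2. Database system: translate the logical request ("ATLAS special run,
    #    16 June 14:45h" or "detector signature X") into physical file locations
    #    and the matching calibration/conditions metadata.
    files = metadata_db.resolve(dataset_query)
    calibration = metadata_db.conditions_for(dataset_query)

    # 3. Data management system: stage the files from tape/disk towards the worker node.
    local_paths = [data_catalog.stage(f, destination=node) for f in files]

    # 4. Run the user's program on the processing node with data and metadata.
    return node.execute(program, inputs=local_paths, conditions=calibration)
```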

30 Job Data and Control Flow (2) (diagram): users → interactive services (lxplus) → data processing (lxbatch); supported by repositories (code, metadata, …), a bookkeeping database, and disk and tape storage managed by the Hierarchical Mass Storage Management system (HSM), the CERN CASTOR installation.

31 Outline: CERN's computing facilities and hardware; service categories, tasks; infrastructure, networking, databases; HEP analysis: techniques and data flows; network, lxplus, lxbatch, storage; between HW and services: Agile Infrastructure; references

32 CERN Farm Network
- Switches in the distribution layer close to the servers
- (Possibly multiple) 10 Gbit uplinks
- 1 Gbit or 10 Gbit to the server

33 CERN Overall Network
- Hierarchical network topology based on Ethernet: core, general purpose, LCG, technical, experiments, external
- 180+ very high performance routers
- 6000+ subnets
- 3600+ switches (increasing)
- 75000 active user devices (exploding)
- 80000 sockets – 5000 km of UTP cable
- 5000 km of fibres (CERN owned)
- 200 Gbps of WAN connectivity

34 Interactive Login Service: lxplus
- Interactive compute facility
- 45 virtual CPU servers running RHEL 6 (default target)
- 45 CPU servers running Linux (RHEL 5 variant)
- Access via ssh from desktops and notebooks under Windows, Linux and Mac OS X
- Used for compilation of programs, short program execution tests, some interactive analysis of data, submission of longer tasks (jobs) into the lxbatch facility, internet access, program development, …
- (Plot: interactive users per server)

35 Processing Facility: lxbatch
- Today about 3650 processing nodes: 3350 physical nodes (SLC5, 48000 job slots) and 300 virtual nodes (SLC6, 8000 job slots)
- Jobs are submitted from lxplus, or channelled through Grid interfaces world-wide
- About 300000 user jobs per day recently, reading and writing up to 2 PB per day
- Uses IBM/Platform Load Sharing Facility (LSF) as the management tool to schedule the jobs of a large number of users (see the sketch below)
- Expected demand growth rate of ~30% per year
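LSF jobs are normally submitted with the `bsub` command; below is a minimal, hedged sketch of wrapping that from Python. The queue name, job script and options are placeholders, actual queue names and submission policies at CERN differ.

```python
import subprocess

def submit_lsf_job(command, queue="8nh", job_name="my_analysis", logfile="job.out"):
    """Submit a command to LSF via bsub (queue name here is a placeholder)."""
    bsub_cmd = [
        "bsub",
        "-q", queue,        # target queue
        "-J", job_name,     # job name
        "-o", logfile,      # file receiving the job's output
        command,
    ]
    result = subprocess.run(bsub_cmd, capture_output=True, text=True, check=True)
    print(result.stdout.strip())   # bsub echoes the assigned job ID and queue

if __name__ == "__main__":
    # Hypothetical job script; only works on a machine with LSF installed.
    submit_lsf_job("./run_analysis.sh --dataset mydata")
```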

36 Data Storage (1)
- Large disk cache in front of a long-term tape storage system: the CASTOR data management system, developed at CERN, manages the user I/O requests (illustrated by the toy sketch below)
- 535 disk servers with 17 PB usable capacity; about 65 PB on tape
- Redundant disk configuration; 2...3 disk failures per day are part of the operational procedures
- Logistics again: need to store all data forever on tape; > 25 PB of storage added per year, plus a complete copy every 4 years (repack, change of technology)
- Disk-only use case (analysis): EOS
- Expected demand growth rate of ~30% per year
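The idea of a large disk cache in front of tape can be illustrated with a toy hierarchical storage manager: reads are served from disk if the file is cached, otherwise the file is recalled from tape and an old file is evicted to make room. This is a conceptual sketch only, not CASTOR's actual design or API.

```python
from collections import OrderedDict

class ToyHSM:
    """Toy disk cache in front of a tape archive (conceptual sketch, not CASTOR)."""

    def __init__(self, disk_capacity):
        self.disk_capacity = disk_capacity          # max number of files kept on disk
        self.disk = OrderedDict()                   # file -> size, kept in LRU order
        self.tape = {}                              # the "permanent" archive

    def write(self, name, size):
        self.tape[name] = size                      # everything ends up on tape forever
        self._cache(name, size)

    def read(self, name):
        if name in self.disk:                       # disk hit: cheap
            self.disk.move_to_end(name)
            return "disk hit"
        size = self.tape[name]                      # disk miss: recall from tape (slow)
        self._cache(name, size)
        return "recalled from tape"

    def _cache(self, name, size):
        self.disk[name] = size
        self.disk.move_to_end(name)
        while len(self.disk) > self.disk_capacity:  # evict least-recently-used files
            self.disk.popitem(last=False)

hsm = ToyHSM(disk_capacity=2)
for f in ("run1.raw", "run2.raw", "run3.raw"):
    hsm.write(f, size=1)
print(hsm.read("run1.raw"))   # evicted from disk earlier -> "recalled from tape"
```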

37 Data Storage (2) – CASTOR
- 314 million files; 75 PB of data on tape already today
- Total: ~87 PB; on tape: ~76 PB
- Last 2 months: in: avg 1.7 GB/s, peak 4.2 GB/s; out: avg 2.5 GB/s, peak 6.0 GB/s

38 Data Storage (3) – EOS
- Disk-only storage for the analysis use case
- Requirements very different from CASTOR
- 914 servers, 17 PB usable, 144 M files
- Last 2 months: in: avg 3.4 GB/s, peak 16.6 GB/s; out: avg 11.9 GB/s, peak 30.6 GB/s

39 Other Storage for Physics
- Databases: metadata, conditions data, …
- AFS, DFS for user files
- CVMFS for experiment software releases, copies of conditions data, …: not a file system in the classical sense, but an HTTP-based distribution mechanism for read-only data (see the toy sketch below)
- …
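The "HTTP-based distribution of read-only data" idea can be sketched in a few lines: fetch a file over HTTP once, keep it in a local cache, and serve subsequent reads from that cache. This is a toy illustration of the concept, not the CVMFS client or its protocol; the URL and cache path are placeholders.

```python
import os
import urllib.request

CACHE_DIR = "/tmp/readonly-cache"      # local cache directory (placeholder path)

def fetch_readonly(base_url, relative_path):
    """Return a local path for a read-only file, downloading it over HTTP at most once."""
    local_path = os.path.join(CACHE_DIR, relative_path.replace("/", "_"))
    if not os.path.exists(local_path):                 # cache miss: download once
        os.makedirs(CACHE_DIR, exist_ok=True)
        urllib.request.urlretrieve(base_url + "/" + relative_path, local_path)
    return local_path                                  # cache hit: no network access

# Hypothetical usage; the repository URL below is a placeholder, not a real endpoint.
# path = fetch_readonly("http://software-repo.example.org", "experiment/release-1.0/setup.sh")
```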

40 Miscellaneous Services (in IT-PES)
- Numerous services for Grid computing
- TWiki: collaborative Web space; more than 250 TWikis, with between just a few and more than 8000 TWiki items each (total 142000)
- Version control services: SVN with SVNWEB/TRAC (2041 active projects, 1230 GB); Git (257 active projects, 7.5 GB)
- Issue tracking service: Atlassian JIRA, GreenHopper, FishEye, Crucible, Bamboo, …; 138 projects, 15540 issues, 2942 users
- BOINC: framework for volunteer computing

41 World-wide Computing for LHC
- CERN's resources alone are by far not sufficient
- World-wide collaboration between computer centres
- WLCG: Worldwide LHC Computing Grid
- Web, Grids, clouds, WLCG, EGEE, EGI, EMI, …: see Fabrizio Furano's lecture on July 30th

42 Outline: CERN's computing facilities and hardware; service categories, tasks; infrastructure, networking, databases; HEP analysis: techniques and data flows; network, lxplus, lxbatch, storage; between HW and services: Agile Infrastructure; references

43 Between Hardware and Services
- Until recently: dedicated hardware, OS and software set up according to the service
- CERN-proprietary tools (written 2001...2003): ELFms, Quattor, LEMON, …
- Challenges: very remote data centre extension; IT staff numbers remain fixed but more computing capacity is needed; the tools are high maintenance and becoming increasingly brittle; inefficiencies exist but their root causes cannot easily be identified and/or fixed

44 CERN-IT: Agile Infrastructure
Reviewed areas:
- Configuration management
- Monitoring
- Infrastructure layer
Guiding principles:
- We are no longer a special case for computing
- Adopt a tool-chain model using existing open-source tools
- If we have special requirements, challenge them again and again
- If useful, make generic and contribute back to the community

45 Configuration Management
- Puppet chosen as the core tool; Puppet and Chef are the clear leaders for core tools
- Many large enterprises now use Puppet
- Its declarative approach fits what we're used to at CERN (illustrated below)
- Large installations; friendly, broad-based community
- The Puppet Forge contains many pre-built recipes, and accepts improvements for portability and function
- Training and support available; expertise is valuable on the job market
- Additional tools: Foreman for GUI/dashboard; Git for version control; MCollective for remote execution; Hiera for conditional configuration; PuppetDB as configuration data warehouse
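The "declarative approach" means describing the desired end state of a node and letting the tool converge the machine towards it, rather than scripting imperative steps. The toy sketch below illustrates that idea in Python; it is not Puppet code, and the resource names and states are invented.

```python
# Toy illustration of declarative configuration management (not Puppet itself).

desired_state = {                     # what the node *should* look like
    "packages": {"httpd": "installed", "telnet": "absent"},
    "services": {"httpd": "running"},
}

actual_state = {                      # what a (hypothetical) probe of the node reports
    "packages": {"httpd": "absent", "telnet": "installed"},
    "services": {"httpd": "stopped"},
}

def converge(desired, actual):
    """Compute the actions needed to bring the node to the desired state."""
    actions = []
    for kind, resources in desired.items():
        for name, want in resources.items():
            have = actual.get(kind, {}).get(name)
            if have != want:
                actions.append(f"{kind[:-1]} {name}: {have} -> {want}")
    return actions

for action in converge(desired_state, actual_state):
    print(action)   # once converged, running this again produces no actions (idempotence)
```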

46 Monitoring: Evolution
Motivation:
- Several independent monitoring activities in IT, based on different tool chains but sharing the same limitations
- High-level services are interdependent; a combination of data and complex analysis is necessary
- Quickly answering questions you hadn't thought of when the data was recorded
Challenge:
- Find a shared architecture and tool-chain components
- Adopt existing tools and avoid home-grown solutions
- Aggregate monitoring data in a large data store, correlate it and make it easy to access (see the toy example below)
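"Aggregate monitoring data in a large data store and correlate it" can be illustrated with a tiny example using SQLite from the Python standard library: metrics from different sources land in one table and can then be queried together. The metric names and values are invented; the production system uses messaging and much larger back-ends (Hadoop, Oracle), as the architecture slide that follows shows.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE metrics (ts INTEGER, host TEXT, name TEXT, value REAL)")

# Toy samples from two independent "monitoring activities" (invented numbers).
samples = [
    (1000, "lxbatch001", "cpu_load", 0.95),
    (1000, "lxbatch001", "jobs_running", 24),
    (1000, "disksrv042", "io_wait", 0.40),
    (1060, "lxbatch001", "cpu_load", 0.20),
    (1060, "lxbatch001", "jobs_running", 3),
]
db.executemany("INSERT INTO metrics VALUES (?, ?, ?, ?)", samples)

# Correlate two metrics of the same host at the same timestamp in a single query.
query = """
    SELECT a.ts, a.value AS cpu_load, b.value AS jobs_running
    FROM metrics a JOIN metrics b
      ON a.ts = b.ts AND a.host = b.host
    WHERE a.name = 'cpu_load' AND b.name = 'jobs_running'
"""
for row in db.execute(query):
    print(row)    # e.g. (1000, 0.95, 24.0)
```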

47 Monitoring: Architecture (diagram): sensors and publishers, application-specific aggregation, a messaging layer (ActiveMQ/Apollo), storage and analysis back-ends (Lemon, Hadoop, Oracle), alarm feed to SNOW, custom feeds, reports and portals (Kibana?).

48 Move to the clouds … or Infrastructure as a Service (IaaS)
Rationale:
- Improve operational efficiency: machine reception/testing by rapidly deploying batch VMs; hardware interventions with long-running programs handled by live migration; multiple operating system coverage via multiple VMs per hypervisor
- Improve resource efficiency: exploit idle resources such as service nodes by packing them with batch work; handle variable load such as interactive/build machines with suspend/resume
- Improve responsiveness: self-service web interfaces rather than tickets to request resources; "coffee break" response time
- Support middleware simplification and cloud studies: could we run the LHC infrastructure with only an Amazon-like interface?
- Previous experience with lxcloud (OpenNebula) and the CERN Virtualisation Infrastructure (Microsoft SCVMM)

49 IaaS: OpenStack
- Cloud operating system/orchestrator: controls large pools of compute, storage and networking resources
- A dashboard gives administrators control; users provision resources through a web interface
- Components used: compute, dashboard, image store, object storage, block storage, identity management, network, load balancing/high availability
- Fully integrated with CERN's Active Directory via LDAP
- Potential long-term impact on Grid middleware
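Provisioning through OpenStack is typically done either via the dashboard or programmatically against the compute (Nova) REST API. The sketch below shows the general shape of such a call using only the standard library; the endpoint URL, token and image/flavour identifiers are placeholders, and in practice one would first authenticate against the identity service (Keystone) and usually use an official client library instead.

```python
import json
import urllib.request

# Placeholders: a real deployment provides these via the identity service (Keystone).
NOVA_ENDPOINT = "https://cloud.example.org:8774/v2/MY_TENANT_ID"
AUTH_TOKEN = "TOKEN_FROM_KEYSTONE"

def boot_server(name, image_ref, flavor_ref):
    """Ask the compute service to create a virtual machine (sketch only)."""
    body = {"server": {"name": name, "imageRef": image_ref, "flavorRef": flavor_ref}}
    request = urllib.request.Request(
        NOVA_ENDPOINT + "/servers",
        data=json.dumps(body).encode(),
        headers={"X-Auth-Token": AUTH_TOKEN, "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)     # contains the new server's ID and status

# Hypothetical usage (identifiers are placeholders):
# server = boot_server("batch-vm-001", image_ref="IMAGE_UUID", flavor_ref="1")
```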

50 Conclusions
- The Large Hadron Collider (LHC) and its experiments are very data- (and compute-) intensive projects
- Implemented using the right blend of new technologies and commodity approaches
- Scaling computing to the requirements of the LHC is hard work
- IT power consumption/efficiency is a primary concern
- Computing has worked very well during Run 1 (at 2 x 4 TeV) and was instrumental for discovering a Higgs particle
- We are on track for further ramp-ups of the computing capacity for future requirements: an additional, remote data centre; the AI project covering configuration and installation, IaaS, and monitoring

51 Thank you

52 Outline: CERN's computing facilities and hardware; service categories, tasks; infrastructure, networking, databases; HEP analysis: techniques and data flows; network, lxplus, lxbatch, storage; between HW and services: Agile Infrastructure; references

53 More Information (1)
- IT department: http://it-dep.web.cern.ch/it-dep/
- Monitoring (currently in production): http://sls.cern.ch/sls/index.php, http://lemonweb.cern.ch/lemon-status/, http://gridview.cern.ch/GRIDVIEW/dt_index.php, http://gridportal.hep.ph.ic.ac.uk/rtm/
- lxplus: http://plus.web.cern.ch/plus/
- lxbatch: http://batch.web.cern.ch/batch/
- CASTOR: http://castor.web.cern.ch/castor/
- EOS: http://eos.web.cern.ch/eos/

54 More Information (2)
In case of further questions, don't hesitate to contact me: Helge.Meinhard (at) cern.ch
- Grid and WLCG, computing and physics: http://www.chep2012.org/, http://lcg.web.cern.ch/LCG/public/default.htm, http://www.egi.eu/, http://www.eu-emi.eu, http://www.eu-egee.org/
- Windows, Web, Mail: https://winservices.web.cern.ch/winservices/
- Data centre upgrades: http://cern.ch/go/NN98
- Agile Infrastructure project: http://cern.ch/go/N8wp, http://cern.ch/go/99Ck, https://indico.cern.ch/conferenceDisplay.py?confId=184791

55 BACKUP SLIDES

56 Monitoring
- Large-scale monitoring: surveillance of all nodes in the computer centre
- Hundreds of parameters in various time intervals, from minutes to hours, per node and service
- Database storage and interactive visualisation

57 Hardware Management
- About 8000 servers installed in the centre; assume 3...4 years lifetime for the equipment
- Key factors: power efficiency, performance, reliability
- Demands by the experiments require investments of ~15 MCHF/year for new PC hardware and infrastructure
- Infrastructure and operation setup needed for ~3500 nodes installed per year and ~3500 nodes removed per year
- Installation in racks, cabling, automatic installation, Linux software environment

58 Software Glue
- Basic hardware and software management: installation, configuration, monitoring (Quattor, Lemon, ELFms). Which version of Linux? How to upgrade? What is going on? Load? Failures?
- Management of processor computing resources: batch scheduler (LSF from Platform Computing Inc.). Where are free processors? How to set priorities between users? Sharing of resources? How are results flowing back?
- Storage management (disk and tape): CERN-developed HSM called CASTOR. Where are the files? How to access them? How much space is available? What is on disk, what is on tape?

59 Storage
- Physics data: 36000 TB and 50 million files per year; tape storage forever; accessibility 24*7*52 = always
- Administrative data: 3.3 million electronic documents; 280000 electronic docs per year; 55000 electronic signatures per month; 60000 emails per day; 250000 orders per year; continuous storage
- Users: > 1000 million user files; backup per hour and per day; accessibility 24*7*52 = always

60 Building 513 (3) – Latest Upgrade (photos)

61 Physical and Logical Connectivity (diagram): complexity and scale increase from single components (CPU, disk, memory, mainboard; firmware, operating system, device drivers) through CPU and disk servers (network, interconnects) and the cluster/local fabric (resource management software) up to the world-wide cluster (wide-area network; Grid and cloud management software).

62 CERN Computer Centre (overview)
- CPU servers: 33000 processor cores
- Disk servers: 1800 NAS servers, 21000 TB, 40000 disks
- Tape servers and tape libraries: 160 tape drives, 53000 tapes, 71000 TB capacity, 53000 TB used
- Oracle database servers: ~400 servers, ~300 TB raw
- Network routers: 160 Gbit/s
- 2.9 MW electricity and cooling, 2700 m²

63 The rest of the world … (map of Tier-1 centres and Tier-2s): BNL (New York), ASGC (Taipei), CCIN2P3 (Lyon), TRIUMF (Vancouver), RAL (Rutherford), CNAF (Bologna), CERN, FZK (Karlsruhe), NDGF (Nordic countries), PIC (Barcelona), FNAL (Chicago), NIKHEF/SARA (Amsterdam), plus the Tier-2s.

