Hepmark - Evaluation of the computing power of worker nodes in HEP. Michele Michelotto, Padova, Ferrara, Bologna.

CdS - Luglio 2008 michele michelotto - INFN PD2 Computing model [diagram: the tiered LHC computing model - CERN Tier 0; Tier-1 centres in Germany, UK, France, Italy, Japan, USA (BNL, FNAL); Tier-2 centres at labs and universities; Tier-3 physics-department desktops; grids for a regional group and for a physics study group]. Centres at various levels.

CdS - Luglio 2008 michele michelotto - INFN PD3 Computing requirements. Tape and disk storage - very easy: events, Terabytes. Disk storage - easy again: events, Terabytes - (1000x1000 or 1024x1024?) - RAID-protected or raw size? Computing power - tricky: events/sec? Sim or Reco? - MIPS, CernUnit, MHz, SPEC, SI2K…
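As a reminder of why the "1000x1000 or 1024x1024?" question matters, the two conventions differ by roughly 10%:

```latex
% Decimal terabyte vs binary tebibyte
1\ \mathrm{TB} = 10^{12}\ \mathrm{bytes}, \qquad
1\ \mathrm{TiB} = 2^{40}\ \mathrm{bytes} \approx 1.0995 \times 10^{12}\ \mathrm{bytes}
\quad\Rightarrow\quad \frac{1\ \mathrm{TiB}}{1\ \mathrm{TB}} \approx 1.10
```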

CdS - Luglio 2008 michele michelotto - INFN PD4 SI2K is the benchmark used up to now to measure the computing power for all the HEP experiments - the computing power requested by an experiment - the computing power provided by a Tier-[0,1,2] site. SI2K is the nickname for the SPEC CPU Int 2000 benchmark - it came after SPEC89, SPEC Int 92 and SPEC Int 95 - it was declared obsolete by SPEC in 2006 - and replaced by SPEC CPU Int 2006.

CdS - Luglio 2008 michele michelotto - INFN PD5 T1 + T2 CPU budget for LHC [chart], measured in kSI2K.

CdS - Luglio 2008 michele michelotto - INFN PD6 The SI2K inflation. The main problem with SI2000 in our community: it is no longer proportional to the performance of HEP codes (as it used to be). You can buy processors with a huge SI2K number but a much smaller increase in real performance; SI2K results for the latest generation of processors are affected by inflation.

CdS - Luglio 2008 michele michelotto - INFN PD7 Nominal SI vs real SI. So CERN (and FZK) started to use a new currency: SI2K measured with gcc, the GNU C compiler, using two flavours of optimization - high tuning: gcc -O3 -funroll-loops -march=$ARCH - low tuning: gcc -O2 -fPIC -pthread.
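As an illustration of the two tuning flavours, here is a minimal sketch that builds the corresponding gcc command lines; the source file name and the -march target are placeholders, not the official CERN/FZK configuration:

```python
# Minimal sketch, not the official CERN/FZK SPEC configuration: build the gcc
# command lines for the two tuning flavours described above. The source file
# name and the -march target are placeholders.
import subprocess

TUNINGS = {
    "low":  ["-O2", "-fPIC", "-pthread"],
    "high": ["-O3", "-funroll-loops", "-march=nocona"],  # example -march value
}

def compile_benchmark(source, output, tuning="low"):
    """Compile one benchmark source with the chosen tuning flavour."""
    cmd = ["gcc", *TUNINGS[tuning], "-o", output, source]
    print("running:", " ".join(cmd))
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    compile_benchmark("bench.c", "bench_low", "low")
    compile_benchmark("bench.c", "bench_high", "high")
```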

CdS - Luglio 2008 michele michelotto - INFN PD8 CERN proposal: use as site rating the "real SI", i.e. the SI measured with gcc-low and increased by 50% - actually this makes sense only for a short period of time and for the latest generation of processors. Run n copies in parallel - where n is the number of cores in the worker node - to take into account the drop in performance of a multicore machine when fully loaded (see the sketch below).
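A minimal sketch of the "n parallel copies" idea, assuming a generic benchmark binary: launch one copy per core and report the wall-clock time of the slowest one.

```python
# Minimal sketch of the "run n parallel copies" proposal: launch one copy of a
# benchmark per core and report the wall-clock time of the slowest copy.
# The benchmark command below is a placeholder.
import os
import subprocess
import time

def run_parallel_copies(cmd, n=None):
    """Run n concurrent copies of cmd; return the elapsed wall-clock time in seconds."""
    n = n or os.cpu_count()            # default: one copy per core, as in the proposal
    start = time.perf_counter()
    procs = [subprocess.Popen(cmd) for _ in range(n)]
    for p in procs:
        p.wait()
    return time.perf_counter() - start

if __name__ == "__main__":
    elapsed = run_parallel_copies(["./bench_low"])
    print(f"slowest of {os.cpu_count()} copies finished after {elapsed:.1f} s")
```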

CdS - Luglio 2008 michele michelotto - INFN PD9 Too many SI2K. Take as an example a worker node with two Intel Woodcrest dual-core 5160 at 3.06 GHz: SI2K nominal: 2929 - 3089 (min - max); SI2K sum on 4 cores: ; SI2K gcc-low: 5523; SI2K gcc-high: 7034; SI2K gcc-low + 50%: 8284. The goal is to find a commercially maintained benchmark to replace SI2K.
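The "+50%" site rating quoted above is just a worked multiplication:

```latex
% "Real SI" site rating for the Woodcrest 5160 node quoted above
\mathrm{SI2K}_{\text{site}} = 1.5 \times \mathrm{SI2K}_{\text{gcc-low}}
                            = 1.5 \times 5523 \approx 8284
```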

CdS - Luglio 2008 michele michelotto - INFN PD10 Cache: importance of the cache architecture - 1st level, 2nd level, 3rd level, cache latency (time), cache bandwidth (transfer speed), shared or exclusive? Access time to memory. Power consumption - example: a big Tier-2 with 500 boxes needs 100 kW - about 800 MWh in one year - at an energy cost of 0.12 Euro per kWh that is an energy bill of 100 kEuro/year - a 10% improvement in power efficiency means 10 kEuro/year of savings - plus savings on the infrastructure (power distribution, UPS, cooling).

CdS - Luglio 2008 michele michelotto - INFN PD11 Many gaps. Difficult to measure: - not easy to have machines on loan from server resellers or manufacturers - not easy to borrow machines from colleagues - always for short periods of time - a SPEC run can last hours. We need a set of dedicated worker nodes to make SPEC and HEP application measurements.

CdS - Luglio 2008 michele michelotto - INFN PD12 Padova: Michele Michelotto (1° Tecn.) 0.70, Matteo Menguzzato (Univ) 0.40; Ferrara: Alberto Gianoli (1° Tecn.) 0.20; Bologna: Franco Brasolin (CTER) 0.20; total 1.5 FTE. Milestones - 2009: understand SPEC, propose a new benchmark to replace SI2K, measure the performance of the current architectures for Monte Carlo SIM (evt/sec vs SPEC); 2009/2010: power performance, cache profiling. [Funding table: internal and foreign travel, consumables, inventory, totals for FE and PD - values not recoverable.]

CdS - Luglio 2008 michele michelotto - INFN PD13 Memory: Intel vs AMD. Who is faster? It depends on the block size: in the red zones Intel is better, in the green zones AMD is better.

CdS - Luglio 2008 michele michelotto - INFN PD14 Cache behaviour. The 54xx has lower latency even with a bigger cache. The three processors behave very differently in the 4 MB to 64 MB range: if your (HEP) application works in this range you will see a big change of performance when changing processor.
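A rough illustration (not the tool used for these measurements) of the working-set effect described above: chase a random permutation whose footprint sweeps across the 4 MB to 64 MB region and watch the average access time jump once the cache is exceeded. The footprint estimate in CPython is only approximate.

```python
# Illustrative sketch: estimate the average memory access time as a function of
# working-set size by chasing a random permutation. In CPython the real
# footprint per list slot is larger than the 8 bytes assumed here.
import random
import time

def chase_time_ns(size_bytes, accesses=1_000_000):
    """Average time per dependent access over a random cycle of roughly size_bytes."""
    n = max(size_bytes // 8, 2)
    perm = list(range(n))
    random.shuffle(perm)
    nxt = [0] * n
    for i in range(n):                  # build one big cycle: perm[i-1] -> perm[i]
        nxt[perm[i - 1]] = perm[i]
    idx, t0 = 0, time.perf_counter()
    for _ in range(accesses):
        idx = nxt[idx]                  # each access depends on the previous one
    return (time.perf_counter() - t0) / accesses * 1e9

if __name__ == "__main__":
    for mb in (1, 4, 16, 64):
        print(f"{mb:3d} MB working set: {chase_time_ns(mb * 2**20):6.1f} ns/access")
```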

CdS - Luglio 2008 michele michelotto - INFN PD15 CMS SIM and Pythia. The CMS Monte Carlo simulation (32 bit) and Pythia (64 bit) show the same performance once normalized. Both SPEC Int 2006 published and SPEC Int 2006 with gcc show the same behaviour. SI2K published does not match the HEP software; SI2K-CERN is better, but not as good as SI2006.

CdS - Luglio 2008 michele michelotto - INFN PD16 BaBar Tier-A results. If you normalize by core and clock, all new processors have the same performance, doubling the older generation of CPUs. SI2006 matches this pattern (published-to-gcc ratio constant); SI2000-CERN is better than SI2K nominal; SI2000 clearly doesn't work.
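The per-core, per-clock normalization mentioned above is simply:

```latex
% "Normalize by core and clock": compare machines on a per-core, per-GHz basis
\mathrm{perf}_{\mathrm{norm}} =
  \frac{\mathrm{events/s}}{N_{\mathrm{cores}} \times f_{\mathrm{GHz}}}
```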

CdS - Luglio 2008 michele michelotto - INFN PD17 4 core processor

CdS - Luglio 2008 michele michelotto - INFN PD18 Intel 54xx

CdS - Luglio 2008 michele michelotto - INFN PD19 AMD 4core

CdS - Luglio 2008 michele michelotto - INFN PD20 Transactional load (comparison between processors). Performance doesn't drop in the new 4-core processors; Clovertown drops with respect to Harpertown; a dual-core processor keeps up only to Load 3.

CdS - Luglio 2008 michele michelotto - INFN PD21 Perf/watt. The AMD Barcelona at 65 nm has performance per watt similar to the Intel Xeon at 45 nm.

CdS - Luglio 2008 michele michelotto - INFN PD22 Cache behaviour. The 54xx has lower latency even with a bigger cache. The three processors behave very differently in the 4 MB to 64 MB range: if your (HEP) application works in this range you will see a big change of performance when changing processor.

CdS - Luglio 2008 michele michelotto - INFN PD23 Memory: Intel vs AMD. Access times are very similar. At 1 GB (the typical footprint of a HEP application) the new AMD behaves better, but the new Xeon 54xx are much better than the 53xx.

CdS - Luglio 2008 michele michelotto - INFN PD24 Memory: Intel vs AMD. Who is faster? It depends on the block size: in the red zones Intel is better, in the green zones AMD is better.

CdS - Luglio 2008 michele michelotto - INFN PD25 Cache behaviour. We need to study the behaviour of typical HEP applications - simulation, event generation, reconstruction, analysis - to understand how to write more efficient applications.

CdS - Luglio 2008 michele michelotto - INFN PD26 Power issues. Power consumption changes from one processor to another - clock, high-k dielectric, active power management, clock throttling.

CdS - Luglio 2008 michele michelotto - INFN PD27 An HEP data center. We need to measure the power usage of HEP applications. Example: a big Tier-2 with 500 boxes needs 100 kW - like the whole computing centre (CED) of INFN Padova - about 800 MWh in one year - at an energy cost of 0.12 Euro per kWh that is an energy bill of 100 kEuro/year - a 10% improvement in power efficiency means 10 kEuro/year of savings - plus savings on the infrastructure (power distribution, UPS, cooling).
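A worked version of the arithmetic behind these figures, assuming roughly 200 W per box and about 8000 hours of operation per year (both are assumptions, not stated on the slide):

```latex
% Back-of-the-envelope check of the Tier-2 power figures quoted above
P \simeq 500 \times 200\ \mathrm{W} = 100\ \mathrm{kW}
\qquad
E \simeq 100\ \mathrm{kW} \times 8000\ \mathrm{h} = 800\ \mathrm{MWh/year}
% Cost and potential savings
800\,000\ \mathrm{kWh} \times 0.12\ \mathrm{Euro/kWh} \approx 96\ \mathrm{kEuro/year}
  \approx 100\ \mathrm{kEuro/year}
\qquad
10\%\ \text{efficiency gain} \Rightarrow \approx 10\ \mathrm{kEuro/year\ saved}
```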

CdS - Luglio 2008 michele michelotto - INFN PD28 Financial request. We need to buy a new worker node each time a new processor is released in the dual-processor market segment - only if significantly new features are present - one or two each for Intel and AMD per year - 4 kEuro each (dual processor, 2 GB/core, 1 disk) - 2 boxes to start with.

CdS - Luglio 2008 michele michelotto - INFN PD29 Transition problem. It is impossible to find published SPEC Int 2000 results for the new processors (e.g. the not-so-new Clovertown 4-core), and impossible to find published SPEC Int 2006 results for old processors (before 2006) - e.g. old P4 Xeon, P4, AMD 2xx. You can't convert from SI2000 to SI2006, but the ratio for the x86 architecture is in the 137 - 172 range.
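As an indicative (not official) use of that ratio range, an SI2000 number can only bracket an SI2006 estimate:

```latex
% Indicative bounds only: the 137-172 ratio quoted above is not an official conversion
\frac{\mathrm{SI2K}}{172} \;\lesssim\; \mathrm{SI2006}_{\mathrm{est}} \;\lesssim\; \frac{\mathrm{SI2K}}{137}
\qquad
\text{e.g. } \mathrm{SI2K} = 3000 \;\Rightarrow\; \mathrm{SI2006}_{\mathrm{est}} \approx 17\text{--}22
```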

CdS - Luglio 2008 michele michelotto - INFN PD30 Even more. Actually all the gcc results in the previous slides are on i386 (32 bit); if you would like to know how your code runs on a 64-bit machine, you can measure SPEC Int 2000 with gcc on x86_64. For the worker node with two Intel Woodcrest dual-core 5160 at 3.06 GHz: SI2K nominal: 2929 - 3089 (min - max); SI2K on 4 cores: ; SI2K gcc-low: 6021; SI2K gcc-high: 6409; SI2K gcc-low + 50%: 9031.

CdS - Luglio 2008 michele michelotto - INFN PD31 ATLAS. Here 100% is the Xeon 5160. Few results for SI2006+gcc, but no difference from CMS and BaBar; few results also for SI2006 published, because of several old architectures. SI2K+gcc is not bad; SI2K published heavily overestimates the new Xeon. The normalized ATLAS simulation performs the same on the new Intel Core or AMD Opteron (like CMS and BaBar).

CdS - Luglio 2008 michele michelotto - INFN PD32 Power consumption

CdS - Luglio 2008 michele michelotto - INFN PD33 Power meter. We need a device to measure voltage and current, with logging capabilities, e.g. a Fluke 1735.
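A small sketch of how the logged data could be post-processed into average power and energy; the CSV column names are a hypothetical export format, not the actual Fluke 1735 file layout.

```python
# Sketch of post-processing a power-meter log. The CSV layout
# (timestamp_s, volts, amps, power_factor) is a hypothetical export format,
# not the actual Fluke 1735 file format.
import csv

def summarize_log(path):
    """Return (average power in W, energy in kWh) from a logged measurement file."""
    times, powers = [], []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            times.append(float(row["timestamp_s"]))
            powers.append(float(row["volts"]) * float(row["amps"]) * float(row["power_factor"]))
    avg_w = sum(powers) / len(powers)
    duration_h = (times[-1] - times[0]) / 3600.0
    return avg_w, avg_w * duration_h / 1000.0

if __name__ == "__main__":
    watts, kwh = summarize_log("worker_node_load.csv")
    print(f"average power {watts:.0f} W, energy {kwh:.2f} kWh")
```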

CdS - Luglio 2008 michele michelotto - INFN PD34 FZK measurement. In 2001 SPEC with gcc was 80% of the average published value; in 2006 the gap was much wider.

CdS - Luglio 2008 michele michelotto - INFN PD35 Which is better? I started to measure the performance of HEP codes on several machines. The goal was to find a commercially maintained benchmark to replace SI2K. I compared the HEP code with - SI2K published results - SI2K measured with gcc and the CERN tuning - SI2006 and SI2006 rate published results - SI2006 and SI2006 with gcc4 (32 and 64 bit) - as sketched below.
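A sketch of the comparison referenced above: for each candidate benchmark, check how constant the ratio of HEP events/s to benchmark score stays across machines (all numbers below are placeholders, not measured values).

```python
# Sketch of the comparison: a good replacement benchmark keeps the ratio
# (HEP events/s) / (benchmark score) nearly constant across machines.
# All numbers below are placeholders, not measured values.
machines = {
    # name: (HEP events/s, SI2K published, SI2006 gcc)
    "woodcrest_5160": (10.0, 3000.0, 60.0),
    "opteron_2218":   ( 7.5, 2100.0, 45.0),
    "clovertown":     ( 9.0, 2600.0, 54.0),
}

def spread(benchmark_index):
    """Max/min of the events-per-score ratio; 1.0 means the benchmark tracks HEP code perfectly."""
    ratios = [hep / scores[benchmark_index] for hep, *scores in machines.values()]
    return max(ratios) / min(ratios)

if __name__ == "__main__":
    for i, name in enumerate(("SI2K published", "SI2006 gcc")):
        print(f"{name}: ratio spread = {spread(i):.2f}")
```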

CdS - Luglio 2008 michele michelotto - INFN PD36 Cache. In the '80s latency was 3-10 clock cycles; now latency is 1000s of clock cycles. Importance of the cache architecture - 1st level, 2nd level, 3rd level - cache latency (time) - cache bandwidth (transfer speed) - shared or exclusive?