Definitions Information System Task Force, 8th January 2016

Presentation transcript:

Definitions Information System Task Force, 8th January 2016

What we are trying to define
For GLUE 2:
- Logical CPUs (GLUE2ExecutionEnvironmentLogicalCPUs)
- Benchmark Value (GLUE2BenchmarkValue)

Existing definitions in WLCG (I)
Definitions in the Installed Capacities document, e.g. GlueSubClusterLogicalCPUs:
- Total number of cores/hyperthreaded CPUs in the SubCluster. In other words, LogicalCPUs counts the number of computing units seen by the OS on the WNs.
- Sites typically configure one job slot per logical CPU, but some sites allow more than this (e.g. five jobs per four cores) to account for the time jobs spend waiting for I/O. The GLUE 1.3 schema does not allow such an over-allocation to be published explicitly.
- Note that LogicalCPUs is a static number manually configured by the system administrator at a site; it does not reflect the dynamic state of the WNs.
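A minimal sketch (not from the slides), assuming a Linux worker node, of counting the "computing units seen by the OS" that this definition refers to; the published GlueSubClusterLogicalCPUs value remains a static, admin-configured number:

```python
# Minimal sketch: count the logical CPUs the OS exposes on a Linux worker
# node by counting "processor" entries in /proc/cpuinfo. Illustrative only;
# sites publish a static value configured by the system administrator.

def count_logical_cpus(cpuinfo_path="/proc/cpuinfo"):
    with open(cpuinfo_path) as f:
        return sum(1 for line in f if line.startswith("processor"))

if __name__ == "__main__":
    print(count_logical_cpus())
```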

Existing definitions in WLCG (II)
What about benchmarks?
- At the time the Installed Capacities document was written, the transition from SpecInt2000 to HEP-SPEC06 was under way.
- The proposal is in any case to find the power per core, dividing the total score by the number of cores in the box.
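For concreteness, a hypothetical worked example of the per-core division (all numbers invented for illustration):

```python
# Hypothetical example: per-core benchmark power from a whole-box score.
# The figures are invented; real sites use their measured HS06 results.
total_hs06 = 160.0   # total HS06 score measured for the whole box
cores = 16           # number of cores in the box
per_core_hs06 = total_hs06 / cores
print(per_core_hs06)  # 10.0 HS06 per core
```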

More documentation referenced by WLCG
- How to define SI00, HEP-SPEC, MaxCPUTime, MaxWallClockTime.
- For historical reasons, even though a GLUE SubCluster is meant to be a homogeneous set of WNs, sites normally define one heterogeneous subcluster per cluster per CE and publish averaged WN information.
- This causes a double-counting problem (!). Can we get rid of this in GLUE 2?

Proposal by Andrew McNab [1]
GLUE2ExecutionEnvironmentLogicalCPUs: the number of single-process benchmark instances run when benchmarking the Execution Environment, corresponding to the number of processors which may be allocated to jobs. Typically this is the number of processors seen by the operating system on one Worker Node (that is, the number of "processor :" lines in /proc/cpuinfo on Linux), but it may be set to more or less than this for performance reasons. This value corresponds to the total number of processors which may be reported to APEL by jobs running in parallel in this Execution Environment, found by adding the values of the "Processors" keys in all of their accounting records.

GLUE2BenchmarkValue: the average benchmark result when a single-process benchmark instance is run for each processor which may be allocated to jobs. Typically the number of processors which may be allocated corresponds to the number seen by the operating system on the worker node (that is, the number of "processor :" lines in /proc/cpuinfo on Linux), but it may be set to more or less than this for performance reasons. This should be equal to the benchmark ServiceLevel in the APEL accounting record of a single-processor job, where the APEL "Processors" key will have the value 1.
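An illustrative cross-check of the accounting relation in [1] (the record format below is a simplified stand-in; real APEL records are not plain Python dicts):

```python
# Illustrative cross-check of definition [1]: LogicalCPUs should equal the
# sum of the "Processors" values over jobs that can run in parallel in the
# Execution Environment. Records here are hypothetical and simplified.

records = [          # accounting records for jobs filling one worker node
    {"Processors": 8},
    {"Processors": 8},
    {"Processors": 16},
]

logical_cpus = sum(r["Processors"] for r in records)
print(logical_cpus)  # 32, the value published as LogicalCPUs
```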

Proposal by Brian Bockelman [2]
GLUE2ExecutionEnvironmentLogicalCPUs: the number of single-process benchmark instances run when benchmarking the Execution Environment.

GLUE2BenchmarkValue: the average benchmark result when $(GLUE2ExecutionEnvironmentLogicalCPUs) single-threaded benchmark instances are run in the execution environment in parallel.
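A minimal sketch of the procedure in [2], with a stand-in workload; run_benchmark_instance is a hypothetical placeholder for one real benchmark copy (e.g. one HS06 run), which the toy loop below is not:

```python
# Sketch of [2]: run N single-threaded benchmark instances in parallel and
# average their scores. The workload and "score" are placeholders only.
import multiprocessing
import time

def run_benchmark_instance(_):
    start = time.perf_counter()
    x = 0
    for i in range(10_000_000):   # stand-in single-threaded workload
        x += i * i
    elapsed = time.perf_counter() - start
    return 1.0 / elapsed          # toy score: higher is faster

if __name__ == "__main__":
    n = multiprocessing.cpu_count()   # GLUE2ExecutionEnvironmentLogicalCPUs
    with multiprocessing.Pool(n) as pool:
        scores = pool.map(run_benchmark_instance, range(n))
    benchmark_value = sum(scores) / n  # GLUE2BenchmarkValue (the average)
    print(n, benchmark_value)
```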

Feedback from sys admins
Several sys admins were contacted and asked to:
1. Describe in as much detail as possible the HW you will use in this example, with the terminology you normally use. Add as many explanations and details as possible so we can understand your numbers and how you calculate things.
2. Read the definitions in [1] and provide the numbers for your example according to what you have understood: GLUE2ExecutionEnvironmentLogicalCPUs, GLUE2BenchmarkValue.
3. Do the same for [2]: GLUE2ExecutionEnvironmentLogicalCPUs, GLUE2BenchmarkValue.
4. Which definition do you prefer?
5. Is it crystal clear what needs to be provided after reading the definition? If not, what is unclear? What are you missing?
Very useful feedback from Steve Jones, Alessandra Doria and Manfred Alef. Thanks very much!

Feedback from Steve Jones
- Both definitions give the same figures.
- Preference for definition [1], with the benchmarking process properly documented.
  - See the attachment in Indico with the benchmarking process description.
- Very useful GridPP publishing tutorial; we should have something similar!
  - It contains references to accounting and includes other attributes that may also be relevant to define in this context, e.g. scaling factors.
- Notes on how these attributes are published at Liverpool:
  - Both GLUE 1 and GLUE 2 are published in the same way.
  - 3 separate clusters: CREAM/TORQUE (YAIM), ARC/CONDOR (manual) and VAC (manual).
  - All clusters contain heterogeneous resources.
  - All nodes are scaled to a baseline machine (the reference to which real nodes are scaled).
  - Average values of logical CPUs and HEPSPEC for the nodes inside each cluster are published.
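As a hypothetical sketch of the "scale to a baseline machine" idea mentioned above: each node type gets a scaling factor equal to its per-core benchmark divided by the baseline's, so batch times can be normalised to baseline units. All figures below are invented:

```python
# Hypothetical illustration of scaling heterogeneous nodes to a baseline
# machine. Node names and HS06 figures are invented for illustration.
baseline_hs06_per_core = 10.0

node_types = {            # measured per-core HS06 per hardware generation
    "old_amd": 8.0,
    "intel_e5": 11.5,
    "intel_gold": 14.0,
}

for name, hs06 in node_types.items():
    scaling_factor = hs06 / baseline_hs06_per_core
    # e.g. 1 hour of CPU on "intel_gold" counts as 1.4 baseline-hours
    print(f"{name}: scaling factor {scaling_factor:.2f}")
```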

Feedback from Alessandra Doria
- Both definitions give the same figures.
- Both definitions are clear, but she would make some changes:
  - The definitions focus on the benchmarks; benchmarks are important but do not determine the number of logical CPUs.
  - Good to clarify that benchmarks must be run for the chosen number of processors which may be allocated to jobs, as done in [1].
  - Good to make references to APEL.
- Notes on how these attributes are published in Napoli:
  - Both GLUE 1 and GLUE 2 are published.
  - 2 separate clusters/subclusters of PBS/CREAM.
  - Both clusters contain heterogeneous resources.
  - Average values of logical CPUs and HEPSPEC for the nodes inside each cluster are published.
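Since several sites report averaging over heterogeneous nodes, here is a minimal sketch (invented numbers) of how such cluster averages can be computed, weighting per-core benchmarks by core counts:

```python
# Minimal sketch: averaged LogicalCPUs and per-core HEPSPEC over a
# heterogeneous cluster. Node counts and scores are invented.
nodes = [   # (number of nodes, logical CPUs per node, HS06 per core)
    (50, 16, 9.0),
    (30, 32, 11.0),
]

total_nodes = sum(n for n, _, _ in nodes)
total_cores = sum(n * cpus for n, cpus, _ in nodes)

avg_logical_cpus = total_cores / total_nodes
# Weight per-core scores by the number of cores of each node type:
avg_hs06_per_core = sum(n * cpus * hs06 for n, cpus, hs06 in nodes) / total_cores

print(avg_logical_cpus, avg_hs06_per_core)
```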

Feedback from Manfred Alef
- Both definitions give the same figures.
- Preference for definition [1], with some minor modifications.
- Notes on how these attributes are published at KIT:
  - Only GLUE 1 published so far.
  - 8 CEs/clusters/subclusters of CREAM/SGE; in reality, 1 single heterogeneous cluster.
  - Average values of logical CPUs and HEPSPEC for the nodes inside the cluster are published.

Next steps
- What about writing something like the GridPP publishing tutorial for GLUE 2?
  - We should check whether it is also suitable for non-GridPP sites.
  - We will use the definitions for logical CPUs and benchmark agreed within the TF.
- Could we aim at publishing one Execution Environment per set of homogeneous resources? Or do we let sites decide what they prefer to publish? Are average values over heterogeneous nodes good enough?
- Do we also need to document the benchmarking process, or is this done (or could it be done) by each site in a different way?
- Then we can run a validation campaign.
  - We need to define validation criteria for this! (See the sketch after this list for the kind of check such criteria might translate into.)
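Purely as a starting point for discussion, a hypothetical sketch of sanity checks that validation criteria might translate into; the thresholds are invented, not agreed values:

```python
# Hypothetical sanity checks for published GLUE 2 values, as a seed for
# validation criteria. All thresholds are invented for illustration.
def validate(logical_cpus, benchmark_value):
    errors = []
    if not isinstance(logical_cpus, int) or logical_cpus <= 0:
        errors.append("LogicalCPUs must be a positive integer")
    # Assumed plausible per-core HS06 range for 2016-era hardware:
    if not (1.0 <= benchmark_value <= 50.0):
        errors.append("BenchmarkValue outside plausible HS06-per-core range")
    return errors

print(validate(32, 10.5))   # [] -> passes
print(validate(0, 400.0))   # two violations reported
```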