CHEP 2000: Smart Resource Management Software in High Energy Physics. Wolfgang Gentzsch and Lothar Lippert, Gridware GmbH & Inc. Padua, 9 February 2000.

Resource Management with CODINE / GRD (CHEP 2000). Outline: technical requirements and features, or what we offer to help HEP computing; Gridware, the company: technology leader in resource management; a special offer to the HEP community: our answer to falling hardware prices.

Technical Requirements and Features: array jobs; advanced queue concept; policy management; separation of components; solutions for mixing interactive and batch; simplified system administration; AFS support; CORBA interface; all “classic” features; availability.

Array Jobs: #!/bin/sh ... A single submit command for thousands of similar jobs. Example: qsub -t :1 jobscript.sh creates 1000 instances of a single job. The whole array, or part of it, can be manipulated (deleted, suspended, ...) with one command. Unlimited number of instances.
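
The array-job idea can be sketched in Python under several assumptions. The range syntax `n-m:s` follows the `qsub -t n[-m[:s]]` form documented for Grid Engine, CODINE's later descendant; the `1-1000:1` range below is an assumption matching the 1000 instances described above, since part of the slide's range is lost. Each instance is assumed to find its index in the `SGE_TASK_ID` environment variable, as in Grid Engine, and the input-file naming (`input_for_task`) is hypothetical.

```python
import os
from typing import List


def expand_task_range(spec: str) -> List[int]:
    """Expand a task range "n-m:s" (or "n-m", or "n") into task IDs."""
    # Split off the optional step size after ":".
    range_part, _, step_part = spec.partition(":")
    step = int(step_part) if step_part else 1
    # Split off the optional upper bound after "-".
    first_part, _, last_part = range_part.partition("-")
    first = int(first_part)
    last = int(last_part) if last_part else first
    if first < 1 or last < first or step < 1:
        raise ValueError(f"invalid task range: {spec!r}")
    return list(range(first, last + 1, step))


def input_for_task(task_id: int) -> str:
    """Hypothetical naming scheme: each array task processes its own input file."""
    return f"events_{task_id:04d}.dat"


def current_task_id() -> int:
    """Task index the batch system is assumed to export (SGE_TASK_ID in Grid Engine)."""
    return int(os.environ.get("SGE_TASK_ID", "1"))


if __name__ == "__main__":
    tasks = expand_task_range("1-1000:1")
    print(len(tasks), tasks[0], tasks[-1])  # 1000 1 1000
    print(input_for_task(current_task_id()))
```

One submit command stands for all instances; the instances differ only in the index each one reads, which is what lets a single command delete or suspend the whole array or a sub-range of it.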

Advanced Queue Concept. With the “Emergency Room Concept”, the whole cluster can be addressed; soft requests are supported; no queue sits empty while others are overfull; each host can be treated with different policies; users just request resources; efficiency is higher. Example: qsub -l mem_free=10M jobscript.sh. With the “Grocery Store Concept”, the cluster is split; queues may run empty; users have to choose a queue; a job has to wait in line even if other resources are unused. Example: qsub -q 10MQ jobscript.sh. [Diagram: a job dispatched to the cluster as a whole vs. jobs lined up in queues Q1 and Q2.]
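
The contrast between the two concepts can be sketched as two dispatch functions. This is a simplified illustration, not CODINE/GRD's dispatcher: the host names, the `Host` class, and the binary interpretation of "10M" are assumptions; only the queue name `10MQ` and the `mem_free` request come from the slide.

```python
from dataclasses import dataclass
from typing import List, Optional

# Binary unit multipliers for memory strings such as "10M" (assumed convention).
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}


def parse_mem(value: str) -> int:
    """Convert a memory string such as "10M" into bytes."""
    suffix = value[-1].upper()
    if suffix in UNITS:
        return int(float(value[:-1]) * UNITS[suffix])
    return int(value)


@dataclass
class Host:
    name: str
    queue: str      # the queue this host serves in the queue-bound model
    mem_free: int   # free memory in bytes
    busy: bool = False


def dispatch_by_request(hosts: List[Host], mem_request: str) -> Optional[Host]:
    """'Emergency room': the job states what it needs; any idle host that fits will do."""
    need = parse_mem(mem_request)
    for host in hosts:
        if not host.busy and host.mem_free >= need:
            return host
    return None


def dispatch_by_queue(hosts: List[Host], queue: str) -> Optional[Host]:
    """'Grocery store': the job waits in the line the user picked, even if others are idle."""
    for host in hosts:
        if host.queue == queue and not host.busy:
            return host
    return None


if __name__ == "__main__":
    cluster = [
        Host("node1", "10MQ", parse_mem("16M"), busy=True),
        Host("node2", "64MQ", parse_mem("64M")),
    ]
    chosen = dispatch_by_request(cluster, "10M")
    print(chosen.name if chosen else None)      # node2
    print(dispatch_by_queue(cluster, "10MQ"))   # None: the job has to wait
```

With the same two hosts, the request-based dispatcher places the job on the idle `node2`, while the queue-bound one leaves it waiting behind the busy `node1`.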

Policy Management: fair share; override system (temporarily boosts a project, job, group, or department). [Chart: share utilization over time; raising a group's share makes its jobs execute earlier. Example shares: Group1 20%, Group2 30%, Group3 50%.]
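
A fair-share policy with an override can be sketched as follows. This is a deliberately simplified deficit-based scheme chosen for illustration, not the algorithm CODINE/GRD actually uses: each group's priority is its target share minus the share of past usage it has consumed, plus any temporary override boost. The 20/30/50 split comes from the slide; the usage figures and boost size are hypothetical.

```python
from typing import Dict, Optional

# Target shares taken from the slide's example split.
TARGET_SHARES = {"Group1": 0.20, "Group2": 0.30, "Group3": 0.50}


def next_group(usage: Dict[str, float],
               overrides: Optional[Dict[str, float]] = None) -> str:
    """Pick the group whose jobs should run next.

    Priority = target share - share of usage consumed so far + override boost.
    A group that has used less than its share rises; a temporary override
    lifts a group regardless of its history.
    """
    total = sum(usage.values())
    boosts = overrides or {}

    def priority(group: str) -> float:
        used = usage.get(group, 0.0) / total if total > 0 else 0.0
        return TARGET_SHARES[group] - used + boosts.get(group, 0.0)

    return max(TARGET_SHARES, key=priority)


if __name__ == "__main__":
    history = {"Group1": 10.0, "Group2": 10.0, "Group3": 80.0}  # illustrative CPU hours
    print(next_group(history))                              # Group2
    print(next_group(history, overrides={"Group1": 0.5}))   # Group1
```

Group3 has consumed far more than its 50% target, so the under-served Group2 goes first; an override on Group1 jumps it ahead until the boost is removed.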

Separation of Components: separating the master from the scheduler brings scalability, high performance, good response time, and faster job placement.
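
The master/scheduler split can be sketched as a producer and a consumer. In this illustration, in-process threads stand in for what would be separate daemons; the `Master` class, the `scheduler_loop` function, the host names, and the round-robin placement are all hypothetical placeholders. The point shown is that submission returns immediately while placement runs independently.

```python
import queue
import threading
from typing import Dict, List

_STOP = object()  # sentinel telling the scheduler to exit


class Master:
    """Hypothetical master: accepts jobs and keeps state, but makes no placement decisions."""

    def __init__(self) -> None:
        self.pending: "queue.Queue[object]" = queue.Queue()
        self.placements: Dict[str, str] = {}
        self._lock = threading.Lock()

    def submit(self, job_id: str) -> None:
        # Returns immediately: submission is never held up by scheduling work.
        self.pending.put(job_id)

    def record(self, job_id: str, host: str) -> None:
        with self._lock:
            self.placements[job_id] = host

    def shutdown(self) -> None:
        self.pending.put(_STOP)


def scheduler_loop(master: Master, hosts: List[str]) -> None:
    """Hypothetical scheduler running apart from the master; round-robin is a placeholder policy."""
    turn = 0
    while True:
        job_id = master.pending.get()
        if job_id is _STOP:
            break
        master.record(job_id, hosts[turn % len(hosts)])
        turn += 1


def run_demo(n_jobs: int = 5) -> Dict[str, str]:
    master = Master()
    scheduler = threading.Thread(target=scheduler_loop, args=(master, ["node1", "node2"]))
    scheduler.start()
    for n in range(n_jobs):
        master.submit(f"job{n}")
    master.shutdown()
    scheduler.join()
    return master.placements


if __name__ == "__main__":
    # {'job0': 'node1', 'job1': 'node2', 'job2': 'node1', 'job3': 'node2', 'job4': 'node1'}
    print(run_demo())
```

Because the only link between the two parts is the pending queue, a slow or more elaborate scheduling policy delays placement but not submission or status handling in the master.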

Simplified System Administration: no daemon restarts necessary; machines can be added on the fly; the entire cluster can be installed from one workstation; no submit daemons or configuration needed on clients; an optimized architecture provides reliability; configuration changes without any pain.

What else? CORBA interface; AFS support; all “classic” features (accounting, monitoring, suspension, sensors, ...); interactive vs. batch (time windows, automatic suspension, migration, ...); availability on all leading UNIX platforms.

The Company: GENIAS, based in Germany, with European Union funded projects and R&D; Chord, a company located in California and a leader in sales of resource management systems (RMS). Technology leader in resource management. Goal: make CODINE the world standard in resource management.

Our Experience: EU-funded research projects (REMUS, UNICORE, ...); research & development (DESY Zeuthen, a long relationship; CASPUR, recently switched to CODINE; MPI, the Max Planck Institutes; ...); industry (BMW, SAAB, SIEMENS, ...).

Contact Us