Use of Condor in our Campus Grid and the University
Dr. David Wallom
September 2004

2 Outline
The University of Bristol Grid (UoBGrid).
The UoBGrid Resource Broker.
Users & their environment.
Problems encountered.
Other Condor use within Bristol.
Summary.

3 The UoBGrid
Planned for 1000+ CPUs, from 1.2 to 3.2 GHz, arranged in 7 clusters and 3+ Condor pools located in 4 different departments.
Core services run on individual servers, e.g. the Resource Broker and MDS.

4 The UoBGrid System Layout

5 The UoBGrid, now
Currently 270 CPUs in 3 clusters and 1 Windows Condor pool.
Central services run on 2 beige boxes in my office.
The Windows Condor pool is currently limited to a single student open-access area.
Only two departments (Physics and Chemistry) are fully engaged so far, though more are on their way.
The remaining large clusters are still on legacy versions of their operating systems; a university-wide upgrade programme has started.

6 Middleware
Virtual Data Toolkit:
– Chosen for stability.
– Platform-independent installation method.
– Widely used in other European production grid systems.
Contains the standard Globus Toolkit version 2.4 with several enhancements.
Also:
– GSI-enhanced OpenSSH.
– MyProxy client and server.
Has a defined support structure.

7 Condor-G Resource Broker
Uses the Condor-G matchmaking mechanism with Grid resources.
Matchmaking is set to run as soon as a job appears.
A custom script determines resource status and priority.
Integrates the Condor resource description mechanism (ClassAds) with the Globus Monitoring and Discovery Service (MDS).
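The slides do not include a user-side submit file, but a minimal Condor-G submit description for this brokered setup, assuming an illustrative executable and file names, could look like the sketch below; the $$(gatekeeper_url) substitution pulls the gatekeeper from the matched resource ClassAd shown on slide 9:

  # Hypothetical Condor-G submit file for a brokered job
  universe        = globus
  # Filled in by matchmaking from the matched resource ad
  globusscheduler = $$(gatekeeper_url)
  executable      = charge_sim
  arguments       = input.dat
  output          = charge_sim.out
  error           = charge_sim.err
  log             = charge_sim.log
  # Only match resources advertising the Intel compiler (see slide 9)
  requirements    = (TARGET.INTEL_COMPILER =?= True)
  queue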

8 Resource Broker Operation

9 Information Passed into Resource ClassAd
MyType = "Machine"
TargetType = "Job"
Name and gatekeeper URL depend on the resource system name and the installed scheduler, as a system may easily have more than one jobmanager installed:
Name = "grendel.chm.bris.ac.uk-pbs"
gatekeeper_url = "grendel.chm.bris.ac.uk/jobmanager-pbs"
Require the Globus universe, check the number of nodes in the cluster, and set a maximum number of jobs matched to a particular resource:
Requirements = (CurMatches < 5) && (TARGET.JobUniverse == 9)
WantAdRevaluate = True
Time the ClassAd was constructed:
UpdateSequenceNumber =
Currently hard-coded in the ClassAd:
CurMatches = 0
System information retrieved from Globus MDS, for the head node only, not the workers:
OpSys = "LINUX"
Arch = "INTEL"
Memory = 501
Installed software, defined in a resource broker file for each resource:
INTEL_COMPILER = True
GCC3 = True
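The slides do not say how this hand-built ad is injected into the Condor-G matchmaker; a common Condor approach, offered here as an assumption rather than the broker's documented method, is for the status script to write the ad to a file and push it with condor_advertise:

  # Publish the resource ad to the central manager as a startd-style ad
  # (grendel-pbs.ad and cm.grid.bris.ac.uk are illustrative assumptions)
  condor_advertise -pool cm.grid.bris.ac.uk UPDATE_STARTD_AD grendel-pbs.ad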

10 Possible extensions to Resource Information
Resource state information (LoadAvg, Claimed, etc.):
– How is this defined for a cluster? Maybe Condor-G could introduce new states of "percentage full"?
Number of CPUs and free disk space:
– How do you define these for a cluster? Is the number of CPUs per worker or for the whole system? The same question applies to disk space.
Cluster performance (MIPS, KFlops):
– This is not commonly worked out for small clusters, so it would need to be hardwired in, but it could be very useful for ranking resources (see the sketch below).
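As a rough illustration of that last point (my assumption, not something the slides specify), a broker- or job-side ClassAd could rank matched resources on such figures once they were advertised:

  # Hypothetical Rank expression, reusing the attribute names from slide 9 and
  # assuming a KFlops figure has been added to each resource ad
  Rank = 0.001 * TARGET.KFlops + TARGET.Memory
  # If KFlops is not advertised the expression evaluates to UNDEFINED, so this
  # only becomes useful once the extension above is in place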

11 Results of condor_status
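The original slide showed a screenshot that is not reproduced in this transcript. For orientation, the advertised resource ads could be listed with something like the following (the pool host name is an illustrative assumption):

  # Summary view of all resource ads known to the matchmaker
  condor_status -pool cm.grid.bris.ac.uk
  # Pull out selected attributes from the hand-built ads
  condor_status -pool cm.grid.bris.ac.uk -format "%s  " Name -format "%s\n" gatekeeper_url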

12 Load Management
Only the raw numbers of jobs running, idle and held (for whatever reason) are considered.
There is little measure of the relative performance of nodes within the grid; it is currently based on:
– Head-node processor type and memory.
– The MDS value of nodeCount for the jobmanager (this is not always the same as the real number of worker nodes).
Jobs are currently submitted only to a single queue on each resource.

13 What is currently running, and how do I find out?
A simple interface to condor_q.
Planning to use the Condor Job Monitor once installed, because of scalability issues with condor_q.
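The slide does not show the interface itself; a minimal sketch of a thin wrapper around condor_q, assuming it simply reports Globus-universe jobs and optionally filters by user, might be:

  #!/bin/sh
  # Hypothetical "what is running?" helper around condor_q
  # Show Condor-G (Globus universe) job status for all users
  condor_q -globus
  # If a username was given, list just that submitter's jobs
  if [ -n "$1" ]; then
      condor_q -submitter "$1"
  fi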

14 Display of jobs currently running

15 Issues with Condor-G
A list of small issues we have:
– How do you write resource definitions for clusters?
– When using condor_q -globus, the actual hostname the job was matched to is not displayed.
– No job exit codes. Exit codes will become more important as the number of users (and problems) grows.
– Once a job has been allocated to a remote cluster, rescheduling it elsewhere is difficult.

16 The Users
BaBar:
– One resource is the Bristol BaBar farm, so Monte Carlo event production runs in parallel with UoBGrid usage.
GENIE:
– Installing software onto each connected system by agreement with the owners.
LHCb:
– Windows-compiled Pythia event generation.
Earth Sciences:
– River simulation.
Myself:
– Undergraduate-written charge distribution simulation code.

17 Usage
Current record:
– ~10,000 individual jobs in a week.
– ~2,500 in one day.

18 Windows Condor through Globus
Install a Linux machine as a Condor master only.
Configure it to flock to the Windows Condor pool.
Install a Globus gatekeeper.
Edit the jobmanager.pm file so that the architecture for submitted jobs is always WINNT51 (matching all the workers in the pool).
The pool then appears in the Condor-G resource list as a WINNT51 resource.
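The slides do not show the configuration itself; a minimal sketch of the flocking side on the Linux master, assuming a hypothetical host name for the Windows pool's central manager, might be:

  # condor_config.local on the Linux submit/master machine
  # (win-condor-cm.bris.ac.uk is an illustrative assumption, not a real host)
  FLOCK_TO = win-condor-cm.bris.ac.uk
  # The Windows pool's configuration must in turn allow flocking from this
  # machine (FLOCK_FROM and the HOSTALLOW_* settings of that era)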

19 Windows Condor pools available through a Globus interface from a flocked Linux pool
Within three departments there are currently three separate Windows Condor pools, totalling approximately 200 CPUs.
Planning to have the software installed on student teaching resources in as many departments as possible.
This will allow a significant increase in university processing power at little extra cost.
When a department gives the OK, it will be added to the flocking list on the single Linux submission machine.
A difficulty encountered with this setup is the lack of a Microsoft Installer (MSI) file:
– This affects the ability to use the group-policy method of software delivery and installation, which directly affects how some computer officers view installing it.

20 Evaluation of Condor against United Devices GridMP
The Computational Chemistry group has significant links with an industrial partner who currently uses UD GridMP.
It was suggested that the CC group also use GridMP, though after initial contact this looked very costly.
The e-Science group suggested that Condor would be a better system for them to use.
Agreement from UD to do a published function and usage comparison between Condor and GridMP.
Due to start this autumn.