ETICS All Hands Meeting, Bologna, October 23-25, 2006
NMI and Condor: Status + Future Plans
Andy Pavlo, Peter Couvares, Becky Gietzel

Overview
– Introduction
– Cross-site Job Migration
– Improving Documentation
– Virtual Machines
– Generic Connection Broker
– Future Plans
– Q & A

Introduction
– The University of Wisconsin team is dedicated to improving Condor technologies and the NMI framework.
– The Condor user base continues to grow.
– Expecting an upcoming surge of NSF users for NMI.

Cross-site Job Migration
– Pools of ETICS computing resources are installed at INFN, CERN, and the University of Wisconsin.
– Jobs are automatically routed to remote sites when local resources cannot satisfy their requirements.
– Transparent to users.

Cross-site Job Migration
[Diagram: an NMI build/test submission enters the Condor schedd at the local site; a Condor schedd-on-the-side consults a routing table and converts the Condor job into a Condor-C job aimed at a grid resource at the remote site; each site runs its own Condor schedd, resource advertiser, and Condor matchmaker.]
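To make the routing table concrete, below is a hedged sketch of the kind of entry the schedd-on-the-side consumes. The JOB_ROUTER_ENTRIES knob name, the hostnames, and the limits are assumptions for illustration, not the actual ETICS production configuration.

  # Hypothetical routing-table entries for the schedd-on-the-side.
  # Knob name, hostnames, and limits are illustrative assumptions.
  JOB_ROUTER_ENTRIES = \
     [ name = "CERN"; \
       GridResource = "condor etics-schedd.cern.ch etics-collector.cern.ch"; \
       MaxIdleJobs = 10; MaxJobs = 50; ] \
     [ name = "INFN"; \
       GridResource = "condor etics-schedd.pd.infn.it etics-collector.pd.infn.it"; \
       MaxIdleJobs = 10; MaxJobs = 50; ]
  # Each entry says: a matching local Condor job may be transformed into a
  # Condor-C job and forwarded to the named remote schedd when local
  # resources cannot satisfy it.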

Cross-site Job Migration
[Diagram: resource advertisers at CERN, INFN, and the University of Wisconsin publish their available resources into the shared NMI universe; beyond ETICS, the same mechanism is aimed at OMII-UK and OMII-Europe.]
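As an illustration of what a resource advertiser publishes, here is a hedged sketch of a machine-style ClassAd for one platform. The attribute values, the custom Site and PrereqSoftware attributes, and the use of condor_advertise are assumptions, not the actual ETICS advertiser.

  # resource_ad -- hypothetical advertisement of one build/test platform at a site
  MyType = "Machine"
  Name   = "etics-rhel4-x86@cern.ch"        # illustrative resource name
  OpSys  = "LINUX"
  Arch   = "INTEL"
  Site   = "CERN"                           # hypothetical custom attribute
  PrereqSoftware = "gcc-3.4, ant-1.6"       # hypothetical custom attribute
  # The advertiser could publish this ad to the central matchmaker with:
  #   condor_advertise UPDATE_STARTD_AD resource_ad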

Cross-site Job Migration
Current status:
– Explicit job routing is available in the NMI framework.
Future plans:
– Initial deployment (without prereq information): November 2006
– Improved matchmaking: December 2006
Still to be determined:
– Authorization/authentication method(s)
– Scalable distributed data dissemination

Documentation
– Emphasis on creating complete documentation and user tutorials for the NMI framework.
– Additional contributions from Michael Bletzinger (NCSA).
– Target deadline: December 2006 ~ January 2007
– New website:

Virtual Machines
Jobs are sandboxed inside a virtual machine
– Changes to the system are isolated to the local VM.
Allows for more robust build and test scenarios
Current status in Condor:
– Preliminary support for VMware is in Condor 6.9 (see the sketch after this slide)
– Users must create the VM image beforehand.
– Future plan is to create VMs dynamically and insert jobs
– Plan to support Xen and Virtual PC
Condor's current VM support is not directly usable by the NMI framework.
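A minimal sketch of what a vm-universe submit description might look like with that preliminary VMware support. The knob names (vm_type, vm_memory, vmware_dir, vmware_should_transfer_files) follow later Condor manuals and the paths are invented, so treat the whole file as an assumption rather than the 6.9 syntax verbatim.

  # Hypothetical submit description for a pre-built VMware image (paths invented).
  universe   = vm
  executable = rhel4-build-vm            # for the vm universe this is only a label
  vm_type    = vmware
  vm_memory  = 512                       # MB of RAM for the guest
  vmware_dir = /images/rhel4-build-vm    # directory holding the .vmx/.vmdk files
  vmware_should_transfer_files = true
  log        = vm_job.log
  queue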

Virtual Machines: Future Plans
NMI and ETICS could provide a standard image per OS, configured with prerequisite software.
Images are stored in a cache and dynamically deployed with builds and tests.
Users only need to add a single line to their submission file (see the sketch below).
NMI framework enhancements:
– Maintain a cache of available OS VM images.
– Inject build and test scripts inside the VM image.
– Extract appropriate status, logs, and job artifacts.
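As an illustration of the intended user experience, that single line might be nothing more than a custom attribute in the submit description. The attribute name below is invented, since the real NMI syntax had not been fixed.

  # Hypothetical opt-in line added to an existing submission file;
  # the attribute name and value are invented for illustration only.
  +UseVMImage = "rhel4-x86"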

Generic Connection Broker
One way for Condor jobs to traverse firewalls.
A daemon that acts as a proxy at the edge of firewalls.
Acts as a broker, then steps out of the way.
Low "maintenance":
– Works with NATs and multiple private networks.
– No changes to firewall configuration (see the configuration sketch below).
[Diagram: Matchmaker, Executor, Submitter, GCB]
1) Executor registers with GCB
2) Executor advertises to matchmaker
3) After match, submitter contacts executor via GCB
4) GCB tells executor to open a connection
5) Executor opens a connection to the submitter
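For concreteness, a hedged sketch of the execute-side configuration that points a node behind a NAT at a GCB broker. The knob names are recalled from the Condor 6.8-era documentation and the broker address is a placeholder, so verify against the manual before use.

  # Hypothetical GCB setup for execute nodes behind a NAT/firewall.
  # Knob names as recalled from Condor 6.8-era documentation; address is a placeholder.
  NET_REMAP_ENABLE    = TRUE
  NET_REMAP_SERVICE   = GCB
  NET_REMAP_INAGENT   = 192.0.2.10      # public address of the GCB broker
  BIND_ALL_INTERFACES = TRUE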

Generic Connection Broker
Currently only supported in Condor 6.8 for Linux.
The Wisconsin team is working to improve GCB:
– Clean up the code base and remove testing logic
– Port to other operating systems
– Improve scalability and network performance

Other Future Plans: NMI
Parallel scheduling enhancements:
– Task synchronization
– Primitives today, high-level dependency spec/mgmt tomorrow?
– Scalability testing: 10^1, 10^2, 10^3, 10^4 nodes?
Re-factored database schema:
– Improved DB scalability and performance
– Improved build/test artifact provenance
– Project hierarchy
– Users and groups
– Builds and tests are coupled to projects
– Task-level metrics
Fuzz testing mechanisms
Website enhancements (maybe):
– Consolidate "old" and "new" web interfaces
– May focus more on debugging info than status info

Other Future Plans: Condor
New development series: Condor 6.9
Improved scalability:
– Modularize schedd tasks
– Non-blocking I/O
Privilege separation:
– Daemons no longer need to start with setuid permissions
– Integration with glexec/sudo (see the configuration sketch below)
Enhanced security:
– Continue with source code audits
– Signed ClassAds
Parallel scheduling:
– Document & understand current issues in a pool doing both independent & parallel work
– Improve incrementally based on production experience
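A hedged sketch of how the privilege-separation and glexec integration might surface in configuration. The knob names (PRIVSEP_ENABLED, PRIVSEP_SWITCHBOARD, GLEXEC_JOB, GLEXEC) are taken from later Condor manuals and should be treated as assumptions with respect to the 6.9 series.

  # Hypothetical configuration fragments (knob names assumed from later Condor manuals).
  # Privilege separation: daemons start unprivileged and delegate root operations
  # to a small switchboard helper.
  PRIVSEP_ENABLED     = TRUE
  PRIVSEP_SWITCHBOARD = /usr/sbin/condor_root_switchboard
  # glexec integration: run jobs under the identity mapped by the site's glexec.
  GLEXEC_JOB = TRUE
  GLEXEC     = /opt/glite/sbin/glexec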

Q & A