SAN DIEGO SUPERCOMPUTER CENTER Inca Control Infrastructure Shava Smallen Inca Workshop September 4, 2008.

Control Infrastructure
[Figure: Inca architecture. Reporter managers on each grid resource, the agent, the reporter repository, the depot, data consumers, and the incat administration tool]
- Minimal impact on monitored resources
- Flexible reporter scheduling and configuration options
- Easy installation and maintenance
- Proxy credential available to reporters for user-level execution

Agent provides centralized configuration and management
- Implements the configuration specified by the Inca administrator
- Stages and launches a reporter manager on each resource
- Sends package and configuration updates
- Manages proxy information
- Administration via a GUI (incat)
[Screenshot: the Inca GUI tool, incat, showing the reporters available from a local repository]

A configuration is a description of an Inca deployment:
1. Which resources do you want to monitor?
2. What do you want to monitor?
3. How do you want to monitor?

Step 1a: Defining your resources
- A resource can be a cluster, supercomputer, or server
- A resource group is two or more related resources, grouped by:
  - a shared characteristic (e.g., ia64 architecture)
  - site
  - VO
[Figure: example hierarchy. The TeraGrid VO contains site groups (SDSC, NCSA, …) and an IA-64 resource group containing sdsc-ia64 and ncsa-ia64; individual resources shown are sdsc-ia64, onDemand, and ncsa-ia64]

Step 1b: Describing your resources
Macros are attributes (or variables) that describe your resource. Macros:
- Can be defined in a resource or in a resource group
- Can be inherited; the most specific value wins
- Can have multiple values
[Figure: example macros. TeraGrid (resource group): projectId = TG-STA060008N, scheduler = PBS. DataStar (resource): gramContact = dslogin.sdsc.edu, queue = default, scheduler = LSF. NCSA IA-64 Cluster (resource): gramContact = tg-login.ncsa.edu, queue = standby]
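The inheritance rule ("most specific value wins") can be illustrated with a small lookup that walks from a resource up through its enclosing groups. This is a minimal sketch, not Inca's implementation: the `Node` class and `resolve` method are hypothetical, and each node has a single parent for simplicity, whereas a real resource such as sdsc-ia64 can belong to several groups at once. The macro values are the ones from the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """A resource or resource group (hypothetical model, not Inca's classes)."""
    name: str
    macros: Dict[str, List[str]] = field(default_factory=dict)  # macro -> values
    parent: Optional["Node"] = None                             # enclosing group

    def resolve(self, macro: str) -> List[str]:
        """Return the values of `macro`, searching from this node up to the root.

        The first definition found (the most specific one) wins.
        """
        node: Optional[Node] = self
        while node is not None:
            if macro in node.macros:
                return node.macros[macro]
            node = node.parent
        raise KeyError(f"macro {macro!r} not defined for {self.name}")


teragrid = Node("TeraGrid", {"projectId": ["TG-STA060008N"], "scheduler": ["PBS"]})
datastar = Node("DataStar", {"gramContact": ["dslogin.sdsc.edu"], "queue": ["default"],
                             "scheduler": ["LSF"]}, parent=teragrid)
ncsa = Node("NCSA IA-64 Cluster", {"gramContact": ["tg-login.ncsa.edu"],
                                   "queue": ["standby"]}, parent=teragrid)

print(datastar.resolve("scheduler"))  # ['LSF'] (overrides the group's value)
print(ncsa.resolve("scheduler"))      # ['PBS'] (inherited from TeraGrid)
print(ncsa.resolve("projectId"))      # ['TG-STA060008N']
```

Values are stored as lists so that a multi-valued macro fits the same lookup.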

Step 1c: Automating access to resources
The agent reaches each resource through one of three access methods:
- Local: uses Java Runtime exec
- Ssh: uses SSHTools' Java SSH API
- Globus: uses the Java CoG Kit (supports pre-WS Globus servers)
The reporter manager installs in $HOME/incaReporterManager by default.
[Figure: the agent launching reporter managers on several grid resources via the Local, Ssh, and Globus access methods]
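The access methods behave like interchangeable strategies behind one interface. The sketch below is a Python analogue, not Inca's Java code: `AccessMethod`, `LocalAccess`, and `SshAccess` are hypothetical names; the local method uses `subprocess` in place of Java Runtime exec, and the Ssh method builds an OpenSSH command line in place of the SSHTools API. The Globus method has no standard-library equivalent and is omitted. The user name `inca` is illustrative.

```python
import subprocess
from abc import ABC, abstractmethod
from typing import List


class AccessMethod(ABC):
    """Hypothetical interface: how the agent runs a command on a resource."""

    @abstractmethod
    def argv(self, command: str) -> List[str]:
        """Build the argument vector that runs `command` on the resource."""

    def run(self, command: str) -> str:
        """Run `command` and return its standard output (raises on failure)."""
        result = subprocess.run(self.argv(command), capture_output=True,
                                text=True, check=True)
        return result.stdout


class LocalAccess(AccessMethod):
    """Analogue of the Local method: run on the agent's own host."""

    def argv(self, command: str) -> List[str]:
        return ["/bin/sh", "-c", command]


class SshAccess(AccessMethod):
    """Analogue of the Ssh method, using the OpenSSH client instead of a Java API."""

    def __init__(self, host: str, user: str) -> None:
        self.host = host
        self.user = user

    def argv(self, command: str) -> List[str]:
        # ssh hands `command` to the remote user's shell, so $HOME expands there.
        return ["ssh", f"{self.user}@{self.host}", command]


print(LocalAccess().run("echo hello"), end="")
print(SshAccess("tg-login.sdsc.edu", "inca").argv("ls $HOME/incaReporterManager"))
```

Keeping the method choice behind one interface means the agent can stage and launch reporter managers identically regardless of how each resource is reached.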

Step 2: Selecting or creating reporters
1. Use a local repository
   - A copy of the standard Inca reporter repository is installed by default
   - Use file:// or … (recommended)
2. Use the Inca project reporter repository plus a local repository
   - Receive updates from the Inca project repository

What is a report series?
A set of reports collected at different points in time by executing a reporter with a set of arguments, in a context, on a particular resource.
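The definition names four ingredients: a reporter, its arguments, a context, and a resource. A minimal sketch of that record as a Python dataclass follows; `ReportSeries` and its field names are hypothetical, and the resource name in the example is illustrative. The reporter and arguments come from the WS-GRAM example in Step 3a. Argument values are shell-quoted only when needed, so simple values print bare.

```python
import shlex
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ReportSeries:
    """Hypothetical record of the parts that define a report series."""
    reporter: str                                       # reporter name
    args: Dict[str, str] = field(default_factory=dict)  # argument name -> value
    resource: str = ""                                  # where the reporter runs
    context: Optional[str] = None                       # optional execution context

    def command_line(self) -> str:
        """Render the reporter invocation as -name=value pairs, shell-quoted."""
        parts = [self.reporter]
        parts += [f"-{name}={shlex.quote(value)}" for name, value in self.args.items()]
        return " ".join(parts)


series = ReportSeries(
    reporter="grid.middleware.globus.unit.wsgram.jobsubmit",
    args={"host": "tg-condor.purdue.teragrid.org:8443", "nodes": "1",
          "queue": "standby", "scheduler": "Condor"},
    resource="purdue-condor",  # illustrative resource name
)
print(series.command_line())
```

Running the same record repeatedly on its schedule is what produces the series of reports.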

Step 3a: Find a reporter to execute
E.g., can you submit a batch job via Globus WS-GRAM to grid resources?
Select reporter: grid.middleware.globus.unit.wsgram.jobsubmit

    % grid.middleware.globus.unit.wsgram.jobsubmit \
        -host="tg-condor.purdue.teragrid.org:8443" \
        -log="5" \
        -maxMem="2048" \
        -nodes="1" \
        -project="TG-STA060008N" \
        -queue="standby" \
        -scheduler="Condor"

Step 3b: Decide where to run the reporter
Select a single resource name or a resource group, e.g., the resource sdsc-ia64 or one of the groups it belongs to (SDSC, TeraGrid, IA-64).
[Figure: the resource hierarchy from Step 1a (TeraGrid, SDSC, NCSA, IA-64; sdsc-ia64, onDemand, ncsa-ia64)]

Step 3c: Configure reporter arguments
Argument values can reference resource macros and resource group macros; e.g., -project takes the TeraGrid group's projectId.

    % grid.middleware.globus.unit.wsgram.jobsubmit \
        -host="…" \
        -log="5" \
        -maxMem="2048" \
        -nodes="1" \
        -project="…" \
        -queue="…" \
        -scheduler="…"

[Figure: the macro definitions from Step 1b. TeraGrid (resource group): projectId = TG-STA060008N, scheduler = PBS. DataStar: gramContact = dslogin.sdsc.edu, queue = default, scheduler = LSF. NCSA IA-64 Cluster: gramContact = tg-login.ncsa.edu, queue = standby]

Agent "expands" macro values in series
Series as configured:

    grid.middleware.globus.unit.wsgram.jobsubmit \
        -host="…" \
        -log="5" \
        -maxMem="2048" \
        -nodes="1" \
        -project="…" \
        -queue="…" \
        -scheduler="…"

Expanded for SDSC IA-64 (TeraGrid):

    grid.middleware.globus.unit.wsgram.jobsubmit \
        -host="tg-login.sdsc.edu:8443" \
        -log="5" \
        -maxMem="2048" \
        -nodes="1" \
        -project="TG-STA060008N" \
        -queue="…" \
        -scheduler="…"

Expanded for NCSA IA-64:

    grid.middleware.globus.unit.wsgram.jobsubmit \
        -host="tg-login.ncsa.edu:8443" \
        -log="5" \
        -maxMem="2048" \
        -nodes="1" \
        -project="TG-STA060008N" \
        -queue="standby" \
        -scheduler="PBS"
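The expansion step amounts to template substitution: each macro reference in an argument is replaced by the value resolved for the target resource. A minimal sketch, assuming macros are referenced as `@name@` (an assumption; the macro references are not legible on these slides). The template string and `expand_macros` are hypothetical; the NCSA values are those shown in the expansion above.

```python
import re
from typing import Dict

# Assumption: a macro is referenced as @name@ inside an argument value.
MACRO_REF = re.compile(r"@([A-Za-z_][\w.]*)@")


def expand_macros(template: str, macros: Dict[str, str]) -> str:
    """Replace each @name@ reference with the resource's value for `name`."""
    def lookup(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in macros:
            raise KeyError(f"undefined macro: {name}")
        return macros[name]
    return MACRO_REF.sub(lookup, template)


template = ('grid.middleware.globus.unit.wsgram.jobsubmit '
            '-host="@gramContact@:8443" -project="@projectId@" '
            '-queue="@queue@" -scheduler="@scheduler@"')
ncsa = {"gramContact": "tg-login.ncsa.edu", "projectId": "TG-STA060008N",
        "queue": "standby", "scheduler": "PBS"}
print(expand_macros(template, ncsa))
```

Raising on an undefined macro, rather than leaving the reference in place, surfaces configuration mistakes before a reporter runs with a literal placeholder as its argument.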

Agent "expands" multi-valued macro values in series
The reporter is executed once for each value of a multi-valued macro. E.g., on NCSA IA-64 with

    hosts = tg-login.sdsc.edu, tg-login.uc.edu, tg-login.psc.edu

the series

    grid.performance.ping \
        -host=…

expands to:

    grid.performance.ping -host=tg-login.sdsc.edu
    grid.performance.ping -host=tg-login.uc.edu
    grid.performance.ping -host=tg-login.psc.edu

Agent "expands" multiple multi-valued macro values in series
Multiple multi-valued macros expand to their cross product. E.g., given a host macro with the values bglogin.sdsc.edu, tg.ncsa.edu and a directory macro with the values /gpfs/inca, /users/inca, /scr/inca, the series data.transfer.unit will expand to:
1. data.transfer.unit -host=bglogin.sdsc.edu -dir=/gpfs/inca
2. data.transfer.unit -host=bglogin.sdsc.edu -dir=/users/inca
3. data.transfer.unit -host=bglogin.sdsc.edu -dir=/scr/inca
4. data.transfer.unit -host=tg.ncsa.edu -dir=/gpfs/inca
5. data.transfer.unit -host=tg.ncsa.edu -dir=/users/inca
6. data.transfer.unit -host=tg.ncsa.edu -dir=/scr/inca
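The cross-product rule maps directly onto `itertools.product`, and the single multi-valued macro case from the previous slide is just a product with one factor. This sketch reproduces the six commands listed above; `expand_series` is a hypothetical helper, and the argument-to-values mapping stands in for macro resolution.

```python
import itertools
from typing import Dict, List


def expand_series(reporter: str, arg_values: Dict[str, List[str]]) -> List[str]:
    """Return one command line per combination of macro values (cross product).

    `arg_values` maps each reporter argument to the values of the macro it
    references; a single-valued macro contributes exactly one value.
    """
    names = list(arg_values)
    commands = []
    for combo in itertools.product(*(arg_values[name] for name in names)):
        parts = [reporter] + [f"-{name}={value}" for name, value in zip(names, combo)]
        commands.append(" ".join(parts))
    return commands


cmds = expand_series("data.transfer.unit", {
    "host": ["bglogin.sdsc.edu", "tg.ncsa.edu"],
    "dir": ["/gpfs/inca", "/users/inca", "/scr/inca"],
})
for i, cmd in enumerate(cmds, 1):
    print(f"{i}.{cmd}")
```

`itertools.product` varies the last argument fastest, which gives the same ordering as the slide. The number of series grows multiplicatively (2 × 3 = 6 here), so several multi-valued macros on one reporter can add load quickly.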

Step 3d: Specify an execution context
An optional execution string can be used to set the context the reporter runs under, e.g.:
- Run the reporter under a fresh login shell:
    /bin/sh -l -c 'net.benchmark.wget -args'
- Configure the environment with SoftEnv or modules first:
    soft add +atlas; cluster.math.atlas.version -args
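An execution context is essentially a wrapper string into which the reporter invocation is placed before the result is handed to a shell. The sketch below assumes a hypothetical `{reporter}` placeholder to mark that position; it is illustrative only and is not Inca's own syntax. `render_context` and `run_in_context` are hypothetical helpers.

```python
import subprocess


def render_context(context: str, reporter_cmd: str) -> str:
    """Insert the reporter invocation into an execution-context template.

    Assumption: {reporter} marks where the reporter goes; this placeholder is
    illustrative only.
    """
    return context.replace("{reporter}", reporter_cmd)


def run_in_context(context: str, reporter_cmd: str) -> str:
    """Run the rendered execution string with /bin/sh and return its stdout."""
    script = render_context(context, reporter_cmd)
    result = subprocess.run(["/bin/sh", "-c", script],
                            capture_output=True, text=True, check=True)
    return result.stdout


print(render_context("/bin/sh -l -c '{reporter}'", "net.benchmark.wget -args"))
print(render_context("soft add +atlas; {reporter}", "cluster.math.atlas.version -args"))
print(run_in_context("X=ready; export X; {reporter}", 'echo "$X"'), end="")
```

The plain string replacement does not escape quotes, so a reporter command containing a single quote would break the login-shell example. Quoting the command with `shlex.quote` before insertion is safer in that case.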

Step 3e: Choose a scheduling frequency
Expressed in extended cron syntax:

    minute hour dayOfMonth month dayOfWeek

- minute: the minute of the hour the reporter will be executed (range: 0-59)
- hour: the hour of the day (range: 0-23)
- dayOfMonth: the day of the month (range: 1-31)
- month: the month (range: 1-12)
- dayOfWeek: the day of the week (range: 0-6)

A "?" in a field tells Inca to pick a random time within the specified range, which spreads out the load:

    ? * * * *          run once every hour, at a randomly chosen minute
    ?-59/10 * * * *    run every 10 minutes, starting at a randomly chosen minute
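Here is one way to resolve a single cron field, including the "?" extension. This is a sketch under an explicit assumption: the slide does not say how "?" interacts with a step, so for "?-59/10" this code draws the random start from the first step-sized window (0-9). That choice keeps the reporter running every 10 minutes. Inca's actual rule may differ, and `resolve_field` is a hypothetical helper.

```python
import random
from typing import List


def resolve_field(spec: str, lo: int, hi: int, rng: random.Random) -> List[int]:
    """Expand one cron field into the values it matches, resolving "?" randomly.

    Supports "*", "N", "A-B", an optional "/step", a bare "?", and "?-B/step".
    Assumption: in "?-B/step" the "?" is drawn from the first step-sized window
    [lo, lo + step - 1], so the reporter still runs every `step` units.
    """
    step = 1
    if "/" in spec:
        spec, step_text = spec.split("/", 1)
        step = int(step_text)
    if spec == "?":
        # Bare "?": one random value anywhere in the field's range.
        return [rng.randint(lo, hi)]
    if spec == "*":
        start, end = lo, hi
    elif "-" in spec:
        start_text, end_text = spec.split("-", 1)
        end = int(end_text)
        if start_text == "?":
            start = rng.randint(lo, min(lo + step - 1, end))
        else:
            start = int(start_text)
    else:
        start = end = int(spec)
    return list(range(start, end + 1, step))


rng = random.Random(2008)
print(resolve_field("?", 0, 59, rng))        # a single random minute
print(resolve_field("?-59/10", 0, 59, rng))  # six minutes, 10 apart
print(resolve_field("*", 0, 23, rng)[:3])    # [0, 1, 2]
```

Each reporter gets its own random draw, so series that share a schedule such as "every hour" do not all start on the same minute.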

Step 3f: Specify a unique nickname
- A descriptive name for the test
- Can contain macros; this is important for multi-valued macros, so that each expanded series gets a distinct nickname
- E.g., atlas_version
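Putting a macro in the nickname matters because every expanded series needs a distinct name. The following sketch renders one nickname per expansion and checks the results for duplicates. The `{host}` placeholder syntax and both helpers are hypothetical. The host values are those from the cross-product example.

```python
from collections import Counter
from typing import Dict, List


def render_nicknames(template: str, bindings: List[Dict[str, str]]) -> List[str]:
    """Render one nickname per expanded series.

    `bindings` holds the macro values chosen for each series; {name}
    placeholders are illustrative, not Inca's macro syntax.
    """
    return [template.format(**values) for values in bindings]


def check_unique(nicknames: List[str]) -> None:
    """Raise if two expanded series would share a nickname."""
    dupes = sorted(n for n, count in Counter(nicknames).items() if count > 1)
    if dupes:
        raise ValueError(f"duplicate nicknames: {dupes}")


bindings = [{"host": "bglogin.sdsc.edu"}, {"host": "tg.ncsa.edu"}]
good = render_nicknames("transfer_to_{host}", bindings)
check_unique(good)                            # passes: one name per host
bad = render_nicknames("transfer", bindings)  # the same name twice
print(good)
```

Calling `check_unique(bad)` raises a `ValueError`, which is the situation a macro in the nickname avoids.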

Step 3g: Limit resource usage of the reporter (optional)
- Wall clock time, e.g., no more than 10 seconds
- CPU seconds, e.g., no more than 2 CPU seconds
- Memory, e.g., no more than 20 MB
If a limit is exceeded, the reporter is killed and an error report is sent indicating which resource limit was exceeded.
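On a POSIX system, the three limits can be approximated with a subprocess timeout (wall clock) and `setrlimit` soft limits applied in the child (CPU time and address space). This is a swapped-in technique, a sketch rather than how Inca's reporter manager enforces its limits. `run_limited` is a hypothetical helper that returns either the reporter's output or an error string.

```python
import resource
import signal
import subprocess
from typing import List


def run_limited(argv: List[str], wall_s: float, cpu_s: int, mem_mb: int) -> str:
    """Run a reporter under wall-clock, CPU, and memory limits (POSIX only).

    Returns the reporter's stdout, or an error message naming the exceeded limit.
    """
    def apply_limits() -> None:
        # Runs in the child after fork, before exec; only soft limits are lowered.
        _, cpu_hard = resource.getrlimit(resource.RLIMIT_CPU)
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_s, cpu_hard))
        _, mem_hard = resource.getrlimit(resource.RLIMIT_AS)
        resource.setrlimit(resource.RLIMIT_AS, (mem_mb * 1024 * 1024, mem_hard))

    try:
        proc = subprocess.run(argv, capture_output=True, text=True,
                              timeout=wall_s, preexec_fn=apply_limits)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising the timeout.
        return f"ERROR: wall clock limit of {wall_s} s exceeded"
    if proc.returncode == -signal.SIGXCPU:
        return f"ERROR: CPU limit of {cpu_s} s exceeded"
    if proc.returncode != 0:
        # Memory exhaustion usually surfaces here as a failed allocation.
        return f"ERROR: reporter exited with status {proc.returncode}"
    return proc.stdout


print(run_limited(["echo", "ok"], wall_s=10, cpu_s=2, mem_mb=512), end="")
print(run_limited(["sleep", "5"], wall_s=0.5, cpu_s=2, mem_mb=512))
```

`RLIMIT_AS` caps virtual address space rather than resident memory. A 20 MB cap like the one on the slide can therefore stop some interpreters from starting at all, which is a reason to measure a reporter's footprint before choosing the value.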

What is a suite?
A set of report series that share a common theme, e.g.:
- data management
- job management
- file transfer
- LiDAR workflow

Inside the agent
The agent's configuration contains:
1. Repository URLs
2. Resources
3. Suites
[Figure: inside the agent. The repository cache is refreshed from the reporter repository and downloads reporters; suites are expanded into series and distributed through RM controllers to the reporter managers on each grid resource; incat and the depot also connect to the agent]
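The "expand series, then distribute" step can be pictured as regrouping: suites are organized by theme, but each reporter manager only needs the series that run on its own resource. This is a minimal sketch of that regrouping. The `Series` record, `distribute`, and the suite names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Series:
    """Hypothetical expanded series: where it runs and what it executes."""
    nickname: str
    resource: str
    command: str


def distribute(suites: Dict[str, List[Series]]) -> Dict[str, List[Series]]:
    """Group the expanded series of every suite by target resource.

    Each resource's list is what its reporter manager would receive.
    """
    per_resource: Dict[str, List[Series]] = defaultdict(list)
    for suite_name in sorted(suites):  # deterministic order
        for series in suites[suite_name]:
            per_resource[series.resource].append(series)
    return dict(per_resource)


suites = {
    "data": [Series("transfer_sdsc", "sdsc-ia64", "data.transfer.unit -host=bglogin.sdsc.edu"),
             Series("transfer_ncsa", "ncsa-ia64", "data.transfer.unit -host=tg.ncsa.edu")],
    "math": [Series("atlas_version", "sdsc-ia64", "cluster.math.atlas.version")],
}
for target, series_list in distribute(suites).items():
    print(target, [s.nickname for s in series_list])
```

Grouping by resource is what lets the agent send each reporter manager only its own schedule, which keeps the per-resource footprint small.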

Agent supports proxy credentials
- Case 1: the agent retrieves a proxy from a MyProxy server (using the Java CoG Kit) to launch the reporter manager with the Globus access method
- Case 2: the agent sends MyProxy information to the reporter manager, which retrieves a proxy to provide a credential for reporters
[Figure: for each case, the agent, the reporter manager, and the MyProxy server, showing where the proxy (P) is retrieved]

Agent supports "run now" execution for debugging
- Each series can be scheduled for immediate execution
- Invoked from incat (Inca admins)
- Invoked from the command line (system admins)
- Run a series before its next scheduled execution time to update the series result

Agent monitors reporter managers
- Pings reporter managers every 10 minutes
- Attempts to restart an unresponsive reporter manager every hour
- If multiple hosts are specified for a resource, tries each host
[Figure: resource sdsc-ia64 with hosts tg-login1, tg-login2, and tg-login3]
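The restart policy combines a rate limit (at most one attempt per hour) with host failover (try each host of the resource in order). A minimal sketch of both pieces follows. The `start` callback stands in for the agent's actual launch logic, and both function names are hypothetical. The intervals come from the slide.

```python
from typing import Callable, List, Optional

PING_INTERVAL_S = 10 * 60     # agent pings each reporter manager every 10 minutes
RESTART_INTERVAL_S = 60 * 60  # and retries a failed one at most once an hour


def should_attempt_restart(now: float, last_attempt: Optional[float]) -> bool:
    """True if no restart has been tried yet or the last try is an hour old."""
    return last_attempt is None or now - last_attempt >= RESTART_INTERVAL_S


def restart_on_any_host(hosts: List[str], start: Callable[[str], bool]) -> Optional[str]:
    """Try each host of a resource in order; return the first that starts the manager."""
    for host in hosts:
        try:
            if start(host):
                return host
        except OSError:
            continue  # unreachable host: try the next one
    return None


def fake_start(host: str) -> bool:
    """Stand-in launcher: tg-login1 is down, tg-login2 refuses, tg-login3 works."""
    if host == "tg-login1":
        raise OSError("connection refused")
    return host == "tg-login3"


print(restart_on_any_host(["tg-login1", "tg-login2", "tg-login3"], fake_start))  # tg-login3
```

Treating a connection error like a refused start lets one unreachable login node fail over to the next without special handling.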

Reporter Manager
Minimal functionality to limit load on the resource:
- Receives from the agent that started it:
  - Reporters and libraries
  - Reporter configuration and schedules
- Executes reporters periodically (cron) or immediately ("run now") and forwards reports to the depot
- Profiles reporter system usage and enforces timeouts

Summary
- Inca control infrastructure provides centralized configuration and management
- Provides flexible reporter scheduling and configuration options
- Eases installation and maintenance via macros, access methods, and automatic package updates
- Limits impact on monitored resources
- Makes a proxy credential available to reporters for user-level execution

Agenda -- Day 1
- 9:… Inca overview
- 10:… Working with Inca Reporters
- 11:… Hands-on: Reporter API and Repository
- 1:00 - 2:00 Inca Control Infrastructure
- 2:00 - 3:00 Administering Inca with incat
- 3:15 - 4:00 Hands-on: Inca deployment (part 1)