Test Harness and Reporting Framework
Shava Smallen, San Diego Supercomputer Center
Grid Performance Workshop, June 22, 2005

Is the Grid Up?
- Can user X run application(s) Y on Grid(s) Z? Access dataset(s) N?
- Are the Grid services the application(s) use available? Compatible versions?
- Are dataset(s) N accessible to user X? Credentials?
- Is there sufficient space to store output data?
- …
- For a community of users (VO)? For multiple communities of users?

Testing a Grid
- If you can define "Grid up" in a machine-readable format, you can test it (see the sketch below)
- Sources of that definition: user documentation, users, management
- Example "Grid up" scenario:
  - Develop and optimize code at Caltech
  - Run large job at SDSC, store data using SRB
  - Run large job at NCSA, move data from SRB to local scratch, and store results in SRB
  - Run larger job using both SDSC and PSC systems together, move data from SRB to local scratch, and store results in SRB
  - Move small output set from SRB to ANL cluster, do visualization experiments, render small sample, store results in SRB
  - Move large output data set from SRB to remote-access storage cache at SDSC, render using ANL hardware, store results in SRB
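To make the "machine-readable" point concrete, here is a minimal sketch of how one step of the scenario above could be encoded as data that a harness iterates over. The field names, check types, and the grid_is_up helper are illustrative assumptions, not the actual Inca format.

```python
# Hypothetical, simplified encoding of one "Grid up" scenario step;
# field names and check types are illustrative, not the Inca schema.
scenario = [
    {
        "description": "Run large job at SDSC, store data using SRB",
        "site": "SDSC",
        "checks": [
            {"type": "service", "name": "gram-gatekeeper"},   # can we submit jobs?
            {"type": "service", "name": "srb-server"},        # can we store results?
            {"type": "quota", "filesystem": "scratch", "min_gb": 100},
        ],
    },
    # ... one entry per step in the documented workflow ...
]

def grid_is_up(results):
    """The Grid is 'up' for this scenario if every service check passed.

    `results` maps a check name to True/False as gathered by the harness.
    """
    return all(results.get(check["name"], False)
               for step in scenario
               for check in step["checks"]
               if check["type"] == "service")
```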

What type of testing?
- Testing levels: Software Package (unit, integrated), Software Stack (interoperability), Software Deployment
- Related tools: NMI, JUnit, PyUnit, Tinderbox
- Deployment testing: automated, continuous checking of Grid services, software, and environment
  - Installed? Running? Configured correctly? Accessible to users? Acceptable performance?
  - E.g., a gatekeeper ping or a scaled-down application run (see the sketch below)
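To illustrate the "gatekeeper ping" style of deployment check, here is a minimal sketch that only verifies something is listening on the standard Globus gatekeeper port (2119). It is an assumption-level stand-in for a real check, which would also authenticate and run a trivial job; the host name in the usage example is hypothetical.

```python
import socket

def gatekeeper_ping(host, port=2119, timeout=10):
    """Return True if something accepts connections on the gatekeeper port.

    A real deployment test would go further (authenticate, submit a trivial
    job), but even a TCP-level check catches "service not running".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    # Hypothetical host name, for illustration only.
    print(gatekeeper_ping("tg-login.example.teragrid.org"))
```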

Who tests?
- Grid/VO management
  - Run from a default user account
  - Goal: user-level problems detected & fixed before users notice
  - Results available to users
- User-specific
  - Debug user account/environment issues
  - Advanced usage: feedback tests

Inca
- Framework for the automated testing, benchmarking, and monitoring of Grid systems
- Schedules the execution of information-gathering scripts (reporters)
- Collects, archives, publishes, and displays results
- [Architecture diagram: an Inca server coordinating Inca clients running on Resource 1 … Resource N]

Outline
- Introduction
- Inca architecture
- Case study: V&V on TeraGrid
- Current and Future Work
- Feedback

Inca Reporters
- Script or executable that outputs XML conforming to the Inca specification
- Context of execution is required - important for repeatability
  - What commands were run? On what machine? With what inputs? At what time? With what result?
- Communicate more than pass/fail
  - Body XML can be reporter-specific - flexibility
  - E.g., package version info (software stack availability)
  - E.g., SRB throughput (to catch an unusual drop in SRB performance)
- Users can run reporters independently of the framework (see the sketch below)
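Below is a rough sketch of what a version reporter might look like: it runs a command, records the execution context, and prints XML. The element names are illustrative only and do not follow the real Inca specification; `globus-version` is used simply as an example of a version-printing command.

```python
#!/usr/bin/env python
"""Illustrative version reporter; element names are NOT the real Inca schema."""
import socket
import subprocess
import sys
from datetime import datetime, timezone
from xml.sax.saxutils import escape

def report(package, command):
    # Run the command whose output tells us the installed version.
    proc = subprocess.run(command, shell=True, capture_output=True, text=True)
    version = proc.stdout.strip() or "unknown"
    print(f"""<report>
  <context>
    <host>{escape(socket.getfqdn())}</host>
    <command>{escape(command)}</command>
    <timestamp>{datetime.now(timezone.utc).isoformat()}</timestamp>
  </context>
  <body>
    <package name="{escape(package)}" version="{escape(version)}"/>
  </body>
  <exitStatus>{"success" if proc.returncode == 0 else "failure"}</exitStatus>
</report>""")
    return proc.returncode

if __name__ == "__main__":
    sys.exit(report("globus", "globus-version"))
```

Because the reporter is just a script that prints XML to stdout, a user can run it by hand in their own account to debug their environment, exactly as the slide suggests.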

Reporter Execution Framework
- How often should reporters run? Boot time, every hour, every day?
- Modes of execution:
  - One-shot mode: boot time, after a maintenance cycle, a user checking their specific setup
  - Continuous mode: cron-style scheduling (see the sketch below)
- Data can be queried from a web service and displayed in a web page
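A rough sketch of continuous mode, assuming a cron-like loop that runs each reporter on its own interval and hands the resulting XML to a publish callback (e.g., an HTTP POST to the Inca server). The reporter names, intervals, and wake-up granularity are placeholders, not Inca's actual configuration.

```python
import subprocess
import time

# Hypothetical schedule: reporter script -> run interval in seconds.
SCHEDULE = {
    "./globus_version_reporter.py": 24 * 3600,   # once a day
    "./gatekeeper_ping_reporter.py": 3600,       # once an hour
}

def run_forever(publish):
    """Run each scheduled reporter on its interval and publish the XML output."""
    next_run = {reporter: 0.0 for reporter in SCHEDULE}
    while True:
        now = time.time()
        for reporter, interval in SCHEDULE.items():
            if now >= next_run[reporter]:
                out = subprocess.run([reporter], capture_output=True, text=True)
                publish(reporter, out.stdout)      # e.g., POST XML to the server
                next_run[reporter] = now + interval
        time.sleep(60)   # wake up once a minute and see what is due
```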

Outline
- Introduction
- Inca architecture
- Case study: V&V on TeraGrid
- Current and Future Work
- Feedback

TeraGrid
- An enabling cyberinfrastructure for scientific research
  - ANL, Caltech, Indiana Univ., NCSA, ORNL, PSC, Purdue Univ., SDSC, TACC
  - 40+ TF compute, 1+ PB storage, 40 Gb/s network
- Common TeraGrid Software & Services
  - Common user environment across heterogeneous resources
  - TeraGrid VO service agreement

Validation & Verification
- Common software stack:
  - 20 core packages: Globus, SRB, Condor-G, MPICH-G2, OpenSSH, SoftEnv, etc.
  - 9 visualization packages/builds: Chromium, ImageMagick, Mesa, VTK, NetPBM, etc.
  - 21 IA-64/Intel/Linux packages: glibc, GPFS, PVFS, OpenPBS, Intel compilers, etc.
- 50 version reporters: check for compatible versions of the software (see the sketch below)
- 123 tests/resource: package functionality
  - Services: Globus GRAM, GridFTP, MDS, SRB, DB2, MyProxy, OpenSSH
  - Cross-site: Globus GRAM, GridFTP, OpenSSH
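To show what the version reporters feed into, here is a hypothetical stack-compatibility check that compares the versions each resource reported against the expected common stack. Package names, version numbers, and resource names are placeholders.

```python
# Hypothetical expected stack and reported versions, for illustration only.
EXPECTED = {"globus": "3.2.1", "srb": "3.3.1", "condor-g": "6.7.3"}

reported = {
    "sdsc-resource": {"globus": "3.2.1", "srb": "3.3.1", "condor-g": "6.7.3"},
    "ncsa-resource": {"globus": "3.2.1", "srb": "3.3.0", "condor-g": "6.7.3"},
}

def stack_mismatches(expected, reported):
    """Return (resource, package, found, expected) for every deviation."""
    return [(resource, pkg, versions.get(pkg, "missing"), want)
            for resource, versions in reported.items()
            for pkg, want in expected.items()
            if versions.get(pkg) != want]

for mismatch in stack_mismatches(EXPECTED, reported):
    print("MISMATCH:", mismatch)
```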

Validation & Verification (cont.)
- Common user environment: $TG_CLUSTER_SCRATCH, $TG_APPS_PREFIX, etc.
- SoftEnv configuration - used to manipulate the user environment
- Verify environment variables are defined in the default environment (see the sketch below)
- Verify SoftEnv keys are defined consistently across sites
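A minimal sketch of the environment-variable check, assuming the reporter simply confirms that each required variable is defined and non-empty in the default environment; the variable list mirrors the slide, and the PASS/FAIL output format is an assumption.

```python
import os

# Required TeraGrid environment variables (from the slide); extend as needed.
REQUIRED_VARS = ["TG_CLUSTER_SCRATCH", "TG_APPS_PREFIX"]

def check_default_environment(required=REQUIRED_VARS):
    """Report which required variables are missing or empty in this environment."""
    missing = [var for var in required if not os.environ.get(var)]
    print("PASS" if not missing else "FAIL: missing " + ", ".join(missing))
    return not missing

if __name__ == "__main__":
    raise SystemExit(0 if check_default_environment() else 1)
```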

Inca deployment on TeraGrid
- 9 sites / 16 resources
- Run under the user account "inca"

Detailed Status Views
- Software packages
- Resources

Drill-down capability

Summary Status
- History of the percentage of tests passed in the Grid category over a 6-month period (see the sketch below)
- All tests passed: 100%
- One or more tests failed: < 100%
- Key entry: tests not applicable to the machine or that have not yet been ported
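For concreteness, a small sketch of how such a pass-percentage history could be computed from archived test results; the tuple layout and the sample data are assumptions, not Inca's archive format.

```python
from collections import defaultdict

# Hypothetical archived results: (month, test_name, passed)
results = [
    ("2005-01", "gram-auth", True),
    ("2005-01", "gridftp-copy", False),
    ("2005-02", "gram-auth", True),
    ("2005-02", "gridftp-copy", True),
]

def pass_percentage_by_month(results):
    """Percentage of tests that passed in each month (100% = all passed)."""
    totals, passes = defaultdict(int), defaultdict(int)
    for month, _test, passed in results:
        totals[month] += 1
        passes[month] += passed
    return {month: 100.0 * passes[month] / totals[month] for month in totals}

print(pass_percentage_by_month(results))   # e.g. {'2005-01': 50.0, '2005-02': 100.0}
```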

Measuring TeraGrid Performance
- GRASP (Grid Assessment Probes): tests and measures the performance of basic Grid functions
- Pathload [Dovrolis et al.]: measures dynamic available bandwidth using efficient and lightweight probes

Lessons learned
- Initially focused on a system-administrator view
- Moving towards a user-centric view:
  - File transfer functionality and performance
  - File system availability
  - Job submission
  - SRB performance
  - Interconnect bandwidth
  - Applications: NAMD, AWM

Integration with Knowledge Base
- Narrow down the trouble area with a question tree, e.g.:
  - Are you having problem(s) with: data, job management, security, …?
  - YES (data): Are you having trouble transferring a file?
  - YES: Are you seeing poor performance? …
  - Suggested action: 1. Check to see if you have a valid proxy …
- Narrow down the set of reporters the user can run to diagnose the problem (see the sketch below)
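A toy sketch of such a question tree, assuming each node either asks a follow-up question or recommends a narrowed set of reporters (and advice) for the user to run; the questions, reporter names, and advice strings are illustrative.

```python
# Illustrative question tree: each node is either a follow-up question or a
# recommendation of reporters the user can run themselves, plus advice.
TREE = {
    "question": "Are you having problems with data, job management, or security?",
    "answers": {
        "data": {
            "question": "Are you having trouble transferring a file?",
            "answers": {
                "yes": {"reporters": ["gridftp-copy", "srb-throughput"],
                        "advice": "Check that you have a valid proxy."},
            },
        },
        "job management": {"reporters": ["gram-auth", "gram-job-submit"]},
    },
}

def narrow_down(node, ask):
    """Walk the tree using `ask` to obtain answers; return the final recommendation."""
    while "question" in node:
        node = node["answers"][ask(node["question"])]
    return node

# Example: narrow_down(TREE, ask=lambda q: input(q + " ").strip().lower())
```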

Outline
- Introduction
- Inca architecture
- Case study: V&V on TeraGrid
- Current and Future Work
- Feedback

Inca Today
- Software available at:
- Current version:
- Also available in NMI R7
- Users:

Inca 2.0
- The initial version of Inca focused on basic functionality
- New features:
  - Improved storage & archiving capabilities
  - Scalability - control and data storage
  - Usability - improved installation and configuration control
  - Performance - self-monitoring
  - Security - SSL, proxy delegation
  - Condor integration
- Release in 3-6 months

View Error History
- Submit information or suggestions
- Search for information on an error/reporter

View Resource Usage

Summary
- Inca is a framework that provides automated testing, benchmarking, and monitoring
- Grid-level execution to detect problems and report them to system administrators
- Users can view status pages and compare them to problems they see
- Users can run reporters as themselves to debug account/environment problems
- Currently in use for TeraGrid V&V, GEON, and others

Outline
- Introduction
- Inca architecture
- Case study: V&V on TeraGrid
- Current and Future Work
- Feedback

Feedback
- How are you monitoring your Grid infrastructure?
- What do you need to test?
- What diagnostic/debugging tools are available to users?
- Displaying test results to users? In what format? How much detail?

More Information
- Current Inca version:
- New version in 3-6 months