High Level Grid Services
Warren Smith
Texas Advanced Computing Center, University of Texas
SURA Cyberinfrastructure Workshop Series: Grid Technology: The Rough Guide
December 8 & 9, 2005, Austin, TX

Outline
Grid Monitoring
– Ganglia
– MonALISA
– Nagios
– Others
Workflow
– Condor DAGMan (and Condor-G)
– Pegasus
Data
– Storage Resource Broker
– Replica Location Service
– Distributed file systems

Other High Level Services (Not Covered)
Resource Brokering / Metascheduling
– GRMS, MARS
Credential issuance
– PURSE, GAMA
Authorization
– Shibboleth
– VOMS
– CAS

Grid Monitoring
Ganglia
MonALISA
Nagios
Others

Ganglia
Monitors clusters and aggregations of clusters
Collects system status information
– Provided in XML documents
– Presented graphically via a web interface
Can be subscribed to and aggregated across multiple clusters
Focus on simplicity and performance
– Can monitor 1000s of systems
MDS and MonALISA can consume information provided by Ganglia
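The XML status document can be fetched directly from a gmond's TCP port. Abridged, and with a hypothetical host name, it looks roughly like this:

  % telnet node1.example.org 8649
  <GANGLIA_XML VERSION="3.0.1" SOURCE="gmond">
    <CLUSTER NAME="my cluster" ...>
      <HOST NAME="node1.example.org" ...>
        <METRIC NAME="load_one" VAL="0.12" TYPE="float" .../>
        ...
      </HOST>
    </CLUSTER>
  </GANGLIA_XML>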


gmond: Ganglia Monitoring Daemon
Runs on each resource being monitored
Collects a standard set of information
Configuration file specifies
– When to collect information
– When to send (based on time and/or change)
– Who to send to
– Who to allow to request
Supports UDP unicast, UDP multicast, and TCP
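As a sketch, the channel sections of a Ganglia 3.x-style gmond.conf look roughly like this (collector host name hypothetical):

  /* send metrics by UDP unicast to a collector host */
  udp_send_channel {
    host = collector.example.org
    port = 8649
  }
  /* accept metrics sent by other gmonds */
  udp_recv_channel {
    port = 8649
  }
  /* answer TCP requests (e.g. from gmetad) with the XML status document */
  tcp_accept_channel {
    port = 8649
  }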

Information collected by gmond
[Table of the standard gmond metrics not transcribed]

gmetric
Program to provide custom information to Ganglia
– e.g. CPU temperature, batch queue length
Uses the gmond configuration file to determine who to send to
Typically executed as a cron job
– Execute command(s) to gather the data
– Execute gmetric to send the data
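For example, a cron entry could gather a value and hand it to gmetric (the metric name, value script, and schedule here are hypothetical; --name, --value, --type, and --units are actual gmetric options):

  # report CPU temperature to Ganglia every 5 minutes (read-cpu-temp is a hypothetical script)
  */5 * * * * gmetric --name cpu_temp --value "$(read-cpu-temp)" --type float --units Celsius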

gmetad
Aggregates information from gmonds
Configuration file specifies which gmonds to get data from
– Connects to gmonds using TCP
Stores information in a Round Robin Database (RRD)
– Small database where data for each attribute is stored in time order
– Fixed maximum size; the oldest data is discarded
PHP scripts display the RRD data as web pages
– Graphs over time
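Each gmond (or set of redundant gmonds) to poll is named by a data_source line in gmetad.conf (cluster and host names hypothetical):

  # data_source "cluster name" [polling interval in seconds] host:port [host:port ...]
  data_source "my cluster" 15 node1.example.org:8649 node2.example.org:8649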

Who's Using Ganglia?
PlanetLab
Lots of clusters
– SDSC
– NASA Goddard
– Naval Research Lab
– …

MonALISA
Distributed monitoring system
Agent-based design
Written in Java
Uses JINI & SOAP/WSDL
– For locating services & communicating
Gathers information using other systems
– SNMP, Ganglia, MRTG, Hawkeye, custom
Clients
– Locate and subscribe to services that provide monitoring information
– GUI client, web client, administrative client

[Screenshot: Monitoring I2 Network Traffic, Grid03 Farms and Jobs]

MonALISA Services
Autonomous, self-describing services
– Built on a generic Dynamic Distributed Services Architecture
Each monitoring service stores data in a relational database
Automatic update of monitoring services
Lookup discovery service

Who's Using MonALISA?
Open Science Grid
– Included in the Virtual Data Toolkit
Internet2 Abilene network
Compact Muon Solenoid (CMS)
Many others

Nagios Overview
A monitoring framework
– Configurable
– Extensible
Provides a relatively comprehensive set of functionality
Supports distributed monitoring
Supports taking actions in addition to monitoring
Large community using and extending it
Doesn't store historical data in a true database
Quality of add-ons varies


Architecture
[Diagram] Each remote system runs Nagios with its plugins and configuration files and forwards results using send_nsca. The central collector receives them through NSCA and runs its own Nagios, plugins, and configuration files, keeps the Nagios log files, and serves the web interface via the Nagios CGIs and httpd.

Nagios Features I
Web interface
– Current status, graphs
Monitoring
– Monitoring of a number of properties is included
– People provide plugins to monitor other properties; we can do the same
– Periodic monitoring with user-defined periods
Thresholds to indicate problems
Actions when problems occur
– Notifications (email, page); extensible
– Actions to attempt to fix the problem (e.g. restart a daemon)
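As a sketch, a service definition that watches a Globus gatekeeper port could look like the following (abridged; a real definition needs more directives such as contacts and check attempts, and the host name and template here are hypothetical; check_tcp is one of the standard Nagios plugins, and 2119 is the usual gatekeeper port):

  define service{
          use                     generic-service    ; hypothetical template
          host_name               gridnode01.example.org
          service_description     Globus gatekeeper
          check_command           check_tcp!2119     ; is the gatekeeper port answering?
          normal_check_interval   5                  ; check every 5 minutes
          notification_options    w,c,r              ; notify on warning, critical, recovery
          }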

Nagios Features II
Escalations
– If a problem occurs n times, do x (e.g. attempt to fix it automatically)
– If a problem occurs more than n times, do y (e.g. open a ticket in the trouble ticket system)
– …
Distributed monitoring
– A single Nagios daemon can test things all over
– Can also have Nagios daemons on multiple systems, with certain daemons acting as central collection points

Who's Using Nagios?
It's included in a number of Unix distros
– Debian
– SUSE
– Gentoo
– OpenBSD
Nagios users can register with the site
– 986 sites have registered
– ~200,000 hosts monitored
– ~720,000 services monitored

TeraGrid's Inca
Hierarchical status monitoring
– Groups tests into logical sets
– Supports many levels of detail and summarization
Flexible, scalable architecture
– Very simple reporter API
– Can use existing test scripts (unit tests, status tools)
– Hierarchical controllers
– Several query/display tools

And Many Others…
SNMP-based tools
– OpenNMS
– HP OpenView
Big Brother / Big Sister
Globus MDS
ACDC (U Buffalo)
GridCat
GPIR (TACC)
…

Workflow
Condor DAGMan
– Starting with Condor-G
Pegasus

Workflow Definition
A set of tasks with dependencies
Tasks can be anything, but in grids they typically
– Execute programs
– Move data
Dependencies can be
– Control: "do T2 after T1 finishes"
– Data: "T2 input 1 comes from T1 output 1"
Can be acyclic or have cycles/iterations
Can have conditional execution
A large variety of types of workflows exist

Condor-G: Condor + Globus
Submit your jobs to Condor
– Jobs say they want to run via Globus
Condor manages your jobs
– Queuing, fault tolerance
Condor submits the jobs to resources via Globus

Globus Universe
Condor has a number of universes
– Standard: to take advantage of features like checkpointing and redirected file I/O
– Vanilla: to run jobs without the frills
– Java: to run Java codes
The Globus universe runs jobs via Globus; the submit file specifies
– universe = globus
– Which Globus gatekeeper to use
– Optional: the location of the file containing your Globus proxy certificate

  universe        = globus
  globusscheduler = beak.cs.wisc.edu/jobmanager
  executable      = progname
  queue
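A slightly fuller sketch, adding output handling and the optional proxy location (x509userproxy is an actual Condor submit command; the jobmanager suffix and file paths here are hypothetical):

  universe        = globus
  globusscheduler = beak.cs.wisc.edu/jobmanager-lsf   # gatekeeper plus jobmanager
  executable      = progname
  output          = progname.out      # stdout brought back to the submit machine
  error           = progname.err
  log             = progname.log      # Condor's job event log
  x509userproxy   = /tmp/x509up_u1234 # optional: proxy file to use
  queue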

How Condor-G Works
[Diagram sequence: a Personal Condor on the submit side, a Globus resource running LSF on the execute side]
The Schedd queues, submits, and manages jobs
– Available commands: condor_submit, condor_rm, condor_q, condor_hold, …
The local resource manager (LSF in this example) manages the cluster's resources
When, say, 600 Globus jobs are submitted:
– The Schedd starts a GridManager to handle them
– The GridManager contacts the Globus resource, which starts a JobManager
– The JobManager submits the user job to the local scheduler (LSF), which runs it

Globus Universe Fault Tolerance
Submit-side failure:
– All relevant state for each submitted job is stored persistently in the Condor job queue
– This persistent information allows the Condor GridManager, upon restart, to read the state and reconnect to JobManagers that were running at the time of the crash
Execute side:
– Condor worked with Globus to improve fault tolerance
X.509 proxy expiration:
– Condor can put jobs on hold and ask the user to refresh the proxy

Condor DAGMan
Directed Acyclic Graph Manager
DAGMan allows you to specify the dependencies between your Condor jobs, so it can manage them automatically for you
– e.g., "Don't run job B until job A has completed successfully."

What is a DAG?
A DAG is the data structure used by DAGMan to represent these dependencies
Each job is a "node" in the DAG
Each node can have any number of "parent" or "child" nodes, as long as there are no loops!
[Diagram: diamond DAG – Job A is the parent of Job B and Job C, which are both parents of Job D]

Defining a DAG
A DAG is defined by a .dag file, listing each of its nodes and their dependencies:

  # diamond.dag
  Job A a.sub
  Job B b.sub
  Job C c.sub
  Job D d.sub
  Parent A Child B C
  Parent B C Child D

Each node will run the Condor job specified by its accompanying Condor submit file
Each node can have a pre and a post step
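The slides don't show the node submit files themselves; a minimal hypothetical a.sub for node A might be:

  # a.sub -- hypothetical submit file for node A of diamond.dag
  universe   = vanilla
  executable = a.out
  output     = a.out.txt
  error      = a.err.txt
  log        = diamond.log    # all four nodes can share one log file
  queue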

Submitting a DAG
To start your DAG, just run condor_submit_dag with your .dag file, and Condor will start a personal DAGMan daemon to begin running your jobs:

  % condor_submit_dag diamond.dag

condor_submit_dag submits a scheduler-universe job with DAGMan as the executable. The DAGMan daemon itself thus runs as a Condor job, so you don't have to baby-sit it.

Running a DAG
DAGMan manages the submission of your jobs to Condor based on the DAG dependencies
– Can configure throttling of job submission
In case of a failure, DAGMan creates a "rescue" file with the current state of the DAG
– Failures can be retried a configurable number of times
– The rescue file can be used to restore the prior state of the DAG when restarting, as shown below
Once the DAG is complete, the DAGMan job itself finishes and exits
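With the DAGMan of this era the rescue file was written next to the original DAG (e.g. diamond.dag.rescue, name assumed) and could itself be submitted to resume where the DAG left off:

  % condor_submit_dag diamond.dag.rescue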

Who's Using Condor-G & DAGMan?
Pegasus
LIGO, ATLAS, CMS, …
gLite
TACC
DAGMan is available on every Condor pool

Pegasus
Pegasus: Planning for Execution on Grids
– Intelligently decides how to run a workflow on a grid
Takes as input an abstract workflow
– An abstract DAG in XML (DAX)
Generates a concrete workflow
– Selects computer systems (MDS)
– Selects file replicas (RLS)
Executes the workflow (Condor DAGMan)
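To give the flavor of a DAX (a schematic sketch only; the element names loosely follow the DAX format of this period, and the attributes shown are illustrative rather than schema-exact):

  <adag name="diamond">
    <!-- one job element per abstract task; files imply the data dependencies -->
    <job id="ID0001" name="preprocess">
      <uses file="f.a" link="input"/>
      <uses file="f.b" link="output"/>
    </job>
    <job id="ID0002" name="analyze">
      <uses file="f.b" link="input"/>
      <uses file="f.c" link="output"/>
    </job>
    <!-- explicit control dependency: ID0002 runs after ID0001 -->
    <child ref="ID0002">
      <parent ref="ID0001"/>
    </child>
  </adag>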

[Diagram: Science Gateway → Pegasus → Condor]

Pegasus Workflows
Abstract workflow
– Edges are data dependencies
– Data movement is implicit
– Describes processing on the data
Concrete workflow
– Edges are control flow
– Data movement is explicit, as tasks
– Acyclic
– Supports parallelism

Who's Using Pegasus?
LIGO
ATLAS (high energy physics application)
Southern California Earthquake Center (SCEC)
Astronomy: Montage and galaxy morphology applications
Bioinformatics
Tomography

Data
Storage Resource Broker
Replica Location Service

Storage Resource Broker (SRB)
Manages collections of data
– In many cases, the data are files
Provides a logical namespace
– Maps logical names to physical instances
Associates metadata with logical names
– Metadata Catalog (MCAT)
Interfaces to a variety of storage
– Local disk
– Parallel file systems
– Archives
– Databases

SRB Client Implementations
A set of basic APIs
– Over 160 APIs
– Used by all clients to make requests to servers
Scommands
– Unix-like command line utilities for UNIX and Windows platforms
– Over 60: Sls, Scp, Sput, Sget, …
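A typical Scommand session might look like this (collection path and file names hypothetical; Sinit, Sput, Sls, Sget, and Sexit are actual Scommands):

  % Sinit                                      # start a session using ~/.srb settings
  % Sput results.dat /home/user.sdsc/mydata    # upload a local file into a collection
  % Sls /home/user.sdsc/mydata                 # list the collection's contents
  % Sget /home/user.sdsc/mydata/results.dat .  # fetch a copy back
  % Sexit                                      # end the session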

SRB Client Implementations (continued)
inQ: Windows GUI browser
Jargon: Java SRB client classes
– Pure Java implementation
mySRB: web-based GUI
– Runs in a web browser
Java Admin Tool
– GUI for user and resource management
Matrix: web service for SRB workflow

Example: Reading Through SRB
[Diagram] An application asks an SRB agent to read a file by its logical name. The MCAT is consulted for: 1. logical-to-physical mapping, 2. identification of replicas, 3. access & audit control. Peer-to-peer brokering between SRB servers then performs the data access on the storage resource (R1, …) holding a replica.

Authentication
Grid Security Infrastructure
– PKI certificates
Challenge-response mechanism
– No passwords sent over the network
Tickets
– Valid for a specified time period or number of accesses
Generic Security Service API
– Authentication of the server to remote storage

Authorization
Collection-owned data
– At each remote storage system, an account ID is created under which the data grid stores files
User authenticates to SRB
SRB checks access controls
SRB server authenticates to a remote SRB server
Remote SRB server authenticates to the remote storage repository

Metadata in SRB
SRB system metadata
Free-form (user-defined) metadata
– Attribute-Value-Unit triples
Extensible schema metadata
– User defined
– Tables integrated into the MCAT core schema
External database metadata
Metadata operations
– Metadata insertion through user interfaces
– Bulk metadata insertion
– Template-based metadata extraction
– Query metadata through well-defined interfaces

Who's Using SRB?
A very large number of users. A sample:
– National Virtual Observatory
– Large Hadron Collider
– NASA
– NCAR
– BIRN

Replica Location Service (RLS)
Maintains a mapping from logical file names to physical file names
– 1 logical file to 1 or more physical files
Improves performance and fault tolerance when accessing data
Supports user-defined attributes of logical files
Component of the Globus Toolkit
– WS-RF service
RLS was designed and implemented in a collaboration between the Globus project and the EU DataGrid project

Replica Location Service in Context
RLS is one component in a data management architecture
Provides a simple, distributed registry of mappings
Consistency management is provided by higher-level services

RLS Features
[Diagram: Local Replica Catalogs (LRCs) feeding Replica Location Indexes (RLIs)]
Local Replica Catalogs (LRCs) contain consistent information about logical-to-target mappings
Replica Location Index (RLI) nodes aggregate information about one or more LRCs
LRCs use soft-state update mechanisms to inform RLIs about their state: relaxed consistency of the index
Optional compression of state updates reduces communication, CPU, and storage overheads
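For a feel of the client interface, the globus-rls-cli tool registers and queries mappings like this (server and file names hypothetical; create, add, and query lrc lfn are actual globus-rls-cli operations):

  # register a logical name with its first physical replica
  % globus-rls-cli create run42.dat gsiftp://storage.example.org/data/run42.dat rls://rls.example.org

  # add a second physical replica for the same logical name
  % globus-rls-cli add run42.dat gsiftp://mirror.example.org/data/run42.dat rls://rls.example.org

  # look up all physical replicas of the logical name
  % globus-rls-cli query lrc lfn run42.dat rls://rls.example.org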

Who's Using RLS?
Used with Pegasus and Chimera:
– LIGO
– ATLAS (high energy physics application)
– Southern California Earthquake Center (SCEC)
– Astronomy: Montage and galaxy morphology applications
– Bioinformatics
– Tomography
Other RLS users
– QCD Grid
– US CMS experiment (integrated with POOL)

Distributed File Systems
What everyone would like, but hard to implement
Features that are needed
– Performance
– Fault tolerance
– Security
– Fine-grained authorization
– Access via Unix file system libraries and programs
– User-defined metadata (some would like this)

Example Distributed File Systems
AFS & DFS
– Kerberos for security
– Performance and fault tolerance problems
NFS
– Performance, security, and fault tolerance problems
NFSv4
– Tries to improve performance and security
GridNFS
– Univ. of Michigan
– Extends NFSv4
– Adds grid security and improves performance
IBM GPFS
– Originally designed as a cluster parallel file system
– Being used in distributed environments
– Relatively large hardware requirements

Summary
Grid Monitoring
– Ganglia
– MonALISA
– Nagios
– Others
Workflow
– Condor DAGMan (and Condor-G)
– Pegasus
Data
– Storage Resource Broker
– Replica Location Service
– Distributed file systems