Presentation on theme: "C. Loomis – Status of European DataGrid – May 23, 2002 – 1 Status of European DataGrid Charles Loomis CNRS/LAL NorduGrid Workshop May 23, 2002."— Presentation transcript:
Status of European DataGrid
  Charles Loomis, CNRS/LAL
  NorduGrid Workshop, May 23, 2002
Introduction & Outline
  European DataGrid: a 3-year EU-funded project.
  Goals:
    develop grid middleware
    deploy it onto a working testbed
    demonstrate grid technology with working applications
  The strong application component is unique!
  Outline: Current Software; Machine Tour; Status; Testbed; Deployed software; Present & Future Sites; Near-term Developments; EDG v1.2; Latest Globus Release; EDG License; Longer-term Developments; Testing & Support Infrastructure; Enhanced EDG Features; Interoperability; Further Information.
User Interface
  Lightweight access to the grid:
    Access from a laptop.
    No host certificate needed.
    Some question about CRLs.
  Limitations:
    Cannot run an ftp daemon here.
  Services: UserInterface (CLI), Globus GSI, globus-url-copy (client), development libraries, BrokerInfo, Replica Catalog APIs, GDMP client interface.
Resource Broker
  Finds resources, submits & tracks jobs:
    Heavyweight machine.
    Talks to the RC and MDS.
    Acts as the user's network presence.
    Talks to the proxy server.
  Bottleneck:
    Can be replicated, but is that enough?
  Services: Resource Broker, JobSubmission Service (Condor-G below), Information Index, Logging & Bookkeeping, GSI-ftp daemon.
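As a rough illustration of what "finds resources" can involve, the sketch below filters a list of computing elements by a job's requirements and ranks the survivors. The `ComputingElement` fields, the `match` function, and the "most free CPUs first" ranking are assumptions made for this example; they are not the EDG Resource Broker's actual data model or matchmaking algorithm.

```python
from dataclasses import dataclass


@dataclass
class ComputingElement:
    # Hypothetical site description, standing in for whatever an
    # information system would publish about a computing element.
    name: str
    free_cpus: int
    batch_system: str
    close_storage: frozenset  # storage elements considered "near" this CE


def match(elements, min_cpus=1, batch_systems=None, needs_storage=None):
    """Return names of CEs that satisfy the requirements, best first."""
    candidates = []
    for ce in elements:
        if ce.free_cpus < min_cpus:
            continue
        if batch_systems is not None and ce.batch_system not in batch_systems:
            continue
        if needs_storage is not None and needs_storage not in ce.close_storage:
            continue
        candidates.append(ce)
    # Illustrative ranking only: more free CPUs first, ties broken by name.
    candidates.sort(key=lambda ce: (-ce.free_cpus, ce.name))
    return [ce.name for ce in candidates]


sites = [
    ComputingElement("ce-a", 4, "PBS", frozenset({"se-a"})),
    ComputingElement("ce-b", 12, "LSF", frozenset({"se-b"})),
    ComputingElement("ce-c", 0, "PBS", frozenset({"se-a"})),
]
print(match(sites))
print(match(sites, needs_storage="se-a"))
```

Every submission passes through a step like this, which is one way to read the slide's concern that the broker is a bottleneck.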
Computing Element
  Accepts & executes jobs:
    Gatekeeper: acts as the public interface to computing resources.
    Worker Node(s): provide all software needed for applications; accessible via a batch system (PBS, LSF, …).
  Services: Gatekeeper, GSI-ftp daemon, GIIS/GRIS.
Storage Element
  Generic interface to storage:
    Gatekeeper (should go away)
    GSIFTP
    RFIO
  Services: Gatekeeper, GDMP, GSI-ftp daemon, RFIO daemon.
Replica Catalog
  Provides information about replicas:
    Catalog Service: accessed via the RB or directly.
  Services: LDAP, GIIS/GRIS.
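To make "information about replicas" concrete, here is a toy in-memory catalog, assuming the catalog's job is to map one logical file name to the physical locations of its copies. The class name, method names, and the example names used with it are all hypothetical; the real service on this slide is LDAP-based and its API is not shown in the talk.

```python
from collections import defaultdict


class ReplicaCatalog:
    """Toy catalog: logical file name -> set of physical locations."""

    def __init__(self):
        self._replicas = defaultdict(set)

    def register(self, logical_name, physical_location):
        # Record that a copy of logical_name exists at physical_location.
        self._replicas[logical_name].add(physical_location)

    def unregister(self, logical_name, physical_location):
        # Forget one copy; drop the entry entirely when no copies remain.
        locations = self._replicas.get(logical_name)
        if locations is None:
            return
        locations.discard(physical_location)
        if not locations:
            del self._replicas[logical_name]

    def lookup(self, logical_name):
        # Return all known copies in a stable (sorted) order.
        return sorted(self._replicas.get(logical_name, ()))


catalog = ReplicaCatalog()
catalog.register("run42.dat", "se-b.example.org:/data/run42.dat")
catalog.register("run42.dat", "se-a.example.org:/data/run42.dat")
print(catalog.lookup("run42.dat"))
```

A broker or a user would call something like `lookup` to decide which copy of a file is closest to the chosen computing element.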
Authorization/Authentication System
  All based on GSI (PKI):
    Certification Authorities
    Virtual Organization Servers
  Services: LDAP for VO servers, various software for CAs, mkgridmap generation software.
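The grid-mapfile used by Globus GSI maps a certificate subject name to a local account, one `"subject" account` pair per line. The sketch below shows the general idea of generating such a file from VO membership. It is not the mkgridmap tool itself: the function name, the input shapes, the one-account-per-VO policy, and the example subject names are assumptions for illustration, and the LDAP query that would supply the member list is left out.

```python
def build_gridmap(members, vo_accounts):
    """Build grid-mapfile text.

    members:     iterable of (certificate subject DN, VO name) pairs
    vo_accounts: dict mapping VO name -> local account (assumed policy)
    """
    lines = []
    seen = set()
    for dn, vo in members:
        account = vo_accounts.get(vo)
        if account is None or dn in seen:
            continue  # unknown VO, or this DN already has a mapping
        seen.add(dn)
        lines.append(f'"{dn}" {account}')
    lines.sort()
    return "".join(line + "\n" for line in lines)


# Hypothetical members and accounts, for illustration only.
members = [
    ("/O=Grid/O=Example/CN=Bob Builder", "atlas"),
    ("/O=Grid/O=Example/CN=Alice Adams", "cms"),
    ("/O=Grid/O=Example/CN=Carol Clark", "unknown-vo"),
]
print(build_gridmap(members, {"atlas": "atlas001", "cms": "cms001"}), end="")
```

Members of VOs that the site does not recognise are simply left out, so they cannot be mapped to any local account.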
Software Distribution & Installation
  Storage: package repository, CVS server.
  Distribution: HTTP downloads; wget with rpm lists. This is the most primitive link in the chain.
  Installation: LCFG (LCFG-lite). Only works for RH6.2.
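The "wget with rpm lists" step amounts to turning a list of package file names into download URLs. A minimal sketch follows, assuming the list has one file name per line with optional `#` comments; the list format, the function name, and the repository URL are all hypothetical, and nothing is actually downloaded.

```python
from urllib.parse import quote, urljoin


def urls_from_rpm_list(list_text, base_url):
    """Turn an rpm list into download URLs.

    base_url must end with "/" so that urljoin appends to it instead of
    replacing its last path segment.
    """
    urls = []
    for raw in list_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        urls.append(urljoin(base_url, quote(line)))
    return urls


rpm_list = """
# hypothetical package list
example-tool-1.0-1.i386.rpm
example-lib-2.3-4.i386.rpm   # trailing comment
"""
for url in urls_from_rpm_list(rpm_list, "http://repository.example.org/rpms/"):
    print(url)
```

Each printed URL could then be handed to a downloader. A scheme like this has no dependency resolution or integrity checking, which fits the slide's description of it as the most primitive link in the chain.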
Software on Production Testbed
  Stopped work on the 1.1 series to focus on 1.2.
  Deployed v1.1.4 + patches:
    Version is not uniform.
    Significant functionality is missing for applications: replica management, access to mass storage.
    Difficult for middleware to support this version.
  Testbed works, but…
    Known stability problems: the Information Index dies regularly; the Broker needs to be restarted often.
  Support limited:
    Maintenance reduced to life support.
    Help for new sites is limited to the available effort.
Production Testbed Sites
  Production sites:
    Most have dedicated hardware; Lyon is running on its main batch system.
    Typically a few to tens of machines.
    LCFG for installation & configuration; Lyon is again the exception.
  Limitations to expansion:
    Information systems are unreliable.
    Manual registration is not scalable or dynamic.
    How to add countries without a CA? OK for users (CNRS CA); not OK for host certificates.
  Site               Location
  Catania            Catania (I)
  CC-IN2P3           Lyon (F)
  CERN               Geneva (CH)
  CNAF               Bologna (I)
  Imperial College   London (UK)
  MSU                Moscow (Russia)
  NIKHEF             Amsterdam (NL)
  Padova             Padova (I)
  RAL                Rutherford (UK)
  Torino             Torino (I)
  Croatia
  Taiwan
  United States
EDG Release 1.2
  New features in the 1.2 release ( 10):
    Replica Management API: the first implementation has a limited API.
    Access to Mass Storage Systems: authorization is linked to user account mapping.
    Auto-resubmission of failed jobs: will help with stability problems (but is not a solution!).
  Current problems:
    GASS cache: file locking problems (failed job submissions).
    OpenLDAP timeout (II hangs; complete loss of MDS information).
    FTree interfering with the gatekeeper (causes crashes; failed submissions).
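Auto-resubmission can be pictured as a bounded retry loop around the submission call. The sketch below is a generic illustration, not the WP1 implementation: the function names, the use of `RuntimeError` to signal a failed submission, and the default of three attempts are all assumptions.

```python
def submit_with_resubmission(submit, max_attempts=3):
    """Call submit() until it succeeds or max_attempts is exhausted.

    Returns (result, attempts_used). If every attempt fails, the last
    error is re-raised so the failure is not silently hidden.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return submit(), attempt
        except RuntimeError as error:  # assumed failure signal
            last_error = error
    raise last_error


def make_flaky_submit(failures_before_success):
    """Hypothetical submitter that fails a fixed number of times first."""
    state = {"calls": 0}

    def submit():
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise RuntimeError("submission failed")
        return "job-accepted"

    return submit


print(submit_with_resubmission(make_flaky_submit(2)))
```

A loop like this hides transient failures from the user but does nothing about their cause, which matches the slide's caveat that resubmission helps with stability problems without being a solution.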
Expected Schedule
  [Timeline chart covering May and June. Labels: ITeam at CERN; 1.2 alpha; RAL/CNAF; Test 3 Sites; Refine alpha; GASS/MDS problems; JJ/Ingo Tests; <1% error rate; App. Testing; 1.3 code; license info; Deployment Decision; ESRIN Demo; Core Site Deployment; General Deployment; 1.2 beta.]
Upgrade to Latest Globus Release
  EDG Globus beta-21 is based on the first Globus2 beta.
    Includes some patches for security.
    Some EDG-specific patches. (Larger changes for EDG 1.2.)
  Upgrade to the current Globus2 release depends on:
    The desire of the application groups. The only known critical problem is with file transfers >20 min.
    Whether it contains fixes for the GASS/MDS problems.
    When the EDG software for release 1.2 is deemed stable.
  The EDG 2.0 release in the fall will be based on Globus2!
  OGSA is being evaluated, but there is no wholesale move yet.
    Some new EDG software functions as a Web Service.
Testing & Support
  Testing Group
    Goal: intensive testing of releases.
    Provide a framework for: unit tests, integration tests, stress tests.
    Provide material for objective evaluation of the software for the EU review.
    Use tests to: check the quality of the software, verify functionality, check the configuration of new sites.
    Has started with EDG 1.2 ( 10); should have feedback for the EDG 1.2 deployment decision.
  Support Infrastructure
    Provide -based support for both end-users and system administrators: ITeam and other experts; new system administrator group.
    Tracking & follow-up of problems.
    Create a knowledge base for FAQs and typical problems.
    Interact with LCG and CrossGrid to share the support effort.
    System in place shortly; fully functional for Testbed2.
EDG Software License
  The EDG software license will be in the BSD family (see the EDG website):
    Open-source license.
    Developments may be put back into the code base.
    Allows commercial use of the code.
    Standard license for most Grid projects.
    Exception: ClassAds and Condor-G will be LGPL.
  EDG audit of external packages:
    Necessary to ensure we can apply our own license.
    Necessary to ensure that we properly attribute other groups' work.
    Need to be especially careful with GPL code.
    Ensure that core functionality is consistent with the license.
    LCFG will likely be under the GPL rather than the EDG license.
Release Schedule
  Moved to iterative releases:
    Keep developments compatible.
    Provide intermediate checks on progress.
    Allow applications to evaluate functionality.
  Not all intermediate releases will be deployed!
  Release 2.0 is a hard deadline; the others are somewhat flexible.
  Details are in the Release Plan document on the web site; highlights…
  Release   Date
  1.1       Jan
            March
            May
            July
            Sept. 30
Release 1.2
  General: emphasis on stability; deploy as the production release.
  Globus: uses the first Globus2 beta (beta-21) plus EDG patches.
  Workload Management (WP1): proxy renewal for long jobs; auto-resubmission of failed jobs.
  Data Management (WP2): Replica Manager (first implementation); GDMP 3.0.
  Fabric Management (WP4): updated LCFG; EDG Gatekeeper (LCAS).
  Storage Element (WP5): access to existing data in MSS.
  Networking (WP7): publish network data into MDS.
Release 1.3
  General: autobuild all EDG packages; copyright and license for code.
  Globus: update to the latest Globus2 release.
  Workload Management (WP1): C APIs; MPICH support.
  Data Management (WP2): Replica Manager; Replica Location Service (giggle).
  Grid Monitoring/Information Services (WP3): R-GMA deployed in parallel with MDS.
  Fabric Management (WP4): EDG JobManager.
  Storage Element (WP5): RFIO with GSI; prototype GridFTP with MSS access.
  Networking (WP7): network cost function.
Release 1.4
  General: support RH6.2, RH7.2; GLUE schema; new authorization scheme.
  Workload Management (WP1): interactive jobs; job dependencies; triggered file transfers.
  Data Management (WP2): Replica Manager with Optimiser; SpitFire beta release.
  Grid Monitoring/Information Services (WP3): better integration of R-GMA; unified (GLUE) schema.
  Fabric Management (WP4): KickStart translator; monitoring & alarms; Condor supported.
  Storage Element (WP5): DiskManager for disk-only SE.
  Testbed (WP6): new authorization scheme.
  Networking (WP7): publication of network metrics.
Release 2.0
  General: support RH6.2, RH7.2, Solaris?
  Workload Management (WP1): job checkpointing; accounting; advance reservation.
  Data Management (WP2): full integration of components.
  Grid Monitoring/Information Services (WP3): R-GMA WebServices.
  Fabric Management (WP4): HLD templates; credential service (LCMAPS).
  Storage Element (WP5): DiskManager access to all HSM; reservation, pinning, quotas.
  Testbed (WP6): laptop-based UI machine.
  Networking (WP7): network cost for all sites.
Interoperability
  Working with GriPhyN, PPDG, iVDGL, DataTag, CrossGrid.
  The first concrete example is the GLUE schema.
  Places for conflict: information systems; agreed interfaces.
Further Information
  Interesting web sites:
    EDG: general information about the EDG project; links to all work package web sites.
    WP6: support information (contacts, bug reporting, documentation, mailing lists); meeting agendas/minutes; links to source code in CVS and packages in the package repository.
  Bleeding-edge information:
    Warning: this is a high-volume list!