Status of the European DataGrid Project – Charles Loomis (LAL/CNRS) – LAL, December 12, 2002


C. Loomis – Status EDG – Dec. 12, 2002 – 1
Status of the European DataGrid Project
Charles Loomis (LAL/CNRS)
LAL, December 12, 2002

Outline:
 Introduction & Goals
 EDG Architecture
 EDG Deployment & Use
 External Software
 Typical Failure Modes
 Future Developments

Slide 2 – European DataGrid (EDG)
European DataGrid
 EU-funded, 3-year project (2001–2003).
 Goals:
—develop grid middleware;
—deploy it onto a working testbed;
—demonstrate grid technology with working applications.
 The strong application component is unique!

EDG Organization
WP1: Workload Mgt.
WP2: Data Mgt.
WP3: Info. & Monitoring Sys.
WP4: Fabric Mgt.
WP5: Storage Mgt.
WP6: Testbed
WP7: Networking
WP8: HEP Apps.
WP9: Biomedical Apps.
WP10: Earth Obs. Apps.
WP11: Dissemination
WP12: Project Mgt.
6 partners; 21 associates.

Slide 3 – EDG Goals
Actors: End Users, Virtual Organizations, Site Administrators.

Transparent Access
 Allow users transparent access to authorized resources with a single authentication.
 Allow users to delegate authorization to services.
 High-level selection of resources, including datasets.

Virtual Organizations
 Allow groups of people to acquire resources from sites.
 Allow an organization to manage resource use among its members.

Optimization
 Allow optimal use of resources at both the site and grid levels.

Slide 4 – EDG Architecture
A global batch system:
 Centralized architecture.
 Heavy infrastructure.

[Architecture diagram] Components: User Interface, Resource Broker, Information Systems (MDS and Replica Catalogs), and a Computing Element plus Storage Element at each site. The user submits a job from the User Interface; sites publish their state to the Information Systems; the Resource Broker queries them and chooses the optimal site for the job; the user then retrieves the output.
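The broker's role in the flow above, matching a job against published site state and ranking the candidates, can be sketched as follows. This is a minimal illustration with made-up site attributes, not the actual EDG WP1 matchmaking language or information-system schema.

```python
# Hypothetical resource-broker matchmaking sketch: filter sites that can
# run the job, then rank them. Attribute names are illustrative only.

def choose_site(job, sites):
    """Return the name of the best matching site, or None if no site fits."""
    candidates = [
        s for s in sites
        if s["free_cpus"] >= job["cpus"]          # enough free capacity
        and job["vo"] in s["supported_vos"]       # site accepts the user's VO
    ]
    if not candidates:
        return None
    # Rank by free CPUs, a stand-in for a real ranking expression.
    return max(candidates, key=lambda s: s["free_cpus"])["name"]

sites = [
    {"name": "siteA", "free_cpus": 10, "supported_vos": {"cms"}},
    {"name": "siteB", "free_cpus": 50, "supported_vos": {"cms", "atlas"}},
]
print(choose_site({"cpus": 4, "vo": "atlas"}, sites))  # prints siteB
```

Note that even this toy version shows why the broker is a heavy, central component: it must hold a fresh copy of every site's state to make any decision at all.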

Slide 5 – Comments
Optimization of Resources (centralized architecture)
 Resource Broker:
—must know the state of the grid and schedule effectively;
—requires knowledge of site policies and user/job details.
 Information System (MDS & Replica Catalogs):
—must respond quickly to high-volume, high-rate queries.

Central Points of Failure
 Resource Broker (redundancy possible at the VO level).
 MDS (unique hierarchy; some redundancy possible).

With high-rate submissions:
 the RB requires lots of memory, CPU, and disk space;
 MDS requires lots of file descriptors and CPU.

Slide 6 – Authentication & Authorization
[Diagram] The flow:
 The user requests a certificate from a Certification Authority, which accepts or rejects the request (~15 national CAs: France, INFN, …).
 The user registers with a Virtual Organization (~10 different VOs: ATLAS, CMS, …).
 Sites (Computing and Storage Elements at Site X) update CRLs from the CAs and retrieve membership lists from the VOs.
 The user's proxy is sent to the site for authentication.
 Example subject DN: /C=FR/O=CNRS/OU=LAL/CN=Charles
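The site-side end of this flow boils down to taking the subject DN from the presented proxy and mapping it to a local account via a grid-mapfile-style table. A minimal sketch, with a hypothetical in-memory map rather than the real Globus grid-mapfile format:

```python
# Parse a slash-separated X.509 subject DN (as shown on the slide) and
# look it up in a toy DN-to-account map. Illustrative only.

def parse_dn(dn):
    """Split '/C=FR/O=CNRS/...' into an attribute dictionary."""
    parts = [p for p in dn.split("/") if p]
    return dict(p.split("=", 1) for p in parts)

def map_to_account(dn, gridmap):
    """Return the local account for this DN, or None if unauthorized."""
    return gridmap.get(dn)

dn = "/C=FR/O=CNRS/OU=LAL/CN=Charles"
gridmap = {dn: "edguser01"}          # hypothetical local mapping

print(parse_dn(dn)["CN"])            # prints Charles
print(map_to_account(dn, gridmap))   # prints edguser01
```

In the real system the proxy is verified cryptographically against the CA chain and the CRL before any mapping happens; the sketch covers only the identity-to-account step.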

Slide 7 – Comments
Infrastructure
 ~15 national CAs running as a production service.
 10 Virtual Organizations:
—High-Energy Physics: ALICE, BaBar, ATLAS, CMS, DZero, LHCb;
—Earth Observation;
—Biomedical Applications;
—Misc.: WP6, ITeam, Guidelines.

Limited Central Points of Failure
 VO membership server (for VO members).
 Certification Authority (for CA members).
Caching and infrequent updates minimize these problems, but compromise security.
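The caching trade-off noted on this slide can be made concrete: a site keeps serving its last-known membership list when the VO server is unreachable, which preserves availability but means revocations take effect late. A sketch with an invented class, not the actual EDG site tooling:

```python
import time

class CachedList:
    """Cache a remotely fetched membership list for `ttl` seconds.

    On fetch failure the stale copy keeps being served: availability
    wins over freshness, which is exactly the security compromise
    described on the slide.
    """

    def __init__(self, fetch, ttl=3600):
        self.fetch, self.ttl = fetch, ttl
        self.value, self.stamp = None, 0.0

    def get(self, now=None):
        now = time.time() if now is None else now
        if self.value is None or now - self.stamp > self.ttl:
            try:
                self.value = self.fetch()
                self.stamp = now
            except OSError:
                pass  # VO server unreachable: fall back to the stale cache
        return self.value

# Demo with a fake fetch function standing in for an LDAP query.
fetches = []
def fetch_members():
    fetches.append(1)
    return ["alice", "bob"]

members = CachedList(fetch_members, ttl=3600)
print(members.get(now=0.0))  # prints ['alice', 'bob']
```

A revoked member stays authorized for up to `ttl` seconds; shrinking the TTL tightens security but raises the load on the central VO server, the very point of failure the cache is meant to hide.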

Slide 8 – Deployment & Use
Development Testbed (1.4.0)
 To facilitate testing and integration of new middleware.
 3 sites (3 countries).

Production Testbed (1.4.0)
 For applications to use and stress the software in a "semi-production" environment.
 8 sites (5 countries).

Site       Location        CPUs
CC-IN2P3   Lyon (F)        4005
CERN       Geneva (CH)     1646
CNAF       Bologna (I)     40
Legnaro    Legnaro (I)     50
NIKHEF     Amsterdam (NL)  22
Padova     Padova (I)      12
RAL        Rutherford (GB) 162

Application Use
 CMS event simulation.
 ATLAS event simulation.
 Regular tutorial use.

Stability
 Filled the grid this week!

Slide 9 – Globus Experience
GSI security (OK)
 Some limitations with the size of proxies.
GridFTP (OK)
 Recent protocol change because of a security fix.
Replica Catalog (OK, limited)
 An unannounced, unnecessary schema change.
GateKeeper/JobManager (poor)
 Race conditions under load, leading to failures.
 High resource use; poor response to errors.
Information System, MDS (poor)
 Serious problems with stability.
 Query times increase dramatically under load.

Slide 10 – Globus Experience (cont.)
Interaction
 Generally responsive to identified problems.
 Little advance warning of major changes:
—schema changes;
—rewrite of the JobManager/batch-system interface.
Testing
 Essentially non-existent on the Globus side.
 Major delays in EDG because of MDS and the Gatekeeper.
 Finding, testing, and fixing major problems was done outside Globus.
Conclusion: the Globus "high-level" services are inappropriate for a production environment.

Slide 11 – Condor Experience
Condor-G
 Used for reliable job submission from the Resource Broker.
 The Condor team is responsive to problems and provides quick fixes.
 Few problems encountered in our testing.
Condor
 A supported batch system for EDG.
 Largely untested, but we expect to use it with the next major release.

Slide 12 – Typical Failure Modes
Operations
 CRL generation (CAs); CRL updates (sites).
 Network accessibility (VO LDAP servers).
 Misconfiguration of services (typically the SE).
Poor implementation (bugs)
 The most catastrophic ones have been eliminated.
Resource exhaustion
 File descriptors, ports, disk space.
Design limitations
 Central points of failure (RB, MDS).

Slide 13 – Future Developments
EDG plans:
 Advanced data management:
—a real "Storage Element";
—Replica Location Service (a distributed Replica Catalog);
—Replica Manager (a higher-level user interface).
 Job management:
—job splitting, checkpointing;
—interactive jobs.
 Replace MDS with R-GMA.
 A more robust, consistent security model.
 Local resources better tied to grid credentials.
OGSA (Open Grid Services Architecture)
 New services written as web services.
 Probably no complete conversion within the EDG lifetime.

Slide 14 – SlashGrid
A grid file system:
 Uses grid credentials for access to local files.
 Frees the grid user from needing a local Unix account:
—simplifies the mapping of users to accounts;
—allows true account recycling.
More uses:
 Could hide remote access to data.
 Provides compatibility with the Globus security model.
 …
Implementation:
 A user-space daemon on top of the Coda kernel module.
 A plug-in interface allows easy extension.
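The "account recycling" point above can be illustrated with a lease-based pool: grid identities (DNs) borrow generic local accounts and return them when done, so no permanent per-user account is needed. This is a hypothetical sketch of the idea, not SlashGrid's actual mechanism:

```python
class AccountPool:
    """Lease generic pool accounts to grid identities (DNs).

    When a lease is released, the account returns to the pool and can
    be recycled for a different DN, the benefit the slide attributes
    to credential-based file access. Illustrative sketch only.
    """

    def __init__(self, accounts):
        self.free = list(accounts)   # accounts available for lease
        self.leases = {}             # DN -> leased account

    def map(self, dn):
        """Return this DN's account, leasing a fresh one if needed."""
        if dn in self.leases:
            return self.leases[dn]
        if not self.free:
            raise RuntimeError("account pool exhausted")
        acct = self.free.pop(0)
        self.leases[dn] = acct
        return acct

    def release(self, dn):
        """End the lease; the account becomes reusable."""
        self.free.append(self.leases.pop(dn))

pool = AccountPool(["grid001", "grid002"])
print(pool.map("/C=FR/O=CNRS/OU=LAL/CN=Charles"))  # prints grid001
```

Recycling is safe only if the account's files and processes are cleaned between leases; with a credential-based file system the files are owned by the DN rather than the Unix account, which is what makes the cleanup tractable.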

Slide 15 – Authentication & Authorization (VOMS)
[Diagram] The flow:
 The user requests a certificate from a Certification Authority, which accepts or rejects the request (~15 national CAs: France, INFN, …).
 The user requests a "ticket" from the VOMS server.
 The proxy, carrying the VO attributes, is sent to the site (Computing and Storage Elements at Site X) for authentication and authorization.
 Sites update CRLs from the CAs.
 The authorization decision is made locally!
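The "local authorization decision" is the key difference from the previous scheme: the site no longer pulls membership lists, it reads the VO attributes carried in the proxy and applies its own policy. A minimal sketch with invented attribute and policy structures (the real VOMS attributes are signed and carried inside the proxy certificate):

```python
# Hypothetical site-local authorization from VOMS-style proxy attributes.
# Attribute and policy shapes are illustrative, not the real VOMS format.

def authorize(proxy_attrs, site_policy):
    """Grant or deny access using only locally held policy."""
    rule = site_policy.get(proxy_attrs.get("vo"))
    if rule is None:
        return False  # this site does not support the user's VO
    # Sites keep the final say: a locally banned DN is refused even
    # if the VO vouches for it.
    return proxy_attrs.get("dn") not in rule.get("banned", set())

site_policy = {
    "atlas": {"banned": set()},
    "cms":   {"banned": {"/CN=Mallory"}},
}
user = {"dn": "/C=FR/O=CNRS/OU=LAL/CN=Charles", "vo": "atlas"}
print(authorize(user, site_policy))  # prints True
```

Because the decision needs only the proxy and local policy, the VOMS server drops out of the critical path at job time, removing one of the central points of failure listed earlier.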

Slide 16 – Conclusions
Software & Testbed:
 A production-quality security infrastructure is in place.
 Production and development testbeds:
—deployed;
—starting to see heavy use by end-users;
—reasonable stability for the first time.
 Failure modes:
—moving from bugs and operations problems to design and resource limitations.
Unanswered questions:
 Can optimization be achieved? At what level?
 How can resources be limited, reserved, and shared?
 Can efficient scheduling be done with inhomogeneous site policies?