INFNGRID Workshop – Bari, Italy, October 2004

1 INFNGRID Workshop – Bari, Italy, 25-27 October 2004
Experience supporting the experiments' Data Challenges on LCG-2, and requirements
Flavia Donno, IT-GD, CERN, for the LCG/GDA group
EGEE is a project funded by the European Union under contract IST

2 Mandate of the LCG/EIS Team
EIS: Experiment Integration and Support Team
– Help the LHC experiments integrate their production environments with the Grid middleware and utilities.
– Give support during all steps of the integration process: understanding the middleware functionality, testing new prototype components, getting onto the LCG infrastructure.
– Production is the main focus.
– Experiment Support does not mean User Support.
– Experiment Support does not mean GOC.
INFNGRID Workshop, Bari, Italy CERN IT-GD October

3 EIS Organization
One person per experiment:
– Patricia Mendez Lorenzo: ALICE
– Simone Campana: ATLAS
– Andrea Sciabà: CMS
– Roberto Santinelli: LHCb
– Antonio Delgado Peris: development and docs
– Flavia Donno: coordination, docs, special middleware distributions, examples
The EIS Testbed
Central repository for special utilities:
– Experiment Software Installation Toolkit
– IS interface tools
– Data Management prototype utilities
– Special WMS (integration with the experiment catalog)
– Authorization APIs

4 Main Tasks
Our experience with integration and support differed before and during the experiments' Data Challenges.
Integration:
– Help with middleware functionality and usage
– Perform functionality tests
– Provide special distributions
– Provide missing tools where needed
– Discuss requirements and bring them to the attention of the developers
– Investigate problems and understand their origin
– Check how the middleware and infrastructure are used and suggest better ways where appropriate
Support:
– Provide documentation: manuals, guides, user scenarios, FAQ
– Provide usage examples
– Provide and maintain a private testbed
– Answer first-line user support questions

5 Integration During the Experiments' Data Challenges
– Everything described up to now
– Active participation in the daily organization meetings
– Understanding of the experiment-specific production environment
– Development of special utilities for use in experiment-specific software (MonALISA sensors, IS APIs, RLS interface, etc.)
Quite an intensive activity: it takes one person full time per experiment.

6 Support During the Experiments' Data Challenges
Everything described up to now, but also:
– Monitoring the experiment-specific production system (even in shifts)
– Providing full user support
– Configuring experiment-specific utilities (acrontab, etc.)
– Chasing mis-configured sites and solving site-related problems
– Suggesting better site configurations for resource usage
– Monitoring and managing Grid and experiment-specific services
– Providing security advice
These tasks are not really in the EIS mandate.

7 Middleware Issues
See …, which provides summaries of middleware-related issues.
Important: this was the first systematic comparison of the required functionality with the capabilities of the existing middleware.
– Some issues can be patched or worked around, but most have to be fed directly into gLite and future developments as essential requirements.
– Some are fundamental problems with the underlying models and architectures.
Middleware:
– Not perfect, but quite stable
– Much has been improved during the DCs; a lot of effort is still going into improvements and fixes
– The big hole is the missing space management on SEs
– The largest problem now is stable operations, and providing status information and useful tools to users

8 Middleware Issues
Solved:
– Leak in WMS APIs
– Resubmission to the same site
– "BrokerHelper: cannot plan" (retries with the BDII implemented in the RB)
– Jobs remaining in waiting state due to an overloaded BDII
– Services crashing
– Queue timing information not normalized
– Queues per VO
– Latency in BDII refresh
– "Globus Error 155" (8% of jobs affected, output not retrievable)
– Fuzzy rank improved
– Proxy reuse
...
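Regarding the improved fuzzy rank: as we understand the idea, the RB should not always send every job to the single best-ranked CE, since that floods one site while comparable sites stay idle. The sketch below is only an illustration, not the RB's actual algorithm: it assumes a softmax-style choice where each CE is picked with probability proportional to exp(rank / temperature). The function name, the CE identifiers and the `temperature` parameter are all hypothetical.

```python
import math
import random


def fuzzy_select(ranks, temperature=1.0, rng=random):
    """Pick one CE with probability proportional to exp(rank / temperature).

    ranks: dict mapping a (hypothetical) CE identifier to its numeric rank,
    where a higher rank means a more attractive site.
    A low temperature approaches "always take the best CE"; a high one
    spreads jobs almost uniformly over all matching CEs.
    """
    if not ranks:
        raise ValueError("no matching CEs")
    names = list(ranks)
    best = max(ranks.values())
    # Subtract the best rank before exponentiating so math.exp cannot overflow.
    weights = [math.exp((ranks[name] - best) / temperature) for name in names]
    return rng.choices(names, weights=weights, k=1)[0]
```

For example, `fuzzy_select({"ce-a": 10.0, "ce-b": 9.5})` returns `"ce-a"` more often than `"ce-b"`, but not always, which is the behaviour that avoids sending every job to the same site.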

9 Middleware Issues - Requirements
To be solved:
– BDII response time: a "better" information system
– Possibility to select sites based on the published characteristics of the WNs (such as available disk space, memory, swap, etc.)
– Support for bulk operations
– Better monitoring tools
– Improved mechanism to distribute jobs on the Grid
– The RB should automatically take into account certain published site information (such as the maximum number of queueable jobs)
– Retrieval of sandbox files always possible
– Improved error reporting
– Data Management (DM) with pre-staging capabilities
– Default SE for output storage
– DM with retries and timeouts; a real DM service
– Scalable File Catalogue (FC) with transactions and security
– A real SE solution
– Consistency between FC and SE content
...
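The "DM with retries and timeouts" requirement can be illustrated on the client side. This is a minimal sketch, assuming the DM operation is an external command line tool; the function name, its parameters and the backoff policy are hypothetical, and no real LCG DM command or flag is implied. Each attempt gets its own timeout, and failed attempts are retried with exponentially growing pauses.

```python
import subprocess
import time


def run_with_retries(cmd, attempts=3, timeout=60.0, backoff=2.0, sleep=time.sleep):
    """Run an external data-management command with a per-attempt timeout
    and a bounded number of retries.

    cmd: argument list for a (hypothetical) copy/registration tool.
    Returns the CompletedProcess of the first successful attempt; raises
    RuntimeError once all attempts have failed or timed out.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            result = subprocess.run(cmd, capture_output=True, text=True,
                                    timeout=timeout)
            if result.returncode == 0:
                return result
            last_error = f"exit code {result.returncode}: {result.stderr.strip()}"
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child before re-raising TimeoutExpired.
            last_error = f"timed out after {timeout} s"
        if attempt < attempts:
            # Pause 1, 2, 4, ... seconds (for backoff=2) before the next attempt.
            sleep(backoff ** (attempt - 1))
    raise RuntimeError(f"{cmd[0]} failed after {attempts} attempts ({last_error})")
```

Passing `sleep` as a parameter is only a convenience for testing; a real DM service would also have to deal with partially written replicas and catalogue entries left behind by a timed-out attempt, which a client-side wrapper like this cannot do.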

10 Operational Issues
– Slow response from sites (upgrades, response to problems, etc.): problems are reported daily, and some last for weeks
– Lack of staff available to fix problems (all on vacation, …)
– Mis-configurations (see next slide)
– Lack of configuration management: problems that have been fixed reappear
– Lack of fabric management
– Lack of understanding (training?): admins reformat the disks of SEs …
– Firewall issues: often no good coordination between grid admins and firewall maintainers
– PBS problems: are we seeing the scaling limits of PBS?
– Forgetting to read the documentation …

11 Site Mis-configuration
Site mis-configuration was responsible for most of the problems that occurred during the experiments' Data Challenges. Here is a non-exhaustive list of problems:
– The variable VO_<VO>_SW_DIR points to a non-existent area on the WNs.
– The ESM (Experiment Software Manager) is not allowed to write to the area dedicated to software installation.
– Only one certificate is allowed to be mapped to the ESM local account.
– Wrong information published in the information system (GLUE object classes not linked).
– Queue time limits published in minutes instead of seconds, and not normalized.
– /etc/ld.so.conf not properly configured, so shared libraries are not found.
– Machines not synchronized in time.
– Grid-mapfiles not properly built.
– Pool accounts not created, although the rest of the tools are configured to use pool accounts.
– Firewall issues.
– CA files not properly installed.
– NFS problems with home directories or ESM areas.
– Services configured to use the wrong BDII.
– Wrong user profiles.
– Default user shell environment too big.
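The first two items in the list usually surface only when a job fails on the WN. As a minimal sketch, a check run at the start of a job could report them explicitly. The function name and its parameters are hypothetical and not part of any LCG tool, and we assume the variable name is formed simply by upper-casing the VO name (e.g. VO_ATLAS_SW_DIR).

```python
import os


def check_sw_dir(vo, environ=None, need_write=False):
    """Return a list of problems with the VO software area on this node.

    vo: VO name; ASSUMPTION: the variable is VO_<VO>_SW_DIR with the VO
    name upper-cased and no other character mapping.
    need_write: also require write access (relevant for the ESM account).
    An empty list means no problem was detected.
    """
    env = os.environ if environ is None else environ
    var = f"VO_{vo.upper()}_SW_DIR"
    path = env.get(var)
    if not path:
        return [f"{var} is not set"]
    problems = []
    if not os.path.isdir(path):
        problems.append(f"{var}={path} does not exist on this node")
    elif need_write and not os.access(path, os.W_OK):
        # os.access checks the real uid/gid, i.e. the mapped local account.
        problems.append(f"{var}={path} is not writable by this account")
    return problems
```

Used as `check_sw_dir("atlas", need_write=True)` in an ESM installation job, a non-empty result can be reported back before any software installation is attempted, instead of failing halfway through.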

12 Summary
– EIS provides help integrating VO-specific software environments with the Grid middleware.
– Our organization provides for direct experiment support via a contact person, a testbed, a software repository, special middleware distributions and documentation.
– Integration and support are quite intensive tasks.
– During the experiments' Data Challenges we covered tasks that properly belong to a GOC or GGUS: chasing middleware bugs, maintaining experiment-specific services, assisting sites with configuration problems, collecting middleware and functionality requirements.
– Overall a very interesting and productive experience (the LHC experiments seem to find the EIS team very supportive and needed).
Our mailing list:
EIS experiment contacts:
EIS testbed users:
Our WEB site:
