FP6−2004−Infrastructures−6-SSA-026409 www.eu-eela.org E-infrastructure shared between Europe and Latin America Giuseppe Platania INFN Catania First EELA.


1 FP6−2004−Infrastructures−6-SSA E-infrastructure shared between Europe and Latin America Giuseppe Platania INFN Catania First EELA ROC-on-Duty Tutorial Itacuruçá Island, State of Rio de Janeiro, Brazil 29 November December 2006 Troubleshooting of common problems

2 Outline
–SECURITY
–JOB SUBMISSION
–SITE BDII

3 SECURITY

4 Security (1/5) GRAM Authentication test failure
–Test: globusrun -a -r
    GRAM Authentication test failure: authentication failed:
    GSS Major Status: Authentication Failed
    GSS Minor Status Error Chain:
    init.c:499: globus_gss_assist_init_sec_context_async: Error during context initialization
    init_sec_context.c:171: gss_init_sec_context: SSLv3 handshake problems
    globus_i_gsi_gss_utils.c:888: globus_i_gsi_gss_handshake: Unable to verify remote side's credentials
    globus_i_gsi_gss_utils.c:847: globus_i_gsi_gss_handshake: Unable to verify remote side's credentials: Couldn't verify the remote certificate
    OpenSSL Error: s3_pkt.c:1046: in library: SSL routines, function SSL3_READ_BYTES: sslv3 alert bad certificate
–Solutions: on the CE, check that the CA rpm is installed and that port 2119 is not closed by a firewall.
–You can find more details on page 2 of the troubleshooting guide.
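The firewall half of this check can be scripted from the UI. The following is a minimal sketch, assuming bash and the coreutils `timeout` command are available; `port_open` is a hypothetical helper name and `my-CE` is a placeholder host. It only tells whether a TCP connection can be opened, not whether the gatekeeper behind the port is healthy.

```shell
# port_open HOST PORT [TIMEOUT_SECONDS]
# Returns 0 if a TCP connection to HOST:PORT can be opened, non-zero otherwise.
port_open() {
    local host=$1 port=$2 wait=${3:-5}
    # bash's /dev/tcp pseudo-device opens a TCP connection;
    # timeout bounds the wait when a firewall silently drops packets.
    timeout "$wait" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Usage (my-CE is a placeholder):
#   if port_open my-CE 2119; then
#       echo "port 2119 reachable"
#   else
#       echo "port 2119 closed or filtered"
#   fi
```

A refused connection fails immediately, while a filtered port fails only after the timeout, which can help tell a stopped service from a firewall rule.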

5 Security (2/5) Invalid CRL: The available CRL has expired
–Test: globusrun -a -r
    GSS authentication failure
    GSS Major Status: Authentication Failed
    GSS Minor Status Error Chain:
    accept_sec_context.c:170: gss_accept_sec_context: SSLv3 handshake problems
    globus_i_gsi_gss_utils.c:881: globus_i_gsi_gss_handshake: Unable to verify remote side's credentials
    globus_i_gsi_gss_utils.c:854: globus_i_gsi_gss_handshake: SSLv3 handshake problems: Couldn't do ssl handshake
    OpenSSL Error: s3_srvr.c:1816: in library: SSL routines, function SSL3_GET_CLIENT_CERTIFICATE: no certificate returned
    globus_gsi_callback.c:351: globus_i_gsi_callback_handshake_callback: Could not verify credential
    globus_gsi_callback.c:477: globus_i_gsi_callback_cred_verify: Could not verify credential
    globus_gsi_callback.c:769: globus_i_gsi_callback_check_revoked: Invalid CRL: The available CRL has expired
    Failure: GSS failed Major:000a0000 Minor: Token:
–Solutions: on the CE, check whether the CRL has expired (see /var/log/globus-gatekeeper.log). If it has, run: /opt/glite/libexec/fetch-crl.sh
–You can find more details on pages 3-4 of the troubleshooting guide.
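Besides reading the gatekeeper log, the expiry of a CRL file can be checked directly. The sketch below assumes GNU `date` (for the `-d` option) and takes as input the single line printed by `openssl crl -noout -nextupdate`; `crl_is_expired` is a hypothetical helper name, and the CRL location shown in the usage comment is an assumption, not something stated on the slide.

```shell
# crl_is_expired "nextUpdate=<date>"
# Returns 0 if the nextUpdate date lies in the past, 1 if not, 2 on a parse error.
crl_is_expired() {
    local when=${1#nextUpdate=}      # strip the label printed by openssl
    local next_epoch now_epoch
    next_epoch=$(date -u -d "$when" +%s 2>/dev/null) || return 2
    now_epoch=$(date -u +%s)
    [ "$next_epoch" -lt "$now_epoch" ]
}

# Usage (the file path is an assumed example):
#   line=$(openssl crl -in /etc/grid-security/certificates/<hash>.r0 -noout -nextupdate)
#   crl_is_expired "$line" && echo "CRL expired: run fetch-crl"
```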

6 Security (3/5) FTPD GSSAPI error: GSS Major Status: General failure
–Test: edg-gridftp-ls gsiftp:// /
    error the server sent an error response: FTPD GSSAPI error: GSS Major Status: Authentication Failed
    535-FTPD GSSAPI error: GSS Minor Status Error Chain:
    535-FTPD GSSAPI error: accept_sec_context.c:170: gss_accept_sec_context: SSLv3 handshake problems
    535-FTPD GSSAPI error: globus_i_gsi_gss_utils.c:881: globus_i_gsi_gss_handshake: Unable to verify remote side's credentials
    535-FTPD GSSAPI error: globus_i_gsi_gss_utils.c:854: globus_i_gsi_gss_handshake: SSLv3 handshake problems: Couldn't do ssl handshake
    535-FTPD GSSAPI error: OpenSSL Error: s3_srvr.c:1816: in library: SSL routines, function SSL3_GET_CLIENT_CERTIFICATE: no certificate returned
    535-FTPD GSSAPI error: globus_gsi_callback.c:351: globus_i_gsi_callback_handshake_callback: Could not verify credential
    535-FTPD GSSAPI error: globus_gsi_callback.c:436: globus_i_gsi_callback_cred_verify: The certificate has expired: Credential with subject: /C=IT/O=GILDA/OU=Personal Certificate/L=INFN Catania/CN=Giuseppe has expired.
    535 FTPD GSSAPI error: accepting context
–Solutions: synchronize the clocks of the nodes.
–You can find more details on page 5 of the troubleshooting guide.
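A quick way to see whether two nodes disagree about the time is to compare their epoch seconds. This is only a sketch: `clock_skew_ok` is a hypothetical helper, the 60-second default tolerance is an arbitrary assumption rather than a Globus limit, and the usage line assumes ssh access to a placeholder host `my-CE`.

```shell
# clock_skew_ok LOCAL_EPOCH REMOTE_EPOCH [MAX_SKEW_SECONDS]
# Returns 0 if the two timestamps differ by at most MAX_SKEW_SECONDS (default 60).
clock_skew_ok() {
    local a=$1 b=$2 max=${3:-60} diff
    diff=$(( a - b ))
    if [ "$diff" -lt 0 ]; then
        diff=$(( -diff ))            # absolute value
    fi
    [ "$diff" -le "$max" ]
}

# Usage (my-CE is a placeholder):
#   clock_skew_ok "$(date -u +%s)" "$(ssh my-CE date -u +%s)" 60 \
#       || echo "clocks differ by more than 60 s"
```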

7 Security (4/5) No local mapping for Globus ID
–Test: edg-gridftp-ls gsiftp:// /
    error the server sent an error response: LCMAPS credential mapping NOT successful (see /var/log/gridftp-lcas_lcmaps.log)
    LCMAPS 0: :57: : lcmaps_plugin_voms_poolaccount-plugin_run(): no match (or no poolaccount available) for group (/VO=gilda/GROUP=/gilda) in /opt/edg/etc/lcmaps/gridmapfile
–Solutions: ensure that the pool account files (such as gildaxxx) exist under /etc/grid-security/gridmapdir.
–You can find more details on pages 6-8 of the troubleshooting guide.

8 Security (5/5) LCMAPS credential mapping NOT successful
–Test: edg-gridftp-ls gsiftp:// /
    error the server sent an error response: LCMAPS credential mapping NOT successful (see /var/log/gridftp-lcas_lcmaps.log)
    LCMAPS 0: :57: : lcmaps_plugin_voms_poolaccount-plugin_run(): no match (or no poolaccount available) for group (/VO=gilda/GROUP=/gilda) in /opt/edg/etc/lcmaps/gridmapfile
–Solutions: check whether:
    o the VO is enabled
    o the VOMS entries are present in /opt/edg/etc/lcmaps/gridmapfile
    o all pool accounts are already in use
–You can find more details on page 9 of the troubleshooting guide.
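Whether pool accounts exist and whether any are still free can be checked by inspecting the gridmapdir. The sketch below assumes the usual gridmapdir convention, in which a leased account is recorded as a second hard link to the pool account file, so a file with a link count of 1 is unused. `free_pool_accounts` is a hypothetical helper and `gilda` is the pool prefix taken from the example above.

```shell
# free_pool_accounts GRIDMAPDIR PREFIX
# Prints the pool account files (PREFIX followed by digits) that have
# exactly one hard link, i.e. are not currently leased to a DN.
free_pool_accounts() {
    local dir=$1 prefix=$2
    find "$dir" -maxdepth 1 -type f -name "${prefix}[0-9]*" -links 1 \
        -exec basename {} \; | sort
}

# Usage:
#   free_pool_accounts /etc/grid-security/gridmapdir gilda
```

Empty output means either that no pool account files exist for that prefix or that every account is in use; listing the directory distinguishes the two cases.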

9 JOB SUBMISSION

10 Job Submission (1/8) 7 authentication failed
–Reasons from edg-job-get-logging-info:
    7 authentication failed: GSS Major Status: Authentication Failed
    GSS Minor Status Error Chain:
    init.c:497: globus_gss_assist_init_sec_context_async: Error during context initialization
    init_sec_context
–Solutions: check for a reverse lookup problem in /etc/hosts on the client side, or in the DNS configuration.
–You can find more details on page 10 of the troubleshooting guide.

11 Job Submission (2/8) Cannot read JobWrapper output...
–Reasons from edg-job-get-logging-info:
    Cannot read JobWrapper output, both from Condor and from Maradona
–Solutions:
    o Fix the WN, CE, DNS or batch system configuration.
    o Check the PBS status by running pbsnodes and qstat.
    o Try restarting the PBS daemons on the CE (and on the WN).
    o The gatekeeper and the gridftpd on the CE might not map the DN to the same local user. This can happen if one service is configured to use VOMS (via LCMAPS) while the other relies on the standard grid-mapfile. Test this as follows:
        $ globus-job-run my-CE /usr/bin/id
        $ globus-url-copy file:/etc/group gsiftp://my-CE/tmp/test
        $ globus-job-run my-CE /bin/ls -l /tmp/test
–You can find more details in the troubleshooting guide.
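The mapping test on this slide boils down to comparing the user name reported by `id` (gatekeeper mapping) with the owner of the file written through gridftp. A minimal sketch of that comparison follows; `same_mapping` is a hypothetical helper, and it assumes the usual `uid=N(name) ...` format of `id` and the owner in the third column of `ls -l`.

```shell
# same_mapping ID_OUTPUT LS_OUTPUT
# Returns 0 if the user in the `id` output owns the file in the `ls -l` output.
same_mapping() {
    local id_out=$1 ls_out=$2 id_user ls_user
    # Extract "name" from "uid=1234(name) gid=..."
    id_user=$(printf '%s\n' "$id_out" | sed -n 's/^uid=[0-9]*(\([^)]*\)).*/\1/p')
    # The third column of a long listing is the file owner
    ls_user=$(printf '%s\n' "$ls_out" | awk 'NR == 1 { print $3 }')
    [ -n "$id_user" ] && [ "$id_user" = "$ls_user" ]
}

# Usage, after the globus-url-copy step has written /tmp/test:
#   same_mapping "$(globus-job-run my-CE /usr/bin/id)" \
#                "$(globus-job-run my-CE /bin/ls -l /tmp/test)" \
#       || echo "gatekeeper and gridftpd map the DN to different users"
```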

12 Job Submission (3/8) Brokerhelper: Cannot plan. No compatible resources
–Reasons from edg-job-get-logging-info:
    Cannot plan (a helper failed)
–Solutions: the status "Cannot plan (a helper failed)" means that a helper of the Workload Manager failed. Matchmaking may fail because of a failing middleware component, unavailable application software, or a wrong request in the job JDL:
    o middleware failure: caused by Information Service problems (the service is down)
    o application software unavailable: the JDL requires a software version that is wrong or not supported by that site
    o wrong user request: the user asks for an unsupported CPU type, operating system, memory or VO
–You can find more details in the troubleshooting guide.

13 Job Submission (4/8) ssh problem from WN to CE
–Test: globus-job-run :2119/jobmanager-lcgpbs -q short /bin/id
    The command returns no output.
–Solutions:
    o Remove the shosts.equiv and ssh_known_hosts files from the /etc/ssh directory on the CE.
    o Re-run the following scripts on the CE (they usually also run as cron jobs):
        /opt/edg/sbin/edg-pbs-knownhosts
        /opt/edg/sbin/edg-pbs-shostsequiv
    o Re-run the following script on the WN (it usually also runs as a cron job):
        /opt/edg/sbin/edg-pbs-knownhosts
–You can find more details on page 15 of the troubleshooting guide.

14 Job Submission (5/8) submit-helper script... gave error: cache export dir...
–Test: globus-job-run :2119/jobmanager-lcgpbs /bin/id
    submit-helper script running on host lxb1761 gave error: cache_export_dir (/home/dteam002/.lcgjm/globus-cache-export.Of5sOd) on gatekeeper did not contain a cache_export_dir.tar archive
–Solutions:
    o The CE is not running a gridftp daemon. Check on the CE:
        /etc/init.d/globus-gridftp status
        Restart it as needed.
    o The gridftp port could be closed.
–You can find more details in the troubleshooting guide.

15 Job Submission (6/8) Globus error 3
–Reason from edg-job-get-logging-info:
    Got a job held event, reason: Globus error 3: an I/O operation failed
–Solutions: the underlying problem is very low memory. queue_submit() in GRAM's Helper.pm checks the memory and returns a NORESOURCES error if the free memory is less than 2% of the total. NORESOURCES is GRAM error 3, so the failure is not necessarily an I/O problem. Find the WN that has this problem and reboot it.
–You can find more details on page 18 of the troubleshooting guide.
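To see how close a node is to the 2% threshold, the free-memory percentage can be computed from /proc/meminfo. This is a sketch under an assumption: it uses the MemTotal and MemFree fields, which may not be exactly the quantities that Helper.pm inspects. `free_mem_percent` is a hypothetical helper that reads meminfo-formatted text on standard input.

```shell
# free_mem_percent < /proc/meminfo
# Prints free memory as a whole-number percentage of total memory (truncated).
free_mem_percent() {
    awk '/^MemTotal:/ { total = $2 }
         /^MemFree:/  { free = $2 }
         END {
             if (total > 0) printf "%d\n", free * 100 / total
             else exit 1
         }'
}

# Usage on the suspect node:
#   pct=$(free_mem_percent < /proc/meminfo)
#   [ "$pct" -lt 2 ] && echo "free memory below 2% of total"
```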

16 Job Submission (7/8) Unspecified gridmanager error
–Reason from edg-job-get-logging-info:
    Got a job held event, reason: Unspecified gridmanager error
–Solutions: this error appears when the user does not have permission to submit to the given queue, or when the batch system is in a bad state. Check both in the configuration of the CE.
–You can find more details on page 19 of the troubleshooting guide.

17 Job Submission (8/8) GRAM Job submission failed
–Test: globus-job-run :2119/jobmanager-lcgpbs /bin/id
    GRAM Job submission failed because the job manager failed to open stderr (error code 74)
–Solutions: the UI does not have inbound connectivity for the GLOBUS_TCP_PORT_RANGE ( ). Fix the UI's firewall.
–You can find more details on page 20 of the troubleshooting guide.
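The job submission slides amount to a lookup from an error signature to a first check. As a compact summary, that lookup can be sketched as a shell `case` statement; `suggest_fix` is a hypothetical helper, and the hints are abbreviated from the slides above rather than taken from any tool.

```shell
# suggest_fix REASON_TEXT
# Prints a first thing to check for a known error signature; returns 1 if unknown.
suggest_fix() {
    case $1 in
        *"authentication failed"*)
            echo "check reverse lookup in /etc/hosts on the client, or the DNS configuration" ;;
        *"Cannot read JobWrapper output"*)
            echo "check WN, CE, DNS and batch system configuration; compare gatekeeper and gridftpd mappings" ;;
        *"Cannot plan"*)
            echo "check the Information Service and the JDL requirements" ;;
        *"cache_export_dir"*)
            echo "check that the gridftp daemon runs on the CE and that its port is open" ;;
        *"Globus error 3"*)
            echo "check free memory on the affected WN" ;;
        *"Unspecified gridmanager error"*)
            echo "check queue permissions and the batch system state on the CE" ;;
        *"failed to open stderr"*)
            echo "check inbound connectivity to the UI for GLOBUS_TCP_PORT_RANGE" ;;
        *)
            echo "no known signature; consult the troubleshooting guide"
            return 1 ;;
    esac
}

# Usage:
#   suggest_fix "Got a job held event, reason: Unspecified gridmanager error"
```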

18 SITE BDII

19 Site BDII (1/3) Check if the GIIS is running on CE
–Test: ldapsearch -x -h <host> -p <port> -b mds-vo-name=<site-name>,o=grid
    ldap_bind: Can't contact LDAP server
–Solution: check if the site BDII is running on the CE:
    o /etc/init.d/bdii status
    o restart it as needed

20 Site BDII (2/3) The Site BDII doesn't publish CE information
–Ensure that /opt/bdii/etc/bdii-update.conf contains the CE ldap URL, such as:
    CE ldap://ce.localdomain:2135/mds-vo-name=local,o=grid
–Solution: put the CE ldap URL in /opt/bdii/etc/bdii-update.conf and restart the BDII service (see /opt/bdii/var/bdii.log to check for errors).
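To verify what the site BDII is configured to query, the entries in bdii-update.conf can be listed in a normalized form. The sketch below assumes each entry has the form shown on the slide, a name followed by an ldap URL on one line; `list_bdii_sources` is a hypothetical helper, and lines starting with `#` are treated as comments, which is also an assumption.

```shell
# list_bdii_sources < /opt/bdii/etc/bdii-update.conf
# Prints "NAME HOST PORT" for every "NAME ldap://HOST:PORT/..." line.
list_bdii_sources() {
    sed -n -E 's|^[[:space:]]*([^#[:space:]]+)[[:space:]]+ldap://([^:/[:space:]]+):([0-9]+)/.*$|\1 \2 \3|p'
}

# Usage:
#   list_bdii_sources < /opt/bdii/etc/bdii-update.conf
#   # no "CE ..." line in the output means the CE URL is missing or malformed
```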

21 Site BDII (3/3) Site BDII error
–Test: tail -f /opt/bdii/var/bdii.log
    CE: ldap_bind: Can't contact LDAP server
    Time for searches: 0 s
    Time to update DB: 1 s
    Grabbing port 2170 for 2172
    Tue Sep 19 11:47:45 CEST 2006
    Sleeping for 30
–Solution: ensure that the GRIS is running:
    o /etc/init.d/globus-mds restart
    o restart it as needed
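Instead of watching the log by eye, the sources that the BDII could not contact can be extracted from it. This sketch assumes each failure is logged on its own line as `NAME: ldap_bind: Can't contact LDAP server`, as in the excerpt above; `failed_bdii_sources` is a hypothetical helper that reads log text on standard input.

```shell
# failed_bdii_sources < /opt/bdii/var/bdii.log
# Prints the unique names of sources whose LDAP server could not be contacted.
failed_bdii_sources() {
    sed -n "s/^\([^:]*\): ldap_bind: Can't contact LDAP server.*/\1/p" | sort -u
}

# Usage:
#   failed_bdii_sources < /opt/bdii/var/bdii.log
#   # "CE" in the output points at the GRIS on the CE
```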

