Panda Monitoring, Job Information, Performance Collection Kaushik De (UT Arlington), Torre Wenaus (BNL) OSG All Hands Consortium Meeting March 3, 2008.


1 Panda Monitoring, Job Information, Performance Collection
Kaushik De (UT Arlington), Torre Wenaus (BNL)
OSG All Hands Consortium Meeting, March 3, 2008

2 Panda Basics (Torre Wenaus, BNL)
Workload management system for Production ANd Distributed Analysis
Developed by U.S. ATLAS, now adopted ATLAS-wide
- Launched 8/05 to achieve scalable data-driven WMS
- Production 12/05
- Integrated with data mgmt
- Pilot-based ‘CPU harvesting’
- Analysis as well as production
- Automation, monitoring, low operations manpower
- Insulate users (end- and VO-) from grid complexity, problems
- Lower entry threshold
- OSG program since 9/06
- VO-neutral
- Condor integration
- Cautious in its dependencies
- Proven components

3 Operations Monitoring: Panda Monitoring Link

4 Workflow Monitoring

5 Error Reporting, Tracking

7 Job Information: Detailed information from job specification schema

8 User Level Monitoring - ‘My Panda’

9 Non-ATLAS OSG Usage
Currently CHARMM protein folding application. Soliciting others.
User/VO does
- job submission, using simple http-based Python client
- pilot submission, such that pilots carry their DN identity
- queue group (tag) organization they require
BNL/ATLAS/OSG provides
- Panda service/DB infrastructure; same as used by US ATLAS
- Panda monitoring, VO customization possible
- Configured machine(s) for VO pilot submission (@ Madison)
- Support from ~3 FTE pool at BNL
- Future: Data mgmt and data-driven workflow

10 Usage Accounting
- Accounted by ‘Panda site’
- Corresponding to queue(s) at a physical site, or a VO
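Accounting by ‘Panda site’ amounts to grouping job records by a site key and totaling usage per group. The following is a minimal sketch of that idea only; the record fields (`panda_site`, `vo`, `cpu_seconds`, `status`) and the site names are hypothetical and do not reflect the actual Panda DB schema.

```python
from collections import defaultdict

# Hypothetical job records; field names are illustrative, not the Panda schema.
jobs = [
    {"panda_site": "SITE_A", "vo": "atlas", "cpu_seconds": 3600, "status": "finished"},
    {"panda_site": "SITE_A", "vo": "atlas", "cpu_seconds": 1800, "status": "failed"},
    {"panda_site": "SITE_B", "vo": "charmm", "cpu_seconds": 7200, "status": "finished"},
]


def usage_by_site(records):
    """Total job count, CPU time, and failures for each Panda site."""
    totals = defaultdict(lambda: {"jobs": 0, "cpu_seconds": 0, "failed": 0})
    for rec in records:
        entry = totals[rec["panda_site"]]
        entry["jobs"] += 1
        entry["cpu_seconds"] += rec["cpu_seconds"]
        if rec["status"] == "failed":
            entry["failed"] += 1
    return dict(totals)


summary = usage_by_site(jobs)
for site, entry in sorted(summary.items()):
    print(site, entry)
```

Grouping on `vo` instead of `panda_site` would give the per-VO view of the same records.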

11 Queue Info DB
- Site/queue status, configuration info
- Loaded from various sources: grid info services, data management configuration, Panda configuration
- Automatic control of current queue status from BDII
- Or, operator-driven queue status via http interface
- Data for intelligent brokerage: e.g. available releases, memory
- Site performance statistics gathering
- The basis of dynamic brokerage
- Dynamic pilot rate controls
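To illustrate how queue information of this kind could feed brokerage, here is a small sketch that filters queues on status, installed release, and memory, then prefers queues with a lower recent failure rate. The `Queue` class, its fields, and the ranking rule are assumptions made for illustration; they are not Panda's brokerage algorithm.

```python
from dataclasses import dataclass, field


@dataclass
class Queue:
    """Hypothetical queue record; fields are illustrative only."""
    name: str
    status: str                      # e.g. "online" or "offline"
    releases: set = field(default_factory=set)
    memory_mb: int = 0
    recent_failure_rate: float = 0.0


def eligible_queues(queues, release, min_memory_mb):
    """Keep online queues that have the release and enough memory,
    ordered from lowest to highest recent failure rate."""
    ok = [
        q for q in queues
        if q.status == "online"
        and release in q.releases
        and q.memory_mb >= min_memory_mb
    ]
    return sorted(ok, key=lambda q: q.recent_failure_rate)


queues = [
    Queue("Q1", "online", {"rel-1", "rel-2"}, 2048, 0.20),
    Queue("Q2", "online", {"rel-2"}, 4096, 0.05),
    Queue("Q3", "offline", {"rel-2"}, 4096, 0.01),
    Queue("Q4", "online", {"rel-1"}, 4096, 0.00),
]

candidates = eligible_queues(queues, release="rel-2", min_memory_mb=2000)
print([q.name for q in candidates])
```

In this sketch, Q3 is excluded because it is offline and Q4 because it lacks the requested release.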

12 Panda Monitoring Usage by OSG
- The obvious way: use Panda WMS! As CHARMM does
- But can monitoring be used independently of Panda doing the workload management?
- Some motivations:
  - Uniform OSG-wide usage monitoring/reporting/job diagnostics
  - Managing resource controls and quotas: usage data gathering; defining, applying and enforcing quotas
  - VO-specific data reporting and presentation
- Answer is yes, quite easily, if there is interest
- Through simple http based data submission to Panda DBs

13 Panda Monitoring Outside Panda
- Panda job submission interface is based on http’ing an info packet defining the job to the Panda server
- Could use the same interface to define a job to Panda for monitoring purposes only
- Job status updates would be sent to Panda the same way
- So current job state, job time per state etc. can be recorded
- Because Panda DB schema remain unchanged, Panda monitoring works out of the box
- While also being customizable based on VO-specific job info sent with the job definition
- Similarly the usage reporting and performance summarizing tools would be available
- Give us a guinea pig VO/application and we can try this out
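The monitoring-only idea above can be sketched as two pieces: building an HTTP POST that carries a job info packet, and deriving time per state from a sequence of status updates. Everything named here is an assumption for illustration: the URL, the packet fields (`jobName`, `vo`, `site`, `monitorOnly`), and the state names are hypothetical and are not the real Panda server interface. The request is only constructed, never sent.

```python
from urllib.parse import urlencode, parse_qs
from urllib.request import Request

# Hypothetical endpoint; not a real Panda server URL.
SERVER_URL = "https://panda.example.org/server/submitJob"


def build_job_packet(job_name, vo, site, extra=None):
    """Assemble an info packet; VO-specific fields go in `extra`."""
    packet = {"jobName": job_name, "vo": vo, "site": site, "monitorOnly": "1"}
    packet.update(extra or {})
    return packet


def build_request(url, packet):
    """Encode the packet as a form body and wrap it in a POST request."""
    body = urlencode(packet).encode("utf-8")
    return Request(url, data=body, method="POST")


def time_per_state(updates, end_time):
    """updates: time-ordered list of (timestamp_seconds, state).
    Returns seconds spent in each state up to end_time."""
    durations = {}
    following = updates[1:] + [(end_time, None)]
    for (start, state), (stop, _) in zip(updates, following):
        durations[state] = durations.get(state, 0) + (stop - start)
    return durations


packet = build_job_packet("fold-001", "charmm", "SITE_A", {"molecule": "example"})
request = build_request(SERVER_URL, packet)
# urllib.request.urlopen(request) would send it; omitted in this sketch.

updates = [(0, "defined"), (60, "running"), (660, "finished")]
durations = time_per_state(updates, end_time=660)
print(request.get_method(), parse_qs(request.data.decode("utf-8")))
print(durations)
```

A status update would be built the same way as the job definition, with the job identifier and new state in the packet.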

14 Site Performance Probes
- Primarily, Panda monitoring tracks/reports actual workflow of VO specific applications
- Recent extensions include data management and release installation applications (for ATLAS)
- Further extension could be VO specific test probes
- Special pilot jobs probe VO specific functionalities
- Runs at regular intervals at all sites
- Panda monitoring provides integrated interface to site performance through pilot probes
- On our ToDo list for ATLAS; could become generic tool for all VO’s
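As a rough illustration of how results from periodic probes might be rolled up into a per-site view, the sketch below counts passes and lists failed checks for each site. The probe names, site names, and result tuple layout are hypothetical; the slides do not specify what the probes test or how results are stored.

```python
from collections import defaultdict

# Hypothetical probe results: (site, probe name, passed?).
probe_results = [
    ("SITE_A", "release_check", True),
    ("SITE_A", "storage_write", False),
    ("SITE_B", "release_check", True),
    ("SITE_B", "storage_write", True),
]


def summarize_probes(results):
    """Count passed probes and collect failed probe names per site."""
    summary = defaultdict(lambda: {"passed": 0, "failed": []})
    for site, probe, ok in results:
        if ok:
            summary[site]["passed"] += 1
        else:
            summary[site]["failed"].append(probe)
    return dict(summary)


probe_summary = summarize_probes(probe_results)
for site, entry in sorted(probe_summary.items()):
    print(site, entry)
```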

