1
A proposal for distributed computing monitoring for SuperB
G. Russo, D. Del Prete, S. Pardi, INFN Napoli & Università Federico II, Napoli, Italy
Kick Off Meeting - Isola d'Elba, May 29th – June 1st, 2011
2
The rationale
– The distributed computing system that will support the SuperB project will need a sound software tool for management and monitoring.
– Most functionalities have already been coded (e.g. at the Atlas Tier1), but there is no general model that can be used in a distributed environment.
– The typical case is the Italian SuperB Tier1 for offline analysis: it is not yet designed, but it will likely be a distributed Tier1 spread over three or four separate sites.
3
Computing sites for SuperB in Italy
4
The Model
– Distributed computing centers have requirements covering heterogeneous needs.
– We require a monitoring system that allows us to centralize a wide range of services.
[Diagram: System Monitoring Model, with the labels "Sw requirements", "kind of users" and "services"]
5
System Monitoring Requirements 1/3: non-functional requirements
– Highly usable and cross-platform: Web-based interactive interface
– Adequate user profiling: Operator, Support, User
– User authentication through X.509 certificates
– Service-Oriented Architecture
– Modular and extensible composition
– Single Sign-On authentication
– Access to distributed remote resources
6
System Monitoring Requirements 2/3
– Centralize all necessary applications in a Web portal
– Use individual applications as components
7
Why Liferay?
– Already chosen by IGI, the new Italian Grid Infrastructure institute
– Experienced users in Napoli, Catania, Bari
– Open source, but support is available
– Integration with authentication and authorization tools already done
– Can accommodate existing tools with minimal rewriting, if any
8
System Monitoring Requirements 3/3: functional requirements
The functional requirements concern the monitoring of services at every level of the Grid architecture. They range from machine-level service monitoring (CPU, storage, ambient sensors, …) to the monitoring of the resources that users access through applications (grid site mapping, advanced queue and job monitoring, …).
9
The System Features
Machine-level services monitoring (Fabric Layer)
“Status and notifications about all basic services”
– Server nodes: CPU load, disk space, free memory, ping, …
– Network devices: Tx/Rx traffic, traffic load %, ping latency, detection of errored/discarded packets, …
– Ambient sensors: liquid cooling and ambient temperature, fan speed, …
– Temporal graph reports
– Event log reporting
– Web management access
– Interactive map consultation
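The "status and notifications" idea for server nodes can be illustrated with a minimal Python sketch. It is not the tool presented in these slides: the helper names (`classify`, `disk_used_percent`, `node_report`), the metric names and the threshold values are all assumptions introduced for illustration only.

```python
import os
import shutil


def classify(value, warn, crit):
    """Map a metric where higher is worse to OK / WARNING / CRITICAL."""
    if value >= crit:
        return "CRITICAL"
    if value >= warn:
        return "WARNING"
    return "OK"


def disk_used_percent(path=os.sep):
    """Percentage of disk space in use on the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def node_report(metrics, thresholds):
    """Return {metric name: status} for one node.

    `thresholds` maps each metric name to a (warn, crit) pair.
    """
    return {
        name: classify(value, *thresholds[name])
        for name, value in metrics.items()
    }


# Hypothetical thresholds, in percent; real values would be site policy.
THRESHOLDS = {"disk_used_pct": (80.0, 95.0), "cpu_load_pct": (70.0, 90.0)}

if __name__ == "__main__":
    # The CPU value is a placeholder sample, not a real measurement.
    metrics = {"disk_used_pct": disk_used_percent(), "cpu_load_pct": 42.0}
    print(node_report(metrics, THRESHOLDS))
```

A real deployment would collect such values remotely and feed them to the event log and notification components; here everything runs locally to keep the sketch self-contained.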
10
System Features
Middleware-level monitoring and management
– Verification that each node's installation is consistent with the services it must provide (job execution, toolkit software, …)
– Monitoring of package versions across the distributed nodes
– Distributed initialization of the remote nodes from the web interface
– Storage Resource Manager services monitoring
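Package version monitoring across distributed nodes amounts to comparing what each node reports against a reference list. The Python sketch below shows one possible way to do that. The function name `find_version_drift`, the host names and the package names are hypothetical, and how the installed versions are gathered from each node is left out.

```python
def find_version_drift(reference, nodes):
    """Compare installed package versions against a reference set.

    reference: {package: expected_version}
    nodes:     {host: {package: installed_version}}
    Returns {host: {package: (expected, found)}} for mismatches only;
    `found` is None when the package is missing on that host.
    """
    drift = {}
    for host, installed in nodes.items():
        diffs = {}
        for package, expected in reference.items():
            found = installed.get(package)
            if found != expected:
                diffs[package] = (expected, found)
        if diffs:
            drift[host] = diffs
    return drift


if __name__ == "__main__":
    # Hypothetical package and host names, for illustration only.
    reference = {"middleware-core": "3.2.1", "toolkit": "1.0.4"}
    nodes = {
        "wn01.site-a.example": {"middleware-core": "3.2.1", "toolkit": "1.0.4"},
        "wn02.site-b.example": {"middleware-core": "3.1.9"},
    }
    for host, diffs in find_version_drift(reference, nodes).items():
        for package, (expected, found) in diffs.items():
            print(f"{host}: {package} expected {expected}, found {found}")
```

Returning only the mismatches keeps the result small, which suits a centralized interface that has to summarize many nodes at once.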
11
The System Features
Local Resource Management Systems (LRMS) monitoring
– Grid site mapping
– Advanced queue and job monitoring
All information at LRMS level, for example:
– Host on which the job is running
– Time when the job was created
– Time when the job was queued
– Time when the job became eligible to be sent to execution
– Time when the job started running
Monitoring points of view:
– Virtual organizations
– Virtual organization groups
– Queues
Graphical reporting of states
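The LRMS-level job information and the three monitoring points of view (virtual organization, group, queue) map naturally onto a small record type plus a grouping function. The following Python sketch is one possible representation, not the schema of any real LRMS: the class `JobRecord`, its field names and the helper `count_by` are assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class JobRecord:
    """One job as seen by the LRMS (hypothetical field names)."""
    job_id: str
    vo: str                       # virtual organization
    group: str                    # virtual organization group
    queue: str
    exec_host: Optional[str]      # host on which the job is running
    created: datetime
    queued: Optional[datetime] = None
    eligible: Optional[datetime] = None
    started: Optional[datetime] = None

    def wait_seconds(self):
        """Seconds between queueing and start, or None if not yet started."""
        if self.queued is None or self.started is None:
            return None
        return (self.started - self.queued).total_seconds()


def count_by(jobs, attribute):
    """Count jobs per value of `attribute`, e.g. 'vo', 'group' or 'queue'."""
    return Counter(getattr(job, attribute) for job in jobs)


if __name__ == "__main__":
    t0 = datetime(2011, 5, 29, 10, 0, 0)
    jobs = [
        JobRecord("1", "vo_a", "prod", "long", "wn01", t0,
                  queued=t0, started=datetime(2011, 5, 29, 10, 1, 0)),
        JobRecord("2", "vo_a", "users", "short", None, t0, queued=t0),
        JobRecord("3", "vo_b", "users", "short", None, t0),
    ]
    print(count_by(jobs, "vo"))
    print(count_by(jobs, "queue"))
    print(jobs[0].wait_seconds())
```

The same `count_by` call serves all three points of view, so a graphical report only needs to choose which attribute to group on.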
12
The System Features
Application-level WebUI (Application Layer)
Access and management of jobs in grid systems
User services:
– Submit a job to the grid system
– Check the status of each of the user’s jobs
– Delete the user’s jobs
– Retrieve the user’s job outputs
– Report any errors through clear messages that include the reason for the error
– Retrieve information about the Storage Element, Computing Element, LFC and TAG
– Send and retrieve files to and from different Storage Elements and register them in the LFC
13
What would users like to monitor in a distributed Tier1?
END USERS:
– Access to Grid applications and resource monitoring
– All LRMS information (queue and job status)
– WebUI job submission
– Data flow from Tier 0 (as in Atlas Tier 2)
– Disk space control
– Device and machine status
14
What would users like to monitor in a distributed Tier1?
OPERATIONS: (includes End-Users privileges)
– Remote administration of distributed file systems and storage resources (as in Atlas Tier 1)
– Remote web management access (nodes, network devices, …) (as in Atlas Tier 2)
– Distributed initialization of the remote nodes (from the web interface) and package version monitoring through a centralized interface (new)
15
What would users like to monitor in a distributed Tier1?
SUPPORT: (includes End-Users privileges)
– The Support user can access all contents available to the Operator user, but cannot change configurations
– Event log web interface
– Notifications of critical events
– Ticket system for troubleshooting
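The relationship between the three user profiles (end users, support, operations) can be expressed as sets of permissions. The Python sketch below is only an illustration of that layering: the permission names, the role keys and the function `is_allowed` are assumptions, and a real portal would take roles from its own authorization layer.

```python
from enum import Enum, auto


class Permission(Enum):
    """Hypothetical permission names derived from the profile descriptions."""
    VIEW_MONITORING = auto()       # LRMS information, device and machine status
    SUBMIT_JOBS = auto()           # WebUI job submission
    VIEW_OPERATIONS = auto()       # contents of the Operator profile
    CHANGE_CONFIGURATION = auto()  # remote administration, node initialization


END_USER = frozenset({Permission.VIEW_MONITORING, Permission.SUBMIT_JOBS})
# Support sees what the Operator sees, but cannot change configurations.
SUPPORT = END_USER | {Permission.VIEW_OPERATIONS}
# Operations includes the end-user privileges plus configuration changes.
OPERATOR = END_USER | {
    Permission.VIEW_OPERATIONS,
    Permission.CHANGE_CONFIGURATION,
}

ROLE_PERMISSIONS = {
    "end_user": END_USER,
    "support": SUPPORT,
    "operator": OPERATOR,
}


def is_allowed(role, permission):
    """Return True if `role` holds `permission`; unknown roles get nothing."""
    return permission in ROLE_PERMISSIONS.get(role, frozenset())


if __name__ == "__main__":
    print(is_allowed("support", Permission.VIEW_OPERATIONS))
    print(is_allowed("support", Permission.CHANGE_CONFIGURATION))
```

Building the Support and Operator sets on top of the End User set mirrors the "includes End-Users privileges" note attached to both profiles.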
16
System architecture
Liferay integrates all tools and will provide X.509 authentication and access to services following a single sign-on approach.
17
Atlas Tier2 experience
Using Liferay as a portlet container, we could integrate several heterogeneous tools into a single integrated view.
18
Atlas Network Monitoring
19
Atlas Devices Monitoring
20
EXAMPLE
On-line example at: http://tier2.na.infn.it