
1 Open Science Grid
The OSG Accounting System: GRATIA
by Philippe Canal (FNAL) & Matteo Melani (SLAC)
Mumbai, India
CHEP2006

2 What is Accounting? (in the Grid context)
Grid accounting is the process of maintaining a (consistent) Grid-wide view of VO members' resource utilization. [1]
[1] Accounting in Grid Environments, by Peter Gardfjäll, Department of Computing Science, Umeå University, Sweden
CHEP2006, Philippe Canal (Fermilab) & Matteo Melani (SLAC)

3 Why do we want an accounting system?
• Resource providers (SLAC, Fermilab…) want to perform cost-benefit analysis
• Resource providers want to improve planning
• Resource providers want better security
• Resource providers want to improve QoS (priorities, debugging…)
• Support for a Grid “Economic Model”

4 What is the real problem (solution)?
• Nobody talked about “Grid economy”
• Do we really want an accounting system?
• Or maybe a monitoring system will do?
Let's look at accounting and monitoring

5 Accounting vs. Monitoring
A monitoring system:
• Purpose: monitoring system health, debugging, system profiling
• Gathers state information about the system resources
• Collects system events
• Works like a DAQ system: as close as possible to the system, as unintrusive as possible
• Quasi-real-time to real-time
An accounting system:
• Keeps track of resource usage
• Links a user's service requests with the resources consumed to satisfy those requests
• Has accounts, banks, “currency” and supports an economic model (policies)
• Works “after the fact”

6 For Example: Monitoring at SLAC
What do we monitor:
• Network
  • Switch and router status
  • Internet Mbytes/sec in/out
• Computer clusters
  • Batch systems, NFS and AFS servers, database servers
• Storage space
  • Disk usage, HPSS
Some metrics we use:
• CPU utilization, memory
• Disk usage, disk I/O
• Various networking metrics (Mbytes in/out of switches, routers, servers…)
• Some primitive job submission results (LSF)
We use a lot of monitoring tools and infrastructure: Ganglia, Nagios, OpenView, SNTP tools, MonALISA…

7 For Example: Accounting at SLAC?
• The monitoring system cannot link resource usage to users/groups
  • Maybe by looking into the logs and correlating the events… but that is a lot of work
• Accounting infrastructures and tools à la Ganglia or Nagios do not exist
• Basically, we cannot (yet) fully link a user name with a precise set of computing resource usage metrics

8 What I think we should track
• Job submission:
  • Priority in the batch queue
  • CPU time
  • Wall-clock time
  • Memory usage
• Storage
  • Disk usage
  • Tape storage usage
  • Storage class (to be defined)
• Network data transfer
  • Network speed
  • Quantity of data transferred
• Special software usage, operator/administrator services… maybe later
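The metrics listed on this slide could be grouped into a single per-job record. The sketch below is only an illustration: the class names, field names and units (`UsageRecord`, `cpu_seconds`, `disk_bytes`, and so on) are assumptions made here, not the record format used by Gratia.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class JobUsage:
    """Job submission metrics from the slide (units are assumed)."""
    queue_priority: int
    cpu_seconds: float
    wall_seconds: float
    max_memory_kb: int


@dataclass
class StorageUsage:
    disk_bytes: int = 0
    tape_bytes: int = 0
    storage_class: Optional[str] = None  # "to be defined" on the slide


@dataclass
class NetworkUsage:
    bytes_transferred: int = 0
    avg_speed_bytes_per_s: float = 0.0


@dataclass
class UsageRecord:
    """Hypothetical record tying one user and VO to the resources consumed."""
    user: str
    vo: str
    job: JobUsage
    storage: StorageUsage = field(default_factory=StorageUsage)
    network: NetworkUsage = field(default_factory=NetworkUsage)

    def cpu_efficiency(self) -> float:
        """CPU time divided by wall-clock time; 0.0 if no wall time was recorded."""
        if self.job.wall_seconds <= 0:
            return 0.0
        return self.job.cpu_seconds / self.job.wall_seconds


example_record = UsageRecord(
    user="alice",
    vo="example-vo",
    job=JobUsage(queue_priority=10, cpu_seconds=90.0,
                 wall_seconds=120.0, max_memory_kb=512000),
)
print(example_record.cpu_efficiency())
```

Keeping the user and VO on the same record as the metrics is the point of the exercise: it is exactly the link that the monitoring tools on the previous slides cannot provide.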

9 Goals
• Track service and resource usage per grid user, after the fact
• Focus on the quality, integrity and security of the information
• Make accounting information easily available to people (web interface) and to applications (Web Services)
• Build a system that is simple to manage (install, configure and upgrade) and to extend (well-defined APIs)
• Base it on well-proven, standard (industrial-strength) technologies
• However, we do not cover (but keep in mind):
  • User charging system
  • Resource or service pricing
  • Support for an economic model for resource allocation

10 System Properties
• Interoperability
  • The accounting system should leverage existing standards to maximize interoperability with other Grids and accounting services.
• Fault tolerance
  • Reduce and flag data loss.
  • Resilient to communication failures over LAN and WAN.
  • Resilient to the failure of any one of its components.
• Security
  • Guarantees integrity and non-repudiation of the accounting records at the site level.
  • Uses secure communication channels (mutual authentication, message integrity, confidentiality) and access control lists.
• Scalability and performance
  • Not really an issue
• Other
  • Leverage existing tools and infrastructures to solve related problems.

11 Simple Domain Model

12 Design Direction
• We are currently focused on getting the infrastructure right, more than on the specific metrics used to measure resource usage
• Open: we provide APIs
• Distributed: Meters are distributed objects
• Based on open-source, standard technologies: Web Services, Java Platform, Tomcat, Axis, Hibernate
• Same idea as GUMS and JClarens: the service is an independent Tomcat application (JClarens for authentication)
• Ensure interoperability with OSG partners (LCG, TeraGrid…)

13 Architecture Overview

14 Meter
• A Meter is responsible for:
  • Gathering all the data about a Grid service's usage
  • Gathering all the data about the resources used by that Grid service
  • Assembling a Service Usage Record
• Logically, there is one Meter entity per Grid service
• Each Meter is composed of one or more Probes and one Assembler (plus some other components for management functions)
• A Grid service uses resources distributed across the Resource Provider's LAN; therefore the Meter is also distributed
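The composition described on this slide (one Meter per Grid service, made of one or more Probes and one Assembler) can be sketched as below. This is a minimal in-process illustration under assumed names: `ProbeReading`, `Meter.measure` and the dictionary layout of the resulting record are all hypothetical, and the real Probes are distributed across the LAN rather than being local function calls.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ProbeReading:
    resource: str  # e.g. "cpu_seconds" (hypothetical metric name)
    value: float


# A Probe is modelled here as a callable returning readings for one job id.
Probe = Callable[[str], List[ProbeReading]]


class Assembler:
    """Merges the readings from all Probes into one service usage record."""

    def assemble(self, service: str, job_id: str,
                 readings: List[ProbeReading]) -> dict:
        usage: Dict[str, float] = {}
        for reading in readings:
            # Sum readings that report the same resource.
            usage[reading.resource] = usage.get(reading.resource, 0.0) + reading.value
        return {"service": service, "job_id": job_id, "usage": usage}


class Meter:
    """One Meter per Grid service: several Probes, exactly one Assembler."""

    def __init__(self, service: str, probes: List[Probe], assembler: Assembler):
        self.service = service
        self.probes = probes
        self.assembler = assembler

    def measure(self, job_id: str) -> dict:
        readings: List[ProbeReading] = []
        for probe in self.probes:
            readings.extend(probe(job_id))
        return self.assembler.assemble(self.service, job_id, readings)


def batch_probe(job_id: str) -> List[ProbeReading]:
    # Stand-in for a probe reading the batch system's logs.
    return [ProbeReading("cpu_seconds", 90.0), ProbeReading("wall_seconds", 120.0)]


def storage_probe(job_id: str) -> List[ProbeReading]:
    # Stand-in for a probe running on a storage server.
    return [ProbeReading("disk_bytes", 2048.0)]


meter = Meter("batch-gatekeeper", [batch_probe, storage_probe], Assembler())
usage_record = meter.measure("job-42")
print(usage_record)
```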

15 Meter Logical View

16 Meter's Probe and Assembler
• Probes use a secure channel (mutual authentication, data integrity) to send usage information to the Assemblers.
• Usage information is packaged in ProbeEvents that are sent to the Assemblers through a Web Service interface.
• Each ProbeEvent object has a standard header and a payload in XML format.
• Probes use “at-least-once” delivery semantics to send ProbeEvents to the Assemblers (communication is resilient to failure).
• Assemblers can choose synchronous or asynchronous processing of ProbeEvents.
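At-least-once delivery usually means that the sender retries until it gets an acknowledgement, and the receiver tolerates duplicates. The sketch below illustrates that idea only; the header fields (`probe_id`, `event_id`), the XML element names, and the `FlakyAssembler` stand-in for the Web Service endpoint are assumptions made here, not the actual ProbeEvent schema or interface.

```python
import uuid
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class ProbeEvent:
    """Hypothetical event: a small header plus an XML payload."""
    probe_id: str
    payload_xml: str
    event_id: str = field(default_factory=lambda: uuid.uuid4().hex)


class FlakyAssembler:
    """Stand-in for the Assembler endpoint; fails the first N calls."""

    def __init__(self, failures_before_success: int):
        self.remaining_failures = failures_before_success
        self.seen_ids = set()
        self.accepted = []

    def receive(self, event: ProbeEvent) -> bool:
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise ConnectionError("simulated network failure")
        # Duplicates are expected under at-least-once delivery, so
        # each event id is processed only the first time it arrives.
        if event.event_id not in self.seen_ids:
            self.seen_ids.add(event.event_id)
            self.accepted.append(ET.fromstring(event.payload_xml))
        return True  # acknowledgement


def send_at_least_once(event: ProbeEvent, assembler: FlakyAssembler,
                       max_attempts: int = 5) -> int:
    """Retry until acknowledged; returns the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        try:
            if assembler.receive(event):
                return attempt
        except ConnectionError:
            continue  # a real probe would wait (back off) before retrying
    raise RuntimeError("event not acknowledged; keep it queued for a later retry")


event = ProbeEvent("probe-1", "<usage><cpuSeconds>90</cpuSeconds></usage>")
assembler = FlakyAssembler(failures_before_success=2)
attempts = send_at_least_once(event, assembler)
send_at_least_once(event, assembler)  # a duplicate resend is harmless
print(attempts, len(assembler.accepted))
```

The duplicate check on the receiving side is what makes retries safe: without it, a lost acknowledgement would lead to the same usage being counted twice.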

17 Collector
Main functionalities:
• Hosting the Meters' components (the Assemblers) that are responsible for assembling Service Usage Records
• Monitoring the Meters' components called Probes
• Communication between Probes and Assemblers: routing of ProbeEvents to the proper Assembler
• Communication between Assemblers and the Data Store
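The four Collector responsibilities listed above can be illustrated with a small in-memory sketch. Everything named here is hypothetical (`Collector.host`, `Collector.route`, `silent_probes`, the event dictionary keys); in the design described by the slides the Collector is a Tomcat application and the Data Store is a database.

```python
from typing import Dict, List


class InMemoryDataStore:
    """Stand-in for the Data Store component."""

    def __init__(self) -> None:
        self.records: List[dict] = []

    def save(self, record: dict) -> None:
        self.records.append(record)


class EchoAssembler:
    """Trivial Assembler: turns one event into one usage record."""

    def __init__(self, service: str) -> None:
        self.service = service

    def assemble(self, event: dict) -> dict:
        return {"service": self.service,
                "probe_id": event["probe_id"],
                "usage": dict(event["usage"])}


class Collector:
    def __init__(self, data_store: InMemoryDataStore) -> None:
        self.data_store = data_store
        self.assemblers: Dict[str, EchoAssembler] = {}
        self.last_seen: Dict[str, float] = {}

    def host(self, assembler: EchoAssembler) -> None:
        """Hosting: register the Assembler for its Grid service."""
        self.assemblers[assembler.service] = assembler

    def route(self, event: dict) -> dict:
        """Route an event to the proper Assembler and store the result."""
        self.last_seen[event["probe_id"]] = event["timestamp"]
        assembler = self.assemblers.get(event["service"])
        if assembler is None:
            raise LookupError(f"no Assembler hosted for service {event['service']!r}")
        record = assembler.assemble(event)
        self.data_store.save(record)
        return record

    def silent_probes(self, now: float, max_age: float) -> List[str]:
        """Probe monitoring: ids of probes not heard from within max_age seconds."""
        return sorted(p for p, t in self.last_seen.items() if now - t > max_age)


store = InMemoryDataStore()
collector = Collector(store)
collector.host(EchoAssembler("batch"))
collector.host(EchoAssembler("storage"))
collector.route({"probe_id": "p1", "service": "batch",
                 "timestamp": 100.0, "usage": {"cpu_seconds": 90.0}})
collector.route({"probe_id": "p2", "service": "storage",
                 "timestamp": 400.0, "usage": {"disk_bytes": 2048}})
print(collector.silent_probes(now=500.0, max_age=300.0))
```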

18 Collector Logical View
[Diagram; surviving label: Data Store Component]

19 Accountant
• This component is intended for future use.
• Main functionalities:
  • Further process the Service Usage Records to apply economic policy (pricing & billing)

20 Deployment View
• Deployed as a Tomcat application: can take advantage of Tomcat clustering features for scalability and availability
• Collector and Publisher can run on two different Tomcat instances
• Can use the most popular database implementations; the database server can be on the same host as Tomcat or on a different host
• Probes can run anywhere on the LAN

21 Deployment Diagram

23 Conclusion
• More information
  • Project Charter, Requirements and Design Documents
  • OSG Accounting Twiki page
  • Mailing list: osg-accounting@openscience.org
• Any questions, comments, etc.?

24 SPARE SLIDES

26 Overview
[Diagram; recoverable labels: Probe, Collector, Repository of Accounting Records, Data Store Access Layer, WS API, Web Presenter, Statistical Analyzer. Sites shown: Resource Provider Site, Grid Operation Center, VO Center.]

