INFSO-RI-508833 Enabling Grids for E-sciencE www.eu-egee.org www.glite.org Building a robust distributed system: some lessons from R-GMA CHEP-07, Victoria,

Slides:



Advertisements
Similar presentations
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks MyProxy and EGEE Ludek Matyska and Daniel.
Advertisements

21 Sep 2005LCG's R-GMA Applications R-GMA and LCG Steve Fisher & Antony Wilson.
WP2: Data Management Gavin McCance University of Glasgow.
INFSO-RI Enabling Grids for E-sciencE Information and Monitoring Status and Plans GridPP18, Glasgow, Mar 2007.
INFSO-RI Enabling Grids for E-sciencE Information and Monitoring Status and Plans GridPP16, QMUL, 29 Jun 2006 Steve.
EGEE is a project funded by the European Union under contract IST R-GMA: Status and Plans Antony Wilson / RAL GridPP 12 - Brunel
Thialfi: A Client Notification Service for Internet-Scale Applications
Databasteknik Databaser och bioinformatik Data structures and Indexing (II) Fang Wei-Kleiner.
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Slide
Web Performance Tuning Lin Wang, Ph.D. US Department of Education Copyright [Lin Wang] [2004]. This work is the intellectual property of the author. Permission.
Chapter 4 Memory Management Basic memory management Swapping
1 Designing Hash Tables Sections 5.3, 5.4, Designing a hash table 1.Hash function: establishing a key with an indexed location in a hash table.
Hash Tables.
Lectures on File Management
Indexing.
INFSO-RI Enabling Grids for E-sciencE Workload Management System and Job Description Language.
COEN/ELEC I Performance Testing slides 1-12 II User Documentation slides 13 - end 2.
The Zebra Striped Network File System Presentation by Joseph Thompson.
Freenet A Distributed Anonymous Information Storage and Retrieval System I Clarke O Sandberg I Clarke O Sandberg B WileyT W Hong.
The Google File System.
Wide-area cooperative storage with CFS
Backup and Recovery Part 1.
FIREWALL TECHNOLOGIES Tahani al jehani. Firewall benefits  A firewall functions as a choke point – all traffic in and out must pass through this single.
Scalability By Alex Huang. Current Status 10k resources managed per management server node Scales out horizontally (must disable stats collector) Real.
1 The Google File System Reporter: You-Wei Zhang.
EGEE-II INFSO-RI Enabling Grids for E-sciencE Introduction to R-GMA: Relational Grid Monitoring Architecture.
File Processing - Database Overview MVNC1 DATABASE SYSTEMS Overview.
FP6−2004−Infrastructures−6-SSA E-infrastructure shared between Europe and Latin America Information System (IS) Valeria Ardizzone.
Application code Registry 1 Alignment of R-GMA with developments in the Open Grid Services Architecture (OGSA) is advancing. The existing Servlets and.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks R-GMA Now With Added Authorization Steve.
VICTORIA UNIVERSITY OF WELLINGTON Te Whare Wananga o te Upoko o te Ika a Maui SWEN 432 Advanced Database Design and Implementation MongoDB Architecture.
Serverless Network File Systems Overview by Joseph Thompson.
WP3 Information and Monitoring Steve Fisher / RAL 23/9/2003.
The Alternative Larry Moore. 5 Nodes and Variant Input File Sizes Hadoop Alternative.
An information and monitoring system for static and dynamic information about grid resources, applications, networks … RDBMS Servlet aware of API during.
EGEE is a project funded by the European Union under contract IST R-GMA: Production Services for Information and Monitoring in the Grid John.
March 23 & 28, Csci 2111: Data and File Structures Week 10, Lectures 1 & 2 Hashing.
March 23 & 28, Hashing. 2 What is Hashing? A Hash function is a function h(K) which transforms a key K into an address. Hashing is like indexing.
Process Architecture Process Architecture - A portion of a program that can run independently of and concurrently with other portions of the program. Some.
INFSO-RI Enabling Grids for E-sciencE Information and Monitoring Status and Plans Plzeň, 10 July 2006 Steve Fisher/RAL.
WP3 RGMA Deployment Laurence Field / RAL Steve Fisher / RAL.
INFSO-RI Enabling Grids for E-sciencE
E-infrastructure shared between Europe and Latin America FP6−2004−Infrastructures−6-SSA gLite Information System Pedro Rausch IF.
A Data Stream Publish/Subscribe Architecture with Self-adapting Queries Alasdair J G Gray and Werner Nutt School of Mathematical and Computer Sciences,
HADOOP DISTRIBUTED FILE SYSTEM HDFS Reliability Based on “The Hadoop Distributed File System” K. Shvachko et al., MSST 2010 Michael Tsitrin 26/05/13.
A Scalable Virtual Registry Service for jGMA Matthew Grove DSG Seminar 3 rd May 2005.
INFSO-RI Enabling Grids for E-sciencE Building a robust distributed system: some lessons from R-GMA WLCG Service Reliability.
CPSC 252 Hashing Page 1 Hashing We have already seen that we can search for a key item in an array using either linear or binary search. It would be better.
INFSO-RI Enabling Grids for E-sciencE Information System Valeria Ardizzone INFN EGEE NA4 Generic Applications Meeting Catania,
EGEE is a project funded by the European Union under contract IST Information and Monitoring Services within a Grid R-GMA (Relational Grid.
FESR Trinacria Grid Virtual Laboratory Relational Grid Monitoring Architecture (R-GMA) Valeria Ardizzone INFN Catania Tutorial per Insegnanti.
INFSO-RI Enabling Grids for E-sciencE R-GMA Gergely Sipos and Péter Kacsuk MTA SZTAKI Credit to Valeria Ardizzone.
Chapter 5 Record Storage and Primary File Organizations
FILE SYSTEM IMPLEMENTATION 1. 2 File-System Structure File structure Logical storage unit Collection of related information File system resides on secondary.
INFSO-RI Enabling Grids for E-sciencE gLite Information System: R-GMA Tony Calanducci INFN Catania gLite tutorial at the EGEE User.
ZOOKEEPER. CONTENTS ZooKeeper Overview ZooKeeper Basics ZooKeeper Architecture Getting Started with ZooKeeper.
20 Copyright © 2006, Oracle. All rights reserved. Best Practices and Operational Considerations.
INFSO-RI Enabling Grids for E-sciencE Information and Monitoring Status and Plans GridPP-DB, IC London, 12 July, 2007.
Cofax Scalability Document Version Scaling Cofax in General The scalability of Cofax is directly related to the system software, hardware and network.
System Components Operating System Services System Calls.
Relational Grid Monitoring Architecture (R-GMA)
HERON.
Grid Event Management Using R-GMA Monitoring Framework
R-GMA as an example of a generic framework for information exchange
R-GMA (Relational Grid Monitoring Architecture) for monitoring applications “s” gLite and LCG.
Chapter 2: Operating-System Structures
RELATIONAL GRID MONITORING ARCHITECHTURE
THE GOOGLE FILE SYSTEM.
Overview Multimedia: The Role of WINS in the Network Infrastructure
Chapter 2: Operating-System Structures
Presentation transcript:

INFSO-RI Enabling Grids for E-sciencE Building a robust distributed system: some lessons from R-GMA CHEP-07, Victoria, Canada, 3-7 September 2007 Steve Fisher/RAL on behalf of JRA1-UK

Enabling Grids for E-sciencE INFSO-RI Steve Fisher/RAL 2 Overview GMA and R-GMA overview Managing Memory Usage Buffers –Consumer –Primary producer –Secondary producer Loss of control messages Replication –Schema –Registry Conclusions

Enabling Grids for E-sciencE INFSO-RI Steve Fisher/RAL 3 GMA Defined by the GGF –Now OGF 3 Components –Producer –Consumer –Registry Real system needs to tie down message formats –This has been done by R-GMA The INFOD-WG at GGF –IBM, Oracle and others have defined a GMA compliant specification Producer Registry Consumer Register Locate Query Data

Enabling Grids for E-sciencE INFSO-RI Steve Fisher/RAL 4 R-GMA Relational implementation of the GGFs GMA Provides a uniform method to publish and access both information and monitoring data Registry is hidden It is intended for use by: –Other middleware components –End users Easy for individuals to define, publish and retrieve data All data has a timestamp, enabling its use for monitoring Producer Registry Consumer Register Locate Query Data

Enabling Grids for E-sciencE INFSO-RI Steve Fisher/RAL 5 R-GMA Producers Primary – source of data Secondary – republish data –Co-locate information to speed up queries –Reduce network traffic PP SP PP – Primary Producer SP – Secondary Producer

Enabling Grids for E-sciencE INFSO-RI Steve Fisher/RAL 6 Three points in the R-GMA evolution EDG –corresponding to the version developed within EDG. EGEE-I – for the version deployed in gLite 3.0 EGEE-II – for the version that will be rolled out late Summer and Autumn of 2007 as upgrades to gLite 3.1 Designed to address properly all the long term problems

Enabling Grids for E-sciencE INFSO-RI Steve Fisher/RAL 7 Managing Memory Usage R-GMA may have varying amounts of memory available –May share servlet container (Tomcat) with other servlets –JVM may be badly configured EGEE-II solution –Use JDK 5 Observer to monitor memory usage –When memory low RGMABusyException returned for all user calls that may take extra memory inserting data into the system creating new producer or consumer resources We try to be fair –If you behave reasonably you should not be penalised –If problem is caused by too many reasonable demands must reject requests with the RGMABusyException.

Enabling Grids for E-sciencE INFSO-RI Steve Fisher/RAL 8 Avoiding bottlenecks in the data flow Buffers are shown B –Primary producer –Secondary producer –Consumer

Enabling Grids for E-sciencE INFSO-RI Steve Fisher/RAL 9 Consumer Buffering - problem Consumer has a buffer for each client where the results of the query are stored until they are popped –If the application is slow to pop() then buffers can fill up. –Cannot send an RGMABusyException to a pop() as this call reduces memory use.

Enabling Grids for E-sciencE INFSO-RI Steve Fisher/RAL 10 Consumer Buffering - solution EGEE-I solution –Allocate each instance a certain amount of memory –When this is full data are written to disk –Once the data are all read from disk, the disk file is removed and memory is used again. –If allocation on disk runs out we close the consumer. –This allows us to cope with peaks of data and is working well

Enabling Grids for E-sciencE INFSO-RI Steve Fisher/RAL 11 Primary Producer Buffering Problem only exists with memory storage –Latest store to answer latest queries Must hold tuples up to their LRT –History store for history and continuous queries Must hold enough tuples to satisfy the history retention period Must hold tuples for which delivery to existing continuous queries has not been attempted.

Enabling Grids for E-sciencE INFSO-RI Steve Fisher/RAL 12 Primary Producer Buffering - solution EGEE-I solution –RGMABufferFullException which is thrown when a producer tries to publish a new tuple and the producer has exceeded a server defined limit. –Works most of the time EGEE-II extra –We have the RGMABusyException to fall back on.

Enabling Grids for E-sciencE INFSO-RI Steve Fisher/RAL 13 Secondary Producer Buffering In EDG and EGEE-I designs the secondary producer was made up of consumers and one producer. In the EGEE-II design incoming data are stored directly in the tuple store. A memory based tuple store can grow very large but nobody to send an RGMABusyException to. For this reason we do not generally recommend using memory based tuple stores for secondary producers. Will close the secondary producer when unable to deal with memory demands implied by retention periods. –This will be added to the servlet code that would normally be sending the RGMABusyException.

Enabling Grids for E-sciencE INFSO-RI Steve Fisher/RAL 14 Coping with loss of control messages EDG –Register once –Refresh periodically –Only register results in notification and start –Network problems can block everything EGEE-I –Use register as refresh No longer need messages to get through But much more traffic –Split queue into slow medium and fast queue EDG EGEE-I

Enabling Grids for E-sciencE INFSO-RI Steve Fisher/RAL 15 Coping with loss of control messages EGEE-II –For a producer a register message now return the consumers of interest and vice versa. –Producers now notify consumers themselves –Messages to other servers go via a task on the task queue EGEE-I EGEE-II

Enabling Grids for E-sciencE INFSO-RI Steve Fisher/RAL 16 Task Manager Assumption is that tasks dependent upon some unreliable resource –e.g. network connection to a server and that server Assign a key to each resource One queue of tasks but a pool of task invocators Initially empty set of good keys If a task is successful on its first try its key is added to good set If a task fails its key is removed from good set Only if key is in good set will more than one task be run with that key Some invocators only take tasks with a good key

Enabling Grids for E-sciencE INFSO-RI Steve Fisher/RAL 17 Schema Replication VDB is defined by a configuration file identifying the master server Each server has full information for each VDB served Tried to avoid a master but very difficult Updates are first done on the master Replication request is updates since

Enabling Grids for E-sciencE INFSO-RI Steve Fisher/RAL 18 Registry replication A registry owns those records that were last changed by direct calls and is responsible for pushing updates of these records. The registry is updated: –by direct calls – sets master flag –upon receipt of replication messages – clear master flag If registry unavailable direct update requests are routed to a different registry instance and records in the new registry will get the master flag set. –The system will clean up when a registry assumes mastership for the record and replicates its records. A hash table is used to hold the add registration and delete registration requests keyed on the primary key of the entry. –Each replication cycle a new hash table is created to take new entries and the old one is processed. Time stamps are associated with the replication messages –Can recognise missed messages and recover

Enabling Grids for E-sciencE INFSO-RI Steve Fisher/RAL 19 Conclusions Try to think of everything that can go wrong. Keep it simple. Polling is much simpler though less efficient than notification. Make the system self correcting and avoid critical messages. Avoid single points of failure. Reject incoming requests if a server cannot cope rather than just going slowly or crashing. Server code should protect itself against running out of memory. External conditions can change at any time: it is not good enough to just check at service startup. This will give us a highly robust and scalable R-GMA.