WP2: Data Management Gavin McCance University of Glasgow.

Presentation transcript:

WP2: Data Management Gavin McCance University of Glasgow

GRID IIII D UK Particle Physics Gavin McCance - University of Glasgow
WP2: Data Management
- Key areas covered by WP2
- Current Status (GDMP)
- Services to be Delivered (GridPP)
- CPU and Bandwidth Investigation
- Summary

WP2: Data Management
- Goal: develop middleware infrastructure to manage petabyte-scale data
- Service levels reasonably well defined
- GridPP: Identify Key Areas Within Software Structure
[Diagram labels: Secure Region; High Level Services; Medium Level Services; Core Services]

Key Areas and Services
- Concentrate mostly on M9 deliverables and where GridPP fits in
- Replication
  - GDMP integration with Globus Replica Catalogue
- Query / Replica Optimisation (not for M9!)
  - Investigate Genetic Algorithms for efficient optimisation of cost functions
- SQL Database Service
  - Complements the LDAP Directory Service approach
- Service Index
  - Efficient and scalable discovery mechanism

GDMP Replication
- CERN's GDMP: Asad Samar / Heinz Stockinger
  - Allows world-wide replication of large OO databases
  - Modules soon available for Objectivity, Root and FZ files (M9)
- WP2: Numerous replication strategies possible
  - e.g. (fully) consistent synchronous replication or lazier asynchronous replication
  - Reviews...
- Much current discussion in WP2 and beyond… workshops?
[Distributed Database Management Systems and the Data Grid, Heinz Stockinger]

GDMP Replica Catalogue
- M9… GDMP now interfaced to the Globus Replica Catalogue
- File Registration, Searching and Deletion implemented
[Diagram labels: Replica Catalogue; Site1 (Publisher); Site2 (Subscriber); Site3 (Subscriber); Site4; Export Catalogue; Import Catalogue; Publish files; Notify subscribers of new files; Get import file list; Logical File; Physical File; Logical Collection]
[GDMP Integration with Globus' Replica Catalogue, Asad Samar]

Query / Replica Optimisation
- Should the replica manager make a new replica? Can a query/job be split into sub-queries? Which replica to use?
- Higher level service! Uses cost model to make decision...
  - Minimise over all subsets of data accessed in sub-queries and all physical file replicas
- Preliminary work done in development of cost models… more to be studied...
- GridPP can contribute to WP2!
[Towards a Cost Model for Distributed and Replicated Data Stores, Heinz & Kurt Stockinger, CERN]

GA Approach
- GridPP work will investigate uses of Genetic Algorithms for optimising complex multi-dimensional cost functions
- Solutions are 'bred' in parallel, ranked according to the cost function, and re-bred from the best candidates using crossing and mutation operators
- Multiple points evolved simultaneously; more robust against local minima
- Optimisations generally faster for complex functions, particularly for more unpredictable situations e.g. networks!
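
The breed, rank and re-breed loop described on this slide can be sketched as follows. This is only an illustrative sketch, not the GridPP implementation: the function name `genetic_minimise`, the population size, the uniform crossover, the Gaussian mutation and the toy cost function are all assumptions chosen for brevity.

```python
import random


def genetic_minimise(cost, bounds, pop_size=40, generations=60,
                     mutation_rate=0.2, seed=0):
    """Minimise cost(point) over a box; bounds is a list of (low, high) pairs.

    All parameter values here are illustrative assumptions.
    """
    rng = random.Random(seed)

    def random_point():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def crossover(a, b):
        # Uniform crossover: each coordinate is taken from one of the parents.
        return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]

    def mutate(point):
        # Occasionally perturb a coordinate, then clip it back into the box.
        out = []
        for x, (lo, hi) in zip(point, bounds):
            if rng.random() < mutation_rate:
                x += rng.gauss(0.0, 0.1 * (hi - lo))
            out.append(min(hi, max(lo, x)))
        return out

    population = [random_point() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=cost)               # rank by the cost function
        parents = population[:pop_size // 4]    # keep the best candidates
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents)))
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children         # re-breed from the best
    return min(population, key=cost)


if __name__ == "__main__":
    # Toy two-dimensional cost function with its minimum at (1, -2).
    def bowl(p):
        return (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2

    print(genetic_minimise(bowl, [(-5.0, 5.0), (-5.0, 5.0)]))
```

Because the best quarter of each generation is carried over unchanged, the best cost found never gets worse from one generation to the next; a real replica cost function would replace `bowl`.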

M9: SQL Database Service
- LDAP? Hierarchical model assumes you know the query before designing the database!
  - Arbitrary / Computed queries can be expensive / impossible!
  - RDBMS model is better for these queries
- Investigating SQL databases…
  - Issues with transactions to be investigated
- M9 should see basic SQL insert, delete, update and select operations.
- Standard protocols should be used!
  - e.g. Generic SQL wrapped in XML over HTTPS...
[Image label: PostgreSQL]

M9: SQL Database Service
- Producer / Consumer Model
- A Producer adds meta-data and registers table format.
  - (Dynamic registration of new tables is outside M9..?)
- A Consumer uses a known or registered schema (tbd!) to construct query.
  - Query is translated by the server to SQL, run, and returned to the client as XML / HTML
- APIs to be implemented:
  - Java, Web, Command line
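
A minimal sketch of the Producer / Consumer flow, under loud assumptions: Python's built-in `sqlite3` stands in for the PostgreSQL back end, the HTTPS transport is left out, and the function names, the `replicas` table and the `<resultset>`/`<row>` XML layout are hypothetical (the slides mark the real schema as tbd).

```python
import sqlite3
import xml.etree.ElementTree as ET


def _check_identifiers(*names):
    # Table and column names cannot be bound as SQL parameters,
    # so they are validated before being placed in the statement.
    for name in names:
        if not name.isidentifier():
            raise ValueError(f"bad identifier: {name!r}")


def register_table(conn, table, columns):
    """Producer side: register a table format (all columns as TEXT here)."""
    _check_identifiers(table, *columns)
    cols = ", ".join(f"{c} TEXT" for c in columns)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({cols})")


def insert_row(conn, table, row):
    """Producer side: add one row of meta-data given as a dict."""
    _check_identifiers(table, *row)
    cols = ", ".join(row)
    marks = ", ".join("?" for _ in row)
    conn.execute(f"INSERT INTO {table} ({cols}) VALUES ({marks})",
                 tuple(row.values()))


def select_as_xml(conn, sql, params=()):
    """Consumer side: run a SELECT and return the rows as an XML string."""
    cursor = conn.execute(sql, params)
    names = [d[0] for d in cursor.description]
    root = ET.Element("resultset")
    for values in cursor:
        row_el = ET.SubElement(root, "row")
        for name, value in zip(names, values):
            ET.SubElement(row_el, name).text = "" if value is None else str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    register_table(conn, "replicas", ["lfn", "site"])
    insert_row(conn, "replicas", {"lfn": "run1.root", "site": "glasgow"})
    print(select_as_xml(conn, "SELECT lfn, site FROM replicas"))
```

Values are always passed as bound parameters; only validated identifiers are interpolated into the SQL text.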

M9: Service Index
- Grid services must be able to discover each other!
- Neither the 'everyone knows...' approach nor the hierarchical approach is scalable.
- Construct a 'web' of Service Indices
[Diagram labels: sds.cern.ch; sds.anl.gov; sds.infn.it; sds.ral.uk; sds.padova-infn.it; sds.trieste-infn.it; sds.bologna-infn.it; Allowed; Hierarchical Model]

M9: Service Index
- Services publish XML based description…
  - e.g. name, contact protocols / details, type, who can know about me.
- JINI style 'leases': services must report periodically or be dropped from list
- Clients query service-indices using XML based query with standard schema (tbd!)…
- M9 will see basic propagation of queries.
- Security: Services must be able to limit who can access their description!
  - Coarse grained..
  - Other than this, the service index will not provide any access policy control..!
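
The lease idea on this slide (a service must report periodically or be dropped from the list) can be sketched as below. The class name `ServiceIndex`, the dictionary-shaped description and the 60 second default lease are assumptions for illustration; the slides leave the real XML schema and query format as tbd, and access control is omitted.

```python
import time


class ServiceIndex:
    """Toy in-memory service index with lease-style expiry."""

    def __init__(self, lease_seconds=60.0, clock=time.monotonic):
        self._lease = lease_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._entries = {}           # name -> (description, expiry time)

    def publish(self, name, description):
        """Register a service, or renew its lease if already registered."""
        self._entries[name] = (description, self._clock() + self._lease)

    def _drop_expired(self):
        now = self._clock()
        self._entries = {
            name: entry for name, entry in self._entries.items()
            if entry[1] > now
        }

    def query(self, service_type):
        """Return the names of live services of the given type, sorted."""
        self._drop_expired()
        return sorted(
            name for name, (description, _) in self._entries.items()
            if description.get("type") == service_type
        )


if __name__ == "__main__":
    index = ServiceIndex(lease_seconds=60.0)
    index.publish("sql-service-1", {"type": "sql", "protocol": "https"})
    print(index.query("sql"))
```

Calling `publish` again before the lease runs out is the periodic report; a service that stops reporting simply disappears from later query results.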

M9: Service Index
- Service descriptions should be small! (<1k)
  - User defined (e.g. experiment specific) schema should be ~ discouraged.
- After M9.. more intelligent web traversing tools can be developed!
  - Agent technology?
- How to find a service index??
  - Hard wired 'root' service indices??
  - Limited scope multicast advertising??

CPU and Bandwidth Monitoring
- Scalable CPU Monitoring system for ScotGRID cluster with JAS GUI being developed
[Screenshots: General cluster overview; More detailed individual node information]

CPU and Bandwidth Monitoring
- Network measurement tools being evaluated and developed
- Bandwidth measurement from UDP packet dispersion
[Tools shown: MonitorX; Pipechar; IPERF]
[Diagram label: Δt bb]
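
As a rough illustration of the packet dispersion idea: two equal-sized packets sent back to back are spread out by the bottleneck link, so the bottleneck bandwidth is approximately the packet size divided by the arrival gap Δt. The sketch below is not taken from any of the tools named on the slide; the function name and the use of the median over many pairs (a simple guard against cross-traffic noise) are assumptions.

```python
from statistics import median


def dispersion_bandwidth_bps(packet_bytes, arrival_gaps_s):
    """Estimate bottleneck bandwidth in bits/s from packet-pair arrival gaps.

    packet_bytes   -- size of each probe packet in bytes
    arrival_gaps_s -- measured gaps (seconds) between the two packets of each pair
    """
    gaps = [g for g in arrival_gaps_s if g > 0]
    if not gaps:
        raise ValueError("need at least one positive arrival gap")
    # bandwidth ~= packet size / dispersion, using the median gap
    return packet_bytes * 8 / median(gaps)


if __name__ == "__main__":
    # Hypothetical measurements: 1500-byte probes, gaps in seconds.
    print(dispersion_bandwidth_bps(1500, [0.00012, 0.00013, 0.00012, 0.0004]))
```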

CPU and Bandwidth Monitoring
- Other methods / tools being investigated and developed
- Bandwidth measurement from Round-trip-time (RTT) using UDP, TCP/IP and ICMP
- Uses RTT through routers as a function of packet size to obtain bandwidth
[Tools shown: mptraceu; pathchar]
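
The RTT-versus-packet-size method can be sketched as a straight-line fit. Under the simplifying assumption that the reply packet has a fixed small size, the minimum RTT to a given router grows linearly with probe size, and the slope is the sum of 1/bandwidth over the links crossed so far. The difference between the slopes for two consecutive routers then gives the bandwidth of the link between them. The function names and the plain least-squares fit are assumptions for illustration; this is not the pathchar source.

```python
def fit_slope(sizes_bytes, rtts_s):
    """Least-squares slope of RTT (seconds) against packet size (bytes)."""
    n = len(sizes_bytes)
    if n < 2 or n != len(rtts_s):
        raise ValueError("need matching samples for at least two sizes")
    mean_x = sum(sizes_bytes) / n
    mean_y = sum(rtts_s) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes_bytes)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(sizes_bytes, rtts_s))
    return sxy / sxx


def link_bandwidth_bps(sizes_bytes, min_rtt_prev_hop_s, min_rtt_this_hop_s):
    """Bandwidth (bits/s) of the link between two consecutive routers.

    Each RTT list holds the minimum RTT observed per probe size, which
    filters out most queueing delay.
    """
    extra = (fit_slope(sizes_bytes, min_rtt_this_hop_s)
             - fit_slope(sizes_bytes, min_rtt_prev_hop_s))
    if extra <= 0:
        raise ValueError("slope did not increase; measurements too noisy")
    return 8.0 / extra   # seconds per byte -> bits per second


if __name__ == "__main__":
    sizes = [100, 500, 1000, 1500]
    # Synthetic data: a 100 Mbit/s first link, then a 10 Mbit/s second link.
    hop1 = [0.001 + s * 8 / 100e6 for s in sizes]
    hop2 = [0.003 + s * 8 / 100e6 + s * 8 / 10e6 for s in sizes]
    print(link_bandwidth_bps(sizes, hop1, hop2))
```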

Summary
- GDMP Replication Manager completed
- Active discussion in WP2 and beyond about replication strategies
- Cost models… GA approach?
- SQL Database Service being investigated for M9
- Service Index being investigated for M9
- CPU and Network Monitoring work is underway in ScotGRID...