WP2: Data Management Gavin McCance University of Glasgow.


1 WP2: Data Management Gavin McCance University of Glasgow

2 WP2: Data Management
GridPP UK Particle Physics, Gavin McCance - University of Glasgow
- Key areas covered by WP2
- Current Status: GDMP
- Services to be Delivered: GridPP
- CPU and Bandwidth Investigation
- Summary

3 WP2: Data Management
- Goal: develop middleware infrastructure to manage petabyte-scale data
[Diagram: Secure Region containing High Level Services, Medium Level Services and Core Services]
- Service levels reasonably well defined
- GridPP: identify key areas within the software structure

4 Key Areas and Services
- Concentrate mostly on M9 deliverables and where GridPP fits in
- Replication
  - GDMP integration with Globus Replica Catalogue
- Query / Replica Optimisation (not for M9!)
  - Investigate Genetic Algorithms for efficient optimisation of cost functions
- SQL Database Service
  - Complements the LDAP Directory Service approach
- Service Index
  - Efficient and scalable discovery mechanism

5 GDMP Replication
- CERN's GDMP: Asad Samar / Heinz Stockinger
  - Allows world-wide replication of large OO databases
  - Modules soon available for Objectivity, Root and FZ files (M9)
- WP2: Numerous replication strategies possible
  - e.g. (fully) consistent synchronous replication or lazier asynchronous replication
- Reviews...
  - Much current discussion in WP2 and beyond… workshops?
[Distributed Database Management Systems and the Data Grid, Heinz Stockinger]

6 GDMP Replica Catalogue
[Diagram: Site1 (Publisher), Site2 (Subscriber), Site3 (Subscriber) and Site4, with one Export Catalogue, three Import Catalogues and a Replica Catalogue. Labelled steps: Publish files; Notify subscribers of new files; Get import file list]
- M9… GDMP now interfaced to the Globus Replica Catalogue
  - Logical File, Physical File, Logical Collection
  - File registration, searching and deletion implemented
[GDMP Integration with Globus' Replica Catalogue, Asad Samar]

7 Query / Replica Optimisation
- Should the replica manager make a new replica? Can a query/job be split into sub-queries? Which replica to use?
- Higher level service! Uses cost model to make decision...
  - Minimise over all subsets of data accessed in sub-queries and all physical file replicas
- Preliminary work done in development of cost models… more to be studied...
- GridPP can contribute to WP2!
[Towards a Cost Model for Distributed and Replicated Data Stores, Heinz & Kurt Stockinger, CERN]
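As an illustration of the "which replica to use?" decision, here is a minimal Python sketch. The cost formula (fixed latency plus size over bandwidth), the `Replica` fields and the function names are all assumptions made for this example; they are not taken from the cited cost-model paper. The sketch also assumes the split into sub-queries is already fixed and that costs are independent, so picking the cheapest replica per sub-query minimises the total.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    site: str              # where the physical copy lives
    latency_s: float       # assumed fixed access latency, seconds
    bandwidth_mbps: float  # assumed available bandwidth, megabits/s


def transfer_cost(replica, size_mb):
    """Estimated seconds to read size_mb megabytes from this replica."""
    return replica.latency_s + (size_mb * 8.0) / replica.bandwidth_mbps


def cheapest_replica(replicas, size_mb):
    """Pick the physical replica with the lowest estimated cost."""
    return min(replicas, key=lambda r: transfer_cost(r, size_mb))


def plan_query(sub_queries, catalogue):
    """sub_queries: {logical_file: size_mb}; catalogue: {logical_file: [Replica]}.

    Returns ({logical_file: chosen_site}, total_estimated_seconds).
    """
    plan, total = {}, 0.0
    for logical_file, size_mb in sub_queries.items():
        best = cheapest_replica(catalogue[logical_file], size_mb)
        plan[logical_file] = best.site
        total += transfer_cost(best, size_mb)
    return plan, total


# Hypothetical catalogue: a fast but distant site and a slow but nearby one.
SITES = [Replica("cern", 0.5, 100.0), Replica("glasgow", 0.05, 10.0)]
CATALOGUE = {"run1.db": SITES, "tags.db": SITES}
```

With these made-up numbers, a large file is cheaper from the high-bandwidth site while a small one is cheaper from the low-latency site, which is the kind of trade-off a cost model has to capture.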

8 GA Approach
- GridPP work will investigate uses of Genetic Algorithms for optimising complex multi-dimensional cost functions
- Solutions are 'bred' in parallel, ranked according to the cost function, and re-bred from the best candidates using crossover and mutation operators
  - Multiple points evolved simultaneously; more robust against local minima
  - Optimisations generally faster for complex functions, particularly for more unpredictable situations, e.g. networks!
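The breed, rank and re-breed loop can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cost surface is the standard Rastrigin test function (chosen because it has many local minima), and the population size, selection fraction and mutation settings are arbitrary values, not parameters from the GridPP work.

```python
import math
import random


def cost(point):
    """Stand-in cost surface (Rastrigin function): many local minima,
    global minimum of 0 at the origin."""
    return 10 * len(point) + sum(
        x * x - 10 * math.cos(2 * math.pi * x) for x in point
    )


def evolve(cost_fn, dims=4, pop_size=40, generations=60, seed=0):
    """Return (best_point, best_cost_per_generation)."""
    rng = random.Random(seed)
    population = [
        [rng.uniform(-5, 5) for _ in range(dims)] for _ in range(pop_size)
    ]
    history = []
    for _ in range(generations):
        population.sort(key=cost_fn)            # rank: lowest cost first
        history.append(cost_fn(population[0]))
        parents = population[: pop_size // 4]   # keep the best quarter unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover: each coordinate comes from one of the parents.
            child = [rng.choice(pair) for pair in zip(a, b)]
            # Mutation: occasionally nudge a coordinate with Gaussian noise.
            child = [
                x + rng.gauss(0, 0.3) if rng.random() < 0.2 else x
                for x in child
            ]
            children.append(child)
        population = parents + children
    population.sort(key=cost_fn)
    history.append(cost_fn(population[0]))
    return population[0], history
```

Because the best candidates are carried over unchanged, the best cost never gets worse from one generation to the next; whether the search reaches the global minimum depends on the settings.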

9 M9: SQL Database Service
- LDAP? Hierarchical model assumes you know the query before designing the database!
  - Arbitrary / computed queries can be expensive / impossible!
  - RDBMS model is better for these queries
- Investigating SQL databases…
  - Issues with transactions to be investigated
- M9 should see basic SQL insert, delete, update and select operations.
- Standard protocols should be used!
  - e.g. Generic SQL wrapped in XML over HTTPS...
[Slide label: PostgreSQL]

10 M9: SQL Database Service
- Producer / Consumer Model
- A Producer adds meta-data and registers table format.
  - (Dynamic registration of new tables is outside M9..?)
- A Consumer uses a known or registered schema (tbd!) to construct a query.
  - Translated by server to SQL.. queried.. returned to client as XML / HTML
- APIs to be implemented:
  - Java, Web, Command line
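A rough Python sketch of the consumer path (XML request, translated to SQL by the server, result returned as XML) follows. The slides leave the schema "tbd", so the `<select>` request format, the `files` table and every function name here are invented for illustration. SQLite from the Python standard library stands in for the database, and the HTTPS transport is omitted.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical registered schema: table name -> allowed column names.
REGISTERED = {"files": ("lfn", "site", "size_mb")}


def xml_to_sql(xml_text):
    """Translate a <select> request into (sql, params, columns).

    Table and column names are checked against the registered schema;
    values are passed as bound parameters, never spliced into the SQL.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "select":
        raise ValueError("only <select> is sketched here")
    table = root.get("table")
    if table not in REGISTERED:
        raise ValueError(f"unregistered table: {table}")
    allowed = REGISTERED[table]
    columns = [c.text for c in root.findall("column")] or list(allowed)
    clauses, params = [], []
    for where in root.findall("where"):
        columns_to_check = columns + [where.get("column")]
        if any(name not in allowed for name in columns_to_check):
            raise ValueError("unknown column in request")
        clauses.append(f"{where.get('column')} = ?")
        params.append(where.text)
    if any(name not in allowed for name in columns):
        raise ValueError("unknown column in request")
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params, columns


def serve(conn, xml_text):
    """Run the translated query and wrap the rows in a <result> document."""
    sql, params, columns = xml_to_sql(xml_text)
    result = ET.Element("result")
    for row in conn.execute(sql, params):
        row_el = ET.SubElement(result, "row")
        for name, value in zip(columns, row):
            ET.SubElement(row_el, name).text = str(value)
    return ET.tostring(result, encoding="unicode")


def make_demo_db():
    """Producer side: create the registered table and add some meta-data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE files (lfn TEXT, site TEXT, size_mb REAL)")
    conn.executemany(
        "INSERT INTO files VALUES (?, ?, ?)",
        [("run1.db", "cern", 1000.0), ("run1.db", "glasgow", 1000.0),
         ("run2.db", "cern", 250.0)],
    )
    return conn
```

Checking names against a registered schema is one way to keep a generic "SQL in XML" interface from accepting arbitrary statements; the real service may well take a different approach.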

11 M9: Service Index
- Grid services must be able to discover each other!
- Neither the 'everyone knows...' approach nor the hierarchical approach is scalable.
[Diagram: service indices sds.cern.ch, sds.anl.gov, sds.infn.it, sds.ral.uk, sds.padova-infn.it, sds.trieste-infn.it, sds.bologna-infn.it; labels: Allowed, Hierarchical Model]
- Construct a 'web' of Service Indices
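In a 'web' of indices a query has to be forwarded between peers without looping forever. The Python sketch below shows one way to do that, a breadth-first walk with a visited set and a hop limit. The class, the hop limit and the links between indices are assumptions for illustration; only the host names come from the slide.

```python
from collections import deque


class ServiceIndex:
    def __init__(self, name):
        self.name = name
        self.services = {}  # service name -> service type
        self.peers = []     # other indices this one knows about

    def link(self, other):
        """Make two indices aware of each other."""
        self.peers.append(other)
        other.peers.append(self)


def find_services(start, service_type, max_hops=3):
    """Breadth-first propagation of a query through linked indices.

    Returns a list of (index_name, service_name) matches. Each index is
    visited at most once, so cycles in the web are harmless.
    """
    seen = {start.name}
    queue = deque([(start, 0)])
    found = []
    while queue:
        index, hops = queue.popleft()
        found += [
            (index.name, name)
            for name, kind in sorted(index.services.items())
            if kind == service_type
        ]
        if hops == max_hops:
            continue  # do not forward the query any further
        for peer in index.peers:
            if peer.name not in seen:
                seen.add(peer.name)
                queue.append((peer, hops + 1))
    return found


def build_demo_web():
    """Hypothetical topology, including a cycle cern - ral - infn."""
    cern, ral = ServiceIndex("sds.cern.ch"), ServiceIndex("sds.ral.uk")
    infn, padova = ServiceIndex("sds.infn.it"), ServiceIndex("sds.padova-infn.it")
    cern.link(ral)
    cern.link(infn)
    ral.link(infn)
    infn.link(padova)
    padova.services["replica-catalogue-1"] = "replica-catalogue"
    ral.services["sql-1"] = "sql-database"
    return ral
```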

12 M9: Service Index
- Services publish XML based description…
  - e.g. name, contact protocols / details, type, who can know about me.
  - JINI style 'leases': services must report periodically or be dropped from list
- Clients query service-indices using XML based query with standard schema (tbd!)…
  - M9 will see basic propagation of queries.
- Security: services must be able to limit who can access their description!
  - Coarse grained..
  - Other than this, the service index will not provide any access policy control..!
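The lease idea (report periodically or be dropped) can be sketched as a small Python registry. The class name, the 60-second default and the use of a plain dict for the description are assumptions for this example; the slides call for an XML description with a schema that is still to be defined.

```python
import time


class LeasedRegistry:
    """Service list where every entry expires unless it is renewed."""

    def __init__(self, lease_seconds=60.0, clock=time.monotonic):
        self.lease_seconds = lease_seconds
        self.clock = clock      # injectable so the behaviour can be tested
        self._entries = {}      # service name -> (description, expiry time)

    def publish(self, name, description):
        """Register a service, or renew its lease if already present."""
        expiry = self.clock() + self.lease_seconds
        self._entries[name] = (description, expiry)

    def expire(self):
        """Drop every service whose lease has run out; return their names."""
        now = self.clock()
        dead = [n for n, (_, expiry) in self._entries.items() if expiry <= now]
        for name in dead:
            del self._entries[name]
        return dead

    def lookup(self, service_type):
        """Names of live services of the given type, in sorted order."""
        self.expire()
        return sorted(
            name
            for name, (description, _) in self._entries.items()
            if description.get("type") == service_type
        )
```

A service would call `publish` again before its lease runs out; one that crashes simply stops renewing and disappears from the list without any explicit deregistration.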

13 M9: Service Index
- Service descriptions should be small! (<1k)
  - User defined (e.g. experiment specific) schema should be ~ discouraged.
- After M9.. more intelligent web traversing tools can be developed!
  - Agent technology?
- How to find a service index??
  - Hard wired 'root' service indices??
  - Limited scope multicast advertising??

14 CPU and Bandwidth Monitoring
- Scalable CPU monitoring system for the ScotGRID cluster, with a JAS GUI, being developed
[Screenshots: general cluster overview; more detailed individual node information]

15 CPU and Bandwidth Monitoring
- Network measurement tools being evaluated and developed
  - MonitorX, Pipechar, IPERF
- Bandwidth measurement from UDP packet dispersion
[Diagram: packet dispersion, labelled Δt and bb]
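The arithmetic behind a dispersion measurement can be sketched as follows, assuming the usual packet-pair estimate: equal-sized packets sent back to back arrive separated by Δt, and the bottleneck bandwidth is roughly packet size divided by Δt. The function name and the use of the median gap are choices made for this example, not a description of how MonitorX, Pipechar or IPERF work.

```python
from statistics import median


def dispersion_bandwidth_mbps(packet_bytes, arrival_times_s):
    """Estimate bottleneck bandwidth from arrival times of equal-sized packets.

    Uses bandwidth ~ packet size / Δt, where Δt is the spacing between
    consecutive arrivals. The median gap limits the influence of the odd
    packet delayed by cross traffic.
    """
    gaps = [
        later - earlier
        for earlier, later in zip(arrival_times_s, arrival_times_s[1:])
    ]
    gaps = [gap for gap in gaps if gap > 0]
    if not gaps:
        raise ValueError("need at least two distinct arrival times")
    bits = packet_bytes * 8
    return bits / median(gaps) / 1e6
```

For example, 1500-byte packets arriving 0.12 ms apart give 12000 bits / 0.00012 s, i.e. about 100 Mbit/s.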

16 CPU and Bandwidth Monitoring
- Other methods / tools being investigated and developed
  - mptraceu, pathchar
- Bandwidth measurement from round-trip time (RTT) using UDP, TCP/IP and ICMP
  - Uses RTT through routers as a function of packet size to obtain bandwidth
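A minimal Python sketch of the RTT-versus-packet-size idea, under a simplified single-link model that is an assumption of this example: RTT = fixed latency + packet size / bandwidth. The slope of a straight-line fit then gives the inverse bandwidth. Taking the minimum RTT per packet size is a common way to filter out queueing delay. The function names are hypothetical and no packets are actually sent.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def bandwidth_from_rtt(samples):
    """samples: {packet_bytes: [rtt_seconds, ...]}.

    Returns (bandwidth_mbps, fixed_latency_s) for the assumed model
    rtt = latency + packet_bytes / bandwidth.
    """
    sizes = sorted(samples)
    min_rtts = [min(samples[size]) for size in sizes]  # drop queueing delay
    slope, intercept = fit_line(sizes, min_rtts)        # slope: seconds/byte
    return 8 / slope / 1e6, intercept
```

Measuring a multi-hop path this way needs one fit per hop, with each link's contribution obtained from the difference between consecutive slopes; the sketch covers only the single-link case.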

17 Summary
- GDMP Replication Manager completed
- Active discussion in WP2 and beyond about replication strategies
- Cost models… GA approach?
- SQL Database Service being investigated for M9
- Service Index being investigated for M9
- CPU and Network Monitoring work is underway in ScotGRID...

