Optimizing of data access using replication technique Renata Słota 1, Darin Nikolow 1,Łukasz Skitał 2, Jacek Kitowski 1,2 1 Institute of Computer Science.


1 Optimizing of data access using replication technique. Renata Słota (1), Darin Nikolow (1), Łukasz Skitał (2), Jacek Kitowski (1,2). (1) Institute of Computer Science AGH-UST, Cracow; (2) ACC CYFRONET AGH, Cracow

2 Agenda: Motivation of the work (why does today's grid computing need replication?); Replication basics; Clusterix Data Management System (architecture, optimization and replication algorithms); Optimization example; Replication example; Summary, conclusions

3 Site-level vs. Grid-level replication. Site-level replication: replicas in one site; implementation examples: RAID, HSM. Grid-level replication: data management systems; replicas spread over many sites

4 Motivation of the work: Why does today's grid computing need replication? Data protection and availability: malfunction of one storage element does not affect the data itself, only performance is affected. Performance: low-level optimization and replication (RAID, HSM) are not sufficient; limited network bandwidth; limited storage performance

5 Replication scenarios Static replication Decision made by system administrator or user Limited system support: replica selection, replica coherency, replica ordering Dynamic replication Decision made by dedicated grid component based on current data access pattern of users Full system support

6 Replication consequences Optimal replica selection algorithm Replica creation and removal algorithm Cost of replica creation, update and storage Replica coherency

7 Clusterix: National Cluster of Linux Systems. Project aim: to develop a set of tools and procedures for building a productive Grid environment based on local PC clusters spread across independent supercomputing centers. Network layer: Pionier – Polish optical networks

8 Clusterix Data Management System Architecture

9 Optimization Algorithm. Selects the optimal storage element for: data access, replica creation. Takes into consideration the current state of the system. The optimal storage element is the one with the maximal weight W(s,d): W(s,d) = min((1 - NetLoad(s)) × bandwidth(s,d), (1 - Sload(s)) × Sbandwidth(s)), where: s – storage element; d – destination node; NetLoad(s) – network interface load of s; bandwidth(s,d) – available bandwidth between s and d; Sload(s) – storage system load; Sbandwidth(s) – storage system bandwidth
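The weight formula on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the CDMS implementation (which the slides say is written in C/C++): the `StorageElement` class, the function names, the convention that loads are fractions between 0 and 1, and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class StorageElement:
    """Hypothetical record of the per-element quantities used by W(s,d)."""
    name: str
    net_load: float           # NetLoad(s), assumed to be a fraction in [0, 1]
    storage_load: float       # Sload(s), assumed to be a fraction in [0, 1]
    storage_bandwidth: float  # Sbandwidth(s), e.g. in MB/s


def weight(se: StorageElement, link_bandwidth: float) -> float:
    """W(s,d): the smaller of the usable network and usable storage throughput."""
    usable_network = (1.0 - se.net_load) * link_bandwidth
    usable_storage = (1.0 - se.storage_load) * se.storage_bandwidth
    return min(usable_network, usable_storage)


def select_storage_element(elements: List[StorageElement],
                           bandwidth_to_dest: Dict[str, float]) -> StorageElement:
    """Return the element with the maximal weight for one destination node."""
    return max(elements, key=lambda se: weight(se, bandwidth_to_dest[se.name]))


# Hypothetical measurements for two storage elements and one destination node.
elements = [
    StorageElement("SE_A", net_load=0.5, storage_load=0.25, storage_bandwidth=80.0),
    StorageElement("SE_B", net_load=0.25, storage_load=0.5, storage_bandwidth=200.0),
]
bandwidth_to_dest = {"SE_A": 100.0, "SE_B": 40.0}  # bandwidth(s,d)

best = select_storage_element(elements, bandwidth_to_dest)
print(best.name)  # SE_A: W = min(50, 60) = 50 beats SE_B: W = min(30, 100) = 30
```

Taking the minimum of the two terms means the slower of the network path and the storage system bounds the weight, so a fast disk behind a congested link does not win the selection.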

10 Automatic replication algorithm. Takes into consideration the gain from replication G(), the cost of replica creation C(), the cost of replica updates U() and the administrative factor A(). Replication profit: P(d,R,S,f) = G(d,R,S,f) + C(d,R,f) + U(d,R,S,f) + A(d,f), where: d – storage element for which the profit is computed; R – set of storage elements containing replicas of f; S – statistical data (history of file usage); f – considered file
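A minimal sketch of how the profit sum might drive a replication decision follows. The slide defines only the sum P = G + C + U + A; everything else here is an assumption made for illustration: the `ProfitTerms` type, the idea that the two cost terms are supplied as negative numbers, the rule "replicate to the candidate with the highest profit if it exceeds a threshold", and all values.

```python
from typing import Dict, NamedTuple, Optional


class ProfitTerms(NamedTuple):
    """Already evaluated terms of P(d,R,S,f) for one candidate element d."""
    gain: float           # G(d,R,S,f)
    creation_cost: float  # C(d,R,f), assumed to be expressed as a negative value
    update_cost: float    # U(d,R,S,f), assumed to be expressed as a negative value
    admin_factor: float   # A(d,f)


def profit(terms: ProfitTerms) -> float:
    """P = G + C + U + A, exactly as written on the slide."""
    return (terms.gain + terms.creation_cost
            + terms.update_cost + terms.admin_factor)


def choose_replica_target(candidates: Dict[str, ProfitTerms],
                          threshold: float = 0.0) -> Optional[str]:
    """Assumed decision rule: pick the most profitable candidate, or none."""
    if not candidates:
        return None
    best = max(candidates, key=lambda d: profit(candidates[d]))
    return best if profit(candidates[best]) > threshold else None


# Hypothetical term values for two storage elements without a replica of f.
candidates = {
    "SE2": ProfitTerms(gain=10.0, creation_cost=-3.0, update_cost=-2.0, admin_factor=1.0),
    "SE4": ProfitTerms(gain=4.0, creation_cost=-3.0, update_cost=-2.0, admin_factor=0.0),
}
print(choose_replica_target(candidates))  # SE2 (profit 6.0 versus -1.0)
```

How G, C, U and A are actually computed from the usage statistics S is not given on the slides, so the sketch takes them as inputs.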

11 Storage-oriented problems. Data-intensive applications for Clusterix: simulation of transonic flow past wing tips; visualization of complex multidimensional structures; ecosystem modeling and simulation

12 Optimization Example: Node A needs file F, which is stored on SE1, SE2 and SE3. [Diagram: Node A; CDMS with Optimizer; NMS; storage elements SE1, SE2 and SE3, each labeled JIMS and NMS and holding a copy of F]

13 Optimization Example: Node A sends a request to CDMS. [Diagram: Node A; CDMS with Optimizer; NMS; storage elements SE1, SE2 and SE3, each labeled JIMS and NMS and holding a copy of F]

14 Optimization Example: CDMS uses the Optimizer to choose the optimal SE. [Diagram: Node A; CDMS with Optimizer; NMS; storage elements SE1, SE2 and SE3, each labeled JIMS and NMS and holding a copy of F]

15 Optimization Example: Optimizer is working… W(s1,d) = min((1 - NetLoad(s1)) × bandwidth(s1,d), (1 - Sload(s1)) × Sbandwidth(s1)); W(s2,d) = min((1 - NetLoad(s2)) × bandwidth(s2,d), (1 - Sload(s2)) × Sbandwidth(s2)); W(s3,d) = min((1 - NetLoad(s3)) × bandwidth(s3,d), (1 - Sload(s3)) × Sbandwidth(s3)). [Diagram: Node A; CDMS with Optimizer; NMS; storage elements SE1, SE2 and SE3, each labeled JIMS and NMS and holding a copy of F]
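To make the three evaluations concrete, here is a worked instance with invented measurements. All numbers are hypothetical and do not come from the presentation; loads are assumed to be fractions between 0 and 1, and bandwidths are assumed to be in MB/s.

```latex
\begin{align*}
W(s_1,d) &= \min\bigl((1-0.5)\times 100,\ (1-0.25)\times 80\bigr)   = \min(50,\ 60) = 50\\
W(s_2,d) &= \min\bigl((1-0.25)\times 40,\ (1-0.25)\times 120\bigr)  = \min(30,\ 90) = 30\\
W(s_3,d) &= \min\bigl((1-0.25)\times 100,\ (1-0.75)\times 240\bigr) = \min(75,\ 60) = 60
\end{align*}
```

With these values SE3 has the maximal weight and would be selected, even though its storage system is the most heavily loaded: its remaining storage throughput still exceeds what the other two elements can deliver end to end.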

16 Initial replication example. [Diagram: User Workstation; Clusterix entry point; CDMS with Optimizer; NMS; storage elements SE1, SE2 and SE3, each labeled JIMS and NMS]

17 Dynamic replication in Clusterix Initial replication Every stored data file should be replicated Replication on demand Job driven replication Replication ordered by external process Replication based on statistic analysis Data access pattern driven replication

18 Automatic replication example. Situation: 3 clusters; 4 storage elements, 2 of which contain a replica of F; a set of applications running on these clusters and accessing file F. [Diagram: storage elements SE1, SE2, SE3 and SE4 with copies of F]

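19 Automatic replication example. [Diagram: CDMS with Optimizer, Replication Module and Statistic Module; storage elements SE1, SE2, SE3 and SE4; two copies of F; status labels: Sleeping…, Working…; evaluated terms: gain, cost of replication, cost of update, administrative factor]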

20 Automatic replication example. [Diagram: CDMS with Optimizer, Replication Module and Statistic Module; storage elements SE1, SE2, SE3 and SE4 with copies of F; status labels: Working…, Sleeping…; decision: SE2]

21 Automatic replication example. [Diagram: CDMS with Optimizer, Replication Module and Statistic Module; storage elements SE1, SE2, SE3 and SE4; three copies of F]

22 Summary: The architecture of CDMS with Optimization and Replication modules has been designed. Replication and optimization algorithms have been specified. Module interfaces have been specified. Future work: integration and tests

23 Conclusions Simulation of replication vs. real system implementation Replication should be designed to meet specific Clusterix applications profile Data availability Replication drawbacks

24 Publications: (1) "Extended functionality of Virtual Storage System for grid", Renata Słota, Darin Nikolow, Łukasz Skitał, Jacek Kitowski, Cracow Grid Workshop 2004, poster no. 13. (2) "Application of data replication methods in Clusterix project" (in Polish), Renata Słota, Darin Nikolow, Łukasz Skitał, Jacek Kitowski, Pionier 2004, 19-20 May, Poznań, electronic publication. (3) "Implementation of replication methods in the Grid Environment", Renata Słota, Darin Nikolow, Łukasz Skitał, Jacek Kitowski, submitted to European Grid Conference

25 Thank You!

26 Clusterix Data Management System Architecture: Replication Module. Responsible for: – automatic replica creation/removal. Implementation: – Java – Apache SOAP. Cooperates with: – Optimization module – Statistic module

27 Clusterix Data Management System Architecture: Optimization Module. Responsible for: – storage element selection for a newly created replica, – optimal replica selection. Implementation: – C/C++ – gSOAP. Cooperates with: – Network Monitoring System (NMS) – Information System: JMX-based Infrastructure Monitoring System (JIMS)

28 Clusterix Data Management System Architecture: Information System (JIMS). Department of Computer Science, AGH University of Science & Technology. Provides the following information for a selected node: available storage capacity; total storage capacity; network interface load; network interface bandwidth; storage system load; average storage system load; maximal measured storage bandwidth

29 Clusterix Data Management System Architecture: Network Monitoring System. Poznan Supercomputing and Networking Center. Provides the following information: maximum bandwidth between two network nodes; current load between two network nodes; node availability

30 Clusterix Data Management System Architecture Statistic Module Białystok Technical University Responsible for gathering information about past data usage

