Presentation is loading. Please wait.

Presentation is loading. Please wait.

Enhanced Replica Selection Technique for Binding Replica Sites in Data Grids Authors: Mahdi S. Al-Mhanna, Rafah M. Almuttairi University of Babylon, Babylon,

Similar presentations

Presentation on theme: "Enhanced Replica Selection Technique for Binding Replica Sites in Data Grids Authors: Mahdi S. Al-Mhanna, Rafah M. Almuttairi University of Babylon, Babylon,"— Presentation transcript:


2 Enhanced Replica Selection Technique for Binding Replica Sites in Data Grids Authors: Mahdi S. Al-Mhanna, Rafah M. Almuttairi University of Babylon, Babylon, Iraq University of Babylon

3 Outlines  What is the DataGrid?  Replication Overview.  Replica Optimizer.  What is Association Rules?  New Replica Selection Technique for Binding Cheapest Replica Sites in Data Grids.  Comparison with traditional methods.  Related Work.  Conclusion. University of Babylon

4 What is Data Grid? University of Babylon

5 What is Data Grid ?  An infrastructure that enables replicating the required data sets in different distributed sites and then selecting the best replica matching user’s requirements.  Data Grid job: It contains a set of files that must be available at the time before or during execution time. Job 1: File 1; File10; File 11; File12; ……… University of Babylon

6 Why do we need it? Worldwide LHC Computing Grid The Large Hadron Collider will produce roughly 15 petabytes (15 million gigabytes) of data annually – enough to fill more than 1.7 million dual-layer DVDs a year! CMSLHCbATLASALICE 1 Megabyte (1MB) A digital photo 1 Gigabyte (1GB) = 1000MB A DVD movie 1 Terabyte (1TB) = 1000GB World annual book production 1 Petabyte (1PB) = 1000TB Annual production of one LHC experiment 1 Exabyte (1EB) = 1000 PB World annual information production University of Babylon

7 LHC data LHC data correspond to about 20 million CDs each year! Concorde (15 Km) Balloon (30 Km) CD stack with 1 year LHC data! (~ 20 Km) Mt. Blanc (4.8 Km) Where will the experiments store all of these data? Data for LHC: a problem? The Grid: a possible solution! University of Babylon

8 Replication Overview University of Babylon

9 Data problem solved by, Storing data in different hosts Data problem solved by, Storing data in different hosts. SVA Compressed Data University of Babylon

10 Resource Broker and Resource Optimizer … Shall I store them in one site to minimize searching latency ? But, Single server will be flooded when the requests increase. Latency increases with long queue of waited jobs. Replication Overview University of Babylon

11 So,…Where should I keep my files to get it on demand ??!! …Which file I have to save more than one copy in different places???! ! Resource Optimizer Replication Strategies!! … Which file I should be used when it is needed?!! Optimizatio n Strategies!! University of Babylon

12 The replica optimiser provides: A run time selection service to get the best file replica. User Interface Resource Broker Storage Element Computing Element Replica Manager Replica Optimizer Storage Element Computing Element Replica Manager Replica Optimizer Job exe. site Job 1 : f 1,f 2,...,f z Job 2 : f 7,f 8,...,f z. Job t : f 11,f1 2,...,f z Get best sites having: f 1,f 2,...,f z Job 1 : f 7,f 8,...,f z Job 2 : f 1,f 2,...,f z University of Babylon

13 Association rules technique University of Babylon

14 Association rules: Pincer-Search algorithm This algorithm is used to combine different associated sites, having uncongested links at the time of the file(s) transfer. Preliminary Concepts Association rules: association rules are mostly used in mining transaction data. Crucial terms in association rules terminology are: Item (in DM terminology corresponds to attribute-value pair). Transaction (a set of items; corresponds to example). A Set (data set) of transactions containing more different items. University of Babylon

15 Efficient Set algorithm University of Babylon

16 Figure 4. Execution flows using ESM Get the first Job from job queue Get all related LFN from job description file Get the physical adds of replicas’ sites from RLS Using Ipref, Probing info. are stored in NHD Calculate the score and update the TABLE Yes No Apply EST to get Ln which is list of associated sites Do we have history for current job in NHD? Use (GridFTP) transfer files from efficient set of sites. Get the next Job from job queue

17 Experiments University of Babylon

18 Study case: PRAGMA Grid, 35 institutions in 25 regions  Let us assume there are eight sites having a requested file.  Ipref, network service is used to get RTT report  RLS, to collect physical names of replicas University of Babylon


20 1- Network History File NHF/Using Ipref RTT/STTS1S1 S2S2 S3S3 S4S4 S5S5 S6S6 S7S7 S8S8 1 60 17620233678.725676.4 213 2 65 21120934376.525686.8 202 3 303 17529833864.825857.2 205 4 305 21320327392.625585.1 202 5 300 2102233349529255 212 6 313 17620733566.729855.3 212 7 310 17529827394.225585.8 202 8 307 21626027194.425660 212 9310 21726034295.228990.1 212 10310 21122433966.325790.3 212 11310 17520427292.826291.4 202 12307 1762052716925688.7 210 13310 21122734492.529964.3 212 1450 17520227066.329990.5 216 15316 21426033663.429657.8 224 1674 20920634194.328756.4 222 1.RTT: Round Trip Time 2.STT: Single Trip Time 3.Unit: ms University of Babylon

21 2- Logical History File (LHF) University of Babylon

22 Affected Factors  Factors affect directly on data transfer time.  Bandwidth  Latency of links Factors affect the performance data transfer.  CPU  I/O slightly University of Babylon

23 Cont… According to the first three factors, in our algorithm, a score function is defined as the following: Score i = p i BW × w i BW +p i CPU × w i CPU + p i I/O × w i I/O Sum = ∑ Score j, 1 < J < M AVG = sum / M, M= Number of sites If score i ≥ AVG then Score i =1 otherwise Score i = 0 University of Babylon

24 Cont… Where, - Score i. the replica selection cost model for server i. - p i BW. percentage of Bandwidth available from server i to the computing site. - w i BW. network and Bandwidth ratio of replica server i defined by the administrator of the Data Grid organization p i CPU. percentage CPU idle states of replica server i - w i CPU. CPU load ratio of server i defined by the administrator. - p i I/O. percentage of memory free space of server i - w i I/O. memory free space ratio defined by the administrator. University of Babylon

25 Rules "The rule": "The rule": Rule #1: if Site(s) S 4, S 7 are selected then this implies that site(s) S 3 can also be selected at the same time. This rule has 100% confidence. I(Rule1)= S 4, S 7  S 3 Support(S4,S7,S3)/Support(S4,S7)*Support(S3)= 1.109 The confidence of this rule is 100%. The requested files will be sent by S 3, S 4, S 7 To compute the correlation of this rule and see how far it is better than choosing the site randomly, we use an Improvement equation. University of Babylon

26 EST vs Traditional Model University of Babylon

27 Data Grid and their associated network geometry Computing Site (CS) is (S 0 ) RS= {S 14,S 1, S 3 }. Red stars referring to congested routers. Traditional selection  S14 less number of Hops EST,  (S 3 ) uncongested links. EST vs Traditional Model University of Babylon

28 Comparison with Traditional selection strategy Traditional selection strategy and EST University of Babylon

29 Conclusion

30  Efficient Selection Technique (EST)  Efficient Selection Technique (EST) is a dynamic replica selection strategy.  Aims to adapt at run-time its criteria to flexible QoS binding contracts specified by the service provider and/or the client.  The concept of association rules of data mining approach is used.  To reduce the searching space, response time and network resources consumed. Conclusion University of Babylon

31 Related work  During 2001-2003, Sudharshan et al. In their work they used the history of previous file transfer information to predict the best site holding a copy of the requested file.  R. Kavitha, and I. Foster, in 2001 they used Network Bandwidth and link’s latency.  Corina and M. Mesaac in 2003 and Ceryen and M. Kevin 2005 used different algorithms such as greedy, random, partitioned and weight algorithms in the selection engine. Their selection methods were with respect to some static metric such as the geographical distance in miles and the topological distance in number of hops.  Rashedur et al. in 2005 exploited a replica selection technique with the K-Nearest Neighbor (KNN) rule used to select the best replica from the information gathered locally.  Rashedur et al., In 2008 proposed a Neural Network predictive technique (NN based) to estimate the transfer time between sites.  A. Jaradat et al., In 2009, proposed a new approach that utilizes availability, security and time as selection criteria between different replicas, by adopting k-means clustering algorithm concepts to create a balanced (best) solution.  Omer Aljadaan in 2010, used a Genetic algorithm for clustering replicas and get best replica matching QoS of user’s requirements.  Most of these methods have a limitations and some of them have a drawback and others work under a specific conditions. University of Babylon

32 References References M. Rashedur Rahman, Ken Barker, Reda Alhajj, "Replica selection in grid environment:a data-mining approach", Distributed systems and grid computing (DSGC),pp: 695 – 700, 2005Rashedur RahmanKen BarkerReda Alhajj J. Gwertzman and M. Seltzer, "The case for geographical push cashing", In Proceeding of the 5th Workshop on Hot ZTopic in Operating Systems, 1995. R. Kavitha, I. Foster, "Design and evaluation of replication strategies for a high performance data grid", in, Proceedings of Computing and High Energy and, Nuclear Physics, 2001. S. Vazhkudai, J. Schopf, I. Foster, "Predicting the performance of wide-area data transfers", in: 16th International PDPS, 2002. S. Vazhkudai, J. Schopf, "Using regression techniques to predict large data transfers", in: Computing: Infrastructure and Applications, The International Journal of High Performance Computing Applications, IJHPCA, August, 2003. A. Abbas, Grid Computing:" A Practical Guide to Technology and A PPLICATIONS ", 2006. S. Vazhkudai, S Tuecke, I. Foster, "Replica selection in the globus data grid", in: First IEEE/ACM International Conference on Cluster Computing and the Grid, CCGrid 2001. J. Guyton and M. Schwartz."Locating nearby copies of replicated internet servers", In Proceeding of ACM SIGCOMM’ 95, 1995. A. Tirumala, J. Ferguson, Iperf 1.2 - The TCP/UDP Bandwidth Measurement Tool, 2002. R. Wolski, Dynamically forecasting network performance using the Network Weather Service, Cluster Computing (1998). Yunhong Gu, Robert L. Grossman, "UDT: UDP-based data transfer for high-speed wide area networks", Computer Networks, Volume 51, Issue 7, 16 May 2007, Pages 1777 1799. Elsevier.UDT: UDP-based data transfer for high-speed wide area networks R.M. Rahman, K. Barker, R. Alhajj, "Predicting the performance of GridFTP transfers", in: Proceedings of IEEE Symposium of Parallel and Distributed Systems, 2004, New Mexico, USA, p. 238a. University of Babylon

33 J. F. Kurose, K.W. Ross, "Computer Networking A Top-Down Approach Featuring the Internet", 3rd edition. S. Venugopal,. R. Buyya,"The Gridbus Toolkit for Service Oriented Grid and Utility Computing: An Overview and Status Report"2004. R. Agrawal, T. Imielinski, A.Swami, "Mining association rules between sets of items in large databases". In: Proc. ACM SIGMOD Intl. Conf. Management Data, 1993 R.M Rahman, K Barker and R Alhajj, "Replica selection strategies in data grid", Journal of Parallel and Distributed Computing, Volume 68, Issue 12, Pages 1561-1574, December 2008.Journal of Parallel and Distributed ComputingVolume 68, Issue 12 A. Jaradat, R. Salleh and A. Abid, "Imitating K-Means to Enhance Data Selection", Journal of Applied Sciences 9 (19): 3569-3574, 2009, ISSN 1812-5654, Asian Network for Scientific Information-2009. S. Venugopal,. R. Buyya, K. Ramamohanarao, "A taxonomy of Data Grids for distributed data sharing, management, and processing". ACM Comput. Surv. 38, 1 (Jun. 2006), ACM New York, NY, USAACM A. K Pujari, "Data mining techniques", Hyderabad : Universities Press, 2002.A. K Pujari G. Williams, M. Hegland and S. Roberts, "A Data Mining Tutorial", IASTED International Conference on Parallel and Distributed Computing and Networks (PDCN’98) 14 December 1998. T. Ceryen, and M. Kevin, 2005. "Performance characterization of decentralized algorithms for replica selection in dstributed object systems". Proceedngs of 5th International Workshop on Software and Performance, July 11 -14, Palma, de Mallorca, Spain, pp: 257-262. F. Corina, and M. Mesaac, 2003, "A scalable replica selection strategy based on flexible contracts". Proceedings of the 3rd IEEE Workshop on Internet Applications, June 23-24, IEEE Computer Society Washington, DC, USA, pp: 95-99. R. M. almuttari, R. Wankar, A. Negi, C.R. Rao. "Intelligent Replica Selection Strategy for Data Grid", In proceeding of the 10 th International conference on Parallel and Distributed Proceeding Techniques and Applications, IEEE Computer Society Washington, DC, WorldComp2010, GCA2010, LasVegas, USA, Volume3, pp: 95-100, July 12- 15-2010. Cisco Distributed Director, M. Sayal, Y. Breitbart, P. Scheuermann, R. Vingralek, "Selection algorithms for replicated web servers". In Proceeding of the Workshop on Internet Server Performance,1998. E. Zegura, M. Ammar, Z. Fei, and S. Bhattacharjee, "Application-layer anycasting: a server selection architecture and use in a replicated web service", IEEE/ACM Transactions on Networking, vol. 8, no. 4, pp. 455–466, Aug. 2000. University of Babylon

34 ? ????

Download ppt "Enhanced Replica Selection Technique for Binding Replica Sites in Data Grids Authors: Mahdi S. Al-Mhanna, Rafah M. Almuttairi University of Babylon, Babylon,"

Similar presentations

Ads by Google