A Dynamic Data Grid Replication Strategy to Minimize the Data Missed
Ming Lei, Susan Vrbsky, Xiaoyan Hong
University of Alabama

Agenda: Background and Previous Work; Motivation; System Models; Results; Conclusion; Future Work

Background: Large-scale, geographically distributed systems are becoming increasingly popular. Replication of data is the most common solution for improving file access time. The dynamic behavior of Grid users makes it difficult to make data replication decisions that meet the system availability goal.

Previous work: Several replica schemes were compared for saving access latency and bandwidth, assuming unlimited storage [Ranganathan et al. 2002]. The HotZone algorithm minimizes the client-to-replica latency [Szymaniak et al. 2005]. HBR is a dynamic replication strategy that reduces data access time by avoiding network congestion [Park et al. 2003].

Motivation: As bandwidth and computing capacity become relatively cheaper, data access latency can drop dramatically, and system reliability and availability become the focus. Any file access failure can lead to an incorrect result or a job crash. Users can tolerate a small delay but not system unreliability.

Motivation: Replicate data to maximize system data availability, assuming limited storage resources and without sacrificing data access latency.

Architecture:

System Model: System-level data availability is more important than an individual file's availability. Two new measurements are proposed.
System File Missing Rate: $\mathrm{SFMR} = \frac{\text{number of files potentially unavailable}}{\text{number of all files requested by all jobs}}$
System Bytes Missing Rate: $\mathrm{SBMR} = \frac{\text{number of bytes potentially unavailable}}{\text{total number of bytes requested by all jobs}}$

System Model: Given a set of jobs $J = (j_1, j_2, j_3, \ldots, j_N)$, each job accesses a file set $F = (f_1, f_2, \ldots, f_k)$. Each file must be stored at a Storage Element (SE), so file availability depends on SE availability. For any file $f_i$, its availability is $p_i = 1 - \ldots$

System Model: Job requests can be converted to a series of file access operations. 1. SFMR = … 2. SBMR = …
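The two rates can be illustrated with a minimal Python sketch. It assumes that "potentially unavailable" means the expected count, so that an access to a file with availability p contributes (1 - p) missing files and (1 - p) times its size in missing bytes. That reading, the function name `missing_rates`, and the tuple layout are assumptions made for illustration; the slides' own formulas are not preserved in this transcript.

```python
from typing import Iterable, Tuple


def missing_rates(accesses: Iterable[Tuple[float, float]]) -> Tuple[float, float]:
    """Return (SFMR, SBMR) for a series of file access operations.

    Each access is a pair (availability, size_in_bytes) of the file requested.
    Assumption: an access with availability p is "potentially unavailable"
    with probability 1 - p, so expected counts are used.
    """
    accesses = list(accesses)
    if not accesses:
        raise ValueError("at least one file access is required")

    # Expected number of files and bytes that cannot be served.
    missing_files = sum(1.0 - p for p, _ in accesses)
    missing_bytes = sum((1.0 - p) * size for p, size in accesses)

    # Totals requested by all jobs.
    total_files = len(accesses)
    total_bytes = sum(size for _, size in accesses)

    return missing_files / total_files, missing_bytes / total_bytes


# Two accesses: a small, highly available file and a large, less available one.
sfmr, sbmr = missing_rates([(0.9, 100.0), (0.5, 300.0)])
```

With unequal file sizes the two rates differ, because the large, less available file dominates the byte count. With equal sizes they coincide.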

System Model: SFMR = …, SBMR = …, where the set O denotes the set of file access operations. We assume the total storage available in the whole grid system is S, so we have … ≤ S, where $C_i$ denotes the number of copies of $f_i$.

System Model: For each file access operation $r_i$ at instant T, we associate an important variable $V_i$, which is set to the number of times this file will be accessed in the future. $V_i$ can be determined in four ways:
1. No Prediction: $V_i = 1$ at any time.
2. Bio Prediction: $V_i$ is predicted from the file access history using a binomial distribution.
3. Zipf Prediction: $V_i$ is predicted from the file access history using a Zipf distribution.
4. Queue Prediction: the current job queue is used to predict the value of the file. If the queue is empty, Queue Prediction works the same as No Prediction.
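The two prediction functions that the slides specify fully can be sketched as follows. This is a minimal illustration: it assumes each queued job is given as a collection of file names, and that the Queue Prediction value is the number of queued jobs that request the file. The function names and that counting rule are assumptions. The binomial and Zipf variants are left out because the slides do not give their parameters.

```python
from collections import Counter
from typing import Iterable, Sequence


def no_prediction(file_name: str) -> int:
    """No Prediction: every file has value 1 at any time."""
    return 1


def queue_prediction(file_name: str, job_queue: Sequence[Iterable[str]]) -> int:
    """Queue Prediction: value taken from the jobs currently waiting.

    Assumption: the value is the number of queued jobs that request the file.
    With an empty queue this behaves exactly like No Prediction.
    """
    if not job_queue:
        return no_prediction(file_name)
    # Count each file once per job, even if a job lists it more than once.
    counts = Counter(name for job in job_queue for name in set(job))
    return counts[file_name]


# Hypothetical queue of three jobs and the files each one will read.
queue = [["f1", "f2"], ["f1", "f3"], ["f1"]]
v_f1 = queue_prediction("f1", queue)
v_f2 = queue_prediction("f2", queue)
```

Under this counting rule a file that no queued job requests gets value 0, which makes it the first candidate for replacement.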

System Model: To optimize SFMR and SBMR, we have to maximize the following values: … and …. If all file sizes are the same, SFMR = SBMR. To better describe our scheme and algorithm, we introduce a weight value: $W_i = \frac{P_i V_i}{C_i S_i}$

Algorithm: MinDmr Optimizer ():
1. If the requested file $f_i$ exists at the site, continue.
2. If the requested file $f_i$ does not exist at the site and the site has enough free space, retrieve $f_i$ from a remote site and store it.
3. If the requested file $f_i$ does not exist at the site and the site does not have enough free space:
   - sort the files in the current SE by file weight $W_i$ in ascending order;
   - take files from the sorted list in order and add them to the candidate list until the accumulated size of the candidate files is greater than or equal to the size of the requested file.
4. Replicate the file if the value gained by replicating $f_i$ exceeds the accumulated value lost by deleting the candidate files $f_j$ from the SE: $\Delta P_i V_i > \sum \Delta P_j V_j$
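The steps above can be sketched in Python. To make the example runnable it assumes a simple availability model that the slides do not confirm: every copy sits on an SE with the same availability `se_avail`, failures are independent, and so a file with c copies has availability 1 - (1 - se_avail)^c. The names `GridFile`, `availability`, `weight`, and `choose_victims` are hypothetical, and the size field is taken to be the S term of the weight.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GridFile:
    name: str
    size: float   # file size (assumed meaning of S in the weight)
    value: float  # predicted number of future accesses (V)
    copies: int   # number of copies currently in the grid (C), at least 1


def availability(copies: int, se_avail: float) -> float:
    # Assumed model: independent SEs with identical availability.
    return 1.0 - (1.0 - se_avail) ** copies


def weight(f: GridFile, se_avail: float) -> float:
    # W = (P * V) / (C * S), as defined on the slide.
    return availability(f.copies, se_avail) * f.value / (f.copies * f.size)


def choose_victims(requested: GridFile, local: List[GridFile],
                   free_space: float, se_avail: float) -> Optional[List[GridFile]]:
    """Return the local files to delete before replicating `requested`.

    An empty list means no deletion is needed; None means do not replicate.
    """
    # Steps 1 and 2: already local, or enough free space.
    if any(f.name == requested.name for f in local) or requested.size <= free_space:
        return []

    # Step 3: lowest-weight files first, until their sizes cover the request.
    candidates: List[GridFile] = []
    reclaimed = 0.0
    for f in sorted(local, key=lambda f: weight(f, se_avail)):
        if reclaimed >= requested.size:
            break
        candidates.append(f)
        reclaimed += f.size
    if reclaimed < requested.size:
        return None

    # Step 4: replicate only if the gain exceeds the accumulated loss.
    gain = (availability(requested.copies + 1, se_avail)
            - availability(requested.copies, se_avail)) * requested.value
    loss = sum((availability(f.copies, se_avail)
                - availability(f.copies - 1, se_avail)) * f.value
               for f in candidates)
    return candidates if gain > loss else None


local_files = [GridFile("A", 1.0, 1.0, 3), GridFile("B", 1.0, 5.0, 1)]
hot = GridFile("H", 1.0, 10.0, 1)
victims = choose_victims(hot, local_files, free_space=0.0, se_avail=0.9)
```

In this example file A has three copies and a low predicted value, so it has the lowest weight and is the only candidate. Deleting it costs little availability, while a second copy of the hot file H gains much more, so the replication goes ahead.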

Simulation Setting: OptorSim, developed by the EU DataGrid Project to test dynamic replica schemes. Eco optimizer: an economic model in which a file is replicated if doing so maximizes the profit of the SE. Simulation configuration: file set size: 200; job set size: 10000; files per job: 3 to 20; file size: 1 GB.

Network Topology Setting:

Results - SFMR with varying replica optimizers

Results - Total job time with sequential access; SFMR with varying job schedulers

Results - SFMR with varying job queue length; total job time with varying job queue length

Results - Missing rate gap (SBMR - SFMR); SFMR with sequential access pattern

Conclusion: We proposed two metrics of data availability to evaluate the reliability of system data in a Data Grid. We discussed how we model the system availability problem. We developed four prediction-based replica optimizers under the assumption that Grid storage space is limited. We presented a greedy replica algorithm that treats hot and cold data files differently and uses a weighting factor for the replacement scheme. Simulation results indicate that the new strategies outperform the other strategies tested overall in terms of data availability.

Future Work: Enhance the scheme to differentiate the system file missing rate from the system bytes missing rate when file sizes are not uniform. Work on new measurements to evaluate the job missing rate. Design new schemes and prediction functions to minimize the new measurements.
