A Dynamic Data Grid Replication Strategy to Minimize the Data Missed
Ming Lei, Susan Vrbsky, Xiaoyan Hong
University of Alabama

Agenda: Background & Previous Work, Motivation, System Models, Results, Conclusion, Future Work

Background:
Large-scale, geographically distributed systems are becoming increasingly popular.
Data replication is the most common way to improve file access time.
The dynamic behavior of Grid users makes it difficult to decide which data to replicate in order to meet the system availability goal.

Previous work:
Several replica schemes compared for saving access latency and bandwidth, assuming unlimited storage [Ranganathan et al.]
HotZone algorithm to minimize client-to-replica latency [Szymaniak et al.]
HBR: a dynamic replication strategy that reduces data access time by avoiding network congestion [Park et al.]

Motivation:
As bandwidth and computing capacity have become relatively cheaper, data access latency can drop dramatically, so system reliability and availability become the focus.
Any data file access failure can lead to an incorrect result or a job crash.
Users can tolerate a small delay, but not system unreliability.

Motivation: Replicate data to maximize system data availability, assuming limited storage resources, without sacrificing data access latency.

Architecture:

System Model:
Note that system-level data availability is more important than an individual file's availability. Two new measurements are proposed:
System File Missing Rate (SFMR):
$$\mathrm{SFMR} = \frac{\text{number of files potentially unavailable}}{\text{number of all files requested by all jobs}}$$
System Bytes Missing Rate (SBMR):
$$\mathrm{SBMR} = \frac{\text{number of bytes potentially unavailable}}{\text{total number of bytes requested by all jobs}}$$
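To make the two metrics concrete, here is a minimal Python sketch. It assumes (this is not stated on the slide) that "potentially unavailable" means the expected count, so a file access with availability $p$ contributes $1 - p$ missing files and $\text{size} \times (1 - p)$ missing bytes. The names `sfmr`, `sbmr` and the `(size, p)` tuple layout are hypothetical.

```python
from typing import Sequence, Tuple

# One file access: (size, availability p).
# ASSUMPTION: "potentially unavailable" is read as an expected value, so each
# access contributes (1 - p) missing files and size * (1 - p) missing bytes.
Access = Tuple[float, float]


def sfmr(accesses: Sequence[Access]) -> float:
    """System File Missing Rate: expected missing files / files requested."""
    if not accesses:
        raise ValueError("no file accesses")
    return sum(1.0 - p for _, p in accesses) / len(accesses)


def sbmr(accesses: Sequence[Access]) -> float:
    """System Bytes Missing Rate: expected missing bytes / bytes requested."""
    total_bytes = sum(size for size, _ in accesses)
    if total_bytes == 0:
        raise ValueError("no bytes requested")
    return sum(size * (1.0 - p) for size, p in accesses) / total_bytes


if __name__ == "__main__":
    same_size = [(1.0, 0.9), (1.0, 0.5)]
    print(sfmr(same_size), sbmr(same_size))  # equal when all sizes match
    mixed = [(1.0, 0.9), (3.0, 0.5)]
    print(sfmr(mixed), sbmr(mixed))  # SBMR weights the large file more
```

Under this reading, SFMR and SBMR coincide when every file has the same size, and SBMR penalizes missing a large file more heavily than SFMR does.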

System Model:
Given a set of jobs $J = (j_1, j_2, j_3, \ldots, j_N)$, each job accesses one file set $F = (f_1, f_2, \ldots, f_k)$.
Each file must be stored at a Storage Element (SE), so file availability depends on SE availability.
For any file $f_i$, its availability is $p_i = 1 - \cdots$

System Model:
1. SFMR = …
2. SBMR = …
Job requests can be converted to a series of file access operations.

System Model:
SFMR = …, SBMR = …, where the set $O$ is the file access set.
We assume the total storage limit of the whole Grid system is $S$, so we have: … $\le S$, where $C_i$ denotes the number of copies of $f_i$ and $S$ is the total storage available.
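The left-hand side of the storage constraint is missing from the slide. A natural reading, which is an assumption here, is that it is the total space taken by all replicas, $\sum_i C_i s_i$, where $s_i$ is the size of $f_i$. Under that assumption, a minimal check looks like this (`within_storage_limit` is a hypothetical helper):

```python
from typing import Sequence


def within_storage_limit(copies: Sequence[int], sizes: Sequence[float],
                         total_storage: float) -> bool:
    """Check sum_i C_i * s_i <= S.

    ASSUMPTION: the elided left-hand side is the space used by all replicas,
    i.e. the number of copies C_i of file f_i times its size s_i.
    """
    if len(copies) != len(sizes):
        raise ValueError("copies and sizes must have the same length")
    used = sum(c * s for c, s in zip(copies, sizes))
    return used <= total_storage


if __name__ == "__main__":
    # Three 1 GB files with 2, 1 and 3 copies use 6 GB in total.
    print(within_storage_limit([2, 1, 3], [1.0, 1.0, 1.0], 6.0))
    print(within_storage_limit([2, 1, 3], [1.0, 1.0, 1.0], 5.0))
```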

System Model:
For each file access operation $r_i$ at instant $T$, we associate an important variable $V_i$, which is set to the number of times this file will be accessed in the future. There are four ways to estimate $V_i$:
1. No Prediction: $V_i = 1$ at any time.
2. Bio Prediction: $V_i$ is predicted from the file access history using a binomial distribution.
3. Zipf Prediction: $V_i$ is predicted from the file access history using a Zipf distribution.
4. Queue Prediction: the current job queue is used to predict $V_i$. If the queue is empty, Queue Prediction works the same as No Prediction.
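The two simplest predictors can be sketched directly in Python. This assumes Queue Prediction counts how often the file appears in the file sets of jobs still waiting in the queue, which is one plausible reading of the slide. Bio and Zipf Prediction are not sketched because the slide does not say how the distributions are fitted to the access history. The function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Sequence


def no_prediction(file_name: str) -> int:
    """No Prediction: V_i = 1 at any time."""
    return 1


def queue_prediction(file_name: str,
                     job_queue: Sequence[Iterable[str]]) -> int:
    """Queue Prediction: estimate V_i from jobs still waiting in the queue.

    ASSUMPTION: V_i is the number of times f_i appears in the file sets of
    queued jobs. An empty queue falls back to No Prediction, as on the slide.
    """
    if not job_queue:
        return no_prediction(file_name)
    counts = Counter(name for job in job_queue for name in job)
    return counts[file_name]  # Counter returns 0 for files not in the queue


if __name__ == "__main__":
    queue = [["a", "b"], ["a"], ["c"]]
    print(queue_prediction("a", queue))  # "a" is needed by two queued jobs
    print(queue_prediction("a", []))     # empty queue: same as No Prediction
```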

System Model:
To minimize SFMR and SBMR, we have to maximize the following two values: … and …. If all file sizes are the same, SFMR = SBMR. To better describe our scheme and algorithm, we introduce a weight value:
$$W_i = \frac{P_i V_i}{C_i S_i}$$
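As a quick illustration of the weight, the sketch below computes $W_i$ from its four inputs. It assumes $S_i$ is the size of file $f_i$ (the slide does not define it explicitly). `file_weight` is a hypothetical name.

```python
def file_weight(p: float, v: float, copies: int, size: float) -> float:
    """W_i = (P_i * V_i) / (C_i * S_i).

    ASSUMPTION: P_i is the availability of f_i, V_i its predicted future
    accesses, C_i its number of copies and S_i its size.
    """
    if copies <= 0 or size <= 0:
        raise ValueError("copies and size must be positive")
    return (p * v) / (copies * size)


if __name__ == "__main__":
    hot = file_weight(p=0.9, v=4, copies=2, size=1.0)
    cold = file_weight(p=0.9, v=1, copies=2, size=1.0)
    print(hot, cold)  # the hot file gets the larger weight
```

Read this way, a cold file (small $V_i$) that is large and already has many copies gets a low weight, so it is among the first to be considered for eviction.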

Algorithm: MinDmr Optimizer():
1. If the requested file $f_i$ exists at the site, continue.
2. If the requested file $f_i$ does not exist at the site and the site has enough free space, retrieve $f_i$ from a remote site and store it.
3. If the requested file $f_i$ does not exist at the site and the site does not have enough free space:
(a) sort the files in the current SE by file weight $W_i$ in ascending order;
(b) take files from the sorted list in order and add them to the candidate list until the cumulative size of the candidate files is greater than or equal to the size of the requested file.
4. Replicate the file if the value gained by replicating $f_i$ exceeds the cumulative value lost by deleting the candidate files $f_j$ from the SE:
$$\Delta P_i V_i > \sum_j \Delta P_j V_j$$
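A minimal Python sketch of the four MinDmr steps follows. `FileInfo`, `weight`, `min_dmr_optimizer`, the `delta_p` mapping and the `"hit"`/`"replicate"`/`"remote"` decision labels are hypothetical names. The slide does not say how $\Delta P$ is computed, so it is passed in precomputed, and it does not say what happens when replication is rejected, so the sketch assumes the file is simply read from the remote site.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class FileInfo:
    name: str
    size: float          # S_i (assumed: file size)
    availability: float  # P_i
    value: float         # V_i, predicted future accesses
    copies: int          # C_i, replicas in the Grid


def weight(f: FileInfo) -> float:
    """W_i = (P_i * V_i) / (C_i * S_i); lowest weight is considered first."""
    return (f.availability * f.value) / (f.copies * f.size)


def min_dmr_optimizer(requested: FileInfo, local_files: List[FileInfo],
                      free_space: float,
                      delta_p: Dict[str, float]) -> Tuple[str, List[str]]:
    """Return (decision, names_to_evict) for one file request at a site.

    ASSUMPTION: delta_p maps a file name to the availability change Delta P
    from adding (requested file) or removing (local file) one replica.
    """
    # Step 1: the file is already at this site; nothing to do.
    if any(f.name == requested.name for f in local_files):
        return "hit", []
    # Step 2: enough free space, so simply replicate.
    if free_space >= requested.size:
        return "replicate", []
    # Step 3: collect lowest-weight files until their total size covers the
    # requested file (as on the slide, existing free space is not counted).
    candidates: List[FileInfo] = []
    freed = 0.0
    for f in sorted(local_files, key=weight):
        if freed >= requested.size:
            break
        candidates.append(f)
        freed += f.size
    if freed < requested.size:
        return "remote", []  # ASSUMPTION: cannot make room, read remotely
    # Step 4: replicate only if the value gained beats the value lost.
    gain = delta_p[requested.name] * requested.value
    loss = sum(delta_p[f.name] * f.value for f in candidates)
    if gain > loss:
        return "replicate", [f.name for f in candidates]
    return "remote", []
```

The greedy choice in step 3 means a single request evicts only the lowest-weight files needed to make room, and step 4 still vetoes the swap if those files are worth more than the newcomer.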

Simulation Setting:
OptorSim: developed by the EU DataGrid Project to test dynamic replica schemes.
Eco optimizer: an economic model in which a file is replicated if doing so maximizes the profit of the SE.
Simulation configuration: file set size 200; job set size 10000; files per job 3 to 20; file size 1 GB.
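The workload parameters above can be captured in a small config object. The generator below is only an illustrative assumption: each job draws a uniformly random number of distinct files (3 to 20) uniformly from the 200-file set. It does not reproduce OptorSim itself or the sequential access pattern reported in the results. `SimConfig` and `generate_jobs` are hypothetical names.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SimConfig:
    num_files: int = 200          # File Set Size
    num_jobs: int = 10_000        # Job Set Size
    min_files_per_job: int = 3    # File set per job: 3~20
    max_files_per_job: int = 20
    file_size_gb: float = 1.0     # File Size: 1G (read as 1 GB)


def generate_jobs(cfg: SimConfig, seed: int = 0) -> List[List[int]]:
    """Return one list of file ids per job.

    ASSUMPTION: uniform random file-set size and uniform random, distinct
    files per job; the slides' sequential access pattern is not modeled.
    """
    rng = random.Random(seed)
    jobs: List[List[int]] = []
    for _ in range(cfg.num_jobs):
        k = rng.randint(cfg.min_files_per_job, cfg.max_files_per_job)
        jobs.append(rng.sample(range(cfg.num_files), k))
    return jobs


if __name__ == "__main__":
    jobs = generate_jobs(SimConfig())
    total_requests = sum(len(job) for job in jobs)
    print(len(jobs), total_requests)
```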

Network Topology Setting:

Results: SFMR with varying replica optimizers

Results: total job time with sequential access; SFMR with varying job schedulers

Results: SFMR with varying job queue length; total job time with varying job queue length

Results: missing rate gap (SBMR − SFMR); SFMR with sequential access pattern

Conclusion:
We proposed two data availability metrics to evaluate the reliability of system data in a Data Grid.
We discussed how we model the system availability problem.
We developed four prediction-based replica optimizers under the assumption that Grid storage space is limited.
We presented our greedy replica algorithm, which treats hot and cold data files differently and uses a weighting factor for the replacement scheme.
Simulation results indicate that our new strategies outperform all the others overall in terms of data availability.

Future Work:
When file sizes are not uniform, enhance our scheme to differentiate between the system file missing rate and the system bytes missing rate.
Work on new measurements to evaluate the job missing rate.
Design new schemes and prediction functions to minimize the new measurements.