
1 Towards a Grid File System Based on a Large-Scale BLOB Management Service Viet-Trung Tran 1, Gabriel Antoniu 2, Bogdan Nicolae 3, Luc Bougé 1, Osamu Tatebe 4 1 ENS Cachan - Brittany, France 2 INRIA Centre Rennes - Bretagne-Atlantique, France 3 University of Rennes 1, France 4 University of Tsukuba, Japan

2 New Challenges for Large-scale Data Storage
Scalable storage management for new-generation, data-oriented high-performance applications
 Massive, unstructured data objects (terabytes)
 Many data objects (10³)
 High concurrency (10³ concurrent clients)
 Fine-grain access (megabytes)
 Large-scale distributed platforms: large clusters, grids, clouds, desktop grids
Applications: distributed, with high throughput under concurrency
 E-science
 Data-centric applications: storage for cloud services, Map-Reduce-based data mining, checkpointing on desktop grids

3 BlobSeer: a BLOB-based Approach
Developed by the KerData team at INRIA Rennes, recently created from the PARIS project-team
Generic data-management platform for huge, unstructured data
 Huge data (TB)
 Highly concurrent, fine-grain access (MB): read/write/append
 Prototype available
Key design features
 Decentralized metadata management
 Beyond MVCC: multiversioning exposed to the user
 Lock-free write access through versioning
 Write-once pages
A back-end for higher-level, sophisticated data management systems
 Short term: highly scalable distributed file systems
 Middle term: storage for cloud services
 Long term: extremely large distributed databases

4 BlobSeer: Design
Each blob is fragmented into equally-sized “pages”
 Allows huge data amounts to be distributed over the peers
 Avoids contention for simultaneous accesses to disjoint parts of the data block
Metadata: locates the pages that make up a given blob
 Fine-grained and distributed
 Efficiently managed through a DHT
Versioning
 Update/append: generate new pages rather than overwriting
 Metadata is extended to incorporate the update
 Both the old and the new version of the blob remain accessible
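The update path on this slide can be sketched as a toy in-memory page store, where a write never overwrites a page: it copies the previous version's metadata, allocates fresh pages for the touched chunks, and publishes a new version while older versions stay readable. All names (`blob_write`, `blob_read`) and sizes are illustrative, not the actual BlobSeer API.

```c
#include <string.h>

#define PAGE_SIZE 4
#define MAX_PAGES 64
#define MAX_VERSIONS 8

/* Global pool of immutable, write-once pages. */
static char pages[MAX_PAGES][PAGE_SIZE];
static int n_pages = 0;

/* Per-version metadata: which page holds each page-sized chunk. */
typedef struct {
    int page_of[MAX_PAGES];  /* chunk index -> page index */
    int n_chunks;
} version_t;

typedef struct {
    version_t versions[MAX_VERSIONS];
    int latest;
} blob_t;

/* Write `len` bytes at page-aligned offset `off`: copy the latest
 * version's metadata, write the touched chunks into fresh pages, and
 * publish the result as a new version (no bounds checks in this toy). */
static int blob_write(blob_t *b, int off, const char *buf, int len) {
    version_t *prev = &b->versions[b->latest];
    version_t *next = &b->versions[b->latest + 1];
    *next = *prev;
    for (int i = 0; i < len / PAGE_SIZE; i++) {
        int chunk = off / PAGE_SIZE + i;
        memcpy(pages[n_pages], buf + i * PAGE_SIZE, PAGE_SIZE);
        next->page_of[chunk] = n_pages++;
        if (chunk >= next->n_chunks)
            next->n_chunks = chunk + 1;
    }
    return ++b->latest;  /* new version number */
}

/* Read from any published version: old versions remain intact because
 * their pages are never overwritten. */
static void blob_read(blob_t *b, int version, int off, char *buf, int len) {
    version_t *v = &b->versions[version];
    for (int i = 0; i < len / PAGE_SIZE; i++)
        memcpy(buf + i * PAGE_SIZE,
               pages[v->page_of[off / PAGE_SIZE + i]], PAGE_SIZE);
}
```

Note how an overwrite of one chunk shares the untouched pages with the previous version; this sharing is what keeps concurrent writers lock-free in the versioning scheme described above.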

5 BlobSeer: Architecture
 Clients: perform fine-grain blob accesses
 Providers: store the pages of the blobs
 Provider manager: monitors the providers, favors data load balancing
 Metadata providers: store information about page location
 Version manager: ensures concurrency control

6 Version Management in BlobSeer Beyond Multiversion Concurrency Control

7 Background: Object-based File Systems
From block-based to object-based file systems
 Block management at the file server
 Object management delegated to object-based storage devices (OSDs) or storage servers
Advantages
 Scalability: offloads ~90% of the workload from the metadata server
 OSDs and storage servers are more autonomous, self-managed and easier to share
Examples of object-based file systems
 Lustre (Schwan, 2003)
 Ceph (Weil et al., 2006)
 GoogleFS (Ghemawat et al., 2003)
 XtreemFS (Hupfeld et al., 2008)

8 Towards a BLOB-based File System
Goal: build a BLOB-based file system able to cope with huge data and heavy access concurrency in a large-scale distributed environment
Hierarchical approach
 High-level file system metadata management: the Gfarm grid file system
 Low-level object management: the BlobSeer BLOB management system

9 The Gfarm Grid File System
The Gfarm file system [University of Tsukuba, Japan]: a distributed file system designed to work at the Grid scale
 Files can be shared among all nodes and clients
 Applications can access files using the same path regardless of the file location
Main components
 Gfarm's metadata server (gfmd: Gfarm management daemon)
 File system nodes (gfsd: Gfarm storage daemon)
 Gfarm clients

10 The Gfarm Grid File System [2]
Advanced features
 User management
 Authentication and single sign-on based on the Grid Security Infrastructure (GSI)
 POSIX file system API (gfarm2fs) and Gfarm API
Limitations
 No file striping, thus file size is limited
 No support for concurrent access
 No versioning capability

11 Why Combine Gfarm and BlobSeer?
 BlobSeer: access concurrency, fine-grain access, versioning; but no POSIX file system interface
 Gfarm: POSIX interface, user management, GSI support; but limited file sizes, not suitable for concurrent access, no versioning
 Gfarm/BlobSeer: access concurrency, huge file sizes, fine-grain access, versioning, plus POSIX interface, user management and GSI support
General idea: Gfarm handles file metadata, BlobSeer handles file data

12 Coupling Gfarm and BlobSeer [1]
First approach: each file system node (gfsd) connects to BlobSeer to store/retrieve Gfarm file data
 It must manage the mapping from Gfarm files to BLOBs
 It always acts as an intermediary for data transfer
 Bottleneck at the file system node


14 Coupling Gfarm and BlobSeer [2]
Second approach: the gfsd maps Gfarm files to BLOBs and responds to the client's request with the BLOB ID
 The client then accesses the data in BlobSeer directly
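A minimal sketch of this second approach, with stand-in functions that are ours rather than Gfarm or BlobSeer code: `gfsd_resolve` plays the gfsd (metadata only), the in-memory `blob_store` plays BlobSeer, and `client_read_direct` shows that the bulk data path no longer passes through the gfsd.

```c
#include <string.h>
#include <stddef.h>

/* Toy stand-in for BlobSeer: blob data indexed by BLOB id. */
static const char *blob_store[4] = { NULL, "hello-blob-data", NULL, NULL };

/* Toy stand-in for the gfsd: it keeps only the mapping from a Gfarm
 * path to a BLOB id, and never touches the file data itself. */
static int gfsd_resolve(const char *gfarm_path) {
    if (strcmp(gfarm_path, "/gfarm/data.bin") == 0)
        return 1;   /* BLOB id backing this Gfarm file */
    return -1;      /* unknown file */
}

/* Direct access mode: one small metadata request to the gfsd, then
 * the client fetches the data from BlobSeer itself. */
static const char *client_read_direct(const char *gfarm_path) {
    int blob_id = gfsd_resolve(gfarm_path);  /* metadata RPC to gfsd */
    if (blob_id < 0)
        return NULL;
    return blob_store[blob_id];              /* data path: BlobSeer only */
}
```

In the first approach the gfsd would sit on the data path of every transfer; here it answers one small resolution request per open, which is why the bottleneck disappears.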

15 Gfarm/BlobSeer: Design Considerations
 The whole system is dimensioned for a multi-site Grid: one BlobSeer instance at each site
 The client can dynamically switch between two well-defined access modes
 Remote access mode (red lines): used when the client cannot directly access BlobSeer
 BlobSeer direct access mode (blue lines): the client accesses BlobSeer directly

16 Discussion
The integrated system supports huge file sizes, since a Gfarm file is now transparently striped over the BlobSeer data providers.
Remote access mode
 Required only modifying the source code of the gfsd daemon
 Does not resolve the bottleneck at the gfsd daemon
BlobSeer direct access mode
 Supports concurrent accesses
 More complex to implement: a new access mode had to be introduced into Gfarm

17 Experimental Evaluation on Grid'5000 [1]
Access throughput with no concurrency
 The Gfarm benchmark measures throughput when writing/reading
 Configuration: 1 gfmd, 1 gfsd, 1 client, 9 data providers, 8 MB page size
 Result: BlobSeer direct access mode provides a higher throughput, for both writing and reading

18 Experimental Evaluation on Grid'5000 [2]
Access throughput under concurrency
 Configuration: 1 gfmd, 1 gfsd, 24 data providers, 8 MB page size
 Each client accesses 1 GB of a 10 GB file
 Result: Gfarm serializes concurrent accesses

19 Experimental Evaluation on Grid'5000 [3]
Access throughput under heavy concurrency
 Configuration (deployed on 157 nodes of the Rennes site): 1 gfmd, 1 gfsd, 64 data providers, 24 metadata providers, 1 version manager, 1 page manager
 Each client accesses 1 GB of a 64 GB file, 8 MB page size
 Up to 64 concurrent clients

20 Introducing Versioning in Gfarm/BlobSeer
Versioning: the capability of accessing the data of a specified file version
 Not only to roll back data when desired, but also to access different file versions within the same computation
 Favors efficient access concurrency
Approach: delegate versioning management to BlobSeer
 A Gfarm file is mapped to a single BLOB
 A file version is mapped to the corresponding version of the BLOB

21 Difficulties
Gfarm does not support versioning; some issues we had to deal with:
 A new extended API for clients
 Coordination between clients and gfsd daemons via new RPC calls
 Modification of Gfarm's internal data structures to handle file versions
 Versioning is not a standard feature of POSIX file systems

22 Versioning Interface
Versioning capability was fully implemented.
At the Gfarm API level:
 gfs_get_current_version(GFS_File gf, size_t *nversion)
 gfs_get_latest_version(GFS_File gf, size_t *nversion)
 gfs_set_version(GFS_File gf, size_t version)
 gfs_pio_vread(size_t nversion, GFS_File gf, void *buffer, int size, int *np)
At the POSIX file system level, some ioctl commands were defined:
 fd = open(argv[1], O_RDWR);
 np = pwrite(fd, buffer_w, BUFFER_SIZE, 0);
 ioctl(fd, BLOB_GET_LATEST_VERSION, &nversion);
 ioctl(fd, BLOB_SET_ACTIVE_VERSION, &nversion);
 np = pread(fd, buffer_r, BUFFER_SIZE, 0);
 ioctl(fd, BLOB_GET_ACTIVE_VERSION, &nversion);
 close(fd);
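The ioctl sequence above can be exercised against a small user-space stub. The request names are taken from the slide, but their numeric values and the `blob_ioctl` emulation below are made up for illustration; real code would issue `ioctl(2)` calls on a gfarm2fs file descriptor instead.

```c
#include <stddef.h>

/* Request codes from the slide's versioning interface; the numeric
 * values are invented for this stand-alone stub. */
#define BLOB_GET_LATEST_VERSION 1
#define BLOB_SET_ACTIVE_VERSION 2
#define BLOB_GET_ACTIVE_VERSION 3

/* Stub state: a file that has 3 versions and reads the latest one
 * until the caller pins a different active version. */
static size_t latest_version = 3;
static size_t active_version = 3;

/* Minimal emulation of the versioning ioctls: query the latest
 * version, pin an older one, or query the currently active one. */
static int blob_ioctl(int fd, int request, size_t *arg) {
    (void)fd;  /* unused in this in-memory stub */
    switch (request) {
    case BLOB_GET_LATEST_VERSION: *arg = latest_version; return 0;
    case BLOB_SET_ACTIVE_VERSION: active_version = *arg; return 0;
    case BLOB_GET_ACTIVE_VERSION: *arg = active_version; return 0;
    }
    return -1;  /* unknown request */
}
```

A typical use, mirroring the slide: query the latest version, set the active version to an earlier one, then read; subsequent `pread` calls would observe the pinned version until it is changed again.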

23 Conclusion
The experimental results suggest that our prototype successfully combines the advantages of Gfarm and BlobSeer:
 A file system API
 Concurrent accesses
 Huge file sizes (tested up to a 64 GB file)
 Versioning
Future work
 Evaluate the system on a more complex topology, such as several sites of Grid'5000, over a WAN
 Strengthen consistency semantics, since Gfarm does not yet maintain cache coherence between client-side buffers
 Compare our prototype to other Grid file systems

24 Application Scenarios
 Grid and cloud storage for data-mining applications with massive data
 Distributed storage for large-scale Petascale computing applications
 Storage for desktop grid applications with high write-throughput requirements
 Storage support for extremely large databases