Interposed Request Routing for Scalable Network Storage Darrell Anderson, Jeff Chase, and Amin Vahdat Department of Computer Science Duke University.

Presentation transcript:

Interposed Request Routing for Scalable Network Storage Darrell Anderson, Jeff Chase, and Amin Vahdat Department of Computer Science Duke University

Duke University  Department of Computer Science Goals Devise a highly scalable network storage architecture Interpose on a standard file system protocol. –Prototype supports NFS version 3. Distribute responsibilities and data. –Divide functions (e.g., data vs. metadata). –Scale functions by aggregating servers. This talk: Request routing to scale functions.

Duke University  Department of Computer Science In the Beginning... NFS ClientNFS Server Network Client sends and receives standard NFS packets. Server sends and receives standard NFS packets.

Duke University  Department of Computer Science Interposed Routing NFS Client*Server Client sends and receives standard NFS packets. Slice µProxy intercepts and redirects NFS packets to specialized servers. µ*Server

Duke University  Department of Computer Science Outline Interposed routing Slice architecture Functional decomposition Data decomposition Functions Block-I/O Small-file Metadata Request routing Performance

Duke University  Department of Computer Science Slice Architecture file placement policy network storage array small-file servers directory servers name space requests bulk I/O small file read/write name routing striping policy client µproxy

Duke University  Department of Computer Science Functional Decomposition file placement policy network storage array small-file servers directory servers name space requests bulk I/O small file read/write name routing striping policy client µproxy

Duke University  Department of Computer Science Data Decomposition file placement policy network storage array small-file servers directory servers name space requests bulk I/O small file read/write name routing striping policy client µproxy

Duke University  Department of Computer Science Outline Interposed routing Slice architecture Functional decomposition Data decomposition Functions Block-I/O Storage Nodes Small-file Servers Directory Servers Request routing Performance

Duke University  Department of Computer Science Block-I/O Storage Nodes Network storage nodes provide all storage in Slice. Prototype uses a simple object-based model. –Read, write, remove, truncate. Clients access storage nodes directly. –Static striping, or flexible block-maps. –Optional RAID “10” mirrored striping. network storage array bulk I/O striping policy client µproxy

Duke University  Department of Computer Science Handle read and write operations on small files. All I/O requests below threshold (e.g., 64 KB). –Also the initial “small” segments of large files. Absorb and aggregate I/O on small files. –Data backed by storage array. Storage nodes need not handle small files well. Small-File Servers small-file servers file placement policy small file read/write client µproxy network storage array

Duke University  Department of Computer Science Directory Servers Handle name space operations. Associate name with attributes (lookup, getattr). Manage directory contents (create, readdir). –Preserve dependencies between objects. Create affects new object and its parent directory. directory servers name routing policy name space requests client µproxy network storage array

Duke University  Department of Computer Science Outline Interposed routing Slice architecture Functional decomposition Data decomposition Functions Block-I/O Storage Nodes Small-file Servers Directory Servers Request routing Performance

Duke University  Department of Computer Science Request Routing Goals Focus on name space. Spread name space across multiple servers. –Balance capacity and load. (Maybe) keep entries on same server as parent. –Some name space ops involve multiple sites. Create entry, update parent modify time.

Duke University  Department of Computer Science Request Routing Three policies for name space request routing: Volume Partitioning: –Divide the name space into volumes. –Volumes have well defined mount points. Mkdir Switching: –Items on same server as parent directory. –Some mkdirs redirect to another server. Name Hashing: –Name space is a distributed hash table. –Requests hash by name, parent dir.

Duke University  Department of Computer Science Outline Interposed routing Slice architecture Functional decomposition Data decomposition Functions Block-I/O Storage Nodes Small-file Servers Directory Servers Request routing Performance

Duke University  Department of Computer Science Experiment Configuration Hardware Client: 450 MHz P3 with 32 bit 33 MHz PCI. Server: 733 MHz P3 with 64 bit 66 MHz PCI. Server: 8x 18 GB Seagate Ultra-2 Cheetah disks. Gigabit Ethernet with 9 KB “jumbo” frames. Software FreeBSD 4.0-release. Modified NFS stack and firmware for zero-copy. NFS uses UDP/IP with 32 KB MTU. Slice kernel modules; µProxy is IP filter on client.

Duke University  Department of Computer Science Block-I/O Scaling

Duke University  Department of Computer Science Name Space Scaling

Duke University  Department of Computer Science Mkdir Switching Affinity

Duke University  Department of Computer Science SPECsfs97 Throughput

Duke University  Department of Computer Science SPECsfs97 Latency

Duke University  Department of Computer Science Summary Slice interposes between NFS client and server. Simple redirection of NFS version 3 packets. –Slice µProxy inspects and rewrites packets. Separates functions normally for central server. –Functional decomposition for request stream. –Data decomposition to scale each function. Prototype shows performance and scalability.

Duke University  Department of Computer Science EOF

Duke University  Department of Computer Science Handling Failures Approach: write-ahead logging. µProxy logs intentions for “dangerous” operations to coordinator. –Also logs when finished. Coordinator completes or aborts aging operations. –Roll forward, or back. Independent of client, server, and storage nodes. µ CoordinatorNFS Client 4. Safe again 2. Danger! 3. (do it) 1. Request 5. Response