Smoke and Mirrors: Shadowing Files at a Geographically Remote Location Without Loss of Performance (August 2008). Hakim Weatherspoon, Lakshmi Ganesh, Tudor Marian, Mahesh Balakrishnan, and Ken Birman.

Presentation transcript:

Smoke and Mirrors: Shadowing Files at a Geographically Remote Location Without Loss of Performance. August 2008. Hakim Weatherspoon, Lakshmi Ganesh, Tudor Marian, Mahesh Balakrishnan, and Ken Birman. Large-Scale Distributed Systems and Middleware (LADIS), September 15th, 2008.

Critical Infrastructure Protection and Compliance
- A U.S. Department of the Treasury study found the financial sector vulnerable to significant data loss in a disaster and in need of new technical options
- The risks are real and the technology is available, so why is the problem not solved?

Bold New (Distributed Systems) World…
- High-performance applications coordinate over vast distances: systems cooperate to cache, mirror, tolerate disasters, etc.
- Enabled by cheap storage and CPUs, commodity datacenters, and plentiful bandwidth
- The scale is amazing, but the tradeoffs are old ones

Mirroring and the speed-of-light dilemma…
- Want the asynchronous performance of writing to the local data center (local-sync)
- And want the synchronous guarantee that the remote mirror has the data (remote-sync)
- Conundrum: there is no middle ground
(Figure: primary site and remote mirror connected by a wide-area link, with the async/local-sync and sync/remote-sync acknowledgment points marked.)
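
To make that contrast concrete, here is a minimal Python sketch of where each mode would acknowledge a write back to the application. The function and mode names are illustrative assumptions, not the SMFS or any mirroring product's API, and the reversed bytes merely stand in for real repair data.

```python
import socket


def mirror_write(update: bytes, mirror: socket.socket, mode: str) -> None:
    """Toy write path showing where each mirroring mode acks the application."""
    mirror.sendall(update)            # hand the update to the WAN link
    if mode == "local-sync":
        # ack now: fast, but in-flight data is lost if disaster strikes
        return
    if mode == "network-sync":
        # also push repair data so the update survives dropped packets,
        # then ack once enough redundancy is in flight (the middle ground)
        mirror.sendall(update[::-1])  # stand-in for real FEC repair packets
        return
    if mode == "remote-sync":
        mirror.recv(1)                # ack only after the mirror's storage-level ack
        return
    raise ValueError(f"unknown mirroring mode: {mode}")
```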

Challenge
- How can we increase the reliability of local-sync protocols, given that many enterprises use local-sync mirroring anyway?
- Different levels of local-sync reliability: send each update to the mirror immediately, or delay sending updates so that deduplication reduces bandwidth

Talk Outline
- Introduction
- Enterprise Continuity: how data loss occurs, how we prevent it, and the Smoke and Mirrors File System
- Evaluation
- Conclusion

How does loss occur?
- Rather, where do failures occur between the primary site and the remote mirror?
- Rolling disasters: packet loss, network partition, site failure, power outage

Enterprise Continuity: Network-sync
(Figure: primary site and remote mirror connected by a wide-area network, with the local-sync, network-sync, and remote-sync acknowledgment points marked along the path.)

Enterprise Continuity Middle Ground
- Using network-level redundancy and reducing exposure lowers the probability that data is lost due to a network failure
(Figure: data packets and repair packets flow from the primary site to the remote mirror; a network-level ack is returned once the packets are on the wire, and a storage-level ack once the mirror has stored them.)
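
The repair packets in the figure are the key to this middle ground: alongside each group of data packets, the sender transmits redundant packets so that a drop on the WAN can be repaired near the mirror without an end-to-end retransmission. Below is a minimal illustration using a single XOR repair packet per group; the names and the XOR code are assumptions for illustration, not the Maelstrom encoding.

```python
from typing import List, Optional


def make_repair_packet(group: List[bytes]) -> bytes:
    """XOR every data packet in a group into one repair packet."""
    repair = bytearray(max(len(p) for p in group))
    for pkt in group:
        for i, byte in enumerate(pkt):
            repair[i] ^= byte
    return bytes(repair)


def recover_missing(group: List[Optional[bytes]], repair: bytes) -> List[bytes]:
    """Rebuild at most one missing packet (marked None) from the repair packet."""
    missing = [i for i, p in enumerate(group) if p is None]
    if len(missing) > 1:
        raise ValueError("one XOR repair packet can rebuild only a single loss")
    if not missing:
        return list(group)                  # nothing was dropped
    rebuilt = bytearray(repair)
    for pkt in group:
        if pkt is not None:
            for i, byte in enumerate(pkt):
                rebuilt[i] ^= byte
    fixed = list(group)
    fixed[missing[0]] = bytes(rebuilt)
    return fixed


# One packet is dropped on the WAN and regenerated near the mirror,
# without an end-to-end retransmission from the primary.
data = [b"block-1", b"block-2", b"block-3"]
repair = make_repair_packet(data)
received = [data[0], None, data[2]]
assert recover_missing(received, repair)[1] == b"block-2"
```

A single XOR packet only masks one loss per group; Maelstrom-style encodings interleave repair packets across layers to cope with bursty loss as well.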

Enterprise Continuity Middle Ground
- Network-sync increases data reliability: it reduces the failure modes that cause data loss, and can prevent loss even when the primary site fails at the same time the network drops packets
- It also ensures data is not lost in send buffers and local queues
- Data loss can still occur: in the split second(s) before/after the primary site fails, under network partitions, if a disk controller fails at the mirror, or during a power outage at the mirror
- Existing mirroring solutions can use network-sync

Smoke and Mirrors File System
- A file system constructed over network-sync: it transparently mirrors files over the wide area
- Embraces the concept that a file may be in transit (on the WAN link), but carries enough recovery data to ensure that loss rates are as low as in the remote-disk case
- Provides group mirroring consistency

Mirroring consistency and the Log-Structured File System
(Figure: append(B1, B2) writes data blocks B1 and B2 to the log; append(V1, ...) then writes the metadata blocks I1, I2, R1, and V1 that reference them, in the same order at the primary and at the mirror.)
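
As I read the figure, the log-structured layout is what makes group mirroring consistency simple: data blocks are appended and shipped before the inode, inode-map, and version blocks that reference them, so the mirror's log is always a usable prefix. A rough sketch of that ordering, with hypothetical names rather than the actual SMFS interfaces:

```python
from typing import Callable, List

Log = List[bytes]


def append_file_update(log: Log, ship: Callable[[bytes], None],
                       data_blocks: List[bytes]) -> None:
    """Append (and ship) data blocks before the metadata that references them.

    If a disaster cuts the stream at any point, the mirror holds either the
    new version (metadata arrived) or the previous one (only not-yet-referenced
    data blocks arrived); it never holds metadata pointing at missing blocks.
    """
    positions = []
    for block in data_blocks:           # 1. data blocks (B1, B2, ... in the figure)
        log.append(block)
        ship(block)
        positions.append(len(log) - 1)

    inode = repr(positions).encode()    # 2. an inode recording where the blocks live
    log.append(inode)
    ship(inode)

    version = b"inode-map/version"      # 3. inode-map and version blocks
    log.append(version)                 #    (R1, V1 in the figure, as I read it)
    ship(version)


# Usage: "ship" would be the network-sync send path; a no-op stands in here.
mirror_log: Log = []
append_file_update(mirror_log, lambda b: None, [b"B1", b"B2"])
```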

Talk Outline
- Introduction
- Enterprise Continuity
- Evaluation
- Conclusion

Evaluation
- Goal: demonstrate SMFS performance over Maelstrom. In the event of a disaster, how much data is lost? What is system and application throughput as link loss increases? How far are the primary and mirror sites allowed to diverge?
- Emulab setup: a 1 Gbps link with 25 ms to 100 ms latency connects the two data centers; eight primary and eight mirror storage nodes; 64 testers submit 512 kB appends to separate logs, each tester submitting only one append at a time

Data loss as a result of disaster
- Local-sync was unable to recover data dropped by the network
- Local-sync+FEC lost only the data that was not yet in transit
- Network-sync did not lose any data: it represents a new tradeoff in the design space
(Setup: primary site and remote mirror with 50 ms one-way latency and FEC(r, c) = (8, 3); results compare local-sync, network-sync, and remote-sync.)
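
For reference, FEC(r, c) = (8, 3) is read here, following the Maelstrom convention as I understand it, as c = 3 repair packets sent for every r = 8 data packets; a quick check of the bandwidth cost this implies:

```python
r, c = 8, 3                                   # FEC(r, c): c repair packets per r data packets
overhead = c / r                              # extra bandwidth spent on redundancy
print(f"repair traffic adds {overhead:.1%}")  # -> repair traffic adds 37.5%
```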

High throughput at high latencies

Application Throughput
- Application throughput measures application-perceived performance
- Network-sync and Local-sync+FEC throughput is significantly greater than Remote-sync (with or without FEC)

…There is a tradeoff

Latency Distributions

Conclusion
- A technology response to critical infrastructure needs
- When does the file system return to the application? Fast: return after sending to the mirror. Safe: return after the ACK from the mirror. SMFS: return to the user after sending enough FEC
- Network-sync turns a lossy network into a lossless one, and a lossless network can be treated like a disk
- Result: fast, safe mirroring independent of link length!

Questions?