A Recovery-Friendly, Self-Managing Session State Store Benjamin Ling and Armando Fox



© 2003 Benjamin Ling
Outline
- Motivation: What is session state?
- Existing solutions
- SSM: Architecture and algorithm
- SSM: Recovery-friendly
- SSM: Self-managing
- Related and future work
- Conclusion

© 2003 Benjamin Ling
Example of Session State

© 2003 Benjamin Ling
Session State and Existing Solutions
- We focus on a subcategory of session state:
  - Single-user, serial-access, semi-persistent data
  - Examples: temporary application data, application workflow state
  - Example of usage (e.g., in J2EE): browser talking to an app server (see the sketch below)
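To make the access pattern concrete, here is a minimal, hypothetical J2EE servlet fragment; the class name and the "cart" attribute are invented for illustration, and this is not code from the talk. A store like SSM is meant to sit behind this kind of HttpSession interface.

```java
import java.util.ArrayList;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpSession;

// Hypothetical servlet illustrating single-user, serial-access session state.
public class CheckoutServlet extends HttpServlet {
    @Override
    @SuppressWarnings("unchecked")
    protected void doPost(HttpServletRequest req, HttpServletResponse resp) {
        HttpSession session = req.getSession(true);
        // Read this one user's previous state (semi-persistent workflow data)...
        ArrayList<String> cart = (ArrayList<String>) session.getAttribute("cart");
        if (cart == null) {
            cart = new ArrayList<>();
        }
        cart.add(req.getParameter("itemId"));
        // ...and write the full, updated state back at the end of the request.
        session.setAttribute("cart", cart);
    }
}
```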

© 2003 Benjamin Ling
Existing Solutions
- File systems and databases:
  - Poor failure behavior: lose data (FS)
  - Slow recovery (both)
  - Difficult to administer (DB)
  - Difficult to tune (both)
- In-memory replication using primary/secondary:
  - Performance coupling
  - Poor failover (uneven load balancing)

© 2003 Benjamin Ling
Goal
Build a session state store that is:
- Failure-friendly: does not lose data on a crash; degrades gracefully
- Recovery-friendly: recovers fast
- Self-managing
- High performance: avoids performance coupling

© 2003 Benjamin Ling
Session State Manager (SSM)
[Diagram: app servers with SSM stubs talking to Bricks 1-5; each brick is just RAM and a network interface]
- Redundant, in-memory hash table distributed across nodes (bricks)
- Algorithm: redundancy similar to quorums
  - Write to many random nodes, wait for few (avoids performance coupling)
  - Read one

© 2003 Benjamin Ling
Write Example: "Write to Many, Wait for Few"
[Animation: browser, app server with stub, Bricks 1-5]
- Try to write to W random bricks (here W = 4)
- Wait for only WQ bricks to reply before returning (here WQ = 2); the idea is sketched in code below
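The write path can be sketched as follows. This is only an illustration of "write to many, wait for few" under assumed interfaces (the Brick interface, the one-thread-per-target style, and the timeout handling are invented for the sketch), not the actual SSM code. The ids of the bricks that acknowledged are returned to the caller; they become the metadata the client remembers (see "Algorithm Properties" below).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical brick interface: a write either acknowledges or times out.
interface Brick {
    int id();
    boolean write(String key, byte[] value, long timeoutMs);
}

class SsmStub {
    private final List<Brick> bricks;
    private final int w;   // bricks targeted per write, e.g. 4
    private final int wq;  // acks required before returning, e.g. 2

    SsmStub(List<Brick> bricks, int w, int wq) {
        this.bricks = bricks;
        this.w = w;
        this.wq = wq;
    }

    /**
     * "Write to many, wait for few": send the value to W random bricks and
     * return as soon as WQ of them acknowledge. The returned brick ids are
     * the metadata the client keeps (e.g. in a cookie) for later reads.
     */
    List<Integer> write(String key, byte[] value, long timeoutMs) throws InterruptedException {
        List<Brick> targets = new ArrayList<>(bricks);
        Collections.shuffle(targets);                       // choose W random bricks
        targets = targets.subList(0, Math.min(w, targets.size()));

        List<Integer> acked = Collections.synchronizedList(new ArrayList<>());
        CountDownLatch quorum = new CountDownLatch(wq);
        for (Brick b : targets) {
            new Thread(() -> {                              // one writer per target brick
                if (b.write(key, value, timeoutMs)) {
                    acked.add(b.id());
                    quorum.countDown();
                }
            }).start();
        }
        // Wait only for WQ acks; a slow or dead brick cannot couple our latency.
        if (!quorum.await(timeoutMs, TimeUnit.MILLISECONDS)) {
            throw new IllegalStateException("fewer than WQ bricks acknowledged in time");
        }
        return new ArrayList<>(acked);
    }
}
```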


© 2003 Benjamin Ling
Algorithm Properties
- The client remembers the metadata (which bricks acknowledged its last write), so the metadata shares fate with the client's own session; a read-path sketch follows below
- Stubs are stateless
- Negative feedback loop
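As a complementary sketch (again with an assumed brick interface rather than the original code), the read path needs nothing but the brick list carried in the client's metadata:

```java
import java.util.List;

// Hypothetical read interface; returns null if the brick does not hold the key.
interface ReadableBrick {
    byte[] read(String key);
}

class SsmReadPath {
    /**
     * "Read one": the client's metadata names the bricks that acknowledged its
     * last write, so the stub contacts just one of them, falling back to the
     * others only on failure.
     */
    static byte[] readOne(List<ReadableBrick> replicasFromMetadata, String key) {
        for (ReadableBrick b : replicasFromMetadata) {
            byte[] value = b.read(key);
            if (value != null) {
                return value;          // any single replica named in the metadata suffices
            }
        }
        return null;                   // every replica the client remembered is gone
    }
}
```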

© 2003 Benjamin Ling
SSM: Recovery-Friendly
- Failure:
  - No data is lost; WQ-1 copies of the data remain
  - State stays available for reads and writes during the failure
- Recovery:
  - Just start a new brick; nothing needs to be recovered
  - No special-case recovery code (restart = recovery)
  - State stays available for reads and writes during a brick restart
- The repair phase does not reduce throughput or performance:
  - Session state is self-recovering: the user's own access pattern causes the data to be rewritten

© 2003 Benjamin Ling
SSM: Self-Managing
- Adaptive (a sketch of the window logic follows below):
  - Each stub maintains a count of the maximum allowable in-flight requests per brick
    - Additive increase on a successful request
    - Multiplicative decrease on a timeout
  - Stubs thereby discover the load capacity of each brick -> self-tuning
- Admission control:
  - Stubs say "no" if too few bricks are available
  - Backpressure propagates from bricks to clients: users are turned away under overload -> self-protecting
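A minimal sketch of the per-brick AIMD window; the constants (initial window, cap, floor) are invented for illustration, and the real stub's bookkeeping may differ.

```java
// Hypothetical per-brick send window with additive increase / multiplicative decrease.
// The stub consults it before issuing a request; under overload the window shrinks
// and excess requests are refused at admission time (self-protecting).
class BrickWindow {
    private double maxInFlight = 4.0;   // assumed initial window
    private int inFlight = 0;

    // Admission control: refuse the request if the brick is already at its window.
    synchronized boolean tryAcquire() {
        if (inFlight >= (int) maxInFlight) {
            return false;               // the stub says "no"; backpressure reaches the client
        }
        inFlight++;
        return true;
    }

    // Additive increase: each success probes for a little more capacity.
    synchronized void onSuccess() {
        inFlight--;
        maxInFlight = Math.min(maxInFlight + 1.0, 1024.0);   // assumed cap
    }

    // Multiplicative decrease: a timeout signals overload, so back off quickly.
    synchronized void onTimeout() {
        inFlight--;
        maxInFlight = Math.max(maxInFlight / 2.0, 1.0);      // assumed floor
    }
}
```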

© 2003 Benjamin Ling
Self-Tuning and Self-Protecting
[Throughput graphs: without additive-increase/multiplicative-decrease adaptation the system overloads; the second graph shows behavior with AI/MD adaptation]

© 2003 Benjamin Ling
Other Implementation Details
- Garbage collection via a generational hash table (sketched below):
  - A hash table of hash tables, each with an associated time range
  - When a table's time range has passed, GC that whole table
  - No reference counting, scanning, etc.
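A rough sketch of the generational table; the slides state only the idea, so the bucket width, the use of ConcurrentHashMap, and the lookup strategy here are assumptions.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical generational hash table: entries are bucketed by expiry time, and
// garbage collection drops whole buckets instead of scanning individual entries.
class GenerationalTable {
    private static final long BUCKET_MS = 60_000;   // assumed one-minute generations

    // Outer key: the start of a time bucket. Inner map: session key -> value.
    private final ConcurrentHashMap<Long, ConcurrentHashMap<String, byte[]>> generations =
            new ConcurrentHashMap<>();

    void put(String key, byte[] value, long expiresAtMs) {
        long bucket = (expiresAtMs / BUCKET_MS) * BUCKET_MS;
        generations.computeIfAbsent(bucket, b -> new ConcurrentHashMap<>()).put(key, value);
    }

    byte[] get(String key) {
        // A real implementation would track which generation currently holds the key;
        // for the sketch, return any live copy.
        for (Map<String, byte[]> generation : generations.values()) {
            byte[] value = generation.get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    // GC: once a bucket's time range has fully passed, drop the whole inner table.
    void collect(long nowMs) {
        Iterator<Long> buckets = generations.keySet().iterator();
        while (buckets.hasNext()) {
            if (buckets.next() + BUCKET_MS <= nowMs) {
                buckets.remove();
            }
        }
    }
}
```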

© 2003 Benjamin Ling
Is It Cheap? Is It Fast? Is It Easy to Use?
- How much does replication cost?
  - With 10 bricks of 1 GB memory each, an 8 KB state size, and a replication factor of 3, SSM can serve around 416,000 concurrent users (the arithmetic is shown below)
- Configurable request timeout, currently 60 ms
  - Dwarfed by computation time and client round-trip time
- It is easy to add a brick or kill a brick; the system keeps running
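For reference, the 416,000 figure follows from simple arithmetic, assuming the 1 GB is per brick and ignoring per-entry overhead:

\[
\frac{10 \text{ bricks} \times 1\,\mathrm{GB}}{3 \text{ copies} \times 8\,\mathrm{KB\ per\ session}}
\approx \frac{10 \times 10^{9}\,\mathrm{B}}{24 \times 10^{3}\,\mathrm{B}}
\approx 416{,}000 \text{ sessions}
\]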

© 2003 Benjamin Ling
Publications
- "The Case for a Session State Storage Layer." Ben Ling, Armando Fox. 9th Workshop on Hot Topics in Operating Systems (HotOS IX), Lihue, HI, May 2003.
- "A Self-Managing Session State Layer." Ben Ling, Armando Fox. Accepted to the 5th Annual Workshop on Active Middleware Services (AMS 2003), Seattle, WA, June 2003.

© 2003 Benjamin Ling
Related Work
- Palimpsest (Timothy Roscoe, Intel): temporal storage; erasure coding; no guarantees, just estimates
- DeStor (Andy Huang, Stanford): persistent, multi-user, non-transactional data
- FAB (HP Labs): enterprise disk storage; redundancy at the disk-block level

© 2003 Benjamin Ling
Future Work
- Fault analysis and failure modeling:
  - Memory and network failure modes
  - Performance faults?
- How to choose the replication factor?
  - Example: 10 bricks, WQ of 3, and an inter-request rate of 5 minutes give "5 nines" of availability if the bricks' MTTF exceeds 22 minutes
- Adaptively change the replication factor?

© 2003 Benjamin Ling
SSM: Relaxing ACID
- A: we guarantee atomicity
- C: guaranteed by the workload (every write is a full rewrite of the state)
- I: guaranteed by the workload (single user, serial access)
- D: relaxed (an ephemeral guarantee is enough, so RAM suffices)
=> Fast, simple, clean recovery:
- No data loss on failure
- Data can be read and written during failure and recovery
- Self-managing

© 2003 Benjamin Ling
Summary
We have built a system that provides:
- Semi-persistent storage for single-user, serial-access data
- Recovery-friendly:
  - Crash-only: crash-safe, fast recovery
  - No special-case recovery code
  - Any individual node can be rebooted
  - Continuous data availability
- Self-managing:
  - Self-tuning and self-protecting
  - Simple management and fault-enforcement model
Benjamin Ling

© 2003 Benjamin Ling SSM: Recovery-Friendly, Self-Managing Store Questions or Comments? Benjamin Ling