The Case for a Session State Storage Layer


The Case for a Session State Storage Layer
Benjamin Ling and Armando Fox, Stanford University
Paper available in the back

Outline
– What is session state?
– Existing solutions, and why they are inadequate
– Proposed solution: a middle-tier storage layer
– Related and future work

What is Session State?
– Users interact with an application for a period of time, called a session
– Session state: its lifetime is the duration of a user session, and it is relevant only to a single user
– Examples: user workflow in enterprise software, a shopping cart in e-commerce
– Many architectures produce and consume user session state (e.g. J2EE)
– Session state is a large class of state; we address a subcategory

An example of usage
Alice is using a web-based marketing application, specifying target customers and the offers they should receive.
[Diagram, shown as an animation in the original slides: Browser ↔ App Server, with numbered steps 1–6 showing session state flowing on each request and response]
Important: session state must be present on each interaction, so retrieval of session state is in the critical path of the application.

Exploiting properties of session state
– Not shared: a user reads only her own state, so no concurrency control is needed
– Semi-persistent: a temporal, lease-like guarantee is sufficient
– Keyed to a particular user: no general query mechanism is needed; single-key lookup is sufficient
– Updated on every interaction: previous copies can be discarded
– Does not need to be ACID: only atomic update and consistency are necessary
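
Taken together, these properties shrink the required storage interface to a single-key put/get with a lease. A minimal sketch of such an interface in Java; the names here (SessionStateStore, the opaque byte[] cookie) are our illustration, not the paper's API:

// Hypothetical interface implied by the properties above; names are ours.
public interface SessionStateStore {

    // Atomic whole-object replacement: each interaction overwrites the
    // previous copy, so no transactions or concurrency control are needed.
    // Returns an opaque cookie the client presents on its next request.
    byte[] put(String userKey, byte[] sessionState, long expiryMillis);

    // Single-key lookup; no general query mechanism. May return null once
    // the lease (expiry) has passed, so expired state needs no scrubbing.
    byte[] get(String userKey, byte[] cookie);
}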

Existing solutions
– Database
– File system
– In-memory, non-replicated
– In-memory, replicated

Database or File System
"I already have a DB and FS, why don't I just use one of them to store session state?" Drawbacks:
– D1: Contention: session state requests interfere with requests for persistent objects
– D2: Failure and recovery are expensive: slow recovery is bad for end users, and fast recovery is usually very expensive
– D3: Session cleanup is an afterthought: someone has to scrub expired state, which degrades performance
– D4: Performance hit: not just a network round trip, but sometimes a disk access

In-memory: replicated and non-replicated
– Tries to avoid a network round trip; usually faster than a DB or FS
– Affinity: requires a user to "stick" to a particular server
– The middle tier becomes stateful, and affinity limits load-balancing options
– Replicated: pay the round-trip cost, but usually not a disk access
– Non-replicated: lose data on a crash

Replicated in-memory solutions: drawbacks
– D5: Contention: secondary app servers now face contention from session state updates
– D6: Recovery is more difficult: special-case code is necessary, so the system is harder to reason about
– D7: Poor failure/recovery performance
– D8: Lack of separation of concerns: the app server now does both state storage and application processing
– D9: Performance coupling

Proposed solution: Middle-tier storage
Design principles:
– P1: Avoid special-case recovery code (reduces total cost of ownership)
– P2: Design for separation of concerns
– P3: Session cleanup should be easy
– P4: Graceful degradation upon node failures (no cache-warming effects, no uneven failure)
– P5: Avoid performance coupling

Assumptions
– Secure and well-administered cluster
– System area network (high throughput, low latency)
– No network partitions
– UPS reduces the probability of system-wide simultaneous failure
– Fail-stop components

Middle-tier storage components
– Bricks: store objects behind a hash-table interface and send periodic beacons
– Stubs: linked into each app server; interface with the bricks and keep track of which bricks are live
[Diagram: app servers with stubs connected over the SAN to Bricks 1 through N]
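
A sketch of the two roles, assuming invented Java names for the brick's hash-table interface and the stub's beacon bookkeeping; the paper describes this behavior but not this literal API:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Brick: a hash table over the network (names are ours, not the paper's API).
interface Brick {
    String id();
    void put(String key, byte[] object, long checksum, long expiryMillis);
    byte[] get(String key);
}

// Stub side: tracks which bricks have beaconed recently and routes
// reads and writes only to that live set.
class BrickTracker {
    private static final long BEACON_TIMEOUT_MS = 3_000;  // assumed beacon period x k
    private final Map<String, Long> lastBeacon = new ConcurrentHashMap<>();

    // Called whenever a periodic beacon arrives from a brick.
    void onBeacon(String brickId) {
        lastBeacon.put(brickId, System.currentTimeMillis());
    }

    // A brick is considered live if it has beaconed recently.
    boolean isLive(String brickId) {
        Long t = lastBeacon.get(brickId);
        return t != null && System.currentTimeMillis() - t < BEACON_TIMEOUT_MS;
    }
}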

Write Algorithm (Stub -> Brick)
Call W the write set size, WQ the write quota, and R the read set size. The stub:
1. Calculates a checksum H over the object and its expiration time.
2. Creates a list of bricks L, initially empty.
3. Chooses W random bricks and issues the write of {object, checksum, expiry} to each.
4. Waits for WQ of the bricks to return success messages, or until a timeout t elapses; as each brick replies, adds its identifier to L.
5. If t has elapsed and the size of L is less than WQ, repeats step 3; otherwise, continues.
6. Creates a cookie consisting of H, the identifiers of the WQ bricks that acknowledged the write, and the expiry, and calculates a checksum for the cookie.
7. Returns the cookie to the caller.
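
The steps translate fairly directly into code. A minimal sketch under stated assumptions: the Brick RPC surface, the Cookie layout, and the CRC32 checksum are stand-ins we chose, since the paper does not specify them:

import java.util.*;
import java.util.concurrent.*;
import java.util.zip.CRC32;

class WriteStub {
    static final int W = 4;        // write set size: bricks targeted per write
    static final int WQ = 2;       // write quota: acks required before returning
    static final long T_MS = 50;   // per-round timeout t

    // The cookie handed back to the caller (layout is our illustration).
    record Cookie(long h, List<String> brickIds, long expiry, long cookieSum) {}

    // Hypothetical async RPC surface of a brick.
    interface Brick {
        String id();
        CompletableFuture<Void> write(String key, byte[] obj, long h, long expiry);
    }

    Cookie write(List<Brick> liveBricks, String key, byte[] obj, long expiry)
            throws InterruptedException {
        long h = crc(obj, expiry);                      // step 1: checksum H
        Set<String> acked = new LinkedHashSet<>();      // step 2: the set L
        while (acked.size() < WQ) {                     // step 5: retry until |L| >= WQ
            List<Brick> targets = new ArrayList<>(liveBricks);
            Collections.shuffle(targets);               // step 3: W random bricks
            List<CompletableFuture<String>> inFlight = new ArrayList<>();
            for (Brick b : targets.subList(0, Math.min(W, targets.size())))
                inFlight.add(b.write(key, obj, h, expiry).thenApply(v -> b.id()));
            long deadline = System.currentTimeMillis() + T_MS;
            while (acked.size() < WQ && System.currentTimeMillis() < deadline) {
                for (CompletableFuture<String> f : inFlight)   // step 4: collect acks
                    if (f.isDone() && !f.isCompletedExceptionally())
                        acked.add(f.join());
                Thread.sleep(1);                        // poll until WQ acks or t elapses
            }
        }
        // steps 6-7: cookie = {H, the WQ acking bricks, expiry} plus its own checksum
        List<String> ids = List.copyOf(acked).subList(0, WQ);
        return new Cookie(h, ids, expiry, crc(ids.toString().getBytes(), h));
    }

    static long crc(byte[] data, long salt) {
        CRC32 c = new CRC32();                          // stand-in checksum; the
        c.update(data);                                 // paper doesn't specify one
        c.update(Long.toString(salt).getBytes());
        return c.getValue();
    }
}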

Write example
Try to write to W bricks (W = 4); must wait for WQ bricks to reply (WQ = 2).
[Diagram, shown as an animation in the original slides: Browser → App Server stub → Bricks 1 through 5; two bricks acknowledge and are recorded in the cookie]

Read Algorithm (Stub -> Brick)
The stub:
1. Verifies the checksum on the cookie.
2. Issues the read to R random bricks chosen from the list of WQ bricks contained in the cookie.
3. Waits for one of the bricks to return, or until t elapses.
4. If the timeout has elapsed and no response has been returned, repeats step 2; otherwise, continues.
5. Verifies the checksum and expiration of the returned object; if the checksum is invalid, repeats step 2; otherwise, continues.
6. Returns the object to the caller.
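
A matching sketch of the read path, reusing the hypothetical WriteStub.Cookie and checksum helper from the write sketch; the Stored record is likewise our illustration:

import java.util.*;
import java.util.concurrent.*;

class ReadStub {
    static final int R = 2;        // read set size
    static final long T_MS = 50;   // per-round timeout t

    // What a brick returns for a read (our illustration).
    record Stored(byte[] obj, long h, long expiry) {}

    interface Brick {
        CompletableFuture<Stored> read(String key);
    }

    byte[] read(Map<String, Brick> bricksById, String key, WriteStub.Cookie cookie)
            throws InterruptedException {
        verifyCookie(cookie);                               // step 1
        while (true) {
            List<Brick> owners = new ArrayList<>();         // step 2: R random bricks
            for (String id : cookie.brickIds()) {           //   named in the cookie
                Brick b = bricksById.get(id);
                if (b != null) owners.add(b);
            }
            Collections.shuffle(owners);
            List<CompletableFuture<Stored>> inFlight = new ArrayList<>();
            for (Brick b : owners.subList(0, Math.min(R, owners.size())))
                inFlight.add(b.read(key));
            long deadline = System.currentTimeMillis() + T_MS;
            while (System.currentTimeMillis() < deadline) { // step 3: fastest reply wins
                for (CompletableFuture<Stored> f : inFlight)
                    if (f.isDone() && !f.isCompletedExceptionally()) {
                        Stored s = f.join();
                        // step 5: verify checksum and expiration before returning
                        if (s.h() == WriteStub.crc(s.obj(), s.expiry())
                                && s.expiry() > System.currentTimeMillis())
                            return s.obj();                 // step 6
                    }
                Thread.sleep(1);
            }
            // step 4: timeout with no valid response; loop and repeat step 2
        }
    }

    static void verifyCookie(WriteStub.Cookie c) {
        // Recompute the cookie checksum exactly as the write path built it.
        long expect = WriteStub.crc(c.brickIds().toString().getBytes(), c.h());
        if (expect != c.cookieSum()) throw new IllegalArgumentException("bad cookie");
    }
}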

Read example
Ask R bricks for the read and wait for the fastest one to reply (R = 2).
[Diagram, shown as an animation in the original slides: App Server stub reading from the bricks recorded in the cookie]

What happens on failure?
– All components are stateless or soft-state: restart!
– App server crash: restart and reconstruct the list of live bricks from beacons
– Brick crash: all state on that brick is lost, but copies exist on WQ - 1 other bricks, so no state is lost
– Rejuvenation of state: new nodes can easily be added to a running system

Interesting properties
– Negative feedback loop: let the write group for a given write be {A, B, C}, with WQ = 2, and suppose B is slow. Since B is slow, it will not reply to the write of a key X in time, so B will not be involved in reads of X. This may help ease load on overloaded nodes.
– Bricks can say "no": since more writes are issued than necessary (W > WQ), an overloaded brick can simply drop writes.
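
Because W > WQ, a brick under pressure can refuse a write without breaking the quota. A toy sketch of that behavior; the queue-based admission policy is our assumption:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// An overloaded brick drops writes instead of queueing them; the stub's
// quota (WQ) is still met by the other bricks in the write set, and the
// dropped brick stays out of the cookie, so it also sees no follow-up
// reads for that key.
class OverloadAwareBrick {
    private final BlockingQueue<Runnable> work = new ArrayBlockingQueue<>(1024);

    // Returns false when the queue is full: the write is simply refused.
    boolean offerWrite(Runnable writeTask) {
        return work.offer(writeTask);
    }
}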

Related Work
– Quorums: we don't need to read as many copies, since the cookie tells us where up-to-date copies live
– DDS: suffers performance coupling, persistence costs, and negative cache-warming effects
– "Directory-oriented available copies": uses a directory to find available copies
– Berkeley DB: emphasized fast restart and failure as a common case
– DStore: focuses on persistent storage

Conclusion
A session state server that:
– Performs well, without coupling
– Is fault-tolerant
– Recovers instantly
– Is scalable
– Lowers total cost of ownership

Questions?
bling@cs.stanford.edu
http://www.stanford.edu/~bling/SessionStore.ps