Service Primitives for Internet Scale Applications Amr Awadallah, Armando Fox, Ben Ling Computer Systems Lab Stanford University.

Presentation transcript:

2 Service Primitives for Interactive Internet-Scale Applications. Amr Awadallah, Armando Fox, Ben Ling. ROC, Sep
Interactive Internet-Scale Application?
- Millions of users.
[Figure: architecture diagram. Labels: Global LB; Data Center; Local LB; Presentation Servers (PS) + $; LB; Application Servers (AS) + $; State; Fail over; State Replica.]

3 Motivation
- A general framework to describe interactive Internet-scale applications (IIAs) and characterize the functional properties that can be traded away to improve the following operational metrics:
  - Throughput (how many user requests/sec?)
  - Interactivity (latency: how fast do user requests finish?)
  - Availability (% of time the user perceives the service as up), including fast recovery to improve availability
  - TCO (Total Cost of Ownership)
- In particular, enumerate architectural primitives that expose partial degradation of functional properties, and illustrate how they can be built with "commodity" HW.

4 Recall ACID
- Atomicity: For a transaction involving two or more discrete pieces of information, either all changed pieces are committed or none are.
- Consistency: A transaction creates a new valid state obeying all user integrity constraints.
- Isolation: Changes from non-committed transactions remain hidden from all other concurrent transactions (levels: Serializable, Repeatable Read, Read Committed, Read Uncommitted).
- Durability: Committed data survives system restarts and storage failures.

5 ACID is too much for Internet scale
- Yahoo UDB: tens of thousands of reads/sec, up to 10k writes/sec
- Geoplexing is used for both disaster recovery and scalability, but eager replication (strong consistency) across replicas scales poorly
  - If total DB size grows with # nodes, deadlock rate increases at the same rate as the number of nodes
  - If DB size grows sublinearly, deadlock rate increases as the cube of the number of nodes
- Even if we could use transactional DBs and eager replication, the cost would be too high

6 The New Properties
- Durability (State): Hard, Soft, Stateless
- Consistency: Strong, Eventual, Weak, NonC
- Completeness: Full, Incomp-R, Lossy-W
- Visibility: User, Entity, World

7 Durability (Hard, Soft, Stateless)
- Hard: Permanent state in the original sense of the D in ACID.
- Soft: Temporary storage in the RAM sense, i.e. if power fails, the data is lost. This is cheaper, and acceptable if the user can rebuild the state quickly.
- Stateless: No need to store state on behalf of the user.

8 Consistency (Strong, Eventual, Weak)
- Eventual: After a write, there is some time t after which all reads see the new value (e.g. caching).
- Strong: In addition, before time t, no reads see the new value (single-copy ACID consistency).
- Weak: Weak consistency in the TACT sense; captures ordering inaccuracies or persistent staleness.
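The "eventual" case can be illustrated with an expiring cache. This is a minimal sketch, not the authors' implementation: `TTLCache`, its `ttl` parameter (playing the role of the time t above), and the injected `clock` are hypothetical names chosen for illustration.

```python
class TTLCache:
    """Read-through cache whose entries expire after `ttl` seconds (hypothetical)."""

    def __init__(self, backing, ttl, clock):
        self.backing = backing      # authoritative store (a plain dict here)
        self.ttl = ttl              # staleness bound, the "time t" of the slide
        self.clock = clock          # callable returning the current time
        self._entries = {}          # key -> (value, time the value was fetched)

    def read(self, key):
        now = self.clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[1] < self.ttl:
            return hit[0]           # possibly stale, but younger than ttl
        value = self.backing[key]   # expired or missing: refetch
        self._entries[key] = (value, now)
        return value


def demo():
    now = [0.0]                     # fake clock so the example is deterministic
    store = {"price": 10}
    cache = TTLCache(store, ttl=5.0, clock=lambda: now[0])
    seen = [cache.read("price")]    # time 0: fetches and caches 10
    store["price"] = 12             # the write reaches the backing store only
    now[0] = 1.0
    seen.append(cache.read("price"))  # cached entry still fresh: old value
    now[0] = 6.0
    seen.append(cache.read("price"))  # entry expired: new value is fetched
    return seen
```

In this sketch a read may return the old value for up to `ttl` seconds after a write, and never longer, which is the eventual guarantee. A strongly consistent store would also have to prevent the mix of old and new reads during that window.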

9 Completeness (Full, Incomp, Lossy)
- Complete: All updates either succeed or fail synchronously. All queries return 100% accurate data.
- Incomplete Queries: Aggregated lossy reads over partitioned state, or state sampling. The best example is Inktomi's distributed search, where it is OK for some partitions not to return results under load.
- Lossy Updates: It is OK for some committed writes to be lost. Examples: lossy counters and online polls.
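An incomplete query over partitioned state might look like the sketch below. It is only an illustration under stated assumptions: `incomplete_query` and `make_partition` are hypothetical helpers, an overloaded partition is modelled as raising `TimeoutError`, and the name `harvest` is borrowed here for the fraction of partitions that answered.

```python
def incomplete_query(partitions, term):
    """Search every partition, skipping any that fail instead of failing the query.

    Returns (results, harvest), where harvest is the fraction of partitions
    that answered.
    """
    results, answered = [], 0
    for search in partitions:
        try:
            results.extend(search(term))
            answered += 1
        except TimeoutError:
            continue                # degrade gracefully: drop this partition
    harvest = answered / len(partitions) if partitions else 1.0
    return sorted(results), harvest


def make_partition(docs, overloaded=False):
    """Build a toy partition: a function that searches its own documents."""
    def search(term):
        if overloaded:
            raise TimeoutError("partition overloaded")
        return [d for d in docs if term in d]
    return search


def demo():
    partitions = [
        make_partition(["roc retreat", "stanford roc"]),
        make_partition(["roc talk"], overloaded=True),   # returns nothing
        make_partition(["berkeley roc", "dht notes"]),
    ]
    return incomplete_query(partitions, "roc")
```

Here the query still answers when one of three partitions is overloaded, and the caller can see from the second return value how much of the data was actually consulted.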

10 Visibility (World, Entity, User)
- World: The state and changes to it are visible to the whole world, e.g. listing a product on eBay.
- Entity: State is visible only to a group of users, or within a specific subset of the data (e.g. eBay Jewelry).
- User: The state and changes to it are visible only to the user interacting with it, e.g. the MyYahoo user profile. This could be simpler to implement using ReadMyWrites techniques.
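The slide names ReadMyWrites without describing it. One hypothetical way to get that behaviour for user-visible state is to let each session remember its own writes and fall back to a possibly stale replica for everything else. `Session`, `master`, and `replica` below are illustrative names, not part of the original talk.

```python
class Session:
    """Per-user session that always sees its own writes (hypothetical sketch)."""

    def __init__(self, replica):
        self.replica = replica      # possibly stale read copy (a dict here)
        self.own_writes = {}        # this user's writes, kept for local reads

    def write(self, master, key, value):
        master[key] = value         # update the authoritative copy
        self.own_writes[key] = value

    def read(self, key):
        if key in self.own_writes:  # the user's own write wins over the replica
            return self.own_writes[key]
        return self.replica.get(key)


def demo():
    master = {"theme": "light"}
    replica = {"theme": "light"}    # replication to this copy is lagging
    alice, bob = Session(replica), Session(replica)
    alice.write(master, "theme", "dark")
    return alice.read("theme"), bob.read("theme")
```

Because only the writing user needs to observe the change immediately, the replica can be refreshed lazily without that user noticing.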

11 Architectural Primitives
Primitives                | Trades               | Gains
Caching, Replication      | Eventual Consistency | Interactiveness, Availability, Throughput
Partitioning              | Entity Visibility    | Interactiveness, Graceful Degradation
Lossy/Sampled Aggregation | Weak Consistency     | Interactiveness, Graceful Degradation

12 Examples of Primitives
- LossyUpdate(key, newVal)
- LossyAccumulator(key, updateOp): for commutative ops
- LossyAggregate(searchKeys): lossy search of an index
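One possible reading of LossyUpdate(key, newVal), assuming that an update aimed at a partition that is currently recovering is simply refused, is sketched below. `PartitionedStore`, its `recovering` set, and the toy hash are assumptions made for illustration and do not describe any real DHT API.

```python
class PartitionedStore:
    """Toy partitioned key-value store with a lossy update (hypothetical)."""

    def __init__(self, num_partitions):
        self.partitions = [dict() for _ in range(num_partitions)]
        self.recovering = set()     # indices of partitions being recovered

    def _index(self, key):
        # Deterministic toy hash: sum of the key's bytes modulo partition count.
        return sum(key.encode()) % len(self.partitions)

    def lossy_update(self, key, new_val):
        """Apply the update unless the target partition is recovering."""
        i = self._index(key)
        if i in self.recovering:
            return False            # update is dropped, not queued or retried
        self.partitions[i][key] = new_val
        return True

    def read(self, key):
        return self.partitions[self._index(key)].get(key)


def demo():
    store = PartitionedStore(2)
    first = store.lossy_update("a", 1)          # partition healthy: applied
    store.recovering.add(store._index("a"))     # partition starts recovering
    second = store.lossy_update("a", 2)         # refused while recovering
    return first, second, store.read("a")
```

The caller learns synchronously whether the write was applied, and the store never has to buffer writes for a partition that is busy recovering.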

13 LossyUpdate implementation
- LossyUpdate
  - Steve Gribble's DHT: atomic ops, single-copy consistency; during failure recovery, reads are slower and writes are refused
  - If an update occurs while the updated partition is recovering => fail
  - Otherwise, the update is persistent
  - When is this useful?
- LossyAccumulator (for hit counters, online polls, etc.)
  - Every period T, in-memory sub-accumulators from worker nodes are swept to the persistent copy
  - At the same time, the current value of the master accumulator is read by each worker node, to serve reads locally
  - Worker nodes don't back up the in-memory copy => fast restart
  - Can bound the loss rate of the accumulator and the inconsistency in reads
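The LossyAccumulator bullets can be sketched as follows. This is a simplified, single-process illustration under stated assumptions: `MasterAccumulator`, `Worker`, `sweep`, and `crash_and_restart` are hypothetical names, the persistent copy is modelled as an ordinary attribute, and the periodic timer that would call `sweep` every T is left out.

```python
class MasterAccumulator:
    """Stands in for the persistent copy of the accumulator."""

    def __init__(self):
        self.total = 0


class Worker:
    """Worker node holding an in-memory sub-accumulator (hypothetical sketch)."""

    def __init__(self, master):
        self.master = master
        self.pending = 0                    # unswept updates, never backed up
        self.cached_total = master.total    # local copy used to serve reads

    def add(self, amount=1):
        self.pending += amount              # commutative op: order is irrelevant

    def read(self):
        return self.cached_total            # local read; may lag the master

    def sweep(self):
        """Run every period T: flush pending updates, refresh the read copy."""
        self.master.total += self.pending
        self.pending = 0
        self.cached_total = self.master.total

    def crash_and_restart(self):
        """Nothing to restore: unswept updates are lost, so restart is fast."""
        self.pending = 0
        self.cached_total = self.master.total


def demo():
    master = MasterAccumulator()
    w1, w2 = Worker(master), Worker(master)
    for _ in range(3):
        w1.add()
    for _ in range(2):
        w2.add()
    w1.sweep()                  # master now holds w1's 3 updates
    w2.add()
    w2.crash_and_restart()      # w2's 3 unswept updates are lost
    w1.add()
    w1.sweep()
    return master.total, w1.read(), w2.read()
```

In this sketch a crash loses at most the updates a worker accepted since its last sweep, so with an update rate of r per second per worker the loss per crash is roughly bounded by r × T. Likewise, a local read lags the master by at most one sweep period.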

14 What is given up
- What is given up
  - Strict consistency of read copies of the accumulator
  - Precision of the accumulator value (lost updates)
- What is gained: fast recovery for each node, continuous operation despite transient per-node failures