Cassandra - A Decentralized Structured Storage System


Cassandra - A Decentralized Structured Storage System. Avinash Lakshman and Prashant Malik, Facebook. Presented by Ravi Theja M.

Agenda: Outline, Data Model, System Architecture, Implementation, Experiments.

Outline Cassandra is an extension of Bigtable's data model with aspects of Dynamo's distribution design. Motivations: high availability, high write throughput, fault tolerance.

Data Model A table is a distributed multi-dimensional map indexed by a key (the row key). Columns are grouped into column families. There are two types of column families: simple, and super (a column family nested inside a column family). Each column has a name, a value, and a timestamp.
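
As a rough illustration of this model (not Cassandra's actual API; the row key and all names below are made up), the structure can be pictured as nested maps:

```python
# A minimal sketch of Cassandra's data model as nested Python dicts
# (illustrative only; "user42", "messages", "threads" are hypothetical).
import time

# table: row key -> column family -> column name -> (value, timestamp)
table = {
    "user42": {                      # row key
        "messages": {                # simple column family
            "msg-001": ("hello", time.time()),
        },
        "threads": {                 # super column family: one extra level
            "thread-9": {            # super column
                "last-msg": ("hi there", time.time()),
            },
        },
    },
}

# Every column carries a name, a value, and a timestamp; the timestamp
# is what lets replicas reconcile conflicting writes (last write wins).
value, ts = table["user42"]["messages"]["msg-001"]
```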

Data Model [Figure: a keyspace contains column families; a column family contains columns, each with a name, value, and timestamp. Figure taken from the slides of Eben Hewitt (author of O'Reilly's Cassandra book).]

System Architecture covers three concerns: Partitioning (how data is partitioned across nodes), Replication (how data is duplicated across nodes), and Cluster Membership (how nodes are added to and removed from the cluster).

Partitioning Nodes are logically structured in a ring topology. The hashed value of a data item's key assigns it to a node on the ring; the hash space wraps around at its maximum value, which is what gives the structure its ring shape (consistent hashing). Lightly loaded nodes move position on the ring to alleviate heavily loaded ones. A combined sketch of partitioning and replication follows the next slide.

Replication Each data item is replicated at N nodes, where N is the replication factor. Different replication policies: Rack Unaware – replicate data at the N-1 nodes that succeed the coordinator on the ring; Rack Aware – uses Zookeeper to elect a leader, which tells nodes the ranges they are replicas for; Datacenter Aware – similar to Rack Aware, but the leader is chosen at the datacenter level instead of the rack level.
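
A minimal sketch of partitioning plus the rack-unaware replication policy, assuming MD5-based consistent hashing; the node names and helper functions are hypothetical:

```python
import hashlib
from bisect import bisect_left

RING_SIZE = 2**128  # hash values wrap around here, forming the ring

def ring_position(key: str) -> int:
    """Hash a key (or a node's token) onto the ring."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % RING_SIZE

# Each node owns a token, i.e. a position on the ring.
nodes = ["A", "B", "C", "D", "E", "F"]
tokens = sorted((ring_position(n), n) for n in nodes)

def coordinator(key: str) -> int:
    """Index of the first node whose token is at or past h(key), wrapping."""
    h = ring_position(key)
    idx = bisect_left([t for t, _ in tokens], h)
    return idx % len(tokens)

def replicas(key: str, n: int = 3) -> list[str]:
    """Rack-unaware policy: the coordinator plus its N-1 ring successors."""
    start = coordinator(key)
    return [tokens[(start + i) % len(tokens)][1] for i in range(n)]

print(replicas("key1"))  # e.g. ['C', 'D', 'E'] -- depends on the hashes
```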

Partitioning and Replication [Figure: nodes A–F placed on the hash ring; h(key1) and h(key2) mark where keys land, and with N=3 each key is stored on its coordinator and the two nodes that follow it. Figure taken from the slides of Avinash Lakshman and Prashant Malik (the paper's authors).]

Gossip Protocols Network communication protocols inspired by real-life rumour spreading. Communication is periodic, pairwise, and inter-node; its low frequency keeps the cost low, and peers are selected at random. Example – node A wishes to search for a pattern in the data: Round 1 – A searches locally, then gossips with node B. Round 2 – A and B gossip with C and D. Round 3 – A, B, C, and D gossip with four other nodes, and so on. This round-by-round doubling makes the protocol very robust.
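
A toy simulation (not Cassandra's actual gossiper) showing that round-by-round doubling:

```python
import random

def gossip_rounds(num_nodes: int = 64) -> int:
    """Simulate push gossip: every informed node tells one random peer per round."""
    informed = {0}                       # node 0 starts with the rumour
    rounds = 0
    while len(informed) < num_nodes:
        for node in list(informed):
            informed.add(random.randrange(num_nodes))  # random peer selection
        rounds += 1
    return rounds

# The informed set roughly doubles each round, so full dissemination
# takes O(log n) rounds even with purely random peer choice.
print(gossip_rounds(64))  # typically around 8-10 rounds for 64 nodes
```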

Gossip Protocols A variety of gossip protocols exist. Dissemination protocols: event dissemination multicasts events via gossip (high latency might cause network strain); background data dissemination gossips continuously about information on participating nodes. Anti-entropy protocols: used to repair replicated data by comparing replicas and reconciling differences. Cassandra uses this type of protocol to repair data across replicas.
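
A minimal sketch of the anti-entropy idea, assuming simple last-write-wins reconciliation over (value, timestamp) pairs; real Cassandra compares Merkle trees of replicas rather than walking raw keys:

```python
def anti_entropy(replica_a: dict, replica_b: dict) -> None:
    """Compare two replicas key by key and copy the newer (value, timestamp)
    in each direction, so both converge to the same state."""
    for key in set(replica_a) | set(replica_b):
        a = replica_a.get(key)           # (value, timestamp) or None
        b = replica_b.get(key)
        if a is None or (b is not None and b[1] > a[1]):
            replica_a[key] = b           # B has the newer write
        elif b is None or a[1] > b[1]:
            replica_b[key] = a           # A has the newer write

r1 = {"k1": ("old", 1), "k2": ("x", 5)}
r2 = {"k1": ("new", 2)}
anti_entropy(r1, r2)
assert r1 == r2 == {"k1": ("new", 2), "k2": ("x", 5)}
```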

Cluster Management Uses Scuttlebutt (a gossip protocol) to manage nodes: gossip handles node membership and transmits system control state. A node's failure state is given by a variable Φ ('phi') that expresses how likely the node is to have failed (a suspicion level), instead of a simple binary up/down value. This type of system is known as an Accrual Failure Detector.

Accrual Failure Detector If a node is faulty, the suspicion level monotonically increases with time: Φ(t) → ∞ as t → ∞. The node is declared dead once Φ(t) crosses a threshold k, which depends on system load. If the node is correct, Φ stays at a low constant set by the application, generally Φ(t) = 0.
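
A rough sketch of how Φ can be computed, assuming heartbeat inter-arrival times follow an exponential distribution (the distributional model here is an assumption, chosen for simplicity):

```python
import math

class PhiAccrualDetector:
    """Toy accrual failure detector: phi grows while heartbeats go missing."""

    def __init__(self):
        self.intervals = []      # observed gaps between heartbeats
        self.last_heartbeat = None

    def heartbeat(self, now: float) -> None:
        if self.last_heartbeat is not None:
            self.intervals.append(now - self.last_heartbeat)
        self.last_heartbeat = now

    def phi(self, now: float) -> float:
        if not self.intervals:
            return 0.0
        mean = sum(self.intervals) / len(self.intervals)
        elapsed = now - self.last_heartbeat
        # P(heartbeat arrives later than `elapsed`) under an exponential
        # model is exp(-elapsed/mean); phi is its negative log10, which
        # simplifies to elapsed / (mean * ln 10) and rises monotonically
        # while the node stays silent.
        return elapsed / (mean * math.log(10))

d = PhiAccrualDetector()
for t in (0.0, 1.0, 2.0, 3.0):      # steady heartbeats every second
    d.heartbeat(t)
print(d.phi(3.5), d.phi(10.0))       # phi keeps rising; dead once past threshold k
```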

Bootstrapping and Scaling Two ways for a new node to join: (1) it is assigned a random token giving its position on the ring, and it gossips its location to the rest of the ring; (2) it reads its configuration file to learn its initial contact points. New nodes are added manually by an administrator via the CLI or the web interface provided by Cassandra. Scaling in Cassandra is designed to be easy: lightly loaded nodes can move on the ring to alleviate heavily loaded ones.

Local Persistence Cassandra relies on the local file system for data persistence. A write operation happens in two steps: (1) append the write to a commit log on the node's local disk; (2) update the in-memory data structure. Why two steps, in this order? The commit log makes the write durable first, so a crash cannot lose an acknowledged write, and both steps avoid random disk I/O. A read operation looks up the in-memory data structure first, before looking at files on disk, and uses a Bloom filter (a summary of the keys in a file, kept in memory) to avoid reading files that cannot contain the key.
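
A minimal sketch of the two-step write path, with a hypothetical file-backed commit log:

```python
import json
import os
import time

memtable = {}  # in-memory structure; flushed to an on-disk file when it grows large

def write(key: str, value: str, log_path: str = "commit.log") -> None:
    # Step 1: append the mutation to the commit log and force it to disk.
    # Durability comes first: after a crash, replaying the log recovers
    # every acknowledged write.
    with open(log_path, "a") as log:
        log.write(json.dumps({"k": key, "v": value, "ts": time.time()}) + "\n")
        log.flush()
        os.fsync(log.fileno())
    # Step 2: only then update the in-memory structure. Both steps are
    # sequential appends / in-memory inserts -- no random disk I/O --
    # which is why writes are so fast.
    memtable[key] = value

def read(key: str):
    # Reads check the memtable first; on a miss, on-disk files would be
    # consulted, with a per-file Bloom filter to skip files that cannot
    # contain the key (the disk path is omitted in this sketch).
    return memtable.get(key)

write("user42:msg-001", "hello")
print(read("user42:msg-001"))
```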

Read Operation [Figure: the client sends a query to the Cassandra cluster; the closest replica (A) returns the full result while replicas B and C answer digest queries with digest responses; if the digests differ, a read repair is performed. Figure taken from the slides of Avinash Lakshman and Prashant Malik (the paper's authors).]
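
A toy sketch of the digest-based read with read repair, using in-process dicts as stand-in replicas (real Cassandra does this over the network and resolves conflicts by timestamp):

```python
import hashlib

def digest(value: str) -> str:
    return hashlib.md5(value.encode()).hexdigest()

def read_with_repair(key: str, replicas: list[dict]) -> str:
    # The closest replica returns the full value; the others return digests.
    closest, others = replicas[0], replicas[1:]
    value = closest[key]
    value_digest = digest(value)
    for replica in others:
        if digest(replica[key]) != value_digest:
            # Digests differ: some replica is stale. Read repair pushes
            # the authoritative value (chosen by timestamp in real
            # Cassandra; simplified here) to the out-of-date replica.
            replica[key] = value
    return value

a = {"k": "v2"}
b = {"k": "v2"}
c = {"k": "v1"}                        # replica C is stale
print(read_with_repair("k", [a, b, c]))  # returns "v2" and repairs C
assert c["k"] == "v2"
```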

Facebook Inbox Search Cassandra was developed to address this problem, and was tested on 50+ TB of user message data on a 150-node cluster. The per-user index over all messages is searched in two ways: term search (search by a keyword) and interactions search (search by a user id). Latency results:

Latency Stat    Search Interactions    Term Search
Min             7.69 ms                7.78 ms
Median          15.69 ms               18.27 ms
Max             26.13 ms               44.41 ms

Comparison with MySQL On more than 50 GB of data, MySQL averages ~300 ms per write and ~350 ms per read, while Cassandra averages 0.12 ms per write and 15 ms per read. Stats provided by the authors using Facebook data.

Comparison using YCSB The following results are taken from 'Benchmarking Cloud Serving Systems with YCSB' by Brian F. Cooper et al. YCSB is Yahoo!'s cloud serving benchmark framework; it compares Cassandra, HBase, PNUTS, and MySQL. Cassandra and HBase have higher read latencies on a read-heavy workload than PNUTS and MySQL, and lower update latencies on a write-heavy workload.

Comparison using YCSB Cassandra, HBase, and PNUTS were able to grow elastically while the workload was executing. PNUTS and Cassandra scaled well as the number of servers and the workload increased proportionally, while HBase's performance was more erratic as the system scaled.

Thank You