Cassandra concepts, patterns and anti-patterns - ApacheCon EU 2012 (Dave)

Presentation transcript:

Cassandra concepts, patterns and anti-patterns
Dave, ApacheCon EU 2012

Agenda: choosing NoSQL; Cassandra concepts (Dynamo and Big Table); patterns and anti-patterns of use.

Choosing NoSQL...

1. Find data store that doesn’t use SQL
2. Anything
3. Cram all the things into it
4. Triumphantly blog this success
5. Complain a month later when it bursts into flames

“NoSQL DBs trade off traditional features to better support new and emerging use cases” (solutions-to-hard-problems)

We trade more widely used, tested and documented software (MySQL first OS release 1998) for a relatively immature product (Cassandra first open-sourced in 2008).

We trade ad-hoc querying (SQL join, group by, having, order) for a rich data model with limited ad-hoc querying ability (Cassandra makes you denormalise).

What do we get in return?

Proven horizontal scalability: Cassandra scales reads and writes linearly as new nodes are added.

High availability: Cassandra is fault-resistant with tunable consistency levels.

Low latency, solid performance: Cassandra has very good write performance.

* Add pinch of salt

Operational simplicity: homogeneous cluster, no “master” node, no SPOF.

Rich data model: Cassandra is more than simple key-value, with columns, composites, counters and secondary indexes.

Choosing NoSQL...

“they say … I can’t decide between this project and this project even though they look nothing like each other. And the fact that you can’t decide indicates that you don’t actually have a problem that requires them.” (computing-and-fast_ip, at 30:15)

Or you haven’t learned enough about them..

What tradeoffs are you making? How is it designed? What algorithms does it use? Are the fundamental design decisions sane?

Concepts...

Amazon Dynamo + Google Big Table

Amazon Dynamo (iles/amazon-dynamo-sosp2007.pdf): consistent hashing; vector clocks *; gossip protocol; hinted handoff; read repair.

Google Big Table (gtable-osdi06.pdf): columnar; SSTable storage; append-only; Memtable; compaction.

* not in Cassandra

Distributed Hash Table (DHT): tokens are integers from 0 to [upper bound not preserved]. (Diagram labels: client.)

Consistent hashing. (Diagram labels: client, coordinator node.)

Replication factor (RF) 3. (Diagram labels: client, coordinator node.)
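The ring lookup on these slides can be sketched in a few lines. This is a toy model rather than Cassandra's code: the MD5 hash, the `RING_SIZE` of 2^127, the node names and the "next nodes clockwise" replica placement are all assumptions made for illustration.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127  # assumed token space; the slide's upper bound is not preserved


def token_for(key: str) -> int:
    """Hash a row key to a token on the ring (MD5 is an assumption here)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


class Ring:
    def __init__(self, node_tokens):
        # Sort nodes by the token they own so the ring can be binary-searched.
        self._entries = sorted((tok, name) for name, tok in node_tokens.items())
        self._tokens = [tok for tok, _ in self._entries]

    def replicas(self, key: str, rf: int = 3):
        """First node whose token is >= the key's token, then the next rf - 1 clockwise."""
        size = len(self._entries)
        start = bisect.bisect_left(self._tokens, token_for(key)) % size
        return [self._entries[(start + i) % size][1] for i in range(min(rf, size))]


# Four hypothetical nodes with evenly spaced tokens.
ring = Ring({f"node{i}": i * RING_SIZE // 4 for i in range(4)})
owners = ring.replicas("USERID1234", rf=3)
```

Any node can compute `owners` for a key without asking a master, which is what lets every node act as coordinator.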

Consistency Level (CL): how many replicas must respond to declare success?

For read operations:

Level          Description
ONE            1st response
QUORUM         N/2 + 1 replicas
LOCAL_QUORUM   N/2 + 1 replicas in local data centre
EACH_QUORUM    N/2 + 1 replicas in each data centre
ALL            All replicas

For write operations:

Level          Description
ANY            One node, including hinted handoff
ONE            One node
QUORUM         N/2 + 1 replicas
LOCAL_QUORUM   N/2 + 1 replicas in local data centre
EACH_QUORUM    N/2 + 1 replicas in each data centre
ALL            All replicas
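As a worked example of the N/2 + 1 rule, the sketch below maps a level name to the number of responses a coordinator would wait for in a single data centre. The function names are hypothetical, and LOCAL_QUORUM and EACH_QUORUM are left out because they apply the same formula per data centre.

```python
def quorum(replicas: int) -> int:
    """N/2 + 1 using integer division."""
    return replicas // 2 + 1


def required_responses(level: str, rf: int) -> int:
    """Responses needed to declare success at the given level (single data centre)."""
    if level in ("ANY", "ONE"):
        return 1  # for ANY, a stored hint also counts as the one response
    if level == "QUORUM":
        return quorum(rf)
    if level == "ALL":
        return rf
    raise ValueError(f"level not covered by this sketch: {level}")
```

With RF = 3, QUORUM needs 2 responses, so a read or write still succeeds with one replica down.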

(Diagram labels: client, coordinator node; RF = 3, CL = Quorum.)

Hinted Handoff: a hint is written to the coordinator node when a replica is down.

(Diagram labels: client, coordinator node; RF = 3, CL = Quorum; node offline, hint.)
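A minimal sketch of that write, assuming an in-memory `Node` class and helper names invented for this example (nothing here is Cassandra's API): the coordinator writes to the live replicas, keeps a hint for the one that is down, and replays it later.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    up: bool = True
    data: dict = field(default_factory=dict)
    hints: list = field(default_factory=list)  # (target node, key, value) held for others


def coordinate_write(coordinator, replicas, key, value, required):
    """Write to live replicas; store a hint on the coordinator for each one that is down."""
    acks = 0
    for replica in replicas:
        if replica.up:
            replica.data[key] = value
            acks += 1
        else:
            coordinator.hints.append((replica, key, value))
    return acks >= required


def replay_hints(coordinator):
    """Deliver stored hints to targets that are back up; keep the rest."""
    remaining = []
    for target, key, value in coordinator.hints:
        if target.up:
            target.data[key] = value
        else:
            remaining.append((target, key, value))
    coordinator.hints = remaining


a, b, c = Node("a"), Node("b"), Node("c", up=False)
ok = coordinate_write(a, [a, b, c], "USERID1234", "Dave", required=2)  # RF = 3, CL = Quorum
c.up = True
replay_hints(a)
```

The write succeeds because two of three replicas acknowledged it; the third catches up once the hint is replayed.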

Read Repair: background digest query on read to find and update out-of-date replicas. *

* carried out in the background unless CL:ALL

(Diagram labels: client, coordinator node; RF = 3, CL = One; background digest query, then update out-of-date replicas.)
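The idea can be illustrated with a toy version that runs inline instead of in the background. Replicas are plain dicts here, and the digest function and helper names are assumptions for the example; it also returns the newest value, which simplifies what a CL = One read would do.

```python
import hashlib


def digest(cell):
    """Short fingerprint of a (value, timestamp) pair; the hash choice is an assumption."""
    value, timestamp = cell
    return hashlib.md5(f"{value}:{timestamp}".encode("utf-8")).hexdigest()


def read_with_repair(replicas, key):
    """Return the newest value and overwrite out-of-date or missing copies.

    `replicas` is a list of dicts mapping key -> (value, timestamp).
    """
    cells = [replica[key] for replica in replicas if key in replica]
    if not cells:
        return None
    newest = max(cells, key=lambda cell: cell[1])
    for replica in replicas:
        if key not in replica or digest(replica[key]) != digest(newest):
            replica[key] = newest  # repair the stale or missing copy
    return newest[0]


r1 = {"job": ("Cruft", 2)}
r2 = {"job": ("Developer", 1)}  # out of date
r3 = {}                         # missed the write entirely
value = read_with_repair([r1, r2, r3], "job")
```

Comparing digests rather than full values keeps the check cheap when replicas already agree.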

Big Table...

Sparse column-based data model; SSTable disk storage; append-only commit log; Memtable (buffer and sort); immutable SSTable files; compaction.

Column: name, value, timestamp. The timestamp is used for conflict resolution (last write wins).
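Last-write-wins resolution is small enough to show directly. The `Column` type and `resolve` function are illustrative names, and the tie-break (keep the first argument) is an arbitrary choice for this sketch.

```python
from typing import NamedTuple


class Column(NamedTuple):
    name: str
    value: str
    timestamp: int  # supplied by the writer, e.g. microseconds since the epoch


def resolve(a: Column, b: Column) -> Column:
    """Last write wins: keep the version with the higher timestamp (ties keep `a`)."""
    return a if a.timestamp >= b.timestamp else b


old = Column("job", "Developer", timestamp=1)
new = Column("job", "Cruft", timestamp=2)
winner = resolve(old, new)
```

Because the rule depends only on timestamps, replicas can apply the same two versions in any order and still converge.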

(Diagram: a sequence of columns, each a name and a value.) We can have millions of columns. *

* theoretically up to 2 billion

Row: a row key followed by its columns (name, value).

Column Family: many rows, each a row key followed by its columns. We can have billions of rows.
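One way to picture a column family is as a map of maps. The sketch below is only a mental model using Python dicts; the second row and the helper names are hypothetical.

```python
# column family -> row key -> {column name: value}; rows need not share columns.
users = {
    "USERID1234": {"name": "Dave", "job": "Developer"},
    "USERID5678": {"name": "Sam"},  # hypothetical sparse row with no "job" column
}


def get_column(column_family, row_key, column_name, default=None):
    """Look up one column; a missing column is simply absent."""
    return column_family.get(row_key, {}).get(column_name, default)


def columns_in_order(column_family, row_key):
    """Columns in a row are kept sorted by name; sorted() stands in for that here."""
    return sorted(column_family.get(row_key, {}).items())


first_row = columns_in_order(users, "USERID1234")
```

Sparse rows cost nothing for the columns they lack, which is why rows with very different shapes can share a column family.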

Write path: a write goes to the commit log (on disk) and to the Memtable (in memory), which buffers writes and sorts data. The Memtable is flushed to an SSTable on disk on a time or size trigger; SSTables are immutable.
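The write path can be mimicked with three in-memory structures. `ToyStore` and its `flush_threshold` are invented for this sketch; real flush triggers and on-disk formats are more involved.

```python
class ToyStore:
    def __init__(self, flush_threshold=3):
        self.commit_log = []   # append-only record of every write (on disk in reality)
        self.memtable = {}     # in-memory buffer of recent writes
        self.sstables = []     # immutable, sorted snapshots (tuples of pairs)
        self.flush_threshold = flush_threshold  # hypothetical size trigger

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. append to the log
        self.memtable[key] = value            # 2. buffer in memory
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Sort the buffered writes and freeze them as a new SSTable."""
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}


store = ToyStore(flush_threshold=2)
store.write("b", 2)
store.write("a", 1)  # reaches the threshold and triggers a flush
store.write("c", 3)  # stays in the memtable for now
```

Writes never modify an existing SSTable, so every disk write in this model is sequential.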

Sorted data is written to disk in blocks. Each “query” can be answered from a single slice of disk. Therefore start from your queries and work backwards.
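To make "a single slice" concrete, here is a sketch over a sorted list of column names. The timestamp-style names are hypothetical, picked so that the query "everything on 2012-11-05" maps to one contiguous range.

```python
import bisect


def slice_columns(sorted_names, start, end):
    """Return the names in [start, end] from an already sorted list as one contiguous slice."""
    lo = bisect.bisect_left(sorted_names, start)
    hi = bisect.bisect_right(sorted_names, end)
    return sorted_names[lo:hi]


names = [
    "2012-11-04T09:00",
    "2012-11-05T08:30",
    "2012-11-05T17:45",
    "2012-11-06T10:15",
]
day = slice_columns(names, "2012-11-05", "2012-11-05T23:59")
```

Working backwards from the query means choosing names that sort this way in the first place.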

Patterns and anti-patterns...

Pattern: storing entities as individual columns under one row.

Pattern: one row per user.
row: USERID1234
name: Dave
job: Developer
We can use C* secondary indexes to fetch all users with job=developer.

Anti-pattern: storing whole entity as single column blob.

Anti-pattern: one row per user.
row: USERID1234
data: {"name":"Dave", "job":"Developer"}
Now we can’t use secondary indexes, nor can we easily update safely.

Pattern: mutate just the changes to entities, make use of C* conflict resolution.

Pattern:
$userCf->insert( "USER1234", array("job" => "Cruft") );
We only update the “job” column, avoiding the race condition of reading all properties and then writing them all back after changing only one.
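The difference between the two approaches shows up when two writers overlap. The simulation below uses plain Python structures with assumed timestamps and a hypothetical second writer; it does not call any Cassandra client.

```python
import json


def apply_mutation(row, name, value, timestamp):
    """Write one column; per column, the higher timestamp wins."""
    current = row.get(name)
    if current is None or timestamp >= current[1]:
        row[name] = (value, timestamp)


# Pattern: each writer mutates only the column it changed.
row = {"name": ("Dave", 1), "job": ("Developer", 1)}
apply_mutation(row, "job", "Cruft", timestamp=2)   # writer A
apply_mutation(row, "name", "David", timestamp=3)  # writer B (hypothetical update)

# Anti-pattern: both writers read the same blob, change one field, write it all back.
blob = json.dumps({"name": "Dave", "job": "Developer"})
copy_a, copy_b = json.loads(blob), json.loads(blob)
copy_a["job"] = "Cruft"
copy_b["name"] = "David"
blob = json.dumps(copy_a)
blob = json.dumps(copy_b)  # the whole blob is replaced, so writer A's change is lost
```

With per-column mutations both changes survive; with the blob, the later write silently restores the old job.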

Anti-pattern: lock, read, update.

Pattern: don’t overwrite anything; store as time series data.

Pattern: one row per user; many columns (wide row). The column name is a type 1 UUID (time based).
row: USERID1234
a384cff0-26c1-11e2-81c c9a66: {"action":"create", "name":"Dave"}
10dc4c40-26c2-11e2-81c c9a66: {"action":"update", "name":"foo"}
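A wide time-series row can be sketched with Python's standard `uuid` module, whose `uuid1()` produces time-based (type 1) UUIDs. The dict stands in for the row, and `append_event` and `history` are illustrative helper names.

```python
import json
import uuid

events = {}  # one wide row: column name (type 1 UUID) -> JSON value


def append_event(row, payload):
    """Add a new column instead of overwriting; the name is a fresh time-based UUID."""
    name = uuid.uuid1()
    row[name] = json.dumps(payload)
    return name


def history(row):
    """Events ordered by the timestamp embedded in each UUID, oldest first."""
    return [json.loads(row[name]) for name in sorted(row, key=lambda u: u.time)]


append_event(events, {"action": "create", "name": "Dave"})
append_event(events, {"action": "update", "name": "foo"})
timeline = history(events)
```

Nothing is ever overwritten, so the current state is whatever replaying `timeline` produces, and the full audit trail comes for free.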

Pattern: we can store all sorts of stuff as time series.

Anti-pattern: Order Preserving Partitioner (OPP) (randompartitioner-vs-orderpreservingpartitioner/)

Pattern: distributed counters.

Anti-pattern: Super Columns (a trap for the unwary) (for-the-unwary/)

In conclusion...

Cassandra is founded on sound design principles.

The data model is incredibly powerful.

CQL and a new breed of clients are making it easier to use.

Lots of tools and integrations exist to expand the feature set.

There is a strong community and multiple companies offering professional support.

Thanks

Learn more about Cassandra (if you’re ever in London): meetup.com/Cassandra-London
Learn more about the fundamentals
Watch videos from Cassandra SF
s looking for a job?

Extending functionality: search via Apache Solr and DataStax Enterprise; batch processing via Apache Hadoop and DataStax Enterprise; real-time analytics via Acunu Reflex.